This assignment is inspired by a pair of articles (Facebook
article / archival, Google
article / archival) in the New York
Times in Spring 2018. One of their tech columnists downloaded all of the
data that those two companies had about him. For this assignment, your
job is to download, explore, and reflect on your own data from some
online service that you use.
Instructions:
- Take a moment and think. What sorts of data do you imagine the
companies behind the tech products/services you use have about you?
- Decide on a service you use and figure out how to get your data.
Most major companies have a fairly easy process for this, partly thanks
to regulatory requirements. Examples:
- Poke around in your data a bit. What sort of information has the
company collected about you? Do you think it’s more or less accurate?
(Depending on the service you exported from, many of the files might be
in JSON format. Firefox can format JSON for you if you load the file in
the browser, and many text editors such as VS Code can be configured to
display JSON; in VS Code, you can prettify JSON data with the Format
Document command in the command palette. There are nice web viewers too,
but maybe you shouldn’t upload your personal data to them!)
- Write an approximately 1-page (single-spaced) reflection about your
findings. Use the questions below to guide your reflections, but don’t
feel the need to answer all of them. The 1-page guideline is very loose,
so please direct your efforts towards conveying substance rather than
filling a page: I’d much rather read 2 good paragraphs than try to fish
the substance out of 4 fluffy ones.
- We will have an in-class discussion on this topic (tentatively) on
Wednesday, October 8th. Please come to class prepared to discuss your
findings and thoughts with your classmates.
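If you would rather explore a JSON export with a script than in an editor, here is a minimal Python sketch using only the standard library. It prints the shape of a file (keys, list lengths, value types) rather than the values themselves, so the output is safer to glance at or share. The function names `summarize` and `summarize_file` are made up for this illustration, and the sketch assumes each file is a single valid JSON document encoded as UTF-8; your export may be organized differently.

```python
import json
from pathlib import Path


def summarize(value, depth=0, max_depth=3):
    """Return lines describing the shape of a parsed JSON value, not its contents."""
    indent = "  " * depth
    if isinstance(value, dict):
        lines = [f"{indent}object with {len(value)} keys"]
        if depth < max_depth:
            for key, child in value.items():
                lines.append(f"{indent}  {key}:")
                lines.extend(summarize(child, depth + 2, max_depth))
        return lines
    if isinstance(value, list):
        lines = [f"{indent}list of {len(value)} items"]
        # Only the first item is inspected; assume the rest look similar.
        if value and depth < max_depth:
            lines.append(f"{indent}  first item:")
            lines.extend(summarize(value[0], depth + 2, max_depth))
        return lines
    # Scalars: report only the Python type name (str, int, float, bool, NoneType).
    return [f"{indent}{type(value).__name__}"]


def summarize_file(path):
    """Parse one JSON file from disk and return its structural summary as text."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return "\n".join(summarize(data))
```

For example, `print(summarize_file("my_export/profile.json"))` would print the outline of that file, where the path is a hypothetical placeholder for a file in your own download. If you just want to pretty-print a file locally, `python -m json.tool yourfile.json` also works without uploading anything.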
Questions:
Note: Nothing in the following questions obligates you to
share personal information about yourself. You are under no pressure to
reveal private information, and in fact it’s much better if you
don’t!
- How did your data compare to what you expected? Was there anything
surprising, or creepy, or just plain strange? Describe the types of data
that you see.
- How comprehensive was your download? Are you able to determine
whether the company gave you everything they had, or were they more
selective?
- What kinds of data science questions could someone answer about you
based solely on this data? What kinds of data science questions could
someone with access to millions of records like yours answer?
- Are you comfortable with the extent and/or accuracy of the data
collected? If the company’s systems were breached by hackers and your
data was released on the internet or sold on the dark web, would your
answer change?
- Does the company have controls for opting out of collection of the
sorts of data you’d rather they not have? If not, or if the company
suddenly decided tomorrow to remove those controls, what should our
society do about this?
Submission
Submit your reflections on Canvas in PDF format.
Rubric
This assignment is out of 10 points and will be graded relatively
coarsely on effort, thoughtfulness, and clarity. Full credit will be
awarded if you got your data, spent some time looking at it, and
produced thoughtful and clearly written reflections.