DATA 311 - Lecture 5 Worksheet

Write the names of the students at your table:

For each of the following datasets, come up with at least two questions that you think would be interesting to investigate and that could be answered using the available data. Feel free to pull up the websites and dig more into what’s there.

  1. IMDB: All things movies. https://developer.imdb.com/non-commercial-datasets/
  1. Boston Bike Share data: https://www.bluebikes.com/system-data

  2. You flip a fair coin. If the coin lands heads, you roll a fair three-sided die. If the coin lands tails, you roll a weighted three-sided die whose probability of coming up 1 is 0.6, while the probabilities of coming up 2 and 3 are 0.2 each. If \(C\) is the outcome of the coin flip (\(H\) for heads, \(T\) for tails) and \(D\) is the outcome of the die roll (1 through 3), write down the full joint distribution \(P(C, D)\). I’ve given you the first one.

  3. What is \(P(D=1 | C=T)\)?

  4. What is \(P(D=1)\)?
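
Once you have worked questions 2 through 4 by hand, one way to check your answers is a short Python sketch like the one below. It builds the joint distribution from the product rule \(P(C, D) = P(C)\,P(D \mid C)\), then gets the marginal by summing over the coin and the conditional by dividing by \(P(C)\). The names `p_coin`, `p_die_given_coin`, `joint_distribution`, `marginal_die`, and `conditional_die_given_coin` are made up for this illustration; only the probabilities come from the problem statement.

```python
from fractions import Fraction

# Fair coin: P(C=H) = P(C=T) = 1/2.
p_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Die distributions P(D | C) from the problem statement:
# heads -> fair three-sided die, tails -> weighted die (0.6, 0.2, 0.2).
p_die_given_coin = {
    "H": {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)},
    "T": {1: Fraction(3, 5), 2: Fraction(1, 5), 3: Fraction(1, 5)},
}


def joint_distribution():
    """Return P(C=c, D=d) for every pair, using P(C) * P(D | C)."""
    return {
        (c, d): p_coin[c] * p_d
        for c, die in p_die_given_coin.items()
        for d, p_d in die.items()
    }


def marginal_die(joint, d):
    """Return P(D=d) by summing the joint over both coin outcomes."""
    return sum(p for (c, dd), p in joint.items() if dd == d)


def conditional_die_given_coin(joint, d, c):
    """Return P(D=d | C=c) = P(C=c, D=d) / P(C=c)."""
    p_c = sum(p for (cc, _), p in joint.items() if cc == c)
    return joint[(c, d)] / p_c


if __name__ == "__main__":
    joint = joint_distribution()
    for (c, d), p in sorted(joint.items()):
        print(f"P(C={c}, D={d}) = {p}")
    print("P(D=1 | C=T) =", conditional_die_given_coin(joint, 1, "T"))
    print("P(D=1) =", marginal_die(joint, 1))
```

Using `Fraction` instead of floats keeps every probability exact, so you can compare the printed values directly against the fractions you wrote down.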