Write the names of the students at your table:
As variability measures, the standard deviation and variance are closely related to the mean, meaning that they are likely to be sensitive to outliers just as the mean is. Can you think of a way to measure variability that might be more robust to outliers?
You flip a fair coin; if that fair coin lands heads, you roll a fair 3-sided die. If the coin lands tails you roll a weighted three-sided die whose odds of coming up 1 are 0.6, while the odds coming up 2 or 3 are 0.2 each. If \(C\) is the outcome of the coin flip (\(H\) for heads, \(T\) for tails) and \(D\) is the outcome of the die roll (1 through 3), write down the full joint distribution \(P(C, D)\). I’ve given you the first one
What is \(P(D=1 | C=T)\)?
What is \(P(D=1)\)?
For each of the following datasets, come up with at least two questions you think would be interesting to investigate, and could be answered using the available. Feel free to pull up the websites and dig more into what’s there.
IMDB: All things movies. https://developer.imdb.com/non-commercial-datasets/
Boston Bike Share data: https://www.bluebikes.com/system-data
Trips: Trip Duration, Start Time and Date, Stop Time and Date, Start Station Name & ID, End Station Name & ID, Bike ID, User Type (Casual = Single Trip or Day Pass user; Member = Annual or Monthly Member), Birth Year, Gender (self-reported by member)
Stations: ID, Name, GPS coordinates, # docks