Winter 2023
Now that we have the language to talk about effective visualizations and the tools to create them, this lab asks you to put some real effort into producing a few nice ones.
This lab will be done individually. As usual, you may spend the lab period working together with a partner. After the lab period ends, you will work independently and submit your own solution, though you may continue to collaborate in accordance with the individual assignment collaboration policy listed on the syllabus.
You’ll complete all three parts of this lab in a single Colab notebook called lab3.ipynb.
Having been introduced to some of the principles of good visualization, you are now qualified to cultivate your visualization snobbery. A great way to do this is to peruse some of the various websites devoted to collecting and ridiculing weird and bad graphs and data visualizations.
If you want to revisit or dig a little deeper into some of the visualization aesthetics ideas we talked about in class, check out Chapter 6.2 in the textbook for a quick summary of the Tufte principles. I also recommend perusing this excerpt from Tufte’s The Visual Display of Quantitative Information.
For Part 1 of this lab, we’re going to dig into one particularly bad plot, talk about why it’s bad, then make it better. I’d like you to consider the graphs in this random blog post. The author was trying to do a bit of amateur data science back in 2015, and came up with some rather unusual ways to present the results. In particular, let’s focus on the third figure (the one with the “parents” and “all users” lines).
Your tasks are the following:
When making your plot, just read the numbers off of the old plots as best as you can. It’s a little tedious, but I don’t really want to go ask the author for his spreadsheet…
In this part, you will produce two really nice visualizations. The data you choose to visualize is mostly up to you: you are free to revisit any of the datasets we’ve worked with in class or in prior labs, or to use the datasets built into Seaborn (accessible via sns.load_dataset, discoverable via sns.get_dataset_names()). You can use a different dataset for each plot, or make both plots with the same dataset; this is up to you.
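If you go the built-in route, a small helper along these lines can make it quicker to size up a candidate dataset. This is only a sketch: the function name preview_builtin_dataset is made up for illustration, and it assumes your notebook has internet access, since Seaborn downloads its example datasets on first use.

```python
import seaborn as sns


def preview_builtin_dataset(name):
    """Load one of Seaborn's example datasets and print a quick summary.

    Hypothetical helper, not part of Seaborn. Seaborn fetches the example
    data over the network on first use, so this needs a connection.
    """
    df = sns.load_dataset(name)
    print(df.shape)   # (number of rows, number of columns)
    print(df.dtypes)  # column names and types, handy for picking variables
    return df
```

In a notebook cell, sns.get_dataset_names() returns the list of available names, and a call such as preview_builtin_dataset("penguins") loads one of them.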
Though many nice plots can be created with simple calls to Seaborn functions, you will probably need to drop down to Matplotlib functions to fine-tune and customize your plots. We haven’t covered much on how to do this, so I expect you’ll need to do some searching around for the functionality you need.
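As a starting point for that searching, here is a rough sketch of the usual pattern: let Seaborn draw the plot, keep the Matplotlib Axes object it returns, and then adjust that object directly. The data frame below is made-up placeholder data so the example runs without a download; the column names and label text are likewise just for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up placeholder values, only here so the example is self-contained.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 71, 74, 83],
    "section": ["A", "B", "A", "B", "A", "B"],
})

# Axes-level Seaborn functions return the Matplotlib Axes they drew on.
ax = sns.scatterplot(data=df, x="hours", y="score", hue="section")

# From here on, everything is plain Matplotlib.
ax.set_title("Score vs. hours studied (placeholder data)")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score")

# Remove the top and right borders to cut down on non-data ink.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# The legend Seaborn created is an ordinary Matplotlib Legend object.
legend = ax.get_legend()
legend.set_title("Section")
legend.set_frame_on(False)

ax.get_figure().tight_layout()
```

The same idea applies to any axes-level Seaborn function; figure-level functions such as sns.relplot return a grid object instead, so the customization calls look a little different there.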
I’m looking for not only a faithful representation of the data, but also a high degree of polish. Notice that on the rubric entry for the plot itself, a basic df.plot.____() would probably score around 2, since those plots get the point across but aren’t very nice beyond that; a basic, completely uncustomized Seaborn plot might get a 3 thanks to Seaborn’s better defaults.
For each of your plots, include:
Your plots should meet the following guidelines:
Each plot must be a different type. For example, you may not submit two scatterplots.
Your plots should be carefully designed to tell a specific story. This does not preclude you from making rich, data-dense plots (indeed, this is encouraged!), but the effect that motivated you to make the plot should be easy to see.
Both of your plots must be more complex than the basic form of the given type of visualization. For example, you should do something more interesting than a basic scatterplot showing that two variables are correlated.
Your design discussion should justify the choices you made in terms of the principles we talked about in class. You need not address every principle, but any that apply to your plot should be discussed. If there are important design elements where you stayed with the default settings of the plotting library you used, you should justify this too.
Make the worst plot you can with the data from one of your Part 2 plots. Break all the rules. Aim for something that still technically represents all of the right numbers while actually being totally misleading or unreadable. Prefer being misleading to merely unreadable. The best worst plots may be showcased in class to bestow glory upon their creators.
Submit your single notebook lab3.ipynb in a zip file to Canvas. If your plots require data that is not built into Seaborn or accessible directly via URL, also include the necessary data in CSV format in your zip file.
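If you do bundle a CSV, one way to keep the notebook runnable for someone else is to load the file by a relative path and fail with a clear message when it is missing. The sketch below assumes the CSV sits in the same folder as the notebook; the helper name load_lab_data is made up for illustration.

```python
from pathlib import Path

import pandas as pd


def load_lab_data(filename):
    """Read a CSV stored alongside the notebook (hypothetical helper)."""
    path = Path(filename)
    if not path.exists():
        # A clear message beats a long traceback when a file was left out.
        raise FileNotFoundError(
            f"{filename} not found; include it in the zip next to lab3.ipynb"
        )
    return pd.read_csv(path)
```

Data that is reachable by URL does not need a helper like this, since pd.read_csv also accepts a URL string directly.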
Finally, fill out the Lab 3 Survey on Canvas. Your submission will not be considered complete until you have submitted the survey.
Part 1 is worth 15 points, based on the quality and clarity of your discussions and plot.
Part 2 is worth 15 points per plot, for a total of 30 points.
Grading of plots will be done by critique based on the principles of visualization.
Rubric entries for each plot: Plot, Justification.
Part 3 is worth 5 points and will be graded on how thoroughly your plot violates the principles of visualization aesthetics discussed in class.