DATA 311 - Lab 3: Visualize!

Scott Wehrwein

Winter 2023

Introduction

Now that we have the language to talk about, and the tools to create, effective visualizations, this lab asks you to put in the effort to produce some nice visualizations of your own.

Collaboration Policy

This lab will be done individually. As usual, you may spend the lab period working together with a partner. After the lab period ends, you will work independently and submit your own solution, though you may continue to collaborate in accordance with the individual assignment collaboration policy listed on the syllabus.

Getting Started

You’ll complete all three parts of this lab in a single Colab notebook called lab3.ipynb.

Part 1: A really bad plot

Background

Having been introduced to some of the principles of good visualization, you are now qualified to cultivate your visualization snobbery. A great way to do this is to peruse some of the various websites devoted to collecting and ridiculing weird and bad graphs and data visualizations.

If you want to revisit or dig a little deeper into the visualization aesthetics ideas we talked about in class, check out Chapter 6.2 in the textbook for a quick summary of the Tufte principles. I also recommend perusing this excerpt from Tufte’s The Visual Display of Quantitative Information.

Your Tasks

For Part 1 of this lab, we’re going to dig into one particularly bad plot, talk about why it’s bad, then make it better. I’d like you to consider the graphs in this random blog post. The author was trying to do a bit of amateur data science back in 2015, and came up with some rather unusual ways to present the results. In particular, let’s focus on the third figure (the one with the “parents” and “all users” lines).

Your tasks are the following:

Write a paragraph or two discussing all of the merits and deficiencies you can find in the original plot. Try to find some of both.
Make a new plot of the same data that communicates the data as clearly as possible.
Write another paragraph or two justifying your design choices.

When making your plot, just read the numbers off of the old plot as best you can. It’s a little tedious, but I don’t really want to go ask the author for his spreadsheet…
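One way to get the hand-read numbers into your notebook is to type them straight into a small DataFrame. The sketch below is only an illustration: every value is a made-up placeholder, and the x-axis labels are hypothetical. Only the names parents and all_users are taken from the two lines in the figure.

```python
import pandas as pd

# PLACEHOLDER values only: replace each number with your own reading of the figure.
# The x-axis labels are hypothetical; use whatever the original plot shows.
estimates = pd.DataFrame({
    "x_label": ["first", "second", "third"],
    "parents": [10, 20, 30],
    "all_users": [12, 18, 25],
})

# Reshape to long ("tidy") form: one row per (x_label, group) pair.
# Seaborn's hue= argument works most naturally with data in this shape.
tidy = estimates.melt(id_vars="x_label", var_name="group", value_name="value")
print(tidy)
```

Keeping the transcribed numbers in one cell like this makes it easy to correct a value later without touching your plotting code.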

Part 2: Some really nice plots

In this part, you will produce two really nice visualizations. The data you choose to visualize is mostly up to you - you are free to revisit any of the datasets we’ve worked with in class, in prior labs, or the datasets built into Seaborn (accessible via sns.load_dataset, discoverable via sns.get_dataset_names()). You can use a different dataset for each plot, or make both plots with the same dataset - this is up to you.

Though many nice plots can be created with simple calls to Seaborn functions, you will probably need to drop down to Matplotlib functions to fine-tune and customize your plots. We haven’t covered much on how to do this, so I expect you’ll need to do some searching around for the functionality you need.
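As a starting point for that searching, here is a minimal sketch of the usual pattern: let Seaborn draw onto a Matplotlib Axes, then adjust that Axes object directly. The toy data and all column names are hypothetical and exist only to keep the example self-contained. The particular tweaks shown are examples, not a checklist.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data, used only so the sketch runs on its own.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 71, 80, 86],
    "section": ["A", "B", "A", "B", "A", "B"],
})

# Create the figure and axes yourself, then hand the axes to Seaborn.
fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=df, x="hours", y="score", hue="section", ax=ax)

# From here on, everything is plain Matplotlib.
ax.set_title("Score vs. study time")
ax.set_xlabel("Study time (hours)")
ax.set_ylabel("Exam score")

# Remove the top and right borders to cut down on non-data ink.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Tidy up the legend that Seaborn created for the hue variable.
legend = ax.get_legend()
legend.set_title("Section")
legend.set_frame_on(False)

fig.tight_layout()
```

In a Colab cell the figure is displayed automatically when the cell finishes running.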

I’m looking for not only a faithful representation of the data, but a high degree of polish. Notice that on the rubric entry for the plot itself, a basic df.plot.____() would probably score around 2, since those plots get the point across but aren’t very nice beyond that; a basic, completely uncustomized Seaborn plot might get a 3 thanks to Seaborn’s better defaults.

For each of your plots, include:

Any data processing code needed
The code to produce the plot and plot itself
A caption describing what the plot shows and what the reader should focus on
A discussion of the design decisions you made when creating your plot

Your plots should meet the following guidelines:

Each plot must be a different type. For example, you may not submit two scatterplots.
Your plots should be carefully designed to tell a specific story. This does not preclude you from making rich, data-dense plots (indeed, this is encouraged!), but the effect that motivated you to make the plot should be easy to see.
Both of your plots must be more complex than the basic form of the given type of visualization. For example, you should do something more interesting than showing a basic scatterplot showing that two variables are correlated.
Your design discussion should justify the choices you made in terms of the principles we talked about in class. You need not address every principle, but any that apply to your plot should be discussed. If there are important design elements where you stayed with the default settings of the plotting library you used, you should justify this too.

Part 3

Make the worst plot you can with the data from one of your Part 2 plots. Break all the rules. Aim for something that still technically represents all of the right numbers while actually being totally misleading or unreadable. Prefer being misleading to merely unreadable. The best worst plots may be showcased in class to bestow glory upon their creators.

Submitting your work

Notebooks and Data

Submit your single notebook lab3.ipynb in a zip file to Canvas. If your plots require data that is not built-in to Seaborn or accessible directly via url, also include the necessary data in CSV format in your zip file.
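If you do need to bundle a CSV, something like the following sketch shows one way to save and reload it. The file name my_data.csv and the data are hypothetical. The sketch assumes the CSV sits in the same folder as the notebook when it is run.

```python
import pandas as pd

# Hypothetical stand-in for whatever processed data your plot needs.
df = pd.DataFrame({"year": [2020, 2021, 2022], "value": [1.5, 2.0, 2.5]})

# index=False keeps pandas from writing the row index as an extra column.
df.to_csv("my_data.csv", index=False)

# Load with a relative path so the notebook does not depend on
# a folder layout that exists only on your machine.
reloaded = pd.read_csv("my_data.csv")
print(reloaded.head())
```

Avoiding absolute paths means the notebook can still find the file after the zip is extracted somewhere else.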

Survey

Finally, fill out the Lab 3 Survey on Canvas. Your submission will not be considered complete until you have submitted the survey.

Rubric

Part 1 is worth 15 points, based on the quality and clarity of your discussions and plot.

5 points: discussion of the blog post plot’s merits and deficiencies
- 5/5 thoughtful, observant, and detailed; identifies both strengths and weaknesses
- 3/5 identifies only one or two glaring issues with the plot
- 1/5 an attempt was made
10 points: your plot will be graded using the same scheme as the plots from Part 2.

Part 2 is worth 15 points per plot, for a total of 30 points.

Grading of plots will be done by critique based on the principles of visualization.

Plot

5/5 Plot is excellent
4/5 A few small improvements that could be made to the plot
3/5 Multiple uncontroversial improvements could be made
2/5 Minimal thought was put into the design of the plot
1/5 A plot exists
0/5 No plot exists

Justification

5/5 All decisions are carefully considered and well justified
4/5 A few important design decisions are not well justified
3/5 Many design decisions are not well justified
2/5 Most design decisions are not well justified
0/5 No attempt was made to justify plot design

Part 3 is worth 5 points and will be graded on how thoroughly your plot violates the principles of visualization aesthetics discussed in class.