DATA 311 - Lab 3: Visualize!

Scott Wehrwein

Winter 2023

Introduction

Now that we have the language to talk about effective visualizations and the tools to create them, this lab asks you to put some effort into producing a few polished ones.

Collaboration Policy

This lab will be done individually. As usual, you may spend the lab period working together with a partner. After the lab period ends, you will work independently and submit your own solution, though you may continue to collaborate in accordance with the individual assignment collaboration policy listed on the syllabus.

Getting Started

You’ll complete all three parts of this lab in a single Colab notebook called lab3.ipynb.

Part 1: A really bad plot

Background

Having been introduced to some of the principles of good visualization, you are now qualified to cultivate your visualization snobbery. A great way to do this is to peruse some of the various websites devoted to collecting and ridiculing weird and bad graphs and data visualizations.

If you want to revisit or dig a little deeper into some of the visualization aesthetics ideas we talked about in class, check out Chapter 6.2 in the textbook for a quick summary of the Tufte principles. I also recommend perusing this excerpt from Tufte’s The Visual Display of Quantitative Information.

Your Tasks

For Part 1 of this lab, we’re going to dig into one particularly bad plot, talk about why it’s bad, then make it better. I’d like you to consider the graphs in this random blog post. The author was trying to do a bit of amateur data science back in 2015, and came up with some rather unusual ways to present the results. In particular, let’s focus on the third figure (the one with the “parents” and “all users” lines).

Your tasks are the following:

When making your plot, just read the numbers off of the old plots as best as you can. It’s a little tedious, but I don’t really want to go ask the author for his spreadsheet…

Part 2: Some really nice plots

In this part, you will produce two really nice visualizations. The data you choose to visualize is mostly up to you - you are free to revisit any of the datasets we’ve worked with in class, in prior labs, or the datasets built into Seaborn (accessible via sns.load_dataset, discoverable via sns.get_dataset_names()). You can use a different dataset for each plot, or make both plots with the same dataset - this is up to you.

Though many nice plots can be created with simple calls to Seaborn functions, you will probably need to drop down to Matplotlib functions to fine-tune and customize your plots. We haven’t covered much on how to do this, so I expect you’ll need to do some searching around for the functionality you need.

I’m looking for not only a faithful representation of the data, but a high degree of polish. Notice that on the rubric entry for the plot itself, a basic df.plot.____() would probably score around 2, since those plots get the point across but aren’t very nice beyond that; a basic, completely uncustomized Seaborn plot might get a 3 thanks to Seaborn’s better defaults.

For each of your plots, include:

Your plots should meet the following guidelines:

Part 3

Make the worst plot you can with the data from one of your Part 2 plots. Break all the rules. Aim for something that still technically represents all of the right numbers while actually being totally misleading or unreadable. Prefer being misleading to merely unreadable. The best worst plots may be showcased in class to bestow glory upon their creators.

Submitting your work

Notebooks and Data

Submit your single notebook lab3.ipynb in a zip file to Canvas. If your plots require data that is not built into Seaborn or accessible directly via URL, also include the necessary data in CSV format in your zip file.
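If you would rather build the zip file from code than by hand, the following is one possible sketch using Python's standard zipfile module. The helper name make_submission_zip, the archive name lab3.zip, and the CSV name in the usage comment are all assumptions for illustration; the lab does not prescribe a zip filename.

```python
import zipfile
from pathlib import Path


def make_submission_zip(files, zip_path="lab3.zip"):
    """Bundle the given files into one zip archive (hypothetical helper).

    Files are stored by base name only, so the archive has no nested folders.
    """
    paths = [Path(f) for f in files]

    # Check everything exists before writing, so we never create a partial archive.
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"Missing files: {missing}")

    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p, arcname=p.name)
    return zip_path


# Example usage (my_data.csv is a hypothetical data file name):
# make_submission_zip(["lab3.ipynb", "my_data.csv"])
```

Whichever way you create the archive, open it once before uploading to confirm that the notebook and any CSV files are actually inside.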

Survey

Finally, fill out the Lab 3 Survey on Canvas. Your submission will not be considered complete until you have submitted the survey.

Rubric

Part 1 is worth 15 points, based on the quality and clarity of your discussions and plot.

Part 2 is worth 15 points per plot, for a total of 30 points.

Grading of plots will be done by critique based on the principles of visualization.

Plot

Justification

Part 3 is worth 5 points and will be graded on how thoroughly your plot violates the principles of visualization aesthetics discussed in class.