Lecture 19 - Visualization (1)

Announcements

Goals

Big Idea: Why visualize?

Consider Anscombe's Quartet:
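As a minimal sketch, assuming the standard published values of the quartet, the following computes the usual summary statistics for each of the four datasets. The names `QUARTET`, `pearson_r`, and `summarize` are illustrative choices, not part of the lecture materials.

```python
from statistics import mean, variance

# Standard published values of Anscombe's quartet (datasets I-III share x).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def summarize(x, y):
    """Means, sample variances, and correlation for one dataset."""
    return {
        "mean_x": mean(x),
        "var_x": variance(x),
        "mean_y": mean(y),
        "var_y": variance(y),
        "r": pearson_r(x, y),
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The printed statistics come out nearly identical for all four datasets.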

Hey, they're all the same! ...right? Let's confirm by visualizing:
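One way to draw the comparison, as a sketch that assumes matplotlib is installed and uses the standard published values of the quartet. The name `plot_quartet` is illustrative.

```python
# Standard published values of Anscombe's quartet (datasets I-III share x).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def plot_quartet(quartet):
    """Scatter-plot each dataset in its own panel of a 2x2 grid."""
    # Imported here so the data above stays usable without matplotlib.
    import matplotlib.pyplot as plt

    # Shared axes put all four panels on the same scale for easy comparison.
    fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
    for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
        ax.scatter(x, y)
        ax.set_title(name)
        ax.set_xlabel("x")
        ax.set_ylabel("y")
    fig.tight_layout()
    return fig
```

Call `fig = plot_quartet(QUARTET)` and then display or save the result, for example with `fig.savefig("anscombe.png")`.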

Hmm, that didn't come out how I thought it would.

Takeaway: visualization is often the best (and sometimes the only) way to understand a dataset.

When should you visualize?

What makes a good visualization?

This is like asking what makes a good painting - it requires a sense of aesthetics.

Game plan:

for p in principles:
    for e in examples:
        pass  # discuss how p pertains to e

Principles of Good Visualization

Some principles to live by, based on the work of visualization pioneer Edward Tufte:

1. Maximize data-ink ratio

The data-ink ratio is the amount of "ink" used to represent data divided by the total amount of "ink" in the graphic:

$$ \frac{\textrm{ink used to represent data}}{\textrm{total ink in the graphic}}$$

2. Minimize lie factor

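The lie factor is the ratio of the size of the effect shown in your graphic to the size of the effect in the data. A faithful graphic has a lie factor of 1, so "minimizing" it really means keeping it as close to 1 as possible: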

$$ \frac{\textrm{size of effect in the graphic}}{\textrm{size of effect in the data}}$$
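To make the formula concrete, here is a small worked sketch. It assumes the "size of effect" is measured as the relative change between two values, and it uses a made-up bar chart (values 50 and 55 drawn on an axis that starts at 45). The function names are illustrative.

```python
def relative_change(first, second):
    """Size of an effect, measured as the relative change from first to second."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect size as drawn in the graphic divided by effect size in the data."""
    return (relative_change(drawn_first, drawn_second)
            / relative_change(data_first, data_second))


# Hypothetical bar chart: the y-axis is truncated to start at 45,
# so each bar's drawn height is its value minus the baseline.
baseline = 45
values = (50, 55)
heights = tuple(v - baseline for v in values)

print(lie_factor(*values, *heights))
```

Here the data grow by 10% but the drawn bars double in height, giving a lie factor of 10. Starting the axis at zero would make the drawn heights equal the values and bring the lie factor back to 1.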

3. Minimize chartjunk

Chartjunk is loosely defined as extraneous visual elements that do not further the purpose of the graphic.

4. Use scales and labeling well

5. Use color and shading well

6. Use repetition well

Examples

With apologies for shoddy scan quality on some of these, see vis_examples.