Lecture 7 - Introduction to Exploratory Data Analysis

Announcements:

Goals:

Exploratory Analysis

The Dataset

Demo Dataset

There's some data at this URL:

So you have a dataset. What now?

What do you want to know about a dataset before you even look at it?

Brainstorm:

Some ideas:

Now that we've answered our basic questions, let's load up the data. To make the analysis easier, I'm also going to build a dictionary that maps the original column names to friendlier ones.
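Here is a minimal sketch of the load-and-map step. The real dataset's URL and column names are not shown in these notes, so the inline CSV and the names `ID`, `AGE_YR`, `HT_CM`, `WT_KG`, and `friendly_names` are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the lecture's data file; the real columns may differ.
RAW_CSV = """ID,AGE_YR,HT_CM,WT_KG
1,34,170.2,68.5
2,8,128.0,27.1
3,61,158.7,74.3
"""

# Maps the original (hypothetical) column names to friendlier ones.
friendly_names = {
    "AGE_YR": "age",
    "HT_CM": "height_cm",
    "WT_KG": "weight_kg",
}

# pd.read_csv accepts a URL, a file path, or a file-like object.
# StringIO stands in for the URL here.
df = pd.read_csv(io.StringIO(RAW_CSV))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # the original column names
```

With the real data you would pass the URL string to `pd.read_csv` instead of the `StringIO` object.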

Let's rename those columns:
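One way to do the rename, sketched on a tiny made-up table (the column names and the `friendly_names` dictionary are hypothetical, not the ones from the lecture dataset):

```python
import pandas as pd

# Made-up data with cryptic column names.
df = pd.DataFrame({"AGE_YR": [34, 8, 61], "HT_CM": [170.2, 128.0, 158.7]})

# Hypothetical mapping from original names to friendlier ones.
friendly_names = {"AGE_YR": "age", "HT_CM": "height_cm"}

# rename returns a new DataFrame; columns not in the mapping are left alone.
df = df.rename(columns=friendly_names)

print(df.columns.tolist())
```

Reassigning to `df` keeps the rest of the analysis on the friendly names; the original names are still available as the dictionary's keys.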

So you loaded a dataset. What now?

Brainstorm:

Some ideas:

Look at some rows

Which ones?
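A few reasonable answers to "which ones?" are the first rows, the last rows, and a random sample. A quick sketch on made-up data (the column names are assumptions):

```python
import pandas as pd

# Made-up data: ten rows with hypothetical column names.
df = pd.DataFrame({
    "age": range(20, 30),
    "height_cm": [150 + i for i in range(10)],
})

first_rows = df.head(3)                       # first three rows
last_rows = df.tail(3)                        # last three rows
random_rows = df.sample(n=3, random_state=0)  # three random rows, reproducible

print(first_rows)
print(last_rows)
print(random_rows)
```

The random sample is often the most informative of the three, since files are frequently sorted and the first and last rows may not be typical.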

Compute summary statistics (of numerical columns)
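In pandas, `describe` is the usual starting point for this. A small sketch on made-up data (the columns here are hypothetical):

```python
import pandas as pd

# Made-up data with two numerical columns and one text column.
df = pd.DataFrame({
    "age": [34, 8, 61, 45],
    "height_cm": [170.0, 128.0, 158.0, 180.0],
    "name": ["a", "b", "c", "d"],
})

# On a mixed DataFrame, describe() summarizes only the numerical columns by default:
# count, mean, std, min, quartiles, and max.
summary = df.describe()

print(summary)
```

The result is itself a DataFrame, so individual statistics can be pulled out with `summary.loc["mean", "age"]` and similar lookups.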

Look at distribution of each column

Histograms! I love histograms!
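A sketch of one histogram per column using `DataFrame.hist`. The data is randomly generated and the column names are made up; the bin count of 20 is an arbitrary choice.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs as a script; drop this line in a notebook

import numpy as np
import pandas as pd

# Randomly generated stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(1, 80, size=200),
    "height_cm": rng.normal(160, 15, size=200),
})

# Draws one histogram per numerical column and returns an array of Axes.
axes = df.hist(bins=20, figsize=(8, 3))

titles = {ax.get_title() for ax in axes.ravel()}
print(titles)
```

It is worth trying a few different values of `bins`, since the apparent shape of a distribution can change quite a bit with the bin width.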

Look at pairwise scatter plots of all pairs of (numerical) columns

TIL: pandas has a function for this: pandas.plotting.scatter_matrix!
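The function in question is `pandas.plotting.scatter_matrix`. Here is a sketch on randomly generated data; the column names and the relationship between height and weight are invented for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs as a script; drop this line in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Randomly generated stand-in data with hypothetical column names.
rng = np.random.default_rng(0)
height = rng.normal(160, 15, size=200)
df = pd.DataFrame({
    "age": rng.integers(1, 80, size=200),
    "height_cm": height,
    "weight_kg": 0.5 * height - 15 + rng.normal(0, 8, size=200),
})

# One scatter plot per pair of numerical columns, with histograms on the diagonal.
axes = scatter_matrix(df, figsize=(6, 6), diagonal="hist")

print(axes.shape)  # one row and one column of plots per numerical column
```

With many columns the grid gets crowded quickly, so it can help to pass only a subset of columns, for example `scatter_matrix(df[["height_cm", "weight_kg"]])`.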

What if we consider only adults (21+)?
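Restricting to adults is a boolean filter on the age column. A minimal sketch, assuming a column named `age` measured in years (the data and names are made up):

```python
import pandas as pd

# Made-up data with hypothetical column names.
df = pd.DataFrame({
    "age": [34, 8, 61, 21, 15],
    "height_cm": [170.2, 128.0, 158.7, 165.0, 160.1],
})

# Keep only rows where age is 21 or older.
adults = df[df["age"] >= 21]

print(len(df), len(adults))
print(adults.describe())
```

Any of the earlier steps (summary statistics, histograms, scatter plots) can then be rerun on `adults` to see how the picture changes without children in the data.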

To-do list - did we do all of this?