data_url = "https://fw.cs.wwu.edu/~wehrwes/courses/data311_21f/data/yellow_tripdata_2018-06_small.csv"


import pandas as pd

df = pd.read_csv(data_url)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 435692 entries, 0 to 435691
Data columns (total 7 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Unnamed: 0        435692 non-null  int64  
 1   time_elapsed_min  435692 non-null  int64  
 2   passenger_count   435692 non-null  int64  
 3   trip_distance     435692 non-null  float64
 4   payment_type      435692 non-null  int64  
 5   fare_amount       435692 non-null  float64
 6   tip_amount        435692 non-null  float64
dtypes: float64(3), int64(4)
memory usage: 23.3 MB


df.describe()


df["payment_type"].value_counts()

1    302555
2    130039
3      2448
4       650
Name: payment_type, dtype: int64
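The counts above are keyed by numeric code, which makes them hard to read at a glance. The sketch below attaches readable labels before counting. The code-to-label mapping is an assumption taken from the NYC TLC yellow taxi data dictionary (1 = credit card, 2 = cash, 3 = no charge, 4 = dispute) and is not stated in this notebook. `PAYMENT_LABELS`, `labeled_payment_counts`, and the `toy` frame are hypothetical names introduced for illustration.

```python
import pandas as pd

# Assumed mapping (TLC data dictionary); the notebook itself does not define these codes.
PAYMENT_LABELS = {1: "credit card", 2: "cash", 3: "no charge", 4: "dispute"}

def labeled_payment_counts(frame: pd.DataFrame) -> pd.Series:
    """Count trips per payment type, using readable labels instead of codes."""
    # Codes missing from the mapping become NaN, then "unknown".
    labels = frame["payment_type"].map(PAYMENT_LABELS).fillna("unknown")
    return labels.value_counts()

# Tiny made-up frame for illustration only; it is not the taxi data.
toy = pd.DataFrame({"payment_type": [1, 1, 2, 3, 1, 2]})
counts = labeled_payment_counts(toy)
print(counts)
```

On the real data, `labeled_payment_counts(df)` should give the same four counts as above, just with labels in the index.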


pd.plotting.scatter_matrix(
    df[["time_elapsed_min", "passenger_count", "trip_distance", "fare_amount", "tip_amount"]],
    figsize=(14, 14),
);


df.plot.scatter("fare_amount", "tip_amount", figsize=(14,14))


df[df["payment_type"] == 2].describe()



         Unnamed: 0  time_elapsed_min  passenger_count  trip_distance   payment_type    fare_amount     tip_amount
count  4.356920e+05     435692.000000    435692.000000  435692.000000  435692.000000  435692.000000  435692.000000
mean   4.357742e+06         17.155027         1.599605       3.012505       1.314178      13.267334       1.906684
std    2.513988e+06         64.953021         1.245546       3.843959       0.485448      11.798141       2.605647
min    2.300000e+01          0.000000         0.000000       0.000000       1.000000    -126.000000      -1.820000
25%    2.181294e+06          6.000000         1.000000       1.000000       1.000000       6.500000       0.000000
50%    4.362138e+06         11.000000         1.000000       1.660000       1.000000       9.500000       1.450000
75%    6.530340e+06         18.000000         2.000000       3.100000       2.000000      15.000000       2.460000
max    8.713817e+06       1439.000000         9.000000      69.460000       4.000000     480.000000     175.000000

         Unnamed: 0  time_elapsed_min  passenger_count  trip_distance  payment_type    fare_amount     tip_amount
count  1.300390e+05     130039.000000    130039.000000  130039.000000      130039.0  130039.000000  130039.000000
mean   4.378039e+06         16.753089         1.635271       2.664664           2.0      12.142102       0.000018
std    2.530190e+06         71.676311         1.265760       3.581286           0.0      10.745946       0.006655
min    4.100000e+01          0.000000         0.000000       0.000000           2.0     -52.000000       0.000000
25%    2.227600e+06          5.000000         1.000000       0.880000           2.0       6.000000       0.000000
50%    4.438654e+06         10.000000         1.000000       1.460000           2.0       8.500000       0.000000
75%    6.584410e+06         17.000000         2.000000       2.730000           2.0      14.000000       0.000000
max    8.713804e+06       1439.000000          7.000000      66.300000           2.0     300.000000       2.400000
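The summary for `payment_type == 2` shows tips that are almost entirely zero, and both summaries contain negative fares. One possible next step, sketched here, is to restrict a tip analysis to trips where tips appear to be recorded and the fare is positive. It assumes that code 1 means credit card and that cash tips are not captured in `tip_amount` (per the TLC data dictionary, not this notebook). `card_trips_with_tip_rate`, `tip_rate`, and the `toy` frame are hypothetical names for illustration.

```python
import pandas as pd

def card_trips_with_tip_rate(frame: pd.DataFrame) -> pd.DataFrame:
    """Keep payment_type == 1 trips with a positive fare and add tip / fare."""
    # Assumption: payment_type 1 is credit card, the only type with recorded tips.
    mask = (frame["payment_type"] == 1) & (frame["fare_amount"] > 0)
    out = frame.loc[mask].copy()
    out["tip_rate"] = out["tip_amount"] / out["fare_amount"]
    return out

# Tiny made-up frame for illustration only; it is not the taxi data.
toy = pd.DataFrame({
    "payment_type": [1, 2, 1, 1],
    "fare_amount": [10.0, 8.0, -5.0, 20.0],
    "tip_amount": [2.0, 0.0, 0.0, 5.0],
})
card = card_trips_with_tip_rate(toy)
print(card[["fare_amount", "tip_amount", "tip_rate"]])
```

Filtering on a positive fare also avoids dividing by zero or by a negative amount. Whether negative fares should be dropped or investigated as refunds or data-entry errors is a separate question that this sketch does not settle.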

Lecture 9 - Exploratory Analysis "Cold Open"

Announcements:

Goals:

...go!