A couple of logistics reminders:
The CS department is interviewing faculty candidates. Each candidate gives 2 public talks - one research talk and one teaching demo. Students are welcome and encouraged to attend these! To incentivize attendance, I offer a bit of extra credit. See the "Talk Attendance Extra Credit" assignment on Canvas for details, but the deal is this:
If you attend a talk, you can grab an index card from me at the start. During the talk, fill out the card with four things:
Hand the card to me at the end of the talk for 1 point of extra credit in the Quizzes category.
I will remind you of scheduled talks; the first two are this coming Thursday and Friday at 4pm:
Some of these folks will be your future professors!
Be able to navigate and work in Jupyter with
Understand the fundamental data structures and concepts of the pandas
library, and how they relate to each other:
Know enough about pandas to be able to do, or look up how to do, the following basic data manipulation tasks:
What is Jupyter? What is Colab?
http://colab.research.google.com
Notebook features:
a = 2+2
b = 8
a+b
a
4
a
4
https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_21f/lab4/diagonal_example.png
But seriously: I won't teach you every little thing you need to use. I will expect you to be able to find and use functionality that gets the job done. I also won't quiz/test you on syntactic minutiae.
For demo purposes, we'll use a dataset downloaded from FiveThirtyEight, which compiled it for a 2015 article entitled Joining The Avengers Is As Deadly As Jumping Off A Four-Story Building. It catalogs information about all of the characters from the Marvel comic books that were ever members of the Avengers. You can find some meta-information about the dataset including a description of what each column means in the accompanying readme file (it's in Markdown format; one easy way to display it nicely would be to paste its contents into a Markdown cell in a notebook).
data_url = 'https://fw.cs.wwu.edu/~wehrwes/courses/data311_21f/data/avengers/avengers.csv'
import pandas as pd
df = pd.read_csv(data_url, encoding='latin-1')
df
| | URL | Name/Alias | Appearances | Current? | Gender | Probationary Introl | Full/Reserve Avengers Intro | Year | Years since joining | Honorary | ... | Return1 | Death2 | Return2 | Death3 | Return3 | Death4 | Return4 | Death5 | Return5 | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | NO | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Merged with Ultron in Rage of Ultron Vol. 1. A... |
1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Secret Invasion V1:I8. Actually was se... |
2 | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Death: "Later while under the influence of Imm... |
3 | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Ghosts of the Future arc. However "he ... |
4 | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | YES | NO | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Fear Itself brought back because that'... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
168 | http://marvel.wikia.com/Eric_Brooks_(Earth-616)# | Eric Brooks | 198 | YES | MALE | NaN | 13-Nov | 2013 | 2 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
169 | http://marvel.wikia.com/Adam_Brashear_(Earth-6... | Adam Brashear | 29 | YES | MALE | NaN | 14-Jan | 2014 | 1 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
170 | http://marvel.wikia.com/Victor_Alvarez_(Earth-... | Victor Alvarez | 45 | YES | MALE | NaN | 14-Jan | 2014 | 1 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
171 | http://marvel.wikia.com/Ava_Ayala_(Earth-616)# | Ava Ayala | 49 | YES | FEMALE | NaN | 14-Jan | 2014 | 1 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
172 | http://marvel.wikia.com/Kaluu_(Earth-616)# | Kaluu | 35 | YES | MALE | NaN | 15-Jan | 2015 | 0 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
173 rows × 21 columns
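The `encoding='latin-1'` argument tells pandas how to turn the file's bytes into text. As a minimal offline sketch of why that matters, here is a made-up two-row CSV held in memory (the names and counts are illustrative and are not taken from the Avengers file):

```python
import io
import pandas as pd

# Hypothetical CSV bytes containing a non-ASCII character (é), encoded as Latin-1.
raw = "name,count\nZoé,3\nBob,5\n".encode("latin-1")

# Reading with the matching encoding recovers the text correctly.
toy = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
print(toy)

# Reading the same bytes as UTF-8 fails: the Latin-1 byte for é is not valid UTF-8 here.
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
    utf8_ok = True
except ValueError:  # UnicodeDecodeError is a subclass of ValueError
    utf8_ok = False
print("UTF-8 worked:", utf8_ok)
```

If a `read_csv` call fails with a decoding error, trying a different `encoding` is a reasonable first thing to look up.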
from pandas import Series, DataFrame
import pandas as pd
Series - a 1D list-like thing (think of it as a column with labels)
s = Series([9,6,8,4])
s
s.values
s.index
s[2]
8
We can customize the labels:
s2 = Series([9,6,8,4],index=['win','spr','sum','fal'])
s2
s2.values
s2.index
s2.iloc[2]  # positional access; plain s2[2] is deprecated when the labels are not integers
s2['sum']
8
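Once the labels are customized, there are two ways to refer to an entry: by label or by position. A short sketch using the same `s2` as above; `.loc` and `.iloc` make the intent explicit:

```python
import pandas as pd

s2 = pd.Series([9, 6, 8, 4], index=["win", "spr", "sum", "fal"])

by_label = s2.loc["sum"]      # look up by index label
by_position = s2.iloc[2]      # look up by integer position (0-based)
print(by_label, by_position)  # 8 8

# Label slices include the end label; position slices exclude the end position.
label_slice = s2.loc["spr":"sum"]   # entries spr and sum
position_slice = s2.iloc[1:2]       # entry spr only
print(label_slice)
print(position_slice)
```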
We can create a Series from a dictionary:
d = {}
d['win'] = 9
d['spr'] = 6
d['sum'] = 8
d['fal'] = 4
s3 = Series(d)
s3
win    9
spr    6
sum    8
fal    4
dtype: int64
Many things that work on dictionaries and lists work on Series:
'fal' in s2
'jan' in s2
False
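A few more dictionary-like and list-like operations, sketched on the same `s2`. One thing worth noticing: `in` tests the labels (like dictionary keys), not the values.

```python
import pandas as pd

s2 = pd.Series([9, 6, 8, 4], index=["win", "spr", "sum", "fal"])

print(len(s2))            # 4: number of entries, like len() of a list
print(list(s2.keys()))    # the index labels, like dict.keys()
print(9 in s2)            # False: 9 is a value, and `in` checks the labels
print(9 in s2.values)     # True: check the underlying values explicitly

# Iterate over (label, value) pairs, like dict.items().
for label, value in s2.items():
    print(label, value)
```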
DataFrames represent 2D tables; each column is a Series.
Create a DataFrame from scratch:
data = {'city': ['Seattle','Spokane','Tacoma','Vancouver'],
        'pop': [787,230,222,189], # units are thousands
        'tax': [10.25,9.0,10.3,8.5]}
df = DataFrame(data)
df
| | city | pop | tax |
---|---|---|---|
0 | Seattle | 787 | 10.25 |
1 | Spokane | 230 | 9.00 |
2 | Tacoma | 222 | 10.30 |
3 | Vancouver | 189 | 8.50 |
df['city']
df.city
# df[0]  # raises KeyError: 0 is a row label, not a column name
0      Seattle
1      Spokane
2       Tacoma
3    Vancouver
Name: city, dtype: object
Elementwise arithmetic works on Series:
df['tax'] / 100
0    0.1025
1    0.0900
2    0.1030
3    0.0850
Name: tax, dtype: float64
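Elementwise behavior covers comparisons as well as arithmetic, and when both operands are Series, pandas pairs values by label rather than by position. A small sketch; the Series `a` and `b` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Seattle", "Spokane", "Tacoma", "Vancouver"],
                   "pop": [787, 230, 222, 189],   # units are thousands
                   "tax": [10.25, 9.0, 10.3, 8.5]})

rate = df["tax"] / 100          # Series with a scalar: applied to every element
above_nine = df["tax"] > 9.0    # comparisons are elementwise too, giving booleans
people = df["pop"] * 1000       # thousands converted to a head count

# Series with Series: values are matched up by index label, not by position.
a = pd.Series([1, 2], index=["x", "y"])
b = pd.Series([10, 20], index=["y", "x"])
total = a + b                   # x: 1 + 20, y: 2 + 10
print(total)
```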
Add a column to an existing DataFrame:
df['visits'] = [20,2,5,4]
df
| | city | pop | tax | visits |
---|---|---|---|---|
0 | Seattle | 787 | 10.25 | 20 |
1 | Spokane | 230 | 9.00 | 2 |
2 | Tacoma | 222 | 10.30 | 5 |
3 | Vancouver | 189 | 8.50 | 4 |
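A new column does not have to come from a hand-typed list; it can be computed from existing columns. A sketch on the same city table, where the new column names (`tax_rate`, `big`, `pop_people`) and the 200-thousand cutoff are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Seattle", "Spokane", "Tacoma", "Vancouver"],
                   "pop": [787, 230, 222, 189],   # units are thousands
                   "tax": [10.25, 9.0, 10.3, 8.5]})

df["tax_rate"] = df["tax"] / 100    # derived from an existing column
df["big"] = df["pop"] > 200         # boolean column from a comparison

# assign() returns a new DataFrame instead of modifying df in place.
df2 = df.assign(pop_people=df["pop"] * 1000)

print(df.columns.tolist())
print(df2.columns.tolist())
```

When assigning a plain list, as with `visits`, its length has to match the number of rows.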
avengers = pd.read_csv(data_url, encoding='latin-1')
avengers.head(2)
| | URL | Name/Alias | Appearances | Current? | Gender | Probationary Introl | Full/Reserve Avengers Intro | Year | Years since joining | Honorary | ... | Return1 | Death2 | Return2 | Death3 | Return3 | Death4 | Return4 | Death5 | Return5 | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | http://marvel.wikia.com/Henry_Pym_(Earth-616) | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | NO | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Merged with Ultron in Rage of Ultron Vol. 1. A... |
1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Secret Invasion V1:I8. Actually was se... |
2 rows × 21 columns
avengers.tail(3)
| | URL | Name/Alias | Appearances | Current? | Gender | Probationary Introl | Full/Reserve Avengers Intro | Year | Years since joining | Honorary | ... | Return1 | Death2 | Return2 | Death3 | Return3 | Death4 | Return4 | Death5 | Return5 | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
170 | http://marvel.wikia.com/Victor_Alvarez_(Earth-... | Victor Alvarez | 45 | YES | MALE | NaN | 14-Jan | 2014 | 1 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
171 | http://marvel.wikia.com/Ava_Ayala_(Earth-616)# | Ava Ayala | 49 | YES | FEMALE | NaN | 14-Jan | 2014 | 1 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
172 | http://marvel.wikia.com/Kaluu_(Earth-616)# | Kaluu | 35 | YES | MALE | NaN | 15-Jan | 2015 | 0 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 rows × 21 columns
avengers.drop(columns=["URL"]).head()
| | Name/Alias | Appearances | Current? | Gender | Probationary Introl | Full/Reserve Avengers Intro | Year | Years since joining | Honorary | Death1 | Return1 | Death2 | Return2 | Death3 | Return3 | Death4 | Return4 | Death5 | Return5 | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Henry Jonathan "Hank" Pym | 1269 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | YES | NO | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Merged with Ultron in Rage of Ultron Vol. 1. A... |
1 | Janet van Dyne | 1165 | YES | FEMALE | NaN | Sep-63 | 1963 | 52 | Full | YES | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Secret Invasion V1:I8. Actually was se... |
2 | Anthony Edward "Tony" Stark | 3068 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | YES | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Death: "Later while under the influence of Imm... |
3 | Robert Bruce Banner | 2089 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | YES | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Ghosts of the Future arc. However "he ... |
4 | Thor Odinson | 2402 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | YES | YES | YES | NO | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Fear Itself brought back because that'... |
avengers.shape
(173, 21)
avengers["Name/Alias"]
0        Henry Jonathan "Hank" Pym
1                   Janet van Dyne
2      Anthony Edward "Tony" Stark
3              Robert Bruce Banner
4                     Thor Odinson
                  ...
168                    Eric Brooks
169                  Adam Brashear
170                 Victor Alvarez
171                      Ava Ayala
172                          Kaluu
Name: Name/Alias, Length: 173, dtype: object
na = avengers[["Name/Alias", "Appearances"]]
avengers[["Name/Alias"]]
| | Name/Alias |
---|---|
0 | Henry Jonathan "Hank" Pym |
1 | Janet van Dyne |
2 | Anthony Edward "Tony" Stark |
3 | Robert Bruce Banner |
4 | Thor Odinson |
... | ... |
168 | Eric Brooks |
169 | Adam Brashear |
170 | Victor Alvarez |
171 | Ava Ayala |
172 | Kaluu |
173 rows × 1 columns
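The difference between `avengers["Name/Alias"]` and `avengers[["Name/Alias"]]` is the type that comes back. A sketch on a small made-up table (the `toy` DataFrame and its contents are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({"name": ["Ann", "Bo", "Cy"], "appearances": [30, 10, 20]})

col = toy["name"]                        # one label -> Series
one_col = toy[["name"]]                  # list with one label -> one-column DataFrame
two_cols = toy[["name", "appearances"]]  # list of labels -> DataFrame with those columns

print(type(col).__name__, col.shape)             # Series (3,)
print(type(one_col).__name__, one_col.shape)     # DataFrame (3, 1)
print(type(two_cols).__name__, two_cols.shape)   # DataFrame (3, 2)
```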
na = na.sort_values("Appearances", ascending=False)
na[10:20]
| | Name/Alias | Appearances |
---|---|---|
140 | Ororo Munroe | 1598 |
49 | Namor McKenzie | 1561 |
7 | Clinton Francis Barton | 1456 |
141 | Matt Murdock | 1375 |
104 | Doctor Stephen Vincent Strange | 1324 |
0 | Henry Jonathan "Hank" Pym | 1269 |
9 | Wanda Maximoff | 1214 |
1 | Janet van Dyne | 1165 |
15 | Natalia Alianovna Romanova | 1112 |
13 | Victor Shade (alias) | 1036 |
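Notice that the row labels in the sorted output (140, 49, 7, ...) are the original ones: sorting reorders rows but keeps their labels, and a slice like `na[10:20]` counts positions in the new order. A sketch on a made-up four-row table:

```python
import pandas as pd

toy = pd.DataFrame({"name": ["Ann", "Bo", "Cy", "Di"],
                    "appearances": [30, 10, 40, 20]})

ranked = toy.sort_values("appearances", ascending=False)
print(ranked.index.tolist())     # [2, 0, 3, 1]: original labels, reordered

top2 = ranked[0:2]               # slice by position: first two rows after sorting
top2_iloc = ranked.iloc[0:2]     # same rows, with the positional intent spelled out
row_label_0 = ranked.loc[0]      # by label: the row that was first before sorting (Ann)

# Relabel the rows 0..n-1 in sorted order, discarding the old labels.
renumbered = ranked.reset_index(drop=True)
print(renumbered)
```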
na.plot(y="Appearances", use_index=False)
[Figure: line plot of Appearances, with characters ordered from most to fewest appearances]
avengers["Gender"]
avengers.value_counts("Gender")
Gender
MALE      115
FEMALE     58
dtype: int64
avengers.plot.scatter("Years since joining", "Appearances")
[Figure: scatter plot of Appearances (y-axis) against Years since joining (x-axis)]
avengers.groupby("Gender")["Appearances"].mean()
Gender
FEMALE    263.327586
MALE      490.069565
Name: Appearances, dtype: float64
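The pattern here is: split the rows into groups by one column, pick a column, then summarize it per group. A sketch on a made-up table (the `team` and `score` columns are illustrative), including `agg` for computing several summaries at once:

```python
import pandas as pd

toy = pd.DataFrame({"team": ["A", "B", "A", "B", "A"],
                    "score": [10, 20, 30, 40, 50]})

counts = toy["team"].value_counts()           # rows per team: A has 3, B has 2
means = toy.groupby("team")["score"].mean()   # A: (10+30+50)/3 = 30, B: (20+40)/2 = 30

# Several summaries per group in one call; the result is a DataFrame.
summary = toy.groupby("team")["score"].agg(["count", "mean", "max"])
print(summary)
```

Looking at counts alongside means is a useful habit, since a group mean computed from very few rows says less.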
avengers[avengers["Gender"] == "FEMALE"].head(3)
| | URL | Name/Alias | Appearances | Current? | Gender | Probationary Introl | Full/Reserve Avengers Intro | Year | Years since joining | Honorary | ... | Return1 | Death2 | Return2 | Death3 | Return3 | Death4 | Return4 | Death5 | Return5 | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | http://marvel.wikia.com/Janet_van_Dyne_(Earth-... | Janet van Dyne | 1165 | YES | FEMALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Secret Invasion V1:I8. Actually was se... |
9 | http://marvel.wikia.com/Wanda_Maximoff_(Earth-... | Wanda Maximoff | 1214 | YES | FEMALE | NaN | May-65 | 1965 | 50 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Uncanny_Avengers_Vol_1_14. Later comes... |
15 | http://marvel.wikia.com/Natalia_Romanova_(Eart... | Natalia Alianovna Romanova | 1112 | YES | FEMALE | NaN | May-73 | 1973 | 42 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Killed by The Hand. Later revived with The Sto... |
3 rows × 21 columns
avengers[avengers["Appearances"] > 2000]
| | URL | Name/Alias | Appearances | Current? | Gender | Probationary Introl | Full/Reserve Avengers Intro | Year | Years since joining | Honorary | ... | Return1 | Death2 | Return2 | Death3 | Return3 | Death4 | Return4 | Death5 | Return5 | Notes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | http://marvel.wikia.com/Anthony_Stark_(Earth-616) | Anthony Edward "Tony" Stark | 3068 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Death: "Later while under the influence of Imm... |
3 | http://marvel.wikia.com/Robert_Bruce_Banner_(E... | Robert Bruce Banner | 2089 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Ghosts of the Future arc. However "he ... |
4 | http://marvel.wikia.com/Thor_Odinson_(Earth-616) | Thor Odinson | 2402 | YES | MALE | NaN | Sep-63 | 1963 | 52 | Full | ... | YES | YES | NO | NaN | NaN | NaN | NaN | NaN | NaN | Dies in Fear Itself brought back because that'... |
6 | http://marvel.wikia.com/Steven_Rogers_(Earth-616) | Steven Rogers | 3458 | YES | MALE | NaN | Mar-64 | 1964 | 51 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dies at the end of Civil War. Later comes back. |
40 | http://marvel.wikia.com/Benjamin_Grimm_(Earth-... | Benjamin Jacob Grimm | 2305 | NO | MALE | NaN | Jun-86 | 1986 | 29 | Full | ... | YES | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Once killed during a battle with Doctor Doom.'... |
57 | http://marvel.wikia.com/Reed_Richards_(Earth-6... | Reed Richards | 2125 | YES | MALE | NaN | Feb-89 | 1989 | 26 | Full | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
73 | http://marvel.wikia.com/Peter_Parker_(Earth-616)# | Peter Benjamin Parker | 4333 | YES | MALE | NaN | Apr-90 | 1990 | 25 | Full | ... | YES | YES | YES | NaN | NaN | NaN | NaN | NaN | NaN | Since joining the New Avengers: First death Ki... |
92 | http://marvel.wikia.com/James_Howlett_(Earth-6... | James "Logan" Howlett | 3130 | YES | MALE | NaN | 5-Jun | 2005 | 10 | Full | ... | NO | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Died in Death_of_Wolverine_Vol_1_4. Has not ye... |
8 rows × 21 columns
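Both filters above work the same way: the comparison produces a boolean Series with one entry per row, and indexing with it keeps the rows marked True. A sketch on a made-up table, including how to combine conditions:

```python
import pandas as pd

toy = pd.DataFrame({"name": ["Ann", "Bo", "Cy", "Di"],
                    "gender": ["FEMALE", "MALE", "MALE", "FEMALE"],
                    "appearances": [2500, 1800, 3000, 900]})

mask = toy["appearances"] > 2000    # boolean Series, one entry per row
print(mask.tolist())                # [True, False, True, False]

# Combine conditions with & (and), | (or), ~ (not).
# Each comparison needs its own parentheses because & and | bind tighter than == and >.
both = toy[(toy["gender"] == "FEMALE") & (toy["appearances"] > 2000)]
either = toy[(toy["gender"] == "FEMALE") | (toy["appearances"] > 2000)]
not_female = toy[~(toy["gender"] == "FEMALE")]
```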
# column info
avengers.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 173 entries, 0 to 172
Data columns (total 21 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   URL                          173 non-null    object
 1   Name/Alias                   163 non-null    object
 2   Appearances                  173 non-null    int64
 3   Current?                     173 non-null    object
 4   Gender                       173 non-null    object
 5   Probationary Introl          15 non-null     object
 6   Full/Reserve Avengers Intro  159 non-null    object
 7   Year                         173 non-null    int64
 8   Years since joining          173 non-null    int64
 9   Honorary                     173 non-null    object
 10  Death1                       173 non-null    object
 11  Return1                      69 non-null     object
 12  Death2                       17 non-null     object
 13  Return2                      16 non-null     object
 14  Death3                       2 non-null      object
 15  Return3                      2 non-null      object
 16  Death4                       1 non-null      object
 17  Return4                      1 non-null      object
 18  Death5                       1 non-null      object
 19  Return5                      1 non-null      object
 20  Notes                        75 non-null     object
dtypes: int64(3), object(18)
memory usage: 28.5+ KB
avengers.describe()
| | Appearances | Year | Years since joining |
---|---|---|---|
count | 173.000000 | 173.000000 | 173.000000 |
mean | 414.052023 | 1988.445087 | 26.554913 |
std | 677.991950 | 30.374669 | 30.374669 |
min | 2.000000 | 1900.000000 | 0.000000 |
25% | 58.000000 | 1979.000000 | 5.000000 |
50% | 132.000000 | 1996.000000 | 19.000000 |
75% | 491.000000 | 2010.000000 | 36.000000 |
max | 4333.000000 | 2015.000000 | 115.000000 |
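In the table above, Year and Years since joining have exactly the same std. That is what you would expect if one column is a constant minus the other (the sample rows shown earlier are consistent with 2015 minus Year, though that is an inference from the data rather than something stated here). A sketch on made-up years showing the effect, and how `describe()` entries can be looked up individually:

```python
import pandas as pd

toy = pd.DataFrame({"year": [2000, 2005, 2010, 2015]})
toy["years_since"] = 2015 - toy["year"]   # assumed relationship, for illustration

stats = toy.describe()   # only numeric columns are summarized by default
print(stats)

# Rows of the describe() table are labeled by statistic, columns by column name.
std_year = stats.loc["std", "year"]
std_since = stats.loc["std", "years_since"]
median_year = stats.loc["50%", "year"]    # the 50% row is the median
print(std_year, std_since, median_year)
```

Negating and shifting a column changes its mean, min, and max but leaves its spread unchanged, so the two std values agree.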
Things to demo:
End of L02; done at the beginning of L03: