
## Part 1: New York City 311 Data

A little setup: import some libraries, tweak the plot settings, and define the base URL that we'll pull our data from:

In [None]:
import pandas as pd

# Make the graphs a bit prettier, and bigger
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (15, 5)

data_base_url = "https://fw.cs.wwu.edu/~wehrwes/courses/data311_23w/lab1/data/"

Read in the dataset of 311 requests (311 is the citizen hotline; no relation to our course number):

In [None]:
complaints = pd.read_csv(f"{data_base_url}/311-service-requests.csv", dtype="unicode")

### Basic Selection

1.1 Display the first 8 rows of the dataframe.

1.2 Extract and display just the "Complaint Type" column.

1.3 Combine the above two to get the first 5 rows of just the complaint type column. Does it matter which order you select them in?

1.4 Extract a DataFrame containing only the "Complaint Type" and "Borough" columns.
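As a rough illustration of the selection patterns above, here is a minimal sketch on a made-up toy frame (the rows and the name `toy` are invented for illustration and are not part of the 311 data). It shows row selection with `head`, single-column selection, both orders of combining them, and selecting two columns with a list of names.

```python
import pandas as pd

# Made-up rows standing in for the real 311 data (illustration only)
toy = pd.DataFrame({
    "Complaint Type": ["Noise", "Heating", "Noise", "Rodent", "Heating", "Noise"],
    "Borough": ["MANHATTAN", "BRONX", "BROOKLYN", "QUEENS", "BRONX", "MANHATTAN"],
})

# Column first, then rows: a Series, truncated to its first 3 entries
col_then_rows = toy["Complaint Type"].head(3)

# Rows first, then column: a 3-row DataFrame, reduced to one column
rows_then_col = toy.head(3)["Complaint Type"]

# One way to check whether the two orders agree
same_result = col_then_rows.equals(rows_then_col)

# A list of column names (note the double brackets) gives a DataFrame
two_columns = toy[["Complaint Type", "Borough"]]
print(two_columns.head(3))
```

Indexing with a single string gives a Series, while indexing with a list of strings gives a DataFrame, even if the list has only one name in it.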

1.5 Display a tally of how many times each complaint type appears in the dataframe.

1.6 Make a bar plot showing the counts of the top 10 complaint types.
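One possible way to tally values is `value_counts`, sketched here on a made-up Series (the data and the names `types`, `counts`, and `top2` are invented for illustration). The result is sorted from most to least frequent, so `head` picks out the top entries.

```python
import pandas as pd

# Made-up complaint types (illustration only)
types = pd.Series(
    ["Noise", "Heating", "Noise", "Rodent", "Noise", "Heating"],
    name="Complaint Type",
)

# Count occurrences of each distinct value, most frequent first
counts = types.value_counts()

# Keep only the 2 most frequent values
top2 = counts.head(2)
print(top2)

# In a notebook, top2.plot(kind="bar") would draw these counts as bars
```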

### Which borough has the most noise complaints?

1.7 Create a new Series that stores True if the complaint type is equal to "Noise - Street/Sidewalk", and False otherwise. Assign it to a variable called `is_noise`.

1.8 Create a new DataFrame that contains only the noise complaints by indexing the `complaints` DataFrame with your `is_noise` Series.

1.9 Display a summary of the noise complaints; one call should tell you at a glance how many complaints there were, how many unique zip codes there were, and the most common Descriptor associated with noise complaints.
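The boolean-mask pattern from the steps above can be sketched on a made-up toy frame (all rows, and the names `toy`, `mask`, and `matching`, are invented for illustration). Comparing a column to a value gives a True/False Series; indexing the frame with that Series keeps only the True rows. Assuming every column holds strings, as in our `dtype="unicode"` load, `describe` reports count, unique, top, and freq for each column.

```python
import pandas as pd

# Made-up rows (illustration only); every column holds strings
toy = pd.DataFrame({
    "Complaint Type": ["Noise", "Heating", "Noise", "Noise"],
    "Incident Zip": ["10001", "10451", "10001", "11201"],
    "Descriptor": ["Loud Talking", "No Heat", "Loud Music/Party", "Loud Talking"],
})

# Element-wise comparison gives a boolean Series
mask = toy["Complaint Type"] == "Noise"

# Indexing with the boolean Series keeps only the rows where it is True
matching = toy[mask]

# For string columns, describe() reports count, unique, top, and freq
summary = matching.describe()
print(summary)
```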

1.10 Display the count of noise complaints for each borough.

So it looks like Manhattan has the most noise complaints. Not too surprising! But Manhattan might also just have more complaints overall, so let's look at the *fraction* of complaints that were noise complaints.

### Which borough has the highest *fraction* of noise complaints?

1.11 Calculate the total count of complaints (of all types) for each borough. Store the result in `complaint_counts`.

1.12 Divide the noise complaint counts by the total complaint counts to get the fraction of noise complaints.

1.13 Multiply the above by 100 to get the percentage of noise complaints and create a bar plot showing the percentage of complaints that are noise complaints in each borough.
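When you divide one Series by another, pandas lines the values up by index label rather than by position. Here is a minimal sketch on made-up data (the rows and variable names are invented for illustration). A label that is missing from one side comes out as NaN, which this sketch chooses to replace with 0.

```python
import pandas as pd

# Made-up rows (illustration only); QUEENS has no "Noise" rows
toy = pd.DataFrame({
    "Borough": ["MANHATTAN", "MANHATTAN", "BRONX", "BRONX", "BRONX", "QUEENS"],
    "Complaint Type": ["Noise", "Heating", "Noise", "Heating", "Heating", "Heating"],
})

# Counts per borough for one complaint type, and for all types
noise_counts = toy[toy["Complaint Type"] == "Noise"]["Borough"].value_counts()
total_counts = toy["Borough"].value_counts()

# Division aligns on the borough labels, not on row order
fraction = noise_counts / total_counts

# QUEENS is absent from noise_counts, so its fraction is NaN; treat it as 0
percent = (fraction * 100).fillna(0)
print(percent)
```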

So yep, it looks like Manhattan is just noisy. Who knew?

## Part 2: Bike Path Data

### Reading the Data
Here we attempt to read in a CSV file containing bike path usage data from Montréal.

In [None]:
broken_df = pd.read_csv(f"{data_base_url}/bikes.csv", encoding = "ISO-8859-1")

Taking a look at the first five rows suggests that this did not go very well:

In [None]:
broken_df.head()

The `read_csv` function has a bunch of options that will let us fix this.

2.1 In the cell below, call `pd.read_csv` with the arguments needed to fix the following problems:

   * Change the column separator to a `;`
   * Set the encoding to 'latin1' (the default is 'utf8')
   * Parse the dates in the 'Date' column into DateTime objects
   * Tell it that our dates have the day first instead of the month first (Canadians, am I right?)
   * Set the index to be the 'Date' column

Store the result in a variable called `bike_data` and display the first 5 rows of the successfully loaded table.
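To see what each of those options does in isolation, here is a small sketch that parses a made-up, in-memory file instead of the real one (the two data rows and the names `raw` and `toy_bikes` are invented for illustration). The bytes are deliberately encoded as latin1 so that the `encoding` argument actually matters.

```python
import io
import pandas as pd

# A tiny made-up file: semicolon-separated, day-first dates, latin1 bytes
raw = (
    "Date;Berri 1;Côte-Sainte-Catherine\n"
    "01/02/2012;35;0\n"
    "02/02/2012;83;1\n"
).encode("latin1")

toy_bikes = pd.read_csv(
    io.BytesIO(raw),
    sep=";",               # columns are separated by semicolons
    encoding="latin1",     # decode the accented characters correctly
    parse_dates=["Date"],  # turn the Date strings into datetimes
    dayfirst=True,         # read 01/02/2012 as 1 February, not 2 January
    index_col="Date",      # use the parsed dates as the index
)
print(toy_bikes)
```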


2.2 Let's look at just one bike path; extract a Series with just the "Berri 1" column. Assign it to a variable called `berri_bikes`.

2.3 Plot the Series using a line plot.

2.4 What happens if you call `plot()` on the entire `bike_data` DataFrame instead of just one column?

Yikes - it's a little crowded, but it does show some interesting things!

### Plotting Usage Per Month

2.5 Use the `resample` function followed by the `sum` function to calculate the total usage for each month. Store this in a variable called `monthly`.

2.6 Extract the numerical month from the `monthly` DataFrame's index (its entries are currently DateTime objects). Then set the index of the `monthly` DataFrame to those month numbers, and set the name of the index to "Month". Display the resulting DataFrame.
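Here is a minimal sketch of resampling on made-up data (the four daily values and the names `days`, `toy`, and `monthly_toy` are invented for illustration). It uses the `"MS"` (month start) frequency alias, which is one choice among several: each monthly total is labeled with the first day of its month. A datetime index exposes a `month` attribute, which gives the month numbers.

```python
import pandas as pd

# Four made-up daily counts spanning a month boundary (Jan 30 to Feb 2)
days = pd.date_range("2012-01-30", periods=4, freq="D")
toy = pd.DataFrame({"Berri 1": [10, 20, 30, 40]}, index=days)

# Group the rows into calendar months and add up each group
monthly_toy = toy.resample("MS").sum()

# Replace the datetime index with plain month numbers and name it
monthly_toy.index = monthly_toy.index.month
monthly_toy.index.name = "Month"
print(monthly_toy)
```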

2.7 Create a bar plot showing usage by month.

### Plotting Usage Per Weekday

Anytime you extract some part of a DataFrame and then modify it, you risk getting in trouble because the extracted part might just be a "view" into the original rather than a separate copy of the selection you made. If you plan to make any modifications to a selection from a DataFrame, it's a good idea to ensure you're working with a separate copy.
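A small sketch of the safe pattern, on a made-up frame (the data and the names `original`, `selection`, and `doubled` are invented for illustration): calling `copy()` on the selection gives an independent object, so adding a column to it leaves the original alone.

```python
import pandas as pd

# Made-up data (illustration only)
original = pd.DataFrame({"Berri 1": [10, 20], "Rachel1": [5, 6]})

# Double brackets select a one-column DataFrame; copy() makes it independent
selection = original[["Berri 1"]].copy()

# Modifying the copy does not touch the original
selection["doubled"] = selection["Berri 1"] * 2

print(list(original.columns))
print(list(selection.columns))
```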

2.8 Extract a DataFrame (not a Series) containing only the "Berri 1" column from `bike_data` and store a **copy** of it in the `berri_bikes` variable.

2.9 Insert a new column called "weekday" in the `berri_bikes` dataframe containing the weekday attribute of the current index's DateTimes. Display the first five rows of the resulting DataFrame.

2.10 Display the first 5 rows corresponding to Mondays (day 0).

2.11 Do the same as above, but use `groupby` and `get_group` to get a DataFrame of all rows that correspond to Mondays (0). Display the first 5 rows.
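The two approaches to picking out one weekday can be compared on made-up data (the counts and the names `days`, `toy`, `by_mask`, and `by_group` are invented for illustration). A datetime index exposes a `weekday` attribute where Monday is 0 and Sunday is 6.

```python
import pandas as pd

# Nine made-up daily counts; 2012-01-01 was a Sunday
days = pd.date_range("2012-01-01", periods=9, freq="D")
toy = pd.DataFrame({"Berri 1": list(range(9))}, index=days)

# Monday is 0, Sunday is 6
toy["weekday"] = toy.index.weekday

# Approach 1: boolean mask
by_mask = toy[toy["weekday"] == 0]

# Approach 2: group by weekday, then pull out the group with key 0
by_group = toy.groupby("weekday").get_group(0)

print(by_mask)
print(by_group)
```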

2.12 Create a DataFrame called `weekday_counts` that contains the sum of trips taken for each weekday throughout the entire dataset. Display the whole thing.

2.13 Set the index to the names of the weekdays and create a bar plot showing usage per weekday.
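As a sketch of the aggregate-then-relabel pattern, here is a made-up two-week example (the data and the names `days`, `toy`, and `totals` are invented for illustration). Grouping by the weekday number and summing gives one row per weekday in order 0 through 6, so a list of seven names starting with Monday can be assigned as the new index.

```python
import pandas as pd

# Two made-up weeks with one trip per day; 2012-01-02 was a Monday
days = pd.date_range("2012-01-02", periods=14, freq="D")
toy = pd.DataFrame({"Berri 1": [1] * 14}, index=days)
toy["weekday"] = toy.index.weekday

# One row per weekday number (0..6), holding the sum over all matching days
totals = toy.groupby("weekday").sum()

# Rows are sorted 0..6, so the names must start with Monday
totals.index = ["Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday"]
print(totals)

# In a notebook, totals.plot(kind="bar") would draw one bar per weekday
```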