# Lab 1: Introduction to NumPy

### Replace this cell with names of both partners.

### Overview
In this lab, you will learn to use NumPy, a Python library for working with arrays. Hint: if you get stuck, the NumPy user guide is a useful reference: https://numpy.org/doc/stable/user/.

## Part I: NumPy array (vector) indexing and calculations

Let $\textbf{x}$ be a random 1D array of 50 integers drawn from the range 5 (inclusive) to 313 (exclusive). Follow the prompts in the text cells to complete the code cells below. Note: for this exercise and throughout the quarter, the first element of an array/vector is always at index 0 when coding in Python.

In [None]:
import numpy as np
# Set random seed so that correct results are deterministic
np.random.seed(10)
# Get a random vector to play with
x = np.random.randint(low=5, high=313, size=(50,))
x

Select the ***first*** 15 elements of $\textbf{x}$ in an array $\textbf{y}$ using slice indexing.

Select the ***last*** 15 elements of $\textbf{x}$ in an array $\textbf{z}$ using slice indexing.

Store the elementwise sum of $\textbf{y}$ and $\textbf{z}$ in an array named $\texttt{sumyz}$.

Store the elementwise difference of $\textbf{z}$ subtracted from $\textbf{y}$ in an array named $\texttt{yminusz}$.





Confirm that $\texttt{sumyz} + \texttt{yminusz} = \textbf{y} + \textbf{y}$ using the $\texttt{==}$ comparison operator. Store the resulting boolean array in a variable called $\texttt{check}$.

Store the elementwise product of $\textbf{y}$ and $\textbf{z}$ in an array named $\texttt{mulyz}$.

Store the 17th element of $\textbf{x}$ in a variable $\texttt{seventeen}$.

NumPy slicing can also be used to select a range of elements in the middle of an array. For $i < j$, to select elements [$i, i+1, i+2, ..., j-2, j-1$] we use the syntax:

```
x[i:j]
```
This syntax generalizes to matrices and general $n$-dimensional arrays.
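
As a small illustration of the same syntax on a toy vector and a toy matrix (the names `a`, `M`, `middle`, and `block` are made up for this sketch and are not part of the lab):

```python
import numpy as np

a = np.arange(10)                # [0 1 2 3 4 5 6 7 8 9]
middle = a[3:7]                  # indices 3, 4, 5, 6 -> [3 4 5 6]

M = np.arange(20).reshape(4, 5)  # 4 rows, 5 columns, values 0..19
block = M[1:3, 2:5]              # rows 1-2 and columns 2-4

print(middle)
print(block.shape)               # (2, 3)
```

Each axis gets its own `start:stop` range, separated by commas, and the `stop` index is excluded on every axis.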

By default, NumPy slicing will select a consecutive series of elements. In addition, slicing can be used to select every 2nd, or 3rd, or in general every $n$-th element of an array. To select every other element of an array we use the syntax:

```
x[::2]
```

To select every 3rd element of an array we use the syntax:

```
x[::3]
```

In general, to select every $n$-th element of an array we use the syntax:

```
x[::n]
```

To select every $n$-th element of an array beginning at index $i$ and up to but not including index $j$ we use the syntax:

```
x[i:j:n]
```
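
A quick sketch of the step syntax on a toy array (the array `a` and the variable names are illustrative only, not the lab's $\textbf{x}$):

```python
import numpy as np

a = np.arange(20)        # [0 1 2 ... 19]

every_other = a[::2]     # 0, 2, 4, ..., 18
every_fifth = a[::5]     # 0, 5, 10, 15
windowed = a[3:15:4]     # start at index 3, stop before 15, step 4 -> 3, 7, 11

print(every_other)
print(every_fifth)
print(windowed)
```

Note that a stepped slice always includes the element at the start index, so `a[::5]` begins with the element at index 0.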



Store the 3rd through the 11th element of $\textbf{z}$ in an array $\texttt{z311}$. Recall that since Python starts counting at 0, the 3rd element is at index 2 and the 11th element is at index 10. $\texttt{z311}$ should have 9 elements, which you can check with the $\texttt{shape}$ attribute.

Store every 7th element of $\textbf{x}$ in an array $\texttt{x7}$.

Store all elements of $\textbf{x}$ greater than 11 using ***boolean indexing*** in an array $\texttt{g60}$.

Store the mean of $\textbf{x}$ in a variable $\texttt{xmean}$.

Store the minimum value of $\textbf{x}$ in a variable $\texttt{xmin}$.

Store the maximum value of $\textbf{x}$ in a variable $\texttt{xmax}$.

Store the 1st, 28th, 2nd, and 19th elements (in that order) of $\textbf{x}$ in
an array $\texttt{crazyx}$, using advanced indexing with a vector of integers.
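
For reference, here is a minimal sketch of boolean indexing and integer-array (advanced) indexing on a small made-up array. The array `a`, the threshold, and the index list are all illustrative assumptions, unrelated to the values the prompts ask for:

```python
import numpy as np

a = np.array([12, 7, 30, 5, 18])

# Boolean indexing: a comparison yields a boolean mask, which selects the True positions
mask = a > 10                     # [ True False  True False  True]
big = a[mask]                     # [12 30 18]

# Advanced indexing: an array of integers picks elements in the order given
picked = a[np.array([4, 0, 2])]   # [18 12 30]

print(big)
print(picked)
```

Both forms return a new array, and with integer indices the result follows the order of the index array rather than the order of the original data.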

## Part II: Matrix indexing

The MNIST (Modified National Institute of Standards and Technology) dataset is a collection of 70,000 28x28 pixel grayscale images of handwritten digits (0-9), with each pixel corresponding to an integer between 0 (black) and 255 (white). The MNIST test dataset consists of 10,000 such images. In the following exercises, we will load the MNIST test dataset into a NumPy ndarray and practice NumPy indexing and mathematical operations.






In [None]:
# Load the MNIST test dataset from a URL
mnist = np.loadtxt('https://fw.cs.wwu.edu/~wehrwes/courses/data311_25f/data/mnist_test.csv', delimiter=',')
print(mnist.shape)
print(mnist.dtype)

Notice that the shape of the dataset isn't exactly what one would expect. Each image has 28 * 28 = 784 pixels in total. Historically, when compute capability was more limited, these images were stored with compression algorithms that worked much better on the "flattened" images, i.e., the 28 rows (each 28 elements long) of each image matrix were concatenated into a single 784-element vector. The first column of this matrix of flattened images holds the digit (0-9) that was drawn, which is why each row has 785 entries rather than 784.

In [None]:
# The digits corresponding to the drawn images
mnist[:, 0]

In [None]:
# Let's remove the digit labels. But we don't want to throw away the labels so we'll save them.
mnist_labels, mnist = mnist[:, 0], mnist[:, 1:]
mnist.shape

NumPy has an easy way to reconfigure data using the $\texttt{reshape}$ method.
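
A minimal sketch of how `reshape` behaves on a toy array (the names `flat`, `grid`, `stack`, and `auto` are made up for illustration):

```python
import numpy as np

flat = np.arange(12)            # 12 elements: [0 1 ... 11]
grid = flat.reshape(3, 4)       # 3 * 4 == 12, so this works
stack = flat.reshape(2, 3, 2)   # 2 * 3 * 2 == 12 also works
auto = flat.reshape(-1, 6)      # -1 asks NumPy to infer that axis (2 here)

try:
    flat.reshape(5, 3)          # 5 * 3 == 15 != 12, so NumPy refuses
except ValueError as err:
    print("reshape failed:", err)

print(grid.shape, stack.shape, auto.shape)
```

The elements keep their order: in the default row-major layout, the last axis varies fastest, so `grid[0]` is `[0 1 2 3]`.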

In [None]:
mnist_flat, mnist = mnist, mnist.reshape(10000, 28, 28)

In [None]:
# Reshape only works when you propose a shape that has the same number of elements as the original n-way array.
# You can confirm this is the case here by taking the product of each shape tuple.
mnist_flat.shape, mnist.shape

Store the first image from the dataset in a matrix $\texttt{im1}$.

Matplotlib's $\texttt{imshow}$ function renders a matrix as an image. You should see a picture of a 7 when you run the following code cell. If not, you should go back and check your work before continuing.

In [None]:
import matplotlib.pyplot as plt
plt.imshow(im1, cmap='gray')

Calculate the number of pixels in $\texttt{im1}$ that are not black (i.e., have ink) using the comparison operator **>** and the NumPy $\texttt{sum}$ method. The sum method counts True as 1 and False as 0 when applied to boolean values. Now calculate the proportion of pixels that have ink and store it in the variable $\texttt{ink}$.
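
To see how summing a boolean array works, here is a sketch on a tiny made-up matrix (the array `grid` and the other names are illustrative, not the lab's image):

```python
import numpy as np

grid = np.array([[0, 3, 0],
                 [9, 0, 0]])

nonzero = grid > 0               # boolean array with the same shape as grid
count = nonzero.sum()            # each True counts as 1 -> 2
fraction = count / grid.size     # grid.size is the total number of elements (6)

print(count, fraction)
```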

From the high proportion of ink to blank space, this digit was likely drawn with a sharpie or some other kind of marker. See what kind of interesting insights data science can provide!

However, this image doesn't look quite right. The ink should be black, and the background white. This is how they were originally drawn prior to the dataset processing.

Create a matrix $\texttt{invert}$ where each element is equal to 255 minus the corresponding element in $\texttt{im1}$. So for instance, if the pixel value in  $\texttt{im1}$ is equal to zero, the corresponding element in $\texttt{invert}$ will be equal to 255.  

Display $\texttt{invert}$.

Create the matrix $\texttt{ident}$ which is the sum of $\texttt{invert}$ and $\texttt{im1}$. Display $\texttt{ident}$.

Using the $\texttt{mnist\_labels}$ vector, the **==** operator, and boolean indexing, create a 3-way array $\texttt{mnist0}$ with all the images of the digit 0 from the $\texttt{mnist}$ 3-way array. Display the last image in $\texttt{mnist0}$ as a sanity check that you did the indexing correctly.

Using the NumPy $\texttt{mean}$ method and its $\texttt{axis}$ argument, create a 2-way array $\texttt{mishmash}$ which is the elementwise average of all the images in $\texttt{mnist0}$.
Display $\texttt{mishmash}$.
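
As a reminder of what the $\texttt{axis}$ argument does, here is a sketch on a small made-up stack of three 2x2 "images" (all names and values are illustrative):

```python
import numpy as np

stack = np.array([[[0, 2], [4, 6]],
                  [[2, 4], [6, 8]],
                  [[4, 6], [8, 10]]])   # shape (3, 2, 2)

avg_over_first = stack.mean(axis=0)     # collapses axis 0 -> shape (2, 2)
avg_over_last = stack.mean(axis=2)      # collapses axis 2 -> shape (3, 2)

print(avg_over_first)
print(avg_over_last)
```

The axis you name is the one that disappears from the result's shape; the remaining axes keep their order.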

Now that's a pretty nice looking zero!

## Part III: Simulating Slit Scan and Time Slice Photography 

Color images can be represented by 3D `ndarray`s with shape (`rows`, `columns`, `channels`), where `channels` is typically size 3, representing red, green, and blue values of each pixel. A **video** can be thought of as a stack of images: a 4D array, sometimes represented as an array with shape (`frames`, `rows`, `columns`, `channels`).
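
To make the 4D layout concrete, here is a sketch with a tiny made-up "video" of zeros (the sizes and the name `cube` are assumptions for illustration, not the lab's data):

```python
import numpy as np

# Hypothetical tiny video: 6 frames, 4 rows, 5 columns, 3 color channels
cube = np.zeros((6, 4, 5, 3), dtype=np.uint8)
cube[2, 1, 3] = [255, 128, 0]   # set one pixel in frame 2, row 1, column 3

frame = cube[2]                 # one frame: shape (4, 5, 3)
pixel = cube[2, 1, 3]           # one pixel: shape (3,), its R, G, B values
red = cube[..., 0]              # red channel of every frame: shape (6, 4, 5)

print(frame.shape, pixel, red.shape)
```

Indexing with a single integer removes that axis, while `:` (or `...`) keeps it.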

You can imagine a video as being a cube-like object, with rows and columns as two dimensions and time (frames) as the third. The channel dimension just comes along for the ride - when displayed on your screen, those three values get stuck together into a single pixel. If you imagine taking a two-dimensional slice of the "video cube" in the plane of the `rows` and `columns` axes at some fixed value of the `frames` dimension, you'll just end up with a single frame of the video. But we can slice the cube in other directions as well, and this can result in some pretty interesting images! In this task, you'll experiment with "slicing" a video cube in a couple of non-traditional ways.

Here's some code that loads a list of URLs, one per image. These are just the frames of a video broken out into individual images to make them easier to work with.

In [None]:
import imageio.v3 as imageio
plaza_urls = np.genfromtxt("https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_25f/data/plaza_urls.txt", dtype=str).tolist()
plaza_urls[:5] # show just the first 5 to see what they look like

Here's what the first frame looks like:

In [None]:
plt.imshow(imageio.imread(plaza_urls[0]))

### 3.0: Create a Video Cube

First, write code below to load the frames into a single `ndarray`. Different approaches can work - here are a couple suggestions:

* Load all the images into a python list, then call `np.array` on that list; this will concatenate the list of `(height, width, 3)` images into a `(frames, height, width, 3)` array.
* Preallocate a `(frames, height, width, 3)` array, then read each image in assigning it to a slice of the array, as in `cube[i,:,:,:] = frame_i`.
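
Both suggestions can be sketched with stand-in data. The random frames below are an assumption made so the example runs without any network access; in the lab, each frame would come from reading one of the URLs:

```python
import numpy as np

# Stand-in frames: 5 random 4x6 RGB images (hypothetical, not the lab's frames)
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8) for _ in range(5)]

# Approach 1: stack a Python list of equally sized images
cube_a = np.array(frames)                       # shape (5, 4, 6, 3)

# Approach 2: preallocate, then fill one frame at a time
cube_b = np.zeros((len(frames), 4, 6, 3), dtype=np.uint8)
for i, frame_i in enumerate(frames):
    cube_b[i, :, :, :] = frame_i

print(cube_a.shape, np.array_equal(cube_a, cube_b))
```

Preallocating avoids holding both the list and the stacked copy in memory at once, which may matter for longer videos; for a short clip either approach should be fine.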

*Note*: loading all the images is not a super quick operation, and it also uses significant network bandwidth. Please avoid re-running your image loading cell more than needed - once it's working, you should need to run this cell only once per work session. Also, when you're testing and developing, you may want to try working on a small subset of the images (e.g., the first 5 frames) so you can try stuff out more quickly.

### 3.1: Slicing Directly Across Time

To get a frame, you can fix the frame dimension and take all rows and all columns. What if instead, we fix the **column** dimension and take all rows and all frames? If we do this, we get an image that's similar to a "slit scan" photograph ([wikipedia](https://en.wikipedia.org/wiki/Slit-scan_photography), [examples](https://www.google.com/search?q=slit+scan+photography&client=firefox-b-1-d&source=lnms&tbm=isch&sa=X&ved=2ahUKEwiLio-L-szzAhVHFjQIHYKCAAsQ_AUoAXoECAEQAw&biw=1441&bih=924&dpr=1)). 

Your task is to play with creating "time slice photographs" that fix the image column (or row) and produce an image where one of the dimensions represents time. Here's an example output that I made from the given image set:

![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_25f/lab1/timeslice_example.png)

Write code to compute a time slice image like the above. Play around and see what you can create! The only requirement is that one dimension must represent time and another must represent a *fixed* row or column of the image's spatial dimensions. After slicing out your image, call `plt.imshow` to display your result. 

### 3.2: Slicing Diagonally Across Time and Space

Above, we took a slice straight along the time dimension, orthogonal to the (`rows`, `columns`) plane. But there's no reason we have to stick to that! If we slice the space-time cube diagonally, we can mix change over time and change across the image's spatial dimensions into a single image. Here's an example I created using a timelapse of the New York City skyline (you can see the original video [here](https://www.youtube.com/watch?v=tQBYm7_1hqs)):

![](https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_25f/lab1/diagonal_example.png)

Let's load up the URLs for this video, which is a timelapse of the NYC skyline:

In [None]:
ny_urls = np.genfromtxt("https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_25f/data/ny_urls.txt", dtype=str).tolist()
plt.imshow(imageio.imread(ny_urls[100])) # the video fades in from black, so the first frame is boring

Write code to compute a "diagonal" time slice image like the example above. Again, you can play around with the specifics and get creative, but for this result please make sure that your slice is **not** parallel to any of the video cube's axis-aligned planes.

*Note*: as in Part 3.0, loading all the images is not a super quick operation. **Make sure you're loading the images in a separate code cell, and try to avoid loading the images more than once during a single session of work on this lab.**

In [None]:
# load the NYC dataset into a video cube

*Note:* while this kind of diagonal slicing is technically possible with a one-liner using boolean indexing, it might be more naturally implemented with a loop.

In [None]:
# extract and display your "diagonal" time slice image

## Submission

Make sure the output is shown for all cells, then download your completed notebook as a .ipynb file and submit it to the Lab 1 assignment on Canvas.

