DATA 311 Lecture 2 - numpy demo

Markdown Demo (this is a level 2 heading)

Lists

  • bulleted lists
  • another item
  1. numbered lists
  2. another item

Text formatting

bold, italics, monospace

A code block:

a = 4
b = 7

Links and images:

link text

alt text

In [1]:
import numpy as np
import random
import matplotlib.pyplot as plt
import imageio.v3 as imageio

Creating Arrays

  • array, zeros, ones, *_like
    • dtype argument
In [2]:
# create a python list with 0..9
a = list(range(10))
a
Out[2]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [3]:
# create a numpy array with the list's contents
a = np.array(a)
In [4]:
# show the array's data type
a.dtype
Out[4]:
dtype('int64')
In [5]:
# get the element at index 1
a[1]
Out[5]:
np.int64(1)
In [6]:
# get the shape of the array
a.shape
Out[6]:
(10,)

Basic list-like slicing

In [7]:
# slice with beginning and end
a[3:6]
Out[7]:
array([3, 4, 5])
In [8]:
# slice with implicit start (0)
a[:4]
Out[8]:
array([0, 1, 2, 3])
In [9]:
# slice with implicit end (len)
a[5:]
Out[9]:
array([5, 6, 7, 8, 9])
In [10]:
# slice with implicit start and end
a[:]
Out[10]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [11]:
# slice with a step size
a[1:7:2]
Out[11]:
array([1, 3, 5])

Elementwise Operations

In [12]:
a
Out[12]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [13]:
# array + scalar
a + 4
Out[13]:
array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
In [14]:
# array + array
a + a
Out[14]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [15]:
# scalar * array
2 * a
Out[15]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [16]:
# array + array dimension matching
a + a[:4]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[16], line 2
      1 # array + array dimension matching
----> 2 a + a[:4]

ValueError: operands could not be broadcast together with shapes (10,) (4,) 

Exercise 1: Speed Check

In pairs: I've claimed numpy is faster than native Python. Let's find out how much faster.

  1. In the cell below, create a Python list (not a numpy array!) of 1,000,000 random floating-point numbers between 0.0 and 1.0. Useful tools: import random, random.random().
In [17]:
import random

b = []
for i in range(1_000_000):
    b.append(random.random())
b[:5]
Out[17]:
[0.8437826572010122,
 0.6696759345208393,
 0.14343354017923993,
 0.8904407099190859,
 0.2182667489356157]
  1. In the next cell, create a new Python list containing the same numbers as the original, but with 0.5 subtracted from each. Don't modify the original list. I've added the ipython magic command %%time to the top of the cell to measure and report the time it takes to execute that cell.
In [18]:
%%time

c = []
for v in b:
    c.append(v - 0.5) 
CPU times: user 83.7 ms, sys: 16.1 ms, total: 99.8 ms
Wall time: 99.4 ms
  1. In the next cell, create a numpy array np_nums containing the same numbers as your original list (values between 0.0 and 1.0).
In [19]:
np_nums = np.array(b)
  1. In the cell below, create a new numpy array np_result by subtracting 0.5 from np_nums (i.e., using elementwise operations). Time this cell's execution. How much faster is the numpy version than the native Python version?
In [20]:
%%time
np_result = np_nums - 0.5
CPU times: user 1.84 ms, sys: 2.96 ms, total: 4.8 ms
Wall time: 4.14 ms
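
A single %%time run can be noisy, so here is a minimal sketch of the same comparison using the standard-library timeit module instead. The list size n and the helper names subtract_python and subtract_numpy are assumptions made for this illustration, not part of the exercise, and the reported speedup will vary by machine.

```python
import random
import timeit

import numpy as np

random.seed(0)  # fixed seed so the data is repeatable
n = 100_000     # assumption: smaller than the lecture's list, to keep this quick

py_nums = [random.random() for _ in range(n)]
np_nums = np.array(py_nums)


def subtract_python():
    # build a new list; the original list is left unmodified
    return [v - 0.5 for v in py_nums]


def subtract_numpy():
    # one vectorized elementwise operation
    return np_nums - 0.5


# take the best of 5 repeats, each averaging over 10 calls
py_time = min(timeit.repeat(subtract_python, number=10, repeat=5)) / 10
np_time = min(timeit.repeat(subtract_numpy, number=10, repeat=5)) / 10

print(f"python:  {py_time * 1e3:.3f} ms per run")
print(f"numpy:   {np_time * 1e3:.3f} ms per run")
print(f"speedup: {py_time / np_time:.1f}x")
```

Taking the minimum over several repeats reduces the influence of other processes on the measurement.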

Multidimensional Arrays

  • 2D arrays, slicing across dimensions
  • elementwise operations
    • comparisons / boolean dtype, masking
  • visualizing as an image

More ways of making arrays:

In [21]:
# create an array from [1, 2, 3]
np.array([1, 2, 3])
Out[21]:
array([1, 2, 3])
In [22]:
# create an array of 6 zeros
np.zeros((6,))
Out[22]:
array([0., 0., 0., 0., 0., 0.])
In [23]:
# create an array of 6 zeros with 64-bit integer datatype
np.zeros((6,), dtype=np.int64)
Out[23]:
array([0, 0, 0, 0, 0, 0])
In [24]:
# create a 2-by-3 array of zeros
np.zeros((2, 3))
Out[24]:
array([[0., 0., 0.],
       [0., 0., 0.]])
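
The outline earlier also lists ones and the *_like constructors, which the cells above do not run. A minimal sketch of how they might be used follows; the variable names (ones_int, template, and so on) are made up for this illustration.

```python
import numpy as np

# an array of 6 ones with a 64-bit integer dtype
ones_int = np.ones((6,), dtype=np.int64)

# a small float array to use as a shape/dtype template
template = np.array([[1.5, 2.5, 3.5],
                     [4.5, 5.5, 6.5]])

# *_like copies the template's shape (and dtype, unless overridden)
zeros_like_template = np.zeros_like(template)
ones_like_int = np.ones_like(template, dtype=np.int64)

print(ones_int)
print(zeros_like_template.shape, zeros_like_template.dtype)
print(ones_like_int)
```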

Reshaping

  • more than 2 dimensions
In [25]:
# set b to an array of 0..5, reshaped to 2-by-3
b = np.array(range(6)).reshape((2, 3))
b
Out[25]:
array([[0, 1, 2],
       [3, 4, 5]])
In [26]:
# demo indexing into b
b[1,2]
Out[26]:
np.int64(5)

Aggregation / Projection

In [27]:
# find the sum of all elements in b
b.sum()
Out[27]:
np.int64(15)
In [28]:
# find the minimum value in b
b.min()
Out[28]:
np.int64(0)
In [29]:
# display b, just for reference
b
Out[29]:
array([[0, 1, 2],
       [3, 4, 5]])
In [30]:
# sum the elements of b along axis 0 (the row dimension)
b.sum(axis=0)
Out[30]:
array([3, 5, 7])
In [31]:
# sum the elements of b along axis 1 (the column dimension)
b.sum(axis=1)
Out[31]:
array([ 3, 12])
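
One way to read the axis argument is that the named axis is the one that gets collapsed. The sketch below checks this against sums written out by hand; it uses np.arange as a shorthand for np.array(range(6)), and the manual_* names are only for illustration.

```python
import numpy as np

b = np.arange(6).reshape((2, 3))

# axis=0 collapses the 2 rows, leaving one sum per column
col_sums = b.sum(axis=0)
manual_col_sums = b[0, :] + b[1, :]

# axis=1 collapses the 3 columns, leaving one sum per row
row_sums = b.sum(axis=1)
manual_row_sums = b[:, 0] + b[:, 1] + b[:, 2]

print(col_sums, col_sums.shape)
print(row_sums, row_sums.shape)
```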

Exercise 2 - Broadcasting

In pairs: We've seen that, to perform elementwise operations, the dimensions of the arrays must match. There's one convenient exception to this. Let's see it in action below:

In [32]:
b = np.array(range(6)).reshape((2, 3))
b.shape
Out[32]:
(2, 3)
In [33]:
c = np.array([2, 4])
c.shape
Out[33]:
(2,)
In [34]:
# dimension mismatch
b * c
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[34], line 2
      1 # dimension mismatch
----> 2 b * c

ValueError: operands could not be broadcast together with shapes (2,3) (2,) 
In [35]:
# reshape c to be 2D
c_2x1 = c.reshape((2, 1))
c_2x1.shape
Out[35]:
(2, 1)
In [36]:
b
Out[36]:
array([[0, 1, 2],
       [3, 4, 5]])
In [37]:
c_2x1
Out[37]:
array([[2],
       [4]])
In [38]:
# example of broadcasting:
b * c_2x1
Out[38]:
array([[ 0,  2,  4],
       [12, 16, 20]])
  1. This is called broadcasting. Explain what happened here.

The values in c_2x1 were repeated along the column dimension (axis 1), so the (2, 1) array behaved like a (2, 3) array matching b: row 0 of b was multiplied by 2 and row 1 by 4.

Now, run the following cells to see another example.

In [39]:
d = np.array([1, 0, 1]).reshape(1, 3)
d.shape
Out[39]:
(1, 3)
In [40]:
b
Out[40]:
array([[0, 1, 2],
       [3, 4, 5]])
In [41]:
d
Out[41]:
array([[1, 0, 1]])
In [42]:
b * d
Out[42]:
array([[0, 0, 2],
       [3, 0, 5]])

Now, explain the general rule for:

  1. What kind of dimension mismatches are allowed?

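Each pair of corresponding dimensions must either match exactly or have size 1 in one of the arrays. Shapes are compared starting from the last dimension, which is why (2, 3) and (2,) failed above: the 2 was lined up against the 3.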

  1. How do elementwise operations behave when such a mismatch is present?

The array with size 1 in that dimension behaves as if its elements were repeated along it to match the other array's size.
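
The repetition can be made visible with np.broadcast_to, which returns what an array looks like once it has been stretched to a target shape. This is a sketch for illustration only; the names c_expanded and d_expanded are assumptions, not part of the exercise.

```python
import numpy as np

b = np.arange(6).reshape((2, 3))
c_2x1 = np.array([2, 4]).reshape((2, 1))
d = np.array([1, 0, 1]).reshape((1, 3))

# stretch each singleton dimension to match b's shape
c_expanded = np.broadcast_to(c_2x1, b.shape)  # columns repeated
d_expanded = np.broadcast_to(d, b.shape)      # rows repeated

print(c_expanded)
print(d_expanded)

# multiplying by the stretched copy gives the same result as broadcasting
print((b * c_2x1 == b * c_expanded).all())
print((b * d == b * d_expanded).all())

# both operands can be stretched at once: (2, 1) * (1, 3) -> (2, 3)
print(c_2x1 * d)
```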

Numpy, Continued

Fancy indexing

  • Integer indexing: a[ list or ndarray of integer indices ]
  • Boolean indexing: a[ list or ndarray of booleans ] where the list/ndarray's shape matches a's

See https://numpy.org/doc/stable/user/basics.indexing.html for much more.

Integer indexing

In [ ]:
a = np.array(range(10, 20))
a

Indexing with a list or array of integers pulls out only the elements at those indices:

In [ ]:
# get the first, third, and fifth elements:
In [ ]:
# get the fourth, second, and second elements (!):
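
One possible way to fill in the two cells above is sketched here. The variable names are made up for the illustration; note that "first, third, and fifth" translate to the zero-based indices 0, 2, and 4.

```python
import numpy as np

a = np.array(range(10, 20))

# first, third, and fifth elements: indices 0, 2, 4
first_third_fifth = a[[0, 2, 4]]

# fourth, second, and second elements: indices may repeat and appear in any order
fourth_second_second = a[np.array([3, 1, 1])]

print(first_third_fifth)
print(fourth_second_second)
```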

Boolean Indexing

In [ ]:
b = np.ones((2, 2))
b[0,0] = 2
b[1,1] = 0

Quick quiz: what does b look like now?

In [ ]:
b

Make a "mask" of booleans that's the same shape as b:

In [ ]:
mask = np.array([
    [True, False],
    [False, True]
])
mask
In [ ]:
# index b with the boolean mask:

A common pattern - comparison operators to generate a mask:

In [ ]:
# get an array of only the elements of b that are greater than zero:
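
A sketch of how the boolean-indexing cells above might be completed follows; the names selected, positive_mask, and positive are assumptions for this illustration. Indexing a 2D array with a 2D mask returns a 1D array of the selected elements in row-major order.

```python
import numpy as np

b = np.ones((2, 2))
b[0, 0] = 2
b[1, 1] = 0

mask = np.array([
    [True, False],
    [False, True]
])

# index b with the hand-written mask: keeps b[0, 0] and b[1, 1]
selected = b[mask]

# a comparison produces a boolean array with the same shape as b
positive_mask = b > 0

# index with the comparison result to keep only elements greater than zero
positive = b[positive_mask]

print(selected)
print(positive_mask)
print(positive)
```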

Tips for multidimensional arrays

  • I never display anything that's more than 2D.
  • I never try to visualize anything that's more than 3D.
In [ ]:
c = np.array(range(24)).reshape(2, 4, 3)
c
In [ ]:
# take one 2D slice
In [ ]:
# take another 2D slice along a different axis
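
As one possible way to fill in the slicing cells above, the sketch below fixes a single index along one axis at a time, which leaves a 2D array that is easy to display. The choice of which indices to fix is arbitrary and only for illustration.

```python
import numpy as np

c = np.array(range(24)).reshape(2, 4, 3)

# fix axis 0: the first 4-by-3 "page"
slice_axis0 = c[0]

# fix axis 2: a 2-by-4 slice taking element 1 of every innermost row
slice_axis2 = c[:, :, 1]

# fix axis 1: a 2-by-3 slice taking row 2 of each page
slice_axis1 = c[:, 2, :]

print(slice_axis0.shape, slice_axis2.shape, slice_axis1.shape)
print(slice_axis2)
```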

Exercise 3: Play with my cat

In pairs: In this exercise, we'll manipulate an image as a 2D array.

We'll start by loading a picture of my cat, Beans:

In [ ]:
beans = imageio.imread("/cluster/academic/DATA311/202620/beans_gray.jpeg")

We'll use plt.imshow to visualize the image:

In [ ]:
plt.imshow(beans, cmap='gray')
  1. What is the dtype of the resulting array? What are the minimum and maximum values?
In [ ]:
 
  1. Display a binary image showing which pixels are greater than half the maximum pixel intensity (127).
In [ ]:
 
  1. What is the average value of pixels that have intensity value above 127?
In [ ]:
 
  1. Which column of the image has the highest average pixel value?
In [ ]:
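
The image file is only available on the course cluster, so the sketch below works through the same four steps on a tiny hand-made array. The array img and its values are a made-up stand-in for beans, assuming the real image also loads as an 8-bit grayscale array; the real answers will differ.

```python
import numpy as np

# assumption: a 3-by-4 stand-in for the grayscale image
img = np.array([
    [0, 50, 200, 255],
    [10, 128, 130, 90],
    [20, 127, 250, 60],
], dtype=np.uint8)

# 1. dtype and value range
print(img.dtype, img.min(), img.max())

# 2. binary image: True where the pixel is brighter than 127
bright_mask = img > 127
print(bright_mask)

# 3. average value of the pixels above 127
bright_mean = img[bright_mask].mean()
print(bright_mean)

# 4. column with the highest average pixel value
col_means = img.mean(axis=0)      # collapse rows: one mean per column
brightest_col = col_means.argmax()
print(brightest_col)
```

To view the binary image rather than print it, bright_mask can be passed to plt.imshow with cmap='gray', just as beans is displayed above.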