Two more faculty candidate talks this week:
import numpy as np
import matplotlib.pyplot as plt
import imageio
numpy package and ndarray type
The numpy package is largely focused on providing the ndarray type and related functionality; an ndarray is a multi-dimensional array.
Why is this interesting to us? pandas is built on top of numpy, and a DataFrame column can be converted directly to an ndarray:
import pandas as pd
df = pd.DataFrame({"Count": range(10)})
df["Count"].to_numpy()
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
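Here is a small sketch of what that buys us, reusing the toy DataFrame above. The column comes back as a plain ndarray, and arithmetic on it applies to every element at once with no Python loop. The names counts and doubled are just for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Count": range(10)})
counts = df["Count"].to_numpy()  # a plain numpy ndarray, no index attached

print(type(counts))  # <class 'numpy.ndarray'>

# Elementwise arithmetic on the whole array, no explicit loop
doubled = counts * 2
print(doubled)  # [ 0  2  4  6  8 10 12 14 16 18]
```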
Creating arrays: np.array, np.zeros, np.ones
a = np.array([1,2,3])
a
array([1, 2, 3])
b = np.zeros(8)
b
array([0., 0., 0., 0., 0., 0., 0., 0.])
np.ones(4)
array([1., 1., 1., 1.])
np.ones(4)*4
array([4., 4., 4., 4.])
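The shape argument to np.zeros and np.ones can also be a tuple, and both accept a dtype keyword. As a side note, np.full builds a constant array in one call; it is shown here only as an alternative to np.ones(4)*4.

```python
import numpy as np

z = np.zeros((2, 3))             # 2 rows, 3 columns, filled with 0.0
o = np.ones(4, dtype=np.int32)   # integer ones instead of the default float64
f = np.full(4, 4.0)              # same values as np.ones(4)*4

print(z.shape)  # (2, 3)
print(o)        # [1 1 1 1]
print(np.array_equal(f, np.ones(4) * 4))  # True
```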
Data types: a.dtype; the dtype kwarg to array, zeros, ones
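Unlike DataFrames, where each column can have its own type, all elements of an ndarray must have the same type. NumPy's built-in types include:
- unsigned integers: np.uint8, np.uint16, np.uint32, np.uint64
- signed integers: np.int8, np.int16, np.int32, np.int64
- floating point: np.float16, np.float32, np.float64
You can use Python's native types too: bool, float (same as np.float64), int (same as np.int64 on most platforms)...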
np.array([1, 0, 0], dtype=np.float32).dtype
dtype('float32')
a.dtype
dtype('int64')
b.dtype
dtype('float64')
b.astype(np.float32)
array([0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
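Why the dtype matters for what comes later (images are usually stored as uint8): integer types have a fixed range, and arithmetic that goes past it wraps around instead of growing. A sketch with made-up pixel values; converting to float first avoids the wraparound, and converting float back to int truncates toward zero.

```python
import numpy as np

pixels = np.array([250, 100], dtype=np.uint8)  # hypothetical pixel values

wrapped = pixels + pixels       # uint8 arithmetic is modulo 256: [244 200]
as_float = pixels.astype(np.float32)
summed = as_float + as_float    # [500. 200.], no wraparound

truncated = np.array([1.7, -1.7]).astype(np.int64)  # [ 1 -1], fraction dropped

print(wrapped, summed, truncated)
```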
Shape and dimensions: a.shape, a.ndim
a.shape
(3,)
a2d = np.ones((3, 4))
a2d
array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])
a2d.shape
(3, 4)
a3d = np.ones((3,2,4))
a3d.shape
(3, 2, 4)
a3d.ndim
3
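A few related attributes, as a quick sketch: ndim is the length of shape, size is the total number of elements (the product of the shape), and Python's len only reports the first dimension.

```python
import numpy as np

a3d = np.ones((3, 2, 4))

print(a3d.ndim == len(a3d.shape))  # True
print(a3d.size)                    # 24, i.e. 3 * 2 * 4 elements in total
print(len(a3d))                    # 3: len() only looks at the first dimension
```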
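Storage order: by default, the last dimension varies fastest in memory, so elements that are neighbors along the last axis are also neighbors in memory.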
Exercise: Are 2D numpy arrays stored row-major or column-major?
b2d = np.array([[1,2,3], [4,5,6]])
b2d
array([[1, 2, 3], [4, 5, 6]])
b2d.flatten()
array([1, 2, 3, 4, 5, 6])
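One way to check the answer directly, as a sketch: strides gives the number of bytes to step to reach the next element along each axis, and flags reports the layout. I fix dtype=np.int64 here so the byte counts don't depend on the platform's default integer size.

```python
import numpy as np

b2d = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)  # 8-byte elements

print(b2d.strides)                 # (24, 8): next row is 24 bytes away, next column 8
print(b2d.flags["C_CONTIGUOUS"])   # True: row-major layout
print(b2d.flatten(order="F"))      # [1 4 2 5 3 6]: column-major traversal instead

# The transpose is a view with the strides swapped, so it is column-major
print(b2d.T.flags["F_CONTIGUOUS"])  # True
```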
b3d = np.array([np.ones((2, 4))*i for i in range(3)])
Rearranging dimensions: a.T (2D); a.transpose(order) (3D); a.reshape(new_shape)
b2d
array([[1, 2, 3], [4, 5, 6]])
b2d.T
array([[1, 4], [2, 5], [3, 6]])
b2d.transpose((1, 0))
array([[1, 4], [2, 5], [3, 6]])
Exercise: suppose b3d has shape (2, 3, 4). What is b3d.transpose((2,0,1)).shape?
b3d_really = np.zeros((2,3,4))
b3d_really.transpose((2,0,1)).shape
(4, 2, 3)
b3d.shape
(3, 2, 4)
b3d.transpose((2,0,1)).shape
(4, 3, 2)
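The rule behind the exercise, sketched on an array with shape (2, 3, 4): new axis i is old axis order[i], so the new shape is read off by indexing the old shape with order. The array x and the spot-checked indices are arbitrary.

```python
import numpy as np

x = np.arange(24).reshape((2, 3, 4))
order = (2, 0, 1)
t = x.transpose(order)

# New axis i is old axis order[i], so the new shape is (4, 2, 3)
assert t.shape == tuple(x.shape[i] for i in order)

# Element t[k, i, j] is the old element x[i, j, k]
assert t[3, 1, 2] == x[1, 2, 3]
print(t.shape)  # (4, 2, 3)
```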
b2d * 2 + 1
array([[ 3, 5, 7], [ 9, 11, 13]])
b2d > 2
array([[False, False, True], [ True, True, True]])
b2d
array([[1, 2, 3], [4, 5, 6]])
b2d * b2d
array([[ 1, 4, 9], [16, 25, 36]])
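Comparisons produce boolean arrays, and reductions treat True as 1 and False as 0. That makes counting easy; a minimal sketch on b2d:

```python
import numpy as np

b2d = np.array([[1, 2, 3], [4, 5, 6]])
mask = b2d > 2

count = mask.sum()   # 4: number of elements greater than 2
frac = mask.mean()   # fraction of elements greater than 2 (4 out of 6)
print(count, frac)
```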
b2d * b3d
ValueError: operands could not be broadcast together with shapes (2,3) (3,2,4)
plt.imshow(b2d,cmap="gray")
plt.colorbar()
Color images are conventionally stored (row, column, channel), where channel is a dimension of size 3 containing red, green, and blue values.
b3d.shape
(3, 2, 4)
imcolor = b3d.reshape((2, 4, 3)) / b3d.max()
imcolor.shape
(2, 4, 3)
plt.imshow(imcolor)
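A caveat worth seeing: reshape keeps the elements in their flat storage order and just regroups them, while transpose actually moves an axis. Both give shape (2, 4, 3) here. If the goal were to treat the first axis of b3d as the color channel, this sketch shows the difference (the transpose order (1, 2, 0) is my choice for illustration):

```python
import numpy as np

b3d = np.array([np.ones((2, 4)) * i for i in range(3)])  # shape (3, 2, 4)

reshaped = b3d.reshape((2, 4, 3))     # same flat order, regrouped into triples
moved = b3d.transpose((1, 2, 0))      # axis 0 moved to the end

print(reshaped[0, 0])  # [0. 0. 0.]: first three stored values
print(moved[0, 0])     # [0. 1. 2.]: one value from each slice of the old axis 0
```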
Image Manipulation
beans = imageio.imread("https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_23w/data/beans_200.jpeg")
plt.imshow(beans)
beans.dtype
dtype('uint8')
beans.min()
0
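beans.max()
For a uint8 array this is an integer of at most 255, so dividing by 255 rescales the pixel values into the range [0, 1]: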
beans = beans.astype(np.float32) / 255
beans.dtype
dtype('float32')
plt.imshow(beans)
beans.shape
(200, 200, 3)
beans[0,0,0]
0.6392157
beans[0,0,:]
array([0.6392157 , 0.5764706 , 0.47843137], dtype=float32)
beans[:,0,0].shape
(200,)
plt.imshow(beans[:100,:100,:])
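Slicing a numpy array returns a view, not a copy, so writing into a slice changes the original. A sketch using a blank stand-in array with the same shape as beans (an assumption so it runs without downloading the image); call .copy() on a slice if you need it to be independent.

```python
import numpy as np

img = np.zeros((200, 200, 3), dtype=np.float32)  # stand-in for beans

crop = img[:100, :100, :]   # top-left quarter
print(crop.shape)           # (100, 100, 3)

crop[:, :, 0] = 1.0         # writes through to img, because crop is a view
print(img[0, 0])            # [1. 0. 0.]

small = img[::2, ::2]       # every other row and column
flipped = img[::-1]         # rows in reverse order: upside-down image
print(small.shape)          # (100, 100, 3)
```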
Mortal minds like mine cannot comprehend the output when displaying an array with more than 2 dimensions:
imcolor
array([[[0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0.5], [0.5, 0.5, 0.5]], [[0.5, 0.5, 0.5], [0.5, 1. , 1. ], [1. , 1. , 1. ], [1. , 1. , 1. ]]])
My coping strategy is to only ever print out a slice that's 2D or less.
For example, if I slice channel 1 (green) of imcolor and display that, it makes sense to me:
imcolor[:,:,1]
array([[0. , 0. , 0. , 0.5], [0.5, 1. , 1. , 1. ]])
Reductions: a.sum, a.mean; the axis kwarg
beans_gray = beans.mean(axis=2)
plt.imshow(beans_gray, cmap="gray")
plt.imshow((beans_gray > 0.5), cmap="gray")
(beans_gray > 0.5).sum()
18884
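The axis kwarg names the dimension that gets collapsed. Here is a sketch on the small b2d array, plus the image case: averaging over axis=2 removes the channel axis, which is how beans_gray ends up 2D. keepdims=True keeps a size-1 axis in its place.

```python
import numpy as np

b2d = np.array([[1, 2, 3], [4, 5, 6]])
col_sums = b2d.sum(axis=0)   # [5 7 9]: rows collapsed, one result per column
row_sums = b2d.sum(axis=1)   # [ 6 15]: columns collapsed, one result per row
total = b2d.sum()            # 21: no axis means reduce over everything

rgb = np.ones((200, 200, 3))
gray = rgb.mean(axis=2)                      # shape (200, 200)
gray_kept = rgb.mean(axis=2, keepdims=True)  # shape (200, 200, 1)
print(col_sums, row_sums, total, gray.shape, gray_kept.shape)
```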
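Array-array elementwise operations require the arrays to have matching shapes, compared starting from the last dimension.
Exception: if a corresponding dimension is 1 in one array, its values are repeated ("broadcast") along that dimension. An array with fewer dimensions is treated as if it had extra size-1 dimensions at the front, so shapes (2, 3) and (3,) are compatible, but (2, 3) and (3, 2, 4) are not.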
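A sketch of the rule on small arrays: shapes are compared from the last dimension backward, and np.broadcast_shapes (available in numpy 1.20 and later) reports the resulting shape or raises ValueError. The arrays row and col are made up for illustration.

```python
import numpy as np

b2d = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])             # shape (3,), treated as (1, 3)
col = np.array([[100], [200]])           # shape (2, 1)

plus_row = b2d + row   # [[11 22 33] [14 25 36]]: row repeated down the rows
plus_col = b2d + col   # [[101 102 103] [204 205 206]]: col repeated across columns

print(np.broadcast_shapes((2, 3), (2, 1)))  # (2, 3)
try:
    np.broadcast_shapes((2, 3), (3, 2, 4))  # last dims 3 and 4 don't match
except ValueError as err:
    print("incompatible:", err)
```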
green_beans = beans[:,:,1] # the green channel of the beans image
plt.imshow(green_beans, cmap="gray")
Introduce a vertical "haze" gradient effect. In other words, make each row brighter by an amount that increases as you go down the image.
fade = np.arange(200) / 200  # one value per row: 0.0, 0.005, ..., 0.995
If you have one array that matches except it's simply missing a dimension, you can add a singleton dimension:
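green_beans.shape
(200, 200)
beans.shape
(200, 200, 3)
beans * green_beans
ValueError: operands could not be broadcast together with shapes (200,200,3) (200,200)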
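One possible way to finish both examples, as a sketch: index with np.newaxis (an alias for None) to add the missing trailing axis, and do the same to fade so it varies down the rows rather than across the columns. I use a random stand-in for the image so it runs offline, and the add-then-clip haze is just one reasonable choice, not necessarily the intended solution.

```python
import numpy as np

rng = np.random.default_rng(0)
beans = rng.random((200, 200, 3), dtype=np.float32)  # stand-in for the image
green_beans = beans[:, :, 1]                          # shape (200, 200)

# Add a trailing singleton axis so shapes line up from the right: (200, 200, 1)
g = green_beans[:, :, np.newaxis]
product = beans * g                                   # broadcasts to (200, 200, 3)

# Haze sketch: one brightness offset per row, growing toward the bottom.
fade = np.arange(200) / 200                           # shape (200,)
# fade[:, np.newaxis] has shape (200, 1), so each row gets its own offset
hazy = np.clip(green_beans + fade[:, np.newaxis], 0, 1)
```

Adding fade without the new axis would also run, but numpy would treat its shape (200,) as (1, 200), giving a left-to-right gradient instead of a top-to-bottom one.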