Lab 3 due Thursday night!
From Brian Hutchinson (DS Advisor):
Reminder that the major application deadline is this Friday, Oct 15th. See here for information and the application form: https://cs.wwu.edu/data-science-bs-major-information-application
There will be a Data Science BS Information Session held this Thursday, Oct 14th at 1pm. Here is relevant information for that event: https://cs.wwu.edu/data-science-bs-info-session
The Western Washington Data-Driven Discovery Seminar Series kicks off this week. This series may be of great relevance and interest to Data Science students.
You may need to pip install the imageio and matplotlib packages for some of the following to work. Make sure you install them inside your data 311 virtual environment. The numpy package should already be installed as a dependency of pandas.
import numpy as np
import matplotlib.pyplot as plt
import imageio
numpy package and ndarray type
The numpy package is largely focused on providing the ndarray type and related functionality; an ndarray is a multi-dimensional array.
Why is this interesting to us?
import pandas as pd
df = pd.DataFrame({"Count": range(100)})
type(df["Count"].array)
pandas.core.arrays.numpy_.PandasArray
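One way to see the connection is to pull a column out of a DataFrame as an ndarray. The following is a minimal sketch using the same df as above; the names counts and total are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Count": range(100)})

# to_numpy() returns the column's values as a plain numpy ndarray
counts = df["Count"].to_numpy()
print(type(counts))   # <class 'numpy.ndarray'>
print(counts.shape)   # (100,)

# numpy functions also accept the pandas Series itself
total = np.sum(df["Count"])
print(total)          # 4950
```

So anything we learn about ndarrays applies to the data stored in DataFrame columns as well.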
np.array, np.zeros, np.ones
a = np.array([1, 2, 3])
b = np.zeros(4)
np.ones(17)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
a.dtype; dtype kwarg to array, zeros, ones
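Unlike a DataFrame, whose columns can each have a different type, every element of an ndarray has the same type. Numpy builtin types include:
np.uint8, np.uint16, np.uint32, np.uint64
np.int8, np.int16, np.int32, np.int64
np.float16, np.float32, np.float64
You can use python's native types too: bool, float (same as np.float64), int (same as np.int64 on most platforms)...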
np.zeros(4, dtype=float).dtype
dtype('float64')
np.array([1, 0, 0], dtype=int).dtype
dtype('int64')
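The one-type rule also shows up when the inputs are mixed. Here is a small sketch of how numpy chooses a dtype and how to convert between dtypes; the variable names are made up for illustration.

```python
import numpy as np

# Mixing ints and floats: numpy picks one dtype that can hold every value
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)    # float64

# Requesting a specific dtype at creation time
small = np.zeros(3, dtype=np.uint8)
print(small.dtype)    # uint8

# astype returns a converted copy; the original array is unchanged
as_float = small.astype(np.float32)
print(as_float.dtype, small.dtype)   # float32 uint8
```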
a.shape, a.ndim
a2d = np.ones((3, 4))
a2d
array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])
a3d = np.ones((3,2,4))
a3d
array([[[1., 1., 1., 1.], [1., 1., 1., 1.]], [[1., 1., 1., 1.], [1., 1., 1., 1.]], [[1., 1., 1., 1.], [1., 1., 1., 1.]]])
b2d = np.array([[1,2,3], [4,5,6]])
b2d
array([[1, 2, 3], [4, 5, 6]])
b3d = np.array([np.ones((2, 4))*i for i in range(3)])
b3d
array([[[0., 0., 0., 0.], [0., 0., 0., 0.]], [[1., 1., 1., 1.], [1., 1., 1., 1.]], [[2., 2., 2., 2.], [2., 2., 2., 2.]]])
a.T (2d); a.transpose(order) (3d); a.reshape(new_shape)
b3d.shape
(3, 2, 4)
b3d.T.shape
(4, 2, 3)
a4d = np.ones((2,3,4,5))
a4d.shape
(2, 3, 4, 5)
a4d.transpose((1, 0, 2, 3)).shape
(3, 2, 4, 5)
b2d
array([[1, 2, 3], [4, 5, 6]])
b2d.shape
(2, 3)
b2d.reshape((3,2))
array([[1, 2], [3, 4], [5, 6]])
b2d.reshape((6,))
array([1, 2, 3, 4, 5, 6])
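It is easy to confuse reshape with transpose, since both can turn a (2, 3) array into a (3, 2) array. A short sketch of the difference, reusing b2d from above (the other names are illustrative):

```python
import numpy as np

b2d = np.array([[1, 2, 3], [4, 5, 6]])

# reshape keeps the elements in row-major (reading) order
reshaped = b2d.reshape((3, 2))
# transpose swaps the axes, so rows become columns
transposed = b2d.T

print(reshaped)     # rows: [1 2], [3 4], [5 6]
print(transposed)   # rows: [1 4], [2 5], [3 6]

# -1 asks numpy to infer that dimension from the total size
flat = b2d.reshape(-1)
print(flat.shape)   # (6,)
```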
Array/Scalar, Array/Array
b2d * 2 + 4
array([[ 6, 8, 10], [12, 14, 16]])
c2d = b2d + 5
c2d
array([[ 6, 7, 8], [ 9, 10, 11]])
b2d + c2d
array([[ 7, 9, 11], [13, 15, 17]])
b2d.shape == c2d.shape
True
plt.imshow(b2d, cmap="gray")
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x1611126a0>
b2d
array([[1, 2, 3], [4, 5, 6]])
Color images are conventionally stored (row, column, channel), where channel is a dimension of size 3 containing red, green, and blue values.
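To make the (row, column, channel) layout concrete, here is a tiny hand-built image. This is only a sketch with made-up values; passing img to plt.imshow would display it.

```python
import numpy as np

# A 2-row, 3-column color image: shape is (rows, columns, channels)
img = np.zeros((2, 3, 3))

img[:, :, 0] = 1.0        # set the red channel everywhere -> all pixels red
img[0, 1, :] = [0, 1, 0]  # overwrite one pixel (row 0, column 1) with green

print(img.shape)    # (2, 3, 3)
print(img[0, 1])    # [0. 1. 0.]
print(img[1, 2])    # [1. 0. 0.]
```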
imcolor = b3d.reshape((2, 4, 3)) / b3d.max()
plt.imshow(imcolor)
<matplotlib.image.AxesImage at 0x161210370>
beans = imageio.imread("https://facultyweb.cs.wwu.edu/~wehrwes/courses/data311_21f/data/beans_200.jpeg")
plt.imshow(beans)
<matplotlib.image.AxesImage at 0x161471490>
beans.dtype
dtype('uint8')
beans.min(), beans.max()
(0, 255)
beans = beans.astype(np.float32) / 255.0
beans.min(), beans.max()
(0.0, 1.0)
plt.imshow(beans)
<matplotlib.image.AxesImage at 0x161352e20>
beans[0,0,:]
Array([0.6392157 , 0.5764706 , 0.47843137], dtype=float32)
beans[0,:,:].shape
(200, 3)
plt.imshow(beans[:,:,1] > 0.5, cmap="gray")
<matplotlib.image.AxesImage at 0x1611d76d0>
mask = beans[:,:,1] > 0.5
beans[mask, :].shape
(17594, 3)
beans[mask,:] = 0
plt.imshow(beans)
<matplotlib.image.AxesImage at 0x160c4fdc0>
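The masking steps above are easier to follow on an array small enough to read by eye. A minimal sketch with made-up values:

```python
import numpy as np

values = np.array([[0.2, 0.9], [0.7, 0.1]])

# Comparison produces a boolean array of the same shape
mask = values > 0.5
print(mask.tolist())      # [[False, True], [True, False]]

# Indexing with the mask selects only the True positions (as a 1-D array)
print(values[mask])       # [0.9 0.7]

# Assigning through the mask changes only those positions
values[mask] = 0
print(values.tolist())    # [[0.2, 0.0], [0.0, 0.1]]
```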
a.sum, a.mean; axis kwarg
b2d
array([[1, 2, 3], [4, 5, 6]])
b2d.sum(axis=1)
array([ 6, 15])
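The axis kwarg names the dimension that gets collapsed. A short sketch comparing the options on the original b2d:

```python
import numpy as np

b2d = np.array([[1, 2, 3], [4, 5, 6]])

print(b2d.sum())          # 21: no axis means reduce over every element
print(b2d.sum(axis=0))    # [5 7 9]: collapse the rows, one sum per column
print(b2d.sum(axis=1))    # [ 6 15]: collapse the columns, one sum per row
print(b2d.mean(axis=0))   # [2.5 3.5 4.5]
```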
b2d[b2d < 4] = 1
b2d
array([[1, 1, 1], [4, 5, 6]])
Array-Array elementwise operations normally require the arrays to have the same shape.
Exception: if a corresponding dimension has size 1 in one array, its values are repeated ("broadcast") along that dimension to match the other array.
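The broadcasting rule can be sketched on small arrays first. The names grid, col, and row are made up for this example.

```python
import numpy as np

grid = np.zeros((2, 3))
col = np.array([[10], [20]])     # shape (2, 1)
row = np.array([[1, 2, 3]])      # shape (1, 3)

# The size-1 dimension of col is repeated across the 3 columns
print((grid + col).tolist())   # [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]]

# The size-1 dimension of row is repeated across the 2 rows
print((grid + row).tolist())   # [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]

# Shapes that differ in a dimension where neither is 1 raise an error
try:
    grid + np.zeros((2, 4))
except ValueError as err:
    print("shape mismatch:", err)
```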
green_beans = beans[:,:,1] # the green channel of the beans image
plt.imshow(green_beans, cmap="gray")
<matplotlib.image.AxesImage at 0x1614cab20>
fade = np.array(range(200)) / 200
print(green_beans.shape)
print(fade[:,np.newaxis].shape)
plt.imshow(green_beans + fade[:,np.newaxis], cmap="gray")
(200, 200)
(200, 1)
<matplotlib.image.AxesImage at 0x161533490>