import numpy as np
import seaborn as sns
Most prediction tasks fall into one of two categories: classification or regression. The distinction is whether you're trying to predict a discrete, categorical property (classification) or a continuous, numerical property (regression).
p = sns.load_dataset('penguins')
p
| | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 339 | Gentoo | Biscoe | NaN | NaN | NaN | NaN | NaN |
| 340 | Gentoo | Biscoe | 46.8 | 14.3 | 215.0 | 4850.0 | Female |
| 341 | Gentoo | Biscoe | 50.4 | 15.7 | 222.0 | 5750.0 | Male |
| 342 | Gentoo | Biscoe | 45.2 | 14.8 | 212.0 | 5200.0 | Female |
| 343 | Gentoo | Biscoe | 49.9 | 16.1 | 213.0 | 5400.0 | Male |
344 rows × 7 columns
Example: Consider flipper length and body mass. Suppose it's easier to put a penguin on a scale than to pin it down and measure its flipper with a measuring tape. Come up with a scheme to predict flipper length given only body mass. Note: no fancy "linear regression" allowed - tell me a scheme in terms an 8th grader could understand!
sns.relplot(x="body_mass_g", y="flipper_length_mm", s=5, data=p)
[scatter plot: flipper_length_mm vs. body_mass_g]
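One possible scheme (just a sketch, not the only answer): sort penguins into body-mass bins, then predict the average flipper length of whichever bin a new penguin lands in. The choice of 10 bins and the predict_flipper_length helper below are ours, purely for illustration.

# one simple scheme: bin by body mass, predict the bin's average flipper length
import pandas as pd

known = p.dropna(subset=["body_mass_g", "flipper_length_mm"]).copy()
known["mass_bin"] = pd.cut(known["body_mass_g"], bins=10)  # 10 equal-width bins

# average flipper length within each body-mass bin
bin_means = known.groupby("mass_bin", observed=True)["flipper_length_mm"].mean()

def predict_flipper_length(body_mass_g):
    """Predict the mean flipper length of the matching body-mass bin."""
    for interval, mean_length in bin_means.items():
        if body_mass_g in interval:
            return mean_length
    return None  # mass outside the range we observed

predict_flipper_length(4000.0)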
Example: Suppose we want to predict a penguin's species based on its body mass and flipper length. Describe a scheme for doing this.
sns.relplot(x="body_mass_g", y="flipper_length_mm", hue="species", s=10, data=p[p["species"]!="Adelie"])
[scatter plot: flipper_length_mm vs. body_mass_g, colored by species, Adelie excluded]
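One possible scheme (again just a sketch): pick a single cutoff on body mass. The 4500 g threshold below is eyeballed from the scatter plot, not fitted from the data.

# one simple scheme for Gentoo vs. Chinstrap: a single body-mass threshold
def predict_species(body_mass_g):
    return "Gentoo" if body_mass_g > 4500 else "Chinstrap"

# fraction of non-Adelie penguins this rule labels correctly
two = p[p["species"] != "Adelie"].dropna(subset=["body_mass_g"])
(two["body_mass_g"].apply(predict_species) == two["species"]).mean()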
Any classification or regression problem can be cast in terms very similar to those above. The number of variables you base your prediction on may change, but that doesn't fundamentally change the problem.
While there are many schemes for classification and regression, many of the most commonly used ones are built on top of linear models. This means the prediction is a linear function of the input variables: $$ y = c_0 + c_1 x_1 + \ldots + c_d x_d, $$ where the $c_i$ are coefficients that we may need to figure out from the data.
It may not seem so at first, but the natural language for talking about these models (and most of the fancier ones built on top of them) is linear algebra, because linear functions are most naturally represented using vectors and matrices.
We will talk about vectors and matrices. A vector is sort of like the math version of a pandas Series, and a matrix is sort of like the math version of a pandas DataFrame.
Exercise: Write the linear regression model from above, $$ y = c_0 + c_1 x_1 + \ldots + c_d x_d, $$ using vector notation.
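One way to write it: collect the inputs and coefficients into vectors $\mathbf{x} = (x_1, \ldots, x_d)^T$ and $\mathbf{c} = (c_1, \ldots, c_d)^T$, so that $$ y = c_0 + \mathbf{c}^T \mathbf{x}. $$ If we also prepend a constant $1$ to $\mathbf{x}$ and $c_0$ to $\mathbf{c}$, the whole model collapses to a single dot product, $y = \mathbf{c}^T \mathbf{x}$.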
One of numpy's main reasons for existing is to make linear algebra easy to code.
# make two 5d column vectors of integers
a = np.random.randint(1, 10, size=(5,1))
a
array([[2], [5], [1], [8], [5]])
b = np.random.randint(1, 10, size=(5,1))
b
array([[4], [4], [3], [7], [5]])
# transpose one
a.T
array([[2, 5, 1, 8, 5]])
# dot it with another vector, 4 ways (@, matmul, dot, sum elementwise)
a.T @ b
np.matmul(a.T, b)
np.dot(a.T, b)
np.sum(a * b)
112
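Note the shapes: `a.T @ b`, `np.matmul(a.T, b)`, and `np.dot(a.T, b)` each return a $1 \times 1$ array holding 112, while `np.sum(a * b)` returns a plain scalar; calling `.item()` on the array versions extracts the scalar.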
# L2 norm, 2 ways
np.linalg.norm(a)
np.sqrt(np.dot(a.T, a))
array([[10.90871211]])
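The shapes differ here too: `np.linalg.norm(a)` returns a plain scalar, while `np.sqrt(np.dot(a.T, a))` returns a $1 \times 1$ array, which is why the output above is wrapped in `array([[...]])`.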
# calculate the distance between them
np.linalg.norm(b-a)
3.1622776601683795
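We can verify this by hand: $b - a = (2, -1, 2, -1, 0)^T$, so $$\|b - a\| = \sqrt{2^2 + (-1)^2 + 2^2 + (-1)^2 + 0^2} = \sqrt{10} \approx 3.162.$$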
X = np.array([[0,1,2],[3,4,5]])
X
array([[0, 1, 2], [3, 4, 5]])
Y = np.array([[1,2],[3,4],[5,6]])
Y
array([[1, 2], [3, 4], [5, 6]])
a = np.array([3, 2, 1]).reshape((3,1))
a
array([[3], [2], [1]])
# matrix-vector multiplication
np.matmul(X, a)
array([[ 4], [22]])
# matrix-matrix multiplication
X.dot(Y) # despite the name, this does matrix multiplication
array([[13, 16], [40, 52]])
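Each entry of $XY$ is the dot product of a row of $X$ with a column of $Y$; for example, the top-left entry is $0 \cdot 1 + 1 \cdot 3 + 2 \cdot 5 = 13$.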
If $A$ has dimensions $a \times b$, then $B$ must have dimensions $b \times c$ for the product $AB$ to be defined, and the result has dimensions $a \times c$. The second dimension of the first matrix must match the first dimension of the second, and that shared dimension is "swallowed" by the multiplication.
Example using $X, Y, a$ from above.
$$X^{2 \times 3} \, Y^{3 \times 2} = (XY)^{2 \times 2}$$
$$Y^{3 \times 2} \, X^{2 \times 3} = (YX)^{3 \times 3}$$
$$X^{2 \times 3} \, a^{3 \times 1} = (Xa)^{2 \times 1}$$
$$Y^{3 \times 2} \, a^{3 \times 1} = \text{n/a (dimension mismatch)}$$
Notice: if $X$ is $a \times b$, then $X^T$ is $b \times a$.
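We can confirm these shape rules in numpy using the $X$, $Y$, and $a$ defined above:

# confirm the dimension rules from above
(X @ Y).shape   # (2, 2)
(Y @ X).shape   # (3, 3)
(X @ a).shape   # (2, 1)
# Y @ a raises a ValueError: the inner dimensions (2 and 3) don't match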
Let $X$ be a $7 \times 7$ matrix, $Y$ a $13 \times 6$ matrix, and $Z$ a $6 \times 7$ matrix.

For each of the following products, say whether the result is defined, and if so, what its dimensions are.

Worked example: $X Z^T Y^T$, which has shapes $7 \times 7 \cdot 7 \times 6 \cdot 6 \times 13$. We can evaluate it in either of two orders:
Multiply $X Z^T$ first:
$$7 \times 7 \cdot 7 \times 6 \cdot 6 \times 13$$
$$7 \times 6 \cdot 6 \times 13$$
$$7 \times 13$$

Multiply $Z^T Y^T$ first:
$$7 \times 7 \cdot 7 \times 6 \cdot 6 \times 13$$
$$7 \times 7 \cdot 7 \times 13$$
$$7 \times 13$$
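Both orders give the same $7 \times 13$ result because matrix multiplication is associative, but they cost different amounts of work: multiplying an $a \times b$ matrix by a $b \times c$ matrix takes roughly $a \cdot b \cdot c$ scalar multiplications, so the first order costs $7 \cdot 7 \cdot 6 + 7 \cdot 6 \cdot 13 = 840$ while the second costs $7 \cdot 6 \cdot 13 + 7 \cdot 7 \cdot 13 = 1183$.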