Lecture 18 - Linear Algebra: Why, and the Basics¶

In [2]:
import numpy as np
import seaborn as sns

Announcements:¶

  • Faculty candidate talks this week:
    • Thursday 4pm CF 105 Research Talk
      • Title: Estimating Demand for Online Shopping using Limited Historical Observation
    • Friday 4pm CF 316 Teaching Demo
      • Title: Introduction to Algorithms for Graphs: Representations, and Search Algorithms
  • Generalization content from last Wednesday - we'll pick this up again next week. For now we'll make a beeline for what you need for lab 7.
  • FP group formation - by Wednesday night
  • FP proposal: by Friday night
  • Starting with lab 6, the labs should be getting a little lighter-weight to account for the final project.

Goals:¶

  • Understand the basic idea behind linear models and why linear algebra is the standard language for describing machine learning models in general.
  • Know the key definitions, notations, and operations of linear algebra.
    • Vectors
    • Matrices
    • Transposes
    • Dot product
    • Matrix multiplication
  • Know the constraints on dimensionality in matrix multiplication and be able to calculate the dimensions of the result of an operation.

Recall: Classification and Regression¶

Most prediction tasks fall into one of two categories: classification or regression. The basic distinction is whether you're trying to predict a discrete categorical property, or a continuous numerical property.

  • Classification is the problem of predicting one of a discrete set of labels.
  • Regression is the problem of predicting a real number.
In [3]:
p = sns.load_dataset('penguins')
p
Out[3]:
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 Male
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 Female
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 Female
3 Adelie Torgersen NaN NaN NaN NaN NaN
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 Female
... ... ... ... ... ... ... ...
339 Gentoo Biscoe NaN NaN NaN NaN NaN
340 Gentoo Biscoe 46.8 14.3 215.0 4850.0 Female
341 Gentoo Biscoe 50.4 15.7 222.0 5750.0 Male
342 Gentoo Biscoe 45.2 14.8 212.0 5200.0 Female
343 Gentoo Biscoe 49.9 16.1 213.0 5400.0 Male

344 rows × 7 columns

Example: Consider flipper length and body mass. Suppose it's easier to put a penguin on a scale than to pin it down and measure its flipper with a measuring tape. Come up with a scheme to predict flipper length given only body mass. Note: no fancy "linear regression" allowed - tell me a scheme in terms an 8th grader could understand!

In [4]:
sns.relplot(x="body_mass_g", y="flipper_length_mm", s=5, data=p)
Out[4]:
<seaborn.axisgrid.FacetGrid at 0x7f7070fa14c0>
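One possible scheme for the exercise above, in 8th-grade terms: group the penguins into body-mass bins, and predict the average flipper length of the penguins in each bin. A minimal sketch (the 500g bin width is an arbitrary choice, not from the lecture):

In [ ]:
# bin penguins by body mass in 500g-wide bins (an arbitrary choice),
# then predict the mean flipper length within each bin
bins = (p["body_mass_g"] // 500) * 500
p.groupby(bins)["flipper_length_mm"].mean()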

Example: Suppose we want to predict a penguin's species based on its body mass and flipper length. Describe a scheme for doing this.

In [5]:
sns.relplot(x="body_mass_g", y="flipper_length_mm", hue="species", s=10, data=p[p["species"]!="Adelie"])
Out[5]:
<seaborn.axisgrid.FacetGrid at 0x7f703dc5fc70>
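Similarly, one possible scheme for this exercise: draw a boundary through the gap between the two clusters and predict based on which side a penguin lands on. A crude single-threshold sketch (the 4500g cutoff is eyeballed from the plot, not fitted):

In [ ]:
# predict Gentoo for heavier penguins, Chinstrap for lighter ones
# (the 4500g threshold is eyeballed from the plot above, not fitted)
np.where(p["body_mass_g"] > 4500, "Gentoo", "Chinstrap")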

Linear Algebra: Why?¶

Any classification or regression problem can be cast in terms very similar to those above. The number of variables you base your prediction on may change, but that doesn't fundamentally change the problem.

While there are many schemes for classification and regression, many of the most commonly used ones are built on top of linear models. This means a few things:

  • The regressed value $y$ is a linear function of the input variables ($x_1 \ldots x_d$).
$$ y = c_0 + c_1 x_1 + \ldots + c_d x_d$$
  • The predicted probability $p(Y=y)$ of a given class is a linear function of the input variables.
$$ p(Y=y) = c_0 + c_1 x_1 + \ldots + c_d x_d$$

where $c_i$ are some coefficients that we may need to figure out from the data.
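For instance, with made-up numbers: if $d = 2$, the coefficients are $(c_0, c_1, c_2) = (1, 0.5, 2)$, and the input is $(x_1, x_2) = (3, 4)$, then

$$ y = 1 + 0.5 \cdot 3 + 2 \cdot 4 = 10.5 $$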

It may not seem so at first, but the natural language for talking about these models (and most of the fancier ones built on top of them) is linear algebra, because linear functions are naturally represented using vectors and matrices.

Linear Algebra: The Essentials¶

We will talk about vectors and matrices. A vector is sort of like the math version of a pandas Series, and a matrix is sort of like the math version of a pandas DataFrame.

Vectors¶

  • A vector is an ordered sequence of numbers
  • Notation:
    • $x \in A$ means that element $x$ is in set $A$
    • If $x \in \mathbb{R}^D$ then $x$ is a vector of $D$ real numbers
      • I.e., $\mathbb{R}^D$ is the set of real-valued, D-dimensional vectors
    • If $x \in \mathbb{Z}^D$, $x$ is a vector of $D$ integers
      • I.e., $\mathbb{Z}^D$ is the set of D-dimensional, integer-valued vectors
  • Geometric interpretation
    • If $x \in \mathbb{R}$ ($D=1$; i.e., $x$ is a scalar), then $x$ is a point along the real line
    • $x \in \mathbb{R}^2$ is a point on a 2-dimensional plane
    • $x \in \mathbb{R}^3$ is a point in 3d space
    • etc.
  • Note: by default, we will assume $x$ is a column vector: D elements high by 1 element wide
    • You can transpose $x$, denoted $x^T$ which turns it into a row vector (i.e., 1 element high by D elements wide).
    • For math: we assume column vector by default
    • For machine learning libraries: often assume row vector by default
  • What can you do to/with vectors?
    • For the following, assume $x,y,z \in \mathbb{R}^D$
    • Index into them: $x_i$ is the $i$th element of $x$, assuming $i \in \{1,2,\dots,D\}$.
    • Add or subtract them (element-wise), assuming they have the same dim
      • $ z = x + y$ means $z_i = x_i + y_i$ for $i = 1,2,\dots,D$.
    • Multiply by a scalar
      • $z = ax$ where $a \in \mathbb{R}$, then $z_i = a x_i$.
    • Compute the inner product (or dot product) between two vectors: $x^T y = \sum_{i=1}^D x_i y_i$.
    • Compute the length of a vector: $\Vert x \Vert_2 = \sqrt{x^T x} = \sqrt{\sum_{i=1}^D x_i^2}$.
      • This is known as the $\ell_2$ norm of the vector.
    • Compute the distance between two vectors: $\Vert x-y \Vert_2$.

Exercise Write the linear regression model from above $$ y = c_0 + c_1 x_1 + \ldots + c_d x_d $$ using vector notation.
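One common answer, as a sketch (the "prepend a 1" trick is standard, though the symbols here are my own): stack the coefficients into a vector $c = [c_0, c_1, \ldots, c_d]^T$ and prepend a constant $1$ to the inputs, $\tilde{x} = [1, x_1, \ldots, x_d]^T$. Then the whole model is a single dot product:

$$ y = c^T \tilde{x} $$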

Vectors in numpy¶

One of numpy's main reasons for existing is to make linear algebra easy to code.

In [8]:
# make two 5d column vectors of integers
a = np.random.randint(1, 10, size=(5,1))
a
Out[8]:
array([[2],
       [5],
       [1],
       [8],
       [5]])
In [9]:
b = np.random.randint(1, 10, size=(5,1))
b 
Out[9]:
array([[4],
       [4],
       [3],
       [7],
       [5]])
In [12]:
# transpose one
a.T
Out[12]:
array([[2, 5, 1, 8, 5]])
In [18]:
# dot it with another vector, 4 ways (@, matmul, dot, sum elementwise)
a.T @ b             # (1x5) @ (5x1) -> a 1x1 array
np.matmul(a.T, b)   # same as @
np.dot(a.T, b)      # also a 1x1 array
np.sum(a * b)       # elementwise multiply, then sum -> plain scalar (shown below)
Out[18]:
112
In [20]:
# L2 norm, 2 ways
np.linalg.norm(a)        # returns a scalar
np.sqrt(np.dot(a.T, a))  # returns a 1x1 array (hence the Out below)
Out[20]:
array([[10.90871211]])
In [22]:
# calculate the distance between them
np.linalg.norm(b-a)
Out[22]:
3.1622776601683795

Matrices¶

  • A matrix generalizes a vector
  • $X \in \mathbb{R}^{N \times D}$ has $N$ rows and $D$ columns (i.e., is $N$ high and $D$ wide). Each element is a real number.
  • Geometric intuition? We won't be able to get deep enough in this class to appreciate them. :(
  • What can we do to matrices?
    • Index into them: $X_{ij}$ is the element at row $i$ and column $j$
    • Transpose: If $X \in \mathbb{R}^{N \times D}$ then $X^T \in \mathbb{R}^{D \times N}$, and the $(i,j)$th element of $X$ is the $(j,i)$th element of $X^T$.
    • Add and subtract them: $Z = X + Y$ then $Z_{ij} = X_{ij} + Y_{ij}$.
    • Multiply by a scalar: $Z = aX$ then $Z_{ij} = a X_{ij}$.
    • Matrix multiplication is not element-wise multiplication
      • $X \in \mathbb{R}^{M \times N}$ and $Y \in \mathbb{R}^{N \times P}$, then
        • $Z = XY \in \mathbb{R}^{M \times P}$
        • $Z_{ik} = \sum_{j=1}^N X_{ij} Y_{jk}$
        • $Z_{ik}$ is the dot product of the $i$th row of $X$ and the $k$th column of $Y$
      • Example: using the $Y$ and $X$ defined in the cells below, $YX$ results in a $3 \times 3$ matrix.
        • $Z_{1,1} = (1)(0) + (2)(3) = 6$ (row 1 of $Y$ dotted with column 1 of $X$)
In [23]:
X = np.array([[0,1,2],[3,4,5]])
X
Out[23]:
array([[0, 1, 2],
       [3, 4, 5]])
In [24]:
Y = np.array([[1,2],[3,4],[5,6]])
Y
Out[24]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [25]:
a = np.array([3, 2, 1]).reshape((3,1))
a
Out[25]:
array([[3],
       [2],
       [1]])
In [26]:
# matrix-vector multiplication
np.matmul(X, a)
Out[26]:
array([[ 4],
       [22]])
In [28]:
# matrix-matrix multiplication
X.dot(Y) # despite the name, this does matrix multiplication
Out[28]:
array([[13, 16],
       [40, 52]])
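As a quick check of the $YX$ example from the bullet list above (not one of the original cells, just a sketch):

In [ ]:
# matrix-matrix multiplication in the other order: (3x2) @ (2x3) -> 3x3
Y @ X  # top-left entry: (1)(0) + (2)(3) = 6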

Dimension Compatibility: Whiteboard¶

If $A$ has dimensions $a \times b$, $B$ must have dimensions $b \times c$ for $AB$ to work. The dimensions of the result are $a \times c$.

The second dimension of the first matrix must match the first dimension of the second, and that shared dimension is "swallowed" by the multiplication.

Example using $X, Y, a$ from above.

$$X^{2 \times 3} Y^{3 \times 2} = (XY)^{2 \times 2}$$

$$Y^{3 \times 2} X^{2 \times 3} = (YX)^{3 \times 3}$$

$$X^{2 \times 3} a^{3 \times 1} = (Xa)^{2 \times 1}$$

$$Y^{3 \times 2} a^{3 \times 1} = \text{(n/a - dimension mismatch)}$$
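The last case is easy to demonstrate in numpy with the $Y$ and $a$ defined earlier (a quick sketch): the multiplication raises an error because the inner dimensions don't match.

In [ ]:
# Y is 3x2 and a is 3x1: the inner dimensions (2 vs. 3) don't match,
# so matrix multiplication raises an error
try:
    Y @ a
except ValueError as e:
    print(e)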

Exercises: Matrix multiplication and dimension compatibility¶

Notice: if $X$ is $a \times b$, then $X^T$ is $b \times a$.

Let:

  • $X \in \mathbb{R}^{7 \times 7}$
  • $Y \in \mathbb{R}^{13 \times 6}$
  • $Z \in \mathbb{R}^{6 \times 7}$

For each of the following, say whether the result is:

  • A: Dim mismatch / error
  • B: $6 \times 7$
  • C: $7 \times 13$

Questions:

  1. $YZ$
  2. $Z^TX$
  3. $XYZ$
  4. $ZXY$
  5. $ZXY^T$ (only $Y$ is being transposed)
  6. $ZXX$
  7. $XZ^TY^T$
In [ ]:

Worked solution for question 7, $X Z^T Y^T$, whose dimensions are $(7 \times 7)(7 \times 6)(6 \times 13)$. Either grouping gives the same final shape.

Multiply $X Z^T$ first:

$$(7 \times 7)(7 \times 6)(6 \times 13) \to (7 \times 6)(6 \times 13) \to 7 \times 13$$

Multiply $Z^T Y^T$ first:

$$(7 \times 7)(7 \times 6)(6 \times 13) \to (7 \times 7)(7 \times 13) \to 7 \times 13$$
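You can also sanity-check shape questions like these in numpy by building placeholder arrays with np.zeros and asking for the result's shape (a sketch; note that it shadows the $X$ and $Y$ from the earlier cells):

In [ ]:
# placeholder arrays with the exercise's shapes
# (note: this shadows the X and Y defined in earlier cells)
X = np.zeros((7, 7))
Y = np.zeros((13, 6))
Z = np.zeros((6, 7))

# question 7: X Z^T Y^T
(X @ Z.T @ Y.T).shape  # -> (7, 13), i.e., answer C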