
A Brief Primer on Linear Algebra

In this notebook, we will explore a few fundamental concepts in Linear Algebra which are important for machine learning. If you are already familiar with concepts in Linear Algebra, this notebook will be a review.

Learning Objectives

By the end of this notebook, you should be able to

  1. Consider solving a system of equations as a matrix multiplication operation.

  2. List and carry out some common matrix operations.

Import modules

Begin by importing the modules to be used in this notebook

import numpy as np
import matplotlib.pyplot as plt

Systems of Equations with Matrices

In machine learning (and almost all branches of science), we find ourselves carrying out many operations at the same time.

Consider having a model that yields some value $y$ in terms of some independent data $x$ with a slope $m$ and intercept $b$. This is a simple linear equation:

$$
y = mx + b
$$

To be concrete with our example, let’s envision that our slope is $m=4$ and our intercept is $b=5$, and we want to compute our modeled values for some independent points, say, $x_1 = 1$, $x_2 = 2$ and $x_3 = 3$. Clearly we could go about solving all of these equations one by one:

$$
\begin{align*}
y_1 & = 4 (1) + 5 = 9\\
y_2 & = 4 (2) + 5 = 13\\
y_3 & = 4 (3) + 5 = 17
\end{align*}
$$

We can also do this with Python:

# model parameters
m = 4
b = 5

# independent data
x_1 = 1
x_2 = 2
x_3 = 3

# model computations
y_1 = m*x_1 + b
print('y_1 =',y_1)
y_2 = m*x_2 + b
print('y_2 =',y_2)
y_3 = m*x_3 + b
print('y_3 =',y_3)
y_1 = 9
y_2 = 13
y_3 = 17

This is a little cumbersome since we’re doing the same thing over and over again. One alternative is to group our data into a matrix - a collection of numbers that allow us to represent these operations.

For this particular example, we can define a matrix $\textbf{X}$ as

$$
\textbf{X} = \begin{bmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}
$$

and a matrix $\textbf{w}$ as

$$
\textbf{w} = \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}
$$

The upshot of organizing our data in these matrices is that we can now use well-known rules for matrix multiplication to do our calculation in one swoop. We write this as follows:

$$
\textbf{y} = \textbf{X} \cdot \textbf{w}
$$

where here, $\textbf{y}$ is a matrix with the values $y_1$, $y_2$, and $y_3$. This is built into Python’s numpy package in a few ways:

# make the X and w matrices
X = np.array([[1,1],
              [2,1],
              [3,1]])
w = np.array([[4],
              [5]])

# use the np.dot function
y = np.dot(X, w)
print('np.dot(X,w)=')
print(y)

# use the @ shorthand notation
y = X@w
print('\nX@w=')
print(y)

# use the matmul function
y = np.matmul(X,w)
print('\nnp.matmul(X,w)=')
print(y)
np.dot(X,w)=
[[ 9]
 [13]
 [17]]

X@w=
[[ 9]
 [13]
 [17]]

np.matmul(X,w)=
[[ 9]
 [13]
 [17]]

We see above that, in the way we compute the matrix multiplication, it is following the exact same computations as we did one by one - just a lot more succinct. For this reason, matrices are the preferred method for organizing lots of computations in one concise way. There is an entire field of math called Linear Algebra dedicated to these and related operations.

Matrix Operations

In the above example, we saw the result of matrix multiplication. Here, we’ll explain how this works for any two matrices and then show a few other examples of matrix operations.

For the examples below, we’ll work with the two matrices $\textbf{A}$ and $\textbf{B}$ defined here:

$$
\textbf{A} = \begin{bmatrix} 1 & 3 & 6\\ 4 & 6 & 2\\ 1 & 4 & 5 \end{bmatrix} \, \text{ and } \, \textbf{B} = \begin{bmatrix} 7 & 3 & 2\\ 4 & 3 & 9\\ 8 & 4 & 2 \end{bmatrix}
$$

We can define these in Python too:

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])

B = np.array([[7, 3, 2],
              [4, 3, 9],
              [8, 4, 2]])

Matrix Multiplication

To multiply two matrices to produce a third, we compute the dot product of each row of the first matrix with each column of the second matrix. For example, the first row of $\textbf{A}$ is $\begin{bmatrix} 1 & 3 & 6 \end{bmatrix}$ and the first column of $\textbf{B}$ is $\begin{bmatrix} 7 & 4 & 8 \end{bmatrix}$, and the dot product is:

$$
\textbf{AB}_{1,1} = (1)(7) + (3)(4) + (6)(8) = 67
$$

The other elements of $\textbf{AB}$ can be calculated similarly and summarized by the following formula:

$$
\textbf{AB}_{r,c} = \sum_{i=1}^{n} \textbf{A}_{r,i} \textbf{B}_{i,c}
$$

where $r$ is an index for rows, $c$ is an index for columns, and $n$ is the number of columns of $\textbf{A}$, which must equal the number of rows of $\textbf{B}$. We can check the result of each of these dot products in Python:

A@B
array([[67, 36, 41],
       [68, 38, 66],
       [63, 35, 48]])
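To make the summation formula concrete, here is a sketch of the same computation written with explicit loops (the helper name matmul_loops is ours, not numpy’s); it should agree with A@B:

```python
import numpy as np

def matmul_loops(A, B):
    # AB[r, c] = sum over i of A[r, i] * B[i, c]
    n_rows, n_inner = A.shape   # n_inner = columns of A = rows of B
    n_cols = B.shape[1]
    AB = np.zeros((n_rows, n_cols), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            for i in range(n_inner):
                AB[r, c] += A[r, i] * B[i, c]
    return AB

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])
B = np.array([[7, 3, 2],
              [4, 3, 9],
              [8, 4, 2]])

print(matmul_loops(A, B))
```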

Matrix Addition

In a similar approach, we can also add matrices together. Matrix addition is done element-wise, i.e.

$$
(\textbf{A+B})_{r,c} = \textbf{A}_{r,c} + \textbf{B}_{r,c}
$$

We can check this in Python:

A+B
array([[ 8,  6,  8],
       [ 8,  9, 11],
       [ 9,  8,  7]])
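Because the operation is element-wise, a sketch with explicit loops reproduces numpy’s result, and the order of the operands doesn’t matter:

```python
import numpy as np

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])
B = np.array([[7, 3, 2],
              [4, 3, 9],
              [8, 4, 2]])

# element-wise definition: (A+B)[r, c] = A[r, c] + B[r, c]
S = np.zeros(A.shape, dtype=int)
for r in range(A.shape[0]):
    for c in range(A.shape[1]):
        S[r, c] = A[r, c] + B[r, c]

print(S)
print(np.array_equal(S, A + B))  # True: the loop matches numpy
print(np.array_equal(A + B, B + A))  # True: addition commutes
```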

Matrix Inversion

Another common operation is matrix inversion. One way to think about matrix inversion is by analogy with scalar operations. For example, 4 and 1/4 are multiplicative “inverses” of each other because

$$
\left(4\right)\left( \frac{1}{4} \right) = \left(4\right)\left( 4^{-1} \right) = 1
$$

Similarly, an inverse of a matrix $\textbf{A}$, written as $\textbf{A}^{-1}$, is one that yields

$$
\textbf{A}\textbf{A}^{-1} = \textbf{I}
$$

where $\textbf{I}$ is the “identity” matrix - a matrix which is all 0’s except for 1’s on the diagonal.
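For reference, numpy’s np.eye builds an identity matrix, and multiplying any matrix by it leaves that matrix unchanged - a quick sketch:

```python
import numpy as np

# 3x3 identity matrix: 1's on the diagonal, 0's elsewhere
I = np.eye(3, dtype=int)
print(I)

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])

# multiplying by the identity returns A unchanged
print(np.array_equal(A @ I, A))  # True
```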

We can compute the inverse of a matrix in Python as follows:

# use the np.linalg.inv function to compute A inverse
A_inverse = np.linalg.inv(A)
print(A_inverse)
[[ 0.78571429  0.32142857 -1.07142857]
 [-0.64285714 -0.03571429  0.78571429]
 [ 0.35714286 -0.03571429 -0.21428571]]

We can check that this is the inverse of $\textbf{A}$ by checking that the multiplication gives the identity matrix:

# verify the inverse calculation
(A_inverse@A).round(3)
array([[ 1.,  0.,  0.],
       [ 0.,  1., -0.],
       [ 0., -0.,  1.]])
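As a sketch of how the inverse “undoes” multiplication by $\textbf{A}$, we can transform a vector with A and then recover it with A_inverse (the vector x here is chosen arbitrarily for illustration):

```python
import numpy as np

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])
A_inverse = np.linalg.inv(A)

# an arbitrary vector, transformed by A...
x = np.array([[2],
              [1],
              [3]])
b = A @ x

# ...is recovered by multiplying with the inverse
x_recovered = A_inverse @ b
print(x_recovered.round(3))
```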

Note that only “square” matrices - those with an equal number of rows and columns - can have an inverse, and even then not every square matrix is invertible.

The inversion calculation is useful because it allows us to “undo” a transformation. For example, above, we modeled a system of equations as

$$
\textbf{y} = \textbf{X} \cdot \textbf{w}
$$

Instead of having our model parameters $\textbf{w}$ and using them to compute the outputs, matrix inversion allows us to go the other way - computing the model parameters given $\textbf{y}$, i.e.

$$
\textbf{w} = \textbf{X}^{-1}\textbf{y}
$$

This is the same thing as solving a system of equations! However, there is one caveat here - we can only solve a system of equations with $n$ unknown parameters exactly if we have $n$ independent equations. Here, that’s equivalent to saying that $\textbf{X}$ must have an equal number of rows and columns. In the example given above, that’s not the case. And, in general in machine learning, it won’t be the case. The best we can hope for is to estimate our parameters with the least possible error. To run this calculation, we’ll need one more idea about matrices:

Matrix Transpose

The transpose of a matrix $\textbf{A}$, denoted $\textbf{A}^T$, is the matrix where the rows and columns have been swapped, i.e.

$$
(\textbf{A}^T)_{r,c} = \textbf{A}_{c,r}
$$

There are two ways to compute $\textbf{A}^T$ in Python:

## numpy function
A_transpose = np.transpose(A)
print('np.transpose(A)=')
print(A_transpose)

## shorthand notation
A_transpose = A.T
print('\nA.T=')
print(A_transpose)
np.transpose(A)=
[[1 4 1]
 [3 6 4]
 [6 2 5]]

A.T=
[[1 4 1]
 [3 6 4]
 [6 2 5]]
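One handy property of the transpose is that it reverses the order of a product: $(\textbf{AB})^T = \textbf{B}^T\textbf{A}^T$. A quick sketch checking this numerically:

```python
import numpy as np

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])
B = np.array([[7, 3, 2],
              [4, 3, 9],
              [8, 4, 2]])

# transposing a product reverses the order of the factors
print(np.array_equal((A @ B).T, B.T @ A.T))  # True
```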

Matrix Pseudo-Inversion

In cases where the matrix $\textbf{A}$ is not “square”, i.e. it doesn’t have an equal number of rows and columns, we can compute a pseudo-inverse, defined as

$$
\textbf{A}^+ = (\textbf{A}^T\textbf{A})^{-1}\textbf{A}^T
$$

In the context of systems of equations, the pseudo-inverse gives the best-fit parameters in the least-squares sense. Let’s test out this idea with our data above. If we repeat the definitions of $\textbf{y}$ and $\textbf{X}$ above:

X = np.array([[1,1],
              [2,1],
              [3,1]])
y = np.array([[9],
              [13],
              [17]])

We can use the pseudo-inverse to recover $\textbf{w}$:

w = np.linalg.inv(X.T@X)@X.T@y
print(w)
[[4.]
 [5.]]
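numpy also provides np.linalg.pinv, which computes the pseudo-inverse directly (using a more numerically stable routine under the hood) and should recover the same parameters:

```python
import numpy as np

X = np.array([[1, 1],
              [2, 1],
              [3, 1]])
y = np.array([[9],
              [13],
              [17]])

# np.linalg.pinv computes the Moore-Penrose pseudo-inverse of X
w = np.linalg.pinv(X) @ y
print(w.round(3))
```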

Key Takeaways

The sections above give a small glimpse into the world of linear algebra with some key ideas used in machine learning. Hopefully the following points are clear from the above discussion:

  1. Matrices are a convenient way to organize linear model calculations that transform independent data (“features”) into a related set of dependent values (“targets”).

  2. Much like calculations carried out with scalar values, matrices have analogous methods for addition, multiplication, inversion and other related properties.