In this notebook, we will explore a few fundamental concepts in Linear Algebra which are important for machine learning. If you are already familiar with concepts in Linear Algebra, this notebook will be a review.
Learning Objectives
By the end of this notebook, you should be able to
Consider solving a system of equations as a matrix multiplication operation.
List and carry out some common matrix operations.
Import modules
Begin by importing the modules to be used in this notebook:
import numpy as np
import matplotlib.pyplot as plt
Systems of Equations with Matrices
In machine learning (and almost all branches of science), we find ourselves carrying out many operations at the same time.
Consider having a model that yields some value $y$ in terms of some independent data $x$ with a slope $m$ and intercept $b$. This is a simple linear equation:

$$y = mx + b$$
To be concrete with our example, let's envision that our slope is $m = 4$ and our intercept is $b = 5$, and we want to compute our modeled values for some independent points, say, $x_1 = 1$, $x_2 = 2$, and $x_3 = 3$. Clearly we could go about solving all of these equations one by one:

$$\begin{aligned} y_1 &= 4(1) + 5 = 9 \\ y_2 &= 4(2) + 5 = 13 \\ y_3 &= 4(3) + 5 = 17 \end{aligned}$$
We can also do this with Python:
# model parameters
m = 4
b = 5
# independent data
x_1 = 1
x_2 = 2
x_3 = 3
# model computations
y_1 = m*x_1 + b
print('y_1 =',y_1)
y_2 = m*x_2 + b
print('y_2 =',y_2)
y_3 = m*x_3 + b
print('y_3 =',y_3)
y_1 = 9
y_2 = 13
y_3 = 17
This is a little cumbersome since we're doing the same thing over and over again. One alternative is to group our data into a matrix - a collection of numbers that allows us to represent these operations compactly.
For this particular example, we can define a matrix $X$ as

$$X = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 3 & 1 \end{bmatrix}$$

and a matrix $w$ as

$$w = \begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$$
The upshot of organizing our data in these matrices is that we can now use well-known rules for matrix multiplication to do our calculation in one swoop. We write this as follows:

$$y = Xw$$

where here, $y$ is a matrix with the values $y_1$, $y_2$, and $y_3$. This is built into Python's numpy package in a couple of ways:
# make the X and w matrices
X = np.array([[1,1],
[2,1],
[3,1]])
w = np.array([[4],
[5]])
# use the np.dot function
y = np.dot(X, w)
print('np.dot(X,w)=')
print(y)
# use the @ shorthand notation
y = X@w
print('\nX@w=')
print(y)
# use the matmul function
y = np.matmul(X,w)
print('\nnp.matmul(X,w)=')
print(y)
np.dot(X,w)=
[[ 9]
[13]
[17]]
X@w=
[[ 9]
[13]
[17]]
np.matmul(X,w)=
[[ 9]
[13]
[17]]
We see above that the matrix multiplication carries out the exact same computations as we did one by one - just far more succinctly. For this reason, matrices are the preferred method for organizing lots of computations in one concise way. There is an entire field of math called Linear Algebra dedicated to these and related operations.
Matrix Operations
In the above example, we saw the result of matrix multiplication. Here, we’ll explain how this works for any two matrices and then show a few other examples of matrix operations.
For the examples below, we'll work with the two matrices $A$ and $B$ defined here:

$$A = \begin{bmatrix} 1 & 3 & 6 \\ 4 & 6 & 2 \\ 1 & 4 & 5 \end{bmatrix}, \qquad B = \begin{bmatrix} 7 & 3 & 2 \\ 4 & 3 & 9 \\ 8 & 4 & 2 \end{bmatrix}$$
We can define these in Python too:
A = np.array([[1, 3, 6],
[4, 6, 2],
[1, 4, 5]])
B = np.array([[7, 3, 2],
[4, 3, 9],
[8, 4, 2]])
Matrix Multiplication
To multiply two matrices to produce a third, we compute the dot product of each row of the first matrix with each column of the second matrix. For example, the first row of $A$ is $[1, 3, 6]$ and the first column of $B$ is $[7, 4, 8]$, and the dot product is:

$$1(7) + 3(4) + 6(8) = 7 + 12 + 48 = 67$$
The other elements of $AB$ can be calculated similarly, as summarized by the following formula:

$$(AB)_{ij} = \sum_{k} A_{ik} B_{kj}$$

where $i$ is an index for rows and $j$ is an index for columns. We can check the result of each of these dot products in Python:
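Before reaching for the built-in operator, the index formula above can be translated directly into explicit loops as a sanity check. This is a sketch for illustration only - in practice you would always use NumPy's built-in multiplication, which is far faster:

```python
import numpy as np

# the matrices defined above
A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])
B = np.array([[7, 3, 2],
              [4, 3, 9],
              [8, 4, 2]])

# implement (AB)_ij = sum_k A_ik * B_kj with explicit loops
C = np.zeros((3, 3), dtype=int)
for i in range(3):          # rows of A
    for j in range(3):      # columns of B
        for k in range(3):  # dot product of row i of A with column j of B
            C[i, j] += A[i, k] * B[k, j]

print(C)
# the loop result matches NumPy's built-in matrix multiplication
print(np.array_equal(C, A @ B))
```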
A@B
array([[67, 36, 41],
       [68, 38, 66],
       [63, 35, 48]])
Matrix Addition
In a similar approach, we can also add matrices together. Matrix addition is done element-wise, i.e.

$$(A + B)_{ij} = A_{ij} + B_{ij}$$
We can check this in Python:
A+B
array([[ 8,  6,  8],
       [ 8,  9, 11],
       [ 9,  8,  7]])
Matrix Inversion
Another common operation is matrix inversion. One way to think about matrix inversion is by analogy to scalar operations. For example, 4 and 1/4 are multiplicative "inverses" of each other because

$$4 \times \frac{1}{4} = 1$$
Similarly, an inverse of a matrix $A$, written as $A^{-1}$, is one that yields

$$A^{-1} A = I$$

where $I$ is the "identity" matrix - a matrix which is all 0's except for 1's on the diagonal of the matrix.
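As a quick illustration of the identity matrix, NumPy's `np.eye` builds one directly, and multiplying by it leaves a matrix unchanged (a small sketch using the matrix $A$ from above):

```python
import numpy as np

# np.eye builds an identity matrix: 1's on the diagonal, 0's elsewhere
I = np.eye(3)
print(I)

A = np.array([[1, 3, 6],
              [4, 6, 2],
              [1, 4, 5]])

# multiplying by the identity leaves a matrix unchanged
print(np.array_equal(I @ A, A))
print(np.array_equal(A @ I, A))
```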
We can compute the inverse of a matrix in Python as follows:
# use the np.linalg.inv function to compute A inverse
A_inverse = np.linalg.inv(A)
print(A_inverse)
[[ 0.78571429  0.32142857 -1.07142857]
 [-0.64285714 -0.03571429  0.78571429]
 [ 0.35714286 -0.03571429 -0.21428571]]
We can check that this is the inverse of $A$ by verifying that the multiplication gives the identity matrix:
# verify the inverse calculation
(A_inverse@A).round(3)
array([[ 1.,  0.,  0.],
       [ 0.,  1., -0.],
       [ 0., -0.,  1.]])
Looking at the calculations above, we can also deduce that for this calculation to work, the number of columns in the first matrix must equal the number of rows in the second matrix.
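This dimension rule is easy to see with the matrices from the earlier example - a sketch showing one product that works and one that fails:

```python
import numpy as np

X = np.array([[1, 1],
              [2, 1],
              [3, 1]])   # shape (3, 2)
w = np.array([[4],
              [5]])      # shape (2, 1)

# (3, 2) @ (2, 1): inner dimensions match, so the result is (3, 1)
print((X @ w).shape)

# reversing the order gives (2, 1) @ (3, 2): the inner dimensions
# 1 and 3 do not match, so NumPy raises an error
try:
    w @ X
except ValueError as e:
    print('ValueError raised')
```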
The inversion calculation is useful because it allows us to "undo" a transformation. For example, above, we modeled a system of equations as

$$y = Xw$$
Instead of having our model parameters and using them to compute the outputs, matrix inversion allows us to go the other way - compute the model parameters $w$ given $X$ and $y$, i.e.

$$w = X^{-1} y$$
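To see this in action, here is a sketch with a small square system, assuming we keep only the first two $(x, y)$ pairs from the example above so that $X$ is invertible:

```python
import numpy as np

# a square system: two equations, two unknowns (m and b)
X = np.array([[1, 1],
              [2, 1]])
y = np.array([[9],
              [13]])

# recover the parameters with an explicit inverse: w = X^{-1} y
w = np.linalg.inv(X) @ y
print(w)

# np.linalg.solve does the same job without forming the inverse,
# which is generally faster and more numerically stable
print(np.linalg.solve(X, y))
```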
This is the same thing as solving a system of equations! However, there is one caveat here - we can only solve a system of equations with $n$ unknown parameters if we have $n$ equations. Here, that's equivalent to saying that $X$ must have an equal number of rows and columns. In the example given above, that's not the case. And, in general in machine learning, it won't be the case. The best we can hope for is to estimate our parameters with the least possible error. To run this calculation, we'll need one more idea about matrices:
Matrix Transpose
The transpose of a matrix $A$, denoted $A^T$, is the matrix where the rows and columns have been swapped, i.e.

$$(A^T)_{ij} = A_{ji}$$

There are two ways to compute $A^T$ in Python:
## numpy function
A_transpose = np.transpose(A)
print('np.transpose(A)=')
print(A_transpose)
## shorthand notation
A_transpose = A.T
print('\nA.T=')
print(A_transpose)
np.transpose(A)=
[[1 4 1]
[3 6 4]
[6 2 5]]
A.T=
[[1 4 1]
[3 6 4]
[6 2 5]]
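One consequence worth noting: for a non-square matrix, transposing swaps the shape, so $X^T X$ is always square. A short sketch with the $X$ from our earlier example:

```python
import numpy as np

# a non-square matrix: 3 rows, 2 columns
X = np.array([[1, 1],
              [2, 1],
              [3, 1]])
print(X.shape)    # rows x columns
print(X.T.shape)  # transposing swaps rows and columns

# X.T @ X is (2, 2) - square, so it can be inverted
print((X.T @ X).shape)
```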
Matrix Pseudo-Inversion
In cases where the matrix $X$ is not "square", i.e. it doesn't have an equivalent number of rows and columns, we can compute a pseudo-inverse, defined as

$$X^{+} = (X^T X)^{-1} X^T$$
In the context of systems of equations, the pseudo-inverse gives the best parameters to fit some data. Let's test out this idea with our data above. If we repeat the definitions of $X$ and $y$ above:
X = np.array([[1,1],
[2,1],
[3,1]])
y = np.array([[9],
[13],
[17]])
We can use the pseudo-inverse to recover $w$:
w = np.linalg.inv(X.T@X)@X.T@y
print(w)
[[4.]
 [5.]]
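NumPy also provides `np.linalg.pinv`, which computes the pseudo-inverse in one call (internally it uses a singular value decomposition rather than the explicit formula above, which is more robust numerically). A sketch with the same data:

```python
import numpy as np

X = np.array([[1, 1],
              [2, 1],
              [3, 1]])
y = np.array([[9],
              [13],
              [17]])

# np.linalg.pinv computes the pseudo-inverse directly
w = np.linalg.pinv(X) @ y
print(w.round(6))
```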
Key Takeaways
The sections above give a small glimpse into the world of linear algebra with some key ideas used in machine learning. Hopefully the following points are clear from the above discussion:
Matrices are a convenient way to organize linear model calculations that transform independent data (“features”) into a related set of dependent values (“targets”).
Much like calculations carried out with scalar values, matrices have analogous methods for addition, multiplication, inversion and other related properties.