5. Maths4ML: Matrices
#Matrices are Space Warpers
A matrix is not just a spreadsheet or a container for data. It is a function, a machine. In an equation like $y = Ax$ (in a neural network, linear regression, etc.), the matrix $A$ is an agent that grabs the data vector ($x$), physically moves it, warps it and transforms it into a new position ($y$).
Matrices can stretch space and collapse dimensions.
#Matrix transforming the Basis Vectors
Let the basis vectors be denoted by $\hat{i}$ and $\hat{j}$:
- $\hat{i} = (1, 0)$: a green arrow pointing right.
- $\hat{j} = (0, 1)$: a red arrow pointing up.
The columns of a matrix tell exactly where these 2 arrows will land after the transformation.
Column 1: where the basis vector $\hat{i}$ now lives, having moved from $(1, 0)$.
Column 2: where the basis vector $\hat{j}$ now lives, having moved from $(0, 1)$.
Every other point on the grid will follow the new grid lines (basis) formed by these 2 arrows.
So, for example, the point $(2, 1)$ in the old system meant going 2 units along $\hat{i}$ and 1 unit along $\hat{j}$. The recipe stays the same, but since the basis vectors now point in different directions, the same coordinates land at a completely different position.
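A minimal NumPy sketch of this idea, assuming a hypothetical matrix `A` picked only for illustration: the basis vectors land on the columns of `A`, and the point $(2, 1)$ follows the same "2 of green, 1 of red" recipe using the new arrows.

```python
import numpy as np

# Hypothetical transformation matrix, chosen only for illustration.
A = np.array([[1.0, -1.0],
              [2.0,  0.5]])

i_hat = np.array([1.0, 0.0])  # green arrow, pointing right
j_hat = np.array([0.0, 1.0])  # red arrow, pointing up

new_i = A @ i_hat  # lands on column 1 of A
new_j = A @ j_hat  # lands on column 2 of A

# The point (2, 1) means "2 units of i_hat plus 1 unit of j_hat".
p = np.array([2.0, 1.0])
moved_p = A @ p

# Same recipe, but followed along the transformed arrows.
same_recipe = 2 * new_i + 1 * new_j

print("new i_hat:", new_i)
print("new j_hat:", new_j)
print("moved point:", moved_p)
```

Both `moved_p` and `same_recipe` come out as the same vector, which is the whole point: the coordinates did not change, the arrows did.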
#Fundamental Matrices
#1. Scaling
Doubling the length of the green & red arrows makes the matrix zoom in on the data. For example: $\begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$
#2. Rotation
The arrows keep their length but pivot by an angle $\theta$; the matrix spins the entire world. The grid remains square but tilted: $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
#3. Shearing
If the bottom of a square is fixed and the top is pushed sideways, the square turns into a parallelogram.
The image below shows the matrix transformations covered above.

Standard matrix transformations (Scaling, Rotation & Shearing)
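The three transformations can be sketched in NumPy as follows. The helper names (`scaling`, `rotation`, `shear_x`) and the unit-square test shape are assumptions made for this illustration, not something taken from the figure.

```python
import numpy as np

def scaling(sx, sy):
    """Stretch the x-axis by sx and the y-axis by sy."""
    return np.array([[sx, 0.0],
                     [0.0, sy]])

def rotation(theta):
    """Rotate counter-clockwise by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def shear_x(k):
    """Keep the bottom fixed and push the top sideways by k."""
    return np.array([[1.0, k],
                     [0.0, 1.0]])

# Corners of the unit square, stored one corner per column.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

scaled = scaling(2, 2) @ square        # a square twice as large
rotated = rotation(np.pi / 2) @ square  # same square, turned by 90 degrees
sheared = shear_x(1.0) @ square         # a parallelogram

print(scaled)
print(np.round(rotated, 6))
print(sheared)
```

Storing the corners as columns means a single matrix product moves the whole shape at once.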
#Matrix Vector Multiplication
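The standard method is Row-by-Column multiplication:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$$

It is computationally correct but it doesn't offer any intuition. The same thing can be represented as:

$$x \begin{bmatrix} a \\ c \end{bmatrix} + y \begin{bmatrix} b \\ d \end{bmatrix}$$

This is literally saying:
Take $x$ steps along the transformed Green Arrow (Column 1) and then take $y$ steps along the transformed Red Arrow (Column 2).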
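As a quick check of the two views, here is a small sketch with a hypothetical matrix and vector (the numbers are made up for illustration). Both the row-by-column computation and the "steps along columns" computation give the same answer.

```python
import numpy as np

# Hypothetical matrix and vector, chosen only for illustration.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
v = np.array([2.0, -1.0])

# View 1: row-by-column (each output entry is a row of A dotted with v).
row_by_column = np.array([A[0] @ v, A[1] @ v])

# View 2: v[0] steps along column 1 plus v[1] steps along column 2.
column_view = v[0] * A[:, 0] + v[1] * A[:, 1]

print(row_by_column)
print(column_view)
```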
#Matrix-Matrix Multiplication
A matrix-matrix multiplication like $C = AB$ using the standard row-by-column method is a mess of numbers.
A better way is to look at matrix $B$ as a collection of columns (vectors): $B = \begin{bmatrix} b_1 & b_2 \end{bmatrix}$.
So now, instead of one big operation, it is just matrix-vector multiplication done twice, once for each column of $B$.
- Column 1 of result: $A$ acts on the first column of $B$.
- Column 2 of result: $A$ acts on the second column of $B$.
- Column 1 of matrix $C$ is the result of passing column 1 of $B$ through machine (matrix) $A$: $c_1 = A b_1$
- Column 2 of matrix $C$ is the result of passing column 2 of $B$ through machine (matrix) $A$: $c_2 = A b_2$
So, finally, matrix $C$ is just these 2 results pasted side by side: $C = \begin{bmatrix} c_1 & c_2 \end{bmatrix}$
Thus, $AB = \begin{bmatrix} A b_1 & A b_2 \end{bmatrix}$
The output columns of $C$ are literally just weighted sums of the columns of $A$. The resulting shape must live inside the space defined by $A$'s columns.
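The column-by-column view can be sketched like this, assuming two small hypothetical matrices `A` and `B`. Each column of the product is built with one matrix-vector multiplication and the results are pasted side by side.

```python
import numpy as np

# Hypothetical 2x2 matrices, chosen only for illustration.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
B = np.array([[2.0, 0.0],
              [1.0, 3.0]])

# Pass each column of B through the machine A.
c1 = A @ B[:, 0]
c2 = A @ B[:, 1]

# Paste the two results side by side.
C = np.column_stack([c1, c2])

print(C)
print(A @ B)  # NumPy's own product, for comparison
```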
#Matrix Multiplication is Function Composition
Matrix multiplication is simply chaining multiple machines. In $y = BAx = B(Ax)$, read from right to left:
The vector $x$ is the raw material.
- Machine $A$ is the first machine, which transforms it.
- Machine $B$ is the second machine, which grabs the result of Machine $A$ and transforms it again.
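A small sketch of the chaining idea, assuming a hypothetical shear as the first machine and a 90 degree rotation as the second. Feeding the vector through the two machines one after the other gives the same result as first multiplying the matrices and applying the combined machine once.

```python
import numpy as np

# Hypothetical machines, chosen only for illustration.
first = np.array([[1.0, 1.0],    # a shear
                  [0.0, 1.0]])
second = np.array([[0.0, -1.0],  # a 90 degree counter-clockwise rotation
                   [1.0,  0.0]])

x = np.array([1.0, 2.0])  # the raw material

# Run the machines one after the other.
step_by_step = second @ (first @ x)

# Build the combined machine once, then apply it.
combined_machine = second @ first
all_at_once = combined_machine @ x

print(step_by_step, all_at_once)
```

Note the order in the product: the machine that acts first sits on the right.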
#Why MatMul is not commutative
Suppose 2 matrices $A$ & $B$, which stretch the $x$-axis by 2 and rotate everything by $90^\circ$ counter-clockwise respectively.
#Scenario 1 : Stretch then rotate
- $A$ stretches left-right.
- $B$ rotates so that left becomes bottom & right becomes top.
#Scenario 2 : Rotate then stretch
- $B$ rotates so that left becomes bottom & right becomes top.
- $A$ stretches the original top-bottom (which are now left-right).
Thus, even though the same operations are applied, the order changes everything: $BA \neq AB$.
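The two scenarios can be checked numerically. This is a sketch under the assumption that the stretch doubles the x-axis and the rotation is 90 degrees counter-clockwise; the variable names are illustrative.

```python
import numpy as np

stretch = np.array([[2.0, 0.0],   # doubles the x-axis
                    [0.0, 1.0]])
rotate = np.array([[0.0, -1.0],   # 90 degrees counter-clockwise
                   [1.0,  0.0]])

# Scenario 1: stretch first, then rotate (the first machine sits on the right).
stretch_then_rotate = rotate @ stretch

# Scenario 2: rotate first, then stretch.
rotate_then_stretch = stretch @ rotate

print(stretch_then_rotate)  # [[0, -1], [2, 0]]
print(rotate_then_stretch)  # [[0, -2], [1, 0]]
print(np.allclose(stretch_then_rotate, rotate_then_stretch))  # False
```

In Scenario 1 the stretched direction ends up vertical; in Scenario 2 the stretch hits whatever the rotation has already laid along the x-axis.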
#Invertible Matrices
Inverting a matrix means chaining matrices to get back to where we started.
- Matrix $A$ is a Shear Right matrix.
- Matrix $A^{-1}$ is a Shear Left matrix.
Therefore, $A^{-1}A$ will slant a square right and then push it back to the original shape.
Thus, $A^{-1}A = I$
$I$ is the identity matrix that does nothing.
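A minimal sketch of the shear example, assuming a unit shear (the exact amount of shear is a made-up value for illustration). Shearing right and then left returns the identity, and `np.linalg.inv` recovers the same undo matrix.

```python
import numpy as np

# Hypothetical unit shears, chosen only for illustration.
shear_right = np.array([[1.0, 1.0],
                        [0.0, 1.0]])
shear_left = np.array([[1.0, -1.0],
                       [0.0,  1.0]])

# Slant right, then push back: the net effect is "do nothing".
round_trip = shear_left @ shear_right

# NumPy computes the same undo matrix from shear_right alone.
computed_inverse = np.linalg.inv(shear_right)

print(round_trip)
print(computed_inverse)
```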
#Transpose
The mechanical definition of the transpose $A^T$ is to swap the rows & columns of $A$.
#Co-variance
$X^TX$ is the similarity map of the data $X$. Let there be a dataset $X$ of 3 students, with one row per student, consisting of their study time & score.
- Column 1: Blue vector is the study vector.
- Column 2: Red vector is the score vector.
In $X^T$, Row 1 is now the study vector & Row 2 is the score vector.
Thus, if each column of $X$ is first mean-centered, $X^TX$ (up to a factor of $1/(n-1)$) will become:
- Cell (1,1): Variance of the Study vector.
- Cell (2,2): Variance of the Score vector.
- Cell (1,2) & (2,1): Covariance of the Score & Study vectors.
- Diagonals: How spread out is this feature? (Variance)
- Off-Diagonals: How much does Feature A look like Feature B? (Covariance/Similarity)
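Here is a sketch of the computation with hypothetical numbers for the 3 students (the study hours and scores below are invented for illustration, not the post's data). It assumes the usual sample covariance convention: center each column, then divide $X^TX$ by $n - 1$.

```python
import numpy as np

# Hypothetical data: one row per student, columns = (study hours, score).
X = np.array([[1.0,  50.0],
              [2.0,  60.0],
              [3.0, 100.0]])

# Center each column so that it has zero mean.
X_centered = X - X.mean(axis=0)

# Similarity map of the centered data, scaled by 1 / (n - 1).
n = X.shape[0]
cov = (X_centered.T @ X_centered) / (n - 1)

print(cov)
print(np.cov(X, rowvar=False))  # NumPy's built-in, for comparison
```

The diagonal holds the two variances and the off-diagonal cells hold the study/score covariance, so the matrix is symmetric.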
#Symmetric Matrix
A symmetric matrix is equal to its own transpose: $A = A^T$.
Take a matrix that is not equal to its transpose, such as a shear.
This is an asymmetric matrix. If the input is a circle, this matrix will grab the top & slide it sideways. The result will be an oval, but a smeared one.
Now take a matrix that is equal to its transpose.
This is a symmetric matrix. It will also stretch a circle, but the result will be an ellipse whose stretching directions are perpendicular to each other.
- Asymmetric Matrix: Might shear space, twist it, and squash it at weird angles.
- Symmetric Matrix: Creates a shape where the axes of stretching are perpendicular.
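A sketch of the contrast, using two hypothetical matrices picked for illustration. The helper `is_symmetric` is an assumed name, and `np.linalg.eigh` is used here as one way to read off the stretching directions of the symmetric matrix (its eigenvectors).

```python
import numpy as np

def is_symmetric(M):
    """True when M equals its own transpose."""
    return np.allclose(M, M.T)

# Hypothetical examples, chosen only for illustration.
asymmetric = np.array([[1.0, 1.0],   # a shear
                       [0.0, 1.0]])
symmetric = np.array([[2.0, 1.0],
                      [1.0, 2.0]])

print(is_symmetric(asymmetric))  # False
print(is_symmetric(symmetric))   # True

# For a symmetric matrix, eigh returns the stretch factors (ascending)
# and the directions they act along, one direction per column.
stretch_factors, directions = np.linalg.eigh(symmetric)

# The two stretching directions are perpendicular: their dot product is ~0.
dot = directions[:, 0] @ directions[:, 1]
print(stretch_factors, dot)
```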
#Trace
The trace ($\mathrm{tr}(A)$) of a matrix is the sum of its diagonal elements.
In a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:
- Off-diagonal elements:
  - $c$: tells how much $\hat{i}$ (the green arrow) points up into the y-axis.
  - $b$: tells how much $\hat{j}$ (the red arrow) points right into the x-axis.
They describe how much $x$ becomes $y$ and $y$ becomes $x$.
- Diagonal elements:
  - $a$: tells how much $\hat{i}$ stretches while staying along the x-axis.
  - $d$: tells how much $\hat{j}$ stretches while staying along the y-axis.
They describe the direct stretching.
So $\mathrm{tr}(A) = a + d$ tells how much the matrix is pushing outward along the original grid lines.
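The Trace ignores the mixing. It only asks: "On average, is the machine stretching things out or shrinking them in?" This sign-based reading is exact when the matrix describes a rate of change ($\dot{x} = Ax$), where the trace is the net rate at which area grows; for a one-shot transformation it is only a rough guide.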
- Trace > 0 : Matrix is generally expanding the space.
- Trace < 0 : Matrix is generally collapsing the space.
- Trace = 0 : The expansion in one direction is perfectly cancelled by contraction in the other.
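A tiny sketch with a hypothetical matrix: the trace adds up the diagonal only, so the off-diagonal (mixing) entries can be anything without changing it.

```python
import numpy as np

# Hypothetical matrix, chosen only for illustration.
A = np.array([[2.0,  5.0],
              [7.0, -3.0]])

trace = np.trace(A)            # 2 + (-3) = -1
by_hand = A[0, 0] + A[1, 1]

# Changing only the off-diagonal entries leaves the trace untouched.
A_mixed = A.copy()
A_mixed[0, 1] = 100.0
A_mixed[1, 0] = -40.0

print(trace, by_hand, np.trace(A_mixed))
```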
#Range (Column Space)
Range of a matrix is the Span of its columns.
- Span of a set of vectors is the set of all the vectors that can be formed by scaling & adding those vectors.
Thus, the Column Space (Range) is the set of vectors that can be obtained by taking all possible linear combinations of the matrix's column vectors.
#Null Space
The null space (or kernel) of a matrix $A$ is the set of all vectors $x$ that satisfy the equation $Ax = 0$ (the zero vector).
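One way to find a null-space vector numerically is through the singular value decomposition. The sketch below assumes a hypothetical $2 \times 2$ matrix whose second column is twice the first, so it squashes one direction to zero; the right singular vector paired with the zero singular value is that direction.

```python
import numpy as np

# Hypothetical matrix with redundant columns (column 2 = 2 * column 1).
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Rows of vt are input directions; s holds how much each one is stretched,
# sorted from largest to smallest.
_, s, vt = np.linalg.svd(A)

# The last direction is stretched by (numerically) zero: it is in the null space.
null_vector = vt[-1]

print(s)
print(null_vector)
print(A @ null_vector)  # approximately the zero vector
```

The sign and scale of `null_vector` are arbitrary; any multiple of it is also sent to zero.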
#Rank
Rank is a single number which measures the dimension of the column space. It tells the number of actual, non-redundant columns in a matrix. Take, for example, a matrix with 3 columns:
- Column 3 = Column 1 + Column 2.
- Thus, there are only 2 dimensions, as the third column is just a diagonal lying in the plane defined by the first 2 columns.
- Thus, Rank = 2.
Dimensionality reduction builds on this fact: throw away the fake dimensions and keep only the Rank dimensions (the true signals).
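The redundant-column example can be sketched as follows. The two starting columns are hypothetical values chosen for illustration; the third is built as their sum, so `np.linalg.matrix_rank` reports 2 rather than 3.

```python
import numpy as np

# Hypothetical independent columns, chosen only for illustration.
col1 = np.array([1.0, 0.0, 2.0])
col2 = np.array([0.0, 1.0, 1.0])

# A "fake" third dimension: it lies in the plane of the first two columns.
col3 = col1 + col2

M = np.column_stack([col1, col2, col3])

rank = np.linalg.matrix_rank(M)
print(M)
print("rank:", rank)  # 2
```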
To summarize:
| Concept | Definition | Intuition |
|---|---|---|
| Columns | The vectors that make up the matrix $A$. | The Raw tools, Arrows, some of which may be redundant |
| Span | The set of all possible linear combinations of a list of vectors: $c_1 v_1 + c_2 v_2 + \dots + c_n v_n$. | The Cloud. The total shape created by stretching and combining the raw tools in every possible way |
| Range (Column Space) | The subspace of outputs reachable by the linear transformation $T(x) = Ax$. Mathematically equivalent to the Span of the columns. | The Reach. When we view the matrix as a machine, the Range is the specific "territory" the machine can touch. |
| Basis | A minimal set of linearly independent vectors that spans a subspace. | The Skeleton. If you strip away all the redundant columns (the fake tools), this is the clean, efficient set of arrows left over that still builds the same Cloud. |
| Rank | The dimension of the Column Space. | The Score. A single number representing the "True Dimension" of the output. It tells how many useful dimensions exist in your data. |
The Columns of the matrix generate a Span. When viewed as a function, this Span is called the Range. The smallest set of vectors needed to describe this Range is the Basis, and the count of vectors in that Basis is the Rank.
#Space Warper
Modify the transformation matrix by dragging the basis vectors ($\hat{i}$ and $\hat{j}$) or by changing the slider values representing the matrix entries:
- Vector $\hat{i}$ represents column 1.
- Vector $\hat{j}$ represents column 2.
When the green arrow aligns with the red arrow, it signifies a dimension loss.
With this, the post on matrices, their geometric interpretation, the types of matrices and the different matrix operations is complete.