
6. Maths4ML: Matrix Decomposition & SVD

Matrix Decomposition : Prime Factorization of Matrices

The process of breaking down a complex matrix into a product of simpler matrices.

  • It is computationally efficient to perform operations like matrix inversion on triangular or diagonal matrices rather than on a dense matrix.
  • Decomposition reveals hidden (latent) structures in the data that are not visible in the raw matrix.

There are many types of decompositions. Some of the fundamental decompositions are :

  1. LU Decomposition
  2. QR Decomposition
  3. Eigen Decomposition
  4. SVD

1. LU Decomposition

It splits a square matrix $A$ into 2 triangular matrices, $L$ and $U$, such that $A = LU$. It is like Gaussian Elimination with a memory.

  • $L$ : Lower Triangular matrix. It represents the record of the row operations (multipliers) used during elimination.
  • $U$ : Upper Triangular matrix. This is the Row Echelon Form, the final result of the elimination process.

Example

Goal is to find $L$ & $U$ such that $A = LU$.

  • To make $A$ upper triangular (Gaussian elimination), apply row operations of the form $R_i \to R_i - m_{ij} R_j$ to eliminate the entries below the diagonal.
  • The result of this elimination is the upper triangular matrix $U$.
  • $L$ is a lower triangular matrix. Its diagonal entries will always be one.
  • If the row operation $R_i \to R_i - m_{ij} R_j$ creates a zero at position $(i, j)$ in $U$, then the entry of $L$ at position $(i, j)$ is exactly $m_{ij}$.

Thus, we obtain the factors $L$ and $U$.

To verify, multiply them back and check that $LU = A$.

Use of LU Decomposition

Solving a system $Ax = b$ by explicitly computing $x = A^{-1}b$ is computationally expensive ($O(n^3)$) and numerically unstable. With $A = LU$, it is solvable in two fast steps:

  1. Forward Substitution : Solve $Ly = b$ for $y$.
  2. Backward Substitution : Solve $Ux = y$ for $x$.

It is used in almost all linear algebra libraries, like NumPy.
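As a quick illustration, here is a minimal sketch using SciPy (the matrix and right-hand side are arbitrary; note that library implementations add partial pivoting, so `scipy.linalg.lu` returns a permuted factorization $A = PLU$):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

# Hypothetical square system A x = b (values are arbitrary)
A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
b = np.array([1., 2., 3.])

# Library routines use partial pivoting, so the factorization is A = P L U
P, L, U = lu(A)

# 1. Forward substitution:  solve L y = P^T b for y
y = solve_triangular(L, P.T @ b, lower=True)
# 2. Backward substitution: solve U x = y for x
x = solve_triangular(U, y, lower=False)

print(np.allclose(A @ x, b))  # True
```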


2. QR Decomposition

It decomposes the matrix $A$ into $A = QR$ :

  • $Q$ : Orthogonal matrix whose columns are orthonormal.
  • $R$ : Upper Triangular matrix.

  1. Orthogonal :
  • Two vectors are orthogonal if they are at a $90 \degree$ angle to each other.
    • Their dot product is zero ($a \cdot b = 0$).
  2. Orthonormal :
  • Two vectors are orthonormal if they are perpendicular and they both have a length of exactly 1.
    • Their dot product is zero ($a \cdot b = 0$).
    • Length of each vector is 1 ($\|a\| = 1$ and $\|b\| = 1$).

Normalization : Turning an Orthogonal set into an Orthonormal set by dividing each vector by its own length.

Another important fact is : $Q^\top Q = I$. This means $Q^{-1} = Q^\top$.
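A quick numerical sanity check of this fact (the matrix being factorized is arbitrary):

```python
import numpy as np

# Q from the QR factorization of an arbitrary matrix is orthogonal
A = np.random.rand(4, 4)
Q, R = np.linalg.qr(A)

print(np.allclose(Q.T @ Q, np.eye(4)))     # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))  # so Q^{-1} = Q^T
```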


Example

Using the Gram-Schmidt process to decompose this matrix :

Column vectors of $A$ : $a_1$ and $a_2$.

Goal : Turn $a_1$ and $a_2$ into orthonormal vectors $q_1$ and $q_2$.

  1. Find $q_1$ :
  • Compute the length of the first column, $\|a_1\|$.
  • $q_1 = a_1 / \|a_1\|$.
  2. Make $a_2$ orthogonal to $q_1$, then normalize to find $q_2$ :
  • Find the projection of $a_2$ onto $q_1$ : $(q_1 \cdot a_2)\, q_1$.
  • Subtract the projection from $a_2$ to get the orthogonal part : $u_2 = a_2 - (q_1 \cdot a_2)\, q_1$.
  • Normalize $u_2$ to get $q_2$ ($q_2 = u_2 / \|u_2\|$).

Thus, $Q = [\,q_1 \;\; q_2\,]$.

Finding $R$ : $R$ contains the dot products (projections) calculated during the Gram-Schmidt process; equivalently, $R = Q^\top A$.
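A minimal NumPy sketch of these two steps; since the worked example's numbers are not shown above, the 2-column matrix below is a hypothetical stand-in:

```python
import numpy as np

# Hypothetical 2-column matrix (the worked example's values are not shown above)
A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
a1, a2 = A[:, 0], A[:, 1]

# Step 1: normalize the first column
q1 = a1 / np.linalg.norm(a1)

# Step 2: remove the q1-component of a2, then normalize what is left
u2 = a2 - (q1 @ a2) * q1
q2 = u2 / np.linalg.norm(u2)

Q = np.column_stack([q1, q2])
R = Q.T @ A   # upper triangular: the projections recorded during Gram-Schmidt

print(np.allclose(Q @ R, A))  # True: A = QR
print(np.round(R, 3))         # the entry below the diagonal is (numerically) zero
```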

Use of QR Decomposition

  • It provides numerical stability, as orthogonal matrices do not amplify errors.

Least Square Problem

The problem is : $\min_x \|Ax - b\|^2$.

Find the value of $x$ that minimizes the squared Euclidean distance between $Ax$ and $b$.

As discussed above, QR decomposition will give $A = QR$. Thus, $QRx = b$.

Multiplying both sides by $Q^\top$ : $Q^\top Q R x = Q^\top b$.

Because $Q^\top Q = I$, this reduces to $Rx = Q^\top b$.

It is much faster to solve a system with an Upper Triangular Matrix (back substitution on $Rx = Q^\top b$).
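A minimal sketch of this recipe, assuming an arbitrary overdetermined system (NumPy's `lstsq` is used only to cross-check the answer):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical overdetermined system: 5 equations, 2 unknowns
A = np.random.rand(5, 2)
b = np.random.rand(5)

# Least squares via QR: solve R x = Q^T b by back substitution
Q, R = np.linalg.qr(A)                 # reduced QR: Q is 5x2, R is 2x2
x_qr = solve_triangular(R, Q.T @ b, lower=False)

# Cross-check against NumPy's built-in least-squares solver
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_qr, x_ref))        # True
```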

QR decomposition takes skewed, correlated vectors (the columns of $A$) and mathematically "straightens" them into a perfectly perpendicular grid (the columns of $Q$), while recording the original "lean" or correlation in the triangular matrix $R$.


To summarize,

  • Solving Systems : Break $A$ into triangular forms to solve faster. (Key example : LU)
  • Orthogonalizing : Isolate the direction of vectors from their scaling/shear. (Key example : QR Decomposition)
  • Spectral Analysis : Find the axes along which the matrix just "stretches" space. (Key examples : Eigendecomposition for square matrices, SVD)

Singular Value Decomposition (SVD)

While Eigenvectors identify lines that stay parallel during a transformation, SVD generalizes this concept to all matrices, even rectangular ones. It reveals the fundamental geometry of data transformations by breaking a complex operation into three distinct, elementary actions.

The core principle behind SVD is :

Every linear transformation, no matter how complex, is actually just a sequence of three simple moves.

For any real matrix $A$ of size $m \times n$ (where $m$ is the number of rows/samples and $n$ is the number of columns/features), the decomposition is:

$A = U \Sigma V^\top$

It represents : $Ax = U \Sigma V^\top x$. To understand the equation, it must be read from right to left, as applied to a vector $x$ in $\mathbb{R}^n$ :

  1. $V^\top$ : Rotation/Reflection in the Input Space ($n$-dimensional).
  2. $\Sigma$ : Scaling/Stretching along the axes.
  3. $U$ : Rotation/Reflection in the Output Space ($m$-dimensional).

1. $V^\top$ (The First Rotation)

The transpose of the Right Singular Matrix.

  • It is an orthogonal matrix.
  • It takes the input vectors and rotates them to align with the "natural axes" of the data.
  • It does not change the length of vectors, only their orientation.
  • The rows of $V^\top$ (which are the columns of $V$) are the eigenvectors of $A^\top A$.
    • $v_1$ (the first row of $V^\top$) aligns with the direction of maximum variance (the "spine") of the data.
    • $v_2$ aligns with the second greatest variance, perpendicular to $v_1$.
  • This acts as a "change of basis" in the input domain.

2. $\Sigma$ (The Stretch)

The Singular Value Matrix.

  • It is an $m \times n$ diagonal matrix (mostly zeros, with non-negative numbers on the main diagonal).
  • It stretches or shrinks the space along the axes defined by $V$.
    • It doesn't rotate or shear, as it is a diagonal matrix.
  • The diagonal entries are : $\sigma_1, \sigma_2, \ldots, \sigma_r$. These are called Singular Values.
    • They are always real and non-negative, sorted in descending order ($\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$).
    • The value of $\sigma_i$ dictates the "strength" or "energy" of the transformation along that axis.
      • A singular value of 0 indicates that dimension is squashed into nothingness (loss of information).
  • $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^\top A$.

3. $U$ (The Final Rotation)

The Left Singular Matrix.

  • It is an orthogonal matrix.
  • After the input has been rotated ($V^\top$) and stretched ($\Sigma$), it now lives in the output dimensions. $U$ rotates this result to align it with the standard axes of the output space.
  • The columns of $U$ are the eigenvectors of $AA^\top$.
    • These vectors usually represent some patterns in the output space.

In summary :

  • $U$ : $m \times m$, orthogonal. Represents the output space directions. Derived from the eigenvectors of $AA^\top$.
  • $\Sigma$ : $m \times n$, diagonal. Represents the stretching factors (gain). Derived from the square roots of the eigenvalues ($\sigma_i = \sqrt{\lambda_i}$).
  • $V^\top$ : $n \times n$, orthogonal. Represents the input space directions. Derived from the eigenvectors of $A^\top A$.

  • It is called Singular Value Decomposition because it factorizes a matrix specifically to expose the critical Singular Values, the numbers that tell you if and how the matrix collapses space (becomes singular).
  • Geometrically, a Singular Matrix "crushes" space (e.g., flattens a 3D cube into a 2D sheet or 1D line).
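A quick NumPy check of these shapes and of the reconstruction $A = U\Sigma V^\top$ (the matrix values are arbitrary):

```python
import numpy as np

# Arbitrary matrix with m = 2 rows (samples) and n = 3 columns (features)
A = np.random.rand(2, 3)

# full_matrices=True gives U as m x m and Vt as n x n;
# s holds the singular values in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)   # (2, 2) (2,) (3, 3)

# Rebuild the m x n Sigma: singular values on the diagonal, zeros elsewhere
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))  # True: A = U Sigma V^T
```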

Why $A^\top A$ and $AA^\top$ ?

These are covariance matrices. If $A = U\Sigma V^\top$, then :

$A^\top A = (U\Sigma V^\top)^\top (U\Sigma V^\top) = V\Sigma^\top U^\top U\Sigma V^\top$

Since $U$ is orthogonal ($U^\top U = I$),

$A^\top A = V(\Sigma^\top \Sigma)V^\top$

This is exactly like the Eigendecomposition formula ($M = Q\Lambda Q^{-1}$, with $Q^{-1} = Q^\top$ here because $V$ is orthogonal).

  • $V$ : The eigenvector matrix of $A^\top A$.
  • $\Sigma^\top \Sigma$ : The eigenvalue matrix of $A^\top A$.
    • Because $\Sigma$ is diagonal, $\Sigma^\top \Sigma$ just contains the squared singular values ($\sigma_i^2$) on the diagonal (see the numerical check below).
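A small numerical check of this connection (arbitrary matrix):

```python
import numpy as np

A = np.random.rand(2, 3)   # arbitrary matrix

# Singular values of A (descending)
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A, sorted descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]

# The nonzero eigenvalues of A^T A are exactly the squared singular values
print(np.round(s**2, 6))
print(np.round(lam, 6))    # same two values, plus one (numerically) zero
```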

Solved Example of SVD

  • Goal : Find $U$ ($2\times2$), $\Sigma$ ($2\times3$), and $V^\top$ ($3\times3$).

1. Compute $AA^\top$ :

2. Find the eigenvalues of this symmetric matrix :

Solve $\det(AA^\top - \lambda I) = 0$.

  • This gives $\lambda_1$ and $\lambda_2$.

3. Find Singular Values ($\sigma_i$) :

Square roots of the eigenvalues :

$\sigma_1 = \sqrt{\lambda_1}$ and $\sigma_2 = \sqrt{\lambda_2}$

4. Find Eigenvectors (Columns of $U$)

Solving the linear system $(AA^\top - \lambda_i I)\,u_i = 0$ for the vector $u_i$ will give the eigenvectors.

  • For $\lambda_1$ :
  • Normalize (divide by its length, i.e. its magnitude) to get $u_1$ :

  • For $\lambda_2$ :

  • Normalize to get $u_2$ :

Thus, $U = [\,u_1 \;\; u_2\,]$.

5. Construct $\Sigma$ (The Stretch)

$\Sigma$ must have the same dimensions as the original matrix ($2 \times 3$). Place the singular values on the diagonal and pad the rest with zeros.

6. Find $V$ (Right Singular Vectors)

Using the relation $v_i = \frac{1}{\sigma_i} A^\top u_i$.

How did this relation come about ?

$\Sigma$ is diagonal, so transposing $A = U\Sigma V^\top$ gives $A^\top = V\Sigma^\top U^\top$.

Multiply both sides by a vector $u_i$, one of the columns of $U$ : $A^\top u_i = V\Sigma^\top U^\top u_i$.

$U^\top u_i$ will result in a vector of zeros with a single $1$ at index $i$, because $U$ is orthonormal. Hence $A^\top u_i = \sigma_i v_i$, which rearranges to $v_i = \frac{1}{\sigma_i} A^\top u_i$.

  • Calculate $v_1$ (using $u_1$ and $\sigma_1$) :
  • Similarly, calculate $v_2$ (using $u_2$ and $\sigma_2$).
  • Because there is no $\sigma_3$ (the third singular value of a $2 \times 3$ matrix is effectively $0$), it is not possible to use the above formula for $v_3$. But since $V$ is an orthogonal matrix, all its vectors (columns) are perpendicular. Thus, $v_3$ can be found using the cross product.

The vector orthogonal to $v_1$ and $v_2$ is their cross product, normalized by its length, 6.

Finally, $V = [\,v_1 \;\; v_2 \;\; v_3\,]$, and $V^\top$ is its transpose.


Thus, reconstruct $A$ using $A = U\Sigma V^\top$ (a code sketch of these steps follows the list below) :

  1. $V^\top$ takes a 3D vector and rotates it in 3D.
  2. $\Sigma$ takes that 3D vector and removes the last dimension (multiplying it by 0), landing back in 2D space.
  3. $U$ rotates that 2D result.
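Here is a sketch of the same six steps in NumPy. The worked example's actual numbers are not reproduced above, so the $2 \times 3$ matrix below is a hypothetical stand-in:

```python
import numpy as np

# Hypothetical 2x3 matrix (the worked example's actual values are not shown above)
A = np.array([[3., 2., 2.],
              [2., 3., -2.]])

# Steps 1-2: eigenvalues and eigenvectors of the symmetric 2x2 matrix A A^T
AAt = A @ A.T
lam, U = np.linalg.eigh(AAt)      # eigh returns them in ascending order
lam, U = lam[::-1], U[:, ::-1]    # re-sort to descending

# Step 3: singular values are the square roots of the eigenvalues
sigma = np.sqrt(lam)

# Step 5: Sigma has the same 2x3 shape as A, singular values on the diagonal
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(sigma)

# Step 6: right singular vectors via v_i = (1 / sigma_i) A^T u_i
v1 = A.T @ U[:, 0] / sigma[0]
v2 = A.T @ U[:, 1] / sigma[1]
v3 = np.cross(v1, v2)             # third direction: orthogonal to v1 and v2
v3 = v3 / np.linalg.norm(v3)
V = np.column_stack([v1, v2, v3])

# Reconstruction check: U Sigma V^T reproduces A
print(np.allclose(U @ Sigma @ V.T, A))  # True
```

The reconstruction holds for any sign choice of the eigenvectors, because each $v_i$ inherits its sign from the corresponding $u_i$.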

SVD Interactive Explorer

Using this simulation, all the above theory about SVD can be visualized.

Use the presets to see how different linear transformations are decomposed into Rotation → Stretch → Rotation.


Matrix as a Sum of Layers

Instead of a giant wall of numbers, matrices can be visualized as a stack of simple, transparent sheets.

Outer Product

  • Dot Product (Inner Product) : Takes two vectors and squashes them into a single number (a scalar).
  • Outer Product : Takes two vectors and explodes them into a matrix.

A column vector $u$ and a row vector $v^\top$, when multiplied, create a grid (a matrix) where every row is just a copy of $v^\top$ scaled by a number from $u$.

The matrix created this way has Rank 1. Thus, it is the simplest possible matrix. So, all the rows and columns are parallel to each other. One simple pattern is repeated across the grid.
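A tiny NumPy illustration of an outer product (the two vectors are arbitrary):

```python
import numpy as np

u = np.array([1., 2., 3.])   # column vector
v = np.array([4., 5.])       # row vector

# Outer product: a 3x2 grid where row i is v scaled by u[i]
M = np.outer(u, v)
print(M)
# [[ 4.  5.]
#  [ 8. 10.]
#  [12. 15.]]

print(np.linalg.matrix_rank(M))  # 1 -> the simplest possible (non-zero) matrix
```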


  • $U$ is a matrix of columns $u_i$ (the eigenvectors of $AA^\top$).
  • $\Sigma$ is a diagonal matrix of the singular values $\sigma_i$.
  • $V^\top$ is a matrix of rows $v_i^\top$ (the eigenvectors of $A^\top A$).
  1. Multiply $\Sigma V^\top$ : each row $v_i^\top$ gets scaled by its $\sigma_i$.

So, the equation becomes $A = U\,(\Sigma V^\top)$.

  2. Multiplying the Columns × Rows :
  • Multiply $U$ (columns) by the result from Step 1 (rows).

Thus applying this to the SVD matrices :

  • Column 1 of $U$ is $u_1$.
  • Row 1 of $\Sigma V^\top$ is $\sigma_1 v_1^\top$.

Their product gives the first rank-1 layer : $\sigma_1 u_1 v_1^\top$.

Similarly for all other columns and rows :

$A = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \cdots + \sigma_r u_r v_r^\top$

  • $u_i v_i^\top$ is the Pattern.
  • $\sigma_i$ is the Importance / opacity of the pattern in the final data.

$\sigma_i$'s are always in descending order.

So, Layer 1 will capture the most information, and the importance of each subsequent layer decreases.

Summing up all these layers will give the perfect high-resolution image.
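A minimal check of this layered view (arbitrary matrix):

```python
import numpy as np

A = np.random.rand(4, 3)                  # arbitrary data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each layer is sigma_i * u_i * v_i^T, a rank-1 matrix
layers = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

print(np.allclose(sum(layers), A))        # True: the layers add back up to A
```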

Use of SVD (Image Compression)

Because SVD sorts the layers by importance (from largest to smallest), we know that the bottom layers contribute almost nothing to the visible image.

So, by deleting the bottom 50 layers (setting their $\sigma_i$ to 0), we save a huge amount of storage space while having very minimal effect on the final appearance: lose the noise and retain the signal.

SVD proves that a complex dataset is just a sum of simple patterns, ordered by how "loud" they are. You can mute the quiet ones to compress the data without losing the meaning.
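A minimal sketch of rank-$k$ compression. A random array stands in for a real grayscale image here, so the reconstruction error printed at the end will be larger than it would be for a structured photo:

```python
import numpy as np

# A random array stands in for a grayscale image loaded as a 2D array
img = np.random.rand(256, 256)
U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 50                                            # keep only the top-k layers
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

numbers_full = img.size                           # 256 * 256 = 65536 values
numbers_kept = k * (U.shape[0] + Vt.shape[1] + 1) # k columns + k rows + k sigmas
print(numbers_full, numbers_kept)                 # 65536 vs 25650

# Relative reconstruction error (small for a structured photo,
# larger here because random noise has no dominant layers)
print(np.linalg.norm(img - img_k) / np.linalg.norm(img))
```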

Eigenface Compression

To prove that quiet patterns can be muted without losing the meaning, let's look at a generated face. A face is highly structured: two eyes, a nose, and a mouth are always in roughly the same place. SVD exploits this structure to compress the image massively.

Drag the slider from k=1 up to k=50.

  • Rank 1 (The Ghost): The single "loudest" pattern. It captures the average head shape and lighting direction. It looks like a blurred mask.
  • Rank 10 (The Identity): By adding just 9 more layers, the eyes, nose bridge, and mouth become sharp. It is now possible to recognize the person.
  • Rank 50 (The Texture): The final layers add the "quiet" details such as skin texture, noise, and subtle imperfections.

The Result: Even at Rank 10, the image is recognizable, yet we have discarded over 80% of the raw data. This technique (Eigenfaces) was the foundation of early facial recognition systems, reducing complex human faces to a simple list of 10-20 numbers.


With this, the post on the different types of matrix decomposition and their uses, and on SVD and how it works, is complete.