
6. Maths4ML: Matrix Decomposition & SVD

Matrix Decomposition : Prime Factorization of Matrices

The process of breaking down a complex matrix into a product of simpler matrices.

  • It is computationally efficient to perform operations like matrix inversion on triangular or diagonal matrices rather than on a dense matrix.
  • Decomposition reveals hidden (latent) structures in the data that are not visible in the raw matrix.

There are many types of decompositions. Some of the fundamental decompositions are :

  1. LU Decomposition
  2. QR Decomposition
  3. Eigen Decomposition
  4. SVD

1. LU Decomposition

It splits a square matrix $A$ into 2 triangular matrices, $L$ and $U$, such that $A = LU$. It is like Gaussian Elimination with a memory.

  • $L$ : Lower Triangular matrix. It represents the record of the row operations (multipliers) used during elimination.
  • $U$ : Upper Triangular matrix. This is the Row Echelon Form, the final result of the elimination process.

Example

Goal is to find $L$ & $U$ such that $A = LU$.

  • To make $A$ upper triangular (Gaussian elimination), apply row operations of the form $R_i \to R_i - m_{ij} R_j$ to eliminate the entries below the diagonal.
  • The result of this elimination is the upper triangular matrix $U$.
  • $L$ is a lower triangular matrix. Its diagonal entries will always be one.
  • If the row operation $R_i \to R_i - m_{ij} R_j$ creates a zero at position $(i, j)$ in $U$, then the entry of $L$ at position $(i, j)$ is exactly $m_{ij}$.

Thus, we obtain the factors $L$ and $U$.

To verify, multiply them back and check that $LU = A$.

Use of LU Decomposition

Solving a system $Ax = b$ by explicitly computing $x = A^{-1}b$ is computationally expensive ($O(n^3)$) and numerically unstable. With $A = LU$, it is solvable in two fast steps:

  1. Forward Substitution : Solve $Ly = b$ for $y$.
  2. Backward Substitution : Solve $Ux = y$ for $x$.

It is used in almost all linear algebra libraries, like NumPy.
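As a quick illustration, here is a minimal sketch using SciPy (the matrix and right-hand side are arbitrary; note that library implementations add partial pivoting, so `scipy.linalg.lu` returns a permuted factorization $A = PLU$):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

# Hypothetical square system A x = b (values are arbitrary)
A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
b = np.array([1., 2., 3.])

# Library routines use partial pivoting, so the factorization is A = P L U
P, L, U = lu(A)

# 1. Forward substitution:  solve L y = P^T b for y
y = solve_triangular(L, P.T @ b, lower=True)
# 2. Backward substitution: solve U x = y for x
x = solve_triangular(U, y, lower=False)

print(np.allclose(A @ x, b))  # True
```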


2. QR Decomposition

It decomposes the matrix $A$ into $A = QR$ :

  • $Q$ : Orthogonal matrix whose columns are orthonormal.
  • $R$ : Upper Triangular matrix.

  1. Orthogonal :
  • Two vectors are orthogonal if they are at a $90 \degree$ angle to each other.
    • Their dot product is zero ($a \cdot b = 0$).
  2. Orthonormal :
  • Two vectors are orthonormal if they are perpendicular and they both have a length of exactly 1.
    • Their dot product is zero ($a \cdot b = 0$).
    • Length of each vector is 1 ($\|a\| = 1$ and $\|b\| = 1$).

Normalization : Turning an Orthogonal set into an Orthonormal set by dividing each vector by its own length.

Another important fact is : $Q^\top Q = I$. This means $Q^{-1} = Q^\top$.
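A quick numerical sanity check of this fact (the matrix being factorized is arbitrary):

```python
import numpy as np

# Q from the QR factorization of an arbitrary matrix is orthogonal
A = np.random.rand(4, 4)
Q, R = np.linalg.qr(A)

print(np.allclose(Q.T @ Q, np.eye(4)))     # Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))  # so Q^{-1} = Q^T
```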


Example

Using the Gram-Schmidt process to decompose this matrix :

Column vectors of $A$ : $a_1$ and $a_2$.

Goal : Turn $a_1$ and $a_2$ into orthonormal vectors $q_1$ and $q_2$.

  1. Find $q_1$ :
  • Compute the length of the first column, $\|a_1\|$.
  • $q_1 = a_1 / \|a_1\|$.
  2. Make $a_2$ orthogonal to $q_1$, then normalize to find $q_2$ :
  • Find the projection of $a_2$ onto $q_1$ : $(q_1 \cdot a_2)\, q_1$.
  • Subtract the projection from $a_2$ to get the orthogonal part : $u_2 = a_2 - (q_1 \cdot a_2)\, q_1$.
  • Normalize $u_2$ to get $q_2$ ($q_2 = u_2 / \|u_2\|$).

Thus, $Q = [\,q_1 \;\; q_2\,]$.

Finding $R$ : $R$ contains the dot products (projections) calculated during the Gram-Schmidt process; equivalently, $R = Q^\top A$.
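A minimal NumPy sketch of these two steps; since the worked example's numbers are not shown above, the 2-column matrix below is a hypothetical stand-in:

```python
import numpy as np

# Hypothetical 2-column matrix (the worked example's values are not shown above)
A = np.array([[1., 1.],
              [1., 0.],
              [0., 1.]])
a1, a2 = A[:, 0], A[:, 1]

# Step 1: normalize the first column
q1 = a1 / np.linalg.norm(a1)

# Step 2: remove the q1-component of a2, then normalize what is left
u2 = a2 - (q1 @ a2) * q1
q2 = u2 / np.linalg.norm(u2)

Q = np.column_stack([q1, q2])
R = Q.T @ A   # upper triangular: the projections recorded during Gram-Schmidt

print(np.allclose(Q @ R, A))  # True: A = QR
print(np.round(R, 3))         # the entry below the diagonal is (numerically) zero
```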

Use of QR Decomposition

  • It provides numerical stability, as orthogonal matrices do not amplify errors.

Least Square Problem

The problem is : $\min_x \|Ax - b\|^2$.

Find the value of $x$ that minimizes the squared Euclidean distance between $Ax$ and $b$.

As discussed above, QR decomposition will give $A = QR$. Thus, $QRx = b$.

Multiplying both sides by $Q^\top$ : $Q^\top Q R x = Q^\top b$.

Because $Q^\top Q = I$, this reduces to $Rx = Q^\top b$.

It is much faster to solve a system with an Upper Triangular Matrix (back substitution on $Rx = Q^\top b$).
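A minimal sketch of this recipe, assuming an arbitrary overdetermined system (NumPy's `lstsq` is used only to cross-check the answer):

```python
import numpy as np
from scipy.linalg import solve_triangular

# Hypothetical overdetermined system: 5 equations, 2 unknowns
A = np.random.rand(5, 2)
b = np.random.rand(5)

# Least squares via QR: solve R x = Q^T b by back substitution
Q, R = np.linalg.qr(A)                 # reduced QR: Q is 5x2, R is 2x2
x_qr = solve_triangular(R, Q.T @ b, lower=False)

# Cross-check against NumPy's built-in least-squares solver
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_qr, x_ref))        # True
```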

QR decomposition takes skewed, correlated vectors (the columns of $A$) and mathematically "straightens" them into a perfectly perpendicular grid (the columns of $Q$), while recording the original "lean" or correlation in the triangular matrix $R$.


To summarize,

  • Solving Systems : Break $A$ into triangular forms to solve faster. (Key example : LU)
  • Orthogonalizing : Isolate the direction of vectors from their scaling/shear. (Key example : QR Decomposition)
  • Spectral Analysis : Find the axes along which the matrix just "stretches" space. (Key examples : Eigendecomposition for square matrices, SVD)

Singular Value Decomposition (SVD)

While Eigenvectors identify lines that stay parallel during a transformation, SVD generalizes this concept to all matrices, even rectangular ones. It reveals the fundamental geometry of data transformations by breaking a complex operation into three distinct, elementary actions.

The core principle behind SVD is :

Every linear transformation, no matter how complex, is actually just a sequence of three simple moves.

For any real matrix $A$ of size $m \times n$ (where $m$ is the number of rows/samples and $n$ is the number of columns/features), the decomposition is:

$A = U \Sigma V^\top$

It represents : $Ax = U \Sigma V^\top x$. To understand the equation, it must be read from right to left, as applied to a vector $x$ in $\mathbb{R}^n$ :

  1. $V^\top$ : Rotation/Reflection in the Input Space ($n$-dimensional).
  2. $\Sigma$ : Scaling/Stretching along the axes.
  3. $U$ : Rotation/Reflection in the Output Space ($m$-dimensional).

1. $V^\top$ (The First Rotation)

The transpose of the Right Singular Matrix.

  • It is an orthogonal matrix.
  • It takes the input vectors and rotates them to align with the "natural axes" of the data.
  • It does not change the length of vectors, only their orientation.
  • The rows of $V^\top$ (which are the columns of $V$) are the eigenvectors of $A^\top A$.
    • $v_1$ (the first row of $V^\top$) aligns with the direction of maximum variance (the "spine") of the data.
    • $v_2$ aligns with the second greatest variance, perpendicular to $v_1$.
  • This acts as a "change of basis" in the input domain.

2. $\Sigma$ (The Stretch)

The Singular Value Matrix.

  • It is an $m \times n$ diagonal matrix (mostly zeros, with non-negative numbers on the main diagonal).
  • It stretches or shrinks the space along the axes defined by $V$.
    • It doesn't rotate or shear, as it is a diagonal matrix.
  • The diagonal entries are : $\sigma_1, \sigma_2, \ldots, \sigma_r$. These are called Singular Values.
    • They are always real and non-negative, sorted in descending order ($\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$).
    • The value of $\sigma_i$ dictates the "strength" or "energy" of the transformation along that axis.
      • A singular value of 0 indicates that dimension is squashed into nothingness (loss of information).
  • $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ are the eigenvalues of $A^\top A$.

3. $U$ (The Final Rotation)

The Left Singular Matrix.

  • It is an orthogonal matrix.
  • After the input has been rotated ($V^\top$) and stretched ($\Sigma$), it now lives in the output dimensions. $U$ rotates this result to align it with the standard axes of the output space.
  • The columns of $U$ are the eigenvectors of $AA^\top$.
    • These vectors usually represent some patterns in the output space.

In summary :

  • $U$ : $m \times m$, orthogonal. Represents the output space directions. Derived from the eigenvectors of $AA^\top$.
  • $\Sigma$ : $m \times n$, diagonal. Represents the stretching factors (gain). Derived from the square roots of the eigenvalues ($\sigma_i = \sqrt{\lambda_i}$).
  • $V^\top$ : $n \times n$, orthogonal. Represents the input space directions. Derived from the eigenvectors of $A^\top A$.

  • It is called Singular Value Decomposition because it factorizes a matrix specifically to expose the critical Singular Values, the numbers that tell you if and how the matrix collapses space (becomes singular).
  • Geometrically, a Singular Matrix "crushes" space (e.g., flattens a 3D cube into a 2D sheet or 1D line).
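A quick NumPy check of these shapes and of the reconstruction $A = U\Sigma V^\top$ (the matrix values are arbitrary):

```python
import numpy as np

# Arbitrary matrix with m = 2 rows (samples) and n = 3 columns (features)
A = np.random.rand(2, 3)

# full_matrices=True gives U as m x m and Vt as n x n;
# s holds the singular values in descending order
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(U.shape, s.shape, Vt.shape)   # (2, 2) (2,) (3, 3)

# Rebuild the m x n Sigma: singular values on the diagonal, zeros elsewhere
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(U @ Sigma @ Vt, A))  # True: A = U Sigma V^T
```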

Why $A^\top A$ and $AA^\top$ ?

These are covariance matrices. If $A = U\Sigma V^\top$, then :

$A^\top A = (U\Sigma V^\top)^\top (U\Sigma V^\top) = V\Sigma^\top U^\top U\Sigma V^\top$

Since $U$ is orthogonal ($U^\top U = I$),

$A^\top A = V(\Sigma^\top \Sigma)V^\top$

This is exactly like the Eigendecomposition formula ($M = Q\Lambda Q^{-1}$, with $Q^{-1} = Q^\top$ here because $V$ is orthogonal).

  • $V$ : The eigenvector matrix of $A^\top A$.
  • $\Sigma^\top \Sigma$ : The eigenvalue matrix of $A^\top A$.
    • Because $\Sigma$ is diagonal, $\Sigma^\top \Sigma$ just contains the squared singular values ($\sigma_i^2$) on the diagonal (see the numerical check below).
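A small numerical check of this connection (arbitrary matrix):

```python
import numpy as np

A = np.random.rand(2, 3)   # arbitrary matrix

# Singular values of A (descending)
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A, sorted descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]

# The nonzero eigenvalues of A^T A are exactly the squared singular values
print(np.round(s**2, 6))
print(np.round(lam, 6))    # same two values, plus one (numerically) zero
```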

Solved Example of SVD

  • Goal : Find $U$ ($2\times2$), $\Sigma$ ($2\times3$), and $V^\top$ ($3\times3$).

1. Compute $AA^\top$ :

2. Find the eigenvalues of this symmetric matrix :

Solve $\det(AA^\top - \lambda I) = 0$.

  • This gives $\lambda_1$ and $\lambda_2$.

3. Find Singular Values ($\sigma_i$) :

Square roots of the eigenvalues :

$\sigma_1 = \sqrt{\lambda_1}$ and $\sigma_2 = \sqrt{\lambda_2}$

4. Find Eigenvectors (Columns of $U$)

Solving the linear system $(AA^\top - \lambda_i I)\,u_i = 0$ for the vector $u_i$ will give the eigenvectors.

  • For $\lambda_1$ :
  • Normalize (divide by its length, i.e. its magnitude) to get $u_1$ :

  • For $\lambda_2$ :

  • Normalize to get $u_2$ :

Thus, $U = [\,u_1 \;\; u_2\,]$.

5. Construct $\Sigma$ (The Stretch)

$\Sigma$ must have the same dimensions as the original matrix ($2 \times 3$). Place the singular values on the diagonal and pad the rest with zeros.

6. Find $V$ (Right Singular Vectors)

Using the relation $v_i = \frac{1}{\sigma_i} A^\top u_i$.

How did this relation come about ?

$\Sigma$ is diagonal, so transposing $A = U\Sigma V^\top$ gives $A^\top = V\Sigma^\top U^\top$.

Multiply both sides by a vector $u_i$, one of the columns of $U$ : $A^\top u_i = V\Sigma^\top U^\top u_i$.

$U^\top u_i$ will result in a vector of zeros with a single $1$ at index $i$, because $U$ is orthonormal. Hence $A^\top u_i = \sigma_i v_i$, which rearranges to $v_i = \frac{1}{\sigma_i} A^\top u_i$.

  • Calculate $v_1$ (using $u_1$ and $\sigma_1$) :
  • Similarly, calculate $v_2$ (using $u_2$ and $\sigma_2$).
  • Because there is no $\sigma_3$ (the third singular value of a $2 \times 3$ matrix is effectively $0$), it is not possible to use the above formula for $v_3$. But since $V$ is an orthogonal matrix, all its vectors (columns) are perpendicular. Thus, $v_3$ can be found using the cross product.

The vector orthogonal to $v_1$ and $v_2$ is their cross product, normalized by its length, 6.

Finally, $V = [\,v_1 \;\; v_2 \;\; v_3\,]$, and $V^\top$ is its transpose.


Thus, reconstruct $A$ using $A = U\Sigma V^\top$ (a code sketch of these steps follows the list below) :

  1. $V^\top$ takes a 3D vector and rotates it in 3D.
  2. $\Sigma$ takes that 3D vector and removes the last dimension (multiplying it by 0), landing back in 2D space.
  3. $U$ rotates that 2D result.
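Here is a sketch of the same six steps in NumPy. The worked example's actual numbers are not reproduced above, so the $2 \times 3$ matrix below is a hypothetical stand-in:

```python
import numpy as np

# Hypothetical 2x3 matrix (the worked example's actual values are not shown above)
A = np.array([[3., 2., 2.],
              [2., 3., -2.]])

# Steps 1-2: eigenvalues and eigenvectors of the symmetric 2x2 matrix A A^T
AAt = A @ A.T
lam, U = np.linalg.eigh(AAt)      # eigh returns them in ascending order
lam, U = lam[::-1], U[:, ::-1]    # re-sort to descending

# Step 3: singular values are the square roots of the eigenvalues
sigma = np.sqrt(lam)

# Step 5: Sigma has the same 2x3 shape as A, singular values on the diagonal
Sigma = np.zeros(A.shape)
Sigma[:2, :2] = np.diag(sigma)

# Step 6: right singular vectors via v_i = (1 / sigma_i) A^T u_i
v1 = A.T @ U[:, 0] / sigma[0]
v2 = A.T @ U[:, 1] / sigma[1]
v3 = np.cross(v1, v2)             # third direction: orthogonal to v1 and v2
v3 = v3 / np.linalg.norm(v3)
V = np.column_stack([v1, v2, v3])

# Reconstruction check: U Sigma V^T reproduces A
print(np.allclose(U @ Sigma @ V.T, A))  # True
```

The reconstruction holds for any sign choice of the eigenvectors, because each $v_i$ inherits its sign from the corresponding $u_i$.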

SVD Interactive Explorer

Using this simulation, all the above theory about SVD can be visualized.

Use the presets to see how different linear transformations are decomposed into Rotation → Stretch → Rotation.


Matrix as a Sum of Layers

Instead of a giant wall of numbers, matrices can be visualized as a stack of simple, transparent sheets.

Outer Product

  • Dot Product (Inner Product) : Takes two vectors and squashes them into a single number (a scalar).
  • Outer Product : Takes two vectors and explodes them into a matrix.

A column vector $u$ and a row vector $v^\top$, when multiplied, create a grid (a matrix) where every row is just a copy of $v^\top$ scaled by a number from $u$.

The matrix created this way has Rank 1. Thus, it is the simplest possible matrix. So, all the rows and columns are parallel to each other. One simple pattern is repeated across the grid.
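A tiny NumPy illustration of an outer product (the two vectors are arbitrary):

```python
import numpy as np

u = np.array([1., 2., 3.])   # column vector
v = np.array([4., 5.])       # row vector

# Outer product: a 3x2 grid where row i is v scaled by u[i]
M = np.outer(u, v)
print(M)
# [[ 4.  5.]
#  [ 8. 10.]
#  [12. 15.]]

print(np.linalg.matrix_rank(M))  # 1 -> the simplest possible (non-zero) matrix
```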


  • $U$ is a matrix of columns $u_i$ (the eigenvectors of $AA^\top$).
  • $\Sigma$ is a diagonal matrix of the singular values $\sigma_i$.
  • $V^\top$ is a matrix of rows $v_i^\top$ (the eigenvectors of $A^\top A$).
  1. Multiply $\Sigma V^\top$ : each row $v_i^\top$ gets scaled by its $\sigma_i$.

So, the equation becomes $A = U\,(\Sigma V^\top)$.

  2. Multiplying the Columns × Rows :
  • Multiply $U$ (columns) by the result from Step 1 (rows).

Thus applying this to the SVD matrices :

  • Column 1 of $U$ is $u_1$.
  • Row 1 of $\Sigma V^\top$ is $\sigma_1 v_1^\top$.

Their product gives the first rank-1 layer : $\sigma_1 u_1 v_1^\top$.

Similarly for all other columns and rows :

$A = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \cdots + \sigma_r u_r v_r^\top$

  • $u_i v_i^\top$ is the Pattern.
  • $\sigma_i$ is the Importance / opacity of the pattern in the final data.

$\sigma_i$'s are always in descending order.

So, Layer 1 will capture the most information, and the importance of each subsequent layer decreases.

Summing up all these layers will give the perfect high-resolution image.
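A minimal check of this layered view (arbitrary matrix):

```python
import numpy as np

A = np.random.rand(4, 3)                  # arbitrary data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each layer is sigma_i * u_i * v_i^T, a rank-1 matrix
layers = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

print(np.allclose(sum(layers), A))        # True: the layers add back up to A
```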

Use of SVD (Image Compression)

Because SVD sorts the layers by importance (from largest to smallest), we know that the bottom layers contribute almost nothing to the visible image.

So, by deleting the bottom 50 layers (setting their $\sigma_i$ to 0), we save a huge amount of storage space while having very minimal effect on the final appearance: lose the noise and retain the signal.

SVD proves that a complex dataset is just a sum of simple patterns, ordered by how "loud" they are. You can mute the quiet ones to compress the data without losing the meaning.
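A minimal sketch of rank-$k$ compression. A random array stands in for a real grayscale image here, so the reconstruction error printed at the end will be larger than it would be for a structured photo:

```python
import numpy as np

# A random array stands in for a grayscale image loaded as a 2D array
img = np.random.rand(256, 256)
U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 50                                            # keep only the top-k layers
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

numbers_full = img.size                           # 256 * 256 = 65536 values
numbers_kept = k * (U.shape[0] + Vt.shape[1] + 1) # k columns + k rows + k sigmas
print(numbers_full, numbers_kept)                 # 65536 vs 25650

# Relative reconstruction error (small for a structured photo,
# larger here because random noise has no dominant layers)
print(np.linalg.norm(img - img_k) / np.linalg.norm(img))
```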

Eigenface Compression

To prove that quiet patterns can be muted without losing the meaning, let's look at a generated face. A face is highly structured: two eyes, a nose, and a mouth are always in roughly the same place. SVD exploits this structure to compress the image massively.

Drag the slider from k=1 up to k=50.

  • Rank 1 (The Ghost): The single "loudest" pattern. It captures the average head shape and lighting direction. It looks like a blurred mask.
  • Rank 10 (The Identity): By adding just 9 more layers, the eyes, nose bridge, and mouth become sharp. It is now possible to recognize the person.
  • Rank 50 (The Texture): The final layers add the "quiet" details such as skin texture, noise, and subtle imperfections.

The Result: Even at Rank 10, the image is recognizable, yet we have discarded over 80% of the raw data. This technique (Eigenfaces) was the foundation of early facial recognition systems, reducing complex human faces to a simple list of 10-20 numbers.


With this, the post on the different types of matrix decomposition and their uses, and on SVD and how it works, is complete.