
Table of contents

  • URV Factorization
  • Singular Value Decomposition (SVD)
  • Pseudoinverse
    • Properties of pseudoinverse

10  SVD and Pseudoinverse

URV Factorization

For each \mathbf{A} \in \mathbb{R}^{m \times n} of rank r, there are orthogonal matrices \mathbf{U}_{m \times m} and \mathbf{V}_{n \times n} and a nonsingular matrix \mathbf{C}_{r \times r} such that

\mathbf{A} = \mathbf{U} \mathbf{R} \mathbf{V}^{T} = \mathbf{U} \begin{bmatrix} \mathbf{C}_{r \times r} & 0 \\ 0 & 0 \\ \end{bmatrix}_{m \times n} \mathbf{V}^{T}.

  • The first r columns of \mathbf{U} are an orthonormal basis for R (\mathbf{A}).

  • The last m - r columns of \mathbf{U} are an orthonormal basis for N (\mathbf{A}^{T}).

  • The first r columns of \mathbf{V} are an orthonormal basis for R (\mathbf{A}^{T}).

  • The last n - r columns of \mathbf{V} are an orthonormal basis for N (\mathbf{A}).
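These subspace claims are easy to check numerically. Below is a minimal sketch in NumPy, assuming (for this example only) that the SVD computed by np.linalg.svd is used as one particular URV factorization:

```python
import numpy as np

# Build a rank-2 matrix A (4 x 3) as a product of random rank-2 factors.
rng = np.random.default_rng(0)
r = 2
A = rng.standard_normal((4, r)) @ rng.standard_normal((r, 3))

# The SVD is one particular URV factorization A = U R V^T.
U, s, Vt = np.linalg.svd(A)
V = Vt.T

# First r columns of U span R(A): projecting A's columns onto them changes nothing.
Ur = U[:, :r]
assert np.allclose(Ur @ Ur.T @ A, A)

# Last m - r columns of U span N(A^T): A^T annihilates them.
assert np.allclose(A.T @ U[:, r:], 0)

# First r columns of V span R(A^T); last n - r columns of V span N(A).
Vr = V[:, :r]
assert np.allclose(Vr @ Vr.T @ A.T, A.T)
assert np.allclose(A @ V[:, r:], 0)
```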

Proof

Suppose vector spaces \mathbb{R}^{m} and \mathbb{R}^{n} have orthonormal bases \{ \mathbf{u}_{1}, \dots, \mathbf{u}_{m} \} and \{ \mathbf{v}_{1}, \dots, \mathbf{v}_{n} \}, respectively.

Given any matrix \mathbf{A} \in \mathbb{R}^{m \times n}, the orthogonal decomposition theorem shows that the following subspaces are complementary in \mathbb{R}^{m} and \mathbb{R}^{n}.

R (\mathbf{A}) \oplus N (\mathbf{A}^{T}) = \mathbb{R}^{m},

R (\mathbf{A}^{T}) \oplus N (\mathbf{A}) = \mathbb{R}^{n}.

Since each pair of subspaces above is complementary, we can choose the orthonormal basis of \mathbb{R}^{m} so that it splits into an orthonormal basis for R (\mathbf{A}) and one for N (\mathbf{A}^{T}), and the orthonormal basis of \mathbb{R}^{n} so that it splits into orthonormal bases for R (\mathbf{A}^{T}) and N (\mathbf{A}).

Since \mathrm{rank} (\mathbf{A}) = r, suppose the bases are separated in the following way:

  • \mathcal{B}_{R (\mathbf{A})} = \{ \mathbf{u}_{1}, \dots, \mathbf{u}_{r} \},

  • \mathcal{B}_{N (\mathbf{A}^{T})} = \{ \mathbf{u}_{r + 1}, \dots , \mathbf{u}_{m} \},

  • \mathcal{B}_{R (\mathbf{A}^{T})} = \{ \mathbf{v}_{1}, \dots, \mathbf{v}_{r} \},

  • \mathcal{B}_{N (\mathbf{A})} = \{ \mathbf{v}_{r + 1}, \dots , \mathbf{v}_{n} \}.

Now we treat the bases for \mathbb{R}^{m} and \mathbb{R}^{n} as the columns of the matrices \mathbf{U} and \mathbf{V}:

\mathbf{U} = \begin{bmatrix} \mathbf{u}_{1} & \dots & \mathbf{u}_{m} \end{bmatrix},

\mathbf{V} = \begin{bmatrix} \mathbf{v}_{1} & \dots & \mathbf{v}_{n} \end{bmatrix},

and define

\mathbf{R} = \mathbf{U}^{T} \mathbf{A} \mathbf{V}.

Note that

\mathbf{A} \mathbf{V} = \begin{bmatrix} \mathbf{A} \mathbf{v}_{1} & \dots & \mathbf{A} \mathbf{v}_{n} \end{bmatrix} = \begin{bmatrix} \mathbf{A} \mathbf{v}_{1} & \dots & \mathbf{A} \mathbf{v}_{r} & 0 & \dots & 0 \end{bmatrix}

\mathbf{U}^{T} \mathbf{A} = (\mathbf{A}^{T} \mathbf{U})^{T} = \begin{bmatrix} \mathbf{A}^{T} \mathbf{u}_{1} & \dots & \mathbf{A}^{T} \mathbf{u}_{m} \end{bmatrix}^{T} = \begin{bmatrix} \mathbf{A}^{T} \mathbf{u}_{1} & \dots & \mathbf{A}^{T} \mathbf{u}_{r} & 0 & \dots & 0 \end{bmatrix}^{T}

since \mathbf{v}_{r + 1}, \dots, \mathbf{v}_{n} \in N (\mathbf{A}), and \mathbf{u}_{r + 1}, \dots, \mathbf{u}_{m} \in N (\mathbf{A}^{T}).

Thus, \mathbf{R} is zero everywhere except in its top-left r \times r block:

\mathbf{R} = \mathbf{U}^{T} \mathbf{A} \mathbf{V} = \begin{bmatrix} \mathbf{u}_{1}^{T} \mathbf{A} \mathbf{v}_{1} & \dots & \mathbf{u}_{1}^{T} \mathbf{A} \mathbf{v}_{r} & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \mathbf{u}_{r}^{T} \mathbf{A} \mathbf{v}_{1} & \dots & \mathbf{u}_{r}^{T} \mathbf{A} \mathbf{v}_{r} & 0 & \dots & 0 \\ 0 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 \\ \end{bmatrix} = \begin{bmatrix} \mathbf{C} & 0 \\ 0 & 0 \\ \end{bmatrix}.

Then we can see that \mathbf{A} can always be decomposed in terms of \mathbf{U}, \mathbf{R}, and \mathbf{V}, because orthogonal matrices satisfy \mathbf{U}^{T} = \mathbf{U}^{-1} and \mathbf{V}^{T} = \mathbf{V}^{-1}:

\begin{aligned} \mathbf{U} \mathbf{R} \mathbf{V}^{T} & = \mathbf{U} \mathbf{U}^{T} \mathbf{A} \mathbf{V} \mathbf{V}^{T} \\ & = \mathbf{U} \mathbf{U}^{-1} \mathbf{A} \mathbf{V} \mathbf{V}^{-1} \\ & = \mathbf{A}. \end{aligned}

To see why \mathbf{C} is always non-singular, we first note that

\text{rank} (\mathbf{A}) = \text{rank} (\mathbf{U}^{T} \mathbf{A} \mathbf{V}) = r

because multiplication by a nonsingular matrix (here the orthogonal matrices \mathbf{U}^{T} and \mathbf{V}) preserves rank.

Thus,

\text{rank} (\mathbf{C}) = \text{rank} \left( \begin{bmatrix} \mathbf{C} & 0 \\ 0 & 0 \\ \end{bmatrix} \right) = \text{rank} (\mathbf{U}^{T} \mathbf{A} \mathbf{V}) = r.
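The construction in this proof can also be carried out numerically. The sketch below (assuming SciPy's orth and null_space to produce the adapted orthonormal bases) checks that \mathbf{R} = \mathbf{U}^{T} \mathbf{A} \mathbf{V} has the claimed block structure with a nonsingular \mathbf{C}:

```python
import numpy as np
from scipy.linalg import orth, null_space

# A rank-2 matrix with m = 4, n = 3.
rng = np.random.default_rng(1)
r = 2
A = rng.standard_normal((4, r)) @ rng.standard_normal((r, 3))

# Adapted bases: columns for R(A) then N(A^T), and for R(A^T) then N(A).
U = np.hstack([orth(A), null_space(A.T)])   # m x m orthogonal
V = np.hstack([orth(A.T), null_space(A)])   # n x n orthogonal

R = U.T @ A @ V
C = R[:r, :r]

# R vanishes outside the leading r x r block, and C is nonsingular.
assert np.allclose(R[r:, :], 0) and np.allclose(R[:r, r:], 0)
assert np.linalg.matrix_rank(C) == r

# A is recovered as U R V^T.
assert np.allclose(U @ R @ V.T, A)
```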

Singular Value Decomposition (SVD)

Singular value decomposition is a special case of the URV factorization in which \mathbf{C} is a diagonal matrix whose diagonal entries are positive and arranged in non-increasing order.

For each \mathbf{A} \in \mathbb{C}^{m \times n} of rank r, there are unitary matrices \mathbf{U} \in \mathbb{C}^{m \times m} and \mathbf{V} \in \mathbb{C}^{n \times n}, and a diagonal matrix \mathbf{D} = \text{diag} (\sigma_{1}, \dots, \sigma_{r}) \in \mathbb{R}^{r \times r} such that

\mathbf{A} = \mathbf{U} \begin{bmatrix} \mathbf{D} & 0 \\ 0 & 0 \\ \end{bmatrix} \mathbf{V}^{H}

with

\sigma_{1} \geq \sigma_{2} \geq \dots \geq \sigma_{r} > 0,

where

  • the columns of \mathbf{U} and \mathbf{V} are the left and right singular vectors of \mathbf{A}, and

  • the diagonal entries \sigma_{i} of \mathbf{D} are the singular values of \mathbf{A}.
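A minimal NumPy sketch of this statement for a real matrix (so that \mathbf{V}^{H} = \mathbf{V}^{T}): np.linalg.svd returns the singular values already sorted in non-increasing order, and the factors reconstruct \mathbf{A}.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

# Full SVD: U is 5 x 5, Vt is 3 x 3, s holds the singular values.
U, s, Vt = np.linalg.svd(A)

# Singular values are sorted: sigma_1 >= sigma_2 >= ... >= sigma_r.
assert np.all(np.diff(s) <= 0)

# Embed diag(s) in a 5 x 3 block to rebuild A = U [D 0; 0 0] V^T.
S = np.zeros(A.shape)
S[: len(s), : len(s)] = np.diag(s)
assert np.allclose(U @ S @ Vt, A)
```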

Pseudoinverse

A generalized inverse for any matrix can be defined using URV factorization or SVD.

Given a URV factorization of matrix \mathbf{A} \in \mathbb{R}^{m \times n}

\mathbf{A} = \mathbf{U} \begin{bmatrix} \mathbf{C} & 0 \\ 0 & 0 \\ \end{bmatrix} \mathbf{V}^{T}

the pseudoinverse of \mathbf{A} is defined as

\mathbf{A}^{\dagger} = \mathbf{V} \begin{bmatrix} \mathbf{C}^{-1} & 0 \\ 0 & 0 \\ \end{bmatrix} \mathbf{U}^{T}.

The pseudoinverse of \mathbf{A} can equivalently be characterized as a matrix \mathbf{A}^{\dagger} that satisfies the following conditions:

\mathbf{A} \mathbf{A}^{\dagger} \mathbf{A} = \mathbf{A},

\mathbf{A}^{\dagger} \mathbf{A} \mathbf{A}^{\dagger} = \mathbf{A}^{\dagger},

where \mathbf{A} \mathbf{A}^{\dagger} and \mathbf{A}^{\dagger} \mathbf{A} are symmetric matrices:

(\mathbf{A} \mathbf{A}^{\dagger})^{T} = \mathbf{A} \mathbf{A}^{\dagger},

(\mathbf{A}^{\dagger} \mathbf{A})^{T} = \mathbf{A}^{\dagger} \mathbf{A}.
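The sketch below connects the two definitions numerically: it forms \mathbf{A}^{\dagger} from the SVD by inverting only the nonzero singular values, verifies the four conditions above, and checks agreement with NumPy's np.linalg.pinv (the 1e-10 cutoff for "nonzero" is an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(3)
r = 2
A = rng.standard_normal((4, r)) @ rng.standard_normal((r, 3))

# Pseudoinverse from the SVD: invert only the nonzero singular values.
U, s, Vt = np.linalg.svd(A)
s_inv = np.array([1.0 / x if x > 1e-10 else 0.0 for x in s])
S_pinv = np.zeros((A.shape[1], A.shape[0]))
S_pinv[: len(s), : len(s)] = np.diag(s_inv)
A_pinv = Vt.T @ S_pinv @ U.T

# The four defining conditions.
assert np.allclose(A @ A_pinv @ A, A)
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)
assert np.allclose((A @ A_pinv).T, A @ A_pinv)
assert np.allclose((A_pinv @ A).T, A_pinv @ A)

# NumPy's built-in pinv agrees, as the pseudoinverse is unique.
assert np.allclose(A_pinv, np.linalg.pinv(A))
```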

Properties of pseudoinverse

  • The pseudoinverse \mathbf{A}^{\dagger} of \mathbf{A} is unique.

    Proof

    Suppose that \mathbf{B} and \mathbf{C} are both pseudoinverses of \mathbf{A}; we show that they must be equal, so the pseudoinverse is unique.

    \begin{aligned} \mathbf{A} \mathbf{B} & = (\mathbf{A} \mathbf{C} \mathbf{A}) \mathbf{B} & [\mathbf{A} \mathbf{A}^{\dagger} \mathbf{A} = \mathbf{A}] \\ & = (\mathbf{C}^{T} \mathbf{A}^{T}) (\mathbf{B}^{T} \mathbf{A}^{T}) & [(\mathbf{A} \mathbf{A}^{\dagger})^{T} = \mathbf{A} \mathbf{A}^{\dagger}] \\ & = \mathbf{C}^{T} (\mathbf{A}^{T} \mathbf{B}^{T} \mathbf{A}^{T}) \\ & = \mathbf{C}^{T} (\mathbf{A} \mathbf{B} \mathbf{A})^{T} \\ & = \mathbf{C}^{T} \mathbf{A}^{T} \\ & = \mathbf{A} \mathbf{C} \\ \end{aligned}

    \begin{aligned} \mathbf{B} \mathbf{A} & = \mathbf{B} (\mathbf{A} \mathbf{C} \mathbf{A}) & [\mathbf{A} \mathbf{A}^{\dagger} \mathbf{A} = \mathbf{A}] \\ & = (\mathbf{A}^{T} \mathbf{B}^{T}) (\mathbf{A}^{T} \mathbf{C}^{T}) & [(\mathbf{A}^{\dagger} \mathbf{A})^{T} = \mathbf{A}^{\dagger} \mathbf{A}] \\ & = (\mathbf{A}^{T} \mathbf{B}^{T} \mathbf{A}^{T}) \mathbf{C}^{T} \\ & = (\mathbf{A} \mathbf{B} \mathbf{A})^{T} \mathbf{C}^{T} \\ & = \mathbf{A}^{T} \mathbf{C}^{T} \\ & = \mathbf{C} \mathbf{A} \\ \end{aligned}

    Thus,

    \begin{aligned} \mathbf{B} & = \mathbf{B} \mathbf{A} \mathbf{B} & [\mathbf{A}^{\dagger} \mathbf{A} \mathbf{A}^{\dagger} = \mathbf{A}^{\dagger}] \\ & = \mathbf{C} \mathbf{A} \mathbf{B} & [\mathbf{C} \mathbf{A} = \mathbf{B} \mathbf{A}] \\ & = \mathbf{C} \mathbf{A} \mathbf{C} & [\mathbf{A} \mathbf{B} = \mathbf{A} \mathbf{C}] \\ & = \mathbf{C} \end{aligned}

  • Let \mathbf{A} \in \mathbb{R}^{m \times n} be a full-rank matrix. (A numerical check of the three cases below is given after the proof.)

    If m > n, the left inverse of \mathbf{A} is the pseudoinverse of \mathbf{A} and is written as

    (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T}.

    If m < n, the right inverse of \mathbf{A} is the pseudoinverse of \mathbf{A} and is written as

    \mathbf{A}^{T} (\mathbf{A} \mathbf{A}^{T})^{-1} .

    If m = n, the inverse of \mathbf{A} is the pseudoinverse of \mathbf{A}.

    Proof

    We first prove that (\mathbf{A}^{T} \mathbf{A})^{-1} (when m > n) and (\mathbf{A} \mathbf{A}^{T})^{-1} (when m < n) exist.

    Since \mathbf{A} is a full rank matrix,

    \text{rank} (\mathbf{A}) = \min (m, n).

    According to the property of rank,

    \text{rank} (\mathbf{A}^{T} \mathbf{A}) = \text{rank} (\mathbf{A} \mathbf{A}^{T}) = \text{rank} (\mathbf{A}) = \min (m, n).

    Thus:

    • if m > n, \mathbf{A}^{T} \mathbf{A} \in \mathbb{R}^{n \times n} is full rank, and hence non-singular, because \text{rank} (\mathbf{A}^{T} \mathbf{A}) = \min (m, n) = n;

    • if m < n, \mathbf{A} \mathbf{A}^{T} \in \mathbb{R}^{m \times m} is full rank, and hence non-singular, because \text{rank} (\mathbf{A} \mathbf{A}^{T}) = \min (m, n) = m.

    Next, we prove that (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} is the pseudoinverse of \mathbf{A} when m > n.

    \mathbf{A} \mathbf{A}^{\dagger} \mathbf{A} = \mathbf{A} (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} \mathbf{A} = \mathbf{A} \mathbf{I}_{n \times n} = \mathbf{A}

    \mathbf{A}^{\dagger} \mathbf{A} \mathbf{A}^{\dagger} = (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} \mathbf{A} (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} = \mathbf{I}_{n \times n} \mathbf{A}^{\dagger} = \mathbf{A}^{\dagger}

    \mathbf{A}^{\dagger} \mathbf{A} = (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} \mathbf{A} = \mathbf{I}_{n \times n} = \mathbf{A}^{T} \mathbf{A} (\mathbf{A}^{T} \mathbf{A})^{-1} = \mathbf{A}^{T} (\mathbf{A}^{\dagger})^{T} = (\mathbf{A}^{\dagger} \mathbf{A})^{T}

    \mathbf{A} \mathbf{A}^{\dagger} = \mathbf{A} (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} = (\mathbf{A}^{\dagger})^{T} \mathbf{A}^{T} = (\mathbf{A} \mathbf{A}^{\dagger})^{T}

    The case m < n, where \mathbf{A}^{T} (\mathbf{A} \mathbf{A}^{T})^{-1} is the pseudoinverse of \mathbf{A}, can be proved similarly.

    Then we prove that (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} is the left inverse of \mathbf{A} when m > n:

    \mathbf{A}^{\dagger} \mathbf{A} = (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} \mathbf{A} = \mathbf{I}_{n \times n}.

    The case that \mathbf{A}^{T} (\mathbf{A} \mathbf{A}^{T})^{-1} is the right inverse of \mathbf{A} when m < n can be proved similarly.

    Thus we have proved that (\mathbf{A}^{T} \mathbf{A})^{-1} \mathbf{A}^{T} (m > n) and \mathbf{A}^{T} (\mathbf{A} \mathbf{A}^{T})^{-1} (m < n) are the left and right inverses of \mathbf{A}, respectively.

    When m = n, \mathbf{A} is square and full rank, so \mathbf{A}^{-1} exists and satisfies all four defining conditions of the pseudoinverse; for example,

    \mathbf{A} \mathbf{A}^{-1} \mathbf{A} = \mathbf{A},

    (\mathbf{A} \mathbf{A}^{-1})^{T} = \mathbf{I}^{T} = \mathbf{A} \mathbf{A}^{-1}.

    Since the pseudoinverse is unique, it must be that

    \mathbf{A}^{-1} = \mathbf{A}^{\dagger}.
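As promised above, here is a numerical check of all three cases (a sketch; the random Gaussian matrices used below are full rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(4)

# Tall full-rank A (m > n): the left inverse (A^T A)^{-1} A^T is the pseudoinverse.
A = rng.standard_normal((5, 3))
left = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(left, np.linalg.pinv(A))
assert np.allclose(left @ A, np.eye(3))

# Wide full-rank B (m < n): the right inverse B^T (B B^T)^{-1} is the pseudoinverse.
B = rng.standard_normal((3, 5))
right = B.T @ np.linalg.inv(B @ B.T)
assert np.allclose(right, np.linalg.pinv(B))
assert np.allclose(B @ right, np.eye(3))

# Square nonsingular C (m = n): the pseudoinverse is the ordinary inverse.
C = rng.standard_normal((4, 4))
assert np.allclose(np.linalg.pinv(C), np.linalg.inv(C))
```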
