
Table of contents

  • Orthogonal projection
    • Properties of orthogonal projection
  • Orthogonal projection matrix
    • Properties of orthogonal projection matrix
  • Application: least squares problem
  • Affine space
  • Affine projection
  • Hyperplanes

11  Orthogonal and Affine Projection

Orthogonal projection

Suppose \mathcal{M} is a subspace of the vector space \mathcal{V}. Since \mathcal{M}^{\perp} is the orthogonal complement of \mathcal{M}, we have \mathcal{M} \oplus \mathcal{M}^{\perp} = \mathcal{V}.

For v \in \mathcal{V}, let v = m + n, where m \in \mathcal{M} and n \in \mathcal{M}^{\perp}. We define a linear operator \mathbf{P}_{\mathcal{M}} \in \mathbb{C}^{n \times n} such that

\mathbf{P}_{\mathcal{M}} v = m,

where

  • m is the orthogonal projection of v onto \mathcal{M} along \mathcal{M}^{\perp},

  • \mathbf{P}_{\mathcal{M}} is the orthogonal projection matrix of v onto \mathcal{M} along \mathcal{M}^{\perp}.

Properties of orthogonal projection

Suppose m is the orthogonal projection of v onto the subspace \mathcal{M},

m = \mathbf{P}_{\mathcal{M}} v.

  • m always exists and is unique.

    Proof

    TODO

  • Orthogonality of the difference.

    \langle v - m, x \rangle = 0, \quad \forall x \in \mathcal{M}

    Proof

    Since n \in \mathcal{M}^{\perp}, by definition of the orthogonal complement of subspaces,

    \langle n, x \rangle = 0, \quad \forall x \in \mathcal{M}.

    Since by definition v = m + n,

    \langle v - m, x \rangle = 0, \quad \forall x \in \mathcal{M}.

  • Closest point theorem: m is the closest point in \mathcal{M} to v in the norm induced by the inner product (see the numerical sketch after this list):

    m = \arg\min_{x \in \mathcal{M}} \lVert v - x \rVert.

    Proof

    According to the property of orthogonal projection, the projection m always exists.

    Thus, we can add and subtract m,

    \begin{aligned} \lVert v - x \rVert^{2} & = \lVert v - m + m - x \rVert^{2} \quad \forall x \in \mathcal{M} \\ & = \langle v - m + m - x, v - m + m - x \rangle \\ & = \langle v - m, v - m + m - x \rangle + \langle m - x, v - m + m - x \rangle \\ & = \lVert v - m \rVert^{2} + \langle v - m, m - x \rangle + \langle m - x, v - m \rangle + \lVert m - x \rVert^{2} \\ \end{aligned}

    Since m - x \in \mathcal{M}, according to the property of orthogonal projection,

    \langle v - m, m - x \rangle = \langle m - x, v - m \rangle = 0.

    Thus,

    \begin{aligned} \lVert v - x \rVert^{2} & = \lVert v - m \rVert^{2} + \lVert m - x \rVert^{2} \\ \lVert v - x \rVert^{2} & \geq \lVert v - m \rVert^{2} & [\lVert m - x \rVert^{2} \geq 0] \end{aligned}

    which shows that \lVert v - m \rVert \leq \lVert v - x \rVert for every x \in \mathcal{M}, i.e. m minimizes the distance from v over all vectors in the subspace \mathcal{M}.
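
The following NumPy sketch is a quick numerical check of these two properties. The subspace basis M and the vector v below are arbitrary example data, and the projection is computed via least squares rather than an explicit projection matrix (which is derived in the next section).

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example data: a 2-dimensional subspace of C^4 spanned by the
# columns of M, and a vector v to project onto it.
M = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Orthogonal projection of v onto span(M), computed via least squares:
# m = M c where c minimizes ||v - M c||.
c, *_ = np.linalg.lstsq(M, v, rcond=None)
m = M @ c

# Orthogonality of the difference: <v - m, x> = 0 for every x in the subspace,
# so in particular for each basis vector (column of M).
print(np.allclose(M.conj().T @ (v - m), 0))   # True

# Closest point theorem: any other point x = M d of the subspace is at least
# as far from v as m is.
for _ in range(5):
    d = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    assert np.linalg.norm(v - M @ d) >= np.linalg.norm(v - m) - 1e-12
```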

Orthogonal projection matrix

If \mathbf{P}_{\mathcal{M}} is the orthogonal projection matrix onto the subspace \mathcal{M}, and the columns of \mathbf{M} \in \mathbb{C}^{n \times r} form a basis for \mathcal{M}, then

\mathbf{P}_{\mathcal{M}} = \mathbf{M} (\mathbf{M}^{H} \mathbf{M})^{-1} \mathbf{M}^{H}.

Proof

Since \text{rank} (\mathbf{M}^{H} \mathbf{M}) = \text{rank} (\mathbf{M}) = r and \mathbf{M}^{H} \mathbf{M} \in \mathbb{C}^{r \times r}, according to the property of rank, (\mathbf{M}^{H} \mathbf{M})^{-1} exists.

Suppose the columns of \mathbf{N} \in \mathbb{C}^{n \times s} form an orthonormal basis for \mathcal{M}^{\perp}, where s = n - r according to the property of complementary subspaces.

Thus, we can construct the following matrix,

\begin{bmatrix} (\mathbf{M}^{H} \mathbf{M})^{-1} \mathbf{M}^{H} \\ \mathbf{N}^{H} \\ \end{bmatrix}.

Since the columns of \mathbf{N} lie in \mathcal{M}^{\perp}, by the definition of the orthogonal complement, \mathbf{N}^{H} \mathbf{M} = \mathbf{0} and \mathbf{M}^{H} \mathbf{N} = \mathbf{0}.

Thus,

\begin{aligned} \begin{bmatrix} (\mathbf{M}^{H} \mathbf{M})^{-1} \mathbf{M}^{H} \\ \mathbf{N}^{H} \\ \end{bmatrix} \begin{bmatrix} \mathbf{M} & \mathbf{N} \\ \end{bmatrix} & = \begin{bmatrix} \mathbf{I}_{r \times r} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{s \times s} \\ \end{bmatrix} \\ \begin{bmatrix} \mathbf{M} & \mathbf{N} \\ \end{bmatrix}^{-1} & = \begin{bmatrix} (\mathbf{M}^{H} \mathbf{M})^{-1} \mathbf{M}^{H} \\ \mathbf{N}^{H} \\ \end{bmatrix}. \end{aligned}

Since \mathcal{M} and \mathcal{M}^{\perp} are complementary subspaces, by the definition of projector,

\begin{aligned} \mathbf{P}_{\mathcal{M}} & = \begin{bmatrix} \mathbf{M} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{M} & \mathbf{N} \end{bmatrix}^{-1} \\ & = \begin{bmatrix} \mathbf{M} & \mathbf{0} \end{bmatrix} \begin{bmatrix} (\mathbf{M}^{H} \mathbf{M})^{-1} \mathbf{M}^{H} \\ \mathbf{N}^{H} \\ \end{bmatrix} \\ & = \mathbf{M} (\mathbf{M}^{H} \mathbf{M})^{-1} \mathbf{M}^{H}. \end{aligned}
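
As a sanity check of this formula, the short NumPy sketch below (with an arbitrary full-column-rank matrix M of my own choosing) compares it against the projector Q Q^H built from an orthonormal basis Q of R(M), to which the formula collapses since Q^H Q = I.

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary example data: columns of M form a (non-orthonormal) basis of an
# r-dimensional subspace of C^n.
n, r = 5, 2
M = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))

# The formula P_M = M (M^H M)^{-1} M^H.
P = M @ np.linalg.inv(M.conj().T @ M) @ M.conj().T

# The same projector built from an orthonormal basis Q of R(M) (via QR),
# for which the formula reduces to Q Q^H because Q^H Q = I.
Q, _ = np.linalg.qr(M)
print(np.allclose(P, Q @ Q.conj().T))   # True: P depends only on the subspace

# Properties discussed in the next subsection: P is idempotent and Hermitian.
print(np.allclose(P @ P, P), np.allclose(P.conj().T, P))   # True True
```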

Properties of orthogonal projection matrix

  • \mathbf{P} is an orthogonal projection matrix if and only if

    R (\mathbf{P}) = N (\mathbf{P})^{\perp}.

    Proof

    TODO

  • \mathbf{P} is an orthogonal projection matrix if and only if it is Hermitian (see the numerical contrast after this list),

    \mathbf{P}^{H} = \mathbf{P}.

    Proof

    According to the orthogonal decomposition theorem,

    \begin{aligned} R (\mathbf{P})^{\perp} & = N (\mathbf{P}^{H}) \\ R (\mathbf{P}) & = N (\mathbf{P}^{H})^{\perp}. \\ \end{aligned}

    According to the previous property, \mathbf{P} is an orthogonal projection matrix if and only if R (\mathbf{P}) = N (\mathbf{P})^{\perp}, which combined with the above holds if and only if

    N (\mathbf{P})^{\perp} = N (\mathbf{P}^{H})^{\perp}, \quad \text{that is,} \quad N (\mathbf{P}) = N (\mathbf{P}^{H}).

    Since \mathbf{P} and \mathbf{P}^{H} are then both projectors with the same null space and the same range (by the orthogonal decomposition theorem, R (\mathbf{P}^{H}) = N (\mathbf{P})^{\perp} = R (\mathbf{P})), they are the same projector, so \mathbf{P} = \mathbf{P}^{H}.
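
To illustrate the Hermitian characterization, the following sketch contrasts a hand-picked, purely illustrative oblique projector, which is idempotent but not Hermitian and therefore not an orthogonal projection matrix, with an orthogonal one.

```python
import numpy as np

# Oblique projector onto span{e1} along span{(1, 1)}: idempotent, not Hermitian.
P_oblique = np.array([[1.0, -1.0],
                      [0.0,  0.0]])
print(np.allclose(P_oblique @ P_oblique, P_oblique))   # True: it is a projector
print(np.allclose(P_oblique.conj().T, P_oblique))      # False: not orthogonal

# Orthogonal projector onto span{e1}: idempotent and Hermitian.
P_orth = np.array([[1.0, 0.0],
                   [0.0, 0.0]])
print(np.allclose(P_orth @ P_orth, P_orth))            # True
print(np.allclose(P_orth.conj().T, P_orth))            # True
```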

Application: least squares problem

Consider the problem of solving a system of linear equations for \mathbf{x} \in \mathbb{C}^{n} given \mathbf{y} \in \mathbb{C}^{m} and \mathbf{A} \in \mathbb{C}^{m \times n},

\mathbf{y} = \mathbf{A} \mathbf{x}.

This problem has a solution if and only if \mathbf{y} \in R (\mathbf{A}). When it has no solution, the objective is changed to solving the least squares problem

\mathbf{x}^{*} = \arg\min_{\mathbf{x} \in \mathbb{C}^{n}} \lVert \mathbf{y} - \mathbf{A} \mathbf{x} \rVert_{2}^{2}

so that \mathbf{A} \mathbf{x} can be as close to \mathbf{y} as possible.

Solving the least square problem is the same as solving an orthogonal projection problem,

\begin{aligned} \mathbf{x}^{*} & = \arg\min_{\mathbf{x} \in \mathbb{C}^{n}} \lVert \mathbf{y} - \mathbf{A} \mathbf{x} \rVert_{2}^{2} \\ & = \arg\min_{\mathbf{x} \in \mathbb{C}^{n}} \lVert \mathbf{y} - \mathbf{A} \mathbf{x} \rVert_{2} & [t \mapsto t^{2} \text{ is increasing for } t \geq 0] \\ \mathbf{z}^{*} & = \arg\min_{\mathbf{z} \in R (\mathbf{A})} \lVert \mathbf{y} - \mathbf{z} \rVert_{2} & [\mathbf{z} = \mathbf{A} \mathbf{x}, \mathbf{z}^{*} = \mathbf{A} \mathbf{x}^{*}], \end{aligned}

which is the problem of finding the closest point to \mathbf{y} in R (\mathbf{A}),

\begin{aligned} \mathbf{z}^{*} & = \mathbf{P}_{R (\mathbf{A})} \mathbf{y} \\ \mathbf{A} \mathbf{x}^{*} & = \mathbf{P}_{R (\mathbf{A})} \mathbf{y}. \end{aligned}

We can then deduce the system of normal equations:

\mathbf{A} \mathbf{x}^{*} = \mathbf{P}_{R (\mathbf{A})} \mathbf{y} \iff \mathbf{A}^{H} \mathbf{A} \mathbf{x}^{*} = \mathbf{A}^{H} \mathbf{y}.

Proof

First, multiply both sides by \mathbf{P}_{R (\mathbf{A})} and use the projector property \mathbf{P}_{R (\mathbf{A})}^{2} = \mathbf{P}_{R (\mathbf{A})},

\begin{aligned} \mathbf{A} \mathbf{x}^{*} & = \mathbf{P}_{R (\mathbf{A})} \mathbf{y} \\ \mathbf{P}_{R (\mathbf{A})} \mathbf{A} \mathbf{x}^{*} & = \mathbf{P}_{R (\mathbf{A})}^{2} \mathbf{y} \\ \mathbf{P}_{R (\mathbf{A})} \mathbf{A} \mathbf{x}^{*} & = \mathbf{P}_{R (\mathbf{A})} \mathbf{y}. & [\mathbf{P}_{R (\mathbf{A})}^{2} = \mathbf{P}_{R (\mathbf{A})}] \\ \end{aligned}

Then,

\begin{aligned} \mathbf{P}_{R (\mathbf{A})} \mathbf{A} \mathbf{x}^{*} & = \mathbf{P}_{R (\mathbf{A})} \mathbf{y} \\ \mathbf{P}_{R (\mathbf{A})} (\mathbf{A} \mathbf{x}^{*} - \mathbf{y}) & = 0, \\ \end{aligned}

which shows that

\mathbf{A} \mathbf{x}^{*} - \mathbf{y} \in N (\mathbf{P}_{R (\mathbf{A})}).

According to the property of the orthogonal projection matrix and the orthogonal decomposition theorem,

N (\mathbf{P}_{R (\mathbf{A})}) = R (\mathbf{A})^{\perp} = N (\mathbf{A}^{H}),

which means

\begin{aligned} \mathbf{A}^{H} (\mathbf{A} \mathbf{x}^{*} - \mathbf{y}) & = 0 \\ \mathbf{A}^{H} \mathbf{A} \mathbf{x}^{*} & = \mathbf{A}^{H} \mathbf{y}. \end{aligned}
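
A minimal numerical sketch of this result, using randomly generated example data: solving the normal equations directly agrees with NumPy's least-squares solver, and the residual is orthogonal to R(\mathbf{A}).

```python
import numpy as np

rng = np.random.default_rng(2)

# Overdetermined example system y = A x with (almost surely) no exact solution.
m, n = 8, 3
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Solve the normal equations A^H A x = A^H y derived above.
x_star = np.linalg.solve(A.conj().T @ A, A.conj().T @ y)

# Same answer as NumPy's least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(x_star, x_lstsq))                      # True

# The residual y - A x* is orthogonal to R(A): A^H (y - A x*) = 0.
print(np.allclose(A.conj().T @ (y - A @ x_star), 0))     # True
```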

Affine space

A set of vectors in the vector space \mathcal{V} forms an affine space \mathcal{A} if it consists of the sums of the vectors in a subspace \mathcal{M} \subset \mathcal{V} and a non-zero vector v \in \mathcal{V},

\mathcal{A} = v + \mathcal{M}.

That is, every vector in \mathcal{A} is the sum of a vector in \mathcal{M} and v.

  • \mathcal{A} is in general NOT a subspace, as it does not necessarily contain the 0 vector (see the sketch after this list).

  • \mathcal{A} can be visualized as the subspace \mathcal{M} translated away from the origin by v.
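
The sketch below illustrates this with a line in \mathbb{R}^{2} that does not pass through the origin; the particular v and direction vector are arbitrary example choices.

```python
import numpy as np

# A = v + M with M = span{(1, 1)} and v = (1, 0): a line in R^2 not through 0.
v = np.array([1.0, 0.0])
d = np.array([1.0, 1.0])   # direction vector spanning M

def in_A(x):
    """x is in A iff x - v lies in M = span{d} (zero 2D cross product)."""
    r = x - v
    return bool(np.isclose(r[0] * d[1] - r[1] * d[0], 0.0))

a1, a2 = v + 2.0 * d, v - 1.0 * d   # two points of A
print(in_A(a1), in_A(a2))           # True True
print(in_A(np.zeros(2)))            # False: 0 is not in A, so A is not a subspace
print(in_A(a1 + a2))                # False: A is not even closed under addition
```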

Affine projection

Although an affine space is in general not a subspace, the concept of orthogonal projection can also be applied to affine spaces.

Given a vector b \in \mathcal{V} and an affine space \mathcal{A} = v + \mathcal{M}, the affine projection a \in \mathcal{A} is the orthogonal projection of b onto the affine space \mathcal{A} and can be expressed as

a = v + \mathbf{P}_{\mathcal{M}} (b - v),

where \mathbf{P}_{\mathcal{M}} is the orthogonal projection matrix onto \mathcal{M}.

Proof

Since the affine space \mathcal{A} is the subspace \mathcal{M} translated by the vector v, we can translate everything back by -v: project the translated vector b - v onto \mathcal{M}, and then translate the result back by adding v.

Thus, the affine projection onto \mathcal{A} is an orthogonal projection onto \mathcal{M} with a translated input and output.

By the definition of the orthogonal projection,

a - v = \mathbf{P}_{\mathcal{M}} (b - v)

where a - v is the translated version of a, b - v is the translated version of b, and \mathbf{P}_{\mathcal{M}} is the orthogonal projection matrix onto the subspace \mathcal{M}.

Thus, the affine projection a of b onto \mathcal{A} can be expressed as

a = v + \mathbf{P}_{\mathcal{M}} (b - v).
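
A quick numerical check of this formula with arbitrary example data in \mathbb{R}^{4}: the result a lies in the affine space, and the difference b - a is orthogonal to \mathcal{M}.

```python
import numpy as np

rng = np.random.default_rng(3)

# Example affine space A = v + M in R^4, with M spanned by the columns of M_basis.
n, r = 4, 2
M_basis = rng.standard_normal((n, r))
v = rng.standard_normal(n)
b = rng.standard_normal(n)

# Orthogonal projection matrix onto M, then the affine projection formula above.
P_M = M_basis @ np.linalg.inv(M_basis.T @ M_basis) @ M_basis.T
a = v + P_M @ (b - v)

# a is in A: its offset a - v lies in M (it is reproduced exactly by P_M).
print(np.allclose(P_M @ (a - v), a - v))      # True
# b - a is orthogonal to M, as expected of an orthogonal projection.
print(np.allclose(M_basis.T @ (b - a), 0))    # True
```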

Hyperplanes

An affine space \mathcal{H} = \mathbf{v} + \mathcal{M} \subseteq \mathbb{R}^{n} for which \text{dim} (\mathcal{M}) = n - 1 is called a hyperplane, and is usually expressed as the set

\mathcal{H} = \{ \mathbf{x} \mid \mathbf{w}^{T} \mathbf{x} = \beta \}

where \beta is a scalar and \mathbf{w} is a non-zero vector.

In this case, the hyperplane can be viewed as the subspace

\mathcal{M} = \mathbf{w}^{\perp}

translated by the vector

\mathbf{v} = \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w}.

Proof

The set \mathcal{H} is exactly the set of solutions of the linear system \mathbf{w}^{T} \mathbf{x} = \beta.

A particular solution to the linear system can be obtained from a right inverse of \mathbf{w}^{T},

\begin{aligned} \mathbf{x} & = (\mathbf{w}^{T})^{T} (\mathbf{w}^{T} (\mathbf{w}^{T})^{T})^{-1} \beta \\ & = \mathbf{w} (\mathbf{w}^{T} \mathbf{w})^{-1} \beta \\ & = \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} & [\beta, \mathbf{w}^{T} \mathbf{w} \text{ are scalars}], \end{aligned}

which indeed satisfies \mathbf{w}^{T} \mathbf{x} = \beta.

According to the orthogonal decomposition theorem, the solution set of the associated homogeneous system \mathbf{w}^{T} \mathbf{x} = 0, namely N (\mathbf{w}^{T}), can be expressed as

\begin{aligned} N (\mathbf{w}^{T}) & = R (\mathbf{w})^{\perp} \\ & = \{ \mathbf{x} \mid \langle \mathbf{x}, \mathbf{y} \rangle = 0, \forall \mathbf{y} \in R (\mathbf{w}) \} & [\text{def of perp}] \\ & = \{ \mathbf{x} \mid \langle \mathbf{x}, \alpha \mathbf{w} \rangle = 0, \forall \alpha \in \mathbb{R} \} & [\text{def of } R (\mathbf{w})] \\ & = \{ \mathbf{x} \mid \alpha \langle \mathbf{x}, \mathbf{w} \rangle = 0, \forall \alpha \in \mathbb{R} \} & [\text{def of } \langle \cdot, \cdot \rangle] \\ & = \{ \mathbf{x} \mid \langle \mathbf{x}, \mathbf{w} \rangle = 0 \} \\ & = \mathbf{w}^{\perp}. \end{aligned}

Since the general solution of any linear system is the sum of a particular solution and the general solution of the associated homogeneous equation, the set of general solutions is expressed as

\frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} + \mathbf{w}^{\perp}.

Thus,

\mathcal{H} = \mathbf{v} + \mathcal{M},

where

\mathbf{v} = \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w},

\mathcal{M} = \mathbf{w}^{\perp}.

TODO: explain the general solution of any linear system.
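
A small numerical check of this decomposition, with an arbitrary \mathbf{w} and \beta in \mathbb{R}^{3}: the particular solution \mathbf{v} lies on \mathcal{H}, and adding any vector of \mathbf{w}^{\perp} stays on \mathcal{H}.

```python
import numpy as np

rng = np.random.default_rng(4)

# Example hyperplane H = {x : w^T x = beta} in R^3.
w = rng.standard_normal(3)
beta = 1.7

# The particular solution v = (beta / w^T w) w lies on H.
v = (beta / (w @ w)) * w
print(np.isclose(w @ v, beta))              # True

# Take any vector and remove its component along w to get an element of w_perp;
# adding it to v stays on H.
x = rng.standard_normal(3)
x_perp = x - ((w @ x) / (w @ w)) * w
print(np.isclose(w @ x_perp, 0.0))          # True: x_perp is in w_perp
print(np.isclose(w @ (v + x_perp), beta))   # True: v + x_perp is on H
```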

The orthogonal projection \mathbf{a} of a point \mathbf{b} \in \mathbb{R}^{n} onto the hyperplane \mathcal{H} is given by

\mathbf{a} = \mathbf{b} - \left( \frac{ \mathbf{w}^{T} \mathbf{b} - \beta }{ \mathbf{w}^{T} \mathbf{w} } \right) \mathbf{w}.

Proof

Since we know that \mathcal{H} is an affine space with \mathcal{M} = \mathbf{w}^{\perp} and \mathbf{v} = \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w}, the affine projection can be expressed as

\begin{aligned} \mathbf{a} & = \mathbf{v} + \mathbf{P}_{\mathcal{M}} (\mathbf{b} - \mathbf{v}) \\ & = \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} + \mathbf{P}_{\mathbf{w}^{\perp}} \left( \mathbf{b} - \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} \right). \end{aligned}

According to the property of projection matrices (with \mathbf{P}_{\mathbf{w}} denoting the orthogonal projection matrix onto \text{span} (\mathbf{w})),

\mathbf{P}_{\mathbf{w}^{\perp}} = \mathbf{I} - \mathbf{P}_{\mathbf{w}}.

Writing a basis of the subspace \mathcal{M} as the columns of a matrix \mathbf{M}, the orthogonal projection matrix formula gives, with \mathbf{M} = \mathbf{w},

\begin{aligned} \mathbf{P}_{\mathcal{M}} & = \mathbf{M} (\mathbf{M}^{T} \mathbf{M})^{-1} \mathbf{M}^{T} \\ \mathbf{P}_{\mathbf{w}} & = \mathbf{w} (\mathbf{w}^{T} \mathbf{w})^{-1} \mathbf{w}^{T} \\ \mathbf{P}_{\mathbf{w}} & = \frac{\mathbf{w} \mathbf{w}^{T}}{\mathbf{w}^{T} \mathbf{w}}. \end{aligned}

Thus,

\mathbf{P}_{\mathbf{w}^{\perp}} = \mathbf{I} - \frac{\mathbf{w} \mathbf{w}^{T}}{\mathbf{w}^{T} \mathbf{w}}.

Plugging this back into the expression for \mathbf{a} above,

\begin{aligned} \mathbf{a} & = \frac{ \beta }{ \mathbf{w}^{T} \mathbf{w} } \mathbf{w} + \mathbf{P}_{\mathbf{w}^{\perp}} \left( \mathbf{b} - \frac{ \beta }{ \mathbf{w}^{T} \mathbf{w} } \mathbf{w} \right) \\ & = \frac{ \beta }{ \mathbf{w}^{T} \mathbf{w} } \mathbf{w} + \left( \mathbf{I} - \frac{ \mathbf{w} \mathbf{w}^{T} }{ \mathbf{w}^{T} \mathbf{w} } \right) \left( \mathbf{b} - \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} \right) \\ & = \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} + \left( \mathbf{b} - \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} - \frac{\mathbf{w} \mathbf{w}^{T}}{\mathbf{w}^{T} \mathbf{w}} \mathbf{b} + \frac{\beta \mathbf{w}^{T} \mathbf{w}}{(\mathbf{w}^{T} \mathbf{w})^{2}} \mathbf{w} \right) \\ & = \mathbf{b} - \frac{\mathbf{w} \mathbf{w}^{T}}{\mathbf{w}^{T} \mathbf{w}} \mathbf{b} + \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} \\ & = \mathbf{b} - \frac{\mathbf{w}^{T} \mathbf{b}}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} + \frac{\beta}{\mathbf{w}^{T} \mathbf{w}} \mathbf{w} & [\mathbf{w} \mathbf{w}^{T} \mathbf{b} = (\mathbf{w}^{T} \mathbf{b}) \mathbf{w}] \\ & = \mathbf{b} - \left( \frac{\mathbf{w}^{T} \mathbf{b} - \beta}{\mathbf{w}^{T} \mathbf{w}} \right) \mathbf{w}. \end{aligned}
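
Finally, a numerical sketch of this closed-form projection onto a hyperplane, with arbitrary example values of w, \beta, and b: the result lies on \mathcal{H}, the correction b - a is parallel to w, and it matches the general affine projection formula.

```python
import numpy as np

rng = np.random.default_rng(5)

# Example hyperplane H = {x : w^T x = beta} and a point b in R^3.
w = rng.standard_normal(3)
beta = -0.4
b = rng.standard_normal(3)

# Closed-form projection derived above.
a = b - ((w @ b - beta) / (w @ w)) * w

print(np.isclose(w @ a, beta))                # True: a lies on the hyperplane
print(np.allclose(np.cross(b - a, w), 0))     # True: b - a is parallel to w

# Same result from the general affine projection a = v + P_{w_perp} (b - v).
v = (beta / (w @ w)) * w
P_perp = np.eye(3) - np.outer(w, w) / (w @ w)
print(np.allclose(a, v + P_perp @ (b - v)))   # True
```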
