8 Generalized Inverses

8.1 Motivation

Consider the linear system \(X\beta = y\). In \(\mathbb{R}^2\), if \(X = [x_1, x_2]\) is invertible, the solution is unique: \(\beta = X^{-1}y\). This satisfies \(X(X^{-1}y) = y\).However, if \(X\) is not square or not invertible (e.g., \(X\) is \(2 \times 3\)), \(X\beta = y\) does not have a unique solution. We seek a matrix \(G\) such that \(\beta = Gy\) provides a solution whenever \(y \in C(X)\) (the column space of X). Substituting \(\beta = Gy\) into the equation \(X\beta = y\): \[ X(Gy) = y \quad \forall y \in C(X) \] Since any \(y \in C(X)\) can be written as \(Xw\) for some vector \(w\): \[ XGXw = Xw \quad \forall w \] This implies the defining condition: \[ XGX = X \]

8.2 Definition of Generalized Inverse

Definition 8.1 (Generalized Inverse) Let \(X\) be an \(n \times p\) matrix. A matrix \(X^-\) of size \(p \times n\) is called a generalized inverse of \(X\) if it satisfies: \[ XX^-X = X \]

Example 8.1 (Examples of Generalized Inverse)

Example 1: Diagonal Matrix If \(X = \text{diag}(\lambda_1, \lambda_2, 0, 0)\), we can write it in matrix form as: \[ X = \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \] A generalized inverse is obtained by inverting the non-zero elements: \[ X^- = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 & 0 \\ 0 & \lambda_2^{-1} & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]
Example 2: Row Vector Let \(X = (1, 2, 3)\). One possible generalized inverse is a column vector where the first element is the reciprocal of the first non-zero element of \(X\) (which is \(1\)), and others are zero: \[ X^- = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \] Verification: \[ XX^-X = (1, 2, 3) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} (1, 2, 3) = (1) \cdot (1, 2, 3) = (1, 2, 3) = X \] Other valid generalized inverses include \(\begin{pmatrix} 0 \\ 1/2 \\ 0 \end{pmatrix}\) or \(\begin{pmatrix} 0 \\ 0 \\ 1/3 \end{pmatrix}\).
Example 3: Rank Deficient Matrix Let \(A = \begin{pmatrix} 2 & 2 & 3 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{pmatrix}\). Note that Row 3 = Row 1 + Row 2, so Rank\((A) = 2\).

Solution: A generalized inverse can be found by locating a non-singular \(2 \times 2\) submatrix, inverting it, and padding the rest with zeros. Let’s take the top-left minor \(M = \begin{pmatrix} 2 & 2 \\ 1 & 0 \end{pmatrix}\). The inverse is \(M^{-1} = \frac{1}{-2}\begin{pmatrix} 0 & -2 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0.5 & -1 \end{pmatrix}\).

Placing this in the corresponding position in \(A^-\) and setting the rest to 0: \[ A^- = \begin{pmatrix} 0 & 1 & 0 \\ 0.5 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]

Verification (\(AA^-A = A\)): First, compute \(AA^-\): \[ AA^- = \begin{pmatrix} 2 & 2 & 3 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 0.5 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} \] Then multiply by \(A\): \[ (AA^-)A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 2 & 3 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{pmatrix} = A \]

8.3 A Procedure to Find a Generalized Inverse

If we can partition \(X\) (possibly after permuting rows/columns) such that \(R_{11}\) is a non-singular rank \(r\) submatrix:

\[ X = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} \]

Then a generalized inverse is:

\[ X^- = \begin{pmatrix} R_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \]

Verification:

\[ \begin{aligned} XX^-X &= \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} \begin{pmatrix} R_{11}^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} \\ &= \begin{pmatrix} I_r & 0 \\ R_{21}R_{11}^{-1} & 0 \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} \\ &= \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{21}R_{11}^{-1}R_{12} \end{pmatrix} \end{aligned} \] Note that since rank\((X) = \text{rank}(R_{11})\), the rows of \([R_{21}, R_{22}]\) are linear combinations of \([R_{11}, R_{12}]\), implying \(R_{22} = R_{21}R_{11}^{-1}R_{12}\). Thus, \(XX^-X = X\).

An Algorithm for Finding a Generalized Inverse

A systematic procedure to find a generalized inverse \(A^-\) for any matrix \(A\):

Find any non-singular \(r \times r\) submatrix \(C\), where \(r\) is the rank of \(A\). It is not necessary for the elements of \(C\) to occupy adjacent rows and columns in \(A\).
Find \(C^{-1}\) and \((C^{-1})'\).
Replace the elements of \(C\) in \(A\) with the elements of \((C^{-1})'\).
Replace all other elements in \(A\) with zeros.
Transpose the resulting matrix.

Matrix Visual Representation \[ \underset{\text{Original } A}{\begin{pmatrix} \times & \otimes & \times & \otimes \\ \times & \otimes & \times & \otimes \\ \times & \times & \times & \times \end{pmatrix}} \xrightarrow[\text{with } (C^{-1})']{\text{Replace } C} \underset{\text{Intermediate}}{\begin{pmatrix} \times & \triangle & \times & \triangle \\ \times & \triangle & \times & \triangle \\ \times & \times & \times & \times \end{pmatrix}} \xrightarrow[\text{Result}]{\text{Transpose}} \underset{\text{Final } A^-}{\begin{pmatrix} \times & \times & \times \\ \square & \square & \times \\ \times & \times & \times \\ \square & \square & \times \end{pmatrix}} \]

Legend:

\(\otimes\): Elements of submatrix \(C\)
\(\triangle\): Elements of \((C^{-1})'\)
\(\square\): Elements of \(C^{-1}\) (after transposition)
\(\times\): Other elements (replaced by 0 in the final calculation)

8.4 Moore-Penrose Inverse

The Moore-Penrose inverse (denoted \(X^+\)) is a unique generalized inverse defined via Singular Value Decomposition (SVD).

If \(X\) has SVD: \[ X = U \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} V' \]

Then the Moore-Penrose inverse is: \[ X^+ = V \begin{pmatrix} \Lambda_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U' \]

where \(\Lambda_r = \text{diag}(\lambda_1, \dots, \lambda_r)\) contains the singular values. Unlike standard generalized inverses, \(X^+\) is unique.

Verification:

We verify that \(X^+\) satisfies the condition \(XX^+X = X\).

Substitute definitions: \[ XX^+X = \left[ U \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} V' \right] \left[ V \begin{pmatrix} \Lambda_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} U' \right] \left[ U \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} V' \right] \]
Apply orthogonality: Recall that \(V'V = I\) and \(U'U = I\). \[ = U \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} \underbrace{(V'V)}_{I} \begin{pmatrix} \Lambda_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} \underbrace{(U'U)}_{I} \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} V' \]
Multiply diagonal matrices: \[ = U \left[ \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \Lambda_r^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} \right] V' \] Since \(\Lambda_r \Lambda_r^{-1} \Lambda_r = I \cdot \Lambda_r = \Lambda_r\): \[ = U \begin{pmatrix} \Lambda_r & 0 \\ 0 & 0 \end{pmatrix} V' = X \]

8.5 Solving Linear Systems with Generalized Inverse

We apply generalized inverses to solve systems of linear equations \(X\beta = c\) where \(X\) is \(n \times p\).

Definition 8.2 (Consistency and Solution) The system \(X\beta = c\) is consistent if and only if \(c \in \text{Col}(X)\) (the column space of \(X\)). If consistent, \(\beta = X^- c\) is a solution.

Proof: If the system is consistent, there exists some \(b\) such that \(Xb = c\). Using the definition \(XX^-X = X\): \[ X(X^- c) = X(X^- X b) = (XX^-X)b = Xb = c \] Thus, \(X^-c\) is a solution. Note that the solution is not unique if \(X\) is not full rank.

Example 8.2 (Examples of Solutions of Linear System with Generalized Inverse)

Example 1: Underdetermined System

Let \(X = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}\) and we want to solve \(X\beta = 4\).

Solution 1: Using the generalized inverse \(X^- = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\): \[ \beta = X^- \cdot 4 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} 4 = \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix} \] Verification: \[ X\beta = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ 0 \\ 0 \end{pmatrix} = 1(4) + 2(0) + 3(0) = 4 \quad \checkmark \]

Solution 2: Using another generalized inverse \(X^- = \begin{pmatrix} 0 \\ 0 \\ 1/3 \end{pmatrix}\): \[ \beta = X^- \cdot 4 = \begin{pmatrix} 0 \\ 0 \\ 1/3 \end{pmatrix} 4 = \begin{pmatrix} 0 \\ 0 \\ 4/3 \end{pmatrix} \] Verification: \[ X\beta = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 4/3 \end{pmatrix} = 0 + 0 + 3(4/3) = 4 \quad \checkmark \]
Example 2: Overdetermined System

Let \(X = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\). Solve \(X\beta = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = c\). Here \(c = 2X\), so the system is consistent. Since \(X\) is a column vector, \(\beta\) is a scalar.

Solution: Using the generalized inverse \(X^- = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}\): \[ \beta = X^- c = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = 1(2) + 0(4) + 0(6) = 2 \] Verification: \[ X\beta = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} (2) = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = c \quad \checkmark \]

8.6 Least Squares for Non-full-rank \(X\) with Generalized Inverse

8.6.1 Projection Matrix with Generalized Inverse of \(X'X\)

For the normal equations \((X'X)\beta = X'y\), a solution is given by: \[ \hat{\beta} = (X'X)^- X'y \] The fitted values are \[\hat{y} = X\hat{\beta} = X(X'X)^- X'y.\] This \(\hat{y}\) represents the unique orthogonal projection of \(y\) onto \(\text{Col}(X)\).

8.6.2 Invariance and Uniqueness of “the” Projection Matrix

Theorem 8.1 (Transpose Property of Generalized Inverses) \((X^-)'\) is a version of \((X')^-\). That is, \((X^-)'\) is a generalized inverse of \(X'\).

Proof. By definition, a generalized inverse \(X^-\) satisfies the property: \[ X X^- X = X \]

To verify that \((X^-)'\) is a generalized inverse of \(X'\), we need to show that it satisfies the condition \(A G A = A\) where \(A = X'\) and \(G = (X^-)'\).

Start with the fundamental definition: \[ X X^- X = X \]
Take the transpose of both sides of the equation: \[ (X X^- X)' = X' \]
Apply the reverse order law for transposes, \((ABC)' = C' B' A'\): \[ X' (X^-)' X' = X' \]

Since substituting \((X^-)'\) into the generalized inverse equation for \(X'\) yields \(X'\), \((X^-)'\) is a valid generalized inverse of \(X'\).

Lemma 8.1 (Invariance of Generalized Least Squares) For any version of the generalized inverse \((X'X)^-\), the matrix \(X'(X'X)^- X'\) is invariant and equals \(X'\). \[ X'X(X'X)^- X' = X' \]

Proof (using Projection): Let \(P = X(X'X)^- X'\). This is the projection matrix onto \(\text{Col}(X)\). By definition of projection, \(Px = x\) for any \(x \in \text{Col}(X)\). Since columns of \(X\) are in \(\text{Col}(X)\), \(PX = X\). Taking the transpose: \((PX)' = X' \implies X'P' = X'\). Since projection matrices are symmetric (\(P=P'\)), \(X'P = X'\). Substituting \(P\): \(X' X (X'X)^- X' = X'\).

Proof (Direct Matrix Manipulation): Decompose \(y = X\beta + e\) where \(e \perp \text{Col}(X)\) (i.e., \(X'e = 0\)). \[ \begin{aligned} X'X(X'X)^- X' y &= X'X(X'X)^- X' (X\beta + e) \\ &= X'X(X'X)^- X'X\beta + X'X(X'X)^- X'e \end{aligned} \] Using the property \(A A^- A = A\) (where \(A=X'X\)), the first term becomes \(X'X\beta\). The second term is 0 because \(X'e = 0\). Thus, the expression simplifies to \(X'X\beta = X'(X\beta) = X'\hat{y}_{\text{proj}}\). This implies the operator acts as \(X'\).

Theorem 8.2 (Properties of Projection Matrix \(P\)) Let \(P = X(X'X)^- X'\). This matrix has the following properties:

Symmetry: \(P = P'\).
Idempotence: \(P^2 = P\). \[ P^2 = X(X'X)^- X' X(X'X)^- X' = X(X'X)^- (X'X (X'X)^- X') \] Using the identity from Lemma 8.1 (\(X'X(X'X)^- X' = X'\)), this simplifies to: \[ X(X'X)^- X' = P \]
Uniqueness: \(P\) is unique and invariant to the choice of the generalized inverse \((X'X)^-\).

Proof. Proof of Uniqueness:

Let \(A\) and \(B\) be two different generalized inverses of \(X'X\). Define \(P_A = X A X'\) and \(P_B = X B X'\). From Lemma 8.1, we know that \(X' P_A = X'\) and \(X' P_B = X'\).

Subtracting these two equations: \[ X' (P_A - P_B) = 0 \] Taking the transpose, we get \((P_A - P_B) X = 0\). This implies that the columns of the difference matrix \(D = P_A - P_B\) are orthogonal to the columns of \(X\) (i.e., \(D \perp \text{Col}(X)\)).

However, by definition, the columns of \(P_A\) and \(P_B\) (and thus \(D\)) are linear combinations of the columns of \(X\) (i.e., \(D \in \text{Col}(X)\)).

The only matrix that lies in the column space of \(X\) but is also orthogonal to the column space of \(X\) is the zero matrix. Therefore: \[ P_A - P_B = 0 \implies P_A = P_B \]

Example: Projection with Linearly Dependent Columns and Generalized Inverses

In this example, we project a vector \(y \in \mathbb{R}^4\) onto the column space of a design matrix \(X = [x_1, x_2]\). We examine the specific case where the predictors are perfectly dependent such that \(x_2 = 2x_1\).

We will: 1. Construct the design matrix \(X\) and compute \(X^T X\). 2. Find two different generalized inverses of \(X^T X\). 3. Show that the projection matrix \(P\) is invariant to the choice of generalized inverse. 4. Demonstrate that the projection of \(y\) onto \(L(x_1, x_2)\) is exactly the same as projecting onto \(L(x_1)\) individually.

1. Define the Vectors

Let’s define our vectors in \(\mathbb{R}^4\). We set \(x_2\) to be exactly twice \(x_1\) to enforce linear dependence.

\[x_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 2 \\ 4 \\ 2 \\ 2 \end{bmatrix}, \quad y = \begin{bmatrix} 4 \\ 2 \\ 1 \\ 3 \end{bmatrix}\]

2. Design Matrix and \(X^T X\)

The design matrix \(X = [x_1, x_2]\) is: \[X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 1 & 2 \\ 1 & 2 \end{bmatrix}\]

To find the projection, we first compute the matrix \(X^T X\): \[X^T X = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 14 \\ 14 & 28 \end{bmatrix}\]

The determinant of this matrix is \((7)(28) - (14)(14) = 196 - 196 = 0\). Because the determinant is \(0\), the matrix \(X^T X\) is singular and cannot be inverted normally. We must use a generalized inverse.

3. Compute Two Generalized Inverses

A generalized inverse \(G\) satisfies the condition \(A G A = A\). Because \(X^T X\) has a rank of 1, we can easily find G-inverses by taking the reciprocal of a single non-zero diagonal element and setting all other entries to zero.

Choice 1 (\(G_1\)): Invert the top-left element (\(7 \rightarrow 1/7\)). \[G_1 = \begin{bmatrix} 1/7 & 0 \\ 0 & 0 \end{bmatrix}\]

Choice 2 (\(G_2\)): Invert the bottom-right element (\(28 \rightarrow 1/28\)). \[G_2 = \begin{bmatrix} 0 & 0 \\ 0 & 1/28 \end{bmatrix}\]

4. The Projection Matrix \(P\) is Invariant

The projection matrix is calculated as \(P = X G X^T\). Let’s compute it using \(G_1\):

\[X G_1 = \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1/7 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1/7 & 0 \\ 2/7 & 0 \\ 1/7 & 0 \\ 1/7 & 0 \end{bmatrix}\]

\[P_1 = (X G_1) X^T = \begin{bmatrix} 1/7 & 0 \\ 2/7 & 0 \\ 1/7 & 0 \\ 1/7 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \end{bmatrix} = \frac{1}{7} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix}\]

If you perform the same calculation using \(G_2\), you will yield the exact same \(4 \times 4\) projection matrix. Thus, \(P_1 = P_2 = P\).

The projection of \(y\) onto \(L(x_1, x_2)\) is: \[\hat{y} = P y = \frac{1}{7} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 2 \\ 1 \\ 3 \end{bmatrix} = \frac{1}{7} \begin{bmatrix} 12 \\ 24 \\ 12 \\ 12 \end{bmatrix} \approx \begin{bmatrix} 1.714 \\ 3.429 \\ 1.714 \\ 1.714 \end{bmatrix}\]

5. Comparison to Projecting onto \(x_1\) Individually

For a single vector \(x_1\), the projection matrix is simply \(P_{x_1} = x_1(x_1^T x_1)^{-1}x_1^T\).

First, we find the scalar denominator: \[x_1^T x_1 = 1^2 + 2^2 + 1^2 + 1^2 = 7\]

Next, we construct the matrix: \[P_{x_1} = \frac{1}{7} \begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 1 \end{bmatrix} = \frac{1}{7} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix}\]

Applying this to \(y\): \[P_{x_1} y = \frac{1}{7} \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \\ 1 & 2 & 1 & 1 \\ 1 & 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} 4 \\ 2 \\ 1 \\ 3 \end{bmatrix} = \frac{1}{7} \begin{bmatrix} 12 \\ 24 \\ 12 \\ 12 \end{bmatrix}\]

We have shown that \(P y = P_{x_1} y = \hat{y}\). Because \(x_2\) is merely a scaled version of \(x_1\), it exists entirely within the 1-dimensional space already spanned by \(x_1\). Incorporating \(x_2\) into the model adds no new geometric dimensions, and the resulting projection is identical.

6. R Implementation Verification

Code

# Define vectors
x1 <- c(1, 2, 1, 1)
x2 <- c(2, 4, 2, 2)
y  <- c(4, 2, 1, 3)

# Design matrix X
X <- cbind(x1, x2)
XTX <- t(X) %*% X

# Define two different generalized inverses
G1 <- matrix(c(1/7, 0, 0, 0), nrow = 2)
G2 <- matrix(c(0, 0, 0, 1/28), nrow = 2)

# Verify they are generalized inverses (A * G * A = A)
is_G1_valid <- all.equal(XTX %*% G1 %*% XTX, XTX)
is_G2_valid <- all.equal(XTX %*% G2 %*% XTX, XTX)

# Compute Projection Matrices
P1 <- X %*% G1 %*% t(X)
P2 <- X %*% G2 %*% t(X)

# Compute Projections onto L(x1, x2)
y_hat_G1 <- P1 %*% y
y_hat_G2 <- P2 %*% y

# Compute Projection onto L(x1) only
P_x1 <- (x1 %*% t(x1)) / as.numeric(t(x1) %*% x1)
y_hat_x1 <- P_x1 %*% y

# Compile Results
results <- data.frame(
  Target_y = y,
  Proj_G1  = round(as.vector(y_hat_G1), 4),
  Proj_G2  = round(as.vector(y_hat_G2), 4),
  Proj_x1  = round(as.vector(y_hat_x1), 4)
)

print("Are the generalized inverses valid?")

[1] "Are the generalized inverses valid?"

Code

print(c(G1_Valid = is_G1_valid, G2_Valid = is_G2_valid))

G1_Valid G2_Valid 
    TRUE     TRUE

Code

print("Projection Results:")

[1] "Projection Results:"

Code

print(results)

  Target_y Proj_G1 Proj_G2 Proj_x1
1        4  1.7143  1.7143  1.7143
2        2  3.4286  3.4286  3.4286
3        1  1.7143  1.7143  1.7143
4        3  1.7143  1.7143  1.7143

8.7 The Left Inverse View: Recovering \(\hat{\beta}\) from \(\hat{y}\)

While the geometric properties of the linear model are most naturally established via the unique orthogonal projection \(\hat{y}\), we require a functional mapping—a statistical “bridge”—to translate the distribution of these fitted values back into the parameter space of \(\hat{\beta}\). This bridge is provided by the generalized left inverse.

8.7.1 The Generalized Left Inverse

To recover the parameter estimates directly from the fitted values, we define the generalized left inverse, denoted as \(X_{\text{left}}^-\), such that:

\[ \hat{\beta} = X_{\text{left}}^- \hat{y} \]

A standard choice for this operator, derived from the normal equations, is:

\[ X_{\text{left}}^- = (X' X)^- X' \]

When \(X\) is full-rank, the \(X_{\text{left}}^-\) is unique, which is given by

\[ X_{\text{left}}^- = (X' X)^{-1} X' \]

8.7.2 Verification of the Inverse Property

To verify that \(X_{\text{left}}^-\) acts as a valid generalized inverse of \(X\), it must satisfy the condition \(X X_{\text{left}}^- X = X\). Substituting our definition:

\[ X \underbrace{\left[ (X' X)^- X' \right]}_{X_{\text{left}}^-} X = X (X' X)^- (X' X) \]

Using the property of generalized inverses for symmetric matrices where \((X' X)(X' X)^- X' = X'\), the transpose of this identity gives \(X (X' X)^- (X' X) = X\). Thus, the condition holds:

\[ X X_{\text{left}}^- X = X \]

8.7.3 Recovering the Estimator

We can now demonstrate that applying this left inverse to the fitted values \(\hat{y}\) yields the standard solution to the normal equations.

Substituting the projection formula \(\hat{y} = X(X' X)^- X' y\):

\[ \begin{aligned} X_{\text{left}}^- \hat{y} &= \left[ (X' X)^- X' \right] \left[ X(X' X)^- X' y \right] \\ &= (X' X)^- \underbrace{(X' X) (X' X)^- (X' X)}_{\text{Property } A A^- A = A} (X' X)^- X' y \end{aligned} \]

Simplifying using the generalized inverse property \(A^- A A^- = A^-\) (where \(A = X' X\)):

\[ \begin{aligned} X_{\text{left}}^- \hat{y} &= \underbrace{(X' X)^- (X' X) (X' X)^-}_{(X' X)^-} X' y \\ &= (X' X)^- X' y \end{aligned} \]

Thus, we recover the standard estimator used in the normal equations:

\[ \mathbf{\hat{\beta} = (X' X)^- X' y} \]

8.8 Non-full-rank Least Squares with QR Decomposition

When \(X\) has rank \(r < p\) (where \(X\) is \(n \times p\)), we can derive the least squares estimator using partitioned matrices.

Assume the first \(r\) columns of \(X\) are linearly independent. We can partition \(X\) as: \[ X = Q (R_1, R_2) \] where \(Q\) is an \(n \times r\) matrix with orthogonal columns (\(Q'Q = I_r\)), \(R_1\) is an \(r \times r\) non-singular matrix, and \(R_2\) is \(r \times (p-r)\).

The normal equations are: \[ X'X\beta = X'y \implies \begin{pmatrix} R_1' \\ R_2' \end{pmatrix} Q' Q (R_1, R_2) \beta = \begin{pmatrix} R_1' \\ R_2' \end{pmatrix} Q'y \] Simplifying (\(Q'Q = I_r\)): \[ \begin{pmatrix} R_1'R_1 & R_1'R_2 \\ R_2'R_1 & R_2'R_2 \end{pmatrix} \beta = \begin{pmatrix} R_1'Q'y \\ R_2'Q'y \end{pmatrix} \]

8.8.1 Constructing a Solution by Solving Normal Equations

One specific generalized inverse of \(X'X\) can be found by focusing on the non-singular block \(R_1'R_1\): \[ (X'X)^- = \begin{pmatrix} (R_1'R_1)^{-1} & 0 \\ 0 & 0 \end{pmatrix} \]

Using this generalized inverse, the estimator \(\hat{\beta}\) becomes: \[ \hat{\beta} = (X'X)^- X'y = \begin{pmatrix} (R_1'R_1)^{-1} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} R_1'Q'y \\ R_2'Q'y \end{pmatrix} \] \[ \hat{\beta} = \begin{pmatrix} (R_1'R_1)^{-1} R_1' Q'y \\ 0 \end{pmatrix} = \begin{pmatrix} R_1^{-1} Q'y \\ 0 \end{pmatrix} \]

The fitted values are: \[ \hat{y} = X\hat{\beta} = Q(R_1, R_2) \begin{pmatrix} R_1^{-1} Q'y \\ 0 \end{pmatrix} = Q R_1 R_1^{-1} Q'y = QQ'y \] This confirms that \(\hat{y}\) is the projection of \(y\) onto the column space of \(Q\) (which is the same as the column space of \(X\)).

8.8.2 Constructing a Solution by Solving Reparametrized \(\beta\)

We can view the model as: \[ y = Q(R_1, R_2)\beta + \epsilon = Qb + \epsilon \] where \(b = R_1\beta_1 + R_2\beta_2\).

Since the columns of \(Q\) are orthogonal, the least squares estimate for \(b\) is simply: \[ \hat{b} = (Q'Q)^{-1}Q'y = Q'y \]

To find \(\beta\), we solve the underdetermined system: \[ R_1\beta_1 + R_2\beta_2 = \hat{b} = Q'y \]

Solution 1: Set \(\beta_2 = 0\). Then: \[ R_1\beta_1 = Q'y \implies \hat{\beta}_1 = R_1^{-1}Q'y \] This yields the same result as the generalized inverse method above: \(\hat{\beta} = \begin{pmatrix} R_1^{-1}Q'y \\ 0 \end{pmatrix}\).

Solution 2: Using the generalized inverse of \(R = (R_1, R_2)\): \[ R^- = \begin{pmatrix} R_1^{-1} \\ 0 \end{pmatrix} \] \[ \hat{\beta} = R^- Q'y = \begin{pmatrix} R_1^{-1}Q'y \\ 0 \end{pmatrix} \] This demonstrates that finding a solution to the normal equations using \((X'X)^-\) is equivalent to solving the reparameterized system \(b = R\beta\).