Consider the linear system \(X\beta = y\). In \(\mathbb{R}^2\), if \(X = [x_1, x_2]\) is invertible, the solution is unique: \(\beta = X^{-1}y\). This satisfies \(X(X^{-1}y) = y\).However, if \(X\) is not square or not invertible (e.g., \(X\) is \(2 \times 3\)), \(X\beta = y\) does not have a unique solution. We seek a matrix \(G\) such that \(\beta = Gy\) provides a solution whenever \(y \in C(X)\) (the column space of X). Substituting \(\beta = Gy\) into the equation \(X\beta = y\): \[
X(Gy) = y \quad \forall y \in C(X)
\] Since any \(y \in C(X)\) can be written as \(Xw\) for some vector \(w\): \[
XGXw = Xw \quad \forall w
\] This implies the defining condition: \[
XGX = X
\]
8.2 Definition of Generalized Inverse
Definition 8.1 (Generalized Inverse) Let \(X\) be an \(n \times p\) matrix. A matrix \(X^-\) of size \(p \times n\) is called a generalized inverse of \(X\) if it satisfies: \[
XX^-X = X
\]
Example 8.1 (Examples of Generalized Inverse)
Example 1: Diagonal Matrix If \(X = \text{diag}(\lambda_1, \lambda_2, 0, 0)\), we can write it in matrix form as: \[
X = \begin{pmatrix}
\lambda_1 & 0 & 0 & 0 \\
0 & \lambda_2 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\] A generalized inverse is obtained by inverting the non-zero elements: \[
X^- = \begin{pmatrix}
\lambda_1^{-1} & 0 & 0 & 0 \\
0 & \lambda_2^{-1} & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]
Example 2: Row Vector Let \(X = (1, 2, 3)\). One possible generalized inverse is a column vector where the first element is the reciprocal of the first non-zero element of \(X\) (which is \(1\)), and others are zero: \[
X^- = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
\]Verification:\[
XX^-X = (1, 2, 3) \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} (1, 2, 3) = (1) \cdot (1, 2, 3) = (1, 2, 3) = X
\] Other valid generalized inverses include \(\begin{pmatrix} 0 \\ 1/2 \\ 0 \end{pmatrix}\) or \(\begin{pmatrix} 0 \\ 0 \\ 1/3 \end{pmatrix}\).
Solution: A generalized inverse can be found by locating a non-singular \(2 \times 2\) submatrix, inverting it, and padding the rest with zeros. Let’s take the top-left minor \(M = \begin{pmatrix} 2 & 2 \\ 1 & 0 \end{pmatrix}\). The inverse is \(M^{-1} = \frac{1}{-2}\begin{pmatrix} 0 & -2 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0.5 & -1 \end{pmatrix}\).
Placing this in the corresponding position in \(A^-\) and setting the rest to 0: \[
A^- = \begin{pmatrix} 0 & 1 & 0 \\ 0.5 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\]
A systematic procedure to find a generalized inverse \(A^-\) for any matrix \(A\):
Find any non-singular \(r \times r\) submatrix \(C\), where \(r\) is the rank of \(A\). It is not necessary for the elements of \(C\) to occupy adjacent rows and columns in \(A\).
Find \(C^{-1}\) and \((C^{-1})'\).
Replace the elements of \(C\) in \(A\) with the elements of \((C^{-1})'\).
8.5 Solving Linear Systems with Generalized Inverse
We apply generalized inverses to solve systems of linear equations \(X\beta = c\) where \(X\) is \(n \times p\).
Definition 8.2 (Consistency and Solution) The system \(X\beta = c\) is consistent if and only if \(c \in \text{Col}(X)\) (the column space of \(X\)). If consistent, \(\beta = X^- c\) is a solution.
Proof: If the system is consistent, there exists some \(b\) such that \(Xb = c\). Using the definition \(XX^-X = X\): \[
X(X^- c) = X(X^- X b) = (XX^-X)b = Xb = c
\] Thus, \(X^-c\) is a solution. Note that the solution is not unique if \(X\) is not full rank.
Example 8.2 (Examples of Solutions of Linear System with Generalized Inverse)
Example 1: Underdetermined System
Let \(X = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}\) and we want to solve \(X\beta = 4\).
Let \(X = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}\). Solve \(X\beta = \begin{pmatrix} 2 \\ 4 \\ 6 \end{pmatrix} = c\). Here \(c = 2X\), so the system is consistent. Since \(X\) is a column vector, \(\beta\) is a scalar.
8.6 Least Squares for Non-full-rank \(X\) with Generalized Inverse
8.6.1 Projection Matrix with Generalized Inverse of \(X'X\)
For the normal equations \((X'X)\beta = X'y\), a solution is given by: \[
\hat{\beta} = (X'X)^- X'y
\] The fitted values are \[\hat{y} = X\hat{\beta} = X(X'X)^- X'y.\] This \(\hat{y}\) represents the unique orthogonal projection of \(y\) onto \(\text{Col}(X)\).
8.6.2 Invariance and Uniqueness of “the” Projection Matrix
Theorem 8.1 (Transpose Property of Generalized Inverses)\((X^-)'\) is a version of \((X')^-\). That is, \((X^-)'\) is a generalized inverse of \(X'\).
Proof. By definition, a generalized inverse \(X^-\) satisfies the property: \[
X X^- X = X
\]
To verify that \((X^-)'\) is a generalized inverse of \(X'\), we need to show that it satisfies the condition \(A G A = A\) where \(A = X'\) and \(G = (X^-)'\).
Start with the fundamental definition: \[
X X^- X = X
\]
Take the transpose of both sides of the equation: \[
(X X^- X)' = X'
\]
Apply the reverse order law for transposes, \((ABC)' = C' B' A'\): \[
X' (X^-)' X' = X'
\]
Since substituting \((X^-)'\) into the generalized inverse equation for \(X'\) yields \(X'\), \((X^-)'\) is a valid generalized inverse of \(X'\).
Lemma 8.1 (Invariance of Generalized Least Squares) For any version of the generalized inverse \((X'X)^-\), the matrix \(X'(X'X)^- X'\) is invariant and equals \(X'\). \[
X'X(X'X)^- X' = X'
\]
Proof (using Projection): Let \(P = X(X'X)^- X'\). This is the projection matrix onto \(\text{Col}(X)\). By definition of projection, \(Px = x\) for any \(x \in \text{Col}(X)\). Since columns of \(X\) are in \(\text{Col}(X)\), \(PX = X\). Taking the transpose: \((PX)' = X' \implies X'P' = X'\). Since projection matrices are symmetric (\(P=P'\)), \(X'P = X'\). Substituting \(P\): \(X' X (X'X)^- X' = X'\).
Proof (Direct Matrix Manipulation): Decompose \(y = X\beta + e\) where \(e \perp \text{Col}(X)\) (i.e., \(X'e = 0\)). \[
\begin{aligned}
X'X(X'X)^- X' y &= X'X(X'X)^- X' (X\beta + e) \\
&= X'X(X'X)^- X'X\beta + X'X(X'X)^- X'e
\end{aligned}
\] Using the property \(A A^- A = A\) (where \(A=X'X\)), the first term becomes \(X'X\beta\). The second term is 0 because \(X'e = 0\). Thus, the expression simplifies to \(X'X\beta = X'(X\beta) = X'\hat{y}_{\text{proj}}\). This implies the operator acts as \(X'\).
Theorem 8.2 (Properties of Projection Matrix \(P\)) Let \(P = X(X'X)^- X'\). This matrix has the following properties:
Symmetry:\(P = P'\).
Idempotence:\(P^2 = P\). \[
P^2 = X(X'X)^- X' X(X'X)^- X' = X(X'X)^- (X'X (X'X)^- X')
\] Using the identity from Lemma 8.1 (\(X'X(X'X)^- X' = X'\)), this simplifies to: \[
X(X'X)^- X' = P
\]
Uniqueness:\(P\) is unique and invariant to the choice of the generalized inverse \((X'X)^-\).
Proof. Proof of Uniqueness:
Let \(A\) and \(B\) be two different generalized inverses of \(X'X\). Define \(P_A = X A X'\) and \(P_B = X B X'\). From Lemma 8.1, we know that \(X' P_A = X'\) and \(X' P_B = X'\).
Subtracting these two equations: \[
X' (P_A - P_B) = 0
\] Taking the transpose, we get \((P_A - P_B) X = 0\). This implies that the columns of the difference matrix \(D = P_A - P_B\) are orthogonal to the columns of \(X\) (i.e., \(D \perp \text{Col}(X)\)).
However, by definition, the columns of \(P_A\) and \(P_B\) (and thus \(D\)) are linear combinations of the columns of \(X\) (i.e., \(D \in \text{Col}(X)\)).
The only matrix that lies in the column space of \(X\) but is also orthogonal to the column space of \(X\) is the zero matrix. Therefore: \[
P_A - P_B = 0 \implies P_A = P_B
\]
Example: Projection with Linearly Dependent Columns and Generalized Inverses
In this example, we project a vector \(y \in \mathbb{R}^4\) onto the column space of a design matrix \(X = [x_1, x_2]\). We examine the specific case where the predictors are perfectly dependent such that \(x_2 = 2x_1\).
We will: 1. Construct the design matrix \(X\) and compute \(X^T X\). 2. Find two different generalized inverses of \(X^T X\). 3. Show that the projection matrix \(P\) is invariant to the choice of generalized inverse. 4. Demonstrate that the projection of \(y\) onto \(L(x_1, x_2)\) is exactly the same as projecting onto \(L(x_1)\) individually.
1. Define the Vectors
Let’s define our vectors in \(\mathbb{R}^4\). We set \(x_2\) to be exactly twice \(x_1\) to enforce linear dependence.
To find the projection, we first compute the matrix \(X^T X\): \[X^T X = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 2 & 4 & 2 & 2 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 4 \\ 1 & 2 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 14 \\ 14 & 28 \end{bmatrix}\]
The determinant of this matrix is \((7)(28) - (14)(14) = 196 - 196 = 0\). Because the determinant is \(0\), the matrix \(X^T X\) is singular and cannot be inverted normally. We must use a generalized inverse.
3. Compute Two Generalized Inverses
A generalized inverse \(G\) satisfies the condition \(A G A = A\). Because \(X^T X\) has a rank of 1, we can easily find G-inverses by taking the reciprocal of a single non-zero diagonal element and setting all other entries to zero.
We have shown that \(P y = P_{x_1} y = \hat{y}\). Because \(x_2\) is merely a scaled version of \(x_1\), it exists entirely within the 1-dimensional space already spanned by \(x_1\). Incorporating \(x_2\) into the model adds no new geometric dimensions, and the resulting projection is identical.
8.7 The Left Inverse View: Recovering \(\hat{\beta}\) from \(\hat{y}\)
While the geometric properties of the linear model are most naturally established via the unique orthogonal projection \(\hat{y}\), we require a functional mapping—a statistical “bridge”—to translate the distribution of these fitted values back into the parameter space of \(\hat{\beta}\). This bridge is provided by the generalized left inverse.
8.7.1 The Generalized Left Inverse
To recover the parameter estimates directly from the fitted values, we define the generalized left inverse, denoted as \(X_{\text{left}}^-\), such that:
\[
\hat{\beta} = X_{\text{left}}^- \hat{y}
\]
A standard choice for this operator, derived from the normal equations, is:
\[
X_{\text{left}}^- = (X' X)^- X'
\]
When \(X\) is full-rank, the \(X_{\text{left}}^-\) is unique, which is given by
\[
X_{\text{left}}^- = (X' X)^{-1} X'
\]
8.7.2 Verification of the Inverse Property
To verify that \(X_{\text{left}}^-\) acts as a valid generalized inverse of \(X\), it must satisfy the condition \(X X_{\text{left}}^- X = X\). Substituting our definition:
\[
X \underbrace{\left[ (X' X)^- X' \right]}_{X_{\text{left}}^-} X = X (X' X)^- (X' X)
\]
Using the property of generalized inverses for symmetric matrices where \((X' X)(X' X)^- X' = X'\), the transpose of this identity gives \(X (X' X)^- (X' X) = X\). Thus, the condition holds:
\[
X X_{\text{left}}^- X = X
\]
8.7.3 Recovering the Estimator
We can now demonstrate that applying this left inverse to the fitted values \(\hat{y}\) yields the standard solution to the normal equations.
Substituting the projection formula \(\hat{y} = X(X' X)^- X' y\):
\[
\begin{aligned}
X_{\text{left}}^- \hat{y} &= \left[ (X' X)^- X' \right] \left[ X(X' X)^- X' y \right] \\
&= (X' X)^- \underbrace{(X' X) (X' X)^- (X' X)}_{\text{Property } A A^- A = A} (X' X)^- X' y
\end{aligned}
\]
Simplifying using the generalized inverse property \(A^- A A^- = A^-\) (where \(A = X' X\)):
Thus, we recover the standard estimator used in the normal equations:
\[
\mathbf{\hat{\beta} = (X' X)^- X' y}
\]
8.8 Non-full-rank Least Squares with QR Decomposition
When \(X\) has rank \(r < p\) (where \(X\) is \(n \times p\)), we can derive the least squares estimator using partitioned matrices.
Assume the first \(r\) columns of \(X\) are linearly independent. We can partition \(X\) as: \[
X = Q (R_1, R_2)
\] where \(Q\) is an \(n \times r\) matrix with orthogonal columns (\(Q'Q = I_r\)), \(R_1\) is an \(r \times r\) non-singular matrix, and \(R_2\) is \(r \times (p-r)\).
8.8.1 Constructing a Solution by Solving Normal Equations
One specific generalized inverse of \(X'X\) can be found by focusing on the non-singular block \(R_1'R_1\): \[
(X'X)^- = \begin{pmatrix} (R_1'R_1)^{-1} & 0 \\ 0 & 0 \end{pmatrix}
\]
The fitted values are: \[
\hat{y} = X\hat{\beta} = Q(R_1, R_2) \begin{pmatrix} R_1^{-1} Q'y \\ 0 \end{pmatrix} = Q R_1 R_1^{-1} Q'y = QQ'y
\] This confirms that \(\hat{y}\) is the projection of \(y\) onto the column space of \(Q\) (which is the same as the column space of \(X\)).
8.8.2 Constructing a Solution by Solving Reparametrized \(\beta\)
We can view the model as: \[
y = Q(R_1, R_2)\beta + \epsilon = Qb + \epsilon
\] where \(b = R_1\beta_1 + R_2\beta_2\).
Since the columns of \(Q\) are orthogonal, the least squares estimate for \(b\) is simply: \[
\hat{b} = (Q'Q)^{-1}Q'y = Q'y
\]
To find \(\beta\), we solve the underdetermined system: \[
R_1\beta_1 + R_2\beta_2 = \hat{b} = Q'y
\]
Solution 1: Set \(\beta_2 = 0\). Then: \[
R_1\beta_1 = Q'y \implies \hat{\beta}_1 = R_1^{-1}Q'y
\] This yields the same result as the generalized inverse method above: \(\hat{\beta} = \begin{pmatrix} R_1^{-1}Q'y \\ 0 \end{pmatrix}\).
Solution 2: Using the generalized inverse of \(R = (R_1, R_2)\): \[
R^- = \begin{pmatrix} R_1^{-1} \\ 0 \end{pmatrix}
\]\[
\hat{\beta} = R^- Q'y = \begin{pmatrix} R_1^{-1}Q'y \\ 0 \end{pmatrix}
\] This demonstrates that finding a solution to the normal equations using \((X'X)^-\) is equivalent to solving the reparameterized system \(b = R\beta\).