7  Hypothesis Testing in Linear Models

7.1 Testing Reduced Model vs Full Model (Partial F-test)

Consider the general linear model partitioned as follows:

\[ y = X\beta + e = (X_{1}, X_{2}) \begin{pmatrix} \beta_{1} \\ \beta_{2} \end{pmatrix} + e = X_1 \beta_1 + X_2 \beta_2 + e \]

where \(X\) is an \(n \times (k+1)\) matrix, \(X_1\) corresponds to the first set of predictors, and \(X_2\) corresponds to the remaining \(h\) predictors . We assume \(e \sim N(0, \sigma^2 I)\).

We are often interested in testing the hypothesis that the second set of coefficients is zero:

\[ H_0: \beta_2 = 0 \]

This leads to a comparison between two models:

  1. Full Model (FM): The maintained hypothesis where \(\mu \in C(X) = C([X_1, X_2])\) .

    \[ y = X_1 \beta_1 + X_2 \beta_2 + e \]

  2. Reduced Model (RM): The model under \(H_0\) where \(\mu \in C(X_1)\).

    \[ y = X_1 \beta_1 + e^* \]

The testing problem is to test \(H_0: \mu \in C(X_1)\) versus \(H_1: \mu \in C(X)\) but \(\mu \notin C(X_1)\) .

7.1.1 Geometric Interpretation

Since \(C(X_1) \subset C(X)\), if the Reduced Model is true, the Full Model must also be true. Under \(H_0\), the least squares estimates of the mean \(\mu\) from both models, \(P_{C(X_1)}y\) and \(P_{C(X)}y\), estimate the same quantity.

This suggests that the difference vector :

\[ P_{C(X)}y - P_{C(X_1)}y = (P_{C(X)} - P_{C(X_1)})y = \hat{y} - \hat{y}_0 \]

should be “small” under \(H_0\). The matrix \(P_{C(X)} - P_{C(X_1)}\) is the projection matrix onto \(C(X_1)^{\perp} \cap C(X)\), which is the orthogonal complement of \(C(X_1)\) with respect to \(C(X)\).

Figure 7.1: Geometric interpretation of the Full vs Reduced Model. The vector y is projected onto the Full Model space C(X) to get y_hat, and onto the Reduced Model space C(X1) to get y_hat0.

The squared length of this difference is used as a test statistic:

\[ \text{SSH} = ||\hat{y} - \hat{y}_0||^2 = y^T (P_{C(X)} - P_{C(X_1)}) y \]

7.1.2 Distributional Properties

To derive a test, we analyze the distribution of the sum of squares.

Theorem 7.1 (Distribution of Quadratic Forms) Suppose \(y \sim N(X\beta, \sigma^2 I)\) where \(X\) is \(n \times (k+1)\) of full rank, \(X\beta = X_1\beta_1 + X_2\beta_2\), and \(X_2\) is \(n \times h\). Let \(\hat{y} = P_{C(X)}y\), \(\hat{y}_0 = P_{C(X_1)}y\), and \(\mu_0 = P_{C(X_1)}\mu\). Then:

  1. \(\frac{1}{\sigma^2} ||y - \hat{y}||^2 = \frac{1}{\sigma^2} y^T (I - P_{C(X)}) y \sim \chi^2(n - k - 1)\).

  2. \(\frac{1}{\sigma^2} ||\hat{y} - \hat{y}_0||^2 = \frac{1}{\sigma^2} y^T (P_{C(X)} - P_{C(X_1)}) y \sim \chi^2(h, \lambda_1)\).

    where the non-centrality parameter is \(\lambda_1 = \frac{1}{\sigma^2} ||(P_{C(X)} - P_{C(X_1)})\mu||^2 = \frac{1}{\sigma^2} ||\mu - \mu_0||^2\).

  3. The two quadratic forms are independent.

Under \(H_0\), \(\mu \in C(X_1)\), so \(\mu = \mu_0\) and \(\lambda_1 = 0\). Thus, the numerator sum of squares follows a central \(\chi^2\) distribution.

7.1.3 The F-Test

Based on the independence and distribution of the quadratic forms, we construct the F-statistic.

Theorem 7.2 (F-Test for Reduced Model) Under the conditions of Theorem 7.1, the statistic

\[ F = \frac{||\hat{y} - \hat{y}_0||^2 / h}{||y - \hat{y}||^2 / (n - k - 1)} = \frac{\text{SSH}/h}{\text{SSE}_{\text{FM}}/(n - k - 1)} \]

follows a non-central F-distribution \(F(h, n - k - 1, \lambda_1)\). Under \(H_0: \beta_2 = 0\), it follows a central F-distribution \(F(h, n - k - 1)\) .

Using the Pythagorean Theorem, we can express the numerator in terms of Sum of Squares for Error (SSE) or Regression (SSR):

\[ ||\hat{y} - \hat{y}_0||^2 = SSE_{\text{RM}} - \text{SSE}_{\text{FM}} \]

\[ ||\hat{y} - \hat{y}_0||^2 = \text{SSR}_{\text{FM}} - \text{SSR}_{\text{RM}} \]

This quantity is often denoted as \(\text{SS}(\beta_2 | \beta_1)\), the “extra” regression sum of squares due to \(\beta_2\) after accounting for \(\beta_1\).

The ANOVA table for this comparison is constructed as follows:

Source SS MS F
Hypothesis \(\text{SSR}_{\text{FM}} - \text{SSR}_{\text{RM}}\) \(\text{SSH}/h\) \(\frac{\text{SSH}/h}{\text{MSE}_{\text{FM}}}\)
Error \(\text{SSE}_{\text{FM}}\) \(\text{SSE}_{\text{FM}}/(n-k-1)\)
Total \(\text{SST}\)

7.1.4 Overall Regression Test

A special case of the test \(H_0: \beta_2 = 0\) is when \(\beta_1\) contains only the intercept \(\beta_0\), and \(\beta_2\) contains all slope coefficients. The model is:

\[ y = \beta_0 j_n + X_2 \beta_2 + e \]

The hypothesis \(H_0: \beta_2 = 0\) is equivalent to \(H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0\). In this case:

  1. The reduced model estimates \(\hat{y}_0 = \bar{y}j_n\).
  2. The degrees of freedom \(h = k\).
  3. The numerator becomes \(\text{SSR}/k \equiv \text{MSR}\).

Theorem 7.3 (Overall Regression F-Statistic) The test statistic for overall regression is given by:

\[ F = \frac{\text{SSR}/k}{\text{SSE}_{\text{FM}}/(n - k - 1)} = \frac{\text{MSR}}{\text{MSE}} \]

Under \(H_0\), \(F \sim F(k, n - k - 1)\).

This statistic can be expressed in terms of the coefficient of determination \(R^2\):

\[ F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \]

7.2 The General Linear Hypothesis

We can generalize these tests to the hypothesis \(H_0: C\beta = t\), where \(C\) is a \(q \times (k+1)\) matrix of rank \(q \le k+1\).

Common examples include :

  • Testing a subset of coefficients: \(C = (0, I_h)\).
  • Overall regression: \(C = (0, I_k)\).
  • Equality of coefficients (e.g., \(\beta_1 = \beta_2\)).

7.2.1 Test Statistic for \(C\beta = 0\)

The test compares \(C\hat{\beta}\) to its null value 0 using a squared statistical distance. Since \(\hat{\beta} \sim N(\beta, \sigma^2(X^T X)^{-1})\), it follows that \(C\hat{\beta} \sim N(C\beta, \sigma^2 C(X^T X)^{-1} C^T)\) .

Theorem 7.4 (F-Test for General Linear Hypothesis) If \(y \sim N(X\beta, \sigma^2 I)\) and \(C\) has rank \(q\), then under \(H_0: C\beta = 0\) :

\[ F = \frac{(C\hat{\beta})^T \{C(X^T X)^{-1} C^T\}^{-1} C\hat{\beta} / q}{\text{SSE}_{\text{FM}} / (n - k - 1)} \sim F(q, n - k - 1) \]

If \(H_0\) is false, it follows a non-central F distribution with parameter \(\lambda = \frac{(C\beta)^T [C(X^T X)^{-1} C^T]^{-1} C\beta}{\sigma^2}\) .

7.2.2 Nested Models Interpretation

The general linear hypothesis test is fundamentally a test of nested models.

Theorem 7.5 (General Linear Hypothesis as Nested Models) The F-test for the general linear hypothesis \(H_0: C\beta = 0\) is equivalent to a full-and-reduced model test. Specifically, testing \(H_0\) is equivalent to testing whether the mean vector \(\mu\) lies in a subspace \(V_0 \subset C(X)\). The test statistic satisfies:

\[ \text{SSH} = (C\hat{\beta})^T \{C(X^T X)^{-1} C^T\}^{-1} C\hat{\beta} = ||P_{C(Z)}y||^2 \]

where \(C(Z)\) is the orthogonal complement of the reduced model space \(V_0\) with respect to the full model space \(C(X)\).

Proof. Geometric Proof (Projections)

Under the null hypothesis \(H_0: C\beta = 0\), we have a constrained model. We observe that:

\[ C\beta = 0 \implies C(X^T X)^{-1} X^T (X\beta) = 0 \]

Let \(Z^T = C(X^T X)^{-1} X^T\). Then the condition becomes \(Z^T \mu = 0\) (since \(\mu = X\beta\)). This implies that under \(H_0\), \(\mu\) must be orthogonal to the column space of \(Z\), denoted \(\text{Col}(Z)\).

Since \(\mu\) must also lie in the full model space \(\text{Col}(X)\), under \(H_0\), \(\mu\) belongs to the intersection:

\[ V_0 = \text{Col}(Z)^{\perp} \cap \text{Col}(X) \]

This \(V_0\) represents the subspace for the Reduced Model (RM), while \(\text{Col}(X)\) is the subspace for the Full Model (FM). The hypotheses correspond to nested models since \(V_0 \subset \text{Col}(X)\).

The numerator sum of squares for comparing these models is the squared length of the projection of \(y\) onto the difference space. Since \(V_0\) is the orthogonal complement of \(\text{Col}(Z)\) relative to \(\text{Col}(X)\), the difference space is exactly \(\text{Col}(Z)\). Note that \(Z = X(X^T X)^{-1} C^T\), and since \(C\) has rank \(q\), \(Z\) has rank \(q\).

The sum of squares for the hypothesis is: \[ \text{SSH} = ||(P_{\text{Col}(X)} - P_{V_0})y||^2 = ||P_{\text{Col}(Z)}y||^2 \]

Expanding the projection matrix \(P_{\text{Col}(Z)} = Z(Z^T Z)^{-1} Z^T\):

\[ y^T P_{\text{Col}(Z)} y = y^T Z(Z^T Z)^{-1} Z^T y \]

Substituting \(Z = X(X^T X)^{-1} C^T\):

  1. Term \(Z^T y\):

    \[ Z^T y = C(X^T X)^{-1} X^T y = C\hat{\beta} \]

  2. Term \(Z^T Z\):

    \[ Z^T Z = C(X^T X)^{-1} X^T X (X^T X)^{-1} C^T = C(X^T X)^{-1} C^T \]

Thus, the quadratic form becomes:

\[ y^T P_{\text{Col}(Z)} y = (C\hat{\beta})^T [C(X^T X)^{-1} C^T]^{-1} (C\hat{\beta}) \]

This confirms that the Wald-type statistic derived for the General Linear Hypothesis is algebraically identical to the difference in sums of squares between the implied full and reduced models.

Figure 7.2: Geometric Interpretation of the General Linear Hypothesis

7.2.3 F-Test for Non-Zero General Linear Hypothesis

Theorem 7.6 (F-Test for Non-Zero General Linear Hypothesis) Consider the general linear model \(y = X\beta + \epsilon\) with \(\epsilon \sim N(0, \sigma^2 I)\). To test the non-homogeneous hypothesis \(H_0: C\beta = t\) where \(t\) is a known vector and \(C\) has rank \(q\), the test statistic is:

\[ F = \frac{(C\hat{\beta} - t)^T [C(X^T X)^{-1} C^T]^{-1} (C\hat{\beta} - t) / q}{\text{SSE}_{\text{FM}} / (n - k - 1)} \]

Under \(H_0\), this statistic follows an \(F\) distribution with \(q\) and \(n - k - 1\) degrees of freedom:

\[ F \sim F(q, n - k - 1) \]

Proof. Distributional Proof (Alternative)

This proof derives the F-statistic directly from the multivariate normal distribution of the coefficients, without explicitly invoking the geometry of projections.

Step 1: Distribution of the Linear Combination Recall that \(\hat{\beta} \sim N_{k+1}(\beta, \sigma^2(X^T X)^{-1})\). Under the null hypothesis \(H_0: C\beta = t\), we define the random vector \(\theta = C\hat{\beta} - t\). Its expected value is: \[E[\theta] = C E[\hat{\beta}] - t = C\beta - t = 0\]

Its variance-covariance matrix is: \[\text{Var}(\theta) = C \text{Var}(\hat{\beta}) C^T = C [\sigma^2(X^T X)^{-1}] C^T = \sigma^2 [C(X^T X)^{-1} C^T]\]

Let \(V = C(X^T X)^{-1} C^T\). Since \(C\) has full row rank \(q\), \(V\) is positive definite. Thus: \[\theta \sim N_q(0, \sigma^2 V)\]

Step 2: Formation of the Chi-Squared Variable A standard result in multivariate statistics states that if \(x \sim N_q(0, \Sigma)\), then the quadratic form \(x^T \Sigma^{-1} x \sim \chi^2_q\). Applying this to \(\theta\):

\[ \theta^T (\sigma^2 V)^{-1} \theta = \frac{(C\hat{\beta} - t)^T [C(X^T X)^{-1} C^T]^{-1} (C\hat{\beta} - t)}{\sigma^2} \sim \chi^2_q \]

Let \(Q_H = (C\hat{\beta} - t)^T [C(X^T X)^{-1} C^T]^{-1} (C\hat{\beta} - t)\). We have established that \(Q_H / \sigma^2 \sim \chi^2_q\).

Step 3: Independence and the F-Ratio From the properties of Least Squares, \(\hat{\beta}\) is independent of the residuals (and thus independent of \(\text{SSE}\)). Consequently, the numerator \(Q_H\) is independent of the denominator \(\text{SSE}\). We know that \(\text{SSE} / \sigma^2 \sim \chi^2_{n-k-1}\).

The F-statistic is constructed as the ratio of two independent Chi-squared variables divided by their respective degrees of freedom:

\[ F = \frac{(Q_H / \sigma^2) / q}{(\text{SSE} / \sigma^2) / (n - k - 1)} = \frac{Q_H / q}{\text{SSE} / (n - k - 1)} \]

The \(\sigma^2\) terms cancel out, leaving the standard General Linear Test statistic.

7.2.4 Numerical Examples

Example 7.1 (Chemical Reaction Data) Consider an experiment designed to optimize the yield of a chemical reaction. The explanatory variables are:

  • \(x_1\): Temperature (\(^\circ C\))
  • \(x_2\): Concentration of a reagent (%)
  • \(x_3\): Time of reaction (hours)

The response variable \(y_1\) is the percent of unchanged starting material. The data is provided below :

\(y_1\) \(x_1\) \(x_2\) \(x_3\)
41.5 162 23 3
33.8 162 23 8
27.7 162 30 5
21.7 162 30 8
19.9 172 25 5
15.0 172 25 8
12.2 172 30 5
4.3 172 30 8
19.3 167 27.5 6.5
6.4 177 27.5 6.5
37.6 157 27.5 6.5
18.0 167 32.5 6.5
26.3 167 22.5 6.5
9.9 167 27.5 9.5
25.0 167 27.5 3.5
14.1 177 20 6.5
15.2 177 20 6.5
15.9 160 34 7.5
19.6 160 34 7.5

We fit the full model: \[ y_1 = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon \]

Suppose we wish to test the hypothesis that the effect of temperature and concentration are equal (scaled by 2) and related to time as follows: \(H_0: 2\beta_1 = 2\beta_2 = \beta_3\). This can be broken down into two independent linear constraints:

  1. \(2\beta_1 - 2\beta_2 = 0\)
  2. \(2\beta_2 - \beta_3 = 0\)

We formulate this as a general linear hypothesis \(C\beta = 0\), where \(\beta = (\beta_0, \beta_1, \beta_2, \beta_3)^T\) and \(C\) is a \(2 \times 4\) matrix :

\[ C = \begin{pmatrix} 0 & 2 & -2 & 0 \\ 0 & 0 & 2 & -1 \end{pmatrix} \]

Using the data, the estimated coefficients \(\hat{\beta}\) yield \(C\hat{\beta} = \begin{pmatrix} -0.1214 \\ -0.6118 \end{pmatrix}\). The variance term involving the design matrix is calculated as:

\[ C(X^T X)^{-1} C^T = \begin{pmatrix} 0.003366 & -0.006943 \\ -0.006943 & 0.044974 \end{pmatrix} \]

The test statistic is computed using the quadratic form:

\[ F = \frac{(C\hat{\beta})^T [C(X^T X)^{-1} C^T]^{-1} (C\hat{\beta}) / 2}{s^2} = \frac{28.62301 / 2}{5.3449} = 2.6776 \]

where \(s^2 = \text{MSE}_{\text{FM}} = 5.3449\). This statistic follows an \(F(2, 15)\) distribution. The corresponding p-value is \(0.101\), suggesting that we fail to reject the null hypothesis at the \(\alpha = 0.05\) level.

Method 1: General Linear Hypothesis (Manual Matrix & car Package)

This approach tests the hypothesis \(H_0: C\beta = 0\). We calculate the Sum of Squares for the Hypothesis (\(SSH\)) using the general linear hypothesis formula and then structure the results into a standard ANOVA table format. Method 1: General Linear Hypothesis (Manual Matrix & car Package)

This approach tests the hypothesis \(H_0: C\beta = 0\). We calculate the Sum of Squares for the Hypothesis (\(SSH\)) using the general linear hypothesis formula and then structure the results into a standard ANOVA table format.

Code
library(knitr)
library(car)

# Load Data 
chemical_data <- data.frame(
  y1 = c(41.5, 33.8, 27.7, 21.7, 19.9, 15.0, 12.2, 4.3, 19.3, 6.4, 
         37.6, 18.0, 26.3, 9.9, 25.0, 14.1, 15.2, 15.9, 19.6),
  x1 = c(162, 162, 162, 162, 172, 172, 172, 172, 167, 177, 
         157, 167, 167, 167, 167, 177, 177, 160, 160),
  x2 = c(23, 23, 30, 30, 25, 25, 30, 30, 27.5, 27.5, 
         27.5, 32.5, 22.5, 27.5, 27.5, 20, 20, 34, 34),
  x3 = c(3, 8, 5, 8, 5, 8, 5, 8, 6.5, 6.5, 
         6.5, 6.5, 6.5, 9.5, 3.5, 6.5, 6.5, 7.5, 7.5)
)

# Fit Full Model
full_model <- lm(y1 ~ x1 + x2 + x3, data = chemical_data)
beta_hat <- coef(full_model)
XtX_inv  <- summary(full_model)$cov.unscaled
SSE_FM   <- sum(residuals(full_model)^2)
n <- nrow(chemical_data)
k <- 3

# --- Manual Matrix Implementation ---
C <- matrix(c(0, 2, -2,  0,
              0, 0,  2, -1), nrow = 2, byrow = TRUE)
q <- nrow(C) 

Cb <- C %*% beta_hat
middle_term <- solve(C %*% XtX_inv %*% t(C))
SSH <- as.numeric(t(Cb) %*% middle_term %*% Cb)

# Constructing ANOVA components
df_error <- n - k - 1
MSE <- SSE_FM / df_error
F_stat <- (SSH / q) / MSE
p_val <- pf(F_stat, q, df_error, lower.tail = FALSE)

# Display as an ANOVA table
anova_manual <- data.frame(
  Df = c(q, df_error),
  `Sum Sq` = c(SSH, SSE_FM),
  `Mean Sq` = c(SSH/q, MSE),
  `F value` = c(F_stat, NA),
  `Pr(>F)` = c(p_val, NA),
  row.names = c("Hypothesis", "Residuals")
)

print(as.table(as.matrix(anova_manual)))
                   Df     Sum.Sq    Mean.Sq    F.value     Pr..F.
Hypothesis  2.0000000 28.6232434 14.3116217  2.6776207  0.1013007
Residuals  15.0000000 80.1735397  5.3449026                      
Code
# --- Verification with car package ---
hypotheses <- c("2*x1 - 2*x2 = 0", "2*x2 - x3 = 0")
print(linearHypothesis(full_model, hypotheses))

Linear hypothesis test:
2 x1 - 2 x2 = 0
2 x2 - x3 = 0

Model 1: restricted model
Model 2: y1 ~ x1 + x2 + x3

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     17 108.797                           
2     15  80.174  2    28.623 2.6776 0.1013

Method 2: Nested Models Approach

The null hypothesis implies relationships between the parameters that allow us to rewrite the model. If \(2\beta_1 = \beta_3\) and \(2\beta_2 = \beta_3\), then \(\beta_1 = 0.5\beta_3\) and \(\beta_2 = 0.5\beta_3\). Substituting these into the full model:

\[ y_1 = \beta_0 + 0.5\beta_3 x_1 + 0.5\beta_3 x_2 + \beta_3 x_3 + \epsilon \] \[ y_1 = \beta_0 + \beta_3 (0.5 x_1 + 0.5 x_2 + x_3) + \epsilon \]

This reduced model has only two parameters: an intercept \(\beta_0\) and a slope \(\beta_3\) for the constructed variable \(z = 0.5x_1 + 0.5x_2 + x_3\). We could then calculate \(F\) using the difference in SSE between the full model and this reduced model. We analyze the change in error when moving from the Full Model to a Reduced Model. The plot illustrates \(SSE\) against the number of parameters \(p\). Labels are positioned to the right of each data point for clarity.

Code
library(ggplot2)

# Create the reduced variable z based on constraints
chemical_data$z <- 0.5 * chemical_data$x1 + 0.5 * chemical_data$x2 + chemical_data$x3

# Fit Models
reduced_model <- lm(y1 ~ z, data = chemical_data)

# Data for Plotting
n_obs <- nrow(chemical_data)
p_full <- length(coef(full_model))
p_red  <- length(coef(reduced_model))
p_sat  <- n_obs

viz_data <- data.frame(
  p = c(p_sat, p_full, p_red),
  sse = c(0, sum(residuals(full_model)^2), sum(residuals(reduced_model)^2)),
  Model = c("Saturated", "Full Model", "Reduced Model")
)

ggplot(viz_data, aes(x = p, y = sse)) +
  geom_line(color = "gray80", linetype = "dashed") +
  geom_point(aes(color = Model), size = 4) +
  # Use nudge_x and hjust=0 to place text on the right
  geom_text(aes(label = Model), hjust = 0, nudge_x = 0.5, size = 3.5) +
  labs(
    title = "SSE vs. Number of Parameters (p)",
    x = "Number of Parameters (p)",
    y = "Sum of Squares Error (SSE)"
  ) +
  theme_minimal() +
  scale_color_manual(values = c("Full Model" = "red", 
                                "Reduced Model" = "blue", 
                                "Saturated" = "darkgreen")) +
  # Expand x-axis slightly to make room for text on the right
  scale_x_continuous(expand = expansion(mult = c(0.1, 0.4))) +
  ylim(0, max(viz_data$sse) * 1.01)

Code
# Display raw text output from anova for nested models
anova(reduced_model, full_model)

7.3 Speficic Tests for Linear Combinations of \(\boldsymbol{\beta}\)

Test for \(a^T \beta = 0\)

To test a linear combination \(a^T \beta = 0\), the F-statistic is:

\[F = \frac{(a^T \hat{\beta})^2}{s^2 a^T (X^T X)^{-1} a} \sim F(1, n - k - 1)\]

This is equivalent to the t-test:

\[t = \frac{a^T \hat{\beta}}{s \sqrt{a^T (X^T X)^{-1} a}} \sim t(n - k - 1)\]

Test for \(\beta_j = 0\)

For a single coefficient \(\beta_j\), we set \(a\) to extract the \(j\)-th element. The test statistic becomes:

\[t = \frac{\hat{\beta}_j}{s \sqrt{g_{jj}}} = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}\]

where \(g_{jj}\) is the \(j\)-th diagonal element of \((X^T X)^{-1}\).


Confidence Region for \(\beta\)

A \(100(1-\alpha)\%\) confidence region for the vector \(\beta\) is the set of all vectors satisfying:

\[(\hat{\beta} - \beta)^T X^T X (\hat{\beta} - \beta) \le (k + 1) s^2 F_{1-\alpha}(k + 1, n - k - 1)\]

This region forms an ellipsoid centered at \(\hat{\beta}\).

Confidence Interval for \(a^T \beta\) and \(E(y_0)\)

A \(100(1-\alpha)\%\) confidence interval for a linear combination \(a^T \beta\) is given by:

\[a^T \hat{\beta} \pm t_{1-\alpha/2}(n - k - 1) s \sqrt{a^T (X^T X)^{-1} a}\]

To estimate the mean response \(E(y_0) = x_0^T \beta\) for a given \(x_0\), we set \(a = x_0\):

\[x_0^T \hat{\beta} \pm t_{1-\alpha/2}(n - k - 1) s \sqrt{x_0^T (X^T X)^{-1} x_0}\]

Prediction Interval for \(y_0\)

When predicting a new random observation \(y_0 = x_0^T \beta + e_0\), we must account for both the variance of the estimator and the variance of the new error term.

\[\text{Var}(y_0 - \hat{y}_0) = \text{Var}(e_0) + \text{Var}(x_0^T \hat{\beta}) = \sigma^2 (1 + x_0^T (X^T X)^{-1} x_0)\]

The \(100(1-\alpha)\%\) prediction interval is:

\[\hat{y}_0 \pm t_{1-\alpha/2}(n - k - 1) s \sqrt{1 + x_0^T (X^T X)^{-1} x_0}\]

7.3.1 Numerical Examples

Example 7.2 Specific Tests and Intervals Implementation

This example demonstrates how to:

  1. Test a specific linear combination of coefficients (Manually).
  2. Calculate Confidence and Prediction intervals using R’s predict() function.
  3. Verify those intervals using manual matrix algebra.
Code
library(knitr)
library(ggplot2)

# --- 0. Setup: Load Data and Fit Model ---
chemical_data <- data.frame(
  y1 = c(41.5, 33.8, 27.7, 21.7, 19.9, 15.0, 12.2, 4.3, 19.3, 6.4, 
         37.6, 18.0, 26.3, 9.9, 25.0, 14.1, 15.2, 15.9, 19.6),
  x1 = c(162, 162, 162, 162, 172, 172, 172, 172, 167, 177, 
         157, 167, 167, 167, 167, 177, 177, 160, 160),
  x2 = c(23, 23, 30, 30, 25, 25, 30, 30, 27.5, 27.5, 
         27.5, 32.5, 22.5, 27.5, 27.5, 20, 20, 34, 34),
  x3 = c(3, 8, 5, 8, 5, 8, 5, 8, 6.5, 6.5, 
         6.5, 6.5, 6.5, 9.5, 3.5, 6.5, 6.5, 7.5, 7.5)
)

full_model <- lm(y1 ~ x1 + x2 + x3, data = chemical_data)
beta_hat <- coef(full_model)
s_sq <- summary(full_model)$sigma^2
XtX_inv <- summary(full_model)$cov.unscaled

# --- 1. Test for Linear Combination: H0: beta1 + beta2 = 50 ---

# Define 'a' vector (Intercept, x1, x2, x3)
a <- c(0, 1, 1, 0) 

# Manual t-statistic calculation
est <- sum(a * beta_hat)
se_est <- sqrt(s_sq * t(a) %*% XtX_inv %*% a)
t_stat <- (est - 50) / se_est
p_val_t <- 2 * pt(abs(t_stat), df = full_model$df.residual, lower.tail = FALSE)

# --- 2. Confidence and Prediction Intervals using predict() ---

# Point of interest: x0 = (165, 25, 5)
new_pt <- data.frame(x1=165, x2=25, x3=5)

# Confidence Interval (for the Mean Response)
ci_r <- predict(full_model, newdata = new_pt, interval = "confidence", level = 0.95)

# Prediction Interval (for a New Observation)
pi_r <- predict(full_model, newdata = new_pt, interval = "prediction", level = 0.95)

# --- 3. Manual Verification of Intervals ---
x0_vec <- matrix(c(1, 165, 25, 5), ncol = 1) # Note the 1 for intercept
y_hat  <- as.numeric(t(x0_vec) %*% beta_hat)
t_crit <- qt(0.975, full_model$df.residual)

# Standard Errors
se_mean <- sqrt(as.numeric(s_sq * (t(x0_vec) %*% XtX_inv %*% x0_vec)))
se_pred <- sqrt(as.numeric(s_sq * (1 + t(x0_vec) %*% XtX_inv %*% x0_vec)))

# Manual Intervals
ci_man <- c(y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi_man <- c(y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)

# --- 4. Display Results ---
results <- data.frame(
  Method = c("R predict()", "Manual Calc", "R predict()", "Manual Calc"),
  Type   = c("Confidence (Mean)", "Confidence (Mean)", "Prediction (New)", "Prediction (New)"),
  Lower  = c(ci_r[2], ci_man[1], pi_r[2], pi_man[1]),
  Upper  = c(ci_r[3], ci_man[2], pi_r[3], pi_man[2])
)

kable(results, digits = 4, caption = "Comparison of R functions vs Manual Matrix Algebra")
Comparison of R functions vs Manual Matrix Algebra
Method Type Lower Upper
R predict() Confidence (Mean) 28.4576 31.9957
Manual Calc Confidence (Mean) 28.4576 31.9957
R predict() Prediction (New) 24.9910 35.4623
Manual Calc Prediction (New) 24.9910 35.4623
Code
# --- 5. Visualization ---

# Varying x1 while holding x2 and x3 constant at their means
x_range <- data.frame(
  x1 = seq(min(chemical_data$x1), max(chemical_data$x1), length.out = 100),
  x2 = mean(chemical_data$x2),
  x3 = mean(chemical_data$x3)
)

c_bands <- predict(full_model, newdata = x_range, interval = "confidence")
p_bands <- predict(full_model, newdata = x_range, interval = "prediction")

plot_df <- data.frame(x_range, c_bands)
plot_df$p_lwr <- p_bands[,2]
plot_df$p_upr <- p_bands[,3]

ggplot(plot_df, aes(x = x1, y = fit)) +
  geom_ribbon(aes(ymin = p_lwr, ymax = p_upr), fill = "blue", alpha = 0.1) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "blue", alpha = 0.3) +
  geom_line(color = "black") +
  labs(title = "95% Confidence (Dark) vs. Prediction (Light) Bands",
       subtitle = "Prediction bands are wider due to added variance of individual errors",
       x = "x1 (Holding x2, x3 at Mean)", y = "Predicted y1") +
  theme_minimal()