Functions and Package

1 An Example

A chemical product is produced in a pressure vessel. A factorial experiment is carried out in the pilot plant to study the factors to study the factors thought to influence the filtration rate of this product. The four factors are temperature (A), pressure (B), concentration of formaldehyde (C), and stirring rate (D). Each factor is present at two levels. A single replicate of the 24 experiment is carried out and the 16 runs are made in random order. The process engineer is interested in maximizing the filtration rate. Current process conditions give filtration rates of around 75 gal/h. The process also currently uses factor C at the high level. The engineer would like to reduce formaldehyde concentration as much as possible but has been unable to do so because it always results in lower filtration rates.

2 Read and Create Data Frame

y A B C D
45 -1 -1 -1 -1
71 1 -1 -1 -1
48 -1 1 -1 -1
65 1 1 -1 -1
68 -1 -1 1 -1
60 1 -1 1 -1
80 -1 1 1 -1
65 1 1 1 -1
43 -1 -1 -1 1
100 1 -1 -1 1
45 -1 1 -1 1
104 1 1 -1 1
75 -1 -1 1 1
86 1 -1 1 1
70 -1 1 1 1
96 1 1 1 1

3 Visualize the Data

Attach the dataset for assssing y, A,B,C,D, directly without typing filtration$

4 Fit a Linear Model

## 
## Call:
## lm(formula = y ~ (A + B + C + D)^4, data = filtration)
## 
## Residuals:
## ALL 16 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  70.0625         NA      NA       NA
## A            10.8125         NA      NA       NA
## B             1.5625         NA      NA       NA
## C             4.9375         NA      NA       NA
## D             7.3125         NA      NA       NA
## A:B           0.0625         NA      NA       NA
## A:C          -9.0625         NA      NA       NA
## A:D           8.3125         NA      NA       NA
## B:C           1.1875         NA      NA       NA
## B:D          -0.1875         NA      NA       NA
## C:D          -0.5625         NA      NA       NA
## A:B:C         0.9375         NA      NA       NA
## A:B:D         2.0625         NA      NA       NA
## A:C:D        -0.8125         NA      NA       NA
## B:C:D        -1.3125         NA      NA       NA
## A:B:C:D       0.6875         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:     1,   Adjusted R-squared:   NaN 
## F-statistic:  NaN on 15 and 0 DF,  p-value: NA

4.1 Computing and Understanding the Coefficients

4.1.1 Design Matrix

filtration.y X.Intercept. A B C D A.B A.C A.D B.C B.D C.D A.B.C A.B.D A.C.D B.C.D A.B.C.D
45 1 -1 -1 -1 -1 1 1 1 1 1 1 -1 -1 -1 -1 1
71 1 1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 -1 -1
48 1 -1 1 -1 -1 -1 1 1 -1 -1 1 1 1 -1 1 -1
65 1 1 1 -1 -1 1 -1 -1 -1 -1 1 -1 -1 1 1 1
68 1 -1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 1 -1
60 1 1 -1 1 -1 -1 1 -1 -1 1 -1 -1 1 -1 1 1
80 1 -1 1 1 -1 -1 -1 1 1 -1 -1 -1 1 1 -1 1
65 1 1 1 1 -1 1 1 -1 1 -1 -1 1 -1 -1 -1 -1
43 1 -1 -1 -1 1 1 1 -1 1 -1 -1 -1 1 1 1 -1
100 1 1 -1 -1 1 -1 -1 1 1 -1 -1 1 -1 -1 1 1
45 1 -1 1 -1 1 -1 1 -1 -1 1 -1 1 -1 1 -1 1
104 1 1 1 -1 1 1 -1 1 -1 1 -1 -1 1 -1 -1 -1
75 1 -1 -1 1 1 1 -1 -1 -1 -1 1 1 1 -1 -1 1
86 1 1 -1 1 1 -1 1 1 -1 -1 1 -1 -1 1 -1 -1
70 1 -1 1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 1 -1
96 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

4.1.2 Contrasts

Signed \(y\) with \(X_j\), the jth column in the design matrix:

X.Intercept. A B C D A.B A.C A.D B.C B.D C.D A.B.C A.B.D A.C.D B.C.D A.B.C.D
45 -45 -45 -45 -45 45 45 45 45 45 45 -45 -45 -45 -45 45
71 71 -71 -71 -71 -71 -71 -71 71 71 71 71 71 71 -71 -71
48 -48 48 -48 -48 -48 48 48 -48 -48 48 48 48 -48 48 -48
65 65 65 -65 -65 65 -65 -65 -65 -65 65 -65 -65 65 65 65
68 -68 -68 68 -68 68 -68 68 -68 68 -68 68 -68 68 68 -68
60 60 -60 60 -60 -60 60 -60 -60 60 -60 -60 60 -60 60 60
80 -80 80 80 -80 -80 -80 80 80 -80 -80 -80 80 80 -80 80
65 65 65 65 -65 65 65 -65 65 -65 -65 65 -65 -65 -65 -65
43 -43 -43 -43 43 43 43 -43 43 -43 -43 -43 43 43 43 -43
100 100 -100 -100 100 -100 -100 100 100 -100 -100 100 -100 -100 100 100
45 -45 45 -45 45 -45 45 -45 -45 45 -45 45 -45 45 -45 45
104 104 104 -104 104 104 -104 104 -104 104 -104 -104 104 -104 -104 -104
75 -75 -75 75 75 75 -75 -75 -75 -75 75 75 75 -75 -75 75
86 86 -86 86 86 -86 86 86 -86 -86 86 -86 -86 86 -86 -86
70 -70 70 70 70 -70 -70 -70 70 70 70 -70 -70 -70 70 -70
96 96 96 96 96 96 96 96 96 96 96 96 96 96 96 96

Contrasts are the column-wise sum of y*X, that is the sum of \(y_i\) at high level (\(y_{X_j+}\)) minus the sum of \(y_i\) at low level (\(y_{X_j-}\)). For “intercept”, it is the sum of all \(y_i\). In matrix language, \[\mbox{Contrast}_j=X_j'y=y_{X_j+}-y_{X_j-},\] where \(N\) is the total sample size. Here \(N=16\).

colSums.y…X.
(Intercept) 1121
A 173
B 25
C 79
D 117
A:B 1
A:C -145
A:D 133
B:C 19
B:D -3
C:D -9
A:B:C 15
A:B:D 33
A:C:D -13
B:C:D -21
A:B:C:D 11

4.1.3 Regression coefficients \(\hat \tau_j\)

In matrix language, \(\hat\tau = (X'X)^{-1}X'y\). The differences between means (such as A, B, ABC) are \(2\times \hat\tau\). The formula for coefficients can be written as: \[\mbox{Contrast}_j=X_j'y=y_{X_j+}-y_{X_j-}\] \[\hat \tau_j ={\mbox{Contrast}_j\over N}=\frac{\bar y_{X_j+}-\bar y_{X_j-}}{2}\]

colSums.y…X.
(Intercept) 70.06
A 10.81
B 1.56
C 4.94
D 7.31
A:B 0.06
A:C -9.06
A:D 8.31
B:C 1.19
B:D -0.19
C:D -0.56
A:B:C 0.94
A:B:D 2.06
A:C:D -0.81
B:C:D -1.31
A:B:C:D 0.69

The linear model is written as:

\[y = 70 + 11 x_A + 1.6 x_B + 4.9 x_C + 7.3 x_D +\\ 0.062 x_Ax_B -9.1 x_Ax_C + 1.2x_Bx_C + 8.3x_Ax_D-0.19 x_Bx_D -0.56 x_Cx_D + \\ 0.94x_Ax_Bx_C+2.1x_Ax_Bx_D -0.81 x_Ax_Cx_D -1.3x_Bx_Cx_D+\\0.69x_Ax_Bx_Cx_D +\\ \epsilon,\] where \(x_j=1\) if factor \(j\) at high level, and \(x_j=-1\) is factor \(j\) at low level, for \(j = A,B,C,D\).

5 ANOVA

5.1 Output from ANOVA Function

## Warning in anova.lm(g): ANOVA F-tests on an essentially perfect fit are
## unreliable
Df Sum Sq Mean Sq F value Pr(>F)
A 1 1870.56 1870.56 NaN NaN
B 1 39.06 39.06 NaN NaN
C 1 390.06 390.06 NaN NaN
D 1 855.56 855.56 NaN NaN
A:B 1 0.06 0.06 NaN NaN
A:C 1 1314.06 1314.06 NaN NaN
A:D 1 1105.56 1105.56 NaN NaN
B:C 1 22.56 22.56 NaN NaN
B:D 1 0.56 0.56 NaN NaN
C:D 1 5.06 5.06 NaN NaN
A:B:C 1 14.06 14.06 NaN NaN
A:B:D 1 68.06 68.06 NaN NaN
A:C:D 1 10.56 10.56 NaN NaN
B:C:D 1 27.56 27.56 NaN NaN
A:B:C:D 1 7.56 7.56 NaN NaN
Residuals 0 0.00 NaN NA NA

5.2 How to compute SS by hand?

The sum squares can be computed by \[SS_j=N \times \hat \tau_j^2=\frac{\mbox{Contrast}^2_j}{N},\] where \(N=2^k\times n\) is the total number of observations. In this example, \(N=16\) since \(k=4, n=1\). This formular is also a specialization of the general formular for SS: \[SS_\mbox{treatment}=\sum_{i=1}^a n_i (\bar y_i-\bar y)^2\] with \[n_i=N/2, \bar y_{X_j+}-\bar y = \hat\tau_j,\bar y_{X_j-}-\bar y = -\hat\tau_j, a = 2.\] Therefore, \[SS_j=(N/2)\times\hat\tau_j^2 + (N/2)\times\hat \tau_j^2=N\times \hat\tau_j^2.\]

X.Intercept. A B C D A.B A.C A.D B.C B.D C.D A.B.C A.B.D A.C.D B.C.D A.B.C.D
78540 1871 39 390 856 0.06 1314 1106 23 0.56 5.1 14 68 11 28 7.6

\(SS_j\) is just the Sum square of \(X_j\hat \tau_j\):

X.Intercept. A B C D A.B A.C A.D B.C B.D C.D A.B.C A.B.D A.C.D B.C.D A.B.C.D
70 -11 -1.6 -4.9 -7.3 0.06 -9.1 8.3 1.2 -0.19 -0.56 -0.94 -2.1 0.81 1.3 0.69
70 11 -1.6 -4.9 -7.3 -0.06 9.1 -8.3 1.2 -0.19 -0.56 0.94 2.1 -0.81 1.3 -0.69
70 -11 1.6 -4.9 -7.3 -0.06 -9.1 8.3 -1.2 0.19 -0.56 0.94 2.1 0.81 -1.3 -0.69
70 11 1.6 -4.9 -7.3 0.06 9.1 -8.3 -1.2 0.19 -0.56 -0.94 -2.1 -0.81 -1.3 0.69
70 -11 -1.6 4.9 -7.3 0.06 9.1 8.3 -1.2 -0.19 0.56 0.94 -2.1 -0.81 -1.3 -0.69
70 11 -1.6 4.9 -7.3 -0.06 -9.1 -8.3 -1.2 -0.19 0.56 -0.94 2.1 0.81 -1.3 0.69
70 -11 1.6 4.9 -7.3 -0.06 9.1 8.3 1.2 0.19 0.56 -0.94 2.1 -0.81 1.3 0.69
70 11 1.6 4.9 -7.3 0.06 -9.1 -8.3 1.2 0.19 0.56 0.94 -2.1 0.81 1.3 -0.69
70 -11 -1.6 -4.9 7.3 0.06 -9.1 -8.3 1.2 0.19 0.56 -0.94 2.1 -0.81 -1.3 -0.69
70 11 -1.6 -4.9 7.3 -0.06 9.1 8.3 1.2 0.19 0.56 0.94 -2.1 0.81 -1.3 0.69
70 -11 1.6 -4.9 7.3 -0.06 -9.1 -8.3 -1.2 -0.19 0.56 0.94 -2.1 -0.81 1.3 0.69
70 11 1.6 -4.9 7.3 0.06 9.1 8.3 -1.2 -0.19 0.56 -0.94 2.1 0.81 1.3 -0.69
70 -11 -1.6 4.9 7.3 0.06 9.1 -8.3 -1.2 0.19 -0.56 0.94 2.1 0.81 1.3 0.69
70 11 -1.6 4.9 7.3 -0.06 -9.1 8.3 -1.2 0.19 -0.56 -0.94 -2.1 -0.81 1.3 -0.69
70 -11 1.6 4.9 7.3 -0.06 9.1 -8.3 1.2 -0.19 -0.56 -0.94 -2.1 0.81 -1.3 -0.69
70 11 1.6 4.9 7.3 0.06 -9.1 8.3 1.2 -0.19 -0.56 0.94 2.1 -0.81 -1.3 0.69

6 Selecting Significant Effects

6.2 Lenth Method to Determine “Significant” Effects

Understand Psuedo Standard deviation

## [1] 2
## [1] 10
## [1] 102

Lenth Method

QQ plot of effects with lenth significance

## Significant Factors Selected by Lenth's Method:
##  A D A:C A:D
## [1] "A"   "D"   "A:C" "A:D"

Line plots with lenth significance

## alpha   PSE    ME   SME 
##  0.05  1.31  3.37  6.85

7 Analysis with selected interactions

Df Sum Sq Mean Sq F value Pr(>F)
A 1 1871 1871 96 0
C 1 390 390 20 0
D 1 856 856 44 0
A:C 1 1314 1314 67 0
A:D 1 1106 1106 57 0
Residuals 10 195 20 NA NA

Important note: the Sum Squares for the residuals are equal to the sum of sum squares of those “removed” terms.

## 
## Call:
## lm(formula = y ~ A + C + D + A * C + A * D, data = filtration)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.375 -1.500  0.063  2.906  5.750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    70.06       1.10   63.44  2.3e-14 ***
## A              10.81       1.10    9.79  1.9e-06 ***
## C               4.94       1.10    4.47   0.0012 ** 
## D               7.31       1.10    6.62  5.9e-05 ***
## A:C            -9.06       1.10   -8.21  9.4e-06 ***
## A:D             8.31       1.10    7.53  2.0e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.4 on 10 degrees of freedom
## Multiple R-squared:  0.966,  Adjusted R-squared:  0.949 
## F-statistic: 56.7 on 5 and 10 DF,  p-value: 5.14e-07

The final linear model for relating y to x based on the above “lm” output is written as follows: \[y=70.06 + 10.81 x_A + 4.94 x_C + 7.31 x_D - 9.06 x_Ax_C + 8.31 x_Ax_D + \epsilon \]

8 What can we learn from the linear model (How do the factors interact)?

When \(x_A=1\), the coefficient for \(x_C\) is 4.94-9.1 = -4.2, implying that filtration rate decreases about 8% when C increases from low level to high level; when \(x_A=-1\), the coefficient for \(x_C\) is 4.94+9.1 = 14.01. We see that high C factor (formaldehyde) on average increases filtration rate, but can decrease filtration rate when temperature (A) high (+1). Look at the interaction plot:

When \(x_A=1\), the coefficient for \(x_D\) is 7.31+8.31 = 16; when \(x_A=-1\), the coefficient for \(x_D\) is 7.31-8.31=-1. High D (stiring rates) increases filtration rate on average, but increases filtration rate when A (temperatre) is high (+1), and decreases slightly filtration rate a little bit.

Prediction of \(y_i\) in each combination of \(A,C,D\):

A C D predicted.filtration
-1 -1 -1 46
-1 -1 1 44
-1 1 -1 74
-1 1 1 72
1 -1 -1 69
1 -1 1 101
1 1 -1 61
1 1 1 92

The combination high temperature (A+), low formaldehye (C-) and high stiring (D+) can provide good filtration rate.