library(BsMD)
qqnorm.lenth <- function(effects) {
  abseffects <- abs(effects)
  m <- length(abseffects)
  # Initial pseudo sd based on all |effects|
  s0 <- 1.5 * median(abseffects)
  # Drop the clearly large effects, then refine the estimate of sd (PSE)
  non.sig <- abseffects < 2.5 * s0
  pse <- 1.5 * median(abseffects[non.sig])
  # Simultaneous margin of error (Lenth 1989), with m/3 degrees of freedom
  sme <- qt(1 - (1 - 0.95^(1/m))/2, m/3) * pse
  sig <- abseffects > sme
  hqq <- qqnorm(effects, type = "n")
  text(hqq$x, hqq$y, labels = names(effects), col = sig + 1)
  qqline(effects)
  cat("Significant Factors Selected by Lenth's Method:\n",
      names(abseffects)[sig], "\n")
  names(effects)[sig]
}
A chemical product is produced in a pressure vessel. A factorial experiment is carried out in the pilot plant to study the factors thought to influence the filtration rate of this product. The four factors are temperature (A), pressure (B), concentration of formaldehyde (C), and stirring rate (D). Each factor is present at two levels. A single replicate of the \(2^4\) experiment is carried out, and the 16 runs are made in random order. The process engineer is interested in maximizing the filtration rate. Current process conditions give filtration rates of around 75 gal/h. The process also currently uses factor C at the high level. The engineer would like to reduce the formaldehyde concentration as much as possible but has been unable to do so, because lowering it always results in lower filtration rates.
options(digits = 2)
filtration <- data.frame(
  y = scan(text = "45 71 48 65 68 60 80 65 43 100 45 104 75 86 70 96"),
  A = rep(c(-1, 1), 8),
  B = rep(rep(c(-1, 1), each = 2), 4),
  C = rep(rep(c(-1, 1), each = 4), 2),
  D = rep(c(-1, 1), each = 8)
)
filtration
y | A | B | C | D |
---|---|---|---|---|
45 | -1 | -1 | -1 | -1 |
71 | 1 | -1 | -1 | -1 |
48 | -1 | 1 | -1 | -1 |
65 | 1 | 1 | -1 | -1 |
68 | -1 | -1 | 1 | -1 |
60 | 1 | -1 | 1 | -1 |
80 | -1 | 1 | 1 | -1 |
65 | 1 | 1 | 1 | -1 |
43 | -1 | -1 | -1 | 1 |
100 | 1 | -1 | -1 | 1 |
45 | -1 | 1 | -1 | 1 |
104 | 1 | 1 | -1 | 1 |
75 | -1 | -1 | 1 | 1 |
86 | 1 | -1 | 1 | 1 |
70 | -1 | 1 | 1 | 1 |
96 | 1 | 1 | 1 | 1 |
Attach the dataset so that y, A, B, C, D can be accessed directly without typing filtration$:
attach(filtration)
par(mfrow = c(3, 2))
interaction.plot(A, B, y, ylim = range(y), col = 1:2)
interaction.plot(C, D, y, ylim = range(y), col = 1:2)
interaction.plot(A, C, y, ylim = range(y), col = 1:2)
interaction.plot(B, C, y, ylim = range(y), col = 1:2)
interaction.plot(D, C, y, ylim = range(y), col = 1:2)
interaction.plot(A, D, y, ylim = range(y), col = 1:2)
##
## Call:
## lm(formula = y ~ (A + B + C + D)^4, data = filtration)
##
## Residuals:
## ALL 16 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.0625 NA NA NA
## A 10.8125 NA NA NA
## B 1.5625 NA NA NA
## C 4.9375 NA NA NA
## D 7.3125 NA NA NA
## A:B 0.0625 NA NA NA
## A:C -9.0625 NA NA NA
## A:D 8.3125 NA NA NA
## B:C 1.1875 NA NA NA
## B:D -0.1875 NA NA NA
## C:D -0.5625 NA NA NA
## A:B:C 0.9375 NA NA NA
## A:B:D 2.0625 NA NA NA
## A:C:D -0.8125 NA NA NA
## B:C:D -1.3125 NA NA NA
## A:B:C:D 0.6875 NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 15 and 0 DF, p-value: NA
y | (Intercept) | A | B | C | D | A:B | A:C | A:D | B:C | B:D | C:D | A:B:C | A:B:D | A:C:D | B:C:D | A:B:C:D |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
45 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 |
71 | 1 | 1 | -1 | -1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | 1 | 1 | 1 | -1 | -1 |
48 | 1 | -1 | 1 | -1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | 1 | 1 | -1 | 1 | -1 |
65 | 1 | 1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | -1 | 1 | -1 | -1 | 1 | 1 | 1 |
68 | 1 | -1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | 1 | -1 |
60 | 1 | 1 | -1 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | -1 | 1 | 1 |
80 | 1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | -1 | 1 |
65 | 1 | 1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 | -1 |
43 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 |
100 | 1 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | 1 |
45 | 1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | -1 | 1 | -1 | 1 | -1 | 1 | -1 | 1 |
104 | 1 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 1 | -1 | -1 | 1 | -1 | -1 | -1 |
75 | 1 | -1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 | -1 | 1 |
86 | 1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | -1 | -1 | 1 | -1 | -1 | 1 | -1 | -1 |
70 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 |
96 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Multiply \(y\) by the signs of \(X_j\), the \(j\)th column of the design matrix:
(Intercept) | A | B | C | D | A:B | A:C | A:D | B:C | B:D | C:D | A:B:C | A:B:D | A:C:D | B:C:D | A:B:C:D |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
45 | -45 | -45 | -45 | -45 | 45 | 45 | 45 | 45 | 45 | 45 | -45 | -45 | -45 | -45 | 45 |
71 | 71 | -71 | -71 | -71 | -71 | -71 | -71 | 71 | 71 | 71 | 71 | 71 | 71 | -71 | -71 |
48 | -48 | 48 | -48 | -48 | -48 | 48 | 48 | -48 | -48 | 48 | 48 | 48 | -48 | 48 | -48 |
65 | 65 | 65 | -65 | -65 | 65 | -65 | -65 | -65 | -65 | 65 | -65 | -65 | 65 | 65 | 65 |
68 | -68 | -68 | 68 | -68 | 68 | -68 | 68 | -68 | 68 | -68 | 68 | -68 | 68 | 68 | -68 |
60 | 60 | -60 | 60 | -60 | -60 | 60 | -60 | -60 | 60 | -60 | -60 | 60 | -60 | 60 | 60 |
80 | -80 | 80 | 80 | -80 | -80 | -80 | 80 | 80 | -80 | -80 | -80 | 80 | 80 | -80 | 80 |
65 | 65 | 65 | 65 | -65 | 65 | 65 | -65 | 65 | -65 | -65 | 65 | -65 | -65 | -65 | -65 |
43 | -43 | -43 | -43 | 43 | 43 | 43 | -43 | 43 | -43 | -43 | -43 | 43 | 43 | 43 | -43 |
100 | 100 | -100 | -100 | 100 | -100 | -100 | 100 | 100 | -100 | -100 | 100 | -100 | -100 | 100 | 100 |
45 | -45 | 45 | -45 | 45 | -45 | 45 | -45 | -45 | 45 | -45 | 45 | -45 | 45 | -45 | 45 |
104 | 104 | 104 | -104 | 104 | 104 | -104 | 104 | -104 | 104 | -104 | -104 | 104 | -104 | -104 | -104 |
75 | -75 | -75 | 75 | 75 | 75 | -75 | -75 | -75 | -75 | 75 | 75 | 75 | -75 | -75 | 75 |
86 | 86 | -86 | 86 | 86 | -86 | 86 | 86 | -86 | -86 | 86 | -86 | -86 | 86 | -86 | -86 |
70 | -70 | 70 | 70 | 70 | -70 | -70 | -70 | 70 | 70 | 70 | -70 | -70 | -70 | 70 | -70 |
96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 | 96 |
Contrasts are the column-wise sums of \(y \times X\), that is, the sum of the \(y_i\) at the high level (\(y_{X_j+}\)) minus the sum of the \(y_i\) at the low level (\(y_{X_j-}\)). For the intercept, it is the sum of all \(y_i\). In matrix language, \[\mbox{Contrast}_j=X_j'y=y_{X_j+}-y_{X_j-}.\] The total sample size here is \(N=16\); \(N\) appears in the formulas below.
Effect | Contrast |
---|---|
(Intercept) | 1121 |
A | 173 |
B | 25 |
C | 79 |
D | 117 |
A:B | 1 |
A:C | -145 |
A:D | 133 |
B:C | 19 |
B:D | -3 |
C:D | -9 |
A:B:C | 15 |
A:B:D | 33 |
A:C:D | -13 |
B:C:D | -21 |
A:B:C:D | 11 |
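The contrasts above can be reproduced directly from the model matrix. The following sketch rebuilds the data from the values shown earlier and computes \(X_j'y\) (the variable names are my own):

```r
# Rebuild the filtration data (same values and run order as above)
y <- c(45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96)
A <- rep(c(-1, 1), 8)
B <- rep(rep(c(-1, 1), each = 2), 4)
C <- rep(rep(c(-1, 1), each = 4), 2)
D <- rep(c(-1, 1), each = 8)
# Model matrix with all interactions up to order 4
X <- model.matrix(~ (A + B + C + D)^4)
# Contrast_j = X_j' y, i.e. the column-wise sums of y * X
contrasts <- drop(t(X) %*% y)
contrasts[c("(Intercept)", "A", "A:C", "A:D")]
# (Intercept) 1121, A 173, A:C -145, A:D 133
```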
In matrix language, \(\hat\tau = (X'X)^{-1}X'y\). The differences between the means at the high and low levels (the effects of A, B, ABC, etc.) are \(2\times\hat\tau\). The formula for the coefficients can be written as: \[\mbox{Contrast}_j=X_j'y=y_{X_j+}-y_{X_j-}\] \[\hat \tau_j ={\mbox{Contrast}_j\over N}=\frac{\bar y_{X_j+}-\bar y_{X_j-}}{2}\]
Effect | Coefficient |
---|---|
(Intercept) | 70.06 |
A | 10.81 |
B | 1.56 |
C | 4.94 |
D | 7.31 |
A:B | 0.06 |
A:C | -9.06 |
A:D | 8.31 |
B:C | 1.19 |
B:D | -0.19 |
C:D | -0.56 |
A:B:C | 0.94 |
A:B:D | 2.06 |
A:C:D | -0.81 |
B:C:D | -1.31 |
A:B:C:D | 0.69 |
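Because the design columns are orthogonal with \(X'X = N I\), the least-squares coefficients are exactly the contrasts divided by \(N=16\). A quick numerical check (rebuilding the data; the names are my own):

```r
y <- c(45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96)
A <- rep(c(-1, 1), 8)
B <- rep(rep(c(-1, 1), each = 2), 4)
C <- rep(rep(c(-1, 1), each = 4), 2)
D <- rep(c(-1, 1), each = 8)
g <- lm(y ~ (A + B + C + D)^4)
X <- model.matrix(g)
tau <- drop(t(X) %*% y) / 16   # Contrast_j / N
# Matches the lm coefficients
all.equal(tau, coef(g))        # TRUE
```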
The linear model is written as:
\[y = 70 + 11 x_A + 1.6 x_B + 4.9 x_C + 7.3 x_D +\\ 0.062 x_Ax_B - 9.1 x_Ax_C + 8.3 x_Ax_D + 1.2 x_Bx_C - 0.19 x_Bx_D - 0.56 x_Cx_D + \\ 0.94 x_Ax_Bx_C + 2.1 x_Ax_Bx_D - 0.81 x_Ax_Cx_D - 1.3 x_Bx_Cx_D +\\ 0.69 x_Ax_Bx_Cx_D +\\ \epsilon,\] where \(x_j=1\) if factor \(j\) is at the high level, and \(x_j=-1\) if factor \(j\) is at the low level, for \(j = A,B,C,D\).
## Warning in anova.lm(g): ANOVA F-tests on an essentially perfect fit are
## unreliable
Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
A | 1 | 1870.56 | 1870.56 | NaN | NaN |
B | 1 | 39.06 | 39.06 | NaN | NaN |
C | 1 | 390.06 | 390.06 | NaN | NaN |
D | 1 | 855.56 | 855.56 | NaN | NaN |
A:B | 1 | 0.06 | 0.06 | NaN | NaN |
A:C | 1 | 1314.06 | 1314.06 | NaN | NaN |
A:D | 1 | 1105.56 | 1105.56 | NaN | NaN |
B:C | 1 | 22.56 | 22.56 | NaN | NaN |
B:D | 1 | 0.56 | 0.56 | NaN | NaN |
C:D | 1 | 5.06 | 5.06 | NaN | NaN |
A:B:C | 1 | 14.06 | 14.06 | NaN | NaN |
A:B:D | 1 | 68.06 | 68.06 | NaN | NaN |
A:C:D | 1 | 10.56 | 10.56 | NaN | NaN |
B:C:D | 1 | 27.56 | 27.56 | NaN | NaN |
A:B:C:D | 1 | 7.56 | 7.56 | NaN | NaN |
Residuals | 0 | 0.00 | NaN | NA | NA |
The sums of squares can be computed by \[SS_j=N \times \hat \tau_j^2=\frac{\mbox{Contrast}^2_j}{N},\] where \(N=2^k\times n\) is the total number of observations. In this example, \(N=16\) since \(k=4,\ n=1\). This formula is also a specialization of the general formula for the treatment sum of squares: \[SS_\mbox{treatment}=\sum_{i=1}^a n_i (\bar y_i-\bar y)^2\] with \[n_i=N/2,\quad \bar y_{X_j+}-\bar y = \hat\tau_j,\quad \bar y_{X_j-}-\bar y = -\hat\tau_j,\quad a = 2.\] Therefore, \[SS_j=(N/2)\times\hat\tau_j^2 + (N/2)\times\hat \tau_j^2=N\times \hat\tau_j^2.\]
(Intercept) | A | B | C | D | A:B | A:C | A:D | B:C | B:D | C:D | A:B:C | A:B:D | A:C:D | B:C:D | A:B:C:D |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
78540 | 1871 | 39 | 390 | 856 | 0.06 | 1314 | 1106 | 23 | 0.56 | 5.1 | 14 | 68 | 11 | 28 | 7.6 |
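The identity \(SS_j = N\hat\tau_j^2\) is easy to verify numerically against the ANOVA table; a sketch, rebuilding the data and full model:

```r
y <- c(45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96)
A <- rep(c(-1, 1), 8)
B <- rep(rep(c(-1, 1), each = 2), 4)
C <- rep(rep(c(-1, 1), each = 4), 2)
D <- rep(c(-1, 1), each = 8)
g <- lm(y ~ (A + B + C + D)^4)
# SS_j = N * tau_j^2, excluding the intercept
SS <- 16 * coef(g)[-1]^2
round(SS[c("A", "A:C", "A:D")], 2)   # 1870.56, 1314.06, 1105.56
```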
\(SS_j\) is just the sum of squares of the fitted component \(X_j\hat \tau_j\):
(Intercept) | A | B | C | D | A:B | A:C | A:D | B:C | B:D | C:D | A:B:C | A:B:D | A:C:D | B:C:D | A:B:C:D |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
70 | -11 | -1.6 | -4.9 | -7.3 | 0.06 | -9.1 | 8.3 | 1.2 | -0.19 | -0.56 | -0.94 | -2.1 | 0.81 | 1.3 | 0.69 |
70 | 11 | -1.6 | -4.9 | -7.3 | -0.06 | 9.1 | -8.3 | 1.2 | -0.19 | -0.56 | 0.94 | 2.1 | -0.81 | 1.3 | -0.69 |
70 | -11 | 1.6 | -4.9 | -7.3 | -0.06 | -9.1 | 8.3 | -1.2 | 0.19 | -0.56 | 0.94 | 2.1 | 0.81 | -1.3 | -0.69 |
70 | 11 | 1.6 | -4.9 | -7.3 | 0.06 | 9.1 | -8.3 | -1.2 | 0.19 | -0.56 | -0.94 | -2.1 | -0.81 | -1.3 | 0.69 |
70 | -11 | -1.6 | 4.9 | -7.3 | 0.06 | 9.1 | 8.3 | -1.2 | -0.19 | 0.56 | 0.94 | -2.1 | -0.81 | -1.3 | -0.69 |
70 | 11 | -1.6 | 4.9 | -7.3 | -0.06 | -9.1 | -8.3 | -1.2 | -0.19 | 0.56 | -0.94 | 2.1 | 0.81 | -1.3 | 0.69 |
70 | -11 | 1.6 | 4.9 | -7.3 | -0.06 | 9.1 | 8.3 | 1.2 | 0.19 | 0.56 | -0.94 | 2.1 | -0.81 | 1.3 | 0.69 |
70 | 11 | 1.6 | 4.9 | -7.3 | 0.06 | -9.1 | -8.3 | 1.2 | 0.19 | 0.56 | 0.94 | -2.1 | 0.81 | 1.3 | -0.69 |
70 | -11 | -1.6 | -4.9 | 7.3 | 0.06 | -9.1 | -8.3 | 1.2 | 0.19 | 0.56 | -0.94 | 2.1 | -0.81 | -1.3 | -0.69 |
70 | 11 | -1.6 | -4.9 | 7.3 | -0.06 | 9.1 | 8.3 | 1.2 | 0.19 | 0.56 | 0.94 | -2.1 | 0.81 | -1.3 | 0.69 |
70 | -11 | 1.6 | -4.9 | 7.3 | -0.06 | -9.1 | -8.3 | -1.2 | -0.19 | 0.56 | 0.94 | -2.1 | -0.81 | 1.3 | 0.69 |
70 | 11 | 1.6 | -4.9 | 7.3 | 0.06 | 9.1 | 8.3 | -1.2 | -0.19 | 0.56 | -0.94 | 2.1 | 0.81 | 1.3 | -0.69 |
70 | -11 | -1.6 | 4.9 | 7.3 | 0.06 | 9.1 | -8.3 | -1.2 | 0.19 | -0.56 | 0.94 | 2.1 | 0.81 | 1.3 | 0.69 |
70 | 11 | -1.6 | 4.9 | 7.3 | -0.06 | -9.1 | 8.3 | -1.2 | 0.19 | -0.56 | -0.94 | -2.1 | -0.81 | 1.3 | -0.69 |
70 | -11 | 1.6 | 4.9 | 7.3 | -0.06 | 9.1 | -8.3 | 1.2 | -0.19 | -0.56 | -0.94 | -2.1 | 0.81 | -1.3 | -0.69 |
70 | 11 | 1.6 | 4.9 | 7.3 | 0.06 | -9.1 | 8.3 | 1.2 | -0.19 | -0.56 | 0.94 | 2.1 | -0.81 | -1.3 | 0.69 |
Understand the Pseudo Standard Deviation
## [1] 2
## [1] 10
## [1] 102
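The factor 1.5 in the pseudo standard deviation comes from the normal distribution: for zero-mean normal noise, \(\mathrm{median}(|Z|)\approx 0.675\,\sigma\), so \(1.5\times\mathrm{median}(|Z|)\approx 1.01\,\sigma\). A simulation sketch (my own, not the document's code):

```r
set.seed(1)                          # for reproducibility
z <- rnorm(1e5, mean = 0, sd = 10)   # pure noise with sigma = 10
# 1.5 * median(|z|) is a robust estimate of sigma:
round(1.5 * median(abs(z)))          # close to 10
```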
Lenth's Method
QQ plot of effects with Lenth significance
## Significant Factors Selected by Lenth's Method:
## A D A:C A:D
## [1] "A" "D" "A:C" "A:D"
Line plots with Lenth significance
## alpha PSE ME SME
## 0.05 1.31 3.37 6.85
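The PSE, ME, and SME printed above (from BsMD's `LenthPlot`) can be reproduced in base R. Here `tau` holds the 15 coefficient estimates from the full model; the calculation below is a sketch of Lenth's method applied to those coefficients:

```r
# Coefficient estimates from the full model (copied from the table above)
tau <- c(A = 10.8125, B = 1.5625, C = 4.9375, D = 7.3125,
         "A:B" = 0.0625, "A:C" = -9.0625, "A:D" = 8.3125,
         "B:C" = 1.1875, "B:D" = -0.1875, "C:D" = -0.5625,
         "A:B:C" = 0.9375, "A:B:D" = 2.0625, "A:C:D" = -0.8125,
         "B:C:D" = -1.3125, "A:B:C:D" = 0.6875)
m   <- length(tau)                                   # 15 estimates
s0  <- 1.5 * median(abs(tau))                        # initial pseudo sd
pse <- 1.5 * median(abs(tau)[abs(tau) < 2.5 * s0])   # pseudo standard error
me  <- qt(0.975, m/3) * pse                          # margin of error
sme <- qt((1 + 0.95^(1/m))/2, m/3) * pse             # simultaneous ME
round(c(PSE = pse, ME = me, SME = sme), 2)           # 1.31 3.37 6.85
```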
Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
A | 1 | 1871 | 1871 | 96 | 0 |
C | 1 | 390 | 390 | 20 | 0 |
D | 1 | 856 | 856 | 44 | 0 |
A:C | 1 | 1314 | 1314 | 67 | 0 |
A:D | 1 | 1106 | 1106 | 57 | 0 |
Residuals | 10 | 195 | 20 | NA | NA |
Important note: the residual sum of squares of the reduced model equals the sum of the sums of squares of the "removed" terms (here \(39.06+0.06+22.56+0.56+5.06+14.06+68.06+10.56+27.56+7.56=195.1\)).
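This pooling of the removed terms into the residual can be checked directly; a sketch rebuilding both models (`suppressWarnings` hides the perfect-fit warning for the full model, whose sums of squares are still valid):

```r
y <- c(45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96)
A <- rep(c(-1, 1), 8)
B <- rep(rep(c(-1, 1), each = 2), 4)
C <- rep(rep(c(-1, 1), each = 4), 2)
D <- rep(c(-1, 1), each = 8)
full    <- lm(y ~ (A + B + C + D)^4)
reduced <- lm(y ~ A + C + D + A:C + A:D)
aovF <- suppressWarnings(anova(full))
aovR <- anova(reduced)
# Terms in the full model that were dropped from the reduced one
dropped <- setdiff(rownames(aovF), rownames(aovR))
sum(aovF[dropped, "Sum Sq"])        # 195.125 ...
aovR["Residuals", "Sum Sq"]         # ... equals the residual SS
```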
##
## Call:
## lm(formula = y ~ A + C + D + A * C + A * D, data = filtration)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.375 -1.500 0.063 2.906 5.750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.06 1.10 63.44 2.3e-14 ***
## A 10.81 1.10 9.79 1.9e-06 ***
## C 4.94 1.10 4.47 0.0012 **
## D 7.31 1.10 6.62 5.9e-05 ***
## A:C -9.06 1.10 -8.21 9.4e-06 ***
## A:D 8.31 1.10 7.53 2.0e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.4 on 10 degrees of freedom
## Multiple R-squared: 0.966, Adjusted R-squared: 0.949
## F-statistic: 56.7 on 5 and 10 DF, p-value: 5.14e-07
The final linear model for relating y to x based on the above “lm” output is written as follows: \[y=70.06 + 10.81 x_A + 4.94 x_C + 7.31 x_D - 9.06 x_Ax_C + 8.31 x_Ax_D + \epsilon \]
When \(x_A=1\), the coefficient for \(x_C\) is \(4.94-9.06 = -4.12\), implying that the filtration rate decreases by about 8 gal/h when C moves from the low level to the high level; when \(x_A=-1\), the coefficient for \(x_C\) is \(4.94+9.06 = 14.00\). We see that high C (formaldehyde) on average increases the filtration rate, but decreases it when temperature (A) is high (+1). Look at the interaction plot:
When \(x_A=1\), the coefficient for \(x_D\) is \(7.31+8.31 = 15.62\); when \(x_A=-1\), it is \(7.31-8.31 = -1.00\). High D (stirring rate) increases the filtration rate on average: it increases the rate substantially when A (temperature) is high (+1), and decreases it slightly when A is low (-1).
Prediction of \(y_i\) in each combination of \(A,C,D\):
# h2 is the reduced model fitted above: lm(y ~ A + C + D + A:C + A:D)
newdata <- data.frame(
  A = rep(c(-1, 1), each = 4),
  C = rep(rep(c(-1, 1), each = 2), 2),
  D = rep(c(-1, 1), 4)
)
predicted.filtration <- predict(h2, newdata = newdata)
data.frame(newdata, predicted.filtration)
A | C | D | predicted.filtration |
---|---|---|---|
-1 | -1 | -1 | 46 |
-1 | -1 | 1 | 44 |
-1 | 1 | -1 | 74 |
-1 | 1 | 1 | 72 |
1 | -1 | -1 | 69 |
1 | -1 | 1 | 101 |
1 | 1 | -1 | 61 |
1 | 1 | 1 | 92 |
The combination of high temperature (A+), low formaldehyde (C-), and high stirring rate (D+) can provide a good filtration rate.
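This recommendation can be confirmed from the reduced model's predictions; a self-contained sketch (`expand.grid` enumerates the eight \(A,C,D\) settings):

```r
y <- c(45, 71, 48, 65, 68, 60, 80, 65, 43, 100, 45, 104, 75, 86, 70, 96)
A <- rep(c(-1, 1), 8)
C <- rep(rep(c(-1, 1), each = 4), 2)
D <- rep(c(-1, 1), each = 8)
h2 <- lm(y ~ A + C + D + A:C + A:D)
grid <- expand.grid(A = c(-1, 1), C = c(-1, 1), D = c(-1, 1))
pred <- predict(h2, newdata = grid)
grid[which.max(pred), ]   # A = 1, C = -1, D = 1
max(pred)                 # 100.625, the highest predicted filtration rate
```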