Packages and Functions
1 Generate Designs with Computer
2 Analyzing Randomized Complete Block Design
3 Analyzing Latin Square Design Experiments
4 Analyzing Balanced Incomplete Block Design
- 4.1 Fitting Linear Model and ANOVA
- 4.2 Effect Comparison in BIBD
5 Analysis of Covariance (Accounting for Uncontrollable Continuous Covariates)

Packages and Functions

if (!require("lawstat")) {install.packages("lawstat"); library ("lawstat")}
if (!require("agricolae")) {install.packages("agricolae"); library ("agricolae")}

A function with $R^2$ added to anova table output

anova.R2 <- function (obj.lm)
{
    obj.anova <- anova (obj.lm)
    sq <- obj.anova[,"Sum Sq"]
    R2 <- sq/sum(sq)
    obj.anova[, "R2"] <- R2
    obj.anova
}

1 Generate Designs with Computer

1.1 Randomized Complete Block Design

treatments <- LETTERS[1:4]
design.rcbd(treatments, r=5)[c("book", "sketch")]

## $book
##    plots block treatments
## 1    101     1          D
## 2    102     1          A
## 3    103     1          B
## 4    104     1          C
## 5    201     2          C
## 6    202     2          D
## 7    203     2          A
## 8    204     2          B
## 9    301     3          D
## 10   302     3          A
## 11   303     3          C
## 12   304     3          B
## 13   401     4          D
## 14   402     4          B
## 15   403     4          A
## 16   404     4          C
## 17   501     5          D
## 18   502     5          A
## 19   503     5          B
## 20   504     5          C
## 
## $sketch
##      [,1] [,2] [,3] [,4]
## [1,] "D"  "A"  "B"  "C" 
## [2,] "C"  "D"  "A"  "B" 
## [3,] "D"  "A"  "C"  "B" 
## [4,] "D"  "B"  "A"  "C" 
## [5,] "D"  "A"  "B"  "C"

treatments2 <- c(LETTERS[1:4], LETTERS[1:4])
design.rcbd(treatments2, r=3)[c("book", "sketch")]

## $book
##    plots block treatments2
## 1    101     1           C
## 2    102     1           A
## 3    103     1           D
## 4    104     1           B
## 5    105     1           B
## 6    106     1           C
## 7    107     1           D
## 8    108     1           A
## 9    201     2           D
## 10   202     2           D
## 11   203     2           C
## 12   204     2           A
## 13   205     2           B
## 14   206     2           C
## 15   207     2           A
## 16   208     2           B
## 17   301     3           B
## 18   302     3           A
## 19   303     3           D
## 20   304     3           A
## 21   305     3           C
## 22   306     3           C
## 23   307     3           D
## 24   308     3           B
## 
## $sketch
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] "C"  "A"  "D"  "B"  "B"  "C"  "D"  "A" 
## [2,] "D"  "D"  "C"  "A"  "B"  "C"  "A"  "B" 
## [3,] "B"  "A"  "D"  "A"  "C"  "C"  "D"  "B"

1.2 Latin-Square Design

treatments <- LETTERS[1:4]
design.lsd(treatments)[c("book", "sketch")]

## $book
##    plots row col treatments
## 1    101   1   1          A
## 2    102   1   2          C
## 3    103   1   3          B
## 4    104   1   4          D
## 5    201   2   1          B
## 6    202   2   2          D
## 7    203   2   3          C
## 8    204   2   4          A
## 9    301   3   1          D
## 10   302   3   2          B
## 11   303   3   3          A
## 12   304   3   4          C
## 13   401   4   1          C
## 14   402   4   2          A
## 15   403   4   3          D
## 16   404   4   4          B
## 
## $sketch
##      [,1] [,2] [,3] [,4]
## [1,] "A"  "C"  "B"  "D" 
## [2,] "B"  "D"  "C"  "A" 
## [3,] "D"  "B"  "A"  "C" 
## [4,] "C"  "A"  "D"  "B"

## each treatment has been combined with each level of each block
lsd_des <- design.lsd(treatments)$book
lsd_des [order(lsd_des[,"treatments"]), ]

	plots	row	col	treatments
4	104	1	4	A
6	202	2	2	A
9	301	3	1	A
15	403	4	3	A
1	101	1	1	B
7	203	2	3	B
10	302	3	2	B
16	404	4	4	B
2	102	1	2	C
8	204	2	4	C
11	303	3	3	C
13	401	4	1	C
3	103	1	3	D
5	201	2	1	D
12	304	3	4	D
14	402	4	2	D

## look at combination of factors
with (lsd_des, table(row, col))

##    col
## row 1 2 3 4
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

with (lsd_des, table(row, treatments))

##    treatments
## row A B C D
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

with (lsd_des, table(col, treatments))

##    treatments
## col A B C D
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

1.3 Graeco-Latin Design

treat1 <- LETTERS[1:4]
treat2 <- letters[1:4]
gld <- design.graeco(treat1, treat2)[c("book", "sketch")]; gld

## $book
##    plots row col treat1 treat2
## 1    101   1   1      D      d
## 2    102   1   2      C      b
## 3    103   1   3      A      c
## 4    104   1   4      B      a
## 5    201   2   1      C      c
## 6    202   2   2      D      a
## 7    203   2   3      B      d
## 8    204   2   4      A      b
## 9    301   3   1      A      a
## 10   302   3   2      B      c
## 11   303   3   3      D      b
## 12   304   3   4      C      d
## 13   401   4   1      B      b
## 14   402   4   2      A      d
## 15   403   4   3      C      a
## 16   404   4   4      D      c
## 
## $sketch
##      [,1]  [,2]  [,3]  [,4] 
## [1,] "D d" "C b" "A c" "B a"
## [2,] "C c" "D a" "B d" "A b"
## [3,] "A a" "B c" "D b" "C d"
## [4,] "B b" "A d" "C a" "D c"

## look at combination of factors
with(gld$book, table(row, col))

##    col
## row 1 2 3 4
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

with(gld$book, table(row, treat1))

##    treat1
## row A B C D
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

with(gld$book, table(row, treat2))

##    treat2
## row a b c d
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

with(gld$book, table(col, treat1))

##    treat1
## col A B C D
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

with(gld$book, table(col, treat2))

##    treat2
## col a b c d
##   1 1 1 1 1
##   2 1 1 1 1
##   3 1 1 1 1
##   4 1 1 1 1

1.4 Balanced Incomplete Block Design

treatments <-  LETTERS[1:4]
bib.des <- design.bib (treatments, k=3)[c("book", "sketch")]; bib.des

## 
## Parameters BIB
## ==============
## Lambda     : 2
## treatmeans : 4
## Block size : 3
## Blocks     : 4
## Replication: 3 
## 
## Efficiency factor 0.8888889 
## 
## <<< Book >>>

## $book
##    plots block treatments
## 1    101     1          C
## 2    102     1          A
## 3    103     1          B
## 4    201     2          D
## 5    202     2          B
## 6    203     2          C
## 7    301     3          C
## 8    302     3          D
## 9    303     3          A
## 10   401     4          B
## 11   402     4          A
## 12   403     4          D
## 
## $sketch
##      [,1] [,2] [,3]
## [1,] "C"  "A"  "B" 
## [2,] "D"  "B"  "C" 
## [3,] "C"  "D"  "A" 
## [4,] "B"  "A"  "D"

with (bib.des$book, table (block))

## block
## 1 2 3 4 
## 3 3 3 3

with (bib.des$book, table (treatments))

## treatments
## A B C D 
## 3 3 3 3

with (bib.des$book, table (block, treatments))

##      treatments
## block A B C D
##     1 1 1 1 0
##     2 0 1 1 1
##     3 1 0 1 1
##     4 1 1 0 1

treatments <-  LETTERS[1:4]
bib.des2 <- design.bib (treatments, k=2)[c("book", "sketch")]; bib.des2

## 
## Parameters BIB
## ==============
## Lambda     : 1
## treatmeans : 4
## Block size : 2
## Blocks     : 6
## Replication: 3 
## 
## Efficiency factor 0.6666667 
## 
## <<< Book >>>

## $book
##    plots block treatments
## 1    101     1          C
## 2    102     1          A
## 3    201     2          B
## 4    202     2          C
## 5    301     3          D
## 6    302     3          C
## 7    401     4          A
## 8    402     4          D
## 9    501     5          A
## 10   502     5          B
## 11   601     6          D
## 12   602     6          B
## 
## $sketch
##      [,1] [,2]
## [1,] "C"  "A" 
## [2,] "B"  "C" 
## [3,] "D"  "C" 
## [4,] "A"  "D" 
## [5,] "A"  "B" 
## [6,] "D"  "B"

with (bib.des2$book, table (block))

## block
## 1 2 3 4 5 6 
## 2 2 2 2 2 2

with (bib.des2$book, table (treatments))

## treatments
## A B C D 
## 3 3 3 3

with (bib.des2$book, table (block, treatments))

##      treatments
## block A B C D
##     1 1 0 1 0
##     2 0 1 1 0
##     3 0 0 1 1
##     4 1 0 0 1
##     5 1 1 0 0
##     6 0 1 0 1

2 Analyzing Randomized Complete Block Design

This is the example 4.1 from the textbook. A medical device manufacturer produces vascular grafts (artificial veins). These grafts are produced by extruding billets of polytetrafluoroethylene (PTFE) resin combined with a lubricant into tubes. Frequently, some of the tubes in a production run contain small, hard protrusions on the external surface. These defects are known as “flicks.” The defect is cause for rejection of the unit. The product developer responsible for the vascular grafts suspects that the extrusion pressure affects the occur- rence of yield and therefore intends to conduct an experi- ment to investigate this hypothesis. However, the resin is manufactured by an external supplier and is delivered to the medical device manufacturer in batches. The engineer also suspects that there may be significant batch-to-batch variation. Therefore, the product developer decided to investigate the effect of four different levels of extrusion pressure on yield using a randomized com- plete block design considering batches of resin as blocks. Note that there are four levels of extrusion pressure (treatments) and six batches of resin (blocks). Remember that the order in which the extrusion pressures are tested within each block is random. The response variable is yield, or the percentage of tubes in the production run that did not contain any flicks.

2.1 Read Data

vascular <- data.frame (yield = scan ("data/vasculargraft.txt"),
                        batch = as.factor(rep(1:6,4)),
                        pressure = as.factor(rep (1:4, each = 6)) 
)
attach (vascular)## this command allows assessing values in vascular directly

## The following object is masked from package:datasets:
## 
##     pressure

vascular

yield	batch	pressure
90.3	1	1
89.2	2	1
98.2	3	1
93.9	4	1
87.4	5	1
97.9	6	1
92.5	1	2
89.5	2	2
90.6	3	2
94.7	4	2
87.0	5	2
95.8	6	2
85.5	1	3
90.8	2	3
89.6	3	3
86.2	4	3
88.0	5	3
93.4	6	3
82.5	1	4
89.5	2	4
85.6	3	4
87.4	4	4
78.9	5	4
90.7	6	4

## table display the data
tvascular <- tapply(yield, INDEX = list(batch, pressure), mean); tvascular

##      1    2    3    4
## 1 90.3 92.5 85.5 82.5
## 2 89.2 89.5 90.8 89.5
## 3 98.2 90.6 89.6 85.6
## 4 93.9 94.7 86.2 87.4
## 5 87.4 87.0 88.0 78.9
## 6 97.9 95.8 93.4 90.7

tvascular <- cbind(tvascular, rowMeans(tvascular))
tvascular <- rbind (tvascular, colMeans(tvascular))
colnames(tvascular) <- c(paste0("Pressure", 1:4), "Average")
rownames(tvascular) <- c(paste0("Batch", 1:6), "Average")
tvascular

##         Pressure1 Pressure2 Pressure3 Pressure4  Average
## Batch1   90.30000  92.50000  85.50000  82.50000 87.70000
## Batch2   89.20000  89.50000  90.80000  89.50000 89.75000
## Batch3   98.20000  90.60000  89.60000  85.60000 91.00000
## Batch4   93.90000  94.70000  86.20000  87.40000 90.55000
## Batch5   87.40000  87.00000  88.00000  78.90000 85.32500
## Batch6   97.90000  95.80000  93.40000  90.70000 94.45000
## Average  92.81667  91.68333  88.91667  85.76667 89.79583

2.2 Visualize the Data

par (mfrow = c(2,2))
#boxplot
plot(yield ~ batch, data=vascular, main = "Block")
plot(yield ~ pressure, data=vascular, main = "Pressure")
#interaction plot
with (vascular, interaction.plot(batch, pressure, yield, type = "b"))
with (vascular, interaction.plot( pressure,batch, yield, type = "b"))

2.3 Linear Models

2.3.1 Effect Model

fit_vas_effect <- lm (yield ~  batch + pressure, 
                      contrasts = list(batch=contr.sum, pressure=contr.sum),
                      data = vascular)

data.frame(batch,pressure,model.matrix(fit_vas_effect))

batch	pressure	X.Intercept.	batch1	batch2	batch3	batch4	batch5	pressure1	pressure2	pressure3
1	1	1	1	0	0	0	0	1	0	0
2	1	1	0	1	0	0	0	1	0	0
3	1	1	0	0	1	0	0	1	0	0
4	1	1	0	0	0	1	0	1	0	0
5	1	1	0	0	0	0	1	1	0	0
6	1	1	-1	-1	-1	-1	-1	1	0	0
1	2	1	1	0	0	0	0	0	1	0
2	2	1	0	1	0	0	0	0	1	0
3	2	1	0	0	1	0	0	0	1	0
4	2	1	0	0	0	1	0	0	1	0
5	2	1	0	0	0	0	1	0	1	0
6	2	1	-1	-1	-1	-1	-1	0	1	0
1	3	1	1	0	0	0	0	0	0	1
2	3	1	0	1	0	0	0	0	0	1
3	3	1	0	0	1	0	0	0	0	1
4	3	1	0	0	0	1	0	0	0	1
5	3	1	0	0	0	0	1	0	0	1
6	3	1	-1	-1	-1	-1	-1	0	0	1
1	4	1	1	0	0	0	0	-1	-1	-1
2	4	1	0	1	0	0	0	-1	-1	-1
3	4	1	0	0	1	0	0	-1	-1	-1
4	4	1	0	0	0	1	0	-1	-1	-1
5	4	1	0	0	0	0	1	-1	-1	-1
6	4	1	-1	-1	-1	-1	-1	-1	-1	-1

summary(fit_vas_effect)

## 
## Call:
## lm(formula = yield ~ batch + pressure, data = vascular, contrasts = list(batch = contr.sum, 
##     pressure = contr.sum))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5708 -1.3333 -0.3167  1.1417  4.1792 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 89.79583    0.55248 162.531  < 2e-16 ***
## batch1      -2.09583    1.23539  -1.696  0.11044    
## batch2      -0.04583    1.23539  -0.037  0.97089    
## batch3       1.20417    1.23539   0.975  0.34516    
## batch4       0.75417    1.23539   0.610  0.55069    
## batch5      -4.47083    1.23539  -3.619  0.00253 ** 
## pressure1    3.02083    0.95693   3.157  0.00652 ** 
## pressure2    1.88750    0.95693   1.972  0.06728 .  
## pressure3   -0.87917    0.95693  -0.919  0.37277    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.707 on 15 degrees of freedom
## Multiple R-squared:  0.7712, Adjusted R-squared:  0.6492 
## F-statistic: 6.321 on 8 and 15 DF,  p-value: 0.00113

tvascular

##         Pressure1 Pressure2 Pressure3 Pressure4  Average
## Batch1   90.30000  92.50000  85.50000  82.50000 87.70000
## Batch2   89.20000  89.50000  90.80000  89.50000 89.75000
## Batch3   98.20000  90.60000  89.60000  85.60000 91.00000
## Batch4   93.90000  94.70000  86.20000  87.40000 90.55000
## Batch5   87.40000  87.00000  88.00000  78.90000 85.32500
## Batch6   97.90000  95.80000  93.40000  90.70000 94.45000
## Average  92.81667  91.68333  88.91667  85.76667 89.79583

mu <- tvascular["Average", "Average"]
mu

## [1] 89.79583

tau_pressure <- tvascular["Average",1:4]-tvascular["Average", "Average"]
tau_pressure

##  Pressure1  Pressure2  Pressure3  Pressure4 
##  3.0208333  1.8875000 -0.8791667 -4.0291667

tau_Batch <- tvascular[1:6,"Average"]-tvascular["Average", "Average"]
tau_Batch

##      Batch1      Batch2      Batch3      Batch4      Batch5      Batch6 
## -2.09583333 -0.04583333  1.20416667  0.75416667 -4.47083333  4.65416667

2.3.2 Treatment Difference Model

fit_vas <- lm (yield ~  batch + pressure, data = vascular)
data.frame(batch,pressure,model.matrix(fit_vas))

batch	pressure	X.Intercept.	batch2	batch3	batch4	batch5	batch6	pressure2	pressure3	pressure4
1	1	1	0	0	0	0	0	0	0	0
2	1	1	1	0	0	0	0	0	0	0
3	1	1	0	1	0	0	0	0	0	0
4	1	1	0	0	1	0	0	0	0	0
5	1	1	0	0	0	1	0	0	0	0
6	1	1	0	0	0	0	1	0	0	0
1	2	1	0	0	0	0	0	1	0	0
2	2	1	1	0	0	0	0	1	0	0
3	2	1	0	1	0	0	0	1	0	0
4	2	1	0	0	1	0	0	1	0	0
5	2	1	0	0	0	1	0	1	0	0
6	2	1	0	0	0	0	1	1	0	0
1	3	1	0	0	0	0	0	0	1	0
2	3	1	1	0	0	0	0	0	1	0
3	3	1	0	1	0	0	0	0	1	0
4	3	1	0	0	1	0	0	0	1	0
5	3	1	0	0	0	1	0	0	1	0
6	3	1	0	0	0	0	1	0	1	0
1	4	1	0	0	0	0	0	0	0	1
2	4	1	1	0	0	0	0	0	0	1
3	4	1	0	1	0	0	0	0	0	1
4	4	1	0	0	1	0	0	0	0	1
5	4	1	0	0	0	1	0	0	0	1
6	4	1	0	0	0	0	1	0	0	1

summary(fit_vas)

## 
## Call:
## lm(formula = yield ~ batch + pressure, data = vascular)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5708 -1.3333 -0.3167  1.1417  4.1792 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   90.721      1.657  54.735  < 2e-16 ***
## batch2         2.050      1.914   1.071 0.301043    
## batch3         3.300      1.914   1.724 0.105201    
## batch4         2.850      1.914   1.489 0.157175    
## batch5        -2.375      1.914  -1.241 0.233684    
## batch6         6.750      1.914   3.527 0.003050 ** 
## pressure2     -1.133      1.563  -0.725 0.479457    
## pressure3     -3.900      1.563  -2.496 0.024713 *  
## pressure4     -7.050      1.563  -4.512 0.000414 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.707 on 15 degrees of freedom
## Multiple R-squared:  0.7712, Adjusted R-squared:  0.6492 
## F-statistic: 6.321 on 8 and 15 DF,  p-value: 0.00113

tvascular

##         Pressure1 Pressure2 Pressure3 Pressure4  Average
## Batch1   90.30000  92.50000  85.50000  82.50000 87.70000
## Batch2   89.20000  89.50000  90.80000  89.50000 89.75000
## Batch3   98.20000  90.60000  89.60000  85.60000 91.00000
## Batch4   93.90000  94.70000  86.20000  87.40000 90.55000
## Batch5   87.40000  87.00000  88.00000  78.90000 85.32500
## Batch6   97.90000  95.80000  93.40000  90.70000 94.45000
## Average  92.81667  91.68333  88.91667  85.76667 89.79583

tvascular["Average",2:4]-tvascular["Average", 1]

## Pressure2 Pressure3 Pressure4 
## -1.133333 -3.900000 -7.050000

tvascular[2:6,"Average"]-tvascular[1,"Average"]

## Batch2 Batch3 Batch4 Batch5 Batch6 
##  2.050  3.300  2.850 -2.375  6.750

## The intercept of treatment difference model is:
mu + tau_Batch[1] + tau_pressure[1]

##   Batch1 
## 90.72083

2.3.3 Visualize the fitted model

summary(fit_vas)

## 
## Call:
## lm(formula = yield ~ batch + pressure, data = vascular)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5708 -1.3333 -0.3167  1.1417  4.1792 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   90.721      1.657  54.735  < 2e-16 ***
## batch2         2.050      1.914   1.071 0.301043    
## batch3         3.300      1.914   1.724 0.105201    
## batch4         2.850      1.914   1.489 0.157175    
## batch5        -2.375      1.914  -1.241 0.233684    
## batch6         6.750      1.914   3.527 0.003050 ** 
## pressure2     -1.133      1.563  -0.725 0.479457    
## pressure3     -3.900      1.563  -2.496 0.024713 *  
## pressure4     -7.050      1.563  -4.512 0.000414 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.707 on 15 degrees of freedom
## Multiple R-squared:  0.7712, Adjusted R-squared:  0.6492 
## F-statistic: 6.321 on 8 and 15 DF,  p-value: 0.00113

par (mfrow = c(1,2))

with (vascular, 
      interaction.plot(batch, pressure, yield, type = "b", col = "black",
                       main = "Original Data", ylim = c(78,100))
)
with (vascular, 
      interaction.plot(batch, pressure, fit_vas$fitted.values, type = "b", col = "red",
                       main = "Predicted/Fitted Values", ylim = c(78,100)))

2.4 ANOVA

2.4.1 Formulae

The formula for computing block and treatment sum squares is the same as in one-way ANOVA.

tvascular

##         Pressure1 Pressure2 Pressure3 Pressure4  Average
## Batch1   90.30000  92.50000  85.50000  82.50000 87.70000
## Batch2   89.20000  89.50000  90.80000  89.50000 89.75000
## Batch3   98.20000  90.60000  89.60000  85.60000 91.00000
## Batch4   93.90000  94.70000  86.20000  87.40000 90.55000
## Batch5   87.40000  87.00000  88.00000  78.90000 85.32500
## Batch6   97.90000  95.80000  93.40000  90.70000 94.45000
## Average  92.81667  91.68333  88.91667  85.76667 89.79583

## Total sum squares:

sum(tvascular[1:6,1:4]^2) - 24 * mu^2->SST; SST

## [1] 480.3096

## pressure sum squares:

sum(tvascular["Average",1:4]^2 * 6)-24*mu^2->SStr; SStr

## [1] 178.1712

## batch sum squares:

sum(tvascular[1:6, "Average"]^2 * 4)-24*mu^2->SSblock; SSblock

## [1] 192.2521

## SSE:

SST - SStr - SSblock

## [1] 109.8863

2.4.2 Using lm and anova functions

fit_pressure <- lm (yield~pressure, data = vascular)
anova (fit_pressure)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
pressure	3	178.1712	59.39042	3.931339	0.023448
Residuals	20	302.1383	15.10692	NA	NA

fit_batch <- lm (yield~batch, data = vascular)

anova (fit_batch)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
batch	5	192.2521	38.45042	2.402671	0.0777009
Residuals	18	288.0575	16.00319	NA	NA

## Due to the orthogonality in pressure and batch, the following two ways for anova are the same:
fit_batch_pressure <- lm (yield~batch+pressure, data = vascular)

anova (fit_batch_pressure)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
batch	5	192.2521	38.45042	5.248666	0.0055317
pressure	3	178.1712	59.39042	8.107077	0.0019163
Residuals	15	109.8863	7.32575	NA	NA

fit_pressure_batch <- lm (yield~pressure + batch, data = vascular)

anova (fit_pressure_batch)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
pressure	3	178.1712	59.39042	8.107077	0.0019163
batch	5	192.2521	38.45042	5.248666	0.0055317
Residuals	15	109.8863	7.32575	NA	NA

2.5 Effect Comparison

2.5.1 Standard Error of the Difference Between Two Means

\[se(\hat \mu_i-\hat \mu_j)=\hat\sigma\times \sqrt{1/n_i+1/n_j}\] where $\hat\sigma=\sqrt{MSE}$ from the treatment+block model. This SE will be multiplied by the t quantile or Tukey quantiles to obtain the LSD CI or family-wise CI.

2.5.2 Effect Comparison for Fixed Pair with LSD

cbind(summary (fit_vas)$coef, confint(fit_vas))

##              Estimate Std. Error    t value     Pr(>|t|)       2.5 %    97.5 %
## (Intercept) 90.720833   1.657455 54.7350287 1.091578e-18  87.1880522 94.253615
## batch2       2.050000   1.913864  1.0711316 3.010429e-01  -2.0293043  6.129304
## batch3       3.300000   1.913864  1.7242605 1.052012e-01  -0.7793043  7.379304
## batch4       2.850000   1.913864  1.4891341 1.571751e-01  -1.2293043  6.929304
## batch5      -2.375000   1.913864 -1.2409451 2.336842e-01  -6.4543043  1.704304
## batch6       6.750000   1.913864  3.5268966 3.050449e-03   2.6706957 10.829304
## pressure2   -1.133333   1.562663 -0.7252575 4.794567e-01  -4.4640714  2.197405
## pressure3   -3.900000   1.562663 -2.4957391 2.471273e-02  -7.2307380 -0.569262
## pressure4   -7.050000   1.562663 -4.5115284 4.136854e-04 -10.3807380 -3.719262

## hand calculation of SE
SE_diff_blocks <- sigma (fit_vas) * sqrt(1/4+1/4); SE_diff_blocks # 4 is number of treatment levels

## [1] 1.913864

SE_diff_treats <- sigma (fit_vas) * sqrt (1/6+1/6); SE_diff_treats # 6 is number of block levels

## [1] 1.562663

## ME in LSD (without correction for multiple comparison bias, illustration for hand calculation)
SE_diff_blocks* qt (0.975, df = 15) ## for blocks

## [1] 4.079304

SE_diff_treats* qt (0.975, df = 15) ## for treatments

## [1] 3.330738

2.5.3 Tukey Multiple Comparison of Effects

TukeyHSD(aov (yield~pressure + batch, vascular))

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = yield ~ pressure + batch, data = vascular)
## 
## $pressure
##          diff        lwr       upr     p adj
## 2-1 -1.133333  -5.637161  3.370495 0.8854831
## 3-1 -3.900000  -8.403828  0.603828 0.1013084
## 4-1 -7.050000 -11.553828 -2.546172 0.0020883
## 3-2 -2.766667  -7.270495  1.737161 0.3245644
## 4-2 -5.916667 -10.420495 -1.412839 0.0086667
## 4-3 -3.150000  -7.653828  1.353828 0.2257674
## 
## $batch
##       diff         lwr        upr     p adj
## 2-1  2.050  -4.1680828  8.2680828 0.8853016
## 3-1  3.300  -2.9180828  9.5180828 0.5376297
## 4-1  2.850  -3.3680828  9.0680828 0.6757699
## 5-1 -2.375  -8.5930828  3.8430828 0.8105903
## 6-1  6.750   0.5319172 12.9680828 0.0297368
## 3-2  1.250  -4.9680828  7.4680828 0.9845521
## 4-2  0.800  -5.4180828  7.0180828 0.9980198
## 5-2 -4.425 -10.6430828  1.7930828 0.2483499
## 6-2  4.700  -1.5180828 10.9180828 0.1986961
## 4-3 -0.450  -6.6680828  5.7680828 0.9998784
## 5-3 -5.675 -11.8930828  0.5430828 0.0837504
## 6-3  3.450  -2.7680828  9.6680828 0.4925715
## 5-4 -5.225 -11.4430828  0.9930828 0.1263042
## 6-4  3.900  -2.3180828 10.1180828 0.3674672
## 6-5  9.125   2.9069172 15.3430828 0.0027838

par (mfrow = c(1,2))
plot(TukeyHSD(aov (yield~pressure + batch, vascular)))

## ME for Tukey CI (illustration for hand computing)
SE_diff_blocks * qtukey(0.95, 6, df = 15)/sqrt(2) ## for blocks

## [1] 6.218083

SE_diff_treats * qtukey(0.95, 4, df = 15)/sqrt(2) ## for treatments

## [1] 4.503828

2.6 Model Checking

2.6.1 Checking the treatment or block only model

par(mfrow=c(2,2))
plot(fit_pressure$residuals ~vascular$batch)
plot(fit_batch$residuals ~vascular$pressure)
qqnorm(fit_batch$residuals)
qqline(fit_batch$residuals)
qqnorm(fit_pressure$residuals)
qqline(fit_pressure$residuals)

shapiro.test(fit_batch$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  fit_batch$residuals
## W = 0.98098, p-value = 0.9132

shapiro.test(fit_pressure$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  fit_pressure$residuals
## W = 0.95664, p-value = 0.3747

From the boxplots of residuals versus un-accounted variables, we can see clearly that there are still signals uncaptured. However, the normality test for the residuals of each model may tell you that the model is not bad good. To compare (not check) the goodness of fits of different models, another criteria, AIC, is often used. AIC is a criterion related to sum squares of residuals and the number of parameters.

AIC(fit_batch)

## [1] 141.7516

AIC(fit_pressure)

## [1] 138.897

AIC(fit_batch_pressure)

## [1] 124.6225

Note: AIC is not required for this course. ### Checking the treatment+block model

#overall normality
par (mfrow =c(1,3))
qqnorm(fit_vas$res)
qqline(fit_vas$res)
shapiro.test(fit_vas$res)

## 
##  Shapiro-Wilk normality test
## 
## data:  fit_vas$res
## W = 0.95631, p-value = 0.3689

#independence
plot(fit_vas$res,xlab="Run order",ylab="Residuals")
#constant variance
plot(fit_vas$fitted,fit_vas$res,xlab="Fitted",ylab="Residuals")

#residual plots of residuals vs. pressurement labels, block labels
par (mfrow = c(1,2))
with(vascular,plot(fit_vas$res~as.numeric(pressure), pch = as.numeric(pressure) ) )
abline(h=0)  

with(vascular,plot(fit_vas$res~as.numeric(batch), pch = as.numeric(batch) ) )
abline(h=0)

# levene test for constant variances within each pressurement or each block
par (mfrow = c(1,2))

anova (lm(abs(fit_vas$residuals)~vascular$batch))

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
vascular$batch	5	13.44405	2.688810	1.84921	0.1538582
Residuals	18	26.17257	1.454032	NA	NA

anova (lm(abs(fit_vas$residuals)~vascular$pressure))

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
vascular$pressure	3	0.1339583	0.0446528	0.0226189	0.9952375
Residuals	20	39.4826620	1.9741331	NA	NA

#detach the dataset
#detach ("vascular")

3 Analyzing Latin Square Design Experiments

A hardness testing machine presses a pointed rod (the ‘tip’) into a metal specimen (a ‘coupon’), with a known force. The depth of the depression is a measure of the hardness of the specimen. It is feared that, depending on the kind of tip used, the machine might give different readings. The experimenter wants 4 observations on each of the 4 types of tips. Suppose that the ‘coupon’ and the ‘operator’ of the testing machine are thought to be factors. Suppose there are p = 4 operators, p = 4 coupons, and p = 4 tips. The first two are nuisance factors, the last is the treatment factor.

# read data
hardness <- read.table("data/hardness.txt", 
                       header=T, 
                       colClasses=c("factor","factor","factor","numeric"))

attach(hardness)
hardness

operator	coupon	tip	y
1	1	A	9.3
1	2	B	9.4
1	3	C	9.2
1	4	D	9.7
2	1	B	9.3
2	2	A	9.4
2	3	D	9.6
2	4	C	9.4
3	1	C	9.5
3	2	D	10.0
3	3	A	9.6
3	4	B	9.8
4	1	D	10.2
4	2	C	9.7
4	3	B	9.9
4	4	A	10.0

## verifying a latin square design
table (operator, tip)

##         tip
## operator A B C D
##        1 1 1 1 1
##        2 1 1 1 1
##        3 1 1 1 1
##        4 1 1 1 1

table (operator, coupon)

##         coupon
## operator 1 2 3 4
##        1 1 1 1 1
##        2 1 1 1 1
##        3 1 1 1 1
##        4 1 1 1 1

table (coupon, tip)

##       tip
## coupon A B C D
##      1 1 1 1 1
##      2 1 1 1 1
##      3 1 1 1 1
##      4 1 1 1 1

## look at the data

par (mfrow = c(1,3))
with (hardness, interaction.plot(tip,operator,y, type = "b"))
with (hardness, interaction.plot(tip,coupon,y, type = "b"))
with (hardness, interaction.plot(operator,tip,y, type = "b"))

par (mfrow = c(1,3))
plot (y~operator)
plot (y~coupon)
plot (y~tip)

## fit the model and anova
g <- lm(y ~ operator + coupon + tip, hardness)
summary (g)

## 
## Call:
## lm(formula = y ~ operator + coupon + tip, data = hardness)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0.075 -0.025  0.000  0.025  0.050 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.300e+00  4.564e-02 203.753 9.43e-13 ***
## operator2    2.500e-02  4.082e-02   0.612 0.562764    
## operator3    3.250e-01  4.082e-02   7.961 0.000209 ***
## operator4    5.500e-01  4.082e-02  13.472 1.04e-05 ***
## coupon2      5.000e-02  4.082e-02   1.225 0.266570    
## coupon3      9.433e-16  4.082e-02   0.000 1.000000    
## coupon4      1.500e-01  4.082e-02   3.674 0.010402 *  
## tipB         2.500e-02  4.082e-02   0.612 0.562764    
## tipC        -1.250e-01  4.082e-02  -3.062 0.022172 *  
## tipD         3.000e-01  4.082e-02   7.348 0.000325 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05774 on 6 degrees of freedom
## Multiple R-squared:  0.9845, Adjusted R-squared:  0.9612 
## F-statistic: 42.33 on 9 and 6 DF,  p-value: 9.592e-05

anova.R2(g)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
operator	3	0.825	0.2750000	82.5	0.0000287	0.6395349
coupon	3	0.060	0.0200000	6.0	0.0307958	0.0465116
tip	3	0.385	0.1283333	38.5	0.0002585	0.2984496
Residuals	6	0.020	0.0033333	NA	NA	0.0155039

## Tukey Multiple Comparison
plot(TukeyHSD(aov(g)))

## model diagnosis
#plot(g)
plot (g$residuals~ g$fitted.values)
plot (g$residuals~ as.numeric(hardness$operator))
plot (g$residuals~ as.numeric(hardness$coupon))

plot (g$residuals~ as.numeric(hardness$tip))
qqnorm (g$residuals); qqline(g$residuals)
##
detach ("hardness")

4 Analyzing Balanced Incomplete Block Design

Suppose that a chemical engineer thinks that the time of reaction for a chemical process is a function of the type of catalyst employed. Four catalysts are currently being investigated. The experimental procedure consists of selecting a batch of raw material, loading the pilot plant, applying each catalyst in a separate run of the pilot plant, and observing the reaction time. Because variations in the batches of raw material may affect the performance of the catalysts, the engineer decides to use batches of raw material as blocks. However, each batch is only large enough to permit three catalysts to be run. Therefore a randomized incomplete block design must be used.

catalyst <- read.table("data/catalyst.txt", header=T, colClasses=c("factor","factor","numeric"))
catalyst

Treatment	Block	y
1	1	73
1	2	74
1	4	71
2	2	75
2	3	67
2	4	72
3	1	73
3	2	75
3	3	68
4	1	75
4	3	72
4	4	75

attach(catalyst)

## check the design
table (Block, Treatment)

##      Treatment
## Block 1 2 3 4
##     1 1 0 1 1
##     2 1 1 1 0
##     3 0 1 1 1
##     4 1 1 0 1

## look at the dataset
par (mfrow = c(1,3))
plot (y~Block)
plot (y~Treatment)
interaction.plot(Block, Treatment, y, type ="p", main = "Original Data")

4.1 Fitting Linear Model and ANOVA

## fit linear model (block then treatment)
g<-lm(y ~ Block + Treatment, catalyst)
## anova
anova.R2(g)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
Block	3	55.00	18.333333	28.20513	0.0014678	0.6790123
Treatment	3	22.75	7.583333	11.66667	0.0107387	0.2808642
Residuals	5	3.25	0.650000	NA	NA	0.0401235

## making new predictions on complete grid of blocks and treatment (for understanding the estimating process)
new.treat <- rep (as.character(1:4), 4)
new.block <- rep (as.character(1:4), each = 4)
new.data <- data.frame (Treatment=new.treat, Block = new.block)
new.pred <- predict (g, newdata = new.data)


## drawing interaction plots to look at the prediction
par (mfrow = c(1,2))
interaction.plot(Block, Treatment, y, type ="p", main = "Original Data",ylim = range (y, new.pred))

## plot predicted values
interaction.plot(new.block, new.treat, new.pred, 
                 type = "b",col ="red", ylim = range (y, new.pred),
                 xlab = "Block", ylab = "y", main = "Predicted Values in BIBD")
## show original points on the previous plot
points (as.numeric(Block), y, pch = as.character(Treatment ))

## fit another model (treatment then block)
g2<-lm(y ~  Treatment + Block, catalyst)
anova(g2)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)
Treatment	3	11.66667	3.888889	5.982906	0.0414634
Block	3	66.08333	22.027778	33.888889	0.0009528
Residuals	5	3.25000	0.650000	NA	NA

We see that the ordering makes differences in anova, although it does not change least square regression fitting.

4.2 Effect Comparison in BIBD

BIBD Effect Estimations is not exactly equal to treatment and block mean differences

means.treat <- tapply(y, Treatment, mean); means.treat - means.treat[1]

##          1          2          3          4 
##  0.0000000 -1.3333333 -0.6666667  1.3333333

means.block <- tapply(y, Block, mean); means.block - means.block[1]

##         1         2         3         4 
##  0.000000  1.000000 -4.666667 -1.000000

## fixed pair means comparison (effect estimate differences)
summary (g)

## 
## Call:
## lm(formula = y ~ Block + Treatment, data = catalyst)
## 
## Residuals:
##          1          2          3          4          5          6          7 
##  7.500e-01 -3.750e-01 -3.750e-01  3.750e-01 -7.500e-01  3.750e-01  1.250e-01 
##          8          9         10         11         12 
##  1.188e-14 -1.250e-01 -8.750e-01  8.750e-01  1.077e-14 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  72.2500     0.6158 117.334 8.53e-10 ***
## Block2        2.1250     0.6982   3.043  0.02864 *  
## Block3       -4.7500     0.6982  -6.803  0.00105 ** 
## Block4       -0.8750     0.6982  -1.253  0.26554    
## Treatment2    0.2500     0.6982   0.358  0.73492    
## Treatment3    0.6250     0.6982   0.895  0.41173    
## Treatment4    3.6250     0.6982   5.192  0.00349 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8062 on 5 degrees of freedom
## Multiple R-squared:  0.9599, Adjusted R-squared:  0.9117 
## F-statistic: 19.94 on 6 and 5 DF,  p-value: 0.002396

confint (g)

##                  2.5 %     97.5 %
## (Intercept) 70.6671254 73.8328746
## Block2       0.3301889  3.9198111
## Block3      -6.5448111 -2.9551889
## Block4      -2.6698111  0.9198111
## Treatment2  -1.5448111  2.0448111
## Treatment3  -1.1698111  2.4198111
## Treatment4   1.8301889  5.4198111

## multiple comparison
catalyst.tukeyci <- TukeyHSD(aov(y~Block+Treatment, data = catalyst)); catalyst.tukeyci

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = y ~ Block + Treatment, data = catalyst)
## 
## $Block
##          diff       lwr       upr     p adj
## 2-1  1.000000 -1.428998  3.428998 0.4917130
## 3-1 -4.666667 -7.095665 -2.237669 0.0032954
## 4-1 -1.000000 -3.428998  1.428998 0.4917130
## 3-2 -5.666667 -8.095665 -3.237669 0.0013433
## 4-2 -2.000000 -4.428998  0.428998 0.0975482
## 4-3  3.666667  1.237669  6.095665 0.0096068
## 
## $Treatment
##          diff        lwr      upr     p adj
## 2-1 0.2222222 -2.2067758 2.651220 0.9852477
## 3-1 0.5555556 -1.8734425 2.984554 0.8323947
## 4-1 3.2222222  0.7932242 5.651220 0.0165827
## 3-2 0.3333333 -2.0956647 2.762331 0.9540767
## 4-2 3.0000000  0.5710020 5.428998 0.0222012
## 4-3 2.6666667  0.2376686 5.095665 0.0352698

par (mfrow = c(1,2))
plot (catalyst.tukeyci)

catalyst.tukeyci2 <- TukeyHSD(aov(y~Treatment+Block, data = catalyst)); 
catalyst.tukeyci2

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = y ~ Treatment + Block, data = catalyst)
## 
## $Treatment
##           diff        lwr      upr     p adj
## 2-1 -1.3333333 -3.7623314 1.095665 0.2923571
## 3-1 -0.6666667 -3.0956647 1.762331 0.7501716
## 4-1  1.3333333 -1.0956647 3.762331 0.2923571
## 3-2  0.6666667 -1.7623314 3.095665 0.7501716
## 4-2  2.6666667  0.2376686 5.095665 0.0352698
## 4-3  2.0000000 -0.4289980 4.428998 0.0975482
## 
## $Block
##           diff        lwr        upr     p adj
## 2-1  1.8888889 -0.5401092  4.3178869 0.1167979
## 3-1 -4.2222222 -6.6512203 -1.7932242 0.0051736
## 4-1 -0.7777778 -3.2067758  1.6512203 0.6622480
## 3-2 -6.1111111 -8.5401092 -3.6821131 0.0009414
## 4-2 -2.6666667 -5.0956647 -0.2376686 0.0352698
## 4-3  3.4444444  1.0154464  5.8734425 0.0125455

## compare to the analysis that considers Block first then Treatment
catalyst.tukeyci

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = y ~ Block + Treatment, data = catalyst)
## 
## $Block
##          diff       lwr       upr     p adj
## 2-1  1.000000 -1.428998  3.428998 0.4917130
## 3-1 -4.666667 -7.095665 -2.237669 0.0032954
## 4-1 -1.000000 -3.428998  1.428998 0.4917130
## 3-2 -5.666667 -8.095665 -3.237669 0.0013433
## 4-2 -2.000000 -4.428998  0.428998 0.0975482
## 4-3  3.666667  1.237669  6.095665 0.0096068
## 
## $Treatment
##          diff        lwr      upr     p adj
## 2-1 0.2222222 -2.2067758 2.651220 0.9852477
## 3-1 0.5555556 -1.8734425 2.984554 0.8323947
## 4-1 3.2222222  0.7932242 5.651220 0.0165827
## 3-2 0.3333333 -2.0956647 2.762331 0.9540767
## 4-2 3.0000000  0.5710020 5.428998 0.0222012
## 4-3 2.6666667  0.2376686 5.095665 0.0352698

detach ("catalyst")

5 Analysis of Covariance (Accounting for Uncontrollable Continuous Covariates)

Textbook section 15.3

Consider an experiment (Flurry, 1939) to study the breaking strength (y) in grams of three types of starch film. The breaking strength is also known to depend on the thickness of the film (x) as measured in 10-4 inches. Because film thickness varies from run to run and its values cannot be controlled or chosen prior to the experiment, it should be treated as a covariate whose effect on strength needs to be accounted for before comparing the three types of starch.

starch <- read.csv("data/starch2.txt")
starch$type <- factor (starch$type, levels = c("corn", "canna", "potato"))
knitr::kable(starch)

strength	thickness	type
791.7	7.7	canna
610.0	6.3	canna
710.0	8.6	canna
940.7	11.8	canna
990.0	12.4	canna
916.2	12.0	canna
835.0	11.4	canna
724.3	10.4	canna
611.1	9.2	canna
621.7	9.0	canna
735.4	9.5	canna
990.0	12.5	canna
862.7	11.7	canna
731.0	8.0	corn
710.0	7.3	corn
604.7	7.2	corn
508.8	6.1	corn
393.0	6.4	corn
416.0	6.4	corn
400.0	6.9	corn
335.6	5.8	corn
306.4	5.3	corn
426.0	6.7	corn
382.5	5.8	corn
340.8	5.7	corn
436.7	6.1	corn
333.3	6.2	corn
382.3	6.3	corn
397.7	6.0	corn
619.1	6.8	corn
857.3	7.9	corn
592.5	7.2	corn
983.3	13.0	potato
958.8	13.3	potato
747.8	10.7	potato
866.0	12.2	potato
810.8	11.6	potato
950.0	9.7	potato
1282.0	10.8	potato
1233.8	10.1	potato
1660.0	12.7	potato
746.0	9.8	potato
650.0	10.0	potato
992.5	13.8	potato
896.7	13.3	potato
873.9	12.4	potato
924.4	12.2	potato
1050.0	14.1	potato
973.3	13.7	potato

attach (starch)

## look at data
par (mfrow=c(1,3))
plot (strength~thickness,pch = as.numeric(type), col =type);
legend (5.5,1700,  legend=levels(type), pch = 1:3, col=1:3 )
plot (strength~type)
plot (thickness~type)

5.1 Analysis 1-1: Considering ‘Type’ only, no ‘thickness’

fit1_typeonly <- lm (strength~type)
anova.R2 (fit1_typeonly)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
type	2	2246204	1123102.19	32.6191	0	0.586473
Residuals	46	1583818	34430.82	NA	NA	0.413527

par (mfrow = c(1,1))
plot(TukeyHSD(aov(strength~type)))

## we see that there is great difference in thickness between starch types

5.2 Analysis 1-2: Considering ‘thickness’ only, no ‘Type’

#fit_starch0 <- lm (strength~1)
fit_starch1 <- lm (strength~thickness)
coefficients(fit_starch1)

## (Intercept)   thickness 
##   -43.46202    83.13356

## looking at the prediction
plot (strength~thickness,pch = as.numeric(type), col =type, main = "One Intercept, One Slope");
legend (5.5,1700,  legend=levels(type), pch = 1:3, col=1:3 )
abline (fit_starch1)

anova.R2 (fit_starch1)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
thickness	1	2553357	2553357.20	94.00102	0	0.6666691
Residuals	47	1276665	27163.08	NA	NA	0.3333309

5.3 Analysis 2-1: Considering “thickness” and then “type”

#anova (fit_starch0, fit_starch1, fit_starch2)
## anova itself uses the above sequence in producing the anova table
fit_starch2<-lm(strength~thickness+type)
coefficients(fit_starch2)

## (Intercept)   thickness   typecanna  typepotato 
##    74.59478    62.50120    83.66605   154.02614

## look at the design matrix to understand the coefficients

model.matrix(fit_starch2)

##    (Intercept) thickness typecanna typepotato
## 1            1       7.7         1          0
## 2            1       6.3         1          0
## 3            1       8.6         1          0
## 4            1      11.8         1          0
## 5            1      12.4         1          0
## 6            1      12.0         1          0
## 7            1      11.4         1          0
## 8            1      10.4         1          0
## 9            1       9.2         1          0
## 10           1       9.0         1          0
## 11           1       9.5         1          0
## 12           1      12.5         1          0
## 13           1      11.7         1          0
## 14           1       8.0         0          0
## 15           1       7.3         0          0
## 16           1       7.2         0          0
## 17           1       6.1         0          0
## 18           1       6.4         0          0
## 19           1       6.4         0          0
## 20           1       6.9         0          0
## 21           1       5.8         0          0
## 22           1       5.3         0          0
## 23           1       6.7         0          0
## 24           1       5.8         0          0
## 25           1       5.7         0          0
## 26           1       6.1         0          0
## 27           1       6.2         0          0
## 28           1       6.3         0          0
## 29           1       6.0         0          0
## 30           1       6.8         0          0
## 31           1       7.9         0          0
## 32           1       7.2         0          0
## 33           1      13.0         0          1
## 34           1      13.3         0          1
## 35           1      10.7         0          1
## 36           1      12.2         0          1
## 37           1      11.6         0          1
## 38           1       9.7         0          1
## 39           1      10.8         0          1
## 40           1      10.1         0          1
## 41           1      12.7         0          1
## 42           1       9.8         0          1
## 43           1      10.0         0          1
## 44           1      13.8         0          1
## 45           1      13.3         0          1
## 46           1      12.4         0          1
## 47           1      12.2         0          1
## 48           1      14.1         0          1
## 49           1      13.7         0          1
## attr(,"assign")
## [1] 0 1 2 2
## attr(,"contrasts")
## attr(,"contrasts")$type
## [1] "contr.treatment"

## look at the prediction to understand the fitting
plot (strength~thickness,pch = as.numeric(type), col =type, 
      main = "Varying Intercept, One Slope");
legend (5.5,1700,  legend= levels(type), pch = 1:3, col=1:3)
#abline (fit_starch1, lwd=2)
coef.fit2 <- coefficients(fit_starch2)
abline (a = coef.fit2[1], b= coef.fit2[2], col = 1)
abline (a = coef.fit2[1]+ coef.fit2[3], b = coef.fit2[2] ,col =2)
abline (a = coef.fit2[1]+ coef.fit2[4], b = coef.fit2[2] ,col =3)

## anova
anova.R2 (fit_starch2)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
thickness	1	2553357.20	2553357.20	94.185852	0.0000000	0.6666691
type	2	56724.87	28362.44	1.046207	0.3596542	0.0148106
Residuals	45	1219939.85	27109.77	NA	NA	0.3185203

5.4 Analysis 2-2: Considering “type” then “thickness” (wrong for this question)

fit_starch4<-lm(strength~type+thickness)
coefficients(fit_starch4)

## (Intercept)   typecanna  typepotato   thickness 
##    74.59478    83.66605   154.02614    62.50120

Note that linear model estimation is the same as analysis 2-1, but anova will be different

anova.R2 (fit_starch4)

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
type	2	2246204.4	1123102.19	41.42794	0.0000000	0.5864730
thickness	1	363877.7	363877.69	13.42238	0.0006526	0.0950067
Residuals	45	1219939.9	27109.77	NA	NA	0.3185203

Note that the ordering of variables matters in producing ANOVA table.

5.5 Analyzing Starch Data Again: Varying Slopes (Interaction)

## we can fit better if allow intercept and slope both varying (interaction)

fit_starch5<-lm(strength~thickness*type)
summary (fit_starch5)

## 
## Call:
## lm(formula = strength ~ thickness * type)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -262.56  -83.31  -18.84   47.20  659.67 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -751.04     323.32  -2.323  0.02499 *  
## thickness              188.91      49.20   3.839  0.00040 ***
## typecanna              942.01     399.62   2.357  0.02303 *  
## typepotato            1338.49     446.49   2.998  0.00451 ** 
## thickness:typecanna   -129.62      54.17  -2.393  0.02116 *  
## thickness:typepotato  -156.40      55.44  -2.821  0.00721 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 154.7 on 43 degrees of freedom
## Multiple R-squared:  0.7314, Adjusted R-squared:  0.7002 
## F-statistic: 23.42 on 5 and 43 DF,  p-value: 2.774e-11

coef.fit5 <- coefficients(fit_starch5); knitr::kable(coef.fit5)

	x
(Intercept)	-751.0374
thickness	188.9074
typecanna	942.0086
typepotato	1338.4945
thickness:typecanna	-129.6155
thickness:typepotato	-156.3974

## look at the design matrix to understand the coefficient

data.frame(model.matrix (fit_starch5))

X.Intercept.	thickness	typecanna	typepotato	thickness.typecanna	thickness.typepotato
1	7.7	1	0	7.7	0.0
1	6.3	1	0	6.3	0.0
1	8.6	1	0	8.6	0.0
1	11.8	1	0	11.8	0.0
1	12.4	1	0	12.4	0.0
1	12.0	1	0	12.0	0.0
1	11.4	1	0	11.4	0.0
1	10.4	1	0	10.4	0.0
1	9.2	1	0	9.2	0.0
1	9.0	1	0	9.0	0.0
1	9.5	1	0	9.5	0.0
1	12.5	1	0	12.5	0.0
1	11.7	1	0	11.7	0.0
1	8.0	0	0	0.0	0.0
1	7.3	0	0	0.0	0.0
1	7.2	0	0	0.0	0.0
1	6.1	0	0	0.0	0.0
1	6.4	0	0	0.0	0.0
1	6.4	0	0	0.0	0.0
1	6.9	0	0	0.0	0.0
1	5.8	0	0	0.0	0.0
1	5.3	0	0	0.0	0.0
1	6.7	0	0	0.0	0.0
1	5.8	0	0	0.0	0.0
1	5.7	0	0	0.0	0.0
1	6.1	0	0	0.0	0.0
1	6.2	0	0	0.0	0.0
1	6.3	0	0	0.0	0.0
1	6.0	0	0	0.0	0.0
1	6.8	0	0	0.0	0.0
1	7.9	0	0	0.0	0.0
1	7.2	0	0	0.0	0.0
1	13.0	0	1	0.0	13.0
1	13.3	0	1	0.0	13.3
1	10.7	0	1	0.0	10.7
1	12.2	0	1	0.0	12.2
1	11.6	0	1	0.0	11.6
1	9.7	0	1	0.0	9.7
1	10.8	0	1	0.0	10.8
1	10.1	0	1	0.0	10.1
1	12.7	0	1	0.0	12.7
1	9.8	0	1	0.0	9.8
1	10.0	0	1	0.0	10.0
1	13.8	0	1	0.0	13.8
1	13.3	0	1	0.0	13.3
1	12.4	0	1	0.0	12.4
1	12.2	0	1	0.0	12.2
1	14.1	0	1	0.0	14.1
1	13.7	0	1	0.0	13.7

## looking at the prediction to understand the fitting
par (mfrow = c(1,1))
plot (strength~thickness,pch = as.numeric(type), col =type, 
      main = "Varying Intercept, Varying Slope");
legend (5.5,1700,  legend=levels(type), pch = 1:3, col=1:3 )
#abline (fit_starch1, lwd=2)
coef.fit2 <- coefficients(fit_starch2)
abline (a = coef.fit5[1], b= coef.fit5[2], col = 1)
abline (a = coef.fit5[1]+ coef.fit5[3], b = coef.fit5[2] + coef.fit5[5] ,col =2)
abline (a = coef.fit5[1]+ coef.fit5[4], b = coef.fit5[2] + coef.fit5[6],col =3)

anova.R2 (fit_starch5)-> anova.fit5; anova.fit5

	Df	Sum Sq	Mean Sq	F value	Pr(>F)	R2
thickness	1	2553357.20	2553357.20	106.741403	0.0000000	0.6666691
type	2	56724.87	28362.44	1.185673	0.3153306	0.0148106
thickness:type	2	191338.42	95669.21	3.999388	0.0255292	0.0499575
Residuals	43	1028601.43	23920.96	NA	NA	0.2685628

The most resonable way to account for the contribution of “type” should be comparing these two models.

fit_starch_thicknessonly <- lm (strength~thickness)
fit_startch_thickness.type.interaction <- lm (strength~thickness*type)
anova (fit_starch_thicknessonly, fit_startch_thickness.type.interaction)

Res.Df	RSS	Df	Sum of Sq	F	Pr(>F)
47	1276665	NA	NA	NA	NA
43	1028601	4	248063.3	2.59253	0.0497493

## R^2 (percentage of reduction of RSS) after considering "type" is about 19%
248063 / 1276665

## [1] 0.1943055

5.6 Which model should we choose to interpret? Using AIC

Notes: not required for this course

AIC is an estimate of the leave-one-out predictive accuracy, a value related to SSE but penalized by the number of model parameters.

AIC (fit_startch_thickness.type.interaction) # thickness*type

## [1] 640.6986

AIC( fit_starch4) # thickness+type

## [1] 645.0581

AIC( fit_starch1) # thickness only

## [1] 643.2851

AIC( fit1_typeonly) # type only

## [1] 655.8489

The model considering the interaction between thickness and type has the lowest AIC value, which means that it has the best out-of-sample predictive ability. Our conclusion should be based on this model.

The model considering thickness+type has lower predictive accuracy than the model considering thickness only. This is another approach to conclude that the type of starch does not affect the mean of the strengthness of films. After considering the interaction between the thickness and the type of starch, we discover that the type of starch affects the slope between the strengthness and thickness.

detach ("starch")

Unit3: Design and Analysis of Experiments with Block Factors

Longhai Li

Feburary 2020