Stats 201 Homework 8: Due Friday, Dec. 12 by 5pm
For problems that require the use of R, print all relevant R code, output, and plots.
Your R code and output should be clear and concise – our grader will not wade through
pages of R output looking for an answer. If there are handwritten portions as well as
printed R code and output, be sure the location of your answers is clear.
1. Generations of athletes have been cautioned that cigarette smoking hinders performance. One measure of the truth of that warning is the effect of smoking on heart
rate. In one study examining that impact, six each of non-smokers, light smokers,
moderate smokers, and heavy smokers undertook sustained physical exercise. Their
heart rates were measured after resting for three minutes. The results appear in
the following table:
Non-Smokers   Light Smokers   Moderate Smokers   Heavy Smokers
     69             55               66                91
     52             60               81                72
     71             78               70                81
     58             58               77                67
     59             62               57                95
     65             66               79                84
Note that this data set is not posted on the course webpage; in order to get the
data into R, you will need to manually enter the data into a data.frame.
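One possible way to enter the table is sketched below. The object name `heart` and the
column names `rate` and `group` are illustrative choices, not names required by the
assignment; any layout with one row per subject would work equally well.

```r
# Heart rates copied from the table above, six subjects per smoking group.
# `heart`, `rate`, and `group` are hypothetical names chosen for this sketch.
heart <- data.frame(
  rate = c(69, 52, 71, 58, 59, 65,   # non-smokers
           55, 60, 78, 58, 62, 66,   # light smokers
           66, 81, 70, 77, 57, 79,   # moderate smokers
           91, 72, 81, 67, 95, 84),  # heavy smokers
  group = factor(rep(c("Non", "Light", "Moderate", "Heavy"), each = 6),
                 levels = c("Non", "Light", "Moderate", "Heavy"))
)

# Quick check that each group has six observations.
table(heart$group)
```

Setting `levels` explicitly keeps the groups in table order rather than alphabetical
order, which makes later output easier to read.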
(a) Create the one-way ANOVA table “by hand.” You may use R to calculate sample means and sample variances (using tapply with mean and var functions),
but do not use the aov or lm functions (though you may check your answers
using those functions). Be sure to show how you calculated each component
of the ANOVA table.
(b) Carry out the ANOVA F-test for this scenario. Report the null and alternative
hypotheses, defining any symbols used in context of the problem; the F test
statistic; the p-value; and your conclusion in terms of the problem. Use the
pf function in R to calculate the p-value rather than the aov or lm functions.
(c) Use R to compute the Tukey multiple comparison confidence intervals for all
pairwise mean differences in the smoker study using a familywise error rate of
α = .05 (now you can use the R function aov, and then TukeyHSD). Which
pairs of groups have significantly different mean heart rates?
(d) Write a few sentences summarizing your conclusions from this study.
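The mechanics behind parts (a) through (c) can be sketched roughly as follows. This is
one way to organize the calculation, not the required layout; all variable names are
illustrative, and the written steps, hypotheses, and conclusions still have to be
supplied by hand.

```r
# Data from the table in Problem 1 (variable names are illustrative).
rate  <- c(69, 52, 71, 58, 59, 65,  55, 60, 78, 58, 62, 66,
           66, 81, 70, 77, 57, 79,  91, 72, 81, 67, 95, 84)
group <- factor(rep(c("Non", "Light", "Moderate", "Heavy"), each = 6),
                levels = c("Non", "Light", "Moderate", "Heavy"))

# Group summaries via tapply, as the problem allows.
n_i   <- tapply(rate, group, length)
means <- tapply(rate, group, mean)
vars  <- tapply(rate, group, var)

k     <- nlevels(group)   # number of groups
N     <- length(rate)     # total sample size
grand <- mean(rate)       # overall mean

# Sums of squares and mean squares for the one-way ANOVA table.
ss_between <- sum(n_i * (means - grand)^2)
ss_within  <- sum((n_i - 1) * vars)
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)

# F statistic and its upper-tail p-value from the F(k - 1, N - k) distribution.
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Part (c): aov followed by TukeyHSD at a familywise confidence level of 95%.
fit <- aov(rate ~ group)
TukeyHSD(fit, conf.level = 0.95)
```

Comparing `f_stat` and `p_value` with `summary(fit)` is a convenient check on the
hand calculation.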
2. Consider the two-way ANOVA model
$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
for $k = 1, \dots, n_{ij}$, $i = 1, \dots, a$, and $j = 1, \dots, b$, where
$\varepsilon_{ijk} \overset{\text{iid}}{\sim} N(0, \sigma^2)$. Suppose
$a = 2$ and $b = 3$, and the sample sizes within each combination of the two factors
are $n_{11} = n_{13} = n_{21} = n_{22} = 1$ and $n_{12} = n_{23} = 2$. This model can
be expressed as a linear model in matrix form: $Y = X\beta + \varepsilon$. Write out
the entries of the vector $Y$, the matrix $X$, and the vector $\beta$ for this model.
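As a rough cross-check on the layout in Problem 2, R can build a design matrix for the
same cell counts. Note the caveat: the model as written has 1 + 2 + 3 + 6 = 12
parameters, so its X has 12 columns, whereas R's default treatment coding drops the
first level of each factor and produces a reduced 6-column matrix. The sketch below
therefore shows the row structure only; the ordering of observations and the names
`cells`, `A`, and `B` are assumptions made for illustration.

```r
# One row per observation, ordered by cell (i, j); names are illustrative.
# Cell sizes: n11 = 1, n12 = 2, n13 = 1, n21 = 1, n22 = 1, n23 = 2.
cells <- data.frame(
  A = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  B = factor(c(1, 2, 2, 3, 1, 2, 3, 3))
)

# Confirm the cell counts match the problem statement.
table(cells$A, cells$B)

# Design matrix under R's default treatment contrasts (a reduced
# parameterization, not the 12-column matrix for the model as written).
X <- model.matrix(~ A * B, data = cells)
X
```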
3. Read in the 1985 Current Population Survey data we have used previously:
cps = read.csv("http://www.ics.uci.edu/~staceyah/201/data/cps.csv")
The response is hourly wage in dollars (wage), Factor A is sex (M or F) and Factor
B is married (Married or Single).
(a) Use R to calculate the sample sizes for each of the four factor combinations.
(b) Use R to calculate the sample standard deviations for each of the four factor
combinations.
(c) Use R to calculate the four cell sample mean wages ($\hat{\mu}_{ij}$ for
$i = 1, 2$ and $j = 1, 2$), as well as the four marginal sample mean wages
($\hat{\mu}_{i\cdot}$ for $i = 1, 2$ and $\hat{\mu}_{\cdot j}$ for $j = 1, 2$), and
the overall sample mean wage.
(d) Find estimates of the following parameters: (i) $\alpha_2$, (ii) $\beta_1$,
(iii) $(\alpha\beta)_{21}$.
(e) Use R to produce an interaction plot for these data. From the interaction
plot, does it seem like an interaction is present between sex and marital status?
Explain.
(f) Produce the two-way ANOVA table (including interaction). Report the p-value
and the conclusion of the F-test for interaction.
(g) Explain why it would not make sense to interpret the main effects for these
data.
(h) Assess the assumptions of the two-way ANOVA model for these data. If the
assumptions are not met, suggest a transformation of the response that might
reduce the assumption violations.
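The R calls that Problem 3 relies on can be sketched as below. To keep the sketch
runnable without the course file, it uses a small simulated stand-in called `toy` with
the same column names (`wage`, `sex`, `married`); the values are made up and say nothing
about the CPS data. Replacing `toy` with `cps` would apply the same calls to the real
data, assuming the columns are named as described in the problem.

```r
# Simulated stand-in data (NOT the CPS data); only the column names match.
set.seed(1)
toy <- data.frame(
  wage    = round(rexp(40, rate = 1 / 9) + 3, 2),
  sex     = factor(rep(c("M", "F"), each = 20)),
  married = factor(rep(c("Married", "Single"), times = 20))
)

# Parts (a)-(c): cell sample sizes, standard deviations, and means.
cell_n    <- with(toy, tapply(wage, list(sex, married), length))
cell_sd   <- with(toy, tapply(wage, list(sex, married), sd))
cell_mean <- with(toy, tapply(wage, list(sex, married), mean))

# Marginal means computed from the raw observations, and the overall mean.
mean_by_sex     <- with(toy, tapply(wage, sex, mean))
mean_by_married <- with(toy, tapply(wage, married, mean))
overall_mean    <- mean(toy$wage)

# Part (e): interaction plot, with sex on the x-axis and one line per marital status.
with(toy, interaction.plot(sex, married, wage))

# Part (f): two-way ANOVA table including the interaction term.
fit <- aov(wage ~ sex * married, data = toy)
summary(fit)

# Part (h): residual plots for checking constant variance and normality.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
qqnorm(resid(fit))
qqline(resid(fit))
```

With unequal cell sizes, marginal means taken over raw observations generally differ
from averages of the cell means, so it is worth stating which version is being reported.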
The remaining problems are exercises from the Utts and Heckard supplemental chapter
on Two-Way Analysis of Variance (Chapter S4), posted in our EEE Dropbox:
4. Exercise S4.4.
5. Exercise S4.22.