STAT:5201 Homework 9 Solutions

1. We have a model with two crossed random factors, operator and machine. There are 4 operators, 8 machines, and 3 observations from each operator/machine combination.

(a)

    Source              df
    operator             3
    machine              7
    operator*machine    21

(b) Below, the table of expected mean squares is shown with the coefficients before terms missing. In your homework, provide the 8 missing values. (The filled-in values appear between underscores.)

    Source              Expected Mean Square
    Operator            _1_ Var(Error) + _3_ Var(Operator*Machine) + _24_ Var(Operator)
    Machine             _1_ Var(Error) + _3_ Var(Operator*Machine) + _12_ Var(Machine)
    Operator*Machine    _1_ Var(Error) + _3_ Var(Operator*Machine)

2. a) These 10 litters were chosen at random. We think the inclusion of litter in our model will account for some of the variability in the activity measurement, but we're not interested in doing tests on only these specific 10 litters. Instead, we want to make a statement about the whole population of dog litters.

b) There are only three specific reagents in the study, and these are the only ones of interest. If the researcher were to repeat the study, she would probably use the same three reagent levels.

    proc mixed data=dogs;
      class reagent litter;
      model activity=reagent / ddfm=satterth solution;
      random litter;
      lsmeans reagent / adj=tukey pdiff;
    run;

Type 3 Tests of Fixed Effects

    Effect     Num DF   Den DF   F Value   Pr > F
    reagent         2       18     38.05   <.0001

Differences of Least Squares Means

    Effect    reagent  reagent  Estimate  Std Error  DF  t Value  Pr > |t|  Adjustment    Adj P
    reagent   1        2         -4.4000     0.5696  18    -7.72    <.0001  Tukey-Kramer  <.0001
    reagent   1        3         -4.2000     0.5696  18    -7.37    <.0001  Tukey-Kramer  <.0001
    reagent   2        3          0.2000     0.5696  18     0.35    0.7296  Tukey-Kramer  0.9345
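As a quick arithmetic check on the table of differences above, each t value is the estimate divided by its standard error, and the three pairwise differences have to agree with one another: the 2-vs-3 difference equals the 1-vs-3 difference minus the 1-vs-2 difference. The sketch below uses Python rather than SAS purely for illustration. The variable names are hypothetical, and the numbers are copied from the output.

```python
# Pairwise differences copied from the PROC MIXED output above:
# (level_a, level_b) -> (estimate of mean_a - mean_b, standard error)
diffs = {
    (1, 2): (-4.4000, 0.5696),
    (1, 3): (-4.2000, 0.5696),
    (2, 3): (0.2000, 0.5696),
}

# t value = estimate / standard error, rounded as in the SAS listing
t_values = {pair: round(est / se, 2) for pair, (est, se) in diffs.items()}

# (mean2 - mean3) = (mean1 - mean3) - (mean1 - mean2)
implied_2_vs_3 = diffs[(1, 3)][0] - diffs[(1, 2)][0]

for pair, t in t_values.items():
    print(f"reagent {pair[0]} vs {pair[1]}: t = {t:.2f}")
print(f"implied 2 vs 3 difference: {implied_2_vs_3:.4f}")
```

The unadjusted and Tukey-Kramer p-values are not reproduced here, since they need t and studentized-range distribution functions that the Python standard library does not provide.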
c) $H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$ (assuming you're using the parameters I chose in class). The p-value is < 0.0001. There is statistically significant evidence that the mean muscle activity for at least one of the reagents differs from the others. Reagent 1 is significantly different from both reagents 2 and 3, and reagents 2 and 3 are not significantly different from each other.

d) Numerator df is 2 and denominator df is 18. That's 2 df for reagent and 18 df for error.

e) Covariance Parameter Estimates

    Cov Parm    Estimate
    litter       43.7370
    Residual      1.6222

Or, $\hat{\sigma}^2_\beta = 43.7370$ and $\hat{\sigma}^2 = 1.6222$.

f)

    Effect     reagent  Estimate  Std Error  DF  t Value  Pr > |t|
    Intercept            19.5000     2.1298   9     9.16    <.0001
    reagent    1         -4.2000     0.5696  18    -7.37    <.0001
    reagent    2          0.2000     0.5696  18     0.35    0.7296
    reagent    3          0....

g) The constraint SAS uses to fit the model and estimate the parameters is $\alpha_3 = 0$. This forces $\mu$ to be interpreted as the mean activity for reagent 3 ($\mu + \alpha_3 = \mu$) rather than the overall mean, and $\alpha_1$ represents the difference between reagent 1 and reagent 3. Thus, the estimated mean for reagent 1 is 4.2 units lower than the mean for reagent 3 ($\hat{\alpha}_1 = -4.2$). From this output, we can see that reagent 1 has the lowest mean, reagent 3 is next, and reagent 2 has the highest mean.

h) Least Squares Means

    Effect    reagent  Estimate  Std Error    DF  t Value  Pr > |t|
    reagent   1         15.3000     2.1298  9.44     7.18    <.0001
    reagent   2         19.7000     2.1298  9.44     9.25    <.0001
    reagent   3         19.5000     2.1298  9.44     9.16    <.0001

Because I requested the Satterthwaite df for the denominator term in my test, the DF for the test are not whole numbers. This occurs when the standard error is made up of a linear combination of MS terms and we're using the Satterthwaite approximation to the t-distribution for the test (or confidence interval). If you didn't request this option, your standard error is still the same, but the DF are not.
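The standard errors and the fractional DF in parts f) and h) can be recovered from the two variance estimates in part e). The Python sketch below is illustrative only. It assumes the data are balanced (each of the 10 litters receives all 3 reagents once, which is consistent with the 9 and 18 df reported) and uses hypothetical variable names. The mean squares are not printed in the solutions, so they are rebuilt here from the variance estimates using the expected mean squares for this design.

```python
import math

# Estimates copied from the PROC MIXED output in part e)
var_litter = 43.7370   # litter variance component
var_resid = 1.6222     # residual variance
n_litters = 10         # litters (blocks), from part a)
n_reagents = 3         # reagents, assumed applied once in every litter

# Standard error of a reagent LS mean: both variance components contribute
se_lsmean = math.sqrt((var_litter + var_resid) / n_litters)

# Standard error of a difference of two reagent means: litter effects cancel
se_diff = math.sqrt(2 * var_resid / n_litters)

# Mean squares implied by the estimates (balanced-data assumption):
#   MS_litter estimates var_resid + 3 * var_litter   (9 df)
#   MS_error  estimates var_resid                    (18 df)
ms_litter = var_resid + n_reagents * var_litter
ms_error = var_resid
df_litter = n_litters - 1
df_error = (n_litters - 1) * (n_reagents - 1)

# var_litter + var_resid = (1/3) MS_litter + (2/3) MS_error
terms = [
    (ms_litter / n_reagents, df_litter),
    ((n_reagents - 1) / n_reagents * ms_error, df_error),
]

# Satterthwaite: df = (sum of terms)^2 / sum(term^2 / df_term)
satterthwaite_df = sum(t for t, _ in terms) ** 2 / sum(t * t / d for t, d in terms)

print(round(se_lsmean, 4), round(se_diff, 4), round(satterthwaite_df, 2))
```

The difference of two reagent means involves only the residual mean square, which is why its DF stay at the whole number 18, while each LS mean mixes two mean squares and picks up a fractional DF.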
i) The GLM Procedure

    Source     Type III Expected Mean Square
    reagent    Var(Error) + Q(reagent)
    litter     Var(Error) + 3 Var(litter)

3. (a) Write down the model (be sure subscripts show any nesting, and provide any relevant distributions).

    $Y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{k(ij)}$

    $\mu$ represents the population mean time to complete a job for this class of jobs
    $\alpha_i$ represents the random effect of job $i$
    $\beta_{j(i)}$ represents the random effect of operator $j$ nested in job $i$
    $\epsilon_{k(ij)}$ represents the random error (this could also be written $\epsilon_{ijk}$)

    $\alpha_i$, $\beta_{j(i)}$, and $\epsilon_{k(ij)}$ are independent random variables with
    $\alpha_i \overset{iid}{\sim} N(0, \sigma^2_\alpha)$, $\beta_{j(i)} \overset{iid}{\sim} N(0, \sigma^2_\beta)$, and $\epsilon_{k(ij)} \overset{iid}{\sim} N(0, \sigma^2)$.

(b)

    proc mixed data=jobs;
      class job operator;
      model time=;
      random job operator(job);
    run;

Covariance Parameter Estimates

    Cov Parm         Estimate
    JOB                4.6502
    OPERATOR(JOB)      0.3143
    Residual           1.0925

(c) There seems to be much more job-to-job variability than operator-to-operator variability. For a given job, the operators perform relatively similarly, compared to the larger variability from one job to the next. I don't think the engineer would still agree with his earlier statement. Within this class of jobs, there is still quite a bit of variability in the time it takes to complete the differing jobs.
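One way to make the comparison in part (c) concrete is to express each variance component as a share of the total variance of a single observation. The short Python sketch below is only an illustration. The names are hypothetical, and the estimates are copied from the PROC MIXED output in part (b).

```python
# Variance component estimates copied from the PROC MIXED output in part (b)
components = {"JOB": 4.6502, "OPERATOR(JOB)": 0.3143, "Residual": 1.0925}

# Under the model in part (a), Var(Y) is the sum of the three components
total = sum(components.values())
shares = {name: est / total for name, est in components.items()}

for name, share in shares.items():
    print(f"{name:15s} {share:6.1%}")

# Ratio of job-to-job variance to operator-within-job variance
job_to_operator = components["JOB"] / components["OPERATOR(JOB)"]
print(f"JOB / OPERATOR(JOB) = {job_to_operator:.1f}")
```

Under these estimates, jobs account for roughly three quarters of the total variance and operators within a job for about 5%, which is the pattern described in part (c).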
4. Simulation problem.

(a)

    # Storage for the estimates and F statistics from each simulated data set
    sigma_b2.estimates <- NULL
    sigma2.estimates <- NULL
    Fstats <- NULL
    for (i in 1:1000){
      # RCBD.mice.diet() is defined elsewhere (not shown here)
      RCBD.data <- RCBD.mice.diet()
      table <- anova(lm(body.fat.change ~ cage + treatment, RCBD.data))
      MS <- table[,3]                     # mean squares: cage, treatment, residual
      sigma_b2.est <- (MS[1]-MS[3])/3     # (MS_cage - MSE) / 3
      sigma2.est <- MS[3]                 # MSE
      sigma_b2.estimates <- c(sigma_b2.estimates, sigma_b2.est)
      sigma2.estimates <- c(sigma2.estimates, sigma2.est)
      Fstats <- c(Fstats, table[2,4])     # F statistic for treatment
    }

(b)

    par(mfrow=c(1,2))
    hist(sigma_b2.estimates, col="grey", prob=TRUE, xlab="", cex.main=2,
         main=expression(paste("Histogram: ", hat(sigma)[b]^2)))
    abline(v=1.44, col=2, lwd=2)
    hist(sigma2.estimates, col="grey", prob=TRUE, xlab="", cex.main=2,
         main=expression(paste("Histogram: ", hat(sigma)^2)))
    abline(v=0.64, col=2, lwd=2)

[Figure: side-by-side histograms of $\hat{\sigma}_b^2$ and $\hat{\sigma}^2$, with vertical reference lines at 1.44 and 0.64, respectively.]

(c)

    (cutoff <- qf(0.95,2,8))
    [1] 4.45897
    ncp1 <- (10/.64)
    (power <- pf(cutoff,2,8,ncp=ncp1,lower.tail=FALSE))
    [1] 0.8333519
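The calculation in part (c) can be written out as below. The values 10, 0.64, the (2, 8) degrees of freedom, the cutoff, and the power are from the R code and output. Reading the numerator 10 as $b \sum_i \tau_i^2$ (number of blocks times the sum of squared treatment effects) is an assumption, since the generating function RCBD.mice.diet() is not shown. The symbols $b$, $\tau_i$, and $\lambda$ are introduced here for illustration only.

```latex
\lambda = \frac{b \sum_i \tau_i^2}{\sigma^2} = \frac{10}{0.64} = 15.625,
\qquad
F^{*} = F_{0.95;\,2,\,8} = 4.459,
\qquad
\text{power} = P\!\left( F_{2,\,8}(\lambda) > F^{*} \right) \approx 0.833 .
```

Here $F_{2,8}(\lambda)$ denotes a noncentral F random variable with 2 and 8 degrees of freedom and noncentrality parameter $\lambda$, which is what `pf(..., ncp=ncp1, lower.tail=FALSE)` evaluates.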
(d) i.

    hist(Fstats, col="grey", prob=TRUE, xlab="", ylim=c(0,0.08), cex.main=2,
         n=20, main="Histogram: F-statistics")
    x <- seq(0,50,.1)
    ncp1 <- (10/.64)
    y <- df(x,2,8,ncp=ncp1)   # noncentral F density on the grid (df() is vectorized)
    lines(x, y, lwd=3, col="blue", lty=2)

[Figure: histogram of the simulated F-statistics with the noncentral F(2, 8, ncp = 15.625) density overlaid as a dashed blue curve.]

ii.

    mean(Fstats > cutoff)
    [1] 0.826

My calculated power was 0.83, and 82.6% of the simulated F-tests rejected the null hypothesis, so they're pretty close.
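The comparison in part (d) ii can be repeated outside R with a small Monte Carlo sketch. The Python code below is not the course's RCBD.mice.diet() generator. It assumes a hypothetical design with treatment effects (-1, 0, 1) observed once in each of 5 cages, chosen only because it reproduces the (2, 8) degrees of freedom and the noncentrality 10/0.64 used in part (c). The variances 1.44 and 0.64 and the cutoff 4.45897 are taken from parts (b) and (c).

```python
import random


def simulate_f_stat(rng, effects, n_blocks, sd_block, sd_error):
    """Simulate one RCBD data set and return the treatment F statistic."""
    t = len(effects)
    # y[j][i] = treatment effect i + random block effect j + random error
    y = []
    for _ in range(n_blocks):
        block = rng.gauss(0.0, sd_block)
        y.append([tau + block + rng.gauss(0.0, sd_error) for tau in effects])

    grand = sum(sum(row) for row in y) / (t * n_blocks)
    trt_means = [sum(row[i] for row in y) / n_blocks for i in range(t)]
    blk_means = [sum(row) / t for row in y]

    # Treatment and residual sums of squares for the additive RCBD model
    ss_trt = n_blocks * sum((m - grand) ** 2 for m in trt_means)
    ss_err = sum(
        (y[j][i] - trt_means[i] - blk_means[j] + grand) ** 2
        for j in range(n_blocks)
        for i in range(t)
    )
    ms_trt = ss_trt / (t - 1)
    ms_err = ss_err / ((t - 1) * (n_blocks - 1))
    return ms_trt / ms_err


# Assumed (hypothetical) design: 5 * (1 + 0 + 1) / 0.64 = 15.625 = 10/.64
effects = (-1.0, 0.0, 1.0)
n_blocks = 5
sd_block = 1.44 ** 0.5   # block (cage) variance 1.44
sd_error = 0.64 ** 0.5   # error variance 0.64
cutoff = 4.45897         # qf(0.95, 2, 8) from part (c)

rng = random.Random(5201)
n_sims = 5000
f_stats = [
    simulate_f_stat(rng, effects, n_blocks, sd_block, sd_error)
    for _ in range(n_sims)
]
empirical_power = sum(f > cutoff for f in f_stats) / n_sims
print(f"empirical power: {empirical_power:.3f}")
```

If the assumed design matches the one behind the homework, the printed rejection rate should land close to the 0.83 computed in part (c), up to Monte Carlo error of roughly 0.005 with 5000 replicates.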