STA 303H1F: Two-way Analysis of Variance Practice Problems

Similar documents
STA 303H1F: Two-way Analysis of Variance Practice Problems

T-test: means of Spock's judge versus all other judges 1 12:10 Wednesday, January 5, judge1 N Mean Std Dev Std Err Minimum Maximum

Repeated Measures Part 2: Cartoon data

Analysis of Covariance

1 Tomato yield example.

ANALYSES OF NCGS DATA FOR ALCOHOL STATUS CATEGORIES 1 22:46 Sunday, March 2, 2003

This is a Randomized Block Design (RBD) with a single factor treatment arrangement (2 levels) which are fixed.

SAS Commands. General Plan. Output. Construct scatterplot / interaction plot. Run full model

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

General Linear Model (Chapter 4)

data proc sort proc corr run proc reg run proc glm run proc glm run proc glm run proc reg CONMAIN CONINT run proc reg DUMMAIN DUMINT run proc reg

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

Example: Four levels of herbicide strength in an experiment on dry weight of treated plants.

Answer Keys to Homework#10

Assignment 9 Answer Keys

171:162 Design and Analysis of Biomedical Studies, Summer 2011 Exam #3, July 16th

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

Two-factor studies. STAT 525 Chapter 19 and 20. Professor Olga Vitek

4.8 Alternate Analysis as a Oneway ANOVA

Outline. Topic 22 - Interaction in Two Factor ANOVA. Interaction Not Significant. General Plan

What Does the F-Ratio Tell Us?

ANALYSIS OF VARIANCE OF BALANCED DAIRY SCIENCE DATA USING SAS

Outline Topic 21 - Two Factor ANOVA

Incomplete Block Designs

Outline. Topic 19 - Inference. The Cell Means Model. Estimates. Inference for Means Differences in cell means Contrasts. STAT Fall 2013

Linear Combinations of Group Means

Topic 20: Single Factor Analysis of Variance

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

ANOVA Longitudinal Models for the Practice Effects Data: via GLM

STAT 115:Experimental Designs

Booklet of Code and Output for STAC32 Final Exam

STAT 350. Assignment 4

PLS205 Lab 2 January 15, Laboratory Topic 3

Multivariate analysis of variance and covariance

Unbalanced Data in Factorials Types I, II, III SS Part 2

Ch 2: Simple Linear Regression

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Lec 1: An Introduction to ANOVA

Lecture 11: Simple Linear Regression

STAT 350: Summer Semester Midterm 1: Solutions

Week 7.1--IES 612-STA STA doc

Topic 28: Unequal Replication in Two-Way ANOVA

Residuals from regression on original data 1

Profile Analysis Multivariate Regression

N J SS W /df W N - 1

Chapter 9. Multivariate and Within-cases Analysis. 9.1 Multivariate Analysis of Variance

Lecture 3: Inference in SLR

Topic 32: Two-Way Mixed Effects Model

PLS205 Lab 6 February 13, Laboratory Topic 9

1. (Rao example 11.15) A study measures oxygen demand (y) (on a log scale) and five explanatory variables (see below). Data are available as

Assignment 6 Answer Keys

Topic 14: Inference in Multiple Regression

Laboratory Topics 4 & 5

Introduction to Crossover Trials

Overview Scatter Plot Example

Topic 23: Diagnostics and Remedies

Examination paper for TMA4255 Applied statistics

WITHIN-PARTICIPANT EXPERIMENTAL DESIGNS

Orthogonal and Non-orthogonal Polynomial Constrasts

Multiple Regression. Dr. Frank Wood. Frank Wood, Linear Regression Models Lecture 12, Slide 1

Assessing Model Adequacy

Lecture 4. Random Effects in Completely Randomized Design

STATISTICS 479 Exam II (100 points)

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

Lecture 19 Multiple (Linear) Regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Lecture 2: Basic Concepts and Simple Comparative Experiments Montgomery: Chapter 2

Introduction. Chapter 8

ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false.

Analysis of Variance Bios 662

3 Variables: Cyberloafing Conscientiousness Age

Lecture 13 Extra Sums of Squares

Analysis of variance and regression. April 17, Contents Comparison of several groups One-way ANOVA. Two-way ANOVA Interaction Model checking

Simple, Marginal, and Interaction Effects in General Linear Models

11 Factors, ANOVA, and Regression: SAS versus Splus

Chapter 5: Multivariate Analysis and Repeated Measures

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design

Homework 2: Simple Linear Regression

y response variable x 1, x 2,, x k -- a set of explanatory variables

In Class Review Exercises Vartanian: SW 540

Mixed Models Lecture Notes By Dr. Hanford page 199 More Statistics& SAS Tutorial at

One-way ANOVA Model Assumptions

Analysis of variance. April 16, Contents Comparison of several groups

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

Lecture 9: Factorial Design Montgomery: chapter 5

Regression Models - Introduction

Analysis of variance. April 16, 2009

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

CAS MA575 Linear Models

Correlation and the Analysis of Variance Approach to Simple Linear Regression

The Random Effects Model Introduction

5.3 Three-Stage Nested Design Example

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

The legacy of Sir Ronald A. Fisher. Fisher s three fundamental principles: local control, replication, and randomization.

Math 3330: Solution to midterm Exam

Transcription:

STA 303H1F: Two-way Analysis of Variance Practice Problems 1. In the Pygmalion example from lecture, why are the average scores of the platoon used as the response variable, rather than the scores of the individual soldiers? 2. In two-way analysis of variance, (a) What does it mean when there are significant interactions but no significant main effects? ( Main effects are the effects of the factors considered on their own.) (b) What does it mean when are are significant main effects but no significant interaction? 3. A two-way analysis of variance model with G levels of one factor and H levels of the second factor can be thought of as a one-way analysis of variance with a factor with G H levels. Let Y ghi denote the response of the ith observation in the gth group of the first factor and hth group of the second factor, with E(Y ghi ) = θ gh for g = 1,..., G, h = 1,..., H, and i = 1,..., n gh where n gh is the number of observations in the gth level of the first factor and the hth level of the second factor. The least squares solutions can found by minimizing G H n gh (y ghi θ gh ) 2 g=1 h=1 i=1 with respect to θ gh for g = 1,..., G and h = 1,..., H. Show that the least squares solutions is where ˆθ gh = y gh y gh = 1 n gh y ghi. n gh 4. Consider the model for a two-way analysis of variance with two levels of each factor (a 2 2 classification Y i = β 0 + β 1 I factor 1,i + β 2 I factor 2,i + β 3 I factor 1,i I factor 2,i + e i where I factor 1,i = 1 if the ith observation is in the first group of factor 1 and is 0 otherwise. (a) What are the expected values of Y i for each of the 4 groups means? (b) Use the result of question 3 to show that the least squares estimate of the coefficients are b 0 = y 22 i=1 b 1 = y 12 y 22 b 2 = y 21 y 22 b 3 = y 11 y 21 + y 22 y 12 1

where y mn is the mean of observations for the mth level of factor 1 and the nth level of factor 2. (c) Under the assumption that the Y s are uncorrelated with variance σ 2, what is the variance of b 3? 5. (The scenario for this question is taken from Kleinbaum et al. Chapter 20, Question 7.) The effect of a new antidepressant drug on reducing the severity of depression was studied in manic-depressive patients at two state mental hospitals. In each hospital all such patients were randomly assigned to either a treatment (new drug) or a control (old drug) group. The results of this experiment are summarized in the following table; a high mean score indicates more lowering in depression level than does a low mean score. Group Hospital Treatment Control A n = 25, y = 8.5, s = 1.3 n = 31, y = 4.6, s = 1.8 B n = 25, y = 2.3, s = 0.9 n = 31, y = 1.7, s = 1.1 (a) Write an appropriate linear model for analysing these data, both with and without the use of matrices. (b) Use the results of question 4 to find a numeric value for the coefficient of the interaction term. (c) Estimate the variance of the coefficient of the interaction term. (d) Test the hypothesis of no interaction. 6. The data for this question were taken from the appendix of Kutner et al. (the SENIC data). The dependent variable is length of stay (variable name los in output below) in hospital for patients. In this question the effects of geographic region (variable name region, 4 categories where 1=North East, 2=North Central, 3=South, and 4=West) and age of patient are to be studied. For this question, age has been classified into three categories (variable name agegroup where 1=under 52.0 years, 2=52.0 - under 55.0 years, 3=55.0 years or more). (a) Write the linear model including interactions for analysing these data, both with and without the use of matrices, using indicator variables coded as 0 or 1. (b) In the SAS output that follows, complete the ANOVA table (some numbers have been replaced with X s). ---------------------------------------------------------- los -------------------------------------- Mean Std N -----------------+------------+------------+------------ region agegroup --------+-------- 1 1 9.71 0.82 5.00 2 10.48 1.74 12.00 3 12.38 3.52 11.00 --------+--------+------------+------------+------------ 2

2 1 9.71 1.33 16.00 2 10.01 0.86 9.00 3 9.21 1.22 7.00 --------+--------+------------+------------+------------ 3 1 9.14 1.31 17.00 2 8.97 1.20 7.00 3 9.38 1.20 13.00 --------+--------+------------+------------+------------ 4 1 7.54 0.65 4.00 2 8.95 0.88 7.00 3 7.41 0.38 5.00 ---------------------------------------------------------- Class Level Information Class Levels Values region 4 1 2 3 4 agegroup 3 1 2 3 Number of Observations Read 113 Number of Observations Used 113 Sum of Source DF Squares Mean Square F Value Pr > F Model XX 147.9763195 13.4523927 XXXX XXXXXX Error 101 261.2340610 2.5864759 Corrected Total 112 409.2103805 R-Square Coeff Var Root MSE los Mean 0.361614 16.66873 1.608252 9.648319 Source DF Type I SS Mean Square F Value Pr > F region 3 103.5541834 34.5180611 13.35 <.0001 agegroup 2 5.2461547 2.6230774 1.01 0.3664 region*agegroup 6 39.1759815 6.5293302 2.52 0.0256 Source DF Type III SS Mean Square F Value Pr > F region 3 84.24919035 28.08306345 10.86 <.0001 agegroup 2 6.47626605 3.23813302 1.25 0.2903 region*agegroup 6 39.17598146 6.52933024 2.52 0.0256 3

(c) What do you conclude? Is your conclusion consistent with the plot of means below?!"#$%& *- *, ** *+ ) ( ' *, - 2.!/01# "/!/.134 *, - (d) Below are plots of the residuals versus predicted values and a normal quantile plot of the residuals. What do you conclude from them?!"#$. ', & + & % ' ( ()*+, " ) * $ %) %( " %' %& %. 0 )* )) )( )'! " # $ # "! /!$ -.(/01234056+1)* 4

(e) Below is output from the lsmeans statement of proc glm. Why are means given for 12 groups rather than 3 or 4? What do you conclude? Least Squares Means LSMEAN region agegroup los LSMEAN Number 1 1 9.7100000 1 1 2 10.4791667 2 1 3 12.3809091 3 2 1 9.7056250 4 2 2 10.0122222 5 2 3 9.2100000 6 3 1 9.1358824 7 3 2 8.9671429 8 3 3 9.3846154 9 4 1 7.5400000 10 4 2 8.9457143 11 4 3 7.4080000 12 1 0.3711 0.0027 0.9958 0.7369 0.5966 2 0.3711 0.0056 0.2107 0.5118 0.1002 3 0.0027 0.0056 <.0001 0.0014 <.0001 4 0.9958 0.2107 <.0001 0.6483 0.4980 5 0.7369 0.5118 0.0014 0.6483 0.3246 6 0.5966 0.1002 <.0001 0.4980 0.3246 7 0.4845 0.0290 <.0001 0.3115 0.1892 0.9185 8 0.4320 0.0508 <.0001 0.3133 0.2002 0.7781 9 0.7014 0.0922 <.0001 0.5941 0.3703 0.8173 10 0.0469 0.0020 <.0001 0.0178 0.0120 0.1007 1 0.4845 0.4320 0.7014 0.0469 0.4189 0.0258 2 0.0290 0.0508 0.0922 0.0020 0.0477 0.0005 3 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 4 0.3115 0.3133 0.5941 0.0178 0.2996 0.0063 5 0.1892 0.2002 0.3703 0.0120 0.1912 0.0045 6 0.9185 0.7781 0.8173 0.1007 0.7591 0.0585 7 0.8157 0.6755 0.0772 0.7929 0.0372 8 0.8157 0.5810 0.1599 0.9802 0.1009 9 0.6755 0.5810 0.0475 0.5618 0.0215 10 0.0772 0.1599 0.0475 0.1662 0.9029 5

11 0.4189 0.0477 <.0001 0.2996 0.1912 0.7591 12 0.0258 0.0005 <.0001 0.0063 0.0045 0.0585 11 0.7929 0.9802 0.5618 0.1662 0.1056 12 0.0372 0.1009 0.0215 0.9029 0.1056 NOTE: To ensure overall protection level, only probabilities associated with pre-planned comparisons should be used. Least Squares Means Adjustment for Multiple Comparisons: Tukey-Kramer 1 0.9990 0.1019 1.0000 1.0000 1.0000 2 0.9990 0.1825 0.9822 1.0000 0.8819 3 0.1019 0.1825 0.0027 0.0606 0.0049 4 1.0000 0.9822 0.0027 1.0000 0.9999 5 1.0000 1.0000 0.0606 1.0000 0.9976 6 1.0000 0.8819 0.0049 0.9999 0.9976 7 0.9999 0.5428 <.0001 0.9970 0.9743 1.0000 8 0.9997 0.7076 0.0016 0.9971 0.9787 1.0000 9 1.0000 0.8640 0.0009 1.0000 0.9990 1.0000 10 0.6847 0.0817 <.0001 0.4100 0.3176 0.8830 1 0.9999 0.9997 1.0000 0.6847 0.9996 0.5091 2 0.5428 0.7076 0.8640 0.0817 0.6891 0.0246 3 <.0001 0.0016 0.0009 <.0001 0.0014 <.0001 4 0.9970 0.9971 1.0000 0.4100 0.9962 0.2010 6

5 0.9743 0.9787 0.9990 0.3176 0.9752 0.1557 6 1.0000 1.0000 1.0000 0.8830 1.0000 0.7480 7 1.0000 1.0000 0.8219 1.0000 0.6157 8 1.0000 1.0000 0.9578 1.0000 0.8834 9 1.0000 1.0000 0.6883 1.0000 0.4591 10 0.8219 0.9578 0.6883 0.9621 1.0000 Least Squares Means Adjustment for Multiple Comparisons: Tukey-Kramer 11 0.9996 0.6891 0.0014 0.9962 0.9752 1.0000 12 0.5091 0.0246 <.0001 0.2010 0.1557 0.7480 11 1.0000 1.0000 1.0000 0.9621 0.8926 12 0.6157 0.8834 0.4591 1.0000 0.8926 7