QUEEN MARY, UNIVERSITY OF LONDON


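MTH6134 Statistical Modelling II
Solutions to Exercise Sheet 4
October 2017

1. We can write

\[
\begin{aligned}
\sum_{i=1}^{t} r_i(\bar{y}_{i.}-\bar{y}_{..})^2
&= \sum_{i=1}^{t} r_i\left(\bar{y}_{i.}^2 - 2\bar{y}_{i.}\bar{y}_{..} + \bar{y}_{..}^2\right) \\
&= \sum_{i=1}^{t} r_i\bar{y}_{i.}^2 - 2\bar{y}_{..}\sum_{i=1}^{t} r_i\bar{y}_{i.} + n\bar{y}_{..}^2 \\
&= \sum_{i=1}^{t} \frac{T_i^2}{r_i} - 2\,\frac{G}{n}\sum_{i=1}^{t} T_i + n\left(\frac{G}{n}\right)^2 \\
&= \sum_{i=1}^{t} \frac{T_i^2}{r_i} - \frac{2G^2}{n} + \frac{G^2}{n}
 = \sum_{i=1}^{t} \frac{T_i^2}{r_i} - \frac{G^2}{n} = S_T .
\end{aligned}
\]

The form $S_T = \sum_{i=1}^{t} T_i^2/r_i - G^2/n$ of the treatment sum of squares is convenient for calculation by hand. However, the equivalent expression $S_T = \sum_{i=1}^{t} r_i(\bar{y}_{i.}-\bar{y}_{..})^2$ shows more clearly that $S_T$ is nonnegative.

We also have

\[
\begin{aligned}
\sum_{i=1}^{t}\sum_{j=1}^{r_i} (y_{ij}-\bar{y}_{i.})^2
&= \sum_{i=1}^{t}\sum_{j=1}^{r_i} \left(y_{ij}^2 - 2y_{ij}\bar{y}_{i.} + \bar{y}_{i.}^2\right) \\
&= \sum_{i=1}^{t}\sum_{j=1}^{r_i} y_{ij}^2 - 2\sum_{i=1}^{t} \bar{y}_{i.}\sum_{j=1}^{r_i} y_{ij} + \sum_{i=1}^{t} r_i\bar{y}_{i.}^2 \\
&= \sum_{i=1}^{t}\sum_{j=1}^{r_i} y_{ij}^2 - 2\sum_{i=1}^{t} \frac{T_i}{r_i}\,T_i + \sum_{i=1}^{t} r_i\left(\frac{T_i}{r_i}\right)^2 \\
&= \sum_{i=1}^{t}\sum_{j=1}^{r_i} y_{ij}^2 - \sum_{i=1}^{t} \frac{T_i^2}{r_i} = S_E .
\end{aligned}
\]

In practice, to calculate the residual sum of squares for a given set of data, we would always find $S_G$ and $S_T$ first, and then compute $S_E$ as the difference $S_E = S_G - S_T$. The equivalent expression $S_E = \sum_{i=1}^{t}\sum_{j=1}^{r_i} (y_{ij}-\bar{y}_{i.})^2$ better shows that $S_E$ is nonnegative.

2. (a) An appropriate model for the data is the one-way ANOVA model for a completely randomised design. The treatments 1, 2 and 3 are Control, NT-H and NT-L, respectively. Each of these has replication $r_i = 5$ for $i = 1, 2, 3$.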

The corresponding model equation is $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ for $i = 1, 2, 3$ and $j = 1, 2, \ldots, 5$. Here, $Y_{ij}$ is the weight increase for the $j$th rat under treatment $i$, $\mu$ represents the overall mean and $\alpha_i$ is the effect of treatment $i$. The $\epsilon_{ij}$ terms are random errors for which it is assumed that $\epsilon_{ij} \sim N(0, \sigma^2)$. Moreover, all of the $\epsilon_{ij}$ are assumed to be independent.

(b) The treatment totals for Control, NT-H and NT-L are, respectively, equal to $T_1 = 578$, $T_2 = 361$ and $T_3 = 442$. The grand total is $G = 1{,}381$ and the sum of the squared responses is $\sum_{i=1}^{3}\sum_{j=1}^{5} y_{ij}^2 = 133{,}357$. The correction factor is

\[
\frac{G^2}{n} = \frac{1{,}381^2}{15} = \frac{1{,}907{,}161}{15} = 127{,}144.07 .
\]

From the totals given, the treatment sum of squares can be calculated as

\[
S_T = \frac{1}{5}\left(578^2 + 361^2 + 442^2\right) - 127{,}144.07 = \frac{659{,}769}{5} - 127{,}144.07 = 4{,}809.73 .
\]

Since $\sum_{i=1}^{3}\sum_{j=1}^{5} y_{ij}^2 = 133{,}357$, the total sum of squares is

\[
S_G = 133{,}357.00 - 127{,}144.07 = 6{,}212.93 ,
\]

and so the residual sum of squares is

\[
S_E = 6{,}212.93 - 4{,}809.73 = 1{,}403.20 .
\]

The corresponding ANOVA table is then as follows:

Source       SS         df   MS         F
Treatments   4,809.73    2   2,404.87   20.57
Residual     1,403.20   12     116.93
Total        6,212.93   14

To test for differences between the treatments, we test $H_0: \alpha_1 = \alpha_2 = \alpha_3$ against the alternative $H_1$ that the effects of at least two of the treatments are different. For the test at the 5% level of significance, the observed value of $F = 20.57$ is compared with the percentage point $F_{2,12,0.05} = 3.885$. Since $F > F_{2,12,0.05}$, we reject $H_0$. We can thus conclude that the three diets have different effects on the increase in body weight. Note that the test does not allow us to draw any more specific conclusions.

(c) The data points and means are included in the scatterplot. The means for the treatments are given below.

Treatment   Mean
Control     115.6
NT-H         72.2
NT-L         88.4
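The arithmetic in part (b) and the treatment means can be reproduced from the three treatment totals and the sum of squared responses. The following is a minimal sketch rather than part of the original solution: the variable names are illustrative, and the helper `f_upper_tail_df1_2` is an added convenience that relies on the closed form of the F tail probability, which holds only when the numerator degrees of freedom equal 2.

```python
# One-way ANOVA for part (b), computed from the totals quoted in the text.
totals = {"Control": 578, "NT-H": 361, "NT-L": 442}  # T_1, T_2, T_3
r = 5                 # replication of each treatment
sum_sq = 133_357      # sum of squared responses, as given in the question

t = len(totals)       # number of treatments
n = t * r             # total number of observations
G = sum(totals.values())
correction = G**2 / n                 # correction factor G^2 / n

S_T = sum(T**2 for T in totals.values()) / r - correction
S_G = sum_sq - correction
S_E = S_G - S_T                       # residual SS obtained by difference

df_T, df_E = t - 1, n - t
M_T, M_E = S_T / df_T, S_E / df_E
F = M_T / M_E

means = {name: T / r for name, T in totals.items()}


def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    Assumption: this closed form is valid only for numerator df = 2.
    """
    return (1 + 2 * f / df2) ** (-df2 / 2)


p_value = f_upper_tail_df1_2(F, df_E)

print(f"S_T = {S_T:.2f}, S_E = {S_E:.2f}, S_G = {S_G:.2f}")
print(f"M_T = {M_T:.2f}, M_E = {M_E:.2f}, F = {F:.2f}")
print(f"means = {means}")
print(f"p-value = {p_value:.6f}")
```

Computing `S_E` by difference mirrors the hand calculation; with the raw observations available one could instead sum the squared within-treatment deviations directly.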

[Figure: scatterplot of weight increase (vertical axis "Weight", ticks at 70, 100 and 130) against treatment (Control, NT-H, NT-L), showing the data points and the treatment means.]

The plot shows a clear distinction between results for Control and NT-H, and, less strongly, between Control and NT-L. The separation between results for treatments NT-H and NT-L is less clear. Regarding variability, there does not seem to be strong evidence for variability differing between treatments.

(d) Tables 12(a)-(f) of the New Cambridge Statistical Tables present per cent points of various F distributions. Under the null hypothesis, the F distribution of the F test in part (b) has $\nu_1 = 2$ and $\nu_2 = 12$ degrees of freedom. Looking at Table 12(a), we can see that the 10% point of the $F_{2,12}$ distribution is equal to $F_{2,12,0.10} = 2.807$. The observed value of the test statistic in part (b) is $F = 20.57$. Since $F > F_{2,12,0.10}$, it follows that the p-value is smaller than 10%, and so 0.10 or 10% is an upper bound for the p-value.

One way to demonstrate that you have understood why this is so is to draw a picture which sketches the probability density function of the $F_{2,12}$ distribution, shows the value $F_{2,12,0.10} = 2.807$ and the corresponding area under the curve, the size of which is 0.1 or 10% of the total area, and the value $F = 20.57$ of the test statistic and the corresponding area under the curve. This area is equal to the p-value.

We want to find the smallest upper bound and so we compare $F = 20.57$ with the per cent points in the other tables in a similar way. In all cases, $F = 20.57$ is greater than the per cent point of the $F_{2,12}$ distribution. In particular, from Table 12(f), which gives 0.1% points, we see that $F > F_{2,12,0.001}$, where $F_{2,12,0.001} = 12.97$. Thus, 0.001 or 0.1% is an upper bound for the p-value of the test in part (b) and it is in fact the smallest one that we can find by using Tables 12(a)-(f).

3. The test statistic for the two-sample t test, assuming equal variances in the two populations from which the samples are drawn, can be written as

\[
t = \frac{\bar{y}_{1.} - \bar{y}_{2.}}{\sqrt{\left(\dfrac{1}{r_1} + \dfrac{1}{r_2}\right)\dfrac{(r_1-1)s_1^2 + (r_2-1)s_2^2}{r_1+r_2-2}}} ,
\]

where $\bar{y}_{i.} = T_i/r_i$ is the sample mean and $s_i^2 = \sum_{j=1}^{r_i} (y_{ij}-\bar{y}_{i.})^2/(r_i-1)$ is the sample variance for the $i$th sample. This form of the test statistic may look different from the form of the statistic you met previously, but this is only due to the different notation.

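We want to show that $t^2$ is equal to $F = M_T/M_E$. Each of the two samples may be regarded as corresponding to one out of $t = 2$ treatments. We start by showing that, if there are only $t = 2$ treatments, then

\[
M_E = \frac{(r_1-1)s_1^2 + (r_2-1)s_2^2}{r_1+r_2-2} . \tag{1}
\]

Subsequently, we show that

\[
M_T = \frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{\dfrac{1}{r_1} + \dfrac{1}{r_2}} . \tag{2}
\]

From (1) and (2), it is then obvious that $t^2 = F$.

Question 1 shows that $S_E = \sum_{i=1}^{t}\sum_{j=1}^{r_i} (y_{ij}-\bar{y}_{i.})^2$. For $t = 2$ treatments, we have $n = r_1 + r_2$, and so $n - t = r_1 + r_2 - 2$. Hence, we have

\[
\begin{aligned}
M_E = \frac{S_E}{n-t}
&= \frac{1}{r_1+r_2-2}\left\{\sum_{j=1}^{r_1} (y_{1j}-\bar{y}_{1.})^2 + \sum_{j=1}^{r_2} (y_{2j}-\bar{y}_{2.})^2\right\} \\
&= \frac{1}{r_1+r_2-2}\left\{(r_1-1)s_1^2 + (r_2-1)s_2^2\right\}
 = \frac{(r_1-1)s_1^2 + (r_2-1)s_2^2}{r_1+r_2-2} .
\end{aligned}
\]

For $t = 2$ treatments, we have $t - 1 = 1$. The mean square for treatments is therefore equal to

\[
M_T = \frac{S_T}{t-1} = S_T = \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} - \frac{G^2}{n} .
\]

Finally, the following calculation shows that (2) holds as desired:

\[
\begin{aligned}
\frac{(\bar{y}_{1.} - \bar{y}_{2.})^2}{\dfrac{1}{r_1} + \dfrac{1}{r_2}}
&= \frac{r_1 r_2}{r_1+r_2}\left(\frac{T_1}{r_1} - \frac{T_2}{r_2}\right)^2 \\
&= \frac{r_1 r_2}{r_1+r_2}\left(\frac{T_1^2}{r_1^2} - \frac{2T_1T_2}{r_1 r_2} + \frac{T_2^2}{r_2^2}\right) \\
&= \frac{1}{r_1+r_2}\left(\frac{r_2}{r_1}T_1^2 - 2T_1T_2 + \frac{r_1}{r_2}T_2^2\right) \\
&= \frac{1}{r_1+r_2}\left(\frac{r_1+r_2}{r_1}T_1^2 + \frac{r_1+r_2}{r_2}T_2^2 - T_1^2 - 2T_1T_2 - T_2^2\right) \\
&= \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} - \frac{(T_1+T_2)^2}{r_1+r_2} \\
&= \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} - \frac{G^2}{n} = M_T .
\end{aligned}
\]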
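The identity $t^2 = F$ can also be checked numerically. The sketch below uses two small made-up samples of unequal size (hypothetical values, not data from the exercise sheet) and computes the pooled two-sample t statistic and the one-way ANOVA F statistic independently, using the sums-of-squares formulas from Question 1.

```python
from math import sqrt

# Hypothetical samples with unequal replication (illustrative values only).
sample1 = [12.1, 14.3, 11.8, 13.5, 12.9]
sample2 = [15.2, 16.8, 14.9, 17.1]


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    """Unbiased sample variance with divisor len(xs) - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


r1, r2 = len(sample1), len(sample2)
n = r1 + r2

# Pooled two-sample t statistic (equal variances assumed).
pooled_var = ((r1 - 1) * sample_var(sample1)
              + (r2 - 1) * sample_var(sample2)) / (n - 2)
t_stat = (mean(sample1) - mean(sample2)) / sqrt((1 / r1 + 1 / r2) * pooled_var)

# One-way ANOVA with two treatments, from totals and the correction factor.
T1, T2 = sum(sample1), sum(sample2)
G = T1 + T2
correction = G**2 / n
S_T = T1**2 / r1 + T2**2 / r2 - correction
S_G = sum(y**2 for y in sample1 + sample2) - correction
S_E = S_G - S_T
M_T = S_T / (2 - 1)
M_E = S_E / (n - 2)
F = M_T / M_E

print(f"t^2 = {t_stat**2:.6f}")
print(f"F   = {F:.6f}")
```

The two printed values should agree up to floating-point rounding, whatever samples are substituted.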

4. (Feedback component)

(a) The model for the score $Y_{ij}$ of student $j$ given treatment $i$ is $Y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, where $\mu$ and $\alpha_1, \ldots, \alpha_t$ are the model parameters representing the overall mean and the individual effect of each treatment; these parameters are constants with unknown value. The errors $\epsilon_{ij}$ are independent and identically distributed as $N(0, \sigma^2)$. There are four treatments, so $t = 4$, and thus $i \in \{1, 2, \ldots, t\}$. For each value of $i$, the student number is indexed by $j$, which ranges between 1 and $r_i$, and the replication pattern for this problem is $r_1 = 16$, $r_2 = 14$, $r_3 = 13$ and $r_4 = 13$, so that the number of observations is $n = r_1 + r_2 + r_3 + r_4 = 56$.

(b) This part is solved by identifying information available in the question, and from it building the required statistics for the ANOVA table. The following steps are needed:

i. The means per method $\bar{y}_i$ relate directly to the treatment totals via the formula $\bar{y}_i = T_i/r_i$. Starting from the values $\bar{y}_1 = 74.4375$, $\bar{y}_2 = 70.7143$, $\bar{y}_3 = 70.9231$ and $\bar{y}_4 = 77.9231$, the totals per treatment are, to the nearest integer, $T_1 = \bar{y}_1 r_1 = 74.4375 \times 16 = 1{,}191$, $T_2 = \bar{y}_2 r_2 = 70.7143 \times 14 = 990$, $T_3 = \bar{y}_3 r_3 = 70.9231 \times 13 = 922$ and $T_4 = \bar{y}_4 r_4 = 77.9231 \times 13 = 1{,}013$.

ii. The grand total is $G = T_1 + T_2 + T_3 + T_4 = 4{,}116$.

iii. The correction factor is $G^2/n = 4{,}116^2/56 = 16{,}941{,}456/56 = 302{,}526$.

iv. The treatment sum of squares is computed as

\[
\begin{aligned}
S_T &= \frac{T_1^2}{r_1} + \frac{T_2^2}{r_2} + \frac{T_3^2}{r_3} + \frac{T_4^2}{r_4} - \frac{G^2}{n}
 = \frac{1{,}191^2}{16} + \frac{990^2}{14} + \frac{922^2}{13} + \frac{1{,}013^2}{13} - 302{,}526 \\
&= 88{,}655.0625 + 70{,}007.1429 + 65{,}391.0769 + 78{,}936.0769 - 302{,}526 = 463.3592 .
\end{aligned}
\]

v. For the total sum of squares, we use

\[
S_G = \sum_{i=1}^{4}\sum_{j=1}^{r_i} y_{ij}^2 - \frac{G^2}{n} = 311{,}388 - 302{,}526 = 8{,}862 .
\]

vi. The residual sum of squares is obtained as the difference $S_E = S_G - S_T = 8{,}862 - 463.3592 = 8{,}398.6408$.

We are now able to complete the analysis of variance table:

Source     SS           df   MS                        F
Methods      463.3592    3   M_T = S_T/3  = 154.45     M_T/M_E = 0.9563
Residual   8,398.6408   52   M_E = S_E/52 = 161.51
Total      8,862.0000   55

The hypothesis to be tested is $H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4$ against the alternative that there exist at least two treatment effects $\alpha_i$ and $\alpha_j$, $i \neq j$, such that $\alpha_i \neq \alpha_j$.

Under $H_0$, the test statistic follows an $F_{3,52}$ distribution. An approximate critical value at the 5% significance level is 2.7904. This value is obtained by interpolating linearly between the values $F_{3,40,0.05} = 2.839$ and $F_{3,60,0.05} = 2.758$, so that

\[
F_{3,52,0.05} \approx F_{3,40,0.05} + \frac{52-40}{60-40}\left(F_{3,60,0.05} - F_{3,40,0.05}\right) = 2.839 - \frac{12}{20} \times 0.081 = 2.7904 .
\]

As the observed value $F \approx 0.96$ is smaller than $2.7904 \approx F_{3,52,0.05}$, we do not reject $H_0$ and conclude that there is no significant difference in the average score between the methods of teaching.

The final part of the question involves the computation of $\sum_{i=1}^{4}\sum_{j=1}^{r_i} y_{ij}^2$ in step v. above, but only using information from the table. For each value of $i$, we compute the corresponding inner sum. This computation starts by noting that the second column of data given contains standard errors of means. Each of the standard errors is $\text{s.e.(mean)}_i = \text{s.e.}_i/\sqrt{r_i}$, where

\[
\text{s.e.}_i = \sqrt{\frac{\sum_{j=1}^{r_i} y_{ij}^2 - r_i\bar{y}_{i.}^2}{r_i - 1}} .
\]

Inverting the above relation gives the required inner sum:

\[
\sum_{j=1}^{r_i} y_{ij}^2 = \left\{\text{s.e.(mean)}_i^2\, r_i\right\}(r_i - 1) + r_i\bar{y}_{i}^2 ,
\]

which is easy to evaluate for each value of $i$. Adding the four inner sums gives $\sum_{i=1}^{4}\sum_{j=1}^{r_i} y_{ij}^2 \approx 311{,}388$, which is the same value as given in the question.
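The steps of Question 4(b) can be sketched in code, starting from the method means, the replications and the sum of squared scores quoted above. This is an illustrative sketch under stated assumptions: the helper names `interpolate` and `inner_sum_of_squares` are invented here, the totals are recovered by rounding $\bar{y}_i r_i$ to the nearest integer, and the small sample used to exercise the standard-error inversion is hypothetical because the standard errors from the question are not reproduced in this document.

```python
from math import sqrt

means = [74.4375, 70.7143, 70.9231, 77.9231]  # method means, as quoted
reps = [16, 14, 13, 13]                        # replications r_1, ..., r_4
sum_sq = 311_388                               # sum of squared scores

# Totals recovered from the rounded means (assumed to be whole numbers).
totals = [round(m * r) for m, r in zip(means, reps)]
n = sum(reps)
t = len(reps)
G = sum(totals)
correction = G**2 / n

S_T = sum(T**2 / r for T, r in zip(totals, reps)) - correction
S_G = sum_sq - correction
S_E = S_G - S_T
M_T = S_T / (t - 1)
M_E = S_E / (n - t)
F = M_T / M_E


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the points (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)


# Tabulated 5% points F_{3,40} = 2.839 and F_{3,60} = 2.758, from the text.
critical = interpolate(n - t, 40, 2.839, 60, 2.758)


def inner_sum_of_squares(se_mean, r, ybar):
    """Recover sum_j y_ij^2 from the s.e. of the mean, replication and mean."""
    return se_mean**2 * r * (r - 1) + r * ybar**2


# Round-trip check on a hypothetical sample (not data from the question).
sample = [62, 75, 81, 70]
r = len(sample)
ybar = sum(sample) / r
s2 = sum((y - ybar) ** 2 for y in sample) / (r - 1)
recovered = inner_sum_of_squares(sqrt(s2 / r), r, ybar)

print(f"totals = {totals}, G = {G}")
print(f"S_T = {S_T:.4f}, S_E = {S_E:.4f}, F = {F:.4f}")
print(f"critical value = {critical:.4f}, reject H0: {F > critical}")
print(f"recovered sum of squares = {recovered:.1f}")
```

Working from the integer totals rather than from the rounded means avoids carrying the rounding error of the means into $S_T$.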