F statistic = $s_1^2 / s_2^2$ ("F" for Fisher)

Transcription:

Stat 4 ANOVA: Analysis of Variance /6/04

- Comparing two variances: F distribution
- Typical data sets
- One way analysis of variance: example
- Notation for one way ANOVA

Comparing two variances: F distribution

We saw that the two sample tests had different statistics depending on whether we could say that the variances of both groups were equal (but unknown), $\sigma_1^2 = \sigma_2^2 = \sigma^2$, or different, $\sigma_1^2 \neq \sigma_2^2$. We suppose that both populations are Normally distributed with respective variances $\sigma_1^2$ and $\sigma_2^2$. To test the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ we need a new test statistic for comparing variances. Variances are on the squared scale, whereas means are in the same units as the original variables.

F-test for comparing two variances from Normally distributed random variables: to test $H_0: \sigma_1^2 = \sigma_2^2$, calculate

$$F \text{ statistic} = \frac{s_1^2}{s_2^2} \qquad (\text{"F" for Fisher})$$

Under the null hypothesis the distribution of F is called the F distribution with (num, den) as its parameters, where num is the number of degrees of freedom of the numerator and den is the number of degrees of freedom of the denominator. As the usual computations involve two-sided tests, with $H_A: \sigma_1^2 \neq \sigma_2^2$, we put the larger of the two sample variances in the numerator and test whether the observed $F_{obs}$ is larger than $F_{\alpha/2,\,num,\,den}$, the upper $\alpha/2$ critical value (that is, the $1 - \alpha/2$ quantile) of the F distribution with (num, den) degrees of freedom.

If $H_0$ is true, F has the $F_{num,den}$ distribution; if $H_0$ is false, F tends to be larger, so reject $H_0$ if F is sufficiently large.

P-value of F test $= P(F_{num,den} > F_{obs})$

Example: take the energy expenditure example. Ratio of variances: $F_{obs} = 1.275$, and $P(F_{8,12} > 1.275) = 0.34 > 0.025$, so we do not reject the equality of variances. The critical value is $F_{0.975}$ = qf(0.975, 8, 12) = 3.51.

    > qf(0.025,12,8)
    [1] 0.284756
    > pf(1.2748,8,12)
    [1] 0.6601336
    > 1-pf(1.2748,8,12)
    [1] 0.3398664
    > 1/1.2748
    [1] 0.7844368
    > pf(0.78444,12,8)
    [1] 0.3398687

[Figure: density of an F distribution, computed with df(), plotted against seq(0, 5, 0.05).]

- two parameters: $F_{\alpha[\nu_1,\nu_2]}$, $\nu_1$ = numerator d.o.f., $\nu_2$ = denominator d.o.f.
- for exact P-values, use software; in R:

    df(x, df1, df2, log = FALSE)
    pf(q, df1, df2, ncp=0, lower.tail = TRUE, log.p = FALSE)
    qf(p, df1, df2, lower.tail = TRUE, log.p = FALSE)
    rf(n, df1, df2)

    > var.test(lean,obese)

            F test to compare two variances

    data:  lean and obese
    F = 0.7844, num df = 12, denom df = 8, p-value = 0.6797
    alternative hypothesis: true ratio of variances is not equal to 1

    95 percent confidence interval:
     0.1867876 2.7547991
    sample estimates:
    ratio of variances
              0.784446

Typical Data Sets

Coagulation-Diet:

    Diet A: 62 60 63 59
    Diet B: 63 67 71 64 65 66
    Diet C: 68 66 71 67 68 68
    Diet D: 56 62 60 61 63 64 63 59

    > boxplot(coag~diet,data=coag.df)
    > summary(coag.df)
          coag       diet
     Min.   :56.00   A:4
     1st Qu.:61.75   B:6
     Median :63.50   C:6
     Mean   :64.00   D:8
     3rd Qu.:67.00
     Max.   :71.00

[Figure: boxplots of coagulation time for Diets A, B, C and D; vertical axis from about 60 to 70.]

The data consist of blood coagulation times for 24 animals fed one of 4 different diets. In the following I write the data in a table and decompose the table into a sum of several tables. The 4 rows of the table correspond to Diets A, B, C and D.

Comparing several (more than 2) different samples

Reminder: to compare two samples from populations with the same variance:

1. Compute the means for both samples: $\bar{x}_1$ and $\bar{x}_2$.
2. The within sample sum of squares $\sum (x - \bar{x})^2$ is found for both samples.
3. The pooled estimate of variance $s_p^2$ is obtained by adding the sums of squares of deviations and dividing by the total degrees of freedom.
4. The standard error of the mean difference $\bar{x}_1 - \bar{x}_2$ is computed as $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$.
5. Test the null hypothesis $\mu_1 = \mu_2$ by computing the test statistic

$$\frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

which should follow a $t_{n_1+n_2-2}$ distribution under the null hypothesis.

A special decomposition according to the different factor levels

Later in the course we will use a vector notation and then want to think of stacking up these 24 values into a single column vector, but the tables save space. Each observation is written as the grand mean (64), plus the deviation of its diet mean from the grand mean, plus a residual:

    Diet  data                      grand mean  diet mean  treatment deviation  residuals
    A     62 60 63 59               64          61         -3                    1 -1  2 -2
    B     63 67 71 64 65 66         64          66         +2                   -3  1  5 -2 -1  0
    C     68 66 71 67 68 68         64          68         +4                    0 -2  3 -1  0  0
    D     56 62 60 61 63 64 63 59   64          61         -3                   -5  1 -1  0  2  3  2 -2
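The decomposition in the table above can be checked numerically. The following is a minimal sketch in R, assuming the data frame is named `coag.df` as in the `boxplot` call; the helper object `decomp`, its column names and the use of `ave()` are choices made here for illustration and do not come from the notes.

```r
# Coagulation times for 24 animals on four diets, as listed above.
coag.df <- data.frame(
  coag = c(62, 60, 63, 59,
           63, 67, 71, 64, 65, 66,
           68, 66, 71, 67, 68, 68,
           56, 62, 60, 61, 63, 64, 63, 59),
  diet = factor(rep(c("A", "B", "C", "D"), times = c(4, 6, 6, 8)))
)

grand     <- mean(coag.df$coag)               # grand mean of all 24 values
diet_mean <- ave(coag.df$coag, coag.df$diet)  # diet mean repeated for each animal
treatment <- diet_mean - grand                # treatment deviations
residual  <- coag.df$coag - diet_mean         # within-diet residuals

# One row per animal: data = grand + treatment + residual
decomp <- data.frame(diet = coag.df$diet, data = coag.df$coag,
                     grand = grand, treatment = treatment, residual = residual)
print(decomp)
```

Every row of `decomp` satisfies data = grand + treatment + residual, which is the table decomposition written out one animal at a time.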

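A, T and R are perpendicular 24-vectors.

Pythagoras:

    Y = (y_ij)               data
    A = (ȳ..)                average
    T = (ȳ_i. - ȳ..)         treatment
    R = (y_ij - ȳ_i.)        residual

$$Y = A + T + R, \qquad \text{d.o.f.: } n = 1 + (a - 1) + (n - a)$$

$$\sum_i \sum_j y_{ij}^2 = \sum_i \sum_j \bar{y}_{\cdot\cdot}^2 + \sum_i \sum_j (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2 \qquad (1)$$

$$SS = SS_{ave} + SS_{among} + SS_{within}$$

$$SS_{total} = SS - SS_{ave} = \sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2 = SS_{among} + SS_{within}$$

Model Checking

Model I assumptions:

$$y_{ij} = \mu_i + \epsilon_{ij}, \quad i = 1, \ldots, a, \quad j = 1, \ldots, n_i, \quad \epsilon_{ij} \text{ independent } N(0, \sigma^2)$$

Look at the residuals $e_{ij} = y_{ij} - \bar{y}_{i\cdot}$, usually via plots, e.g.

1. Check normality via normal quantile plots.
2. Check (visually) constancy of variance ($\sigma_i^2 \equiv \sigma^2$): plot residuals versus fitted values, $e_{ij}$ (y-axis) vs $\bar{y}_{i\cdot}$ (x-axis), and look for evidence that the spread of $e_{ij}$ depends on $\bar{y}_{i\cdot}$.
3. Time sequence: check the independence assumption for an observation $y_t$ taken at time $t$ by plotting $e_t$ (y-axis) vs $t$.

Remark: alternate form of Model I,

$$y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad \text{with identifiability constraint} \quad \sum_{i=1}^{a} n_i \alpha_i = 0 \qquad (2)$$

Planned comparisons: single pairs of means, or contrasts specified in advance.

Difference of means, e.g. $\mu_i - \mu_j$: like a two-sample test, but we are in the ANOVA model and hence use the pooled variance estimate $s^2$ for the common variance $\sigma^2$.

$100(1 - \alpha)\%$ CI:

$$\bar{y}_{i\cdot} - \bar{y}_{j\cdot} \pm t_{\alpha[\nu]} \, SE_{\bar{y}_{i\cdot} - \bar{y}_{j\cdot}}, \qquad \nu = n - a, \qquad SE_{\bar{y}_{i\cdot} - \bar{y}_{j\cdot}} = s \sqrt{\frac{1}{n_i} + \frac{1}{n_j}}$$

ANOVA - Part I - One way //03

$$y_{ij} = \mu_i + \epsilon_{ij}, \qquad i = 1, \ldots, a \ \text{(treatments, groups)}, \qquad j = 1, \ldots, n_i \ \text{(observations within groups)}$$

$\mu_i$ fixed group means, $\epsilon_{ij} \sim N(0, \sigma^2)$, $n = \sum_{i=1}^{a} n_i = n_1 + n_2 + \cdots + n_a$

(Unknown parameters of the model: $\mu_1, \ldots, \mu_a, \sigma^2$)

Null hypothesis of interest here: $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$ vs $H_1$: not all equal.

Notes: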

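1. A common error variance in all groups is assumed.
2. $a = 2$ reduces to the two sample problem ($\sigma_1^2 = \sigma_2^2$).

Variation within treatments

For the $i$th group:

    observations   y_i1, y_i2, ..., y_in_i
    ave            ȳ_i.
    var            s_i^2
    SS             (n_i - 1) s_i^2 = sum over j of (y_ij - ȳ_i.)^2
    dof            n_i - 1

$$s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$$

is an unbiased estimate of $\sigma^2$ for the $i$th group. But we have $i = 1, \ldots, a$ independent estimates of the common error variance $\sigma^2$.

Pooled estimate of $\sigma^2$ = weighted average (by d.o.f.) of the estimates:

$$s^2 = \frac{\sum_{i=1}^{a} (n_i - 1) s_i^2}{\sum_{i=1}^{a} (n_i - 1)} = \frac{SS_{within}}{n - a} = MS_{within} \quad (\text{"mean squares within groups"})$$

$$SS_{within} = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 = \text{sum of squares within treatments}$$

$$n - a = \sum_{i=1}^{a} (n_i - 1) = \text{d.o.f. within treatments}$$

Variation among treatments

- compare the group sample means $\bar{y}_{i\cdot}$ to the overall sample mean $\bar{y}_{\cdot\cdot} = \frac{1}{n} \sum_{i=1}^{a} n_i \bar{y}_{i\cdot} = \frac{1}{n} \sum_i \sum_j y_{ij}$

Motivation: suppose that in fact the $\mu_i$ are all the same ($\mu_i \equiv \mu$) and the $n_i$ are all the same. Then $\bar{y}_{i\cdot} \sim (\mu, \sigma^2 / n_i)$, and we look at this new sample of $a$ observed $\bar{y}_{i\cdot}$'s and compute their estimated variance. Then we would have another estimate of $\sigma^2$, separate from the pooled estimate described above:

$$\hat{\sigma}^2 = n_i \cdot \frac{1}{a - 1} \sum_{i=1}^{a} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \frac{1}{a - 1} \sum_{i=1}^{a} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2$$

This suggests defining

$$SS_{among} = \sum_{i=1}^{a} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \text{sum of squares among treatments}$$

$$MS_{among} = \frac{SS_{among}}{a - 1} = \text{mean square among treatments}, \qquad a - 1 = \text{d.o.f. among treatments}$$

Thus if $H_0: \mu_1 = \cdots = \mu_a$ is true, we have two estimates of variability: $MS_{among}$ ($a - 1$ dof) and $MS_{within}$ ($n - a$ dof). If $H_0$ is false, due to the variation among the $\mu_i$, we expect $F = MS_{among} / MS_{within}$ to be larger than 1.

Total variation and an identity

$$SS_{total} = \sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2 + \sum_i \sum_j (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = SS_{within} + SS_{among}$$

- a decomposition of variation about the grand mean $\bar{y}_{\cdot\cdot}$ into the component of variation about the individual group means and the component between sample means.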

- leads to an analysis of variance table, illustrated on the blood data.

    Source of Variation          SS                 d.o.f.       MS                         F
    Among (Between) treatments   SS_among  = 228    a - 1 = 3    MS_among  = 228/3  = 76    MS_among/MS_within = 76/5.6 = 13.6
    Within treatments            SS_within = 112    n - a = 20   MS_within = 112/20 = 5.6
    Total                        SS_total  = 340    n - 1 = 23

ANOVA F-test: to test $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$, calculate

$$F \text{ statistic} = \frac{MS_{among}}{MS_{within}} \qquad (\text{"F" for Fisher})$$

If $H_0$ is true, F has the $F_{a-1,\,n-a}$ distribution; if $H_0$ is false, F tends to be larger, so reject $H_0$ if F is sufficiently large.

P-value of F test $= P(F_{a-1,\,n-a} > F_{obs})$

- two parameters: $F_{\alpha[\nu_1,\nu_2]}$, $\nu_1$ = numerator d.o.f., $\nu_2$ = denominator d.o.f.
- for exact P-values, use software; in R:

    > pf(13.6,3,20)
    [1] 0.9999541
    > pf(13.6,3,20,lower.tail=F)
    [1] 4.594599e-05
    > qf(0.999,3,20)
    [1] 8.09838

Geometric Picture of Variance Decomposition

    > coag.aov <- lm(coag~diet,data=coag.df)
    > anova(coag.aov)
    Analysis of Variance Table

    Response: coag
              Df Sum Sq Mean Sq F value    Pr(>F)
    diet       3  228.0    76.0  13.571 4.658e-05 ***
    Residuals 20  112.0     5.6
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The data consist of blood coagulation times for 24 animals fed one of 4 different diets. In the following I write the data in a table and decompose the table into a sum of several tables. The 4 rows of the table correspond to Diets A, B, C and D. We could use a vector notation and then want to think of stacking up these 24 values into a single column vector.

    Diet  data                      grand mean  diet mean  treatment deviation  residuals
    A     62 60 63 59               64          61         -3                    1 -1  2 -2
    B     63 67 71 64 65 66         64          66         +2                   -3  1  5 -2 -1  0
    C     68 66 71 67 68 68         64          68         +4                    0 -2  3 -1  0  0
    D     56 62 60 61 63 64 63 59   64          61         -3                   -5  1 -1  0  2  3  2 -2

Taking squared lengths, the left hand side (the data table) gives the uncorrected total sum of squares. The first term on the right hand side corresponds to the grand mean. This term is sometimes put in ANOVA tables as the Sum of Squares due to the Grand Mean, but it is usually subtracted from the total to produce the Total Sum of Squares that we usually put at the bottom of the table, often called the Corrected (or Adjusted) Total Sum of Squares.
In this case the corrected sum of squares is the squared length of the data table after the grand mean has been subtracted, which is 340.

    > sum(coag^2)-24*64^2
    [1] 340

The second term on the right hand side of the equation has squared length 228 (which is the Treatment Sum of Squares). This is the squared length of the vector of individual sample means minus the grand mean.

    > sum((pred-64)^2)
    [1] 228

The last vector of the decomposition is called the residual vector and has squared length 112.

    > sum(res^2)
    [1] 112
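The three commands above rely on objects `coag`, `pred` and `res` that the notes do not define. The sketch below is one way to make them runnable, assuming `pred` holds the fitted diet means and `res` the residuals of the one-way fit; the names `diet`, `fit`, `ss_total`, `ss_among` and `ss_within` are introduced here for illustration only.

```r
# Coagulation data, stacked into one vector with a diet label per animal.
coag <- c(62, 60, 63, 59,
          63, 67, 71, 64, 65, 66,
          68, 66, 71, 67, 68, 68,
          56, 62, 60, 61, 63, 64, 63, 59)
diet <- factor(rep(c("A", "B", "C", "D"), times = c(4, 6, 6, 8)))

fit  <- lm(coag ~ diet)   # one-way model: one mean per diet
pred <- fitted(fit)       # assumed meaning of `pred`: fitted diet means
res  <- residuals(fit)    # assumed meaning of `res`: within-diet residuals

ss_total  <- sum(coag^2) - 24 * 64^2   # corrected total sum of squares
ss_among  <- sum((pred - 64)^2)        # treatment sum of squares
ss_within <- sum(res^2)                # residual sum of squares

print(c(total = ss_total, among = ss_among, within = ss_within))
```

The printed values should agree with the 340, 228 and 112 quoted above, and the last two add up to the first.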

The corrected total sum of squares can be obtained from the uncorrected one:

$$\sum (y - \bar{y})^2 = \sum (y - \bar{y})(y - \bar{y}) = \sum (y^2 - 2 y \bar{y} + \bar{y}^2) = \sum y^2 - 2 \bar{y} \sum y + n \bar{y}^2 = \sum y^2 - 2 n \bar{y}^2 + n \bar{y}^2 = \sum y^2 - n \bar{y}^2$$

Corresponding to the decomposition of the total squared length of the data vector is a decomposition of its dimension, 24, into the dimensions of subspaces. For instance, the grand mean vector is always a multiple of the single vector all of whose entries are 1; this describes a one dimensional space. The second vector, of deviations from the grand mean, lies in the three dimensional subspace of tables which are constant in each row and have a total equal to 0. Similarly, the vector of residuals lies in a 20 dimensional subspace, the set of all tables whose rows each sum to 0. This decomposition of dimensions is the decomposition of degrees of freedom. So 24 = 1 + 3 + 20, and the degrees of freedom for treatment and error are 3 and 20 respectively. The vector whose squared length is the Corrected Total Sum of Squares lies in the 23 dimensional subspace of vectors whose entries sum to 0; this produces the 23 total degrees of freedom in the usual ANOVA table.

[Figure: geometric picture of the decomposition, showing the vectors A, A + T and Y = A + T + R.]

A, T and R are perpendicular 24-vectors.

Pythagoras:

    Y = (y_ij)               data
    A = (ȳ..)                average
    T = (ȳ_i. - ȳ..)         treatment
    R = (y_ij - ȳ_i.)        residual

$$Y = A + T + R, \qquad \text{d.o.f.: } n = 1 + (a - 1) + (n - a)$$

$$\sum_i \sum_j y_{ij}^2 = \sum_i \sum_j \bar{y}_{\cdot\cdot}^2 + \sum_i \sum_j (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_i \sum_j (y_{ij} - \bar{y}_{i\cdot})^2 \qquad (3)$$

$$SS = SS_{ave} + SS_{among} + SS_{within}$$

$$SS_{total} = SS - SS_{ave} = \sum_i \sum_j (y_{ij} - \bar{y}_{\cdot\cdot})^2 = SS_{among} + SS_{within}$$
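The perpendicularity of A, T and R and the Pythagoras identity can be verified directly on the coagulation data. This is a minimal sketch in R under the assumption that the 24 values are stacked diet by diet; the object names `A`, `Tr` (used instead of `T`, which R treats as shorthand for `TRUE`), `R`, `F_obs` and `p_val` are illustrative and not part of the notes.

```r
coag <- c(62, 60, 63, 59,
          63, 67, 71, 64, 65, 66,
          68, 66, 71, 67, 68, 68,
          56, 62, 60, 61, 63, 64, 63, 59)
diet <- factor(rep(c("A", "B", "C", "D"), times = c(4, 6, 6, 8)))

n <- length(coag)    # 24 observations
a <- nlevels(diet)   # 4 treatments

A  <- rep(mean(coag), n)     # average vector
Tr <- ave(coag, diet) - A    # treatment vector: diet mean minus grand mean
R  <- coag - ave(coag, diet) # residual vector

# Perpendicular vectors have zero inner products.
inner <- c(A.T = sum(A * Tr), A.R = sum(A * R), T.R = sum(Tr * R))
print(inner)

# Pythagoras: squared length of Y equals the sum of the squared lengths.
lhs <- sum(coag^2)
rhs <- sum(A^2) + sum(Tr^2) + sum(R^2)
print(c(lhs = lhs, rhs = rhs))

# F ratio from the squared lengths and their degrees of freedom.
F_obs <- (sum(Tr^2) / (a - 1)) / (sum(R^2) / (n - a))
p_val <- pf(F_obs, a - 1, n - a, lower.tail = FALSE)
print(c(F = F_obs, p = p_val))
```

The degrees of freedom used in the F ratio, a - 1 = 3 and n - a = 20, are the dimensions of the subspaces containing the treatment and residual vectors.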