Stat 5303 (Oehlert): Random effects 1

Cmd> print(sire,format:"f4.0")
These are the data from exercise 5- of Kuehl (1994, Duxbury). There are 5 bulls selected at random, and we observe the birth weights of male calves. Sire is considered random.
sire: () (6) ()

Cmd> print(wts,format:"f4.0")
wts: () (6) ()

Cmd> sire<-factor(sire)

Cmd> anova("wts=sire")
This is the ordinary ANOVA. It doesn't know anything about fixed or random effects. The DF, SS, and MS are correct.
Model used is wts=sire
DF SS MS
CONSTANT .758e e+05
sire
ERROR

Cmd> resvsrankits()
Normality is not too bad.
[Plot: Standardized Residuals vs Normal Scores]

Cmd> resvsyhat()
Constant variance is a little bit doubtful, but no power family transformation will help much since the ratio of largest to smallest response is only about .
[Plot: Standardized Residuals vs Fitted Values (Yhat)]

Cmd> # In order to do random effects analysis, we need some new commands.
The first is ems(). ems() computes expected mean squares for models with random and/or fixed effects. Data may be unbalanced. The basic usage is to give the model and then a keyword phrase random:names, where names is a character vector with the names of the random effects. Several more specialized alternatives are also available.
The command mixed() does mixed (random and fixed) effects anova, computing the correct denominator for tests. The basic arguments are the same as for ems(). You can also give it the output of ems() as an argument.
The third command is varcomp(). varcomp() computes estimates of variance components, their standard errors, and approximate degrees of freedom. Same arguments as ems() or mixed().
The fourth command is reml(), which does restricted maximum likelihood estimation of fixed and random effects. Same basic arguments as the others.
All of these commands are available from the Statistics :: ANOVA modeling submenu.

Cmd> ems("wts=sire",random:"sire")
OK, so let's get the expected mean squares. The arguments are the model, and then random:names, where names is a vector of character strings giving the names of the random terms. The last error term is automatically random. These data are balanced, so the EMS could be calculated with the Hasse diagram.
EMS(CONSTANT) = V(ERROR) + 8V(sire) + 40Q(CONSTANT)
EMS(sire) = V(ERROR) + 8V(sire)
EMS(ERROR) = V(ERROR)

Cmd> mixed("wts=sire",random:"sire")
The mixed macro produces the correct anova for problems with random and/or fixed effects. There is a row for every term in the anova model. There are columns for the DF and MS of each term, the DF and MS of the error or denominator for each term, the F, and the p-value. Here we see that sire is reasonably significant.
DF MS Error DF Error MS F P value
CONSTANT .76e
sire
ERROR MISSING MISSING

Cmd> varcomp("wts=sire",random:"sire")
Here are the estimated variance components. The estimate for ERROR is just the MSE itself. For sire, we have (MS(sire) - MSE)/8. Standard errors are computed from the variances of the anova mean squares and the coefficients used in the variance component estimates (0 and 1 for ERROR, and 1/8 and -1/8 for sire). Contrast the fact that sire is fairly significant with the fact that the estimate of the sire variance component was less than one SE from zero. This is not necessarily a contradiction, because the estimate-plus-or-minus-SEs form of confidence interval is not appropriate for variance components based on few df.
Estimate SE DF
sire
ERROR

Cmd> reml("wts=sire",random:"sire")
There are several other ways to estimate the fixed and random components of an ANOVA. The reml() command does restricted maximum likelihood. Its estimates of variance components will always be nonnegative. The output gives the estimated fixed effects (called theta), the estimated variance components (called phi), the variances for theta and phi, the degrees of freedom for the variance components, the estimated random effects (called gamma), the variances of the estimated random effects, the log likelihood for the model, and the residuals (data minus fixed effects; that is, the residuals contain all random effects). In some cases (such as this one), the REML estimates and the usual ANOVA-based estimates will agree. This will not always be true.
component: theta
CONSTANT 8.55
component: phi
sire 6.75
ERROR
component: thetavar
(,)
component: phivar
sire ERROR
sire
ERROR
component: phidf
sire .767
ERROR 5
component: gamma
(,) 0.78 (,) (,) -.98

(4,) (5,)
component: gammavar
()
component: loglike
(,) -79.
component: residuals
(,) -.55 (,) 7.45 (,) (4,) 0.45 (5,) 6.45 (6,) 0.45 (7,) (8,) (9,) (0,) 9.45 (,) .45 (,) 0.45 (,) 5.45 (4,) .45 (5,) 5.45 (6,) .45 (7,) (8,) -.55 (9,) -.55 (0,) (,) (,) -.55 (,) (4,) 7.45 (5,) (6,) (7,) (8,) -.55 (9,) (0,) 8.45 (,) 8.45 (,) 8.45 (,) -.55 (4,) (5,) 7.45 (6,) .45 (7,) .45 (8,) 0.45 (9,) .45 (40,) -7.55
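The varcomp() calculation above is just the method of moments applied to the ANOVA mean squares. Here is a sketch of the same arithmetic in Python rather than MacAnova, for a balanced one-way random model with 5 groups of 8; the data are simulated stand-ins, not the sire data.

```python
# Method-of-moments (ANOVA) variance component estimates for a balanced
# one-way random-effects model, mirroring what varcomp() reports.
import numpy as np

rng = np.random.default_rng(1)
a, n = 5, 8                      # groups, observations per group
# simulated response: error sd 8, group-effect sd 5 (illustrative values)
y = rng.normal(100, 8, size=(a, n)) + rng.normal(0, 5, size=(a, 1))

grand = y.mean()
ms_group = n * ((y.mean(axis=1) - grand) ** 2).sum() / (a - 1)       # MS(group)
ms_error = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (a * (n - 1))

# EMS(group) = V(ERROR) + 8 V(group), so solve for the components:
var_error = ms_error
var_group = (ms_group - ms_error) / n   # can come out negative when F < 1
print(var_error, var_group)
```

When MS(group) is smaller than MSE this estimate goes negative, which is exactly the situation varcomp() can produce and reml() avoids.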

Cmd> reml("wts=sire",random:"sire",usemle:t)
You can also get maximum likelihood estimates by using the keyword phrase usemle:t. The ordinary REML estimates are more popular, because the REML estimates of variance components are less biased.
component: theta
CONSTANT 8.55
component: phi
sire ERROR
component: thetavar
CONSTANT
CONSTANT
component: phivar
sire ERROR
sire
ERROR
component: phidf
sire .675
ERROR 5
component: gamma
(,) (,) (,) -.68 (4,) (5,)
component: gammavar
()
component: loglike
(,)

Cmd> print(lot,format:"f4.0",labels:f);lot<-factor(lot)
These are the data from problem 5- of Kuehl (1994, Duxbury). There are 8 randomly chosen lots of cotton seed, and 4 samples are taken from each lot. The response is the amount of aflatoxin on the seeds.
lot:

Cmd> print(at,format:"f4.0",labels:f)
at:
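The claim that REML variance estimates are less biased than ML shows up already in the simplest possible case: estimating a single variance from an i.i.d. normal sample, where ML divides the sum of squares by n and the REML-style estimate divides by n - 1. A tiny Python illustration with simulated data (not data from the handout):

```python
# ML vs REML-style bias in the simplest case: for an i.i.d. normal sample,
# ML divides the SS about the mean by n (biased down), REML by n - 1 (unbiased).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(0, 2, size=10)
ss = ((x - x.mean()) ** 2).sum()
print(ss / len(x), ss / (len(x) - 1))   # ML estimate < REML estimate
```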

Cmd> anova("at=lot")
Here is the base anova. Again, it does not know about random or fixed terms. In this case, ERROR is the correct denominator for lot.
Model used is at=lot
DF SS MS
CONSTANT
lot
ERROR

Cmd> resvsyhat()
Constant variance looks pretty good.
[Plot: Standardized Residuals vs Fitted Values (Yhat)]

Cmd> resvsrankits()
A little hitch in the rankit plot, but not too bad.
[Plot: Standardized Residuals vs Normal Scores]

Cmd> ems("at=lot",random:"lot")
Here are the expected mean squares. lot is random. Again, these data are balanced (4 samples per lot), so we could get these by hand.
EMS(CONSTANT) = V(ERROR) + 4V(lot) + 32Q(CONSTANT)
EMS(lot) = V(ERROR) + 4V(lot)
EMS(ERROR) = V(ERROR)

Cmd> mixed("at=lot",random:"lot")
Here is the ANOVA with correct denominators. lot is highly significant.
DF MS Error DF Error MS F P value
CONSTANT 5.44e
lot e-05
ERROR MISSING MISSING

Cmd> varcomp("at=lot",random:"lot")
Here are the estimated variance components. Again, even though lot is highly significant, its estimated variance component is less than two SE from zero.
Estimate SE DF
lot
ERROR

Cmd> emsout<-ems("at=lot",random:"lot",keep:T)
The keep:T keyword phrase makes ems return its information as a structure instead of printing it out.

Cmd> mixed(emsout)
We can use this ems output structure as input to mixed or varcomp. Most of the computation in mixed and varcomp is just doing the EMS. So, if the ems is slow and/or complicated, it might make sense to do it once and save the output.
DF MS Error DF Error MS F P value
CONSTANT 5.44e
lot e-05
ERROR MISSING MISSING

Cmd> varcomp(emsout)
Estimate SE DF
lot
ERROR
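The test mixed() performs for lot can be sketched in Python: form F as the ratio of MS(lot) to its EMS-determined denominator, here MS(ERROR), and refer it to an F distribution. The mean squares below are made-up illustrative values, not the aflatoxin results.

```python
# F test for a one-way random effect: numerator and denominator chosen so
# their EMS differ only by the component being tested.
from scipy import stats

ms_lot, df_lot = 300.0, 7       # 8 lots -> 7 df (illustrative MS value)
ms_error, df_error = 50.0, 24   # 8 lots x 4 samples -> 24 error df

F = ms_lot / ms_error
p = stats.f.sf(F, df_lot, df_error)   # upper-tail p-value
print(F, p)
```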

Cmd> print(mohms,format:"f5.0",labels:f)
These are data from problem 6.8 of Hicks and Turner (1999, Oxford). Ten resistors are chosen at random, and three operators are chosen at random. Each operator measures the resistance of each resistor twice, with the 60 measurements made in random order. Response is in milliohms.
mohms:

Cmd> print(oper,format:"f5.0",labels:f)
oper:

Cmd> print(part,format:"f5.0",labels:f)
part:

Cmd> anova("mohms=oper*part")
Basic ANOVA of the resistor data.
Model used is mohms=oper*part
DF SS MS
CONSTANT 9.809e e+06
oper
part
oper.part
ERROR

Cmd> chplot(mohms-residuals,residuals,oper)
Here is a problem. One operator tends to measure a bit low, and he has more variance as well. This may cause some problems later, and no reasonable power transformation will fix this.
[Plot: RESIDUALS vs fitted values, coded by oper]

Cmd> ems("mohms=oper*part",random:vector("oper","part"))
Here are the EMS. We can see that the two-factor interaction is the appropriate denominator for main effects.
EMS(CONSTANT) = V(ERROR) + 2V(oper.part) + 6V(part) + 20V(oper) + 60Q(CONSTANT)
EMS(oper) = V(ERROR) + 2V(oper.part) + 20V(oper)
EMS(part) = V(ERROR) + 2V(oper.part) + 6V(part)
EMS(oper.part) = V(ERROR) + 2V(oper.part)
EMS(ERROR) = V(ERROR)

Cmd> mixed("mohms=oper*part",random:vector("oper","part"))
There is strong evidence for variation among operators, and there is no evidence of variation between parts (that's good) or an interaction.
DF MS Error DF Error MS F P value
CONSTANT 9.80e
oper e-09
part
oper.part
ERROR MISSING MISSING

Cmd> varcomp("mohms=oper*part",random:vector("oper","part"))
Note the negative estimated variance component. This occurs when the F is less than 1. Also note that operator is highly significant, but less than two SE from zero.
Estimate SE DF
oper
part
oper.part
ERROR
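For a balanced two-way random model, the EMS coefficients follow a simple counting rule: each variance component enters a term's EMS with coefficient equal to the number of observations at one level of that component. A sketch in Python, with term names chosen to match the resistor example (3 operators, 10 parts, 2 replicates):

```python
# EMS coefficient table for a balanced two-way random model.
# The coefficient of V(term) is the number of observations per level of
# that term: b*n per operator, a*n per part, n per oper.part cell, 1 per
# error observation.
a, b, n = 3, 10, 2   # operators, parts, replicates

ems = {
    "oper":      {"ERROR": 1, "oper.part": n, "oper": b * n},
    "part":      {"ERROR": 1, "oper.part": n, "part": a * n},
    "oper.part": {"ERROR": 1, "oper.part": n},
    "ERROR":     {"ERROR": 1},
}
# The correct denominator for a term is the MS whose EMS matches it with
# the term's own component removed -- oper.part for both main effects.
print(ems["oper"])   # {'ERROR': 1, 'oper.part': 2, 'oper': 20}
```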

Cmd> reml("mohms=oper*part",random:vector("oper","part"))
Here is the REML fit to the same data. Note that two of the variance components are estimated as zero; only the operator and error variance components are nonzero.
component: theta
CONSTANT
component: phi
oper
part 0
oper.part 0
ERROR 40.
component: thetavar
CONSTANT
CONSTANT 6.0
component: phivar
oper part oper.part ERROR
oper
part
oper.part
ERROR
component: phidf
oper .898
part 0
oper.part 0
ERROR 57
component: gamma
(,) (,) (,) .7 (4,) 0 (5,) 0 (6,) 0 (7,) 0 (8,) 0 (9,) 0 (0,) 0 (,) 0 (,) 0 (,) 0 (4,) 0 (5,) 0 (6,) 0 (7,) 0 (8,) 0 (9,) 0 (0,) 0 (,) 0 (,) 0 (,) 0 (4,) 0 (5,) 0 (6,) 0 (7,) 0 (8,) 0

(9,) 0 (0,) 0 (,) 0 (,) 0 (,) 0 (4,) 0 (5,) 0 (6,) 0 (7,) 0 (8,) 0 (9,) 0 (40,) 0 (4,) 0 (4,) 0 (4,) 0
component: gammavar
() (6) () (6) () (6) () (6) (4)
component: loglike
(,)

Cmd> invchi(vector(1-.05/2,.05/2),30)
To compute a confidence interval for an EMS, we need the upper and lower E/2 percent points of a chisquare. The EMS of MSE is σ², so we can use these percent points to form a confidence interval for σ².
()

Cmd> 30*5.68/invchi(vector(1-.05/2,.05/2),30)
Multiply the mean square by its degrees of freedom and divide by the upper and lower percent points from chisquare to get the confidence interval. Here, even with 30 df, the interval spans a factor of .
()

Cmd> 2*56/invchi(vector(1-.05/2,.05/2),2)
We can do an analogous computation for the EMS of any MS; it's just that most of these aren't of much interest. Here we form a 95% interval for σ² + 2σ²αβ + 20σ²α, the EMS for the MS of operator. This EMS is not of too much interest, and with only 2 df, the interval is a mile wide.
()

Cmd> invf(vector(1-.05/2,.05/2),2,18)
We can compute confidence intervals for the ratio of two EMS's using upper and lower F percent points. Let's get an interval for the ratio of the EMS for operator to the EMS for operator by part; this has 2 and 18 degrees of freedom.
()

Cmd> 6.7/invF(vector(1-.05/2,.05/2),2,18)
Divide the F-ratio (MS-oper over MS-oper.part) by the F percent points. This produces an interval for EMS-oper/EMS-oper.part, or (σ² + 2σ²αβ + 20σ²α)/(σ² + 2σ²αβ) = 1 + 20σ²α/(σ² + 2σ²αβ).
()

Cmd> (6.7/invF(vector(1-.05/2,.05/2),2,18)-1)/20
Subtract 1 and divide by 20 to get a confidence interval for σ²α/(σ² + 2σ²αβ). Note that the largest plausible ratio is almost 00 times the smallest!
()
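The chisquare interval for an EMS is easy to check in Python with scipy; the MSE and df below are stand-ins, not the handout's values.

```python
# Chisquare confidence interval for sigma^2: df*MSE divided by the upper
# and lower percent points of a chisquare on the MSE's df.
from scipy import stats

mse, df = 40.0, 30
lo = df * mse / stats.chi2.ppf(0.975, df)   # divide by the upper point
hi = df * mse / stats.chi2.ppf(0.025, df)   # divide by the lower point
print(lo, hi)
```

Even with 30 df the interval is wide; with very few df it becomes nearly uninformative, which is the point the handout makes for the operator MS.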

Cmd> invf(vector(1-.05/4,.05/4),2,18)
For variance components with exact F-tests (such as σ²α here), we can combine two E/2 confidence intervals to construct a E interval for the component of interest. We need F percent points with numerator and denominator df from the two MS, and chisquare percent points for the numerator df.
()

Cmd> invchi(vector(1-.05/4,.05/4),2)
()

Cmd> # We use the numerator df (2), the numerator MS (56), the observed F (6.7), the multiplier for the variance component of interest in its EMS (20 for σ²α), and the upper and lower F and chisquare percent points.

Cmd> 2*56*(1-5.645/6.7)/20/8.764
Here's the lower endpoint.
() 6.97

Cmd> 2*56*(1-.0588/6.7)/20/.0558
Here's the upper endpoint. Our estimate is in the interval, but the maximum is almost 400 times the minimum!
() 60.5

Cmd> .9*76.78/invchi(vector(1-.05/2,.05/2),.9)
As a simple, but crude, approximation, we can use the estimated variance component and its approximate degrees of freedom as if it were a simple mean square.
()

Cmd> 1-cumf(50/(50+20*10)*invf(.95,2,18),2,18)
Power is fairly simple for random effects. You need the probability that an F (MS/MS) is bigger than (EMS/EMS) times the rejection cutoff. Here, suppose that σ² = 50, σ²αβ = 0, and σ²α = 10. The test has 2 and 18 df, and we get power about .5.
()
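The power calculation can be replayed in Python. The variance component values here are assumptions chosen to match the worked numbers above (σ² = 50, σ²αβ = 0, σ²α = 10, test with 2 and 18 df), not estimates from data.

```python
# Power of the F test for a random effect: under the alternative, the
# observed F times EMS(denominator)/EMS(numerator) has an F distribution,
# so power = P(F > ratio * Fcrit).
from scipy import stats

df1, df2 = 2, 18
s2_err, s2_ab, s2_a = 50.0, 0.0, 10.0             # assumed components
ratio = (s2_err + 2 * s2_ab) / (s2_err + 2 * s2_ab + 20 * s2_a)
fcrit = stats.f.ppf(0.95, df1, df2)               # 5% rejection cutoff
power = stats.f.sf(ratio * fcrit, df1, df2)
print(power)
```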

Cmd> # Let's look at how often a confidence interval for error variance covers the true variance. We'll work with 5 degrees of freedom and consider 90% and 95% intervals for σ². The intervals are formed by dividing the error SS by chisquare percent points. When everything works right, the 95% intervals should miss 2.5% each high and low, and the 90% intervals should miss 5% each high and low.

Cmd> lo90<-1/invchi(.95,5);hi90<-1/invchi(.05,5)

Cmd> lo95<-1/invchi(.975,5);hi95<-1/invchi(.025,5)

Cmd> lo90;hi90
Here are the factors.
() ()

Cmd> lo95;hi95
() ()

Cmd> # We will take 10000 samples of size 6. For each sample of size 6 we'll compute the SS around the mean (with 5 df), and compute confidence intervals for the variance.

Cmd> sum(lo90*ss>1)/10000
This is for normally distributed data. We get about the fraction of misses high or low that we expect.
()

Cmd> sum(lo95*ss>1)/10000
() 0.07

Cmd> sum(hi90*ss<1)/10000
()

Cmd> sum(hi95*ss<1)/10000
() 0.045
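The coverage simulation is easy to reproduce in Python (numpy in place of MacAnova; the seed and replication count are my choices, and the 95% interval is shown):

```python
# Coverage of the chisquare interval for sigma^2 = 1: samples of size 6,
# SS about the mean with 5 df, 10000 replications.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
nrep, n = 10000, 6
lo95 = 1 / stats.chi2.ppf(0.975, n - 1)
hi95 = 1 / stats.chi2.ppf(0.025, n - 1)

x = rng.standard_normal((nrep, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

miss_above = np.mean(lo95 * ss > 1)   # whole interval above sigma^2 = 1
miss_below = np.mean(hi95 * ss < 1)   # whole interval below sigma^2 = 1
print(miss_above, miss_below)
```

For normal data both miss rates should be near .025; the nonnormal cases that follow in the handout swap in heavy-tailed, skewed, or uniform data for x and show how badly this breaks.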

Cmd> plot(rankits(z),z)
Now some nonnormal data. Here is an NPP of 500 points from a distribution with longer tails than normally distributed data.
[Plot: normal probability plot of z]

Cmd> sum(lo90*ss>1)/10000
() 0.5

Cmd> sum(lo95*ss>1)/10000
()

Cmd> sum(hi90*ss<1)/10000
()

Cmd> sum(hi95*ss<1)/10000
()
These error rates are much too high. The 90% ci only has coverage about .7, and the 95% ci has coverage about .80.

Cmd> plot(rankits(z),z)
Here is an NPP of data with longer tails.
[Plot: normal probability plot of z]

Cmd> sum(lo90*ss>1)/10000
()

Cmd> sum(lo95*ss>1)/10000
() 0.57

Cmd> sum(hi90*ss<1)/10000
() 0.45

Cmd> sum(hi95*ss<1)/10000
()
Check the error rates. The 90% ci has coverage about .4, and the 95% ci has coverage about .5.

Cmd> plot(rankits(z),z)
Here is an NPP of data that are mildly asymmetric, but not terribly outlier prone.
[Plot: normal probability plot of z]

Cmd> sum(lo90*ss>1)/10000
()

Cmd> sum(lo95*ss>1)/10000
()

Cmd> sum(hi90*ss<1)/10000
()

Cmd> sum(hi95*ss<1)/10000
()
The errors are about .5 to times what they should be.

Cmd> plot(rankits(z),z)
Now we finish up with some short-tailed data from a uniform distribution.
[Plot: normal probability plot of z]

Cmd> sum(lo90*ss>1)/10000
()

Cmd> sum(lo95*ss>1)/10000
()

Cmd> sum(hi90*ss<1)/10000
() 0.00

Cmd> sum(hi95*ss<1)/10000
()
These error rates are a factor of 5 to 0 too small. Our coverage is actually greater than the nominal 90 or 95% when the errors are short tailed.


Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

10 Model Checking and Regression Diagnostics

10 Model Checking and Regression Diagnostics 10 Model Checking and Regression Diagnostics The simple linear regression model is usually written as i = β 0 + β 1 i + ɛ i where the ɛ i s are independent normal random variables with mean 0 and variance

More information

STAT 705 Chapter 19: Two-way ANOVA

STAT 705 Chapter 19: Two-way ANOVA STAT 705 Chapter 19: Two-way ANOVA Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 38 Two-way ANOVA Material covered in Sections 19.2 19.4, but a bit

More information

9 One-Way Analysis of Variance

9 One-Way Analysis of Variance 9 One-Way Analysis of Variance SW Chapter 11 - all sections except 6. The one-way analysis of variance (ANOVA) is a generalization of the two sample t test to k 2 groups. Assume that the populations of

More information

Battery Life. Factory

Battery Life. Factory Statistics 354 (Fall 2018) Analysis of Variance: Comparing Several Means Remark. These notes are from an elementary statistics class and introduce the Analysis of Variance technique for comparing several

More information

SMAM 314 Exam 42 Name

SMAM 314 Exam 42 Name SMAM 314 Exam 42 Name Mark the following statements True (T) or False (F) (10 points) 1. F A. The line that best fits points whose X and Y values are negatively correlated should have a positive slope.

More information

Content by Week Week of October 14 27

Content by Week Week of October 14 27 Content by Week Week of October 14 27 Learning objectives By the end of this week, you should be able to: Understand the purpose and interpretation of confidence intervals for the mean, Calculate confidence

More information

STAT 350: Geometry of Least Squares

STAT 350: Geometry of Least Squares The Geometry of Least Squares Mathematical Basics Inner / dot product: a and b column vectors a b = a T b = a i b i a b a T b = 0 Matrix Product: A is r s B is s t (AB) rt = s A rs B st Partitioned Matrices

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan COSC 341 Human Computer Interaction Dr. Bowen Hui University of British Columbia Okanagan 1 Last Topic Distribution of means When it is needed How to build one (from scratch) Determining the characteristics

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t t Confidence Interval for Population Mean Comparing z and t Confidence Intervals When neither z nor t Applies

More information

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal

Hypothesis testing, part 2. With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal Hypothesis testing, part 2 With some material from Howard Seltman, Blase Ur, Bilge Mutlu, Vibha Sazawal 1 CATEGORICAL IV, NUMERIC DV 2 Independent samples, one IV # Conditions Normal/Parametric Non-parametric

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Lecture 4. Random Effects in Completely Randomized Design

Lecture 4. Random Effects in Completely Randomized Design Lecture 4. Random Effects in Completely Randomized Design Montgomery: 3.9, 13.1 and 13.7 1 Lecture 4 Page 1 Random Effects vs Fixed Effects Consider factor with numerous possible levels Want to draw inference

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 03 The Chi-Square Distributions Dr. Neal, Spring 009 The chi-square distributions can be used in statistics to analyze the standard deviation of a normally distributed measurement and to test the

More information

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b). Confidence Intervals 1) What are confidence intervals? Simply, an interval for which we have a certain confidence. For example, we are 90% certain that an interval contains the true value of something

More information

Ch18 links / ch18 pdf links Ch18 image t-dist table

Ch18 links / ch18 pdf links Ch18 image t-dist table Ch18 links / ch18 pdf links Ch18 image t-dist table ch18 (inference about population mean) exercises: 18.3, 18.5, 18.7, 18.9, 18.15, 18.17, 18.19, 18.27 CHAPTER 18: Inference about a Population Mean The

More information

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Essential of Simple regression

Essential of Simple regression Essential of Simple regression We use simple regression when we are interested in the relationship between two variables (e.g., x is class size, and y is student s GPA). For simplicity we assume the relationship

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Lecture notes 13: ANOVA (a.k.a. Analysis of Variance)

Lecture notes 13: ANOVA (a.k.a. Analysis of Variance) Lecture notes 13: ANOVA (a.k.a. Analysis of Variance) Outline: Testing for a difference in means Notation Sums of squares Mean squares The F distribution The ANOVA table Part II: multiple comparisons Worked

More information

Harvard University. Rigorous Research in Engineering Education

Harvard University. Rigorous Research in Engineering Education Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected

More information

The Chi-Square Distributions

The Chi-Square Distributions MATH 183 The Chi-Square Distributions Dr. Neal, WKU The chi-square distributions can be used in statistics to analyze the standard deviation σ of a normally distributed measurement and to test the goodness

More information

Chapter 10: Chi-Square and F Distributions

Chapter 10: Chi-Square and F Distributions Chapter 10: Chi-Square and F Distributions Chapter Notes 1 Chi-Square: Tests of Independence 2 4 & of Homogeneity 2 Chi-Square: Goodness of Fit 5 6 3 Testing & Estimating a Single Variance 7 10 or Standard

More information

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2015 Notes

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2015 Notes Data Analysis Standard Error and Confidence Limits E80 Spring 05 otes We Believe in the Truth We frequently assume (believe) when making measurements of something (like the mass of a rocket motor) that

More information

Z-tables. January 12, This tutorial covers how to find areas under normal distributions using a z-table.

Z-tables. January 12, This tutorial covers how to find areas under normal distributions using a z-table. Z-tables January 12, 2019 Contents The standard normal distribution Areas above Areas below the mean Areas between two values of Finding -scores from areas Z tables in R: Questions This tutorial covers

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs)

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs) The One-Way Independent-Samples ANOVA (For Between-Subjects Designs) Computations for the ANOVA In computing the terms required for the F-statistic, we won t explicitly compute any sample variances or

More information

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 )

assumes a linear relationship between mean of Y and the X s with additive normal errors the errors are assumed to be a sample from N(0, σ 2 ) Multiple Linear Regression is used to relate a continuous response (or dependent) variable Y to several explanatory (or independent) (or predictor) variables X 1, X 2,, X k assumes a linear relationship

More information

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.] Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the

More information

- a value calculated or derived from the data.

- a value calculated or derived from the data. Descriptive statistics: Note: I'm assuming you know some basics. If you don't, please read chapter 1 on your own. It's pretty easy material, and it gives you a good background as to why we need statistics.

More information

Confidence Intervals 1

Confidence Intervals 1 Confidence Intervals 1 November 1, 2017 1 HMS, 2017, v1.1 Chapter References Diez: Chapter 4.2 Navidi, Chapter 5.0, 5.1, (Self read, 5.2), 5.3, 5.4, 5.6, not 5.7, 5.8 Chapter References 2 Terminology Point

More information

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 Part 1 of this document can be found at http://www.uvm.edu/~dhowell/methods/supplements/mixed Models for Repeated Measures1.pdf

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

STAT 328 (Statistical Packages)

STAT 328 (Statistical Packages) Department of Statistics and Operations Research College of Science King Saud University Exercises STAT 328 (Statistical Packages) nashmiah r.alshammari ^-^ Excel and Minitab - 1 - Write the commands of

More information

Nesting and Mixed Effects: Part I. Lukas Meier, Seminar für Statistik

Nesting and Mixed Effects: Part I. Lukas Meier, Seminar für Statistik Nesting and Mixed Effects: Part I Lukas Meier, Seminar für Statistik Where do we stand? So far: Fixed effects Random effects Both in the factorial context Now: Nested factor structure Mixed models: a combination

More information

Confidence Intervals. - simply, an interval for which we have a certain confidence.

Confidence Intervals. - simply, an interval for which we have a certain confidence. Confidence Intervals I. What are confidence intervals? - simply, an interval for which we have a certain confidence. - for example, we are 90% certain that an interval contains the true value of something

More information

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1

Notes for Week 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Notes for Wee 13 Analysis of Variance (ANOVA) continued WEEK 13 page 1 Exam 3 is on Friday May 1. A part of one of the exam problems is on Predictiontervals : When randomly sampling from a normal population

More information

Statistics 512: Applied Linear Models. Topic 9

Statistics 512: Applied Linear Models. Topic 9 Topic Overview Statistics 51: Applied Linear Models Topic 9 This topic will cover Random vs. Fixed Effects Using E(MS) to obtain appropriate tests in a Random or Mixed Effects Model. Chapter 5: One-way

More information

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2012 Notes

Data Analysis, Standard Error, and Confidence Limits E80 Spring 2012 Notes Data Analysis Standard Error and Confidence Limits E80 Spring 0 otes We Believe in the Truth We frequently assume (believe) when making measurements of something (like the mass of a rocket motor) that

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Lecture 14: ANOVA and the F-test

Lecture 14: ANOVA and the F-test Lecture 14: ANOVA and the F-test S. Massa, Department of Statistics, University of Oxford 3 February 2016 Example Consider a study of 983 individuals and examine the relationship between duration of breastfeeding

More information