VARIANCE COMPONENT ANALYSIS


T. KRISHNAN
Cranes Software International Limited
Mahatma Gandhi Road, Bangalore
krishnan.t@systat.com

1. Introduction

In an experiment to compare the yields of two varieties of wheat, 10 farms participated, and on each farm both varieties were grown. All 20 plots in the experiment were of equal area. The data on yield, in quintals, are given below:

Farm No.    Variety A    Variety B

Note that the yields of Variety A and Variety B are correlated, because the growing conditions for both varieties on a given farm are the same. A standard method for analyzing this kind of data is the paired t-test. Let $x_i$ be the yield of Variety A on the $i$th farm, and let $y_i$ be the yield of Variety B. The paired t-test computes the differences $z_i = x_i - y_i$ and checks whether their mean $\bar{z}$, scaled by its standard error, is far from 0, using the t distribution with 9 degrees of freedom. Let us perform this test. The results are:

Hypothesis Testing: Paired t-test
Paired Samples t-test on Variety A vs Variety B with 10 Cases
Alternative = 'not equal'

Sample No    Variety A    Variety B    Mean Difference    95% CI    SD of difference    t    df    p-value
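The paired t statistic described above can be sketched in a few lines. This is a minimal illustration, not the software output quoted in the text: the function name `paired_t` and the yields are made up for the example, and the p-value is left out because the t distribution is not part of the Python standard library.

```python
import math
import statistics


def paired_t(x, y):
    """Return (mean difference, SD of differences, t statistic, df)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    z = [a - b for a, b in zip(x, y)]      # differences z_i = x_i - y_i
    n = len(z)
    z_bar = statistics.mean(z)
    s_z = statistics.stdev(z)              # sample SD, divisor n - 1
    t = z_bar / (s_z / math.sqrt(n))       # mean divided by its standard error
    return z_bar, s_z, t, n - 1


# Hypothetical yields (quintals) for 10 farms; NOT the data from the text.
variety_a = [30.1, 28.4, 33.0, 29.5, 31.2, 27.8, 32.4, 30.9, 29.0, 31.7]
variety_b = [28.9, 27.5, 31.8, 29.9, 29.6, 26.4, 31.0, 30.2, 28.1, 30.3]

z_bar, s_z, t, df = paired_t(variety_a, variety_b)
print(f"mean difference = {z_bar:.3f}, SD = {s_z:.3f}, t = {t:.3f}, df = {df}")
```

The statistic is then referred to the t distribution with `df` degrees of freedom (9 for 10 farms) to obtain the two-sided p-value.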

This test assumes that the $z_i$'s are independently and identically distributed normal random variables, which is the case if, for example, each pair $(x_i, y_i)$ is independently distributed as bivariate normal $N(\mu_1, \mu_2, \Sigma)$, where $\Sigma$ is the covariance matrix. However, if we consider the data set for a moment, we can see that $\Sigma$ cannot be just any covariance matrix: it is highly likely that the yields of the two varieties on the same farm are positively correlated. Popular as it is, the paired t-test fails to take this extra information about the data into account. It collapses the pairs $(x_i, y_i)$ into the differences $z_i$, and thus does not use the correlation structure of the original data.

One way to remedy this loss of information is to assume that each measurement is made up of three components:

The effect of the farm. It is customary to express the effect of the $i$th farm as $\mu + \alpha_i$, where $\mu$ is called the mean effect, denoting the average level of yield over all farms, while $\alpha_i$ denotes the departure of the $i$th farm from this average.

The effect of the variety (this is where our interest lies). We denote the effect of the $j$th variety by $\beta_j$, for $j = 1, 2$.

Random error, which we call $\varepsilon_{ij}$.

So we have the model

Yield = Overall effect + Farm effect + Variety effect + Random error,

or, in symbols,

$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, where $i = 1, \ldots, 10$ and $j = 1, 2$.

Here $y_{ij}$ is the yield of the $j$th variety on the $i$th farm. Thus we have renamed $x_i$ as $y_{i1}$ and $y_i$ as $y_{i2}$. In this sort of situation, the focus of the analysis is to determine which variety is better (gives the greater yield), and by how much, over the collection of farms for which the exercise is being done.
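To see what collapsing the pairs does under this model, the difference for farm $i$ can be written out term by term. This is a short worked step that uses only the symbols already defined:

```latex
\begin{aligned}
z_i = y_{i1} - y_{i2}
    &= (\mu + \alpha_i + \beta_1 + \varepsilon_{i1})
     - (\mu + \alpha_i + \beta_2 + \varepsilon_{i2}) \\
    &= (\beta_1 - \beta_2) + (\varepsilon_{i1} - \varepsilon_{i2}).
\end{aligned}
```

The overall effect and the farm effect cancel, so the differences carry information about the variety contrast $\beta_1 - \beta_2$ and the random error only; nothing about the farm effects $\alpha_i$ survives.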
We are not particularly interested in these 10 farms as such; since experiments require experimental units (here, farms), farms inevitably come into the picture. Our interest in these farms extends only insofar as they represent the population of farms. In that spirit, we consider these farms to be a random sample from a population of farms. Thus the farm effects $\alpha_i$ are considered random, and their variation, which affects the comparison of varieties, is of interest. The effects $\mu$ and $\beta_j$ are fixed as before. A linear model in which some (or all) of the parameters are random is called a linear mixed model. Here the $\alpha_i$'s are called random effects, while $\mu$ and the $\beta_j$'s are called fixed effects. We assume that the $\alpha_i$'s and $\varepsilon_{ij}$'s are independent Gaussian (normal) random variables with zero mean. In this example we shall assume that the $\alpha_i$'s are distributed independently as $N(0, \sigma_a^2)$, while the $\varepsilon_{ij}$'s have independent $N(0, \sigma_e^2)$ distributions. It is easy to check that the correlation between the yields of the two varieties on the same farm is indeed positive under this model, since

$\mathrm{Cov}(y_{i1}, y_{i2}) = \mathrm{Var}(\alpha_i) = \sigma_a^2 > 0$.

The model that we have formulated here is called a Variance Component (VC) Model, because the variance of each observation is the sum of two variances. One way to carry out a VC analysis of this data set is to consider the data as a two-way classification of Farm × Variety and obtain the following ANOVA table.

Analysis of Variance table

Source     SS    Numerator df    Denominator df    Mean Square    F-ratio    p-value
Variety
Farm
Error

Notice that the F-test for Variety coincides with the earlier paired t-test ($F = t^2$) and the p-values are the same. In the VC model, $\mathrm{Var}(y_{ij}) = \sigma_a^2 + \sigma_e^2$ for all $i, j$, and $\mathrm{Cov}(y_{i1}, y_{i2}) = \sigma_a^2$ as noted earlier, so

$\mathrm{Var}(y_{i1} - y_{i2}) = 2\sigma_a^2 + 2\sigma_e^2 - 2\sigma_a^2 = 2\sigma_e^2$.

The paired t-test uses this as the unknown variance of the $z_i$'s; since it is unknown, the coefficient does not matter.

So what have we gained by the VC model in relation to the paired t-test? In the paired t-test output we have an estimate of $2\sigma_e^2$ as $(0.95)^2 = \ldots$, and here we have directly an estimate of the variance component $\sigma_e^2$ as $\ldots$. However, the VC analysis also gives an estimate of $\sigma_a^2$, which was lost in the paired analysis because we analyzed only the differences. This estimate, the variation from farm to farm, is also a useful quantity. We discuss this further in the sequel.

Variance Components

Source    Variance Component    SE    Z    p-value    Lower 95%    Upper 95%
Farm
Error

2. Fixed Effects versus Random Effects

When there are random effects in the data, the randomness in the data is split up into two parts: random effects and random error. We always assume that the random errors and random effects are independent of each other and are Gaussian with zero mean. The random effects need not be independent among themselves. The random errors may also be interdependent. Owing to the presence of the random effects, the original observations are also correlated. Different covariance structures are used for the random effects as well as for the random errors. But first let us see why one would want to consider an effect in a linear model as random.

Consider the following data set on the yield of wheat, pertaining to two varieties and three farms. Each farm uses each variety on four plots. The data set is given below.

Comparing varieties of wheat (Yield)

FARM    VARIETY 1         VARIETY 2
1       67, 73, 59, 84    75, 61, 67, 58
2       9, 84, 94, 83     54, 78, 61,
3       , 7, 76, 64       4, 44, 80, 83

Let $y_{ijk}$ denote the yield of the $k$th plot on the $i$th farm using the $j$th variety. Then $y_{ijk}$ is the resultant of the $i$th farm effect as well as the $j$th variety effect. We shall assume that the plots are all more or less identical. So we have the linear model

$y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$.

Here $\mu$ is the mean effect, $\alpha_i$ is the $i$th farm effect, and $\beta_j$ is the effect of the $j$th variety. The $\varepsilon$'s, as usual, denote the random errors.

Now let us pause for a moment and ask why one would collect and analyze a data set of this kind. In other words, what type of inference do we want to make? There are two possible answers. First, we may be interested in knowing how these three farms perform using the two varieties. This question is of interest to, for instance, the owner of the farms, who wants to decide which variety to grow and has a specific set of farms in mind. Second, an agronomist may want to compare the two varieties irrespective of the farms. The agronomist does not have any specific set of farms in mind, and is comparing the performance of Variety 1 as grown by some randomly selected farm with the performance of Variety 2 as grown by another (possibly different) randomly selected farm.
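The relation between the paired t-test and the variance component analysis in Section 1 can be checked numerically. Below is a minimal sketch on made-up yields (not the data from the text); the function name `vc_analysis` and the data are illustrative assumptions. It uses ANOVA (method-of-moments) estimators based on E(MS_Error) = σ_e² and E(MS_Farm) = σ_e² + 2σ_a², which need not coincide with what a particular statistics package reports.

```python
import statistics


def vc_analysis(a, b):
    """Farm x Variety ANOVA for paired data (one observation per cell),
    with method-of-moments variance component estimates."""
    n = len(a)                                   # number of farms
    grand = (sum(a) + sum(b)) / (2 * n)
    farm_means = [(x + y) / 2 for x, y in zip(a, b)]
    variety_means = [sum(a) / n, sum(b) / n]

    ss_farm = 2 * sum((m - grand) ** 2 for m in farm_means)
    ss_variety = n * sum((m - grand) ** 2 for m in variety_means)
    ss_total = sum((v - grand) ** 2 for v in a + b)
    ss_error = ss_total - ss_farm - ss_variety

    ms_farm = ss_farm / (n - 1)
    ms_variety = ss_variety / 1                  # 2 varieties -> 1 df
    ms_error = ss_error / (n - 1)                # (n - 1)(2 - 1) df

    return {
        "F_variety": ms_variety / ms_error,
        "sigma_e2": ms_error,                    # E(MS_Error) = sigma_e^2
        "sigma_a2": (ms_farm - ms_error) / 2,    # E(MS_Farm) = sigma_e^2 + 2 sigma_a^2
    }


# Hypothetical yields (quintals) for 10 farms; NOT the data from the text.
variety_a = [30.1, 28.4, 33.0, 29.5, 31.2, 27.8, 32.4, 30.9, 29.0, 31.7]
variety_b = [28.9, 27.5, 31.8, 29.9, 29.6, 26.4, 31.0, 30.2, 28.1, 30.3]

res = vc_analysis(variety_a, variety_b)

# Paired t statistic on the same data, for comparison with F.
z = [x - y for x, y in zip(variety_a, variety_b)]
t = statistics.mean(z) / (statistics.stdev(z) / len(z) ** 0.5)

print(f"F = {res['F_variety']:.4f}, t^2 = {t ** 2:.4f}")
print(f"sigma_e^2 = {res['sigma_e2']:.4f}, sigma_a^2 = {res['sigma_a2']:.4f}")
```

For any paired data set this sketch gives F equal to t² and an error component equal to half the sample variance of the differences. The moment estimate of σ_a² can come out negative when MS_Farm is smaller than MS_Error, in which case it is usually truncated at zero.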
In the first case, where the inference concerns these specific farms, all the effects are fixed. In the second case, the farm effects $\alpha_i$ are random. Let us analyze the data set under both models to see how the inference differs. First, the fixed effects model. The results are below. They give

1. an analysis of variance table, where it is seen that the farm differences are not significant and the treatment (variety) difference is significant;
2. estimates of the difference between farms 1 and 3, and farms 2 and 3, with standard errors and tests of significance; also confidence intervals of the differences;
3. the difference between varieties 1 and 2, with standard error and test of significance; also confidence intervals of the differences.

Analysis of Variance

Source     SS    DF    MS    F    p-value
Farm
Variety
Error

Estimates of fixed effects

Effect       Level    Estimate    SE    df    t    p-value
Intercept
Farm         1
Farm         2
Farm         3
Variety      1
Variety      2

CI's of fixed effects estimates

Effect       Level    Estimate    Lower 95%    Upper 95%
Intercept
Farm         1
Farm         2
Farm         3
Variety      1
Variety      2

If we use a model where the farm effect is random, the analysis is the same, although the interpretation of the mean squares is different in the sense that the variance $\sigma_a^2$ of the farm effect will be involved. Otherwise the conclusions are the same.

Now let us introduce an interaction term in the model as follows:

Yield = Overall effect + Farm effect + Variety effect + Interaction effect + Random error,

$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$,

and consider both the farm and interaction effects to be random. Then the situation becomes quite different. The ANOVA table is

Analysis of Variance

Source          SS    DF    MS    F    p-value
Farm
Variety
Farm*Variety
Error

This ANOVA table is rather different from the earlier one without interaction, because of the additional interaction term with 2 DF, which was a part of the error term earlier. If you have more than one observation per cell (combination of farm and variety), it is possible to include the interaction term in the model and analyze it. When the interaction term is present and is a random effect, the variance due to Variety, estimated by the Variety MS, has this interaction variance as a part. The expected Variety MS and the expected Interaction (Farm*Variety) MS differ only in an extra term in Variety, due to the Variety effect, that depends on the differences among the $\beta_j$'s. Under the hypothesis of no Variety effect, this term becomes zero. Hence the correct denominator to test Variety is the Farm*Variety MS and not the Error MS, and so the Variety F(1, 2)-ratio and the p-value are different.

The p-value has gone down. This means that the varieties appear more significantly different when used over a population of farms than when used for just a specific set of farms. It could also have been the other way around. Then the interpretation would be as follows: the significant difference in the fixed effects model implies that if the same farm uses both varieties then the results are different, while the lack of significance in the mixed effects model means that a random farm using one variety has more or less the same performance as a (possibly different) random farm using the other variety. This is the case if, for instance, there is a lot of variability among the farms and the difference between the varieties is swamped by it: a bad farm with a good variety may not perform much differently from a good farm with a bad variety. That is not the case here, though.

3. Why use Random Effects?

A linear model, just like any other statistical model, tries to capture the essence of the process generating the data, rather than that of the data themselves. We want our inference to hold not only for the given data set but also for future replications of the same experiment. So the choice of model is dictated by what type of replications we have in mind. Depending upon this, there are different reasons for treating an effect as random in a model. Here we outline three common situations.

If we plan to use the same levels of an effect in all fresh replications, then we may treat the effect as fixed. However, if we plan to use fresh levels of the effect in different replications, then we should make the effect random. Inference based on random effects models is valid for a population of all possible levels of the random effects. The example above furnished an illustration.
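The change of denominator in the interaction model, from the Error MS to the Farm*Variety MS, can be sketched numerically. The following is a minimal sketch on made-up yields (3 farms × 2 varieties × 4 plots, not the data set from the text); the function name `two_way_anova` and the data are illustrative assumptions. It computes the balanced two-way mean squares and forms the Variety F-ratio with both denominators. P-values are omitted because the F distribution is not part of the Python standard library.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with replication.

    cells[i][j] is the list of plot yields for farm i and variety j.
    Returns (sums of squares, degrees of freedom, mean squares) by source.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    grand = sum(v for row in cells for cell in row for v in cell) / (a * b * n)
    cell_mean = [[sum(cell) / n for cell in row] for row in cells]
    farm_mean = [sum(row) / b for row in cell_mean]
    variety_mean = [sum(cell_mean[i][j] for i in range(a)) / a for j in range(b)]

    ss = {
        "Farm": b * n * sum((m - grand) ** 2 for m in farm_mean),
        "Variety": a * n * sum((m - grand) ** 2 for m in variety_mean),
        "Farm*Variety": n * sum(
            (cell_mean[i][j] - farm_mean[i] - variety_mean[j] + grand) ** 2
            for i in range(a) for j in range(b)
        ),
        "Error": sum(
            (v - cell_mean[i][j]) ** 2
            for i in range(a) for j in range(b) for v in cells[i][j]
        ),
    }
    df = {
        "Farm": a - 1,
        "Variety": b - 1,
        "Farm*Variety": (a - 1) * (b - 1),
        "Error": a * b * (n - 1),
    }
    ms = {source: ss[source] / df[source] for source in ss}
    return ss, df, ms


# Hypothetical yields: 3 farms x 2 varieties x 4 plots; NOT the data from the text.
cells = [
    [[60, 64, 58, 62], [70, 73, 69, 72]],
    [[75, 71, 74, 76], [77, 80, 76, 79]],
    [[52, 55, 50, 51], [66, 63, 65, 62]],
]
ss, df, ms = two_way_anova(cells)

# Farm and interaction treated as fixed: Variety is tested against the Error MS.
f_fixed = ms["Variety"] / ms["Error"]
# Farm and interaction treated as random: Variety is tested against Farm*Variety MS.
f_mixed = ms["Variety"] / ms["Farm*Variety"]

print(f"fixed model: F({df['Variety']}, {df['Error']}) = {f_fixed:.2f}")
print(f"mixed model: F({df['Variety']}, {df['Farm*Variety']}) = {f_mixed:.2f}")
```

With 3 farms and 2 varieties the mixed-model ratio has only 2 denominator degrees of freedom, so whether the p-value rises or falls relative to the fixed-effects test depends on how large the Farm*Variety MS is compared with the Error MS.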
When fresh levels are drawn for each replication, the random coefficients are all independently and identically distributed, since they represent randomly selected levels of the effect. So the resulting model is a variance components model.

In some cases, an effect may be considered random even if we plan to use the same levels for all future replications. Consider, for instance, a designed experiment where three operators on a farm operate two tractors, the response being a score that combines the quality and quantity of the yield produced in a given season. A suitable model for this situation may be

$y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$,

where $y_{ijk}$ is the score for the $k$th run of the $i$th tractor operated by the $j$th operator. If the farm has only these three operators to operate the tractors, then the farm authorities would always have to choose the same three operators in all future replications of the experiment. However, the same operator may behave slightly differently from one replication of the experiment to the next, depending on unpredictable factors such as mood. In this case, we would be justified in considering the operator effect as random. However, since the mood variability of different operators may differ, the random coefficients $\beta_j$ need not be identically distributed. In fact, they may also be correlated, because the moods of all the operators may be governed by some common random condition prevailing during a replication of the experiment (e.g., the weather during the experiment) that is difficult to control.

Indeed, McLean, Sanders, and Stroup (1991) also suggest a model where the operator effect is fixed but the interaction effect ($\gamma_{ij}$) is random. Such a model would be appropriate if we consider the main effect as a measure of the proficiency of the operator, which is not likely to change between replications, while mood fluctuations may affect how an operator performs on a given tractor. Such models, where the random effect coefficients may not be independently and identically distributed, are more general than simple variance components models.

A third situation that leads to random effects is where the model is developed in a multilevel fashion. Consider a situation where we want to linearly regress a response variable $y$ (say, yield) on a predictor variable $x$ (water). However, we believe that the regression slope is a random effect that depends on the values of a categorical variable $z$ (variety). Then we have a two-level model. In the first level we model $y$ in terms of $x$:

$y_{ijk} = \alpha + \beta_j x_{ijk} + \varepsilon_{ijk}$.

Here $j$ denotes the levels of the categorical variable $z$. In the second level we model the (random) regression slope as

$\beta_j = a + b_j$.

Here the $b_j$'s are random effect coefficients. Putting the second-level equation into the first, we get the composite model

$y_{ijk} = \alpha + (a + b_j) x_{ijk} + \varepsilon_{ijk} = \alpha + a x_{ijk} + b_j x_{ijk} + \varepsilon_{ijk}$.

This means that $x$ is present in the fixed part $a x_{ijk}$ as well as in the random part $b_j x_{ijk}$. If the deeper levels in a multilevel model have their own random errors, then they lead to random effects in the composite model.

References and Suggested Reading

Cox, D.R. and Solomon, P.J. (2002). Components of Variance. New York: Chapman & Hall/CRC.

McLean, R.A., Sanders, W.L., and Stroup, W.W. (1991). A unified approach to mixed linear models. The American Statistician, 45.

Milliken, G.A. and Johnson, D.E. (1992). Analysis of Messy Data, Volume I: Designed Experiments. London: Chapman and Hall.

Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: John Wiley & Sons.


More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design

PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design PROBLEM TWO (ALKALOID CONCENTRATIONS IN TEA) 1. Statistical Design The purpose of this experiment was to determine differences in alkaloid concentration of tea leaves, based on herb variety (Factor A)

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Jurusan Teknik Industri Universitas Brawijaya Outline Introduction The Analysis of Variance Models for the Data Post-ANOVA Comparison of Means Sample

More information

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical

More information

Difference in two or more average scores in different groups

Difference in two or more average scores in different groups ANOVAs Analysis of Variance (ANOVA) Difference in two or more average scores in different groups Each participant tested once Same outcome tested in each group Simplest is one-way ANOVA (one variable as

More information

L6: Regression II. JJ Chen. July 2, 2015

L6: Regression II. JJ Chen. July 2, 2015 L6: Regression II JJ Chen July 2, 2015 Today s Plan Review basic inference based on Sample average Difference in sample average Extrapolate the knowledge to sample regression coefficients Standard error,

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Stat 217 Final Exam. Name: May 1, 2002

Stat 217 Final Exam. Name: May 1, 2002 Stat 217 Final Exam Name: May 1, 2002 Problem 1. Three brands of batteries are under study. It is suspected that the lives (in weeks) of the three brands are different. Five batteries of each brand are

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:

More information

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000 Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination

More information

Multiple comparisons - subsequent inferences for two-way ANOVA

Multiple comparisons - subsequent inferences for two-way ANOVA 1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of

More information

Design of Experiments. Factorial experiments require a lot of resources

Design of Experiments. Factorial experiments require a lot of resources Design of Experiments Factorial experiments require a lot of resources Sometimes real-world practical considerations require us to design experiments in specialized ways. The design of an experiment is

More information

Chapter 13 Experiments with Random Factors Solutions

Chapter 13 Experiments with Random Factors Solutions Solutions from Montgomery, D. C. (01) Design and Analysis of Experiments, Wiley, NY Chapter 13 Experiments with Random Factors Solutions 13.. An article by Hoof and Berman ( Statistical Analysis of Power

More information

One-way between-subjects ANOVA. Comparing three or more independent means

One-way between-subjects ANOVA. Comparing three or more independent means One-way between-subjects ANOVA Comparing three or more independent means ANOVA: A Framework Understand the basic principles of ANOVA Why it is done? What it tells us? Theory of one-way between-subjects

More information

Lecture 3: Analysis of Variance II

Lecture 3: Analysis of Variance II Lecture 3: Analysis of Variance II http://www.stats.ox.ac.uk/ winkel/phs.html Dr Matthias Winkel 1 Outline I. A second introduction to two-way ANOVA II. Repeated measures design III. Independent versus

More information

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018

Econometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018 Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate

More information

CHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication

CHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication CHAPTER 4 Analysis of Variance One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication 1 Introduction In this chapter, expand the idea of hypothesis tests. We

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

ANALYZING BINOMIAL DATA IN A SPLIT- PLOT DESIGN: CLASSICAL APPROACHES OR MODERN TECHNIQUES?

ANALYZING BINOMIAL DATA IN A SPLIT- PLOT DESIGN: CLASSICAL APPROACHES OR MODERN TECHNIQUES? Libraries Conference on Applied Statistics in Agriculture 2004-16th Annual Conference Proceedings ANALYZING BINOMIAL DATA IN A SPLIT- PLOT DESIGN: CLASSICAL APPROACHES OR MODERN TECHNIQUES? Liang Fang

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

MORE ON SIMPLE REGRESSION: OVERVIEW

MORE ON SIMPLE REGRESSION: OVERVIEW FI=NOT0106 NOTICE. Unless otherwise indicated, all materials on this page and linked pages at the blue.temple.edu address and at the astro.temple.edu address are the sole property of Ralph B. Taylor and

More information

Sleep data, two drugs Ch13.xls

Sleep data, two drugs Ch13.xls Model Based Statistics in Biology. Part IV. The General Linear Mixed Model.. Chapter 13.3 Fixed*Random Effects (Paired t-test) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

20.0 Experimental Design

20.0 Experimental Design 20.0 Experimental Design Answer Questions 1 Philosophy One-Way ANOVA Egg Sample Multiple Comparisons 20.1 Philosophy Experiments are often expensive and/or dangerous. One wants to use good techniques that

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida First Year Examination Department of Statistics, University of Florida August 20, 2009, 8:00 am - 2:00 noon Instructions:. You have four hours to answer questions in this examination. 2. You must show

More information

Tied survival times; estimation of survival probabilities

Tied survival times; estimation of survival probabilities Tied survival times; estimation of survival probabilities Patrick Breheny November 5 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Tied survival times Introduction Breslow approximation

More information

,i = 1,2,L, p. For a sample of size n, let the columns of data be

,i = 1,2,L, p. For a sample of size n, let the columns of data be MAC IIci: Miller Asymptotics Chapter 5: Regression Section?: Asymptotic Relationship Between a CC and its Associated Slope Estimates in Multiple Linear Regression The asymptotic null distribution of a

More information

Longitudinal Data Analysis of Health Outcomes

Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis Workshop Running Example: Days 2 and 3 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development

More information

A Non-parametric bootstrap for multilevel models

A Non-parametric bootstrap for multilevel models A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is

More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

Analysis of Variance

Analysis of Variance Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1 Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also

More information

Sampling Distributions in Regression. Mini-Review: Inference for a Mean. For data (x 1, y 1 ),, (x n, y n ) generated with the SRM,

Sampling Distributions in Regression. Mini-Review: Inference for a Mean. For data (x 1, y 1 ),, (x n, y n ) generated with the SRM, Department of Statistics The Wharton School University of Pennsylvania Statistics 61 Fall 3 Module 3 Inference about the SRM Mini-Review: Inference for a Mean An ideal setup for inference about a mean

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) Analysis of Variance (ANOVA) Much of statistical inference centers around the ability to distinguish between two or more groups in terms of some underlying response variable y. Sometimes, there are but

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information