VARIANCE COMPONENT ANALYSIS
T. KRISHNAN
Cranes Software International Limited
Mahatma Gandhi Road, Bangalore
krishnan.t@systat.com

1. Introduction

In an experiment to compare the yields of two varieties of wheat, 10 farms participated, and in each farm both varieties were grown. All 20 plots in the experiment were of equal area. The data on the yield in quintals are given below:

Farm No.   Variety A   Variety B

Note that the yields of Variety A and Variety B are correlated, because the conditions for both varieties in a given farm would be the same. A standard method of analyzing this kind of data is the paired t-test. Let x_i be the yield of Variety A on the i-th farm, and y_i that of Variety B. The paired t-test computes the differences z_i = x_i - y_i and checks whether the mean of the z_i's is far from 0, using the t distribution with 9 degrees of freedom. Let us perform this test. The results are:

Hypothesis Testing: Paired t-test

Paired Samples t-test on Variety A vs Variety B with 10 Cases
Alternative = 'not equal'
Sample   No   Variety A   Variety B   Mean Difference   95% CI   SD of Difference   t   df   p-value
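The original yield figures are not reproduced above, so the sketch below uses made-up numbers purely to show the mechanics; `scipy.stats.ttest_rel` carries out the paired t-test with n - 1 = 9 degrees of freedom:

```python
import numpy as np
from scipy import stats

# Illustrative yields in quintals for 10 farms (NOT the paper's data)
variety_a = np.array([12.1, 10.4, 13.6, 11.0, 12.8, 10.9, 11.7, 12.3, 13.1, 10.6])
variety_b = np.array([11.4, 10.0, 12.9, 10.2, 12.1, 10.1, 11.5, 11.6, 12.4, 10.3])

# Paired t-test: is the mean of z_i = x_i - y_i far from 0?
t, p = stats.ttest_rel(variety_a, variety_b)

# The same statistic computed directly from the differences
z = variety_a - variety_b
t_manual = z.mean() / (z.std(ddof=1) / np.sqrt(len(z)))

print(f"t = {t:.3f}, df = {len(z) - 1}, p = {p:.4f}")
```

The manual computation and `ttest_rel` agree, which makes explicit that the paired test uses only the differences z_i.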
This test assumes that the z_i's are independently and identically distributed normal random variables, which is the case if, for example, each (x_i, y_i) pair is independently distributed as bivariate normal N((µ_1, µ_2), Σ), where Σ is the covariance matrix. However, if we consider the data set for a moment, we can see that Σ cannot be just any covariance matrix: it is highly likely that the yields of the two varieties on the same farm will be positively correlated. Popular as it is, the paired t-test nonetheless fails to take this extra information about the data into account. It collapses the pairs (x_i, y_i) into the differences z_i, and thus fails to utilize the correlation structure of the original data. One way to remedy this loss of information is to assume that each measurement is made up of three components:

- The effect of the farm. It is customary to express the effect of the i-th farm as µ + α_i, where µ is called the mean effect, denoting the average level of yield over all farms, while α_i denotes the departure of the i-th farm from this average.
- The effect of the variety (this is where our interest lies). We shall denote the effect of the j-th variety by β_j, for j = 1, 2.
- Random error, which we call ε_ij.

So we have the model

Yield = Overall effect + Farm effect + Variety effect + Random error,

or in notation,

y_ij = µ + α_i + β_j + ε_ij,  where i = 1, ..., 10 and j = 1, 2.

Here y_ij is the yield of the j-th variety on the i-th farm. Thus, we have renamed x_i as y_i1 and y_i as y_i2. In this sort of situation, the focus of the analysis is to determine which variety is better (greater yield) and by how much, over the collection of farms for which this exercise is being done.
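To make the model concrete, here is a small simulation sketch (all parameter values invented) that generates data from y_ij = µ + α_i + β_j + ε_ij with random farm effects. It checks numerically two facts used later in the text: the yields of the two varieties on the same farm are positively correlated, with covariance equal to the farm-effect variance, and the two-way ANOVA F-test for Variety agrees exactly with the paired t-test (F = t²):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 10_000                      # many farms, so sample moments are accurate
mu = 50.0                       # overall mean (illustrative)
beta = np.array([0.0, 3.0])     # fixed variety effects (illustrative)
sig_a, sig_e = 3.0, 2.0         # SDs of farm effect and random error

alpha = rng.normal(0.0, sig_a, n)                     # random farm effects
eps = rng.normal(0.0, sig_e, (n, 2))
y = mu + alpha[:, None] + beta[None, :] + eps         # y[i, j] = yield

# Within-farm covariance between varieties is Var(alpha_i) = sig_a^2 = 9
cov12 = np.cov(y[:, 0], y[:, 1])[0, 1]

# Randomized-block ANOVA F for Variety equals the squared paired t
t, _ = stats.ttest_rel(y[:, 0], y[:, 1])
grand = y.mean()
ss_var = n * ((y.mean(axis=0) - grand) ** 2).sum()    # df = 1
ss_farm = 2 * ((y.mean(axis=1) - grand) ** 2).sum()   # df = n - 1
ss_err = ((y - grand) ** 2).sum() - ss_var - ss_farm  # df = n - 1
F = ss_var / (ss_err / (n - 1))
```

The identity F = t² holds exactly in a balanced two-treatment block design, not just approximately in this simulation.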
We are not particularly interested in these 10 farms, but since experiments require experimental units (farms), farms inevitably come into the picture. The interest in these farms is only in so far as they represent the population of farms. In that spirit, we consider these farms as a random sample from a population of farms. Thus the farm effects α_i are considered random, and their variation, which affects the comparison of varieties, is of interest. The effects µ and β_j are fixed as before. A linear model where some (or all) of the parameters are random is called a linear mixed model. Here the α_i's are called random effects, while µ and the β_j's are called fixed effects. We assume that the α_i's and ε_ij's are independent Gaussian (normal) random variables with zero mean. In this example we shall assume that
the α_i's are distributed independently as N(0, σ_a²), while the ε_ij's have independent N(0, σ_e²) distributions. It is easy to check that the correlation between the yields of the two varieties for the same farm is indeed positive under this model, since Cov(y_i1, y_i2) = Var(α_i) = σ_a² > 0. The model that we have formulated here is called a Variance Component (VC) model, because the variance of each observation is the sum of two variances. One way to carry out a VC analysis of this data set is to consider the data as a two-way classification of Farm × Variety and obtain the following ANOVA table. Notice that the F-test for Variety coincides with the paired t-test earlier (F = t²) and the p-values are the same. In the VC model, Var(y_ij) = σ_a² + σ_e² for all i, j; Cov(y_i1, y_i2) = σ_a² as noted earlier; and so Var(y_i1 - y_i2) = 2σ_a² + 2σ_e² - 2σ_a² = 2σ_e². The paired t-test uses this as the unknown variance of the z_i's; since it is unknown, the coefficient 2 does not matter. So what have we gained by the VC model in relation to the paired t-test? In the paired t-test output we have an estimate of σ_e (from the SD of the differences), and here we have directly an estimate of the variance component σ_e². More importantly, the VC analysis gives an estimate of σ_a², which was lost in the paired analysis because we analyzed only the differences. This estimate is also a useful quantity: the variation from farm to farm. We discuss this further in the sequel.

Analysis of Variance table
Source    SS    Numerator df    Denominator df    Mean Square    F-ratio    p-value
Variety
Farm
Error

Source    Variance Component    SE    Z    p-value    Lower 95%    Upper 95%
Farm
Error

2. Fixed Effects versus Random Effects

When there are random effects in the data, the randomness in the data is thus split into two parts: random effects and random error. We always assume that the random errors and
random effects are independent and Gaussian with zero mean. The random effects need not be independent among themselves. The random errors may also be interdependent. Owing to the presence of the random effects, the original observations are also correlated. Different covariance structures are used for the random effects as well as for the random errors. But first let us see why one would want to consider an effect in a linear model as random. Consider the following data set on yield of wheat pertaining to two varieties and three farms. Each farm uses each variety on four plots. The data set is given below.

Comparing varieties of wheat (Yield)

Farm   Variety 1         Variety 2
1      67, 73, 59, 84    75, 61, 67, 58
2      92, 84, 94, 83    54, 78, 61, 22
3      22, 72, 76, 64    42, 44, 80, 83

Let y_ijk denote the yield of the k-th plot on the i-th farm using the j-th variety. Then y_ijk is the resultant of the i-th farm effect as well as the j-th variety effect. We shall assume that the plots are all more or less identical. So we have the linear model

y_ijk = µ + α_i + β_j + ε_ijk.

Here µ is the mean effect, α_i is the i-th farm effect, and β_j is the effect of the j-th variety. The ε's, as usual, denote the random errors. Now let us pause for a moment and ask why one would really collect and analyze a data set of this kind. In other words, what type of inference do we want to make? There are two possible answers. First, we may be interested in knowing how these three farms perform using the two varieties. This question is of interest to, for instance, the owner of the farms, when he/she wants to decide which variety to grow. Here he/she has a specific set of farms in mind. Second, an agronomist may want to compare the two varieties irrespective of the farms. He does not have any specific set of farms in mind. He is comparing the performance of Variety 1 as applied by some randomly selected farm with the performance of Variety 2 as applied by another (possibly different) randomly selected farm.
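For a balanced layout like this, the two-way ANOVA sums of squares can be computed directly from the marginal means. The sketch below simulates a 3-farm × 2-variety × 4-plot table (all values invented; it only illustrates the arithmetic) and forms the Farm, Variety, and Error sums of squares for the no-interaction fixed-effects model:

```python
import numpy as np

rng = np.random.default_rng(7)
I, J, K = 3, 2, 4                        # farms, varieties, plots per cell
farm_eff = np.array([5.0, -2.0, -3.0])   # illustrative fixed farm effects
var_eff = np.array([-3.0, 3.0])          # illustrative fixed variety effects
y = (70.0 + farm_eff[:, None, None] + var_eff[None, :, None]
          + rng.normal(0.0, 8.0, (I, J, K)))

grand = y.mean()
ss_farm = J * K * ((y.mean(axis=(1, 2)) - grand) ** 2).sum()  # df = I - 1 = 2
ss_var = I * K * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()   # df = J - 1 = 1
ss_tot = ((y - grand) ** 2).sum()
ss_err = ss_tot - ss_farm - ss_var        # everything else is error here
df_err = I * J * K - I - J + 1            # = 20
F_farm = (ss_farm / (I - 1)) / (ss_err / df_err)
F_var = (ss_var / (J - 1)) / (ss_err / df_err)
```

In a balanced design the three sums of squares partition the total exactly, which is why the error SS can be obtained by subtraction.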
In the first case all the effects are fixed. In the second case, the farm effects α_i are random. Let us analyze the data set under both models to see how the inference differs. First, the fixed effects model. The results are below. They give:

1. an analysis of variance table, where it is seen that farm differences are not significant and the treatment difference is significant;
2. estimates of the differences between farms 1 and 3, and between farms 2 and 3, with standard errors and tests of significance, as well as confidence intervals of the differences;

3. the difference between varieties 1 and 2, with standard error and test of significance, as well as a confidence interval of the difference.

Analysis of Variance
Source    SS    DF    MS    F    p-value
Farm
Variety
Error

Estimates of fixed effects
Effect      Level    Estimate    SE    df    t    p-value
Intercept
Farm        1
Farm        2
Farm        3
Variety     1
Variety     2

CI's of fixed effects estimates
Effect      Level    Estimate    Lower 95%    Upper 95%
Intercept
Farm        1
Farm        2
Farm        3
Variety     1
Variety     2

If we use a model where the farm effect is random, the analysis is the same, although the interpretations of the mean squares are different in the sense that the variance σ_a² of the farm effect will be involved. Otherwise the conclusions are the same. Now let us introduce an interaction term in the model as follows:

Yield = Overall effect + Farm effect + Variety effect + Interaction effect + Random error,

y_ijk = µ + α_i + β_j + γ_ij + ε_ijk,

and consider both the farm and interaction effects to be random. Then the situation becomes quite different. The ANOVA table is

Analysis of Variance
Source          SS    DF    MS    F    p-value
Farm
Variety
Farm*Variety
Error

This ANOVA table is rather different from the earlier one without interaction, because of the additional interaction term with 2 DF, which was a part of the error term earlier. If you have more than one observation per cell (combination of farm and variety), it is possible to include the interaction term in the model and analyze it. Now when the interaction term is present and is a random effect, the variance due to Variety, estimated by the Variety MS, has this interaction variance also as a part. The Variety and Interaction (Farm*Variety) mean squares differ only in the extra term in Variety due to the Variety effect, in terms of the differences in the β's. Hence under the hypothesis of no Variety effect, this term becomes
zero. Hence the correct denominator to test Variety is the Farm*Variety MS and not the Error MS, and hence the Variety F(1, 2)-ratio and the p-value are different. The p-value has gone down. This means that the varieties appear more significantly different when used over a population of farms than when used on just a specific set of farms. It could have been the other way around as well. Then the interpretation would be as follows: the significant difference in the fixed effects model implies that if the same farm uses both varieties, then the results are different; the lack of significance in the mixed effects model means that a random farm using one variety has more or less the same performance as a (possibly different) random farm using the other variety. This is the case if, for instance, there is a lot of variability among the farms, and the difference between the varieties is swamped by it. (It is not the case here, though.) A bad farm with a good variety may not perform much differently from a good farm with a bad variety.

3. Why use Random Effects?

A linear model, just like any other statistical model, tries to capture the essence of the process generating the data, rather than that of the data themselves. We want our inference to hold not only for the given data set but also for future replications of the same experiment. So the choice of the model is dictated by what type of replications we have in mind. Depending upon this, there are different reasons for treating an effect as random in a model. Here we outline three common situations. If we plan to use the same levels of an effect in all fresh replications, then we may treat the effect as fixed. However, if we plan to use fresh levels of the effect in different replications, then we should make the effect random. Inference based on random effects models is valid for a population of all possible levels of the random effects. The example above furnishes an illustration.
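The choice of denominator above can be checked numerically. This sketch (parameter values invented) simulates the 3 × 2 × 4 layout with random farm and farm-by-variety interaction effects and forms both candidate F-ratios for Variety; under the mixed model it is F_mixed, using the interaction mean square with (1, 2) degrees of freedom, that is the valid test:

```python
import numpy as np

rng = np.random.default_rng(3)
I, J, K = 3, 2, 4                         # farms, varieties, plots per cell
alpha = rng.normal(0.0, 5.0, I)           # random farm effects
gamma = rng.normal(0.0, 4.0, (I, J))      # random interaction effects
beta = np.array([-3.0, 3.0])              # fixed variety effects
y = (70.0 + alpha[:, None, None] + beta[None, :, None]
          + gamma[:, :, None] + rng.normal(0.0, 8.0, (I, J, K)))

grand = y.mean()
cell = y.mean(axis=2)                     # cell means over replicates
ss_var = I * K * ((y.mean(axis=(0, 2)) - grand) ** 2).sum()
ss_int = K * ((cell - y.mean(axis=(1, 2))[:, None]
                    - y.mean(axis=(0, 2))[None, :] + grand) ** 2).sum()
ss_err = ((y - cell[:, :, None]) ** 2).sum()
ms_var = ss_var / (J - 1)                 # df = 1
ms_int = ss_int / ((I - 1) * (J - 1))     # df = 2
ms_err = ss_err / (I * J * (K - 1))       # df = 18
F_fixed = ms_var / ms_err                 # denominator if all effects fixed
F_mixed = ms_var / ms_int                 # correct denominator in mixed model
```

With only 2 denominator degrees of freedom, F_mixed must be much larger than the fixed-model F to reach the same p-value, which is why treating the interaction as random can change the conclusion.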
In such situations, the random coefficients are all independently and identically distributed, as they represent randomly selected levels of the effect. So the resulting model is a variance components model. In some cases, an effect may be considered random even if we plan to use the same levels for all future replications. Consider, for instance, a designed experiment where three operators in a farm operate two tractors, the response being a score that combines the quality and quantity of the yield produced in a given season. A suitable model for this situation may be

y_ijk = µ + α_i + β_j + γ_ij + ε_ijk,

where y_ijk is the score for the k-th run of the i-th tractor operated by the j-th operator. If the farm has only these three operators to operate the tractors, then the farm authorities would always have to choose the same three operators in all future replications of the experiment. However, the same operator may behave slightly differently from one replication of the experiment to the next, depending on unpredictable factors such as his mood. In this case, we would be justified in considering the operator effect as random. However, since the mood variability of the different operators may differ, the random coefficients β_j need not be identically distributed. In fact, they may also be
correlated, because the moods of all the operators may be governed by some common random condition prevailing during a replication of the experiment (e.g., the weather during the experiment) that is difficult to control. Indeed, McLean, Sanders, and Stroup (1991) suggest a model where the operator effect is fixed but the interaction effect γ_ij is random. Such a model would be appropriate if we consider the main effect as a measure of the proficiency of the operator, which is not likely to change between replications; the mood fluctuations, however, may affect how an operator performs on a given tractor. Such models, where the random effect coefficients may not be independently and identically distributed, are more general than simple variance components models.

A third situation that leads to random effects is where the model is developed in a multilevel fashion. Consider a situation where we want to linearly regress a response variable y (say, yield) on a predictor variable x (water). However, we believe that the regression slope is a random effect that depends on the values of a categorical variable z (variety). Then we have a two-level model. In the first level we model y in terms of x:

y_ijk = α + β_j x_ijk + ε_ijk.

Here j denotes the levels of the categorical variable z. In the second level we model the (random) regression slope as

β_j = a + b_j,

where the b_j's are random effect coefficients. Substituting the second-level equation into the first, we get the composite model

y_ijk = α + (a + b_j) x_ijk + ε_ijk = α + a x_ijk + b_j x_ijk + ε_ijk.

This means that here x is present in the fixed part (a x_ijk) as well as in the random part (b_j x_ijk). If the deeper levels in a multilevel model have their own random errors, then they lead to random effects in the composite model.

References and Suggested Reading

Cox, D.R. and Solomon, P.J. (2002). Components of Variance. New York: Chapman & Hall/CRC.

McLean, R.A., Sanders, W.L., and Stroup, W.W. (1991).
A unified approach to mixed linear models. The American Statistician, 45.

Milliken, G.A. and Johnson, D.E. (1992). Analysis of Messy Data, Volume I: Designed Experiments. London: Chapman and Hall.

Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: John Wiley & Sons.
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationBasic Business Statistics, 10/e
Chapter 4 4- Basic Business Statistics th Edition Chapter 4 Introduction to Multiple Regression Basic Business Statistics, e 9 Prentice-Hall, Inc. Chap 4- Learning Objectives In this chapter, you learn:
More informationLecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000
Lecture 14 Analysis of Variance * Correlation and Regression Outline Analysis of Variance (ANOVA) 11-1 Introduction 11-2 Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination
More informationLecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)
Outline Lecture 14 Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA) 11-1 Introduction 11- Scatter Plots 11-3 Correlation 11-4 Regression Outline 11-5 Coefficient of Determination
More informationMultiple comparisons - subsequent inferences for two-way ANOVA
1 Multiple comparisons - subsequent inferences for two-way ANOVA the kinds of inferences to be made after the F tests of a two-way ANOVA depend on the results if none of the F tests lead to rejection of
More informationDesign of Experiments. Factorial experiments require a lot of resources
Design of Experiments Factorial experiments require a lot of resources Sometimes real-world practical considerations require us to design experiments in specialized ways. The design of an experiment is
More informationChapter 13 Experiments with Random Factors Solutions
Solutions from Montgomery, D. C. (01) Design and Analysis of Experiments, Wiley, NY Chapter 13 Experiments with Random Factors Solutions 13.. An article by Hoof and Berman ( Statistical Analysis of Power
More informationOne-way between-subjects ANOVA. Comparing three or more independent means
One-way between-subjects ANOVA Comparing three or more independent means ANOVA: A Framework Understand the basic principles of ANOVA Why it is done? What it tells us? Theory of one-way between-subjects
More informationLecture 3: Analysis of Variance II
Lecture 3: Analysis of Variance II http://www.stats.ox.ac.uk/ winkel/phs.html Dr Matthias Winkel 1 Outline I. A second introduction to two-way ANOVA II. Repeated measures design III. Independent versus
More informationEconometrics I KS. Module 2: Multivariate Linear Regression. Alexander Ahammer. This version: April 16, 2018
Econometrics I KS Module 2: Multivariate Linear Regression Alexander Ahammer Department of Economics Johannes Kepler University of Linz This version: April 16, 2018 Alexander Ahammer (JKU) Module 2: Multivariate
More informationCHAPTER 4 Analysis of Variance. One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication
CHAPTER 4 Analysis of Variance One-way ANOVA Two-way ANOVA i) Two way ANOVA without replication ii) Two way ANOVA with replication 1 Introduction In this chapter, expand the idea of hypothesis tests. We
More informationChapter 12 - Lecture 2 Inferences about regression coefficient
Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous
More informationANALYZING BINOMIAL DATA IN A SPLIT- PLOT DESIGN: CLASSICAL APPROACHES OR MODERN TECHNIQUES?
Libraries Conference on Applied Statistics in Agriculture 2004-16th Annual Conference Proceedings ANALYZING BINOMIAL DATA IN A SPLIT- PLOT DESIGN: CLASSICAL APPROACHES OR MODERN TECHNIQUES? Liang Fang
More information9 Correlation and Regression
9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the
More informationMORE ON SIMPLE REGRESSION: OVERVIEW
FI=NOT0106 NOTICE. Unless otherwise indicated, all materials on this page and linked pages at the blue.temple.edu address and at the astro.temple.edu address are the sole property of Ralph B. Taylor and
More informationSleep data, two drugs Ch13.xls
Model Based Statistics in Biology. Part IV. The General Linear Mixed Model.. Chapter 13.3 Fixed*Random Effects (Paired t-test) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch
More informationOne-Way ANOVA. Some examples of when ANOVA would be appropriate include:
One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement
More informationFormal Statement of Simple Linear Regression Model
Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor
More information20.0 Experimental Design
20.0 Experimental Design Answer Questions 1 Philosophy One-Way ANOVA Egg Sample Multiple Comparisons 20.1 Philosophy Experiments are often expensive and/or dangerous. One wants to use good techniques that
More informationMultilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2
Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do
More informationFirst Year Examination Department of Statistics, University of Florida
First Year Examination Department of Statistics, University of Florida August 20, 2009, 8:00 am - 2:00 noon Instructions:. You have four hours to answer questions in this examination. 2. You must show
More informationTied survival times; estimation of survival probabilities
Tied survival times; estimation of survival probabilities Patrick Breheny November 5 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/22 Introduction Tied survival times Introduction Breslow approximation
More information,i = 1,2,L, p. For a sample of size n, let the columns of data be
MAC IIci: Miller Asymptotics Chapter 5: Regression Section?: Asymptotic Relationship Between a CC and its Associated Slope Estimates in Multiple Linear Regression The asymptotic null distribution of a
More informationLongitudinal Data Analysis of Health Outcomes
Longitudinal Data Analysis of Health Outcomes Longitudinal Data Analysis Workshop Running Example: Days 2 and 3 University of Georgia: Institute for Interdisciplinary Research in Education and Human Development
More informationA Non-parametric bootstrap for multilevel models
A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is
More informationwith the usual assumptions about the error term. The two values of X 1 X 2 0 1
Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The
More informationAnalysis of Variance
Statistical Techniques II EXST7015 Analysis of Variance 15a_ANOVA_Introduction 1 Design The simplest model for Analysis of Variance (ANOVA) is the CRD, the Completely Randomized Design This model is also
More informationSampling Distributions in Regression. Mini-Review: Inference for a Mean. For data (x 1, y 1 ),, (x n, y n ) generated with the SRM,
Department of Statistics The Wharton School University of Pennsylvania Statistics 61 Fall 3 Module 3 Inference about the SRM Mini-Review: Inference for a Mean An ideal setup for inference about a mean
More informationSTK4900/ Lecture 3. Program
STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies
More informationDo not copy, post, or distribute
14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationAnalysis of Variance (ANOVA)
Analysis of Variance (ANOVA) Much of statistical inference centers around the ability to distinguish between two or more groups in terms of some underlying response variable y. Sometimes, there are but
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationCorrelation and the Analysis of Variance Approach to Simple Linear Regression
Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation
More informationdf=degrees of freedom = n - 1
One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:
More information