Prediction of New Observations


1 Statistics Seminar: 6th talk, ETHZ FS2010. Prediction of New Observations. Martina Albers, 12 April 2010. Papers: Welham (2004), Jiang (2007) 1

2 Content: Introduction; Prediction of Mixed Effects; Prediction of Future Observations; Principles of Prediction (the prediction process, fixed and random terms); Example: Split-Plot Design 2

3 Linear mixed model: y = Xβ + Zb + ε = Wγ + ε, where y is the n×1 vector of observations; X is the n×p matrix associating observations with the appropriate combination of fixed effects; β is the p×1 vector of fixed effects; Z is the n×q matrix associating observations with the appropriate combination of random effects; b is the q×1 vector of random effects; ε is the n×1 vector of residual errors; and W, γ are the combined design matrix and vector of effects, respectively. Introduction 3
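As a sketch of how X and Z associate observations with effects, the following builds both indicator matrices for a small made-up dataset (two treatment levels as the fixed factor, three blocks as the random factor; all names and values are hypothetical):

```python
# Sketch: building the design matrices X (fixed effects) and Z (random
# effects) for a toy dataset -- two treatments (fixed) and three blocks
# (random). All data below are made up for illustration.

treatments = ["A", "B", "A", "B", "A", "B"]   # fixed-effect factor
blocks     = [1, 1, 2, 2, 3, 3]               # random-effect factor

block_levels = sorted(set(blocks))            # [1, 2, 3]

# X: n x p indicator matrix (intercept + treatment-B dummy)
X = [[1, 1 if t == "B" else 0] for t in treatments]

# Z: n x q indicator matrix, one column per block
Z = [[1 if b == lvl else 0 for lvl in block_levels] for b in blocks]

print(X)  # each row picks out that observation's fixed-effect combination
print(Z)  # each row picks out that observation's block (random effect)
```

Each row of X and Z contains exactly one indicator pattern, which is all "associating observations with effects" amounts to.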

4 Linear mixed model: y = Xβ + Zb + ε = Wγ + ε. Assumptions: (Y | B = b) ~ N(Xβ + Zb, σ₁² I_n) and B ~ N(0, Σ_θ), where Σ_θ = cov(B) is q×q, symmetric, and is factored as Σ_θ = σ₁² Λ_θ Λ_θ'. Using B = Λ_θ U gives (Y | U = u) ~ N(Xβ + Z Λ_θ u, σ₁² I_n) with U ~ N(0, σ₁² I_q). Introduction 4

5 Linear mixed model: compute û and β̂ for given θ as the penalized least-squares solution
(û, β̂) = argmin_{u,β} || y − Z Λ_θ u − X β ||² + || u ||²,
with normal equations
[ Λ_θ' Z' Z Λ_θ + I_q   Λ_θ' Z' X ] [ û ]   [ Λ_θ' Z' y ]
[ X' Z Λ_θ              X' X      ] [ β̂ ] = [ X' y      ]
solved via the Cholesky factorization L_θ L_θ' = Λ_θ' Z' Z Λ_θ + I_q, L_θ R_ZX = Λ_θ' Z' X, R_X' R_X = X' X − R_ZX' R_ZX. Introduction 5

6 What do we mean by prediction? Prediction here means estimation of effects in the model: a prediction is a linear combination of estimated effects. The key question is what is needed, a marginal or a conditional prediction? In the example: a variety or a nitrogen prediction? Introduction 6

7 Is there a general strategy? Problem: each situation needs to be analyzed by hand! Questions that might arise: inclusion or exclusion of random model terms from the prediction? Different weighting schemes? etc. Introduction 7

8 Predictions 2 types of prediction: Prediction of mixed effects Prediction of a future observation Introduction 8

9 Linear mixed model (recap of slide 3): y = Xβ + Zb + ε = Wγ + ε, with y the n×1 vector of observations, X the n×p fixed-effects design matrix, β the p×1 vector of fixed effects, Z the n×q random-effects design matrix, b the q×1 vector of random effects, ε the n×1 vector of residual errors, and W, γ the combined design matrix and effect vector. Prediction of Mixed Effects 9

10 Prediction of Mixed Effects. Assume all parameters are known, i.e. fixed effects and variance components are known. Consider γ = x'β + z'b = x'β + ξ, where x, z are known vectors and β, b are the vectors of fixed and random effects. The best predictor of ξ (in the MSE sense) is ξ̂ = E[ξ | y] = z' E[b | y]. Prediction of Mixed Effects 10

11 Assumption: b and y are jointly normal with E[b] = 0, E[y] = Xβ, cov(b, y) = cov(b) Z', R = cov(ε), and V = cov(y) = Z cov(b) Z' + R. Then E[b | y] = cov(b) Z' V⁻¹ (y − Xβ), so ξ̂ = z' cov(b) Z' V⁻¹ (y − Xβ). The best linear predictor of γ is therefore γ̂ = x'β + z' cov(b) Z' V⁻¹ (y − Xβ). Prediction of Mixed Effects 11

12 Example: IQ test. Estimate the true IQ of a student scoring 130 in a test. Model: IQ ~ N(100, 15²), score | IQ ~ N(IQ, 5²); y = µ + b + ε, where y is the student's test score and b the realization of a random effect. Predict µ + b, the student's true (unobservable) IQ. Result: IQ̂ = 127. Prediction of Mixed Effects 12
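The IQ example can be reproduced in a few lines: with known variance components, the best predictor shrinks the observed score towards the population mean. A minimal sketch (values taken from the slide):

```python
# The IQ example as a one-line best predictor: IQ ~ N(100, 15^2),
# score | IQ ~ N(IQ, 5^2). E[mu + b | y] shrinks the observed score
# towards the population mean by the factor var(b) / (var(b) + var(eps)).

mu, tau2, sigma2 = 100.0, 15.0 ** 2, 5.0 ** 2   # prior mean, var(b), var(eps)
y = 130.0                                       # observed test score

shrinkage = tau2 / (tau2 + sigma2)              # 225 / 250 = 0.9
iq_hat = mu + shrinkage * (y - mu)              # 100 + 0.9 * 30 = 127

print(iq_hat)  # 127.0
```

This is exactly the scalar case of the BLP formula on the previous slide: cov(b) Z' V⁻¹ reduces to τ²/(τ² + σ²).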

13 Prediction of Mixed Effects when fixed effects and variance components are unknown: 20 students, 5 tests per student; computations as described before. See R file rf_prediction3.r. Prediction of Mixed Effects 13

14 Prediction of Future Observations. AIM: construct prediction intervals, i.e. intervals in which a future observation will fall with a certain probability, given what has been observed. Examples: 1. Longitudinal studies: prediction of a future observation from an individual not previously observed. There is less interest in predicting another observation from an already observed individual, as such studies often aim at applications to a larger population, e.g. drugs going to the market after clinical trials. 2. Surveys: in a 2-step survey, A) a number of families is randomly selected, B) some members of each selected family are interviewed; prediction is then wanted for a non-selected family. Prediction of Future Observations 14

15 Prediction Intervals. a. Distributional assumption: the future observation has a certain distribution, defined up to a finite number of unknown parameters; estimate the parameters to obtain a prediction interval. BUT: if the distributional assumption fails, the interval may be wrong. b. Distribution-free: normality is not assumed; assumption: the future observation is independent of the current ones. c. Markov chain Monte Carlo. d. Et cetera. Prediction of Future Observations 15
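The distribution-free approach (b) can be illustrated with the classical order-statistic interval: for an i.i.d. continuous sample of size n, the next observation falls between the sample minimum and maximum with probability (n − 1)/(n + 1), with no normality needed. A small sketch with made-up data:

```python
# A minimal distribution-free prediction interval (approach b above):
# if Y_1, ..., Y_n, Y_{n+1} are i.i.d. continuous, then by exchangeability
# Y_{n+1} lies between min and max of the sample with probability
# (n - 1) / (n + 1). Data below are made up for illustration.

y = [4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.8, 4.0, 5.2]
n = len(y)

lower, upper = min(y), max(y)
coverage = (n - 1) / (n + 1)   # 9/11 for n = 10

print(f"PI: [{lower}, {upper}] with coverage {coverage:.3f}")
```

The price of dropping the distributional assumption is that the coverage level is fixed by n rather than chosen freely.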

16 Confidence vs. Prediction Intervals. Confidence interval (CI): interval estimate of an unobservable population parameter; it describes the distribution of an estimate of the quantity of interest (e.g. the true population mean). Prediction interval (PI): interval estimate of a future observation; it describes the distribution of individual future points. Prediction of Future Observations 16

17 Confidence vs. Prediction Intervals (mathematically). Fixed-effects model: y = Xβ + ε, ε ~ N(0, σ² I), with β̂ = (X'X)⁻¹ X' y and var(β̂) = var(β̂ − β) = σ² (X'X)⁻¹. True value: y_i = x_i'β; observed: y_i = x_i'β + ε_i; modelled: ŷ_i = x_i'β̂; predicted: ŷ_{n+1} = x_{n+1}'β̂. Both intervals have the form ŷ ± 1.96 √var(·) (normal approximation): the CI for the modelled mean, the PI for the predicted observation. Prediction of Future Observations 17

18 Confidence vs. Prediction Intervals (mathematically). CI: var(ŷ_i) = σ² x_i' (X'X)⁻¹ x_i. PI: var(ŷ_{n+1} − y_{n+1}) = σ² (x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Confidence interval: ŷ_i ± 1.96 σ̂ √(x_i' (X'X)⁻¹ x_i). Prediction interval: ŷ_{n+1} ± 1.96 σ̂ √(x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Prediction of Future Observations 18
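These formulas can be checked numerically for a simple linear regression. The sketch below (made-up data, normal-approximation multiplier 1.96) computes both half-widths at a point x₀ and shows that the PI is always wider because of the extra "+1" term:

```python
# CI vs PI half-widths for simple linear regression y = b0 + b1*x + eps,
# using the formulas above with the normal approximation (1.96).
# Pure Python; the data are made up for illustration.

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
n = len(x)

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx   # slope
b0 = sum(y) / n - b1 * xbar                                # intercept

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sigma2_hat = sum(r * r for r in resid) / (n - 2)           # estimate of sigma^2

x0 = 3.5  # point at which we predict
# leverage term x0'(X'X)^{-1} x0 for the design (1, x):
h = 1.0 / n + (x0 - xbar) ** 2 / sxx

ci_half = 1.96 * (sigma2_hat * h) ** 0.5           # CI for the mean response
pi_half = 1.96 * (sigma2_hat * (h + 1.0)) ** 0.5   # PI for a new observation

print(ci_half, pi_half)  # the PI is wider: the extra "+1" variance term
```

In large samples the leverage h shrinks towards zero, so the CI collapses while the PI width stays bounded below by roughly 1.96 σ.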

19 Confidence vs. Prediction Intervals How do we construct Prediction Intervals for a more general model? Mixed effects model: y = X β + Zb + ε Prediction of Future Observations 19

20 Prediction Intervals. Model: n observations, y = Xβ + Zb + ε, with b ~ N(0, Σ_θ) and ε ~ N(0, σ² I_n); here Σ_θ = σ₁² I_q and cov(ε) = σ² I_n, so V = cov(y) = Z Σ_θ Z' + cov(ε) = σ₁² Z Z' + σ² I_n. Estimate β and b: β̂ = (X' V̂⁻¹ X)⁻¹ X' V̂⁻¹ y and b̃ = σ̂₁² Z' V̂⁻¹ (y − X β̂), where V̂ = σ̂₁² Z Z' + σ̂² I_n. Prediction of Future Observations 20
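The estimates β̂ and b̃ can be computed directly for a toy random-intercept model with known variance components (two groups, two observations each; all values made up). This is only an illustrative pure-Python sketch, not a practical fitting routine; it checks the matrix formulas against the closed-form shrinkage solution for the balanced random-intercept case:

```python
# Sketch of the estimates on this slide for a random-intercept model with
# KNOWN variance components: beta_hat = (X'V^{-1}X)^{-1} X'V^{-1} y (GLS)
# and b_tilde = sigma1^2 Z'V^{-1}(y - X beta_hat). Toy data, pure Python.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inverse(A):
    """Gauss-Jordan inverse of a small square matrix (partial pivoting)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]

# Two groups, two observations each; intercept-only fixed part.
yv = [3.0, 5.0, 8.0, 10.0]
y = [[v] for v in yv]
X = [[1.0]] * 4
Z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
s1sq, ssq = 4.0, 1.0              # sigma1^2 (random effect), sigma^2 (residual)

ZZt = matmul(Z, transpose(Z))
V = [[s1sq * ZZt[i][j] + (ssq if i == j else 0.0) for j in range(4)]
     for i in range(4)]           # V = sigma1^2 ZZ' + sigma^2 I
Vi = inverse(V)

XtVi = matmul(transpose(X), Vi)
beta = matmul(inverse(matmul(XtVi, X)), matmul(XtVi, y))[0][0]   # GLS estimate

s1Zt = [[s1sq * z for z in row] for row in transpose(Z)]         # sigma1^2 Z'
resid = [[v - beta] for v in yv]                                 # y - X beta_hat
b = [row[0] for row in matmul(matmul(s1Zt, Vi), resid)]          # BLUPs

print(beta, b)
```

For this balanced design the GLS estimate equals the grand mean (6.5), and each BLUP equals the shrunken group deviation m σ₁²/(m σ₁² + σ²) · (ȳᵢ − β̂) = (8/9)(±2.5).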

21 Prediction Intervals. Marginal: cov(ŷ_i) = cov(x_i' β̂) = x_i' cov(β̂) x_i, and for a new observation ŷ_{n+1} − y_{n+1} = x_{n+1}'(β̂ − β) − z_{n+1}' b − ε_{n+1}, so cov(ŷ_{n+1} − y_{n+1}) = x_{n+1}' cov(β̂ − β) x_{n+1} + z_{n+1}' Σ_θ z_{n+1} + σ². Conditional (on all random effects): cov(ŷ_{n+1} − y_{n+1} | b = b̃) = σ² (x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Prediction of Future Observations 21

22 Prediction Intervals. Marginal confidence interval: ŷ_i ± 1.96 √(x_i' cov(β̂) x_i). Marginal prediction interval: ŷ_{n+1} ± 1.96 √(x_{n+1}' cov(β̂ − β) x_{n+1} + z_{n+1}' Σ_θ z_{n+1} + σ²). Conditional on all random effects, prediction interval: ŷ_{n+1} ± 1.96 σ̂ √(x_{n+1}' (X'X)⁻¹ x_{n+1} + 1). Prediction of Future Observations 22

23 Prediction Intervals There is a difference between marginal and conditional predictions! Which one is of interest? Prediction of Future Observations 23

24 The prediction process. A prediction: is a linear function combining the best linear unbiased estimators (BLUEs) of the fixed effects with the best linear unbiased predictors (BLUPs) of the random effects in the model; is typically associated with a combination of explanatory variables, either averaged over, ignoring, or at a specific value of the other explanatory variables in the model. Principles of Prediction 24

25 The prediction process. Partition the explanatory variables (e.v.) into 3 sets: 1. Classifying set: e.v. for which predicted values are required. 2. Averaging set: e.v. which are averaged over. 3. Rest: e.v. which are ignored. Principles of Prediction 25

26 The role of fixed and random effects with respect to prediction. Fixed terms: have an associated set of effects (parameters) which have to be estimated. Random terms: the associated effects are normally distributed with mean 0 and a covariance matrix that is a function of (usually) unknown parameters; they are error terms due to randomization or other structure of the data. Principles of Prediction 26

27 How to deal with random factor terms: 1. Evaluate at value(s) specified by the user. 2. Average over the set of random effects: the prediction is specific to, i.e. conditional on, the random effects observed (conditional prediction w.r.t. the term). 3. Omit the random term from the model: prediction at the population average (zero), substituting the assumed population mean for an unknown random effect (marginal prediction w.r.t. the term). Principles of Prediction 27

28 How to deal with fixed factors: there is no pre-defined population average and no natural interpretation for a prediction derived by omitting a fixed term from the fitted values. Either average over all levels present to give a conditional average, or let the user specify the value(s). Principles of Prediction 28

29 4 conceptual steps for the prediction process: 1. Choose the e.v. and their respective values for which predictive margins are required, i.e. determine the classifying set. 2. Determine which variables should be averaged over, i.e. determine the averaging set. 3. Determine the terms needed to compute parameter estimates. 4. Choose the weighting for taking means over margins (for the averaging set). Principles of Prediction 29

30 Experiment: 4 levels of nitrogen, 3 oat varieties, 6 replicates (i.e. 6 blocks), each block divided into 3 whole-plots with 4 subplots each; random allocation of nitrogen within a block. Split-plot design. Fixed effects: treatment combinations. Random effects: blocking factors (source of error variation). AIM: estimate the performance of each treatment combination within the experiment. Split-Plot Design 30

31 The Data-Set Split-Plot Design 31

32 The model components: fixed ~ constant + variety + nitrogen + variety:nitrogen; random ~ blocks + blocks:wplots; residual ~ blocks:wplots:splots. The random terms are the error terms used in the estimation of treatment effects and are not otherwise relevant to the prediction of treatment effects. Split-Plot Design 32

33 Conditional vs. Marginal prediction for the random effects Conditional prediction: gives a prediction specific to the blocks and plots used in the experiment appropriate to inference for the specific instance that occurred in the dataset Marginal prediction: the prediction corresponds to the yields expected from a similar experiment laid out using different blocks and plots appropriate when inference is required for members of the wider population Split-Plot Design 33

34 The model: y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk, where y_ijk is the yield on block i = 1,...,6, whole-plot j = 1,...,3, sub-plot k = 1,...,4; µ is the overall constant; b_i the effect of block i; v_r(ij) the effect of variety r, with r(ij) the randomization of varieties to whole-plots; w_ij the effect of whole-plot j in block i; n_s(ijk) the effect of nitrogen level s, with s(ijk) the randomization of nitrogen levels to sub-plots; (vn)_rs the interaction of variety r with nitrogen level s; and e_ijk the residual error for block i, whole-plot j, sub-plot k. Split-Plot Design 34

35 Assumption for y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk: for b = (b_1,...,b_6)', w = (w_11,...,w_63)' and e = (e_111,...,e_634)' we assume a normal distribution, (b, w, e)' ~ N(0, blockdiag(σ_b² I_6, σ_w² I_18, σ² I_72)). Split-Plot Design 35

36 y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk. ANOVA: no significant variety:nitrogen interaction; usually one would drop non-significant terms. Split-Plot Design 36

37 Prediction process for y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk. Prediction of the yield for each nitrogen level = general effect of different nitrogen applications: an unweighted average across all varieties, i.e. the prediction of the yield for nitrogen level l for an average block and whole-plot (marginal prediction w.r.t. block + whole-plot). Calculation, ignoring the random terms: µ̂ + n̂_l + (1/3) Σ_{j=1}^{3} (v̂_j + (vn)ˆ_jl). Split-Plot Design 37
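Numerically, this marginal nitrogen prediction is just an unweighted average of the fitted variety × nitrogen cell means over the three varieties. A sketch with hypothetical fitted values (the cell means below are made up, not the oats data):

```python
# Marginal nitrogen prediction as an unweighted average, over the three
# varieties, of the fitted cell means mu + n_l + v_j + vn_jl.
# The fitted values below are hypothetical, for illustration only.

fitted = {  # (variety, nitrogen level) -> fitted cell mean
    ("GoldenRain", 0.0): 80.0, ("GoldenRain", 0.2): 98.0,
    ("Marvellous", 0.0): 86.0, ("Marvellous", 0.2): 108.0,
    ("Victory",    0.0): 71.0, ("Victory",    0.2): 89.0,
}
varieties = ["GoldenRain", "Marvellous", "Victory"]
nitrogen = [0.0, 0.2]

# equal weights across varieties = the (1/3) sum on the slide
pred = {n: sum(fitted[(v, n)] for v in varieties) / len(varieties)
        for n in nitrogen}
print(pred)
```

The equal weights correspond to the "e" entries in the averaging-set table on slide 39.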

38 Prediction process for y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk. Prediction specific to the blocks and whole-plots in the experiment (conditional prediction). Calculation, including the random terms: µ̂ + (1/6) Σ_{i=1}^{6} b̃_i + n̂_l + (1/18) Σ_{i=1}^{6} Σ_{j=1}^{3} w̃_ij + (1/3) Σ_{j=1}^{3} (v̂_j + (vn)ˆ_jl). Split-Plot Design 38

39 Prediction Process

Explanatory variable | Set (M/C) | Levels (M/C) | Averaging (M/C)
Variety              | a / a     | all / all    | e / e
Nitrogen             | c / c     | all / all    | n / n
Blocks               | x / a     | -   / all    | - / e
Wplots               | x / a     | -   / all    | - / e
Splots               | x / x     | -   / -      | - / -

M: marginal prediction, C: conditional prediction; a: averaging set, c: classifying set, x: excluded; e: equal weights, n: none. Split-Plot Design 39

40 Prediction Process

Model term           | M | C
Constant             | + | +
Variety              | + | +
Nitrogen             | + | +
Variety:nitrogen     | + | +
Blocks               | x | +
Blocks:wplots        | x | +
Blocks:wplots:splots | x | x

+: used, x: ignored. Split-Plot Design 40

41 The resulting predictions: predictions for the nitrogen application levels (cwt/acre, starting at 0.0) with SE and SED, using marginal (M) or conditional (C) values of blocks + whole-plots (table of numeric values not reproduced here). Split-Plot Design 41

42 The resulting predictions: variety × nitrogen predictions for Golden Rain, Marvellous and Victory at each nitrogen application level (cwt/acre), with margins (table of numeric values not reproduced here). Split-Plot Design 42

43 The resulting predictions: the SE is smaller for conditional predictions because they are calculated conditional on the blocks and whole-plots observed! Using marginal values means no information on the block + whole-plot effects is used. Split-Plot Design 43

44 Special case: data missing Data for all replicates of Golden Rain with 0 cwt nitrogen are missing Split-Plot Design 44

45 Special case: data missing. In the variety × nitrogen prediction table, the Golden Rain / 0 cwt nitrogen cell is marked "???": the corresponding cell cannot be estimated without additional assumptions! Split-Plot Design 45

46 Special case: data missing. No significant variety main effect or interactions are present in the model, so the approach chosen has no great influence on the nitrogen predictions; consider the variety predictions instead. Possible approaches: set inestimable parameters to 0 and average over all cells; average over the cells with data present; average over the levels of nitrogen for which all varieties are present. Split-Plot Design 46
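The three approaches can be sketched on a hypothetical cell-mean table with the Golden Rain / 0 cwt cell inestimable (all values below are made up, not the experiment's results):

```python
# Three ways to form a variety margin when one cell is missing
# (here: GoldenRain at 0.0 cwt): (i) set the inestimable cell to 0 and
# average over all cells, (ii) average only over cells with data, and
# (iii) average only over nitrogen levels at which ALL varieties are
# present. Cell values are hypothetical, for illustration only.

cells = {  # variety -> {nitrogen level: cell mean, or None if inestimable}
    "GoldenRain": {0.0: None, 0.2: 98.0, 0.4: 115.0},
    "Marvellous": {0.0: 86.0, 0.2: 108.0, 0.4: 117.0},
    "Victory":    {0.0: 71.0, 0.2: 89.0,  0.4: 110.0},
}

def margin_zero(row):        # (i) inestimable parameters set to zero
    return sum(v if v is not None else 0.0 for v in row.values()) / len(row)

def margin_present(row):     # (ii) average over cells with data present
    vals = [v for v in row.values() if v is not None]
    return sum(vals) / len(vals)

def margin_common(cells, variety):   # (iii) levels present for all varieties
    common = [n for n in cells[variety]
              if all(r[n] is not None for r in cells.values())]
    return sum(cells[variety][n] for n in common) / len(common)

gr = cells["GoldenRain"]
print(margin_zero(gr), margin_present(gr), margin_common(cells, "GoldenRain"))
```

Approach (i) drags the margin down sharply, which is why the resulting variety ordering can change, as the following slides show.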

47 The resulting predictions: variety predictions with the Golden Rain + 0 cwt nitrogen plots set to missing, comparing the three approaches (inestimable parameters zero; averaging over data present; averaging over common nitrogen levels) against the complete-data margins (table of numeric values not reproduced here). Split-Plot Design 47

48 The resulting predictions: in the second case (averaging over cells with data present) the variety ordering has changed; the predictions are not comparable because of the large nitrogen effect. Split-Plot Design 48

49 Comparison of the variety predictions (Golden Rain, Marvellous, Victory, with margins) for the complete data and for the missing-data case under the three approaches (inestimable parameters zero; data present; common nitrogen levels) (tables of numeric values not reproduced here). Split-Plot Design 49

50 Other special cases: data missing First entry in the data is missing (i.e. Victory, 0.0 cwt/acre, Block I) Data in Block I for all replicates with 0.0 cwt nitrogen are missing Split-Plot Design 50

51 Other special cases: data missing All Data for replicates with 0.0 cwt nitrogen are missing Split-Plot Design 51

52 Making the computations R-FILE! Split-Plot Design 52


More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Instrumental Variables

Instrumental Variables Instrumental Variables Department of Economics University of Wisconsin-Madison September 27, 2016 Treatment Effects Throughout the course we will focus on the Treatment Effect Model For now take that to

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100%

QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 2013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100% QUIZ 4 (CHAPTER 7) - SOLUTIONS MATH 119 SPRING 013 KUNIYUKI 105 POINTS TOTAL, BUT 100 POINTS = 100% 1) We want to conduct a study to estimate the mean I.Q. of a pop singer s fans. We want to have 96% confidence

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ

Characterizing Forecast Uncertainty Prediction Intervals. The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ Characterizing Forecast Uncertainty Prediction Intervals The estimated AR (and VAR) models generate point forecasts of y t+s, y ˆ t + s, t. Under our assumptions the point forecasts are asymtotically unbiased

More information

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars)

Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) STAT:5201 Applied Statistic II Random Coefficient Model (a.k.a. multilevel model) (Adapted from UCLA Statistical Computing Seminars) School math achievement scores The data file consists of 7185 students

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression

Chapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing

More information

Lecture 24: Weighted and Generalized Least Squares

Lecture 24: Weighted and Generalized Least Squares Lecture 24: Weighted and Generalized Least Squares 1 Weighted Least Squares When we use ordinary least squares to estimate linear regression, we minimize the mean squared error: MSE(b) = 1 n (Y i X i β)

More information

Marcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design

Marcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling ETH Zürich, October 22, 2012 1 What is Regression?

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Lecture 4: Multivariate Regression, Part 2

Lecture 4: Multivariate Regression, Part 2 Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Topic 14: Inference in Multiple Regression

Topic 14: Inference in Multiple Regression Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Econometrics. 7) Endogeneity

Econometrics. 7) Endogeneity 30C00200 Econometrics 7) Endogeneity Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Common types of endogeneity Simultaneity Omitted variables Measurement errors

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Describing Contingency tables

Describing Contingency tables Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

3. Design Experiments and Variance Analysis

3. Design Experiments and Variance Analysis 3. Design Experiments and Variance Analysis Isabel M. Rodrigues 1 / 46 3.1. Completely randomized experiment. Experimentation allows an investigator to find out what happens to the output variables when

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

MSc / PhD Course Advanced Biostatistics. dr. P. Nazarov

MSc / PhD Course Advanced Biostatistics. dr. P. Nazarov MSc / PhD Course Advanced Biostatistics dr. P. Nazarov petr.nazarov@crp-sante.lu 04-1-013 L4. Linear models edu.sablab.net/abs013 1 Outline ANOVA (L3.4) 1-factor ANOVA Multifactor ANOVA Experimental design

More information

Health utilities' affect you are reported alongside underestimates of uncertainty

Health utilities' affect you are reported alongside underestimates of uncertainty Dr. Kelvin Chan, Medical Oncologist, Associate Scientist, Odette Cancer Centre, Sunnybrook Health Sciences Centre and Dr. Eleanor Pullenayegum, Senior Scientist, Hospital for Sick Children Title: Underestimation

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information