Prediction of New Observations
1 Statistics Seminar: 6th talk, ETHZ FS2010. Prediction of New Observations. Martina Albers, 12 April 2010. Papers: Welham (2004), Jiang (2007)
2 Content

- Introduction
- Prediction of Mixed Effects
- Prediction of Future Observations
- Principles of Prediction (prediction process, fixed and random terms)
- Example: Split-Plot Design
3 Linear mixed model

y = Xβ + Zb + ε = Wγ + ε

- y: n×1 vector of observations
- X: n×p matrix associating observations with the appropriate combination of fixed effects
- β: p×1 vector of fixed effects
- Z: n×q matrix associating observations with the appropriate combination of random effects
- b: q×1 vector of random effects
- ε: n×1 vector of residual errors
- W, γ: combined design matrix and combined vector of effects, respectively

Introduction
4 Linear mixed model

y = Xβ + Zb + ε = Wγ + ε

Assumption:

(Y | B = b) ~ N(Xβ + Zb, σ₁² I_n)
B ~ N(0, Σ_θ), where Σ_θ = cov(B) is q×q, symmetric, and Σ_θ = σ₁² Λ_θ Λ_θ'

Using B = Λ_θ U:

(Y | U = u) ~ N(Xβ + ZΛ_θ u, σ₁² I_n)
U ~ N(0, σ₁² I_q)
5 Linear mixed model: computing û and β̂

Given θ̂, the estimates û and β̂ solve the penalized least-squares problem

(û, β̂) = argmin over (u, β) of ‖ [y; 0] − [ZΛ_θ, X; I_q, 0] [u; β] ‖²

(rows of block matrices separated by ";"). The corresponding normal equations are

[Λ_θ'Z'ZΛ_θ + I_q, Λ_θ'Z'X; X'ZΛ_θ, X'X] [û; β̂] = [Λ_θ'Z'y; X'y]

and are solved via the Cholesky factorization

[L_θ, 0; R_ZX', R_X'] [L_θ', R_ZX; 0, R_X] = [Λ_θ'Z'ZΛ_θ + I_q, Λ_θ'Z'X; X'ZΛ_θ, X'X],

i.e. L_θ L_θ' = Λ_θ'Z'ZΛ_θ + I_q, L_θ R_ZX = Λ_θ'Z'X and R_X'R_X = X'X − R_ZX'R_ZX.
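The penalized least-squares step on this slide can be checked numerically. The following is an illustrative sketch (not from the talk), with a small simulated problem and a fixed, known Λ_θ, verifying that the stacked least-squares problem and the bordered normal equations give the same (û, β̂):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dimensions; theta (and hence Lambda_theta) is held fixed here.
n, p, q = 20, 2, 3
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed-effects design
Z = rng.normal(size=(n, q))                            # random-effects design
Lam = 0.8 * np.eye(q)                                  # Lambda_theta (assumed)
y = rng.normal(size=n)

ZL = Z @ Lam
# Stacked system: minimise ||y - ZL u - X beta||^2 + ||u||^2
A = np.block([[ZL, X], [np.eye(q), np.zeros((q, p))]])
rhs = np.concatenate([y, np.zeros(q)])
sol = np.linalg.lstsq(A, rhs, rcond=None)[0]
u_hat, beta_hat = sol[:q], sol[q:]

# Equivalent bordered normal equations from the slide:
M = np.block([[ZL.T @ ZL + np.eye(q), ZL.T @ X],
              [X.T @ ZL,              X.T @ X]])
r = np.concatenate([ZL.T @ y, X.T @ y])
sol2 = np.linalg.solve(M, r)
```

In lme4-style implementations the bordered system is attacked with the sparse Cholesky factorization shown above rather than a dense solve; the dense version here is only meant to make the algebra concrete.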
6 What do we mean by prediction?

- Estimation of effects in the model; a prediction is a linear combination of estimated effects
- Marginal vs. conditional predictions: which one is needed?
- In the example: variety or nitrogen prediction?
7 Is there a general strategy?

Problem: each situation needs to be analyzed by hand! Questions that might arise:

- Inclusion or exclusion of random model terms from the prediction?
- Different weighting schemes?
- etc.
8 Predictions

Two types of prediction:

1. Prediction of mixed effects
2. Prediction of a future observation
9 Linear mixed model (recap)

y = Xβ + Zb + ε = Wγ + ε

with y, X, β, Z, b, ε and W, γ as defined in the Introduction.

Prediction of Mixed Effects
10 Prediction of Mixed Effects

Assume first that all parameters are known, i.e. both the fixed effects and the variance components.

Consider γ = x'β + z'b = x'β + ξ, where x, z are known vectors and β, b are the vectors of fixed and random effects.

The best predictor of ξ (in the mean-squared-error sense) is

ξ̂ = E[ξ | y] = z' E[b | y]
11 Assumption:

[b; y] ~ N( [0; Xβ], [cov(b), cov(b)Z'; Z cov(b), V] )

where R = cov(ε) and V = cov(y) = Z cov(b) Z' + R. Then

E[b | y] = cov(b) Z' V⁻¹ (y − Xβ)
ξ̂ = z' cov(b) Z' V⁻¹ (y − Xβ)

and the best linear predictor of γ is

γ̂ = x'β + z' cov(b) Z' V⁻¹ (y − Xβ)
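The BLUP formula E[b | y] = cov(b) Z' V⁻¹ (y − Xβ) can be checked on simulated data. A minimal sketch with known parameters, using a random-intercept model invented for illustration (three groups of four observations):

```python
import numpy as np

rng = np.random.default_rng(0)

# Three groups, four observations each, random intercept per group.
n_groups, n_per = 3, 4
n = n_groups * n_per
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + slope
beta = np.array([1.0, 2.0])                            # known fixed effects
Z = np.kron(np.eye(n_groups), np.ones((n_per, 1)))     # group indicators
G = 0.5 * np.eye(n_groups)                             # cov(b), known
R = 0.2 * np.eye(n)                                    # cov(eps), known

b = rng.multivariate_normal(np.zeros(n_groups), G)
y = X @ beta + Z @ b + rng.multivariate_normal(np.zeros(n), R)

V = Z @ G @ Z.T + R                                    # cov(y)
b_blup = G @ Z.T @ np.linalg.solve(V, y - X @ beta)    # E[b | y]

# Best linear predictor of gamma = x'beta + z'b for a new x, z:
x_new = np.array([1.0, 0.5])
z_new = np.array([1.0, 0.0, 0.0])                      # observation from group 1
gamma_hat = x_new @ beta + z_new @ b_blup
```

For this random-intercept structure the formula reduces to the familiar shrinkage of group-mean residuals toward zero by the factor n_per·g / (n_per·g + σ²), which is a useful sanity check on the matrix computation.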
12 Example: IQ test

Estimate the true IQ of a student scoring 130 in a test.

Model: IQ ~ N(100, 15²), score | IQ ~ N(IQ, 5²)

y = µ + b + ε, where y is the student's test score and b is the realization of a random effect.

Predict µ + b, the student's true IQ (unobservable).

Result: ÎQ = 127
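The result follows from the shrinkage form of the best predictor, E[IQ | score] = µ + σ_b²/(σ_b² + σ_ε²)·(score − µ), with the variances stated on the slide. A one-line check:

```python
# Best predictor of the true IQ given a single test score:
# E[IQ | score] = mu + sigma_b^2 / (sigma_b^2 + sigma_e^2) * (score - mu)
mu = 100.0           # population mean IQ
sigma_b2 = 15.0**2   # between-student variance (prior): 225
sigma_e2 = 5.0**2    # test measurement-error variance: 25
score = 130.0

shrinkage = sigma_b2 / (sigma_b2 + sigma_e2)  # 225/250 = 0.9
iq_hat = mu + shrinkage * (score - mu)
print(round(iq_hat, 6))  # 127.0
```

The prediction is pulled from the observed score of 130 toward the population mean of 100, which is exactly the behaviour of the BLUP for a random effect.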
13 Prediction of Mixed Effects: fixed effects and variance components unknown

- 20 students, 5 tests each
- computations as described before
- see R file: rf_prediction3.r
14 Prediction of Future Observations

AIM: construct prediction intervals, i.e. intervals in which a future observation will fall with a certain probability, given what has been observed.

Examples:

1. Longitudinal studies: prediction of a future observation from an individual not previously observed. There is less interest in predicting another observation from an already observed individual, as such studies often aim at applications to a larger population (e.g. drugs going to market after clinical trials).
2. Surveys: in a two-step survey, (A) a number of families is randomly selected and (B) some members of each selected family are interviewed. Prediction is then for a non-selected family.

Prediction of Future Observations
15 Prediction Intervals

a. Assume the future observation has a certain distribution, defined up to a finite number of unknown parameters; estimate the parameters to obtain a prediction interval. BUT: if the distributional assumption fails, the interval might be wrong.
b. Distribution-free: normality is not assumed; the only assumption is that the future observation is independent of the current ones.
c. Markov chains, Monte Carlo methods
d. et cetera
16 Confidence vs. Prediction Intervals

Confidence Interval (CI):
- interval estimate of a population parameter
- concerns an unobservable population parameter
- predicts the distribution of an estimate of the unobservable quantity of interest (e.g. the true population mean)

Prediction Interval (PI):
- interval estimate for a future observation
- predicts the distribution of individual future points
17 Confidence vs. Prediction Intervals (mathematically)

Fixed-effects model: y = Xβ + ε, ε ~ N(0, σ²)

β̂ = (X'X)⁻¹X'y
var(β̂) = var(β̂ − β) = σ²(X'X)⁻¹

true: y_i = x_i'β; observed: y_i = x_i'β + ε_i
modelled (CI): ŷ_i = x_i'β̂; predicted (PI): ŷ_{n+1} = x_{n+1}'β̂

Interval: ŷ ± 1.96·√var (normal approximation)
18 Confidence vs. Prediction Intervals (mathematically)

CI: var(ŷ_i) = σ² x_i'(X'X)⁻¹x_i
PI: var(ŷ_{n+1} − y_{n+1}) = σ² ( x_{n+1}'(X'X)⁻¹x_{n+1} + 1 )

Confidence interval: ŷ_i ± 1.96·σ̂·√( x_i'(X'X)⁻¹x_i )
Prediction interval: ŷ_{n+1} ± 1.96·σ̂·√( x_{n+1}'(X'X)⁻¹x_{n+1} + 1 )
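A quick numerical comparison of the two interval formulas on simulated data (the regression and its coefficients are invented for illustration; the "+1" term is what makes the prediction interval wider):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated simple linear regression y = 2 + 3x + eps
n = 50
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2 + 3 * x + rng.normal(scale=1.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - 2)           # unbiased estimate of sigma^2

x0 = np.array([1.0, 5.0])                      # point x = 5
y0_hat = x0 @ beta_hat
h = x0 @ XtX_inv @ x0                          # x0'(X'X)^-1 x0

ci_half = 1.96 * np.sqrt(sigma2_hat * h)        # CI half-width (mean response)
pi_half = 1.96 * np.sqrt(sigma2_hat * (h + 1))  # PI half-width (new observation)
print(ci_half < pi_half)  # True: the PI is always wider
```

As n grows, h shrinks toward zero, so the CI collapses onto the fitted line while the PI half-width stays bounded below by 1.96·σ̂.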
19 Confidence vs. Prediction Intervals

How do we construct prediction intervals for a more general model, the mixed-effects model y = Xβ + Zb + ε?
20 Prediction Intervals

Model (n observations): y = Xβ + Zb + ε, with b ~ N(0, Σ_θ), Σ_θ = σ₁² I_q, and ε ~ N(0, σ² I_n), so that

V = cov(y) = ZΣ_θZ' + cov(ε) = σ₁² ZZ' + σ² I_n

Estimate β and b:

β̂ = (X'V̂⁻¹X)⁻¹ X'V̂⁻¹ y
b̃ = σ̂₁² Z'V̂⁻¹ (y − Xβ̂), where V̂ = σ̂₁² ZZ' + σ̂² I_n
21 Prediction Intervals

Marginal:

cov(ŷ_i) = cov(x_i'β̂) = x_i' cov(β̂) x_i
var(ŷ_{n+1} − y_{n+1}) = var( x_{n+1}'(β̂ − β) + z_{n+1}'b + ε_{n+1} )
                       = x_{n+1}' cov(β̂ − β) x_{n+1} + z_{n+1}'Σ_θz_{n+1} + σ²

Conditional (on all random effects):

var(ŷ_{n+1} − y_{n+1} | b = b̃) = σ² ( x_{n+1}'(X'X)⁻¹x_{n+1} + 1 )
22 Prediction Intervals

Marginal:

Confidence interval: ŷ_i ± 1.96·√( x_i' cov(β̂) x_i )
Prediction interval: ŷ_{n+1} ± 1.96·√( x_{n+1}' cov(β̂ − β) x_{n+1} + z_{n+1}'Σ̂_θz_{n+1} + σ̂² )

Conditional on all random effects:

Prediction interval: ŷ_{n+1} ± 1.96·σ̂·√( x_{n+1}'(X'X)⁻¹x_{n+1} + 1 )
23 Prediction Intervals There is a difference between marginal and conditional predictions! Which one is of interest? Prediction of Future Observations 23
24 The prediction process

A prediction:
- is a linear function combining the best linear unbiased predictors of the random effects with the best linear unbiased estimators of the fixed effects in the model
- is typically associated with a combination of explanatory variables, either averaged over, ignoring, or at a specific value of the other explanatory variables in the model

Principles of Prediction
25 The prediction process

Partition the explanatory variables (e.v.) into 3 sets:

1. Classifying set: e.v. for which predicted values are required
2. Averaging set: e.v. which are to be averaged over
3. Rest: e.v. which will be ignored
26 The role of fixed and random effects with respect to prediction

Fixed terms:
- have an associated set of effects (parameters) which have to be estimated

Random terms:
- associated effects are normally distributed with mean 0 and a covariance matrix
- the covariance matrix is a function of (usually) unknown parameters
- error terms arise from randomization or other structure of the data
27 How to deal with random factor terms

1. Evaluate at value(s) specified by the user.
2. Average over the set of random effects: the prediction is specific to / conditional on the random effects observed (conditional prediction w.r.t. the term).
3. Omit the random term from the model: prediction at the population average (zero); this substitutes the assumed population mean for an unknown random effect (marginal prediction w.r.t. the term).
28 How to deal with fixed factors

- There is no pre-defined population average, so there is no natural interpretation for a prediction derived by omitting a fixed term from the fitted values.
- Instead, average over all the levels present to give a conditional average, or let the user specify the value(s).
29 Four conceptual steps for the prediction process

1. Choose the e.v. and their respective values for which predictive margins are required, i.e. determine the classifying set.
2. Determine which variables should be averaged over, i.e. determine the averaging set.
3. Determine the terms needed to compute parameters and estimates.
4. Choose the weighting for taking means over margins (for the averaging set).
30 Example: Split-Plot Design

Experiment:
- 4 levels of nitrogen, 3 oat varieties
- 6 replicates, i.e. 6 blocks
- each block is divided into 3 whole-plots, each whole-plot into 4 subplots
- varieties are randomly allocated to whole-plots within a block, nitrogen levels to subplots within a whole-plot

Fixed effects: treatment combinations. Random effects: blocking factors (sources of error variation).

AIM: estimate the performance of each treatment combination within the experiment.

Split-Plot Design
31 The Data-Set (data table not captured in the transcription)
32 The model: components

fixed:    ~ constant + variety + nitrogen + variety:nitrogen
random:   ~ blocks + blocks:wplots
residual: ~ blocks:wplots:splots

The random terms are the error terms used in the estimation of treatment effects; they are not otherwise relevant to the prediction of treatment effects.
33 Conditional vs. marginal prediction for the random effects

Conditional prediction:
- gives a prediction specific to the blocks and plots used in the experiment
- appropriate for inference about the specific instance that occurred in the dataset

Marginal prediction:
- corresponds to the yields expected from a similar experiment laid out using different blocks and plots
- appropriate when inference is required for members of the wider population
34 The model

y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk

- y_ijk: yield on block i = 1,…,6, whole-plot j = 1,…,3, sub-plot k = 1,…,4
- µ: overall constant
- b_i: effect of block i
- v_r(ij): effect of variety r(ij), where r(ij) reflects the randomization of varieties to whole-plots
- w_ij: effect of whole-plot j in block i
- n_s(ijk): effect of nitrogen level s(ijk), where s(ijk) reflects the randomization of nitrogen levels to sub-plots
- (vn)_rs: interaction of variety level r with nitrogen level s
- e_ijk: residual error for block i, whole-plot j, sub-plot k
35 Assumption

y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk

For the following terms we assume a normal distribution, with b = (b_1,…,b_6)', w = (w_11,…,w_63)' and e = (e_111,…,e_634)':

[b; w; e] ~ N( 0, diag( σ_b² I_6, σ_w² I_18, σ² I_72 ) )
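The block-diagonal covariance structure above can be assembled directly. A sketch with made-up variance-component values (the slide does not report the estimates), just to make the 96×96 structure concrete:

```python
import numpy as np

# Illustrative variance components -- invented, not the oats estimates.
sigma_b2, sigma_w2, sigma2 = 200.0, 100.0, 160.0

def block_diag(*mats):
    """Minimal block-diagonal helper (NumPy itself has none outside SciPy)."""
    sizes = [m.shape[0] for m in mats]
    out = np.zeros((sum(sizes), sum(sizes)))
    i = 0
    for m in mats:
        k = m.shape[0]
        out[i:i + k, i:i + k] = m
        i += k
    return out

# cov of (b, w, e): 6 block effects, 18 whole-plot effects, 72 residuals.
cov = block_diag(sigma_b2 * np.eye(6),
                 sigma_w2 * np.eye(18),
                 sigma2 * np.eye(72))
print(cov.shape)  # (96, 96)
```

The three diagonal blocks are the only nonzero entries: block, whole-plot and residual effects are mutually independent under the model.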
36 ANOVA

y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk

- no significant variety:nitrogen interaction
- usually one would drop non-significant terms
37 Prediction process

y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk

Prediction of the yield for each nitrogen level = general effect of different nitrogen applications:
- unweighted average across all varieties
- prediction of the yield for nitrogen level l for an average block and whole-plot
- marginal prediction w.r.t. block and whole-plot

Calculation (ignore the random terms):

µ̂ + n̂_l + (1/3) Σ_{j=1}^{3} ( v̂_j + (v̂n)_jl )
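The averaging in this formula is plain arithmetic over the estimated effects. A sketch with invented effect estimates (not the oats results) and a hypothetical helper name:

```python
import numpy as np

# Illustrative (made-up) effect estimates -- not the fitted oats values.
mu_hat = 100.0
n_hat = {0: 0.0, 1: 15.0, 2: 25.0, 3: 30.0}            # nitrogen-level effects
v_hat = {"GoldenRain": 0.0, "Marvellous": 5.0, "Victory": -7.0}
vn_hat = {(v, l): 0.0 for v in v_hat for l in n_hat}   # interaction dropped here

def marginal_nitrogen_prediction(l):
    """mu + n_l + unweighted average over varieties of (v_j + vn_jl);
    random terms (blocks, whole-plots) omitted -> marginal prediction."""
    avg = np.mean([v_hat[v] + vn_hat[(v, l)] for v in v_hat])
    return mu_hat + n_hat[l] + avg

pred = marginal_nitrogen_prediction(1)
```

With these numbers the variety average contributes (0 + 5 − 7)/3, so the marginal prediction for level 1 is 100 + 15 − 2/3.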
38 Prediction process

y_ijk = µ + b_i + v_r(ij) + w_ij + n_s(ijk) + (vn)_rs + e_ijk

Prediction specific to the blocks and whole-plots in the experiment: conditional prediction.

Calculation (include the random terms):

µ̂ + (1/6) Σ_{i=1}^{6} b̃_i + n̂_l + (1/18) Σ_{i=1}^{6} Σ_{j=1}^{3} w̃_ij + (1/3) Σ_{j=1}^{3} ( v̂_j + (v̂n)_jl )
39 Prediction process

Explanatory variable | Set (M / C) | Levels (M / C) | Averaging (M / C)
Variety              | a / a       | all / all      | e / e
Nitrogen             | c / c       | all / all      | n / n
Blocks               | x / a       | −  / all       | − / e
Wplots               | x / a       | −  / all       | − / e
Splots               | x / x       | −  / −         | − / −

M: marginal prediction, C: conditional prediction; a: averaging set, c: classifying set, x: excluded; e: equal weights, n: none
40 Prediction process

Model term           | In prediction? (M) | In prediction? (C)
Constant             | +                  | +
Variety              | +                  | +
Nitrogen             | +                  | +
Variety:nitrogen     | +                  | +
Blocks               | x                  | +
Blocks:wplots        | x                  | +
Blocks:wplots:splots | x                  | x

+: used, x: ignored
41 The resulting predictions

Table: predictions for the nitrogen application levels (cwt/acre) with SE and SED, using marginal (M) or conditional (C) values of blocks and whole-plots. (Numeric values not captured in the transcription.)
42 The resulting predictions

Table: variety × nitrogen predictions (varieties Golden Rain, Marvellous, Victory; nitrogen application in cwt/acre), with margins. (Numeric values not captured in the transcription.)
43 The resulting predictions

The SE is smaller for the conditional predictions because they are calculated conditional on the blocks and whole-plots actually observed. Using marginal values means no information on the block and whole-plot effects is used.
44 Special case: data missing Data for all replicates of Golden Rain with 0 cwt nitrogen are missing Split-Plot Design 44
45 Special case: data missing

Table: variety × nitrogen predictions with the Golden Rain / 0 cwt cell marked "???"; the corresponding cell cannot be estimated without additional assumptions! (Numeric values not captured in the transcription.)
46 Special case: data missing

No significant variety main effect or interactions are present in the model, so the approach chosen has no great influence on the nitrogen predictions; consider the variety predictions instead.

Possible approaches:
- set inestimable parameters to 0 and average over all cells
- average over the cells with data present
- average over the levels of nitrogen for which all varieties are present
47 The resulting predictions

Table: variety predictions (Golden Rain, Marvellous, Victory) with the Golden Rain + 0 cwt nitrogen plots set to missing, under the three approaches (inestimable parameters zero / averaging over data present / averaging over nitrogen levels with all varieties), compared with the complete data. (Numeric values not captured in the transcription.)
48 The resulting predictions

Table: variety predictions under the three approaches (inestimable parameters zero / data present / nitrogen levels). In the second case the variety ordering has changed: the predictions are not comparable because of the large nitrogen effect. (Numeric values not captured in the transcription.)
49 Comparison

Tables: variety predictions with complete data (including margins) vs. with missing data under the three approaches (inestimable parameters zero / data present / nitrogen levels). (Numeric values not captured in the transcription.)
50 Other special cases: data missing

- The first entry in the data is missing (i.e. Victory, 0.0 cwt/acre, Block I).
- The data in Block I for all replicates with 0.0 cwt nitrogen are missing.
51 Other special cases: data missing

All data for replicates with 0.0 cwt nitrogen are missing.
52 Making the computations

See the R file.
More informationChapter 14 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics. Chapter 14 Multiple Regression
Chapter 14 Student Lecture Notes 14-1 Department of Quantitative Methods & Information Systems Business Statistics Chapter 14 Multiple Regression QMIS 0 Dr. Mohammad Zainal Chapter Goals After completing
More informationLecture 24: Weighted and Generalized Least Squares
Lecture 24: Weighted and Generalized Least Squares 1 Weighted Least Squares When we use ordinary least squares to estimate linear regression, we minimize the mean squared error: MSE(b) = 1 n (Y i X i β)
More informationMarcel Dettling. Applied Statistical Regression AS 2012 Week 05. ETH Zürich, October 22, Institute for Data Analysis and Process Design
Marcel Dettling Institute for Data Analysis and Process Design Zurich University of Applied Sciences marcel.dettling@zhaw.ch http://stat.ethz.ch/~dettling ETH Zürich, October 22, 2012 1 What is Regression?
More informationMultiple Regression Analysis
Multiple Regression Analysis y = 0 + 1 x 1 + x +... k x k + u 6. Heteroskedasticity What is Heteroskedasticity?! Recall the assumption of homoskedasticity implied that conditional on the explanatory variables,
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationy response variable x 1, x 2,, x k -- a set of explanatory variables
11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate
More informationInference for Regression Inference about the Regression Model and Using the Regression Line
Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about
More informationSTAT 526 Spring Midterm 1. Wednesday February 2, 2011
STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationTopic 14: Inference in Multiple Regression
Topic 14: Inference in Multiple Regression Outline Review multiple linear regression Inference of regression coefficients Application to book example Inference of mean Application to book example Inference
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationTopic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects
Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous
More informationRegression: Lecture 2
Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and
More informationEconometrics. 7) Endogeneity
30C00200 Econometrics 7) Endogeneity Timo Kuosmanen Professor, Ph.D. http://nomepre.net/index.php/timokuosmanen Today s topics Common types of endogeneity Simultaneity Omitted variables Measurement errors
More informationCIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8
CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval
More informationMultiple Linear Regression CIVL 7012/8012
Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for
More informationAssessing Model Adequacy
Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are
More informationDescribing Contingency tables
Today s topics: Describing Contingency tables 1. Probability structure for contingency tables (distributions, sensitivity/specificity, sampling schemes). 2. Comparing two proportions (relative risk, odds
More informationLecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1
Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and should be emailed to the instructor
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)
36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More information3. Design Experiments and Variance Analysis
3. Design Experiments and Variance Analysis Isabel M. Rodrigues 1 / 46 3.1. Completely randomized experiment. Experimentation allows an investigator to find out what happens to the output variables when
More informationCorrelation & Simple Regression
Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.
More informationMSc / PhD Course Advanced Biostatistics. dr. P. Nazarov
MSc / PhD Course Advanced Biostatistics dr. P. Nazarov petr.nazarov@crp-sante.lu 04-1-013 L4. Linear models edu.sablab.net/abs013 1 Outline ANOVA (L3.4) 1-factor ANOVA Multifactor ANOVA Experimental design
More informationHealth utilities' affect you are reported alongside underestimates of uncertainty
Dr. Kelvin Chan, Medical Oncologist, Associate Scientist, Odette Cancer Centre, Sunnybrook Health Sciences Centre and Dr. Eleanor Pullenayegum, Senior Scientist, Hospital for Sick Children Title: Underestimation
More informationMath 423/533: The Main Theoretical Topics
Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)
More informationHomoskedasticity. Var (u X) = σ 2. (23)
Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This
More information