Measurement Error in Covariates

1 Measurement Error in Covariates Raymond J. Carroll Department of Statistics Faculty of Nutrition Institute for Applied Mathematics and Computational Science Texas A&M University

2 My Goal Today Introduce the basic ideas Effects of measurement error Data needed Simplest models

3 Later Lectures Measurement error analysis Regression calibration SIMEX Corrected Scores Maximum likelihood Bayes

4 Today Start thinking about when a measurement error analysis is a good thing

5 Big Theme We always have a risk model relating disease to true exposure We always have a measurement error model relating observed exposure and true exposure The literature is split into methods that have an exposure model (structural) versus those that have no exposure model (functional)

6 Part 1: Overview

7 The Triple Whammy of Measurement Error Bias in parameter estimation, which with multiple covariates can lead to incorrect null hypothesis testing Loss of Power for detecting signals Masking the features of the data: measurement error causes the model of the observed data to change

8 Notation Response: Y True Covariates Subject to Measurement Error: X Covariates Measured Exactly: Z Observed Proxies for X: W

9 Emphasis I will focus on continuous exposures. Measurement error with a categorical exposure X subject to misclassification is an even harder topic. In those problems, the real issue is to get data to estimate the sensitivity and specificity of the surrogate exposure W, e.g., $\Pr(W = 1 \mid X = 1)$ and $\Pr(W = 0 \mid X = 0)$. The impact of misclassification is often profound.

10 Nondifferential Measurement Error Definition: If you could observe X, you would not bother collecting W. Technically, the proxies W are statistically independent of Y given (X, Z). All my lectures assume nondifferential measurement error. The analysis of data subject to differential measurement error is a very different subject.

11 Simple Linear Regression The linear regression model is $Y = \beta_0 + \beta_1 X + \epsilon$, with $\epsilon \sim \mathrm{Normal}(0, \sigma_\epsilon^2)$. The interest here is in estimating $\beta_1$. We also want to form confidence intervals for $\beta_1$. We also want to test the null hypothesis that $\beta_1 = 0$.

12 Classical Measurement Error The classical measurement error model is that instead of observing X, we observe W, where $W = X + U$, with $U \sim \mathrm{Normal}(0, \sigma_u^2)$. We will shortly go into more detail, but the first two parts of the Triple Whammy can best be seen graphically.

13 Simulation Study Generate $X_1, \ldots, X_{50}$, which are Normal(0, 1). Generate $Y_i = \beta_0 + \beta_x X_i + \epsilon_i$, with $\epsilon_i \sim \mathrm{Normal}(0, 1/9)$, $\beta_0 = 0$, and $\beta_x = 1$. Generate $U_1, \ldots, U_{50} \sim \mathrm{Normal}(0, 1)$. Set $W_i = X_i + U_i$. Regress Y on X and Y on W and compare.
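The simulation described on this slide can be sketched as follows. This is a minimal illustration, not the code behind the original figures: the seed, the helper name `simple_ols`, and the use of NumPy are assumptions made here.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 50
beta0, beta_x = 0.0, 1.0

x = rng.normal(0.0, 1.0, n)                    # true covariate X ~ Normal(0, 1)
eps = rng.normal(0.0, np.sqrt(1.0 / 9.0), n)   # regression error with variance 1/9
y = beta0 + beta_x * x + eps
u = rng.normal(0.0, 1.0, n)                    # measurement error U ~ Normal(0, 1)
w = x + u                                      # observed proxy W


def simple_ols(pred, resp):
    """Return (intercept, slope, residual variance) from a least-squares line."""
    slope, intercept = np.polyfit(pred, resp, 1)
    resid = resp - (intercept + slope * pred)
    return intercept, slope, resid.var(ddof=2)


fit_x = simple_ols(x, y)   # regression on the true covariate
fit_w = simple_ols(w, y)   # naive regression on the error-prone proxy

for name, (b0, b1, s2) in (("Y on X", fit_x), ("Y on W", fit_w)):
    print(f"{name}: intercept={b0:.3f}, slope={b1:.3f}, residual variance={s2:.3f}")
```

Because var(X) = var(U) = 1 here, the slope on W should fall near half the slope on X on average, although any single sample of 50 will vary.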

14 Effects of Measurement Error [Figure: reliable data; true data without measurement error]

15 Effects of Measurement Error [Figure: error-prone data overlaid on reliable data; observed data with measurement error]

16 Linear Regression We saw bias: the slope is too small in absolute value We saw excess variance: the variability about the line is much larger Of course, excess variance = loss of power

17 [Figure: sample size for 80% power plotted against measurement error variance. True slope $\beta_x =$ (value not preserved); variances $\sigma_x^2 = \sigma_\epsilon^2 = 1$.]

18 Loss of Features In many problems, we might have a more complex regression model. This could include a change point or a threshold. Or it could include a nonlinear regression: $Y = \sin(X) + \epsilon$, $X \sim \mathrm{Uniform}(0, 4\pi)$, $U \sim \mathrm{Normal}\{0, \mathrm{var}(X)/2\}$, $\sigma_\epsilon^2 = 0.09$.

19 [Figure: Y = Sine(X) + Noise, with X observed]

20 [Figure: Y = Sine(X) + Noise, with X not observed]

21 [Figure: Y = Sine(X) + Noise, with X observed (Blue) or not (Red)]
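The loss of features in the sine example can be checked numerically. The sketch below is an illustration under assumptions made here (large hypothetical sample size, arbitrary seed, and a least-squares fit of Y on sin(predictor) as a simple summary of how much of the sinusoid survives). It is not the code behind the original figures.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n = 20_000                       # hypothetical, large so the pattern is stable

x = rng.uniform(0.0, 4.0 * np.pi, n)          # X ~ Uniform(0, 4*pi)
var_x = (4.0 * np.pi) ** 2 / 12.0             # variance of Uniform(0, 4*pi)
var_u = var_x / 2.0                           # U ~ Normal{0, var(X)/2}
w = x + rng.normal(0.0, np.sqrt(var_u), n)    # error-prone version of X
y = np.sin(x) + rng.normal(0.0, 0.3, n)       # sigma_eps^2 = 0.09


def sine_coefficient(pred, resp):
    """Least-squares coefficient b in resp = a + b * sin(pred)."""
    b, _a = np.polyfit(np.sin(pred), resp, 1)
    return b


b_true = sine_coefficient(x, y)    # close to 1: the sine wave is clearly visible
b_naive = sine_coefficient(w, y)   # close to 0: the sine wave is masked

# In this particular setup the coefficient shrinks by the factor exp(-var(U)/2).
print(f"using X: {b_true:.3f}, using W: {b_naive:.3f}, "
      f"theory for W: {np.exp(-var_u / 2.0):.3f}")
```

With this much measurement error, the fitted amplitude using W is only a few percent of the true amplitude, which is why the red points in the figure show almost no sinusoidal shape.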

22 Examples of Measurement Error Problems Measuring usual nutrient intake Measuring Systolic Blood Pressure Measuring Radiation Dose from the Nevada Test Site or in Chernobyl Measuring Exposure to arsenic in drinking water, dust in the workplace, radon gas in the home, and other environmental hazards

23 Measures Of Nutrient Intake Response: Y = average daily % calories from fat, as measured by a Food Frequency Questionnaire (FFQ). True Unobserved Predictor: X = true long-term average daily percentage of calories from fat. Regression Model: Assume $Y = \beta_0 + \beta_x X + \epsilon$. Measurement Error: X is never observable.

24 Measures Of Nutrient Intake, cont. Surrogate for X: On 6 days over the course of a year, women are interviewed by phone and asked to recall their food intake over the past 24 hours (24-hour recalls). Their average is recorded and denoted by W. The analysis of a 24-hour recall introduces some error, the analysis error. Measurement error = sampling error + analysis error. Classical measurement error model: $W_i = X_i + U_i$, where the $U_i$ are measurement errors.

25 Discussion I will do "theory" for linear regression, because the calculations are exact. Everything applies, though, to logistic regression, survival analysis, Poisson regression, etc.

26 General Structure Of A Measurement Error Problem Y = response, Z = error-free predictor, X = error-prone predictor. Outcome Model: $E(Y \mid Z, X) = f(Z, X, \beta)$. Observed data: (Y, Z, W). Model Misspecification: $E(Y \mid Z, W) \neq f(Z, W, \beta)$. Measurement Error Model: relating W and X. Classical Model: $W = X + U$.

27 Theory Behind The Pictures: The Naive Analysis Attenuation Factor, a.k.a. Reliability Ratio: $\lambda = \frac{\mathrm{var}(X)}{\mathrm{var}(W)} = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} = \frac{\sigma_x^2}{\sigma_w^2}$. Note how $\lambda \le 1$.

28 Theory Behind The Pictures: The Naive Analysis, cont. The Observed Data Model is a linear regression. Intercept: $\beta_0 + (1 - \lambda)\beta_x \mu_x$. Slope: $\lambda \beta_x$. The slope is biased, too small. Residual Variance: $\sigma_\epsilon^2 + (1 - \lambda)\beta_x^2 \sigma_x^2$. The variance is increased.
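The three observed-data formulas (intercept, slope, residual variance) can be compared with a large simulated sample. This is a sketch only; all parameter values, the seed, and the sample size are hypothetical choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)       # arbitrary seed
n = 200_000                          # large so estimates sit close to theory
beta0, beta_x = 1.0, 2.0             # hypothetical regression parameters
mu_x, var_x = 3.0, 1.0               # hypothetical mean and variance of X
var_u, var_eps = 0.5, 0.25           # hypothetical error variances

x = rng.normal(mu_x, np.sqrt(var_x), n)
w = x + rng.normal(0.0, np.sqrt(var_u), n)                  # classical error
y = beta0 + beta_x * x + rng.normal(0.0, np.sqrt(var_eps), n)

lam = var_x / (var_x + var_u)                               # reliability ratio
theory_intercept = beta0 + (1.0 - lam) * beta_x * mu_x
theory_slope = lam * beta_x
theory_resid_var = var_eps + (1.0 - lam) * beta_x**2 * var_x

slope, intercept = np.polyfit(w, y, 1)                      # naive fit of Y on W
resid_var = np.var(y - (intercept + slope * w), ddof=2)

print(f"intercept: naive {intercept:.3f} vs theory {theory_intercept:.3f}")
print(f"slope:     naive {slope:.3f} vs theory {theory_slope:.3f}")
print(f"resid var: naive {resid_var:.3f} vs theory {theory_resid_var:.3f}")
```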

29 Implications For Testing Hypotheses The observed data slope is $\lambda \beta_x$. Because $\beta_x = 0$ iff $\lambda \beta_x = 0$, it follows that $[H_0: \beta_x = 0] \equiv [H_0: \lambda \beta_x = 0]$, so the naive test of $\beta_x = 0$ is valid (correct Type I error rate). Because the residual variance is increased, the power is decreased.
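A small Monte Carlo experiment illustrates both claims: the naive test keeps roughly the nominal level under the null, but loses power under the alternative. The function name `rejection_rate`, the sample size, the number of replications, and the effect size 0.3 are all assumptions made for this sketch. The test uses the normal critical value 1.96 as an approximation to the t critical value.

```python
import numpy as np


def rejection_rate(beta_x, var_u, n=100, reps=4000, seed=0):
    """Fraction of simulated data sets where the slope test of Y on W rejects."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n))                          # true X, variance 1
    w = x + rng.normal(scale=np.sqrt(var_u), size=(reps, n))
    y = beta_x * x + rng.normal(size=(reps, n))             # error variance 1
    wc = w - w.mean(axis=1, keepdims=True)                  # center per data set
    yc = y - y.mean(axis=1, keepdims=True)
    sww = (wc**2).sum(axis=1)
    slope = (wc * yc).sum(axis=1) / sww
    resid = yc - slope[:, None] * wc
    se = np.sqrt((resid**2).sum(axis=1) / (n - 2) / sww)    # slope standard error
    return float(np.mean(np.abs(slope / se) > 1.96))        # approx. 5% two-sided


type1_naive = rejection_rate(beta_x=0.0, var_u=1.0)   # null true, X mismeasured
power_true = rejection_rate(beta_x=0.3, var_u=0.0)    # X observed exactly
power_naive = rejection_rate(beta_x=0.3, var_u=1.0)   # X replaced by W

print(f"Type I error with W: {type1_naive:.3f}")
print(f"Power with X: {power_true:.3f}, power with W: {power_naive:.3f}")
```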

30 Multiple Linear Regression Triple Whammy Bias Increased variance and loss of power Loss of features In multiple linear regression, the bias can take unusual forms This can lead to invalidity of hypothesis testing

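31 Multiple Linear Regression With Error Model: $Y = \beta_0 + \beta_z^T Z + \beta_x^T X + \epsilon$, $W = X + U$. Bias: Regressing Y on Z and W estimates $\begin{pmatrix} \beta_{z*} \\ \beta_{x*} \end{pmatrix} = \Lambda \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \left[ \neq \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \right]$.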

32 Multiple Linear Regression With Error, cont. $\Lambda$ is the attenuation matrix or reliability matrix: $\Lambda = \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} + \sigma_{uu} \end{pmatrix}^{-1} \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} \end{pmatrix}$. The terms $\sigma_{zz}$, etc. are now covariance matrices. Biases can take many forms, including reversal of signs! Global null test is OK: the naive test of $H_0: \beta_x = 0$ and $\beta_z = 0$ is valid. No effect for any component of X is OK: the naive test of $H_0: \beta_x = 0$ is valid.
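The attenuation matrix can be computed directly and compared with a naive regression on simulated data. The sketch below assumes scalar Z and scalar X with hypothetical covariance values chosen for this illustration. It sets the true effect of Z to zero so that the induced bias in the Z coefficient is easy to see.

```python
import numpy as np

# Hypothetical second moments for scalar Z and scalar X.
s_zz, s_zx, s_xx, s_uu = 1.0, 0.5, 1.0, 1.0
beta = np.array([0.0, 1.0])            # (beta_z, beta_x): Z truly has no effect

cov_zx = np.array([[s_zz, s_zx], [s_zx, s_xx]])          # cov of (Z, X)
cov_zw = np.array([[s_zz, s_zx], [s_zx, s_xx + s_uu]])   # cov of (Z, W)
Lam = np.linalg.solve(cov_zw, cov_zx)  # attenuation matrix cov_zw^{-1} cov_zx
beta_naive_theory = Lam @ beta         # what the naive regression estimates

# Check against a naive least-squares fit on a large simulated sample.
rng = np.random.default_rng(4)         # arbitrary seed
n = 200_000
zx = rng.multivariate_normal([0.0, 0.0], cov_zx, size=n)
z, x = zx[:, 0], zx[:, 1]
w = x + rng.normal(0.0, np.sqrt(s_uu), n)
y = beta[0] * z + beta[1] * x + rng.normal(0.0, 1.0, n)
design = np.column_stack([np.ones(n), z, w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

print("theory (beta_z*, beta_x*):", np.round(beta_naive_theory, 4))
print("naive fit (Z, W coefs):   ", np.round(coef[1:], 4))
```

In this hypothetical setup the naive fit assigns a clearly nonzero coefficient to Z even though its true coefficient is zero, while the coefficient on W is attenuated.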

33 What Naive Tests are Invalid? Tests for Z: surprisingly, tests about the variables Z measured without error are not valid Exception is when X and Z are independent When X is multivariate: Naive tests here for components of X are not valid

34 Bivariate X and Bias Example: One component of X has no effect: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, with $\beta_2 = 0$. Correlation: The X's are negatively correlated; the matrix $\mathrm{cov}(X_1, X_2)$ shown on the slide is not preserved in this transcription.

35 Bivariate X and Bias Measurement errors are positively correlated: $W_1 = X_1 + U_1$, $W_2 = X_2 + U_2$; the matrix $\mathrm{cov}(U_1, U_2)$ is not preserved.

36 Bivariate X and Bias The reliability matrix $\Lambda$ computed from these two covariance matrices is not preserved.

37 Bivariate X and Bias True $\beta = (1, 0)^T$. Observed $\beta = (0.5, 0.25)^T$: Naive Test Invalid!

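38 Multiple Linear Regression With Error For X scalar, the attenuation factor in $\beta_x$ is $\lambda_1 = \frac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2}$, where $\sigma_{x|z}^2$ is the residual variance in the regression of X on Z. Since $\sigma_{x|z}^2 \le \sigma_x^2$, it follows that $\lambda_1 = \frac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2} \le \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} = \lambda$. Hence, collinearity accentuates attenuation.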

39 Bias for Inferences About Error-Free Covariates When X and Z are related. Regression of X on Z: $E(X \mid Z) = \Gamma_1 + \Gamma_z^T Z$. Effects of Error: You do not estimate $\beta_z$, but instead you estimate $\beta_{z*} = \beta_z + (1 - \lambda_1)\beta_x \Gamma_z$.
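The two formulas for scalar X and Z can be tabulated to see how collinearity changes the picture. The function below is a sketch under simplifying assumptions made here: X and Z both have variance 1 and correlation `rho`, so the residual variance of X given Z is 1 - rho^2 and the slope of E(X | Z) is rho. The function name and default values are hypothetical.

```python
def attenuation_with_z(rho, var_u, beta_z=0.0, beta_x=1.0):
    """Return (lambda, lambda_1, beta_z*, beta_x*) for unit-variance X and Z."""
    var_x_given_z = 1.0 - rho**2                 # residual variance of X on Z
    gamma_z = rho                                # slope of E(X | Z)
    lam = 1.0 / (1.0 + var_u)                    # attenuation ignoring Z
    lam1 = var_x_given_z / (var_x_given_z + var_u)
    beta_z_star = beta_z + (1.0 - lam1) * beta_x * gamma_z
    beta_x_star = lam1 * beta_x
    return lam, lam1, beta_z_star, beta_x_star


for rho in (0.0, 0.3, 0.6, 0.9):
    lam, lam1, bz_star, bx_star = attenuation_with_z(rho, var_u=1.0)
    print(f"rho={rho:.1f}: lambda={lam:.3f}, lambda_1={lam1:.3f}, "
          f"beta_z*={bz_star:.3f}, beta_x*={bx_star:.3f}")
```

As the correlation grows, the attenuation factor for X falls further below the simple reliability ratio, and the apparent effect of Z moves away from its true value of zero.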

40 Analysis Of Covariance These results have implications for the two group ANCOVA. True Predictor X Group Assignment Z = dummy indicator of group 1, say Imbalance: If X has a different mean for the two groups, then the estimated effect of Z is biased Illustration: The next slides illustrate that even when there is no Z effect in truth, the observed data may indicate, falsely, that there is an apparent effect

41 2-Group ANCOVA, True X Data. Note no effect. [Figure: ANCOVA, true X data]

42 2-Group ANCOVA, Observed W Data. Note apparent effect. [Figure: ANCOVA, observed W data]

43 Part 2: Effects of Corrections for Measurement Error

44 What can a Measurement Error Analysis Do? Response: Y True Covariate: X Surrogate: W Other Exactly Measured Covariates: Z

45 What can a Measurement Error Analysis Do? With a single exposure, and classical measurement error, usually the effect is that the observed data underestimate the relative risks, sometimes profoundly. Measurement error analysis can correct this underestimation

46 What else can a Measurement Error Analysis Do? We have seen that there are cases in which hypothesis tests that ignore measurement error are invalid. A measurement error analysis can result in valid tests with real Type I errors of near 5%.

47 What can a Measurement Error Analysis Not Do? A measurement error analysis cannot ever be as statistically efficient and powerful as an analysis in which true exposure X is observed. Measurement error lowers power, and no fancy analysis can alleviate this fact.

48 Prices of a Measurement Error Analysis Somewhere, somehow, you have to provide a model for the relationship of the true exposure X and the surrogate exposure W, even though you do not observe X Usually, this means that you need additional data to get at the measurement error model Requires planning, and costs more

49 Prices of a Measurement Error Analysis Almost without exception, for a variable measured with error, a measurement error analysis leads to increased variability in the estimate of risk.

50 The NIH-AARP Diet and Health Study Survival analysis of colorectal cancer (Y). True exposures are X, consisting of usual intake of energy and the usual Healthy Eating Index 2005 (HEI-2005) total score. Other variables Z measured "exactly" were a long list (age group, etc.).

51 The NIH-AARP Diet and Health Study $n \approx 220{,}000$. Instead of usual energy and HEI-2005, we have them (mis)measured by a food frequency questionnaire. Also, in a sub-study, we have 1,000 people who contributed two 24-hour recalls. I will not go into the entire complex modeling process.

52 The NIH-AARP Diet and Health Study: Men Using the FFQ log relative risk estimate = 0.33 Standard error = 0.07 Measurement error analysis log relative risk estimate = 0.45 (greater in absolute value) Standard error = 0.09 (larger standard error)

53 The NIH-AARP Diet and Health Study: Women Using the FFQ log relative risk estimate = 0.22 Standard error = 0.09 p = 0.02 Measurement error analysis log relative risk estimate = 0.49 (greater in absolute value) Standard error = 0.16 (larger standard error) p = 0.00

54 The NIH-AARP Diet and Health Study: Women The lower p-values for the MEM analysis for women can happen True exposure X is bivariate, and while the components (energy, HEI-2005) are not highly correlated, they are correlated nonetheless.

55 Part 3: Needed Data for a Measurement Error Analysis

56 Overview of Classical Models Response: Y True Covariate: X Surrogate: W Other Exactly Measured Covariates: Z

57 Conundrum The Classical Model says that $W = X + U$, with $U \sim \mathrm{Normal}(0, \sigma_u^2)$. In general, the measurement error variance $\sigma_u^2$ cannot be estimated from just (Y, W, Z) data. Question: what data are needed to estimate $\sigma_u^2$?

58 Solution #1: Validation Data At least in principle, in some cases, one can effectively observe X in a sub-study. This is called a validation study. Validation studies are beautiful things. They are rare, especially if X is a long-term exposure. Of course, if such data exist, $\sigma_u^2 = \mathrm{var}(W \mid X)$.

59 Solution #1: Validation Data Validation data, which include X, also allow us to estimate the distribution of true exposure. They also allow us to understand whether the classical error model actually holds! Validation study data are really data with missing values in X, although they are not typical missing data problems because most of the X's are missing.

60 Solution #2: Replication Data In many cases, it is possible to observe replicated W data. Thus, for the $i$th person, we observe $(W_{i1}, \ldots, W_{im})$ with $W_{ij} = X_i + U_{ij}$. Replication data allow easy estimation of $\sigma_u^2$ through ANOVA calculations. They also allow data checking to see if the additive model with homoscedastic error holds (details not given, this is an overview).
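The ANOVA calculation mentioned here amounts to pooling the within-person sample variances of the replicates. The following is a minimal sketch on simulated data; the sample size, number of replicates, variances, and seed are hypothetical. It also backs out an estimate of var(X) from the variance of the person-level means, which equals var(X) + var(U)/m.

```python
import numpy as np

rng = np.random.default_rng(3)        # arbitrary seed
n, m = 5000, 2                        # hypothetical: 5000 people, 2 replicates
var_x, var_u = 1.0, 0.5               # hypothetical true variances

x = rng.normal(0.0, np.sqrt(var_x), n)
w = x[:, None] + rng.normal(0.0, np.sqrt(var_u), (n, m))   # W_ij = X_i + U_ij

w_bar = w.mean(axis=1)                                     # person-level means
# Within-person mean square: pooled variance of replicates around their mean.
var_u_hat = ((w - w_bar[:, None]) ** 2).sum() / (n * (m - 1))
# var(w_bar) = var(X) + var(U)/m, so subtract the error part.
var_x_hat = w_bar.var(ddof=1) - var_u_hat / m
# Reliability ratio when a single W_ij is used as the proxy.
lam_hat = var_x_hat / (var_x_hat + var_u_hat)

print(f"estimated var(U) = {var_u_hat:.3f} (true {var_u})")
print(f"estimated var(X) = {var_x_hat:.3f} (true {var_x})")
print(f"estimated reliability ratio = {lam_hat:.3f}")
```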

61 Solution #2: Replication Data Replicated biomarkers or 24hr recalls Replicated blood pressure measurements Replicated monitoring equipment

62 Solution #2: Replication Data There are subtleties with replication data There is debate as to whether they measure long-term exposure unbiasedly, or short-term exposure only. Everyone agrees that replication data are a good thing

63 Solution #3: Instrumental Variables Often forgotten, but widely used in econometrics. These are variables T which have the following properties (hopefully): They are correlated with true exposure X. They are nondifferential. They are independent of the measurement error U. Convincing oneself (or referees) that T is a proper instrument is hard, because it cannot be verified numerically.
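For simple linear regression with classical error, one standard way to use an instrument is the ratio cov(Y, T) / cov(W, T). The slide does not state an estimator, so this choice is an assumption made for illustration, as are the data-generating values and the seed below. The sketch simulates an instrument with the three listed properties and compares the naive slope with the instrumental-variable slope.

```python
import numpy as np

rng = np.random.default_rng(6)        # arbitrary seed
n = 200_000
beta0, beta_x = 0.5, 1.0              # hypothetical regression parameters

t = rng.normal(0.0, 1.0, n)                       # instrument T
x = 0.8 * t + rng.normal(0.0, 0.6, n)             # X correlated with T, var(X) = 1
w = x + rng.normal(0.0, 1.0, n)                   # error U independent of T
y = beta0 + beta_x * x + rng.normal(0.0, 0.5, n)  # T affects Y only through X

naive_slope = np.polyfit(w, y, 1)[0]              # attenuated toward zero
iv_slope = np.cov(y, t)[0, 1] / np.cov(w, t)[0, 1]

print(f"naive slope = {naive_slope:.3f}, IV slope = {iv_slope:.3f}, "
      f"true slope = {beta_x}")
```

The simulation can confirm the estimator works when the assumptions hold by construction; it cannot confirm that a real candidate T satisfies them, which is the difficulty the slide points out.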

64 Part 4: Pure Berkson Error

65 Overview In radiation epidemiology and presumably also occupational epidemiology, the calculated exposure comes from two sources: Error-Prone estimates from an individual Error-prone estimates assigned to groups with similar characteristics The second error-prone estimates are generally designated as Berkson measurement error I want to introduce the Berkson error model

66 The Berkson Model and the Nevada Test Site Genesis: In the 1950s, the U.S. did above-ground nuclear testing. At least twice, they set off atomic bombs when the winds were high and in the direction of Utah. The radiation fell on the ground. Cows ate the grass from the ground. Kids drank the milk.

67 The Berkson Model and the Nevada Test Site Concerns about radiation-caused thyroid disease for these "down-winders" led to a major epidemiologic study at the University of Utah. Radiation and biological experts, including NCI's Andre Bouville, built a dosimetry system based on physical transport models. Every person with certain shared characteristics was assigned the same dose, along with a value for the uncertainty.

68 The Berkson Model and the Nevada Test Site Example: all girls aged 6 living in Washington County who got their milk from their own cows and drank 3 glasses of milk per day were assigned the same dose and an uncertainty. Example: all boys aged 3 living in Lincoln County who got their milk from stores and drank 2 glasses of milk per day were assigned the same dose and an uncertainty. In real life, classical errors come from estimates of the amount of milk drunk.

69 The Berkson Model and the Nevada Test Site Thought Experiment: Ignore the uncertainty (measurement error) in milk consumption From the dosimetry system, each individual gets an assigned/calculated dose W Crucial Point: Children with the same characteristics are given the same assigned dose W Direct Measurements of thyroid exposure for the individuals were not done

70 The Berkson Model and the Nevada Test Site Fact: Two people with similar characteristics might get the same assigned dose W However, their true radiation exposures X would be different Model (in log scale): X = W + U, where U is the assigned uncertainty This is the Berkson measurement error model

71 The Berkson Model The classical Berkson model says that True Exposure = Assigned Exposure + Mean Zero Error. In symbols: $X = W + U_{\mathrm{berk}}$ (or $X = W U_{\mathrm{berk}}$). Assumption: W and $U_{\mathrm{berk}}$ are independent and $E(U_{\mathrm{berk}}) = 0$ (additive error) or $E(U_{\mathrm{berk}}) = 1$ (multiplicative error), so that $E(X \mid W) = W$. Compare with the classical measurement error model, where $W = X + U$ and $E(X \mid W) = \lambda W + (1 - \lambda)\mu_x$.

72 The Berkson Model From the previous page, $X = W + U_{\mathrm{berk}}$. In the linear regression model, $Y = \beta_0 + \beta_x X + \epsilon$. Substituting, $Y = \beta_0 + \beta_x (W + U_{\mathrm{berk}}) + \epsilon = \beta_0 + \beta_x W + (\beta_x U_{\mathrm{berk}} + \epsilon)$. No Bias: The slope of the regression of Y on W is $\beta_x$! Increased Variance/Loss of Power: However, the variance of the regression in W is increased: it is $\mathrm{var}(\epsilon) + \beta_x^2 \mathrm{var}(U_{\mathrm{berk}})$.
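The substitution argument can be checked by simulation. The sketch below uses hypothetical parameter values and an arbitrary seed. It generates assigned doses W, adds independent Berkson error to obtain the true exposure X, and regresses Y on W. For contrast, it also creates a classical-error proxy for the same X with the same error variance.

```python
import numpy as np

rng = np.random.default_rng(5)        # arbitrary seed
n = 200_000
beta0, beta_x = 0.5, 1.0              # hypothetical regression parameters
var_eps, var_berk = 0.25, 0.5         # hypothetical variances

w = rng.normal(0.0, 1.0, n)                           # assigned dose W
x = w + rng.normal(0.0, np.sqrt(var_berk), n)         # Berkson: X = W + U_berk
y = beta0 + beta_x * x + rng.normal(0.0, np.sqrt(var_eps), n)

slope_b, intercept_b = np.polyfit(w, y, 1)            # regress Y on assigned W
resid_var_b = np.var(y - (intercept_b + slope_b * w), ddof=2)
theory_resid_var = var_eps + beta_x**2 * var_berk     # var(eps) + beta_x^2 var(U_berk)

# Contrast: a classical proxy W_c = X + U with the same error variance.
w_classical = x + rng.normal(0.0, np.sqrt(var_berk), n)
slope_c = np.polyfit(w_classical, y, 1)[0]            # attenuated by var(X)/var(W_c)

print(f"Berkson:   slope = {slope_b:.3f} (true {beta_x}), "
      f"residual variance = {resid_var_b:.3f} (theory {theory_resid_var:.3f})")
print(f"Classical: slope = {slope_c:.3f}")
```

Here var(X) = 1.5, so the classical proxy has reliability ratio 1.5 / 2 = 0.75 and its slope is attenuated, while the Berkson slope is unbiased with inflated residual variance.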


More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Gov 2000: 9. Regression with Two Independent Variables

Gov 2000: 9. Regression with Two Independent Variables Gov 2000: 9. Regression with Two Independent Variables Matthew Blackwell Fall 2016 1 / 62 1. Why Add Variables to a Regression? 2. Adding a Binary Covariate 3. Adding a Continuous Covariate 4. OLS Mechanics

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras

Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Biostatistics and Design of Experiments Prof. Mukesh Doble Department of Biotechnology Indian Institute of Technology, Madras Lecture - 39 Regression Analysis Hello and welcome to the course on Biostatistics

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur

Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur Econometric Modelling Prof. Rudra P. Pradhan Department of Management Indian Institute of Technology, Kharagpur Module No. # 01 Lecture No. # 28 LOGIT and PROBIT Model Good afternoon, this is doctor Pradhan

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference.

Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals. Regression Output. Conditions for inference. Understanding regression output from software Nature vs. nurture? Lecture 18 - Regression: Inference, Outliers, and Intervals In 1966 Cyril Burt published a paper called The genetic determination of differences

More information

Rejection regions for the bivariate case

Rejection regions for the bivariate case Rejection regions for the bivariate case The rejection region for the T 2 test (and similarly for Z 2 when Σ is known) is the region outside of an ellipse, for which there is a (1-α)% chance that the test

More information

AN ABSTRACT OF THE DISSERTATION OF

AN ABSTRACT OF THE DISSERTATION OF AN ABSTRACT OF THE DISSERTATION OF Vicente J. Monleon for the degree of Doctor of Philosophy in Statistics presented on November, 005. Title: Regression Calibration and Maximum Likelihood Inference for

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

ECON 5350 Class Notes Functional Form and Structural Change

ECON 5350 Class Notes Functional Form and Structural Change ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this

More information

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals

Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Course Review. Kin 304W Week 14: April 9, 2013

Course Review. Kin 304W Week 14: April 9, 2013 Course Review Kin 304W Week 14: April 9, 2013 1 Today s Outline Format of Kin 304W Final Exam Course Review Hand back marked Project Part II 2 Kin 304W Final Exam Saturday, Thursday, April 18, 3:30-6:30

More information

Comparing IRT with Other Models

Comparing IRT with Other Models Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used

More information

Classification 1: Linear regression of indicators, linear discriminant analysis

Classification 1: Linear regression of indicators, linear discriminant analysis Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification

More information

A Re-Introduction to General Linear Models

A Re-Introduction to General Linear Models A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation

More information

Regression With a Categorical Independent Variable: Mean Comparisons

Regression With a Categorical Independent Variable: Mean Comparisons Regression With a Categorical Independent Variable: Mean Lecture 16 March 29, 2005 Applied Regression Analysis Lecture #16-3/29/2005 Slide 1 of 43 Today s Lecture comparisons among means. Today s Lecture

More information

Biostatistics 4: Trends and Differences

Biostatistics 4: Trends and Differences Biostatistics 4: Trends and Differences Dr. Jessica Ketchum, PhD. email: McKinneyJL@vcu.edu Objectives 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret

More information

Vectors and Matrices Statistics with Vectors and Matrices

Vectors and Matrices Statistics with Vectors and Matrices Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc

More information

ANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula

ANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula ANCOVA Workings of ANOVA & ANCOVA ANCOVA, Semi-Partial correlations, statistical control Using model plotting to think about ANCOVA & Statistical control You know how ANOVA works the total variation among

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Class Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement

Class Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement Class Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement Introduction to Structural Equation Modeling Lecture #1 January 11, 2012 ERSH 8750: Lecture 1 Today s Class Introduction

More information

Difference scores or statistical control? What should I use to predict change over two time points? Jason T. Newsom

Difference scores or statistical control? What should I use to predict change over two time points? Jason T. Newsom Difference scores or statistical control? What should I use to predict change over two time points? Jason T. Newsom Overview Purpose is to introduce a few basic concepts that may help guide researchers

More information

Two-sample Categorical data: Testing

Two-sample Categorical data: Testing Two-sample Categorical data: Testing Patrick Breheny April 1 Patrick Breheny Introduction to Biostatistics (171:161) 1/28 Separate vs. paired samples Despite the fact that paired samples usually offer

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.

where Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc. Notes on regression analysis 1. Basics in regression analysis key concepts (actual implementation is more complicated) A. Collect data B. Plot data on graph, draw a line through the middle of the scatter

More information

Lecture 4 Multiple linear regression

Lecture 4 Multiple linear regression Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Warm-up Using the given data Create a scatterplot Find the regression line

Warm-up Using the given data Create a scatterplot Find the regression line Time at the lunch table Caloric intake 21.4 472 30.8 498 37.7 335 32.8 423 39.5 437 22.8 508 34.1 431 33.9 479 43.8 454 42.4 450 43.1 410 29.2 504 31.3 437 28.6 489 32.9 436 30.6 480 35.1 439 33.0 444

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

1 Motivation for Instrumental Variable (IV) Regression

1 Motivation for Instrumental Variable (IV) Regression ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data

More information

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22

GLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical

More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Simple linear regression

Simple linear regression Simple linear regression Prof. Giuseppe Verlato Unit of Epidemiology & Medical Statistics, Dept. of Diagnostics & Public Health, University of Verona Statistics with two variables two nominal variables:

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Simple, Marginal, and Interaction Effects in General Linear Models

Simple, Marginal, and Interaction Effects in General Linear Models Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means

More information

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary

Correlation. Patrick Breheny. November 15. Descriptive statistics Inference Summary Correlation Patrick Breheny November 15 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 21 Introduction Descriptive statistics Generally speaking, scientific questions often

More information

Measurement Error in Spatial Modeling of Environmental Exposures

Measurement Error in Spatial Modeling of Environmental Exposures Measurement Error in Spatial Modeling of Environmental Exposures Chris Paciorek, Alexandros Gryparis, and Brent Coull August 9, 2005 Department of Biostatistics Harvard School of Public Health www.biostat.harvard.edu/~paciorek

More information

1/11/2011. Chapter 4: Variability. Overview

1/11/2011. Chapter 4: Variability. Overview Chapter 4: Variability Overview In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution. In simple terms, if the scores in a distribution are all

More information

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.

STA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information. STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory

More information

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences

Example. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors

More information

Meta-analysis of epidemiological dose-response studies

Meta-analysis of epidemiological dose-response studies Meta-analysis of epidemiological dose-response studies Nicola Orsini 2nd Italian Stata Users Group meeting October 10-11, 2005 Institute Environmental Medicine, Karolinska Institutet Rino Bellocco Dept.

More information

Applied Epidemiologic Analysis

Applied Epidemiologic Analysis Patricia Cohen, Ph.D. Henian Chen, M.D., Ph. D. Teaching Assistants Julie Kranick Chelsea Morroni Sylvia Taylor Judith Weissman Lecture 13 Interactional questions and analyses Goals: To understand how

More information