Measurement Error in Covariates
- Marilynn Bell
- 5 years ago
Transcription
1 Measurement Error in Covariates Raymond J. Carroll Department of Statistics Faculty of Nutrition Institute for Applied Mathematics and Computational Science Texas A&M University
2 My Goal Today Introduce the basic ideas Effects of measurement error Data needed Simplest models
3 Later Lectures Measurement error analysis Regression calibration SIMEX Corrected Scores Maximum likelihood Bayes
4 Today Start thinking about when a measurement error analysis is a good thing
5 Big Theme We always have a risk model relating disease to true exposure We always have a measurement error model relating observed exposure and true exposure The literature is split into methods that have an exposure model (structural) versus those that have no exposure model (functional)
6 Part 1: Overview
7 The Triple Whammy of Measurement Error Bias in parameter estimation, which with multiple covariates can lead to incorrect null hypothesis testing Loss of Power for detecting signals Masking the features of the data: measurement error causes the model of the observed data to change
8 Notation Response: Y True Covariates Subject to Measurement Error: X Covariates Measured Exactly: Z Observed Proxies for X: W
9 Emphasis I will focus on continuous exposures. Measurement error with a categorical exposure X subject to misclassification is an even harder topic. In those problems, the real issue is to get data to estimate the sensitivity and specificity of the surrogate exposure W, e.g., Pr(W = 1 | X = 1) and Pr(W = 0 | X = 0). The impact of misclassification is often profound.
10 Nondifferential Measurement Error Definition: If you could observe X, you would not bother collecting W. Technically, the proxies W are statistically independent of Y given (X, Z). All my lectures assume nondifferential measurement error. The analysis of data subject to differential measurement error is a very different subject.
11 Simple Linear Regression The linear regression model is $Y = \beta_0 + \beta_1 X + \epsilon$, with $\epsilon \sim \mathrm{Normal}(0, \sigma_\epsilon^2)$. The interest here is in estimating $\beta_1$. We also want to form confidence intervals for $\beta_1$ and to test the null hypothesis that $\beta_1 = 0$.
12 Classical Measurement Error The classical measurement error model is that instead of observing X, we observe W, where $W = X + U$, with $U \sim \mathrm{Normal}(0, \sigma_u^2)$. We will shortly go into more detail, but the first two parts of the Triple Whammy can best be seen graphically.
13 Simulation Study Generate $X_1, \ldots, X_{50}$, which are Normal(0, 1). Generate $Y_i = \beta_0 + \beta_x X_i + \epsilon_i$, where $\epsilon_i \sim \mathrm{Normal}(0, 1/9)$, $\beta_0 = 0$, and $\beta_x = 1$. Generate $U_1, \ldots, U_{50} \sim \mathrm{Normal}(0, 1)$. Set $W_i = X_i + U_i$. Regress Y on X and Y on W and compare.
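The simulation study described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the lecture's figures: the helper names `ols_slope` and `simulate`, the random seed, and the extra large-sample run are assumptions added here.

```python
import random
import statistics


def ols_slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=50, beta0=0.0, beta_x=1.0, var_eps=1 / 9, var_u=1.0, seed=2024):
    """Return (slope of Y on X, slope of Y on W) for one simulated data set."""
    rng = random.Random(seed)  # arbitrary seed, only for reproducibility
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta0 + beta_x * xi + rng.gauss(0.0, var_eps ** 0.5) for xi in x]
    w = [xi + rng.gauss(0.0, var_u ** 0.5) for xi in x]
    return ols_slope(x, y), ols_slope(w, y)


slope_x, slope_w = simulate()           # n = 50, as on the slide
big_x, big_w = simulate(n=100_000)      # large n, to show the limiting values
print(f"n = 50:     slope on X = {slope_x:.2f}, slope on W = {slope_w:.2f}")
print(f"n = 100000: slope on X = {big_x:.2f}, slope on W = {big_w:.2f}")
```

With var(X) = var(U) = 1, the large-sample slope on W should settle near half the true slope, while the slope on X stays near 1.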
14 Effects of Measurement Error [Figure: true data without measurement error (legend: Reliable Data)]
15 Effects of Measurement Error [Figure: observed data with measurement error (legend: Error-prone Data, Reliable Data)]
16 Linear Regression We saw bias: the slope is too small in absolute value We saw excess variance: the variability about the line is much larger Of course, excess variance = loss of power
17 [Figure: sample size for 80% power (vertical axis) against measurement error variance (horizontal axis). Variances: $\sigma_x^2 = \sigma_\epsilon^2 = 1$. The value of the true slope $\beta_x$ did not survive transcription.]
18 Loss of Features In many problems, we might have a more complex regression model. This could include a change point or a threshold. Or it could include a nonlinear regression: $Y = \sin(X) + \epsilon$, with $X \sim \mathrm{Uniform}(0, 4\pi)$, $U \sim \mathrm{Normal}\{0, \mathrm{var}(X)/2\}$, and $\sigma_\epsilon^2 = 0.09$.
19 [Figure: Y = sin(X) + noise, with X observed]
20 [Figure: Y = sin(X) + noise, with X not observed]
21 [Figure: Y = sin(X) + noise, with X observed (blue) or not (red)]
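The loss of features in the sine example can also be checked numerically without a plot. The sketch below is an illustration under added assumptions: the sample size, the seed, and the summary statistic (the share of the variance of Y explained by bin means, computed by the hypothetical helper `binned_r2`) are choices made here, not part of the lecture.

```python
import math
import random
import statistics


def binned_r2(pred, y, n_bins=20):
    """Share of var(y) explained by bin means of y over equal-width bins of pred."""
    lo, hi = min(pred), max(pred)
    width = (hi - lo) / n_bins
    groups = {}
    for p, v in zip(pred, y):
        k = min(int((p - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        groups.setdefault(k, []).append(v)
    grand = statistics.fmean(y)
    between = sum(len(g) * (statistics.fmean(g) - grand) ** 2 for g in groups.values())
    total = sum((v - grand) ** 2 for v in y)
    return between / total


rng = random.Random(7)                     # arbitrary seed
n = 20_000                                 # sample size chosen for this sketch
var_x = (4 * math.pi) ** 2 / 12            # variance of Uniform(0, 4*pi)
x = [rng.uniform(0.0, 4 * math.pi) for _ in range(n)]
y = [math.sin(xi) + rng.gauss(0.0, 0.3) for xi in x]         # sigma_eps^2 = 0.09
w = [xi + rng.gauss(0.0, math.sqrt(var_x / 2)) for xi in x]  # var(U) = var(X)/2

r2_x = binned_r2(x, y)   # how much of Y the sine curve explains when X is observed
r2_w = binned_r2(w, y)   # the same summary when only W is observed
print(f"explained share with X: {r2_x:.2f}, with W: {r2_w:.2f}")
```

The share explained by W should be far below the share explained by X, which is the numerical counterpart of the sine wave disappearing in the red points.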
22 Examples of Measurement Error Problems Measuring usual nutrient intake Measuring Systolic Blood Pressure Measuring Radiation Dose from the Nevada Test Site or in Chernobyl Measuring Exposure to arsenic in drinking water, dust in the workplace, radon gas in the home, and other environmental hazards
23 Measures Of Nutrient Intake Response: Y = average daily % calories from fat by a Food Frequency Questionnaire (FFQ). True Unobserved Predictor: X = true long-term average daily percentage of calories from fat. Regression Model: Assume $Y = \beta_0 + \beta_x X + \epsilon$. Measurement Error: X is never observable.
24 Measures Of Nutrient Intake, cont. Surrogate for X: On 6 days over the course of a year, women are interviewed by phone and asked to recall their food intake over the previous 24 hours (24-hour recalls). Their average is recorded and denoted by W. The analysis of 24-hour recalls introduces some error = analysis error. Measurement error = sampling error + analysis error. Classical measurement error model: $W_i = X_i + U_i$, where the $U_i$ are measurement errors.
25 Discussion I will do "theory" for linear regression, because the calculations are exact. Everything, though, applies to logistic regression, survival analysis, Poisson regression, etc.
26 General Structure Of A Measurement Error Problem Y = response, Z = error-free predictor, X = error-prone predictor. Outcome Model: $E(Y \mid Z, X) = f(Z, X, \beta)$. Observed data: (Y, Z, W). Model Misspecification: $E(Y \mid Z, W) \neq f(Z, W, \beta)$. Measurement Error model: relating W and X. Classical Model: $W = X + U$.
27 Theory Behind The Pictures: The Naive Analysis Attenuation Factor, a.k.a. Reliability Ratio: $\lambda = \dfrac{\mathrm{var}(X)}{\mathrm{var}(W)} = \dfrac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} = \dfrac{\sigma_x^2}{\sigma_w^2}$. Note that $\lambda < 1$ whenever $\sigma_u^2 > 0$.
28 Theory Behind The Pictures: The Naive Analysis, cont. The observed data model is a linear regression. Intercept: $\beta_0 + (1 - \lambda)\beta_x \mu_x$. Slope: $\lambda\beta_x$. The slope is biased, too small. Residual Variance: $\sigma_\epsilon^2 + (1 - \lambda)\beta_x^2 \sigma_x^2$. The variance is increased.
29 Implications For Testing Hypotheses The observed data slope is $\lambda\beta_x$. Because $\beta_x = 0$ if and only if $\lambda\beta_x = 0$, it follows that $[H_0: \beta_x = 0] \Longleftrightarrow [H_0: \lambda\beta_x = 0]$, so the naive test of $\beta_x = 0$ is valid (correct Type I error rate). Because the residual variance is increased, the power is decreased.
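The intercept, slope, and residual variance on this slide follow in a few lines. The sketch below assumes, in addition to the classical error model, that X is Normal($\mu_x$, $\sigma_x^2$) and independent of U and $\epsilon$; the normality of X is an assumption made here for the derivation and is not stated on the slide.

```latex
\begin{align*}
E(X \mid W) &= \mu_x + \lambda (W - \mu_x) = \lambda W + (1 - \lambda)\mu_x, \\
\mathrm{var}(X \mid W) &= (1 - \lambda)\sigma_x^2, \\
E(Y \mid W) &= \beta_0 + \beta_x E(X \mid W)
             = \{\beta_0 + (1 - \lambda)\beta_x \mu_x\} + \lambda\beta_x W, \\
\mathrm{var}(Y \mid W) &= \sigma_\epsilon^2 + \beta_x^2\, \mathrm{var}(X \mid W)
             = \sigma_\epsilon^2 + (1 - \lambda)\beta_x^2 \sigma_x^2 .
\end{align*}
```

The third line uses nondifferential error: given X, the proxy W carries no further information about Y, so $E(Y \mid X, W) = \beta_0 + \beta_x X$.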
30 Multiple Linear Regression Triple Whammy Bias Increased variance and loss of power Loss of features In multiple linear regression, the bias can take unusual forms This can lead to invalidity of hypothesis testing
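31 Multiple Linear Regression With Error Model: $Y = \beta_0 + \beta_z^T Z + \beta_x^T X + \epsilon$, $W = X + U$. Bias: Regressing Y on Z and W estimates $\begin{pmatrix} \beta_{z*} \\ \beta_{x*} \end{pmatrix} = \Lambda \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \left[ \neq \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \right]$.
32 Multiple Linear Regression With Error, cont. $\Lambda$ is the attenuation matrix or reliability matrix: $\Lambda = \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} + \sigma_{uu} \end{pmatrix}^{-1} \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} \end{pmatrix}$. The terms $\sigma_{zz}$, etc. are now covariance matrices. Biases can take many forms, including reversal of signs! Global null test is OK: the naive test of $H_0: \beta_x = 0$ and $\beta_z = 0$ is valid. No effect for any component of X is OK: the naive test of $H_0: \beta_x = 0$ is valid.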
33 What Naive Tests are Invalid? Tests for Z: surprisingly, tests about the variables Z measured without error are not valid Exception is when X and Z are independent When X is multivariate: Naive tests here for components of X are not valid
34 Bivariate X and Bias Example: One component of X has no effect: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, with $\beta_2 = 0$. Correlation: The X's are negatively correlated: $\mathrm{cov}(X_1, X_2)$ = [ ]
35 Bivariate X and Bias $\mathrm{cov}(X_1, X_2)$ = [ ]. Measurement errors are positively correlated: $W_1 = X_1 + U_1$, $W_2 = X_2 + U_2$, $\mathrm{cov}(U_1, U_2)$ = [ ]
36 Bivariate X and Bias $\mathrm{cov}(X_1, X_2)$ = [ ], $\mathrm{cov}(U_1, U_2)$ = [ ]. Reliability Matrix: $\Lambda$ = [ ]
37 Bivariate X and Bias $\mathrm{cov}(X_1, X_2)$ = [ ], $\mathrm{cov}(U_1, U_2)$ = [ ], $\Lambda$ = [ ]. True $\beta = (1, 0)^T$. Observed $\beta = (0.5, 0.25)^T$: Naive Test Invalid!
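The covariance matrices for this example are missing from the transcript, so the sketch below uses hypothetical values (negatively correlated X's, positively correlated errors) purely to show how a reliability matrix turns a zero coefficient into a nonzero one. With no Z in the model, the reliability matrix reduces to $\Lambda = (\Sigma_x + \Sigma_u)^{-1}\Sigma_x$. The matrices `sigma_x` and `sigma_u` and all helper names are assumptions; the output is not meant to reproduce the numbers on the slide.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matvec2(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [sum(m[i][k] * v[k] for k in range(2)) for i in range(2)]


# Hypothetical covariance matrices (NOT the values from the lecture).
sigma_x = [[1.0, -0.5], [-0.5, 1.0]]   # X's negatively correlated
sigma_u = [[1.0, 0.5], [0.5, 1.0]]     # measurement errors positively correlated

sigma_w = [[sigma_x[i][j] + sigma_u[i][j] for j in range(2)] for i in range(2)]
reliability = matmul2(inv2(sigma_w), sigma_x)   # Lambda = Sigma_w^{-1} Sigma_x

beta_true = [1.0, 0.0]                          # second component has no effect
beta_naive = matvec2(reliability, beta_true)    # what regressing Y on W estimates
print("Lambda =", reliability)
print("naive limit =", beta_naive)
```

With these hypothetical inputs the naive limit is (0.5, -0.25): the first coefficient is attenuated, and the second is nonzero even though the true $\beta_2$ is 0, so a naive test of $\beta_2 = 0$ would reject too often.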
38 Multiple Linear Regression With Error For X scalar, the attenuation factor in $\beta_x$ is $\lambda_1 = \dfrac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2}$, where $\sigma_{x|z}^2$ = residual variance in the regression of X on Z. Since $\sigma_{x|z}^2 \le \sigma_x^2$, it follows that $\lambda_1 = \dfrac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2} \le \dfrac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} = \lambda$. Collinearity accentuates attenuation.
39 Bias for Inferences About Error-Free Covariates When X and Z are related. Regression of X on Z: $E(X \mid Z) = \Gamma_1 + \Gamma_z^T Z$. Effects of Error: You do not estimate $\beta_z$, but instead you estimate $\beta_{z*} = \beta_z + (1 - \lambda_1)\beta_x \Gamma_z$.
40 Analysis Of Covariance These results have implications for the two group ANCOVA. True Predictor X Group Assignment Z = dummy indicator of group 1, say Imbalance: If X has a different mean for the two groups, then the estimated effect of Z is biased Illustration: The next slides illustrate that even when there is no Z effect in truth, the observed data may indicate, falsely, that there is an apparent effect
41 2-Group ANCOVA, True X Data. Note no effect. [Figure: ANCOVA, true X data]
42 2-Group ANCOVA, Observed W Data. Note apparent effect. [Figure: ANCOVA, observed W data]
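The two-group ANCOVA illustration can be mimicked with a small simulation. This is a sketch under assumptions chosen here (group imbalance of 1 in the mean of X, unit variances for X given Z and for U, true slope 1, and no true group effect); none of these values come from the lecture, and `ols2` is a hypothetical helper that solves the two-predictor normal equations directly.

```python
import random
import statistics


def ols2(z, x, y):
    """OLS of y on (z, x) with an intercept; returns (coef on z, coef on x)."""
    mz, mx, my = statistics.fmean(z), statistics.fmean(x), statistics.fmean(y)
    zc = [a - mz for a in z]
    xc = [a - mx for a in x]
    yc = [a - my for a in y]
    szz = sum(a * a for a in zc)
    sxx = sum(a * a for a in xc)
    szx = sum(a * b for a, b in zip(zc, xc))
    szy = sum(a * b for a, b in zip(zc, yc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    det = szz * sxx - szx ** 2
    return (sxx * szy - szx * sxy) / det, (szz * sxy - szx * szy) / det


rng = random.Random(11)              # arbitrary seed
n = 50_000
beta_x, gamma_z = 1.0, 1.0           # hypothetical slope and group imbalance in X
z = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]   # group indicator
x = [gamma_z * zi + rng.gauss(0.0, 1.0) for zi in z]         # var(X | Z) = 1
w = [xi + rng.gauss(0.0, 1.0) for xi in x]                   # var(U) = 1
y = [beta_x * xi + rng.gauss(0.0, 0.5) for xi in x]          # no Z effect in truth

bz_true, bx_true = ols2(z, x, y)     # analysis with the true X
bz_naive, bx_naive = ols2(z, w, y)   # naive analysis with W in place of X

lambda_1 = 1.0 / (1.0 + 1.0)                        # var(X|Z) / {var(X|Z) + var(U)}
predicted_bz = (1 - lambda_1) * beta_x * gamma_z    # beta_z + (1 - lambda_1) beta_x Gamma_z
print(f"group effect with X: {bz_true:.3f}; with W: {bz_naive:.3f}; "
      f"predicted with W: {predicted_bz:.3f}")
```

With the true X the estimated group effect should be near zero, while the naive analysis should show an apparent group effect close to the predicted value of 0.5.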
43 Part 2: Effects of Corrections for Measurement Error
44 What can a Measurement Error Analysis Do? Response: Y True Covariate: X Surrogate: W Other Exactly Measured Covariates: Z
45 What can a Measurement Error Analysis Do? With a single exposure, and classical measurement error, usually the effect is that the observed data underestimate the relative risks, sometimes profoundly. Measurement error analysis can correct this underestimation
46 What else can a Measurement Error Analysis Do? We have seen that there are cases in which hypothesis tests that ignore measurement error are invalid. A measurement error analysis can result in valid tests with real Type I errors of near 5%.
47 What Can a Measurement Error Analysis Not Do? A measurement error analysis can never be as statistically efficient and powerful as an analysis in which true exposure X is observed. Measurement error lowers power, and no fancy analysis can alleviate this fact.
48 Prices of a Measurement Error Analysis Somewhere, somehow, you have to provide a model for the relationship of the true exposure X and the surrogate exposure W, even though you do not observe X Usually, this means that you need additional data to get at the measurement error model Requires planning, and costs more
49 Prices of a Measurement Error Analysis Almost without exception, for a variable measured with error, a measurement error analysis leads to increased variability in the estimate of risk.
50 The NIH-AARP Diet and Health Study Survival analysis of colorectal cancer (Y). True exposures are X, consisting of usual intake of energy and the usual Healthy Eating Index 2005 (HEI-2005) total score. Other variables Z measured "exactly" were a long list (age group, etc.).
51 The NIH-AARP Diet and Health Study n ≈ 220,000. Instead of usual energy and HEI-2005, we have them (mis)measured by a food frequency questionnaire. Also, in a sub-study, we have 1,000 people who contributed two 24-hour recalls. I will not go into the entire complex modeling process.
52 The NIH-AARP Diet and Health Study: Men Using the FFQ log relative risk estimate = 0.33 Standard error = 0.07 Measurement error analysis log relative risk estimate = 0.45 (greater in absolute value) Standard error = 0.09 (larger standard error)
53 The NIH-AARP Diet and Health Study: Women Using the FFQ log relative risk estimate = 0.22 Standard error = 0.09 p = 0.02 Measurement error analysis log relative risk estimate = 0.49 (greater in absolute value) Standard error = 0.16 (larger standard error) p = 0.00
54 The NIH-AARP Diet and Health Study: Women The lower p-values for the MEM analysis for women can happen. True exposure X is bivariate, and while the components (energy, HEI-2005) are not highly correlated, they are correlated nonetheless.
55 Part 3: Needed Data for a Measurement Error Analysis
56 Overview of Classical Models Response: Y True Covariate: X Surrogate: W Other Exactly Measured Covariates: Z
57 Conundrum The Classical Model says that $W = X + U$, with $U \sim \mathrm{Normal}(0, \sigma_u^2)$. In general, the measurement error variance $\sigma_u^2$ cannot be estimated from just (Y, W, Z) data. Question: what data are needed to estimate $\sigma_u^2$?
58 Solution #1: Validation Data At least in principle, in some cases, one can effectively observe X in a sub-study. This is called a validation study. Validation studies are beautiful things. They are rare, especially if X is a long-term exposure. Of course, if such data exist, $\sigma_u^2 = \mathrm{var}(W \mid X)$.
59 Solution #1: Validation Data Validation data, which include X, also allow us to estimate the distribution of true exposure They also allow us to understand whether the classical error model actually holds! Validation study data are really data with missing data, in X, although they are not typical missing data problems because most of the X s are missing.
60 Solution #2: Replication Data In many cases, it is possible to observe replicated W data. Thus, for the $i$th person, we observe $(W_{i1}, \ldots, W_{im})$ with $W_{ij} = X_i + U_{ij}$. Replication data allow easy estimation of $\sigma_u^2$ through ANOVA calculations. They also allow data checking to see if the additive model with homoscedastic error holds (details not given, this is an overview).
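The ANOVA calculation mentioned here can be sketched as follows: the pooled within-person variance of the replicates estimates $\sigma_u^2$, and the variance of the person means, less $\sigma_u^2/m$, estimates $\sigma_x^2$. The function name `replication_estimates`, the simulated data, and the choice of n = 5000 people with m = 2 replicates are assumptions made for illustration.

```python
import random
import statistics


def replication_estimates(w):
    """ANOVA-type estimates from replicates w[i][j] = X_i + U_ij.

    Returns (estimate of sigma_u^2, estimate of sigma_x^2).
    Assumes every person has the same number m >= 2 of replicates.
    """
    n, m = len(w), len(w[0])
    person_means = [statistics.fmean(row) for row in w]
    within_ss = sum((wij - mean_i) ** 2
                    for row, mean_i in zip(w, person_means) for wij in row)
    var_u_hat = within_ss / (n * (m - 1))      # pooled within-person variance
    # var(mean of m replicates) = sigma_x^2 + sigma_u^2 / m
    var_x_hat = statistics.variance(person_means) - var_u_hat / m
    return var_u_hat, var_x_hat


# Simulated replicate data with hypothetical true values sigma_x^2 = sigma_u^2 = 1.
rng = random.Random(42)                        # arbitrary seed
n_people, n_reps = 5000, 2
data = []
for _ in range(n_people):
    x_i = rng.gauss(0.0, 1.0)
    data.append([x_i + rng.gauss(0.0, 1.0) for _ in range(n_reps)])

var_u_hat, var_x_hat = replication_estimates(data)
reliability_single = var_x_hat / (var_x_hat + var_u_hat)   # lambda for one W
print(f"sigma_u^2 ~ {var_u_hat:.2f}, sigma_x^2 ~ {var_x_hat:.2f}, "
      f"lambda ~ {reliability_single:.2f}")
```

Both estimates should land near 1 here, giving an estimated reliability ratio near 0.5 for a single measurement.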
61 Solution #2: Replication Data Replicated biomarkers or 24hr recalls Replicated blood pressure measurements Replicated monitoring equipment
62 Solution #2: Replication Data There are subtleties with replication data There is debate as to whether they measure long-term exposure unbiasedly, or short-term exposure only. Everyone agrees that replication data are a good thing
63 Solution #3: Instrumental Variables Often forgotten, but widely used in econometrics. These are variables T which have the following properties (hopefully): They are correlated with true exposure X. They are nondifferential. They are independent of the measurement error U. Convincing oneself (or referees) that T is a proper instrument is hard, because it cannot be verified numerically.
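To see why an instrument helps, consider the simple linear model with classical error. The slide does not give an estimator; the sketch below uses the standard textbook ratio cov(T, Y) / cov(T, W), which equals the true slope when T satisfies the three properties listed. The data-generating values, the seed, and the helper `cov` are assumptions made for illustration.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


rng = random.Random(5)                                  # arbitrary seed
n = 50_000
beta_x = 1.0                                            # hypothetical true slope
t = [rng.gauss(0.0, 1.0) for _ in range(n)]             # instrument
x = [ti + rng.gauss(0.0, 1.0) for ti in t]              # T is correlated with X
w = [xi + rng.gauss(0.0, 1.0) for xi in x]              # U independent of T
y = [beta_x * xi + rng.gauss(0.0, 0.5) for xi in x]     # T affects Y only through X

naive_slope = cov(w, y) / cov(w, w)   # attenuated: tends to lambda * beta_x
iv_slope = cov(t, y) / cov(t, w)      # instrumental variable estimate of beta_x
print(f"naive slope = {naive_slope:.2f}, IV slope = {iv_slope:.2f}")
```

Here var(X) = 2 and var(W) = 3, so the naive slope should be near 2/3 while the instrumental variable estimate should be near 1.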
64 Part 4: Pure Berkson Error
65 Overview In radiation epidemiology and presumably also occupational epidemiology, the calculated exposure comes from two sources: Error-Prone estimates from an individual Error-prone estimates assigned to groups with similar characteristics The second error-prone estimates are generally designated as Berkson measurement error I want to introduce the Berkson error model
66 The Berkson Model and the Nevada Test Site Genesis: In the 1950s, the U.S. did above-ground nuclear testing. At least twice, they set off atomic bombs when the winds were high and in the direction of Utah. The radiation fell on the ground. Cows ate the grass from the ground. Kids drank the milk.
67 The Berkson Model and the Nevada Test Site Concerns about radiation-caused thyroid disease for these "down-winders" led to a major epidemiologic study at the University of Utah. Radiation and biological experts, including NCI's Andre Bouville, built a dosimetry system based on physical transport models. Every person with certain shared characteristics was assigned the same dose, along with a value for the uncertainty.
68 The Berkson Model and the Nevada Test Site Example: all girls aged 6 living in Washington County who got their milk from their own cows and drank 3 glasses of milk per day were assigned the same dose and an uncertainty. Example: all boys aged 3 living in Lincoln County who got their milk from stores and drank 2 glasses of milk per day were assigned the same dose and an uncertainty. In real life, classical errors come from estimates of the amount of milk drunk.
69 The Berkson Model and the Nevada Test Site Thought Experiment: Ignore the uncertainty (measurement error) in milk consumption From the dosimetry system, each individual gets an assigned/calculated dose W Crucial Point: Children with the same characteristics are given the same assigned dose W Direct Measurements of thyroid exposure for the individuals were not done
70 The Berkson Model and the Nevada Test Site Fact: Two people with similar characteristics might get the same assigned dose W However, their true radiation exposures X would be different Model (in log scale): X = W + U, where U is the assigned uncertainty This is the Berkson measurement error model
71 The Berkson Model The classical Berkson model says that True Exposure = Assigned Exposure + Mean Zero Error. In symbols: $X = W + U_{\mathrm{berk}}$ (or $X = W \, U_{\mathrm{berk}}$ for multiplicative error). Assumption: W and $U_{\mathrm{berk}}$ are independent, and $E(U_{\mathrm{berk}}) = 0$ (additive error) or $E(U_{\mathrm{berk}}) = 1$ (multiplicative error), so that $E(X \mid W) = W$. Compare with the classical measurement error model, where $W = X + U$ and $E(X \mid W) = \lambda W + (1 - \lambda)\mu_x$.
72 The Berkson Model From the previous page, $X = W + U_{\mathrm{berk}}$. In the linear regression model, $Y = \beta_0 + \beta_x X + \epsilon$. Substituting, $Y = \beta_0 + \beta_x (W + U_{\mathrm{berk}}) + \epsilon = \beta_0 + \beta_x W + (\beta_x U_{\mathrm{berk}} + \epsilon)$. No Bias: The slope of the regression of Y on W is $\beta_x$! Increased Variance/Loss of Power: However, the variance of the regression in W is increased: it is $\mathrm{var}(\epsilon) + \beta_x^2 \mathrm{var}(U_{\mathrm{berk}})$.
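The two Berkson conclusions (no bias in the slope, inflated residual variance) are easy to check by simulation. The sketch below uses hypothetical values for the slope, the variances, and the distribution of assigned doses; the helper `fit_line` is likewise an assumption added for illustration.

```python
import random
import statistics


def fit_line(x, y):
    """Return (intercept, slope, residual variance) from regressing y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, rss / (len(x) - 2)


rng = random.Random(3)                                  # arbitrary seed
n = 50_000
beta0, beta_x = 0.0, 1.0                                # hypothetical coefficients
var_eps, var_berk = 0.25, 1.0                           # hypothetical variances
w = [rng.gauss(0.0, 1.0) for _ in range(n)]             # assigned exposure
x = [wi + rng.gauss(0.0, var_berk ** 0.5) for wi in w]  # X = W + U_berk
y = [beta0 + beta_x * xi + rng.gauss(0.0, var_eps ** 0.5) for xi in x]

_, slope_w, resid_var_w = fit_line(w, y)                # regression on assigned W
expected_resid_var = var_eps + beta_x ** 2 * var_berk   # var(eps) + beta_x^2 var(U_berk)
print(f"slope on W = {slope_w:.2f} (true {beta_x}); "
      f"residual variance = {resid_var_w:.2f} (expected {expected_resid_var:.2f})")
```

The slope on W should be close to the true slope, and the residual variance should be close to 1.25 rather than the 0.25 that would be seen if X were observed.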
More informationRejection regions for the bivariate case
Rejection regions for the bivariate case The rejection region for the T 2 test (and similarly for Z 2 when Σ is known) is the region outside of an ellipse, for which there is a (1-α)% chance that the test
More informationAN ABSTRACT OF THE DISSERTATION OF
AN ABSTRACT OF THE DISSERTATION OF Vicente J. Monleon for the degree of Doctor of Philosophy in Statistics presented on November, 005. Title: Regression Calibration and Maximum Likelihood Inference for
More informationSpecification Errors, Measurement Errors, Confounding
Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationECON 5350 Class Notes Functional Form and Structural Change
ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this
More informationRegression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals (SW Chapter 5) Outline. The standard error of ˆ. Hypothesis tests concerning β 3. Confidence intervals for β 4. Regression
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationCourse Review. Kin 304W Week 14: April 9, 2013
Course Review Kin 304W Week 14: April 9, 2013 1 Today s Outline Format of Kin 304W Final Exam Course Review Hand back marked Project Part II 2 Kin 304W Final Exam Saturday, Thursday, April 18, 3:30-6:30
More informationComparing IRT with Other Models
Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used
More informationClassification 1: Linear regression of indicators, linear discriminant analysis
Classification 1: Linear regression of indicators, linear discriminant analysis Ryan Tibshirani Data Mining: 36-462/36-662 April 2 2013 Optional reading: ISL 4.1, 4.2, 4.4, ESL 4.1 4.3 1 Classification
More informationA Re-Introduction to General Linear Models
A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation
More informationRegression With a Categorical Independent Variable: Mean Comparisons
Regression With a Categorical Independent Variable: Mean Lecture 16 March 29, 2005 Applied Regression Analysis Lecture #16-3/29/2005 Slide 1 of 43 Today s Lecture comparisons among means. Today s Lecture
More informationBiostatistics 4: Trends and Differences
Biostatistics 4: Trends and Differences Dr. Jessica Ketchum, PhD. email: McKinneyJL@vcu.edu Objectives 1) Know how to see the strength, direction, and linearity of relationships in a scatter plot 2) Interpret
More informationVectors and Matrices Statistics with Vectors and Matrices
Vectors and Matrices Statistics with Vectors and Matrices Lecture 3 September 7, 005 Analysis Lecture #3-9/7/005 Slide 1 of 55 Today s Lecture Vectors and Matrices (Supplement A - augmented with SAS proc
More informationANCOVA. ANCOVA allows the inclusion of a 3rd source of variation into the F-formula (called the covariate) and changes the F-formula
ANCOVA Workings of ANOVA & ANCOVA ANCOVA, Semi-Partial correlations, statistical control Using model plotting to think about ANCOVA & Statistical control You know how ANOVA works the total variation among
More informationECNS 561 Multiple Regression Analysis
ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking
More informationClass Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement
Class Introduction and Overview; Review of ANOVA, Regression, and Psychological Measurement Introduction to Structural Equation Modeling Lecture #1 January 11, 2012 ERSH 8750: Lecture 1 Today s Class Introduction
More informationDifference scores or statistical control? What should I use to predict change over two time points? Jason T. Newsom
Difference scores or statistical control? What should I use to predict change over two time points? Jason T. Newsom Overview Purpose is to introduce a few basic concepts that may help guide researchers
More informationTwo-sample Categorical data: Testing
Two-sample Categorical data: Testing Patrick Breheny April 1 Patrick Breheny Introduction to Biostatistics (171:161) 1/28 Separate vs. paired samples Despite the fact that paired samples usually offer
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationwhere Female = 0 for males, = 1 for females Age is measured in years (22, 23, ) GPA is measured in units on a four-point scale (0, 1.22, 3.45, etc.
Notes on regression analysis 1. Basics in regression analysis key concepts (actual implementation is more complicated) A. Collect data B. Plot data on graph, draw a line through the middle of the scatter
More informationLecture 4 Multiple linear regression
Lecture 4 Multiple linear regression BIOST 515 January 15, 2004 Outline 1 Motivation for the multiple regression model Multiple regression in matrix notation Least squares estimation of model parameters
More informationUNIVERSITY OF TORONTO Faculty of Arts and Science
UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator
More informationWarm-up Using the given data Create a scatterplot Find the regression line
Time at the lunch table Caloric intake 21.4 472 30.8 498 37.7 335 32.8 423 39.5 437 22.8 508 34.1 431 33.9 479 43.8 454 42.4 450 43.1 410 29.2 504 31.3 437 28.6 489 32.9 436 30.6 480 35.1 439 33.0 444
More informationStat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS
Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i
More informationwith the usual assumptions about the error term. The two values of X 1 X 2 0 1
Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The
More information1 Motivation for Instrumental Variable (IV) Regression
ECON 370: IV & 2SLS 1 Instrumental Variables Estimation and Two Stage Least Squares Econometric Methods, ECON 370 Let s get back to the thiking in terms of cross sectional (or pooled cross sectional) data
More informationGLS and FGLS. Econ 671. Purdue University. Justin L. Tobias (Purdue) GLS and FGLS 1 / 22
GLS and FGLS Econ 671 Purdue University Justin L. Tobias (Purdue) GLS and FGLS 1 / 22 In this lecture we continue to discuss properties associated with the GLS estimator. In addition we discuss the practical
More informationAnalysing data: regression and correlation S6 and S7
Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association
More informationSimple linear regression
Simple linear regression Prof. Giuseppe Verlato Unit of Epidemiology & Medical Statistics, Dept. of Diagnostics & Public Health, University of Verona Statistics with two variables two nominal variables:
More informationSimple Linear Regression for the Climate Data
Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO
More informationSimple, Marginal, and Interaction Effects in General Linear Models
Simple, Marginal, and Interaction Effects in General Linear Models PRE 905: Multivariate Analysis Lecture 3 Today s Class Centering and Coding Predictors Interpreting Parameters in the Model for the Means
More informationCorrelation. Patrick Breheny. November 15. Descriptive statistics Inference Summary
Correlation Patrick Breheny November 15 Patrick Breheny University of Iowa Biostatistical Methods I (BIOS 5710) 1 / 21 Introduction Descriptive statistics Generally speaking, scientific questions often
More informationMeasurement Error in Spatial Modeling of Environmental Exposures
Measurement Error in Spatial Modeling of Environmental Exposures Chris Paciorek, Alexandros Gryparis, and Brent Coull August 9, 2005 Department of Biostatistics Harvard School of Public Health www.biostat.harvard.edu/~paciorek
More information1/11/2011. Chapter 4: Variability. Overview
Chapter 4: Variability Overview In statistics, our goal is to measure the amount of variability for a particular set of scores, a distribution. In simple terms, if the scores in a distribution are all
More informationSTA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.
STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory
More informationExample. Multiple Regression. Review of ANOVA & Simple Regression /749 Experimental Design for Behavioral and Social Sciences
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 29, 2015 Lecture 5: Multiple Regression Review of ANOVA & Simple Regression Both Quantitative outcome Independent, Gaussian errors
More informationMeta-analysis of epidemiological dose-response studies
Meta-analysis of epidemiological dose-response studies Nicola Orsini 2nd Italian Stata Users Group meeting October 10-11, 2005 Institute Environmental Medicine, Karolinska Institutet Rino Bellocco Dept.
More informationApplied Epidemiologic Analysis
Patricia Cohen, Ph.D. Henian Chen, M.D., Ph. D. Teaching Assistants Julie Kranick Chelsea Morroni Sylvia Taylor Judith Weissman Lecture 13 Interactional questions and analyses Goals: To understand how
More information