STK4900/ Lecture 10. Program

Size: px
Start display at page:

Download "STK4900/ Lecture 10. Program"

Transcription

1 STK4900/ Lecture 10 Program 1. Repeated measures and longitudinal data 2. Simple analysis approaches 3. Random effects models 4. Generalized estimating equations (GEE) 5. GEE for binary data (and GLMs) 6. Time series data Sections 7.1, 7.2, 7.3, 7.4 (except 7.4.5), 7.5 1

2 Example: Fecal fat Lack of digestive enzymes in the intestine can cause bowel absorption problems, which will be indicated by excess fat in the feces. Pancreatic enzyme supplements can reduce the problem. The data are from a study to determine if the form of the supplement makes a difference This is an example with repeated measurements (more than one observation per subject) 2

3 Fecal fat (g/day) none tablet capsule coated The plot shows that some patients tend to have high values for all pill types, while other patients tend to have low values The values for a patient are not independent, and this has to be taken into account when we analyze the data 3

4 Example: Birth weight and birth order We have recorded the weights of the babies of 200 mothers who all have five children. We are interested in studying the effect of birth order and the age of the mother on the birth weight Box plots of birth weights for all 200 mothers Birth weights for a sample of 30 mothers with fitted line (based on all) Birthweight (g) Birth Order This is an example of longitudinal data (repeated measures taken over time) The birth weights for a mother are not independent, and this has to be taken into account when we analyze the data 4

5 Simple approaches to analyzing repeated measures and longitudinal data 1. Standard analysis ignoring the dependence Fecal fat ex: One-way ANOVA on pill type Birth weight ex: Linear regression on birth order However: Ignoring dependence is WRONG! Not pursued further. 2. Looking at parts of the data for an individual to avoid the dependence problem Fecal fat ex: one option is to compare two pill types at a time using the paired t-test Birth weight example one option is to look at the difference in weight between the fifth and the first child. This is not wrong, but ignores part of the data (birth weight) or gives many comparisons (fecal fat) See slides

6 Approaches, contd. 3. Including dependence as fixed factor variable Fecal fat ex: Two-way ANOVA on pill type and individual Birth weight ex: Linear regression on birth order with mother as factor variable However: Generally not interested in factors individual/ mother. Also in birth weight ex many factor levels. 4. Including dependence as random factor: Random effects model Fecal fat ex: Two-way ANOVA on pill type and individual as random factor Birth weight ex: Linear regression on birth order with mother as random factor variable Pro: Individual/mother modeled as random variable. 6

7 Fecal fat example: R commands (comparing pill types 1(none) and 2(tablet)): fecfat=read.table(" header=t) x=fecfat$fecfat[fecfat$pilltype==1] y=fecfat$fecfat[fecfat$pilltype==2] t.test(x,y,paired=t) R output : Paired t-test t = 3.109, df = 5, p-value = percent confidence interval: mean of the differences: Estimated difference / P-value for the six comparisons of two pill types at a time (column minus row) None Tablet Capsule Tablet 21.6 / 2.7 % * * Capsule 20.7 / 3.7 % / 58.9 % * Coated 7.0 / 23.9 % / 7.8 % / 8.0 % 7

8 Birth weight example: R commands: babies=read.table(" header=t) first=babies$bweight[babies$birthord==1] fifth= babies$bweight[babies$birthord==5] diff=fifth-first t.test(diff) R output : One Sample t-test data: diff t = 4.211, df = 199, p-value = 3.849e percent confidence interval: mean of x On average the fifth child weights grams more than the first A 95 % confidence interval is from grams to grams If we divide by four we get the average increase per child 8

9 Approach 3: Individual as fixed factor Fecal fat example: Two way ANOVA R commands and output: > anova(lm(fecfat~factor(pilltype)+factor(subject),data=fecfat)) Response: fecfat Df Sum Sq Mean Sq F value Pr(>F) factor(pilltype) ** factor(subject) *** Residuals Birth weight example: Two way ANOVA R commands and output: > babiesanova=lm(bweight~birthord+initage+factor(momid),data=babies) > anova(babiesanova) Response: bweight Df Sum Sq Mean Sq F value Pr(>F) birthord e-06 *** initage e-09 *** factor(momid) < 2.2e-16 *** Residuals

10 Approach 4: Random effects model A drawback of Approach 3 is that an effect is estimated for every individual. The interest, however, lies in how such effects will vary over a population. A useful approach for analysing repeated measures is to consider a random effects model We will describe the random effects model using the fecal fat example Here we consider the model: ij j i ij where Y = µ + α + B + ε Y is the fecal fat for patient i when using pill type j ij α j is the effect of pill type j (relative to type 1) 2 the Bi are the effects of patients, assumed independent N(0, σ subj) 2 the εij are random errors, assumed independent N(0, σε ) To fit a random effects model, we use the "nlme" library 10

11 R commands: library(nlme) fit.fecfat=lme(fecfat~factor(pilltype), random=~1 subject, data=fecfat) summary(fit.fecfat) anova(fit.fecfat) R output (edited): Linear mixed-effects model fit by fit by REML Random effects: s subj Formula: ~1 subject (Intercept) Residual StdDev: Fixed effects: fecfat ~ factor(pilltype) Value Std.Error DF t-value p-value (Intercept) factor(pilltype) factor(pilltype) factor(pilltype) s ε numdf dendf F-value p-value (Intercept) factor(pilltype)

12 Correlation within subjects Covariance for two measurements from the same patient ( ): Cov( Y, Y ) ij ik = Cov( µ + α + B + ε, µ + α + B + ε ) j i ij k i ik = Cov( B + ε, B + ε ) = Cov( B, B) = Var( ) i ij i ik Variance for a measurement: Var( Yij ) = Var( µ + α j + Bi + εij ) Correlation for two measurements from the same patient: i i B i j k 2 = σ subj = Var( Bi + εij ) 2 2 =Var( B ) + Var( ε ) = σ + σ i ij subj ε corr( Y, Y ) ij ik = Cov( Y, Y ) ij ik Var( Y ) Var( Y ) ij ik = σ σ 2 subj 2 2 subj + σε Estimate of correlation: =

13 We will then analyze the birth weight example using a random effects model We here consider the model: where Y = µ + β x + β x + B + ε ij 1 1ij 2 2ij i ij Y is the birth weight for the j-th baby of the i-th mother ij x = j is the birth order (parity) of the j-th baby of the i-th mother 1ij β 1 is the effect of increasing the birth order by one x2ij is the age of the i-th mother when she had her first baby β 2 is the effect of one year's incerase in the age of the mother 2 Bi N σ subj the are the effects of mothers, assumed independent (0, ) 2 the εij are random errors, assumed independent N(0, σε ) 13

14 R commands: fit.babies=lme(bweight~birthord+initage, random=~1 momid, data=babies) summary(fit.babies) R output (edited): Linear mixed-effects model fit by fit by REML Random effects: Formula: ~1 momid (Intercept) Residual StdDev: Fixed effects: bweight ~ birthord + initage Value Std.Error* DF t-value* p-value (Intercept) birthord initage Estimate of correlation for two babies by the same mother: = 0.39 *) Standard errors and t-values differ slightly form those on page 283 in the text book, since we here use REML estimation. 14

15 Longitudinal data and correlation structures A random effects model for longitudinal data assumes that the correlation between any two observations for the same individual is the same E.g. for the birth weight example a random effects model assumes the same correlation between the birth weights of the first and second child as for the first and fifth child In general for longitudinal data, where observations are taken consecutively over time, it may be the case that observations that are close to each other in time are more correlated than those further apart In order to fit a model for longitudinal data, we need to take into account the type of correlation between the observations 15

16 Assume that the observations for the i-th subject are Y, Y,..., Y i1 i2 im Common assumptions on the correlation structure of the are ( ): Y ij j k Exchangeable: corr( Y, Y ) ij ik = ρ Autoregressive: corr( Y, Y ) = Unstructured: corr( Y, Y ) = ρ Independence: corr( Y, Y ) = 0 ij ij ik ik k j ρ ij ik jk Note that a random effects model implies an exchangeable correlation structure We may use the "gee" library to fit models with these correlation structures (using a method called generalized estimating equations) 16

17 R commands: library(gee) summary(gee(bweight~birthord+initage,id=momid,data=babies,corstr="exchangeable")) summary(gee(bweight~birthord+initage,id=momid,data=babies,corstr="ar-m")) summary(gee(bweight~birthord+initage,id=momid,data=babies,corstr= unstructured")) summary(gee(bweight~birthord+initage,id=momid,data=babies,corstr="independence")) R output (edited): Estimate Naive S.E. Naive z Robust S.E. Robust z birthord initage birthord initage birthord initage birthord initage The naive SE and z are valid if the assumed correlation structure is true Inference should be based on the robust SE and z, since these are valid also if the assumed "working correlation" does not hold 17

18 For the example with birth weight and birth order we have the following relation between the birth weights first second The weight of the first baby is less correlated with the others. Otherwise the weights have about the same correlation. An exchangeable correlation structure is a reasonable "working assumption" third fourth fifth Correlations: first second third fourth fiah first second third fourth fiah

19 In conclusion the birth weight data may be analyzed using generalized estimating equations with an exchangeable correlation structure (slide 14 ) or by using a random effects model (slide 11) The two models give comparable results in this example However, if we extend the generalized estimating (GEE) approach and the random effects model to generalized linear models (like logistic regression), the results need not longer agree. The GEE approach can be used for all glms (distributional families, link functions). We will only consider extension of logistic regression to dependent data. 19

20 GEE and binary data (logistic model) Example: Birth weight data, but with an indicator of low birth weight (lowbrth), i.e. < 3000g. geefit<- gee(lowbrth~birthord+initage,id=momid,family=binomial,data=babies, corstr="unstructured") R output (edited): > summary(geefit) Model: Link: Logit Variance to Mean Relation: Binomial Correlation Structure: Unstructured Coefficients: Estimate Naive S.E. Naive z Robust S.E. Robust z (Intercept) birthord initage Negative coefficients for birth order and initial age indicate that low birth weigh is less likely with increasing order and age. 20

21 GEE and Birth weight data, contd. Even the homemade expcoef function from Lecture 7 works on gee-objects More R output (edited): > expcoef(geefit) expcoef lower upper (Intercept) birthord initage

22 More on dependent data: Time series data (not in Vittinghoff et al.) In this lecture we have so far considered data with many small independent groups of subjects (measurements) dependence within each group Time series data is a different dependent data structure with Y, Y, Y,... long sequences of correlated data only one (or maybe a few) such sequences Examples Temperature on consecutive days (weeks) Stock prices on consecutive days (weeks) Sunspot activity years

23 Autocorrelation function (ACF) The correlation between observation at time t, time t-k,, is given as The Yt k ˆ( ρ k) = corr( Y, Y ) t t k ˆ( ρ k), k = 0,1,2,... Y t, and its lag at is referred to as the autocorrelation function. For the sunspot data we have ACF with high positive correlations at 11 year cycles 23

24 Uncertainty limits for the ACF The dashed horizontal lines in the ACF plots lie at values ±1.96 / n In particular for the sunspot data with n=289 this becomes ±0.118 These limits correspond to the test in Lecture 2 for using test statistic t = r n 2 1 r 2 H : ρ = 0 0 Then correlations within the uncertainty limits are not significantly different from zero. 24

25 Example: Luteinizing hormone in blood samples at 10 min. intervals from a human female, 48 samples. The sunspot example is a bit unusual in the 11-year correlation pattern. The luteinizing hormone example may be more typical of a time-series in the sense only the first (few) correlations are significantly different from 0. 25

26 Autoregressive models AR(p) Another approach for analyzing the dependence in a time series is through autoregressive models where the current value Y t is regressed on previous p values Yt 1, Yt 2, Yt 3,..., Yt p of the series, i.e. through a model Y = ay + a Y + a Y + ε t 1 t 1 2 t 2 p t p t where the ak are regression coefficients and the εt independent error terms. This is called an AR(p) model. For the special case AR(1) we have Y = ay + ε t 1 t 1 t and the current value only depends on the previous value. This is referred to as a Markov property. 26

27 Simulated Tme series from an AR(1) model Yt = ay 1 t 1+ εt with a 1 = 0.1, 0.5, 0.9 and y_t y_t time time y_t y_t time time 27

28 R commands: par(mfrow=c(2,2)) Tme =seq(1,100,1) a = c(0.1, 0.5, 0.9,- 0.9) for(i in 1:4) { y_t<- arima.sim(n,model=list(ar=a[i]),sd=1) plot(tme, y_t, type="l", main=a[i]) } AR(p) is a special case of the more general ARIMA(p,d,q) models for Tme series. AR- I- MA stands for AutoRegressive Integrated Moving Average, and p is the order of the autoregressive process, d is the order of a difference operator (helps against nonstatonarity) and q is the order of the moving average process (where the Tme series relates to the q past values of the noise). The parameters in such models can be estmated via maximum likelihood estmaton! R command: arima(y,order=c(p,d,q)) # in its most simple version 28

29 true a1 = 0.5 true a1 = 0.9 y1_t y2_t time > arima(y1_t, order=c(1,0,0)) Call: arima(x = y1_t, order = c(1, 0, 0)) Coefficients: ar1 intercept s.e sigma^2 estmated as 1.101: log likelihood = , aic = time > arima(y2_t, order=c(1,0,0)) Call: arima(x = y2_t, order = c(1, 0, 0)) Coefficients: ar1 intercept s.e sigma^2 estmated as 1.105: log likelihood = , aic =

30 STK Time series STK Survival and Event History Analysis STK StaTsTcal Learning: Advanced Regression and ClassificaTon STK Machine learning and statstcal methods for predicton and classificaton STK Applied Bayesian Analysis and Numerical Methods STK StaTsTcal Model SelecTon 30

31 THE END Mandatory number 2: To be handed in Thursday April 6 th 2017 before Feedback Tuesday 18 th and when actual, a second try to be handed in by Tuesday April 25 th 2017 before Writen 4 hours exam Friday May 12 th

Workshop 9.3a: Randomized block designs

Workshop 9.3a: Randomized block designs -1- Workshop 93a: Randomized block designs Murray Logan November 23, 16 Table of contents 1 Randomized Block (RCB) designs 1 2 Worked Examples 12 1 Randomized Block (RCB) designs 11 RCB design Simple Randomized

More information

These slides illustrate a few example R commands that can be useful for the analysis of repeated measures data.

These slides illustrate a few example R commands that can be useful for the analysis of repeated measures data. These slides illustrate a few example R commands that can be useful for the analysis of repeated measures data. We focus on the experiment designed to compare the effectiveness of three strength training

More information

Introduction to the Analysis of Hierarchical and Longitudinal Data

Introduction to the Analysis of Hierarchical and Longitudinal Data Introduction to the Analysis of Hierarchical and Longitudinal Data Georges Monette, York University with Ye Sun SPIDA June 7, 2004 1 Graphical overview of selected concepts Nature of hierarchical models

More information

1. Time-dependent data in general

1. Time-dependent data in general Lecture 10 Program 1. Time-dependent data in general 2. Repeated Measurements 3. Time series 4. Time series that depend on covariate time-series 1 Time-dependent data: Outcomes that are measured at several

More information

Lecture 9 STK3100/4100

Lecture 9 STK3100/4100 Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples

More information

Ch 6. Model Specification. Time Series Analysis

Ch 6. Model Specification. Time Series Analysis We start to build ARIMA(p,d,q) models. The subjects include: 1 how to determine p, d, q for a given series (Chapter 6); 2 how to estimate the parameters (φ s and θ s) of a specific ARIMA(p,d,q) model (Chapter

More information

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements: Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported

More information

13. October p. 1

13. October p. 1 Lecture 8 STK3100/4100 Linear mixed models 13. October 2014 Plan for lecture: 1. The lme function in the nlme library 2. Induced correlation structure 3. Marginal models 4. Estimation - ML and REML 5.

More information

Regression with correlation for the Sales Data

Regression with correlation for the Sales Data Regression with correlation for the Sales Data Scatter with Loess Curve Time Series Plot Sales 30 35 40 45 Sales 30 35 40 45 0 10 20 30 40 50 Week 0 10 20 30 40 50 Week Sales Data What is our goal with

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Repeated measures, part 1, simple methods

Repeated measures, part 1, simple methods enote 11 1 enote 11 Repeated measures, part 1, simple methods enote 11 INDHOLD 2 Indhold 11 Repeated measures, part 1, simple methods 1 11.1 Intro......................................... 2 11.1.1 Main

More information

Non-independence due to Time Correlation (Chapter 14)

Non-independence due to Time Correlation (Chapter 14) Non-independence due to Time Correlation (Chapter 14) When we model the mean structure with ordinary least squares, the mean structure explains the general trends in the data with respect to our dependent

More information

Module 6 Case Studies in Longitudinal Data Analysis

Module 6 Case Studies in Longitudinal Data Analysis Module 6 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation SISCR 2018 July 24, 2018 Learning objectives This module will focus on the design of longitudinal

More information

Univariate Time Series Analysis; ARIMA Models

Univariate Time Series Analysis; ARIMA Models Econometrics 2 Fall 24 Univariate Time Series Analysis; ARIMA Models Heino Bohn Nielsen of4 Outline of the Lecture () Introduction to univariate time series analysis. (2) Stationarity. (3) Characterizing

More information

1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects.

1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1) Identifying serial correlation. Plot Y t versus Y t 1. See

More information

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 Part 1 of this document can be found at http://www.uvm.edu/~dhowell/methods/supplements/mixed Models for Repeated Measures1.pdf

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 32 multiple choice

More information

INTRODUCTION TO TIME SERIES ANALYSIS. The Simple Moving Average Model

INTRODUCTION TO TIME SERIES ANALYSIS. The Simple Moving Average Model INTRODUCTION TO TIME SERIES ANALYSIS The Simple Moving Average Model The Simple Moving Average Model The simple moving average (MA) model: More formally: where t is mean zero white noise (WN). Three parameters:

More information

Models for longitudinal data

Models for longitudinal data Faculty of Health Sciences Contents Models for longitudinal data Analysis of repeated measurements, NFA 016 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

Monday 7 th Febraury 2005

Monday 7 th Febraury 2005 Monday 7 th Febraury 2 Analysis of Pigs data Data: Body weights of 48 pigs at 9 successive follow-up visits. This is an equally spaced data. It is always a good habit to reshape the data, so we can easily

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Biostatistics for physicists fall Correlation Linear regression Analysis of variance

Biostatistics for physicists fall Correlation Linear regression Analysis of variance Biostatistics for physicists fall 2015 Correlation Linear regression Analysis of variance Correlation Example: Antibody level on 38 newborns and their mothers There is a positive correlation in antibody

More information

LECTURE 11. Introduction to Econometrics. Autocorrelation

LECTURE 11. Introduction to Econometrics. Autocorrelation LECTURE 11 Introduction to Econometrics Autocorrelation November 29, 2016 1 / 24 ON PREVIOUS LECTURES We discussed the specification of a regression equation Specification consists of choosing: 1. correct

More information

R Output for Linear Models using functions lm(), gls() & glm()

R Output for Linear Models using functions lm(), gls() & glm() LM 04 lm(), gls() &glm() 1 R Output for Linear Models using functions lm(), gls() & glm() Different kinds of output related to linear models can be obtained in R using function lm() {stats} in the base

More information

STK4900/ Lecture 5. Program

STK4900/ Lecture 5. Program STK4900/9900 - Lecture 5 Program 1. Checking model assumptions Linearity Equal variances Normality Influential observations Importance of model assumptions 2. Selection of predictors Forward and backward

More information

Correlated data. Repeated measurements over time. Typical set-up for repeated measurements. Traditional presentation of data

Correlated data. Repeated measurements over time. Typical set-up for repeated measurements. Traditional presentation of data Faculty of Health Sciences Repeated measurements over time Correlated data NFA, May 22, 2014 Longitudinal measurements Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics University of

More information

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 217, Chicago, Illinois Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Inference with Simple Regression

Inference with Simple Regression 1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems

More information

Randomized Block Designs with Replicates

Randomized Block Designs with Replicates LMM 021 Randomized Block ANOVA with Replicates 1 ORIGIN := 0 Randomized Block Designs with Replicates prepared by Wm Stein Randomized Block Designs with Replicates extends the use of one or more random

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL

FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL FORECASTING SUGARCANE PRODUCTION IN INDIA WITH ARIMA MODEL B. N. MANDAL Abstract: Yearly sugarcane production data for the period of - to - of India were analyzed by time-series methods. Autocorrelation

More information

22s:152 Applied Linear Regression. Returning to a continuous response variable Y...

22s:152 Applied Linear Regression. Returning to a continuous response variable Y... 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y... Ordinary Least Squares Estimation The classical models we have fit so far with a continuous

More information

Workshop 9.1: Mixed effects models

Workshop 9.1: Mixed effects models -1- Workshop 91: Mixed effects models Murray Logan October 10, 2016 Table of contents 1 Non-independence - part 2 1 1 Non-independence - part 2 11 Linear models Homogeneity of variance σ 2 0 0 y i = β

More information

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ)

22s:152 Applied Linear Regression. In matrix notation, we can write this model: Generalized Least Squares. Y = Xβ + ɛ with ɛ N n (0, Σ) 22s:152 Applied Linear Regression Generalized Least Squares Returning to a continuous response variable Y Ordinary Least Squares Estimation The classical models we have fit so far with a continuous response

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

Classic Time Series Analysis

Classic Time Series Analysis Classic Time Series Analysis Concepts and Definitions Let Y be a random number with PDF f Y t ~f,t Define t =E[Y t ] m(t) is known as the trend Define the autocovariance t, s =COV [Y t,y s ] =E[ Y t t

More information

Mixed models with correlated measurement errors

Mixed models with correlated measurement errors Mixed models with correlated measurement errors Rasmus Waagepetersen October 9, 2018 Example from Department of Health Technology 25 subjects where exposed to electric pulses of 11 different durations

More information

Introduction and Background to Multilevel Analysis

Introduction and Background to Multilevel Analysis Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and

More information

Intruction to General and Generalized Linear Models

Intruction to General and Generalized Linear Models Intruction to General and Generalized Linear Models Mixed Effects Models IV Henrik Madsen Anna Helga Jónsdóttir hm@imm.dtu.dk April 30, 2012 Henrik Madsen Anna Helga Jónsdóttir (hm@imm.dtu.dk) Intruction

More information

The data was collected from the website and then converted to a time-series indexed from 1 to 86.

The data was collected from the website  and then converted to a time-series indexed from 1 to 86. Introduction For our group project, we analyzed the S&P 500s futures from 30 November, 2015 till 28 March, 2016. The S&P 500 futures give a reasonable estimate of the changes in the market in the short

More information

PAPER 206 APPLIED STATISTICS

PAPER 206 APPLIED STATISTICS MATHEMATICAL TRIPOS Part III Thursday, 1 June, 2017 9:00 am to 12:00 pm PAPER 206 APPLIED STATISTICS Attempt no more than FOUR questions. There are SIX questions in total. The questions carry equal weight.

More information

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts

Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 2017, Boston, Massachusetts Longitudinal Data Analysis Using SAS Paul D. Allison, Ph.D. Upcoming Seminar: October 13-14, 217, Boston, Massachusetts Outline 1. Opportunities and challenges of panel data. a. Data requirements b. Control

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 14 1 / 64 Data structure and Model t1 t2 tn i 1st subject y 11 y 12 y 1n1 2nd subject

More information

Theoretical and Simulation-guided Exploration of the AR(1) Model

Theoretical and Simulation-guided Exploration of the AR(1) Model Theoretical and Simulation-guided Exploration of the AR() Model Overview: Section : Motivation Section : Expectation A: Theory B: Simulation Section : Variance A: Theory B: Simulation Section : ACF A:

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL - MAY 2005 EXAMINATIONS STA 248 H1S Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 17 pages including

More information

Time Series I Time Domain Methods

Time Series I Time Domain Methods Astrostatistics Summer School Penn State University University Park, PA 16802 May 21, 2007 Overview Filtering and the Likelihood Function Time series is the study of data consisting of a sequence of DEPENDENT

More information

Repeated measures, part 2, advanced methods

Repeated measures, part 2, advanced methods enote 12 1 enote 12 Repeated measures, part 2, advanced methods enote 12 INDHOLD 2 Indhold 12 Repeated measures, part 2, advanced methods 1 12.1 Intro......................................... 3 12.2 A

More information

Model Building Chap 5 p251

Model Building Chap 5 p251 Model Building Chap 5 p251 Models with one qualitative variable, 5.7 p277 Example 4 Colours : Blue, Green, Lemon Yellow and white Row Blue Green Lemon Insects trapped 1 0 0 1 45 2 0 0 1 59 3 0 0 1 48 4

More information

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes Lecture 2.1 Basic Linear LDA 1 Outline Linear OLS Models vs: Linear Marginal Models Linear Conditional Models Random Intercepts Random Intercepts & Slopes Cond l & Marginal Connections Empirical Bayes

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

BIOSTATISTICAL METHODS

BIOSTATISTICAL METHODS BIOSTATISTICAL METHODS FOR TRANSLATIONAL & CLINICAL RESEARCH Cross-over Designs #: DESIGNING CLINICAL RESEARCH The subtraction of measurements from the same subject will mostly cancel or minimize effects

More information

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis

CHAPTER 8 MODEL DIAGNOSTICS. 8.1 Residual Analysis CHAPTER 8 MODEL DIAGNOSTICS We have now discussed methods for specifying models and for efficiently estimating the parameters in those models. Model diagnostics, or model criticism, is concerned with testing

More information

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1

Forecasting using R. Rob J Hyndman. 2.4 Non-seasonal ARIMA models. Forecasting using R 1 Forecasting using R Rob J Hyndman 2.4 Non-seasonal ARIMA models Forecasting using R 1 Outline 1 Autoregressive models 2 Moving average models 3 Non-seasonal ARIMA models 4 Partial autocorrelations 5 Estimation

More information

Statistical modelling: Theory and practice

Statistical modelling: Theory and practice Statistical modelling: Theory and practice Introduction Gilles Guillot gigu@dtu.dk August 27, 2013 Gilles Guillot (gigu@dtu.dk) Stat. modelling August 27, 2013 1 / 6 Schedule 13 weeks weekly time slot:

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

Statistics for Managers using Microsoft Excel 6 th Edition

Statistics for Managers using Microsoft Excel 6 th Edition Statistics for Managers using Microsoft Excel 6 th Edition Chapter 13 Simple Linear Regression 13-1 Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of

More information

STAT Financial Time Series

STAT Financial Time Series STAT 6104 - Financial Time Series Chapter 4 - Estimation in the time Domain Chun Yip Yau (CUHK) STAT 6104:Financial Time Series 1 / 46 Agenda 1 Introduction 2 Moment Estimates 3 Autoregressive Models (AR

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects

Topic 25 - One-Way Random Effects Models. Outline. Random Effects vs Fixed Effects. Data for One-way Random Effects Model. One-way Random effects Topic 5 - One-Way Random Effects Models One-way Random effects Outline Model Variance component estimation - Fall 013 Confidence intervals Topic 5 Random Effects vs Fixed Effects Consider factor with numerous

More information

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model

Topic 17 - Single Factor Analysis of Variance. Outline. One-way ANOVA. The Data / Notation. One way ANOVA Cell means model Factor effects model Topic 17 - Single Factor Analysis of Variance - Fall 2013 One way ANOVA Cell means model Factor effects model Outline Topic 17 2 One-way ANOVA Response variable Y is continuous Explanatory variable is

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.

Linear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks. Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think

More information

Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1

Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1 Stat 5303 (Oehlert): Balanced Incomplete Block Designs 1 > library(stat5303libs);library(cfcdae);library(lme4) > weardata

More information

Circle a single answer for each multiple choice question. Your choice should be made clearly.

Circle a single answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 4, 215 Name: Please read the following directions. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directions This exam is closed book and closed notes. There are 31 questions. Circle

More information

End-Semester Examination MA 373 : Statistical Analysis on Financial Data

End-Semester Examination MA 373 : Statistical Analysis on Financial Data End-Semester Examination MA 373 : Statistical Analysis on Financial Data Instructor: Dr. Arabin Kumar Dey, Department of Mathematics, IIT Guwahati Note: Use the results in Section- III: Data Analysis using

More information

Answer to exercise: Blood pressure lowering drugs

Answer to exercise: Blood pressure lowering drugs Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Ridge Regression. Summary. Sample StatFolio: ridge reg.sgp. STATGRAPHICS Rev. 10/1/2014

Ridge Regression. Summary. Sample StatFolio: ridge reg.sgp. STATGRAPHICS Rev. 10/1/2014 Ridge Regression Summary... 1 Data Input... 4 Analysis Summary... 5 Analysis Options... 6 Ridge Trace... 7 Regression Coefficients... 8 Standardized Regression Coefficients... 9 Observed versus Predicted...

More information

Step 2: Select Analyze, Mixed Models, and Linear.

Step 2: Select Analyze, Mixed Models, and Linear. Example 1a. 20 employees were given a mood questionnaire on Monday, Wednesday and again on Friday. The data will be first be analyzed using a Covariance Pattern model. Step 1: Copy Example1.sav data file

More information

Chapter 12 - Part I: Correlation Analysis

Chapter 12 - Part I: Correlation Analysis ST coursework due Friday, April - Chapter - Part I: Correlation Analysis Textbook Assignment Page - # Page - #, Page - # Lab Assignment # (available on ST webpage) GOALS When you have completed this lecture,

More information

Applied Econometrics. Professor Bernard Fingleton

Applied Econometrics. Professor Bernard Fingleton Applied Econometrics Professor Bernard Fingleton 1 Causation & Prediction 2 Causation One of the main difficulties in the social sciences is estimating whether a variable has a true causal effect Data

More information

Regression of Time Series

Regression of Time Series Mahlerʼs Guide to Regression of Time Series CAS Exam S prepared by Howard C. Mahler, FCAS Copyright 2016 by Howard C. Mahler. Study Aid 2016F-S-9Supplement Howard Mahler hmahler@mac.com www.howardmahler.com/teaching

More information

A Data-Driven Model for Software Reliability Prediction

A Data-Driven Model for Software Reliability Prediction A Data-Driven Model for Software Reliability Prediction Author: Jung-Hua Lo IEEE International Conference on Granular Computing (2012) Young Taek Kim KAIST SE Lab. 9/4/2013 Contents Introduction Background

More information

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij

Hierarchical Linear Models (HLM) Using R Package nlme. Interpretation. 2 = ( x 2) u 0j. e ij Hierarchical Linear Models (HLM) Using R Package nlme Interpretation I. The Null Model Level 1 (student level) model is mathach ij = β 0j + e ij Level 2 (school level) model is β 0j = γ 00 + u 0j Combined

More information

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences Faculty of Health Sciences Longitudinal data Correlated data Longitudinal measurements Outline Designs Models for the mean Covariance patterns Lene Theil Skovgaard November 27, 2015 Random regression Baseline

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Repeated Measures Data

Repeated Measures Data Repeated Measures Data Mixed Models Lecture Notes By Dr. Hanford page 1 Data where subjects are measured repeatedly over time - predetermined intervals (weekly) - uncontrolled variable intervals between

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

MAT3379 (Winter 2016)

MAT3379 (Winter 2016) MAT3379 (Winter 2016) Assignment 4 - SOLUTIONS The following questions will be marked: 1a), 2, 4, 6, 7a Total number of points for Assignment 4: 20 Q1. (Theoretical Question, 2 points). Yule-Walker estimation

More information

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions

Spatial inference. Spatial inference. Accounting for spatial correlation. Multivariate normal distributions Spatial inference I will start with a simple model, using species diversity data Strong spatial dependence, Î = 0.79 what is the mean diversity? How precise is our estimate? Sampling discussion: The 64

More information

Confidence Interval for the mean response

Confidence Interval for the mean response Week 3: Prediction and Confidence Intervals at specified x. Testing lack of fit with replicates at some x's. Inference for the correlation. Introduction to regression with several explanatory variables.

More information

Lecture 3 Linear random intercept models

Lecture 3 Linear random intercept models Lecture 3 Linear random intercept models Example: Weight of Guinea Pigs Body weights of 48 pigs in 9 successive weeks of follow-up (Table 3.1 DLZ) The response is measures at n different times, or under

More information

De-mystifying random effects models

De-mystifying random effects models De-mystifying random effects models Peter J Diggle Lecture 4, Leahurst, October 2012 Linear regression input variable x factor, covariate, explanatory variable,... output variable y response, end-point,

More information

at least 50 and preferably 100 observations should be available to build a proper model

at least 50 and preferably 100 observations should be available to build a proper model III Box-Jenkins Methods 1. Pros and Cons of ARIMA Forecasting a) need for data at least 50 and preferably 100 observations should be available to build a proper model used most frequently for hourly or

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

Univariate ARIMA Models

Univariate ARIMA Models Univariate ARIMA Models ARIMA Model Building Steps: Identification: Using graphs, statistics, ACFs and PACFs, transformations, etc. to achieve stationary and tentatively identify patterns and model components.

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression.

Variance. Standard deviation VAR = = value. Unbiased SD = SD = 10/23/2011. Functional Connectivity Correlation and Regression. 10/3/011 Functional Connectivity Correlation and Regression Variance VAR = Standard deviation Standard deviation SD = Unbiased SD = 1 10/3/011 Standard error Confidence interval SE = CI = = t value for

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions

More information

Heteroskedasticity in Panel Data

Heteroskedasticity in Panel Data Essex Summer School in Social Science Data Analysis Panel Data Analysis for Comparative Research Heteroskedasticity in Panel Data Christopher Adolph Department of Political Science and Center for Statistics

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion

How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion How To: Deal with Heteroscedasticity Using STATGRAPHICS Centurion by Dr. Neil W. Polhemus July 28, 2005 Introduction When fitting statistical models, it is usually assumed that the error variance is the

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen 2 / 28 Preparing data for analysis The

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement

More information

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests

ECON Introductory Econometrics. Lecture 5: OLS with One Regressor: Hypothesis Tests ECON4150 - Introductory Econometrics Lecture 5: OLS with One Regressor: Hypothesis Tests Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 5 Lecture outline 2 Testing Hypotheses about one

More information

Problem Set 1 Solution Sketches Time Series Analysis Spring 2010

Problem Set 1 Solution Sketches Time Series Analysis Spring 2010 Problem Set 1 Solution Sketches Time Series Analysis Spring 2010 1. Construct a martingale difference process that is not weakly stationary. Simplest e.g.: Let Y t be a sequence of independent, non-identically

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information