Dynamic analysis of binary longitudinal data

1 Dynamic analysis of binary longitudinal data. Ørnulf Borgan, Department of Mathematics, University of Oslo. Based on joint work with Rosemeire L. Fiaccone, Robin Henderson and Mauricio L. Barreto.

2 Outline:
- An example of binary longitudinal data: The Blue Bay project
- Modelling missingness for longitudinal binary data (including the relation to independent censoring in event history analysis)
- An additive model for longitudinal binary data
- Dynamic covariates
- Martingale residual processes
- Concluding comments

3 Blue Bay project: Bahia State, Brazil (size of France). State capital: Salvador (population 2.5 million).

4 Public works and education in the areas of sanitation and environment, executed by the Bahia State Government since 1997. Cost: more than $1 billion. [Photos: Belgica 1996; Belgica]

5 Data: Daily data on diarrhoea for almost a thousand children (one per family), collected at home visits from Oct 2000 to Jan 2002. Children were less than 3 years of age at entry. Diarrhoea: three or more fluid motions a day. Episode of diarrhoea: a sequence of days with diarrhoea that ends only when there are at least two consecutive clear days.
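The episode definition can be turned into daily incidence indicators. The following is a minimal sketch, assuming a complete daily 0/1 series with no missing days and a child who is not in an episode before the first day; the function name `new_episode_days` is hypothetical and not from the talk.

```python
from typing import List


def new_episode_days(diarrhoea: List[int], clear_days: int = 2) -> List[int]:
    """Flag the first day of each episode in a daily 0/1 diarrhoea series.

    An episode ends only after `clear_days` consecutive days without
    diarrhoea, so a single clear day does not split an episode.
    """
    incidence = []
    in_episode = False
    clear_run = 0  # consecutive clear days since the last diarrhoea day
    for d in diarrhoea:
        if d == 1:
            # A diarrhoea day starts a new episode only if none is ongoing.
            incidence.append(0 if in_episode else 1)
            in_episode = True
            clear_run = 0
        else:
            incidence.append(0)
            clear_run += 1
            if clear_run >= clear_days:
                in_episode = False
    return incidence


daily = [0, 1, 1, 0, 1, 0, 0, 1, 0]
print(new_episode_days(daily))  # [0, 1, 0, 0, 0, 0, 0, 1, 0]
```

In this example the diarrhoea day after a single clear day belongs to the ongoing episode, while the one after two clear days starts a new episode.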

6 The reduced prevalence/incidence over time may reflect improved health over the study period, or may be an artefact due to ageing of the cohort.

7 Social, demographic and economic characteristics collected at entry to the study [table]

8 Follow-up information on 10 children [figure]. Legend: under observation; new episode: X; ongoing episode: X; drop-out: O.

9 Pattern of missing observations for all 926 children [figure]. Annotations: non-available data collector, police strike, Carnival, St. John's Day, Christmas Day.

10 Three types of missingness:
- Late entries (16% of children)
- Drop-outs (21% of children)
- Intermittent missingness (20% of observations)
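The three types can be read off a child's sequence of observation indicators. This is a minimal sketch, assuming the 0/1 sequence spans the whole study period (1 = observed that day); the function name and the returned keys are hypothetical.

```python
from typing import Dict, List, Union


def missingness_summary(observed: List[int]) -> Dict[str, Union[bool, int]]:
    """Classify missingness for one child from daily observation indicators."""
    if 1 not in observed:
        raise ValueError("child was never observed")
    first = observed.index(1)                            # first observed day
    last = len(observed) - 1 - observed[::-1].index(1)   # last observed day
    return {
        "late_entry": first > 0,                   # unobserved days before entry
        "drop_out": last < len(observed) - 1,      # unobserved days after last visit
        "intermittent_days": observed[first:last + 1].count(0),  # gaps in between
    }


print(missingness_summary([0, 0, 1, 1, 0, 1, 0, 0]))
```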

11 Features of the data: longitudinal binary data; four time scales (calendar, age, study, episode); calendar time is used as the basic time scale. Aims: study factors of importance for the incidence and prevalence of diarrhoea, and how diarrhoea incidence and prevalence vary over calendar time. Ignored (for this talk): spatial associations and other non-independence.

12 Modelling missingness:
- Model for binary data without missingness: parameters of interest are defined for this model
- Joint model for binary data and missingness: conditions on the missingness are defined for this model
- Model for observed data: statistical methods are derived and studied for this model
We need to relate the models for the three situations (starting with models for one individual).

13 Model without missingness. The observations for child i form a binary time series $\tilde Y_{i1}, \tilde Y_{i2}, \ldots, \tilde Y_{it}, \ldots$ Here $\tilde Y_{it} = 1$ if the child starts a new episode of diarrhoea at day t (for prevalence: has diarrhoea at day t). Let $\mathcal{H}_{i0}$ be the σ-algebra generated by the fixed and external time-varying covariates for child i. Then $\mathcal{H}_{it} = \mathcal{H}_{i0} \vee \sigma\{\tilde Y_{i1}, \tilde Y_{i2}, \ldots, \tilde Y_{it}\}$ is the information that would have been available on child i by day t had there been no missingness.

14 Introduce the conditional probabilities $\alpha_{it} = P(\tilde Y_{it} = 1 \mid \mathcal{H}_{i,t-1})$. The aim of our analysis is to study how the $\alpha_{it}$ vary over time and how they depend on covariates, including dynamic covariates that are functions of $\tilde Y_{is}$ for $s < t$. This differs from the common approach in longitudinal data analysis, where the focus is on the marginal probabilities $\mu_{it} = P(\tilde Y_{it} = 1 \mid \mathcal{H}_{i0})$.

15 Joint model for binary longitudinal data and missingness. Introduce the observation process for individual i: $R_{it} = 1$ if the child is observed at t, and $R_{it} = 0$ if not observed at t. We need to consider the larger filtration $\mathcal{G}_{it} = \mathcal{G}_{i0} \vee \sigma\{R_{i1}, \tilde Y_{i1}, R_{i2}, \tilde Y_{i2}, \ldots, R_{it}, \tilde Y_{it}\}$, where $\mathcal{G}_{i0}$ is generated by $\mathcal{H}_{i0}$ and external aspects of the observation process for child i.

16 We make two assumptions on the missingness: (1) $P(\tilde Y_{it} = 1 \mid \mathcal{G}_{i,t-1}) = P(\tilde Y_{it} = 1 \mid \mathcal{H}_{i,t-1})$; (2) $\tilde Y_{it}$ and $R_{it}$ are independent, given $\mathcal{G}_{i,t-1}$. These assumptions correspond to sequential MAR in longitudinal data analysis and to independent censoring in event history analysis.

17 Modelling the observable data. Binary observations for individual i: $Y_{it} = R_{it} \tilde Y_{it}$. Observed filtration: $\mathcal{F}_{it} = \mathcal{G}_{i0} \vee \sigma\{R_{i1}, Y_{i1}, R_{i2}, Y_{i2}, \ldots, R_{it}, Y_{it}, R_{i,t+1}\}$. (Note that, for convenience, we have included $R_{i,t+1}$ in the definition of $\mathcal{F}_{it}$.)

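18 Then: $\lambda_{it} = P(Y_{it} = 1 \mid \mathcal{F}_{i,t-1}) = E\{E(R_{it} \tilde Y_{it} \mid \mathcal{G}_{i,t-1}, R_{it}) \mid \mathcal{F}_{i,t-1}\} = R_{it} E\{P(\tilde Y_{it} = 1 \mid \mathcal{G}_{i,t-1}) \mid \mathcal{F}_{i,t-1}\} = R_{it} E\{\alpha_{it} \mid \mathcal{F}_{i,t-1}\}$. We will assume that $\alpha_{it}$ is $\mathcal{F}_{it}$-predictable, implying that the time-dependent dynamic covariates used for regression modelling depend only on observables. Thus: $\lambda_{it} = R_{it} \alpha_{it}$.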

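19 Introduce $\varepsilon_{it} = Y_{it} - \lambda_{it}$. The $\varepsilon_{it}$ are martingale differences, and $M_{it} = \sum_{s \le t} \varepsilon_{is}$ is a discrete-time martingale. Predictable variation process: $\langle M_i \rangle_t = \sum_{s \le t} \mathrm{Var}(\varepsilon_{is} \mid \mathcal{F}_{i,s-1}) = \sum_{s \le t} \lambda_{is}(1 - \lambda_{is})$.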
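The martingale and its predictable variation are just running sums, which a few lines of code make concrete. This is a minimal sketch for a single child, assuming the conditional probabilities are known; the function name `martingale_paths` and the numbers are made up for illustration.

```python
from itertools import accumulate
from typing import List, Tuple


def martingale_paths(y: List[int], lam: List[float]) -> Tuple[List[float], List[float]]:
    """Return the martingale and its predictable variation, day by day."""
    # Martingale differences: observed outcome minus conditional probability.
    eps = [y_t - lam_t for y_t, lam_t in zip(y, lam)]
    m = list(accumulate(eps))
    # Conditional variance of a Bernoulli(lam) increment is lam * (1 - lam).
    pred_var = list(accumulate(lam_t * (1.0 - lam_t) for lam_t in lam))
    return m, pred_var


y = [0, 1, 0]
lam = [0.5, 0.5, 0.0]  # the last day is unobserved, so the probability is 0
m, pred_var = martingale_paths(y, lam)
print(m)         # [-0.5, 0.0, 0.0]
print(pred_var)  # [0.25, 0.5, 0.5]
```

On an unobserved day both the outcome and the conditional probability are zero, so neither process moves.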

20 Modelling the relation between individuals. Denote by $\mathcal{F}_t$ the information available to the researcher on all children by day t. We impose the following assumptions: (i) $\lambda_{it} = P(Y_{it} = 1 \mid \mathcal{F}_{i,t-1}) = P(Y_{it} = 1 \mid \mathcal{F}_{t-1})$; (ii) $\mathrm{Cov}(\varepsilon_{it}, \varepsilon_{jt} \mid \mathcal{F}_{t-1}) = 0$ for $i \ne j$. The assumptions are weaker than independence. Nevertheless, they are debatable [(i) in particular] for the diarrhoea data. Note that (ii) implies that the martingales $M_{it}$ and $M_{jt}$ are orthogonal.

21 An additive model for longitudinal binary data. We have the decomposition $Y_{it} = \lambda_{it} + \varepsilon_{it}$. Let $x_{i1t}, \ldots, x_{ipt}$ be predictable covariates for child i at day t. Consider the model $\lambda_{it} = R_{it} \alpha_{it} = R_{it} \{\beta_{0t} + \beta_{1t} x_{i1t} + \cdots + \beta_{pt} x_{ipt}\}$.

22 Conditional on "the past" $\mathcal{F}_{t-1}$, at day t we have $Y_{it} = \lambda_{it} + \varepsilon_{it} = \beta_{0t} R_{it} + \beta_{1t} R_{it} x_{i1t} + \cdots + \beta_{pt} R_{it} x_{ipt} + \varepsilon_{it}$, i.e. a linear regression model. We may estimate the $\beta_{jt}$ by ordinary least squares at each day t (quick!). The estimates for each day will be quite unstable, but they may be accumulated over time to get stable estimates for the cumulative regression coefficients $B_{jt} = \sum_{s \le t} \beta_{js}$.
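The day-by-day least squares fit and its accumulation can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the array layout, the function name `cumulative_coefficients` and the simulated data are all hypothetical, and NumPy's `lstsq` stands in for whatever solver was actually used.

```python
import numpy as np


def cumulative_coefficients(y, x, r):
    """Daily least squares fits accumulated over time.

    y: (T, n) observed outcomes (zero when the child is unobserved)
    x: (T, n, p) predictable covariates
    r: (T, n) observation indicators
    Returns a (T, p + 1) array; row t holds the cumulative sums of the
    daily estimates up to day t (intercept first).
    """
    T, n, p = x.shape
    b_hat = np.zeros((T, p + 1))
    running = np.zeros(p + 1)
    for t in range(T):
        # Row i of the design matrix is R_it * (1, x_i1t, ..., x_ipt),
        # so unobserved children contribute rows of zeros.
        design = r[t][:, None] * np.column_stack([np.ones(n), x[t]])
        # lstsq returns the minimum-norm solution if the design is rank deficient.
        beta_t, *_ = np.linalg.lstsq(design, y[t], rcond=None)
        running = running + beta_t
        b_hat[t] = running
    return b_hat


# Simulated data (all values below are made up for illustration).
rng = np.random.default_rng(0)
T, n = 200, 500
x = rng.integers(0, 2, size=(T, n, 1)).astype(float)  # one binary covariate
r = (rng.random((T, n)) < 0.8).astype(float)           # about 80% of days observed
alpha = 0.05 + 0.02 * x[..., 0]                        # true daily probabilities
y = r * (rng.random((T, n)) < alpha)                   # observed binary outcomes
print(cumulative_coefficients(y, x, r)[-1])
```

In this simulation the daily coefficients are constant at 0.05 and 0.02, so the final cumulative estimates should lie near 200 × 0.05 and 200 × 0.02, up to sampling error.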

23 Some estimated cumulative regression coefficients for a model for incidence with fixed covariates (may be interpreted as expected numbers) [figure]

24 We have (using "obvious" matrix notation, with $Y_s = X_s \beta_s + \varepsilon_s$) $\hat B_t = \sum_{s \le t} \hat\beta_s = \sum_{s \le t} (X_s^T X_s)^{-1} X_s^T Y_s = B_t + \sum_{s \le t} (X_s^T X_s)^{-1} X_s^T \varepsilon_s$, where the last sum is a martingale transformation. Properties may be derived using martingale methods, as for Aalen's additive hazards model for time-continuous event history data. In particular, $\hat B_t$ is approximately multivariate normal with a covariance matrix that may be estimated by $\sum_{s \le t} (X_s^T X_s)^{-1} X_s^T \, \mathrm{diag}\{\hat\lambda_s (1 - \hat\lambda_s)\} \, X_s (X_s^T X_s)^{-1}$.
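The covariance estimator is a running sum of sandwich terms, one per day. This is a hedged sketch, assuming the same hypothetical array layout as a day-by-day least squares fit (outcomes `y`, covariates `x`, observation indicators `r`) and using the pseudo-inverse in place of an explicit $(X^T X)^{-1} X^T$; the function name is made up.

```python
import numpy as np


def cumulative_covariance(y, x, r):
    """Estimated covariance matrices of the cumulative coefficients.

    y: (T, n) outcomes, x: (T, n, p) covariates, r: (T, n) indicators.
    Returns a (T, p + 1, p + 1) array of running sums of sandwich terms.
    """
    T, n, p = x.shape
    cov = np.zeros((T, p + 1, p + 1))
    running = np.zeros((p + 1, p + 1))
    for t in range(T):
        design = r[t][:, None] * np.column_stack([np.ones(n), x[t]])
        # Equals (X'X)^{-1} X' when the design has full column rank.
        pinv = np.linalg.pinv(design)
        lam_hat = design @ (pinv @ y[t])       # fitted probabilities
        # Fitted values may fall outside [0, 1], so a weight can be negative.
        weights = lam_hat * (1.0 - lam_hat)
        # (pinv * weights) scales column i by weights[i], i.e. pinv @ diag(weights).
        running = running + (pinv * weights) @ pinv.T
        cov[t] = running
    return cov


# One day, four children, one binary covariate (made-up numbers).
x = np.array([[[0.0], [0.0], [1.0], [1.0]]])
y = np.array([[0.0, 1.0, 1.0, 1.0]])
r = np.ones((1, 4))
print(cumulative_covariance(y, x, r)[0])
```

Unobserved children have zero rows in the design, so their fitted probability and weight are zero and they drop out of the sum.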

25 Dynamic covariates. How can past episodes of diarrhoea be used to predict future episodes?

26 Consider dynamic covariates of the form $\tilde x_{it} = \sum_{s<t} w_{st} Y_{is} \big/ \sum_{s<t} w_{st} R_{is}$, with $Y_{is}$ the incidence (prevalence) of diarrhoea and weights $w_{st} = 1$ for $t - s \le \tau$ and $w_{st} = \exp\{-\rho (t - s - \tau)\}$ for $t - s > \tau$. Use τ = 30 days and ρ = 0.01 below.
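A dynamic covariate of this kind is a weighted average of a child's past outcomes over the days it was observed. This is a minimal sketch, assuming weights that stay at 1 for lags up to τ and decay exponentially beyond it; the function names are hypothetical, and returning 0 when the child has no observed past is an assumption made here, not something stated in the talk.

```python
import math
from typing import List


def weight(lag: int, tau: int = 30, rho: float = 0.01) -> float:
    """Weight as a function of the lag t - s (in days)."""
    return 1.0 if lag <= tau else math.exp(-rho * (lag - tau))


def dynamic_covariate(y: List[int], r: List[int], t: int,
                      tau: int = 30, rho: float = 0.01) -> float:
    """Weighted average of past outcomes y[s] over observed days r[s], s < t.

    Days are 0-based list indices; only days strictly before t are used,
    so the covariate is predictable.
    """
    num = sum(weight(t - s, tau, rho) * y[s] for s in range(t))
    den = sum(weight(t - s, tau, rho) * r[s] for s in range(t))
    # Assumption: a child with no observed past gets the value 0.
    return num / den if den > 0 else 0.0


y = [1, 0, 0, 1]  # observed outcomes (0 on the unobserved day)
r = [1, 1, 0, 1]  # third day not observed
print(dynamic_covariate(y, r, t=4))
```

With all lags below τ every weight is 1, so the example reduces to 2 events over 3 observed days.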

27 A dynamic covariate may be on the causal pathway between a fixed covariate and the event process. The inclusion of dynamic covariates in the analysis may therefore distort the estimation of the effects of the fixed covariates. To avoid such distortion, at each time t we regress the dynamic covariates on the fixed covariates and use the residuals from these fits as new covariates. This procedure keeps the effect of the fixed covariates the same as in the model without the dynamic covariates.
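The residualisation step at a single day can be sketched as one least squares fit. This is a hedged illustration: the function name `residualise` is hypothetical, and restricting the fit to the children observed that day is an assumption made here.

```python
import numpy as np


def residualise(dynamic, fixed, r):
    """Residuals of a dynamic covariate after regression on fixed covariates.

    dynamic: (n,) dynamic covariate at day t
    fixed:   (n, q) fixed covariates
    r:       (n,) observation indicators at day t
    Unobserved children get residual 0.
    """
    n = fixed.shape[0]
    design = np.column_stack([np.ones(n), fixed])  # intercept plus fixed covariates
    obs = r.astype(bool)
    coef, *_ = np.linalg.lstsq(design[obs], dynamic[obs], rcond=None)
    resid = np.zeros(n)
    resid[obs] = dynamic[obs] - design[obs] @ coef
    return resid


fixed = np.array([[0.0], [0.0], [1.0], [1.0]])
dynamic = np.array([1.0, 3.0, 5.0, 9.0])
print(residualise(dynamic, fixed, np.ones(4)))
```

Because least squares residuals are orthogonal to the columns of the design matrix, adding the residualised covariate to the daily regression leaves the least squares estimates for the fixed covariates unchanged.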

28 Cumulative regression coefficients for incidence [figure]. Panels: average number of diarrhoea episodes; average number of days with diarrhoea. Also in the model: male, 3 or more per bedroom, contaminated water source, open sewerage, rain-affected accommodation, young mother.

29 Martingale residual processes: $\hat M_{it} = \sum_{s \le t} (Y_{is} - \hat\lambda_{is})$. In vector form, $\hat M_t = \sum_{s \le t} \{I - X_s (X_s^T X_s)^{-1} X_s^T\} \varepsilon_s$, which is a martingale transformation. Examples of standardized martingale residual processes (standardized by model-based SDs) [figure]
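The residual processes are cumulative sums of the daily least squares residuals. This is a minimal sketch of the unstandardized processes only, assuming a hypothetical array layout (outcomes `y`, covariates `x`, observation indicators `r`); the function name is made up and the standardization by model-based SDs is not attempted here.

```python
import numpy as np


def martingale_residuals(y, x, r):
    """Cumulative sums of daily least squares residuals for each child.

    y: (T, n) outcomes, x: (T, n, p) covariates, r: (T, n) indicators.
    Returns a (T, n) array; column i is the residual process of child i.
    """
    T, n, p = x.shape
    resid = np.zeros((T, n))
    running = np.zeros(n)
    for t in range(T):
        design = r[t][:, None] * np.column_stack([np.ones(n), x[t]])
        beta_t, *_ = np.linalg.lstsq(design, y[t], rcond=None)
        # Unobserved children have a zero row and a zero outcome,
        # so their residual process does not move that day.
        running = running + (y[t] - design @ beta_t)
        resid[t] = running
    return resid


# Two days, four children, one binary covariate (made-up numbers).
x = np.array([[[0.0], [0.0], [1.0], [1.0]]] * 2)
r = np.array([[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 0.0]])
y = np.array([[0.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]])
print(martingale_residuals(y, x, r))
```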

30 Empirical standard deviations of the martingale residual processes [figure]

31 Cumulative regression coefficients for prevalence [figure]. Panels: baseline; average number of days with diarrhoea; diarrhoea previous day (lag 1); lag 2 (residual effect); lag 3 (residual effect); lag 4 (residual effect). Also in the model: male, age, 3 or more per bedroom, poor street, contaminated water storage and source, standing water, open sewerage, rain-affected accommodation, young mother.

32 Prevalence: empirical standard deviations of the martingale residual processes [figure]

33 Not Markovian!

34 Concluding comments:
- A dynamic additive model provides a flexible framework for analysing longitudinal binary data
- The method illustrates how ideas and approaches from event history analysis may be useful for the analysis of longitudinal data
- Advantage: the method is computationally very quick
- Drawback: incidence and prevalence are not restricted to the range 0 to 1
- Methodological work is needed, in particular on methods for model selection and goodness-of-fit


More information

Econometrics Problem Set 4

Econometrics Problem Set 4 Econometrics Problem Set 4 WISE, Xiamen University Spring 2016-17 Conceptual Questions 1. This question refers to the estimated regressions in shown in Table 1 computed using data for 1988 from the CPS.

More information

Joint Modeling of Longitudinal Item Response Data and Survival

Joint Modeling of Longitudinal Item Response Data and Survival Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,

More information

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation

Ridge regression. Patrick Breheny. February 8. Penalized regression Ridge regression Bayesian interpretation Patrick Breheny February 8 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/27 Introduction Basic idea Standardization Large-scale testing is, of course, a big area and we could keep talking

More information

Supplemental Appendix to "Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs"

Supplemental Appendix to Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs Supplemental Appendix to "Alternative Assumptions to Identify LATE in Fuzzy Regression Discontinuity Designs" Yingying Dong University of California Irvine February 2018 Abstract This document provides

More information

Outline. Overview of Issues. Spatial Regression. Luc Anselin

Outline. Overview of Issues. Spatial Regression. Luc Anselin Spatial Regression Luc Anselin University of Illinois, Urbana-Champaign http://www.spacestat.com Outline Overview of Issues Spatial Regression Specifications Space-Time Models Spatial Latent Variable Models

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall

Structural Nested Mean Models for Assessing Time-Varying Effect Moderation. Daniel Almirall 1 Structural Nested Mean Models for Assessing Time-Varying Effect Moderation Daniel Almirall Center for Health Services Research, Durham VAMC & Duke University Medical, Dept. of Biostatistics Joint work

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Moger, TA; Haugen, M; Yip, BHK; Gjessing, HK; Borgan, Ø. Citation Lifetime Data Analysis, 2010, v. 17, n. 3, p

Moger, TA; Haugen, M; Yip, BHK; Gjessing, HK; Borgan, Ø. Citation Lifetime Data Analysis, 2010, v. 17, n. 3, p Title A hierarchical frailty model applied to two-generation melanoma data Author(s) Moger, TA; Haugen, M; Yip, BHK; Gjessing, HK; Borgan, Ø Citation Lifetime Data Analysis, 2010, v. 17, n. 3, p. 445-460

More information

Truncation and Censoring

Truncation and Censoring Truncation and Censoring Laura Magazzini laura.magazzini@univr.it Laura Magazzini (@univr.it) Truncation and Censoring 1 / 35 Truncation and censoring Truncation: sample data are drawn from a subset of

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

An Introduction to Bayesian Linear Regression

An Introduction to Bayesian Linear Regression An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,

More information

A Regression Model for the Copula Graphic Estimator

A Regression Model for the Copula Graphic Estimator Discussion Papers in Economics Discussion Paper No. 11/04 A Regression Model for the Copula Graphic Estimator S.M.S. Lo and R.A. Wilke April 2011 2011 DP 11/04 A Regression Model for the Copula Graphic

More information

Lesson 17: Vector AutoRegressive Models

Lesson 17: Vector AutoRegressive Models Dipartimento di Ingegneria e Scienze dell Informazione e Matematica Università dell Aquila, umberto.triacca@ec.univaq.it Vector AutoRegressive models The extension of ARMA models into a multivariate framework

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables

ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox s regression analysis Time dependent explanatory variables ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES Cox s regression analysis Time dependent explanatory variables Henrik Ravn Bandim Health Project, Statens Serum Institut 4 November 2011 1 / 53

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Approaches to Modeling Menstrual Cycle Function

Approaches to Modeling Menstrual Cycle Function Approaches to Modeling Menstrual Cycle Function Paul S. Albert (albertp@mail.nih.gov) Biostatistics & Bioinformatics Branch Division of Epidemiology, Statistics, and Prevention Research NICHD SPER Student

More information

Instrumental variables estimation in the Cox Proportional Hazard regression model

Instrumental variables estimation in the Cox Proportional Hazard regression model Instrumental variables estimation in the Cox Proportional Hazard regression model James O Malley, Ph.D. Department of Biomedical Data Science The Dartmouth Institute for Health Policy and Clinical Practice

More information

Time series and Forecasting

Time series and Forecasting Chapter 2 Time series and Forecasting 2.1 Introduction Data are frequently recorded at regular time intervals, for instance, daily stock market indices, the monthly rate of inflation or annual profit figures.

More information

A simple bivariate count data regression model. Abstract

A simple bivariate count data regression model. Abstract A simple bivariate count data regression model Shiferaw Gurmu Georgia State University John Elder North Dakota State University Abstract This paper develops a simple bivariate count data regression model

More information

Consistent high-dimensional Bayesian variable selection via penalized credible regions

Consistent high-dimensional Bayesian variable selection via penalized credible regions Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable

More information

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data An application of the GAM-PCA-VAR model to respiratory disease and air pollution data Márton Ispány 1 Faculty of Informatics, University of Debrecen Hungary Joint work with Juliana Bottoni de Souza, Valdério

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Unbiased estimation of exposure odds ratios in complete records logistic regression

Unbiased estimation of exposure odds ratios in complete records logistic regression Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology

More information

Predicting Long-term Exposures for Health Effect Studies

Predicting Long-term Exposures for Health Effect Studies Predicting Long-term Exposures for Health Effect Studies Lianne Sheppard Adam A. Szpiro, Johan Lindström, Paul D. Sampson and the MESA Air team University of Washington CMAS Special Session, October 13,

More information

The Multiple Regression Model Estimation

The Multiple Regression Model Estimation Lesson 5 The Multiple Regression Model Estimation Pilar González and Susan Orbe Dpt Applied Econometrics III (Econometrics and Statistics) Pilar González and Susan Orbe OCW 2014 Lesson 5 Regression model:

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Multi-state Models: An Overview

Multi-state Models: An Overview Multi-state Models: An Overview Andrew Titman Lancaster University 14 April 2016 Overview Introduction to multi-state modelling Examples of applications Continuously observed processes Intermittently observed

More information

Changes in the Transitory Variance of Income Components and their Impact on Family Income Instability

Changes in the Transitory Variance of Income Components and their Impact on Family Income Instability Changes in the Transitory Variance of Income Components and their Impact on Family Income Instability Peter Gottschalk and Sisi Zhang August 22, 2010 Abstract The well-documented increase in family income

More information