Consider Table 1 (Note connection to start-stop process).

Similar documents
Logit estimates Number of obs = 5054 Wald chi2(1) = 2.70 Prob > chi2 = Log pseudolikelihood = Pseudo R2 =

POLI 7050 Spring 2008 February 27, 2008 Unordered Response Models I

Understanding the multinomial-poisson transformation

Ph.D. course: Regression models

Chapter 11. Regression with a Binary Dependent Variable

Lecture 7 Time-dependent Covariates in Cox Regression

Lecture 12: Effect modification, and confounding in logistic regression

Generalized logit models for nominal multinomial responses. Local odds ratios

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

Binary Dependent Variables

One-stage dose-response meta-analysis

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

How To Do Piecewise Exponential Survival Analysis in Stata 7 (Allison 1995:Output 4.20) revised

Homework Solutions Applied Logistic Regression

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Lecture 2: Poisson and logistic regression

Stat 587: Key points and formulae Week 15

Survival Analysis Math 434 Fall 2011

ECON 594: Lecture #6

STAT 6350 Analysis of Lifetime Data. Failure-time Regression Analysis

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised January 20, 2018

Lecture 5: Poisson and logistic regression

Beyond GLM and likelihood

Lecture 10: Introduction to Logistic Regression

Discrete Choice Models I

Lecture 6 PREDICTING SURVIVAL UNDER THE PH MODEL

Logistic Regression Models for Multinomial and Ordinal Outcomes

Duration Analysis. Joan Llull

Stat 642, Lecture notes for 04/12/05 96

1 The problem of survival analysis

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Statistical Questions: Classification Modeling Complex Dose-Response

disc choice5.tex; April 11, ffl See: King - Unifying Political Methodology ffl See: King/Tomz/Wittenberg (1998, APSA Meeting). ffl See: Alvarez

Soc 63993, Homework #7 Answer Key: Nonlinear effects/ Intro to path analysis

Model Estimation Example

raise Coef. Std. Err. z P> z [95% Conf. Interval]

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Calculating Effect-Sizes. David B. Wilson, PhD George Mason University

2. We care about proportion for categorical variable, but average for numerical one.

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Binary Dependent Variable. Regression with a

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Binomial Model. Lecture 10: Introduction to Logistic Regression. Logistic Regression. Binomial Distribution. n independent trials

MAS3301 / MAS8311 Biostatistics Part II: Survival

Residuals and model diagnostics

POLI 7050 Spring 2008 March 5, 2008 Unordered Response Models II

Chapter 10 Nonlinear Models

Logistic Regression: Regression with a Binary Dependent Variable

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

STAT 7030: Categorical Data Analysis

22s:152 Applied Linear Regression. Chapter 2: Regression Analysis. a class of statistical methods for

Nonrecursive Models Highlights Richard Williams, University of Notre Dame, Last revised April 6, 2015

Assessing the Calibration of Dichotomous Outcome Models with the Calibration Belt

ECON Introductory Econometrics. Lecture 11: Binary dependent variables

,..., θ(2),..., θ(n)

TMA 4275 Lifetime Analysis June 2004 Solution

Generalized Linear Models for Non-Normal Data

Analysis of Time-to-Event Data: Chapter 6 - Regression diagnostics

Latent class analysis and finite mixture models with Stata

Part III. Hypothesis Testing. III.1. Log-rank Test for Right-censored Failure Time Data

Statistics in medicine

Lecture 3.1 Basic Logistic LDA

Item Reliability Analysis

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Mixed Models for Longitudinal Ordinal and Nominal Outcomes

Support Vector Hazard Regression (SVHR) for Predicting Survival Outcomes. Donglin Zeng, Department of Biostatistics, University of North Carolina

Lecture 9. Statistics Survival Analysis. Presented February 23, Dan Gillen Department of Statistics University of California, Irvine

Logistic regression model for survival time analysis using time-varying coefficients

Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

multilevel modeling: concepts, applications and interpretations

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

Chapter 1 Statistical Inference

β j = coefficient of x j in the model; β = ( β1, β2,

1 Introduction. 2 Residuals in PH model

Ch 6: Multicategory Logit Models

Applied Survival Analysis Lab 10: Analysis of multiple failures

Ex: Cubic Relationship. Transformations of Predictors. Ex: Threshold Effect of Dose? Ex: U-shaped Trend?

The Multilevel Logit Model for Binary Dependent Variables Marco R. Steenbergen

OFFICIAL JOURNAL OF THE AMERICAN SOCIOLOGICAL ASSOCIATION

Generalized linear models

Neural networks (not in book)

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Cox s proportional hazards/regression model - model assessment

Mixed Models for Longitudinal Binary Outcomes. Don Hedeker Department of Public Health Sciences University of Chicago.

Model Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection

Statistical Modelling with Stata: Binary Outcomes

Propensity Score Matching and Analysis TEXAS EVALUATION NETWORK INSTITUTE AUSTIN, TX NOVEMBER 9, 2018

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models

Control Function and Related Methods: Nonlinear Models

Semiparametric Regression

Instantaneous geometric rates via Generalized Linear Models

Poisson regression: Further topics

7 Sensitivity Analysis

Analyzing Proportions

Transcription:

Discrete-Time Data and Models Discretized duration data are still duration data! Consider Table 1 (Note connection to start-stop process). Table 1: Example of Discrete-Time Event History Data Case Event Time I.D. Occurrence Year Elapsed 1 0 1974 1 1 0 1975 2.... 1 0 1986 13 1 1 1987 14 5 1 1974 1 45 0 1974 1 45 0 1975 2.... 45 0 1992 19 45 0 1993 20 These data are a portion of a data set originally analyzed in Brace, Hall, and Langer (1999). I thank Laura Langer for letting us use them. 1

Discrete Models and Math f(t) = Pr(T = t i ) (1) S(t) = Pr(T t i ) = j if(t j ) (2) h(t) = f(t) S(t) (3) Likelihood L = n [ h(ti ) t 1 (1 h(t i )) ] y [ it t (1 h(t i )) ] 1 y it (4) i i=1 i=1 L = n i=1 {f(t)} y it {S(t)} 1 y it, (5) A Generic Model h(t) = Pr(T = t i T t i,x) (6) λ it = β 0 + β 1 x 1i + β 2 x 2i +... + β k x ki. (7) 2

Some Binary Link Functions Logit log ( λ i 1 π ) = β0 + β 1 x 1i + β 2 x 2i +... + β k x ki. (8) ˆλ i = eβ x 1 + e β x, (9) Probit Φ 1 [λ i ] = β 0 + β 1 x 1i + β 2 x 2i +... + β k x ki, (10) ˆλ i = Φ(β x). (11) Complementary Log-Log log[ log(1 λ i )] = β 0 + β 1 x 1i + β 2 x 2i +... + β k x ki. (12) ˆλ i = 1 exp[ exp(β x)]. (13) 3

The Exponential Equivalent of a Logit EH Model Recall the exponential model. There is a close connection between that model s parameterization of the baseline hazard and the parameterization obtained by garden variety binary link models. log ( λ i 1 λ i ) = β0 + β 1 x 1i + β 2 x 2i, where x ki are two covariates of interest that have a mean of 0, and β 0 is the constant term. The baseline hazard under this model would be equivalent to ˆλ i = h 0 (t) = exp(β 0 ), which is a constant. The baseline hazard is thus an exponential equivalent. 4

Transformations on T The previous result suggest we may want to try different functional forms for time. There are a variety of choices: Ignore it (not usually recommended!) Piecewise Functions (perhaps with restrictions over sets of parameters) Standard transformations (i.e. logs, polynomials, etc.) Smoothing Functions (splines, lowess, etc.) 5

Figure 1: This figure illustrates different ways that the duration time may be transformed to account for duration dependency in the discrete-time model. The lines in the graph represent a cubic transformation, natural log transformation, time entered linearly, no time dependency (exponential), and a quadratic transformation. 6

Smoothing Functions Let s consider smoothing options. With smoothers, we want to find a function that provides a smooth characterization of the survival times. Splines At issue with spline functions is the determination of the location of so-called knot points that is, the point at which two segments of T are joined together. The choice of knot placement as well as the number of knots used in the spline function can be an issue. It is common that knots are placed at given percentiles of T, or that cubic spline functions are implemented. What makes the use of splines (as well as other smoothing methods) appealing is that they provide a very general way to characterize duration dependency. Few assumptions regarding duration dependency need to be made, as the shape of the function is empirically determined (conditional on the location and number of knot point placements). This flexibility is desirable insofar as it avoids having to make (perhaps wrong) assumptions about the shape of the baseline hazard rate or having to make (perhaps arbitrary) decisions regarding transformations of T. 7

Locally Weighted Scatterplot Smoothing (Lowesss) Lowess proceeds by estimating the relationship between y i and x i at several target points over the range of x. For each value of y i, a smoothed value of y i is estimated. The resulting smoothed function, in the context of the discrete-time duration model, can be used to characterize the baseline hazard function. As with the use of spline functions, there are important issues. Among them is the bandwidth decision. Bandwidth denotes the fraction or proportion of the data that is used in smoothing each point. The larger the bandwidth, the greater the data reduction (and the more smooth the function becomes); the smaller the bandwidth, the closer the lowess function follows the actual data. Note: splines were estimated using the btscs.do module. You can get it at: http://www.vanderbilt.edu/~rtucker/papers/btscs/ajps98/ 8

Baseline Hazard Functions.25.2.15 Natural Log.1 Cubic Spline Lowess.05 0 5 10 15 20 Years Since Roe vs. Wade Figure 2: This figure illustrates a cubic spline function, lowess, and natural log transformation of the baseline hazard function from a model of restrictive abortion policy adoption. 9

Some Similarity Between the Cox Model and the Conditional Logit Model Let s (quickly) get to the conditional logit. i individuals who choose among J alternatives (J 3) e.g. Voters= i, Parties=J, j = 1, 2,... J RUM Formulation: U ij = βx i + ǫ ij Example: U ij = β j0 + β j1 Ideology i + β j2 Gender 1 + β j3 Education i. Multinomial Logit x i can have different effects for each J Gives rise to multinomial logit estimator: Pr(y i = J) = exp(β jx i ) 1 + exp(β K x i ) After normalization, the log-linear model is obtained: log P ij P i1 = Σ(β j x i ) For J 1 alternatives, J 1 non-redundant logits (sometimes called baseline category logit). 10

Conditional Logit Under MNL, we are concerned with x i. These are attributes of the ith individual the chooser. Choice may be conditioned on choosers and choice attributes: x ij are covariates that vary across choosers and choices: i.e. party proximity scores will vary across the J parties. w i are covariates that vary across choices but not across individuals: i.e. ideological self-placement, gender, education. This may lead to consideration of a utility model like U ij = β 1 Proximity ij +γ j0 +γ j1 Ideology i +γ j2 Gender i +γ j3 Education i. This shows dependency on x ij and w i and leads to consideration of the conditional logit estimator: P ij = exp(β jx ij + γ j w i ) Σ J k=1 exp(β j x ij + γ j w i ) which is equivalent to which yields the important result: P ij = exp(β jx ij ) exp(γ j w i ) Σ J k=1 exp(β j x ij ) exp(γ j w i ) P ij = exp(β jx ij ) Σ J k=1 exp(β j x ij ) Important implication: any factor that is constant is dropped out of the model. Here, all w i are fixed within individuals; they cannot be estimated. 11

Some data Choice set consists of 4 parties (i.e. J = 4). Proximity measures the proximity of chooser i to party j: it is an x ij variable. Gender, education, and ideology are all constants within the chooser (i.e. they vary across choices, but not choosers): it is a w i variable. +------------------------------------------------------------------+ ID choice set choice proximity gender educ. ideology ------------------------------------------------------------------ 1. 1 1 0 7 1 16 7 2. 1 2 1 9 1 16 7 3. 1 3 0 3 1 16 7 4. 1 4 0 1 1 16 7 5. 2 1 0 1 0 13 2 6. 2 2 0 1 0 13 2 7. 2 3 0 7 0 13 2 8. 2 4 1 10 0 13 2 9. 3 1 1 9 1 10 10 10. 3 2 0 6 1 10 10 11. 3 3 0 2 1 10 10 12. 3 4 0 2 1 10 10 13. 4 1 0 3 0 14 8 14. 4 2 1 7 0 14 8 15. 4 3 0 3 0 14 8 16. 4 4 0 1 0 14 8 17. 5 1 0 2 1 12 3 18. 5 2 0 3 1 12 3 19. 5 3 1 7 1 12 3 20. 5 4 0 5 1 12 3 21. 6 1 0 8 0 18 9 22. 6 2 1 10 0 18 9 23. 6 3 0 2 0 18 9 24. 6 4 0 1 0 18 9 +------------------------------------------------------------------+ 12

Let s apply conditional logit:. clogit choice proximity gender education ideology, group(id) nolog note: ideology omitted due to no within-group variance. note: education omitted due to no within-group variance. note: gender omitted due to no within-group variance. Conditional (fixed-effects) logistic regression Number of obs = 80 LR chi2(1) = 50.84 Prob > chi2 = 0.0000 Log likelihood = -2.308022 Pseudo R2 = 0.9168 ------------------------------------------------------------------------------ choice Coef. Std. Err. z P> z [95% Conf. Interval] -------------+---------------------------------------------------------------- proximity 1.645505.741898 2.22 0.027.1914115 3.099598 ------------------------------------------------------------------------------ Note what happens if we specify the model with the w i covariates: They drop out. Implications: Repeating: fixed factors within groups are dropped out. For this reason, the conditional logit is sometimes called a fixed effects model (group-level factors are fixed). There is an equivalency to the MNL model (though I won t talk about it here).... there is also an equivalency to a Cox model... and I will talk about that! 13

Table 2: Example of Matched Case-Control Duration Data Observation Risk Event Duration Number Period Occurrence Time Risk Period 1 1 1 0 5 2 1 0 5 3 1 0 5 4 1 1 5 5 1 1 5 Risk Period 2 1 2 0 11 2 2 1 11 3 2 1 11 Some Similarites: This kind of fixed-effects model is also known as a matched casecontrol model. Observations are matched on some attribute; in the party choice example, the matching is on the individual. Cases record whether or not some outcome has occurred within the grouping and controls record the non-occurrence of an outcome (i.e. these are the 1s and 0s) Above, we have 1 : 3 matching for each grouping: an individual can only chose 1 party (the case); the other 3 are not selected (the controls). We can think of duration data as matched case-control data. 14

What are cases matched on? THE ORDERED FAILURE TIMES... or equivalently, the periods of risk. Sound like a Cox model? Take a look at Table 1. In risk period 1, there are five matched observations. Of these five observations, there are two cases and three controls which elicits a ratio of 2 : 3 matching. For risk period two, two cases are observed and one control, for 2 : 1 matching. The case is the event s occurrence. In standard choice models, the case is the choice made from the available choice-set. There is a fundamental equivalency in the structure of the data. Suppose there are k = 1, 2,... K distinct risk periods in the data set (equivalently, there are k ordered failure times observed), and i = 1, 2,... J k observations at risk in the kth period. The response pattern of the dependent variable for the kth risk period is y k = (y k1 + y k2 +... + y kjk ). The total number of events, n 1k (or cases ) observed in the kth risk period is: n 1k = J i=1 y ki. (14) 15

The total number of non-events or controls is: n 0k = J k n 1k. (15) The probability of the response pattern y k is then estimated, conditional on n 1k. It is: Pr(y k J i=1 y ki = n 1k ) = exp(β J i=1 x ki y ki ) dk R k exp(β J i=1 x ki d ki ), (16) where R k denotes all possible combinations of ones and zeroes, i.e., cases and controls, in the kth risk period, d k = (d k1, d k2,..., d kj ), d ki = 0 or 1. Look familiar? It s a very slightly modified version of the conditional logit estimator. Why the modifications? We have to account for the response pattern at each failure time and we have to account for the fact that the risk pool decreases in size after each failure time. What we are doing is estimating the probability of the response pattern y k for each risk period R k, for all possible combinations of events and non-events, i.e., 1s and 0s, observed among the J observations at risk. 16