Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data Arising from Animal Habitat Selection Studies

1 Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data Arising from Animal Habitat Selection Studies
Thierry Duchesne (Thierry.Duchesne@mat.ulaval.ca), with Radu Craiu, Daniel Fortin and Sophie Baillargeon
Département de mathématiques et de statistique, Université Laval; Department of Statistics, University of Toronto; Département de biologie, Université Laval
Department of Statistics Seminar, University of Manitoba, October 28
Research funded by NSERC.

2 Outline
1 Introduction: Research objectives; Sampling designs; Data available; Methodological objectives
2 Conditional logistic regression: Model and notation; Justification of conditional logistic regression
3 Population averaged inference: Method; Example of application
4 Subject specific inference: Method; Example of application
5 Conclusion
6 References

4 Research objectives: Objectives of our research
Ecological objectives: For biologists, it is important to understand the links between the attributes of a landscape and how animals select their habitat (or move within their home range).
Statistical objectives: What are the appropriate sampling designs? What are the possible statistical models? How do we make inference on the model parameters?

7 Sampling designs: Possible study designs
Unmatched used vs unused (or available) designs
Useful to determine which landscape attributes predict whether a location is likely to be used over a specified time frame (e.g., trees with nests vs trees without nests).
Usually analyzed with logistic regression ($Y_i = 1$ if location $i$ is used, $Y_i = 0$ otherwise). To be used with care since, in some contexts, available $\neq$ unused.
If the sampling unit is the animal (with many used locations per animal), then within-animal correlation must be taken into consideration: GEE (population-averaged) or mixed models (subject-specific) are used. Again, care must be exercised with respect to the available/unused locations.

10 Sampling designs: Possible study designs
Matched designs
For each location used (or step traveled) by an animal, a fixed number of unused locations that could have been visited by the same animal at the same time are sampled.
The dataset comprises several such matched strata for each animal.
This design does not allow inference on the absolute probability of use of a precise location, but it does allow inference on the probability of choosing a given location among a set of locations whose attributes are given.

11 Sampling designs: Matched design
E.g., each location is matched with 10 locations picked at random among those that could have been used at the same time: Step Selection Functions (Fortin et al., Ecology 86(5)).

13 Data available Part I: Data on the available location We have a detailed GIS database of Prince Albert National Park

14 Data available Part II: Animal location data For each of K animals (female bison), GPS collars give their precise location at a large number of equally spaced time steps

15 Methodological objectives Our precise statistical problems In some cases, we can get more than one Y = 1 in a stratum: e.g., a pair of animals traveling together. How do we make inferences on the preferences of the animals for given landscape attributes under such a sampling design? We will see that this can be done if we can come up with a longitudinal version of conditional logistic regression.

19 Model and notation: Notation
Animals: $c = 1, 2, \ldots, K$; strata: $j = 1, 2, \ldots, S_c$; locations: $i = 1, 2, \ldots, n$.
Response variable: $y_{ji}^{(c)} = 1$ if animal $c$ was at location $i$ in the $j$-th stratum, 0 otherwise.
Covariates: values of the landscape attributes at location $i$ in stratum $j$ of animal $c$: $x_{ji}^{(c)} = (x_{ji1}^{(c)}, \ldots, x_{jip}^{(c)})^\top$.
Sampling design: by design, it is known before sampling that $\sum_{i=1}^{n} y_{ji}^{(c)} = m$ for all $j, c$.

21 Model and notation: Prospective model
If we sampled locations without knowing the value of the $y_{ji}^{(c)}$ in advance (i.e., a prospective study), we could link the landscape attributes $x_{ji}^{(c)}$ with $y_{ji}^{(c)}$ using logistic regression-type models.
E.g., given i.i.d. $N(0, \Sigma)$ vectors of animal-level random effects, say $b_c$, and the covariates, it is assumed that the $y_{ji}^{(c)}$ are independent with
$$\Pr\left[y_{ji}^{(c)} = 1 \mid b_c, x_{ji}^{(c)}\right] = \frac{\exp\left(\beta^\top x_{ji}^{(c)} + b_c^\top z_{ji}^{(c)}\right)}{1 + \exp\left(\beta^\top x_{ji}^{(c)} + b_c^\top z_{ji}^{(c)}\right)}.$$

22 Model and notation: Resource selection function
The exponential of the linear predictor is sometimes called the resource selection function (RSF). Maps of its value can help to assess animal preferences.
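As a rough illustration of the prospective model and of the RSF, the sketch below evaluates the logistic probability of use and the exponential of the fixed-effect part of the linear predictor. The function names, the coefficient values and the two covariate vectors are hypothetical and chosen only for the example; they do not come from the talk.

```python
import math

def linear_predictor(beta, x, b, z):
    """Return beta'x + b'z for plain Python lists."""
    fixed = sum(bk * xk for bk, xk in zip(beta, x))
    random_part = sum(bk * zk for bk, zk in zip(b, z))
    return fixed + random_part

def prob_use(beta, x, b, z):
    """P(y = 1 | b, x) under the mixed-effects logistic model."""
    eta = linear_predictor(beta, x, b, z)
    # exp(eta) / (1 + exp(eta)) written in the equivalent form 1 / (1 + exp(-eta))
    return 1.0 / (1.0 + math.exp(-eta))

def rsf(beta, x):
    """Resource selection function: exponential of the fixed-effect linear predictor."""
    return math.exp(sum(bk * xk for bk, xk in zip(beta, x)))

# Hypothetical coefficients: (meadow indicator, biomass index)
beta = [0.8, -0.5]
x_meadow = [1.0, 0.2]
x_forest = [0.0, 0.2]

# Relative selection strength of meadow vs forest at equal biomass
ratio = rsf(beta, x_meadow) / rsf(beta, x_forest)
```

Only ratios of RSF values are interpretable in a matched design, which is why the example ends with a ratio rather than an absolute probability.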

24 Justification of conditional logistic regression: Retrospective model
When location $i$ in stratum $j$ of animal $c$ is sampled on the basis of its $y_{ji}^{(c)}$ value, how can we make inference about $\beta$ (and possibly $\Sigma$) in the prospective model?
Arguments based on conditional likelihood (e.g., Hosmer & Lemeshow 2000), on discrete choice theory (e.g., Manly et al. 2002, Train 2003) or on movement kernels (e.g., Forester et al. 2009) all lead to conditional logistic regression as a good way to deal with the retrospective design.

25 Justification of conditional logistic regression: Conditional likelihood
If we suppose that $b_c^\top z_{ji}^{(c)} = b_c$ in the prospective model (a random intercept only), then we get that
$$\Pr\left[y_{ji}^{(c)},\ i = 1, \ldots, n \,\Big|\, b_c,\ \sum_{i=1}^{n} y_{ji}^{(c)} = m,\ x_{ji}^{(c)},\ i = 1, \ldots, n\right] = \frac{\exp\left(\sum_{i=1}^{n} \beta^\top x_{ji}^{(c)} y_{ji}^{(c)}\right)}{\sum_{l=1}^{\binom{n}{m}} \exp\left(\sum_{i=1}^{n} \beta^\top x_{ji}^{(c)} v_{li}\right)},$$
where the sum in the denominator is over all vectors $v_l$ of zeros and ones whose elements sum to $m$.
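The conditional probability of one stratum can be evaluated by brute force, enumerating every way of placing $m$ ones among $n$ locations. The sketch below does this with the standard library; the function name and the toy covariates are hypothetical, and direct enumeration is only practical for small $n$ and $m$.

```python
import math
from itertools import combinations

def conditional_prob(beta, X, used):
    """Conditional probability that the locations in `used` are the m used ones.

    beta: list of p coefficients
    X:    list of n covariate vectors (one per location in the stratum)
    used: tuple of the m indices with y = 1
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    m = len(used)
    numerator = math.exp(sum(scores[i] for i in used))
    # Denominator: sum over all 0/1 vectors with exactly m ones
    denominator = sum(
        math.exp(sum(scores[i] for i in combo))
        for combo in combinations(range(len(X)), m)
    )
    return numerator / denominator

# Toy stratum: n = 4 locations, p = 2 attributes, m = 2 used locations
X = [[1.0, 0.3], [0.0, 0.9], [1.0, 0.1], [0.0, 0.5]]
beta = [0.7, -1.2]
p_observed = conditional_prob(beta, X, (0, 1))
```

Adding the same constant to the linear predictor of every location in the stratum leaves the result unchanged, which is the reason the random intercept $b_c$ drops out of the conditional likelihood.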

26 Justification of conditional logistic regression: Exponential movement kernels (Forester et al. 2009)
Suppose the animal is at location $a$ at time step $t$. All locations in the set $D_a$ are reachable by the animal by time step $t + 1$.
Assume that the density of movement from a point $a$ to a point $b$ in a homogeneous baseline landscape over one time step is given by $\phi(d_{ab})$, where $d_{ab}$ is the distance between $a$ and $b$.
Suppose that habitat characteristics have a log-linear effect on the movement kernel. Then
$$f(b \mid a, x_s, s \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{s \in D_a} \phi(d_{as}) \exp(\beta^\top x_s)\, ds}.$$

27 Justification of conditional logistic regression: Exponential movement kernels (Forester et al. 2009)
Evaluation of the integral in the denominator can be replaced by an approximating sum. Forester et al. (2009) show that if a sample $S_a$ comprising $b$ and $n - 1$ other locations in $D_a$ is appropriately sampled, then
$$f(b \mid a, x_l, l \in D_a) = \frac{\phi(d_{ab}) \exp(\beta^\top x_b)}{\int_{l \in D_a} \phi(d_{al}) \exp(\beta^\top x_l)\, dl} \approx \frac{\exp(\beta^\top x_b)}{\sum_{l \in S_a} \exp(\beta^\top x_l)},$$
which is the probability of conditional logistic regression when $m = 1$ and the location with $y = 1$ is $b$.
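With $m = 1$ the conditional logistic probability is a softmax over the sampled locations. A minimal sketch, assuming the sample already contains the used location and its controls; the function name and the toy covariates are hypothetical.

```python
import math

def step_selection_prob(beta, X, chosen):
    """Probability that location `chosen` is the used one among the rows of X (m = 1)."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return weights[chosen] / sum(weights)

# One used location (index 0) and 10 controls, a single covariate
X = [[0.9]] + [[0.1 * k] for k in range(10)]
p_null = step_selection_prob([0.0], X, 0)      # no selection: every location equally likely
p_select = step_selection_prob([2.0], X, 0)    # positive coefficient favours high covariate values
```

Under no selection ($\beta = 0$) the used location has probability $1/n$, here $1/11$.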

28 Method: Data and assumptions
Now back to the general problem: $K$ animals, $S^{(c)}$ strata observed for animal $c$, $m$ cases (locations with $y = 1$) and $n - m$ controls (locations with $y = 0$) in each stratum.
We want to make population-averaged inference about $\beta$ in the prospective model.
It is assumed that the data can be partitioned into uncorrelated clusters (data from different animals uncorrelated, or clusters of observations on the same animal taken several time units apart).

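29 Method: Craiu et al. (2008)
We showed that the likelihood score function of the retrospective model can be rewritten as
$$U(\beta) = \sum_{c=1}^{K} \sum_{j=1}^{S^{(c)}} \sum_{i=2}^{n} \tilde{x}_{ji}^{(c)} \left\{ y_{ji}^{(c)} - \frac{\sum_{l=1}^{\binom{n}{m}} v_{li} \exp\left(\sum_{h=2}^{n} \beta^\top \tilde{x}_{jh}^{(c)} v_{lh}\right)}{\sum_{l=1}^{\binom{n}{m}} \exp\left(\sum_{h=2}^{n} \beta^\top \tilde{x}_{jh}^{(c)} v_{lh}\right)} \right\} = \sum_{c=1}^{K} D^{(c)\top} \left(V_{\text{Indep}}^{(c)}\right)^{-1} \left\{\tilde{Y}^{(c)} - \mu(\beta)\right\},$$
where $\tilde{x}_{ji}^{(c)} = x_{ji}^{(c)} - x_{j1}^{(c)}$, $\tilde{Y}^{(c)}$ is the vector of all responses but without the $y_{j1}^{(c)}$'s, and $\mu(\beta) = E_{\text{Retro}}[\tilde{Y}^{(c)} \mid X^{(c)}]$.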
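For a single stratum, the score is the covariate-weighted difference between the observed responses and their conditional expectations. The sketch below computes it by enumeration; the function name and the toy data are hypothetical. It uses the raw covariates, which gives the same value as the centered covariates $x_{ji}^{(c)} - x_{j1}^{(c)}$ because the residuals sum to zero within a stratum.

```python
import math
from itertools import combinations

def stratum_score(beta, X, y):
    """Score contribution of one stratum: sum_i x_i * (y_i - E[v_i])."""
    n, p, m = len(X), len(beta), sum(y)
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    combos = list(combinations(range(n), m))
    weights = [math.exp(sum(scores[i] for i in combo)) for combo in combos]
    total = sum(weights)
    # Conditional expectation of each indicator given that exactly m locations are used
    expected = [0.0] * n
    for combo, w in zip(combos, weights):
        for i in combo:
            expected[i] += w / total
    return [sum(X[i][k] * (y[i] - expected[i]) for i in range(n)) for k in range(p)]

# Toy stratum: three locations, one covariate, the third location is used
X = [[0.0], [1.0], [2.0]]
y = [0, 0, 1]
score_at_zero = stratum_score([0.0], X, y)  # 2 - (0 + 1 + 2) / 3 = 1
```

Summing such contributions over strata and clusters gives $U(\beta)$; solving $U(\beta) = 0$ gives the estimate under the independence working assumption.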

32 Method: Advantages
With the robust (sandwich) estimate of $\mathrm{Var}(\hat\beta)$, inferences about $\beta$ are valid no matter what the correlation structure within clusters is, as long as data are uncorrelated between clusters.
$U(\beta)$ is the partial likelihood score for the Cox model for discrete data, so PROC PHREG or coxph() can be used to apply the method.
Simulations have shown that inferences are good in finite samples:
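To make the sandwich idea concrete, here is a minimal sketch for the simplest case of a single covariate and $m = 1$: the estimate solves the independence score equation by Newton iterations, and the robust variance is $A^{-1} B A^{-1}$ with $A$ the observed information and $B$ the sum of squared cluster-level scores. All names and data are hypothetical, no small-sample correction is applied, and the output of PROC PHREG or coxph() may differ in such details.

```python
import math

def cluster_scores(beta, clusters):
    """Return (list of per-cluster scores, total information) for a scalar beta and m = 1."""
    scores, info = [], 0.0
    for strata in clusters:
        u = 0.0
        for xs, used in strata:
            top = max(beta * x for x in xs)
            w = [math.exp(beta * x - top) for x in xs]
            total = sum(w)
            mean = sum(wi * x for wi, x in zip(w, xs)) / total
            var = sum(wi * x * x for wi, x in zip(w, xs)) / total - mean ** 2
            u += xs[used] - mean   # observed minus expected covariate
            info += var            # minus the derivative of the score
        scores.append(u)
    return scores, info

def fit_clogit_robust(clusters, max_iter=50):
    """Independence estimate of beta with naive and sandwich variances."""
    beta = 0.0
    for _ in range(max_iter):
        scores, info = cluster_scores(beta, clusters)
        step = sum(scores) / info
        beta += step
        if abs(step) < 1e-10:
            break
    scores, info = cluster_scores(beta, clusters)
    naive = 1.0 / info
    robust = naive * sum(u * u for u in scores) * naive
    return beta, naive, robust

# Three clusters of two strata each; each stratum is (covariate values, index of used location)
clusters = [
    [([0.0, 1.0, 2.0], 2), ([0.0, 1.0, 2.0], 0)],
    [([0.0, 1.0, 3.0], 1), ([1.0, 2.0, 4.0], 2)],
    [([0.0, 2.0, 3.0], 0), ([0.0, 1.0, 2.0], 1)],
]
beta_hat, naive_var, robust_var = fit_clogit_robust(clusters)
```

The naive variance treats every stratum as independent; the robust variance only requires independence between clusters, which is the assumption stated on the slide.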

33 Method Simulation results, Craiu et al (2008, Table 1)

35 Method: Disadvantages
Inference on the parameters of the working correlation matrix is not possible: the independence working assumption must be used.
Though better than AIC, the QIC(I) model selection criterion did not perform really well in simulations:

36 Method Simulation results

37 Example of application: Application to female bison in Prince Albert
8 female bison with 14 clusters of 48 locations, and 1 female with 9 clusters, all followed between 2 Sept. and Dec.
Each observed location was matched to 10 locations picked at random in a 300 m buffer (so $K = 8 \times 14 + 9 = 121$, $S = 48$, $m = 1$, $n = 11$).
$x$: 6 dummy variables to quantify the seven-level habitat class categorical variable (deciduous stands = baseline level).
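Matching an observed location to 10 random locations in a 300 m buffer could be sketched as follows. The sketch assumes that "at random" means uniformly over a disk centred on the observed location, which the slide does not specify; the function name and the coordinates are hypothetical.

```python
import math
import random

def sample_controls(x0, y0, radius=300.0, n_controls=10, rng=None):
    """Draw control locations uniformly in a disk of the given radius around (x0, y0)."""
    rng = rng or random.Random()
    points = []
    for _ in range(n_controls):
        # The square root makes the draw uniform over the area of the disk
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        points.append((x0 + r * math.cos(theta), y0 + r * math.sin(theta)))
    return points

# One stratum: the observed location plus its 10 controls (n = 11, m = 1)
observed = (1250.0, 840.0)  # hypothetical coordinates in metres
controls = sample_controls(*observed, rng=random.Random(1))
stratum = [observed] + controls
```

The covariates of each point in the stratum would then be read from the GIS layers before fitting the conditional logistic regression.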

38 Example of application Model fit

40 Method: Conditional inference
Sometimes, subject-specific inferences are required. Can we estimate $\beta$ and $\Sigma$ from the mixed-effects prospective model with the retrospective sampling design?
Already done in some special cases:
Family studies of genetic diseases (special case $S = 1$)
Mixed multinomial logit discrete choice model (special case $m = 1$)

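43 Method: Likelihood for the general case
Craiu et al. (2011) get the following likelihood in the general case:
$$L(\beta, \Sigma) = \prod_{c=1}^{K} \frac{\exp\left(\sum_{s,i} y_{si}^{(c)} \beta^\top x_{si}^{(c)}\right) \int \exp\left(\sum_{s,i} y_{si}^{(c)} b^\top z_{si}^{(c)}\right) d^{(c)}(\beta, b)\, dF(b; \Sigma)}{\int d^{(c)}(\beta, b) \prod_{s=1}^{S^{(c)}} \left\{ \sum_{l \in L_s} \exp\left( \sum_{i} v_{lsi}^{(c)} \left(\beta^\top x_{si}^{(c)} + b^\top z_{si}^{(c)}\right) \right) \right\} dF(b; \Sigma)},$$
where $d^{(c)}(\beta, b) = \prod_{s} \prod_{i} \left\{1 + \exp\left(\beta^\top x_{si}^{(c)} + b^\top z_{si}^{(c)}\right)\right\}^{-1}$.
How do you maximize this thing?!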

47 Method: Maximization of the likelihood
Family studies (Pfeiffer et al. 2001): evaluate the integrals by Monte Carlo, then maximize using a hybrid of Newton-type methods for $\beta$ and grid search for the elements of $\Sigma$.
Mixed multinomial logit (Bhat 2001): quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.
Craiu et al. (2011), first attempt: quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.
With small $K$ and large $S$, these methods are painfully slow and unstable!

49 Method: Two-step algorithm, Craiu et al. (2011)
Inspired by earlier work for GLMMs, we derived a two-step method that is numerically fast and stable and that yields estimators of $\beta$ and $\Sigma$ with good properties:
Step 1: Separately for each cluster $c$, use traditional maximum likelihood for independent data (e.g., coxph()) to get $\hat\beta_c$ and an estimate of its variance, $R_c = \widehat{\mathrm{Var}}(\hat\beta_c)$.
Step 2: Since the clusters are large, the $\hat\beta_c$ are independent and, given $b_c$, approximately $N(\beta + b_c, R_c)$. Thus we can use linear mixed model theory and REML estimation to combine these estimates and obtain estimates of $\beta$ and $\Sigma$.

53 Method: Second step: REML with EM
Easy to implement and to program, but difficult to explain due to extremely heavy notation! In a nutshell:
Stack the estimates $\hat\beta_1, \ldots, \hat\beta_K$ in a vector $V$ and their variance estimates in a block diagonal matrix $R = \mathrm{diag}(R_1, \ldots, R_K)$.
Stack the vectors of random effects $b_1, \ldots, b_K$ in a vector $\phi$ and their variances in a block diagonal matrix $\boldsymbol{\Sigma} = \mathrm{diag}(\Sigma, \ldots, \Sigma)$.
Define $W_1 = 1_K \otimes I_p$ and $W_2 = I_{Kp}$.
Consider the linear mixed model $V = W_1 \beta + W_2 \phi + \varepsilon$, where $\varepsilon \sim N(0, R)$, $\phi \sim N(0, \boldsymbol{\Sigma})$ and $\phi \perp \varepsilon$.

57 Method: Second step: REML with EM
$\beta$ and $\Sigma$ in this linear mixed model can be estimated by maximum likelihood (ML) or by restricted maximum likelihood (REML).
We first tried ML, but variances were underestimated and $\hat\beta$ was biased.
We used the EM algorithm (both E and M steps are in closed form for a few specifications of the structure of $\Sigma$) to implement REML: it is numerically quick and stable, and the estimators are quite good in terms of bias and even in terms of efficiency.
An R package (TwoStepCLogit) implementing this method should be available on CRAN in the Spring!
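The second step can be illustrated in the scalar case ($p = 1$), where each cluster contributes one estimate and one variance. The sketch below maximizes the restricted log-likelihood of the random-effects variance over a grid, then computes the generalized least squares estimate of $\beta$. Grid search is a swapped-in simplification used here for brevity: it is not the EM algorithm described in the talk and not the implementation of the R package. The function name and the inputs are hypothetical.

```python
import math

def reml_second_step(beta_hats, variances, n_grid=2001):
    """Scalar second step: beta_hat_c = beta + b_c + eps_c, Var(b_c) = sigma2, Var(eps_c) = R_c.

    Returns (beta estimate, sigma2 estimate, variance of the beta estimate).
    """
    K = len(beta_hats)
    mean = sum(beta_hats) / K
    sample_var = sum((b - mean) ** 2 for b in beta_hats) / (K - 1)
    upper = max(4.0 * sample_var, 1e-8)  # search range for sigma2 (an arbitrary choice)

    def gls(sigma2):
        w = [1.0 / (sigma2 + r) for r in variances]
        beta = sum(wi * b for wi, b in zip(w, beta_hats)) / sum(w)
        return beta, w

    def restricted_loglik(sigma2):
        beta, w = gls(sigma2)
        return (0.5 * sum(math.log(wi) for wi in w)
                - 0.5 * math.log(sum(w))
                - 0.5 * sum(wi * (b - beta) ** 2 for wi, b in zip(w, beta_hats)))

    grid = [upper * k / (n_grid - 1) for k in range(n_grid)]
    sigma2_hat = max(grid, key=restricted_loglik)
    beta_hat, w = gls(sigma2_hat)
    return beta_hat, sigma2_hat, 1.0 / sum(w)

# Hypothetical first-step output for four clusters with equal variances
beta_hat, sigma2_hat, var_beta = reml_second_step([0.0, 1.0, 2.0, 3.0], [0.5] * 4)
```

With equal first-step variances $R_c = r$, the REML estimate of the random-effects variance is the sample variance of the $\hat\beta_c$ minus $r$ (truncated at zero), which gives a quick check of the grid search.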

58 Method Simulation results, Craiu et al (2011, Fig. 1)

59 Example of application: Application to female bison in Prince Albert
20 pairs of female bison followed between 15 Nov. and 15 April, 2005, 2006, 2007.
Each pair of observed locations was matched to 10 locations picked at random in a 700 m buffer (so $K = 20$, $m = 2$, $n = 12$, $S$ varied between 21 and 349).
$x$: dummy variables to quantify habitat class, as well as an above-ground vegetation biomass index (in kg/m$^2$).

60 Example of application Model fit

64 Future research
How should the controls be sampled?
Within-cluster correlation: How to estimate working correlations in GEE? How to include autocorrelation among observations belonging to the same cluster in the prospective (then retrospective) model?
Between-cluster correlation: How can we include between-animal (or between-pair) correlation in such models?
Model validation: relatively easy to do informally with K-fold cross-validation type approaches, but how can a formal goodness-of-fit test be done?

65 References
Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transport. Res. Part B, 35.
Craiu, R. V., Duchesne, T., Fortin, D. (2008). Inference methods for the conditional logistic regression model with longitudinal data. Biometrical J., 50.
Craiu, R. V., Duchesne, T., Fortin, D., Baillargeon, S. (2011). Conditional logistic regression with longitudinal follow up and individual-level random coefficients: A stable and efficient two-step estimation method. J. of Comput. & Graph. Statist., to appear.
Forester, J. D., Im, H. K., Rathouz, P. J. (2009). Accounting for animal movement in estimation of resource selection functions: sampling and data analysis. Ecology, 90.
Pfeiffer, R. M., Gail, M. H., Pee, D. (2001). Inference for covariates that accounts for ascertainment and random genetic effects in family studies. Biometrika, 88.
Train, K. (2003). Discrete choice methods with simulation. New York: Cambridge University Press.

Package TwoStepCLogit

Package TwoStepCLogit March 21, 2016 Type Package Title Conditional Logistic Regression: A Two-Step Estimation Method Version 1.2.5 Date 2016-03-19 Author Radu V. Craiu, Thierry Duchesne, Daniel Fortin