Inference Methods for the Conditional Logistic Regression Model with Longitudinal Data Arising from Animal Habitat Selection Studies
- Randall Little
Thierry Duchesne (Thierry.Duchesne@mat.ulaval.ca), with Radu Craiu, Daniel Fortin and Sophie Baillargeon
Département de mathématiques et de statistique, Université Laval; Department of Statistics, University of Toronto; Département de biologie, Université Laval
Department of Statistics Seminar, University of Manitoba, October 28
Research funded by NSERC.
Outline
1. Introduction: research objectives; sampling designs; data available; methodological objectives
2. Conditional logistic regression: model and notation; justification of conditional logistic regression
3. Population-averaged inference: method; example of application
4. Subject-specific inference: method; example of application
5. Conclusion
6. References
Research objectives
Ecological objectives: for biologists, it is important to understand the links between various attributes of a landscape and how animals select their habitat (or move within their home range).
Statistical objectives: What are the appropriate sampling designs? What statistical models are possible? How do we make inference on the model parameters?
Sampling designs: unmatched used vs. unused (or available) designs
- Useful to determine which landscape attributes predict whether a location is likely to be used over a specified time frame (e.g., trees with nests vs. trees without nests).
- Usually analyzed with logistic regression ($Y_i = 1$ if location $i$ is used, $Y_i = 0$ otherwise). To be used with care since, in some contexts, an available location is not necessarily an unused one.
- If the sampling unit is the animal (with many used locations per animal), within-animal correlation must be taken into account: GEE (population-averaged) or mixed models (subject-specific) are used. Again, care must be exercised with respect to the available/unused locations.
Sampling designs: matched designs
- For each location used (or step traveled) by an animal, a set of unused locations that could have been visited by the same animal at the same time is sampled.
- The dataset consists of several such matched strata for each animal.
- This design does not allow inference on the absolute probability that a specific location is used, but it does allow inference on the probability of choosing a given location among a set of locations with known attributes.
Sampling designs: matched design
E.g., each observed location is matched with 10 locations picked at random among those that could have been used at the same time. Step selection functions: Fortin et al., Ecology 86(5).
Data available, part I: data on the available locations
We have a detailed GIS database of Prince Albert National Park.
Data available, part II: animal location data
For each of K animals (female bison), GPS collars give their precise location at a large number of equally spaced time steps.
Methodological objectives: our precise statistical problems
In some cases, we can get more than one $Y = 1$ in a stratum, e.g., a pair of animals traveling together. How do we make inferences on the animals' preferences for given landscape attributes under such a sampling design? We will see that this can be done if we can come up with a longitudinal version of conditional logistic regression.
Model and notation
- Animals: $c = 1, 2, \ldots, K$;
- Strata: $j = 1, 2, \ldots, S^{(c)}$;
- Locations: $i = 1, 2, \ldots, n$;
- Response variable: $y^{(c)}_{ji} = 1$ if animal $c$ was at location $i$ in the $j$-th stratum, 0 otherwise;
- Covariates: values of the landscape attributes at location $i$ in stratum $j$ of animal $c$, $x^{(c)}_{ji} = (x^{(c)}_{ji1}, \ldots, x^{(c)}_{jip})^\top$;
- Sampling design: by design, it is known before sampling that $\sum_{i=1}^n y^{(c)}_{ji} = m$ for all $j, c$.
Model and notation: prospective model
If we sampled locations without knowing the values of the $y^{(c)}_{ji}$ in advance (i.e., a prospective study), we could link the landscape attributes $x^{(c)}_{ji}$ with $y^{(c)}_{ji}$ using logistic regression-type models. E.g., given i.i.d. $N(0, \Sigma)$ vectors of animal-level random effects $b_c$ and the covariates, the $y^{(c)}_{ji}$ are assumed independent with
$$\Pr\left(y^{(c)}_{ji} = 1 \mid b_c, x^{(c)}_{ji}\right) = \frac{\exp\left(\beta^\top x^{(c)}_{ji} + b_c^\top z^{(c)}_{ji}\right)}{1 + \exp\left(\beta^\top x^{(c)}_{ji} + b_c^\top z^{(c)}_{ji}\right)},$$
where $z^{(c)}_{ji}$ holds the covariates whose effects vary between animals.
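As an illustration only, the prospective model can be evaluated for one location with a few lines of Python. The function name `prob_use` and the plain-list inputs are our own assumptions; `z` is the covariate vector that multiplies the random effects $b_c$.

```python
import math

def prob_use(beta, x, b, z):
    """Pr(y = 1 | b, x) under the prospective mixed logistic model, one location.

    beta, x : fixed-effect coefficients and landscape covariates (length p)
    b, z    : animal-level random effects and their covariates (length q)
    """
    eta = (sum(bk * xk for bk, xk in zip(beta, x))
           + sum(bk * zk for bk, zk in zip(b, z)))   # linear predictor beta'x + b'z
    # Numerically stable inverse logit: never exponentiates a large positive number
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)
```

Drawing $b_c$ from $N(0, \Sigma)$ once per animal and calling this function for every location would simulate data from the prospective model.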
Model and notation: resource selection function
The exponential of the linear predictor is sometimes called the resource selection function (RSF). Maps of its values can help assess animal preferences.
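A small sketch of how RSF values could be computed on a raster to produce such a map, assuming the fixed-effects RSF $w(x) = \exp(\beta^\top x)$ (random effects set to zero). The helpers `rsf`, `rsf_map` and `selection_ratio` are hypothetical names.

```python
import math

def rsf(beta, x):
    """RSF value w(x) = exp(beta'x) for one location (fixed effects only)."""
    return math.exp(sum(bk * xk for bk, xk in zip(beta, x)))

def rsf_map(beta, grid):
    """Evaluate w(x) on a raster; grid[r][k] is the covariate vector of cell (r, k)."""
    return [[rsf(beta, cell) for cell in row] for row in grid]

def selection_ratio(beta, x1, x2):
    """w(x1) / w(x2) = exp(beta'(x1 - x2)): under this model, how many times more
    likely cell 1 is to be chosen than cell 2 when both are available."""
    return rsf(beta, x1) / rsf(beta, x2)
```

Because only ratios of $w$ are identified under a matched design, a map of $w$ is best read in relative terms.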
Justification of conditional logistic regression: retrospective model
When location $i$ in stratum $j$ of animal $c$ is sampled on the basis of its $y^{(c)}_{ji}$ value, how can we make inference about $\beta$ (and possibly $\Sigma$) in the prospective model? Arguments based on conditional likelihood (e.g., Hosmer & Lemeshow 2000), on discrete choice theory (e.g., Manly et al. 2002; Train 2003) or on movement kernels (e.g., Forester et al. 2009) show that conditional logistic regression is a good way to deal with the retrospective design.
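Justification of conditional logistic regression: conditional likelihood
If we suppose that $b_c^\top z^{(c)}_{ji} = b_c$ (a random intercept only) in the prospective model, then
$$\Pr\left(y^{(c)}_{ji}, i = 1, \ldots, n \,\middle|\, b_c, \sum_{i=1}^n y^{(c)}_{ji} = m, x^{(c)}_{ji}, i = 1, \ldots, n\right) = \frac{\exp\left(\sum_{i=1}^n \beta^\top x^{(c)}_{ji} y^{(c)}_{ji}\right)}{\sum_{l=1}^{\binom{n}{m}} \exp\left(\sum_{i=1}^n \beta^\top x^{(c)}_{ji} v_{li}\right)},$$
where the sum in the denominator is over all vectors $v_l$ of zeros and ones whose elements sum to $m$. Because every such vector has exactly $m$ ones, the factor $\exp(m\, b_c)$ is common to the numerator and the denominator and cancels, so the conditional probability does not depend on $b_c$.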
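To make the formula concrete, here is a brute-force Python sketch (hypothetical helper `clogit_prob`) that enumerates all $\binom{n}{m}$ configurations $v_l$ for one stratum. Adding a constant covariate, which plays the role of the random intercept $b_c$, leaves the probability unchanged, as the cancellation argument predicts.

```python
import itertools
import math

def clogit_prob(beta, X, y):
    """Conditional probability of the observed 0/1 vector y in one stratum,
    given that exactly m = sum(y) of its n locations are used.

    X: list of n covariate vectors (one per location); y: list of n 0/1 responses.
    The denominator enumerates all C(n, m) vectors v_l with m ones."""
    n, m = len(X), sum(y)
    eta = [sum(b * v for b, v in zip(beta, x)) for x in X]   # beta'x_i per location
    num = math.exp(sum(e for e, yi in zip(eta, y) if yi == 1))
    den = sum(math.exp(sum(eta[i] for i in used))
              for used in itertools.combinations(range(n), m))
    return num / den

# A constant covariate with coefficient 5.0 acts like a random intercept and cancels
X = [[0.2], [1.5], [-0.7], [0.9]]
y = [0, 1, 0, 1]
X_with_intercept = [[x[0], 1.0] for x in X]
```

Enumeration is only practical for small strata; with $n = 12$ and $m = 2$, as in the paired-bison application, there are 66 configurations.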
Justification of conditional logistic regression: exponential movement kernels (Forester et al. 2009)
Suppose the animal is at location $a$ at time step $t$, and all locations in the set $D_a$ are reachable by the animal by time step $t + 1$. Assume that, in a homogeneous baseline landscape, the density of movement from a point $a$ to a point $b$ over one time step is $\varphi(d_{ab})$, where $d_{ab}$ is the distance between $a$ and $b$. Suppose further that habitat characteristics have a log-linear effect on the movement kernel. Then
$$f(b \mid a, x_s, s \in D_a) = \frac{\varphi(d_{ab}) \exp(\beta^\top x_b)}{\int_{s \in D_a} \varphi(d_{as}) \exp(\beta^\top x_s)\, ds}.$$
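For intuition, the kernel can be discretized by replacing $D_a$ with a finite grid of candidate points, so that the integral becomes a sum. The sketch below assumes a hypothetical exponential baseline kernel $\varphi(d) = \exp(-d/\lambda)$ with $\lambda = 100$ m; neither the form of $\varphi$ nor the value of $\lambda$ comes from the slides, and `movement_kernel` is our own name.

```python
import math

def movement_kernel(a, candidates, covs, beta, scale=100.0):
    """Discretized f(b | a): probability of moving from a to each candidate point b.

    a: (x, y) start point; candidates: list of (x, y) points standing in for D_a;
    covs: habitat covariate vector x_b of each candidate; beta: coefficients.
    phi(d) = exp(-d / scale) is a hypothetical baseline kernel."""
    scores = []
    for b, x in zip(candidates, covs):
        phi = math.exp(-math.dist(a, b) / scale)                 # baseline movement density
        rsf = math.exp(sum(bk * xk for bk, xk in zip(beta, x)))   # log-linear habitat effect
        scores.append(phi * rsf)
    total = sum(scores)   # a finite sum replaces the integral over D_a
    return [s / total for s in scores]
```

Two candidates at the same distance from $a$ differ only through $\exp(\beta^\top x_b)$, while habitat being equal, closer candidates receive more mass.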
Justification of conditional logistic regression: exponential movement kernels (Forester et al. 2009)
The integral in the denominator can be replaced by an approximating sum. Forester et al. (2009) show that if a sample $S_a$ comprising $b$ and $n - 1$ other locations in $D_a$ is appropriately drawn,
$$f(b \mid a, x_l, l \in D_a) = \frac{\varphi(d_{ab}) \exp(\beta^\top x_b)}{\int_{l \in D_a} \varphi(d_{al}) \exp(\beta^\top x_l)\, dl} \approx \frac{\exp(\beta^\top x_b)}{\sum_{l \in S_a} \exp(\beta^\top x_l)},$$
which is the conditional logistic regression probability when $m = 1$ and the location with $y = 1$ is $b$.
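The slides do not spell out what "appropriately drawn" means. Purely as an assumption for illustration, the sketch below builds one stratum by drawing the $n - 1$ controls with replacement and with probability proportional to $\varphi(d_{al})$, again using a hypothetical exponential $\varphi$; `sample_stratum` is our own name, and the candidate grid is made up.

```python
import math
import random

def sample_stratum(a, used, candidates, n, rng, scale=100.0):
    """Build one matched stratum: the observed end point `used` plus n - 1 controls.

    Controls are drawn with replacement from `candidates` with probability
    proportional to phi(d_al) = exp(-d_al / scale) (hypothetical kernel and rule).
    Returns the n points and their 0/1 responses (1 for the used location)."""
    weights = [math.exp(-math.dist(a, s) / scale) for s in candidates]
    controls = rng.choices(candidates, weights=weights, k=n - 1)
    return [used] + controls, [1] + [0] * (n - 1)

rng = random.Random(7)
grid = [(float(dx), float(dy)) for dx in range(-300, 301, 50) for dy in range(-300, 301, 50)]
points, y = sample_stratum((0.0, 0.0), (120.0, 40.0), grid, n=11, rng=rng)
```

The resulting `(points, y)` pair has the $m = 1$, $n = 11$ shape used in the first bison application, and can be fed to a conditional logistic regression fit.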
Method: data and assumptions
Now back to the general problem: $K$ animals, $S^{(c)}$ strata observed for animal $c$, and in each stratum $m$ cases (locations with $y = 1$) and $n - m$ controls (locations with $y = 0$). We want to make population-averaged inference about $\beta$ in the prospective model. We assume that the data can be partitioned into uncorrelated clusters (data from different animals are uncorrelated, or clusters of observations on the same animal are taken several time units apart).
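Method: Craiu et al. (2008)
We showed that the likelihood score function of the retrospective model can be rewritten as
$$U(\beta) = \sum_{c=1}^K \sum_{j=1}^{S^{(c)}} \sum_{i=2}^n \tilde x^{(c)}_{ji} \left\{ y^{(c)}_{ji} - \frac{\sum_{l=1}^{\binom{n}{m}} v_{li} \exp\left(\sum_{h=2}^n \beta^\top \tilde x^{(c)}_{jh} v_{lh}\right)}{\sum_{l=1}^{\binom{n}{m}} \exp\left(\sum_{h=2}^n \beta^\top \tilde x^{(c)}_{jh} v_{lh}\right)} \right\} = \sum_{c=1}^K \left(D^{(c)}\right)^\top \left(V^{(c)}_{\mathrm{Indep}}\right)^{-1} \left\{ \tilde Y^{(c)} - \mu(\beta) \right\},$$
where $\tilde x^{(c)}_{ji} = x^{(c)}_{ji} - x^{(c)}_{j1}$, $\tilde Y^{(c)}$ is the vector of all responses without the $y^{(c)}_{j1}$'s, and $\mu(\beta) = E_{\mathrm{Retro}}[\tilde Y^{(c)} \mid X^{(c)}]$. This is a GEE with independence working covariance $V^{(c)}_{\mathrm{Indep}}$ and $D^{(c)} = \partial \mu / \partial \beta^\top$.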
Method: advantages
- With the robust (sandwich) estimate of $\mathrm{Var}(\hat\beta)$, inferences about $\beta$ are valid whatever the correlation structure within clusters, as long as the data are uncorrelated between clusters.
- $U(\beta)$ is the partial likelihood score of the Cox model for discrete data, so PROC PHREG or coxph() can be used to apply the method.
- Simulations have shown that inferences are good in finite samples:
Method: simulation results, Craiu et al. (2008, Table 1)
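A minimal Python sketch of the population-averaged fit for the special case $m = 1$ and a single covariate (scalar $\beta$), assuming strata are stored as (covariate list, index of the used location) pairs grouped by cluster. It solves $U(\beta) = 0$ by Newton-Raphson and returns both the model-based variance and the robust sandwich variance. The function names are hypothetical; the slides themselves rely on PROC PHREG or coxph() for this step.

```python
import math

def stratum_terms(beta, xs, case):
    """Score and information contributions of one stratum (m = 1, scalar beta).

    xs: covariate values of the n locations; case: index of the used location."""
    top = max(beta * x for x in xs)                  # subtract the max to avoid overflow
    w = [math.exp(beta * x - top) for x in xs]       # conditional-logit weights (unnormalized)
    tot = sum(w)
    mean = sum(wi * x for wi, x in zip(w, xs)) / tot          # E_beta[x] within the stratum
    var = sum(wi * x * x for wi, x in zip(w, xs)) / tot - mean ** 2
    return xs[case] - mean, var

def fit_clogit(clusters, beta=0.0, tol=1e-10, max_iter=100):
    """Solve U(beta) = 0 by Newton-Raphson (independence working model).

    clusters: list of clusters, each a list of (xs, case) strata.
    Returns (beta_hat, model-based variance, robust sandwich variance)."""
    for _ in range(max_iter):
        terms = [stratum_terms(beta, xs, case) for cl in clusters for xs, case in cl]
        score = sum(u for u, _ in terms)
        info = sum(v for _, v in terms)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    info = sum(stratum_terms(beta, xs, case)[1] for cl in clusters for xs, case in cl)
    # Meat of the sandwich: scores are summed within each cluster before squaring, so
    # within-cluster correlation is left unspecified; clusters are assumed independent.
    meat = sum(sum(stratum_terms(beta, xs, case)[0] for xs, case in cl) ** 2
               for cl in clusters)
    return beta, 1.0 / info, meat / info ** 2
```

With real data one would use the Cox-model software mentioned above; in R, coxph() accepts a cluster() term that yields a robust variance of this kind.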
Method: disadvantages
- Inference on the parameters of the working correlation matrix is not possible, so the independence working assumption must be used.
- Though better than AIC, the QIC(I) model selection criterion did not perform very well in simulations:
Method: simulation results
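For reference, a sketch of the QIC computation under the independence working model, using the usual definition $\mathrm{QIC} = -2Q(\hat\beta; I) + 2\,\mathrm{trace}(\hat\Omega_I \hat V_R)$, where $Q$ is here taken to be the conditional log-likelihood, $\hat\Omega_I$ the model-based information matrix and $\hat V_R$ the robust covariance. The function name `qic` is hypothetical.

```python
def qic(loglik, info_model, var_robust):
    """QIC(I) = -2 * Q(beta_hat; I) + 2 * trace(Omega_I @ V_R).

    loglik: quasi-likelihood under independence at beta_hat;
    info_model: p x p model-based information matrix Omega_I (list of rows);
    var_robust: p x p robust (sandwich) covariance V_R (list of rows)."""
    p = len(info_model)
    trace = sum(info_model[i][k] * var_robust[k][i] for i in range(p) for k in range(p))
    return -2.0 * loglik + 2.0 * trace
```

When $\hat V_R = \hat\Omega_I^{-1}$, the trace equals $p$ and QIC reduces to AIC; the penalty grows when the robust variance exceeds the model-based one.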
Example of application: female bison in Prince Albert National Park
- 8 female bison with 14 clusters of 48 locations each, and 1 female with 9 clusters, all followed between 2 Sept. and … Dec.
- Each observed location was matched to 10 locations picked at random in a 300 m buffer (so $K = 8 \times 14 + 9 = 121$, $S = 48$, $m = 1$, $n = 11$).
- $x$: 6 dummy variables coding the seven-level habitat class categorical variable (deciduous stands = baseline level).
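The dummy coding can be sketched as follows. Only the baseline level (deciduous stands) is named in the slides, so the other six labels below are placeholders, not the actual habitat classes; `habitat_dummies` is a hypothetical helper.

```python
# Placeholder labels: only the baseline "deciduous" is named in the slides.
HABITAT_LEVELS = ["deciduous", "class_2", "class_3", "class_4",
                  "class_5", "class_6", "class_7"]

def habitat_dummies(habitat, levels=HABITAT_LEVELS):
    """Encode a 7-level habitat class as 6 indicators; the first level is the
    baseline and is coded as all zeros."""
    if habitat not in levels:
        raise ValueError(f"unknown habitat class: {habitat!r}")
    return [1 if habitat == level else 0 for level in levels[1:]]

# Design constants of this application: 8 bison x 14 clusters + 1 bison x 9 clusters
K = 8 * 14 + 9      # 121 clusters
n = 1 + 10          # 1 used location + 10 random controls per stratum
```

Each coefficient then compares a habitat class with deciduous stands, within the same stratum.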
Example of application: model fit
Method: conditional inference
Sometimes, subject-specific inferences are required. Can we estimate $\beta$ and $\Sigma$ from the mixed-effects prospective model under the retrospective sampling design? This has already been done in some special cases:
- family studies of genetic diseases (special case $S = 1$);
- the mixed multinomial logit discrete choice model (special case $m = 1$).
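Method: likelihood for the general case
Craiu et al. (2011) obtain the following likelihood in the general case:
$$L(\beta, \Sigma) = \prod_{c=1}^K \frac{\exp\left(\sum_{s,i} y^{(c)}_{si} \beta^\top x^{(c)}_{si}\right) \int \exp\left(\sum_{s,i} y^{(c)}_{si} b^\top z^{(c)}_{si}\right) d^{(c)}(\beta, b)\, dF(b; \Sigma)}{\int \prod_s \sum_{l \in \mathcal{L}^{(c)}_s} \exp\left\{\sum_i v^{(c)}_{lsi} \left(\beta^\top x^{(c)}_{si} + b^\top z^{(c)}_{si}\right)\right\} d^{(c)}(\beta, b)\, dF(b; \Sigma)},$$
where $d^{(c)}(\beta, b) = \prod_s \prod_i \{1 + \exp(\beta^\top x^{(c)}_{si} + b^\top z^{(c)}_{si})\}^{-1}$ and $\mathcal{L}^{(c)}_s$ is the set of all vectors $v^{(c)}_{ls}$ of zeros and ones with $m$ ones for stratum $s$ of cluster $c$.
How do we maximize this thing?!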
Method: maximization of the likelihood
- Family studies (Pfeiffer et al. 2001): evaluate the integrals by Monte Carlo, then maximize using a hybrid of Newton-type methods for $\beta$ and grid search for the elements of $\Sigma$.
- Mixed multinomial logit (Bhat 2001): quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.
- Craiu et al. (2011), first attempt: quasi-Monte Carlo evaluation of the integrals, Newton-type methods to maximize.
With small $K$ and large $S$, these methods are painfully slow and unstable!
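To illustrate the kind of integral these methods approximate, here is a sketch for the mixed multinomial logit case ($m = 1$) with a single covariate and a scalar random coefficient $b \sim N(0, \sigma^2)$: the cluster contribution $\int \prod_s \exp\{(\beta + b) x_{s,\mathrm{case}}\} / \sum_l \exp\{(\beta + b) x_{sl}\}\, dN(b; 0, \sigma^2)$ is averaged over pseudo-random normal draws (Monte Carlo) or over a one-dimensional Halton sequence mapped through the normal quantile function (one common quasi-Monte Carlo choice). This is the standard mixed-logit form rather than a transcription of the general likelihood, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def halton(i, base=2):
    """i-th point (i >= 1) of the one-dimensional Halton (van der Corput) sequence in (0, 1)."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def cluster_lik(beta, sigma, strata, draws):
    """Simulated mixed-logit likelihood of one cluster (m = 1, one covariate).

    strata: list of (xs, case) pairs; draws: standard normal draws.
    For each draw z, the coefficient is beta + sigma * z, and the product over
    strata of the conditional-logit probabilities is averaged over the draws."""
    total = 0.0
    for zd in draws:
        coef = beta + sigma * zd
        prod = 1.0
        for xs, case in strata:
            w = [math.exp(coef * x) for x in xs]
            prod *= w[case] / sum(w)
        total += prod
    return total / len(draws)

R = 500
qmc_draws = [NormalDist().inv_cdf(halton(i)) for i in range(1, R + 1)]   # quasi-Monte Carlo
rng = random.Random(2024)
mc_draws = [rng.gauss(0.0, 1.0) for _ in range(R)]                        # plain Monte Carlo
```

Maximizing such a simulated likelihood requires re-evaluating it for every trial value of $(\beta, \sigma)$ and every cluster, which hints at why these approaches become slow when each cluster has many strata.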
Method: two-step algorithm, Craiu et al. (2011)
Inspired by earlier work on GLMMs, we derived a two-step method that is numerically fast and stable and that yields estimators of $\beta$ and $\Sigma$ with good properties:
- Step 1: separately for each cluster $c$, use traditional maximum likelihood for independent data (e.g., coxph()) to get $\hat\beta_c$ and an estimate of its variance, $R_c = \widehat{\mathrm{Var}}(\hat\beta_c)$.
- Step 2: the $\hat\beta_c$ are independent (clusters are uncorrelated) and, since the clusters are large, approximately $N(\beta + b_c, R_c)$. Thus we can use linear mixed model theory and REML estimation to combine these estimates and obtain estimates of $\beta$ and $\Sigma$.
Method: second step, REML with EM
Easy to implement and to program... but difficult to explain due to extremely heavy notation! But in a nutshell:
- Stack the estimates $\hat\beta_1, \ldots, \hat\beta_K$ in a vector $U$ and their variance estimates in a block-diagonal matrix $R = \mathrm{diag}(R_1, \ldots, R_K)$.
- Stack the vectors of random effects $b_1, \ldots, b_K$ in a vector $\phi$ and their variances in a block-diagonal matrix $\tilde\Sigma = \mathrm{diag}(\Sigma, \ldots, \Sigma)$.
- Define $W_1 = 1_K \otimes I_p$ and $W_2 = I_{Kp}$.
- Consider the linear mixed model $U = W_1 \beta + W_2 \phi + \varepsilon$, where $\varepsilon \sim N(0, R)$, $\phi \sim N(0, \tilde\Sigma)$ and $\phi \perp \varepsilon$.
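The design matrices are Kronecker products, which the following Python sketch builds explicitly for small $K$ and $p$ (helper names hypothetical; a real implementation would use a linear-algebra library). Multiplying $W_1$ by $\beta$ simply repeats $\beta$ once per cluster.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def design_matrices(K, p):
    """W1 = 1_K kron I_p (Kp x p) and W2 = I_{Kp} for U = W1 beta + W2 phi + eps."""
    ones_K = [[1.0] for _ in range(K)]               # K x 1 column of ones
    return kron(ones_K, identity(p)), identity(K * p)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

K, p = 3, 2
W1, W2 = design_matrices(K, p)
beta = [0.3, -1.2]
Sigma = [[1.0, 0.2], [0.2, 0.5]]
Sigma_tilde = kron(identity(K), Sigma)    # block-diagonal diag(Sigma, ..., Sigma)
```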
Method: second step, REML with EM
- $\beta$ and $\Sigma$ in this linear mixed model can be estimated by maximum likelihood (ML) or by restricted maximum likelihood (REML).
- We first tried ML, but variances were underestimated and $\hat\beta$ was biased.
- We used the EM algorithm (both E and M steps in closed form for a few specifications of the structure of $\Sigma$) to implement REML: numerically quick and stable, with estimators that are quite good in terms of bias and even in terms of efficiency.
- An R package (TwoStepCLogit) implementing this method should be available on CRAN in the spring!
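A minimal sketch of REML via EM for the scalar case $p = 1$, where $\hat\beta_c = \beta + b_c + \varepsilon_c$ with $b_c \sim N(0, \sigma^2)$ and known $R_c$. Treating $b_1, \ldots, b_K$ and $\beta$ (with a flat prior) as missing data gives closed-form E and M steps. This is our own simplification for illustration, not the TwoStepCLogit code, which handles vector $\beta$ and structured $\Sigma$; `reml_em` is a hypothetical name.

```python
def reml_em(beta_hats, r_vars, sigma2=1.0, tol=1e-10, max_iter=10_000):
    """REML estimates for the scalar (p = 1) second step, via EM.

    Model: beta_hat_c = beta + b_c + eps_c, b_c ~ N(0, sigma2), eps_c ~ N(0, R_c),
    with the R_c (r_vars) treated as known. The b_c and beta (flat prior) are the
    missing data. Returns (beta_hat, sigma2_hat, Var(beta_hat))."""
    K = len(beta_hats)
    for _ in range(max_iter):
        w = [1.0 / (sigma2 + r) for r in r_vars]        # marginal precisions
        v_beta = 1.0 / sum(w)                            # posterior variance of beta
        beta = v_beta * sum(wc * y for wc, y in zip(w, beta_hats))   # GLS estimate
        # E-step: E[b_c^2 | data] = (posterior mean)^2 + posterior variance
        second_moment = 0.0
        for y, r in zip(beta_hats, r_vars):
            k = sigma2 / (sigma2 + r)                    # shrinkage toward beta
            mean_b = k * (y - beta)
            var_b = sigma2 * r / (sigma2 + r) + k * k * v_beta
            second_moment += mean_b ** 2 + var_b
        # M-step: closed-form update of the random-effect variance
        new_sigma2 = second_moment / K
        converged = abs(new_sigma2 - sigma2) < tol
        sigma2 = new_sigma2
        if converged:
            break
    w = [1.0 / (sigma2 + r) for r in r_vars]
    beta = sum(wc * y for wc, y in zip(w, beta_hats)) / sum(w)
    return beta, sigma2, 1.0 / sum(w)
```

With `beta_hats = [1, 2, 3, 4, 5]` and every $R_c = 0.5$, this REML sketch gives $\hat\sigma^2 = 2.0$, whereas ML would give 1.5: a small instance of the variance underestimation by ML noted above.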
Method: simulation results, Craiu et al. (2011, Fig. 1)
Example of application: female bison in Prince Albert National Park
- 20 pairs of female bison followed from 15 Nov. to 15 April (2005, 2006, 2007).
- Each pair of observed locations was matched to 10 locations picked at random in a 700 m buffer (so $K = 20$, $m = 2$, $n = 12$, with $S$ varying between 21 and 349).
- $x$: dummy variables coding the habitat class, as well as an above-ground vegetation biomass index (in kg/m²).
Example of application: model fit
Future research
- How should the controls be sampled?
- Within-cluster correlation: how can working correlations be estimated in GEE? How can autocorrelation among observations in the same cluster be included in the prospective (and then retrospective) model?
- Between-cluster correlation: how can we include correlation between animals (or between pairs of animals) in such models?
- Model validation: relatively easy to do informally with K-fold cross-validation approaches... but how can a formal goodness-of-fit test be done?
References
- Bhat, C. (2001). Quasi-random maximum simulated likelihood estimation of the mixed multinomial logit model. Transportation Research Part B, 35.
- Craiu, R. V., Duchesne, T., Fortin, D. (2008). Inference methods for the conditional logistic regression model with longitudinal data. Biometrical Journal, 50.
- Craiu, R. V., Duchesne, T., Fortin, D., Baillargeon, S. (2011). Conditional logistic regression with longitudinal follow up and individual-level random coefficients: a stable and efficient two-step estimation method. Journal of Computational and Graphical Statistics, to appear.
- Forester, J. D., Im, H. K., Rathouz, P. J. (2009). Accounting for animal movement in estimation of resource selection functions: sampling and data analysis. Ecology, 90.
- Pfeiffer, R. M., Gail, M. H., Pee, D. (2001). Inference for covariates that accounts for ascertainment and random genetic effects in family studies. Biometrika, 88.
- Train, K. (2003). Discrete Choice Methods with Simulation. New York: Cambridge University Press.
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationExtensions of Cox Model for Non-Proportional Hazards Purpose
PhUSE 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Jadwiga Borucka, PAREXEL, Warsaw, Poland ABSTRACT Cox proportional hazard model is one of the most common methods used
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More informationLogistic Regression: Online, Lazy, Kernelized, Sequential, etc.
Logistic Regression: Online, Lazy, Kernelized, Sequential, etc. Harsha Veeramachaneni Thomson Reuter Research and Development April 1, 2010 Harsha Veeramachaneni (TR R&D) Logistic Regression April 1, 2010
More informationOn Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation
On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation Structures Authors: M. Salomé Cabral CEAUL and Departamento de Estatística e Investigação Operacional,
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject
More informationOccupancy models. Gurutzeta Guillera-Arroita University of Kent, UK National Centre for Statistical Ecology
Occupancy models Gurutzeta Guillera-Arroita University of Kent, UK National Centre for Statistical Ecology Advances in Species distribution modelling in ecological studies and conservation Pavia and Gran
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Prediction Instructor: Yizhou Sun yzsun@ccs.neu.edu September 21, 2015 Announcements TA Monisha s office hour has changed to Thursdays 10-12pm, 462WVH (the same
More informationAssociation studies and regression
Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationPrimal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing
Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationRegression Adjustment with Artificial Neural Networks
Regression Adjustment with Artificial Neural Networks Age of Big Data: data comes in a rate and in a variety of types that exceed our ability to analyse it Texts, image, speech, video Real motivation:
More informationLecture 12. Multivariate Survival Data Statistics Survival Analysis. Presented March 8, 2016
Statistics 255 - Survival Analysis Presented March 8, 2016 Dan Gillen Department of Statistics University of California, Irvine 12.1 Examples Clustered or correlated survival times Disease onset in family
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationExtensions of Cox Model for Non-Proportional Hazards Purpose
PhUSE Annual Conference 2013 Paper SP07 Extensions of Cox Model for Non-Proportional Hazards Purpose Author: Jadwiga Borucka PAREXEL, Warsaw, Poland Brussels 13 th - 16 th October 2013 Presentation Plan
More informationStatistics: A review. Why statistics?
Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval
More informationChapter 14 Combining Models
Chapter 14 Combining Models T-61.62 Special Course II: Pattern Recognition and Machine Learning Spring 27 Laboratory of Computer and Information Science TKK April 3th 27 Outline Independent Mixing Coefficients
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationREGRESSION WITH SPATIALLY MISALIGNED DATA. Lisa Madsen Oregon State University David Ruppert Cornell University
REGRESSION ITH SPATIALL MISALIGNED DATA Lisa Madsen Oregon State University David Ruppert Cornell University SPATIALL MISALIGNED DATA 10 X X X X X X X X 5 X X X X X 0 X 0 5 10 OUTLINE 1. Introduction 2.
More informationLecture 5: Poisson and logistic regression
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction
More informationIncorporating Boosted Regression Trees into Ecological Latent Variable Models
Incorporating Boosted Regression Trees into Ecological Latent Variable Models Rebecca A. Hutchinson, Li-Ping Liu, Thomas G. Dietterich School of EECS, Oregon State University Motivation Species Distribution
More informationNew Developments in Econometrics Lecture 9: Stratified Sampling
New Developments in Econometrics Lecture 9: Stratified Sampling Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Overview of Stratified Sampling 2. Regression Analysis 3. Clustering and Stratification
More informationFractional Imputation in Survey Sampling: A Comparative Review
Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical
More informationMultivariate Survival Analysis
Multivariate Survival Analysis Previously we have assumed that either (X i, δ i ) or (X i, δ i, Z i ), i = 1,..., n, are i.i.d.. This may not always be the case. Multivariate survival data can arise in
More informationPattern Recognition and Machine Learning
Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability
More informationH-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL
H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly
More informationCharles E. McCulloch Biometrics Unit and Statistics Center Cornell University
A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components
More informationLogistic regression: Why we often can do what we think we can do. Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015
Logistic regression: Why we often can do what we think we can do Maarten Buis 19 th UK Stata Users Group meeting, 10 Sept. 2015 1 Introduction Introduction - In 2010 Carina Mood published an overview article
More informationModel Assumptions; Predicting Heterogeneity of Variance
Model Assumptions; Predicting Heterogeneity of Variance Today s topics: Model assumptions Normality Constant variance Predicting heterogeneity of variance CLP 945: Lecture 6 1 Checking for Violations of
More informationEstimating and contextualizing the attenuation of odds ratios due to non-collapsibility
Estimating and contextualizing the attenuation of odds ratios due to non-collapsibility Stephen Burgess Department of Public Health & Primary Care, University of Cambridge September 6, 014 Short title:
More informationMMWS Software Program Manual
MMWS Software Program Manual 1 Software Development The MMWS program is regularly updated. The latest beta version can be downloaded from http://hlmsoft.net/ghong/ MMWS Click here to get MMWS. For a sample
More informationCOMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS. Abstract
Far East J. Theo. Stat. 0() (006), 179-196 COMPARISON OF GMM WITH SECOND-ORDER LEAST SQUARES ESTIMATION IN NONLINEAR MODELS Department of Statistics University of Manitoba Winnipeg, Manitoba, Canada R3T
More informationLINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More informationRegression tree-based diagnostics for linear multilevel models
Regression tree-based diagnostics for linear multilevel models Jeffrey S. Simonoff New York University May 11, 2011 Longitudinal and clustered data Panel or longitudinal data, in which we observe many
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationKernel Logistic Regression and the Import Vector Machine
Kernel Logistic Regression and the Import Vector Machine Ji Zhu and Trevor Hastie Journal of Computational and Graphical Statistics, 2005 Presented by Mingtao Ding Duke University December 8, 2011 Mingtao
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationGENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR. Raymond J. Carroll: Texas A&M University
GENERALIZED LINEAR MIXED MODELS AND MEASUREMENT ERROR Raymond J. Carroll: Texas A&M University Naisyin Wang: Xihong Lin: Roberto Gutierrez: Texas A&M University University of Michigan Southern Methodist
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationLecture 8 Stat D. Gillen
Statistics 255 - Survival Analysis Presented February 23, 2016 Dan Gillen Department of Statistics University of California, Irvine 8.1 Example of two ways to stratify Suppose a confounder C has 3 levels
More informationCorrelation and regression
1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationRobust Bayesian Variable Selection for Modeling Mean Medical Costs
Robust Bayesian Variable Selection for Modeling Mean Medical Costs Grace Yoon 1,, Wenxin Jiang 2, Lei Liu 3 and Ya-Chen T. Shih 4 1 Department of Statistics, Texas A&M University 2 Department of Statistics,
More informationLogistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationAnalysing longitudinal data when the visit times are informative
Analysing longitudinal data when the visit times are informative Eleanor Pullenayegum, PhD Scientist, Hospital for Sick Children Associate Professor, University of Toronto eleanor.pullenayegum@sickkids.ca
More informationSSUI: Presentation Hints 2 My Perspective Software Examples Reliability Areas that need work
SSUI: Presentation Hints 1 Comparing Marginal and Random Eects (Frailty) Models Terry M. Therneau Mayo Clinic April 1998 SSUI: Presentation Hints 2 My Perspective Software Examples Reliability Areas that
More informationUnivariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation
Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationABHELSINKI UNIVERSITY OF TECHNOLOGY
Cross-Validation, Information Criteria, Expected Utilities and the Effective Number of Parameters Aki Vehtari and Jouko Lampinen Laboratory of Computational Engineering Introduction Expected utility -
More informationSimple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.
Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1
More informationMultilevel Methodology
Multilevel Methodology Geert Molenberghs Interuniversity Institute for Biostatistics and statistical Bioinformatics Universiteit Hasselt, Belgium geert.molenberghs@uhasselt.be www.censtat.uhasselt.be Katholieke
More information1 Mixed effect models and longitudinal data analysis
1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More information