Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Size: px
Start display at page:

Download "Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses"

Transcription

1 Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses 1

2 Marginal Model Marginal models (or population-average model) distinguish from the mixed effects models (or subjectspecific models) according to the interpretation of their regression coefficients. The marginal models are used to make inferences about population average. Marginal models emphasize the dependence of the mean response on the covariates of interest, but not on both the random effects and the covariates (mixed effects models), nor on previous responses (transition models). Marginal models do not require the joint distributional assumptions for the vector of responses, which may be difficult for the discrete data. The avoidance of distributional assumptions leads to a method of estimation of generalized estimating equations (GEE). For longitudinal data, the marginal model separately model the mean responses and within-subject association among the repeated responses. 2

3 Notation of the marginal models Let the random variable Y ij denote the responses for the ith individual, measured at time t ij, i = 1,..., m, j = 1,..., n i. The response variables for the ith subject is an n i 1 vector, Y i = (Y i1, Y i2,..., Y ini ). Let X ij denote p 1 vector of covariates X ij = (X ij1, X ij2,..., X ijp ) that are associated with each response, Y ij. X ij may include covariates whose values do not change over time, i.e. time-stationary or between-subjects covariates, and covariates whose values change over time, i.e., time-varying or within-subject covariates. The covariates can be grouped into an n i p matrix: X i = X i1 X i2. = X i11 X i12... X i1p X i21 X i22... X i2p, i = 1,..., m X in i X ini 1 X ini 2... X ini p 3

4 A marginal model has the following components: 1. Mean model: the marginal mean of each response, E(Y ij X ij ) = µ ij, depends on covariates via a known link function g(µ ij ) = η ij = Xijβ T 2. Correlation model (nuisance): Var(Y ij X i ) = ϕv (µ ij ) Cor(Y ij, Y ik X i ) = ρ ijk Cov(Y i X i ) = V i (ϕ, α) = ϕc 1/2 i R i C 1/2 i where R i is the correlation matrix and C i = diag(v (µ ij )) is a diagonal matrix of variances. The parameter α characterizes the correlation and ϕ is a scale parameter for variances. Estimation and valid inference are achieved by constructing an unbiased estimating function without likelihood function. 4

5 Example: marginal model for a continuous response Consider Y ij is a continuous response and we are interested in relating mean response to the covariates X i, The marginal mean model is: µ ij = η ij = Xijβ. T The variance model is: Var(Y ij X i ) = ϕv (µ ij ) = ϕ Cor(Y ij, Y ik X i ) = α k j where 0 α k j 1 is the pairwise correlation among the responses. Note in this example we assume the variance are homogeneous over time, other variance functions are also allowed, e.g. a separate scale parameter, ϕ j. We specify a first-order autoregressive correlation pattern in this example, other correlation structure, e.g. exchangeable or unstructured are also possible. 5

6 Example: marginal model for count data Consider Y ij is a count and we are interested to relate changes in the expected count to the covariates X i, The marginal mean model is: log(µ ij ) = η ij = Xijβ. T The variance model is: Var(Y ij X i ) = ϕµ ij Cor(Y ij, Y ik X i ) = α jk Note this is a log-linear regression model with an extra-poisson variance assumption which allows the variance to be inflated by a factor of ϕ to deal with the issue of over-dispersion. 6

7 Example: marginal model for a binary response Consider the effect of vitamin A deficiency (Xerophthalmia, X) on respiratory infection (RI, Y ) in the Indonesian Children s Health Study. Let i indicate the child and j the visit. The marginal mean model is: logit(µ ij ) = log Pr(Y ij = 1) Pr(Y ij = 0) = β 0 + β 1 I Xij =1. The variance model is: Var(Y ij ) = µ ij (1 µ ij ), Cor(Y ij, Y ik ) = α. The parameter of interest is β 1, exp(β 1 ) = Odds of RI among vitamin A deficient children. Odds of RI among non-deficient children When the prevalence of RI is low, the odds ratio (OR) is approximately the same as relative risk 7

8 (RR). The risk may be different for different children with the same covariates, so the parameter is a population average (assuming random sample). The correlation between two binary variables Y 1 and Y 2 has a constrained range that depends on µ 1 and µ 2. So it might be desirable to model the correlation differently, for example, using the marginal odds ratio. log OR(Y ij, Y ik ) = α jk. where OR(Y ij, Y ik ) = { } P r(yij = 1, Y ik = 1)P r(y ij = 0, Y ik = 0) P r(y ij = 1, Y ik = 0)P r(y ij = 0, Y ik = 1) 8

9 GEE1 Estimating function When (ϕ, α) are known, then the estimator ˆβ is defined by the estimating equation: 0 = m i=1 D T i V 1 i {Y i µ i (β)}, where D i (β) = µ i β, D i(j, k) = µ ij β k, V i (β, ϕ, α) = ϕc 1/2 i R i (α)c 1/2 i. where V i is known as a working covariance to distinguish it from the true underlying covariance among the Y ij s and R i is a working correlation. 9

10 For example, for logistic model with one covariate and exchangeable correlation, µ ij = exp(β 0 + β 1 x ij ) 1 + exp(β 0 + β 1 x ij ) ( µij D i (j) =, µ ) ij β 0 β 1 µ i1 (1 µ i1 ) µ C i = i2 (1 µ i2 ) µ ini (1 µ ini ) 1 α α α 1 α R i = α α 1 10

11 GEE1 estimation An iterative algorithm is used to find (ˆβ, ˆϕ, ˆα): 1. Starting with an initial estimate of β with an ordinary generalized linear model assuming independence. 2. Given ˆβ (l), calculate method-of-moments estimates for ϕ, α, and the estimate of covariance of V i : ˆV i = ˆϕĈ 1/2 i R i ( ˆα)Ĉ 1/2 i 3. Given estimates for ϕ, α and V i, solve the estimating equation using Fisher s scoring algorithm: ˆβ (l+1) = ˆβ (l) + ( m i=1 ˆD T i ˆV 1 i ˆD i ) 1 m i=1 ˆD T i ˆV 1 i {Y i ˆµ i }. 4. Iterate the above two steps until convergence is achieved. 11

12 GEE1 working correlation The model chosen for R i (α) is called the working correlation since it needs not to be the true correlation to obtain a valid point estimate ˆβ (consistent and asymptotically normal). If R i (α) is the correct correlation, then the model-based estimates of the standard errors for ˆβ can be used. Otherwise we use the empirical estimates of standard errors. Replacing α with a (any) consistent estimator does not affect the large sample properties of ˆβ. The asymptotic variance of ˆβ would be the same as if α is known. (Liang and Zeger, 1986). Some working correlation examples Independent correlation: assumes all pairwise correlation coefficients are zero: Cor(Y ij, Y ik ) = 0, j k. 12

13 Exchangeable correlation: assumes all pairwise correlation coefficients are equal and hence the components in the response vector Y are exchangeable (not ordered): Cor(Y ij, Y ik ) = α, j k. AR(1) correlation: the autoregressive correlation structure of order 1 assumes the correlation coefficients decay exponentially over time, and the responses are ordered in time and more correlated if they are closer to each other in time than if they are more distant: Cor(Y ij, Y ik ) = α j k, j k. Unstructured correlation: assumes all pairwise correlation coefficients are different parameters: Cor(Y ij, Y ik ) = α jk, j k. 13

14 GEE1 variance of ˆβ The solution ˆβ is consistent and asymptotically normal. If the correlation model is correct, then, the model-based estimate for the variance of ˆβ is ˆ Var(ˆβ) = A 1, where A = m i=1 Di T (ˆβ)V i 1 (ˆβ, ϕ, α)d i (ˆβ). If the correlation model is not correct, then we can use the empirical (so called sandwich ) variance estimate: Var(ˆβ) = A 1 BA 1, m B = U i Ui T = i=1 m Di T (ˆβ)V i 1 (ˆβ, ϕ, α) Cov(Y ˆ i )Vi 1 (ˆβ, ϕ, α)d i (ˆβ) i=1 14

15 where ˆ Cov(Y i ) = (Y i ˆµ i ) (Y i ˆµ i ) T. ˆ Cov(Y i ) is a poor estimator for Cov(Y i ). However, we do not need a good estimator for each Cov(Y i ). With sufficient independent replication (m large), the average covariance can be well estimated (consistency). What if (ϕ, α) are unknown? How can we estimate them and what is the impact on the estimation of β? Liang and Zeger (1986) proposed to use moment estimators for the unknown parameters (GEE1). 15

16 GEE1 estimating α using method of moments Let N = m i=1 n i. Recall that Var(Y ij X i ) = ϕv (µ ij ) where V is a known function. The scale parameter ϕ, if exists, can be estimated by ˆϕ = 1 N p m i=1 n i j=1 (Y ij ˆµ ij ) 2, V (ˆµ ij ) where p is the dimension of β. Binomial: V (ˆµ ij ) = ˆµ ij (1 ˆµ ij ). Poisson: V (ˆµ ij ) = ˆµ ij. 16

17 Define the residuals: r ij = Y ij ˆµ ij (ˆβ) [ ] 1/2, Var(Y ˆ ij ) where ˆ Var(Y ij ) = ˆϕV (ˆµ ij ). The correlation parameter α can be estimated as simple functions of r ij, e.g. Exchangeable correlation: ˆα = ( i n i(n i 1) p) ˆϕ m r ij r ik. i=1 j<k AR(1) correlation: Unstructured correlation: ˆα = 1 ( i (n i 1) p) ˆϕ 1 ˆα jk = (m p) ˆϕ m r ij r i,j+1. i=1 j n i 1 m r ij r ik. i=1 17

18 GEE1 more about α Recall that we use simple MoM estimators to estimate (ϕ, α) in GEE1. The scale parameter ϕ is often considered to be a nuisance and it does not affect the estimates of β. But what about α? 1. Shouldn t we consider the parameter as (β, α)? 2. Can t we improve upon the estimation of α? 3. Would better estimation of α help us to better estimate β? Shouldn t we consider the parameter as (β, α)? Answer: It depends. Is α a nuisance? If the covariance structure is of secondary interest (often the case) then GEE1 is usually fine. However, if the covariance matrix is of primary interest then GEE1 is not ideal. Are you willing to sacrifice some model robustness in order to let (β, α) be the target parameter? Note that in GEE1, the estimate ˆβ is consistent even if the model for α is wrong. Other approaches 18

19 that treat β and α on equal ground may not have this property. Can t we improve upon the estimation of α? Answer: Yes! We can adopt alternative association (dependence) models that are more suitable for categorical data. We can create estimators that are targeted at (β, α) jointly and are efficient for both. Would better estimation of α help us to better estimate β? Answer: Model for α is important for the efficiency of ˆβ. 19

20 GEE1 more about the sandwich estimator of Cov(ˆβ) It is a robust estimator as it provides valid standard error when the assumed model for the covariance among the repeated measures, i.e. working covariance, is misspecified. With the sandwich estimator, why bother expending effort to model the within-subject association in marginal model? Efficiency or precision could be improved with a working covariance more closer to the true underlying covariance. The robustness property depends on the large sample property, i.e., relative large number of subjects and relative small number of repeated measures. The sandwich estimator may be problematic for severely unbalanced data. The model-based variance estimator should be used instead. 20

21 GEE1 hypothesis testing Generalized Wald Test: H 0 : β j = 0 Write β = (β 1, β 2 ), H 0 : β 1 = 0 ˆβ T 1 ˆβ j s.e. ˆ N (0, 1). ˆ Var 1 (ˆβ) 1 ˆβ1 χ 2 r, where r is the dimension of β 1 and Var ˆ 1 (ˆβ) is the estimated variance matrix corresponding to ˆβ 1 (r r subprincipal matrix of Var(ˆβ)). ˆ H 0 : Lβ = 0, where L is a r p matrix, where ˆβ T L T (L ˆ Var(ˆβ)L T ) 1 Lˆβ χ 2 r, ˆ Var(ˆβ) is the estimated variance matrix for ˆβ. Generalized Score Test: H 0 : Lβ = 0 where L is a r p matrix. T = S( β) T AL T (LBL T ) 1 LAS( β) χ 2 r. where A and B are model-based and empirical covariance estimates, S( β) is the GEE evaluated at the regression parameter estimate β under H 0. Rotnitzky and Jewell (1990). 21

22 - In the generalized Wald test statistic, Var(ˆβ) ˆ using the sandwich form can be unstable if n i s are large and m is small. If the correlation structure has been carefully modelled, the model-based (or working ) Var(ˆβ) ˆ can be used. - The generalized score test is invariant under any differentiable transformation of the parameters, τ = τ (β). 22

23 Augmented GEE GEE1.5 Prentice (1988) s approach for binary data Idea: Estimation of both parameters vectors, β and α, in the marginal response model and the correlation model, respectively. GEE1 uses an estimating function U 1 based on the centered first moments (Y i µ i ) for the estimation of β. We can add a second estimation function based on the centered second moments to estimate α: [ where σ ijk = E ] (Y ij µ ij )(Y ik µ ik ) [µ ij (1 µ ij )µ ik (1 µ ik )] 1/2 (Y ij µ ij )(Y ik µ ik ) [µ ij (1 µ ij )µ ik (1 µ ik )] 1/2 σ ijk is the mean of the sample correlation. 23

24 Now, the joint estimating functions are: U 1 (β, α) = U 2 (β, α) = m i=1 m i=1 D T i (β)v 1 i (β, α) {Y i µ i (β)} (1) E T i (β, α)w 1 i (β, α) {S i σ i (β, α)}, (2) where S i = (S i1 S i2, S i1 S i3, S i1 S ini,, S ini 1S ini ) T S ij = (Y ij µ ij )/[µ ij (1 µ ij )] 1/2 σ i = E(S i ) E i = σ i α W i = diag(var(s i1 S i2 ),, Var(S ini 1S ini )) Var(S ij S ik ) = 1 + (1 2µ ij )(1 2µ ik )[µ ij (1 µ ij )µ ik (1 µ ik )] 1/2 σ ijk σ 2 ijk (Check this) 24

25 Note that the above W i is an n i (n i 1)/2-dimensional working independent variance matrix for S i. Other working matrices could readily be substituted. To model Cov(S i ) properly, we need specify models for higher moments (with more parameters), which is typically difficult. Iterative method can be used for solving the joint estimating equations 0 = U 1 (β, α) 0 = U 2 (β, α) Given (ˆβ (l), ˆα (l) ): 1. Fixed ˆα (l), solve 0 = U 1 (β, ˆα (l) ) to get ˆβ (l+1). 2. Fixed ˆβ (l+1), solve 0 = U 2 (ˆβ (l+1), α) to get ˆα (l+1). (ˆβ, ˆα) is consistent and asymptotically normal under correct model specification. Similar to GEE1, ˆβ is consistent even if the model for α is misspecified. 25

26 GEE2 Prentice and Zhao (1991) s appraoch They considered a systematic approach for generating estimating functions for δ = (β, α). Combine the response vector Y i and the pairwise crossproducts, S, into one outcome vector T T i = (Y T i, Z T i ), where Y T i = (y i1,..., y ini ), and Z T i = (y 2 i1, y i1y i2,..., y 2 i2, y i2y i3,..., y 2 in i ). The estimating equations for (β, α) can be generated under a Quadratic Exponential Family (QEF) model, where the loglikelihood Pr i (y i ; µ i, σ i ) θ T 1iY i + θ T 2iZ i + c i (Y i ), where the canonical parameters (θ 1i, θ 2i ) are functions of (β, α). 26

27 Optimal estimating equations for δ = (β T, α T ) T : U(δ) = = m i=1 µ i β 0 V i (1, 1) = Cov(Y i ) Di T (δ)vi 1 (δ)f i (δ) σ i β σ i α V i (1, 2) = Cov(Y i, S i ) V i (2, 2) = Cov(S i ) S ijk = (Y ij µ ij )(Y ik µ ik ) V i(1, 1) V i (1, 2) V i (1, 2) T V i (2, 2) σ i = E[S i ], σ ijk = Cov(Y ij, Y ik ). 1 y i µ i s i σ i Third and fourth order moments are contained in V i, specifically V i (1, 2) and V i (2, 2). 27

28 One can specify a working variance matrix in V i, in which the third and fourth moments are expressed as functions of (µ i, σ i ). Independence working models V i (1, 2) = 0 V i (2, 2) = diagonal matrix Gaussian working models V i (1, 2) = 0 V i (2, 2) : Cov(S ijk, S ij k ) = σ ijj σ ikk + σ ijk σ ikj 28

29 Estimation can be done using the Fisher scoring procedure ( ) 1 ( β(l+1) = β(l) + Di T Vi 1 D i i i α (l+1) α (l) ) Di T Vi 1 f i Sandwich variance estimator can be used to protect against the 3rd/4th moment model misspecification. Consistency of both ˆβ and ˆα depends on the correct modeling of both mean and covariance. The matrix V i has dimension M i M i where M i = n i + n i (n i + 1)/2, and its inverse is required! GEE1.5 is a special case of GEE2. The efficiency gain of GEE2 comparing with GEE1.5 depends on the correct specification of the 3rd/4th moments. Conclusion: GEE2 may not be worthwhile after all. If we want to specify higher order moments, why not use the likelihood? 29

30 Modeling the scale parameter ϕ Yan and Fine (2004) s approach Remember that Cov(Y i ) = ϕc 1/2 i R i C 1/2 i. When the scale parameter ϕ (i.e the over-dispersion parameter for heteroscedasticity) is important, Yan and Fine (2004) extended GEE1.5 by adding a third model for the scale parameter : g 3 (ϕ ij ) = X3ijγ, T where g 3 is the link function, e.g. log. The estimating equation is U ϕ = m i=1 D T 3iV 1 3i (T i ϕ i ) = 0, (3) where T i = (T i1,..., T ini ) T, T ij = (Y ij µ ij ) 2, V (µ ij ) D 3i = ϕ i γ, ϕ i = (ϕ i1,..., ϕ ini ) T. 30

31 V 3i also contains third and fourth moments. ˆβ is consistent with mis-specification of the correlation and scale parameters; ˆγ is consistent if the mean and the scale structures are correctly specified, regardless of mis-specification of the correlation. The method is implemented in R package geepack. 31

32 A Note on Modeling Correlation of Binary Responses Correlation for binary data are constrained by their means. Let E(Y 1 ) = µ 1, E(Y 2 ) = µ 2, ρ 12 = Cor(Y 1, Y 2 ), and π 12 = E(Y 1 Y 2 ), then π 12 < min(µ 1, µ 2 ), and hence, ρ 2 12 min { µ1 (1 µ 2 ) µ 2 (1 µ 1 ), µ } 2(1 µ 1 ), µ 1 (1 µ 2 ) because the pairwise correlation ρ 12 = π 12 µ 1 µ 2 [µ 1 (1 µ 1 )µ 2 (1 µ 2 )] 1/2, and π 12 is constrained to satisfy max(0, µ 1 + µ 2 1) < π 12 < min(µ 1, µ 2 ). 32

33 Modeling odds ratios: Ψ ijk = Pr(Y ij = 1, Y ik = 1) Pr(Y ij = 0, Y ik = 0) Pr(Y ij = 1, Y ik = 0) Pr(Y ij = 0, Y ik = 1), = Pr(Y ij = 1 Y ik = 1)/ Pr(Y ij = 0 Y ik = 1) Pr(Y ij = 1 Y ik = 0)/ Pr(Y ij = 0 Y ik = 0). The log odds ratios log Ψ (, ), are symmetric about 0 and not constrained by the marginal means. Interpretation: Ψ ijk = 1 or log Ψ ijk = 0 implies (Y ij, Y ik ) are uncorrelated. The odds ratio, Ψ ijk and the marginal means µ ij, µ ik determine π ijk = E(Y ij Y ik ), hence determine the correlation ρ ijk, and variance V i (β, α). 33

34 The odds ratio of Ψ ijk can be expressed as function of µ ij, µ ik and π ijk : Ψ ijk = π ijk(1 µ ij µ ik +π ijk ) (µ ij π ijk )(µ ik π ijk ) (4) Thus, π ijk can be solved from the above and expressed as function of µ ij, µ ik and Ψ ijk : π ijk = A [A2 4(Ψ ijk 1)Ψ ijk µ ij µ ik ] 1/2 2(Ψ ijk 1) when Ψ ijk 1 π ijk = Ψ ijk µ ij µ ik when Ψ ijk = 1 where A = 1 (µ ij + µ ik )(1 Ψ ijk ) Lipsitz et al (1991) 34

35 Using Marginal Odds Ratios to Model Association for Binary Responses Lipsitz et al (1991) s approach They modified the approach of Prentice (1988) using odds ratios. The joint models (Lipsitz et al, 1991) are: Mean model Correlation model logit(µ ij ) = X T ijβ log(ψ ijk ) = Z T ijkα Lipsitz et al. noted that β estimates from odds ratio model are slightly efficient comparing to that from correlation model by simulation and these two mdoels are computationally similar. 35

36 Carey, Zeger, and Diggle (1993) s alternating logistic regression (ALR) appraoch Let γ ijk = log Ψ ijk. Note that the pairwise conditional expectations: logit E(Y ij Y ik, X i ) = γ ijk Y ik + ijk, where ( ) µ ij π ijk ijk = log. 1 µ ij µ ik + π ijk (Check this using the fact in (4).) Suppose that all the odds ratios are the same, γ ijk = γ, then an estimate for γ could be obtained by a logistic regression of y ij on y ik, 1 j < k n i, i = 1,, m, using ijk as an offset. More generally, we assume log(ψ ijk ) = Zijk T α, where Z ijk is a set of covariates characterizing the log-odds ratio between observations j and k. For example, in family data, Z ijk would encode the type of family relationship for Y ij and Y ik : husband-wife, parent-sib, or sib-sib. We then estimate the vector α by a logistic regression of y ij on the product zijk T y ik with the offset ijk. 36

37 Note that the offset depends on both β and α so that iterations are required. We alternate the following two steps until convergence. 1. Given the current values of (β (l), α (l) ), calculate ˆV (l) and solve the U β (β, α (l) ) = 0 for an updated ˆβ (l+1), where U β (β, α) = m i=1 ( ) T µi V i (β, α) 1 (Y i µ i ). β 2. Given ˆβ (l+1) and ˆα (l), evaluate the offset and perform the offset logistic regression of Y ij on Z ijk Y ik with a total of m i=1 n i(n i 1)/2 observations to obtain ˆα (l+1). 37

38 Formally, the ALR uses the same model as in Lipsitz et al. (1991) but uses this estimating equation for α: U α (β, α) = m i=1 F T i (β, α) 1 W i (β, α)t i (β, α), where F i = ζ i α ζ ijk = E(Y ij Y ik ) T i s (j, k) entry: T ijk = Y ij ζ ijk W i (β, α) = diag(var(y ij Y ik )) = diag(ζ ijk (1 ζ ijk )). No working assumption about the 3rd/4th-order odds ratios are required. The efficiency is comparable to GEE2 but more computationally efficient for large clusters (does not require the inverse of large matrices). 38

39 Further Reading Chapters 7.1, 7.5, and 8 of DHLZ. References Cavey V, Zeger SL, and Diggle P (1993). Modelling multivariate binary data with alternating logistic regression. Biometrika 80: Ling KY and Zeger SL (1986). Longitudinal data analysis using generalized linear models. Biometrika 73: Liang KY, Zeger SL, and Qaquish B (1992). Multivariate regression analysse for categorical data (with discussion). JRSSB 54:3-40. Lipsitz SR, Laird NM, and Harrington DP (1991). Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association. Biometrika 78: Prentice RL (1988). Correlated binary regression with covariates specific o each binary observations. Biometrics 44: Prentice RL and Zhao LP (1991). Estimating equations for parameters in means and covariances of multivariate discrete and continuous responses. Biometrics 47: Rotnitzky, A. and Jewell, N.P., (1990), Hypothesis Testing of Regression Parameters in Semiparametric Generalized Linear Models for Cluster Correlated Data. Biometrika, 77: Yan J and Fine J (2004). Estimating equations for association structures. Statistics in Medicine 23:

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Generalized Estimating Equations (gee) for glm type data

Generalized Estimating Equations (gee) for glm type data Generalized Estimating Equations (gee) for glm type data Søren Højsgaard mailto:sorenh@agrsci.dk Biometry Research Unit Danish Institute of Agricultural Sciences January 23, 2006 Printed: January 23, 2006

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Part IV: Marginal models

Part IV: Marginal models Part IV: Marginal models 246 BIO 245, Spring 2018 Indonesian Children s Health Study (ICHS) Study conducted to determine the effects of vitamin A deficiency in preschool children Data on K=275 children

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

GEE for Longitudinal Data - Chapter 8

GEE for Longitudinal Data - Chapter 8 GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

1/15. Over or under dispersion Problem

1/15. Over or under dispersion Problem 1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let

More information

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling

Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling J. Shults a a Department of Biostatistics, University of Pennsylvania, PA 19104, USA (v4.0 released January 2015)

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

Assessing GEE Models with Longitudinal Ordinal Data by Global Odds Ratio

Assessing GEE Models with Longitudinal Ordinal Data by Global Odds Ratio Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS074) p.5763 Assessing GEE Models wh Longudinal Ordinal Data by Global Odds Ratio LIN, KUO-CHIN Graduate Instute of

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Efficiency of generalized estimating equations for binary responses

Efficiency of generalized estimating equations for binary responses J. R. Statist. Soc. B (2004) 66, Part 4, pp. 851 860 Efficiency of generalized estimating equations for binary responses N. Rao Chaganty Old Dominion University, Norfolk, USA and Harry Joe University of

More information

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study

Robust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

PQL Estimation Biases in Generalized Linear Mixed Models

PQL Estimation Biases in Generalized Linear Mixed Models PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized

More information

Module 6 Case Studies in Longitudinal Data Analysis

Module 6 Case Studies in Longitudinal Data Analysis Module 6 Case Studies in Longitudinal Data Analysis Benjamin French, PhD Radiation Effects Research Foundation SISCR 2018 July 24, 2018 Learning objectives This module will focus on the design of longitudinal

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

More information

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution Peng WANG, Guei-feng TSAI and Annie QU 1 Abstract In longitudinal studies, mixed-effects models are

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

A weighted simulation-based estimator for incomplete longitudinal data models

A weighted simulation-based estimator for incomplete longitudinal data models To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

Longitudinal data analysis using generalized linear models

Longitudinal data analysis using generalized linear models Biomttrika (1986). 73. 1. pp. 13-22 13 I'rinlfH in flreal Britain Longitudinal data analysis using generalized linear models BY KUNG-YEE LIANG AND SCOTT L. ZEGER Department of Biostatistics, Johns Hopkins

More information

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model

The equivalence of the Maximum Likelihood and a modified Least Squares for a case of Generalized Linear Model Applied and Computational Mathematics 2014; 3(5): 268-272 Published online November 10, 2014 (http://www.sciencepublishinggroup.com/j/acm) doi: 10.11648/j.acm.20140305.22 ISSN: 2328-5605 (Print); ISSN:

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

A measure of partial association for generalized estimating equations

A measure of partial association for generalized estimating equations A measure of partial association for generalized estimating equations Sundar Natarajan, 1 Stuart Lipsitz, 2 Michael Parzen 3 and Stephen Lipshultz 4 1 Department of Medicine, New York University School

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Modification and Improvement of Empirical Likelihood for Missing Response Problem

Modification and Improvement of Empirical Likelihood for Missing Response Problem UW Biostatistics Working Paper Series 12-30-2010 Modification and Improvement of Empirical Likelihood for Missing Response Problem Kwun Chuen Gary Chan University of Washington - Seattle Campus, kcgchan@u.washington.edu

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Journal of Statistical Software

Journal of Statistical Software JSS Journal of Statistical Software January 2006, Volume 15, Issue 2. http://www.jstatsoft.org/ The R Package geepack for Generalized Estimating Equations Ulrich Halekoh Danish Institute of Agricultural

More information

Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data

Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data Sankhyā : The Indian Journal of Statistics 2009, Volume 71-B, Part 1, pp. 55-78 c 2009, Indian Statistical Institute Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference

The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference The impact of covariance misspecification in multivariate Gaussian mixtures on estimation and inference An application to longitudinal modeling Brianna Heggeseth with Nicholas Jewell Department of Statistics

More information

POWER AND SAMPLE SIZE CALCULATIONS FOR GENERALIZED ESTIMATING EQUATIONS VIA LOCAL ASYMPTOTICS

POWER AND SAMPLE SIZE CALCULATIONS FOR GENERALIZED ESTIMATING EQUATIONS VIA LOCAL ASYMPTOTICS Statistica Sinica 23 (2013), 231-250 doi:http://dx.doi.org/10.5705/ss.2011.081 POWER AND SAMPLE SIZE CALCULATIONS FOR GENERALIZED ESTIMATING EQUATIONS VIA LOCAL ASYMPTOTICS Zhigang Li and Ian W. McKeague

More information

Analysis of Correlated Data with Measurement Error in Responses or Covariates

Analysis of Correlated Data with Measurement Error in Responses or Covariates Analysis of Correlated Data with Measurement Error in Responses or Covariates by Zhijian Chen A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations

Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Hypothesis Testing Based on the Maximum of Two Statistics from Weighted and Unweighted Estimating Equations Takeshi Emura and Hisayuki Tsukuma Abstract For testing the regression parameter in multivariate

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

High Dimensional Propensity Score Estimation via Covariate Balancing

High Dimensional Propensity Score Estimation via Covariate Balancing High Dimensional Propensity Score Estimation via Covariate Balancing Kosuke Imai Princeton University Talk at Columbia University May 13, 2017 Joint work with Yang Ning and Sida Peng Kosuke Imai (Princeton)

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Justine Shults 1,,, Wenguang Sun 1,XinTu 2, Hanjoo Kim 1, Jay Amsterdam 3, Joseph M. Hilbe 4,5 and Thomas Ten-Have 1

Justine Shults 1,,, Wenguang Sun 1,XinTu 2, Hanjoo Kim 1, Jay Amsterdam 3, Joseph M. Hilbe 4,5 and Thomas Ten-Have 1 STATISTICS IN MEDICINE Statist. Med. 2009; 28:2338 2355 Published online 26 May 2009 in Wiley InterScience (www.interscience.wiley.com).3622 A comparison of several approaches for choosing between working

More information

Models for binary data

Models for binary data Faculty of Health Sciences Models for binary data Analysis of repeated measurements 2015 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen 1 / 63 Program for

More information

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori

On Properties of QIC in Generalized. Estimating Equations. Shinpei Imori On Properties of QIC in Generalized Estimating Equations Shinpei Imori Graduate School of Engineering Science, Osaka University 1-3 Machikaneyama-cho, Toyonaka, Osaka 560-8531, Japan E-mail: imori.stat@gmail.com

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Composite Likelihood Estimation

Composite Likelihood Estimation Composite Likelihood Estimation With application to spatial clustered data Cristiano Varin Wirtschaftsuniversität Wien April 29, 2016 Credits CV, Nancy M Reid and David Firth (2011). An overview of composite

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

Survival Analysis for Case-Cohort Studies

Survival Analysis for Case-Cohort Studies Survival Analysis for ase-ohort Studies Petr Klášterecký Dept. of Probability and Mathematical Statistics, Faculty of Mathematics and Physics, harles University, Prague, zech Republic e-mail: petr.klasterecky@matfyz.cz

More information

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1 Count models A classical, theoretical argument for the Poisson distribution is the approximation Binom(n, p) Pois(λ) for large n and small p and λ = np. This can be extended considerably to n approx Z

More information

Robustness to Parametric Assumptions in Missing Data Models

Robustness to Parametric Assumptions in Missing Data Models Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice

More information

Stuart R. Lipsitz and Garrett M. Fitzmaurice, Joseph G. Ibrahim, Debajyoti Sinha, Michael Parzen. and Steven Lipshultz

Stuart R. Lipsitz and Garrett M. Fitzmaurice, Joseph G. Ibrahim, Debajyoti Sinha, Michael Parzen. and Steven Lipshultz J. R. Statist. Soc. A (2009) 172, Part 1, pp. 3 20 Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: an application to acquired immune deficiency syndrome

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

Pseudo-score confidence intervals for parameters in discrete statistical models

Pseudo-score confidence intervals for parameters in discrete statistical models Biometrika Advance Access published January 14, 2010 Biometrika (2009), pp. 1 8 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asp074 Pseudo-score confidence intervals for parameters

More information

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation Structures Authors: M. Salomé Cabral CEAUL and Departamento de Estatística e Investigação Operacional,

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION

GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION STATISTICS IN MEDICINE GOODNESS-OF-FIT FOR GEE: AN EXAMPLE WITH MENTAL HEALTH SERVICE UTILIZATION NICHOLAS J. HORTON*, JUDITH D. BEBCHUK, CHERYL L. JONES, STUART R. LIPSITZ, PAUL J. CATALANO, GWENDOLYN

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Marginal Regression Analysis of Longitudinal Data With Time-Dependent Covariates: A Generalised Method of Moments Approach

Marginal Regression Analysis of Longitudinal Data With Time-Dependent Covariates: A Generalised Method of Moments Approach Marginal Regression Analysis of Longitudinal Data With Time-Dependent Covariates: A Generalised Method of Moments Approach Tze Leung Lai Stanford University Dylan Small University of Pennsylvania August

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Longitudinal analysis of ordinal data

Longitudinal analysis of ordinal data Longitudinal analysis of ordinal data A report on the external research project with ULg Anne-Françoise Donneau, Murielle Mauer June 30 th 2009 Generalized Estimating Equations (Liang and Zeger, 1986)

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs

Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Discussion of Missing Data Methods in Longitudinal Studies: A Review by Ibrahim and Molenberghs Michael J. Daniels and Chenguang Wang Jan. 18, 2009 First, we would like to thank Joe and Geert for a carefully

More information

Multivariate Extensions of McNemar s Test

Multivariate Extensions of McNemar s Test Multivariate Extensions of McNemar s Test Bernhard Klingenberg Department of Mathematics and Statistics, Williams College Williamstown, MA 01267, U.S.A. e-mail: bklingen@williams.edu and Alan Agresti Department

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

SUPPLEMENTARY SIMULATIONS & FIGURES

SUPPLEMENTARY SIMULATIONS & FIGURES Supplementary Material: Supplementary Material for Mixed Effects Models for Resampled Network Statistics Improve Statistical Power to Find Differences in Multi-Subject Functional Connectivity Manjari Narayan,

More information

Trends in Human Development Index of European Union

Trends in Human Development Index of European Union Trends in Human Development Index of European Union Department of Statistics, Hacettepe University, Beytepe, Ankara, Turkey spxl@hacettepe.edu.tr, deryacal@hacettepe.edu.tr Abstract: The Human Development

More information

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to mtm: An R Package for Marginalized Transition Models Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition

More information

General Regression Model

General Regression Model Scott S. Emerson, M.D., Ph.D. Department of Biostatistics, University of Washington, Seattle, WA 98195, USA January 5, 2015 Abstract Regression analysis can be viewed as an extension of two sample statistical

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion

Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Pairwise rank based likelihood for estimating the relationship between two homogeneous populations and their mixture proportion Glenn Heller and Jing Qin Department of Epidemiology and Biostatistics Memorial

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Likelihood and p-value functions in the composite likelihood context

Likelihood and p-value functions in the composite likelihood context Likelihood and p-value functions in the composite likelihood context D.A.S. Fraser and N. Reid Department of Statistical Sciences University of Toronto November 19, 2016 Abstract The need for combining

More information

Chapter 3 - Temporal processes

Chapter 3 - Temporal processes STK4150 - Intro 1 Chapter 3 - Temporal processes Odd Kolbjørnsen and Geir Storvik January 23 2017 STK4150 - Intro 2 Temporal processes Data collected over time Past, present, future, change Temporal aspect

More information

Robust Inference for Regression with Spatially Correlated Errors

Robust Inference for Regression with Spatially Correlated Errors Journal of Modern Applied Statistical Methods Volume 10 Issue 2 Article 7 11-1-2011 Robust Inference for Regression with Spatially Correlated Errors Juchi Ou Case Western Reserve University, jxo37@cwru.edu

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion

More information

Semiparametric Mean-Covariance. Regression Analysis for Longitudinal Data

Semiparametric Mean-Covariance. Regression Analysis for Longitudinal Data Semiparametric Mean-Covariance Regression Analysis for Longitudinal Data July 14, 2009 Acknowledgments We would like to thank the Editor, an associate editor and two referees for all the constructive comments

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University

Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University A SURVEY OF VARIANCE COMPONENTS ESTIMATION FROM BINARY DATA by Charles E. McCulloch Biometrics Unit and Statistics Center Cornell University BU-1211-M May 1993 ABSTRACT The basic problem of variance components

More information