Power analyses for longitudinal trials and other clustered designs

STATISTICS IN MEDICINE
Statist. Med. 2004; 23: (DOI: 10.1002/sim.1869)

X. M. Tu (1, 2), J. Kowalski (3), J. Zhang (4), K. G. Lynch (4) and P. Crits-Christoph (5)

1 Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Rochester, NY 14642, USA
2 Department of Psychiatry, University of Rochester, 601 Elmwood Avenue, Rochester, NY 14642, USA
3 Department of Oncology and Biostatistics, Johns Hopkins University, USA
4 Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, USA
5 Department of Psychiatry, University of Pennsylvania School of Medicine, USA

Correspondence to: X. M. Tu, Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, USA. E-mail: xin_tu@urmc.rochester.edu
Author positions (in author order): Professor; Assistant Professor; Graduate Student; Assistant Professor; Professor.
Contract/grant sponsor: NIH/NIMH; contract/grant number: P50-MH
Contract/grant sponsor: NIH/NIAID; contract/grant number: K22AI 586
Received January 2004; Accepted March 2004

SUMMARY

Existing methods for power and sample size estimation for longitudinal and other clustered study designs have limited applicability. In this paper, we review and extend existing approaches to address these limitations. In particular, we focus on power analysis for the two most popular approaches to clustered data analysis, the generalized estimating equations and the linear mixed-effects models. By basing the derivation of the power function on the asymptotic distribution of the model estimates, the proposed approach provides estimates of power that are consistent with the methods of inference used for data analysis. The proposed methodology is illustrated with numerous examples motivated by real study designs. Copyright © 2004 John Wiley & Sons, Ltd.

KEY WORDS: epidemiological study; GEE; HIV; linear mixed-effects models; intraclass correlation; psychosocial and survey research

1. INTRODUCTION

Power and sample size estimation constitutes an important component in the design and planning of modern clinical trials. It provides information for assessing the feasibility of a study to
detect some pre-specified effect size and for estimating the amount of resources necessary for its execution, in both efficacy and effectiveness research [1–3]. Although the past two decades have witnessed major advances in statistical methods for clustered data analysis, particularly for longitudinal studies, most research efforts have centred on data analysis, with little attention paid to power and sample size estimation. As a result, methods for the latter have evolved at a much slower pace. Often, methods developed for cross-sectional study designs are used to provide power and sample size estimates for longitudinal studies. Only recently has attention been paid to the effect of within-subject correlations in such designs [4–10]. Despite these new developments, current methods still have limited applicability, especially in the presence of continuous covariates (predictors).

In this paper, we review and extend current work on power analysis for the two most popular approaches to clustered data, the generalized estimating equations (GEE) and the linear mixed-effects models (LMM). Existing GEE- and LMM-based methods apply only to group comparisons and relatively simple study designs. By extending these approaches to more complex designs and developing new analytic methods based on the asymptotic distribution of the estimates, we are able to provide more accurate power estimates. In Section 2, we develop the general approach and highlight its differences from existing alternatives. In Section 3, we illustrate the approach with examples based on real design considerations for biomedical, epidemiological and psychosocial research studies. In Section 4, we discuss limitations of the proposed approach and directions for future research.

2. POWER ANALYSIS FOR CLUSTERED DATA

Consider a study of $n$ clusters, indexed by $i$ ($1 \le i \le n$), each of size $m$. Let $y_{it}$ denote the response and $x_{it}$ the vector of predictors (or covariates) of the $t$th
member from the $i$th cluster. Note that we have assumed a common cluster size; we discuss the implications of this assumption, and extensions that accommodate varying cluster sizes, in the Discussion. Note also that $m$ is considered fixed throughout the development, so that asymptotic methods apply for large $n$.

The two most popular approaches for regression analysis with clustered data, especially data from longitudinal studies, are the GEE [11, 12] and the mixed-effects models (MM) [13–23]. For data analysis, the latter has the advantage of being able to tease apart the between- and within-cluster variability, while the former has the much desired distribution-free property for robust inference. However, in most applications, power estimates are desired for detecting differences in the (marginal) mean response or fixed effects. Thus, the ability of MM to provide cluster-specific variance estimates is no longer an important consideration. On the other hand, the distribution-free property of GEE becomes especially desirable for power analysis: without real data, it is impossible to verify a parametric distribution model, and the robustness of GEE ensures reliable estimates regardless of the data distribution. However, the final decision rests upon which approach will be used for data analysis. For example, it makes little sense to compute power based on an MM when the study data will be analysed by GEE.

Regardless of which approach is used, we consider regression models that link $x_{it}$ to $y_{it}$ through a linear predictor $x_{it}^\top \beta$, or a function of such a predictor, where $\beta$ is the vector of parameters of interest. Under GEE, such a regression for a continuous or binary response is
defined by the marginal mean and variance of the response in terms of the generalized linear models [11, 12]:

$$\mu_{it} = E[y_{it} \mid x_{it}] = h(x_{it}^\top \beta), \qquad \operatorname{Var}(y_{it} \mid x_{it}) = \phi^2 v(\mu_{it}), \qquad 1 \le i \le n, \; 1 \le t \le m \tag{1}$$

where $h(\cdot)$ is a known link function, $\phi^2$ a scale parameter and $v(\cdot)$ a known function. Under MM, the serial correlation in the joint distribution of the clustered responses is explicitly accounted for by introducing latent variables (random effects). For a continuous response, the LMM is defined by

$$y_{it} = x_{it}^\top \beta + z_{it}^\top b_i + \epsilon_{it} \quad \text{or} \quad y_i = X_i \beta + Z_i b_i + \epsilon_i, \qquad b_i \sim N(0, D), \; \epsilon_i \sim N(0, \sigma^2 I_m), \; 1 \le t \le m \tag{2}$$

where $D$ denotes the variance of the random effect $b_i$, $\sigma^2$ the variance of the model error $\epsilon_{it}$, $I_m$ the $m \times m$ identity matrix, and $X_i = (x_{i1}, \ldots, x_{im})^\top$ and $Z_i = (z_{i1}, \ldots, z_{im})^\top$ the design matrices for the fixed and random effects, respectively. In most applications, $Z_i = X_i$.

In this paper, we limit our discussion of the MM-based approach to the LMM. A primary reason for focusing on the LMM rather than on more general mixed-effects models, such as the generalized linear mixed-effects models for discrete responses, is the large number of different approaches proposed for inference and the complexity of the asymptotic distributions of the model estimates [14–23]. Also, for GEE, we limit our consideration to (1) for continuous and binary responses. However, the proposed approach readily generalizes to GEE models for categorical responses [24].

For power analysis, we consider the class of general linear hypotheses of the form

$$H_0: K\beta = b, \qquad H_a: K\beta = d \ne b \tag{3}$$

where $K$ is a known full-rank $s \times p$ matrix ($p$ is the dimension of $\beta$), and $b$ and $d$ are $s \times 1$ vectors of known constants. The linear hypothesis (3) is the most general class of hypotheses that has been systematically studied for both the classical linear and the generalized linear models, and it covers virtually all hypotheses concerning $\beta$ that arise in practical studies. When $b = 0$, $H_0$ is known as a linear contrast. For non-zero $b$, we can re-express (3) in terms of a
contrast by performing the parameter transformation $\gamma = \beta - K^\top (K K^\top)^{-1} b$ (e.g. Reference [25]), so that $K\gamma = K\beta - b$ and $H_0$ becomes $K\gamma = 0$. When expressed in terms of $\gamma$, (1) becomes

$$\mu_{it} = h(c_{it} + x_{it}^\top \gamma), \qquad \operatorname{Var}(y_{it} \mid x_{it}) = \phi^2 v(\mu_{it}), \qquad 1 \le i \le n, \; 1 \le t \le m \tag{4}$$

where $c_{it} = x_{it}^\top K^\top (K K^\top)^{-1} b$ is a known constant. For linear regression, $c_{it}$ is often absorbed into the response by defining a new response variable (e.g. Reference [25]). For non-linear models, $c_{it}$ is often called an offset (e.g. Reference [26]). Since $c_{it}$ has no effect on power, we focus on the class of linear contrasts, i.e. $b = 0$ in (3), and develop power functions for such contrasts under each approach, starting with GEE.

2.1. Generalized estimating equation

Under GEE, estimates of $\beta$ are obtained as solutions to a generalized estimating equation [11]. Since the GEE procedure has been well documented and extensively discussed in the literature, we highlight only the results most relevant to the current development.

Let

$$A_i = \operatorname{diag}[v(\mu_{it})], \quad y_i = (y_{i1}, \ldots, y_{im})^\top, \quad \mu_i = (\mu_{i1}, \ldots, \mu_{im})^\top,$$
$$V_i = \phi^2 A_i^{1/2} W(\alpha) A_i^{1/2}, \quad D_i = \frac{\partial \mu_i}{\partial \beta^\top}, \quad S_i = y_i - \mu_i, \quad U_i = D_i^\top V_i^{-1} S_i \tag{5}$$

where $\operatorname{diag}[v(\mu_{it})]$ denotes a diagonal matrix with $v(\mu_{it})$ as its $t$th diagonal element and $W(\alpha)$ is a working correlation matrix modelled by the parameter vector $\alpha$. The working correlation matrix $W(\alpha)$ is generally not equal to the true within-cluster correlation of $y_i$; its role is to increase efficiency (i.e. decrease standard errors) [11, 27]. A GEE estimate $\hat\beta$, obtained by solving the GEE $\sum_{i=1}^n U_i = 0$, is both consistent and asymptotically normal, i.e.

$$\sqrt{n}(\hat\beta - \beta) \to_d N(0, \Sigma_\beta = B^{-1} U B^{-1}), \qquad B = E(D^\top V^{-1} D), \qquad U = E[D^\top V^{-1} S S^\top V^{-1} D] \tag{6}$$

where $\to_d$ denotes convergence in distribution [28, 29]. Thus, for large sample sizes, $\hat\beta$ has approximately the normal distribution $N(\beta, \Sigma_\beta / n)$, which is the basis for inference about $\beta$. For data analysis, $B$ and $U$ are estimated from the observed data; substituting these estimates into (6) yields a robust (sandwich) estimate of $\Sigma_\beta$. In other words, with real data, we can obtain an estimate of $\Sigma_\beta$ regardless of the choice of $W(\alpha)$.

For power analysis, however, the situation is quite different: prior to data collection, estimated values are not available for any of these quantities. In this case, we set $W(\alpha)$ equal to the true correlation matrix, so that $E(S S^\top \mid x) = V$, from which it follows that

$$U = E[D^\top V^{-1} E(S S^\top \mid x) V^{-1} D] = B, \qquad \Sigma_\beta = B^{-1} = E^{-1}(D^\top V^{-1} D) \tag{7}$$

In most applications of power analysis, this true correlation matrix is modelled as a function of the assessment times $t$ [7, 30]. For example, under the uniform compound symmetry structure (e.g. Reference [7]), the correlation between $y_s$ and $y_t$ is modelled as a constant, $\rho_{st} = \rho$ ($s \ne t$). This assumption may oversimplify the correlation structure of $y$, but other alternatives may be used to reach a compromise between analytic simplicity and reality. Modelling such correlation structures has been extensively discussed in the literature and is not repeated here.

It follows from standard asymptotic theory (e.g. Reference [3]) and (6) that $K\hat\beta$ has
a limiting normal distribution under either $H_0$ or $H_a$, i.e.

$$H_0: \sqrt{n}\, K\hat\beta \to_d N(0, K \Sigma_\beta K^\top), \qquad H_a: \sqrt{n}(K\hat\beta - d) \to_d N(0, K \Sigma_\beta K^\top) \tag{8}$$

In addition, the centred quadratic statistic $Q_{n0}^2 = n[K(\hat\beta - \beta)]^\top (K \Sigma_\beta K^\top)^{-1} [K(\hat\beta - \beta)]$ has an asymptotic central $\chi^2$ distribution, $Q_{n0}^2 \to_d \chi_s^2(0)$, where $\chi_s^2(c)$ denotes a $\chi^2$ distribution with $s$ degrees of freedom and non-centrality parameter $c$. In other words, under $H_0$ and $H_a$, the quadratic (Wald) statistic $Q_n^2 = n (K\hat\beta)^\top (K \Sigma_\beta K^\top)^{-1} (K\hat\beta)$ has approximately a central and a non-central $\chi_s^2$ distribution, respectively:

$$H_0: Q_n^2 \sim \chi_s^2(0), \qquad H_a: Q_n^2 \sim \chi_s^2(c) \tag{9}$$
where $c = n\, d^\top (K \Sigma_\beta K^\top)^{-1} d$. Let $F_{\chi_s^2(c)}$ denote the cdf of $\chi_s^2(c)$. For a given type I error level $\alpha$, let $p_\alpha$ denote the $100(1-\alpha)$th percentile of $\chi_s^2(0)$. The power function $\psi$ for the linear contrasts (3) based on the Wald statistic is given by

$$\psi(n, c, \alpha) = 1 - F_{\chi_s^2(c)}(p_\alpha), \qquad \alpha = 1 - F_{\chi_s^2(0)}(p_\alpha) \tag{10}$$

For a given sample size $n$, $\psi$ above provides the power for detecting $H_a$ in (3). Alternatively, (10) can be used to estimate the sample size required to achieve a pre-specified power. In this case, $\psi$ is given and the minimum sample size required to reach $\psi$ is the solution to the first equation in (10). This non-linear equation can be solved numerically by root-finding methods such as Newton's method (e.g. Reference [32]). Most commercially available software packages implement one or more such methods. For example, in SAS, the function NLPFDD computes the root of a general non-linear function using Newton's method, where the required first-order derivatives are either calculated analytically or approximated numerically, depending on the complexity of the non-linear function. Sample size estimates are obtained by rounding the solutions up to the next integer, so that the target power is attained.

Note that the power function (10) is derived from the asymptotic distribution of the GEE estimates. As a result, the power function depends on $x_{it}$ through its distribution, rather than through a set of specific values as in other existing methods for linear and generalized linear models [4, 8]. Within our context, $x_{it}$ can contain both discrete and continuous variables. Although for discrete $x_{it}$ the proposed and existing approaches yield the same power functions, they are derived from different principles. We highlight this difference with some specific models below.

2.1.1. Repeated analysis of variance models

First, consider the class of repeated analysis of variance (RANOVA) models. Such models generalize the traditional analysis of variance (ANOVA) models to a longitudinal setting with repeated
assessments [8]. However, as the repeated measures give rise to correlated outcomes, methods for clustered data must be used to address the within-cluster correlations. Let $y_{kit}$ denote the $t$th repeated measure from the $i$th subject within the $k$th group, for $1 \le i \le n_k$, $1 \le k \le g$ and $1 \le t \le m$. By identifying the repeated responses of the same subject as a cluster, (1) for modelling the group means over time is given by

$$E(y_{kit}) = \mu_{kt}, \qquad \operatorname{Var}(y_{kit}) = \sigma_t^2, \qquad 1 \le i \le n_k, \; 1 \le k \le g, \; 1 \le t \le m \tag{11}$$

or, in matrix form, by

$$E(y_{ki}) = \mu_k, \qquad V = \operatorname{Var}(y_{ki}) = \operatorname{diag}(\sigma_s)\, (\rho_{st})\, \operatorname{diag}(\sigma_t), \qquad 1 \le i \le n_k, \; 1 \le k \le g \tag{12}$$

where $\operatorname{diag}(\sigma_s)$ denotes a diagonal matrix with $\sigma_s$ as its $s$th diagonal element, $(\rho_{st})$ the correlation matrix of the within-cluster responses, and

$$y_{ki} = (y_{ki1}, \ldots, y_{kim})^\top, \qquad \mu_k = (\mu_{k1}, \ldots, \mu_{km})^\top, \qquad \mu = (\mu_1^\top, \ldots, \mu_g^\top)^\top \tag{13}$$

Note that implicit in (11) or (12) is the assumption that the variance matrix $\operatorname{Var}(y_{ki})$ is the same across all groups. To compute power and sample size estimates, the correlation matrix $(\rho_{st})$ must be specified. For example, $(\rho_{st})$ may be assumed to follow the compound symmetry model $C(\rho)$, where $\rho$ denotes the within-subject correlation. In this case, $V = \operatorname{diag}(\sigma_s)\, C(\rho)\, \operatorname{diag}(\sigma_t)$.
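The power function (10) and the sample-size search described earlier in this section can be sketched in a few lines. The Python below is a minimal, dependency-free illustration, not the authors' implementation. It evaluates the non-central $\chi_s^2(c)$ cdf as a Poisson mixture of central $\chi^2$ cdfs, and it replaces the Newton iteration mentioned in the text with a bisection over integers for the smallest $n$. That substitution relies only on $\psi$ increasing in $n$. The function names and the per-cluster quantity `delta2` $= d^\top (K \Sigma_\beta K^\top)^{-1} d$, which gives $c = n \cdot$ `delta2`, are hypothetical labels introduced here. The power-series evaluation of the incomplete gamma function is assumed adequate only for the moderate arguments that arise in typical power calculations.

```python
import math


def chi2_cdf(x: float, df: float) -> float:
    """Central chi-square cdf: regularized lower incomplete gamma P(df/2, x/2),
    evaluated by its power series (adequate for moderate x)."""
    if x <= 0.0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a              # k = 0 term of sum_k z^k / (a (a+1) ... (a+k))
    total = term
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def ncx2_cdf(x: float, df: float, nc: float) -> float:
    """Non-central chi-square cdf as a Poisson(nc/2) mixture of central
    chi-square cdfs with df + 2j degrees of freedom."""
    if nc <= 0.0:
        return chi2_cdf(x, df)
    lam = nc / 2.0
    total, j, log_w = 0.0, 0, -lam        # log Poisson weight for j = 0
    while True:
        total += math.exp(log_w) * chi2_cdf(x, df + 2 * j)
        j += 1
        log_w += math.log(lam) - math.log(j)
        if j > lam and log_w < -40.0:     # remaining weights are negligible
            break
    return min(total, 1.0)


def chi2_ppf(q: float, df: float) -> float:
    """Quantile of the central chi-square distribution, found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < q:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wald_power(n: float, delta2: float, s: int, alpha: float = 0.05) -> float:
    """psi(n, c, alpha) = 1 - F_{chi2_s(c)}(p_alpha), with c = n * delta2."""
    p_alpha = chi2_ppf(1.0 - alpha, s)    # 100(1 - alpha)th percentile of chi2_s(0)
    return 1.0 - ncx2_cdf(p_alpha, s, n * delta2)


def min_sample_size(delta2: float, s: int, alpha: float = 0.05,
                    target: float = 0.8, n_max: int = 10**7) -> int:
    """Smallest integer n with wald_power(n, ...) >= target (integer bisection)."""
    hi = 1
    while wald_power(hi, delta2, s, alpha) < target:
        hi *= 2
        if hi > n_max:
            raise ValueError("target power not reached below n_max")
    lo = 1
    while lo < hi:
        mid = (lo + hi) // 2
        if wald_power(mid, delta2, s, alpha) >= target:
            hi = mid
        else:
            lo = mid + 1
    return hi


if __name__ == "__main__":
    print(round(wald_power(70, 0.1, s=1), 3))
    print(min_sample_size(0.1, s=1))      # -> 79
```

With $s = 1$, the sketch reduces to the familiar two-sided normal-theory formula $\Phi(\sqrt{c} - z_{1-\alpha/2}) + \Phi(-\sqrt{c} - z_{1-\alpha/2})$, which is a convenient check on any implementation of (10).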

Let

$$\bar y_k = \frac{1}{n_k} \sum_{i=1}^{n_k} y_{ki}, \qquad \hat\mu = (\bar y_1^\top, \ldots, \bar y_g^\top)^\top, \qquad n = \sum_{k=1}^g n_k, \qquad p_k = \lim_{n \to \infty} \frac{n_k}{n}, \qquad D = \operatorname{diag}(p_k) \tag{14}$$

Then, it follows from (6) that the GEE estimate $\hat\mu$ has the asymptotic distribution

$$\sqrt{n}(\hat\mu - \mu) \to_d N(0, \Sigma_\mu = D^{-1} \otimes V) \tag{15}$$

where $\otimes$ denotes the Kronecker product (e.g. Reference [33]). The power function for the linear contrasts (3) (with $\beta$ replaced by $\mu$) is given by (9) and (10) after substituting $\Sigma_\mu$ for $\Sigma_\beta$. Note that $\Sigma_\mu$ in (15) is the asymptotic variance; thus, $p_k$ refers to the proportion of group $k$ in the study population. When used for a finite sample size, $p_k$ is simply set to the estimate $\hat p_k = n_k/n$, the proportion of group $k$ in the sample. Thus, $\Sigma_\mu$ is determined by the distribution of $\hat p_k$, rather than conditioned upon a particular sample as in other existing methods (e.g. Reference [8]). This subtle difference is best illustrated with continuous predictors (covariates), as discussed in the next section.

Note also that, by ignoring the ordered structure of the repeated assessments (over time), RANOVA can be applied to cross-sectional clustered study designs in survey and epidemiological research [10, 34]. The only difference in such a context is that, in most applications, the ordering of the repeated assessments no longer exists, i.e. all elements of $y_{ki}$ are exchangeable, and hypotheses of interest therefore concern group means only, rather than mean vectors defined for each assessment time as in longitudinal studies. For example, to compare $g$ groups, each with mean $\mu_k$ ($1 \le k \le g$), we simply set $\mu_k = (\mu_k, \ldots, \mu_k)^\top$ in (13) and otherwise proceed as above. As a special case, for $g = 2$ and under the uniform compound symmetry correlation structure, this yields the procedure proposed by Manatunga et al. [10] for a cluster size common across groups. Note that these authors also considered the general case of varying cluster sizes. We discuss extensions of our approach to this more general setting in the Discussion.

2.1.2. Linear regression models with continuous covariates

By setting $h$ to the identity link, we obtain from (1) the class of linear regression models for repeated measures:

$$E(y_{it} \mid x_{it}) = x_{it}^\top \beta, \qquad \operatorname{Var}(y_{it} \mid x_{it}) = \sigma_t^2, \qquad 1 \le t \le m, \; 1 \le i \le n \tag{16}$$

This class of models is quite popular in growth curve analysis. As in the case of RANOVA, the GEE estimate $\hat\beta$ can again be expressed in closed form:

$$\hat\beta = \left( \sum_{i=1}^n X_i^\top V^{-1} X_i \right)^{-1} \left( \sum_{i=1}^n X_i^\top V^{-1} y_i \right)$$

where $X_i = (x_{i1}, \ldots, x_{im})^\top$. For the class of linear contrasts (3), the power is given by (10), with $\Sigma_\beta = E^{-1}(X^\top V^{-1} X)$. To compute power, we must evaluate $E(X^\top V^{-1} X)$. To this end, note that

$$E(X^\top V^{-1} X) = E[(x_1, \ldots, x_m)\, V^{-1} (x_1, \ldots, x_m)^\top] = \Big[ c_{jk} = \sum_{l, r} v^{lr} E(x_{jl} x_{kr}) \Big] \tag{17}$$
where $v^{lr}$ denotes the $(l, r)$th element of $V^{-1}$ and $E(x_{jl} x_{kr})$ the $(j, k)$th element of $E(x_l x_r^\top)$. Thus, $\Sigma_\beta$ is a function of $V$ and $E(x_s x_t^\top)$ ($1 \le s, t \le m$). Given $V$, $E(X)$ and $\operatorname{Var}(X)$, $E(X^\top V^{-1} X)$ is readily evaluated. As with RANOVA, $\Sigma_\beta$ is determined by the distribution of $x_{it}$ or, more precisely in this case, by the mean and variance of $x_{it}$. In comparison, existing methods condition upon a particular sample of $x_{it}$ and thus require that the values of $x_{it}$ be known for each subject in the entire sample [4]. This is quite unrealistic in practice. The current approach requires only the mean and variance of $x_{it}$, and in most applications estimates of these parameters can be obtained from other similar studies, especially when $x_{it}$ contains only one or two covariates.

Note that for regression analysis with non-clustered study designs, a common approach is to use the expectation of $R^2 = \mathrm{SSR}/\mathrm{SSTO}$ as the effect size for power and sample size estimation (e.g. Reference [35, Chapter 9]), where SSR and SSTO denote the regression and total sums of squares, respectively (e.g. Reference [36, Chapter 7]). Such an approach does not require the distribution of the predictors. A primary disadvantage of this approach is that it does not provide power for specific contrasts of interest. In addition, it requires normally distributed data and may not readily generalize to the GEE setting.

2.1.3. Generalized linear models for binary response

For a binary response $y_{it}$, the model in (1) has the following form:

$$\mu_{it} = E(y_{it} \mid x_{it}) = h(x_{it}^\top \beta), \qquad \operatorname{Var}(y_{it} \mid x_{it}) = \mu_{it}(1 - \mu_{it}), \qquad 1 \le t \le m, \; 1 \le i \le n \tag{18}$$

For binary data, the within-cluster correlation does not have as straightforward an interpretation as for continuous responses. If estimates of such correlations are available from other similar studies, they can be used to form the variance matrix of the responses. Otherwise, it is probably more convenient to express the correlations as functions of transition probabilities, as in
one-step Markov models. For example, it is readily shown that the within-cluster correlation between $y_{is}$ and $y_{it}$ is given by

$$\operatorname{Corr}(y_{is}, y_{it} \mid x_{is}, x_{it}) = \left( \Pr[y_{it} = 1 \mid y_{is} = 1, x_{is}, x_{it}] - \mu_{it} \right) \sqrt{\frac{\mu_{is}}{\mu_{it}(1 - \mu_{is})(1 - \mu_{it})}}, \qquad s \ne t \tag{19}$$

Here, the one-step transition probability $\Pr[y_{it} = 1 \mid y_{is} = 1, x_{is}, x_{it}]$ is generally easier to interpret than the correlation $\operatorname{Corr}(y_{is}, y_{it} \mid x_{is}, x_{it})$. In addition, for most applications, we may approximate $\Pr[y_{it} = 1 \mid y_{is} = 1, x_{is}, x_{it}]$ by $\Pr[y_{it} = 1 \mid y_{is} = 1]$, which is even easier to specify: it is simply the transition probability of observing a positive response at time $t$, given a positive response at time $s$.

In general, for non-linear links such as the logit link, $\Sigma_\beta = E^{-1}(D^\top V^{-1} D)$ is not available in closed form. In addition, unlike for linear models, $E(D^\top V^{-1} D)$ no longer depends on $x_t$ only through its first two moments, and a distribution of $x_t$ must be assumed to evaluate this quantity. Let $F(\cdot)$ denote the probability distribution function of $X$. Then

$$\Sigma_\beta = E^{-1}(D^\top V^{-1} D) = \left[ \int D^\top(X)\, V^{-1}(X)\, D(X)\, dF(X) \right]^{-1} \tag{20}$$

For discrete $x_t$, the above is readily expressed in closed form (see also Example 3 in Section 3). For continuous $x_t$, (20) is generally not in closed form. One way to approximate it is through Monte Carlo (MC) simulation. For example, by generating a sample of size $M$ from the distribution of $X$, we can approximate $\Sigma_\beta$ by

$$\hat\Sigma_\beta = \left[ \frac{1}{M} \sum_{k=1}^M D_k^\top V_k^{-1} D_k \right]^{-1} \tag{21}$$

The accuracy of the MC approximation improves as $M$ increases. In addition, the MC sample size $M$ can be selected to ensure that the MC approximation achieves a required accuracy [37]. These are well-known facts and are not discussed further.

Note that for logistic regression with non-clustered data, Whittemore [38] proposed an approach to approximate the asymptotic variance of parameter estimates when the response probability is small. However, even in this special case, the approach works only for some special types of distributions. The MC-based approach above is more flexible in accommodating a mixture of continuous and discrete predictors. In addition, the accuracy of the MC approximation (21) depends only on the MC sample size and, unlike Whittemore's approximation, is independent of the magnitude of the response probability.

2.2. Linear mixed-effects model

Published studies have considered relatively simple designs, such as RANOVA with two groups [9] and growth curve models with no predictors [5]. We generalize these approaches to accommodate multiple groups in the former and predictors in the latter. It follows from (2) that the conditional variance of $y_i$ given $X_i$ is $V_i = Z_i D Z_i^\top + \sigma^2 I_m$. Thus, the maximum likelihood estimate (MLE) of $\beta$ is given by

$$\hat\beta = \left[ \frac{1}{n} \sum_{i=1}^n X_i^\top \hat V_i^{-1} X_i \right]^{-1} \left[ \frac{1}{n} \sum_{i=1}^n X_i^\top \hat V_i^{-1} y_i \right] \tag{22}$$

where $\hat V_i$ is evaluated at the MLEs of $D$ and $\sigma^2$ [3]. By the law of large numbers and the central limit theorem [28, 29], $\hat\beta$ is both consistent and asymptotically normal:

$$\sqrt{n}(\hat\beta - \beta) \to_d N\left(0, \Sigma_\beta = E^{-1}[X^\top (Z D Z^\top + \sigma^2 I_m)^{-1} X]\right) \tag{23}$$

For the linear contrast (3), the Wald statistic defined in Section 2.1 again has the asymptotic $\chi_s^2$ distribution (9), except with $\Sigma_\beta$ redefined by (23). The power function is again given by (10). In most growth curve analyses, $z_{it}$ is a function of time $t$ only and is functionally independent of any baseline covariates. In this case, $V_i$ becomes functionally independent of $x_{it}$, and $\Sigma_\beta$ in (23) is calculated as in (17). Thus, the power function depends on $X$ only through its first two moments. In some other applications, however, $z_{it}$ may depend on some baseline predictors (covariates), in which case the distribution of $X$ must be known in order to compute $\Sigma_\beta$. The considerations are similar to those for the binary model under GEE discussed above. For example, if $Z_i$ contains part or all of $X_i$, i.e. $Z_i = Z_i(X_i)$, we may estimate $\Sigma_\beta$ with the Monte Carlo approximation

$$\hat\Sigma_\beta = \left[ \frac{1}{M} \sum_{k=1}^M X_k^\top (Z_k D Z_k^\top + \sigma^2 I_m)^{-1} X_k \right]^{-1} \tag{24}$$
where $X_k$ denotes a random draw from the distribution of $X$ and $Z_k$ the corresponding random-effects design matrix. Note that in (2), we have assumed that $\epsilon_{it}$ has a constant variance $\sigma^2$ over time. If this variance changes over time, the discussion above still applies, with $\sigma^2 I_m$ replaced by the variance of $\epsilon_i$.

3. ILLUSTRATION

In this section, we illustrate the general approach with several examples. All these examples were motivated by real study designs from grant preparations within the School of Medicine at the University of Pennsylvania and the School of Medicine and Dentistry at the University of Rochester. For illustration purposes, real study sizes are not used. Instead, a sample size of 70 (per group for group comparisons) is used throughout the examples, with the type I error fixed at $\alpha = 0.05$.

Example 1 (Repeated analysis of variance). In a longitudinal study assessing depressive symptoms, chronic pain and interpersonal functioning (defined as social role performance and attachment to others) among women presenting to a low-income women's public health clinic, it is of interest to compare patients with co-morbid depressive symptoms and chronic pain with those with chronic pain only and with a control group, with respect to an outcome such as interpersonal functioning. The design calls for a RANOVA model with three groups. For illustration purposes, we assume three follow-up assessments, $t = 1, 2, 3$ (excluding baseline, denoted by $t = 0$). Thus, $g = 3$, $m = 3$ and $n_k = 70$ ($1 \le k \le g$) in (11). Under a common variance and a uniform compound symmetry correlation structure across all three groups, we have from (12) that $V = \sigma^2 C(\rho)$. Now, consider testing the hypothesis of a constant mean difference between groups $k - 1$ and $k$ over the three visits, i.e.

$$H_0: \mu_{kt} - \mu_{(k-1)t} = 0 \quad \text{versus} \quad H_a: \mu_{kt} - \mu_{(k-1)t} = a, \qquad k = 2, 3, \; 1 \le t \le 3 \tag{25}$$

To express the above in the form of (3), let

$$K = \big( -c(1, 2),\, -c(3, 4),\, -c(5, 6),\, c(1),\, c(3),\, c(5),\, c(2),\, c(4),\, c(6) \big), \qquad d = a \mathbf{1}_6 \tag{26}$$

where $c(i, \ldots, j)$
denotes a $6 \times 1$ vector with 1 in rows $i, \ldots, j$ and 0 elsewhere, and $\mathbf{1}_6$ is a $6 \times 1$ vector of 1s. Power estimates for hypothesis (25) are readily computed by (10) with $s = 6$. Table I shows power estimates for a series of values of the input parameters $a$, $\sigma$ and $\rho$. As expected, increasing $a$ or decreasing $\sigma$ increases power. In addition, power decreases as the within-cluster correlation becomes larger. The latter is also readily demonstrated analytically. For example, let $\hat\Delta_{k,(k-1)} = \frac{1}{m} \sum_{t=1}^m (\hat\mu_{kt} - \hat\mu_{(k-1)t})$, with $\hat\mu_{kt}$ defined in (14). Then, under $H_a$ in (25), it is readily shown that $\hat\Delta_{k,(k-1)}$ has the asymptotic distribution

$$\sqrt{n_k}\,(\hat\Delta_{k,(k-1)} - a) \to_d N\left(0, \frac{2\sigma^2}{m}[1 + (m-1)\rho]\right), \qquad k = 2, 3$$

The asymptotic variance of the difference statistic $\hat\Delta_{k,(k-1)}$ is an increasing function of $\rho$. Thus, power is inversely related to the within-cluster correlation.
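The analytic argument above can be checked numerically. The sketch below is a hedged illustration, not the computation behind Table I. It uses the one-degree-of-freedom statistic $\hat\Delta_{k,(k-1)}$ rather than the $s = 6$ test of (25), assumes equal group sizes, and evaluates a two-sided normal-theory power from the finite-sample variance $2\sigma^2[1 + (m-1)\rho]/(m n_k)$. It also confirms by simulation that a subject's average over $m$ compound-symmetric assessments has variance $\sigma^2[1 + (m-1)\rho]/m$. The function names and parameter values are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def avg_diff_variance(sigma: float, rho: float, m: int, n_k: int) -> float:
    """Variance of the averaged difference between two groups of n_k subjects
    each, under compound symmetry: 2 sigma^2 [1 + (m - 1) rho] / (m n_k)."""
    return 2.0 * sigma ** 2 * (1.0 + (m - 1) * rho) / (m * n_k)


def avg_diff_power(a: float, sigma: float, rho: float, m: int, n_k: int,
                   alpha: float = 0.05) -> float:
    """Two-sided normal-theory (Wald) power for detecting a mean difference a."""
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(a) / math.sqrt(avg_diff_variance(sigma, rho, m, n_k))
    return nd.cdf(shift - z) + nd.cdf(-shift - z)


def simulated_average_variance(sigma: float, rho: float, m: int,
                               reps: int = 20000, seed: int = 1) -> float:
    """Monte Carlo variance of a subject's average of m compound-symmetric
    normal responses, generated as y_t = sigma * (sqrt(rho) u + sqrt(1 - rho) e_t)."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        u = rng.gauss(0.0, 1.0)           # shared subject-level effect
        ys = [sigma * (math.sqrt(rho) * u + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0))
              for _ in range(m)]
        means.append(sum(ys) / m)
    mu = sum(means) / reps
    return sum((v - mu) ** 2 for v in means) / (reps - 1)


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.5, 0.8):
        print(rho, round(avg_diff_power(a=0.3, sigma=1.0, rho=rho, m=3, n_k=70), 3))
```

The shared effect `u` is what induces the common correlation $\rho$. Because every one of the $m$ assessments inherits it, averaging over time cannot remove it, which is why the variance of $\hat\Delta_{k,(k-1)}$ grows with $\rho$.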

Table I. Estimates of power for detecting a constant between-group mean difference over time among three groups in a RANOVA model with three assessments. Rows: within-subject correlation over time, $\rho$; columns: between-group mean difference, $a$; each cell gives power for standard deviation $\sigma = 4$ / $\sigma = 3.5$. [Table entries not recoverable from the source transcription.]

It is interesting to compare this example with the paired t-test. The latter is widely used for comparing responses between two time points, often termed pre- and post-treatment. It is well known that for the paired t-test, power is an increasing function of the within-subject correlation (between pre- and post-treatment measures), in contrast to the behaviour of the power function for hypothesis (25). To explain the difference, let us formulate the paired t-test using the set-up in this paper. Let $m = 2$ and $g = 1$. Let $y_{it}$ denote the paired pre- ($t = 1$) and post-treatment ($t = 2$) responses, with mean $\mu_t$ for $t = 1, 2$. The paired t-test is designed to detect a difference between the $\mu_t$'s, i.e.

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_a: \mu_1 - \mu_2 = a \ne 0 \tag{27}$$

By comparing the above with (25), it is seen that the paired t-test detects differences in the mean response between two assessment points, while the hypotheses in (25) concern between-group differences within each assessment point. Under a common variance $\sigma^2$, the variance of the within-subject change $y_{i2} - y_{i1}$ is $2\sigma^2(1 - \rho)$, which decreases as $\rho$ increases, whereas the variance of the between-group difference in Example 1 increases with $\rho$. Note that power for the paired t-test is usually computed based on the t-distribution [35]. The alternative based on (27) is asymptotically equivalent to this procedure, with the advantage of not requiring the normality assumption.

Example 2 (Linear growth curve with a continuous baseline covariate under GEE). In a sleep study, it is of interest to model the change in some measure of sleep disturbance (total sleep time, averaged cortisol or melatonin levels, etc.) over three assessment points. Since sleep is a function of age, it is important to control for its effect in the power analysis. In this example, we assume that the change pattern over the study period is independent of age, so that there is no age by time
interaction (see also Example 4 for a different analysis). Assume a linear growth curve model with one baseline covariate $x_i$ and three assessments, $t = 0, t_2, t_3$. Let $x_{it} = (x_{i1t}, x_{i2t}, x_{i3t})^\top = (1, x_i, t)^\top$. Then, under the GEE approach, it follows from Section 2.1.2 that

$$E(y_{it} \mid x_i) = \beta_0 + x_i \beta_1 + t \beta_2, \qquad \operatorname{Var}(y_{it} \mid x_i) = \sigma^2 \tag{28}$$

Consider testing for a non-zero slope of the linear growth:

$$H_0: \beta_2 = 0 \quad \text{versus} \quad H_a: \beta_2 = a$$
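The moment-based evaluation of $E(X^\top V^{-1} X)$ through (17), specialised to this example's covariate vector $x_{it} = (1, x_i, t)^\top$, can be sketched as follows and checked against a Monte Carlo average over simulated baseline covariates, in the spirit of (21). This is a minimal illustration under stated assumptions: a compound-symmetry $V = \sigma^2[(1-\rho)I + \rho J]$ with its closed-form inverse, illustrative assessment times $(0, 1, 2)$ standing in for $(0, t_2, t_3)$, and illustrative covariate moments. The function names are hypothetical.

```python
import random


def cs_inverse(sigma2: float, rho: float, m: int) -> list:
    """Closed-form inverse of V = sigma2 * [(1 - rho) I + rho J]."""
    scale = 1.0 / (sigma2 * (1.0 - rho))
    w = rho / (1.0 + (m - 1) * rho)
    return [[scale * ((1.0 if l == r else 0.0) - w) for r in range(m)] for l in range(m)]


def info_from_moments(times, mean_x, var_x, sigma2, rho):
    """E(X^T V^{-1} X) = sum_{l,r} v^{lr} E(x_l x_r^T), using only E(x) and
    E(x^2) of the baseline covariate, with x_t = (1, x, t)^T as in (29)."""
    vinv = cs_inverse(sigma2, rho, len(times))
    ex, ex2 = mean_x, var_x + mean_x ** 2
    info = [[0.0] * 3 for _ in range(3)]
    for l, tl in enumerate(times):
        for r, tr in enumerate(times):
            exx = [[1.0, ex, tr],             # (j, k)th element is E(x_{jl} x_{kr})
                   [ex, ex2, tr * ex],
                   [tl, tl * ex, tl * tr]]
            for j in range(3):
                for k in range(3):
                    info[j][k] += vinv[l][r] * exx[j][k]
    return info


def info_monte_carlo(times, draw_x, sigma2, rho, reps=20000, seed=7):
    """Average of X_i^T V^{-1} X_i over covariates simulated by draw_x(rng)."""
    rng = random.Random(seed)
    m = len(times)
    vinv = cs_inverse(sigma2, rho, m)
    info = [[0.0] * 3 for _ in range(3)]
    for _ in range(reps):
        x = draw_x(rng)
        rows = [(1.0, x, t) for t in times]   # rows of X_i
        for j in range(3):
            for k in range(3):
                info[j][k] += sum(rows[l][j] * vinv[l][r] * rows[r][k]
                                  for l in range(m) for r in range(m))
    return [[v / reps for v in row] for row in info]


if __name__ == "__main__":
    times = (0.0, 1.0, 2.0)
    exact = info_from_moments(times, mean_x=2.0, var_x=2.25, sigma2=1.0, rho=0.3)
    approx = info_monte_carlo(times, lambda g: g.gauss(2.0, 1.5), 1.0, 0.3)
    for row_e, row_a in zip(exact, approx):
        print([round(e, 3) for e in row_e], [round(a, 3) for a in row_a])
```

Because `info_from_moments` uses only `mean_x` and `var_x`, any covariate distribution with the same first two moments yields the same matrix. The Monte Carlo average approaches it as `reps` grows, and $\Sigma_\beta$ is then its inverse.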

For this hypothesis, $K = (0, 0, 1)$ and the non-centrality parameter in (9) is $c = n a^2 (K \Sigma_\beta K^\top)^{-1}$. To compute $\Sigma_\beta$, note that

$$E(x_{1l} x_{1r}) = 1, \quad E(x_{1l} x_{2r}) = E(x), \quad E(x_{1l} x_{3r}) = t_r, \quad E(x_{2l} x_{2r}) = E(x^2), \quad E(x_{2l} x_{3r}) = t_r E(x), \quad E(x_{3l} x_{3r}) = t_l t_r \tag{29}$$

Given the first two moments, i.e. the mean and variance, of $x_i$, $\Sigma_\beta = E^{-1}(X^\top V^{-1} X)$ is readily computed using (17). Table II shows power estimates under a compound symmetry correlation assumption for a range of values of the key parameters $\rho$, $\sigma$ and the standard deviation of the covariate. As with Example 1, power is a decreasing function of the within-subject correlation. In addition, power increases as the variance of the covariate gets larger.

Table II. Estimates of power for detecting a slope of 0.5 in a GEE-based growth-curve model with sample size equal to 70. Rows: within-subject correlation over time, $\rho$; columns: standard deviation of the covariate $x_i$; each cell gives power for model-error standard deviation $\sigma = 5$ / $\sigma = 7$. [Table entries not recoverable from the source transcription.]

Example 3 (Clustered binary responses). As an example of a clustered cross-sectional study design, consider a study testing a new behavioural therapy for reducing sexually transmitted disease (STD) due to HIV. The goal of the study is to determine the efficacy of the therapy at post-treatment by testing for a differential STD rate between the treated and a control group. A total of 140 HIV-serodiscordant heterosexual couples are targeted, evenly split between the two groups. Although assessment is cross-sectional, partners from each couple form clustered responses. To address this between-partner correlation, we model the binary responses from partners using the model in (1), with cluster size $m = 2$ and a logit link,

$$h(x_i) = \frac{\exp(\beta_0 + x_i \beta_1)}{1 + \exp(\beta_0 + x_i \beta_1)}$$

where $i$ indexes couples, $t$ indexes partners ($t = 0, 1$), and the binary covariate $x_i$ indicates treatment condition: $x_i = 1$ for the treated and $x_i = 0$ for the control group. Under this model, the STD rates for the control and treated groups can be
expressed as $p_0 = h(\beta_0)$ and $p_1 = h(\beta_0 + \beta_1)$, and a differential STD rate can be expressed in terms of $\beta_1$ as follows:

$$H_0: \beta_1 = 0, \qquad H_a: \beta_1 = a = h^{-1}(p_1) - h^{-1}(p_0) \tag{30}$$

where $h^{-1}$ is the logit function. To compute $\Sigma_\beta$, note that it follows from (19) that

$$\rho_i = \operatorname{Corr}(y_{it}, y_{is} \mid x_i) = \frac{\Pr[y_{it} = 1 \mid y_{is} = 1, x_i] - \mu_i}{1 - \mu_i}$$

and, with $\mu_i = h(x_i)$,

$$D_i(x_i) = \mu_i(1 - \mu_i) \begin{pmatrix} 1 & x_i \\ 1 & x_i \end{pmatrix}, \qquad V_i(x_i) = \mu_i(1 - \mu_i) \begin{pmatrix} 1 & \rho_i \\ \rho_i & 1 \end{pmatrix}$$

$$\Sigma_\beta = E^{-1}(D^\top V^{-1} D) = \left[ \sum_{l = 0, 1} D^\top(l)\, V^{-1}(l)\, D(l) \Pr[x = l] \right]^{-1} \tag{31}$$

where $\Pr[x = l]$ denotes the proportion of group $l$ in the total study population ($l = 0, 1$). For equal group sizes, as in this example, $\Pr[x = l] = 1/2$. For this application, we also set the within-couple correlation to a constant, $\rho_i = \rho$. This is a reasonable assumption in most such studies, since the between-partner correlation is unlikely to change across treatments. Table III shows power estimates for detecting a differential STD rate for a range of values of the between-partner correlation. As expected, power decreases as this correlation increases. Note that, unlike for linear models, power also depends on the actual rates of the two groups, in addition to their difference.

Table III. Estimates of power for detecting a between-group difference in incidence of STD based on a sample size of 70 couples per group. Columns: STD incidence rate in the control and treatment groups; between-partner correlation $\rho$. [Table entries not recoverable from the source transcription.]

Example 4 (Linear growth curve with a binary baseline covariate under LMM). Consider again the sleep disturbance study in Example 2. Now, suppose that the change pattern varies with age. For illustration purposes, we assume that subjects can be grouped into two age groups. In addition, we assume an LMM for modelling the change pattern over time. Note that applications of LMM require that the response variable follow a normal distribution. For most biological measures, such as averaged cortisol levels, the normality assumption may hold approximately. Let $x_i$ denote a baseline binary covariate indicating the two age groups. As in Example 2, assume three assessment times, $t = 0, t_2, t_3$, and set $x_{it} = (1, x_i, t, t x_i)^\top$ and $z_{it} = (1, t)^\top$. The LMM in (2) becomes

$$y_{it} = \beta_0 + x_i \beta_1 + t \beta_2 + t x_i \beta_3 + b_{0i} + t b_{1i} + \epsilon_{it}, \qquad b_i \sim N(0, D), \; \epsilon_i \sim N(0, \sigma^2 I_m) \tag{32}$$

Unlike Example 2, the above model includes a time by covariate interaction, which in this case accounts for a differential linear
trend between two groups dened by x i = 0 and x i = Note that as special cases, (32) reduces to a two-group RANOVA if 2 = 3 = 0 and z it = [9] and to a growth curve model without covariate if = 3 = 0 [5] For this two-group growth curve model, hypotheses of interest include non-zero growth rate for one or both groups, dierential growth rates, etc For illustration purposes, consider testing

for a differential growth rate, i.e.,

$$H_0: \beta_3 = 0 \quad \text{versus} \quad H_a: \beta_3 = a \neq 0 \qquad (33)$$

The power function is given by (10), with

$$\Sigma_\beta = E^{-1}\left[X_i^\top V_i^{-1} X_i\right], \qquad V_i = Z_i D Z_i^\top + \sigma^2 I_3, \qquad K = (0, 0, 0, 1) \qquad (34)$$

By identifying $V_i$ above with the $V_i$ in Example 2, it is readily seen that $\Sigma_\beta$ depends on $x_i$ through its mean and variance in the same way as in that example. In comparison, the conditional variance of $\mathbf{y}_i$ given $x_i$ is here explicitly partitioned into two parts: one accounts for the random effect (between-cluster), $Z_i D Z_i^\top$, and the other for model (within-cluster) error, $\sigma^2 I_3$. Let $D = \sigma_b^2 G(\rho, \tau)$. Since the intercept and slope may have quite different units, the uniform compound symmetry correlation structure is not appropriate for the random effect; the parameter $\tau$ accounts for such differential units between the intercept and slope. Then it follows from (34) that

$$\Sigma_\beta = \sigma^2 E^{-1}\left[X_i^\top V_i^{-1} X_i\right], \qquad V_i = \frac{\sigma_b^2}{\sigma^2} Z_i G(\rho, \tau) Z_i^\top + I_3$$

Thus, $\Sigma_\beta$ is a function of $\sigma^2$, $\sigma_b^2/\sigma^2$, $\rho$ and $\tau$. Shown in Table IV are power estimates for a range of values of the input parameters $\sigma^2$ and $\sigma_b^2/\sigma^2$, with $\rho = 0.5$ and $\tau = 1$. For the particular hypothesis in (33), we found that power estimates did not change when $\tau$ was varied; for this reason, $\tau$ was set to 1 in the calculations. As in Example 2, power decreases as the variance ratio $\sigma_b^2/\sigma^2$ increases, and the effect of this parameter on power is surprisingly large. Table IV also shows a strong dependence of power on the slope difference $a$.

Table IV. Estimates of power for detecting a differential slope in an LMM-based growth-curve model with a sample size of 70. Each entry gives power for between-group slope differences a = 0.5/1.0; rows give the within-cluster standard deviation σ, and columns run over increasing values of the ratio σ_b/σ of between- to within-cluster standard deviation.

σ     power (a = 0.5/1.0), increasing σ_b/σ →
…     0.99/1.0   0.91/0.99  0.62/0.99  0.40/0.92  0.28/0.79  0.11/0.29
4     0.99/1.0   0.69/0.99  0.40/0.92  0.25/0.73  0.18/0.54  0.08/0.18
5     0.93/0.99  0.51/0.98  0.27/0.77  0.18/0.54  0.13/0.38  0.07/0.13

Example 5 (Intraclass correlation). We now illustrate a cross-sectional clustered design in psychotherapy research. An important consideration in designing psychosocial studies is the so-called therapists' effect. Because therapists may differ in their skill or ability to form a therapeutic bond, there are often real differences between therapists in their average outcomes [39]. Thus, studies of the efficacy or effectiveness of therapy often have a built-in component for testing this effect. The LMM is often used for this purpose.

Table V. Estimates of power for detecting a treatment difference between two groups, with a sample of 70 therapists per group and 4 patients per therapist. Each entry gives power for treatment differences a = 1.5/1.0; rows give the total variance σ_b² + σ², and columns run over increasing values of the intraclass correlation ρ = σ_b²/(σ_b² + σ²).

σ_b² + σ²   power (a = 1.5/1.0), increasing ρ →
…           0.99/0.93  0.99/0.82  0.96/0.73  0.92/0.61  0.87/0.54
6           0.97/0.74  0.90/0.57  0.80/0.46  0.71/0.40  0.64/…
…           0.88/0.55  0.73/0.40  0.61/0.32  0.52/0.27  0.45/0.23

Let $n$ denote the number of therapists and $m$ the number of patients seen by each therapist. Treating $n$ as the number of clusters and $m$ as the size of each cluster, patients' responses $y_{it}$ can be modelled using the following LMM:

$$y_{it} = \beta_0 + x_i\beta_1 + b_i + \epsilon_{it}, \qquad b_i \sim N(0, \sigma_b^2), \qquad \epsilon_{it} \sim N(0, \sigma^2), \qquad 1 \le i \le n, \; 1 \le t \le m \qquad (35)$$

where $x_i$ is a predictor of interest. In (35), the therapists' effect is explicitly modelled by the random effect $b_i$. It follows from Section 2.2 that the asymptotic variance of the MLE of $\boldsymbol{\beta} = (\beta_0, \beta_1)^\top$ is given by:

$$\Sigma_\beta = E^{-1}\left[X_i^\top V_i^{-1} X_i\right], \qquad V_i = \sigma_b^2 J_m + \sigma^2 I_m = (\sigma_b^2 + \sigma^2)\, C(\rho) \qquad (36)$$

where $J_m$ is an $m \times m$ matrix of 1s and $C(\rho)$ is the uniform compound symmetry correlation matrix with $\rho = \sigma_b^2/(\sigma_b^2 + \sigma^2)$. This within-cluster correlation is widely known as the intraclass correlation. Thus, the asymptotic variance is a function of the first two moments of $x_i$, the intraclass correlation and the total variance $\sigma_b^2 + \sigma^2$. Consider the hypothesis:

$$H_0: \beta_1 = 0 \quad \text{versus} \quad H_a: \beta_1 = a \neq 0 \qquad (37)$$

If $x_i$ is binary, indicating, say, two treatment conditions, (37) tests for a differential treatment effect. If $x_i$ is a continuous covariate, then (37) tests for a linear relationship between $y_{it}$ and $x_i$. Shown in Table V are power estimates for detecting a non-zero treatment difference between two samples ($x_i = 0, 1$). As in all previous examples, power decreases as the intraclass correlation increases. As in Example 4, power also depends heavily on the size of the treatment difference $a$. As noted in Section 2, cross-sectional clustered study designs also often arise in survey and epidemiological research [10, 34]. However, unlike psychosocial applications, where the intraclass correlation is also of interest for assessing the therapists' effect, survey and epidemiological studies are mostly interested in estimating population means (or fixed effects), in which case one can apply either GEE or LMM. Thus, LMM is more appropriate for psychosocial applications involving inference for intraclass correlations.
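The variance calculations behind Examples 3 and 5 can be sketched numerically. The Python below is a minimal sketch, not the authors' implementation: it assumes that power for a single-parameter hypothesis such as (30) or (37) comes from a two-sided Wald test with the normal approximation $\operatorname{Var}(\hat\beta_1) \approx \Sigma_\beta[1,1]/n$, where $n$ is the total number of clusters (couples or therapists) and the two groups have equal size. $\Sigma_\beta$ is built from (31) for the couples design and from (36) for the therapist design. It uses NumPy, and every function and parameter name is hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def wald_power_1df(effect, sigma_11, n_clusters, alpha=0.05):
    """Two-sided power for H0: beta_1 = 0 versus Ha: beta_1 = effect.

    Assumes beta_1_hat is approximately normal with variance sigma_11 / n_clusters,
    where sigma_11 is the (beta_1, beta_1) entry of the per-cluster Sigma_beta.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(effect) / math.sqrt(sigma_11 / n_clusters)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def sigma_beta_couples(p0, p1, rho, prob_treat=0.5):
    """Sigma_beta as in (31): logistic GEE, binary group x, clusters of size 2.

    rho is the within-couple correlation. Binary correlations are bounded by the
    rates, and this sketch does not check that rho is attainable.
    """
    info = np.zeros((2, 2))
    for x, weight, h in ((0, 1 - prob_treat, p0), (1, prob_treat, p1)):
        v = h * (1 - h)                               # Bernoulli variance h(1 - h)
        D = v * np.array([[1.0, x], [1.0, x]])        # d mu_i / d beta' for both partners
        V = v * np.array([[1.0, rho], [rho, 1.0]])    # working covariance V_i(x)
        info += weight * D.T @ np.linalg.solve(V, D)  # Pr[x = l] * D' V^{-1} D
    return np.linalg.inv(info)


def sigma_beta_therapists(m, total_var, icc, prob_treat=0.5):
    """Sigma_beta as in (36): random-intercept LMM, binary x, m patients per therapist."""
    # (sigma_b^2 + sigma^2) * C(rho), i.e. sigma_b^2 J_m + sigma^2 I_m
    V = total_var * ((1 - icc) * np.eye(m) + icc * np.ones((m, m)))
    info = np.zeros((2, 2))
    for x, weight in ((0, 1 - prob_treat), (1, prob_treat)):
        X = np.column_stack([np.ones(m), np.full(m, float(x))])  # rows (1, x_i)
        info += weight * X.T @ np.linalg.solve(V, X)
    return np.linalg.inv(info)


def logit(p):
    return math.log(p / (1 - p))


if __name__ == "__main__":
    # Illustrative inputs only; these are not the settings behind Tables III and V.
    # 140 clusters in total corresponds to 70 per group.
    p0, p1 = 0.20, 0.10
    effect = logit(p1) - logit(p0)  # beta_1 under Ha in (30)
    for rho in (0.1, 0.3, 0.5):
        s = sigma_beta_couples(p0, p1, rho)
        print(f"couples rho={rho}: power={wald_power_1df(effect, s[1, 1], 140):.3f}")
    for icc in (0.05, 0.1, 0.2):
        s = sigma_beta_therapists(m=4, total_var=6.0, icc=icc)
        print(f"therapists icc={icc}: power={wald_power_1df(1.0, s[1, 1], 140):.3f}")
```

Under these assumptions, $\Sigma_\beta[1,1]$ reduces to $(1+\rho)\{1/[p_0(1-p_0)] + 1/[p_1(1-p_1)]\}$ for the couples design and to $4(\sigma_b^2+\sigma^2)\{1+(m-1)\rho\}/m$ for the therapist design. This makes the qualitative patterns described above visible: power falls as the correlation rises, and for the binary outcome it depends on the two rates themselves as well as on their logit difference.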

Figure 1. Three missing-data patterns for a longitudinal study with six subjects and three assessment points ($y_{it}$ denotes the response of the $i$th subject at time $t$, with dots denoting missing data). Panels: Pattern 1, Pattern 2 and Pattern 3, each showing the observed responses by time of assessment (1, 2, 3).

4. DISCUSSION

In this paper, we have developed a systematic approach to power analysis for the two most popular clustered data approaches, GEE and LMM. By extending existing methods to accommodate more practical considerations, this unified approach addresses the limitations of these methods and provides power and sample size estimation for quite general study designs under both modelling paradigms.

One major limitation of the proposed approach is the assumption of a common, constant cluster size across clusters. In survey research and most epidemiological studies, cluster sizes often vary [10, 34]. Varying cluster sizes also arise in longitudinal studies as the result of missing data. In the latter case, we must also address the order structure among the repeated assessments, in addition to differences in cluster sizes. For data analysis, both the varying cluster size and the order structure are readily addressed by applying GEE or LMM to the observed data, provided that the missing data follow the missing completely at random assumption (for GEE) or the missing at random assumption (for LMM) [3, 40, 41]. For power analysis, both become important considerations, since cluster size and order structure are a function of the sampling process and change dynamically under replications of the same study design. For example, Figure 1 shows three possible missing data patterns for a hypothetical longitudinal study with six subjects and three assessments. In a real study, only one of the patterns is observed, and inference for model parameters of interest is performed by conditioning on the observed missing data pattern using GEE and/or LMM. For power analysis, however, we must treat all three, plus many more potential missing data patterns, as realizations of a random process. Currently, there is no general approach to addressing the random nature of the missing data patterns. Existing methods provide power estimates either conditional on a particular missing data pattern or based on certain marginal distributions of the missing data pattern [5, 6, 9, 10]. Since different missing data patterns generally give rise to different power functions, the former approach does not address the effect of the dynamic missing data patterns on power estimates. The latter approach is also ineffective in addressing the different missing data patterns. For example, one such method for longitudinal study designs accounts for missing data by conditioning on the available sample size at each assessment point; however, this method does not distinguish the first two patterns in Figure 1. Methods used in survey research and epidemiological studies only account for differences in cluster sizes [10] and as such cannot distinguish the last two missing data patterns in the figure. Thus, to fundamentally address the missing data issue, it seems necessary to model the missing data process and incorporate this model into the power function. Developing such an approach will be pursued in our future research.

ACKNOWLEDGEMENTS

This research is supported in part by an NIH/NIMH Grant P50-MH (Crits-Christoph and Tu) and by an NIH/NIAID Grant K22AI 586 (Kowalski). We especially thank two anonymous reviewers for bringing to our attention many important references and for numerous valuable comments that greatly improved the presentation of the material.

REFERENCES

1. Clarke GN. Improving the transition from basic efficacy research to effectiveness studies: methodological issues and procedures. Journal of Consulting and Clinical Psychology 1995; 63.
2. Hoagwood K, Hibbs E, Brent D, Jensen P. Introduction to the special section: efficacy and effectiveness in studies of child and adolescent psychotherapy. Journal of Consulting and Clinical Psychology 1995; 63.
3. Hogarty GE, Schooler NR, Baker RW. Efficacy versus effectiveness. Psychiatric Services 1997; 48:1107.
4. Muller KE, LaVange LM, Ramey SL, Ramey CT. Power calculations for general linear multivariate models including repeated measures applications. Journal of the American Statistical Association 1992; 87.
5. Wu M. Sample size for comparison of changes in the presence of right censoring caused by death, withdrawal, and staggered entry. Controlled Clinical Trials 1988; 9.
6. Lee JW, DeMets DL. Sequential comparison of changes with repeated measurements data. Journal of the American Statistical Association 1991; 86.
7. Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. Oxford University Press: New York.
8. Rochon J. Application of GEE procedures for sample size calculations in repeated measures experiments. Statistics in Medicine 1998; 17.
9. Hedeker D, Gibbons RD, Waternaux C. Sample size estimation for longitudinal designs with attrition: comparing time-related contrasts between two groups. Journal of Educational and Behavioral Statistics 1999; 24.
10. Manatunga AK, Hudgens MG, Chen S. Sample size estimation in cluster randomized studies with varying cluster size. Biometrical Journal 2001; 43:75–86.
11. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986; 73:13–22.
12. Zeger SL, Liang KY. Longitudinal data analysis for discrete and continuous outcomes. Biometrics 1986; 42.
13. Laird N, Ware J. Random-effects models for longitudinal data. Biometrics 1982; 38.
14. Stiratelli R, Laird N, Ware JH. Random-effects models for serial observations with binary response. Biometrics 1984; 40.
15. Gilmour AR, Anderson RD, Rae AL. The analysis of binomial data by a generalized linear mixed model. Biometrika 1985; 72.
16. Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 1993; 88.
17. Davidian M, Gallant AR. The nonlinear mixed effects model with a smooth random effects density. Biometrika 1993; 80.
18. Wolfinger R. Laplace's approximation for nonlinear mixed models. Biometrika 1993; 80.
19. Pinheiro JC, Bates DM. Approximations to the log-likelihood function in the non-linear mixed-effects model. Journal of Computational and Graphical Statistics 1995; 4.
20. Goldstein H, Rasbash J. Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A 1996; 159:505–513.
21. Lin X, Breslow NE. Bias correction in generalized linear mixed models with multiple components of dispersion. Journal of the American Statistical Association 1996; 91.
22. Wang N, Lin X, Gutierrez G, Carroll RJ. Bias analysis and SIMEX approach in generalized linear mixed measurement error models. Journal of the American Statistical Association 1998; 93.
23. Agresti A, Booth JG, Hobert JP, Caffo B. Random effects modelling of categorical response data. Technical Report, Department of Statistics, University of Florida, Gainesville, FL.
24. Lipsitz SR, Kyungmann K, Zhao L. Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine 1994; 13.
25. Searle SR. Linear Models. Wiley: New York.
26. McCullagh P, Nelder JA. Generalized Linear Models (2nd edn). Chapman & Hall: London.
27. Pepe MS, Anderson GL. A cautionary note on inferences for marginal regression models with longitudinal data and general correlated response. Communications in Statistics, Part A: Theory and Methods 1994; 23.
28. Billingsley P. Probability and Measure (2nd edn). Wiley: New York.
29. Chung KL. A Course in Probability Theory (2nd edn). Academic Press: CA.
30. Jennrich RI, Schluchter MD. Unbalanced repeated-measures models with structured covariance matrices. Biometrics 1986; 42.
31. Serfling RJ. Approximation Theorems of Mathematical Statistics. Wiley: New York.
32. Seber GAF, Wild CJ. Nonlinear Regression. Wiley: New York.
33. Seber GAF. Multivariate Observations. Wiley: New York.
34. Donner A, Klar N. Cluster randomization trials in epidemiology: theory and applications. Journal of Statistical Planning and Inference 1994; 42.
35. Cohen J. Statistical Power Analysis for the Behavioral Sciences (2nd edn). Lawrence Erlbaum Associates: New Jersey.
36. Neter J, Wasserman W, Kutner MH. Applied Linear Statistical Models (3rd edn). Irwin: Illinois.
37. Geweke J. Bayesian inference in econometric models using Monte Carlo integration. Econometrica 1989; 57.
38. Whittemore AS. Sample size for logistic regression with small response probability. Journal of the American Statistical Association 1981; 76.
39. Crits-Christoph P, Mintz J. Implications of therapist effects for the design and analysis of comparative studies of psychotherapy. Journal of Consulting and Clinical Psychology 1991; 59.
40. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley: New York.
41. Robins J, Rotnitzky A, Zhao LP. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association 1995; 90:106–121.


PROD. TYPE: COM. Simple improved condence intervals for comparing matched proportions. Alan Agresti ; and Yongyi Min UNCORRECTED PROOF pp: --2 (col.fig.: Nil) STATISTICS IN MEDICINE Statist. Med. 2004; 2:000 000 (DOI: 0.002/sim.8) PROD. TYPE: COM ED: Chandra PAGN: Vidya -- SCAN: Nil Simple improved condence intervals for comparing matched

More information

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004

Estimation in Generalized Linear Models with Heterogeneous Random Effects. Woncheol Jang Johan Lim. May 19, 2004 Estimation in Generalized Linear Models with Heterogeneous Random Effects Woncheol Jang Johan Lim May 19, 2004 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure

More information

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM

An R # Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM An R Statistic for Fixed Effects in the Linear Mixed Model and Extension to the GLMM Lloyd J. Edwards, Ph.D. UNC-CH Department of Biostatistics email: Lloyd_Edwards@unc.edu Presented to the Department

More information

Discrete Dependent Variable Models

Discrete Dependent Variable Models Discrete Dependent Variable Models James J. Heckman University of Chicago This draft, April 10, 2006 Here s the general approach of this lecture: Economic model Decision rule (e.g. utility maximization)

More information

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization

Statistics and Probability Letters. Using randomization tests to preserve type I error with response adaptive and covariate adaptive randomization Statistics and Probability Letters ( ) Contents lists available at ScienceDirect Statistics and Probability Letters journal homepage: wwwelseviercom/locate/stapro Using randomization tests to preserve

More information

Longitudinal Data Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago

Longitudinal Data Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago Longitudinal Data Analysis Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago Course description: Longitudinal analysis is the study of short series of observations

More information

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS

SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE SEQUENTIAL DESIGN IN CLINICAL TRIALS Journal of Biopharmaceutical Statistics, 18: 1184 1196, 2008 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543400802369053 SAMPLE SIZE RE-ESTIMATION FOR ADAPTIVE

More information

TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies

TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies STATISTICS IN MEDICINE Statist. Med. 2004; 23:1455 1497 (DOI: 10.1002/sim.1728) TUTORIAL IN BIOSTATISTICS Handling drop-out in longitudinal studies Joseph W. Hogan 1; ;, Jason Roy 2; and Christina Korkontzelou

More information

Correspondence Analysis of Longitudinal Data

Correspondence Analysis of Longitudinal Data Correspondence Analysis of Longitudinal Data Mark de Rooij* LEIDEN UNIVERSITY, LEIDEN, NETHERLANDS Peter van der G. M. Heijden UTRECHT UNIVERSITY, UTRECHT, NETHERLANDS *Corresponding author (rooijm@fsw.leidenuniv.nl)

More information

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements:

Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis. Acknowledgements: Tutorial 6: Tutorial on Translating between GLIMMPSE Power Analysis and Data Analysis Anna E. Barón, Keith E. Muller, Sarah M. Kreidler, and Deborah H. Glueck Acknowledgements: The project was supported

More information

Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions K. Krishnamoorthy 1 and Dan Zhang University of Louisiana at Lafayette, Lafayette, LA 70504, USA SUMMARY

More information

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC

Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC A Simple Approach to Inference in Random Coefficient Models March 8, 1988 Marcia Gumpertz and Sastry G. Pantula Department of Statistics North Carolina State University Raleigh, NC 27695-8203 Key Words

More information

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data Quality & Quantity 34: 323 330, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 323 Note Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University

Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University Projected partial likelihood and its application to longitudinal data SUSAN MURPHY AND BING LI Department of Statistics, Pennsylvania State University, 326 Classroom Building, University Park, PA 16802,

More information

Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

More information

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution

Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Semiparametric Mixed Effects Models with Flexible Random Effects Distribution Marie Davidian North Carolina State University davidian@stat.ncsu.edu www.stat.ncsu.edu/ davidian Joint work with A. Tsiatis,

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

A weighted simulation-based estimator for incomplete longitudinal data models

A weighted simulation-based estimator for incomplete longitudinal data models To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

SAS Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates

SAS Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates Paper 10260-2016 SAS Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates Katherine Cai, Jeffrey Wilson, Arizona State University ABSTRACT Longitudinal

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis

Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis Glenn Heller Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center,

More information

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments

Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Finite Sample Performance of A Minimum Distance Estimator Under Weak Instruments Tak Wai Chau February 20, 2014 Abstract This paper investigates the nite sample performance of a minimum distance estimator

More information

An Approximate Test for Homogeneity of Correlated Correlation Coefficients

An Approximate Test for Homogeneity of Correlated Correlation Coefficients Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE

More information

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES

MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES REVSTAT Statistical Journal Volume 13, Number 3, November 2015, 233 243 MARGINAL HOMOGENEITY MODEL FOR ORDERED CATEGORIES WITH OPEN ENDS IN SQUARE CONTINGENCY TABLES Authors: Serpil Aktas Department of

More information

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use

Modeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression

More information

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL

A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL A TWO-STAGE LINEAR MIXED-EFFECTS/COX MODEL FOR LONGITUDINAL DATA WITH MEASUREMENT ERROR AND SURVIVAL Christopher H. Morrell, Loyola College in Maryland, and Larry J. Brant, NIA Christopher H. Morrell,

More information

Fisher information for generalised linear mixed models

Fisher information for generalised linear mixed models Journal of Multivariate Analysis 98 2007 1412 1416 www.elsevier.com/locate/jmva Fisher information for generalised linear mixed models M.P. Wand Department of Statistics, School of Mathematics and Statistics,

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Directed acyclic graphs and the use of linear mixed models

Directed acyclic graphs and the use of linear mixed models Directed acyclic graphs and the use of linear mixed models Siem H. Heisterkamp 1,2 1 Groningen Bioinformatics Centre, University of Groningen 2 Biostatistics and Research Decision Sciences (BARDS), MSD,

More information

Weighted tests of homogeneity for testing the number of components in a mixture

Weighted tests of homogeneity for testing the number of components in a mixture Computational Statistics & Data Analysis 41 (2003) 367 378 www.elsevier.com/locate/csda Weighted tests of homogeneity for testing the number of components in a mixture Edward Susko Department of Mathematics

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California

Ronald Christensen. University of New Mexico. Albuquerque, New Mexico. Wesley Johnson. University of California, Irvine. Irvine, California Texts in Statistical Science Bayesian Ideas and Data Analysis An Introduction for Scientists and Statisticians Ronald Christensen University of New Mexico Albuquerque, New Mexico Wesley Johnson University

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Sample size determination for logistic regression: A simulation study

Sample size determination for logistic regression: A simulation study Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This

More information

University of California, Berkeley

University of California, Berkeley University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population

More information

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35

Rewrap ECON November 18, () Rewrap ECON 4135 November 18, / 35 Rewrap ECON 4135 November 18, 2011 () Rewrap ECON 4135 November 18, 2011 1 / 35 What should you now know? 1 What is econometrics? 2 Fundamental regression analysis 1 Bivariate regression 2 Multivariate

More information

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution Peng WANG, Guei-feng TSAI and Annie QU 1 Abstract In longitudinal studies, mixed-effects models are

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Comparing Group Means When Nonresponse Rates Differ

Comparing Group Means When Nonresponse Rates Differ UNF Digital Commons UNF Theses and Dissertations Student Scholarship 2015 Comparing Group Means When Nonresponse Rates Differ Gabriela M. Stegmann University of North Florida Suggested Citation Stegmann,

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances

Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances Tutorial 4: Power and Sample Size for the Two-sample t-test with Unequal Variances Preface Power is the probability that a study will reject the null hypothesis. The estimated probability is a function

More information