Correlated data. Non-normal outcomes


Faculty of Health Sciences

Correlated data: Non-normal outcomes
Lene Theil Skovgaard
December 5, 2014

Non-normal outcomes
- Generalized linear models
- Generalized linear mixed models
  - Population average models (PA)
  - Subject specific models (SS)
- Examples with counts
  - Leprosy
  - Seizures (briefly)
- Two examples with binary outcome
  - Amenorrhea (longitudinal)
  - Smoking among school children (cluster)

Non-normal data
Typical data from e.g. epidemiology are often not normally distributed (binary, ordinal, counts, survival, ...).
Generalized linear models (in exponential families) are multiple regression models, on a scale that corresponds to the data:
- Normal (link=identity): traditional linear models
- Binomial (link=logit): logistic regression
- Poisson (link=log): log-linear models, Poisson regression

Reminder on binary data
Examples of binary outcomes:
- infection after surgery
- smoking among school children
- amenorrhea among contracepting women
A binary variable U has a Bernoulli distribution, meaning that
  $P(U = 1) = p$, $P(U = 0) = 1 - p$.
For such an outcome, the mean value is $p$ and the variance is $p(1 - p)$.

Binomial data
If we sum up binary observations,
  $Y = \sum_{i=1}^{n} U_i = U_1 + \cdots + U_n$,
e.g.
- number of infections for each hospital
- number of smokers in each school class
- number of women with amenorrhea for each general practice
we get a Binomial distribution, $Y \sim \mathrm{Bin}(n, p)$.

Examples of Binomial distributions
[Figure: Binomial distributions for n = 4, 20 and 50; p = 0.02, 0.2 and 0.5]

Binomial distribution, and approximations
The Binomial variable Y has point probabilities
  $P(Y = m) = \binom{n}{m} p^m (1 - p)^{n - m}$.
Its mean is $np$ and its variance is $np(1 - p)$.
When n is large, this distribution is cumbersome to work with, so we use approximations:
- p moderate (not too close to 0 or 1): Normal distribution
- p close to either 0 or 1: Poisson distribution

Poisson distribution
Counts with no well-defined upper limit:
- the number of cancer cases in a specific community during a specific year
- the number of positive swabs over a certain period of time
Law of rare events: As the count parameter n in a Binomial distribution gets larger and the parameter p gets close to 0 (or close to 1, in which case the non-events $n - Y$ are counted instead), the Binomial probabilities are approximately equal to the Poisson probabilities
  $P(Y = m) = \frac{\lambda^m}{m!} \exp(-\lambda)$,
where $\lambda = np$ is the mean value, as well as the variance.
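The law of rare events can be checked numerically. The following is a minimal Python sketch (not part of the original slides); the values n = 1000 and p = 0.002 are hypothetical and chosen only to represent "large n, small p".

```python
from math import comb, exp, factorial


def binomial_pmf(m: int, n: int, p: float) -> float:
    """Point probability P(Y = m) for Y ~ Bin(n, p)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def poisson_pmf(m: int, lam: float) -> float:
    """Point probability P(Y = m) for a Poisson distribution with mean lam."""
    return lam**m / factorial(m) * exp(-lam)


# Hypothetical illustration values: many trials, rare event.
n, p = 1000, 0.002
lam = n * p  # Poisson mean (and variance)

# Largest pointwise difference between the two sets of probabilities.
max_gap = max(
    abs(binomial_pmf(m, n, p) - poisson_pmf(m, lam)) for m in range(21)
)
print(f"lambda = {lam}, largest pointwise difference = {max_gap:.6f}")
```

With a moderate p (say 0.2) and the same n, the difference is no longer negligible, which is why the Normal approximation is used in that case.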

Generalized linear models
Outcome variable $Y_i$, with a distribution from an exponential family (includes Normal, Binomial, Poisson, Gamma, ...), with
- Mean value: $\mu_i$
- Link function: g, assumed linear in covariates, i.e.
    $g(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} = X_i^T \beta$
  where $X_i$ denotes the covariate vector for individual i.
Typical choices: Normal (link=identity), Binomial (link=logit), Poisson (link=log)

Generalized linear MIXED models
Outcome variable $Y_{ij}$, e.g. the j'th measurement time for individual i:
- Mean value: $\mu_{ij}$
- Link function: g, assumed linear in the covariate vector $X_{ij}$
Two kinds of models:
- Population average models (PA):
    $g(\mu_{ij}) = \beta_0 + \beta_1 x_{ij1} + \cdots + \beta_k x_{ijk} = X_{ij}^T \beta$
  and $(Y_{ij_1}, Y_{ij_2})$ are associated (correlated)
- Subject-specific models (SS):
    $g(\mu_{ij}) = \beta_0 + \beta_1 x_{ij1} + \cdots + \beta_k x_{ijk} + b_i$, with $b_i \sim N(0, \sigma_b^2)$

The two model types
- Marginal models, or Population average (PA): Describe covariate effects on the population mean, e.g. expected difference between the effects of two treatments (corresponds to the repeated statement)
- Mixed effects models, or Subject specific (SS): Describe covariate effects on specific individuals (or clusters), e.g. expected change over time, or differences between boys and girls in the same school class (corresponds to the random statement)

Marginal models = Population Average (PA)
We specify only
- Marginal mean, $E(Y_{ij} \mid X_{ij}) = \mu_{ij}$, where $g(\mu_{ij}) = X_{ij}^T \beta$, i.e. covariate effects as usual
- Distribution (Normal, Binomial, Poisson, ...)
- Marginal variance, $\phi V(\mu_{ij})$, depending on distribution
- Some measure of association for Y's belonging to the same individual/unit
This creates problems:
- Multivariate Binomial and Poisson distributions do not exist
- It is more of an estimation procedure than a model

Marginal models, technicalities
Since we do not actually have a model, we cannot use a maximum likelihood approach. Instead, we use a GEE: Generalized estimating equation (written in vector notation)
  $\sum_i D_i^T V_i^{-1} (y_i - \mu_i) = 0$
where $V_i$ is the (working) covariance matrix $\mathrm{Cov}(Y_i)$ and $D_i$ is the matrix of derivatives of the mean value $\mu_i$ with respect to $\beta$.

Marginal models, technicalities II
The GEE method
- requires an iterative procedure
- gives consistent estimates of $\beta$ (they are close to the true values when the sample size is large), even if the working $\mathrm{Cov}(Y_i)$ is incorrect
- gives estimates that are asymptotically Normal (i.e. for large sample size, we can construct confidence intervals as plus/minus 2 standard errors)
- the standard error of $\hat\beta$ should be based on the empirical sandwich estimator, to allow for possible overdispersion and general misspecification of $\mathrm{Cov}(Y_i)$

Residual variance for non-normal data
In general, there is no free variance parameter, since the variance is determined from the mean value:
- Normal (link=identity): free variance parameter $\sigma^2$
- Binomial (link=logit): variance $np(1 - p)$
- Poisson (link=log): variance $\lambda = E(Y)$

Overdispersion
The variance may be seen to be larger than determined by the distribution. This can be caused by
- omitted covariates (isn't that always the case?)
- unrecognized clusters
- heterogeneity, e.g. a "zero" group (non-susceptibles)
Traditional solution: An overdispersion parameter $\phi$ is estimated and multiplied onto the variance.
More generally: Use the empirical sandwich estimator, which relies on the empirical rather than the model-based $\mathrm{Cov}(Y_i)$.
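To make the sandwich idea concrete, here is a minimal Python sketch (not from the slides) for the simplest possible case: an intercept-only Poisson model with log link and an independence working correlation. In that case the estimating equation reduces to $\sum_i \sum_j (y_{ij} - \mu) = 0$, so $\hat\mu$ is the overall mean and both standard errors have closed forms. The counts are hypothetical.

```python
from math import log, sqrt

# Hypothetical clustered counts: two measurements per individual.
clusters = [[11, 6], [8, 0], [5, 2], [14, 8], [19, 11], [6, 4]]


def intercept_only_poisson_gee(data):
    """Intercept-only Poisson GEE with independence working correlation.

    Returns (beta_hat, model_based_se, sandwich_se) for beta = log(mu).
    """
    n_obs = sum(len(c) for c in data)
    mu = sum(sum(c) for c in data) / n_obs  # solves the estimating equation
    beta = log(mu)

    # "Bread": sum_i D_i' V_i^{-1} D_i with D_i = mu * 1 and V_i = mu * I.
    bread = n_obs * mu
    # "Meat": sum_i (D_i' V_i^{-1} (y_i - mu))^2 = sum_i (sum_j (y_ij - mu))^2.
    meat = sum(sum(y - mu for y in c) ** 2 for c in data)

    model_se = sqrt(1.0 / bread)     # assumes Var(Y) = mu and independence
    sandwich_se = sqrt(meat) / bread  # empirical ("robust") standard error
    return beta, model_se, sandwich_se


beta_hat, model_se, sandwich_se = intercept_only_poisson_gee(clusters)
print(f"beta = {beta_hat:.3f}, model SE = {model_se:.3f}, sandwich SE = {sandwich_se:.3f}")
```

For these overdispersed, within-cluster correlated counts the sandwich standard error is roughly twice the model-based one, which is the kind of discrepancy the slides warn about.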

Mixed effects models = Subject Specific models (SS)
Observations: $Y_{ij}$, covariate vector $X_{ij}$.
Additional covariate vector $Z_{ij}$, specifying the random effects.
We specify
- Mean, $E(Y_{ij} \mid X_{ij}, b_i) = \mu_{ij}$, where $g(\mu_{ij}) = X_{ij}^T \beta + Z_{ij}^T b_i$
- Distribution (Normal, Binomial, Poisson, ...)
- Conditional variance, $\phi V(\mu_{ij})$
- Variance of random effects, $b_i \sim N_p(0, G)$, where G is the matrix (and software) notation for $\sigma_b^2$
- Conditional independence, given the covariates and the random effects

Interpretation of SS
This is a real model, but
- Inference is conditional on random effects and therefore specific to the subject
- It is very difficult to interpret the effect of covariates that are constant within an individual (e.g. gender, treatment etc.)
It may be useful to think about it as
- The individual is a (class) covariate
- The effect of another covariate is interpreted as "for fixed value of all other covariates", including "for fixed value of the individual"

For traditional linear models (Normality)
With identity link, the
  subject-specific model with random intercept/level
is equal to the
  marginal model with compound symmetry covariance structure (type=cs).
More generally: The interpretation of the parameters $\beta$ does not depend on the way that we model the correlation (although the estimate may change somewhat depending on the assumed structure).
This implies that effects may either be interpreted cross-sectionally (marginally, for comparison of different populations, say, of different age) or subject-specifically (effect of ageing for a single individual).

For non-normal outcomes
The above is no longer true, due to the non-linearity of the link function.
This means:
- The interpretation of the parameters $\beta$ does depend on the way that we model the correlation.
- The parameters of the two model types have different interpretations!

A very simple example
Two individuals, with probabilities at baseline and follow-up.
Table columns: Individual, Baseline, Follow-up, Difference, log(OR), OR, with one row per individual and one row for the average (values not preserved in this transcription).
The log odds ratio for the average is 0.811, and OR = 2.25.
The average of the individual ORs is larger than the OR calculated from the average probabilities.

Hypothetical example for illustration
[Figure: subject specific model with a covariate effect (x-axis) and 21 clusters ($b_i$, individual curves). The red curve denotes the population average curve.]

Population average on logit scale
[Figure: the same curves on logit scale]
SS specifies parallel lines on logit scale, but the PA deviates somewhat from a straight line and has a smaller slope (smaller effect of covariate x).

Interpretations
Example: The need for glasses increases with age.
- Marginally: the odds ratio for being in need of glasses for a population with mean age 50, compared to a population with mean age 30,
is smaller than
- Subject specific: the odds ratio for needing glasses when you (a specific individual) are of age 50, compared to when you were at age 30.
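Since the numbers of the two-individual table are not preserved, the following minimal Python sketch uses hypothetical probabilities, chosen only so that the averages reproduce the quoted OR of 2.25 (log OR 0.811). They are an assumption, not the original slide values.

```python
from math import log


def odds(p: float) -> float:
    return p / (1 - p)


def odds_ratio(p_followup: float, p_baseline: float) -> float:
    return odds(p_followup) / odds(p_baseline)


# Hypothetical (baseline, follow-up) probabilities for two individuals.
individuals = [(0.2, 0.4), (0.6, 0.8)]

# Subject-specific view: one odds ratio per individual.
individual_ors = [odds_ratio(followup, baseline) for baseline, followup in individuals]

# Population-average view: odds ratio of the averaged probabilities.
mean_baseline = sum(b for b, _ in individuals) / len(individuals)
mean_followup = sum(f for _, f in individuals) / len(individuals)
population_or = odds_ratio(mean_followup, mean_baseline)

print("individual ORs:", [round(o, 3) for o in individual_ors])
print("OR of averages:", round(population_or, 3), "log OR:", round(log(population_or), 3))
```

Both individuals have the same odds ratio (about 2.67), yet the odds ratio computed from the averaged probabilities is smaller: this is the attenuation of the population average effect.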

Counts of leprosy bacilli
Reference: Snedecor, G.W. and Cochran, W.G. (1967). Statistical Methods (6th edn). Iowa State University Press.
Controlled clinical trial:
- 10 patients treated with placebo P
- 10 patients treated with antibiotic A
- 10 patients treated with antibiotic B
Recording of the number of bacilli at six sites of the body, i.e. a count variable
- before treatment (baseline, time=0)
- several months after treatment (time=1)

Averages for the leprosy example
Analysis Variable: bacilli
Table columns: drug, time, N Obs, N, Mean, Variance, with rows for drugs A, B and P at each time (values not preserved in this transcription).
Note: The variance is obviously bigger than the average ... overdispersion.

Spaghettiplot - the leprosy example
[Figure; legends: A, B, P]

Average plot - the leprosy example
[Figure; legends: A, B, P]

Purpose of investigation
1. Evaluate the effect of antibiotics: red vs green lines
2. Compare the two drugs, A and B: solid vs dotted red lines
3. Quantify the effects of the two antibiotic drugs (SS)
Randomization: At baseline, all patients have the same expected count (mean value), but by chance, the placebo individuals have larger values than the remaining groups.

Why is this not simple?
This is just a before-after study...
But we are dealing with non-negative counts, so we do not have a normal distribution, although it may be a reasonable approximation...
- Can't we just take logarithms? No, because we have zeroes.
- Some other transformation then? Yes, square roots, or arcsine, but the interpretation would suffer a lot.
- Could we just condition on the baseline value? Yes, we could do that... but it becomes more tricky when we have multiple time points.

Model reflections
- We are dealing with counts, so it is natural to consider a Poisson distribution, with log-link (natural log)
- Because it is a randomized study, the mean values at baseline should be identical for the three groups
- We are prepared to see 3 different changes over time, but some of these may be identical (this is actually the main scientific question)
- Baseline and follow-up measurements are correlated within individuals

Model reflections, II
Parametrization of mean values (on the log-scale):

  Treatment  Period     Mean (on log scale)
  P          Baseline   $\beta_1$
  P          Follow-up  $\beta_1 + \beta_2$
  A          Baseline   $\beta_1$
  A          Follow-up  $\beta_1 + \beta_2 + \beta_3$
  B          Baseline   $\beta_1$
  B          Follow-up  $\beta_1 + \beta_2 + \beta_4$

$\beta_3$ and $\beta_4$ denote the additional effects of A and B, when compared to placebo.

Marginal model (PA) for leprosy

    * data step statements defining the drug-by-time indicators;
    A_effect=(drug='A')*time;
    B_effect=(drug='B')*time;

    proc genmod data=leprosy;
      class id;
      model bacilli = time A_effect B_effect / d=poisson link=log;
      repeated subject=id / type=un corrw;
      contrast 'Antibiotic effect' A_effect 1, B_effect 1 / wald;
      contrast 'Effect of A equals B?' A_effect 1 B_effect -1 / wald;
      estimate 'Effect B minus A' A_effect 1 B_effect -1;
      estimate "changes for A" time 1 A_effect 1;
      estimate "changes for B" time 1 B_effect 1;
      output out=pa pred=pred_pa xbeta=xbeta_pa;
    run;

Comments to code
- time indicates the change over time for the placebo group (the parameter $\beta_2$)
- A_effect indicates the additional change over time for drug A (the parameter $\beta_3$)
- B_effect indicates the additional change over time for drug B (the parameter $\beta_4$)
- d=poisson: specifies the Poisson distribution, i.e. log as the default link function and the variance as (proportional to) the mean
- link=log: may overrule the default link function from dist=poisson, if so needed
- repeated: specifies an association between measurements on the same id (corrw requests printing of the working correlation matrix)

Comments to code, II
estimate statements: Estimate combinations of the $\beta$'s, here
- $\beta_4 - \beta_3$
- $\beta_2 + \beta_3$
- $\beta_2 + \beta_4$
contrast statements: Useful for testing several parameters simultaneously, here the tests
- $\beta_3 = \beta_4 = 0$: No (extra) effect of either A or B
- $\beta_3 = \beta_4$: Effects of A and B are equal (identical to the estimate statement above)

Output
The GENMOD Procedure

  Model Information
  Data Set             WORK.LEPROSY
  Distribution         Poisson
  Link Function        Log
  Dependent Variable   bacilli

  Number of Observations Read   60
  Number of Observations Used   60

  Class Level Information: class id (levels and values not preserved in this transcription)

  Parameter Information
  Parameter  Effect
  Prm1       Intercept
  Prm2       time
  Prm3       A_effect
  Prm4       B_effect

Output, II

  GEE Model Information
  Correlation Structure          Unstructured
  Subject Effect                 id (30 levels)
  Number of Clusters             30
  Correlation Matrix Dimension   2
  Maximum Cluster Size           2
  Minimum Cluster Size           2

  Algorithm converged.
  Working Correlation Matrix (Row1, Row2 by Col1, Col2): values not preserved in this transcription

Output, III: Estimation
Analysis Of GEE Parameter Estimates, Empirical Standard Error Estimates
Table columns: Parameter, Estimate, Standard Error, 95% Confidence Limits, Z, Pr > |Z|, with rows Intercept, time, A_effect, B_effect. Only Pr > |Z| < .0001 for the Intercept is preserved in this transcription.

Output, IV (additional statements)
Contrast Estimate Results (labels: Effect B minus A, changes for A, changes for B) and Contrast Results for GEE Analysis (Antibiotic effect, Wald; Effect of A equals B?, Wald): values not preserved in this transcription.
But note: It may not be reasonable to estimate the effect of each single drug in a PA model!

Interpretations
- There is a significant effect of antibiotics: $6.99 \sim \chi^2(2)$, P = 0.03
- The effect of placebo is estimated to $\exp(\hat\beta_2) = 0.986$, i.e. a decrease of 1.4%
- The additional effect of drug A is estimated to $\exp(\hat\beta_3) = 0.58$, and the total effect to $\exp(\hat\beta_2 + \hat\beta_3) = 0.574$, i.e. a decrease of 42.6%
- The two antibiotics are not significantly different: $0.08 \sim \chi^2(1)$, P = 0.78 (although the estimated effect is a tiny bit larger for drug A)
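With a log link, exponentiated coefficients are ratios of means, and effects add on the log scale, i.e. multiply on the original scale. The minimal Python sketch below (an illustration, not part of the slides) uses only the three ratios quoted in the interpretation above and turns them into percentage changes.

```python
def percent_change(mean_ratio: float) -> float:
    """Percentage change in the mean corresponding to a ratio exp(beta)."""
    return (mean_ratio - 1.0) * 100.0


# Ratios quoted in the interpretation of the GEE output.
placebo_ratio = 0.986  # exp(beta_2)
drug_a_extra = 0.58    # exp(beta_3)
drug_a_total = 0.574   # exp(beta_2 + beta_3)

print(f"placebo: {percent_change(placebo_ratio):.1f}%")
print(f"drug A (total): {percent_change(drug_a_total):.1f}%")

# exp(beta_2 + beta_3) = exp(beta_2) * exp(beta_3); the product of the rounded
# ratios agrees with the reported total up to rounding.
print(f"product of rounded ratios: {placebo_ratio * drug_a_extra:.3f}")
```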

Predicted means from Population Average model (PA)
[Figure; legends: A, B, P]

Wrong analysis not taking the correlation into account

    proc genmod data=leprosy;
      class id;
      model bacilli = time A_effect B_effect / d=poisson link=log modelse type3;
      * no repeated statement;
      contrast 'Antibiotic effect' A_effect 1, B_effect 1 / wald;
      contrast 'Effect of A equals B?' A_effect 1 B_effect -1 / wald;
      estimate 'Effect B minus A' A_effect 1 B_effect -1;
      estimate "changes for A" time 1 A_effect 1;
      estimate "changes for B" time 1 B_effect 1;
    run;

Output from wrong analysis
Analysis Of Maximum Likelihood Parameter Estimates
Table columns: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq, with rows Intercept, time, A_effect, B_effect, Scale. Only the P-values < .0001 for Intercept, A_effect and B_effect are preserved in this transcription.
NOTE: The scale parameter was held fixed.
Note:
- Larger effects
- Too small standard errors
- Much too small P-values

Mixed effects model (SS)
We now assume random intercepts, $b_i \sim N(0, \sigma_b^2)$, in order to answer question 3 under "Purpose of investigation" (quantify the effects of the two antibiotic drugs).

    proc GLIMMIX data=leprosy method=quad(qpoints=50);
      class id;
      model bacilli = time A_effect B_effect / dist=poisson link=log solution;
      random intercept / subject=id type=vc g;
      contrast 'Drug x Time Interaction' A_effect 1, B_effect 1;
      contrast 'Effect of A equals B?' A_effect 1 B_effect -1;
      estimate "changes for A" time 1 A_effect 1;
      estimate "changes for B" time 1 B_effect 1;
      output out=ss pred=xbetamean pred(noblup)=xbeta_ss
                    pred(ilink)=predmean pred(ilink noblup)=pres_ss;
    run;

Comments to glimmix code
- method=quad: maximizes the likelihood function, approximated by numerical quadrature
- qpoints=50: the more quadrature points, the better the accuracy
- random: here we have only one random intercept, so type=... is unimportant
- g: prints the estimate of $\sigma_b^2$ (in glimmix, the parameter $\sigma_b^2$ is generally denoted G)
The test of equality of A and B is hard to interpret and is only shown in order to make this comment on it.

Output from glimmix analysis
Estimated G Matrix, Covariance Parameter Estimates (Intercept, subject id) and Solutions for Fixed Effects (rows Intercept, time, A_effect, B_effect; columns Estimate, Standard Error, DF, t Value, Pr > |t|): only Pr > |t| < .0001 for the Intercept is preserved in this transcription.

Output from glimmix analysis, II
Estimates (labels: Effect B minus A, changes for A, changes for B) and Contrasts (labels: Antibiotic effect, Effect of A equals B?; columns Num DF, Den DF, F Value, Pr > F): values not preserved in this transcription.
Note again: Only the drug-specific changes are readily interpreted.

Predicted means from Subject Specific model (SS)
[Figure; legends: A, B, P]
Note: Different scaling from the plot of predicted means from the PA model.

Predicted individual means from Subject Specific model (SS)
[Figure; legends: A, B, P]

Predicted means from PA and SS
[Figure; legends: A, B, P]

Comments on difference between PA and SS
The analysis uses a log-link, and since the logarithmic function is concave, we have the following:
- The average of two logarithmic values (SS) is smaller than the logarithm of the average (PA)
- The difference between the two is largest for small values
- Therefore the effects on log-scale (SS) appear larger

Study on epilepsy
Reference: Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics.
Controlled clinical trial:
- 30 treated with progabide
- 28 treated with placebo
Recording of the number of epileptic seizures during
- an 8-week interval before treatment
- visits every second week after treatment, i.e. in 2-week intervals
We consider rates, per week.
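The concavity argument can be checked with a few lines of arithmetic. This minimal Python sketch (an illustration with hypothetical mean counts, not values from the leprosy data) compares the logarithm of an average with the average of the logarithms, for a pair of small values and a pair of large values with the same absolute spread.

```python
from math import log


def log_gap(values):
    """log(mean(values)) - mean(log(values)); non-negative because log is concave."""
    mean_value = sum(values) / len(values)
    mean_of_logs = sum(log(v) for v in values) / len(values)
    return log(mean_value) - mean_of_logs


# Hypothetical individual means, differing by 2 in both cases.
gap_small = log_gap([1.0, 3.0])    # small counts
gap_large = log_gap([10.0, 12.0])  # large counts

print(f"gap for small values: {gap_small:.4f}")
print(f"gap for large values: {gap_large:.4f}")
```

The gap is clearly larger for the small values, which is why SS and PA results differ most where the counts are low.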

Spaghettiplot - the epilepsy example
[Figure: number of seizures per week; legends: Progabide, Placebo]

Mean value plot
[Figure; legends: Progabide, Placebo]

Purpose of investigation
1. Investigate what happens over time: does the number of seizures decrease?
2. Compare the decrease for a patient treated with progabide to the decrease for a similar patient in the placebo group
3. Compare the decrease for a population treated with progabide to the decrease for a population treated with placebo

Model building
Notation: $T_{ij}$ denotes the time span corresponding to the number of seizures $Y_{ij}$, so $T_{ij}$ is either 2 or 8 weeks.
Reasonable model (in principle) for the number of seizures:
- Poisson outcome
- Random regression, i.e. linear effect of week, with individual intercepts and slopes
- Mean value proportional to length of period (8 or 2 weeks): log(8) and log(2) are used as offsets. This ensures that we model the ratio $Y_{ij} / T_{ij}$ (on log-scale).
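The role of the offset can be sketched as follows: with $\log E(Y_{ij}) = \log T_{ij} + X_{ij}^T \beta$, the linear predictor describes the log rate per week, and the expected count scales with the length of the period. The Python sketch below is only an illustration; the weekly rate of 3 seizures is a hypothetical value.

```python
from math import exp, log


def expected_count(log_weekly_rate: float, weeks: float) -> float:
    """Expected count when log(weeks) enters the linear predictor as an offset."""
    return exp(log(weeks) + log_weekly_rate)


log_rate = log(3.0)  # hypothetical: 3 seizures per week

baseline_count = expected_count(log_rate, weeks=8)  # 8-week baseline period
visit_count = expected_count(log_rate, weeks=2)     # 2-week visit interval

print(f"expected count over 8 weeks: {baseline_count:.1f}")
print(f"expected count over 2 weeks: {visit_count:.1f}")
```

The same weekly rate gives four times as many expected seizures in the 8-week period as in a 2-week period, so counts from periods of different length become comparable.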

Random regression, SS model in glimmix

    proc glimmix data=seizures method=quad(qpoints=50);
      class id trt visit;
      model seizures = weeks trt trt*weeks / dist=poisson offset=lweeks link=log solution;
      random intercept weeks / subject=id type=un g;
      estimate 'weekly decline trt=0' weeks 1 weeks*trt 1 0;
      estimate 'weekly decline trt=1' weeks 1 weeks*trt 0 1;
    run;

Output not shown.

Ecological fallacy
Think about the research question:
- Do we want to say something about populations? (between-subject covariates)
- or are we interested in specific individuals? (within-subject covariates)

Example: suicide and religion
In a number of regions, we count:
- Number of suicides. Outcome: % suicides (among all citizens)
- Number of protestants and catholics. Covariate: % protestants
Purpose of study: Are protestants more likely to commit suicide?

Analysis on population level: the regions
Percent of suicides increases with percent of protestants.
Do people kill themselves when they live among protestants?
Is this a precise question??

Analysis on subject level
Subdivide each region according to individual religion, protestants and catholics:
More suicides among catholics in regions with many protestants, but they do not count as much, since they are a minor group.

Amenorrhea example
1151 contracepting women were randomized in two groups, receiving
- 100 mg of some drug (trt=0)
- 150 mg of the same drug (trt=1)
All women received injections at time points (time=1,2,3,4) with intervals of 90 days (no measurement at baseline, time=0).
Each time, it was recorded whether the woman had experienced amenorrhea (a suspected side effect of the drug) in the 90 days following the last injection.
Many drop-outs.

Amenorrhea example
The MEANS Procedure, Analysis Variable: y
Table columns: trt, time, N Obs, N, N Miss, Mean, Variance (values not preserved in this transcription).
Note: Baseline is unmeasured (time=0).

Mean value plot - amenorrhea
[Figure]

Mean value plot - on logit scale
[Figure]
Do we have linearity? Not quite...

Purpose of the amenorrhea investigation
- Estimate the time trend in the probability of side effects for each dose of the drug
- Compare the two doses
The model could include
- A time effect (linear or quadratic)
- A group difference, but the groups should be equal at baseline (time=0)
- An interaction between group and time (different patterns in the two groups)
- A random level for each individual

Mixed effects model (SS)

    proc glimmix method=quad(qpoints=50) noclprint data=amen;
      class id;
      model amenorrhea = time time2 trt*time trt*time2 / dist=binomial link=logit solution;
      random intercept / subject=id g;
      contrast 'Interaction with time' trt*time 1, trt*time2 1 / chisq;
      output out=pred_ss pred(noblup ilink)=predicted_ss_mean;
    run;

Beware: The test for interaction is difficult to interpret.

Output from mixed effects model with quadratic time effect
Estimated G Matrix (Intercept), Solutions for Fixed Effects (rows Intercept, time, time2, time*trt, time2*trt; columns Estimate, Standard Error, DF, t Value, Pr > |t|) and Contrasts (Interaction with time; columns Num DF, Den DF, Chi-Square, F Value, Pr > ChiSq, Pr > F): only the P-values < .0001 for Intercept and time are preserved in this transcription.

Interpretations
Random effects variance G ($\hat\sigma_b^2$, value not preserved in this transcription): can be cautiously interpreted as a correlation
  $\frac{\hat\sigma_b^2}{\hat\sigma_b^2 + \pi^2/3} = 0.61$
The interaction is hard to interpret as a within-subject covariate, since no individual has received both treatments.

Predicted profiles from SS-model
[Figure]

Marginal model using GEE (PA)

    proc genmod descending data=amen;
      class id;
      model amenorrhea = time time2 trt*time trt*time2 / dist=binomial link=logit;
      repeated subject=id / logor=fullclust;
      contrast 'Interaction with time' trt*time 1, trt*time2 1;
      output out=pred_pa pred=predicted_pa;
    run;

Note: We have a missing value issue here, because we cannot use maximum likelihood.

Output from marginal model
Analysis Of GEE Parameter Estimates, Empirical Standard Error Estimates
Table columns: Parameter, Estimate, Standard Error, 95% Confidence Limits, Z, Pr > |Z|, with rows Intercept, time, time2, time*trt, time2*trt and six Alpha (log odds ratio) parameters. Only the P-values < .0001 for Intercept, time and the six Alpha parameters are preserved in this transcription.
Contrast Results for GEE Analysis: Interaction with time (Score test), values not preserved.
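The "cautious" correlation for a random-intercept logistic model treats the binary outcome as a thresholded latent variable whose residual follows a standard logistic distribution with variance $\pi^2/3$. A minimal Python sketch of the formula follows; the variance 1.0 is a hypothetical input, not the estimate from the output.

```python
from math import pi


def latent_scale_correlation(random_intercept_variance: float) -> float:
    """Intraclass correlation on the latent (logit) scale:
    sigma_b^2 / (sigma_b^2 + pi^2 / 3)."""
    residual_variance = pi**2 / 3  # variance of the standard logistic distribution
    return random_intercept_variance / (random_intercept_variance + residual_variance)


# Hypothetical random intercept variance, for illustration only.
print(f"correlation for variance 1.0: {latent_scale_correlation(1.0):.3f}")
```

The correlation increases towards 1 as the between-subject variance grows, so a value such as 0.61 corresponds to a random intercept variance well above the residual variance of about 3.29.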

Predicted profiles from PA-model
[Figure]
Note: New scaling.

Comparison of predicted profiles
[Figure]
- PA estimates are closer to 0.5 than SS estimates
- ... and more so, if they are further away from 0.5
- ... so effects are smaller for PA

An alternative SS program
PROC NLMIXED
- very flexible, allows any (non-linear) mean value structure
- can only handle two levels (i.e. not pupils in classes in schools...)

    PROC NLMIXED data=amen QPOINTS=50;
      PARMS beta0=-2.5 beta1=0.8 beta2=-0.03 beta3=0.36 beta4=-0.07
            g11=0 to 5 by 0.5;
      eta = beta0 + beta1*time + beta2*time2 + beta3*trt*time + beta4*trt*time2 + b1;
      mu = exp(eta)/(1+exp(eta));
      MODEL y ~ BINARY(mu);
      RANDOM b1 ~ NORMAL(0, g11) SUBJECT=id;
      PREDICT mu OUT=predmean;
    run;

Output from NLMIXED
Parameter Estimates
Table columns: Parameter, Estimate, Standard Error, DF, t Value, Pr > |t|, Alpha, Lower, Upper, Gradient, with rows beta0, beta1, beta2, beta3, beta4, g11 (values not preserved in this transcription).
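Why PA probabilities are pulled towards 0.5 can be seen by averaging the subject-specific probability over the random intercept distribution. The Python sketch below does this by simple numerical integration; the linear predictor values (-2 and -1) and the random intercept standard deviation (2) are hypothetical, and the trapezoidal rule is a stand-in chosen for brevity, not the quadrature used by SAS.

```python
from math import exp, log, pi, sqrt


def expit(x: float) -> float:
    return 1.0 / (1.0 + exp(-x))


def marginal_probability(eta: float, sd: float, n_grid: int = 2001, width: float = 8.0) -> float:
    """Average of expit(eta + b) over b ~ N(0, sd^2), by the trapezoidal rule."""
    if sd == 0:
        return expit(eta)
    lower = -width * sd
    step = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = lower + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        density = exp(-0.5 * (b / sd) ** 2) / (sd * sqrt(2 * pi))
        total += weight * expit(eta + b) * density * step
    return total


# Hypothetical subject-specific linear predictors and random intercept SD.
sd_b = 2.0
ss_prob = expit(-2.0)                       # "typical" subject (b = 0)
pa_prob = marginal_probability(-2.0, sd_b)  # population average

# A one-unit change on the SS logit scale gives a smaller change on the PA logit scale.
pa_prob_next = marginal_probability(-1.0, sd_b)
pa_logit_change = log(pa_prob_next / (1 - pa_prob_next)) - log(pa_prob / (1 - pa_prob))

print(f"SS probability: {ss_prob:.3f}, PA probability: {pa_prob:.3f}")
print(f"PA change on logit scale for a unit SS change: {pa_logit_change:.3f}")
```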

Smoking among school children
Hierarchical (multilevel) design:
- 1498 children (i)
- 90 classes (c)
- 46 schools (s)
Outcome: Individual smoking behaviour, smoker (0/1)
Purpose of investigation:
- Find out how to make an intervention to prevent smoking
- Evaluate various covariate effects
Mette Rasmussen

Model for smoking
$Y_{sci} \sim \mathrm{Bernoulli}(p_{sci})$
$p_{sci}$: the probability that child i in class c at school s is a smoker.
Model:
  $\mathrm{logit}(p_{sci})$ = school covariate effects + $A_s$
                            + school class covariate effects + $B_{sc}$
                            + individual covariate effects
- $A_s \sim N(0, \omega^2)$: between-school variation
- $B_{sc} \sim N(0, \tau^2)$: between-classes (within school) variation

Possible covariates, at various levels
- Individual (i): sex/gender, age, parental smoking behaviour, parental smoking attitude, parental labour market attachment, best friend smoking
- Class (c): sex ratio, number of pupils, grade, teachers
- School (s): type of school (rural, urban)

Initial model
Too simple, but a starting point to gain understanding.
Two-level model:
- no covariates
- only random school

    proc glimmix data=smoke;
      class school sclass;
      * no covariates: nothing after the equal sign;
      model smoker(descending) = / dist=binary link=logit ddfm=satterth s;
      random school;
    run;

Important note
A full maximum likelihood estimation (method=quad) with a sufficient number of qpoints is not feasible for this problem, because of insufficient space and time.
The default approximate solution is method=rspl.
The simplest model may be fitted with ML, and this yields results quite close to the ones presented below.
Perhaps, some day...

Interesting part of output
The GLIMMIX Procedure
Covariance Parameter Estimates (SCHOOL) and Solutions for Fixed Effects (Intercept; columns Estimate, Standard Error, DF, t Value, Pr > |t|): only Pr > |t| < .0001 for the Intercept is preserved in this transcription.

Interpretation of estimates
Fixed effects: Only intercept, i.e. overall level.
Inverse logit-transformation:
  $\frac{\exp(\hat\beta_0)}{1 + \exp(\hat\beta_0)} \approx 0.186$
Overall, approx. 18.6% of the pupils smoke.

Interpretation of random effect
Estimated between-school variance: $\hat\sigma_b^2$ (value not preserved in this transcription).
A cautious interpretation as a correlation:
  $\frac{\hat\sigma_b^2}{\hat\sigma_b^2 + \pi^2/3} = 0.13$
Median Odds Ratio (MOR): For two randomly chosen individuals from different schools (with identical covariates), we calculate the median OR for the high risk individual compared to the low risk individual.
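The inverse logit step can be reproduced with a few lines of Python. This is a minimal sketch; the intercept -1.5 is a hypothetical value, since the fitted intercept is not preserved in the transcription, and the round trip through 0.186 only illustrates that the two transformations are inverses of each other.

```python
from math import exp, log


def logit(p: float) -> float:
    return log(p / (1 - p))


def inverse_logit(x: float) -> float:
    return exp(x) / (1 + exp(x))


# Hypothetical intercept on the logit scale.
print(f"inverse logit of -1.5: {inverse_logit(-1.5):.3f}")

# Round trip for the smoking prevalence quoted in the text.
round_trip = inverse_logit(logit(0.186))
print(f"inverse_logit(logit(0.186)) = {round_trip:.3f}")
```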

MOR in practice
Choose two random individuals from different schools:
The distribution of the OR between their risks of smoking (always chosen as the ratio above 1) will have a median of
  $\mathrm{MOR} = \exp(0.954\,\omega)$
where $\omega$ is the between-school standard deviation, i.e. the square root of the estimated between-school variance (numeric value not preserved in this transcription). We get MOR = 1.46.

Interpretation of correlation structure
- Pupils from the same school are correlated in their inclination to smoke
- Pupils from the same class are no more correlated than pupils from different classes at the same school. This does not seem appropriate.
We must introduce an extra correlation for pupils in the same class.

Inclusion of variation between school classes

    proc glimmix data=smoke;
      class school sclass;
      model smoker(descending) = / dist=binary link=logit ddfm=satterth s;
      random school sclass;
    run;

Covariance Parameter Estimates: SCHOOL is estimated as 0; the estimate for sclass and the Solutions for Fixed Effects (Intercept) are not preserved in this transcription, except Pr > |t| < .0001 for the Intercept.

Interpretation of results
- The variation between schools can be totally explained by the variation between school classes
- The intercept (level) changes slightly because of a different weighting of the observations
- Median Odds Ratio (MOR) for two children from different classes in the same school: 1.77
- Median Odds Ratio (MOR) for two children from different classes in different schools: value not preserved in this transcription
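The constant 0.954 is $\sqrt{2}\,\Phi^{-1}(0.75)$: the difference between two independent random effects has variance $2\omega^2$, and the median of its absolute value is the 75% Normal quantile times its standard deviation. The minimal Python sketch below (an illustration, not from the slides) computes the MOR for one or several variance components; when two children differ at several levels, the variances of the levels at which they differ are added. The variances used are hypothetical.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_odds_ratio(*variances: float) -> float:
    """MOR for two individuals who differ in the random effects whose
    variances are given (variances are summed across levels)."""
    total_variance = sum(variances)
    z75 = NormalDist().inv_cdf(0.75)  # 75% quantile of N(0, 1), about 0.6745
    return exp(sqrt(2 * total_variance) * z75)


# Hypothetical variance components (not the estimates from the output).
school_variance = 0.25
class_variance = 0.10

print(f"constant: {sqrt(2) * NormalDist().inv_cdf(0.75):.3f}")
print(f"different schools, school level only: {median_odds_ratio(school_variance):.2f}")
print(f"same school, different classes: {median_odds_ratio(class_variance):.2f}")
print(f"different schools and classes: {median_odds_ratio(school_variance, class_variance):.2f}")
```

A variance of zero gives MOR = 1, so a model in which the school variance is estimated as 0 yields the same MOR whether or not the two children attend the same school.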

An illustrative figure
[Figure: three schools (blue, red, green). Variation between classes in each school, but schools look alike.]

A possible third level...
Imagine an extra level/grouping: Gender group within class, i.e. a subgrouping in boys and girls, corresponding to an extra correlation between pupils of the same gender in the same class.
Note: This is not the same as a gender effect
- it need not be a systematic difference
- the group definition is a substitute for cliques, of which we know nothing
Modify the random statement to:

    random school sclass ggroup;

and remember ggroup in the class statement.

One school, gender group effect
[Figure]

Output from 3-level model
The GLIMMIX Procedure
Covariance Parameter Estimates: SCHOOL is estimated as 0; the estimates for sclass and GGROUP and the Solutions for Fixed Effects (Intercept) are not preserved in this transcription, except Pr > |t| < .0001 for the Intercept.
Gender group/clique seems to be an important concept.

Interpretation of results
- Median Odds Ratio (MOR) for two children of opposite sex (different gender groups) in the same class: 1.91
- Median Odds Ratio (MOR) for two children (of either gender) in different classes (at same or different schools): 2.04
How much does a systematic gender effect explain of the random components?

Gender correlation - systematic effect?
A large part of the variation seems to be due to gender cliques, or is it simply a systematic difference between boys and girls?

    proc glimmix data=smoke;
      class school sclass ggroup sex;
      model smoker(descending) = sex / dist=binary link=logit ddfm=satterth s;
      random school sclass ggroup;
    run;

One school, systematic gender effect
[Figure]

Output from 3-level model, with systematic gender effect
The GLIMMIX Procedure
Covariance Parameter Estimates: SCHOOL is estimated as 0; the estimates for sclass and GGROUP and the Solutions for Fixed Effects (rows Intercept, sex boy, sex girl) are not preserved in this transcription, except Pr > |t| < .0001 for the Intercept.

Interpretation of results
- Systematic effect of sex: OR = exp(0.4188) = 1.52 for girls vs. boys
- Median Odds Ratio (MOR) for two children in different cliques of the same class: 1.83
- Median Odds Ratio (MOR) for two children in different classes (at same or different schools): 2.00
How much did the systematic gender effect explain of the random components?

Variance component estimates
Table columns: model, school, school class, gender group; rows:
- school alone
- school and school class
- school, class and gender group
- as above, with sex
(values not preserved in this transcription)
Note the increase in the class variation.

MOR, and Odds ratios (OR) for gender
Table columns: model, school, school class, gender group, gender (MOR in case of different school, school class or gender group; OR for gender); rows:
- school alone
- school and school class
- school, class and gender group
- as above, with sex
(values not preserved in this transcription)
Systematic gender effect and gender cliques seem to be the most important determinants for smoking.


More information

STAT 5200 Handout #26. Generalized Linear Mixed Models

STAT 5200 Handout #26. Generalized Linear Mixed Models STAT 5200 Handout #26 Generalized Linear Mixed Models Up until now, we have assumed our error terms are normally distributed. What if normality is not realistic due to the nature of the data? (For example,

More information

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials. The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the

More information

Analysis of variance and regression. December 4, 2007

Analysis of variance and regression. December 4, 2007 Analysis of variance and regression December 4, 2007 Variance component models Variance components One-way anova with random variation estimation interpretations Two-way anova with random variation Crossed

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Answer to exercise: Blood pressure lowering drugs

Answer to exercise: Blood pressure lowering drugs Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:

More information

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences Faculty of Health Sciences Longitudinal data Correlated data Longitudinal measurements Outline Designs Models for the mean Covariance patterns Lene Theil Skovgaard November 27, 2015 Random regression Baseline

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Analysis of variance and regression. May 13, 2008

Analysis of variance and regression. May 13, 2008 Analysis of variance and regression May 13, 2008 Repeated measurements over time Presentation of data Traditional ways of analysis Variance component model (the dogs revisited) Random regression Baseline

More information

Swabs, revisited. The families were subdivided into 3 groups according to the factor crowding, which describes the space available for the household.

Swabs, revisited. The families were subdivided into 3 groups according to the factor crowding, which describes the space available for the household. Swabs, revisited 18 families with 3 children each (in well defined age intervals) were followed over a certain period of time, during which repeated swabs were taken. The variable swabs indicates how many

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes Generalized Linear Models for Count, Skewed, and If and How Much Outcomes Today s Class: Review of 3 parts of a generalized model Models for discrete count or continuous skewed outcomes Models for two-part

More information

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

multilevel modeling: concepts, applications and interpretations

multilevel modeling: concepts, applications and interpretations multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models

More information

Variance component models

Variance component models Faculty of Health Sciences Variance component models Analysis of repeated measurements, NFA 2016 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen Topics for

More information

Varians- og regressionsanalyse

Varians- og regressionsanalyse Faculty of Health Sciences Varians- og regressionsanalyse Variance component models Lene Theil Skovgaard Department of Biostatistics Variance component models Definitions and motivation One-way anova with

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

SAS Code: Joint Models for Continuous and Discrete Longitudinal Data

SAS Code: Joint Models for Continuous and Discrete Longitudinal Data CHAPTER 14 SAS Code: Joint Models for Continuous and Discrete Longitudinal Data We show how models of a mixed type can be analyzed using standard statistical software. We mainly focus on the SAS procedures

More information

Models for longitudinal data

Models for longitudinal data Faculty of Health Sciences Contents Models for longitudinal data Analysis of repeated measurements, NFA 016 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

Categorical and Zero Inflated Growth Models

Categorical and Zero Inflated Growth Models Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only)

Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only) CLDP945 Example 7b page 1 Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only) This example comes from real data

More information

Multilevel Methodology

Multilevel Methodology Multilevel Methodology Geert Molenberghs Interuniversity Institute for Biostatistics and statistical Bioinformatics Universiteit Hasselt, Belgium geert.molenberghs@uhasselt.be www.censtat.uhasselt.be Katholieke

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 29, 2016 One-way anova with random variation The rabbit example Hierarchical

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic

More information

Mixed Models for Longitudinal Binary Outcomes. Don Hedeker Department of Public Health Sciences University of Chicago.

Mixed Models for Longitudinal Binary Outcomes. Don Hedeker Department of Public Health Sciences University of Chicago. Mixed Models for Longitudinal Binary Outcomes Don Hedeker Department of Public Health Sciences University of Chicago hedeker@uchicago.edu https://hedeker-sites.uchicago.edu/ Hedeker, D. (2005). Generalized

More information

Introduction to mtm: An R Package for Marginalized Transition Models

Introduction to mtm: An R Package for Marginalized Transition Models Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition

More information

Lecture 1 Introduction to Multi-level Models

Lecture 1 Introduction to Multi-level Models Lecture 1 Introduction to Multi-level Models Course Website: http://www.biostat.jhsph.edu/~ejohnson/multilevel.htm All lecture materials extracted and further developed from the Multilevel Model course

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 27, 2018 1 / 84 Overview One-way anova with random variation The rabbit example Hierarchical

More information

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models One-way anova with random variation The rabbit example Hierarchical models with several levels Random regression Lene Theil

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p ) Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

More information

ESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS. The SURVEYFREQ Procedure

ESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS. The SURVEYFREQ Procedure ESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS 18:58 Sunday, July 26, 2015 1 The SURVEYFREQ Procedure Data Summary Number of Clusters 30 Number of Observations 360 time_cat

More information

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */ CLP 944 Example 4 page 1 Within-Personn Fluctuation in Symptom Severity over Time These data come from a study of weekly fluctuation in psoriasis severity. There was no intervention and no real reason

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences

Correlated data. Variance component models. Example: Evaluate vaccine. Traditional assumption so far. Faculty of Health Sciences Faculty of Health Sciences Variance component models Definitions and motivation Correlated data Variance component models, I Lene Theil Skovgaard November 29, 2013 One-way anova with random variation The

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Variance components and LMMs

Variance components and LMMs Faculty of Health Sciences Topics for today Variance components and LMMs Analysis of repeated measurements, 4th December 04 Leftover from 8/: Rest of random regression example. New concepts for today:

More information

Section Poisson Regression

Section Poisson Regression Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,

More information

SAS PROC NLMIXED Mike Patefield The University of Reading 12 May

SAS PROC NLMIXED Mike Patefield The University of Reading 12 May SAS PROC NLMIXED Mike Patefield The University of Reading 1 May 004 E-mail: w.m.patefield@reading.ac.uk non-linear mixed models maximum likelihood repeated measurements on each subject (i) response vector

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

Variance components and LMMs

Variance components and LMMs Faculty of Health Sciences Variance components and LMMs Analysis of repeated measurements, 4th December 2014 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

SAS Code for Data Manipulation: SPSS Code for Data Manipulation: STATA Code for Data Manipulation: Psyc 945 Example 1 page 1

SAS Code for Data Manipulation: SPSS Code for Data Manipulation: STATA Code for Data Manipulation: Psyc 945 Example 1 page 1 Psyc 945 Example page Example : Unconditional Models for Change in Number Match 3 Response Time (complete data, syntax, and output available for SAS, SPSS, and STATA electronically) These data come from

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors

More information

Lab 11. Multilevel Models. Description of Data

Lab 11. Multilevel Models. Description of Data Lab 11 Multilevel Models Henian Chen, M.D., Ph.D. Description of Data MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. ID: 386 women, id from 1 to 386, individual level (level

More information

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Modeling Sub-Visible Particle Data Product Held at Accelerated Stability Conditions José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Outline Sub-Visible Particle (SbVP) Poisson Negative Binomial

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

Chapter 1. Modeling Basics

Chapter 1. Modeling Basics Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 28, 2017 1 / 96 Overview One-way anova with random variation The rabbit example Hierarchical

More information

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 28, 2017 One-way anova with random variation The rabbit example Hierarchical

More information

Sample Size and Power Considerations for Longitudinal Studies

Sample Size and Power Considerations for Longitudinal Studies Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Today s Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building

More information

GEE for Longitudinal Data - Chapter 8

GEE for Longitudinal Data - Chapter 8 GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method

More information

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building strategies

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information
