Faculty of Health Sciences. Correlated data. Count variables. Lene Theil Skovgaard & Julie Lyng Forman. December 6, 2016

2 Modeling count outcomes
Outline:
- The Poisson distribution for counts
- Poisson models, log-linear models
- Overdispersion
- Generalized linear mixed models
  - Population average models (PA)
  - Subject specific models (SS)
- Examples: Leprosy, Seizures (briefly)

3 Example: Counts of leprosy bacilli
Controlled clinical trial:
- 10 patients treated with placebo (P)
- 10 patients treated with antibiotic A
- 10 patients treated with antibiotic B
The number of bacilli at six sites of the body (a count variable) was recorded
- before treatment (baseline, time=0)
- several months after treatment (time=1)
Reference: Snedecor, G.W. and Cochran, W.G. (1967). Statistical Methods (6th edn). Iowa State University Press.

4 Spaghettiplot - the leprosy example
[Figure: number of bacilli at baseline and follow-up]

5 Counts at endpoint
Do we see a difference at the end of follow-up? The distributions are probably not normal, and we cannot take logarithms because of the zero values. With a small dataset like this, we turn to the Poisson distribution.

6 Binary data
Examples of binary outcomes:
- bacillus at a particular site of the body (1: yes / 0: no)
- smoking for a pupil in a school class (1: yes / 0: no)
- seizure on a single day (1: yes / 0: no)
A binary variable X has a Bernoulli distribution, meaning that
$P(X = 1) = p$, $P(X = 0) = 1 - p$.
For such an outcome, the mean is $E(X) = p$, and the variance is $\text{Var}(X) = p(1 - p)$.

7 Binomial data
If we sum up n independent binary observations with the same p, $Y = \sum_{i=1}^{n} X_i = X_1 + \dots + X_n$, e.g.
- the total number of bacilli
- the number of smokers in each school class
- the number of seizures in a specific time interval
we get a Binomial distribution, $Y \sim \text{Bin}(n, p)$, with
$P(Y = m) = \binom{n}{m} p^m (1 - p)^{n - m}$
and $E(Y) = np$, $\text{Var}(Y) = np(1 - p)$.
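To make the moment formulas concrete, the following Python sketch computes the Binomial point probabilities directly and sums them to obtain the mean and variance. The values n = 10 and p = 0.2 are arbitrary illustrations, not taken from any of the examples.

```python
from math import comb


def binom_pmf(m: int, n: int, p: float) -> float:
    """P(Y = m) for Y ~ Bin(n, p)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def binom_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance computed directly from the point probabilities."""
    probs = [binom_pmf(m, n, p) for m in range(n + 1)]
    mean = sum(m * q for m, q in enumerate(probs))
    var = sum((m - mean) ** 2 * q for m, q in enumerate(probs))
    return mean, var


# Illustrative values only: n = 10 trials, p = 0.2
mean, var = binom_moments(10, 0.2)
print(round(mean, 6), round(var, 6))  # np = 2.0 and np(1 - p) = 1.6
```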

8 Examples of Binomial distributions
[Figure: n = 10, 50; np = 1, 2, 5 or 20 (mean value)]

9 Approximations to the Binomial distribution
When n is large, the Binomial point probabilities are cumbersome to compute, so we use approximations:
- p moderate (not too close to 0 or 1) and np > 5: the Normal distribution
- p close to 0 (law of rare events): the Poisson distribution with $\mu = np$, with point probabilities
  $P(Y = m) = \frac{\mu^m}{m!} \exp(-\mu), \quad m = 0, 1, 2, \dots$
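A quick numerical check of the law of rare events, as a Python sketch with arbitrarily chosen n = 1000 and p = 0.002 (so $\mu = np = 2$): the Binomial and Poisson point probabilities almost coincide.

```python
from math import comb, exp, factorial


def binom_pmf(m: int, n: int, p: float) -> float:
    return comb(n, m) * p**m * (1 - p) ** (n - m)


def poisson_pmf(m: int, mu: float) -> float:
    return mu**m / factorial(m) * exp(-mu)


# Rare events: n large, p small, mu = n * p (illustrative values)
n, p = 1000, 0.002
mu = n * p
max_diff = max(abs(binom_pmf(m, n, p) - poisson_pmf(m, mu)) for m in range(21))
print(f"largest difference in point probabilities: {max_diff:.5f}")
```

Repeating it with, say, n = 20 and p = 0.1 (same µ) gives visibly larger differences, since p is no longer small.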

10 Poisson distribution
Counts with no well-defined upper limit:
- the number of cancer cases in a specific community during a specific year
- the total number of bacilli
- the number of seizures in a certain interval
When Y has a Poisson distribution, we have
- Mean: $E(Y) = \mu = np$
- Variance: $\text{Var}(Y) = \mu = np$
In a Poisson distribution, the mean and variance are equal. This fact is unfortunately often overlooked.
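The equality of mean and variance can be checked numerically. This Python sketch (illustrative only) builds the Poisson point probabilities recursively for a few means and sums them; the infinite sum is truncated at 200, which is harmless for these means.

```python
from math import exp


def poisson_pmf_list(mu: float, upper: int) -> list[float]:
    """P(Y = 0), ..., P(Y = upper), computed recursively via
    P(Y = m) = P(Y = m - 1) * mu / m to avoid huge factorials."""
    probs = [exp(-mu)]
    for m in range(1, upper + 1):
        probs.append(probs[-1] * mu / m)
    return probs


for mu in (1, 2, 5, 20):
    probs = poisson_pmf_list(mu, upper=200)  # tail beyond 200 is negligible here
    mean = sum(m * q for m, q in enumerate(probs))
    var = sum((m - mean) ** 2 * q for m, q in enumerate(probs))
    print(mu, round(mean, 6), round(var, 6))  # mean and variance both equal mu
```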

11 Poisson distribution
[Figure: Poisson distributions with mean µ = 1, 2, 5 and …]

12 Models for non-normal data
Generalized linear models are just like multiple regression models, but on a scale that corresponds to the data:
- Normal (link=identity): mean values (almost) anywhere on the real axis; traditional linear models
- Binomial (link=logit): mean values lie between 0 and 1; logistic regression (next lecture)
- Poisson (link=log): mean values are positive; log-linear models, Poisson regression

13 Generalized linear models for count data
Outcome variable $Y_i$, following a Poisson distribution, with
- Mean: $E(Y_i) = \mu_i$
- Link function: log, the natural logarithm. On this scale, we assume linearity in the covariates, i.e.
  $\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} \; (= X_i^T \beta)$
  where $x_{i1}, \dots, x_{ik}$ denote the covariate values for individual i.
The log link ensures that the mean $\mu_i = E(Y_i)$ will always be positive.

14 Comparing distributions of counts
Comparison of the distributions from p. 5: do we see a difference in bacilli counts at follow-up between the three groups?
Model: $Y_i \sim \text{Poisson}(\mu_i)$, $\log(\mu_i) = \beta_t$, where the subscript t denotes treatment (A, B or P).
This problem corresponds to a one-way ANOVA (in the case of Normal distributions).
We are comparing groups on a logarithmic scale, so the results will be ratios:
$\beta_A - \beta_P = \log(\mu_A) - \log(\mu_P) \;\Rightarrow\; \exp(\beta_A - \beta_P) = \mu_A / \mu_P$
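A minimal Python sketch of why the results are ratios, assuming hypothetical counts (not the leprosy data): in the one-way Poisson model the maximum likelihood estimate of $\beta_t$ is the log of the group mean, so $\exp(\hat\beta_A - \hat\beta_P)$ is simply the ratio of the two group means.

```python
from math import exp, log

# Hypothetical endpoint counts for three groups (NOT the leprosy data)
counts = {
    "A": [2, 0, 4, 1, 3],
    "B": [3, 5, 1, 2, 4],
    "P": [6, 8, 4, 7, 5],
}

# In the model log(mu_t) = beta_t, the ML estimate of beta_t is log(group mean)
beta = {t: log(sum(y) / len(y)) for t, y in counts.items()}

# A difference on the log scale back-transforms to a ratio of means
ratio_A_vs_P = exp(beta["A"] - beta["P"])
print(round(ratio_A_vs_P, 3))  # mean(A) / mean(P) = 2 / 6
```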

15 Poisson analysis in SAS, endpoint
proc genmod data=leprosy_wide;
  class drug;
  model endpoint = drug / dist=poisson link=log type3;
  estimate 'Effect A minus P' drug 1 0 -1;
  estimate 'Effect B minus A' drug -1 1 0;
  estimate 'Antibiotic effect' drug 0.5 0.5 -1;
run;
Notes regarding the code:
- We use PROC GENMOD.
- Options in the model statement:
  - dist=poisson: the distribution is Poisson
  - link=log: effects are modelled on a log scale (natural log)
  - type3: requests a test of equality of all 3 drugs (analogous to the F-test in ANOVA models)
- The estimate statements are explained on p. 18.

16 Output, I
The GENMOD Procedure
Model Information
  Data Set                      WORK.LEPROSY
  Distribution                  Poisson
  Link Function                 Log
  Dependent Variable            endpoint
  Number of Observations Used   30
Class Level Information
  Class  Levels  Values
  drug   3       A B P
Criteria For Assessing Goodness Of Fit
  Criterion                DF  Value  Value/DF
  Pearson Chi-Square
  Scaled Pearson X2
  AIC (smaller is better)
Algorithm converged.
The values above 1 in the last column (Value/DF) indicate a misfit; see later (overdispersion, p. 20 ff).

17 Output, II
LR Statistics For Type 3 Analysis
  Source  DF  Chi-Square  Pr > ChiSq
  drug                    <.0001
Analysis Of Maximum Likelihood Parameter Estimates
  Parameter  DF  Estimate  Standard Error  Wald 95% Conf Limits  Wald Chi-Square  Pr > ChiSq
  Intercept                                                                       <.0001
  drug A                                                                          <.0001
  drug B                                                                          <.0001
  drug P
  Scale
NOTE: The scale parameter was held fixed.
The estimate of $\beta_A - \beta_P$ is on the log scale and has to be back-transformed: $\exp(\hat\beta_A - \hat\beta_P) = 0.43$, meaning that with treatment A, the level of bacilli at endpoint is only 43% of that in the placebo group. But remember the misfit noted on the previous page.
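The back-transformation applies to the confidence limits as well: exponentiate the Wald limits computed on the log scale. The Python sketch below uses a hypothetical log-scale estimate of -0.84 and standard error 0.20 (chosen only so that the ratio is close to 0.43) and z = 1.96.

```python
from math import exp


def back_transform(estimate: float, se: float, z: float = 1.96):
    """Turn a log-scale estimate and its Wald standard error into a ratio
    with a 95% confidence interval: exp(estimate), exp(estimate -/+ z * SE)."""
    return exp(estimate), exp(estimate - z * se), exp(estimate + z * se)


# Hypothetical log-scale estimate and SE for beta_A - beta_P
ratio, lower, upper = back_transform(-0.84, 0.20)
print(f"ratio {ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

Note that the resulting interval is not symmetric around the ratio on the original scale, only on the log scale.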

18 Estimate statements
The output on the previous page contains 4 estimates:
- Intercept: the level (on the log scale) of the alphabetically last group, i.e. $\beta_P$
- 3 parameters in the drug effect:
  1. drug A: the difference on the log scale between drug A and the intercept (drug P), i.e. $\beta_A - \beta_P$
  2. drug B: as above, only for drug B, i.e. $\beta_B - \beta_P$
  3. drug P: the difference on the log scale between drug P and the intercept (drug P), i.e. $\beta_P - \beta_P = 0$
Each estimate statement (p. 15) describes how we want to combine these estimates, e.g.
estimate 'Effect A minus P' drug 1 0 -1;
says that we want the combination $1 \cdot (\beta_A - \beta_P) + 0 \cdot (\beta_B - \beta_P) - 1 \cdot 0 = \beta_A - \beta_P$.
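An estimate statement is just a linear combination of the parameters. The Python sketch below mimics this with the reference parameterization described above; the log-scale group levels are hypothetical, and the dictionary keys only imitate the GENMOD parameter labels.

```python
# Hypothetical group levels on the log scale (NOT estimates from the slides)
beta_A, beta_B, beta_P = 0.7, 0.9, 1.6

# Reference parameterization: the last level (P) is the reference
params = {
    "Intercept": beta_P,
    "drug A": beta_A - beta_P,
    "drug B": beta_B - beta_P,
    "drug P": 0.0,  # aliased with the intercept, fixed at zero
}


def contrast(coefficients: dict[str, float]) -> float:
    """Linear combination L'theta of the parameters."""
    return sum(c * params[name] for name, c in coefficients.items())


a_minus_p = contrast({"drug A": 1, "drug B": 0, "drug P": -1})
b_minus_a = contrast({"drug A": -1, "drug B": 1, "drug P": 0})
print(round(a_minus_p, 10), round(b_minus_a, 10))  # -0.9 0.2
```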

19 Output, III
Output from the estimate statements (some columns deleted):
Contrast Estimate Results
  Obs  Label              Mean Estimate  Mean LowerCL  Mean UpperCL  ChiSq  Prob ChiSq
  1    Effect A minus P
  2    Effect B minus A
  3    Antibiotic effect                                                    <.0001
The back-transformed differences are ratios, given in the column Mean Estimate.
The active groups perform better at follow-up (ratio < 1).

20 Counts at baseline and follow-up
The MEANS Procedure
  drug  N Obs  Variable  N  Median  Mean  Variance
  A     10     baseline
               endpoint
  B     10     baseline
               endpoint
  P     10     baseline
               endpoint
Note: The variance is clearly larger than the mean (overdispersion, detected as a misfit on p. 16).

21 Overdispersion
Overdispersion: the variance is larger than expected for a Poisson distribution (where it should equal the mean). This may be caused by
- omitted covariates (isn't that always the case?)
- unrecognized clusters
- heterogeneity, e.g. a 'zero' group (non-susceptibles)
When overdispersion is disregarded,
- the standard errors are erroneously small
- the P-values are erroneously small
- we risk type I errors (false positive findings)

22 Handling of overdispersion
Two traditional solutions:
- Assume that $\text{Var}(Y) = \phi E(Y) = \phi\mu$ for some $\phi > 0$ (most often $\phi > 1$), although such a distribution does not actually exist...
- Include extra random variation to account for the omitted covariates, e.g.
  $\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + b_i$
  with some assumption on the distribution of the $b_i$'s, i.e. with $\exp(b_i)$ multiplied onto the mean value.

23 Overdispersion parameter
The overdispersion parameter φ can be estimated and multiplied onto the variance, yielding
- larger standard errors
- larger P-values
φ is estimated either from the Pearson Chi-Square Value/DF or from the Deviance Value/DF, using the option scale=pearson (scale=p) or scale=deviance (scale=d). The standard errors are then multiplied by the square root $\sqrt{\hat\phi}$.
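A Python sketch of the Pearson-based correction, assuming hypothetical counts and an intercept-only Poisson model (so every fitted mean equals the overall mean). For this model the Poisson-based standard error of $\log(\hat\mu)$ is $1/\sqrt{\sum y}$; the sketch scales it by $\sqrt{\hat\phi}$, which is what scale=pearson does to the standard errors.

```python
from math import sqrt


def pearson_dispersion(y: list[float], mu: list[float], n_params: int) -> float:
    """phi-hat = Pearson chi-square / DF, with X^2 = sum (y - mu)^2 / mu
    (the Poisson variance function is V(mu) = mu)."""
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / (len(y) - n_params)


# Hypothetical counts, fitted with an intercept-only Poisson model
y = [0, 2, 9, 1, 12, 4, 0, 7, 3, 2]
mu = [sum(y) / len(y)] * len(y)        # fitted mean 4.0 for every observation
phi_hat = pearson_dispersion(y, mu, n_params=1)

model_se = 1 / sqrt(sum(y))            # Poisson-based SE of log(mu-hat)
scaled_se = model_se * sqrt(phi_hat)   # SE multiplied by sqrt(phi-hat)
print(f"phi-hat = {phi_hat:.2f}, SE {model_se:.3f} -> {scaled_se:.3f}")
```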

24 Additional random variation
Possible models for $b_i$:
- $b_i \sim N(0, \omega_b^2)$: leads to a complicated model, which changes the level of the mean, since $E(\exp(b_i)) = \exp(\omega_b^2/2) > 1$ (we shall return to this later)
- $b_i$ log-Gamma distributed: leads to $Y_i$ following a Negative Binomial distribution, in which
  $E(Y_i) = \mu_i$, $\text{Var}(Y_i) = \mu_i + \theta\mu_i^2$,
  so the variance may now be larger than the mean.

25 Negative binomial distributions, with mean 10
[Figure: a Poisson distribution, followed by 3 negative binomial distributions with variance 30, 110 and 210. All distributions have mean 10.]
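Under the parameterization $\text{Var}(Y) = \mu + \theta\mu^2$ from p. 24, the three variances in the figure correspond to the values of θ computed by this small Python sketch (θ = 0 gives back the Poisson case).

```python
def negbin_theta(mean: float, variance: float) -> float:
    """Solve Var(Y) = mu + theta * mu^2 for theta."""
    return (variance - mean) / mean**2


mu = 10
for variance in (10, 30, 110, 210):
    print(variance, negbin_theta(mu, variance))  # theta = 0, 0.2, 1.0, 2.0
```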

26 Overdispersion in PROC GENMOD
Overdispersion parameter:
proc genmod data=leprosy;
  class drug;
  model endpoint = drug / dist=poisson link=log type3 scale=pearson;
run;
Negative Binomial model:
proc genmod data=leprosy;
  class drug;
  model endpoint = drug / dist=negbin link=log type3;
run;

27 Overdispersion in PROC GENMOD, scale=pearson
Code shown on p. 26.
Analysis Of Maximum Likelihood Parameter Estimates
  Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
  Intercept                                                                             <.0001
  drug A
  drug B
  drug P
  Scale
NOTE: The scale parameter was estimated by the square root of Pearson's Chi-Square/DOF.
LR Statistics For Type 3 Analysis
  Source  Num DF  Den DF  F Value  Pr > F  Chi-Square  Pr > ChiSq
  drug

28 Negative binomial analysis in GENMOD
Code shown on p. 26.
Analysis Of Maximum Likelihood Parameter Estimates
  Parameter   DF  Estimate  Standard Error  Wald 95% Confidence Limits  Wald Chi-Square  Pr > ChiSq
  Intercept                                                                              <.0001
  drug A
  drug B
  drug P
  Dispersion
NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.
LR Statistics For Type 3 Analysis
  Source  DF  Chi-Square  Pr > ChiSq
  drug

29 Results for mean(A, B) vs. P
Endpoint comparisons:
  Model                 Ratio (CI)     P-value
  Poisson               (0.3582, …)    < …
  with overdispersion   (0.2641, …)
  Negative Binomial     (0.2368, …)
Endpoints differ, but:
Baseline comparisons:
  Model                 Ratio (CI)     P-value
  Poisson               (0.5982, …)
  with overdispersion   (0.5397, …)
  Negative Binomial     (0.5492, …)
How do we account for baseline differences?

30 Spaghettiplot - the leprosy example
[Figure: now including both time points (0 and 1)]

31 Average plot - the leprosy example
Note: new scaling, different from p. 30.

32 Possible purposes of the investigation
1. Evaluate the efficacy of the antibiotics: red lines vs. green line
2. Compare the two drugs, A and B: solid vs. dotted red line
3. Quantify the effect of each of the two antibiotic drugs separately
Randomization: at baseline, all patients have the same expected count (mean value), but by chance, the placebo individuals have larger values than the two other groups.

33 Model reflections
This is just a before-after study... but
- we are dealing with counts, so it is natural to consider a Poisson distribution with a log link (natural log)
- because it is a randomized study, the mean values at baseline should be identical for the three groups
- we are prepared to see 3 different changes over time, but some of these may be identical (this is actually the main scientific question)
- baseline and follow-up measurements are correlated within individuals

34 Correlations within individual?
It certainly seems so.

35 Model reflections, II
- Can't we just take logarithms? No, because we have zeroes.
- Some other transformation, then? Yes, square roots or arcsine, but the interpretation would suffer a lot.
- Could we just condition on the baseline value? Yes, we could do that... but it becomes more tricky when we have multiple time points.
- Could we analyze differences? Or rather, ratios? Hmm...
- We could build a constrained model, forcing the mean values to be equal at baseline.

36 ANCOVA, Poisson
The GENMOD Procedure
Criteria For Assessing Goodness Of Fit
  Criterion           DF  Value  Value/DF
  Pearson Chi-Square
  Scaled Pearson X2
Analysis Of Maximum Likelihood Parameter Estimates
  Parameter  DF  Estimate  Standard Error  Wald 95% Confidence Limits  Chi-Square  Pr > ChiSq
  Intercept
  drug A
  drug B
  drug P
  before                                                                           <.0001
Contrast Estimate Results
  Obs  Label              Mean Estimate  Mean LowerCL  Mean UpperCL  ChiSq  Prob ChiSq
  1    Effect A minus P
  2    Effect B minus A
  3    Antibiotic effect
We note ratios closer to 1 when we adjust for baseline (compare to p. 19).

37 Constrained model for baseline correction
Parametrization of the mean values (on the log scale):
  Treatment  Period     Mean (on log scale)
  P          Baseline   $\beta_1$
  P          Follow-up  $\beta_1 + \beta_2$
  A          Baseline   $\beta_1$
  A          Follow-up  $\beta_1 + \beta_2 + \beta_3$
  B          Baseline   $\beta_1$
  B          Follow-up  $\beta_1 + \beta_2 + \beta_4$
$\beta_3$ and $\beta_4$ denote the additional effects of A and B, respectively, compared to placebo.
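As a sketch of how the constrained parameterization works, the Python snippet below fills in the table with hypothetical parameter values (not estimates from the data) and computes the follow-up/baseline ratio per treatment. All baselines share $\beta_1$, and the ratio for A relative to that for P is $\exp(\beta_3)$.

```python
from math import exp

# Hypothetical log-scale parameters (illustrative only, NOT fitted values)
b1, b2, b3, b4 = 2.3, -0.05, -0.55, -0.45

# Log-scale means for each (treatment, period) cell of the table
log_means = {
    ("P", "Baseline"): b1,
    ("P", "Follow-up"): b1 + b2,
    ("A", "Baseline"): b1,
    ("A", "Follow-up"): b1 + b2 + b3,
    ("B", "Baseline"): b1,
    ("B", "Follow-up"): b1 + b2 + b4,
}

# Follow-up / baseline ratio for each treatment on the original scale
ratios = {
    t: exp(log_means[(t, "Follow-up")] - log_means[(t, "Baseline")])
    for t in ("P", "A", "B")
}
print({t: round(r, 3) for t, r in ratios.items()})
```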

38 Generalized linear MIXED models
Outcome variable $Y_{ij}$, e.g. the j'th measurement time for individual i:
- Mean value: $\mu_{ij}$
- Link function g: $g(\mu_{ij})$ is assumed linear in the covariate vector $X_{ij}$.
Two kinds of models:
- Population average models (PA):
  $g(\mu_{ij}) = \beta_0 + \beta_1 x_{ij1} + \dots + \beta_k x_{ijk} = X_{ij}^T \beta$
  and $(Y_{ij_1}, Y_{ij_2})$ are associated (correlated), with some (patterned) covariance (p. 42 ff)
- Subject-specific models (SS):
  $g(\mu_{ij}) = \beta_0 + \beta_1 x_{ij1} + \dots + \beta_k x_{ijk} + b_i$, $b_i \sim N(0, \omega_b^2)$,
  i.e. random intercepts (levels); these may be generalized to other random effects: slopes, ... (p. 56 ff)

39 The two model types
- Marginal models, or Population average (PA): describe covariate effects on the population mean, e.g. the expected difference between the effects of two treatments. Corresponds to the repeated statement.
- Mixed effects models, or Subject specific (SS): describe covariate effects on specific individuals (or clusters), e.g. the expected change over time (or differences between boys and girls in the same school class). Corresponds to the random statement.

40 For traditional linear models (Normality) with identity link:
- A subject-specific model (SS) with a random intercept/level is equal to a marginal model (PA) with a compound symmetry covariance structure (type=cs).
- More generally: the interpretation of the parameters β does not depend on the way that we model the covariance/correlation (although the estimates may change somewhat depending on the assumed covariance structure).

41 For non-normal outcomes
The above is no longer true in general, due to the non-linearity of the link function. For Poisson analyses, this means:
- Including a random subject level (as in SS models) will change the interpretation of the mean value (the intercept), but not of the parameters denoting the effects of the covariates (e.g. group or time).
- Parameters allowed to vary between individuals will differ in interpretation as well as size.
- SS models provide median-like levels (or rather, levels for median individuals), as opposed to the average-like levels of PA models.
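For a random intercept and a log link, these statements can be verified directly, since $E(\exp(b_i)) = \exp(\omega_b^2/2)$ for normal $b_i$ (p. 24). The Python sketch uses hypothetical values of $\beta_0$, $\beta_1$ and $\omega_b^2$: the population level exceeds the level of the median individual, while the covariate effect (a ratio) is identical on both scales.

```python
from math import exp, log

# Hypothetical SS model: log(mu_ij) = beta0 + beta1 * x + b_i, b_i ~ N(0, omega2)
beta0, beta1, omega2 = 1.0, -0.5, 0.8


def median_individual_mean(x: float) -> float:
    """Mean for an individual with b_i = 0 (the median of exp(b_i) is 1)."""
    return exp(beta0 + beta1 * x)


def population_mean(x: float) -> float:
    """Marginal mean, using E(exp(b_i)) = exp(omega2 / 2) for normal b_i."""
    return exp(beta0 + beta1 * x + omega2 / 2)


# Levels differ: population average / median individual = exp(omega2 / 2)
print(population_mean(0) / median_individual_mean(0))
# ... but the covariate effect on the log scale is beta1 in both cases
print(log(population_mean(1) / population_mean(0)), beta1)
```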

42 Marginal models = Population Average (PA)
There is no standard multivariate Poisson distribution, so we only specify
- the marginal mean, $E(Y_{ij} \mid X_{ij}) = \mu_{ij}$, where $\log(\mu_{ij}) = X_{ij}^T \beta$, i.e. covariate effects as usual
- the distribution: Poisson (in a way), but...
- the marginal variance, $\phi V(\mu_{ij}) = \phi\mu_{ij}$ (overdispersion)
- some measure of association for Y's belonging to the same individual/unit, $V_i = \text{Cov}(Y_i)$, called the working covariance matrix

43 Marginal models, technicalities
Since we do not actually have a full probability model, we cannot use a maximum likelihood approach. This has implications for the handling of missing values (lecture 4).
Instead, we use the so-called GEE, generalized estimating equations (written in vector notation):
$\sum_i D_i^T V_i^{-1} (Y_i - \mu_i) = 0$
where $V_i$ is the (working) covariance matrix for $Y_i$, and $D_i$ is the matrix of derivatives of the mean value $\mu_i$ with respect to β.
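To show what solving the estimating equations involves, here is a Python sketch for the simplest case, assuming working independence and $\log(\mu_{ij}) = \beta_0 + \beta_1 x_{ij}$. Then $V_i = \phi\,\text{diag}(\mu_i)$ and $D_i = \text{diag}(\mu_i) X_i$, so $D_i^T V_i^{-1} = X_i^T/\phi$ and φ drops out. The function name and data are hypothetical; PROC GENMOD additionally updates the working correlation parameters between iterations, which this sketch omits.

```python
from math import exp, log


def gee_poisson_independence(clusters, n_iter=25):
    """Solve sum_i X_i^T (Y_i - mu_i) = 0 for log(mu) = b0 + b1 * x by
    Fisher scoring. `clusters` is a list of subjects, each a list of (x, y)."""
    all_y = [y for subject in clusters for _, y in subject]
    b0, b1 = log(sum(all_y) / len(all_y)), 0.0  # start at the overall mean
    for _ in range(n_iter):
        u0 = u1 = 0.0            # estimating function
        i00 = i01 = i11 = 0.0    # X^T W X with W = diag(mu)
        for subject in clusters:
            for x, y in subject:
                mu = exp(b0 + b1 * x)
                u0 += y - mu
                u1 += x * (y - mu)
                i00 += mu
                i01 += x * mu
                i11 += x * x * mu
        det = i00 * i11 - i01 * i01
        # Update: beta += (X^T W X)^{-1} U, with the 2x2 inverse written out
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1


# Hypothetical before (x = 0) / after (x = 1) counts for four subjects
data = [[(0, 10), (1, 6)], [(0, 4), (1, 2)], [(0, 12), (1, 9)], [(0, 6), (1, 3)]]
b0, b1 = gee_poisson_independence(data)
print(round(exp(b0), 4), round(exp(b1), 4))
```

With a binary x, the solution reproduces the two group means (here 8 before and 5 after, i.e. a ratio of 0.625), which is a convenient check.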

44 Properties of the GEE estimation procedure
PROs:
- It is robust, in the sense that the estimates of β are consistent even if the working covariance matrix is misspecified, and valid standard errors are obtained provided that you use the sandwich covariance estimate (which, fortunately, is the default in PROC GENMOD).
- The sandwich covariance estimate also takes care of possible overdispersion, as well as possible differences in variability over time.
- For large sample sizes, the parameter estimates will be asymptotically Normal (i.e. we can construct confidence intervals with ±2 standard errors).
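The sandwich estimate can be illustrated in the simplest possible case, an intercept-only Poisson model with working independence. In this hypothetical Python sketch the 'bread' is the model-based information $N\hat\mu$ and the 'meat' sums the squared subject-level residual totals, so both the within-subject correlation and the overdispersion show up in the empirical standard error.

```python
from math import log, sqrt


def intercept_only_gee(clusters):
    """Poisson GEE with working independence and log(mu) = beta0.
    Returns beta0-hat, the model-based SE and the sandwich (empirical) SE."""
    all_y = [y for subject in clusters for y in subject]
    n = len(all_y)
    mu = sum(all_y) / n                 # solves sum_ij (y_ij - mu) = 0
    bread = n * mu                      # minus derivative of the estimating function
    meat = sum(sum(y - mu for y in subject) ** 2 for subject in clusters)
    model_se = sqrt(1 / bread)          # valid only for independent Poisson counts
    sandwich_se = sqrt(meat) / bread    # sqrt(bread^-1 * meat * bread^-1)
    return log(mu), model_se, sandwich_se


# Hypothetical pairs of counts: strongly correlated within subject, overdispersed
data = [[1, 2], [15, 12], [0, 1], [9, 11], [3, 2], [20, 17]]
beta0, model_se, sandwich_se = intercept_only_gee(data)
print(f"model-based SE {model_se:.3f}, sandwich SE {sandwich_se:.3f}")
```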

45 Properties of the GEE estimation procedure, II
CONs:
- It can only be used for balanced data.
- It performs poorly in small datasets (anti-conservative, i.e. may give type I errors).
- If missing data are not missing completely at random (MCAR), the results will be flawed, and we have to do some (quite complicated) weighting in order to get consistent results (lecture 6).

46 Choosing a working covariance
PROC GENMOD offers several choices:
- Unstructured (type=un)
- Compound symmetry (type=cs)
- Autoregressive (type=ar)
- Working independence (type=ind)
All choices will give consistent estimates, but choices closer to the true structure will be more efficient (narrower confidence intervals, i.e. more power).
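The structures differ only in how the correlation between two measurements on the same subject is parameterized. A Python sketch of three of them, with a hypothetical ρ; the unstructured case simply has a separate parameter for every pair of time points and is not built here.

```python
def working_correlation(kind: str, n_times: int, rho: float = 0.0):
    """Working correlation matrix for n_times repeated measurements:
    'ind' (independence), 'cs' (compound symmetry, all correlations rho),
    'ar' (first-order autoregressive, correlation rho ** |j - k|)."""
    def entry(j: int, k: int) -> float:
        if j == k:
            return 1.0
        if kind == "ind":
            return 0.0
        if kind == "cs":
            return rho
        if kind == "ar":
            return rho ** abs(j - k)
        raise ValueError(f"unsupported kind: {kind}")

    return [[entry(j, k) for k in range(n_times)] for j in range(n_times)]


for kind in ("ind", "cs", "ar"):
    print(kind, working_correlation(kind, 3, rho=0.5))
```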

47 Marginal model (PA) for leprosy
In order to restrict the baseline mean values to be equal (see p. 37), we define the adjusted treatment variable
data leprosy;
  set leprosy;
  drugadj = drug;
  if time = 0 then drugadj = "P";   /* everyone counts as placebo at baseline */
run;
and the code (with unstructured working covariance) will then be
proc genmod data=leprosy;
  class id drugadj;
  model bacilli = time drugadj*time / dist=poisson link=log type3;
  repeated subject=id / type=un corrw;
run;

48 Comments on the code
- time: the change over time for the placebo group (because this is the reference group)
- drugadj*time: additional changes over time, over and above the change for placebo
- dist=poisson: implies the log link by default, and a variance function (proportional to) the mean
- link=log: can override the link function implied by dist=poisson, if needed
- repeated: specifies an unstructured (type=un) association between measurements on the same id (corrw requests printing of the working correlation matrix)

49 Output from marginal (PA) model
The GENMOD Procedure
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
  Parameter        Estimate  Standard Error  95% Confidence Limits  Z  Pr > |Z|
  Intercept                                                            <.0001
  time
  time*drugadj A
  time*drugadj B
  time*drugadj P
Score Statistics For Type 3 GEE Analysis
  Source        DF  Chi-Square  Pr > ChiSq
  time
  time*drugadj

50 Interpretations
There is a significant effect of antibiotics according to the Wald test, but not according to the score test:
- Score test: 4.56 ~ $\chi^2(2)$, P = 0.10
- Wald test: 6.99 ~ $\chi^2(2)$, P = 0.03
The effect of placebo is estimated as $\exp(\hat\beta_2) = 0.986$, i.e. a decrease of 1.4%.
The additional effect of drug A is estimated as $\exp(\hat\beta_3) = 0.58$, and the total effect as $\exp(\hat\beta_2 + \hat\beta_3) = 0.574$, i.e. a decrease of 42.6%.
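The P-values and the total effect can be reproduced from the reported numbers. For 2 degrees of freedom the upper tail of the χ² distribution has the closed form $\exp(-x/2)$, and effects multiply on the original scale. Using the rounded value 0.58 gives about 0.572 rather than 0.574, presumably because the slide used unrounded estimates.

```python
from math import exp


def chisq2_pvalue(x: float) -> float:
    """Upper tail of a chi-square distribution with 2 DF: P = exp(-x / 2)."""
    return exp(-x / 2)


print(round(chisq2_pvalue(4.56), 3))  # score test
print(round(chisq2_pvalue(6.99), 3))  # Wald test

# Effects multiply on the original scale: total = placebo effect * additional effect
placebo, additional_A = 0.986, 0.58   # rounded ratios from the slide
total_A = placebo * additional_A
print(round(total_A, 3), f"decrease of {100 * (1 - total_A):.1f}%")
```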

51 Marginal model (PA) for leprosy, II
With additional estimate and output statements, the code becomes:
proc genmod data=leprosy;
  class id drugadj;
  model bacilli = time drugadj*time / dist=poisson link=log type3;
  repeated subject=id / type=un corrw;
  estimate 'change for A' time 1 drugadj*time 1 0 0;
  estimate 'change for B' time 1 drugadj*time 0 1 0;
  estimate 'change for P' time 1 drugadj*time 0 0 1;
  estimate 'additional change A vs. P' drugadj*time 1 0 -1;
  estimate 'additional change B vs. P' drugadj*time 0 1 -1;
  estimate 'additional change A vs. B' drugadj*time 1 -1 0;
  estimate 'additional change (A,B) vs. P' drugadj*time 0.5 0.5 -1;
  output out=pa pred=pred_pa xbeta=xbeta_pa;
run;

52 Comments on the additional code
The estimate statements provide
- estimates of the time effects for each drug separately
- estimates of the additional time effect for each of the two active drugs, compared to placebo
- an estimate of the difference in time effect between the two active drugs
- an estimate of the additional average time effect for the two active drugs, compared to placebo
The output statement creates a data set with predicted values pred_pa, for illustration purposes (see p. 54).

53 Output from the additional estimate statements (p. 51)
  Obs  Label                          Mean Estimate  Mean LowerCL  Mean UpperCL  ChiSq  Prob ChiSq
  1    change for A
  2    change for B
  3    change for P
  4    additional change A vs. P
  5    additional change B vs. P
  6    additional change A vs. B
  7    additional change (A,B) vs. P
The two antibiotics are not significantly different: 0.08 ~ $\chi^2(1)$, P = 0.78, although the estimated effect is a tiny bit larger for drug A (smaller ratio, i.e. a steeper decline).
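Similarly, for 1 degree of freedom the χ² tail probability equals a two-sided normal P-value with $z = \sqrt{x}$, which Python's `math.erfc` gives directly:

```python
from math import erfc, sqrt


def chisq1_pvalue(x: float) -> float:
    """Upper tail of chi-square with 1 DF: P = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2))


print(round(chisq1_pvalue(0.08), 2))  # A vs. B comparison, about 0.78
```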

54 Predicted means from Population Average model (PA)
[Figure: predicted means over time; legend: A, B, P]
Note the identical baselines.

55 Comments on the estimates
Time profiles compared to the simple averages (p. 31):
- Treatment B starts off at a higher level.
- Due to regression to the mean, we therefore expect this group to have the steepest decline.
- Since the averages are actually close to parallel (B does not decline more steeply than A), we conclude that B is not as effective as A, and therefore we see a difference in slope in the predicted means.
- The same type of argument applies to P, which would have decreased the most if it had been equally effective.

56 Subject Specific models (SS)
Variance component models, see p. …:
- Observations $Y_{ij}$, with mean value $\mu_{ij}$, where $\log(\mu_{ij}) = X_{ij}^T \beta + Z_{ij}^T b_i$
- The $b_i$'s denote the random effects, e.g. random levels (intercepts), random slopes etc.
- It is assumed that $b_i \sim N(0, G)$, independent of the covariates $X_i$.
- For any subject, the repeated measurements are conditionally independent given the random effects, and follow a Poisson distribution.
This is a proper multivariate model, in which the correlation between repeated measurements on the same subject is induced by the random effects.

57 Interpretation of SS
Since this is a real model, we can use maximum likelihood (and handle MAR missing values), but
- the effect of a covariate is interpreted as being for fixed values of all other covariates, including a fixed individual, i.e. specific to this subject.
- For models with a log link, however, the interpretation of covariate effects is still as usual, except for
  - the intercept (which gets more of a median-like interpretation, and is therefore smaller than the mean-type intercept from the PA model)
  - covariates that also enter as random effects, e.g. a random slope = random effect of time (not here)

58 A very simple example of random slopes
A population consisting of two individuals (number of bacilli).
A random slope on the log scale means different ratios between follow-up and baseline:
  Individual  Baseline  Follow-up  Ratio
  1
  2
  Average
but for the population, the ratio is …
The average of the individual ratios is not equal to the ratio of the averages.
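The same point with two hypothetical individuals (numbers chosen for illustration only), as a Python sketch:

```python
# Two hypothetical individuals (NOT the numbers from the slide)
baseline = [10, 2]
follow_up = [5, 2]

ratios = [f / b for f, b in zip(follow_up, baseline)]
average_of_ratios = sum(ratios) / len(ratios)
ratio_of_averages = (sum(follow_up) / len(follow_up)) / (sum(baseline) / len(baseline))
print(average_of_ratios, ratio_of_averages)  # 0.75 vs. about 0.583
```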

59 PROs and CONs of SS
PROs:
- It is an actual model, allowing likelihood inference.
- MAR missing values can be handled correctly.
- It may be used for unbalanced data sets.
CONs:
- The interpretation is conditional upon individual random effects, which is not always the focus.
- Higher risk of misspecification, due to assumptions that are difficult to check.
- Computationally problematic when the number of random effects or the overall size of the data becomes large.

60 Mixed effects model (SS)
We now assume random intercepts, $b_i \sim N(0, \omega_b^2)$, by specifying a random level for each individual (so here, $G = \omega_b^2$):
proc glimmix data=leprosy method=quad(qpoints=50);
  class id drugadj;
  model bacilli = time drugadj*time / dist=poisson link=log solution;
  random intercept / subject=id type=vc g;
  /* optional statements added below */
  estimate 'change for A' time 1 drugadj*time 1 0 0 / exp cl;
  estimate 'change for B' time 1 drugadj*time 0 1 0 / exp cl;
  estimate 'change for P' time 1 drugadj*time 0 0 1 / exp cl;
  estimate 'additional change A vs. P' drugadj*time 1 0 -1 / exp cl;
  estimate 'additional change B vs. P' drugadj*time 0 1 -1 / exp cl;
  estimate 'additional change A vs. B' drugadj*time 1 -1 0 / exp cl;
  estimate 'additional change (A,B) vs. P' drugadj*time 0.5 0.5 -1 / exp cl;
  output out=ss pred=pred pred(noblup)=predav pred(ilink)=predmu
         pred(ilink noblup)=predmuav;
run;

61 Comments on the code for the SS model
- We use PROC GLIMMIX.
- method=quad(qpoints=50): performs maximum likelihood estimation by approximating the likelihood function with Gaussian quadrature. The more quadrature points, the better the accuracy (at the cost of computing time).
- random: here we have only one random effect (the intercept), so the choice of type=... is unimportant.
- g: prints the estimate of $\omega_b^2$ (in GLIMMIX, the covariance matrix of the random effects is generally denoted G).
- estimate statements as before (only now, we need the options exp and cl).
- output out=: saves predicted values in the data set ss (there are several different kinds, see p. 64).
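To see what the quadrature approximates, the Python sketch below computes the marginal probability $P(Y = y)$ in a random-intercept Poisson model by numerical integration over $b_i$. As a deliberately simple stand-in it uses a midpoint rule, not the (adaptive) Gauss-Hermite quadrature that GLIMMIX uses, and the values of $\beta_0$ and $\omega_b$ are hypothetical. The resulting marginal mean matches $\exp(\beta_0 + \omega_b^2/2)$ from p. 24.

```python
from math import exp, factorial, pi, sqrt


def marginal_poisson_prob(y: int, beta0: float, omega: float, n_points: int = 2000) -> float:
    """P(Y = y) = integral of Poisson(y | exp(beta0 + b)) * N(b; 0, omega^2) db,
    approximated by a midpoint rule on [-8 * omega, 8 * omega]."""
    lo, hi = -8 * omega, 8 * omega
    h = (hi - lo) / n_points
    total = 0.0
    for k in range(n_points):
        b = lo + (k + 0.5) * h
        mu = exp(beta0 + b)
        poisson = mu**y * exp(-mu) / factorial(y)
        normal = exp(-b * b / (2 * omega**2)) / (omega * sqrt(2 * pi))
        total += poisson * normal * h
    return total


beta0, omega = 1.0, 0.6   # illustrative values
probs = [marginal_poisson_prob(y, beta0, omega) for y in range(60)]
mean = sum(y * p for y, p in enumerate(probs))
print(round(sum(probs), 4), round(mean, 4), round(exp(beta0 + omega**2 / 2), 4))
```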

62 Output from SS-type analysis
Estimated G Matrix
  Effect     Row  Col1
  Intercept
Covariance Parameter Estimates
  Cov Parm   Subject  Estimate  Standard Error
  Intercept  id
Solutions for Fixed Effects
  Effect        drugadj  Estimate  Standard Error  DF  t Value  Pr > |t|
  Intercept                                                     <.0001
  time
  time*drugadj  A
  time*drugadj  B
  time*drugadj  P
Type III Tests of Fixed Effects
  Effect        Num DF  Den DF  F Value  Pr > F
  time
  time*drugadj
Note: Somewhat steeper lines than for the PA model.

63 Output from GLIMMIX analysis, II
Only some columns shown:
  Label                          Exponentiated Estimate  ExpLower  ExpUpper  Pr > |t|
  change for A
  change for B
  change for P
  additional change A vs. P
  additional change B vs. P
  additional change A vs. B
  additional change (A,B) vs. P
Note again: some differences from the PA analysis, but overall the same conclusion.

64 Output dataset from GLIMMIX analysis
The data set ss, created on p. 60, contains 4 different predicted values:
output out=ss pred=pred pred(noblup)=predav pred(ilink)=predmu pred(ilink noblup)=predmuav;
Predictions on the log scale (not so interesting):
- Pred: individual predictions (pred=)
- PredAv: predictions averaged over the population, i.e. with the random effects set to 0 (pred(noblup)=)
Predictions on the original scale (more interesting):
- PredMu: individual predictions (pred(ilink)=)
- PredMuAv: back-transformed average predictions (pred(ilink noblup)=)

65 Individual predicted curves, SS
[Figure: individual predictions, pred(ilink)=predmu]

66 Average individual predictions, SS
[Figure: averages of the individual predictions from p. 65; legend: A, B, P]
Note the resemblance to the average curves on p. 31. Here they are moved a bit closer together.

67 Comparison of SS and PA
[Figure: PA (left) and SS (right); legend: A, B, P]

68 Additional overdispersion in GLIMMIX
Recall the assumptions of the SS model:
for any subject, the repeated measurements are conditionally independent given the random effects, and they follow a Poisson distribution.
It is possible to add extra overdispersion to these conditional models, either by adding the line
random _residual_;
to the code from p. 60, or by using the Negative Binomial distribution instead of the Poisson distribution (dist=negbin).

69 Overview of results for Leprosy
Decrease (A,B) vs. P:
  Model                          Ratio (CI)           P-value
  No correlation                 0.46 (0.36, 0.60)    < …
  No corr., overdispersion       0.46 (0.29, 0.74)
  No corr., Negative Binomial    0.46 (0.28, 0.75)
  PA, Poisson                    0.60 (0.41, 0.88)
  PA, Negative Binomial          0.58 (0.39, 0.86)
  SS, Poisson                    0.57 (0.41, 0.80)
  SS, Poisson, overdispersion    0.56 (0.38, 0.82)
  SS, Negative Binomial          0.57 (0.40, 0.81)

70 Example: Epileptic seizures
Controlled clinical trial with 58 epileptic patients:
- 28 treated with placebo
- 30 treated with progabide (active)
The number of epileptic seizures was recorded
- during an 8-week interval before treatment
- in four 2-week intervals after treatment
Reference: Thall, P.F. and Vail, S.C. (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics.

71 Spaghettiplot - the epilepsy example
[Figure: number of seizures, by week]
This looks good for the patients, but the week 0 level is not comparable to the others, since it is based on 8 weeks of data collection.

72 Seizures per week (rates)
Now, week 0 is comparable to the other weeks in mean, but not in variation (longer sampling time).

73 Seizure example: Mean value plot
[Figure: legend: progabide, placebo]
Not linear... but for illustration, we could assume two straight lines on the log scale.

74 Purpose of investigation
1. Investigate what happens over time: does the number of seizures decrease?
2. Compare the decrease for a patient treated with progabide to the decrease for a similar patient in the placebo group.
3. Compare the decrease for a population treated with progabide to the decrease for a population treated with placebo.
Notation: $T_{ij}$ denotes the time span corresponding to the number of seizures $Y_{ij}$, so $T_{ij}$ is either 2 or 8 weeks.

75 Model building
Model (in principle; not reasonable here) for the number of seizures:
- Poisson outcome
- Random regression, i.e. a linear effect of week, with individual intercepts and slopes
- Mean value proportional to the length of the period (8 or 2 weeks): log(8) and log(2) are used as offsets
This ensures that we model the ratio $Y_{ij}/T_{ij}$ on the log scale, i.e.
$\log\left(\frac{E(Y_{ij})}{T_{ij}}\right) = \alpha + \beta\,\text{time} + \gamma\,\text{treat} \cdot \text{time}$
or equivalently
$\log(E(Y_{ij})) = \alpha + \beta\,\text{time} + \gamma\,\text{treat} \cdot \text{time} + \log(T_{ij})$
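A Python sketch of how the offset works, with hypothetical values of α, β and γ: because $\log(T_{ij})$ enters with a coefficient fixed at 1, the expected count is proportional to the length of the observation period, while the weekly rate is unaffected.

```python
from math import exp, log

# Hypothetical parameters: alpha = log(4 seizures per week) at time 0
alpha, beta, gamma = log(4.0), -0.05, -0.10


def expected_count(time: float, treat: int, weeks_observed: float) -> float:
    """Expected number of seizures in an interval of T_ij weeks;
    log(T_ij) enters as an offset, i.e. with coefficient fixed at 1."""
    return exp(alpha + beta * time + gamma * treat * time + log(weeks_observed))


# Same weekly rate, different interval lengths: counts scale with T_ij
baseline_8wk = expected_count(0, treat=1, weeks_observed=8)
baseline_2wk = expected_count(0, treat=1, weeks_observed=2)
print(baseline_8wk, baseline_2wk, round(baseline_8wk / baseline_2wk, 10))
```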

76 Random regression, SS model in GLIMMIX
Important: the model is not reasonable here (see the figure on p. 73), and is only shown to hint at possible extensions...
proc glimmix data=seizures method=quad(qpoints=50);
  class id adjtreat visit;
  /* lweeks: log of the length of the observation period, log(8) or log(2) */
  model seizures = weeks adjtreat*weeks / dist=poisson offset=lweeks link=log solution;
  random intercept weeks / subject=id type=un g;
  estimate 'weekly decline, treat=0' weeks 1 adjtreat*weeks 1 0 / exp cl;
  estimate 'weekly decline, treat=1' weeks 1 adjtreat*weeks 0 1 / exp cl;
  estimate 'slope, active vs. placebo??' adjtreat*weeks -1 1 / exp cl;
  output out=ss pred=pred pred(noblup)=predav pred(ilink)=predmu
         pred(ilink noblup)=predmuav;
run;
Since time (weeks) here enters as a random effect, the interpretation of the time effects has to be conditional on the specific subject.


Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Models for longitudinal data

Models for longitudinal data Faculty of Health Sciences Contents Models for longitudinal data Analysis of repeated measurements, NFA 016 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

Answer to exercise: Blood pressure lowering drugs

Answer to exercise: Blood pressure lowering drugs Answer to exercise: Blood pressure lowering drugs The data set bloodpressure.txt contains data from a cross-over trial, involving three different formulations of a drug for lowering of blood pressure:

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen 2 / 28 Preparing data for analysis The

More information

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials.

You can specify the response in the form of a single variable or in the form of a ratio of two variables denoted events/trials. The GENMOD Procedure MODEL Statement MODEL response = < effects > < /options > ; MODEL events/trials = < effects > < /options > ; You can specify the response in the form of a single variable or in the

More information

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Modeling Sub-Visible Particle Data Product Held at Accelerated Stability Conditions José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013 Outline Sub-Visible Particle (SbVP) Poisson Negative Binomial

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences

Correlated data. Longitudinal data. Typical set-up for repeated measurements. Examples from literature, I. Faculty of Health Sciences Faculty of Health Sciences Longitudinal data Correlated data Longitudinal measurements Outline Designs Models for the mean Covariance patterns Lene Theil Skovgaard November 27, 2015 Random regression Baseline

More information

Analysis of variance and regression. May 13, 2008

Analysis of variance and regression. May 13, 2008 Analysis of variance and regression May 13, 2008 Repeated measurements over time Presentation of data Traditional ways of analysis Variance component model (the dogs revisited) Random regression Baseline

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Correlated data. Overview. Cross-over study. Repetition. Faculty of Health Sciences. Variance component models, II. More on variance component models

Correlated data. Overview. Cross-over study. Repetition. Faculty of Health Sciences. Variance component models, II. More on variance component models Faculty of Health Sciences Overview Correlated data More on variance component models Variance component models, II Cross-over studies Non-normal data Comparing measurement devices Lene Theil Skovgaard

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

SAS Syntax and Output for Data Manipulation:

SAS Syntax and Output for Data Manipulation: CLP 944 Example 5 page 1 Practice with Fixed and Random Effects of Time in Modeling Within-Person Change The models for this example come from Hoffman (2015) chapter 5. We will be examining the extent

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

STAT 5200 Handout #26. Generalized Linear Mixed Models

STAT 5200 Handout #26. Generalized Linear Mixed Models STAT 5200 Handout #26 Generalized Linear Mixed Models Up until now, we have assumed our error terms are normally distributed. What if normality is not realistic due to the nature of the data? (For example,

More information

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only

Q30b Moyale Observed counts. The FREQ Procedure. Table 1 of type by response. Controlling for site=moyale. Improved (1+2) Same (3) Group only Moyale Observed counts 12:28 Thursday, December 01, 2011 1 The FREQ Procedure Table 1 of by Controlling for site=moyale Row Pct Improved (1+2) Same () Worsened (4+5) Group only 16 51.61 1.2 14 45.16 1

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Lecture 3.1 Basic Logistic LDA

Lecture 3.1 Basic Logistic LDA y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 27, 2018 1 / 84 Overview One-way anova with random variation The rabbit example Hierarchical

More information

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Example: Swelling due to vaccine. Variance component models. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models One-way anova with random variation The rabbit example Hierarchical models with several levels Random regression Lene Theil

More information

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes Lecture 2.1 Basic Linear LDA 1 Outline Linear OLS Models vs: Linear Marginal Models Linear Conditional Models Random Intercepts Random Intercepts & Slopes Cond l & Marginal Connections Empirical Bayes

More information

Repeated Measures Modeling With PROC MIXED E. Barry Moser, Louisiana State University, Baton Rouge, LA

Repeated Measures Modeling With PROC MIXED E. Barry Moser, Louisiana State University, Baton Rouge, LA Paper 188-29 Repeated Measures Modeling With PROC MIXED E. Barry Moser, Louisiana State University, Baton Rouge, LA ABSTRACT PROC MIXED provides a very flexible environment in which to model many types

More information

GEE for Longitudinal Data - Chapter 8

GEE for Longitudinal Data - Chapter 8 GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method

More information

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study

Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study Subject-specific observed profiles of log(fev1) vs age First 50 subjects in Six Cities Study 1.4 0.0-6 7 8 9 10 11 12 13 14 15 16 17 18 19 age Model 1: A simple broken stick model with knot at 14 fit with

More information

Statistics for exp. medical researchers Regression and Correlation

Statistics for exp. medical researchers Regression and Correlation Faculty of Health Sciences Regression analysis Statistics for exp. medical researchers Regression and Correlation Lene Theil Skovgaard Sept. 28, 2015 Linear regression, Estimation and Testing Confidence

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject

More information

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester Modelling Rates Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 05/12/2017 Modelling Rates Can model prevalence (proportion) with logistic regression Cannot model incidence in

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */ CLP 944 Example 4 page 1 Within-Personn Fluctuation in Symptom Severity over Time These data come from a study of weekly fluctuation in psoriasis severity. There was no intervention and no real reason

More information

Variance component models part I

Variance component models part I Faculty of Health Sciences Variance component models part I Analysis of repeated measurements, 30th November 2012 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

SAS Code for Data Manipulation: SPSS Code for Data Manipulation: STATA Code for Data Manipulation: Psyc 945 Example 1 page 1

SAS Code for Data Manipulation: SPSS Code for Data Manipulation: STATA Code for Data Manipulation: Psyc 945 Example 1 page 1 Psyc 945 Example page Example : Unconditional Models for Change in Number Match 3 Response Time (complete data, syntax, and output available for SAS, SPSS, and STATA electronically) These data come from

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

ANOVA Longitudinal Models for the Practice Effects Data: via GLM

ANOVA Longitudinal Models for the Practice Effects Data: via GLM Psyc 943 Lecture 25 page 1 ANOVA Longitudinal Models for the Practice Effects Data: via GLM Model 1. Saturated Means Model for Session, E-only Variances Model (BP) Variances Model: NO correlation, EQUAL

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

PAPER 218 STATISTICAL LEARNING IN PRACTICE

PAPER 218 STATISTICAL LEARNING IN PRACTICE MATHEMATICAL TRIPOS Part III Thursday, 7 June, 2018 9:00 am to 12:00 pm PAPER 218 STATISTICAL LEARNING IN PRACTICE Attempt no more than FOUR questions. There are SIX questions in total. The questions carry

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Multilevel Methodology

Multilevel Methodology Multilevel Methodology Geert Molenberghs Interuniversity Institute for Biostatistics and statistical Bioinformatics Universiteit Hasselt, Belgium geert.molenberghs@uhasselt.be www.censtat.uhasselt.be Katholieke

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010

MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 MIXED MODELS FOR REPEATED (LONGITUDINAL) DATA PART 2 DAVID C. HOWELL 4/1/2010 Part 1 of this document can be found at http://www.uvm.edu/~dhowell/methods/supplements/mixed Models for Repeated Measures1.pdf

More information

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S

Logistic regression analysis. Birthe Lykke Thomsen H. Lundbeck A/S Logistic regression analysis Birthe Lykke Thomsen H. Lundbeck A/S 1 Response with only two categories Example Odds ratio and risk ratio Quantitative explanatory variable More than one variable Logistic

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman.

Faculty of Health Sciences. Correlated data. Variance component models. Lene Theil Skovgaard & Julie Lyng Forman. Faculty of Health Sciences Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 28, 2017 1 / 96 Overview One-way anova with random variation The rabbit example Hierarchical

More information

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models

Correlated data. Overview. Variance component models. Terminology for correlated measurements. Faculty of Health Sciences. Variance component models Faculty of Health Sciences Overview Correlated data Variance component models Lene Theil Skovgaard & Julie Lyng Forman November 28, 2017 One-way anova with random variation The rabbit example Hierarchical

More information

Covariance Structure Approach to Within-Cases

Covariance Structure Approach to Within-Cases Covariance Structure Approach to Within-Cases Remember how the data file grapefruit1.data looks: Store sales1 sales2 sales3 1 62.1 61.3 60.8 2 58.2 57.9 55.1 3 51.6 49.2 46.2 4 53.7 51.5 48.3 5 61.4 58.7

More information

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials

CHL 5225 H Crossover Trials. CHL 5225 H Crossover Trials CHL 55 H Crossover Trials The Two-sequence, Two-Treatment, Two-period Crossover Trial Definition A trial in which patients are randomly allocated to one of two sequences of treatments (either 1 then, or

More information

Longitudinal Modeling with Logistic Regression

Longitudinal Modeling with Logistic Regression Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Swabs, revisited. The families were subdivided into 3 groups according to the factor crowding, which describes the space available for the household.

Swabs, revisited. The families were subdivided into 3 groups according to the factor crowding, which describes the space available for the household. Swabs, revisited 18 families with 3 children each (in well defined age intervals) were followed over a certain period of time, during which repeated swabs were taken. The variable swabs indicates how many

More information

Introduction to Within-Person Analysis and RM ANOVA

Introduction to Within-Person Analysis and RM ANOVA Introduction to Within-Person Analysis and RM ANOVA Today s Class: From between-person to within-person ANOVAs for longitudinal data Variance model comparisons using 2 LL CLP 944: Lecture 3 1 The Two Sides

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial

More information

Lecture 10: Introduction to Logistic Regression

Lecture 10: Introduction to Logistic Regression Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial

More information

Package HGLMMM for Hierarchical Generalized Linear Models

Package HGLMMM for Hierarchical Generalized Linear Models Package HGLMMM for Hierarchical Generalized Linear Models Marek Molas Emmanuel Lesaffre Erasmus MC Erasmus Universiteit - Rotterdam The Netherlands ERASMUSMC - Biostatistics 20-04-2010 1 / 52 Outline General

More information

Some comments on Partitioning

Some comments on Partitioning Some comments on Partitioning Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/30 Partitioning Chi-Squares We have developed tests

More information

Changes Report 2: Examples from the Australian Longitudinal Study on Women s Health for Analysing Longitudinal Data

Changes Report 2: Examples from the Australian Longitudinal Study on Women s Health for Analysing Longitudinal Data ChangesReport: ExamplesfromtheAustralianLongitudinal StudyonWomen shealthforanalysing LongitudinalData June005 AustralianLongitudinalStudyonWomen shealth ReporttotheDepartmentofHealthandAgeing ThisreportisbasedonthecollectiveworkoftheStatisticsGroupoftheAustralianLongitudinal

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED. Maribeth Johnson Medical College of Georgia Augusta, GA

Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED. Maribeth Johnson Medical College of Georgia Augusta, GA Analysis of Longitudinal Data: Comparison Between PROC GLM and PROC MIXED Maribeth Johnson Medical College of Georgia Augusta, GA Overview Introduction to longitudinal data Describe the data for examples

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14

More information

One-stage dose-response meta-analysis

One-stage dose-response meta-analysis One-stage dose-response meta-analysis Nicola Orsini, Alessio Crippa Biostatistics Team Department of Public Health Sciences Karolinska Institutet http://ki.se/en/phs/biostatistics-team 2017 Nordic and

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Variance components and LMMs

Variance components and LMMs Faculty of Health Sciences Variance components and LMMs Analysis of repeated measurements, 4th December 2014 Julie Lyng Forman & Lene Theil Skovgaard Department of Biostatistics, University of Copenhagen

More information

Variance components and LMMs

Variance components and LMMs Faculty of Health Sciences Topics for today Variance components and LMMs Analysis of repeated measurements, 4th December 04 Leftover from 8/: Rest of random regression example. New concepts for today:

More information

Mixed Models for Longitudinal Binary Outcomes. Don Hedeker Department of Public Health Sciences University of Chicago.

Mixed Models for Longitudinal Binary Outcomes. Don Hedeker Department of Public Health Sciences University of Chicago. Mixed Models for Longitudinal Binary Outcomes Don Hedeker Department of Public Health Sciences University of Chicago hedeker@uchicago.edu https://hedeker-sites.uchicago.edu/ Hedeker, D. (2005). Generalized

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information
