,..., θ(2),..., θ(n)
|
|
- Gary Griffith
- 5 years ago
- Views:
Transcription
1 Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates. One choice is the log-linear model: 0 1 Pr(Y = y) = c(θ) X θ (1) j y j + X θ (2) j 1 j y j1 y j θ (n) n y 1...y n A, j j 1 <j 2 with 2 n 1 parameters θ = (θ (1) 1,..., θ(1) n, θ (2) and where c(θ) is the normalizing constant. 12,..., θ(2) n 1,n,..., θ(n) 12...n )T, This formulation allows calculation of cell probabilities, but is less useful for describing Pr(Y = y) as a function of x. Note that we have 2 n 1 parameters and we have two aims: reduce this number, and introduce a regression model. 324 Example: n = 2. We have Pr(Y 1 = y 1, Y 2 = y 2 ) = c(θ) exp θ (1) 1 y 1 + θ (1) 2 y 2 + θ (2) 12 y 1y 2, where θ = (θ (1) 1, θ(1) 2, θ(2) 12 )T and c(θ) 1 = 1X 1X y 1 =0 y 2 =0 exp θ (1) 1 y 1 + θ (1) 2 y 2 + θ (2) 12 y 1y 2 y 1 y 2 Pr(Y 1 = y 1, Y 2 = y 2 ) 0 0 c(θ) 1 0 c(θ) exp(θ (1) 1 ) 0 1 c(θ) exp(θ (1) 2 ) 1 1 c(θ) exp(θ (1) 1 + θ (1) 2 + θ (2) 12 ) 325
2 Hence we have interpretations: exp(θ (1) 1 ) = Pr(Y 1 = 1, Y 2 = 0) Pr(Y 1 = 0, Y 2 = 0) = Pr(Y 1 = 1 Y 2 = 0) Pr(Y 1 = 0 Y 2 = 0) the odds of an event at trial 1, given no event at trial 2; exp(θ (1) 2 ) = Pr(Y 1 = 0, Y 2 = 1) Pr(Y 1 = 0, Y 2 = 0) = Pr(Y 2 = 1 Y 1 = 0) Pr(Y 2 = 0 Y 1 = 0) the odds of an event at trial 2, given an event at trial 1; exp(θ (12) 12 ) = Pr(Y 1 = 1, Y 2 = 1)Pr(Y 1 = 0, Y 2 = 0) Pr(Y 1 = 1, Y 2 = 0)Pr(Y 1 = 0, Y 2 = 1) = Pr(Y 2 = 1 Y 1 = 1)/ Pr(Y 2 = 0 Y 1 = 1) Pr(Y 2 = 1 Y 1 = 0)/ Pr(Y 2 = 0 Y 1 = 0) the ratio of odds of an event at trial 2 given an event at trial 1, divided by the odds of an event at trial 2 given a at event at trial 1. Hence if this parameter is larger than 1 we have positive dependence. 326 Quadratic Exponential Log-Linear Model We describe three approaches to modeling binary data: conditional odds ratios, correlations, marginal odds ratios. Zhao and Prentice (1990) consider the log-linear model with third and higher-order terms set to zero, so that 0 1 Pr(Y = y) = c(θ) X j θ (1) j y j + X θ (2) jk y jy k A. j<k For this model Interpretation: Pr(Y j = 1 Y k = y k, Y l = 0, l j, k) Pr(Y j = 0 Y k = y k, Y l = 0, l j, k) = exp(θ(1) j + θ (2) jk y k). exp(θ (1) j ) is the odds of a success, given all other responses are zero. exp(θ (2) jk ) is the odds ratio describing the association between Y j and Y k, given all other responses are fixed (equal to zero). 327
3 Limitations: 1. Suppose we now wish to model θ as a function of x. Example: Y respiratory infection, x mother s smoking (no/yes). Then we could let the parameters θ depend on x, i.e. θ(x). But the difference between θ (1) j (x = 1) and θ (1) j (x = 0) represent the effect of smoking on the conditional probability of respiratory infection at visit j, given that there was no infection at any other visits. Difficult to interpret, and we would rather model the marginal probability. 2. The interpretation of the θ parameters depends on the number of responses n particularly a problem in a longitudinal setting with different n i. 328 Bahadur Representation Another approach to parameterizing a multivariate binary model was proposed by Bahadur (1961) who used marginal means, as well as second-order moments specified in terms of correlations. Let Then we can write 0 R j = Y j µ j [µ j (1 µ j )] 1/2 ρ jk = corr(y j, Y k ) = E[R j R k ] ρ jkl = E[R j R k R l ] ρ 1,...,n = E[R 1...R n ] Pr(Y = y) + X ρ jk r j r k + X j<k j<k<l ny j=1 µ y j j (1 µ j) 1 y j 1 ρ jkl r j r k r l ρ 1,...,n r 1 r 2...r n A Appealing because we have the marginal means µ j and nuisance parameters. 329
4 Limitations: Unfortunately, the correlations are constrained in complicated ways by the marginal means. Example: consider measurements on a single individual, Y 1 and Y 2, with means µ 1 and µ 2. We have and corr(y 1, Y 2 ) = Pr(Y 1 = 1, Y 2 = 1) µ 1 µ 2 {µ 1 (1 µ 1 )µ 2 (1 µ 2 )} 1/2 max(0, µ 1 + µ 2 1) Pr(Y 1 = 1, Y 2 = 1) min(µ 1, µ 2 ), which implies complex constraints on the correlation. For example, if µ 1 = 0.8 and µ 2 = 0.2 then 0 corr(y 1, Y 2 ) Marginal Odds Ratios An alternative is to parameterize in terms of the marginal means and the marginal odds ratios defined by γ jk = Pr(Y j = 1, Y k = 1) Pr(Y j = 0, Y k = 0) Pr(Y j = 1, Y k = 0)Pr(Y j = 0, Y k = 1) = Pr(Y j = 1 Y k = 1)/Pr(Y j = 0 Y k = 1) Pr(Y j = 1 Y k = 0)/Pr(Y j = 0 Y k = 0) which is the odds that the j-th observation is a 1, given the k-th observation is a 1, divided by the odds that the j-th observation is a 1, given the k-th observation is a 0. Hence if γ jk > 1 we have positive dependence between outcomes j and k. It is then possible to obtain the joint distribution in terms of the means µ, where µ j = Pr(Y j = 1) the odds ratios γ = (γ 12,..., γ n 1,n ) and contrasts of odds ratios 331
5 We need to find E[Y j Y k ] = µ jk, so that we can write down the likelihood function, or an estimating function. For the case of n = 2 we have and so γ 12 = Pr(Y 1 = 1, Y 2 = 1) Pr(Y 1 = 0, Y 2 = 0) Pr(Y 1 = 1, Y 2 = 0)Pr(Y 1 = 0, Y 2 = 1) = µ 12(1 µ 1 µ 2 + µ 12 ) (µ 1 µ 12 )(µ 2 µ 12 ), µ 2 12(γ 12 1) + µ 12 b + γ 12 µ 1 µ 2 = 0, where b = (µ 1 + µ 2 )(1 γ 12 ) 1, to give µ 12 = b ± p b 2 4(γ 12 1)µ 1 µ 2. 2(γ 12 1) Y Y µ 1 1 µ 12 µ 1 1 µ 2 µ Limitations In a longitudinal setting (we add an i subscript to denote individuals), finding the µ ijk terms is computationally complex. Large numbers of nuisance odds ratios if n i s are large assumptions such as γ ijk = γ for all i, j, k may be made. Another possibility is to take log γ ijk = α 0 + α 1 t ij t ik 1, so that the degree of association is inversely proportional to the time between observations. 333
6 Modeling Multivariate Binary Data Using GEE For a marginal Bernoulli outcome we have Pr(Y ij = y ij x ij ) = µ y ij ij (1 µ ij) 1 y ij = exp(y ij θ ij log{1 + e θ ij }), where θ ij = log(µ ij /(1 µ ij ), an exponential family representation. For independent responses we therefore have the likelihood mx Xn i mx Xn i mx Xn i Pr(Y x) = exp 4 y ij θ ij log{1 + e θ ij } 5 = exp 4 j=1 j=1 To find the MLEs we consider the score equation: G(β) = l β = mx Xn i j=1 l ij θ ij θ ij β = mx Xn i x ij (y ij µ ij ) = j=1 j=1 l ij 3 5. mx x T i (y i µ i ). So GEE with working independence can be implemented with standard software, though we need to fix-up the standard errors via sandwich estimation. 334 Non-independence GEE Assuming working correlation matrices: R i (α) and estimating equation G(β) = where W i = 1/2 i R i (α) 1/2 i. mx D T i W 1 i (y i µ i ), Here α are parameters that we need a consistent estimator of (Newey 1990, shows that the choice of estimator for α has no effect on the asymptotic efficiency). Define a set of n i (n i 1)/2 empirical correlations R ijk = (Y ij µ ij )(Y ik µ ik ) [µ ij (1 µ ij )µ ik (1 µ ik )] 1/2. We can then define a set of moment-based estimating equations to obtain estimates of α. 335
7 First Extension to GEE Rather than have a method of moments estimator for α, Prentice (1988) proposed using a second set of estimating equations for α. In the context of data with var(y ij ) = v(µ ij ): G 1 (β, α) = G 2 (β, α) = mx mx D T i W 1 i (Y i µ i ) E T i H 1 i (T i Σ i ) where R ij = {Y ij µ ij }/v(µ ij ) 1/2, to give data T T i = (R i1r i2,..., R ini 1R ini, R 2 i1,..., R2 in i ), Σ i (α) = E[T i ] is a model for the correlations and variances of the standardized residuals, E i = Σ i α, and H i = cov(t i ) is the working covariance model. The vector T i has n i (n i 1)/2 + n i elements in general the working covariance model H i is in general complex. 336 In the context of binary data Prentice (1988) the variances are determined by the mean and so the last n i terms of T i are dropped; he also suggests taking a diagonal working covariance model, H i, i.e. ignoring the covariances. The theoetical variances ae given by var(r ij R ik ) = 1+(1 2p ij )(1 2p ik ){p ij (1 p ij )p ik (1 p ik )} 1/2 Σ(α) ijk Σ(α) 2 ijk which depend on the assumed correlation model Σ these may be taken as the diagonal elements of H i. 337
8 Application of GEE Extension to Marginal Odds Model We have the marginal mean model logit E[Y ij X ij ] = βx ij. Suppose we specify a model for the associations in terms of the marginal log odds ratios: j ff Pr(Yij = 1, Y ik = 1) Pr(Y ij = 0, Y ik = 0) α ijk = log. Pr(Y ij = 1, Y ik = 0) Pr(Y ij = 0, Y ik = 1) These are nuisance parameters, but how do we estimate them? Carey et al. (1992) suggest the following approach fo estimating β and α. Let µ ij = Pr(Y ij = 1) µ ik = Pr(Y ik = 1) µ ijk = Pr(Y ij = 1, Y ik = 1) 338 It is easy to show that Pr(Y ij = 1 Y ik = y ik ) Pr(Y ij = 0 Y ik = y ik ) = exp(y ik α ijk ) Pr(Y ij = 1, Y ik = 0) Pr(Y ij = 0, Y ik = 0) «µ ij µ ijk = exp(y ik α ijk ) 1 µ ij µ ik + µ ijk which can be written as a logistic regression model in terms of conditional probabilities: «Pr(Yij = 1 Y ik = y ik ) logit E[Y ij Y ik ] = log Pr(Y ij = 0 Y ik = y ik ) «µ ij µ ijk = y ik α ijk + log 1 µ ij µ ik + µ ijk where the term on the right is a known offset (the µ s are a function of β only). Suppose for simplicity that α ijk = α then given current estimates of β, α, we can fit a logistic regression model by regressing Y ij on Y ik for 1 j < k n i, to estimate α this can then be used within the working correlation model. Carey et al. (1992) refer to this method as alternating logistic regressions. 339
9 Indonesian Children s Health Example > summary(geese(y ~ xero+age,corstr="independence",id=id,family="binomial")) Mean Model: Mean Link: logit Variance to Mean Relation: binomial Coefficients: (Intercept) e+00 age e-07 xero e-02 Scale Model: Scale Link: identity Estimated Scale Parameters: (Intercept) Correlation Model: Correlation Structure: independence Number of clusters: 275 Maximum cluster size: > summary(geese(y ~ age+xero, corstr="exchangeable", id=id,family="binomial")) (Intercept) e+00 age e-06 xero e-01 Estimated Scale Parameters: (Intercept) Estimated Correlation Parameters: alpha > summary(geese(y ~ age+xero, corstr="ar1", id=id,family="binomial")) Coefficients: (Intercept) e+00 age e-07 xero e-01 Estimated Scale Parameters: (Intercept) Estimated Correlation Parameters: alpha
10 Conditional Likelihood: Binary Longitudinal Data Recall that conditional likelihood is a technique for eliminating nuisance parameters, here what we have previously modeled as random effects. Consider individual i with binary observations y i1,..., y ini and assume the random intercepts model Y ij γ i, β Bernoulli(p ij ), where «pij log = x ij β + γ i 1 p ij where γ i = x i β + b i and x ij (a slight change from our usual notation), are those covariates which change within an individual. 342 We have Pr(y i1,..., y ini γ i, β) = = = n i Y exp (γ i y ij + x ij βy ij ) 1 + exp (γ j=1 i + x ij β) P ni exp γ i j=1 y ij + P n i j=1 x ijy ij β Q ni j=1 [1 + exp (γ i + x ij β)] exp (γ i t 2i + t 1i β) Q ni j=1 [1 + exp (γ i + x ij β)] = exp (γ it 2i + t 1i β) k(γ i, β) = p(t 1i, t 2i γ i, β) where t 1i = k(γ i, β) = n i X X x ij y ij, t 2i = j=1 n i Y j=1 n i j=1 y ij [1 + exp (γ i + x ij β)]. 343
11 We have where L c (β) = p(t 2i γ i, β) = my p(t 1i t 2i, β) = P ni y i+ l=1 exp my p(t 1i, t 2i γ i, β) p(t 2i γ i, β) γ i P ni j=1 y ij + P n i k=1 x iky l ik β k(γ i, β), where the summation is over the ` n i y i+ ways of choosing yi+ ones out of n i, and y l i = (yl i1,..., yl in i ), l = 1,..., ` n i y i+ is the collection of these ways. Hence L c (β) = = my my P ni y i+ P ni exp γ i j=1 y ij + P n i j=1 x ijy ij β l=1 exp P ni y i+ P ni γ i j=1 y ij + P n i k=1 x ikyik l β exp Pni j=1 x ijy ij β l=1 exp `P n i k=1 x ikyik l β 344 Notes Can be computationally expensive to evaluate likelihood if n i is large, e.g. if n i = 20 and y i+ = 10, ` n i y i+ = 184,756. There is no contribution to the conditional likelihood from individuals: With n i = 1. With y i+ = 0 or y i+ = n i. For those covariates with x i1 =... = x ini = x i. The conditional likelihood estimates β s that are associated with within-individual covariates. If a covariate only varies between individuals, then it cannot be estimated using conditional likelihood. For covariates that vary both between and within individuals, only the within-individual contrasts are used. The similarity to Cox s partial likelihood may be exploited to carry out computation. We have not made a distributional assumption for the γ i s! 345
12 Examples: If n i = 3 and y i = (0,0, 1) so that y i+ = 1 then y 1 i = (1, 0,0), y2 i = (0, 1,0), y3 i = (0,0, 1), and the contribution to the conditional likelihood is exp(x i3 β) exp(x i1 β) + exp(x i2 β) + exp(x i3 β). If n i = 3 and y i = (1,0, 1) so that y i+ = 2 then y 1 i = (1, 1,0), y2 i = (1, 0,1), y3 i = (0,1, 1), and the contribution to the conditional likelihood is exp(x i1 β + x i3 β) exp(x i1 β + x i2 β) + exp(x i1 β + x i3 β) + exp(x i2 β + x i3 β). 346
Figure 36: Respiratory infection versus time for the first 49 children.
y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects
More informationModeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses
Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationRepeated ordinal measurements: a generalised estimating equation approach
Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related
More informationBayesian Multivariate Logistic Regression
Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of
More information36-720: The Rasch Model
36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more
More informationLinear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52
Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two
More informationHigh-Throughput Sequencing Course
High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an
More informationGauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA
JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter
More informationA measure of partial association for generalized estimating equations
A measure of partial association for generalized estimating equations Sundar Natarajan, 1 Stuart Lipsitz, 2 Michael Parzen 3 and Stephen Lipshultz 4 1 Department of Medicine, New York University School
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 15 1 / 38 Data structure t1 t2 tn i 1st subject y 11 y 12 y 1n1 Experimental 2nd subject
More informationSemiparametric Generalized Linear Models
Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student
More informationUsing Estimating Equations for Spatially Correlated A
Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship
More informationGeneralized Estimating Equations (gee) for glm type data
Generalized Estimating Equations (gee) for glm type data Søren Højsgaard mailto:sorenh@agrsci.dk Biometry Research Unit Danish Institute of Agricultural Sciences January 23, 2006 Printed: January 23, 2006
More informationST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples
ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will
More informationSTA216: Generalized Linear Models. Lecture 1. Review and Introduction
STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general
More informationReview. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationGeneralized linear models
Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationContrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:
Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.
More informationGEE for Longitudinal Data - Chapter 8
GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method
More informationSample size calculations for logistic and Poisson regression models
Biometrika (2), 88, 4, pp. 93 99 2 Biometrika Trust Printed in Great Britain Sample size calculations for logistic and Poisson regression models BY GWOWEN SHIEH Department of Management Science, National
More informationLecture 3.1 Basic Logistic LDA
y Lecture.1 Basic Logistic LDA 0.2.4.6.8 1 Outline Quick Refresher on Ordinary Logistic Regression and Stata Women s employment example Cross-Over Trial LDA Example -100-50 0 50 100 -- Longitudinal Data
More informationSimulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling
Simulating Longer Vectors of Correlated Binary Random Variables via Multinomial Sampling J. Shults a a Department of Biostatistics, University of Pennsylvania, PA 19104, USA (v4.0 released January 2015)
More informationLinear Mixed Models. One-way layout REML. Likelihood. Another perspective. Relationship to classical ideas. Drawbacks.
Linear Mixed Models One-way layout Y = Xβ + Zb + ɛ where X and Z are specified design matrices, β is a vector of fixed effect coefficients, b and ɛ are random, mean zero, Gaussian if needed. Usually think
More informationSTA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random
STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationModeling Longitudinal Count Data with Excess Zeros and Time-Dependent Covariates: Application to Drug Use
Modeling Longitudinal Count Data with Excess Zeros and : Application to Drug Use University of Northern Colorado November 17, 2014 Presentation Outline I and Data Issues II Correlated Count Regression
More informationStuart R. Lipsitz and Garrett M. Fitzmaurice, Joseph G. Ibrahim, Debajyoti Sinha, Michael Parzen. and Steven Lipshultz
J. R. Statist. Soc. A (2009) 172, Part 1, pp. 3 20 Joint generalized estimating equations for multivariate longitudinal binary outcomes with missing data: an application to acquired immune deficiency syndrome
More informationPart IV: Marginal models
Part IV: Marginal models 246 BIO 245, Spring 2018 Indonesian Children s Health Study (ICHS) Study conducted to determine the effects of vitamin A deficiency in preschool children Data on K=275 children
More informationSingle-level Models for Binary Responses
Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =
More informationConvergence Properties of a Sequential Regression Multiple Imputation Algorithm
Convergence Properties of a Sequential Regression Multiple Imputation Algorithm Jian Zhu 1, Trivellore E. Raghunathan 2 Department of Biostatistics, University of Michigan, Ann Arbor Abstract A sequential
More informationLecture 3 September 1
STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationSample Size and Power Considerations for Longitudinal Studies
Sample Size and Power Considerations for Longitudinal Studies Outline Quantities required to determine the sample size in longitudinal studies Review of type I error, type II error, and power For continuous
More informationGeneralized Linear Models (1/29/13)
STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationBiostatistics Advanced Methods in Biostatistics IV
Biostatistics 140.754 Advanced Methods in Biostatistics IV Jeffrey Leek Assistant Professor Department of Biostatistics jleek@jhsph.edu 1 / 35 Tip + Paper Tip Meet with seminar speakers. When you go on
More informationGeneralized Quasi-likelihood (GQL) Inference* by Brajendra C. Sutradhar Memorial University address:
Generalized Quasi-likelihood (GQL) Inference* by Brajendra C. Sutradhar Memorial University Email address: bsutradh@mun.ca QL Estimation for Independent Data. For i = 1,...,K, let Y i denote the response
More informationLatent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent
Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationEfficiency of generalized estimating equations for binary responses
J. R. Statist. Soc. B (2004) 66, Part 4, pp. 851 860 Efficiency of generalized estimating equations for binary responses N. Rao Chaganty Old Dominion University, Norfolk, USA and Harry Joe University of
More informationLecture 5: LDA and Logistic Regression
Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant
More informationPoisson regression: Further topics
Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to
More informationSTAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized
More informationRobust covariance estimator for small-sample adjustment in the generalized estimating equations: A simulation study
Science Journal of Applied Mathematics and Statistics 2014; 2(1): 20-25 Published online February 20, 2014 (http://www.sciencepublishinggroup.com/j/sjams) doi: 10.11648/j.sjams.20140201.13 Robust covariance
More informationSTAT 526 Spring Final Exam. Thursday May 5, 2011
STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will
More informationFREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE
FREQUENTIST BEHAVIOR OF FORMAL BAYESIAN INFERENCE Donald A. Pierce Oregon State Univ (Emeritus), RERF Hiroshima (Retired), Oregon Health Sciences Univ (Adjunct) Ruggero Bellio Univ of Udine For Perugia
More informationPart 8: GLMs and Hierarchical LMs and GLMs
Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course
More informationFaculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics
Faculty of Health Sciences Regression models Counts, Poisson regression, 27-5-2013 Lene Theil Skovgaard Dept. of Biostatistics 1 / 36 Count outcome PKA & LTS, Sect. 7.2 Poisson regression The Binomial
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationGeneralized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized Linear Mixed Models for Count Data
Sankhyā : The Indian Journal of Statistics 2009, Volume 71-B, Part 1, pp. 55-78 c 2009, Indian Statistical Institute Generalized Quasi-likelihood versus Hierarchical Likelihood Inferences in Generalized
More informationMLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22
MLE and GMM Li Zhao, SJTU Spring, 2017 Li Zhao MLE and GMM 1 / 22 Outline 1 MLE 2 GMM 3 Binary Choice Models Li Zhao MLE and GMM 2 / 22 Maximum Likelihood Estimation - Introduction For a linear model y
More informationStatistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes
Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationLongitudinal Modeling with Logistic Regression
Newsom 1 Longitudinal Modeling with Logistic Regression Longitudinal designs involve repeated measurements of the same individuals over time There are two general classes of analyses that correspond to
More informationMultinomial Regression Models
Multinomial Regression Models Objectives: Multinomial distribution and likelihood Ordinal data: Cumulative link models (POM). Ordinal data: Continuation models (CRM). 84 Heagerty, Bio/Stat 571 Models for
More informationIntroduction to mtm: An R Package for Marginalized Transition Models
Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition
More informationTopic 12 Overview of Estimation
Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the
More informationCopula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011
Copula Regression RAHUL A. PARSA DRAKE UNIVERSITY & STUART A. KLUGMAN SOCIETY OF ACTUARIES CASUALTY ACTUARIAL SOCIETY MAY 18,2011 Outline Ordinary Least Squares (OLS) Regression Generalized Linear Models
More informationResearch Projects. Hanxiang Peng. March 4, Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis
Hanxiang Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis March 4, 2009 Outline Project I: Free Knot Spline Cox Model Project I: Free Knot Spline Cox Model Consider
More informationChap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University
Chap 2. Linear Classifiers (FTH, 4.1-4.4) Yongdai Kim Seoul National University Linear methods for classification 1. Linear classifiers For simplicity, we only consider two-class classification problems
More informationPQL Estimation Biases in Generalized Linear Mixed Models
PQL Estimation Biases in Generalized Linear Mixed Models Woncheol Jang Johan Lim March 18, 2006 Abstract The penalized quasi-likelihood (PQL) approach is the most common estimation procedure for the generalized
More informationJoint Modeling of Longitudinal Item Response Data and Survival
Joint Modeling of Longitudinal Item Response Data and Survival Jean-Paul Fox University of Twente Department of Research Methodology, Measurement and Data Analysis Faculty of Behavioural Sciences Enschede,
More informationNon-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
Optimum Design for Mixed Effects Non-Linear and generalized Linear Models Cambridge, August 9-12, 2011 Non-maximum likelihood estimation and statistical inference for linear and nonlinear mixed models
More informationChapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models
Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 29 14.1 Regression Models
More informationTECHNICAL REPORT # 59 MAY Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study
TECHNICAL REPORT # 59 MAY 2013 Interim sample size recalculation for linear and logistic regression models: a comprehensive Monte-Carlo study Sergey Tarima, Peng He, Tao Wang, Aniko Szabo Division of Biostatistics,
More informationJustine Shults 1,,, Wenguang Sun 1,XinTu 2, Hanjoo Kim 1, Jay Amsterdam 3, Joseph M. Hilbe 4,5 and Thomas Ten-Have 1
STATISTICS IN MEDICINE Statist. Med. 2009; 28:2338 2355 Published online 26 May 2009 in Wiley InterScience (www.interscience.wiley.com).3622 A comparison of several approaches for choosing between working
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationUsing the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes 1
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes 1 JunXuJ.ScottLong Indiana University 2005-02-03 1 General Formula The delta method is a general
More informationConstrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources
Constrained Maximum Likelihood Estimation for Model Calibration Using Summary-level Information from External Big Data Sources Yi-Hau Chen Institute of Statistical Science, Academia Sinica Joint with Nilanjan
More informationPrevious lecture. P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing.
Previous lecture P-value based combination. Fixed vs random effects models. Meta vs. pooled- analysis. New random effects testing. Interaction Outline: Definition of interaction Additive versus multiplicative
More informationSmall Sample Methods for the Analysis of Clustered Binary Data
Utah State University DigitalCommons@USU All Graduate Theses and Dissertations Graduate Studies 5-2008 Small Sample Methods for the Analysis of Clustered Binary Data Lawrence J. Cook Utah State University
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2009 Paper 251 Nonparametric population average models: deriving the form of approximate population
More informationMixed models in R using the lme4 package Part 5: Generalized linear mixed models
Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates Madison January 11, 2011 Contents 1 Definition 1 2 Links 2 3 Example 7 4 Model building 9 5 Conclusions 14
More informationCovariance Models (*) X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed effects
Covariance Models (*) Mixed Models Laird & Ware (1982) Y i = X i β + Z i b i + e i Y i : (n i 1) response vector X i : (n i p) design matrix for fixed effects β : (p 1) regression coefficient for fixed
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationi=1 h n (ˆθ n ) = 0. (2)
Stat 8112 Lecture Notes Unbiased Estimating Equations Charles J. Geyer April 29, 2012 1 Introduction In this handout we generalize the notion of maximum likelihood estimation to solution of unbiased estimating
More informationConsider Table 1 (Note connection to start-stop process).
Discrete-Time Data and Models Discretized duration data are still duration data! Consider Table 1 (Note connection to start-stop process). Table 1: Example of Discrete-Time Event History Data Case Event
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Mengye Ren mren@cs.toronto.edu October 18, 2015 Mengye Ren Naive Bayes and Gaussian Bayes Classifier October 18, 2015 1 / 21 Naive Bayes Bayes Rules: Naive Bayes
More informationApplied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid
Applied Economics Regression with a Binary Dependent Variable Department of Economics Universidad Carlos III de Madrid See Stock and Watson (chapter 11) 1 / 28 Binary Dependent Variables: What is Different?
More informationNaive Bayes and Gaussian Bayes Classifier
Naive Bayes and Gaussian Bayes Classifier Ladislav Rampasek slides by Mengye Ren and others February 22, 2016 Naive Bayes and Gaussian Bayes Classifier February 22, 2016 1 / 21 Naive Bayes Bayes Rule:
More informationAppendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator
Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator As described in the manuscript, the Dimick-Staiger (DS) estimator
More informationThe consequences of misspecifying the random effects distribution when fitting generalized linear mixed models
The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San
More informationPrimal-dual Covariate Balance and Minimal Double Robustness via Entropy Balancing
Primal-dual Covariate Balance and Minimal Double Robustness via (Joint work with Daniel Percival) Department of Statistics, Stanford University JSM, August 9, 2015 Outline 1 2 3 1/18 Setting Rubin s causal
More informationMachine Learning. Lecture 3: Logistic Regression. Feng Li.
Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification
More informationSTAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review
More informationBinary Response: Logistic Regression. STAT 526 Professor Olga Vitek
Binary Response: Logistic Regression STAT 526 Professor Olga Vitek March 29, 2011 4 Model Specification and Interpretation 4-1 Probability Distribution of a Binary Outcome Y In many situations, the response
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationStat 579: Generalized Linear Models and Extensions
Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement
More informationBayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London
Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationModels for Clustered Data
Models for Clustered Data Edps/Psych/Soc 589 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Spring 2019 Outline Notation NELS88 data Fixed Effects ANOVA
More informationModels for Clustered Data
Models for Clustered Data Edps/Psych/Stat 587 Carolyn J Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2017 Outline Notation NELS88 data Fixed Effects ANOVA
More informationCourse topics (tentative) The role of random effects
Course topics (tentative) random effects linear mixed models analysis of variance frequentist likelihood-based inference (MLE and REML) prediction Bayesian inference The role of random effects Rasmus Waagepetersen
More informationStat 5102 Final Exam May 14, 2015
Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions
More informationA generalized linear mixed model for longitudinal binary data with a marginal logit link function
A generalized linear mixed model for longitudinal binary data with a marginal logit link function MICHAEL PARZEN, Emory University, Atlanta, GA, U.S.A. SOUPARNO GHOSH Texas A&M University, College Station,
More information