H-LIKELIHOOD ESTIMATION METHOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL


Intesar N. El-Saeiti
Department of Statistics, Faculty of Science, University of Benghazi, Libya.

ABSTRACT

Clustered or hierarchically structured data with binary responses are common in many practical applications. Clusters may contain equal numbers of observations (balanced) or unequal numbers (unbalanced). Such data structures often involve complex patterns of variability. Mixed models are often the most appropriate models to use in practice, as they contain fixed effects of interest and random effects that account for the clustering. The random effects reflect multiple error structures. According to Lee and Nelder (1996), a preferred model for clustered binary mixed effects data is the Hierarchical Generalized Linear Model (HGLM). This article compares the performance of the h-likelihood estimation method for mixed effects models of clustered binary data with balanced and unbalanced cluster sizes. The comparison was carried out by computer simulation in terms of unbiasedness of the parameter estimates, Type I error rate, power, and standard error. The simulation uses different numbers of clusters and different cluster sizes. The results show that the balanced mixed effects clustered binary data model performs better than the unbalanced one.

Keywords: Hierarchical Generalized Linear Model (HGLM), H-Likelihood Method, Binary Response.

INTRODUCTION

Many research studies in health, finance, education, and the social sciences involve collecting binary data clustered into groups, such as the smoking status of students sampled from different schools or the disease status of animals from different farms.
Such data are expected to be correlated within clusters, as students from the same school tend to be more similar than those from different schools, and animals from the same farm tend to be more similar than those from different farms. When designing such studies, a choice needs to be made regarding the number of groups to sample from. A larger number of groups or schools results in less dependence in the data and more precision in estimating the effects of explanatory variables. Clusters may be balanced or unbalanced; that is, the number of observations in a cluster (the cluster size) may be equal or may differ among clusters. Unbalanced clusters result from sub-sampling unequal numbers of observations from each cluster. They also occur when there are randomly missing vector elements for a clustered multivariate outcome, or when subjects differ in the number of vector elements relevant to the analysis.

Many authors have studied unbalanced clustered data. Different cluster sizes can lead to different dispersions for each cluster. This imbalance raises the problem of heterogeneous models that require different variance components, as addressed in previous work for continuous responses (El-Saeiti, 2004). This article uses a nested design with a mixed effects model; the mixed model is often the most appropriate model in practice because it contains both fixed and random factors. A model that contains both fixed effects and random effects is called a generalized linear mixed model (GLMM) or a hierarchical generalized linear model (HGLM) (Lee and Nelder, 1996). Hierarchical generalized linear models allow extra error components in the linear predictors of generalized linear models. The distribution of these components is not required to be normal, which allows a broader class of models.
In hierarchical generalized linear models, the response and the random effects are allowed to follow any distribution in the exponential family; for more details see McCulloch and Searle (2001). As such, the HGLM is more appropriate for clustered data than the generalized linear model (GLM). In generalized linear models, maximum likelihood (ML) is used to estimate the mean component. An extension of ML for HGLMs is the Restricted Pseudo Likelihood (REPL) estimation method for binary mixed effects models, discussed in depth by El-Saeiti (2015). Geys, Molenberghs, and Ryan (1997) showed that ML and REPL give parameter estimates that agree fairly closely. Hierarchical likelihood (HL) estimation is used to estimate both the mean parameters and the dispersion parameters. In HL, as in REPL, the distribution of the random components does not need to be normal; this allows for a broader class of models (Lee and Nelder, 1996).

Lee and Nelder (1996) defined the hierarchical likelihood (h-likelihood) for y as

$$h = \ln f(y \mid v; \beta, \phi) + \ln f(v; \alpha) \qquad (1)$$

$$\phantom{h} = l(\beta, \phi; y \mid v) + l(\alpha; v), \qquad (2)$$

where $f(y \mid v; \beta, \phi)$ and $f(v; \alpha)$ denote the conditional density function of y given the random effect v and the density function of v, respectively. Here $\beta$ are the fixed effects, $\phi$ are the dispersion parameters of the conditional distribution of $y \mid v$, and $\alpha$ are the parameters of the random effects. The random component v is the scale on which the random effect u occurs linearly in the linear predictor, $v = v(u)$. One reason for developing an algorithm on the v-scale rather than on the u-scale is that v can often assume any real value whereas u usually has range restrictions, which may cause convergence problems (Lee & Nelder, 1996). Estimates derived from maximizing the h-likelihood are called maximum h-likelihood estimates (MHLEs); they are obtained by solving

$$\frac{\partial h}{\partial \beta} = 0, \qquad \frac{\partial h}{\partial v} = 0.$$

As an example of the HGLM, consider a binary outcome. According to Lee and Nelder (1996), the appropriate distribution for the dependent variable is binomial (since the outcome is binary) and the appropriate distribution for the random effect is a beta distribution. For more detail and examples of binary outcomes with beta distributed random effects see El-Saeiti (2013), Lalonde (2009) and Lee and Nelder (1996). The HGLM pieces (response distribution, random effect distribution, linear component, and link function) are, respectively:

$$Y_{ij} \mid u_i \sim \mathrm{Bin}(\mu, \mu(1-\mu)), \qquad u_i \sim \mathrm{Beta}(\gamma, \lambda), \qquad \eta_{ij} = x_{ij}\beta + v(u_i), \qquad \eta_{ij} = \mathrm{logit}(p).$$

The h-likelihood for the binomial-beta model (Lee & Nelder, 1996) is $h = l(\beta, \phi; y \mid v) + l(\alpha; v)$. The h-likelihood estimating equations for the fixed part $\beta$ and the random component v are, respectively,

$$\frac{\partial h}{\partial \beta_k} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left[ x_{ijk} y_{ij} - n_i x_{ijk} \frac{e^{x_{ij}\beta + v_i}}{1 + e^{x_{ij}\beta + v_i}} \right] = 0, \qquad (3)$$

and

$$\frac{\partial h}{\partial v_i} = \sum_{j=1}^{n_i} \left[ y_{ij} - \frac{e^{x_{ij}\beta + v_i}}{1 + e^{x_{ij}\beta + v_i}} \right] + \gamma - (\gamma + \lambda) \frac{e^{v_i}}{1 + e^{v_i}} = 0. \qquad (4)$$
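The h-likelihood and the left-hand sides of estimating equations (3) and (4) can be evaluated numerically. The following is a minimal sketch, not the author's implementation. It assumes that each y_ij is a single Bernoulli trial (so the factor n_i that multiplies the fitted probability in (3) is taken as one), that v_i = logit(u_i), and that the additive constant -ln B(γ, λ) is dropped. The function names and the data layout (a list of clusters, each a list of (x_row, y) pairs) are illustrative choices, not taken from the article.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function e^t / (1 + e^t)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log1pexp(t):
    """Numerically stable evaluation of ln(1 + e^t)."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def h_loglik(beta, v, clusters, gamma, lam):
    """h = l(beta; y|v) + l(alpha; v) for Bernoulli responses, up to a constant.

    clusters[i] is a list of (x_row, y) pairs; x_row includes the intercept.
    """
    h = 0.0
    for v_i, rows in zip(v, clusters):
        for x, y in rows:
            eta = sum(b * xk for b, xk in zip(beta, x)) + v_i
            h += y * eta - log1pexp(eta)
        # Log density of v_i = logit(u_i) with u_i ~ Beta(gamma, lam).
        h += gamma * v_i - (gamma + lam) * log1pexp(v_i)
    return h


def h_score(beta, v, clusters, gamma, lam):
    """Return (dh/dbeta, dh/dv), the left-hand sides of the estimating equations."""
    g_beta = [0.0] * len(beta)
    g_v = []
    for v_i, rows in zip(v, clusters):
        resid_sum = 0.0
        for x, y in rows:
            eta = sum(b * xk for b, xk in zip(beta, x)) + v_i
            resid = y - sigmoid(eta)
            resid_sum += resid
            for k, xk in enumerate(x):
                g_beta[k] += xk * resid
        g_v.append(resid_sum + gamma - (gamma + lam) * sigmoid(v_i))
    return g_beta, g_v
```

A root-finding routine would drive both returned vectors to zero; comparing `h_score` with finite differences of `h_loglik` is a quick way to check the algebra.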

Equating $\partial h / \partial v_i$ to zero gives an estimate of the random effect. Writing $u_i = e^{v_i}/(1 + e^{v_i})$ and $n_i p_i$ for the sum of the fitted probabilities in cluster i,

$$\hat{u}_i = \frac{\sum_{j=1}^{n_i} y_{ij} - n_i p_i + \gamma}{\gamma + \lambda}.$$

Equations (3) and (4) can then be solved by using either the Newton-Raphson method or Fisher's scoring method (Gu, 2008).

SIMULATION

Two data sets were generated: the first with balanced cluster sizes and the second with unbalanced cluster sizes. For each, the parameter values were defined, the covariate values and the random effect variable were generated, and the success probability of the dependent variable was calculated. For unequal cluster sizes, the number of subjects per cluster was generated from a Poisson distribution whose mean was the mean number of observations per cluster. By choosing different mean cluster sizes (n = 10, 25, 50, 100), the researcher showed the difference in statistical performance for various sample sizes. The next step was to generate a normally distributed continuous variable $x_{1ij}$ with mean 3 and known variance 20, $x_{1ij} \sim N(3, 20)$. Then a beta distributed random variable $u_i$ with parameters $\gamma = 2$ and $\lambda = 3$ was generated for each cluster i, $u_i \sim \mathrm{Beta}(2, 3)$. For equal cluster sizes the same steps were taken, but with the same number of observations in each cluster. Finally, $Y_{ij}$ was generated for each data unit from a Bernoulli distribution with success probability

$$p_{ij} = \frac{e^{\beta_0 + \beta_1 x_{ij} + u_i}}{1 + e^{\beta_0 + \beta_1 x_{ij} + u_i}},$$

where $\beta_0 = 1$ and $\beta_1 = 0.2$. Parameter estimates were obtained using h-likelihood (Heo and Leon, 2005). The number of clusters was set to K = 10, 20, 50, 100; the cluster size for balanced clusters to n = 10, 25, 100; and, for unbalanced clusters, the mean number of observations per cluster to n = 10, 25, 100. For each combination of K and n, 1,000 data sets were generated for each of the equal and unequal cases to calculate the power, Type I error, and standard errors.
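The data-generating steps described above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the Poisson sampler is a simple textbook method, and a drawn cluster size of zero is raised to one (the article does not say how empty clusters were handled). As in the stated success probability, u_i enters the linear predictor directly.

```python
import math
import random


def poisson(rng, mean):
    """Draw one Poisson variate by Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def simulate(num_clusters, size, balanced, beta0=1.0, beta1=0.2, seed=0):
    """Return a list of (cluster, x, y) rows for one simulated data set.

    size is the common cluster size when balanced is True, otherwise the
    Poisson mean of the cluster sizes.
    """
    rng = random.Random(seed)
    rows = []
    for i in range(num_clusters):
        # Assumption: an empty cluster is not allowed, so zero becomes one.
        n_i = size if balanced else max(1, poisson(rng, size))
        u_i = rng.betavariate(2.0, 3.0)            # u_i ~ Beta(2, 3)
        for _ in range(n_i):
            x = rng.gauss(3.0, math.sqrt(20.0))    # x ~ N(3, 20), variance 20
            eta = beta0 + beta1 * x + u_i
            p = 1.0 / (1.0 + math.exp(-eta))
            y = 1 if rng.random() < p else 0       # Bernoulli(p)
            rows.append((i, x, y))
    return rows


balanced_data = simulate(num_clusters=10, size=25, balanced=True)
unbalanced_data = simulate(num_clusters=10, size=25, balanced=False, seed=1)
```

Passing a different `seed` for each of the 1,000 replications keeps every generated data set reproducible.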
To calculate the power, Type I error rate, and standard error, data were generated according to the model with the systematic component $\eta_{ij} = \beta_0 + \beta_1 x_{1ij} + v_i$, with a single treatment effect $\beta_1$. The model was then fitted with the systematic component $\eta_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + v_i$, where $\beta_0$ was the intercept, $\beta_1$ was the treatment effect, $x_1$ was generated from the normal distribution, $\beta_2$ was an extra parameter, and $x_2$ was the second treatment variable, generated from the Poisson distribution with mean 3, $x_2 \sim \mathrm{Poi}(\lambda = 3)$. Power was estimated as the proportion of correct detections of significance for $\beta_1$, while the Type I error rate was estimated as the proportion of incorrect detections of significance for $\beta_2$. For the h-likelihood HGLM described in the previous section, the systematic component applied for generating data was $\eta_{ij} = x_{1ij} + v_i$, and the systematic component for the fitted model was $\eta_{ij} = x_{1ij} + 3.1x_{2ij} + v_i$, where $v_i \sim \mathrm{Beta}(2, 3)$. For the Binomial Beta h-likelihood, the researcher used the hglm function in the hglm package in R, which returns the estimates of the parameters $\beta$ together with t-statistics and p-values. Over the simulation, the average of the 1,000 estimates was calculated for $\beta_1$ and $\beta_2$, along with the power of the hypothesis test for $\beta_1$, the Type I error rate of the hypothesis test for $\beta_2$, and the standard error of $\beta_1$.
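Once each replicate has been fitted, the summary quantities described above reduce to simple averages. The sketch below assumes the fitting step (done in the article with the hglm package in R) has already produced, for every replicate, the estimates of β1 and β2, the standard error of β1, and the two p-values. The dictionary keys and the function name are hypothetical, and the toy numbers are made up purely to show the call.

```python
from statistics import mean


def summarise(replicates, alpha=0.05):
    """Summarise fitted replicates.

    replicates: list of dicts with keys b1, b2, se1, p1, p2, one per data set.
    Power is the share of replicates rejecting H0: beta1 = 0 (a true effect);
    the Type I error rate is the share rejecting H0: beta2 = 0 (no effect).
    """
    return {
        "mean_b1": mean(r["b1"] for r in replicates),
        "mean_b2": mean(r["b2"] for r in replicates),
        "power": mean(1.0 if r["p1"] < alpha else 0.0 for r in replicates),
        "type1_error": mean(1.0 if r["p2"] < alpha else 0.0 for r in replicates),
        "mean_se1": mean(r["se1"] for r in replicates),
    }


# Toy input with four made-up replicates (not results from the article).
toy = [
    {"b1": 0.21, "b2": 0.01, "se1": 0.05, "p1": 0.01, "p2": 0.50},
    {"b1": 0.18, "b2": -0.02, "se1": 0.06, "p1": 0.20, "p2": 0.01},
    {"b1": 0.22, "b2": 0.00, "se1": 0.05, "p1": 0.03, "p2": 0.30},
    {"b1": 0.19, "b2": 0.03, "se1": 0.04, "p1": 0.04, "p2": 0.90},
]
summary = summarise(toy)
```

Running the same function separately on the balanced and unbalanced replicates gives the paired columns reported in the result tables.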

RESULTS

Table 1 gives the Binomial Beta h-likelihood parameter estimates.

[Table 1: Parameter estimates. Columns: Clusters (K), Sample size, estimates of β1 and β2 for balanced clusters, estimates of β1 and β2 for unbalanced clusters. Table values not recoverable.]

The Binomial Beta h-likelihood parameter estimates for balanced and unbalanced cluster sizes show that the estimated values of β1 and β2 were very close to the true values, β1 = 0.2 and β2 = 0. The Binomial Beta h-likelihood was a good estimation method, with estimates close to the true parameters. The results show that the performance of the Binomial Beta h-likelihood estimates is similar regardless of inequality in cluster size.

Table 2 gives the Binomial Beta h-likelihood Type I error rate for β2 for balanced and unbalanced cluster sizes. Type I error rates were computed as the proportion of p-values less than 0.05 under the null hypothesis H0: β2 = 0. Ideally, the Type I error rate should be close to the nominal level of 0.05. The Type I error rates for β2 differed slightly between equal and unequal cluster sizes. For large cluster sizes, balanced clusters had smaller Type I error rates than unbalanced clusters.

[Table 2: Type I error. Columns: Clusters (K), Sample size, Balanced, Unbalanced. Table values not recoverable.]

Table 3 gives the Binomial Beta h-likelihood power of the hypothesis test for β1. Statistical power was computed as the proportion of correct rejections of the hypothesis H0: β1 = 0. In the simulation, the test was conducted 1,000 times to see how often it was significant. The power was the proportion of those 1,000 tests that correctly rejected.

[Table 3: Power. Columns: Clusters (K), Sample size, Balanced, Unbalanced. Table values not recoverable.]

Balanced cluster sizes gave more power than unbalanced cluster sizes, especially with small sample sizes. The higher power for balanced clusters means that the Binomial Beta h-likelihood is a better estimation method for balanced than for unbalanced clustered binary models.

Table 4 reports the standard error (SE). The SE was computed as the average of the 1,000 SEs of the estimates of β1. A smaller SE represents smaller estimated variability, or greater precision, of the parameter estimates (Heo and Leon, 2005). The standard error of the estimate of β indicates whether or not efficiency improved. Table 4 shows that, for the Binomial Beta h-likelihood, balanced clusters have small standard errors.

[Table 4: Standard error. Columns: Clusters (K), Sample size, Balanced, Unbalanced. Table values not recoverable.]

Tables 1 to 4 summarize the simulation results of the Binomial Beta h-likelihood method for equal and unequal cluster sizes: parameter estimates, power, Type I error rate, and standard error. The tables show that the Binomial Beta h-likelihood was a good estimation method, because the average of 1,000 replications gave estimates very close to the true values, which were 0.2 for β1 and zero for β2. The power for balanced clusters was higher than for unbalanced clusters, and the Type I error rate for balanced clusters was somewhat smaller than for unbalanced clusters. A smaller average SE represents smaller estimated variability, or greater precision, of the parameter estimates (Heo and Leon, 2005). Balanced cluster sizes gave somewhat better values than unbalanced cluster sizes.

CONCLUSIONS

The Binomial Beta h-likelihood was an effective method for mixed effects models of clustered binary data, with slight differences according to cluster size. The average of 1,000 replications gave estimates that were close to the true values. The power of the hypothesis test for the regression parameters was better for balanced than for unbalanced clusters, and the Type I error rate of the hypothesis test for the regression parameters was acceptable, with smaller values for balanced than for unbalanced clusters. The standard error of the regression parameters was small.
In this article, the author shows that the Binomial Beta h-likelihood is an acceptable estimation method for clustered binary responses, more so for balanced than for unbalanced cluster sizes. The simulation results demonstrate the capability of the Binomial Beta h-likelihood estimation method with balanced cluster sizes.

FUTURE WORK

Since the Binomial Beta h-likelihood is an acceptable estimation method for balanced more than for unbalanced clustered binary responses, the author's next work will be to adjust the Binomial Beta h-likelihood estimation method to deal with unbalanced cluster sizes.

References

El-Saeiti, I. N. (2004). Messy data in heteroscedastic models case study: Mixed nested design. M.Sc. thesis.
El-Saeiti, I. N. (2013). Adjusted variance components for unbalanced clustered binary data models. Ph.D. dissertation.
El-Saeiti, I. N. (2015). Performance of mixed effects for clustered binary data models. AIP Conference Proceedings, 1643.
Geys, H., Molenberghs, G. and Ryan, L. M. (1997). Pseudo-likelihood inference for clustered binary data. Communications in Statistics - Theory and Methods, 26(11).
Gu, Z. (2008). Model diagnostics for generalized linear mixed models. Dissertation.
Heo, M. and Leon, A. (2005). Performance of a mixed effects logistic regression model for binary outcomes with unequal cluster size. Journal of Biopharmaceutical Statistics, 15.
Lalonde, T. L. (2009). Components of overdispersion in hierarchical generalized linear models. Dissertation.
Lee, Y. and Nelder, J. A. (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society, Series B (Methodological), 58(4).
McCulloch, C. E. and Searle, S. R. (2001). Generalized, Linear, and Mixed Models. John Wiley & Sons, Inc., New York.


More information

AP-Optimum Designs for Minimizing the Average Variance and Probability-Based Optimality

AP-Optimum Designs for Minimizing the Average Variance and Probability-Based Optimality AP-Optimum Designs for Minimizing the Average Variance and Probability-Based Optimality Authors: N. M. Kilany Faculty of Science, Menoufia University Menoufia, Egypt. (neveenkilany@hotmail.com) W. A. Hassanein

More information

STAT 501 EXAM I NAME Spring 1999

STAT 501 EXAM I NAME Spring 1999 STAT 501 EXAM I NAME Spring 1999 Instructions: You may use only your calculator and the attached tables and formula sheet. You can detach the tables and formula sheet from the rest of this exam. Show your

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data Quality & Quantity 34: 323 330, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 323 Note Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions

More information

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla. Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit

More information

Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research

Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research Modelling heterogeneous variance-covariance components in two-level multilevel models with application to school effects educational research Research Methods Festival Oxford 9 th July 014 George Leckie

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Logistic Regression - problem 6.14

Logistic Regression - problem 6.14 Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values

More information

Package HGLMMM for Hierarchical Generalized Linear Models

Package HGLMMM for Hierarchical Generalized Linear Models Package HGLMMM for Hierarchical Generalized Linear Models Marek Molas Emmanuel Lesaffre Erasmus MC Erasmus Universiteit - Rotterdam The Netherlands ERASMUSMC - Biostatistics 20-04-2010 1 / 52 Outline General

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

36-720: The Rasch Model

36-720: The Rasch Model 36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS

DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS DIAGNOSTICS FOR STRATIFIED CLINICAL TRIALS IN PROPORTIONAL ODDS MODELS Ivy Liu and Dong Q. Wang School of Mathematics, Statistics and Computer Science Victoria University of Wellington New Zealand Corresponding

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

SUPPLEMENTARY SIMULATIONS & FIGURES

SUPPLEMENTARY SIMULATIONS & FIGURES Supplementary Material: Supplementary Material for Mixed Effects Models for Resampled Network Statistics Improve Statistical Power to Find Differences in Multi-Subject Functional Connectivity Manjari Narayan,

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

STAT 705 Generalized linear mixed models

STAT 705 Generalized linear mixed models STAT 705 Generalized linear mixed models Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 24 Generalized Linear Mixed Models We have considered random

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Hierarchical Generalized Linear Model Approach For Estimating Of Working Population In Kepulauan Riau Province

Hierarchical Generalized Linear Model Approach For Estimating Of Working Population In Kepulauan Riau Province IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS Hierarchical Generalized Linear Model Approach For Estimating Of Working Population In Kepulauan Riau Province To cite this article:

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC

Mantel-Haenszel Test Statistics. for Correlated Binary Data. Department of Statistics, North Carolina State University. Raleigh, NC Mantel-Haenszel Test Statistics for Correlated Binary Data by Jie Zhang and Dennis D. Boos Department of Statistics, North Carolina State University Raleigh, NC 27695-8203 tel: (919) 515-1918 fax: (919)

More information

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data

Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Journal of Modern Applied Statistical Methods Volume 4 Issue Article 8 --5 Testing Goodness Of Fit Of The Geometric Distribution: An Application To Human Fecundability Data Sudhir R. Paul University of

More information

Handbook of Regression Analysis

Handbook of Regression Analysis Handbook of Regression Analysis Samprit Chatterjee New York University Jeffrey S. Simonoff New York University WILEY A JOHN WILEY & SONS, INC., PUBLICATION CONTENTS Preface xi PARTI THE MULTIPLE LINEAR

More information

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research

Estimated Precision for Predictions from Generalized Linear Models in Sociological Research Quality & Quantity 34: 137 152, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 137 Estimated Precision for Predictions from Generalized Linear Models in Sociological Research TIM FUTING

More information

Application of Prediction Techniques to Road Safety in Developing Countries

Application of Prediction Techniques to Road Safety in Developing Countries International Journal of Applied Science and Engineering 2009. 7, 2: 169-175 Application of Prediction Techniques to Road Safety in Developing Countries Dr. Jamal Al-Matawah * and Prof. Khair Jadaan Department

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

GLM models and OLS regression

GLM models and OLS regression GLM models and OLS regression Graeme Hutcheson, University of Manchester These lecture notes are based on material published in... Hutcheson, G. D. and Sofroniou, N. (1999). The Multivariate Social Scientist:

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures

SAS/STAT 13.1 User s Guide. Introduction to Survey Sampling and Analysis Procedures SAS/STAT 13.1 User s Guide Introduction to Survey Sampling and Analysis Procedures This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete

More information

Correlated and Interacting Predictor Omission for Linear and Logistic Regression Models

Correlated and Interacting Predictor Omission for Linear and Logistic Regression Models Clemson University TigerPrints All Dissertations Dissertations 8-207 Correlated and Interacting Predictor Omission for Linear and Logistic Regression Models Emily Nystrom Clemson University, emily.m.nystrom@gmail.com

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information