Chapter 4: Generalized Linear Models-II

Size: px
Start display at page:

Download "Chapter 4: Generalized Linear Models-II"

Transcription

1 : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay (VCU) 1 / 26

2 4.3 Generalized Linear Models for Counts Overdispersion for Poisson GLMs If data are truly Poisson, then we should have roughly E(Y i ) = var(y i ) = µ i. Data can be grouped into like categories and this can be informally checked. For the horseshoe crab data we have the following (Table 4.4): Width (cm) Sample mean Sample variance < > D. Bandyopadhyay (VCU) 2 / 26

3 4.3 Generalized Linear Models for Counts The sample variance tends to be 2-3 times as much as the mean. This is an example of overdispersion. There is greater variability in the data than we expect under our sampling model. Fixes: Find another sampling model! Include other important, explanatory covariates. Random effects as a proxy to unknown, latent covariates. Quasi-likelihood approach. We ll explore a common approach to the first fix above... D. Bandyopadhyay (VCU) 3 / 26

4 4.3 Generalized Linear Models for Counts Review: PhD 2011 Exam Question on Linear Models, Q1 We say Y NB(α,β), if Y λ Poisson(λ) and λ Gamma(α,β) with density f(λ) = 1 Γ(α)β λ α 1 e λ/β,α,β > 0. α Proof: The probability mass function for the Negative Binomial is h(y) = = = 0 0 g(y λ)f(λ)dλ e λ λ y λ α 1 e λ/β y! Γ(α)β α dλ 1 Γ(y +1)Γ(α)β α 0 λ y+α 1 exp [ λ ( ) ] β 1 dλ β +1 The integrand is the kernel of a Gamma density with shape parameter α+y and scale parameter β/(β +1). D. Bandyopadhyay (VCU) 4 / 26

5 4.3 Generalized Linear Models for Counts To get a Gamma density and make the integral equal 1, we divide the integrand by Γ(α+y) [β/(β +1)] α+y and of course to keep h(y) from changing, we must multiply the factor outside the integral by that quantity. So h(y) = = Γ(α+y)[β/(β +1)]α+y Γ(y +1)Γ(α)β ( α ) Γ(α+y) 1 α ( ) β y. Γ(y +1)Γ(α) 1+β 1+β Recall that the Gamma distribution for λ has mean αβ and variance αβ 2. E(y) = E[E(y λ)] = E[λ] = αβ = µ and Var(Y) = E[Var(Y λ)]+var[e(y λ)] = E[λ]+Var[λ] = αβ +αβ 2 = µ+µβ = µ(1+β) D. Bandyopadhyay (VCU) 5 / 26

6 4.3 Generalized Linear Models for Counts Negative binomial regression Let k = α and µ = αβ, then Y negbin(k,µ) with ( ) Γ(y +k) k k ( p(y) = 1 k ) y for y = 0,1,2,3,... Γ(k)Γ(y +1) µ+k µ+k Then E(Y) = µ and var(y) = µ+µ 2 /k. A sampling model that includes another parameter allows some separation between the mean and variance. The negative binomial distribution is a discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number k of failures occurs. The index k 1 is called a dispersion parameter. As k the Poisson distribution is obtained. Here, the variance increases with the mean; is that appropriate for the crab data? Book looks at crab data on p Another modeling approach: adding a random effect for each crab, coming up toward the end of the semester. D. Bandyopadhyay (VCU) 6 / 26

7 4.4 Mean, Variance & Likelihood for GLMs* 4.4 Mean & Variance for GLMs A two parameter exponential family includes a dispersion parameter φ: f(y i θ i,φ) = exp{[y i θ i b(θ i )]/a(φ)+c(y i,φ)}. When φ is known, it simplifies to f(y i θ i ) = a(θ i )b(y i )exp[y i Q(θ i )], the nature one parameter exponential family. This includes binomial, negative binomial, Poisson, normal, and many others. Let L i = logf(y i ;θ i,φ). This is the contribution of the i th observation to the likelihood in terms of θ i and φ. Then L(θ,φ) = N i=1 logf(y i;θ i,φ) = N i=1 L i, where L i = [y i θ i b(θ i )]/a(φ)+c(y i,φ). D. Bandyopadhyay (VCU) 7 / 26

8 4.4 Mean, Variance & Likelihood for GLMs* Then some work gives us µ i = E(Y i ) = b (θ i ) and var(y i ) = b (θ i )a(φ). The model imposes µ i = b (θ i ) = g 1 (x iβ). The N-dimensional µ, or equivalently θ, is reduced to the p-dimensional β (and φ in a 2-parameter family). Then L(β,φ) = N [ yi (b ) 1 (g 1 (x i β)) b((b ) 1 (g 1 (x i β))) i=1 as θ i = (b ) 1 (g 1 (x i β)). a(φ) The MLEs ˆβ and ˆφ are found by taking first derivatives of this, setting equal to zero, and solving (pp ). Things simplify when using the canonical link. ] +c(y i,φ), D. Bandyopadhyay (VCU) 8 / 26

9 4.4 Mean, Variance & Likelihood for GLMs* The asymptotic covariance matrix for ˆβ is the inverse of the fisher information matrix, cov(ˆβ). This is a function of the unknown β and φ, and in practice we just plug in the MLE values ˆβ and ˆφ yielding ĉov(ˆβ). Section shows how Poisson and binomial GLMs fit into the general exponential family form and specifies corresponding b(θ i ), a(φ), and c(y i,φ). Section carries out computations leading to ĉov(ˆβ) in the Poisson regression model with a log link. D. Bandyopadhyay (VCU) 9 / 26

10 4.5 Inference for GLMs Deviance and GOF For now assume we re able to get ˆβ. Anyway, we are able to, in SAS or R! Recall that the saturated model estimates the N µ i s with the N y i s, providing perfect fit. This model does not reduce data, provide a means for prediction for arbitrary covariate values x, allow for meaningful hypotheses to be tested, etc. However, we can use the saturated model to check the fit of a real GLM. If, essentially, the number of distinct covariate vectors remains fixed but N increases then G 2 = 2logL(µ(ˆβ), ˆφ r ;y) logl(y, ˆφ f ;y)] is the LRT statistic for testing H 0 : g(µ i ) = x iβ relative to the alternative that the means µ are unstructured. D. Bandyopadhyay (VCU) 10 / 26

11 4.5 Inference for GLMs In Poisson and binomial regression models a(φ) = 1, i.e. there is no dispersion parameter, and this LRT statistic is equal to the model deviance as described last time for grouped data. When there is a dispersion parameter φ (e.g. normal, negative binomial, or gamma regression models), 2 times the difference in saturated and reduced models log-likelihood is D/φ in most models, called the scaled deviance; see top, p The scaled deviance has an approximate chi-squared distribution when the reduced model holds. D. Bandyopadhyay (VCU) 11 / 26

12 4.5 Inference for GLMs LR model comparison In Poisson and binomial regression, D r = 2[L r L s ] where D is deviance, L r is log-likelihood evaluated at ˆβ for the GLM, and L s is log-likelihood evaluated at ˆµ i = y i under the saturated model. Say we add a few more predictors to the model so the dimension of β goes from p to p +q. Compute the deviance from the smaller model (D r ) and the larger model (D f ). Then D r D f is the likelihood ratio test statistic for testing H 0 : smaller model holds, and is asymptotically χ 2 q when H 0 is true. The larger this difference, the more evidence there is that the new predictors significantly improve model fit (and hence significantly reduce model deviance). Often data are not grouped; in this case it s safer to use L(β;y) directly from the output! D. Bandyopadhyay (VCU) 12 / 26

13 4.5 Inference for GLMs Residuals for GLMs Residuals indicate where model fit is inadequate. The deviance residual d i is defined in such a way that N see p i=1 d2 i = D, The Pearson residual is given by e i = y i ˆµ i. These have variance var(y i ) < 1. The standardized Pearson residuals r i properly standardize the residual to have variance one and in large samples are N(0,1) if the model holds. This means reasonably large n i for binomial data and reasonably large counts for Poisson data. So residuals r i > 3 show rather extreme lack of fit for (x i,y i ) according to the model. Residuals can be plotted versus predictors or against the linear predictor ˆη i = x iˆβ to assess systematic departures from model assumptions. Note: X 2 = N i=1 e2 i χ 2 s p when H 0 : g(µ i ) = x iβ is true. D. Bandyopadhyay (VCU) 13 / 26

14 4.6 Fitting GLMs Newton-Raphson Method in One Dimension Say we want to find where f(x) = 0 for differentiable f(x). Let x 0 be such that f(x 0 ) = 0. Taylor s theorem tells us f(x 0 ) f(x)+f (x)(x 0 x). Plugging in f(x 0 ) = 0 and solving for x 0 we get ˆx 0 = x f(x) f (x). Starting at an x near x 0, ˆx 0 should be closer to x 0 than x was. Let s iterate this idea t times: x (t+1) = x (t) f(x(t) ) f (x (t) ). Eventually, if things go right, x (t) should be close to x 0. D. Bandyopadhyay (VCU) 14 / 26

15 4.6 Fitting GLMs If f(x) : R p R p, the idea works the same, but in vector/matrix terms. Start with an initial guess x (0) and iterate x (t+1) = x (t) [Df(x (t) )] 1 f(x (t) ). If things are done right, then this should converge to x 0 such that f(x 0 ) = 0. We are interested in solving DL(β) = 0 (the score, or likelihood equations!) where DL(β) = L(β) β 1. L(β) β p and D2 L(β) = L(β) β 2 1. L(β) β p β 1 L(β) β 1 β p.... L(β) β 2 p. D. Bandyopadhyay (VCU) 15 / 26

16 4.6 Fitting GLMs So for us, we start with β (0) (maybe through a MOM or least squares estimate) and iterate β (t+1) = β (t) [D 2 L(β (t) )] 1 DL(β (t) ). This is (4.45) on p. 143 disguised. The process is typically stopped when β (t+1) β (t) < ǫ. Newton-Raphson uses D 2 L(β) as is, with the y plugged in. Fisher scoring instead uses E{D 2 L(β)}, with expectation taken over Y, which is not a function of the observed y, but harder to get. The latter approach is harder to implement, but conveniently yields ĉov(ˆβ) [ E{D 2 L(β)}] 1 evaluated at ˆβ when the process is done. D. Bandyopadhyay (VCU) 16 / 26

17 4.7 Quasi-likelihood and Overdispersion The MLE β satisfies: u j (β) = N i=1 (y i µ i )x ij v(µ i ) ( g 1 ) (η i ) = 0, j = 1,...,p, η i where η i = x i β and v(µ i) = var(y i ), a function of µ i. These are the partial derivatives of the log-likelihood function set to zero, also called the score equations. In exponential families, a given µ i = E(Y i ) and v(µ i ) = var(y i ) uniquely determines the distribution. For example, if we say E(Y i ) = µ i and var(y i ) = v(µ i ) = µ i, and that Y i is a distribution in the exponential family, then Y i has to be Poisson. D. Bandyopadhyay (VCU) 17 / 26

18 4.7 Quasi-likelihood and Overdispersion For Poisson data, we know v(µ i ) = µ i ; for Binomial data (E(Y i ) = µ i = n i π i ), we have v(π i ) = n i π i (1 π i ). If we add a dispersion parameter φ and declare that v(µ i ) = φµ i (Poisson) or v(π i ) = φn i π i (1 π i ) (binomial), the resulting family may not be exponential, or not even unique, but the score equations on the previous slide remain the same. So ˆβ does not change. What does change is the estimate ĉov(ˆβ). This estimate is the same as from the original model (where v(µ i ) = µ i or v(π i ) = n i π i (1 π i ) for Poisson or Binomial respectively) except multiplied by φ. Therefore regression effect standard errors are simply multiplied by ˆφ where ˆφ is an estimate of φ. D. Bandyopadhyay (VCU) 18 / 26

19 4.7 Quasi-likelihood and Overdispersion Let X 2 = N i=1 (y i ˆµ i ) 2 /ˆµ i for Poisson and X 2 = N i=1 (y i n iˆπ i ) 2 /[n iˆπ i (1 ˆπ i )] for binomial, the Pearson statistic for assessing (original) model fit. φ is not in the score equations, however, X 2 /φ χ 2 s p (when the dispersion model is true) where s is the number of unique covariate vectors in {x i }. Since E(χ 2 df ) = df, a MOM estimate of φ is ˆφ = X 2 /(s p). When data are grouped, s = N. The adjusted estimate is ĉov a (ˆβ) = ˆφ ĉov(ˆβ). When ˆφ > 1, which happens with overdispersed data, standard errors get properly inflated. This is an easy, ad hoc fix to overdispersion, but commonly done and useful. SAS does everything automatically when you specify SCALE=PEARSON in the MODEL statement of GENMOD. Also: SCALE=DEVIANCE works similarly. D. Bandyopadhyay (VCU) 19 / 26

20 4.7 Quasi-likelihood and Overdispersion To recall, SAS code for crab data without dispersion parameter: proc genmod; model satell = width / dist=poisson link=identity; Output: The GENMOD Procedure Model Informati on Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi Square Scaled Pearson X Log Likelihood Full Log Likelihood AIC ( smaller is better ) AICC ( smaller is better ) BIC ( smaller is better ) Algorithm converged. Analysis Of Maximum Likelihood Parameter Estimates Standard Wald 95% Confidence Wald Parameter DF Estimate Error Li mi ts Chi Square Pr > ChiSq I n t e r c e p t <.0001 width <.0001 Scale D. Bandyopadhyay (VCU) 20 / 26

21 4.7 Quasi-likelihood and Overdispersion SAS code for crab data with dispersion parameter: proc genmod; model satell = width / dist=poisson link=identity scale =pearson; Output: The GENMOD Procedure Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance Scaled Deviance Pearson Chi Square Scaled Pearson X Log Likelihood Full Log Likelihood AIC ( smaller is better ) AICC ( smaller is better ) BIC ( smaller is better ) Algorithm converged. Analysis Of Maximum Likelihood Parameter Estimates Standard Wald 95% Confidence Wald Parameter DF Estimate Error Li mi ts Chi Square Pr > ChiSq I n t e r c e p t <.0001 width <.0001 Scale NOTE: The scale parameter was estimated by the square root of Pearson s Chi Squared/DOF. Note that ˆβ is the same with or without the dispersion parameter. What changes are se(ˆβ j ). D. Bandyopadhyay (VCU) 21 / 26

22 4.7 Quasi-likelihood and Overdispersion This approach to handling overdispersion works well when the mean structure is well modeled. Otherwise, what does ˆφ really estimate? This was a lot of information thrown at you very quickly. Meant to introduce notation and be an overview of things to come. We will slow down and investigate specific models in more detail. Be careful distinguishing s from N! In the saturated model, s is the number of distinct categories that data fall into. However, SAS takes the df for deviance to be the number of records N regardless. Data should be grouped in as few groups as possible when checking for dispersion. See 4.5.3, pp D. Bandyopadhyay (VCU) 22 / 26

23 4.7 Quasi-likelihood and Overdispersion Teratology example Female rats given one of four treatments: placebo, weekly iron supplement, days 7 & 10, days 0 & 7. See p. 152 for the data. The number dead y ij out of litter size n ij was recorded where i = 1,2,3,4 is the treatment group, and j = 1,...,m i is the number of litters in group i (31, 12, 5, 10). Let π i denote the probability of death in group i. The model is simply Y ij bin(n ij,π i ). The sum of two independent binomials with the same probability is also binomial. So according to the model, there really is only four observations: i y i+ n i D. Bandyopadhyay (VCU) 23 / 26

24 4.7 Quasi-likelihood and Overdispersion The idea behind this example is that there is litter-to-litter variability and so the data are really a mixture of binomial distributions and overdispersion might be present. If we do not group the data, then X 2 = 4 m i i=1 j=1 (y ij n ijˆπ i ) 2 n ijˆπ i (1 ˆπ i ). This has an approximate χ distribution when we think of litter sizes n ij. Then ˆφ = 2.86 and there s evidence of overdispersion. According to the model, the groupings are arbitrary; we cannot tell the difference between litters. If we group the data then X 2 = 4 i=1 (y i+ n i+ˆπ i ) 2 n i+ˆπ i (1 ˆπ i ) = 0. D. Bandyopadhyay (VCU) 24 / 26

25 4.7 Quasi-likelihood and Overdispersion The problem is that the latter model being fit is the saturated model. There is no way to check for overdispersion using the grouped data. However, according to the model, the groupings according to data recorded in terms on litters are arbitrary. We know this isn t the case, but the model cannot tell the difference between litters, only treatments. We are using information on litters to assess overdispersion, but not explicitly including this information in a real probability model, but rather through φ. (Better than ignoring the possibility entirely!) A possibly better approach is to include a separate term for each litter! Y ij bin(n ij,µ ij ), logit(µ ij ) = π i +γ ij, where γ ij iid N(0,σ 2 ). This random effects model explicitly includes litter-to-litter heterogeneity in the model. The γ ij serve as a proxy to unmeasured, latent genetic differences among litters. D. Bandyopadhyay (VCU) 25 / 26

26 4.7 Quasi-likelihood and Overdispersion Comments: Which approach is better, estimating φ and inflating the se s for ˆπ i or the random effects model? What assumptions under the random effects model might be violated? What strengths does it have? What assumptions using v(π i ) = φn i π i (1 π i ) might be violated? How does this affect the model? Can you see a potentially bigger problem here in using an estimate ˆφ? How would I analyze these data? With a random effects model, then examine ˆγ ij to check the normality assumption. D. Bandyopadhyay (VCU) 26 / 26

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Section Poisson Regression

Section Poisson Regression Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Testing Independence

Testing Independence Testing Independence Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/50 Testing Independence Previously, we looked at RR = OR = 1

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

Chapter 11: Models for Matched Pairs

Chapter 11: Models for Matched Pairs : Models for Matched Pairs Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

SB1a Applied Statistics Lectures 9-10

SB1a Applied Statistics Lectures 9-10 SB1a Applied Statistics Lectures 9-10 Dr Geoff Nicholls Week 5 MT15 - Natural or canonical) exponential families - Generalised Linear Models for data - Fitting GLM s to data MLE s Iteratively Re-weighted

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Some comments on Partitioning

Some comments on Partitioning Some comments on Partitioning Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM 1/30 Partitioning Chi-Squares We have developed tests

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Introduction to Generalized Linear Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Introduction (motivation

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion

More information

Likelihoods for Generalized Linear Models

Likelihoods for Generalized Linear Models 1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University

Introduction. Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University Introduction Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 56 Course logistics Let Y be a discrete

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

1/15. Over or under dispersion Problem

1/15. Over or under dispersion Problem 1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let

More information

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58

Generalized Linear Mixed-Effects Models. Copyright c 2015 Dan Nettleton (Iowa State University) Statistics / 58 Generalized Linear Mixed-Effects Models Copyright c 2015 Dan Nettleton (Iowa State University) Statistics 510 1 / 58 Reconsideration of the Plant Fungus Example Consider again the experiment designed to

More information

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1

if n is large, Z i are weakly dependent 0-1-variables, p i = P(Z i = 1) small, and Then n approx i=1 i=1 n i=1 Count models A classical, theoretical argument for the Poisson distribution is the approximation Binom(n, p) Pois(λ) for large n and small p and λ = np. This can be extended considerably to n approx Z

More information

Inference for Binomial Parameters

Inference for Binomial Parameters Inference for Binomial Parameters Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 58 Inference for

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014

Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Ph.D. Qualifying Exam Friday Saturday, January 3 4, 2014 Put your solution to each problem on a separate sheet of paper. Problem 1. (5166) Assume that two random samples {x i } and {y i } are independently

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation

More information

Chapter 14 Logistic regression

Chapter 14 Logistic regression Chapter 14 Logistic regression Adapted from Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 62 Generalized linear models Generalize regular regression

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Faculty of Health Sciences. Correlated data. Count variables. Lene Theil Skovgaard & Julie Lyng Forman. December 6, 2016

Faculty of Health Sciences. Correlated data. Count variables. Lene Theil Skovgaard & Julie Lyng Forman. December 6, 2016 Faculty of Health Sciences Correlated data Count variables Lene Theil Skovgaard & Julie Lyng Forman December 6, 2016 1 / 76 Modeling count outcomes Outline The Poisson distribution for counts Poisson models,

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Proportional hazards regression

Proportional hazards regression Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Chapter 2: Describing Contingency Tables - II

Chapter 2: Describing Contingency Tables - II : Describing Contingency Tables - II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu]

More information

Stat 587: Key points and formulae Week 15

Stat 587: Key points and formulae Week 15 Odds ratios to compare two proportions: Difference, p 1 p 2, has issues when applied to many populations Vit. C: P[cold Placebo] = 0.82, P[cold Vit. C] = 0.74, Estimated diff. is 8% What if a year or place

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification p.1/27

The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification p.1/27 The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification Brett Presnell Dennis Boos Department of Statistics University of Florida and Department of Statistics North Carolina State

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Chapter 14 Logistic and Poisson Regressions

Chapter 14 Logistic and Poisson Regressions STAT 525 SPRING 2018 Chapter 14 Logistic and Poisson Regressions Professor Min Zhang Logistic Regression Background In many situations, the response variable has only two possible outcomes Disease (Y =

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

MIT Spring 2016

MIT Spring 2016 Generalized Linear Models MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Generalized Linear Models 1 Generalized Linear Models 2 Generalized Linear Model Data: (y i, x i ), i = 1,..., n where y i : response

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Lecture 8. Poisson models for counts

Lecture 8. Poisson models for counts Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

Measures of Association and Variance Estimation

Measures of Association and Variance Estimation Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35

More information

Analysis of Extra Zero Counts using Zero-inflated Poisson Models

Analysis of Extra Zero Counts using Zero-inflated Poisson Models Analysis of Extra Zero Counts using Zero-inflated Poisson Models Saranya Numna A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Mathematics and Statistics

More information

MSH3 Generalized linear model Ch. 6 Count data models

MSH3 Generalized linear model Ch. 6 Count data models Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear

More information