Generalized Linear Models


Last time: background and motivation for moving beyond linear regression (non-normal/non-linear cases, binary and categorical data).

Today's class:
1. Examples of count and ordered categorical response data
2. Definition of the exponential family likelihood
3. Components of a generalized linear model (GLM)
4. Algorithm for obtaining maximum likelihood estimates

Cellular Differentiation Data (Piegorsch et al., 1988)
Interest lies in the effect of two agents of immuno-activating ability that may induce cell differentiation. Response variable: the number of cells that exhibited differentiation markers after exposure. Scientific interest: do the agents TNF (tumor necrosis factor) and IFN (interferon) stimulate cell differentiation independently, or is there a synergistic effect?

Cellular Differentiation Data
Table columns: number of cells differentiating | dose of TNF (U/ml) | dose of IFN (U/ml).
Scale of response variable: count.

Comments
Potentially, we can log-transform the count response and then fit a normal linear regression model. This is often done, but are there problems or pitfalls with this approach? For count data, the variability often increases with the mean, which the constant-variance assumption of typical normal linear models does not capture. In addition, the data are discrete, so the normal likelihood is clearly inappropriate, and it may provide a particularly poor approximation when counts can be small (where zero counts are common and log(0) is undefined).
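To make the two pitfalls concrete, the base-R sketch below computes the exact mean, variance and probability of a zero count for Poisson responses at a few means (the helper name `poisson_moments` is ours, for illustration only). The variance equals the mean, so it is not constant, and for small means a large share of the counts are zero, where log(y) is undefined.

```r
# Exact mean, variance and P(Y = 0) of a Poisson(mu) count, summing the pmf
# over 0..max_count (the truncated upper tail is negligible for these means).
poisson_moments <- function(mu, max_count = ceiling(mu + 20 * sqrt(mu) + 50)) {
  k <- 0:max_count
  p <- dpois(k, mu)
  m <- sum(k * p)
  c(mean = m, var = sum((k - m)^2 * p), p0 = p[1])
}

mus <- c(0.5, 2, 10, 50)
out <- t(sapply(mus, poisson_moments))
rownames(out) <- paste0("mu = ", mus)
round(out, 4)   # var tracks mean; P(Y = 0) is large when mu is small
```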

Job Expectations for Psychology Students
A study of student perspectives asked psychology students at the University of Regensburg whether they expected to find adequate employment after getting their degree. Response variable: ordered categorical, ranked 1 to 3:
1. Don't expect adequate employment
2. Not sure
3. Immediately after the degree
Predictor: age in years.

Grouped Job Expectations Data
Table columns: age in years | response.

Generalized Linear Model: Motivation
Normal linear regression models are not appropriate for these examples. We need a more general regression framework that accounts for response data on a variety of measurement scales, together with methods for model fitting and inference in this framework. Ideally, some simplifications of linear regression would carry over. Generalizations to more complex settings (correlated data, censored observations, etc.) will be necessary in many applications.

Generalized Linear Models: The Basics
In the general linear model, $(y_i \mid x_i) \sim N(x_i'\beta, \sigma^2)$, with systematic component $E(y_i \mid x_i) = x_i'\beta$ and random component $V(y_i \mid x_i) = \sigma^2$. The generalized linear model generalizes both the random and the systematic components. The focus is on distributions in the exponential family, which includes many useful special cases (normal, Poisson, gamma, binomial, etc.).

Likelihood Function: The Simple Exponential Family
The observations $y_i$ are conditionally independent given $x_i$ ($i = 1, \dots, n$), and the conditional distribution of $y_i \mid x_i$ belongs to a simple exponential family. Thus, the probability density function can be expressed as
$$f(y_i; \theta_i, \phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right\}. \qquad (1)$$
Here, $\theta_i$ and $\phi$ are parameters and $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions. $\theta_i$ is the canonical (natural) parameter, which governs location, and $\phi$ is the dispersion (scale) parameter.

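For example, for the normal distribution we have
$$f(y_i; \theta_i, \phi) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\{-(y_i - \mu)^2/2\sigma^2\} = \exp\left[\frac{y_i\mu - \mu^2/2}{\sigma^2} - \frac{1}{2}\left\{\frac{y_i^2}{\sigma^2} + \log(2\pi\sigma^2)\right\}\right],$$
so that $\theta_i = \mu$, $\phi = \sigma^2$, $a(\phi) = \phi$, $b(\theta_i) = \theta_i^2/2$ and $c(y_i, \phi) = -\frac{1}{2}\{y_i^2/\sigma^2 + \log(2\pi\sigma^2)\}$. Thus, in this case $\theta_i$ is the mean and $\phi$ is the variance.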
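A quick numerical check of equation (1) for this example: the base-R sketch below (helper names are hypothetical) evaluates the exponential-family form with the normal choices of $\theta_i$, $\phi$, $a$, $b$ and $c$ listed above and compares it with dnorm(). The Poisson case, with $\theta_i = \log\mu$, $a(\phi) = 1$, $b(\theta) = e^\theta$ and $c(y, \phi) = -\log y!$, is included for comparison with dpois().

```r
# Exponential-family density f(y; theta, phi) = exp{(y theta - b(theta)) / a(phi) + c(y, phi)}
expfam_density <- function(y, theta, phi, a_fun, b_fun, c_fun) {
  exp((y * theta - b_fun(theta)) / a_fun(phi) + c_fun(y, phi))
}

# Normal: theta = mu, phi = sigma^2, a(phi) = phi, b(theta) = theta^2 / 2,
# c(y, phi) = -{y^2 / phi + log(2 pi phi)} / 2
normal_expfam <- function(y, mu, sigma2) {
  expfam_density(y, theta = mu, phi = sigma2,
                 a_fun = function(phi) phi,
                 b_fun = function(theta) theta^2 / 2,
                 c_fun = function(y, phi) -(y^2 / phi + log(2 * pi * phi)) / 2)
}

# Poisson: theta = log(mu), phi = 1, a(phi) = 1, b(theta) = exp(theta), c(y, phi) = -log(y!)
poisson_expfam <- function(y, mu) {
  expfam_density(y, theta = log(mu), phi = 1,
                 a_fun = function(phi) 1,
                 b_fun = exp,
                 c_fun = function(y, phi) -lfactorial(y))
}

y_grid <- seq(-3, 3, by = 0.5)
cbind(expfam = normal_expfam(y_grid, 0.5, 2), dnorm = dnorm(y_grid, mean = 0.5, sd = sqrt(2)))
cbind(expfam = poisson_expfam(0:6, 2.5), dpois = dpois(0:6, 2.5))
```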

Let $l(\theta_i, \phi; y_i) = \log f(y_i; \theta_i, \phi)$ denote the log-likelihood function. We can derive the mean and variance for the general case using
$$E\left(\frac{\partial l}{\partial\theta_i}\right) = 0 \quad\text{and}\quad E\left(\frac{\partial^2 l}{\partial\theta_i^2}\right) + E\left\{\left(\frac{\partial l}{\partial\theta_i}\right)^2\right\} = 0.$$
These relations are well-known properties of the likelihood function, obtained by differentiating the identity $\int f(y_i; \theta_i, \phi)\,dy_i \equiv 1$ with respect to $\theta_i$ while holding the dispersion parameter $\phi$ fixed. In both relations, the differentiation and the averaging (expectation) are carried out at the same value of $\theta_i$.

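Note that $l(\theta_i; y_i) = \{y_i\theta_i - b(\theta_i)\}/a(\phi) + c(y_i, \phi)$. It follows that
$$\frac{\partial l}{\partial\theta_i} = \{y_i - b'(\theta_i)\}/a(\phi) \quad\text{and}\quad \frac{\partial^2 l}{\partial\theta_i^2} = -b''(\theta_i)/a(\phi).$$
Hence, from the previous equalities, we have
$$0 = E\left(\frac{\partial l}{\partial\theta_i}\right) = \{E(y_i) - b'(\theta_i)\}/a(\phi),$$
which implies that $E(y_i) = b'(\theta_i)$.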

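Similarly, we have
$$0 = E\left(\frac{\partial^2 l}{\partial\theta_i^2}\right) + E\left\{\left(\frac{\partial l}{\partial\theta_i}\right)^2\right\} = -b''(\theta_i)/a(\phi) + E[\{y_i - b'(\theta_i)\}^2/a(\phi)^2].$$
Multiplying through by $a(\phi)^2$ and using $E(y_i) = b'(\theta_i)$,
$$0 = -b''(\theta_i)a(\phi) + E(y_i^2) - 2E(y_i)b'(\theta_i) + b'(\theta_i)^2 = -b''(\theta_i)a(\phi) + E(y_i^2) - E(y_i)^2,$$
so that $\text{var}(y_i) = b''(\theta_i)a(\phi)$. For most commonly used exponential family distributions, $a(\phi) = \phi/w_i$, where $\phi$ is a dispersion parameter and $w_i$ is a known weight (typically equal to one). Hence, taking $w_i = 1$, the mean and variance will typically follow the form
$$\mu_i = b'(\theta_i) \quad\text{and}\quad \sigma_i^2 = b''(\theta_i)\phi.$$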
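The identities $E(y_i) = b'(\theta_i)$ and $\text{var}(y_i) = b''(\theta_i)a(\phi)$ can be checked numerically. This base-R sketch, assuming central finite differences with step $h = 10^{-4}$ are accurate enough for illustration, compares derivatives of $b(\cdot)$ with moments computed directly from the Poisson and Bernoulli distributions ($a(\phi) = 1$ in both). The helper `fd_derivs` is a hypothetical name.

```r
# Central finite differences for b'(theta) and b''(theta)
fd_derivs <- function(b, theta, h = 1e-4) {
  c(d1 = (b(theta + h) - b(theta - h)) / (2 * h),
    d2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h^2)
}

# Poisson(mu): theta = log(mu), b(theta) = exp(theta), a(phi) = 1
mu <- 3
k <- 0:200
p <- dpois(k, mu)
pois_mean <- sum(k * p)
pois_var <- sum((k - pois_mean)^2 * p)
pois_b <- fd_derivs(exp, log(mu))
rbind(moments = c(mean = pois_mean, var = pois_var), derivatives = pois_b)

# Bernoulli(prob): theta = log{prob / (1 - prob)}, b(theta) = log(1 + e^theta), a(phi) = 1
prob <- 0.3
bern_b <- fd_derivs(function(t) log1p(exp(t)), qlogis(prob))
rbind(moments = c(mean = prob, var = prob * (1 - prob)), derivatives = bern_b)
```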

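Characteristics of common distributions in the exponential family
Distribution: Normal | Poisson | Binomial | Gamma
Notation: $N(\mu_i, \sigma^2)$ | $\text{Pois}(\mu_i)$ | $\text{Bin}(n_i, \pi_i)/n_i$ | $G(\mu_i, \nu)$
Range of $y_i$: $(-\infty, \infty)$ | $\{0, 1, 2, \dots\}$ | $\{0, 1, \dots, n_i\}/n_i$ | $(0, \infty)$
Dispersion, $\phi$: $\sigma^2$ | $1$ | $1/n_i$ | $\nu^{-1}$
Cumulant, $b(\theta_i)$: $\theta_i^2/2$ | $\exp(\theta_i)$ | $\log(1 + e^{\theta_i})$ | $-\log(-\theta_i)$
Mean function, $\mu(\theta_i)$: $\theta_i$ | $\exp(\theta_i)$ | $1/(1 + e^{-\theta_i})$ | $-1/\theta_i$
Canonical link, $\theta(\mu_i)$: identity | log | logit | reciprocal
Variance function, $V(\mu_i)$: $1$ | $\mu$ | $\mu(1 - \mu)$ | $\mu^2$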
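The rows of this table are tied together by the results above: the mean function is $b'(\theta)$, the canonical link is its inverse, and $V(\mu) = b''(\theta(\mu))$. The base-R sketch below encodes each column (gamma written with explicit signs, $b(\theta) = -\log(-\theta)$ and $\theta(\mu) = -1/\mu$, so the "reciprocal" link carries a minus sign; binomial on the proportion scale) and reports the largest discrepancy, which should be near zero up to finite-difference error. The list `families` and the function `check_family` are illustrative names, not part of any package.

```r
# Columns of the table: cumulant b(theta), mean function mu(theta),
# canonical link theta(mu), and variance function V(mu).
families <- list(
  normal   = list(b = function(t) t^2 / 2, mean = function(t) t,
                  link = function(m) m, V = function(m) 1),
  poisson  = list(b = exp, mean = exp, link = log, V = function(m) m),
  binomial = list(b = function(t) log1p(exp(t)), mean = function(t) 1 / (1 + exp(-t)),
                  link = function(m) log(m / (1 - m)), V = function(m) m * (1 - m)),
  gamma    = list(b = function(t) -log(-t), mean = function(t) -1 / t,
                  link = function(m) -1 / m, V = function(m) m^2)
)

# Largest discrepancy among b'(theta) vs mu(theta), mu(theta(mu)) vs mu,
# and b''(theta) vs V(mu), at theta = theta(mu), via central finite differences.
check_family <- function(f, mu, h = 1e-4) {
  theta <- f$link(mu)
  b1 <- (f$b(theta + h) - f$b(theta - h)) / (2 * h)
  b2 <- (f$b(theta + h) - 2 * f$b(theta) + f$b(theta - h)) / h^2
  max(abs(b1 - f$mean(theta)), abs(f$mean(theta) - mu), abs(b2 - f$V(mu)))
}

mapply(check_family, families, c(1.7, 4, 0.25, 2))
```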

Definition of a GLM: Systematic Component, Link Functions
Instead of modeling the mean, $\mu_i$, as a linear function of the predictors, $x_i$, we introduce a one-to-one, continuously differentiable transformation $g(\cdot)$ and focus on $\eta_i = g(\mu_i)$, where $g(\cdot)$ is called the link function and $\eta_i$ the linear predictor. We assume that the transformed mean follows a linear model, $\eta_i = x_i'\beta$. Since the link function is one-to-one and hence invertible, we have $\mu_i = g^{-1}(\eta_i) = g^{-1}(x_i'\beta)$.

Note that we are transforming the expected value, $\mu_i$, instead of the raw data, $y_i$. For classical linear models, the mean is the linear predictor. In this case, the identity link is reasonable since both $\mu_i$ and $\eta_i$ can take any value on the real line. This is not the case in general.

Link Functions for Poisson Data
For example, if $Y_i \sim \text{Pois}(\mu_i)$, then $\mu_i$ must be $> 0$. In this case, a linear model $\mu_i = x_i'\beta$ is not reasonable, since for some values of $x_i$ it gives $\mu_i \le 0$. By using the model $\eta_i = \log(\mu_i) = x_i'\beta$, we are guaranteed to have $\mu_i > 0$ for all $\beta \in \mathbb{R}^p$ and all values of $x_i$. In general, a link function for count data should map the interval $(0, \infty) \to \mathbb{R}$ (i.e., from the positive real numbers to the entire real line). The log link is a natural choice.
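A small R illustration with hypothetical coefficients: the identity link produces a negative Poisson mean at ordinary covariate values, whereas the log link always returns a positive mean.

```r
beta <- c(0.5, -0.4)   # hypothetical intercept and slope, for illustration only
x <- 0:4
eta <- beta[1] + beta[2] * x
# Identity link: mu = eta can be negative; log link: mu = exp(eta) > 0 always
cbind(x, identity_mu = eta, log_mu = exp(eta))
```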

Link Functions for Binomial Data
For the binomial distribution, $0 < \mu_i < 1$ (the mean of the count $y_i$ is $n_i\mu_i$). Therefore, the link function should map $(0, 1) \to \mathbb{R}$. Standard choices:
1. logit: $\eta_i = \log\{\mu_i/(1 - \mu_i)\}$;
2. probit: $\eta_i = \Phi^{-1}(\mu_i)$, where $\Phi(\cdot)$ is the $N(0, 1)$ cdf;
3. complementary log-log: $\eta_i = \log\{-\log(1 - \mu_i)\}$.
Each of these choices is important in applications and will be considered in detail later in the course.
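The three binomial links and their inverses can be written with base R functions (qlogis/plogis for the logit, qnorm/pnorm for the probit), with the complementary log-log coded directly. This sketch evaluates each link on a grid of means; the logit and probit are symmetric about $\mu = 0.5$, while the complementary log-log is not. The list name `links` is illustrative.

```r
# Each link maps a binomial mean in (0, 1) to the real line; the inverse maps back.
links <- list(
  logit   = list(g = qlogis, g_inv = plogis),
  probit  = list(g = qnorm,  g_inv = pnorm),
  cloglog = list(g = function(mu) log(-log(1 - mu)),
                 g_inv = function(eta) 1 - exp(-exp(eta)))
)
mu <- c(0.01, 0.25, 0.5, 0.75, 0.99)
sapply(links, function(l) l$g(mu))   # linear predictor eta = g(mu) for each link
```

Base R also exposes these maps through family objects, e.g. `binomial(link = "cloglog")$linkfun` and `$linkinv`.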

Recall that the exponential family density has the form
$$f(y_i; \theta_i, \phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right\},$$
where $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$ are known functions. Specifying the GLM involves choosing $a(\cdot)$, $b(\cdot)$ and $c(\cdot)$:
1. Specify $a(\cdot)$ and $c(\cdot)$ to correspond to a particular distribution (e.g., binomial, Poisson).
2. Specify $b(\cdot)$ to correspond to a particular link function.

Recall that the mean and variance are $\mu_i = b'(\theta_i)$ and $\sigma_i^2 = b''(\theta_i)\phi$. Using $b'(\theta_i) = g^{-1}(x_i'\beta)$, we can express the density as $f(y_i; x_i, \beta, \phi)$, so that the conditional likelihood of $y_i$ given $x_i$ depends on the parameters $\beta$ and $\phi$. It would seem that a natural choice for $b(\cdot)$, and hence $g(\cdot)$, would correspond to $\theta_i = \eta_i = x_i'\beta$, so that $b'(\cdot)$ is the inverse link.

Canonical Links and Sufficient Statistics
Each of the distributions we have considered has a special, canonical link function for which there exists a sufficient statistic equal in dimension to $\beta$. Canonical links occur when $\theta_i = \eta_i = x_i'\beta$, with $\theta_i$ the canonical parameter. As a homework exercise (due next Tuesday), show whether or not the following distributions are in the exponential family and, if so, provide the canonical links: (i) Poisson; (ii) negative binomial; (iii) gamma; (iv) log-normal. For the canonical links, the sufficient statistic is $X'y$, with components $\sum_i x_{ij}y_i$, for $j = 1, \dots, p$.
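To see what the sufficient statistic means in practice, the R sketch below uses the canonical-link (logistic) log-likelihood, $l(\beta) = \beta'X'y - \sum_i \log(1 + e^{x_i'\beta})$, with a small hypothetical design. Two different response vectors with the same $X'y$ give identical log-likelihoods at every $\beta$, and therefore the same MLE. The function name `logistic_loglik` is illustrative.

```r
# Canonical-link Bernoulli log-likelihood: sum_i {y_i x_i'beta - log(1 + exp(x_i'beta))}
logistic_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(y * eta - log1p(exp(eta)))
}

# Hypothetical design: intercept plus a 0/1 covariate
X <- cbind(1, c(0, 0, 1, 1))
y1 <- c(1, 0, 1, 0)
y2 <- c(0, 1, 0, 1)
crossprod(X, y1)   # X'y1 = (2, 1)
crossprod(X, y2)   # X'y2 = (2, 1)

# The log-likelihood depends on y only through X'y, so both columns agree.
betas <- list(c(0, 0), c(-1, 2), c(0.7, -0.3))
sapply(betas, function(b) c(y1 = logistic_loglik(b, X, y1), y2 = logistic_loglik(b, X, y2)))
```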

Although canonical links often have nice properties, selection of the link function should be based on prior expectation and model fit.

Example: Logistic Regression
Suppose $y_i \sim \text{Bin}(1, p_i)$, for $i = 1, \dots, n$, are independent 0/1 indicator variables of an adverse response (e.g., preterm birth), and let $x_i$ denote a $p \times 1$ vector of predictors for individual $i$ (e.g., dose of DDE exposure, race, age, etc.). The likelihood is
$$f(y \mid \beta) = \prod_{i=1}^n p_i^{y_i}(1 - p_i)^{1 - y_i} = \prod_{i=1}^n \left(\frac{p_i}{1 - p_i}\right)^{y_i}(1 - p_i) = \prod_{i=1}^n \exp\left\{y_i\log\left(\frac{p_i}{1 - p_i}\right) - \log\left(\frac{1}{1 - p_i}\right)\right\} = \exp\left[\sum_{i=1}^n \{y_i\theta_i - \log(1 + e^{\theta_i})\}\right],$$
where $\theta_i = \log\{p_i/(1 - p_i)\}$, so that $1/(1 - p_i) = 1 + e^{\theta_i}$.
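A numerical check of the algebra above, using hypothetical probabilities: the product of Bernoulli terms equals the exponential-family form with $\theta_i = \log\{p_i/(1 - p_i)\}$, computed in R with qlogis().

```r
p <- c(0.2, 0.65, 0.9, 0.4)   # hypothetical success probabilities
y <- c(0, 1, 1, 0)
theta <- qlogis(p)            # canonical parameter log{p / (1 - p)}

direct <- prod(p^y * (1 - p)^(1 - y))                 # prod_i p_i^y_i (1 - p_i)^(1 - y_i)
expfam <- exp(sum(y * theta - log1p(exp(theta))))     # exp[sum_i {y_i theta_i - log(1 + e^theta_i)}]
c(direct = direct, expfam = expfam)
```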

Choosing the canonical link, $\theta_i = \log\{p_i/(1 - p_i)\} = x_i'\beta$, the likelihood has the form
$$f(y \mid \beta) = \exp\left[\sum_{i=1}^n \{y_i x_i'\beta - \log(1 + e^{x_i'\beta})\}\right].$$
This is logistic regression, which is widely used in epidemiology and other applications for modeling binary response data. In general, if $f(y_i; \theta_i, \phi)$ is in the exponential family and $\theta_i = \theta(\eta_i)$, $\eta_i = x_i'\beta$, then the model is called a generalized linear model (GLM).

Model Fitting
Choosing a GLM results in a likelihood function
$$L(y; \beta, \phi, x) = \prod_{i=1}^n \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi)\right\},$$
where $\theta_i$ is a function of $\eta_i = x_i'\beta$. The maximum likelihood estimate is defined as
$$\hat\beta = \arg\max_\beta L(y; \beta, \phi, x),$$
with $\phi$ initially assumed to be known.

Frequentist inferences for GLMs typically rely on $\hat\beta$ and asymptotic approximations. In the normal linear model special case, the MLE corresponds to the least squares estimator. In general, there is no closed-form expression, so we need an algorithm to calculate $\hat\beta$.

Maximum Likelihood Estimation of GLMs
All GLMs can be fit using the same algorithm, a form of iteratively re-weighted least squares (IWLS):
1. Given an initial value for $\hat\beta$, calculate the estimated linear predictor $\hat\eta_i = x_i'\hat\beta$ and use it to obtain the fitted values $\hat\mu_i = g^{-1}(\hat\eta_i)$. Calculate the adjusted dependent variable
$$z_i = \hat\eta_i + (y_i - \hat\mu_i)\left(\frac{d\eta_i}{d\mu_i}\right)_0,$$
where the derivative is evaluated at $\hat\mu_i$.

2. Calculate the iterative weights
$$W_i^{-1} = \left(\frac{d\eta_i}{d\mu_i}\right)_0^2 V_i,$$
where $V_i$ is the variance function evaluated at $\hat\mu_i$.
3. Regress $z_i$ on $x_i$ with weights $W_i$ to give new estimates of $\beta$; repeat steps 1 to 3 until the estimates converge.
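A minimal base-R sketch of the three IWLS steps, written for illustration rather than production use. It assumes a full-rank design matrix X with an intercept column, covers only two family/link pairs (binomial with logit, Poisson with log), starts from fitted means computed from the data (y + 0.1 for Poisson counts and (y + 0.5)/2 for binary data, similar to the starting values R's glm() uses), and stops when the coefficients change by less than `tol`. The names `iwls_families` and `iwls` are hypothetical.

```r
# Family/link pieces (hypothetical helper, not a base R object): link g, inverse
# link, d eta / d mu, variance function V, and data-based starting means.
iwls_families <- list(
  binomial_logit = list(g = qlogis, g_inv = plogis,
                        deta_dmu = function(mu) 1 / (mu * (1 - mu)),
                        V = function(mu) mu * (1 - mu),
                        mu_start = function(y) (y + 0.5) / 2),
  poisson_log = list(g = log, g_inv = exp,
                     deta_dmu = function(mu) 1 / mu,
                     V = function(mu) mu,
                     mu_start = function(y) y + 0.1)
)

iwls <- function(X, y, fam, tol = 1e-10, max_iter = 50) {
  mu <- fam$mu_start(y)
  eta <- fam$g(mu)
  beta <- rep(Inf, ncol(X))
  for (iter in seq_len(max_iter)) {
    # Step 1: adjusted dependent variable z_i = eta_i + (y_i - mu_i) d eta_i / d mu_i
    z <- eta + (y - mu) * fam$deta_dmu(mu)
    # Step 2: iterative weights W_i = 1 / {(d eta_i / d mu_i)^2 V(mu_i)}
    w <- 1 / (fam$deta_dmu(mu)^2 * fam$V(mu))
    # Step 3: weighted least squares of z on X, solving (X'WX) beta = X'Wz
    new_beta <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    eta <- drop(X %*% new_beta)
    mu <- fam$g_inv(eta)
    if (max(abs(new_beta - beta)) < tol) return(list(coef = new_beta, iter = iter))
    beta <- new_beta
  }
  list(coef = beta, iter = max_iter)
}

# Hypothetical counts with one covariate; compare with base R's glm()
x <- 0:5
y <- c(1, 0, 2, 4, 5, 9)
X <- cbind(1, x)
fit_iwls <- iwls(X, y, iwls_families$poisson_log)
fit_glm <- glm(y ~ x, family = poisson)
rbind(iwls = unname(fit_iwls$coef), glm = unname(coef(fit_glm)))
```

The two rows should agree to within glm()'s convergence tolerance. No safeguards (step halving, separation checks) are included, so binary data with complete separation will not converge.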

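Justification for the IWLS procedure
Note that the log-likelihood can be expressed as
$$l = \sum_{i=1}^n \{y_i\theta_i - b(\theta_i)\}/a(\phi) + c(y_i, \phi).$$
To maximize this log-likelihood we need $\partial l/\partial\beta_j$:
$$\frac{\partial l}{\partial\beta_j} = \sum_{i=1}^n \frac{\partial l_i}{\partial\theta_i}\frac{d\theta_i}{d\mu_i}\frac{d\mu_i}{d\eta_i}\frac{\partial\eta_i}{\partial\beta_j} = \sum_{i=1}^n \frac{(y_i - \mu_i)}{a(\phi)}\frac{1}{V_i}\frac{d\mu_i}{d\eta_i}x_{ij} = \sum_{i=1}^n \frac{(y_i - \mu_i)W_i}{a(\phi)}\frac{d\eta_i}{d\mu_i}x_{ij},$$
since $\mu_i = b'(\theta_i)$ and $b''(\theta_i) = V_i$ imply $d\mu_i/d\theta_i = V_i$. With constant dispersion ($a(\phi) = \phi$), the MLE equations for $\beta_j$ are
$$\sum_{i=1}^n W_i(y_i - \mu_i)\frac{d\eta_i}{d\mu_i}x_{ij} = 0, \qquad j = 1, \dots, p.$$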
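For canonical links these equations simplify further. With $\theta_i = \eta_i$, the chain of derivatives collapses and the factor $W_i\,d\eta_i/d\mu_i$ equals one, so the MLE equations equate the sufficient statistic $X'y$ with its fitted counterpart:

```latex
\begin{aligned}
\theta_i = \eta_i \;&\Rightarrow\; \frac{d\mu_i}{d\eta_i} = \frac{d\mu_i}{d\theta_i} = b''(\theta_i) = V_i,\\
W_i\,\frac{d\eta_i}{d\mu_i} &= \frac{1}{(d\eta_i/d\mu_i)^2 V_i}\cdot\frac{d\eta_i}{d\mu_i}
  = \frac{d\mu_i/d\eta_i}{V_i} = 1,\\
\text{so}\quad \sum_{i=1}^n (y_i - \hat\mu_i)\,x_{ij} &= 0 \quad (j = 1, \dots, p),
  \qquad\text{i.e.}\quad X'y = X'\hat\mu.
\end{aligned}
```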

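Fisher's scoring method uses the gradient vector, $u = \partial l/\partial\beta$, and minus the expected value of the Hessian matrix, $A$, with elements
$$A_{rs} = -E\left(\frac{\partial^2 l}{\partial\beta_r\partial\beta_s}\right).$$
Given the current estimate $b$ of $\beta$, choose the adjustment $\delta b$ so that $A\,\delta b = u$. Excluding $\phi$, the components of $u$ are
$$u_r = \sum_{i=1}^n W_i(y_i - \mu_i)\frac{d\eta_i}{d\mu_i}x_{ir},$$
so we have
$$A_{rs} = -E\left(\frac{\partial u_r}{\partial\beta_s}\right) = -E\sum_{i=1}^n\left[(y_i - \mu_i)\frac{\partial}{\partial\beta_s}\left\{W_i\frac{d\eta_i}{d\mu_i}x_{ir}\right\} + W_i\frac{d\eta_i}{d\mu_i}x_{ir}\frac{\partial(y_i - \mu_i)}{\partial\beta_s}\right].$$
The expectation of the first term is 0, since $E(y_i) = \mu_i$, and the second term gives
$$\sum_{i=1}^n W_i\frac{d\eta_i}{d\mu_i}x_{ir}\frac{\partial\mu_i}{\partial\beta_s} = \sum_{i=1}^n W_i\frac{d\eta_i}{d\mu_i}x_{ir}\frac{d\mu_i}{d\eta_i}\frac{\partial\eta_i}{\partial\beta_s} = \sum_{i=1}^n W_i x_{ir}x_{is}.$$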

The new estimate $b^* = b + \delta b$ of $\beta$ thus satisfies
$$Ab^* = Ab + A\,\delta b = Ab + u,$$
where $(Ab)_r = \sum_s A_{rs}b_s = \sum_{i=1}^n W_i x_{ir}\eta_i$. Thus, the new estimate $b^*$ satisfies
$$(Ab^*)_r = \sum_{i=1}^n W_i x_{ir}\{\eta_i + (y_i - \mu_i)\,d\eta_i/d\mu_i\}.$$
These equations have the form of the normal equations for linear weighted least squares, $(X'WX)b^* = X'Wz$ with $W = \text{diag}(W_1, \dots, W_n)$, i.e., with weights $W_i$ and dependent variable $z_i$.
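The equivalence can be checked numerically for a single update. This R sketch uses a Poisson log-link model with hypothetical counts and an arbitrary current estimate b, and computes $b^*$ once as $b + A^{-1}u$ (Fisher scoring) and once as the weighted least squares fit of $z$ on $X$ with weights $W_i$; the two agree up to rounding.

```r
x <- c(0, 1, 2, 3)
y <- c(1, 2, 2, 5)              # hypothetical counts
X <- cbind(1, x)
b <- c(0.2, 0.3)                # arbitrary current estimate

eta <- drop(X %*% b)
mu <- exp(eta)
deta_dmu <- 1 / mu              # log link
W <- 1 / (deta_dmu^2 * mu)      # V(mu) = mu for the Poisson

# Fisher scoring: u_r = sum_i W_i (y_i - mu_i)(d eta_i / d mu_i) x_ir, A_rs = sum_i W_i x_ir x_is
u <- crossprod(X, W * (y - mu) * deta_dmu)
A <- crossprod(X, W * X)
b_fisher <- b + drop(solve(A, u))

# IWLS: weighted least squares of z_i = eta_i + (y_i - mu_i) d eta_i / d mu_i on X
z <- eta + (y - mu) * deta_dmu
b_wls <- drop(solve(A, crossprod(X, W * z)))

rbind(fisher = unname(b_fisher), wls = unname(b_wls))
```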

Some Comments
The IWLS procedure is simple to implement and converges rapidly in most cases. Procedures are available to calculate MLEs and implement frequentist inferences for GLMs in most software packages. In R or S-PLUS, the glm() function can be used (try help(glm)); in MATLAB, the glmfit() function can be used.

Example: Smoking and Obesity
Let $y_i = 1$ if child $i$ is obese and $y_i = 0$ otherwise, for $i = 1, \dots, n$, with $x_i = (1, \text{age}_i, \text{smoke}_i, \text{age}_i \times \text{smoke}_i)'$. The Bernoulli likelihood is
$$L(y; \beta, x) = \prod_{i=1}^n \mu_i^{y_i}(1 - \mu_i)^{1 - y_i},$$
where $\mu_i = \Pr(y_i = 1 \mid x_i, \beta)$. Choosing the canonical link, $\mu_i = 1/\{1 + \exp(-x_i'\beta)\}$, results in a logistic regression model:
$$\Pr(y_i = 1 \mid x_i, \beta) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$
Hence, the probability of obesity depends on age and smoking through a non-linear model.

Letting X = cbind(age, smoke, age*smoke) and y = the 0/1 obesity outcome in R, we use

fit <- glm(y ~ age + smoke + age*smoke, family = binomial, data = obese)

to implement IWLS and fit the model. Note that the data are available on the web; try to replicate the results (note that children a year old or younger have been discarded). The command summary(fit) yields the following results:

Output of summary(fit): a coefficient table with columns Value, Std. Error and t value for the terms (Intercept), age, smoke and age:smoke; null deviance on 3874 degrees of freedom; residual deviance on 3871 degrees of freedom; number of Fisher scoring iterations: 6; correlation of coefficients among (Intercept), age, smoke and age:smoke.

Thus, the IWLS algorithm converged in 6 iterations to the MLE $\hat\beta$ = ( 2.365, 0.066, 0.043, 0.008). For any value of the covariates, we can calculate the probability of obesity. For example, for non-smokers (smoke = 0, so the linear predictor is beta[1] + beta[2]*age), the age curve can be plotted using:

beta <- coef(fit)
## introduce grid spanning range of observed ages
x <- seq(min(obese$age), max(obese$age), length.out = 100)
## calculate fitted probability of obesity for non-smokers
py <- 1/(1 + exp(-(beta[1] + beta[2]*x)))
plot(x, py, type = "l", xlab = "age in years", ylab = "pr(obesity)")
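The same calculation extends to smokers: with $x = (1, \text{age}, \text{smoke}, \text{age} \times \text{smoke})$, setting smoke = 1 gives intercept $\beta_1 + \beta_3$ and slope $\beta_2 + \beta_4$ (R's 1-based indexing of coef(fit)), and the minus sign in $1/\{1 + \exp(-x'\beta)\}$ applies to the whole linear predictor. The R sketch below uses hypothetical coefficient values (not the fitted MLE) and a hypothetical age grid; `pr_obesity` is an illustrative name, and plogis() computes $1/\{1 + \exp(-\eta)\}$.

```r
# x = (1, age, smoke, age * smoke); Pr(obese) = 1 / {1 + exp(-x'beta)}
pr_obesity <- function(age, smoke, beta) {
  eta <- beta[1] + beta[2] * age + beta[3] * smoke + beta[4] * age * smoke
  plogis(eta)   # same as 1 / (1 + exp(-eta))
}

beta_demo <- c(-2, 0.05, 0.1, 0.01)        # hypothetical values, not the fitted MLE
age <- seq(2, 12, length.out = 100)        # hypothetical age grid in years
p_non <- pr_obesity(age, 0, beta_demo)     # non-smokers
p_smk <- pr_obesity(age, 1, beta_demo)     # smokers
head(cbind(age, p_non, p_smk))
```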


More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples. Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression

More information

Weighted Least Squares I

Weighted Least Squares I Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs Outline Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team UseR!2009,

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Multinomial Regression Models

Multinomial Regression Models Multinomial Regression Models Objectives: Multinomial distribution and likelihood Ordinal data: Cumulative link models (POM). Ordinal data: Continuation models (CRM). 84 Heagerty, Bio/Stat 571 Models for

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Generalized Linear Models and Extensions

Generalized Linear Models and Extensions Session 1 1 Generalized Linear Models and Extensions Clarice Garcia Borges Demétrio ESALQ/USP Piracicaba, SP, Brasil March 2013 email: Clarice.demetrio@usp.br Session 1 2 Course Outline Session 1 - Generalized

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model

Linear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables

More information

Part V: Binary response data

Part V: Binary response data Part V: Binary response data 275 BIO 233, Spring 2015 Western Collaborative Group Study Prospective study of coronary heart disease (CHD) Recruited 3,524 men aged 39-59 between 1960-61 employed at 10 companies

More information

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester

Modelling Rates. Mark Lunt. Arthritis Research UK Epidemiology Unit University of Manchester Modelling Rates Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 05/12/2017 Modelling Rates Can model prevalence (proportion) with logistic regression Cannot model incidence in

More information

Stat 710: Mathematical Statistics Lecture 12

Stat 710: Mathematical Statistics Lecture 12 Stat 710: Mathematical Statistics Lecture 12 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 12 Feb 18, 2009 1 / 11 Lecture 12:

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)

More information

MATH Generalized Linear Models

MATH Generalized Linear Models MATH 523 - Generalized Linear Models Pr. David A. Stephens Course notes by Léo Raymond-Belzile Leo.Raymond-Belzile@mail.mcgill.ca The current version is that of July 31, 2018 Winter 2013, McGill University

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

High-Throughput Sequencing Course

High-Throughput Sequencing Course High-Throughput Sequencing Course DESeq Model for RNA-Seq Biostatistics and Bioinformatics Summer 2017 Outline Review: Standard linear regression model (e.g., to model gene expression as function of an

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Lecture 13: More on Binary Data

Lecture 13: More on Binary Data Lecture 1: More on Binary Data Link functions for Binomial models Link η = g(π) π = g 1 (η) identity π η logarithmic log π e η logistic log ( π 1 π probit Φ 1 (π) Φ(η) log-log log( log π) exp( e η ) complementary

More information

where F ( ) is the gamma function, y > 0, µ > 0, σ 2 > 0. (a) show that Y has an exponential family distribution of the form

where F ( ) is the gamma function, y > 0, µ > 0, σ 2 > 0. (a) show that Y has an exponential family distribution of the form Stat 579: General Instruction of Homework: All solutions should be rigorously explained. For problems using SAS or R, please attach code as part of your homework Assignment 1: Due Jan 30 Tuesday in class

More information

STA 216, GLM, Lecture 16. October 29, 2007

STA 216, GLM, Lecture 16. October 29, 2007 STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Assumptions of Linear Model Homoskedasticity Model variance No error in X variables Errors in variables No missing data Missing data model Normally distributed error Error in

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information