ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Size: px
Start display at page:

Download "ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples"

Transcription

1 ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1

2 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will use models as the basis of such analysis. Models can handle more complicated situations than discussed so far. We can also estimate the parameters, which describe the effects in a more informative way. 2

3 Example: Challenger O-ring For the 23 space shuttle flights that occurred before the Challenger mission disaster in 1986, the following table shows the temperature at the time of flight and whether at least one primary O-ring suffered thermal distress. 3

4 The Data Ft Temp TD Ft Temp TD Ft Temp TD Is there any association between Temperature and thermal distress? 4

5 Fit From Linear Regression 5

6 Fit From Logistic Regression 6

7 Example: Horseshoe Crabs Each female horseshoe crab in the crab in the study had a male crab attached to her in her nest. The study investigated factors that affect whether the female crab had any other males, called satellites, residing nearby her. Explanatory variables included the female crab s color, spine condition, weight, and carapace width. The response outcome for each female crab is her number of satellites. 7

8 Example Continued We consider the width alone as a predictor. To obtain a clearer picture, we grouped the female crabs into a set of width categories 23.25, , , , , , , >29.25 Calculated sample mean number of satellites for female crabs in each category. 8

9 9

10 10

11 Random component Components of A GLM Identifies the response variable Y and assumes a probability distribution for it Systematic component Specifies the explanatory variables used as predictors in the model Link Describes the functional relation between the systematic component and expected value of the random component 11

12 Random Component Let Y 1,, Y N denote the N observations on the response variables Y. The random component specifies a probability distribution for Y 1,, Y N. If the potential outcome for each observation Y i are binary such as success or failure ; or, more generally, each Y i might be number of successes out of a certain fixed number of trials, we can assume a binomial distribution for the random component. If each response observation is a non-negative count, such as cell count in a contingency table, then we may assume a Poisson distribution for the random component. 12

13 Systematic Component The systematic component specifies the explanatory variables. It specifies the variables that play the roles of x j in the formula α + β 1 x β k x k. This linear combination of explanatory variables is called the linear predictors. Some x j may be based on others in the model; for instance, perhaps x 3 = x 1 x 2, to allow interaction between x 1 and x 2 in their effects on Y, or perhaps x 3 = x 2 1 to allow a curvilinear effect of x 1. 13

14 Link It specifies how µ = E(Y ) relates to explanatory variables in the linear predictor. The model formula states that g(µ) = α + β 1 x β k x k The function g(.) is called the link function. 14

15 Some Popular Link Functions Identity Link g(µ) = µ = α + β 1 x β k x k Log link g(µ) = log(µ) = α + β 1 x β k x k Logit µ g(µ) = log[ 1 µ ] = α + β 1x β k x k 15

16 More On Link Functions Each potential probability distribution has one special function of the mean that is called its natural parameter. For the normal distribution, it is mean itself. For the Poisson, the natural parameter is the log of the mean. For the Binomial, the natural parameter is the logit of the success probability. The link function that uses the natural parameter as g(µ) in the GLM is called the canonical link. Though other links are possible, in practice the canonical links are most common 16

17 GLM For Binary Data: Random Component The distribution of a binary response is specified by probabilities P (Y = 1) = π of success and P (Y = 0) = 1 π of failure. For n independent observations on a binary response with parameter π, the number of successes has the binomial distribution specified by parameters n and π. 17

18 Linear Probability Model To model the effect of X, use ordinary linear regression, by which the expected value of Y is a linear function of X. The model π(x) = α + βx is called a linear probability model. Probabilities fall between 0 and 1 but for large of small values of x, the model may predict π(x) < 0 or π(x) > 1. This model is valid only for a finite range of x values 18

19 Example: Snoring Heart Disease Snoring Yes No Proportion Yes Linear Fit Never Occasional Nearly Every Night Every Night

20 Example: We use (0,2,4,5) for their snoring categories, treating the last two levels closer. Linear Fit using maximizing likelihood: π(x) = x The least squares fit is slightly different. 20

21 Logistic Regression Model Relationship between π(x) and x are usually nonlinear rather than linear. The most important function describing this nonlinear relationship has the form That is, π(x) log( 1 π(x) ) = α + βx π(x) = F 0 (α + βx), where F 0 (x) = ex 1 + e x = e x, where F 0 (x) is the cdf of the logistic distribution. Its pdf is F 0 (x)(1 F 0 (x)). The associated GLM is called the logistic regression f unction. Logistic regression models are often referred as logit models as the link in this GLM is the logit link. 21

22 Parameters The parameter β determines the rate of increase or decrease of the curve. When β > 0, π(x) increases with x. When β < 0, π(x) decreases as x increases. The magnitude of β determines how fast the curve increases or decreases. As β increases, the curve has a steeper rate of change. 22

23 Example: Snoring Heart Disease Snoring Yes No Proportion Linear Logit Yes Fit Fit Never Occasional Nearly Every Night Every Night

24 Effect of Parameters 24

25 Alternative Binary Links For logistic regression curves, the probability of a success increases or decreases continuously as x increases. Let X denote a random variable, the cumulative distribution function (cdf) F (x) is defined as F (x) = P (X x), < x < Such a function, plotted as a function of x, has appearance like that of the logistic function in the previous figures. It suggests a class of models for binary responses of the form π(x) = F (α + βx) where F is a cdf for some distribution. 25

26 Alternative Binary Links The logistic regression curve has this form. When β > 0, π(x) = F (α + βx) has the shape of the cdf of the two-parameter logistic distribution. When β < 0, 1 π(x) = 1 F (α + βx) has the shape of the cdf of the two-parameter logistic distribution. Each choice of α and β > 0 corresponds to a different logistic distribution. The logistic cdf F 0 (x) corresponds to a probability distribution F 0 (x)(1 F 0 (x)) with a symmetric, bell shape and very similar looking to a normal distribution. 26

27 Probit Models The probability of success, π(x), has the form Φ(α + βx) where Φ is the cdf of a standard normal distribution N(0, 1). The link function is known as probit link: g(π) = Φ 1 (π). The probit transform maps π(x) so that the regression curve for π(x) (or 1 π(x), when β < 0) has the appearance of the normal cdf with mean µ = α/β and standard deviation σ = 1/ β. 27

28 Example: Snoring Heart Disease Snoring Yes No Proportion Linear Logit Probit Yes Fit Fit Fit Never Occasional Nearly Every Night Every Night

29 29

30 GLM for Count Data Many discrete response variables have counts as possible outcomes. For a sample of cities worldwide, each observation might be the number of automobile thefts in For a sample of silicon wafers used in computer chips, each observation might be the number of imperfections on a wafer. We have earlier seen the Poisson distribution as a sampling model for counts. 30

31 Poisson Regression Assume a Poisson distribution for the random component. One can model the Poisson mean using the identity link. But more common to model the log of the mean. A Poisson loglinear model is a GLM that assumes a Poisson distribution for Y and uses the log link. 31

32 Poisson Regression - Continued Let µ denote the expected value of Y and let X denote an explanatory variable. Then the Poisson log-linear model has the form log(µ) = α + βx For this model: µ = exp(α + βx) = e α (e β ) x 32

33 Example: Horseshoe Data 33

34 Poisson Regression For Rate Data It is often relevant to certain types of events which occur over time, space, or some other index of size, to model the rate at which events occur. In modeling numbers of auto thefts in 2003 for a sample of cities, we could form a rate for each city by dividing the number of thefts by the city s population size. The model describes how the rate depends on some other explanatory variables. 34

35 Poisson Regression For Rate Data When a response count Y i has index (such as population size) equal to t i, the sample rate of outcomes is Y i /t i. The expected value of the rate is µ i /t i. A log-linear model for the expected rate has form: log(µ i /t i ) = α + βx i This has an equivalent representation: log µ i log t i = α + βx i The adjustment term, log t i, to the above equation is called an offset. 35

36 Exponential Family The random variable Y has a distribution in the exponential family, if its p.d.f (or p.m.f.) can be written as f(y; θ, ϕ) = exp{(yθ b(θ))/a(ϕ) + c(y, ϕ)} for some specific function a(ϕ), b(θ) and c(y, ϕ). The parameter θ is called the natural parameter and ϕ is called the dispersion (or scale) parameter. 36

37 The p.d.f. of N(µ, σ 2 ) Examples: Normal Distribution f(y; θ, ϕ) = 1 2πσ 2 exp{ (y µ)2 /(2σ 2 )} = exp{(yµ µ 2 /2)/σ 2 (y 2 /σ 2 + log(2πσ 2 ))/2} Here, θ = µ, ϕ = σ 2, and a(ϕ) = ϕ, b(θ) = θ 2 /2 and c(y, ϕ) = {y 2 /σ 2 + log(2πσ 2 )}/2 The canonical link: g(µ) = µ. 37

38 Examples: Binomial Distribution The p.d.f. of Bernoulli(π): f(y; θ, ϕ) = π y (1 π) 1 y π = (1 π)( 1 π )y π = exp{y log( 1 π ) log( 1 1 π )} Here θ = log( π 1 π ),ϕ = 1,b(θ) = log(1 + eθ ) a(ϕ) = 1, c(y, ϕ) = 0 The canonical link: g(π) = log( π 1 π ). 38

39 The p.d.f. of Poisson(λ): Examples: Poisson Distribution λ λy f(y; θ, ϕ) = e y! = exp{(y log λ λ) log y!} Here θ = log λ,ϕ = 1, b(θ) = e θ, a(ϕ) = 1, c(y, ϕ) = log y! The canonical link: g(λ) = log(λ). 39

40 The log-likelihood function Log-Likelihood Functions l(θ, ϕ; y) = log f(y; θ, ϕ) = (yθ b(θ))/a(ϕ) + c(y, ϕ) We use general likelihood results, applicable to exponential families E( l l θ ) = 0 and E( θ )2 = E( 2 l θ ) 2 Here l θ = (y b (θ))/a(ϕ) and 2 l θ 2 = b (θ)/a(ϕ) 40

41 Mean and Variances Now, we have 0 = E( l θ ) = {E(y) b (θ)}/a(ϕ) So that, E(Y ) = b (θ). Similarly, var(y ) a 2 (ϕ) So that, var(y ) = b (θ)a(ϕ). = b (θ) a(ϕ) 41

42 Examples Normal Distribution: b(θ) = θ 2 /2 E(Y ) = b (θ) = θ = µ var(y ) = b (θ)a(ϕ) = ϕ = σ 2. Bernoulli Distribution: b(θ) = log(1 + e θ ) E(Y ) = b (θ) = e θ /(1 + e θ ) = π. var(y ) = b (θ) = e θ /(1 + e θ ) 2 = π(1 π) Poisson Distribution: b(θ) = exp(θ) E(Y ) = b (θ) = exp(θ) = λ. var(y ) = b (θ) = exp(θ) = λ. 42

43 Likelihood Equations in GLMs Let (y 1,, y N ) denote responses for N independent observations. Let (x i1,, x ip ) denote values of p explanatory variables for observation i. The systematic component for the i-th observation η i = p β j x ij j=1 If E(y i ) = µ i, then the link for the i-th observation is: η i = g(µ i ) 43

44 Likelihood Equations in GLMs For N independent observations, the log-likelihood function is: L(β)= N L i = N log f(y i ; θ, ϕ) = N + N c(y i, ϕ) i=1 i=1 i=1 y i θ i b(θ i ) a(ϕ) The notation L(β) reflects the dependence of θ on the model parameters β. i=1 44

45 Likelihood Equations in GLMs The likelihood equations are: for j = 1,, p. Simplifying, we have L(β) β j = N i=1 L i β j = 0, N i=1 (y i µ i )x ij var(y i ) µ i η i = 0 for j = 1,, p. Notice that µ i η i = 1/g (µ i ). 45

46 Logit Model: Examples N (y i π i )x ij = 0, where π i = i=1 i=1 k=1 exp( 1+exp( p j=1 p j=1 β j x ij ) β j x ij ) Probit Model: N (y i π i )x ij π i (1 π i ) ϕ( p β k x ik ) = 0, where π i = Φ( p Log-Linear Model: N (y i exp( p β k x ik ))x ij = 0 i=1 k=1 j=1 β j x ij ) 46

47 Maximum Likelihood Estimates ML estimates of β j s are obtained by solving the likelihood equations using numerical methods. The ML estimates ˆβ j s are approximately normally distributed. Thus, a confidence interval for a model parameter β j equals ˆβ j ± z α/2 ASE where ASE is the asymptotic standard error of ˆβ j. 47

48 To test: H 0 : β j = 0. Testing For Significance The test statistic, Z = ˆβ j /ASE has an approximate standard normal distribution, when H 0 is true. Equivalently, Z 2 has a chi-squared distribution with d.f. = 1, which can be used for two-sided alternatives This type of test is known as Wald s test. 48

49 Likelihood Ratio Test The likelihood-ratio test statistic equals 2 log(l 0 /L 1 ) = 2[log L 0 log L 1 ] = 2[l 0 l 1 ] where L 0 and L 1 are the maximized likelihood functions under the null hypothesis and under the full model, respectively. Under H 0, this test statistic also has a large sample chi-squared distribution with d.f. = 1. 49

50 Score Test The score statistic or efficient score statistic uses the size of the derivative of the log-likelihood function evaluated at β j = 0. The score statistic is the square of the ratio of this derivative to its ASE. It also has an approximate chi-squared distribution. 50

51 Example In simple logistic regression with one explanatory variable, the log-likelihood function is: l(α, β) = N {y i (α + βx i ) log(1 + exp(α + βx i ))} i=1 Therefore for H 0 : β = 0 and H 1 : β 0, we have l 0 = N {y i log ȳ 1 ȳ log 1 i=1 1 ȳ }, l 1 = N {y i (ˆα + ˆβx i ) log(1 + exp(ˆα + ˆβx i ))} i=1 51

52 Model Residuals For i-th observation, the raw residual is: r i = y i ˆµ i = Observed fitted, where y i is the observed response and ˆµ i is the fitted value from the model. The Pearson residual is defined as Pearson residual= Oberved-fitted = var(observed) ˆ For Poisson GLMs, it simplifies to e i = y i ˆµ i ˆµi y i ˆµ i. var(yi ˆ ) 52

53 Adjusted Residuals The Pearson residuals divided by its estimated standard error are called adjusted residuals. Adjusted residuals have an approximate standard normal distribution. For Poisson GLMs, the general form of the adjusted residual is: (y i ˆµ i ) ˆµi (1 h i ) = e i 1 hi where h i is called the leverage of observation i. 53

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

(c) Interpret the estimated effect of temperature on the odds of thermal distress. STA 4504/5503 Sample questions for exam 2 1. For the 23 space shuttle flights that occurred before the Challenger mission in 1986, Table 1 shows the temperature ( F) at the time of the flight and whether

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Likelihoods for Generalized Linear Models

Likelihoods for Generalized Linear Models 1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52

Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Statistics for Applications Chapter 10: Generalized Linear Models (GLMs) 1/52 Linear model A linear model assumes Y X N(µ(X),σ 2 I), And IE(Y X) = µ(x) = X β, 2/52 Components of a linear model The two

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

The material for categorical data follows Agresti closely.

The material for categorical data follows Agresti closely. Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical

More information

Homework 1 Solutions

Homework 1 Solutions 36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Introduction to Generalized Linear Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Introduction (motivation

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:

More information

Generalized Linear Models: An Introduction

Generalized Linear Models: An Introduction Applied Statistics With R Generalized Linear Models: An Introduction John Fox WU Wien May/June 2006 2006 by John Fox Generalized Linear Models: An Introduction 1 A synthesis due to Nelder and Wedderburn,

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Section Poisson Regression

Section Poisson Regression Section 14.13 Poisson Regression Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Data Analysis II 1 / 26 Poisson regression Regular regression data {(x i, Y i )} n i=1,

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

11. Generalized Linear Models: An Introduction

11. Generalized Linear Models: An Introduction Sociology 740 John Fox Lecture Notes 11. Generalized Linear Models: An Introduction Copyright 2014 by John Fox Generalized Linear Models: An Introduction 1 1. Introduction I A synthesis due to Nelder and

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples ST3241 Categorical Data Analysis I Logistic Regression An Introduction and Some Examples 1 Business Applications Example Applications The probability that a subject pays a bill on time may use predictors

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

Generalized Linear Models. stat 557 Heike Hofmann

Generalized Linear Models. stat 557 Heike Hofmann Generalized Linear Models stat 557 Heike Hofmann Outline Intro to GLM Exponential Family Likelihood Equations GLM for Binomial Response Generalized Linear Models Three components: random, systematic, link

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann

Statistics of Contingency Tables - Extension to I x J. stat 557 Heike Hofmann Statistics of Contingency Tables - Extension to I x J stat 557 Heike Hofmann Outline Testing Independence Local Odds Ratios Concordance & Discordance Intro to GLMs Simpson s paradox Simpson s paradox:

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Extending the Linear Model

Extending the Linear Model Extending the Linear Model G.Janacek School of Computing Sciences, UEA Part I generalized linear models 1 Contents I generalized linear models 1 0.1 Introduction................................... 4 0.1.1

More information

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure).

STAT Chapter 13: Categorical Data. Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). STAT 515 -- Chapter 13: Categorical Data Recall we have studied binomial data, in which each trial falls into one of 2 categories (success/failure). Many studies allow for more than 2 categories. Example

More information

Linear regression is designed for a quantitative response variable; in the model equation

Linear regression is designed for a quantitative response variable; in the model equation Logistic Regression ST 370 Linear regression is designed for a quantitative response variable; in the model equation Y = β 0 + β 1 x + ɛ, the random noise term ɛ is usually assumed to be at least approximately

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models David Rosenberg New York University April 12, 2015 David Rosenberg (New York University) DS-GA 1003 April 12, 2015 1 / 20 Conditional Gaussian Regression Gaussian Regression Input

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is

The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is Example The logistic regression model is thus a glm-model with canonical link function so that the log-odds equals the linear predictor, that is log p 1 p = β 0 + β 1 f 1 (y 1 ) +... + β d f d (y d ).

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

where F ( ) is the gamma function, y > 0, µ > 0, σ 2 > 0. (a) show that Y has an exponential family distribution of the form

where F ( ) is the gamma function, y > 0, µ > 0, σ 2 > 0. (a) show that Y has an exponential family distribution of the form Stat 579: General Instruction of Homework: All solutions should be rigorously explained. For problems using SAS or R, please attach code as part of your homework Assignment 1: Due Jan 30 Tuesday in class

More information

SB1a Applied Statistics Lectures 9-10

SB1a Applied Statistics Lectures 9-10 SB1a Applied Statistics Lectures 9-10 Dr Geoff Nicholls Week 5 MT15 - Natural or canonical) exponential families - Generalised Linear Models for data - Fitting GLM s to data MLE s Iteratively Re-weighted

More information

When is MLE appropriate

When is MLE appropriate When is MLE appropriate As a rule of thumb the following to assumptions need to be fulfilled to make MLE the appropriate method for estimation: The model is adequate. That is, we trust that one of the

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Lecture 8. Poisson models for counts

Lecture 8. Poisson models for counts Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14. Your Name:

STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14. Your Name: STAT5044: Regression and ANOVA, Fall 2011 Final Exam on Dec 14 Your Name: Please make sure to specify all of your notations in each problem GOOD LUCK! 1 Problem# 1. Consider the following model, y i =

More information

MSH3 Generalized linear model

MSH3 Generalized linear model Contents MSH3 Generalized linear model 5 Logit Models for Binary Data 173 5.1 The Bernoulli and binomial distributions......... 173 5.1.1 Mean, variance and higher order moments.... 173 5.1.2 Normal limit....................

More information

Generalized Linear Models and Extensions

Generalized Linear Models and Extensions Session 1 1 Generalized Linear Models and Extensions Clarice Garcia Borges Demétrio ESALQ/USP Piracicaba, SP, Brasil March 2013 email: Clarice.demetrio@usp.br Session 1 2 Course Outline Session 1 - Generalized

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Answer Key for STAT 200B HW No. 7

Answer Key for STAT 200B HW No. 7 Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

HT Introduction. P(X i = x i ) = e λ λ x i

HT Introduction. P(X i = x i ) = e λ λ x i MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework

More information

Topic 12 Overview of Estimation

Topic 12 Overview of Estimation Topic 12 Overview of Estimation Classical Statistics 1 / 9 Outline Introduction Parameter Estimation Classical Statistics Densities and Likelihoods 2 / 9 Introduction In the simplest possible terms, the

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

2.3 Analysis of Categorical Data

2.3 Analysis of Categorical Data 90 CHAPTER 2. ESTIMATION AND HYPOTHESIS TESTING 2.3 Analysis of Categorical Data 2.3.1 The Multinomial Probability Distribution A mulinomial random variable is a generalization of the binomial rv. It results

More information

Applied Linear Statistical Methods

Applied Linear Statistical Methods Applied Linear Statistical Methods (short lecturenotes) Prof. Rozenn Dahyot School of Computer Science and Statistics Trinity College Dublin Ireland www.scss.tcd.ie/rozenn.dahyot Hilary Term 2016 1. Introduction

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

MATH Generalized Linear Models

MATH Generalized Linear Models MATH 523 - Generalized Linear Models Pr. David A. Stephens Course notes by Léo Raymond-Belzile Leo.Raymond-Belzile@mail.mcgill.ca The current version is that of July 31, 2018 Winter 2013, McGill University

More information

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine. Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM

More information