ZERO INFLATED POISSON REGRESSION

STAT 6500 FINAL PROJECT: ZERO INFLATED POISSON REGRESSION
DEC 6th, 2013
SUN JEON, DEPARTMENT OF SOCIOLOGY, UTAH STATE UNIVERSITY

POISSON REGRESSION REVIEW
INTRODUCING ZERO-INFLATED POISSON REGRESSION
SAS EXAMPLE

REGRESSION BY TYPE OF OUTCOME VARIABLE

OUTCOME VARIABLE    REGRESSION
CONTINUOUS          Regression
BINARY              Logistic Regression
ORDINAL             Ordinal Logistic Regression
NOMINAL             Nominal Logistic Regression
COUNT               Poisson Regression (proc genmod)

COUNT: Poisson Regression (proc genmod)

POISSON DISTRIBUTION (see #HO 4.2)

µ is the mean of the distribution.

$\Pr(y \mid \mu) = \dfrac{e^{-\mu}\,\mu^{y}}{y!}$, for $y = 0, 1, 2, \ldots$

As µ increases, the mass of the distribution shifts to the right:
the probability of a zero count decreases;
the distribution approximates a normal distribution.

Equidispersion: µ = Var(y).

[Figure: Poisson distributions for µ = 1, 2, 3, 4, 5. x-axis: # OF EVENT (0 to 11); y-axis: PROBABILITY (0 to 0.4)]
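The two properties on this slide (the zero probability falls as µ grows, and mean equals variance) can be checked numerically. The following is a minimal Python sketch, not part of the original SAS workflow; the function names are hypothetical and the infinite sum is truncated at an assumed cutoff of 100.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Pr(y | mu) = exp(-mu) * mu**y / y!  for y = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def poisson_mean_var(mu: float, y_max: int = 100) -> tuple[float, float]:
    """Mean and variance computed from the pmf, truncating the sum at y_max."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Same values of mu as in the figure on the slide.
for mu in (1, 2, 3, 4, 5):
    mean, var = poisson_mean_var(mu)
    print(f"mu={mu}: Pr(y=0)={poisson_pmf(0, mu):.4f}  mean={mean:.4f}  var={var:.4f}")
```

For every µ the printed mean and variance agree (equidispersion), while Pr(y=0) = exp(-µ) shrinks as µ increases.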

HOWEVER, IN THE REAL WORLD

EXCESS ZERO: More observed zeros than predicted by the Poisson distribution
OVERDISPERSION: Variance > Mean

[Figure: Real Data versus Poisson Distribution. x-axis: # OF EVENT (0 to 8); y-axis: PROBABILITY (0 to 0.25)]

SAS OUTPUT
N                 797           Sum Weights        797
Mean              3.32371393    Sum Observations   2649
Std Deviation     2.57658616    Variance           6.63879624
Skewness          0.20107733    Kurtosis           -1.1940515
Uncorrected SS    14089         Corrected SS       5284.48181
Coeff Variation   77.5212975    Std Error Mean     0.09126736

DATA: Böhning, Dankmar, et al. "The zero inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology." Journal of the Royal Statistical Society: Series A (Statistics in Society) 162.2 (1999): 195-209.
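The SAS output above already contains what is needed to quantify both symptoms. As a rough Python sketch (the variable names are hypothetical; the numbers are copied from the SAS output), the variance-to-mean ratio and the zero probability that a Poisson distribution with the same mean would imply are:

```python
import math

# Values copied from the SAS output on the slide.
n = 797
mean = 3.32371393
variance = 6.63879624

# Under equidispersion this ratio would be close to 1.
dispersion_ratio = variance / mean

# Zero probability and expected number of zeros if the data were Poisson(mean).
p_zero_poisson = math.exp(-mean)
expected_zeros = n * p_zero_poisson

print(f"variance / mean      = {dispersion_ratio:.3f}")
print(f"Poisson Pr(y = 0)    = {p_zero_poisson:.4f}")
print(f"expected zero counts = {expected_zeros:.1f} of {n}")
```

The variance is about twice the mean, and a Poisson distribution with this mean would put under 4% of observations at zero, which is the benchmark the observed bar at zero in the figure is compared against.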

WE NEED MORE THAN SIMPLE POISSON REGRESSION. WHAT WE CAN CONSIDER IS ZERO INFLATED POISSON REGRESSION.

[Flowchart (Lavery, 2010, An Animated Guide: An Introduction To Poisson Regression). Nodes: COUNT OUTCOME; TRY POISSON; OVERDISPERSION? (NO / YES); FIT OK? (NO / YES); EXCESS ZEROS? (YES / NO); NEGATIVE BINOMIAL; ZERO-INFLATED MODEL (Zero-Inflated Negative Binomial, Zero-Inflated Poisson); FIT OK? (YES); DONE]

ZERO INFLATED POISSON REGRESSION

ZERO INFLATED REGRESSION (LOGISTIC REGRESSION)
BINARY OUTCOME VARIABLE: y = 0 or 1. EXCESS ZERO OR NOT? (y = 1 IS CALLED THE ALWAYS ZERO GROUP)
PREDICTOR VARIABLES: x1, x2, x3

$\Pr(y_i = 1 \mid x_i) = \dfrac{\exp(x_i^{T}\beta)}{1 + \exp(x_i^{T}\beta)} = \psi_i$

$\Pr(y_i = 0 \mid x_i) = 1 - \psi_i$

POISSON REGRESSION
COUNT OUTCOME VARIABLE: y = 0, 1, 2, 3, ..., k (the Poisson part can also produce zeros)
PREDICTOR VARIABLES: z1, z2, z3

$\Pr(y_i \mid z_i) = \dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}$

[Figure: Real Data versus Poisson Distribution, with the bar at zero labeled y=1 (excess zero) and y=0. x-axis: # OF EVENT (0 to 8); y-axis: PROBABILITY (0 to 0.25)]
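To make the two parts concrete, here is a small Python sketch of how each linear predictor is turned into a model quantity. Two things are assumptions rather than content from the slide: the Poisson mean is linked to its predictors through the log link, µ = exp(zᵀγ), which is the usual choice for Poisson regression, and the coefficient name γ and all numeric values are hypothetical.

```python
import math


def inflation_probability(x, beta):
    """psi = exp(x'beta) / (1 + exp(x'beta)): probability of the always-zero group."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return 1.0 / (1.0 + math.exp(-eta))


def poisson_mean(z, gamma):
    """mu = exp(z'gamma): log link, assumed here (standard for Poisson regression)."""
    return math.exp(sum(zj * gj for zj, gj in zip(z, gamma)))


# Hypothetical predictor values and coefficients, for illustration only.
# The leading 1.0 in each vector plays the role of the intercept.
x, beta = [1.0, 1.0], [0.5, -1.0]
z, gamma = [1.0, 2.0], [0.2, 0.1]

psi = inflation_probability(x, beta)
mu = poisson_mean(z, gamma)
print(f"psi = {psi:.4f}, mu = {mu:.4f}")
```

The logistic part and the Poisson part may use different predictor sets, which is why x and z are kept as separate vectors.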

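ZERO INFLATED POISSON REGRESSION: PROBABILITY

ZEROS

$\Pr(y_i = 0 \mid x_i) = \Pr(\text{INF ZERO}) \cdot \Pr(0 \mid \text{INF ZERO}) + \Pr(\sim\text{INF ZERO}) \cdot \Pr(0 \mid \sim\text{INF ZERO})$
$= [\psi_i \cdot 1] + [(1 - \psi_i) \cdot e^{-\mu_i}]$
$= \psi_i + (1 - \psi_i)\, e^{-\mu_i}$

NON ZERO COUNTS

$\Pr(y_i \mid z_i) = \Pr(\sim\text{INF ZERO}) \cdot \Pr(y_i \mid \sim\text{INF ZERO}) = (1 - \psi_i) \cdot \dfrac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}$

[Figure: probability distribution of counts. x-axis: # OF EVENT (0 to 8); y-axis: PROBABILITY (0 to 0.25)]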
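The two formulas above define a complete probability distribution. A minimal Python sketch (the function name and the values of ψ and µ are hypothetical) implements them and confirms that the probabilities sum to one:

```python
import math


def zip_pmf(y: int, psi: float, mu: float) -> float:
    """Zero-inflated Poisson probability of observing the count y.

    y == 0: psi + (1 - psi) * exp(-mu)
    y >= 1: (1 - psi) * exp(-mu) * mu**y / y!
    """
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        return psi + (1.0 - psi) * poisson
    return (1.0 - psi) * poisson


# Hypothetical parameter values, for illustration only.
psi, mu = 0.3, 2.0
probs = [zip_pmf(y, psi, mu) for y in range(60)]  # tail beyond 59 is negligible

print(f"ZIP     Pr(y = 0) = {probs[0]:.4f}")
print(f"Poisson Pr(y = 0) = {math.exp(-mu):.4f}")
print(f"sum of ZIP probabilities = {sum(probs):.6f}")
```

With ψ = 0 the function reduces to the ordinary Poisson pmf, so the ZIP model contains the Poisson model as a special case.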

SAS EXAMPLE

DATA: National Latino Asian American Survey, 2003
SUBJECT OF ANALYSIS: LATINO AND ASIAN IMMIGRANTS IN THE U.S. WHO HAVE LIVED IN THE U.S. LESS THAN A YEAR (N=552)
OUTCOME VARIABLE: HOW MANY TIMES HAVE YOU EXPERIENCED A PANIC ATTACK IN YOUR LIFE?
PREDICTOR VARIABLES:
   LOGISTIC REGRESSION (ZERO INFLATED MEMBERSHIP): gender
   POISSON REGRESSION: age, gender, marital status (married, divorced, single)

OVERDISPERSION: Variance > Mean

Variable   N     Mean         Std Dev      Minimum   Maximum
panic      552   2.3949275    7.7723865    0         55.00
AGE        552   34.1213768   11.9979481   18.00     86.00
gender     552   0.4365942    0.4964133    0         1.00
single     552   0.2083333    0.4064848    0         1.00
married    552   0.7228261    0.4480091    0         1.00
divorce    552   0.0688406    0.2534125    0         1.00

EXCESS ZERO: More zeros than predicted by the Poisson distribution.

panic   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0       491         88.95     491                    88.95
4       2           0.36      493                    89.31
5       1           0.18      494                    89.49
6       1           0.54      499                    90.40
...
55      1           0.18      552                    100.00
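Both diagnostics can be read off the two tables. The Python sketch below (variable names are hypothetical; the numbers are copied from the descriptive statistics and the frequency table) compares the variance with the mean and the observed number of zeros with the number a Poisson distribution with the same mean would predict:

```python
import math

# Values copied from the descriptive statistics and the frequency table.
n = 552
mean_panic = 2.3949275
sd_panic = 7.7723865
observed_zeros = 491

# Overdispersion: the variance far exceeds the mean.
variance_panic = sd_panic ** 2
dispersion_ratio = variance_panic / mean_panic

# Excess zeros: compare with the zeros implied by Poisson(mean_panic).
expected_zeros = n * math.exp(-mean_panic)

print(f"variance = {variance_panic:.2f}, mean = {mean_panic:.2f}, "
      f"ratio = {dispersion_ratio:.1f}")
print(f"observed zeros = {observed_zeros}, "
      f"expected under Poisson = {expected_zeros:.1f}")
```

The variance is roughly 25 times the mean, and the 491 observed zeros are nearly ten times the roughly 50 a plain Poisson distribution would predict.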

    proc countreg method=qn;                                /* quasi-Newton optimization */
       model panic = age gender married divorce / dist=zip; /* Poisson (count) part */
       zeromodel panic ~ gender;                            /* zero-inflated (logistic) part */
    run;

The COUNTREG Procedure

Model Fit Summary
Dependent Variable            panic
Number of Observations        552
Data Set                      WORK.NLAAS2
Model                         ZIP
ZI Link Function              Logistic
Log Likelihood                -455.54507
Maximum Absolute Gradient     0.0002048
Number of Iterations          18
Optimization Method           Quasi-Newton
AIC                           925.09014
SBC                           955.28498
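The AIC and SBC in the fit summary follow from the log likelihood with the standard definitions AIC = -2 logL + 2k and SBC = -2 logL + k ln(n). In the Python sketch below, k = 7 is an assumption obtained by counting parameters from the model statements: an intercept plus age, gender, married and divorce in the count part, and an intercept plus gender in the zero part.

```python
import math

# Values copied from the Model Fit Summary.
log_likelihood = -455.54507
n_obs = 552

# Assumed parameter count: 5 in the count model + 2 in the zero model.
n_params = 7

aic = -2 * log_likelihood + 2 * n_params
sbc = -2 * log_likelihood + n_params * math.log(n_obs)

print(f"AIC = {aic:.5f}")
print(f"SBC = {sbc:.5f}")
```

Both values reproduce the reported 925.09014 and 955.28498 to the printed precision, which supports the parameter count of 7.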

Parameter Estimates

Parameter       DF   Estimate    Standard Error   t Value   Approx Pr > |t|
Intercept       1    2.207487    0.102312         21.58     <.0001
AGE             1    0.018109    0.002069         8.75      <.0001
gender          1    -0.06173    0.0641           -0.96     0.3356
married         1    0.202634    0.083279         2.43      0.015
divorce         1    0.132984    0.125869         1.06      0.2907
Inf_Intercept   1    1.803058    0.162704         11.08     <.0001
Inf_gender      1    0.775374    0.299601         2.59      0.0097

ZERO INFLATED (LOGISTIC) REGRESSION PART (Inf_ parameters)
Being male increases the odds of belonging to the always-zero group, that is, of not having the opportunity of experiencing a panic attack, by 117% (exp(0.775) ≈ 2.17), and this is statistically significant (p = .0097).

POISSON (COUNT) REGRESSION PART
Among those who are at risk of a panic attack, being one year older increases the expected number of panic attacks by 1.8% (exp(0.018) ≈ 1.018), holding all other variables constant, and this is statistically significant (p < .0001).
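The percentages in the interpretations come from exponentiating the estimates. A short Python sketch of that arithmetic follows; the variable names are hypothetical, and gender = 1 is taken to mean male, following the slide's interpretation. It also converts the two inflation coefficients into predicted probabilities of belonging to the always-zero group.

```python
import math

# Estimates copied from the Parameter Estimates table.
inf_intercept = 1.803058
inf_gender = 0.775374
age_coef = 0.018109


def logistic(eta: float) -> float:
    """Inverse logit: exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


# Odds ratio for the always-zero group and rate ratio per year of age.
odds_ratio_gender = math.exp(inf_gender)
rate_ratio_age = math.exp(age_coef)

# Predicted probability of the always-zero group by gender
# (gender = 1 assumed to be male, as in the slide's interpretation).
psi_gender0 = logistic(inf_intercept)
psi_gender1 = logistic(inf_intercept + inf_gender)

print(f"odds ratio (gender)  = {odds_ratio_gender:.3f}")
print(f"rate ratio (age)     = {rate_ratio_age:.4f}")
print(f"psi when gender = 0  = {psi_gender0:.3f}")
print(f"psi when gender = 1  = {psi_gender1:.3f}")
```

An odds ratio of about 2.17 corresponds to the 117% increase in odds, and a rate ratio of about 1.018 to the 1.8% increase per year of age.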

SUMMARY

COUNT OUTCOME -> OVERDISPERSION? YES -> EXCESS ZEROS? YES -> ZERO-INFLATED POISSON (proc countreg, /dist=zip)

ZERO INFLATED (BINARY LOGISTIC) REGRESSION PART
    zeromodel y ~ x1 x2;

POISSON (COUNT) REGRESSION PART
    model y = z1 z2;

    /* Entire SAS code for ZIP */
    proc countreg method=qn;
       model y = z1 z2 / dist=zip;
       zeromodel y ~ x1 x2;
    run;
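To see why the summary path ends at a zero-inflated model, one can simulate data from a ZIP process and observe both symptoms at once. This is a rough Python sketch under stated assumptions: the parameter values, the seed and the function names are hypothetical, and the Poisson sampler is Knuth's multiplication method, which is adequate only for small µ.

```python
import math
import random


def sample_poisson(mu: float, rng: random.Random) -> int:
    """Draw one Poisson(mu) count with Knuth's multiplication method."""
    threshold = math.exp(-mu)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def sample_zip(psi: float, mu: float, rng: random.Random) -> int:
    """With probability psi return an always-zero; otherwise a Poisson(mu) count."""
    if rng.random() < psi:
        return 0
    return sample_poisson(mu, rng)


# Hypothetical parameters, for illustration only.
rng = random.Random(2013)
psi, mu, n = 0.6, 3.0, 20000
sample = [sample_zip(psi, mu, rng) for _ in range(n)]

mean = sum(sample) / n
variance = sum((y - mean) ** 2 for y in sample) / (n - 1)
zero_share = sample.count(0) / n
poisson_zero_share = math.exp(-mean)  # zeros implied by a Poisson with the same mean

print(f"mean = {mean:.3f}, variance = {variance:.3f}")
print(f"observed zero share = {zero_share:.3f}, "
      f"Poisson-implied zero share = {poisson_zero_share:.3f}")
```

The simulated counts show variance above the mean and far more zeros than a Poisson distribution with the same mean implies, which are the two conditions (OVERDISPERSION? YES, EXCESS ZEROS? YES) that lead to the zero-inflated Poisson model.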

SUN Y. JEON DEPARTMENT OF SOCIOLOGY UTAH STATE UNIVERSITY s.jeon@aggiemail.usu.edu