ZERO INFLATED POISSON REGRESSION

STAT 6500 FINAL PROJECT: ZERO-INFLATED POISSON REGRESSION
DEC 6TH, 2013
SUN JEON
DEPARTMENT OF SOCIOLOGY, UTAH STATE UNIVERSITY

1. POISSON REGRESSION REVIEW
2. INTRODUCING ZERO-INFLATED POISSON REGRESSION
3. SAS EXAMPLE

REGRESSION BY OUTCOME VARIABLE
CONTINUOUS: Linear Regression
BINARY: Logistic Regression
ORDINAL: Ordinal Logistic Regression
NOMINAL: Nominal Logistic Regression
COUNT: Poisson Regression (proc genmod)

COUNT: POISSON REGRESSION (proc genmod)
POISSON DISTRIBUTION (see #HO 4.2)
[Figure: Poisson probability distributions for µ = 1, 2, 3, 4, 5; x-axis: # of events (0 to 11); y-axis: probability]
µ is the mean of the distribution:
$$\Pr(y \mid \mu) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
As µ increases, the mass of the distribution shifts to the right:
- the probability of a zero count, $e^{-\mu}$, decreases;
- the distribution approaches a normal distribution.
EQUIDISPERSION: µ = Var(y).
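The properties on this slide (the zero probability $e^{-\mu}$ shrinks as µ grows, and the mean equals the variance) can be checked numerically. This is a minimal sketch in plain Python used as a calculator, not SAS; `poisson_pmf` and `summarize` are hypothetical helper names, and truncating the support at y = 59 is an assumption that leaves negligible tail mass for µ ≤ 5.

```python
import math

def poisson_pmf(y: int, mu: float) -> float:
    """Pr(y | mu) = exp(-mu) * mu**y / y!"""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def summarize(mu: float, max_y: int = 60):
    """Return Pr(0), mean, and variance of a Poisson(mu), truncated at max_y - 1."""
    probs = [poisson_pmf(y, mu) for y in range(max_y)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return probs[0], mean, var

for mu in range(1, 6):
    p0, mean, var = summarize(mu)
    print(f"mu={mu}: Pr(0)={p0:.4f}  mean={mean:.4f}  var={var:.4f}")
```

Each row should show Pr(0) falling as µ rises while the mean and variance both stay equal to µ (equidispersion).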

HOWEVER, IN THE REAL WORLD:
- EXCESS ZEROS: more observed zeros than the Poisson distribution predicts
- OVERDISPERSION: variance > mean
[Figure: observed data ("Real Data") vs. Poisson distribution; x-axis: # of events (0 to 8); y-axis: probability]

SAS OUTPUT (MOMENTS)
N                 797           Sum Weights        797
Mean              3.32371393    Sum Observations   2649
Std Deviation     2.57658616    Variance           6.63879624
Skewness          0.20107733    Kurtosis           -1.1940515
Uncorrected SS    14089         Corrected SS       5284.48181
Coeff Variation   77.5212975    Std Error Mean     0.09126736

The variance (6.64) is about twice the mean (3.32): the data are overdispersed.

DATA: Böhning, Dankmar, et al. "The zero inflated Poisson model and the decayed, missing and filled teeth index in dental epidemiology." Journal of the Royal Statistical Society: Series A (Statistics in Society) 162.2 (1999): 195-209.
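The overdispersion is visible from the moments alone. A minimal Python sketch, with every input copied from the SAS output above: it checks that the reported variance is the corrected SS divided by n - 1, computes the variance-to-mean ratio (about 1 under a Poisson model), and computes how many zeros a Poisson distribution with this mean would predict. The observed zero count for these data is not reported here, so that number is only something to compare against.

```python
import math

# Moments copied from the SAS output above (dental data, N = 797).
n = 797
mean = 3.32371393
variance = 6.63879624
corrected_ss = 5284.48181

# The reported variance should equal the corrected SS divided by n - 1.
variance_check = corrected_ss / (n - 1)

# Under a Poisson model mean == variance, so this ratio should be close to 1.
dispersion_ratio = variance / mean

# Number of zeros a Poisson distribution with the same mean would predict.
expected_zeros = n * math.exp(-mean)

print(f"variance check:          {variance_check:.4f}")
print(f"variance / mean:         {dispersion_ratio:.3f}")
print(f"Poisson-predicted zeros: {expected_zeros:.1f} of {n}")
```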

WE NEED MORE THAN SIMPLE POISSON REGRESSION. ONE MODEL TO CONSIDER IS ZERO-INFLATED POISSON REGRESSION.
CHOOSING A MODEL FOR A COUNT OUTCOME (Lavery, 2010, An Animated Guide: An Introduction To Poisson Regression):
1. TRY POISSON REGRESSION. OVERDISPERSION? NO, AND FIT OK → DONE.
2. OVERDISPERSION? YES → EXCESS ZEROS? NO → NEGATIVE BINOMIAL.
3. OVERDISPERSION? YES → EXCESS ZEROS? YES → ZERO-INFLATED MODEL: ZERO-INFLATED POISSON; FIT OK? YES → DONE; OTHERWISE → ZERO-INFLATED NEGATIVE BINOMIAL.
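The decision flow (Lavery, 2010) can be written as a small helper. This is a sketch under stated assumptions: `dispersion_ratio`, `zero_excess`, and `suggest_model` are hypothetical names; the cut-offs (ratio > 1, observed zero share above the Poisson-predicted share) are crude rules of thumb, not formal tests; and "fit OK" is left as a judgment the analyst passes in.

```python
import math

def dispersion_ratio(counts):
    """Sample variance / sample mean (about 1 for Poisson data)."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean

def zero_excess(counts):
    """Observed share of zeros minus the share exp(-mean) a Poisson predicts."""
    n = len(counts)
    mean = sum(counts) / n
    return counts.count(0) / n - math.exp(-mean)

def suggest_model(overdispersed, excess_zeros, zip_fit_ok=True):
    """One reading of the count-outcome flowchart."""
    if not overdispersed:
        return "Poisson"
    if not excess_zeros:
        return "negative binomial"
    if zip_fit_ok:
        return "zero-inflated Poisson"
    return "zero-inflated negative binomial"

# Hypothetical toy counts, not data from this project.
toy = [0, 0, 0, 0, 0, 0, 0, 0, 3, 5]
choice = suggest_model(
    overdispersed=dispersion_ratio(toy) > 1,   # assumed crude cut-off
    excess_zeros=zero_excess(toy) > 0,         # assumed crude cut-off
)
print(choice)
```

In practice the overdispersion and fit questions are answered with model diagnostics rather than fixed cut-offs; the helper only makes the branching explicit.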

ZERO-INFLATED POISSON REGRESSION
[Figure: observed data ("Real Data") vs. Poisson distribution; x-axis: # of events (0 to 8); y-axis: probability]
A ZIP model combines two regressions:

LOGISTIC REGRESSION (ZERO-INFLATION PART)
BINARY OUTCOME VARIABLE: y = 1 if the observation is an excess zero (the ALWAYS-ZERO GROUP), y = 0 otherwise
PREDICTOR VARIABLES: x1, x2, x3
$$\Pr(y_i = 1 \mid x_i) = \frac{\exp(x_i^{T}\beta)}{1 + \exp(x_i^{T}\beta)} = \psi_i, \qquad \Pr(y_i = 0 \mid x_i) = 1 - \psi_i$$

POISSON REGRESSION (COUNT PART)
COUNT OUTCOME VARIABLE: y = 0, 1, 2, ..., k (the Poisson process can also produce zeros)
PREDICTOR VARIABLES: z1, z2, z3
$$\Pr(y_i \mid z_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}, \qquad \log \mu_i = z_i^{T}\gamma$$
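The two parts can be read as a data-generating story: first a logistic "coin" decides whether an observation is in the always-zero group (probability $\psi_i$), and only otherwise is a Poisson count drawn. A minimal simulation sketch in Python, assuming hypothetical linear-predictor values ($x_i^T\beta = 0$, so $\psi = 0.5$; $z_i^T\gamma = \log 2$, so $\mu = 2$, where $\gamma$ is simply a label for the count coefficients); the Poisson draws use Knuth's multiplication method, which is fine for small µ.

```python
import math
import random

def inv_logit(eta):
    """psi = exp(eta) / (1 + exp(eta)): the logistic (zero-inflation) part."""
    return 1.0 / (1.0 + math.exp(-eta))

def draw_poisson(mu, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small mu)."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def draw_zip(psi, mu, rng):
    """Always-zero with probability psi; otherwise a Poisson(mu) count."""
    if rng.random() < psi:
        return 0
    return draw_poisson(mu, rng)

rng = random.Random(6500)        # fixed seed so the run is reproducible
psi = inv_logit(0.0)             # hypothetical x_i^T beta = 0, so psi = 0.5
mu = math.exp(math.log(2.0))     # hypothetical z_i^T gamma = log 2, so mu = 2
draws = [draw_zip(psi, mu, rng) for _ in range(100_000)]

zero_share = draws.count(0) / len(draws)
print(f"simulated zero share:    {zero_share:.4f}")
print(f"psi + (1 - psi) e^(-mu): {psi + (1 - psi) * math.exp(-mu):.4f}")
```

The simulated share of zeros should land close to $\psi + (1-\psi)e^{-\mu}$, because zeros come from both groups.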

ZERO-INFLATED POISSON REGRESSION: PROBABILITIES
ZEROS (from both groups):
$$\Pr(y_i = 0 \mid x_i, z_i) = \Pr(\text{inf. zero})\,\Pr(0 \mid \text{inf. zero}) + \Pr(\text{not inf. zero})\,\Pr(0 \mid \text{not inf. zero}) = \psi_i \cdot 1 + (1 - \psi_i)\,e^{-\mu_i} = \psi_i + (1 - \psi_i)\,e^{-\mu_i}$$
NON-ZERO COUNTS ($y_i = 1, 2, \ldots$, only from the not-inflated group):
$$\Pr(y_i \mid x_i, z_i) = \Pr(\text{not inf. zero})\,\Pr(y_i \mid \text{not inf. zero}) = (1 - \psi_i)\,\frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}$$
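These two formulas define a proper distribution, and they also explain why excess zeros and overdispersion tend to appear together: a ZIP count has mean $(1-\psi)\mu$ and variance $(1-\psi)\mu(1+\psi\mu)$, so the variance-to-mean ratio is $1 + \psi\mu > 1$ whenever $\psi > 0$. A minimal Python sketch that checks this, assuming hypothetical values ψ = 0.3 and µ = 2 and truncating the support at y = 99; `zip_pmf` is a hypothetical helper name.

```python
import math

def zip_pmf(y, psi, mu):
    """ZIP probability of count y, given always-zero probability psi and Poisson mean mu > 0."""
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return psi + (1.0 - psi) * poisson   # always-zero group + Poisson zeros
    return (1.0 - psi) * poisson             # positive counts: Poisson group only

psi, mu = 0.3, 2.0                           # hypothetical values
probs = [zip_pmf(y, psi, mu) for y in range(100)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(f"total={sum(probs):.6f}  mean={mean:.4f}  var={var:.4f}")
```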

SAS EXAMPLE
DATA: National Latino and Asian American Study (NLAAS), 2003
SUBJECT OF ANALYSIS: Latino and Asian immigrants in the U.S. who have lived in the U.S. less than a year (N = 552)
OUTCOME VARIABLE: How many times have you experienced a panic attack in your life?
PREDICTOR VARIABLES:
- LOGISTIC REGRESSION (ZERO-INFLATION MEMBERSHIP): gender
- POISSON REGRESSION: age, gender, marital status (married, divorced; single is the reference category)

OVERDISPERSION: variance > mean (for panic, 7.77² ≈ 60.4 vs. 2.39)
Variable   N     Mean         Std Dev      Minimum   Maximum
panic      552   2.3949275    7.7723865    0         55.00
AGE        552   34.1213768   11.9979481   18.00     86.00
gender     552   0.4365942    0.4964133    0         1.00
single     552   0.2083333    0.4064848    0         1.00
married    552   0.7228261    0.4480091    0         1.00
divorce    552   0.0688406    0.2534125    0         1.00

EXCESS ZEROS: more zeros than a Poisson distribution predicts (491 of 552, 88.95%, report no panic attacks)
panic   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0       491         88.95     491                    88.95
4       2           0.36      493                    89.31
5       1           0.18      494                    89.49
6       1           0.54      499                    90.40
...
55      1           0.18      552                    100.00
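How strong are the two problems in this sample? A minimal Python sketch using only the numbers in the output above: the variance-to-mean ratio for panic, and the number of zeros a Poisson distribution with the same mean (2.39) would predict, compared with the 491 zeros actually observed.

```python
import math

# Summary statistics for panic, copied from the SAS output above (N = 552).
n = 552
mean = 2.3949275
std_dev = 7.7723865
observed_zeros = 491

variance = std_dev ** 2
dispersion_ratio = variance / mean      # about 1 under a Poisson model

# Zeros a Poisson distribution with the same mean would predict: n * exp(-mean).
expected_zeros = n * math.exp(-mean)

print(f"variance / mean = {dispersion_ratio:.1f}")
print(f"observed zeros = {observed_zeros}, Poisson-expected zeros = {expected_zeros:.1f}")
```

The variance is roughly 25 times the mean, and a Poisson model would expect only about 50 zeros instead of 491, so both conditions for a zero-inflated model hold.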

proc countreg data=work.nlaas2 method=qn;
   model panic = age gender married divorce / dist=zip;
   zeromodel panic ~ gender;
run;

THE COUNTREG PROCEDURE: MODEL FIT SUMMARY
Dependent Variable          panic
Number of Observations      552
Data Set                    WORK.NLAAS2
Model                       ZIP
ZI Link Function            Logistic
Log Likelihood              -455.54507
Maximum Absolute Gradient   0.0002048
Number of Iterations        18
Optimization Method         Quasi-Newton
AIC                         925.09014
SBC                         955.28498
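As an arithmetic check on the fit summary, AIC and SBC follow from the log likelihood with k = 7 estimated parameters (intercept plus four regressors in the count part, intercept plus gender in the zero-inflation part) and n = 552, using AIC = -2 log L + 2k and SBC = -2 log L + k ln n. A Python sketch with the inputs copied from the output:

```python
import math

log_likelihood = -455.54507
n_obs = 552
# Count part: Intercept, AGE, gender, married, divorce (5)
# Zero-inflation part: Inf_Intercept, Inf_gender (2)
k = 7

aic = -2 * log_likelihood + 2 * k
sbc = -2 * log_likelihood + k * math.log(n_obs)

print(f"AIC = {aic:.5f}")
print(f"SBC = {sbc:.5f}")
```

Both values should match the SAS output to the printed precision; lower values indicate a better trade-off between fit and model size when comparing, for example, a Poisson model against this ZIP model on the same data.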

PARAMETER ESTIMATES
Parameter       DF   Estimate    Standard Error   t Value   Approx Pr > |t|
Intercept       1    2.207487    0.102312         21.58     <.0001
AGE             1    0.018109    0.002069          8.75     <.0001
gender          1   -0.06173     0.0641           -0.96     0.3356
married         1    0.202634    0.083279          2.43     0.015
divorce         1    0.132984    0.125869          1.06     0.2907
Inf_Intercept   1    1.803058    0.162704         11.08     <.0001
Inf_gender      1    0.775374    0.299601          2.59     0.0097

ZERO-INFLATED (LOGISTIC) REGRESSION PART (Inf_ parameters)
Being male increases the odds of being in the always-zero group (having no opportunity to experience a panic attack) by 117% (exp(0.775) = 2.17), and this is statistically significant (p = .0097).

POISSON (COUNT) REGRESSION PART
Among those at risk of panic attacks, each additional year of age increases the expected number of panic attacks by 1.8% (exp(0.018) = 1.018), holding all other variables constant, and this is statistically significant (p < .0001).
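The interpretations above come from exponentiating coefficients, and the Inf_ estimates can also be turned into predicted always-zero probabilities. A minimal Python sketch, assuming gender is coded 1 = male and 0 = female (inferred from the interpretation on the slide, not stated in the data description); the estimates are copied from the parameter table.

```python
import math

def inv_logit(eta):
    """Logistic function: linear predictor -> probability psi."""
    return 1.0 / (1.0 + math.exp(-eta))

# Estimates copied from the parameter table above.
inf_intercept = 1.803058
inf_gender = 0.775374
age_coef = 0.018109

odds_ratio_male = math.exp(inf_gender)   # multiplies the odds of always-zero membership
rate_ratio_age = math.exp(age_coef)      # multiplies the expected count per extra year

psi_female = inv_logit(inf_intercept)               # assumed gender = 0
psi_male = inv_logit(inf_intercept + inf_gender)    # assumed gender = 1

print(f"odds ratio (male vs. female): {odds_ratio_male:.3f}")
print(f"rate ratio per year of age:   {rate_ratio_age:.4f}")
print(f"Pr(always zero): female {psi_female:.3f}, male {psi_male:.3f}")
```

The ratio of the two predicted odds, psi/(1 - psi), reproduces exp(Inf_gender) exactly, which is what "increases the odds by 117%" means.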

SUMMARY
COUNT OUTCOME → OVERDISPERSION? YES → EXCESS ZEROS? YES → ZERO-INFLATED POISSON: proc countreg with / dist=zip
ZERO-INFLATED (BINARY LOGISTIC) REGRESSION PART: zeromodel y ~ x1 x2;
POISSON (COUNT) REGRESSION PART: model y = z1 z2;

/* Entire SAS code for ZIP */
proc countreg method=qn;
   model y = z1 z2 / dist=zip;
   zeromodel y ~ x1 x2;
run;
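Under the hood, proc countreg with dist=zip estimates both parts at once by maximum likelihood (with method=qn, using a quasi-Newton optimizer). A minimal Python sketch of the objective being maximized, combining the zero and non-zero probabilities of the ZIP model; `zip_loglik`, `beta` (zero-part coefficients on x), and `gamma` (count-part coefficients on z) are hypothetical names, each row of X and Z is assumed to start with 1.0 for the intercept, and the sketch only evaluates the likelihood rather than reproducing the optimizer.

```python
import math

def inv_logit(eta):
    """Logistic function: zero-part linear predictor -> psi."""
    return 1.0 / (1.0 + math.exp(-eta))

def zip_loglik(beta, gamma, y, X, Z):
    """ZIP log-likelihood for counts y, zero-part rows X, count-part rows Z."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        psi = inv_logit(sum(b * x for b, x in zip(beta, xi)))
        mu = math.exp(sum(g * z for g, z in zip(gamma, zi)))   # log link
        if yi == 0:
            # A zero can come from the always-zero group or from the Poisson part.
            total += math.log(psi + (1.0 - psi) * math.exp(-mu))
        else:
            # A positive count requires the Poisson group, then Poisson prob. of yi.
            total += (math.log(1.0 - psi) - mu + yi * math.log(mu)
                      - math.lgamma(yi + 1))
    return total

# Hypothetical toy data: intercept plus one regressor in each part.
y = [0, 0, 2, 1]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Z = [[1.0, 0.5], [1.0, 1.5], [1.0, 0.5], [1.0, 1.5]]
print(zip_loglik([0.2, 0.5], [0.1, 0.3], y, X, Z))
```

Maximizing this function over beta and gamma gives the maximum likelihood estimates; the maximized value is what the fit summary reports as the log likelihood.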

SUN Y. JEON
DEPARTMENT OF SOCIOLOGY, UTAH STATE UNIVERSITY
s.jeon@aggiemail.usu.edu