The Poisson Regression Model

The Poisson regression model aims at modeling a counting variable Y, counting the number of times that a certain event occurs during a given time period. We observe a sample Y_1, ..., Y_n. Here, Y_i can stand for the number of car accidents that person i has had during the last 5 years; the number of children of family i; the number of strikes in company i over the last 3 years; the number of patents filed by firm i during the last year (as a measure of innovation); ... The Poisson regression model wants to explain this counting variable Y_i using explicative variables x_i, for 1 ≤ i ≤ n. This p-dimensional variable x_i contains the characteristics of the i-th observation.

1 The Poisson distribution

By definition, Y follows a Poisson distribution with parameter λ if and only if, for k = 0, 1, 2, ...,

    P(Y = k) = exp(-λ) λ^k / k!.   (1)

We recall that for a Poisson variable

    E[Y] = λ   and   Var[Y] = λ.   (2)

The Poisson distribution is a discrete distribution, and we see the shape of its distribution in Figure 1, for several values of λ. In Figure 1, the distribution is visualized by plotting P(Y = k) versus k. For low values of λ, the distribution is highly skewed. For large values of λ, the distribution of Y looks more normal. In the examples given above, Y_i counts a rather rare event, so that the value of λ will be rather small. For example, we have high probabilities of having no or one car accident, but the probabilities of having several car accidents decay exponentially fast. The Poisson distribution is the simplest distribution for modeling count data, but it is not the only one.

2 The Poisson regression model

As in a linear regression model, we will model the conditional mean function using a linear combination β^t x_i of the explicative variables:

    E[Y_i | x_i] = exp(β^t x_i).   (3)

The use of the exponential function in (3) assures that the right-hand side of the above equation is always positive, as is the expected value of the counting variable Y_i on the left-hand side. The choice of this exponential link function is mainly made for reasons of simplicity. In principle, other link functions returning only positive values could be used, but then we no longer speak of a Poisson regression model.
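As a small numerical illustration (the coefficient and covariate values below are made up, not taken from the text), the exponential link can be evaluated directly; whatever the sign of the linear combination β^t x_i, the resulting conditional mean is strictly positive:

```python
import math

def poisson_mean(beta, x):
    """Conditional mean E[Y_i | x_i] = exp(beta' x_i) under the exponential link."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))

# Hypothetical parameter vector (the last component multiplies the constant term).
beta = [0.4, -1.2, 0.1]
for x in ([1.0, 0.5, 1.0], [2.0, 3.0, 1.0], [-5.0, 4.0, 1.0]):
    lam = poisson_mean(beta, x)
    print(x, "->", lam)  # strictly positive, even when beta' x_i is negative
```

For each of these covariate vectors the linear predictor β^t x_i is negative, yet the exponential maps it into (0, ∞), which is what a conditional mean of a count variable requires.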
Figure 1: The Poisson distribution for different values of λ (panels for λ = 0.5, 1, 3, and 10).
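The panels of Figure 1 can be reproduced numerically from (1). The sketch below (plain Python, no plotting) evaluates the probability mass function for the four values of λ used in the figure, showing how the mass sits at small k for small λ and spreads out more symmetrically as λ grows:

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) = exp(-lam) * lam**k / k!  -- equation (1)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

for lam in (0.5, 1.0, 3.0, 10.0):
    probs = [poisson_pmf(k, lam) for k in range(25)]
    mode = max(range(25), key=lambda k: probs[k])
    print(f"lambda={lam}: mode at k={mode}, P(Y=0)={probs[0]:.3f}")
```

For λ = 0.5 almost all mass is at k = 0 or 1 (the "rare event" case discussed above), while for λ = 10 the histogram of probabilities already looks close to a normal curve centered near λ.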
Moreover, to be able to use the Maximum Likelihood framework, we will specify a distribution for Y_i, given the explicative variables x_i. We ask that every Y_i, conditional on x_i, follows a Poisson distribution with parameter λ_i. Equations (2) and (3) give

    E[Y_i | x_i] = λ_i = exp(β^t x_i).

The aim is then to estimate β, the unknown parameter in the model. Note that estimation of β induces an estimate of the whole conditional distribution of Y_i given x_i. This will allow us to estimate quantities like P(Y_i = 0 | x_i), P(Y_i > 5 | x_i), ... So we will be able to answer questions like: What is the probability that somebody will have not a single car accident during a 5-year period, given the person's characteristics x_i? What is the probability that a family, given its characteristics x_i, has more than 5 children? ...

Interpretation of the parameters: Knowledge of β allows us to know the influence of an explicative variable on the expected value of Y_i. Suppose for example that we have x_i = (x_i1, x_i2, 1)^t. Then the Poisson regression model gives

    E[Y_i | x_i] = exp(β_1 x_i1 + β_2 x_i2 + β_3).

The marginal effect of the first explicative variable on the expected value of Y_i, keeping the other variables constant, is given by

    ∂E[Y_i | x_i] / ∂x_i1 = β_1 exp(β_1 x_i1 + β_2 x_i2 + β_3).

We see that β_1 has the same sign as this marginal effect, but the numerical value of the effect depends on the value of x_i. We could summarize the marginal effects by replacing x_i1 and x_i2 in the above equation by the average values of the explicative variables over the whole sample. It is also possible to interpret β_1 as a semi-elasticity:

    ∂ log E[Y_i | x_i] / ∂x_i1 = β_1.

3 The Maximum Likelihood estimator

We observe data {(x_i, y_i) | 1 ≤ i ≤ n}. The number y_i is a realization of the random variable Y_i. Using independence, the total log-likelihood is given by

    Log L(y_1, ..., y_n | β, x_1, ..., x_n) = Σ_{i=1}^n log P(Y_i = y_i | β, x_i),

with, according to (1),

    P(Y_i = y_i | β, x_i) = exp(-λ_i) λ_i^{y_i} / y_i!   (4)

and λ_i = exp(β^t x_i). Write now Log L(β) as shorthand notation for the total log-likelihood. Then it follows that

    Log L(β) = Σ_{i=1}^n { -exp(β^t x_i) + y_i (β^t x_i) - log(y_i!) }.   (5)

The maximum likelihood (ML) estimator is then of course defined as

    β̂_ML = argmax_β Log L(β).

It is instructive to compute the first-order condition that the ML estimator needs to fulfill. Differentiating (5) yields

    Σ_{i=1}^n (y_i - ŷ_i) x_i = 0,

with ŷ_i = exp(β̂_ML^t x_i) the fitted value of y_i. The predicted/fitted value has as usual been taken as the estimated value of E[Y_i | x_i]. This first-order condition tells us that the vector of residuals is orthogonal to the vectors of explicative variables. The advantage of the Maximum Likelihood framework is that a formula for cov(β̂_ML) is readily available:

    cov(β̂_ML) = ( Σ_{i=1}^n x_i x_i^t ŷ_i )^{-1}.

Also, hypothesis tests can now be carried out by the Wald test, the Lagrange Multiplier test, or the Likelihood Ratio test.

4 Overdispersion and the Negative Binomial model

If we believe the Poisson regression model, then we have E[Y_i | x_i] = Var[Y_i | x_i], implying that the conditional mean function equals the conditional variance function. This is very restrictive. If E[Y_i | x_i] < Var[Y_i | x_i], respectively E[Y_i | x_i] > Var[Y_i | x_i], then we speak of overdispersion, respectively underdispersion. The Poisson model does not allow for over- or underdispersion. A richer model is obtained by using the negative binomial distribution instead of the Poisson distribution. Instead of (4), we then use

    P(Y_i = y_i | β, x_i) = [Γ(θ + y_i) / (Γ(y_i + 1) Γ(θ))] (λ_i / (λ_i + θ))^{y_i} (1 - λ_i / (λ_i + θ))^θ.

This negative binomial distribution can be shown to have conditional mean λ_i and conditional variance λ_i (1 + η² λ_i), with η² := 1/θ. Note that the parameter η² is not allowed to vary over the observations. As before, the conditional mean function is modeled as

    E[Y_i | x_i] = λ_i = exp(β^t x_i).

The conditional variance function is then given by

    Var[Y_i | x_i] = exp(β^t x_i) (1 + η² exp(β^t x_i)).

Using maximum likelihood, we can then estimate the regression parameter β, and also the extra parameter η. The parameter η measures the degree of over- (or under-) dispersion. The limit case η = 0 corresponds to the Poisson model.

Appendix: The Gamma function

The Gamma function is defined as

    Γ(x) = ∫_0^∞ s^{x-1} exp(-s) ds   for every x > 0.

Its most important properties are
1. Γ(k + 1) = k! for every k = 0, 1, 2, 3, ...
2. Γ(x + 1) = x Γ(x) for every x > 0.
3. Γ(0.5) = √π.
The Gamma function can be seen as an extension of the factorial function k! = k(k - 1)(k - 2) ... 2 · 1 to all real positive numbers. The Gamma function increases to infinity faster than any polynomial function or even any exponential function.

5 Homework

We are interested in the number of accidents per service month for a sample of ships. The data can be found in the file ships.wmf. The endogenous variable is called ACC. The explicative variables are:
TYPE: there are 5 ship types, labeled as A-B-C-D-E or 1-2-3-4-5. TYPE is a categorical variable, and 5 dummy variables can be created: TA, TB, TC, TD, TE.
CONSTRUCTION YEAR: the ships were constructed in one of four periods, leading to the dummy variables T6064, T6569, T7074, and T7579.
SERVICE: a measure of the amount of service that the ship has already carried out.
Questions:
1. Make a histogram of the variable ACC. Comment on its form. Is this the histogram of the conditional or the unconditional distribution of ACC?
2. Estimate the Poisson regression model, including all explicative variables and a constant term. (Use estimation method: COUNT - integer count data.)
3. Comment on the coefficient of the variable SERVICE. Is it significant?
4. Perform a Wald test to test for the joint significance of the construction year dummy variables.
5. Given a ship of category A, constructed in the period 65-69, with SERVICE = 1000. Predict the number of accidents per service month. Also estimate (a) the probability that no accident will occur for this ship, and (b) the probability that at most one accident will occur.
6. The computer output mentions: "Convergence achieved after 9 iterations." What does this mean?
7. What do we learn from the value of Probability(LR stat)? What is the corresponding null hypothesis?
8. Now estimate a Negative Binomial model. EViews reports log(η²) as the mixture parameter in the estimation output. (a) Compare the estimates of β given by the two models. (b) Compare the pseudo-R² values of the two models.
9. Now estimate the Poisson model with only a constant term, so without explicative variables (the empty model). Derive mathematically a formula for the estimate of the constant term (in the empty model), using the first-order condition of the ML estimator.
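As a complement to Section 3 (and not a substitute for the derivation asked in Question 9), the first-order condition Σ_i (y_i − ŷ_i) x_i = 0 can be solved numerically by Newton-Raphson. The sketch below uses a made-up data set with one regressor and a constant; it is an illustrative implementation under those assumptions, not the routine used by EViews:

```python
import math

def fit_poisson_ml(x, y, iters=30):
    """Newton-Raphson sketch for Poisson regression with one regressor and a
    constant: E[Y_i | x_i] = exp(b1 * x_i + b0).  Each step solves the 2x2
    system H * step = score, where the score comes from the first-order
    condition sum_i (y_i - mu_i) * (x_i, 1) = 0 and H = sum_i mu_i (x_i,1)(x_i,1)'."""
    b1, b0 = 0.0, math.log(sum(y) / len(y))  # a common starting value
    for _ in range(iters):
        s1 = s0 = h11 = h10 = h00 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b1 * xi + b0)  # fitted value lambda_i
            s1 += (yi - mu) * xi         # score w.r.t. b1
            s0 += yi - mu                # score w.r.t. b0
            h11 += mu * xi * xi          # Hessian entries
            h10 += mu * xi
            h00 += mu
        det = h11 * h00 - h10 * h10
        b1 += (h00 * s1 - h10 * s0) / det
        b0 += (h11 * s0 - h10 * s1) / det
    return b1, b0

# Made-up count data, roughly exponential in x (for illustration only).
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [1, 1, 2, 3, 4, 7, 11]
b1, b0 = fit_poisson_ml(x, y)
fitted = [math.exp(b1 * xi + b0) for xi in x]
# At the optimum, the residuals are orthogonal to the regressors (Section 3):
print(sum((yi - mi) * xi for yi, mi, xi in zip(y, fitted, x)))  # ~ 0
print(sum(yi - mi for yi, mi in zip(y, fitted)))                # ~ 0
```

Because the log-likelihood (5) is concave in β, Newton-Raphson converges quickly here; the two printed sums verify numerically that the fitted residuals satisfy the first-order condition.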