The Poisson Regression Model


The Poisson regression model aims at modeling a counting variable Y, which counts the number of times that a certain event occurs during a given time period. We observe a sample Y_1, ..., Y_n. Here, Y_i can stand for: the number of car accidents that person i has had during the last 5 years; the number of children of family i; the number of strikes in company i over the last 3 years; the number of patents filed by firm i during the last year (as a measure of innovation); ... The Poisson regression model wants to explain this counting variable Y_i using explanatory variables x_i, for 1 <= i <= n. This p-dimensional variable x_i contains the characteristics of the i-th observation.

1 The Poisson distribution

By definition, Y follows a Poisson distribution with parameter λ if and only if, for k = 0, 1, 2, ...,

    P(Y = k) = exp(-λ) λ^k / k!.   (1)

We recall that for a Poisson variable

    E[Y] = λ  and  Var[Y] = λ.   (2)

The Poisson distribution is a discrete distribution; Figure 1 shows its shape for several values of λ, obtained by plotting P(Y = k) against k. For low values of λ, the distribution is highly skewed. For large values of λ, the distribution of Y looks more normal. In the examples given above, Y_i counts a rather rare event, so the value of λ will be rather small. For example, we have a high probability of having no or one car accident, but the probabilities of having several car accidents decay exponentially fast. The Poisson distribution is the simplest distribution for modeling count data, but it is not the only one.

2 The Poisson regression model

As in a linear regression model, we model the conditional mean function using a linear combination β^t x_i of the explanatory variables:

    E[Y_i | x_i] = exp(β^t x_i).   (3)

The use of the exponential function in (3) ensures that the right-hand side of the equation is always positive, as is the expected value of the counting variable Y_i on the left-hand side. The choice of this exponential link function is mainly for reasons of simplicity. In principle, other link functions returning only positive values could be used, but then we no longer speak of a Poisson regression model.
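As a quick numerical check of (1) and (2), the small sketch below evaluates the Poisson probabilities and verifies that mean and variance both equal λ (the value λ = 3 is an arbitrary choice for illustration):

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) = exp(-lam) * lam**k / k!  -- equation (1)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 3.0
ks = range(100)  # truncation: the tail mass beyond k = 100 is negligible for lam = 3
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(mean, var)  # both approximately equal lam, as stated in equation (2)
```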

[Figure 1: The Poisson distribution for different values of λ. Each panel plots P(Y = k) against k, for λ = 0.5, 1, 3 and 10.]
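The pattern in Figure 1 can be quantified: the skewness of a Poisson distribution is 1/sqrt(λ), so it shrinks toward zero (a symmetric, normal-looking shape) as λ grows. A sketch computing the skewness directly from the probabilities in (1):

```python
import math

def poisson_skewness(lam, kmax=150):
    """Skewness of Poisson(lam), computed by summing the pmf; theory gives 1/sqrt(lam)."""
    # pmf evaluated on the log scale to avoid overflow in lam**k / k!
    pmf = [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(kmax)]
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    m3 = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    return m3 / var ** 1.5

# Skewness falls as lambda grows, matching the shapes in Figure 1.
print(poisson_skewness(0.5), poisson_skewness(10.0))
```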

Moreover, to be able to use the maximum likelihood framework, we specify a distribution for Y_i given the explanatory variables x_i. We assume that every Y_i, conditional on x_i, follows a Poisson distribution with parameter λ_i. Equations (2) and (3) give

    E[Y_i | x_i] = λ_i = exp(β^t x_i).

The aim is then to estimate β, the unknown parameter in the model. Note that an estimate of β induces an estimate of the whole conditional distribution of Y_i given x_i. This will allow us to estimate quantities like P(Y_i = 0 | x_i), P(Y_i > 5 | x_i), ... So we will be able to answer questions like "What is the probability that somebody will have no car accident during a 5-year period, given the person's characteristics x_i?", or "What is the probability that a family, given its characteristics x_i, has more than 5 children?", ...

Interpretation of the parameters: Knowledge of β allows us to quantify the influence of an explanatory variable on the expected value of Y_i. Suppose for example that we have x_i = (x_i1, x_i2, 1)^t. Then the Poisson regression model gives

    E[Y_i | x_i] = exp(β_1 x_i1 + β_2 x_i2 + β_3).

The marginal effect of the first explanatory variable on the expected value of Y_i, keeping the other variables constant, is given by

    ∂E[Y_i | x_i] / ∂x_i1 = β_1 exp(β_1 x_i1 + β_2 x_i2 + β_3).

We see that β_1 has the same sign as this marginal effect, but the numerical value of the effect depends on the value of x_i. We could summarize the marginal effects by replacing x_i1 and x_i2 in the above equation by the averages of the explanatory variables over the whole sample. It is also possible to interpret β_1 as a semi-elasticity:

    ∂ log E[Y_i | x_i] / ∂x_i1 = β_1.

3 The Maximum Likelihood estimator

We observe data {(x_i, y_i) : 1 <= i <= n}. The number y_i is a realization of the random variable Y_i. Using independence, the total log-likelihood is given by

    Log L(y_1, ..., y_n | β, x_1, ..., x_n) = Σ_{i=1}^n log P(Y_i = y_i | β, x_i),

with, according to (1),

    P(Y_i = y_i | β, x_i) = exp(-λ_i) λ_i^{y_i} / y_i!   (4)

and λ_i = exp(β^t x_i). Write now Log L(β) as shorthand notation for the total log-likelihood. Then it follows that

    Log L(β) = Σ_{i=1}^n { -exp(β^t x_i) + y_i (β^t x_i) - log(y_i!) }.   (5)

The maximum likelihood (ML) estimator is then of course defined as

    β̂_ML = argmax_β Log L(β).

It is instructive to compute the first-order condition that the ML estimator needs to fulfill. Differentiating (5) yields

    Σ_{i=1}^n (y_i - ŷ_i) x_i = 0,

with ŷ_i = exp(β̂_ML^t x_i) the fitted value of y_i. The predicted/fitted value has, as usual, been taken as the estimated value of E[Y_i | x_i]. This first-order condition tells us that the vector of residuals is orthogonal to the vectors of explanatory variables. An advantage of the maximum likelihood framework is that a formula for cov(β̂_ML) is readily available:

    cov(β̂_ML) = ( Σ_{i=1}^n x_i x_i^t ŷ_i )^{-1}.

Also, hypothesis tests can now be carried out by Wald tests, Lagrange multiplier tests, or likelihood ratio tests.

4 Overdispersion and the Negative binomial model

If we believe the Poisson regression model, then we have E[Y_i | x_i] = Var[Y_i | x_i], implying that the conditional mean function equals the conditional variance function. This is very restrictive. If E[Y_i | x_i] < Var[Y_i | x_i], respectively E[Y_i | x_i] > Var[Y_i | x_i], then we speak of overdispersion, respectively underdispersion. The Poisson model does not allow for over- or underdispersion.

A richer model is obtained by using the negative binomial distribution instead of the Poisson distribution. Instead of (4), we then use

    P(Y_i = y_i | β, x_i) = [ Γ(θ + y_i) / (Γ(y_i + 1) Γ(θ)) ] (λ_i / (λ_i + θ))^{y_i} (1 - λ_i / (λ_i + θ))^θ.

This negative binomial distribution can be shown to have conditional mean λ_i and conditional variance λ_i (1 + η² λ_i), with η² := 1/θ. Note that the parameter η² is not allowed to vary over the observations. As before, the conditional mean function is modeled as E[Y_i | x_i] = λ_i = exp(β^t x_i). The conditional variance function is then given by

    Var[Y_i | x_i] = exp(β^t x_i) (1 + η² exp(β^t x_i)).

Using maximum likelihood, we can then estimate the regression parameter β, and also the extra parameter η. The parameter η measures the degree of over- (or under-) dispersion. The limit case η = 0 corresponds to the Poisson model.

Appendix: The Gamma function

The Gamma function is defined as

    Γ(x) = ∫_0^∞ s^{x-1} exp(-s) ds

for every x > 0. Its most important properties are

1. Γ(k + 1) = k! for every k = 0, 1, 2, 3, ...
2. Γ(x + 1) = x Γ(x) for every x > 0.
3. Γ(0.5) = √π.

The Gamma function can be seen as an extension of the factorial function k! = k(k-1)(k-2)···1 to all positive real numbers. The Gamma function grows to infinity faster than any polynomial function, or even any exponential function.

5 Homework

We are interested in the number of accidents per service month for a sample of ships. The data can be found in the file ships.wmf. The endogenous variable is called ACC. The explanatory variables are:

TYPE: there are 5 ship types, labeled A-B-C-D-E or 1-2-3-4-5. TYPE is a categorical variable, and 5 dummy variables can be created: TA, TB, TC, TD, TE.

CONSTRUCTION YEAR: the ships were constructed in one of four periods, leading to the dummy variables T6064, T6569, T7074, and T7579.

SERVICE: a measure of the amount of service that the ship has already carried out.

Questions:

1. Make a histogram of the variable ACC. Comment on its form. Is this the histogram of the conditional or the unconditional distribution of ACC?

2. Estimate the Poisson regression model, including all explanatory variables and a constant term. (Use estimation method: COUNT - integer count data.)

3. Comment on the coefficient of the variable SERVICE. Is it significant?

4. Perform a Wald test for the joint significance of the construction year dummy variables.

5. Consider a ship of category A, constructed in the period 65-69, with SERVICE = 1000. Predict the number of accidents per service month. Also estimate (a) the probability that no accident will occur for this ship, and (b) the probability that at most one accident will occur.

6. The computer output mentions: "Convergence achieved after 9 iterations." What does this mean?

7. What do we learn from the value of Probability(LR stat)? What is the corresponding null hypothesis?

8. Now estimate a Negative Binomial model. EViews reports log(η²) as the mixture parameter in the estimation output. (a) Compare the estimates of β given by the two models. (b) Compare the pseudo R² values of the two models.

9. Now estimate the Poisson model with only a constant term, i.e. without explanatory variables (the empty model). Derive mathematically a formula for this estimate of the constant term (in the empty model), using the first-order condition of the ML estimator.
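To make the ML machinery of Section 3 concrete, here is a minimal sketch that maximizes the log-likelihood (5) by Newton-Raphson and then verifies the first-order condition Σ (y_i - ŷ_i) x_i = 0. The tiny dataset is hypothetical (an intercept plus one dummy regressor), not the ships data; the homework itself would be done in EViews, and the hand-rolled Newton step is only meant to expose the gradient and Hessian of (5).

```python
import math

def fit_poisson(X, y, iters=50):
    """Newton-Raphson for the Poisson ML estimator with two parameters.
    Gradient of (5):       sum_i (y_i - yhat_i) x_i
    Negative Hessian:      sum_i yhat_i x_i x_i^t"""
    b = [0.0, 0.0]
    for _ in range(iters):
        g = [0.0, 0.0]
        H = [[0.0, 0.0], [0.0, 0.0]]
        for xi, yi in zip(X, y):
            mu = math.exp(b[0] * xi[0] + b[1] * xi[1])  # fitted value yhat_i
            for j in range(2):
                g[j] += (yi - mu) * xi[j]
                for k in range(2):
                    H[j][k] += mu * xi[j] * xi[k]
        # Newton step b <- b + H^{-1} g  (H is the negative Hessian; 2x2 inverse by hand)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        b[0] += ( H[1][1] * g[0] - H[0][1] * g[1]) / det
        b[1] += (-H[1][0] * g[0] + H[0][0] * g[1]) / det
    return b

# Hypothetical data: column 1 is the constant term, column 2 a dummy variable.
X = [(1.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0)]
y = [1.0, 3.0, 2.0, 6.0]
b = fit_poisson(X, y)

# First-order condition: the residuals are orthogonal to each explanatory variable.
yhat = [math.exp(b[0] * xi[0] + b[1] * xi[1]) for xi in X]
foc = [sum((yi - mi) * xi[j] for xi, yi, mi in zip(X, y, yhat)) for j in range(2)]
print(b, foc)  # foc is approximately [0, 0]
```

Note that the inverse of the accumulated matrix H, evaluated at b, is exactly the covariance estimate cov(β̂_ML) = (Σ x_i x_i^t ŷ_i)^{-1} from Section 3.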