See Book Chapter 11, 2nd Edition (Chapter 10, 1st Edition)


Count Data Models

Count data consist of non-negative integer values. Examples: the number of driver route changes per week, the number of trip departure changes per week, drivers' frequency-of-use of ITS technologies over some time period, and the number of accidents observed on road segments per year.

Count data can be properly modeled by using a number of methods, the most popular of which are Poisson and negative binomial regression models.

Poisson Regression Model

Consider the number of accidents occurring per year at various intersections in a city. In a Poisson regression model, the probability of intersection $i$ having $y_i$ accidents per year (where $y_i$ is a non-negative integer) is given by:

$$P(y_i) = \frac{\mathrm{EXP}(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}$$

Where:
$P(y_i)$ is the probability of intersection $i$ having $y_i$ accidents per year,
$\lambda_i$ is the Poisson parameter for intersection $i$, which is equal to intersection $i$'s expected number of accidents per year, $E[y_i]$.
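The Poisson probability above can be evaluated directly. The following is a minimal sketch using only the Python standard library; the function name `poisson_pmf` and the value $\lambda = 2.5$ are illustrative assumptions, not part of the original slides.

```python
import math

def poisson_pmf(y, lam):
    """P(y) = EXP(-lam) * lam**y / y! for a non-negative integer y."""
    if y < 0:
        return 0.0
    # Work on the log scale so large counts do not overflow the factorial.
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

# Hypothetical intersection with an expected 2.5 accidents per year.
lam = 2.5
for y in range(6):
    print(f"P(y={y}) = {poisson_pmf(y, lam):.4f}")
```

Summing `poisson_pmf(y, lam)` over all non-negative `y` gives one, which is a quick sanity check on the formula.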

Poisson regression models are estimated by specifying the Poisson parameter $\lambda_i$ (the expected number of events per period) as a function of explanatory variables. The most common relationship between explanatory variables and the Poisson parameter is the log-linear model,

$$\lambda_i = \mathrm{EXP}(\beta X_i), \quad \text{or, equivalently,} \quad \mathrm{LN}(\lambda_i) = \beta X_i$$

Where:
$X_i$ is a vector of explanatory variables and $\beta$ is a vector of estimable coefficients.

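In this formulation, the expected number of events per period is given by

$$E[y_i] = \lambda_i = \mathrm{EXP}(\beta X_i)$$

For model estimation, note the likelihood function is:

$$L(\beta) = \prod_i P(y_i)$$

So, with the Poisson equation,

$$L(\beta) = \prod_i \frac{\mathrm{EXP}(-\lambda_i)\,\lambda_i^{y_i}}{y_i!}$$

Since $\lambda_i = \mathrm{EXP}(\beta X_i)$,

$$L(\beta) = \prod_i \frac{\mathrm{EXP}[-\mathrm{EXP}(\beta X_i)]\,[\mathrm{EXP}(\beta X_i)]^{y_i}}{y_i!}$$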

Which gives the log-likelihood,

$$LL(\beta) = \sum_{i=1}^{n} \left[ -\mathrm{EXP}(\beta X_i) + y_i \beta X_i - \mathrm{LN}(y_i!) \right]$$

Poisson Regression Model Goodness of Fit Measures

The likelihood ratio test is a common test used to assess two competing models. It provides evidence in support of one model over the other. The likelihood ratio test statistic is,

$$-2[LL(\beta_R) - LL(\beta_U)]$$

where

$LL(\beta_R)$ is the log-likelihood at convergence of the "restricted" model (sometimes considered to have all coefficients in $\beta$ equal to 0, or just to include the constant term, to test overall fit of the model),
$LL(\beta_U)$ is the log-likelihood at convergence of the unrestricted model.

This statistic is $\chi^2$ distributed with the degrees of freedom equal to the difference in the numbers of coefficients in the restricted and unrestricted models (the difference in the number of coefficients in the $\beta_R$ and the $\beta_U$ coefficient vectors).

Another measure of overall model fit is the $\rho^2$ statistic. The $\rho^2$ statistic is,

$$\rho^2 = 1 - \frac{LL(\beta)}{LL(0)}$$

Where:

$LL(\beta)$ is the log-likelihood at convergence with coefficient vector $\beta$ and
$LL(0)$ is the initial log-likelihood (with all coefficients set to zero).

The perfect model would have a likelihood function equal to one (all selected alternative outcomes would be predicted by the model with probability one, and the product of these across the observations would also be one) and the log-likelihood would be zero, giving a $\rho^2$ of one.

The $\rho^2$ statistic will be between zero and one, and the closer it is to one, the more variance the estimated model is explaining.
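The log-likelihood, the likelihood ratio statistic, and the $\rho^2$ statistic can be illustrated with a small worked example. This is a sketch under stated assumptions: the six observations are made up, the model has only a constant and one covariate, and the Newton-Raphson routine `fit_poisson_2param` is a hypothetical helper written for this two-coefficient case rather than a general estimator.

```python
import math

def poisson_loglik(beta, X, y):
    """LL(beta) = sum_i [-EXP(beta.X_i) + y_i * beta.X_i - LN(y_i!)]."""
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        ll += -math.exp(eta) + yi * eta - math.lgamma(yi + 1)
    return ll

def fit_poisson_2param(X, y, iterations=50):
    """Newton-Raphson maximization for a constant plus one covariate."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the constant-only fit
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for (x0, x1), yi in zip(X, y):
            lam = math.exp(b0 * x0 + b1 * x1)
            g0 += (yi - lam) * x0          # gradient terms
            g1 += (yi - lam) * x1
            h00 += lam * x0 * x0           # negative Hessian terms
            h01 += lam * x0 * x1
            h11 += lam * x1 * x1
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det  # explicit 2x2 inverse
        b1 += (h00 * g1 - h01 * g0) / det
    return [b0, b1]

# Made-up data: X_i = (constant, traffic volume), y_i = accidents per year.
X = [(1.0, 0.5), (1.0, 1.0), (1.0, 1.5), (1.0, 2.0), (1.0, 2.5), (1.0, 3.0)]
y = [0, 1, 1, 3, 2, 5]

beta_u = fit_poisson_2param(X, y)            # unrestricted model
beta_r = [math.log(sum(y) / len(y)), 0.0]    # restricted: constant only
ll_u = poisson_loglik(beta_u, X, y)
ll_r = poisson_loglik(beta_r, X, y)
ll_0 = poisson_loglik([0.0, 0.0], X, y)      # all coefficients zero

lr_stat = -2.0 * (ll_r - ll_u)
p_value = math.erfc(math.sqrt(lr_stat / 2))  # chi-square tail, 1 degree of freedom
rho2 = 1.0 - ll_u / ll_0
print(f"LR = {lr_stat:.3f}, p = {p_value:.4f}, rho^2 = {rho2:.3f}")
```

The restricted model here drops one coefficient, so the statistic has one degree of freedom; the `erfc` expression is the chi-square upper tail for that case only.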

Truncated Poisson Regression Model

Truncation of data can occur in the routine collection of transportation data. For example, if the data record the number of times per week an in-vehicle navigation system is used on the morning commute to work during weekdays, the data are right truncated at 5, which is the maximum number of uses in any given week.

Estimating a Poisson regression model without accounting for this truncation will result in biased estimates of the parameter vector $\beta$, and erroneous inferences will be drawn. Fortunately, the Poisson model is adapted easily to account for such truncation. The right-truncated Poisson model is written as:

$$P(y_i) = \frac{\lambda_i^{y_i} / y_i!}{\sum_{m=0}^{r} \lambda_i^{m} / m!}$$

Where:
$P(y_i)$ is the probability of commuter $i$ using the system $y_i$ times per week,
$\lambda_i$ is the Poisson parameter for commuter $i$,
$m$ is the number of uses per week, and
$r$ is the right truncation (in this case, 5 times per week).

Negative Binomial Regression Model

The Poisson distribution restricts the mean and variance to be equal: $E[y_i] = VAR[y_i]$. If this equality does not hold, the data are said to be underdispersed ($E[y_i] > VAR[y_i]$) or overdispersed ($E[y_i] < VAR[y_i]$), and the coefficient vector will be biased if corrective measures are not taken.
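As a numerical check on the right-truncated Poisson model given earlier, the sketch below evaluates its probabilities for the weekday example ($r = 5$). The function name and the value $\lambda = 3$ are assumptions chosen for illustration.

```python
import math

def right_truncated_poisson_pmf(y, lam, r):
    """P(y) = (lam**y / y!) / sum_{m=0}^{r} (lam**m / m!), for 0 <= y <= r."""
    if y < 0 or y > r:
        return 0.0
    denom = sum(lam ** m / math.factorial(m) for m in range(r + 1))
    return (lam ** y / math.factorial(y)) / denom

# Hypothetical commuter with Poisson parameter 3, truncated at 5 uses per week.
lam, r = 3.0, 5
probs = [right_truncated_poisson_pmf(y, lam, r) for y in range(r + 1)]
for y, p in enumerate(probs):
    print(f"P(y={y}) = {p:.4f}")
print("total probability:", sum(probs))
```

The probabilities over 0 through $r$ sum to one, which is what the denominator in the truncated formula guarantees.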

To account for cases when $E[y_i] \neq VAR[y_i]$, a negative binomial model is used. The negative binomial model is derived by rewriting the $\lambda_i$ equation such that,

$$\lambda_i = \mathrm{EXP}(\beta X_i + \varepsilon_i)$$

where $\mathrm{EXP}(\varepsilon_i)$ is a Gamma-distributed error term with mean 1 and variance $\alpha$. The addition of this term allows the variance to differ from the mean as below,

$$VAR[y_i] = E[y_i]\,[1 + \alpha E[y_i]] = E[y_i] + \alpha E[y_i]^2$$

The Poisson regression model is regarded as a limiting model of the negative binomial regression model as $\alpha$ approaches zero, which means that the selection between these two models is dependent upon the value of $\alpha$.

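The parameter $\alpha$ is referred to as the overdispersion parameter. The negative binomial distribution has the form,

$$P(y_i) = \frac{\Gamma((1/\alpha) + y_i)}{\Gamma(1/\alpha)\, y_i!} \left( \frac{1/\alpha}{(1/\alpha) + \lambda_i} \right)^{1/\alpha} \left( \frac{\lambda_i}{(1/\alpha) + \lambda_i} \right)^{y_i}$$

where $\Gamma(.)$ is a gamma function. This results in the likelihood function,

$$L(\lambda_i) = \prod_i \frac{\Gamma((1/\alpha) + y_i)}{\Gamma(1/\alpha)\, y_i!} \left( \frac{1/\alpha}{(1/\alpha) + \lambda_i} \right)^{1/\alpha} \left( \frac{\lambda_i}{(1/\alpha) + \lambda_i} \right)^{y_i}$$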
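The negative binomial form can be checked numerically against the variance relationship $VAR[y_i] = E[y_i] + \alpha E[y_i]^2$ and against its Poisson limit. The following is a sketch with assumed values $\lambda = 2$ and $\alpha = 0.5$; `negbin_pmf` is an illustrative name.

```python
import math

def negbin_pmf(y, lam, alpha):
    """Negative binomial probability with mean lam and overdispersion alpha."""
    k = 1.0 / alpha
    log_p = (math.lgamma(k + y) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + lam))
             + y * math.log(lam / (k + lam)))
    return math.exp(log_p)

lam, alpha = 2.0, 0.5  # assumed values for illustration
support = range(400)   # the tail beyond this is negligible for these values
mean = sum(y * negbin_pmf(y, lam, alpha) for y in support)
var = sum((y - mean) ** 2 * negbin_pmf(y, lam, alpha) for y in support)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
print(f"E[y] + alpha * E[y]^2 = {lam + alpha * lam ** 2:.4f}")

# As alpha approaches zero the probabilities approach the Poisson ones.
poisson_p3 = math.exp(-lam) * lam ** 3 / math.factorial(3)
print(negbin_pmf(3, lam, 1e-6), poisson_p3)
```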

Zero-Inflated Poisson and Negative Binomial Regression Models

Zero events can arise from two qualitatively different conditions.
1. One condition may result from simply failing to observe an event during the observation period.
2. Another qualitatively different condition may result from an inability to ever experience an event.

Two states can be present, one being a normal count-process state and the other being a zero-count state. A zero-count state may refer to situations where the likelihood of an event occurring is extremely rare in comparison to the normal-count state, where event occurrence is inevitable and follows some known count process.

Two aspects of this nonqualitative distinction of the zero state are noteworthy:
1. There is a preponderance of zeroes in the data, more than would be expected under a Poisson process.
2. A sampling unit is not required to be in the zero or near-zero state into perpetuity, and can move from the zero or near-zero state to the normal-count state with positive probability.

Data obtained from two-state regimes (normal-count and zero-count states) often suffer from overdispersion if considered as part of a single, normal-count state because the number of zeroes is inflated by the zero-count state.

Zero-inflated Poisson (ZIP)

Assumes that the events, $Y = (y_1, y_2, \ldots, y_n)$, are independent and the model is

$$y_i = 0 \text{ with probability } p_i + (1 - p_i)\,\mathrm{EXP}(-\lambda_i)$$

$$y_i = y \text{ with probability } \frac{(1 - p_i)\,\mathrm{EXP}(-\lambda_i)\,\lambda_i^{y}}{y!}$$

where $y$ is the number of events per period.

The zero-inflated negative binomial (ZINB) regression model follows a similar formulation with events, $Y = (y_1, y_2, \ldots, y_n)$, being independent and,

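$$y_i = 0 \text{ with probability } p_i + (1 - p_i) \left[ \frac{1/\alpha}{(1/\alpha) + \lambda_i} \right]^{1/\alpha}$$

$$y_i = y \text{ with probability } (1 - p_i) \left[ \frac{\Gamma((1/\alpha) + y)\, u_i^{1/\alpha} (1 - u_i)^{y}}{\Gamma(1/\alpha)\, y!} \right], \quad y = 1, 2, 3, \ldots$$

where $u_i = (1/\alpha) / [(1/\alpha) + \lambda_i]$.

Zero-inflated models imply that the underlying data-generating process has a splitting regime that provides for two types of zeros. The splitting process can be assumed to follow a logit (logistic) or probit (normal) probability process, or other probability processes.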
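The two zero-inflated formulations can be written as short functions. This is a sketch only: the splitting probability is computed from a logistic function of an assumed linear index, and the names `zip_pmf`, `zinb_pmf`, and all numeric values are illustrative rather than taken from the slides.

```python
import math

def logistic(z):
    """Logit splitting function: probability of the zero-count state."""
    return 1.0 / (1.0 + math.exp(-z))

def zip_pmf(y, lam, p):
    """Zero-inflated Poisson probability of observing y events."""
    pois = math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
    return p + (1.0 - p) * pois if y == 0 else (1.0 - p) * pois

def zinb_pmf(y, lam, alpha, p):
    """Zero-inflated negative binomial probability of observing y events."""
    k = 1.0 / alpha
    u = k / (k + lam)
    nb = math.exp(math.lgamma(k + y) - math.lgamma(k) - math.lgamma(y + 1)
                  + k * math.log(u) + y * math.log(1.0 - u))
    return p + (1.0 - p) * nb if y == 0 else (1.0 - p) * nb

p = logistic(-0.5)           # assumed splitting index of -0.5
lam, alpha = 2.0, 0.5        # assumed count-state parameters
print("P(zero-count state) =", round(p, 4))
print("ZIP  P(y=0) =", round(zip_pmf(0, lam, p), 4),
      "vs Poisson", round(math.exp(-lam), 4))
print("ZINB P(y=0) =", round(zinb_pmf(0, lam, alpha, p), 4))
```

Both functions return a larger probability of zero than the corresponding single-state model, which is the inflation the formulation is meant to capture.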

A point to remember is that there must be underlying justification to believe the splitting process exists (resulting in two distinct states) prior to fitting this type of statistical model. There should be a basis for believing that part of the process is in a zero-count state.

To test the appropriateness of using a zero-inflated model rather than a traditional model, Vuong (1989) proposed a test statistic for non-nested models that is well suited for situations where the distributions (Poisson or negative binomial) are specified. The statistic is calculated as (for each observation $i$),

$$m_i = \mathrm{LN}\left( \frac{f_1(y_i \mid X_i)}{f_2(y_i \mid X_i)} \right)$$

where:
$f_1(y_i \mid X_i)$ is the probability density function of model 1, and

$f_2(y_i \mid X_i)$ is the probability density function of model 2.

Using this, Vuong's statistic for testing the non-nested hypothesis of model 1 versus model 2 is (Greene, 2000; Shankar et al., 1997),

$$V = \frac{\sqrt{n}\left[ \frac{1}{n} \sum_{i=1}^{n} m_i \right]}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (m_i - \bar{m})^2}} = \frac{\sqrt{n}\,\bar{m}}{S_m}$$

Where:
$\bar{m}$ is the mean, $\bar{m} = (1/n) \sum_{i=1}^{n} m_i$,
$S_m$ is the standard deviation of the $m_i$.

Vuong's value is asymptotically standard normal distributed (to be compared to z-values), and if $|V|$ is less than $V_{critical}$ (1.96 for a 95% confidence level), the test does not support the selection of one model over another.

Large positive values of $V$ greater than $V_{critical}$ favor model 1 over model 2, whereas large negative values support model 2.

Decision guidelines, combining the Vuong statistic for the ZINB ($f_1(.)$) and NB ($f_2(.)$) comparison with the t-statistic of the NB overdispersion parameter $\alpha$:

Vuong statistic     t-statistic of alpha < |1.96|           t-statistic of alpha > |1.96|
< -1.96             ZIP or Poisson as alternative to NB     NB
> 1.96              ZIP                                     ZINB
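The Vuong calculation can be sketched as follows, comparing a ZIP model (model 1) with a Poisson model (model 2). The data are made up, and the ZIP parameters are simply assumed rather than estimated; in practice both models would first be fitted by maximum likelihood. The function names are illustrative.

```python
import math

def poisson_logpmf(y, lam):
    return -lam + y * math.log(lam) - math.lgamma(y + 1)

def zip_logpmf(y, lam, p):
    pois = math.exp(poisson_logpmf(y, lam))
    return math.log(p + (1.0 - p) * pois) if y == 0 else math.log((1.0 - p) * pois)

def vuong_statistic(logf1, logf2):
    """V = sqrt(n) * mean(m) / S_m, with m_i = LN(f1_i / f2_i)."""
    m = [a - b for a, b in zip(logf1, logf2)]
    n = len(m)
    m_bar = sum(m) / n
    s_m = math.sqrt(sum((mi - m_bar) ** 2 for mi in m) / n)
    return math.sqrt(n) * m_bar / s_m

def interpret(v, critical=1.96):
    if v > critical:
        return "favors model 1"
    if v < -critical:
        return "favors model 2"
    return "inconclusive"

# Made-up counts with many zeros.
y = [0, 0, 0, 0, 0, 0, 2, 3, 4, 3, 2, 5, 0, 3, 4, 0, 1, 3, 0, 2]
lam_poisson = sum(y) / len(y)                        # Poisson MLE of the mean
logf1 = [zip_logpmf(yi, 3.0, 0.4) for yi in y]       # assumed ZIP parameters
logf2 = [poisson_logpmf(yi, lam_poisson) for yi in y]

v = vuong_statistic(logf1, logf2)
print(f"V = {v:.3f}: {interpret(v)}")
```

Swapping the two models changes only the sign of $V$, so the same function covers both directions of the comparison.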

Because overdispersion will almost always include excess zeros, it is not always easy to determine whether excess zeros arise from true overdispersion or from an underlying splitting regime. This could lead one to erroneously choose a negative binomial model when the correct model may be a zero-inflated Poisson.

The use of a zero-inflated model may be simply capturing model misspecification that could result from factors such as unobserved effects (heterogeneity) in the data.

LIMDEP ZIP/ZINB

Beta Tau model: Uses the same X's that predict frequency in the zero-state splitting model, but multiplied by tau.
Separate functions model: Uses different X's for the splitting and frequency models.
Splitting function: Logistic (default) or Normal.
Vuong statistic.