DEEP, University of Lausanne Lectures on Econometric Analysis of Count Data Pravin K. Trivedi May 2005

Similar documents
Agricultural and Applied Economics 637 Applied Econometrics II. Assignment III Maximum Likelihood Estimation (Due: March 31, 2016)

Agricultural and Applied Economics 637 Applied Econometrics II. Assignment III Maximum Likelihood Estimation (Due: March 25, 2014)

Estimating the Variance of Decomposition Effects

Economics 671: Applied Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Latent class models for utilisation of health care

A simple bivariate count data regression model. Abstract

Applied Econometrics Lecture 1

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Nonlinear Models for Health and Medical Expenditure Data

Discrete Choice Modeling

MODELING COUNT DATA Joseph M. Hilbe

Dynamic Panel of Count Data with Initial Conditions and Correlated Random Effects. : Application for Health Data. Sungjoo Yoon

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

NELS 88. Latent Response Variable Formulation Versus Probability Curve Formulation

Applied Health Economics (for B.Sc.)

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Econometric Analysis of Count Data

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models

Discrete Choice Modeling

Semiparametric Generalized Linear Models

Partial effects in fixed effects models

ECONOMETRICS FIELD EXAM Michigan State University May 9, 2008

Truncation and Censoring

Endogenous Treatment Effects for Count Data Models with Endogenous Participation or Sample Selection

Censored quantile instrumental variable estimation with Stata

Microeconometrics. C. Hsiao (2014), Analysis of Panel Data, 3rd edition. Cambridge, University Press.

Field Course Descriptions

The Fixed-Effects Zero-Inflated Poisson Model with an Application to Health Care Utilization Majo, M.C.; van Soest, Arthur

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Using Instrumental Variables to Find Causal Effects in Public Health

A Guide to Modern Econometric:

ECON 594: Lecture #6

Mid-term exam Practice problems

A Joint Tour-Based Model of Vehicle Type Choice and Tour Length

Econometric Analysis of Cross Section and Panel Data

Marginal and Interaction Effects in Ordered Response Models

Applied Economics. Regression with a Binary Dependent Variable. Department of Economics Universidad Carlos III de Madrid

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents

Editor Executive Editor Associate Editors Copyright Statement:

Analyzing Data Collected with Complex Survey Design. Hsueh-Sheng Wu CFDR Workshop Series Summer 2011

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 15: Instrumental variables and two stage least squares

Applied Microeconometrics (L5): Panel Data-Basics

A double-hurdle count model for completed fertility data from the developing world

A Course in Applied Econometrics Lecture 14: Control Functions and Related Methods. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

Multivariate negative binomial models for insurance claim counts

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Recent Developments in Cross Section and Panel Count Models

Stat 5101 Lecture Notes

Recent Advances in the Field of Trade Theory and Policy Analysis Using Micro-Level Data

Testing and Model Selection

Basic Regressions and Panel Data in Stata

Chapter 1 Introduction. What are longitudinal and panel data? Benefits and drawbacks of longitudinal data Longitudinal data models Historical notes

A flexible approach for estimating the effects of covariates on health expenditures

Counts using Jitters joint work with Peng Shi, Northern Illinois University

Generalized Linear Models for Count, Skewed, and If and How Much Outcomes

Introduction to Linear Regression Analysis

Workshop for empirical trade analysis. December 2015 Bangkok, Thailand

Latent class analysis and finite mixture models with Stata

Day 1B Count Data Regression: Part 2

Inference and Regression

Determinants of General Practitioner Utilization: Controlling for Unobserved Heterogeneity

Group comparisons in logit and probit using predicted probabilities 1

Course Introduction and Overview Descriptive Statistics Conceptualizations of Variance Review of the General Linear Model

Introduction to Statistical Analysis

WISE International Masters

A strategy for modelling count data which may have extra zeros

Marginal effects and extending the Blinder-Oaxaca. decomposition to nonlinear models. Tamás Bartus

Exploring Marginal Treatment Effects

11. Bootstrap Methods

Modeling Health Care Utilization and Costs Partha Deb

Syllabus. By Joan Llull. Microeconometrics. IDEA PhD Program. Fall Chapter 1: Introduction and a Brief Review of Relevant Tools

Lecture 4: Multivariate Regression, Part 2

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Subject CS1 Actuarial Statistics 1 Core Principles

Econometric Analysis of Panel Data. Final Examination: Spring 2018

Final Exam - Solutions

APEC 8212: Econometric Analysis II

Comparing IRT with Other Models

Van Hasselt, M. (2011). Bayesian inference in a sample selection model. Journal of Econometrics, 165(2), DOI: /j.jeconom

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

Marginal Specifications and a Gaussian Copula Estimation

PhD/MA Econometrics Examination. January, 2015 PART A. (Answer any TWO from Part A)

Day 3B Nonparametrics and Bootstrap

Econometrics for PhDs

INTRODUCTION TO BASIC LINEAR REGRESSION MODEL

Non-linear panel data modeling

Empirical Industrial Organization (ECO 310) University of Toronto. Department of Economics Fall Instructor: Victor Aguirregabiria

STATISTICS ANCILLARY SYLLABUS. (W.E.F. the session ) Semester Paper Code Marks Credits Topic

GROWING APART: THE CHANGING FIRM-SIZE WAGE PREMIUM AND ITS INEQUALITY CONSEQUENCES ONLINE APPENDIX

Poisson regression: Further topics

Practice exam questions

CARLETON ECONOMIC PAPERS

Testing for time-invariant unobserved heterogeneity in nonlinear panel-data models

A Course on Advanced Econometrics

Topic 10: Panel Data Analysis

Transcription:

DEEP, University of Lausanne Lectures on Econometric Analysis of Count Data Pravin K. Trivedi May 2005 The lectures will survey the topic of count regression with emphasis on the role on unobserved heterogeneity. The first part will cover the basic Poisson regression model and its most commonly used extensions. The second part will cover newer topics involving multivariate count models and models with endogenous regressors. The approach will emphasize practical modeling considerations. Illustrations will be drawn from my recent collaborative research in health economics. Topics to be covered 1. Basic Poisson regression and assumptions 2. Unobserved heterogeneity and motivation 3. Models with overdispersion 4. Generalizations of the Poisson - Negative binomial regression - Hurdle model - Zero inflated model - Latent class model 5. Maximum likelihood estimation 6. Robust variance estimation 7. Goodness of fit measures 8. An Empirical Example 9. Interpretation of Results 10. Partially parametric models. 11. Bivariate and multivariate count models 12. Count models with endogenous regressors - MLE and simulated MLE estimation - Copula based models - Moment based estimation 13. Estimating the impact (treatment effects) of endogenous variables 14. Empirical illustrations 1

General references A.C. Cameron and P.K. Trivedi, Regression Analysis of Count Data, CUP, 1998. (RACD) A.C. Cameron and P.K. Trivedi, Microeconometrics: Methods and Applications, CUP, 2005, (forthcoming). (MMA) J.S. Long, Regression Models for Categorical and Limited Dependent Variables, Sage (1997). P. Deb and P.K. Trivedi, Empirical models of health care use, forthcoming in Elgar Companion to Health Economics, Ed. Andrew Jones, Elgar, 2005. Useful for empirical work using STATA J.S. Long and J. Freese, Regression Models for Categorical and Limited Dependent Variables Using STATA (STATA Press). Chapter 7 of this book will be very useful in implementing standard count regression models. Specific references Topics 1-4: RACD Chapters 1, 3, 4.2, 4.7, 4.8, 6; MMA Chapter 20.1 20.4; Topics 5-6: RACD Chapter 2.5 MMA -- Chapter 20.5 Topic 7: RACD -- Chapter 5 Topic 8: P. Deb and P.K. Trivedi, Journal of Applied Econometrics, Vol. 12, 313-326, (1997); P. Deb and P.K. Trivedi, Journal of Health Economics Vol. 21, 601-625 (2002). Topic 9: RACD, Chapter 3. Topics 10-14: MMA, Chapter 20.5-20.6. P. Deb and P.K. Trivedi, Specification and Simulated Likelihood Estimation of a Non-normal Treatment-Outcome Model with Selection: Application to Health Care Utilization. Discussion paper, 2004. D.M. Zimmer and P.K. Trivedi, Using Trivariate Copulas to Model Sample Selection and Treatment Effects: Application to Family Health Care Demand. Discussion paper, 2005. C. Li and P.K. Trivedi, Disparity in Medical Care Use in the USA: The Role of Health Insurance Coverage and Selection Bias. Discussion paper, 2005. 2

DEEP Lectures on Count Data Analysis Homework 1 This assignment is concerned with practical issues regarding modeling of count data. The questions concern modeling a relationship between health care utilization and health insurance, controlling for several economic and socio-demographic factors. Please use the computer program STATA 8.0 or 9.0 to implement the required data analysis tasks. 1. You will receive by email a STATA data file which contains a subsample of data from the famous RAND Health Insurance Experiment. In tackling this assignment you should use STATA 8.0 or 9.0. In addition to the excellent online help available in STATA you will find it very useful to consult the book, Regression Models for Categorical and Dependent Variables Using STATA by J. Scott Long and Jeremy Freese. Chapter 7 of this book will be very useful in implementing the analysis required here. Data: The data are derived from one of the most well known social experiments conducted in the USA during the 1970s and 1980s. In this experiment the participants were exogenously assigned insurance plans with different coinsurance rate. The explanatory variable that corresponds to this is logc. This variable serves the same role as out-of-pocket price of doctor visit ; it is the central focus of studies of health utilization as a function of its determinants. Another price variable is IDP. Theincomevariableis linc, log of income. The demographic variables are family size, age, female, child, black, educed. Health status variables are physlim, ndisease, hlthg, hlthf, hlthp --allaredefined below. In your regression mdvis is the dependent variable. Your explanatory variables should exclude fmde and any other variables in the data file but not mentioned below in the table. Variable MDVIS LOGC IDP LPI FMDE LINC LFAM AGE FEMALE CHILD FEMCHILD BLACK EDUCDEC PHYSLIM NDISEASE HLTHG HLTHF HLTHP Definition Number of outpatient visits to an MD ln(coinsurance+1), 0 coinsurance 100 1 if individual deductible plan, 0 otherwise ln(max(1,annual participation incentive payment)) 0 if IDP=1 ln(max(1,mde/(0.01 coinsurance))) otherwise ln(family income) ln(family size) age in years 1ifpersonisfemale 1ifageislessthan18 FEMALE * CHILD 1 if race of household head is black education of the household head in years 1 if the person has a physical limitation number of chronic diseases 1 if self-rated health is good 1 if self-rated health is fair 1 if self-rated health is poor omitted category is excellent self-rated health 1

Data Analysis Tasks (a) Using appropriate STATA commands estimate Poisson and negative binomial regression with mdvis as the dependent variable and the following explanatory variable: logc idp linc female educdec xage black hlthg hlthf hlthp Carry out a likelihood ratio tests of the null hypothesis that the following variables have zero effect on mdvis: logc idp. Estimate the Poisson and NB models with regular and robust variance matrix options. Compare the results and comment on the appropriateness of the robust variance estimation procedure in each case. (b) Test for overdispersion in the Poisson regression using the formulations (3.39) and (3.40) in the Cameron-Trivedi, RACD, chapter 3. Which version of the variance formulation gets more support from the data? What do you conclude from this exercise? (c) Estimate the negative binomial model using the nbreg command in STATA. Compare the estimate of the overdispersion parameter with that in part (b) above? Explain the similarities and differences. Using the results from the Poisson and the NB regressions, compare the actual and fitted frequency distribution of counts. Does the NB model provide a better fit? How good is the fit intherighttailofthe distribution? (d) Using the results from the nbreg estimation above, compare the estimated marginal effect of a change in logc (examine the descriptive statistics for logc) for an average individual in excellent health (baseline) and an average individual in poor health (hlthp = 1). (e) For the above nbreg specification estimate the hurdle version consisting of a zero-part (logit or probit) and a positive part, (truncated-at-zero negative binomial). Compare these results with those from a regular negative binomial model. Analyze the similarities and differences between the implications of the two models. Based on your analysis, which model do you regard as a better explanation of the data? 2

SOME USEFUL COMMANDS FOR COUNT MODELS ESTIMATED IN STATA describe displays a summary of the contents of the data in memory or the data stored in a Stata-format dataset. summarize calculates and displays a variety of univariate summary statistics. If no varlist is specified, summary statistics are calculated for all the variables in the data. Option: detail produces additional statistics including skewness, kurtosis, and the four smallest and four largest values, along with various percentiles. tabulate produces one- and two-way tables of frequency counts along with various measures of association. poisson estimates a Poisson maximum-likelihood regression of depvar on varlist, where depvar is a nonnegative count variable. nbreg estimates a negative binomial (Poisson with overdispersion) maximum-likelihood regression of depvar on indepvars, where depvar is a nonnegative count variable. See help gnbreg for a version of the negative binomial model in which the overdispersion parameter can be modeled as a function of other variables. nbreg will estimate two different parameterizations of the negative binomial model. The default, also given by the option dispersion(mean), has dispersion for the i-th observation equal to 1 + alpha*exp(x_i*b + offset); i.e., the dispersion is a function of the expected mean of the counts for the i-th observation: exp(x_i*b + offset). The alternative parameterization, given by the option dispersion(constant), has dispersion equal to 1 + delta; i.e., it is a constant for all observations. fitstat computes the goodness of fit statistics for regression just run. zip estimates a maximum-likelihood zero-inflated Poisson regression of depvar on varlist, where depvar is a nonnegative count variable. Options: inflate(varlist[, offset(varname)] _cons) specifies the equation that determines whether the observed count is zero. Note that inflate() is not optional. Conceptually, omitting inflate() would be equivalent to estimating the model with poisson (zip) or nbreg (zinb). inflate(varlist[, offset(varname)]) specifies the variables in the equation. You may optionally include an offset for this varlist. inflate(_cons) specifies that the equation determining whether the count is zero contains only an intercept. To run a zero-inflated model of depvar with only an intercept in both equations, type "zip depvar, inflate(_cons)" or "zinb depvar, inflate(_cons)". zinb estimates a maximum-likelihood zero-inflated negative binomial regression of depvar on varlist, where depvar is a nonnegative count variable.

mfx compute calculates marginal effects after a count regression prvalue computes predicted probabilities for values of independent variable specified with x( ) and rest( ).