Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data. Fred Mannering University of South Florida

Highway Accidents
Cost the lives of 1.25 million people per year.
Leading cause of death for people aged 15 to 29 years old.
These numbers have persisted in spite of advanced vehicle safety features, advances in highway design, and various safety-countermeasure policies.

Analysis of Highway Accident Data
Existing databases include information from accident reports, local weather stations, highway-asset-management databases, etc.
However, current databases cover only a small fraction of the many elements that define human behavior, vehicle and roadway characteristics, traffic characteristics, and environmental conditions, all of which determine the likelihood of an accident and its resulting injury severity (big data will not solve this issue).

Factors likely to be unobserved:
Weather and lighting conditions change continually over time, as do driver reactions to these conditions.
Once an accident has occurred, the characteristics of energy dissipation through the vehicle structure and the resulting effect on individuals will not be known. These may vary widely depending on which vehicle safety features deployed, as well as on bone mass, overall health, physical dimensions, and so on.

Heterogeneity models
Attempt to address data deficiencies with advanced statistical and econometric approaches.
Allow analysts to make more accurate inferences by explicitly accounting for observation-specific variations in the effects of influential factors.
Ignoring unobserved heterogeneity can lead to erroneous inferences.

Example: The effect of gender
Consider the indicator variable: 1 if male, 0 if not.
It attempts to account for general physiological and behavioral differences.
Problem: There is great variation across people of the same gender, including differences in height, weight, bone density, driving experience, response to stimuli, and other factors that are generally unavailable to the analyst.

Example: Type of accident
Consider accident types such as angle or head-on and their effect on injury severity.
Problem: Vehicle-to-vehicle kinematic interactions (vehicle speed differences, differences in vehicle size, variations in vehicle impact locations, variations in the structural integrity of the vehicles, and variations in angle of impact) account for a significant portion of the heterogeneity in collision-type effects.

Example: Roadway lighting
Consider the presence of roadway lighting at night (1 if present, 0 otherwise).
It accounts for the safety provided by lighting.
Problem: There will be unobserved variations across roadway segments in the type and light output of the lighting used, as well as in the ambient lighting from nearby land uses.

Ignoring unobserved heterogeneity:
Restricting the effect of a variable to be the same across observations has three consequences:
The model will be misspecified.
Estimates will be biased and inefficient.
Inferences and predictions will be erroneous.

Addressing unobserved heterogeneity:
Random parameters models: effects of variables can vary across observations according to a specified distribution.
Finite mixture (latent class) models: groups of observations have the same parameters.
Combination of finite mixture and random parameters models.
Markov-switching models (temporal heterogeneity).

Random parameters formulations:
Note: Random effects models are not random parameters models.
Random effects necessitate panel data.
Random parameters models can be estimated with panel and cross-sectional data.

Random parameters accident likelihood models
Count-data models (negative binomial, zero-inflated).
Duration models (time between accidents).
Ordered probability models to generalize counts.
Tobit regression (for accident rates on highway segments).

Introduction of random parameters: accident likelihood models
The effect of explanatory variables in model formulations for highway entity $i$ (roadway segment) is
$$\mathbf{b}\mathbf{x}_i$$
where $\mathbf{x}_i$ is a vector of explanatory variables (including a constant) and $\mathbf{b}$ is a vector of estimable parameters.

Introduction of random parameters: accident likelihood models
Each estimable parameter on explanatory variable $l$ in the vector $\mathbf{x}_i$ can be written as
$$\beta_{il} = b_l + \varphi_{il}$$
where $\beta_{il}$ is the parameter on the $l$th explanatory variable for observation $i$, $b_l$ is the mean parameter estimate across all observations for the $l$th explanatory variable, and $\varphi_{il}$ is a randomly distributed scalar term that captures unobserved heterogeneity across observations.

Introduction of random parameters: accident likelihood models
The term $\varphi_{il}$ can assume an analyst-specified distribution (such as the normal, lognormal, triangular, uniform, and Weibull distributions).
Estimation of random parameters models of this form is typically achieved with simulated maximum likelihood.
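The simulation step behind simulated maximum likelihood can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation procedure of any particular study: it assumes a Poisson count model with a single explanatory variable and a normally distributed random parameter, and the function name `simulated_poisson_prob` and all numerical values are hypothetical. The probability of an observed count is approximated by averaging the Poisson probability over random draws of the parameter (mean plus a random term).

```python
import math
import random


def simulated_poisson_prob(y, x, b_mean, sigma, n_draws=500, seed=0):
    """Approximate P(y | x) when the parameter on x is random.

    Assumed setup (illustrative only):
      beta = b_mean + phi, with phi ~ Normal(0, sigma^2)
      lambda = exp(beta * x)   (Poisson mean)
    The integral over phi is replaced by an average over n_draws draws.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        beta = b_mean + rng.gauss(0.0, sigma)  # one draw of the random parameter
        lam = math.exp(beta * x)
        total += math.exp(-lam) * lam ** y / math.factorial(y)  # Poisson pmf at y
    return total / n_draws


# Hypothetical segment: 2 observed accidents, covariate value 1.0
print(simulated_poisson_prob(y=2, x=1.0, b_mean=0.5, sigma=0.3))
```

In a full estimation, the log of this simulated probability would be summed over all observations and maximized over the mean and the scale of the random term.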

Random parameters and injury severities
Injury severity outcomes such as no injury, possible injury, evident injury, disabling injury, and fatality are modeled with a variety of discrete-outcome models:
Multinomial logit.
Ordered probability models (ordered probit).

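Introduction of random parameters: injury-severity models
Done in much the same way, with the random parameters multinomial logit written as (the probability of accident $i$ resulting in injury severity level $k$)
$$P_i(k) = \int \frac{e^{\alpha_k + \boldsymbol{\beta}\mathbf{x}_{ik}}}{\sum_{m} e^{\alpha_m + \boldsymbol{\beta}\mathbf{x}_{im}}} \, f(\boldsymbol{\beta} \mid \boldsymbol{\varphi}) \, d\boldsymbol{\beta}$$
where $f(\boldsymbol{\beta} \mid \boldsymbol{\varphi})$ is the continuous density function accounting for unobserved heterogeneity and $\boldsymbol{\varphi}$ is a vector characterizing the chosen density function.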
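The integral in the random parameters multinomial logit has no closed form, so it is usually approximated by simulation. The sketch below is illustrative only: it assumes a single explanatory variable per severity level, one normally distributed random parameter, and hypothetical names (`mnl_probs`, `simulated_mixed_logit`) and values.

```python
import math
import random


def mnl_probs(alphas, beta, x):
    """Multinomial logit probabilities for one fixed value of beta."""
    utils = [a + beta * xk for a, xk in zip(alphas, x)]
    top = max(utils)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


def simulated_mixed_logit(alphas, x, b_mean, sigma, n_draws=1000, seed=0):
    """Average the logit probabilities over draws of beta ~ Normal(b_mean, sigma^2)."""
    rng = random.Random(seed)
    acc = [0.0] * len(alphas)
    for _ in range(n_draws):
        beta = b_mean + rng.gauss(0.0, sigma)
        probs = mnl_probs(alphas, beta, x)
        acc = [a + p for a, p in zip(acc, probs)]
    return [a / n_draws for a in acc]


# Hypothetical three-level severity example
print(simulated_mixed_logit([0.0, 0.5, -1.0], [1.0, 0.2, 0.0], b_mean=0.8, sigma=1.0))
```

Setting `sigma` to zero collapses the result to the standard fixed-parameter multinomial logit.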

Random parameters with correlated parameters
Correlation can exist between factors like weather (snow) and physical roadway conditions (pavement roughness).
This complicates estimation and interpretation, but can be important to the specification.

Correlated parameters, assuming a multivariate normal distribution for $\boldsymbol{\beta}_i$:
$$\boldsymbol{\beta}_i = \mathbf{b} + \mathbf{C}\boldsymbol{\varphi}_i$$
where $\boldsymbol{\beta}_i$ is a vector of random parameters corresponding to explanatory variables for observation $i$, $\mathbf{b}$ is the vector of mean parameter estimates across all observations, $\mathbf{C}$ is a lower triangular matrix that captures correlation among the elements of $\boldsymbol{\beta}_i$, and $\boldsymbol{\varphi}_i$ is a vector of randomly and independently distributed (uncorrelated) terms.
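A small sketch of how the lower triangular matrix induces correlation. It assumes the independent terms are standard normal (consistent with the multivariate normal assumption, but the unit scale is an assumption here), and the names `draw_beta` and `implied_covariance` and all numbers are hypothetical.

```python
import random


def draw_beta(b, C, rng):
    """One draw of beta_i = b + C * phi_i, with phi_i independent standard normal."""
    phi = [rng.gauss(0.0, 1.0) for _ in b]
    # C is lower triangular, so row r only uses phi[0..r]
    return [b[r] + sum(C[r][c] * phi[c] for c in range(r + 1)) for r in range(len(b))]


def implied_covariance(C):
    """Covariance of beta_i implied by C, i.e. C times its transpose."""
    n = len(C)
    return [[sum(C[r][k] * C[c][k] for k in range(n)) for c in range(n)] for r in range(n)]


b = [0.2, -0.4]                 # hypothetical mean parameters
C = [[1.0, 0.0], [0.5, 2.0]]    # hypothetical lower triangular matrix
print(draw_beta(b, C, random.Random(0)))
print(implied_covariance(C))
```

The off-diagonal entries of C are what make the covariance non-diagonal; with a diagonal C the random parameters are uncorrelated.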

Heterogeneity in means
Common applications of random parameters models assume one mean for all observations.
It may be more realistic to have means vary across observations in a systematic way.

Heterogeneity in means
Allow means to be functions of explanatory variables,
$$\boldsymbol{\beta}_i = \mathbf{b} + \boldsymbol{\Theta}\mathbf{z}_i + \mathbf{C}\boldsymbol{\varphi}_i$$
where $\mathbf{z}_i$ is a vector of explanatory variables from $i$ that influence the mean of the random parameter vector, and $\boldsymbol{\Theta}$ is a matrix of estimable parameters.
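A short worked consequence of this specification, assuming (as is conventional, though not stated on the slide) that the random term has zero mean and identity covariance:

```latex
\begin{aligned}
E[\boldsymbol{\beta}_i \mid \mathbf{z}_i] &= \mathbf{b} + \boldsymbol{\Theta}\mathbf{z}_i, \\
\operatorname{Var}[\boldsymbol{\beta}_i \mid \mathbf{z}_i] &= \mathbf{C}\mathbf{C}^{\top}.
\end{aligned}
```

Under that assumption, the explanatory variables shift the mean of the random parameters from one observation to the next, while the spread around that mean is governed by the lower triangular matrix alone.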

Latent class (finite mixture) models
The idea is that groups of observations (latent classes) share a parameter value, as opposed to each observation having its own.
Unlike random parameters models, this approach does not require distributional assumptions.

Latent class (finite mixture) models
Probability of belonging to a specific latent class:
$$P_i(c) = \frac{e^{\boldsymbol{\gamma}\mathbf{z}_{ic}}}{\sum_{g} e^{\boldsymbol{\gamma}\mathbf{z}_{ig}}}$$
where $\mathbf{z}_{ic}$ is a vector of explanatory variables specific to observation $i$ and latent class $c$ (including a constant for all latent classes except one) and $\boldsymbol{\gamma}$ is a vector of estimable parameters.
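The class-membership probability is a logit over classes, which can be computed as in the sketch below. The function name `class_membership_probs` and the numbers are hypothetical; the second class is treated as the reference class, so its variables (including the constant) are set to zero.

```python
import math


def class_membership_probs(gamma, z_by_class):
    """Logit probabilities of belonging to each latent class.

    gamma: list of parameters; z_by_class[c]: explanatory variables for class c.
    """
    scores = [sum(g * z for g, z in zip(gamma, z_c)) for z_c in z_by_class]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-class example: class 1 has a constant and one variable,
# class 2 is the reference class.
print(class_membership_probs([0.4, -1.2], [[1.0, 0.5], [0.0, 0.0]]))
```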

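Latent class (finite mixture) models
Then the conditional models become, for example, in the injury severity case:
$$P_i(k \mid c) = \frac{e^{\alpha_{kc} + \mathbf{b}_c\mathbf{x}_{ik}}}{\sum_{m} e^{\alpha_{mc} + \mathbf{b}_c\mathbf{x}_{im}}}$$
Or, if random parameters are allowed in each class:
$$P_i(k \mid c) = \int \frac{e^{\alpha_{kc} + \boldsymbol{\beta}_c\mathbf{x}_{ik}}}{\sum_{m} e^{\alpha_{mc} + \boldsymbol{\beta}_c\mathbf{x}_{im}}} \, f(\boldsymbol{\beta}_c \mid \boldsymbol{\varphi}_c) \, d\boldsymbol{\beta}_c$$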
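The class-conditional probabilities are combined with the class-membership probabilities through the law of total probability, that is, the probability of severity level k is the sum over classes of the membership probability times the conditional probability. That mixing step is not shown on the slide; the sketch below illustrates it for the fixed-parameter case with one explanatory variable per severity level. All names and numbers are hypothetical.

```python
import math


def conditional_severity_probs(alphas_c, b_c, x):
    """Multinomial logit probabilities of each severity level within one class."""
    utils = [a + b_c * xk for a, xk in zip(alphas_c, x)]
    top = max(utils)
    weights = [math.exp(u - top) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]


def unconditional_severity_probs(class_probs, alphas_by_class, b_by_class, x):
    """Mix the class-conditional probabilities using the class-membership probabilities."""
    out = [0.0] * len(x)
    for p_c, alphas_c, b_c in zip(class_probs, alphas_by_class, b_by_class):
        cond = conditional_severity_probs(alphas_c, b_c, x)
        out = [o + p_c * ck for o, ck in zip(out, cond)]
    return out


# Hypothetical two-class, three-severity example
print(unconditional_severity_probs(
    class_probs=[0.7, 0.3],
    alphas_by_class=[[0.0, 0.5, -1.0], [0.0, -0.5, 0.5]],
    b_by_class=[0.8, -0.3],
    x=[1.0, 0.2, 0.0],
))
```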

Temporal heterogeneity
Due to their low frequencies, accident data are gathered over periods of time (weeks, months, years).
Unobserved factors may vary from one time period to the next.
This results in time-varying unobserved heterogeneity.

Markov-switching approach
Multiple hidden states exist and there is a transition between these states over time.
For a model with two states:
$$P(s_{t+1} = 1 \mid s_t = 0) = p_{0 \to 1} \quad \text{and} \quad P(s_{t+1} = 0 \mid s_t = 1) = p_{1 \to 0}$$
where the state transition probabilities $p_{0 \to 1}$ and $p_{1 \to 0}$ can be estimated from the accident data.
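To make the two-state transition structure concrete, the sketch below simulates a sequence of hidden states and computes the long-run share of time spent in each state. It illustrates only the state process, not the estimation of a Markov-switching accident model; the names `stationary_probs` and `simulate_states` and the transition probabilities are hypothetical.

```python
import random


def stationary_probs(p01, p10):
    """Long-run probabilities of state 0 and state 1 (requires p01 + p10 > 0)."""
    total = p01 + p10
    return (p10 / total, p01 / total)


def simulate_states(p01, p10, n_periods, seed=0, start=0):
    """Simulate the hidden state over n_periods time periods."""
    rng = random.Random(seed)
    states = [start]
    for _ in range(n_periods - 1):
        current = states[-1]
        switch_prob = p01 if current == 0 else p10
        # switch to the other state with the relevant transition probability
        states.append(1 - current if rng.random() < switch_prob else current)
    return states


# Hypothetical transition probabilities
print(stationary_probs(0.2, 0.3))
print(simulate_states(0.2, 0.3, n_periods=20))
```

With these hypothetical values the process spends about 60 percent of periods in state 0 and 40 percent in state 1 in the long run.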

Unobserved heterogeneity and omitted variables bias
It is often difficult to collect all relevant data, which creates omitted variables bias in parameter estimates.
These omissions will be picked up as unobserved heterogeneity.
But the model will not track the data as well as it would if the omitted variables were included, so omitted variables remain a problem.

Unobserved heterogeneity and transferability
If the model is found to have significant unobserved heterogeneity, the model cannot be spatially transferred (and possibly not temporally transferred).
If significant unobserved heterogeneity is found, that also means standard fixed-parameter models most certainly cannot be spatially transferred.

Summary
Accounting for unobserved heterogeneity has provided important new insights into accident analysis.
Advances in estimation techniques and software have allowed this.
No one heterogeneity approach is superior, and the advantages of one over another change from database to database.

Summary (cont.)
Methodological applications are needed that address underlying data issues (unobserved heterogeneity, etc.).
The methodological frontier needs to expand to include sophisticated new statistical and econometric methods.