ABSTRACT INTRODUCTION SUMMARY OF ANALYSIS. Paper

Similar documents
Analyzing the effect of Weather on Uber Ridership

Tornado Inflicted Damages Pattern

Effect of Weather on Uber Ridership

United Kingdom (UK) Road Safety Development. Fatalities

Choosing a Safe Vehicle Challenge: Analysis: Measuring Speed Challenge: Analysis: Reflection:

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Integrated Electricity Demand and Price Forecasting

MODELING OF 85 TH PERCENTILE SPEED FOR RURAL HIGHWAYS FOR ENHANCED TRAFFIC SAFETY ANNUAL REPORT FOR FY 2009 (ODOT SPR ITEM No.

Sri Harsha Ramineni Koteswarrao Sriram

Review of Lecture 1. Across records. Within records. Classification, Clustering, Outlier detection. Associations

Ex-Ante Forecast Model Performance with Rolling Simulations

G M = = = 0.927

Dosing In NONMEM Data Sets an Enigma

RANDOM and REPEATED statements - How to Use Them to Model the Covariance Structure in Proc Mixed. Charlie Liu, Dachuang Cao, Peiqi Chen, Tony Zagar

Predictive Analytics on Accident Data Using Rule Based and Discriminative Classifiers

Traffic accidents and the road network in SAS/GIS

QUASI-ANALYTIC ACCELERATION INJURY RISK FUNCTIONS: APPLICATION TO CAR OCCUPANT RISK IN FRONTAL COLLISIONS

Accident predictive system in Benue State using artificial neural network

Trend

Member Level Forecasting Using SAS Enterprise Guide and SAS Forecast Studio

DISPLAYING THE POISSON REGRESSION ANALYSIS

UPDATING THE MINNESOTA NATIONAL WETLAND INVENTORY

Unit D Energy-Analysis Questions

Fitting PK Models with SAS NLMIXED Procedure Halimu Haridona, PPD Inc., Beijing

(c) McHenry Software. McHenry. Accident Reconstruction. by Raymond R. McHenry Brian G. McHenry. McHenry Training Seminar 2008

Components for Accurate Forecasting & Continuous Forecast Improvement

Risk Assessment of Pedestrian Accident Area Using Spatial Analysis and Deep Learning

Thunderstorm Forecasting and Warnings in the US: Applications to the Veneto Region

Traffic and Road Monitoring and Management System for Smart City Environment

A Study of Red Light Cameras in Kansas City, MO

The use of SHAVE and NWS flash flood reports for impact characterization and prediction

TEEN DRIVER ELECTRONIC DEVICE OBSERVATION FORM

Global Catalyst Market

Qinlei Huang, St. Jude Children s Research Hospital, Memphis, TN Liang Zhu, St. Jude Children s Research Hospital, Memphis, TN

Charity Quick, Rho, Inc, Chapel Hill, NC Paul Nguyen, Rho, Inc, Chapel Hill, NC

Inclusion of Non-Street Addresses in Cancer Cluster Analysis

Activity 4. Life (and Death) before Seat Belts. What Do You Think? For You To Do GOALS

Method of Velocity Determination Through Measurements of the Vehicle Body Deformation Plane Approximation Method

ZERO INFLATED POISSON REGRESSION

SGA-2: WP6- Early estimates. SURS: Boro Nikić, Tomaž Špeh ESSnet Big Data Project Brussels 26.,

Paper The syntax of the FILENAME is:

Crime Analysis. GIS Solutions for Intelligence-Led Policing

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

GMM Logistic Regression with Time-Dependent Covariates and Feedback Processes in SAS TM

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

Encapsulating Urban Traffic Rhythms into Road Networks

A User s Guide to the Federal Statistical Research Data Centers

Estimating Traffic Accidents in Turkey Using Differential Evolution Algorithm

The Framework and Application of Geographic Information Systems (GIS)

Automatic Singular Spectrum Analysis and Forecasting Michele Trovero, Michael Leonard, and Bruce Elsheimer SAS Institute Inc.

Q1. (a) The diagram shows a car being driven at 14 rn/s. The driver has forgotten to clear a thick layer of snow from the roof.

Delta Boosting Machine and its application in Actuarial Modeling Simon CK Lee, Sheldon XS Lin KU Leuven, University of Toronto

Strategies for dealing with Missing Data

Safety evaluations of level crossings

Math 10 Chapter 6.10: Solving Application Problems Objectives: Exponential growth/growth models Using logarithms to solve

exponential function exponential growth asymptote growth factor exponential decay decay factor

Data Analyses in Multivariate Regression Chii-Dean Joey Lin, SDSU, San Diego, CA

Randomized Decision Trees

Trip and Parking Generation Study of Orem Fitness Center-Abstract

A Comparison among various Classification Algorithms for Travel Mode Detection using Sensors data collected by Smartphones

SAS Macro for Generalized Method of Moments Estimation for Longitudinal Data with Time-Dependent Covariates

A Multistage Modeling Strategy for Demand Forecasting

Similarity Analysis an Introduction, a Process, and a Supernova Paper

GIS and Business Location Analytics

Artificial Neural Networks (ANN) Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso

Practice of SAS Logistic Regression on Binary Pharmacodynamic Data Problems and Solutions. Alan J Xiao, Cognigen Corporation, Buffalo NY

Temporal and Spatial Impacts of Rainfall Intensity on Traffic Accidents in Hong Kong

May 31, Flood Response Overview

Mathematics 102 Solutions for HWK 22 SECTION 7.6 P (R 1 )=0.05

Spotlight on Population Resources for Geography Teachers. Pat Beeson, Education Services, Australian Bureau of Statistics

Performing response surface analysis using the SAS RSREG procedure

HSIP FUNDING APPLICATION

Snow Plow Safety Quick Reference Guide

Statistics Revision Questions Nov 2016 [175 marks]

Online Advertising is Big Business

Your Virtual Workforce. On Demand. Worldwide. COMPANY PRESENTATION. clickworker GmbH 2017

Finding the Gold in Your Data

SESUG 2011 ABSTRACT INTRODUCTION BACKGROUND ON LOGLINEAR SMOOTHING DESCRIPTION OF AN EXAMPLE. Paper CC-01

MIDTERM: CS 6375 INSTRUCTOR: VIBHAV GOGATE October,

IMPROVING FLOOD RISK COMMUNICATIONS: SIGNS OF CHANGING BEHAVIOR? DANIEL M. TOMCZAK, CFM

COMMUNITY PREPAREDNESS

Random Forests for Poverty Classification

CIVL 7012/8012. Collection and Analysis of Information

Dynamic Determination of Mixed Model Covariance Structures. in Double-blind Clinical Trials. Matthew Davis - Omnicare Clinical Research

CONTACT SAFETY PROGRAM MAY 2006

Investigation of Road Traffic Fatal Accidents Using Data Mining Techniques

Winter tires within Europe, in Iceland and Norway

Predictive Modeling: Classification. KSE 521 Topic 6 Mun Yi

Metabolite Identification and Characterization by Mining Mass Spectrometry Data with SAS and Python

Analysis of fatalities attributed to Hurricane Florence in the US.

Math 3339 Homework 2 (Chapter 2, 9.1 & 9.2)

Hazards, Densities, Repeated Events for Predictive Marketing. Bruce Lund

AFS IO CORPORATE PPT

LEGAL DISCLAIMER. APG Coin (APG) White Paper (hereinafter 'the White Paper', 'the Document') is presented for informational purposes only

Modeling Effect Modification and Higher-Order Interactions: Novel Approach for Repeated Measures Design using the LSMESTIMATE Statement in SAS 9.

10/18/2016 The Hoosier Co. Inc W. 86th Street, Indianapolis, IN

Neural Networks. Nethra Sambamoorthi, Ph.D. Jan CRMportals Inc., Nethra Sambamoorthi, Ph.D. Phone:

RUSSIAN CHEMICAL INDUSTRY

The Three Things You Need to Know About Tsunami Preparedness Patrick Corcoran, Oregon Sea Grant,

Transcription:

Paper 1891-2014 Using SAS Enterprise Miner to predict the Injury Risk involved in Car Accidents Prateek Khare, Oklahoma State University; Vandana Reddy, Oklahoma State University; Goutam Chakraborty, Oklahoma State University ABSTRACT There are yearly 2.35 million road accident cases recorded in the U.S. Among them, 37,000 are considered fatal. Road crashes cost USD 230.6 billion per year, or an average of USD 820 per person. Our efforts are to identify the important factors that lead to vehicle collisions and to predict the injury risk involved in them. Data was collected from National Automotive Sampling System (NASS), containing 20,247 cases with 19 variables. Input variables describe the factors involved in an accident such as Height, Age, Weight, Gender, Vehicle model year, Speed limit, Energy absorption in Collision & Deformation location, etc. The target variable is nominal showing levels of injury. Missing values in interval variables were imputed using mean and class variables using the count method. Multivariate analysis suggests high correlation between tire footprint and wheelbase (Corr=0.97, P<0.0001) and original weight of car and curb weight of car (Corr=0.79, P<0.0001). Variables having high kurtosis values were transformed using range standardization. Variables were sorted using variable importance using decision tree analysis. Models such as multiple regression, polynomial regression, neural network, and decision tree were applied in the dataset to identify the factors that are most significant in predicting the injury risk. MLP neural network came out to be the best model to predict injury risk index, with the least Average Squared Error of 0.086 in validation dataset. INTRODUCTION The data is collected from National Automotive Sampling System (NASS), the following data is of Fatality Analysis of Accidents. It has attributes and factors relating to accidents. SUMMARY OF ANALYSIS The data is collected from National Automotive Sampling System (NASS), the following data is of Fatality Analysis of Accidents. It has attributes and factors relating to accidents. Table 1. Data Partition Summary Dataset: Input- 2 ID, 14 Interval, 6 Nominal variables Target- 1 Nominal variables 1

Table 2. Variable Description Table 3. Variable Summary GV_ENERGY showed high kurtosis values of +36.97. (Variable transformation node) was used to transform the distribution to make it normal. Other values are observed and recorded for better understanding of dataset. Missing values in Interval variables were imputed by mean values using (Impute Node). 2

Table 4. Class Variable Summary Table 5. Interval Variable Summary Missing values in Class variables are being imputed using count method. Dataset Preparation: Display 1. Imputation summary SAS Enterprise Miner 3

Display 2. Transformation R-Square summary shows variable transformation SAS Enterprise Miner Table 6. Variable selection using Decision tree RESULT & ANALYSIS Different models are used to predict the target GV_MAIS i.e. Injury risk index. The dataset was partitioned as 70% Training data and 30% validation data using Data Partition node Imputation Node was used to impute missing data with Mean & Count Highly skewed data distribution was transformed using R-Square Variable importance node was used to identify the most important variable that affects the model. MODELS USED TO TRAIN AND VALIDATE ARE: Multiple Regression Polynomial Regression Multi-layer Perception Neural Network Radial Equal width Neural Network Decision Tree 4

Diagram 1. Model Diagram (Nodes) Using Model comparison Node, Multi linear Perception Neural Network came out to be best model to predict Injury risk index, with least Average Square Error in validation dataset. Display 3. Fit Statistics SAS Enterprise Miner Display 4. Average Square Error plot SAS Enterprise Miner 5

Display 5. Cumulative Lift plot SAS Enterprise Miner Diagram 2. Neural Network CONCLUSION Various Regression models, Neural network models, Decision trees have been built and trained with numerous simulations to identify the important factors leading to car accidents in the United States MLP Neural Network model come out as the best model with the least average square error of 0.08626 The factors are related to both the individual risk of the person involved in the accident and the technical and performance features of the car Energy absorption of the car, lateral and longitudinal component of V Delta have been identified as the most important factors This analysis will help both the car manufactures & the people to prevent fatal car accidents 6

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Prateek Khare Organization: Oklahoma State University Address: 127 North Duck Street City, State ZIP: Stillwater, OK 74075 Work Phone: 4057621514 Email: Prateek.khare@okstate.edu Name: Vandana Reddy Organization: Oklahoma State University Address: 127 North Duck Street City, State ZIP: Stillwater, OK 74075 Work Phone: 2192969394 Email: Vandana.reddy@okstate.edu Name: Goutam Chakraborty Organization: Oklahoma State University Address: 420A Business Building City, State ZIP: Stillwater, OK 74075 Work Phone: 4057447644 Email: Goutam.Chakraborty@okstate.edu SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7