Applied Mathematical Sciences, Vol. 11, 2017, no. 42, 2047-2058
HIKARI Ltd, www.m-hikari.com
https://doi.org/10.12988/ams.2017.75179

Using Logistic Regression in Determining the Effective Variables in Traffic Accidents

Layla Aziz Ahmed
Department of Mathematics, College of Education
University of Garmian, Kurdistan Region, Iraq

Copyright 2017 Layla Aziz Ahmed. This article is distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This research was conducted to identify the variables that most influence deaths from road traffic accidents, and to quantify the effect of each, by applying a logistic regression model. The maximum likelihood method was used to estimate the parameters, and the Wald test was used to determine the significance of the explanatory variables. The data set used in this research consists of a sample of 212 observations obtained from the records of the Directorate of Traffic, Garmian. The response variable in this study, accident victims, is a dichotomous variable with two categories. The study led to a number of conclusions, among them: logistic regression models fit such data, and three explanatory variables were found to be significantly associated with the accident victims response variable, namely high speed, car type, and location.

Keywords: Traffic accident, logistic regression model, maximum likelihood estimation, Wald's test
1. Introduction

The problem of deaths and injury as a result of road accidents is now acknowledged to be a global phenomenon, with authorities in virtually all countries of the world concerned about the growth in the number of people killed and seriously injured on the road. The world report on road traffic accident prevention indicated that, worldwide, an estimated 1.2 million people die in road traffic accidents each year and as many as 50 million are injured [3][6].

The logistic growth function was first proposed as a tool for use in demographic studies by Verhulst (1838, 1845) and was given its present name by Reed and Berkson (1929). The function was also applied as a growth model in biology by Pearl and Reed (1924) [7].

Logistic regression was first developed as a statistical method by the statistician D. R. Cox in 1958, and it has since been used widely in many fields, including the medical and social sciences [12]. Logistic regression is used for prediction by fitting data to the logistic curve. It requires the fitted model to be compatible with the data. In logistic regression, the response variable is binary or multinomial.

Gordon (1974) pointed out that logistic regression models have played a major role in biological and medical applications, where cross-classified tables with large numbers of cells are typically replaced by a logistic or log-linear relationship among the variables, thus obviating the need for the table [14].

Logistic regression was proposed in the 1970s as an alternative technique to overcome the limitations of ordinary least squares regression in handling dichotomous outcomes. It became available in statistical packages in the early 1980s [10].

For logistic regression, least squares estimation is not capable of producing minimum variance unbiased estimators for the actual parameters. In its place, maximum likelihood estimation is used to solve for the parameters that best fit the data [15].
Bedard (2002) used a multivariate logistic regression model to determine the independent contribution of crash, driver, and vehicle characteristics to the driver's fatality risk. Reducing speed, increasing the use of seatbelts, and reducing the severity of incidents attributed to driver-side impacts were found to prevent fatalities [8]. Mhamad (2011) used the logistic regression model in traffic problems in Sulaimani; the purpose of that study was to model traffic accidents by linking accident fatality to the various factors that cause it [2]. Odhiambo (2015) used an artificial neural network to model the monthly number of road traffic injuries, with the negative binomial regression model as the baseline model, and noted that accident data are non-negative integers, and thus
the application of standard ordinary least squares regression was not appropriate [8].

1.2 Purpose of Study

This study was conducted to identify the variables that most influence deaths from road traffic accidents, and to quantify the effect of each, by applying a logistic regression model.

1.3 Logistic Regression Analyses

Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable [16]. Logistic regression provides a method for modeling a binary response variable, which takes the values 1 (success) and 0 (failure). The goal of logistic regression is to find the best-fitting model to describe the relationship between the dichotomous characteristic of interest (the dependent, response, or outcome variable) and a set of independent (predictor or explanatory) variables. Suppose that the model has the form [4]:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \tag{1}$$

and the response variable $y_i$ takes on the value either 0 or 1. We will assume that $y_i$ is a Bernoulli random variable with probability distribution as follows:

$$P(y_i = 1) = p_i \tag{2}$$

$$P(y_i = 0) = 1 - p_i \tag{3}$$

Now since $E(\varepsilon_i) = 0$, the expected value of the response is $E(y_i) = 1(p_i) + 0(1 - p_i) = p_i$. This implies that

$$E(y_i) = \beta_0 + \beta_1 x_i = p_i \tag{4}$$

If the response is binary, the error terms $\varepsilon_i$ can only take on two values:

$$\varepsilon_i = \begin{cases} 1 - (\beta_0 + \beta_1 x_i), & \text{when } y_i = 1 \\ -(\beta_0 + \beta_1 x_i), & \text{when } y_i = 0 \end{cases} \tag{5}$$
Consequently, the errors in this model cannot possibly be normal, and the error variance is not constant, since

$$\sigma_{y_i}^2 = p_i (1 - p_i), \qquad \text{that is,} \qquad \sigma_{y_i}^2 = E(y_i)\,[1 - E(y_i)] \tag{6}$$

The logit response function has the form:

$$E(y_i) = \frac{1}{1 + \exp[-(\beta_0 + \beta_1 x_i)]} \tag{7}$$

The coefficients for the independent variables are estimated using either the logit value or the odds value as the dependent measure. In logistic regression for a binary variable, we model the natural log of the odds, which is called the logit [13]. Each of these model formulations is shown here [1][4][5]:

$$\text{Logit} = \ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i \tag{8}$$

$$\text{Odds} = \frac{p_i}{1 - p_i} = \exp(\beta_0 + \beta_1 x_i) \tag{9}$$

The simple logistic regression model can easily be extended to more than one predictor variable [9]:

$$E(y_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}, \qquad \text{where } x_i'\beta = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} \tag{10}$$

1.4 Maximum Likelihood Estimation

Maximum likelihood is the procedure of finding the value of one or more parameters for a given statistic which makes the known likelihood distribution a maximum [9]. Since each $y_i$ represents a binomial count in the $i$-th population, the joint probability density function of $y$ is [2], [15]:
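$$g(y) = \prod_{i=1}^{n} f_i(y_i) = \prod_{i=1}^{n} \binom{n_i}{y_i} p_i^{y_i} (1 - p_i)^{n_i - y_i} \tag{11}$$

$$\log g(y) = \log \prod_{i=1}^{n} \binom{n_i}{y_i} p_i^{y_i} (1 - p_i)^{n_i - y_i} \tag{12}$$

Taking the exponential of both sides of equation (12), dropping the binomial coefficients (which do not involve $\beta$), and substituting $p_i = e^{\sum_{k=0}^{K} x_{ik}\beta_k} / (1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k})$ from equation (10), with $K = p - 1$ and $x_{i0} = 1$, gives

$$L(\beta) = \prod_{i=1}^{n} \left(e^{\sum_{k=0}^{K} x_{ik}\beta_k}\right)^{y_i} \left(1 - \frac{e^{\sum_{k=0}^{K} x_{ik}\beta_k}}{1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}}\right)^{n_i} \tag{13}$$

$$L(\beta) = \prod_{i=1}^{n} \left(e^{y_i \sum_{k=0}^{K} x_{ik}\beta_k}\right) \left(1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}\right)^{-n_i} \tag{14}$$

This is the kernel of the likelihood function to maximize. Taking the natural log of equation (14) yields the log likelihood function:

$$l(\beta) = \sum_{i=1}^{n} \left[ y_i \left(\sum_{k=0}^{K} x_{ik}\beta_k\right) - n_i \log\left(1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}\right) \right] \tag{15}$$

To find the critical points of the log likelihood function, set the first derivative with respect to each $\beta_k$ equal to zero. In differentiating equation (15), note that

$$\frac{\partial}{\partial \beta_k} \sum_{k=0}^{K} x_{ik}\beta_k = x_{ik} \tag{16}$$

so that

$$\begin{aligned}
\frac{\partial l(\beta)}{\partial \beta_k}
&= \sum_{i=1}^{n} \left[ y_i x_{ik} - n_i \cdot \frac{1}{1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}} \cdot \frac{\partial}{\partial \beta_k}\left(1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}\right) \right] \\
&= \sum_{i=1}^{n} \left[ y_i x_{ik} - n_i \cdot \frac{1}{1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}} \cdot e^{\sum_{k=0}^{K} x_{ik}\beta_k} \cdot \frac{\partial}{\partial \beta_k} \sum_{k=0}^{K} x_{ik}\beta_k \right] \\
&= \sum_{i=1}^{n} \left[ y_i x_{ik} - n_i \cdot \frac{e^{\sum_{k=0}^{K} x_{ik}\beta_k}}{1 + e^{\sum_{k=0}^{K} x_{ik}\beta_k}} \cdot x_{ik} \right] \\
&= \sum_{i=1}^{n} \left( y_i x_{ik} - n_i p_i x_{ik} \right)
\end{aligned} \tag{17}$$

Setting the equations in (17) equal to zero results in a system of $K + 1$ nonlinear equations, each with $K + 1$ unknown variables. The solution of the system is a vector with elements $\beta_k$. Once it has been verified that the solution is the global maximum rather than a local maximum, this vector contains the maximum likelihood estimates. Because the system has no closed-form solution, the solution must be estimated numerically using an iterative process. Perhaps the most popular method for solving systems of nonlinear equations is the Newton-Raphson method [15].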
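As a minimal sketch of the iterative solution described above, the following Python code applies Newton-Raphson to the score equations (17) for a model with a single predictor, taking $n_i = 1$ for every observation. The function name `fit_logistic_newton`, the tolerance, and the toy data are illustrative assumptions and are not taken from the study. The update step uses the information matrix $X'WX$ with $W = \mathrm{diag}(p_i(1 - p_i))$, the standard form of the negative second derivative of (15).

```python
import math


def sigmoid(z):
    """Logistic response function, equation (7)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(x, y, tol=1e-8, max_iter=50):
    """Newton-Raphson for ln(p / (1 - p)) = b0 + b1 * x with 0/1 responses."""
    b0, b1 = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector, equation (17) with n_i = 1: sum_i (y_i - p_i) * x_ik
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix X'WX, where W = diag(p_i * (1 - p_i))
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) d = g in closed form
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1, iteration


# Made-up toy data: 2 events in 10 cases when x = 0, 6 events in 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1, 1] + [0] * 8 + [1] * 6 + [0] * 4

b0, b1, n_iter = fit_logistic_newton(x, y)
print(b0, b1, n_iter)
```

For a single binary predictor the maximum likelihood estimates have a closed form, so the result can be checked by hand: the intercept should equal ln(2/8) and the slope should equal the log odds ratio ln 6.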
1.5 Hosmer-Lemeshow Test

The Hosmer-Lemeshow test is a statistical test of goodness of fit for the logistic regression model. It is given by the statistic [10]

$$X^2 = \sum_{i=1}^{g} \frac{(O_i - n_i \bar{p}_i)^2}{n_i \bar{p}_i (1 - \bar{p}_i)} \tag{18}$$

where $g$ is the number of groups, $n_i$ is the total number of cases in the $i$-th group, $O_i$ is the number of event outcomes in the $i$-th group, and $\bar{p}_i$ is the average estimated probability of an event outcome for the $i$-th group. The test statistic approximately follows a chi-square distribution with $g - 2$ degrees of freedom.

1.6 Wald Test

The Wald test is used to test the significance of individual coefficients in the model. The statistic is calculated as [1], [4]:

$$W = \left(\frac{b_j}{S.E.(b_j)}\right)^2 \tag{19}$$

where $b_j$ is the estimate of the coefficient $\beta_j$ for the $j$-th explanatory variable and $S.E.(b_j)$ is its standard error. The hypotheses are

$$H_0: \beta_j = 0 \quad \text{versus} \quad H_1: \beta_j \neq 0, \qquad j = 1, 2, \ldots, k \tag{20}$$

Each Wald statistic is compared with a chi-square distribution with one degree of freedom. A large Wald statistic (with p-value < 0.05) leads to rejection of $H_0$, indicating that the variable contributes significantly to the model. For the Hosmer-Lemeshow goodness-of-fit statistic the reading is reversed: a large chi-square value (with p-value < 0.05) indicates weak fit, and a small chi-square value (with p-value closer to 1) indicates a good logistic regression model fit.

1.7 The Coefficient of Determination ($R^2$)

There are several $R^2$-like statistics that can be used to measure the strength of the association between the dependent variable and the predictor variables. Two commonly used statistics are [11]:

Cox and Snell's $R^2$

$$R_{cs}^2 = 1 - \left(\frac{L(\beta^{(0)})}{L(\hat{\beta})}\right)^{2/n} \tag{21}$$

Nagelkerke's $R^2$
$$R_N^2 = \frac{R_{cs}^2}{1 - L(\beta^{(0)})^{2/n}} \tag{22}$$

where $L(\hat{\beta})$ is the likelihood of the model with the estimated parameters, $L(\beta^{(0)})$ is the likelihood of the model with just the thresholds, and $n$ is the number of cases.

2. Data Analysis and Results

2.1 Data Description

The data set used in this research consists of a sample of 212 observations obtained from the records of the Directorate of Traffic, Garmian, for the period 2013-2014. The results were extracted using the Statistical Package for the Social Sciences (SPSS) V.22. The data include the following variables:

Y: The response variable is the accident victim, which is binary (dichotomous) in nature: 0 if the accident results in no injury and no fatality (56.6%), and 1 when there is at least one injury or at least one fatality resulting from the accident (43.4%).

X: The explanatory variables; all variables are nominal.

X1: Accident time (day 57.1%, night 42.9%)
X2: Car type (small car 45.3%, bus 42%, lorry 12.7%)
X3: Accident type (coup 21.7%, collision 64.2%, run over 14.2%)
X4: Location (inside the city 25%, outside the city 75%)
X5: Driving license (yes 81.6%, no 18.4%)
X6: High speed (yes 52.8%, no 47.2%)
X7: Due to drinking (yes 4.7%, no 95.3%)
X8: Road type (one side 96.7%, two side 0.9%, cycle 2.4%)
X9: Weather conditions (rainy 3.3%, sunny 95.3%, cloudy 1.4%)
X10: Other factors (crossing car illegally 6.1%, driver's carelessness 17.5%, losing control 2.8%, neglecting the status of the roads 0.5%, crossing to the wrong side 4.2%, not abiding by the traffic rules 30.7%, wrong turn 1.4%, other reasons 36.8%)
Table 1. Classification Table

| Step | Observed | Predicted Y1 = 0 | Predicted Y1 = 1 | Percentage Correct |
|---|---|---|---|---|
| Step 0 | Y1 = 0 | 120 | 0 | 100 |
| | Y1 = 1 | 92 | 0 | 0 |
| | Overall Percentage | | | 56.6 |
| Step 1 | Y1 = 0 | 72 | 33 | 68.6 |
| | Y1 = 1 | 35 | 72 | 67.3 |
| | Overall Percentage | | | 67.9 |

a. Constant is included in the model.
b. The cut value is .500

Table (1) above shows that, at the final iteration, 67.9% of the cases were correctly classified as either accident fatality or not, with a cut value of 0.5 for the predicted probabilities.

Table 2. Model summary

| Step | -2 Log likelihood | Cox & Snell R² | Nagelkerke R² |
|---|---|---|---|
| 1 | 253.788 (a) | .172 | .230 |

a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

From Table (2), the Cox & Snell R² indicates that 17.2% of the variation in the dependent variable is explained by the logistic model; the Nagelkerke R² indicates 23.0%.

Table 3. Hosmer and Lemeshow Test

| Step | Chi-square | df | Sig. |
|---|---|---|---|
| 1 | 8.167 | 8 | .417 |
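The summary figures in Tables 1 to 3 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only, and all helper names are hypothetical. The chi-square tail formula is the closed form that holds for an even number of degrees of freedom. The paper does not report the -2 log likelihood of the intercept-only model, so it is assumed here to be the value implied by the Step 1 row totals of Table 1 (105 and 107 cases).

```python
import math

# Step 1 counts from Table 1 (rows: observed Y = 0, 1; columns: predicted 0, 1).
step1 = [[72, 33],
         [35, 72]]


def percent_correct(table):
    """Row-wise and overall percentage of correctly classified cases."""
    n = sum(table[0]) + sum(table[1])
    row0 = 100 * table[0][0] / sum(table[0])
    row1 = 100 * table[1][1] / sum(table[1])
    overall = 100 * (table[0][0] + table[1][1]) / n
    return row0, row1, overall


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def null_deviance(n0, n1):
    """-2 log likelihood of an intercept-only model with n0 zeros and n1 ones."""
    n = n0 + n1
    return -2 * (n0 * math.log(n0 / n) + n1 * math.log(n1 / n))


def pseudo_r2(dev_null, dev_model, n):
    """Cox & Snell (eq. 21) and Nagelkerke (eq. 22) R-squared from -2 log likelihoods."""
    cox_snell = 1 - math.exp(-(dev_null - dev_model) / n)
    nagelkerke = cox_snell / (1 - math.exp(-dev_null / n))
    return cox_snell, nagelkerke


row0, row1, overall = percent_correct(step1)
hl_p_value = chi2_upper_tail_even_df(8.167, 8)  # Table 3: chi-square 8.167, df 8

# ASSUMPTION: the null -2LL is derived from the Step 1 row totals; the paper does not report it.
dev0 = null_deviance(sum(step1[0]), sum(step1[1]))
r2_cs, r2_n = pseudo_r2(dev0, 253.788, 212)  # Table 2: -2LL = 253.788, n = 212

print(f"percent correct: {row0:.1f} {row1:.1f} {overall:.1f}")
print(f"Hosmer-Lemeshow p-value: {hl_p_value:.3f}")
print(f"Cox & Snell R2: {r2_cs:.3f}, Nagelkerke R2: {r2_n:.3f}")
```

Under these assumptions the script should reproduce, to rounding, the Step 1 percentages of Table 1, the significance level in Table 3, and the two R² values in Table 2. Writing equations (21) and (22) in terms of -2 log likelihoods avoids forming the likelihoods themselves, which are extremely small numbers.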
The Hosmer and Lemeshow goodness-of-fit statistic, which is the appropriate test of model fit for logistic regression, has a chi-square value of 8.167. The corresponding p-value for the chi-square distribution with 8 degrees of freedom is 0.417, which means that the statistic is not statistically significant and therefore our model is well fitted.

Table 4. Variables in the equation

| Variable | B | S.E. | Wald | df | Sig. | Exp(B) | 95% C.I. for Exp(B), Lower | 95% C.I. for Exp(B), Upper |
|---|---|---|---|---|---|---|---|---|
| X1 | -.275 | .317 | .752 | 1 | .386 | .760 | .408 | 1.414 |
| X2 | .537 | .240 | 4.999 | 1 | .025 | 1.711 | 1.069 | 2.739 |
| X3 | -.301 | .266 | 1.276 | 1 | .259 | .740 | .439 | 1.247 |
| X4 | 1.335 | .384 | 12.070 | 1 | .001 | 3.800 | 1.789 | 8.070 |
| X5 | -.492 | .399 | 1.520 | 1 | .218 | .611 | .280 | 1.337 |
| X6 | 1.159 | .324 | 12.797 | 1 | .000 | 3.188 | 1.689 | 6.017 |
| X7 | -.089 | .773 | .013 | 1 | .909 | .915 | .201 | 4.162 |
| X8 | -.678 | .552 | 1.512 | 1 | .219 | .507 | .172 | 1.496 |
| X9 | .215 | .693 | .096 | 1 | .756 | 1.240 | .319 | 4.822 |
| X10 | -.088 | .073 | 1.457 | 1 | .227 | .916 | .794 | 1.056 |
| Constant | -2.215 | 2.280 | .943 | 1 | .331 | .109 | | |

a. Variable(s) entered on step 1: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10.

Table (4) shows the logistic regression coefficient, Wald test, and odds ratio for each of the predictors. Employing a 0.05 criterion of statistical significance, the variables X2, X4, and X6 had significant effects on accident fatality: their corresponding p-values are less than 0.05, so the null hypothesis was rejected for each of them. The predicted logit model was therefore established using these three variables, the only ones of the ten explanatory variables found significant by the Wald chi-square statistic. The others were dropped because they did not contribute significantly to the model. The fitted model for prediction is

ln(odds) = 0.472 X2 + 1.335 X4 + 1.159 X6

ln(odds) = 0.472 Car type + 1.335 Location + 1.159 High speed

For the car type variable, the parameter is equal to 0.537: for every unit increase in this variable, the odds of the event are multiplied by 1.711, after controlling for the impact of the other explanatory variables.
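The Wald statistic, odds ratio, and confidence limits reported in Table 4 follow directly from each pair of B and S.E. values. The sketch below recomputes them for the three significant predictors. The function name `wald_summary` is hypothetical, and the 1.96 multiplier for the 95% interval is an assumption about how the limits were produced.

```python
import math

# (B, S.E.) for the three significant predictors, copied from Table 4.
coefficients = {
    "X2 car type": (0.537, 0.240),
    "X4 location": (1.335, 0.384),
    "X6 high speed": (1.159, 0.324),
}


def wald_summary(b, se, z=1.96):
    """Wald chi-square (eq. 19), its 1-df p-value, Exp(B), and a 95% interval."""
    wald = (b / se) ** 2
    # For one degree of freedom, P(chi-square > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    odds_ratio = math.exp(b)
    interval = (math.exp(b - z * se), math.exp(b + z * se))
    return wald, p_value, odds_ratio, interval


results = {name: wald_summary(b, se) for name, (b, se) in coefficients.items()}
for name, (wald, p_value, odds_ratio, (lower, upper)) in results.items():
    print(f"{name}: Wald={wald:.3f} p={p_value:.3f} "
          f"Exp(B)={odds_ratio:.3f} CI=({lower:.3f}, {upper:.3f})")
```

Because the inputs are the rounded values printed in Table 4, the recomputed Wald statistics can differ from the tabulated ones in the second or third decimal place.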
For the location variable, the parameter is equal to 1.335: the odds of the event are 3.800 times higher per unit increase in this variable, after controlling for the impact of the other explanatory variables.

For the high speed variable, the parameter is equal to 1.159: the odds of the event are 3.188 times higher per unit increase in this variable, after controlling for the impact of the other explanatory variables.

This research was conducted to identify the variables that most influence accident fatality, and to quantify the effect of each, by applying a logistic regression model. The maximum likelihood method was used to estimate the parameters, and the Wald test was used to determine the significance of each explanatory variable's effect. Odds ratios were used to estimate predictive power, and the Hosmer-Lemeshow chi-square test was used to assess goodness of fit. The results show that logistic regression models fit such data and that the variables car type, location, and high speed have a clear influence on traffic accidents.

3. Conclusions

From the practical work it is concluded that:

1. The logistic regression model fitted to the explanatory variables and accident fatality has statistically significant interpretive and predictive ability.
2. The relationship between the response variable and the explanatory variables was weak, as the values of Cox & Snell R² and Nagelkerke R² are 0.172 and 0.23 respectively.
3. The variables car type, location, and high speed had significant effects on traffic accidents and a great influence on accidents that cause human victims.
4. High speed has the greatest effect on traffic accidents, location ranked second, and car type ranked third in terms of effect on traffic accidents.

4. Recommendations

1. The result of the study shows that car type, location, and high speed have a great effect on traffic accidents; thus the government must put more
traffic control cameras on the highways and put traffic signs and traffic lights on all the roads of the city.
2. The government should provide the population with detailed guidance on, and awareness of, the traffic rules, which can be studied in education programs.

References

[1] A. Afifi, V. A. Clark and S. May, Computer-Aided Multivariate Analysis, 4th Edition, Chapman & Hall/CRC, London, New York, 2004.

[2] A. J. Mhamad, Using Logistic Regression Model in Traffic Problems in Sulaimani, M.Sc. Thesis, University of Sulaimani, 2011.

[3] B. M. Hunde and Z. D. Aged, Statistical Analysis of Road Traffic Car Accident in Dire Dawa Administrative City, Eastern Ethiopia, Science Journal of Applied Mathematics and Statistics, 3 (2015), no. 6, 250-256. https://doi.org/10.11648/j.sjams.20150306.14

[4] D. C. Montgomery and G. C. Runger, Applied Statistics and Probability for Engineers, 3rd Edition, John Wiley and Sons, Inc., USA, 2002.

[5] E. B. Atitwa, Socio-Economic Determination of Low Birth Weight in Kenya: An Application of Logistic Regression Model, American Journal of Theoretical and Applied Statistics, 4 (2015), no. 6, 438-445. https://doi.org/10.11648/j.ajtas.20150406.14

[6] H. M. Fenta, D. L. Workie, Analysis of Factors that affect road traffic accidents in Bahir Dar city, North Western Ethiopia, Science Journal of Applied Mathematics and Statistics, 2 (2014), no. 5, 91-96. https://doi.org/10.11648/j.sjams.20140205.11

[7] J. S. Cramer, The Origins of Logistic Regression, Tinbergen Institute Discussion Paper, Faculty of Economics and Econometrics, University of Amsterdam, (2002). https://doi.org/10.2139/ssrn.360300

[8] J. N. Odhiambo, A. K. Wanjoya and A. G. Waititu, Modeling Road Traffic Accident Injuries in Nairobi County: Model Comparison Approach, American Journal of Theoretical and Applied Statistics, 4 (2015), no. 3, 178-184. https://doi.org/10.11648/j.ajtas.20150403.24
[9] K. S. Barasa, C. Muchwanju, Incorporating Survey Weights into Binary and Multinomial Logistic Regression Models, Science Journal of Applied Mathematics and Statistics, 3 (2015), no. 6, 243-249. https://doi.org/10.11648/j.sjams.20150306.13

[10] F. S. Osibanjo, G. A. Olalude, M. O. Akintunde and A. G. Ajala, Application of Logistic Regression Model to Admission Decision of Foundation Programme at University of Lagos, International Journal of Mathematics and Statistics Studies, 3 (2015), no. 4, 27-41.

[11] G. A. Osuji, M. Obubu, H. O. Obiora-Ilouno and C. N. Okoro, Post-Partum Hemorrhage in Delta State, Nigeria: A Logistic Approach, International Journal of Mathematics and Statistics Studies, 3 (2015), no. 5, 25-31.

[12] R. Jin, F. Yan and J. Zhu, Application of Logistic Regression Model in an Epidemiological Study, Science Journal of Applied Mathematics and Statistics, 3 (2015), no. 5, 225-229. https://doi.org/10.11648/j.sjams.20150305.12

[13] R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, 6th Edition, Pearson Education, Inc., USA, 2007.

[14] S. J. Press and S. Wilson, Choosing Between Logistic Regression and Discriminant Analysis, Journal of the American Statistical Association, 73 (1978), no. 364, 699-705. https://doi.org/10.1080/01621459.1978.10480080

[15] S. A. Czepiel, Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation, (2002).

[16] https://www.medcalc.org/manual/logistic-regression.php 22/6/2016

Received: June 3, 2017; Published: August 16, 2017