Phd Program in Transportation. Transport Demand Modeling. Session 8


1 PhD Program in Transportation. Transport Demand Modeling. Luis Martínez (based on the lessons of Anabela Ribeiro, TDM2010). Session 8: Generalized Linear Models. Phd in Transportation / Transport Demand Modelling 1/44

2 GZLM Count Data
Analysis steps: Model Formulation; Model Adjustment; Model Selection and Validation.

3 GZLM Count Data: Model Formulation
Random component - the dependent variable (distributed as Poisson or Negative Binomial).
Systematic component - the independent variables (explaining the dependent variable).
Link (connection) function - logarithmic.

4 GZLM Count Data
The type of model can be selected from a series of model types:
Poisson loglinear - specifies Poisson as the distribution and Log as the link function.
Negative binomial with log link - specifies Negative binomial (with a value of 1 for the ancillary parameter) as the distribution and Log as the link function.
Custom - allows the selection of a specific distribution together with a specific link function.

5 GZLM Count Data
The dependent variable is defined here.

6 GZLM Count Data
Factors - Factors are categorical predictors; they can be numeric or string.
Covariates - Covariates are scale predictors; they must be numeric.
Offset - The offset term is a structural predictor. Its coefficient is not estimated by the model but is assumed to have the value 1; thus, the values of the offset are simply added to the linear predictor of the target. This is especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest. When modeling accident rates for individual drivers, there is an important difference between a driver who has been at fault in one accident in three years of experience and a driver who has been at fault in one accident in 25 years. The number of accidents can be modeled as a Poisson or negative binomial response with a log link if the natural log of the driver's experience is included as an offset term.
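The role of the offset can be sketched in a few lines of Python. This is only an illustration: the function name and the accident rate of 0.04 per year are hypothetical and do not come from the slides. It shows that, with a log link, adding log(exposure) to the linear predictor makes the remaining terms describe a rate per unit of exposure.

```python
import math

def expected_accidents(linear_predictor, years_of_experience):
    """Mean of a log-link count model whose offset is log(exposure).

    The offset enters the linear predictor with a coefficient fixed at 1.
    """
    offset = math.log(years_of_experience)
    return math.exp(linear_predictor + offset)

# Hypothetical linear predictor: 0.04 at-fault accidents per year of driving.
eta = math.log(0.04)

for years in (3, 25):
    print(years, "years ->", round(expected_accidents(eta, years), 3), "expected accidents")
```

The same linear predictor gives very different expected counts for 3 and 25 years of exposure, which is the distinction the driver example is making.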

7 GZLM Count Data
Model Effects - The default model is intercept-only; the other model effects must be explicitly specified.
Main effects - Creates a main-effects term for each variable selected.
Interaction - Creates the highest-level interaction term for all selected variables.
Selecting a model with an intercept term.

8 GZLM Count Data: Model Adjustment
Maximum likelihood method (to estimate the variable coefficients and the dispersion parameter φ). Iterative computational estimation method:
1. For the exponential family
$$f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}, \qquad i = 1, \ldots, N,$$
the log-likelihood is given by
$$L(\theta, \phi; y) = \sum_{i=1}^{N} \left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}$$
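As an illustration (a standard textbook result, not taken from the slide), the Poisson distribution with mean $\mu_i$ can be written in this exponential-family form, which also shows why its scale parameter is fixed at 1:

```latex
f(y_i; \mu_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!}
             = \exp\left\{ y_i \ln \mu_i - \mu_i - \ln(y_i!) \right\}
% so that
\theta_i = \ln \mu_i, \qquad b(\theta_i) = e^{\theta_i} = \mu_i, \qquad
a_i(\phi) = 1, \qquad c(y_i, \phi) = -\ln(y_i!)
```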

9 GZLM Count Data: Model Adjustment, Variable Coefficients
The maximum likelihood method maximizes the log-likelihood function with respect to each β_j. Since the logarithm is a monotonically increasing function, the maximum of the log-likelihood is also the maximum of the likelihood. We must then solve the system of equations S(θ_i) = 0:
$$S(\theta_i) = \frac{\partial L(\theta, \phi; y)}{\partial \beta_j} = 0$$
Since it is a system of nonlinear equations, it must be solved iteratively:
Newton-Raphson
Fisher scoring
Hybrid (Fisher scoring for a set of initial iterations, then changed to Newton-Raphson)
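The iterative solution can be sketched for the simplest case, a Poisson model with log link, an intercept and one covariate. This is a minimal sketch under those assumptions, not the SPSS implementation; the function name and the test data are hypothetical. For the Poisson distribution with its canonical log link, Newton-Raphson and Fisher scoring produce the same update, so one loop covers both.

```python
import math

def fit_poisson_newton(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i (Poisson, log link)."""
    # Start from the intercept-only solution so the first step is well behaved.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log-likelihood.
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X, written out for the 2x2 case.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Update step: information^{-1} times score.
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data lying exactly on the curve exp(0.5 + 0.3 x).
x = [0.0, 1.0, 2.0, 3.0]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
print(fit_poisson_newton(x, y))
```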

10 GZLM Count Data: Model Adjustment, Scale Parameter φ
The scale parameter φ has a different nature than the vector β: β has a direct influence on the expected value of the variable Y_i, whereas φ reveals the data dispersion.
In some exponential families, such as Poisson, the parameter is fixed and not estimated.
In other distributions it must be estimated by maximizing the log-likelihood of the vector Y: the derivative with respect to φ is set equal to zero.

11 GZLM Count Data
Method - Estimation methods for the parameters can be selected here.
Scale parameter method - Maximum likelihood jointly estimates the scale parameter with the model effects. This option is not valid if the response has a negative binomial, Poisson, binomial, or multinomial distribution.

12 GZLM Count Data: Model Adjustment, Scale Parameter φ
Estimated through the Deviance (φ_D):
$$\phi_D = \frac{D}{N - p}, \qquad D = 2\,(L_c - L_m)$$
L_c is the maximized log-likelihood of the complete model (with all the variables).
L_m is the maximized log-likelihood of the model under analysis.
D is the Deviance, with N road segments and p variables.
If the Deviance is greater than N - p (that is, φ_D > 1), the model is overdispersed.

13 GZLM Count Data: Model Adjustment, Scale Parameter φ
Or through Pearson's χ² statistic (φ_χ²):
$$\phi_{\chi^2} = \frac{\chi^2}{N - p}, \qquad \chi^2 = \sum_{i=1}^{N} \frac{(y_i - \hat{y}_i)^2}{\operatorname{var}(\hat{y}_i)}$$
where χ² is Pearson's statistic, with N road segments and p variables.
If χ² is greater than N - p (that is, φ_χ² > 1), the model is overdispersed.
Both φ_D and φ_χ² must be close to 1 in order to use Poisson regression.

14 GZLM Count Data: Model Adjustment, Model Selection and Validation
The calculation of the previous two statistics guides the model choice: Deviance (φ_D) and Pearson's χ² (φ_χ²).
If these values are higher than N - p, then the model is overdispersed, or the data follow a distribution other than Poisson.
The next step is to evaluate the goodness of fit of the model against other models.
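The two dispersion statistics can be sketched as follows for a Poisson model, where var(ŷ_i) equals the fitted mean. The function names are hypothetical, and the deviance is written in its usual Poisson form, which is an assumption beyond what the slides state.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum[y*ln(y/mu) - (y - mu)], taking y*ln(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def pearson_chi2(y, mu):
    """Pearson statistic with the Poisson variance function var = mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def dispersion(statistic, n_obs, n_params):
    """Statistic divided by its degrees of freedom, N - p."""
    return statistic / (n_obs - n_params)

# Deviance of the intercept-only model reported later in the deck:
# 330.391 with 83 degrees of freedom (N = 84, p = 1).
print(round(dispersion(330.391, 84, 1), 3))
```

A ratio well above 1, as in this case, is the sign of overdispersion described above.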

15 GZLM Count Data
Analysis type - Type I analysis is generally appropriate when there are a priori reasons for ordering predictors in the model. Type III is more generally applicable. The chi-squared statistics can be computed using either Wald or likelihood-ratio tests.
Confidence intervals - Wald intervals are based on the assumption that parameters have an asymptotic normal distribution; profile likelihood intervals are more accurate but can be computationally expensive. The tolerance level is the criterion used to stop the iterative algorithm that computes the intervals.
Log-likelihood function - This controls the display format of the log-likelihood function. The full function includes an additional term that is constant with respect to the parameter estimates; it has no effect on parameter estimation.

16 GZLM Count Data
Print
Case processing summary - number and percentage of cases included and excluded from the analysis and the Correlated Data Summary table.
Descriptive statistics - descriptive statistics and summary information about the dependent variable, covariates, and factors.
Model information - dataset name, dependent variable or events and trials variables, offset variable, scale weight variable, probability distribution, and link function.
Goodness of fit statistics - Deviance and scaled deviance, Pearson chi-square and scaled Pearson chi-square, log-likelihood, Akaike's information criterion (AIC), finite sample corrected AIC (AICC), Bayesian information criterion (BIC), and consistent AIC (CAIC).
Model summary statistics - likelihood-ratio statistics for the model fit omnibus test and statistics for the Type I or III contrasts for each effect.
Parameter estimates - Displays parameter estimates and corresponding test statistics and confidence intervals. In addition, it can optionally display exponentiated parameter estimates.
Lagrange multiplier test - Lagrange multiplier test statistics for assessing the validity of a scale parameter that is computed using the deviance or Pearson chi-square. For the negative binomial distribution, this tests the fixed ancillary parameter.

17 GZLM Count Data: Model Selection and Validation
Overdispersion of data (maximum likelihood ratio and Lagrange tests)
Statistical significance (Wald test)
Predictive capacity (Omnibus test; Pseudo R²)
Comparison between models (maximized log-likelihood: AIC/AICC/BIC/CAIC)

18 GZLM Count Data: Model Selection and Validation
Overdispersion of data (a) - Maximum likelihood ratio
This test analyses the equality between the mean and the variance (standard Poisson regression) against the alternative of the variance exceeding the mean (Negative Binomial).
H0: K = 0; H1: K > 0
$$X^2 = -2\,[L(P) - L(NB)]$$
If the p-value is below 0.05, then the null hypothesis is rejected and overdispersion is identified.
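A minimal sketch of this likelihood-ratio comparison is given below. The log-likelihood values are hypothetical, and the p-value assumes a chi-squared reference distribution with 1 degree of freedom (one extra parameter, K). Because K = 0 lies on the boundary of the parameter space, some texts halve this p-value; that refinement is not applied here.

```python
import math

def likelihood_ratio_statistic(ll_poisson, ll_negbin):
    """X^2 = -2 [L(P) - L(NB)]."""
    return -2.0 * (ll_poisson - ll_negbin)

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))

# Hypothetical maximized log-likelihoods of the two fitted models.
x2 = likelihood_ratio_statistic(ll_poisson=-120.0, ll_negbin=-110.0)
p_value = chi2_sf_1df(x2)
print(x2, p_value, "overdispersion" if p_value < 0.05 else "no evidence of overdispersion")
```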

19 GZLM Count Data: Model Selection and Validation
Overdispersion of data (b) - Lagrange test on the negative binomial for K = 0 (H0: K = 0)
Significant: Negative Binomial
Not significant: Poisson
Possibility of overdispersion being related to an excess of zeros: Zero-Inflated Poisson (but not possible without the presence of segments with zero accidents).

20 GZLM Count Data: Model Selection and Validation
Testing the statistical significance of each coefficient β: asymptotic test, or Wald test
$$W_S = \frac{\hat{\beta}_j^{\,2}}{\operatorname{var}(\hat{\beta}_j)}, \qquad H_0: \hat{\beta}_j = 0$$
For low p-values the null hypothesis is rejected and the variable is influential in the model.
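The Wald statistic can be sketched in Python as below. The coefficient and standard error are the rounded values for the variable Drive in the parameter-estimates table shown later in the deck (B = 0.075, Std. Error = 0.0253); the function names are hypothetical, and the p-value assumes a chi-squared distribution with 1 degree of freedom.

```python
import math

def wald_statistic(beta_hat, std_error):
    """W = beta^2 / var(beta), with var(beta) = std_error^2."""
    return (beta_hat / std_error) ** 2

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

w = wald_statistic(0.075, 0.0253)
print(round(w, 3), round(chi2_sf_1df(w), 3))
```

The result is close to, but not identical with, the tabulated Wald chi-square of 8.743, because the table reports B and its standard error in rounded form.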

21 GZLM Count Data: Model Selection and Validation
If there is an indication of overdispersion, two other models must be tested:
Overdispersed Poisson regression (where a scale parameter is admissible)
Negative Binomial

22 GZLM Count Data: Model Selection and Validation
Predictive capacity of the model
Omnibus test:
$$X^2 = -2\,[LL(\beta_R) - LL(\beta_U)]$$
Pseudo R²:
$$R^2_{pseudo} = 1 - \frac{LL(\beta_U)}{LL(\beta_R)}$$

23 GZLM Count Data: Model Selection and Validation
Omnibus test
$$X^2 = -2\,[LL(\beta_R) - LL(\beta_U)]$$
If significant (p-value less than 0.05), the estimated model is better than the null model.
LL(β_U) is the log-likelihood of the unrestricted model.
LL(β_R) is the log-likelihood of the restricted model (without independent variables).
The number of degrees of freedom equals the difference between the number of parameters in the unrestricted and restricted models.

24 GZLM Count Data: Model Selection and Validation
Having the value of the Omnibus test and the value of LL(β_U), it is possible to obtain the log-likelihood LL(β_R) of the restricted model (without independent variables and with all parameters set to zero).
Having the values of LL(β_R) and LL(β_U), it is possible to estimate the Pseudo R²:
$$R^2_{pseudo} = 1 - \frac{LL(\beta_U)}{LL(\beta_R)}$$
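The two steps described above amount to simple arithmetic, sketched here with hypothetical function names and round illustrative numbers (not values from the case study).

```python
def restricted_log_likelihood(omnibus_x2, ll_unrestricted):
    """Invert X^2 = -2 [LL(R) - LL(U)] to recover LL(R)."""
    return ll_unrestricted - omnibus_x2 / 2.0

def pseudo_r2(ll_unrestricted, ll_restricted):
    """Pseudo R^2 = 1 - LL(U) / LL(R)."""
    return 1.0 - ll_unrestricted / ll_restricted

# Illustrative values only: Omnibus X^2 = 100 and LL(U) = -150.
ll_r = restricted_log_likelihood(100.0, -150.0)
print(ll_r, pseudo_r2(-150.0, ll_r))
```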

25 GZLM Count Data: Model Selection and Validation
Having the value of the Pseudo R², it is possible to estimate R² through an empirical relation established by Domencich and McFadden (1975).

26 GZLM Count Data: Model Selection and Validation
Other information criteria to compare models:
$$AIC = -2L(\hat{\beta}) + 2p^*$$
$$AICC = -2L(\hat{\beta}) + 2p^* \frac{N}{N - p^* - 1}$$
$$BIC = -2L(\hat{\beta}) + p^* \ln(N)$$
$$CAIC = -2L(\hat{\beta}) + p^* (\ln(N) + 1)$$
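These four criteria can be computed directly from the maximized log-likelihood, as in the sketch below. The function name is hypothetical. The inputs are taken from the intercept-only goodness-of-fit output shown later in the deck (log-likelihood -246.185, 84 observations), with one estimated parameter assumed.

```python
import math

def information_criteria(log_likelihood, n_params, n_obs):
    """AIC, AICC, BIC and CAIC in small-is-better form."""
    minus_2ll = -2.0 * log_likelihood
    return {
        "AIC": minus_2ll + 2 * n_params,
        "AICC": minus_2ll + 2 * n_params * n_obs / (n_obs - n_params - 1),
        "BIC": minus_2ll + n_params * math.log(n_obs),
        "CAIC": minus_2ll + n_params * (math.log(n_obs) + 1),
    }

ic = information_criteria(-246.185, n_params=1, n_obs=84)
for name, value in ic.items():
    print(name, round(value, 3))
```

With these inputs the CAIC agrees, up to rounding of the log-likelihood, with the value of 497.800 reported in that output.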

27 GZLM Count Data
$$X^2 = -2\,[LL(\beta_R) - LL(\beta_U)]$$
The Omnibus test verifies whether the explained variance is significantly greater than the unexplained variance.
The deviance compares the given model with the full model (the full model has one parameter for each observation and therefore fits perfectly). The deviance of a perfectly fitting model is 0.
The deviance can be used to obtain information about overdispersion (testing H0: a = 0). In the present case we reject that hypothesis, since the deviance value is higher than the critical χ² value; therefore the p-value is 0.00.
When Value/df > 1, there is a sign of overdispersion.

28 GZLM Count Data
Type III tests examine the significance of each partial effect, that is, the significance of an effect with all the other effects in the model. The chi-squared is a likelihood-ratio statistic for testing the significance of the effect added to the model containing all of the other effects.
Wald test for statistical inference on the β coefficients of the independent variables.

29 GZLM Count Data: Testing overdispersion
A regression test is used to assess whether there is overdispersion. A linear regression is estimated where Z is regressed on W (Z = bW). If b is statistically significant with g(E[y_i]) = E[y_i] and with g(E[y_i]) = E[y_i]², then H0 is rejected.
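The slide does not define Z and W. The sketch below assumes the regression-based test of Cameron and Trivedi, with Z_i = ((y_i - μ_i)² - y_i) / (μ_i √2) and W_i = g(μ_i) / (μ_i √2), where μ_i is the fitted Poisson mean; these definitions, the function name and the data are assumptions, so treat this as an illustration rather than the exact procedure used in the course.

```python
import math

def overdispersion_regression(y, mu, g):
    """Regress Z on W through the origin; return the slope b and its t statistic."""
    root2 = math.sqrt(2.0)
    z = [((yi - mi) ** 2 - yi) / (mi * root2) for yi, mi in zip(y, mu)]
    w = [g(mi) / (mi * root2) for mi in mu]
    sum_ww = sum(wi * wi for wi in w)
    b = sum(wi * zi for wi, zi in zip(w, z)) / sum_ww
    # Residual variance with n - 1 degrees of freedom (one slope, no intercept).
    rss = sum((zi - b * wi) ** 2 for wi, zi in zip(w, z))
    std_error = math.sqrt(rss / (len(y) - 1) / sum_ww)
    return b, b / std_error

# Hypothetical observed counts and fitted Poisson means.
y = [0, 10, 0, 12]
mu = [5.0, 5.0, 6.0, 6.0]

print(overdispersion_regression(y, mu, g=lambda m: m))       # g(E[y]) = E[y]
print(overdispersion_regression(y, mu, g=lambda m: m ** 2))  # g(E[y]) = E[y]^2
```

A large positive t statistic for b in either regression points towards rejecting the hypothesis of equal mean and variance.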

30 GZLM Count Data: Testing overdispersion
Predicted value of mean of response - Saves model-predicted values for each case in the original response metric.
Predicted value of linear predictor - Saves model-predicted values for each case in the metric of the linear predictor (transformed response via the specified link function). When the response distribution is multinomial, the procedure saves the predicted value for each category of the response, except the last, up to the number of specified categories to save.

31 GZLM Count Data: Testing overdispersion
[Two spreadsheet SUMMARY OUTPUT tables (regression statistics, ANOVA, coefficients), each with 84 observations and the intercept fixed at 0: one for the model with g(E[y_i]) = E[y_i] and one for the model with g(E[y_i]) = E[y_i]².]
The results of both linear regressions show that b is statistically significant in the two models. Therefore H0 is rejected.

32 GZLM Count Data: Goodness of fit
Goodness of Fit (a)
Statistic | Value | df | Value/df
Deviance | 330.391 | 83 | 3.981
Scaled Deviance | 330.… | … |
Pearson Chi-Square | 358.… | … | 4.314
Scaled Pearson Chi-Square | 358.… | … |
Log Likelihood (b) | -246.185 | |
Akaike's Information Criterion (AIC) | 494.… | |
Finite Sample Corrected AIC (AICC) | … | |
Bayesian Information Criterion (BIC) | ….800 | |
Consistent AIC (CAIC) | 497.800 | |
Dependent Variable: Accident. Model: (Intercept).
a. Information criteria are in small-is-better form.
b. The full log likelihood function is displayed and used in computing information criteria.
The Omnibus test can be used to estimate the pseudo-R²: pseudo-R² = 0.312 (relative to the model with only the constant).
From the value of the pseudo-R² it is possible to estimate the LL of the restricted model (with only the constant).

33 GZLM Count Data: Overdispersed Poisson
Since there is an indication of overdispersion, two other models must be tested:
Overdispersed Poisson regression (where a scale parameter is admissible)
Negative Binomial

34 GZLM Count Data: Overdispersed Poisson
The main difference from the Poisson regression model is that the scale parameter is estimated and not fixed. The Pearson chi-squared method is used to estimate the scale parameter.
The scale parameter has a different nature than the vector β of coefficients: β has a direct influence on the expected value of the variable Y_i, whereas the scale parameter reveals the data dispersion.
In other distributions it must be estimated by maximizing the log-likelihood of the vector Y.

35 GZLM Count Data: Overdispersed Poisson
Parameter Estimates
Parameter | B | Std. Error | 95% Wald CI Lower | 95% Wald CI Upper | Wald Chi-Square | df | Sig.
(Intercept) | -.826 | .3553 | -1.522 | -.130 | 5.404 | 1 | .020
AADT1 | 8.122E-005 | 1.8086E-005 | 4.577E-005 | .000 | 20.166 | 1 | .000
AADT2 | .001 | .0001 | .000 | .001 | 23.114 | 1 | .000
Median | -.060 | .0338 | -.126 | .006 | 3.156 | 1 | .076
Drive | .075 | .0253 | .025 | .124 | 8.743 | 1 | .003
(Scale) | 2.361 (a) | | | | | |
Dependent Variable: Accident. Model: (Intercept), AADT1, AADT2, Median, Drive.
a. Computed based on the Pearson chi-square.
The coefficient estimates are similar to the ones obtained with the Poisson model. However, the standard errors are larger because they are adjusted by the scale parameter (when there is overdispersion the variance is larger, so the standard errors of the parameters are inflated).
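The adjustment of the standard errors can be sketched as follows, assuming the usual quasi-likelihood rule that the estimated covariance matrix is multiplied by the scale parameter. Under that assumption the standard errors grow by the square root of the scale and the Wald chi-squares shrink by the scale itself. The function names are hypothetical; 2.361 is the scale reported in the table above.

```python
import math

def inflate_standard_error(poisson_std_error, scale):
    """Standard error after multiplying the covariance matrix by the scale."""
    return poisson_std_error * math.sqrt(scale)

def scaled_wald(poisson_wald, scale):
    """Wald chi-square after the same adjustment."""
    return poisson_wald / scale

scale = 2.361
print("standard errors multiplied by", round(math.sqrt(scale), 3))
print("Wald chi-squares divided by", scale)
```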

36 GZLM Count Data: Negative Binomial
Estimate the Negative Binomial model, with the scale parameter estimated by maximum likelihood.

37 GZLM Count Data: Negative Binomial
The Lagrange Multiplier test: this test can only be performed if the scale parameter is fixed.


40 GZLM Count Data: Negative Binomial
The negative binomial model is the same as the Poisson model when the negative binomial model's ancillary (dispersion) parameter, k, equals 0. The Lagrange multiplier test is a test of the null hypothesis that k = 0. A non-significant Lagrange test indicates that k cannot be assumed to be different from 0, and hence a Poisson model would be preferred over a negative binomial model.
Yet, if LL(P) is not substantially smaller than LL(NB), the use of a Negative Binomial might not improve the model results (even with overdispersion).
An alternative approach is to run a Negative Binomial with k = 0 and evaluate the Lagrange Multiplier test. If the test is not significant, overdispersion will not be a problem for the available data.

41-42 GZLM Count Data: Poisson example, accidents at intersections (Example 1)
Washington, Simon P., Karlaftis, Matthew G. and Mannering, Fred L. (2003) Statistical and Econometric Methods for Transportation Data Analysis, CRC.

43 GZLM Count Data: Negative Binomial example, accidents at intersections (Example 1)
Washington, Simon P., Karlaftis, Matthew G. and Mannering, Fred L. (2003) Statistical and Econometric Methods for Transportation Data Analysis, CRC.

44 Objectives
To evaluate the importance/impact of the International Friction Index (IFI) on the level of accidents.
To run the previous tests on clusters of data with different but homogeneous characteristics, in order to define values of IFI and other road characteristics related to pavement properties.
Test models only with IFI.
Test models with IFI and other variables.

45 Methodology
Generalized Linear Models (SPSS): Model Formulation; Model Adjustment; Model Validation.

46 Model Formulation
1º - Response variable - Accident frequency between 1997 and 2002
A. Standard Poisson distribution, with a unitary φ parameter
B. Overdispersed Poisson, where φ is > 1
C. Negative Binomial, if overdispersion exists - it allows for greater variance and absorbs overdispersion (K > 0)
(model estimation: comparison of the three models)

47 Note, two different things:
B) Overdispersed Poisson, where φ is > 1
C) Negative Binomial, if overdispersion exists - it allows for greater variance and absorbs overdispersion (K > 0)
The overdispersed Poisson is tested using 1 as the limit for the dispersion parameter.
The Negative Binomial is tested by contrasting K = 0 with K > 0.
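The difference between the two approaches shows up in the variance each one implies for a given mean. The sketch below assumes the usual forms, Var = φ·μ for the overdispersed Poisson and Var = μ + K·μ² for the negative binomial; these forms and the function names are assumptions, since the slides do not write the variance functions out.

```python
def variance_overdispersed_poisson(mu, phi):
    """Variance proportional to the mean; phi = 1 recovers the Poisson."""
    return phi * mu

def variance_negative_binomial(mu, k):
    """Variance growing quadratically with the mean; k = 0 recovers the Poisson."""
    return mu + k * mu ** 2

for mu in (1.0, 4.0, 10.0):
    print(mu, variance_overdispersed_poisson(mu, phi=2.0), variance_negative_binomial(mu, k=0.5))
```

This is why the first model is checked against φ = 1 and the second against K = 0.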

48 Model Formulation
1º - Response variable - Accident frequency between 1997 and 2002
Lagrange Multiplier test for choosing between Poisson and Negative Binomial.
Evaluation of the behaviour of the dispersion parameter and of statistical inference through maximum likelihood and Pseudo R².
If dispersion is not a problem, then Poisson is adequate, even with worse values for maximum likelihood and Pseudo R² (which happens in smaller units).

49 Model Formulation
Initial assumptions on the unit of analysis (road segment): 6-year period, 1 km extension.
Time period - Increasing the time period could reduce the number of zeros, but the probability of changes in other factors would increase.
Section length - Increasing the section length could increase section heterogeneity.

50 Model Formulation
Data: accidents with victims registered by the national security authorities.
Overdispersion characterized by an excess of zeros: Zero-Inflated Poisson regression assumes a two-state process (places with zero accidents, and places with a probability of occurrence of accidents). This is not possible here, because no place is 100% safe.

51 Model Formulation
Independent variables (for each road segment, 6 years, 1 km):
IFI - International Friction Index
pheav - % heavy vehicles
AS - Average speed
purb - % crossing urban area
prcross - % road crossings
ptext - % turn extensions
CDR - Least favourable radius class
CI - Longitudinal inclination class
AP - Average precipitation

52 Model Formulation
Offset variable - If desired, the values of some variable can be added to the linear predictor of the dependent variable (the linear predictor is the right-hand side of the model equation). Alternatively, a fixed value may be added.
AcAADT - Accumulated Annual Average Daily Traffic, representing the exposure to accident risk.
AcAADT = log(AADT × 6 years × 365 days)
AADT - Annual Average Daily Traffic

53 The offset itself is an explanatory variable, which may be both subject and time-period specific, whose coefficient is fixed at 1.0 and is thus constant throughout the fitting process. By constraining the coefficient of the exposure variable to equal one, you transform the model into a model of rates (e.g., injuries per unit of exposure instead of the probability that the person will be injured). This occurs because the link function is logarithmic: with a coefficient of 1, the log of the exposure can be subtracted from both sides of the equation, which is equivalent to making the exposure the denominator of the left-hand side of the equation.
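The step from counts to rates can be written out explicitly. In the lines below, $t_i$ stands for the exposure of segment $i$ and $x_i$ for a single covariate; both symbols are introduced here only for illustration.

```latex
\ln E[A_i] = \ln t_i + \beta_0 + \beta_1 x_i
\;\Longrightarrow\;
\ln E[A_i] - \ln t_i = \beta_0 + \beta_1 x_i
\;\Longrightarrow\;
\ln\!\left(\frac{E[A_i]}{t_i}\right) = \beta_0 + \beta_1 x_i
\;\Longrightarrow\;
\frac{E[A_i]}{t_i} = \exp(\beta_0 + \beta_1 x_i)
```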

54 Model adjustment and validation
Method: maximum likelihood estimation, with the hybrid iterative method.
Deviance and Pearson's χ² statistic for the estimation of φ (dispersion parameter).
Wald test for statistical inference on the β coefficients of the explanatory variables.
Omnibus test, log-likelihood and information criteria (AIC, AICC, BIC and CAIC) for comparing dispersion levels.
Model predictive capacity: Pseudo R² and R².

55 Model adjustment and validation
Models to be estimated (Poisson, Overdispersed Poisson and Negative Binomial)
1º For all the sections:
1.1 Univariate, with IFI as the only independent variable. Model: All_IFI, A = β0 + β1*IFI
1.2 Introduction of all the other explanatory variables. Model: All_Multi, A = β0 + β1*IFI + β2*pheav + β3*AS + β4*purb + β5*pcross + β6*ptext + β7*CDR + β8*CI + β9*AP

56 Model adjustment and validation
Models to be estimated (Poisson, Overdispersed Poisson and Negative Binomial)
Intermediate step: creation of clusters according to road characteristics (not included in this analysis)
2º For each cluster:
1.1 Univariate, with IFI as the only independent variable. Model: ARX_IFI
1.2 Introduction of all the other explanatory variables. Model: ARX_Multi

57 Objectives
To evaluate the importance/impact of the International Friction Index (IFI) on the level of accidents, through estimation of the best predicting model.
Test models only with IFI.
Test models with IFI and other variables.

58 Summary:
1º Checking for overdispersion of data: Lagrange test on the negative binomial for the K parameter (H0: K = 0). Significant: Negative Binomial. Not significant: Poisson.
2º Estimating Poisson
3º Estimating Poisson with dispersion parameter
4º Estimating Negative Binomial

59 1º Checking for overdispersion of data
Negative binomial with log link function, with the parameter K = 1.
Note: if this is not done, the K parameter is set to zero (K = 0) by default, but the results will not differ substantially in either case.

60 1º Checking for overdispersion of data
Print the Lagrange Multiplier test for the K parameter.

61 1º Checking for overdispersion of data
The test is highly significant, so the null hypothesis is rejected. There is a strong indication of the inadequacy of the Poisson model.
Yet, if L(P) is not substantially smaller than L(NB), the use of a Negative Binomial might not improve the model results (even with overdispersion).
An alternative approach is to run a Negative Binomial with k = 0 and evaluate the Lagrange Multiplier test. If the test is not significant, overdispersion will not be a problem for the available data.

62-68 1º Estimating Poisson (Model Formulation)
1.1 Univariate, with IFI as the only independent variable. Model: All_IFI, A = β0 + β1*IFI

69-70 1º Estimating Poisson (Model Adjustment)
1.1 Univariate, with IFI as the only independent variable. Model: All_IFI, A = β0 + β1*IFI

71-75 1º Estimating Poisson (Model Validation)
1.1 Univariate, with IFI as the only independent variable. Model: All_IFI, A = β0 + β1*IFI
Overdispersion (> 1); to be compared with other models.
Significance for the model.

76-81 1º Estimating Poisson with overdispersion (Model Validation)
1.1 Univariate, with IFI as the only independent variable. Model: All_IFI, A = β0 + β1*IFI

82 1. Estimating Negative Binomial (Model Validation) 1.1 Univariate, with only IFI as independent variable Model: All_IFI A = β0 + β1*ifi

83 1. Estimating Negative Binomial (Model Validation) 1.1 Univariate, with only IFI as independent variable Model: All_IFI A = β0 + β1*ifi

84 1. Estimating Negative Binomial (Model Validation) 1.1 Univariate, with only IFI as independent variable Model: All_IFI A = β0 + β1*ifi

85 1. Estimating Negative Binomial (Model Validation) 1.1 Univariate, with only IFI as independent variable Model: All_IFI A = β0 + β1*ifi

86 1. Estimating Negative Binomial (Model Validation) 1.1 Univariate, with only IFI as independent variable Model: All_IFI A = β0 + β1*ifi

87 1. Estimating Negative Binomial (Model Validation) 1.1 Univariate, with only IFI as independent variable Model: All_IFI A = β0 + β1*ifi
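
To make the difference from the Poisson model concrete, the sketch below evaluates a negative binomial probability and variance. It assumes the common NB2 parameterization, in which the variance is mu + alpha*mu^2, with the ancillary parameter alpha fixed at 1 as stated in the model-type description earlier in the session. The function names are hypothetical, and the parameterization is an assumption about the software, not something the slides confirm.

```python
import math

def nb2_pmf(y, mu, alpha=1.0):
    """Probability of count y under NB2 with mean mu and ancillary parameter alpha."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def nb2_variance(mu, alpha=1.0):
    """NB2 variance; it exceeds the Poisson variance (mu) whenever alpha > 0."""
    return mu + alpha * mu ** 2

# Hypothetical mean of 3 accidents per section
p_zero = nb2_pmf(0, mu=3.0)
total = sum(nb2_pmf(k, mu=3.0) for k in range(500))
var_nb = nb2_variance(3.0)
```

With alpha = 1 the distribution reduces to a geometric distribution, which is why the probability of zero events is simply 1/(1 + mu).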

88 Model adjustment and validation (A) TODOS_IF: all 254 sections, with only IFI as explanatory variable. Comparison between models: model selection indicators
Columns: Indicator; Poisson regression (Value, df, ϕD/ϕχ2); Poisson regression with overdispersion, with scale parameter from Pearson χ2 (Value, df, ϕD/ϕχ2); Negative binomial regression (Value, df, ϕD/ϕχ2)
Deviance 773, ,000 2, , ,000 2, , ,000 0,614
Scaled Deviance 733, , , , , ,000
Pearson χ2 798, ,000 3, , ,000 3, , ,000 0,578
Scaled Pearson χ2 798, , , , , ,000
Log Likelihood -727, , ,117
Adjusted Log Likelihood -229,437
AIC 1458, , ,354
AICC 1458, , ,401
BIC 1458, , ,428
CAIC 1467, , ,428
Omnibus Test 792, ,045 94,478
p-value 0,000 0,000 0,000
Pseudo-R2 ???
R2 ???
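
The information criteria listed among the model selection indicators follow from the log-likelihood. The sketch below uses the standard textbook definitions and the Poisson log-likelihood quoted later in the slides (-727,431). The number of estimated parameters, k = 2 (intercept and IFI), is an assumption, and n = 254 is the number of sections. The function name `information_criteria` is hypothetical.

```python
import math

def information_criteria(log_lik, k, n):
    """Standard information criteria from a log-likelihood, k parameters, n cases."""
    m2ll = -2.0 * log_lik
    return {
        "AIC": m2ll + 2 * k,
        "AICC": m2ll + 2 * k * n / (n - k - 1),   # finite-sample corrected AIC
        "BIC": m2ll + k * math.log(n),
        "CAIC": m2ll + k * (math.log(n) + 1),     # consistent AIC
    }

# Poisson model: log-likelihood from the slides, k = 2 assumed, n = 254 sections
poisson_ic = information_criteria(log_lik=-727.431, k=2, n=254)
```

Smaller values indicate a better trade-off between fit and complexity, so the criteria are only meaningful when compared across models fitted to the same data.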

89 Model adjustment and validation Calculation of pseudo-$R^2$ and $R^2$ (approx.) for each model: Poisson regression. $\text{Omnibus} = -2\,[L(\beta_0) - L(\beta)]$: $792{,}769 = -2\,[L(\beta_0) - (-727{,}431)]$, so $L(\beta_0) = -1123{,}82$. $\rho^2 = 1 - \dfrac{L(\beta)}{L(\beta_0)} = 1 - \dfrac{-727{,}431}{-1123{,}82} = \dots$, $R^2 \sim \dots$

90 Model adjustment and validation Calculation of pseudo-$R^2$ and $R^2$ (approx.) for each model: Poisson with overdispersion. $\text{Omnibus} = -2\,[L(\beta_0) - L(\beta)]$: $250{,}045 = -2\,[L(\beta_0) - (-727{,}431)]$, so $L(\beta_0) = -852{,}454$. $\rho^2 = 1 - \dfrac{L(\beta)}{L(\beta_0)} = 1 - \dfrac{-727{,}431}{-852{,}454} = \dots$, $R^2 \sim \dots$

91 Model adjustment and validation Calculation of pseudo-$R^2$ and $R^2$ (approx.) for each model: Negative Binomial. $\text{Omnibus} = -2\,[L(\beta_0) - L(\beta)]$: $94{,}478 = -2\,[L(\beta_0) - (-633{,}117)]$, so $L(\beta_0) = -680{,}356$. $\rho^2 = 1 - \dfrac{L(\beta)}{L(\beta_0)} = 1 - \dfrac{-633{,}117}{-680{,}356} = \dots$, $R^2 \sim \dots$
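
The three calculations above share the same two steps: recover the null log-likelihood from the omnibus statistic, then form the likelihood ratio index. A minimal sketch follows, using the log-likelihood and omnibus values quoted on the slides. The function names are hypothetical, and decimal commas are written as decimal points.

```python
def null_log_likelihood(log_lik, omnibus):
    """Solve Omnibus = -2 * (L0 - L) for the null log-likelihood L0."""
    return log_lik - omnibus / 2.0

def pseudo_r2(log_lik, omnibus):
    """Likelihood ratio index: rho^2 = 1 - L / L0."""
    return 1.0 - log_lik / null_log_likelihood(log_lik, omnibus)

# (log-likelihood, omnibus statistic) as quoted on the slides
models = {
    "Poisson": (-727.431, 792.769),
    "Poisson with overdispersion": (-727.431, 250.045),
    "Negative binomial": (-633.117, 94.478),
}
rho2 = {name: pseudo_r2(ll, om) for name, (ll, om) in models.items()}
```

The conversion from the pseudo-$R^2$ to an approximate $R^2$ is read from a reference relationship rather than computed, so it is left out of the sketch.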

92 Model adjustment and validation GZLM - Case Study Calculation of pseudo-R2 and R2 (approx.) for each model
Columns: Indicator; Poisson regression (Value, df, ϕD/ϕχ2); Poisson regression with overdispersion, with scale parameter from Pearson χ2 (Value, df, ϕD/ϕχ2); Negative binomial regression (Value, df, ϕD/ϕχ2)
Deviance 773, , , , , ,614
Scaled Deviance 733, , ,
Pearson χ2 798, , , , , ,578
Scaled Pearson χ2 798, ,
Log Likelihood -727, , ,117
Adjusted Log Likelihood -229,437
AIC 1458, , ,354
AICC 1458, , ,401
BIC 1458, , ,428
CAIC 1467, , ,428
Omnibus Test 792, ,045 94,478
p-value
Pseudo-R2 0, , ,
R2 0,7 0,3 0,0...

93 Model adjustment and validation (A) Sections with only IFI as explanatory variable. Comparison between models: calibration of β parameters and scale parameter, Wald test and confidence intervals
Columns: Parameter | Poisson regression (Value, Std. error, p-value) | Poisson regression with overdispersion, with scale parameter from Pearson χ2 (Value, Std. error, p-value) | Negative binomial regression (Value, Std. error, p-value)
Intercept (β0) | -3,199 0,079 0,000 | -3,199 0,140 0,000 | -3,382 0,276 0,000
IFI (β1) | -0,084 0,003 0,000 | -0,084 0,005 0,000 | -0,078 0,009 0,000
Scale parameter 1,000 0,000 3,171 1,000
Confidence intervals
Intercept (β0) | [-3,353;-3,045] | [-3,473;-2,925] | [-3,930;-2,848]
IFI (β1) | -0,084 ± 1,96 × 0,003 | [-0,094;-0,073] | [-0,095;-0,061]
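
The Wald test and confidence intervals in this comparison can be reproduced from an estimate and its standard error. The sketch below assumes 95% intervals based on the normal approximation, and that the overdispersed model inflates each Poisson standard error by the square root of the scale parameter. Both are labelled assumptions, although they agree with the tabulated values to rounding. The function names are hypothetical.

```python
import math

def wald_summary(beta, se, z_crit=1.96):
    """Wald chi-square, two-sided normal p-value and confidence interval."""
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return {
        "wald_chi2": z * z,
        "p_value": p_value,
        "ci": (beta - z_crit * se, beta + z_crit * se),
    }

def scaled_se(se, scale):
    """Standard error after multiplying the variance by the scale parameter."""
    return se * math.sqrt(scale)

# Intercept of the Poisson model: estimate -3.199, standard error 0.079
intercept = wald_summary(-3.199, 0.079)

# Same standard error under overdispersion with scale parameter 3.171
intercept_se_scaled = scaled_se(0.079, 3.171)
```

The scaling leaves the coefficient estimates unchanged and only widens the intervals, which is why the Poisson and overdispersed Poisson columns share the same β values.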

