Phd Program in Transportation. Transport Demand Modeling. Session 8

Size: px
Start display at page:

Download "Phd Program in Transportation. Transport Demand Modeling. Session 8"

Transcription

1 Phd Program in Transportation Transport Demand Modeling Luis Martínez (based on the Lessons of Anabela Ribeiro TDM2010) Session 8 Generalized Linear Models Phd in Transportation / Transport Demand Modelling 1/44

2 GZLM Count Data Analysis steps Model Formulation Model Adjusment Model Selection and Validation Phd in Transportation / Transport Demand Modelling 2/44

3 GZLM Count Data Model Formulation Random Component - Dependent Variable (distributed as a Poisson or Negative Binomial) Systematic Component - Independent variables (explaining the dependent variable) Link or connection function (logarithmic) Phd in Transportation / Transport Demand Modelling 3/44

4 GZLM Count Data The type of model could be selected among a series of model types Poisson loglinear-specifies Poisson as the distribution and Log as the link function. Negative binomial with log link. Specifies Negative binomial (with a value of 1 for the ancillary parameter) as the distribution and Log as the link function. The custom tool allows the selection of specific models (specific distribuition) together with a specific link function Phd in Transportation / Transport Demand Modelling 4/44

5 GZLM Count Data The dependent variable is defined here Phd in Transportation / Transport Demand Modelling 5/44

6 GZLM Count Data Factors - Factors are categorical predictors; they can be numeric or string. Covariates - Covariates are scale predictors; they must be numeric Offset - The offset term is a structural predictor. Its coefficient is not estimated by the model but is assumed to have the value 1; thus, the values of the offset are simply added to the linear predictor of the target. This is especially useful in Poisson regression models, where each case may have different levels of exposure to the event of interest. When modeling accident rates for individual drivers, there is an important difference between a driver who has been at fault in one accident in three years of experience and a driver who has been at fault in one accident in 25 years! The number of accidents can be modeled as a Poisson or negative binomial response with a log link if the natural log of the experience of the driver is included as an offset term. Phd in Transportation / Transport Demand Modelling 6/44

7 GZLM Count Data Model Effects - The default model is intercept-only, the other model effects must be explicitly specified Main effects - Creates a maineffects term for each variable selected. Interaction - Creates the highestlevel interaction term for all selected variables. Selecting a model with an intercept term Phd in Transportation / Transport Demand Modelling 7/44

8 GZLM Count Data Model Adjustment Maximum likelihood method (to estimate variables coefficients and dispersion parameter φ) Interactive computational estimation method: 1. For the exponential family yi i b i f yi i, exp c yi,, y a i i 1. The Log of Maximum likelihood estimation is given by L, ; y N i 1 exp y i i a i b i c y i, Phd in Transportation / Transport Demand Modelling 8/44

9 GZLM Count Data Model Adjustment Variable Coefficients Maximum likelihood method maximizes the likelihood function Y i in relation to βj, and therefore it allows to determine the absolute maximum (since the logarithmic function always grows). We must then solve the system of equations S(θi)=0 S( i ) L(, Since it is a system of non linear equations it must be estimated iteratively Newton-Raphson Fisher-Scoring j ; y) Hybrid (Fisher on a set of initial iterations and than changed to Newton) Phd in Transportation / Transport Demand Modelling 9/44

10 GZLM Count Data Model Adjustment Scale Parameter ϕ The scale parameter φ has a different nature then vector β β has a direct influence on the expected value of variable Yi, and the parameter reveals the data dispersion On some exponential families such as Poisson, the parameter is fixed and not estimated On other distributions it must be estimated through maximum likelihood log for the Y vector, by a derivate in order to φ and being equal to zero. Phd in Transportation / Transport Demand Modelling 10/44

11 GZLM Count Data Method - Estimation methods for the parameters could be selected here Scale parameter method - Maximumlikelihood jointly estimates the scale parameter with the model effects. This option is not valid if the response has a negative binomial, Poisson, binomial, or multinomial distribution. Phd in Transportation / Transport Demand Modelling 11/44

12 GZLM Count Data Model Adjustment - Scale Parameter ϕ Estimated through Deviance φ D D D N p c 2( L N L p m ) Lc is the maximum likelihood log of the complete model (with all the variables) Lm is the maximum likelihood log of the model under analysis If Deviance is superior to N-p the model is overdispersed N road segments and p variables D is the Deviance Phd in Transportation / Transport Demand Modelling 12/44

13 GZLM Count Data Model Adjustment - Scale Parameter ϕ Or through the statistic χ 2 of Pearson (φ χ2 ) N N p N p i 1 ( yi yˆ i ) var( yˆ ) i 2 Where χ 2 is the statistic of Pearson If χ 2 is superior to N-p the model is overdispersed N road segments and p variables Both must be close to 1 in order to use Poisson Regression Phd in Transportation / Transport Demand Modelling 13/44

14 GZLM Count Data Model Adjustment Model Selection and Validation Calculation of the previous two statistics guides the model choice Deviance φ D χ 2 of Pearson (φ χ2 ) If these values are higher than N-p than the model is overdispersed or other type of distribution is presented rather than Poisson Next step is to evaluate the adjustment quality of the model against other models Phd in Transportation / Transport Demand Modelling 14/44

15 GZLM Count Data Analysis type - Type I analysis is generally appropriate when there are a priori reasons for ordering predictors in the model. Type III is more generally applicable. The chi squared statistics could be estimated either using Wald or likelihood-ratio. Confidence intervals - Wald intervals are based on the assumption that parameters have an asymptotic normal distribution; profile likelihood intervals are more accurate but can be computationally expensive. The tolerance level is the criteria used to stop the iterative algorithm used to compute the intervals. Log-likelihood function -This controls the display format of the log-likelihood function. The full function includes an additional term that is constant with respect to the parameter estimates; it has no effect on parameter estimation. Phd in Transportation / Transport Demand Modelling 15/44

16 GZLM Count Data Print Case processing summary - number and percentage of cases included and excluded from the analysis and the Correlated Data Summary table. Descriptive statistics - descriptive statistics and summary information about the dependent variable, covariates, and factors. Model information - dataset name, dependent variable or events and trials variables, offset variable, scale weight variable, probability distribution, and link function. Goodness of fit statistics - Deviance and scaled deviance, Pearson chi-square and scaled Pearson chi-square, log-likelihood, Akaike s information criterion (AIC), finite sample corrected AIC (AICC), Bayesian information criterion (BIC), and consistent AIC (CAIC). Model summary statistics - likelihood-ratio statistics for the model fit omnibus test and Lagrange multiplier test - Lagrange multiplier test statistics for statistics for the Type I or III contrasts for each assessing the validity of a scale parameter that is computed using effect. the deviance or Pearson chi-square. For the negative binomial Parameter estimates - Displays parameter distribution, this tests the fixed ancillary parameter. estimates and corresponding test statistics and confidence intervals. In addiction it can optionally Phd in Transportation / Transport Demand Modelling display exponentiated parameter estimates. 16/44

17 GZLM Count Data Model Selection and Validation Over dispersion of data (Maximum Likelihood Ratio and Lagrange Tests) Statistical significance (Wald test) Predictive capacity (Omnibus test; Pseudo R 2 ) Comparing between models (log maximum likelihood: AIC/AICC/BIC/CAIC) Phd in Transportation / Transport Demand Modelling 17/44

18 GZLM Count Data Model Selection and Validation Over dispersion of data (a) - Maximum likelihood ratio This test analyses the equality between the mean and the variance through Poisson Regression Standard against the alternative of the variance exceeding the mean (Binomial Negative) H0:K=0 H1:K=>0 X 2 ~ 2[ L( P) L( NB )] If p value is below 0.05 than the null hypothesis is rejected overdispersion is than identified Phd in Transportation / Transport Demand Modelling 18/44

19 GZLM Count Data Model Selection and Validation Over dispersion of data (b) - Lagrange tests on negative binomial for K variable = 0 (H 0 : K=0) Significance: Negative Binomial Non significance: Poisson Possibility of overdispersion being related with excess of zeros: Zero Inflated Poisson (but not possible without the presence of zero accidents segments) Phd in Transportation / Transport Demand Modelling 19/44

20 GZLM Count Data Model Selection and Validation Testing for the statistical signifcance of each coeficient β Assimptotical test or Wald Test WS var j 2 j v H 0 : ˆ j 0 For low p values the null hyphthesis is refused and the variable is influent in the model Phd in Transportation / Transport Demand Modelling 20/44

21 GZLM Count Data Model Selection and Validation If there is an indication for overdispersion, two other models must be tested Overdispersed Poisson regression (where a scale parameter in admissible) Negative Binomial Phd in Transportation / Transport Demand Modelling 21/44

22 GZLM Count Data Model Selection and Validation Predictive capacity of the model Omnibus test X 2 2[ LL( R ) LL( U )] Pseudo R LL( LL( U ) R ) R 2 Phd in Transportation / Transport Demand Modelling 22/44

23 GZLM Count Data Model Selection and Validation Omnibus test X 2 2[ LL( R ) LL( If significant (p-value less than 0,05) means that the estimated model is better than the null model U )] Lβ U is the log likelihood of the unrestricted model L β R is the log likelihood of the restricted model (without independent variables) With number of degress of freedmon equal to the diference between the number of parameters in the restricted and unrestricted model Phd in Transportation / Transport Demand Modelling 23/44

24 GZLM Count Data Model Selection and Validation Having the value of the Omnibus test and the value of LLβ U is possible to estimate the the log likelihood LLβ R of the restricted model (without independent variables and with all parameters set to zero) Having the value of L β 0 and Lβ it is possible to estimate the Pseudo R 2 1 LL( LL( U ) R ) Phd in Transportation / Transport Demand Modelling 24/44

25 GZLM Count Data Model Selection and Validation Having the value of the Pseudo R it is possible to estimate R 2 through an empirical relation set by Domencich and Macfaden (1975) Phd in Transportation / Transport Demand Modelling 25/44

26 GZLM Count Data Model Selection and Validation Other information criteria to compare models: AIC AIC 2L( ˆ) 2 p * AICC AICC 2L( ˆ) 2 p * N N p * 1 BIC BIC 2L( ˆ) p * ln( N) CAIC CAIC 2L( ˆ) p * (ln( N) 1) Phd in Transportation / Transport Demand Modelling 26/44

27 GZLM Count Data X 2 2[ LL( R ) LL( U )] The Omnibus test verifies if the explained variance is significantly greater than the unexplained variance Deviance compares the given model with the full model (the full model has one parameter for each observation, therefore has a perfect fit). The deviance in a perfect fit model is 0. The deviance could be used to have information about over dispersion (testing H0 a=0). In the present case we reject that hypothesis since the deviance value is higher than the X 2 critical, therefore the p-value is 0,00. When the Value/df >1, there is a sign of over dispersion Phd in Transportation / Transport Demand Modelling 27/44

28 GZLM Count Data Type III tests examine the significance of each partial effect, that is, the significance of an effect with all the other effects in the model. The chi-squared is a likelihood ratio for testing the significance of the effect added to the model containing all of the other effects Wald test for statistical inference of β coefficients for the independent variables Phd in Transportation / Transport Demand Modelling 28/44

29 GZLM Count Data Testing overdispersion Using the regression test to estimate if there is overdispersion A linear regression is estimated where Z is regressed on W (Z=bW). If is statistically significant with g(e[y i ])=E[y i ] and g(e[y i ])=E[y i ] 2 then H 0 is rejected Phd in Transportation / Transport Demand Modelling 29/44

30 GZLM Count Data Testing overdispersion Predicted value of mean of response - Saves modelpredicted values for each case in the original response metric. Predicted value of linear predictor - Saves modelpredicted values for each case in the metric of the linear predictor (transformed response via the specified link function). When the response distribution is multinomial, the procedure saves the predicted value for each category of the response, except the last, up to the number of specified categories to save. Phd in Transportation / Transport Demand Modelling 30/44

31 SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Standard E Observatio 84 Model with g(e[y i ])=E[y i ] GZLM Count Data Testing overdispersion ANOVA df SS MS F ignificance F Regression Residual Total Coefficientsandard Erro t Stat P-value Lower 95%Upper 95%ower 95.0%Upper 95.0% Intercept 0 #N/A #N/A #N/A #N/A #N/A #N/A #N/A w The results of both linear regressions show that b is statistically significant in the two models. Therefore H 0 is rejected SUMMARY OUTPUT Regression Statistics Multiple R R Square Adjusted R Standard E Observatio 84 Model with g(e[y i ])=E[y i ] 2 ANOVA df SS MS F ignificance F Regression Residual Total Coefficientsandard Erro t Stat P-value Lower 95%Upper 95%ower 95.0%Upper 95.0% Intercept 0 #N/A #N/A #N/A #N/A #N/A #N/A #N/A w Phd in Transportation / Transport Demand Modelling 31/44

32 Goodness of fit Goodness of Fit a GZLM Count Data The Omnibus test could be used to estimate the 2 From the value of the only the constant) Value df Value/df Deviance 330, ,981 Scaled Deviance 330, Pearson Chi-Square 358, ,314 Scaled Pearson Chi-Square 358, Log Likelihood b -246,185 Akaike's Information Criterion (AIC) Finite Sample Corrected AIC (AICC) Bayesian Information Criterion (BIC) 494, , ,800 Consistent AIC (CAIC) 497,800 Dependent Variable: Accident Model: (Intercept) a. Information criteria are in small-is-better form. b. The full log likelihood function is displayed and used in computing information criteria. 2 =0,312 (relative to the model with only constant) 2 it is possible to estimate the LL of the restricted model (with Phd in Transportation / Transport Demand Modelling 32/44

33 Over dispersed Poisson GZLM Count Data Since there is an indication for overdispersion, two other models must be tested Overdispersed Poisson regression (where a scale parameter in admissible) Negative Binomial Phd in Transportation / Transport Demand Modelling 33/44

34 Over dispersed Poisson GZLM Count Data The main difference with the Poisson Regression Model is that the scale parameter is estimated and not fixed. The Pearson Chi-squared method is used to estimate the Scale Parameter The scale parameter has a different nature then vector β of coefficients β has a direct influence on the expected value of variable Yi, and the parameter reveals the data dispersion On other distributions it must be estimated through maximum likelihood log for the Y vector Phd in Transportation / Transport Demand Modelling 34/44

35 Over dispersed Poisson GZLM Count Data Parameter Estimates Parameter B Std. Error 95% Wald Confidence Interval Hypothesis Test Lower Upper Wald Chi- Square df Sig. (Intercept) -,826,3553-1,522,130 5,404 1,020 AADT1 8,122E-005 1,8086E-005 4,577E-005,000 20,166 1,000 AADT2,001,0001,000,001 23,114 1,000 Median -,060,0338 -,126,006 3,156 1,076 Drive,075,0253,025,124 8,743 1,003 (Scale) 2,361 a Dependent Variable: Accident Model: (Intercept), AADT1, AADT2, Median, Drive a. Computed based on the Pearson chi-square. The coefficient estimates are similar to the ones obtained with the Poisson model. Although the standard errors are bigger, because they are adjusted by the scale parameter (When there is over dispersion, the variance is also larger then the standard errors of the parameters became inflated) Phd in Transportation / Transport Demand Modelling 35/44

36 Negative Binomial GZLM Count Data To estimate the Negative Binomial, and estimate the scale parameter using maximum likelihood Phd in Transportation / Transport Demand Modelling 36/44

37 Negative Binomial GZLM Count Data The Lagrange Multiplier test This test could only be performed if the scale parameter is fixed Phd in Transportation / Transport Demand Modelling 37/44

38 Negative Binomial GZLM Count Data Phd in Transportation / Transport Demand Modelling 38/44

39 Negative Binomial GZLM Count Data Phd in Transportation / Transport Demand Modelling 39/44

40 Negative Binomial GZLM Count Data The negative binomial model is the same as the Poisson model when the binomial model's ancillary (dispersion) parameter,, equals 0, the Lagrange multiplier test is a test of the null hypothesis that = 0. A non-significant Lagrange test coefficient indicates that cannot be assumed to be different from 0, and hence a Poisson model would be preferred over a negative binomial model. Yet, if LL(p) is substantially smaller than LL(NB), then, the use of a Negative Binomial might not improve the model results (even with over dispersion). An alternative approach might be run a Negative Binomial with k=0, and evaluate the Lagrange Multiplier Test. If the test is significant, over dispersion will not be a problem for the available data. Phd in Transportation / Transport Demand Modelling 40/44

41 Poisson example Accidents at intersections GZLM Count Data Example 1 Washington, Simon P., Karlaftis, Mathew G. e Mannering (2003) Statistical and econometric Methods for Transportation Data Analysis, CRC Phd in Transportation / Transport Demand Modelling 41/44

42 Poisson example Accidents at intersections GZLM Count Data Example 1 Washington, Simon P., Karlaftis, Mathew G. e Mannering (2003) Statistical and econometric Methods for Transportation Data Analysis, CRC Phd in Transportation / Transport Demand Modelling 42/44

43 Negative Binomial Accidents at intersections GZLM Count Data Example 1 Washington, Simon P., Karlaftis, Mathew G. e Mannering (2003) Statistical and econometric Methods for Transportation Data Analysis, CRC Phd in Transportation / Transport Demand Modelling 43/44

44 Objectives To evaluate the importance/impact of the International friction index IFI on the level of accidents To develop the previous tests on clusters of data with different but homogeneous characteristics in order to define values of IFI and other road caracteristics related with pavement properties Test models only with IFI Test models with IFI and other variables Phd in Transportation / Transport Demand Modelling 44/44

45 Methodology Generalized Linear Models SPSS Model Formulation Model Adjustment Model Validation Phd in Transportation / Transport Demand Modelling 45/44

46 Model Formulation 1º - Response variable -Accidents frequency between 1997 and 2002 A. Poisson Distribution Standard with a unitary φ parameter B. Over dispersed Poisson where φ is > 1 C. Negative Binomial, if over dispersion exists - it allows for greater variance and absorbs over dispersion (K 0) (model estimation comparison of the three models) Phd in Transportation / Transport Demand Modelling 46/44

47 Note, two different things:.. B) Over dispersed Poisson where φ is > 1 C) Negative Binomial, if over dispersion exists - it allows for greater variance and absorbs over dispersion (K 0) The over dispersed Poisson is tested using the 1 as a limit for the dispersion parameter The Negative Binomial is tested contrasting K=0 with K>0 Phd in Transportation / Transport Demand Modelling 47/44

48 Model Formulation 1º - Response variable -Accidents frequency between 1997 and 2002 Lagrange Multiplier for choosing between Poisson and Binomial Negative Evaluation between dispersion parameter behavior and statistical inference through Maximum likelihood and Pseudo R 2 If dispersion is not a problem than Poisson is adequated even if whith worst values for maximum Likelihood and Pseudo-R 2 (which happens in smaller units) Phd in Transportation / Transport Demand Modelling 48/44

49 Model Formulation Initial assumptions on the unit of analysis (road segment): 6 year period 1km extension Time period Increasing the time period could reduce the number of zeros but the probability of changes would increase for other factors Section length Increasing the section length could increase section heterogeneity Phd in Transportation / Transport Demand Modelling 49/44

50 Model Formulation Data Accidents with victims registered by the national security authorities Overdispersion characterized by excess of zeros Zero Inflated Poisson Regression double state process: places with zero accidents, and places with a probability of occurrence of accidents...not possible because there is not a place 100% safe Phd in Transportation / Transport Demand Modelling 50/44

51 Model Formulation Independent Variables (for each segment road, 6years, 1 km): IFI - International friction index pheav - % heavy vehicles AS - Average speed purb - %crossing urban area prcross - %road crossings ptext - %turn extensions CDR - Less advantage radius class CI - Longitudinal inclination class AP - Average precipitation Phd in Transportation / Transport Demand Modelling 51/44

52 Model Formulation Offset variable If desired, the values of some variable can be added to the linear predictor of the dependent variable. (The linear predictor is the righthand side of the model equation). Alternatively, a fixed value may be added AcAADT Accumulated Annual Average Daily Traffic, representing the exposition to an accident risk. AcAADT = log(aadt*nyears(6)*365days) AADT - Annual Average Daily Traffic Phd in Transportation / Transport Demand Modelling 52/44

53 The offset itself is an explanatory variable - which may be both subject and time period specific - whose coefficient is fixed at 1.0, and is thus constant throughout the fitting process. By constraining the coefficient of the exposure variable to equal one you transform the model into a model of rates (e.g., injuries per unit of exposure instead of the probability that the person will be injured). This occurs because the link function is logistic, and one implication of that is that a positive coefficient of 1, subtracted from both sides of the equation, is equivalent to making the exposure the denominator of the left-hand side of the equation Phd in Transportation / Transport Demand Modelling 53/44

54 Model adjustment and validation Method: Maximum Likelihood Estimation Hibrid iterative method Deviance and statistic Pearson χ 2 for φ estimation (dispersion parameter) Wald test for statistical inference of β coefficients for the explicative variable Omnibus test, Log Likelihood, Information Criteria (AIC, AICC, BIC and CAIC) for comparing dispersion levels. Model predictive capacity Pseudo R 2 and R 2 Phd in Transportation / Transport Demand Modelling 54/44

55 Model adjustment and validation Models to be estimated (Poisson, Overdispersed Poisson and Negative Binomial) 1º For all the sections: 1.1Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi 1.2 Introduction of all the other explanatory variables Model: All_Multi A= β 0 +β 1 * IFI + β 2 pheav + β 3 *AS + β 4 *purb + β 5 *pcross + β 6 *ptext + β 7 *CDR + β 8 *CI + β 9 *AP Phd in Transportation / Transport Demand Modelling 55/44

56 Model adjustment and validation Models to be estimated (Poisson, Overdispersed Poisson and Negative Binomial) Intermediate step: Creation of clusters accordingly with road characteristics (not included in this analysis) 2º For each cluster: 1.1 Univariate only with IFI as independent Model: ARX_IFI 1.2 Introduction of all the other explanatory variables Model: ARX_Multi Phd in Transportation / Transport Demand Modelling 56/44

57 Objectives To evaluate the importance/impact of the International friction index IFI on the level of accidents, through estimation of the best predicting model Test models only with IFI Test models with IFI and other variables Phd in Transportation / Transport Demand Modelling 57/44

58 Summary: 1º Checking for over dispersion of data Lagrange tests on negative binomial for K variable (Ho: K=0) Significance: Negative Binomial Non significance: Poisson 2º Estimating Poisson 3º Estimating Poisson with dispersion parameter 4º Estimating Negative Binomial Phd in Transportation / Transport Demand Modelling 58/44

59 1º Checking for overdispersion of data Negative binomial with log link function for parameter K = 1 Note: If this is not done The K parameter is set to Zero (K=0) by default. But results will not differ substantially in either case Phd in Transportation / Transport Demand Modelling 59/44

60 1º Checking for overdispersion of data Print Lagrange Multiplier Test for K parameter Phd in Transportation / Transport Demand Modelling 60/44

61 1º Checking for over dispersion of data It is possible to notice that the test is highly significant the null hypothesis is rejected There is a strong indication pointing the inadequacy of the Poisson Model Yet, if L(p) is substantially smaller than L(NB), then, the use of a Negative Binomial might not improve the model results (even with overdispersion) An alternative approach might be run a Negative Binomial with k=0, and evaluate the Lagrange Multiplier Test. If the test is significant, overdispersion will not be a problem for the available data Phd in Transportation / Transport Demand Modelling 61/44

62 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 62/44

63 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 63/44

64 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 64/44

65 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 65/44

66 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 66/44

67 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 67/44

68 1º Estimating Poisson (Model Formulation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 68/44

69 1º Estimating Poisson (Model Adjustment)) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 69/44

70 1º Estimating Poisson (Model Adjustment) Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 70/44

71 1º Estimating Poisson (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 71/44

72 1º Estimating Poisson (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Over dispersion (>1) To be compared with other models Phd in Transportation / Transport Demand Modelling 72/44

73 1º Estimating Poisson (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Significance for the model Phd in Transportation / Transport Demand Modelling 73/44

74 1º Estimating Poisson (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 74/44

75 1º Estimating Poisson (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 75/44

76 1º Estimating Poisson with over dispersion (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 76/44

77 1º Estimating Poisson with over dispersion (Model Validation) sections 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 77/44

78 1º Estimating Poisson with over dispersion (Model Validation) sections 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 78/44

79 1º Estimating Poisson with over dispersion (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 79/44

80 1º Estimating Poisson with over dispersion (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 80/44

81 1º Estimating Poisson with over dispersion (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 81/44

82 1º Estimating Negative Binomial (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 82/44

83 1º Estimating Negative Binomial (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 83/44

84 1º Estimating Negative Binomial (Model Validation) 1.1 Univariate only with IFI as independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 84/44

85 1º Estimating Negative Binomial (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 85/44

86 1º Estimating Negative Binomial (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 86/44

87 1º Estimating Negative Binomial (Model Validation) 1.1 Univariate only with IFI has independent Model: All_IFI A=β0 +β1*ifi Phd in Transportation / Transport Demand Modelling 87/44

88 Model adjustment and validation (A) TODOS_IF All 254 sections just with IFI as explicative variable Comparision Between Models Model selecion indicators Poisson regression with overdispersion Poisson Regression Indicators With Scale Parameter Pearson χ2 Negative Binomial Regression Value df ϕ D /ϕ χ2 Value GL ϕd/ϕχ2 Value GL ϕd/ϕχ2 Deviance 773, ,000 2, , ,000 2, , ,000 0,614 Scaled Deviance 733, , , , , ,000 Pearson χ2 798, ,000 3, , ,000 3, , ,000 0,578 Scaled Pearson χ2 798, , , , , ,000 Log Likelihood -727, , ,117 Adjusted Log Likelihood -229,437 AIC 1458, , ,354 AICC 1458, , ,401 BIC 1458, , ,428 CAIC 1467, , ,428 Omnibus-Test 792, ,045 94,478 p-value 0,000 0,000 0,000 Pseudo-R2??? R2??? Phd in Transportation / Transport Demand Modelling 88/44

89 Model adjustment and validation Calculation of Pseudo R 2 and R 2 (aprox.) for each model Poisson regression Omnibus(792,769) = -2(Lβ 0 - Lβ(-727,431)) L β 0 = -1123,82 ρ 2 = 1-(Lβ(-727,431) / L β 0 ( -1123,82 ) =... R 2 ~... Phd in Transportation / Transport Demand Modelling 89/44

90 Model adjustment and validation Calculation of Pseudo R 2 ans R 2 (aprox.) for each model Poisson with Over dispersion Omnibus(250,045) = -2(Lβ 0 -Lβ( -727,431) ) L β 0 = -852,454 ρ 2 = 1-(Lβ( -727,431) / L β 0 ( -852,454)=... R 2 ~... Phd in Transportation / Transport Demand Modelling 90/44

91 Model adjustment and validation Calculation of Pseudo R 2 ans R 2 (aprox.) for each model Negative Binomial Omnibus(94,478) = -2(Lβ 0 - Lβ( -633,117) ) L β 0 = -680,356 ρ 2 = 1-(Lβ( -633,117) / L β 0 ( -680,356 )=... R 2 ~... Phd in Transportation / Transport Demand Modelling 91/44

92 Model adjustment and validation GZLM - Case Study Calculation of Pseudo R 2 ans R 2 (aprox.) for each model Poisson Regression Poisson regression with overdispersion Negative Binomial Regression With Scale Parameter Pearson χ2 Value df ϕ D /ϕ χ2 Value GL ϕd/ϕχ2 Value GL ϕd/ϕχ2 Deviance 773, , , , , ,614 Scaled Deviance 733, , , Pearson χ2 798, , , , , ,578 Scaled Pearson χ2 798, , Log Likelihood -727, , ,117 Adjusted Log Likelihood -229,437 AIC 1458, , ,354 AICC 1458, , ,401 BIC 1458, , ,428 CAIC 1467, , ,428 Omnibus-Test 792, ,045 94,478 p-value Pseudo-R2 0, , , R2 0,7 0,3 0,0... Phd in Transportation / Transport Demand Modelling 92/44

93 Model adjustment and validation (A) Sections just with IFI as explicative variable Comparision Between Models Calibration of β parameters and Scale Parameter, Wald test and Confidence intervals Poisson regression with Poisson Regression overdispersion Negative Binomial With Scale Parameter Regression Pearson χ2 Value Sqr p-value Value Sqr p-value Value Sqr p-value Intercept (β0) -3,199 0,079 0,000-3,190 0,140 0,000-3,382 0,276 0,000 IFI (β1) -0,084 0,003 0,000-0,084 0,005 0,000-0,078 0,009 0,000 Scale parameter 1,000 0,000 3,171 1,000 Confidence interval Confidence interval Confidence interval Intercept (β0) [-3,353;-3,045] [-3,473;-2,925] [-3,930;-2,848] IFI (β1) [-0,084;0,0030] [-0,094;-0,073] [-0,095;-0,061] Phd in Transportation / Transport Demand Modelling 93/44

94 References References Hardin, J. W., & Hilbe, J. M. (2007). Generalized linear models and extensions (2nd ed.). College Station, TX: StataCorp LP. Hilbe, Joseph M. (2007). Negative binomial regression. New York: Cambridge University Press. Hoffman, J. P. (2004). Generalized linear models: An applied approach. Boston: Allyn & Bacon. An accessible extended introduction. McCullagh, P. & J.A. Nelder (1989). Generalized linear models. Second Edition. Boca Raton: Chapman and Hall/CRC. ISBN Nelder, J. A. & Wedderburn, R. W. N. (1972). Generalized linear models. Journal of the Royal Statistical Society 135: The seminal article for GZLM. Phd in Transportation / Transport Demand Modelling 94/44

95 References References J. B. S. Haldane, "On a Method of Estimating Frequencies", Biometrika, Vol. 33, No. 3 (Nov., 1945), pp JSTOR Washington, Simon P., Karlaftis, Mathew G. e Mannering (2003) Statistical and Econometric Methods for Transportation Data Analysis, CRC Lord, D., Washington, S. P., & Ivan, J. N. (2005). Poisson, Poisson-Gamma and zero-inflated regression models of motor vehicle crashes: balancing statistical fit and theory. Accident Analysis and Prevention, pp Fernandes, A. (2010) Programas de manutenção de características da superfície de pavimentos associados a critérios de segurança rodoviária. Tese de Doutoramento em Engenharia Civil. Instituto Superior Técnico, Universidade Técnica de Lisboa. aisabel.capote@gmail.com Domencich and Macfaden (1975) Urban Travel Demand: A Behavioral Analysis. North-Holland Publishing Co., Reprinted Phd in Transportation / Transport Demand Modelling 95/44

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Evaluation of Road Safety in Portugal: A Case Study Analysis. Instituto Superior Técnico

Evaluation of Road Safety in Portugal: A Case Study Analysis. Instituto Superior Técnico Evaluation of Road Safety in Portugal: A Case Study Analysis Ana Fernandes José Neves Instituto Superior Técnico OUTLINE Objectives Methodology Results Road environments Expected number of road accidents

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

GENERALIZED LINEAR MODELS Joseph M. Hilbe

GENERALIZED LINEAR MODELS Joseph M. Hilbe GENERALIZED LINEAR MODELS Joseph M. Hilbe Arizona State University 1. HISTORY Generalized Linear Models (GLM) is a covering algorithm allowing for the estimation of a number of otherwise distinct statistical

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

MODELING COUNT DATA Joseph M. Hilbe

MODELING COUNT DATA Joseph M. Hilbe MODELING COUNT DATA Joseph M. Hilbe Arizona State University Count models are a subset of discrete response regression models. Count data are distributed as non-negative integers, are intrinsically heteroskedastic,

More information

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science. Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression

Logistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013

Analysis of Count Data A Business Perspective. George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Analysis of Count Data A Business Perspective George J. Hurley Sr. Research Manager The Hershey Company Milwaukee June 2013 Overview Count data Methods Conclusions 2 Count data Count data Anything with

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

Models for Binary Outcomes

Models for Binary Outcomes Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.

More information

Exploring the Application of the Negative Binomial-Generalized Exponential Model for Analyzing Traffic Crash Data with Excess Zeros

Exploring the Application of the Negative Binomial-Generalized Exponential Model for Analyzing Traffic Crash Data with Excess Zeros Exploring the Application of the Negative Binomial-Generalized Exponential Model for Analyzing Traffic Crash Data with Excess Zeros Prathyusha Vangala Graduate Student Zachry Department of Civil Engineering

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Li wan Chen, LENDIS Corporation, McLean, VA Forrest Council, Highway Safety Research Center,

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

SEPARATION PHENOMENA LOGISTIC REGRESSION FENÔMENOS DE SEPARAÇÃO REGRESSÃO LOGÍSITICA

SEPARATION PHENOMENA LOGISTIC REGRESSION FENÔMENOS DE SEPARAÇÃO REGRESSÃO LOGÍSITICA SEPARATION PHENOMENA LOGISTIC REGRESSION FENÔMENOS DE SEPARAÇÃO REGRESSÃO LOGÍSITICA Ikaro Daniel de Carvalho Barreto 1, Suzana Leitão Russo 2, Gutemberg Hespanha Brasil³, Vitor Hugo Simon 4 1 Departamento

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Generalized Linear Models

Generalized Linear Models York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 PG Student, 2 Assistant Professor, Department of Civil Engineering, Maulana Azad National

More information

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved.

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved. This file consists of Chapter 4 of MacAnova User s Guide by Gary W. Oehlert and Christopher Bingham, issued as Technical Report Number 617, School of Statistics, University of Minnesota, March 1997, describing

More information

Confidence intervals for the variance component of random-effects linear models

Confidence intervals for the variance component of random-effects linear models The Stata Journal (2004) 4, Number 4, pp. 429 435 Confidence intervals for the variance component of random-effects linear models Matteo Bottai Arnold School of Public Health University of South Carolina

More information

Statistical Model Of Road Traffic Crashes Data In Anambra State, Nigeria: A Poisson Regression Approach

Statistical Model Of Road Traffic Crashes Data In Anambra State, Nigeria: A Poisson Regression Approach Statistical Model Of Road Traffic Crashes Data In Anambra State, Nigeria: A Poisson Regression Approach Nwankwo Chike H., Nwaigwe Godwin I Abstract: Road traffic crashes are count (discrete) in nature.

More information

LINEAR REGRESSION CRASH PREDICTION MODELS: ISSUES AND PROPOSED SOLUTIONS

LINEAR REGRESSION CRASH PREDICTION MODELS: ISSUES AND PROPOSED SOLUTIONS LINEAR REGRESSION CRASH PREDICTION MODELS: ISSUES AND PROPOSED SOLUTIONS FINAL REPORT PennDOT/MAUTC Agreement Contract No. VT-8- DTRS99-G- Prepared for Virginia Transportation Research Council By H. Rakha,

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response) Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Testing and Model Selection

Testing and Model Selection Testing and Model Selection This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses

More information

LEVERAGING HIGH-RESOLUTION TRAFFIC DATA TO UNDERSTAND THE IMPACTS OF CONGESTION ON SAFETY

LEVERAGING HIGH-RESOLUTION TRAFFIC DATA TO UNDERSTAND THE IMPACTS OF CONGESTION ON SAFETY LEVERAGING HIGH-RESOLUTION TRAFFIC DATA TO UNDERSTAND THE IMPACTS OF CONGESTION ON SAFETY Tingting Huang 1, Shuo Wang 2, Anuj Sharma 3 1,2,3 Department of Civil, Construction and Environmental Engineering,

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL Intesar N. El-Saeiti Department of Statistics, Faculty of Science, University of Bengahzi-Libya. entesar.el-saeiti@uob.edu.ly

More information

Lecture 8. Poisson models for counts

Lecture 8. Poisson models for counts Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Generalized linear models

Generalized linear models Generalized linear models Christopher F Baum ECON 8823: Applied Econometrics Boston College, Spring 2016 Christopher F Baum (BC / DIW) Generalized linear models Boston College, Spring 2016 1 / 1 Introduction

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School

More information

CHAPTER 1: BINARY LOGIT MODEL

CHAPTER 1: BINARY LOGIT MODEL CHAPTER 1: BINARY LOGIT MODEL Prof. Alan Wan 1 / 44 Table of contents 1. Introduction 1.1 Dichotomous dependent variables 1.2 Problems with OLS 3.3.1 SAS codes and basic outputs 3.3.2 Wald test for individual

More information

Chapter 22: Log-linear regression for Poisson counts

Chapter 22: Log-linear regression for Poisson counts Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure

More information

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta

Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in the Niger Delta International Journal of Science and Engineering Investigations vol. 7, issue 77, June 2018 ISSN: 2251-8843 Application of Poisson and Negative Binomial Regression Models in Modelling Oil Spill Data in

More information

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic

More information

Non-Gaussian Response Variables

Non-Gaussian Response Variables Non-Gaussian Response Variables What is the Generalized Model Doing? The fixed effects are like the factors in a traditional analysis of variance or linear model The random effects are different A generalized

More information

Step 2: Select Analyze, Mixed Models, and Linear.

Step 2: Select Analyze, Mixed Models, and Linear. Example 1a. 20 employees were given a mood questionnaire on Monday, Wednesday and again on Friday. The data will be first be analyzed using a Covariance Pattern model. Step 1: Copy Example1.sav data file

More information

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion

Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS020) p.3863 Model Selection for Semiparametric Bayesian Models with Application to Overdispersion Jinfang Wang and

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

TRB Paper # Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models

TRB Paper # Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models TRB Paper #11-2877 Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models Srinivas Reddy Geedipally 1 Engineering Research Associate Texas Transportation Instute

More information

Effects of the Varying Dispersion Parameter of Poisson-gamma models on the estimation of Confidence Intervals of Crash Prediction models

Effects of the Varying Dispersion Parameter of Poisson-gamma models on the estimation of Confidence Intervals of Crash Prediction models Effects of the Varying Dispersion Parameter of Poisson-gamma models on the estimation of Confidence Intervals of Crash Prediction models By Srinivas Reddy Geedipally Research Assistant Zachry Department

More information

TRB Paper Examining Methods for Estimating Crash Counts According to Their Collision Type

TRB Paper Examining Methods for Estimating Crash Counts According to Their Collision Type TRB Paper 10-2572 Examining Methods for Estimating Crash Counts According to Their Collision Type Srinivas Reddy Geedipally 1 Engineering Research Associate Texas Transportation Institute Texas A&M University

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!

Hierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

The GENMOD Procedure (Book Excerpt)

The GENMOD Procedure (Book Excerpt) SAS/STAT 9.22 User s Guide The GENMOD Procedure (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.22 User s Guide. The correct bibliographic citation for the complete

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression. Ryan Godwin. ECON University of Manitoba Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March Presented by: Tanya D. Havlicek, ACAS, MAAA ANTITRUST Notice The Casualty Actuarial Society is committed

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)

More information

Transport Data Analysis and Modeling Methodologies

Transport Data Analysis and Modeling Methodologies 1 Transport Data Analysis and Modeling Methodologies Lab Session #12 (Random Parameters Count-Data Models) You are given accident, evirnomental, traffic, and roadway geometric data from 275 segments of

More information

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator As described in the manuscript, the Dimick-Staiger (DS) estimator

More information

EVALUATION OF SAFETY PERFORMANCES ON FREEWAY DIVERGE AREA AND FREEWAY EXIT RAMPS. Transportation Seminar February 16 th, 2009

EVALUATION OF SAFETY PERFORMANCES ON FREEWAY DIVERGE AREA AND FREEWAY EXIT RAMPS. Transportation Seminar February 16 th, 2009 EVALUATION OF SAFETY PERFORMANCES ON FREEWAY DIVERGE AREA AND FREEWAY EXIT RAMPS Transportation Seminar February 16 th, 2009 By: Hongyun Chen Graduate Research Assistant 1 Outline Introduction Problem

More information

The Negative Binomial Lindley Distribution as a Tool for Analyzing Crash Data Characterized by a Large Amount of Zeros

The Negative Binomial Lindley Distribution as a Tool for Analyzing Crash Data Characterized by a Large Amount of Zeros The Negative Binomial Lindley Distribution as a Tool for Analyzing Crash Data Characterized by a Large Amount of Zeros Dominique Lord 1 Associate Professor Zachry Department of Civil Engineering Texas

More information

BOOTSTRAPPING WITH MODELS FOR COUNT DATA

BOOTSTRAPPING WITH MODELS FOR COUNT DATA Journal of Biopharmaceutical Statistics, 21: 1164 1176, 2011 Copyright Taylor & Francis Group, LLC ISSN: 1054-3406 print/1520-5711 online DOI: 10.1080/10543406.2011.607748 BOOTSTRAPPING WITH MODELS FOR

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

SAS/STAT 14.2 User s Guide. The GENMOD Procedure

SAS/STAT 14.2 User s Guide. The GENMOD Procedure SAS/STAT 14.2 User s Guide The GENMOD Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute Inc.

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

y response variable x 1, x 2,, x k -- a set of explanatory variables

y response variable x 1, x 2,, x k -- a set of explanatory variables 11. Multiple Regression and Correlation y response variable x 1, x 2,, x k -- a set of explanatory variables In this chapter, all variables are assumed to be quantitative. Chapters 12-14 show how to incorporate

More information

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion Biostokastikum Overdispersion is not uncommon in practice. In fact, some would maintain that overdispersion is the norm in practice and nominal dispersion the exception McCullagh and Nelder (1989) Overdispersion

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

DISPLAYING THE POISSON REGRESSION ANALYSIS

DISPLAYING THE POISSON REGRESSION ANALYSIS Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

More information

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION

ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION ONE MORE TIME ABOUT R 2 MEASURES OF FIT IN LOGISTIC REGRESSION Ernest S. Shtatland, Ken Kleinman, Emily M. Cain Harvard Medical School, Harvard Pilgrim Health Care, Boston, MA ABSTRACT In logistic regression,

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

Twentieth vs. twenty first century statistics: evaluating the degree of improvement in model

Twentieth vs. twenty first century statistics: evaluating the degree of improvement in model Twentieth vs. twenty first century statistics: evaluating the degree of improvement in model fit between general and generalized linear modelling approaches. Rachel Buxton, Jens Currie, Sheena Fry, Megan

More information