A marginalized random effects hurdle negative binomial model for analyzing refined-scale crash frequency data
Rongjie Yu, Yiyun Wang, Mohammed Quddus, Jian Li
2134/37857
https://repository.lboro.ac.uk/articles/journal_contribution/A_marginalized_random_effects_hurdle_negative_binomial_model_for_analyzing_refined-scale_crash_frequency_data/9458564
Crash frequency prediction models have been an important subject of safety research because they reveal the relationship between crash occurrence and its influencing factors. Recently, hourly, refined-scale crash frequency analysis has become attractive because it allows time-varying explanatory variables (e.g. traffic volume and operating speed) to be introduced. However, crash frequency data aggregated over short time intervals pose two analytical issues: excessive zeros and unobserved heterogeneity. In this study, a marginalized random effects hurdle negative binomial (MREHNB) model was developed, in which the hurdle structure handles the excessive zeros and site-specific random effect terms capture unobserved heterogeneity. Moreover, the marginalized inference approach was introduced here for the first time to obtain marginal mean inference for the overall population rather than subject-specific estimates. Empirical analyses were conducted on data from the Shanghai urban expressway system, and the MREHNB model was compared with the HNB (hurdle negative binomial) and REHNB (random effects hurdle negative binomial) models. In terms of goodness of fit, the REHNB and MREHNB models showed substantial improvement over the HNB model, while there was no distinct difference between the REHNB and MREHNB models. However, the MREHNB model provided more precise parameter estimates.
Furthermore, the MREHNB model yielded interesting findings on crash contributing factors. For example, a higher proportion of local vehicles in the traffic volume would increase the probability of crash occurrence, and a non-linear relationship was found between traffic volume and crash frequency, with moderate volume levels associated with the highest probability of crash occurrence. Finally, the modeling results and the modeling technique are discussed in depth.
2019-06-03 08:51:45
Marginalized model; Site-specific random effects term; Hurdle negative binomial model; Excessive zeros; Unobserved heterogeneity; Built Environment and Design not elsewhere classified
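The following is a minimal sketch, not the authors' implementation, of two ideas from the abstract: how a hurdle negative binomial handles excess zeros, and why a population-averaged (marginal) mean differs from the mean of a "typical" site once site-specific random effects enter through non-linear links. It assumes an NB2 parameterization (variance mu + k*mu^2), a logit link on P(Y > 0), a log link on the count mean, and independent normal site effects in the two parts. All function names and parameter values are hypothetical and illustrative; none are taken from the study, and the MREHNB likelihood itself (which parameterizes the marginal mean directly) is not implemented here.

```python
import math
import random


def nb_pmf(y, mu, k):
    """NB2 probability of count y with mean mu and dispersion k (variance = mu + k * mu**2)."""
    r = 1.0 / k  # NB "size" parameter
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def hurdle_nb_pmf(y, p_zero, mu, k):
    """Hurdle NB: all zeros come from the binary part; positive counts from a zero-truncated NB."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y, mu, k) / (1.0 - nb_pmf(0, mu, k))


def hurdle_nb_mean(p_zero, mu, k):
    """Closed-form mean: P(Y > 0) times the mean of the zero-truncated NB."""
    return (1.0 - p_zero) * mu / (1.0 - nb_pmf(0, mu, k))


def site_mean(eta_hurdle, eta_count, k):
    """Mean for a site with given linear predictors (assumed logit link on P(Y > 0), log link on mu)."""
    p_zero = 1.0 / (1.0 + math.exp(eta_hurdle))  # 1 - logistic(eta_hurdle)
    return hurdle_nb_mean(p_zero, math.exp(eta_count), k)


def marginal_mean(eta_hurdle, eta_count, k, sd_hurdle, sd_count, n_draws=50_000, seed=1):
    """Population-averaged mean: Monte Carlo average over hypothetical normal site random effects."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        b_hurdle = rng.gauss(0.0, sd_hurdle)  # site effect in the binary (hurdle) part
        b_count = rng.gauss(0.0, sd_count)    # site effect in the count part
        total += site_mean(eta_hurdle + b_hurdle, eta_count + b_count, k)
    return total / n_draws


# Illustrative values only: a typical site (random effects = 0) versus the population average.
typical = site_mean(-2.0, 0.0, 0.5)
population = marginal_mean(-2.0, 0.0, 0.5, sd_hurdle=1.0, sd_count=0.5)
print(f"typical-site mean: {typical:.4f}, population-averaged mean: {population:.4f}")
```

With these illustrative settings, the population-averaged mean is noticeably larger than the typical-site mean. Regression coefficients in a conditional random effects model describe the latter, which is the gap that marginalized models are designed to close by modeling the marginal mean directly.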