Exploring crash-risk factors using Bayes’ theorem and an optimization routine

Regression models used to analyse crash counts rely on some form of data aggregation (spatial, temporal, or both) that may lead to inconsistent or incorrect outcomes. This paper introduces a new non-regression approach for analysing the risk factors affecting crash counts without aggregating crashes. The method is an application of Bayes’ theorem that makes it possible to compare the distribution of the prevailing traffic conditions on a road network (i.e. the prior) with the distribution of traffic conditions just before crashes (i.e. the posterior). The probability densities of continuous explanatory variables are estimated using kernel density estimation, and a posterior log-likelihood is maximised by an optimisation routine (Maximum Likelihood Estimation). The method thereby estimates the parameters that define the crash risk associated with each of the examined crash contributory factors. Both simulated and real-world data were employed to demonstrate and validate the developed theory, using two explanatory traffic variables, speed and volume, as an example. Posterior kernel densities of speed and volume at the location and time of crashes were found to differ from the prior kernel densities of the same variables. The findings are logical: higher traffic volumes increase the risk of all crashes, independently of collision type, severity and time of occurrence. Higher speeds were found to decrease the risk of multiple-vehicle crashes at peak times and not to significantly affect multiple-vehicle crash occurrences during off-peak times. However, the risk of single-vehicle crashes always increases as speed increases.
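
The prior-versus-posterior comparison described above can be illustrated with a minimal sketch on simulated data. It is not the authors' implementation: the single explanatory variable (a standardised traffic volume), the log-linear risk function exp(beta * x), the grid-based normalising integral, and all variable and function names are assumptions made for illustration only. Under these assumptions Bayes' theorem gives the posterior density as risk(x) times the prior density divided by a normalising constant, and the risk parameter is estimated by maximising the log-likelihood of the conditions observed at crashes.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical prior sample: standardised traffic volume under prevailing conditions.
prior_sample = rng.normal(0.0, 1.0, size=5000)

# Simulated crashes: conditions are drawn in proportion to an ASSUMED
# log-linear risk exp(true_beta * x), so higher volume means more crashes.
true_beta = 0.5
weights = np.exp(true_beta * prior_sample)
crash_sample = rng.choice(prior_sample, size=2000, p=weights / weights.sum())

# Kernel density estimate of the prior distribution of traffic conditions.
prior_kde = gaussian_kde(prior_sample)
grid = np.linspace(prior_sample.min() - 1.0, prior_sample.max() + 1.0, 2001)
prior_on_grid = prior_kde(grid)
log_prior_at_crashes = np.log(prior_kde(crash_sample))


def neg_log_likelihood(beta):
    """Negative posterior log-likelihood of the crash observations."""
    # Normalising constant: integral of risk(x) * prior(x) dx (trapezoid rule).
    integrand = np.exp(beta * grid) * prior_on_grid
    z = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(grid))
    log_posterior = beta * crash_sample + log_prior_at_crashes - np.log(z)
    return -np.sum(log_posterior)


result = minimize_scalar(neg_log_likelihood, bounds=(-3.0, 3.0), method="bounded")
beta_hat = result.x
print(f"estimated beta: {beta_hat:.3f} (simulated with {true_beta})")
```

A positive estimate indicates that crash risk rises with the variable, and an estimate near zero indicates no detectable effect. Extending the sketch to two variables such as speed and volume would require a multivariate density estimate and a multi-parameter optimiser.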