Multilevel logistic regression modelling for crash mapping in metropolitan areas

The spatial nature of traffic crashes makes crash location one of the most important and informative attributes of crash databases. It is, however, very likely that recorded crash locations (easting and northing coordinates, distances from junctions, addresses, road names and road types) are inaccurately reported. Improving the quality of crash locations therefore has the potential to enhance the accuracy of many spatial crash analyses. Determining the correct crash location usually requires combining crash and network attributes with a suitable crash mapping method. Urban road networks are more prone to erroneous matches because of their high road density and inherent complexity. This paper presents a novel crash mapping method suitable for urban and metropolitan areas, which was used to match all the crashes that occurred in London from 2010 to 2012. The method is based on a hierarchical data structure of crashes (i.e. candidate road links are nested within vehicles, and vehicles are nested within crashes) and employs a multilevel logistic regression model to estimate the probability distribution of mapping a crash onto a set of candidate road links. The road link with the highest probability is considered to be the correct segment for mapping the crash. The probability is based on two primary variables: (a) the distance between the crash location and a candidate segment and (b) the difference between the vehicle direction just before the collision and the link direction. Although road names were not considered, owing to the limited availability of this variable in the crash database used, the developed method matched crashes with an accuracy of 97.1% (±1%) (N=1,000). The method was compared with two simpler, non-probabilistic crash mapping algorithms, and the results were used to demonstrate the effect of crash location data quality on a crash risk analysis.
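
The selection step described above can be illustrated with a minimal sketch. It is not the fitted model from this paper: the coefficients are hypothetical placeholders, the random effects of the multilevel structure (links within vehicles within crashes) are omitted, and the function names, the folding of the angle into [0, 90] degrees for undirected links, and the normalisation of scores across candidates are all assumptions made for illustration. Coordinates are assumed to be planar eastings and northings in metres.

```python
import math

# Hypothetical coefficients for illustration only (NOT the paper's estimates).
INTERCEPT = 2.0
BETA_DISTANCE = -0.08  # assumed effect per metre of crash-to-link distance
BETA_ANGLE = -0.04     # assumed effect per degree of direction difference


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def link_bearing(a, b):
    """Compass bearing in degrees (0 = north, clockwise) of the segment a->b."""
    return math.degrees(math.atan2(b[0] - a[0], b[1] - a[1])) % 360.0


def direction_difference(vehicle_bearing, bearing):
    """Angle in [0, 90] degrees between a heading and an undirected link."""
    diff = abs(vehicle_bearing - bearing) % 180.0
    return min(diff, 180.0 - diff)


def match_probabilities(crash_xy, vehicle_bearing, candidates):
    """Logistic score per candidate link, normalised to sum to one."""
    scores = {}
    for link_id, (a, b) in candidates.items():
        dist = point_segment_distance(crash_xy, a, b)
        angle = direction_difference(vehicle_bearing, link_bearing(a, b))
        eta = INTERCEPT + BETA_DISTANCE * dist + BETA_ANGLE * angle
        scores[link_id] = 1.0 / (1.0 + math.exp(-eta))
    total = sum(scores.values())
    return {link_id: s / total for link_id, s in scores.items()}


def best_link(crash_xy, vehicle_bearing, candidates):
    """Return the most probable candidate link and all probabilities."""
    probs = match_probabilities(crash_xy, vehicle_bearing, candidates)
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    # Toy junction: link "A" runs east-west, link "B" runs north-south.
    candidates = {
        "A": ((0.0, 0.0), (100.0, 0.0)),
        "B": ((12.0, -50.0), (12.0, 50.0)),
    }
    # Crash recorded at (10, 5); vehicle was travelling east (bearing 90).
    chosen, probs = best_link((10.0, 5.0), 90.0, candidates)
    print(chosen, probs)
```

In this toy junction the north-south link is closer to the recorded crash point, but the vehicle's eastbound heading favours the east-west link, which shows why direction is used alongside distance rather than distance alone.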