Understanding the effects of underreporting on injury severity estimation of single-vehicle motorcycle crashes: a hybrid approach incorporating majority class oversampling and random parameters with heterogeneity-in-means
journal contribution
posted on 2025-11-04, 11:29 authored by Nawaf Alnawmasi, Apostolos Ziakopoulos, Athanasios Theofilatos, Yasir Ali
The underreporting of crash data is a well-documented issue in the road safety literature, but few studies have addressed this problem in the context of crash injury severity analysis. This paper provides an empirical assessment of the impact of underreporting, using a hybrid approach to estimate injury severity for single-vehicle motorcycle crashes. Unlike traditional machine learning methods that oversample the minority class (the category with fewer observations, such as fatal and severe injuries), the present study oversamples the majority class (i.e. minor injuries), which is often underreported in crash datasets, thus providing a fresh perspective on this issue. Afterwards, random parameter models with heterogeneity in means and variances were applied. The results of this study, as supported by likelihood ratio tests, indicate that the key variables influencing motorcyclists’ injury severities remain consistent across both the original and the oversampled data models. Specifically, crashes occurring while slowing down or stopping are associated with lower injury severity, whereas negotiating a right turn increases the probability of severe injuries. Interestingly, crashes that occur on dry pavements are associated with higher injury severity than those on wet pavements, likely because riders adjust their behavior in adverse weather conditions to compensate for the risk. Overall, the oversampled models yield significantly lower marginal effects than the original model. This study provides a foundation for further examination of underreporting in crash injury severity modelling and also highlights the need to capture the dynamics of crash injuries, suggesting that alternative approaches could improve understanding and hence road safety management.
Future studies are encouraged to replicate this methodology to validate the findings, as well as to utilize other advanced machine learning algorithms, such as tree-based models, to assess underreporting mitigation.
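The idea of oversampling the majority class can be illustrated with a minimal sketch. The abstract does not state which resampling procedure the authors used, so the code below assumes simple random resampling with replacement; the function name `oversample_class`, the `extra_fraction` parameter, and the toy records are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def oversample_class(records, label_key, target_label, extra_fraction, seed=0):
    """Return a new list: the original records plus resampled copies of one class.

    Assumption: copies are drawn with replacement from the existing rows of
    target_label. extra_fraction=0.5 adds 50% more rows of that class.
    """
    rng = random.Random(seed)
    # Rows belonging to the class that is assumed to be underreported.
    pool = [r for r in records if r[label_key] == target_label]
    n_extra = round(len(pool) * extra_fraction)
    # Copy each sampled row so later edits do not alias the originals.
    extra = [dict(rng.choice(pool)) for _ in range(n_extra)]
    return records + extra


# Hypothetical toy data: 6 minor, 3 severe, 1 fatal crash record.
crashes = (
    [{"severity": "minor", "dry_pavement": i % 2} for i in range(6)]
    + [{"severity": "severe", "dry_pavement": 1} for _ in range(3)]
    + [{"severity": "fatal", "dry_pavement": 1}]
)

augmented = oversample_class(crashes, "severity", "minor", extra_fraction=0.5)
counts = Counter(r["severity"] for r in augmented)
print(counts)
```

In this sketch only the minor-injury rows are inflated; the severe and fatal counts are left unchanged. A severity model would then be estimated on both `crashes` and `augmented`, and the resulting marginal effects compared.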