Multimodal scene representation learning for localization of driverless vehicles
Interest in driverless vehicles and other vehicular technologies has surged in recent years. Safe and dependable intelligent transportation systems require accurate sensing, perception and control technology that accounts for human behaviour, for both static and dynamic objects in the environment, and for the characteristics of roadways and their surroundings. Real-world environments are typically high-dimensional and complex, and safe autonomous driving demands optimal and robust performance in these settings. This thesis presents three primary advances that address critical issues in autonomous driving systems: 1) the lack of datasets that capture the high dimensionality and complexity of the real world; 2) LiDAR-based mapping and localisation in surroundings containing glass surfaces, which cannot be detected, localised, or represented by LiDAR laser range measurements; and 3) a deep-learning-powered LiDAR and camera fusion model for ego-motion estimation.
To overcome the lack of publicly available datasets, the Loughborough Autonomous Vehicle dataset 2 (LboroAV2) was collected, incorporating data from multiple sensors including cameras, LiDAR, ultrasound, an e-compass, and a rotary encoder. This multimodal dataset captures a wide range of structured and unstructured contexts, varied lighting conditions, and diverse weather scenarios, making it well suited to the development of cutting-edge deep learning methods. A significant challenge in mapping the environment is the presence of glass, which is invisible to LiDAR scanners. The research proposes a novel method that uses LiDAR sensors to detect and localise glass, eliminating errors caused by glass in occupancy grid maps. The method achieves an accuracy of 96.2% in detecting frameless glass at long range without relying on intensity peaks. The research then leverages the LboroAV2 dataset to tackle the limitations of single-sensor ego-motion estimation. An end-to-end deep learning architecture is proposed that fuses RGB images and LiDAR laser scan data using a convolutional neural encoder and a recurrent neural network (RNN). Trained and evaluated on the LboroAV2 and KITTI Visual Odometry datasets, this architecture not only outperforms comparable methods but also provides explainability through visualisation of the learning process.
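The record itself does not include implementation details of the fusion architecture. The PyTorch sketch below is only a minimal illustration of the described idea, assuming a convolutional encoder per modality, an LSTM over the fused per-frame features, and a 6-DoF relative-pose output; the class name, layer sizes, and pose parameterisation are hypothetical and not taken from the thesis.

```python
import torch
import torch.nn as nn

class CameraLidarOdometryNet(nn.Module):
    """Sketch: fuse RGB frames and LiDAR range scans with convolutional
    encoders, then model temporal dependencies with an LSTM to regress a
    relative pose (ego-motion) for each time step."""

    def __init__(self, feat_dim=256, hidden_dim=512):
        super().__init__()
        # Convolutional encoder for RGB frames (input: B*T x 3 x H x W)
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        # 1-D convolutional encoder for LiDAR range scans (input: B*T x 1 x beams)
        self.lidar_encoder = nn.Sequential(
            nn.Conv1d(1, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Recurrent network over the concatenated per-frame features
        self.rnn = nn.LSTM(2 * feat_dim, hidden_dim, num_layers=2, batch_first=True)
        # Regress a 6-DoF relative pose (3 translation + 3 rotation) per step
        self.pose_head = nn.Linear(hidden_dim, 6)

    def forward(self, images, scans):
        # images: B x T x 3 x H x W, scans: B x T x beams
        b, t = images.shape[:2]
        img_feat = self.img_encoder(images.flatten(0, 1)).view(b, t, -1)
        lidar_feat = self.lidar_encoder(scans.flatten(0, 1).unsqueeze(1)).view(b, t, -1)
        fused, _ = self.rnn(torch.cat([img_feat, lidar_feat], dim=-1))
        return self.pose_head(fused)  # B x T x 6 relative poses
```

As a usage example, a batch of 2 sequences of 10 synchronised camera frames (e.g. 3 x 128 x 384) and 720-beam laser scans would yield a 2 x 10 x 6 tensor of per-step relative poses, which can be composed into a trajectory for comparison against KITTI-style ground truth.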
This work contributes to the field of autonomous systems by emphasising the significance of multimodal sensors in enhancing an Autonomous Vehicle (AV) system's ability to accurately represent, understand, and interpret its environment, leading to more informed and effective decision-making.
Funding
MIMIc: Multimodal Imitation Learning in MultI-Agent Environments
Engineering and Physical Sciences Research Council
School
- Loughborough University, London
Publisher
Loughborough University
Rights holder
© Haileleol Tibebu
Publication date
2022
Notes
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.
Language
- en
Supervisor(s)
Varuna Desilva
Qualification name
- PhD
Qualification level
- Doctoral
This submission includes a signed certificate in addition to the thesis file(s)
- I have submitted a signed certificate