Loughborough University

Multimodal scene representation learning for localization of driverless vehicles

thesis
posted on 2024-10-21, 12:46 authored by Haylat Tibebu

There has been a surge of interest in driverless vehicles and related vehicular technologies in recent years. Safe and dependable intelligent transportation systems require accurate sensing, perception, and control technology that accounts for human behaviour, static and dynamic objects in the environment, and the characteristics of roadways and their surroundings. Real-world environments are typically high-dimensional and complex, and safe autonomous driving demands optimal and robust performance in these settings. This thesis presents three primary advances addressing critical issues in autonomous driving systems: 1) the lack of datasets that capture the high dimensionality and complexity of the real world; 2) the use of LiDAR for mapping and localization in environments containing glass surfaces, which cannot be detected, localised, or represented by LiDAR laser range measurements; and 3) a deep-learning-based LiDAR and camera fusion model for ego-motion estimation.

To overcome the lack of publicly available datasets, the Loughborough Autonomous Vehicle dataset 2 (LboroAV2) was collected, incorporating data from multiple sensors: camera, LiDAR, ultrasound, e-compass, and rotary encoder. This multimodal dataset captures a wide range of structured and unstructured contexts, varied lighting conditions, and diverse weather scenarios, making it well suited to the development of cutting-edge deep learning methods. A significant challenge in mapping the environment is the presence of glass, which is invisible to LiDAR scanners. The research proposes a novel method that uses LiDAR sensors to detect and localise glass, eliminating the errors that glass introduces into occupancy grid maps. The method achieves 96.2% accuracy in detecting frameless glass at long range without relying on intensity peaks. The research then leverages the LboroAV2 dataset to address the limitations of single-sensor ego-motion estimation. An end-to-end deep learning architecture is proposed for fusing RGB images with LiDAR laser scans using a convolutional neural encoder and a recurrent neural network (RNN). Trained and evaluated on the LboroAV2 and KITTI Visual Odometry datasets, the architecture not only outperforms comparable methods but also offers explainability through visualisation of the learning process.
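The mapping problem that the glass-detection work addresses can be illustrated with a standard log-odds occupancy grid update. The sketch below is a generic textbook formulation, not the thesis's own algorithm; the 1-D grid, the cell positions of the pane and wall, and all names are hypothetical. It shows why glass corrupts LiDAR maps: a beam that passes through a pane terminates at a surface behind it, so the cells actually occupied by the glass are repeatedly updated as free space.

```python
import math

# Log-odds increments for a single beam (standard occupancy grid mapping).
L_FREE = math.log(0.3 / 0.7)   # applied to cells the beam crosses
L_OCC = math.log(0.9 / 0.1)    # applied to the cell where the beam terminates

def update_grid(grid, measured_range, cell_size=1.0):
    """Update a 1-D log-odds grid along a single beam from the origin."""
    hit_cell = int(measured_range / cell_size)
    for i in range(min(hit_cell, len(grid))):
        grid[i] += L_FREE          # beam traversed these cells: more likely free
    if hit_cell < len(grid):
        grid[hit_cell] += L_OCC    # beam terminated here: more likely occupied

def probability(log_odds):
    """Convert log-odds back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

grid = [0.0] * 10                  # prior p = 0.5 in every cell
# Hypothetical glass pane at cell 4; the beams pass through it and hit a
# wall at cell 8, so cell 4 is wrongly driven towards "free".
for _ in range(5):
    update_grid(grid, measured_range=8.0)

print(probability(grid[4]))        # glass cell: pushed towards free
print(probability(grid[8]))        # wall cell: pushed towards occupied
```

Under these updates the glass cell converges to a low occupancy probability even though a solid pane is there, which is the error the proposed glass detection and localisation method is designed to eliminate.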

This work contributes to the field of autonomous systems by emphasising the significance of multimodal sensing in enhancing an Autonomous Vehicle (AV) system's ability to accurately represent, understand, and interpret its environment, leading to more informed and effective decision making.

Funding

MIMIc: Multimodal Imitation Learning in MultI-Agent Environments

Engineering and Physical Sciences Research Council


History

School

  • Loughborough University, London

Publisher

Loughborough University

Rights holder

© Haileleol Tibebu

Publication date

2022

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

Varuna Desilva

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate
