Posted on 2024-07-02, 16:37, authored by Kechen Song, Xiaotong Xue, Hongwei Wen, Yingying Ji, Yunhui Yan, Qinggang Meng
In recent years, multispectral object detection has achieved remarkable results owing to its ability to fuse information from the visible and thermal modalities. However, existing visible-thermal datasets are constructed from manually aligned image pairs, which cannot fully represent the challenges of real-world scenarios where image pairs are often misaligned. Existing methods for visible-thermal object detection are likewise based on aligned data and are limited by the accuracy of registration. To address these issues, we propose DVTOD, a misaligned visible-thermal object detection dataset captured by drones. DVTOD includes 16 challenging attributes and 54 capture scenes. Furthermore, we introduce a cross-modal alignment detector (CMA-Det) for misaligned visible-thermal object detection. First, we design an alignment network to estimate the visible-to-thermal deformation field, which is used to correct the misalignment between corresponding visible and thermal features. Second, we propose a strategy called Object Search Rectification (OSR) to improve the robustness of feature alignment. To better remove the interference of complex backgrounds, a bi-directional feature correction fusion module (BFCFM) is designed to calibrate bimodal features by exploiting the correlation of channel and spatial information between the two modalities. CMA-Det outperforms existing methods on the DVTOD dataset and two other visible-thermal object detection datasets. The dataset and code will be published at https://github.com/VDT-2048/DVTOD.
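To illustrate what correcting misalignment with a deformation field can mean in practice, the following is a minimal sketch, not the CMA-Det implementation. It assumes a single-channel feature map stored as nested lists and a dense per-pixel displacement field given in pixels, and it resamples the map with bilinear interpolation. The function name `warp_feature_map` and the arguments `flow_x` and `flow_y` are hypothetical; in the paper the field is estimated by the alignment network, whereas here it is supplied by hand.

```python
import math


def warp_feature_map(feat, flow_x, flow_y):
    """Resample a 2D feature map with a dense displacement field.

    Output location (y, x) takes the bilinearly interpolated value of
    feat at (y + flow_y[y][x], x + flow_x[y][x]). Sampling positions
    outside the map are clamped to the border. All names here are
    illustrative assumptions, not part of the published method.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Sampling position in the source map, clamped to the border.
            sx = min(max(x + flow_x[y][x], 0.0), float(w - 1))
            sy = min(max(y + flow_y[y][x], 0.0), float(h - 1))
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = sx - x0, sy - y0
            # Bilinear interpolation over the four neighbouring cells.
            top = feat[y0][x0] * (1.0 - wx) + feat[y0][x1] * wx
            bottom = feat[y1][x0] * (1.0 - wx) + feat[y1][x1] * wx
            out[y][x] = top * (1.0 - wy) + bottom * wy
    return out


# Toy "visible" feature map and a uniform one-pixel horizontal offset.
visible = [[0.0, 1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0, 7.0],
           [8.0, 9.0, 10.0, 11.0]]
zeros = [[0.0] * 4 for _ in range(3)]
shift_right = [[1.0] * 4 for _ in range(3)]

aligned = warp_feature_map(visible, shift_right, zeros)
print(aligned[0])  # [1.0, 2.0, 3.0, 3.0]
```

A multi-channel feature map would apply the same sampling to each channel. Deep learning frameworks typically provide a differentiable equivalent of this resampling step, which is what allows a deformation field to be learned end to end.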
Funding
Fundamental Research Funds for the Central Universities (N2403008, N2403010)
Chunhui Plan Cooperative Project of Ministry of Education (HZKY20220433)
The 111 Project (B16009)
School
Science
Department
Computer Science
Published in
IEEE Transactions on Intelligent Vehicles
Publisher
Institute of Electrical and Electronics Engineers (IEEE)