X4D-SceneFormer: Enhanced scene understanding on 4D point cloud videos through cross-modal knowledge transfer
The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences.
However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point clouds makes it difficult to align temporal information within video sequences. To address these issues, we propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer. This framework enhances 4D scene understanding by transferring texture priors from RGB sequences using a Transformer architecture with temporal relationship mining. Specifically, the framework is designed with a dual-branch architecture, consisting of a 4D point cloud transformer and a Gradient-aware Image Transformer (GIT). The GIT combines visual texture and temporal correlation features to offer rich semantics and dynamics for better point cloud representation. During training, we employ multiple knowledge transfer techniques, including temporal consistency losses and masked self-attention, to strengthen the knowledge transfer between modalities. This leads to enhanced performance during inference using single-modal 4D point cloud inputs. Extensive experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks, including action recognition, action segmentation and semantic segmentation.
The framework ranks first on the HOI4D challenge, reaching 85.3% (+7.9%) accuracy for 4D action segmentation and 47.3% (+5.0%) mIoU for semantic segmentation, outperforming the previous state of the art by a large margin. We release the code at https://github.com/jinglinglingling/X4D
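The training-time transfer described in the abstract can be illustrated with a minimal sketch. It is not taken from the released code: the function names, the choice of mean squared error, and the weights `w_feat` and `w_temp` are all assumptions made for illustration. The sketch only shows the general idea that image-branch features guide the point cloud branch during training, through a per-frame feature term and a term that matches frame-to-frame feature changes, while inference would use the point cloud branch alone.

```python
from typing import List

Vector = List[float]  # one feature vector per frame (hypothetical layout)


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_transfer_loss(point_feats: List[Vector], image_feats: List[Vector]) -> float:
    """Average per-frame MSE between point-branch and image-branch features."""
    if len(point_feats) != len(image_feats) or not point_feats:
        raise ValueError("both sequences need the same, non-zero number of frames")
    return sum(mse(p, i) for p, i in zip(point_feats, image_feats)) / len(point_feats)


def frame_deltas(seq: List[Vector]) -> List[Vector]:
    """Feature change between each pair of consecutive frames."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(seq, seq[1:])]


def temporal_consistency_loss(point_feats: List[Vector], image_feats: List[Vector]) -> float:
    """MSE between the frame-to-frame feature changes of the two branches."""
    if len(point_feats) != len(image_feats):
        raise ValueError("both sequences need the same number of frames")
    dp, di = frame_deltas(point_feats), frame_deltas(image_feats)
    if not dp:  # a single frame has no temporal change to compare
        return 0.0
    return sum(mse(p, i) for p, i in zip(dp, di)) / len(dp)


def training_loss(task_loss: float, point_feats: List[Vector], image_feats: List[Vector],
                  w_feat: float = 1.0, w_temp: float = 1.0) -> float:
    """Task loss plus the two illustrative transfer terms (weights are assumptions)."""
    return (task_loss
            + w_feat * feature_transfer_loss(point_feats, image_feats)
            + w_temp * temporal_consistency_loss(point_feats, image_feats))
```

In this sketch the image-branch features act as a fixed target, so only the point cloud branch would be updated by the transfer terms; at inference time `image_feats` is not needed at all.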
Funding
Shanghai AI Laboratory, National Key R&D Program of China (2022ZD0160100)
National Natural Science Foundation of China (62106183)
Shenzhen General Program No. JCYJ20220530143600001
Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen HK S&T Cooperation Zone
Shenzhen-Hong Kong Joint Funding No. SGDX20211123112401002
NSFC with Grant No. 62293482
Shenzhen Outstanding Talents Training Fund
Guangdong Research Project No. 2017ZT07X152 and No. 2019CX01X104
Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001)
Guangdong Provincial Key Laboratory of Big Data Computing
The Chinese University of Hong Kong, Shenzhen
NSFC 61931024 & 81922046
Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. ZDSYS201707251409055)
Key Area R&D Program of Guangdong Province with grant No. 2018B030338001
zelixir biotechnology company Fund
Tencent Open Fund
School
- Science
Department
- Computer Science
Published in
Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence
Volume
38
Issue
3
Pages
2670-2678
Source
The 38th Annual AAAI Conference on Artificial Intelligence
Publisher
AAAI Press
Version
- AM (Accepted Manuscript)
Rights holder
© Association for the Advancement of Artificial Intelligence
Publisher statement
This is a conference paper presented at the 38th Annual AAAI Conference on Artificial Intelligence and published openly by AAAI Press. © Association for the Advancement of Artificial Intelligence. All Rights Reserved.
Acceptance date
2023-12-08
Publication date
2024-03-24
Copyright date
2024
ISBN
9781577358800
ISSN
2159-5399
eISSN
2374-3468
Language
- en