Loughborough University

File(s) under permanent embargo

Reason: Publisher requirement. Embargo will be lifted after publication.

X4D-SceneFormer: Enhanced scene understanding on 4D point cloud videos through cross-modal knowledge transfer

conference contribution
posted on 2024-01-11, 15:35 authored by Linglin Jing, Ying Xue, Xu Yan, Chaoda Zheng, Dong Wang, Ruimao Zhang, Zhigang Wang, Hui Fang, Bin Zhao, Zhen Li

The field of 4D point cloud understanding is rapidly developing with the goal of analyzing dynamic 3D point cloud sequences.

However, it remains a challenging task due to the sparsity and lack of texture in point clouds. Moreover, the irregularity of point clouds makes it difficult to align temporal information within video sequences. To address these issues, we propose a novel cross-modal knowledge transfer framework, called X4D-SceneFormer. This framework enhances 4D scene understanding by transferring texture priors from RGB sequences using a Transformer architecture with temporal relationship mining. Specifically, the framework is designed with a dual-branch architecture, consisting of a 4D point cloud transformer and a Gradient-aware Image Transformer (GIT). The GIT combines visual texture and temporal correlation features to offer rich semantics and dynamics for better point cloud representation. During training, we employ multiple knowledge transfer techniques, including temporal consistency losses and masked self-attention, to strengthen the knowledge transfer between modalities. This leads to enhanced performance during inference using single-modal 4D point cloud inputs. Extensive experiments demonstrate the superior performance of our framework on various 4D point cloud video understanding tasks, including action recognition, action segmentation and semantic segmentation.
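The training-time transfer described above can be illustrated with a generic sketch. This is not the authors' implementation: the function names (`feature_transfer_loss`, `temporal_consistency_loss`), the use of mean squared error, and the per-frame feature layout are all assumptions made for illustration. It only shows the general idea of pulling point-branch features toward image-branch features, and of matching how the features change from frame to frame.

```python
from typing import List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("feature vectors must be non-empty and equally long")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def feature_transfer_loss(point_feats: List[Vector],
                          image_feats: List[Vector]) -> float:
    """Hypothetical loss: average per-frame MSE between the two branches."""
    if len(point_feats) != len(image_feats):
        raise ValueError("both branches must cover the same frames")
    if not point_feats:
        return 0.0
    total = sum(mse(p, i) for p, i in zip(point_feats, image_feats))
    return total / len(point_feats)


def temporal_differences(feats: List[Vector]) -> List[Vector]:
    """Feature change between each pair of consecutive frames."""
    return [[b - a for a, b in zip(prev, nxt)]
            for prev, nxt in zip(feats, feats[1:])]


def temporal_consistency_loss(point_feats: List[Vector],
                              image_feats: List[Vector]) -> float:
    """Hypothetical loss: MSE between frame-to-frame changes of the branches."""
    if len(point_feats) < 2:
        return 0.0  # a single frame has no temporal change to compare
    return feature_transfer_loss(temporal_differences(point_feats),
                                 temporal_differences(image_feats))


# Toy features for three frames; the values are made up for illustration.
point_branch = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
image_branch = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```

In this toy input the two branches differ by a constant offset, so the per-frame loss is non-zero while the temporal loss is zero: the features move identically over time even though their values differ. At inference only the point branch would be used, consistent with the single-modal setting described in the abstract.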

The framework ranks first on the HOI4D challenge, achieving 85.3% (+7.9%) accuracy for 4D action segmentation and 47.3% (+5.0%) mIoU for semantic segmentation, outperforming the previous state of the art by a large margin. We release the code at https://github.com/jinglinglingling/X4D
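For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how label accuracy and mean intersection-over-union (mIoU) are commonly computed. The function names and the choice to skip classes absent from both prediction and ground truth are assumptions; the HOI4D challenge's official evaluation protocol may differ in such details.

```python
from typing import List


def accuracy(pred: List[int], target: List[int]) -> float:
    """Fraction of labels (e.g. per-frame actions) predicted correctly."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(1 for p, t in zip(pred, target) if p == t) / len(pred)


def mean_iou(pred: List[int], target: List[int], num_classes: int) -> float:
    """Mean over classes of |pred AND target| / |pred OR target|."""
    if len(pred) != len(target):
        raise ValueError("pred and target must be equally long")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: ignore classes absent from both inputs
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy labels for four points (or frames); values are made up.
pred_labels = [0, 0, 1, 1]
true_labels = [0, 1, 1, 1]
```

With these toy labels, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12, while accuracy is 3/4.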

Funding

Shanghai AI Laboratory, National Key R&D Program of China (2022ZD0160100)

National Natural Science Foundation of China (62106183)

Shenzhen General Program No. JCYJ20220530143600001

Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen HK S&T Cooperation Zone

Shenzhen-Hong Kong Joint Funding No. SGDX20211123112401002

NSFC with Grant No. 62293482

Shenzhen Outstanding Talents Training Fund

Guangdong Research Project No. 2017ZT07X152 and No. 2019CX01X104

Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001)

Guangdong Provincial Key Laboratory of Big Data Computing

The Chinese University of Hong Kong, Shenzhen

NSFC 61931024 & 81922046

Shenzhen Key Laboratory of Big Data and Artificial Intelligence (Grant No. ZDSYS201707251409055)

Key Area R&D Program of Guangdong Province with grant No. 2018B030338001

Zelixir Biotechnology Company Fund

Tencent Open Fund

History

School

  • Science

Department

  • Computer Science

Published in

Proceedings of the 38th Annual AAAI Conference on Artificial Intelligence

Source

The 38th Annual AAAI Conference on Artificial Intelligence

Publisher

AAAI Press

Version

  • AM (Accepted Manuscript)

Rights holder

© Association for the Advancement of Artificial Intelligence

Publisher statement

This is a conference paper presented at the 38th Annual AAAI Conference on Artificial Intelligence and published openly by AAAI Press.

Acceptance date

2023-12-08

ISBN

9781577358800

ISSN

2159-5399

eISSN

2374-3468

Language

  • en

Location

Vancouver, Canada

Event dates

20th February 2024 - 27th February 2024

Depositor

Linglin Jing. Deposit date: 25 December 2023
