Skeleton-based action recognition by deep learning
Skeleton-based action recognition is an important research direction in computer vision. Compared with conventional video data, skeleton data is less affected by environmental and background interference, which gives skeleton-based action recognition broad application prospects in scenarios such as human-computer interaction. However, the current mainstream method, the graph convolutional network (GCN), still faces several challenges in this task: information loss between nodes, a limited receptive field, insufficient temporal feature extraction, and slow training caused by high model complexity. To address these problems, we propose three new GCN-based models.
First, the graph instinctive attention convolutional network (GIAN) introduces an instinctive attention module. This module applies self-attention before the convolution process to preserve the initial correlation between skeleton joints. This approach significantly improves the model's ability to capture complex joint relationships and thereby its recognition accuracy.
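The idea of computing attention between joints before any graph convolution can be illustrated with a minimal sketch. It is not the GIAN implementation: it assumes plain scaled dot-product attention with no learned projections, and it assumes the attention matrix is simply added to a fixed skeleton adjacency before aggregation. The function names `joint_attention` and `attention_then_graph_conv` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def joint_attention(x):
    """Scaled dot-product attention between joints.

    x holds one feature vector per joint (J rows, C columns). Assumption:
    queries and keys are the raw features, with no learned projections.
    """
    scale = math.sqrt(len(x[0]))
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)  # J x J similarity between joints
    return [softmax([s / scale for s in row]) for row in scores]


def attention_then_graph_conv(x, adjacency):
    """Compute attention on the raw input first, then aggregate joint features.

    Assumption: the attention matrix is added to the skeleton adjacency;
    the learned weight matrix of a real GCN layer is omitted.
    """
    att = joint_attention(x)
    mixed = [[a + w for a, w in zip(a_row, w_row)]
             for a_row, w_row in zip(adjacency, att)]
    return matmul(mixed, x)


# Toy example: three joints in a chain, two features per joint.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
chain = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
output = attention_then_graph_conv(features, chain)
```

Because the attention weights are computed from the input features before aggregation, joints that are not physically connected in `chain` can still exchange information in this sketch.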
Second, the independent dual graph attention convolutional network (IDGAN) enhances the model's information extraction ability by using two independent self-attention modules. The two modules process the data streams of different channels separately to avoid interference between channels. This architecture achieves more accurate spatial and temporal feature extraction and is readily compatible with other GCN-based models.
Finally, the fast distance enhanced graph convolutional network (FD-GCN) introduces distance-enhanced topology (DTS) to expand the receptive field of the skeleton graph, and fast response time series convolution (FTSC) to extract temporal information more efficiently. FD-GCN addresses the problem of slow computation while maintaining state-of-the-art performance. Its training and inference times are significantly shortened, which reduces the GPU resources required for skeleton-based action recognition.
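One way to picture how a distance-aware topology can widen the receptive field is sketched below. This is an illustration under stated assumptions, not the FD-GCN definition: it assumes "distance" means hop count along the skeleton, that joints up to `max_hops` apart are connected, and that the edge weight decays as 1/d. The function names `hop_distances` and `distance_weighted_adjacency` are hypothetical.

```python
from collections import deque


def hop_distances(num_joints, edges):
    """Hop count between every pair of joints, found by breadth-first search.

    Unreachable pairs are left as None.
    """
    neighbours = [[] for _ in range(num_joints)]
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    dist = [[None] * num_joints for _ in range(num_joints)]
    for start in range(num_joints):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if dist[start][nxt] is None:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


def distance_weighted_adjacency(num_joints, edges, max_hops):
    """Adjacency that links joints up to max_hops apart.

    Assumption: weights decay as 1/d with hop distance d; the weighting
    used in the thesis may differ.
    """
    dist = hop_distances(num_joints, edges)
    adjacency = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        for j in range(num_joints):
            d = dist[i][j]
            if d is not None and 0 < d <= max_hops:
                adjacency[i][j] = 1.0 / d
    return adjacency


# Toy skeleton: four joints in a chain, 0-1-2-3.
bones = [(0, 1), (1, 2), (2, 3)]
wide = distance_weighted_adjacency(4, bones, max_hops=2)
print(wide[0])  # [0.0, 1.0, 0.5, 0.0]
```

With `max_hops=1` the result reduces to the ordinary bone adjacency, so a single graph convolution sees only direct neighbours. Raising `max_hops` lets one layer aggregate from more distant joints.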
Extensive experiments on multiple datasets and benchmarks show that these models significantly improve accuracy, contributing to the advancement of skeleton-based action recognition.
History
School
- Science
Department
- Computer Science
Publisher
Loughborough University
Rights holder
© Jinze Huo
Publication date
2024
Notes
A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.
Language
- en
Supervisor(s)
Qinggang Meng
Qualification name
- PhD
Qualification level
- Doctoral