Loughborough University

Skeleton-based action recognition by deep learning

thesis
posted on 2025-02-21, 16:34 authored by Jinze Huo

Skeleton-based action recognition is an important research direction in computer vision. Compared with conventional video data, skeleton data is less affected by environmental background interference, which gives skeleton-based action recognition broad application prospects in areas such as human-computer interaction. However, the current mainstream approach, the graph convolutional network (GCN), still faces several challenges in skeleton-based action recognition: information loss between nodes, a limited receptive field, insufficient temporal feature extraction, and slow training caused by high model complexity. To address these problems, we propose three new GCN-based models.

First, the graph instinctive attention convolutional network (GIAN) introduces an instinctive attention module. This module applies self-attention before the convolution process to preserve the initial correlation between skeleton joints. This approach significantly improves the model's ability to capture complex joint relationships, thereby improving recognition accuracy.
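The idea of applying self-attention over the joints before a graph convolution can be illustrated with a minimal, dependency-free sketch. It is not the GIAN implementation: the abstract does not specify the module's internals, so the unparameterised scaled dot-product attention, the row-normalised adjacency, the toy four-joint skeleton, and all function names below are assumptions chosen for illustration only.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def joint_self_attention(x):
    """Scaled dot-product self-attention over joints (assumed: no learned projections)."""
    dim = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = [[v / math.sqrt(dim) for v in row] for row in matmul(x, x_t)]
    weights = softmax_rows(scores)
    return weights, matmul(weights, x)

def normalized_adjacency(num_joints, bones):
    """Row-normalised adjacency with self-loops: D^-1 (A + I)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]

def graph_conv(adj, x, w):
    """One graph convolution: aggregate over neighbours, then project features."""
    return matmul(matmul(adj, x), w)

# Hypothetical toy skeleton: 4 joints in a chain, 2 input features per joint.
bones = [(0, 1), (1, 2), (2, 3)]
x = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, -0.5]]
w = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]  # 2 input features -> 3 output features

# Attention runs first, on the raw joint features, and the graph convolution
# then operates on the attended features.
weights, attended = joint_self_attention(x)
out = graph_conv(normalized_adjacency(4, bones), attended, w)
```

In this sketch the attention weights form a dense joint-by-joint matrix, so joints that are not connected by a bone (for example joints 0 and 3) can still exchange information before the adjacency-restricted convolution is applied.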

Then, the independent dual graph attention convolutional network (IDGAN) enhances the model's information extraction ability by using two independent self-attention modules. The two modules process the data streams of different channels separately to avoid interference between channels. This architecture achieves more accurate spatial and temporal feature extraction and has strong compatibility with other GCN-based models.

Finally, the fast distance enhanced graph convolutional network (FD-GCN) introduces distance-enhanced topology (DTS) to expand the receptive field of the skeleton graph and fast response time series convolution (FTSC) to extract temporal information more efficiently. FD-GCN addresses the problem of slow computation while maintaining state-of-the-art performance. Its training and inference times are significantly shortened, reducing the GPU resources required for skeleton-based action recognition.
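One generic way to widen the receptive field of a skeleton graph is to connect joints that are several hops apart and to weight each connection by its hop distance. The sketch below shows that general idea only. The abstract does not define the distance-enhanced topology used in FD-GCN, so the 1/(1 + d) weighting, the `max_hops` parameter, the five-joint chain, and the function names are all assumptions.

```python
from collections import deque

def hop_distances(num_joints, bones):
    """All-pairs shortest hop counts on the skeleton graph, via breadth-first search."""
    neighbours = [[] for _ in range(num_joints)]
    for i, j in bones:
        neighbours[i].append(j)
        neighbours[j].append(i)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

def distance_weighted_adjacency(dist, max_hops):
    """Link joints up to max_hops apart with weight 1 / (1 + d), then normalise rows."""
    adj = []
    for row in dist:
        weights = [
            1.0 / (1 + d) if d is not None and d <= max_hops else 0.0
            for d in row
        ]
        total = sum(weights)  # never zero: the self-distance 0 contributes weight 1
        adj.append([w / total for w in weights])
    return adj

# Hypothetical toy skeleton: 5 joints in a chain.
bones = [(0, 1), (1, 2), (2, 3), (3, 4)]
dist = hop_distances(5, bones)
one_hop = distance_weighted_adjacency(dist, max_hops=1)
two_hop = distance_weighted_adjacency(dist, max_hops=2)
```

With `max_hops=1` each joint aggregates only from itself and its direct neighbours. With `max_hops=2` joint 0 also receives a smaller contribution from joint 2, so a single graph convolution layer covers a wider neighbourhood.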

Extensive experiments on multiple datasets and benchmarks show that these models achieve significant improvements in accuracy, contributing to the advancement of skeleton-based action recognition.

History

School

  • Science

Department

  • Computer Science

Publisher

Loughborough University

Rights holder

© Jinze Huo

Publication date

2024

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

Qinggang Meng

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate
