Human action recognition (HAR) from RGB videos
is an essential and challenging task in computer vision due
to its wide range of real-world applications, including human
behaviour analysis, human-computer interaction, robotics, and
surveillance. Since the breakthrough and rapid development of
deep learning, the performance of HAR based on deep
neural networks has improved significantly over the past decade. In
this survey, we discuss the growing use of deep learning for HAR,
including representative two-stream and 3D CNNs, and particularly
highlight the most recent successes achieved with attention and
transformers. We provide our perspective on the emerging trends
in designing innovative deep learning methods. In addition, we
present popular HAR datasets developed in recent years
and the benchmark accuracy achieved by current advances in
deep learning. This draws research attention to the challenges of
HAR by identifying performance gaps when applying deep
learning methods to large HAR datasets. Further, this survey
sheds light on the development of new methods and facilitates
qualitative comparison with the state of the art.
School
Science
Department
Computer Science
Published in
2021 20th IEEE International Conference On Machine Learning And Applications (ICMLA)
Pages
304 - 311
Source
20th International Conference on Machine Learning and Applications (ICMLA)