LIU Zixuan, YU Jianfeng, et al. Research on Skeleton Action Recognition Based on Graph-structured Transformer Networks [J]. Machinery & Electronics, 2025, (03): 3-8.

Research on Skeleton Action Recognition Based on Graph-structured Transformer Networks

Machinery & Electronics [ISSN: 1001-2257 / CN: 52-1052/TH]

Issue:
2025, No. 03
Pages:
3-8
Section:
Research & Design
Publication Date:
2025-03-25

Article Info

Title:
Research on Skeleton Action Recognition Based on Graph-structured Transformer Networks
Article ID:
1001-2257(2025)03-0003-06
Author(s):
LIU Zixuan1,2, YU Jianfeng1,2, QIAN Chenhao1,2, HUA Chunjian3, JIANG Yi3
(1. School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China;
2. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, China;
3. School of Intelligent Manufacturing, Jiangnan University, Wuxi 214122, China)
Keywords:
skeleton action recognition; graph-structured Transformer; hierarchical attention; skeleton graph structure embedding
CLC Number:
TP391.4
Document Code:
A
Abstract:
To address the limitations of current skeleton-based human action recognition methods, which predominantly use graph convolutional frameworks and lack the ability to effectively model temporal information and fuse spatio-temporal features, this paper proposes the Actionformer model based on the Transformer algorithm. The model employs a grouped attention structure to enhance the extraction of local features, and incorporates spatial and temporal information embedding modules to improve the original Transformer model's ability to capture spatial and temporal characteristics. Experimental results demonstrate that Actionformer achieves higher action recognition accuracy on the NTU RGB+D dataset than traditional graph-convolution- and Transformer-based models such as ST-GCN and ST-TR.
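The two ideas named in the abstract, attention restricted to joint groups and additive spatial/temporal embeddings, can be sketched minimally as below. All names, tensor shapes, and the choice of joint groups are illustrative assumptions for exposition; the paper's actual Actionformer implementation is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(x, groups):
    """x: (T, V, C) = frames x joints x channels; groups: lists of joint indices.
    Self-attention is computed only among joints inside each group (e.g. one limb),
    which localizes the attention pattern compared with full-skeleton attention."""
    T, V, C = x.shape
    out = np.zeros_like(x)
    for g in groups:
        xg = x[:, g, :]                        # (T, |g|, C)
        scores = xg @ xg.transpose(0, 2, 1)    # (T, |g|, |g|) dot-product affinities
        attn = softmax(scores / np.sqrt(C))    # rows sum to 1 within the group
        out[:, g, :] = attn @ xg
    return out

rng = np.random.default_rng(0)
T, V, C = 4, 6, 8                              # frames, joints, channels (toy sizes)
x = rng.normal(size=(T, V, C))

spatial_emb = rng.normal(size=(1, V, C))       # one learned vector per joint
temporal_emb = rng.normal(size=(T, 1, C))      # one learned vector per frame
x = x + spatial_emb + temporal_emb             # inject skeleton/time structure first

groups = [[0, 1, 2], [3, 4, 5]]                # hypothetical two-limb partition
y = grouped_attention(x, groups)
print(y.shape)                                 # (4, 6, 8)
```

In a trained model the embeddings would be learned parameters and the attention would use separate query/key/value projections; the sketch keeps only the grouping and embedding mechanics the abstract describes.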

References:

[1] LIU B L, ZHOU S, DONG J F, et al. Research progress of skeleton-based human action recognition technology [J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(9): 1299-1322.
[2] LIU K, XI X B, ZHOU M D. Skeleton action recognition based on adaptive multi-scale graph convolutional network [J]. Computer Engineering, 2023, 49(10): 264-271.
[3] LI Z L, LI Q D, WANG X B, et al. Human-robot interaction system for a robotic arm based on Kinect-V2 and Unity3D [J]. Computer & Digital Engineering, 2024, 52(3): 735-739, 745.
[4] SONG L C, YU G, YUAN J S, et al. Human pose estimation and its application to action recognition: a survey [J]. Journal of Visual Communication and Image Representation, 2021, 75: 103055.
[5] CHUNG J L, ONG L Y, LEOW M C. Comparative analysis of skeleton-based human pose estimation [J]. Future Internet, 2022, 14(12): 380.
[6] DONG C G, TANG Y H, ZHANG L Y, et al. HDA pose: a real-time 2D human pose estimation method based on modified YOLOv8 [J]. Signal, Image and Video Processing, 2024, 18(8/9): 5823-5839.
[7] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis [C] // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 1010-1019.
[8] YAN S J, XIONG Y J, LIN D H, et al. Spatial temporal graph convolutional networks for skeleton-based action recognition [C] // AAAI'18: AAAI Conference on Artificial Intelligence, 2018: 7444-7452.
[9] LI J J, HUANG Z J, ZOU L. Human action recognition based on motion-guided graph convolutional network [J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(7): 1077-1086.
[10] HAN Z W, YANG H, WU S Q, et al. Action recognition network combining spatio-temporal adaptive graph convolution and Transformer [J]. Journal of Electronics & Information Technology, 2024, 46(6): 2587-2595.
[11] WANG G Q, LIU M Y, LIU H, et al. Augmented skeleton sequences with hypergraph network for self-supervised group activity recognition [J]. Pattern Recognition, 2024, 152: 110478.
[12] PANG Y S, KE Q H, RAHMANI H, et al. IGFormer: interaction graph transformer for skeleton-based human interaction recognition [C] // Computer Vision - ECCV 2022, 2022: 605-622.
[13] MENG Y, SHI M Q, YANG W L. Skeleton action recognition based on transformer adaptive graph convolution [J]. Journal of Physics: Conference Series, 2022, 2170: 12007.
[14] LI S, LI W Q, COOK C, et al. Independently recurrent neural network: building a longer and deeper RNN [C] // 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 5457-5466.
[15] SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition [C] // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 12026-12035.
[16] PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal transformer networks [J]. Computer Vision and Image Understanding, 2021, 208/209: 103219.
[17] CHENG Q, CHENG J, REN Z L, et al. Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition [J]. Pattern Analysis and Applications, 2023, 26(3): 1303-1315.
[18] CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition [C] // Proceedings of the IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2022: 13339-13348.

Memo:
Received: 2024-09-23
Funding: National Natural Science Foundation of China (51905215)
About the authors: LIU Zixuan (2000-), male, from Xiaogan, Hubei, master's student; research interest: deep-learning-based action recognition. YU Jianfeng (1974-), male, from Yixing, Jiangsu, professor and corresponding author; research interests: robot motion control, industrial interconnection, and intelligent sensing.
Last Update: 2025-04-07