LIU Zixuan, YU Jianfeng, et al. Research on Skeleton Action Recognition Based on Graph-structured Transformer Networks [J]. Machinery & Electronics, 2025, (03): 3-8.

Research on Skeleton Action Recognition Based on Graph-structured Transformer Networks

Machinery & Electronics [ISSN: 1001-2257 / CN: 52-1052/TH]

Issue:
2025, No. 03
Pages:
3-8
Section:
Research & Design
Publication Date:
2025-03-25

Article Info

Title:
Research on Skeleton Action Recognition Based on Graph-structured Transformer Networks
Article ID:
1001-2257(2025)03-0003-06
Author(s):
LIU Zixuan1,2, YU Jianfeng1,2, QIAN Chenhao1,2, HUA Chunjian3, JIANG Yi3
(1. School of Mechanical Engineering, Jiangnan University, Wuxi 214122, China;
2. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Wuxi 214122, China;
3. School of Intelligent Manufacturing, Jiangnan University, Wuxi 214122, China)
Keywords:
skeleton action recognition; graph-structured Transformer; hierarchical attention; skeleton graph structure embedding
CLC Number:
TP391.4
Document Code:
A
Abstract:
To address the limitations of current skeleton-based human action recognition methods, which predominantly use graph convolutional frameworks and lack the ability to effectively model temporal information and fuse spatio-temporal features, this paper proposes the Actionformer model based on the Transformer algorithm. The model employs a grouped attention structure to enhance the extraction of local features, and incorporates spatial and temporal information embedding modules to improve the original Transformer model's ability to capture spatial and temporal characteristics. Experimental results demonstrate that Actionformer achieves higher action recognition accuracy on the NTU RGB+D dataset than traditional graph-convolution- and Transformer-based models such as ST-GCN and ST-TR.
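The two ideas named in the abstract, attention restricted to joint groups and additive spatial/temporal embeddings, can be sketched minimally as below. All names, tensor shapes, and the choice of joint groups are illustrative assumptions for exposition; the paper's actual Actionformer implementation is not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_attention(x, groups):
    """x: (T, V, C) = frames x joints x channels; groups: lists of joint indices.
    Self-attention is computed only among joints inside each group (e.g. one limb),
    which localizes the attention pattern compared with full-skeleton attention."""
    T, V, C = x.shape
    out = np.zeros_like(x)
    for g in groups:
        xg = x[:, g, :]                        # (T, |g|, C)
        scores = xg @ xg.transpose(0, 2, 1)    # (T, |g|, |g|) dot-product affinities
        attn = softmax(scores / np.sqrt(C))    # rows sum to 1 within the group
        out[:, g, :] = attn @ xg
    return out

rng = np.random.default_rng(0)
T, V, C = 4, 6, 8                              # frames, joints, channels (toy sizes)
x = rng.normal(size=(T, V, C))

spatial_emb = rng.normal(size=(1, V, C))       # one learned vector per joint
temporal_emb = rng.normal(size=(T, 1, C))      # one learned vector per frame
x = x + spatial_emb + temporal_emb             # inject skeleton/time structure first

groups = [[0, 1, 2], [3, 4, 5]]                # hypothetical two-limb partition
y = grouped_attention(x, groups)
print(y.shape)                                 # (4, 6, 8)
```

In a trained model the embeddings would be learned parameters and the attention would use separate query/key/value projections; the sketch keeps only the grouping and embedding mechanics the abstract describes.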

References:

[1] LIU B L, ZHOU S, DONG J F, et al. Research progress of skeleton-based human action recognition technology [J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(9): 1299-1322.
[2] LIU K, XI X B, ZHOU M D. Skeleton action recognition based on adaptive multi-scale graph convolutional network [J]. Computer Engineering, 2023, 49(10): 264-271.
[3] LI Z L, LI Q D, WANG X B, et al. Human-robot interaction system for a robotic arm based on Kinect-V2 and Unity3D [J]. Computer & Digital Engineering, 2024, 52(3): 735-739, 745.
[4] SONG L C, YU G, YUAN J S, et al. Human pose estimation and its application to action recognition: a survey [J]. Journal of Visual Communication and Image Representation, 2021, 75: 103055.
[5] CHUNG J L, ONG L Y, LEOW M C. Comparative analysis of skeleton-based human pose estimation [J]. Future Internet, 2022, 14(12): 380.
[6] DONG C G, TANG Y H, ZHANG L Y, et al. HDA pose: a real-time 2D human pose estimation method based on modified YOLOv8 [J]. Signal, Image and Video Processing, 2024, 18(8/9): 5823-5839.
[7] SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis [C] // 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE, 2016: 1010-1019.
[8] YAN S J, XIONG Y J, LIN D H, et al. Spatial temporal graph convolutional networks for skeleton-based action recognition [C] // AAAI'18: AAAI Conference on Artificial Intelligence, 2018: 7444-7452.
[9] LI J J, HUANG Z J, ZOU L. Human action recognition based on motion-guided graph convolutional network [J]. Journal of Computer-Aided Design & Computer Graphics, 2024, 36(7): 1077-1086.
[10] HAN Z W, YANG H, WU S Q, et al. Action recognition network combining spatio-temporal adaptive graph convolution and Transformer [J]. Journal of Electronics & Information Technology, 2024, 46(6): 2587-2595.
[11] WANG G Q, LIU M Y, LIU H, et al. Augmented skeleton sequences with hypergraph network for self-supervised group activity recognition [J]. Pattern Recognition, 2024, 152: 110478.
[12] PANG Y S, KE Q H, RAHMANI H, et al. IGFormer: interaction graph transformer for skeleton-based human interaction recognition [C] // Computer Vision - ECCV 2022, 2022: 605-622.
[13] MENG Y, SHI M Q, YANG W L. Skeleton action recognition based on transformer adaptive graph convolution [J]. Journal of Physics: Conference Series, 2022, 2170: 12007.
[14] LI S, LI W Q, COOK C, et al. Independently recurrent neural network: building a longer and deeper RNN [C] // 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2018: 5457-5466.
[15] SHI L, ZHANG Y F, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition [C] // 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2019: 12026-12035.
[16] PLIZZARI C, CANNICI M, MATTEUCCI M. Skeleton-based action recognition via spatial and temporal transformer networks [J]. Computer Vision and Image Understanding, 2021, 208/209: 103219.
[17] CHENG Q, CHENG J, REN Z L, et al. Multi-scale spatial-temporal convolutional neural network for skeleton-based action recognition [J]. Pattern Analysis and Applications, 2023, 26(3): 1303-1315.
[18] CHEN Y X, ZHANG Z Q, YUAN C F, et al. Channel-wise topology refinement graph convolution for skeleton-based action recognition [C] // Proceedings of the IEEE/CVF International Conference on Computer Vision. New York: IEEE, 2022: 13339-13348.

Memo:
Received: 2024-09-23
Funding: National Natural Science Foundation of China (51905215)
About the authors: LIU Zixuan (2000-), male, from Xiaogan, Hubei, master's student; research interest: deep-learning-based action recognition. YU Jianfeng (1974-), male, from Yixing, Jiangsu, professor and corresponding author; research interests: robot motion control, industrial interconnection, and intelligent sensing.
Last Update: 2025-04-07