LIU Junjia, FU Zhuang, XIE Rongli, et al. Inexplicit Priori Guided Efficient Reinforcement Learning for Mobile Robot Navigation[J]. Machinery & Electronics, 2021, (08): 72-76.

Inexplicit Priori Guided Efficient Reinforcement Learning for Mobile Robot Navigation

Machinery & Electronics [ISSN: 1001-2257 / CN: 52-1052/TH]

Volume:
Issue:
2021, No. 08
Pages:
72-76
Column:
Intelligent Engineering
Publication Date:
2021-08-24

Article Info

Title:
Inexplicit Priori Guided Efficient Reinforcement Learning for Mobile Robot Navigation
Article ID:
1001-2257(2021)08-0072-05
Author(s):
LIU Junjia, FU Zhuang, XIE Rongli, ZHANG Jun, FEI Jian
(1. State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China;
2. Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China)
Keywords:
mobile robot; active visual navigation; point navigation; data-efficient reinforcement learning
CLC number:
TP249
Document code:
A
Abstract:
For a mobile robot that performs point-goal active visual navigation in indoor scenes using only an RGB-D camera, an efficient reinforcement learning control algorithm guided by an inexplicit prior is proposed, and a fusion control scheme combining the prior with reinforcement learning is designed. The results show that the proposed fusion control method maintains good navigation accuracy while avoiding meaningless exploration during training and shortening the reinforcement learning training time, thereby achieving data-efficient learning of the reinforcement learning policy.
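The abstract does not specify how the prior and the learned policy are fused. As a hedged illustration only (every name below is hypothetical and not taken from the paper), one common pattern for prior-guided training is a weighted blend of a hand-crafted goal-seeking command and the RL action, with the prior's weight annealed toward zero as training progresses:

```python
import numpy as np

def prior_command(goal_xy):
    """Hypothetical inexplicit prior: turn toward the goal point and drive forward.
    goal_xy is the goal coordinate in the robot frame; returns [linear_v, angular_v]."""
    heading_error = np.arctan2(goal_xy[1], goal_xy[0])   # angle to the goal
    linear_v = np.clip(np.linalg.norm(goal_xy), 0.0, 1.0)  # slow down near the goal
    return np.array([linear_v, heading_error])

def fuse(a_prior, a_rl, beta):
    """Blend prior and RL actions; beta in [0, 1] is the prior's weight."""
    return beta * np.asarray(a_prior) + (1.0 - beta) * np.asarray(a_rl)

def beta_schedule(step, total_steps):
    """Linearly anneal the prior's weight: the prior dominates early exploration,
    then hands control over to the learned policy."""
    return max(0.0, 1.0 - step / total_steps)
```

Early in training, `fuse` with `beta` near 1 keeps exploration close to the prior's goal-directed behavior (avoiding the meaningless exploration the abstract mentions); as `beta` decays, the reinforcement learning policy takes over. This is a sketch of the general technique, not the paper's actual controller.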


Memo:
Received: 2021-04-23
Funding: National Natural Science Foundation of China General Program (61973210); Medicine-Engineering Cross Research Projects (YG2019ZDA17, ZH2018QNB23); Joint Fund (USCAST2020-7)
Biographies: LIU Junjia (1997- ), male, from Dalian, Liaoning; master's degree; research interests include reinforcement learning and intelligent robot control algorithms. FU Zhuang (1972- ), male, from Zhaoyuan, Shandong; professor and doctoral supervisor; research interests include robotics and intelligent control systems, kinematics and dynamics; corresponding author.
Last Update: 2021-09-02