LIU Junjia, FU Zhuang, XIE Rongli, et al. Inexplicit Priori Guided Efficient Reinforcement Learning for Mobile Robot Navigation[J]. Machinery & Electronics, 2021, (08): 72-76.

Inexplicit Priori Guided Efficient Reinforcement Learning for Mobile Robot Navigation

Machinery & Electronics [ISSN: 1001-2257 / CN: 52-1052/TH]

Volume:
Issue:
2021, No. 08
Pages:
72-76
Column:
Intelligent Engineering
Publication Date:
2021-08-24

Article Info

Title:
Inexplicit Priori Guided Efficient Reinforcement Learning for Mobile Robot Navigation
Article ID:
1001-2257(2021)08-0072-05
Author(s):
LIU Junjia, FU Zhuang, XIE Rongli, ZHANG Jun, FEI Jian
(1. State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China;
2. Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China)
Keywords:
mobile robot; active visual navigation; point navigation; data-efficient reinforcement learning
CLC number:
TP249
Document code:
A
Abstract:
For a mobile robot that performs point-goal active visual navigation in indoor scenes using only an RGB-D camera, an efficient reinforcement learning control algorithm guided by an inexplicit prior is proposed, and a fusion control scheme combining the prior with reinforcement learning is designed. The results show that the proposed fusion control method maintains good navigation accuracy while avoiding meaningless exploration during training and shortening the reinforcement learning training time, thereby achieving data-efficient learning of the reinforcement learning policy.
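The abstract does not specify how the prior and the learned policy are fused. As a hedged illustration only (every name below is hypothetical and not taken from the paper), one common pattern for prior-guided training is a weighted blend of a hand-crafted goal-seeking command and the RL action, with the prior's weight annealed toward zero as training progresses:

```python
import numpy as np

def prior_command(goal_xy):
    """Hypothetical inexplicit prior: turn toward the goal point and drive forward.
    goal_xy is the goal coordinate in the robot frame; returns [linear_v, angular_v]."""
    heading_error = np.arctan2(goal_xy[1], goal_xy[0])   # angle to the goal
    linear_v = np.clip(np.linalg.norm(goal_xy), 0.0, 1.0)  # slow down near the goal
    return np.array([linear_v, heading_error])

def fuse(a_prior, a_rl, beta):
    """Blend prior and RL actions; beta in [0, 1] is the prior's weight."""
    return beta * np.asarray(a_prior) + (1.0 - beta) * np.asarray(a_rl)

def beta_schedule(step, total_steps):
    """Linearly anneal the prior's weight: the prior dominates early exploration,
    then hands control over to the learned policy."""
    return max(0.0, 1.0 - step / total_steps)
```

Early in training, `fuse` with `beta` near 1 keeps exploration close to the prior's goal-directed behavior (avoiding the meaningless exploration the abstract mentions); as `beta` decays, the reinforcement learning policy takes over. This is a sketch of the general technique, not the paper's actual controller.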


Memo:
Received: 2021-04-23
Funding: National Natural Science Foundation of China General Program (61973210); Medicine-Engineering Cross Research Projects (YG2019ZDA17, ZH2018QNB23); Joint Fund (USCAST2020-7)
Biographies: LIU Junjia (1997- ), male, from Dalian, Liaoning; master's degree; research interests include reinforcement learning and intelligent robot control algorithms. FU Zhuang (1972- ), male, from Zhaoyuan, Shandong; professor and doctoral supervisor; research interests include robotics and intelligent control systems, kinematics and dynamics; corresponding author.
Last Update: 2021-09-02