[1] CHAN M, ESTÈVE D, ESCRIBA C, et al. A review of smart homes: Present state and future challenges[J]. Computer methods and programs in biomedicine, 2008, 91(1): 55-81.
[2] HE W, LI Z J, CHEN C L P. A survey of human-centered intelligent robots: issues and challenges[J]. IEEE/CAA Journal of automatica sinica, 2017, 4(4): 602-609.
[3] SCHRITTWIESER J, ANTONOGLOU I, HUBERT T, et al. Mastering Atari, Go, chess and shogi by planning with a learned model[J]. Nature, 2020, 588(7839): 604-609.
[4] BADIA A P, PIOT B, KAPTUROWSKI S, et al. Agent57: Outperforming the Atari human benchmark[C]// The 37th International Conference on Machine Learning, 2020: 507-517.
[5] BERNER C, BROCKMAN G, CHAN B, et al. Dota 2 with large scale deep reinforcement learning[J]. arXiv preprint arXiv:1912.06680, 2019.
[6] DULAC-ARNOLD G, MANKOWITZ D, HESTER T. Challenges of real-world reinforcement learning[J]. arXiv preprint arXiv:1904.12901, 2019.
[7] YU Y. Towards sample efficient reinforcement learning[C]// The Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), 2018: 5739-5743.
[8] JANNER M, FU J, ZHANG M, et al. When to trust your model: Model-based policy optimization[J]. arXiv preprint arXiv:1906.08253, 2019.
[9] JING M X, MA X J, HUANG W B, et al. Reinforcement learning from imperfect demonstrations under soft expert guidance[C]// The Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020: 5109-5116.
[10] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]// The 35th International Conference on Machine Learning, 2018: 1861-1870.
[11] CHANG A, DAI A, FUNKHOUSER T, et al. Matterport3D: Learning from RGB-D data in indoor environments[C]// 2017 International Conference on 3D Vision (3DV), 2017: 667-676.
[12] KADIAN A, TRUONG J, GOKASLAN A, et al. Sim2Real predictivity: Does evaluation in simulation predict real-world performance?[J]. IEEE Robotics and automation letters, 2020, 5(4): 6670-6677.
[13] SAVVA M, KADIAN A, MAKSYMETS O, et al. Habitat: A platform for embodied AI research[C]// 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019: 9338-9346.
[14] WIJMANS E, KADIAN A, MORCOS A, et al. DD-PPO: Learning near-perfect PointGoal navigators from 2.5 billion frames[J]. arXiv preprint arXiv:1911.00357, 2019.
[15] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.