References:
[1] WATKINS C J C H. Learning from delayed rewards[D]. Cambridge, UK: King’s College, University of Cambridge, 1989: 89-95.
[2] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, USA: MIT Press, 2018.
[3] SZEPESVÁRI C. The asymptotic convergence-rate of Q-learning[C]// Advances in Neural Information Processing Systems (NIPS). Cambridge, USA: MIT Press, 1998: 1064-1070.
[4] HWANG I, JANG Y J. Q(λ) learning-based dynamic route guidance algorithm for overhead hoist transport systems in semiconductor fabs[J]. International Journal of Production Research, 2019(3): 1-23.
[5] ALIMORADI M R, KASHAN A H. A league championship algorithm equipped with network structure and backward Q-learning for extracting stock trading rules[J]. Applied Soft Computing, 2018, 68: 478-493.
[6] LIN Longji. Self-improving reactive agents based on reinforcement learning, planning and teaching[J]. Machine Learning, 1992, 8(3/4): 293-321.
[7] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[8] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, USA: AAAI Press, 2016: 2094-2100.
[9] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. (2017-07-20)[2017-08-28]. https://arxiv.org/abs/1707.06347.
[10] GU Shixiang, LILLICRAP T, SUTSKEVER I, et al. Continuous deep Q-learning with model-based acceleration[C]// International Conference on Machine Learning. New York, USA: [s.n.], 2016: 2829-2838.
[11] MAHMOOD A R, VAN HASSELT H, SUTTON R S. Weighted importance sampling for off-policy learning with linear function approximation[C]// Advances in Neural Information Processing Systems.[S.l.]: Neural Information Processing Systems Foundation, Inc., 2014: 3014-3022.
[12] SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. (2015-11-18)[2019-04-01]. https://arxiv.org/abs/1511.05952.
[13] VAN HASSELT H. Double Q-learning[C]// Advances in Neural Information Processing Systems.[S.l.]: Neural Information Processing Systems Foundation, Inc., 2010: 2613-2621.
[14] ARJONA-MEDINA J A, GILLHOFER M, WIDRICH M, et al. RUDDER: return decomposition for delayed rewards[EB/OL]. (2018-06-20)[2019-09-10]. https://arxiv.org/abs/1806.07857.
[15] ROLNICK D, AHUJA A, SCHWARZ J, et al. Experience replay for continual learning[C]// Advances in Neural Information Processing Systems.[S.l.]: Neural Information Processing Systems Foundation, Inc., 2019: 348-358.