We study the problem of utilizing reinforcement learning for action control in 1v1 Beyond-Visual-Range (BVR) air combat. In contrast to most reinforcement learning problems, 1v1 BVR air combat belongs to the class of two-player zero-sum games with long decision-making periods and sparse rewards. The complexity of the action and state spaces in this game makes it difficult to learn high-level air combat strategies from scratch. To address this problem, we propose a reinforcement learning self-play training framework that tackles it from two aspects: the decision model and the training algorithm. Our decision-making model uses the Soft Actor-Critic (SAC) algorithm, a method based on maximum entropy, for action control in the reinforcement learning part, and introduces an action mask to achieve efficient exploration. Our training algorithm improves Neural Fictitious Self-Play (NFSP) with a best response history correction (BRHC). These two components enable efficient training in a high-fidelity simulation environment. Results on the 1v1 BVR air combat problem show that the improved NFSP-BRHC algorithm outperforms both the NFSP and Self-Play (SP) algorithms.
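The abstract does not specify how the action mask is implemented; a minimal sketch of one common approach for discrete action spaces (pushing the logits of invalid actions to negative infinity before the softmax, so the policy assigns them zero probability) might look like the following. The function name and the use of NumPy are illustrative assumptions, not the paper's code.

```python
import numpy as np

def masked_action_probs(logits, action_mask):
    """Sketch of action masking: invalid actions get -inf logits,
    so after the softmax their sampling probability is exactly zero."""
    masked = np.where(action_mask, logits, -np.inf)
    shifted = masked - masked.max()  # subtract max for numerical stability
    exp = np.exp(shifted)            # exp(-inf) == 0.0 for masked entries
    return exp / exp.sum()

# Example: four discrete actions, action 1 is currently invalid.
logits = np.array([1.0, 2.0, 0.5, -1.0])
mask = np.array([True, False, True, True])
probs = masked_action_probs(logits, mask)
```

Restricting exploration this way avoids wasting samples on actions that the environment would reject, which is one plausible reading of "efficient exploration" in the abstract.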
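The BRHC correction is the paper's own contribution and is not detailed here, but the underlying NFSP scheme it builds on is standard: at the start of each episode the agent acts either with its best-response (RL) policy, with some anticipatory probability eta, or with its average (supervised) policy. A minimal sketch, with illustrative names:

```python
import random

def nfsp_select_policy(eta=0.1, rng=random.random):
    """Standard NFSP policy selection: with probability eta use the
    best-response (RL) policy, otherwise use the average policy that
    imitates past best responses."""
    if rng() < eta:
        return "best_response"
    return "average_policy"
```

In NFSP the average policy is what converges toward a Nash equilibrium in two-player zero-sum games, which is presumably why the authors chose it over plain self-play as the baseline training scheme.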