Tài liệu Smart agent for card games using reinforcement learning

.PDF

142

thanhphoquetoi Báo vi phạm

Tải xuống 142

Mô tả:

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY FACULTY OF COMPUTER SCIENCE AND ENGINEERING ——————– * ——————— THESIS Smart agent for card-games using reinforcement learning COUNCIL : COMPUTER SCIENCE SUPERVISOR : PhD. Nguyen Duc Dung REVIEWER : MSc. Vuong Ba Thinh —o0o— Students: Vo Khac Thanh Cap Dang Xuan Kiet 1713164 1711852 HO CHI MINH CITY, October, 2021 ĈҤ,+Ӑ&48Ӕ&*,$73+&0 75ѬӠ1*ĈҤ,+Ӑ&%È&+.+2$ .+2$.+ .70i\WtQK %Ӝ0Ð1 .+07 &Ӝ1*+Ñ$;+Ӝ,&+Ӫ1*+Ƭ$9,ӊ71$0 ĈӝFOұS7ӵGR+ҥQKSK~F 1+,ӊ09Ө/8Ұ1È17Ӕ71*+,ӊ3 +Ӑ9¬7Ç19}.KҳF7KjQK +Ӑ9¬7Ç1&iSĈһQJ;XkQ.LӋW 1*¬1+.KRDKӑFPi\WtQK 0669 0669 /Ӟ3 ĈҫXÿӅOXұQiQ $JHQW7K{QJ0LQKFKR&DUGJDPHYӟL0{+uQK+ӑF7ăQJ&ѭӡQJ 1KLӋPYө\rXFҫXYӅQӝLGXQJYjVӕOLӋXEDQÿҫX 7uPKLӇXFiFWKӇORҥLWXUQEDVHGFDUGJDPHFiFF{QJQJKӋ[k\GӵQJJDPH 7uPKLӇXFiFF{QJQJKӋKӑFWăQJFѭӡQJÿѭӧFGQJWURQJ[k\GӵQJDJHQW ĈӅ[XҩW[k\GӵQJPӝWWXUQEDVHGFDUGJDPHSKKӧSFKRQJKLrQFӭX ĈӅ[XҩWP{KuQKKӑFWăQJFѭӡQJÿӇSKiWWULӇQDJHQWFKRJDPH +LӋQWKӵFJDPHYjP{KuQKFKRDJHQW 3KkQWtFKYjÿiQKJLiFiFNӃWTXҧÿҥWÿѭӧFFӫDDJHQW 1Jj\JLDRQKLӋPYөOXұQiQ 1Jj\KRjQWKjQKQKLӋPYө +ӑWrQJLҧQJYLrQKѭӟQJGүQ 1JX\ӉQĈӭF'ǊQJ 3KҫQKѭӟQJGүQ 1ӝLGXQJYj\rXFҫX/971ÿmÿѭӧFWK{QJTXD%ӝP{Q 1Jj\WKiQJQăP &+Ӫ1+,ӊ0%Ӝ0Ð1 *,Ҧ1*9,Ç1+ѬӞ1*'Ү1&+Ë1+ .êYjJKLU}K͕WrQ .êYjJKLU}K͕WrQ 3*676+XǤQK7ѭӡQJ1JX\rQ 761JX\ӉQĈӭF'ǊQJ 3+̮1'¬1+&+2.+2$%͠0Ð1 1JѭӡLGX\ӋWFKҩPVѫEӝ ĈѫQYӏ 1Jj\EҧRYӋ ĈLӇPWәQJNӃW 1ѫLOѭXWUӳOXұQiQ 75ѬӠ1*ĈҤ,+Ӑ&%È&+.+2$ .+2$.+ .70È<7Ë1+ &Ӝ1*+Ñ$;+Ӝ,&+Ӫ1*+Ƭ$ 9,ӊ71$0 ĈӝFOұS7ӵGR+ҥQKSK~F 1Jj\WKiQJQăP 3+,ӂ8&+Ҩ0%Ҧ29ӊ/971 'jQKFKRQJ˱ͥLK˱ͣQJG̳Q +ӑYjWrQ699}.KҳF7KjQK&iSĈһQJ;XkQ.LӋW 0669 1JjQKFKX\rQQJjQK.+07 ĈӅWjL$JHQW7K{QJ0LQKFKR&DUGJDPHYӟL0{+uQK+ӑF7ăQJ&ѭӡQJ +ӑWrQQJѭӡLKѭӟQJGүQ1JX\ӉQĈӭF'ǊQJ 7әQJTXiWYӅEҧQWKX\ӃWPLQK 6ӕWUDQJ 6ӕEҧQJVӕOLӋX 6ӕWjLOLӋXWKDPNKҧR +LӋQYұWVҧQSKҭP 6ӕFKѭѫQJ 6ӕKuQKYӁ 3KҫQPӅPWtQKWRiQ 7әQJTXiWYӅFiFEҧQYӁ 6ӕEҧQYӁ %ҧQ$ 6ӕEҧQYӁYӁWD\ %ҧQ$ .KәNKiF 6ӕEҧQYӁWUrQPi\WtQK 1KӳQJѭXÿLӇPFKtQKFӫD/971 ĈӅ WjL QJKLrQ FӭX [k\ GӵQJ PӝW DJHQW WK{QJ PLQK FKR FDUGJDPH YӟL Nӻ WKXұW KӑF WăQJ FѭӡQJ 1JKLrQ FӭX Qj\ Fy WKӇ Pӣ UD FiF KѭӟQJ SKiW WULӇQ PӟL WURQJ YLӋF [k\ GӵQJ FiF KӋ WKӕQJ UD TX\ӃW ÿӏQK WK{QJ PLQK Kӛ WUӧ FKR FRQ QJѭӡL ӣ QKLӅX OƭQK YӵF 1KyP ÿm ÿӅ [XҩW Yj [k\ GӵQJ WKjQK F{QJ PӝW FDUG JDPH YӟL ÿӝ NKy SK KӧS Yj WұS OXұW KRjQ FKӍQK *DPH Qj\ ÿѭӧF Vӱ GөQJ QKѭ PӝW QӅQ WҧQJ JL~S QJKLrQ FӭX FiF JLҧL WKXұW KӑF WăQJ FѭӡQJ QKҵP [k\ GӵQJ DJHQW WK{QJ PLQK 1KyP FǊQJ ÿm WKӵF KLӋQ QJKLrQ FӭX Yj [k\ GӵQJ FiF P{ KuQK KӑF WăQJ FѭӡQJ ÿӇ SKiW WULӇQ DJHQW &iF NӃW TXҧ WKXÿѭӧFNKҧTXDQYjFzQQKLӅXWLӅPQăQJÿӇFҧLWKLӋQ 1KӳQJWKLӃXVyWFKtQKFӫD/971 'R WKӡL JLDQ JLӟL KҥQ FǊQJ QKѭ FiF KҥQ FKӃ YӅ WjL QJX\rQ WtQK WRiQ QrQ QKyP FKӍ PӟL GӯQJ OҥL ӣ FiFDJHQWÿѫQJLҧQ ĈӅQJKӏĈѭӧFEҧRYӋ 🗹 %әVXQJWKrPÿӇEҧRYӋ Ƒ .K{QJÿѭӧFEҧRYӋ Ƒ FkXKӓL69SKҧLWUҧOӡLWUѭӟF+ӝLÿӗQJ D E F ĈiQKJLiFKXQJEҵQJFKӳJLӓLNKi7%JLӓL ĈLӇP .êWrQJKLU}KӑWrQ 761JX\ӉQĈӭF'ǊQJ 75ѬӠ1*ĈҤ,+Ӑ&%È&+.+2$ .+2$.+ .70È<7Ë1+ &Ӝ1*+Ñ$;+Ӝ,&+Ӫ1*+Ƭ$ 9,ӊ71$0 ĈӝFOұS7ӵGR+ҥQKSK~F 1Jj\WKiQJQăP 3+,ӂ8&+Ҩ0%Ҧ29ӊ/971 'jQKFKRQJ˱ͥLSK̫QEL͏Q +ӑYjWrQ699}.KҳF7KjQK0669Yj&iSĈһQJ;XkQ.LӋW0669 1JjQKFKX\rQQJjQK.KRDKӑFPi\WtQK ĈӅWjL$JHQW7K{QJ0LQKFKR&DUGJDPHYӟL0{+uQK+ӑF7ăQJ&ѭӡQJ +ӑWrQQJѭӡLSKҧQELӋQ9ѭѫQJ%i7KӏQK 7әQJTXiWYӅEҧQWKX\ӃWPLQK 6ӕWUDQJ 6ӕFKѭѫQJ 6ӕEҧQJVӕOLӋX 6ӕKuQKYӁ 6ӕWjLOLӋXWKDPNKҧR 3KҫQPӅPWtQKWRiQ +LӋQYұWVҧQSKҭP 7әQJTXiWYӅFiFEҧQYӁ 6ӕEҧQYӁ %ҧQ$ %ҧQ$ .KәNKiF 6ӕEҧQYӁYӁWD\ 6ӕEҧQYӁWUrQPi\WtQK 1KӳQJѭXÿLӇPFKtQKFӫD/971 1KyPVLQKYLrQÿmWuPKLӇXFiFF{QJWUuQKOLrQTXDQFiFF{QJFөKӛWUӧSKKӧS 3KkQWtFKWKLӃWNӃJDPHSKKӧSYӟLPөFWLrXWҥRUDDJHQWWK{QJPLQKGӵDWUrQKӑFWăQJFѭӡQJ +LӋQWKӵFJDPHFyÿӗKӑDәQ ;k\GӵQJKLӋQWKӵF1)63DJHQWYjFyÿiQKJLiNKҧQăQJFKѫLJDPHFӫDDJHQW ĈӅ[XҩWYjKLӋQWKӵFWKjQKF{QJP{KuQKNӃWKӧSJLӳD1)63ZLWK'RXEOH'41 1KӳQJWKLӃXVyWFKtQKFӫD/971 'ӳOLӋXKXҩQOX\ӋQFzQtWYjFKѭDVӱGөQJFiFSKѭѫQJSKiSP{SKӓQJÿӇVLQKGӳOLӋX *LӳDFiF)LWҥPJӑLOjERWÿӡLVDXÿѭӧFVLQKUDWӯ)ÿӡLWUѭӟFÿyFKѭDFyEҵQJFKӭQJELӋQOXұQ WKX\ӃWSKөFYӅVӵѭXYLӋFNKLFKѫLJDPHFӫD)VDXVRYӟL)WUѭӟF ĈӅQJKӏĈѭӧFEҧRYӋ Ƒ %әVXQJWKrPÿӇEҧRYӋ Ƒ .K{QJÿѭӧFEҧRYӋ Ƒ FkXKӓL69SKҧLWUҧOӡLWUѭӟF+ӝLÿӗQJ D1KyPVLQKYLrQFyWuPKLӇXYӅYLӋFNӃWKӧSJLӳD1)63YӟL0&76FKѭD" ĈiQKJLiFKXQJEҵQJFKӳJLӓLNKi7% ĈLӇP .êWrQJKLU}KӑWrQ 9ѭѫQJ%i7KӏQK Declaration We hereby declare that unless a clear reference is made to the work of others, the substance of this dissertation is original and has not been submitted in whole or in part for consideration at this or any other university for any other degree or qualification. This study is our work and contains nothing that, except as stated in the text and acknowledgments, it the product of work done in collaboration with others. If this statement is not true, our team will accept all responsibility to the Dean of the School and the School Administration. Acknowledgments We would want to express our heartfelt and sincere appreciation to Dr.Nguyen Duc Dung. For roughly one year, we have faced many challenges. Many tasks that we thought they could be finished in hours turned out to be weeks, months. We saw things did not go as we expected, some dragged us down. However, we were eventually clear-headed and found solutions for every problems in the thesis, thanks to your advice and guidance. We will be eternally thankful and obliged to you for your direction, wisdom, passion, and encouragement in assisting us in studying and implementing this project. We would like to thank all of our faculty lecturers, without whom we would not have had the essential knowledge to complete our research. Later, armed with this university’s invaluable experience, we shall fearlessly face our future and be exceptional. Four years have passed since the first day we started our first lecture in the university. We had new friends, who were not only classmate, but also great teachers, teammates and assistants. It would have taken longer than four years for us to complete all classes if it had not been for you. Furthermore, this four-year period would never be fulfilled without happy, sad, cheerful and emotional memories that we had together. We will always respect and appreciate those as the best days that we have ever had. Despite our efforts, we are aware that this project is currently insufficient and will inevitably contain flaws. We’d love to hear from professors on how we might improve even more. Finally, we wish you health, wealth, and success in your endeavors. Students. Abstract Artificial intelligence and games have long been in a relationship. Gamers demand sophisticated AI, while games provide hard and diverse challenges whose solutions often apply to real-life problems. In the past, Chess and Go are played by computers at a superhuman level. Recently, in Defend of the Ancients 2 (DotA 2), computers defeated the best professional player, yet there were some restrictions in the match. Collectible Card Games (CCG), a branch of the strategy genre, challenges minds with their complex environments. These games are ideal for researching artificial intelligence as they are harder to play compared to traditional card games and the responding time requirement is relaxed. In this dissertation, we aim to create a CCG game and training smart agents who are able to take efficient actions and play at an accomplished level. Table of contents 1 2 3 Introduction 1.1 Overview . . . . . . . . . . . . . . 1.1.1 Video games & Card-games 1.1.2 Artificial Intelligence . . . . 1.2 Objectives . . . . . . . . . . . . . . 1.3 Development process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1 1 2 3 3 Background 2.1 Game Engines . . . . . . . . . . . . . . . . . . . . . 2.1.1 Definition . . . . . . . . . . . . . . . . . . . 2.1.2 Runtime Engine Architecture . . . . . . . . . 2.1.2.1 Third-Party SDKs and Middleware 2.1.2.2 Resource Manager . . . . . . . . . 2.1.2.3 Rendering Engine . . . . . . . . . 2.1.2.4 Collision and Physics . . . . . . . 2.1.2.5 Human Interface Devices - HID . . 2.1.2.6 Gameplay Foundation Systems . . 2.2 Reinforcement Learning . . . . . . . . . . . . . . . 2.2.1 Markov Decision Process . . . . . . . . . . . 2.2.2 Q-learning . . . . . . . . . . . . . . . . . . 2.2.3 Fictitious Self-play . . . . . . . . . . . . . . 2.3 Supervised Learning . . . . . . . . . . . . . . . . . 2.3.1 Multilayer perceptron . . . . . . . . . . . . . 2.3.2 Recurrent neural network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 5 6 7 8 8 8 9 9 10 11 12 12 13 13 13 16 . . . . . . . . . . . . 18 18 18 19 19 19 21 21 22 22 22 23 23 Related Works 3.1 Game Engines . . . . . 3.1.1 Unity . . . . . 3.1.1.1 Pros 3.1.1.2 Cons 3.1.2 Cocos Creator . 3.1.2.1 Pros 3.1.2.2 Cons 3.1.3 Summary . . . 3.2 Games . . . . . . . . . 3.2.1 Yu-Gi-Oh! . . 3.2.1.1 Pros 3.2.1.2 Cons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2.2 3.3 3.4 4 Hearthstone . . . . . . . . . 3.2.2.1 Pros . . . . . . . 3.2.2.2 Cons . . . . . . . 3.2.2.3 Summary . . . . Reinforcement learning approaches . 3.3.1 Deep Q-network . . . . . . 3.3.1.1 Overview . . . . . 3.3.1.2 Pros . . . . . . . 3.3.1.3 Cons . . . . . . . 3.3.2 Double Deep Q-Network . . 3.3.2.1 Overview . . . . . 3.3.2.2 Pros . . . . . . . 3.3.2.3 Cons . . . . . . . 3.3.3 Neural Fictitious Self-play . 3.3.3.1 Overview . . . . . 3.3.3.2 Pros . . . . . . . 3.3.3.3 Cons . . . . . . . 3.3.4 Remarks . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Quest of the Divinity 4.1 System Architecture . . . . . . . . . . . . . 4.1.1 Application Programming Interface 4.1.2 Communication Server . . . . . . . 4.2 Quest of the Divinity . . . . . . . . . . . . 4.2.1 Game board . . . . . . . . . . . . . 4.2.2 Ruleset . . . . . . . . . . . . . . . 4.2.3 Game Controls . . . . . . . . . . . 4.2.4 How To Play . . . . . . . . . . . . 4.2.5 Card Design . . . . . . . . . . . . 4.2.6 Architecture Design . . . . . . . . 4.2.6.1 Client . . . . . . . . . . 4.2.6.2 API Director . . . . . . . 4.2.6.3 Log Writer . . . . . . . . 4.2.6.4 Mouse . . . . . . . . . . 4.2.6.5 Timer . . . . . . . . . . 4.2.6.6 Game Manager . . . . . 4.2.6.7 Player . . . . . . . . . . 4.2.6.8 Hand . . . . . . . . . . . 4.2.6.9 Deck . . . . . . . . . . . 4.2.6.10 Boardside . . . . . . . . 4.2.6.11 Card . . . . . . . . . . . 4.2.6.12 Visualizer . . . . . . . . 4.3 Game agent . . . . . . . . . . . . . . . . . 4.3.1 Feature Extraction . . . . . . . . . 4.3.2 Reward functions . . . . . . . . . . 4.3.3 NFSP agent with double DQN . . . 4.3.3.1 Imitation Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 24 25 25 25 25 25 26 26 27 27 27 27 29 29 30 30 31 31 . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 32 32 32 33 33 35 35 36 36 38 38 38 40 40 40 40 40 41 41 41 41 42 42 42 43 43 43 4.3.3.2 4.3.3.3 5 6 Double DQN Agent . . . . . . . . . . . . . . . . . . . . . . NFSP Agent . . . . . . . . . . . . . . . . . . . . . . . . . . Experiments & Results 5.1 Experiment . . . . . . . . . . . . . . 5.1.1 Experimental setup . . . . . . 5.1.2 Data description . . . . . . . 5.1.2.1 Data overview . . . 5.1.2.2 Data problem . . . 5.1.2.3 Solution and result . 5.2 Results . . . . . . . . . . . . . . . . . 5.2.1 Imitation Agents . . . . . . . 5.2.2 Evaluation with collected data 5.2.3 Evaluation with game . . . . . 5.2.4 Evaluation with human . . . . 44 44 . . . . . . . . . . . 47 47 47 48 48 48 49 50 50 51 52 54 Conclusion 6.1 What went well . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 What was not great . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.3 What could be different . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 55 55 55 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A In-game Images 59 B Card Details & Graphics References B.1 Card Details . . . . . . . . . . . . . B.1.1 Minions . . . . . . . . . . . B.1.1.1 None . . . . . . . B.1.1.2 Sage . . . . . . . B.1.1.3 Warrior . . . . . . B.1.1.4 Abyss . . . . . . B.1.1.5 Nature . . . . . . B.1.1.6 Dragon . . . . . . B.1.2 Spell Cards . . . . . . . . . B.1.3 Sacrifice Cards . . . . . . . B.2 Graphics References . . . . . . . . B.2.1 Asset Packs . . . . . . . . . B.2.2 Minions . . . . . . . . . . . B.2.2.1 None . . . . . . . B.2.2.2 Sage . . . . . . . B.2.2.3 Warrior . . . . . . B.2.2.4 Abyss . . . . . . B.2.2.5 Nature . . . . . . B.2.2.6 Dragon . . . . . . B.2.2.7 Summon Minions B.2.3 Spells . . . . . . . . . . . . B.2.4 Sacrifices . . . . . . . . . . B.2.5 Albums . . . . . . . . . . . B.2.6 Others . . . . . . . . . . . . 64 64 64 64 65 65 66 66 67 67 67 68 68 68 68 68 69 69 69 70 70 70 70 71 71 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C API: Card IDs 72 D API: Action IDs 73 List of figures Poker Championship, a Poker-based online game 1 . . . . . . . . . . . . . . . Deep Blue in IBM’s headquarters in Armonk, N.Y. 2 . . . . . . . . . . . . . . 1 2 Unreal Engine 4. 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The runtime game engine architecture. 4 . . . . . . . . . . . . . . . . . . . . . Cocos 3.1.0 engine allows user to choose desired physics simulation library. . . Unity scripting system requires shutting the game down before recompilation. . Reinforcement learning process. . . . . . . . . . . . . . . . . . . . . . . . . . Multilayer perceptron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . An example of a computational graph. . . . . . . . . . . . . . . . . . . . . . . Forward propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . RNN and the unfolding in time of the computation involved in its forward computation [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.10 Types of RNN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 7 9 10 11 14 15 16 3.1 3.2 3.3 3.4 3.5 3.6 Unity Editor user interface. . . . . . . . . . . . . . . . . Cocos2d-x Framework Architecture5 . . . . . . . . . . . Cocos 2.4.5 preview. . . . . . . . . . . . . . . . . . . . Yu-Gi-Oh! Power of Chaos: Yugi The Destiny gameplay. An example of Hearthstone battlefield interface6 . . . . . Values estimated from the DQN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 20 21 23 24 28 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 System architecture. . . . . . . . . . . . . . . The game board. . . . . . . . . . . . . . . . Elements of the game board. . . . . . . . . . Minion card overview. . . . . . . . . . . . . Spell and Sacrifice cards overview . . . . . . The game main architecture. . . . . . . . . . Architecture of Imitation Agent. . . . . . . . Neural Fictitious Self-play agent’s flowchart. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 34 34 37 37 39 44 46 5.1 5.2 5.3 5.4 5.5 5.6 5.7 Rule-based agent data flow. . . . . . . . . . . . . . . . . . . . Percentages of number of states of actions. . . . . . . . . . . . Accuracy of imitation agent during training process. . . . . . . Loss of imitation agent during training process. . . . . . . . . Accuracy and loss of imitation agents during training process. Total scores of agents while playing in emulator. . . . . . . . . Win rates of agents during playing 100 games. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 48 49 49 51 52 53 A.1 Player can drag and drop cards to activate them. . . . . . . . . . . . . . . . . . 59 1.1 1.2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 v . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 17 A.2 A.3 A.4 A.5 A.6 A.7 A.8 A.9 Simple particle effect on activating a minion card. . . . . . . Apprentice Mage with initial chapion effect. . . . . . . . . . Minion outline types . . . . . . . . . . . . . . . . . . . . . A minion deals damage to its target, with some debris splash. Minion’s simple exploding effect. . . . . . . . . . . . . . . A spell card stays at its dropped position and wait for target. . Lord of the Apocalypse and Infernal creatures special effects. The One From Above with special attack. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 60 61 61 62 62 63 63 B.1 B.2 B.3 B.4 B.5 B.6 B.7 B.8 Card in None class. . . . Cards in Sage class. . . . Cards in Warrior class. . Cards in Abyss class. . . Cards in Nature class. . Cards in Dragon class. . The Spell card subset. . . The Sacrifice card subset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 65 65 66 66 67 67 68 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chapter 1 Introduction 1.1 1.1.1 Overview Video games & Card-games Video games, from the player perspective, are virtual environments where players can interact using peripheral devices (keyboard, mouse, gamepad, etc.) to relax and enjoy the fun that has been arranged by designers. Games, on the other hand, allow players to make rational decisions according to defined rules in an attempt to receive some sort of payoff [2]. There are many categories of games: action, first-person shooter (FPS), third-person shooter (TPS), strategy, and the list goes on. However, a game may be categorized into multiple types, such as action, hack ’n slash, third-person shooter... Figure 1.1: Poker Championship, a Poker-based online game 1 Card games are one of the most familiar types of games that have been played, not only digitally but also physically. Taking Poker as an example for this genre, it can be inferred that 1 https://store.steampowered.com/app/1138190/Poker_Championship/ 1 INTRODUCTION this game is built around playing cards that are identical in size and shape. Each game has a deck of cards or a large set of cards to be built into various customized decks. Card games are also strategy games, as they exploit the fact that each player knows only their cards and cards on the table. They have no information about the opponent’s cards and cards that remain on the deck. Precisely, card games are strategy games with imperfect information. 1.1.2 Artificial Intelligence Artificial Intelligence (AI) emerged in the 1950s, and the field has been researched and developed dramatically since then. We utilize AI in most computer applications, such as recommendation, recognition or detection, and many more. Game is not an exception. Now, AI becomes the core of most computer games. In the 1990s, people implemented classical AI methods in games. For instance, Deep Blue uses search algorithms to play chess. Although Deep Blue’s performance was good, its al- Figure 1.2: Deep Blue in IBM’s headquarters in Armonk, N.Y. 2 2 Yvonne Hemsey/Getty Images 2 INTRODUCTION gorithm demands a significant computational power to work well on much larger state-space games. In recent years, Machine Learning approaches (Supervised Learning - SL and Reinforcement Learning - RL) have been implemented and gradually improved, which resulted in high performance and low computational power requirement. 1.2 Objectives Card game is a good application for testing AI systems. We can evaluate the efficiency of AI systems on these games since they contain complicated states and action spaces. In this work, we aim for a modern card game with smart agents based on modern machine learning approaches. To achieve this goal, we are going to perform the following tasks: • Investigate card games to prepare for designing an alternative card game with smart agents based on machine learning models. • Build the proposed Card game in order to assist the smart agent. • Build Intelligent Agents using modern methods (RL, combined RL - SL) and evaluate their performance. We decide to build our own game instead of using an existed game because: • The game’s complexity can be supervised and controlled easily, that is, we would not have to invest too much time in exploring and understanding how a game functions, or every of its interactions and mechanisms. • On-demand customization and integration can be performed in short time, as we built the core components, it is easier to develop features or sub-system based on that core than revising others’ code. 1.3 Development process To achieve the above objectives, we break down the development process into phases as follows: • Phase 1: Research both game engines and machine learning methods: foundation, technical basis, mathematical basis, advantages and disadvantages. • Phase 2: Choose techniques to build the games and agents and design the core game components. • Phase 3: Implement a game prototype with basic interactions and an agent-training environment. • Phase 4: Build a complete game and an intermediate component to link the game and the training environment. • Phase 5: Update the game and train agents based on chosen approaches. • Phase 6: Evaluate the agents. • Phase 7: Organize a self-retrospective meeting to evaluate the whole thesis. We will clarify the development process in the rest of this dissertation. In the next chapter, we present mathematical and technical foundation that we have investigated on game engines, games and reinforcement learning approaches. In chapter 3, we list out some considerations of game engines, referenced games, modern machine learning approaches, then summarizes our chosen technologies and development plan. Then, we describe in details every main component: 3 INTRODUCTION communication server, proposed game and game agents in chapter 4. Chapter 5 presents evaluation results of our approach and analyze the performance of our smart agents. We conclude this work in chapter 6 and provide a self-retrospective on our performance: what went well, what did not, and what could be different. 4 Chapter 2 Background 2.1 Game Engines Since the early days of the gaming industry, the term “game engine” had not existed. The software that makes video and arcade games tick were highly specialized to both the game in question and the hardware on which it ran. Today, games are a multi-billion-dollar mainstream industry. The software that drives these now-ubiquitous three-dimensional and two-dimensional worlds: game engines, e.g. Epic Games’ Unreal Engine 4 and its upcoming descendant Unreal Engine 5, Valve’s Source engine, Unity game engine,... have become fully featured reusable software development kits that can be used to build almost any game. Virtually, all game engines contain a familiar set of core components, including rendering engine, collision and physics engine, the animation system, the audio system, the game world object model and so on. Figure 2.1: Unreal Engine 4. 1 1 https://www.unrealengine.com/en-US/onlinelearning-courses/introducing-unreal-engine 5 BACKGROUND 2.1.1 Definition The term “game engine” arose in the mid-1990s about first-person shooter (FPS) games like the legendary Doom by id Software. The game was architected with a reasonably welldefined separation between its core software components (such as the three-dimensional graphics rendering system, the collision detection system or the audio system) and the art assets, game worlds and rules of play that comprised the player’s gaming experience. The value of this separation became evident as developers began licensing games and retooling them into new products by creating new art, world layouts, weapons, characters, vehicles and game rules with only minimal changes to the "engine" software. Engines were made highly customizable via scripting languages like id’s Quake C, and engine licensing began to be a viable secondary revenue stream for developers who created them. Today, game developers can license a game engine and reuse significant portions of its key software components to build games. This practice can be much more economical than developing all of the core engine components in-house. The line between a game and its engine is often blurry. Some engines make a reasonably clear distinction, while others make almost no attempt to separate the two. In one game, the rendering code might "know" exactly how to draw an orc, in another one, the rendering engine might provide general-purpose material and shading facilities, and "orc-ness" might be defined entirely in data. Perhaps a data-driven architecture is what differentiates a game engine from a piece of software that is a game but not an engine. A game contains hard-coded logic or game rules, or employs special-case code to render specific types of game objects, it is obvious that the software is difficult or even impossible to reuse and make a different game. Game engine, on the other hand, is more extensible and can be used as the foundation for many different games without major modification. Although game engines are made to build any games, they are still specialized for one or two game genres. With the advent of faster graphics cards, processing units, hard disks, along with more efficient rendering algorithms and data structures, it is possible to use a first-person shooter (FPS) engine to build a strategy game, for example. However, the trade-off between generality and optimality still exists. The strategy game may render slower or handle the logic longer if it is built by a FPS engine rather than a fine-tuned strategy engine. Game Engine Unreal Engine 4 Unity Godot CryEngine Cocos Creator 2.x Amazon Lumberyard GameMaker: Studio 2 Platform 2D/3D 2D/3D 2D/3D 2D/3D 2D 3D 2D Scripting Language(s) C++, Python, Blueprint visual scripting C#, Javascript C/C++, GDScript C++, C#, Lua Javascript, Typescript Lua, Script Canvas GameMaker Language Table 2.1: Some popular game engines associated with their platforms and scripting languages. 1 https://www.gameenginebook.com/figures.html 6 BACKGROUND 2.1.2 Runtime Engine Architecture A game engine generally consists of a tool suite and a runtime component. Figure 2.2 shows all of the major runtime components that makeup a typical 3D game engine. Like all software systems, game engines are built in layers where upper layers depend on lower layers, but not vice versa. Figure 2.2: The runtime game engine architecture. 2 7

- Xem thêm -

Tài liệu Smart agent for card games using reinforcement learning

Tài liệu liên quan

Tài liệu vừa đăng

Tài liệu xem nhiều nhất