VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
——————– * ———————
THESIS
Smart agent for card-games using
reinforcement learning
COUNCIL
: COMPUTER SCIENCE
SUPERVISOR : PhD. Nguyen Duc Dung
REVIEWER : MSc. Vuong Ba Thinh
—o0o—
Students:
Vo Khac Thanh
Cap Dang Xuan Kiet
1713164
1711852
HO CHI MINH CITY, October, 2021
ĈҤ,+Ӑ&48Ӕ&*,$73+&0
75ѬӠ1*ĈҤ,+Ӑ&%È&+.+2$
.+2$.+ .70i\WtQK
%Ӝ0Ð1 .+07
&Ӝ1*+Ñ$;+Ӝ,&+Ӫ1*+Ƭ$9,ӊ71$0
ĈӝFOұS7ӵGR+ҥQKSK~F
1+,ӊ09Ө/8Ұ1È17Ӕ71*+,ӊ3
+Ӑ9¬7Ç19}.KҳF7KjQK
+Ӑ9¬7Ç1&iSĈһQJ;XkQ.LӋW
1*¬1+.KRDKӑFPi\WtQK
0669
0669
/Ӟ3
ĈҫXÿӅOXұQiQ
$JHQW7K{QJ0LQKFKR&DUGJDPHYӟL0{+uQK+ӑF7ăQJ&ѭӡQJ
1KLӋPYө\rXFҫXYӅQӝLGXQJYjVӕOLӋXEDQÿҫX
7uPKLӇXFiFWKӇORҥLWXUQEDVHGFDUGJDPHFiFF{QJQJKӋ[k\GӵQJJDPH
7uPKLӇXFiFF{QJQJKӋKӑFWăQJFѭӡQJÿѭӧFGQJWURQJ[k\GӵQJDJHQW
ĈӅ[XҩW[k\GӵQJPӝWWXUQEDVHGFDUGJDPHSKKӧSFKRQJKLrQFӭX
ĈӅ[XҩWP{KuQKKӑFWăQJFѭӡQJÿӇSKiWWULӇQDJHQWFKRJDPH
+LӋQWKӵFJDPHYjP{KuQKFKRDJHQW
3KkQWtFKYjÿiQKJLiFiFNӃWTXҧÿҥWÿѭӧFFӫDDJHQW
1Jj\JLDRQKLӋPYөOXұQiQ
1Jj\KRjQWKjQKQKLӋPYө
+ӑWrQJLҧQJYLrQKѭӟQJGүQ
1JX\ӉQĈӭF'NJQJ
3KҫQKѭӟQJGүQ
1ӝLGXQJYj\rXFҫX/971ÿmÿѭӧFWK{QJTXD%ӝP{Q
1Jj\WKiQJQăP
&+Ӫ1+,ӊ0%Ӝ0Ð1
*,Ҧ1*9,Ç1+ѬӞ1*'Ү1&+Ë1+
.êYjJKLU}K͕WrQ
.êYjJKLU}K͕WrQ
3*676+XǤQK7ѭӡQJ1JX\rQ
761JX\ӉQĈӭF'NJQJ
3+̮1'¬1+&+2.+2$%͠0Ð1
1JѭӡLGX\ӋWFKҩPVѫEӝ
ĈѫQYӏ
1Jj\EҧRYӋ
ĈLӇPWәQJNӃW
1ѫLOѭXWUӳOXұQiQ
75ѬӠ1*ĈҤ,+Ӑ&%È&+.+2$
.+2$.+ .70È<7Ë1+
&Ӝ1*+Ñ$;+Ӝ,&+Ӫ1*+Ƭ$ 9,ӊ71$0
ĈӝFOұS7ӵGR+ҥQKSK~F
1Jj\WKiQJQăP
3+,ӂ8&+Ҩ0%Ҧ29ӊ/971
'jQKFKRQJ˱ͥLK˱ͣQJG̳Q
+ӑYjWrQ699}.KҳF7KjQK&iSĈһQJ;XkQ.LӋW
0669
1JjQKFKX\rQQJjQK.+07
ĈӅWjL$JHQW7K{QJ0LQKFKR&DUGJDPHYӟL0{+uQK+ӑF7ăQJ&ѭӡQJ
+ӑWrQQJѭӡLKѭӟQJGүQ1JX\ӉQĈӭF'NJQJ
7әQJTXiWYӅEҧQWKX\ӃWPLQK
6ӕWUDQJ
6ӕEҧQJVӕOLӋX
6ӕWjLOLӋXWKDPNKҧR
+LӋQYұWVҧQSKҭP
6ӕFKѭѫQJ
6ӕKuQKYӁ
3KҫQPӅPWtQKWRiQ
7әQJTXiWYӅFiFEҧQYӁ
6ӕEҧQYӁ
%ҧQ$
6ӕEҧQYӁYӁWD\
%ҧQ$
.KәNKiF
6ӕEҧQYӁWUrQPi\WtQK
1KӳQJѭXÿLӇPFKtQKFӫD/971
ĈӅ WjL QJKLrQ FӭX [k\ GӵQJ PӝW DJHQW WK{QJ PLQK FKR FDUGJDPH YӟL Nӻ WKXұW KӑF WăQJ FѭӡQJ
1JKLrQ FӭX Qj\ Fy WKӇ Pӣ UD FiF KѭӟQJ SKiW WULӇQ PӟL WURQJ YLӋF [k\ GӵQJ FiF KӋ WKӕQJ UD TX\ӃW
ÿӏQK WK{QJ PLQK Kӛ WUӧ FKR FRQ QJѭӡL ӣ QKLӅX OƭQK YӵF 1KyP ÿm ÿӅ [XҩW Yj [k\ GӵQJ WKjQK F{QJ
PӝW FDUG JDPH YӟL ÿӝ NKy SK KӧS Yj WұS OXұW KRjQ FKӍQK *DPH Qj\ ÿѭӧF Vӱ GөQJ QKѭ PӝW QӅQ
WҧQJ JL~S QJKLrQ FӭX FiF JLҧL WKXұW KӑF WăQJ FѭӡQJ QKҵP [k\ GӵQJ DJHQW WK{QJ PLQK 1KyP FNJQJ
ÿm WKӵF KLӋQ QJKLrQ FӭX Yj [k\ GӵQJ FiF P{ KuQK KӑF WăQJ FѭӡQJ ÿӇ SKiW WULӇQ DJHQW &iF NӃW TXҧ
WKXÿѭӧFNKҧTXDQYjFzQQKLӅXWLӅPQăQJÿӇFҧLWKLӋQ
1KӳQJWKLӃXVyWFKtQKFӫD/971
'R WKӡL JLDQ JLӟL KҥQ FNJQJ QKѭ FiF KҥQ FKӃ YӅ WjL QJX\rQ WtQK WRiQ QrQ QKyP FKӍ PӟL GӯQJ OҥL ӣ
FiFDJHQWÿѫQJLҧQ
ĈӅQJKӏĈѭӧFEҧRYӋ 🗹
%әVXQJWKrPÿӇEҧRYӋ Ƒ
.K{QJÿѭӧFEҧRYӋ Ƒ
FkXKӓL69SKҧLWUҧOӡLWUѭӟF+ӝLÿӗQJ
D
E
F
ĈiQKJLiFKXQJEҵQJFKӳJLӓLNKi7%JLӓL
ĈLӇP
.êWrQJKLU}KӑWrQ
761JX\ӉQĈӭF'NJQJ
75ѬӠ1*ĈҤ,+Ӑ&%È&+.+2$
.+2$.+ .70È<7Ë1+
&Ӝ1*+Ñ$;+Ӝ,&+Ӫ1*+Ƭ$ 9,ӊ71$0
ĈӝFOұS7ӵGR+ҥQKSK~F
1Jj\WKiQJQăP
3+,ӂ8&+Ҩ0%Ҧ29ӊ/971
'jQKFKRQJ˱ͥLSK̫QEL͏Q
+ӑYjWrQ699}.KҳF7KjQK0669Yj&iSĈһQJ;XkQ.LӋW0669
1JjQKFKX\rQQJjQK.KRDKӑFPi\WtQK
ĈӅWjL$JHQW7K{QJ0LQKFKR&DUGJDPHYӟL0{+uQK+ӑF7ăQJ&ѭӡQJ
+ӑWrQQJѭӡLSKҧQELӋQ9ѭѫQJ%i7KӏQK
7әQJTXiWYӅEҧQWKX\ӃWPLQK
6ӕWUDQJ
6ӕFKѭѫQJ
6ӕEҧQJVӕOLӋX
6ӕKuQKYӁ
6ӕWjLOLӋXWKDPNKҧR
3KҫQPӅPWtQKWRiQ
+LӋQYұWVҧQSKҭP
7әQJTXiWYӅFiFEҧQYӁ
6ӕEҧQYӁ
%ҧQ$
%ҧQ$
.KәNKiF
6ӕEҧQYӁYӁWD\
6ӕEҧQYӁWUrQPi\WtQK
1KӳQJѭXÿLӇPFKtQKFӫD/971
1KyPVLQKYLrQÿmWuPKLӇXFiFF{QJWUuQKOLrQTXDQFiFF{QJFөKӛWUӧSKKӧS
3KkQWtFKWKLӃWNӃJDPHSKKӧSYӟLPөFWLrXWҥRUDDJHQWWK{QJPLQKGӵDWUrQKӑFWăQJFѭӡQJ
+LӋQWKӵFJDPHFyÿӗKӑDәQ
;k\GӵQJKLӋQWKӵF1)63DJHQWYjFyÿiQKJLiNKҧQăQJFKѫLJDPHFӫDDJHQW
ĈӅ[XҩWYjKLӋQWKӵFWKjQKF{QJP{KuQKNӃWKӧSJLӳD1)63ZLWK'RXEOH'41
1KӳQJWKLӃXVyWFKtQKFӫD/971
'ӳOLӋXKXҩQOX\ӋQFzQtWYjFKѭDVӱGөQJFiFSKѭѫQJSKiSP{SKӓQJÿӇVLQKGӳOLӋX
*LӳDFiF)LWҥPJӑLOjERWÿӡLVDXÿѭӧFVLQKUDWӯ)ÿӡLWUѭӟFÿyFKѭDFyEҵQJFKӭQJELӋQOXұQ
WKX\ӃWSKөFYӅVӵѭXYLӋFNKLFKѫLJDPHFӫD)VDXVRYӟL)WUѭӟF
ĈӅQJKӏĈѭӧFEҧRYӋ Ƒ
%әVXQJWKrPÿӇEҧRYӋ Ƒ
.K{QJÿѭӧFEҧRYӋ Ƒ
FkXKӓL69SKҧLWUҧOӡLWUѭӟF+ӝLÿӗQJ
D1KyPVLQKYLrQFyWuPKLӇXYӅYLӋFNӃWKӧSJLӳD1)63YӟL0&76FKѭD"
ĈiQKJLiFKXQJEҵQJFKӳJLӓLNKi7%
ĈLӇP
.êWrQJKLU}KӑWrQ
9ѭѫQJ%i7KӏQK
Declaration
We hereby declare that unless a clear reference is made to the work of others, the substance of this dissertation is original and has not been submitted in whole or in part for
consideration at this or any other university for any other degree or qualification. This study
is our work and contains nothing that, except as stated in the text and acknowledgments, it
the product of work done in collaboration with others. If this statement is not true, our team
will accept all responsibility to the Dean of the School and the School Administration.
Acknowledgments
We would want to express our heartfelt and sincere appreciation to Dr.Nguyen Duc Dung.
For roughly one year, we have faced many challenges. Many tasks that we thought they
could be finished in hours turned out to be weeks, months. We saw things did not go as
we expected, some dragged us down. However, we were eventually clear-headed and found
solutions for every problems in the thesis, thanks to your advice and guidance. We will be
eternally thankful and obliged to you for your direction, wisdom, passion, and encouragement
in assisting us in studying and implementing this project.
We would like to thank all of our faculty lecturers, without whom we would not have
had the essential knowledge to complete our research. Later, armed with this university’s
invaluable experience, we shall fearlessly face our future and be exceptional.
Four years have passed since the first day we started our first lecture in the university.
We had new friends, who were not only classmate, but also great teachers, teammates and
assistants. It would have taken longer than four years for us to complete all classes if it had not
been for you. Furthermore, this four-year period would never be fulfilled without happy, sad,
cheerful and emotional memories that we had together. We will always respect and appreciate
those as the best days that we have ever had.
Despite our efforts, we are aware that this project is currently insufficient and will inevitably contain flaws. We’d love to hear from professors on how we might improve even
more.
Finally, we wish you health, wealth, and success in your endeavors.
Students.
Abstract
Artificial intelligence and games have long been in a relationship. Gamers demand sophisticated AI, while games provide hard and diverse challenges whose solutions often apply to
real-life problems. In the past, Chess and Go are played by computers at a superhuman level.
Recently, in Defend of the Ancients 2 (DotA 2), computers defeated the best professional player,
yet there were some restrictions in the match.
Collectible Card Games (CCG), a branch of the strategy genre, challenges minds with their
complex environments. These games are ideal for researching artificial intelligence as they are
harder to play compared to traditional card games and the responding time requirement is relaxed.
In this dissertation, we aim to create a CCG game and training smart agents who are able to
take efficient actions and play at an accomplished level.
Table of contents
1
2
3
Introduction
1.1 Overview . . . . . . . . . . . . . .
1.1.1 Video games & Card-games
1.1.2 Artificial Intelligence . . . .
1.2 Objectives . . . . . . . . . . . . . .
1.3 Development process . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
1
2
3
3
Background
2.1 Game Engines . . . . . . . . . . . . . . . . . . . . .
2.1.1 Definition . . . . . . . . . . . . . . . . . . .
2.1.2 Runtime Engine Architecture . . . . . . . . .
2.1.2.1 Third-Party SDKs and Middleware
2.1.2.2 Resource Manager . . . . . . . . .
2.1.2.3 Rendering Engine . . . . . . . . .
2.1.2.4 Collision and Physics . . . . . . .
2.1.2.5 Human Interface Devices - HID . .
2.1.2.6 Gameplay Foundation Systems . .
2.2 Reinforcement Learning . . . . . . . . . . . . . . .
2.2.1 Markov Decision Process . . . . . . . . . . .
2.2.2 Q-learning . . . . . . . . . . . . . . . . . .
2.2.3 Fictitious Self-play . . . . . . . . . . . . . .
2.3 Supervised Learning . . . . . . . . . . . . . . . . .
2.3.1 Multilayer perceptron . . . . . . . . . . . . .
2.3.2 Recurrent neural network . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
5
5
6
7
8
8
8
9
9
10
11
12
12
13
13
13
16
.
.
.
.
.
.
.
.
.
.
.
.
18
18
18
19
19
19
21
21
22
22
22
23
23
Related Works
3.1 Game Engines . . . . .
3.1.1 Unity . . . . .
3.1.1.1 Pros
3.1.1.2 Cons
3.1.2 Cocos Creator .
3.1.2.1 Pros
3.1.2.2 Cons
3.1.3 Summary . . .
3.2 Games . . . . . . . . .
3.2.1 Yu-Gi-Oh! . .
3.2.1.1 Pros
3.2.1.2 Cons
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
i
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3.2.2
3.3
3.4
4
Hearthstone . . . . . . . . .
3.2.2.1 Pros . . . . . . .
3.2.2.2 Cons . . . . . . .
3.2.2.3 Summary . . . .
Reinforcement learning approaches .
3.3.1 Deep Q-network . . . . . .
3.3.1.1 Overview . . . . .
3.3.1.2 Pros . . . . . . .
3.3.1.3 Cons . . . . . . .
3.3.2 Double Deep Q-Network . .
3.3.2.1 Overview . . . . .
3.3.2.2 Pros . . . . . . .
3.3.2.3 Cons . . . . . . .
3.3.3 Neural Fictitious Self-play .
3.3.3.1 Overview . . . . .
3.3.3.2 Pros . . . . . . .
3.3.3.3 Cons . . . . . . .
3.3.4 Remarks . . . . . . . . . .
Summary . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Quest of the Divinity
4.1 System Architecture . . . . . . . . . . . . .
4.1.1 Application Programming Interface
4.1.2 Communication Server . . . . . . .
4.2 Quest of the Divinity . . . . . . . . . . . .
4.2.1 Game board . . . . . . . . . . . . .
4.2.2 Ruleset . . . . . . . . . . . . . . .
4.2.3 Game Controls . . . . . . . . . . .
4.2.4 How To Play . . . . . . . . . . . .
4.2.5 Card Design . . . . . . . . . . . .
4.2.6 Architecture Design . . . . . . . .
4.2.6.1 Client . . . . . . . . . .
4.2.6.2 API Director . . . . . . .
4.2.6.3 Log Writer . . . . . . . .
4.2.6.4 Mouse . . . . . . . . . .
4.2.6.5 Timer . . . . . . . . . .
4.2.6.6 Game Manager . . . . .
4.2.6.7 Player . . . . . . . . . .
4.2.6.8 Hand . . . . . . . . . . .
4.2.6.9 Deck . . . . . . . . . . .
4.2.6.10 Boardside . . . . . . . .
4.2.6.11 Card . . . . . . . . . . .
4.2.6.12 Visualizer . . . . . . . .
4.3 Game agent . . . . . . . . . . . . . . . . .
4.3.1 Feature Extraction . . . . . . . . .
4.3.2 Reward functions . . . . . . . . . .
4.3.3 NFSP agent with double DQN . . .
4.3.3.1 Imitation Agent . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
23
24
25
25
25
25
25
26
26
27
27
27
27
29
29
30
30
31
31
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
32
32
32
32
33
33
35
35
36
36
38
38
38
40
40
40
40
40
41
41
41
41
42
42
42
43
43
43
4.3.3.2
4.3.3.3
5
6
Double DQN Agent . . . . . . . . . . . . . . . . . . . . . .
NFSP Agent . . . . . . . . . . . . . . . . . . . . . . . . . .
Experiments & Results
5.1 Experiment . . . . . . . . . . . . . .
5.1.1 Experimental setup . . . . . .
5.1.2 Data description . . . . . . .
5.1.2.1 Data overview . . .
5.1.2.2 Data problem . . .
5.1.2.3 Solution and result .
5.2 Results . . . . . . . . . . . . . . . . .
5.2.1 Imitation Agents . . . . . . .
5.2.2 Evaluation with collected data
5.2.3 Evaluation with game . . . . .
5.2.4 Evaluation with human . . . .
44
44
.
.
.
.
.
.
.
.
.
.
.
47
47
47
48
48
48
49
50
50
51
52
54
Conclusion
6.1 What went well . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.2 What was not great . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
6.3 What could be different . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
55
55
55
55
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
A In-game Images
59
B Card Details & Graphics References
B.1 Card Details . . . . . . . . . . . . .
B.1.1 Minions . . . . . . . . . . .
B.1.1.1 None . . . . . . .
B.1.1.2 Sage . . . . . . .
B.1.1.3 Warrior . . . . . .
B.1.1.4 Abyss . . . . . .
B.1.1.5 Nature . . . . . .
B.1.1.6 Dragon . . . . . .
B.1.2 Spell Cards . . . . . . . . .
B.1.3 Sacrifice Cards . . . . . . .
B.2 Graphics References . . . . . . . .
B.2.1 Asset Packs . . . . . . . . .
B.2.2 Minions . . . . . . . . . . .
B.2.2.1 None . . . . . . .
B.2.2.2 Sage . . . . . . .
B.2.2.3 Warrior . . . . . .
B.2.2.4 Abyss . . . . . .
B.2.2.5 Nature . . . . . .
B.2.2.6 Dragon . . . . . .
B.2.2.7 Summon Minions
B.2.3 Spells . . . . . . . . . . . .
B.2.4 Sacrifices . . . . . . . . . .
B.2.5 Albums . . . . . . . . . . .
B.2.6 Others . . . . . . . . . . . .
64
64
64
64
65
65
66
66
67
67
67
68
68
68
68
68
69
69
69
70
70
70
70
71
71
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
C API: Card IDs
72
D API: Action IDs
73
List of figures
Poker Championship, a Poker-based online game 1 . . . . . . . . . . . . . . .
Deep Blue in IBM’s headquarters in Armonk, N.Y. 2 . . . . . . . . . . . . . .
1
2
Unreal Engine 4. 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The runtime game engine architecture. 4 . . . . . . . . . . . . . . . . . . . . .
Cocos 3.1.0 engine allows user to choose desired physics simulation library. . .
Unity scripting system requires shutting the game down before recompilation. .
Reinforcement learning process. . . . . . . . . . . . . . . . . . . . . . . . . .
Multilayer perceptron. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
An example of a computational graph. . . . . . . . . . . . . . . . . . . . . . .
Forward propagation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
RNN and the unfolding in time of the computation involved in its forward computation [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2.10 Types of RNN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
7
9
10
11
14
15
16
3.1
3.2
3.3
3.4
3.5
3.6
Unity Editor user interface. . . . . . . . . . . . . . . . .
Cocos2d-x Framework Architecture5 . . . . . . . . . . .
Cocos 2.4.5 preview. . . . . . . . . . . . . . . . . . . .
Yu-Gi-Oh! Power of Chaos: Yugi The Destiny gameplay.
An example of Hearthstone battlefield interface6 . . . . .
Values estimated from the DQN . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
18
20
21
23
24
28
4.1
4.2
4.3
4.4
4.5
4.6
4.7
4.8
System architecture. . . . . . . . . . . . . . .
The game board. . . . . . . . . . . . . . . .
Elements of the game board. . . . . . . . . .
Minion card overview. . . . . . . . . . . . .
Spell and Sacrifice cards overview . . . . . .
The game main architecture. . . . . . . . . .
Architecture of Imitation Agent. . . . . . . .
Neural Fictitious Self-play agent’s flowchart.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
32
34
34
37
37
39
44
46
5.1
5.2
5.3
5.4
5.5
5.6
5.7
Rule-based agent data flow. . . . . . . . . . . . . . . . . . . .
Percentages of number of states of actions. . . . . . . . . . . .
Accuracy of imitation agent during training process. . . . . . .
Loss of imitation agent during training process. . . . . . . . .
Accuracy and loss of imitation agents during training process.
Total scores of agents while playing in emulator. . . . . . . . .
Win rates of agents during playing 100 games. . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
47
48
49
49
51
52
53
A.1 Player can drag and drop cards to activate them. . . . . . . . . . . . . . . . . .
59
1.1
1.2
2.1
2.2
2.3
2.4
2.5
2.6
2.7
2.8
2.9
v
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
16
17
A.2
A.3
A.4
A.5
A.6
A.7
A.8
A.9
Simple particle effect on activating a minion card. . . . . . .
Apprentice Mage with initial chapion effect. . . . . . . . . .
Minion outline types . . . . . . . . . . . . . . . . . . . . .
A minion deals damage to its target, with some debris splash.
Minion’s simple exploding effect. . . . . . . . . . . . . . .
A spell card stays at its dropped position and wait for target. .
Lord of the Apocalypse and Infernal creatures special effects.
The One From Above with special attack. . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
60
60
61
61
62
62
63
63
B.1
B.2
B.3
B.4
B.5
B.6
B.7
B.8
Card in None class. . . .
Cards in Sage class. . . .
Cards in Warrior class. .
Cards in Abyss class. . .
Cards in Nature class. .
Cards in Dragon class. .
The Spell card subset. . .
The Sacrifice card subset.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
64
65
65
66
66
67
67
68
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Chapter 1
Introduction
1.1
1.1.1
Overview
Video games & Card-games
Video games, from the player perspective, are virtual environments where players can interact using peripheral devices (keyboard, mouse, gamepad, etc.) to relax and enjoy the fun that has
been arranged by designers. Games, on the other hand, allow players to make rational decisions
according to defined rules in an attempt to receive some sort of payoff [2].
There are many categories of games: action, first-person shooter (FPS), third-person shooter
(TPS), strategy, and the list goes on. However, a game may be categorized into multiple types,
such as action, hack ’n slash, third-person shooter...
Figure 1.1: Poker Championship, a Poker-based online game 1
Card games are one of the most familiar types of games that have been played, not only
digitally but also physically. Taking Poker as an example for this genre, it can be inferred that
1 https://store.steampowered.com/app/1138190/Poker_Championship/
1
INTRODUCTION
this game is built around playing cards that are identical in size and shape. Each game has a
deck of cards or a large set of cards to be built into various customized decks.
Card games are also strategy games, as they exploit the fact that each player knows only their
cards and cards on the table. They have no information about the opponent’s cards and cards
that remain on the deck. Precisely, card games are strategy games with imperfect information.
1.1.2
Artificial Intelligence
Artificial Intelligence (AI) emerged in the 1950s, and the field has been researched and
developed dramatically since then. We utilize AI in most computer applications, such as recommendation, recognition or detection, and many more. Game is not an exception. Now, AI
becomes the core of most computer games.
In the 1990s, people implemented classical AI methods in games. For instance, Deep Blue
uses search algorithms to play chess. Although Deep Blue’s performance was good, its al-
Figure 1.2: Deep Blue in IBM’s headquarters in Armonk, N.Y. 2
2 Yvonne
Hemsey/Getty Images
2
INTRODUCTION
gorithm demands a significant computational power to work well on much larger state-space
games.
In recent years, Machine Learning approaches (Supervised Learning - SL and Reinforcement Learning - RL) have been implemented and gradually improved, which resulted in high
performance and low computational power requirement.
1.2
Objectives
Card game is a good application for testing AI systems. We can evaluate the efficiency of
AI systems on these games since they contain complicated states and action spaces.
In this work, we aim for a modern card game with smart agents based on modern machine
learning approaches. To achieve this goal, we are going to perform the following tasks:
• Investigate card games to prepare for designing an alternative card game with smart agents
based on machine learning models.
• Build the proposed Card game in order to assist the smart agent.
• Build Intelligent Agents using modern methods (RL, combined RL - SL) and evaluate
their performance.
We decide to build our own game instead of using an existed game because:
• The game’s complexity can be supervised and controlled easily, that is, we would not have
to invest too much time in exploring and understanding how a game functions, or every of
its interactions and mechanisms.
• On-demand customization and integration can be performed in short time, as we built the
core components, it is easier to develop features or sub-system based on that core than
revising others’ code.
1.3
Development process
To achieve the above objectives, we break down the development process into phases as
follows:
• Phase 1: Research both game engines and machine learning methods: foundation, technical
basis, mathematical basis, advantages and disadvantages.
• Phase 2: Choose techniques to build the games and agents and design the core game components.
• Phase 3: Implement a game prototype with basic interactions and an agent-training environment.
• Phase 4: Build a complete game and an intermediate component to link the game and the
training environment.
• Phase 5: Update the game and train agents based on chosen approaches.
• Phase 6: Evaluate the agents.
• Phase 7: Organize a self-retrospective meeting to evaluate the whole thesis.
We will clarify the development process in the rest of this dissertation. In the next chapter,
we present mathematical and technical foundation that we have investigated on game engines,
games and reinforcement learning approaches. In chapter 3, we list out some considerations of
game engines, referenced games, modern machine learning approaches, then summarizes our
chosen technologies and development plan. Then, we describe in details every main component:
3
INTRODUCTION
communication server, proposed game and game agents in chapter 4. Chapter 5 presents evaluation results of our approach and analyze the performance of our smart agents. We conclude this
work in chapter 6 and provide a self-retrospective on our performance: what went well, what
did not, and what could be different.
4
Chapter 2
Background
2.1
Game Engines
Since the early days of the gaming industry, the term “game engine” had not existed. The
software that makes video and arcade games tick were highly specialized to both the game in
question and the hardware on which it ran. Today, games are a multi-billion-dollar mainstream
industry. The software that drives these now-ubiquitous three-dimensional and two-dimensional
worlds: game engines, e.g. Epic Games’ Unreal Engine 4 and its upcoming descendant Unreal
Engine 5, Valve’s Source engine, Unity game engine,... have become fully featured reusable
software development kits that can be used to build almost any game.
Virtually, all game engines contain a familiar set of core components, including rendering
engine, collision and physics engine, the animation system, the audio system, the game world
object model and so on.
Figure 2.1: Unreal Engine 4. 1
1 https://www.unrealengine.com/en-US/onlinelearning-courses/introducing-unreal-engine
5
BACKGROUND
2.1.1
Definition
The term “game engine” arose in the mid-1990s about first-person shooter (FPS) games
like the legendary Doom by id Software. The game was architected with a reasonably welldefined separation between its core software components (such as the three-dimensional graphics rendering system, the collision detection system or the audio system) and the art assets,
game worlds and rules of play that comprised the player’s gaming experience. The value of this
separation became evident as developers began licensing games and retooling them into new
products by creating new art, world layouts, weapons, characters, vehicles and game rules with
only minimal changes to the "engine" software.
Engines were made highly customizable via scripting languages like id’s Quake C, and
engine licensing began to be a viable secondary revenue stream for developers who created
them. Today, game developers can license a game engine and reuse significant portions of its
key software components to build games. This practice can be much more economical than
developing all of the core engine components in-house.
The line between a game and its engine is often blurry. Some engines make a reasonably
clear distinction, while others make almost no attempt to separate the two. In one game, the
rendering code might "know" exactly how to draw an orc, in another one, the rendering engine
might provide general-purpose material and shading facilities, and "orc-ness" might be defined
entirely in data. Perhaps a data-driven architecture is what differentiates a game engine from a
piece of software that is a game but not an engine. A game contains hard-coded logic or game
rules, or employs special-case code to render specific types of game objects, it is obvious that
the software is difficult or even impossible to reuse and make a different game. Game engine, on
the other hand, is more extensible and can be used as the foundation for many different games
without major modification.
Although game engines are made to build any games, they are still specialized for one or
two game genres. With the advent of faster graphics cards, processing units, hard disks, along
with more efficient rendering algorithms and data structures, it is possible to use a first-person
shooter (FPS) engine to build a strategy game, for example. However, the trade-off between
generality and optimality still exists. The strategy game may render slower or handle the logic
longer if it is built by a FPS engine rather than a fine-tuned strategy engine.
Game Engine
Unreal Engine 4
Unity
Godot
CryEngine
Cocos Creator 2.x
Amazon Lumberyard
GameMaker: Studio 2
Platform
2D/3D
2D/3D
2D/3D
2D/3D
2D
3D
2D
Scripting Language(s)
C++, Python, Blueprint visual scripting
C#, Javascript
C/C++, GDScript
C++, C#, Lua
Javascript, Typescript
Lua, Script Canvas
GameMaker Language
Table 2.1: Some popular game engines associated with their platforms and scripting languages.
1 https://www.gameenginebook.com/figures.html
6
BACKGROUND
2.1.2
Runtime Engine Architecture
A game engine generally consists of a tool suite and a runtime component. Figure 2.2 shows
all of the major runtime components that makeup a typical 3D game engine. Like all software
systems, game engines are built in layers where upper layers depend on lower layers, but not
vice versa.
Figure 2.2: The runtime game engine architecture. 2
7
- Xem thêm -