Vietnam National University, Ho Chi Minh City
Ho Chi Minh City University of Technology
Faculty of Computer Science and Engineering
——————– * ———————
Bachelor of Engineering Thesis
Stocks Price Trends Prediction
Using Machine Learning Techniques
Committee: Computer Science
Supervisors: Dr. Nguyen An Khuong, HCMUT, VNU-HCM
Dr. Nguyen Tien Thinh, HCMUT, VNU-HCM
Mr. Phan Son Tu, Descartes Network
Mr. Nguyen Thanh Phuong, New Mexico State University
Reviewer: Dr. Nguyen Hua Phung, HCMUT, VNU-HCM
Author: Nguyen Duc Phu, 1710234
Ho Chi Minh City, August 10, 2021
VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY: Computer Science and Engineering
DEPARTMENT: Computer Science
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
GRADUATION THESIS ASSIGNMENT
Note: The student must attach this sheet to the first page of the thesis report.
FULL NAME: Nguyễn Đức Phú
MAJOR: Computer Science
STUDENT ID: 1710234
CLASS: MT17KH02
Thesis title: Stock price trends prediction using machine learning techniques
Tasks (requirements on content and initial data):
i) Study how the stock market works and how stocks are bought and sold through algorithmic trading.
ii) Study machine learning techniques for processing time series data, especially financial data.
iii) Collect, clean, and process financial data for training machine learning models.
iv) Build machine learning models to predict stock price trends.
v) Implement a simple stock trading simulator to validate the effectiveness of the models.
Thesis assignment date: 01/03/2021
Task completion date: 14/06/2021
Supervisors and their parts:
1) Nguyễn An Khương, Ho Chi Minh City University of Technology: suggested the topic direction, supervised the work.
2) Nguyễn Tiến Thịnh, Ho Chi Minh City University of Technology: guided the background knowledge, supervised the work.
3) Phan Sơn Tự, Descartes Networks: topic orientation, technical guidance.
4) Nguyễn Thành Phương, New Mexico State University: topic orientation, main technical and technology guidance.
The content and requirements of the thesis have been approved by the Department.
August 3, 2021
HEAD OF DEPARTMENT (signature and full name)
MAIN SUPERVISOR (signature and full name)
FOR FACULTY AND DEPARTMENT USE
Preliminary reviewer: ________________________
Unit: _______________________________________
Defense date: __________________________________
Final grade: _________________________________
Thesis archive location: _____________________________
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
August 11, 2021
THESIS DEFENSE EVALUATION FORM
(For the supervisor)
Student's full name: Nguyễn Đức Phú
Student ID: 1710234 (MT17KH02)
Major (specialization): Computer Science
Thesis title: Stocks price trends prediction using machine learning techniques
Supervisors:
• Nguyễn An Khương, Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology
• Nguyễn Tiến Thịnh, Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology
• Phan Sơn Tự, Descartes Network
• Nguyễn Thành Phương, New Mexico State University, USA
Overview of the thesis report:
Number of pages: 76
Number of chapters: 06; Appendix
Number of data tables: 7
Number of figures: 48
Number of references: 36
Computing software:
Artifacts (product): CD containing the files for data processing, model training, and model validation
Overview of the drawings:
- Number of drawings:
A1 sheets:
A2 sheets:
Other sizes:
- Number of hand-drawn drawings:
Number of computer-drawn drawings:
Main strengths of the thesis:
• The thesis is written in fairly good English with few errors; the presentation is neat, coherent, clear, and in the required format; the references follow the standard.
• The student is capable, with strong self-study ability and a very independent working attitude.
• The student has a firm grasp of the background knowledge, techniques, and related technologies needed to process time series data, has built machine learning models to predict stock price trends, and has implemented a simple stock trading simulation based on the predictions of these models.
• The results of the thesis have practical significance and match the objectives and scope set out at the start.
Main shortcomings of the thesis:
• The dataset used in the thesis needs a more detailed evaluation and analysis.
• The model parameters have not been tuned to reach optimal results.
• The work has not yet been implemented as a tool that serves investment.
Recommendation: Approved for defense [X]
Additions required before defense
Not approved for defense
9. Questions the student must answer before the Committee: None (the student will be questioned directly at the defense)
Overall evaluation (in words: excellent, good, average): Excellent
Score: 9.5/10
Signature (full name)
Nguyễn An Khương
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
August 10, 2021
THESIS DEFENSE EVALUATION FORM
(For the reviewer)
1. Student's full name: Nguyễn Đức Phú
Student ID: 1710234
Major (specialization): Computer Science
2. Thesis title: Stocks Price Trends Prediction Using Machine Learning Techniques
3. Reviewer's full name: Dr. Nguyễn Hứa Phùng
4. Overview of the thesis report:
Number of pages: 72
Number of chapters: 6
Number of data tables: 7
Number of figures: 38
Number of references: 36
Computing software:
Artifacts (product):
5. Overview of the drawings:
- Number of drawings:
A1 sheets:
A2 sheets:
Other sizes:
- Number of hand-drawn drawings:
Number of computer-drawn drawings:
6. Main strengths of the thesis:
The thesis predicts the intraday price trends of traded stocks. It uses one month of trading data for 7 stocks, about 200 million records, provided by Wharton Research Data Services. The student processed the data (averaging the millisecond-level transactions into per-second transactions, normalizing the data, and creating a data window every 5 minutes). The student also used existing machine learning techniques (LSTM, ResNet50) and combined them in two different ways (Hybrid, ResLSTM), then ran experiments and simulated trading based on the predictions of the models. The results show that some of the models perform relatively well. The thesis is written in fairly good English with few errors.
7. Main shortcomings of the thesis:
The thesis does not analyze its requirements in detail and does not analyze the available dataset.
Some issues are not explained in the thesis:
- Is the difference between the ask price and the bid price in the dataset large? If it is not, would a single price from the dataset be enough?
- Is there a correlation between volume and the predicted price?
8. Recommendation: Approved for defense
Additions required before defense
Not approved for defense
9. 3 questions the student must answer before the Committee:
a. Is the difference between the ask price and the bid price in the dataset large? If it is not, would a single price from the dataset be enough?
b. Is there a correlation between volume and the predicted price? What analysis did you perform to assess this correlation before feeding volume into the machine learning models?
10. Overall evaluation (in words: excellent, good, average): Excellent
Score: 9/10
Signature (full name)
Dr. Nguyễn Hứa Phùng
Declaration
I certify that everything written in this thesis, as well as the source code, is
my own work, with the exception of quoted reference material and code supplied
by its original providers, and that nothing has been plagiarised or duplicated
from other sources. Should any evidence be found that contradicts this
statement, I shall take full responsibility before the Faculty and the
University.
Author
Acknowledgements
We would like to express our great appreciation to Dr. Nguyen An Khuong
for his huge support and useful critiques during the planning, development, and
completion of this thesis. His enthusiastic, credible, and continuous guidance played
an important part in the completion of the thesis. The advice given by Dr. Nguyen
Tien Thinh has been a great help in both the technical and presentation aspects.
We would also like to offer our special thanks to the seniors, Mr. Nguyen Thanh
Phuong, Mr. Van Tien Duc, Mr. Phan Son Tu, Mr. Tran Trung Hieu, Mr. Van
Minh Hao, and Mr. Nguyen Tan Duc, for their helpful advice during our research
process.
These acknowledgements would be incomplete without showing love to our family. They are the biggest
motivation for us to complete the thesis.
Finally, we would like to thank our friends, Nguyen Dang Ha Nam, Nguyen
Huy Hong Huy, and Nguyen Nguyen Vi, for their assistance.
Author
Abstract
The stock market is one of the most attractive topics nowadays. Thanks
to the recent rapid development of machine learning, especially deep learning,
algorithmic trading has become more popular. With the purpose of constructing an
automatic trading bot in mind, we decided to develop stock price trend
predictors in this thesis as the first step. Besides two models using a convolutional
neural network and long short-term memory, we also propose two hybrid
forms of these models. The results are competitive with other studies in terms of
training and evaluation performance. Moreover, trading simulations based
on the signals of the trained models are conducted to provide more insight into the
potential of applying machine learning models to the real-life stock market;
one of our models achieves positive returns.
Contents
Declaration
Acknowledgements
Abstract
List of Figures
List of Tables
Abbreviations
1 Introduction
    1.1 Introduction to research problem
    1.2 Objectives of the study
    1.3 Structure of the thesis
2 Background
    2.1 Basic concepts of stock
        2.1.1 Stock definition
        2.1.2 Stock markets
        2.1.3 Stock order and types of orders
        2.1.4 Order book
    2.2 Intraday trading
    2.3 High frequency data characteristics
        2.3.1 Irregular temporal spacing
        2.3.2 Discreteness
        2.3.3 Diurnal patterns
    2.4 Machine learning
        2.4.1 Deep learning
        2.4.2 Feedforward Neural Networks
        2.4.3 Convolutional Neural Networks
        2.4.4 Recurrent Neural Networks
3 Related Work
4 Proposed models and experiments
    4.1 Data preparation
        4.1.1 Data overview
        4.1.2 Data preprocessing
        4.1.3 Normalization
        4.1.4 Time intervals
        4.1.5 Labeling
    4.2 Deep learning models
        4.2.1 LSTM
        4.2.2 ResNet
        4.2.3 Two proposed combinations of LSTM and ResNet
        4.2.4 Training method
    4.3 Trading strategies
        4.3.1 Trading strategies
        4.3.2 Evaluating trading strategy
5 Result
    5.1 Evaluation methods
    5.2 Models performance
    5.3 Trading simulation performance
6 Conclusion and Future Work
    6.1 Conclusion
    6.2 Limitation and future work
List of Figures
2.1 Histogram of transaction price changes for Airgas stock.
2.2 Analysis on the basis of publication year.
2.3 Analysis based on prediction techniques (2019).
2.4 Analysis based on clustering techniques (2019).
2.5 Histogram of publication count in topics (2020).
2.6 Histogram of publication count in models (2020).
2.7 Topic-model heatmap (2020).
2.8 Neuron in human brain.
2.9 Computer replication of neuron.
2.10 Basic feedforward neural network.
2.11 Convolutional Neural Network.
2.12 An example of 2D convolution.
2.13 Logistic sigmoid function.
2.14 ReLU function.
2.15 A residual block in ResNet.
2.16 An example of Unfolding Computational Graph.
2.17 Example of RNN architecture.
2.18 Illustration of LSTM block.
3.1 Performance of stocks midprice trends prediction of CNN.
3.2 Performance of stocks midprice trends prediction of LSTM.
3.3 Performance of DeepLOB for Nasdaq Nordic dataset.
3.4 Performance of DeepLOB for London Stock Exchange dataset.
4.1 Original midprice and processed midprice of averaged AMZN data.
4.2 Original midprice and processed midprice of averaged AMD data.
4.3 Original midprice and processed midprice of averaged AAPL data.
4.4 Original midprice and processed midprice of averaged FB data.
4.5 Original midprice and processed midprice of averaged TSLA data.
4.6 Original midprice and processed midprice of averaged NVDA data.
4.7 Original midprice and processed midprice of averaged MSFT data.
4.8 Labeling using the first and last record.
4.9 Labeling using averaged midprice with k = 1.
4.10 Labeling using averaged midprice with k = 10.
4.11 LSTM model, built with Keras.
4.12 ResNet model, built with Keras and TensorFlow Hub.
4.13 The first proposed model, built with Keras and TensorFlow Hub.
4.14 The second proposed model, built with Keras and TensorFlow Hub.
5.1 Accuracy of models in training.
5.2 Kappa coefficient of models in training.
5.3 Precision of models in training.
5.4 Recall of models in training.
5.5 F1-Score of models in training.
5.6 Accuracy of models in evaluation.
5.7 Kappa coefficient of models in evaluation.
5.8 Precision of models in evaluation.
5.9 Recall of models in evaluation.
5.10 F1-Score of models in evaluation.
5.11 The midprice of Apple Inc. stock in simulation.
5.12 Cumulative returns of models in simulation.
List of Tables
4.1 Percentage of labels using the first and last record of tensor.
4.2 Percentage of labels using averaged midprice with k = 1.
4.3 Percentage of labels using averaged midprice with k = 10.
5.1 Performance of models after trained with seven datasets.
5.2 Performance of models in the final evaluation.
5.3 Performance of models in the simulation.
5.4 Final Balance of models.
Abbreviations
S&P500: Standard and Poor’s 500 Stock Index
AI: Artificial Intelligence
ML: Machine Learning
DL: Deep Learning
LSTM: Long Short-Term Memory
CNN: Convolutional Neural Network
LOB: Limit Order Book
SEC: Securities and Exchange Commission
WRDS: Wharton Research Data Services
MLP: Multilayer Perceptron
SVM: Support Vector Machine
1 Introduction
1.1 Introduction to research problem
The stock market is one of the most attractive topics nowadays. The
U.S. stock market ended 2020 at all-time highs despite a deadly pandemic. Global
stocks (as measured by the MSCI World Index) climbed 14%, posting a second
consecutive year of double-digit gains. In 2019, the MSCI World
Index gained 24%, and U.S. stocks, as measured by the S&P500, added 28%.¹
Thanks to the rapid development of computing and communications, electronic
trading has replaced traditional face-to-face trading as the main trading activity,
which in turn makes algorithmic trading supported by artificial
intelligence (AI) possible. Chi Nzelu, Head of Macro e-Trading, said, “Through automation,
we can capture more data - a problem previously unsolvable by algorithms. Machine
learning allows us to improve the quality of services in our trading ecosystem,
which also should gradually improve over time”. According to a survey conducted
by J.P. Morgan in 2020², 71% of traders believe that AI and machine learning (ML)
provide deep data analytics for their daily trading activity, whereas 58% of traders
believe that AI and ML represent an opportunity to hone their trading decisions.
Together with the growth of ML, especially deep learning (DL), algorithmic trading
is attracting more attention from researchers. The survey of recent applications of
DL in the financial industry conducted by Ozbayoglu et al. [1] shows that price
and price trend prediction, along with algorithmic trading, attract the most interest
from DL researchers. Ozbayoglu et al. [1] also report that Long Short-Term
Memory (LSTM) is the most used model thanks to its advantages in financial
time series research. Meanwhile, models based on Convolutional Neural Networks (CNNs),
which are well known in image processing, are also gaining popularity
among researchers.
¹ https://www.fidelity.com/learning-center/trading-investing/markets-sectors/2020-stock-market-report
² https://www.jpmorgan.com/solutions/cib/markets/e-trading-2020
Problem statement. Because the number of daily traders is increasing,
there is a demand for more accurate and efficient tools that help
them make better decisions and profits on stock markets.
Our study therefore aims to provide some insight into the
performance of DL models in algorithmic trading, i.e. price movement prediction.
We hope that the work can serve as the first step towards a useful
DL tool for “intraday” traders³, namely an automatic trading bot. More specifically,
this thesis concentrates on developing price trend predictors based on the “order
book”⁴, which can be used as automatic decision makers in follow-up projects.
³ Discussed in Section 2.2.
⁴ Discussed in Subsection 2.1.4.
1.2 Objectives of the study
This thesis aims to develop machine learning models that predict stock
midprice movements from high-frequency limit order book (LOB) data, and to
simulate trading strategies using the proposed models. In particular, the thesis has the
following specific objectives:
• Studying how the stock market works.
• Performing data engineering to preprocess the dataset.
• Applying machine learning techniques to forecast the intraday price trends
of chosen U.S. stocks.
• Simulating trades and producing statistical reports on the outcome of the
suggested models and strategies.
Because the research area is broad and diverse and our resources are limited,
the scope of this thesis is restricted as follows:
• Ignoring the effects of other factors that affect the market, such as “dark pools”⁵ and
statements on social networks.
• Simulating simple trades in real time using quote data provided by the Investors
Exchange Cloud API⁶.
• Ignoring the effect of transaction costs when testing trading strategies.
We hope that the study contributes to the literature on algorithmic trading using
machine learning, provides more insight into DL-based algorithmic trading, and
yields models and strategies with better performance. We expect
to process the data and construct models that perform acceptably in both training
and simulation. Moreover, the thesis can be considered the first step towards
a comprehensive automated trading bot in further research, which may
eventually be deployed in the real market.
⁵ Discussed in Subsection 2.1.4.
⁶ https://iexcloud.io/
1.3 Structure of the thesis
Based on the objectives discussed in Section 1.2, the thesis is organized
as follows:
Chapter 1: Introduction. We introduce the research problem as well as the
objectives, scope, and structure of the thesis.
Chapter 2: Background. This chapter presents domain knowledge about finance
and stocks, together with the machine learning background
relevant to the study.
Chapter 3: Related Work. We discuss the state of machine learning applications in the financial industry and the methodologies proposed by previous
researchers for solving the research problem.
Chapter 4: Data, Models and Trading Strategies. This chapter presents our data and the way we process it, the models and their training phase,
and our trading strategies.
Chapter 5: Result. The performance of the models and the results of the simulated
trades are reported in detail.
Chapter 6: Conclusion and Future Work. We summarize the thesis,
evaluate what we have and have not achieved, and outline plans for
future work on the research problem.
2 Background
2.1 Basic concepts of stock
2.1.1 Stock definition
Stocks represent ownership in a company. By owning shares, or stocks, investors
own a piece of the company. The value of a stock tends to increase when the company operates
well and to decrease when it does not.
Some companies pay dividends to the owners of their stocks. People buy
stocks for various reasons, such as capital appreciation¹, dividend payments, or
the ability to vote and influence the company. Companies issue stock when they
need money for maintenance and development, or to pay off debt [2].
Without stocks, companies may struggle to raise such large amounts of money
from individual investors.
¹ Capital appreciation occurs when a stock price rises.
2.1.2 Stock markets
The stock market refers to the collection of venues where investors regularly
buy and sell shares of publicly held companies and where those shares are
issued. Although it is called a stock market or equity market, other
securities, such as exchange-traded funds (ETFs), bonds, and gold, are also traded in the
stock market.
Stock markets provide a secure and regulated environment where traders,
companies, and organizations can safely carry out financial activities such as trading. The
stock market serves two functions, known as the “primary market” and the “secondary
market”, both of which follow the rules defined by the regulator.
First, the stock market allows a company to hold an initial public
offering (IPO), that is, to issue and sell parts of itself (shares) to the public
in order to raise funds. Second, it provides a trading platform for
transactions in the listed shares. For every transaction, traders, whether individuals
or organizations, pay the stock market a fee, called a transaction fee.
Long-term investors and short-term traders are not the only participants
in stock markets. Brokers, portfolio managers, investment banks, and market
makers also contribute to the operation of a stock market [2].
2.1.3 Stock order and types of orders
According to the U.S. SEC [2], market orders, limit orders, and stop-loss orders are
among the most popular types of orders used in the stock market.²
Market Orders are the most common in trading. A market order
buys or sells immediately at the current price, which means buying a stock at or
near the posted ask price, or selling it at or near the posted bid price. The
last traded price is not necessarily the price at which the market order will be
executed. Market orders suit investors who want to trade
without delay, although the price is not guaranteed.
Limit Orders, sometimes referred to as pending orders, allow
investors to guarantee the price at which a buy or sell transaction is executed.
A limit order specifies the level the price must reach for the order to be
filled. If that level is not reached, the limit order waits until it is filled
or canceled by the investor. Limit orders help traders obtain the best possible
price at the cost of immediate execution.
Stop-Loss Orders, also referred to as stop orders, are orders to
trade once the stock price reaches a specified level, known as the stop price.
Unlike a limit order, a stop order becomes a market order when the stop
price is reached [2].
Other special orders that brokerage firms may allow include Day Orders and
Good-Till-Cancelled Orders. In this thesis, however, we consider only Limit
Orders, which form the limit order book (LOB), discussed in the following
subsection.
² https://www.investor.gov/introduction-investing/investing-basics/how-stock-markets-work/types-orders
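The behavior of the three order types described in this subsection can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not a description of how an exchange actually matches orders: the `Quote` class and the `execution_price` function are hypothetical names introduced here, an order is assumed to fill in full at the posted quote, and a stop order is assumed to trigger on the last traded price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    """Best posted prices at one instant (hypothetical helper type)."""
    bid: float  # highest price a buyer currently offers
    ask: float  # lowest price a seller currently accepts


def execution_price(kind: str, side: str, quote: Quote,
                    limit: Optional[float] = None,
                    stop: Optional[float] = None,
                    last_trade: Optional[float] = None) -> Optional[float]:
    """Return the fill price of an order, or None if it keeps waiting.

    Assumption: the whole order fills at the posted quote (no partial fills).
    """
    # A buyer pays the ask; a seller receives the bid.
    market_price = quote.ask if side == "buy" else quote.bid

    if kind == "market":
        # Executes immediately; the price is not guaranteed in advance.
        return market_price

    if kind == "limit":
        if limit is None:
            raise ValueError("a limit order needs a limit price")
        # Fills only at the limit price or better, otherwise stays pending.
        if side == "buy":
            return market_price if market_price <= limit else None
        return market_price if market_price >= limit else None

    if kind == "stop":
        if stop is None or last_trade is None:
            raise ValueError("a stop order needs a stop price and a last trade")
        # Assumed trigger rule: a sell stop fires when the last trade falls to
        # the stop price, a buy stop when it rises to it. Once triggered, the
        # order behaves like a market order.
        triggered = last_trade <= stop if side == "sell" else last_trade >= stop
        return market_price if triggered else None

    raise ValueError(f"unknown order kind: {kind}")


quote = Quote(bid=99.9, ask=100.1)
print(execution_price("market", "buy", quote))               # 100.1
print(execution_price("limit", "buy", quote, limit=100.0))   # None
print(execution_price("stop", "sell", quote, stop=100.0, last_trade=99.95))  # 99.9
```

The limit buy at 100.0 stays pending because the ask (100.1) is above the limit, which is the trade-off noted above: a guaranteed price in exchange for giving up immediate execution.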
2.1.4 Order book
Most of the material in this subsection comes from the Investopedia
article on order books³.
The term order book refers to an electronic list of buy and sell orders for a specific
security, organized by price level. An order book lists the number of shares being bid
on or offered at each price point. It also identifies the market participants behind
the buy and sell orders, though some choose to remain anonymous. These lists
help traders and also improve market transparency because they provide valuable
trading information.
An order book is dynamic, meaning that it is constantly updated in real time
throughout the day. Orders that specify execution only at market open or market
close are maintained separately, in what are known as the “opening order book” and the “closing order
book” respectively.
There are typically three parts to an order book, i.e. buy orders, sell orders,
and order history:
• Buy orders contain buyer information, including all the bids and the amounts the
buyers wish to purchase.
• Sell orders are similar to buy orders, but list the ask (offer) prices and the amounts offered.
• Market order histories show all the transactions that have taken place in the
past.
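The three parts listed above can be sketched as a small data structure. This is an illustrative sketch only: the `OrderBook` class and its method names are hypothetical, there is no matching engine, and participant identities are omitted. The midprice is computed with the usual definition, the average of the best bid and the best ask, which we assume matches the midprice referred to in the objectives of this thesis.

```python
from collections import defaultdict
from typing import Optional


class OrderBook:
    """Toy order book: aggregated shares per price level plus a trade history."""

    def __init__(self) -> None:
        self.bids = defaultdict(int)   # price -> total shares bid (buy orders)
        self.asks = defaultdict(int)   # price -> total shares offered (sell orders)
        self.history = []              # executed trades as (price, shares)

    def add_limit_order(self, side: str, price: float, shares: int) -> None:
        # Orders at the same price are aggregated into one price level.
        levels = self.bids if side == "buy" else self.asks
        levels[price] += shares

    def record_trade(self, price: float, shares: int) -> None:
        self.history.append((price, shares))

    def best_bid(self) -> Optional[float]:
        return max(self.bids) if self.bids else None

    def best_ask(self) -> Optional[float]:
        return min(self.asks) if self.asks else None

    def spread(self) -> Optional[float]:
        bid, ask = self.best_bid(), self.best_ask()
        return None if bid is None or ask is None else ask - bid

    def midprice(self) -> Optional[float]:
        bid, ask = self.best_bid(), self.best_ask()
        return None if bid is None or ask is None else (bid + ask) / 2


book = OrderBook()
book.add_limit_order("buy", 100.0, 200)
book.add_limit_order("buy", 99.5, 100)
book.add_limit_order("sell", 100.5, 150)
book.add_limit_order("sell", 101.0, 50)
print(book.best_bid(), book.best_ask())  # 100.0 100.5
print(book.spread(), book.midprice())    # 0.5 100.25
```

Keeping one aggregated quantity per price level mirrors the "organized by price level" view described above; a real book would also keep the individual orders and their arrival times.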
³ https://www.investopedia.com/terms/o/order-book.asp