
Vietnam National University, Ho Chi Minh City
Ho Chi Minh City University of Technology
Faculty of Computer Science and Engineering

Bachelor of Engineering Thesis

Stocks Price Trends Prediction Using Machine Learning Techniques

Committee: Computer Science
Supervisors: Dr. Nguyen An Khuong, HCMUT, VNU-HCM
Dr. Nguyen Tien Thinh, HCMUT, VNU-HCM
Mr. Phan Son Tu, Descartes Network
Mr. Nguyen Thanh Phuong, New Mexico State University
Reviewer: Dr. Nguyen Hua Phung, HCMUT, VNU-HCM
Author: Nguyen Duc Phu, 1710234

Ho Chi Minh City, August 10, 2021

Vietnam National University, Ho Chi Minh City
Ho Chi Minh City University of Technology
Faculty: Computer Science & Engineering
Department: Computer Science
Socialist Republic of Vietnam
Independence - Freedom - Happiness

GRADUATION THESIS ASSIGNMENT
Note: The student must attach this sheet as the first page of the thesis report.
Full name: Nguyễn Đức Phú
Major: Computer Science
Student ID: 1710234
Class: MT17KH02
Thesis title: Stock price trends prediction using machine learning techniques
Tasks (requirements on content and initial data):
i) Study how the stock market works and how stocks are bought and sold algorithmically for investment.
ii) Study machine learning techniques for processing time-series data, especially financial data.
iii) Collect, clean, and process financial data for machine learning training.
iv) Build machine learning models to predict stock price trends.
v) Implement a simple stock trading simulator to test the effectiveness of the models.
Thesis assignment date: 01/03/2021
Task completion date: 14/06/2021
Supervisors and their roles:
1) Nguyễn An Khương, HCMUT: suggested the direction of the topic and supervised the work.
2) Nguyễn Tiến Thịnh, HCMUT: provided guidance on foundational knowledge and supervised the work.
3) Phan Sơn Tự, Descartes Networks: oriented the topic and provided technical guidance.
4) Nguyễn Thành Phương, New Mexico State University: oriented the topic and provided the main technical and technological guidance.
The content and requirements of the thesis have been approved by the Department.
Date: 03 August 2021
HEAD OF DEPARTMENT (signature and full name)    MAIN SUPERVISOR (signature and full name)
FOR THE FACULTY/DEPARTMENT:
Reviewer (preliminary grading): ________________________
Unit: _______________________________________
Defense date: __________________________________
Final grade: _________________________________
Thesis archive location: _____________________________

Ho Chi Minh City University of Technology
Faculty of Computer Science & Engineering
Socialist Republic of Vietnam
Independence - Freedom - Happiness
Date: 11 August 2021
THESIS DEFENSE EVALUATION FORM
(For the supervisor)
Student: Nguyễn Đức Phú, Student ID: 1710234 (MT17KH02)
Major: Computer Science
Topic: Stocks price trends prediction using machine learning techniques
Supervisors:
• Nguyễn An Khương, Faculty of Computer Science and Engineering, HCMUT
• Nguyễn Tiến Thịnh, Faculty of Computer Science and Engineering, HCMUT
• Phan Sơn Tự, Descartes Network
• Nguyễn Thành Phương, New Mexico State University, USA
Report overview:
Pages: 76
Chapters: 06
Appendices:
Tables: 7
Figures: 48
References: 36
Computational software:
Deliverables (products): a CD containing the files for data processing, model training, and model validation.
Drawings overview: - Number of drawings: A1: A2: Other sizes: - Hand-drawn drawings: Computer-drawn drawings:
Main strengths of the thesis:
• The thesis is written in fairly good English with few errors. The presentation is neat, coherent, and clear, and it follows the required format. The references are formatted correctly.
• The student is highly capable, learns independently, and shows a very strong spirit of independent work.
• The student has a firm grasp of the foundational knowledge, techniques, and related technologies for processing time-series data. He built machine learning models to predict stock price trends and implemented a simple stock trading simulation based on the models' predictions.
• The results of the thesis have practical significance and are consistent with the objectives and scope originally set for the topic.
Main shortcomings of the thesis:
• The dataset used in the thesis needs a more detailed evaluation and analysis.
• The model parameters have not been tuned to achieve optimal results.
• The work has not yet been turned into a tool that supports investment.
Recommendation: Approved for defense [x]   Additional work required before defense [ ]   Not approved for defense [ ]
9. Questions the student must answer before the committee: None (the student will be questioned directly during the defense).
Overall assessment (in words: excellent, good, average): Excellent
Score: 9.5/10
Signature (full name): Nguyễn An Khương

Ho Chi Minh City University of Technology
Faculty of Computer Science & Engineering
Socialist Republic of Vietnam
Independence - Freedom - Happiness
Date: 10 August 2021
THESIS DEFENSE EVALUATION FORM
(For the reviewer)
1. Student: Nguyễn Đức Phú, Student ID: 1710234
Major: Computer Science
2. Topic: Stocks Price Trends Prediction Using Machine Learning Techniques
3. Reviewer: Dr. Nguyễn Hứa Phùng
4. Report overview: Pages: 72. Chapters: 6. Tables: 7. Figures: 38. References: 36. Computational software: Deliverables (products):
5. Drawings overview: - Number of drawings: A1: A2: Other sizes: - Hand-drawn drawings: Computer-drawn drawings:
6. Main strengths of the thesis: The thesis predicts intraday price trends of stocks. It uses one month of trading data for 7 stocks, about 200 million records, provided by Wharton Research Data Services. Phú processed the data by averaging the millisecond-level transactions into per-second records, normalizing the data, and creating 5-minute data windows. He also used existing machine learning techniques (LSTM, ResNet50) and combined them in two different ways (Hybrid, ResLSTM). He then ran experiments and simulated trading based on the models' predictions. The results show that some models perform relatively well. The thesis is written in fairly good English with few errors.
7. Main shortcomings of the thesis: The thesis does not analyze the requirements of the topic in detail and does not analyze the existing dataset. Some issues are not explained in the thesis:
- Is the spread between the ask and bid prices in the dataset large? If the spread is small, could a single price from the dataset be used instead?
- Is there a correlation between volume and the predicted price?
8. Recommendation: Approved for defense   Additional work required before defense   Not approved for defense
9. Three questions the student must answer before the committee:
a. Is the spread between the ask and bid prices in the dataset large? If the spread is small, could a single price from the dataset be used instead?
b. Is there a correlation between volume and the predicted price? What analysis did you perform to assess this correlation before feeding volume into the machine learning model?
10. Overall assessment (in words: excellent, good, average): Excellent
Score: 9/10
Signature (full name): Dr. Nguyễn Hứa Phùng

Declaration
I certify that everything written in this thesis, as well as the source code, is my own work, except for quoted reference material and code provided by its original developers. There is no intention of plagiarising or copying from other sources. If any verification finds evidence contradicting this statement, I shall take full responsibility before the Faculty and the University.
Author

Acknowledgements
We would like to express our deep appreciation to Dr. Nguyen An Khuong for his great support and useful critiques during the planning, development, and completion of this thesis. His enthusiastic, credible, and continuous guidance played an important part in completing the thesis. The advice given by Dr. Nguyen Tien Thinh has been a great help in both technical and presentation aspects. We would also like to offer our special thanks to our seniors, Mr. Nguyen Thanh Phuong, Mr. Van Tien Duc, Mr. Phan Son Tu, Mr. Tran Trung Hieu, Mr. Van Minh Hao, and Mr. Nguyen Tan Duc, for their helpful advice during our research. These acknowledgements would be incomplete without expressing our love for our family, who are our biggest motivation for completing the thesis. Finally, we would like to thank our friends Nguyen Dang Ha Nam, Nguyen Huy Hong Huy, and Nguyen Nguyen Vi for their assistance.
Author

Abstract
The stock market is one of the most attractive topics today. Thanks to the recent rapid development of machine learning, especially deep learning, algorithmic trading has become more popular. Our eventual goal is an automatic trading bot, and as a first step this thesis develops stock price trend predictors. Besides two models based on a convolutional neural network and a long short-term memory network, we also propose two hybrid forms of these models.
The results are competitive with those of other studies in terms of training and evaluation performance. Moreover, we run trading simulations based on the signals of the trained models to provide more insight into the potential of applying machine learning models to the real stock market. In these simulations, one of our models achieves positive returns.

Contents
Declaration
Acknowledgements
Abstract
List of Figures
List of Tables
Abbreviations
1 Introduction
1.1 Introduction to research problem
1.2 Objectives of the study
1.3 Structure of the thesis
2 Background
2.1 Basic concepts of stock
2.1.1 Stock definition
2.1.2 Stock markets
2.1.3 Stock order and types of orders
2.1.4 Order book
2.2 Intraday trading
2.3 High frequency data characteristics
2.3.1 Irregular temporal spacing
2.3.2 Discreteness
2.3.3 Diurnal patterns
2.4 Machine learning
2.4.1 Deep learning
2.4.2 Feedforward Neural Networks
2.4.3 Convolutional Neural Networks
2.4.4 Recurrent Neural Networks
3 Related Work
4 Proposed models and experiments
4.1 Data preparation
4.1.1 Data overview
4.1.2 Data preprocessing
4.1.3 Normalization
4.1.4 Time intervals
4.1.5 Labeling
4.2 Deep learning models
4.2.1 LSTM
4.2.2 ResNet
4.2.3 Two proposed combinations of LSTM and ResNet
4.2.4 Training method
4.3 Trading strategies
4.3.1 Trading strategies
4.3.2 Evaluating trading strategy
5 Result
5.1 Evaluation methods
5.2 Models performance
5.3 Trading simulation performance
6 Conclusion and Future Work
6.1 Conclusion
6.2 Limitation and future work

List of Figures
2.1 Histogram of transaction price changes for Airgas stock.
2.2 Analysis on the basis of publication year.
2.3 Analysis based on prediction techniques (2019).
2.4 Analysis based on clustering techniques (2019).
2.5 Histogram of publication count in topics (2020).
2.6 Histogram of publication count in models (2020).
2.7 Topic-model heatmap (2020).
2.8 Neuron in human brain.
2.9 Computer replication of neuron.
2.10 Basic feedforward neural network.
2.11 Convolutional Neural Network.
2.12 An example of 2D convolution.
2.13 Logistic sigmoid function.
2.14 ReLU function.
2.15 A residual block in ResNet.
2.16 An example of Unfolding Computational Graph.
2.17 Example of RNN architecture.
2.18 Illustration of LSTM block.
3.1 Performance of stocks midprice trends prediction of CNN.
3.2 Performance of stocks midprice trends prediction of LSTM.
3.3 Performance of DeepLOB for Nasdaq Nordic dataset.
3.4 Performance of DeepLOB for London Stock Exchange dataset.
4.1 Original midprice and processed midprice of averaged AMZN data.
4.2 Original midprice and processed midprice of averaged AMD data.
4.3 Original midprice and processed midprice of averaged AAPL data.
4.4 Original midprice and processed midprice of averaged FB data.
4.5 Original midprice and processed midprice of averaged TSLA data.
4.6 Original midprice and processed midprice of averaged NVDA data.
4.7 Original midprice and processed midprice of averaged MSFT data.
4.8 Labeling using the first and last record.
4.9 Labeling using averaged midprice with k = 1.
4.10 Labeling using averaged midprice with k = 10.
4.11 LSTM model, built with Keras.
4.12 ResNet model, built with Keras and TensorFlow Hub.
4.13 The first proposed model, built with Keras and TensorFlow Hub.
4.14 The second proposed model, built with Keras and TensorFlow Hub.
5.1 Accuracy of models in training.
5.2 Kappa coefficient of models in training.
5.3 Precision of models in training.
5.4 Recall of models in training.
5.5 F1-Score of models in training.
5.6 Accuracy of models in evaluation.
5.7 Kappa coefficient of models in evaluation.
5.8 Precision of models in evaluation.
5.9 Recall of models in evaluation.
5.10 F1-Score of models in evaluation.
5.11 The midprice of Apple Inc. stock in simulation.
5.12 Cumulative returns of models in simulation.

List of Tables
4.1 Percentage of labels using the first and last record of tensor.
4.2 Percentage of labels using averaged midprice with k = 1.
4.3 Percentage of labels using averaged midprice with k = 10.
5.1 Performance of models after training on seven datasets.
5.2 Performance of models in the final evaluation.
5.3 Performance of models in the simulation.
5.4 Final balance of models.

Abbreviations
S&P500: Standard and Poor's 500 Stock Index
AI: Artificial Intelligence
ML: Machine Learning
DL: Deep Learning
LSTM: Long Short-Term Memory
CNN: Convolutional Neural Network
LOB: Limit Order Book
SEC: Securities and Exchange Commission
WRDS: Wharton Research Data Services
MLP: Multilayer Perceptron
SVM: Support Vector Machine

1 Introduction
1.1 Introduction to research problem
The stock market is one of the most attractive topics today. The U.S. stock market ended 2020 at all-time highs despite a deadly pandemic. Global stocks (as measured by the MSCI World Index) climbed 14%, their second consecutive year of double-digit gains. In 2019, the MSCI World Index gained 24%, and U.S. stocks, as measured by the S&P500, added 28% (source: https://www.fidelity.com/learning-center/trading-investing/markets-sectors/2020-stock-market-report).
Thanks to the rapid development of computing and communications, electronic trading has replaced traditional face-to-face trading as the main trading activity. This shift has also made algorithmic trading supported by artificial intelligence (AI) possible. Chi Nzelu, Head of Macro e-Trading, said, “Through automation, we can capture more data - a problem previously unsolvable by algorithms. Machine learning allows us to improve the quality of services in our trading ecosystem, which also should gradually improve over time”. According to a survey conducted by J.P. Morgan in 2020 (https://www.jpmorgan.com/solutions/cib/markets/e-trading-2020), 71% of traders believe that AI and machine learning (ML) provide deep data analytics for their daily trading activity. In the same survey, 58% of traders believe that AI and ML represent an opportunity to hone their trading decisions.
With the growth of ML, especially deep learning (DL), algorithmic trading is attracting more attention from researchers. The survey of recent applications of DL in the financial industry by Ozbayoglu et al. [1] shows that price (or price trend) prediction, along with algorithmic trading, attracts the most interest from DL researchers. Ozbayoglu et al. [1] also report that Long Short-Term Memory (LSTM) is the most widely used model, owing to its advantages for financial time series. Meanwhile, models based on Convolutional Neural Networks (CNNs), which are well known in image processing, are also gaining popularity among researchers.
Problem statement. The number of day traders keeps growing, and they need more accurate and efficient tools to make better decisions and profits in the stock market. Our study therefore aims to provide insight into the performance of DL models in algorithmic trading, specifically price movement prediction. We hope this work can serve as a first step toward a useful DL tool for “intraday” traders (discussed in Section 2.2): an automatic trading bot. More specifically, this thesis concentrates on developing price trend predictors based on the “order book” (discussed in Subsection 2.1.4), which can serve as automatic decision makers in future projects.

1.2 Objectives of the study
The thesis aims to develop machine learning models that predict stock midprice movements from high-frequency limit order book (LOB) data. It also aims to simulate trading strategies using the proposed models. In particular, the thesis has the following specific objectives:
• Studying how the stock market works.
• Doing data engineering to preprocess the dataset.
• Applying machine learning techniques to forecast intraday price trends of selected U.S. stocks.
• Simulating trades and producing statistical reports on the outcomes of the proposed models and strategies.
Because of the breadth of the research area and our limited resources, the scope of this thesis is restricted as follows:
• Ignoring the effects of other economic factors that affect the market, such as the “dark pool” (discussed in Subsection 2.1.4) and statements on social networks.
• Simulating simple trades in real time using quote data from the Investors Exchange Cloud API (https://iexcloud.io/).
• Ignoring transaction costs when testing trading strategies.
We hope that the study contributes to the literature on algorithmic trading with machine learning and provides more insight into DL-based algorithmic trading. It may also lead to models and strategies with better performance. Concretely, we expect to process the data and to construct models that perform acceptably in both training and trading simulation. Moreover, the thesis can be considered a first step toward a comprehensive automated trading bot in further research, which may eventually be deployed in the real market.

1.3 Structure of the thesis
Based on the objectives discussed in Section 1.2, the thesis is organized as follows:
Chapter 1: Introduction. We introduce the research problem as well as the objectives, scope, and structure of the thesis.
Chapter 2: Background. This chapter presents domain knowledge about finance and stocks, as well as the machine learning background relevant to the study.
Chapter 3: Related Work. We discuss the state of machine learning applications in the financial industry and the methods proposed by previous researchers to solve the research problem.
Chapter 4: Data, Models and Trading Strategies. This chapter presents our data and how we process it, followed by the models, the training phase, and our trading strategies.
Chapter 5: Result. The performance of the models and the results of the simulated trades are presented in detail.
Chapter 6: Conclusion and Future Work. We summarize the thesis, evaluate what we have and have not achieved, and outline future plans for the research problem.

2 Background
2.1 Basic concepts of stock
2.1.1 Stock definition
Stocks represent ownership in a company. By owning shares of stock, investors own a piece of the company. The value of a stock tends to increase when the company performs well and to decrease when it does not. Some companies pay dividends to their shareholders. People buy stocks for various reasons, such as capital appreciation (which occurs when a stock price rises), dividend payments, or the ability to vote and influence the company. Companies issue stock when they need money, for maintenance and development or to pay off debt [2]. Without stocks, companies may struggle to raise such large amounts of money from individual investors.
2.1.2 Stock markets
The stock market refers to the collection of venues where shares of publicly held companies are issued and where investors regularly buy and sell them. Although it is called a stock market or equity market, other securities, such as exchange-traded funds (ETFs), bonds, and gold, are also traded there.
Stock markets provide a secure and regulated environment where traders, companies, and organizations can safely carry out financial activities such as trading. The stock market serves two functions, known as the “primary market” and the “secondary market”, and both follow the rules defined by the regulator. First, it allows companies to hold an initial public offering (IPO), in which a company issues and sells parts of itself (shares) to the public to raise funds. Second, it provides a trading platform for transactions of the listed shares.
For every transaction, traders, whether individuals or organizations, pay the stock market a fee called a transaction fee. Long-term investors and short-term traders are not the only participants in stock markets: brokers, portfolio managers, investment banks, and market makers also contribute to the operation of a stock market [2].
2.1.3 Stock order and types of orders
According to the U.S. SEC [2] (see https://www.investor.gov/introduction-investing/investing-basics/how-stock-markets-work/types-orders), market orders, limit orders, and stop-loss orders are among the most popular types of orders used in the stock market.
Market orders are the most common orders in trading. A market order buys or sells immediately at the current price. In practice, this means buying at or near the posted ask price, or selling at or near the posted bid price. The last traded price is not necessarily the price at which the market order will be executed. Market orders mostly suit investors who want their transactions executed without delay, even though the price is not guaranteed.
Limit orders, sometimes referred to as pending orders, let investors guarantee the price at which a buy or sell is executed. A buy limit order can only be executed at the limit price or lower, and a sell limit order only at the limit price or higher. If that price level is not reached, the limit order waits until it is filled or canceled by the investor. Limit orders help traders obtain the best possible price, at the cost of immediate execution.
Stop-loss orders, also referred to as stop orders, are orders to trade once the stock price reaches a specified level, known as the stop price. Unlike a limit order, a stop order becomes a market order once the stop price is reached [2]. Other special orders that brokerage firms may allow include day orders and good-till-cancelled orders.
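The execution rules of the three order types can be sketched in a few lines of Python. This is a minimal, illustrative sketch and not the thesis's implementation. The `Quote` type and the helpers `market_fill_price`, `limit_fill_price`, and `stop_fill_price` are hypothetical names. The sketch makes three simplifying assumptions: only the best bid and ask matter (unlimited depth, so no partial fills or slippage), there are no fees, and a stop is triggered by the last traded price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quote:
    """Best prices currently posted in the market (hypothetical type)."""
    bid: float  # highest price a buyer is willing to pay
    ask: float  # lowest price a seller is willing to accept


def market_fill_price(side: str, quote: Quote) -> float:
    """Market order: executes immediately, buying at the ask or selling at the bid.
    Simplification: book depth is ignored, so there is no slippage."""
    return quote.ask if side == "buy" else quote.bid


def limit_fill_price(side: str, limit: float, quote: Quote) -> Optional[float]:
    """Limit order: a buy fills only at `limit` or lower, a sell only at `limit`
    or higher. Returns None when the order must wait (rest) in the book."""
    if side == "buy" and quote.ask <= limit:
        return quote.ask
    if side == "sell" and quote.bid >= limit:
        return quote.bid
    return None


def stop_fill_price(side: str, stop: float, last_price: float,
                    quote: Quote) -> Optional[float]:
    """Stop order: once the last traded price reaches the stop price, the order
    becomes a market order. Sell stops trigger at or below the stop, buy stops
    at or above it. Returns None while the stop has not been triggered."""
    triggered = last_price <= stop if side == "sell" else last_price >= stop
    return market_fill_price(side, quote) if triggered else None


if __name__ == "__main__":
    q = Quote(bid=99.98, ask=100.02)
    print(market_fill_price("buy", q))          # 100.02
    print(limit_fill_price("buy", 100.00, q))   # None: the ask is above the limit
    print(limit_fill_price("sell", 99.95, q))   # 99.98
    # Price has fallen to 99.40, so a sell stop at 99.50 is hit and sells at the bid.
    print(stop_fill_price("sell", 99.50, 99.40, Quote(bid=99.38, ask=99.42)))  # 99.38
```

The trade-off described above is visible in the return types. The market order always returns a price, while the limit order may return None, which means it waits in the book.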
However, this thesis is concerned only with limit orders, which form the limit order book (LOB) discussed in the following subsection.
2.1.4 Order book
Most of the material in this subsection comes from an Investopedia article (https://www.investopedia.com/terms/o/order-book.asp). The term order book refers to an electronic list of buy and sell orders for a specific security, organized by price level. An order book lists the number of shares being bid on or offered at each price point. It also identifies the market participants behind the buy and sell orders, though some choose to remain anonymous. These lists help traders and improve market transparency because they provide valuable trading information. An order book is dynamic, meaning it is constantly updated in real time throughout the day. Orders that specify execution only at market open or market close are maintained separately, in the “opening order book” and the “closing order book”, respectively.
There are typically three parts to an order book, i.e. buy orders, sell orders, and order history:
• Buy orders contain buyer information, including all the bids, the amount buyers wish to purchase, and the bid price.
• Sell orders are similar to buy orders, but contain the offers (asks), the amount sellers wish to sell, and the ask price.
• Market order histories show all the transactions that have taken place in the past.
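To make the price-level structure concrete, here is a minimal Python sketch of the bid and ask sides of a limit order book. It is illustrative only: the class and method names are hypothetical, and prices are stored as integer cents to avoid floating-point dictionary keys. It has no matching engine, so orders are assumed not to cross the spread, and the order-history part is omitted. The sketch computes the midprice, the quantity whose movements this thesis predicts, using the conventional definition: the average of the best bid and the best ask.

```python
from __future__ import annotations

from collections import defaultdict


class OrderBook:
    """Toy limit order book: total resting shares per price level, per side.

    Assumptions: prices are integer cents, and there is no matching engine
    (incoming limit orders are assumed not to cross the spread).
    """

    def __init__(self) -> None:
        self.bids: dict[int, int] = defaultdict(int)  # price -> shares bid
        self.asks: dict[int, int] = defaultdict(int)  # price -> shares offered

    def _side(self, side: str) -> dict[int, int]:
        return self.bids if side == "buy" else self.asks

    def add(self, side: str, price: int, shares: int) -> None:
        """A new limit order adds its size to its price level."""
        self._side(side)[price] += shares

    def cancel(self, side: str, price: int, shares: int) -> None:
        """A cancellation removes size; empty levels disappear from the book."""
        book = self._side(side)
        book[price] -= shares
        if book[price] <= 0:
            del book[price]

    def best_bid(self) -> int:
        return max(self.bids)  # highest price any buyer is bidding

    def best_ask(self) -> int:
        return min(self.asks)  # lowest price any seller is offering

    def spread(self) -> int:
        return self.best_ask() - self.best_bid()

    def midprice(self) -> float:
        return (self.best_bid() + self.best_ask()) / 2

    def depth(self, levels: int = 3) -> tuple[list, list]:
        """Top `levels` (price, shares) pairs per side, best price first."""
        bids = sorted(self.bids.items(), reverse=True)[:levels]
        asks = sorted(self.asks.items())[:levels]
        return bids, asks


if __name__ == "__main__":
    book = OrderBook()
    book.add("buy", 10000, 200)   # 200 shares bid at $100.00
    book.add("buy", 9999, 500)
    book.add("sell", 10002, 300)
    book.add("sell", 10003, 100)
    book.add("buy", 10000, 100)   # same price level: aggregates to 300 shares
    print(book.best_bid(), book.best_ask(), book.spread(), book.midprice())
    # 10000 10002 2 10001.0
    print(book.depth(2))
    # ([(10000, 300), (9999, 500)], [(10002, 300), (10003, 100)])
```

Aggregating by price level is what turns individual orders into the "number of shares being bid on or offered at each price point" described above. Every add or cancel can move the best bid or ask, and therefore the midprice, which is why the book changes continuously throughout the day.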