VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
COMPUTER SCIENCE AND ENGINEERING FACULTY

GRADUATION THESIS

Building A Diagram Recognition Application with Computer Vision Approach

Committee: Computer Science 1
Advisor: Dr. Nguyen Duc Dung
Reviewer: Dr. Tran Tuan Anh
Students: Huynh Tan Thanh (1752048), Nguyen Quang Sang (1752465)

HO CHI MINH CITY, 12/2021

GRADUATION THESIS TASK SHEET (translated from the Vietnamese form)
Note: students must attach this sheet to the first page of the thesis report.
Students: Nguyen Quang Sang (1752465), Huynh Tan Thanh (1752048)
Major: Computer Science (KHMT); Class: CLC-K2017
Thesis title: "Building A Diagram Recognition Application with Computer Vision Approach"
Tasks and initial requirements:
- Investigate approaches in computer vision for the diagram recognition problem
- Design the framework and the processing pipeline for the diagram recognition system
- Collect data and perform labeling tasks on the data
- Implement the recognition model, which uses both the DL approach and traditional computer vision algorithms in the pipeline
- Implement the mobile application
- Evaluate the application and the performance of the proposed system
Advisor: Dr. Nguyen Duc Dung, Faculty of Computer Science and Engineering
Head of Department: Assoc. Prof. Dr. Huynh Tuong Nguyen

THESIS EVALUATION SHEET - ADVISOR (translated from the Vietnamese form)
Students: Nguyen Quang Sang (1752465), Huynh Tan Thanh (1752048); Major: Computer Science
Thesis: Building A Diagram Recognition Application with Computer Vision Approach
Advisor: Dr. Nguyen Duc Dung
Main strengths of the thesis:
- The students proposed a solution for the diagram recognition problem that utilizes the advantages of computer vision techniques and machine learning approaches. In addition, the students successfully built a mobile application that allows users to interact with the system more easily. The application was built with useful features and an easy-to-use interface.
- The students also performed extensive evaluation and proposed some improvements to the recognition algorithm.
Main shortcomings of the thesis:
- Some algorithms used in the project are not very advanced and may not be able to handle some difficult cases of the problem.
- The evaluation results are promising but still need further improvement, especially when investigating various recognition cases.
Overall assessment: Excellent. Score: 9/10.
Signed: Dr. Nguyen Duc Dung

THESIS EVALUATION SHEET - REVIEWER (translated from the Vietnamese form), December 27, 2021
Students: Nguyen Quang Sang (1752465), Huynh Tan Thanh (1752048); Major: Computer Science
Thesis: Building A Diagram Recognition Application with Computer Vision Approach
Reviewer: Dr. Tran Tuan Anh
Main strengths of the thesis:
- The thesis presents a system that can convert handwritten flowcharts into digital documents.
- The system is quite full-featured and has a good application demo.
- The thesis covers a large amount of work, including recognizing shapes, handwriting, and arrows, and building a demo app.
- The thesis includes experiments and is quite fully cited. It also presents the algorithms and models in some detail.
Main shortcomings of the thesis:
- The application requires many techniques combined, leading to a lot of work across many technical areas. This is also one of the weaknesses of the thesis, since research on some of these topics, for example handwriting entry, has not yet been strongly developed. The team could focus on developing a few key techniques instead of all of them; the rest could reuse existing results.
- The evaluation metrics are not detailed and user-oriented; for example, is the assessment of arrows fair for all arrow types?
- The data used to train the model is not clearly presented. The application should explore usability and adaptation to the user more, instead of focusing only on general accuracy.
- The models should be analyzed in more detail, rather than just used.
Questions the students must answer before the committee:
a. Are the evaluation methods proposed in this thesis effective? For example, is the assessment of arrows fair for all arrow types? Is there any overall evaluation of the application?
b. What are the main strengths of this thesis? Also, what is the main reason users should use your app?
c. What is your next research priority?
Overall assessment: Excellent. Score: 8.7/10.
Signed: Dr. Tran Tuan Anh

Declaration

We hereby declare that this is our own research project, carried out under the guidance of Dr. Nguyen Duc Dung. The research content and results are truthful and have never been published before. The data used for analysis and comments were collected by us from many different sources and are clearly stated in the references. In addition, reviews and figures by other authors and organizations that we use are cited with their origins clearly stated in the report. If any fraud is detected, we take full responsibility for the content of our graduation thesis. Ho Chi Minh City University of Technology is not responsible for any copyright infringement caused by us during the implementation process.

Nguyen Quang Sang
Huynh Tan Thanh

Acknowledgments

We would like to express our deepest thanks to Dr. Nguyen Duc Dung for his continuous support in studying and implementing this thesis. This project would not have been possible without his thoughtful and passionate guidance. Besides our advisor, we would also like to thank all of our faculty lecturers, who gave us the valuable knowledge needed to complete this wonderful project. With the invaluable experience from this golden opportunity, we have become more confident in our research ability and technical skills. We strongly believe that there is no perfection, especially in the field of science. With that in mind, there will always be room for enhancement, and we would love to hear your opinion about any improvement.
Best regards,
Nguyen Quang Sang
Huynh Tan Thanh

Abstract

Graphical language has always been one of the most effective tools for demonstrating ideas to others. Besides text and images, a flowchart plays a vital role in giving people a clearer view of a plan or a process using simple symbols and notations. Nowadays, many meetings still follow the traditional way of drawing diagrams on a board or on paper to express thoughts on the topics discussed. A problem occurs when these drawings are saved for future reference, since a diagram captured in a picture cannot be edited. The drawings need to be re-drawn with some tool to be suitable for professional documents. In addition, the re-drawing tool can be a computer or a dedicated device such as an electronic drawing board with a digital pen, which costs a lot and is not the most convenient tool to use. Therefore, a new approach is necessary to convert pictures of hand-drawn charts into digital ones. Such an approach helps us avoid re-drawing, simplifies the sharing process between users, and allows exporting the result into other forms such as picture files (png, jpg), document files (pdf), or standard diagram editing files (drawio). The application must be able to run on popular platforms and be accessible to everyone.

Contents

1 Introduction . . . 1
  1.1 Overview . . . 1
  1.2 Project Goals . . . 2
2 Related works . . . 4
  2.1 Related Applications . . . 4
    2.1.1 Object recognition . . . 4
    2.1.2 Diagram tools . . . 4
    2.1.3 Diagram recognition applications on mobile devices . . . 5
  2.2 Diagram recognition . . . 5
  2.3 Handwriting Text recognition . . . 7
    2.3.1 Preprocessing Phase . . . 7
    2.3.2 Recognition Phase . . . 9
3 Background . . . 12
  3.1 Faster R-CNN . . . 12
    3.1.1 Backbone CNN . . . 12
    3.1.2 Regional Proposal Network . . . 13
    3.1.3 Non-Maximum Suppression . . . 14
    3.1.4 Region of Interest Pooling (RoI Pooling) . . . 16
  3.2 Mask R-CNN . . . 16
    3.2.1 Object Mask (Binary Mask) . . . 17
    3.2.2 Feature Pyramid Network . . . 18
    3.2.3 Region of Interest Align (RoI Align) . . . 18
  3.3 Handwriting Text Recognition . . . 19
    3.3.1 Long Short Term Memory (LSTM) . . . 19
    3.3.2 Gated Recurrent Unit (GRU) . . . 20
    3.3.3 Bidirectional RNN (BRNN) . . . 21
    3.3.4 Connectionist Temporal Classification (CTC) . . . 22
4 Proposed model . . . 25
  4.1 Diagram Recognition Approach . . . 25
    4.1.1 Preparing diagram dataset . . . 25
    4.1.2 Recognition model . . . 28
      4.1.2.1 Feature map generator . . . 28
      4.1.2.2 Proposal generator . . . 29
      4.1.2.3 Instance generator . . . 31
    4.1.3 Diagram building . . . 32
    4.1.4 Symbol-Arrow relationship . . . 33
    4.1.5 The relationship of text . . . 36
  4.2 Handwriting Text Recognition Approach . . . 36
  4.3 Digital diagram output format . . . 37
5 System design . . . 40
  5.1 Requirements . . . 40
    5.1.1 Functional requirement . . . 40
    5.1.2 Nonfunctional requirement . . . 41
    5.1.3 Hardware requirement . . . 41
  5.2 System Architecture . . . 42
  5.3 Framework . . . 42
    5.3.1 Flutter . . . 42
    5.3.2 Nodejs . . . 43
  5.4 Database Design . . . 44
    5.4.1 Diagram File Design . . . 44
  5.5 Feature design . . . 45
    5.5.1 Usecase Design . . . 45
    5.5.2 Login/Register Screen . . . 46
    5.5.3 Diagram List . . . 47
    5.5.4 Create diagram . . . 47
      5.5.4.1 Diagram Scanning . . . 47
      5.5.4.2 Create from blank . . . 48
    5.5.5 Diagram Editing . . . 49
    5.5.6 Exporting . . . 53
      5.5.6.1 Converting to drawio files . . . 54
    5.5.7 Member Management . . . 54
    5.5.8 Version and History . . . 54
6 Experiments . . . 55
  6.1 Initial experiments . . . 55
    6.1.1 Preprocessing . . . 55
    6.1.2 Recognition . . . 56
  6.2 Experiments on the recognition pipeline . . . 58
    6.2.1 Perform training and evaluation on HTR model . . . 58
    6.2.2 Perform training and evaluation on diagram recognition model . . . 59
    6.2.3 Perform experiments on the combination of diagram recognition model and HTR model . . . 59
  6.3 Display diagram on device . . . 63
    6.3.1 Interactive Viewer and Matrix4 . . . 63
    6.3.2 Rendering diagram recognition on device . . . 63
7 Conclusion and Future Work . . . 68
  7.1 Conclusion . . . 68
  7.2 Challenges . . . 68
  7.3 Future work . . . 69
A Usecase detail . . . 70
B User interface design . . . 83
  B.1 Login/Register Screen . . . 83
    B.1.1 Login Screen . . . 83
    B.1.2 Register Screen . . . 84
  B.2 Home Screen . . . 85
    B.2.1 Diagram List . . . 85
    B.2.2 Diagram Option . . . 86
    B.2.3 Scanning . . . 87
    B.2.4 Exporting . . . 88
    B.2.5 Member Management . . . 89
    B.2.6 Other options . . . 90
  B.3 Editing Screen . . . 91
    B.3.1 Vertex List . . . 91
    B.3.2 Zoom View . . . 92
    B.3.3 Edit Option . . . 93
  B.4 Diagram history . . . 96
C Testing . . . 97
  C.1 Login and register . . . 97
  C.2 Home screen options . . . 98
  C.3 Editing . . . 100

List of Tables

4.1 Statistics of DIDI images [26] . . . 26
6.1 Number of symbols in dataset . . . 58
6.2 Measure Arrow Average Precision . . . 58
6.3 Evaluation summary of the two models . . . 61
A.1 Usecase List . . . 70
A.2 Usecase: Login . . . 71
A.3 Usecase: Sign up . . . 72
A.4 Usecase: Create new diagram . . . 73
A.5 Usecase: Scan diagram with camera . . . 74
A.6 Usecase: Scan from image . . . 75
A.7 Usecase: Preview Diagram . . . 76
A.8 Usecase: Export file . . . 77
A.9 Usecase: Modify diagram . . . 78
A.10 Usecase: Delete Diagram . . . 79
A.11 Set permission . . . 80
A.12 View version history . . . 81
A.13 Comment . . . 81
A.14 Usecase: Logout . . . 82

List of Figures

2.1 Flor-HTR architecture [35] . . . 10
2.2 Dataset samples . . . 11
3.1 ResNet50 architecture [45] . . . 13
3.2 Anchor Box in RPN [46] . . . 14
3.3 Result of a Non-Maximum Suppression application [47] . . . 15
3.4 Region of Interest Pooling [48] . . . 16
3.5 Mask R-CNN architecture [51] . . . 17
3.6 Binary mask sample in diagram recognition . . . 17
3.7 Feature Pyramid Network [53] . . . 18
3.8 Region of Interest Align [54] . . . 19
3.9 Long Short Term Memory [56] . . . 20
3.10 Gated Recurrent Unit [57] . . . 21
3.11 Bidirectional RNN [58] . . . 21
3.12 Horizontal position of characters [60] . . . 22
3.13 Character-score matrix [60]; the black lines present the paths that yield the character "a" ("aa", "a-" and "-a"), while the dashed line presents the blank character ("--") . . . 23
4.1 DIDI sample . . . 26
4.2 An original sample of the FC dataset (left) and the preprocessed result (right) . . . 27
4.3 Our drawn image (left) and the preprocessed result (right) . . . 27
4.4 Our pipeline . . . 28
4.5 Feature Pyramid Network with ResNet [63] . . . 29
4.6 A prediction of our model . . . 32
4.7 Example of Euclidean distance not working . . . 33
4.8 Preprocessed samples . . . 37
4.9 Line segmentation sample . . . 37
4.10 Diagram recognized image . . . 38
4.11 Model JSON output . . . 39
5.1 System Architecture Design . . . 41
5.2 Database Design . . . 43
5.3 Diagram JSON file design . . . 45
5.4 Usecase Design . . . 46
5.5 Login/Register Screen . . . 47
5.6 Home Screen . . . 48
5.7 Scanning sequence . . . 49
5.8 Scanning phase . . . 50
5.9 Flowchart symbols . . . 51
5.10 Edit Screen . . . 53
6.1 Testing pictures . . . 56
6.2 Experiment results of figure 6.1a. (a) The warped image of figure 6.1a after applying perspective transformation and grayscale conversion; (b) the binary image converted from (a) . . . 57
6.3 Experiment results of figure 6.1b. (a) Figure 6.1b after resizing and grayscale conversion; (b) the binary image converted from (a) . . . 57
6.4 Inference results from model training with DIDI dataset images only . . . 58
6.5 Inference results from model training with the new dataset . . . 59
6.6 Loss and validation loss over epochs of the HTR model . . . 60
6.7 Loss over iterations of the diagram recognition model . . . 60
6.8 Inference results above 0.6 score . . . 61
6.9 (a) Normal text box and (b) padded text box . . . 61
6.10 Small boxes in sub function . . . 62
6.11 Inference with problem drawings . . . 62
6.12 Example of matrix4 . . . 64
6.13 Interactive space using identity matrix . . . 64
6.14 Scaling in X and Y . . . 65
6.15 Scaling in Z . . . 65
6.16 Moving the space . . . 66
6.17 Diagram display result . . . 66
6.18 Rendering drawn diagram pictures on a mobile device . . . 67
B.1 Login Screen . . . 83
B.2 Register Screen . . . 84
B.3 Diagram List . . . 85
B.4 Turn off save a copy . . . 86
B.5 Scanning . . . 87
B.6 Export Sheet . . . 88
B.7 Management Sheet . . . 89
B.9 Add vertex . . . 91
B.11 Zoom options . . . 92

Chapter 1
Introduction

1.1 Overview

Diagrams have quickly risen to become one of the most efficient communication methods, replacing text for demonstrating certain types of information such as algorithms, business process models, and production structures. Ideas visualized by diagrams are clearer than any words can make them, which helps viewers easily comprehend the key ideas, how things work, and so on. Additionally, people tend to process information visually and are able to remember graphical information more readily than anything they read. The powerful effect of diagrams can be seen in a common event we often attend: presentations. It is a nightmare for the audience if a presentation uses only words and numbers to describe the knowledge. The inability to absorb raw knowledge in a limited time means most of it is lost, and the failure of the presentation is inevitable. On the other hand, with informative diagrams or pictures, the presentation becomes more engaging and comprehensible, which helps audiences understand the illustrated ideas faster than text alone.
Nowadays, due to the benefits of diagrams, various services have been created to serve the purpose of creating diagrams of diverse types on a wide range of supported platforms such as web, desktop, and mobile. Among the most popular are the Lucidchart website, the draw.io website, DrawExpress Diagram Lite for Android, etc. However, an idea or plan is rarely created in these applications at the beginning. It is usually sketched on paper or on a whiteboard in meetings. These initial conceptions are crucial for building bigger, more complicated designs on a greater scale. So, in order to make them global, or simply to share them with everyone in a group, they need to be digitized. Organizations and companies usually have people redraw these ideas on the computer and export them to editable files, yet this can raise a lot of potential problems. First of all, it is a waste of time and resources. The design on paper needs to be replicated exactly to preserve the original proposal. In addition, there are still limited options when it comes to saving these sketches and transforming them into digital form for storing and referencing in the future. Not to mention that sometimes, due to technical problems, these jobs cannot be done by the creator, the one who understands the design best, which can lead to information loss or misconception. Realizing the need for a product that preserves these raw drawings, we decided to build an application that is able to convert hand-drawn ideas into digital form. This app should also offer users other diagram-based services, such as sharing diagrams across people in a group, and editing and modifying the digital form of the sketched diagram. For this product, in this proposal, we need to carry out some surveys in the area of object detection and of the existing diagram-related applications, for the services our project can offer to users.
In the field of object detection, many approaches have been developed for various types of objects. These approaches fall into two categories: two-stage and one-stage. Two-stage detectors, such as Faster R-CNN and Mask R-CNN, use a Region Proposal Network (RPN) to generate regions of interest (RoIs) in the first stage, then use these regions for object classification and bounding-box regression. These approaches reach the highest accuracy, but they are slow. On the other hand, one-stage approaches such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector) have no separate proposal-generation stage; they classify objects and estimate their bounding boxes in one go. These approaches are faster than the two-stage ones, but their accuracy is lower. Diagram recognition involves many small symbols, such as arrows, which calls for a high-accuracy approach rather than a fast one.

Among the two-stage approaches, we found some attempts at diagram detection, especially for flowcharts. These studies classify flowchart recognition into two main groups: online and offline recognition. In online recognition, the diagram is drawn as a sequence of strokes using an ink input device such as a digital pen. The approaches for this group often take the form of RNNs, and many studies have focused on this field [1, 2, 3, 4, 5, 6, 7, 8, 9]. In offline recognition, the detection target is a handwritten diagram in an image captured by a camera. There were fewer attempts [10, 11] for this group until 2019 and 2020, when a paper introduced a new approach called Arrow R-CNN [12].
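Both detector families discussed above prune overlapping candidate boxes with Non-Maximum Suppression (covered in Section 3.1.3). As a quick illustration only, not the implementation used in this thesis, here is a minimal NumPy sketch of the standard greedy NMS:

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy Non-Maximum Suppression.

    boxes: (N, 4) array of [x1, y1, x2, y2] corners; scores: (N,) confidences.
    Returns the indices of the boxes that survive suppression.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest-scoring box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the kept box with every remaining candidate
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop candidates that overlap the kept box too much
        order = order[1:][iou <= iou_thresh]
    return keep

# Two heavily overlapping boxes and one separate box: the lower-scoring
# duplicate of the first box is suppressed.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]
```

In practice, detection frameworks apply NMS twice in a two-stage pipeline: once inside the RPN to thin out proposals, and once more on the final class-specific detections.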
Arrow R-CNN is designed for offline diagram recognition; its targets include the detection of text, non-arrow symbols such as process and decision, and the arrow symbols that express the relations between the non-arrow ones. The paper builds its system on Faster R-CNN as the base. By using a Feature Pyramid Network (FPN), it also addresses a limitation of Faster R-CNN on datasets whose objects have large scale variance. For the arrows, the paper uses keypoint prediction, a technique more often used for human pose estimation in other fields of object detection, to deal with the diagram arrows. However, the approach only detects the regions of text in a diagram, so we also need to find a handwritten character recognition approach to digitize the text. These approaches are described more fully in Chapter 2, and we dive deeply into the details of their layers for better understanding in Chapter 3.

1.2 Project Goals

This project's main target is to build a diagram recognition system consisting of two parts: a computer vision model that can convert a hand-drawn flowchart from a source image into a digital form, and an application service that utilizes this model to serve real-life projects. The service should include these features and restrictions:

• Develop a system that includes a server to process and run the recognition models, and an Android application built with the Flutter framework for users to scan and modify diagrams.

• Users need an internet connection and a system account in order to access any function of the service. The app is best used in portrait mode.

• The system only supports creating and editing flowcharts within this thesis's scope. The mobile application can be used to take a picture of a diagram, scan it, and create a digital version. Users can modify the diagram with the provided tools.
• The system only supports converting and editing small charts (an A4 page with medium-size text is recommended, drawn with a ballpoint pen on fresh white paper), and the flowchart may contain at most 20 symbols, not counting arrows. The flowchart must not have any intersection of arrows. It also must not have two arrows sharing the same edge. Each arrow in the flowchart must have an arrowhead. In addition, there must not be isolated text or symbols, and struck-through text is restricted. The diagram should only contain supported elements. The list of supported vertices and arrows is discussed in detail in chapter 5.

• The app does not support importing files from other platforms. Users can export the digital diagram into other types such as pictures, documents, or other diagram platforms (the system only supports exporting chart files to images (png, jpg), text (pdf), and .drawio files).

• Members with different roles can be added to a project to work on the same diagram; each role has specific permissions and is only allowed to perform certain actions.

• Each time a new change is saved, it is uploaded as a new version. Users can then review previous versions and discuss within the app.

• The camera of the device should be in good condition, and the surface on which the diagram is drawn should be clean, flat, and distinguishable from the background.

• All of the permissions required by the app should be granted to have the best experience.

• The device used to run this application should have adequate performance and suitable hardware (discussed in detail in chapter 5).

The report is organized as follows. Chapter 2 briefly surveys applications that can detect objects, and related work in diagram detection in general and flowchart detection in particular. Chapter 3 provides sufficient background knowledge to implement the project and understand the related work.
Chapter 5 presents our proposed system, including how the application works, and shows the implementation of the application and the server. Chapter 6 lists our experiments and the implementation of the system. Finally, Chapter 7 discusses our challenges and potential future work.

Chapter 2

Related works

2.1 Related Applications

2.1.1 Object recognition

There have been numerous applications that focus on detecting and recognizing various types of objects. For example, Google Lens, an AI-powered service introduced by Google [13], is designed not only to detect what kind of object appears in a picture but also to search for relevant information related to the object, such as its price, age, or where to get it. It is a combination of many machine learning techniques developed and improved across many fields, which gives Google Lens access to a wide range of knowledge and incredible speed in image processing. Using Google Lens, users can identify many types of objects such as plants or animals; they are also able to copy text from the real world and paste it on their smartphone. Google Lens also provides users with information about the object it identifies. This means that if a user gives the application an image of a flower as input, it will show the name of that flower, other flower images related to the target flower, and other details. The computer vision and machine learning techniques behind its success include a Region Proposal Network (RPN) to detect object bounding boxes and a Convolutional Neural Network (CNN) combined with a Long Short-Term Memory (LSTM) network to recognize text in images. Another example is Microsoft Math Solver [14], released by Microsoft in December 2019. This application uses optical character recognition (OCR) approaches to read an image of a handwritten math problem from students, solving the problem of typing complex formulas or expressions.
The application's main services focus on using OCR and natural language processing techniques to categorize letters, math symbols, and characters. There are also other applications that can detect certain types of objects, such as Aipoly Vision [15], which recognizes objects and colors to help blind, visually impaired, and color-blind users, and Vivino [16], which provides information about wine from an image source.

2.1.2 Diagram tools

Two of the most popular tools used to create diagrams and diagram-related products are Lucidchart [17] and draw.io [18]. These applications provide a multi-platform service to create many forms of digital charts for users to express their ideas as freely as possible. Users can also share their work with ease across many accounts, platforms, or even devices. Ultimately, real-time collaboration connects these users and breaks the limits of distance and time. Users are not limited to one or two kinds of content. These tools offer a wide range of diagrams users can draw, such as database charts for purposes like conceptual design and relational database design, or diagrams demonstrating software architecture like use-case diagrams, sequence diagrams, and flowcharts. They offer many kinds of symbols and arrows, and many fonts and text styles, for users to describe their ideas to others as visually as possible in a clean graphical form.

2.1.3 Diagram recognition applications on mobile devices

As for specific devices like mobile phones, smartwatches, or smart bands, their popularity is so enormous that a company that can bring its services onto these little inseparable technologies has a huge advantage in approaching customers; using the service can soon become part of a user's daily routine, which brings great benefit to the company. This is also true for diagram services on mobile such as Lekh Diagram [19] and DrawExpress [20].
Imagine having a whiteboard in your pocket that is ready every time a new idea pops up. However, the worst deal-breakers for these services are accuracy and the limited ability to interact with the device. Most phone users use their fingers to touch, slide, and scroll on the screen to communicate with their devices. These actions may work well on a bigger screen, or with a pen that can interact freely without being bounded by the screen area or the precision of the pen tip. However, on a much smaller screen and with a much bigger fingertip, even a simple task like drawing or selecting a slightly small object can become impossible and lead to a poor user experience. Not to mention, many people are used to writing on paper with a pen instead of on a phone. All of these drawbacks explain why there are not many applications designed for professional design work on mobile platforms. So why do the services mentioned earlier still work well? Most of them let users simplify the amount of work needed to visualize their ideas. The most straightforward actions on mobile devices (touch, slide, rotate, drag and drop) are used to enhance the experience and free users from being cramped by traditional computer interaction. Drag and drop has become the most popular action in design applications, replacing selecting a location and adjusting. Furthermore, Lekh Diagram and DrawExpress also allow users to sketch a shape and beautify it: artificial intelligence recognizes the form and tells the system to change it into the correct one in real time. The user can draw an imperfect triangle connected to a messy square and still get the precise result they want, thanks to the real-time recognition feature. However, most existing systems and services only focus on real-time recognition. That means users still need to input the diagram themselves even if they have already drawn it somewhere else.
In addition, some people may find that working on actual paper or a whiteboard makes their work much more convenient. It would be a bad experience to do the same job again just to digitalize their work. We realized that many people prefer to work on paper, using a pen and traditional tools to create their work, yet still need to modernize the workflow by digitalizing the result, putting it online, and sharing it with others. This system is designed to capture a hand-drawn diagram with the device's camera and bring it into the device, where the user can carry it along, edit it, share it, and discuss it with the team.

2.2 Diagram recognition

As mentioned in Chapter 1, there are two types of diagram recognition: Online Diagram Recognition and Offline Diagram Recognition. In the past, more attention went to Online Diagram Recognition, handling diagrams handwritten with digital ink devices. Firstly, Valois et al. (ICDAR 2001) [3] proposed a solution for online recognition of sketched electrical diagrams. The proposed system tried to decompose the ink strokes into primitive components (lines or arcs). Then, the system checks whether it can merge these primitives and their neighbors into a higher-level component. Each set of relations predefined for the primitives is matched with a confidence factor computed using probabilistic normalization functions. Its downsides are the system's simplicity and low accuracy, which make it unsuitable for real-life situations. Feng et al. (j.patcog 2009) [4] proposed a more modern technique for recognizing electrical circuits. Symbol hypotheses are generated and classified using a Hidden Markov Model (HMM) and traced with 2D dynamic programming (2D-DP). However, when dealing with a large diagram or a huge number of hypotheses, it becomes slow. Thus, it is also not an approach we can use in our project.
ChemInk, by Ouyang and Davis (IUI 2011) [21], is a system for detecting chemical formula sketches, categorizing strokes into elements and the bonds between them. The final joint inference is performed using conditional random fields (CRFs), which combine features from a three-layer hierarchy: ink points, segments, and candidate symbols. Qi et al. (CVPR 2005) [22] apply a similar approach to recognize diagram structure with a Bayesian CRF with ARD. These methods outperform traditional techniques, but in the final step of recognition they join features using pairwise potentials, which makes them harder to adapt in the future. In addition, these approaches only focused on the symbols; they did not address text in the diagram, while in real-life situations many words and letters appear in diagrams. After Awal et al. (SPIE 2011) released the Online Handwritten Flowchart Dataset (OHFCD) [23], many researchers concentrated on a new target: flowchart recognition with this dataset. The subsequent approaches improved on earlier work by also handling text and proposing methods to classify text and non-text symbols. Lemaitre et al. (2013) [5] proposed DMOS (Description and MOdification of the Segmentation) for online flowchart recognition. The work of Wang et al. (ICFHR 2016) [6] used a max-margin Markov Random Field to perform segmentation and recognition. In the paper of Wang et al. (IJDAR 2017) [7], they extended their work by adding a grammatical description that combines the labeled isolated strokes while ensuring global consistency of the recognition. Bresler et al. (ICDAR 2013) [8] proposed a pipeline model in which they separate strokes and text using a text/non-text classifier. Then, they detect symbol candidates with a max-sum model over groups of temporally and spatially close strokes. The authors also proposed an offline extension that uses a preprocessing model to reconstruct the strokes from a flowchart image [9].
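To illustrate the idea of grouping spatially close strokes into symbol candidates, the toy sketch below clusters stroke bounding boxes with union-find, merging any two strokes whose boxes lie within a gap threshold. This is only an illustration of the "spatially close" grouping notion under our own assumptions (the threshold and function names are hypothetical), not Bresler's actual max-sum candidate model.

```python
def group_strokes(strokes, max_gap=20.0):
    """Cluster strokes whose bounding boxes lie within max_gap pixels.

    strokes: list of (x1, y1, x2, y2) stroke bounding boxes.
    Returns a list of groups, each a list of stroke indices.
    """
    parent = list(range(len(strokes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def gap(a, b):
        # Euclidean distance between two axis-aligned boxes (0 if they overlap).
        dx = max(a[0] - b[2], b[0] - a[2], 0)
        dy = max(a[1] - b[3], b[1] - a[3], 0)
        return (dx ** 2 + dy ** 2) ** 0.5

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if gap(strokes[i], strokes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(strokes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

Each resulting group would then be scored as a symbol candidate; a real system would also use temporal ordering of the strokes, which this sketch omits.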
While online flowchart recognition detects candidates based on strokes, offline flowchart recognition recognizes its targets from the image source. Bresler also made some attempts at offline flowchart recognition; he provided a preprocessing stage to reconstruct online strokes from offline data [10]. However, that preprocessing step is time-consuming, because we can recognize the whole diagram structure independently of strokes. As online recognition seems to attract more researchers, there have not been many studies on offline detection. Julca-Aguilar and Hirata proposed a method using Faster R-CNN to detect candidates and evaluated its accuracy on OHFCD in [24]. For this approach, they needed to convert the online data to offline data, so we can also consider it an offline approach. The model can detect components in the diagram, including arrows, but it cannot detect the arrowhead. In late 2019 and early 2020, there was a new attempt at offline recognition for flowcharts. The paper introduces a new model called Arrow R-CNN [12], which improves on Faster R-CNN. Faster R-CNN has a limitation when it works with datasets where objects have large scale variance. To handle this problem, the authors added a Feature Pyramid Network to the backbone of the model. With this approach, the backbone generates a pyramid of feature maps at different scales. The image feature pyramid is a multi-scale feature representation in which all levels are semantically strong.
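A key mechanism that lets the FPN backbone handle large scale variance is routing each region of interest to a pyramid level that matches its size. The FPN paper describes this with the heuristic k = ⌊k0 + log2(√(wh)/224)⌋, where 224 is the canonical ImageNet input size and k0 = 4. The sketch below implements that assignment rule; the clamping range [2, 5] follows the commonly used P2–P5 levels and is an assumption here.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Assign an RoI of size w x h to a feature pyramid level.

    Follows the FPN heuristic k = floor(k0 + log2(sqrt(w*h) / 224)):
    a 224x224 RoI maps to level k0; smaller RoIs map to finer levels.
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))
```

So a small hand-drawn symbol is pooled from a high-resolution level while a large one is pooled from a coarse level, which is exactly how the pyramid lets one model cover objects of very different scales.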