MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

THI HUONG GIANG DOAN

DYNAMIC HAND GESTURE RECOGNITION USING RGB-D IMAGES FOR HUMAN-MACHINE INTERACTION

Specialty: Control Engineering and Automation
Specialty Code: 62520216

DOCTORAL THESIS OF CONTROL ENGINEERING AND AUTOMATION

SUPERVISORS:
1. Dr. Hai Vu
2. Dr. Thanh Hai Tran

Hanoi, 12-2017

DECLARATION OF AUTHORSHIP

I, Thi Huong Giang Doan, declare that the thesis titled, "Dynamic Hand Gesture Recognition Using RGB-D Images for Human-Machine Interaction", and the works presented in it are my own. I confirm that:

- This work was done wholly or mainly while in candidature for a Ph.D. research degree at Hanoi University of Science and Technology.
- Where any part of this thesis has previously been submitted for a degree or any other qualification at Hanoi University of Science and Technology or any other institution, this has been clearly stated.
- Where I have consulted the published work of others, this is always clearly attributed.
- Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
- I have acknowledged all main sources of help.
- Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Hanoi, December 2017
Ph.D. STUDENT: Thi Huong Giang Doan
SUPERVISORS: Dr. Hai Vu, Dr. Thi Thanh Hai Tran

ACKNOWLEDGEMENT

This thesis was written during my doctoral study at the International Research Institute MICA (Multimedia, Information, Communication and Applications), Hanoi University of Science and Technology (HUST). It is my great pleasure to thank all the people who supported me in completing this work.

First, I would like to express my sincere gratitude to my advisors, Dr. Hai Vu and Dr. Thi Thanh Hai Tran, for their continuous support of my Ph.D. study and related research, and for their patience, motivation, and immense knowledge. Their guidance helped me throughout the research and the writing of this thesis. I could not have imagined having better advisors and mentors for my Ph.D. study.

Besides my advisors, I would like to thank the scientists and authors of the published works cited in this thesis, whose works provided valuable information resources for my research. Attendance at scientific conferences has always been a great experience, through which I received many useful comments.

In the process of carrying out and completing my research, I received much support from the board of MICA directors. My sincere thanks go to Prof. Yen Ngoc Pham, Prof. Eric Castelli, and Dr. Son Viet Nguyen, who provided me with the opportunity to join research work at the MICA institute and who gave me access to the laboratory and research facilities. Without their precious support it would have been impossible to conduct this research.

As a Ph.D. student of the 911 programme, I would like to thank the programme for its financial support during my Ph.D. course. I also gratefully acknowledge the financial support for publishing papers and conference fees from research projects T2014-100, T2016-PC-189, and T2016-LN-27.
I would like to thank my colleagues at the Computer Vision Department and Multi-Lab of the MICA institute over the years, both at work and outside of work. Special thanks to my family. Words cannot express how grateful I am to my mother and father for all of the sacrifices that they have made on my behalf. I would also like to thank my beloved husband. Thank you for supporting me in everything.

Hanoi, December 2017
Ph.D. Student: Thi Huong Giang Doan

CONTENTS

DECLARATION OF AUTHORSHIP
ACKNOWLEDGEMENT
CONTENTS
SYMBOLS
LIST OF TABLES
LIST OF FIGURES

1 LITERATURE REVIEW
1.1 Completed hand gesture recognition systems for controlling home appliances
1.1.1 GUI device dependent systems
1.1.2 GUI device independent systems
1.2 Hand detection and segmentation
1.2.1 Color
1.2.2 Shape
1.2.3 Motion
1.2.4 Depth
1.2.5 Discussions
1.3 Hand gesture spotting system
1.3.1 Model-based approaches
1.3.2 Feature-based approaches
1.3.3 Discussions
1.4 Dynamic hand gesture recognition
1.4.1 HMM-based approach
1.4.2 DTW-based approach
1.4.3 SVM-based approach
1.4.4 Deep learning-based approach
1.4.5 Conclusion
1.5 Discussion and Conclusion

2 A NEW DYNAMIC HAND GESTURE SET OF CYCLIC MOVEMENT
2.1 Defining dynamic hand gestures
2.2 The existing dynamic hand gesture datasets
2.2.1 The published dynamic hand gesture datasets
2.2.1.1 The RGB hand gesture datasets
2.2.1.2 The Depth hand gesture datasets
2.2.1.3 The RGB and Depth hand gesture datasets
2.2.2 The non-published hand gesture datasets
2.2.3 Conclusion
2.3 Definition of the closed-form pattern of gestures and phasing issues
2.3.1 A set of conducting commands for dynamic hand gestures
2.3.2 Definition of the closed-form pattern of gestures and phasing issues
2.3.3 Characteristics of the dynamic hand gesture set
2.4 Data collection
2.4.1 MICA1 dataset
2.4.2 MICA2 dataset
2.4.3 MICA3 dataset
2.4.4 MICA4 dataset
2.5 Discussion and Conclusion
3 HAND DETECTION AND GESTURE SPOTTING WITH USER-GUIDE SCHEME
3.1 Introduction
3.2 Heuristic user-guide scheme
3.2.1 Assumptions
3.2.2 Proposed framework
3.2.3 Estimating heuristic parameters
3.2.3.1 Estimating parameters of the background model for body detection
3.2.3.2 Estimating the distance from hand to the Kinect sensor for extracting hand candidates
3.2.3.3 Estimating skin color parameters for pruning hand regions
3.2.4 Hand detection phase using heuristic parameters
3.2.4.1 Hand detection
3.2.4.2 Hand posture recognition
3.3 Dynamic hand gesture spotting
3.3.1 Catching buffer
3.3.2 Spotting dynamic hand gestures
3.4 Experimental results
3.4.1 The required learning time for end-users
3.4.2 The computational time for hand segmentation and recognition
3.4.3 Performance of the hand region segmentations
3.4.3.1 Evaluation of the hand segmentation
3.4.3.2 Comparison of the hand posture recognition results
3.4.4 Performance of the gesture spotting algorithm
3.5 Discussion and Conclusion
3.5.1 Discussions
3.5.2 Conclusions

4 DYNAMIC HAND GESTURE REPRESENTATION AND RECOGNITION USING SPATIAL-TEMPORAL FEATURES
4.1 Introduction
4.2 Proposed framework
4.2.1 Hand representation from spatial and temporal features
4.2.1.1 Temporal features extraction
4.2.1.2 Spatial features extraction using linear reduction space
4.2.1.3 Spatial features extraction using non-linear reduction space
4.2.2 DTW-based phase synchronization and KNN-based classification
4.2.2.1 Dynamic Time Warping for phase synchronization
4.2.2.2 Dynamic hand gesture recognition using the K-NN method
4.2.3 Interpolation-based synchronization and SVM classification
4.2.3.1 Dynamic hand gesture representation
4.2.3.2 Quasi-periodic dynamic hand gesture pattern
4.2.3.3 Phase synchronization using hand posture interpolation
4.2.3.4 Dynamic hand gesture recognition using different classifiers
4.3 Experimental results
4.3.1 Influence of temporal resolution on recognition accuracy
4.3.2 Tuning kernel scale parameters of the RBF-SVM classifier
4.3.3 Performance evaluation of the proposed method
4.3.4 Impacts of the phase normalization
4.3.5 Further evaluations on public datasets
4.4 Discussion and Conclusion
4.4.1 Discussion
4.4.2 Conclusion

5 CONTROLLING HOME APPLIANCES USING DYNAMIC HAND GESTURES
5.1 Introduction
5.2 Deployment of control systems using hand gestures
5.2.1 Assignment of hand gestures to commands
5.2.2 Different modes of operations carried out by hand gestures
5.2.2.1 Different states of the lamp and their transitions
5.2.2.2 Different states of the fan and their transitions
5.2.3 Implementation of the control system
5.2.3.1 Main components of the control system using hand gestures
5.2.3.2 Integration of hand gesture recognition modules
5.3 Experiments of control systems using hand gestures
5.3.1 Environment and material setup
5.3.2 Pre-built script
5.3.3 Experimental results
5.3.3.1 Evaluation of hand gesture recognition
5.3.3.2 Evaluation of time costs
5.3.4 Evaluation of usability
5.4 Discussion and Conclusion
5.4.1 Discussions
5.4.2 Conclusion
Bibliography

ABBREVIATIONS

No.  Abbreviation  Meaning
1    ANN           Artificial Neural Network
2    ASL           American Sign Language
3    BB            Bounding Box
4    BGS           Background Subtraction
5    BW            Baum-Welch
6    BOW           Bag of Words
7    C3D           Convolutional 3D
8    CD            Compact Disc
9    CIF           Common Intermediate Format
10   CNN           Convolutional Neural Network
11   CPU           Central Processing Unit
12   CRFs          Conditional Random Fields
13   CSI           Channel State Information
14   DBN           Deep Belief Network
15   DDNN          Deep Dynamic Neural Networks
16   DoF           Degree of Freedom
17   DT            Decision Tree
18   DTM           Dense Trajectories Motion
19   DTW           Dynamic Time Warping
20   FAR           False Acceptance Rate
21   FD            Fourier Descriptor
22   FP            False Positive
23   FN            False Negative
24   FSM           Finite State Machine
25   fps           frames per second
26   GA            Genetic Algorithm
27   GMM           Gaussian Mixture Model
28   GT            Ground Truth
29   GUI           Graphical User Interface
30   HCI           Human-Computer Interaction
31   HCRFs         Hidden Conditional Random Fields
32   HNN           Hopfield Neural Network
33   HMM           Hidden Markov Model
34   HOG           Histogram of Oriented Gradients
35   HSV           Hue Saturation Value
36   ID            IDentification
37   IP            Internet Protocol
38   IR            InfraRed
39   ISOMAP        ISOmetric MAPping
40   JI            Jaccard Index
41   KLT           Kanade-Lucas-Tomasi
42   KNN           K Nearest Neighbors
43   LAN           Local Area Network
44   LE            Laplacian Eigenmaps
45   LLE           Locally Linear Embedding
46   LRB           Left-Right Banded
47   MOG           Mixture of Gaussians
48   MFC           Microsoft Foundation Classes
49   MSC           Mean Shift Clustering
50   MR            Magic Ring
51   NB            Naive Bayesian
52   PC            Personal Computer
53   PCA           Principal Component Analysis
54   PDF           Probability Distribution Function
55   PNG           Portable Network Graphics
56   QCIF          Quarter Common Intermediate Format
57   RAM           Random Access Memory
58   RANSAC        RANdom SAmple Consensus
59   RBF           Radial Basis Function
60   RF            Random Forest
61   RGB           Red Green Blue
62   RGB-D         Red Green Blue Depth
63   RMSE          Root Mean Square Error
64   ROI           Region of Interest
65   RNN           Recurrent Neural Network
66   SIFT          Scale-Invariant Feature Transform
67   SVM           Support Vector Machine
68   STE           Short Time Energy
69   STF           Spatial-Temporal Feature
70   ToF           Time of Flight
71   TN            True Negative
72   TP            True Positive
73   TV            TeleVision
74   XML           eXtensible Markup Language

LIST OF TABLES

Table 1.1 Soft remote control system and commands assignment
Table 1.2 Omron TV command assignment
Table 1.3 Hand gestures utilized for different devices using the WiSee technique
Table 1.4 Hand gestures utilized for different devices using the MR technique
Table 1.5 The existing in-air gesture-based systems
Table 1.6 The existing vision-based dynamic hand gesture methods
Table 2.1 The existing hand gesture datasets
Table 2.2 The main commands of some smart home electrical appliances
Table 2.3 Notations used in this research
Table 2.4 Characteristics of the defined databases
Table 3.1 The required time to learn parameters of the background model
Table 3.2 The required time to learn parameters of the hand-skin color model
Table 3.3 The required time to learn the hand-to-Kinect distance
Table 3.4 The required time for hand segmentation
Table 3.5 The required time for hand posture recognition
Table 3.6 Results of the JI indexes without/with the learning scheme
Table 4.1 Recall rate (%) of the proposed method on our own datasets with different classifiers
Table 4.2 Performance of the proposed method on three different datasets
Table 5.1 Assignment of hand gestures to commands for controlling lamp and fan
Table 5.2 Confusion matrix of dynamic hand gesture recognition
Table 5.3 Accuracy rate (%) of dynamic hand gesture commands
Table 5.4 Assessment of end-users on the defined dataset

LIST OF FIGURES

Figure 1 Home appliances in a smart home
Figure 2 Controlling home appliances using dynamic hand gestures in a smart house
Figure 3 The proposed framework of dynamic hand gesture recognition for controlling home appliances
Figure 1.1 Mitsubishi hand gesture-based TV [46]
Figure 1.2 Samsung Smart TV using hand gestures
Figure 1.3 Dynamic hand gestures used for the Samsung Smart TV
Figure 1.4 Hand gesture commands in the Soft Remote Control System [39]
Figure 1.5 General framework of the Soft Remote Control System [39]
Figure 1.6 Hand gesture-based home appliances system [143]
Figure 1.7 TV controlling with GUI of dynamic gesture recognition [151]
Figure 1.8 Commands of GUI of dynamic gesture recognition [103]
Figure 1.9 Features of the Omron dataset [3]
Figure 1.10 Wi-Fi signals to control home appliances using hand gestures [119]
Figure 1.11 Seven hand gestures for wireless-based interaction [9] (WiSee dataset)
Figure 1.12 Simulation of using MR to control some home appliances [62]
Figure 1.13 AirTouch-based control using depth cues [33]
Figure 1.14 Depth threshold cues and face skin [97]
Figure 1.15 Depth threshold and skeleton [60]
Figure 1.16 The process of detecting hand regions [69]
Figure 1.17 Spotting dynamic hand gestures using an HMM model [71]
Figure 1.18 Threshold using HMM model for different gestures [71]
Figure 1.19 CRFs-based spotting method using thresholds [142]
Figure 1.20 Designed gesture in the proposed method [13]
Figure 1.21 Two gesture boundaries are spotted [65]
Figure 1.22 Gesture recognition using HMM [42]
Figure 1.23 Gesture features extraction [8]
Figure 2.1 Periodic image sequences appear in many common actions
Figure 2.2 Four hand gestures of [83]
Figure 2.3 Cambridge hand gesture dataset of [67]
Figure 2.4 Five hand gestures of [82]
Figure 2.5 Twelve dynamic hand gestures of the MSRGesture3D dataset [1]
Figure 2.6 Dynamic hand gestures of [88]
Figure 2.7 Gestures of the NATOPS dataset [140]
Figure 2.8 Dynamic hand gestures of the SKIG dataset [76]
Figure 2.9 Gestures in the ChaLearn dataset
Figure 2.10 Dynamic hand gestures of [93]
Figure 2.11 Dynamic hand gestures of the NVIDIA dataset [87]
Figure 2.12 Dynamic hand gestures of the PowerGesture dataset [71]
Figure 2.13 Hand shape variations and hand trajectories (lower panel) of the proposed gesture set (5 gestures)
Figure 2.14 In each row, changes of the hand shape while a gesture is performed. From left to right, hand shapes of the completed gesture change in a cyclical pattern (closed-opened-closed)
Figure 2.15 Comparing the similarity between the closed-form gestures and a simple sinusoidal signal
Figure 2.16 Closed cyclical hand gesture pattern and cyclic signal
Figure 2.17 The environment setup for the MICA1 dataset
Figure 2.18 The environment setup for the MICA2 dataset
Figure 2.19 The environment setup for the MICA3 dataset
Figure 2.20 The environment setup for the MICA4 dataset
Figure 3.1 Diagram of the proposed hand gesture spotting system
Figure 3.2 Diagram of the proposed hand detection and segmentation system
Figure 3.3 The Venn diagram representing the relationship between the pixel sets I, D, B_d, H_d, S and H*
Figure 3.4 Results of hand region detection
Figure 3.5 Result of learning the distance parameter. (a-c) Three consecutive frames; (d) results of subtracting the first two frames; (e) results of subtracting the next two frames; (f) binary thresholding operator; (g) a range of hand (left) and of body (right) on the depth histogram
Figure 3.6 Training the skin color model
Figure 3.7 Result of training the skin color model
Figure 3.8 Results of the hand segmentation. (a) A hand candidate; (b) Mahalanobis distance; (c) refining the segmentation results using RGB features
Figure 3.9 Catching buffer to store continuous hand frames
Figure 3.10 The area cues of the hand regions
Figure 3.11 The velocity cues of the hand regions
Figure 3.12 The combination of area and velocity signals of the hands
Figure 3.13 Finding local peaks in the original area signal of the hands
Figure 3.14 Log activities of an evaluator who follows the stages of the user-guide scheme and performs seven hand postures for preparing the posture dataset
Figure 3.15 Seven types of postures recognized in the proposed system. (a) First row: original images with results of hand detection (in red boxes). (b) Second row: zoomed-in version of the hand regions without segmentation. (c) Third row: the corresponding segmented hands
Figure 3.16 Results of the kernel-based descriptors for hand posture recognition without/with segmentation
Figure 3.17 Performance of dynamic gesture spotting on the two datasets MICA1 and MICA2
Figure 3.18 An illustration of gesture spotting errors
Figure 4.1 The comparison framework of hand gesture recognition
Figure 4.2 Optical flow and trajectory of the go-right hand gesture
Figure 4.3 An illustration of the go-left hand gesture before and after projection into the constructed PCA space
Figure 4.4 3D manifold of hand postures belonging to five gesture classes
Figure 4.5 An illustration of the DTW results of two hand gestures (T, P). (a)-(b) Alignments between postures in T and P in the image space and the spatial-temporal space. (c)-(d) The refined alignments after removing repetitive ones
Figure 4.6 Distribution of dynamic hand gestures in the low-dimensional space
Figure 4.7 Five dynamic hand gestures in three dimensions
Figure 4.8 Defining a quasi-periodic image sequence
Figure 4.9 Illustrations of the phase variations
Figure 4.10 Defining a quasi-periodic image sequence in the phase domain
Figure 4.11 Manifold representation of the cyclical Next hand gesture
Figure 4.12 Phase synchronization
Figure 4.13 Whole-length sequence synchronized with the most different phase
Figure 4.14 Whole-length sequence synchronized with the most similar phase
Figure 4.15 (a, c) Original hand gestures; (b, d) corresponding interpolated hand gestures
Figure 4.16 ROC curves of hand gesture recognition results with the SVM classifier
Figure 4.17 Dynamic hand gesture recognition results with different SVM kernel scales
Figure 4.18 Comparison of combined characteristics (KLT and ISOMAP) of dynamic hand gestures
Figure 4.19 Performance comparisons with different techniques
Figure 4.20 Comparison results between the proposed method and others at thirteen positions
Figure 4.21 Dynamic hand gestures in the sub-NVIDIA dataset
Figure 4.22 Confusion matrices on the MSRGesture3D and sub-NVIDIA datasets
Figure 5.1 Illustration of light controlling using dynamic hand gestures with different levels of lamp intensity
Figure 5.2 Illustration of ten modes of a fan controlled by dynamic hand gestures
Figure 5.3 The state diagram of the proposed lighting control system
Figure 5.4 The state diagram of the proposed fan control system
Figure 5.5 A schematic representation of basic components in the hand gesture-based control system
Figure 5.6 Integration of hand gesture recognition modules
Figure 5.7 The proposed framework for the training phase
Figure 5.8 The proposed flow chart for online dynamic hand gesture recognition
Figure 5.9 The proposed flow chart for controlling the lamp
Figure 5.10 The proposed flow chart for controlling the fan
Figure 5.11 Setup for evaluating the control systems
Figure 5.12 Illustration of the environment and material setup
Figure 5.13 The time-line of the proposed evaluation system
Figure 5.14 The time cost of the proposed dynamic hand gesture recognition system
Figure 5.15 Usability evaluation of the proposed system

INTRODUCTION

Motivation

Home-automation products have been widely used in smart homes (or smart spaces) thanks to recent advances in intelligent computing, smart devices, and new communication protocols. In terms of automation ability, most advanced technologies focus on either saving energy or facilitating control via a user interface (e.g., remote controllers [92], mobile phones [7], tablets [52], voice recognition [11]). To maximize usability, a human-computer interaction method must allow end-users to use it easily and to perform the conventional operations naturally. Motivated by such advantages, this thesis pursues a unified solution to deploy a complete hand gesture-based control system for home appliances. A natural and friendly way of interaction is deployed to replace the conventional remote controller.

A complete gesture-based controlling application requires both robustness and low computational time. However, these requirements face many technical challenges, such as a huge computational cost and the complexity of hand movements. Previous solutions focus on only one of the problems in this field. To address these issues, two trends in the literature are investigated. One common trend is based on aiding devices, while the other focuses on improving the relevant algorithms/paradigms. The first group addresses the critical issues by using supportive devices such as a data glove [85, 75], hand markers [111], or contact sensors mounted on the hand or palm of end-users when they control home appliances. Obviously, these solutions are expensive or inconvenient for the end-users. For the second group, hand gesture recognition has been widely attempted by researchers in the communities of computer vision, robotics, and automation control. However, how to achieve robustness and low computational time still remains an open question.

In this thesis, the main motivation is to pursue a set of "suggestive" hand gestures. There is an argument that the characteristics of hand gestures are important cues in the context of deploying a complete hand gesture-based system. On the other hand, new and low-cost depth sensors have recently been widely applied in the fields of robotics and automation control. These devices open new opportunities for addressing the critical issues of gesture recognition schemes. This work attempts to benefit from the Kinect sensor [2], which provides both RGB and depth features. Utilizing such valuable features offers an efficient and robust solution for addressing the challenges.
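To make the complementary roles of the two cues concrete, the following is a minimal sketch of depth-plus-skin-color hand extraction, not the thesis's exact procedure (which is developed in Chapter 3). The depth band and HSV skin bounds here are hypothetical placeholders for the per-user parameters that the user-guide scheme of Chapter 3 estimates.

```python
import numpy as np
import cv2

def segment_hand(bgr, depth, depth_range=(600, 900),
                 skin_low=(0, 40, 60), skin_high=(25, 180, 255)):
    """Extract a hand mask from one aligned RGB-D frame.

    depth_range: hypothetical hand-to-sensor distance band in mm,
    assumed learned beforehand (cf. Section 3.2.3.2).
    skin_low/skin_high: hypothetical HSV skin bounds, assumed learned
    from the user-guide step (cf. Section 3.2.3.3).
    """
    # 1) Depth cue: keep only pixels inside the learned distance band.
    near, far = depth_range
    depth_mask = ((depth > near) & (depth < far)).astype(np.uint8) * 255

    # 2) Color cue: prune non-skin pixels among the depth candidates.
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    skin_mask = cv2.inRange(hsv, np.array(skin_low, np.uint8),
                            np.array(skin_high, np.uint8))
    mask = cv2.bitwise_and(depth_mask, skin_mask)

    # 3) Clean up and keep the largest connected blob as the hand.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return None  # no candidate region found
    hand_label = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return (labels == hand_label).astype(np.uint8) * 255
```

In the actual system these parameters are not hard-coded; the heuristic user-guide scheme of Chapter 3 learns the background model, the hand-to-Kinect distance, and the skin color model for each end-user before recognition starts.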
Objectives

The thesis aims to achieve a robust, real-time hand gesture recognition system. As a feasible solution, the proposed method should be natural and friendly for end-users. A real application is deployed for automatically controlling a fan and/or a lamp using hand gestures, as these are common electrical home appliances. Without loss of generality, the proposed technique is intended to extend from this specific case to general home automation control systems. To this end, the concrete objectives are:

- Defining a unique set of dynamic hand gestures. This gesture set conveys commands that are available in common home electronic appliances such as televisions, fans, lamps, doors, air-conditioners, and so on. Moreover, the proposed gesture set is designed with unique characteristics. These characteristics are important cues and offer promising solutions to address the challenges of a dynamic hand gesture recognition system.

- Real-time spotting of dynamic hand gestures from an input video stream. The proposed gesture spotting technique consists of relevant solutions for hand detection and hand segmentation from consecutive RGB-D images. In the view of a complete system, the spotting technique is considered a preprocessing procedure.

- The performance of a dynamic hand gesture recognition method depends on the gesture representation and matching phases. This work aims to extract and represent both spatial and temporal features of the gestures. Moreover, the thesis intends to match the phases of the gallery and probe sequences using a phase synchronization scheme (a minimal DTW sketch is given at the end of this introduction). The proposed phase synchronization aims to handle variations in gesture speed and acquisition frame rate. In the experiments, the proposed method is evaluated at various positions, directions, and distances from the human to the Kinect sensor.

- A proposed framework to control home appliances (such as a lamp/fan) is deployed. A full hand gesture-based system is built in an indoor scenario (a smart room). The prototypes of the proposed system for controlling fans and lamps are shown in Fig. 5.1 and Fig. 5.2, respectively. Evaluations of usability with the proposed datasets and experimental evaluations are reported. The datasets are also shared with the community for further evaluations.

Context, constraints, and challenges

Figure 2 shows the context in which an end-user controls home electronic appliances in a living-room environment. Nowadays there are many methods to control home
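Returning to the phase-synchronization objective above: the sketch below is a minimal, hedged illustration of aligning two gesture sequences with dynamic time warping (the technique named in Section 4.2.2.1). The per-frame feature representation and the Euclidean frame distance are illustrative assumptions, not the thesis's exact implementation.

```python
import numpy as np

def dtw_align(probe, gallery):
    """Minimal DTW between two gesture sequences.

    probe, gallery: arrays of shape (T1, d) and (T2, d), one feature
    vector per frame (e.g. a low-dimensional hand-shape embedding).
    Returns the cumulative alignment cost and the warping path, which
    synchronizes the phases of gestures performed at different speeds
    or captured at different frame rates.
    """
    t1, t2 = len(probe), len(gallery)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(probe[i - 1] - gallery[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],       # skip a probe frame
                                 cost[i, j - 1],       # skip a gallery frame
                                 cost[i - 1, j - 1])   # match the two frames
    # Backtrack to recover the frame-to-frame alignment.
    path, i, j = [], t1, t2
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return cost[t1, t2], path[::-1]
```

In the framework outlined in the objectives, the resulting alignment cost can then serve as the distance for a K-NN classifier over gallery gestures (cf. Section 4.2.2.2).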