Improving Some Artificial Immune Algorithms for Network Intrusion Detection


MINISTRY OF EDUCATION AND TRAINING
VIETNAMESE ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY

NGUYEN VAN TRUONG

IMPROVING SOME ARTIFICIAL IMMUNE ALGORITHMS FOR NETWORK INTRUSION DETECTION

THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS

Major: Mathematical foundations for Informatics
Code: 62 46 01 10
Scientific supervisors:
1. Assoc. Prof., Dr. Nguyen Xuan Hoai
2. Assoc. Prof., Dr. Luong Chi Mai

Hanoi - 2019

Acknowledgments

First of all, I would like to thank my principal supervisor, Assoc. Prof., Dr. Nguyen Xuan Hoai, for introducing me to the field of Artificial Immune Systems. He guided me step by step through research activities such as seminar presentations and paper writing. His genius has been a constant source of help. I am grateful for his constructive criticism throughout my PhD journey. I also wish to thank my co-supervisor, Assoc. Prof., Dr. Luong Chi Mai. She was always very enthusiastic in our discussions of promising research questions. It has been a pleasure and a luxury for me to work with her. This thesis would not have been possible without my supervisors' support.

I gratefully acknowledge the support from the Institute of Information Technology, Vietnamese Academy of Science and Technology, and from Thai Nguyen University of Education. I thank the National Foundation for Science and Technology Development (NAFOSTED) and the ASEAN-European Academic University Network (ASEA-UNINET) for their financial support.

I thank M.Sc. Vu Duc Quang, M.Sc. Trinh Van Ha and M.Sc. Pham Dinh Lam, my co-authors of published papers. I thank Assoc. Prof., Dr. Tran Quang Anh and Dr. Nguyen Quang Uy for many helpful insights for my research. I thank my colleagues in the IT Research & Development Center, Hanoi University, especially my cool labmate Mr. Nguyen Tran Dinh Long.

Finally, I thank my family for their endless love and steady support.

Certificate of Originality

I hereby declare that this submission is my own work under my scientific supervisors, Assoc. Prof., Dr. Nguyen Xuan Hoai and Assoc. Prof., Dr. Luong Chi Mai. I declare that it contains no material previously published or written by another person, except where due reference is made in the text of the thesis. In addition, I certify that all my co-authors allow me to present our work in this thesis.

Hanoi, 2019
PhD student
Nguyen Van Truong

Contents

List of Figures
List of Tables
Notation and Abbreviation
INTRODUCTION
  Motivation
  Objectives
  Problem statements
  Outline of thesis
1 BACKGROUND
  1.1 Detection of Network Anomalies
  1.2 A brief overview of human immune system
  1.3 AIS for IDS
  1.4 Selection algorithms
  1.5 Basic terms and definitions
  1.6 Datasets
  1.7 Summary
2 COMBINATION OF NEGATIVE SELECTION AND POSITIVE SELECTION
  2.1 Introduction
  2.2 Related works
  2.3 New Positive-Negative Selection Algorithm
  2.4 Experiments
  2.5 Summary
3 GENERATION OF COMPACT DETECTOR SET
  3.1 Introduction
  3.2 Related works
  3.3 New negative selection algorithm
  3.4 Experiments
  3.5 Summary
4 FAST SELECTION ALGORITHMS
  4.1 Introduction
  4.2 Related works
  4.3 A fast negative selection algorithm based ...
  4.4 A fast negative selection algorithm based ...
  4.5 Experiments
  4.6 Summary
5 APPLYING HYBRID ARTIFICIAL IMMUNE SYSTEM FOR NETWORK SECURITY
  5.1 Introduction
  5.2 Related works
  5.3 Hybrid positive selection algorithm with ...
  5.4 Experiments
  5.5 Summary
CONCLUSIONS
  Contributions of this thesis
  Future works
  Published works
BIBLIOGRAPHY

List of Figures

1.1 Classification of anomaly-based intrusion detection methods
1.2 Multi-layered protection and elimination architecture
1.3 Multi-layer AIS model for IDS
1.4 Outline of a typical negative selection algorithm.
1.5 Outline of a typical positive selection algorithm.
1.6 Example of a prefix tree and a prefix DAG.
1.7 Existence of holes.
1.8 Negative selections with 3-chunk and 3-contiguous detectors ...
1.9 A simple ring-based representation (b) of a string (a).
1.10 Frequency trees for all 3-chunk detectors.
2.1 Binary tree representation of the detectors set generated ...
2.2 Conversion of a positive tree to a negative one.
2.3 Diagram of the Detector Generation Algorithm.
2.4 Diagram of the Positive-Negative Selection Algorithm.
2.5 One node is reduced in a tree: a compact positive tree ... and its conversion (a negative tree) has 3 nodes (b).
2.6 Detection time of NSA and PNSA.
2.7 Nodes reduction on trees created by PNSA on NetFlow ...
2.8 Comparison of nodes reduction on Spambase dataset.
3.1 Diagram of an algorithm to generate perfect rcbvl detectors ...
4.1 Diagram of the algorithm to generate positive r-chunk detectors set.
4.2 A prefix DAG G and an automaton M
4.3 Diagram of the algorithm to generate negative r-contiguous detectors ...
4.4 An automaton represents 3-contiguous detectors set.
4.5 Comparison of ratios of runtime of r-chunk detector-based ... time of Chunk-NSA
4.6 Comparison of ratios of runtime of r-contiguous detector-based ... runtime of Cont-NSA

List of Tables

1.1 Performance comparison of NSAs on linear strings and ring strings.
2.1 Comparison of memory and detection time reductions.
2.2 Comparison of nodes generation on NetFlow dataset.
3.1 Data and parameters distribution for experiments and results comparison.
4.1 Comparison of our results with the runtimes of previously published algorithms.
4.2 Comparison of Chunk-NSA with r-chunk detector-based NSA.
4.3 Comparison of proposed Cont-NSA with r-contiguous detector-based NSA.
5.1 Features for NIDS.
5.2 Distribution of flows and parameters for experiments.
5.3 Comparison between PSA2 and other algorithms.
5.4 Comparison between ring string-based PSA2 and linear string-based PSA2.

Notation and Abbreviation

Notation

$\ell$: Length of data samples
$S_r$: Set of ring representations of all strings in $S$
$|X|$: Cardinality of set $X$
$\Sigma$: An alphabet, a nonempty and finite set of symbols
$\Sigma^k$: Set of all strings of length $k$ on alphabet $\Sigma$, where $k$ is a positive integer
$\Sigma^*$: Set of all strings on alphabet $\Sigma$, including the empty string
$r$: Matching threshold
$D_{pi}$: Set of all positive r-chunk detectors at position $i$
$D_{ni}$: Set of all negative r-chunk detectors at position $i$
$CONT(S, r)$: Set of all r-contiguous detectors
$L(X)$: Set of all nonself strings detected by $X$
rcbvl: r-contiguous bit with variable length

Abbreviation

AIS: Artificial Immune System
ACC: Accuracy Rate
ACO: Ant Colony Optimization
ANIDS: Anomaly Network Intrusion Detection System
BBNN: Block-Based Neural Network
Chunk-NSA: Chunk Detector-Based Negative Selection Algorithm
Cont-NSA: Contiguous Detector-Based Negative Selection Algorithm
DR: Detection Rate
DAG: Directed Acyclic Graph
FAR: False Alarm Rate
GA: Genetic Algorithm
HIDS: Host Intrusion Detection System
IDS: Intrusion Detection System
ML: Machine Learning
MLP: Multilayer Perceptron
NIDS: Network Intrusion Detection System
NS: Negative Selection
NSA: Negative Selection Algorithm
NSM: Negative Selection Mutation
PNSA: Positive-Negative Selection Algorithm
PSA: Positive Selection Algorithm
PSA2: Two-class Positive Selection Algorithm
PSO: Particle Swarm Optimization
PSOGSA: Particle Swarm Optimization-Gravitational Search Algorithm
RNSA: Real-valued NSA
SVM: Support Vector Machines
TCP: Transmission Control Protocol
VNSA: Variable length detector-based NSA

INTRODUCTION

Motivation

Internet users and computer networks are suffering from a rapidly increasing number of attacks. In order to keep them safe, there is a need for effective security monitoring systems, such as Intrusion Detection Systems (IDS).
However, intrusion detection has to face a number of different problems, such as large network traffic volumes, imbalanced data distribution, difficulties in identifying decision boundaries between normal and abnormal actions, and a requirement for continuous adaptation to a constantly changing environment. As a result, many researchers have attempted to use different types of approaches to build reliable intrusion detection systems.

Computational intelligence techniques, known for their ability to adapt and to exhibit fault tolerance, high computational speed and resilience against noisy information, are promising alternative methods for the problem.

One of the promising computational intelligence methods for intrusion detection that has emerged recently is the artificial immune system (AIS), inspired by the biological immune system. The negative selection algorithm (NSA), a dominant model of AIS, is widely used for intrusion detection systems [55, 52]. Despite its successful application, NSA has some weaknesses: (1) high false positive rate (false alarm rate) and false negative rate; (2) high training and testing time; (3) an exponential relationship between the size of the training data and the number of detectors possibly generated for testing; (4) changeable definitions of "normal data" and "abnormal data" in a dynamic network environment [55, 79, 92]. To overcome these limitations, recent works concentrate on complex structures of immune detectors, matching methods and hybrid NSAs [11, 94, 52].

Following the trends mentioned above, in this thesis we investigate the ability of NSA to combine with other classification methods and propose more effective data representations to address some of NSA's weaknesses.

Scientific significance of the thesis: to provide further background for improving the performance of the AIS-based computer security field in particular and of IDS in general.
Practical significance of the thesis: to assist computer security practitioners or experts in implementing their IDS with new features originating from AIS.

The major contributions of this research are: to propose a new representation of data for better performance of IDS; to propose a combination of existing algorithms as well as some statistical approaches in a uniform framework; and to propose a complete and non-redundant detector representation to achieve optimal time and memory complexities.

Objectives

Since data representation is one of the factors that affect the training and testing time, a compact and complete detector generation algorithm is investigated.

The thesis investigates optimal algorithms to generate detector sets in AIS. They help to reduce both the training time and the detection time of AIS-based IDSs.

The thesis also proposes and investigates an AIS-based IDS that can promptly detect attacks, whether they are known or never seen before. The proposed system uses AIS with statistics as analysis methods and flow-based network traffic as experimental data.

Problem statements

Since the NSA has some limitations as listed in the first section, this thesis concentrates on three problems:

1. The first problem is to find compact representations of data. The objective of this problem's solution is not only to minimize memory storage but also to reduce testing time.
2. The second problem is to propose algorithms that can reduce training time and testing time compared with all existing related algorithms.
3. The third problem is to improve detection performance with respect to reducing false alarm rates while keeping detection rate and accuracy rate as high as possible.

Solutions to these problems can partly address the first three weaknesses listed in the first section. Regarding the last weakness of NSAs, the changeable definitions of "normal data" and "abnormal data" in a dynamic network environment, we consider it a risk in our proposed algorithms and leave it for future work.
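The three measures named in the third problem can be made concrete with a small sketch. It assumes the usual confusion-matrix definitions with nonself (attack) as the positive class; this part of the thesis does not spell the formulas out, so the definitions, the `Confusion` class and the function names below are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from evaluating a detector; nonself (attack) is the positive class."""
    tp: int  # nonself flows correctly flagged as nonself
    fn: int  # nonself flows missed (labeled self)
    fp: int  # self flows wrongly flagged as nonself (false alarms)
    tn: int  # self flows correctly labeled self


def detection_rate(c: Confusion) -> float:
    """DR: share of nonself flows that were detected (assumed definition)."""
    total = c.tp + c.fn
    return c.tp / total if total else 0.0


def false_alarm_rate(c: Confusion) -> float:
    """FAR: share of self flows that raised an alarm (assumed definition)."""
    total = c.fp + c.tn
    return c.fp / total if total else 0.0


def accuracy_rate(c: Confusion) -> float:
    """ACC: share of all flows that were labeled correctly (assumed definition)."""
    total = c.tp + c.fn + c.fp + c.tn
    return (c.tp + c.tn) / total if total else 0.0


if __name__ == "__main__":
    counts = Confusion(tp=90, fn=10, fp=5, tn=95)
    print(detection_rate(counts), false_alarm_rate(counts), accuracy_rate(counts))
```

Under these definitions, lowering FAR while holding DR and ACC steady means reducing `fp` without increasing `fn`.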
Logically, it is impossible to find an optimal algorithm that simultaneously reduces time and memory complexities and obtains the best detection performance. These aspects are always in conflict with each other. Thus, in each chapter, we propose algorithms that solve each problem quite independently.

The intrusion detection problem addressed in this thesis can be informally stated as follows: given a finite set S of network flows, each labeled as self (normal) or nonself (abnormal), the objective is to build classifying models on S that can correctly label an unlabeled network flow s.

Outline of thesis

The first chapter introduces the background knowledge necessary to discuss the algorithms proposed in the following chapters. First, detection of network anomalies is briefly introduced. Following that, the human immune system, artificial immune systems, machine learning and their relevance are reviewed and discussed. Then, popular datasets used for experiments in the thesis are examined.

In Chapter 2, a method for combining selection algorithms is presented. The proposed technique helps to reduce the storage needed for detectors generated in the training phase. Testing time, an important measurement in IDS, is also reduced as a direct consequence of the smaller memory complexity. A tree structure is used in this chapter (and in Chapter 5) to improve time and memory complexities.

A complete and nonredundant detector set, also called a perfect detector set, is necessary to achieve acceptable self and nonself coverage of classifiers. A selection algorithm to generate a perfect detector set is investigated in Chapter 3. Each detector in the set is a string concatenated from overlapping classical ones. Unlike the approaches in the other chapters, the discrete structure of the string-based detectors in this chapter is suitable for detection in a distributed environment.

Chapter 4 includes two selection algorithms for a fast training phase.
The optimal algorithms can generate a detector set in linear time with respect to the size of the training data. The experimental results and theoretical proofs show that the proposed algorithms outperform all existing ones in terms of training time. In terms of detection time, the first algorithm is linear and the second is polynomial.

Chapter 5 mainly introduces a hybrid approach that combines a positive selection algorithm with statistics for a more effective NIDS. Frequencies of self and nonself data (strings) are stored in the leaves of the trees representing detectors. This information plays an important role in improving the performance of the proposed algorithms. The hybrid approach results in a new positive selection algorithm for two-class classification that can be trained with samples from both self and nonself data types.

Chapter 1

BACKGROUND

The human immune system (HIS) has successfully protected our bodies against attacks from various harmful pathogens, such as bacteria, viruses, and parasites. It distinguishes pathogens from self-tissue and then eliminates these pathogens. This provides a rich source of inspiration for computer security systems, especially intrusion detection systems [92]. Hence, applying theoretical immunology and observed immune functions, principles and models to IDS has gradually developed into a new research field, called artificial immune systems (AIS).

How to apply the remarkable features of HIS to achieve scalable and robust IDS is considered a research gap in the field of computer security. In this chapter, we introduce the background knowledge necessary to discuss the algorithms proposed in the following chapters, which can partly fill this gap.

Firstly, a brief introduction to network anomaly detection is presented. We then give an overview of HIS. Next, immune selection algorithms, detectors, performance metrics and their relevance are reviewed and discussed. Finally, some popular datasets are examined.
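As a point of reference for the selection algorithms and detectors mentioned above, the following is a minimal brute-force sketch of negative selection with r-chunk detectors. It assumes the common textbook formulation (a detector is a length-r string tied to a position i, and it matches any string that carries exactly that substring at position i) and equal-length strings over a small alphabet. The function names are hypothetical, and the enumeration is exponential in r; it is not one of the optimized algorithms the thesis proposes.

```python
from itertools import product


def generate_negative_chunk_detectors(self_set, r, alphabet="01"):
    """Map each position i to the length-r chunks that no self string has at i.

    Brute-force illustration: enumerates all |alphabet|**r candidate chunks.
    """
    strings = list(self_set)
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all self strings must have the same length")
    all_chunks = {"".join(p) for p in product(alphabet, repeat=r)}
    detectors = {}
    for i in range(length - r + 1):
        self_chunks = {s[i:i + r] for s in strings}
        # A negative detector must not match any self string at position i.
        detectors[i] = all_chunks - self_chunks
    return detectors


def is_nonself(s, detectors, r):
    """Label s as nonself if any detector matches it at its own position."""
    return any(s[i:i + r] in chunks for i, chunks in detectors.items())


if __name__ == "__main__":
    self_strings = {"0000", "0011"}
    dets = generate_negative_chunk_detectors(self_strings, r=2)
    for candidate in ("0000", "0011", "1100", "0001"):
        print(candidate, is_nonself(candidate, dets, r=2))
```

Self strings are never flagged by construction, since every chunk they contain is excluded from the detector set at that position.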
1.1 Detection of Network Anomalies

The idea of intrusion detection is predicated on the belief that an intruder's behavior is noticeably different from that of a legitimate user and that many unauthorized actions are detectable [65]. Intrusion detection systems (IDSs) are deployed as a second line of defense along with other preventive security mechanisms, such as user authentication and access control. Based on its deployment, an IDS can act either as a host-based or as a network-based IDS.

1.1.1 Host-Based IDS

A Host-Based IDS (HIDS) monitors and analyzes the internals of a computing system. A HIDS may detect internal activity such as which program accesses what resources and attempts at illegitimate access, for example, an activity that modifies the system password database. Similarly, a HIDS may look at the state of a system and its stored information, whether in RAM, in the file system, in log files or elsewhere. Thus, one can think of a HIDS as an agent that monitors whether anything or anyone, internal or external, has circumvented the security policy that the operating system tries to enforce [12].

1.1.2 Network-Based IDS

A Network-Based IDS (NIDS) detects intrusions in network data. Intrusions typically occur as anomalous patterns. Most techniques model the data in a sequential fashion and detect anomalous subsequences. The primary cause of these anomalies is attacks launched by outside attackers who want to gain unauthorized access to the network in order to steal information or to disrupt the network.

In a typical setting, a network is connected to the rest of the world through the Internet. The NIDS reads all incoming packets or flows, trying to find suspicious patterns. For example, if TCP connection requests to a very large number of different ports are observed within a short time, one could assume that someone is running a port scan against some of the computers in the network.
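The port-scan heuristic in the example above can be sketched as a sliding-window count of distinct destination ports per source. The event format `(timestamp, source address, destination port)`, the function name and both thresholds are assumptions chosen for illustration; real NIDS rules are considerably more elaborate.

```python
from collections import Counter, defaultdict, deque


def find_port_scanners(events, window_seconds=10.0, port_threshold=100):
    """Return sources that contact many distinct ports within a short window.

    events: iterable of (timestamp, src_ip, dst_port), sorted by timestamp.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    recent = defaultdict(deque)         # src -> (timestamp, port) pairs in window
    port_counts = defaultdict(Counter)  # src -> how often each port is in window
    suspects = set()
    for ts, src, port in events:
        window = recent[src]
        counts = port_counts[src]
        window.append((ts, port))
        counts[port] += 1
        # Drop connection attempts that have fallen out of the time window.
        while window and ts - window[0][0] > window_seconds:
            _, old_port = window.popleft()
            counts[old_port] -= 1
            if counts[old_port] == 0:
                del counts[old_port]
        if len(counts) >= port_threshold:
            suspects.add(src)
    return suspects


if __name__ == "__main__":
    scan = [(0.01 * p, "10.0.0.9", p) for p in range(1, 201)]
    web = [(0.5 * k, "10.0.0.2", 80) for k in range(20)]
    print(find_port_scanners(sorted(scan + web)))
```

A source that repeatedly contacts the same port, or that probes ports slowly enough for earlier attempts to leave the window, is not flagged by this simple rule.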
A NIDS mostly tries to detect incoming shellcode in the same manner that an ordinary intrusion detection system does. In addition to inspecting the incoming traffic, a NIDS also provides valuable information about intrusions from outgoing or local traffic. Some attacks might even be staged from inside a monitored network or network segment and are therefore not regarded as incoming traffic at all. The data available for intrusion detection systems can be at different levels of granularity, such as packet-level traces or Cisco NetFlow data. The data is typically high dimensional, with a mix of categorical and continuous numeric attributes.

Misuse-based NIDSs attempt to search for known intrusive patterns, while an anomaly-based intrusion detector searches for unusual patterns. Today, intrusion detection research is mostly concentrated on anomaly-based network intrusion detection because it can detect both known and unknown attacks [12].

1.1.3 Methods

On the basis of the availability of prior knowledge, the detection mechanism used, the mode of performance and the ability to detect attacks, existing anomaly detection methods are grouped into six broad categories [41], as shown in Fig. 1.1. This figure is adapted from [12].

[Figure 1.1: Classification of anomaly-based intrusion detection methods. Anomaly detection branches into six categories: supervised learning, unsupervised learning, probabilistic learning, soft computing, knowledge based, and combination learners.]
