Document: HTTP-based botnet detection using network traffic traces


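School code: 10286
Classification number: TP393
Confidentiality: public
UDC: 004.7
Student ID: 119736

HTTP-BASED BOTNET DETECTION USING NETWORK TRAFFIC TRACES

Doctoral dissertation

Candidate: TRUONG DINH TU
Supervisor: Prof. CHENG Guang (程光)
Degree applied for: Doctor of Engineering
First-level discipline: Computer Science and Technology
Second-level discipline (major): Computer Architecture
Dissertation defense date: 22 December 2015
Degree conferral date: day 22, month __, year 20__
Chair of the defense committee: Prof. CHEN Ming (陈鸣)
Degree-granting institution: Southeast University (东南大学)
Reviewers: ____

A Dissertation Submitted to Southeast University for the Academic Degree of Doctor of Engineering
By TRUONG DINH TU
Supervised by Prof. CHENG Guang
School of Computer Science and Engineering, Southeast University
November 2015

Southeast University, PhD Dissertation, Truong Dinh Tu

摘要 (Abstract, translated from the Chinese)

Botnets have become one of the most serious threats facing the Internet today. They serve as highly controlled platforms for large-scale, coordinated network attacks such as distributed denial of service, spam, and information theft. Botnet detection is therefore crucial, and security researchers have proposed many effective detection methods. Botnet authors, however, keep developing new techniques to improve their bots and evade those methods. In recent years HTTP-based botnets have become increasingly rampant and have caused great damage to many government organizations and industrial institutions. The new generation of HTTP botnets mostly adopts fast-flux, domain-flux, or DGA (Domain Generation Algorithm) techniques to evade detection: some use domain-flux to circumvent blacklists, while others use fast-flux to hide the true location of the command-and-control (C&C) server. The main research goal of this dissertation is therefore to build detection schemes for HTTP botnets that use DGA, domain-flux, or fast-flux to evade detection. To this end it addresses three problems: (1) identifying and detecting hosts infected with DGA bots in a managed or enterprise network; (2) detecting and identifying C&C servers that use domain-flux or DGA techniques; and (3) detecting malicious fast-flux service networks. The main content of the three is summarized below.

The first problem is how to identify hosts infected with DGA bots in a managed or enterprise network. Samples of several well-known domain-flux or DGA bots, such as Kraken, Zeus, Conficker, Bobax, and Murofet, were collected and executed in a virtual machine environment to capture the corresponding network traffic. Examination of this traffic shows that the bot samples exhibit similar periodic behavior when requesting domain names. In addition, hosts infected with domain-flux or DGA bots often request a large number of non-existent domain names while looking for the C&C server, and the periodic time-interval series of these requests are similar. Ordinary legitimate hosts do not visit many different domain names with similar periodic time-interval series, nor do they generate large numbers of non-existent-domain replies; this behavior occurs only on hosts infected with DGA bots. Based on these characteristics, the dissertation proposes a method that clusters similar domain names, that is, domain names generated by the same botnet or DGA, by analyzing the correlation between pairs of DNS request time-interval series. The experimental results show that domain names generated by the same DGA bot code are placed in the same cluster, and the hosts that request the domains of a cluster are marked as infected with the corresponding domain-flux or DGA bot. The method does not apply to all bot-infected hosts; it is effective only for hosts in the managed network that are infected with domain-flux or DGA bots. These results will help in finding new C&C server detection methods, which is also part of the subsequent work in this dissertation (Chapter 4).

The second problem is how to detect the C&C servers of domain-flux or DGA botnets. Several studies have addressed this problem [1-4] with some success. Yadav et al. [1] gave a method for detecting the C&C domains of DGA botnets based on the unigram and bigram distributions of all domain names. That method performs poorly, however, on domain names generated by the Kraken, Bobax, or Murofet botnets, because their unigram and bigram distributions do not differ much from those of normal domain names. To overcome this shortcoming, this work improves and extends the work of Yadav et al. [1]. It computes the frequency of occurrence of n-grams (n = 3, 4, 5) in normal domain names and assigns a score to each n-gram. To distinguish legitimate domain names from botnet-generated ones, it proposes a method for measuring the expected score of a domain name and feeds this, together with two other features, into a previously trained classifier that separates bot-generated domain names from user-generated ones. Classifiers based on five different machine learning algorithms were used, and the detection effectiveness of each was evaluated. The experimental results show that the decision tree algorithm (J48) performs best and detects domain-flux botnets more effectively than the other algorithms, and they also demonstrate that the proposed method can effectively detect botnets in a managed network. The details of the method are given in Chapter 4.

The final problem is how to detect malicious fast-flux service networks (FFSNs) with a feature-based machine learning classification method. Several methods for FFSN detection already exist [5-8]. An FFSN is characterized by one or more domain names that resolve to many (hundreds or thousands of) different IP addresses with short TTLs and rapidly changing DNS answers. The classification process therefore needs to rely on data collected from requests sent by various users at completely unpredictable times. The methods proposed in [5-8] use a small number of active DNS traffic records and so cannot obtain all possible resolved IP addresses of a malicious fast-flux network, which increases the false positive and false negative rates. This shortcoming can be addressed with passive DNS replication. This work developed a tool, PassiveTool, that sniffs DNS requests from a network interface or a pcap file and writes the DNS server answers to a log file (DNSlog). In effect, the technique reconstructs a partial view of the data in the Domain Name System into a central database that can be queried, for example: Where did a domain name point in the past? What domain names are served by a given name server? What domain names point into a given IP network? What subdomains exist under a domain name? The dissertation also defines the DNSlog file to make it easier to track and manage the request/response information related to each domain name. Furthermore, Holz et al. [7] focus on three features derived from active DNS requests (the number of DNS A records, the number of DNS NS records, and the number of autonomous systems), and Passerini et al. [8] use 9 different features, whereas this work uses 16 features to train the classifier, 12 of which are proposed here for the first time. The advantage of the method is that it can very effectively detect a wide range of fast-flux domain names, including malicious ones. The experimental results show that the method yields a low false positive rate (FPR) of 0.13%, compared with FPRs of 6.17% and 4.08% for [7] and [8]. The details of the method are given in Chapter 5.

Keywords: HTTP botnet, command-and-control server, domain generation algorithm, Domain-Flux, Fast-Flux.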
Abstract

Botnets are generally recognized as one of the most serious threats on the Internet today because they serve as platforms for the vast majority of large-scale, coordinated cyber-attacks, such as distributed denial of service, spamming, and information theft. Detecting botnets is therefore of great importance, and security researchers have proposed many effective botnet detection approaches. However, botnet developers constantly develop new techniques to improve their bots and evade detection. In recent years, HTTP-based botnets have become more widespread and have caused enormous damage to many government organizations and industries. New-generation HTTP botnets tend to use techniques called DGA (Domain Generation Algorithm), domain-flux, or fast-flux to avoid detection. Some botnets use domain-flux to evade blacklisting; others use fast-flux to hide the true location of their servers.

Therefore, the main research objective of this dissertation is to build solutions for detecting HTTP botnets that use DGA, domain-flux, or fast-flux to evade detection. To achieve this goal, the dissertation addresses three main problems: (1) detecting machines infected with domain-flux or DGA-based bots inside an enterprise or otherwise monitored network; (2) detecting the C&C servers of botnets that use domain-flux or DGA-based evasion techniques; and (3) detecting malicious Fast-Flux Service Networks (FFSNs). The main contents of these three research works are summarized as follows.

The first problem is how to identify machines infected with domain-flux or DGA-based bots inside the enterprise or monitored network.
To answer this question, we collect samples of multiple well-known domain-flux or DGA-based botnets, such as Kraken, Zeus, Conficker, Bobax, and Murofet. We then execute these bot samples in a virtual machine environment to obtain network traffic traces. By examining and analyzing a large number of these traces, we find that the botnets exhibit similar periodic behavior when querying domain names. In addition, infected machines often query a large number of non-existent domain names, with similar periodic time-interval series, while looking for their C&C server. Normal, legitimate hosts have no reason to query many different domain names with similar periodic time-interval series and thereby generate high volumes of NXDOMAIN replies; this behavior occurs only on hosts infected with domain-flux or DGA-based bots. Based on these characteristics, we propose a method that analyzes the correlation between each pair of query time-interval series in order to cluster similar domain names. The experimental results show that domain names generated by the same botnet or DGA are grouped into the same cluster. Hosts that query the domains in such a cluster are marked as compromised hosts running the corresponding domain-flux or DGA-based bot. This method does not detect all bot-infected machines; it is effective only for machines infected with domain-flux or DGA-based bots inside the monitored network. The results motivate us to consider a new method for detecting botnet C&C servers, which forms part of our subsequent work (Chapter 4) and is the second problem: how to detect the C&C servers of domain-flux or DGA-based botnets.
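
For the first problem, the idea of correlating pairs of query time-interval series and grouping the domains whose series match can be illustrated with a minimal sketch. The abstract does not say which correlation measure, threshold, or clustering procedure the dissertation uses, so the Pearson coefficient, the 0.9 threshold, the union-find grouping, and all function names and sample data below are assumptions made purely for illustration.

```python
from itertools import combinations
from math import sqrt

def intervals(timestamps):
    """Gaps (in seconds) between consecutive queries for one domain."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]

def pearson(x, y):
    """Pearson correlation of two series, truncated to the shorter length.
    Returns 0.0 when the coefficient is undefined (too short or constant)."""
    n = min(len(x), len(y))
    if n < 2:
        return 0.0
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def cluster_domains(query_times, threshold=0.9):
    """Group domains whose interval series correlate at or above threshold."""
    parent = {d: d for d in query_times}

    def find(d):
        # Union-find lookup with path halving.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    series = {d: intervals(t) for d, t in query_times.items()}
    for a, b in combinations(series, 2):
        if pearson(series[a], series[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in query_times:
        clusters.setdefault(find(d), []).append(d)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])

# Hypothetical query timestamps (seconds) observed for three domains.
query_times = {
    "xkqzv.example": [0, 10, 30, 40, 60, 70],
    "pwmrt.example": [1, 11, 31, 41, 61, 71],
    "news.example": [0, 50, 55, 90, 200, 205],
}
clusters = cluster_domains(query_times)
```

In this toy input the two random-looking names share the same alternating 10 s / 20 s query rhythm and end up in one cluster, while the third domain stays on its own. A real deployment would also need a rule for strictly constant interval series, for which the Pearson coefficient is undefined.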
Several previous approaches [1-4] have addressed the detection of such C&C servers with useful results. Yadav et al. [1] presented a technique that detects the C&C domains of DGA-based botnets by looking at the distribution of unigrams and bigrams across all domain names. However, the unigram- and bigram-based technique may not suffice, especially for domains generated by the Kraken, Bobax, or Murofet botnets, because the unigram and bigram distributions of these botnets' domains do not differ significantly from those of benign domains. To overcome this limitation, our work improves and extends that of Yadav et al. [1]. We calculate the frequency of occurrence of n-grams (n = 3, 4, 5) in benign domain names and assign a score to each n-gram accordingly. To distinguish domains generated by legitimate users from those generated by botnets, we present a method that measures the expected score of a domain (ESOD) and combines it with two other features as input to a previously trained classifier, which separates bot-generated domain names from human-generated ones. We train classifiers with five different machine learning algorithms and evaluate the detection effectiveness of each. The experimental results show that the decision tree algorithm (J48) is the best classifier and detects botnets more effectively than the other algorithms, and they demonstrate that the proposed approach can detect botnets in the monitored network efficiently. The details of the method are presented in Chapter 4. The final problem is how to detect malicious fast-flux service networks (FFSNs) using feature-based machine learning classification techniques; several approaches have already been developed to detect FFSNs [5-8].
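
For the second problem, the n-gram scoring step can be sketched as follows. The abstract states only that n-grams with n = 3, 4, 5 are counted in benign domain names, that each n-gram receives a score, and that an expected score of domain (ESOD) is derived; it does not give the formulas. The sketch therefore assumes, hypothetically, that an n-gram's score is its raw count in the benign corpus and that the domain score is the mean score over all of the domain's n-grams. The function names and the tiny corpus are illustrative only.

```python
from collections import Counter

def ngrams(label, n):
    """All contiguous substrings of length n in a domain label."""
    return [label[i:i + n] for i in range(len(label) - n + 1)]

def build_scores(benign_labels, sizes=(3, 4, 5)):
    """Score each n-gram by how often it occurs in benign labels (assumed scoring)."""
    counts = Counter()
    for label in benign_labels:
        for n in sizes:
            counts.update(ngrams(label, n))
    return counts

def expected_score(label, scores, sizes=(3, 4, 5)):
    """Mean n-gram score of a label; unseen n-grams contribute 0 (assumed definition)."""
    grams = [g for n in sizes for g in ngrams(label, n)]
    if not grams:
        return 0.0
    return sum(scores[g] for g in grams) / len(grams)

# Hypothetical benign corpus; a real one would hold a large list of popular domains.
benign = ["google", "googlemail", "facebook", "notebook", "mailbox"]
scores = build_scores(benign)

human_like = expected_score("bookmail", scores)
dga_like = expected_score("xkqzvwpt", scores)
```

Here the pronounceable label shares n-grams such as "boo" and "mail" with the benign corpus and scores above zero, while the random-looking label shares none and scores exactly zero. In the dissertation this score is one of three features passed to a trained classifier, not a detector on its own.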
An FFSN is characterized by one or more domain names that resolve to many (hundreds or even thousands of) different IP addresses with short time-to-live (TTL) values and by rapid change in the DNS answers. The classification process therefore needs to rely on data gathered from queries sent by various users at completely unpredictable times. The approaches proposed in [5-8] use a small amount of active DNS traffic traces, so they cannot obtain as many of the resolved IP addresses of a malicious fast-flux network as possible, which may increase the false positive and false negative rates. This limitation can be overcome with passive DNS replication. In this study, we build a PassiveDNS tool that sniffs traffic from a network interface or reads a pcap file and writes the DNS server answers to a log file (DNSlog). This technique reconstructs a partial view of the data available in the Domain Name System into a central database, where it can be indexed and queried. DNSlog databases are useful for a variety of purposes: they can answer questions that are difficult or impossible to answer with the standard DNS protocol, such as: Where did this domain name point in the past? What domain names are hosted by a given name server? What domain names point into a given IP network? What subdomains exist below a certain domain name? We also define a DNSlog data aggregate to facilitate tracking and managing the query/response information related to each domain. Moreover, Holz et al. [7] focus on just three features derived from active DNS queries (the number of DNS "A" records, the number of DNS "NS" records, and the number of distinct Autonomous Systems (ASes)), and Passerini et al. [8] employ 9 different features, whereas we use 16 key features to train classifiers. Of these 16 features, 12 are proposed for the first time in this dissertation.
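
To make the per-domain aggregation concrete, the sketch below derives the three features attributed to Holz et al. [7] above (distinct A records, distinct NS records, distinct ASes), plus the minimum A-record TTL, from passive DNS log rows. The DNSlog row layout, the IP-to-AS lookup table, and the function name are hypothetical; the abstract does not describe the real DNSlog format or list the dissertation's 16 features. The addresses and AS numbers come from ranges reserved for documentation.

```python
from collections import defaultdict

# Hypothetical DNSlog rows: (domain, record type, answer, TTL in seconds).
DNSLOG = [
    ("flux.example", "A", "192.0.2.10", 120),
    ("flux.example", "A", "198.51.100.7", 120),
    ("flux.example", "A", "203.0.113.5", 60),
    ("flux.example", "A", "192.0.2.10", 120),  # repeated answer, counted once
    ("flux.example", "NS", "ns1.flux.example", 300),
    ("flux.example", "NS", "ns2.flux.example", 300),
    ("www.example", "A", "192.0.2.80", 86400),
    ("www.example", "NS", "ns.example", 86400),
]

# Hypothetical IP-to-AS mapping; a real system would use routing data.
IP_TO_ASN = {
    "192.0.2.10": 64496,
    "198.51.100.7": 64497,
    "203.0.113.5": 64498,
    "192.0.2.80": 64496,
}

def aggregate(rows, ip_to_asn):
    """Collect per-domain counts of distinct A answers, NS answers, and ASes."""
    a_records = defaultdict(set)
    ns_records = defaultdict(set)
    min_ttl = {}
    for domain, rtype, answer, ttl in rows:
        if rtype == "A":
            a_records[domain].add(answer)
            min_ttl[domain] = min(ttl, min_ttl.get(domain, ttl))
        elif rtype == "NS":
            ns_records[domain].add(answer)

    features = {}
    for domain in set(a_records) | set(ns_records):
        asns = {ip_to_asn[ip] for ip in a_records[domain] if ip in ip_to_asn}
        features[domain] = {
            "a_records": len(a_records[domain]),
            "ns_records": len(ns_records[domain]),
            "asns": len(asns),
            "min_a_ttl": min_ttl.get(domain),
        }
    return features

features = aggregate(DNSLOG, IP_TO_ASN)
```

Because the log accumulates every answer seen over time, the counts for a fluxing domain keep growing as new addresses appear, which is the advantage of passive collection over a handful of active lookups. Feature dictionaries like these would then be turned into vectors for a classifier.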
The advantage of our approach is that it can effectively detect a wide range of fast-flux domains, including malware domains. The experimental results show that our method produces a lower false positive rate (FPR) of 0.13%, compared with 6.17% for [7] and 4.08% for [8]. The details of the method are presented in Chapter 5.

Keywords: HTTP botnet, C&C Server, Domain Generation Algorithm (DGA), Domain-Flux, Fast-Flux.

Table of Contents

摘要 (Chinese Abstract)
Abstract
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Chapter 1. Introduction
  1.1 Botnet Definition
    1.1.1 Bot and botnet
    1.1.2 History of the Botnet
    1.1.3 Botnet Architecture
    1.1.4 Botnet lifecycle
  1.2 Evolution of Botnet
    1.2.1 IRC-Based Botnet
    1.2.2 P2P-Based Botnet
    1.2.3 HTTP-Based Botnet
  1.3 Motivation and Challenges
  1.4 The goal of the dissertation
  1.5 Contributions and Outline of dissertation
    1.5.1 Contributions
    1.5.2 Outline of the Dissertation
Chapter 2. Background and Related Works
  2.1 Botnet Detection Techniques
    2.1.1 Honeypots-based detection
    2.1.2 Anomaly-based Detection
    2.1.3 DNS-based Detection
    2.1.4 Mining-based Detection
  2.2 Detection evasion techniques
    2.2.1 DGA-Based technique
    2.2.2 Fast Flux-Based technique
  2.3 Related Works
Chapter 3. Detecting DGA-Bot Infected Machines Based On Analyzing The Similar Periodic Of Domain Queries
  3.1 Introduction
  3.2 Proposed methods
    3.2.1 System Overview
    3.2.2 Filtering DNS traffic
    3.2.3 Similarity Analyzer
    3.2.4 Clustering
  3.3 Experiment Results
    3.3.1 Bot samples collection
    3.3.2 DNS traffic extraction
    3.3.3 Detection and Clustering
  3.4 Discussions
  3.5 Conclusion and Future Work
Chapter 4. Detecting C&C Servers Of Botnet With Analysis Features Of Network Traffic
  4.1 Introduction
  4.2 Related Works
  4.3 Proposed Approach
    4.3.1 System Overview
    4.3.2 Training Phase
    4.3.3 Detecting Phase
    4.3.4 Feature extraction
    4.3.5 C&C Detection
  4.4 Experimental and Evaluation
    4.4.1 Prepare the Training Data Set
    4.4.2 Evaluation of features selection
    4.4.3 The Classifier Comparison
    4.4.4 Evaluation of the detection rate on real-world DNS traffic
    4.4.5 Compare with other approaches
  4.5 Discussion
  4.6 Conclusion
Chapter 5. Detecting Malicious Fast-Flux Service Networks Use Feature-Based Machine Learning Classification Techniques
  5.1 Introduction
  5.2 Related works
  5.3 Proposed Methods
    5.3.1 System Overview
    5.3.2 Data Aggregate
    5.3.3 Data Pre-filtering
    5.3.4 Feature Extraction
  5.4 Experiment and Evaluation
    5.4.1 Data Set
    5.4.2 Experimental Results
    5.4.3 Compare with previous works
  5.5 Conclusion
Chapter 6. Conclusion and Future Works
  6.1 Summary of Research and Conclusions
  6.2 Limitation and Future Work
Bibliography
Acknowledgements
List of Publications
