Topology Optimization for DHT-based Application Layer Multicast

  • Số trang: 49 |
  • Loại file: PDF |
  • Lượt xem: 26 |
  • Lượt tải: 0
nhattuvisu

Đã đăng 26946 tài liệu

Mô tả:

Table of Contents Abstract 1 Acknowledgements iii Abstract 1 1 Introduction 1.1 Motivation . . . 1.2 Objectives . . . 1.3 Contributions . 1.4 Thesis structure . . . . 2 2 4 5 5 . . . . . . . . . . . . . . . 6 6 6 7 8 9 10 10 10 11 12 12 13 14 15 16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 Background 2.1 Multicast . . . . . . . . . . . . . . . . . . . . . . . 2.1.1 Introduction . . . . . . . . . . . . . . . . . . 2.1.1.1 IP Multicast . . . . . . . . . . . . 2.1.1.2 Application Layer Multicast . . . 2.1.2 Application layer multicast protocols . . . . 2.1.2.1 Application Domain . . . . . . . . 2.1.2.2 Deployment Level . . . . . . . . . 2.1.2.3 Group Management . . . . . . . . 2.1.2.4 Routing Mechanism . . . . . . . . 2.2 DHT-based multicast . . . . . . . . . . . . . . . . . 2.2.1 Introduction of P2P Networks . . . . . . . . 2.2.1.1 Unstructured P2P Network model 2.2.1.2 Hybrid P2P Network model . . . . 2.2.1.3 Structured P2P Network model . . 2.2.2 DHT-based structure P2P networks . . . . . iv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . TABLE OF CONTENTS 2.3 v 2.2.2.1 Chord Network . . . . . . . . . . . . . . . . . . . . . 2.2.2.2 Content Adressable Network . . . . . . . . . . . . . . 2.2.3 DHT-based multicast . . . . . . . . . . . . . . . . . . . . . . . 2.2.3.1 CAN-based multicast . . . . . . . . . . . . . . . . . . 2.2.3.2 Chord-based multicast . . . . . . . . . . . . . . . . . 2.2.3.3 Scribe . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2.4 Topology optimization issues for DHT-based multicast . . . . Related works on topology optimization for DHT-based multicast . . 2.3.1 SplitStream . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.2 Capacity Aware Multicast based on Overlay Network - CAMChord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.3.3 DHT-based lightweight broadcast algorithms in large-scale computing infrastructures . . . . . . . . . . . . . . . . . . . . . . . 3 Bandwidth Adaptive Multicast 3.1 Overview . . . . . . . . . . . . 3.1.1 Node identifier . . . . 3.1.2 Finger table . . . . . . 3.2 Network Construction . . . . 3.3 Multicast method . . . . . . . over . . . . . . . . . . . . . . . 4 Simulations and Evaluations 4.1 Simulation description . . . . . . . 4.2 Evaluation . . . . . . . . . . . . . . 4.2.1 The depth of multicast tree 4.2.2 Control Overhead . . . . . . 5 Conclusions Chord : BAM Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 20 22 22 24 25 26 26 26 27 28 . . . . . 31 31 32 34 35 36 . . . . 38 38 38 38 40 42 List of Figures 2.1 Type of transmissions . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.2 IP Multicast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 2.3 A Comparison between IP multicast and application-layer multicast . 9 2.4 P2P Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.5 A unstructured P2P Network with a server . . . . . . . . . . . . . . . 13 2.6 Gnutella-like P2P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.7 An example of Hybrid P2P Model . . . . . . . . . . . . . . . . . . . . 15 2.8 Scheme of a classic Hash Table . . . . . . . . . . . . . . . . . . . . . 16 2.9 Scheme of a Distributed Hash Table (DHT) . . . . . . . . . . . . . . 17 2.10 A Chord ring with 6 bit identifier. Notice how the finger table is organized and how K54 is looked up following Chord’s algorithm . . . 18 2.11 Example of Chord Finger table . . . . . . . . . . . . . . . . . . . . . 20 2.12 Example of Content Addressable Network . . . . . . . . . . . . . . . 21 2.13 Smart multicast in CAN. The dot represents the starting node . . . . 23 2.14 An example of Chord-based multicast method. In this example, Node 1 sends messages to all nodes in the Chord ring. . . . . . . . . . . . . 24 2.15 A simple example illustrating the basic approach of SplitStream. Original content is split into two stripes and correlatively, an independent multicast tree is built for each stripes. A node is a leaf in one tree and a interior in the other . . . . . . . . . . . . . . . . . . . 27 2.16 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 28 2.17 Balanced Distributed broadcast tree construction using token in 16node Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.18 Balanced Distributed broadcast tree construction using partition in an 11-node Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 33 vi LIST OF FIGURES vii 3.2 3.3 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 35 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 37 4.1 4.2 4.3 Average path length . . . . . . . . . . . . . . . . . . . . . . . . . . Maximum path length . . . . . . . . . . . . . . . . . . . . . . . . . Average path length in Chord, CAM-Chord and BAM-Chord after 32 time steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Average link number per node in Chord . . . . . . . . . . . . . . . Average link in Chord, CAM-Chord and BAM-Chord after 32 time steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 4.5 . 39 . 40 . 40 . 41 . 41 List of Tables 2.1 Definition of variables for node n, using m-bit identifiers . . . . . . . 19 viii Abstract In recent years, Distributed Hash Table (DHT) becomes active and ongoing area of research at a lot of universities and labs. DHT has many advantages: Decentralization, scalability, fault tolerance, load balancing, data integrity, and performance,... Those properties make DHTs are very suitable for deploying multicast services at application layer and in fact, DHT-based network such as CAN, Chord, Pastry, Tapestry, etc can be used to implement Internet-scale application layer multicast. However, early DHT-based multicast systems are insufficient in addressing all of these issues: Heterogeneous node capacity, large- scale multicast and dynamic membership. Moreover, in those system, when one node joins into system through an arbitrary way, some factors are not considered: nodes bandwidth, nodes positon on DHT network (i.e node identifiers), thus, the multicast tree can be built inefficiently and not balance in structure. The solution for assigning an appropriate number of child nodes to each node is far from optimal in term of bandwidth: If the number of child nodes is too high, low capacity node will be overloaded, therefore slows the entire session multicast down. If the number of child nodes is too low, high capacity nodes will be used inefficiently. In this thesis, we study the method to optimize topology for DHT-based multicast. We propose a DHT- based bandwidth adaptive multicast system that forcus on host heterogeneity, scalibility, fault tolerate. In our system, nodes bandwidth is firstly considered, result of this process is the basis for determining the level of the node and correlatively caculating nodes identify. Level of a node is used to define maximum number of its child nodes. As a result, in our model, each node is assigned an optimal numbers of child nodes to forward multicast data. Thus, our method can make tradeoff between depth of the multicast tree and bandwidth of every node and take advandtages of DHTs in maintaining multicast tree in churn overlay. System chosen for implementation and avaluation is Chord. This model is called Bandwidth Adaptive Multicast over Chord: BAM-Chord. 1 Chapter 1 Introduction 1.1 Motivation Today, Internet continues to grow rapidly, many communication requirements that we must deal with. Cause of the explosion in data communications, demand on transferring large volumes of data to many destination quickly increases. To solve bandwidth problems, multicast is a effective solution. As the most cost-effective way of delivering the same data to a multiple receivers at the same time, multicast is suited to a large number of applications as mobile Internet, e-commerce, content rich applications, and multi-media services: television broadcast, video conferencing. The word ”multicast” is typically used to refer to network layer multicast - IP multicast and was proposed about some decades ago. However, IP multicast still has not been widely deployed because of various technical and administrational reasons. The disadvantages of implementing multicast at the IP level has led to an interesting application-level multicast. Application layer multicast refers to the implementation of multicast capability at the application layer instead of network layer. A number of application-level multicast systems have been proposed that are built using structured peer-to-peer (p2p) overlays. Originally, distribute hash table (DHTs) were developed with applications like peer-to-peer file sharing. Recent years, there has been considerable interest in applying DHTs to overlay multicast applications since it has many advantages: Decentralization, Scalability, Fault tolerance, Load balancing, and performance,...Those properties make DHTs are very suitable for deploying multicast services at application layer and in fact, DHT-based network such as CAN, Chord, Pastry, Tapestry, 2 1.1. Motivation 3 etc has been already used to implement Internet -scale applicastion layer muticast. However, early DHT-based multicast systems are insufficient in addressing a number of technical issues: • Heterogeneous output bandwidth of Node: Since the number of child nodes of a node in a multicast tree is decided based on the DHT algorithm without consideration of nodes bandwidth capacity, a node with low bandwidth may become a bottleneck if it is an internal node of a multicast tree • Effective bandwidth utilization: If a node with high bandwidth is a leaf node, the system can not utilize the bandwidth capacity of the node to deliver multicast messages. • Dynamic membership: The optimization of multicast trees must be achieved even when member nodes join or leave at anytime. To limit the multicasting load on a node, M. Castro, el al. propose a solution in SplitStream (M.Castro, ), where one node informs its out-degree. When it receives a futher join requests, it pushdown this request to its children. Nevertheless, Bharambe, et al. explored and found heterogeneity issue is not address with SplitStream (Feb, February 24 25 2005). With CAM CAM-Chord and CAM-Koorde designs in (6-1, 6 10 June 2005), heterogeneity is tackled by allowing node to alter out-degree according to its capacity. However, they did not describe how to maintain topology when the out-degree limits. Therefore, heterogeneity issue remains open, and should be addressed before deploying DHTs for high-bandwidth multicasting applications. S. Ratnasamy, M. Handley, R. Karp, and S. Shenker proposed CAN-based multicast (S. Ratnasamy & Shenker, Nov 7 9 2001). That was the first application-level multicast scheme scales to groups of several thousands of nodes. Solution in (S.Q.Zhuang & J.D.Kubiatowicz., 2001) can scales to large groups but has a single root node for each multicast group. It supports the any-source model only by having the root node operate as a reflector for multiple senders. There have been several ways to deal with dynamic membership: In (S.Q.Zhuang & J.D.Kubiatowicz., 2001), they ensure that the root node of the multicasting tree does not handle all requests to join or leave the multicast group; In (J. Li & Lim, 2005), use multiple interior-node- disjoint trees to avoid single points of failure in tree structures; In (S. El-Ansary & Haridi, February 2003) The earliest 4 Chapter 1. Introduction DHT-based broadcasting work by El-Ansary, et al. did not address the issue of dynamic membership. Moreover, in those system, one node join into system through an arbitrary way, thus, the multicast tree can be built inefficiently and not balance in structure. The solution for assigning an appropriate number of children to each node is far from optimal in term of bandwidth: If the number of children is too high, low capacity node will be overload, therefore slows the entire session multicast down. If the number of children is too low, high capacity nodes will be used inefficiently. In this thesis, we study the method to optimize topology for DHT-based multicast. We propose a DHT- based bandwidth adaptive multicast system that forcus on host heterogeneity, scalibility, fault tolerate. System chosen for implementation and avaluation is Chord, a DHTs representative. We extend Chord to be more flexible and it is called BAM-Chord: Bandwidth Adaptive Multicast Chord. Our method can make tradeoff between depth of the multicast tree and out-degree of every node and retain the advantages of a DHT network. 1.2 Objectives This thesis aims our works at building a multicast system over DHT-based network. Hence, we will study to attain the following objectives: • Study P2P network, application layer multicast, DHT-based network and DHT-based multicast system • Study topology optimization problems for DHT-based multicast; • Study models have been proposed on topology optimazation for DHT-based multicast, advantages and disadvantages of those model • Propose a improvement model for deploying multicast over DHT network: BAM-Chord • Evaluate the simulation results and compare with some existing model. 1.3. Contributions 1.3 5 Contributions In the scope of this thesis,we propose a model for deploying multicast service over DHT-based network. This thesis not only introduces the background of P2P network, DHT-based network, overall view of DHT-based multicast and related technical issues, but also makes the following contributions: • Proposes an improvement that can optimize network topology. This makes multicast tree more balanced and node’s bandwidth adaptation. This • Proposes an algorithm to find an optimal position when node joins into multicast session. • Proposes a improvement model that can address technical issues in earlier existing systems 1.4 Thesis structure Thesis is organized in 5 chapters. Besides the Introduction in Chapter 1, the rest of this thesis is organized as followed: • Chapter 2 introduces multicast, IP multicast, application layer multicast, DHT-based network, multicast over DHT-based network and related works on topology optimization for DHT-based multicast • Chapter 3 descirbes proposed model for deploying multicast over Chord, a DHT representative: Banwidth adaptive multicast over Chord: BAM-Chord • Chapter 4 shows the simulation and evaluate the simulation results in comparison with Chord, CAM-Chord • Chapter 5 concludes our work and points out some possible improvements of our system in the future. Chapter 2 Background 2.1 2.1.1 Multicast Introduction Unicast is a basic type of transmission in which information is sent from only one source to only one destination. In another words, Unicast transmission is between one-to-one nodes. This type of transmission is used at almost network protocols: IP, TCP, UDP, HTTP, FTP,.... However, when one sender sends the same data to many receivers, it is not effective to use unicast. Broadcast is a type of transmission in which information is sent from just one source but is received by all destinations connected to the network. This would mean that every time a node use broadcast transmission, all the other node will receive that information. An example of broadcast is ARP (Address Resolution Protocol) which will broadcast the address resolution request to all other computers on the network. Another example is DHCP (Dynamic Host Configuration Protocol) Figure 2.1: Type of transmissions 6 2.1. Multicast 7 Multicast is a very much different from Unicast and Broadcast in definition and application as well. It is a type of transmission or communication in which the information is sent by one source to a set of destination. Many people think is an excellent transmission method for multimedia: Internet radio broadcast, television broadcast, video conferencing, stock market tickers, slide presentations, etc. However, multicast is also suited to a large number of other applications: file transfers to multiple locations, dynamic web page updates, online gaming, news feeds, chatrooms,... Multicast is the most efficient method of delivering the same data to multiple receivers at the same time. Servers send only one data stream to reach any number of end-users. Using unicast, a server would need to send out as many streams as there are receivers. This increases the CPU load, bandwidth required to reach that audience. Using multicast, there will only be a single copy of the data streamed across any links, therefore, can decrease bandwidth without upgrading links and routers. 2.1.1.1 IP Multicast The type of communication between one sender and one receiver (a connection between a Web browser and a server) can be addressed by basic TCP/TP network. However, there is an emerging class of applications require multi-point communications between one or more senders and several receivers in order to work well. In this case, data is sent from a sender to the receivers asynchronously as needed. IP Multicast is designed to solve this problem. First introduced by S. Deering in 1988 [1], IP multicast is a method of sending Internet Protocol (IP) datagrams to a group of interested receivers in a single transmission. In IP multicast, router has to be equipped with multicasting capabilities. Router actively participate in multicast and has a very important role. Routers make copies of packets as need and forward towards multicast receivers. In this model, all interested host form a multicast group for communicating with each other through sending a message to multicast router using multicast group membership discovery protocol as: IGMP (IPv4) or MLD (IPv6). IMGP stands for Internet Group Management Protocol and MLD stands for Multicast Listener Discovery. After joining to a multicast group, a host can receive all data that are sent for that belonging multicast group without knowing the source address. Each 8 Chapter 2. Background Figure 2.2: IP Multicast of the multicast group is identified by a special multicast address called class-D address (224.0.0.0 through 239.255.255.255). IP muticast has all of advantages of multicast transmission. However, important issues such as routing, group management, address allocation, authorization and security, quality of Service (QoS) and scalability are not addressed effectively, IP Multicast has not been widely deployed. Due to the increasing number of application such as video-conferencing, multiparty games, private chat rooms, web cache replication and database/directory replication, and a lack of widely deployment of IP multicast in all IP-based networks, there has been interest in multicast protocols that can be supported without relying on the IP multicast infrastructure, and do not depend on multicast-capable routers. This is the so-called application-layer multicast (ALM). 2.1.1.2 Application Layer Multicast The concept of application level multicast is implementation of multicasting functionality at application layer instead of network layer. As a result, the multicastrelated functionalities are moved from routers to end-hosts. The basic idea of application-layer multicast is shown in Figure 2.3 : S establishes unicast connections and send data to D1 and D2; In turn, D1, D2 delivers data to D3, D4 via unicast connection respectively. In IP multicast, data packets are replicated at routers inside the network. In 2.1. Multicast 9 Figure 2.3: A Comparison between IP multicast and application-layer multicast ALM, data packets are replicated at end hosts. Logically, the end-hosts form an overlay network, and the goal of ALM is to construct and maintain an overlay for data transmission. Since IP Multicast is implemented by network routers and avoids multiple copies of the same multicast packet on the same link. ALM is implemented by application end-host and can not avoid multiple copies of the same packet on the same link. Thus, ALM is less efficient than network layer multicast. However, compared to IP multicast, application level multicast has some advantages: • First, most proposed ALM models do not require any special support from network routers, therefore, ALM can be widely deployed. • Second, ALM is more easily deploy than IP multicast because ALM avoids issues related to iner-domain multicast. • Third, in recent years, numbers of improvements in ALM technology overcome its weakness and ALM become a up-and coming solution, DHT-based multicast is an example. 2.1.2 Application layer multicast protocols There have been a lot of ALM protocols with a different approaches and characteristics. In this thesis, we only review and give brief introduction about some important categories and general approaches of different protocols as presented in (M. Hosseini, 2007). 10 Chapter 2. Background 2.1.2.1 Application Domain The application domain determines the number of users that a protocol must support, the data types a protocol supply users, and the metrics that multicast tree attempts to optimize. Thus, ALMs deploy as some categorization: • Audio/video streaming: Protocols in this application domain usually involve a single source distributing media to a large number of receivers. The metric is bandwidth and latency • Audio/video conferencing: Protocols in this application domain usually involve a multi-party conferencing session, interacting groups in this session have small to medium size. Important metrics are bandwidth and latency. • Generic multicast service Protocols in this application domain usually try to create a generic multicast service based on specific metrics that can affect a variety of applications. • Realiable data broadcast and file transfer: Reliable transfer and distribution of files. Bandwidth is the only metric. 2.1.2.2 Deployment Level Deployment level determines what level the protocol is expected to be deployed: at the infrastructure level or end system level. Infrastructure-level, also known as proxy-based ALM protocols, requires the deployment of dedicated servers/proxies on the Internet where they self-organized into an overlay network and provides a transparent multicast service to the end-user. End system level ALM protocols assume only a unicast service from the infrastructure and expect end-system hosts to participate in providing the multicasting functionality by taking on some of the forwarding responsibility 2.1.2.3 Group Management After deciding what is application domain and deployment level of ALM protocol, designer must make some key decisions to determine how to manage a group of nodes in a multicast session. Some key factors are considered: How users find out the multicast session, how users join a session, how users leave, can the users still contribute to existing multicast session even if they are not a part of it; Group 2.1. Multicast 11 management is centralized or distributed and how this affects the design and service provided; Mesh-first approach or tree-first approach is taken? The advantages and the disadvantages of each approach? Mesh-first versus Tree-first In the mesh-first approach, the mest topology is created at the begining, member nodes keep a connected in this topology. Then multicast tree is built based on the mesh relative of nodes to the root. In the treefirst approach, the multicast tree is built directly without any mesh. The member nodes explicitly select their parent from the known members in the tree. Parents is chosen based on running algorithm to detect and avoid loops. The advantages of tree-first approach is that this approach gives direct control over the tree, mer nodes can select a best parent neighbor that has enough resources. Another advantage of the tree-first approach is that it has a lower communication overhead. However, the disadvantage is that when a member changes a parent, it drags all of it descendents with it. The advantage of the mesh-first approach is fault tolerance, it is possible to manipulate the tree topology to a significant extent by selecting mesh neighbors and changing the metrics. This approach is suitable for multi-source applications. The disadvantages of this approach is that it is difficult to do load balancing and it has higher control overhead. Source Specific Tree versus Shared Tree Source-specific tree is built by minimizing the length of the path to a specific individual destination. Shared tree is built by minimizing the total number of hops or the cummulative end-to-end delay to forward the packet to all the destination. The choice between a source-specific tree and shared tree usually depends on whether the multiple sources use the same overlay for data distribution or not. When there is a multiparty communication, shared trees are preferred because it is better than source specific tree in term of the maintenance cost. On the other hand, source specific tree allows for optimization of the tree for a given source, but cannot support efficiently multiple sources on that tree 2.1.2.4 Routing Mechanism Group 1: Shortest Path This group of ALMs use RTT measurement to determine the shortest path tree from the source to the end hosts and minimize the time delay for each applicastion while considering the degree constraint and QoS. A Shortest Path Tree constructs a minimum cost path from source node to all its receivers. For example: Yoid, SpreadIt, TAG... 12 Chapter 2. Background Group 2: Minimum Spanning Tree This group tries to construct a low cost tree. A Minimum Spanning Tree is a tree with minimum total cost spanning all the members. For example: ALMI, HBM Group 3: Clustering Structure This group constructs a cluster of nodes that can be use to construct trees. For example: NICE, ZIGZAG Group 4: Peer-to-Peer Structure In P2P structure, Reverse-path forwarding or Forward-path forwarding is used for routing 2.2 2.2.1 DHT-based multicast Introduction of P2P Networks In simplest form, a peer-to-peer (P2P) network is created when two or more computers are connected and share resources (such as processing power, disk storage or network bandwidth) without going through a server. Contrast to client-server networks, in P2P network, peers are both suppliers and consumers of resources. A P2P network model allows many PCs to pool their resources together. Figure 2.4: P2P Network When Shawn Fanning created Napster in 1999, he did not imagine that today, P2P networks have been receiving increasing demand from users. P2P networks are decentralized, self-organized, and dynamic, they offer an alternative to the traditional client-server model of computing. In contrast to Client-server architecture, P2P networks are almost unlimited in their scalability. In recent years, due to the advantages of P2P networks, they have been widely used in many ares: Instant Messaging, File Sharing, Collaborative Community, IP 2.2. DHT-based multicast 13 Telephony, High Performance Computing - Grid Computing the P2P networks have become a focus of research at almost every major university. 2.2.1.1 Unstructured P2P Network model Unstructured P2P network is form when each participating node connects to others arbitrarily, and such networks can be easily constructed. When a new peer wants to join the network, it can copy links of another existing node then forms its own links. Figure 2.5: A unstructured P2P Network with a server In an unstructured P2P network, if a peer wants to find desired data in the network, the query has to be flooded through the network to find as many peers as possible. However, there is no way to guarantee that flooding will find a peer has the desired data. Thus, the main disadvantage with unstructured P2P networks is that the queries may not always be resolved. When a peer looks for data shared by only a few other peers, its search will be likely unsuccessful and hence such networks typically have very poor search efficiency. Also, the bandwidth cost of searching on unstructured P2P network would grow exponentially to the number of connected nodes. The typical unstructured P2P networks is Gnutella. Peer nodes of the system are organized randomly in unstructured overlay and described as servent. A new node, who wants to join in the system, would perform an operation ”Join”. Firstly, it would contact with a bootstrap node. Then bootstrap node would send back its 14 Chapter 2. Background own contact list of existed node, which is chosen randomly. The new node would store this list of the existed node as its neighbors. The new node may reach other nodes in the communication network base on its neighbor. Following figure shows an example of locating resources in a Gnutella-like P2P environment. Figure 2.6: Gnutella-like P2P 2.2.1.2 Hybrid P2P Network model In Unstructured P2P Network model, there is no correlation between a peer and its data, queries might not always be resolved. In hybrid P2P network model, problems of routing and lookup in unstructured P2P Network are solved . There peer nodes are divided into two types, data nodes and hub nodes. Data node stores real data and information of a hub node. Hub node is the same as a ”server”, it do not store real data. It store only indexes to files in the network. To join in the system, each node has to contact with a server. Since, the server would create indexes for these files, and then it stores these indexes in its database. As unstructured P2P Network model, Hybrid P2P network is easily constructed because it does not need to implement distribution and routing algorithms. Moreover, Hybrid model does not use query flooding or random searching, since it is more efficient than unstructured model. However, this model is essentially based on ”servers” to guarantee successful routing and searching, system can not exist if those servers get error. In Hybrid P2P Model, eDonkey network is a typical example. There is no central hub for the network. eDonkey client programs connect to the network to share files. 2.2. DHT-based multicast 15 Figure 2.7: An example of Hybrid P2P Model eDonkey servers act as communication hubs for the clients, allowing users to locate files within the network. 2.2.1.3 Structured P2P Network model Unlike Hybrid P2P network where some hub nodes acting as ”servers”, the Structured P2P Network is completely decentralized and self-organizing. Structured P2P model is constructed to achieve efficiently data searching, routing and distribution among peers. Each node has capacities of storing data and routing to other nodes based on a common routing algorithm. Nodes are arranged in a structured fashion. When a new node joins into the network, a network identify (ID) is assigned to it, this ID is created by using a global consistent hash function. Node ID represent its position on the network and correlatively, a portion of data corresponding to. When a node wishes to look for a resource, it must be redirected to the node which is supposed to hold it by using a efficient look up function. Structured P2P network is a scalable, efficient, completely decentralized and self-organizing, and load balanced model. Each node stores information of O(logN) neighbors. Routing algorithm allows it looking up any key with O(logN) hop time. Because of the routing algorithm based on globally consistent hash function, a set of keys are distributed equally to the key space. However, this P2P network model faces to some challengers such as: avoiding bottlenecks in particular nodes, adaptation to
- Xem thêm -