Đăng ký Đăng nhập
Trang chủ An improvement solution for multiple attribute information searching based on st...

Tài liệu An improvement solution for multiple attribute information searching based on structured P2P networks

.PDF
87
49433
115

Mô tả:

о ѵ ч .^ !N ỏ 广 An Improvement Solution for M ultiple A ttribute Information Searching Based On Structured P2P Networks N g u y e n T h a n h D a t Faculty of Information Technology Hanoi University of Engineering and Technology Vietnam National University, Hanoi Supervised by Doctor Nguyen Hoai Sổn A thesis s u b m itte d in fu lfillm e n t o f the re q u ire m e n ts fo r th e degree o f M a ste r o f In fo rm a tio n Technology Decem ber, 2009 Đ A I H Ọ C QUO C G IA HA NỘI TRUNG TẨM THÔNG TIN THU VIỀN A - L<^ ỗb ______ Table of Contents Acknowledgements ii A b stra ct 1 1 In tr o d u c tio n 2 1.1 Overview and M otivation..................................................................... 2 1.2 Coưim unicat.iọn netw ork m o d e l s ....................................... .............................. f> 1.3 P2P network m o d e ls ........................................................................... 9 1.3.1 P2P N e tw o rk ........................................................................... 9 1.3.2 P2P Network Models................................................................ 10 1.3.2.1 Unstructured. P2P N e tw o rk ....................................... 10 Hybrid P2P N etw ork .................................................. Structured P2 P N etw ork ............................................ 12 14 1.4 DHT-based P ro to c o l........................................................................... 15 1.3.2.2 1.3.2.3 1.4.1 Distributed Haah Table - DHT ............................................... 15 1.4.2 CHORD Protocol...................................................................... 16 1.4.2.1 Topology...................................................................... 17 1.4.2.2 Lookup and I n s e r t ..................................................... 1.4.2.3 Join and Leave.......................................................... 20 1.4.2.4 Stabilization and F a ilu re ................. 1.5 Summary 2 , . 18 ............... 21 ............................................................................................ 22 R e la te d W o rk s 2Л 24 INS/Twine :Information distribution based on Attribute-Value trees . 24 2.1.1 S o lu tio n .................................................................................... 24 2.1.2 System architecture................................................................... 26 2.1.3 System architecture................................................................... 26 2.1.4 S u m m a ry ................................................................................. 27 iii — 2.2 _____ TABLE OF CONTENTS CDS: Irifonnation D istribution Based O il Load Balancing M a trix 2.2.1 . . 28 S o lu tio n ....................................................................................28 2.2.2 System architccturc.................................................................. 28 2.2.2.1 Registering a content n a m e ...........................................29 2.2.3 System architecture...................................................................29 2.2.4 2.2.3.1 Query re solutio n....................................................... 30 2.2.3.2 S u m m a ry .......................................................................... 30 Load Balancing M a trix (LB M ) 2.2.4.1 2.2.5 2.3 The structure of L B M ...............................................31 System architecture.................................................................. 31 2.2.5 .1 2.2.6 .................................................... 30 LB M management mechanism....................................... 32 S u m m a ry ................................................................................. 33 Data In d e x in g ....................................................................................... 2.3.1 33 S o lu tio n ............................................................................................. 33 2.3/2 Insert a f i l o .............................................................................. 36 2.3.3 Lookups.................................................................................... 37 2.3.4 S u m m a ry ................................................................................. 38 2 ậ.4 SMAV: Searching - M ultiple-attribu te V a lu e ..........................................38 2.4.1 S o lu tio n .................................................................................... 38 2.4.2 Distribution of information c o n te n t......................................... 40 2.4.3 Information content name query...............................................42 2.4.4 S u m m a ry ................................................................................. 42 A n Im p ro v e m e n t S o lu tio n fo r M u ltip le - a ttr ib u te In fo r m a tio n Search­ in g on S tru c tu re d P 2 P N e tw o rk 3.1 44 I d e a .................................................................................................................45 3.2 Three Levels MappingModel 3.2.1 Overview .............................................................. 46 ...................................................................... .. 3.2.2 Thrce-Levclb Sub-key Mapping S to rin g ................................... 48 3.2.3 Distribution of information c o n te n t......................................... 50 3/2.4 Inform ation q u e ry ............................................................................. 54 3.2.5 S u m m a r y .......................................................................................... 57 3.3 The Dynamic ThresholdV alue s............................................................57 3.3.1 A formula of threshold v a lu e s .................................................. 57 3.3.2 Adjusted Distribution A lg o rith m ............................................ 59 3.3.3 Updating Threshold Value Periodically....................................61 46 TABLE OF CON TENTS 3.1 4 _____ ____ _________ v 3.3.4 Adjusted Lookup A lg o rith m ..................................................... 62 Summary ............................................................................................. 64 S im u la tio n s and E v a lu a tio n s 65 4 .1 Q ualitative 65 4.2 Simulation D e s c rip tio n ................................................................................ 67 4.3 Evaluation Based On S im u la tio n s ..............................................................68 E valuations................................................................. 4.3.1 Load balancing......................................................................... 68 4.3.2 Distribution of informationc o n te n t........................................... 71 4.3.3 Routing Performance........................................................ .. 5 Conclusions and Future W ork 74 77 List of Figures 1.1 Client/Server network m o d e l...................................................................... 8 1.2 Peer-To-Peer network model with 5 peers........................................... 10 1.3 Locating resources in a Gnutella-likc P 2P e n v iro n m e n t...................... 11 1.4 An example of H yb rid P2P M o d e l............................................................ 13 L5 15 1.6 Distribution data progress based on D H T ........................................... Chord's key space with 23 points . . ........... ..................................... 17 1.7 Chord's com ponents......................... .. .............................................. 17 1.8 Lookup progress of Chord*s protocol with key 5 4 ............................... 19 1.9 Joining phase of a node in C H O R D protocol.................. ......................... 21 2.J Meta string and A V T rc o .....................................................................25 2.2 Architecture of IN S /T w in S y s te m ............................................................ 26 2.3 S p litting a rcbourcc description into s t r a n d s ......................................... 27 2.4 The architecture of CDS system ......................................................... 29 2.5 An example of distribution of AVs in nodes........................................30 2.6 The structure of Load Balancing Matrix for { 이, V i } ......................... 31 2.7 An example of described d a t a ...................................................................34 2.8 Sample File Queries .............................................................................34 2.9 Mappings between q u e rie s ..................................................................35 2.10 Query mapping for three descriptors ................................................ 36 2.11 An example of mappings t r e e ............................................................36 2.12 An example of a path of queries......................................................... 37 2.13 Key - sub-key m a p p in g s.....................................................................40 3.1 The number of hop levels of a content name in pure S M A V ............. 3.2 Mappings are created fr 이ฑ key k ị ............................................................ 49 3.3 The generation of distributed keys from a content n a m e ......................53 3.4 Block diagram of query progress of improved S M A V ......................... 55 49 LIS T OF FIG U R E S v ii 3.5 3.6 An example of query progress with 4 common k e y s ......................... 56 Combining of common keys and a uncommon k e y ............................ 61 4.1 The distribu tion of ЛѴ pairs in content n a m e s ...................................... 68 4.2 Number of inform ation contents stored in each of 5000 n o d e s ............ 69 4.3 The number o f queries is processeci by each of 5000 nodes.................. 70 4.4 4.5 Load balancing among nodes...............................................................70 Mappings stored in every node............................................................ 72 4.Г) Mappings is created by CNs in a. DSMAV b. S M A V .......................72 4.7 4.8 4.9 The number of keys stored in every n o d e ...........................................73 Level-к Sub-keys are created by three solutions ............................... 73 Logicai hop count required for each query ........................................ 74 4.10 The maximum number of hop level of three s o lu tio n s ...................... 75 4.11 The number of successful queries ...................................................... 75 List of Tables 1Л Comparison: Client/Server vs. P 2 P ......................................................... 9 1.2 Definition of variables for node ท. using m -bit identifiers .................. 18 2.1 Mapping tabic between distributed key and content n a m e s.............. 41 2.2 Mapping table between distributed key and s u b -k e y s ....................... 41 2.3 Mapping table between distributed key and uncommon k e y s ........... 42 3.1 Mapping Table between distributed key and content n a m e s .................51 3.2 Mapping table between level-2 sub-keys and distributed k e y s ............ 53 3.3 Mapping Table between keys and co n te n ts........................................60 viii Abstract Conventional information searching engines such as Google, Yahoo, and Wikipedia... support only Keyword-based searching on websites. They cannot search information in various kinds of resources such as personal devices like Laptop, PDA} Cell Phone or sharing files in P2P Network. Besides, DHT-based P2P networks such as Chord, CAN, Pastry can achieve cxact (!ucry (i.e. query of an exact key) with characteristic of scalability, efficicncy and fault-tolerate. However, in the Cítóe of complex queries such as range query or multiple-attribute query, pure DHT is not efficient since lots of query messages must be sent. In this thesis, we focus our intentions on m ultiple-attribute query on DHTbatícd P2P network. The big problem here is the unbalance among nodes due to the appearance of common attribute/value pairs (AV pairs) in content names. The main idea of our method is to lim it number of content items, which assigned to an ID by creating sub-IDs from multiple AV pairs if those AV pairs appear in lots of content names, to threshold value of each node. To reduce query cost, our system also keeps the mapping between an ID and its sub IDs if existed in the node responsible for the ID. Moreover, we store only mappings, which are created in distribution progress, to 2 nodes. Our method can achieve both efficiency and a good degree of load balancing even when the distribution of AV pairs is skewed. Our simulation result shows the efficiency of our solution in respects of lookup time and the degree of load balancing 1 Chapter 1 Introduction 1.1 Overview and Motivation With the unprecedented growth of information technology, today wecan see that information is appearing in everywhere. Information might be found in various kinds of resource« «uch as personal dcvices like Laptops, PDAs, Cell phones..., websites in the Internet, sharing files in P2P network 1.,. From the explosion of information, there are more and more information search­ ing demands in somewhere. Every day we need lots of information to communicate and work efficiently and easily. For instance, we search for weather forecast information before a trip or a picnic. We also search for information of the latast news of the day,refercncc« of a product to buy, information of market priccs, etc... In lots of cases, if we seize desired information quickly and exactly, we might have more suc­ cessful opportunities in communication and work. Therefore,information searching is a necessary demand in nowadays information age. The emergence of new applica­ tions and services will require an efficiency information searching system which can realize complex query on contcnt names in a sealable manner (พ . Adjie-Winoto & Liliey. 1999;Carzaniga Sz Wolf, 2001; Foster & Tuecke,2002). There are many large systems to allow searching information such as conventional search engines: Google, Yahoo. Amazon, eBay, Wikipedia... Google engine allows users to search information based on keywords on Internet. This engine can link to billions of websites to search information. Information of each website is described by keywords and then they are processed and stored in servers of Google. Conventional search systems often use Client/Server model where servers pro 9 1.1. Overview and M otivation vide searching services to clients. However, Client/Server model have some disad­ vantages. Firstly, it has limitation in scalability. Servers are made with high cost because it need a very big capacity of processing and storing. Secondly, each server may be a single point of failure. When server goes down,operations will be ceascd. Moreover, as the big number of simultaneous client requests to a given server increaiies. the server can become overloaded. When a big amount of clients join to the network, traffic congestion on the network has also been an issue. Rcccntly, the appcarancc of Pccr-to-Pccr (P2P) network model has attracted the interest of lots of people, P2P with their decentralized control, self-organization and adaptation have emerged as a significant social and technical phenomenon over the last year. Unlike Client/Server model, P2P networks aim to aggregate largo num­ bers of computers that join and leave the network frequently. In pure P2P systems, individual computers communicate dircctly with each other and bharc information and resources without using dedicated servers. For example, they provide infrastruc­ ture for communities that share CPU cycles (e.g., SETI@Home, Entropia) and/or storage spacc (e.g., Napster (Idit Kcidar, 2006; Napster, 1999). FrccNet, Gnutella (Gnutella, 1999)). or that support collaborative environments (Groove). In P2P networks, all clients provide resources, including bandwidth, storage spac(î, and computing power. If there are more and more many nodes to join to the svstem, the total capacity of the systcn would be more and more increase. This is not true of Client/Server network model with a fixed set of servers, in which adding more clients could mean slower data transfer for all users. The distributed nature of P2P networks also increases robustness in case of failures by replicating data over multiple peers, and by enabling peers to find the data without relying on a centralized index server. In the latter ease, there i« 110 single point of failure ill the system. Information searching on P2P network is attended in recent years. Advantages of P2P network model allows us to construct information searching systems with capabilities of scalability and fault-toleratc. Bccausc of the whole of data of system are distributed to all nodes; each node is responsible for a portion of data and to take part, in search progrevss. The Gnutella network (Gnutella, 1999) supports to share and search file«. It searches data by flooding messages to the whole network. Nevertheless? Gnutella network requires high overhead; the search may be failed because a query may be not routed to the node is responsible for desired information. Hence, it leads to search information inefficiently. eDonkey (Weikum, 2002) network C h ap ter 1. Introduction suits to share and scarch big files such video files, fu ll music albuma and software based on some nodes (which are called indexing servers) that allow users to locate files within the network. However, maintaining indexing servers cause ฯcentral point of failure” • If indexing servers ccasc operates, a query may be not sent to the nodes whose data are indexed in these servers. In general, scalability, efficiency and faulttolerance in message routing and inform ation query on P2P networks are essential problems th a t must be resolved. The adoption of D istributed Hash Table (D H T ) such as C AN (ร. Ratnaäamv к Karp, 2001), Chord (Stoica et al.. 2001). Pastry (A. Rowstroib 2001),etc offers a promising solution for routing problems in P2P network. A DHT-based network constructs a structure of a v irtu a l key spacc where cach node is responsible for a portion of key space7 and keys are used as destinations to route messages. Ill the case of Chord, it can route a message to a node responsible for a destination kcv in O(logN) hops while each node only needs to maintain O(logN) links to other nodes, where N is the to ta l of nodes in the network. Some typical DHT-based network are BitTorrcnt, the Storm botnct. the Kad network, YaCy. W ith above organization. DHT-hasod network meet, a problem. I t only achieve exact queries. In DHT-based network, inform ation contcnt. is represented by a key from its description. I f users want to scarch this inform ation content, they need to send w ith complete query to the network. In fact, if users only remember a portion of inform ation content, they would be not able to search this inform ation contcnt. In the сабе of complex queries such as range query or m ultiple-attribu te query, pure D H T is not efficient since a lot of query messages must be sent. In this thesis, we focus our intentions on DHT-based P2P networks and the realization of m u ltiple-attribu te searching system based on DHT. Simple DHT-ba*scd distribution of inform ation contents can be used to realize exact query (i.e. querying information contents which have the same content name as the name in the query) efficiently. However,it is not efficient to perform complex queries such as multiplea ttribute queries. It is because the number of inform ation contents th a t match a complcx query is often large and therefore a lot of query messages must be sent to resolve a query. To realize multiplc-attribute queries efficiently on DHT-based P2P network, a framework b u ilt above the D H T is necessary. Several papers (Gao, 2004; Y- Arakawa et al.,2005; M. Balazinska,2002; GarcesErice & Ross, 2004) have proposed solutions for above problems. In these solutions, content names are represented by attribute and value (AV) pairs. In (Gao, 2Q04; .1. Overview and M otivation Y. Arakawa et al.. 2005),cach AV pair of a content name iỉ> hashed to a key. In (M, Balazinska,2002; Garces-Erice & Ross, 2004), a content name forms an attributed/value tree and each strand of an AV tree is matched to a key. Information content is then distributed to cach node rcbponsiblc for the key based on DH 丁 routing algorithm. Information query is realized by looking up the nodes that are responsible for keys created from a query AV pair. However, the distribution of information content ba«ed on each AV pair or each strand of AV tree causes the load unbalancing since there may be common AV pairs those appear in content names w ith high probability. There arc several solutions for this problem such as using m ultiple nodes in responsible for a key (Gao. 2004) or lim itin g number of content item« corresponding to a key in a node (Y. Arakawa et al., 2005). However, these approaches have not good balance between load balancing and query efficiency though there is a tradeoff between load balancing and query efficiency. In this thesis we propose a system callcd DSMAV (using Dynamic threshold values for Searching Multiple Attribute/Value) to distribute and query information content ba^ed on AV pairs on DHT-bai>ed network. The m ain idea of our system is to lim it the number of inform ation contents, which are distributed by a distributed key. to a reasonable threshold value. It is done by creating sub-keys from multiple AV pairs in a contcnt name if these AV pairs already appear in a lot of contcnt names. If the tota l number of inform ation contents of a sub-key is not over a threshold value^ it would be used as a distributed key to distribute its information content. Our system also keeps the mappings between sub-keys created from two common AV pairs and distributed keys created from more two common AV pairs to reduce query cost. O ur system satibfics three following requirements: • Efficiency Distribution :A content name will be distributed to a few of nodes. The number of the mappings and keys are distributed to every node reasonably. Mappings of level-2 sub-key and level-m sub-key (m I 2) are stored in the nodes th a t take responsibility for level-2 sub-keys. • Efficiency Searching: If a query name contains an uncommon AV pair, it will be resolved by querying only node which is responsible for the key created by the uncommon AV pair. I f a query name contains only common AV pairs, query messages w ill be sent quickly to nodes responsible for prim ary key and level-2 sub-key. The node, which is responsible for level-2 sub-key, also use mappings between level-2 sul>key and other sub-keys to route query messages 6 C h a p te r 1. Introduction to other nodes, which store queried contents. • Load balancing: The number of information contents those are corresponding to a distributed key is not over a threshold value. Hence, even if the distrib ilio n of AV pairs in content names is skewed, the number of inform ation contents stored in cach node can be kept balanced naturally. O ur solution uses DHT-baised protocols to route messages. The keys, which are created by one or more AV pairs in the system, are routed by DHT-based protocols such as CHORD. However, the inform ation contents, which include a set of attribute-value pairs, arc distributed to the node« effectively. An inform ation content, which contains common AV pairs, is distributed to a set of nodes by using keys, which are hashed from these common AV pairs. Sim ilarly, a query process also hash one or more AV pairs of a query name into keys, which are sent to the nodes take responsibility for the keys. Returned result would be a list of inform ation contents that are matchcd w ith AV pairs of the query name. O ur system can achievc both cfficicncy and a good degree of load balance even when the distribution of AV pairs ill content names is skewed. O ur sim ulation shows the effectiveness of our proposed system. Our thesis І8 organized in 5 chapters. In first Chapter^ we would introduce the overview of P2P network and P2P network models. W ith Chapter Two. we describe conventional solutions for M ultiple-attributes inform ation searching aỉá our related works. Then, we proposed a solution for M ultiple-attribu ted inform ation searching based on dynamic threshold values on Structured P2P Network in Chapter Three. Baling on our proposed solution, in Chapter Four wc implement simulations and evaluate the solution qualitatively and quantitatively. The last of all, we summarize carried works and talk about trends of development in the future in Chapter Five. 1.2 Communication network m odels W ith the rising up of requirem ent of searching inform ation everywhere» there are more and more people, who might want to share, search or download data from network environment. Today,network environments allow users to create and share information resources together Besides, the appearance of Internet, where contains frequently updated inform ation, supports user's searching inform ation whenever and 1.2. Com m unication netw ork models 7 whenever they need. Users can search latest news, inform ation of a book, reviews for business products, or download resource fr 이n on Internet easily. To do this, some inform ation searching systems al lo พร users to search data from everywhere by ubiquitous com puting models. I biquitous com puting is a post-desktop model of human-computer interaction in which inform ation processing has been thoroughly integrated into everyday objects and activities. In the progress of ordinary activities, som eone, •using” ubiquitous computing engages many com putational devices and systems simultaneously, and may not necessarily even be aware th a t they are doing so. For this reason, ubiquitous computing model is used in many inform ation searching systems. In these systems, the term MDistributed information” is described as lots of inform ation resources th a t can connect together based on a communication network environment. They contain devices, which store inform ation, such as P D A ’S,Mobile phone. Laptops… Their data may be pieces of information such as the information of a laptop or a book, or data files such as PDF, M P 3 ,Video files.... To support users searching the inform ation flexibly, the systems use characteristics of an object to represent inform ation. In other hand, each object may be described by M ultipleattributes based on its characteristic. For instance, a laptop’s attributes can be described by its characteristics as name of manufacturer, CPU, R A M , HDD... I f an object is represented by more and more attributes, it would be more and more found easily. For this reason,an information searching system w ill store lots of data, which are created by objects’ representations. It hence needs to analysis and organize data. Reasonably choosing of a communication network model and implementing a well protocol would increase performance and effective of the system. There are two typical communication network models, which a M ultiple-attribu te inform ation searching system may be able to use, namely C licnt/S crvcr and P2P network. Client/Server network model (Figure 1.1) is used widely and popularly everywhere. This model is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters, called clients. Often clients and servers operate over a computer network on separate hardware. A server machine ib a high-pcrformance host that is running one or more server programs which share its resources w ith clients. A client does not share any of its resources, but requests a server’s content or service function. Clients therefore in itia te com munication sessions w ith servers which await incoming requests. W ith such architecture, C lient/Server network model has some advantages and C hapter 1. Introduction CUENT ff lì 广 ÍỄÊ3 m ะJ i î Figure 1 . 1 :C lient/Server network model disadvantages. In most cases, it allows the roles and responsibilities of a computing system to be distributed among several independent computers th a t are known to each other only through a network. This creates an additional advantage to this architecture :ease of maintenance. For example, it is possible to replace, repair, upgrade, or even relocate a server while its clients remain both unaware and un­ affected by th a t change. A ll data are stored on the server, which generally have better security controls than most clients. Servers can control access and resources, to guarantee th a t only those clients w ith the appropriate permissions may access and change data. Since data storage is centralized^ updating data are administered easily. It functions w ith m ultiple different clients of different capabilities. However C lient/Scrvcr model may be not reasonable to constructs distributed inform ation searching systems because its disadvantages. Traffic congestion on the network has been an issue since the inception of the Client/Server network model. The server may become overloaded if it processes a lot of requests from the big number of simultaneous clients. I t may be hard to achieve scalability w ith a single point of failure. In the case the server is damaged, clients' requests cannot be fu lfille d and the system w ill cease. Moreover, activities of server require adm inistrators who are knowledge­ able experts about network and system. This increases cost of management and operation. In contrast w ith the C licnt/S ervcr network model 1 P2P network model, where it aggregated bandwidth actually increases as nodes are added, since the P2P net­ work's overall bandwidth can be roughly computed the sum of the bandwidths of every node in that network. It is very suitable for distributed network architecture 1.3. P 2P n e tw o rk m o d e ls 9 Table 1.1:Comparison :Client/Server vs. P2P Client/Server P2P A s y m m e tric : client and servers carry out Sym m etric: each node carry out the ьаше difFcrcnt tasks tasks; No distinction between Client and Servei G lo b a l knowledge: servers have a global Local knowledge: nodes only know a small set of other nodes view of the network D ecentralization: no global knowledge, C entralization: Communications and management arc centralized only local interactions Robustness: several nodes mav fail with Single p o in t o f fa ilu re : a server failure little or no im pact brings down the system H igh scalability: High aggregate capacity, L im ite d scalability: servers easily load d istribu tion overloaded E xp ensive: server storage and bandwidth Low-cost: storage bandwidth is contributed by users. capacity is not cheap with composing of participants that make a portion of th e ir resources (disk storage, network bandwidth or processing cycles). Peers are both suppliers and consumers of resources, in contrast to the traditional Client-Server model where only servers sup­ ply. and clients consumci. Organization of P2P network allows overcoming problems of Client/Server network. Table 1,1 shows comparison of C licnt/Servcr and P2P model. In next section, we describe P2P network architecture w ith its advantages and disadvantages. 1.3 P2P network models 1.3.1 P 2 P N etw ork Contrast to client-server networks, where network inform ation is stored Oïl a cen­ tralized file server PC and made available to tens, hundreds, or thousands of client PCs, the inform ation stored across P2P networks ib uniquely decentralized. A P2P network model allows ใฑany PCs to pool their resources together. Individual re­ sources like Desktops. Laptops, PDAs, and storage devices... are transformed into shared, collective resources th a t arc accessible from every PC. Because P2P PCs have their own hard disk drives th a t are accessible by all com­ puters. each PC acts as both a client (information requestor) and a server (informa- 10 C h ap ter 1. Introduction tion provider). Ill Figure 1,2, five P2P workstations arc shown. A ll five computers can conimunicatc directly with each other and share one another.s resources. Figure 1.2: Peer-To-Peer network model with 5 peers A P2P system is a distributed collection of peer nodes. It may provide services to other peers and consume services from other peers. Data of each node may be a portion of common data of system. It means that common data of systeixi is distributed to whole of possible nodes. Each node stores a portion of common data and related services. Therefore, the load of storage and processing data is also divided into several parts, which correspond with peer nodes. Each node stores and processes a small number of pieces data, so if some node fail, the network system І8 almost not affcctcd seriously. As a completely decentraiized model, it allows the development of applications with :high-availability. fault-tolerance, and scalability such as Sharing of content ap­ plication (Fire sharing, content delivery network, Gnutella. eMule, Akamay), Shar­ ing of storage (Distributed file system), Sharing of CPU time (Parallel computing, Grid) There are three models of P2P network consisting of Unstructured, Hybrid and Structured P2P network. Each model has individual advantages and disadvantages. 1.3.2 P 2 P N etw ork M odels 1 .3 .2 .1 U nstru ctu red P 2 P N etw o rk Unstructured P2P network is an unregulated overlay where each participating node connects to others rand이nly and arbitrarily, act as equals and merge the roles of 1.3. P 2P netw ork models 11 clients and server. Unstructured P2P network has no central server managing the network, neither is there a central router. Joining of a new node based on a wellknown node that, is called a bootstrap node. The bootstrap node is flexible in the now node's neighbor selection and routing mcchanisms. Unstructured P2P network has profound impact on the efficiency of search. The typical Unstructured P2P Network system is Gnutella (Gnutella;1999) (Figure 1.3). For the purpose of files sharing. Gnutella system is constructed basing on Un­ structured P2P model. Peer nodes of the svstem are organized randomly in Unstruc­ tured overlay. Data of the system consist of files such as mp3, image, and video, which each of whole peer nodes may share to other nodes. A new node, who wants to join ill the system, would perform an operation ” Join” . Firstly, it would contact with a bootstrap node. The bootstrap node would send to back a li«t of existed node, which is choscn randomly. Then, the new node would store and relate ship to the list of the existed node as its neighbors. From the neighbor nodes, the new node may reach other nodes in the communication network. Thence, the new node continue to get more addresses of other existed nodes from the nodes it can reach, and so on. Finally, the new node is existed as peer node of the system. Figure 1.3 shows an example of locating resources in a Gnutella-like P2P environment. « เ๏๏ n d s c ь.ч>кні Figure 1.3: Locating resources in a Gnutella-like P2P environment In Unstructured P2P Networks, the search progress is unstructured, result of a 12 C hapter 1. Introduction query may be not found. (Figure 1.3). Popular data is usually found easily because of they are stored by many nodes. The number of content of popular data is enough to locate a searched content by flooding the network. Using flooding allows the search to be performed easily and reliably in a highly connected overlay. However. Hooding has some disadvantages. Firstly. It creates a lots of duplicate query messages to send to a given node from its many neighbors. Secondly;it is diliicult to determine the appropriate T im e -to L iv e (T T L ) which controls the flooding progress. A high T T L allows achieving high scarch re liability but requires high overhead. Characteristics of the overlay affect the flooding effectiveness versus the overhead. Otherwise, a lim ita tio n of йсаіе-free topologies is the high load on very few number of hub nodes. Peers are not w illing to m aintain high loads as they may not. want to store large number of entries for construction of overlay topology. 1.3.2.2 H y b rid P 2 P N e tw o rk In Unstructured P 2P Network model, queries m ight not always be resolved. A l­ though popular data m ight be stored at several peers, but if a peer node look up data shared by only a few other peers, then it is highly likely th a t search w ill be not successful. Because of there is no correlation between a peer and the data, which is managed by it. There is no guarantee th a t flooding w ill find a peer th a t has the dc\รircd data. Flooding also causes a high amount of signaling traffic in the whole network and hence such networks have very poor search efficiency. W hile the structure of the hybrid P2P network model m ight tacklc problems of routing and lookup in U nstructured P2P Network. In that, peer nodes are divided into two types, data nodcb and hub nodes. Data node stores real data and informa­ tion of a hub node. Hub node is the same as a ,1server Î it do not store real data. I t store only indexes to files in the network. To jo in in the systemโ each node has to contact w ith a server and to provide real data,which it wants to share to others. Since, t he server would create indexes for these files, and then it stores these indexes in its database. Figure 1.4 shows an example of H yb rid P2P Model. Servers are entrusted w ith routing tasks. Each server store indexes to files and inform ation of data node and other servers. Each data node would store data files and information of a server. Data files are distributed among participating data nodes. However, to look up data, each node needs to use routing inform ation th a t is stored centrally in servers. 1.3. P 2 P netw ork m odels 13 Figuře 1.4: An example of Hybrid P2P Model Figure 1.4 also shows an example of lookup a file in H yb rid P2P Model. Searching node would send query message to its server. The server looks up filcs^ locations in the list of indexes ill its database. It also sends query requests to other servers simultaneously. Those servers would search for indexes corresponding to desired files and return the files* locations to the searching node. Finally, the searching nodo would contact the data nodes, which contain desired files, and download real files dircctly. In this example, node A would connect to node в and then to download data files from node в directly after getting location inform ation from servers. Being a decentralized network model,eDonkey network is typical H ybrid P2P Model. It best suited to share big files among users. It. allows sharing video files, full music albums and softwares. There is no central hub for the network. Data file« are distributed among peer nodes. eDonkey servers act as communication hubs for the clients, allowing users to locate files w ith in the network. Like Unstructured P2 P Network model. Hybrid P2P network is implemented easily becausc of it does not need to implement distribution and routing algorithms‘ Furthermore, because of servers undertake lookup the locations of data files, Hy­ brid model does not use query flooding or random searching like Unstructured P2P model, since it is more efficient. However, m aintaining indexing server cause Mcen­ tra l point of fa ilu re ř. If a server stops providing service, the data, which indexes are created by them ill the server^ database, would not exist in the system. Therefore, the whole system would cease if all servers stop operating.
- Xem thêm -

Tài liệu liên quan