An ant colony optimization approach for phylogenetic tree reconstruction problem

  • Số trang: 66 |
  • Loại file: PDF |
  • Lượt xem: 18 |
  • Lượt tải: 0
nhattuvisu

Đã đăng 26946 tài liệu

Mô tả:

Vietnam National University, Hanoi College of Technology Huy Quang D inh A n A nt Colony O p tim izatio n A p p ro ach for P hylogenetic Tree R eco n stru ctio n P ro b le m M ajor : Information Technology Code : 1.01.10 M ASTER THESIS Advisor : Prof. A rndt von Haeseler Co-Advisor : Dr. Hoang Xuan Huan Hanoi - December, 2006 C o n te n ts 1 A b s tr a c t *ii D e c la ra tio n IV A c k n o w le d g e m e n ts v In tr o d u c tio n 1 1.1 M o tivation........................................................................................................ 1 1.2 2 1.1.1 Com putational B iology.................................................................... 1 1.1.2 Phytogeny R e c o n stru c tio n .............................................................. 2 Thesis Works and S t r u c t u r e ....................................................................... 3 P h y lo g e n e tic T re e R e c o n s tru c tio n 4 2.1 4 Phylogenetic T r e e s ........................................................................................ 2.2 Sequence Alignment ..................................................................................... 7 2.2.1 Biological D a t a .................................................................................. 7 2.2.2 Pairwise and Multiple sequence a lig n m e n t................................. 8 2.3 Approaches for phylogeny re c o n stru ctio n ................................................. 11 2.4 Maximum Parsimony Principle .................................................................. 11 2.4.1 Parsimony C o n c e p t........................................................................... 11 2.4.2 Counting evolutionary changes ........................................................ 12 2.4.3 Remarks on Maximum Parsimony A p p ro a ch e s...........................14 2.5 Finding the best tree by heuristic searches . ........................................... 15 2.5.1 Sequential Addition Methods ........................................................ 15 2.5.2 Tree Arrangement M e th o d s ........................................................... 16 vi C ontents_____________________________________ _______________________ vn 2.5.3 3 3.2 The Ant Algorithms 20 .................................................................................. 20 3.1.1 Double bridge experim ents............................................................20 3.1.2 Ant S y s te m ..................................................................................... 22 3.1.3 Ant Colony S y s te m .........................................................................24 3.1.4 Max-Min Ant S ystem ..................................................................... 25 Ant Colony Optimization M eta-h eu ristic............................................... 27 3.2.1 Problem R e p re se n ta tio n ............................................................... 27 3.2.2 Artificial A n ts ............ ..................................................................... 28 3.2.3 Meta-heuristic S c h e m e .................................................................. 29 3.3 Remarks on ACO A p p licatio n ...................................................................30 3.4 ACO approaches in phylogenetics............................................................ 31 P hylogenetic Inference w ith Ant Colony O ptim ization 33 4.1 Related W orks............................................................................................... 33 4.2 Tree Graph D escription............................................................................... 34 4.3 4.4 4.5 5 19 A n t C olony O p tim iz a tio n 3.1 4 Other heuristic search m e t h o d s .................................................. 4.2.1 BD Tree C o d e ............................................... .................................. 34 4.2.2 State Graph D e sc rip tio n ...............................................................38 Our ACO-applicd A p p ro a c h ......................................................................39 4.3.1 Pheromoue Trail and Heuristic Information 4.3.2 Solution Construction Procedure 4.3.3 Pheromonc Update Chosen Procedure Simulation Results ............................ 40 ...............................................40 ..................................... 42 ...................................................................................... 42 4.4.1 Simulated Data..................................................................................43 4.4.2 Real D a t a ........................................................................................ 44 D iscussion......................................................................................................46 C onclusion a n d O u tlo o k 48 B ibliography 50 C o n te n ts viii A p p e n d ix .1 Probabilistic Decision R u le ......................................................................... 57 .2 Tree encoding from a BD tree c o d e ......................................................... .3 BD tree code Decoding a lg o rith m .............................................................58 .4 ACC) Solution Construction P ro c e d u re ................................................... 58 .5 Pltcromone Trails Update P r o c e d u re ...................................................... 59 .6 Algorithm for calculating evolutionary changes...................................... 60 57 57 List of Figures 1.1 The exponentially growth of nucleotide d a ta b a s e s ................................ 2 2.1 A looted tree of life....................................................................................... 5 2.2 Unrooted tree representation of annelid relationships .......................... 6 2.3 Three possible topologies of unrooted tree for four t a x a ....................... 7 2.4 An example of four types of nucleotide mutations (Nei and Kumar, 2 0 0 0 ).............................................................................................................. 9 2.5 Multiple Sequence Alignment E x a m p le ................................................... 10 2.6 An example for Fitch a lg o rith m ................................................................ 13 2.7 An example of sequential addition m e th o d ...................................... ... . 16 2.8 An example of Nearest-Neighbor Interchange O p eratio n ....................... 17 2.9 An example of Subtree Pruning and Regrafting Operation ................ 18 2.10 An example of Tree Bisection and Rcconnncction O p e r a t io n ............. 18 3.1 Experimental setup for the double bridge e x p e rim e n t..........................21 3.2 Results gained in the double bridge experiment 4.1 ................................... 22 Example of encoding a tree from a given BD tree code .......................36 4.2 Graph structure description with a*,iV p la n e ..........................................38 4.3 An example of tree building on the a,. N p la n e .......................................39 4.4 A found tree with 17 real species ix .............................................................45 List of T ables 2.1 Twenty different types of amino acids with corresponding codc . . . . 4.1 The number of instances for which the reconstructed tree and the generated true tree arc identical in the simulated data instances ... 4.2 Simulation results with real data of our proposed a p p ro a c h .............. 44 x 8 43 C h a p te r 1 I n t r o d u c t io n 1.1 1.1.1 M o tiv a tio n C o m p u ta tio n a l B iology Nowadays, based on the modern com puter technologies and the development of efficient sequencing technologies, a huge am ount of genetic d a ta is collected in many genome projects including GenBank(USA), EM BL-Bank(Europe), and DNA D atabase of Japan (DDBJ) (see Figure 1.1). The size of the GenBank database is extrem ely large: over 65 billion DNA ba.se pairs in 61 million molecular sequences J. This drastic growth of biological d a ta requires com putational tools for biological data (.so-called bioinformatics tools) being capable of handing a large-scale analysis. The term s bioinformatics and com putational biology are often used interchange­ ably. It is further emphasized th a t there is a tight coupling of developments and knowledge between the more hypothesis-driven research in com putational biology and technique-driven research in bioinformatics 2. A lot of approaches in com puter science have been applied to solve more and more complex problems in com putational biology (Baldi and Brunak, 2000); unfortunately almost, all such problems are N P-hard or NP-complctc. Therefore, heuristic search m ethods play an im portant role in tackling the com binatorial optim ization problems. 1htt p: / / www. n cb i.n lm . nih.gov / G e n b a n k / 2h tt p: / / www.b is ti.nill .gov/ C om puB ioD ef. pdf 1 2 1.1. M o tiv a tio n Figure 1.1: The exponentially growth of nucleotide databases Growth of the International Nucleotide Sequence Database Collaboration B.IS« P & rs by 'S H n fla rk ii— ** t M S i — OOBJ —• http://w w w .ncbi.nlm.nih.gov/Genbank/ Recently, Ant Colony Optimization (Dorigo. 1992) has been proposed and shortly afterwards has been recognized as one efficient method for finding an approximate solution for NP-hard problems. The first application is traveling salesman problem by inspiring by the real ants’s behavior when traveling from the colony to the food resource and transporting the food back. ACO technique is widely used in various types of combinatorial optimization problems including in bioinformatics (Dorigo and Stutzle, 2004). 1.1.2 P hylogeny R eco n stru ctio n Since the t ime of Charles Darwin, evolutionary biology has been a main focus among biologists to understand the evolutionary history of all organisms. Where the re­ lationship of the structure of the organisms is often expressed as a phylogenetic tree (Haeckel. 1866). Since the mid of twentieth century, the emergence of rnolec- 1.2. T h e sis W o rk s an d S tr u c tu r e 3 ul,u biology has given rise to a new branch ot study based on inolccular scqucnce (e.g DNA or protein). Moreover, phylogenetic analysis helps not only elucidate the evolutionary pattern but also understand the process of adaptive evolution at the molecular level (Nei and Kumar. 2000). In molecular phylogenetics, the sequences of the contemporary species arc given and one asks for the tree topology (including the branch lengths) which explains the data. It is commonly accepted that phytogenies arc rooted bifurcating trees, where the root is the most common ancestor of the contemporary species. The leaves represent contemporary species, and the internal nodes stand for spéciation events. Among plenty of approaches to rcconstruc phylogenetic trees, the statistic-based methods have been recognized as sound and accurate methods. Determining the best phylogénies based on optimality critcrions such as maximum parsimony, mini­ mum evolution and maximum likelihood was proved as NP-hard and NP-completc problems (Graham and Foulds, 1982; Day and Sankoff, 1986; Chor and Tullcr, 2005). 1.2 T hesis W orks and Structure In this thesis, we will build a general framework to apply ACO principle into phylo­ genetics and mainly deal with maximum parsimony. However, such approach can be easily adapted to any objective function. Our contribution is the formal description of framework to apply ACO mctaheuristics to solve the phylogcny reconstruction problem. Attempts to solve the phylogenetic reconstruction problem using ACO gained only a poor results partly because of the poor construction graph (Ando and lba, 2002: Kumnorkacw et. a l, 2004; Perrctto and Lopes, 2005). We proposed a mure general graph representation to overcome this problem. Except the introduction and conclusion, the thesis is organized into 3 chapters. The first chapter sketches the major problem of reconstructing phylogenetic trees from given biological sequences. The second chapter will show the general building block of ACO technique and application for solving the combinatorial optimization problems. The third chapter describes the main outcome of the thesis. It will de­ scribe our approach and some initial experiences to employ ACO into phylogenetics. C h a p te r 2 P h y lo g e n e tic T ree R e c o n s tru c tio n The goal of the phylogenetic tree reconstruction problem is to assemble a tree rep­ resenting a hypothesis about the evolutionary relationship among a set of genes, species, or other taxa. In this chapter, we will briefly introduce the main concept of phylogenetics and the state-of-the-art methods. In particular, we will concen­ trate on the maximum parsimony principle used as an objective function for our optimization approach discussed in chapter 4. 2.1 P h y lo g en etic Trees According to Charles Darwin’s evolution theory, all species have evolved from an­ cestors under the pressure of natural selection (Darwin, 1872). Evolutionary trees or phylogenetic trees in phylogenetics terminology arc the one way to display the evolu­ tionary relationships among species. A phylogenetic tree, also called an evolutionary tree , or a phytogeny is a graph-theoretic tree representing the evolutionary relation­ ships among a number of species having a common ancestor. Figure 2.1 depicts the phylogenetic tree of life consisting of three domains of all existing species: Bacte­ ria. Archaea, and Eukarya. In a phylogenetic: tree, each internal node represents ¿in unknown common ancestor th at split into two or more species, its descendants. Each external node or leaf represents a living spec ies, each branch has a length cor­ responding to the time between two splitting events or to the amount of changes that accumulated between two splits. 4 5 2.1. P h y lo g e n e tic T re e s B acteria A rch aea Eucarya 0 re* « & p ir o c h « « » t t»ac«efw» http://www.biologydaily.com /biology/Archaebactena Phylogenetic tree can be displayed as either a rooted tree or an unrooted tree. Figure 2.1 and 2.2 constitute examples of a rooted and a unrooted tree, respectively. The real unrooted phylogenetic tree of Annelida, the segmented worms including three major groups: Polychaeta, Oligochaeta (earthworms etc.) and Hirudinea (leeches), represents the most conservative representation of our understanding of annelid relationships in Figure 2.2. In a rooted tree, one has the information about the position of ancestral node. Whereas in the unrooted case, no such information is available and one can thus see how related the taxa are connected in the tree. Phylogcnctic applications usual produce an unrooted tree. To identify the root po­ sition. one often inserts an outer group one or several extra taxa not closely related to the original taxa, and observes the branch it joins to the tree. From now on, we only focus on unrooted trees. Unrooted trees can be bifurcating and multifurcating. In a bifurcating tree, each internal node has the degree three, while a multifurcating tree allows internal node of arbitrary degree. Typically, one assumes a bifurcating tree, i.e a speciation event, in the past leads to two lineages. Hence for the rest, of the thesis, we mean 6 2.1. P h y lo g e n e tic Trees Figure 2.2: Unrooted tree representation of annelid relationships Aeolosomatidao+Pocamodrilidae http://www.tolwcb.org/Annclida phylogenetic trees as unrooted and bifurcating. The branching pattern of a tree is called a topology or tree structure. In phylogenetic analysis, the branch lengths represent the evolution time a species needs to evolve into another specics. The phylogenetic T(S) tree is formally defined (Scmplc and Steel, 2003) on a set of N contemporary species S — {a'i, «2, ..., s.v} as a pair (T,tp) consisting of an underlying tree T — (V. E) with V is set of tree internal and external nodes, E is rhe corresponding set of edges and an injectivc map ip : S V. Thanks to this data structure used, the traversals on trees is easily performed by applying two famous strategics in graph theory, namely preorder and postorder traversals (i.e (Fitch, 1971)). Graph theory in general and graph data structure in particular play the very important role in phylogcnctic analysis. This traditional framework not 7 2.2. S e q u e n c e A lig n m e n t Figure 2.3: Three possible topologies of unrooted tree for four taxa c a b a b a b c d d c only helps build a perfect structure for phylogenetic trees but also provides a lot of efficient strategies from traversal to searching for optimization stuffs in the main reconstruction problem. T he n u m b e r of phylogenetic tre e s In general, the number of possible topologies for a bifurcating unrooted tree of in taxa is given by (2m - 5)!! = 1.3.5...(2m - 5) = (2m - 5)! 2m-3(ra - 3)! ( 2 . 1) for in > 3 (Cavalli-Sforza and Edwards, 1967; Felsenstein, 1978). There are only three unrooted trees of four taxa a, b, c, d as display in Figure 2.3; the smallest unrooted tree is often called quartet. In fact, finding the best topology based on almost all optimality criteria is intractable problem, for example with rri — 12 there are more than thirteen billion trees (Felsenstein, 2004). Therefore, heuristic searches are essential when the number of taxa becomes large. 2.2 S equ en ce A lignm ent 2.2.1 Biological D a ta The data in biology and nature is very diverse and abundant. Nowadays, one can study the evolutionary relationships of organisms by comparing their deoxyribonu­ cleic acid (DNA) since the blueprint of all organisms is written in DNA (or ribonu­ cleic acid RNA in some cases of viruses) (Nci and Kumar, 2000). DNA consists of the four types of nucleotides: Adenine, Cytosine, Guanine and Thymine classified into either purine (A and G) or pyrimidine (C and T) bases; Uracil is replaced by Thymine when considering the RNA sequences. Besides, another type of genetic 8 2.2. S e q u e n c e A lig n m e n t Table' 2.1: Twenty different types of amino acids with corresponding code 3-letter 1-lcttcr Methionine Met M C Asparagine Asn N Asp D Proline Pro P Glutamic Acid Gin E Glutamine Gin Q Phenylalanine Plie F Arginine Arg R Glycine C h­ G Serine Scr S Histidine ilis H Threonine Thr T Isoleueine lie 1 Valine Val V Lysine Lvs K Tryptophan Tip W Leucine Leu L Tyrosine Tyr Y 3-letter 1-letter Alanine Ala A Cysteine Cys Aspartic Acid Name Name sequences, amino acids including twenty different kinds listed in Table 2.1 (Brown <:t al.. 2002) are widely used in phylogenetic analysis. Both types of molecular sequences (nucleotides and amino acids) play an im portant role in molecular phy­ logenetics especially in phylogenetic inferences (Swofford et al., 1996; Fclscnstcin, 2004). From here, we assumed that the biological sequence d ata is molecular data. 2.2.2 P airw ise an d M ultiple sequence alignm ent As we known, one of the most important features in evolution is replicating gene in an organism. According to evolutionary theory, the genes in the later generation is not exactly copied from those in the previous generation be cause of the errors dur­ ing DNA replication or damaging effects of mutagens such as chemical and radiation (Brown et al.. 2002). Since all morphological characters of organisms arc ultimately controlled by the genetic information carried by DNA, any mutational changes in these character are due to some changes in DNA molecular sequences (Nei and Ku­ mar, 2000). There arc four basic types of changes in DNA: substitutions, insertions, deletions and inversions (Nei and Kumar, 2000) where all types except for inversions are point mutations (Vandammc, 2003). 9 2.2. S e q u e n c e A lig n m e n t Figure 2.4: An example of tour types of nucleotide imitations (Nci and Kumar, 200Ü) (C) Insertion (A) Substitution Thr Tyr Leu Leu ACC TAT TTG CTG Thr Tyr Leu Leu ACC TAT TTG CTG I 1 ACC TAC TTT GCT G Thr Tyr Phe Ala ACC TCT TTG CTG Thr Ser Leu Leu (D) Inversion (B) Deletion Thr Tyr Leu Leu ACC TAT TTG CTG Thr Tyr Leu Leu ACC TAT TTG CTG i— *—i i ACC TTT ATG CTG Thr Phe Met Leu ACC TAT TGC TGThr Tyr Cys • S u b s titu tio n s: replacing a character by another one. In Figure 2.4A, that the character A is substituted by C causes Tyrosine (Tyr) amino acid is re­ placed by Serine (Ser) in the new sequence. Nucleotide substitutions can be divided into two classes: transitions and transversions. A transition is the substitution of a p u rin e (A or G) for another purine or the substitution of a p y rim id in e (T or C) for another pyrimidine. Other types of nucleotide substitutions are called transversions. • In sertio n s: inserting one or more characters into the sequence. In Figure 2.4B . the character C is inserted before the character acid. T in Tyrosine amino After that, two new amino acids (Phenylalanine (Phe) and Alanine (Ala)) replace two consequence Leucine(Leu) amino acids before the unknown amino acid starting with character G. • D eletions: deleting one or more characters from the sequence. In the example 2.4C’. the deleting of the character T in the first, amino acid Leucine from the sequence creates tin' new amino acid Cysteine (Cys) and a triple of characters ending with the gap character. 10 2.2. S eq uence A lign m en t Figure 2.5: Multiple Sequence Alignment Example 1 2 3 4 5 6 7 8 9 Human C' A A C T T T C C Chimpanzee C A G - T T T C ( ¡in ilia c A C’ C T T T Rhesus C A T - T T T Cow C C T - T T T Dog C C T G T T T Mouse c: C T - T T T Bird T G T - T T T c C c C c c c c c 10 c c c c c c c c c c c c 11 12 T T T T T T T T T T T T T T T T • Inversions: inverting one or a constant number of characters between the beginning and ending parts in a subsequence of the given sequence. The first character in switched with the last one in subsequence A T T . After that, two amino acids Tyrosine and Leucine are substituted by two new ones Phenylala­ nine and Methionine(Met). Sequences are typically presented in a multiple sequence alignment (MSA). The general input to phylogony reconstruction programs is MSA (Felsenstcin, 2004). In general, a matrix, in which the genetic sequences is aligned such that homologous sequences are assigned into the same column (so-called site), defines a MSA (Wa­ terman. 2000). Figure 2.5 illustrates an example MSA with Human, Chimpanzee, Gorilla, Rhesus. Cow. Dog, Mouse and Bird. In this example, at least three point mutations occurred: the substitutions A ^ G can be made between the gene of Human and Chimpanzee, t he character G can be deleted in Dog gene or inserted in Mouse one. The computational and memory space complexities arc 0 ( m n2n) and 0 (m ") respectively in building the multiple sequence alignment by dynamic pro­ gramming (Waterman, 1995) where ri is the number of sequences, m is the number ot site s. Approximation methods have been proposed in case of larger number of sequences such as C L l’STALW (Thompson et a i. 1994), DIALIGN (Morgenstern, 1999).T-COFFEE (Notredame et al., 2000), or MUSCLE (Edgar, 2004). 2.3. A p p ro a ch e s for pliylogeny reco nstruction 2.3 11 A pproaches for phylogeny reconstruction 'I'lit* pliylogeny reconstruction approaches can be divided into two classes: characterhast d and distance-based. Distance-based approaches reconstruct, phylogenies for a set of species S based on the pairwise distance matrix D = {d (u ,v )} where d (u .v) is i he d i s t a n c e of two species u. r £ S estimated by many ways (Nei and Kumar. 2000). The first type of them is introduced by (Cavalli-Sforza and Edwards, 1967) and (Fitch and Margoliash. 1967), unfortunately they require a very huge computation times. Hence, we did not used the distance-based approaches for applying ACO approach to solve phylogeny reconstruction problem. Another one of character-based approaches besides the Maximum Parsimony ap­ proach discussed in the next section is Maximum Likelihood. Maximum Likelihood approach is more and more widely used for inferring the phylogenies. The results on computer simulations showed that maximum likelihood methods often give the better results than maximum parsimony ones (e.g, Tateno et al., 1994; Spcnccr ft al.. 2005). Using maximum likelihood can obtain the better experimental results, however due to limited time, we apply Maximum Parsimony criterion for easier com­ puting process. YYc did that because we want to consider the performance of ACO approach compared to another approaches based on the same objective function. 2.4 M axim um Parsim ony Principle 2.4.1 P arsim o n y C oncept Maximum Parsimony (MP) was proposed by Edwards and Cavalli-Sforza (1963) where they showed that the evolutionary tree is to be preferred that involves ” the m inim um net am ount o f evolution ” . In general, the goal of the MP methods is to select phylogenies that minimize the total number of substitutions along all branches of the tree required to explain a given set of aligned sequences (MSA) (Swofford et al., 1996). Mathematically, the general maximum parsimony problem is defined as follows. Uiven a multiple sequence alignment of n sequences with length rn (the number of 2.4. M ax im u m P arsim o ny Principle 12 sites), find all trees T that minimize the tree length in (2.2) L (T ) .7 = 1 (u.v) where the sum is over all sites j in the alignment and over all branches (u ,v) of i lie tiee T. the coefficient ir, assigns a weight to the given site, x u], x v] represent either the charac ters of the alignment if u or v is external node or optimal assigned cliarart.er-st.Hte if a or r is internal node, each terminal node i (including the one at the root), assign a state set S, containing the character state assigned to the corresponding taxon (i.c, S, — {.-I}). Initialize the tree length to zero. 2. Visit an internal node k for which a state set S t has not been defined but for which the state sets of k's two immediate descendants has been defined. Let / and j represent k's two immediate descendant. Assign to k a state set S* according to the following rules: (a) If t he intersect ion of the state sets assigned to nodes i and j is non-empty ( S , n S j 7^0). let k's state set equal this interscction(i.e,Sfc = 5, fl Sj). (b) Otherwise (S t n Sj = 0) let k's state set equal the union of those state sets (i.e. St — 5, U S ;). Increase the tree length by one unit. 3. If node k is located at the basal fork of the tree (i.c, the immediate descendant ot the terminal node placed at the root),the traversal has been completed; proceed to step 4. Otherwise return to step 2. 4. If the state set. to the terminal node at the root of tree is not contained in the state set just assigned to the node at the basal fork of the tree, increase the tree length by more one unit. 2.4. M a x im u m Parsim on y Principle 14 In tin example (Figure 2.G), there are totally three union operations in traversal for six sequences in a given site:{OT} = {C} U {T}. {GT} — {G} U {7"}, { AGT} — { CT} U {.4}.. The remaining immediate descendants arc created by intersection operation with the common character T. Therefore, the tree length for the given site is three. 2.4.3 R em arks on M axim um P arsim ony A pproaches Although the maximum parsimony approaches do not have statistical properties like the maximum likelihood ones (Tateno et al., 1994; Spencer et al., 2005), they play an important role in phylogenetic analysis. First, MP often consumes much less computation than other statistical-based approaches. That will be of great benefit when the tree becomes larger to provide a first view how the tree will look like. Second, analysis on morphological data is normally carricd out with MP- bascd methods. Beside the strong points of MP approaches, there are still some disadvantages. The first one is that MP does not use all sequence information because there are only informative sites1 in the parsimony sense used. Actually, the singleton sites2 arc informative for topology construction in other tree-building methods even that invariable sites 3 have some phylogenetic information in distance and maximum likelihood methods (Nci and Kumar, 2000). The second disadvantage is that MP approaches do not fully account for multiple mutations because of not implying a model of evolution as other statistical methods such as maximum likelihood. Early descriptions of MP methods were (Kluge and Farris, 1969), (Farris, 1970), (Fitch. 1971) and (Sankoff, 1975). Heuristic searches described in the next section have bet'ii proposed to reduce computational burden in Maximum Parsimony meth­ ods such as latched-based methods (Nixon, 1999), hill-climbing searches based on local tree rearrangement, operations (Maddison, 1991; Goloboff, 1999; Quickc et al., 2001) or divido-and-conqucr techniques Roshan et al. (2004). Nowadays, the inod'T ln 'iv m ust b e a t least tw o differen t k in d s of nu cleo tides, each re p re se n te d a t lea st tw o tim es "N u c le o tid e s ite a t w hich only u n iq u e n u cleo tid e ex ist •*Sit t* have th e sam e n u cleo tid e for all ta x a
- Xem thêm -