Vietnam National University, Hanoi
College of Technology
Huy Quang D inh
A n A nt Colony O p tim izatio n
A p p ro ach for P hylogenetic Tree
R eco n stru ctio n P ro b le m
M ajor : Information Technology
Code : 1.01.10
M ASTER THESIS
Advisor : Prof. A rndt von Haeseler
Co-Advisor : Dr. Hoang Xuan Huan
Hanoi - December, 2006
C o n te n ts
1
A b s tr a c t
*ii
D e c la ra tio n
IV
A c k n o w le d g e m e n ts
v
In tr o d u c tio n
1
1.1 M o tivation........................................................................................................
1
1.2
2
1.1.1
Com putational B iology....................................................................
1
1.1.2
Phytogeny R e c o n stru c tio n ..............................................................
2
Thesis Works and S t r u c t u r e .......................................................................
3
P h y lo g e n e tic T re e R e c o n s tru c tio n
4
2.1
4
Phylogenetic T r e e s ........................................................................................
2.2 Sequence Alignment
.....................................................................................
7
2.2.1
Biological D a t a ..................................................................................
7
2.2.2
Pairwise and Multiple sequence a lig n m e n t.................................
8
2.3 Approaches for phylogeny re c o n stru ctio n .................................................
11
2.4 Maximum Parsimony Principle
..................................................................
11
2.4.1
Parsimony C o n c e p t...........................................................................
11
2.4.2
Counting evolutionary changes ........................................................
12
2.4.3
Remarks on Maximum Parsimony A p p ro a ch e s...........................14
2.5 Finding the best tree by heuristic searches . ...........................................
15
2.5.1
Sequential Addition Methods
........................................................
15
2.5.2
Tree Arrangement M e th o d s ...........................................................
16
vi
C ontents_____________________________________ _______________________ vn
2.5.3
3
3.2
The Ant Algorithms
20
.................................................................................. 20
3.1.1
Double bridge experim ents............................................................20
3.1.2
Ant S y s te m ..................................................................................... 22
3.1.3
Ant Colony S y s te m .........................................................................24
3.1.4
Max-Min Ant S ystem ..................................................................... 25
Ant Colony Optimization M eta-h eu ristic............................................... 27
3.2.1
Problem R e p re se n ta tio n ............................................................... 27
3.2.2
Artificial A n ts ............ ..................................................................... 28
3.2.3
Meta-heuristic S c h e m e .................................................................. 29
3.3
Remarks on ACO A p p licatio n ...................................................................30
3.4
ACO approaches in phylogenetics............................................................ 31
P hylogenetic Inference w ith Ant Colony O ptim ization
33
4.1
Related W orks............................................................................................... 33
4.2
Tree Graph D escription............................................................................... 34
4.3
4.4
4.5
5
19
A n t C olony O p tim iz a tio n
3.1
4
Other heuristic search m e t h o d s ..................................................
4.2.1
BD Tree C o d e ............................................... .................................. 34
4.2.2
State Graph D e sc rip tio n ...............................................................38
Our ACO-applicd A p p ro a c h ......................................................................39
4.3.1
Pheromoue Trail and Heuristic Information
4.3.2
Solution Construction Procedure
4.3.3
Pheromonc Update Chosen Procedure
Simulation Results
............................ 40
...............................................40
..................................... 42
...................................................................................... 42
4.4.1
Simulated Data..................................................................................43
4.4.2
Real D a t a ........................................................................................ 44
D iscussion......................................................................................................46
C onclusion a n d O u tlo o k
48
B ibliography
50
C o n te n ts
viii
A p p e n d ix
.1
Probabilistic Decision R u le ......................................................................... 57
.2
Tree encoding from a BD tree c o d e .........................................................
.3
BD tree code Decoding a lg o rith m .............................................................58
.4
ACC) Solution Construction P ro c e d u re ................................................... 58
.5
Pltcromone Trails Update P r o c e d u re ...................................................... 59
.6
Algorithm for calculating evolutionary changes...................................... 60
57
57
List of Figures
1.1 The exponentially growth of nucleotide d a ta b a s e s ................................
2
2.1 A looted tree of life.......................................................................................
5
2.2 Unrooted tree representation of annelid relationships
..........................
6
2.3 Three possible topologies of unrooted tree for four t a x a .......................
7
2.4 An example of four types of nucleotide mutations (Nei and Kumar,
2 0 0 0 )..............................................................................................................
9
2.5 Multiple Sequence Alignment E x a m p le ...................................................
10
2.6 An example for Fitch a lg o rith m ................................................................
13
2.7
An example of sequential addition m e th o d ...................................... ... . 16
2.8 An example of Nearest-Neighbor Interchange O p eratio n .......................
17
2.9 An example of Subtree Pruning and Regrafting Operation
................
18
2.10 An example of Tree Bisection and Rcconnncction O p e r a t io n .............
18
3.1 Experimental setup for the double bridge e x p e rim e n t..........................21
3.2 Results gained in the double bridge experiment
4.1
................................... 22
Example of encoding a tree from a given BD tree code
.......................36
4.2 Graph structure description with a*,iV p la n e ..........................................38
4.3 An example of tree building on the a,. N p la n e .......................................39
4.4 A found tree with 17 real species
ix
.............................................................45
List of T ables
2.1 Twenty different types of amino acids with corresponding codc . . . .
4.1 The number of instances for which the reconstructed tree and
the
generated true tree arc identical in the simulated data instances
...
4.2 Simulation results with real data of our proposed a p p ro a c h .............. 44
x
8
43
C h a p te r 1
I n t r o d u c t io n
1.1
1.1.1
M o tiv a tio n
C o m p u ta tio n a l B iology
Nowadays, based on the modern com puter technologies and the development of
efficient sequencing technologies, a huge am ount of genetic d a ta is collected in
many genome projects including GenBank(USA), EM BL-Bank(Europe), and DNA
D atabase of Japan (DDBJ) (see Figure 1.1). The size of the GenBank database is
extrem ely large: over 65 billion DNA ba.se pairs in 61 million molecular sequences
J. This drastic growth of biological d a ta requires com putational tools for biological
data (.so-called bioinformatics tools) being capable of handing a large-scale analysis.
The term s bioinformatics and com putational biology are often used interchange
ably. It is further emphasized th a t there is a tight coupling of developments and
knowledge between the more hypothesis-driven research in com putational biology
and technique-driven research in bioinformatics 2.
A lot of approaches in com puter science have been applied to solve more and more
complex problems in com putational biology (Baldi and Brunak, 2000); unfortunately
almost, all such problems are N P-hard or NP-complctc. Therefore, heuristic search
m ethods play an im portant role in tackling the com binatorial optim ization problems.
1htt p: / / www. n cb i.n lm . nih.gov / G e n b a n k /
2h tt p: / / www.b is ti.nill .gov/ C om puB ioD ef. pdf
1
2
1.1. M o tiv a tio n
Figure 1.1: The exponentially growth of nucleotide databases
Growth of the
International Nucleotide Sequence Database Collaboration
B.IS« P & rs
by
'S H n fla rk ii— ** t M S i —
OOBJ —•
http://w w w .ncbi.nlm.nih.gov/Genbank/
Recently, Ant Colony Optimization (Dorigo. 1992) has been proposed and shortly
afterwards has been recognized as one efficient method for finding an approximate
solution for NP-hard problems. The first application is traveling salesman problem
by inspiring by the real ants’s behavior when traveling from the colony to the food
resource and transporting the food back. ACO technique is widely used in various
types of combinatorial optimization problems including in bioinformatics (Dorigo
and Stutzle, 2004).
1.1.2
P hylogeny R eco n stru ctio n
Since the t ime of Charles Darwin, evolutionary biology has been a main focus among
biologists to understand the evolutionary history of all organisms. Where the re
lationship of the structure of the organisms is often expressed as a phylogenetic
tree (Haeckel. 1866). Since the mid of twentieth century, the emergence of rnolec-
1.2. T h e sis W o rk s an d S tr u c tu r e
3
ul,u biology has given rise to a new branch ot study based on inolccular scqucnce
(e.g DNA or protein). Moreover, phylogenetic analysis helps not only elucidate the
evolutionary pattern but also understand the process of adaptive evolution at the
molecular level (Nei and Kumar. 2000).
In molecular phylogenetics, the sequences of the contemporary species arc given
and one asks for the tree topology (including the branch lengths) which explains the
data. It is commonly accepted that phytogenies arc rooted bifurcating trees, where
the root is the most common ancestor of the contemporary species. The leaves
represent contemporary species, and the internal nodes stand for spéciation events.
Among plenty of approaches to rcconstruc phylogenetic trees, the statistic-based
methods have been recognized as sound and accurate methods. Determining the
best phylogénies based on optimality critcrions such as maximum parsimony, mini
mum evolution and maximum likelihood was proved as NP-hard and NP-completc
problems (Graham and Foulds, 1982; Day and Sankoff, 1986; Chor and Tullcr, 2005).
1.2
T hesis W orks and Structure
In this thesis, we will build a general framework to apply ACO principle into phylo
genetics and mainly deal with maximum parsimony. However, such approach can be
easily adapted to any objective function. Our contribution is the formal description
of framework to apply ACO mctaheuristics to solve the phylogcny reconstruction
problem. Attempts to solve the phylogenetic reconstruction problem using ACO
gained only a poor results partly because of the poor construction graph (Ando and
lba, 2002: Kumnorkacw et. a l, 2004; Perrctto and Lopes, 2005). We proposed a
mure general graph representation to overcome this problem.
Except the introduction and conclusion, the thesis is organized into 3 chapters.
The first chapter sketches the major problem of reconstructing phylogenetic trees
from given biological sequences. The second chapter will show the general building
block of ACO technique and application for solving the combinatorial optimization
problems. The third chapter describes the main outcome of the thesis. It will de
scribe our approach and some initial experiences to employ ACO into phylogenetics.
C h a p te r 2
P h y lo g e n e tic T ree R e c o n s tru c tio n
The goal of the phylogenetic tree reconstruction problem is to assemble a tree rep
resenting a hypothesis about the evolutionary relationship among a set of genes,
species, or other taxa. In this chapter, we will briefly introduce the main concept
of phylogenetics and the state-of-the-art methods. In particular, we will concen
trate on the maximum parsimony principle used as an objective function for our
optimization approach discussed in chapter 4.
2.1
P h y lo g en etic Trees
According to Charles Darwin’s evolution theory, all species have evolved from an
cestors under the pressure of natural selection (Darwin, 1872). Evolutionary trees or
phylogenetic trees in phylogenetics terminology arc the one way to display the evolu
tionary relationships among species. A phylogenetic tree, also called an evolutionary
tree , or a phytogeny is a graph-theoretic tree representing the evolutionary relation
ships among a number of species having a common ancestor. Figure 2.1 depicts the
phylogenetic tree of life consisting of three domains of all existing species: Bacte
ria. Archaea, and Eukarya. In a phylogenetic: tree, each internal node represents
¿in unknown common ancestor th at split into two or more species, its descendants.
Each external node or leaf represents a living spec ies, each branch has a length cor
responding to the time between two splitting events or to the amount of changes
that accumulated between two splits.
4
5
2.1. P h y lo g e n e tic T re e s
B acteria
A rch aea
Eucarya
0 re* «
& p ir o c h « « » t
t»ac«efw»
http://www.biologydaily.com /biology/Archaebactena
Phylogenetic tree can be displayed as either a rooted tree or an unrooted tree.
Figure 2.1 and 2.2 constitute examples of a rooted and a unrooted tree, respectively.
The real unrooted phylogenetic tree of Annelida, the segmented worms including
three major groups: Polychaeta, Oligochaeta (earthworms etc.)
and Hirudinea
(leeches), represents the most conservative representation of our understanding of
annelid relationships in Figure 2.2. In a rooted tree, one has the information about
the position of ancestral node. Whereas in the unrooted case, no such information
is available and one can thus see how related the taxa are connected in the tree.
Phylogcnctic applications usual produce an unrooted tree. To identify the root po
sition. one often inserts an outer group one or several extra taxa not closely related
to the original taxa, and observes the branch it joins to the tree. From now on, we
only focus on unrooted trees.
Unrooted trees can be bifurcating and multifurcating.
In a bifurcating tree,
each internal node has the degree three, while a multifurcating tree allows internal
node of arbitrary degree. Typically, one assumes a bifurcating tree, i.e a speciation
event, in the past leads to two lineages. Hence for the rest, of the thesis, we mean
6
2.1. P h y lo g e n e tic Trees
Figure 2.2: Unrooted tree representation of annelid relationships
Aeolosomatidao+Pocamodrilidae
http://www.tolwcb.org/Annclida
phylogenetic trees as unrooted and bifurcating. The branching pattern of a tree
is called a topology or tree structure. In phylogenetic analysis, the branch lengths
represent the evolution time a species needs to evolve into another specics.
The phylogenetic T(S) tree is formally defined (Scmplc and Steel, 2003) on a
set of N contemporary species S — {a'i, «2, ..., s.v} as a pair (T,tp) consisting of
an underlying tree T — (V. E) with V is set of tree internal and external nodes,
E is rhe corresponding set of edges and an injectivc map ip : S
V. Thanks
to this data structure used, the traversals on trees is easily performed by applying
two famous strategics in graph theory, namely preorder and postorder traversals (i.e
(Fitch, 1971)). Graph theory in general and graph data structure in particular play
the very important role in phylogcnctic analysis. This traditional framework not
7
2.2. S e q u e n c e A lig n m e n t
Figure 2.3: Three possible topologies of unrooted tree for four taxa
c
a
b
a
b
a
b
c
d
d
c
only helps build a perfect structure for phylogenetic trees but also provides a lot
of efficient strategies from traversal to searching for optimization stuffs in the main
reconstruction problem.
T he n u m b e r of phylogenetic tre e s
In general, the number of possible topologies for a bifurcating unrooted tree of
in taxa is given by
(2m - 5)!! = 1.3.5...(2m - 5) =
(2m - 5)!
2m-3(ra - 3)!
( 2 . 1)
for in > 3 (Cavalli-Sforza and Edwards, 1967; Felsenstein, 1978). There are only
three unrooted trees of four taxa a, b, c, d as display in Figure 2.3; the smallest
unrooted tree is often called quartet. In fact, finding the best topology based on
almost all optimality criteria is intractable problem, for example with rri — 12 there
are more than thirteen billion trees (Felsenstein, 2004). Therefore, heuristic searches
are essential when the number of taxa becomes large.
2.2
S equ en ce A lignm ent
2.2.1
Biological D a ta
The data in biology and nature is very diverse and abundant. Nowadays, one can
study the evolutionary relationships of organisms by comparing their deoxyribonu
cleic acid (DNA) since the blueprint of all organisms is written in DNA (or ribonu
cleic acid RNA in some cases of viruses) (Nci and Kumar, 2000). DNA consists of
the four types of nucleotides: Adenine, Cytosine, Guanine and Thymine classified
into either purine (A and G) or pyrimidine (C and T) bases; Uracil is replaced by
Thymine when considering the RNA sequences. Besides, another type of genetic
8
2.2. S e q u e n c e A lig n m e n t
Table' 2.1: Twenty different types of amino acids with corresponding code
3-letter
1-lcttcr
Methionine
Met
M
C
Asparagine
Asn
N
Asp
D
Proline
Pro
P
Glutamic Acid
Gin
E
Glutamine
Gin
Q
Phenylalanine
Plie
F
Arginine
Arg
R
Glycine
C h
G
Serine
Scr
S
Histidine
ilis
H
Threonine
Thr
T
Isoleueine
lie
1
Valine
Val
V
Lysine
Lvs
K
Tryptophan
Tip
W
Leucine
Leu
L
Tyrosine
Tyr
Y
3-letter
1-letter
Alanine
Ala
A
Cysteine
Cys
Aspartic Acid
Name
Name
sequences, amino acids including twenty different kinds listed in Table 2.1 (Brown
<:t al.. 2002) are widely used in phylogenetic analysis.
Both types of molecular
sequences (nucleotides and amino acids) play an im portant role in molecular phy
logenetics especially in phylogenetic inferences (Swofford et al., 1996; Fclscnstcin,
2004). From here, we assumed that the biological sequence d ata is molecular data.
2.2.2
P airw ise an d M ultiple sequence alignm ent
As we known, one of the most important features in evolution is replicating gene in
an organism. According to evolutionary theory, the genes in the later generation is
not exactly copied from those in the previous generation be cause of the errors dur
ing
DNA replication or damaging effects of mutagens such as chemical and radiation
(Brown et al.. 2002). Since all morphological characters of organisms arc ultimately
controlled by the genetic information carried by DNA, any mutational changes in
these character are due to some changes in DNA molecular sequences (Nei and Ku
mar, 2000). There arc four basic types of changes in DNA: substitutions, insertions,
deletions and inversions (Nei and Kumar, 2000) where all types except for inversions
are point mutations (Vandammc, 2003).
9
2.2. S e q u e n c e A lig n m e n t
Figure 2.4: An example of tour types of nucleotide imitations (Nci and Kumar, 200Ü)
(C) Insertion
(A) Substitution
Thr Tyr Leu Leu
ACC TAT TTG CTG
Thr Tyr Leu Leu
ACC TAT TTG CTG
I
1
ACC TAC TTT GCT G Thr Tyr Phe Ala
ACC TCT TTG CTG
Thr Ser Leu Leu
(D) Inversion
(B) Deletion
Thr Tyr Leu Leu
ACC TAT TTG CTG
Thr Tyr Leu Leu
ACC TAT TTG CTG
i— *—i
i
ACC TTT ATG CTG
Thr Phe Met Leu
ACC TAT TGC TGThr Tyr Cys
• S u b s titu tio n s: replacing a character by another one. In Figure 2.4A, that
the character A is substituted by C causes Tyrosine (Tyr) amino acid is re
placed by Serine (Ser) in the new sequence. Nucleotide substitutions can be
divided into two classes: transitions and transversions. A transition is the
substitution of a p u rin e (A or G) for another purine or the substitution of
a p y rim id in e (T or C) for another pyrimidine. Other types of nucleotide
substitutions are called transversions.
• In sertio n s: inserting one or more characters into the sequence. In Figure
2.4B . the character C is inserted before the character
acid.
T
in Tyrosine amino
After that, two new amino acids (Phenylalanine (Phe) and Alanine
(Ala)) replace two consequence Leucine(Leu) amino acids before the unknown
amino acid starting with character
G.
• D eletions: deleting one or more characters from the sequence. In the example
2.4C’. the deleting of the character
T
in the first, amino acid Leucine from the
sequence creates tin' new amino acid Cysteine (Cys) and a triple of characters
ending with the gap
character.
10
2.2. S eq uence A lign m en t
Figure 2.5: Multiple Sequence Alignment Example
1
2
3
4
5
6
7
8
9
Human
C'
A
A
C
T
T
T
C
C
Chimpanzee
C
A
G
-
T
T
T
C
( ¡in ilia
c
A
C’ C
T
T
T
Rhesus
C
A
T
-
T
T
T
Cow
C
C
T
-
T
T
T
Dog
C
C
T
G
T
T
T
Mouse
c:
C
T
-
T
T
T
Bird
T
G
T
-
T
T
T
c
C c
C c
c c
c c
10
c
c
c
c
c
c
c c c
c c c
11
12
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
T
• Inversions: inverting one or a constant number of characters between the
beginning and ending parts in a subsequence of the given sequence. The first
character in switched with the last one in subsequence A T T . After that, two
amino acids Tyrosine and Leucine are substituted by two new ones Phenylala
nine and Methionine(Met).
Sequences are typically presented in a multiple sequence alignment (MSA). The
general input to phylogony reconstruction programs is MSA (Felsenstcin, 2004). In
general, a matrix, in which the genetic sequences is aligned such that homologous
sequences are assigned into the same column (so-called site), defines a MSA (Wa
terman. 2000). Figure 2.5 illustrates an example MSA with Human, Chimpanzee,
Gorilla, Rhesus. Cow. Dog, Mouse and Bird. In this example, at least three point
mutations occurred: the substitutions A ^
G can be made between the gene of
Human and Chimpanzee, t he character G can be deleted in Dog gene or inserted in
Mouse one. The computational and memory space complexities arc 0 ( m n2n) and
0 (m ") respectively in building the multiple sequence alignment by dynamic pro
gramming (Waterman, 1995) where ri is the number of sequences, m is the number
ot site s. Approximation methods have been proposed in case of larger number of
sequences such as C L l’STALW (Thompson et a i. 1994), DIALIGN (Morgenstern,
1999).T-COFFEE (Notredame et al., 2000), or MUSCLE (Edgar, 2004).
2.3. A p p ro a ch e s for pliylogeny reco nstruction
2.3
11
A pproaches for phylogeny reconstruction
'I'lit* pliylogeny reconstruction approaches can be divided into two classes: characterhast d and distance-based. Distance-based approaches reconstruct, phylogenies for a
set of species S based on the pairwise distance matrix D = {d (u ,v )} where d (u .v) is
i he d i s t a n c e of two species u. r £ S estimated by many ways (Nei and Kumar. 2000).
The first type of them is introduced by (Cavalli-Sforza and Edwards, 1967) and
(Fitch and Margoliash. 1967), unfortunately they require a very huge computation
times. Hence, we did not used the distance-based approaches for applying ACO
approach to solve phylogeny reconstruction problem.
Another one of character-based approaches besides the Maximum Parsimony ap
proach discussed in the next section is Maximum Likelihood. Maximum Likelihood
approach is more and more widely used for inferring the phylogenies. The results
on computer simulations showed that maximum likelihood methods often give the
better results than maximum parsimony ones (e.g, Tateno et al., 1994; Spcnccr
ft al.. 2005). Using maximum likelihood can obtain the better experimental results,
however due to limited time, we apply Maximum Parsimony criterion for easier com
puting process. YYc did that because we want to consider the performance of ACO
approach compared to another approaches based on the same objective function.
2.4
M axim um Parsim ony Principle
2.4.1
P arsim o n y C oncept
Maximum Parsimony (MP) was proposed by Edwards and Cavalli-Sforza (1963)
where they showed that the evolutionary tree is to be preferred that involves ” the
m inim um net am ount o f evolution ” . In general, the goal of the MP methods is to
select phylogenies that minimize the total number of substitutions along all branches
of the tree required to explain a given set of aligned sequences (MSA) (Swofford et al.,
1996).
Mathematically, the general maximum parsimony problem is defined as follows.
Uiven a multiple sequence alignment of n sequences with length rn (the number of
2.4. M ax im u m P arsim o ny Principle
12
sites), find all trees T that minimize the tree length
in
(2.2)
L (T )
.7 = 1
(u.v)
where the sum is over all sites j in the alignment and over all branches (u ,v) of
i lie tiee T. the coefficient ir, assigns a weight to the given site, x u], x v] represent
either the charac ters of the alignment if u or v is external node or optimal assigned
cliarart.er-st.Hte if a or r is internal node,
each terminal node i (including the one at the root), assign a state set
S, containing the character state assigned to the corresponding taxon (i.c,
S, — {.-I}). Initialize the tree length to zero.
2. Visit an internal node k for which a state set S t has not been defined but for
which the state sets of k's two immediate descendants has been defined. Let
/ and j represent k's two immediate descendant. Assign to k a state set S*
according to the following rules:
(a) If t he intersect ion of the state sets assigned to nodes i and j is non-empty
( S ,
n
S j
7^0). let k's state set equal this interscction(i.e,Sfc = 5, fl Sj).
(b) Otherwise (S t n Sj = 0) let k's state set equal the union of those state
sets (i.e. St — 5, U S ;). Increase the tree length by one unit.
3. If node k is located at the basal fork of the tree (i.c, the immediate descendant
ot the terminal node placed at the root),the traversal has been completed;
proceed to step 4. Otherwise return to step 2.
4. If the state set. to the terminal node at the root of tree is not contained in the
state set just assigned to the node at the basal fork of the tree, increase the
tree length by more one unit.
2.4. M a x im u m Parsim on y Principle
14
In tin example (Figure 2.G), there are totally three union operations in traversal
for six sequences in a given site:{OT} = {C} U {T}. {GT} — {G} U {7"}, { AGT} —
{ CT} U {.4}.. The remaining immediate descendants arc created by intersection
operation with the common character T. Therefore, the tree length for the given
site is three.
2.4.3
R em arks on M axim um P arsim ony A pproaches
Although the maximum parsimony approaches do not have statistical properties
like the maximum likelihood ones (Tateno et al., 1994; Spencer et al., 2005), they
play an important role in phylogenetic analysis. First, MP often consumes much
less computation than other statistical-based approaches. That will be of great
benefit when the tree becomes larger to provide a first view how the tree will look
like.
Second, analysis on morphological data is normally carricd out with MP-
bascd methods. Beside the strong points of MP approaches, there are still some
disadvantages.
The first one is that MP does not use all sequence information because there
are only informative sites1 in the parsimony sense used. Actually, the singleton
sites2 arc informative for topology construction in other tree-building methods even
that invariable sites 3 have some phylogenetic information in distance and maximum
likelihood methods (Nci and Kumar, 2000). The second disadvantage is that MP
approaches do not fully account for multiple mutations because of not implying a
model of evolution as other statistical methods such as maximum likelihood.
Early descriptions of MP methods were (Kluge and Farris, 1969), (Farris, 1970),
(Fitch. 1971) and (Sankoff, 1975). Heuristic searches described in the next section
have bet'ii proposed to reduce computational burden in Maximum Parsimony meth
ods such as latched-based methods (Nixon, 1999), hill-climbing searches based on
local tree rearrangement, operations (Maddison, 1991; Goloboff, 1999; Quickc et al.,
2001) or divido-and-conqucr techniques Roshan et al. (2004). Nowadays, the inod'T ln 'iv m ust b e a t least tw o differen t k in d s of nu cleo tides, each re p re se n te d a t lea st tw o tim es
"N u c le o tid e s ite a t w hich only u n iq u e n u cleo tid e ex ist
•*Sit t* have th e sam e n u cleo tid e for all ta x a
- Xem thêm -