MOLECULAR PHYLOGENETIC TREE USING THREE DIFFERENT METHODS BASED ON P-DISTANCE MODEL DEA RYNANDA PUTRI


MOLECULAR PHYLOGENETIC TREE USING THREE DIFFERENT METHODS BASED ON P-DISTANCE MODEL

DEA RYNANDA PUTRI

DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2010

ABSTRACT

DEA RYNANDA PUTRI. Molecular Phylogenetic Tree using Three Different Methods based on p-distance Model. Advised by ASEP SAEFUDDIN and MULADNO.

Phylogenetic inference is needed to describe the relationships among proteins, genes, or species. In phylogeny, the objects are assumed to be evolutionarily related, and an evolutionary tree is used to show the evolutionary relationships among the organisms. However, building a reliable evolutionary tree requires a reliable data set in order to find the best model. In this paper, the data were obtained from the D-loop region of mitochondrial DNA (mtDNA) available in GenBank. Five animal species were used: Bison bison, Bos taurus, Bos indicus, Bubalus bubalis, and Capra hircus. The objective was to find the most reliable method, measured by its stability, among UPGMA, Minimum Evolution, and Neighbor-Joining. To build the cases, sequences of the five species were grouped into seven groups with different characteristics. The p-distance model was used to build the distance matrices. The reliability of each method was measured using Felsenstein's bootstrap method, with the whole bootstrap process for each method repeated 100, 1000, and 10000 times. Almost none of the methods misclassified taxa in reconstructing the evolutionary tree. However, Minimum Evolution failed to reconstruct a reliable evolutionary tree compared to UPGMA and Neighbor-Joining.

Key words: phylogenetic inference, D-loop mitochondrial DNA, evolutionary tree

MOLECULAR PHYLOGENETIC TREE USING THREE DIFFERENT METHODS BASED ON P-DISTANCE MODEL

DEA RYNANDA PUTRI
G

Research Report submitted as a requirement for graduation with a Bachelor's Degree in Statistics at the Department of Statistics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University

DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2010

Title : Molecular Phylogenetic Tree using Three Different Methods based on p-distance Model
Author : Dea Rynanda Putri
NIM : G

Approved by:

Advisor I
Dr. Asep Saefuddin, M.Sc.
NIP

Advisor II
Prof. Dr. Muladno Basar, MSA
NIP

Acknowledged by:
Head of Department of Statistics
Dr. Ir. Hari Wijayanto, M.Si.
NIP

Graduation date:

BIOGRAPHY

Dea Rynanda Putri was born in Jakarta on 27 September 1988, the daughter of Hedi Suhardi and Mary L Sutanto. She has one younger sister. She graduated from SD Kristen I BPK Penabur Jakarta in 2000 and then from SLTP Kristen II BPK Penabur Jakarta. After graduating from SMA Negeri 68 Jakarta in 2006, she continued her studies at Bogor Agricultural University through USMI. A year later, she chose Statistics as her major in the Department of Statistics, with Monetary and Actuarial Mathematics from the Department of Mathematics as her minor. During her studies, she was active in college organizations such as the International Association of Students in Agricultural and Related Sciences (IAAS) and the IPB Debating Community (IDC). She served as secretary of IDC and as a member of the Exchange Program Department of IAAS. In 2009, she joined the International IndoMS Conference on Mathematics and Its Applications (IICMA) at Gadjah Mada University to present a research paper written in collaboration with Farid M Affendi, M.Si. In the same year, she joined the 16th Tri-U International Student Symposium at Mie University, Japan, to present a research paper. In November 2010, she is scheduled to join the 17th Tri-U International Student Symposium at Chiang Mai University, Thailand. In February 2009, she took part in an internship program at the Laboratory of Molecular Genetics, Faculty of Animal Husbandry, Bogor Agricultural University.

ACKNOWLEDGEMENTS

Thanks be to God, and my deep gratitude to my beloved Jesus Christ, Who gives me endless chances, spirit, health, and capability, especially in finishing my research. This paper presents my research in bioinformatics. It was written to fulfill a requirement for graduation with a Bachelor's Degree in Statistics at the Department of Statistics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University. I have to admit that completing this research would not have been possible without help from many people, from the beginning, through its progress, until it was done. A thousand thanks for their ideas, criticism, and improvements during the process. I would like to express my sincere gratitude to my advisors, Dr. Asep Saefuddin, for his expert guidance and suggestions for this research, and Prof. Muladno, for his enlightening suggestions and discussions. I would like to thank all my friends in Statistika 43, IAAS, and the Tri-U delegations for the togetherness in seeking knowledge and for true friendship. My special gratitude goes to Apri, Anita, Boer, Defri, TW, Nadia, and Nia for all the time we passed and the nights we spent together. I would like to thank all my friends in the Laboratory of Molecular Genetics who helped me during the internship program, especially Dr. Jakaria, who helped me greatly in understanding the field of bioinformatics. I am so grateful to my beloved family, Pap, Bunbun, and my only Gunz, for their never-ending love and support. Finally, I hope that this small work will be useful for all.

Bogor, July 2010
Dea Rynanda Putri

CONTENT
LIST OF FIGURE
LIST OF TABLE
LIST OF APPENDIX
INTRODUCTION
   Background
   Objective
LITERATURE REVIEW
   D-loop Mitochondrion DNA
   Evolutionary Tree
   p-distance
   Distance Matrix Methods
      UPGMA
      Minimum Evolution
      Neighbor-Joining
   Bootstrap
   Bootstrap Variance
METHODOLOGY
   Data Sources
   Methods
RESULT AND DISCUSSION
   Distance Matrix
   UPGMA's Performance in Reconstructing the Evolutionary Tree
   ME's Performance in Reconstructing the Evolutionary Tree
   NJ's Performance in Reconstructing the Evolutionary Tree
   Performance Comparison of UPGMA, ME, and NJ in Reconstructing the Evolutionary Tree
CONCLUSION
RECOMMENDATION
REFERENCE

LIST OF FIGURE
Figure 1 Mitochondria Region
Figure 2 Evolutionary Tree's Components
Figure 3 The Evolutionary Tree with bootstrap confidence values for 100, 1000, and 10000 replications, respectively, of Group A (a) and Group E (b)
Figure 4 Consensus Tree of Group C using ME with bootstrap confidence values for 100, 1000, and 10000 replications, respectively
Figure 5 Consensus Tree of Group E using NJ with bootstrap confidence values for 100, 1000, and 10000 replications, respectively
Figure 6 Consensus Tree of Group G using UPGMA (a), ME (b), and NJ (c) for 100, 1000, and 10000 replications, respectively

LIST OF TABLE
Table 1 Overall Mean and Standard Error for each Group

LIST OF APPENDIX
Appendix 1 Available Information of Bison bison
Appendix 2 Nucleotide Composition
Appendix 3 Nucleotide Pair Frequencies
Appendix 4 Consistency comparison among UPGMA (a), ME (b), and NJ (c) conducted on all built cases
Appendix 5(a) Statistics of the built cases Group A, Group C, Group E, and Group F
Appendix 5(b) Statistics of the built cases Group G, Group D, and Group B
Appendix 6 Constructed evolutionary tree for Group B using UPGMA (a), ME (b), and NJ (c)
Appendix 7 Original constructed tree of Group B using UPGMA for 100, 1000, and 10000 replications, respectively
Appendix 8 Original constructed tree of Group B using NJ for 100, 1000, and 10000 replications, respectively
Appendix 9 Original constructed tree of Group B using ME for 100, 1000, and 10000 replications, respectively
Appendix 10 Comparison of computational time among all methods and cases


INTRODUCTION

Background
Systematic biologists have striven for centuries to expose the natural order of living things, and for the past 150 years (since Darwin 1859) this endeavor has focused largely on inferring phylogeny. Unfortunately, evolution is not something that we can observe directly: it has happened only once and has left behind clues as to what happened. Many methods, in addition to intuition, have been developed for phylogeny reconstruction. Early efforts to reconstruct phylogeny were based on morphological data, but as molecular characters became accessible, they were quickly integrated into phylogenetic analyses. Phylogenetic systematists use these clues to reconstruct evolutionary relationships in the form of an evolutionary tree. Phylogenetic systematics is important (Elfaizi 2004) in order to: (1) explain the history of evolution, (2) map the variation of pathogenic strains for vaccine development, (3) find the causes of disease or the genetic effects of disease, (4) predict the function of newly found genes, (5) analyze biodiversity, and (6) better understand the ecology of microbes.

The reconstruction of evolutionary trees using statistical methods was initiated independently in numerical taxonomy for morphological characters and in population genetics for gene frequency data (Nei & Kumar 2000). Some of the statistical methods developed for these purposes are still used for phylogenetic analysis of molecular data, but many new methods have been developed in recent years. Several sources of information can be used in reconstructing evolutionary trees, such as characters, traits, anatomical and physiological characteristics, behaviors, or genetic sequences. New and better data could change the outcome of an evolutionary tree and show different ways in which the organisms are related.
In this paper, we used D-loop mitochondrial DNA sequences. MtDNA sequence variation has been widely applied in population genetics studies of animals because of the maternal inheritance and high substitution rate of this organelle genome. The displacement loop, or D-loop, is a highly variable region of mtDNA. With the increasing emphasis on tree reconstruction, questions arose as to how confident one should be in a given phylogenetic tree and how support for phylogenetic trees should be measured. Felsenstein (1985, cited in Soltis & Soltis 2003) formally proposed bootstrapping as a method for obtaining confidence limits on phylogenies. D-loop mtDNA sequences of five species were used to compare the performance of three methods: UPGMA, Minimum Evolution (ME), and Neighbor-Joining (NJ). Performance was measured in two aspects, computational time and consistency, where consistency was measured using the bootstrap procedure. The five species used to build the cases, all available in GenBank, were Bison bison, Bos taurus, Bos indicus, Bubalus bubalis, and Capra hircus.

Objective
The main objectives of this research were:
1. To compare phylogenetic inferences based on distance methods.
2. To describe the characteristics of each inference method.
3. To find out which method is more reliable in which cases.
4. To help molecular biologists determine which method is more suitable for their data.

LITERATURE REVIEW

D-loop Mitochondrion DNA
Mitochondrial DNA (mtDNA) is the DNA located in the organelles called mitochondria, structures within cells that convert the energy from food into a form that cells can use. MtDNA is located in the cytoplasm of the cell. In mammals, each double-stranded circular mtDNA molecule consists of 15,000-17,000 base pairs and contains 37 genes: 13 for proteins (polypeptides), 22 for transfer RNA (tRNA), and two for the small and large subunits of ribosomal RNA (rRNA). The D-loop occurs in the main non-coding area of the mtDNA molecule, a segment called the control region. Certain bases within the D-loop region are conserved, but large parts are highly variable.

Figure 1 Mitochondria Region

Evolutionary Tree
Phylogenetics describes the relationships between genes, proteins, or species. In phylogenetics, the objects are assumed to be evolutionarily related, and the evolutionary tree is used to show the evolutionary relationships among the organisms. To build a correct evolutionary tree, we also need correct and proper data, which consist of (Li 2001): (1) taxa: the groups of organisms whose evolutionary relationships we are interested in, and (2) characters: a list of phenotypic characteristics in which groups of organisms differ. The components of the evolutionary tree are shown in Figure 2. There are two kinds of methods for building an evolutionary tree (Nei & Kumar 2000): (1) distance methods and (2) character-state methods. In distance methods, or distance matrix methods, evolutionary distances are computed for all pairs of taxa, and an evolutionary tree is constructed by considering the relationships among these distance values.

Figure 2 Evolutionary Tree's Components

p-distance
This distance is simply the proportion (p) of nucleotide sites at which the two sequences compared are different. It is obtained by dividing the number of nucleotide differences, $n_d$, by the total number of nucleotides compared, $n$. Thus,

$$p = \frac{n_d}{n} \tag{1}$$

The computation of this distance is simple, and for constructing phylogenetic trees it gives essentially the same results as more complicated distance measures, as long as all pairwise distances are small. The assumption of this model is that the rate of nucleotide substitution is the same for all evolutionary lineages.
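As a minimal sketch of Equation (1), the Python function below computes the p-distance for every pair of aligned sequences. It assumes that gap sites have already been deleted (as in the Methods), so all sequences have the same length. The function names and the 10-bp toy sequences are illustrative only; they are not MEGA's routines or the D-loop data used in this study.

```python
from __future__ import annotations

from itertools import combinations


def p_distance(seq1: str, seq2: str) -> float:
    """Equation (1): p = n_d / n, the proportion of aligned sites that differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    n_d = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return n_d / len(seq1)


def p_distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named sequences."""
    return {(a, b): p_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}


if __name__ == "__main__":
    # Hypothetical toy alignment (not real D-loop sequences).
    toy = {
        "seqA": "ACGTACGTAC",
        "seqB": "ACGTACGTTC",
        "seqC": "ACGAACTTTC",
    }
    for pair, p in p_distance_matrix(toy).items():
        print(pair, round(p, 3))
```

Here seqA and seqB differ at 1 of 10 sites (p = 0.1), seqA and seqC at 3 sites (p = 0.3), and seqB and seqC at 2 sites (p = 0.2). Because p counts every difference once, it ignores multiple substitutions at the same site, which is why it is adequate only while the distances stay small.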
Distance Matrix Methods
In distance matrix methods, evolutionary distances are computed for all pairs of taxa, and an evolutionary tree is constructed by considering the relationships among these distance values. There are many different methods of constructing trees from distance data.

UPGMA
The simplest distance method is the Unweighted Pair-Group Method using Arithmetic Average (UPGMA). Sokal and Michener (1958, cited in Nei & Kumar 2000) were the first to introduce this method. A tree constructed by this method is sometimes called a phenogram, because the method was originally used to represent the extent of phenotypic similarity among a group of species in numerical taxonomy. However, it can be used for constructing molecular phylogenies when the rate of gene substitution is more or less constant. Let $d_{ij}$ denote the distance between the $i$-th and $j$-th taxa. Clustering starts with the pair of taxa with the smallest distance. Suppose that $d_{12}$ is the smallest among all distance values. Taxa 1 and 2 are then clustered with a branch point located at distance $b = d_{12}/2$. In UPGMA we assume that the branches leading from this branch point to taxa 1 and 2 have the same length. Taxa 1 and 2 are then combined into a single composite taxon, or cluster, (12), and the distance between this cluster and another taxon $k$ is computed by

$$d_{(12)k} = \frac{d_{1k} + d_{2k}}{2}$$

This gives a new, smaller distance matrix. In general, the distance between two clusters is the arithmetic average of all pairwise distances between their members. The algorithm continues until all taxa are grouped in one cluster.

Minimum Evolution
The principle of this method is to find the topology with the smallest sum of branch-length estimates, $S$, defined as

$$S = \sum_{k=1}^{K} \hat{b}_k \tag{2}$$

where $K$ is the total number of branches. $S$ is computed for all plausible topologies. Let $d_{ij}$, $b_k$, and $e_{ij}$ denote, respectively, the estimated evolutionary distance between sequences $i$ and $j$, the length of the $k$-th branch, and the sampling error. For a given topology, $d_{ij} = \sum_{k} x_{ij,k}\, b_k + e_{ij}$, where $x_{ij,k} = 1$ if branch $k$ lies on the path between $i$ and $j$ and $x_{ij,k} = 0$ otherwise. Using matrix algebra, the equation becomes

$$\mathbf{d} = \mathbf{X}\mathbf{b} + \mathbf{e} \tag{3}$$

so the least-squares (LS) estimate of $\mathbf{b}$ is given by

$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{d} \tag{4}$$

where $\hat{\mathbf{b}} = (\hat{b}_1, \ldots, \hat{b}_K)'$. The estimate of the length of the $k$-th branch is therefore the $k$-th element of this vector:

$$\hat{b}_k = \left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{d}\right]_k \tag{5}$$

Neighbor-Joining
Saitou and Nei (1987, cited in Nei & Kumar 2000) developed an efficient tree-building method based on the minimum evolution principle. Construction of a tree by the NJ method begins with a star tree, which is produced under the assumption that there is no clustering of taxa. We then estimate the branch lengths of the star tree and compute their sum,

$$S_0 = \sum_{i=1}^{m} L_{iX} = \frac{1}{m-1}\sum_{i<j} d_{ij} \tag{6}$$

where $m$ is the total number of sequences used, $L_{iX}$ is the branch-length estimate between node $i$ and the central node $X$, and $d_{ij}$ is the distance between taxa $i$ and $j$. This sum should be greater than the sum for the final NJ tree. In practice, since we do not know which pairs of taxa are true neighbors, we consider every pair of taxa as a potential pair of neighbors and choose the taxa $i$ and $j$ that give the smallest value of

$$S_{ij} = \frac{2T - r_i - r_j}{2(m-2)} + \frac{d_{ij}}{2} \tag{7}$$

where $r_i = \sum_{k \neq i} d_{ik}$ and $T = \sum_{i<j} d_{ij}$. Once the smallest $S_{ij}$ is determined, we create a new node $Y$ that connects taxa $i$ and $j$. The branch lengths are given by

$$L_{iY} = \frac{d_{ij}}{2} + \frac{r_i - r_j}{2(m-2)} \tag{8}$$

$$L_{jY} = d_{ij} - L_{iY} \tag{9}$$

The next step is to compute the distance between the new node $Y$ and each remaining taxon $k$:

$$d_{Yk} = \frac{d_{ik} + d_{jk} - d_{ij}}{2} \tag{10}$$

This procedure is repeated, with $m$ reduced by one at each step, until the final tree is produced.

Bootstrap
The bootstrap was first introduced by Efron (1979) to obtain estimates of error in nonstandard situations by resampling the data set many times, providing a distribution against which hypotheses can be tested. In 1985, Felsenstein (cited in Soltis & Soltis 2003) formally proposed bootstrapping as a method for obtaining confidence limits on phylogenies. The tree-building process can be indicated schematically as $\mathbf{x} \rightarrow \hat{D} \rightarrow \hat{T}$, where $\mathbf{x}$ is the original data matrix (sequences in rows, nucleotide sites in columns), $\hat{D}$ is an estimated distance matrix, and $\hat{T}$ is the estimated tree. Felsenstein's method proceeds as follows. A bootstrap data matrix $\mathbf{x}^*$ is formed by randomly selecting $n$ columns from the original matrix $\mathbf{x}$ with replacement.
Then the original tree-building algorithm is applied to the bootstrap data matrix $\mathbf{x}^*$, giving a bootstrap tree $\hat{T}^*$: $\mathbf{x}^* \rightarrow \hat{D}^* \rightarrow \hat{T}^*$. The proportions of bootstrap trees agreeing with the original tree are then calculated; these proportions are the bootstrap confidence values.

Bootstrap Variance
In this paper, the bootstrap method was also used to compute the variance of the distance estimates. Nucleotide sequences of $n$ base pairs are resampled in the same way as described above: the random sample is produced by resampling the nucleotide sites (columns) with replacement. For each bootstrap data set, the distance for each pair of sequences is computed using Equation (1). This procedure is repeated $B$ times. Denoting by $\hat{d}_i$ the value for the $i$-th bootstrap replication, the bootstrap variance is computed by

$$V(\hat{d}) = \frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{d}_i - \bar{d}\right)^2 \tag{11}$$

where $\bar{d}$ is the mean of $\hat{d}_i$ over all replications. One assumption often made for the bootstrap is that all sites evolve independently. This assumption does not strictly hold in the present case. However, if the number of sites examined is large, as in the present case, the effect of violating this assumption is not important, because most sites with different evolutionary rates will be represented in each bootstrap sample.

METHODOLOGY

Data Sources
For this research, complete mtDNA genome data of five common animal species were obtained free of charge from GenBank. The data were accessed on April 2. The species

were Bison bison, Bos taurus, Bos indicus, Bubalus bubalis, and Capra hircus. The location of the D-loop region was obtained from the information available in the GenBank records, as shown for Bison bison in Appendix 1.

Methods
The procedures conducted in this research were:
1. Downloaded the complete mtDNA sequences from GenBank. The available data sets were:
   a. Bison bison (2)
   b. Bos taurus (10)
   c. Bos indicus (3)
   d. Bubalus bubalis (3)
   e. Capra hircus (7)
   The numbers in brackets show the number of sequences downloaded from GenBank.
2. Reduced the data from the complete mtDNA sequences to the D-loop region only, for all sequences used. The D-loop mtDNA sequences were around 1,122 base pairs long before the gaps were edited.
3. Aligned all data sets, so that the compared sequences have the same number of nucleotide positions.
4. Deleted all gaps from the data sets, because insertions and deletions introduce gaps into the DNA sequence alignment during the alignment procedure. After gap deletion, the D-loop mtDNA sequences were 882 base pairs long.
5. Built the cases by forming seven groups of taxa:
   a. Group A: Bison bison (2), Bos taurus (2), Bos indicus (2), Bubalus bubalis (2), Capra hircus (2).
   b. Group B: Bison bison (2), Bos taurus (10), Bos indicus (3), Bubalus bubalis (3), Capra hircus (7).
   c. Group C: Bison bison (2), Bos taurus (1), Bos indicus (1), Bubalus bubalis (1), Capra hircus (1).
   d. Group D: Bison bison (1), Bos taurus (10), Bos indicus (1), Bubalus bubalis (1), Capra hircus (1).
   e. Group E: Bison bison (1), Bos taurus (1), Bos indicus (3), Bubalus bubalis (1), Capra hircus (1).
   f. Group F: Bison bison (1), Bos taurus (1), Bos indicus (1), Bubalus bubalis (3), Capra hircus (1).
   g. Group G: Bison bison (1), Bos taurus (1), Bos indicus (1), Bubalus bubalis (1), Capra hircus (7).
   The numbers in brackets show the number of sequences used to build each case. The sequences used for each species in a group were selected randomly from the available sequences.
6. Constructed the UPGMA, ME, and NJ trees based on the p-distance model. The standard errors of the overall mean of the estimated distances for all groups were computed using the bootstrap procedure with 1000 replications.
7. Checked the reliability of each method using the bootstrap procedure with 100, 1000, and 10000 replications.
All procedures were conducted using MEGA.

RESULT AND DISCUSSION

Distance Matrix
Table 1 shows the overall mean of the estimated distances for all groups. The standard errors, computed using the bootstrap procedure with 1000 replications, are relatively small and constant. With these results, the p-distance model is reliable enough to be used in constructing the evolutionary tree.

Table 1 Overall Mean and Standard Error for each Group [table values not reproduced; columns: Group, Mean, S.E.; rows: groups A, B, C, D, E, F, G]

UPGMA's Performance in Reconstructing the Evolutionary Tree
The reliability of UPGMA is good on average under all conditions. In groups C and F, UPGMA shows stable topologies as the number of bootstrap replications changes. UPGMA failed to construct a reliable topology for the relationship between BosIndicus(1) and BosIndicus(2) in groups A and E: the bootstrap confidence value decreased from 100% to 99% when the number of replications changed from 100 to 1000, but remained constant at

99% when the number of replications changed from 1000 to 10000 (Figure 3).

Figure 3 The evolutionary trees with bootstrap confidence values for 100, 1000, and 10000 replications, respectively, of group A (a) and group E (b)

Of the seven groups used to compare the reliability of the methods, UPGMA gave stable results in only two groups when the number of replications changed from 100 to 1000, and in four groups when it changed from 1000 to 10000. No conclusion could be drawn for group B, where the bootstrap confidence values went up and down as the number of replications changed, although the topology relating the sequences did not change.

ME's Performance in Reconstructing the Evolutionary Tree
The reliability of ME is very low, especially as the number of sequences increases. In groups B and D, where the numbers of sequences are 25 and 14 respectively, the consistency was low (Appendix 4). As with UPGMA, ME could not give stable bootstrap confidence values in group B; in fact, the values decreased as the number of replications changed (Appendix 9). ME is consistent only in group F. In group C, where there are only six sequences, instability occurred at two nodes (Figure 4): the interior branch joining BosIndicus and BosTaurus, and the node relating BisBison(1)-BisBison(2) to BosIndicus-BosTaurus. At both nodes, the bootstrap confidence value decreased from 100% to 99% as the number of replications changed.

NJ's Performance in Reconstructing the Evolutionary Tree
The reliability of the NJ method increases with the number of replications in almost all cases. In group A, when the number of replications changes from 100 to 1000, the bootstrap confidence value of the BosIndicus(1)-BosIndicus(2) relationship decreases from 100% to 99%, but remains at 99% when the number of replications increases to 10000. The same happened in group E for the BosIndicus(1)-BosIndicus(2) node (Figure 5).
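In Felsenstein's method, a bootstrap confidence value such as the 100% or 99% quoted above is the percentage of bootstrap trees that contain a given clade. The Python sketch below illustrates that computation with UPGMA on p-distances. It is a simplified illustration, not MEGA's implementation: the taxon names and toy sequences are hypothetical, ties are broken by input order, and clades are compared as sets of taxon names on the rooted UPGMA tree. Because each value is a count out of B replicates, it can only move in steps of 100/B percentage points (1 point for B = 100, 0.1 point for B = 1000).

```python
from __future__ import annotations

import random
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Equation (1): proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def distance_table(seqs: dict[str, str]) -> dict[frozenset, float]:
    """p-distance for every unordered pair of taxa, keyed by frozenset({a, b})."""
    return {frozenset((a, b)): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}


def cluster_distance(c1: frozenset, c2: frozenset, d: dict[frozenset, float]) -> float:
    """UPGMA distance: arithmetic average of all taxon-to-taxon distances between clusters."""
    return sum(d[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))


def upgma_clades(names: list[str], d: dict[frozenset, float]) -> set[frozenset]:
    """Cluster taxa with UPGMA and return every clade formed along the way."""
    clusters = [frozenset([n]) for n in names]
    clades = set()
    while len(clusters) > 1:
        # Join the two closest clusters; on ties, min() keeps the first pair in list order.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], d))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades


def bootstrap_confidence(seqs: dict[str, str], n_rep: int = 100,
                         seed: int = 1) -> dict[frozenset, float]:
    """Percentage of bootstrap UPGMA trees that contain each clade of the original tree."""
    names = list(seqs)
    n_sites = len(next(iter(seqs.values())))
    original = upgma_clades(names, distance_table(seqs))
    hits = dict.fromkeys(original, 0)
    rng = random.Random(seed)
    for _ in range(n_rep):
        # Resample n sites (alignment columns) with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        boot = {k: "".join(s[i] for i in cols) for k, s in seqs.items()}
        boot_clades = upgma_clades(names, distance_table(boot))
        for clade in original:
            if clade in boot_clades:
                hits[clade] += 1
    return {clade: 100.0 * h / n_rep for clade, h in hits.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment; A1 and A2 are identical, so they always pair first.
    toy = {
        "A1": "ACGTACGTACGTACGTACGT",
        "A2": "ACGTACGTACGTACGTACGT",
        "B":  "ACGTTCGTACGAACGTACCT",
        "C":  "TCGATCGTAGGAACTTACCA",
    }
    for clade, value in bootstrap_confidence(toy, n_rep=1000).items():
        print(sorted(clade), f"{value:.1f}%")
```

Swapping `upgma_clades` for an NJ or ME tree builder would give the corresponding bootstrap values for those methods; the resampling loop itself does not change.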
Appendix 4 shows that the number of consistent clades in NJ varied when the number of replications changed from 100 to 1000, but was mostly steady when it changed from 1000 to 10000. This suggests that, when the variation among the sequences is high, 1000 replications are sufficient to assess the reliability of NJ.

Figure 4 Consensus tree of group C using ME with bootstrap confidence values for 100, 1000, and 10000 replications, respectively

Figure 5 Consensus tree of group E using NJ with bootstrap confidence values for 100, 1000, and 10000 replications, respectively

NJ also failed to construct a reliable evolutionary tree for group B: it failed to maintain the bootstrap confidence values across all numbers of replications. Nevertheless, compared to ME and UPGMA, NJ is the most reliable method in constructing the

evolutionary tree of group B.

Performance Comparison of UPGMA, ME, and NJ in Reconstructing the Evolutionary Tree
Compared to the other methods, ME has the longest computational time (Appendix 10), while UPGMA has the shortest. This may be due to the computational iteration in ME, where all possible topologies are constructed one by one to find the topology with the smallest sum of branch lengths, $S$, whereas UPGMA simply joins, at each step, the two taxa or clusters with the smallest genetic distance. Appendix 4 compares the consistency of UPGMA, ME, and NJ on all built cases as the number of replications changes from 100 to 1000 and from 1000 to 10000. The chart in Appendix 4 shows that UPGMA is the most consistent method for almost every group, compared to ME and NJ, whereas NJ trees become consistent only from 1000 replications onward. All methods failed to reconstruct a reliable evolutionary tree for group B (see Appendix 5). The NJ topology for group B differs slightly from the UPGMA topology (Appendix 6), but unfortunately neither gave consistent support for its topology, so further study of this case is needed. Appendix 5 shows the means and variances of the nucleotide composition for all built cases. Compared to the other cases, the nucleotide variance in group B was relatively small for every nucleotide. The same holds for group D, where the variances for T(U), C, and A are 0.192, 0.169, and 0.207, respectively. These are relatively small compared to group F, whose variances for T(U), C, and A are 0.298, 0.480, and 1.367, respectively. In group G, all methods showed inconsistency in constructing the topology among the Capra hircus sequences (Figure 6), especially in describing the relationship of CapHircus(2) with CapHircus(1)-CapHircus(4).
This may be caused by the nucleotide composition of CapHircus(1), CapHircus(2), and CapHircus(4), which differ slightly in their cytosine (C) and guanine (G) content: the percentage of cytosine is 26.4% in CapHircus(1) and CapHircus(4) but 26.3% in CapHircus(2), whereas the percentage of guanine is 15.5% in CapHircus(1) and CapHircus(4) but 15.6% in CapHircus(2).

Figure 6 Consensus tree of CapHircus (group G) using UPGMA (a), ME (b), and NJ (c) for 100, 1000, and 10000 replications, respectively

CONCLUSION
Under the assumption that the nucleotide substitution rate is the same for all evolutionary lineages, UPGMA is the most consistent distance method, followed by NJ, with ME last. UPGMA is a good distance method if one is interested only in classifying the sequences and in the total branch length, because the branch length

between the nearest taxa in UPGMA is assumed to be equal; the method is therefore not appropriate for estimating the evolutionary distances along individual branches. NJ is a consistent distance method when the number of bootstrap replications is at least 1000, and it is a good choice when information about the evolutionary distances among sequences is needed. When the number of sequences is large and the extent of sequence divergence is low, the realized tree may have many interior branches of zero length unless a large number of nucleotides is examined. In such cases, as shown by group B, it is generally difficult to reconstruct the true tree by any method, and there is little point in comparing the methods, because the tree will not be reliable whichever tree-building method is used. It is now clear that no method is superior to the others in all conditions; some methods perform better than others under certain conditions but worse under other conditions. Therefore, even if many interior branches of a tree are not well supported by the bootstrap, the tree should not be discarded. It is a hypothetical tree, but it could be the correct one.

RECOMMENDATION
This research is based on several assumptions and has several limitations. If these assumptions and limitations can be relaxed, better results could be expected. We recommend the following for further research:
1. The p-distance model used in this research assumes that the nucleotide substitution rate is the same for all evolutionary lineages. This may not reflect the real condition of mtDNA sequences, whose mutation and/or substitution rates may vary. A distance model that reflects the actual mutation and/or substitution rates in the sequences might give better results.
2. It would be interesting to estimate the pairwise distances empirically with current statistical methods, for example Bayesian or maximum likelihood estimation, in order to find the most reliable method.

REFERENCE
[Anonymous]. DNA Mitokondria. DNAmitokondria& [May 18, 2010].
[Anonymous]. Understanding Evolution. icle/phylogenetics_05 [May 18, 2010].
Backeljau T et al. Multiple UPGMA and Neighbor-Joining Trees and the Performance of Some Computer Packages. Mol Biol Evol 13(2): g [March 23, 2010].
Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman & Hall.
Elfaizi MA, Aprijani DA. 2004. Bioinformatika: Perkembangan, Disiplin Ilmu, dan Penerapannya di Indonesia. http// rg/copyleft/fdl.html [January 26, 2010].
Ewens WJ, Grant GR; Dietz K, editor. Statistical Methods in Bioinformatics: An Introduction. New York: Springer-Verlag.
Gascuel O, Bryant D, Denis F. Strengths and Limitations of the Minimum Evolution Principle. Syst Biol 50(5): [April 21, 2010].
Holmes S. Bootstrapping Phylogenetic Trees: Theory and Methods. Stat Sci 18(2):
Husmeier D. A Brief Tutorial on Phylogenetics. [March 22, 2010].
Li Yan. How to Build a Phylogenetic Tree. [March 22, 2010].
Nei M, Kumar S. 2000. Molecular Evolution and Phylogenetics. New York: Oxford University Press.
Singh K, Xie M. Bootstrap: A Statistical Method. gers.edu/~mxie/ RCPaRCPa/bootstrap.pdf [May 25, 2010].
Soltis PS, Soltis DE. 2003. Applying the Bootstrap in Phylogeny Reconstruction. Stat Sci 18(2):

Appendix 1 Available Information of Bison bison

LOCUS       NC_ bp DNA circular MAM 13-APR-2009
DEFINITION  Bison bison mitochondrion, complete genome.
ACCESSION   NC_
VERSION     NC_ GI:
DBLINK      Project:36339
KEYWORDS    .
SOURCE      mitochondrion Bison bison (American bison)
  ORGANISM  Bison bison
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Bovidae; Bovinae; Bison.
REFERENCE   1 (bases 1 to 16319)
FEATURES             Location/Qualifiers
     source          /organism="Bison bison"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9901"
                     /breed="American"
     D-loop          join( ,1..360)
     tRNA            /product="tRNA-Phe"
     rRNA            /product="s-rRNA"
     CDS             /gene="CYTB"
                     /codon_start=1
                     /transl_table=2
                     /product="cytochrome b"
                     /protein_id="YP_ "
                     /db_xref="GI: "
                     /db_xref="GeneID: "
                     /translation="MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLTLQ
                     ILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGL
                     YYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNL
                     VEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKI
                     PFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLF
                     AYAILRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLT
                     LTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW"
        1 actaatgact aatcagccca tgctcacaca taactgtgct gtcatacatt tggtattttt
       61 ttattttggg ggatgcttgg actcagctat ggccgtcaaa ggccctgacc cggagcatct
      121 attgtagctg gacttaactg caccttgagc accagcataa tggtaagcat gcacatatag
      181 tcaatggtta caggacataa ctgtattata tatccccccc tccataaaaa ttccccctta
      241 aatatttacc actgctttta acagattttt ccctagttac ctatttaaat tttccacact
      301 ttcaatactc aaattagcac tccatataaa gtcaatatat aaacgcaggc cccccccccc
      361 cgttgatgta gcttaaccca aagcaaggca ctgaaaatgc ctagatgagt ctcccaactc
     1861 aataaatctc actgtaactt taaaagttaa tctaaaaagg tacagccttt tagaaacgga
     1921 tacaaccttg actagagagt aaaatataac actaccatag taggcccaaa agcagccacc
     1981 aattgagaaa gcgttaaagc tcaacaacaa aaattaaaca gatcccaata acaagtaatt
     2041 aactcctagc cccaatactg gactaatcta ttattgaata gaagtaataa tgttagtatg
     2101 agtaacaaga aaaactttct ccttgcataa gtctaagtca gtatctgata atactctgac
          cattaatgta ataaaaacat attatgtata tagtacatta aattatatgc cccatgcata
          taagcaagta cttatcctct attgacagta catagtacat aaagttatta attgtacata
          gcacattatg tcaaatctac ccttggcaac atgcatatcc cttccattag atcacgagct
          taattaccat gccgcgtgaa accagcaacc cgctaggcag aggatccctc ttctcgctcc
          gggcccatga accgtggggg tcgctattta atgaacttta tcagacatct ggttctttct
          tcagggccat ctcacctaaa atcgcccatt ctttcctctt aaataagaca tctcgatgg
//
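The record in Appendix 1 lists the sequence in GenBank layout: each line begins with the 1-based coordinate of its first base, followed by groups of ten bases. The printed listing jumps from 361 to 1861, and the final block has lost its coordinates, so rows are missing. The sketch below is a hypothetical helper (the name `parse_sequence_block` is an assumption, not something from the thesis) that rejoins numbered lines and reports coordinate jumps instead of silently concatenating across them. For complete records, Biopython's `SeqIO.read(handle, "genbank")` parses the whole flat file.

```python
def parse_sequence_block(lines):
    """Join GenBank-style numbered sequence lines into one uppercase string.

    Assumed input: lines such as "61 ttattttggg ggatgcttgg ...", i.e. a 1-based
    start coordinate followed by groups of bases. Returns (sequence, gaps),
    where gaps lists (expected_start, found_start) for every coordinate jump.
    """
    chunks = []
    gaps = []
    expected = 1
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "//":
            continue  # skip blank lines and the record terminator
        start = int(fields[0])  # raises ValueError on unnumbered lines
        if start != expected:
            gaps.append((expected, start))
        bases = "".join(fields[1:]).upper()
        chunks.append(bases)
        expected = start + len(bases)  # coordinate the next line should carry
    return "".join(chunks), gaps


lines = [
    "        1 actaatgact aatcagccca tgctcacaca taactgtgct gtcatacatt tggtattttt",
    "       61 ttattttggg ggatgcttgg actcagctat ggccgtcaaa ggccctgacc cggagcatct",
    "      361 cgttgatgta gcttaaccca aagcaaggca ctgaaaatgc ctagatgagt ctcccaactc",
]
seq, gaps = parse_sequence_block(lines)
print(len(seq), gaps)  # 180 [(121, 361)]
```

Tracking the expected coordinate from the last line read (rather than from the total length) means a single missing stretch is reported once, not on every later line.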

Appendix 2 Nucleotide Composition
Columns: T(U), C, A, G
Rows: BisBison(1), BisBison(2), BosIndicus(1), BosIndicus(2), BosIndicus(3), BosTaurus(1), BosTaurus(2), BosTaurus(3), BosTaurus(4), BosTaurus(5), BosTaurus(6), BosTaurus(7), BosTaurus(8), BosTaurus(9), BosTaurus(10), BubBubalis(1), BubBubalis(2), BubBubalis(3), CapHircus(1), CapHircus(2), CapHircus(3), CapHircus(4), CapHircus(5), CapHircus(6), CapHircus(7), Avg

Appendix 3 Nucleotide Pair Frequencies
Columns: Domain, ii (identical pairs), si (transitional pairs), sv (transversional pairs), R (si/sv), TT, TC, TA, TG, CC, CA, CG, AA, GG, Total
Rows: avg
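The quantities tabulated in Appendices 2 and 3 can be computed directly from aligned sequences. The sketch below is an illustration under stated assumptions, not the software used in the thesis: the function names are hypothetical, gapped or ambiguous sites are skipped in each pair, and transitions are taken as A/G and C/T exchanges. It also returns the p-distance, the proportion of compared sites that differ, (si + sv) / (ii + si + sv), which is the model the thesis uses to build its distance matrices.

```python
from collections import Counter

NUCS = "TCAG"  # column order used in Appendices 2 and 3


def composition(seq):
    """Percentage of T(U), C, A and G among the unambiguous bases of one sequence."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(b for b in seq if b in NUCS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in NUCS}


def pair_frequencies(s1, s2):
    """Site-by-site comparison of two aligned sequences.

    ii = identical pairs, si = transitional pairs (A<->G, C<->T),
    sv = transversional pairs, R = si/sv, p = p-distance.
    Sites with a gap or ambiguous base in either sequence are skipped.
    """
    pairs = Counter()
    ii = si = sv = 0
    for a, b in zip(s1.upper().replace("U", "T"), s2.upper().replace("U", "T")):
        if a not in NUCS or b not in NUCS:
            continue
        # Unordered pair label written in T, C, A, G order, e.g. "TC", "CA", "AG"
        pairs["".join(sorted((a, b), key=NUCS.index))] += 1
        if a == b:
            ii += 1
        elif {a, b} in ({"A", "G"}, {"C", "T"}):
            si += 1
        else:
            sv += 1
    sites = ii + si + sv
    return {
        "ii": ii, "si": si, "sv": sv,
        "R": si / sv if sv else float("inf"),
        "p": (si + sv) / sites if sites else float("nan"),
        "pairs": dict(pairs),
    }


print(composition("ACGU"))
print(pair_frequencies("AAGCTA", "AGGTTC"))
```

For the toy pair above there are three identical sites, two transitions (A/G and C/T) and one transversion (A/C), so R = 2 and p = 0.5.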

Appendix 4 Consistency comparison among UPGMA (a), ME (b), and NJ (c) applied to all built cases
Panels: UPGMA, ME, NJ; cases: Group A, Group B, Group C, Group D, Group E, Group F, Group G

Appendix 5(a) Statistics of the built cases Group A, Group C, Group E, and Group F

                   Group A        Group C    Group E    Group F
Relationship       constant       constant   constant   constant
UPGMA-NJ-ME        UPGMA-NJ, ME   -          -          UPGMA - ME - NJ
UPGMA              (1)            constant   (1)        constant
NJ                 (1)            (1)        (1)        constant
ME                 (2)            (2)        (2)        constant

Remaining rows: No. of Sequence, Sequence Length, Description, and nucleotide composition (T(U), C, A, G) for each group, with avg and var.

Appendix 5(b) Statistics of the built cases Group G, Group D, and Group B

                   Group G    Group D         Group B
Relationship       constant   varied          unstable
UPGMA-NJ-ME        -          UPGMA, NJ-ME    -
UPGMA              (4)        (5)             (1)
NJ                 (3)        (6)             (1)
ME                 (4)        (7)             (2)

Remaining rows: No. of Sequence, Sequence Length, Description, and nucleotide composition (T(U), C, A, G) for each group, with avg and var.

Appendix 6 Constructed evolutionary tree for Group B using UPGMA (a), ME (b), and NJ (c)
(a) UPGMA, leaf order: BosTaurus(3), BosTaurus(4), BosTaurus(6), BosTaurus(7), BosTaurus(1), BosTaurus(2), BosTaurus(5), BosIndicus(2), BosTaurus(10), BosIndicus(1), BosTaurus(9), BosIndicus(3), BosTaurus(8), BisBison(1), BisBison(2), BubBubalis(3), BubBubalis(1), BubBubalis(2), CapHircus(7), CapHircus(6), CapHircus(3), CapHircus(5), CapHircus(2), CapHircus(1), CapHircus(4)
(b) ME, leaf order: BosTaurus(6), BosTaurus(7), BosTaurus(1), BosTaurus(3), BosTaurus(4), BosTaurus(2), BosTaurus(5), BosIndicus(2), BosTaurus(10), BosIndicus(1), BosTaurus(9), BosIndicus(3), BosTaurus(8), BisBison(1), BisBison(2), BubBubalis(3), BubBubalis(1), BubBubalis(2), CapHircus(7), CapHircus(6), CapHircus(3), CapHircus(1), CapHircus(4), CapHircus(2), CapHircus(5)
(c) NJ, leaf order: BosTaurus(3), BosTaurus(4), BosTaurus(6), BosTaurus(7), BosTaurus(1), BosTaurus(2), BosTaurus(5), BosIndicus(2), BosTaurus(10), BosIndicus(1), BosTaurus(9), BosIndicus(3), BosTaurus(8), BisBison(1), BisBison(2), BubBubalis(3), BubBubalis(1), BubBubalis(2), CapHircus(7), CapHircus(6), CapHircus(3), CapHircus(5), CapHircus(2), CapHircus(1), CapHircus(4)
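To make the UPGMA panel concrete, here is a minimal sketch of p-distance followed by UPGMA clustering. It is an illustration, not the program used to draw Appendix 6: the function names and the four toy sequences are hypothetical. UPGMA repeatedly joins the closest pair of clusters, places the new node at half their distance (it assumes a constant rate of change), and sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Share of compared sites at which two aligned sequences differ.
    Sites with a gap or ambiguous base in either sequence are ignored."""
    sites = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def upgma(names, seqs):
    """Build a rooted UPGMA tree from p-distances; return it in Newick format."""
    # cluster id -> (Newick text, number of leaves, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair; ties go to dict order
        i, j = sorted(pair)
        (ti, ni, hi), (tj, nj, hj) = clusters.pop(i), clusters.pop(j)
        height = dist.pop(pair) / 2  # new node sits halfway up the distance
        text = f"({ti}:{height - hi:.4f},{tj}:{height - hj:.4f})"
        for k in clusters:  # size-weighted average of the two old distances
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = (text, ni + nj, height)
        next_id += 1
    (text, _, _), = clusters.values()
    return text + ";"


names = ["A", "B", "C", "D"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAC", "AAAAACCCCC", "AAAAACCCCG"]
print(upgma(names, seqs))
# ((A:0.0500,B:0.0500):0.1875,(C:0.0500,D:0.0500):0.1875);
```

Because UPGMA forces every leaf to the same depth, its branch lengths can misrepresent lineages that evolve at different rates; NJ and ME do not impose that constraint.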

Appendix 7 Original constructed tree of Group B using UPGMA, with the bootstrap repeated 100, 1000, and 10000 times, respectively

Appendix 8 Original constructed tree of Group B using NJ, with the bootstrap repeated 100, 1000, and 10000 times, respectively
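Neighbor-Joining (Saitou and Nei, 1987), used for Appendix 8, does not assume equal rates. The sketch below shows its usual Q-criterion form on a distance matrix such as a p-distance matrix; the function name and the toy matrix are hypothetical, and it is not the implementation used in the thesis. At each step it joins the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the row sum of distances, then replaces the pair with a new node.

```python
def neighbor_joining(names, d):
    """Neighbor joining on a symmetric distance matrix d (list of lists).
    Returns an unrooted tree in Newick format."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in d]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence (row sum) of each node
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        _, i, j = min(((n - 2) * d[a][b] - r[a] - r[b], a, b)
                      for a in range(n) for b in range(a + 1, n))
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch to i
        lj = d[i][j] - li                                  # branch to j
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node
        du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"]
    # Resolve the last three nodes around one central node
    (a, b, c), (dab, dac, dbc) = nodes, (d[0][1], d[0][2], d[1][2])
    la, lb, lc = (dab + dac - dbc) / 2, (dab + dbc - dac) / 2, (dac + dbc - dab) / 2
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"


# Toy additive distances from the tree ((A:1,B:2):3,(C:1,D:2))
D = [[0, 3, 5, 6],
     [3, 0, 6, 7],
     [5, 6, 0, 3],
     [6, 7, 3, 0]]
print(neighbor_joining(["A", "B", "C", "D"], D))
# (C:1.0000,D:2.0000,(A:1.0000,B:2.0000):3.0000);
```

When the distances are exactly additive, as in this toy matrix, NJ recovers the true unrooted tree and its branch lengths; with real p-distances it gives an approximation.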

Appendix 9 Original constructed tree of Group B using ME, with the bootstrap repeated 100, 1000, and 10000 times, respectively
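Appendices 7 to 9 rely on Felsenstein's bootstrap: alignment columns are resampled with replacement, a tree is rebuilt from each pseudo-alignment, and each clade of the original tree is labelled with the percentage of replicate trees that contain it. The sketch below illustrates that loop under stated assumptions. It uses a compact UPGMA as the tree builder because that is short to write; it does not implement Minimum Evolution. The function names, the `seed` parameter and the toy sequences are hypothetical, and sequences are assumed to be aligned and gap-free.

```python
import random
from collections import Counter
from itertools import combinations


def p_distance(s1, s2):
    """Share of sites at which two aligned, gap-free sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def upgma_clades(names, seqs):
    """Clades (frozensets of names) of a UPGMA tree, excluding the root."""
    clusters = [frozenset([n]) for n in names]
    d = {frozenset((clusters[i], clusters[j])): p_distance(seqs[i], seqs[j])
         for i, j in combinations(range(len(names)), 2)}
    clades = set()
    while len(clusters) > 1:
        pair = min(d, key=d.get)
        a, b = pair
        merged = a | b
        clusters.remove(a)
        clusters.remove(b)
        del d[pair]
        for c in clusters:  # size-weighted average linkage, as in UPGMA
            d[frozenset((merged, c))] = (
                len(a) * d.pop(frozenset((a, c))) + len(b) * d.pop(frozenset((b, c)))
            ) / len(merged)
        clusters.append(merged)
        if len(merged) < len(names):
            clades.add(merged)
    return clades


def bootstrap_support(names, seqs, replicates=100, seed=1):
    """Percentage of bootstrap trees containing each clade of the original tree."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    counts = Counter()
    for _ in range(replicates):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # with replacement
        counts.update(upgma_clades(names, ["".join(s[c] for c in cols) for s in seqs]))
    return {clade: 100.0 * counts[clade] / replicates
            for clade in upgma_clades(names, seqs)}


names = ["A", "B", "C", "D"]
seqs = ["AAAAAAAAAA", "AAAAAAAAAA", "CCCCCCCCCC", "CCCCCCCCCC"]
for clade, pct in bootstrap_support(names, seqs, replicates=100).items():
    print(sorted(clade), pct)
```

Raising `replicates` from 100 to 1000 or 10000, as in the appendices, mainly reduces the sampling noise in the support percentages at the cost of proportionally more tree reconstructions.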

Appendix 10 Comparison of Computational Time Among All Methods and Cases (time in seconds)

                   Constructed   Bootstrap tree
Case      Method   tree          100     1000    10000
Group A   UPGMA    1.1           1.3     1.5      5.9
          ME       1.5           1.7     1.9      7.5
          NJ       1.2           1.5     1.7      5.9
Group B   UPGMA    1.3           1.5     3.4     21.8
          ME       1.6           1.7     4.1     31.1
          NJ       1.4           1.6     3.4     22.6
Group C   UPGMA    1.3           1.4     1.6      3.9
          ME       1.3           1.7     3.0      4.4
          NJ       1.3           1.5     1.7      3.9
Group D   UPGMA    1.4           1.4     1.9      8.7
          ME       1.5           1.5     3.7     12.1
          NJ       1.4           1.5     3.2      8.7
Group E   UPGMA    1.4           1.5     1.7      4.3
          ME       1.5           1.6     1.9      4.8
          NJ       1.3           1.5     1.7      4.2
Group F   UPGMA    1.3           1.4     1.6      4.3
          ME       1.5           1.5     2.8      4.7
          NJ       1.2           1.4     3.2      4.2
Group G   UPGMA    1.4           1.5     1.8      6.3
          ME       1.5           1.6     2.8      8.2
          NJ       1.4           1.4     1.8      6.6
