MOLECULAR PHYLOGENETIC TREE USING THREE DIFFERENT METHODS BASED ON P-DISTANCE MODEL DEA RYNANDA PUTRI


DEPARTMENT OF STATISTICS
FACULTY OF MATHEMATICS AND NATURAL SCIENCES
BOGOR AGRICULTURAL UNIVERSITY
2010

ABSTRACT

DEA RYNANDA PUTRI. Molecular Phylogenetic Tree using Three Different Methods based on p-distance Model. Advised by ASEP SAEFUDDIN and MULADNO.

Phylogenetic inference is needed to describe the relationships among proteins, genes, or species. In phylogeny, the objects under study are assumed to be evolutionarily related, and an evolutionary tree is used to show the relationships among the organisms. Building a reliable evolutionary tree requires a reliable data set from which the best model can be found. In this paper, the data were obtained from the D-loop region of mitochondrial DNA (mtDNA) available in GenBank. Five animal species were used: Bison bison, Bos taurus, Bos indicus, Bubalus bubalis, and Capra hircus. The objective was to identify the most reliable method, measured by its stability, among UPGMA, Minimum Evolution, and Neighbor-Joining. To build the cases, the five species were grouped into seven classes with different characteristics. The p-distance model was used to build the distance matrices. The reliability of each method was measured using Felsenstein's bootstrap method, with the whole bootstrap process for each method repeated 100, 1000, and 10000 times. Almost all methods were free of misclassification problems when reconstructing the evolutionary tree. However, Minimum Evolution failed to reconstruct a reliable evolutionary tree compared to UPGMA and Neighbor-Joining.

Key words: phylogenetic inference, D-loop mitochondrial DNA, evolutionary tree

DEA RYNANDA PUTRI
G

Research report submitted to complete the requirements for graduation with a Bachelor Degree in Statistics at the Department of Statistics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University

Title : Molecular Phylogenetic Tree using Three Different Methods based on p-distance Model
Author : Dea Rynanda Putri
NIM : G

Approved by:
Advisor I: Dr. Asep Saefuddin, M.Sc. (NIP)
Advisor II: Prof. Dr. Muladno Basar, MSA (NIP)

Acknowledged by:
Head of Department of Statistics: Dr. Ir. Hari Wijayanto, M.Si (NIP)

Graduation date:

BIOGRAPHY

Dea Rynanda Putri was born in Jakarta on 27 September 1988, the daughter of Hedi Suhardi and Mary L Sutanto. She has one younger sister. She graduated from SD Kristen I BPK Penabur Jakarta in 2000 and later from SLTP Kristen II BPK Penabur Jakarta. After graduating from SMA Negeri 68 Jakarta in 2006, she continued her studies at Bogor Agricultural University through USMI. A year later, she chose Statistics in the Department of Statistics as her major, with Monetary and Actuarial Mathematics from the Department of Mathematics as her minor.

During her studies she was active in college organizations, including the International Association of Students in Agricultural and Related Sciences (IAAS) and the IPB Debating Community (IDC). She served as secretary of IDC and as a member of the Exchange Program Department of IAAS. In 2009, she joined the International IndoMS Conference on Mathematics and Its Applications (IICMA) at Gadjah Mada University to present a research paper written in collaboration with Farid M Affendi, M.Si. In the same year, she presented a research paper at the 16th Tri-U International Student Symposium at Mie University, Japan. In November 2010, she is going to join the 17th Tri-U International Student Symposium at Chiang Mai University, Thailand. In February 2009, she undertook an internship in the Laboratory of Molecular Genetics, Faculty of Animal Husbandry, Bogor Agricultural University.

ACKNOWLEDGEMENTS

Thanks be to God. I am deeply grateful to my beloved Jesus Christ, Who gives me endless chances, spirit, health, and capability, especially in finishing my research. This paper presents my research in bioinformatics. It was carried out to complete a requirement for graduation with a Bachelor Degree in Statistics at the Department of Statistics, Faculty of Mathematics and Natural Sciences, Bogor Agricultural University.

I have to admit that the completion of my research would not have been possible without help from many people, from the beginning, through its progress, until it was done. I offer a thousand thanks for their ideas, criticism, and suggestions for improvement during the process. I would like to express my sincere gratitude to my advisors, Dr. Asep Saefuddin for his expert guidance and suggestions for this research, and Prof. Muladno for the enlightening suggestions and discussions.

I would like to thank all my friends in Statistika 43, IAAS, and the Tri-U delegations for their companionship in seeking knowledge and for their true friendship. My special gratitude goes to Apri, Anita, Boer, Defri, TW, Nadia, and Nia for all the time we passed and the nights we spent. I would also like to thank all my friends in the Laboratory of Molecular Genetics who helped me during the internship program, especially Dr. Jakaria, who helped me greatly in understanding the field of bioinformatics.

I am so grateful to my beloved family: Pap, Bunbun, and my only Gunz, for their never-ending love and support. Finally, I hope that this little work will be useful for all.

Bogor, July 2010
Dea Rynanda Putri

CONTENT

LIST OF FIGURE
LIST OF TABLE
LIST OF APPENDIX
INTRODUCTION
    Background
    Objective
LITERATURE REVIEW
    D-loop Mitochondrion DNA
    Evolutionary Tree
    p-distance
    Distance Matrix Method
    UPGMA
    Minimum Evolution
    Neighbor Joining
    Bootstrap
    Bootstrap Variance
METHODOLOGY
    Data Sources
    Methods
RESULT AND DISCUSSION
    Distance Matrix
    UPGMA's Performance in Reconstructing the Evolutionary Tree
    ME's Performance in Reconstructing the Evolutionary Tree
    NJ's Performance in Reconstructing the Evolutionary Tree
    Performance Comparison of UPGMA, ME, and NJ in Reconstructing the Evolutionary Tree
CONCLUSION
RECOMMENDATION
REFERENCE

LIST OF FIGURE

Figure 1 Mitochondria Region
Figure 2 Evolutionary Tree's Component
Figure 3 The Evolutionary Tree for 100, 1000, and 10000 repeated times respectively of Group A (a) and Group E (b)
Figure 4 Consensus Tree of Group C using ME for 100, 1000, and 10000 repeated times respectively
Figure 5 Consensus Tree of Group E using NJ for 100, 1000, and 10000 repeated times respectively
Figure 6 Consensus Tree of Group G using UPGMA (a), ME (b), and NJ (c) for 100, 1000, and 10000 repeated times respectively

LIST OF TABLE

Table 1 Overall Mean and Standard Error for each Group

LIST OF APPENDIX

Appendix 1 Available Information of Bison bison
Appendix 2 Nucleotide Composition
Appendix 3 Nucleotide Pair Frequencies
Appendix 4 Consistency comparison among UPGMA (a), ME (b), and NJ (c) conducted to all built cases
Appendix 5(a) Statistics of the built cases Group A, Group C, Group E, and Group F
Appendix 5(b) Statistics of the built cases Group G, Group D, and Group B
Appendix 6 Constructed evolutionary tree for Group B using UPGMA (a), ME (b), and NJ (c)
Appendix 7 Original constructed tree of Group B using UPGMA with 100, 1000, and 10000 repeated times respectively
Appendix 8 Original constructed tree of Group B using NJ with 100, 1000, and 10000 repeated times respectively
Appendix 9 Original constructed tree of Group B using ME with 100, 1000, and 10000 repeated times respectively
Appendix 10 Comparison of computational time among all methods and cases

INTRODUCTION

Background

For centuries, systematic biologists have striven to expose the natural order of living things, and for the past 150 years (since Darwin 1859) this endeavor has focused largely on inferring phylogeny. Unfortunately, evolution cannot be observed directly: it happened only once and left behind only clues as to what happened. Many methods, in addition to intuition, have been developed for phylogeny reconstruction. Early efforts to reconstruct phylogeny were based on morphological data, but as molecular characters became accessible they were quickly integrated into phylogenetic analyses. Phylogenetic systematists use these clues to reconstruct evolutionary relationships in the form of an evolutionary tree.

Phylogenetic systematics is important for several purposes (Elfaizi 2004): (1) to explain the history of evolution, (2) to map the variation of pathogenic strains for vaccine development, (3) to find the causes or the genetic effects of disease, (4) to predict the function of newly found genes, (5) to analyze biodiversity, and (6) to gain further understanding of the ecology of microbes.

The reconstruction of evolutionary trees using statistical methods was initiated independently in numerical taxonomy, for morphological characters, and in population genetics, for gene frequency data (Nei & Kumar 2000). Some of the statistical methods developed for these purposes are still used for phylogenetic analysis of molecular data, but in recent years many new methods have been developed.

Several sources of information can be used to reconstruct evolutionary trees, such as characters, traits, anatomical and physiological characteristics, behaviors, or genetic sequences. New and better data can change the resulting evolutionary tree and show a different way in which the organisms are related.
In this paper, we used D-loop mitochondrial DNA (mtDNA) sequences. MtDNA sequence variation has been widely applied in population genetics studies of animals because of the maternal inheritance and high substitution rate of this organelle genome. The displacement loop, or D-loop, is a highly variable region of the mitochondrial genome.

With the increasing emphasis on tree reconstruction, questions arose as to how confident one should be in a given phylogenetic tree and how support for phylogenetic trees should be measured. Felsenstein (1985, cited in Soltis & Soltis 2003) formally proposed bootstrapping as a method for obtaining confidence limits on phylogenies.

D-loop mtDNA sequences of five different species were used to compare the performance of three methods: UPGMA, Minimum Evolution (ME), and Neighbor-Joining (NJ). Performance was measured in two respects: computational time and consistency. Consistency was measured using the bootstrap procedure. The five species used to build the cases were Bison bison, Bos taurus, Bos indicus, Bubalus bubalis, and Capra hircus, all available in GenBank.

Objective

The main objectives of this research were:
1. To compare phylogenetic inference methods that are based on distances.
2. To describe the characteristics of each inference method.
3. To find out which method is more reliable in which cases.
4. To help molecular biologists determine which method is more suitable for their data.

LITERATURE REVIEW

D-loop Mitochondrion DNA

Mitochondrial DNA (mtDNA) is the DNA located in organelles called mitochondria, structures within cells that convert the energy from food into a form that cells can use. MtDNA is located in the cytoplasm of the cell. In mammals, each double-stranded circular mtDNA molecule consists of 15,000-17,000 base pairs and carries 37 genes: 13 code for proteins (polypeptides), 22 for transfer RNA (tRNA), and two for the small and large subunits of ribosomal RNA (rRNA). The D-loop occurs in the main non-coding area of the mtDNA molecule, a segment called the control region. Certain bases within the D-loop region are conserved, but large parts are highly variable.

Figure 1 Mitochondria Region

Evolutionary Tree

Phylogenetics describes the relationships among genes, proteins, or species. In phylogenetics, the objects are assumed to be evolutionarily related, and the evolutionary tree is used to show the evolutionary relationships among the organisms. Building a correct evolutionary tree requires correct and proper data. Such data consist of (Li 2001): (1) taxa: the groups of organisms whose evolutionary relationships are of interest, and (2) characters: a list of phenotypic characteristics of the organisms, together with groups of organisms that differ in these characteristics. The components of the evolutionary tree are shown in Figure 2.

There are two classes of methods for building an evolutionary tree (Nei & Kumar 2000): (1) distance methods and (2) character-based methods. In distance methods, also called distance matrix methods, evolutionary distances are computed for all pairs of taxa, and an evolutionary tree is constructed by considering the relationships among these distance values.

Figure 2 Evolutionary Tree's Component

p-distance

This distance is simply the proportion ($p$) of nucleotide sites at which the two sequences compared are different. It is obtained by dividing the number of nucleotide differences ($n_d$) by the total number of nucleotides compared ($n$). Thus,

$$\hat{p} = \frac{n_d}{n} \tag{1}$$

This distance is simple to compute, and for constructing phylogenetic trees it gives essentially the same results as more complicated distance measures, as long as all pairwise distances are small. The assumption of this model is that the rate of nucleotide substitution is the same for all evolutionary lineages.
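As a minimal sketch of Equation 1, the following Python code computes the p-distance between two aligned sequences and the pairwise distances for a small set of taxa. The function names `p_distance` and `pairwise_p_distances` and the toy sequences are illustrative assumptions, not code or data from this study, and the sketch assumes the sequences are already aligned with all gaps removed.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of sites at which two aligned sequences differ (Equation 1)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # n_d: number of sites where the two nucleotides differ
    n_d = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return n_d / len(seq_a)


def pairwise_p_distances(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of taxa."""
    return {
        (name_i, name_j): p_distance(seqs[name_i], seqs[name_j])
        for name_i, name_j in combinations(seqs, 2)
    }


# Toy aligned sequences (hypothetical, not taken from GenBank).
toy = {
    "taxon1": "ACGTACGTAC",
    "taxon2": "ACGTTCGTAA",
    "taxon3": "TCGTTCGTAA",
}
distances = pairwise_p_distances(toy)
```

In this toy example, taxon1 and taxon2 differ at 2 of 10 sites, so their p-distance is 0.2. The resulting dictionary plays the role of the distance matrix from which a tree would be built.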
Distance Matrix Methods

In distance methods, or distance matrix methods, evolutionary distances are computed for all pairs of taxa, and an evolutionary tree is constructed by considering the relationships among these distance values. There are many different methods of constructing trees from distance data.

UPGMA

The simplest of the distance methods is the Unweighted Pair-Group Method using Arithmetic Average (UPGMA). Sokal and Michener (1958, cited in Nei & Kumar 2000) were the first authors to introduce this method. A tree constructed by this method is sometimes called a phenogram, because it was originally used to represent the extent of phenotypic similarity for a group of species in numerical taxonomy. However, it can be used for constructing molecular phylogenies when the rate of gene substitution is more or less constant.

Assume that $d_{ij}$ stands for the distance between the $i$-th and $j$-th taxa. Clustering of taxa starts with the pair of taxa with the smallest distance. Suppose that $d_{12}$ is the smallest among all distance values. Taxa 1 and 2 are then clustered with a branch point located at distance $b = d_{12}/2$. In UPGMA we assume that the lengths of the branches leading from this branch point to taxa 1 and 2 are the same. Taxa 1 and 2 are then combined into a single composite taxon or cluster $u = (1\text{-}2)$, and the distance between this cluster and another taxon $k$ ($k \neq 1, 2$) is computed by $d_{uk} = (d_{1k} + d_{2k})/2$. This gives a new distance matrix. The algorithm continues until there are no more taxa to be grouped into a cluster.

Minimum Evolution

The principle of this method is to find the topology with the smallest value of $S$, the sum of all branch length estimates, defined as

$$S = \sum_{k=1}^{T} \hat{b}_k \tag{2}$$

where $T$ is the total number of branches and $\hat{b}_k$ is the estimated length of the $k$-th branch. $S$ is computed for all plausible topologies. Let $d_{ij}$, $b_k$, and $\epsilon_{ij}$ denote, respectively, the estimate of the evolutionary distance between sequences $i$ and $j$, the length of the $k$-th branch, and the sampling error of $d_{ij}$. Using matrix algebra, the relation among them can be written as

$$\mathbf{d} = \mathbf{A}\mathbf{b} + \boldsymbol{\epsilon} \tag{3}$$

where $\mathbf{d}$, $\mathbf{b}$, and $\boldsymbol{\epsilon}$ are the column vectors of the $d_{ij}$, $b_k$, and $\epsilon_{ij}$, and $\mathbf{A}$ is the topology matrix. The least-squares (LS) estimate of $\mathbf{b}$ is then given by

$$\hat{\mathbf{b}} = (\mathbf{A}^{t}\mathbf{A})^{-1}\mathbf{A}^{t}\mathbf{d} = \mathbf{L}\mathbf{d} \tag{4}$$

where $\mathbf{L} = (\mathbf{A}^{t}\mathbf{A})^{-1}\mathbf{A}^{t}$. An estimate of the length of the $k$-th branch is therefore

$$\hat{b}_k = \mathbf{L}_k\mathbf{d} \tag{5}$$

where $\mathbf{L}_k$ is the $k$-th row of $\mathbf{L}$.

Neighbor-Joining

Saitou and Nei (1987, cited in Nei & Kumar 2000) developed an efficient tree-building method that is based on the minimum evolution principle. Construction of a tree by the NJ method begins with a star tree, which is produced under the assumption that there is no clustering of taxa. We then estimate the branch lengths of the star tree and compute the sum of all branches. This sum should be greater than the sum for the final NJ tree:

$$S_0 = \sum_{i=1}^{m} L_{iX} = \frac{T_d}{m-1} \tag{6}$$

where $m$ is the total number of sequences used, $L_{iX}$ is the branch length estimate between nodes $i$ and $X$ (the central node of the star tree), and $T_d = \sum_{i<j} d_{ij}$ is the sum of all pairwise distances.

In practice, since we do not know which pairs of taxa are true neighbors, we consider every pair of taxa as a potential pair of neighbors. We then choose the taxa $i$ and $j$ that show the smallest $S_{ij}$ value (Equation 7). This procedure is repeated until the final tree is produced.

$$S_{ij} = \frac{2T_d - R_i - R_j}{2(m-2)} + \frac{d_{ij}}{2} \tag{7}$$

where $R_i = \sum_{k=1}^{m} d_{ik}$ and $R_j = \sum_{k=1}^{m} d_{jk}$.

Once the smallest $S_{ij}$ is determined, we can create a new node $Y$ that connects taxa $i$ and $j$. The branch lengths $b_{Yi}$ and $b_{Yj}$ are given by the following formulas:

$$b_{Yi} = \frac{1}{2(m-2)}\left[(m-2)\,d_{ij} + R_i - R_j\right] \tag{8}$$

$$b_{Yj} = \frac{1}{2(m-2)}\left[(m-2)\,d_{ij} - R_i + R_j\right] \tag{9}$$

The next step is to compute the distance between the new node $Y$ and each remaining taxon $k$ ($k \neq i, j$):

$$d_{Yk} = \frac{d_{ik} + d_{jk} - d_{ij}}{2} \tag{10}$$

Bootstrap

The bootstrap was first introduced by Efron (1979) to obtain estimates of error in nonstandard situations by resampling the data set many times to provide a distribution against which hypotheses could be tested. In 1985, Felsenstein (cited in Soltis & Soltis 2003) formally proposed bootstrapping as a method for obtaining confidence limits on phylogenies.

We can indicate the tree-building process schematically as

$$\mathbf{x} \rightarrow \hat{D} \rightarrow \widehat{\mathrm{TREE}}$$

where $\mathbf{x}$ is the data matrix and $\hat{D}$ is an estimated distance matrix. Felsenstein's method proceeds as follows. A bootstrap data matrix $\mathbf{x}^*$ is formed by randomly selecting columns, with replacement, from the original matrix $\mathbf{x}$.
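The column-resampling step just described can be sketched in Python as follows. This is only an illustration of the idea, not the implementation used by any particular software package; the function name `bootstrap_alignment`, the toy alignment, and the fixed random seed are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Form one bootstrap data matrix by resampling columns with replacement."""
    n_sites = len(alignment[0])
    if any(len(seq) != n_sites for seq in alignment):
        raise ValueError("all sequences must have the same length")
    # Draw n_sites column indices with replacement; the same indices are
    # applied to every sequence so that whole columns stay together.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[c] for c in columns) for seq in alignment]


# Toy alignment (hypothetical): rows are taxa, columns are nucleotide sites.
original = ["ACGTAC", "ACGTTC", "TCGTTA"]
replicate = bootstrap_alignment(original, random.Random(2010))
```

Each replicate has the same dimensions as the original matrix, and every one of its columns is a column of the original alignment, possibly repeated.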
Then the original tree-building algorithm is applied to the bootstrap data matrix $\mathbf{x}^*$, giving a bootstrap tree as

$$\mathbf{x}^* \rightarrow \hat{D}^* \rightarrow \widehat{\mathrm{TREE}}^*$$

Then the proportions of bootstrap trees agreeing with the original tree are calculated. These proportions are the bootstrap confidence values.

Bootstrap Variance

In this paper, the bootstrap method was also used to compute the variances of the distance measure. Nucleotide sequences of length $n$ base pairs are resampled in the same way as described above: the random sample is produced by resampling the nucleotide sites (columns) with replacement. When the bootstrap resampled data set is obtained, distance estimates are computed using Equation 1 for each pair of sequences. This procedure is repeated $B$ times. We denote by $\hat{d}^*_b$ the value of the distance estimate for the $b$-th bootstrap replication. The bootstrap variance is then computed by

$$V_B(\hat{d}) = \frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{d}^*_b - \bar{d}^*\right)^2 \tag{11}$$

where $\bar{d}^*$ is the mean of $\hat{d}^*_b$ over all replications.

One assumption often made for the bootstrap is that all sites evolve independently. This assumption of course does not hold in the present case. However, if the number of sites examined is large, as in the present case, the effect of violating the assumption is not important, because most sites with different evolutionary rates will be represented in each bootstrap sample.

METHODOLOGY

Data Sources

For this research, complete mtDNA genome data from five common animal species were obtained free of charge from GenBank. The data were accessed on April 2. The species were: Bison bison, Bos taurus, Bos indicus, Bubalus bubalis, and Capra hircus. The location of the D-loop region was taken from the information available in each record, as shown in Appendix 1.

Methods

The procedures conducted for this research were:
1. Downloaded the complete mtDNA sequences from GenBank. The available data sets were:
   a. Bison bison (2)
   b. Bos taurus (10)
   c. Bos indicus (3)
   d. Bubalus bubalis (3)
   e. Capra hircus (7)
   The numbers in brackets show the number of sequences downloaded from GenBank.
2. Reduced the data from the complete mtDNA sequence to the D-loop region only, for all sequences used. The D-loop mtDNA sequences are around 1,122 base pairs long (before the gaps were edited).
3. Aligned all data sets. Alignment is necessary so that the sequences compared have the same number of nucleotide sites.
4. Deleted all gaps in the data sets, because insertions and deletions introduce gaps in the DNA sequence alignment. After gap deletion, the length of the D-loop mtDNA sequences was reduced to 882 base pairs.
5. Built the cases by forming groups of taxa:
   a. Group A consists of: Bison bison (2), Bos taurus (2), Bos indicus (2), Bubalus bubalis (2), Capra hircus (2).
   b. Group B consists of: Bison bison (2), Bos taurus (10), Bos indicus (3), Bubalus bubalis (3), Capra hircus (7).
   c. Group C consists of: Bison bison (2), Bos taurus (1), Bos indicus (1), Bubalus bubalis (1), Capra hircus (1).
   d. Group D consists of: Bison bison (1), Bos taurus (10), Bos indicus (1), Bubalus bubalis (1), Capra hircus (1).
   e. Group E consists of: Bison bison (1), Bos taurus (1), Bos indicus (3), Bubalus bubalis (1), Capra hircus (1).
   f. Group F consists of: Bison bison (1), Bos taurus (1), Bos indicus (1), Bubalus bubalis (3), Capra hircus (1).
   g. Group G consists of: Bison bison (1), Bos taurus (1), Bos indicus (1), Bubalus bubalis (1), Capra hircus (7).
   The numbers in brackets show the number of sequences used to build each case. The sequences of a species used in a group were selected randomly from the available sequences.
6. Constructed the UPGMA, ME, and NJ trees based on the p-distance model. The standard errors of the overall mean of the estimated distances for all groups were computed using the bootstrap procedure with 1000 repeated times.
7. Checked the reliability of each method using the bootstrap procedure, repeated 100, 1000, and 10000 times.
All procedures were conducted using MEGA.

RESULT AND DISCUSSION

Distance Matrix

Table 1 shows the overall mean of the estimated distances for all groups. The standard error under this model is relatively small and constant across groups. The standard error was computed using the bootstrap procedure with 1000 repeated times. Given these results, the p-distance model is reliable enough to be used in constructing the evolutionary tree.

Table 1 Overall Mean and Standard Error for each Group (columns: Group, Mean, S.E.; rows: Groups A to G)

UPGMA's Performance in Reconstructing the Evolutionary Tree

The reliability of UPGMA is good on average across all conditions. In group C and group F, UPGMA shows stable topologies as the number of bootstrap repeated times changes. UPGMA failed to construct a reliable topology describing the relationship between BosIndicus(1) and BosIndicus(2) in group A and group E. The bootstrap confidence value goes down when the repeated times change from 100 to 1000 (from 100% to 99%), but stays constant at 99% when the repeated times change from 1000 to 10000, as can be seen in Figure 3.

Figure 3 The evolutionary trees of group A (a) and group E (b) for 100, 1000, and 10000 repeated times respectively

Of the seven groups used to compare the reliability of each method, only two groups are stable when the repeated times change from 100 to 1000, while four groups are consistent when the repeated times change from 1000 to 10000. No conclusion could be drawn for the reconstruction of group B, where the bootstrap confidence values went up and down as the repeated times changed. However, the topology relating the sequences did not change.

ME's Performance in Reconstructing the Evolutionary Tree

The reliability of ME is very low, especially when the number of sequences used increases. In group B and group D, where the numbers of sequences used are 25 and 14 respectively, the consistency shown was low (Appendix 4). Just like UPGMA, ME could not reach stable bootstrap confidence values in group B. In fact, the values went down as the repeated times changed (Appendix 9). ME shows consistent performance only in group F, while in group C, where the number of sequences is only six, instability occurred at two nodes. The first is the interior branch that joins BosIndicus-BosTaurus, whose bootstrap confidence value decreased from 100% to 99% as the repeated times changed. The second is the node that relates BisBison(1)-BisBison(2) with BosIndicus-BosTaurus, whose value also decreased from 100% to 99% as the repeated times changed.

NJ's Performance in Reconstructing the Evolutionary Tree

The reliability of the NJ method increases as the number of repeated times increases for almost all cases. In group A, when the repeated times move from 100 to 1000, the bootstrap confidence value for the relationship between BosIndicus(1)-BosIndicus(2) goes down from 100% to 99%, but stays steady at 99% when the repeated times move to 10000. The same thing happened in group E, at the node of BosIndicus(1)-BosIndicus(2).
Appendix 4 shows that the number of consistent clades in NJ varied when the repeated times changed from 100 to 1000, but was mostly steady when the repeated times changed from 1000 to 10000. This means that when the variation among the sequences used is high, 1000 repeated times were sufficient to assess the reliability of NJ.

Figure 4 Consensus Tree of group C using ME for 100, 1000, and 10000 repeated times respectively

Figure 5 Consensus Tree of group E using NJ for 100, 1000, and 10000 repeated times respectively

NJ also failed to construct a reliable evolutionary tree for group B. It failed to maintain the bootstrap confidence values across all repeated times, but compared to ME and UPGMA, NJ was the most reliable method for constructing the evolutionary tree of group B, even though the number of repeated times was limited.

Performance Comparison of UPGMA, ME, and NJ in Reconstructing the Evolutionary Tree

Compared to the others, ME has the longest computational time (Appendix 10), while UPGMA has the shortest. This may be due to the computational iteration in ME, where all possible topologies are constructed one by one to find the topology with the smallest value of $S$, whereas UPGMA simply clusters, at each step, the two taxa with the smallest genetic distance.

Appendix 4 shows the consistency comparison among the UPGMA, ME, and NJ methods as the repeated times change from 100 to 1000 and from 1000 to 10000, applied to all built cases. From the graphs in Appendix 4, we can see that UPGMA has the highest consistency compared to ME and NJ for almost every group, while the evolutionary tree constructed by the NJ method starts to show consistency when the repeated times reach 1000.

All methods failed to reconstruct a reliable evolutionary tree for group B (see Appendix 5). NJ shows slightly different topological relationships from UPGMA, but both failed to give consistent support for their topologies. Because of this, further understanding of this case is needed. Appendix 5 shows the means and variances of the nucleotide composition for all built cases. This information shows that, compared to the other cases, the nucleotide variance for group B was relatively small for each nucleotide. The same thing happened in group D, where the nucleotide variances for T(U), C, and A are 0.192, 0.169, and 0.207 respectively (see Appendix 5 for G). Those are relatively small compared to group F, where the corresponding variances are 0.298, 0.480, and 1.367.

In group G, all methods showed inconsistency in constructing the topology among the Capra hircus sequences (Figure 6), especially in describing the relationship between CapHircus(2) and CapHircus(1)-CapHircus(4).
This may be caused by the nucleotide composition of CapHircus(1), CapHircus(2), and CapHircus(4), which differ slightly in cytosine (C) and guanine (G). The percentage of cytosine is 26.4% in CapHircus(1) and CapHircus(4) but 26.3% in CapHircus(2), while the percentage of guanine is 15.5% in CapHircus(1) and CapHircus(4) but 15.6% in CapHircus(2).

Figure 6 Consensus Tree of CapHircus using UPGMA (a), ME (b), and NJ (c) for 100, 1000, and 10000 repeated times respectively

CONCLUSION

Under the assumption that the nucleotide substitution rate is the same for all evolutionary lineages, UPGMA is the most consistent distance method, followed by NJ and lastly ME. UPGMA is a good distance method when the interest is limited to classifying the sequences and to the total branch length. Because the branch lengths between nearest taxa in UPGMA are assumed to be equal, the method is not appropriate when the evolutionary distances of individual sequences are of interest. NJ is a consistent distance method when the number of bootstrap repeated times is not less than 1000, and it is a good distance method when information about the evolutionary distances among sequences is wanted.

When the number of sequences is large and the extent of sequence divergence is low, the realized tree may have many interior branches with zero length unless a large number of nucleotides are examined. In this case, as shown in group B, it is generally difficult to reconstruct the true tree by any method. There is then no need to examine such sequences, because the tree would not be reliable whichever tree-building method is used.

It is now clear that no method is superior to the others in all conditions; some methods perform better than others under certain conditions but worse under other conditions. Therefore, even if many interior branches of a tree are not well supported by the bootstrap, the tree should not be discarded. It is a hypothetical tree, but it could be a correct one.

RECOMMENDATION

This research is based on many assumptions and is subject to several limitations. If the assumptions and limitations can be relaxed, a better result could be expected. Recommendations for further research are:
1. The pairwise distance model in this research assumes that the nucleotide substitution rate is the same for all evolutionary lineages. This may not reflect the real condition of mtDNA sequences, which can vary in mutation and/or substitution rate. A distance model that reflects the real mutation and/or substitution rates in the sequences might give a better result.
2. It would be interesting to derive the empirical distance from current statistical methods, for example using Bayesian or maximum likelihood methods to estimate the parameter (pairwise distance), in order to find the most reliable method.

REFERENCE

[Anonim]. DNA Mitokondria. DNAmitokondria&amp [May 18, 2010].
[Anonim]. Understanding Evolution. icle/phylogenetics_05 [May 18, 2010].
Backeljau T et al. Multiple UPGMA and Neighbor-Joining Trees and the Performance of Some Computer Packages. Mol Biol Evol 13(2). g [March 23, 2010].
Efron B, Tibshirani R. An Introduction to the Bootstrap. New York: Chapman & Hall.
Elfaizi MA, Aprijani DA. 2004. Bioinformatika: Perkembangan, Disiplin Ilmu, dan Penerapannya di Indonesia. http// rg/copyleft/fdl.html. [January 26, 2010].
Ewens WJ, Grant GR, Dietz K, editor. Statistical Methods in Bioinformatics: An Introduction. New York: Springer-Verlag.
Gascuel O, Bryant D, Denis F. Strengths and Limitations of the Minimum Evolution Principle. Sys Biol 50(5). [April 21, 2010].
Holmes S. 2003. Bootstrapping Phylogenetic Trees: Theory and Methods. Stat Sci 18(2).
Husmeier D. A Brief Tutorial on Phylogenetics. [March 22, 2010].
Li Yan. 2001. How to Build a Phylogenetic Tree. [March 22, 2010].
Nei M, Kumar S. 2000. Molecular Evolution and Phylogenetics. New York: Oxford University Press.
Singh K, Xie M. Bootstrap: A Statistical Method. gers.edu/~mxie/ RCPaRCPa/bootstrap.pdf. [May 25, 2010].
Soltis PS, Soltis DE. 2003. Applying the Bootstrap in Phylogeny Reconstruction. Stat Sci 18(2).

Appendix 1 Available Information of Bison bison

LOCUS       NC_ bp DNA circular MAM 13-APR-2009
DEFINITION  Bison bison mitochondrion, complete genome.
ACCESSION   NC_
VERSION     NC_ GI:
DBLINK      Project:36339
KEYWORDS    .
SOURCE      mitochondrion Bison bison (American bison)
  ORGANISM  Bison bison
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia;
            Pecora; Bovidae; Bovinae; Bison.
REFERENCE   1 (bases 1 to 16319)
FEATURES             Location/Qualifiers
     source          /organism="Bison bison"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9901"
                     /breed="american"
     D-loop          join( ,1..360)
     tRNA            /product="tRNA-Phe"
     rRNA            /product="s-rRNA"
     CDS             /gene="cytb"
                     /codon_start=1
                     /transl_table=2
                     /product="cytochrome b"
                     /protein_id="YP_ "
                     /db_xref="gi: "
                     /db_xref="geneid: "
                     /translation="MTNLRKSHPLMKIVNNAFIDLPAPSNISSWWNFGSLLGMCLTLQ
                     ILTGLFLAMHYTSDTTTAFSSVAHICRDVNYGWIIRYMHANGASMFFICLYMHVGRGL
                     YYGSYTFLETWNIGVILLLTVMATAFMGYVLPWGQMSFWGATVITNLLSAIPYIGTNL
                     VEWIWGGFSVDKATLTRFFAFHFILPFIIMAIAMVHLLFLHETGSNNPTGISSDMDKI
                     PFHPYYTIKDILGALLLILALMLLVLFTPDLLGDPDNYTPANPLNTPPHIKPEWYFLF
                     AYAILRSIPNKLGGVLALAFSILILALIPLLHTSKQRSMIFRPLSQCLFWTLVADLLT
                     LTWIGGQPVEHPYIIIGQMASIMYFLLILVLMPTAGTIENKLLKW"
        1 actaatgact aatcagccca tgctcacaca taactgtgct gtcatacatt tggtattttt
       61 ttattttggg ggatgcttgg actcagctat ggccgtcaaa ggccctgacc cggagcatct
      121 attgtagctg gacttaactg caccttgagc accagcataa tggtaagcat gcacatatag
      181 tcaatggtta caggacataa ctgtattata tatccccccc tccataaaaa ttccccctta
      241 aatatttacc actgctttta acagattttt ccctagttac ctatttaaat tttccacact
      301 ttcaatactc aaattagcac tccatataaa gtcaatatat aaacgcaggc cccccccccc
      361 cgttgatgta gcttaaccca aagcaaggca ctgaaaatgc ctagatgagt ctcccaactc
     1861 aataaatctc actgtaactt taaaagttaa tctaaaaagg tacagccttt tagaaacgga
     1921 tacaaccttg actagagagt aaaatataac actaccatag taggcccaaa agcagccacc
     1981 aattgagaaa gcgttaaagc tcaacaacaa aaattaaaca gatcccaata acaagtaatt
     2041 aactcctagc cccaatactg gactaatcta ttattgaata gaagtaataa tgttagtatg
     2101 agtaacaaga aaaactttct ccttgcataa gtctaagtca gtatctgata atactctgac
          cattaatgta ataaaaacat attatgtata tagtacatta aattatatgc cccatgcata
          taagcaagta cttatcctct attgacagta catagtacat aaagttatta attgtacata
          gcacattatg tcaaatctac ccttggcaac atgcatatcc cttccattag atcacgagct
          taattaccat gccgcgtgaa accagcaacc cgctaggcag aggatccctc ttctcgctcc
          gggcccatga accgtggggg tcgctattta atgaacttta tcagacatct ggttctttct
          tcagggccat ctcacctaaa atcgcccatt ctttcctctt aaataagaca tctcgatgg
//

Appendix 2 Nucleotide Composition
Columns: T(U), C, A, G
Rows: BisBison(1), BisBison(2), BosIndicus(1), BosIndicus(2), BosIndicus(3), BosTaurus(1), BosTaurus(2), BosTaurus(3), BosTaurus(4), BosTaurus(5), BosTaurus(6), BosTaurus(7), BosTaurus(8), BosTaurus(9), BosTaurus(10), BubBubalis(1), BubBubalis(2), BubBubalis(3), CapHircus(1), CapHircus(2), CapHircus(3), CapHircus(4), CapHircus(5), CapHircus(6), CapHircus(7), Avg

Appendix 3 Nucleotide Pair Frequencies
Columns: Domain, ii, si, sv, R, TT, TC, TA, TG, CC, CA, CG, AA, GG, Total
Row: avg
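The quantities tabulated in Appendix 2 and Appendix 3 can be illustrated with a minimal Python sketch. It assumes that ii, si, and sv denote the numbers of identical, transitional, and transversional site pairs between two aligned sequences, that R is the ratio si/sv, and that the p-distance is the proportion of compared sites that differ. The function names and the two toy sequences are hypothetical and are not taken from the study data.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = "TCAG"


def normalise(seq):
    """Upper-case a sequence and treat U as T."""
    return seq.upper().replace("U", "T")


def composition(seq):
    """Percentage of T(U), C, A and G among the unambiguous bases."""
    counts = Counter(b for b in normalise(seq) if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in BASES}


def pair_counts(seq_a, seq_b):
    """Count identical (ii), transitional (si) and transversional (sv) pairs."""
    ii = si = sv = 0
    for a, b in zip(normalise(seq_a), normalise(seq_b)):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous bases
        if a == b:
            ii += 1
        elif {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            si += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            sv += 1  # purine<->pyrimidine
    return ii, si, sv


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which the two sequences differ."""
    ii, si, sv = pair_counts(seq_a, seq_b)
    return (si + sv) / (ii + si + sv)


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical, for illustration only).
    a = "ACGTACGTAC"
    b = "ACGTGCGTCC"
    print(composition(a))
    print(pair_counts(a, b))
    print(p_distance(a, b))
```

Applying `p_distance` to every pair of aligned sequences would give a distance matrix of the kind that UPGMA, ME, and NJ take as input.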

Appendix 4 Consistency comparison among UPGMA (a), ME (b), and NJ (c) across all built cases
Panels: UPGMA, ME, and NJ for each of Group A, Group B, Group C, Group D, Group E, Group F, and Group G

Appendix 5(a) Statistics of the built cases Group A, Group C, Group E, and Group F

                 Group A    Group C    Group E    Group F
Relationship     constant   constant   constant   constant
UPGMA-NJ-ME UPGMA-NJ, ME - - UPGMA - ME - NJ
UPGMA            (1)        constant   (1)        constant
NJ               (1)        (1)        (1)        constant
ME               (2)        (2)        (2)        constant
No. of Sequence
Sequence Length
Description      T(U) C A G   T(U) C A G   T(U) C A G   T(U) C A G
avg
var

Appendix 5(b) Statistics of the built cases Group G, Group D, and Group B

                 Group G    Group D    Group B
Relationship     constant   varied     unstable
UPGMA-NJ-ME - UPGMA, NJ-ME -
UPGMA            (4)        (5)        (1)
NJ               (3)        (6)        (1)
ME               (4)        (7)        (2)
No. of Sequence
Sequence Length
Description      T(U) C A G   T(U) C A G   T(U) C A G
avg
var

Appendix 6 Constructed evolutionary tree for Group B using UPGMA (a), ME (b), and NJ (c)

(a) BosTaurus(3) BosTaurus(4) BosTaurus(6) BosTaurus(7) BosTaurus(1) BosTaurus(2) BosTaurus(5) BosIndicus(2) BosTaurus(10) BosIndicus(1) BosTaurus(9) BosIndicus(3) BosTaurus(8) BisBison(1) BisBison(2) BubBubalis(3) BubBubalis(1) BubBubalis(2) CapHircus(7) CapHircus(6) CapHircus(3) CapHircus(5) CapHircus(2) CapHircus(1) CapHircus(4)

(b) BosTaurus(6) BosTaurus(7) BosTaurus(1) BosTaurus(3) BosTaurus(4) BosTaurus(2) BosTaurus(5) BosIndicus(2) BosTaurus(10) BosIndicus(1) BosTaurus(9) BosIndicus(3) BosTaurus(8) BisBison(1) BisBison(2) BubBubalis(3) BubBubalis(1) BubBubalis(2) CapHircus(7) CapHircus(6) CapHircus(3) CapHircus(1) CapHircus(4) CapHircus(2) CapHircus(5)

(c) BosTaurus(3) BosTaurus(4) BosTaurus(6) BosTaurus(7) BosTaurus(1) BosTaurus(2) BosTaurus(5) BosIndicus(2) BosTaurus(10) BosIndicus(1) BosTaurus(9) BosIndicus(3) BosTaurus(8) BisBison(1) BisBison(2) BubBubalis(3) BubBubalis(1) BubBubalis(2) CapHircus(7) CapHircus(6) CapHircus(3) CapHircus(5) CapHircus(2) CapHircus(1) CapHircus(4)

Appendix 7 Original constructed tree of Group B using UPGMA, with the bootstrap repeated 100, 1000, and 10000 times, respectively

Appendix 8 Original constructed tree of Group B using NJ, with the bootstrap repeated 100, 1000, and 10000 times, respectively

Appendix 9 Original constructed tree of Group B using ME, with the bootstrap repeated 100, 1000, and 10000 times, respectively
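The bootstrap trees referred to in Appendices 7 to 9 rest on Felsenstein's bootstrap, in which the sites (columns) of the alignment are resampled with replacement to form pseudo-alignments of the original length. The sketch below shows only this resampling step, under the assumption that the alignment is held as a dictionary of equal-length strings. The function names, the seed handling, and the toy taxa are hypothetical. Tree building on each replicate and the counting of clade support are deliberately left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Draw one bootstrap replicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    # Pick as many column indices as there are sites, with replacement.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Return a list of n_replicates resampled alignments."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Toy alignment (hypothetical, for illustration only).
    toy = {"taxon1": "ACGTAC", "taxon2": "ACGTTC", "taxon3": "ACCTTC"}
    for replicate in bootstrap_replicates(toy, 3):
        print(replicate)
```

In a full analysis, each replicate would be turned into a distance matrix and a tree, and the share of replicates recovering a given clade would be reported as its bootstrap support.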

Appendix 10 Comparison of Computational Time Among All Methods and Cases

Group     Method   Constructed Tree   Bootstrap Tree
Group A   UPGMA    01.1 s             01.3 s   01.5 s   05.9 s
          ME       01.5 s             01.7 s   01.9 s   07.5 s
          NJ       01.2 s             01.5 s   01.7 s   05.9 s
Group B   UPGMA    01.3 s             01.5 s   03.4 s   21.8 s
          ME       01.6 s             01.7 s   04.1 s   31.1 s
          NJ       01.4 s             01.6 s   03.4 s   22.6 s
Group C   UPGMA    01.3 s             01.4 s   01.6 s   03.9 s
          ME       01.3 s             01.7 s   03.0 s   04.4 s
          NJ       01.3 s             01.5 s   01.7 s   03.9 s
Group D   UPGMA    01.4 s             01.4 s   01.9 s   08.7 s
          ME       01.5 s             01.5 s   03.7 s   12.1 s
          NJ       01.4 s             01.5 s   03.2 s   08.7 s
Group E   UPGMA    01.4 s             01.5 s   01.7 s   04.3 s
          ME       01.5 s             01.6 s   01.9 s   04.8 s
          NJ       01.3 s             01.5 s   01.7 s   04.2 s
Group F   UPGMA    01.3 s             01.4 s   01.6 s   04.3 s
          ME       01.5 s             01.5 s   02.8 s   04.7 s
          NJ       01.2 s             01.4 s   03.2 s   04.2 s
Group G   UPGMA    01.4 s             01.5 s   01.8 s   06.3 s
          ME       01.5 s             01.6 s   02.8 s   08.2 s
          NJ       01.4 s             01.4 s   01.8 s   06.6 s


Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

More information

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003

CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1. Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 CS5238 Combinatorial methods in bioinformatics 2003/2004 Semester 1 Lecture 8: Phylogenetic Tree Reconstruction: Distance Based - October 10, 2003 Lecturer: Wing-Kin Sung Scribe: Ning K., Shan T., Xiang

More information

Chapter 7: Models of discrete character evolution

Chapter 7: Models of discrete character evolution Chapter 7: Models of discrete character evolution pdf version R markdown to recreate analyses Biological motivation: Limblessness as a discrete trait Squamates, the clade that includes all living species

More information

Curriculum Links. AQA GCE Biology. AS level

Curriculum Links. AQA GCE Biology. AS level Curriculum Links AQA GCE Biology Unit 2 BIOL2 The variety of living organisms 3.2.1 Living organisms vary and this variation is influenced by genetic and environmental factors Causes of variation 3.2.2

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 2009 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B

Microbial Diversity and Assessment (II) Spring, 2007 Guangyi Wang, Ph.D. POST103B Microbial Diversity and Assessment (II) Spring, 007 Guangyi Wang, Ph.D. POST03B guangyi@hawaii.edu http://www.soest.hawaii.edu/marinefungi/ocn403webpage.htm General introduction and overview Taxonomy [Greek

More information

Building Phylogenetic Trees UPGMA & NJ

Building Phylogenetic Trees UPGMA & NJ uilding Phylogenetic Trees UPGM & NJ UPGM UPGM Unweighted Pair-Group Method with rithmetic mean Unweighted = all pairwise distances contribute equally. Pair-Group = groups are combined in pairs. rithmetic

More information

Classification, Phylogeny yand Evolutionary History

Classification, Phylogeny yand Evolutionary History Classification, Phylogeny yand Evolutionary History The diversity of life is great. To communicate about it, there must be a scheme for organization. There are many species that would be difficult to organize

More information

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

More information

Molecular phylogeny How to infer phylogenetic trees using molecular sequences

Molecular phylogeny How to infer phylogenetic trees using molecular sequences Molecular phylogeny How to infer phylogenetic trees using molecular sequences ore Samuelsson Nov 200 Applications of phylogenetic methods Reconstruction of evolutionary history / Resolving taxonomy issues

More information

Lecture Notes: Markov chains

Lecture Notes: Markov chains Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

More information

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057

Estimating Phylogenies (Evolutionary Trees) II. Biol4230 Thurs, March 2, 2017 Bill Pearson Jordan 6-057 Estimating Phylogenies (Evolutionary Trees) II Biol4230 Thurs, March 2, 2017 Bill Pearson wrp@virginia.edu 4-2818 Jordan 6-057 Tree estimation strategies: Parsimony?no model, simply count minimum number

More information

Introduction to characters and parsimony analysis

Introduction to characters and parsimony analysis Introduction to characters and parsimony analysis Genetic Relationships Genetic relationships exist between individuals within populations These include ancestordescendent relationships and more indirect

More information

Phylogeny and Molecular Evolution. Introduction

Phylogeny and Molecular Evolution. Introduction Phylogeny and Molecular Evolution Introduction 1 2/62 3/62 Credit Serafim Batzoglou (UPGMA slides) http://www.stanford.edu/class/cs262/slides Notes by Nir Friedman, Dan Geiger, Shlomo Moran, Ron Shamir,

More information

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible.

Microbes usually have few distinguishing properties that relate them, so a hierarchical taxonomy mainly has not been possible. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; "fast- clock" molecules for fine-structure.

Microbial Taxonomy. Slowly evolving molecules (e.g., rrna) used for large-scale structure; fast- clock molecules for fine-structure. Microbial Taxonomy Traditional taxonomy or the classification through identification and nomenclature of microbes, both "prokaryote" and eukaryote, has been in a mess we were stuck with it for traditional

More information

Biology 211 (2) Week 1 KEY!

Biology 211 (2) Week 1 KEY! Biology 211 (2) Week 1 KEY Chapter 1 KEY FIGURES: 1.2, 1.3, 1.4, 1.5, 1.6, 1.7 VOCABULARY: Adaptation: a trait that increases the fitness Cells: a developed, system bound with a thin outer layer made of

More information

DEVELOPMENT OF LAND SUITABILITY EVALUATION SYSTEM FOR COASTAL AQUACULTURE USING ARTIFICIAL NEURAL NETWORK AND GEOGRAPHICAL INFORMATION SYSTEMS

DEVELOPMENT OF LAND SUITABILITY EVALUATION SYSTEM FOR COASTAL AQUACULTURE USING ARTIFICIAL NEURAL NETWORK AND GEOGRAPHICAL INFORMATION SYSTEMS DEVELOPMENT OF LAND SUITABILITY EVALUATION SYSTEM FOR COASTAL AQUACULTURE USING ARTIFICIAL NEURAL NETWORK AND GEOGRAPHICAL INFORMATION SYSTEMS Case Study: Mahakam Delta, East Kalimantan I KETUT SUTARGA

More information

Microbial Taxonomy and the Evolution of Diversity

Microbial Taxonomy and the Evolution of Diversity 19 Microbial Taxonomy and the Evolution of Diversity Copyright McGraw-Hill Global Education Holdings, LLC. Permission required for reproduction or display. 1 Taxonomy Introduction to Microbial Taxonomy

More information