Comparison of Cost Functions in Sequence Alignment. Ryan Healey

Use of Cost Functions
- Used to score and align sequences.
- Mathematically model how sequences mutate and evolve.
- Evolution and mutation can depend on the source and other conditions of each sequence.
- Cost functions can be context dependent.
- Small changes can have significant effects (as measured by sensitivity).
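The sensitivity point above can be illustrated with a small sketch. It is not taken from the slides: it assumes a Needleman-Wunsch-style global alignment that minimizes cost, with a mismatch cost of 1 and a per-position gap cost, and the function name `align` is hypothetical. Changing only the gap cost switches the optimal alignment of the same two sequences between a gapped and an ungapped form.

```python
def align(a, b, mismatch=1, gap=2):
    """Minimum-cost global alignment with a per-position gap cost.

    Illustrative assumption: matches cost 0, mismatches cost `mismatch`,
    and every gap position costs `gap`.
    Returns (cost, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # dp[i][j] = minimum cost of aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (0 if a[i - 1] == b[j - 1] else mismatch)
            dp[i][j] = min(sub, dp[i - 1][j] + gap, dp[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
            0 if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Same sequences, two gap costs: the cheap gap shifts the sequences,
# the expensive gap prefers four mismatches instead.
for gap_cost_value in (1, 3):
    print(gap_cost_value, align("ACGTT", "CGTTA", gap=gap_cost_value))
```

With a gap cost of 1 the sketch returns the shifted alignment `ACGTT-` / `-CGTTA` at cost 2; with a gap cost of 3 it returns the ungapped alignment at cost 4.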

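Common Methods: Gap Functions
($L$ is the gap length; each $C$ is a constant, with subscripts distinguishing separate parameters.)
- Simple: $C$
- Affine: $C_0 + C_1 L$
- Logarithmic: $C_0 + C_1 \log(L)$
- Affine-Logarithmic: $C_0 + C_1 L + C_2 \log(L)$

Other Methods
- Stochastic / Probabilistic
- Weighting by sequence, position, nucleotide, etc.
- Structural Homology Guidance
- And more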
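The gap functions listed above can all be written as special cases of one form: a constant, plus a term linear in the gap length, plus a term logarithmic in the gap length. The sketch below is illustrative only. The parameter names (`constant`, `per_position`, `log_weight`) and the default values are assumptions, not values from the slides, and the simple cost is read here as a flat charge per gap regardless of length; if a per-position charge was intended, it would be `per_position * length` instead.

```python
import math


def gap_cost(length, constant=0.0, per_position=0.0, log_weight=0.0):
    """General gap cost: constant + per_position * L + log_weight * log(L).

    `length` is the gap length L and must be at least 1.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return constant + per_position * length + log_weight * math.log(length)


# The four functions from the slide as special cases.
# All default parameter values are illustrative assumptions.
def simple(length, constant=2.0):
    return gap_cost(length, constant=constant)


def affine(length, constant=2.0, per_position=0.5):
    return gap_cost(length, constant=constant, per_position=per_position)


def logarithmic(length, constant=2.0, log_weight=1.0):
    return gap_cost(length, constant=constant, log_weight=log_weight)


def affine_logarithmic(length, constant=2.0, per_position=0.5, log_weight=1.0):
    return gap_cost(length, constant=constant, per_position=per_position,
                    log_weight=log_weight)


# Compare how each function grows with gap length.
for gap_length in (1, 2, 5, 20):
    print(gap_length,
          simple(gap_length),
          affine(gap_length),
          round(logarithmic(gap_length), 3),
          round(affine_logarithmic(gap_length), 3))
```

The comparison loop shows the qualitative difference between the forms: the affine cost grows steadily with gap length, while the logarithmic cost flattens out, so long gaps are penalized much less relative to short ones.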

Questions to Answer
- When (if ever) is each method preferred?
- How are parameter values chosen?
- How do the parameter values affect performance?
- What limitations may make each method insufficient?

Methods of Comparison
- Popularity
- Speed-Complexity
- Alignment Accuracy
- Divergence
- Others?
