Multiple Sequence Alignment

Similar documents
Multiple Sequence Alignment: HMMs and Other Approaches

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Copyright 2000 N. AYDIN. All rights reserved. 1

Overview Multiple Sequence Alignment

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) Scribe: John Ekins

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Sequence Analysis '17- lecture 8. Multiple sequence alignment

Protein Threading. BMI/CS 776 Colin Dewey Spring 2015

Week 10: Homology Modelling (II) - HHpred

Tools and Algorithms in Bioinformatics

Multiple Sequence Alignment

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

p(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v)

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Exercise 5. Sequence Profiles & BLAST

Hidden Markov Models (Part 1)

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

HMMs and biological sequence analysis

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Pattern Matching (Exact Matching) Overview

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

Moreover, the circular logic

Multiple Sequence Alignment using Profile HMM

Sequence Alignments. Dynamic programming approaches, scoring, and significance. Lucy Skrabanek ICB, WMC January 31, 2013

Introduction to Bioinformatics

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Sequence Alignment Techniques and Their Uses

Sequence analysis and Genomics

Similarity searching summary (2)

Multiple Sequence Alignment

Large-Scale Genomic Surveys

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Advanced topics in bioinformatics

An Introduction to Sequence Similarity ( Homology ) Searching

Multiple sequence alignment

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Multiple Whole Genome Alignment

Hidden Markov Models

Introduction to Bioinformatics Online Course: IBT

Bioinformatics. Scoring Matrices. David Gilbert Bioinformatics Research Centre

Comparative Network Analysis

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Pairwise sequence alignment and pair hidden Markov models

Multiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas

Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Heuristic Alignment and Searching

Effects of Gap Open and Gap Extension Penalties

An Introduction to Bioinformatics Algorithms Hidden Markov Models

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Example questions. Z:\summer_10_teaching\bioinfo\Beispiel_frage_bioinformatik.doc [1 / 5]

Haploid & diploid recombination and their evolutionary impact

11.3 Decoding Algorithm

EVOLUTIONARY DISTANCES

Neyman-Pearson. More Motifs. Weight Matrix Models. What s best WMM?

DNA1: Last week's take-home lessons

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Sequencing alignment Ameer Effat M. Elfarash

HMM : Viterbi algorithm - a toy example

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

Michael Yaffe Lecture #5 (((A,B)C)D) Database Searching & Molecular Phylogenetics A B C D B C D

Graph Alignment and Biological Networks

Scoring Matrices. Shifra Ben-Dor Irit Orr

Ch. 9 Multiple Sequence Alignment (MSA)

Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientic Computing

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models

Multiple Sequence Alignment

Bioinformatics 1--lectures 15, 16. Markov chains Hidden Markov models Profile HMMs

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Introduction to Mathematical Methods in Bioinformatics

Lecture 8 Multiple Alignment and Phylogeny

Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Algorithms in Bioinformatics

EECS730: Introduction to Bioinformatics

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

1.5 Sequence alignment

Interpolated Markov Models for Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Stephen Scott.

Gibbs Sampling Methods for Multiple Sequence Alignment

Inferring Models of cis-regulatory Modules using Information Theory

Computational Biology

Algorithmische Bioinformatik WS 11/12:, by R. Krause/ K. Reinert, 14. November 2011, 12: Motif finding

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Single alignment: Substitution Matrix. 16 march 2017

Hidden Markov Models

BLAST: Basic Local Alignment Search Tool

Transcription:

Multiple Sequence Alignment BMI/CS 576 www.biostat.wisc.edu/bmi576.html Colin Dewey cdewey@biostat.wisc.edu Multiple Sequence Alignment: Tas Definition Given a set of more than 2 sequences a method for scoring an alignment Do: determine the correspondences between the sequences such that the alignment score is maximized 1

Motivation for MSA establish input data for phylogenetic analyses determine evolutionary history of a set of sequences At what point in history did certain mutations occur? discovering a common motif in a set of sequences (e.g. DNA sequences that bind the same protein characterizing a set of sequences (e.g. a protein family building profiles for sequence-database searching PSI-BLAST generalizes a query sequence into a profile to search for remote relatives Multiple Alignment of SH3 Domain Figure from A. Krogh, An Introduction to Hidden Marov Models for Biological Sequences 2

Scoring a Multiple Alignment ey issue: how do we assess the quality of a multiple sequence alignment? usually, the assumption is made that the individual columns of an alignment are independent Score(m = G + S(m i gap function we ll discuss two methods sum of pairs (SP minimum entropy i score of i th column Scoring an Alignment: Sum of Pairs compute the sum of the pairwise scores mi s = = S(m i = s(m i,m l i < l character of the th sequence in the i th column substitution matrix 3

Scoring an Alignment: Minimum Entropy basic idea: try to minimize the entropy of each column another way of thining about it: columns that can be communicated using few bits are good information theory tells us that an optimal code uses log 2 p bits to encode a message of probability p Scoring an Alignment: Minimum Entropy the messages in this case are the characters in a given column the entropy of a column is given by: S( m c log 2 p m i c ia p ia = = = = i ia ia a the i th column of an alignment m count of character a in column i probability of character a in column i 4

Dynamic Programming Approach can find optimal alignments using dynamic programming generalization of methods for pairwise alignment consider -dimension matrix for sequences (instead of 2-dimensional matrix each matrix element represents alignment score for subsequences (instead of 2 subsequences given sequences of length n space complexity is O( n Dynamic Programming Approach α i, i, K, i 1 2 max score of alignment for subsequences x, x, K, x 1 2 i i i 1 2 α α α = max α 1 2 i 1, i 1, K, i 1 + S( xi, xi, K, xi 1 2 1 2 2 i, i 1, K, i 1 + S(, xi, K, xi 1 2 2 1 i 1, i, K, i 1 + S( xi,, K, xi 1 2 1 i, i, K, i 1 i 1 2 M M + S(,, K, x 5

Dynamic Programming Approach given sequences of length n time complexity is O 2 ( 2 n if we use sum of pairs O( 2 n if column scores can be computed in O(, as with entropy Heuristic Alignment Methods since time complexity of DP approach is exponential in the number of sequences, heuristic methods are usually used progressive alignment: construct a succession of pairwise alignments star approach tree approaches, lie CLUSTALW etc. iterative refinement given a multiple alignment (say from a progressive method remove a sequence, realign it to profile of other sequences repeat until convergence 6

Star Alignment Approach given: sequences to be aligned x,..., 1 x pic one sequence x c as the center for each x x determine an optimal alignment between x i and c x i c merge pairwise alignments return: multiple alignment resulting from aggregate Star Alignments: Approaches to Picing the Center Two possible approaches: 1. try each sequence as the center, return the best multiple alignment 2. compute all pairwise alignments and select the string x c that maximizes: sim(x i,x c i c 7

Star Alignments: Aggregating Pairwise Alignments once a gap, always a gap shift entire columns when incorporating gaps Star Alignment Example Given: ATGGCCATT ATCCAATTTT ATCTTCTT ATTGCCGATT ATGGCCATT ATC-CAATTTT -- ATTGCCGATT ATTGCC-ATT ATCTTC-TT 8

Star Alignment Example merging pairwise alignments 1. present pair ATGGCCATT alignment ATGGCCATT 2. ATC-CAATTTT -- -- ATGGCCATT-- ATC-CAATTTT Star Alignment Example 3. present pair ATCTTC-TT alignment -- ATGGCCATT-- ATC-CAATTTT ATCTTC-TT-- 4. ATTGCCGATT ATTGCC-ATT shift entire columns when incorporating a gap ATTGCC- A TT-- ATGGCC- A TT-- ATC-CA- A TTTT ATCTTC- - TT-- ATTGCCG A TT-- 9

Tree Alignments basic idea: organize multiple sequence alignment using a guide tree leaves represent sequences internal nodes represent alignments determine alignments from bottom of tree upward return multiple alignment represented at the root of the tree one common variant: the CLUSTALW algorithm [Thompson et al. 1994] Doing the Progressive Alignment in CLUSTALW depending on the internal node in the tree, we may have to align a a sequence with a sequence a sequence with a profile (partial alignment a profile with a profile in all cases we can use dynamic programming for the profile cases, use SP scoring 10

Tree Alignment Example -TGTTAAC -TGT-AAC -TGT--AC ATGT---C ATGT-GGC -TGTAAC -TGT-AC ATGT--C ATGTGGC TGTAAC TGT-AC ATGT--C ATGTGGC TGTTAAC TGTAAC TGTAC ATGTC ATGTGGC Aligning Profiles aligning sequences/profiles to profiles is essentially pairwise alignment shift entire columns when incorporating gaps TGTTAAC -TGT AAC -TGT -AC ATGT --C ATGT GGC -TGTTAAC -TGT-AAC -TGT--AC ATGT---C ATGT-GGC 11