Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) Scribe: John Ekins

Similar documents
Multiple Sequence Alignment

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

Copyright 2000 N. AYDIN. All rights reserved. 1

Multiple Sequence Alignment

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Phylogenetic inference

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

1. In most cases, genes code for and it is that

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Overview Multiple Sequence Alignment

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence analysis and comparison

Introduction to Bioinformatics Introduction to Bioinformatics

Quantifying sequence similarity

Multiple Sequence Alignment

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Moreover, the circular logic

p(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v)

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Evolutionary Models. Evolutionary Models

Lecture 5: September Time Complexity Analysis of Local Alignment

CREATING PHYLOGENETIC TREES FROM DNA SEQUENCES

Practical considerations of working with sequencing data

Multiple Choice Review- Eukaryotic Gene Expression

Pairwise & Multiple sequence alignments

Alignment Strategies for Large Scale Genome Alignments

Lecture Notes: Markov chains

Using Bioinformatics to Study Evolutionary Relationships Instructions

Lecture 8 Multiple Alignment and Phylogeny

Today s Lecture: HMMs

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

Protein Structure Prediction II Lecturer: Serafim Batzoglou Scribe: Samy Hamdouche

Evolutionary Tree Analysis. Overview

BINF 730. DNA Sequence Alignment Why?

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

Introduction to Evolutionary Concepts

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

EVOLUTIONARY DISTANCES

1.5 Sequence alignment

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Multiple sequence alignment

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Applications of Hidden Markov Models

Phylogenetic trees 07/10/13

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Bio nformatics. Lecture 3. Saad Mneimneh

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Motivating the need for optimal sequence alignments...

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

objective functions...

Algorithms in Bioinformatics

Phylogenetic Tree Reconstruction

Effects of Gap Open and Gap Extension Penalties

Supplementary Information for Hurst et al.: Causes of trends of amino acid gain and loss

Genomics and bioinformatics summary. Finding genes -- computer searches

Tools and Algorithms in Bioinformatics

Computational Biology: Basics & Interesting Problems

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Pairwise sequence alignment and pair hidden Markov models

Multiple Sequence Alignment. Sequences

Ch. 9 Multiple Sequence Alignment (MSA)

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

A Browser for Pig Genome Data

Mul$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

Sequence comparison: Score matrices

Phylogenetic Trees. Phylogenetic Trees Five. Phylogeny: Inference Tool. Phylogeny Terminology. Picture of Last Quagga. Importance of Phylogeny 5.

GCD3033:Cell Biology. Transcription

Mitochondrial Genome Annotation

An Introduction to Sequence Similarity ( Homology ) Searching

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

SUPPLEMENTARY INFORMATION

Multiple Sequence Alignment: HMMs and Other Approaches

Bioinformatics 1 -- lecture 9. Phylogenetic trees Distance-based tree building Parsimony

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

HMMs and biological sequence analysis

Page 1. Evolutionary Trees. Why build evolutionary tree? Outline

Graph Alignment and Biological Networks

BIOINFORMATICS: An Introduction

Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

7. Tests for selection

Probalign: Multiple sequence alignment using partition function posterior probabilities

Exploring Evolution & Bioinformatics

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:

Study and Implementation of Various Techniques Involved in DNA and Protein Sequence Analysis

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Sequence Analysis '17- lecture 8. Multiple sequence alignment

GEP Annotation Report

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Lecture 2: Pairwise Alignment. CG Ron Shamir

Pairwise Alignment. Guan-Shieng Huang. Dept. of CSIE, NCNU. Pairwise Alignment p.1/55

C E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment

A (short) introduction to phylogenetics

Transcription:

Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) 2 19 2015 Scribe: John Ekins Multiple Sequence Alignment Given N sequences x 1, x 2,, x N : Insert gaps in each of the sequences such that: All sequences have the same length L. Score of the global map is maximum. Multiple alignments allow us to explore the protein sequences and other similarities between different organisms. The things that are common must be useful and thus can tell us the function and more details about the proteins and features that are conserved. The best scheme for scoring the best pairs is the sum of pairs scoring method. Gene Structure There is a process in which a DNA molecule is transcribed, spliced, and translated into a protein, removing introns and keeping extrons. Those are the instructions for which precise ordering of codons will make up that protein needed. Certain structures will be more prone to mutation than others. Induced Pairwise Alignment We take two sequences and remove gaps inserted in the same column. For example: x: AC GCGG C y: AC GC GAG z: GCCGC GAG causes a comparison like this: x: AC GCGG C y: AC GC GAG x: AC GCGG C y: AC GC GAG z: GCCGC GAG z: GCCGC GAG The induced pairwise alignment of 2 sequences is obtained by removing gaps and glueing the sequences together. Sum of Pairs Given n sequences which are multiply aligned, this scoring scheme sums the scores of the induced pairwise alignments between any pair of sequences.

Weighted Sum of Pairs linearly weighted score: We would like to weight pairs of sequences in cases where some species are densely populated and other are sparsely populated. For example, if we have a more populated horse branch and a sparsely populated human branch in the diagram shown, we may not want too much redundant information caused by the population density of horse. So we could weigh humans heavier than horses. Profile Representation

With this technique we create a profile representation of a multiple alignment and the probability that the the base pair is a certain letter in that position. Multidimensional Dynamic Programming We find the optimal alignment up to a given point by maximizing over its neighbors score. This model only has a gap penalty. The graphic and formula refer to 3 sequences that with 7 neighbors for each cell.

One drawback is running time at O(L N 2 N ) since we have N sequences of length L and every cell has 2 N 1 neighbors. If we add affine gap scoring running time becomes O(L N 4 N ). This algorithm isn t used much in practice. Progressive Alignment We can use an evolutionary tree to compare different species and generate a pairwise alignment at each node. The weights at the node are proportional to the divergence in the corresponding edges. The algorithm goes like this: Align closest nodes first, in order of tree In each step, align two sequences to generate a new alignment and associated profile Heres an example problem:

Iterative Refinement

The Barton Stenberg algorithm iteratively removes a sequence and realigns it to the rest of the sequences. We reduce time complexity by allowing the sequence to vary in an arbitrary band around sequence. Example alignment: Here is an example where iterative alignment doesn t solve the multiple alignment problem because removing any single y i doesn t change the output.

Consistency The consistency technique allows us to find even more accurate alignments by avoiding making bad choices in pairwise alignment between two sequences with the help of information from other multiple sequences.

Multiple Alignment Programs Genome Resources Annotation and alignment genome browser at UCSC http://genome.ucsc.edu/cgi bin/hggateway Specialized VISTA alignment browser at LBNL http://pipeline.lbl.gov/cgi bin/gateway2 ABC Nice Stanford tool for browsing alignments http://encode.stanford.edu/~asimenos/abc/ Protein Multiple Aligners CLUSTALW most widely used http://www.ebi.ac.uk/clustalw/ MUSCLE most scalable http://phylogenomics.berkeley.edu/cgi bin/muscle/input_muscle.py PROBCONS most accurate http://probcons.stanford.edu/