# Sequence Analysis '17- lecture 8. Multiple sequence alignment

## Transcription

1 Sequence Analysis '17- lecture 8 Multiple sequence alignment

2 Ex5 explanation. How many random database search scores have e-values ≤ 10? (Answer: 10!) Why? The e-value of a score x is $E(x) = m \cdot P(S \ge x)$, where m is the database size and P() is the EVD, which models the distribution of random database search scores. So, by definition, the e-value is the expected number of random database search scores of at least x. [Figure: score thresholds at which $m \cdot P(S \ge x)$ = 10, 4, 3, 2, 1, giving e-values 10, 4, 3, 2, 1. Axis: random chance number of occurrences in a database search (e-value).]

3 Manual editing of alignments in UGENE. Download and open the bad alignment from the course web page*. Align using Kalign. Can you make it better? Edit manually to consolidate gaps without forcing too many mismatches. How many indel events are implied by your alignment? * The file should open as an alignment. Older versions of UGENE open it as a list of sequences instead. If that happens, select the sequences, right-click, export all sequences as an alignment, and add it to the Project.

4 Methods for multiple sequence alignment: dynamic programming; star; progressive (ClustalW, which uses a variable gap penalty; Kalign, which is very fast and uses exact matches); progressive + stochastic (MUSCLE). MSA algorithms must be computationally efficient AND biologically relevant.

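5 Is dynamic programming possible for more than two sequences? A 3-sequence alignment matrix means DP in 3D:

$$A(i,j,k) = \max \begin{cases} A(i-1,j-1,k-1) + S(i,j,k) \\ A(i-1,j,k) - gap \\ A(i,j-1,k) - gap \\ A(i,j,k-1) - gap \\ A(i-1,j-1,k) - gap \\ A(i-1,j,k-1) - gap \\ A(i,j-1,k-1) - gap \end{cases}$$

Here A is the alignment score matrix and S(i,j,k) is the match score for aligning position i, j, and k of the three sequences. How about adding a 4th sequence? How does DP run-time scale with the number of sequences?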
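The 3D recurrence on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function name `align3_score`, the match/mismatch values, and the sum-of-pairs column score used for S(i,j,k) are all assumptions. Following the slide, every move other than the full diagonal pays one flat `gap`.

```python
from itertools import product

def align3_score(a, b, c, match=1, mismatch=-1, gap=1):
    """Best global alignment score of three sequences by 3D DP (sketch)."""
    def pair(x, y):
        return match if x == y else mismatch

    def column_score(x, y, z):
        # Assumption: S(i,j,k) is the sum of the three pairwise scores.
        return pair(x, y) + pair(x, z) + pair(y, z)

    n, m, p = len(a), len(b), len(c)
    neg_inf = float("-inf")
    A = [[[neg_inf] * (p + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    A[0][0][0] = 0
    for i, j, k in product(range(n + 1), range(m + 1), range(p + 1)):
        if i == 0 and j == 0 and k == 0:
            continue
        best = neg_inf
        # Each cell has up to 2**3 - 1 = 7 predecessor cells.
        for di, dj, dk in product((0, 1), repeat=3):
            if di == dj == dk == 0:
                continue
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            if di and dj and dk:
                step = column_score(a[pi], b[pj], c[pk])
            else:
                step = -gap  # flat penalty, as in the slide's recurrence
            best = max(best, A[pi][pj][pk] + step)
        A[i][j][k] = best
    return A[n][m][p]
```

For N sequences of length L the same idea fills on the order of L^N cells, each with 2^N - 1 predecessors, which is why exact DP is rarely used beyond a handful of sequences.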

6 Star alignment. 1. Align all sequences to one sequence. 2. Stack them up. Potential problems with star alignment: unaligned gaps, ambiguous associations. [Figure: star diagram with sequences labeled A to G.] Stacked alignment:

    A G H . I . W W . P F W P
    A G H . I I F W . P Y . .
    A G H I I . . W F P F W P
    A G H . I P W W . P . . .

Each pairwise alignment by itself looks fine, but when you stack them up, you see disagreements.

7 What that alignment should look like:

    A G H I . W W P F W P
    A G H I I F W P Y . .
    A G H I I W F P F W P
    A G H I P W W P . . .

8 BLAST "query-anchored" alignments are star alignments.

9 Progressive alignment. Method for progressive alignment:
1. Align all pairs. Save scores in a distance matrix.
2. Make a guide tree.
3. Pairwise align the two most similar sequences.
4. Align the next two most similar sequences. Etc.
5. Add sequences until all sequences are aligned.
[Figure: distance matrix, guide tree, and a DP alignment matrix in which the sequence to add (A W P Y) is aligned, with a gap, against the current alignment (A G H I . W W P F / A G H I I F W P Y).]

11 "Distances" versus "similarities". Maximizing similarity and minimizing distance are equivalent if, for each position in the alignment, $d(i,j) + s(i,j) = s_{max}$, where $s_{max}$ is the maximum possible similarity and the minimum distance is d = 0. Distance based on identity score (p-distance): d = 100% - %identity. Distance using the empirical J-C correction:

$$d_{JC} = -\ln\left(\frac{S_{real} - S_{rand}}{S_{ident} - S_{rand}}\right)$$

where $S_{ident}$ = score of an identity alignment and $S_{rand}$ = mode score of a false alignment. For proteins, $S_{rand} \approx 25\%$. [Figure: $d_{JC}$ plotted against $S_{real}$, with the twilight zone (R. Doolittle, 1986) marked.]

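12 Jukes-Cantor for proteins. Empirical J-C correction: $d_{JC} = -\ln\left(\frac{pid - 25}{75}\right)$, where pid is the percent identity and 25 = mode score of a false alignment. [Figure: $d_{JC}$ and p-distance plotted against $S_{real}$.]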
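As a small worked sketch of the empirical J-C correction for proteins, the function below evaluates the slide's formula for a given percent identity. The function name `jc_distance` and the choice to return infinity at or below the random-alignment level are assumptions made for illustration.

```python
import math

def jc_distance(pid, s_rand=25.0, s_ident=100.0):
    """Empirical J-C corrected distance from percent identity (sketch).

    s_rand is the mode score of a false alignment (25% on the slide).
    """
    if pid <= s_rand:
        # Assumption: identity at or below random is treated as saturated.
        return math.inf
    return -math.log((pid - s_rand) / (s_ident - s_rand))
```

With these defaults, 100% identity gives a distance of 0 and 62.5% identity gives ln 2, because (62.5 - 25)/75 = 0.5. The distance grows without bound as the identity approaches 25%.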

13 Progressive alignment (method as listed on slide 9). Building the guide tree from the distance matrix:
1. Select the shortest distance i,j.
2. Join i,j.
3. Reduce the rank of the distance matrix by joining columns i and j and rows i and j. Minimum rule: select the minimum of the values. Maximum rule: select the maximum of the values.
4. Repeat until rank = 1.
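The join-and-reduce loop on this slide can be sketched as follows. The function name `build_guide_tree`, the dictionary-of-`frozenset` representation of the distance matrix, and the nested-tuple tree are illustrative assumptions. Passing `rule=min` or `rule=max` selects the minimum or maximum rule.

```python
from itertools import combinations

def build_guide_tree(names, dist, rule=min):
    """Join the closest pair repeatedly until one cluster remains (sketch).

    dist maps frozenset({x, y}) to the distance between x and y.
    """
    d = dict(dist)
    clusters = list(names)
    while len(clusters) > 1:
        # Select the shortest distance i, j.
        i, j = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        joined = (i, j)
        rest = [c for c in clusters if c != i and c != j]
        # Reduce the rank: merge rows/columns i and j using the chosen rule.
        for c in rest:
            d[frozenset((joined, c))] = rule(d[frozenset((i, c))], d[frozenset((j, c))])
        clusters = rest + [joined]
    return clusters[0]
```

The returned nested tuple gives the joining order, for example `(("A", "B"), ("C", "D"))` when A and B are joined first, C and D second, and the two pairs last.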

14 In class: progressive alignment. Making a guide tree with the neighbor-joining algorithm. Fill in the J-C distances in a distance matrix for sequences A, B, C, D, E, F, then draw the guide tree.

15 Progressive alignment (method as listed on slide 9). [Figure: DP alignment matrix in which the sequence to add (A W P Y) is aligned against the current alignment (A G H I . W W P F / A G H I I F W P Y).]

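16 How do we represent two aligned sequences as one "sequence"? [Figure: the alignment A G H I . W W P F / A G H I I F W P Y shown above a table with one row per amino acid (A C D E F G H I K L M N P Q R S T V W Y) and one column per alignment position.]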

17 PSSMs and profiles. A 20×N scoring matrix: a set of probability distributions over the 20 amino acids. (Gap probabilities are (usually) not included.)

$$P(a \mid i) = \frac{\sum_{S:\, S_i = a} w_S}{\sum_{S} w_S}$$

[Spoken equation: The probability of amino acid a at position i is the sum of the sequence weights $w_S$ over all sequences S such that the amino acid at position i of that sequence, $S_i$, is a, divided by the sum of the sequence weights $w_S$ for all sequences S.]
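A minimal sketch of how such a profile could be computed from a weighted alignment is shown below. The function name `profile`, the list-of-dicts output, and the gap characters are assumptions. Gaps are skipped, so a column that contains gaps sums to less than 1, in line with the slide's remark that gap probabilities are usually not included.

```python
from collections import defaultdict

def profile(alignment, weights, gap_chars=".-"):
    """Per-column amino acid probabilities from a weighted alignment (sketch)."""
    total = sum(weights)
    columns = []
    for i in range(len(alignment[0])):
        column = defaultdict(float)
        for seq, w in zip(alignment, weights):
            if seq[i] not in gap_chars:
                # Weight of sequences with this residue at position i,
                # divided by the sum of all sequence weights.
                column[seq[i]] += w / total
        columns.append(dict(column))
    return columns
```

For the two-sequence example used on these slides, `profile(["AGHI.WWPF", "AGHIIFWPY"], [0.5, 0.5])` gives `{"A": 1.0}` in the first column and `{"W": 0.5, "F": 0.5}` in the sixth.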

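18 Sequence weights??? [Figure: the alignment A G H I . W W P F / A G H I I F W P Y with a weight w attached to each sequence, above the table of the 20 amino acids (A C D E F G H I K L M N P Q R S T V W Y).]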

19 Why do we need sequence weights? An MSA represents a sequence "family". A sequence family has an amino acid preference at each position. That preference is determined by counting. But some groups may be over-represented in the MSA. [Figure: tree with leaves labeled primates, rabbit, rat, E. coli, lawyer.]

20 Sequence weighting corrects for uneven sampling. Simplest weighting scheme: build a tree. Start with weight = 1.0 at the common ancestor of the tree. Split the weight evenly at each node. Primate sequences are 10/18 of the tree, but only a smaller fraction of the weights, because they are overrepresented. [Figure: tree with weights on the leaves: primates, rabbit, rat, lawyer, E. coli.]
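The weight-splitting scheme can be sketched with a short recursive function. The nested-tuple tree encoding, the function name `leaf_weights`, and the example tree are hypothetical and are not the tree from the slide.

```python
def leaf_weights(tree, weight=1.0):
    """Split the weight evenly at each node of a nested-tuple tree (sketch).

    A leaf is a string; an internal node is a tuple of subtrees.
    Leaf names are assumed to be unique.
    """
    if isinstance(tree, str):
        return {tree: weight}
    weights = {}
    for child in tree:
        weights.update(leaf_weights(child, weight / len(tree)))
    return weights

# Hypothetical example: two closely related sequences share one small branch.
example_tree = ((("human", "chimp"), "rat"), "E. coli")
example_weights = leaf_weights(example_tree)
```

In this example the two primates make up half of the sequences but receive only 0.125 each, a quarter of the total weight, while E. coli alone receives 0.5.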

21 Progressive alignment (method as listed on slide 9). Match score for multiple sequence alignments:

$$\text{matchscore}(i,j) = \sum_{n} \sum_{m} w_n w_m \, S(s^n_i, s^m_j)$$

where n runs over the sequences in group 1, m runs over the sequences in group 2, $w_n$ = weight of sequence n, $w_m$ = weight of sequence m, and S(aa1,aa2) = substitution matrix value for aa1 to aa2. [Figure: DP alignment matrix aligning A W P Y, with a gap, against A G H I . W W P F / A G H I I F W P Y; example cell: matchscore = (0.25*S(P,W) *S(P,F)).]
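The double sum for the match score between a column of group 1 and a column of group 2 can be sketched as below. The function names and the toy substitution function are assumptions; a real run would use a substitution matrix such as BLOSUM62. Skipping gap characters is also an assumption of this sketch.

```python
def toy_substitution(aa1, aa2):
    # Hypothetical stand-in for a substitution matrix lookup.
    return 2 if aa1 == aa2 else -1

def match_score(col1, col2, w1, w2, S=toy_substitution, gap_chars=".-"):
    """Weighted sum of substitution scores between two alignment columns."""
    total = 0.0
    for a, wn in zip(col1, w1):
        for b, wm in zip(col2, w2):
            if a in gap_chars or b in gap_chars:
                continue  # assumption: gapped positions contribute nothing
            total += wn * wm * S(a, b)
    return total
```

With two sequences per group and weights of 0.5 each, every residue pair is scaled by 0.25: `match_score("PW", "PF", [0.5, 0.5], [0.5, 0.5])` is 0.25 * (2 - 1 - 1 - 1) = -0.25 with the toy scores.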

22 NOTE: Initial pairwise alignments are used to get the distances that are used to make the guide tree, but these alignments are discarded and new alignments are made using the progressive method.

23 CLUSTALW (JD Thompson, DG Higgins, TJ Gibson - Nucleic acids research, 1994). Start with an unrooted tree, using neighbor joining. Choose a root to get the guide tree. Progressive alignment. Matches are scored using sequence weights. Gaps are position dependent: the gap opening penalty (GOP) is lower for polar residues, and the GOP is zero where there is already a gap.

24 Lightning-striking-twice-in-the-same-place theory. There should be no gap penalty for aligning a gap to an already existing gap! If i is already a gap position in any sequence, set gap(i) = 0. The gap moves in the DP are $A(i,j) = A(i-1,j) - gap(1,i)$ and $A(i,j) = A(i,j-1) - gap(2,j)$, with sequence-specific, position-specific gap penalties. [Figure: DP matrix aligning A W P Y against A G H I . W W P F / A G H I I F W P Y; no gap penalty for the purple arrow.] NOTE: DP is still optimal when the gap penalty is position-specific.

25 CLUSTALW: position-specific gap penalty

26 MUSCLE (RC Edgar - Nucleic acids research, 2004). Iterative MSA: k-mer distance matrix (not DP; based on short identical matches) --> UPGMA tree (one way to build a guide tree) --> progressive alignment --> MSA1 --> UPGMA tree --> progressive alignment --> MSA2. Then, for randomly selected tree branches: 1. split the alignment into two groups; 2. calculate profiles; 3. align profiles; 4. accept or reject the new alignment; 5. repeat.
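To illustrate how a distance can be obtained from short identical matches without any DP, here is a simplified k-mer distance. It is a sketch of the general idea only, not MUSCLE's exact formula; the function name `kmer_distance` and the normalization by the shorter sequence are assumptions.

```python
from collections import Counter

def kmer_distance(a, b, k=3):
    """1 minus the fraction of shared k-mers between two sequences (sketch)."""
    kmers_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kmers_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    # Multiset intersection counts each shared k-mer as often as it
    # occurs in both sequences.
    shared = sum((kmers_a & kmers_b).values())
    possible = min(len(a), len(b)) - k + 1
    if possible <= 0:
        return 1.0  # sequences shorter than k share no k-mers
    return 1.0 - shared / possible
```

Because it needs only a single pass over each sequence, a distance of this kind can fill the whole distance matrix much faster than aligning all pairs.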

27 UPGMA: Unweighted pair group method using averages. 1) Generate neighbor-joining tree (NJ). 2) For first neighbors, distance to ancestor is $d_{ij}/2$. 3) For next neighbors, distance to ancestor is the average pairwise distance between taxa in the two clades, divided by two. 4) Subtract to get lineage distances. [Table: distance matrix for Species A to E, labeled with J-C corrected distances and raw p-distances.] To be discussed again when we talk about trees...
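Steps 2 to 4 can be sketched as a small UPGMA routine. The function name `upgma`, the `frozenset`-keyed distance dictionary, and the nested-tuple output (each child paired with its branch length) are illustrative assumptions.

```python
from itertools import combinations

def upgma(names, dist):
    """Return (tree, root_height) for leaf distances in dist (sketch).

    dist maps frozenset({x, y}) to the distance between leaves x and y.
    """
    # Each cluster is (subtree, list of member leaves, height of its root).
    clusters = [(name, [name], 0.0) for name in names]

    def average(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        # Distance to the ancestor is half the average pairwise distance.
        height = average(c1, c2) / 2
        # Subtract the child's height to get the lineage (branch) distance.
        node = ((c1[0], height - c1[2]), (c2[0], height - c2[2]))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append((node, c1[1] + c2[1], height))
    return clusters[0][0], clusters[0][2]
```

For three taxa with d(A,B) = 2 and d(A,C) = d(B,C) = 6, A and B join at height 1, and C joins them at height 3, so the branch leading to the (A,B) ancestor has length 3 - 1 = 2.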

28 MUSCLE iterative alignment. Starting MSA:

    XP_ YEPTDKEMDDILSAYFFYPSYKDYTRYVVDIFHRNYVSIFIYGNIAMPTEKEDENATS--
    XP_ YDPTDKEMDDLLSAYFFYPSYKDYTKYVVDFFHRNYVSIFIYGNIAMTTEKENENATS--
    XP_ YTPTNKEMYDILNAYFFYPSYNAYRTYVNEYFLRNYVVIFIYGNIIISDLKGEENITKNN
    XP_ YIPTNKEIYDILNAYLFYPLYNSYIKYINNFFHKNYINIFIYGNLSIPNEINIKNETN--
    XP_ VVQAQYYTAELFLEELNILDLESLQQFHSNYFSNFRVSSFVSGNILRSEVEDLLHSIR--
    XP_ VVQAQYYTSQLFQDELATLDLESLQEFHSNYFSNFRVSSFVSGNILRSEVEDLLHTIR--
    XP_ DNTWPWMDG---LEVIPHLEADDLAKFVPMLLSRAFLECYIAGNIEPKEAEAMIHHIE--
    XP_ RNRFSQLDLRSAVTDASS-QFEDFKVFLEKVLTKNALDVFIMGDIDYEEARKLAEDFRAA

The phylogenetic tree is cut at a random cut point (X), which splits the alignment into two groups:

    VVQAQYYTAELFLEELNILDLESLQQFHSNYFSNFRVSSFVSGNILRSEVEDLLHSIR--
    VVQAQYYTSQLFQDELATLDLESLQEFHSNYFSNFRVSSFVSGNILRSEVEDLLHTIR--
    DNTWPWMDG---LEVIPHLEADDLAKFVPMLLSRAFLECYIAGNIEPKEAEAMIHHIE--
    RNRFSQLDLRSAVTDASS-QFEDFKVFLEKVLTKNALDVFIMGDIDYEEARKLAEDFRAA

    YEPTDKEMDDILSAYFFYPSYKDYTRYVVDIFHRNYVSIFIYGNIAMPTEKEDENATS--
    YDPTDKEMDDLLSAYFFYPSYKDYTKYVVDFFHRNYVSIFIYGNIAMTTEKENENATS--
    YTPTNKEMYDILNAYFFYPSYNAYRTYVNEYFLRNYVVIFIYGNIIISDLKGEENITKNN
    YIPTNKEIYDILNAYLFYPLYNSYIKYINNFFHKNYINIFIYGNLSIPNEINIKNETN--

DP profile-profile alignment gives the new MSA:

    YEPTDKEMDDILSAYFFYPSYKDYTRYVVDIFHRNYV..SIFIYGNIAMPTEKEDENATS--
    YDPTDKEMDDLLSAYFFYPSYKDYTKYVVDFFHRNYV..SIFIYGNIAMTTEKENENATS--
    YTPTNKEMYDILNAYFFYPSYNAYRTYVNEYFLRNYV..FIYGNIIISDLKGEENITKNN
    YIPTNKEIYDILNAYLFYPLYNSYIKYINNFFHKNYI..NIFIYGNLSIPNEINIKNETN--
    VVQAQYYTAELFLEELNILDLESLQQFHS..NYFSNFRVSSFVSGNILRSEVEDLLHSIR--
    VVQAQYYTSQLFQDELATLDLESLQEFHS..NYFSNFRVSSFVSGNILRSEVEDLLHTIR--
    DNTWPWMDG---LEVIPHLEADDLAKFVP..MLLSRAFLECYIAGNIEPKEAEAMIHHIE--
    RNRFSQLDLRSAVTDASS-QFEDFKVFLE..KVLTKNALDVFIMGDIDYEEARKLAEDFRAA

In each iteration: the phylogenetic tree is cut at a random branch, the two subtrees are converted to profiles, and the profiles are aligned. The new alignment is either accepted or rejected.

29 Databases of multiple sequence alignments: BAliBASE -- structural alignment-based; BLOCKS -- gapless regions; PFAM -- Hidden Markov models; CDD -- conserved domain database; FSSP -- structural alignment-based (families).

30 Visit BAliBASE: a database of curated multiple sequence alignments derived from structure-based alignments.

31 Selective re-alignment. Global affine-gap DP alignment may be used to refine the alignment between two conserved and confidently aligned columns. Select the columns between them and align the selected columns with MUSCLE. Or, paste into the ClustalW web site. Use the same penalty for opening gap and end gap.

32 Exercise 7: make an MSA (due Oct 5).
1. Select a protein sequence in NCBI.
2. Run a BLAST search. Keep the top 50.
3. Select the hits and download to a FASTA file.
4. Open in UGENE (merge sequences into an alignment).
5. Run MUSCLE.
6. Color using Zappo.
7. Reduce size so that the entire alignment (or as much of it as possible) fits on the screen. Save image.
8. Paste into a file and write a blurb (10 words or less).
9. Save as PDF and send to me in an email.

33 Review. Are multiple sequence alignments optimal? How is phylogenetic information used in MSA algorithms? What are the advantages/disadvantages of a star alignment? What information is ClustalW encoding in its MSA algorithm? What is the outermost loop in the MUSCLE alignment?

### Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Sequence Analysis, '18 -- lecture 9 Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene. How can I represent thousands of homolog sequences in a compact

### InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

### THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

### Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment: today s goals to define what a multiple sequence alignment is and how it is generated; to describe profile HMMs to introduce databases of multiple

### SUPPLEMENTARY INFORMATION

Supplementary information S3 (box) Methods Methods Genome weighting The currently available collection of archaeal and bacterial genomes has a highly biased distribution of isolates across taxa. For example,

### Week 10: Homology Modelling (II) - HHpred

Week 10: Homology Modelling (II) - HHpred Course: Tools for Structural Biology Fabian Glaser BKU - Technion 1 2 Identify and align related structures by sequence methods is not an easy task All comparative

### EVOLUTIONARY DISTANCES

EVOLUTIONARY DISTANCES FROM STRINGS TO TREES Luca Bortolussi 1 1 Dipartimento di Matematica ed Informatica Università degli studi di Trieste luca@dmi.units.it Trieste, 14 th November 2007 OUTLINE 1 STRINGS:

### Moreover, the circular logic

Moreover, the circular logic How do we know what is the right distance without a good alignment? And how do we construct a good alignment without knowing what substitutions were made previously? ATGCGT--GCAAGT

### Quantifying sequence similarity

Quantifying sequence similarity Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, February 16 th 2016 After this lecture, you can define homology, similarity, and identity

### Phylogenetic inference

Phylogenetic inference Bas E. Dutilh Systems Biology: Bioinformatic Data Analysis Utrecht University, March 7 th 016 After this lecture, you can discuss (dis-) advantages of different information types

### Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

### Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: istance-based methods Ultrametric Additive: UPGMA Transformed istance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

### bioinformatics 1 -- lecture 7

bioinformatics 1 -- lecture 7 Probability and conditional probability Random sequences and significance (real sequences are not random) Erdos & Renyi: theoretical basis for the significance of an alignment

### Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

### 08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

BLAST Multiple Sequence Alignments: Clustal Omega What does basic BLAST do (e.g. what is input sequence and how does BLAST look for matches?) Susan Parrish McDaniel College Multiple Sequence Alignments

### "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

MOLECULAR PHYLOGENY "Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky EVOLUTION - theory that groups of organisms change over time so that descendeants differ structurally

### CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

### Sequence Analysis 17: lecture 5. Substitution matrices Multiple sequence alignment

Sequence Analysis 17: lecture 5 Substitution matrices Multiple sequence alignment Substitution matrices Used to score aligned positions, usually of amino acids. Expressed as the log-likelihood ratio of

### Phylogenetic trees 07/10/13

Phylogenetic trees 07/10/13 A tree is the only figure to occur in On the Origin of Species by Charles Darwin. It is a graphical representation of the evolutionary relationships among entities that share

### An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

### Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic analysis Phylogenetic Basics: Biological

### Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:

Multiple Sequence Alignment, Gunnar Klau, December 9, 2005, 17:50 5001 5 Multiple Sequence Alignment The first part of this exposition is based on the following sources, which are recommended reading:

### Sequence Analysis '17 -- lecture 7

Sequence Analysis '17 -- lecture 7 Significance E-values How significant is that? Please give me a number for......how likely the data would not have been the result of chance,......as opposed to......a

### Multiple Sequence Alignment

Multiple Sequence Alignment BMI/CS 576 www.biostat.wisc.edu/bmi576.html Colin Dewey cdewey@biostat.wisc.edu Multiple Sequence Alignment: Tas Definition Given a set of more than 2 sequences a method for

### 5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT

5. MULTIPLE SEQUENCE ALIGNMENT BIOINFORMATICS COURSE MTAT.03.239 03.10.2012 ALIGNMENT Alignment is the task of locating equivalent regions of two or more sequences to maximize their similarity. Homology:

### Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)

Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from

### Dr. Amira A. AL-Hosary

Phylogenetic analysis Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut University-Egypt Phylogenetic Basics: Biological

### Large-Scale Genomic Surveys

Bioinformatics Subtopics Fold Recognition Secondary Structure Prediction Docking & Drug Design Protein Geometry Protein Flexibility Homology Modeling Sequence Alignment Structure Classification Gene Prediction

### Ch. 9 Multiple Sequence Alignment (MSA)

Ch. 9 Multiple Sequence Alignment (MSA) - gather seqs. to make MSA - doing MSA with ClustalW - doing MSA with Tcoffee - comparing seqs. that cannot align Introduction - from pairwise alignment to MSA -

### Introduction to Bioinformatics

Introduction to Bioinformatics Jianlin Cheng, PhD Department of Computer Science Informatics Institute 2011 Topics Introduction Biological Sequence Alignment and Database Search Analysis of gene expression

### C3020 Molecular Evolution. Exercises #3: Phylogenetics

C3020 Molecular Evolution Exercises #3: Phylogenetics Consider the following sequences for five taxa 1-5 and the known outgroup O, which has the ancestral states (note that sequence 3 has changed from

### Phylogenetic Tree Reconstruction

I519 Introduction to Bioinformatics, 2011 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven

### Multiple Sequence Alignment: A Critical Comparison of Four Popular Programs

Multiple Sequence Alignment: A Critical Comparison of Four Popular Programs Shirley Sutton, Biochemistry 218 Final Project, March 14, 2008 Introduction For both the computational biologist and the research

### Tools and Algorithms in Bioinformatics

Tools and Algorithms in Bioinformatics GCBA815, Fall 2015 Week-4 BLAST Algorithm Continued Multiple Sequence Alignment Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and

### p(-,i)+p(,i)+p(-,v)+p(i,v),v)+p(i,v)

Multile Sequence Alignment Given: Set of sequences Score matrix Ga enalties Find: Alignment of sequences such that otimal score is achieved. Motivation Aligning rotein families Establish evolutionary relationshis

### BINF6201/8201. Molecular phylogenetic methods

BINF60/80 Molecular phylogenetic methods 0-7-06 Phylogenetics Ø According to the evolutionary theory, all life forms on this planet are related to one another by descent. Ø Traditionally, phylogenetics

### Effects of Gap Open and Gap Extension Penalties

Brigham Young University BYU ScholarsArchive All Faculty Publications 200-10-01 Effects of Gap Open and Gap Extension Penalties Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu See

### Bioinformatics Exercises

Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

### 3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode

### Bioinformatics tools for phylogeny and visualization. Yanbin Yin

Bioinformatics tools for phylogeny and visualization Yanbin Yin 1 Homework assignment 5 1. Take the MAFFT alignment http://cys.bios.niu.edu/yyin/teach/pbb/purdue.cellwall.list.lignin.f a.aln as input and

### Similarity searching summary (2)

Similarity searching / sequence alignment summary Biol4230 Thurs, February 22, 2016 Bill Pearson wrp@virginia.edu 4-2818 Pinn 6-057 What have we covered? Homology excess similiarity but no excess similarity

### MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE

MULTIPLE SEQUENCE ALIGNMENT FOR CONSTRUCTION OF PHYLOGENETIC TREE Manmeet Kaur 1, Navneet Kaur Bawa 2 1 M-tech research scholar (CSE Dept) ACET, Manawala,Asr 2 Associate Professor (CSE Dept) ACET, Manawala,Asr

### Phylogenetics: Distance Methods. COMP Spring 2015 Luay Nakhleh, Rice University

Phylogenetics: Distance Methods COMP 571 - Spring 2015 Luay Nakhleh, Rice University Outline Evolutionary models and distance corrections Distance-based methods Evolutionary Models and Distance Correction

### Phylogenies Scores for Exhaustive Maximum Likelihood and Parsimony Scores Searches

Int. J. Bioinformatics Research and Applications, Vol. x, No. x, xxxx Phylogenies Scores for Exhaustive Maximum Likelihood and s Searches Hyrum D. Carroll, Perry G. Ridge, Mark J. Clement, Quinn O. Snell

### Bioinformatics 1. Sepp Hochreiter. Biology, Sequences, Phylogenetics Part 4. Bioinformatics 1: Biology, Sequences, Phylogenetics

Bioinformatics 1 Biology, Sequences, Phylogenetics Part 4 Sepp Hochreiter Klausur Mo. 30.01.2011 Zeit: 15:30 17:00 Raum: HS14 Anmeldung Kusss Contents Methods and Bootstrapping of Maximum Methods Methods

### Multiple Sequence Alignment. Sequences

Multiple Sequence Alignment Sequences > YOR020c mstllksaksivplmdrvlvqrikaqaktasglylpe knveklnqaevvavgpgftdangnkvvpqvkvgdqvl ipqfggstiklgnddevilfrdaeilakiakd > crassa mattvrsvksliplldrvlvqrvkaeaktasgiflpe

### Exercise 5. Sequence Profiles & BLAST

Exercise 5 Sequence Profiles & BLAST 1 Substitution Matrix (BLOSUM62) Likelihood to substitute one amino acid with another Figure taken from https://en.wikipedia.org/wiki/blosum 2 Substitution Matrix (BLOSUM62)

### Probalign: Multiple sequence alignment using partition function posterior probabilities

Sequence Analysis Probalign: Multiple sequence alignment using partition function posterior probabilities Usman Roshan 1* and Dennis R. Livesay 2 1 Department of Computer Science, New Jersey Institute

### POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics

POPULATION GENETICS Winter 2005 Lecture 17 Molecular phylogenetics - in deriving a phylogeny our goal is simply to reconstruct the historical relationships between a group of taxa. - before we review the

### Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Bioinformática Sequence Alignment Pairwise Sequence Alignment Universidade da Beira Interior (Thanks to Ana Teresa Freitas, IST for useful resources on this subject) 1 16/3/29 & 23/3/29 27/4/29 Outline

### Introduction to Bioinformatics Online Course: IBT

Introduction to Bioinformatics Online Course: IBT Multiple Sequence Alignment Building Multiple Sequence Alignment Lec1 Building a Multiple Sequence Alignment Learning Outcomes 1- Understanding Why multiple

### Constructing Evolutionary/Phylogenetic Trees

Constructing Evolutionary/Phylogenetic Trees 2 broad categories: Distance-based methods Ultrametric Additive: UPGMA Transformed Distance Neighbor-Joining Character-based Maximum Parsimony Maximum Likelihood

### HMMs and biological sequence analysis

HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

### Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

### Phylogeny: building the tree of life

Phylogeny: building the tree of life Dr. Fayyaz ul Amir Afsar Minhas Department of Computer and Information Sciences Pakistan Institute of Engineering & Applied Sciences PO Nilore, Islamabad, Pakistan

### Evolutionary Tree Analysis. Overview

CSI/BINF 5330 Evolutionary Tree Analysis Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Backgrounds Distance-Based Evolutionary Tree Reconstruction Character-Based

### Some Problems from Enzyme Families

Some Problems from Enzyme Families Greg Butler Department of Computer Science Concordia University, Montreal www.cs.concordia.ca/~faculty/gregb gregb@cs.concordia.ca Abstract I will discuss some problems

### Overview Multiple Sequence Alignment

Overview Multiple Sequence Alignment Inge Jonassen Bioinformatics group Dept. of Informatics, UoB Inge.Jonassen@ii.uib.no Definition/examples Use of alignments The alignment problem scoring alignments

### Multiple sequence alignment

Multiple sequence alignment Wednesday, October 11, 2006 Sarah Wheelan swheelan@jhmi.edu Copyright notice Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics

### Computational Biology

Computational Biology Lecture 6 31 October 2004 1 Overview Scoring matrices (Thanks to Shannon McWeeney) BLAST algorithm Start sequence alignment 2 1 What is a homologous sequence? A homologous sequence,

### Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

Alignment principles and homology searching using (PSI-)BLAST Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU) http://ibivu.cs.vu.nl Bioinformatics Nothing in Biology makes sense except in

Introduction to Bioinformatics Prof. Dr. Nizamettin AYDIN naydin@yildiz.edu.tr Multiple Sequence Alignment Outline Multiple sequence alignment introduction to msa methods of msa progressive global alignment

### Bioinformatics. Proteins II. - Pattern, Profile, & Structure Database Searching. Robert Latek, Ph.D. Bioinformatics, Biocomputing

Bioinformatics Proteins II. - Pattern, Profile, & Structure Database Searching Robert Latek, Ph.D. Bioinformatics, Biocomputing WIBR Bioinformatics Course, Whitehead Institute, 2002 1 Proteins I.-III.

### Sequence Bioinformatics. Multiple Sequence Alignment Waqas Nasir

Sequence Bioinformatics Multiple Sequence Alignment Waqas Nasir 2010-11-12 Multiple Sequence Alignment One amino acid plays coy; a pair of homologous sequences whisper; many aligned sequences shout out

### Inferring Phylogenetic Trees. Distance Approaches. Representing distances. in rooted and unrooted trees. The distance approach to phylogenies

Inferring Phylogenetic Trees Distance Approaches Representing distances in rooted and unrooted trees The distance approach to phylogenies given: an n n matrix M where M ij is the distance between taxa

### Sequence Alignment: Scoring Schemes. COMP 571 Luay Nakhleh, Rice University

Sequence Alignment: Scoring Schemes COMP 571 Luay Nakhleh, Rice University Scoring Schemes Recall that an alignment score is aimed at providing a scale to measure the degree of similarity (or difference)

### BLAST. Varieties of BLAST

BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

### Supplementary Information

Supplementary Information For the article"comparable system-level organization of Archaea and ukaryotes" by J. Podani, Z. N. Oltvai, H. Jeong, B. Tombor, A.-L. Barabási, and. Szathmáry (reference numbers

### Mul\$ple Sequence Alignment Methods. Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu

Mul\$ple Sequence Alignment Methods Tandy Warnow Departments of Bioengineering and Computer Science h?p://tandy.cs.illinois.edu Species Tree Orangutan Gorilla Chimpanzee Human From the Tree of the Life

### Lecture Notes: Markov chains

Computational Genomics and Molecular Biology, Fall 5 Lecture Notes: Markov chains Dannie Durand At the beginning of the semester, we introduced two simple scoring functions for pairwise alignments: a similarity

### Collected Works of Charles Dickens

Collected Works of Charles Dickens A Random Dickens Quote If there were no bad people, there would be no good lawyers. Original Sentence It was a dark and stormy night; the night was dark except at sunny

### Sequence comparison: Score matrices

Sequence comparison: Score matrices http://facultywashingtonedu/jht/gs559_2013/ Genome 559: Introduction to Statistical and omputational Genomics Prof James H Thomas FYI - informal inductive proof of best

### Molecular Evolution and Phylogenetic Tree Reconstruction

1 4 Molecular Evolution and Phylogenetic Tree Reconstruction 3 2 5 1 4 2 3 5 Orthology, Paralogy, Inparalogs, Outparalogs Phylogenetic Trees Nodes: species Edges: time of independent evolution Edge length

### Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) Scribe: John Ekins

Lecture 14: Multiple Sequence Alignment (Gene Finding, Conserved Elements) 2 19 2015 Scribe: John Ekins Multiple Sequence Alignment Given N sequences x 1, x 2,, x N : Insert gaps in each of the sequences

### Phylogenetic Trees. What They Are Why We Do It & How To Do It. Presented by Amy Harris Dr Brad Morantz

Phylogenetic Trees What They Are Why We Do It & How To Do It Presented by Amy Harris Dr Brad Morantz Overview What is a phylogenetic tree Why do we do it How do we do it Methods and programs Parallels

### Single alignment: Substitution Matrix. 16 March 2017

Single alignment: Substitution Matrix 16 March 2017 BLOSUM Matrix BLOSUM Matrix [2] (Blocks Amino Acid Substitution Matrices) It is based on the amino acid substitutions observed in ~2000 conserved block

### BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

BMI/CS 776 Lecture #20 Alignment of whole genomes Colin Dewey (with slides adapted from those by Mark Craven) 2007.03.29 1 Multiple whole genome alignment Input set of whole genome sequences genomes diverged

### Evaluation Measures of Multiple Sequence Alignments. Gaston H. Gonnet, *Chantal Korostensky and Steve Benner. Institute for Scientific Computing

Evaluation Measures of Multiple Sequence Alignments Gaston H. Gonnet, *Chantal Korostensky and Steve Benner Institute for Scientific Computing ETH Zurich, 8092 Zuerich, Switzerland phone: ++41 1 632 74 79

### 9/30/11. Evolution theory. Phylogenetic Tree Reconstruction. Phylogenetic trees (binary trees) Phylogeny (phylogenetic tree)

I9 Introduction to Bioinformatics, 0 Phylogenetic Tree Reconstruction Yuzhen Ye (yye@indiana.edu) School of Informatics & Computing, IUB Evolution theory Speciation Evolution of new organisms is driven by

### How to read and make phylogenetic trees Zuzana Starostová

How to read and make phylogenetic trees Zuzana Starostová How to make phylogenetic trees? Workflow: obtain DNA sequence quality check sequence alignment calculating genetic distances phylogeny estimation

### Comparing whole genomes

BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will

### Algorithms in Bioinformatics

Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

### Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

### Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas FYI - informal inductive proof of best alignment path Consider the last step in

### 17 Non-collinear alignment

17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.

### Building 3D models of proteins

Building 3D models of proteins Why make a structural model for your protein? The structure can provide clues to the function through structural similarity with other proteins With a structure it is easier

### Molecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment

Molecular Modeling 2018 -- Lecture 7: Homology modeling, insertions/deletions, manual realignment. Homology modeling is also called comparative modeling. Sequences that are similar have similar structures.

### Large Grain Size Stochastic Optimization Alignment

Brigham Young University BYU ScholarsArchive All Faculty Publications 2006-10-01 Large Grain Size Stochastic Optimization Alignment Hyrum Carroll hyrumcarroll@gmail.com Mark J. Clement clement@cs.byu.edu

### Using Bioinformatics to Study Evolutionary Relationships Instructions

3 Using Bioinformatics to Study Evolutionary Relationships Instructions Student Researcher Background: Making and Using Multiple Sequence Alignments One of the primary tasks of genetic researchers is comparing

### Sequence Analysis '17--lecture 10. Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony

Sequence Analysis '17--lecture 10 Trees types of trees Newick notation UPGMA Fitch Margoliash Distance vs Parsimony Phylogenetic trees What is a phylogenetic tree? A model of evolutionary relationships -- common

### General context Anchor-based method Evaluation Discussion. CoCoGen meeting. Accuracy of the anchor-based strategy for genome alignment.

CoCoGen meeting Accuracy of the anchor-based strategy for genome alignment Raluca Uricaru LIRMM, CNRS Université de Montpellier 2 3 October 2008 Summary 1 General context 2 Global alignment: anchor-based

### Pairwise sequence alignments

Pairwise sequence alignments Volker Flegel VI, October 2003 Outline Introduction Definitions Biological context of pairwise alignments Computing of pairwise alignments Some programs

### Sequence comparison: Score matrices. Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas

Sequence comparison: Score matrices Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas Informal inductive proof of best alignment path Consider the last step in the best

### Heuristic Alignment and Searching

3/28/2012 Types of alignments Global Alignment Each letter of each sequence is aligned to a letter or a gap (e.g., Needleman-Wunsch). Local Alignment An optimal pair of subsequences is taken from the two

### Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

Science in China Series C: Life Sciences 2007 Science in China Press Springer-Verlag Grouping of amino acids and recognition of protein structurally conserved regions by reduced alphabets of amino acids

### Additive distances. $w(e)$, where $P_{ij}$ is the path in $T$ from $i$ to $j$. Then the matrix $[D_{ij}]$ is said to be additive.

Additive distances Let $T$ be a tree on leaf set $S$ and let $w : E \to \mathbb{R}^+$ be an edge-weighting of $T$, and assume $T$ has no nodes of degree two. Let $D_{ij} = \sum_{e \in P_{ij}} w(e)$, where $P_{ij}$ is the path in $T$ from $i$ to $j$. Then