Multiple Sequence Alignment
Multiple Sequence Alignment, Four
Sami Khuri
Dept. of Computer Science, San José State University

Multiple Sequence Alignment
Progressive Alignment
Guide Tree
ClustalW
T-Coffee
MUSCLE
MAFFT

[Figure: Part of the alignment of the DNA sequences of the BRCA1 gene for Wombat, Opossum, Armadillo, Sloth, Dugong, Hyrax, Aardvark, Tenrec, Rhinoceros, Pig, Hedgehog, Human, Rat and Hare. From Bioinformatics and Molecular Evolution by Paul Higgs and Teresa Attwood.]

Aligning BRCA1 Sequences

[Figure: Alignment of BRCA1 protein sequences for the same region of the gene, for the same fourteen species. From Bioinformatics and Molecular Evolution by Paul Higgs and Teresa Attwood.]

Aligning Kinases: An Example

Pairwise vs. Multiple Alignment

Multiple sequence alignment between a cmp-kinase and 5 PI-3 kinases. Green indicates total conservation (identical residues), while blue indicates physicochemically conserved residues (belonging to the same partition of amino acids).

Top figure: the pairwise alignment of the two homologous kinases does not align the important active-site residues and the DFG motif (in green).

Bottom figure: the multiple sequence alignment of 5 homologous kinases forces the best-conserved regions to be matched.
What is Multiple Alignment?

The simplest extension of pairwise alignment.
Given: a set of sequences, a match matrix, and gap penalties.
Find: an alignment of the sequences such that an optimal score is achieved.

Uses of Multiple Alignment

A good alignment is critical for further analysis:
- Determine the relationships between a group of sequences.
- Determine the conserved regions.
- Evolutionary analysis: determine the phylogenetic relationships and evolution.
- Structural analysis: determine the overall structure of the proteins.

Uses of Multiple Alignment

From a good alignment, one can:
- Infer phylogenetic relationships and the evolution of organisms.
- Elucidate biological facts about proteins: the most conserved regions are usually biologically significant.
- Formulate and test hypotheses about protein 3-D structure (based on conserved regions).
- Formulate and test hypotheses about protein function (see which regions of a gene, or its derived protein, are susceptible to mutation and which can have one residue replaced by another without changing the function).

MSA: Exact vs. Heuristic

The exact algorithm traverses the entire search space, computes an overall measure of alignment quality, and tries to maximize this quality. The operation is computationally intensive: even the largest computers can optimally align only a few sequences (7-8). Therefore, we have to use heuristics, i.e., faster algorithms, if we want to align many sequences.

Heuristic Algorithms

- Based on a progressive pairwise alignment approach: ClustalW (Cluster Alignment), PileUp (GCG), MACAW
- Builds a global alignment based on local alignments
- Builds local multiple alignments
- Based on Hidden Markov Models
- Based on genetic algorithms

Progressive Strategies for MSA

A common strategy for the MSA problem is to progressively align pairs of sequences. A starting pair of sequences is selected and aligned. Each subsequent sequence is aligned to the previous alignment. Progressive alignment is a greedy algorithm.
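To make the exact-versus-heuristic trade-off concrete, the sketch below estimates the work of an exact method. It assumes the exact algorithm is the straightforward generalization of pairwise dynamic programming to k sequences of equal length n, which the slides do not state explicitly; the function name exact_msa_cost is hypothetical.

```python
def exact_msa_cost(n, k):
    """Rough size of the exact search, ASSUMING a k-dimensional
    dynamic-programming lattice (an assumption, not from the slides).

    n: length of each sequence, k: number of sequences.
    Returns (cells, transitions).
    """
    # One lattice cell per combination of prefixes, including empty prefixes.
    cells = (n + 1) ** k
    # Each cell can be reached by advancing any non-empty subset of sequences.
    moves_per_cell = 2 ** k - 1
    return cells, cells * moves_per_cell


if __name__ == "__main__":
    for k in range(2, 9):
        cells, transitions = exact_msa_cost(100, k)
        print(f"k={k}: {cells:.3e} cells, {transitions:.3e} transitions")
```

Under this assumption the work grows exponentially in the number of sequences, which is consistent with the slide's remark that only a handful of sequences can be aligned optimally.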
Iterative Pairwise Alignment

The greedy algorithm:
  align some pair
  while not done
    pick an unaligned string near some aligned one(s)
    align with the previously aligned group
There are many variants of the algorithm.

Step One of ClustalW: Pairwise Alignments

1) Perform pairwise alignments of all sequences: compare each sequence with each other and calculate a distance matrix.

Distance Matrix

Note that .87 means 87% identical, so larger entries mean more similar sequences. Distance = number of exact matches divided by the sequence length (ignoring gaps).

Step Two of ClustalW: Create Guide Tree

2) Use the results of the distance matrix to create a guide tree, which helps determine the order in which the sequences are aligned.

Guide Tree

The guide tree, or dendrogram, has no phylogenetic meaning. It cannot be used to show evolutionary relationships.

Step Three of ClustalW: Progressive Alignment

3) Use the guide tree to align the sequences. Align the most closely related sequences first, then add the more distantly related ones one at a time, aligning each to the existing alignment and inserting gaps if necessary.

Multiple Alignment Problems

Does the quality of the guide tree matter? Not for very closely related sequences, but perhaps for distantly related ones.

Local minimum problem: if the initial alignments contain an error, it cannot be removed during subsequent steps.

ClustalW: Package for MSA

ClustalW [the W is from Weighted] is a software package for the MSA problem. Different weights are given to sequences and parameters in different parts of the alignment to try to create an alignment that makes sense biologically.

Scalable gap penalties for protein profile alignments:
- A gap opening next to a conserved hydrophobic residue can be penalized more heavily than a gap opening next to a hydrophilic residue.
- A gap opening very close to another gap can be penalized more heavily than an isolated gap.
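Steps one and two above can be sketched in a few lines. This is a minimal illustration, not ClustalW's implementation: the scoring values (match 1, mismatch -1, linear gap -2), the use of 1 - identity as the distance, and the average-linkage clustering are all assumptions made for the example, and the function names are hypothetical.

```python
from itertools import combinations


def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty (assumed scores)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(a, b):
    """Exact matches divided by the number of ungapped columns."""
    x, y = nw_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    return sum(p == q for p, q in pairs) / len(pairs) if pairs else 0.0


def guide_tree(names, seqs):
    """Average-linkage clustering on 1 - identity; returns nested tuples."""
    dist = {frozenset((i, j)): 1.0 - identity(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}
    # cluster id -> (subtree, indices of member sequences)
    clusters = {i: (names[i], [i]) for i in range(len(seqs))}

    def avg(c1, c2):
        m1, m2 = clusters[c1][1], clusters[c2][1]
        total = sum(dist[frozenset((p, q))] for p in m1 for q in m2)
        return total / (len(m1) * len(m2))

    next_id = len(seqs)
    while len(clusters) > 1:
        # Join the two closest clusters; this fixes the alignment order.
        a, b = min(combinations(sorted(clusters), 2), key=lambda pr: avg(*pr))
        clusters[next_id] = ((clusters[a][0], clusters[b][0]),
                             clusters[a][1] + clusters[b][1])
        del clusters[a], clusters[b]
        next_id += 1
    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    names = ["s1", "s2", "s3", "s4"]
    seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
    print(guide_tree(names, seqs))
```

The nested tuples give the order for step three: the innermost pairs are aligned first, and the resulting groups are then aligned to each other.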
Steps of ClustalW

ClustalW: An Example

All Pairwise Alignments, Similarity Matrix, Cluster Analysis, Multiple Alignment

Step:
1. Aligning 1 and 3
2. Aligning 2 and 4
3. Aligning (1,3) with (2,4)

[Figure: dendrogram with a distance axis.]

By using the same five sequences and aligning them with CLUSTALW, we get the illustrated results.
* = identity
: = strongly conserved
. = weakly conserved

Practical Considerations

When to use ClustalW?
ClustalW can be used to align any group of protein or nucleic acid sequences that are related to each other over their entire lengths. Clustal is optimized to align sets of sequences that are entirely co-linear, i.e., sequences that have the same protein domains in the same order.

When Not To Use ClustalW

- Sequences do not share common ancestry.
- Sequences are partially related.
- Sequences include short non-overlapping fragments.

Alignment Problems

- The final result sometimes depends on the order in which the sequences were analyzed.
- Gaps can make the alignment unrealistically long.
- Sequences of different lengths can cause problems.
- Non-conserved regions can dilute conserved areas.
- Only the shared domain needs to be aligned, so trim away any excess sequence and realign.

Clustal Omega

2016 Sami Khuri
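The "* = identity" marker in the ClustalW output can be reproduced with a short sketch. It is deliberately simplified: it marks only fully identical columns and leaves every other column blank, because the residue groups behind the ":" and "." markers are not given in the slides. The function name identity_line is hypothetical.

```python
def identity_line(aligned):
    """Return a marker line with '*' under columns where all residues agree.

    aligned: list of equal-length aligned sequences using '-' for gaps.
    Columns containing a gap or any disagreement get a space.
    """
    if not aligned:
        return ""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


if __name__ == "__main__":
    rows = ["KVNEWF", "KVNEWL", "KV-EWF"]
    for row in rows:
        print(row)
    print(identity_line(rows))
```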
DNA or Protein Alignment

If we are comparing two or more sequences, is it better to align the DNA or the protein? It depends on what we want to compare.
- If protein function, then look at the amino acids.
- If genetic changes, then look at the DNA.
The initial mutations take place at the DNA level, but the evolutionary pressure occurs at the protein level.

Structural Alignment

What you really want to do is align regions of similar function. These are the areas that are evolutionarily conserved (folds, domains, disulfide bonds).

Problem: the computer does not know anything about the structure or function of the proteins.

Solution: use computer alignment as a first step, then manually adjust the alignment to account for regions of structural similarity.

Alternatives to CLUSTALW (I)

- Clustal Omega
- T-Coffee: a collection of tools for Computing, Evaluating and Manipulating Multiple Alignments of DNA, RNA, Protein Sequences and Structures. Good for distantly related sequences too.
- MUSCLE: Multiple Sequence Comparison by Log-Expectation

Alternatives to CLUSTALW (II)

- MAFFT: Multiple Alignment using Fast Fourier Transform. A good balance between accuracy and speed. align.genome.jp/mafft
- PRRN: web-based multiple sequence alignment package. align.genome.jp/prrn

Alternatives to CLUSTALW (III)

Alternatives to CLUSTALW (IV)
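The point that mutations happen at the DNA level while selection acts on the protein can be illustrated with a small sketch. The two DNA strings below are invented for the example (they are not taken from the BRCA1 alignment); they differ at four positions yet translate to the same peptide under the standard genetic code. The names translate and mismatches are hypothetical helpers.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Two made-up coding sequences that differ only at third codon positions.
dna_1 = "AAAGTTAATGAATGG"
dna_2 = "AAGGTCAACGAGTGG"

if __name__ == "__main__":
    print(mismatches(dna_1, dna_2), "DNA differences")
    print(translate(dna_1), translate(dna_2))
```

A DNA alignment of these two sequences would report the changes, while a protein alignment would show them as identical, which is why the choice depends on what is being compared.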
MSA Editors

Once the multiple alignment is produced, it may be necessary to edit the sequences manually to obtain a more reasonable or expected alignment. Some of the considerations for an editor:
- the use of colors to aid in the visual representation of the alignment,
- the capability of recognizing the alignment format,
- the ability to use the mouse to add, delete, or move sequences, thus allowing for an adequate windows interface.

MSA Editor and Formatter Programs

Multiple Sequence Alignment programs:
- CINEMA (Color Interactive Editor for Multiple Alignments)
- GDE (Genetic Data Environment)
- GeneDoc
- MACAW

Multiple Sequence Alignment programs:
- Boxshade
- CLUSTALX
More informationLecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM)
Bioinformatics II Probability and Statistics Universität Zürich and ETH Zürich Spring Semester 2009 Lecture 4: Evolutionary Models and Substitution Matrices (PAM and BLOSUM) Dr Fraser Daly adapted from
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationHands-On Nine The PAX6 Gene and Protein
Hands-On Nine The PAX6 Gene and Protein Main Purpose of Hands-On Activity: Using bioinformatics tools to examine the sequences, homology, and disease relevance of the Pax6: a master gene of eye formation.
More informationSession 5: Phylogenomics
Session 5: Phylogenomics B.- Phylogeny based orthology assignment REMINDER: Gene tree reconstruction is divided in three steps: homology search, multiple sequence alignment and model selection plus tree
More informationobjective functions...
objective functions... COFFEE (Notredame et al. 1998) measures column by column similarity between pairwise and multiple sequence alignments assumes that the pairwise alignments are optimal assumes a set
More informationThanks to Paul Lewis, Jeff Thorne, and Joe Felsenstein for the use of slides
hanks to Paul Lewis, Jeff horne, and Joe Felsenstein for the use of slides Hennigian logic reconstructs the tree if we know polarity of characters and there is no homoplasy UPM infers a tree from a distance
More informationSequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.
Sequence Analysis, '18 -- lecture 9 Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene. How can I represent thousands of homolog sequences in a compact
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More informationBasics on bioinforma-cs Lecture 7. Nunzio D Agostino
Basics on bioinforma-cs Lecture 7 Nunzio D Agostino nunzio.dagostino@entecra.it; nunzio.dagostino@gmail.com Multiple alignments One sequence plays coy a pair of homologous sequence whisper many aligned
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationIntroductory course on Multiple Sequence Alignment Part I: Theoretical foundations
Sequence Analysis and Structure Prediction Service Centro Nacional de Biotecnología CSIC 8-10 May, 2013 Introductory course on Multiple Sequence Alignment Part I: Theoretical foundations Course Notes Instructor:
More informationTools and Algorithms in Bioinformatics
Tools and Algorithms in Bioinformatics GCBA815, Fall 2013 Week3: Blast Algorithm, theory and practice Babu Guda, Ph.D. Department of Genetics, Cell Biology & Anatomy Bioinformatics and Systems Biology
More informationMolecular Evolution and DNA systematics
Biology 4505 - Biogeography & Systematics Dr. Carr Molecular Evolution and DNA systematics Ultimately, the source of all organismal variation that we have examined in this course is the genome, written
More informationMultiple Sequence Alignment using Profile HMM
Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,
More informationSimilarity or Identity? When are molecules similar?
Similarity or Identity? When are molecules similar? Mapping Identity A -> A T -> T G -> G C -> C or Leu -> Leu Pro -> Pro Arg -> Arg Phe -> Phe etc If we map similarity using identity, how similar are
More informationC E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment
C E N T R E F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U Introduction to bioinformatics 2007 Lecture 5 Pair-wise Sequence Alignment Bioinformatics Nothing in Biology makes sense except in
More information