Sequence-comparison methods. Why compare sequences? Gerard Kleywegt, Uppsala University


MB330 - January, 2006
Sequence-comparison methods
Gerard Kleywegt
Uppsala University

Outline
- Why compare sequences?
- Dotplots
- Pairwise sequence alignments &
- Multiple sequence alignments
- Profile methods
- Hidden Markov Models (HMMs)
    Separate lectures by Patrik Johansson
- Motif- and family-based methods
    Separate lecture by Marian Novotny

Why compare sequences?

Buzzzzzzzz
- Sequence comparison is the bread and butter of bioinformatics - WHY???
    Sequence-to-database
    Sequence-to-sequence
- Discuss in groups of 3 for 3 minutes
- Write down ~3 things that you think protein sequence comparisons could be used for

The class of 2006
- Sequence-to-database
    Identification of protein
    Clues about function
    Find related sequences
    Clues about domain structure
    Verify hypothetical proteins
    Clues about structural similarities
    Find sequence motifs (active site, ...)

The class of 2006
- Sequence-to-sequence
    Investigate evolutionary history and relationships
    Analyse differences between species and between individuals (e.g., disease-causing mutations)
    Structure modelling
    Clues about secondary structure
    Sequence motifs (active site, ...)

Sequence-database comparison
- Find related sequences
    Homology
      Descended from a common ancestor (T/F!!)
      Occurrence in other organisms (orthologs; speciation)
      Occurrence in same organism (paralogs; gene duplication)
    Convergent evolution
      Independently evolved same function
    Shared motif(s)
    Shared domains
    Chance similarities
- Find clues about function
- Find clues about structure

Sequence-sequence comparison
- Alignment of (possibly) homologous sequences
    Measure similarity, cluster
    Determine residue-residue correspondences
    Find patterns of conservation and variability
      Functionally important sites
      Structurally important sites
    Infer evolutionary relationships, phylogeny
    Structure prediction
      Secondary structure prediction
      Homology modelling
    Function prediction (caution!)

Sequence identity
- Sequence identity (%SI) = 100% * (Nr of identical residues in pairwise alignment) / (Length of the shortest sequence)
- Ex: [aligned example sequences not preserved]
- %SI = 100% * 6 / min(9,10) = 67%

Sequence identity/homology
- Homology and level of sequence identity (or similarity) are two fundamentally different concepts!
- Can homology be inferred/rejected based on the level of sequence identity?

Sequence identity/homology
- Sequence identity of non-homologous proteins
  [Figure not preserved (Rost, 1999)]

Sequence identity/homology
- Sequence identity of homologous proteins
  [Figure not preserved (Rost, 1999)]
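The %SI formula on the sequence-identity slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code: the function name `percent_identity`, the `-` gap character, and the two aligned sequences in the usage note are all assumptions, chosen only so that the numbers match the slide's 6 identities over min(9,10) residues.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """%SI = 100 * identical aligned residues / length of the shortest sequence.

    Both inputs are rows of one pairwise alignment, so they must have the
    same length; `gap` marks alignment gaps (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Count columns where both rows hold the same residue (gaps never count).
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )

    # Sequence lengths are measured without gap characters.
    len_a = len(aligned_a.replace(gap, ""))
    len_b = len(aligned_b.replace(gap, ""))
    shortest = min(len_a, len_b)
    if shortest == 0:
        raise ValueError("each sequence needs at least one residue")

    return 100.0 * identical / shortest


# Hypothetical alignment: 9 and 10 residues, 6 identical columns.
print(round(percent_identity("MKTAYIAKQ-", "MKSAYLARQR")))
```

With the made-up pair above, the shorter sequence has 9 residues and 6 columns are identical, so the result rounds to 67, the same figure as on the slide. Note that other definitions of sequence identity divide by the alignment length or by the longer sequence; the denominator used here follows the slide.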

Sequence identity/homology
- Two proteins of 100 or more residues with %SI >35% are likely to be homologous
- However, homologous proteins may well have %SI <35%
    Twilight Zone (Doolittle)
- %SI <20%
    Midnight Zone (Rost)
- Average %SI ~8.5% for remote homologs
- Average %SI ~5.6% for random sequences

Structure conservation
- Homologous proteins will have similar structures
- Structure better conserved than sequence
- Non-homologous proteins may have similar structures
(Chothia & Lesk, 1986)

Dotplots
- Dotplot: simple overview of the similarities of two words/sequences
    Gives clues about alignment too
- Calculation:
    Matrix
    Columns = residues of sequence 1
    Rows = residues of sequence 2 (or 1)
    Simplest form: put dots in the matrix where the row and column residues are identical
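The simplest form of the dotplot calculation described above can be written as follows. This is a sketch under stated assumptions: the function names `dotplot` and `render` and the text rendering (`*` for a dot, `.` for an empty cell) are illustrative choices, not part of the lecture.

```python
def dotplot(seq1, seq2):
    """Return a matrix with one row per residue of seq2 and one column
    per residue of seq1; a cell is True where the two residues are identical."""
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Draw the dotplot as text: '*' marks a dot, '.' an empty cell."""
    lines = ["  " + seq1]  # header row: residues of sequence 1
    for residue, row in zip(seq2, dotplot(seq1, seq2)):
        lines.append(residue + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Two arbitrary words, used only to show the output layout.
print(render("DOTPLOT", "PLOTTED"))
```

Every residue of one sequence is compared to every residue of the other, so the cost grows with the product of the two lengths. That is acceptable for single proteins but is the reason larger comparisons usually add a window and threshold or index short words first.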

Dotplot example
[Figure: dotplot of two words; axis letters not recoverable]

Self-dotplot
- Internal symmetry
    Translational = domain duplication
    Inversion
      DNA recognition sites for transcriptional regulators and restriction enzymes
      Ex: [example not recoverable]
    Low-complexity regions
      Ex: Alu repeat
- Why compare a sequence to itself?

Dotplot of a palindrome?
[Figure: empty dotplot to be filled in]

Dotplot of a palindrome!
[Figure: dotplot of a palindrome]
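The palindrome question above can be checked with a small self-dotplot sketch. It assumes a plain text palindrome (a word that reads the same backwards); the function names are illustrative. A DNA palindrome in the restriction-site sense reads the same on the complementary strand, so detecting it this way would additionally require complementing one axis, which this sketch does not do.

```python
def self_dotplot(seq):
    """Compare a sequence to itself: cell [row][col] is True when the
    residues at positions row and col are identical."""
    return [[a == b for a in seq] for b in seq]


def antidiagonal_is_full(matrix):
    """True when every cell on the anti-diagonal holds a dot, which is the
    signature of a sequence that reads the same in both directions."""
    n = len(matrix)
    return all(matrix[i][n - 1 - i] for i in range(n))


plot = self_dotplot("RACECAR")
for row in plot:
    print("".join("*" if hit else "." for hit in row))
print(antidiagonal_is_full(plot))
```

A self-dotplot always has a full main diagonal and is symmetric about it, so only the off-diagonal features carry information: an anti-diagonal line indicates an inversion, and lines parallel to the main diagonal indicate repeats.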

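Low-complexity region?
[Figure: empty dotplot to be filled in]

Low-complexity region!
[Figure: dotplot of a low-complexity region]

Domain duplication?
[Figure: empty dotplot to be filled in]

Domain duplication!
[Figure: dotplot of a duplicated domain]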

Shared domains?
[Figure: empty dotplot to be filled in; axes labelled by domain]

Shared domains!
[Figure: dotplot of two sequences with shared domains; axes labelled by domain]

Dotplots with window
- Usually:
    Define a window size
    Count number of identical residues within the window
    If the count exceeds a certain threshold, put a dot in the matrix element
    Ex: window 3 (-1, 0, +1), minimum of 2 identities
    Ex: window 15 (-7, -6, ..., +7), minimum of 6 identities

Window 3, Threshold 2
[Figure: empty dotplot to be filled in]
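The window-and-threshold rule above can be sketched as follows. The function name `windowed_dotplot` and its parameter names are assumptions. Two further choices are made here that the slides do not specify: the window is centred on the cell (matching the "-1, 0, +1" example), positions falling outside either sequence are simply skipped, and a dot is placed when the count reaches the threshold (matching "minimum of 2 identities").

```python
def windowed_dotplot(seq1, seq2, window=3, threshold=2):
    """Dotplot with a sliding diagonal window.

    Rows follow seq2 and columns follow seq1. For each cell, residues at
    offsets -half..+half along the diagonal are compared, and a dot is set
    when at least `threshold` of them are identical.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2

    plot = []
    for j in range(len(seq2)):
        row = []
        for i in range(len(seq1)):
            identities = 0
            for k in range(-half, half + 1):
                a, b = i + k, j + k
                # Skip offsets that run past the end of either sequence.
                if 0 <= a < len(seq1) and 0 <= b < len(seq2):
                    if seq1[a] == seq2[b]:
                        identities += 1
            row.append(identities >= threshold)
        plot.append(row)
    return plot


# Window 1, threshold 1 reduces to the simplest dotplot.
simple = windowed_dotplot("ABCA", "ABCA", window=1, threshold=1)
filtered = windowed_dotplot("ABCA", "ABCA", window=3, threshold=2)
print(simple[0][3], filtered[0][3])
```

In the usage example the isolated match between the first and last `A` appears with window 1 but is filtered out with window 3 and threshold 2, while the main diagonal survives. This is the noise-reduction effect that the windowed plots in the lecture illustrate.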

Dotplots with window
[Figure: completed dotplot, Window 3, Threshold 2]

Dotplot examples
- Human lactalbumin: 123 residues, sequence from PDB entry 1B9O
    Calcium-binding protein involved in lactose biosynthesis
- Hen egg-white lysozyme: 129 residues, sequence from PDB entry 2DS
    Enzyme that breaks down bacterial cell walls
- Homologous; %SI ~36% (structure-based sequence alignment)
- Note: plots now from lower-left to upper-right corner

[Figure: Window 1, threshold 1]
[Figure: Window 3, threshold 2]

[Figure: Window 11, threshold 5]

Summary
- Dotplots are an excellent means of assessing the (self-)similarity of sequences
    Easy to calculate
    Easy to interpret
    Compare every residue in one sequence to every residue in the other sequence
    Provide an indication of how the sequences should be aligned
    Detect similarities that are easily missed by global pairwise alignment (e.g., shuffled domain order, internal symmetry)

A different kind of dotplot
- Dotplots can be used to compare any strings
- Ex: a manual chapter in Dutch, French, German, Italian, Spanish, and Swedish (one million 4-grams)

Sequencing!
- For the next lecture I need two random DNA sequences
- Each of you pick one of the four nucleotides, A, C, G, or T
- We'll generate a pojk (boy) sequence and a tjej (girl) sequence
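For the slide on comparing arbitrary strings, one possible way to build a dotplot from 4-grams is to index the n-grams of the first text and look up those of the second. How the lecture's multilingual example was actually computed is not stated, so this is only a sketch; `ngram_matches` and its parameters are hypothetical names.

```python
from collections import defaultdict


def ngram_matches(text1, text2, n=4):
    """Return (i, j) pairs where the n-gram starting at position i of text1
    equals the n-gram starting at position j of text2. Each pair is one dot."""
    # Index every n-gram of text1 by its start positions.
    index = defaultdict(list)
    for i in range(len(text1) - n + 1):
        index[text1[i:i + n]].append(i)

    # Look up each n-gram of text2 instead of scanning all of text1 again.
    dots = []
    for j in range(len(text2) - n + 1):
        for i in index.get(text2[j:j + n], ()):
            dots.append((i, j))
    return dots


print(ngram_matches("sequence", "consequence"))
```

The shared substring shows up as a run of dots along one diagonal (here offset by three positions). Indexing avoids comparing every position with every other position, which matters when the texts contain on the order of a million n-grams.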

Sequencing!
- The class of 2006 generated the following random sequences:
- tjej: [sequence not preserved]
    Note: contains a low-complexity palindrome and a repeat of the domain
- pojk: [sequence not preserved]
    Note: contains a low-complexity region and a palindrome-in-a-palindrome
- The following dotplots were calculated with window size 3 and threshold 2

[Figure: tjej-tjej dotplot]
[Figure: pojk-pojk dotplot]
[Figure: tjej-pojk dotplot]