Biol478/ August

Similar documents
10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Example of Function Prediction

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

CHAPTERS 24-25: Evidence for Evolution and Phylogeny

Orthology Part I: concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

Graph Alignment and Biological Networks

Homology and Information Gathering and Domain Annotation for Proteins

C3020 Molecular Evolution. Exercises #3: Phylogenetics

3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Computational methods for predicting protein-protein interactions

Sequence analysis and comparison

Orthology Part I concepts and implications Toni Gabaldón Centre for Genomic Regulation (CRG), Barcelona

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

Exploring Evolution & Bioinformatics

Biochemistry 324 Bioinformatics. Pairwise sequence alignment

8/23/2014. Phylogeny and the Tree of Life

Introduction to Bioinformatics

BLAST. Varieties of BLAST

Computational approaches for functional genomics

Pairwise & Multiple sequence alignments

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

C E N T R. Introduction to bioinformatics 2007 E B I O I N F O R M A T I C S V U F O R I N T. Lecture 5 G R A T I V. Pair-wise Sequence Alignment

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Practical considerations of working with sequencing data

Molecular phylogeny - Using molecular sequences to infer evolutionary relationships. Tore Samuelsson Feb 2016

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Algorithms in Bioinformatics

Massachusetts Institute of Technology Computational Evolutionary Biology, Fall, 2005 Notes for November 7: Molecular evolution

Research Proposal. Title: Multiple Sequence Alignment used to investigate the co-evolving positions in OxyR Protein family.

Molecular evolution - Part 1. Pawan Dhar BII

Homology. and. Information Gathering and Domain Annotation for Proteins

Chapter 26: Phylogeny and the Tree of Life Phylogenies Show Evolutionary Relationships

UoN, CAS, DBSC BIOL102 lecture notes by: Dr. Mustafa A. Mansi. The Phylogenetic Systematics (Phylogeny and Systematics)

Computational Biology: Basics & Interesting Problems

Phylogeny and systematics. Why are these disciplines important in evolutionary biology and how are they related to each other?

Practical Bioinformatics

Basic Local Alignment Search Tool

Genomes and Their Evolution

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

An Introduction to Sequence Similarity ( Homology ) Searching

Bioinformatics. Part 8. Sequence Analysis An introduction. Mahdi Vasighi

Comparing Genomes! Homologies and Families! Sequence Alignments!

Alignment Algorithms. Alignment Algorithms

Genomics and bioinformatics summary. Finding genes -- computer searches

Inferring phylogeny. Constructing phylogenetic trees. Tõnu Margus. Bioinformatics MTAT

BIOINFORMATICS LAB AP BIOLOGY

Comparative Genomics II

Lecture 2, 5/12/2001: Local alignment the Smith-Waterman algorithm. Alignment scoring schemes and theory: substitution matrices and gap models

Sequence analysis and Genomics

Outline. Sequence-comparison methods. Buzzzzzzzz. Why compare sequences? Gerard Kleywegt Uppsala University

3/8/ Complex adaptations. 2. often a novel trait

Comparative genomics: Overview & Tools + MUMmer algorithm

Computational Molecular Biology (

Introduction to protein alignments

BMI/CS 776 Lecture #20 Alignment of whole genomes. Colin Dewey (with slides adapted from those by Mark Craven)

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

Outline Sequence-comparison methods. Buzzzzzzzz. MB330 - The class of 2008

V19 Metabolic Networks - Overview

UNIT 5. Protein Synthesis 11/22/16

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

V14 extreme pathways

Administration. ndrew Torda April /04/2008 [ 1 ]

UNIT 6 PART 3 *REGULATION USING OPERONS* Hillis Textbook, CH 11

Lecture 5,6 Local sequence alignment

BIOL 1010 Introduction to Biology: The Evolution and Diversity of Life. Spring 2011 Sections A & B

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Homolog. Orthologue. Comparative Genomics. Paralog. What is Comparative Genomics. What is Comparative Genomics

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Network Alignment 858L

Fitness constraints on horizontal gene transfer

Introduction to sequence alignment. Local alignment the Smith-Waterman algorithm

Sequence Alignment (chapter 6)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

Tools and Algorithms in Bioinformatics

Phylogenetic inference

Chapter 26: Phylogeny and the Tree of Life

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

How much non-coding DNA do eukaryotes require?

Sequences, Structures, and Gene Regulatory Networks

Copyright Mark Brandt, Ph.D A third method, cryogenic electron microscopy has seen increasing use over the past few years.

7.36/7.91 recitation CB Lecture #4

Similarity or Identity? When are molecules similar?

Evolutionary Models. Evolutionary Models

Biology. Biology. Slide 1 of 26. End Show. Copyright Pearson Prentice Hall

Biology II. Evolution

Proteomics. Yeast two hybrid. Proteomics - PAGE techniques. Data obtained. What is it?

Amira A. AL-Hosary PhD of infectious diseases Department of Animal Medicine (Infectious Diseases) Faculty of Veterinary Medicine Assiut

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Announcements Wednesday, September 20

16.4 Evidence of Evolution

CSCE555 Bioinformatics. Protein Function Annotation

Algorithms in Bioinformatics

Name Period The Control of Gene Expression in Prokaryotes Notes

Phylogenetics. Applications of phylogenetics. Unrooted networks vs. rooted trees. Outline

Bioinformatics and BLAST

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Transcription:

Biol478/595 29 August # Day Inst. Topic Hwk Reading August 1 M 25 MG Introduction 2 W 27 MG Sequences and Evolution Handouts 3 F 29 MG Sequences and Evolution September M 1 Labor Day 4 W 3 MG Database Searching Ch. 6 5 F 5 MG Database Searching Hw1 Reading Handouts Mount: Chapter 6 Sequence database searching for similar sequences Homework Usually posted Wednesday, due following week Friday

Looking at similar molecules can tell us where they came from (history) how they work (mapping knowledge) The key starting point is the knowledge that you are looking at molecules that are ancestrally related. This is called homology

What is Homology? Seen in the light of evolution, biology is, perhaps, intellectually the most satisfying and inspiring science. Without that light it becomes a pile of sundry facts some of them interesting or curious but making no meaningful picture as a whole. Nothing in biology makes sense except in the light of evolution. Theodosius Dobzhansky (1900-1975) homology - the presence of a similar feature because of descent from a common ancestor Homology cannot be observed. We can t actually see the ancestral organisms/molecules and trace descent. Homology is an inference, a conclusion we draw based on observed similarity. Homology is an all-or-none relationship no partial homology homoplasy - the presence of a similar feature because of convergence

Why is homology Important? Homology strongly suggests that the molecules have similar structure and function Some time in the past, the molecules had identical structure and function Biology is conservative - small accumulations of mutations lead to small changes in function, but not to radical changes If you can prove homology, you have a strong basis for predicting similar structure and function Known information about related molecules l can be "mapped" onto unknown molecules

Proving homology There are (very) many ways to fold a polypeptide to place specific chemical groups at specific locations. There is no reason, a priori, why proteins with a specific function should have similar 3-D structures. Therefore, there is no reason, a priori, why unrelated sequences should have any detectable similarity in sequence. Significantly similar molecular sequences are very unlikely to arise by chance - i.e. homoplasy on the molecular level is very unlikely. When we see significant similarity, we infer that the sequences/structures are homologous, i.e. at some point in the past they share an identical structure. The only thing that keeps the sequences tied to each other is the commonality of structure and function arising from homology.

We can only make comparisons when we know we are comparing the "same" things. More precisely homologous genes homologous proteins Argument: This gene in mouse is the same as a gene in humans, therefore it does about the same thing What do you find the same gene in two genomes? find the same protein in two proteomes? guess the function of a new gene?

Homology Sequences alignments and database searches let us Find homologous sequences (genes/proteins) Map information from known systems to new ones Gene identification Gene function Metabolic and regulatory systems Two common classes of homologs Orthologs genes separated by a speciation event, i.e. the same gene in two species Paralogs genes separated by a duplication events, originally the same but now diverged with possibly different functions

How is homology determined? Because molecular homoplasy is unlikely, significant sequence similarity strongly indicates homology Similarity homology Similarity is determined by sequence matching or comparison, more commonly called sequence alignment Approaches Dotplots Alignments Database searches

Dotplots a simple way to compare sequences 1 VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF... 46..::... :..:... : :.... 1 GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLK 50..... 47..DLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVD 94.:.::.... :. :....:.:.: ::. 51 SEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIP 100.... 95 PVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR 141. :.::. ::.... ::.:..::.::.... :.. : 101 VKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYK 147

H I S S E Q E N C E M Y S E Q U E N C E E C N E Q E S S I H H I S S E Q E N C E M Y S E Q U E N C E M Y S E Q U E N C E

Dotplots Simplest method - put a dot wherever sequences are identical A little better - use a scoring table, put a dot wherever the residues have better than a certain score Or, put a dot wherever you get at least n matches in a row (identity matching, compare/word) Even better - filter the plot

Dotplots

2/2 4/4 RecA DNA sequence from Helicobacter pylori and Streptococcus mutans, window/ match shown below figure

Dotplots Windowed scores Calculate a score within a window Move the window over one A C C T T G T C C T C T T T A C C T G C C G A A A C G T T G A C C T G T A A C C T G C C G A T T Window Length = Segment = Span = 6

9/6 RecA DNA sequence from Helicobacter pylori and Streptococcus mutans, window/ match shown below figure

RecA DNA sequence from Helicobacter pylori and Streptococcus mutans, window/ match = 12/8

Dot Matrix Plots What can you see in dotplots? Similar regions Repeated sequences Rearrangements RNA structures

low entropy sequences Lin12 repeats EGF repeats Drosophila Notch protein

1 2 3 4 5 6 Repeated sequence in E. coli ribosomal protein S1

Homology Homologous sequences show up as diagonal lines in dotplots Basis for methods to find..::... :..:... : :.... homologous sequences 1 VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF... 46 1 GLSDGEWQLVLNVWGKVEADIPGHGQEVLIRLFKGHPETLEKFDKFKHLK 50..... 47..DLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVD 94.:.::.... :. :....:.:.: ::. 51 SEDEMKASEDLKKHGATVLTALGGILKKKGHHEAEIKPLAQSHATKHKIP 100.... 95 PVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR 141. :.::. ::.... ::.:..::.::.... :.. : 101 VKYLEFISECIIQVLQSKHPGDFGADAQGAMNKALELFRKDMASNYK 147

HIV Env Gene Conserved (C) and Variable (V) regions C Eukaryotic N-ter C-ter

HIV Protease aspartic protease retrovirus protease Pearl & Taylor, Nature 329, 351-354,1987

Database Searching Big problem is database size Bigger database means longer search. Alignments are O(n 2) Bigger database means worse signal to noise ratio Sequence data doubles every 12-14 months