# Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Slide 1: Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Jose María González-Izarzugaza Martínez, CNIO (Spanish National Cancer Research Centre)

Slide 2: Sequence Comparisons: How?

- Pairwise Alignment of 2 Sequences:
  - Aligning a couple of sequences
  - Searching for homologues (BLAST)
- Multiple Sequence Alignments (n>2)
- Advanced Sequence Alignments:
  - Patterns, Profiles, HMMs
  - Distant Homologues with PSI-BLAST

Slide 3: Multiple Sequence Alignments

Rationale:
- They align more than 2 homologous sequences.
- Conserved regions must be important, because selective pressure has preserved them.
- Alignments get better because we can focus on general rules shared by the family.

Algorithmic complexity:
- 2 sequences: N×M (as seen for pairwise alignments)
- 3 sequences: N×M×L
- 4 sequences: N×M×L×J
- Not feasible in general for a set of homologous proteins.

Examples of heuristics that sidestep this algorithmic problem:
- T_coffee : [
- Clustalw : [
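
The growth listed under algorithmic complexity can be made concrete with a few lines of arithmetic. This is a minimal sketch that assumes, as the slide does, a cost of one cell per combination of positions; the family of 300-residue sequences and the function name `dp_cells` are hypothetical and chosen only for illustration.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in the full dynamic-programming lattice (N x M x L x ...)."""
    return prod(lengths)


# Hypothetical family in which every sequence is 300 residues long.
for n_sequences in (2, 3, 4, 10):
    cells = dp_cells([300] * n_sequences)
    print(f"{n_sequences:>2} sequences: {cells:.3e} cells")
```

Two sequences need 90,000 cells and three need 27 million, but ten sequences would already need about 5.9 × 10^24 cells, which is why exact alignment is replaced by heuristics.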

Slide 4: Clustalw Algorithm

The algorithm in plain words:
- Sort the sequences by similarity.
- Align the two most similar ones.
- Label the sequences as aligned.
- Repeat until there are no unlabeled sequences.

Computational complexity: the problem is reduced to a series of pairwise alignments (all-against-all comparisons to rank the sequences by similarity, then one alignment step per sequence added), so it is feasible for the computer to calculate them for a family of homologues.

Slide 5: View of an MSA using Belvu

Slide 6: Pairwise vs MSA

- When two homologues have diverged so far that their sequence identity falls below roughly 20%, the signal between them is so weak that BLAST is not able to catch it. In other words, we can NOT use BLAST to find remote homologues.
- A Multiple Sequence Alignment (MSA) is able to spot important regions. Since important regions are under higher selective pressure, they change (evolve) less than other regions: conservation is related to importance.
- If the matches between two sequences occur in the conserved regions, the chances of these 2 sequences being homologues increase.

So how can we use all this information to improve our knowledge? Can an MSA spot remote homologues?

Slide 8: Advanced Searches: Consensus

Algorithm: for each position in the alignment, the consensus takes the most frequent monomer.

Pros:
- The most basic way of summarizing an MSA in a single line.
- Easy to implement, easy to understand.

Cons:
- It keeps only the most represented monomer at each position and ignores the frequencies.

Example alignment from which the consensus sequence is derived:
- ACTGACTACGTACA
- ATGCGTACCATACA
- ATCAGTATCGTAGA
- ATCAGTATCGAACA
- ATCAGTATCGTACA
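
The consensus rule above can be sketched in a few lines of Python, using the five sequences shown on the slide. The function names `consensus` and `column_counts` are illustrative, and ties (which do not occur in this alignment) would simply be resolved in favour of the monomer seen first.

```python
from collections import Counter

alignment = [
    "ACTGACTACGTACA",
    "ATGCGTACCATACA",
    "ATCAGTATCGTAGA",
    "ATCAGTATCGAACA",
    "ATCAGTATCGTACA",
]


def column_counts(seqs):
    """Count the monomers observed in every alignment column."""
    # zip(*seqs) walks the alignment column by column.
    return [Counter(column) for column in zip(*seqs)]


def consensus(seqs):
    """Keep only the most frequent monomer of each column."""
    return "".join(counts.most_common(1)[0][0] for counts in column_counts(seqs))


print(consensus(alignment))                 # ATCAGTATCGTACA
print(dict(column_counts(alignment)[2]))    # {'T': 1, 'G': 1, 'C': 3}
```

The second line of output illustrates the drawback listed under "Cons": the third column contains T, G and C, yet the consensus reports only C.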

Slide 9: Advanced Searches: Patterns

- Patterns are also called Regular Expressions by bioinformaticians.
- Useful when dealing with motifs.
- It is a complex (but powerful) language, not always easy to read at first glance.
- Ambiguity can be expressed:
  - [A,B]: either A or B
  - {A,B}: anything but A or B
  - X: any monomer
- Repetitions are easily represented:
  - A{2,4}: AA, AAA or AAAA
  - A+: any number of A's (but at least one)
  - A*: any number of A's (or even none)
- MSAs are reduced to a single line.
- Frequencies are not taken into account.

Examples:
- The aligned sequences AGLV, AGLV and AGIV are summarized by the pattern AG[IL]V.
- [AC]-x-G-x{4}-{L,I} reads as [Ala or Cys]-any-Gly-any-any-any-any-[any but Leu or Ile].
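
The two example patterns on this slide can be tried out with Python's standard `re` module. The translation into Python syntax below is done by hand and is an assumption of this sketch: `x` becomes `.`, and the exclusion `{L,I}` becomes the negated class `[^LI]`. The test sequences are made up for illustration.

```python
import re

# Slide notation -> Python regular expressions (hand-translated).
short_motif = re.compile(r"AG[IL]V")          # AG[IL]V
long_motif = re.compile(r"[AC].G.{4}[^LI]")   # [AC]-x-G-x{4}-{L,I}
repeat = re.compile(r"A{2,4}")                # A{2,4}

# AG[IL]V accepts both aligned variants and rejects anything else.
print([bool(short_motif.fullmatch(s)) for s in ("AGLV", "AGIV", "AGAV")])

# The last position may be anything except Leu (L) or Ile (I).
print(bool(long_motif.fullmatch("ATGQQQQK")))
print(bool(long_motif.fullmatch("ATGQQQQL")))

# Scanning a longer, hypothetical sequence for the motif.
hit = long_motif.search("MKATGQQQQKLV")
print(hit.start(), hit.group())

# A{2,4} needs between two and four A's.
print([bool(repeat.fullmatch(s)) for s in ("A", "AAA", "AAAAA")])
```

A pattern either matches or it does not: a position that is Leu in 90% of the family and Ile in 10% is written as `[IL]` in exactly the same way as a 50/50 position, which is the loss of frequency information noted on the slide.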

Slide 10: Advanced Searches: Profiles

- Profiles are Position Specific Scoring Matrices (PSSMs).
- Same concept as the scoring matrices (e.g. BLOSUM), but these are calculated from scratch using each position in the MSA instead of being pre-calculated.
- Thus, PSSMs are 20×N matrices, where N is the length of the alignment.
- PSSMs take into account information specific to the family of proteins: it is inferred from the alignment, so few assumptions are needed.
- We can align a sequence and a profile using the Smith & Waterman algorithm, which lets us search for homologues.

Figure: MSA → PSSM (PSSM borrowed from F. Abascal).
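
A minimal sketch of how a PSSM can be derived from an alignment is given below. Several choices are assumptions made for illustration and are not taken from the slides: a uniform background frequency, a pseudocount of 1 per letter, log-odds scores in bits, and a toy DNA alignment (so the matrix is 4×N rather than 20×N). Scoring is ungapped; aligning a sequence to a profile with Smith & Waterman is not attempted here.

```python
from collections import Counter
from math import log2


def build_pssm(alignment, alphabet, pseudocount=1.0):
    """Log-odds PSSM: one row per letter, one column per alignment position."""
    background = 1.0 / len(alphabet)               # assumption: uniform background
    total = len(alignment) + pseudocount * len(alphabet)
    pssm = {letter: [] for letter in alphabet}
    for column in zip(*alignment):                 # walk the MSA column by column
        counts = Counter(column)
        for letter in alphabet:
            frequency = (counts[letter] + pseudocount) / total
            pssm[letter].append(log2(frequency / background))
    return pssm


def score(pssm, sequence):
    """Ungapped score of a sequence that has the same length as the profile."""
    return sum(pssm[letter][i] for i, letter in enumerate(sequence))


msa = [
    "ACTGACTACGTACA",
    "ATGCGTACCATACA",
    "ATCAGTATCGTAGA",
    "ATCAGTATCGAACA",
    "ATCAGTATCGTACA",
]
pssm = build_pssm(msa, "ACGT")
print(len(pssm), "rows x", len(pssm["A"]), "columns")
print(round(score(pssm, "ATCAGTATCGTACA"), 2), round(score(pssm, "GGGGGGGGGGGGGG"), 2))
```

Unlike a consensus or a pattern, every column keeps a score for every letter, so a sequence that deviates at a variable position is penalized less than one that deviates at a fully conserved position.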

Slide 11: Advanced Searches: HMM-profiles

- HMM stands for Hidden Markov Model.
- Originally applied to speech recognition.
- They are much more robust than simple profiles, especially when dealing with gaps. However, they are harder to implement.
- HMMER is a package similar to BLAST but using HMMs.

Figure legend:
- x: hidden states
- y: observable outputs
- a: transition probabilities
- b: output probabilities
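
To make the four ingredients of the figure legend concrete, here is a toy two-state HMM together with the standard forward algorithm, which computes the probability of an observed sequence summed over all hidden-state paths. Everything numeric is hypothetical: the state names, the start distribution (which the slide does not mention) and all probabilities are invented for illustration. This is not a profile HMM and not how HMMER is implemented.

```python
from itertools import product

states = ("conserved", "variable")                    # x: hidden states (hypothetical)
start = {"conserved": 0.9, "variable": 0.1}           # assumed start distribution
a = {                                                 # a: transition probabilities
    "conserved": {"conserved": 0.8, "variable": 0.2},
    "variable": {"conserved": 0.6, "variable": 0.4},
}
b = {                                                 # b: output probabilities
    "conserved": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "variable": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def forward(observations):
    """P(observations) under the model, summed over all hidden paths."""
    # Initialisation: start in state s and emit the first output y.
    alpha = {s: start[s] * b[s][observations[0]] for s in states}
    # Recursion: move from any state r to s, then emit the next output y.
    for y in observations[1:]:
        alpha = {s: b[s][y] * sum(alpha[r] * a[r][s] for r in states) for s in states}
    return sum(alpha.values())


print(forward("AAA"), forward("GGG"))

# Sanity check: the probabilities of all length-3 outputs add up to 1.
total = sum(forward("".join(y)) for y in product("ACGT", repeat=3))
print(round(total, 9))
```

Because the "conserved" state strongly prefers A, the sequence AAA receives a higher probability than GGG.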

Slide 13: Remote Homologues: PSI-BLAST

- PSI-BLAST: Position Specific Iterated BLAST.
- PSI-BLAST is useful for retrieving remote homologues (identity < 20%).
- Algorithm:
  1. Run BLAST [iteration #0].
  2. Generate a PSSM from the results better than a given threshold (e-value).
  3. Run BLAST again using the PSSM as input [iterations #1 to #N].
  4. Update the PSSM with the new results.
  5. Repeat from step 3 until convergence, i.e. until no new results are found.
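
The iterate-until-convergence loop above can be illustrated with a deliberately simplified toy. It is not PSI-BLAST and calls no BLAST code. The sketch assumes ungapped sequences of equal length, a plain column-frequency profile standing in for the PSSM, and a score cutoff (`threshold`, hypothetical) standing in for the e-value threshold. The query and the four database sequences are invented.

```python
from collections import Counter


def column_frequencies(seqs):
    """Per-column letter frequencies: a crude stand-in for a PSSM."""
    return [{k: v / len(seqs) for k, v in Counter(col).items()} for col in zip(*seqs)]


def profile_score(profile, seq):
    """Mean frequency of the sequence's letters under the profile (0 to 1)."""
    return sum(col.get(ch, 0.0) for col, ch in zip(profile, seq)) / len(seq)


def iterative_search(query, database, threshold=0.6, max_iterations=10):
    """Return the list of hits found at each iteration."""
    members, previous_hits, history = [query], None, []
    for _ in range(max_iterations):
        # Iteration 0 uses the query alone, so the score is plain identity.
        profile = column_frequencies(members)
        hits = [s for s in database if profile_score(profile, s) >= threshold]
        history.append(hits)
        if hits == previous_hits:          # convergence: no new results
            break
        previous_hits = hits
        members = [query] + hits           # update the profile with the new hits
    return history


database = ["AAAAAAAC", "AAAAAACC", "AAAACCCC", "GGGGGGGG"]
history = iterative_search("AAAAAAAA", database)
for i, hits in enumerate(history):
    print(f"iteration {i}: {hits}")
```

In this toy run the sequence AAAACCCC is only 50% identical to the query and is missed at iteration 0, but it is picked up at iteration 1 once the closer hits have been folded into the profile; the unrelated GGGGGGGG is never retrieved.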

Slide 14: Remote Homologues: PSI-BLAST

Figure (diagram labels): Target Sequence, BLAST, Database, PSI-BLAST, PSSM, Closely Related Homologues, Remotely Related Homologues.

Slide 15: Acknowledgements

- Federico Abascal (Original Text)
- Juan Carlos Sanchez
- Alfonso Valencia
