Tutorial 4 Substitution matrices and PSI-BLAST


Agenda
Substitution Matrices
  PAM - Point Accepted Mutations
  BLOSUM - Blocks Substitution Matrix
PSI-BLAST
Cool story of the day: Why should we care about cellular fusion in worms?

Studying distant homologies
When we study a new organism or protein, we may find many unknown sequences that we would like to characterize, and we might not be able to find any close homologs. During evolution, the three-dimensional structures of proteins may be conserved even after considerable erosion of their sequence similarity.
https://www.ncbi.nlm.nih.gov/books/nbk2590/

Multiple alignment of the new protein families of the HSP70-actin fold
[Figure: alignment blocks labelled O-sialoglycoproteases (OSGP) and related proteins; newly identified proteins with the HSP70-actin fold; UDPases and extracellular ATPases; classic HSP70 and sugar kinases]
Aravind and Koonin, Journal of Molecular Biology, 1999

Comparison of the HSP70 structure and a structural model of the O-sialoglycoprotease
Aravind and Koonin, Journal of Molecular Biology, 1999

Substitution matrices model different evolutionary distances.
PSI-BLAST makes it possible to find more distant relationships between proteins.

Amino acids were not born equal
Both substitution matrices and PSI-BLAST are designed to model the process by which amino acids (AAs) mutate.

Substitution Matrix
Scoring matrix S of size 20x20.
S(i,j) represents the gain or penalty for substituting amino acid j by amino acid i (row i, column j).
Scores are based on how likely the substitution is to be found in nature.
They are computed differently in PAM and BLOSUM.
Each matrix is tailored to a particular evolutionary distance.
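The idea of a score "based on likelihood" is usually written as a log-odds ratio: how often a residue pair is observed in trusted alignments, divided by how often it would pair up by chance. The sketch below illustrates that idea only; the function name, the half-bit scaling, and all frequencies are assumptions made up for the example, not values taken from PAM or BLOSUM.

```python
import math

def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Rounded log-odds score for aligning residues i and j.

    q_ij     : observed frequency of the pair (i, j) in trusted alignments
    p_i, p_j : background frequencies of the two residues
    scale    : multiplier applied before rounding (2.0 gives half-bit units)
    """
    return round(scale * math.log2(q_ij / (p_i * p_j)))

# Toy frequencies, invented for illustration only.
conserved = log_odds_score(q_ij=0.02, p_i=0.05, p_j=0.05)   # pair seen more often than chance
rare = log_odds_score(q_ij=0.001, p_i=0.05, p_j=0.05)       # pair seen less often than chance
print(conserved, rare)
```

A pair observed more often than chance gets a positive score (a gain), and a pair observed less often than chance gets a negative score (a penalty).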

PAM vs. BLOSUM
PAM
  Based on global alignments of closely related proteins.
  PAM1 is calculated from comparisons of sequences with no more than 1% divergence.
  Other PAM matrices are extrapolated from PAM1.
BLOSUM
  Based on local alignments.
  BLOSUM62 is calculated from comparisons of sequences with no more than 62% identity in the blocks.
  All BLOSUM matrices are based on observed alignments; they are not extrapolated from comparisons of closely related proteins.
  BLOSUM matrices are the substitution matrices in common use today.
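"Extrapolated from PAM1" means that the PAM1 mutation-probability matrix is multiplied by itself n times to model n PAM units of divergence. The sketch below shows that mechanism on a made-up 3-letter alphabet; the matrix values and helper names are assumptions for illustration, and a real PAM matrix is 20x20 and is converted to log-odds scores afterwards.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result

# Toy "PAM1" on a 3-letter alphabet: row i holds the probabilities that
# letter i stays the same or changes in one step (each row sums to 1).
toy_pam1 = [
    [0.990, 0.008, 0.002],
    [0.008, 0.990, 0.002],
    [0.002, 0.002, 0.996],
]

toy_pam250 = mat_pow(toy_pam1, 250)
print([round(x, 3) for x in toy_pam250[0]])
```

After 250 steps the diagonal entries are much smaller than in the toy PAM1, which is why high-numbered PAM matrices suit more divergent sequences.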

Use Recommendations

Approximately equivalent matrices, from closely related to highly divergent sequences:

  PAM100 ~ BLOSUM90    (closely related)
  PAM120 ~ BLOSUM80
  PAM160 ~ BLOSUM60
  PAM200 ~ BLOSUM52
  PAM250 ~ BLOSUM45    (highly divergent)

Recommended matrix by query length:

  Query length   Matrix     Gap costs
  <35            PAM30      9,1
  35-50          PAM70      10,1
  50-85          BLOSUM80   10,1
  >85            BLOSUM62   11,1

http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html
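The query-length table can be written as a small lookup. This is only a sketch: the function name is hypothetical, and because the table's ranges share the endpoints 50 and 85, the choice to put a length of exactly 50 in the 35-50 row and exactly 85 in the 50-85 row is an assumption.

```python
def recommend_matrix(query_length):
    """Return (matrix name, gap costs) for a protein query of the given length.

    Boundary handling at 50 and 85 is an assumption; the table itself
    does not say which row those lengths belong to.
    """
    if query_length < 35:
        return "PAM30", (9, 1)
    if query_length <= 50:
        return "PAM70", (10, 1)
    if query_length <= 85:
        return "BLOSUM80", (10, 1)
    return "BLOSUM62", (11, 1)

for length in (20, 40, 70, 300):
    print(length, recommend_matrix(length))
```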

Example
Query: an uncharacterized (hypothetical) protein
Database: nr
BLAST program: BLASTP
Matrices: PAM30 / PAM250, BLOSUM45 / BLOSUM90


PSI-BLAST
Position-Specific Iterative BLAST
Aims to find more distantly related proteins than standard BLAST can detect.

PSI-BLAST Steps
1. Search a query against a protein database.
2. Construct a specialized multiple sequence alignment based on the top results.
3. Create a position-specific scoring matrix (PSSM) from the alignment.
4. Use the PSSM as the query against the database.
5. Estimate the statistical significance (E values) of the hits.
Repeat steps 2-5 iteratively, each time rebuilding the alignment and the PSSM from the latest results.
[Diagram: Query -> search protein DB -> results -> PSSM -> search again (iterations)]
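The loop structure of these steps can be sketched with a deliberately simplified stand-in for the real search. Everything here is an assumption for illustration: sequences are ungapped and of equal length, the "PSSM" is just per-column residue frequencies, and a raw score cutoff replaces the E-value test. It is not how PSI-BLAST itself scores hits.

```python
from collections import Counter

def build_profile(seqs):
    """Per-column residue frequencies from equal-length sequences (toy PSSM)."""
    profile = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs)
        profile.append({aa: n / len(seqs) for aa, n in counts.items()})
    return profile

def score(profile, seq):
    """Sum of the column frequencies of seq's residues (higher = better match)."""
    return sum(col.get(aa, 0.0) for col, aa in zip(profile, seq))

def toy_psi_blast(query, database, threshold, max_iterations=5):
    """Iterate search -> alignment -> profile until the hit set stops changing."""
    included = [query]
    for iteration in range(1, max_iterations + 1):
        profile = build_profile(included)                     # steps 2-3
        hits = [s for s in database if score(profile, s) >= threshold]  # steps 4-5
        new_included = [query] + [h for h in hits if h != query]
        if set(new_included) == set(included):
            break                                             # converged: no new hits
        included = new_included
    return included, iteration

# Toy data, invented for illustration.
toy_db = ["ACDEY", "ACDWY", "AKLWY", "GHIKL"]
found, rounds = toy_psi_blast("ACDEF", toy_db, threshold=1.5)
print(found, rounds)
```

In this toy run, AKLWY scores too low against the query alone but is picked up in the second round, once the closer hits have broadened the profile. That is the effect PSI-BLAST relies on to reach distant homologs.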

PSSM
The PSSM captures the conservation pattern in an alignment and stores it as a matrix of scores for each position in the alignment. This profile is used in place of the original substitution matrix for a further search of the database, to detect sequences that match the conservation pattern specified by the PSSM.
http://www.ebi.ac.uk/training/online/course/introduction-protein-classification-ebi/what-areprotein-signatures/signature-types/what-are
https://www.ncbi.nlm.nih.gov/books/nbk2590/
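A minimal sketch of how a PSSM stores one score per residue per position. It assumes an ungapped toy alignment, a uniform background frequency of 1/20, and a simple pseudocount; PSI-BLAST's own sequence weighting and pseudocount scheme are more elaborate and are not reproduced here. All names and sequences are made up for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column.

    Assumes an ungapped alignment and a uniform background (1/20 each).
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_seqs = len(alignment)
    pssm = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unseen residues from getting a score of -infinity.
            freq = (counts[aa] + pseudocount * background) / (n_seqs + pseudocount)
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm

def score_sequence(pssm, seq):
    """Score an ungapped sequence of the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, seq))

toy_alignment = ["GKSTL", "GKTTL", "GRSTV", "GKSSL"]
pssm = build_pssm(toy_alignment)
print(round(pssm[0]["G"], 2), round(pssm[0]["W"], 2))
```

The fully conserved first column rewards G strongly and penalizes every other residue, whereas a general substitution matrix would give G the same score at every position.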

PSI-BLAST Example
Cellular DNA polymerase enzymes tend to dissociate from DNA after adding a few nucleotides, and require an accessory factor to tether them to DNA while elongating the growing DNA chain.
In eukaryotes: PCNA
In prokaryotes: the β-subunit of DNA polymerase III, encoded by the dnaN gene
https://www.ncbi.nlm.nih.gov/books/nbk2590/

[Figure: E. coli β-subunit (dimer) and human PCNA (trimer)]

Querying the human protein
[Screenshot; annotation: "Changed to 1000"]


Summarize results by organism


Sequences marked in yellow scored below the threshold in the previous iteration.

Iteration 2

Iteration 3

Iteration X
Alignment of human proliferating cell nuclear antigen (PCNA) and Escherichia coli DNA polymerase III β-subunit.
Hydrophobic AAs: V, L, F, A
Polar AAs: E, D, K, R, N, Y

Example 2
We will use a sequence of an uncharacterized (hypothetical) protein:

Threshold for the initial BLAST search (default: 10)
Threshold for inclusion in PSI-BLAST iterations (default: 0.005)
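The two thresholds play different roles: the first decides which hits are reported at all, and the second decides which of those are trusted enough to build the next PSSM. A small sketch of that split follows; the function name, the hit names, and the E values are hypothetical, and treating a hit exactly at a threshold as passing is an assumption.

```python
INITIAL_EVALUE = 10.0      # default reporting threshold for the initial search
INCLUSION_EVALUE = 0.005   # default threshold for inclusion in the next iteration

def split_hits(hits, report=INITIAL_EVALUE, include=INCLUSION_EVALUE):
    """Split (name, evalue) pairs into reported hits and names used for the PSSM."""
    reported = [(name, e) for name, e in hits if e <= report]
    included = [name for name, e in reported if e <= include]
    return reported, included

# Hypothetical hits, invented for illustration.
toy_hits = [("hit_A", 1e-40), ("hit_B", 0.003), ("hit_C", 0.8), ("hit_D", 25.0)]
reported, included = split_hits(toy_hits)
print([name for name, _ in reported], included)
```

Here hit_C would be listed in the results but would not contribute to the PSSM, and hit_D would not be reported at all.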

The results are all hypothetical proteins

Cool Story of the day Why should we care about cellular fusion in worms?

Cellular fusion
In cellular fusion, two cells unite and form one cell. Examples:
  Fertilization.
  Muscle cells are composed of rows of fused cells.
  The placenta is made up of powerful multinucleated cells that are actually numerous individual cells that have fused.
  The lenses of the eyes are formed of rows of fused cells.
  Cellular fusion also occurs in bones.
Fusion processes are also involved in cancer, viral infections and stem cells.
http://www1.technion.ac.il/_local/includes/blocks/scinews-items/100513-elegans/news-item-en.htm

Cellular fusion in C. elegans
How exactly two cells fuse is not yet understood, and is the focus of work in Prof. Podbilewicz's lab.
The worm suits cell fusion research because intensive cell-cell fusion processes take place in its skin and can be easily followed.
The researchers identified the protein responsible for the worm's fusion activity: the EFF-1 protein.
They showed that in mutant worms the skin cells do not fuse and begin to migrate through the body.
[Photo: Beni Podbilewicz]

...we identified fusion family (FF) proteins within and beyond nematodes, and divergent members from the human parasitic nematode Trichinella spiralis and the chordate Branchiostoma floridae could also fuse mammalian cells