- conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster.

Similar documents
Week 10: Homology Modelling (II) - HHpred

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

Tutorial 4 Substitution matrices and PSI-BLAST

BLAST. Varieties of BLAST

08/21/2017 BLAST. Multiple Sequence Alignments: Clustal Omega

Supplemental Materials

Genomics and bioinformatics summary. Finding genes -- computer searches

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Sequence Analysis, '18 -- lecture 9. Families and superfamilies. Sequence weights. Profiles. Logos. Building a representative model for a gene.

Grundlagen der Bioinformatik Summer semester Lecturer: Prof. Daniel Huson

-max_target_seqs: maximum number of targets to report

GEP Annotation Report

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

BIOINFORMATICS LAB AP BIOLOGY

Sequences, Structures, and Gene Regulatory Networks

CS612 - Algorithms in Bioinformatics

Hands-On Nine The PAX6 Gene and Protein

Annotation of Drosophila grimashawi Contig12

GENERAL BIOLOGY LABORATORY EXERCISE Amino Acid Sequence Analysis of Cytochrome C in Bacteria and Eukarya Using Bioinformatics

Chapter 5. Proteomics and the analysis of protein sequence Ⅱ

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Qi Sun Bioinformatics Facility Cornell University

Understanding Sequence, Structure and Function Relationships and the Resulting Redundancy

Procedure to Create NCBI KOGS

Homology and Information Gathering and Domain Annotation for Proteins

SUPPLEMENTARY INFORMATION

Exercise 5. Sequence Profiles & BLAST

EBI web resources II: Ensembl and InterPro

Bioinformatics Chapter 1. Introduction

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Single alignment: Substitution Matrix. 16 march 2017

Introduction to Bioinformatics

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics. ECS 254; Phone: x3748

Giri Narasimhan. CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Evaluation. Course Homepage.

Open a Word document to record answers to any italicized questions. You will the final document to me at

Introduction to Evolutionary Concepts

Basic Local Alignment Search Tool

Molecular Modeling. Prediction of Protein 3D Structure from Sequence. Vimalkumar Velayudhan. May 21, 2007

MiGA: The Microbial Genome Atlas

EECS730: Introduction to Bioinformatics

RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

Sequencing alignment Ameer Effat M. Elfarash

Scoring Matrices. Shifra Ben-Dor Irit Orr

Comparative genomics: Overview & Tools + MUMmer algorithm

Phylogenetics - Orthology, phylogenetic experimental design and phylogeny reconstruction. Lesser Tenrec (Echinops telfairi)

Example of Function Prediction

Hidden Markov Models in computational biology. Ron Elber Computer Science Cornell

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Detailed overview of the primer-free full-length SSU rrna library preparation.

Bioinformatics for Biologists

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Sequencing alignment Ameer Effat M. Elfarash

DNA and protein databases. EMBL/GenBank/DDBJ database of nucleic acids

Protein Interaction Mapping: Use of Osprey to map Survival of Motor Neuron Protein interactions

Comparative Network Analysis

Grundlagen der Bioinformatik, SS 08, D. Huson, May 2,

Bioinformatics and BLAST

CISC 636 Computational Biology & Bioinformatics (Fall 2016)

Protein function prediction based on sequence analysis

Name: Class: Date: ID: A

Lecture 2. The Blast2GO annotation framework

Bioinformatics Exercises

K-means-based Feature Learning for Protein Sequence Classification

Multiple sequence alignment

Biological Networks: Comparison, Conservation, and Evolution via Relative Description Length By: Tamir Tuller & Benny Chor

Motifs, Profiles and Domains. Michael Tress Protein Design Group Centro Nacional de Biotecnología, CSIC

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Sequence and Structure Alignment Z. Luthey-Schulten, UIUC Pittsburgh, 2006 VMD 1.8.5

Jeremy Chang Identifying protein protein interactions with statistical coupling analysis

Tools and Algorithms in Bioinformatics

Practical search strategies

Effective Cluster-Based Seed Design for Cross-Species Sequence Comparisons

Sara C. Madeira. Universidade da Beira Interior. (Thanks to Ana Teresa Freitas, IST for useful resources on this subject)

Sequence Analysis and Databases 2: Sequences and Multiple Alignments

Comparative Bioinformatics Midterm II Fall 2004

Robert Edgar. Independent scientist

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Chapter 15 Active Reading Guide Regulation of Gene Expression

Large-Scale Genomic Surveys

Protein Structure Prediction and Display

Computational methods for predicting protein-protein interactions

EECS730: Introduction to Bioinformatics

USING BLAST TO IDENTIFY PROTEINS THAT ARE EVOLUTIONARILY RELATED ACROSS SPECIES

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

objective functions...

Sequence analysis and comparison

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22)

Homology. and. Information Gathering and Domain Annotation for Proteins

String Matching Problem

SUPPLEMENTARY INFORMATION

Chapter 17. Table of Contents. Objectives. Taxonomy. Classifying Organisms. Section 1 Biodiversity. Section 2 Systematics

Alignment & BLAST. By: Hadi Mozafari KUMS

Introduction to Bioinformatics Online Course: IBT

We have: We will: Assembled six genomes Made predictions of most likely gene locations. Add a layers of biological meaning to the sequences

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CSCE555 Bioinformatics. Protein Function Annotation

RGP finder: prediction of Genomic Islands

Alignment principles and homology searching using (PSI-)BLAST. Jaap Heringa Centre for Integrative Bioinformatics VU (IBIVU)

How much non-coding DNA do eukaryotes require?

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

Transcription:

NCBI BLAST Services DELTA-BLAST BLAST (http://blast.ncbi.nlm.nih.gov/), Basic Local Alignment Search tool, is a suite of programs for finding similarities between biological sequences. DELTA-BLAST is a new, sensitive algorithm, intended for detection of distant protein homologs. Like BLAST, DELTA is an acronym and it stands for Domain Enhanced Lookup Time Accelerated. Outline: - Compare DELTA-BLAST searches with blastp. - Search for homologs for a human RAS oncogene family. - Find which of the human RAS oncogene family are conserved throughout the Eukaryotes. - Use the longest protein isoforms of these genes to search for homologs in Bacteria. - Search against a few bacterial genera in the RefSeq protein database. Step 1: Homologene search http://www.ncbi.nlm.nih.gov/homologene/ Search for clusters with the following requirements: - conserved in Eukaryotes. - proteins in the cluster have identifiable conserved domains. - human gene should be included in the cluster. The search term for such requirements is: "txid2759"[ancestor:noexp] AND "homologene cdd"[filter] AND human[organism] Step 2: The Gene database search From the Homologene search result page, link to the Gene database. 1

Select the human records. Note the search term in the Gene search box, once the page reloads: Designation #10 comes from the Homologene link. It will be different each time. The "Homo sapiens"[porgn: txid9606] term has been added from selecting the human organism from the taxonomy sorter. This is also reflected in the search details on the right side of the screen. 2

Add "ras oncogene"[gene/protein Name] to the search term in the query box and run the search. The result of the search is 17 human Gene records for genes that have "ras oncogene" in their name. According to our selection, they all have identified conserved domains and they are conserved throughout Eukaryotic kingdom. In find related data, select protein database. Link to the Reference Sequence proteins for these genes. Step 3: Protein records and protein alignment After selecting find items. Some of the Genes have more than one transcript. These splice variants result in different protein isoforms. You can select the longest isoforms of each gene by simply looking at the reported protein length on the result summary page. (You can send your section to clipboard and save the items in My NCBI as a collection). Check those below: NP_004153.2 (215 aa), NP_005361.2 (207 aa), NP_006852.1 (201 aa), NP_116235.2 (216 aa), NP_057661.3 (208 aa) Select the link to the COBALT tool to see their multiple alignments. 3

Result of their alignment, Select NP_116235 protein. Step 4: Conserved domain and PSSM Go to the protein record of Rab-2B and select the link to Conserved Domains to see its corresponding conserved domain. The domain is Rab2 and it belongs to the P-loop_NTPase superfamily, link to the domain. You can see active GTP/Mg2+ binding site on the aligned sequences if you scroll down the record. You can also view the PSSM for the CD (link indicated). A PSSM, or Position-Specific Scoring Matrix, is a type of scoring matrix used in protein BLAST searches in which amino acid substitution scores are given separately for each position in a protein multiple sequence alignment. 4

From the alignment, you can see the three conserved domains. Cysteine at position 19 has score 10. 5

Step 5: BLAST searches Select the link to the BLAST page from the protein record. The link leads to the blastp page with the query accession added. Change the database to search to the Reference proteins. The query for the Rab2 isoform NP_116235.2. Add all the information, as you can see in the picture below. 6

Increase the Max target sequences to 500. 7

BLASTp Output for RAB2: Alignment result. The blastp alignment covers only one of the active site. 8

DELTA-BLAST for RAB2: Change the job title, replace blastp by delta. Select Delta-Blast for algorithm. Increase the Max target sequences to 5000. 9

The result: For DELTA-BLAST, all three active sites are in the alignment. 10