RELATIONSHIPS BETWEEN GENES/PROTEINS HOMOLOGUES

Molecular Biology-2018

Definitions:

Heterologues: Genes or proteins that possess different sequences and activities.

Homologues: Genes or proteins that share a threshold level of similarity, as determined by aligning matching bases or amino acids. Specifically, nucleotide sequences whose percent similarity is 70% or greater are termed homologous, whereas amino acid sequences are termed homologous when their percent similarity is 25% or greater.

Similarity is a quantitative term that defines the degree of sequence match between two compared sequences. For example, two aligned homologous genes or sequence segments may show varying degrees of similarity, based on the number of identical bases in the alignment. In alignment A below, the sequences are identical and therefore show 39 matches out of 39 aligned positions, or 100% similarity. In alignment B, the aligned sequences show 28 matches out of 39 positions, so the degree of similarity is 28/39, or 72%. In both cases the sequences are homologous.

A   atgcctgaaggcctattgtttcccagtcgattggctgct...   39 of 39 matches
    atgcctgaaggcctattgtttcccagtcgattggctgct...

B   atgcctgaaggcctattgtttcccagtcgattggctgct...   28 of 39 matches
    atgcctcggcttatattgtatcccagtccattggcagcg...

Analogues: Genes or proteins that display the same activity but lack sufficient similarity to be homologues (less than 70% for nucleotide sequences or less than 25% for protein sequences).

Paralogs: Homologous genes or proteins produced by gene duplication are termed paralogous. Because gene duplication occurs within a single organism or species, paralogs are sequences that share a high degree of similarity within the same species. They may have similar or different activities.

Orthologs: After a speciation event, one copy of a homologue is inherited by one species and the other copy by the other species. Each copy then diverges within its own species. Consequently, orthologs are genes or proteins that share a high degree of similarity between different species.
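The 28/39 calculation for alignment B can be reproduced with a few lines of Python. This is only a minimal sketch: the function name percent_similarity is our own, and it assumes the two sequences are already aligned, have the same length and contain no gaps.

```python
def percent_similarity(seq_a: str, seq_b: str) -> tuple[int, int, float]:
    """Count identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A position counts as a match only when both sequences carry the same base.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, len(seq_a), 100.0 * matches / len(seq_a)


# The two sequences of alignment B from the definitions above.
top = "atgcctgaaggcctattgtttcccagtcgattggctgct"
bottom = "atgcctcggcttatattgtatcccagtccattggcagcg"

matches, length, pct = percent_similarity(top, bottom)
print(f"{matches} of {length} matches = {pct:.0f}%")  # 28 of 39 matches = 72%
```

Real alignments usually contain gaps, and programs differ in how they count gapped positions, so a value computed this way may not match the one reported by an alignment program.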

FINDING HOMOLOGS STARTING FROM A NUCLEOTIDE SEQUENCE

1. For this exercise you will use the sequence with the mRNA accession number NM_000558. Obtain the accession number, source organism and FASTA sequence.
2. From the nucleotide record, click on the link "Run Blast" under the heading "analyze this sequence" on the right side of the page. This should bring you to the BLAST search page.
   [Screenshot: BLAST search page, with the options to select outlined in red boxes]
3. Choose the options indicated by the red boxes in the screenshot. Then click on "algorithm parameters", at the bottom of the page, to display more options.
4. Change the following parameters: set "Max target sequences" to 1000 and "Expect threshold" to 100. Click on "Blast" to start the search.

5. Once you have obtained the BLAST results, click on "Taxonomy reports" to display the different organisms in which sequence similarities were found.
   [Screenshot: BLAST results page]
6. A new page will appear. Find the entry for Bos taurus (domestic cow). Note the number of hits and click on it to list those records.
7. Obtain the first nucleotide record for hemoglobin, alpha 2 (HBA) mRNA. Obtain the accession number, source organism and FASTA sequence.
8. Use the same approach to obtain the first nucleotide record for hemoglobin, alpha 1 mRNA from Gallus gallus (chicken).
9. Use the approach we saw in the first bioinformatics exercise to obtain the nucleotide record for human fetal hemoglobin (hemoglobin subunit gamma 1). Obtain the accession number, source organism and FASTA sequence.
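The steps above are done through the NCBI web pages, but the FASTA records can also be handled in a script. The sketch below is an optional illustration, not part of the exercise: it builds a download address for the NCBI E-utilities efetch service and splits a FASTA record into its accession, description line and sequence. The helper names efetch_fasta_url and parse_fasta are our own, and the example record is hypothetical, made up only to show the layout. In NCBI FASTA records the source organism normally appears in the description line.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build the address that returns one record in FASTA format.

    Use db="nuccore" for nucleotide records and db="protein" for proteins.
    """
    query = urlencode(
        {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def parse_fasta(text: str) -> tuple[str, str, str]:
    """Return (accession, description, sequence) for the first FASTA record."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("text does not start with a FASTA header line")
    # The header is ">accession description"; the rest is sequence.
    accession, _, description = lines[0][1:].partition(" ")
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):  # stop at the next record, if any
            break
        seq_lines.append(ln)
    return accession, description, "".join(seq_lines)


# Hypothetical record, made up to show the layout (not a real database entry).
example = """>EXAMPLE_0001.1 Genus species example gene, mRNA
ATGGTGCTGTCT
CCTGCCGACAAG
"""

print(efetch_fasta_url("NM_000558"))
print(parse_fasta(example))
```

Opening the printed address in a browser should return the same FASTA text that the record page shows. The sketch does not download anything by itself.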

DETERMINING THE LEVEL OF SIMILARITY AT THE NUCLEOTIDE LEVEL

1. You should have saved four nucleotide sequences: two from humans, one from the domestic cow and one from the chicken. To determine the level of similarity at the nucleotide level, we will use the program Clustal Omega to perform a sequence alignment. Copy and paste each of the nucleotide sequences in FASTA format into the query box. Make sure to choose the option DNA.
2. Click Submit to view the alignment.
3. On the menu at the top of the page, click on "results summary" and then on "percent identity matrix" to obtain the percentage of identity between the different sequences. An example of this output is shown here:

   Percent Identity Matrix - created by Clustal2.1

     1: gi 302408715   100.00   18.58   64.66
     2: gi 189202936    18.58  100.00   62.75
     3: gi 185698558    64.66   62.75  100.00

4. These results are pairwise comparisons between the different sequences. From this file, obtain the percentage identity between each of the following pairs: human-alpha to human-gamma, human-alpha to cow-alpha, and human-alpha to chicken-alpha.
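Reading a pairwise value from the matrix means finding the row of one sequence and the column of the other. The sketch below does this for the example matrix shown in step 3. The function names are our own, and the parser assumes each data row has the form "number: label value value ...", with as many values as there are rows. A real Clustal Omega file may need small adjustments.

```python
import re


def parse_identity_matrix(text: str) -> tuple[list[str], list[list[float]]]:
    """Return (labels, matrix) from the text of a percent identity matrix."""
    rows = []
    for line in text.splitlines():
        # Data rows start with an index such as "1:"; other lines are skipped.
        m = re.match(r"\s*\d+:\s*(.+)", line)
        if m:
            rows.append(m.group(1).split())
    n = len(rows)
    # The last n fields of each row are the values; the rest is the label.
    labels = [" ".join(fields[:-n]) for fields in rows]
    matrix = [[float(v) for v in fields[-n:]] for fields in rows]
    return labels, matrix


def identity(labels: list[str], matrix: list[list[float]], a: str, b: str) -> float:
    """Look up the percent identity between the sequences labelled a and b."""
    return matrix[labels.index(a)][labels.index(b)]


# The example matrix from step 3.
example = """Percent Identity Matrix - created by Clustal2.1

  1: gi 302408715   100.00   18.58   64.66
  2: gi 189202936    18.58  100.00   62.75
  3: gi 185698558    64.66   62.75  100.00
"""

labels, matrix = parse_identity_matrix(example)
print(identity(labels, matrix, "gi 302408715", "gi 189202936"))  # 18.58
```

The matrix is symmetric, so the value for a pair is the same whichever sequence is taken as the row.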

DETERMINING THE LEVEL OF SIMILARITY AT THE PROTEIN LEVEL

1. From each of the nucleotide records obtained above, obtain the corresponding protein record and its FASTA sequence.
2. Repeat the alignment in Clustal Omega with the protein sequences. Make sure to choose the option protein this time.
3. You will notice that the alignment is displayed somewhat differently this time. The symbols under the alignment are interpreted as follows:
   "*" means that the amino acids are identical.
   ":" means that a conserved substitution is observed: a different amino acid that shares the same charge and shape.
   "." means that a semi-conserved substitution is observed: a different amino acid that shares either the same charge or the same shape.
4. Obtain the percentage identity between each of the following protein pairs: human-alpha to human-gamma, human-alpha to cow-alpha, and human-alpha to chicken-alpha.
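The symbols in step 3 can be tallied to get a quick picture of how conserved an alignment is. The sketch below counts each symbol in one consensus line. The function name and the example line are hypothetical and do not come from a real alignment; a blank position is taken to mean that none of the three symbols applies.

```python
from collections import Counter

# Meaning of each consensus symbol, following the list in step 3.
SYMBOLS = {
    "*": "identical",
    ":": "conserved",
    ".": "semi-conserved",
    " ": "no match",
}


def summarize_consensus(consensus: str) -> dict[str, int]:
    """Count how many alignment columns fall into each category."""
    counts = Counter(consensus)
    return {name: counts.get(symbol, 0) for symbol, name in SYMBOLS.items()}


# Hypothetical consensus line covering eight alignment columns.
example = "**:* .**"
print(summarize_consensus(example))
```

Dividing the "identical" count by the number of columns gives a rough identity figure for that stretch of the alignment, although the percent identity matrix remains the value to report.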

FINDING PROTEIN HOMOLOGS STARTING FROM A PROTEIN SEQUENCE

1. For this exercise you will use the sequence with the protein accession number AAA82165. Obtain the corresponding source organism and FASTA sequence.
2. From the protein record page, click on the link "Run Blast" under the heading "analyze this sequence" on the right side of the page.
3. Using the same parameters as in the previous exercise, use Blastp to find and obtain the FASTA protein sequences for each of the following organisms:
   Bos taurus (domestic cow)
   Mus musculus (mouse)
4. Use the approach we saw in the first bioinformatics exercise to obtain the protein record for the alcohol dehydrogenase of Saccharomyces cerevisiae (yeast; hint: it is classified as a fungus).
5. As you did previously, use Clustal Omega to determine the percentage identity between the following pairs of proteins: human to cow, human to mouse, and human to yeast.

FINDING NUCLEOTIDE SEQUENCES WHICH CODE FOR PROTEINS WITH SIMILAR FUNCTIONS STARTING FROM A PROTEIN SEQUENCE

1. For this exercise you will use the sequence with the protein accession number AAA82165. Obtain the corresponding source organism and FASTA sequence.
2. From the protein record page, click on the link "Run Blast" under the heading "analyze this sequence" on the right side of the page.
3. This time, choose the option tblastn among the different BLAST options.
4. Using the same algorithm parameters as before, find and obtain the FASTA nucleotide sequence for ADH 7 of Myotis ricketti (a bat), as well as the FASTA protein sequence.
5. Obtain the FASTA nucleotide sequence and FASTA protein sequence from the record with the accession number U09623.
6. As you did previously, use Clustal Omega to determine the percentage identity between the nucleotide sequences and between the protein sequences, then answer the following questions:
   a. What type of homologues are the nucleotide sequences?
   b. What type of homologues are the protein sequences?
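When answering these questions, it can help to write the definitions from the first page as an explicit decision rule. The sketch below is only an illustration of those working definitions (70% for nucleotide sequences, 25% for protein sequences, same species versus different species). The function name, the wording of the returned labels and the numbers in the example are our own and are not results from the exercise.

```python
# Similarity thresholds from the definitions at the start of the handout.
THRESHOLDS = {"nucleotide": 70.0, "protein": 25.0}


def classify_pair(percent: float, kind: str, species_a: str, species_b: str) -> str:
    """Classify a sequence pair using the handout's working definitions."""
    if kind not in THRESHOLDS:
        raise ValueError("kind must be 'nucleotide' or 'protein'")
    if percent < THRESHOLDS[kind]:
        # Below the threshold the pair is not homologous; it is analogous
        # only if the two genes or proteins display the same activity.
        return "not homologous"
    if species_a == species_b:
        return "paralogs"  # similar sequences within the same species
    return "orthologs"  # similar sequences in different species


# Hypothetical values, for illustration only.
print(classify_pair(85.0, "nucleotide", "Homo sapiens", "Bos taurus"))  # orthologs
print(classify_pair(40.0, "protein", "Homo sapiens", "Homo sapiens"))   # paralogs
print(classify_pair(50.0, "nucleotide", "Homo sapiens", "Gallus gallus"))  # not homologous
```

Note that the same pair of genes can fall on different sides of the thresholds at the nucleotide and protein levels, which is why the two questions are asked separately.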