The wonderful world of RNA informatics

December 9, 2012

Course Goals
Familiarize you with the challenges involved in RNA informatics.
Introduce commonly used tools, and provide an intuition for how they work.
Give you the background and confidence to find, understand, and apply methods in your own work.

Introduction
1. Introduction to RNA
   Non-coding RNA and disease
   Bacterial ncRNAs
   RNA structure, base-pairing
   Drawing RNA structure
2. RNA structure prediction
   Single sequence: Nussinov, Zuker & McCaskill
   Comparative analysis: alignment folding & Sankoff
   Comparison of comparative analysis algorithms
3. Homology-search methods
   Sequence-based methods
   Profile-based methods
   Gene-finding
   Family specific methods
4. RNA Family Practical
Extra material for the course can be found here: http://sites.google.com/site/rnainformatics/

RNA: why is this stuff interesting?
The RNA world was an essential step towards modern protein-DNA based life (under current reasonable models). Which came first, DNA or protein?
RNA has catalytic potential (like protein) and carries hereditary information (like DNA).
Many RNAs are involved in essential cellular processes, e.g. translation, splicing and regulation of protein expression.
2/3 of the ribosome is RNA. Ribosomal function is preserved even after amino-acid residues are deleted from the active site!
Current estimates indicate that the number of ncRNA genes is comparable to the number of protein-coding genes.

RNA and human disease (I)
Prader-Willi syndrome: mapped to the C/D box snoRNA SNORD116 (HBII-85)
[Figure 1: Ideogram of chromosome 15, showing genes located in the typical deletion region of Prader-Willi syndrome. The locations of genes in this region, 15q11-q13, and their imprinting statuses are shown. The gene order is based on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu). Approximately 40% of subjects with the typical deletion have the type I deletion, and approximately 60% have the type II deletion. Abbreviations: Cen, centromere; Tel, telomere; BP, breakpoint; IC, imprinting centre; snoRNA, small nucleolar RNA. Source: Expert Reviews in Molecular Medicine, © 2005 Cambridge University Press, http://www.expertreviews.org/]
[Figure: RNA structure diagram coloured by sequence conservation, scale 0 to 1]
Sahoo et al. (2008) Prader-Willi phenotype caused by paternal deficiency for the HBII-85 C/D box small nucleolar RNA cluster. Nat Genet.

RNA and human disease (II)
miR-96 and deafness
[Figure: RNA structure diagram coloured by sequence conservation, scale 0 to 1]
Lewis et al. (2009) An ENU-induced mutation of miR-96 associated with progressive hearing loss in mice. Nat Genet.

Bacterial RNA: sRNAs. Vogel (2008) A rough guide to the non-coding RNA world of Salmonella.

Riboswitches - expression platforms Nudler & Mironov (2004) The riboswitch control of bacterial metabolism. Trends Biochem Sci.

Riboswitches - distribution Barrick & Breaker (2007) The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biol.

Bacterial RNA: tmRNA. Source: Wikipedia user Czwieb.

Nucleic acid chemistry
[Figure: nucleotide structures with substituents R1 and R2, and a table of IUPAC ambiguity characters]
       RNA   DNA
R1:    OH    H
R2:    H     CH3

RNA: structure
A. Primary Structure
5' GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYA.CUGGAGGUCCUGUGT.CGAUCCACAGAAUUCGCACCA 3'
B. Secondary Structure
[Figure: secondary structure with the Acceptor Stem, D Loop, Anticodon Loop, Variable Loop and TΨC Loop labelled]
C. Tertiary Structure
[Figure: tertiary structure with the Acceptor Stem, D Loop, Anticodon Loop and TΨC Loop labelled]

RNA: base-pairing
Central dogma of structural biology: sequence determines structure determines function.
Canonical (Watson-Crick) base-pairs: C-G, A-U.
Non-canonical (wobble) base-pair: G-U.
Note: other non-canonical base-pairs do occur, but these are rare and are generally treated as tertiary interactions.
Images lifted from: http://en.wikipedia.org/wiki/Base_pair

RNA: base-pairing Yang et al. (2003) Tools for the automatic identification and classification of RNA base pairs, NAR.

RNA: base-pairing
bpc     C:G     U:A     U:G     G:A     C:A     U:C     A:A     C:C     G:G     U:U     Total
WC      49.8%   14.4%   0.01%   1.2%    0.1%    0.5%    -       -       -       -       66.1%
Wb      0.06%   0.06%   7.1%    -       0.2%    -       0.3%    0.5%    0.2%    0.9%    9.6%
Other   0.8%    5.8%    1.5%    9.4%    2.3%    0.6%    2.6%    0.5%    0.7%    0.3%    24.3%
Total   50.7%   20.3%   8.7%    10.6%   2.6%    1.0%    2.9%    1.0%    0.9%    1.3%    100.0%
(WC: Watson-Crick; Wb: wobble)
Just 71.3% of rRNA contacts are canonical or G:U wobble!
Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs. J Mol Biol.
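
As a quick check of the 71.3% figure, the three relevant cells of the table (Watson-Crick C:G, Watson-Crick U:A and wobble U:G) can simply be added up. The dictionary name below is illustrative only; the numbers are copied from the table.

```python
# Percentages copied from the Lee & Gutell (2004) table.
# Keys are (conformation, base-pair); the layout is just for illustration.
canonical_or_wobble = {
    ("WC", "C:G"): 49.8,
    ("WC", "U:A"): 14.4,
    ("Wb", "U:G"): 7.1,
}

# Round to one decimal place to match the precision of the table.
total = round(sum(canonical_or_wobble.values()), 1)
print(f"{total}% of rRNA contacts are canonical or G:U wobble")
```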

RNA stacking Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome Nature. Image lifted from: http://rna.ucsc.edu/pdbrestraints/index.html

Alanine tRNA. Holley, Apgar, Everett, Madison, Marquisee, Merrill, Penswick & Zamir (1965) Structure of a ribonucleic acid. Science.

Tyrosine tRNA. Madison, Everett & Kung (1966) Nucleotide Sequence of a Yeast Tyrosine Transfer RNA. Science.

Exercise 1
Split into groups of at most 3 and fold one of the following sequences by hand (use nothing but a pencil, ruler and compass):
http://sites.google.com/site/rnainformatics/rna-folding-exercises/exercise-1
>A1
AAAAAAGGCGACAGAGUAAUCUGUCGCCUUUUUUCUUUGCUUGC
>A2
AAGAAAAACGGGUCGCCAGAAGGUGACCCGUUUUUUUUAUUCUUUUA
>A3
AAAAAAGCCCGCACCUGACAGUGCGGGCUUUUUUUUUC
>A4
AAAGCCCGUGAGUAUUCACGGGCUUUUUUAUUAUUUAAU
>B1
UGGGAGGGACGGCCCUCCUAUCCACCAGCAUAUCAGCCGCGGGGACGACCCUG
>B2
GCCCGGGGACGGCCCCGGGCCGUUCGCUUCAACGGGGACGACCCC
>B3
CCUCGGGGACGACCUCGAGGCCUCCUGAUACGCAGGGACGACCCUG
>B4
GAAGCGGGACGACCCGUUUUCCUUCUUUCAUUGCGCGGGGACGACCCUG
>C1
CCAGCCGCUGACGACGGGGCUGGACUUGCUGGGAGCGCCGCCUUUCGGCGCUUCCGUACCCAUGUUGCUUCAAGGAGGAUAUGGCUAUGGCAA
>C2
GCCGAUGCCAAUUGGGUCGGCAUGGUCAGGGAGCGCCACGCUUCUUGGCGCUUCCUCGUAUCUAUGUUGCUCUACGGAGGAUGUAGCUAUGAGAA
>C3
AGAGCCGCCUGUAAGGGGCUCGCAGUCGAGGAGCUCCGUUCUCUUCGGCGCUCCUCAUCGUCCAUGUUGCUCAAGGAGGAUAUGGCUAUGAGAA
>C4
UCGGUCGCCGCAUAAGGGGCCGAUGUGUCAGGGAGCGCCAUGCUUCUUGGCGUUCCCUCGUAUCUAUGUUGCUCCAAGGAGGAUGUAGUUAUGAGAA

RNA: structure
RNA secondary structure graphs satisfy the following restraints upon the corresponding adjacency matrix $A_{n \times n}$.

    G G G A A A C C C
G   1 0 0 0 0 0 0 0 1
G     1 0 0 0 0 0 1 0
G       1 0 0 0 1 0 0
A         1 0 0 0 0 0
A           1 0 0 0 0
A             1 0 0 0
C               1 0 0
C                 1 0
C                   1

Sugar-phosphate backbone: $a_{i,i+1} = 1$.
Base-pairs are unique: for any $i$ there is at most one $k$ ($k \neq i \pm 1$) satisfying $a_{i,k} = 1$.
Minimal hairpin loop size: for any $a_{i,k} = 1$ ($k \neq i \pm 1$), $i$ and $k$ satisfy $|k - i| > 3$.
No pseudo-knot criterion: for any $a_{i,j} = a_{k,l} = 1$ ($i < j$, $k < l$) with $i < k < j$, we have $k < l < j$.
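
The three base-pair restraints (uniqueness, minimal hairpin loop size, no pseudo-knots) can be checked mechanically. The following is a minimal sketch, assuming the structure is given as a list of 1-based `(i, j)` base-pairs rather than as a full adjacency matrix; the function name `is_valid_structure` and the parameter `min_sep` are illustrative and not part of the course material.

```python
def is_valid_structure(n, pairs, min_sep=3):
    """Check a list of 1-based base-pairs (i, j) on a sequence of length n.

    min_sep is an assumed parameter: a pair (i, j) must satisfy j - i > min_sep.
    """
    paired = set()
    normalised = []
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if not (1 <= i < j <= n):
            return False            # position outside the sequence
        if j - i <= min_sep:
            return False            # hairpin loop too small
        if i in paired or j in paired:
            return False            # a base may pair with at most one other base
        paired.update((i, j))
        normalised.append((i, j))
    # No pseudo-knots: if i < k < j then the partner l of k must satisfy k < l < j.
    for i, j in normalised:
        for k, l in normalised:
            if i < k < j and not (k < l < j):
                return False
    return True


print(is_valid_structure(9, [(1, 9), (2, 8), (3, 7)]))  # the GGGAAACCC example
print(is_valid_structure(9, [(1, 6), (3, 9)]))          # crossing pairs
```

The pairwise loop is quadratic in the number of base-pairs, which is fine for hand-sized examples; a stack-based scan would do the same check in linear time.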

RNA: representations

From a matrix to an image

    G G G A A A C C C
G   1 0 0 0 0 0 0 0 1
G     1 0 0 0 0 0 1 0
G       1 0 0 0 1 0 0
A         1 0 0 0 0 0
A           1 0 0 0 0
A             1 0 0 0
C               1 0 0
C                 1 0
C                   1

GGGAAACCC
(((...)))
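
The step from base-pairs to the bracket string can be sketched in a few lines. This assumes 1-based `(i, j)` pairs with `i < j` and no pseudo-knots; the function names are illustrative.

```python
def pairs_to_dotbracket(n, pairs):
    """Render 1-based base-pairs (i, j), i < j, as a dot-bracket string of length n."""
    chars = ["."] * n
    for i, j in pairs:
        chars[i - 1] = "("
        chars[j - 1] = ")"
    return "".join(chars)


def dotbracket_to_pairs(structure):
    """Recover the sorted list of 1-based base-pairs from a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


print(pairs_to_dotbracket(9, [(1, 9), (2, 8), (3, 7)]))
print(dotbracket_to_pairs("(((...)))"))
```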

RNA: number of structures
$A_N$ is the number of possible sequences of length $N$: $A_N = 4^N$.
$S_N$ is the number of possible secondary structures of length $N$:
$S_0 = S_1 = 1$
$S_{N+1} = S_N + \sum_{j=1}^{N} S_{j-1} S_{N-j+1}$
$S_N \sim 1.8^N$
Hofacker et al. (1998) Combinatorics of RNA Secondary Structures, Discrete Applied Mathematics.
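
Counting structures is a small dynamic-programming exercise. The sketch below assumes the standard decomposition (the last base is either unpaired, or paired with some base j that splits the sequence into an outside and an inside part) together with a minimum hairpin loop parameter `min_loop`. Both the function name and the parameter are illustrative, and the index conventions may differ from the recursion on the slide; the exact growth rate depends on which constraints are enforced.

```python
def count_structures(n, min_loop=3):
    """Return [S_0, ..., S_n], the number of secondary structures per length.

    Assumption: a base-pair must enclose at least min_loop unpaired bases.
    """
    S = [1] * (n + 1)  # the empty structure always counts
    for length in range(1, n + 1):
        total = S[length - 1]  # last base unpaired
        # Last base pairs with base j (1-based); j - 1 bases lie outside the
        # pair and length - 1 - j bases lie inside it.
        for j in range(1, length - min_loop):
            total += S[j - 1] * S[length - 1 - j]
        S[length] = total
    return S


print(count_structures(10))
```

Setting `min_loop=1` gives the looser model in which a hairpin needs only one unpaired base, and the counts grow noticeably faster.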

RNA: representations
The Tinoco plot
[Figure: Tinoco plot of a tRNA sequence plotted against itself (file: trna_25748, type: RNA, helix length: 4), with A:U, G:C and G:U pairs marked]
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
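
A dot-plot of this kind can be sketched as a boolean matrix that marks every pair of positions whose bases could pair (A:U, G:C or G:U), with candidate helices read off as runs along the anti-diagonals. The function names, the 0-based indexing and the `min_len` / `min_loop` parameters below are illustrative assumptions, not the tool that produced the figure.

```python
# Bases that may pair: Watson-Crick plus the G:U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def dot_plot(seq):
    """plot[i][j] is True when seq[i] and seq[j] can pair (0-based indices)."""
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def helices(seq, min_len=4, min_loop=3):
    """List (i, j, length) for runs of stacked pairs (i, j), (i+1, j-1), ...

    Only runs of at least min_len pairs are reported, and a pair (a, b)
    is allowed only if b - a > min_loop.
    """
    plot = dot_plot(seq)
    n = len(seq)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            if not plot[i][j]:
                continue
            # Skip pairs that merely continue a run starting further out.
            if i > 0 and j < n - 1 and plot[i - 1][j + 1]:
                continue
            length = 0
            while (j - length) - (i + length) > min_loop and plot[i + length][j - length]:
                length += 1
            if length >= min_len:
                found.append((i, j, length))
    return found


print(helices("GGGAAACCC", min_len=3))
```

A dot-plot only lists candidate helices; choosing a mutually compatible subset is the folding problem itself.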

Exercise 2
Split into groups of at most 3 and build a dot-plot for one of the following sequences:
http://sites.google.com/site/rnainformatics/rna-folding-exercises/exercise-2
>A1
AAAAAAGGCGACAGAGUAAUCUGUCGCCUUUUUUCUUUGCUUGC
>A2
AAGAAAAACGGGUCGCCAGAAGGUGACCCGUUUUUUUUAUUCUUUUA
>A3
AAAAAAGCCCGCACCUGACAGUGCGGGCUUUUUUUUUC
>A4
AAAGCCCGUGAGUAUUCACGGGCUUUUUUAUUAUUUAAU
>B1
UGGGAGGGACGGCCCUCCUAUCCACCAGCAUAUCAGCCGCGGGGACGACCCUG
>B2
GCCCGGGGACGGCCCCGGGCCGUUCGCUUCAACGGGGACGACCCC
>B3
CCUCGGGGACGACCUCGAGGCCUCCUGAUACGCAGGGACGACCCUG
>B4
GAAGCGGGACGACCCGUUUUCCUUCUUUCAUUGCGCGGGGACGACCCUG
>C1
CCAGCCGCUGACGACGGGGCUGGACUUGCUGGGAGCGCCGCCUUUCGGCGCUUCCGUACCCAUGUUGCUUCAAGGAGGAUAUGGCUAUGGCAA
>C2
GCCGAUGCCAAUUGGGUCGGCAUGGUCAGGGAGCGCCACGCUUCUUGGCGCUUCCUCGUAUCUAUGUUGCUCUACGGAGGAUGUAGCUAUGAGAA
>C3
AGAGCCGCCUGUAAGGGGCUCGCAGUCGAGGAGCUCCGUUCUCUUCGGCGCUCCUCAUCGUCCAUGUUGCUCAAGGAGGAUAUGGCUAUGAGAA
>C4
UCGGUCGCCGCAUAAGGGGCCGAUGUGUCAGGGAGCGCCAUGCUUCUUGGCGUUCCCUCGUAUCUAUGUUGCUCCAAGGAGGAUGUAGUUAUGAGAA

The end of section one! CC-licensed image from Flickr user cliff1066: North Church, Portsmouth, NH