December 9, 2012
Course Goals
Familiarize you with the challenges involved in RNA informatics.
Introduce commonly used tools, and provide an intuition for how they work.
Give you the background and confidence to find, understand, and apply methods in your own work.
Introduction
1. Introduction to RNA
   Non-coding RNA and disease
   Bacterial ncRNAs
   RNA structure, basepairing
   Drawing RNA structure
2. RNA structure prediction:
   Single sequence: Nussinov, Zuker & McCaskill
   Comparative analysis: alignment folding & Sankoff
   Comparison of comparative analysis algorithms
3. Homology-search methods
   Sequence-based methods
   Profile-based methods
   Gene-finding
   Family specific methods
4. RNA Family Practical
5. Extra material for the course can be found here: http://sites.google.com/site/rnainformatics/
RNA: why is this stuff interesting?
The RNA world was an essential step towards modern protein-DNA based life (under current reasonable models). Which came first, DNA or protein? RNA has catalytic potential (like protein) and carries hereditary information (like DNA).
Many RNAs are involved in essential cellular processes, e.g. translation, splicing and regulation of protein expression.
2/3 of the ribosome is RNA. Ribosomal function is preserved even after amino-acid residues are deleted from the active site!
Current estimates indicate that the number of ncRNA genes is comparable to the number of protein-coding genes.
RNA and human disease (I)
Prader-Willi syndrome: mapped to the C/D box snoRNA SNORD116 (HBII-85).
[Figure 1: Ideogram of chromosome 15, showing genes located in the typical deletion region of Prader-Willi syndrome. The locations of genes in this region, 15q11-q13, and their imprinting statuses are shown. The gene order is based on the UCSC Genome Bioinformatics website (http://genome.ucsc.edu). Approximately 40% of subjects with the typical deletion have the type I deletion, and approximately 60% have the type II deletion. Abbreviations: Cen, centromere; Tel, telomere; BP, breakpoint; IC, imprinting centre; snoRNA, small nucleolar RNA. Source: Expert Reviews in Molecular Medicine, © 2005 Cambridge University Press, http://www.expertreviews.org/]
[Figure: RNA structure diagram with a sequence conservation scale from 0 to 1.]
Sahoo et al. (2008) Prader-Willi phenotype caused by paternal deficiency for the HBII-85 C/D box small nucleolar RNA cluster. Nat Genet.
RNA and human disease (II)
mir-96 and deafness
[Figure: RNA structure diagram with a sequence conservation scale from 0 to 1.]
Lewis et al. (2009) An ENU-induced mutation of mir-96 associated with progressive hearing loss in mice. Nat Genet.
Bacterial RNA: sRNAs
Vogel (2008) A rough guide to the non-coding RNA world of Salmonella.
Riboswitches - expression platforms Nudler & Mironov (2004) The riboswitch control of bacterial metabolism. Trends Biochem Sci.
Riboswitches - distribution Barrick & Breaker (2007) The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biol.
Bacterial RNA: tmRNA
Source: Wikipedia user Czwieb.
Nucleic acid chemistry
[Figure: nucleotide structures with substituent positions R1 and R2.]
        RNA    DNA
R1:     OH     H
R2:     H      CH3
IUPAC ambiguity chars: [table not preserved]
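The ambiguity characters matter later in the course because consensus sequences are written with them. As a small sketch, the standard IUPAC nucleotide codes can be held in a lookup table; the names `IUPAC_RNA`, `matches` and `expand` are hypothetical helpers introduced only for this illustration.

```python
# Standard IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
# The dictionary and helper names are illustrative, not from the course material.
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def matches(code, base):
    """Return True if a concrete base is allowed by an IUPAC code."""
    return base.upper() in IUPAC_RNA[code.upper()]


def expand(pattern):
    """List every concrete sequence an ambiguous pattern can stand for."""
    results = [""]
    for code in pattern.upper():
        results = [prefix + b for prefix in results for b in IUPAC_RNA[code]]
    return results


if __name__ == "__main__":
    print(matches("R", "G"))
    print(expand("GRY"))
```

Note that `expand` grows exponentially with the number of ambiguous positions, so it is only sensible for short patterns.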
RNA: structure
A. Primary Structure
5' GCGGAUUUAGCUCAGDDGGGAGAGCGCCAGACUGAAYAΨCUGGAGGUCCUGUGTΨCGAUCCACAGAAUUCGCACCA 3'
B. Secondary Structure
[Figure: secondary structure diagram labelling the Acceptor Stem, D Loop, Anticodon Loop, Variable Loop and TΨC Loop.]
C. Tertiary Structure
[Figure: three-dimensional fold labelling the Anticodon Loop, D Loop, TΨC Loop, Acceptor Stem, and the 5' and 3' ends.]
RNA: base-pairing
Central dogma of structural biology: sequence determines structure, and structure determines function.
Canonical (Watson-Crick) base-pairs: C:G, A:U.
Non-canonical (wobble) base-pair: G:U.
Note: other non-canonical base-pairs do occur, but these are rare and are generally treated as tertiary interactions.
Images lifted from: http://en.wikipedia.org/wiki/Base_pair
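The pairing rules above can be written down as a tiny classifier. This is a minimal sketch; the function name `classify_pair` and the three labels it returns are assumptions made for illustration.

```python
# Pairing rules from the slide: C:G and A:U are canonical, G:U is wobble.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(x, y):
    """Label a pair of bases as 'canonical', 'wobble' or 'other'."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "canonical"
    if pair in WOBBLE:
        return "wobble"
    return "other"


if __name__ == "__main__":
    for x, y in [("G", "C"), ("G", "U"), ("G", "A")]:
        print(x, y, classify_pair(x, y))
```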
RNA: base-pairing Yang et al. (2003) Tools for the automatic identification and classification of RNA base pairs, NAR.
RNA: base-pairing
Rows: WC, Watson-Crick; Wb, wobble.

bpc     C:G     U:A     U:G     G:A     C:A     U:C     A:A     C:C     G:G     U:U     Total
WC      49.8%   14.4%   0.01%   1.2%    0.1%    0.5%    -       -       -       -       66.1%
Wb      0.06%   0.06%   7.1%    -       0.2%    -       0.3%    0.5%    0.2%    0.9%    9.6%
Other   0.8%    5.8%    1.5%    9.4%    2.3%    0.6%    2.6%    0.5%    0.7%    0.3%    24.3%
Total   50.7%   20.3%   8.7%    10.6%   2.6%    1.0%    2.9%    1.0%    0.9%    1.3%    100.0%

Just 71.3% of rRNA contacts are canonical or G:U wobble!
Lee & Gutell (2004) Diversity of base-pair conformations and their occurrence in rRNA structure and RNA structural motifs. J Mol Biol.
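The 71.3% figure can be recovered from three cells of the table: C:G and U:A in the Watson-Crick conformation, plus U:G in the wobble conformation. A short check, with the variable names chosen only for this sketch:

```python
# Percentages copied from the table above (Lee & Gutell 2004).
watson_crick = {"C:G": 49.8, "U:A": 14.4}  # canonical pairs, WC conformation
gu_wobble = 7.1                            # U:G pairs, wobble conformation

canonical_or_wobble = sum(watson_crick.values()) + gu_wobble
remainder = 100.0 - canonical_or_wobble

if __name__ == "__main__":
    print(round(canonical_or_wobble, 1))  # share explained by simple pairing rules
    print(round(remainder, 1))            # share left to all other contact types
```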
RNA stacking Laurberg et al. (2008) Structural basis for translation termination on the 70S ribosome Nature. Image lifted from: http://rna.ucsc.edu/pdbrestraints/index.html
Alanine tRNA
Holley, Apgar, Everett, Madison, Marquisee, Merrill, Penswick & Zamir (1965) Structure of a ribonucleic acid. Science.
Tyrosine tRNA
Madison, Everett & Kung (1966) Nucleotide Sequence of a Yeast Tyrosine Transfer RNA. Science.
Exercise 1
Split into groups of at most 3 and fold one of the following sequences by hand (use nothing but a pencil, ruler and compass):
http://sites.google.com/site/rnainformatics/rna-folding-exercises/exercise-1
>A1
AAAAAAGGCGACAGAGUAAUCUGUCGCCUUUUUUCUUUGCUUGC
>A2
AAGAAAAACGGGUCGCCAGAAGGUGACCCGUUUUUUUUAUUCUUUUA
>A3
AAAAAAGCCCGCACCUGACAGUGCGGGCUUUUUUUUUC
>A4
AAAGCCCGUGAGUAUUCACGGGCUUUUUUAUUAUUUAAU
>B1
UGGGAGGGACGGCCCUCCUAUCCACCAGCAUAUCAGCCGCGGGGACGACCCUG
>B2
GCCCGGGGACGGCCCCGGGCCGUUCGCUUCAACGGGGACGACCCC
>B3
CCUCGGGGACGACCUCGAGGCCUCCUGAUACGCAGGGACGACCCUG
>B4
GAAGCGGGACGACCCGUUUUCCUUCUUUCAUUGCGCGGGGACGACCCUG
>C1
CCAGCCGCUGACGACGGGGCUGGACUUGCUGGGAGCGCCGCCUUUCGGCGCUUCCGUACCCAUGUUGCUUCAAGGAGGAUAUGGCUAUGGCAA
>C2
GCCGAUGCCAAUUGGGUCGGCAUGGUCAGGGAGCGCCACGCUUCUUGGCGCUUCCUCGUAUCUAUGUUGCUCUACGGAGGAUGUAGCUAUGAGAA
>C3
AGAGCCGCCUGUAAGGGGCUCGCAGUCGAGGAGCUCCGUUCUCUUCGGCGCUCCUCAUCGUCCAUGUUGCUCAAGGAGGAUAUGGCUAUGAGAA
>C4
UCGGUCGCCGCAUAAGGGGCCGAUGUGUCAGGGAGCGCCAUGCUUCUUGGCGUUCCCUCGUAUCUAUGUUGCUCCAAGGAGGAUGUAGUUAUGAGAA
RNA: structure
RNA secondary structure graphs satisfy the following restraints upon the corresponding adjacency matrix A (n × n).

    G G G A A A C C C
G   1 0 0 0 0 0 0 0 1
G     1 0 0 0 0 0 1 0
G       1 0 0 0 1 0 0
A         1 0 0 0 0 0
A           1 0 0 0 0
A             1 0 0 0
C               1 0 0
C                 1 0
C                   1

Sugar-phosphate backbone: a_{i,i+1} = 1.
Base-pairs are unique: for any i there is at most one k (k ≠ i ± 1) satisfying a_{i,k} = 1.
Minimal hairpin loop size: for any a_{i,k} = 1 (k ≠ i ± 1), i and k satisfy |k - i| > 3.
No pseudo-knot criterion: for any a_{i,j} = a_{k,l} = 1 (i < j, k < l) with i < k < j, it follows that k < l < j.
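The three base-pair rules above translate directly into a checker. This is a minimal sketch under a few assumptions: positions are 0-based rather than 1-based, base pairs are given as a list of (i, k) tuples instead of a full matrix, and the function name `check_structure` is hypothetical.

```python
def check_structure(n, pairs, min_gap=3):
    """Return a list of violated rules for base pairs (i, k), 0-based, i < k."""
    problems = []
    seen = set()
    for i, k in pairs:
        if not (0 <= i < k < n):
            problems.append(f"pair {(i, k)} is out of range or not ordered")
            continue
        # Minimal hairpin loop size: k - i must exceed min_gap (3 on the slide).
        if k - i <= min_gap:
            problems.append(f"pair {(i, k)} closes a loop that is too short")
        # Base pairs are unique: each position has at most one partner.
        for pos in (i, k):
            if pos in seen:
                problems.append(f"position {pos} is paired more than once")
            seen.add(pos)
    # No pseudoknots: if i < k < j then l must also lie inside, k < l < j.
    ordered = sorted(p for p in pairs if 0 <= p[0] < p[1] < n)
    for a, (i, j) in enumerate(ordered):
        for k, l in ordered[a + 1:]:
            if i < k < j < l:
                problems.append(f"pairs {(i, j)} and {(k, l)} cross (pseudoknot)")
    return problems


if __name__ == "__main__":
    # GGGAAACCC with pairs 1-9, 2-8, 3-7 (written 0-based here).
    print(check_structure(9, [(0, 8), (1, 7), (2, 6)]))
    print(check_structure(9, [(0, 5), (3, 8)]))
```

The backbone rule needs no check in this representation, since consecutive positions are adjacent by construction.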
RNA: representations
From a matrix to an image

    G G G A A A C C C
G   1 0 0 0 0 0 0 0 1
G     1 0 0 0 0 0 1 0
G       1 0 0 0 1 0 0
A         1 0 0 0 0 0
A           1 0 0 0 0
A             1 0 0 0
C               1 0 0
C                 1 0
C                   1

GGGAAACCC
(((...)))
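Going from the matrix to the bracket string can be sketched in a few lines. The assumptions here: the matrix is built with the backbone on the first off-diagonal (following the rule a_{i,i+1} = 1), indices are 0-based, and the helper names are hypothetical.

```python
def adjacency_matrix(n, pairs):
    """Build a symmetric 0/1 matrix with backbone and base-pair contacts."""
    a = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        a[i][i + 1] = a[i + 1][i] = 1  # sugar-phosphate backbone
    for i, k in pairs:
        a[i][k] = a[k][i] = 1          # base pair
    return a


def to_dot_bracket(a):
    """Convert an adjacency matrix into dot-bracket notation."""
    n = len(a)
    symbols = ["."] * n
    for i in range(n):
        # Start at i + 2 so backbone neighbours are not mistaken for pairs.
        for k in range(i + 2, n):
            if a[i][k]:
                symbols[i] = "("
                symbols[k] = ")"
    return "".join(symbols)


if __name__ == "__main__":
    matrix = adjacency_matrix(9, [(0, 8), (1, 7), (2, 6)])
    print("GGGAAACCC")
    print(to_dot_bracket(matrix))
```

Dot-bracket notation is only unambiguous for pseudoknot-free structures, which is why the no-pseudo-knot criterion matters for this representation.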
RNA: number of structures
A_N is the number of possible sequences of length N.
A_N = 4^N
S_N is the number of possible secondary structures of length N.
S_0 = S_1 = 1
S_{N+1} = S_N + \sum_{j=1}^{N} S_{j-1} S_{N-j+1}
S_N ≈ 1.8^N
Hofacker et al. (1998) Combinatorics of RNA Secondary Structures, Discrete Applied Mathematics.
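To get a feel for how fast these numbers grow, the counts can be computed by dynamic programming. The sketch below uses a Waterman-style recursion with an explicit minimum hairpin loop of 3 unpaired bases; that recursion and its indexing are assumptions of this sketch rather than a transcription of the formula on the slide, and the growth rate it gives depends on those assumptions and need not match the 1.8^N estimate quoted above.

```python
def count_sequences(n):
    """Number of RNA sequences of length n over the alphabet A, C, G, U."""
    return 4 ** n


def count_structures(n, min_loop=3):
    """Count pseudoknot-free structures on n positions (any two may pair).

    Assumption: a pair (j, k) needs at least `min_loop` unpaired bases
    between its ends, i.e. k - j > min_loop.
    """
    s = [1] * (n + 1)  # s[m] = number of structures on m positions
    for length in range(1, n + 1):
        total = s[length - 1]  # case 1: the last base is unpaired
        # case 2: the last base (1-based position `length`) pairs with j;
        # j - 1 bases lie to the left and length - 1 - j bases lie inside.
        for j in range(1, length - min_loop):
            total += s[j - 1] * s[length - 1 - j]
        s[length] = total
    return s[n]


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, count_sequences(n), count_structures(n))
```

This counts structures irrespective of sequence; for a real sequence only positions that can actually base-pair contribute, so the number is smaller.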
RNA: representations
The Tinoco plot
[Figure: Tinoco plot (file: trna_25748, type: RNA, helix length: 4). The sequence below is laid out along the axes and possible A:U, G:C and G:U pairs are marked.]
GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCACAGAAUUCGCACCA
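A dot-plot of this kind can be sketched as a boolean matrix of allowed pairs, followed by a scan along the anti-diagonals for runs of at least four consecutive pairs (the helix length used in the figure). The function names and the tuple format returned by `helices` are assumptions made for this illustration, and no minimum loop size is enforced.

```python
# Allowed pairs in the plot: A:U, G:C and G:U (either orientation).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def dot_plot(seq):
    """plot[i][j] is True when bases i and j could form a pair."""
    seq = seq.upper()
    n = len(seq)
    return [[(seq[i], seq[j]) in PAIRS for j in range(n)] for i in range(n)]


def helices(seq, min_len=4):
    """Find maximal runs of stacked pairs (i, j), (i+1, j-1), ... of length >= min_len.

    Returns tuples (i, j, length) with 0-based outermost pair (i, j).
    """
    plot = dot_plot(seq)
    n = len(seq)
    found = []
    for i in range(n):
        for j in range(n - 1, i, -1):
            if not plot[i][j]:
                continue
            # Skip pairs that merely continue a run started further out.
            if i > 0 and j < n - 1 and plot[i - 1][j + 1]:
                continue
            length = 0
            while i + length < j - length and plot[i + length][j - length]:
                length += 1
            if length >= min_len:
                found.append((i, j, length))
    return found


if __name__ == "__main__":
    for row in dot_plot("GGGGAAAACCCC"):
        print("".join("x" if cell else "." for cell in row))
    print(helices("GGGGAAAACCCC"))
```

Candidate helices found this way still have to be combined into a consistent structure, which is the part done by eye in a hand-built plot.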
Exercise 2
Split into groups of at most 3 and build a dot-plot for one of the following sequences:
http://sites.google.com/site/rnainformatics/rna-folding-exercises/exercise-2
>A1
AAAAAAGGCGACAGAGUAAUCUGUCGCCUUUUUUCUUUGCUUGC
>A2
AAGAAAAACGGGUCGCCAGAAGGUGACCCGUUUUUUUUAUUCUUUUA
>A3
AAAAAAGCCCGCACCUGACAGUGCGGGCUUUUUUUUUC
>A4
AAAGCCCGUGAGUAUUCACGGGCUUUUUUAUUAUUUAAU
>B1
UGGGAGGGACGGCCCUCCUAUCCACCAGCAUAUCAGCCGCGGGGACGACCCUG
>B2
GCCCGGGGACGGCCCCGGGCCGUUCGCUUCAACGGGGACGACCCC
>B3
CCUCGGGGACGACCUCGAGGCCUCCUGAUACGCAGGGACGACCCUG
>B4
GAAGCGGGACGACCCGUUUUCCUUCUUUCAUUGCGCGGGGACGACCCUG
>C1
CCAGCCGCUGACGACGGGGCUGGACUUGCUGGGAGCGCCGCCUUUCGGCGCUUCCGUACCCAUGUUGCUUCAAGGAGGAUAUGGCUAUGGCAA
>C2
GCCGAUGCCAAUUGGGUCGGCAUGGUCAGGGAGCGCCACGCUUCUUGGCGCUUCCUCGUAUCUAUGUUGCUCUACGGAGGAUGUAGCUAUGAGAA
>C3
AGAGCCGCCUGUAAGGGGCUCGCAGUCGAGGAGCUCCGUUCUCUUCGGCGCUCCUCAUCGUCCAUGUUGCUCAAGGAGGAUAUGGCUAUGAGAA
>C4
UCGGUCGCCGCAUAAGGGGCCGAUGUGUCAGGGAGCGCCAUGCUUCUUGGCGUUCCCUCGUAUCUAUGUUGCUCCAAGGAGGAUGUAGUUAUGAGAA
The end of section one! CC-licensed image from Flickr user cliff1066: North Church, Portsmouth, NH