The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

Similar documents
RNA Secondary Structure. CSE 417 W.L. Ruzzo

Genome 559 Wi RNA Function, Search, Discovery

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery

Dynamic Programming 1

Dynamic Programming. Weighted Interval Scheduling. Algorithmic Paradigms. Dynamic Programming

Searching for Noncoding RNA

CSE 421 Weighted Interval Scheduling, Knapsack, RNA Secondary Structure

CS 580: Algorithm Design and Analysis

Copyright 2000, Kevin Wayne 1

Chapter 6. Weighted Interval Scheduling. Dynamic Programming. Algorithmic Paradigms. Dynamic Programming Applications

6. DYNAMIC PROGRAMMING I

Areas. ! Bioinformatics. ! Control theory. ! Information theory. ! Operations research. ! Computer science: theory, graphics, AI, systems,.

Computational Biology: Basics & Interesting Problems

CSE 202 Dynamic Programming II

6. DYNAMIC PROGRAMMING I

Chapter 6. Dynamic Programming. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Modeling and Searching for Non-Coding RNA

proteins are the basic building blocks and active players in the cell, and

Topic 4 - #14 The Lactose Operon

Impact Of The Energy Model On The Complexity Of RNA Folding With Pseudoknots

Control of Gene Expression in Prokaryotes

BME 5742 Biosystems Modeling and Control

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

GCD3033:Cell Biology. Transcription

CHAPTER : Prokaryotic Genetics

Name: Monster Synthesis Activity

Introduction to Molecular and Cell Biology

RNA Secondary Structure Prediction

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

Honors Biology Reading Guide Chapter 11

Controlling Gene Expression

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

Flow of Genetic Information

Algorithms in Bioinformatics

Prokaryotic Gene Expression (Learning Objectives)

Lesson Overview. Gene Regulation and Expression. Lesson Overview Gene Regulation and Expression

Gene Regulation and Expression

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22)

15.2 Prokaryotic Transcription *

2015 FALL FINAL REVIEW

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

UNIT 5. Protein Synthesis 11/22/16

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

Bacterial Genetics & Operons

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

A Novel Statistical Model for the Secondary Structure of RNA

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

13.4 Gene Regulation and Expression

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

Introduction to molecular biology. Mitesh Shrestha

Multiple Choice Review- Eukaryotic Gene Expression

-CH 2. -S-CH 3 How do I know this?

Computational Cell Biology Lecture 4

Regulation of Gene Expression at the level of Transcription

RNA Protein Interaction

BSC 1010C Biology I. Themes in the Study of Life Chapter 1

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

RecitaLon CB Lecture #10 RNA Secondary Structure

THE EDIBLE OPERON David O. Freier Lynchburg College [BIOL 220W Cellular Diversity]

Lesson Overview. Ribosomes and Protein Synthesis 13.2

A Method for Aligning RNA Secondary Structures

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Name Period The Control of Gene Expression in Prokaryotes Notes

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes

Introduction. Gene expression is the combined process of :

Prokaryotic Gene Expression (Learning Objectives)

Regulation of Gene Expression

Modeling and Searching for Non-Coding RNA

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide

Introduction to Bioinformatics

Basic Synthetic Biology circuits

RNA Synthesis and Processing

Gene Switches Teacher Information

Lecture 7: Simple genetic circuits I

From Gene to Protein

PROTEIN SYNTHESIS: TRANSLATION AND THE GENETIC CODE

Combinatorial approaches to RNA folding Part I: Basics

Self Similar (Scale Free, Power Law) Networks (I)

The Gene The gene; Genes Genes Allele;

Proteins. Division Ave. High School Ms. Foglia AP Biology. Proteins. Proteins. Multipurpose molecules

UE Praktikum Bioinformatik

RNA$2 nd $structure$predic0on

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

Genetic transcription and regulation

DNA, Chromosomes, and Genes

Bio 119 Bacterial Genomics 6/26/10

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

Computational methods for predicting protein-protein interactions

VIEW. Flipping Off the Riboswitch: RNA Structures That Control Gene Expression

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

Revisiting the Central Dogma The role of Small RNA in Bacteria

Transcription:

The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral Dogma of Molecular Biology! DN! RN! Protein! Fig. 2. The arrows show the situation as it seemed in 1958. Solid arrows represent probable transfers, dotted arrows possible transfers. The absent arrows (compare Fig. 1) represent the impossible transfers postulated by the central dogma. They are the three possible arrows starting from protein.! Protein! gene! DN " (chromosome)! cell! RN! (messenger)!

Non-coding RN! DN structure: dull! Messenger RN - codes for proteins! Non-coding RN - all the rest! 5 TT 3! Before, say, mid 199 s, 1-2 dozen known (critically important, but narrow roles)! 3 TTT 5! Since mid 9 s dramatic discoveries! Regulation, transport, stability/degradation! E.g. microrn : 1s in humans => 5% of genes! E.g. riboswitches : 1s in bacteria! RN " Structure:" Rich! RN Secondary Structure: RN makes helices too Base pairs!!!!!!!!!!!!!!!!!!!!!!!! 5!!!!!!! sually single stranded " RN s fold, " and function! " Nature uses" what works! 3! 7

http://www.rcsb.org/pdb/explore.do?structureid=1evv! RN " Secondary " Structure:" Not everything," but important," easier than 3d" Why is structure important?! " For protein-coding, similarity in sequence is a powerful tool for finding related sequences! " e.g. hemoglobin is easily recognized in all vertebrates! " For non-coding RN, many different sequences have the same structure, and structure is most important for function.! " So, using structure plus sequence, can find related sequences at much greater evolutionary distances! 6S mimics an " open promoter! E.coli! Barrick et al. RN 25! Trotochaud et al. NSMB 25! Willkomm et al. NR 25!

hloroflexus aurantiacus hloroflexi eobacter metallireducens eobacter sulphurreducens " -Proteobacteria In Bacteria: typical biosynthetic cycle (E. coli) around a critical metabolite ( SM ) ene Regulation: The MET Repressor lberts, et al, 3e. The protein way Riboswitch alternative SM SM rundy & Henkin, Mol. Microbiol 1998 Epshtein, et al., PNS 23 Winkler et al., Nat. Struct. Biol. 23 Protein lberts, et al, 3e. DN 15 16

lberts, et al, 3e. The protein way Riboswitch alternatives lberts, et al, 3e. The protein way Riboswitch alternatives SM-II SM-III SM-I SM-I SM-II rundy, Epshtein, Winkler et al., 1998, 23 orbino et al., enome Biol. 25 17 rundy, Epshtein, Winkler et al., 1998, 23 orbino et al., enome Biol. 25 Fuchs et al., NSMB 26 18 lberts, et al, 3e. The protein way Riboswitch alternatives SM-I SM-II SM-III SM-IV boxed = confirmed riboswitch nd many other examples. Widespread, deeply conserved, structurally sophisticated, functionally diverse, biologically important uses for ncrn throughout prokaryotic world. (+2 more) rundy, Epshtein, Winkler et al., 1998, 23 orbino et al., enome Biol. 25 Fuchs et al., NSMB 26 19 Weinberg et al., RN 28 Weinberg, et al. Nucl. cids Res., July 27 35: 489-4819.! 2

Vertebrates " Bigger, more complex genomes " <2% coding " But >5% conserved in sequence? " nd 5-9% transcribed? " nd structural conservation, if any, invisible (without proper alignments, etc.) " What s going on? Fastest Human ene? 21 Q: What s so hard?!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! : Structure often more important than sequence! Origin of Life? Life needs information carrier: DN molecular machines, like enzymes: Protein making proteins needs DN + RN + proteins making (duplicating) DN needs proteins Horrible circularities! How could it have arisen in an abiotic environment?

Origin of Life? 6.5 RN Secondary Structure RN can carry information, too RN double helix; RN-directed RN polymerase RN can form complex structures RN enzymes exist (ribozymes) RN can control, do logic (riboswitches) Nussinov s lgorithm core technology for RN structure prediction! The RN world hypothesis: 1st life was RN-based RN Secondary Structure RN Secondary Structure (somewhat oversimplified) RN. String B = b 1 b 2 b n over alphabet {,,, }. Secondary structure. RN is usually single-stranded, and tends to loop back and form base pairs with itself. This structure is essential for understanding behavior of molecule. Ex: Secondary structure. set of pairs S = { (b i, b j ) } that satisfy:!" [Watson-rick.] " S is a matching, i.e. each base pairs with at most one other, and " each pair in S is a Watson-rick pair: -, -, -, or -.!" [No sharp turns.] The ends of each pair are separated by at least 4 intervening bases. If (b i, b j ) # S, then i < j - 4.!" [Non-crossing.] If (b i, b j ) and (b k, b l ) are two pairs in S, then we cannot have i < k < j < l. (Violation of this is called a pseudoknot.) Free energy. sual hypothesis is that an RN molecule will form the secondary structure with the optimum total free energy. approximate by number of base pairs complementary base pairs: -, - oal. iven an RN molecule B = b 1 b 2 b n, find a secondary structure S that maximizes the number of base pairs.

RN Secondary Structure: Examples RN Secondary Structure: Subproblems Examples. First attempt. OPT[j] = maximum number of base pairs in a secondary structure of the substring b 1 b 2 b j. match b t and b j base pair 1 t j ok $4 sharp turn crossing Difficulty. Results in two sub-problems.!" Finding secondary structure in: b 1 b 2 b t-1.!" Finding secondary structure in: b t+1 b t+2 b j-1. OPT(t-1) not OPT of anything; need more sub-problems Dynamic Programming Over Intervals: (R. Nussinov s algorithm) Bottom p Dynamic Programming Over Intervals Notation. OPT[i, j] = maximum number of base pairs in a secondary structure of the substring b i b i+1 b j. Q. What order to solve the sub-problems?. Do shortest intervals first.!" ase 1. If i % j - 4. " OPT[i, j] = by no-sharp turns condition.!" ase 2. Base b j is not involved in a pair. " OPT[i, j] = OPT[i, j-1]!" ase 3. Base b j pairs with b t for some i $ t < j - 4. " non-crossing constraint decouples resulting sub-problems " OPT[i, j] = 1 + max t { OPT[i, t-1] + OPT[t+1, j-1] } Key point: Either last base is unpaired (case 1,2) or paired (case 3)! RN(b 1,,b n ) { for k = 5, 6,, n-1 for i = 1, 2,, n-k j = i + k ompute OPT[i, j] } return OPT[1, n] using recurrence i 4 3 2 1 6 7 8 9 j j k take max over t such that i $ t < j-4 and b t and b j are Watson-rick complements Running time. O(n 3 ). i 1 2 5 4 1 1 i 2 6 7 8 9 Remark. Same core idea in KY algorithm to parse context-free grammars. 3 4 3 4 k

omputing one cell: OPT[2,18] =?!!n = 16! ( (. (.... ). ).. )..! 1 1 1 1 1 2 2 2 3 3 3! 1 1 2 2 2 2 2 2! 1 1 1 1 1 2 2 2! 1 1 1 1 1 2 2 2! 1 1 1 1 1 1 2! 1 1 1 1 2! 1 1 1 1 1! 1! 1! E.g.: OPT[1,6] = 1:! (...) E.g.: OPT[6,16] = 2:! ((...)...) n= 2! $ OPT[i, j -1]! ' max %! ( ase 1:! 2 % 18-4? no.! ase 2:! B 18 unpaired?! lways a possibility;" then OPT[2,18] % 3"! ((...))(...)... omputing one cell: OPT[2,18] =? n= 2! $ OPT[i, j -1]! ' max %! ( ase 3, 2 $ t <18-4:! t = 2: no pair! omputing one cell: OPT[2,18] =? n= 2! $ OPT[i, j -1]! ' max %! ( ase 3, 2 $ t <18-4:! t = 3: no pair!

omputing one cell: OPT[2,18] =? n= 2! $ OPT[i, j -1]! ' max %! ( ase 3, 2 $ t <18-4:! t = 4: yes pair" OPT[2,18]%1++3!!..(...(((...))))! omputing one cell: OPT[2,18] =? n= 2! OPT(i, j) = % $ OPT(i, OPT[i, j -1) -1]! ' max% ( & 1+ max t (OPT(i,t (OPT[i,t #1) #1] + OPT[t OPT(t +1, j #1] #1)) ase 3, 2 $ t <18-4:! t = 5: yes pair" OPT[2,18]%1++3!!...(..(((...))))! omputing one cell: OPT[2,18] =? n= 2! OPT(i, j) = % $ OPT(i, OPT[i, j -1) -1]! ' max% ( & 1+ max t (OPT(i,t (OPT[i,t #1) #1] + OPT[t OPT(t +1, j #1] #1)) ase 3, 2 $ t <18-4:! t = 6: yes pair" OPT[2,18]%1++3!!...(.(((...))))! omputing one cell: OPT[2,18] =? n= 2! OPT(i, j) = % $ OPT(i, OPT[i, j -1) -1]! ' max% ( & 1+ max t (OPT(i,t (OPT[i,t #1) #1] + OPT[t OPT(t +1, j #1] #1)) ase 3, 2 $ t <18-4:! t = 7: yes pair" OPT[2,18]%1++3!!...((((...))))!

omputing one cell: OPT[2,18] =? n= 2! $ OPT[i, j -1]! ' max %! ( ase 3, 2 $ t <18-4:! t = 8: no pair! omputing one cell: OPT[2,18] =? n= 2! $ OPT[i, j -1]! ' max %! ( ase 3, 2 $ t <18-4:! t = 11: yes pair" OPT[2,18]%1+2+!! ((...))(...)! (not shown:! t=9,1, 12,13)! n= 2! omputing one cell: OPT[2,18] = 4 1 2 2 2 2 2 2 3 3 3 4 5 6! $ 1 1 1! if i " j # 4 $ OPT[i, j -1]! ' max %! ( Overall, Max = 4! several ways, e.g.:!!..(...(((...))))! tree shows trace back:! square = case 3! octagon = case 1" nother Trace Back Example!n = 16! ( (. (.... ). ).. )..! 1 1 1 1 1 2 2 2 3 3 3! 1 1 2 2 2 2 2 2! 1 1 1 1 1 2 2 2! 1 1 1 1 1 2 2 2! 1 1 1 1 1 1 2! 1 1 1 1 2! 1 1 1 1 1! 1! 1!! $! if i " j # 4 * $ OPT[i, j -1] ' max% ( E.g.: OPT[1,16] = 3:! ((.(...).)..)..