RNA Matrices and RNA Secondary Structures

Similar documents
GCD3033:Cell Biology. Transcription

RNA & PROTEIN SYNTHESIS. Making Proteins Using Directions From DNA

BME 5742 Biosystems Modeling and Control

Computational Biology: Basics & Interesting Problems

Multiple Choice Review- Eukaryotic Gene Expression

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Algorithms in Bioinformatics

What is the central dogma of biology?

1. In most cases, genes code for and it is that

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

Translation Part 2 of Protein Synthesis

Introduction to molecular biology. Mitesh Shrestha

From gene to protein. Premedical biology

Types of RNA. 1. Messenger RNA(mRNA): 1. Represents only 5% of the total RNA in the cell.

9/2/17. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

Lesson Overview. Ribosomes and Protein Synthesis 13.2

Chapters 12&13 Notes: DNA, RNA & Protein Synthesis

Chapter 17. From Gene to Protein. Biology Kevin Dees

9/11/18. Molecular and Cellular Biology. 3. The Cell From Genes to Proteins. key processes

Biology I Fall Semester Exam Review 2014

Number of questions TEK (Learning Target) Biomolecules & Enzymes

In Genomes, Two Types of Genes

Molecular Biology of the Cell

Discrete Applied Mathematics

Videos. Bozeman, transcription and translation: Crashcourse: Transcription and Translation -

Reading Assignments. A. Genes and the Synthesis of Polypeptides. Lecture Series 7 From DNA to Protein: Genotype to Phenotype

UNIT 5. Protein Synthesis 11/22/16

More Protein Synthesis and a Model for Protein Transcription Error Rates

What Kind Of Molecules Carry Protein Assembly Instructions From The Nucleus To The Cytoplasm

GENETICS UNIT VOCABULARY CHART. Word Definition Word Part Visual/Mnemonic Related Words 1. adenine Nitrogen base, pairs with thymine in DNA and uracil

Translation and Operons

Biomolecules. Energetics in biology. Biomolecules inside the cell

Bio 119 Bacterial Genomics 6/26/10

Bioinformatics. Dept. of Computational Biology & Bioinformatics

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Combinatorial approaches to RNA folding Part I: Basics

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

From Gene to Protein

Chapter 2: Chemistry. What does chemistry have to do with biology? Vocabulary BIO 105

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

2015 FALL FINAL REVIEW

A Recursive Relation for Weighted Motzkin Sequences

Molecular Biology - Translation of RNA to make Protein *

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

Statistical Machine Learning Methods for Bioinformatics IV. Neural Network & Deep Learning Applications in Bioinformatics

What Organelle Makes Proteins According To The Instructions Given By Dna

GENETICS - CLUTCH CH.1 INTRODUCTION TO GENETICS.

"Nothing in biology makes sense except in the light of evolution Theodosius Dobzhansky

UE Praktikum Bioinformatik

What Mad Pursuit (1988, Ch.5) Francis Crick (1916 ) British molecular Biologist 12 BIOLOGY, CH 1

Predicting RNA Secondary Structure

Revisiting the Central Dogma The role of Small RNA in Bacteria

5. From the genetic code to enzyme action

PROTEIN SYNTHESIS INTRO

Biophysics II. Key points to be covered. Molecule and chemical bonding. Life: based on materials. Molecule and chemical bonding

Chapter

ومن أحياها Translation 1. Translation 1. DONE BY :Maen Faoury

COMP 598 Advanced Computational Biology Methods & Research. Introduction. Jérôme Waldispühl School of Computer Science McGill University

Name Block Date Final Exam Study Guide

Translational Initiation

CCHS 2016_2017 Biology Fall Semester Exam Review

CHAPTER 3. Cell Structure and Genetic Control. Chapter 3 Outline

Flow of Genetic Information

Introduction to Molecular and Cell Biology

Unit 3 - Molecular Biology & Genetics - Review Packet

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

Protein Synthesis. Unit 6 Goal: Students will be able to describe the processes of transcription and translation.

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Animals and 2-Motzkin Paths

Biophysics Lectures Three and Four

Procesamiento Post-transcripcional en eucariotas. Biología Molecular 2009

Boolean models of gene regulatory networks. Matthew Macauley Math 4500: Mathematical Modeling Clemson University Spring 2016

Organic Chemistry Option II: Chemical Biology

Section 7. Junaid Malek, M.D.

Cells and the Stuff They re Made of. Indiana University P575 1

Name Date Period Unit 1 Basic Biological Principles 1. What are the 7 characteristics of life?

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Bioinformatics Chapter 1. Introduction

The tree of life: Darwinian chemistry as the evolutionary force from cyanic acid to living molecules and cells

Eukaryotic vs. Prokaryotic genes

The MOLECULES of LIFE

CST and FINAL EXAM REVIEW

RIBOSOME: THE ENGINE OF THE LIVING VON NEUMANN S CONSTRUCTOR

The Complete Set Of Genetic Instructions In An Organism's Chromosomes Is Called The

9 The Process of Translation

AMINO ACIDS ARE JOINED TOGETHER BY BONDS. FILE

CCHS 2015_2016 Biology Fall Semester Exam Review

Information Content in Genetics:

Cellular Neuroanatomy I The Prototypical Neuron: Soma. Reading: BCP Chapter 2

Using SetPSO to determine RNA secondary structure

Homology Modeling (Comparative Structure Modeling) GBCB 5874: Problem Solving in GBCB

Chapter 6.2. p

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

Chapter 9 DNA recognition by eukaryotic transcription factors

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Midterm Review Guide. Unit 1 : Biochemistry: 1. Give the ph values for an acid and a base. 2. What do buffers do? 3. Define monomer and polymer.

Darwin's theory of natural selection, its rivals, and cells. Week 3 (finish ch 2 and start ch 3)

Transcription:

RNA Matrices and RNA Secondary Structures Institute for Mathematics and Its Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota October 29 November 2, 27 Asamoah Nkwanta, Morgan State University Nkwanta@jewel.morgan.edu

RNA Secondary Structure Prediction Given a primary sequence, we want to find the biological function of the related secondary structure. To achieve this goal we predict its secondary structure using a lattice walk or path approach. This walk approach involves enumerative combinatorics and is connected to infinite lower triangular matrices called RNA matrices.

RNA Secondary Structure Primary Structure The linear sequence of bases in an RNA molecule Secondary Structure The folding or coiling of the sequence due to bonded nucleotide pairs: A-U, G-C Tertiary Structure The three dimensional configuration of an RNA molecule. The three dimensional shape is important for biological function, and it is harder to predict.

RNA Molecule Ribonucleic acid (RNA) molecule: Three main categories mrna (messenger) carries genetic information from genes to other cells trna (transfer) carries amino acids to a ribosome (cells for making proteins) rrna (ribosomal) part of the structure of a ribosome

RNA Molcule (cont.) Other types (RNA) molecules: snrna (small nuclear RNA) carries genetic information from genes to other cells mirna (micro RNA) carries amino acids to a ribosome (cells for making proteins) irna (immune RNA) part of the structure of a ribosome (Important for HIV studies)

Primary RNA Sequence CAGCAUCACAUCCGCGGGGUAAACGCU Nucleotide Length, 27 bases

Geometric Representation Secondary structure is a graph defined on a set of n labeled points (M.S. Waterman, 1978) Biological Combinatorial/Graph Theoretic Random Walk Other Representations

RNA Structure 3-D structure of Haloarcula marismortui 5S ribosomal RNA in large ribosomal subunit

RNA NUMBERS 1,1,1,2,4,8,17,37,82,185,423,978, These numbers count RNA secondary structures of length n.

RNA Combinatorics Recurrence Relation: s s ( ) ( ) ( ) = s 1 = s 2 = 1 n 1 j= 1 ( n + 1) = s( n) + s( j 1) s( n j),( n 2) M. Waterman, Introduction to Computational Biology: Maps, sequences and genomes, 1995. M. Waterman, Secondary structure of single-stranded nucleic acids, Adv. Math. (suppl.) 1978.

Counting Sequence Database The On-line Encyclopedia of Integer Sequences: http:/www.research.att.com/njas/sequence s/index.html N.J.A. Sloane & S. Plouffe, The Encyclopedia of Integer Sequences, Academic Press, 1995.

RNA Combinatorics (cont.) The number of RNA secondary structures for the sequence [1,n] is counted by the coefficients of S(z): ( ) 1 2 2 3 4 4 8 5 17 6 37 7 sz =++ z z + z + z + z + z + z + Coefficients of the power series: (1,1,1,2,4,8,17,37,82,185,423,978, )

s z = 1+ z + z + 2z + 4z + 8z + 17z + 37 z + ( ) 2 3 4 5 6 7

RNA Combinatorics (cont.) Based on the coefficients of the generating function there are approximately 1.3 billion possible RNA structures of length n = 27. s z = 1+ z+ z + 2z + + 1,392,251,12z + ( ) 2 3 27

RNA Combinatorics (cont.) Using the recurrence relation we can find the closed form generating function associated with the RNA numbers. ( ) s z = ( ) 2 2 3 4 1 z+ z 1 2z z 2z + z 2z 2 ( ) 2 3 4 5 6 27 1 2 4 8 17 1,392,251,12 sz = + z+ z + z + z + z + z + + z +

RNA Combinatorics (cont.) Exact Formula, and Asymptotic Estimate (as n grows without bound): s ( n ) = 1 n ( ) n k k k k 1 k 1 n k s ( n ) 3 15 + 7 5 2 ~ n 8π 3 + 2 5 n

RNA Combinatorics (cont.) S(n,k) is the number of structures of length n with exactly k base pairs: For n,k >, (, ) s nk 1 n k n k 1 = k k+ 1 k 1

s 6,1 = 1 and s 6,2 = 6 ( ) ( )

RNA Combinatorics (cont.) RNA hairpin combinatorics. ( ) h n = n 2 2 1, number of hairpins of length n ( n ) n m 1 H = 2 1, number of hairpins with m or more bases in the loop, n m+1 ( ) k a,b a+ b 2 =, number of stems with topmost bases bound a 1 k K ( a,b ) a+ b 4 =, number of stems with first and last bases bound a 2 k

Random Walk A random walk is a lattice path from one point to another such that steps are allowed in a discrete number of directions and are of a certain length

RNA Walk Type I NSE* Walks Unit step walks starting at the origin (,) with steps up, down, and right No walks pass below the x-axis and there are no consecutive NS steps

Type I, RNA Array (n x k) Type I, RNA Array (n x k) 1 6 15 24 3 28 17 1 5 1 13 13 8 1 4 6 6 4 1 3 3 2 1 2 1 1 1 1

Type I, RNA Array (n x k) 1 1 1 1 2 1 2 3 3 1 4 6 6 4 1 8 13 13 1 5 1 17 28 3 24 15 6 1

Type I, Formation Rule (Recurrence) m (,) = 1 ( + 1,) = m( n j, j) m n j ( + 1, k) = m( n, k 1) + m( n j, k + j) m n ( + 1, k) =, k > n+ 1 m n j Note. S(z) can be derived using this recurrence.

First Moments/Weighted Row Sums 1 1 1 1 1 2 3 1 2 1 3 8 2 3 3 1 4 = 21 4 6 6 4 1 5 55 8 13 13 1 5 1 6 144 Computing the average height of the walks above the x-axis is given by the alternate Fibonacci numbers

RNA Walk Type II NSE** Walks Unit-step walks starting at the origin (,) with steps up, down, and right such that no walks pass below the x-axis and there are no consecutive SN steps

Type II, RNA Array (n x k) 1 1 1 2 2 1 4 4 3 1 8 9 7 4 1 17 2 17 11 5 1 37 45 41 29 16 6 1

Examples Type I: ENNESNESSE Type II: NEEENSEEES Note. Some Type II walks are not associated with RNA. Thus we have two class of walks to work with for RNA prediction.

RNA Walk Bijection Theorem: There is a bijection between the set of NSE* walks of length n+1 ending at height k = and the set of NSE** walks of length n ending at height k =. Source: Lattice paths, generating functions, and the Riordan group, Ph.D. Thesis, Howard University, Washington, DC, 1997

Main Theorem Theorem: There is a bijection between the set of RNA secondary structures of length n and the set of NSE* walks ending at height k =. Source: Lattice paths and RNA secondary structures, DIMAC Series in Discrete Math. & Theoretical Computer Science 34 (1997) 137-147. (CAARMS2 Proceedings)

Main Theorem (cont.) Proof (sketch): Consider an RNA sequence of length n and convert it to the non-intersecting chord form. Consider the following rules: k E ( i, j ) ( N, S )

Application: HIV-1 1 Prediction Given primary RNA sequences and using RNA combinatorics, the goal of this project is to model components of an HIV-1 RNA secondary structure (namely SL2 and SL3 domains). The major concentration of this project is on reducing the minimum free energy to form an optimum HIV-1 RNA secondary structure. Source: HIV-1 sequence prediction, 27, in progress

HIV-I 5' RNA Structural Elements. Illustration of a working model of the HIV-I 5' UTR showing the various stem-loop structures important for virus replication. These are the TAR element, the poly(a) hairpin, the U5-PBS complex, the stem-loops 1-4 containing the DIS, the major splice donor, the major packaging signal, and the gag start codon, respectively. Nucleotides and numbering correspond to the HIV-I HXB2 sequence. (Adapted from Clever et al. (1995) and Berkhout and van Wamel (2))

Application: HIV-1 1 Prediction (cont.) The following sequence was obtained from the NCBI website. The first 363 nucleotides were extracted from the entire HIV-1 RNA genomic sequence: GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCU GGCUAACUAGGGAACCCACUGCUUAAGCCUCAAUAAAGCUU GCCUUGAGUGCUUCAAGUAGUGUGUGCCCGUCUGUUGUGU GACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGU GUGGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACCUGA AAGCGAAAGGGAAACCAGAGGAGCUCUCUCGACGCAGGAC UCGGCUUGCUGAAGCGCGCACGGCAAGAGGCGAGGGGCGG CGACUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUA GAAGGAGAGAGAUGGGUGCGAGAGCGUCAGUAUUAAGCG Color key: SL2 yellow SL3 - red

Future Research: Centers For Biological and Chemical Sensors Research Environmental Toxicology and Biosensors Research The mission is to advance the fundamental scientific and technological knowledge needed to enable the development of new biological and chemical sensors.

Math-Bio Collaborators Dwayne Hill, Biology Dept., MSU Alvin Kennedy and Richard Williams, Chemistry Dept., MSU Wilfred Ndifon, Ecology and Evolutionary Biology Dept., Princeton U. Boniface Eke, Mathematics Dept., MSU