Searching for Noncoding RNA

Size: px
Start display at page:

Download "Searching for Noncoding RNA"

Transcription

1 Searching for Noncoding RN Larry Ruzzo omputer Science & Engineering enome Sciences niversity of Washington Bio 2006, Seattle, 8/4/2006 1

2 Outline Noncoding RN Why are they hard to discover? Key computational problems: Motif discovery Motif search Sketch new methods pplication: cis-regulatory motifs in actinobacteria 2

3 Non-coding RN Messenger RN - codes for proteins Non-coding RN - all the rest Before, say, mid 1990 s, 1-2 dozen known (critically important, but narrow roles: e.g. trn) Since mid 90 s dramatic discoveries Hundreds of new families Regulation, transport, stability/degradation E.g. microrn : 100 s in humans By some estimates, ncrn >> mrn 3

4 Example: lycine Regulation How is glycine level regulated? Plausible answer: g TF g transcription factors (proteins) g TF g gce protein glycine cleavage enzyme gene g DN 4

5 The lycine Riboswitch ctual answer (in many bacteria): g g gce protein g g 5 3 gce mrn glycine cleavage enzyme gene DN Mandal et al. Science

6 6S mimics an open promoter E.coli Barrick et al. RN 2005 Trotochaud et al. NSMB 2005 Willkomm et al. NR

7 7 Why should these be hard to discover? : Structure often more important than sequence

8 Wanted ood, fast search tools ( RN BLST, etc.) ood, fast motif discovery tools ( RN MEME, etc.) Importance of structure makes both hard; progress on both below 8

9 How to model an RN Motif? onceptually, start with a profile HMM: from a multiple alignment, estimate nucleotide/ insert/delete preferences for each position given a new seq, estimate likelihood that it could be generated by the model, & align it to the model mostly del ins all 9

10 How to model an RN Motif? dd column pairs and pair emission probabilities for base-paired regions <<<<<<< paired columns >>>>>>> 10

11 RN Motif Models ovariance Models (Eddy & Durbin 1994) aka profile stochastic context-free grammars aka hidden Markov models on steroids Model position-specific nucleotide preferences and base-pair preferences Pro: accurate on: model building hard, search sloooow 11

12 Task 1: Faster Search 12

13 M s are good, but slow Rfam Reality EMBL Our Work EMBL Rfam oal EMBL BLST Ravenna M M M junk hits 1 month, 1000 computers hits ~2 months, 1000 computers junk hits 10 years, 1000 computers 13

14 Ravenna: enome Scale RN Search Typically 100x speedup over raw M, with no (or little) loss in accuracy: drop structure from M to create a (faster) HMM use that to pre-filter sequence; discard parts where, provably, the M will score < threshold; actually run M on the rest (the promising parts) assignment of HMM transition/emission scores is key (large convex optimization problem) Weinberg & Ruzzo, Bioinformatics, 2004,

15 Results: buried treasures Name # found BLST + M # found rigorous filter + M # new Pyrococcus snorn Iron response element Histone 3 element Purine riboswitch Retron msr Hammerhead I Hammerhead III snrn S-box snrn snrn snrn

16 Task 2: Motif Discovery 16

17 RN Motif Discovery Typical problem: given a ~10-20 unaligned sequences of ~1kb, most of which contain instances of one RN motif of, say, 150bp -- find it Example: 5 TRs of orthologous glycine cleavage genes from γ-proteobacteria 17

18 Obvious pproach I Predict secondary RN structure using MFOLD or Vienna Problems false folding predictions comparing structures 18

19 Obvious pproach II: Predict from Multiple Sequence lignment ompensatory mutations reveal structure, but usual alignment algorithms penalize them (twice) 19

20 Our pproach: Mfinder Simultaneous alignment, folding and M-based motif description using an EM-style learning procedure Yao, Weinberg & Ruzzo, Bioinformatics,

21 lignment M lignment Similar to HMM, but much slower Largely from Eddy & Durbin, 94 But new way to infer which columns to pair, via a principled combination of mutual information and predicted folding energy 21

22 MFinder Harder: Finding Ms without alignment Yao, Weinberg & Ruzzo, Bioinformatics, 2006 Folding predictions Smart heuristics andidate alignment M Search Realign EM 22

23 Mfinder ccuracy (on Rfam families with flanking sequence) /W /W 23

24 Task 3: pplication enome-wide search for cis-regulatory RN elements (in prokaryotes, initially) 24

25 oal: Predicting New cis-regulatory RN Elements iven unaligned TRs of coexpressed or orthologous genes, find common structural motifs Difficulties: Low sequence similarity: alignment difficult Varying flanking sequence Motif missing from some input genes 25

26 pproach hoose a bacterial genome For each gene, get close orthologs (DD) Find most promising genes, based on conserved sequence motifs (Footprinter) From those, find structural motifs (Mfinder) enome-wide search for more instances (Ravenna) Expert analyses (Breaker Lab, Yale) 26

27 enome Scale Search: Why Most riboswitches, e.g., are present in ~5 copies per genome Throughout (most of) clade More examples give better model, hence even more examples, fewer errors More examples give more clues to function - critical for wet lab verification 27

28 enome Scale Search Mfinder is directly usable for/with search Folding predictions Smart heuristics andidate alignment M Search Realign 28

29 Results Process largely complete in bacillus/clostridia gamma proteobacteria cyanobacteria actinobacteria nalysis ongoing 29

30 Some Preliminary ctino Results 8 of 10 Rfam families found Rfam Family Type (metabolite) Rank THI riboswitch (thiamine) 4 ydao-yua riboswitch (unknown) 19 obalamin riboswitch (cobalamin) 21 SRP_bact gene 28 RFN riboswitch (FMN) 39 yybp-ykoy riboswitch (unknown) 48 gcvt riboswitch (glycine) 53 S_box riboswitch (SM) 401 tmrn gene Not found RNaseP gene Not found not cisregulatory (got one anyway) 30

31 More Prelim ctino Results Many others (not in Rfam) are likely real of top 50: known (Rfam, 23S) 10 probable (Tbox, IRE, Lex, parp, pyrr) 7 probable (ribosomal genes) 9 potentially interesting 12 unknown or poor 12 One bench-verified, 2 more in progress 31

32 mrn leader mrn leader switch? 32

33 Ongoing & Future Work Still automating a few steps, e.g. identifying duplicates Improved ranking/motif significance stats Better ortholog clustering Performance & scale-up Eukaryotic mrns, e.g. TRs 33

34 cknowledgements Zizhen Yao (W) Zasha Weinberg (W Yale) Shane Neph (W) Martin Tompa (W) Ron Breaker (Yale) Jeff Barrick (Yale) 34

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo The entral Dogma DN RN Protein gene Protein DN (chromosome) cell RN (messenger) lassical RNs mrn trn rrn snrn

More information

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007 RNA Search and Motif Discovery Lectures 18-19 CSE 527 Autumn 2007 The Human Parts List, circa 2001 1 gagcccggcc cgggggacgg gcggcgggat agcgggaccc cggcgcggcg gtgcgcttca 61 gggcgcagcg gcggccgcag accgagcccc

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo http://www.cs.washington.edu/homes/ruzzo/ courses/gs541/09sp ENOME 541 protein and DN sequence analysis to determine

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo Outline Why RN? Examples of RN biology omputational hallenges Modeling Search Inference The entral Dogma DN RN

More information

RNA Secondary Structure. CSE 417 W.L. Ruzzo

RNA Secondary Structure. CSE 417 W.L. Ruzzo RN Secondary Structure SE 417 W.L. Ruzzo The Double Helix Los lamos Science The entral Dogma of Molecular Biology DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Non-coding RN Messenger

More information

RNA Search and! Motif Discovery"

RNA Search and! Motif Discovery RN Search and! Motif Discovery" Lecture 9! SEP 590! utumn 2008" Outline" Whirlwind tour of ncrn search & discovery" RN motif description (ovariance Model Review)" lgorithms for searching" Rigorous & heuristic

More information

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein! The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

ROC TPR FPR

ROC TPR FPR HW5 2 3 TPR 0.0 0.2 0.4 0.6 0.8 1.0 291 411 537 669 807 957 1098 1269 1506 1971 8703 171 51 RO TPR = True Positive Rate = Sensitivity = Recall = TP/(TP+FN) FPR = False Positive Rate = 1 Specificity = FP/(FP+TN)

More information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery SEP 527 omputational Biology RN: Function, Secondary Structure Prediction, Search, Discovery The Message ells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes Zizhen Yao 1*, Jeffrey Barrick 2, Zasha Weinberg 3, Shane Neph 1,4, Ronald Breaker 2,3,5, Martin Tompa

More information

DUPLICATED RNA GENES IN TELEOST FISH GENOMES

DUPLICATED RNA GENES IN TELEOST FISH GENOMES DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary

More information

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna. onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Introduction to Bioinformatics

Introduction to Bioinformatics CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics

More information

Mariam Alshaikh. April 15, 2015

Mariam Alshaikh. April 15, 2015 Master Thesis graph kernel approach to the identification and characterization of structured non-coding RNs using multiple sequence alignment information Mariam lshaikh pril 15, 2015 Faculty of Engineering

More information

Inferring Transcriptional Regulatory Networks from Gene Expression Data II

Inferring Transcriptional Regulatory Networks from Gene Expression Data II Inferring Transcriptional Regulatory Networks from Gene Expression Data II Lectures 9 Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday

More information

Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm,

Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, Getting statistical significance and Bayesian confidence limits for your hidden Markov model or score-maximizing dynamic programming algorithm, with pairwise alignment of sequences as an example 1,2 1

More information

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

More information

RNA Abstract Shape Analysis

RNA Abstract Shape Analysis ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:

More information

Outline. Sequence-comparison methods. Buzzzzzzzz. Why compare sequences? Gerard Kleywegt Uppsala University

Outline. Sequence-comparison methods. Buzzzzzzzz. Why compare sequences? Gerard Kleywegt Uppsala University MB330 - January, 2006 Sequence-comparison methods erard Kleywegt Uppsala University Outline! Why compare sequences?! Dotplots! airwise sequence alignments &! Multiple sequence alignments! rofile methods!

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

Detecting non-coding RNA in Genomic Sequences

Detecting non-coding RNA in Genomic Sequences Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université

More information

Discovering MultipleLevels of Regulatory Networks

Discovering MultipleLevels of Regulatory Networks Discovering MultipleLevels of Regulatory Networks IAS EXTENDED WORKSHOP ON GENOMES, CELLS, AND MATHEMATICS Hong Kong, July 25, 2018 Gary D. Stormo Department of Genetics Outline of the talk 1. Transcriptional

More information

Intro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye!

Intro Gene regulation Synteny The End. Today. Gene regulation Synteny Good bye! Today Gene regulation Synteny Good bye! Gene regulation What governs gene transcription? Genes active under different circumstances. Gene regulation What governs gene transcription? Genes active under

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence

More information

EVALUATION OF RNA SECONDARY STRUCTURE MOTIFS USING REGRESSION ANALYSIS

EVALUATION OF RNA SECONDARY STRUCTURE MOTIFS USING REGRESSION ANALYSIS EVLTION OF RN SEONDRY STRTRE MOTIFS SIN RERESSION NLYSIS Mohammad nwar School of Information Technology and Engineering, niversity of Ottawa e-mail: manwar@site.uottawa.ca bstract Recent experimental evidences

More information

Matrix-based pattern discovery algorithms

Matrix-based pattern discovery algorithms Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)

More information

Genome 541! Unit 4, lecture 3! Genomics assays

Genome 541! Unit 4, lecture 3! Genomics assays Genome 541! Unit 4, lecture 3! Genomics assays Much easier to follow with slides. Good pace.! Having the slides was really helpful clearer to read and easier to follow the trajectory of the lecture.!!

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

BIOINFORMATICS. Neighbourhood Thresholding for Projection-Based Motif Discovery. James King, Warren Cheung and Holger H. Hoos

BIOINFORMATICS. Neighbourhood Thresholding for Projection-Based Motif Discovery. James King, Warren Cheung and Holger H. Hoos BIOINFORMATICS Vol. no. 25 Pages 7 Neighbourhood Thresholding for Projection-Based Motif Discovery James King, Warren Cheung and Holger H. Hoos University of British Columbia Department of Computer Science

More information

RNAscClust clustering RNAs using structure conservation and graph-based motifs

RNAscClust clustering RNAs using structure conservation and graph-based motifs RNsclust clustering RNs using structure conservation and graph-based motifs Milad Miladi 1 lexander Junge 2 miladim@informatikuni-freiburgde ajunge@rthdk 1 Bioinformatics roup, niversity of Freiburg 2

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

Learning gene regulatory networks Statistical methods for haplotype inference Part I

Learning gene regulatory networks Statistical methods for haplotype inference Part I Learning gene regulatory networks Statistical methods for haplotype inference Part I Input: Measurement of mrn levels of all genes from microarray or rna sequencing Samples (e.g. 200 patients with lung

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Deciphering regulatory networks by promoter sequence analysis

Deciphering regulatory networks by promoter sequence analysis Bioinformatics Workshop 2009 Interpreting Gene Lists from -omics Studies Deciphering regulatory networks by promoter sequence analysis Elodie Portales-Casamar University of British Columbia www.cisreg.ca

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

More information

GENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES.

GENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES. GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES!! www.clutchprep.com GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES COCEPT: LAC OPERO An operon is a group of genes with similar functions that are

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models A selection of slides taken from the following: Chris Bystroff Protein Folding Initiation Site Motifs Iosif Vaisman Bioinformatics and Gene Discovery Colin Cherry Hidden Markov Models

More information

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription

More information

SUPPLEMENTARY METHODS

SUPPLEMENTARY METHODS SUPPLEMENTARY METHODS M1: ALGORITHM TO RECONSTRUCT TRANSCRIPTIONAL NETWORKS M-2 Figure 1: Procedure to reconstruct transcriptional regulatory networks M-2 M2: PROCEDURE TO IDENTIFY ORTHOLOGOUS PROTEINSM-3

More information

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22)

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22) Warm-Up Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22) Yesterday s Picture The first cell on Earth (approx. 3.5 billion years ago) was simple and prokaryotic,

More information

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

Inferring positional homologs with common intervals of sequences

Inferring positional homologs with common intervals of sequences Outline Introduction Our approach Results Conclusion Inferring positional homologs with common intervals of sequences Guillaume Blin, Annie Chateau, Cedric Chauve, Yannick Gingras CGL - Université du Québec

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

proteins are the basic building blocks and active players in the cell, and

proteins are the basic building blocks and active players in the cell, and 12 RN Secondary Structure Sources for this lecture: R. Durbin, S. Eddy,. Krogh und. Mitchison, Biological sequence analysis, ambridge, 1998 J. Setubal & J. Meidanis, Introduction to computational molecular

More information

CHAPTER : Prokaryotic Genetics

CHAPTER : Prokaryotic Genetics CHAPTER 13.3 13.5: Prokaryotic Genetics 1. Most bacteria are not pathogenic. Identify several important roles they play in the ecosystem and human culture. 2. How do variations arise in bacteria considering

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

Searching genomes for non-coding RNA using FastR

Searching genomes for non-coding RNA using FastR Searching genomes for non-coding RNA using FastR Shaojie Zhang Brian Haas Eleazar Eskin Vineet Bafna Keywords: non-coding RNA, database search, filtration, riboswitch, bacterial genome. Address for correspondence:

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Self Similar (Scale Free, Power Law) Networks (I)

Self Similar (Scale Free, Power Law) Networks (I) Self Similar (Scale Free, Power Law) Networks (I) E6083: lecture 4 Prof. Predrag R. Jelenković Dept. of Electrical Engineering Columbia University, NY 10027, USA {predrag}@ee.columbia.edu February 7, 2007

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

The Gene The gene; Genes Genes Allele;

The Gene The gene; Genes Genes Allele; Gene, genetic code and regulation of the gene expression, Regulating the Metabolism, The Lac- Operon system,catabolic repression, The Trp Operon system: regulating the biosynthesis of the tryptophan. Mitesh

More information

Whole Genome Alignments and Synteny Maps

Whole Genome Alignments and Synteny Maps Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g.

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g. Gene Expression- Overview Differentiating cells Achieved through changes in gene expression All cells contain the same whole genome A typical differentiated cell only expresses ~50% of its total gene Overview

More information

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide

AP Bio Module 16: Bacterial Genetics and Operons, Student Learning Guide Name: Period: Date: AP Bio Module 6: Bacterial Genetics and Operons, Student Learning Guide Getting started. Work in pairs (share a computer). Make sure that you log in for the first quiz so that you get

More information

Quantitative Bioinformatics

Quantitative Bioinformatics Chapter 9 Class Notes Signals in DNA 9.1. The Biological Problem: since proteins cannot read, how do they recognize nucleotides such as A, C, G, T? Although only approximate, proteins actually recognize

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

HIDDEN MARKOV MODEL-BASED HOMOLOGY SEARCH AND GENE PREDICTION IN NGS ERA

HIDDEN MARKOV MODEL-BASED HOMOLOGY SEARCH AND GENE PREDICTION IN NGS ERA HIDDEN MRKOV MODEL-BSED HOMOLOY SERCH ND ENE PREDICTION IN NS ER By Prapaporn Techa-angkoon DISSERTTION Submitted to Michigan State University in partial fulfillment of the requirements for the degree

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

The Riboswitch is functionally separated into the ligand binding APTAMER and the decision-making EXPRESSION PLATFORM

The Riboswitch is functionally separated into the ligand binding APTAMER and the decision-making EXPRESSION PLATFORM The Riboswitch is functionally separated into the ligand binding APTAMER and the decision-making EXPRESSION PLATFORM Purine riboswitch TPP riboswitch SAM riboswitch glms ribozyme In-line probing is used

More information

Comparative Network Analysis

Comparative Network Analysis Comparative Network Analysis BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2016 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by

More information

Frequently Asked Questions (FAQs)

Frequently Asked Questions (FAQs) Frequently Asked Questions (FAQs) Q1. What is meant by Satellite and Repetitive DNA? Ans: Satellite and repetitive DNA generally refers to DNA whose base sequence is repeated many times throughout the

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

NcRNA Homology Search Using Hamming Distance Seeds

NcRNA Homology Search Using Hamming Distance Seeds NcRN Homology Search Using Hamming Distance Seeds Osama ljawad Dept. of omputer Science and Engineering Michigan State University East Lansing, MI 48824 aljawado@msu.edu Yanni Sun Dept. of omputer Science

More information

Multiple Sequence Alignment

Multiple Sequence Alignment Multiple equence lignment Four ami Khuri Dept of omputer cience an José tate University Multiple equence lignment v Progressive lignment v Guide Tree v lustalw v Toffee v Muscle v MFFT * 20 * 0 * 60 *

More information

RNA Structure Inference & Database Search

RNA Structure Inference & Database Search RN Structure & Database School of Electrical Engineering and omputer Science niversity of Ottawa Version of October 20, 2015 www.eecs.uottawa.ca/ turcotte/teaching/bnf-5106/ RN Structure & Database 1

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models

Computational Genomics. Systems biology. Putting it together: Data integration using graphical models 02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput

More information

A DNA Sequence 2017/12/6 1

A DNA Sequence 2017/12/6 1 A DNA Sequence ccgtacgtacgtagagtgctagtctagtcgtagcgccgtagtcgatcgtgtgg gtagtagctgatatgatgcgaggtaggggataggatagcaacagatgagc ggatgctgagtgcagtggcatgcgatgtcgatgatagcggtaggtagacttc gcgcataaagctgcgcgagatgattgcaaagragttagatgagctgatgcta

More information

An Introduction to Sequence Similarity ( Homology ) Searching

An Introduction to Sequence Similarity ( Homology ) Searching An Introduction to Sequence Similarity ( Homology ) Searching Gary D. Stormo 1 UNIT 3.1 1 Washington University, School of Medicine, St. Louis, Missouri ABSTRACT Homologous sequences usually have the same,

More information

Multiple Sequence Alignment: HMMs and Other Approaches

Multiple Sequence Alignment: HMMs and Other Approaches Multiple Sequence Alignment: HMMs and Other Approaches Background Readings: Durbin et. al. Section 3.1, Ewens and Grant, Ch4. Wing-Kin Sung, Ch 6 Beerenwinkel N, Siebourg J. Statistics, probability, and

More information