Modeling and Searching for Non-Coding RNA
|
|
- Clifford Nash
- 5 years ago
- Views:
Transcription
1 Modeling and Searching for Non-oding RN W.L. Ruzzo
2 The entral Dogma DN RN Protein gene Protein DN (chromosome) cell RN (messenger)
3 lassical RNs mrn trn rrn snrn (small nuclear - spl icing) snorn (small nucleolar - guides for t/rrn modifications) RNseP (trn maturation; ribozyme in bacteria) SRP (signal recognition particle; co-translational targeting of proteins to membranes) telomerases
4 Non-coding RN Messenger RN - codes for proteins Non-coding RN - all the rest Before, say, mid 1990 s, 1-2 dozen known (critically important, but narrow roles: e.g. trn) Since mid 90 s dramatic discoveries Regulation, transport, stability/degradation E.g. microrn : 100 s in humans By some estimates, ncrn >> mrn
5 ncrn Example: Xist large (12kb?) unstructured RN required for X-inactivation in mammals
6 ncrn Example: 6S medium size (175nt) structured highly expressed in e. coli in certain growth conditions sequenced in 1971; function unknown for 30 years
7 ncrn Example: IRE Iron Response Element: a short conserved stemloop, bound by iron response proteins (IRPs). Found in TRs of various mrns whose products are involved in iron metabolism. E.g., the mrn of ferritin (an iron storage protein) contains one IRE in its 5' TR. When iron concentration is low, IRPs bind the ferritin mrn IRE. repressing translation. Binding of multiple IREs in the 3' and 5' TRs of the transferrin receptor (involved in iron acquisition) leads to increased mrn stability. These two activities form the basis of iron homeostasis in the vertebrate cell.
8 ncrn Example: MicroRNs short (~22 nt) unstructured RNs excised from ~75nt precursor hairpin approx antisense to mrn targets, often in 3 TR regulate gene activity, e.g. by destabilizing (plants) or otherwise suppressing (animals) message several hundred, w/ perhaps thousands of targets, are known
9 ncrn Example: Riboswitches TR structure that directly senses/binds small molecules & regulates mrn widespread in prokaryotes some in eukaryotes
10 E.g.: the lycine Riboswitch lycine - simplest amino acid ses - make proteins, make energy Not enough OR too much - wasteful Hypothetical answer: g prot g g prot g gcvt protein gcvt (glycine cleavage) g gcvt promoter
11 The lycine Riboswitch ctual answer (in many bacteria): Look Ma, no protein g g gcvt protein g gcvt mrn gcvt (glycine cleavage)
12 Why? RN s fold, and function Nature uses what works
13 entral Dogma = entral hicken & Egg? DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Was there once an RN World?
14 Outline ncrn: what/why? What does computation bring? How to model and search for ncrn? Faster search Better model inference
15 Iron Response Element IRE (partial seed alignment): Hom.sap. Hom.sap.. Hom.sap.. Hom.sap.... Hom.sap. Hom.sap.... Hom.sap... Hom.sap.... Hom.sap.... av.por. Mus.mus.... Mus.mus. Mus.mus. Rat.nor.... Rat.nor. SS_cons <<<<<...<<<<<...>>>>>.>>>>>
16 6S mimics an open promoter E.coli Barrick et al. RN 2005 Trotochaud et al. NSMB 2005 Willkomm et al. NR 2005
17 Dengue virus genome: ORF 3 element Known distribution (96% sequence identity) Dengue virus With our techniques (70% sequence identity) Dengue virus West Nile virus Yellow Fever Omsk Hemorrhagic fever Japanese encephalitis Tick-borne encephalitis Kunjin virus Langat virus Louping ill Murray Valley virus Powassan virus
18 polyadenylation inhibition element RN 1 small nuclear ribonucleoprotein RN element 3 TR Known distribution (90% sequence identity) Human, mouse, rabbit With our techniques (75% sequence identity) Human, mouse, rabbit Zebrafish, Tetraodon, Fugu Frog
19 Impact of RN homology search B. subtilis glycine riboswitch (Barrick, et al., 2004) operon L. innocua. tumefaciens V. cholera M. tuberculosis (and 19 more species)
20 Impact of RN homology search B. subtilis L. innocua glycine riboswitch (Barrick, et al., 2004) operon sing our techniques, we found. tumefaciens V. cholera M. tuberculosis (and 19 more species) (and 42 more species)
21 The lycine Riboswitch ctual answer (in many bacteria): Look Ma, no protein g g g g gcvt mrn gcvt protein gcvt (glycine cleavage)
22 (Mandal, Lee, Barrick, Weinberg, Emilsson, Ruzzo, Breaker, Science 2004) 5 nd More examples means better alignment nderstand phylogenetic distribution Find riboswitch in front of new gene gcvt ORF 3
23
24 RN Informatics RN: Not just a messenger anymore Dramatic discoveries Hundreds of families (besides classics like trn, rrn, snrn ) Widespread, important roles omputational tools important Discovery, characterization, annotation BT: slow, inaccurate, demanding
25 Q: What s so hard? : Structure often more important than sequence
26 omputational hallenges Search - given related RN s, find more Modeling - describe a related family Meta-modeling - what s a good modeling framework? M-based search Hand-curated alignments -> Ms ovariance Models
27 Predict Structure from Multiple Sequences ompensatory mutations reveal structure, but in usual alignment algorithms they are doubly penalized.
28 How to model an RN Motif? onceptually, start with a profile HMM: from a multiple alignment, estimate nucleotide/ insert/delete preferences for each position given a new seq, estimate likelihood that it could be generated by the model, & align it to the model mostly del ins all
29 How to model an RN Motif? ovariance Models (aka profile SF ) Probabilistic models, like profile HMMs, but adding column pairs and pair emission probabilities for base-paired regions <<<<<<< paired columns >>>>>>>
30 RN sequence analysis using covariance models Eddy & Durbin Nucleic cids Research, 1994 vol 22 #11,
31 What probabilistic model for RN families The ovariance Model Stochastic ontext-free rammar generalization of a profile HMM lgorithms for Training From aligned or unaligned sequences utomates comparative analysis omplements Nusinov/Zucker RN folding lgorithms for searching
32 Main Results Very accurate search for trn (Precursor to trnscanse - current favorite) iven sufficient data, model construction comparable to, but not quite as good as, human experts Some quantitative info on importance of pseudoknots and other tertiary features
33 Probabilistic Model Search s with HMMs, given a sequence, you calculate llikelihood ratio that the model could generate the sequence, vs a background model You set a score threshold nything above threshold => a hit Scoring: Forward / Inside algorithm - sum over all paths Viterbi approximation - find single best path (Bonus: alignment & structure prediction)
34 Example: searching for trns
35 lignment Quality
36 omparison to TRNSN Fichant & Burks - best heuristic then 97.5% true positive 0.37 false positives per MB M 1415 (trained on trusted alignment) > 99.98% true positives <0.2 false positives per MB urrent method-of-choice is trnscanse, a Mbased scan with heuristic pre-filtering (including TRNSN?) for performance reasons. Slightly different evaluation criteria
37 M Structure : Sequence + structure B: the M guide tree : probabilities of letters/ pairs & of indels Think of each branch being an HMM emitting both sides of a helix (but 3 side emitted in reverse order)
38 Overall M rchitecture One box ( node ) per node of guide tree BE/MTL/INS/DEL just like an HMM MTP & BIF are the key additions: MTP emits pairs of symbols, modeling base-pairs; BIF allows multiple helices
39 M Viterbi lignment x i = i th letter of input x ij = substring i,..., j of input T yz = P(transition y " z) y E xi,x j = P(emission of x i, x j from state y) S ij y = max # log P(x ij generated starting in state y via path #)
40 Viterbi, cont. S ij y = max " log P(x ij generated starting in state y via path ") S ij y = % z max z [S i+1, j#1 ' z max z [S i+1, j ' z & max z [S i, j#1 ' z ' max z [S i, j (' max i<k$ j [S i,k y left y + logt yz + log E xi,x j y + logt yz + log E xi ] match pair ] match/insert left + logt yz + log E x j ] match/insert right + logt yz ] delete y + S right k +1, j ] bifurcation
41 Model Training
42 Mutual Information M ij = f " f xi,xj xi,xj log xi,xj 2 f xi f xj ; 0 # M ij # 2 Max when no sequence conservation but perfect pairing MI = expected score gain from using a pair state Finding optimal MI, (i.e. optimal pairing of columns) is NP-hard(?) Finding optimal MI without pseudoknots can be done by dynamic programming
43
44 MI-Based Structure-Learning # S i+1, j % S S i, j = max$ i, j"1 S i+1, j"1 + M i, j &% max i< j<k S i,k + S k +1, j just like Nussinov/Zucker folding BT, need enough data---enough sequences at right phylogenetic distance
45 Pseudoknots disallowed allowed # n % & " max j M $ i=1 i, j (/2 '
46
47 ccelerating M search Zasha Weinberg & W.L. Ruzzo Recomb 04, Bioinformatics 04, 06
48 Rfam database (Release 7.0, 3/2005) 503 ncrn families 280,000 annotated ncrns 8 riboswitches, 235 small nucleolar RNs, 8 spliceosomal RNs, 10 bacterial antisense RNs, 46 microrns, 9 ribozymes, 122 cis RN regulatory elements,
49 Rfam Input (hand-tuned): MS SS_cons Score Thresh T Window Len W Output: M scan results IRE (partial seed alignment): Hom.sap. Hom.sap.. Hom.sap.. Hom.sap.... Hom.sap. Hom.sap.... Hom.sap... Hom.sap.... Hom.sap.... av.por. Mus.mus.... Mus.mus. Mus.mus. Rat.nor.... Rat.nor. SS_cons <<<<<...<<<<<...>>>>>.>>>>>
50 ovariance Model Key difference of M vs HMM: Pair states emit paired symbols, corresponding to base-paired nucleotides; 16 emission probabilities here.
51 M s are good, but slow Rfam Reality EMBL Our Work EMBL Rfam oal EMBL Blast Z junk M hits 1 month, 1000 computers M hits ~2 months, 1000 computers junk M hits 10 years, 1000 computers
52 Oversimplified M (for pedagogical purposes only)
53 M to HMM 25 emisions per state 5 emissions per state, 2x states M HMM
54 Key Issue: 25 scores 10 M HMM Need: log Viterbi scores M HMM
55 Viterbi/Forward Scoring Path π defines transitions/emissions Score(π) = product of probabilities on π NB: ok if probabilities aren t, e.g. E.g. in M, emissions are odds ratios vs 0thorder background For any nucleotide sequence x: Viterbi-score(x) = max{ score(π) π emits x} Forward-score(x) = score(π) π emits x}
56 Key Issue: 25 scores 10 M L R HMM Need: log Viterbi scores M HMM P L + R P L + R P L + R P L + R P L + R P L + R P L + R P L + R P L + R P L + R NB:HMM not a prob. model
57 Rigorous Filtering ny scores satisfying the linear inequalities give rigorous filtering P L + R P L + R P L + R P L + R P L + R Proof: M Viterbi path score corresponding HMM path score Viterbi HMM path score (even if it does not correspond to any M path)
58 Some scores filter better P = 1 L + R P = 4 L + R ssuming 25% Option 1: Opt 1: L = R = R = 2 L + (R + R )/2 = 4 Option 2: Opt 2: L = 0, R = 1, R = 4 L + (R + R )/2 = 2.5
59 Optimizing filtering For any nucleotide sequence x: Viterbi-score(x) = max{ score(π) π emits x } Forward-score(x) = score(π) π emits x } Expected Forward Score E(L i, R i ) = x Forward-score(x)*Pr(x) NB: E is a function of L i, R i only nder 0th-order background model Optimization: Minimize E(L i, R i ) subject to score L.I.s This is heuristic ( forward Viterbi filter ) But still rigorous because subject to score L.I.s
60 alculating E(L i, R i ) E(L i, R i ) = x Forward-score(x)*Pr(x) Forward-like: for every state, calculate expected score for all paths ending there, easily calculated from expected scores of predecessors & transition/ emission probabilities/scores
61 Minimizing E(L i, R i ) alculate E(L i, R i ) symbolically, in terms of emission scores, so we can do partial derivatives for a numerical convex optimization algorithm "E(L 1,L 2,...) "L i
62 What should the probabilities be? onvex optimization problem onstraints: enforce rigorous property Objective function: filter as aggressively as possible Problem sizes: variables inequality constraints
63 Estimated Filtering Efficiency (139 Rfam 4.0 families) Filtering fraction # families (compact) # families (expanded) break even < verages 283 times faster than M
64 Results: buried treasures Name # found BLST + M # found rigorous filter + M # new Pyrococcus snorn Iron response element Histone 3 element Purine riboswitch Retron msr Hammerhead I Hammerhead III snrn S-box snrn snrn snrn
65 What if filtering is poor? Profile HMM filter discards structure info Surprise is that they usually do very well But not always; e.g. a dozen families with filtering >.01, including stars like trn Three ideas: Sub M: graft in SM for a critical part Store Pair: retain a few critical pairs Filter hains: run fast, crude filters first
66 Full M Profile HMM Sub-M filters Sub-M sub-m Sub-profile-HMM
67 Store-pair filters Full M Store pair Profile HMM:
68 T Filter hains Rigorous filter Rigorous filter Rigorous filter M ncrns
69 Why run filters in series? Filter 1 Filter 2 M Filtering fraction N/ Run time (sec/kbase) M alone: 200 s/kb Filter 2 M: *200 = 12 s/kb Filter 1 Filter 2 M: * *200 = 5.5 s/kb
70 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Properties of a filter: Filtering fraction Run time (sec/kb)
71 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Simplified performance model (selectivity and speed) Independence assumptions for base pairs se dynamic programming to rapidly explore base pair combinations
72 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Optimal rigorous filter series E
73 Estimated M time (days) faster Results: faster M: 30 years (your career) Filters: 1 month (time between school terms) 100 faster 10 faster Rigorous series of filters + M time (days)
74 Results: more sensitive than BLST # with BLST+M # with rigorous filters + M # new Rfam trn roup II intron Iron response element tmrn Lysine riboswitch nd more
75 Is there anything more to do? Rigorous filters can be too cautious E.g., 10 times slower than heuristic filters Yet only 1-3% more sensitive We want to Run scans faster with minimal loss of sensitivity Know empirically what sensitivity we re losing
76 Input Multiple Sequence lignment <.> Heuristic Profile HMMs M Base paired columns Infinite Multiple sequence alignments <.>... (Weinberg & Ruzzo, 2006) Profile HMM
77 RO-like curves (lysine riboswitch) Filter sensitivity HMM BLST Filter sends 80% of hits to M Filtering fraction
78 trnscan-se: the leading brand (Lowe & Eddy 1997) Designed for trns sed in virtually every genome project ses Ms Heuristics: selected 2 trn detection programs n ambitious target to shoot for
79 trnscan-se vs. heuristic HMMs rchaea Eubacteria. elegans Drosophila Human Sensitivity (%) t-se HMM Run time (hours) t-se HMM rigor (Each filter sends same number of nucleotides to M)
80 trnscan-se vs. heuristic HMMs Time to create heuristic filters for trns trnscan-se 8 papers HMM 10 seconds to type a command 15 minutes to create & calibrate HMM Point is not that heuristic HMM is better than trnscan- SE --- it s not; point is that it s in the ballpark, so may be easy way to get useful results for new families.
81 Building M s Hand-curated alignments + structure as in Rfam are great, but it doesn t scale Example pplication: iven 5-20 upstream regions (~500 nt) of orthologous bacterial genes, some (but not all) plausibly regulated by a common riboswitch, could we find it?
82 MFinder Harder: Finding Ms without alignment Yao, Weinberg & Ruzzo, Bioinformatics, 2006 Folding predictions Smart heuristics andidate alignment M Search Realign
83 Mfinder ccuracy (on Rfam families with flanking sequence) /W /W
84 Importance of lignment Blue boxes, e.g., should be lined up. Structure is invisible otherwise.
85 Early Semi-automated Example Started with 16 genes orthologous to fol in B. subtilis Found 10 sharing good structural motif Searched all bacterial genomes for this motif Found 234 hits Realigned these to refine structural motif Found 367 hits 257 match RFM s T-box (Based on hand-curated alignment of 67 knowns)
86
87 Initial orthologs Found by scan hloroflexus aurantiacus hloroflexi eobacter metallireducens eobacter sulphurreducens δ -Proteobacteria Symbiobacterium thermophilum
88 n approach for cis-regulatory RN discovery in bacteria 1. hoose a bacterial genome 2. For each gene, collect close orthologs 3. Find most promising genes, based on sequence motifs conserved among orthologs 4. From those, find most promising genes, incorporating structure in the motifs 5. From those, genome-wide searches for more instances 6. Expert analyses (Breaker Lab, Yale)
89 enome Scale Search: Why Most riboswitches, e.g., are present in ~5 copies per genome Throughout (most of) clade More examples give better model, hence even more examples, fewer errors More examples give more clues to function
90 enome Scale Search: How Mfinder is directly usable for/with search Folding predictions Smart heuristics andidate alignment M Search Realign
91 Results Process largely complete in bacillus/clostridia gamma proteobacteria cyanobacteria actinobacteria nalysis ongoing
92 Some Preliminary ctino Results Rfam Family Type (metabolite) Rank THI riboswitch (thiamine) 4 ydao-yua riboswitch (unknown) 19 obalamin riboswitch (cobalamin) 21 SRP_bact gene 28 RFN riboswitch (FMN) 39 yybp-ykoy riboswitch (unknown) 48 gcvt riboswitch (glycine) 53 S_box riboswitch (SM) 401 tmrn gene Not found RNaseP gene Not found not cisregulatory
93 More Prelim ctino Results Many others (not in Rfam) are likely real of top 50: known (Rfam, 23S) 10 probable (Tbox, IRE, Lex, parp, pyrr) 7 ribosomal genes 9 potentially interesting 12 unknown or poor 12 One other being bench-verified
94 Software Infernal - (Eddy et al.) most of Eddy & Durbin RaveNna - (Weinberg) fast filtering Mfinder - (Yao) Motif discovery (local alignment)
95 Summary ncrn is a hot topic For family homology modeling: Ms Training & search like HMM (but slower) Dramatic acceleration possible utomated model construction Hopefully leading to new discoveries
Searching for Noncoding RNA
Searching for Noncoding RN Larry Ruzzo omputer Science & Engineering enome Sciences niversity of Washington http://www.cs.washington.edu/homes/ruzzo Bio 2006, Seattle, 8/4/2006 1 Outline Noncoding RN Why
More informationRNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"
RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure
More informationRNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007
RNA Search and Motif Discovery Lectures 18-19 CSE 527 Autumn 2007 The Human Parts List, circa 2001 1 gagcccggcc cgggggacgg gcggcgggat agcgggaccc cggcgcggcg gtgcgcttca 61 gggcgcagcg gcggccgcag accgagcccc
More informationGenome 559 Wi RNA Function, Search, Discovery
Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,
More informationRNA Secondary Structure. CSE 417 W.L. Ruzzo
RN Secondary Structure SE 417 W.L. Ruzzo The Double Helix Los lamos Science The entral Dogma of Molecular Biology DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Non-coding RN Messenger
More informationModeling and Searching for Non-Coding RNA
Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo Outline Why RN? Examples of RN biology omputational hallenges Modeling Search Inference The entral Dogma DN RN
More informationThe Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!
The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral
More informationModeling and Searching for Non-Coding RNA
Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo http://www.cs.washington.edu/homes/ruzzo/ courses/gs541/09sp ENOME 541 protein and DN sequence analysis to determine
More informationRNA Search and! Motif Discovery"
RN Search and! Motif Discovery" Lecture 9! SEP 590! utumn 2008" Outline" Whirlwind tour of ncrn search & discovery" RN motif description (ovariance Model Review)" lgorithms for searching" Rigorous & heuristic
More informationROC TPR FPR
HW5 2 3 TPR 0.0 0.2 0.4 0.6 0.8 1.0 291 411 537 669 807 957 1098 1269 1506 1971 8703 171 51 RO TPR = True Positive Rate = Sensitivity = Recall = TP/(TP+FN) FPR = False Positive Rate = 1 Specificity = FP/(FP+TN)
More informationIn Genomes, Two Types of Genes
In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns
More informationCSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery
SEP 527 omputational Biology RN: Function, Secondary Structure Prediction, Search, Discovery The Message ells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex
More informationConserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.
onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence
More informationStephen Scott.
1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for
More informationA Method for Aligning RNA Secondary Structures
Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN
More informationDUPLICATED RNA GENES IN TELEOST FISH GENOMES
DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns
More informationBio 1B Lecture Outline (please print and bring along) Fall, 2007
Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution
More informationA graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information
graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of
More informationDetecting non-coding RNA in Genomic Sequences
Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université
More informationStatistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationIntroduction to Molecular and Cell Biology
Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What
More informationStatistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences
Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department
More information2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology
2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the
More informationComparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey
Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes
More informationL3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus
L3.1: Circuits: Introduction to Transcription Networks Cellular Design Principles Prof. Jenna Rickus In this lecture Cognitive problem of the Cell Introduce transcription networks Key processing network
More informationBackground: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)
Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationproteins are the basic building blocks and active players in the cell, and
12 RN Secondary Structure Sources for this lecture: R. Durbin, S. Eddy,. Krogh und. Mitchison, Biological sequence analysis, ambridge, 1998 J. Setubal & J. Meidanis, Introduction to computational molecular
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationHow much non-coding DNA do eukaryotes require?
How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics
More informationPhylogeny Tree Algorithms
Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationGene Expression. Molecular Genetics, March, 2018
Gene Expression Molecular Genetics, March, 2018 Gene Expression Control of Protein Levels Bacteria Lac Operon Promoter mrna Inducer CAP Control Trp Operon RepressorOperator Control Attenuation Riboswitches
More informationGenome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.
Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationHonors Biology Reading Guide Chapter 11
Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins
More information3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.
3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of
More informationGenomes and Their Evolution
Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from
More information+ regulation. ribosomes
central dogma + regulation rpl DNA tsx rrna trna mrna ribosomes tsl ribosomal proteins structural proteins transporters enzymes srna regulators RNAp DNAp tsx initiation control by transcription factors
More informationSequence Alignment (chapter 6)
Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationComputational approaches for functional genomics
Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding
More informationNewly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:
m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail
More informationHMMs and biological sequence analysis
HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More informationEnsembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:
Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,
More informationCopyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained
Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.
More informationName: SBI 4U. Gene Expression Quiz. Overall Expectation:
Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):
More informationControlling Gene Expression
Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationRNA Abstract Shape Analysis
ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:
More informationBIOINFORMATICS LAB AP BIOLOGY
BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to
More informationLecture 18 June 2 nd, Gene Expression Regulation Mutations
Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase
More informationGCD3033:Cell Biology. Transcription
Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors
More informationProkaryotic Regulation
Prokaryotic Regulation Control of transcription initiation can be: Positive control increases transcription when activators bind DNA Negative control reduces transcription when repressors bind to DNA regulatory
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationGENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES.
GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES!! www.clutchprep.com GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES COCEPT: LAC OPERO An operon is a group of genes with similar functions that are
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationSearching genomes for non-coding RNA using FastR
Searching genomes for non-coding RNA using FastR Shaojie Zhang Brian Haas Eleazar Eskin Vineet Bafna Keywords: non-coding RNA, database search, filtration, riboswitch, bacterial genome. Address for correspondence:
More informationEukaryotic vs. Prokaryotic genes
BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,
More informationPredicting RNA Secondary Structure
7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for
More informationMarkov Models & DNA Sequence Evolution
7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under
More informationHidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)
Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P
More informationBio 119 Bacterial Genomics 6/26/10
BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis
More informationComparative Bioinformatics Midterm II Fall 2004
Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans
More informationBLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010
BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for
More informationComplete all warm up questions Focus on operon functioning we will be creating operon models on Monday
Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA
More informationOrganization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p
Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=
More informationAlignment Algorithms. Alignment Algorithms
Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.
More informationIntroduction. Gene expression is the combined process of :
1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression
More informationGenomics and bioinformatics summary. Finding genes -- computer searches
Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence
More informationIntroduction to Hidden Markov Models for Gene Prediction ECE-S690
Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence
More informationRelated Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.
CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio Review Autumn 2004 Larry Ruzzo Related Courses He who asks is a fool for five minutes, but he who does not ask remains
More informationO 3 O 4 O 5. q 3. q 4. Transition
Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in
More informationEarly History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species
Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0
More informationMotifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.
Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer
More informationCMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison
CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture
More informationLecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008
Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically
More informationREVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E
REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More informationCHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON
PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter
More informationToday s Lecture: HMMs
Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models
More informationSequence alignment methods. Pairwise alignment. The universe of biological sequence analysis
he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction
More informationGENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications
1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription
More informationRegulation of Transcription in Eukaryotes
Regulation of Transcription in Eukaryotes Leucine zipper and helix-loop-helix proteins contain DNA-binding domains formed by dimerization of two polypeptide chains. Different members of each family can
More informationBioinformatics Exercises
Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted
More informationThe Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11
The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding
More informationThe Gene The gene; Genes Genes Allele;
Gene, genetic code and regulation of the gene expression, Regulating the Metabolism, The Lac- Operon system,catabolic repression, The Trp Operon system: regulating the biosynthesis of the tryptophan. Mitesh
More information3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,
More informationA Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes
A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes Zizhen Yao 1*, Jeffrey Barrick 2, Zasha Weinberg 3, Shane Neph 1,4, Ronald Breaker 2,3,5, Martin Tompa
More informationRNA Protein Interaction
Introduction RN binding in detail structural analysis Examples RN Protein Interaction xel Wintsche February 16, 2009 xel Wintsche RN Protein Interaction Introduction RN binding in detail structural analysis
More informationWarm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22)
Warm-Up Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22) Yesterday s Picture The first cell on Earth (approx. 3.5 billion years ago) was simple and prokaryotic,
More informationFrom gene to protein. Premedical biology
From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,
More informationBLAST. Varieties of BLAST
BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database
More informationMultiple Sequence Alignment using Profile HMM
Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,
More informationHMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington
More informationNeural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this
More informationMultiple Sequence Alignment
Multiple equence lignment Four ami Khuri Dept of omputer cience an José tate University Multiple equence lignment v Progressive lignment v Guide Tree v lustalw v Toffee v Muscle v MFFT * 20 * 0 * 60 *
More information