Modeling and Searching for Non-Coding RNA

Size: px
Start display at page:

Download "Modeling and Searching for Non-Coding RNA"

Transcription

1 Modeling and Searching for Non-oding RN W.L. Ruzzo

2 Outline Why RN? Examples of RN biology omputational hallenges Modeling Search Inference

3 The entral Dogma DN RN Protein gene Protein DN (chromosome) cell RN (messenger)

4 lassical RNs mrn trn rrn snrn (small nuclear - spl icing) snorn (small nucleolar - guides for t/rrn modifications) RNseP (trn maturation; ribozyme in bacteria) SRP (signal recognition particle; co-translational targeting of proteins to membranes) telomerases

5 Non-coding RN Messenger RN - codes for proteins Non-coding RN - all the rest Before, say, mid 1990 s, 1-2 dozen known (critically important, but narrow roles: e.g. trn) Since mid 90 s dramatic discoveries Regulation, transport, stability/degradation E.g. microrn : 100 s in humans By some estimates, ncrn >> mrn

6 RN Secondary Structure: RN makes helices too Base pairs U U U U U

7 RN on the Rise In humans more RN- than DN-binding proteins? much more conserved DN than coding MUH more transcribed DN than coding In bacteria regulation of MNY genes involves RN dozens of classes & thousands of new examples in just last 5 years

8 Human Predictions Evofold S Pedersen, Bejerano, Siepel, K Rosenbloom, K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of conserved RN secondary structures in the human genome." PLoS omput. Biol., 2, #4 (2006) e33. 48,479 candidates (~70% FDR?)

9 RNz S Washietl, IL Hofacker, M Lukasser, H tenhofer, PF Stadler, "Mapping of conserved RN secondary structures predicts thousands of functional noncoding RNs in the human genome." Nat. Biotechnol., 23, #11 (2005) ,000 structured RN elements 1,000 conserved across all vertebrates. ~1/3 in introns of known genes, ~1/6 in UTRs ~1/2 located far from any known gene

10 FOLDLIN E Torarinsson, M Sawera, JH Havgaard, M Fredholm, J orodkin, "Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RN structure." enome Res., 16, #7 (2006) candidates from (of 100,000) pairs

11 Mfinder unpublished 6500 candidates in ENODE alone (again high FDR)

12 Fastest Human ene?

13 Origin of Life? Life needs information carrier: DN molecular machines, like enzymes: Protein making proteins needs DN + RN + proteins making (duplicating) DN needs proteins Horrible circularities! How could it have arisen in an abiotic environment?

14 Origin of Life? RN can carry information too (RN double helix) RN can form complex structures RN enzymes exist (ribozymes) The RN world hypothesis: 1st life was RN-based

15 ncrn Example: Xist large (12kb?) unstructured RN required for X-inactivation in mammals

16 ncrn Example: 6S medium size (175nt) structured highly expressed in e. coli in certain growth conditions sequenced in 1971; function unknown for 30 years

17 6S mimics an open promoter E.coli Barrick et al. RN 2005 Trotochaud et al. NSMB 2005 Willkomm et al. NR 2005

18 ncrn Example: IRE Iron Response Element: a short conserved stemloop, bound by iron response proteins (IRPs). Found in UTRs of various mrns whose products are involved in iron metabolism. E.g., the mrn of ferritin (an iron storage protein) contains one IRE in its 5' UTR. When iron concentration is low, IRPs bind the ferritin mrn IRE. repressing translation. Binding of multiple IREs in the 3' and 5' UTRs of the transferrin receptor (involved in iron acquisition) leads to increased mrn stability. These two activities form the basis of iron homeostasis in the vertebrate cell.

19 Iron Response Element IRE (partial seed alignment): Hom.sap. UUUUUUUUUU Hom.sap. UUUUU.UUUUUUU Hom.sap. UUUUUUUUUU. Hom.sap. UUUU..UUUU.U Hom.sap. UUUUUUUUUUU Hom.sap. UUU..UUUU.UU Hom.sap. UUU..UUUUUU Hom.sap. UUU..UUU.UU Hom.sap. UUU..UUU.UU av.por. UUUUUUUU Mus.mus. UUU..UUU.UU Mus.mus. UUUUUUUUU Mus.mus. UUUUUUUUU Rat.nor. UUU..UU.UU Rat.nor. UUUUUUUUUU SS_cons <<<<<...<<<<<...>>>>>.>>>>>

20 ncrn Example: MicroRNs short (~22 nt) unstructured RNs excised from ~75nt precursor hairpin approx antisense to mrn targets, often in 3 UTR regulate gene activity, e.g. by destabilizing (plants) or otherwise suppressing (animals) message several hundred, w/ perhaps thousands of targets, are known

21 ncrn Example: T-boxes

22 ncrn Example: Riboswitches UTR structure that directly senses/binds small molecules & regulates mrn widespread in prokaryotes some in eukaryotes

23

24 ene Regulation: The MET Repressor SM Protein lberts, et al, 3e. DN

25 lberts, et al, 3e. The protein way Riboswitch alternatives (2 of 4 shown) orbino et al., enome Biol. 2005

26 Example: lycine Regulation How is glycine level regulated? Plausible answer: g TF g g TF g gce protein glycine cleavage enzyme gene g DN transcription factors (proteins) bind to DN to turn nearby genes on or off

27 The lycine Riboswitch ctual answer (in many bacteria): g g gce protein g g 5 3 gce mrn glycine cleavage enzyme gene DN Mandal et al. Science 2004

28 Why? RN s fold, and function Nature uses what works

29 Impact of RN homology search B. subtilis glycine riboswitch (Barrick, et al., 2004) operon L. innocua. tumefaciens V. cholera M. tuberculosis (and 19 more species)

30 Impact of RN homology search B. subtilis L. innocua glycine riboswitch (Barrick, et al., 2004) operon Using our techniques, we found. tumefaciens V. cholera M. tuberculosis (and 19 more species) (and 42 more species)

31 RN Informatics RN: Not just a messenger anymore Dramatic discoveries Hundreds of families (besides classics like trn, rrn, snrn ) Widespread, important roles omputational tools important Discovery, characterization, annotation BUT: slow, inaccurate, demanding

32 Q: What s so hard? U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U U : Structure often more important than sequence

33 omputational hallenges Search - given related RN s, find more Modeling - describe a related family Meta-modeling - what s a good modeling framework? M-based search Hand-curated alignments -> Ms ovariance Models

34 Predict Structure from Multiple Sequences U U U U U U ompensatory mutations reveal structure, but in usual alignment algorithms they are doubly penalized.

35 How to model an RN Motif? onceptually, start with a profile HMM: from a multiple alignment, estimate nucleotide/ insert/delete preferences for each position given a new seq, estimate likelihood that it could be generated by the model, & align it to the model mostly del ins all

36 How to model an RN Motif? ovariance Models (aka profile SF ) Probabilistic models, like profile HMMs, but adding column pairs and pair emission probabilities for base-paired regions <<<<<<< paired columns >>>>>>>

37 RN sequence analysis using covariance models Eddy & Durbin Nucleic cids Research, 1994 vol 22 #11,

38 What probabilistic model for RN families The ovariance Model Stochastic ontext-free rammar generalization of a profile HMM lgorithms for Training From aligned or unaligned sequences utomates comparative analysis omplements Nusinov/Zucker RN folding lgorithms for searching

39 Main Results Very accurate search for trn (Precursor to trnscanse - current favorite) iven sufficient data, model construction comparable to, but not quite as good as, human experts Some quantitative info on importance of pseudoknots and other tertiary features

40 Probabilistic Model Search s with HMMs, given a sequence, you calculate llikelihood ratio that the model could generate the sequence, vs a background model You set a score threshold nything above threshold => a hit Scoring: Forward / Inside algorithm - sum over all paths Viterbi approximation - find single best path (Bonus: alignment & structure prediction)

41 Example: searching for trns

42 lignment Quality

43 omparison to TRNSN Fichant & Burks - best heuristic then 97.5% true positive 0.37 false positives per MB M 1415 (trained on trusted alignment) > 99.98% true positives <0.2 false positives per MB urrent method-of-choice is trnscanse, a Mbased scan with heuristic pre-filtering (including TRNSN?) for performance reasons. Slightly different evaluation criteria

44 M Structure : Sequence + structure B: the M guide tree : probabilities of letters/ pairs & of indels Think of each branch being an HMM emitting both sides of a helix (but 3 side emitted in reverse order)

45 Overall M rchitecture One box ( node ) per node of guide tree BE/MTL/INS/DEL just like an HMM MTP & BIF are the key additions: MTP emits pairs of symbols, modeling base-pairs; BIF allows multiple helices

46 M Viterbi lignment x i = i th letter of input x ij = substring i,..., j of input T yz = P(transition y " z) y E xi,x j = P(emission of x i, x j from state y) S ij y = max # log P(x ij generated starting in state y via path #)

47 Viterbi, cont. S ij y = max " log P(x ij generated starting in state y via path ") S ij y = % z max z [S i+1, j#1 ' z max z [S i+1, j ' z & max z [S i, j#1 ' z ' max z [S i, j (' max i<k$ j [S i,k y left y + logt yz + log E xi,x j y + logt yz + log E xi ] match pair ] match/insert left + logt yz + log E x j ] match/insert right + logt yz ] delete y + S right k +1, j ] Time O(qn 3 ), q states, seq len n compare: O(qn) for profile HMM bifurcation

48 Model Training

49 Mutual Information M ij = f " f xi,xj xi,xj log xi,xj 2 f xi f xj ; 0 # M ij # 2 Max when no seq conservation but perfect pairing MI = expected score gain from using a pair state Finding optimal MI, (i.e. opt pairing of cols) is hard(?) Finding optimal MI without pseudoknots can be done by dynamic programming

50 M.I. Example (rtificial) * * MI: U U U U U U U U U U U U U U U U U U U U 2 0 U U 1 U U U U U U U U U U U U U U U U U U ols 1 & 9, 2 & 8: perfect conservation & might be basepaired, but unclear whether they are. M.I. = 0 ols 3 & 7: No conservation, but always W- pairs, so seems likely they do base-pair. M.I. = 2 bits U ols 7->6: unconserved, but each letter in 7 has only 2 possible mates in 6. M.I. = 1 bit.

51

52

53 MI-Based Structure-Learning # S i+1, j % S S i, j = max$ i, j"1 S i+1, j"1 + M i, j &% max i< j<k S i,k + S k +1, j just like Nussinov/Zucker folding BUT, need enough data---enough sequences at right phylogenetic distance

54 Pseudoknots disallowed allowed # n % & " max j M $ i=1 i, j (/2 '

55

56 ccelerating M search Zasha Weinberg & W.L. Ruzzo Recomb 04, Bioinformatics 04, 06

57 Rfam database (Release 7.0, 3/2005) 503 ncrn families 280,000 annotated ncrns 8 riboswitches, 235 small nucleolar RNs, 8 spliceosomal RNs, 10 bacterial antisense RNs, 46 microrns, 9 ribozymes, 122 cis RN regulatory elements,

58 Rfam Input (hand-tuned): MS SS_cons Score Thresh T Window Len W Output: M scan results IRE (partial seed alignment): Hom.sap. UUUUUUUUUU Hom.sap. UUUUU.UUUUUUU Hom.sap. UUUUUUUUUU. Hom.sap. UUUU..UUUU.U Hom.sap. UUUUUUUUUUU Hom.sap. UUU..UUUU.UU Hom.sap. UUU..UUUUUU Hom.sap. UUU..UUU.UU Hom.sap. UUU..UUU.UU av.por. UUUUUUUU Mus.mus. UUU..UUU.UU Mus.mus. UUUUUUUUU Mus.mus. UUUUUUUUU Rat.nor. UUU..UU.UU Rat.nor. UUUUUUUUUU SS_cons <<<<<...<<<<<...>>>>>.>>>>>

59 ovariance Model Key difference of M vs HMM: Pair states emit paired symbols, corresponding to base-paired nucleotides; 16 emission probabilities here.

60 M s are good, but slow Rfam Reality EMBL Our Work EMBL Rfam oal EMBL Blast Z junk M hits 1 month, 1000 computers M hits ~2 months, 1000 computers junk M hits 10 years, 1000 computers

61 Oversimplified M (for pedagogical purposes only) U U U U

62 M to HMM 25 emisions per state 5 emissions per state, 2x states U U U U U U U U M HMM

63 Key Issue: 25 scores 10 M U U U U HMM Need: log Viterbi scores M HMM

64 Viterbi/Forward Scoring Path π defines transitions/emissions Score(π) = product of probabilities on π NB: ok if probabilities aren t, e.g. E.g. in M, emissions are odds ratios vs 0thorder background For any nucleotide sequence x: Viterbi-score(x) = max{ score(π) π emits x} Forward-score(x) = score(π) π emits x}

65 Key Issue: 25 scores 10 M U U U L R U HMM Need: log Viterbi scores M HMM P L + R P L + R P L + R P U L + R U P L + R P L + R P L + R P L + R P U L + R U P L + R NB:HMM not a prob. model

66 Rigorous Filtering ny scores satisfying the linear inequalities give rigorous filtering P L + R P L + R P L + R P U L + R U P L + R Proof: M Viterbi path score corresponding HMM path score Viterbi HMM path score (even if it does not correspond to any M path)

67 Some scores filter better P U = 1 L U + R P U = 4 L U + R ssuming U 25% Option 1: Opt 1: L U = R = R = 2 L U + (R + R )/2 = 4 Option 2: Opt 2: L U = 0, R = 1, R = 4 L U + (R + R )/2 = 2.5

68 Optimizing filtering For any nucleotide sequence x: Viterbi-score(x) = max{ score(π) π emits x } Forward-score(x) = score(π) π emits x } Expected Forward Score E(L i, R i ) = x Forward-score(x)*Pr(x) NB: E is a function of L i, R i only Under 0th-order background model Optimization: Minimize E(L i, R i ) subject to score L.I.s This is heuristic ( forward Viterbi filter ) But still rigorous because subject to score L.I.s

69 alculating E(L i, R i ) E(L i, R i ) = x Forward-score(x)*Pr(x) Forward-like: for every state, calculate expected score for all paths ending there, easily calculated from expected scores of predecessors & transition/ emission probabilities/scores

70 Minimizing E(L i, R i ) alculate E(L i, R i ) symbolically, in terms of emission scores, so we can do partial derivatives for numerical convex optimization algorithm "E(L 1, L 2,...) "L i Forward: Viterbi:

71 What should the probabilities be? onvex optimization problem onstraints: enforce rigorous property Objective function: filter as aggressively as possible Problem sizes: variables inequality constraints

72 Estimated Filtering Efficiency (139 Rfam 4.0 families) Filtering fraction # families (compact) # families (expanded) break even < verages 283 times faster than M

73 Results: buried treasures Name # found BLST + M # found rigorous filter + M # new Pyrococcus snorn Iron response element Histone 3 element Purine riboswitch Retron msr Hammerhead I Hammerhead III U4 snrn S-box U6 snrn U5 snrn U7 snrn

74 What if filtering is poor? Profile HMM filter discards structure info Surprise is that they usually do very well But not always; e.g. a dozen families with filtering >.01, including stars like trn Three ideas: Sub M: graft in SM for a critical part Store Pair: retain a few critical pairs Filter hains: run fast, crude filters first

75 Full M Profile HMM Sub-M filters U U U UUU Sub-M sub-m U UU Sub-profile-HMM

76 Store-pair filters Full M U U U Store pair UUU Profile HMM:

77 T Filter hains Rigorous filter Rigorous filter Rigorous filter M ncrns

78 Why run filters in series? Filter 1 Filter 2 M Filtering fraction N/ Run time (sec/kbase) M alone: 200 s/kb Filter 2 M: *200 = 12 s/kb Filter 1 Filter 2 M: * *200 = 5.5 s/kb

79 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Properties of a filter: Filtering fraction Run time (sec/kb)

80 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Simplified performance model (selectivity and speed) Independence assumptions for base pairs Use dynamic programming to rapidly explore base pair combinations

81 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Optimal rigorous filter series E

82 Estimated M time (days) faster Results: faster M: 30 years (your career) Filters: 1 month (time between school terms) 100 faster 10 faster Rigorous series of filters + M time (days)

83 Results: more sensitive than BLST # with BLST+M # with rigorous filters + M # new Rfam trn roup II intron Iron response element tmrn Lysine riboswitch nd more

84 Is there anything more to do? Rigorous filters can be too cautious E.g., 10 times slower than heuristic filters Yet only 1-3% more sensitive We want to Run scans faster with minimal loss of sensitivity Know empirically what sensitivity we re losing

85 Input Multiple Sequence lignment U <.> Heuristic Profile HMMs M Base paired columns Infinite Multiple sequence alignments U U <.> U U... (Weinberg & Ruzzo, 2006) Profile HMM

86 RO-like curves (lysine riboswitch) Filter sensitivity HMM BLST Filter sends 80% of hits to M Filtering fraction

87 trnscan-se: the leading brand (Lowe & Eddy 1997) Designed for trns Used in virtually every genome project Uses Ms Heuristics: selected 2 trn detection programs n ambitious target to shoot for

88 trnscan-se vs. heuristic HMMs rchaea Eubacteria. elegans Drosophila Human Sensitivity (%) t-se HMM Run time (hours) t-se HMM rigor (Each filter sends same number of nucleotides to M)

89 trnscan-se vs. heuristic HMMs Time to create heuristic filters for trns trnscan-se 8 papers HMM 10 seconds to type a command 15 minutes to create & calibrate HMM Point is not that heuristic HMM is better than trnscan- SE --- it s not; point is that it s in the ballpark, so may be easy way to get useful results for new families.

90 Building M s Hand-curated alignments + structure as in Rfam are great, but it doesn t scale Example pplication: iven 5-20 upstream regions (~500 nt) of orthologous bacterial genes, some (but not all) plausibly regulated by a common riboswitch, could we find it?

91 Importance of lignment Blue boxes, e.g., should be lined up. Structure is invisible otherwise.

92 SI () and SPS (B) as functions of the sequence identity for dataset 2 (see text for details) ardner, P. P. et al. Nucl. cids Res : ; doi: /nar/gki 541 opyright restrictions may apply.

93 Pfold (KH03) Test Set D Trusted alignment lustalw lignment Evolutionary Distance Knudsen & Hein, Pfold: RN secondary structure prediction using stochastic context-free grammars, Nucleic cids Research, 2003, v 31,

94 MFinder Harder: Finding Ms without alignment Yao, Weinberg & Ruzzo, Bioinformatics, 2006 Folding predictions Smart heuristics andidate alignment M Search Realign

95 Mfinder ccuracy (on Rfam families with flanking sequence) /W /W

96 Summary of Rfam test families and results

97 L i = column i; σ = (α, β) the 2 ary struct, α = unpaired, β = paired cols With MLE params, I ij is the mutual information between cols i and j

98 an find it via a simple dynamic programming alg.

99 omputational Pipeline for High Throughput Discovery of cis Regulatory Noncoding RN in Prokaryotes to appear, PLoS omp Biol Zizhen Yao 1,*, Jeffrey Barrick 4,6, Zasha Weinberg 5, Shane Neph 1,2, Ronald Breaker 3,4,5, Martin Tompa 1,2 and Walter L. Ruzzo 1,2

100 n approach for cis-regulatory RN discovery in bacteria 1. et all sequenced bacterial genomes 2. roup upstream sequences per DD 3. Find most promising genes, based on sequence motifs conserved in group 4. From those, find most promising candidates, incorporating structure in the motifs 5. From those, genome-wide searches for more instances 6. Expert analyses (Breaker Lab, Yale)

101 Identify DD group members 2946 DD groups Retrieve upstream sequences < 10 PU days Footprinter ranking < 10 PU days Mfinder 1 ~ 2 PU months motifs Motif postprocessing 1740 motifs RaveNn 10 PU months Mfinder refinement < 1 PU month Motif postprocessing 1466 motifs

102 enome Scale Search: Why Most riboswitches, e.g., are present in ~5 copies per genome Throughout (most of) clade More examples give better model, hence even more examples, fewer errors More examples give more clues to function

103 enome Scale Search: How Mfinder is directly usable for/with search Folding predictions Smart heuristics andidate alignment M Search Realign

104 Results Process largely complete in bacillus/clostridia gamma proteobacteria cyanobacteria actinobacteria nalysis ongoing

105 Some Preliminary ctino Results Rfam Family Type (metabolite) Rank THI riboswitch (thiamine) 4 ydao-yua riboswitch (unknown) 19 obalamin riboswitch (cobalamin) 21 SRP_bact gene 28 RFN riboswitch (FMN) 39 yybp-ykoy riboswitch (unknown) 48 gcvt riboswitch (glycine) 53 S_box riboswitch (SM) 401 tmrn gene Not found RNaseP gene Not found not cisregulatory

106 More Prelim ctino Results Many others (not in Rfam) are likely real of top 50: known (Rfam, 23S) 10 probable (Tbox, IRE, Lex, parp, pyrr) 7 ribosomal genes 9 potentially interesting 12 unknown or poor 12 One other being bench-verified

107 mrn leader mrn leader switch?

108

109 Rank Score # DD Rfam RV MF FP RV MF ID ene Descriptio n IlvB Thiamine pyrophosphate-requiring enzymes RF00230 T-box O3859 Predicted membrane protein RF00059 THI MetH Methionine synthase I specific DN methylase RF00162 S_box O0116 Predicted N6-adenine-specific DN methylase RF00011 RNaseP_bact_b DHBP 3,4-dihydroxy-2-butanone 4-phosphate synthase RF00050 RFN ua MP synthase RF00167 Purine cvp lycine cleavage system protein P RF00504 lycine DUF149 Uncharacterised BR, YbaB family O0718 RF00169 SRP_bact bib obalamin biosynthesis protein obd/bib RF00174 obalamin Lys Diaminopimelate decarboxylase RF00168 Lysine Ter Membrane protein Ter RF00080 yybp-ykoy MgtE Mg/o/Ni transporter MgtE RF00380 ykok lms lucosamine 6-phosphate synthetase RF00234 glms OpuBB B-type proline/glycine betaine transport systems RF00005 trn EmrE Membrane transporters of cations and cationic drug RF00442 ykk-yxkd O0398 Uncharacterized conserved protein RF00023 tmrn Table 1: Motifs that correspond to Rfam families

110 Table 3: High ranking motifs not found in Rfam Rank # DD ene: Description nnotation DHOase IIa: Dihydroorotase PyrR attenuator [22] RplL: Ribosomal protein L7/L1 L10 r-protein leader; see Supp RpsF: Ribosomal protein S6 S6 r-protein leader O1179: Dinucleotide-utilizing enzymes 6S RN [25] RpsJ: Ribosomal protein S10 S10 r-protein leader; see Supp Resolvase: N terminal domain Inf: Translation initiation factor 3 IF-3 r-protein leader; see Supp RpsD: Ribosomal protein S4 and related proteins S4 r-protein leader; see Supp [30] rol: haperonin roel Hrc DN binding site [46] Ribosomal L21p: Ribosomal prokaryotic L21 protein L21 r-protein leader; see Supp ad: admium resistance transporter [47] RplB: Ribosomal protein L2 S10 r-protein leader RN pol Rpb2 1: RN polymerase beta subunit O3830: T domain-containing protein Ribosomal S2: Ribosomal protein S2 S2 r-protein leader Rps: Ribosomal protein S7 S12 r-protein leader O2984: B-type uncharacterized transport system tsr: Firmicutes transcriptional repressor of class III tsr DN binding site [48] Formyl trans N: Formyl transferase PurE: Phosphoribosylcarboxyaminoimidazole O4129: Predicted membrane protein RplO: Ribosomal protein L15 L15 r-protein leader RpmJ: Ribosomal protein L36 IF-1 r-protein leader na B: na protein B-type domain Ribosomal S12: Ribosomal protein S12 S12 r-protein leader Ribosomal L4: Ribosomal protein L4/L1 family L3 r-protein leader O0742: N6-adenine-specific methylase ylbh putative RN motif [4] Pencillinase R: Penicillinase repressor BlaI, MecI DN binding site [49] Ribosomal S9: Ribosomal protein S9/S16 L13 r-protein leader; Fig Ribosomal L19: Ribosomal protein L19 L19 r-protein leader; Fig ap: lyceraldehyde-3-phosphate dehydrogenase/erythrose O4708: Predicted membrane protein O0325: Predicted enzyme with a TIM-barrel fold RpmF: Ribosomal protein L32 L32 r-protein leader LDH: L-lactate dehydrogenases spr: Predicted rrn methylase Fus: Translation elongation factors EF- r-protein leader

111 Rfam Membership Overlap Structure # Sn Sp nt Sn Sp bp Sn Sp RF00174 obalamin RF00504 lycine RF00234 glms RF00168 Lysine RF00167 Purine RF00050 RFN RF00011 RNaseP_bact_b RF00162 S_box RF00169 SRP_bact RF00230 T-box RF00059 THI RF00442 ykk-yxkd RF00380 ykok RF00080 yybp-ykoy mean median Tbl 2: Prediction accuracy compared to prokaryotic subset of Rfam full alignments. Membership: # of seqs in overlap between our predictions and Rfam s, the sensitivity (Sn) and specificity (Sp) of our membership predictions. Overlap: the avg len of overlap between our predictions and Rfam s (nt), the fractional lengths of the overlapped region in Rfam s predictions (Sn) and in ours (Sp). Structure: the avg # of correctly predicted canonical base pairs (in overlapped regions) in the secondary structure (bp), and sensitivity and specificity of our predictions. 1 fter 2nd RaveNn scan, membership Sn of lycine and obalamin increased to 76% and 98% resp., lycine Sp unchanged, but obalamin Sp dropped to 84%.

112 Identification of 22 candidate structured RNs in bacteria using the Mfinder comparative genomics pipeline to appear, NR Zasha Weinberg 1,*, Jeffrey E. Barrick 2,3, Zizhen Yao 4, dam Roth 2, Jane N. Kim 1, Jeremy ore 1, Joy Xin Wang 1, Elaine R. Lee 1, Kirsten F. Block 1, Narasimhan Sudarsan 1, Shane Neph 5, Martin Tompa 4,5, Walter L. Ruzzo 4,5 and Ronald R. Breaker 1,2,3,*

113

114 Motif RN? is? Switch? Phylum/class M,V ov. # Non cis EMM Y Y y Widespread V /309 Moco Y Y Y Widespread M,V /81 SH Y Y Y Proteobacteria M,V /41 SM-IV Y Y Y ctinobacteria V /54 O4708 Y Y y Firmicutes M,V /23 suc Y Y y -proteobacteria /40 23S-methyl Y Y n Firmicutes /37 hemb Y?? -proteobacteria V /50 (anti-hemb) (n) (n) (37) (31/37) MEB? Y n -proteobacteria /646 mini-ykk Y Y? Widespread V /205 purd y Y? -proteobacteria M /20 6 y? n ctinobacteria /27 alphatransposases? N N -proteobacteria /99 excisionase?? n ctinobacteria /27 TP y?? yanobacteria /23 cyano-30s Y Y n yanobacteria /23 lacto-1?? n Firmicutes /95 lacto-2 y N n Firmicutes /355 TD-1 y? n Spirochaetes M,V /29 TD-2 y N n Spirochaetes V /36 coccus-1? N N Firmicutes /189 gamma-150? N N -proteobacteria /27

115 RN Structures in Diverged enomic Regions of Vertebrates Torarinsson E.1,2, Yao, Z.3, RuzzoW.L.3,4and orodkin J.1

116

117 Software Infernal - (Eddy et al.) most of Eddy & Durbin RaveNn - (Weinberg) fast filtering Mfinder - (Yao) Motif discovery (local alignment)

118 Open Problems - Better M s Optional- and variable-length stems Riboswitches & other regulatory RNs often switch between conformations; better search & alignment exploiting both alternatives? ugmented M handling pseudoknots probably too slow for scan, but plausibly could be used for alignment Better use of prior knowledge? (NR tetraloops, single-stranded s )

119 Open Problems - Better algorithms & scoring incorporating phylogeny in model construction & scoring e.g. mutual information ignores it improve scoring by shuffling other ideas for scan filtering comparing & clustering RN structures search/alignment/inference with splicing

120 Open Problems - pplications & Biology clustering intergenic sequences, esp prokaryotic systematic look at eukaryotic UTRs how to cluster? how to score? swiss-cheese phylogenies evidence for selection (no dn/ds)

121 Summary ncrn is a hot topic For family homology modeling: Ms Training & search like HMM (but slower) Dramatic acceleration possible utomated model construction New computational methods lead to new discoveries

Searching for Noncoding RNA

Searching for Noncoding RNA Searching for Noncoding RN Larry Ruzzo omputer Science & Engineering enome Sciences niversity of Washington http://www.cs.washington.edu/homes/ruzzo Bio 2006, Seattle, 8/4/2006 1 Outline Noncoding RN Why

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo The entral Dogma DN RN Protein gene Protein DN (chromosome) cell RN (messenger) lassical RNs mrn trn rrn snrn

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

RNA Search and! Motif Discovery"

RNA Search and! Motif Discovery RN Search and! Motif Discovery" Lecture 9! SEP 590! utumn 2008" Outline" Whirlwind tour of ncrn search & discovery" RN motif description (ovariance Model Review)" lgorithms for searching" Rigorous & heuristic

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo http://www.cs.washington.edu/homes/ruzzo/ courses/gs541/09sp ENOME 541 protein and DN sequence analysis to determine

More information

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007 RNA Search and Motif Discovery Lectures 18-19 CSE 527 Autumn 2007 The Human Parts List, circa 2001 1 gagcccggcc cgggggacgg gcggcgggat agcgggaccc cggcgcggcg gtgcgcttca 61 gggcgcagcg gcggccgcag accgagcccc

More information

ROC TPR FPR

ROC TPR FPR HW5 2 3 TPR 0.0 0.2 0.4 0.6 0.8 1.0 291 411 537 669 807 957 1098 1269 1506 1971 8703 171 51 RO TPR = True Positive Rate = Sensitivity = Recall = TP/(TP+FN) FPR = False Positive Rate = 1 Specificity = FP/(FP+TN)

More information

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein! The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral

More information

RNA Secondary Structure. CSE 417 W.L. Ruzzo

RNA Secondary Structure. CSE 417 W.L. Ruzzo RN Secondary Structure SE 417 W.L. Ruzzo The Double Helix Los lamos Science The entral Dogma of Molecular Biology DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Non-coding RN Messenger

More information

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery SEP 527 omputational Biology RN: Function, Secondary Structure Prediction, Search, Discovery The Message ells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex

More information

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes Zizhen Yao 1*, Jeffrey Barrick 2, Zasha Weinberg 3, Shane Neph 1,4, Ronald Breaker 2,3,5, Martin Tompa

More information

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna. onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

Detecting non-coding RNA in Genomic Sequences

Detecting non-coding RNA in Genomic Sequences Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

Chapter 15 Active Reading Guide Regulation of Gene Expression

Chapter 15 Active Reading Guide Regulation of Gene Expression Name: AP Biology Mr. Croft Chapter 15 Active Reading Guide Regulation of Gene Expression The overview for Chapter 15 introduces the idea that while all cells of an organism have all genes in the genome,

More information

DUPLICATED RNA GENES IN TELEOST FISH GENOMES

DUPLICATED RNA GENES IN TELEOST FISH GENOMES DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Name: SBI 4U. Gene Expression Quiz. Overall Expectation: Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus L3.1: Circuits: Introduction to Transcription Networks Cellular Design Principles Prof. Jenna Rickus In this lecture Cognitive problem of the Cell Introduce transcription networks Key processing network

More information

Introduction to Molecular and Cell Biology

Introduction to Molecular and Cell Biology Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

proteins are the basic building blocks and active players in the cell, and

proteins are the basic building blocks and active players in the cell, and 12 RN Secondary Structure Sources for this lecture: R. Durbin, S. Eddy,. Krogh und. Mitchison, Biological sequence analysis, ambridge, 1998 J. Setubal & J. Meidanis, Introduction to computational molecular

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

+ regulation. ribosomes

+ regulation. ribosomes central dogma + regulation rpl DNA tsx rrna trna mrna ribosomes tsl ribosomal proteins structural proteins transporters enzymes srna regulators RNAp DNAp tsx initiation control by transcription factors

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

From gene to protein. Premedical biology

From gene to protein. Premedical biology From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

More information

RNA Abstract Shape Analysis

RNA Abstract Shape Analysis ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Stephen Scott.

Stephen Scott. 1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for

More information

Regulation of Gene Expression at the level of Transcription

Regulation of Gene Expression at the level of Transcription Regulation of Gene Expression at the level of Transcription (examples are mostly bacterial) Diarmaid Hughes ICM/Microbiology VT2009 Regulation of Gene Expression at the level of Transcription (examples

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

NcRNA Homology Search Using Hamming Distance Seeds

NcRNA Homology Search Using Hamming Distance Seeds NcRN Homology Search Using Hamming Distance Seeds Osama ljawad Dept. of omputer Science and Engineering Michigan State University East Lansing, MI 48824 aljawado@msu.edu Yanni Sun Dept. of omputer Science

More information

UE Praktikum Bioinformatik

UE Praktikum Bioinformatik UE Praktikum Bioinformatik WS 08/09 University of Vienna 7SK snrna 7SK was discovered as an abundant small nuclear RNA in the mid 70s but a possible function has only recently been suggested. Two independent

More information

Towards a Comprehensive Annotation of Structured RNAs in Drosophila

Towards a Comprehensive Annotation of Structured RNAs in Drosophila Towards a Comprehensive Annotation of Structured RNAs in Drosophila Rebecca Kirsch 31st TBI Winterseminar, Bled 20/02/2016 Studying Non-Coding RNAs in Drosophila Why Drosophila? especially for novel molecules

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Gene Expression. Molecular Genetics, March, 2018

Gene Expression. Molecular Genetics, March, 2018 Gene Expression Molecular Genetics, March, 2018 Gene Expression Control of Protein Levels Bacteria Lac Operon Promoter mrna Inducer CAP Control Trp Operon RepressorOperator Control Attenuation Riboswitches

More information

Genetics 304 Lecture 6

Genetics 304 Lecture 6 Genetics 304 Lecture 6 00/01/27 Assigned Readings Busby, S. and R.H. Ebright (1994). Promoter structure, promoter recognition, and transcription activation in prokaryotes. Cell 79:743-746. Reed, W.L. and

More information

Regulation of Gene Expression

Regulation of Gene Expression Chapter 18 Regulation of Gene Expression Edited by Shawn Lester PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley

More information

GENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES.

GENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES. GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES!! www.clutchprep.com GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES COCEPT: LAC OPERO An operon is a group of genes with similar functions that are

More information

BME 5742 Biosystems Modeling and Control

BME 5742 Biosystems Modeling and Control BME 5742 Biosystems Modeling and Control Lecture 24 Unregulated Gene Expression Model Dr. Zvi Roth (FAU) 1 The genetic material inside a cell, encoded in its DNA, governs the response of a cell to various

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators

CSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression

CSEP 590A Summer Lecture 4 MLE, EM, RE, Expression CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

Chapter 12. Genes: Expression and Regulation

Chapter 12. Genes: Expression and Regulation Chapter 12 Genes: Expression and Regulation 1 DNA Transcription or RNA Synthesis produces three types of RNA trna carries amino acids during protein synthesis rrna component of ribosomes mrna directs protein

More information

RNA Protein Interaction

RNA Protein Interaction Introduction RN binding in detail structural analysis Examples RN Protein Interaction xel Wintsche February 16, 2009 xel Wintsche RN Protein Interaction Introduction RN binding in detail structural analysis

More information

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

More information

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes

Chapter 16 Lecture. Concepts Of Genetics. Tenth Edition. Regulation of Gene Expression in Prokaryotes Chapter 16 Lecture Concepts Of Genetics Tenth Edition Regulation of Gene Expression in Prokaryotes Chapter Contents 16.1 Prokaryotes Regulate Gene Expression in Response to Environmental Conditions 16.2

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

Prokaryotic Regulation

Prokaryotic Regulation Prokaryotic Regulation Control of transcription initiation can be: Positive control increases transcription when activators bind DNA Negative control reduces transcription when repressors bind to DNA regulatory

More information

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17.

Genetic Variation: The genetic substrate for natural selection. Horizontal Gene Transfer. General Principles 10/2/17. Genetic Variation: The genetic substrate for natural selection What about organisms that do not have sexual reproduction? Horizontal Gene Transfer Dr. Carol E. Lee, University of Wisconsin In prokaryotes:

More information

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task.

METHODS FOR DETERMINING PHYLOGENY. In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Chapter 12 (Strikberger) Molecular Phylogenies and Evolution METHODS FOR DETERMINING PHYLOGENY In Chapter 11, we discovered that classifying organisms into groups was, and still is, a difficult task. Modern

More information

How much non-coding DNA do eukaryotes require?

How much non-coding DNA do eukaryotes require? How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

Searching genomes for non-coding RNA using FastR

Searching genomes for non-coding RNA using FastR Searching genomes for non-coding RNA using FastR Shaojie Zhang Brian Haas Eleazar Eskin Vineet Bafna Keywords: non-coding RNA, database search, filtration, riboswitch, bacterial genome. Address for correspondence:

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

15.2 Prokaryotic Transcription *

15.2 Prokaryotic Transcription * OpenStax-CNX module: m52697 1 15.2 Prokaryotic Transcription * Shannon McDermott Based on Prokaryotic Transcription by OpenStax This work is produced by OpenStax-CNX and licensed under the Creative Commons

More information

Translation and Operons

Translation and Operons Translation and Operons You Should Be Able To 1. Describe the three stages translation. including the movement of trna molecules through the ribosome. 2. Compare and contrast the roles of three different

More information

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16

Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Tuesday, December 27, 16 Big Idea 3: Living systems store, retrieve, transmit and respond to information essential to life processes. Enduring understanding 3.B: Expression of genetic information involves cellular and molecular

More information

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter

More information

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g.

Co-ordination occurs in multiple layers Intracellular regulation: self-regulation Intercellular regulation: coordinated cell signalling e.g. Gene Expression- Overview Differentiating cells Achieved through changes in gene expression All cells contain the same whole genome A typical differentiated cell only expresses ~50% of its total gene Overview

More information

Graph Alignment and Biological Networks

Graph Alignment and Biological Networks Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale

More information

De novo prediction of structural noncoding RNAs

De novo prediction of structural noncoding RNAs 1/ 38 De novo prediction of structural noncoding RNAs Stefan Washietl 18.417 - Fall 2011 2/ 38 Outline Motivation: Biological importance of (noncoding) RNAs Algorithms to predict structural noncoding RNAs

More information

Flow of Genetic Information

Flow of Genetic Information presents Flow of Genetic Information A Montagud E Navarro P Fernández de Córdoba JF Urchueguía Elements Nucleic acid DNA RNA building block structure & organization genome building block types Amino acid

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Phylogeny Tree Algorithms

Phylogeny Tree Algorithms Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem

Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem University of Groningen Computational methods for the analysis of bacterial gene regulation Brouwer, Rutger Wubbe Willem IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 Dr. Stefan Simm, 01.11.2016 simm@bio.uni-frankfurt.de RNA secondary structures a. hairpin loop b. stem c. bulge loop d. interior loop e. multi

More information