ROC TPR FPR

Size: px
Start display at page:

Download "ROC TPR FPR"

Transcription

1 HW5

2 2

3 3

4 TPR RO TPR = True Positive Rate = Sensitivity = Recall = TP/(TP+FN) FPR = False Positive Rate = 1 Specificity = FP/(FP+TN) Precision, aka PPV = TP/(TP+FP) FPR 4

5 TPR FPR 5

6 * scores * orfs$len 6

7 scores log2(orfs$len) 7

8 FPR FPR scores scores TPR TPR orfs$len log2(orfs$len)

9 RNA Search and Motif Discovery SEP 590 A omputational Biology

10 15

11 Approaches to Structure Prediction Maximum Pairing + works on single sequences + simple - too inaccurate Minimum Energy + works on single sequences - ignores pseudoknots - only finds optimal fold Partition Function + finds all folds - ignores pseudoknots

12 Optimal pairing of r i... r j Two possibilities j Unpaired: Find best pairing of r i... r j-1 j! i! j-1! j Paired (with some k): Find best r i... r k-1 + best r k+1... r j-1 plus 1 i! k-1! k! Why is it slow? Why do pseudoknots matter? j! j-1! k+1!

13 Nussinov: A omputation Order B(i,j) = # pairs in optimal pairing of r i... r j B(i,j) = 0 for all i, j with i j-4; otherwise B(i,j) = max of: B(i,j-1) Or energy max { B(i,k-1)+1+B(k+1,j-1) i k < j-4 and r k -r j may pair} Time: O(n 3 ) Loop-based energy version is better; recurrences similar, slightly messier K=

14 Today Structure prediction via comparative analysis ovariance Models (Ms) represent RNA sequence/structure motifs Fast M search Motif Discovery Applications in prokaryotes & vertebrates 19

15 Approaches, II omparative sequence analysis + handles all pairings (potentially incl. pseudoknots) - requires several (many?) aligned, appropriately diverged sequences Stochastic ontext-free rammars Roughly combines min energy & comparative, but no pseudoknots Physical experiments (x-ray crystalography, NMR) Today

16 ovariation is strong evidence for base pairing 21

17 Example: Ribosomal Autoregulation: Excess L19 represses L19 (RF00556; similar) A B P2 L19 (rpls) mrna leader Y U A U A R R Y U R R U 5' Y 3' P1 N N N TSS nucleotide identity nucleotide present 97% 97% 90% 90% 75% 75% 50% A stem loop always present compensatory mutations compatible mutations Watson-rick base pair other base interaction 5' P1 P2 RBS Start A A U U U U U A A U A U U 3' U U U A A A U A A U A U A A U U U U B. subtilis L19 mrna leader U U U U U L19 U U U A A A A U U U U U A U A 3' U A A U U U A 5' A A A A? 22

18 Mutual Information M ij = f f xi,xj xi,xj log xi,xj 2 f xi f xj ; 0 M ij 2 Max when no seq conservation but perfect pairing MI = expected score gain from using a pair state (below) Finding optimal MI, (i.e. opt pairing of cols) is hard(?) Finding optimal MI without pseudoknots can be done by dynamic programming 23

19 27

20 omputational Problems How to predict secondary structure How to model an RNA motif (I.e., sequence/structure pattern) iven a motif, how to search for instances iven (unaligned) sequences, find motifs How to score discovered motifs How to leverage prior knowledge 29

21 Motif Description

22 RNA Motif Models ovariance Models (Eddy & Durbin 1994) aka profile stochastic context-free grammars aka hidden Markov models on steroids Model position-specific nucleotide preferences and base-pair preferences Pro: accurate on: model building hard, search slow 32

23 Eddy & Durbin 1994: What A probabilistic model for RNA families The ovariance Model A Stochastic ontext-free rammar A generalization of a profile HMM Algorithms for Training From aligned or unaligned sequences Automates comparative analysis omplements Nusinov/Zucker RNA folding Algorithms for searching 33

24 Main Results Very accurate search for trna (Precursor to trnascanse - current favorite) iven sufficient data, model construction comparable to, but not quite as good as, human experts Some quantitative info on importance of pseudoknots and other tertiary features 34

25 Probabilistic Model Search As with HMMs, given a sequence, you calculate likelihood ratio that the model could generate the sequence, vs a background model You set a score threshold Anything above threshold a hit Scoring: Forward / Inside algorithm - sum over all paths Viterbi approximation - find single best path (Bonus: alignment & structure prediction) 35

26 Example: searching for trnas 36

27 Profile Hmm Structure Mj: Match states (20 emission probabilities) Ij: Insert states (Background emission probabilities) Dj: Delete states (silent - no emission) 38

28 How to model an RNA Motif? onceptually, start with a profile HMM: from a multiple alignment, estimate nucleotide/ insert/delete preferences for each position given a new seq, estimate likelihood that it could be generated by the model, & align it to the model mostly all del ins 39

29 How to model an RNA Motif? Add column pairs and pair emission probabilities for base-paired regions <<<<<<< paired columns >>>>>>> 40

30 Profile Hmm Structure Mj: Match states (20 emission probabilities) Ij: Insert states (Background emission probabilities) Dj: Delete states (silent - no emission) 41

31 M Structure A: Sequence + structure B: the M guide tree : probabilities of letters/ pairs & of indels Think of each branch being an HMM emitting both sides of a helix (but 3 side emitted in reverse order) 47

32 Overall M Architecture One box ( node ) per node of guide tree BE/MATL/INS/DEL just like an HMM MATP & BIF are the key additions: MATP emits pairs of symbols, modeling basepairs; BIF allows multiple helices 48

33 M Viterbi Alignment (the inside algorithm) x i x ij = i th letter of input = substring i,..., j of input T yz = P(transition y z) y E xi,x j = P(emission of x i, x j from state y) S ij y = max π log P(x ij gen'd starting in state y via path π) 49

34 M Viterbi Alignment (the inside algorithm) S ij y = max π log P(x ij generated starting in state y via path π) S ij y = % z max z [S i+1, j 1 ' z max z [S i+1, j ' z & max z [S i, j 1 ' z ' max z [S i, j (' max i<k j [S i,k y left y + logt yz + log E xi,x j y + logt yz + log E xi ] match pair ] match/insert left + logt yz + log E x j ] match/insert right + logt yz ] delete y + S right k +1, j ] bifurcation Time O(qn 3 ), q states, seq len n Time O(qn 3 ), q states, seq len n compare: O(qn) for profile HMM 50

35 An Important Application: Rfam

36 Rfam an RNA family DB riffiths-jones, et al., NAR 03, 05, 08, 11, 12 Was biggest scientific comp user in Europe cpu cluster for a month per release Rapidly growing: Rel 1.0, 1/03: 25 families, 55k instances Rel 7.0, 3/05: 503 families, 363k instances Rel 9.0, 7/08: 603 families, 636k instances Rel 9.1, 1/09: 1372 families, 1148k instances Rel 10.0, 1/10: 1446 families, 3193k instances Rel 11.0, 8/12: 2208 families, 6125k instances DB size: ~8B ~160B ~320B 58

37 RF00037: Example Rfam Family Input (hand-curated): MSA seed alignment SS_cons Score Thresh T Window Len W Output: M scan results & full alignment phylogeny, etc. IRE (partial seed alignment): Hom.sap. UUUUUAAAUUUUAUAA Hom.sap. UUUUU.UUAAAUUUUAUAA Hom.sap. UUUUUUUAAAUUUA.AA Hom.sap. UUUAU..AUAAAUUAU.AUAAA Hom.sap. UUUUUUAAAUUUUAUAA Hom.sap. AUUAU..AAAUUUU.AUAAU Hom.sap. UUU..UUAAAUUUUAAA Hom.sap. UUAU..AAAUAUU.AUAU Hom.sap. AUUAU..AAAUUU.AUAAU av.por. UUUUUAAAUUUAA Mus.mus. UAUAU..AAAUAUU.AUAU Mus.mus. UUUUUUAAAUUUAAAA Mus.mus. UAUUUUAAAUUUUAAAA Rat.nor. UAUAU..AAAUAU.AUAU Rat.nor. UAUUUUUAAAUUUUAAA SS_cons <<<<<...<<<<<...>>>>>.>>>>> 62

38 An Important Need: Faster Search

39 Homology search Homolog similar by descent from common ancestor Sequence-based Smith-Waterman FASTA BLAST For RNA, sharp decline in sensitivity at ~60-70% identity So, use structure, too 73

40 Impact of RNA homology search B. subtilis! glycine riboswitch! (Barrick, et al., 2004)! operon! L. innocua! A. tumefaciens! V. cholera! M. tuberculosis! (and 19 more species)! 74

41 Impact of RNA homology search B. subtilis! L. innocua! glycine riboswitch! (Barrick, et al., 2004)! operon! (Mandal, et al., 2004)! A. tumefaciens! V. cholera! M. tuberculosis! (and 19 more species)! (and 42 more species)! BLAST-based M-based 75

42 Faster enome Annotation of Non-coding RNAs Without Loss of Accuracy Zasha Weinberg & W.L. Ruzzo Recomb 04, ISMB 04, Bioinfo 06

43 RaveNnA: enome Scale RNA Search Typically 100x speedup over raw M, w/ no loss in accuracy: Drop structure from M to create a (faster) HMM Use that to pre-filter sequence; Discard parts where, provably, M score < threshold; Actually run M on the rest (the promising parts) Assignment of HMM transition/emission scores is key (a large convex optimization problem) Weinberg & Ruzzo, Bioinformatics, 2004,

44 M s are good, but slow Rfam Reality EMBL Our Work EMBL Rfam oal EMBL BLAST Ravenna junk M hits 1 month, 1000 computers M hits ~2 months, 1000 computers M junk hits 10 years, 1000 computers 79

45 Oversimplified M (for pedagogical purposes only) A U A U A U A U 81

46 M to HMM A U A U A U A U A U A U A U A U M HMM emisions per state 5 emissions per state, 2x states

47 Key Issue: 25 scores 10 M HMM A U P A U A U L R A U Need: log Viterbi scores M HMM 83

48 Key Issue: 25 scores 10 M A U P A U Need: log Viterbi scores M HMM P AA L A + R A P A L A + R P A L A + R P AU L A + R U P A L A + R A U L P A L + R A P L + R P L + R P U L + R U P L + R R HMM A U NB: HMM not a prob. model 85

49 Rigorous Filtering Any scores satisfying the linear inequalities give rigorous filtering Proof: M Viterbi path score corresponding HMM path score Viterbi HMM path score (even if it does not correspond to any M path) P AA L A + R A P A L A + R P A L A + R P AU L A + R U P A L A + R 86

50 Minimizing E(L i, R i ) (subject to linear constraints) alculate E(L i, R i ) symbolically, in terms of emission scores, so we can do partial derivatives for numerical convex optimization algorithm Forward: Viterbi: E(L 1, L 2,...) L i 90

51 Assignment of scores/ probabilities onvex optimization problem onstraints: enforce rigorous property Objective function: filter as aggressively as possible Problem sizes: variables inequality constraints 91

52 onvex Optimization onvex: local max = global max; simple hill climbing works Nonconvex: can be many local maxima, global max; hill-climbing fails 92

53 Estimated Filtering Efficiency (139 Rfam 4.0 families) Filtering fraction # families (compact) # families (expanded) break even < Averages 283 times faster than M! ~100x speedup 93

54 Motif Discovery

55 RNA Motif Discovery Would be great if: given 100 complete genomes from diverse species, we could automatically find all the RNAs. State of the art: that s hopeless Hope: can we exploit biological knowledge to narrow the search space? 130

56 RNA Motif Discovery More promising problem: given a unaligned sequences of a few kb, most of which contain instances of one RNA motif of bp -- find it. Example: 5 UTRs of orthologous glycine cleavage genes from γ-proteobacteria Example: corresponding introns of orthogolous vertebrate genes Orthologs = counterparts in different species 131

57 Approaches Align-First: Align sequences, then look for common structure Fold-First: Predict structures, then try to align them Joint: Do both together 132

58 Pitfall for sequence-alignment- first approach Structural conservation Sequence conservation Alignment without structure information is unreliable LUSTALW alignment of SEIS elements with flanking regions same-colored boxes should be aligned 134

59 Approaches Align-first: align sequences, then look for common structure Fold-first: Predict structures, then try to align them single-seq struct prediction only ~ 60% accurate; exacerbated by flanking seq; no biologicallyvalidated model for structural alignment Joint: Do both together Sankoff good but slow Heuristic 139

60 Our Approach: Mfinder Simultaneous local alignment, folding and Mbased motif description using an EM-style learning procedure Yao, Weinberg & Ruzzo, Bioinformatics,

61 MFinder Simultaneous alignment, folding & motif description Yao, Weinberg & Ruzzo, Bioinformatics, 2006 Folding predictions ombines folding & mutual information in a principled way. Smart heuristics andidate alignment M Mutual Information Realign EM 146

62 Mfinder Accuracy (on Rfam families with flanking sequence) /W /W 153

63 Discovery in Bacteria 156

64 Right Data: Why/How We can recognize, say, 5-10 good examples amidst 20 extraneous ones (but not 5 in 200 or 2000) of length 1k or 10k (but not 100k) Regulators often near regulatees (protein coding genes), which are usually recognizable cross-species So, look near similar genes ( homologs ) Many riboswitches, e.g., are present in ~5 copies per genome (Not strategy used in vertebrates x larger genomes) 164

65 Processing Times Identify DD group members 2946 DD groups Retrieve upstream sequences < 10 PU days Footprinter ranking < 10 PU days Input from ~70 complete Firmicute genomes available in late 2005-early 2006, totaling ~200 megabases Mfinder motifs Motif postprocessing 1740 motifs RaveNnA Mfinder refinement 1 ~ 2 PU months 10 PU months < 1 PU month Motif postprocessing 1466 motifs 179

66 Table 1: Motifs that correspond to Rfam families Rank Score # DD Rfam RAV MF FP RAV MF ID ene Description IlvB Thiamine pyrophosphate-requiring enzymes RF00230 T-box O3859 Predicted membrane protein RF00059 THI MetH Methionine synthase I specific DNA methylase RF00162 S_box O0116 Predicted N6-adenine-specific DNA methylase RF00011 RNaseP_bact_b DHBP 3,4-dihydroxy-2-butanone 4-phosphate synthase RF00050 RFN uaa MP synthase RF00167 Purine cvp lycine cleavage system protein P RF00504 lycine DUF149 Uncharacterised BR, YbaB family O0718 RF00169 SRP_bact bib obalamin biosynthesis protein obd/bib RF00174 obalamin LysA Diaminopimelate decarboxylase RF00168 Lysine Ter Membrane protein Ter RF00080 yybp-ykoy MgtE Mg/o/Ni transporter MgtE RF00380 ykok lms lucosamine 6-phosphate synthetase RF00234 glms OpuBB AB-type proline/glycine betaine transport systems RF00005 trna EmrE Membrane transporters of cations and cationic drug RF00442 ykk-yxkd O0398 Uncharacterized conserved protein RF00023 tmrna Table 1: Motifs that correspond to Rfam families. Rank : the three columns show ranks for refined motif clusters after genome scans ( RAV ), Mfinder motifs before genome scans ( MF ), and FootPrinter results ( FP ). We used the same ranking scheme for RAV and MF. Score 180

67 Rfam Membership Overlap Structure # Sn Sp nt Sn Sp bp Sn Sp RF00174 obalamin RF00504 lycine RF00234 glms RF00168 Lysine RF00167 Purine RF00050 RFN RF00011 RNaseP_bact_b RF00162 S_box RF00169 SRP_bact RF00230 T-box RF00059 THI RF00442 ykk-yxkd RF00380 ykok RF00080 yybp-ykoy mean median Tbl 2: Prediction accuracy compared to prokaryotic subset of Rfam full alignments. Membership: # of seqs in overlap between our predictions and Rfam s, the sensitivity (Sn) and specificity (Sp) of our membership predictions. Overlap: the avg len of overlap between our predictions and Rfam s (nt), the fractional lengths of the overlapped region in Rfam s predictions (Sn) and in ours (Sp). Structure: the avg # of correctly predicted canonical base pairs (in overlapped regions) in the secondary structure (bp), and sensitivity and specificity of our predictions. 1 After 2nd RaveNnA scan, membership Sn of lycine, obalamin increased to 76% and 98% resp., lycine Sp unchanged, but obalamin Sp dropped to 84%. 184

68 Example: Ribosomal Autoregulation: Excess L19 represses L19 (RF00556; similar) A B P2 L19 (rpls) mrna leader Y U A U A R R Y U R R U 5' Y 3' P1 N N N TSS nucleotide identity nucleotide present 97% 97% 90% 90% 75% 75% 50% A stem loop always present compensatory mutations compatible mutations Watson-rick base pair other base interaction 5' P1 P2 RBS Start A A U U U U U A A U A U U 3' U U U A A A U A A U A U A A U U U U B. subtilis L19 mrna leader U U U U U L19 U U U A A A A U U U U U A U A 3' U A A U U U A 5' A A A A? 186

69 Examples: 6 (of 22) Representative motifs Sudarsan, et al Science, 2008! EMM! Wang, et al Mol ell, 2008! SAH! suca! boxed = confirmed riboswitch SAM-IV! Weinberg et al RNA 08! Moo! Regulski et al Mol Microbiol 08! Legend! O4708! Weinberg, et al. Nucl. Acids Res., July : ! Meyer, et al RNA, 2008! 191

70 Vertebrate ncrnas Some Results

71 Human Predictions Evofold S Pedersen, Bejerano, A Siepel, K Rosenbloom, K Lindblad-Toh, ES Lander, J Kent, W Miller, D Haussler, "Identification and classification of conserved RNA secondary structures in the human genome." PLoS omput. Biol., 2, #4 (2006) e33. 48,479 candidates (~70% FDR?) RNAz S Washietl, IL Hofacker, M Lukasser, A Hutenhofer, PF Stadler, "Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome." Nat. Biotechnol., 23, #11 (2005) ,000 structured RNA elements 1,000 conserved across all vertebrates. ~1/3 in introns of known genes, ~1/6 in UTRs ~1/2 located far from any known gene FOLDALIN E Torarinsson, M Sawera, JH Havgaard, M Fredholm, J orodkin, "Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure." enome Res., 16, #7 (2006) candidates from (of 100,000) pairs Mfinder Torarinsson, Yao, Wiklund, Bramsen, Hansen, Kjems, Tommerup, Ruzzo and orodkin. omparative genomics beyond sequence based alignments: RNA structures in the ENODE regions. enome Research, Feb 2008, 18(2): PMID: candidates in ENODE alone (better FDR, but still high) Some details below

72 Mfinder Search in Vertebrates Extract ENODE * Multiz alignments Remove exons, most conserved elements blocks, 8.7M bps. Apply Mfinder to both strands. 10,106 predictions, 6,587 clusters. High false positive rate, but still suggests 1000 s of RNAs. (We ve applied Mfinder to whole human genome: many 100 s of PU years. Analysis in progress.) Trust 17-way alignment for orthology, not for detailed alignment * ENODE: deeply annotated 1% of human genome 216

73 Log 10 (ount) enomic Intergap Distances (Human-Mouse) Length enome-wide Identification of Human Functional DNA Using a Neutral Indel Model erton Lunter, hris P. Ponting, Jotun Hein, PLoS omput Biol 2006, 2(1): e5. 220

74 Overlap w/ Indel Purified Segments IPS presumed to signal purifying selection Majority (64%) of candidates have >45% + Strong P-value for their overlap w/ IPS + data P N Expected Observed P-value % 0-35 igs % igs % igs % igs E % igs E % all igs E % 221

75 Realignment % realigned Average pairwise sequence similarity 225

76 Alignment Matters The original MULTIZ alignment without flanking regions. RNAz Score: (no RNA) Human TATTAAAATT-TTTAAAAAAT----TTAAATATAAAAAATAATT himp AATTTAATT-ATTTAAAAAT----ATTAAATATAAAATAAATT ow TATTTAAAATT-ATAAA--AAAAT----TTAATTTAAAAATTAATT Dog TATTTAAAATTTTAATA--AAAAAT----TTAATTTAAAATATTAATT Rabbit ATATTTAAAATTT-TTTTAATAAAAT----TTAATTATAAAATTAAATT Rhesus TATTAAAATT-TTTAAAAAATATTTAAATATAAAAAATAATT Str ((((((...(((((((...(((...)))..))))...)))...))))))...( The local Mfinder re-alignment of the MULTIZ block. RNAz Score: (RNA) Human TATTAAAATT-TTTAAA-A-----AATTTAAATATAAAAAATAA himp AATTTAATT-ATTT-AAA-----AATATTAAATATAAAATAAA ow TATTTAAAATT-ATAAA--AAA------ATTTAATTTAAAAATTAA Dog TATTTAAAATTTTAATA--AAA-A-----ATTTAATTTAAAATATTAA Rabbit ATATTTAAAATTT-TTTT-AATA-----AAATTTAATTATAAAATTAAA Rhesus TATTAAAATT-TTTAAA-AAA-TATTTAAATATAAAAAATAA Str ((((((...((((((((..(((...)))...))))))))...))))))

77 10 of 11 top (differentially) expressed 228

78 ncrna Summary ncrna is a hot topic For family homology modeling: Ms Training & search like HMM (but slower) Dramatic acceleration possible Automated model construction possible New computational methods yield new discoveries Many open problems 247

79 ourse Wrap Up

80 What is DNA? RNA? How many Amino Acids are there? Did human beings, as we know them, develop from earlier species of animals? What are stem cells? What did Viterbi invent? What is dynamic programming? What is a likelihood ratio test? What is the EM algorithm? How would you find the maximum of f(x) = ax 3 + bx 2 + cx +d in the interval -10<x<25? 249

81 Sensors High-Throughput DNA sequencing Microarrays/ene expression Mass Spectrometry/Proteomics BioTech Protein/protein & DNA/protein interaction ontrols loning ene knock out/knock in RNAi Floods of data rand hallenge problems 250

82 Exciting Times Lots to do Highly multidisciplinary You ll be hearing a lot more about it I hope I ve given you a taste of it

83 Thanks!

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

Searching for Noncoding RNA

Searching for Noncoding RNA Searching for Noncoding RN Larry Ruzzo omputer Science & Engineering enome Sciences niversity of Washington http://www.cs.washington.edu/homes/ruzzo Bio 2006, Seattle, 8/4/2006 1 Outline Noncoding RN Why

More information

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007 RNA Search and Motif Discovery Lectures 18-19 CSE 527 Autumn 2007 The Human Parts List, circa 2001 1 gagcccggcc cgggggacgg gcggcgggat agcgggaccc cggcgcggcg gtgcgcttca 61 gggcgcagcg gcggccgcag accgagcccc

More information

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo http://www.cs.washington.edu/homes/ruzzo/ courses/gs541/09sp ENOME 541 protein and DN sequence analysis to determine

More information

RNA Search and! Motif Discovery"

RNA Search and! Motif Discovery RN Search and! Motif Discovery" Lecture 9! SEP 590! utumn 2008" Outline" Whirlwind tour of ncrn search & discovery" RN motif description (ovariance Model Review)" lgorithms for searching" Rigorous & heuristic

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo The entral Dogma DN RN Protein gene Protein DN (chromosome) cell RN (messenger) lassical RNs mrn trn rrn snrn

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo Outline Why RN? Examples of RN biology omputational hallenges Modeling Search Inference The entral Dogma DN RN

More information

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein! The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral

More information

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery SEP 527 omputational Biology RN: Function, Secondary Structure Prediction, Search, Discovery The Message ells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

RNA Secondary Structure. CSE 417 W.L. Ruzzo

RNA Secondary Structure. CSE 417 W.L. Ruzzo RN Secondary Structure SE 417 W.L. Ruzzo The Double Helix Los lamos Science The entral Dogma of Molecular Biology DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Non-coding RN Messenger

More information

Towards a Comprehensive Annotation of Structured RNAs in Drosophila

Towards a Comprehensive Annotation of Structured RNAs in Drosophila Towards a Comprehensive Annotation of Structured RNAs in Drosophila Rebecca Kirsch 31st TBI Winterseminar, Bled 20/02/2016 Studying Non-Coding RNAs in Drosophila Why Drosophila? especially for novel molecules

More information

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes Zizhen Yao 1*, Jeffrey Barrick 2, Zasha Weinberg 3, Shane Neph 1,4, Ronald Breaker 2,3,5, Martin Tompa

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna. onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence

More information

Stephen Scott.

Stephen Scott. 1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

EECS730: Introduction to Bioinformatics

EECS730: Introduction to Bioinformatics EECS730: Introduction to Bioinformatics Lecture 07: profile Hidden Markov Model http://bibiserv.techfak.uni-bielefeld.de/sadr2/databasesearch/hmmer/profilehmm.gif Slides adapted from Dr. Shaojie Zhang

More information

Detecting non-coding RNA in Genomic Sequences

Detecting non-coding RNA in Genomic Sequences Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université

More information

Markov Models & DNA Sequence Evolution

Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment

Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How

More information

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018

CONCEPT OF SEQUENCE COMPARISON. Natapol Pornputtapong 18 January 2018 CONCEPT OF SEQUENCE COMPARISON Natapol Pornputtapong 18 January 2018 SEQUENCE ANALYSIS - A ROSETTA STONE OF LIFE Sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

Hidden Markov Models and Their Applications in Biological Sequence Analysis

Hidden Markov Models and Their Applications in Biological Sequence Analysis Hidden Markov Models and Their Applications in Biological Sequence Analysis Byung-Jun Yoon Dept. of Electrical & Computer Engineering Texas A&M University, College Station, TX 77843-3128, USA Abstract

More information

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis

Predicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis Fang XY, Luo ZG, Wang ZH. Predicting RNA secondary structure using profile stochastic context-free grammars and phylogenic analysis. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 23(4): 582 589 July 2008

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

Model Accuracy Measures

Model Accuracy Measures Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME

MATHEMATICAL MODELS - Vol. III - Mathematical Modeling and the Human Genome - Hilary S. Booth MATHEMATICAL MODELING AND THE HUMAN GENOME MATHEMATICAL MODELING AND THE HUMAN GENOME Hilary S. Booth Australian National University, Australia Keywords: Human genome, DNA, bioinformatics, sequence analysis, evolution. Contents 1. Introduction:

More information

Computational Genomics and Molecular Biology, Fall

Computational Genomics and Molecular Biology, Fall Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models A selection of slides taken from the following: Chris Bystroff Protein Folding Initiation Site Motifs Iosif Vaisman Bioinformatics and Gene Discovery Colin Cherry Hidden Markov Models

More information

Evolutionary Models. Evolutionary Models

Evolutionary Models. Evolutionary Models Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Searching genomes for non-coding RNA using FastR

Searching genomes for non-coding RNA using FastR Searching genomes for non-coding RNA using FastR Shaojie Zhang Brian Haas Eleazar Eskin Vineet Bafna Keywords: non-coding RNA, database search, filtration, riboswitch, bacterial genome. Address for correspondence:

More information

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics

Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Tandy Warnow Founder Professor of Engineering The University of Illinois at Urbana-Champaign http://tandy.cs.illinois.edu

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I)

CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) CISC 889 Bioinformatics (Spring 2004) Sequence pairwise alignment (I) Contents Alignment algorithms Needleman-Wunsch (global alignment) Smith-Waterman (local alignment) Heuristic algorithms FASTA BLAST

More information

Bias in RNA sequencing and what to do about it

Bias in RNA sequencing and what to do about it Bias in RNA sequencing and what to do about it Walter L. (Larry) Ruzzo Computer Science and Engineering Genome Sciences University of Washington Fred Hutchinson Cancer Research Center Seattle, WA, USA

More information

De novo prediction of structural noncoding RNAs

De novo prediction of structural noncoding RNAs 1/ 38 De novo prediction of structural noncoding RNAs Stefan Washietl 18.417 - Fall 2011 2/ 38 Outline Motivation: Biological importance of (noncoding) RNAs Algorithms to predict structural noncoding RNAs

More information

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9

InDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9 Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of Computer Science San José State University San José, California, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri RNA Structure Prediction Secondary

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006

98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 98 Algorithms in Bioinformatics I, WS 06, ZBIT, D. Huson, December 6, 2006 8.3.1 Simple energy minimization Maximizing the number of base pairs as described above does not lead to good structure predictions.

More information

Algorithms in Computational Biology (236522) spring 2008 Lecture #1

Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Algorithms in Computational Biology (236522) spring 2008 Lecture #1 Lecturer: Shlomo Moran, Taub 639, tel 4363 Office hours: 15:30-16:30/by appointment TA: Ilan Gronau, Taub 700, tel 4894 Office hours:??

More information

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17

RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 RNA-Strukturvorhersage Strukturelle Bioinformatik WS16/17 Dr. Stefan Simm, 01.11.2016 simm@bio.uni-frankfurt.de RNA secondary structures a. hairpin loop b. stem c. bulge loop d. interior loop e. multi

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment

Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Module: Sequence Alignment Theory and Applications Session: Introduction to Searching and Sequence Alignment Introduction to Bioinformatics online course : IBT Jonathan Kayondo Learning Objectives Understand

More information

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types

THEORY. Based on sequence Length According to the length of sequence being compared it is of following two types Exp 11- THEORY Sequence Alignment is a process of aligning two sequences to achieve maximum levels of identity between them. This help to derive functional, structural and evolutionary relationships between

More information

Multiple Sequence Alignment using Profile HMM

Multiple Sequence Alignment using Profile HMM Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,

More information

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling

Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling Learning Sequence Motif Models Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 009 Mark Craven craven@biostat.wisc.edu Sequence Motifs what is a sequence

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

The wonderful world of RNA informatics

The wonderful world of RNA informatics December 9, 2012 Course Goals Familiarize you with the challenges involved in RNA informatics. Introduce commonly used tools, and provide an intuition for how they work. Give you the background and confidence

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Hidden Markov models in population genetics and evolutionary biology

Hidden Markov models in population genetics and evolutionary biology Hidden Markov models in population genetics and evolutionary biology Gerton Lunter Wellcome Trust Centre for Human Genetics Oxford, UK April 29, 2013 Topics for today Markov chains Hidden Markov models

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.

(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid. 1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the

More information

Introduction to Evolutionary Concepts

Introduction to Evolutionary Concepts Introduction to Evolutionary Concepts and VMD/MultiSeq - Part I Zaida (Zan) Luthey-Schulten Dept. Chemistry, Beckman Institute, Biophysics, Institute of Genomics Biology, & Physics NIH Workshop 2009 VMD/MultiSeq

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Whole Genome Alignments and Synteny Maps

Whole Genome Alignments and Synteny Maps Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of

More information

BMI/CS 576 Fall 2016 Final Exam

BMI/CS 576 Fall 2016 Final Exam BMI/CS 576 all 2016 inal Exam Prof. Colin Dewey Saturday, December 17th, 2016 10:05am-12:05pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.

More information

Homology Modeling. Roberto Lins EPFL - summer semester 2005

Homology Modeling. Roberto Lins EPFL - summer semester 2005 Homology Modeling Roberto Lins EPFL - summer semester 2005 Disclaimer: course material is mainly taken from: P.E. Bourne & H Weissig, Structural Bioinformatics; C.A. Orengo, D.T. Jones & J.M. Thornton,

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Supplementary Information for Discovery and characterization of indel and point mutations

Supplementary Information for Discovery and characterization of indel and point mutations Supplementary Information for Discovery and characterization of indel and point mutations using DeNovoGear Avinash Ramu 1 Michiel J. Noordam 1 Rachel S. Schwartz 2 Arthur Wuster 3 Matthew E. Hurles 3 Reed

More information

Data Mining in Bioinformatics HMM

Data Mining in Bioinformatics HMM Data Mining in Bioinformatics HMM Microarray Problem: Major Objective n Major Objective: Discover a comprehensive theory of life s organization at the molecular level 2 1 Data Mining in Bioinformatics

More information

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence

Page 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)

More information

Markov Chains and Hidden Markov Models. = stochastic, generative models

Markov Chains and Hidden Markov Models. = stochastic, generative models Markov Chains and Hidden Markov Models = stochastic, generative models (Drawing heavily from Durbin et al., Biological Sequence Analysis) BCH339N Systems Biology / Bioinformatics Spring 2016 Edward Marcotte,

More information

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008

6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008 MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, etworks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this

More information

Alignment & BLAST. By: Hadi Mozafari KUMS

Alignment & BLAST. By: Hadi Mozafari KUMS Alignment & BLAST By: Hadi Mozafari KUMS SIMILARITY - ALIGNMENT Comparison of primary DNA or protein sequences to other primary or secondary sequences Expecting that the function of the similar sequence

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Matrix-based pattern discovery algorithms

Matrix-based pattern discovery algorithms Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)

More information

Hidden Markov Models

Hidden Markov Models Andrea Passerini passerini@disi.unitn.it Statistical relational learning The aim Modeling temporal sequences Model signals which vary over time (e.g. speech) Two alternatives: deterministic models directly

More information

Genome 373: Hidden Markov Models II. Doug Fowler

Genome 373: Hidden Markov Models II. Doug Fowler Genome 373: Hidden Markov Models II Doug Fowler Review From Hidden Markov Models I What does a Markov model describe? Review From Hidden Markov Models I A T A Markov model describes a random process of

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

Alignment Algorithms. Alignment Algorithms

Alignment Algorithms. Alignment Algorithms Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.

More information

IMPROVEMENT OF STRUCTURE CONSERVATION INDEX WITH CENTROID ESTIMATORS

IMPROVEMENT OF STRUCTURE CONSERVATION INDEX WITH CENTROID ESTIMATORS 1 IMPROVEMENT OF STRUCTURE CONSERVATION INDEX WITH CENTROID ESTIMATORS YOHEI OKADA Department of Biosciences and Informatics, Keio University, 3 14 1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223 8522, Japan

More information

Sequence Alignment Techniques and Their Uses

Sequence Alignment Techniques and Their Uses Sequence Alignment Techniques and Their Uses Sarah Fiorentino Since rapid sequencing technology and whole genomes sequencing, the amount of sequence information has grown exponentially. With all of this

More information

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University

Sequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of

More information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of

More information

13 Comparative RNA analysis

13 Comparative RNA analysis 13 Comparative RNA analysis Sources for this lecture: R. Durbin, S. Eddy, A. Krogh und G. Mitchison, Biological sequence analysis, Cambridge, 1998 D.W. Mount. Bioinformatics: Sequences and Genome analysis,

More information

Computational methods for predicting protein-protein interactions

Computational methods for predicting protein-protein interactions Computational methods for predicting protein-protein interactions Tomi Peltola T-61.6070 Special course in bioinformatics I 3.4.2008 Outline Biological background Protein-protein interactions Computational

More information

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1

Tiffany Samaroo MB&B 452a December 8, Take Home Final. Topic 1 Tiffany Samaroo MB&B 452a December 8, 2003 Take Home Final Topic 1 Prior to 1970, protein and DNA sequence alignment was limited to visual comparison. This was a very tedious process; even proteins with

More information

CSE 527 Phylogeny & RNA: Pfold. Lectures Autumn 2006

CSE 527 Phylogeny & RNA: Pfold. Lectures Autumn 2006 CSE 527 Phylogeny & RNA: Pfold Lectures 20-21 Autumn 2006 Phylogenies (aka Evolutionary Trees) Nothing in biology makes sense, except in the light of evolution -- Dobzhansky Modeling Sequence Evolution

More information