RNA Search and Motif Discovery
- Lizbeth Pope
Transcription
1 RNA Search and Motif Discovery
Lecture 9, CSEP 590, Autumn 2008

Outline
- Whirlwind tour of ncRNA search & discovery
- RNA motif description (Covariance Model review)
- Algorithms for searching
- Rigorous & heuristic filtering
- Motif discovery
- Applications

RNA Motif Models

Motif Description
Covariance Models (Eddy & Durbin 1994)
- aka profile stochastic context-free grammars
- aka hidden Markov models on steroids
- Model position-specific nucleotide preferences and base-pair preferences
- Pro: accurate
- Con: model building hard, search sloooow
2 Example: searching for tRNAs

Profile HMM Structure
- Mj: Match states (20 emission probabilities)
- Ij: Insert states (background emission probabilities)
- Dj: Delete states (silent, no emission)

CM Structure
- A: Sequence + structure
- B: the CM "guide tree"
- C: probabilities of letters/pairs & of indels
- Think of each branch being an HMM emitting both sides of a helix (but 3' side emitted in reverse order)

CM Viterbi Alignment
- $x_i$ = $i$-th letter of input
- $x_{ij}$ = substring $i, \ldots, j$ of input
- $T_{yz}$ = P(transition $y \to z$)
- $E^{y}_{x_i,x_j}$ = P(emission of $x_i, x_j$ from state $y$)
- $S^{y}_{ij} = \max_{\pi} \log P(x_{ij} \text{ generated starting in state } y \text{ via path } \pi)$
3 CM Viterbi Alignment (recurrence)

$$S^{y}_{ij} = \max_{\pi} \log P(x_{ij} \text{ generated starting in state } y \text{ via path } \pi)$$

$$S^{y}_{ij} = \begin{cases}
\max_z \left[ S^{z}_{i+1,j-1} + \log T_{yz} + \log E^{y}_{x_i,x_j} \right] & \text{match pair} \\
\max_z \left[ S^{z}_{i+1,j} + \log T_{yz} + \log E^{y}_{x_i} \right] & \text{match/insert left} \\
\max_z \left[ S^{z}_{i,j-1} + \log T_{yz} + \log E^{y}_{x_j} \right] & \text{match/insert right} \\
\max_z \left[ S^{z}_{i,j} + \log T_{yz} \right] & \text{delete} \\
\max_{i < k \le j} \left[ S^{y_{\text{left}}}_{i,k} + S^{y_{\text{right}}}_{k+1,j} \right] & \text{bifurcation}
\end{cases}$$

Time $O(qn^3)$, $q$ states, seq len $n$

Mutual Information

$$M_{ij} = \sum_{x_i, x_j} f_{x_i,x_j} \log_2 \frac{f_{x_i,x_j}}{f_{x_i} f_{x_j}}; \qquad 0 \le M_{ij} \le 2$$

- Max when no seq conservation but perfect pairing
- MI = expected score gain from using a pair state
- Finding optimal MI (i.e. opt pairing of cols) is hard(?)
- Finding optimal MI without pseudoknots can be done by dynamic programming
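The mutual information score above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: it assumes the two alignment columns are given as equal-length strings, ignores gaps and pseudocounts, and the function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length strings, one character per sequence.
    Gap handling and pseudocounts are deliberately left out of this sketch.
    """
    n = len(col_i)
    f_i = Counter(col_i)                # single-column counts for column i
    f_j = Counter(col_j)                # single-column counts for column j
    f_ij = Counter(zip(col_i, col_j))   # joint counts for the column pair
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


# Perfect pairing with no sequence conservation reaches the 2-bit maximum.
covarying = mutual_information("ACGU", "UGCA")
# Fully conserved columns carry no covariation signal.
conserved = mutual_information("GGGG", "CCCC")
```

The two calls reproduce the extremes stated on the slide: the covarying pair of columns scores 2 bits, while a perfectly conserved pair scores 0 even though it may well be base-paired.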
4 Pseudoknots: disallowed vs. allowed

$$\left( \sum_{i=1}^{n} \max_j M_{i,j} \right) / 2$$

Rfam: an RNA family DB
Griffiths-Jones, et al., NAR 03, 05
- Biggest scientific computing user in Europe: CPU cluster for a month per release
- Rapidly growing:
  - Rel 1.0, 1/03: 25 families, 55k instances
  - Rel 7.0, 3/05: 503 families, >300k instances

Rfam
Input (hand-curated):
- MSA "seed alignment"
- SS_cons
- Score Thresh T
- Window Len W
Output:
- CM
- scan results & "full alignment"

IRE (partial seed alignment):
[Figure: seed alignment rows for Hom.sap., Cav.por., Mus.mus. and Rat.nor.; sequence text not preserved]
SS_cons <<<<<...<<<<<...>>>>>.>>>>>
5 Faster Search

Faster Genome Annotation of Non-coding RNAs Without Loss of Accuracy
Zasha Weinberg & W.L. Ruzzo
Recomb 04, ISMB 04, Bioinfo 06

RaveNnA: Genome Scale RNA Search
Typically 100x speedup over raw CM, w/ no loss in accuracy:
- drop structure from CM to create a (faster) HMM
- use that to pre-filter sequence;
- discard parts where, provably, CM score < threshold;
- actually run CM on the rest (the promising parts)
- assignment of HMM transition/emission scores is key (large convex optimization problem)
Weinberg & Ruzzo, Bioinformatics, 2004

CMs are good, but slow
- Rfam Reality: EMBL -> BLAST -> CM -> hits (junk discarded); 1 month, 1000 computers
- Our Work: EMBL -> Ravenna -> CM -> hits (junk discarded); ~2 months, 1000 computers
- Rfam Goal: EMBL -> CM -> hits; 10 years, [...] computers
6 CM to HMM
- CM: 25 emissions per state
- HMM: 5 emissions per state, 2x states

Key Issue: 25 scores -> 10
- A CM pair state P is replaced by an HMM left state L and right state R
- Need: log Viterbi scores CM $\le$ HMM
- $P_{ab} \le L_a + R_b$ for every pair of emitted symbols $(a, b)$
- NB: HMM not a prob. model

Viterbi/Forward Scoring
- Path $\pi$ defines transitions/emissions
- Score($\pi$) = product of "probabilities" on $\pi$
- NB: ok if "probs" aren't, e.g. $\sum \ne 1$ (e.g. in CM, emissions are odds ratios vs 0th-order background)
- For any nucleotide sequence $x$:
  - Viterbi-score($x$) = $\max \{ \text{score}(\pi) \mid \pi \text{ emits } x \}$
  - Forward-score($x$) = $\sum \{ \text{score}(\pi) \mid \pi \text{ emits } x \}$
7 Rigorous Filtering
Any scores satisfying the linear inequalities $P \le L + R$ give rigorous filtering.
Proof: CM Viterbi path score $\le$ "corresponding" HMM path score $\le$ Viterbi HMM path score (even if it does not correspond to any CM path).

Some scores filter better
- Two constraints sharing the same left score: $P = 1 \le L + R$ and $P = 4 \le L + R'$
- Assuming each nucleotide has frequency of about 25%:
  - Option 1: $L = R = R' = 2$, giving $L + (R + R')/2 = 4$
  - Option 2: $L = 0$, $R = 1$, $R' = 4$, giving $L + (R + R')/2 = 2.5$

Optimizing filtering
- For any nucleotide sequence $x$:
  - Viterbi-score($x$) = $\max \{ \text{score}(\pi) \mid \pi \text{ emits } x \}$
  - Forward-score($x$) = $\sum \{ \text{score}(\pi) \mid \pi \text{ emits } x \}$
- Expected Forward Score: $E(L_i, R_i) = \sum_{\text{all sequences } x} \text{Forward-score}(x) \cdot \Pr(x)$
- NB: E is a function of $L_i, R_i$ only (under 0th-order background model)
- Optimization: minimize $E(L_i, R_i)$ subject to score Lin.Ineq.s
- This is heuristic ("forward down => Viterbi down => filter down")
- But still rigorous because "subject to score Lin.Ineq.s"

Calculating $E(L_i, R_i)$
- $E(L_i, R_i) = \sum_x \text{Forward-score}(x) \cdot \Pr(x)$
- Forward-like: for every state, calculate expected score for all paths ending there; easily calculated from expected scores of predecessors & transition/emission probabilities/scores
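The "some scores filter better" comparison can be checked numerically. The sketch below is only an illustration under stated assumptions: the symbol labels `x`, `y`, `z` are placeholders (the transcript does not preserve which nucleotides the slide used), the background is uniform, and both function names are hypothetical.

```python
def is_rigorous(pair_scores, left, right):
    """True if every CM pair score is bounded by the HMM left + right scores."""
    return all(score <= left[a] + right[b]
               for (a, b), score in pair_scores.items())


def expected_emission_score(left, right):
    """Expected left + right emission score under a uniform background."""
    def mean(scores):
        return sum(scores.values()) / len(scores)
    return mean(left) + mean(right)


# Two pair scores sharing the same left symbol, as in the slide: P = 1 and P = 4.
pair_scores = {("x", "y"): 1, ("x", "z"): 4}

option_1 = ({"x": 2}, {"y": 2, "z": 2})   # L = R = R' = 2
option_2 = ({"x": 0}, {"y": 1, "z": 4})   # L = 0, R = 1, R' = 4

ok_1 = is_rigorous(pair_scores, *option_1)
ok_2 = is_rigorous(pair_scores, *option_2)
avg_1 = expected_emission_score(*option_1)
avg_2 = expected_emission_score(*option_2)
```

Both options satisfy the inequalities, so both give a rigorous filter; option 2 has the lower expected score (2.5 versus 4), which is why it is expected to let less background sequence through.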
8 Minimizing $E(L_i, R_i)$
Calculate $E(L_i, R_i)$ symbolically, in terms of emission scores, so we can do partial derivatives $\partial E(L_1, L_2, \ldots) / \partial L_i$ for a numerical convex optimization algorithm.

Convex Optimization
- Convex: local max = global max; simple hill climbing works
- Nonconvex: can be many local maxima, << global max; hill-climbing fails
[Figure: panels labeled Forward and Viterbi]

Estimated Filtering Efficiency (139 Rfam 4.0 families)
Filtering fraction | # families (compact) | # families (expanded)
< 10^-4 | 105 | 110
[...] | 8 | 17
[...] | 11 | 3
[...] | 2 | 2
[...] | 6 | 4
[...] | 7 | 3
~100x speedup

Results: New ncRNAs?
Name | # found, BLAST + CM | # found, rigorous filter + CM | # new
Pyrococcus snoRNA | 57 | 180 | 123
Iron response element | 201 | 322 | 121
Histone 3' element | 1004 | 1106 | 102
Purine riboswitch | 69 | 123 | 54
Retron msr | 11 | 59 | 48
Hammerhead I | 167 | 193 | 26
Hammerhead III | 251 | 264 | 13
U4 snRNA | 283 | 290 | 7
S-box | 128 | 131 | 3
U6 snRNA | 1462 | 1464 | 2
U5 snRNA | 199 | 200 | 1
U7 snRNA | 312 | 313 | 1
9 RNA Motif Discovery

Motif Discovery
- Typical problem: given ~10-20 unaligned sequences of ~1kb, most of which contain instances of one RNA motif of, say, 150bp, find it.
- Example: 5' UTRs of orthologous glycine cleavage genes from γ-proteobacteria

Searching for noncoding RNAs
- CMs are great, but where do they come from?
- An approach: comparative genomics
  - Search for motifs with common secondary structure in a set of functionally related sequences.
- Challenges
  - Three related tasks: locate the motif regions; align the motif instances; predict the consensus secondary structure.
  - Motif search space is huge! Motif location space, alignment space, structure space.

CMfinder: A Covariance Model Based RNA Motif Finding Algorithm
Bioinformatics, 2006, 22(4)
Zizhen Yao, Zasha Weinberg, Walter L. Ruzzo
University of Washington, Seattle
10 Approaches
- Align sequences, then look for common structure
- Predict structures, then try to align them
- Do both together

"Obvious" Approach I: Align First, Predict from Multiple Sequence Alignment
- Compensatory mutations reveal structure (core of "comparative sequence analysis"), but usual alignment algorithms penalize them (twice)

Pitfall for sequence-alignment-first approach
- Structural conservation ≠ sequence conservation
- Alignment without structure information is unreliable
- CLUSTALW alignment of SECIS elements with flanking regions: same-colored boxes should be aligned
[Figure: Pfold (KH03) Test Set D, trusted alignment vs. ClustalW alignment; x-axis: evolutionary distance]
Knudsen & Hein, Pfold: RNA secondary structure prediction using stochastic context-free grammars, Nucleic Acids Research, 2003, v 31
11 Approaches
- Align sequences, then look for common structure
- Predict structures, then try to align them
  - single-seq struct prediction only ~60% accurate; exacerbated by flanking seq; no biologically-validated model for structural alignment
- Do both together
  - Sankoff: good but slow
  - Heuristic

Our Approach: CMfinder
Simultaneous alignment, folding and CM-based motif description using an EM-style learning procedure
Yao, Weinberg & Ruzzo, Bioinformatics, 2006

Design Goals
- Find RNA motifs in unaligned sequences
- Seq conservation exploited, but not required
- Robust to inclusion of unrelated sequences
- Robust to inclusion of flanking sequence
- Reasonably fast and scalable
- Produce a probabilistic model of the motif that can be directly used for homolog search

Alignment -> CM -> Alignment
- Similar to HMM, but slower
- Builds on Eddy & Durbin, 94
- But new way to infer which columns to pair, via a principled combination of mutual information and predicted folding energy
- And, it's local, not global, alignment (harder)
12 CMfinder Outline
[Diagram: folding predictions -> heuristics -> candidate alignment -> (M step) CM -> (E step) realign, repeated; the CM is also used for search]

Initial Alignment Heuristics
- fold sequences separately
- candidates: regions with low folding energy
- compare candidates via "tree edit" algorithm
- find best "central" candidates & align to them
- BLAST anchors

EM: M-step uses M.I. + folding energy for structure prediction

Structure Inference
- Part of M-step is to pick a structure that maximizes data likelihood
- We combine:
  - mutual information
  - position-specific priors for paired/unpaired (based on single sequence thermodynamic folding predictions)
- intuition: for similar seqs, little MI; fall back on single-sequence folding predictions
- data-dependent, so not strictly Bayesian

CMfinder Accuracy (on Rfam families with flanking sequence)
[Figure: accuracy plot; content not preserved]
13 Application I
A Computational Pipeline for High Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes. Yao, Barrick, Weinberg, Neph, Breaker, Tompa and Ruzzo. PLoS Computational Biology. 3(7): e126, July 6, 2007.

Predicting New cis-regulatory RNA Elements
- Goal: given unaligned UTRs of coexpressed or orthologous genes, find common structural motifs
- Difficulties:
  - Low sequence similarity: alignment difficult
  - Varying flanking sequence
  - Motif missing from some input genes

Right Data: Why/How
- We can recognize, say, 5-10 good examples amidst 20 extraneous ones (but not 5 in 200 or 2000) of length 1k or 10k (but not 100k)
- Regulators often near regulatees (protein coding genes), which are usually recognizable cross-species
- So, find similar genes ("homologs"), look at adjacent DNA
- (Not the strategy used in vertebrates: [...]x larger genomes)
14 Approach
- Get bacterial genomes
- For each gene, get close orthologs (CDD)
- Find most promising genes, based on conserved sequence motifs (Footprinter)
- From those, find structural motifs (CMfinder)
- Genome-wide search for more instances (Ravenna)
- Expert analyses (Breaker Lab, Yale)

A pipeline for RNA motif genome scans
[Diagram: CDD -> orthologous genes -> upstream sequences -> Footprinter -> rank datasets -> top datasets -> CMfinder -> motifs -> search of genome database -> homologs]
Yao, Barrick, Weinberg, Neph, Breaker, Tompa and Ruzzo. A Computational Pipeline for High Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes. PLoS Computational Biology. 3(7): e126, July 6, 2007.

Genome Scale Search: Why
- Many riboswitches, e.g., are present in ~5 copies per genome
- In most close relatives
- More examples give better model, hence even more examples, fewer errors
- More examples give more clues to function: critical for wet lab verification
- But inclusion of non-examples can degrade motif

CMfinder is directly usable for/with search
[Diagram: folding predictions -> smart heuristics -> candidate alignment -> CM <-> realign; CM -> search]
15 Results
Have analyzed most sequenced bacteria (~2005):
- bacillus/clostridia
- gamma proteobacteria
- cyanobacteria
- actinobacteria
- firmicutes

[Diagram: identify CDD group members -> 2946 CDD groups -> retrieve upstream sequences -> Footprinter ranking -> CMfinder motifs -> motif postprocessing -> 1740 motifs -> RaveNnA -> CMfinder refinement -> motif postprocessing -> 1466 motifs. CPU times shown beside the stages, in figure order: < 10 CPU days; < 10 CPU days; 1 ~ 2 CPU months; 10 CPU months; < 1 CPU month]

Table 1: Motifs that correspond to Rfam families. "Rank": the three columns show ranks for refined motif clusters after genome scans ("RAV"), CMfinder motifs before genome scans ("CMF"), and FootPrinter results ("FP"). We used the same ranking scheme for RAV and CMF. "Score" [...]
[Columns in the source: Rank (RAV, CMF, FP) | Score (RAV, CMF) | # | CDD ID | Gene | Description | Rfam. Only the gene, description and Rfam columns are preserved.]
Gene | Description | Rfam
IlvB | Thiamine pyrophosphate-requiring enzymes | RF00230 T-box
COG3859 | Predicted membrane protein | RF00059 THI
MetH | Methionine synthase I | RF00162 S_box
COG0116 | Predicted N6-adenine-specific DNA methylase | RF00011 RNaseP_bact_b
DHBP | 3,4-dihydroxy-2-butanone 4-phosphate synthase | RF00050 RFN
GuaA | GMP synthase | RF00167 Purine
GcvP | Glycine cleavage system protein P | RF00504 Glycine
DUF149 | Uncharacterised BCR, YbaB family COG0718 | RF00169 SRP_bact
CbiB | Cobalamin biosynthesis protein CobD/CbiB | RF00174 Cobalamin
LysA | Diaminopimelate decarboxylase | RF00168 Lysine
TerC | Membrane protein TerC | RF00080 yybP-ykoY
MgtE | Mg/Co/Ni transporter MgtE | RF00380 ykoK
GlmS | Glucosamine 6-phosphate synthetase | RF00234 glmS
OpuBB | ABC-type proline/glycine betaine transport systems | RF00005 tRNA
EmrE | Membrane transporters of cations and cationic drug | RF00442 ykkC-yxkD
COG0398 | Uncharacterized conserved protein | RF00023 tmRNA

Table 2: Motif prediction accuracy vs prokaryotic subset of Rfam full alignments. "Membership": the number of sequences in the overlap between our predictions and Rfam's ("#"), the sensitivity ("Sn") and specificity ("Sp") of our membership predictions. "Overlap": avg length of overlap between our predictions and Rfam's ("nt"), the fractional lengths of the overlapped region in Rfam's predictions ("Sn") and in ours ("Sp"). "Structure": avg number of correctly predicted canonical base pairs (in overlapped regions) and the sensitivity ("Sn") and specificity ("Sp") of our predictions.
[Columns in the source: Rfam | Membership (#, Sn, Sp) | Overlap (nt, Sn, Sp) | Structure (bp, Sn, Sp). Rows: RF00174 Cobalamin, RF00504 Glycine, RF00234 glmS, RF00168 Lysine, RF00167 Purine, RF00050 RFN, RF00011 RNaseP_bact_b, RF00162 S_box, RF00169 SRP_bact, RF00230 T-box, RF00059 THI, RF00442 ykkC-yxkD, RF00380 ykoK, RF00080 yybP-ykoY, mean, median. Numeric values are not preserved.]
Footnote 1: After another iteration of RaveNnA scan and refinement, the membership sensitivities of Glycine and Cobalamin increased to 76% and 98% respectively, while the specificity of Glycine remained the same, and specificity of Cobalamin dropped to 84%.
16 [Table columns in the source: Rank | # | CDD | Gene: Description | Annotation. Rank, # and CDD values are not preserved.]
Gene: Description | Annotation
DHOase IIa: Dihydroorotase | PyrR attenuator [22]
RplL: Ribosomal protein L7/L12 | L10 r-protein leader; see Supp
RpsF: Ribosomal protein S6 | S6 r-protein leader
COG1179: Dinucleotide-utilizing enzymes | 6S RNA [25]
RpsJ: Ribosomal protein S10 | S10 r-protein leader; see Supp
Resolvase: N terminal domain |
InfC: Translation initiation factor 3 | IF-3 r-protein leader; see Supp
RpsD: Ribosomal protein S4 and related proteins | S4 r-protein leader; see Supp [30]
GroL: Chaperonin GroEL | HrcA DNA binding site [46]
Ribosomal L21p: Ribosomal prokaryotic L21 protein | L21 r-protein leader; see Supp
Cad: Cadmium resistance transporter | [47]
RplB: Ribosomal protein L2 | S10 r-protein leader
RNA pol Rpb2 1: RNA polymerase beta subunit |
COG3830: ACT domain-containing protein |
Ribosomal S2: Ribosomal protein S2 | S2 r-protein leader
RpsG: Ribosomal protein S7 | S12 r-protein leader
COG2984: ABC-type uncharacterized transport system |
CtsR: Firmicutes transcriptional repressor of class III | CtsR DNA binding site [48]
Formyl trans N: Formyl transferase |
PurE: Phosphoribosylcarboxyaminoimidazole |
COG4129: Predicted membrane protein |
RplO: Ribosomal protein L15 | L15 r-protein leader
RpmJ: Ribosomal protein L36 | IF-1 r-protein leader
Cna B: Cna protein B-type domain |
Ribosomal S12: Ribosomal protein S12 | S12 r-protein leader
Ribosomal L4: Ribosomal protein L4/L1 family | L3 r-protein leader
COG0742: N6-adenine-specific methylase | ylbH putative RNA motif [4]
Pencillinase R: Penicillinase repressor | BlaI, MecI DNA binding site [49]
Ribosomal S9: Ribosomal protein S9/S16 | L13 r-protein leader; Fig
Ribosomal L19: Ribosomal protein L19 | L19 r-protein leader; Fig
Gap: Glyceraldehyde-3-phosphate dehydrogenase/erythrose |
COG4708: Predicted membrane protein |
COG0325: Predicted enzyme with a TIM-barrel fold |
RpmF: Ribosomal protein L32 | L32 r-protein leader
LDH: L-lactate dehydrogenases |
CspR: Predicted rRNA methylase |
FusA: Translation elongation factors | EF-G r-protein leader
Legend: mRNA leader; mRNA leader switch?
boxed = confirmed riboswitch (+2 more)

Application II
Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline.
Weinberg, Barrick, Yao, Roth, Kim, Gore, Wang, Lee, Block, Sudarsan, Neph, Tompa, Ruzzo and Breaker.
Nucl. Acids Res., July [...]
17 New Riboswitches
- SAM IV (S-adenosyl methionine)
- SAH (S-adenosyl homocysteine)
- MOCO (Molybdenum Cofactor)
- PreQ1 II (queuosine precursor)
- GEMM (cyclic di-GMP)

GEMM regulated genes
- Pili and flagella
- Secretion
- Chemotaxis
- Signal transduction
- Chitin
- Membrane Peptide
- Other: tfoX, cytochrome c
GEMM senses a metabolite (cyclic di-GMP) produced for signal transduction or for cell-cell communication.

Utility?
- Unknown
- BUT: e.g., there are no known human riboswitches, so potentially fewer side effects from drugs that might target them
- Some such drugs (w/ previously unknown targets) have been known for decades!

ncRNA discovery in Vertebrates
Comparative genomics beyond sequence based alignments: RNA structures in the ENCODE regions
E. Torarinsson, Z. Yao, E. D. Wiklund, J. B. Bramsen, C. Hansen, J. Kjems, N. Tommerup, W. L. Ruzzo and J. Gorodkin
Genome Research, Jan [...]
18 ncRNA discovery in Vertebrates
- Previous studies focus on highly conserved regions (Washietl, Pedersen et al. 2007)
  - Evofold (Pedersen et al. 2006)
  - RNAz (Washietl et al. 2005)
- We explore regions with weak sequence conservation

Approach
- Extract ENCODE Multiz alignments
- Remove exons, most conserved elements
- [...] blocks, 8.7M bps
- Apply CMfinder to both strands
- 10,106 predictions, 6,587 clusters
- False positive rate: 50% based on a heuristic ranking function

Search in Vertebrates
- High false positive rate, but still suggests 1000's of RNAs
- (We've applied CMfinder to whole human genome: O(1000) CPU years. Analysis in progress.)
- Trust 17-way alignment for orthology, not for detailed alignment

Assoc w/ coding genes
- Many known human ncRNAs lie in introns
- Several of our candidates do, too, including some of the tested ones
  - #6: SYN3 (Synapsin 3)
  - #10: TIMP3, antisense within SYN3 intron
  - #9: GRM8 (glutamate receptor metabotropic 8)
19 Overlap with known transcripts
- Input regions include only one known ncRNA, hsa-mir-483, and we found it.
- 40% intergenic, 60% overlap with protein coding gene

Overlap w/ Indel Purified Segments
- IPS presumed to signal purifying selection
- Majority (64%) of candidates have >45% G+C
- Strong P-value for their overlap w/ IPS
[Table: G+C range | data | P | N | Expected | Observed | P-value | %; rows from 0-35% to "all"; values not preserved]

Comparison with Evofold, RNAz
[Venn diagram: CMfinder, Evofold, RNAz; the only preserved count is 1781]
Small overlap (w/ highly significant p-values) emphasizes complementarity

Alignment Matters
20 Realignment
[Figure: % realigned vs. average pairwise sequence similarity]
Neither RNAz nor Evofold are robust on poorly conserved and gappy regions. (Of course, they weren't designed to be.)

New scoring scheme
- Goal: improve false discovery rate for top ranking motifs
- Current methods cannot improve beyond 50% FDR by using a higher score threshold
- Test on CMfinder motifs in ENCODE regions
[Figure: False Discovery Rate (FDR) vs score ranks in the original alignments]
- 10 of 11 top (differentially) expressed
21 Summary
- ncRNA: apparently widespread, much interest
- Covariance Models: powerful but expensive tool for ncRNA motif representation, search, discovery
- Rigorous/Heuristic filtering: typically 100x speedup in search with no/little loss in accuracy
- CMfinder: good CM-based motif discovery in unaligned sequences
- Pipeline integrating comp and bio for riboswitch discovery
- Potentially many ncRNAs with weak sequence conservation in vertebrates

Summary
- Lots of structurally conserved ncRNA
- Functional significance often unclear
- But high rate of confirmed tissue-specific expression in (small) set of top candidates in humans
- BIG CPU demands
- Still need for further methods development & application

Thanks!

Discovering ncRNAs in prokaryotes through genome-wide clustering
Elizabeth Tseng, UW CSE
22 Our work
- Goal
  - Clustering for homologous ncRNA prediction
- Our Approach
  - Cluster genomic sequences by homology
  - Incorporate secondary structure information
- Challenges
  - Input: large search space
  - Homology inference: what tools to use?
  - How to evaluate?

Overview
- Motivation
- Approach
  - Clustering based on homology
  - Incorporating secondary structure information
- Evaluation
- Conclusion

Overview of approach
full genomic sequences -> Intergenic Region (IR) extraction
- Start from the full genomic sequence for species X (+ strand 5' to 3', - strand 3' to 5')
- Remove GenBank annotated CDS / tRNA / rRNA / repeat regions
- Discard IRs < 15 bps
- Discard IRs adjacent to rRNA
- Result: extracted IRs for species X
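The IR extraction step can be sketched as an interval sweep. This is a minimal illustration under assumptions not stated in the slides: a single strand, half-open 0-based coordinates, and annotations given as `(start, end)` pairs. The function name `intergenic_regions` is hypothetical, and the "discard IRs adjacent to rRNA" rule is left out because it needs feature types.

```python
def intergenic_regions(genome_length, annotated, min_length=15):
    """Return the gaps between annotated features that are at least min_length long.

    annotated is an iterable of half-open (start, end) intervals; they may
    overlap or nest. Coordinates are 0-based on a single strand (an assumption
    of this sketch).
    """
    regions = []
    cursor = 0                            # end of the annotation seen so far
    for start, end in sorted(annotated):
        if start - cursor >= min_length:  # keep only gaps long enough to matter
            regions.append((cursor, start))
        cursor = max(cursor, end)         # handles overlapping/nested features
    if genome_length - cursor >= min_length:
        regions.append((cursor, genome_length))
    return regions


# Overlapping annotations leave a single usable IR at the end of the sequence.
irs = intergenic_regions(100, [(10, 30), (35, 60), (25, 40)])
```

Here the leading 10 bp gap is shorter than the 15 bp cutoff and is dropped, so only the region from 60 to 100 survives.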
23 Overview of approach
full genomic sequences -> Intergenic Region (IR) extraction -> pool of IRs -> Homology search

Homology search programs
- Input: a query and a subject database (DB of sequences)
- Output: hits, each with a query segment x1-x2, a subject segment y1-y2, and a matching score S
- Popular homology search programs:
  - NCBI-BLAST
  - WU-BLAST
  - FASTA
  - SSEARCH: uses dynamic programming to find matching regions between two sequences; 10 times slower than the rest

Overview of approach
full genomic sequences -> Intergenic Region (IR) extraction -> pool of IRs -> Homology search -> homology hits -> Hierarchical clustering
24 Hierarchical clustering
- Homology program produces a list of hits between IR segments
- IR segments -> nodes
- hit between two segments -> connecting edge
- Similarity score -> edge weight
- Example: query segment x1-x2, subject segment y1-y2, matching score S gives nodes x1-x2 and y1-y2 joined by an edge of weight S

What if segments overlap?
- Hit 1: query segment x1-x2, subject segment y1-y2, matching score S1
- Hit 2: query segment x3-x4, subject segment z1-z2, matching score S2
- What if (x1,x2) and (x3,x4) overlap by a significant portion?
- If the overlapping region is highly conserved in sequence, infer a merged segment x1-x4 with hits to both y1-y2 and z1-z2

WPGMA (Weighted Pair Group Method using arithmetic Averaging)
- While there exists a connecting edge:
  - Select the edge with highest weight
  - Replace the connected two nodes with a new internal node
  - Update edge associations
[Figure: example weighted graph on nodes A to F]
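The WPGMA loop above can be sketched as follows. This is an illustrative reading of the three steps, not the author's implementation: the function name `wpgma` is hypothetical, and "update edge associations" is assumed to mean that the new node's similarity to a neighbor is the average of the two merged nodes' similarities, with a missing edge counted as 0.

```python
def wpgma(edges):
    """Greedy WPGMA-style clustering on a sparse similarity graph.

    edges maps (node_a, node_b) pairs to similarity scores. Returns the set of
    cluster roots as nested tuples, one per connected component.
    """
    weights = {frozenset(pair): w for pair, w in edges.items()}
    clusters = {node for pair in weights for node in pair}
    while weights:                                  # while a connecting edge exists
        best = max(weights, key=weights.get)        # edge with highest weight
        a, b = sorted(best, key=str)                # deterministic child order
        merged = (a, b)                             # new internal node
        del weights[best]
        updated = {}
        for pair, w in weights.items():
            if a in pair or b in pair:
                (other,) = pair - {a, b}
                key = frozenset((merged, other))
                # Average of the two merged nodes' similarities to `other`;
                # an absent edge contributes 0 (assumption of this sketch).
                updated[key] = updated.get(key, 0.0) + w / 2.0
            else:
                updated[pair] = w
        weights = updated
        clusters -= {a, b}
        clusters.add(merged)
    return clusters


hits = {("A", "B"): 10, ("B", "C"): 4, ("A", "C"): 2, ("D", "E"): 5}
roots = wpgma(hits)
```

With these made-up hits, A and B merge first, C then joins them with averaged similarity 3, and D and E form a separate cluster. The size threshold for cutting the tree mentioned in the slides is not implemented here.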
25 [Figure: WPGMA worked example; successive merges of the example graph build a tree with nested clusters; node labels and edge weights not preserved]
- Use size threshold to cut down tree size
- Cluster too big?

Overview
- Motivation
- Approach
  - Clustering based on homology
  - Incorporating secondary structure information
- Evaluation
- Conclusion
26 Secondary structure info
[Figure: aligned sequences with structure annotation <<<<<<... >>>>>> <<<<.... >>>>]
- More conserved in structure than sequence
- Can we include secondary structure when searching for homologs?

Secondary structure info
- Convert 4-alphabet (A, C, G, U) to 12-alphabet {<, >, _} x {A, C, G, U}
- Allow for mismatches between letters that are from different nucleotides, but the same structure
- DIY scoring matrix, for nucleotides X ≠ Y:
  - (<X, <X) -> great
  - (<X, <Y) -> good
  - (<X, _X) -> bad

Predicting structures on IR
Given an IR:
1. Break into overlapping pieces (prev. slide)
2. Feed each piece to RNAfold -> obtain structure
3. Convert pieces from 4-alphabet to 12-alphabet
4. Use homology program with DIY scoring matrix
5. Same clustering process

INPUT: Genomic sequences -> IR extraction -> Incorporate structure info. -> Homology search program -> Hierarchical clustering -> Tree cutting -> OUTPUT: Clusters of IR segments
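Steps 3 and 4 can be illustrated with a short sketch. It assumes the structure arrives in dot-bracket notation (as RNAfold prints it) and maps it to the slide's `<`, `>`, `_` symbols. The function names and the numeric scores are hypothetical placeholders; the slides only rank the three cases as great, good and bad.

```python
# Dot-bracket symbols mapped to the structure marks used in the slides.
STRUCTURE_MARKS = {"(": "<", ")": ">", ".": "_"}


def to_structure_alphabet(sequence, dot_bracket):
    """Combine each nucleotide with its structure mark (12-letter alphabet)."""
    if len(sequence) != len(dot_bracket):
        raise ValueError("sequence and structure must have the same length")
    return [STRUCTURE_MARKS[s] + base for base, s in zip(sequence, dot_bracket)]


def diy_score(x, y):
    """Toy scoring rule; the numeric values are illustrative assumptions."""
    if x == y:
        return 2        # same structure, same nucleotide: great
    if x[0] == y[0]:
        return 1        # same structure, different nucleotide: good
    return -1           # different structure: bad


symbols = to_structure_alphabet("GGAAACC", "((...))")
```

For the toy hairpin above, `symbols` is `['<G', '<G', '_A', '_A', '_A', '>C', '>C']`. A real homology search program would need these 12 symbols re-encoded as single characters in whatever alphabet its custom scoring matrix accepts.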
27 Example of a good scan
[Figure: ncRNA motif and IR segment]
CM scan recovered 95% of glmS with NO false hits!

Example of a bad scan
CM scan returned ~5000 false hits with 6 ylbH positive hits
More informationHMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder
HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding
More informationExample: The Dishonest Casino. Hidden Markov Models. Question # 1 Evaluation. The dishonest casino model. Question # 3 Learning. Question # 2 Decoding
Example: The Dishonest Casino Hidden Markov Models Durbin and Eddy, chapter 3 Game:. You bet $. You roll 3. Casino player rolls 4. Highest number wins $ The casino has two dice: Fair die P() = P() = P(3)
More informationDUPLICATED RNA GENES IN TELEOST FISH GENOMES
DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary
More information1/22/13. Example: CpG Island. Question 2: Finding CpG Islands
I529: Machine Learning in Bioinformatics (Spring 203 Hidden Markov Models Yuzhen Ye School of Informatics and Computing Indiana Univerty, Bloomington Spring 203 Outline Review of Markov chain & CpG island
More informationHMMs and biological sequence analysis
HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the
More informationRNA Protein Interaction
Introduction RN binding in detail structural analysis Examples RN Protein Interaction xel Wintsche February 16, 2009 xel Wintsche RN Protein Interaction Introduction RN binding in detail structural analysis
More informationLecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008
Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically
More informationNewly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:
m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail
More informationTowards a Comprehensive Annotation of Structured RNAs in Drosophila
Towards a Comprehensive Annotation of Structured RNAs in Drosophila Rebecca Kirsch 31st TBI Winterseminar, Bled 20/02/2016 Studying Non-Coding RNAs in Drosophila Why Drosophila? especially for novel molecules
More informationMultiple Sequence Alignment using Profile HMM
Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,
More informationPage 1. References. Hidden Markov models and multiple sequence alignment. Markov chains. Probability review. Example. Markovian sequence
Page Hidden Markov models and multiple sequence alignment Russ B Altman BMI 4 CS 74 Some slides borrowed from Scott C Schmidler (BMI graduate student) References Bioinformatics Classic: Krogh et al (994)
More informationAlignment Algorithms. Alignment Algorithms
Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.
More informationOutline. Sequence-comparison methods. Buzzzzzzzz. Why compare sequences? Gerard Kleywegt Uppsala University
MB330 - January, 2006 Sequence-comparison methods erard Kleywegt Uppsala University Outline! Why compare sequences?! Dotplots! airwise sequence alignments &! Multiple sequence alignments! rofile methods!
More informationComputational Genomics and Molecular Biology, Fall
Computational Genomics and Molecular Biology, Fall 2014 1 HMM Lecture Notes Dannie Durand and Rose Hoberman November 6th Introduction In the last few lectures, we have focused on three problems related
More informationPhylogeny Tree Algorithms
Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some
More informationCAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan
CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns
More informationAn Introduction to Bioinformatics Algorithms Hidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationProtein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.
Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein
More informationSequence analysis and comparison
The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species
More informationBMI/CS 576 Fall 2016 Final Exam
BMI/CS 576 all 2016 inal Exam Prof. Colin Dewey Saturday, December 17th, 2016 10:05am-12:05pm Name: KEY Write your answers on these pages and show your work. You may use the back sides of pages as necessary.
More informationBioinformatics Chapter 1. Introduction
Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!
More informationMultiple Alignment. Slides revised and adapted to Bioinformática IST Ana Teresa Freitas
n Introduction to Bioinformatics lgorithms Multiple lignment Slides revised and adapted to Bioinformática IS 2005 na eresa Freitas n Introduction to Bioinformatics lgorithms Outline Dynamic Programming
More informationHidden Markov Models
Hidden Markov Models Slides revised and adapted to Bioinformática 55 Engª Biomédica/IST 2005 Ana Teresa Freitas Forward Algorithm For Markov chains we calculate the probability of a sequence, P(x) How
More informationEukaryotic vs. Prokaryotic genes
BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,
More informationEvolutionary Models. Evolutionary Models
Edit Operators In standard pairwise alignment, what are the allowed edit operators that transform one sequence into the other? Describe how each of these edit operations are represented on a sequence alignment
More informationHidden Markov Models
Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training
More informationInDel 3-5. InDel 8-9. InDel 3-5. InDel 8-9. InDel InDel 8-9
Lecture 5 Alignment I. Introduction. For sequence data, the process of generating an alignment establishes positional homologies; that is, alignment provides the identification of homologous phylogenetic
More informationSequence alignment methods. Pairwise alignment. The universe of biological sequence analysis
he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction
More informationComparing whole genomes
BioNumerics Tutorial: Comparing whole genomes 1 Aim The Chromosome Comparison window in BioNumerics has been designed for large-scale comparison of sequences of unlimited length. In this tutorial you will
More information3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,
More informationComputational Biology: Basics & Interesting Problems
Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information
More informationIntroduction to Bioinformatics
CSCI8980: Applied Machine Learning in Computational Biology Introduction to Bioinformatics Rui Kuang Department of Computer Science and Engineering University of Minnesota kuang@cs.umn.edu History of Bioinformatics
More information10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison
10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:
More informationHMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM
I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington
More informationMolecular Modeling Lecture 7. Homology modeling insertions/deletions manual realignment
Molecular Modeling 2018-- Lecture 7 Homology modeling insertions/deletions manual realignment Homology modeling also called comparative modeling Sequences that have similar sequence have similar structure.
More informationWhole Genome Alignments and Synteny Maps
Whole Genome Alignments and Synteny Maps IINTRODUCTION It was not until closely related organism genomes have been sequenced that people start to think about aligning genomes and chromosomes instead of
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization: f 0 (0) = 1 f k (0)
More informationIn Genomes, Two Types of Genes
In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns
More informationTree of Life iological Sequence nalysis Chapter http://tolweb.org/tree/ Phylogenetic Prediction ll organisms on Earth have a common ancestor. ll species are related. The relationship is called a phylogeny
More informationHidden Markov Models. x 1 x 2 x 3 x K
Hidden Markov Models 1 1 1 1 2 2 2 2 K K K K x 1 x 2 x 3 x K HiSeq X & NextSeq Viterbi, Forward, Backward VITERBI FORWARD BACKWARD Initialization: V 0 (0) = 1 V k (0) = 0, for all k > 0 Initialization:
More informationAlgorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment
Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot
More informationBioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment
Bioinformatics (GLOBEX, Summer 2015) Pairwise sequence alignment Substitution score matrices, PAM, BLOSUM Needleman-Wunsch algorithm (Global) Smith-Waterman algorithm (Local) BLAST (local, heuristic) E-value
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Lecture : p he biological problem p lobal alignment p Local alignment p Multiple alignment 6 Background: comparative genomics p Basic question in biology: what properties
More informationSUPPLEMENTARY INFORMATION
Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)
More informationCSCE 478/878 Lecture 9: Hidden. Markov. Models. Stephen Scott. Introduction. Outline. Markov. Chains. Hidden Markov Models. CSCE 478/878 Lecture 9:
Useful for modeling/making predictions on sequential data E.g., biological sequences, text, series of sounds/spoken words Will return to graphical models that are generative sscott@cse.unl.edu 1 / 27 2
More informationMatrix-based pattern discovery algorithms
Regulatory Sequence Analysis Matrix-based pattern discovery algorithms Jacques.van.Helden@ulb.ac.be Université Libre de Bruxelles, Belgique Laboratoire de Bioinformatique des Génomes et des Réseaux (BiGRe)
More informationProbabilistic Arithmetic Automata
Probabilistic Arithmetic Automata Applications of a Stochastic Computational Framework in Biological Sequence Analysis Inke Herms PhD thesis defense Overview 1 Probabilistic Arithmetic Automata 2 Application
More informationPredicting RNA Secondary Structure Using Profile Stochastic Context-Free Grammars and Phylogenic Analysis
Fang XY, Luo ZG, Wang ZH. Predicting RNA secondary structure using profile stochastic context-free grammars and phylogenic analysis. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 23(4): 582 589 July 2008
More informationGenome 373: Hidden Markov Models II. Doug Fowler
Genome 373: Hidden Markov Models II Doug Fowler Review From Hidden Markov Models I What does a Markov model describe? Review From Hidden Markov Models I A T A Markov model describes a random process of
More informationLecture 7 Sequence analysis. Hidden Markov Models
Lecture 7 Sequence analysis. Hidden Markov Models Nicolas Lartillot may 2012 Nicolas Lartillot (Universite de Montréal) BIN6009 may 2012 1 / 60 1 Motivation 2 Examples of Hidden Markov models 3 Hidden
More informationSequence Alignment: A General Overview. COMP Fall 2010 Luay Nakhleh, Rice University
Sequence Alignment: A General Overview COMP 571 - Fall 2010 Luay Nakhleh, Rice University Life through Evolution All living organisms are related to each other through evolution This means: any pair of
More informationMultiple Sequence Alignment: HMMs and Other Approaches
Multiple Sequence Alignment: HMMs and Other Approaches Background Readings: Durbin et. al. Section 3.1, Ewens and Grant, Ch4. Wing-Kin Sung, Ch 6 Beerenwinkel N, Siebourg J. Statistics, probability, and
More informationAlgorithms in Bioinformatics
Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationComputational Genomics. Systems biology. Putting it together: Data integration using graphical models
02-710 Computational Genomics Systems biology Putting it together: Data integration using graphical models High throughput data So far in this class we discussed several different types of high throughput
More informationHidden Markov Models
Hidden Markov Models A selection of slides taken from the following: Chris Bystroff Protein Folding Initiation Site Motifs Iosif Vaisman Bioinformatics and Gene Discovery Colin Cherry Hidden Markov Models
More information3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT
3. SEQUENCE ANALYSIS BIOINFORMATICS COURSE MTAT.03.239 25.09.2012 SEQUENCE ANALYSIS IS IMPORTANT FOR... Prediction of function Gene finding the process of identifying the regions of genomic DNA that encode
More informationEECS730: Introduction to Bioinformatics
EECS730: Introduction to Bioinformatics Lecture 05: Index-based alignment algorithms Slides adapted from Dr. Shaojie Zhang (University of Central Florida) Real applications of alignment Database search
More information(Lys), resulting in translation of a polypeptide without the Lys amino acid. resulting in translation of a polypeptide without the Lys amino acid.
1. A change that makes a polypeptide defective has been discovered in its amino acid sequence. The normal and defective amino acid sequences are shown below. Researchers are attempting to reproduce the
More informationGraph Alignment and Biological Networks
Graph Alignment and Biological Networks Johannes Berg http://www.uni-koeln.de/ berg Institute for Theoretical Physics University of Cologne Germany p.1/12 Networks in molecular biology New large-scale
More informationMarkov Chains and Hidden Markov Models. COMP 571 Luay Nakhleh, Rice University
Markov Chains and Hidden Markov Models COMP 571 Luay Nakhleh, Rice University Markov Chains and Hidden Markov Models Modeling the statistical properties of biological sequences and distinguishing regions
More informationCSEP 590A Summer Tonight MLE. FYI, re HW #2: Hemoglobin History. Lecture 4 MLE, EM, RE, Expression. Maximum Likelihood Estimators
CSEP 59A Summer 26 Lecture 4 MLE, EM, RE, Expression FYI, re HW #2: Hemoglobin History 1 Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm
More informationCSEP 590A Summer Lecture 4 MLE, EM, RE, Expression
CSEP 590A Summer 2006 Lecture 4 MLE, EM, RE, Expression 1 FYI, re HW #2: Hemoglobin History Alberts et al., 3rd ed.,pg389 2 Tonight MLE: Maximum Likelihood Estimators EM: the Expectation Maximization Algorithm
More information17 Non-collinear alignment Motivation A B C A B C A B C A B C D A C. This exposition is based on:
17 Non-collinear alignment This exposition is based on: 1. Darling, A.E., Mau, B., Perna, N.T. (2010) progressivemauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 5(6):e11147.
More informationHIDDEN MARKOV MODEL-BASED HOMOLOGY SEARCH AND GENE PREDICTION IN NGS ERA
HIDDEN MRKOV MODEL-BSED HOMOLOY SERCH ND ENE PREDICTION IN NS ER By Prapaporn Techa-angkoon DISSERTTION Submitted to Michigan State University in partial fulfillment of the requirements for the degree
More informationUsing Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics
Using Ensembles of Hidden Markov Models for Grand Challenges in Bioinformatics Tandy Warnow Founder Professor of Engineering The University of Illinois at Urbana-Champaign http://tandy.cs.illinois.edu
More informationPhylogenetics: Building Phylogenetic Trees. COMP Fall 2010 Luay Nakhleh, Rice University
Phylogenetics: Building Phylogenetic Trees COMP 571 - Fall 2010 Luay Nakhleh, Rice University Four Questions Need to be Answered What data should we use? Which method should we use? Which evolutionary
More information