Modeling and Searching for Non-Coding RNA

Size: px
Start display at page:

Download "Modeling and Searching for Non-Coding RNA"

Transcription

1 Modeling and Searching for Non-oding RN W.L. Ruzzo

2 The entral Dogma DN RN Protein gene Protein DN (chromosome) cell RN (messenger)

3 lassical RNs mrn trn rrn snrn (small nuclear - spl icing) snorn (small nucleolar - guides for t/rrn modifications) RNseP (trn maturation; ribozyme in bacteria) SRP (signal recognition particle; co-translational targeting of proteins to membranes) telomerases

4 Non-coding RN Messenger RN - codes for proteins Non-coding RN - all the rest Before, say, mid 1990 s, 1-2 dozen known (critically important, but narrow roles: e.g. trn) Since mid 90 s dramatic discoveries Regulation, transport, stability/degradation E.g. microrn : 100 s in humans By some estimates, ncrn >> mrn

5 ncrn Example: Xist large (12kb?) unstructured RN required for X-inactivation in mammals

6 ncrn Example: 6S medium size (175nt) structured highly expressed in e. coli in certain growth conditions sequenced in 1971; function unknown for 30 years

7 ncrn Example: IRE Iron Response Element: a short conserved stemloop, bound by iron response proteins (IRPs). Found in TRs of various mrns whose products are involved in iron metabolism. E.g., the mrn of ferritin (an iron storage protein) contains one IRE in its 5' TR. When iron concentration is low, IRPs bind the ferritin mrn IRE. repressing translation. Binding of multiple IREs in the 3' and 5' TRs of the transferrin receptor (involved in iron acquisition) leads to increased mrn stability. These two activities form the basis of iron homeostasis in the vertebrate cell.

8 ncrn Example: MicroRNs short (~22 nt) unstructured RNs excised from ~75nt precursor hairpin approx antisense to mrn targets, often in 3 TR regulate gene activity, e.g. by destabilizing (plants) or otherwise suppressing (animals) message several hundred, w/ perhaps thousands of targets, are known

9 ncrn Example: Riboswitches TR structure that directly senses/binds small molecules & regulates mrn widespread in prokaryotes some in eukaryotes

10 E.g.: the lycine Riboswitch lycine - simplest amino acid ses - make proteins, make energy Not enough OR too much - wasteful Hypothetical answer: g prot g g prot g gcvt protein gcvt (glycine cleavage) g gcvt promoter

11 The lycine Riboswitch ctual answer (in many bacteria): Look Ma, no protein g g gcvt protein g gcvt mrn gcvt (glycine cleavage)

12 Why? RN s fold, and function Nature uses what works

13 entral Dogma = entral hicken & Egg? DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Was there once an RN World?

14 Outline ncrn: what/why? What does computation bring? How to model and search for ncrn? Faster search Better model inference

15 Iron Response Element IRE (partial seed alignment): Hom.sap. Hom.sap.. Hom.sap.. Hom.sap.... Hom.sap. Hom.sap.... Hom.sap... Hom.sap.... Hom.sap.... av.por. Mus.mus.... Mus.mus. Mus.mus. Rat.nor.... Rat.nor. SS_cons <<<<<...<<<<<...>>>>>.>>>>>

16 6S mimics an open promoter E.coli Barrick et al. RN 2005 Trotochaud et al. NSMB 2005 Willkomm et al. NR 2005

17 Dengue virus genome: ORF 3 element Known distribution (96% sequence identity) Dengue virus With our techniques (70% sequence identity) Dengue virus West Nile virus Yellow Fever Omsk Hemorrhagic fever Japanese encephalitis Tick-borne encephalitis Kunjin virus Langat virus Louping ill Murray Valley virus Powassan virus

18 polyadenylation inhibition element RN 1 small nuclear ribonucleoprotein RN element 3 TR Known distribution (90% sequence identity) Human, mouse, rabbit With our techniques (75% sequence identity) Human, mouse, rabbit Zebrafish, Tetraodon, Fugu Frog

19 Impact of RN homology search B. subtilis glycine riboswitch (Barrick, et al., 2004) operon L. innocua. tumefaciens V. cholera M. tuberculosis (and 19 more species)

20 Impact of RN homology search B. subtilis L. innocua glycine riboswitch (Barrick, et al., 2004) operon sing our techniques, we found. tumefaciens V. cholera M. tuberculosis (and 19 more species) (and 42 more species)

21 The lycine Riboswitch ctual answer (in many bacteria): Look Ma, no protein g g g g gcvt mrn gcvt protein gcvt (glycine cleavage)

22 (Mandal, Lee, Barrick, Weinberg, Emilsson, Ruzzo, Breaker, Science 2004) 5 nd More examples means better alignment nderstand phylogenetic distribution Find riboswitch in front of new gene gcvt ORF 3

23

24 RN Informatics RN: Not just a messenger anymore Dramatic discoveries Hundreds of families (besides classics like trn, rrn, snrn ) Widespread, important roles omputational tools important Discovery, characterization, annotation BT: slow, inaccurate, demanding

25 Q: What s so hard? : Structure often more important than sequence

26 omputational hallenges Search - given related RN s, find more Modeling - describe a related family Meta-modeling - what s a good modeling framework? M-based search Hand-curated alignments -> Ms ovariance Models

27 Predict Structure from Multiple Sequences ompensatory mutations reveal structure, but in usual alignment algorithms they are doubly penalized.

28 How to model an RN Motif? onceptually, start with a profile HMM: from a multiple alignment, estimate nucleotide/ insert/delete preferences for each position given a new seq, estimate likelihood that it could be generated by the model, & align it to the model mostly del ins all

29 How to model an RN Motif? ovariance Models (aka profile SF ) Probabilistic models, like profile HMMs, but adding column pairs and pair emission probabilities for base-paired regions <<<<<<< paired columns >>>>>>>

30 RN sequence analysis using covariance models Eddy & Durbin Nucleic cids Research, 1994 vol 22 #11,

31 What probabilistic model for RN families The ovariance Model Stochastic ontext-free rammar generalization of a profile HMM lgorithms for Training From aligned or unaligned sequences utomates comparative analysis omplements Nusinov/Zucker RN folding lgorithms for searching

32 Main Results Very accurate search for trn (Precursor to trnscanse - current favorite) iven sufficient data, model construction comparable to, but not quite as good as, human experts Some quantitative info on importance of pseudoknots and other tertiary features

33 Probabilistic Model Search s with HMMs, given a sequence, you calculate llikelihood ratio that the model could generate the sequence, vs a background model You set a score threshold nything above threshold => a hit Scoring: Forward / Inside algorithm - sum over all paths Viterbi approximation - find single best path (Bonus: alignment & structure prediction)

34 Example: searching for trns

35 lignment Quality

36 omparison to TRNSN Fichant & Burks - best heuristic then 97.5% true positive 0.37 false positives per MB M 1415 (trained on trusted alignment) > 99.98% true positives <0.2 false positives per MB urrent method-of-choice is trnscanse, a Mbased scan with heuristic pre-filtering (including TRNSN?) for performance reasons. Slightly different evaluation criteria

37 M Structure : Sequence + structure B: the M guide tree : probabilities of letters/ pairs & of indels Think of each branch being an HMM emitting both sides of a helix (but 3 side emitted in reverse order)

38 Overall M rchitecture One box ( node ) per node of guide tree BE/MTL/INS/DEL just like an HMM MTP & BIF are the key additions: MTP emits pairs of symbols, modeling base-pairs; BIF allows multiple helices

39 M Viterbi lignment x i = i th letter of input x ij = substring i,..., j of input T yz = P(transition y " z) y E xi,x j = P(emission of x i, x j from state y) S ij y = max # log P(x ij generated starting in state y via path #)

40 Viterbi, cont. S ij y = max " log P(x ij generated starting in state y via path ") S ij y = % z max z [S i+1, j#1 ' z max z [S i+1, j ' z & max z [S i, j#1 ' z ' max z [S i, j (' max i<k$ j [S i,k y left y + logt yz + log E xi,x j y + logt yz + log E xi ] match pair ] match/insert left + logt yz + log E x j ] match/insert right + logt yz ] delete y + S right k +1, j ] bifurcation

41 Model Training

42 Mutual Information M ij = f " f xi,xj xi,xj log xi,xj 2 f xi f xj ; 0 # M ij # 2 Max when no sequence conservation but perfect pairing MI = expected score gain from using a pair state Finding optimal MI, (i.e. optimal pairing of columns) is NP-hard(?) Finding optimal MI without pseudoknots can be done by dynamic programming

43

44 MI-Based Structure-Learning # S i+1, j % S S i, j = max$ i, j"1 S i+1, j"1 + M i, j &% max i< j<k S i,k + S k +1, j just like Nussinov/Zucker folding BT, need enough data---enough sequences at right phylogenetic distance

45 Pseudoknots disallowed allowed # n % & " max j M $ i=1 i, j (/2 '

46

47 ccelerating M search Zasha Weinberg & W.L. Ruzzo Recomb 04, Bioinformatics 04, 06

48 Rfam database (Release 7.0, 3/2005) 503 ncrn families 280,000 annotated ncrns 8 riboswitches, 235 small nucleolar RNs, 8 spliceosomal RNs, 10 bacterial antisense RNs, 46 microrns, 9 ribozymes, 122 cis RN regulatory elements,

49 Rfam Input (hand-tuned): MS SS_cons Score Thresh T Window Len W Output: M scan results IRE (partial seed alignment): Hom.sap. Hom.sap.. Hom.sap.. Hom.sap.... Hom.sap. Hom.sap.... Hom.sap... Hom.sap.... Hom.sap.... av.por. Mus.mus.... Mus.mus. Mus.mus. Rat.nor.... Rat.nor. SS_cons <<<<<...<<<<<...>>>>>.>>>>>

50 ovariance Model Key difference of M vs HMM: Pair states emit paired symbols, corresponding to base-paired nucleotides; 16 emission probabilities here.

51 M s are good, but slow Rfam Reality EMBL Our Work EMBL Rfam oal EMBL Blast Z junk M hits 1 month, 1000 computers M hits ~2 months, 1000 computers junk M hits 10 years, 1000 computers

52 Oversimplified M (for pedagogical purposes only)

53 M to HMM 25 emisions per state 5 emissions per state, 2x states M HMM

54 Key Issue: 25 scores 10 M HMM Need: log Viterbi scores M HMM

55 Viterbi/Forward Scoring Path π defines transitions/emissions Score(π) = product of probabilities on π NB: ok if probabilities aren t, e.g. E.g. in M, emissions are odds ratios vs 0thorder background For any nucleotide sequence x: Viterbi-score(x) = max{ score(π) π emits x} Forward-score(x) = score(π) π emits x}

56 Key Issue: 25 scores 10 M L R HMM Need: log Viterbi scores M HMM P L + R P L + R P L + R P L + R P L + R P L + R P L + R P L + R P L + R P L + R NB:HMM not a prob. model

57 Rigorous Filtering ny scores satisfying the linear inequalities give rigorous filtering P L + R P L + R P L + R P L + R P L + R Proof: M Viterbi path score corresponding HMM path score Viterbi HMM path score (even if it does not correspond to any M path)

58 Some scores filter better P = 1 L + R P = 4 L + R ssuming 25% Option 1: Opt 1: L = R = R = 2 L + (R + R )/2 = 4 Option 2: Opt 2: L = 0, R = 1, R = 4 L + (R + R )/2 = 2.5

59 Optimizing filtering For any nucleotide sequence x: Viterbi-score(x) = max{ score(π) π emits x } Forward-score(x) = score(π) π emits x } Expected Forward Score E(L i, R i ) = x Forward-score(x)*Pr(x) NB: E is a function of L i, R i only nder 0th-order background model Optimization: Minimize E(L i, R i ) subject to score L.I.s This is heuristic ( forward Viterbi filter ) But still rigorous because subject to score L.I.s

60 alculating E(L i, R i ) E(L i, R i ) = x Forward-score(x)*Pr(x) Forward-like: for every state, calculate expected score for all paths ending there, easily calculated from expected scores of predecessors & transition/ emission probabilities/scores

61 Minimizing E(L i, R i ) alculate E(L i, R i ) symbolically, in terms of emission scores, so we can do partial derivatives for a numerical convex optimization algorithm "E(L 1,L 2,...) "L i

62 What should the probabilities be? onvex optimization problem onstraints: enforce rigorous property Objective function: filter as aggressively as possible Problem sizes: variables inequality constraints

63 Estimated Filtering Efficiency (139 Rfam 4.0 families) Filtering fraction # families (compact) # families (expanded) break even < verages 283 times faster than M

64 Results: buried treasures Name # found BLST + M # found rigorous filter + M # new Pyrococcus snorn Iron response element Histone 3 element Purine riboswitch Retron msr Hammerhead I Hammerhead III snrn S-box snrn snrn snrn

65 What if filtering is poor? Profile HMM filter discards structure info Surprise is that they usually do very well But not always; e.g. a dozen families with filtering >.01, including stars like trn Three ideas: Sub M: graft in SM for a critical part Store Pair: retain a few critical pairs Filter hains: run fast, crude filters first

66 Full M Profile HMM Sub-M filters Sub-M sub-m Sub-profile-HMM

67 Store-pair filters Full M Store pair Profile HMM:

68 T Filter hains Rigorous filter Rigorous filter Rigorous filter M ncrns

69 Why run filters in series? Filter 1 Filter 2 M Filtering fraction N/ Run time (sec/kbase) M alone: 200 s/kb Filter 2 M: *200 = 12 s/kb Filter 1 Filter 2 M: * *200 = 5.5 s/kb

70 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Properties of a filter: Filtering fraction Run time (sec/kb)

71 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Simplified performance model (selectivity and speed) Independence assumptions for base pairs se dynamic programming to rapidly explore base pair combinations

72 Run time (sec/kb) Store pair 1E Filtering fraction Sub-M 1E Optimal rigorous filter series E

73 Estimated M time (days) faster Results: faster M: 30 years (your career) Filters: 1 month (time between school terms) 100 faster 10 faster Rigorous series of filters + M time (days)

74 Results: more sensitive than BLST # with BLST+M # with rigorous filters + M # new Rfam trn roup II intron Iron response element tmrn Lysine riboswitch nd more

75 Is there anything more to do? Rigorous filters can be too cautious E.g., 10 times slower than heuristic filters Yet only 1-3% more sensitive We want to Run scans faster with minimal loss of sensitivity Know empirically what sensitivity we re losing

76 Input Multiple Sequence lignment <.> Heuristic Profile HMMs M Base paired columns Infinite Multiple sequence alignments <.>... (Weinberg & Ruzzo, 2006) Profile HMM

77 RO-like curves (lysine riboswitch) Filter sensitivity HMM BLST Filter sends 80% of hits to M Filtering fraction

78 trnscan-se: the leading brand (Lowe & Eddy 1997) Designed for trns sed in virtually every genome project ses Ms Heuristics: selected 2 trn detection programs n ambitious target to shoot for

79 trnscan-se vs. heuristic HMMs rchaea Eubacteria. elegans Drosophila Human Sensitivity (%) t-se HMM Run time (hours) t-se HMM rigor (Each filter sends same number of nucleotides to M)

80 trnscan-se vs. heuristic HMMs Time to create heuristic filters for trns trnscan-se 8 papers HMM 10 seconds to type a command 15 minutes to create & calibrate HMM Point is not that heuristic HMM is better than trnscan- SE --- it s not; point is that it s in the ballpark, so may be easy way to get useful results for new families.

81 Building M s Hand-curated alignments + structure as in Rfam are great, but it doesn t scale Example pplication: iven 5-20 upstream regions (~500 nt) of orthologous bacterial genes, some (but not all) plausibly regulated by a common riboswitch, could we find it?

82 MFinder Harder: Finding Ms without alignment Yao, Weinberg & Ruzzo, Bioinformatics, 2006 Folding predictions Smart heuristics andidate alignment M Search Realign

83 Mfinder ccuracy (on Rfam families with flanking sequence) /W /W

84 Importance of lignment Blue boxes, e.g., should be lined up. Structure is invisible otherwise.

85 Early Semi-automated Example Started with 16 genes orthologous to fol in B. subtilis Found 10 sharing good structural motif Searched all bacterial genomes for this motif Found 234 hits Realigned these to refine structural motif Found 367 hits 257 match RFM s T-box (Based on hand-curated alignment of 67 knowns)

86

87 Initial orthologs Found by scan hloroflexus aurantiacus hloroflexi eobacter metallireducens eobacter sulphurreducens δ -Proteobacteria Symbiobacterium thermophilum

88 n approach for cis-regulatory RN discovery in bacteria 1. hoose a bacterial genome 2. For each gene, collect close orthologs 3. Find most promising genes, based on sequence motifs conserved among orthologs 4. From those, find most promising genes, incorporating structure in the motifs 5. From those, genome-wide searches for more instances 6. Expert analyses (Breaker Lab, Yale)

89 enome Scale Search: Why Most riboswitches, e.g., are present in ~5 copies per genome Throughout (most of) clade More examples give better model, hence even more examples, fewer errors More examples give more clues to function

90 enome Scale Search: How Mfinder is directly usable for/with search Folding predictions Smart heuristics andidate alignment M Search Realign

91 Results Process largely complete in bacillus/clostridia gamma proteobacteria cyanobacteria actinobacteria nalysis ongoing

92 Some Preliminary ctino Results Rfam Family Type (metabolite) Rank THI riboswitch (thiamine) 4 ydao-yua riboswitch (unknown) 19 obalamin riboswitch (cobalamin) 21 SRP_bact gene 28 RFN riboswitch (FMN) 39 yybp-ykoy riboswitch (unknown) 48 gcvt riboswitch (glycine) 53 S_box riboswitch (SM) 401 tmrn gene Not found RNaseP gene Not found not cisregulatory

93 More Prelim ctino Results Many others (not in Rfam) are likely real of top 50: known (Rfam, 23S) 10 probable (Tbox, IRE, Lex, parp, pyrr) 7 ribosomal genes 9 potentially interesting 12 unknown or poor 12 One other being bench-verified

94 Software Infernal - (Eddy et al.) most of Eddy & Durbin RaveNna - (Weinberg) fast filtering Mfinder - (Yao) Motif discovery (local alignment)

95 Summary ncrn is a hot topic For family homology modeling: Ms Training & search like HMM (but slower) Dramatic acceleration possible utomated model construction Hopefully leading to new discoveries

Searching for Noncoding RNA

Searching for Noncoding RNA Searching for Noncoding RN Larry Ruzzo omputer Science & Engineering enome Sciences niversity of Washington http://www.cs.washington.edu/homes/ruzzo Bio 2006, Seattle, 8/4/2006 1 Outline Noncoding RN Why

More information

RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology"

RNA Search and! Motif Discovery Genome 541! Intro to Computational! Molecular Biology RNA Search and! Motif Discovery" Genome 541! Intro to Computational! Molecular Biology" Day 1" Many biologically interesting roles for RNA" RNA secondary structure prediction" 3 4 Approaches to Structure

More information

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007

RNA Search and Motif Discovery. Lectures CSE 527 Autumn 2007 RNA Search and Motif Discovery Lectures 18-19 CSE 527 Autumn 2007 The Human Parts List, circa 2001 1 gagcccggcc cgggggacgg gcggcgggat agcgggaccc cggcgcggcg gtgcgcttca 61 gggcgcagcg gcggccgcag accgagcccc

More information

Genome 559 Wi RNA Function, Search, Discovery

Genome 559 Wi RNA Function, Search, Discovery Genome 559 Wi 2009 RN Function, Search, Discovery The Message Cells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex New tools required alignment, discovery,

More information

RNA Secondary Structure. CSE 417 W.L. Ruzzo

RNA Secondary Structure. CSE 417 W.L. Ruzzo RN Secondary Structure SE 417 W.L. Ruzzo The Double Helix Los lamos Science The entral Dogma of Molecular Biology DN RN Protein gene Protein DN (chromosome) cell RN (messenger) Non-coding RN Messenger

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo Outline Why RN? Examples of RN biology omputational hallenges Modeling Search Inference The entral Dogma DN RN

More information

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein!

The Double Helix. CSE 417: Algorithms and Computational Complexity! The Central Dogma of Molecular Biology! DNA! RNA! Protein! Protein! The Double Helix SE 417: lgorithms and omputational omplexity! Winter 29! W. L. Ruzzo! Dynamic Programming, II" RN Folding! http://www.rcsb.org/pdb/explore.do?structureid=1t! Los lamos Science The entral

More information

Modeling and Searching for Non-Coding RNA

Modeling and Searching for Non-Coding RNA Modeling and Searching for Non-oding RN W.L. Ruzzo http://www.cs.washington.edu/homes/ruzzo http://www.cs.washington.edu/homes/ruzzo/ courses/gs541/09sp ENOME 541 protein and DN sequence analysis to determine

More information

RNA Search and! Motif Discovery"

RNA Search and! Motif Discovery RN Search and! Motif Discovery" Lecture 9! SEP 590! utumn 2008" Outline" Whirlwind tour of ncrn search & discovery" RN motif description (ovariance Model Review)" lgorithms for searching" Rigorous & heuristic

More information

ROC TPR FPR

ROC TPR FPR HW5 2 3 TPR 0.0 0.2 0.4 0.6 0.8 1.0 291 411 537 669 807 957 1098 1269 1506 1971 8703 171 51 RO TPR = True Positive Rate = Sensitivity = Recall = TP/(TP+FN) FPR = False Positive Rate = 1 Specificity = FP/(FP+TN)

More information

In Genomes, Two Types of Genes

In Genomes, Two Types of Genes In Genomes, Two Types of Genes Protein-coding: [Start codon] [codon 1] [codon 2] [ ] [Stop codon] + DNA codons translated to amino acids to form a protein Non-coding RNAs (NcRNAs) No consistent patterns

More information

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery

CSEP 527 Computational Biology. RNA: Function, Secondary Structure Prediction, Search, Discovery SEP 527 omputational Biology RN: Function, Secondary Structure Prediction, Search, Discovery The Message ells make lots of RN noncoding RN Functionally important, functionally diverse Structurally complex

More information

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna.

Conserved RNA Structures. Ivo L. Hofacker. Institut for Theoretical Chemistry, University Vienna. onserved RN Structures Ivo L. Hofacker Institut for Theoretical hemistry, University Vienna http://www.tbi.univie.ac.at/~ivo/ Bled, January 2002 Energy Directed Folding Predict structures from sequence

More information

Stephen Scott.

Stephen Scott. 1 / 21 sscott@cse.unl.edu 2 / 21 Introduction Designed to model (profile) a multiple alignment of a protein family (e.g., Fig. 5.1) Gives a probabilistic model of the proteins in the family Useful for

More information

A Method for Aligning RNA Secondary Structures

A Method for Aligning RNA Secondary Structures Method for ligning RN Secondary Structures Jason T. L. Wang New Jersey Institute of Technology J Liu, JTL Wang, J Hu and B Tian, BM Bioinformatics, 2005 1 Outline Introduction Structural alignment of RN

More information

DUPLICATED RNA GENES IN TELEOST FISH GENOMES

DUPLICATED RNA GENES IN TELEOST FISH GENOMES DPLITED RN ENES IN TELEOST FISH ENOMES Dominic Rose, Julian Jöris, Jörg Hackermüller, Kristin Reiche, Qiang LI, Peter F. Stadler Bioinformatics roup, Department of omputer Science, and Interdisciplinary

More information

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan

CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools. Giri Narasimhan CAP 5510: Introduction to Bioinformatics CGS 5166: Bioinformatics Tools Giri Narasimhan ECS 254; Phone: x3748 giri@cis.fiu.edu www.cis.fiu.edu/~giri/teach/bioinfs15.html Describing & Modeling Patterns

More information

Bio 1B Lecture Outline (please print and bring along) Fall, 2007

Bio 1B Lecture Outline (please print and bring along) Fall, 2007 Bio 1B Lecture Outline (please print and bring along) Fall, 2007 B.D. Mishler, Dept. of Integrative Biology 2-6810, bmishler@berkeley.edu Evolution lecture #5 -- Molecular genetics and molecular evolution

More information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information

A graph kernel approach to the identification and characterisation of structured non-coding RNAs using multiple sequence alignment information graph kernel approach to the identification and characterisation of structured noncoding RNs using multiple sequence alignment information Mariam lshaikh lbert Ludwigs niversity Freiburg, Department of

More information

Detecting non-coding RNA in Genomic Sequences

Detecting non-coding RNA in Genomic Sequences Detecting non-coding RNA in Genomic Sequences I. Overview of ncrnas II. What s specific about RNA detection? III. Looking for known RNAs IV. Looking for unknown RNAs Daniel Gautheret INSERM ERM 206 & Université

More information

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Bioinformatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD Department of Computer Science University of Missouri 2008 Free for Academic

More information

Bioinformatics Chapter 1. Introduction

Bioinformatics Chapter 1. Introduction Bioinformatics Chapter 1. Introduction Outline! Biological Data in Digital Symbol Sequences! Genomes Diversity, Size, and Structure! Proteins and Proteomes! On the Information Content of Biological Sequences!

More information

Introduction to Molecular and Cell Biology

Introduction to Molecular and Cell Biology Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the molecular basis of disease? What

More information

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences

Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Statistical Machine Learning Methods for Biomedical Informatics II. Hidden Markov Model for Biological Sequences Jianlin Cheng, PhD William and Nancy Thompson Missouri Distinguished Professor Department

More information

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology

2012 Univ Aguilera Lecture. Introduction to Molecular and Cell Biology 2012 Univ. 1301 Aguilera Lecture Introduction to Molecular and Cell Biology Molecular biology seeks to understand the physical and chemical basis of life. and helps us answer the following? What is the

More information

Comparative Gene Finding. BMI/CS 776 Spring 2015 Colin Dewey

Comparative Gene Finding. BMI/CS 776  Spring 2015 Colin Dewey Comparative Gene Finding BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2015 Colin Dewey cdewey@biostat.wisc.edu Goals for Lecture the key concepts to understand are the following: using related genomes

More information

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus

L3.1: Circuits: Introduction to Transcription Networks. Cellular Design Principles Prof. Jenna Rickus L3.1: Circuits: Introduction to Transcription Networks Cellular Design Principles Prof. Jenna Rickus In this lecture Cognitive problem of the Cell Introduce transcription networks Key processing network

More information

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6)

Background: comparative genomics. Sequence similarity. Homologs. Similarity vs homology (2) Similarity vs homology. Sequence Alignment (chapter 6) Sequence lignment (chapter ) he biological problem lobal alignment Local alignment Multiple alignment Background: comparative genomics Basic question in biology: what properties are shared among organisms?

More information

An Introduction to Bioinformatics Algorithms Hidden Markov Models

An Introduction to Bioinformatics Algorithms   Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

proteins are the basic building blocks and active players in the cell, and

proteins are the basic building blocks and active players in the cell, and 12 RN Secondary Structure Sources for this lecture: R. Durbin, S. Eddy,. Krogh und. Mitchison, Biological sequence analysis, ambridge, 1998 J. Setubal & J. Meidanis, Introduction to computational molecular

More information

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder

HMM applications. Applications of HMMs. Gene finding with HMMs. Using the gene finder HMM applications Applications of HMMs Gene finding Pairwise alignment (pair HMMs) Characterizing protein families (profile HMMs) Predicting membrane proteins, and membrane protein topology Gene finding

More information

How much non-coding DNA do eukaryotes require?

How much non-coding DNA do eukaryotes require? How much non-coding DNA do eukaryotes require? Andrei Zinovyev UMR U900 Computational Systems Biology of Cancer Institute Curie/INSERM/Ecole de Mine Paritech Dr. Sebastian Ahnert Dr. Thomas Fink Bioinformatics

More information

Phylogeny Tree Algorithms

Phylogeny Tree Algorithms Phylogeny Tree lgorithms Jianlin heng, PhD School of Electrical Engineering and omputer Science University of entral Florida 2006 Free for academic use. opyright @ Jianlin heng & original sources for some

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Outline 1. CG-Islands 2. The Fair Bet Casino 3. Hidden Markov Model 4. Decoding Algorithm 5. Forward-Backward Algorithm 6. Profile HMMs 7. HMM Parameter Estimation 8. Viterbi Training

More information

Gene Expression. Molecular Genetics, March, 2018

Gene Expression. Molecular Genetics, March, 2018 Gene Expression Molecular Genetics, March, 2018 Gene Expression Control of Protein Levels Bacteria Lac Operon Promoter mrna Inducer CAP Control Trp Operon RepressorOperator Control Attenuation Riboswitches

More information

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting.

Genome Annotation. Bioinformatics and Computational Biology. Genome sequencing Assembly. Gene prediction. Protein targeting. Genome Annotation Bioinformatics and Computational Biology Genome Annotation Frank Oliver Glöckner 1 Genome Analysis Roadmap Genome sequencing Assembly Gene prediction Protein targeting trna prediction

More information

Computational Biology: Basics & Interesting Problems

Computational Biology: Basics & Interesting Problems Computational Biology: Basics & Interesting Problems Summary Sources of information Biological concepts: structure & terminology Sequencing Gene finding Protein structure prediction Sources of information

More information

Honors Biology Reading Guide Chapter 11

Honors Biology Reading Guide Chapter 11 Honors Biology Reading Guide Chapter 11 v Promoter a specific nucleotide sequence in DNA located near the start of a gene that is the binding site for RNA polymerase and the place where transcription begins

More information

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization.

3.B.1 Gene Regulation. Gene regulation results in differential gene expression, leading to cell specialization. 3.B.1 Gene Regulation Gene regulation results in differential gene expression, leading to cell specialization. We will focus on gene regulation in prokaryotes first. Gene regulation accounts for some of

More information

Genomes and Their Evolution

Genomes and Their Evolution Chapter 21 Genomes and Their Evolution PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from

More information

+ regulation. ribosomes

+ regulation. ribosomes central dogma + regulation rpl DNA tsx rrna trna mrna ribosomes tsl ribosomal proteins structural proteins transporters enzymes srna regulators RNAp DNAp tsx initiation control by transcription factors

More information

Sequence Alignment (chapter 6)

Sequence Alignment (chapter 6) Sequence lignment (chapter 6) he biological problem lobal alignment Local alignment Multiple alignment Introduction to bioinformatics, utumn 6 Background: comparative genomics Basic question in biology:

More information

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki.

Protein Bioinformatics. Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet sandberg.cmb.ki. Protein Bioinformatics Rickard Sandberg Dept. of Cell and Molecular Biology Karolinska Institutet rickard.sandberg@ki.se sandberg.cmb.ki.se Outline Protein features motifs patterns profiles signals 2 Protein

More information

Computational approaches for functional genomics

Computational approaches for functional genomics Computational approaches for functional genomics Kalin Vetsigian October 31, 2001 The rapidly increasing number of completely sequenced genomes have stimulated the development of new methods for finding

More information

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus:

Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: m Eukaryotic mrna processing Newly made RNA is called primary transcript and is modified in three ways before leaving the nucleus: Cap structure a modified guanine base is added to the 5 end. Poly-A tail

More information

HMMs and biological sequence analysis

HMMs and biological sequence analysis HMMs and biological sequence analysis Hidden Markov Model A Markov chain is a sequence of random variables X 1, X 2, X 3,... That has the property that the value of the current state depends only on the

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION Supplementary information S1 (box). Supplementary Methods description. Prokaryotic Genome Database Archaeal and bacterial genome sequences were downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/)

More information

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are:

Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Comparative genomics and proteomics Species available Ensembl focuses on metazoan (animal) genomes. The genomes currently available at the Ensembl site are: Vertebrates: human, chimpanzee, mouse, rat,

More information

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained

Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained Copyright (c) 2007 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

More information

Name: SBI 4U. Gene Expression Quiz. Overall Expectation:

Name: SBI 4U. Gene Expression Quiz. Overall Expectation: Gene Expression Quiz Overall Expectation: - Demonstrate an understanding of concepts related to molecular genetics, and how genetic modification is applied in industry and agriculture Specific Expectation(s):

More information

Controlling Gene Expression

Controlling Gene Expression Controlling Gene Expression Control Mechanisms Gene regulation involves turning on or off specific genes as required by the cell Determine when to make more proteins and when to stop making more Housekeeping

More information

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison

10-810: Advanced Algorithms and Models for Computational Biology. microrna and Whole Genome Comparison 10-810: Advanced Algorithms and Models for Computational Biology microrna and Whole Genome Comparison Central Dogma: 90s Transcription factors DNA transcription mrna translation Proteins Central Dogma:

More information

RNA Abstract Shape Analysis

RNA Abstract Shape Analysis ourse: iegerich RN bstract nalysis omplete shape iegerich enter of Biotechnology Bielefeld niversity robert@techfak.ni-bielefeld.de ourse on omputational RN Biology, Tübingen, March 2006 iegerich ourse:

More information

BIOINFORMATICS LAB AP BIOLOGY

BIOINFORMATICS LAB AP BIOLOGY BIOINFORMATICS LAB AP BIOLOGY Bioinformatics is the science of collecting and analyzing complex biological data. Bioinformatics combines computer science, statistics and biology to allow scientists to

More information

Lecture 18 June 2 nd, Gene Expression Regulation Mutations

Lecture 18 June 2 nd, Gene Expression Regulation Mutations Lecture 18 June 2 nd, 2016 Gene Expression Regulation Mutations From Gene to Protein Central Dogma Replication DNA RNA PROTEIN Transcription Translation RNA Viruses: genome is RNA Reverse Transcriptase

More information

GCD3033:Cell Biology. Transcription

GCD3033:Cell Biology. Transcription Transcription Transcription: DNA to RNA A) production of complementary strand of DNA B) RNA types C) transcription start/stop signals D) Initiation of eukaryotic gene expression E) transcription factors

More information

Prokaryotic Regulation

Prokaryotic Regulation Prokaryotic Regulation Control of transcription initiation can be: Positive control increases transcription when activators bind DNA Negative control reduces transcription when repressors bind to DNA regulatory

More information

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment

Algorithms in Bioinformatics FOUR Pairwise Sequence Alignment. Pairwise Sequence Alignment. Convention: DNA Sequences 5. Sequence Alignment Algorithms in Bioinformatics FOUR Sami Khuri Department of Computer Science San José State University Pairwise Sequence Alignment Homology Similarity Global string alignment Local string alignment Dot

More information

GENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES.

GENETICS - CLUTCH CH.12 GENE REGULATION IN PROKARYOTES. GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES!! www.clutchprep.com GEETICS - CLUTCH CH.12 GEE REGULATIO I PROKARYOTES COCEPT: LAC OPERO An operon is a group of genes with similar functions that are

More information

Sequence analysis and comparison

Sequence analysis and comparison The aim with sequence identification: Sequence analysis and comparison Marjolein Thunnissen Lund September 2012 Is there any known protein sequence that is homologous to mine? Are there any other species

More information

Searching genomes for non-coding RNA using FastR

Searching genomes for non-coding RNA using FastR Searching genomes for non-coding RNA using FastR Shaojie Zhang Brian Haas Eleazar Eskin Vineet Bafna Keywords: non-coding RNA, database search, filtration, riboswitch, bacterial genome. Address for correspondence:

More information

Eukaryotic vs. Prokaryotic genes

Eukaryotic vs. Prokaryotic genes BIO 5099: Molecular Biology for Computer Scientists (et al) Lecture 18: Eukaryotic genes http://compbio.uchsc.edu/hunter/bio5099 Larry.Hunter@uchsc.edu Eukaryotic vs. Prokaryotic genes Like in prokaryotes,

More information

Predicting RNA Secondary Structure

Predicting RNA Secondary Structure 7.91 / 7.36 / BE.490 Lecture #6 Mar. 11, 2004 Predicting RNA Secondary Structure Chris Burge Review of Markov Models & DNA Evolution CpG Island HMM The Viterbi Algorithm Real World HMMs Markov Models for

More information

Markov Models & DNA Sequence Evolution

Markov Models & DNA Sequence Evolution 7.91 / 7.36 / BE.490 Lecture #5 Mar. 9, 2004 Markov Models & DNA Sequence Evolution Chris Burge Review of Markov & HMM Models for DNA Markov Models for splice sites Hidden Markov Models - looking under

More information

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98)

Hidden Markov Models. Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) Hidden Markov Models Main source: Durbin et al., Biological Sequence Alignment (Cambridge, 98) 1 The occasionally dishonest casino A P A (1) = P A (2) = = 1/6 P A->B = P B->A = 1/10 B P B (1)=0.1... P

More information

Bio 119 Bacterial Genomics 6/26/10

Bio 119 Bacterial Genomics 6/26/10 BACTERIAL GENOMICS Reading in BOM-12: Sec. 11.1 Genetic Map of the E. coli Chromosome p. 279 Sec. 13.2 Prokaryotic Genomes: Sizes and ORF Contents p. 344 Sec. 13.3 Prokaryotic Genomes: Bioinformatic Analysis

More information

Comparative Bioinformatics Midterm II Fall 2004

Comparative Bioinformatics Midterm II Fall 2004 Comparative Bioinformatics Midterm II Fall 2004 Objective Answer, part I: For each of the following, select the single best answer or completion of the phrase. (3 points each) 1. Deinococcus radiodurans

More information

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010

BLAST Database Searching. BME 110: CompBio Tools Todd Lowe April 8, 2010 BLAST Database Searching BME 110: CompBio Tools Todd Lowe April 8, 2010 Admin Reading: Read chapter 7, and the NCBI Blast Guide and tutorial http://www.ncbi.nlm.nih.gov/blast/why.shtml Read Chapter 8 for

More information

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday

Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday Complete all warm up questions Focus on operon functioning we will be creating operon models on Monday 1. What is the Central Dogma? 2. How does prokaryotic DNA compare to eukaryotic DNA? 3. How is DNA

More information

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p

Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p Organization of Genes Differs in Prokaryotic and Eukaryotic DNA Chapter 10 p.110-114 Arrangement of information in DNA----- requirements for RNA Common arrangement of protein-coding genes in prokaryotes=

More information

Alignment Algorithms. Alignment Algorithms

Alignment Algorithms. Alignment Algorithms Midterm Results Big improvement over scores from the previous two years. Since this class grade is based on the previous years curve, that means this class will get higher grades than the previous years.

More information

Introduction. Gene expression is the combined process of :

Introduction. Gene expression is the combined process of : 1 To know and explain: Regulation of Bacterial Gene Expression Constitutive ( house keeping) vs. Controllable genes OPERON structure and its role in gene regulation Regulation of Eukaryotic Gene Expression

More information

Genomics and bioinformatics summary. Finding genes -- computer searches

Genomics and bioinformatics summary. Finding genes -- computer searches Genomics and bioinformatics summary 1. Gene finding: computer searches, cdnas, ESTs, 2. Microarrays 3. Use BLAST to find homologous sequences 4. Multiple sequence alignments (MSAs) 5. Trees quantify sequence

More information

Introduction to Hidden Markov Models for Gene Prediction ECE-S690

Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Introduction to Hidden Markov Models for Gene Prediction ECE-S690 Outline Markov Models The Hidden Part How can we use this for gene prediction? Learning Models Want to recognize patterns (e.g. sequence

More information

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever.

Related Courses He who asks is a fool for five minutes, but he who does not ask remains a fool forever. CSE 527 Computational Biology http://www.cs.washington.edu/527 Lecture 1: Overview & Bio Review Autumn 2004 Larry Ruzzo Related Courses He who asks is a fool for five minutes, but he who does not ask remains

More information

O 3 O 4 O 5. q 3. q 4. Transition

O 3 O 4 O 5. q 3. q 4. Transition Hidden Markov Models Hidden Markov models (HMM) were developed in the early part of the 1970 s and at that time mostly applied in the area of computerized speech recognition. They are first described in

More information

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species

Early History up to Schedule. Proteins DNA & RNA Schwann and Schleiden Cell Theory Charles Darwin publishes Origin of Species Schedule Bioinformatics and Computational Biology: History and Biological Background (JH) 0.0 he Parsimony criterion GKN.0 Stochastic Models of Sequence Evolution GKN 7.0 he Likelihood criterion GKN 0.0

More information

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1.

Motifs and Logos. Six Introduction to Bioinformatics. Importance and Abundance of Motifs. Getting the CDS. From DNA to Protein 6.1. Motifs and Logos Six Discovering Genomics, Proteomics, and Bioinformatics by A. Malcolm Campbell and Laurie J. Heyer Chapter 2 Genome Sequence Acquisition and Analysis Sami Khuri Department of Computer

More information

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison

CMPS 6630: Introduction to Computational Biology and Bioinformatics. Structure Comparison CMPS 6630: Introduction to Computational Biology and Bioinformatics Structure Comparison Protein Structure Comparison Motivation Understand sequence and structure variability Understand Domain architecture

More information

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008

Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 Lecture 8 Learning Sequence Motif Models Using Expectation Maximization (EM) Colin Dewey February 14, 2008 1 Sequence Motifs what is a sequence motif? a sequence pattern of biological significance typically

More information

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E

REVIEW SESSION. Wednesday, September 15 5:30 PM SHANTZ 242 E REVIEW SESSION Wednesday, September 15 5:30 PM SHANTZ 242 E Gene Regulation Gene Regulation Gene expression can be turned on, turned off, turned up or turned down! For example, as test time approaches,

More information

Algorithms in Bioinformatics

Algorithms in Bioinformatics Algorithms in Bioinformatics Sami Khuri Department of omputer Science San José State University San José, alifornia, USA khuri@cs.sjsu.edu www.cs.sjsu.edu/faculty/khuri Pairwise Sequence Alignment Homology

More information

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON

CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON PROKARYOTE GENES: E. COLI LAC OPERON CHAPTER 13 CHAPTER 13 PROKARYOTE GENES: E. COLI LAC OPERON Figure 1. Electron micrograph of growing E. coli. Some show the constriction at the location where daughter

More information

Today s Lecture: HMMs

Today s Lecture: HMMs Today s Lecture: HMMs Definitions Examples Probability calculations WDAG Dynamic programming algorithms: Forward Viterbi Parameter estimation Viterbi training 1 Hidden Markov Models Probability models

More information

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis

Sequence alignment methods. Pairwise alignment. The universe of biological sequence analysis he universe of biological sequence analysis Word/pattern recognition- Identification of restriction enzyme cleavage sites Sequence alignment methods PstI he universe of biological sequence analysis - prediction

More information

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications

GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 1 GENE ACTIVITY Gene structure Transcription Transcript processing mrna transport mrna stability Translation Posttranslational modifications 2 DNA Promoter Gene A Gene B Termination Signal Transcription

More information

Regulation of Transcription in Eukaryotes

Regulation of Transcription in Eukaryotes Regulation of Transcription in Eukaryotes Leucine zipper and helix-loop-helix proteins contain DNA-binding domains formed by dimerization of two polypeptide chains. Different members of each family can

More information

Bioinformatics Exercises

Bioinformatics Exercises Bioinformatics Exercises AP Biology Teachers Workshop Susan Cates, Ph.D. Evolution of Species Phylogenetic Trees show the relatedness of organisms Common Ancestor (Root of the tree) 1 Rooted vs. Unrooted

More information

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11

The Eukaryotic Genome and Its Expression. The Eukaryotic Genome and Its Expression. A. The Eukaryotic Genome. Lecture Series 11 The Eukaryotic Genome and Its Expression Lecture Series 11 The Eukaryotic Genome and Its Expression A. The Eukaryotic Genome B. Repetitive Sequences (rem: teleomeres) C. The Structures of Protein-Coding

More information

The Gene The gene; Genes Genes Allele;

The Gene The gene; Genes Genes Allele; Gene, genetic code and regulation of the gene expression, Regulating the Metabolism, The Lac- Operon system,catabolic repression, The Trp Operon system: regulating the biosynthesis of the tryptophan. Mitesh

More information

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

3/1/17. Content. TWINSCAN model. Example. TWINSCAN algorithm. HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) Content HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University,

More information

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes

A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes A Computational Pipeline for High- Throughput Discovery of cis-regulatory Noncoding RNA in Prokaryotes Zizhen Yao 1*, Jeffrey Barrick 2, Zasha Weinberg 3, Shane Neph 1,4, Ronald Breaker 2,3,5, Martin Tompa

More information

RNA Protein Interaction

RNA Protein Interaction Introduction RN binding in detail structural analysis Examples RN Protein Interaction xel Wintsche February 16, 2009 xel Wintsche RN Protein Interaction Introduction RN binding in detail structural analysis

More information

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22)

Warm-Up. Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22) Warm-Up Explain how a secondary messenger is activated, and how this affects gene expression. (LO 3.22) Yesterday s Picture The first cell on Earth (approx. 3.5 billion years ago) was simple and prokaryotic,

More information

From gene to protein. Premedical biology

From gene to protein. Premedical biology From gene to protein Premedical biology Central dogma of Biology, Molecular Biology, Genetics transcription replication reverse transcription translation DNA RNA Protein RNA chemically similar to DNA,

More information

BLAST. Varieties of BLAST

BLAST. Varieties of BLAST BLAST Basic Local Alignment Search Tool (1990) Altschul, Gish, Miller, Myers, & Lipman Uses short-cuts or heuristics to improve search speed Like speed-reading, does not examine every nucleotide of database

More information

Multiple Sequence Alignment using Profile HMM

Multiple Sequence Alignment using Profile HMM Multiple Sequence Alignment using Profile HMM. based on Chapter 5 and Section 6.5 from Biological Sequence Analysis by R. Durbin et al., 1998 Acknowledgements: M.Sc. students Beatrice Miron, Oana Răţoi,

More information

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM

HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM I529: Machine Learning in Bioinformatics (Spring 2017) HMM for modeling aligned multiple sequences: phylo-hmm & multivariate HMM Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington

More information

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha

Neural Networks for Protein Structure Prediction Brown, JMB CS 466 Saurabh Sinha Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha Outline Goal is to predict secondary structure of a protein from its sequence Artificial Neural Network used for this

More information

Multiple Sequence Alignment

Multiple Sequence Alignment Multiple equence lignment Four ami Khuri Dept of omputer cience an José tate University Multiple equence lignment v Progressive lignment v Guide Tree v lustalw v Toffee v Muscle v MFFT * 20 * 0 * 60 *

More information