Incorporating dependence into models for DNA motifs


1 Incorporating dependence into models for DNA motifs. Terry Speed & Xiaoyue Zhao, University of California at Berkeley. Department of Human Genetics, UCLA, May 17,

2 The objects of our study DNA, RNA and proteins: macromolecules which are unbranched polymers built up from smaller units. DNA: units are the nucleotide residues A, C, G and T. RNA: units are the nucleotide residues A, C, G and U. Proteins: units are the amino acid residues A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W and Y. To a considerable extent, the chemical properties of DNA, RNA and protein molecules are encoded in the linear sequence of these basic units: their primary structure.

3 Motifs - Sites - Signals - Domains For this talk, I'll use these terms interchangeably to describe recurring elements of interest to us. In PROTEINS we have: transmembrane domains, coiled-coil domains, EGF-like domains, signal peptides, phosphorylation sites, antigenic determinants, ... In DNA / RNA we have: enhancers, promoters, terminators, splicing signals, translation initiation sites, centromeres, ...

4 Why (probability) models for biomolecular motifs? To characterize them; to help identify them; for incorporation into larger models, e.g. for an entire gene.

5 Motifs and models Motifs typically represent regions of structural significance with specific biological function. They are generalisations from known examples. The models can be highly specific. Multiple models can be used to give higher sensitivity & specificity in their detection. Models can sometimes be generated automatically from examples or multiple alignments.

6 The use of models for motifs Can be descriptive, predictive or everything else in between... almost business as usual. However, stochastic mechanisms should never be taken literally, but nevertheless they can be amazingly useful. Care is always needed: a model or method can break down at any time without notice. Biological confirmation of predictions is almost always necessary.

7 Transcription initiation in E. coli In E. coli, transcription is initiated at the promoter, and the sequence of the promoter is recognised by the sigma factor of RNA polymerase.

8 Determinism 1: consensus sequences
σ factor    Promoter consensus sequence
σ70         TTGACA  TATAAT
σ28         CTAAA   CCGATAT
Similarly for σ32, σ38 and σ54. Consensus sequences have the obvious limitation: there is usually some deviation from them.

9 The human transcription factor Sp1 has 3 Cys-Cys-His-His zinc finger DNA binding domains.

10 Determinism 2: regular expressions The characteristic motif of a Cys-Cys-His-His zinc finger DNA binding domain has regular expression C-X(2,4)-C-X(3)-[LIVMFYWC]-X(8)-H-X(3,5)-H. Here, as in algebra, X is unknown. The 29 a.a. sequence of our example domain 1SP1 is as follows, clearly fitting the model. 1SP1: KKFACPECPKRFMRSDHLSKHIKTHQNKK

11 Searching with regular expressions c.{2,4}c...[livmfywc]........h.{3,5}h PatternFind output [ISREC-Server] Date: Wed Aug 22 13:00:41 MET gp AF AEB01ABAC4F945 nuclear protein NP94b [Homo sapiens] Occurences: 2 Position : 514 CYICKASCSSQQEFQDHMSEPQH Position : 606 CTVCNRYFKTPRKFVEHVKSQGH ...
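
To make the search concrete, here is a minimal sketch in Python of the same kind of pattern matching; the PROSITE-style pattern is the one from the previous slide, while PatternFind itself is a separate server-side tool not reproduced here.
import re
# C-X(2,4)-C-X(3)-[LIVMFYWC]-X(8)-H-X(3,5)-H as a regular expression.
ZINC_FINGER = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H", re.IGNORECASE)
def find_motif(sequence):
    # Return (start position, matched substring) for each occurrence of the motif.
    return [(m.start(), m.group()) for m in ZINC_FINGER.finditer(sequence)]
# The 29 a.a. 1SP1 domain from the previous slide fits the pattern.
print(find_motif("KKFACPECPKRFMRSDHLSKHIKTHQNKK"))
Note that finditer reports non-overlapping matches, which is usually what one wants when counting occurrences in a database scan.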

12 Regular expressions can be limiting The regular expression syntax is still too rigid to represent many highly divergent protein motifs. Also, short patterns are sometimes insufficient with today's large databases: even requiring perfect matches, you might find many false positives. On the other hand, some real sites might not be perfect matches. We need to go beyond apparently equally likely alternatives, and ranges for gaps. We deal with the former first, having a distribution at each position.

13 Cys-Cys-His-His profile: sequence logo form A sequence logo is a scaled position-specific a.a. distribution. Scaling is by a measure of a position's information content. (Note that we've lost the option of variable spacing.)

14 Weight matrix model (WMM) = stochastic consensus sequence [Table: counts of A, C, G and T at each position from 242 known σ70 sites; the relative frequencies f_bl; and the scores log2(f_bl/p_b).] Informativeness of position l: 2 + Σ_b p_bl log2 p_bl.

15 Interpretation of weight matrix entries Candidate sequence CTATAATC..., aligned at a position. Hypotheses: S = site (and independence); R = random (equiprobable, independence).
log2 [ pr(CTATAA | S) / pr(CTATAA | R) ] = log2 [ (.09 × .03 × .26 × .13 × .51 × .01) / (.25 × .25 × .25 × .25 × .25 × .25) ] = (2 + log2 .09) + ... + (2 + log2 .01)
Generally, score s_bl = log2(f_bl / p_b), where l = position, b = base, and p_b = background frequency.

16 Use of the matrix to find sites Move the matrix along the sequence and score each window. [Figure: the weight matrix (rows A, C, G, T) slid along the sequence ...CTATAATC..., with the window score (sum) computed at each position.] Peaks should occur at the true sites. Of course, in general any threshold will have some false positive and false negative rate.
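
The scanning step itself is simple. The sketch below, with an invented toy matrix rather than the σ70 counts, scores every window by summing position-specific log2 odds, as in the interpretation above.
import math
def wmm_score(window, freqs, background=0.25, pseudo=1e-6):
    # Log2-odds score of one window under a weight matrix (mutual independence model).
    return sum(math.log2((freqs[l].get(b, 0.0) + pseudo) / background)
               for l, b in enumerate(window))
def scan(sequence, freqs):
    # Slide the matrix along the sequence and score every window; peaks should mark true sites.
    L = len(freqs)
    return [(i, wmm_score(sequence[i:i + L], freqs)) for i in range(len(sequence) - L + 1)]
# Toy 3-position matrix with invented frequencies, favouring the word TAT.
toy = [{"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
       {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
       {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
print(scan("GGCTATAATC", toy))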

17 Modelling motifs: the next steps Missing from the weight matrix models of motifs are good ways of dealing with: length distributions for insertions/deletions; local and non-local association of amino acids. Hidden Markov Models (HMM) help with the first. Dealing with the second remains a hard unsolved problem, but we'll describe a start.

18 Hidden Markov Models Processes {(S_t, O_t), t = 1, 2, ...}, where S_t is the hidden state and O_t the observation at time t, such that
pr(S_t | O_{t-1}, S_{t-1}, O_{t-2}, S_{t-2}, ...) = pr(S_t | S_{t-1})
pr(O_t | S_t, O_{t-1}, S_{t-1}, O_{t-2}, S_{t-2}, ...) = pr(O_t | S_t, S_{t-1})
The basics of HMM were laid bare in a series of beautiful papers by L. E. Baum and colleagues around 1970, and their formulation has been used almost unchanged to this day.

19 The algorithms As the name suggests, with an HMM the series O = (O_1, O_2, O_3, ..., O_T) is observed, while the states S = (S_1, S_2, S_3, ..., S_T) are not. There are elegant algorithms for calculating pr(O | θ), argmax_θ pr(O | θ) in certain special cases, and argmax_S pr(S | O, θ). Here θ are the parameters of the model, e.g. transition and observation probabilities.
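
As a concrete illustration of the first of these calculations, here is a minimal sketch of the forward algorithm for pr(O | θ), written for the common simplification in which the observation depends only on the current state; the two-state parameters below are invented for illustration, not taken from any model in this talk.
def forward(obs, states, start_p, trans_p, emit_p):
    # alpha[s] = pr(O_1..O_t, S_t = s); the answer is the sum of alpha over states at the end.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        new_alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                     for s in states}
        alpha = new_alpha
    return sum(alpha.values())
# Toy two-state model over the DNA alphabet (all numbers invented).
states = ("site", "background")
start_p = {"site": 0.1, "background": 0.9}
trans_p = {"site": {"site": 0.8, "background": 0.2},
           "background": {"site": 0.1, "background": 0.9}}
emit_p = {"site": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
          "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
print(forward("TATAAT", states, start_p, trans_p, emit_p))
The Viterbi algorithm for argmax_S pr(S | O, θ) has the same structure with the sum replaced by a maximum (plus a traceback).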

20 Profile HMM = stochastic regular expressions M = match state, I = insert state, D = delete state, B = begin, E = end. To operate, go from left to right. I and M states output 20 amino acids; B, D and E states are silent.

21 How profile HMM are used Instances of the motif are identified by calculating log{pr(sequence | M) / pr(sequence | B)}, where M and B are the motif and background HMM. Alignments of instances of the motif to the HMM are found by calculating argmax_states pr(states | instance, M). Estimation of HMM parameters is by calculating argmax_parameters pr(sequences | M, parameters). In all cases, we use the efficient HMM algorithms.

22 Pfam domain-HMM Pfam is a library of models of recurrent protein domains. They are constructed semi-automatically using profile hidden Markov models. Pfam families have permanent accession numbers and contain functional annotation and cross-references to other databases, while Pfam-B families are re-generated at each release and are unannotated. See the Pfam web site.

23 Beyond independence Weight array matrices, Zhang & Marr (1993), consider dependencies between adjacent positions, i.e. (non-stationary) first-order Markov models. The number of parameters increases exponentially if we move to full higher-order Markov models. Variable length Markov models, Rissanen (1986), Bühlmann & Wyner (1999), help us get over this problem. In the last few years, many variants have appeared: all make use of trees. [The interpolated Markov models of Salzberg et al (1998) address the same problem.]
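
To make the first of these concrete, here is a minimal sketch of a weight array matrix, i.e. a non-stationary first-order Markov model: position 1 gets a marginal distribution, and every later position a distribution conditioned on the previous base. The numbers are invented toy values, not fitted parameters.
import math
def wam_log_prob(seq, first, cond):
    # log2 P(x) = log2 P(x_1) + sum over later positions of log2 P(x_l | x_{l-1}).
    logp = math.log2(first[seq[0]])
    for l in range(1, len(seq)):
        logp += math.log2(cond[l][seq[l - 1]][seq[l]])
    return logp
# Toy length-3 motif; cond[l] maps the previous base to a distribution over the base at position l (0-based).
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
first = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
cond = {1: {b: {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1} for b in "ACGT"},
        2: {b: dict(uniform) for b in "ACGT"}}
print(wam_log_prob("TAA", first, cond))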

24 Our aim and some notation L: length of the sequence motif. X_i: discrete random variable at position i, taking values from a finite set χ. Given a number of instances of a sequence motif x = (x_1, ..., x_L) of length L, we want a model for the probability P(x) of x. We denote by x_i^j (i < j) the sequence (x_j, x_{j-1}, ..., x_i) in reverse time order.

25 Variable Length Markov Models Factorize P(x) in the usual telescopic way:
P(x) = P(X_1 = x_1) ∏_{l=2}^{L} P(X_l = x_l | X_1^{l-1} = x_1^{l-1}),
then simplify this using context functions c_l, l = 2, ..., L, to
P(X_1 = x_1) ∏_{l=2}^{L} P(X_l = x_l | c_l(X_1^{l-1}) = c_l(x_1^{l-1})),
where c_l : x_1^{l-1} → x_{l-m}^{l-1} is suitably defined on (l-1)-tuples.

26 VLMM, cont. Here c_l : χ^{l-1} → ∪_{i=0}^{l-1} χ^i, and m = m_l is given by
m_l(x_1^{l-1}) = min { r : P(X_l = x | X_1^{l-1} = x_1^{l-1}) = P(X_l = x | X_{l-r}^{l-1} = x_{l-r}^{l-1}) for all x ∈ χ }.
The function c_l defines the sequence-specific context, and m_l defines the sequence-specific memory, or order of the Markov property, for position l.

27 VLMM: an illustrative example A full set of 16 contexts of order 2, versus a pruned set of 12 contexts: P(X_3 | X_2 = C, X_1 = G) = P(X_3 | X_2 = C), etc.
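
Continuing this example, here is a minimal sketch (with invented toy probabilities, not fitted values) of how a VLMM evaluates P(x): each position stores conditional distributions keyed by contexts of varying length, and the longest matching suffix of the history is used, so a pruned context such as P(X_3 | X_2 = C, X_1 = G) = P(X_3 | X_2 = C) needs only the entry for C.
import math
def vlmm_log_prob(seq, first, contexts):
    # For position l (0-based), use the longest suffix of seq[:l] present in contexts[l];
    # the empty context "" acts as the unconditional fallback.
    logp = math.log2(first[seq[0]])
    for l in range(1, len(seq)):
        history = seq[:l]
        table = contexts[l]
        for m in range(len(history), -1, -1):  # variable memory m_l, longest context first
            ctx = history[len(history) - m:]
            if ctx in table:
                logp += math.log2(table[ctx][seq[l]])
                break
    return logp
# Toy length-3 VLMM over {A, C, G, T}: position 3 (index 2) has been pruned so that
# a C at position 2 determines the distribution regardless of position 1.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
first = {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2}
contexts = {
    1: {"A": {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1}, "": dict(uniform)},
    2: {"C": {"A": 0.5, "C": 0.1, "G": 0.3, "T": 0.1}, "": dict(uniform)},
}
print(vlmm_log_prob("ACG", first, contexts))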

28 VLMM, cont. A VLMM for a biomolecular motif of length L is specified by a distribution for X_1, and, for l = 2, ..., L, a constrained distribution for X_l given X_{l-1}, ..., X_1. That is, we need L-1 context functions, or trees. But there is a difficulty here.

29 Sequence dependencies (interactions) are not always local 3-dimensional folding; DNA, RNA & protein interactions. The methods outlined so far all fail to incorporate long-range (≥ 4 bp or a.a.) interactions. New model types are needed.

30 Modeling long-range dependency The principal work in this area is Burge & Karlin's (1997) maximal dependence decomposition (MDD). More recently, Cai et al (2000) and Barash et al (2003) used Bayes networks (BN). Ellrott et al (2002) optimized the sequence order in which a stationary Markov chain models the motif. We have adapted this last idea to give permuted variable length Markov models (PVLMM). Potamianos & Jelinek (1998) have related work on decision trees (PVLMM(D)).

31 Maximal Dependence Decomposition MDD starts with a mutual independence model, as with WMMs. The data are then iteratively subdivided, at each stage splitting on the most dependent position, suitably defined. At the tips of the tree so defined, a mutual independence model across all remaining positions is used. The details can vary according to the splitting criterion (Burge & Karlin used χ²), the actual splits (binary, etc.), and the stopping rule. However, the result is always a single tree.
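
As an illustration of the splitting step, here is a minimal sketch that, given aligned motif instances, computes a χ² statistic between every pair of positions and reports the position most dependent on the rest, i.e. a natural first split. It uses full base-by-base χ² tables rather than Burge & Karlin's consensus-indicator variant, and the toy input is a few truncated, upper-cased instances of the P$DOF3_01 example discussed later.
from collections import Counter
def chi2(instances, i, j):
    # Chi-square statistic for independence between positions i and j of the aligned instances.
    n = len(instances)
    pair = Counter((s[i], s[j]) for s in instances)
    row = Counter(s[i] for s in instances)
    col = Counter(s[j] for s in instances)
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (pair[(a, b)] - expected) ** 2 / expected
    return stat
def best_split_position(instances):
    # Position with the largest total dependence on the other positions.
    L = len(instances[0])
    totals = {i: sum(chi2(instances, i, j) for j in range(L) if j != i) for i in range(L)}
    return max(totals, key=totals.get)
sites = ["GTCTAAAGC", "GACGAAAGC", "GCGAAAAGC", "GCGTAAAGC", "TAGAAAAGG",
         "CACAAAAGC", "GCCCAAAGA", "TGACAAAGC", "GCGGAAAGA", "AATTAAAGC"]
print(best_split_position(sites))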

32 Issues in modeling short motifs In any study of this kind, essential items are: the model class (e.g. VLMM); the way we search through the model class (e.g. by forward selection); the way we compare models when searching (e.g. by χ²); and finally, the way we assess the final model in relation to our aims (e.g. by cross-validation). We always need interesting, high-quality datasets.

33 Model classes and model search For illustrative purposes, we will compare WMM, WAM, MDD and PVLMM(decision) for TFBS and splice donor recognition. Here we search using a simple procedure: recursively choosing the best extension of a current model, or forward selection. Our slower alternative moves through the models using reversible jump Markov chain Monte Carlo (RJMCMC).

34 Model comparison We fit the models using maximum likelihood, and compare fitted models using both AIC and BIC, standard penalties for model complexity. Better than either of these two is approximate normalized maximum likelihood (NML), Barron et al (1998). We use mixture models for the data with Jeffreys (Dirichlet) priors.

35 A simple illustration: transcription factor binding sites These are of great interest, their signals are very weak, and we typically have only a few instances. We have studied 43 TFBS with effective length ≥ 9 and ≥ 20 instances. In 17/43 cases we are able to improve upon the WMM, the current standard; in 26/43, we cannot.

36 20 instances of P$DOF3_01 GTCTAAAGCGT aattaaagtaa GACGAAAGCAA aattaaagtgc GTCTAAAGCga GCGAAAAGCGA GCGTAAAGCAG TAGAAAAGGCG aattaaagtac CACAAAAGCCC GCCCAAAgatc tgacaaagcgt GCGGAAAgatc aattaaagcaa CAAAAAAGGCG taaaaaaggct CAGCAAAGACg GGAAAAAGCAA AGCAAAAGTGC GCAGAAAGTCA

37 Modelling P$DOF3_01
Model       Sn = Sp
WMM         .15
MDD         .60
PVLMM(D)    .55

38 Now we touch on finding protein-coding genes.

39 12 examples of 5' splice (donor) sites (the exon/intron boundary lies after the third base): TCGGTGAGT, TGGGTGTGT, CCGGTCCGT, ATGGTAAGA, TCTGTAAGT, CAGGTAGGA, CAGGTAGGG, AAGGTAAGG, AGGGTATGG, TGGGTAAGG, GAGGTTAGT, CATGTGAGT

40 Sequence logo for human splice donor sites [Figure: sequence logo showing the base (A, C, G, T) distribution at each position around the donor site.]

41 Splice site dataset Human splice donor sequences from SpliceDB, Burset et al (2001): 15,155 canonical donor sites of length 9, with GT conserved at positions 0 and 1; 47,495 false donor sites from the set of all sequences which lie within 40 bp on both sides of the characteristic donor dinucleotide GT.

42 Part of a context (decision) tree for position -2 of a splice donor PVLMM. Node #s: counts; edge #s: split variables. Sequence order: +2(A/G), +5(T), -1(G), +4(G), -2(A), +3(A), -3(A).

43 Parts of MDD trees for splice donors In each case, splits are into the most frequent nt vs the others.

44 Model assessment: Stand-alone splice site recognition M: motif model; B: background model. Given a sequence x = (x_1, ..., x_L), we predict x to be a motif (here a splice donor) if log{P(x | M) / P(x | B)} > c, for a suitably chosen threshold value c.
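
A minimal sketch of this decision rule, assuming only that the motif and background models each expose a log-probability function; the two stand-in models and the threshold below are placeholders, not the fitted donor models.
import math
def is_motif(x, log_p_motif, log_p_background, c):
    # Predict x to be a motif if the log likelihood ratio exceeds the threshold c.
    return log_p_motif(x) - log_p_background(x) > c
def toy_log_p_motif(x):
    # Stand-in motif model: GT strongly preferred at positions 3 and 4, uniform elsewhere.
    consensus = {3: "G", 4: "T"}
    return sum(math.log2(0.7 if i in consensus and b == consensus[i]
                         else 0.1 if i in consensus else 0.25)
               for i, b in enumerate(x))
def toy_log_p_background(x):
    # Uniform background over A, C, G, T.
    return len(x) * math.log2(0.25)
print(is_motif("CAGGTAAGT", toy_log_p_motif, toy_log_p_background, c=2.0))
Raising c trades false positives for false negatives, which is what the Sp vs Sn curves later in the talk summarise.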

45 Model assessment: terms TP: true positives; TN: true negatives; FP: false positives; FN: false negatives. Sensitivity (sn) and specificity (sp) are given by sn = TP / [TP + FN], sp = TP / [TP + FP]. A 5-fold cross-validation is used in assessing performance.
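
For completeness, the two summaries as code, with invented counts purely for illustration.
def sn_sp(tp, fn, fp):
    # Sensitivity = TP/(TP+FN); specificity, as defined above, = TP/(TP+FP).
    return tp / (tp + fn), tp / (tp + fp)
print(sn_sp(tp=90, fn=10, fp=30))  # invented counts: sn = 0.90, sp = 0.75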

46 Optimal permutation: Sp vs Sn. Comparison of PVLMM decision tree (NML, ord = 5), MDD (chi-square), WAM and WMM.

47 Model assessment: Integrated recognition For this assessment, we integrate the splice donor models into SLAM, Pachter et al (2002), a eukaryotic cross-species gene finder. The training data consist of 3,735 aligned human and mouse gene sequences. The resulting SLAM model is then tested on the Rosetta set of 117 single human gene sequences.

48 Results at the nucleotide level
              VLMM(D)    MDD       PVLMM(D)
Sensitivity
Specificity

Results at the exon level
              VLMM(D)    MDD       PVLMM(D)
Correct       362/       /         /464
Partial       84/460     83/465    78/464
Wrong         14/460     17/465    15/464
Missing       25/465     23/465    22/465
Sensitivity
Specificity

49 Interpretation of the PVLMM model selected We use sequence logos to provide simple interpretations of our selected PVLMM, including the optimal permutation.

50 Beginning of the splicing process [Figure: splice donor and splice acceptor sites at the start of the splicing process.]

51 Long-range dependence in the chosen model [Figure: base pairing between the splice donor site and U1 snRNA (G U C C A U U C A), with the optimal permutation of positions indicated.]

52 [Figure: the context tree for one of the positions of the selected model.]

53 Some future work Joint modelling of human and mouse sites; joint modelling of multiple motifs in one species; including indels and dependence.

54 Acknowledgements Xiaoyue Zhao, UCB; Mauro Delorenzi (ISREC); Sourav Chatterji, UCB. The SLAM team: Simon Cawley, Affymetrix; Lior Pachter, UCB; Marina Alexandersson, FCC.


56 References Biological Sequence Analysis, R Durbin, S Eddy, A Krogh and G Mitchison, Cambridge University Press, 1998. Bioinformatics: The Machine Learning Approach, P Baldi and S Brunak, The MIT Press, 1998. Post-Genome Informatics, M Kanehisa, Oxford University Press.

57 Bayes networks (BN): an example Here each node corresponds to a sequence position, and the tree defines conditional independence constraints on the distribution.

58 To find genes, we need to model splice sites.

59 Weight matrix models, Staden (1984) [Table: a weight matrix for donor sites, giving the frequency of each base (A, C, G, T) at each position.] Essentially a mutual independence model. An improvement over the consensus CAGGTAAGT.

60 Transcription factor binding site (TFBS) recognition We extracted all known TFBS from the TRANSFAC database with a) length ≥ 9, and b) ≥ 20 known sites. In all this gave 1,419 sites corresponding to 43 TF. Next we randomly inserted each site into a background sequence of length 1,000 simulated from a stationary 3rd order MM, with parameters estimated from a large collection of human sequence upstream of genes. Finally, we used the PVLMM, MDD and WMM to scan these sequences within a 10-fold cross-validation framework, to select a number of top-scoring sequences as putative binding sites. We always made this number equal to the true number in the sequences, and so sn = sp.
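
A minimal sketch of the last step, assuming a scoring function for the fitted model (the scorer below is a stand-in, not any of the fitted models): scan every window, keep the k highest-scoring start positions, and call those the putative sites; with k equal to the true number of planted sites, sn = sp.
import heapq
def top_k_windows(sequence, k, width, score):
    # Score every window of the given width and return the k highest-scoring start positions.
    scored = ((score(sequence[i:i + width]), i) for i in range(len(sequence) - width + 1))
    return [i for s, i in heapq.nlargest(k, scored)]
# Stand-in scorer (count of A's); a real run would use the fitted WMM/MDD/PVLMM log-odds score.
print(top_k_windows("GATCAAAAGCGTACGTTTTAAAGC", k=2, width=5, score=lambda w: w.count("A")))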

61 Human Transcription Factor Binding Sites

62 TFBS: three results [Table: sensitivity/specificity of WMM, PVLMM and MDD for P$DOF3.01, V$CIZ.01 and P$EMBP1.Q2.] Of the 43 TF, our dependence methods led to no improvement in 27 cases.

63 TFBS: more results
