Bioinformatics 2 - Lecture 4
Guido Sanguinetti
School of Informatics, University of Edinburgh
February 14, 2011

Sequences
Many data types are ordered, i.e. you can naturally say what is before and what is after.
Chief example: data with a time series structure.
Other key biological example: sequences (order given by the polarity of the molecules).
Any other examples right in front of your eyes?

Latent variables in sequential data
Sometimes what we observe is not what we are interested in.
For example, in a medical application, one could think of a person being either healthy (H), diseased (D) or recovering (R).
What we measure are (related) quantities such as the temperature, blood pressure, O2 concentration in blood, ...
The job of the doctor is to infer the latent state from the measurements.

Latent variables in sequential data
In a transcriptomic experiment, we can measure mRNA abundance at different time points after a stimulus.
What we may be really interested in is the concentration of active transcription factor proteins, which may give a more direct insight into how the cells respond to the stimulus.
Again, we are interested in reconstructing a latent variable from observations; this time the latent variables are continuous (concentrations).

Network representation of latent variables
We represent the latent states as a sequence of random variables; each of them depends only on the previous one.
The observations depend only on the corresponding state.

States and parameters
We are interested in the posterior distribution of the states x_{1:T} given the observations y_{1:T} (the subscript 1:T denotes the collection of variables from 1 to T).
Notice that we only have one observation per time point.
In the independent observations case, this would not be enough.
We also have parameters, which we assume known; these are the probabilities
π = p(x(1)),   T_{x(t-1),x(t)} = p(x(t) | x(t-1)),   O_{x,y} = p(y(t) | x(t))
We assume the parameters to be time-independent.
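To make the notation concrete, here is a minimal Python/NumPy sketch of the three parameter objects for a toy three-state version of the medical example above (healthy/diseased/recovering, with a coarsely binned temperature as the observation). All names and numerical values are illustrative, not from the lecture.

import numpy as np

# Toy version of the medical example: latent states H, D, R (healthy, diseased,
# recovering); the observation is a coarsely binned temperature reading.
states = ["H", "D", "R"]
obs_symbols = ["normal", "raised", "high"]

# pi[i]   = p(x(1) = i)                   (initial distribution)
# T[i, j] = p(x(t) = j | x(t-1) = i)      (transition probabilities)
# O[i, k] = p(y(t) = k | x(t) = i)        (emission probabilities)
pi = np.array([0.80, 0.15, 0.05])
T = np.array([[0.90, 0.08, 0.02],
              [0.05, 0.80, 0.15],
              [0.10, 0.05, 0.85]])
O = np.array([[0.85, 0.10, 0.05],
              [0.05, 0.35, 0.60],
              [0.50, 0.40, 0.10]])

# Each row is a probability distribution, so rows must sum to one.
assert np.allclose(T.sum(axis=1), 1) and np.allclose(O.sum(axis=1), 1)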

The single time marginals
The joint posterior over the states is, by the rules of probability, proportional to the joint probability of observations and states:
p(x_{1:T} | y_{1:T}) ∝ p(x_{1:T}, y_{1:T})
An object of central importance is the single time marginal for the latent variable at time t.
This is obtained by marginalising the latent variables at all other time points; by the proportionality above,
p(x(t) | y_{1:T}) ∝ p(x(t), y_{1:T})

Networks and factorisations
By using the product rule of probability, we can rewrite the joint probability of states and observations as
p(x_{1:T}, y_{1:T}) = p(y_{t+1:T} | x_{1:T}, y_{1:t}) p(x_{1:T}, y_{1:t})   (1)
Recall that networks encode conditional independence relations; in particular, areas of the network which are not directly connected are independent of each other given the nodes in between.

Some conditional independencies
By inspection of the network representation of the model (slide 4), we see that
p(y_{t+1:T} | x_{1:T}, y_{1:t}) = p(y_{t+1:T} | x_{t+1:T})   (2)
Also, x_{t+1:T} is conditionally independent of y_{1:t} given x_t, so that
p(x_{1:T}, y_{1:t}) = p(x_{t+1:T} | x_{1:t}, y_{1:t}) p(x_{1:t}, y_{1:t}) = p(x_{t+1:T} | x_t) p(x_{1:t}, y_{1:t})   (3)

Factorisations and messages
Putting equations (2) and (3) into (1), we get
p(x_{1:T}, y_{1:T}) = p(y_{t+1:T}, x_{t+1:T} | x_t) p(x_{1:t}, y_{1:t})
Marginalising x_{1:t-1} and x_{t+1:T}, we get the following fundamental factorisation of the single time marginal:
p(x(t) | y_{1:T}) ∝ α(x(t)) β(x(t)) = p(x(t) | y_{1:t}) p(y_{t+1:T} | x(t))   (4)
The single time marginal at time t is the product of the posterior estimate given all the data up to that point, times the likelihood of future observations given the state at t.

Aside for Informaticians and like-minded people
The factorisation in equation (4) is an example of message passing.
α(x(t)) is a message propagated forwards from the previous observations (forward message or filtered process).
β(x(t)) is a message propagated backwards from future observations (backward message).
Message passing algorithms allow exact inference in tree-structured graphical models (why?) and approximate inference in more complicated models.

Filtering: computing the forward message
Initialisation: α(x(1)) ∝ p(y(1), x(1)) = π O_{x(1),y(1)}
Recursion:
α(x(t)) ∝ p(x(t), y_{1:t}) = Σ_{x(t-1)} p(x(t), x(t-1), y_{1:t})
        = Σ_{x(t-1)} p(y(t) | x(t)) p(x(t) | x(t-1)) p(x(t-1), y_{1:t-1})
        ∝ Σ_{x(t-1)} O_{x(t),y(t)} T_{x(t-1),x(t)} α(x(t-1))
where I used the conditional independences of the network to go from the first line to the second.
If x(t) is continuous, replace the sum with an integral.
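Below is a minimal Python/NumPy sketch of this forward recursion, with the sums written out explicitly so that the code mirrors the derivation; pi, T and O follow the conventions used earlier (T[i, j] = p(x(t)=j | x(t-1)=i), O[i, k] = p(y(t)=k | x(t)=i)), and α is renormalised at every step, which only changes it by a constant and avoids numerical underflow.

import numpy as np

def forward(pi, T, O, y):
    """Filtering: alpha[t, i] is proportional to p(x(t) = i | y(1:t)).
    y is a sequence of observed symbol indices."""
    S, n = len(pi), len(y)
    alpha = np.zeros((n, S))
    for i in range(S):                              # initialisation: pi * O_{x(1),y(1)}
        alpha[0, i] = pi[i] * O[i, y[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        for i in range(S):                          # current state x(t) = i
            alpha[t, i] = O[i, y[t]] * sum(T[j, i] * alpha[t - 1, j] for j in range(S))
        alpha[t] /= alpha[t].sum()                  # normalise to avoid underflow
    return alpha

With the toy matrices sketched earlier, for instance, forward(pi, T, O, [0, 2, 2, 1]) returns the filtered distribution over H, D, R at each of four time points.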

Computing the backward message
Initialisation: β(x(T)) = 1 (why?)
Backward recursion:
β(x(t-1)) = p(y_{t:T} | x(t-1)) = Σ_{x(t)} p(y_{t:T}, x(t) | x(t-1))
          = Σ_{x(t)} p(y_{t+1:T} | y(t), x(t), x(t-1)) p(y(t), x(t) | x(t-1))
          = Σ_{x(t)} β(x(t)) p(y(t) | x(t)) p(x(t) | x(t-1))
Once again, if x is continuous, replace the sum with an integral.
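A matching sketch of the backward recursion, plus the combination of the two messages into the single time marginal of equation (4). Again this is illustrative code, with the same indexing conventions as the forward pass; the rescaling at each step only defines β up to a constant, which is all equation (4) requires.

import numpy as np

def backward(T, O, y):
    """beta[t, i] is proportional to p(y(t+1:T) | x(t) = i), with beta at the final time set to one."""
    S, n = T.shape[0], len(y)
    beta = np.ones((n, S))
    for t in range(n - 2, -1, -1):
        for i in range(S):                          # state x(t) = i
            beta[t, i] = sum(beta[t + 1, j] * O[j, y[t + 1]] * T[i, j] for j in range(S))
        beta[t] /= beta[t].sum()                    # rescale for numerical stability
    return beta

def smooth(alpha, beta):
    """Single time marginals p(x(t) | y(1:T)), proportional to alpha(x(t)) * beta(x(t))  (equation 4)."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)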

Biological problem
[Figure: a bipartite network in which transcription factors TF_1, ..., TF_d regulate genes g_1, ..., g_N, with edge weights S_ij.]
In some organisms, some of the wiring of the network is known.
Simplest possible model: a log-linear model of gene expression,
g_i(t) = Σ_j S_ij X_ij TF_j(t) + ε
where X is a binary matrix encoding the network and ε ∼ N(0, σ²) is an error term.
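As a sanity check of the notation, the sketch below simply simulates data from this model; the sizes, the wiring matrix X and the strengths S are all invented purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the lecture): N genes, d transcription factors, T time points.
N, d, T_len = 5, 2, 10
X = rng.integers(0, 2, size=(N, d))       # binary wiring: X[i, j] = 1 if TF j regulates gene i
S = rng.normal(size=(N, d))               # regulation strengths S_ij
TF = np.abs(rng.normal(size=(T_len, d)))  # (unobserved) TF activity profiles over time
sigma = 0.1

# g_i(t) = sum_j S_ij X_ij TF_j(t) + eps,   eps ~ N(0, sigma^2)
g = TF @ (S * X).T + rng.normal(scale=sigma, size=(T_len, N))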

Inference in the model of transcriptional regulation
The simple model of regulation states that gene expression levels are a weighted linear combination of TF levels.
Usually, we do not know the TF (protein) levels, so we treat this as a latent variable problem.
To incorporate dynamics, we assume the TF levels at time t to depend on the levels at time t-1, and the gene expression measurements to be conditionally independent given the TF levels.
Both TF and gene expression levels are assumed to be Gaussian; this gives a Linear Dynamical System (LDS).

LDS priors and jargon
The time evolution of the hidden states is given by a Gaussian random walk:
x(t+1) = A x(t) + w(t),   p(x(t+1) | x(t)) = N(A x(t), Σ_w)   (5)
The term w ∼ N(0, Σ_w) is the system noise term; the matrix A is sometimes called the gain matrix.
Observations are related to states using another linear Gaussian model:
y(t) = B x(t) + ε(t),   p(y(t) | x(t)) = N(B x(t), Σ_ε)   (6)
where ε ∼ N(0, Σ_ε) is the observation noise and B is the observation matrix.
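The two equations can be sketched as a short simulation in Python/NumPy; the dimensions and parameter values below are invented only to make the sketch runnable.

import numpy as np

rng = np.random.default_rng(1)

# Toy 2-dimensional state, 3-dimensional observation (sizes are illustrative).
A = np.array([[0.9, 0.1], [0.0, 0.95]])   # gain matrix
B = rng.normal(size=(3, 2))               # observation matrix
Sigma_w = 0.05 * np.eye(2)                # system noise covariance
Sigma_eps = 0.10 * np.eye(3)              # observation noise covariance

T_len = 20
x = np.zeros((T_len, 2))
y = np.zeros((T_len, 3))
x[0] = rng.normal(size=2)
for t in range(T_len):
    if t > 0:
        # x(t) = A x(t-1) + w(t),  w ~ N(0, Sigma_w)    (equation 5)
        x[t] = A @ x[t - 1] + rng.multivariate_normal(np.zeros(2), Sigma_w)
    # y(t) = B x(t) + eps(t),  eps ~ N(0, Sigma_eps)    (equation 6)
    y[t] = B @ x[t] + rng.multivariate_normal(np.zeros(3), Sigma_eps)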

Inference for LDS
Since both noises are Gaussian and all equations are linear, all the messages will be Gaussian.
This simplifies the inference, as we do not need to compute normalisation constants.
For example, the forward message is computed as
α(x(t)) = N(x(t) | µ_t, Σ_t) ∝ ∫ dx(t-1) α(x(t-1)) N(x(t) | A x(t-1), Σ_w) N(y(t) | B x(t), Σ_ε)
Exercise: calculate the forward message.
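Because every message is Gaussian, the forward recursion reduces to updating a mean and a covariance; this is exactly the Kalman filter. The sketch below implements the standard predict/update form for the model of equations (5) and (6); it is one way to carry out the exercise, not code from the lecture, and the argument mu0, Sigma0 (the Gaussian prior over x(1)) is an assumed interface.

import numpy as np

def kalman_forward(y, A, B, Sigma_w, Sigma_eps, mu0, Sigma0):
    """Gaussian forward messages alpha(x(t)) = N(x(t) | mu_t, Sigma_t) for the LDS."""
    mus, Sigmas = [], []
    mu, Sigma = mu0, Sigma0
    for t, yt in enumerate(y):
        if t > 0:
            # Predict: push the previous message through the dynamics (equation 5).
            mu = A @ mu
            Sigma = A @ Sigma @ A.T + Sigma_w
        # Update: condition on the observation y(t) (equation 6).
        S = B @ Sigma @ B.T + Sigma_eps          # innovation covariance
        K = Sigma @ B.T @ np.linalg.inv(S)       # Kalman gain
        mu = mu + K @ (yt - B @ mu)
        Sigma = Sigma - K @ B @ Sigma
        mus.append(mu)
        Sigmas.append(Sigma)
    return np.array(mus), np.array(Sigmas)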

Biological motivations
In many cases, we observe intrinsically discrete variables (e.g. DNA bases).
Also, we are interested in intrinsically discrete latent states (e.g. is this fragment of DNA a gene or not?).
These situations often arise when dealing with problems in genomics and functional genomics.
We will give three examples, and show some details on how to deal with one of these.

How to find genes
The outcome of a sequencing experiment is the sequence of a region of the genome.
Which parts of the sequence get transcribed into mRNA?
Possible solution: sequence the mRNA (laborious).
Alternatively, use the codon effect: genic DNA is not uniformly distributed, since triplets of bases code for specific amino acids.
Thus, the sequence of a gene will look different from the sequence of a non-genic region.

CpG islands
In the genome, a G nucleotide preceded by a C nucleotide is rare (such sites have a strong tendency to be methylated and mutate into T).
In some regions related to promoters of genes, methylation is inhibited, so there are many more occurrences of C followed by G (CpG).
These functional regions are called CpG islands, and they are characterised by a different nucleotide distribution.

ChIP-on-chip data
A technology to measure the binding of transcription factors to DNA.
We observe an intensity signal (optical).
We want to infer whether a certain intensity associated with a certain fragment of DNA implies binding or not.
More in Ian Simpson's guest lecture.

Hidden Markov Models jargon
When the latent states can only assume a finite number of discrete values, we have a Hidden Markov Model (HMM).
HMMs have a long history in speech recognition and signal processing, and they have their own terminology.
The conditional probabilities p(x(t+1) | x(t)) are called transition probabilities. They are collected in a matrix T_ij = p(x(t+1) = i | x(t) = j).
The conditional probabilities p(y(t) | x(t)) are called emission probabilities. If the observed variables are also discrete, we can collect the emission probabilities in another matrix O_ij = p(y(t) = i | x(t) = j).

Inference in HMM
The forward and backward messages are simply computed as matrix multiplications involving the emission and transition matrices.
The forward message is
α(x(t)) ∝ O_{x(t),y(t)} Σ_{x(t-1)} T_{x(t-1),x(t)} α(x(t-1))
The backward message is
β(x(t-1)) = Σ_{x(t)} β(x(t)) p(y(t) | x(t)) p(x(t) | x(t-1))
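In code, each update is indeed a single matrix-vector product; a minimal sketch, using the same indexing conventions as the earlier sketches (T[i, j] = p(x(t)=j | x(t-1)=i), O[i, k] = p(y(t)=k | x(t)=i)):

import numpy as np

def forward_step(alpha_prev, T, O, y_t):
    """One forward update: alpha(x(t)) is proportional to
    O_{x(t),y(t)} * sum_{x(t-1)} T_{x(t-1),x(t)} alpha(x(t-1))."""
    a = O[:, y_t] * (T.T @ alpha_prev)
    return a / a.sum()

def backward_step(beta_next, T, O, y_next):
    """One backward update: beta(x(t-1)) = sum_{x(t)} beta(x(t)) p(y(t)|x(t)) p(x(t)|x(t-1))."""
    return T @ (O[:, y_next] * beta_next)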

HMM for CpG islands
We construct latent variables with eight states, representing bases in normal DNA and in CpG regions: (A, C, G, T, Ā, C̄, Ḡ, T̄).
The 8×8 transition matrix will have a very low entry for T_{C,G} and a higher entry for T_{C̄,Ḡ}.
The emission matrix is just O_{x,x} = 1 = O_{x̄,x}, with all other entries zero, indicating that the observation is just the nucleotide without the CpG/normal label.
The specific entries in the transition matrix will be determined from annotated databases.
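A sketch of how this eight-state construction might be set up in code, writing the barred (CpG island) states as "+" and the normal states as "-"; the transition matrix below is only a uniform placeholder, since its real entries would be estimated from annotated databases as the slide says.

import numpy as np

bases = ["A", "C", "G", "T"]
# Eight latent states: the four bases inside a CpG island ("+") and outside ("-").
states = [b + "+" for b in bases] + [b + "-" for b in bases]

# Placeholder 8x8 transition matrix; real entries come from annotated CpG islands
# (e.g. dinucleotide transition counts in each region type).
T = np.full((8, 8), 1.0 / 8)

# Emission matrix: state b+ or b- emits the nucleotide b with probability one,
# i.e. the observation is just the base without the island / normal label.
O = np.zeros((8, 4))
for i, s in enumerate(states):
    O[i, bases.index(s[0])] = 1.0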