CS 4491/CS 7990 SPECIAL TOPICS IN BIOINFORMATICS


* The contents are adapted from Dr. Jean Gao at UT Arlington. Mingon Kang, Ph.D., Computer Science, Kennesaw State University.

Primer on Probability: Random Variable, Probability Distribution, Independence, Bayes Theorem, Means and Variances.

Definition of Probability
Frequentist interpretation: the probability of an event is the proportion of the time that events of the same kind will occur in the long run.
Examples: the probability my flight to Chicago will be on time; the probability this ticket will win the lottery; the probability it will rain tomorrow.
A probability is always a number in the interval [0, 1]: 0 means the event never occurs, 1 means it always occurs.

Sample Spaces
Sample space: a set of possible outcomes for some event.
Examples:
flight to Chicago: {on time, late, cancelled, accident}
lottery: {ticket 1 wins, ticket 2 wins, ..., ticket n wins}
weather tomorrow: {rain, not rain} or {sun, rain, snow} or {sun, clouds, rain, snow, sleet} or ...

Random Variables
Random variable: a variable representing the outcome of an experiment.
Example: X represents the outcome of my flight to Chicago. We write the probability of my flight being on time as Pr(X = on-time), or, when it is clear which variable we are referring to, we may use the shorthand Pr(on-time).

Notation
Uppercase letters and capitalized words denote random variables; lowercase letters and uncapitalized words denote values.
We denote a particular value for a variable as follows: Pr(X = x), e.g. Pr(Fever = true).
We will also use the shorthand form Pr(x) for Pr(X = x).
For Boolean random variables, we will use the shorthand Pr(fever) for Pr(Fever = true) and Pr(¬fever) for Pr(Fever = false).

Probability Distributions
If X is a random variable, the function given by Pr(X = x) for each x is the probability distribution function of X (also called the probability mass function, pmf).
Requirements: Pr(x) ≥ 0 for every x, and Σ_x Pr(x) = 1.
[Figure: example probability mass function.]

Joint Distributions
The joint probability distribution of two random variables is the function given by Pr(X = x, Y = y), read "X equals x and Y equals y".
Example:
x, y            Pr(X = x, Y = y)
sunny, on-time  0.20
rain, on-time   0.20
snow, on-time   0.05
sunny, late     0.10
rain, late      0.30
snow, late      0.15
For instance, 0.20 is the probability that it is sunny and my flight is on time.

Marginal Distributions
The marginal distribution of X is defined by Pr(x) = Σ_y Pr(x, y), i.e. the distribution of X ignoring other variables.
This definition generalizes to more than two variables, e.g. Pr(x) = Σ_y Σ_z Pr(x, y, z).

Marginal Distribution Example
Joint distribution:
x, y            Pr(X = x, Y = y)
sunny, on-time  0.20
rain, on-time   0.20
snow, on-time   0.05
sunny, late     0.10
rain, late      0.30
snow, late      0.15
Marginal distribution for X:
x      Pr(X = x)
sunny  0.3
rain   0.5
snow   0.2
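
To make the marginalization concrete, here is a minimal Python sketch (not from the original slides) that recomputes the marginal of X from the joint table above; the dictionary layout is just one possible representation:

    # Joint distribution Pr(X = weather, Y = flight status) from the example above.
    joint = {
        ("sunny", "on-time"): 0.20, ("rain", "on-time"): 0.20, ("snow", "on-time"): 0.05,
        ("sunny", "late"):    0.10, ("rain", "late"):    0.30, ("snow", "late"):    0.15,
    }

    # Marginal of X: sum the joint probabilities over all values of Y.
    marginal_x = {}
    for (x, y), p in joint.items():
        marginal_x[x] = marginal_x.get(x, 0.0) + p

    print(marginal_x)  # sunny: 0.3, rain: 0.5, snow: 0.2 (up to floating-point rounding)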

Joint/Marginal Distributions (two-variable case)
The values of the joint distribution are in the 4 x 4 square, and the values of the marginal distributions are along the right and bottom margins.
Ref: https://en.wikipedia.org/wiki/marginal_distribution

Conditional Distributions
The conditional distribution of X given Y is defined as Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y), i.e. the distribution of X given that we know the value of Y.

Conditional Distribution Example
Joint distribution:
x, y            Pr(X = x, Y = y)
sunny, on-time  0.20
rain, on-time   0.20
snow, on-time   0.05
sunny, late     0.10
rain, late      0.30
snow, late      0.15
Conditional distribution for X given Y = on-time:
x      Pr(X = x | Y = on-time)
sunny  0.20/0.45 = 0.444
rain   0.20/0.45 = 0.444
snow   0.05/0.45 = 0.111
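
A corresponding sketch for conditioning, reusing the hypothetical joint dictionary from the previous snippet:

    def conditional(joint, y_value):
        # Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y)
        pr_y = sum(p for (x, y), p in joint.items() if y == y_value)  # Pr(Y = y)
        return {x: p / pr_y for (x, y), p in joint.items() if y == y_value}

    print(conditional(joint, "on-time"))  # sunny: 0.444..., rain: 0.444..., snow: 0.111...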

Independence
Two random variables, X and Y, are independent if Pr(x, y) = Pr(x) Pr(y) for all x and y.

Independence Example #1
Joint distribution:
x, y            Pr(X = x, Y = y)
sunny, on-time  0.20
rain, on-time   0.20
snow, on-time   0.05
sunny, late     0.10
rain, late      0.30
snow, late      0.15
Marginal distributions:
x      Pr(X = x)
sunny  0.3
rain   0.5
snow   0.2
y        Pr(Y = y)
on-time  0.45
late     0.55
Are X and Y independent here? NO. (For example, Pr(sunny, on-time) = 0.20, but Pr(sunny) Pr(on-time) = 0.3 x 0.45 = 0.135.)

Independence Example #2
Joint distribution:
x, y                  Pr(X = x, Y = y)
sunny, fly-united     0.27
rain, fly-united      0.45
snow, fly-united      0.18
sunny, fly-northwest  0.03
rain, fly-northwest   0.05
snow, fly-northwest   0.02
Marginal distributions:
x      Pr(X = x)
sunny  0.3
rain   0.5
snow   0.2
y              Pr(Y = y)
fly-united     0.9
fly-northwest  0.1
Are X and Y independent here? YES.
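
The definition can be checked numerically; a sketch (the helper name and tolerance are my own choices) that returns False for the joint table of Example #1 and True for Example #2:

    def is_independent(joint, tol=1e-9):
        # Check Pr(x, y) == Pr(x) * Pr(y) for every (x, y) in a complete joint table
        # stored as {(x, y): probability}.
        xs = {x for x, _ in joint}
        ys = {y for _, y in joint}
        pr_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
        pr_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
        return all(abs(joint[(x, y)] - pr_x[x] * pr_y[y]) <= tol
                   for x in xs for y in ys)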

Conditional Independence
Two random variables X and Y are conditionally independent given Z if Pr(X | Y, Z) = Pr(X | Z): once you know the value of Z, knowing Y doesn't tell you anything more about X.
Equivalently, Pr(x, y | z) = Pr(x | z) Pr(y | z) for all x, y, z.

Conditional Independence Example
Flu    Fever  Vomit  Pr
true   true   true   0.04
true   true   false  0.04
true   false  true   0.01
true   false  false  0.01
false  true   true   0.009
false  true   false  0.081
false  false  true   0.081
false  false  false  0.729
Fever and Vomit are not independent: e.g. Pr(fever, vomit) ≠ Pr(fever) Pr(vomit).
Fever and Vomit are conditionally independent given Flu:
Pr(fever, vomit | flu) = Pr(fever | flu) Pr(vomit | flu)
Pr(fever, vomit | ¬flu) = Pr(fever | ¬flu) Pr(vomit | ¬flu)
etc.
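
The same kind of check on the Flu/Fever/Vomit table; a sketch where the table values are from the slide and the helper function is mine:

    # Pr(Flu, Fever, Vomit) from the table above, keyed by (flu, fever, vomit).
    p = {
        (True,  True,  True):  0.04,  (True,  True,  False): 0.04,
        (True,  False, True):  0.01,  (True,  False, False): 0.01,
        (False, True,  True):  0.009, (False, True,  False): 0.081,
        (False, False, True):  0.081, (False, False, False): 0.729,
    }

    def pr(**fixed):
        # Marginal probability of the assignments in `fixed`, e.g. pr(fever=True, flu=True).
        total = 0.0
        for (flu, fever, vomit), prob in p.items():
            values = {"flu": flu, "fever": fever, "vomit": vomit}
            if all(values[k] == v for k, v in fixed.items()):
                total += prob
        return total

    # Not independent: 0.049 vs 0.0238
    print(pr(fever=True, vomit=True), pr(fever=True) * pr(vomit=True))
    # Conditionally independent given Flu: both sides equal 0.4
    print(pr(fever=True, vomit=True, flu=True) / pr(flu=True),
          (pr(fever=True, flu=True) / pr(flu=True)) * (pr(vomit=True, flu=True) / pr(flu=True)))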

Bayes Theorem
Pr(x | y) = Pr(y | x) Pr(x) / Pr(y) = Pr(y | x) Pr(x) / Σ_x Pr(y | x) Pr(x)
This theorem is extremely useful: there are many cases when it is hard to estimate Pr(x | y) directly, but it's not too hard to estimate Pr(y | x) and Pr(x).

Bayes Theorem Example
MDs usually aren't good at estimating Pr(Disorder | Symptom); they're usually better at estimating Pr(Symptom | Disorder).
If we can estimate Pr(Fever | Flu) and Pr(Flu), we can use Bayes Theorem to do diagnosis:
Pr(flu | fever) = Pr(fever | flu) Pr(flu) / Pr(fever) = Pr(fever | flu) Pr(flu) / [Pr(fever | flu) Pr(flu) + Pr(fever | ¬flu) Pr(¬flu)]
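
A small numerical sketch of this diagnosis rule; the three input probabilities below are made-up illustrative values, not numbers from the slides:

    # Hypothetical estimates a physician might supply (illustrative values only).
    pr_fever_given_flu = 0.90       # Pr(fever | flu)
    pr_fever_given_not_flu = 0.10   # Pr(fever | not flu)
    pr_flu = 0.05                   # Pr(flu), the prior

    # Bayes theorem: Pr(flu | fever) = Pr(fever | flu) Pr(flu) / Pr(fever)
    pr_fever = pr_fever_given_flu * pr_flu + pr_fever_given_not_flu * (1 - pr_flu)
    pr_flu_given_fever = pr_fever_given_flu * pr_flu / pr_fever
    print(round(pr_flu_given_fever, 3))  # 0.321 with these made-up inputs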

Expected Values
The expected value of a random variable that takes on numerical values is defined as E[X] = Σ_x x Pr(x). This is the same thing as the mean.
We can also talk about the expected value of a function of a random variable: E[g(X)] = Σ_x g(x) Pr(x).

Expected Value Examples
E[ShoeSize] = 5 × Pr(ShoeSize = 5) + ... + 14 × Pr(ShoeSize = 14)
E[gain(Lottery)] = gain(winning) × Pr(winning) + gain(losing) × Pr(losing) = $100 × 0.001 − $1 × 0.999 = −$0.899
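
The lottery calculation as a sketch in code, assuming the reconstructed gains of $100 for winning and -$1 for losing:

    def expected_value(values_and_probs):
        # E[g(X)] = sum over x of g(x) * Pr(x)
        return sum(value * prob for value, prob in values_and_probs)

    # gain(winning) = $100 with Pr = 0.001, gain(losing) = -$1 with Pr = 0.999
    print(expected_value([(100, 0.001), (-1, 0.999)]))  # approximately -0.899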

Markov Models in Computational Biology
There are many cases in which we would like to represent the statistical regularities of some class of sequences: genes, various regulatory sites in DNA (e.g. where RNA polymerase and transcription factors bind), and proteins in a given family.
Markov models are well suited to this type of task.

References
Classic paper: Krogh et al. (1998) An Introduction to Hidden Markov Models for Biological Sequences. Computational Methods in Molecular Biology, 45-63.
Books: Durbin, Eddy, Krogh and Mitchison, Biological Sequence Analysis, Chapter 3; Jones and Pevzner, Chapter 11.
Tutorial: Rabiner, L. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257-286.

Example
X: a 35-year-old customer with an income of $40,000 and a fair credit rating.
H: the hypothesis that the customer will buy a computer.

P(H | X): the probability that the customer will buy a computer given that we know his age, credit rating and income (posterior).
P(H): the probability that the customer will buy a computer regardless of age, credit rating, and income (prior).
P(X | H): the probability that the customer is 35 years old, has a fair credit rating and earns $40,000, given that he has bought our computer (likelihood).
P(X): the probability that a person from our set of customers is 35 years old, has a fair credit rating and earns $40,000.

Maximum Likelihood Estimation (MLE)
A strategy for estimating the parameters of a statistical model given observations: choose the parameter value that maximizes the probability of the observed data,
θ = argmax_θ P(x | θ), where x is the observation and θ is a parameter.

Maximum a Posteriori Estimation (MAP)
Estimating an unobserved quantity (or making a decision) by choosing the value that is most probable given the observed data and a prior belief:
θ = argmax_θ P(θ | x).
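
As a concrete sketch contrasting the two estimators (this coin-flip example and the Beta prior are my own illustration, not from the slides):

    def mle_bernoulli(heads, flips):
        # theta_MLE = argmax_theta P(data | theta) = heads / flips
        return heads / flips

    def map_bernoulli(heads, flips, a=2, b=2):
        # theta_MAP = argmax_theta P(theta | data) under a Beta(a, b) prior
        # = (heads + a - 1) / (flips + a + b - 2)
        return (heads + a - 1) / (flips + a + b - 2)

    print(mle_bernoulli(3, 10), map_bernoulli(3, 10))  # 0.3 vs 0.333... (prior pulls toward 0.5)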

A Markov Chain Model
[Figure: states A, C, G, T plus a begin state; arrows between states are transitions labeled with transition probabilities.]
The transitions out of state G are labeled:
Pr(x_i = a | x_{i-1} = g) = 0.16
Pr(x_i = c | x_{i-1} = g) = 0.34
Pr(x_i = g | x_{i-1} = g) = 0.38
Pr(x_i = t | x_{i-1} = g) = 0.12

Markov Chain Models
A Markov chain model (a triple) is defined by:
a set of states: some states emit symbols, other states (e.g. the begin state) are silent;
a set of transitions with associated probabilities: the transitions emanating from a given state define a distribution over the possible next states;
initial state probabilities.

Markov Chain Models
Given some sequence x of length L, we can ask how probable the sequence is under our model. For any probabilistic model of sequences, we can write this probability as
Pr(x) = Pr(x_L, x_{L-1}, ..., x_1) = Pr(x_L | x_{L-1}, ..., x_1) Pr(x_{L-1} | x_{L-2}, ..., x_1) ... Pr(x_1)
The key property of a (1st order) Markov chain is that the probability of each x_i depends only on the value of x_{i-1}:
Pr(x) = Pr(x_L | x_{L-1}) Pr(x_{L-1} | x_{L-2}) ... Pr(x_2 | x_1) Pr(x_1) = Pr(x_1) ∏_{i=2..L} Pr(x_i | x_{i-1})

The Probability of a Sequence for a Given Markov Chain Model
[Figure: Markov chain with states A, C, G, T and a begin state.]
Pr(cggt) = Pr(c) Pr(g | c) Pr(g | g) Pr(t | g)
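
A sketch of this computation for a first-order chain. The g-row transition probabilities below come from the diagram earlier; the begin distribution and the c-row values are placeholders I made up so the example runs:

    def sequence_probability(seq, begin, transitions):
        # Pr(x) = Pr(x_1) * product over i >= 2 of Pr(x_i | x_{i-1})
        p = begin[seq[0]]
        for prev, cur in zip(seq, seq[1:]):
            p *= transitions[(prev, cur)]
        return p

    # g row from the earlier diagram; begin distribution and c row are placeholders.
    begin = {"a": 0.25, "c": 0.25, "g": 0.25, "t": 0.25}
    transitions = {
        ("g", "a"): 0.16, ("g", "c"): 0.34, ("g", "g"): 0.38, ("g", "t"): 0.12,
        ("c", "a"): 0.25, ("c", "c"): 0.30, ("c", "g"): 0.27, ("c", "t"): 0.18,
    }
    print(sequence_probability("cggt", begin, transitions))  # Pr(c) Pr(g|c) Pr(g|g) Pr(t|g)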

Markov Chain Models
A Markov chain can also have an end state, which allows the model to represent a distribution over sequences of different lengths and preferences for ending sequences with certain symbols.
[Figure: Markov chain with states A, C, G, T, a begin state, and an end state.]

Markov Chain Notation
The transition parameters can be denoted by a_{x_{i-1} x_i}, where a_{x_{i-1} x_i} = Pr(x_i | x_{i-1}).
Similarly, we can denote the probability of a sequence x as
a_{B x_1} ∏_{i=2..L} a_{x_{i-1} x_i} = Pr(x_1) ∏_{i=2..L} Pr(x_i | x_{i-1})
where a_{B x_1} represents the transition from the begin state.

Example Application: CpG Islands
CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.
CpG dinucleotides are rarer in eukaryotic genomes than expected given the marginal probabilities of C and G, but the regions upstream of genes (promoter regions) are richer in CpG dinucleotides than elsewhere; such CpG islands are useful evidence for finding genes.
We could predict CpG islands with two Markov chains: one to represent CpG islands, and one to represent the rest of the genome.

Estimating the Model Parameters
Given some data (e.g. a set of sequences from CpG islands), how can we determine the probability parameters of our model?
One approach: maximum likelihood estimation. Given a set of data D, set the parameters θ to maximize Pr(D | θ), i.e. make the data D look likely under the model.

Maximum Likelihood Estimation
Suppose we want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t) and we're given the sequences
accgcgctta
gcttagtgac
tagccgttac
Then the maximum likelihood estimates are:
Pr(a) = 6/30 = 0.2
Pr(c) = 9/30 = 0.3
Pr(g) = 7/30 ≈ 0.233
Pr(t) = 8/30 ≈ 0.267

Maximum Likelihood Estimation
Suppose instead we saw the following sequences:
gccgcgcttg
gcttggtggc
tggccgttgc
Then the maximum likelihood estimates are:
Pr(a) = 0/30 = 0
Pr(c) = 9/30 = 0.3
Pr(g) = 13/30 ≈ 0.433
Pr(t) = 8/30 ≈ 0.267
Do we really want to set Pr(a) to 0?

A Bayesian Approach
Instead of estimating parameters strictly from the data, we could start with some prior belief for each.
For example, we could use Laplace estimates: Pr(a) = (n_a + 1) / Σ_i (n_i + 1), where n_i represents the number of occurrences of character i and the added 1 is a pseudocount.
Using Laplace estimates with the sequences
gccgcgcttg
gcttggtggc
tggccgttgc
Pr(a) = (0 + 1)/34
Pr(c) = (9 + 1)/34
...

A Bayesian Approach
A more general form: m-estimates, Pr(a) = (n_a + p_a m) / (Σ_i n_i + m), where p_a is the prior probability of a and m is the number of virtual instances.
With m = 8 and uniform priors (p_a = 0.25), for the sequences
gccgcgcttg
gcttggtggc
tggccgttgc
Pr(c) = (9 + 0.25 × 8) / (30 + 8) = 11/38 ≈ 0.289
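
A sketch computing both the Laplace and m-estimates for the slide's sequences (the function names are mine):

    from collections import Counter

    SEQS = ["gccgcgcttg", "gcttggtggc", "tggccgttgc"]
    ALPHABET = "acgt"

    def laplace_estimates(seqs):
        # Pr(a) = (n_a + 1) / sum over i of (n_i + 1)
        counts = Counter("".join(seqs))
        total = sum(counts[ch] + 1 for ch in ALPHABET)
        return {ch: (counts[ch] + 1) / total for ch in ALPHABET}

    def m_estimates(seqs, m=8, prior=0.25):
        # Pr(a) = (n_a + p_a * m) / (sum over i of n_i + m), with uniform prior p_a
        counts = Counter("".join(seqs))
        total = sum(counts[ch] for ch in ALPHABET) + m
        return {ch: (counts[ch] + prior * m) / total for ch in ALPHABET}

    print(laplace_estimates(SEQS))  # Pr(a) = 1/34, Pr(c) = 10/34, ...
    print(m_estimates(SEQS))        # Pr(c) = 11/38, approximately 0.289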

Estimation for 1st-Order Probabilities
To estimate a 1st-order parameter, such as Pr(c | g), we count the number of times that c follows the history g in our given sequences.
Using Laplace estimates with the sequences
gccgcgcttg
gcttggtggc
tggccgttgc
Pr(a | g) = (0 + 1)/(12 + 4)
Pr(c | g) = (7 + 1)/(12 + 4)
Pr(g | g) = (3 + 1)/(12 + 4)
Pr(t | g) = (2 + 1)/(12 + 4)
Pr(a | c) = (0 + 1)/(7 + 4)
...
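
A sketch of the first-order counting with a pseudocount of 1 (the function name is mine):

    from collections import Counter

    def transition_estimates(seqs, alphabet="acgt", pseudocount=1):
        # Laplace-smoothed Pr(t | s) = (count of s->t + 1) / (count of s->anything + |alphabet|)
        pair_counts = Counter()
        for seq in seqs:
            for s, t in zip(seq, seq[1:]):
                pair_counts[(s, t)] += 1
        probs = {}
        for s in alphabet:
            total = sum(pair_counts[(s, t)] for t in alphabet) + pseudocount * len(alphabet)
            for t in alphabet:
                probs[(s, t)] = (pair_counts[(s, t)] + pseudocount) / total
        return probs

    probs = transition_estimates(["gccgcgcttg", "gcttggtggc", "tggccgttgc"])
    print(probs[("g", "c")])  # (7 + 1) / (12 + 4) = 0.5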

Markov Chains for Discrimination
Question: given a short stretch of genomic sequence, how could we decide whether it comes from a CpG island or not?
Given sequences from CpG islands, and sequences from other regions, we can construct a model to represent CpG islands and a null model to represent the other regions. We can then score a test sequence x by
score(x) = log [ Pr(x | CpG model) / Pr(x | null model) ]

Markov Chains for Discrimination
Why use score(x) = log [ Pr(x | CpG model) / Pr(x | null model) ]?
Bayes rule tells us
Pr(CpG | x) = Pr(x | CpG) Pr(CpG) / Pr(x)
Pr(null | x) = Pr(x | null) Pr(null) / Pr(x)
If we're not taking into account the prior probabilities of the two classes (Pr(CpG) and Pr(null)), then we just need to compare Pr(x | CpG) and Pr(x | null).

Under the CpG island model,
Pr(x | CpG model) = Pr(x_L, x_{L-1}, ..., x_1) = Pr(x_1) ∏_{i=2..L} a⁺_{x_{i-1} x_i}
where a⁺_{st} is the transition probability for the CpG island model from letter s to letter t, which is set as
a⁺_{st} = c⁺_{st} / Σ_{t'} c⁺_{st'}
and c⁺_{st} is the number of times letter s is followed by letter t. So the score can be further written as
score(x) = log [ Pr(x | CpG model) / Pr(x | null model) ] = Σ_{i=1..L} log ( a⁺_{x_{i-1} x_i} / a⁻_{x_{i-1} x_i} )

Markov Chains for Discrimination
Parameters estimated for the CpG (+) and null (−) models from human sequences containing 48 CpG islands (about 60,000 nucleotides). Rows are the current letter s, columns the next letter t:
+  A    C    G    T
A  .18  .27  .43  .12
C  .17  .37  .27  .19
G  .16  .34  .38  .12
T  .08  .36  .38  .18
-  A    C    G    T
A  .30  .21  .28  .21
C  .32  .30  .08  .30
G  .25  .24  .30  .21
T  .18  .24  .29  .29
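
A sketch of the log-odds scoring rule using the two tables above. For simplicity this version sums only over transitions within the sequence and omits the begin-state term; that simplification is mine, not from the slides:

    import math

    # Transition probabilities a+_{st} (CpG model) and a-_{st} (null model) from the tables above;
    # rows are the current letter s, columns the next letter t.
    PLUS = {
        "a": {"a": 0.18, "c": 0.27, "g": 0.43, "t": 0.12},
        "c": {"a": 0.17, "c": 0.37, "g": 0.27, "t": 0.19},
        "g": {"a": 0.16, "c": 0.34, "g": 0.38, "t": 0.12},
        "t": {"a": 0.08, "c": 0.36, "g": 0.38, "t": 0.18},
    }
    MINUS = {
        "a": {"a": 0.30, "c": 0.21, "g": 0.28, "t": 0.21},
        "c": {"a": 0.32, "c": 0.30, "g": 0.08, "t": 0.30},
        "g": {"a": 0.25, "c": 0.24, "g": 0.30, "t": 0.21},
        "t": {"a": 0.18, "c": 0.24, "g": 0.29, "t": 0.29},
    }

    def log_odds_score(seq):
        # score(x) = sum_i log( a+_{x_{i-1} x_i} / a-_{x_{i-1} x_i} ), begin-state term omitted
        return sum(math.log(PLUS[s][t] / MINUS[s][t]) for s, t in zip(seq, seq[1:]))

    print(log_odds_score("cgcgcgcg"))  # positive: looks like a CpG island
    print(log_odds_score("atatatat"))  # negative: looks like background sequence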

Markov Chains for Discrimination
[Figure: score distributions for test sequences; light bars represent negative (non-CpG island) sequences, dark bars represent positive (CpG island) sequences.]

Higher Order Markov Chains
The Markov property specifies that the probability of a state depends only on the previous state, but we can build more memory into our states by using a higher-order Markov model.
In an nth-order Markov model, Pr(x_i | x_{i-1}, x_{i-2}, ..., x_1) = Pr(x_i | x_{i-1}, ..., x_{i-n}).