EM algorithm and applications Lecture #9


1 EM algorithm and applications, Lecture #9. Background readings: Chapters 11.2, 11.6 in the textbook, Biological Sequence Analysis, Durbin et al.

2 The EM algorithm. This lecture's plan: 1. Presentation and correctness proof of the EM algorithm. 2. Examples of implementations.

3 Model, Parameters, ML. A model with parameters θ is a probabilistic space M, in which each simple event y is determined by values of random variables (dice). The parameters θ are the probabilities associated with the random variables. (In an HMM of length L, the simple events are HMM-sequences of length L, and the parameters are the transition probabilities m_kl and the emission probabilities e_k(b).) An observed data is a non-empty subset x ⊆ M. (In an HMM, it can be all the simple events which fit a given output sequence.) Given observed data x, the ML method seeks parameters θ* which maximize the likelihood of the data, p(x|θ) = Σ_y p(x,y|θ). Finding such θ* is easy when the observed data is a simple event, but hard in general.

4 The EM algorithm. Assume a model with parameters as in the previous slide. Given observed data x, the likelihood of x under model parameters θ is given by p(x|θ) = Σ_y p(x,y|θ). (The y's are the simple events which comprise x, usually determined by the possible values of the hidden data.) The EM algorithm receives x and parameters θ, and returns new parameters λ* s.t. p(x|λ*) > p(x|θ), i.e., the new parameters increase the likelihood of the observed data.

5 The EM algorithm. [Figure omitted: graphs of the logarithms of the likelihood functions, log L_θ(λ) = E_θ[log P(x,y|λ)] and log P(x|λ), as functions of λ around the current point θ.] EM uses the current parameters θ to construct a simpler ML problem L_θ:
L_θ(λ) = Π_y p(x,y|λ)^{p(y|x,θ)}
Guarantee: if L_θ(λ) > L_θ(θ), then P(x|λ) > P(x|θ).

6 Derivation of the EM Algorithm. Let x be the observed data, and let {(x,y_1), ..., (x,y_k)} be the set of (simple) events which comprise x. Our goal is to find parameters θ* which maximize the sum
p(x|θ*) = p(x,y_1|θ*) + p(x,y_2|θ*) + ... + p(x,y_k|θ*)
As this is hard, we start with some parameters θ, and only find λ* s.t.:
p(x|λ*) = Σ_{i=1}^{k} p(x,y_i|λ*) > Σ_{i=1}^{k} p(x,y_i|θ) = p(x|θ)

7 For given parameters θ, let p_i = p(y_i|x,θ) (note that p_1 + ... + p_k = 1). We use the p_i's to define a virtual sampling, in which y_1 occurs p_1 times, y_2 occurs p_2 times, ..., y_k occurs p_k times. The EM algorithm looks for new parameters λ which maximize the likelihood of this "virtual" sampling. This likelihood is given by
L_θ(λ) = p(y_1,x|λ)^{p_1} · p(y_2,x|λ)^{p_2} · ... · p(y_k,x|λ)^{p_k}

8 The EM algorithm. In each iteration the EM algorithm does the following.
(E step): Calculate L_θ(λ) = Π_y p(y,x|λ)^{p(y|x,θ)}
(M step): Find λ* which maximizes L_θ(λ)
(The next iteration sets θ ← λ* and repeats.)
Comments: 1. At the M-step we only need that L_θ(λ*) > L_θ(θ). This change yields the so-called Generalized EM algorithm; it is important when it is hard to find the optimal λ*. 2. Usually, Q_θ(λ) = log(L_θ(λ)) = Σ_y p(y|x,θ)·log(p(y,x|λ)) is used.
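To make the E-step / M-step loop above concrete, here is a minimal generic sketch in Python (our own illustration, not code from the lecture). It assumes a finite list of hidden completions ys of the observed data x, a function joint(x, y, params) returning p(x,y|params), and a caller-supplied closed-form M step; all function and parameter names here are hypothetical.

```python
def em_step(x, ys, params, joint, m_step):
    # E step: posterior weights p(y | x, params) for every hidden completion y
    joints = [joint(x, y, params) for y in ys]
    p_x = sum(joints)                        # = p(x | params)
    weights = [j / p_x for j in joints]
    # M step: maximize sum_y weights[y] * log p(x, y | lambda) over lambda
    return m_step(ys, weights), p_x

def em(x, ys, params, joint, m_step, tol=1e-8, max_iter=100):
    prev = float("-inf")
    for _ in range(max_iter):
        params, p_x = em_step(x, ys, params, joint, m_step)
        if p_x - prev < tol:                 # the likelihood never decreases, so stop on a small change
            break
        prev = p_x
    return params
```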

9 Correctness Theorem for the EM Algorithm. Theorem: Let x = {(x,y_1), ..., (x,y_k)} be a collection of events, as in the setting of the EM algorithm, and let
L_θ(λ) = Π_{i=1}^{k} prob(x,y_i|λ)^{prob(y_i|x,θ)}
Then the following holds: if L_θ(λ*) > L_θ(θ), then prob(x|λ*) > prob(x|θ).

10 Correctness proof of EM. Let prob(y_i|x,θ) = p_i and prob(y_i|x,λ*) = q_i. Then from the definition of conditional probability we have:
prob(x,y_i|θ) = p_i · prob(x|θ),  prob(x,y_i|λ*) = q_i · prob(x|λ*)
By the EM assumption on θ and λ*:
Π_{i=1}^{k} (q_i · prob(x|λ*))^{p_i} = L_θ(λ*) > L_θ(θ) = Π_{i=1}^{k} (p_i · prob(x|θ))^{p_i}
Since Σ_{i=1}^{k} p_i = Σ_{i=1}^{k} q_i = 1, we get:
(Π_{i=1}^{k} q_i^{p_i}) · prob(x|λ*) > (Π_{i=1}^{k} p_i^{p_i}) · prob(x|θ)

11 Correctness proof of EM (end). From the last slide:
(Π_{i=1}^{k} q_i^{p_i}) · prob(x|λ*) > (Π_{i=1}^{k} p_i^{p_i}) · prob(x|θ)   [1]
By the ML principle (the likelihood Π_i λ_i^{p_i} of the virtual sample is maximized at λ_i = p_i) we have:
Π_{i=1}^{k} q_i^{p_i} ≤ Π_{i=1}^{k} p_i^{p_i},  i.e.  Π_{i=1}^{k} (q_i/p_i)^{p_i} ≤ 1
Dividing inequality [1] by Π_{i=1}^{k} p_i^{p_i} we get:
prob(x|λ*) ≥ (Π_{i=1}^{k} (q_i/p_i)^{p_i}) · prob(x|λ*) > prob(x|θ),
so prob(x|λ*) > prob(x|θ). QED

12 Example: Baum-Welch = EM for HMM. The Baum-Welch algorithm is the EM algorithm for HMMs.
E step for HMM: L_θ(λ) = Π_s p(s,x|λ)^{p(s|x,θ)}, where λ are the new parameters {m_kl, e_k(b)}.
M step for HMM: look for λ which maximizes L_θ(λ).
Recall that for an HMM, p(s,x|λ) = Π_{k,l} m_kl^{M_kl(s)} · Π_{k,b} e_k(b)^{E_k(b,s)}, where M_kl(s) and E_k(b,s) are the numbers of k→l transitions and of emissions of b from state k in the HMM-sequence s.

13 Baum-Welch = EM for HMM (cont). Writing p(s,x|λ) as Π_{k,l} m_kl^{M_kl(s)} · Π_{k,b} e_k(b)^{E_k(b,s)}, we get
L_θ(λ) = Π_s [ Π_{k,l} m_kl^{M_kl(s)} · Π_{k,b} e_k(b)^{E_k(b,s)} ]^{p(s|x,θ)} = Π_{k,l} m_kl^{M_kl} · Π_{k,b} e_k(b)^{E_k(b)}
where M_kl = Σ_s p(s|x,θ)·M_kl(s) and E_k(b) = Σ_s p(s|x,θ)·E_k(b,s) are the expected counts given x and θ.
As we showed, L_θ(λ) is maximized when the m_kl's and e_k(b)'s are the relative frequencies of the corresponding variables given x and θ, i.e.,
m_kl = M_kl / Σ_{l'} M_kl'   and   e_k(b) = E_k(b) / Σ_{b'} E_k(b')
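As a small illustration of this M step, the sketch below (our own code, not from the lecture) turns expected transition counts M[k,l] and expected emission counts E[k,b], assumed to have been accumulated in the E step (e.g. with the forward-backward algorithm, not shown here), into the new parameters by row normalization.

```python
import numpy as np

def baum_welch_m_step(M, E):
    # M[k, l]: expected number of k -> l transitions given x and theta
    # E[k, b]: expected number of emissions of symbol b from state k given x and theta
    m = M / M.sum(axis=1, keepdims=True)     # m_kl = M_kl / sum_l' M_kl'
    e = E / E.sum(axis=1, keepdims=True)     # e_k(b) = E_k(b) / sum_b' E_k(b')
    return m, e
```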

14 A simple example: EM for 2 coin tosses. Consider the following experiment: given a coin with two possible outcomes, H (head) and T (tail), with probabilities θ_H and θ_T = 1 - θ_H, the coin is tossed twice, but only the 1st outcome, T, is seen. So the data is x = (T,*). We wish to apply the EM algorithm to get parameters that increase the likelihood of the data. Let the initial parameters be θ = (θ_H, θ_T) = (¼, ¾).

15 EM for 2 coin tosses (cont). The hidden data which produce x are the sequences y_1 = (T,H) and y_2 = (T,T). Hence the likelihood of x with parameters (θ_H, θ_T) is
p(x|θ) = P(x,y_1|θ) + P(x,y_2|θ) = θ_T·θ_H + θ_T²
For the initial parameters θ = (¼, ¾), we have: p(x|θ) = ¾·¼ + ¾·¾ = ¾. Note that in this case P(x,y_i|θ) = P(y_i|θ), for i = 1,2; we can always define y so that (x,y) = y (otherwise we set y' ← (x,y) and replace the y's by the y''s).

16 EM for 2 coin tosses - E step. Calculate L_θ(λ) = L_θ(λ_H, λ_T). Recall: λ_H, λ_T are the new parameters, which we need to optimize:
L_θ(λ) = p(x,y_1|λ)^{p(y_1|x,θ)} · p(x,y_2|λ)^{p(y_2|x,θ)}
p(y_1|x,θ) = p(y_1,x|θ)/p(x|θ) = (¾·¼)/(¾) = ¼
p(y_2|x,θ) = p(y_2,x|θ)/p(x|θ) = (¾·¾)/(¾) = ¾
Thus we have L_θ(λ) = p(x,y_1|λ)^{¼} · p(x,y_2|λ)^{¾}

17 EM for 2 coin tosses - E step. For a sequence y of coin tosses, let N_H(y) be the number of H's in y, and N_T(y) the number of T's in y. Then
p(y|λ) = λ_H^{N_H(y)} · λ_T^{N_T(y)}
In our example, y_1 = (T,H) and y_2 = (T,T), hence:
N_H(y_1) = N_T(y_1) = 1,  N_H(y_2) = 0,  N_T(y_2) = 2

18 Example: 2 coin tosses - E step. Thus
p(x,y_1|λ) = λ_T^{N_T(y_1)} · λ_H^{N_H(y_1)} = λ_T·λ_H
p(x,y_2|λ) = λ_T^{N_T(y_2)} · λ_H^{N_H(y_2)} = λ_T²
L_θ(λ) = p(x,y_1|λ)^{¼} · p(x,y_2|λ)^{¾} = (λ_T·λ_H)^{¼} · (λ_T²)^{¾} = λ_T^{¼·N_T(y_1) + ¾·N_T(y_2)} · λ_H^{¼·N_H(y_1) + ¾·N_H(y_2)}
And in general: L_θ(λ) = λ_T^{N_T} · λ_H^{N_H}, with N_T = ¼·1 + ¾·2 = 7/4 and N_H = ¼·1 + ¾·0 = ¼

19 EM for 2 coin tosses - M step. Find λ* which maximizes L_θ(λ). As we already saw, λ_H^{N_H}·λ_T^{N_T} is maximized when:
λ_H = N_H/(N_H + N_T);  λ_T = N_T/(N_H + N_T)
λ_H = (¼)/(¼ + 7/4) = 1/8;  λ_T = (7/4)/(¼ + 7/4) = 7/8
that is, λ* = (1/8, 7/8) and p(x|λ*) = (1/8)·(7/8) + (7/8)² = 7/8. [The optimal parameters (0,1) will never be reached by the EM algorithm!]
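The whole two-coin example can be run end to end with a few lines of Python. This is our own sketch mirroring the slides; the first iteration reproduces p(x|θ) = ¾ and the update λ_H = 1/8.

```python
def em_two_tosses(theta_H, n_iter=10):
    # Observed data x = (T, *); hidden completions y1 = (T, H), y2 = (T, T)
    for _ in range(n_iter):
        theta_T = 1.0 - theta_H
        p_xy1 = theta_T * theta_H            # p(x, y1 | theta)
        p_xy2 = theta_T * theta_T            # p(x, y2 | theta)
        p_x = p_xy1 + p_xy2                  # p(x | theta)
        # E step: expected counts N_H, N_T under p(y | x, theta)
        w1, w2 = p_xy1 / p_x, p_xy2 / p_x
        N_H = w1 * 1 + w2 * 0
        N_T = w1 * 1 + w2 * 2
        # M step: relative frequencies
        theta_H = N_H / (N_H + N_T)
        print(f"p(x|theta) = {p_x:.4f}, new theta_H = {theta_H:.4f}")
    return theta_H

em_two_tosses(0.25)   # first line printed: p(x|theta) = 0.7500, new theta_H = 0.1250
```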

20 EM for a single random variable (dice). Now the probability of each y (= (x,y)) is given by a sequence of dice tosses. The dice has m outcomes, with probabilities λ_1, ..., λ_m. Let N_k(y) = #(outcome k occurs in y). Then
p(y|λ) = Π_{k=1}^{m} λ_k^{N_k(y)}
Let N_k be the expected value of N_k(y), given x and θ:
N_k = E(N_k | x,θ) = Σ_y p(y|x,θ)·N_k(y)
Then we have:

21 L_θ(λ) for one dice.
L_θ(λ) = Π_y p(y|λ)^{p(y|x,θ)} = Π_y Π_{k=1}^{m} λ_k^{N_k(y)·p(y|x,θ)} = Π_{k=1}^{m} λ_k^{Σ_y p(y|x,θ)·N_k(y)} = Π_{k=1}^{m} λ_k^{N_k}
which is maximized for λ_k = N_k / Σ_{k'} N_k'

22 EM algorithm for n independent observations x^1, ..., x^n: Expectation step. It can be shown that, if the x^j are independent, then:
N_k = Σ_{j=1}^{n} N_k^j = Σ_{j=1}^{n} Σ_{y^j} p(y^j|x^j,θ)·N_k(y^j,x^j) = Σ_{j=1}^{n} (1/p(x^j|θ)) Σ_{y^j} p(y^j,x^j|θ)·N_k(y^j,x^j)

23 Example: The ABO locus. A locus is a particular place on the chromosome. Each locus state (called a genotype) consists of two alleles, one paternal and one maternal. Some loci (plural of locus) determine distinguished features; the ABO locus, for example, determines blood type. The ABO locus has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}. The first two genotypes determine blood type A, the next two determine blood type B, then blood type AB, and finally blood type O. We wish to estimate the proportions of the 6 genotypes in a population. Suppose we randomly sampled N individuals and found that N_{a/a} have genotype a/a, N_{a/b} have genotype a/b, etc. Then the MLE is given by the relative frequencies:
q_{a/a} = N_{a/a}/N,  q_{a/o} = N_{a/o}/N,  q_{b/o} = N_{b/o}/N,  q_{b/b} = N_{b/b}/N,  q_{a/b} = N_{a/b}/N,  q_{o/o} = N_{o/o}/N

24 The ABO locus (cont.). However, testing individuals for their genotype is very expensive. Can we estimate the proportions of the genotypes using the common, cheap blood test whose outcome is one of the four blood types (A, B, AB, O)? The problem is that among individuals measured to have blood type A, we don't know how many have genotype a/a and how many have genotype a/o. So what can we do?

25 The ABO locus (cont.). The Hardy-Weinberg equilibrium rule states that in equilibrium the frequencies of the three alleles q_a, q_b, q_o in the population determine the frequencies of the genotypes as follows:
q_{a/b} = 2·q_a·q_b,  q_{a/o} = 2·q_a·q_o,  q_{b/o} = 2·q_b·q_o,  q_{a/a} = (q_a)²,  q_{b/b} = (q_b)²,  q_{o/o} = (q_o)²
In fact, the Hardy-Weinberg equilibrium rule follows from modeling this problem as data x with hidden data y:
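As a quick sanity check of the Hardy-Weinberg relation (with our own toy allele frequencies, not numbers from the lecture), the six genotype frequencies computed from any allele frequencies with q_a + q_b + q_o = 1 sum to 1:

```python
def hw_genotype_freqs(qa, qb, qo):
    # Hardy-Weinberg: genotype frequencies from allele frequencies
    return {"a/a": qa**2, "b/b": qb**2, "o/o": qo**2,
            "a/b": 2*qa*qb, "a/o": 2*qa*qo, "b/o": 2*qb*qo}

freqs = hw_genotype_freqs(0.2, 0.1, 0.7)          # toy allele frequencies summing to 1
assert abs(sum(freqs.values()) - 1.0) < 1e-12     # the six genotype frequencies sum to 1
```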

26 The ABO locus (cont.). The dice outcomes are the three possible alleles a, b and o. The observed data are the blood types A, B, AB or O. Each blood type is determined by two successive random samplings of alleles, which form an ordered genotype pair; this is the hidden data. For instance, blood type A corresponds to the ordered genotype pairs (a,a), (a,o) and (o,a). So we have three parameters of one dice, q_a, q_b, q_o, that we need to estimate. We start with parameters θ = (q_a, q_b, q_o), and then use EM to improve them.

27 EM setting for the ABO locus. The observed data x = (x^1, ..., x^n) is a sequence of elements (blood types) from the set {A, B, AB, O}. E.g., (B,A,B,B,O,A,B,A,O,B,AB) are observations (x^1, ..., x^11). The hidden data (i.e. the y's) for each x^j is the set of ordered pairs of alleles that generates it; for instance, for A it is the set {aa, ao, oa}. The parameters θ = {q_a, q_b, q_o} are the probabilities of the alleles.

28 EM for the ABO locus. For each observed blood type x^j ∈ {A, B, AB, O} and for each allele z in {a, b, o} we compute N_z(x^j), the expected number of times that z appears in x^j:
N_z(x^j) = Σ_{y^j} p(y^j|x^j,θ)·N_z(y^j)
where the sum is taken over the ordered genotype pairs y^j, and N_z(y^j) is the number of times allele z occurs in the pair y^j. E.g., N_a((o,b)) = 0; N_b((o,b)) = N_o((o,b)) = 1.

29 EM for the ABO locus. The computation for blood type B: P(B|θ) = P((b,b)|θ) + P((b,o)|θ) + P((o,b)|θ) = q_b² + 2·q_b·q_o. Since N_b((b,b)) = 2 and N_b((b,o)) = N_b((o,b)) = N_o((o,b)) = N_o((b,o)) = 1, the expected numbers of occurrences of o and b in B, N_o(B) and N_b(B), are given by:
N_o(B) = Σ_y p(y|B,θ)·N_o(y) = 2·q_b·q_o / (q_b² + 2·q_b·q_o)
N_b(B) = Σ_y p(y|B,θ)·N_b(y) = (2·q_b² + 2·q_b·q_o) / (q_b² + 2·q_b·q_o)
Observe that N_b(B) + N_o(B) = 2.

30 EM for the ABO locus. Similarly, P(A|θ) = q_a² + 2·q_a·q_o, and
N_o(A) = 2·q_a·q_o / (q_a² + 2·q_a·q_o),  N_a(A) = (2·q_a² + 2·q_a·q_o) / (q_a² + 2·q_a·q_o)
P(AB|θ) = P((b,a)|θ) + P((a,b)|θ) = 2·q_a·q_b;  N_a(AB) = N_b(AB) = 1
P(O|θ) = P((o,o)|θ) = q_o²;  N_o(O) = 2
[N_b(O) = N_a(O) = N_o(AB) = N_b(A) = N_a(B) = 0]

31 E step: compute N_a, N_b and N_o. Let #(A)=3, #(B)=5, #(AB)=1, #(O)=2 be the numbers of observations of A, B, AB, and O respectively. Then
N_a = #(A)·N_a(A) + #(AB)·N_a(AB)
N_b = #(B)·N_b(B) + #(AB)·N_b(AB)
N_o = #(A)·N_o(A) + #(B)·N_o(B) + #(O)·N_o(O)
Note that N_a + N_b + N_o = 2N = 22.
M step: set λ* = (q_a*, q_b*, q_o*), where
q_a* = N_a/(2N);  q_b* = N_b/(2N);  q_o* = N_o/(2N)
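Putting the E step and M step together, here is a minimal sketch of the ABO EM with the slide's observed counts (#A=3, #B=5, #AB=1, #O=2). The code and variable names are ours, written to follow the formulas on the previous slides.

```python
def abo_em(qa, qb, qo, counts, n_iter=20):
    two_N = 2 * sum(counts.values())          # 2N = total number of sampled alleles (22 here)
    for _ in range(n_iter):
        # E step: expected allele counts, using the formulas from the previous slides
        pA = qa * qa + 2 * qa * qo            # P(A | theta)
        pB = qb * qb + 2 * qb * qo            # P(B | theta)
        Na = counts["A"] * (2 * qa * qa + 2 * qa * qo) / pA + counts["AB"]
        Nb = counts["B"] * (2 * qb * qb + 2 * qb * qo) / pB + counts["AB"]
        No = counts["A"] * 2 * qa * qo / pA + counts["B"] * 2 * qb * qo / pB + counts["O"] * 2
        # M step: new allele frequencies are the relative expected counts
        qa, qb, qo = Na / two_N, Nb / two_N, No / two_N
    return qa, qb, qo

print(abo_em(1/3, 1/3, 1/3, {"A": 3, "B": 5, "AB": 1, "O": 2}))
```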

32 EM for general discrete stochastic processes. Now we wish to maximize the likelihood of an observation x with hidden data as before, i.e. maximize p(x|λ) = Σ_y p(x,y|λ). But this time the experiment (x,y) is generated by a general stochastic process. The only assumption we make is that the outcome of each experiment consists of a (finite) sequence of samplings of r discrete random variables (dice) Z_1, ..., Z_r; each of the Z_i's can be sampled a few times. This can be realized by a probabilistic acyclic state machine, where at each state some Z_i is sampled, and the next state is determined by the outcome, until a final state is reached.

33 EM for processes with many dice. Example: in an HMM, the random variables are the transition and emission probabilities a_kl, e_k(b); x is the visible information, y is the sequence s of states, and (x,y) is the complete HMM sequence. As before, we can redefine y so that (x,y) = y. [Figure omitted: an HMM with states s_1, s_2, ..., s_L emitting X_1, X_2, ..., X_L.]

34 EM for processes with many dice. Each random variable Z_l (l = 1,...,r) has m_l values z_{l,1}, ..., z_{l,m_l} with probabilities {q_{lk} | k = 1,...,m_l}. Each y defines a sequence of outcomes (z_{l_1,k_1}, ..., z_{l_n,k_n}) of the random variables used in y. In the HMM, these are the specific transitions and emissions defined by the states and outputs of the sequence y^j. Let N_{lk}(y) = #(z_{lk} appears in y).

35 EM for processes with many dice. Similarly to the single dice case, we have:
p(y|λ) = Π_{l=1}^{r} Π_{k=1}^{m_l} λ_{lk}^{N_{lk}(y)}
Define N_{lk} as the expected value of N_{lk}(y), given x and θ:
N_{lk} = E(N_{lk} | x,θ) = Σ_y p(y|x,θ)·N_{lk}(y)
Then we have:

36 L_θ(λ) for processes with many dice.
L_θ(λ) = Π_y p(y|λ)^{p(y|x,θ)} = Π_y Π_{l=1}^{r} Π_{k=1}^{m_l} λ_{lk}^{N_{lk}(y)·p(y|x,θ)} = Π_{l=1}^{r} Π_{k=1}^{m_l} λ_{lk}^{N_{lk}}
where N_{lk} = Σ_y p(y|x,θ)·N_{lk}(y) is the expected number of times that, given x and θ, the outcome of dice l was k.
L_θ(λ) is maximized for λ_{lk} = N_{lk} / Σ_{k'} N_{lk'}

37 EM algorithm for processes with many dice. Similarly to the one dice case we get:
Expectation step: set N_{lk} to E(N_{lk}(y)|x,θ), i.e. N_{lk} = Σ_y p(y|x,θ)·N_{lk}(y)
Maximization step: set λ_{lk} = N_{lk} / (Σ_{k'} N_{lk'})

38 EM algorithm for n independent observations x^1, ..., x^n: Expectation step. It can be shown that, if the x^j are independent, then:
N_{lk} = Σ_{j=1}^{n} N_{lk}^j = Σ_{j=1}^{n} Σ_{y^j} p(y^j|x^j,θ)·N_{lk}(y^j,x^j) = Σ_{j=1}^{n} (1/p(x^j|θ)) Σ_{y^j} p(y^j,x^j|θ)·N_{lk}(y^j,x^j)

39 EM in Practice.
Initial parameters: random parameter setting, or the best guess from another source.
Stopping criteria: small change in the likelihood of the data, or small change in the parameter values.
Avoiding bad local maxima: multiple restarts, and early pruning of unpromising ones.
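A hedged sketch of this practical recipe (our own code; em_step, loglik and random_params are placeholders for a concrete model, e.g. the ABO example above):

```python
def em_with_restarts(em_step, loglik, random_params, n_restarts=10, tol=1e-6, max_iter=200):
    best_params, best_ll = None, float("-inf")
    for _ in range(n_restarts):
        params, prev = random_params(), float("-inf")
        for _ in range(max_iter):
            params = em_step(params)
            ll = loglik(params)
            if ll - prev < tol:               # stop on a small change in the data log-likelihood
                break
            prev = ll
        if ll > best_ll:                      # keep the restart that reached the highest likelihood
            best_params, best_ll = params, ll
    return best_params, best_ll
```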
