Cheng Soon Ong & Christian Walder. Canberra, February to June 2018
1 Cheng Soon Ong & Christian Walder
Research Group and College of Engineering and Computer Science
Canberra, February to June 2018

Outlines: Overview; Introduction; Linear Algebra; Probability; Linear Regression 1; Linear Regression 2; Linear Classification 1; Linear Classification 2; Kernel Methods; Sparse Kernel Methods; Mixture Models and EM 1; Mixture Models and EM 2; Neural Networks 1; Neural Networks 2; Principal Component Analysis; Autoencoders; Graphical Models 1; Graphical Models 2; Graphical Models 3; Sampling; Sequential Data 1; Sequential Data 2

(Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")
2 Part XVIII: Sequential Data 2

Contents: Alpha-Beta; How to train a HMM; Viterbi Algorithm
3 Example of a Hidden Markov Model

Assume Peter and Mary are students in Canberra and Sydney, respectively. Peter is a computer science student and is only interested in riding his bicycle, shopping for new computer gadgets, and studying. (Well, he also does other things, but because these other activities don't depend on the weather we neglect them here.) Mary does not know the current weather in Canberra, but knows the general trends of the weather in Canberra. She also knows Peter well enough to know what he does on average on rainy days and also when the skies are blue. She believes that the weather (rainy or not rainy) follows a given discrete Markov chain. She tries to guess the sequence of weather patterns for a number of days after Peter tells her on the phone what he did in the last few days.
4 Example of a Hidden Markov Model

Mary uses a model with an initial probability for each state (rainy, sunny), a transition probability table between rainy and sunny, and an emission probability for each of the activities cycle, shop, and study in each state. (The numeric tables shown on the slide did not survive transcription.)

Assume Peter tells Mary that the list of his activities in the last days was [cycle, shop, study].
(a) Calculate the probability of this observation sequence.
(b) Calculate the most probable sequence of hidden states for these observations.
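Question (a) is exactly what the forward (alpha) recursion derived later in these slides computes. A minimal sketch in Python; the probability tables here are made up for illustration, since the slide's actual numbers did not survive transcription:

```python
import numpy as np

# Hypothetical parameters -- the slide's actual tables were lost in transcription.
states = ["rainy", "sunny"]
activities = ["cycle", "shop", "study"]
pi = np.array([0.6, 0.4])                # assumed initial probabilities
A = np.array([[0.7, 0.3],                # assumed transitions: row = from, col = to
              [0.4, 0.6]])
E = np.array([[0.1, 0.4, 0.5],           # assumed emissions: row = state, col = activity
              [0.6, 0.3, 0.1]])

obs = [0, 1, 2]                          # [cycle, shop, study]

# Forward recursion: alpha[n, k] = p(x_1, ..., x_n, z_n = k)
alpha = np.zeros((len(obs), len(states)))
alpha[0] = pi * E[:, obs[0]]
for n in range(1, len(obs)):
    alpha[n] = E[:, obs[n]] * (alpha[n - 1] @ A)

print("p(cycle, shop, study) =", alpha[-1].sum())
```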
5 Hidden Markov Model

The trellis for the hidden Markov model: find the probability of [cycle, shop, study].
(Figure: a three-step trellis with states Rainy and Sunny at each step and emissions cycle, shop, study.)
6 Hidden Markov Model

Find the most probable hidden states for [cycle, shop, study].
(Figure: the same three-step trellis, now used to trace the most probable state path.)
7 Homogeneous Hidden Markov Model

Joint probability distribution over both latent and observed variables:

$$p(X, Z \mid \theta) = p(z_1 \mid \pi) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}, A) \right] \prod_{m=1}^{N} p(x_m \mid z_m, \phi)$$

where $X = (x_1, \dots, x_N)$, $Z = (z_1, \dots, z_N)$, and $\theta = \{\pi, A, \phi\}$.

Most of the discussion will be independent of the particular choice of emission probabilities (e.g. discrete tables, Gaussians, mixtures of Gaussians).
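The factorisation translates line by line into code. A minimal sketch, assuming discrete emission tables and the illustrative array names pi (initial), A (transition), and E (emission):

```python
import numpy as np

def log_joint(z, obs, pi, A, E):
    """ln p(X, Z | theta): one initial term, N-1 transition terms, N emission terms."""
    lp = np.log(pi[z[0]])                          # ln p(z_1 | pi)
    for n in range(1, len(z)):
        lp += np.log(A[z[n - 1], z[n]])            # ln p(z_n | z_{n-1}, A)
    for m in range(len(obs)):
        lp += np.log(E[z[m], obs[m]])              # ln p(x_m | z_m, phi)
    return lp
```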
8 We have observed a data set $X = (x_1, \dots, x_N)$. Assume it came from an HMM with a given structure (number of nodes, form of emission probabilities). The likelihood of the data is

$$p(X \mid \theta) = \sum_Z p(X, Z \mid \theta)$$

This joint distribution does not factorise over $n$ (as it did with the mixture distribution). We have $N$ latent variables, each with $K$ states: $K^N$ terms. The number of terms grows exponentially with the length of the chain. But we can use the conditional independence of the latent variables to reorder their calculation later. A further obstacle to finding a closed-form maximum likelihood solution: calculating the emission probabilities for the different states $z_n$.
9-12 HMM - EM

Employ the EM algorithm to find the maximum likelihood solution for the HMM.
- Start with some initial parameter settings $\theta^{old}$.
- E-step: Find the posterior distribution of the latent variables $p(Z \mid X, \theta^{old})$.
- M-step: Maximise
  $$Q(\theta, \theta^{old}) = \sum_Z p(Z \mid X, \theta^{old}) \ln p(Z, X \mid \theta)$$
  with respect to the parameters $\theta = \{\pi, A, \phi\}$.
13 HMM - EM

Denote the marginal posterior distribution of $z_n$ by $\gamma(z_n)$, and the joint posterior distribution of two successive latent variables by $\xi(z_{n-1}, z_n)$:

$$\gamma(z_n) = p(z_n \mid X, \theta^{old}) \qquad \xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X, \theta^{old})$$

For each step $n$, $\gamma(z_n)$ has $K$ nonnegative values which sum to 1. For each step $n$, $\xi(z_{n-1}, z_n)$ has $K \times K$ nonnegative values which sum to 1. Elements of these are denoted by $\gamma(z_{nk})$ and $\xi(z_{n-1,j}, z_{nk})$ respectively.
14 HMM - EM

Because the expectation of a binary random variable is the probability that it is one, we get with this notation

$$\gamma(z_{nk}) = \mathbb{E}[z_{nk}] = \sum_{z_n} \gamma(z_n)\, z_{nk}$$
$$\xi(z_{n-1,j}, z_{nk}) = \mathbb{E}[z_{n-1,j}\, z_{nk}] = \sum_{z_{n-1}, z_n} \xi(z_{n-1}, z_n)\, z_{n-1,j}\, z_{nk}$$

Putting it all together we get

$$Q(\theta, \theta^{old}) = \sum_{k=1}^K \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^N \sum_{j=1}^K \sum_{k=1}^K \xi(z_{n-1,j}, z_{nk}) \ln A_{jk} + \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \ln p(x_n \mid \phi_k)$$
15 HMM - EM

M-step: Maximising

$$Q(\theta, \theta^{old}) = \sum_{k=1}^K \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^N \sum_{j=1}^K \sum_{k=1}^K \xi(z_{n-1,j}, z_{nk}) \ln A_{jk} + \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \ln p(x_n \mid \phi_k)$$

results in

$$\pi_k = \frac{\gamma(z_{1k})}{\sum_{j=1}^K \gamma(z_{1j})} \qquad A_{jk} = \frac{\sum_{n=2}^N \xi(z_{n-1,j}, z_{nk})}{\sum_{l=1}^K \sum_{n=2}^N \xi(z_{n-1,j}, z_{nl})}$$
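As a sketch, assuming the E-step has produced gamma as an (N, K) array and xi as an (N-1, K, K) array, the two updates are a normalised first row and a row-normalised sum:

```python
import numpy as np

def m_step_pi_A(gamma, xi):
    """M-step for the initial and transition probabilities.

    gamma[n, k] = p(z_n = k | X, theta_old);
    xi[n-1, j, k] = p(z_{n-1} = j, z_n = k | X, theta_old).
    """
    pi_new = gamma[0] / gamma[0].sum()
    A_new = xi.sum(axis=0)                         # numerator: sum over n = 2..N
    A_new /= A_new.sum(axis=1, keepdims=True)      # denominator: normalise each row j
    return pi_new, A_new
```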
16 HMM - EM

Still left: maximising

$$Q(\theta, \theta^{old}) = \sum_{k=1}^K \gamma(z_{1k}) \ln \pi_k + \sum_{n=2}^N \sum_{j=1}^K \sum_{k=1}^K \xi(z_{n-1,j}, z_{nk}) \ln A_{jk} + \sum_{n=1}^N \sum_{k=1}^K \gamma(z_{nk}) \ln p(x_n \mid \phi_k)$$

with respect to $\phi$. But $\phi$ only appears in the last term, and under the assumption that all the $\phi_k$ are independent of each other, this term decouples into a sum. Then maximise each contribution $\sum_{n=1}^N \gamma(z_{nk}) \ln p(x_n \mid \phi_k)$ individually.
17 HMM - EM

In the case of Gaussian emission densities $p(x \mid \phi_k) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$, we get for the maximising parameters of the emission densities

$$\mu_k = \frac{\sum_{n=1}^N \gamma(z_{nk})\, x_n}{\sum_{n=1}^N \gamma(z_{nk})} \qquad \Sigma_k = \frac{\sum_{n=1}^N \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^N \gamma(z_{nk})}$$
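These are just responsibility-weighted means and covariances, as in the Gaussian mixture M-step. A sketch assuming gamma is an (N, K) responsibility array and X an (N, D) data matrix:

```python
import numpy as np

def m_step_gaussian(gamma, X):
    """Weighted mean and covariance per state for Gaussian emissions."""
    Nk = gamma.sum(axis=0)                         # effective number of points per state
    mu = (gamma.T @ X) / Nk[:, None]               # (K, D) weighted means
    K, D = mu.shape
    Sigma = np.zeros((K, D, D))
    for k in range(K):
        Xc = X - mu[k]                             # centre the data on mu_k
        Sigma[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k]
    return mu, Sigma
```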
18 We need to efficiently evaluate the $\gamma(z_{nk})$ and $\xi(z_{n-1,j}, z_{nk})$. The graphical model for the HMM is a tree! We know we can use a two-stage message passing algorithm to calculate the posterior distribution of the latent variables. For HMMs this is called the forward-backward algorithm (Rabiner, 1989), or the Baum-Welch algorithm (Baum, 1972). Other variants exist, differing only in the form of the messages propagated. We look at the most widely known, the alpha-beta algorithm.
19 Independence Relations for the HMM

Given the data $X = \{x_1, \dots, x_N\}$, the following independence relations hold:

$$p(X \mid z_n) = p(x_1, \dots, x_n \mid z_n)\, p(x_{n+1}, \dots, x_N \mid z_n)$$
$$p(x_1, \dots, x_{n-1} \mid x_n, z_n) = p(x_1, \dots, x_{n-1} \mid z_n)$$
$$p(x_1, \dots, x_{n-1} \mid z_{n-1}, z_n) = p(x_1, \dots, x_{n-1} \mid z_{n-1})$$
$$p(x_{n+1}, \dots, x_N \mid z_n, z_{n+1}) = p(x_{n+1}, \dots, x_N \mid z_{n+1})$$
$$p(x_{n+2}, \dots, x_N \mid x_{n+1}, z_{n+1}) = p(x_{n+2}, \dots, x_N \mid z_{n+1})$$
$$p(X \mid z_{n-1}, z_n) = p(x_1, \dots, x_{n-1} \mid z_{n-1})\, p(x_n \mid z_n)\, p(x_{n+1}, \dots, x_N \mid z_n)$$
$$p(x_{N+1} \mid X, z_{N+1}) = p(x_{N+1} \mid z_{N+1})$$
$$p(z_{N+1} \mid X, z_N) = p(z_{N+1} \mid z_N)$$

(Figure: the HMM chain $z_1 \to z_2 \to \dots \to z_{n+1}$ with emissions $x_1, \dots, x_{n+1}$.)
20 Independence Relations for the HMM - Example

Let's look at the following independence relation:

$$p(X \mid z_n) = p(x_1, \dots, x_n \mid z_n)\, p(x_{n+1}, \dots, x_N \mid z_n)$$

Any path from the set $\{x_1, \dots, x_n\}$ to the set $\{x_{n+1}, \dots, x_N\}$ has to go through $z_n$. In $p(X \mid z_n)$ the node $z_n$ is conditioned on (= observed). All paths from $x_1, \dots, x_{n-1}$ through $z_n$ to $x_{n+1}, \dots, x_N$ are head-to-tail, and therefore blocked. The path from $x_n$ through $z_n$ to $z_{n+1}$ is tail-to-tail, so also blocked. Therefore $x_1, \dots, x_n \perp x_{n+1}, \dots, x_N \mid z_n$.

(Figure: the HMM chain $z_1 \to z_2 \to \dots \to z_{n+1}$ with emissions $x_1, \dots, x_{n+1}$.)
21 Alpha-Beta

Define the joint probability of observing all data up to step $n$ and having $z_n$ as latent variable to be

$$\alpha(z_n) = p(x_1, \dots, x_n, z_n)$$

Define the probability of all future data given $z_n$ to be

$$\beta(z_n) = p(x_{n+1}, \dots, x_N \mid z_n)$$

Then it can be shown that the following recursions hold:

$$\alpha(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1})\, p(z_n \mid z_{n-1})$$
$$\alpha(z_1) = \prod_{k=1}^K \{\pi_k\, p(x_1 \mid \phi_k)\}^{z_{1k}}$$
22 Alpha-Beta

At step $n$ we can efficiently calculate $\alpha(z_n)$ given $\alpha(z_{n-1})$:

$$\alpha(z_n) = p(x_n \mid z_n) \sum_{z_{n-1}} \alpha(z_{n-1})\, p(z_n \mid z_{n-1})$$

(Figure: lattice fragment in which $\alpha(z_{n-1,1})$, $\alpha(z_{n-1,2})$, $\alpha(z_{n-1,3})$ are combined through the transition weights $A_{11}, A_{21}, A_{31}$ and the emission $p(x_n \mid z_{n,1})$ to produce $\alpha(z_{n,1})$.)
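The whole forward pass, as a sketch with the same assumed array conventions as the earlier example. (Real implementations rescale alpha at each step to avoid underflow on long sequences; that is omitted here.)

```python
import numpy as np

def forward(pi, A, E, obs):
    """alpha[n, k] = p(x_1, ..., x_n, z_n = k) for a discrete-emission HMM."""
    N, K = len(obs), len(pi)
    alpha = np.zeros((N, K))
    alpha[0] = pi * E[:, obs[0]]                   # alpha(z_1)
    for n in range(1, N):
        # p(x_n | z_n) * sum over z_{n-1} of alpha(z_{n-1}) p(z_n | z_{n-1})
        alpha[n] = E[:, obs[n]] * (alpha[n - 1] @ A)
    return alpha
```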
23 Alpha-Beta

And for $\beta(z_n)$ we get the recursion

$$\beta(z_n) = \sum_{z_{n+1}} \beta(z_{n+1})\, p(x_{n+1} \mid z_{n+1})\, p(z_{n+1} \mid z_n)$$

(Figure: lattice fragment in which $\beta(z_{n,1})$ is computed from $\beta(z_{n+1,1})$, $\beta(z_{n+1,2})$, $\beta(z_{n+1,3})$ through the transition weights $A_{11}, A_{12}, A_{13}$ and the emissions $p(x_{n+1} \mid z_{n+1,k})$.)
24 Alpha-Beta

How do we start the $\beta$ recursion? What is $\beta(z_N)$? By definition, $\beta(z_N) = p(x_{N+1}, \dots, x_N \mid z_N)$, which conditions on an empty set of future observations. It can be shown that setting

$$\beta(z_N) = 1$$

is consistent with the approach.

(Figure: the same lattice fragment as on the previous slide.)
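A matching sketch of the backward pass, with $\beta(z_N)$ initialised to ones as just derived:

```python
import numpy as np

def backward(A, E, obs):
    """beta[n, k] = p(x_{n+1}, ..., x_N | z_n = k)."""
    N, K = len(obs), A.shape[0]
    beta = np.ones((N, K))                         # beta(z_N) = 1
    for n in range(N - 2, -1, -1):
        # sum over z_{n+1} of beta(z_{n+1}) p(x_{n+1} | z_{n+1}) p(z_{n+1} | z_n)
        beta[n] = A @ (E[:, obs[n + 1]] * beta[n + 1])
    return beta
```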
25 Alpha-Beta

Now we know how to calculate $\alpha(z_n)$ and $\beta(z_n)$ for each step. What is the probability of the data, $p(X)$? Use the definition of $\gamma(z_n)$ and Bayes' rule

$$\gamma(z_n) = p(z_n \mid X) = \frac{p(X \mid z_n)\, p(z_n)}{p(X)} = \frac{p(X, z_n)}{p(X)}$$

and the following conditional independence statement from the graphical model of the HMM

$$p(X \mid z_n) = \underbrace{p(x_1, \dots, x_n \mid z_n)}_{\alpha(z_n)/p(z_n)}\ \underbrace{p(x_{n+1}, \dots, x_N \mid z_n)}_{\beta(z_n)}$$

and therefore

$$\gamma(z_n) = \frac{\alpha(z_n)\, \beta(z_n)}{p(X)}$$
26 Alpha-Beta

Marginalising over $z_n$ results in

$$1 = \sum_{z_n} \gamma(z_n) = \frac{\sum_{z_n} \alpha(z_n)\, \beta(z_n)}{p(X)}$$

and therefore, at each step $n$,

$$p(X) = \sum_{z_n} \alpha(z_n)\, \beta(z_n)$$

This is most conveniently evaluated at step $N$, where $\beta(z_N) = 1$, as

$$p(X) = \sum_{z_N} \alpha(z_N)$$
27 Alpha-Beta

Finally, we need to calculate the joint posterior distribution of two successive latent variables, $\xi(z_{n-1}, z_n)$, defined by

$$\xi(z_{n-1}, z_n) = p(z_{n-1}, z_n \mid X)$$

This can be computed directly from the $\alpha$ and $\beta$ values in the form

$$\xi(z_{n-1}, z_n) = \frac{\alpha(z_{n-1})\, p(x_n \mid z_n)\, p(z_n \mid z_{n-1})\, \beta(z_n)}{p(X)}$$
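With alpha and beta in hand, gamma, xi, and $p(X)$ follow directly; a sketch under the same assumed conventions:

```python
import numpy as np

def e_step_stats(alpha, beta, A, E, obs):
    """gamma(z_n), xi(z_{n-1}, z_n), and the likelihood p(X) from alpha/beta."""
    pX = alpha[-1].sum()                           # p(X) = sum over z_N of alpha(z_N)
    gamma = alpha * beta / pX                      # (N, K)
    N, K = alpha.shape
    xi = np.zeros((N - 1, K, K))
    for n in range(1, N):
        # xi[n-1, j, k] = alpha_{n-1,j} p(x_n | z_n = k) A_{jk} beta_{n,k} / p(X)
        xi[n - 1] = (alpha[n - 1][:, None] * E[:, obs[n]][None, :]
                     * A * beta[n][None, :]) / pX
    return gamma, xi, pX
```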
28 How to train a HMM

1. Make an initial selection for the parameters $\theta^{old}$, where $\theta = \{\pi, A, \phi\}$. (Often $A$ and $\pi$ are initialised uniformly or randomly. The $\phi_k$ initialisation depends on the emission distribution; for Gaussians, run K-means first and get $\mu_k$ and $\Sigma_k$ from there.)
2. (Start of E-step) Run the forward recursion to calculate $\alpha(z_n)$.
3. Run the backward recursion to calculate $\beta(z_n)$.
4. Calculate $\gamma(z_n)$ and $\xi(z_{n-1}, z_n)$ from $\alpha(z_n)$ and $\beta(z_n)$.
5. Evaluate the likelihood $p(X)$.
6. (Start of M-step) Find a $\theta^{new}$ maximising $Q(\theta, \theta^{old})$. This results in new settings for the parameters $\pi_k$, $A_{jk}$ and $\phi_k$, as described before.
7. Iterate until convergence is detected.
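Putting the seven steps together for discrete emissions, a minimal sketch that reuses the forward, backward, and e_step_stats helpers sketched above (rescaling and a proper convergence test on the likelihood are omitted):

```python
import numpy as np

def baum_welch(obs, K, M, n_iter=50, seed=0):
    """EM training of a discrete-emission HMM with K states and M symbols."""
    rng = np.random.default_rng(seed)
    pi = rng.dirichlet(np.ones(K))                 # step 1: random initialisation
    A = rng.dirichlet(np.ones(K), size=K)
    E = rng.dirichlet(np.ones(M), size=K)
    obs = np.asarray(obs)
    for _ in range(n_iter):                        # step 7: iterate
        alpha = forward(pi, A, E, obs)             # step 2
        beta = backward(A, E, obs)                 # step 3
        gamma, xi, pX = e_step_stats(alpha, beta, A, E, obs)  # steps 4-5
        pi = gamma[0] / gamma[0].sum()             # step 6: M-step updates
        A = xi.sum(axis=0)
        A /= A.sum(axis=1, keepdims=True)
        for m in range(M):                         # discrete-emission analogue of phi_k
            E[:, m] = gamma[obs == m].sum(axis=0)
        E /= E.sum(axis=1, keepdims=True)
    return pi, A, E
```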
29 Alpha-Beta - Notes

In order to calculate the likelihood, we need to use the joint probability $p(X, Z)$ and sum over all possible values of $Z$. That means every particular choice of $Z$ corresponds to one path through the lattice diagram, and there are exponentially many! Using the alpha-beta algorithm, the exponential cost is reduced to a cost linear in the length of the chain. How did we do that? By swapping the order of multiplication and summation.
30 HMM - Viterbi Algorithm

Motivation: the latent states can have some meaningful interpretation, e.g. phonemes in a speech recognition system where the observed variables are the acoustic signals.
Goal: after the system has been trained, find the most probable states of the latent variables for a given sequence of observations.
Warning: finding the set of states which are each individually the most probable does NOT solve this problem.
31 HMM - Viterbi Algorithm

Define

$$\omega(z_n) = \max_{z_1, \dots, z_{n-1}} \ln p(x_1, \dots, x_n, z_1, \dots, z_n)$$

From the joint distribution of the HMM, given by

$$p(x_1, \dots, x_N, z_1, \dots, z_N) = p(z_1) \left[ \prod_{n=2}^N p(z_n \mid z_{n-1}) \right] \prod_{n=1}^N p(x_n \mid z_n)$$

the following recursion can be derived:

$$\omega(z_n) = \ln p(x_n \mid z_n) + \max_{z_{n-1}} \{\ln p(z_n \mid z_{n-1}) + \omega(z_{n-1})\}$$
$$\omega(z_1) = \ln p(z_1) + \ln p(x_1 \mid z_1) = \ln p(x_1, z_1)$$

(Figure: a lattice over steps $n-2, n-1, n, n+1$ with states $k = 1, 2, 3$.)
32 HMM - Viterbi Algorithm

Calculate

$$\omega(z_n) = \max_{z_1, \dots, z_{n-1}} \ln p(x_1, \dots, x_n, z_1, \dots, z_n)$$

for $n = 1, \dots, N$. For each step $n$, remember which transition is the best one into each state at the next step. At step $N$: find the state with the highest probability. For $n = N-1, \dots, 1$: backtrace which transition led to the most probable state and identify from which state it came.

(Figure: the same lattice, with the chosen back-pointers highlighted.)
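A sketch of the whole procedure, recursion plus backtrace, working in log space throughout (same assumed array conventions as before):

```python
import numpy as np

def viterbi(pi, A, E, obs):
    """Most probable state path via the omega recursion and backtracing."""
    N, K = len(obs), len(pi)
    omega = np.zeros((N, K))                       # omega[n, k] = best log-prob ending in k
    back = np.zeros((N, K), dtype=int)             # best predecessor of state k at step n
    omega[0] = np.log(pi) + np.log(E[:, obs[0]])   # omega(z_1) = ln p(x_1, z_1)
    for n in range(1, N):
        scores = omega[n - 1][:, None] + np.log(A) # scores[j, k]: come from j, go to k
        back[n] = scores.argmax(axis=0)
        omega[n] = np.log(E[:, obs[n]]) + scores.max(axis=0)
    path = [int(omega[-1].argmax())]               # most probable final state
    for n in range(N - 1, 0, -1):                  # follow the back-pointers
        path.append(int(back[n, path[-1]]))
    return path[::-1]
```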