Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang

Size: px
Start display at page:

Download "Chapter 4 Dynamic Bayesian Networks Fall Jin Gu, Michael Zhang"

Transcription

1 Chapter 4 Dynamic Bayesian Networks 2016 Fall Jin Gu, Michael Zhang

2 Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check the dependences and independences Define local conditional probability distributions Mixture models Latent factor analysis

3 Mixture Model Cancer is a general disease category consisting of many different types of diseases. For example, breast cancer has four major subtypes: normal-like, basal, luminal A and luminal B, with distinct clinical outcomes. Each subtype has different gene expression patterns. We need to infer the subtypes based on the observed gene expressions. Variables: subtypes (s), gene expressions (e) Relation: subtypes define the patterns of expressions s e N 1) Circles represent variables. Shadow/white circles represent observed/unobserved variables. 2) Rectangle indicates the variables in the same logic layer. The number (N) at the right-bottom corner represents the number of samples. 3

4 Variables: subtypes (s), gene expressions (e) Relation: subtypes define the patterns of expressions Important: carefully check the independences in your model s e N π μ s e Σ N Consider the local CPDs Different subtypes have different occurrence probability (s) K P s k, 1 i k1 Each expression pattern (indicated by i) independently follows different Gaussian distributions given the cancer subtype (e s) e, p s k k i k 4

5 Latent Factor Model Cancer is usually caused by multiple driving processes (factors), such as sustainable proliferation, resistance to cell death, immune escape and promoted vascular growth, etc. These driving processes have combined effects on the gene expressions patterns. We want to infer the driving processes based on large-scale gene expression datasets. Variables: driving factors (z i ), gene expressions (e j ) Relations: z determines the distribution of e Local CPDs Driving process z i follows a standard Gaussian distribution p z 0,1, i 1,, K i Expression e j also follows a Gaussian distribution but its mean is determined by a linear model of z p e z z z i1 K 2 j 1,..., k 0 i i, j 5

6 Variables: independent driving processes (z i ), gene expressions (e j ) Relations: z determines the distribution of e Local CPDs Driving process z i follows a standard Gaussian distribution p z 0,1, i 1,, K e 2 i Expression e j also follows a Gaussian distribution but its mean is determined by a linear model of z e 1 α 1 τ 1 z 1 z K e L N α 2 τ 2 α L τ L p e z z z i1 K 2 j 1,..., k 0j ij i, j 6

7

8 Reviews: BN Representation Basic steps for BN representations Define variables Define the preliminary relations between variables Check the dependences and independences Define local conditional probability distributions Questions A vector or scalar for a variable? how to deal with the variables with unknown independences? (Recall assignment #2)

9 Outlines Stochastic Process and Markov property Dynamic Bayesian Networks Hidden Markov Models (HMMs) State-Observation Models A Few Examples of HMM Representations Calculations for HMMs Generate Simulated Data Extended Models

10 Markov / Memoryless Property Add the time dimension to variables X X (t) Assumptions X 0 X 1 X 2 X 3 Time can be discretized into interesting points t 1,t 2,...t n ( ) n P X P( X ( t) X (1),... X ( t 1)) Markov assumption holds P( X ) t 1 n P( X ( t) X ( t t Distribution is stationary (time invariant or homogeneous) 1 1)) i, j : P( X ( i) X ( i 1)) P( X ( j) X ( j 1))

11 Example: Random Walk Define X (t) as the coordinate of particle at time point t. The particle can randomly move to neighbor variables (nodes) with fixed probability. Then, P(X (t+1) ) is only determined by X (t). lysing-in-vivo-cell-migration-data/

12 Dynamic Bayesian Networks for Time Invariant Process/System

13 Consider a smoke detection tracking application, where we have 3 rooms connected in a row. Each room has a true smoke level (X) and a measured smoke level (Y) by a smoke detector situated in the middle of the room. Which of the below are the proper DBN structure(s) for this problem? X X 1 1 X 1 X 1 Y 1 Y 1 X 2 X 2 X 2 X 2 Y 2 Y 2 X 3 X 3 X 3 X 3 Y 3 Y 3 X 1 X 1 X 1 X 1 Y 1 Y 1 X 2 X 2 X 2 X 2 Y 2 Y 2 X 3 X 3 X 3 X 3 Y 3 Y 3

14 The Parameterization of Dynamic BNs Usually assume time-invariant systems (parameters are constant over time) The parameters in each time slice The parameters between internal state variables The emission or observation parameters between state variables and observation variables The transition parameters between two consecutive time slices

15 Dice problem You stay in a casino for a long time. You find that the probability of dice is equal for each of its six faces (1-6). But you doubt that there must be some trick on the dice. So you record the series of the dice number of each play, and want to find the problem... Hidden Markov Models Probabilistic Graphic Models, Fall

16 Dice problem Hidden Markov Models Probabilistic Graphic Models, Fall

17 Hidden Markov Models S 0 S 1 S 2 S 3 State variable S (dice index 1~K) O 1 O 2 O 3 Observation variable O (dice number 1~6) Local structure #1 Hidden states (cannot be observed) are linked as Markov chain: P(S T S t<t ) = P(S T S T-1 ) Local structure #2 Observable variables are only determined by the hidden state: P(O T S t<=t ) = P(O T S T )

18 State-Observation Models If the hidden variables can be discretized into several states, HMM can be represented as State- Observation Models. S 0 S 1 S 2 O 1 O 2 S 3 O 3 Hidden states (cannot be observed) are linked as Markov chain: P(S T S t<t ) = P(S T S T-1 ) Observable variables are only determined by the hidden state: P(O T S t<=t ) = P(O T S T )

19 State-Observation Models State-State Transmission Matrix T (n,n) S t = TS t-1 State-Observation Emission Matrix E (m,n) O t = ES t

20 Example: Stock Market Index We can define three different states: UP/DOWN/UNCHANGE Hidden Markov Models Probabilistic Graphic Models, Fall

21 Calculations in HMMs Problem 1: P X θ, given the model and the observation sequence, infer the probability of getting that observation sequence from the model Problem 2: argmax Y P(X, Y θ), given the model and the observation sequence, infer the hidden labels of the sequence Problem 3: argmax θ P(X θ), if parameters are un known, learn them from the observation sequence

22 Question #1: P X θ Inference

23 Formal Model Representation HMMs (discretized) have the below parameters and the observation sequences X T ti, j, i, j 1... N, E ei, j, i 1... N, j 1... K,, 1... i i N X x, t 1,..., T t Hidden Markov Models Probabilistic Graphic Models, Fall

24 Compute P(X θ) i P x,...,, 1 x y i t t t Forward/Backward algorithm Initialization: Induction: i e i i x1 1, N t 1 i t j t j, i ei, x j1 Termination: t 1 P X N T i1 i Y 1 X 1 Y 2 X 2 Y 3 X 3

25 i P x,,, 1 x y i Compute P(X θ) t t T t Forward/Backward algorithm Initialization: Induction: Termination: T N i 1 i t e j t i, j j, xt 1 t1 j1 Y 1 Y 2 X 1 X 2 P X i e j N 0 j j, x1 1 j1 Y 3 X 3

26 Full Chain Local Transition (forward) Local Transition (Backward) Figures from Rabiner LR. Proceedings of the IEEE 1989, 77(2):

27 Question #2: argmax Y P(X, Y θ) Inference

28 Full Chain argmax Y P(X, Y θ) Figures from Rabiner LR. Proceedings of the IEEE 1989, 77(2): Infer the most possible states

29 Compute argmax Y P(X, Y θ) HMM inference (Viterbi algorithm) Known transition matrix and emission matrix Infer the maximum probability STATE series The probability at time 0 with observation x 0 For Y i e 1 : 1, i i i, x The probability at time 1 with observation x 1 e max e t e max t 2, i i, x y y, x y, i i, x 1, y y, i y 1,, N y 1,, N i arg max y ty i 2 1, 1 1, y 1,, N 1 Hidden Markov Models Probabilistic Graphic Models, Fall

30 Compute argmax Y P(X, Y θ) HMM inference (Viterbi algorithm) The probability at time t with observation x t t, i i, x t1, y y, i y 1,, N i arg max 1, t, For the last observation e t t1 t1 t1 t t yt1 yt1 i y 1,, N t1 max t e max t T, i i, x 1,, 1,, T y N y y i T T 1 T 1 T 1 i arg max 1, t, T T yt 1 yt 1 i y 1,, N T 1 y * T arg max y T 1,, N T, y T Hidden Markov Models Probabilistic Graphic Models, Fall

31 Compute argmax Y P(X, Y θ) HMM inference (Viterbi algorithm) After y t is inferred, trace back to get other y i y * T arg max y T 1,, N We can infer other y i * * y i y arg max t T T Hidden Markov Models Probabilistic Graphic Models, Fall T, y i arg max 1, t, t t yt 1 yt 1 i y 1,, N t1 T 1 T T T 1, y 1 y 1, i y 1,, N T 1 arg max y i y t * * t1 t t y 1,, N t 1, yt 1 yt 1, i t1 T

32 Question #3: argmax θ P(X θ) Learning

33 Compute argmax θ P(θ X) Baum Welch algorithm Problem 3: if all parameters are unknown, we need to learn the model from observations Unknown variables The probability of y=i for time point t and y=j for the following time point t+1 i, j P y i, y j X, t t t1 Hidden Markov Models Probabilistic Graphic Models, Fall

34 Baum Welch algorithm T Use forward and backward algorithm to calculate probability i, j P y i, y j X, t t t1 i j t1, t Y Y i1 j1 i t e t i, j j, x t1 i t, E, t i, j t1 j, x j j e t1 0 i P x 1,, x, y i t t t 0 i P x 1,, x y i, t t T t Hidden Markov Models Probabilistic Graphic Models, Fall

35 Compute argmax θ P(X θ) Baum Welch algorithm The probability of Y=i for time point t: t i Y t j1 i, j So expected times for state stayed in Y=i and the expected times for state transition i-j: T 1 t i, t i j t0 T 1 t0 Hidden Markov Models Probabilistic Graphic Models, Fall

36 Compute argmax θ P(X θ) Baum Welch algorithm Re-estimate all parameters (maximize) t T 1 t t0 i, j T 1 t0 t i, j i t0 Repeat above steps until convergence e ix, T x x i T t t0 0 P X P X log log t i t Hidden Markov Models Probabilistic Graphic Models, Fall

37 Gibbs Sampling for Learning You can read textbooks and supplementary materials. Understand the prior setting of unknown parameters. Tobias Ryden. EM verus Markov chain Monte Carlo for Estimation of Hidden Markov Models: A Computational Perspective. Bayesian Analysis 2008, 3(4): Zoubin Grahramani, Michael I Jordan. Factorial Hidden Markov Models. Machine Learning 1997, 29(2-3):

38 MCMC & Gibbs Sampling In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of random samples from multivariate probability distribution (i.e. from the joint probability distribution of two or more random variables), when direct sampling is difficult. Its theory and detailed methods will be introduced in the following sections.

39 Generate the hidden variables using current parameter Y t ~P Y X, θ t Step 1: Inference (Gibbs Sampling) Initialization: generate the labeling sequencing according to the prior probability Y y, 1 t T t Re-generate Y t according to the initial setting P y Y, X, P y y, y, x, t t t1 t1 t P y, x y, y, P y y, t1 t t t1 t t1 t t1 t t t1 t t1 t t1 t1 t t t t t1 y, y y, e y, y P y, x y, P y y, P x y, P y y, t e t i We get a score proportional to the probability. So we can randomly generate Y t based on these scores.

40 Step 2: Parameter Re-Estimation Simply update the parameter in transition and emission probability matrix t i,j = e k,j = T t=1 T t=1 I Y t =j,y t 1 =i T I Y t =j,x t =k T Note: I is an indicator function If the chain is not long enough, run multiple runs of step 1 and then do step 2

41 Generate simulated data using HMMs Find the initial probability for hidden states Markov chain Monte Carlo (MCMC) They are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample of the desired distribution. Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated MCMC-based algorithms such as coupling from the past can produce exact samples. A good chain will have rapid mixing (the stationary distribution is reached quickly starting from an arbitrary position) described further under Markov chain mixing time. Hidden Markov Models Probabilistic Graphic Models, Fall

42 Extended Models Factorial HMMs Hierarchical HMMs (GeneScan) Bayesian HMMs Two HMMs comparisons Max Entropy Markov Models Conditional Random Fields Hidden Markov Models Probabilistic Graphic Models, Fall

43 Factorial HMMs Hierarchical HMMs What ever complex structure if you can solve it. Tree Structured HMMs

44 Max Entropy Markov Models In machine learning, a maximum-entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. An MEMM is a discriminative model that extends a standard maximum entropy classifier by assuming that the unknown values to be learned are connected in a Markov chain rather than being conditionally independent of each other. Y 0 Y 1 Y 2 Y 3 Y 0 Y 1 Y 2 Y 3 X 1 X 2 X 3 X 1 X 2 X 3 HMM MEMM Hidden Markov Models Probabilistic Graphic Models, Fall

45 Max Entropy Markov Models Hidden Markov Models Probabilistic Graphic Models, Fall

46 Supplementary Materials [1] Lawrence R Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE 1989, 77(2): [2] Zoubin Ghahramani. An introduction to Hidden Markov Models and Bayesian Networks. International Journal of Pattern Recognition and Artificial Intelligence 2001, 15(1):9-42.

47 The End of Chapter 4 HMM is a generative model HMM is powerful for modeling hidden mechanisms Hidden Markov Models Probabilistic Graphic Models, Fall

48 Practice for HMMs Two different parts: Part I is recommended to be finished in three weeks); and, Part II can be finished later The full report should be submitted before December 5

49 Literature Survey Neural networks & deep learning Deep belief networks Convolutional neural networks Recurrent neural networks Residual neural networks Bayesian induction for concept learning Latent Dirichlet allocation & Dirichlet process Conditional random fields

50 Course Project 1 st choice: make your own proposal before Oct 17 (recommended!) Please send s to Dongfang Wang Requirements Clearly describe your problems & aims The availability of data The source of your project (by your supervisor or an international competition or your own interest?) 50

51 Course Project 2 nd choice: object recognition of sundial ( 日晷 ). Do image segmentation and target object recognition. (Major model: MRFs) 3 rd choice: molecular subtyping of liver cancer. Identify consensus gene expression patterns from multiple datasets. (Major model: hierarchical models / LDA) You will be organized into groups, but every member needs to write his own project report

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them

Sequence labeling. Taking collective a set of interrelated instances x 1,, x T and jointly labeling them HMM, MEMM and CRF 40-957 Special opics in Artificial Intelligence: Probabilistic Graphical Models Sharif University of echnology Soleymani Spring 2014 Sequence labeling aking collective a set of interrelated

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project

More information

Hidden Markov Models Part 2: Algorithms

Hidden Markov Models Part 2: Algorithms Hidden Markov Models Part 2: Algorithms CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Hidden Markov Model An HMM consists of:

More information

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named

We Live in Exciting Times. CSCI-567: Machine Learning (Spring 2019) Outline. Outline. ACM (an international computing research society) has named We Live in Exciting Times ACM (an international computing research society) has named CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Apr. 2, 2019 Yoshua Bengio,

More information

STA 414/2104: Machine Learning

STA 414/2104: Machine Learning STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far

More information

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010

Hidden Markov Models. Aarti Singh Slides courtesy: Eric Xing. Machine Learning / Nov 8, 2010 Hidden Markov Models Aarti Singh Slides courtesy: Eric Xing Machine Learning 10-701/15-781 Nov 8, 2010 i.i.d to sequential data So far we assumed independent, identically distributed data Sequential data

More information

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing

Hidden Markov Models. By Parisa Abedi. Slides courtesy: Eric Xing Hidden Markov Models By Parisa Abedi Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed data Sequential (non i.i.d.) data Time-series data E.g. Speech

More information

Brief Introduction of Machine Learning Techniques for Content Analysis

Brief Introduction of Machine Learning Techniques for Content Analysis 1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Hidden Markov Models Barnabás Póczos & Aarti Singh Slides courtesy: Eric Xing i.i.d to sequential data So far we assumed independent, identically distributed

More information

Linear Dynamical Systems (Kalman filter)

Linear Dynamical Systems (Kalman filter) Linear Dynamical Systems (Kalman filter) (a) Overview of HMMs (b) From HMMs to Linear Dynamical Systems (LDS) 1 Markov Chains with Discrete Random Variables x 1 x 2 x 3 x T Let s assume we have discrete

More information

Dynamic Approaches: The Hidden Markov Model

Dynamic Approaches: The Hidden Markov Model Dynamic Approaches: The Hidden Markov Model Davide Bacciu Dipartimento di Informatica Università di Pisa bacciu@di.unipi.it Machine Learning: Neural Networks and Advanced Models (AA2) Inference as Message

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Lecturer: Stefanos Zafeiriou Goal (Lectures): To present discrete and continuous valued probabilistic linear dynamical systems (HMMs

More information

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma

COMS 4771 Probabilistic Reasoning via Graphical Models. Nakul Verma COMS 4771 Probabilistic Reasoning via Graphical Models Nakul Verma Last time Dimensionality Reduction Linear vs non-linear Dimensionality Reduction Principal Component Analysis (PCA) Non-linear methods

More information

Machine Learning Summer School

Machine Learning Summer School Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,

More information

Advanced Data Science

Advanced Data Science Advanced Data Science Dr. Kira Radinsky Slides Adapted from Tom M. Mitchell Agenda Topics Covered: Time series data Markov Models Hidden Markov Models Dynamic Bayes Nets Additional Reading: Bishop: Chapter

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013

More on HMMs and other sequence models. Intro to NLP - ETHZ - 18/03/2013 More on HMMs and other sequence models Intro to NLP - ETHZ - 18/03/2013 Summary Parts of speech tagging HMMs: Unsupervised parameter estimation Forward Backward algorithm Bayesian variants Discriminative

More information

Introduction to Artificial Intelligence (AI)

Introduction to Artificial Intelligence (AI) Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 10 Oct, 13, 2011 CPSC 502, Lecture 10 Slide 1 Today Oct 13 Inference in HMMs More on Robot Localization CPSC 502, Lecture

More information

Hidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208

Hidden Markov Model. Ying Wu. Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 Hidden Markov Model Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/19 Outline Example: Hidden Coin Tossing Hidden

More information

Lecture 6: Graphical Models: Learning

Lecture 6: Graphical Models: Learning Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)

More information

Note Set 5: Hidden Markov Models

Note Set 5: Hidden Markov Models Note Set 5: Hidden Markov Models Probabilistic Learning: Theory and Algorithms, CS 274A, Winter 2016 1 Hidden Markov Models (HMMs) 1.1 Introduction Consider observed data vectors x t that are d-dimensional

More information

Bayesian Hidden Markov Models and Extensions

Bayesian Hidden Markov Models and Extensions Bayesian Hidden Markov Models and Extensions Zoubin Ghahramani Department of Engineering University of Cambridge joint work with Matt Beal, Jurgen van Gael, Yunus Saatci, Tom Stepleton, Yee Whye Teh Modeling

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 23, 2015 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

Data Analyzing and Daily Activity Learning with Hidden Markov Model

Data Analyzing and Daily Activity Learning with Hidden Markov Model Data Analyzing and Daily Activity Learning with Hidden Markov Model GuoQing Yin and Dietmar Bruckner Institute of Computer Technology Vienna University of Technology, Austria, Europe {yin, bruckner}@ict.tuwien.ac.at

More information

Master 2 Informatique Probabilistic Learning and Data Analysis

Master 2 Informatique Probabilistic Learning and Data Analysis Master 2 Informatique Probabilistic Learning and Data Analysis Faicel Chamroukhi Maître de Conférences USTV, LSIS UMR CNRS 7296 email: chamroukhi@univ-tln.fr web: chamroukhi.univ-tln.fr 2013/2014 Faicel

More information

A Higher-Order Interactive Hidden Markov Model and Its Applications Wai-Ki Ching Department of Mathematics The University of Hong Kong

A Higher-Order Interactive Hidden Markov Model and Its Applications Wai-Ki Ching Department of Mathematics The University of Hong Kong A Higher-Order Interactive Hidden Markov Model and Its Applications Wai-Ki Ching Department of Mathematics The University of Hong Kong Abstract: In this talk, a higher-order Interactive Hidden Markov Model

More information

Statistical Processing of Natural Language

Statistical Processing of Natural Language Statistical Processing of Natural Language and DMKM - Universitat Politècnica de Catalunya and 1 2 and 3 1. Observation Probability 2. Best State Sequence 3. Parameter Estimation 4 Graphical and Generative

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Hidden Markov Models Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Additional References: David

More information

STATS 306B: Unsupervised Learning Spring Lecture 5 April 14

STATS 306B: Unsupervised Learning Spring Lecture 5 April 14 STATS 306B: Unsupervised Learning Spring 2014 Lecture 5 April 14 Lecturer: Lester Mackey Scribe: Brian Do and Robin Jia 5.1 Discrete Hidden Markov Models 5.1.1 Recap In the last lecture, we introduced

More information

Intelligent Systems (AI-2)

Intelligent Systems (AI-2) Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 19 Oct, 24, 2016 Slide Sources Raymond J. Mooney University of Texas at Austin D. Koller, Stanford CS - Probabilistic Graphical Models D. Page,

More information

13: Variational inference II

13: Variational inference II 10-708: Probabilistic Graphical Models, Spring 2015 13: Variational inference II Lecturer: Eric P. Xing Scribes: Ronghuo Zheng, Zhiting Hu, Yuntian Deng 1 Introduction We started to talk about variational

More information

LEARNING DYNAMIC SYSTEMS: MARKOV MODELS

LEARNING DYNAMIC SYSTEMS: MARKOV MODELS LEARNING DYNAMIC SYSTEMS: MARKOV MODELS Markov Process and Markov Chains Hidden Markov Models Kalman Filters Types of dynamic systems Problem of future state prediction Predictability Observability Easily

More information

Bayesian Networks BY: MOHAMAD ALSABBAGH

Bayesian Networks BY: MOHAMAD ALSABBAGH Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional

More information

Probabilistic Machine Learning

Probabilistic Machine Learning Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent

More information

Graphical Models and Kernel Methods

Graphical Models and Kernel Methods Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.

More information

Conditional Random Field

Conditional Random Field Introduction Linear-Chain General Specific Implementations Conclusions Corso di Elaborazione del Linguaggio Naturale Pisa, May, 2011 Introduction Linear-Chain General Specific Implementations Conclusions

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

Math 350: An exploration of HMMs through doodles.

Math 350: An exploration of HMMs through doodles. Math 350: An exploration of HMMs through doodles. Joshua Little (407673) 19 December 2012 1 Background 1.1 Hidden Markov models. Markov chains (MCs) work well for modelling discrete-time processes, or

More information

Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov

Probabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly

More information

Hidden Markov Models

Hidden Markov Models Hidden Markov Models Lecture Notes Speech Communication 2, SS 2004 Erhard Rank/Franz Pernkopf Signal Processing and Speech Communication Laboratory Graz University of Technology Inffeldgasse 16c, A-8010

More information

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models

A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models Jeff A. Bilmes (bilmes@cs.berkeley.edu) International Computer Science Institute

More information

Hidden Markov models

Hidden Markov models Hidden Markov models Charles Elkan November 26, 2012 Important: These lecture notes are based on notes written by Lawrence Saul. Also, these typeset notes lack illustrations. See the classroom lectures

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Sequence Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(21) Introduction Structured

More information

Statistical NLP: Hidden Markov Models. Updated 12/15

Statistical NLP: Hidden Markov Models. Updated 12/15 Statistical NLP: Hidden Markov Models Updated 12/15 Markov Models Markov models are statistical tools that are useful for NLP because they can be used for part-of-speech-tagging applications Their first

More information

order is number of previous outputs

order is number of previous outputs Markov Models Lecture : Markov and Hidden Markov Models PSfrag Use past replacements as state. Next output depends on previous output(s): y t = f[y t, y t,...] order is number of previous outputs y t y

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative

More information

CSCE 471/871 Lecture 3: Markov Chains and

CSCE 471/871 Lecture 3: Markov Chains and and and 1 / 26 sscott@cse.unl.edu 2 / 26 Outline and chains models (s) Formal definition Finding most probable state path (Viterbi algorithm) Forward and backward algorithms State sequence known State

More information

15-381: Artificial Intelligence. Hidden Markov Models (HMMs)

15-381: Artificial Intelligence. Hidden Markov Models (HMMs) 15-381: Artificial Intelligence Hidden Markov Models (HMMs) What s wrong with Bayesian networks Bayesian networks are very useful for modeling joint distributions But they have their limitations: - Cannot

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg

Human-Oriented Robotics. Temporal Reasoning. Kai Arras Social Robotics Lab, University of Freiburg Temporal Reasoning Kai Arras, University of Freiburg 1 Temporal Reasoning Contents Introduction Temporal Reasoning Hidden Markov Models Linear Dynamical Systems (LDS) Kalman Filter 2 Temporal Reasoning

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Variational Inference II: Mean Field Method and Variational Principle Junming Yin Lecture 15, March 7, 2012 X 1 X 1 X 1 X 1 X 2 X 3 X 2 X 2 X 3

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Parametric Models Part III: Hidden Markov Models

Parametric Models Part III: Hidden Markov Models Parametric Models Part III: Hidden Markov Models Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2014 CS 551, Spring 2014 c 2014, Selim Aksoy (Bilkent

More information

Learning from Sequential and Time-Series Data

Learning from Sequential and Time-Series Data Learning from Sequential and Time-Series Data Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/? Sequential and Time-Series Data Many real-world applications

More information

Hidden Markov Models

Hidden Markov Models 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Hidden Markov Models Matt Gormley Lecture 19 Nov. 5, 2018 1 Reminders Homework

More information

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging

10 : HMM and CRF. 1 Case Study: Supervised Part-of-Speech Tagging 10-708: Probabilistic Graphical Models 10-708, Spring 2018 10 : HMM and CRF Lecturer: Kayhan Batmanghelich Scribes: Ben Lengerich, Michael Kleyman 1 Case Study: Supervised Part-of-Speech Tagging We will

More information

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391

Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Hidden Markov Models The three basic HMM problems (note: change in notation) Mitch Marcus CSE 391 Parameters of an HMM States: A set of states S=s 1, s n Transition probabilities: A= a 1,1, a 1,2,, a n,n

More information

Robert Collins CSE586 CSE 586, Spring 2015 Computer Vision II

Robert Collins CSE586 CSE 586, Spring 2015 Computer Vision II CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter Recall: Modeling Time Series State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

L23: hidden Markov models

L23: hidden Markov models L23: hidden Markov models Discrete Markov processes Hidden Markov models Forward and Backward procedures The Viterbi algorithm This lecture is based on [Rabiner and Juang, 1993] Introduction to Speech

More information

Bayesian Networks Inference with Probabilistic Graphical Models

Bayesian Networks Inference with Probabilistic Graphical Models 4190.408 2016-Spring Bayesian Networks Inference with Probabilistic Graphical Models Byoung-Tak Zhang intelligence Lab Seoul National University 4190.408 Artificial (2016-Spring) 1 Machine Learning? Learning

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

MACHINE LEARNING 2 UGM,HMMS Lecture 7

MACHINE LEARNING 2 UGM,HMMS Lecture 7 LOREM I P S U M Royal Institute of Technology MACHINE LEARNING 2 UGM,HMMS Lecture 7 THIS LECTURE DGM semantics UGM De-noising HMMs Applications (interesting probabilities) DP for generation probability

More information

Hidden Markov Models Hamid R. Rabiee

Hidden Markov Models Hamid R. Rabiee Hidden Markov Models Hamid R. Rabiee 1 Hidden Markov Models (HMMs) In the previous slides, we have seen that in many cases the underlying behavior of nature could be modeled as a Markov process. However

More information

Lecture 11: Hidden Markov Models

Lecture 11: Hidden Markov Models Lecture 11: Hidden Markov Models Cognitive Systems - Machine Learning Cognitive Systems, Applied Computer Science, Bamberg University slides by Dr. Philip Jackson Centre for Vision, Speech & Signal Processing

More information

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?

Machine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels? Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

Recall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series

Recall: Modeling Time Series. CSE 586, Spring 2015 Computer Vision II. Hidden Markov Model and Kalman Filter. Modeling Time Series Recall: Modeling Time Series CSE 586, Spring 2015 Computer Vision II Hidden Markov Model and Kalman Filter State-Space Model: You have a Markov chain of latent (unobserved) states Each state generates

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Probabilistic Models for Sequence Labeling

Probabilistic Models for Sequence Labeling Probabilistic Models for Sequence Labeling Besnik Fetahu June 9, 2011 Besnik Fetahu () Probabilistic Models for Sequence Labeling June 9, 2011 1 / 26 Background & Motivation Problem introduction Generative

More information

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016 Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2016 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several

More information

MCMC and Gibbs Sampling. Kayhan Batmanghelich

MCMC and Gibbs Sampling. Kayhan Batmanghelich MCMC and Gibbs Sampling Kayhan Batmanghelich 1 Approaches to inference l Exact inference algorithms l l l The elimination algorithm Message-passing algorithm (sum-product, belief propagation) The junction

More information

Lecture 13 : Variational Inference: Mean Field Approximation

Lecture 13 : Variational Inference: Mean Field Approximation 10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1

More information

Inference in Explicit Duration Hidden Markov Models

Inference in Explicit Duration Hidden Markov Models Inference in Explicit Duration Hidden Markov Models Frank Wood Joint work with Chris Wiggins, Mike Dewar Columbia University November, 2011 Wood (Columbia University) EDHMM Inference November, 2011 1 /

More information

COMP90051 Statistical Machine Learning

COMP90051 Statistical Machine Learning COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 24. Hidden Markov Models & message passing Looking back Representation of joint distributions Conditional/marginal independence

More information

10. Hidden Markov Models (HMM) for Speech Processing. (some slides taken from Glass and Zue course)

10. Hidden Markov Models (HMM) for Speech Processing. (some slides taken from Glass and Zue course) 10. Hidden Markov Models (HMM) for Speech Processing (some slides taken from Glass and Zue course) Definition of an HMM The HMM are powerful statistical methods to characterize the observed samples of

More information

CS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas

CS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1

More information

Hidden Markov Models and Gaussian Mixture Models

Hidden Markov Models and Gaussian Mixture Models Hidden Markov Models and Gaussian Mixture Models Hiroshi Shimodaira and Steve Renals Automatic Speech Recognition ASR Lectures 4&5 23&27 January 2014 ASR Lectures 4&5 Hidden Markov Models and Gaussian

More information

Multiscale Systems Engineering Research Group

Multiscale Systems Engineering Research Group Hidden Markov Model Prof. Yan Wang Woodruff School of Mechanical Engineering Georgia Institute of echnology Atlanta, GA 30332, U.S.A. yan.wang@me.gatech.edu Learning Objectives o familiarize the hidden

More information

Sequence modelling. Marco Saerens (UCL) Slides references

Sequence modelling. Marco Saerens (UCL) Slides references Sequence modelling Marco Saerens (UCL) Slides references Many slides and figures have been adapted from the slides associated to the following books: Alpaydin (2004), Introduction to machine learning.

More information

Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013

Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Conditional Random Fields and beyond DANIEL KHASHABI CS 546 UIUC, 2013 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative Chain CRF General

More information

CS 7180: Behavioral Modeling and Decision- making in AI

CS 7180: Behavioral Modeling and Decision- making in AI CS 7180: Behavioral Modeling and Decision- making in AI Learning Probabilistic Graphical Models Prof. Amy Sliva October 31, 2012 Hidden Markov model Stochastic system represented by three matrices N =

More information

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II)

CISC 889 Bioinformatics (Spring 2004) Hidden Markov Models (II) CISC 889 Bioinformatics (Spring 24) Hidden Markov Models (II) a. Likelihood: forward algorithm b. Decoding: Viterbi algorithm c. Model building: Baum-Welch algorithm Viterbi training Hidden Markov models

More information

CS711008Z Algorithm Design and Analysis

CS711008Z Algorithm Design and Analysis .. Lecture 6. Hidden Markov model and Viterbi s decoding algorithm Institute of Computing Technology Chinese Academy of Sciences, Beijing, China . Outline The occasionally dishonest casino: an example

More information

Unsupervised Learning

Unsupervised Learning Unsupervised Learning Bayesian Model Comparison Zoubin Ghahramani zoubin@gatsby.ucl.ac.uk Gatsby Computational Neuroscience Unit, and MSc in Intelligent Systems, Dept Computer Science University College

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

Expectation Maximization (EM)

Expectation Maximization (EM) Expectation Maximization (EM) The EM algorithm is used to train models involving latent variables using training data in which the latent variables are not observed (unlabeled data). This is to be contrasted

More information

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain

More information

Directed and Undirected Graphical Models

Directed and Undirected Graphical Models Directed and Undirected Graphical Models Adrian Weller MLSALT4 Lecture Feb 26, 2016 With thanks to David Sontag (NYU) and Tony Jebara (Columbia) for use of many slides and illustrations For more information,

More information

Tutorial on Probabilistic Programming with PyMC3

Tutorial on Probabilistic Programming with PyMC3 185.A83 Machine Learning for Health Informatics 2017S, VU, 2.0 h, 3.0 ECTS Tutorial 02-04.04.2017 Tutorial on Probabilistic Programming with PyMC3 florian.endel@tuwien.ac.at http://hci-kdd.org/machine-learning-for-health-informatics-course

More information

Chapter 05: Hidden Markov Models

Chapter 05: Hidden Markov Models LEARNING AND INFERENCE IN GRAPHICAL MODELS Chapter 05: Hidden Markov Models Dr. Martin Lauer University of Freiburg Machine Learning Lab Karlsruhe Institute of Technology Institute of Measurement and Control

More information

Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015

Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch. COMP-599 Oct 1, 2015 Part of Speech Tagging: Viterbi, Forward, Backward, Forward- Backward, Baum-Welch COMP-599 Oct 1, 2015 Announcements Research skills workshop today 3pm-4:30pm Schulich Library room 313 Start thinking about

More information

Time-Sensitive Dirichlet Process Mixture Models

Time-Sensitive Dirichlet Process Mixture Models Time-Sensitive Dirichlet Process Mixture Models Xiaojin Zhu Zoubin Ghahramani John Lafferty May 25 CMU-CALD-5-4 School of Computer Science Carnegie Mellon University Pittsburgh, PA 523 Abstract We introduce

More information

The Ising model and Markov chain Monte Carlo

The Ising model and Markov chain Monte Carlo The Ising model and Markov chain Monte Carlo Ramesh Sridharan These notes give a short description of the Ising model for images and an introduction to Metropolis-Hastings and Gibbs Markov Chain Monte

More information

Hybrid HMM/MLP models for time series prediction

Hybrid HMM/MLP models for time series prediction Bruges (Belgium), 2-23 April 999, D-Facto public., ISBN 2-649-9-X, pp. 455-462 Hybrid HMM/MLP models for time series prediction Joseph Rynkiewicz SAMOS, Université Paris I - Panthéon Sorbonne Paris, France

More information

Household Energy Disaggregation based on Difference Hidden Markov Model

Household Energy Disaggregation based on Difference Hidden Markov Model 1 Household Energy Disaggregation based on Difference Hidden Markov Model Ali Hemmatifar (ahemmati@stanford.edu) Manohar Mogadali (manoharm@stanford.edu) Abstract We aim to address the problem of energy

More information

Lecture 13: Structured Prediction

Lecture 13: Structured Prediction Lecture 13: Structured Prediction Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Couse webpage: http://kwchang.net/teaching/nlp16 CS6501: NLP 1 Quiz 2 v Lectures 9-13 v Lecture 12: before page

More information