
Royal Institute of Technology. MACHINE LEARNING 2: UGM, HMMs. Lecture 7.

THIS LECTURE: DGM semantics; UGMs; de-noising; HMMs; applications (interesting probabilities); DP for the generation probability etc. (later: Baum-Welch).

EXTENDED STUDENT EXAMPLE (figure; legend: B = better, H = higher, L = less).

EXTENDED STUDENT EXAMPLE

INDEPENDENCE, I-MAP. I(G): the (conditional) independences implied by G (not yet defined). I(P): the (conditional) independences that hold in the distribution P. G is an I-map for P if I(G) ⊆ I(P). (Figure: two distributions p and q, and graphs over X and Y.)

INDEPENDENCE, I-MAP. I(G): independences implied by G (not yet defined). I(P): independences in the distribution P. G is an I-map for P if I(G) ⊆ I(P). Example distributions over X and Y: in p, X and Y are independent, e.g. p(x=1) = 0.48 + 0.12 = 0.6, p(y=1) = 0.8, and p(x=1, y=1) = 0.48 = 0.6 · 0.8; in q, X and Y are dependent.
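A quick numeric check of this example (the remaining entries of p are filled in under the independence claim, and q is a hypothetical dependent table):

```python
# p has X and Y independent; q does not.
# The entries p(x=1,y=0), p(x=0,y=1), p(x=0,y=0) are assumed, chosen consistent
# with p(x=1)=0.6, p(y=1)=0.8 and independence.
p = {(1, 1): 0.48, (1, 0): 0.12, (0, 1): 0.32, (0, 0): 0.08}
q = {(1, 1): 0.60, (1, 0): 0.00, (0, 1): 0.20, (0, 0): 0.20}  # hypothetical dependent table

def independent(joint, tol=1e-12):
    px1 = joint[(1, 1)] + joint[(1, 0)]
    py1 = joint[(1, 1)] + joint[(0, 1)]
    # X and Y are independent iff every joint entry equals the product of its marginals.
    return all(abs(joint[(x, y)] - (px1 if x else 1 - px1) * (py1 if y else 1 - py1)) < tol
               for (x, y) in joint)

print(independent(p))  # True:  0.48 == 0.6 * 0.8, etc.
print(independent(q))  # False: q(x=1,y=1) = 0.6 != 0.6 * 0.8
```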

INDEPENDENCE, I-MAP. For the same p and q, consider three graphs G1, G2, G3 over X and Y (figure). All three graphs are I-maps for p; G1 and G2 are I-maps for q, but G3 is not.

D-SEPARATION. A path is d-separated by O if it contains: a chain $X \rightarrow Y \rightarrow Z$ where $Y \in O$; a fork $X \leftarrow Y \rightarrow Z$ where $Y \in O$; or a v-structure $X \rightarrow Y \leftarrow Z$ where $(\{Y\} \cup \mathrm{desc}(Y)) \cap O = \emptyset$.

D-SEPARATION SETS AND CI OF DAGS. A is d-separated from B given O if every undirected path between A and B is d-separated by O. Conditional independence relations in the DAG G: $x_A \perp_G x_B \mid x_O$ iff A is d-separated from B given O.
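To make the v-structure rule concrete, a small numeric sketch (all CPDs below are my own choices, not from the slides): in $X \rightarrow Y \leftarrow Z$, X and Z are independent when O = ∅, but become dependent once Y is observed.

```python
import itertools

# Hypothetical CPDs for the v-structure X -> Y <- Z with binary variables.
pX = {0: 0.5, 1: 0.5}
pZ = {0: 0.7, 1: 0.3}
pY_given_XZ = {(x, z): {1: min(0.9, 0.1 + 0.4 * x + 0.4 * z)}
               for x, z in itertools.product([0, 1], repeat=2)}
for k in pY_given_XZ:
    pY_given_XZ[k][0] = 1 - pY_given_XZ[k][1]

joint = {(x, y, z): pX[x] * pZ[z] * pY_given_XZ[(x, z)][y]
         for x, y, z in itertools.product([0, 1], repeat=3)}

# Marginally, X and Z are independent: the path is blocked by the unobserved v-structure.
pXZ = lambda x, z: sum(joint[(x, y, z)] for y in [0, 1])
print(abs(pXZ(1, 1) - pX[1] * pZ[1]) < 1e-12)                # True

# Conditioning on Y = 1 activates the path: X and Z become dependent ("explaining away").
pY1 = sum(v for (x, y, z), v in joint.items() if y == 1)
pXZ_given_Y1 = lambda x, z: joint[(x, 1, z)] / pY1
pX_given_Y1 = lambda x: sum(pXZ_given_Y1(x, z) for z in [0, 1])
pZ_given_Y1 = lambda z: sum(pXZ_given_Y1(x, z) for x in [0, 1])
print(abs(pXZ_given_Y1(1, 1) - pX_given_Y1(1) * pZ_given_Y1(1)) > 1e-6)  # True: dependent
```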

FACTORIZATION OVER G. $p(x_1, \ldots, x_N) = \prod_{n=1}^{N} p(x_n \mid x_{\mathrm{pa}(n)})$. p can be factorized over G if it can be expressed as above.
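A minimal sketch of this factorization in code; the chain A → B → C and its CPTs are hypothetical, chosen only to illustrate the product over $p(x_n \mid x_{\mathrm{pa}(n)})$:

```python
# Joint probability from a DAG factorization: p(x_1..x_N) = prod_n p(x_n | x_pa(n)).
# Example graph (hypothetical): A -> B -> C, with binary variables.
parents = {"A": [], "B": ["A"], "C": ["B"]}
# cpt[node][(value, parent_values)] = p(node = value | parents = parent_values)
cpt = {
    "A": {(1, ()): 0.3, (0, ()): 0.7},
    "B": {(1, (1,)): 0.8, (0, (1,)): 0.2, (1, (0,)): 0.1, (0, (0,)): 0.9},
    "C": {(1, (1,)): 0.5, (0, (1,)): 0.5, (1, (0,)): 0.2, (0, (0,)): 0.8},
}

def joint(assignment):
    """p(assignment) as the product over nodes of p(x_n | x_pa(n))."""
    prob = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[p] for p in pa)
        prob *= cpt[node][(assignment[node], pa_vals)]
    return prob

print(joint({"A": 1, "B": 1, "C": 0}))  # 0.3 * 0.8 * 0.5 = 0.12
```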

SOUNDNESS AND COMPLETENESS. I(G): the conditional independence relations implied by d-separation in G. I(p): the conditional independence relations satisfied by p. Theorem: a distribution p can be factorized over G iff I(G) ⊆ I(p). Equality I(G) = I(p) is not always possible to achieve, e.g. a complete graph (clique) versus a fully independent distribution.

UGM. UGMs: undirected graphical models. What is the direction between two pixels, or two proteins? Probabilistic interpretation? p factorizes over G: p can be expressed as a normalized product of factors associated with the cliques of G. (Figure: the extended student network with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Happy, Job.)

EXAMPLE CLIQUE

EXAMPLE MAXIMAL CLIQUE

EXAMPLE MAXIMUM CLIQUE

UGM. An undirected graph G with so-called factors associated with its maximal cliques, where a factor is a function from the clique's variables (its scope) to the non-negative real numbers.

PROBABILISTIC INTERPRETATION (misconception example). Factors $\phi_1, \phi_2, \phi_3, \phi_4$ with scopes $\{A,B\}, \{B,C\}, \{C,D\}, \{D,A\}$. Probability $P(A,B,C,D) = \frac{1}{Z}\,\phi_1(A,B)\,\phi_2(B,C)\,\phi_3(C,D)\,\phi_4(D,A)$, where $Z = \sum_{a,b,c,d} \phi_1(a,b)\,\phi_2(b,c)\,\phi_3(c,d)\,\phi_4(d,a)$.

A FACTOR PRODUCT (misconception example). E.g. $\phi_1(A{=}1,B{=}1)\,\phi_2(B{=}1,C{=}0)\,\phi_3(C{=}0,D{=}1)\,\phi_4(D{=}1,A{=}1) = 10 \cdot 1 \cdot 100 \cdot 100 = 100\,000$, with $Z = \sum_{a,b,c,d} \phi_1(a,b)\,\phi_2(b,c)\,\phi_3(c,d)\,\phi_4(d,a)$.
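A sketch of this normalization in code. The slides only give the single product 10 · 1 · 100 · 100, so the factor tables below are assumptions chosen to reproduce it:

```python
import itertools

# Pairwise factors over the loop A - B - C - D - A (tables are hypothetical).
phi1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}
phi2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}
phi3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}
phi4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}

def unnormalized(a, b, c, d):
    return phi1[(a, b)] * phi2[(b, c)] * phi3[(c, d)] * phi4[(d, a)]

# Partition function: sum of the factor product over all joint assignments.
Z = sum(unnormalized(a, b, c, d) for a, b, c, d in itertools.product([0, 1], repeat=4))

def P(a, b, c, d):
    return unnormalized(a, b, c, d) / Z

print(unnormalized(1, 1, 0, 1))  # 10 * 1 * 100 * 100 = 100000, as in the slide
print(P(1, 1, 0, 1))             # the same product, normalized by Z
```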

DE-NOISING

ISING MODEL DE-NOISING. Pixel values in $\{-1, 1\}$. Factors of two forms: pairwise factors between neighbouring pixels, and $p(y_i \mid x_i)$, e.g. Gaussian, linking each pixel to its noisy observation.

ISING MODEL DE-NOISING. Values in $\{-1, 1\}$, with factors of the same two forms. The graph is bipartite, which suggests an iterative procedure (sketched below).
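One concrete form of such an iterative procedure is ICM (iterated conditional modes), sketched below under assumed factor forms: $\exp(J\,x_i x_j)$ for neighbouring pixels and a Gaussian likelihood for $p(y_i \mid x_i)$. These choices are mine, not necessarily the slides':

```python
import numpy as np

def icm_denoise(y, J=1.0, sigma=0.5, n_sweeps=10):
    """Ising-model de-noising by coordinate-wise updates (ICM sketch).

    y: noisy H x W image with values near -1/+1; returns x in {-1, +1}.
    Pairwise factors exp(J * x_i * x_j) for 4-neighbours and a Gaussian likelihood
    exp(-(y_i - x_i)^2 / (2 sigma^2)) -- assumed forms, for illustration only.
    """
    x = np.where(y >= 0, 1, -1)          # initialize from the sign of the noisy image
    H, W = y.shape
    for _ in range(n_sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of neighbouring spins (the only pixels whose factors involve x[i, j]).
                nb = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # Log-potential of x[i, j] = s with everything else fixed.
                def local(s):
                    return J * s * nb - (y[i, j] - s) ** 2 / (2 * sigma ** 2)
                x[i, j] = 1 if local(1) >= local(-1) else -1
    return x

# Usage: x_hat = icm_denoise(noisy_image) where noisy_image is an H x W float array.
```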

(Figure) The large image is the noisy input; upper, the UGM de-noised result; lower, the graph-cut de-noised result.

LATENT = HIDDEN Can reduce #parameters Can represent common causes

MARKOV CHAINS (DISCRETE). A directed graph with transition probabilities on the edges; we observe the sequence of visited vertices. (Figure: example chain with edge probabilities $p_1$, $p_2$, $q$.)

MARKOV CHAINS (DISCRETE). The probabilities on the outgoing edges of each vertex sum to one: $\sum_{i \in [d]} p_i = 1$.
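A minimal sampling sketch (the 3-state transition matrix is a made-up example): the next vertex is drawn from the row of the current vertex.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix; each row sums to one.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

def sample_chain(P, z0=0, T=20):
    """Return the sequence of visited vertices z_1..z_T, starting in z0."""
    states = [z0]
    for _ in range(T - 1):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states

print(sample_chain(P))
```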

THE OCCASIONALLY DISHONEST CASINO. Two states, Fair and Biased/loaded, with transition probabilities $p$, $1-p$, $q$, $1-q$ (figure); we observe the sequence of dice outcomes emitted at the visited vertices, rather than the vertices themselves.

EMISSION DISTRIBUTIONS

WHAT AN HMM DOES. Starts in the state $z_1$. When in state $z_t$: outputs $x_t \sim p(x_t \mid z_t)$ and moves to $z_{t+1} \sim p(z_{t+1} \mid z_t)$. Stops after a fixed number of steps or when reaching a stop state. (Figure: the Fair/Biased casino chain with transition probabilities $p$, $1-p$, $q$, $1-q$.)

WHAT AN HMM DOES. Starts in the state $z_1$. When in state $z_t$: outputs $x_t \sim p(x_t \mid z_t)$ and moves to $z_{t+1} \sim p(z_{t+1} \mid z_t)$. Stops after a fixed number of steps or when reaching a stop state. The parameters: the transition probabilities $p(z_{t+1} \mid z_t)$ and the emission probabilities $p(x_t \mid z_t)$.
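A simulation sketch of the occasionally dishonest casino; the concrete values of $p$, $q$ and the loaded die's distribution are assumptions for illustration, since the slides leave them symbolic:

```python
import numpy as np

rng = np.random.default_rng(1)

FAIR, LOADED = 0, 1
p, q = 0.95, 0.90                      # assumed self-transition probabilities
A = np.array([[p, 1 - p],              # transition matrix p(z_{t+1} | z_t)
              [1 - q, q]])
E = np.array([[1/6] * 6,               # fair die
              [0.1] * 5 + [0.5]])      # loaded die, heavily favouring a six (assumed)

def simulate_hmm(T=300, z1=FAIR):
    """Sample (z_1..z_T, x_1..x_T): start in z1, emit from E[z_t], move via A[z_t]."""
    z, x = [z1], []
    for t in range(T):
        x.append(rng.choice(6, p=E[z[-1]]) + 1)      # dice outcome 1..6
        if t < T - 1:
            z.append(rng.choice(2, p=A[z[-1]]))
    return np.array(z), np.array(x)

states, rolls = simulate_hmm()
```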

THE JOINT DISTRIBUTION. Starts in the state $z_1$. When in state $z_t$: emits $x_t \sim p(x_t \mid z_t)$ (categorical or Gaussian) and transits to $z_{t+1} \sim p(z_{t+1} \mid z_t)$. Stops after a fixed number of steps or when reaching a stop state. The joint is $p(x_{1:T}, z_{1:T}) = p(z_1) \prod_{t=1}^{T} p(x_t \mid z_t) \prod_{t=2}^{T} p(z_t \mid z_{t-1})$.
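Under the same (assumed) parameterization, with a known start state and categorical emissions, the joint of a concrete pair $(x_{1:T}, z_{1:T})$ is one emission and one transition term per step; a sketch:

```python
import numpy as np

def log_joint(z, x, A, E, z1=0):
    """log p(x_{1:T}, z_{1:T}) = log p(x_1|z_1) + sum_t [log p(z_t|z_{t-1}) + log p(x_t|z_t)].

    Assumes the chain starts deterministically in state z1 and x_t takes values 1..6.
    """
    assert z[0] == z1
    lp = np.log(E[z[0], x[0] - 1])
    for t in range(1, len(z)):
        lp += np.log(A[z[t - 1], z[t]])     # transition p(z_t | z_{t-1})
        lp += np.log(E[z[t], x[t] - 1])     # emission   p(x_t | z_t)
    return lp

# Usage with the simulation sketch above: log_joint(states, rolls, A, E)
```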

GAUSSIAN EMISSIONS AND HIDDEN STATES. (Figure: scatter plot of Gaussian emissions generated from the hidden states, with points numbered by time step.)

LAYERED OR NOT. (Figure: a layered profile-HMM state graph with Begin, match states M, insert states I, delete states D, and End, over columns 0 to 4.)

APPLICATIONS OF HMMS. Automatic speech recognition; part-of-speech tagging; gene finding; gene family characterization; secondary structure prediction. (Figure: a multiple alignment of sequences from bat, rat, cat, gnat, goat and the corresponding profile HMM.)

TERMINOLOGY (X above, Z below in the graphical model). Filtering: $p(z_t \mid x_{1:t})$. Prediction: $p(z_{t+h} \mid x_{1:t})$. Fixed-lag smoothing: $p(z_t \mid x_{1:t+\ell})$. Offline smoothing: $p(z_t \mid x_{1:T})$.

MORE INFERENCE TYPES. Viterbi (MAP): $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$. Posterior samples: $z_{1:T} \sim p(z_{1:T} \mid x_{1:T})$. Parameters: given data D and structure. Structure and parameters: given D. Probability of the data: $p(x_{1:T})$. (Figure: the multiple alignment and profile HMM again.)

INFERENCE IN THE OCCASIONALLY DISHONEST CASINO. (Figure: three panels, filtered, smoothed, and Viterbi, plotting p(loaded), respectively the MAP state (0 = fair, 1 = loaded), against the roll number; grey regions mark rolls made with the biased die.) Filtering: $p(z_t \mid x_{1:t})$, online. Smoothing, MAP state: $p(z_t \mid x_{1:T})$, offline. Viterbi, MAP path: $\arg\max_{z_{1:T}} p(z_{1:T} \mid x_{1:T})$.
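The Viterbi (MAP path) computation referenced above, as a log-space sketch; A and E are a transition and a categorical emission matrix as in the earlier casino sketch, and a known start state is assumed:

```python
import numpy as np

def viterbi(x, A, E, z1=0):
    """argmax_{z_{1:T}} p(z_{1:T} | x_{1:T}) for categorical emissions (dice outcomes 1..6)."""
    T, K = len(x), A.shape[0]
    logA, logE = np.log(A), np.log(E)
    delta = np.full((T, K), -np.inf)     # best log-prob of any path ending in state k at time t
    back = np.zeros((T, K), dtype=int)   # backpointers
    delta[0, z1] = logE[z1, x[0] - 1]
    for t in range(1, T):
        for k in range(K):
            scores = delta[t - 1] + logA[:, k]
            back[t, k] = np.argmax(scores)
            delta[t, k] = scores[back[t, k]] + logE[k, x[t] - 1]
    # Trace back the best path.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Usage with the earlier sketch: viterbi(rolls, A, E)
```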

DP. Examples: pairs of strings (abbacd/acbadd, abbac/abbacd, abbac/acbadd, acbad/acbad) and rooted trees. What is a subproblem? What is a subsolution? How do we decompose into smaller subproblems? How do we combine subsolutions into larger ones? How do we enumerate? How many subproblems, and what time?

DP. What is a subproblem? What is a subsolution? How do we decompose into smaller subproblems? How do we combine subsolutions into larger ones? How do we enumerate? How many subproblems, and what time? For an efficient DP: polynomially many subproblems, polynomial time to decompose, combine, and enumerate, hence polynomial time overall.
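For the pairs-of-strings example, the classic instance of this checklist is edit distance: the subproblems are prefix pairs (polynomially many) and each is combined from three smaller ones in constant time. A minimal sketch:

```python
def edit_distance(s, t):
    """DP over subproblems d[i][j] = edit distance between s[:i] and t[:j]."""
    d = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = i                       # delete all of s[:i]
    for j in range(len(t) + 1):
        d[0][j] = j                       # insert all of t[:j]
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            # Combine three smaller subsolutions in constant time.
            d[i][j] = min(d[i - 1][j] + 1,                              # deletion
                          d[i][j - 1] + 1,                              # insertion
                          d[i - 1][j - 1] + (s[i - 1] != t[j - 1]))     # match / substitution
    return d[len(s)][len(t)]

print(edit_distance("abbac", "acbadd"))   # one of the string pairs from the slide
```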

AN HMM CAN BE SEEN AS A DGM. $Z_i$ hidden, $X_i$ observable. The hidden states are often not observed when training, and never when applying the model.

SPECIAL CASE: HIDDEN MARKOV MODEL (HMM). $Z_i$ hidden: the conditionals $p(Z_i \mid Z_{i-1})$ are all given by the transition distribution. $X_i$ observable: the conditionals $p(X_i \mid Z_i)$ are all given by the emission distribution. The hidden states are often not observed when training, and never when applying the model.

JOINT & FORWARD VARIABLE. The joint $p(x_{1:T}, z_{1:T})$ is easy to express (read it off the graphical model), but the sum $p(x_{1:T}) = \sum_{z_{1:T}} p(x_{1:T}, z_{1:T})$ has exponentially many terms. The forward variable, $f_t$, can be computed with DP.

$Z_{t-1} = k'$ gives smaller events (graphical model). Knowing also $Z_{t-1}$ breaks the event $\{x_{1:t-1}, Z_t = k\}$ into smaller pieces, i.e., the event is the AND of smaller events.

Applying the sum rule. Notice, by the sum rule, $f_t(k) = p(x_{1:t-1}, Z_t = k) = \sum_{k' \in [K]} p(x_{1:t-1}, Z_{t-1} = k', Z_t = k)$, where $[K]$ is the set of states. Each term in the sum is the probability of an event (the emissions $x_1, \ldots, x_{t-1}$ together with $Z_{t-1} = k'$ and $Z_t = k$) which, as noted, can be broken into smaller events.

Forward recursion: $f_t(k) = \sum_{l} \underbrace{f_{t-1}(l)}_{\text{smaller}}\; \underbrace{p(x_{t-1} \mid Z_{t-1} = l)}_{\text{emission}}\; \underbrace{p(Z_t = k \mid Z_{t-1} = l)}_{\text{transition}}$

Forward recursion

Forward recursion, given as the Forward Algorithm:
For the start state k*: s(0, k*) := 1
For all other states k: s(0, k) := 0
For t = 1 to T
    For k = 1 to K
        s(t, k) := Σ_l s(t-1, l) · p(x_t | Z_t = l) · p(Z_{t+1} = k | Z_t = l)

TIME. In the Forward Algorithm above, the initialization takes constant time per state; each update s(t, k) is a sum over the K previous states, so O(K); with the loops For t = 1 to T and For k = 1 to K, each time step costs O(K²). So in total, time O(TK²).
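A numpy sketch of the Forward Algorithm, under my reading of the slides' convention (s[t, k] = p(x_{1:t}, Z_{t+1} = k), i.e. s[t, ·] plays the role of $f_{t+1}$); categorical dice emissions and a known start state are assumed, and the running time is O(TK²) as stated:

```python
import numpy as np

def forward(x, A, E, k_star=0):
    """Forward algorithm with s[t, k] = p(x_1..x_t, Z_{t+1} = k).

    x: observations in 1..6 (categorical); A: K x K transition matrix; E: K x 6 emissions.
    The start state k_star is assumed known. Time O(T K^2), memory O(T K).
    """
    T, K = len(x), A.shape[0]
    s = np.zeros((T + 1, K))
    s[0, k_star] = 1.0                                   # base case
    for t in range(1, T + 1):
        for k in range(K):
            # Sum over the K predecessors: smaller value times emission of x_t times transition to k.
            s[t, k] = np.sum(s[t - 1] * E[:, x[t - 1] - 1] * A[:, k])
        # (In practice one would rescale s[t] here to avoid underflow, keeping the scale factors.)
    return s

# p(x_{1:T}) is then the sum of the last row: s[-1].sum()
```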

IF LAYERED. If the state graph is layered, as in the profile HMM with Begin, M, I, D and End states over columns 0 to 4 (figure), the sum over all K previous states is replaced by a sum over a constant number of states in the previous layer, so each update takes constant time and the total time is O(TK).

OBSERVATION PROBABILITY. In general $f_t(k) = p(x_{1:t-1}, Z_t = k)$, so (e.g. with t = T) the final probability is easily obtained, since $p(x_{1:T}) = \sum_k f_T(k)\, p(x_T \mid Z_T = k)$.
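With the forward table from the sketch above, this final marginalization is a one-liner (under that table's convention, the last row already folds in $f_T(k)\,p(x_T \mid Z_T = k)$):

```python
import numpy as np

def observation_probability(s):
    """p(x_{1:T}) from the forward table of the earlier sketch: sum over the last row,
    since s[T, k] = p(x_{1:T}, Z_{T+1} = k)."""
    return float(np.sum(s[-1]))

# Usage, reusing the earlier sketches: observation_probability(forward(rolls, A, E))
```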

FILTERING. $p(z_t \mid x_{1:t}) = \dfrac{f_t(z_t)\, p(x_t \mid z_t)}{p(x_{1:t})}$: forward variable times emission, divided by the data probability. Filtering: $p(z_t \mid x_{1:t})$, online.
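A sketch of filtering on top of the forward table from the earlier sketch; with that table's convention, s[t-1, k] = $f_t(k)$, so filtering is the normalized product of the forward variable and the current emission:

```python
import numpy as np

def filtered(s, x, E):
    """p(Z_t = k | x_{1:t}) for all t, from the forward table s and emission matrix E."""
    T = len(x)
    post = np.zeros((T, E.shape[0]))
    for t in range(1, T + 1):
        unnorm = s[t - 1] * E[:, x[t - 1] - 1]   # f_t(k) * p(x_t | Z_t = k) = p(x_{1:t}, Z_t = k)
        post[t - 1] = unnorm / unnorm.sum()      # divide by the data probability p(x_{1:t})
    return post

# Usage: filtered(forward(rolls, A, E), rolls, E)
```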

BACKWARD VARIABLE. Defined by $b_t(k) = p(x_{t+1:T} \mid Z_t = k)$ (graphical model: condition on $Z_t = k$ and consider the emissions $x_{t+1}, \ldots, x_T$).

The sum rule gives $b_t(k) = \sum_{l} p(Z_{t+1} = l, x_{t+1:T} \mid Z_t = k)$. Each term in the sum is the probability of an event, which is the AND of smaller events involving $Z_{t+1} = l$, $x_{t+1}$, and $x_{t+2}, \ldots, x_T$.
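Carrying this decomposition one step further gives the standard backward recursion (stated here for completeness; it follows from the AND-decomposition above):

$$
b_t(k) \;=\; \sum_{l} \underbrace{p(Z_{t+1} = l \mid Z_t = k)}_{\text{transition}}\;
\underbrace{p(x_{t+1} \mid Z_{t+1} = l)}_{\text{emission}}\;
\underbrace{b_{t+1}(l)}_{\text{smaller}},
$$

with base case $b_T(k) = 1$ for all $k$.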