Probabilistic Graphical Models

Probabilistic Graphical Models, Lecture 12: Dynamical Models. CS/CNS/EE 155, Andreas Krause

Announcements: Homework 3 is out tonight; start early! Project milestones are due today; please email them to the TAs.

Parameter learning for log-linear models. Feature functions φ_i(C_i) are defined over cliques of an undirected graph G; the domains C_i can overlap. The log-linear model defines the joint distribution P(x) = (1/Z(w)) exp( Σ_i w_i φ_i(C_i) ). How do we get the weights w_i?
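To make the setup concrete, here is a minimal sketch (not from the slides) of a log-linear model over three binary variables with two overlapping cliques; the feature functions phi1, phi2 and the weights are made-up illustrations, and the partition function is computed by brute-force enumeration, which is only feasible for tiny models.

```python
import itertools
import numpy as np

# Minimal sketch of a log-linear model over three binary variables with
# overlapping cliques C1 = {X1, X2} and C2 = {X2, X3}. The feature
# functions and weights below are made up purely for illustration.

def phi1(x1, x2):          # feature on clique C1: agreement of X1 and X2
    return 1.0 if x1 == x2 else 0.0

def phi2(x2, x3):          # feature on clique C2: agreement of X2 and X3
    return 1.0 if x2 == x3 else 0.0

w = np.array([1.2, 0.7])   # weights w_i (these are what learning must find)

def unnormalized(x):
    x1, x2, x3 = x
    return np.exp(w[0] * phi1(x1, x2) + w[1] * phi2(x2, x3))

# Partition function Z(w) by brute-force enumeration.
states = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized(x) for x in states)

def joint(x):
    return unnormalized(x) / Z

print(joint((1, 1, 1)), sum(joint(x) for x in states))  # the latter should be 1.0
```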

Log-linear conditional random field. Define a log-linear model over the outputs Y, making no assumptions about the inputs X. Feature functions φ_i(C_i, X) are defined over cliques and the inputs, giving the conditional distribution P(Y | X) = (1/Z(w, X)) exp( Σ_i w_i φ_i(C_i, X) ).

Example: CRFs in NLP. A chain CRF with outputs Y_1, ..., Y_12 and inputs X_1, ..., X_12 over the words of "Mrs. Greene spoke today in New York. Green chairs the finance committee." Classify each word as Person, Location, or Other.

Example: CRFs in vision.

Gradient of the conditional log-likelihood. The partial derivative with respect to w_i is the observed feature count minus its expectation, ∂/∂w_i log P(y | x, w) = φ_i(y, x) − E_{P(Y | x, w)}[ φ_i(Y, x) ]; evaluating it requires one run of inference per training example. The objective can be optimized using conjugate gradient.
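As an illustration (not part of the slides), the following sketch checks the "observed minus expected features" form of the gradient on a toy model with a single discrete output, where inference is exact enumeration; the feature map phi and the toy data are assumptions made purely for the example.

```python
import numpy as np

# Toy check of d/dw log P(y | x, w) = phi(y, x) - E_{P(Y|x,w)}[phi(Y, x)]
# for a single discrete output Y in {0, 1, 2}. The feature map below is a
# made-up example; inference is exact enumeration over the 3 labels.

rng = np.random.default_rng(0)
num_labels, num_feats = 3, 4
x = rng.normal(size=num_feats)

def phi(y, x):
    """Feature vector for (y, x): x placed in the block of label y."""
    f = np.zeros(num_labels * num_feats)
    f[y * num_feats:(y + 1) * num_feats] = x
    return f

def log_lik(w, y_obs, x):
    scores = np.array([w @ phi(y, x) for y in range(num_labels)])
    return scores[y_obs] - np.log(np.sum(np.exp(scores)))

def grad(w, y_obs, x):
    scores = np.array([w @ phi(y, x) for y in range(num_labels)])
    p = np.exp(scores - np.max(scores)); p /= p.sum()
    expected = sum(p[y] * phi(y, x) for y in range(num_labels))
    return phi(y_obs, x) - expected          # observed minus expected features

w = rng.normal(size=num_labels * num_feats)
g = grad(w, 1, x)

# Numerical gradient check
eps = 1e-6
g_num = np.array([(log_lik(w + eps * e, 1, x) - log_lik(w - eps * e, 1, x)) / (2 * eps)
                  for e in np.eye(len(w))])
print(np.max(np.abs(g - g_num)))             # should be ~1e-9
```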

Exponential family distributions. Distributions for log-linear models; more generally, exponential family distributions have the form P(x | w) = h(x) exp( w^T φ(x) − A(w) ), where h(x) is the base measure, w are the natural parameters, φ(x) are the sufficient statistics, and A(w) is the log-partition function. Here x can be continuous (defined over any set).

Examples. The Gaussian distribution is in the exponential family, with base measure h(x), natural parameters w, sufficient statistics φ(x), and log-partition function A(w). Other examples: Multinomial, Poisson, Exponential, Gamma, Weibull, chi-square, Dirichlet, Geometric, ...
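A small sketch (assuming the standard parameterization of the 1-D Gaussian, not taken from the slides) showing the Gaussian written in exponential family form with φ(x) = (x, x²) and checking it against the usual density:

```python
import numpy as np

# Sketch: write the 1-D Gaussian N(mu, sigma^2) in exponential family form
#   p(x | w) = h(x) * exp(w . phi(x) - A(w))
# with sufficient statistics phi(x) = (x, x^2), natural parameters
# w = (mu/sigma^2, -1/(2 sigma^2)), base measure h(x) = 1/sqrt(2 pi), and
# log-partition function A(w) = mu^2/(2 sigma^2) + log(sigma).

mu, sigma = 1.5, 0.8

w = np.array([mu / sigma**2, -1.0 / (2 * sigma**2)])  # natural parameters
A = mu**2 / (2 * sigma**2) + np.log(sigma)            # log-partition function
h = 1.0 / np.sqrt(2 * np.pi)                          # base measure

def expfam_pdf(x):
    phi = np.array([x, x**2])                         # sufficient statistics
    return h * np.exp(w @ phi - A)

def standard_pdf(x):
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

x = 0.3
print(expfam_pdf(x), standard_pdf(x))                 # the two should agree
```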

Moments and gradients. There is a correspondence between moments and the log-partition function (just as in log-linear models): ∇_w A(w) = E[φ(x)]. We can compute moments from derivatives, and derivatives from moments; maximum likelihood estimation reduces to moment matching.
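For instance, here is a quick numerical check of the gradient/moment correspondence for a Bernoulli variable in exponential family form; the particular natural parameter is arbitrary:

```python
import numpy as np

# Sketch of the moment / log-partition correspondence for a Bernoulli
# variable in exponential family form: phi(x) = x, A(w) = log(1 + e^w),
# so dA/dw should equal E[x] = P(x = 1).

w = 0.7
A = lambda w: np.log1p(np.exp(w))                 # log-partition function
mean = np.exp(w) / (1 + np.exp(w))                # E[phi(x)] = P(x = 1)

eps = 1e-6
dA_dw = (A(w + eps) - A(w - eps)) / (2 * eps)     # numerical derivative of A
print(dA_dw, mean)                                # both ~0.668: gradient = moment
```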

Conjugate priors in the exponential family. Any exponential family likelihood has a conjugate prior.

Exponential family graphical models. So far, we only defined graphical models over discrete variables, but we can define GMs over continuous distributions as well. For exponential family distributions, much of what we discussed (VE, JT, parameter learning, etc.) carries over. An important example: Gaussian networks.

Multivariate Gaussian distribution. A joint distribution over n random variables P(X_1, ..., X_n) with mean µ and covariance Σ, where σ_jk = E[(X_j − µ_j)(X_k − µ_k)]. If X_j and X_k are independent, then σ_jk = 0.

Marginalization. Suppose (X_1, ..., X_n) ~ N(µ, Σ). What is P(X_1)? More generally, let A = {i_1, ..., i_k} ⊆ {1, ..., n} and write X_A = (X_{i_1}, ..., X_{i_k}). Then X_A ~ N(µ_A, Σ_AA).

Conditioning. Suppose (X_1, ..., X_n) ~ N(µ, Σ) and decompose it as (X_A, X_B). What is P(X_A | X_B)? We have P(X_A = x_A | X_B = x_B) = N(x_A; µ_{A|B}, Σ_{A|B}), where µ_{A|B} = µ_A + Σ_AB Σ_BB^{-1} (x_B − µ_B) and Σ_{A|B} = Σ_AA − Σ_AB Σ_BB^{-1} Σ_BA. Computable using linear algebra!
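A minimal numpy sketch of these conditioning formulas, with an arbitrary 3-dimensional example (the helper condition and the numbers are illustrative, not from the lecture):

```python
import numpy as np

# Sketch of Gaussian conditioning in standard form. For (X_A, X_B) jointly
# Gaussian, conditioning on X_B = x_B gives
#   mu_{A|B}    = mu_A + Sigma_AB Sigma_BB^{-1} (x_B - mu_B)
#   Sigma_{A|B} = Sigma_AA - Sigma_AB Sigma_BB^{-1} Sigma_BA

def condition(mu, Sigma, A, B, x_B):
    mu_A, mu_B = mu[A], mu[B]
    S_AA = Sigma[np.ix_(A, A)]
    S_AB = Sigma[np.ix_(A, B)]
    S_BB = Sigma[np.ix_(B, B)]
    K = S_AB @ np.linalg.inv(S_BB)        # maps the evidence into the mean
    mu_cond = mu_A + K @ (x_B - mu_B)
    Sigma_cond = S_AA - K @ S_AB.T
    return mu_cond, Sigma_cond

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
mu_c, Sigma_c = condition(mu, Sigma, A=[0, 1], B=[2], x_B=np.array([0.5]))
print(mu_c)        # conditional mean of (X_1, X_2) given X_3 = 0.5
print(Sigma_c)     # conditional covariance (does not depend on x_B)
```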

Conditional linear Gaussians. A conditional linear Gaussian (CLG) models P(Y | X) as a Gaussian whose mean depends linearly on the parent: Y = A X + b + N(0, Σ).

Canonical representation. Multivariate Gaussians are in the exponential family! Standard vs. canonical form: the canonical parameters are the precision matrix Λ = Σ^{-1} and the potential vector η = Λ µ, so that µ = Λ^{-1} η and Σ = Λ^{-1}.
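A short sketch of the round trip between the two parameterizations, with an arbitrary example covariance:

```python
import numpy as np

# Sketch: convert a multivariate Gaussian between standard form (mu, Sigma)
# and canonical / information form (eta, Lambda), using
#   Lambda = Sigma^{-1},  eta = Lambda mu,
# and back: mu = Lambda^{-1} eta, Sigma = Lambda^{-1}.

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])

Lambda = np.linalg.inv(Sigma)              # precision matrix
eta = Lambda @ mu                          # potential vector

mu_back = np.linalg.solve(Lambda, eta)     # mu = Lambda^{-1} eta
Sigma_back = np.linalg.inv(Lambda)         # Sigma = Lambda^{-1}

print(np.allclose(mu, mu_back), np.allclose(Sigma, Sigma_back))  # True True
```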

Gaussian networks. Zeros in the precision matrix Λ indicate missing edges in the log-linear model!
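As an illustration (with arbitrary numbers), a chain X1 - X2 - X3 has a tridiagonal precision matrix, so Λ[0, 2] = 0 even though the covariance Σ = Λ^{-1} is dense:

```python
import numpy as np

# Sketch: for a chain-structured Gaussian network X1 - X2 - X3, the
# precision matrix Lambda is tridiagonal. Lambda[0, 2] = 0 encodes the
# missing edge (X1 is conditionally independent of X3 given X2), while the
# covariance Sigma = Lambda^{-1} is dense (X1 and X3 are still marginally
# dependent). The numerical values are arbitrary but positive definite.

Lambda = np.array([[ 2.0, -0.8,  0.0],
                   [-0.8,  2.0, -0.8],
                   [ 0.0, -0.8,  2.0]])

Sigma = np.linalg.inv(Lambda)
print(Lambda[0, 2])   # 0.0  -> no edge X1 - X3 in the graphical model
print(Sigma[0, 2])    # nonzero -> X1 and X3 are marginally correlated
```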

Inference in Gaussian networks. Marginal distributions can be computed in O(n^3), but for large numbers n of variables this is still intractable. If the Gaussian network has low treewidth, we can use variable elimination / junction tree inference; we only need to be able to multiply and marginalize factors.

Multiplying factors in Gaussians. In canonical form, multiplying two Gaussian factors simply adds their canonical parameters: (η_1, Λ_1) and (η_2, Λ_2) combine to (η_1 + η_2, Λ_1 + Λ_2), after extending each factor to a common set of variables.

Conditioning in canonical form. For a joint distribution over (X_A, X_B) with canonical parameters (η, Λ), conditioning on X_B = x_B gives P(X_A | X_B = x_B) = N(x_A; η_{A|B}, Λ_{A|B}) with Λ_{A|B} = Λ_AA and η_{A|B} = η_A − Λ_AB x_B.

Marginalizing in canonical form. Recall the conversion formulas µ = Λ^{-1} η and Σ = Λ^{-1}. The marginal distribution over X_A has canonical parameters Λ' = Λ_AA − Λ_AB Λ_BB^{-1} Λ_BA and η' = η_A − Λ_AB Λ_BB^{-1} η_B.

Standard vs. canonical form. The cost of marginalization and conditioning differs between the two forms: in standard form, marginalization is easy; in canonical form, conditioning is easy!

Variable elimination. In Gaussian Markov networks, variable elimination = Gaussian elimination (fast for low-bandwidth, i.e. low-treewidth, matrices).

Dynamical models

HMMs / Kalman filters. The most famous graphical models: the naïve Bayes model, the hidden Markov model, and the Kalman filter. Hidden Markov models: speech recognition, sequence analysis in computational biology. Kalman filters: control, cruise control in cars, GPS navigation devices, tracking missiles, ... Very simple models, but very powerful!

HMMs / Kalman filters. A chain of unobserved (hidden) variables X_1, ..., X_T with observations Y_1, ..., Y_T. HMMs: X_i multinomial, Y_i arbitrary. Kalman filters: X_i, Y_i Gaussian distributions. Non-linear KF: X_i Gaussian, Y_i arbitrary.

HMMs for speech recognition. Hidden words X_1, ..., X_T and observed phonemes Y_1, ..., Y_T (e.g., "He ate the cookies on the couch"): infer spoken words from audio signals.

Inference in hidden Markov models. In principle, we can use VE, JT, etc., but new variables X_t, Y_t arrive at each time step, so inference would have to be rerun. Bayesian filtering: suppose we have already computed P(X_t | y_{1:t}); we want to efficiently compute P(X_{t+1} | y_{1:t+1}).

Bayesian filtering. Start with P(X_1). At time t, assume we have P(X_t | y_{1:t-1}). Condition: P(X_t | y_{1:t}). Prediction: P(X_{t+1}, X_t | y_{1:t}). Marginalization: P(X_{t+1} | y_{1:t}).
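A minimal sketch of these three steps for a discrete HMM, with made-up transition and observation matrices:

```python
import numpy as np

# Sketch of one Bayesian filtering step for a discrete HMM: condition on
# the new observation, then predict (multiply by the transition model) and
# marginalize out the old state. T, O, and the belief below are toy values.

T = np.array([[0.9, 0.1],      # T[i, j] = P(X_{t+1} = j | X_t = i)
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],      # O[i, k] = P(Y_t = k | X_t = i)
              [0.1, 0.9]])

def filter_step(belief, y):
    """belief = P(X_t | y_{1:t-1}); returns P(X_{t+1} | y_{1:t})."""
    conditioned = belief * O[:, y]          # condition on y_t (unnormalized)
    conditioned /= conditioned.sum()        # P(X_t | y_{1:t})
    return conditioned @ T                  # predict and marginalize over X_t

belief = np.array([0.5, 0.5])               # prior P(X_1)
for y in [0, 0, 1, 1]:                      # a toy observation sequence
    belief = filter_step(belief, y)
print(belief)                                # P(X_5 | y_{1:4})
```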

Parameter learning in HMMs. Assume we have labels for the hidden variables, and assume stationarity: P(X_{t+1} | X_t) is the same over all time steps, and P(Y_t | X_t) is the same over all time steps. This violates parameter independence ("parameter sharing"). Example: compute the parameters of P(X_{t+1} = x' | X_t = x). What if we don't have labels for the hidden variables? Use EM (later in this course).

Kalman filters (Gaussian HMMs). X_1, ..., X_T: location of the object being tracked; Y_1, ..., Y_T: observations. P(X_1): prior belief about the location at time 1. P(X_{t+1} | X_t): motion model (how do I expect my target to move in the environment?), represented as a CLG: X_{t+1} = A X_t + N(0, Σ_M). P(Y_t | X_t): sensor model (what do I observe if the target is at location X_t?), represented as a CLG: Y_t = H X_t + N(0, Σ_O).

Understanding the motion model

Understanding the sensor model

Bayesian filtering for KFs. We can use Gaussian elimination to perform inference in the unrolled model. Start with the prior belief P(X_1). At every time step, maintain the belief P(X_t | y_{1:t-1}). Condition on the observation: P(X_t | y_{1:t}). Predict (multiply in the motion model): P(X_{t+1}, X_t | y_{1:t}). Roll up (marginalize the previous time step): P(X_{t+1} | y_{1:t}).

Implementation. Current belief: P(x_t | y_{1:t-1}) = N(x_t; η_{X_t}, Λ_{X_t}). Multiply in the sensor and motion models, then marginalize.
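For concreteness, here is a sketch of one condition-predict cycle written in the standard (moment) form rather than the canonical form used on the slide; the toy 1-D constant-velocity model (the matrices A, H, Σ_M, Σ_O and the observations) is an assumption for illustration only.

```python
import numpy as np

# Sketch of one Kalman filter cycle in standard (moment) form.
# Motion model: X_{t+1} = A X_t + N(0, S_M); sensor model: Y_t = H X_t + N(0, S_O).
# Toy 1-D constant-velocity tracking example: state = [position, velocity].

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # motion model
S_M = 0.01 * np.eye(2)              # motion noise covariance
H = np.array([[1.0, 0.0]])          # we only observe the position
S_O = np.array([[0.25]])            # observation noise covariance

def kf_step(mu, P, y):
    """One condition + predict cycle: P(X_t | y_{1:t-1}) -> P(X_{t+1} | y_{1:t})."""
    # Condition on the observation y_t
    S = H @ P @ H.T + S_O                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)                # Kalman gain
    mu = mu + K @ (y - H @ mu)
    P = P - K @ H @ P
    # Predict / roll up to the next time step
    return A @ mu, A @ P @ A.T + S_M

mu, P = np.zeros(2), np.eye(2)                    # prior belief P(X_1)
for y in [np.array([1.1]), np.array([2.0]), np.array([2.9])]:
    mu, P = kf_step(mu, P, y)
print(mu)   # estimated [position, velocity] after three noisy position readings
```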

What if the observations are not linear? Linear observations: Y_t = H X_t + noise. Nonlinear observations: Y_t = g(X_t) + noise for some nonlinear function g.

Incorporating non-Gaussian observations. The nonlinear observation model P(Y_t | X_t) is not Gaussian. First approach: approximate P(Y_t | X_t) as a CLG by linearizing P(Y_t | X_t) around the current estimate E[X_t | y_{1:t-1}]; this is known as the extended Kalman filter (EKF) and can perform poorly if P(Y_t | X_t) is highly nonlinear. Second approach: approximate P(Y_t, X_t) as a Gaussian, which takes the correlation with X_t into account; after obtaining the approximation, condition on Y_t = y_t (now a linear observation).

Finding Gaussian approximations. We need to find a Gaussian approximation of P(X_t, Y_t). How? Gaussians are in the exponential family, so use moment matching: compute the moments E[Y_t], E[Y_t^2], and E[X_t Y_t] by integrating the observation model against the current belief over X_t.

Linearization by integration. We need to integrate the product of a Gaussian with an arbitrary function, which we can do by numerical integration: approximate the integral as a weighted sum over evaluation points, where Gaussian quadrature defines the locations and weights of the points. In one dimension, choosing D points via Gaussian quadrature is exact for polynomials of degree up to 2D − 1; in higher dimensions, exponentially many points are needed to achieve exact evaluation for polynomials. An application of this idea is known as the unscented Kalman filter (UKF).
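A 1-D sketch of the numerical-integration idea using Gauss-Hermite quadrature from numpy (the UKF itself uses its own deterministic choice of sigma points; the quadrature rule here is one standard option); the nonlinear function g and the parameters are arbitrary examples.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# 1-D sketch of "linearization by integration": approximate E[g(X)] for
# X ~ N(mu, sigma^2) and a nonlinear g by Gauss-Hermite quadrature. After
# the change of variables x = mu + sqrt(2) * sigma * t,
#   E[g(X)] ~= (1/sqrt(pi)) * sum_i w_i * g(mu + sqrt(2) * sigma * t_i).

mu, sigma = 0.5, 1.2
g = np.sin                                  # made-up nonlinear observation function

t, w = hermgauss(10)                        # 10 quadrature points and weights
approx = np.sum(w * g(mu + np.sqrt(2) * sigma * t)) / np.sqrt(np.pi)

# Brute-force Monte Carlo check of the same expectation
samples = np.random.default_rng(0).normal(mu, sigma, size=200_000)
print(approx, g(samples).mean())            # the two should be close
```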

Factored dynamical models. So far: HMMs and Kalman filters. What if we have more than one variable at each time step, e.g., temperature at different locations, or road conditions in a road network? Spatio-temporal models.

Dynamic Bayesian networks. At every time step we have a Bayesian network (e.g., over variables A_t, B_t, C_t, D_t, E_t). The variables at each time step t are called a slice S_t; temporal edges connect slice S_t with S_{t+1}.

Tasks: read Koller & Friedman, Chapters 6.2.3 and 15.1.