Probabilistic Graphical Models


Probabilistic Graphical Models
Brown University CSCI 2950-P, Spring 2013
Prof. Erik Sudderth
Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods
Some figures courtesy Michael Jordan's draft textbook, An Introduction to Probabilistic Graphical Models

Undirected Gaussian Graphical Models
$\mathcal{N}(x \mid 0, \Sigma) = \frac{1}{Z} \prod_{(s,t) \in \mathcal{E}} \psi_{st}(x_s, x_t), \qquad Z = \left((2\pi)^N |\Sigma|\right)^{1/2}$
- Undirected Markov properties correspond to sparse inverse covariance (precision) matrices $J = \Sigma^{-1}$
- For connected Gaussian MRFs, the covariance is usually dense (all pairs correlated)
- Number of parameters, and thus learning complexity, reduced from $O(N^2)$ to $O(N)$
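To see the sparsity contrast concretely, here is a minimal numpy sketch (mine, not from the slides; the chain graph and potential values are illustrative): a chain-structured Gaussian MRF has a tridiagonal precision matrix J, yet its covariance J^{-1} is fully dense.

```python
import numpy as np

# Chain-structured Gaussian MRF on N nodes: the precision matrix J is
# tridiagonal (one potential per edge), yet the covariance J^{-1} is dense.
N = 6
J = 2.0 * np.eye(N)                      # node potentials (diagonal)
for s in range(N - 1):                   # edge potentials couple neighbors only
    J[s, s + 1] = J[s + 1, s] = -0.9

Sigma = np.linalg.inv(J)                 # covariance of the joint Gaussian

print("nonzeros in J:    ", np.count_nonzero(np.abs(J) > 1e-12))      # O(N)
print("nonzeros in Sigma:", np.count_nonzero(np.abs(Sigma) > 1e-12))  # O(N^2)
```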

Duality in Gaussian Distributions
Marginals (Schur complement in canonical form): $\hat{J}_{11} = J_{11} - J_{12} J_{22}^{-1} J_{21}$
Conditionals (Schur complement in moment form): $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$
- Moment parameters: trivial marginalization, conditioning requires computation
- Canonical parameters: trivial conditioning, marginalization requires computation
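A quick numerical check of this duality, as a sketch with an arbitrary random covariance: marginalization is a sub-block in moment form but a Schur complement in canonical form, and conditioning is the reverse.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random joint covariance over blocks (1, 2); J is the precision (canonical form).
A = rng.standard_normal((5, 5))
Sigma = A @ A.T + 5 * np.eye(5)
J = np.linalg.inv(Sigma)
i1, i2 = np.arange(2), np.arange(2, 5)   # block 1 = dims 0-1, block 2 = dims 2-4

# Marginal of block 1: sub-block in moment form, Schur complement in canonical form.
Sigma_marg = Sigma[np.ix_(i1, i1)]
J_marg = J[np.ix_(i1, i1)] - J[np.ix_(i1, i2)] @ np.linalg.inv(J[np.ix_(i2, i2)]) @ J[np.ix_(i2, i1)]
assert np.allclose(np.linalg.inv(Sigma_marg), J_marg)

# Conditional of block 1 given block 2: sub-block in canonical form, Schur complement in moment form.
Sigma_cond = Sigma[np.ix_(i1, i1)] - Sigma[np.ix_(i1, i2)] @ np.linalg.inv(Sigma[np.ix_(i2, i2)]) @ Sigma[np.ix_(i2, i1)]
J_cond = J[np.ix_(i1, i1)]
assert np.allclose(np.linalg.inv(Sigma_cond), J_cond)
print("Schur-complement duality verified")
```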

Linear Gaussian Systems
Marginal Likelihood: $p(y) = \int p(y \mid x)\, p(x)\, dx$ (Gaussian)
Posterior Distribution: $p(x \mid y) \propto p(y \mid x)\, p(x)$ (Gaussian)
(Figure: two-node directed model, X → Y.)
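As an illustration of the closed forms (standard results for linear Gaussian systems, e.g. Bishop PRML Sec. 2.3.3; the particular parameterization and numbers below are my own choices, not taken from the slides):

```python
import numpy as np

# Linear Gaussian system:
#   prior       p(x) = N(x | mu, Sigma_x)
#   likelihood  p(y | x) = N(y | A x + b, Sigma_y)
def marginal_and_posterior(mu, Sigma_x, A, b, Sigma_y, y):
    # Marginal likelihood p(y) = N(y | A mu + b, Sigma_y + A Sigma_x A^T)
    m_y = A @ mu + b
    S_y = Sigma_y + A @ Sigma_x @ A.T
    # Posterior p(x | y) = N(x | mu + K (y - m_y), Sigma_x - K A Sigma_x),
    # with gain K = Sigma_x A^T S_y^{-1}
    K = Sigma_x @ A.T @ np.linalg.inv(S_y)
    post_mean = mu + K @ (y - m_y)
    post_cov = Sigma_x - K @ A @ Sigma_x
    return (m_y, S_y), (post_mean, post_cov)

mu = np.zeros(2); Sigma_x = np.eye(2)
A = np.array([[1.0, 0.5]]); b = np.zeros(1); Sigma_y = np.array([[0.1]])
(marg_mean, marg_cov), (post_mean, post_cov) = marginal_and_posterior(
    mu, Sigma_x, A, b, Sigma_y, y=np.array([1.2]))
print(post_mean, post_cov)
```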

Directed Gaussian Graphical Models
$\mathcal{N}(x \mid 0, \Sigma) = \prod_{s \in \mathcal{V}} \mathcal{N}\!\left(x_s \mid A_s x_{\Gamma(s)}, R_s\right)$
A sequence of locally normalized conditional distributions for each D-dimensional node $x_s$, with regression matrix $A_s$ and noise covariance $R_s$ ($\Gamma(s)$ denotes the parents of node $s$).
The linear state space model is a widely used special case:
$x_{t+1} \sim \mathcal{N}(A_t x_t, Q_t), \qquad y_t \sim \mathcal{N}(C_t x_t, R_t)$
- Dimensionality reduction: state smaller than observed vectors
- Rich temporal dynamics: state larger than observed vectors
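A minimal sketch of ancestral sampling from such a state space model, assuming time-invariant A, C, Q, R (the specific values are illustrative only):

```python
import numpy as np

def sample_lds(A, C, Q, R, x0, T, rng):
    """Sample a trajectory from a time-invariant linear-Gaussian state space model:
       x_{t+1} ~ N(A x_t, Q),   y_t ~ N(C x_t, R)."""
    D, Dy = A.shape[0], C.shape[0]
    x = np.zeros((T, D)); y = np.zeros((T, Dy))
    x[0] = x0
    for t in range(T):
        y[t] = C @ x[t] + rng.multivariate_normal(np.zeros(Dy), R)
        if t + 1 < T:
            x[t + 1] = A @ x[t] + rng.multivariate_normal(np.zeros(D), Q)
    return x, y

rng = np.random.default_rng(1)
A = np.array([[0.99]])            # slowly decaying 1D latent state
C = np.array([[1.0], [0.5]])      # 2D observations of a 1D state
Q = np.array([[0.01]]); R = 0.1 * np.eye(2)
x, y = sample_lds(A, C, Q, R, x0=np.array([1.0]), T=50, rng=rng)
print(x.shape, y.shape)           # (50, 1) (50, 2)
```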

Probabilistic PCA & Factor Analysis Both Models: Data is a linear function of low-dimensional latent coordinates, plus Gaussian noise p(x i z i, )=N (x i Wz i + µ, ) p(x i ) =N (x i µ, W W T + ) Factor analysis: Probabilistic PCA: x 2 is a general diagonal matrix is a multiple of identity matrix = 2 I p(x bz) w p(z i ) =N (z i 0,I) x 2 low rank covariance parameterization µ } b z w µ p(z) p(x) bz z x 1 C. Bishop, Pattern Recognition & Machine Learning x 1

Expectation Maximization (EM)
$\ln p(x \mid \theta) = \ln \int_z p(x, z \mid \theta)\, dz$
$\ln p(x \mid \theta) \ge \int_z q(z) \ln p(x, z \mid \theta)\, dz - \int_z q(z) \ln q(z)\, dz \triangleq \mathcal{L}(q, \theta)$
- Initialization: randomly select starting parameters $\theta^{(0)}$
- E-Step: given parameters, find the posterior of the hidden data: $q^{(t)} = \arg\max_q \mathcal{L}(q, \theta^{(t-1)})$
- M-Step: given posterior distributions, find likely parameters: $\theta^{(t)} = \arg\max_\theta \mathcal{L}(q^{(t)}, \theta)$
- Iteration: alternate E-step & M-step until convergence

EM: Expectation Step
$\ln p(x \mid \theta) \ge \int_z q(z) \ln p(x, z \mid \theta)\, dz - \int_z q(z) \ln q(z)\, dz \triangleq \mathcal{L}(q, \theta)$
$q^{(t)} = \arg\max_q \mathcal{L}(q, \theta^{(t-1)})$
General solution, for any probabilistic model: $q^{(t)}(z) = p(z \mid x, \theta^{(t-1)})$, the posterior distribution given the current parameters.
For factor analysis and probabilistic PCA these posteriors are Gaussian:
$p(z \mid x, \theta) = \prod_{i=1}^{N} p(z_i \mid x_i, \theta), \qquad \theta = \{W, \mu, \Psi\}$
$p(z_i \mid x_i, W, \mu, \Psi) = \mathcal{N}\!\left(z_i \mid \Sigma_i W^{\top} \Psi^{-1} (x_i - \mu),\; \Sigma_i\right), \qquad \Sigma_i^{-1} = I + W^{\top} \Psi^{-1} W$
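A small sketch of this E-step posterior computation; the function name and test values are my own, and Psi may be any positive definite noise covariance (diagonal for factor analysis, $\sigma^2 I$ for PPCA):

```python
import numpy as np

def e_step_posterior(X, W, mu, Psi):
    """Gaussian posterior over latent coordinates for factor analysis / PPCA:
       p(z_i | x_i) = N(z_i | Sigma W^T Psi^{-1} (x_i - mu), Sigma),
       with Sigma^{-1} = I + W^T Psi^{-1} W.  Psi is the (D x D) noise covariance."""
    K = W.shape[1]
    Psi_inv = np.linalg.inv(Psi)
    Sigma = np.linalg.inv(np.eye(K) + W.T @ Psi_inv @ W)   # shared by all data points
    means = (X - mu) @ Psi_inv @ W @ Sigma                  # row i is E[z_i | x_i]
    return means, Sigma

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 4))
W = rng.standard_normal((4, 2)); mu = X.mean(axis=0); Psi = 0.3 * np.eye(4)
means, Sigma = e_step_posterior(X, W, mu, Psi)
print(means.shape, Sigma.shape)   # (10, 2) (2, 2)
```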

PCA versus Probabilistic PCA
$p(z_i \mid x_i, W, \mu, \Psi) = \mathcal{N}\!\left(z_i \mid \Sigma_i W^{\top} \Psi^{-1} (x_i - \mu),\; \Sigma_i\right), \qquad \Sigma_i^{-1} = I + W^{\top} \Psi^{-1} W$
(Figure: standard PCA (orthogonal projection) versus probabilistic PCA (projection shrunk towards the mean).)
- Maximum likelihood estimates of the probabilistic PCA parameters are equal to the classic PCA eigenvector solution
- For classical PCA, the optimal embedding is an orthogonal projection
- For PPCA, the latent coordinates are biased towards the prior mean (zero)
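To illustrate the shrinkage, here is a sketch on synthetic 2D data with a 1D latent space, using the closed-form maximum likelihood PPCA parameters of Tipping & Bishop (the data and constants are illustrative): the PPCA posterior-mean reconstruction is pulled toward the data mean relative to the orthogonal PCA projection.

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic 2D data: one dominant direction u plus isotropic noise.
u = np.array([1.0, 1.0]) / np.sqrt(2.0)
X = np.outer(2.0 * rng.standard_normal(500), u) + 0.3 * rng.standard_normal((500, 2))
Xc = X - X.mean(axis=0)

# Classical PCA: orthogonal projection onto the leading eigenvector.
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))   # ascending eigenvalues
v = evecs[:, -1]
z_pca = Xc @ v

# Maximum likelihood PPCA (Tipping & Bishop): W = v sqrt(lambda_1 - sigma^2),
# sigma^2 = average of the discarded eigenvalues (here just the smaller one).
sigma2 = evals[0]
W = (v * np.sqrt(evals[-1] - sigma2)).reshape(2, 1)
M = W.T @ W + sigma2 * np.eye(1)          # posterior precision of z (1x1 here)
z_ppca = (Xc @ W) / M[0, 0]               # posterior means E[z_i | x_i]

# Data-space reconstructions: PPCA's W E[z|x] is shrunk toward the mean.
recon_pca = np.outer(z_pca, v)
recon_ppca = z_ppca @ W.T
shrinkage = np.linalg.norm(recon_ppca, axis=1).mean() / np.linalg.norm(recon_pca, axis=1).mean()
print(shrinkage)                          # equals (lambda_1 - sigma^2) / lambda_1 < 1
```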

EM: Maximization Step
$\ln p(x \mid \theta) \ge \int_z q(z) \ln p(x, z \mid \theta)\, dz - \int_z q(z) \ln q(z)\, dz \triangleq \mathcal{L}(q, \theta)$
$\theta^{(t)} = \arg\max_\theta \mathcal{L}(q^{(t)}, \theta) = \arg\max_\theta \int_z q(z) \ln p(x, z \mid \theta)\, dz$
- Unlike the E-step, there is no simplified general solution
- For factor analysis and probabilistic PCA, these reduce to weighted linear regression problems
For probabilistic PCA ($\Psi = \sigma^2 I$):
$\ln p(x, z \mid \theta) = C - \frac{1}{2} \sum_{i=1}^{N} \left[\, \|z_i\|^2 + D \log \sigma^2 + \sigma^{-2} \|x_i - W z_i - \mu\|^2 \,\right]$
$\min_{W, \mu, \sigma^2} \;\; \frac{1}{2} \sum_{i=1}^{N} \left[\, D \log \sigma^2 + \sigma^{-2}\, \mathbb{E}_q\!\left[\|x_i - W z_i - \mu\|^2\right] \,\right]$
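Putting the E- and M-steps together, here is a sketch of EM for probabilistic PCA in the style of Bishop PRML Sec. 12.2.2 (initialization, iteration count, and the toy dataset are my own choices):

```python
import numpy as np

def ppca_em(X, K, n_iters=200, seed=0):
    """EM for probabilistic PCA (noise covariance sigma^2 * I).
    E-step: Gaussian posteriors over latent coordinates.
    M-step: regression-style updates for W, then the noise variance."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((D, K))
    sigma2 = 1.0
    for _ in range(n_iters):
        # E-step: posterior moments of z_n, shared precision M = W^T W + sigma^2 I.
        M = W.T @ W + sigma2 * np.eye(K)
        M_inv = np.linalg.inv(M)
        Ez = Xc @ W @ M_inv                                   # (N, K) posterior means
        Ezz = N * sigma2 * M_inv + Ez.T @ Ez                  # sum_n E[z_n z_n^T]
        # M-step: weighted linear regression for W, then the noise variance.
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ez * (Xc @ W))
                  + np.trace(Ezz @ W.T @ W)) / (N * D)
    return W, mu, sigma2

X = np.random.default_rng(5).standard_normal((500, 2)) @ np.array([[2.0, 0.3], [0.3, 0.5]])
W, mu, sigma2 = ppca_em(X, K=1)
print(W.ravel(), sigma2)
```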

EM Algorithm for Probabilistic PCA
(Figure sequence, C. Bishop, Pattern Recognition & Machine Learning: (a) true principal components; (b) E-step #1, projection; (c) M-step #1, regression; (d) E-step #2, projection; (e) M-step #2, regression; (f) converged solution.)

EM for Linear State Space Models
Factor Analysis or PPCA:
- E-Step: independently find Gaussian posteriors for each observation
- M-Step: weighted linear regression to map embeddings to observations
Linear-Gaussian State Space Model:
- E-Step: determine posterior marginals via Gaussian BP (Kalman smoother)
- M-Step: observation regression identical to the factor analysis M-step, plus a separate dynamics regression

Linear State Space Models States & observations jointly Gaussian: Ø All marginals & conditionals Gaussian Ø Linear transformations remain Gaussian

Simple Linear Dynamics
(Figure: sample state trajectories over time for Brownian motion and constant-velocity dynamics.)

Constant Velocity Tracking
(Figure: Kalman filter versus Kalman smoother estimates for constant-velocity tracking; K. Murphy, 1998.)
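A minimal Kalman filter sketch for this constant-velocity setup (state = [position, velocity], noisy position observations; the specific A, C, Q, R values are illustrative, not those used in the figure):

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    """Kalman filter: returns filtered means/covariances of p(x_t | y_{1:t})."""
    T, D = len(y), len(mu0)
    mus, Ps = np.zeros((T, D)), np.zeros((T, D, D))
    mu, P = mu0, P0
    for t in range(T):
        # Predict: p(x_t | y_{1:t-1})
        if t > 0:
            mu, P = A @ mu, A @ P @ A.T + Q
        # Update with observation y_t
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        mu = mu + K @ (y[t] - C @ mu)
        P = P - K @ C @ P
        mus[t], Ps[t] = mu, P
    return mus, Ps

# Constant-velocity model: state = [position, velocity], noisy position observations.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2); R = np.array([[1.0]])

rng = np.random.default_rng(6)
true_pos = 0.5 * np.arange(50)                      # moving at constant velocity 0.5
y = true_pos[:, None] + rng.standard_normal((50, 1))
mus, _ = kalman_filter(y, A, C, Q, R, mu0=np.zeros(2), P0=10.0 * np.eye(2))
print(mus[-1])   # filtered [position, velocity] estimate; velocity near 0.5
```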

Nonlinear State Space Models
- State dynamics and measurements given by potentially complex nonlinear functions
- Noise sampled from non-Gaussian distributions

Examples of Nonlinear Models
- Dynamics implicitly determined by geophysical simulations
- Observed image is a complex function of the 3D pose, other nearby objects & clutter, lighting conditions, camera calibration, etc.

Nonlinear Filtering
Prediction: $p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$
Update: $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$

Approximate Nonlinear Filters
No direct representation of continuous functions, or closed form for the prediction integral.
Big literature on approximate filtering:
- Histogram filters
- Extended & unscented Kalman filters
- Particle filters
- ...

Nonlinear Filtering Taxonomy
Histogram Filter:
- Evaluate on a fixed discretization grid
- Only feasible in low dimensions
- Expensive or inaccurate
Extended/Unscented Kalman Filter:
- Approximate posterior as Gaussian via linearization, quadrature, ...
- Inaccurate for multimodal posterior distributions
Particle Filter:
- Dynamically evaluate states with highest probability
- Monte Carlo approximation
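As a sketch of the third option, here is a bootstrap (sampling-importance-resampling) particle filter on a toy 1D nonlinear model; the dynamics, observation model, and particle count are illustrative choices, not from the slides:

```python
import numpy as np

def bootstrap_particle_filter(y, sample_dynamics, obs_loglik, n_particles, rng):
    """Bootstrap (sampling-importance-resampling) particle filter for a 1D state."""
    particles = rng.standard_normal(n_particles)        # samples from the initial prior
    means = []
    for yt in y:
        particles = sample_dynamics(particles, rng)     # prediction: propagate through dynamics
        logw = obs_loglik(yt, particles)                 # update: weight by observation likelihood
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * particles))              # filtered posterior mean estimate
        particles = particles[rng.choice(n_particles, size=n_particles, p=w)]  # resample
    return np.array(means)

# Toy nonlinear model:  x_t = sin(x_{t-1}) + w_t,   y_t = x_t^2 / 5 + v_t.
rng = np.random.default_rng(7)
T = 30
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = np.sin(x[t - 1]) + 0.5 * rng.standard_normal()
    y[t] = x[t] ** 2 / 5.0 + 0.1 * rng.standard_normal()

est = bootstrap_particle_filter(
    y,
    sample_dynamics=lambda xs, rng: np.sin(xs) + 0.5 * rng.standard_normal(xs.size),
    obs_loglik=lambda yt, xs: -0.5 * ((yt - xs ** 2 / 5.0) / 0.1) ** 2,
    n_particles=2000,
    rng=rng)
print(np.c_[x[:5], est[:5]])    # true state vs filtered mean (x^2 observations leave a sign ambiguity)
```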

Monte Carlo Methods
$\mathbb{E}[f] = \int f(z)\, p(z)\, dz \approx \frac{1}{L} \sum_{\ell=1}^{L} f(z^{(\ell)}), \qquad z^{(\ell)} \sim p(z)$
Estimation of expected model properties via simulation.
Provably good if L is sufficiently large:
- Unbiased for any sample size
- Variance inversely proportional to sample size (and independent of the dimension of the space)
- Weak law of large numbers
- Strong law of large numbers
Problem: drawing samples from complex distributions. Alternatives for hard problems:
- Importance sampling
- Markov chain Monte Carlo (MCMC)
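A tiny simulation of this estimator (the test function f(z) = z² with z ~ N(0, 1), so E[f] = 1, is an arbitrary choice): the estimate stays unbiased while its spread shrinks roughly as 1/√L, i.e. the variance is inversely proportional to the sample size.

```python
import numpy as np

rng = np.random.default_rng(8)
f = lambda z: z ** 2                     # E[f] = 1 when z ~ N(0, 1)

for L in (10, 100, 1000, 10000):
    # Repeat the L-sample Monte Carlo estimate many times to measure its spread.
    estimates = [np.mean(f(rng.standard_normal(L))) for _ in range(500)]
    print(f"L={L:6d}  mean={np.mean(estimates):.3f}  std={np.std(estimates):.4f}")
# The mean stays near E[f] = 1 (unbiased); the std shrinks roughly as 1/sqrt(L).
```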