Introduction to Probabilistic Graphical Models: Exercises

Cédric Archambeau, Xerox Research Centre Europe
cedric.archambeau@xrce.xerox.com
Pascal Bootcamp, Marseille, France, July 2010

Exercise 1: basics

Figure: a directed graphical model over the variables x_1, ..., x_7.

Given the graphical model in the figure, how can p(x_1, ..., x_7) be decomposed?

Exercise 2: d-separation

Figure: (a) Bayesian network for N observations x_n with shared mean µ; (b) Bayesian network for Bayesian polynomial regression, with inputs x_n, outputs t_n, weights w, parameters α and σ^2, and the prediction ˆt at a new input ˆx.

Let p(x | µ) be a univariate Gaussian and p(µ) the prior on the mean. Assume we observe {x_n}. The corresponding Bayesian network is shown in Fig. (a). Unfold the graphical model and write down the joint. Can we exploit any conditional independence? What happens if we marginalise with respect to µ? Fig. (b) is the graphical model for Bayesian polynomial regression. The variables {x_n, t_n} are the input-output pairs and w is the vector of regression weights. The variable ˆt is the prediction. Are the training outputs and the prediction independent? What is the consequence in practice?
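
As a hint for the first part of the exercise (not on the original slide), the unfolded joint for Fig. (a) factorises as

p(µ, x_1, ..., x_N) = p(µ) ∏_{n=1}^{N} p(x_n | µ),

so the observations are conditionally independent given µ. Marginalising over µ, p(x_1, ..., x_N) = ∫ p(µ) ∏_{n=1}^{N} p(x_n | µ) dµ no longer factorises over n, so the observations become dependent.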

Exercise 3: linear dynamical system

Figure: the graphical model of a linear dynamical system, with latent states z_1, z_2, ..., z_{n-1}, z_n, z_{n+1} and observations x_1, x_2, ..., x_{n-1}, x_n, x_{n+1}.

Consider a state space model with Gaussian transition density and Gaussian observation noise:

p(z_n | z_{n-1}) = N(z_n | F z_{n-1}, R),    p(x_n | z_n) = N(x_n | H z_n, Q),

where F, H, R and Q are parameters. Write the factor graph and the joint density. Can we decompose p(x_{n+1} | x_1, ..., x_n)? Motivate. Use the sum-product algorithm to derive the forward recursion (Kalman filter). Write the recursion for the message from f_1 to z_2. (Hint: first rewrite the message from h to f_1.) What specific form does it take for arbitrary n? Use the Gaussian identities to write down the explicit form. Code and test the algorithm (a sketch is given below).
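
For the coding part, here is a minimal sketch of the forward (filtering) recursion in Python/NumPy, using the notation of the exercise (F, H are the transition and emission matrices, R and Q the transition and observation noise covariances); m0 and P0 are an assumed Gaussian prior on the initial state, and the function name and toy test data are illustrative, not from the slides:

import numpy as np

def kalman_filter(X, F, H, R, Q, m0, P0):
    """Forward recursion for the LDS of Exercise 3.
    X is an (N, d_x) array of observations; (m0, P0) is the prior on the initial state.
    Returns the means and covariances of the filtering densities p(z_n | x_1, ..., x_n)."""
    m, P = m0, P0
    means, covs = [], []
    for x in X:
        # Predictive density p(z_n | x_1, ..., x_{n-1}) (first Gaussian identity)
        m_pred = F @ m
        P_pred = F @ P @ F.T + R
        # Update with observation x_n (second Gaussian identity, Kalman-gain form)
        S = H @ P_pred @ H.T + Q              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
        m = m_pred + K @ (x - H @ m_pred)
        P = P_pred - K @ H @ P_pred
        means.append(m)
        covs.append(P)
    return np.array(means), np.array(covs)

# Toy test: a 1-D random walk observed with noise
rng = np.random.default_rng(0)
z = np.cumsum(rng.normal(0.0, 0.3, 100))
X = (z + rng.normal(0.0, 1.0, 100)).reshape(-1, 1)
means, covs = kalman_filter(X, F=np.eye(1), H=np.eye(1),
                            R=0.09 * np.eye(1), Q=np.eye(1),
                            m0=np.zeros(1), P0=np.eye(1))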

Multivariate Gaussian distribution

Let x be a D-dimensional Gaussian random vector. The density of x is defined as

N(x | µ, Σ) = (2π)^{-D/2} |Σ|^{-1/2} exp{ -(1/2) (x - µ)^T Σ^{-1} (x - µ) },

where µ ∈ R^D is the mean and Σ ∈ R^{D×D} is the covariance matrix.

Figure: a 2-dimensional Gaussian.
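
As a quick illustration (not part of the original slides), the log-density can be evaluated directly from this formula in NumPy and checked against scipy.stats.multivariate_normal:

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_log_density(x, mu, Sigma):
    """log N(x | mu, Sigma), computed directly from the formula above."""
    D = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)              # log |Sigma|
    quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
    return -0.5 * (D * np.log(2.0 * np.pi) + logdet + quad)

# Sanity check against scipy
mu = np.zeros(2)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
x = np.array([0.3, -1.2])
assert np.isclose(gaussian_log_density(x, mu, Sigma),
                  multivariate_normal(mu, Sigma).logpdf(x))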

Gaussian identities

Consider the following two Gaussian distributions:

p(x) = N(x | µ_x, Σ_xx),    p(y | x) = N(y | Ax + b, Λ).

The marginal p(y) is Gaussian with mean and covariance given by

µ_y = A µ_x + b,    Σ_yy = Λ + A Σ_xx A^T.

The posterior p(x | y) is Gaussian with mean and covariance equal to

µ_{x|y} = Σ_{x|y} { Σ_xx^{-1} µ_x + A^T Λ^{-1} (y - b) },    Σ_{x|y} = ( Σ_xx^{-1} + A^T Λ^{-1} A )^{-1}.

(For proofs see, for example, chapter 2 of Bishop, 2006.)
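
A small NumPy sketch (not from the slides) that implements the two identities and checks the marginal one by sampling; all names and test values are illustrative:

import numpy as np

def marginal_y(mu_x, Sigma_xx, A, b, Lam):
    """Mean and covariance of p(y), with p(x) = N(mu_x, Sigma_xx) and p(y|x) = N(Ax + b, Lam)."""
    return A @ mu_x + b, Lam + A @ Sigma_xx @ A.T

def posterior_x_given_y(mu_x, Sigma_xx, A, b, Lam, y):
    """Mean and covariance of p(x | y), from the second identity."""
    Sigma_xy = np.linalg.inv(np.linalg.inv(Sigma_xx) + A.T @ np.linalg.solve(Lam, A))
    mu_xy = Sigma_xy @ (np.linalg.solve(Sigma_xx, mu_x) + A.T @ np.linalg.solve(Lam, y - b))
    return mu_xy, Sigma_xy

# Monte Carlo sanity check of the marginal identity
rng = np.random.default_rng(0)
mu_x = np.array([1.0, -1.0]); Sigma_xx = np.array([[1.0, 0.3], [0.3, 0.5]])
A = np.array([[2.0, 0.0], [0.5, 1.0]]); b = np.array([0.1, 0.2]); Lam = 0.2 * np.eye(2)
xs = rng.multivariate_normal(mu_x, Sigma_xx, 200000)
ys = xs @ A.T + b + rng.multivariate_normal(np.zeros(2), Lam, 200000)
mu_y, Sigma_yy = marginal_y(mu_x, Sigma_xx, A, b, Lam)
print(np.allclose(ys.mean(0), mu_y, atol=0.05), np.allclose(np.cov(ys.T), Sigma_yy, atol=0.05))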

Exercise 4: Gaussian mixture model

Figure: Bayesian network for the Gaussian mixture model, with mixing proportions π, latent assignments z_n, observations x_n and component parameters µ, Σ (plate over N).

Define the Bayesian network and write down the log-likelihood. Show that the posterior q(z) is proportional to exp{ln p(x, z; θ)}. Write the expected complete-data log-likelihood. Derive the M step. Are there constraints to take into account? Download the wine data from the UCI repository and implement the algorithm (a sketch of EM is given below). Investigate how the different components move and change shape (by plotting ellipses) as a function of the number of iterations. (Consider only two dimensions.) Split the data into two sets. Train the algorithm on one and compute the log-likelihood of both sets. Plot their evolution as a function of the number of components. What do you observe?
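
A minimal EM sketch for the implementation part, in Python/NumPy (not from the slides; loading the wine data from the UCI repository, the ellipse plots and the train/test split are left to the reader, and the function name and regularisation constant are illustrative):

import numpy as np

def em_gmm(X, K, n_iter=100, seed=0):
    """EM for a K-component Gaussian mixture on data X of shape (N, D).
    Returns mixing proportions, means, covariances and per-iteration log-likelihoods."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]            # initialise means at random data points
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    log_liks = []
    for _ in range(n_iter):
        # E step: responsibilities r[n, k] = p(z_n = k | x_n; theta)
        log_r = np.empty((N, K))
        for k in range(K):
            diff = X - mu[k]
            _, logdet = np.linalg.slogdet(Sigma[k])
            quad = np.einsum('nd,nd->n', diff @ np.linalg.inv(Sigma[k]), diff)
            log_r[:, k] = np.log(pi[k]) - 0.5 * (D * np.log(2.0 * np.pi) + logdet + quad)
        log_norm = np.logaddexp.reduce(log_r, axis=1)
        log_liks.append(log_norm.sum())                 # log-likelihood of the data
        r = np.exp(log_r - log_norm[:, None])
        # M step: closed-form updates; the constraint is that pi stays on the simplex
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, Sigma, log_liks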

Solutions 1 and 2

1: Bishop, chapter 8, p. 362 (top). 2: Bishop, chapter 8, p. 379 (bottom) and p. 380 (top).

Solution 3

The factor graph is the same as for the HMM, but with a Gaussian parametric form. There is no decomposition (check d-separation). The forward recursion is the same as for the HMM. The first message is proportional to p(z_1 | x_1). Then:

µ_{f_2→z_2}(z_2) = p(x_2 | z_2) ∫ p(z_2, z_1 | x_1) dz_1 = p(x_2 | z_2) p(z_2 | x_1) ∝ p(z_2 | x_2, x_1),

and for arbitrary n,

µ_{f_n→z_n}(z_n) ∝ p(x_n | z_n) p(z_n | x_{n-1}, ..., x_1) ∝ p(z_n | x_n, ..., x_1).

The factor p(z_n | x_{n-1}, ..., x_1) is called the predictive density and follows from a direct application of the first Gaussian identity. The second density, p(z_n | x_n, ..., x_1), is found by completing the squares (second identity) and is called the filtering density. In practice you need to assume a Gaussian prior on the initial state.
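
To make the explicit form concrete (a sketch in the notation of Exercise 3, writing the previous filtering density as p(z_{n-1} | x_1, ..., x_{n-1}) = N(z_{n-1} | m_{n-1}, P_{n-1}); the symbols m_n, P_n, m_n^-, P_n^- and K_n are introduced here and are not on the slide):

Predictive density (first identity, with A = F and Λ = R):
p(z_n | x_1, ..., x_{n-1}) = N(z_n | m_n^-, P_n^-), where m_n^- = F m_{n-1} and P_n^- = F P_{n-1} F^T + R.

Filtering density (second identity, with A = H, Λ = Q and y = x_n; the Kalman-gain form follows from the matrix inversion lemma):
K_n = P_n^- H^T (H P_n^- H^T + Q)^{-1},
m_n = m_n^- + K_n (x_n - H m_n^-),
P_n = (I - K_n H) P_n^-.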