Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision. Part 2: Unsupervised Learning. Microsoft Research Cambridge. ECCV 2004, Prague. [Title figure: data points plotted in the (x1, x2, x3) space.]

Overview of Part 2: mixture models; EM; variational inference; Bayesian model complexity; continuous latent variables.

The Gaussian Distribution. Multivariate Gaussian: $\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left\{-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\mathrm{T}}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right\}$, with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. Maximum likelihood gives $\boldsymbol{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n$ and $\boldsymbol{\Sigma}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n-\boldsymbol{\mu}_{\mathrm{ML}})^{\mathrm{T}}$.
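A minimal NumPy sketch of these maximum-likelihood estimates (the function name is illustrative):

```python
import numpy as np

def gaussian_ml(X):
    """Maximum-likelihood mean and covariance for data X of shape (N, D)."""
    mu = X.mean(axis=0)                     # mu_ML = (1/N) sum_n x_n
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]      # Sigma_ML = (1/N) sum_n (x_n - mu)(x_n - mu)^T
    return mu, Sigma
```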

Gaussian Mixtures. Linear superposition of Gaussians: $p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)$. Normalization and positivity require $0 \le \pi_k \le 1$ and $\sum_{k=1}^{K}\pi_k = 1$.
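A short sketch of evaluating such a mixture density with SciPy (the component parameters are assumed to be supplied as sequences of weights, means and covariances):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k); weights must be non-negative and sum to one."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))
```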

Example: Mixture of 3 Gaussians. [Figure: two panels, (a) and (b), illustrating a mixture of three Gaussians on the unit square.]

Maximum Likelihood for the GMM. Log likelihood function: $\ln p(\mathbf{X}\mid\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln\left\{\sum_{k=1}^{K} \pi_k\,\mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)\right\}$. The sum over components appears inside the log, so there is no closed-form ML solution.

EM Algorithm Informal Derivation

EM Algorithm Informal Derivation. M step equations: $\boldsymbol{\mu}_k = \frac{1}{N_k}\sum_{n}\gamma(z_{nk})\,\mathbf{x}_n$, $\boldsymbol{\Sigma}_k = \frac{1}{N_k}\sum_{n}\gamma(z_{nk})(\mathbf{x}_n-\boldsymbol{\mu}_k)(\mathbf{x}_n-\boldsymbol{\mu}_k)^{\mathrm{T}}$, $\pi_k = \frac{N_k}{N}$, where $N_k = \sum_{n}\gamma(z_{nk})$.

EM Algorithm Informal Derivation. E step equation: $\gamma(z_{nk}) = \dfrac{\pi_k\,\mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)}{\sum_{j}\pi_j\,\mathcal{N}(\mathbf{x}_n\mid\boldsymbol{\mu}_j,\boldsymbol{\Sigma}_j)}$.

EM Algorithm Informal Derivation. We can interpret the mixing coefficients $\pi_k$ as prior probabilities of the components; the corresponding posterior probabilities (responsibilities) are the quantities $\gamma(z_{nk})$ computed in the E step.
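A compact NumPy sketch of these E and M steps (a didactic illustration rather than the tutorial's own code; it takes no special precautions against the collapsing-component problem discussed later):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """Fit a K-component Gaussian mixture to X (shape (N, D)) by EM."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                        # mixing coefficients
    mu = X[rng.choice(N, K, replace=False)]         # initialise means at random data points
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk) = pi_k N(x_n|mu_k,Sigma_k) / sum_j pi_j N(x_n|mu_j,Sigma_j)
        dens = np.stack([pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                         for k in range(K)], axis=1)            # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate parameters using the responsibilities
        Nk = gamma.sum(axis=0)                                  # effective number of points per component
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return pi, mu, Sigma, gamma
```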

Old Faithful Data Set. [Scatter plot: time between eruptions (minutes) vs. duration of eruption (minutes).]

Latent Variable View of EM. To sample from a Gaussian mixture: first pick one of the components $k$ with probability $\pi_k$, then draw a sample from that component; repeat these two steps for each new data point. [Figure (a): samples generated from the mixture.]
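A minimal NumPy sketch of this two-step (ancestral) sampling procedure (parameter names are illustrative):

```python
import numpy as np

def sample_gmm(n_samples, weights, means, covs, seed=0):
    """Draw samples by first choosing a component k ~ pi, then x ~ N(mu_k, Sigma_k)."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(weights), size=n_samples, p=weights)     # pick components
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in ks]), ks
```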

Latent Variable View of EM. Goal: given a data set, find the parameters of the mixture. Suppose we knew the colours: maximum likelihood would then involve fitting each component to the corresponding cluster. Problem: the colours are latent (hidden) variables.

Incomplete and Complete Data. [Figure: (a) complete data, with the component labels shown as colours; (b) incomplete data, with the labels unknown.]

Latent Variable Viewpoint

Latent Variable Viewpoint. Binary latent variables $z_{nk}$ describe which component generated each data point. Conditional distribution of the observed variable: $p(\mathbf{x}\mid\mathbf{z}) = \prod_{k} \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)^{z_k}$. Prior distribution of the latent variables: $p(\mathbf{z}) = \prod_{k} \pi_k^{z_k}$. Marginalizing over the latent variables we obtain $p(\mathbf{x}) = \sum_{\mathbf{z}} p(\mathbf{z})\,p(\mathbf{x}\mid\mathbf{z}) = \sum_{k} \pi_k\,\mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k)$.

Graphical Representation of GMM. [Graphical model: latent node $\mathbf{z}_n$ with an arrow to observed node $\mathbf{x}_n$, enclosed in a plate over $n = 1,\ldots,N$.]

Latent Variable View of EM. Suppose we knew the values of the latent variables: maximizing the complete-data log likelihood would give a trivial closed-form solution, fitting each component to its corresponding set of data points. We don't know the values of the latent variables; however, for given parameter values we can compute their expected values.

Posterior Probabilities (colour coded). [Figure: panels (a) and (b) showing the data points colour-coded according to their posterior responsibilities.]

Over-fitting in Gaussian Mixture Models. The likelihood function contains infinities when a component collapses onto a data point: with $\boldsymbol{\mu}_k = \mathbf{x}_n$, the term $\mathcal{N}(\mathbf{x}_n\mid\mathbf{x}_n,\sigma_k^2\mathbf{I}) \propto \sigma_k^{-D} \to \infty$ as $\sigma_k \to 0$. Also, maximum likelihood cannot determine the number $K$ of components.

Cross Validation. Model complexity can be selected using an independent validation data set. If data is scarce, use cross-validation: partition the data into S subsets, train on S − 1 subsets, test on the remainder, then repeat and average (see the sketch below). Disadvantages: computationally expensive, and it can only determine one or two complexity parameters.
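A hedged sketch of S-fold cross-validation over the number of components K, using scikit-learn's GaussianMixture (the data array X and the candidate values of K are assumed given):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def cv_select_K(X, K_values, n_splits=5, seed=0):
    """Return the held-out average log likelihood for each candidate K."""
    scores = {}
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for K in K_values:
        fold_scores = []
        for train_idx, test_idx in kf.split(X):
            gmm = GaussianMixture(n_components=K, random_state=seed).fit(X[train_idx])
            fold_scores.append(gmm.score(X[test_idx]))   # mean per-sample log likelihood
        scores[K] = np.mean(fold_scores)
    return scores
```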

Bayesian Mixture of Gaussians. Parameters and latent variables appear on an equal footing, with conjugate priors placed over the parameters. [Graphical model: $\mathbf{z}_n \to \mathbf{x}_n$ inside a plate over $N$, with the parameters as additional nodes.]

Data Set Size. Problem 1: learn a regression function from 100 (slightly) noisy examples; the data set is computationally small but statistically large. Problem 2: learn to recognize 1,000 everyday objects from 5,000,000 natural images; the data set is computationally large but statistically small. Bayesian inference is computationally more demanding than ML or MAP (but see the discussion of Gaussian mixtures later), with significant benefit for statistically small data sets.

Variational Inference. Exact Bayesian inference is intractable. Markov chain Monte Carlo is computationally expensive and raises issues of convergence. Variational inference is a broadly applicable, deterministic approximation: let $\mathbf{Z}$ denote all latent variables and parameters, approximate the true posterior $p(\mathbf{Z}\mid\mathbf{X})$ using a simpler distribution $q(\mathbf{Z})$, and minimize the Kullback-Leibler divergence $\mathrm{KL}(q\,\|\,p)$.

General View of Variational Inference. For an arbitrary distribution $q(\mathbf{Z})$, $\ln p(\mathbf{X}) = \mathcal{L}(q) + \mathrm{KL}(q\,\|\,p)$, where $\mathcal{L}(q) = \int q(\mathbf{Z})\,\ln\frac{p(\mathbf{X},\mathbf{Z})}{q(\mathbf{Z})}\,\mathrm{d}\mathbf{Z}$ and $\mathrm{KL}(q\,\|\,p) = -\int q(\mathbf{Z})\,\ln\frac{p(\mathbf{Z}\mid\mathbf{X})}{q(\mathbf{Z})}\,\mathrm{d}\mathbf{Z}$. Maximizing $\mathcal{L}(q)$ over all possible $q$ would give the true posterior, but this is intractable by definition.

Variational Lower Bound. Since $\mathrm{KL}(q\,\|\,p) \ge 0$, the quantity $\mathcal{L}(q)$ is a lower bound on the log marginal likelihood $\ln p(\mathbf{X})$.

Factorized Approximation. Goal: choose a family of q distributions which are sufficiently flexible to give a good approximation, yet sufficiently simple to remain tractable. Here we consider factorized distributions $q(\mathbf{Z}) = \prod_i q_i(\mathbf{Z}_i)$; no further assumptions are required. The optimal solution for one factor, keeping the remainder fixed, is $\ln q_j^{\star}(\mathbf{Z}_j) = \mathbb{E}_{i\neq j}\left[\ln p(\mathbf{X},\mathbf{Z})\right] + \text{const}$. These solutions are coupled, so initialize and then cyclically update; this gives a message-passing view (Winn and Bishop, 2004).

[Figure (a): illustration of the factorized approximation in the (x1, x2) plane.]

Lower Bound. The bound $\mathcal{L}(q)$ can also be evaluated explicitly. This is useful for verifying the mathematics and the code (the bound must increase at every update), and also for model comparison, since it approximates the log evidence of each candidate model.

Illustration: Univariate Gaussian. Likelihood function: $p(\mathcal{D}\mid\mu,\tau) = \prod_{n=1}^{N}\mathcal{N}(x_n\mid\mu,\tau^{-1})$. Conjugate (Gaussian-Gamma) prior: $p(\mu\mid\tau) = \mathcal{N}(\mu\mid\mu_0,(\lambda_0\tau)^{-1})$, $p(\tau) = \mathrm{Gam}(\tau\mid a_0,b_0)$. Factorized variational distribution: $q(\mu,\tau) = q_\mu(\mu)\,q_\tau(\tau)$.

Initial Configuration / After Updating / Converged Solution. [Figure, four panels (a)-(d) in the (µ, τ) plane: the initial factorized approximation, the approximation after successive updates of the individual factors, and the converged solution.]
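A minimal NumPy sketch of this coordinate-ascent scheme for the univariate Gaussian example (hyperparameter values are illustrative; the update for q(τ) includes the contribution from the prior on µ):

```python
import numpy as np

def vb_univariate_gaussian(x, mu0=0.0, lam0=1.0, a0=1e-3, b0=1e-3, n_iter=50):
    """Factorised variational inference q(mu, tau) = q(mu) q(tau) for a univariate
    Gaussian with unknown mean mu and precision tau, under a Gaussian-Gamma prior."""
    x = np.asarray(x, dtype=float)
    N, xbar = len(x), x.mean()
    E_tau = 1.0                                   # initial guess for E[tau]
    for _ in range(n_iter):
        # q(mu) = N(mu | mu_N, 1/lam_N)
        mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
        lam_N = (lam0 + N) * E_tau
        E_mu, E_mu2 = mu_N, mu_N**2 + 1.0 / lam_N
        # q(tau) = Gam(tau | a_N, b_N)
        a_N = a0 + 0.5 * (N + 1)
        b_N = b0 + 0.5 * (np.sum(x**2) - 2 * E_mu * np.sum(x) + N * E_mu2
                          + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
        E_tau = a_N / b_N
    return (mu_N, lam_N), (a_N, b_N)
```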

Variational Mixture of Gaussians. Assume a factorized posterior distribution $q(\mathbf{Z},\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda}) = q(\mathbf{Z})\,q(\boldsymbol{\pi},\boldsymbol{\mu},\boldsymbol{\Lambda})$; no other approximations are needed.

Variational Equations for GMM

Lower Bound for GMM
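In practice a variational Gaussian mixture of this kind is available off the shelf; a sketch using scikit-learn's BayesianGaussianMixture, which implements variational inference with conjugate priors (the synthetic data and parameter settings below are purely illustrative):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Synthetic two-cluster data, purely for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(200, 2))])

# Start with a deliberately generous number of components; the variational
# treatment drives the weights of unneeded components towards zero.
vb = BayesianGaussianMixture(n_components=10,
                             weight_concentration_prior_type="dirichlet_distribution",
                             max_iter=500, random_state=0).fit(X)
print(vb.weights_)        # effective mixing coefficients (many near zero)
print(vb.lower_bound_)    # variational lower bound at convergence
```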

VIBES Bishop, Spiegelhalter and Winn (2002)

ML Limit. If instead we choose a delta-function (point-estimate) distribution for the parameters, we recover the maximum likelihood EM algorithm.

Bound vs. K for Old Faithful Data

Bayesian Model Complexity

Sparse Bayes for Gaussian Mixture (Corduneanu and Bishop, 2001). Start with a large value of K, treat the mixing coefficients as parameters, and maximize the marginal likelihood; this prunes out excess components.

Summary: Variational Gaussian Mixtures. A simple modification of maximum likelihood EM code; small computational overhead compared to EM; no singularities; automatic model order selection.

Continuous Latent Variables. Conventional PCA: form the data covariance matrix $\mathbf{S} = \frac{1}{N}\sum_{n}(\mathbf{x}_n-\bar{\mathbf{x}})(\mathbf{x}_n-\bar{\mathbf{x}})^{\mathrm{T}}$ and take its eigenvector decomposition $\mathbf{S}\mathbf{u}_i = \lambda_i\mathbf{u}_i$. PCA minimizes the sum-of-squares projection error $\sum_n \|\mathbf{x}_n - \tilde{\mathbf{x}}_n\|^2$. [Figure: data points $\mathbf{x}_n$ and their projections $\tilde{\mathbf{x}}_n$ onto the principal direction $\mathbf{u}_1$ in the $(x_1, x_2)$ plane.] But it is not a probabilistic model, and how should we choose the latent dimensionality L?
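A minimal NumPy sketch of conventional PCA by eigendecomposition of the data covariance (the choice of L is left to the caller, which is exactly the difficulty noted above):

```python
import numpy as np

def pca(X, L):
    """Project X (shape (N, D)) onto its top-L principal components."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]                 # data covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :L]                # top-L eigenvectors u_1, ..., u_L
    return Xc @ U, U, eigvals[::-1]
```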

Probabilistic PCA (Tipping and Bishop, 1998). An L-dimensional continuous latent space $\mathbf{z}$ is mapped linearly into the D-dimensional data space: $\mathbf{x} = \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon}$, with isotropic Gaussian noise (factor analysis instead uses diagonal noise). [Figure: the latent direction $\mathbf{w}$ in the $(x_1, x_2)$ data plane, comparing PCA and factor analysis.]

Probabilistic PCA. Marginal distribution: $p(\mathbf{x}) = \mathcal{N}(\mathbf{x}\mid\boldsymbol{\mu},\,\mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I})$. [Graphical model: $\mathbf{z}_n \to \mathbf{x}_n$ in a plate over $N$, with parameter node $\mathbf{W}$.] Advantages: exact ML solution; computationally efficient EM algorithm; captures the dominant correlations with few parameters; extends to mixtures of PPCA and Bayesian PCA; a building block for more complex models.

EM for PCA. [Figure, panels (a)-(g): successive E and M steps of the EM algorithm applied to a two-dimensional data set, with the fitted principal direction converging over the iterations; axes run from -2 to 2.]
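A NumPy sketch of the EM algorithm for probabilistic PCA (taking the noise variance σ² → 0 recovers an EM algorithm for conventional PCA, which is what the sequence of plots above illustrates); the initialisation and iteration count are illustrative:

```python
import numpy as np

def em_ppca(X, L, n_iter=100, seed=0):
    """EM for probabilistic PCA: x = W z + mu + noise, z ~ N(0, I), noise ~ N(0, sigma2 I)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Xc = X - X.mean(axis=0)
    W = rng.standard_normal((D, L))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E step: posterior moments of the latent variables
        M = W.T @ W + sigma2 * np.eye(L)
        Minv = np.linalg.inv(M)
        Ez = Xc @ W @ Minv                            # rows are E[z_n]^T
        Ezz = N * sigma2 * Minv + Ez.T @ Ez           # sum_n E[z_n z_n^T]
        # M step: update W and the noise variance sigma^2
        W_new = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2) - 2 * np.sum((Xc @ W_new) * Ez)
                  + np.trace(Ezz @ W_new.T @ W_new)) / (N * D)
        W = W_new
    return W, sigma2
```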

Bayesian PCA (Bishop, 1998). A Gaussian prior is placed over the columns of $\mathbf{W}$, governed by automatic relevance determination (ARD) hyperparameters, so that unneeded latent dimensions are switched off. [Graphical model: $\mathbf{z}_n \to \mathbf{x}_n$ in a plate over $N$, with node $\mathbf{W}$. Figure: comparison of the $\mathbf{W}$ matrices found by ML PCA and Bayesian PCA.]

Non-linear Manifolds. Example: images of a rigid object sweep out a non-linear manifold in data space. [Figure: a curved manifold embedded in the $(x_1, x_2, x_3)$ space.]

Bayesian Mixture of BPCA Models. [Graphical model: latent variables $s_n$ and $\mathbf{z}_{nm}$ and observed $\mathbf{x}_n$ in a plate over $N$, with component parameters $\mathbf{W}_m$ in a plate over the $M$ components.]

Flexible Sprites (Jojic and Frey, 2001). Automatic decomposition of a video sequence into: a background model; an ordered set of masks (one per object per frame); and a foreground model (one per object per frame).

Transformed Component Analysis. The generative model now includes transformations (translations) of the sprites, and is extended to L layers. [Graphical model: per-layer sprite and mask variables $s_l$, $m_l$ with transformation variables $T_{nl}$, in plates over the $L$ layers and the $N$ observed images $\mathbf{x}_n$.] Inference is intractable, so a variational framework is used.

Bayesian Constellation Model (Li, Fergus and Perona, 2003). Object recognition from small training sets; a variational treatment of a fully Bayesian model.

Bayesian Constellation Model

Summary of Part 2. Discrete and continuous latent variables; the EM algorithm; building complex models from simple components, represented graphically and incorporating prior knowledge; variational inference; Bayesian model comparison.