Learning to Disentangle Factors of Variation with Manifold Learning


Learning to Disentangle Factors of Variation with Manifold Learning
Scott Reed, Kihyuk Sohn, Yuting Zhang, Honglak Lee
University of Michigan, Department of Electrical Engineering and Computer Science
08 May 2015
Presented by: Kyle Ulrich

Introduction
It is a challenge to separate the factors of variation that combine to generate observations, e.g., for face images: pose, expression, illumination, identity, etc.
This work considers each factor of variation as forming a sub-manifold; observations are formed from the interactions between these sub-manifolds.
Furthermore, additional strategies may help to disentangle the factors of variation, e.g.,
- taking label information into account
- enforcing known similarities/differences on the sub-manifold

Example
We could be interested in supporting queries such as: given a fixed identity, provide the same face image in a different pose.

Review of Restricted Boltzmann Machines
The RBM is a bipartite graphical model with visible units v \in \{0, 1\}^D and hidden units h \in \{0, 1\}^K. The joint distribution is

P(v, h) = \frac{1}{Z} \exp(-E(v, h)),   (1)

E(v, h) = -\sum_{i=1}^{D} \sum_{k=1}^{K} v_i W_{ik} h_k - \sum_{k=1}^{K} b_k h_k - \sum_{i=1}^{D} c_i v_i

with conditional distributions

P(v_i = 1 \mid h) = \sigma\big(\sum_k W_{ik} h_k + c_i\big),   P(h_k = 1 \mid v) = \sigma\big(\sum_i W_{ik} v_i + b_k\big)   (2)

Contrastive divergence (CD) is often used to approximate gradients for \Theta = \{W, b, c\}.
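
As a concrete illustration of the conditionals in Eq. (2) and the CD gradient approximation, here is a minimal NumPy sketch of one CD-1 update. The array shapes, variable names, and learning rate are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.01, rng=np.random.default_rng(0)):
    """One CD-1 update. v0: (N, D) binary data, W: (D, K), b: (K,), c: (D,)."""
    ph0 = sigmoid(v0 @ W + b)                        # P(h=1 | v) on the data (Eq. 2)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)
    pv1 = sigmoid(h0 @ W.T + c)                      # reconstruct the visibles
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + b)                        # hidden probabilities on the reconstruction
    N = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / N          # positive minus negative phase
    b += lr * (ph0 - ph1).mean(axis=0)
    c += lr * (v0 - v1).mean(axis=0)
    return W, b, c
```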

The disentangling Boltzmann machine (disBM)
The disBM models higher-order interactions between observations and multiple groups of hidden units.
For now, consider two groups of hidden units, h and m.

The disentangling Boltzmann machine (disBM)
Consider the disBM model with visible units v \in \{0, 1\}^D and two groups of hidden units, h \in \{0, 1\}^K and m \in \{0, 1\}^L.
The energy function is defined as

E(v, m, h) = -\sum_f \Big(\sum_i W^v_{if} v_i\Big)\Big(\sum_j W^m_{jf} m_j\Big)\Big(\sum_k W^h_{kf} h_k\Big) - \sum_{ij} P^m_{ij} v_i m_j - \sum_{ik} P^h_{ik} v_i h_k   (3)

such that the weight tensor W \in \mathbb{R}^{D \times L \times K} has F factors,

W_{ijk} = \sum_{f=1}^{F} W^v_{if} W^m_{jf} W^h_{kf}   (4)

Two-way interactions are allowed between the visible units and each hidden group through P^m and P^h.
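
To make the factored energy concrete, here is a minimal NumPy sketch of Eqs. (3)-(4); the parameter names Wv, Wm, Wh, Pm, Ph and their shapes are assumptions matching the notation above.

```python
import numpy as np

def disbm_energy(v, m, h, Wv, Wm, Wh, Pm, Ph):
    """Energy of Eq. (3) with the factored tensor of Eq. (4).
    v: (D,), m: (L,), h: (K,), Wv: (D, F), Wm: (L, F), Wh: (K, F),
    Pm: (D, L), Ph: (D, K)."""
    # Factored 3-way term: -sum_f (sum_i Wv_if v_i)(sum_j Wm_jf m_j)(sum_k Wh_kf h_k)
    three_way = -np.sum((v @ Wv) * (m @ Wm) * (h @ Wh))
    # Pairwise terms: -sum_ij Pm_ij v_i m_j - sum_ik Ph_ik v_i h_k
    two_way = -(v @ Pm @ m) - (v @ Ph @ h)
    return three_way + two_way
```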

Conditional Independence between Groups
Hidden units are no longer conditionally independent given the visible units.
However, each group is conditionally independent given all other groups:

P(v_i = 1 \mid h, m) = \sigma\big(\sum_{jk} W_{ijk} m_j h_k + \sum_j P^m_{ij} m_j + \sum_k P^h_{ik} h_k\big)   (5)
P(m_j = 1 \mid v, h) = \sigma\big(\sum_{ik} W_{ijk} v_i h_k + \sum_i P^m_{ij} v_i\big)   (6)
P(h_k = 1 \mid v, m) = \sigma\big(\sum_{ij} W_{ijk} v_i m_j + \sum_i P^h_{ik} v_i\big)   (7)

This allows for efficient 3-way block Gibbs sampling.
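
A sketch of one round of the 3-way block Gibbs sampler implied by Eqs. (5)-(7), using the factored parameterization so the full tensor W never has to be materialized; the names and shapes follow the sketch above and are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_round(v, m, h, Wv, Wm, Wh, Pm, Ph, rng=np.random.default_rng(0)):
    """One sweep of 3-way block Gibbs sampling (Eqs. 5-7)."""
    # P(v | m, h): sum_jk W_ijk m_j h_k = Wv @ ((m @ Wm) * (h @ Wh))
    pv = sigmoid(Wv @ ((m @ Wm) * (h @ Wh)) + Pm @ m + Ph @ h)
    v = (rng.random(pv.shape) < pv).astype(float)
    # P(m | v, h): sum_ik W_ijk v_i h_k = Wm @ ((v @ Wv) * (h @ Wh))
    pm = sigmoid(Wm @ ((v @ Wv) * (h @ Wh)) + Pm.T @ v)
    m = (rng.random(pm.shape) < pm).astype(float)
    # P(h | v, m): sum_ij W_ijk v_i m_j = Wh @ ((v @ Wv) * (m @ Wm))
    ph = sigmoid(Wh @ ((v @ Wv) * (m @ Wm)) + Ph.T @ v)
    h = (rng.random(ph.shape) < ph).astype(float)
    return v, m, h
```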

Inference
Variational inference is used to approximate the true posterior with the factorized distribution

Q(m, h) = \prod_j Q(m_j) \prod_k Q(h_k)

Minimizing KL(Q(m, h) \| P(m, h \mid v)) yields the fixed-point equations

\hat{h}_k = \sigma\big(\sum_{ij} W_{ijk} v_i \hat{m}_j + \sum_i P^h_{ik} v_i\big)   (8)
\hat{m}_j = \sigma\big(\sum_{ik} W_{ijk} v_i \hat{h}_k + \sum_i P^m_{ij} v_i\big)   (9)

where \hat{h}_k = Q(h_k = 1) and \hat{m}_j = Q(m_j = 1).
Alternate updating \hat{h} and \hat{m} until convergence.
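
A minimal sketch of the alternating mean-field updates of Eqs. (8)-(9), again in the factored form; the initialization at 0.5, the iteration cap, and the tolerance are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(v, Wv, Wm, Wh, Pm, Ph, n_iters=25, tol=1e-6):
    """Alternate Eqs. (8)-(9) until the posterior means stop changing."""
    m_hat = np.full(Wm.shape[0], 0.5)           # Q(m_j = 1)
    h_hat = np.full(Wh.shape[0], 0.5)           # Q(h_k = 1)
    pv = v @ Wv                                 # projection of v shared by both updates
    for _ in range(n_iters):
        h_new = sigmoid(Wh @ (pv * (m_hat @ Wm)) + Ph.T @ v)   # Eq. (8)
        m_new = sigmoid(Wm @ (pv * (h_new @ Wh)) + Pm.T @ v)   # Eq. (9)
        delta = max(np.abs(h_new - h_hat).max(), np.abs(m_new - m_hat).max())
        h_hat, m_hat = h_new, m_new
        if delta < tol:
            break
    return m_hat, h_hat
```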

Learning
The model is trained to maximize the data log-likelihood using stochastic gradient descent.
The gradient of the log-likelihood with respect to the parameters \Theta = \{W^v, W^m, W^h, P^m, P^h\} is

\frac{\partial \log P(v)}{\partial \theta} = -\mathbb{E}_{P(m,h \mid v)}\Big[\frac{\partial E(v, m, h)}{\partial \theta}\Big] + \mathbb{E}_{P(v,m,h)}\Big[\frac{\partial E(v, m, h)}{\partial \theta}\Big]

The first term (the data-dependent expectation) can be approximated using variational inference.
The second term (the model expectation) can be approximated with persistent CD using 3-way sampling.
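
As a sketch of how the two phases combine for one parameter group (the visible factor matrix W^v), building on the mean_field and gibbs_round helpers sketched above; the exact update schedule is an assumption, not the authors' training code.

```python
import numpy as np

def grad_Wv(v_data, chain_state, Wv, Wm, Wh, Pm, Ph):
    """Stochastic gradient estimate of log P(v) w.r.t. Wv for one example.
    chain_state = (v, m, h) of the persistent negative chain."""
    # Positive phase: data-dependent expectation via the mean-field posterior means
    m_hat, h_hat = mean_field(v_data, Wv, Wm, Wh, Pm, Ph)
    # -dE/dWv_if = v_i * (sum_j Wm_jf m_j) * (sum_k Wh_kf h_k)
    pos = np.outer(v_data, (m_hat @ Wm) * (h_hat @ Wh))
    # Negative phase: advance the persistent chain by one 3-way Gibbs sweep
    v_neg, m_neg, h_neg = gibbs_round(*chain_state, Wv, Wm, Wh, Pm, Ph)
    neg = np.outer(v_neg, (m_neg @ Wm) * (h_neg @ Wh))
    return pos - neg, (v_neg, m_neg, h_neg)   # gradient and updated chain state
```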

Disentangling
It is desired to disentangle the factors of variation, i.e., each group of hidden units is sensitive to changes in a single factor of variation and remains relatively invariant to changes in the others.
Several disentangling strategies are proposed for the disBM:
- Partial labels
- Clamping
- Manifold-based training

Disentangling: Partial Labels
Labels may be provided for any group of hidden units.
Here, label units e are connected to the hidden units m.
The energy function is augmented such that

E_{label}(v, m, h, e) = E(v, m, h) - \sum_{jl} m_j U_{jl} e_l   (10)

Disentangling: Partial Labels
The energy function is augmented such that

E_{label}(v, m, h, e) = E(v, m, h) - \sum_{jl} m_j U_{jl} e_l   (10)

where \sum_l e_l = 1 (e is one-hot).
Variational inference updates proceed according to the equations

\hat{h}_k = \sigma\big(\sum_{ij} W_{ijk} v_i \hat{m}_j + \sum_i P^h_{ik} v_i\big)   (11)
\hat{m}_j = \sigma\big(\sum_{ik} W_{ijk} v_i \hat{h}_k + \sum_i P^m_{ij} v_i + \sum_l U_{jl} \hat{e}_l\big)   (12)
\hat{e}_l = \frac{\exp(\sum_j U_{jl} \hat{m}_j)}{\sum_{l'} \exp(\sum_j U_{jl'} \hat{m}_j)}   (13)
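
A short sketch of the label-group updates in Eqs. (12)-(13): since e is one-hot, its posterior is a softmax driven by the current \hat{m}. The label weight matrix U and the shapes are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def update_e(m_hat, U):
    """Eq. (13): e_hat_l proportional to exp(sum_j U_jl m_hat_j). U: (L, n_labels)."""
    return softmax(U.T @ m_hat)

def update_m_with_labels(v, h_hat, e_hat, Wv, Wm, Wh, Pm, U):
    """Eq. (12): the usual input to m plus the top-down label term U e_hat."""
    return sigmoid(Wm @ ((v @ Wv) * (h_hat @ Wh)) + Pm.T @ v + U @ e_hat)
```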

Disentangling: Clamping
Perhaps it is known that two data points match in some factor of variation (e.g., images of the same person).
The hidden units h may be clamped (shared) across this pair by:

E_{clamp}(v^{(1)}, v^{(2)}, m^{(1)}, m^{(2)}, h) = E(v^{(1)}, m^{(1)}, h) + E(v^{(2)}, m^{(2)}, h)   (11)

The fixed-point equations are adjusted such that

\hat{h}_k = \sigma\big(\sum_{ij} W_{ijk} v^{(1)}_i \hat{m}^{(1)}_j + \sum_i P^h_{ik} v^{(1)}_i + \sum_{ij} W_{ijk} v^{(2)}_i \hat{m}^{(2)}_j + \sum_i P^h_{ik} v^{(2)}_i\big)   (12)

Note: labels may be included simultaneously on another group of hidden units.
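
A sketch of the shared-h update in Eq. (12) of this slide: because h is clamped across the pair, its pre-activation simply sums the evidence from both images. Names follow the earlier sketches and are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_h_clamped(v1, m1, v2, m2, Wv, Wm, Wh, Ph):
    """Mean-field update for h shared between the pair (v1, v2)."""
    pre = (Wh @ ((v1 @ Wv) * (m1 @ Wm)) + Ph.T @ v1     # evidence from image 1
           + Wh @ ((v2 @ Wv) * (m2 @ Wm)) + Ph.T @ v2)  # evidence from image 2
    return sigmoid(pre)
```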

Disentangling: Manifold-Based Training
Perhaps clamping is too strong an assumption: clamping forces pairs to the same point on the manifold and does not exploit non-correspondence.
Another method is to learn a representation for h such that

\|h^{(1)} - h^{(2)}\|_2^2 \approx 0 if (v^{(1)}, v^{(2)}) \in D_{sim}
\|h^{(1)} - h^{(3)}\|_2^2 \ge \beta if (v^{(1)}, v^{(3)}) \in D_{dis}

The training objective is augmented with the manifold objective

\|h^{(1)} - h^{(2)}\|_2^2 + \max(0, \beta - \|h^{(1)} - h^{(3)}\|_2)^2   (13)

Gradients may be computed by backpropagating through the recurrent fixed-point updates, as in an RNN.
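
A minimal sketch of the contrastive manifold term above: pull the representations of a similar pair together and push a dissimilar pair at least \beta apart; the value of beta is an assumption.

```python
import numpy as np

def manifold_loss(h1, h2, h3, beta=1.0):
    """h1, h2 come from a similar pair; h1, h3 from a dissimilar pair."""
    pull = np.sum((h1 - h2) ** 2)                         # ||h1 - h2||^2 -> 0
    push = max(0.0, beta - np.linalg.norm(h1 - h3)) ** 2  # hinge on ||h1 - h3||
    return pull + push
```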

Experiments: Flipped MNIST Digits
A random 50% of digits in the MNIST dataset had all their pixel values flipped (sketched below).
A disBM model was trained with two groups: a single flip unit and appearance units.
The flip mode was successfully disentangled.
A linear SVM was trained on the appearance units for classification.
[Figure: samples from the flipped MNIST dataset; Table: test classification errors]
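
A small sketch of the data preparation described above (flipping the pixels of a random half of the digits); the array layout is an assumption.

```python
import numpy as np

def flip_half(x, rng=np.random.default_rng(0)):
    """x: (N, 784) array of binary MNIST pixels. Returns flipped data and flip labels."""
    flip = rng.random(x.shape[0]) < 0.5       # pick ~50% of the digits
    x = x.copy()
    x[flip] = 1 - x[flip]                     # invert every pixel of the chosen digits
    return x, flip.astype(int)
```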

Experiments: Toronto Face Database (TFD)
Contains 112,234 face images with 7 possible emotion labels and 3,874 identity labels.
Given an input identity, the disBM can traverse the expression manifold (see the sketch below):
- Fix the identity units h and label units e
- Perform Gibbs sampling between v and m
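
A sketch of this traversal procedure, building on the Gibbs updates from the earlier sketches: the identity units h (and, if present, the label units e) stay fixed while v and m are resampled; the step count is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def traverse_expression(v0, h_fixed, Wv, Wm, Wh, Pm, Ph, n_steps=100,
                        rng=np.random.default_rng(0)):
    """Clamp h and run block Gibbs between v and m to walk the expression manifold."""
    v, samples = v0.copy(), []
    for _ in range(n_steps):
        # P(m | v, h) with h held fixed
        pm = sigmoid(Wm @ ((v @ Wv) * (h_fixed @ Wh)) + Pm.T @ v)
        m = (rng.random(pm.shape) < pm).astype(float)
        # P(v | m, h) with h held fixed
        pv = sigmoid(Wv @ ((m @ Wm) * (h_fixed @ Wh)) + Pm @ m + Ph @ h_fixed)
        v = (rng.random(pv.shape) < pv).astype(float)
        samples.append(v)
    return samples
```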

Experiments: CMU Multi-PIE
Contains 754,200 face images with variations in pose, lighting, and expression.
Given an input identity, the disBM can traverse the pose manifold:
- Fix the identity units h and label units e
- Perform Gibbs sampling between v and m

Experiments: TFD & Multi-PIE
Identities in the left column are transferred to the expressions and poses of the middle column.

Experiments: TFD
Performance on TFD for emotion recognition and face verification:
- Emotion recognition: a linear SVM is trained; % accuracy is reported
- Face verification: cosine similarity is used as a score; AUC is reported
Comparisons are made to other methods and among the different proposed disentangling methods.