Deep Learning Autoencoder Models


Deep Learning Autoencoder Models
Davide Bacciu, Dipartimento di Informatica, Università di Pisa
Intelligent Systems for Pattern Recognition (ISPR)

Generative Models Wrap-up: Generative Graphical Models
- Bayesian models/nets: unsupervised data understanding and interpretability, but weak on supervised performance
- Markov Random Fields: knowledge and constraints expressed through feature functions; CRFs as the supervised way to be generative; computationally heavy
- Dynamic models: the topology unfolds on the data structure; structured data processing; complex causal relationships

Module's Take Home Messages
Consider using generative models when:
- You need interpretability
- You need to incorporate prior knowledge
- Learning is unsupervised, or supervision is only partially observable
- You need reusable/portable learned knowledge
Consider avoiding generative models when:
- You have tight computational constraints
- You are dealing with raw, noisy, low-level data
Variational inference: an efficient way to learn an approximation to an intractable distribution; the variational function can be a neural network.

Deep Learning
[Diagram: the path from hard-coded expert reasoning (AI), to expert-designed features plus a trainable predictor (ML), to learned features (ANN), to a learned feature hierarchy (Deep Learning).]

Module Outline
- Recurrent, recursive and contextual models (Micheli): recurrent NN training refresher, recursive NN, graph processing
- Foundational models: autoencoders and RBMs
- Convolutional Neural Networks
- Gated recurrent networks (LSTM, ...)
- Advanced topics: adversarial models, memory networks and attentional models; variational deep learning; deep reinforcement learning
- API and applications seminars

Reference Book
Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, MIT Press (chapters 6-10, 14, 20), integrated by course slides and additional readings. Freely available online.

Module's Prerequisites
- Formal model of the neuron
- Neural networks: feed-forward and recurrent
- Cost function optimization: backpropagation/SGD, regularization
- Neural network hyper-parameters and model selection

Lecture Outline
- Autoencoders, a.k.a. the first and the latest deep learning model
- Autoencoders and dimensionality reduction
- Deep neural autoencoders: sparse, denoising, contractive
- Deep generative-based autoencoders: Deep Belief Networks, Deep Boltzmann Machines
- Application examples

Basic Autoencoder (AE)
Train a model to reconstruct its own input x (a latent space projection, again), passing through some form of information bottleneck: either K << D, or a sparsely active h.
- Encoder f: input x (dimension D) -> hidden code h (dimension K)
- Decoder g: hidden code h -> output x̃ (dimension D)
Train by loss minimization: L(x, x̃) = L(x, g(f(x)))
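To make the bottleneck idea concrete, here is a minimal sketch of such an autoencoder in PyTorch; the sizes D and K, the tanh nonlinearity and the squared-error loss are illustrative assumptions, not choices prescribed by the slide.

```python
# Minimal bottleneck autoencoder: reconstruct x from a K-dimensional code, K << D.
import torch
import torch.nn as nn

D, K = 784, 32          # input size and code size (assumed values)

encoder = nn.Sequential(nn.Linear(D, K), nn.Tanh())   # h = f(x)
decoder = nn.Linear(K, D)                              # x_hat = g(h)

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, D)      # a batch of stand-in inputs in place of real training data
h = encoder(x)             # project into the K-dimensional latent space
x_hat = decoder(h)         # reconstruct the input from the code
loss = loss_fn(x_hat, x)   # L(x, g(f(x)))
opt.zero_grad()
loss.backward()
opt.step()
```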

A Very Well Known Autoencoder
Encoding-decoding with linear maps: h = f(x) = W_e x and x̃ = g(h) = W_d W_e x.
Tied weights (often, not always): W_d = W_e^T = W^T.
Euclidean loss: L(x, x̃) = ||x - W^T W x||_2^2.
What if we take f and g linear and K << D? The AE learns the same subspace as PCA.
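A hedged sketch of the tied-weight linear case: a single matrix W plays both roles, so the reconstruction is W^T W x, and with the Euclidean loss and K << D training recovers the same subspace as PCA. Sizes and learning rate below are illustrative assumptions.

```python
# Tied-weight linear autoencoder: h = W x, x_hat = W^T W x, loss = ||x - W^T W x||_2^2.
import torch

D, K = 784, 32                                     # illustrative sizes, K << D
W = (0.01 * torch.randn(K, D)).requires_grad_()    # single shared encoder/decoder matrix
opt = torch.optim.SGD([W], lr=1e-2)

x = torch.rand(64, D)                              # stand-in data batch
h = x @ W.t()                                      # encoding:  h = W_e x,  with W_e = W
x_hat = h @ W                                      # decoding:  x_hat = W_d h,  with W_d = W^T
loss = ((x - x_hat) ** 2).sum(dim=1).mean()        # Euclidean reconstruction loss
opt.zero_grad(); loss.backward(); opt.step()
```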

A Probabilistic View
Stochastic autoencoder: an encoding distribution P_e(h|x) and a decoding distribution P_d(x̃|h).
A unifying view of neural and generative AEs; it paves the way to Variational Autoencoders.

Neural Autoencoders
Generally, we would like to train nonlinear AEs, possibly with K > D, that do not learn the trivial identity. Regularized autoencoders:
- Sparse AE
- Denoising AE
- Contractive AE
- Autoencoders with dropout layers

Sparse Autoencoder
Add a term to the cost function that penalizes h (we want the number of active units to be small):
J_SAE(θ) = Σ_{x∈S} ( L(x, x̃) + λ Ω(h) )
Typically Ω(h) = Ω(f(x)) = Σ_j |h_j(x)|
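A minimal sketch of the sparse objective; the sigmoid encoder, the overcomplete code size and the value of λ are assumptions made only for illustration.

```python
# Sparse autoencoder objective: J_SAE = L(x, x_hat) + lambda * sum_j |h_j(x)|.
import torch
import torch.nn as nn

D, K, lam = 784, 1024, 1e-3          # overcomplete code (K > D) is fine once h is penalized
encoder = nn.Sequential(nn.Linear(D, K), nn.Sigmoid())
decoder = nn.Linear(K, D)

x = torch.rand(64, D)                # stand-in data batch
h = encoder(x)
x_hat = decoder(h)
recon = ((x - x_hat) ** 2).sum(dim=1).mean()     # L(x, x_hat)
sparsity = h.abs().sum(dim=1).mean()             # Omega(h) = sum_j |h_j(x)|
loss = recon + lam * sparsity                    # J_SAE(theta) for this batch
```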

Probabilistic Interpretation (Oh No, Again!)
(From the ML course) Training with regularization is MAP inference:
max log P(h, x) = max ( log P(x|h) + log P(h) )
where log P(x|h) is the likelihood and P(h) = (λ/2) exp(-λ ||h||_1) is a Laplace prior, yielding Ω(h) = λ ||h||_1.
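Written out, the step from the Laplace prior to the L1 penalty (a standard derivation added here for clarity, not taken verbatim from the slide):

```latex
-\log P(h) \;=\; -\log\!\Big(\tfrac{\lambda}{2}\, e^{-\lambda \lVert h\rVert_1}\Big)
            \;=\; \lambda\,\lVert h\rVert_1 \;-\; \log\tfrac{\lambda}{2}
```

so maximizing log P(x|h) + log P(h) is, up to an additive constant, the same as minimizing the reconstruction loss plus Ω(h) = λ ||h||_1.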

Denoising Autoencoder (DAE)
Train the AE to minimize L(x, g(f(x̃))), where x̃ is a version of the original input x corrupted by some noise process C(x̃|x).
Key intuition: learned representations should be robust to partial destruction of the input.
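A sketch of one DAE training step, using additive Gaussian corruption as the noise process C(x̃|x); masking noise would work the same way, and the noise level σ below is an arbitrary illustrative value.

```python
# Denoising autoencoder step: corrupt the input, reconstruct it, and compare the
# reconstruction against the *clean* input: minimize L(x, g(f(x_tilde))).
import torch
import torch.nn as nn

D, K, sigma = 784, 128, 0.3
encoder = nn.Sequential(nn.Linear(D, K), nn.ReLU())
decoder = nn.Linear(K, D)

x = torch.rand(64, D)                        # stand-in clean batch
x_tilde = x + sigma * torch.randn_like(x)    # C(x_tilde | x): Gaussian corruption
x_hat = decoder(encoder(x_tilde))            # g(f(x_tilde))
loss = ((x - x_hat) ** 2).sum(dim=1).mean()  # note: the target is x, not x_tilde
```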

Another Interpretation (yes, exactly the one you are thinking of)
The DAE learns the denoising distribution P(x|x̃), by minimizing the expected negative log-likelihood
E_{x̃ ~ C(x̃|x)} [ -log P_d(x | h = f(x̃)) ]

DAE as Manifold Learning
The DAE learns a vector field (the green arrows in the slide figure) approximating the gradient of the unknown data-generating distribution, ∂ log p(x) / ∂x, when trained with Gaussian corruption C(x̃|x) = N(x̃; x, σ^2 I).
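Stated as an equation (a known result for DAEs trained with small Gaussian corruption, cf. Alain and Bengio, JMLR 2014; added here to make the "vector field" claim precise): the reconstruction displacement approximates the score of the data density,

```latex
g\big(f(\tilde{x})\big) - \tilde{x} \;\approx\; \sigma^2\,\frac{\partial \log p(\tilde{x})}{\partial \tilde{x}},
\qquad C(\tilde{x}\mid x) = \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I).
```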

The Manifold Assumption
- Assume data lies on a lower-dimensional non-linear manifold, since the variables in the data are typically dependent
- Regularized AEs can afford to represent only the variations that are needed to reconstruct training examples
- The AE mapping is sensitive only to changes along the manifold directions
Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2009.

Contractive Autoencoder
Penalize the encoding function for sensitivity to the input:
J_CAE(θ) = Σ_{x∈S} ( L(x, x̃) + λ Ω(h) )
Ω(h) = Ω(f(x)) = || ∂f(x)/∂x ||_F^2
You can just as well penalize higher-order derivatives.
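A sketch of the contractive penalty for a sigmoid encoder h = σ(Wx + b), for which the squared Frobenius norm of the Jacobian has the closed form Σ_j (h_j(1-h_j))^2 ||W_j||^2, with W_j the j-th row of W. This is one common way to compute the CAE penalty; the sizes and λ below are illustrative assumptions.

```python
# Contractive autoencoder: reconstruction loss + lambda * ||d f(x)/d x||_F^2.
# For a sigmoid encoder the Jacobian penalty has a cheap closed form.
import torch
import torch.nn as nn

D, K, lam = 784, 128, 0.1
enc_lin = nn.Linear(D, K)
decoder = nn.Linear(K, D)

x = torch.rand(64, D)                              # stand-in data batch
h = torch.sigmoid(enc_lin(x))                      # h = f(x)
x_hat = decoder(h)
recon = ((x - x_hat) ** 2).sum(dim=1).mean()

dh = (h * (1.0 - h)) ** 2                          # (h_j (1 - h_j))^2, shape (batch, K)
w_sq = (enc_lin.weight ** 2).sum(dim=1)            # ||W_j||^2 per hidden unit, shape (K,)
jac_penalty = (dh * w_sq).sum(dim=1).mean()        # ||df(x)/dx||_F^2, averaged over the batch

loss = recon + lam * jac_penalty                   # J_CAE(theta) for this batch
```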

Hierarchical Autoencoder
Stack several hidden layers (x -> h1 -> h2 -> h3 -> h4), combining unsupervised training of the stack with supervised learning on top. It extracts a representation of the inputs that facilitates:
- Data visualization, exploration, indexing, ...
- Realization of a supervised task

Unsupervised Layerwise Pretraining
Incremental unsupervised construction of the deep AE, using any form of AE (e.g. those shown in the previous slides):
1. Train a first autoencoder on x: encode into h1 with W1, decode with W1^T to reconstruct x.
2. Freeze W1, compute the codes h1 and train a second autoencoder on them: encode into h2 with W2, decode with W2^T to reconstruct h1.
3. Repeat for the higher layers (W3, W4, ...), each time training on the codes produced by the previously trained layer.

Optional Fine Tuning
Fine-tune the whole autoencoder (the encoder stack W1...W4 and the mirrored decoder stack W4^T...W1^T) to optimize input reconstruction. You can use backpropagation, but it remains an unsupervised task.
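A compact sketch of the procedure from the last few slides: each layer is trained greedily as a single-layer autoencoder on the codes produced by the previous one, then the stack is unrolled into a deep autoencoder and, optionally, fine-tuned end to end on reconstruction. Layer sizes, epoch counts and optimizers are illustrative assumptions.

```python
# Greedy layer-wise pretraining of a deep autoencoder, followed by optional fine-tuning.
import torch
import torch.nn as nn

sizes = [784, 256, 64, 16]            # x -> h1 -> h2 -> h3 (illustrative sizes)
data = torch.rand(512, sizes[0])      # stand-in training set

encoders, decoders = [], []
inputs = data
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(d_out, d_in), nn.Sigmoid())
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(100):              # pretrain this single-layer AE on the current codes
        x_hat = dec(enc(inputs))
        loss = ((inputs - x_hat) ** 2).sum(dim=1).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    encoders.append(enc); decoders.append(dec)
    inputs = enc(inputs).detach()     # codes become the next layer's training data

# Unroll into a deep AE (encoder stack + mirrored decoder stack) and fine-tune it.
deep_ae = nn.Sequential(*encoders, *reversed(decoders))
opt = torch.optim.Adam(deep_ae.parameters(), lr=1e-4)
for _ in range(100):                  # still an unsupervised task, trained by backprop
    x_hat = deep_ae(data)
    loss = ((data - x_hat) ** 2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```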

Rearranging the Graphics
Does it look like something familiar? Rearranged, the stack of encoding layers looks like a layered Restricted Boltzmann Machine: we can use RBMs to perform the layerwise pretraining and learn the matrices W_i.

Deep Belief Network (DBN)
A stack of pairwise RBMs: RBM 1 between x and h1, RBM 2 between h1 and h2, and so on up to RBM 4.
IMPORTANT NOTE: a DBN is a deep autoencoder, but it is NOT a deep RBM; it is (mostly) directed!

Deep Boltzmann Machine (DBM)
How do we get this? Training requires some attention because of the recurrent interactions from the higher layers back to the bottom:
P(h^1_j = 1 | x, h^2) = σ( Σ_i W^1_ij x_i + Σ_m W^2_jm h^2_m )
P(x_i = 1 | h^1) = σ( Σ_j W^1_ij h^1_j )
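A small sketch of these two conditionals for a binary DBM with two hidden layers: unit j in h^1 pools bottom-up input through W^1 and top-down input through W^2, which is exactly the recurrent interaction that makes DBM training delicate. Sizes and the random parameters are illustrative, and biases are omitted.

```python
# Conditional activations in a 2-layer binary DBM (biases omitted for brevity).
import torch

D, K1, K2 = 784, 256, 64
W1 = 0.01 * torch.randn(D, K1)        # visible <-> first hidden layer
W2 = 0.01 * torch.randn(K1, K2)       # first <-> second hidden layer

x = (torch.rand(D) < 0.5).float()     # a binary visible vector
h2 = (torch.rand(K2) < 0.5).float()   # current state of the top hidden layer

# P(h1_j = 1 | x, h2): bottom-up and top-down contributions are summed.
p_h1 = torch.sigmoid(x @ W1 + h2 @ W2.t())

# P(x_i = 1 | h1): reconstruction of the visible layer from a sample of h1.
h1 = (torch.rand(K1) < p_h1).float()
p_x = torch.sigmoid(h1 @ W1.t())
```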

Pretraining DBM
How do we get this?
1. (Pre)training the first layer entails fitting the model P(x|θ) = Σ_{h^1} P(h^1|W^1) P(x|h^1, W^1), which implies a prior P(h^1|W^1) = Σ_x P(h^1, x|W^1).
2. (Pre)training the second layer changes the h^1 prior to P(h^1|W^2) = Σ_{h^2} P(h^1, h^2|W^2).
When putting things together, we need to average between the two.

Pretraining DBM - Trick
Averaging the two models of h^1 can be approximated by taking half of the contribution from W^1 and half from W^2. Using the full W^1 and W^2 would double-count the contribution of x, since h^2 depends on x. When training with more than two RBMs, apply the trick to the first and last layers and halve the weights (in both directions) of the intermediate RBMs.

DBM Discriminative Fine Tuning
The pretrained DBM matrices can be used to initialize a deep feed-forward network:
- Add the input from h^2 (through P(h^2|v)) to the first hidden layer
- Add an output layer
- Fine-tune the RBM matrices by backpropagation

Software - Deep Neural Autoencoders
All deep learning frameworks offer facilities to build (deep) AEs:
- Check out the classic Theano-based tutorials for denoising autoencoders and their stacked version
- A variety of deep AEs in Keras, and their counterparts in Lua-Torch
- Stacked autoencoders built with the official Matlab toolbox functions

Matlab - Deep Generative Models
- Matlab code for the DBN paper, with a demo on MNIST data
- Matlab code for Deep Boltzmann Machines, with a demo on MNIST data
- Deepmat: a Matlab library for deep generative models
- DeeBNet: a Matlab/Octave toolbox for deep generative models with GPU support

Python - Deep Generative Models
DBN and DBM implementations exist for all major deep learning libraries:
- A Deep Boltzmann Machine implementation (TensorFlow-based) with an image processing application, pre-trained networks and notebooks
- Deepnet: a Toronto-based implementation of deep autoencoders (neural and generative)
- Check out the classic Theano-based tutorials for Deep Belief Networks and RBMs

AE - Visualization: visualizing complex data in the learned latent space

Visualizing Sound

AE Image Restoration/Colorization: apply the autoencoder construction with advanced building blocks (e.g. CNN layers)

DBM Learning Image Features [Figure: CIFAR-10 images, level-1 filters, level-2 filters] https://github.com/monsta-hd/boltzmann-machines

Multimodal DBM: modality fusion layers combining inputs from Modality 1 through Modality K. N. Srivastava, R. Salakhutdinov, Multimodal Learning with Deep Boltzmann Machines, JMLR 2014

Multimodal DBM - Image and Text: the joint model supports both P(txt|img) and P(img|txt). N. Srivastava, R. Salakhutdinov, Multimodal Learning with Deep Boltzmann Machines, JMLR 2014

Multimodal DBM - Sampling. N. Srivastava, R. Salakhutdinov, Multimodal Learning with Deep Boltzmann Machines, JMLR 2014

Multimodal DBM - Multimodal Querying. N. Srivastava, R. Salakhutdinov, Multimodal Learning with Deep Boltzmann Machines, JMLR 2014

Multimodal DBM. Pang et al., Deep Multimodal Learning for Affective Analysis and Retrieval, IEEE Trans. on Multimedia, 2015

Take Home Messages
- Regularized autoencoders optimize reconstruction quality while constraining the stored information
- Autoencoder training is manifold learning: learn a latent-space manifold where the input data resides, storing only the variations that are useful to represent the training data
- Autoencoders learn a (conditional) distribution of the input data
- Deep AE: pretraining, fine tuning, supervised optimization
- Use AEs to find new/useful data representations, or to learn the input distribution

Next Lecture
The next lecture will be on Monday, 16th April 2018; stay tuned on Moodle for the topic. Dates still to finalize:
- Lecture recovery: 19th April 2018, 11-13 (room TBD)
- Midterm 2: 3rd May 2018, 11-13 (room TBD)