Denoising Autoencoders

Oliver Worm, Daniel Leinfelder, 20.11.2013

Introduction

Poor initialisation can lead to local minima.

1986 - Rumelhart, Hinton, Williams [RHW88]: random initialization and gradient descent show poor performance.
2006 - Hinton, Osindero, Teh [HOT06]: stacking Restricted Boltzmann Machines and fine-tuning with the up-down algorithm shows very good performance.
2007 - Bengio, Lamblin, Popovici, Larochelle [BLP+07] [PCL06]: stacking autoencoders and fine-tuning with gradient descent shows good performance.

Can we initialize it better?

Autoencoder

An autoencoder maps an input x to a hidden representation y and back to a reconstructed input z, and is trained to minimize the reconstruction error L(x, z).

input $x \in [0,1]^d$, hidden representation $y \in [0,1]^{d'}$, reconstructed input $z \in [0,1]^d$
encoder: $y = f_\theta(x) = s(Wx + b)$ with $\theta = \{W, b\}$
decoder: $z = g_{\theta'}(y) = s(W'y + b')$ with $\theta' = \{W', b'\}$

Squared error: $L(x, z) = \lVert x - z \rVert^2$

The parameters are trained to minimize the average reconstruction error over the training set:
$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\big(x^{(i)}, g_{\theta'}(f_\theta(x^{(i)}))\big)$

Reconstruction cross-entropy (for inputs interpreted as bit probabilities):
$L_H(x, z) = -\sum_{k=1}^{d} \big[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \big]$
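
For concreteness, a minimal PyTorch sketch of this autoencoder (not part of the original slides): the dimensions d = 784 and d' = 256, the learning rate, and the batch size are illustrative assumptions, and inputs are assumed to be scaled to [0, 1].

```python
import torch
import torch.nn as nn

d, d_hidden = 784, 256              # input dimension d and hidden dimension d'

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(d, d_hidden)   # y = f_theta(x) = s(Wx + b)
        self.decoder = nn.Linear(d_hidden, d)   # z = g_theta'(y) = s(W'y + b')

    def forward(self, x):
        y = torch.sigmoid(self.encoder(x))
        z = torch.sigmoid(self.decoder(y))
        return z

model = Autoencoder()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()              # reconstruction cross-entropy L_H(x, z)

x = torch.rand(64, d)               # dummy mini-batch standing in for real data
z = model(x)
loss = loss_fn(z, x)                # (1/n) * sum_i L_H(x_i, g(f(x_i)))
loss.backward()
optimizer.step()
```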

Denoising Autoencoder

A denoising autoencoder is trained to reconstruct the clean input from a corrupted version of it:

the input $x \in [0,1]^d$ is partially destroyed, yielding the corrupted input $\tilde{x} \sim q_D(\tilde{x} \mid x)$
$\tilde{x}$ is mapped to the hidden representation $y = f_\theta(\tilde{x})$
reconstruction from y gives $z = g_{\theta'}(y)$
the reconstruction error $L_H(x, z)$ is measured against the uncorrupted input x
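
The corruption step takes only a few lines. The sketch below reuses the Autoencoder class from the previous block and implements q_D as masking noise (a random fraction v of the input components is forced to zero, one of the corruption processes used in [VLBM08]); v = 0.25 is an illustrative choice.

```python
import torch

def corrupt(x, v=0.25):
    """Sample x_tilde ~ q_D(x_tilde | x) by zeroing roughly a fraction v of the entries."""
    mask = (torch.rand_like(x) > v).float()
    return x * mask

model = Autoencoder()                       # same architecture as in the previous sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.BCELoss()

x = torch.rand(64, d)                       # clean mini-batch (stand-in for real data)
x_tilde = corrupt(x)                        # corrupted input fed to the encoder
z = model(x_tilde)                          # reconstruction from the corrupted input
loss = loss_fn(z, x)                        # L_H is measured against the clean x
loss.backward()
optimizer.step()
```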

Learning the layers

1. Learn $f_\theta$ with a denoising autoencoder on the first layer.
2. Remove the autoencoder construction (the decoding part) and apply the learned mapping $f_\theta$ directly to the input.
3. Learn the next layer $f^{(2)}_\theta$ by repeating these steps on the resulting representations.

A sketch of this greedy layer-wise procedure follows below.
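
A hedged sketch of the procedure: each denoising autoencoder is trained on the representations produced by the already-learned encoders, and only its encoder is kept. The layer sizes, epoch count, and full-batch updates are illustrative simplifications, not values from the slides.

```python
import torch
import torch.nn as nn

def train_dae(data, n_in, n_hidden, v=0.25, epochs=10, lr=0.1):
    """Train one denoising autoencoder on `data` and return its encoder layer."""
    enc = nn.Linear(n_in, n_hidden)
    dec = nn.Linear(n_hidden, n_in)
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        x_tilde = data * (torch.rand_like(data) > v).float()   # q_D: masking noise
        z = torch.sigmoid(dec(torch.sigmoid(enc(x_tilde))))
        loss = loss_fn(z, data)                                 # reconstruct the clean data
        opt.zero_grad()
        loss.backward()
        opt.step()
    return enc

layer_sizes = [784, 500, 250, 100]          # illustrative architecture
data = torch.rand(256, layer_sizes[0])      # stand-in for the real training set
encoders = []
for n_in, n_hidden in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = train_dae(data, n_in, n_hidden)   # step 1: learn f_theta for this layer
    encoders.append(enc)
    with torch.no_grad():                   # steps 2-3: drop the decoder and map the
        data = torch.sigmoid(enc(data))     # data forward as input for the next layer
```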

Supervised fine tuning

initialize the network with the unsupervised, layer-wise learning above
add a supervised output layer $f^{\text{sup}}_\theta$ and continue with supervised learning
fine-tune the whole network ($f_\theta$, $f^{(2)}_\theta$, $f^{(3)}_\theta$, $f^{\text{sup}}_\theta$) with the supervised criterion (see the sketch below)
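
One possible sketch of the fine-tuning stage, building on the encoders and layer_sizes from the previous block: a supervised output layer is added on top and the whole stack is trained with the supervised cost. The 10-class task and the random labels are placeholders.

```python
import torch
import torch.nn as nn

# Stack the pre-trained encoders and add a supervised output layer f_sup on top.
layers = []
for enc in encoders:                                # encoders from the layer-wise stage
    layers += [enc, nn.Sigmoid()]
layers.append(nn.Linear(layer_sizes[-1], 10))       # f_sup: supervised output layer
net = nn.Sequential(*layers)

optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()                   # supervised cost

x = torch.rand(64, layer_sizes[0])                  # dummy inputs
labels = torch.randint(0, 10, (64,))                # dummy targets
loss = criterion(net(x), labels)
loss.backward()                                     # gradients reach every layer,
optimizer.step()                                    # fine-tuning the pre-trained weights
```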

Perspective view: Manifold

There are several ways to interpret denoising autoencoders; here: learning a manifold.

the training data (x) lies near a low-dimensional manifold
a corrupted example ($\tilde{x}$) is obtained by applying $q_D(\tilde{X} \mid X)$
learning the model $p(X \mid \tilde{X})$ projects corrupted examples back onto the manifold

Results

Dataset      SVM_rbf        SAA-3          DBN-3          SdA-3 (v%)
basic         3.03 ± 0.15    3.46 ± 0.16    3.11 ± 0.15    2.80 ± 0.14 (10)
rot          11.11 ± 0.28   10.30 ± 0.27   10.30 ± 0.27   10.29 ± 0.27 (10)
bg-rand      14.58 ± 0.31   11.28 ± 0.28    6.73 ± 0.22   10.38 ± 0.27 (40)
bg-img       22.61 ± 0.37   23.00 ± 0.37   16.31 ± 0.32   16.68 ± 0.33 (25)
rot-bg-img   55.18 ± 0.44   51.93 ± 0.44   47.39 ± 0.44   44.49 ± 0.44 (25)
rect          2.15 ± 0.13    2.41 ± 0.13    2.60 ± 0.14    1.99 ± 0.12 (10)
rect-img     24.04 ± 0.37   24.05 ± 0.37   22.50 ± 0.37   21.59 ± 0.36 (25)
convex       19.13 ± 0.34   18.41 ± 0.34   18.63 ± 0.34   19.06 ± 0.34 (10)

Test error rates with 95% confidence intervals on the MNIST variations and shape datasets [VLBM08]; v is the corruption level used for SdA-3.

Results for corruption levels v = 0%, 10%, 25%, and 50% (figure slides; images not reproduced in the transcription).

Summary

extending autoencoders to denoising autoencoders is simple
denoising helps to capture interesting structures from the input distribution
initialization with stacked denoising autoencoders shows better performance than stacked basic autoencoders
stacked denoising autoencoders perform even better than deep belief networks whose layers are initialized as Restricted Boltzmann Machines [VLBM08]

References

[BLP+07] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2007.
[HOT06] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, July 2006.
[PCL06] Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2006.
[RHW88] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research, pages 696-699. MIT Press, Cambridge, MA, USA, 1988.
[VLBM08] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pages 1096-1103. ACM, New York, NY, USA, 2008.