Denoising Autoencoders


Oliver Worm, Daniel Leinfelder

Introduction

Poor initialization can lead to local minima.

- Rumelhart, Hinton, Williams [RHW88]: random initialization followed by gradient descent performs poorly.
- Hinton, Osindero, Teh [HOT06]: stacking Restricted Boltzmann Machines and fine-tuning with the up-down algorithm performs very well.
- Bengio, Lamblin, Popovici, Larochelle [BLP+07] [PCL06]: stacking autoencoders and fine-tuning with gradient descent performs well.

Can we initialize the network better?

Autoencoder

An autoencoder maps the input x to a hidden representation y (encoder $f_\theta$) and decodes y back into a reconstruction z (decoder $g_{\theta'}$), minimizing a reconstruction error $L(x, z)$.

- $x \in [0,1]^d$, $y \in [0,1]^{d'}$, $z \in [0,1]^d$
- Encoder: $y = f_\theta(x) = s(Wx + b)$ with $\theta = \{W, b\}$
- Decoder: $z = g_{\theta'}(y) = s(W'y + b')$ with $\theta' = \{W', b'\}$

Squared error: $L(x, z) = \|x - z\|^2$

The parameters are chosen to minimize the average reconstruction error over the training set:
$$\theta^*, \theta'^* = \arg\min_{\theta, \theta'} \frac{1}{n} \sum_{i=1}^{n} L\big(x^{(i)}, g_{\theta'}(f_\theta(x^{(i)}))\big)$$

Reconstruction cross-entropy (for inputs in $[0,1]^d$ interpreted as bit probabilities):
$$L_H(x, z) = -\sum_{k=1}^{d} \big[x_k \log z_k + (1 - x_k) \log(1 - z_k)\big]$$
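
To make the definitions concrete, the following is a minimal NumPy sketch of such an autoencoder (sigmoid activations, untied weights, reconstruction cross-entropy trained by stochastic gradient descent). The class name and default hyper-parameters are illustrative choices, not values from the slides; the step method takes the encoder input and the reconstruction target separately so the same code can be reused for the denoising variant below.

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid, the non-linearity s(.) used above."""
    return 1.0 / (1.0 + np.exp(-a))

class Autoencoder:
    """Basic autoencoder with untied weights (illustrative sketch)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # encoder parameters theta = (W, b)
        self.W = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))
        self.b = np.zeros(n_hidden)
        # decoder parameters theta' = (W', b')
        self.W_prime = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_prime = np.zeros(n_visible)

    def encode(self, x):
        # y = f_theta(x) = s(W x + b)
        return sigmoid(self.W @ x + self.b)

    def decode(self, y):
        # z = g_theta'(y) = s(W' y + b')
        return sigmoid(self.W_prime @ y + self.b_prime)

    def step(self, x_in, x_target, lr=0.1):
        """One stochastic gradient step on L_H(x_target, g_theta'(f_theta(x_in))).

        For the basic autoencoder x_in == x_target; keeping them separate
        lets the same code train the denoising variant."""
        y = self.encode(x_in)
        z = self.decode(y)
        # for a sigmoid output with cross-entropy loss, the gradient w.r.t.
        # the decoder pre-activation is simply (z - x_target)
        delta_out = z - x_target
        # back-propagate through the encoder before updating the decoder
        delta_hidden = (self.W_prime.T @ delta_out) * y * (1.0 - y)
        self.W_prime -= lr * np.outer(delta_out, y)
        self.b_prime -= lr * delta_out
        self.W -= lr * np.outer(delta_hidden, x_in)
        self.b -= lr * delta_hidden
        eps = 1e-10  # avoid log(0) when reporting the loss
        return -np.sum(x_target * np.log(z + eps)
                       + (1.0 - x_target) * np.log(1.0 - z + eps))
```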

Denoising Autoencoder

- The input $x \in [0,1]^d$ is partially destroyed, yielding a corrupted input $\tilde{x} \sim q_D(\tilde{x} \mid x)$.
- The corrupted input $\tilde{x}$ is mapped to the hidden representation $y = f_\theta(\tilde{x})$.
- Reconstruction from y gives $z = g_{\theta'}(y)$.
- The reconstruction error $L_H(x, z)$ is measured against the uncorrupted input x.
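
A minimal sketch of this training procedure, continuing the code above and assuming masking noise as the corruption process $q_D$ (a fraction v of the components is forced to zero); the default values of v, the learning rate, and the number of epochs are illustrative placeholders.

```python
def corrupt(x, v, rng):
    """Sample x_tilde ~ q_D(x_tilde | x): each component of x is destroyed
    (set to 0) with probability v, otherwise kept unchanged."""
    return x * (rng.random(x.shape) >= v)

def train_denoising_autoencoder(ae, data, v=0.25, lr=0.1, epochs=10, seed=0):
    """Train `ae` as a denoising autoencoder on `data`, an array of shape
    (n_examples, d) with entries in [0, 1]."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for x in data:
            x_tilde = corrupt(x, v, rng)  # partially destroy the input
            # encode the corrupted input, but compute L_H against the clean x
            ae.step(x_tilde, x, lr)
    return ae
```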

Learning the layers

1. Learn $f_\theta$ with a denoising autoencoder on the first layer.
2. Remove the autoencoder construction and apply the learned mapping $f_\theta$ directly to the (uncorrupted) input.
3. Learn the next layer $f^{(2)}_\theta$ by repeating these steps on the resulting representation, as sketched below.
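
A sketch of this greedy layer-wise procedure, reusing the Autoencoder class and training helper defined above; the layer sizes and hyper-parameters are placeholders, not values from the slides.

```python
def pretrain_stack(data, layer_sizes, v=0.25, lr=0.1, epochs=10):
    """Greedy layer-wise pre-training: train one denoising autoencoder per
    layer, then map the clean data through the learned encoder f_theta and
    use that representation as the input of the next layer."""
    representation = data
    encoders = []
    for n_hidden in layer_sizes:
        ae = Autoencoder(representation.shape[1], n_hidden)
        train_denoising_autoencoder(ae, representation, v=v, lr=lr, epochs=epochs)
        encoders.append(ae)
        # discard the decoder; keep only f_theta applied to the clean input
        representation = np.array([ae.encode(x) for x in representation])
    return encoders
```

For example, `pretrain_stack(X, [500, 500, 500])` would pre-train a stack of three hidden layers (the sizes here are arbitrary).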

Supervised fine-tuning

- Initialize the network with the unsupervised, layer-wise pre-training described above.
- Continue with supervised learning for an output mapping $f^{sup}_\theta$ that predicts the target.
- Fine-tune the whole network (the stack $f_\theta, f^{(2)}_\theta, f^{(3)}_\theta, f^{sup}_\theta$) with the supervised cost.
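
A sketch of this fine-tuning stage, assuming a softmax output layer trained with cross-entropy as the supervised criterion; the function name and hyper-parameters are illustrative, not taken from the slides.

```python
def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def fine_tune(encoders, data, labels, n_classes, lr=0.1, epochs=10, seed=0):
    """Supervised fine-tuning: initialize a feed-forward network from the
    pre-trained encoders, add a softmax output layer f_sup, and adjust all
    layers by back-propagating the supervised cross-entropy cost."""
    rng = np.random.default_rng(seed)
    Ws = [ae.W.copy() for ae in encoders]   # weights initialized by pre-training
    bs = [ae.b.copy() for ae in encoders]
    W_sup = rng.normal(0.0, 0.01, size=(n_classes, Ws[-1].shape[0]))
    b_sup = np.zeros(n_classes)
    for _ in range(epochs):
        for x, t in zip(data, labels):
            # forward pass through the pre-trained stack
            acts = [x]
            for W, b in zip(Ws, bs):
                acts.append(sigmoid(W @ acts[-1] + b))
            p = softmax(W_sup @ acts[-1] + b_sup)
            target = np.zeros(n_classes)
            target[t] = 1.0                 # labels assumed to be class indices
            delta = p - target              # gradient at the softmax pre-activation
            delta_h = (W_sup.T @ delta) * acts[-1] * (1.0 - acts[-1])
            W_sup -= lr * np.outer(delta, acts[-1])
            b_sup -= lr * delta
            # back-propagate through every pre-trained layer (fine-tuning)
            for i in range(len(Ws) - 1, -1, -1):
                grad_W = np.outer(delta_h, acts[i])
                grad_b = delta_h
                if i > 0:
                    delta_h = (Ws[i].T @ delta_h) * acts[i] * (1.0 - acts[i])
                Ws[i] -= lr * grad_W
                bs[i] -= lr * grad_b
    return Ws, bs, W_sup, b_sup
```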

Perspective view: Manifold

There are several ways to interpret denoising autoencoders; one of them is learning a manifold.

- The training data (x) lies near a low-dimensional manifold.
- A corrupted example ($\tilde{x}$) is obtained by applying $q_D(\tilde{X} \mid X)$ and will typically lie farther from the manifold.
- Learning the model $p(X \mid \tilde{X})$ projects corrupted points back onto the manifold.

Results

Test error rates (in %) with 95% confidence intervals on the benchmark datasets of [VLBM08] (the MNIST variants basic, rot, bg-rand, bg-img and rot-bg-img, plus the rect, rect-img and convex shape datasets), comparing SVM_rbf, SAA-3, DBN-3 and SdA-3; the fraction v of corrupted inputs used for SdA-3 is given in parentheses.

Results (figures)

Figures comparing results for corruption levels v = 0%, 10%, 25% and 50%.

Summary

- Extending autoencoders to denoising autoencoders is simple.
- Denoising helps to capture interesting structure in the input distribution.
- Initialization with stacked denoising autoencoders performs better than initialization with stacked basic autoencoders.
- Stacked denoising autoencoders even outperform deep belief networks whose layers are initialized as Restricted Boltzmann Machines [VLBM08].

References

[BLP+07] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2007.

[HOT06] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527-1554, July 2006.

[PCL06] Christopher Poultney, Sumit Chopra, and Yann LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems (NIPS). MIT Press, 2006.

[RHW88] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research. MIT Press, Cambridge, MA, USA, 1988.

[VLBM08] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning (ICML '08), pages 1096-1103, New York, NY, USA, 2008. ACM.
