CSC321 Lecture 20: Autoencoders

Roger Grosse

Overview. The latent variable models we have seen so far, mixture models and Boltzmann machines, both involve discrete latent variables. Now let's talk about continuous ones. One use of continuous latent variables is dimensionality reduction.

Autoencoders. An autoencoder is a feed-forward neural net whose job is to take an input x and predict x. To make this non-trivial, we need to add a bottleneck layer whose dimension is much smaller than the input.
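
To make the bottleneck idea concrete, here is a minimal sketch of such a network in PyTorch (not from the lecture; the 784-dimensional input, the hidden sizes, and the 30-dimensional code are illustrative assumptions). The net is trained to reproduce its own input through the narrow code layer.

```python
import torch
import torch.nn as nn

D, K = 784, 30  # input dimension and (much smaller) bottleneck dimension

autoencoder = nn.Sequential(
    nn.Linear(D, 256), nn.ReLU(),   # encoder
    nn.Linear(256, K), nn.ReLU(),   # bottleneck code
    nn.Linear(K, 256), nn.ReLU(),   # decoder
    nn.Linear(256, D),
)

loss_fn = nn.MSELoss()
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

x = torch.rand(64, D)               # stand-in batch; real inputs would go here
for _ in range(100):
    x_tilde = autoencoder(x)        # the network predicts its own input
    loss = loss_fn(x_tilde, x)      # reconstruction error
    opt.zero_grad()
    loss.backward()
    opt.step()
```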

Autoencoders. Why autoencoders? They can map high-dimensional data to two dimensions for visualization. They can be used for compression (i.e. reducing the file size); note that autoencoders don't do this for free, it requires other ideas as well. And they can learn abstract features in an unsupervised way, so you can apply them to a supervised task; unlabeled data can be much more plentiful than labeled data.

Principal Component Analysis. The simplest kind of autoencoder has one hidden layer, linear activations, and squared error loss: $\mathcal{L}(x, \tilde{x}) = \|x - \tilde{x}\|^2$. This network computes $\tilde{x} = UVx$, which is a linear function. If $K \ge D$, we can choose $U$ and $V$ such that $UV$ is the identity; this isn't very interesting. But suppose $K < D$: then $V$ maps $x$ to a $K$-dimensional space, so it's doing dimensionality reduction. The output must lie in a $K$-dimensional subspace, namely the column space of $U$.
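
As a quick sanity check on this algebra (a NumPy sketch with arbitrary sizes D = 5, K = 2, not the lecture's code): composing the two linear layers really does compute the single matrix UV, and every reconstruction lies in the column space of U.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 5, 2
U = rng.standard_normal((D, K))    # decoder weights (D x K)
V = rng.standard_normal((K, D))    # encoder weights (K x D)

x = rng.standard_normal(D)
z = V @ x                          # K-dimensional code
x_tilde = U @ z                    # reconstruction

# The whole network is the single linear map UV.
assert np.allclose(x_tilde, (U @ V) @ x)

# The reconstruction lies in the column space of U: projecting it onto
# that K-dimensional subspace leaves it unchanged.
coeffs = np.linalg.lstsq(U, x_tilde, rcond=None)[0]
assert np.allclose(U @ coeffs, x_tilde)
```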

Principal Component Analysis. We just saw that a linear autoencoder has to map $D$-dimensional inputs to a $K$-dimensional subspace $S$. Knowing this, what is the best possible mapping it can choose? By definition, the projection of $x$ onto $S$ is the point in $S$ which minimizes the distance to $x$. Fortunately, the linear autoencoder can represent projection onto $S$: pick $U = Q$ and $V = Q^\top$, where the columns of $Q$ form an orthonormal basis for $S$.
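
A small numerical check of this claim (a sketch; the dimensions and the random subspace are made up): with the columns of Q forming an orthonormal basis for S, the weights U = Q and V = Qᵀ make the autoencoder compute QQᵀx, which coincides with the least-squares projection of x onto S.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 5, 2

# Build an orthonormal basis Q for a random K-dimensional subspace S.
A = rng.standard_normal((D, K))
Q, _ = np.linalg.qr(A)             # columns of Q are orthonormal and span S

x = rng.standard_normal(D)
U, V = Q, Q.T                      # linear autoencoder weights
x_tilde = U @ (V @ x)              # = Q Q^T x

# Compare with the least-squares projection of x onto the column space of Q.
coeffs = np.linalg.lstsq(Q, x, rcond=None)[0]
assert np.allclose(x_tilde, Q @ coeffs)
```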

Principal Component Analysis. The autoencoder should learn to choose the subspace which minimizes the squared distance from the data to the projections. This is equivalent to choosing the subspace which maximizes the variance of the projections. By the Pythagorean theorem,

$$\underbrace{\frac{1}{N}\sum_{i=1}^N \|\tilde{x}^{(i)} - \mu\|^2}_{\text{projected variance}} + \underbrace{\frac{1}{N}\sum_{i=1}^N \|x^{(i)} - \tilde{x}^{(i)}\|^2}_{\text{reconstruction error}} = \underbrace{\frac{1}{N}\sum_{i=1}^N \|x^{(i)} - \mu\|^2}_{\text{constant}}$$

You wouldn't actually solve this problem by training a neural net. There's a closed-form solution, which you learn about in CSC 411. The algorithm is called principal component analysis (PCA).
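
The closed-form solution takes the top K eigenvectors of the data covariance matrix as a basis for the subspace. Below is a minimal NumPy sketch on synthetic data (the sizes and data are assumptions), which also verifies the Pythagorean identity above numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 500, 10, 3
X = rng.standard_normal((N, D)) @ rng.standard_normal((D, D))  # correlated data

mu = X.mean(axis=0)
Xc = X - mu

# Principal directions: top-K eigenvectors of the covariance matrix.
cov = Xc.T @ Xc / N
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
Q = eigvecs[:, ::-1][:, :K]              # columns = top-K principal directions

X_tilde = (Xc @ Q) @ Q.T + mu            # project onto the subspace, un-centre

projected_var  = np.mean(np.sum((X_tilde - mu) ** 2, axis=1))
reconstruction = np.mean(np.sum((X - X_tilde) ** 2, axis=1))
total_var      = np.mean(np.sum((X - mu) ** 2, axis=1))
assert np.isclose(projected_var + reconstruction, total_var)
```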

Principal Component Analysis. PCA for faces ("Eigenfaces").

Principal Component Analysis. PCA for digits.
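
As a stand-in for the digit images on this slide, here is a hedged sketch using scikit-learn's small 8x8 digits dataset (an assumption; the lecture's figures likely use MNIST-style digits). The top principal components play the role of "eigen-digits", and a handful of them already gives rough reconstructions.

```python
import numpy as np
from sklearn.datasets import load_digits

X = load_digits().data                  # 1797 x 64, flattened 8x8 digit images
mu = X.mean(axis=0)
Xc = X - mu

# Principal components = top right singular vectors of the centred data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 10
components = Vt[:K]                     # each row reshapes to an 8x8 "eigen-digit"

codes = Xc @ components.T               # K-dimensional codes
X_rec = codes @ components + mu         # reconstructions from K components
err = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print(f"mean squared reconstruction error with K={K}: {err:.2f}")
```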

Deep Autoencoders. Deep nonlinear autoencoders learn to project the data not onto a subspace, but onto a nonlinear manifold. This manifold is the image of the decoder. This is a kind of nonlinear dimensionality reduction.
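
One way to see "the manifold is the image of the decoder" in code (a sketch; the 2-dimensional code and the layer sizes are assumptions, and the decoder below is untrained, so its image is an arbitrary surface rather than a learned one): every point produced by decoding a code vector lies, by construction, on that manifold.

```python
import torch
import torch.nn as nn

# A decoder mapping a 2-D code into 784-D input space; its image is a
# 2-dimensional manifold embedded in that space.
decoder = nn.Sequential(
    nn.Linear(2, 256), nn.ReLU(),
    nn.Linear(256, 784),
)

# Sweep a grid of 2-D codes; each decoded point lies on the manifold.
grid = torch.linspace(-2.0, 2.0, 5)
codes = torch.cartesian_prod(grid, grid)   # 25 code vectors
manifold_points = decoder(codes)
print(manifold_points.shape)               # torch.Size([25, 784])
```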

Deep Autoencoders. Nonlinear autoencoders can learn more powerful codes for a given dimensionality, compared with linear autoencoders (PCA).

Layerwise Training. There's a neat connection between autoencoders and RBMs. An RBM is like an autoencoder with tied weights, except that the units are sampled stochastically.
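
A sketch of the tied-weights picture in NumPy (the sizes and initialization are assumptions): the same matrix W is used to encode and, transposed, to decode; the RBM performs the same computation except that the hidden units are sampled from their Bernoulli probabilities instead of being used directly.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(3)
D, H = 784, 100
W = rng.standard_normal((D, H)) * 0.01    # one weight matrix, used twice
b_h, b_v = np.zeros(H), np.zeros(D)       # hidden and visible biases

v = (rng.random(D) < 0.5).astype(float)   # a stand-in binary input

# Tied-weights autoencoder view: encode with W, decode with W^T.
h_prob = sigmoid(v @ W + b_h)
v_recon = sigmoid(h_prob @ W.T + b_v)

# RBM view: same weights, but the hidden units are sampled stochastically.
h_sample = (rng.random(H) < h_prob).astype(float)
v_prob = sigmoid(h_sample @ W.T + b_v)
```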

Layerwise Training. Suppose we've already trained an RBM with weights $W^{(1)}$. Let's compute its hidden features on the training set and feed them in as data to another RBM. Note that $W^{(1)}$ is now held fixed, while $W^{(2)}$ is being trained using contrastive divergence.
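
A sketch of what this step might look like with CD-1 in NumPy (the sizes, learning rate, and random stand-in data are assumptions; W1 and b1 stand for the already-trained, now fixed first-layer weights): the first RBM's hidden features serve as the "data" for the second RBM, whose weights W2 receive the contrastive divergence update.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(4)
D, H1, H2 = 784, 500, 250
lr = 0.01

W1, b1 = rng.standard_normal((D, H1)) * 0.01, np.zeros(H1)   # fixed first RBM
W2 = rng.standard_normal((H1, H2)) * 0.01                    # second RBM (trained)
c_h, c_v = np.zeros(H2), np.zeros(H1)

X = (rng.random((64, D)) < 0.5).astype(float)                # stand-in data batch

# Data for the second RBM = hidden features from the first RBM (W1 held fixed).
v0 = sigmoid(X @ W1 + b1)

# One CD-1 update for the second RBM.
ph0 = sigmoid(v0 @ W2 + c_h)
h0 = (rng.random(ph0.shape) < ph0).astype(float)             # sample hidden units
pv1 = sigmoid(h0 @ W2.T + c_v)                               # reconstruction
ph1 = sigmoid(pv1 @ W2 + c_h)

W2  += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(X)
c_h += lr * (ph0 - ph1).mean(axis=0)
c_v += lr * (v0 - pv1).mean(axis=0)
```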

Layerwise Training. A stack of two RBMs can be thought of as an autoencoder with three hidden layers. This gives a good initialization for the deep autoencoder; you can then fine-tune the autoencoder weights using backprop. This strategy is known as layerwise pre-training.
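
A sketch of the unrolling step in PyTorch (the matrices W1 and W2 below are random stand-ins for pre-trained RBM weights): the encoder is initialized from W1 and W2, the decoder from their transposes, giving an autoencoder with three hidden layers that is then fine-tuned end to end with backprop on the reconstruction error.

```python
import torch
import torch.nn as nn

D, H1, H2 = 784, 500, 250
W1 = torch.randn(H1, D) * 0.01     # stand-in for the first RBM's weights
W2 = torch.randn(H2, H1) * 0.01    # stand-in for the second RBM's weights

enc1, enc2 = nn.Linear(D, H1), nn.Linear(H1, H2)
dec2, dec1 = nn.Linear(H2, H1), nn.Linear(H1, D)
with torch.no_grad():              # copy the pre-trained weights in
    enc1.weight.copy_(W1); enc2.weight.copy_(W2)
    dec2.weight.copy_(W2.t()); dec1.weight.copy_(W1.t())
    # (biases left at their defaults for brevity)

autoencoder = nn.Sequential(enc1, nn.Sigmoid(),
                            enc2, nn.Sigmoid(),   # code layer
                            dec2, nn.Sigmoid(),
                            dec1)

# Fine-tune all weights jointly with backprop on the reconstruction error.
opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-4)
x = torch.rand(64, D)              # stand-in batch
loss = nn.functional.mse_loss(autoencoder(x), x)
opt.zero_grad(); loss.backward(); opt.step()
```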

Autoencoders are not a probabilistic model. However, there is an autoencoder-like probabilistic model called a variational autoencoder (VAE). These are beyond the scope of the course and require some more advanced math. Check out David Duvenaud's excellent course "Differentiable Inference and Generative Models": https://www.cs.toronto.edu/~duvenaud/courses/csc2541/index.html

Deep Autoencoders (Professor Hinton's slides).