Unsupervised Neural Nets


Unsupervised Neural Nets (and ICA)
Lyle Ungar, University of Pennsylvania
(with contributions from Quoc Le, Socher & Manning)

Semi-Supervised Learning Hypothesis: P(c|x) can be more accurately computed using shared structure with p(x). (Figure: the purely supervised case.) from Socher and Manning

Semi-Supervised Learning Hypothesis: P(c|x) can be more accurately computed using shared structure with p(x). (Figure: the semi-supervised case.) from Socher and Manning

Unsupervised Neural Nets
- Autoencoders
  - Take same image as input and output
    - often adding noise to the input
  - Learn weights to minimize the reconstruction error
  - Avoid perfect fitting (see the sketch after this slide):
    - pass through a bottleneck
    - impose sparsity
      - dropout
    - add noise to the input
- Generalize PCA or ICA
http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity
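A minimal denoising-autoencoder sketch of the ideas above, written in PyTorch (the framework choice, the 784/64 dimensions, and the noise level are illustrative assumptions, not from the slides): the narrow hidden layer is the bottleneck, and the input corruption is the added noise.

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, d_in=784, d_hidden=64):   # 64 << 784: the bottleneck prevents perfect fitting
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.decoder = nn.Linear(d_hidden, d_in)

    def forward(self, x, noise_std=0.3):
        x_noisy = x + noise_std * torch.randn_like(x)   # corrupt the input
        return self.decoder(self.encoder(x_noisy))      # try to reconstruct the clean input

model = DenoisingAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(256, 784)                  # stand-in minibatch of flattened images
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x), x)           # reconstruction error against the uncorrupted input
    loss.backward()
    opt.step()
```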

Independent Components Analysis (ICA)
- Given observations X, find W such that the components s_j of S = XW are as independent of each other as possible
  - e.g. have low mutual information (the KL divergence between the joint distribution and the product of the marginals)
  - Alternatively, find the directions in X that are most skewed
    - farthest from Gaussian
  - Usually mean-center and whiten the data (make it unit covariance) first
    - whiten: multiply X by (XᵀX)^(-1/2), i.e. X(XᵀX)^(-1/2) (see the sketch after this slide)
- Very similar to PCA
  - But the loss function is not quadratic
  - So the optimization cannot be done by SVD
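A small numpy sketch of the centering/whitening step (the 1/n scaling and the toy data are my additions; the slide's (XᵀX)^(-1/2) differs only by that constant factor):

```python
import numpy as np

def whiten(X):
    """Mean-center X and rotate/rescale so the sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)                      # (X^T X)/n; the slide omits the 1/n
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals + 1e-8)) @ vecs.T   # cov^{-1/2}
    return Xc @ inv_sqrt

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ rng.normal(size=(5, 5))    # correlated toy data
Xw = whiten(X)
print(np.abs(np.cov(Xw, rowvar=False) - np.eye(5)).max())   # ~0: unit covariance
```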

Independent Components Analysis (ICA)
- Given observations X, find W and S such that the components s_j of S = XW are as independent of each other as possible
  - the s_k are the sources, and should be independent
- Reconstruct X ≈ (XW)W⁺ = SW⁺
  - S is like the principal components
  - W⁺ is like the loadings
  - x ≈ Σ_j s_j w_j⁺
- An auto-encoder is the nonlinear generalization: it encodes X as S and then decodes it
(A worked encode/decode example follows below.)
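To make the encode/decode picture concrete, the sketch below uses scikit-learn's FastICA (a standard ICA implementation, not specifically the slides' method); the Laplace sources and mixing matrix are made-up toy data. fit_transform gives the sources S = XW, and inverse_transform applies the pseudoinverse-style decoding X ≈ SW⁺.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S_true = rng.laplace(size=(2000, 3))       # independent, non-Gaussian sources
A = rng.normal(size=(3, 3))                # unknown mixing matrix
X = S_true @ A                             # observed data

ica = FastICA(n_components=3, random_state=0)
S = ica.fit_transform(X)                   # encode: estimated sources, S = XW
X_hat = ica.inverse_transform(S)           # decode: X ~ S W+
print(np.abs(X - X_hat).max())             # ~0 when the model is square (k = d)
```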

Reconstruction ICA (RICA)
- Reconstruction ICA: find W to minimize
  - the reconstruction error
    ‖X − SW⁺‖² = ‖X − (XW)W⁺‖²
  - and the mutual information between the sources S = XW
    I(s_1, s_2, ..., s_k) = Σ_{i=1..k} H(s_i) − H(s)
    where H(y) = −∫ p(y) log p(y) dy is the differential entropy
  - i.e. the difference between the summed entropies of the individual sources s_i and the entropy of all of them together
http://mathworld.wolfram.com/differentialentropy.html
Note: this is a bit more complex than it looks, since we only have real-valued samples, not the distributions themselves.
(A small RICA-style sketch follows below.)
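In practice the mutual-information term is hard to evaluate from samples, so RICA as published (Le et al., NIPS 2011) replaces it with a smooth L1 sparsity penalty on the sources and uses Wᵀ as the decoder. The numpy sketch below follows that surrogate objective with plain gradient descent (the step size, penalty weight, and toy data are illustrative assumptions; the real system used L-BFGS):

```python
import numpy as np

def rica_fit(X, n_features, lam=0.1, lr=0.01, epochs=500, eps=1e-6, seed=0):
    """Minimize ||X W W^T - X||^2 + lam * sum(sqrt(S^2 + eps)) with S = XW."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_features))
    n = len(X)
    for _ in range(epochs):
        S = X @ W                               # sources
        R = S @ W.T - X                         # residual; W^T plays the role of W+ here
        g_recon = 2 * (X.T @ R + R.T @ X) @ W   # gradient of the reconstruction term
        g_sparse = lam * X.T @ (S / np.sqrt(S ** 2 + eps))   # gradient of the smooth L1 term
        W -= lr * (g_recon + g_sparse) / n
    return W

# Toy usage: learn an overcomplete basis (more features than input dimensions)
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 16))
X = X - X.mean(axis=0)                          # RICA is normally run on centered (often whitened) data
W = rica_fit(X, n_features=32)
```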

Mutual information
I(y_1, y_2, ..., y_m) = KL( p(y_1, y_2, ..., y_m) ‖ p(y_1) p(y_2) ... p(y_m) )
http://www.stat.ucla.edu/~yuille/courses/stat161-261-spring14/hyvo00-icatut.pdf
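The slides work with differential entropies of real-valued sources, but the KL form is easiest to check on a discrete toy example (the joint table below is made up):

```python
import numpy as np

# I(Y1; Y2) = KL( p(y1, y2) || p(y1) p(y2) )
p_joint = np.array([[0.3, 0.1],
                    [0.1, 0.5]])              # toy joint distribution (sums to 1)
p1 = p_joint.sum(axis=1, keepdims=True)       # marginal p(y1)
p2 = p_joint.sum(axis=0, keepdims=True)       # marginal p(y2)
mi = np.sum(p_joint * np.log(p_joint / (p1 * p2)))
print(mi)                                     # 0 iff y1 and y2 are independent
```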

Unsupervised Neural Nets
- Auto-encoders
  - Take same image as input and output
    - often adding noise to the input
  - Learn weights to minimize the reconstruction error
  - This can be done repeatedly (reconstructing features)
  - Use for semi-supervised learning
http://theanalyticsstore.com/deep-learning/

PCA = Linear Manifold = Linear Auto-Encoder (from Socher and Manning)

The Manifold Learning Hypothesis (from Socher and Manning)

Auto-Encoders are like nonlinear PCA (from Socher and Manning)

Stacking for deep learning (from Socher and Manning)

Stacking for deep learning: now learn to reconstruct the features (using more abstract ones). (from Socher and Manning)

Stacking for deep learning
- Recurse many layers deep
- Can be used to initialize supervised learning
(from Socher and Manning; a layer-wise stacking sketch follows below)
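A minimal greedy layer-wise sketch in numpy (the tied-weight sigmoid autoencoder, the layer sizes, and the random data are illustrative assumptions): each layer is trained to reconstruct the features produced by the layer below it, and the resulting weights could then be used to initialize a supervised net.

```python
import numpy as np

def train_layer(X, n_hidden, lr=0.01, epochs=200, seed=0):
    """One tied-weight autoencoder layer: sigmoid encoder, linear decoder W^T."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b = np.zeros(n_hidden)
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # encode
        R = H @ W.T - X                            # reconstruction residual
        dH = (R @ W) * H * (1 - H)                 # backprop through the sigmoid
        gW = X.T @ dH + R.T @ H                    # tied weights: encoder + decoder contributions
        gb = dH.sum(axis=0)
        W -= lr * gW / len(X)
        b -= lr * gb / len(X)
    return W, b

def encode(X, W, b):
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

# Greedy stacking: each new layer reconstructs the (more abstract) features below it.
X = np.random.default_rng(1).normal(size=(500, 64))
layers, H = [], X
for n_hidden in (32, 16):
    W, b = train_layer(H, n_hidden)
    layers.append((W, b))
    H = encode(H, W, b)    # features fed to the next layer (or to a supervised net)
```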

Tera-scale deep learning
Quoc V. Le, Stanford University and Google (now at Google)

Joint work with: Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang.
Additional thanks: Samy Bengio, Zhenghao Chen, Tom Dean, Pangwei Koh, Mark Mao, Jiquan Ngiam, Patrick Nguyen, Andrew Saxe, Mark Segal, Jon Shlens, Vincent Vanhoucke, Xiaoyun Wu, Peng Xe, Serena Yeung, Will Zou.

TICA and Reconstruction ICA (warning: this x and W are the transpose of what we use)
- Equivalence between Sparse Coding, Autoencoders, RBMs and ICA
- Build a deep architecture by treating the output of one layer as the input to another layer
Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011

Visualization of features learned: most are local features.

Challenges with 1000s of machines

Asynchronous parallel SGDs with a parameter server (a toy sketch follows below).
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
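The slide only names the architecture, so the Python threads sketch below is purely illustrative (the class and function names are made up, and it trains a toy linear model rather than RICA): each worker repeatedly computes a stochastic gradient on its own data shard and applies it to the shared parameters without waiting for the other workers.

```python
import threading
import numpy as np

class ParameterServer:
    """Shared parameter store; updates are applied as they arrive."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()

    def apply_gradient(self, grad, lr=0.01):
        with self.lock:            # protects a single update; workers never wait for each other
            self.w -= lr * grad

def worker(server, X_shard, y_shard, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = rng.integers(len(X_shard))
        x, y = X_shard[i], y_shard[i]
        grad = (server.w @ x - y) * x    # SGD gradient for squared error (reads possibly stale w)
        server.apply_gradient(grad)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)
server = ParameterServer(dim=10)
threads = [threading.Thread(target=worker, args=(server, X[i::4], y[i::4], 200, i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```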

Local receptive field networks (diagram: an image whose RICA features are partitioned across Machines #1 to #4).
Le, et al., Tiled Convolutional Neural Networks. NIPS 2010

10 million 200x200 images, 1 billion parameters

Training (diagram: a stack of RICA layers on top of the image)
- Dataset: 10 million 200x200 unlabeled images from YouTube/the web
- Train on 2,000 machines (16,000 cores) for 1 week
- 1.15 billion parameters
  - 100x larger than previously reported
  - small compared to the visual cortex
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

The face neuron: top stimuli from the test set, and the optimal stimulus found by numerical optimization.
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

The cat neuron: top stimuli from the test set, and the optimal stimulus found by numerical optimization.
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

What you should know
- Supervised neural nets
  - Generalize logistic regression
  - Often have built-in structure
    - local receptive fields, max pooling
  - Usually solved by minibatch stochastic gradient descent
    - with the chain rule ("backpropagation")
- Unsupervised neural nets
  - Generalize PCA or ICA
  - Generally learn an overcomplete basis
  - Often trained recursively as nonlinear autoencoders
  - Used in semi-supervised learning
    - or to initialize supervised deep nets