Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Similar documents
Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Chapter 11. Stochastic Methods Rooted in Statistical Mechanics

CSE446: Neural Networks Spring Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer

UNSUPERVISED LEARNING

Restricted Boltzmann Machines for Collaborative Filtering

Neural Networks. Mark van Rossum. January 15, School of Informatics, University of Edinburgh 1 / 28

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

How to do backpropagation in a brain

Deep Reinforcement Learning SISL. Jeremy Morton (jmorton2) November 7, Stanford Intelligent Systems Laboratory

The Origin of Deep Learning. Lili Mou Jan, 2015

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

Deep unsupervised learning

An efficient way to learn deep generative models

Fundamentals of Computational Neuroscience 2e

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks

Learning Tetris. 1 Tetris. February 3, 2009

Artificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen

Basic Principles of Unsupervised and Unsupervised

arxiv: v2 [cs.ne] 22 Feb 2013

Knowledge Extraction from DBNs for Images

Jakub Hajic Artificial Intelligence Seminar I

Deep Boltzmann Machines

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Approximate Q-Learning. Dan Weld / University of Washington

Deep Learning Architectures and Algorithms

Restricted Boltzmann Machines

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017

CSE446: Neural Networks Winter 2015

Neural Networks. William Cohen [pilfered from: Ziv; Geoff Hinton; Yoshua Bengio; Yann LeCun; Hongkak Lee - NIPs 2010 tutorial ]

Does the Wake-sleep Algorithm Produce Good Density Estimators?

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Lecture 17: Neural Networks and Deep Learning

Restricted Boltzmann Machines

Neural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications

Introduction to Convolutional Neural Networks (CNNs)

Human-level control through deep reinforcement. Liia Butler

Machine Learning. Neural Networks

Lecture 16 Deep Neural Generative Models

Q-Learning in Continuous State Action Spaces

Greedy Layer-Wise Training of Deep Networks

Feed-forward Networks Network Training Error Backpropagation Applications. Neural Networks. Oliver Schulte - CMPT 726. Bishop PRML Ch.

Convolutional Neural Networks II. Slides from Dr. Vlad Morariu

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

Neural networks and optimization

TUTORIAL PART 1 Unsupervised Learning

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Learning Energy-Based Models of High-Dimensional Data

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch

Artificial Neural Networks

Deep Neural Networks

Learning Deep Architectures for AI. Part II - Vijay Chakilam

Probabilistic Models in Theoretical Neuroscience

Lecture 5: Logistic Regression. Neural Networks

David Silver, Google DeepMind

Foundations of Artificial Intelligence

Density estimation. Computing, and avoiding, partition functions. Iain Murray

Artificial Neural Networks. MGS Lecture 2

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

How to do backpropagation in a brain. Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto

Bayesian Networks (Part I)

Introduction to Machine Learning (67577)

Deep Learning (CNNs)

CS230: Lecture 9 Deep Reinforcement Learning

Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information

Grundlagen der Künstlichen Intelligenz

Machine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 14. Sinan Kalkan

Neural networks. Chapter 19, Sections 1 5 1

Neural Networks and Deep Learning

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Neural networks. Chapter 20. Chapter 20 1

Parallel Tempering is Efficient for Learning Restricted Boltzmann Machines

Neural Networks. Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016

Bias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions

Learning Deep Architectures

RegML 2018 Class 8 Deep learning

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

Introduction to Deep Neural Networks

Neural Networks 2. 2 Receptive fields and dealing with image inputs

Deep Learning Basics Lecture 8: Autoencoder & DBM. Princeton University COS 495 Instructor: Yingyu Liang

Kartik Audhkhasi, Osonde Osoba, Bart Kosko

Multilayer Perceptrons (MLPs)

Energy Based Models. Stefano Ermon, Aditya Grover. Stanford University. Lecture 13

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

Introduction to Neural Networks

Deep Generative Models. (Unsupervised Learning)

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018

Reinforcement Learning

Neural networks. Chapter 20, Section 5 1

Hopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296

Other Topologies. Y. LeCun: Machine Learning and Pattern Recognition p. 5/3

Understanding How ConvNets See

Chapter 16. Structured Probabilistic Models for Deep Learning

REINFORCEMENT LEARNING

Deep Reinforcement Learning

CONVOLUTIONAL DEEP BELIEF NETWORKS

Machine Learning for Physicists Lecture 1

Transcription:

CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information in a hierarchical manner. 2 he Rise of Deep Learning Long History (in Hind Sight) Made popular in recent years Geoffrey Hinton et al. (2006). Andrew Ng & Jeff Dean (Google Brain team, 202). Schmidhuber et al. s deep neural networks (won many competitions and in some cases showed super human performance; 20 ). Fukushima s Neocognitron (980). LeCun et al. s Convolutional neural networks (989). Schmidhuber s work on stacked recurrent neural networks (993). Vanishing gradient problem. 3 4

Fukushima s Neocognitron LeCun s Colvolutional Neural Nets Appeared in journal Biological Cybernetics (980). Multiple layers with local receptive fields. S cells (trainable) and C cells (fixed weight). Deformation-resistent recognition. Convolution kernel (weight sharing) + Subsampling Fully connected layers near the end. 5 6 Current rends Boltzmann Machine to Deep Belief Nets Deep belief networks (based on Boltzmann machine) Deep neural networks Haykin Chapter : Stochastic Methods rooted in statistical mechanics. Convolutional neural networks Deep Q-learning Network (extensions to reinforcement learning) 7 8

Boltzmann Machine Boltzmann Machine: Energy Network state: x from random variable X. w ij = w ji and w ii = 0. Energy (in analogy to thermodynamics): E(x) = w ji x i x j 2 i j,i j Stochastic binary machine: + or -. Fully connected symmetric connections: w ij = w ji. Visible vs. hidden neurons, clamped vs. free-running. Goal: Learn weights to model prob. dist of visible units. Unsupervised. Pattern completion. 9 0 Boltzmann Machine: Prob. of a State x Probability of a state x given E(x) follows the Gibbs distribution: P (X = x) = ( Z, Z: partition function (normalization factor hard to compute) Z = x : temperature parameter. exp( E(x)/ ) Low energy states are exponentially more probable. With the above, we can calculate P (X j = x {X i = x i } K i=,i j ) his can be done without knowing Z. Boltzmann Machine: P (X j = x the rest) A : X j = x. B : {X i = x i } K i=,i j P (X j = x the rest) = P (A, B) P (B) (the rest). P (A, B) = A P (A, B) = P (A, B) P (A, B) + P ( A, B) = ( ) + exp x i,i j w jix i = sigmoid x w ji x i i,i j Can compute equilibrium state based on the above. 2

Boltzmann Machine: Gibbs Sampling Initialize x (0) to a random vector. For j =, 2,..., n (generate n samples x P (X)) x (j+) from p(x x (j) 2, x(j) 3,..., x(j) K ) x (j+) 2 from p(x 2 x (j+), x (j) 3 x (j+) 3 from p(x 3 x (j+), x (j+)... x (j+) K,..., x(j) K ) 2, x (j) 4..., x(j) K ) from p(x K x (j+), x (j+) 2, x (j+) 3..., x (j+) K ) One new sample x (j+) P (X). Simulated annealing used (high to low ) for faster conv. Boltzmann Learning Rule () Probability of activity pattern being one of the training patterns (visible unit: subvector x α ; hidden unit: subvector y β ), given the weight vector w. P (X α = x α ) Log-likelihood of the visible units being any one of the trainning patterns (assuming they are mutually independent) : L(w) = log = x α x α We want to learn w that maximizes L(w). P (X α = x α ) log P (X α = x α ) 3 4 Boltzmann Learning Rule (2) Want to calculate P (X α = x α ): use energy function. P (X α = x α ) = ( Z x β log P (X α = x α ) = log ( log Z x β = log ( x β log ( x Note: Z = x exp ( E(x) 5 ) Finally, we get: Boltzmann Learning Rule (3) L(w) = log ( log ( xα x β x Note that w is involved in: E(x) = w ji x i x j 2 i j,i j Differentiating L(w) wrt w ji, we get: L(w) = P (X β = x β Xα = xα)x j x i xα x β ) P (X = x)x j x i x 6

Setting: We get: ρ + ji = Boltzmann Learning Rule (4) x α ρ ji = P (X β = x β X α = x α )x j x i x β x α L(w) P (X = x)x j x i x = ( ) ρ + ji ρ ji heoretically elegant. Boltzmann Machine Summary Very slow in practice (especially the unclamped phase). Attempting to maximize L(w), we get: where η = ɛ w ji = ɛ L(w) = η. his is gradient ascent. ( ) ρ + ji ρ ji 7 8 Logistic (or Directed) Belief Net Deep Belief Net () Overcomes issues with Logistic Belief Net. Hinton et al. (2006) Similar to Boltzmann Machine, but with directed, acyclic connections. P (X j = x j X = x,..., X j = x j ) = P (X j = x j parents(x j )) Same learning rule: w ji = η L(w) With dense connetions, calculation of P becomes intractable. 9 Based on Restricted Boltzmann Machine (RBM): visible and hidden layers, with layer-to-layer full connection but no within-layer connections. RBM Back-and-forth update: update hidden given visible, then update visible given hidden, etc., then train w based on L(w) = ρ (0) ji ρ ( ) ji 20

Deep Belief Net (2) Deep Convolutional Neural Networks () Deep Belief Net = Layer-by-layer training using RBM. Hybrid architecture: op layer = undirected, lower layers directed.. rain RBM based on input to form hidden representation. 2. Use hidden representation as input to train another RBM. 3. Repeat steps 2-3. Applications: NIS digit recognition. Krizhevsky et al. (202) Applied to ImageNet competition (.2 million images,,000 classes). Network: 60 million parameters and 650,000 neurons. op- and top-5 error rates of 37.5% and 7.0%. rained with backprop. 2 22 Deep Convolutional Neural Networks (2) Deep Convolutional Neural Networks (3) Learned kernels (first convolutional layer). Resembles mammalian RFs: oriented Gabor patterns, color opponency (red-green, blue-yellow). Left: Hits and misses and close calls. Right: est (st column) vs. training images with closest hidden representation to the test data. 23 24

Deep Q-Network (DQN) DQN Overview Google Deep Mind (Mnih et al. Nature 205). Latest application of deep learning to a reinforcement learning domain (Q as in Q-learning). Applied to Atari 2600 video game playing. Input: video screen; Output: Q(s, a); Reward: game score. Q(s, a): action-value function Value of taking action a when in state s. 25 26 DQN Overview DQN Algorithm Input preprocessing Experience replay (collect and replay state, action, reward, and resulting state) Delayed (periodic) update of Q. Moving target ˆQ value used to compute error (loss function L, parameterized by weights θ i ). Gradient descent: L θ i 27 28

DQN Results Superhuman performance on over half of the games. 29 DQN Operation DQN Hidden Layer Representation (t-sne map) Similar perception, similar reward clustered. 30 Summary Deep belief network: Based on Boltzmann machine. Elegant theory, good performance. Deep convolutional networks: High computational demand, over the board great performance. Deep Q-Network: unique apporach to reinforcement learning. End-to-end machine learning. Super-human performance. Value vs. game state; Game state vs. action value. 32