CSCE 636 Neural Networks. Instructor: Yoonsuck Choe.

Deep Learning

What Is Deep Learning? Learning higher-level abstractions/representations from data. Motivation: how the brain represents sensory information in a hierarchical manner.

The Rise of Deep Learning. Made popular in recent years: Geoffrey Hinton et al., 2006; Andrew Ng & Jeff Dean (Google Brain team), 2012; Schmidhuber et al.'s deep neural networks won many competitions and in some cases showed super-human performance, 2012; Google DeepMind: Atari 2600 games (2015), AlphaGo (2016).

Long History (in Hindsight). Fukushima's Neocognitron (1980). LeCun et al.'s convolutional neural networks (1989). Schmidhuber's work on stacked recurrent neural networks (1993); vanishing gradient problem. See Schmidhuber's extended review: Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.

Fukushima's Neocognitron. Appeared in the journal Biological Cybernetics (1980). Multiple layers with local receptive fields. S cells: trainable; C cells: fixed weight. Deformation-resistant recognition.

LeCun's Convolutional Neural Nets. Convolution (kernel weight sharing) + subsampling; fully connected layers near the end (a small sketch of these two operations follows this section).

Current Trends. Deep belief networks (based on the Boltzmann machine). Deep neural networks. Convolutional neural networks. Deep Q-learning Network (extensions to reinforcement learning). Applications to diverse domains, including natural language. Lots of open-source tools available.

Boltzmann Machine to Deep Belief Nets. Haykin Chapter 11: Stochastic methods rooted in statistical mechanics.
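
The two operations named above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions, not the Neocognitron or LeNet code; conv2d, subsample, and the toy shapes are names and choices made for the example.

```python
# Minimal sketch of convolution with a shared kernel followed by subsampling.
# Illustrative only; not taken from any of the papers cited above.
import numpy as np

def conv2d(image, kernel):
    """Slide one shared kernel over the image (cross-correlation, as CNNs
    implement 'convolution'); the same weights are reused at every position."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(feature_map, size=2):
    """Subsampling by averaging non-overlapping size x size blocks."""
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size            # trim to a multiple of size
    fm = feature_map[:H, :W]
    return fm.reshape(H // size, size, W // size, size).mean(axis=(1, 3))

image = np.random.rand(28, 28)                   # toy input
kernel = np.random.randn(5, 5)                   # one shared (trainable) kernel
pooled = subsample(np.maximum(conv2d(image, kernel), 0))   # conv -> ReLU -> pool
print(pooled.shape)                              # (12, 12)
```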

Boltzmann Machine. Stochastic binary machine: states +1 or -1. Fully connected, symmetric connections: w_{ij} = w_{ji}. Visible vs. hidden neurons; clamped vs. free-running. Goal: learn weights to model the probability distribution of the visible units. Unsupervised. Pattern completion.

Boltzmann Machine: Energy. Network state: x, from random variable X. w_{ij} = w_{ji} and w_{ii} = 0. Energy, in analogy to thermodynamics:

E(x) = -\frac{1}{2} \sum_i \sum_{j, j \neq i} w_{ji} x_i x_j

Boltzmann Machine: Probability of a State x. The probability of a state x with energy E(x) follows the Gibbs distribution:

P(X = x) = \frac{1}{Z} \exp\left(-\frac{E(x)}{T}\right)

Z: partition function (normalization factor, hard to compute), Z = \sum_x \exp(-E(x)/T). T: temperature parameter. Low-energy states are exponentially more probable. With the above, we can calculate P(X_j = x \mid \{X_i = x_i\}_{i=1, i \neq j}^K), and this can be done without knowing Z.

Boltzmann Machine: P(X_j = x | the rest). Let A : X_j = x and B : \{X_i = x_i\}_{i=1, i \neq j}^K. Then

P(X_j = x \mid \text{the rest}) = \frac{P(A, B)}{P(B)} = \frac{P(A, B)}{P(A, B) + P(\bar{A}, B)} = \frac{1}{1 + \exp\left(-\frac{2x}{T} \sum_{i, i \neq j} w_{ji} x_i\right)} = \mathrm{sigmoid}\left(\frac{2x}{T} \sum_{i, i \neq j} w_{ji} x_i\right)

We can compute the equilibrium state based on the above.
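
The energy and the per-unit conditional probability above translate almost directly into code. Below is a minimal NumPy sketch for ±1 units, assuming a small random symmetric weight matrix; energy and prob_unit_on are illustrative names, not from any library.

```python
# Minimal sketch of the Boltzmann machine quantities defined above,
# for +/-1 units with symmetric weights and zero self-connections.
import numpy as np

def energy(x, W):
    """E(x) = -(1/2) * sum_{i != j} w_ji x_i x_j (diagonal of W is zero)."""
    return -0.5 * x @ W @ x

def prob_unit_on(x, W, j, T=1.0):
    """P(X_j = +1 | the rest) = sigmoid((2/T) * sum_{i != j} w_ji x_i)."""
    net = W[j] @ x - W[j, j] * x[j]          # exclude the (zero) self term
    return 1.0 / (1.0 + np.exp(-2.0 * net / T))

rng = np.random.default_rng(0)
K = 6
W = rng.normal(size=(K, K)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
x = rng.choice([-1.0, 1.0], size=K)
print(energy(x, W), prob_unit_on(x, W, j=2))
```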

Boltzmann Machine: Gibbs Sampling. Initialize x^{(0)} to a random vector. For j = 0, 1, ..., n-1, generate n samples x \sim P(X = x), one per sweep (a short code sketch of one such sweep follows this section):

x_1^{(j+1)} from p(x_1 \mid x_2^{(j)}, x_3^{(j)}, ..., x_K^{(j)})
x_2^{(j+1)} from p(x_2 \mid x_1^{(j+1)}, x_3^{(j)}, ..., x_K^{(j)})
x_3^{(j+1)} from p(x_3 \mid x_1^{(j+1)}, x_2^{(j+1)}, x_4^{(j)}, ..., x_K^{(j)})
...
x_K^{(j+1)} from p(x_K \mid x_1^{(j+1)}, x_2^{(j+1)}, ..., x_{K-1}^{(j+1)})

Each full sweep yields one new sample x^{(j+1)} \sim P(X). Simulated annealing (T from high to low) is used for faster convergence.

Boltzmann Learning Rule. Probability of an activity pattern being one of the training patterns (visible units: subvector x_\alpha; hidden units: subvector x_\beta), given the weight vector w: P(X_\alpha = x_\alpha). Log-likelihood of the visible units being any one of the training patterns (assuming they are mutually independent):

L(w) = \log \prod_{x_\alpha \in \mathcal{T}} P(X_\alpha = x_\alpha) = \sum_{x_\alpha \in \mathcal{T}} \log P(X_\alpha = x_\alpha)

We want to learn the w that maximizes L(w).

Boltzmann Learning Rule 2. We want to calculate P(X_\alpha = x_\alpha), the probability of finding the visible neurons in state x_\alpha with any x_\beta: use the energy function.

P(X_\alpha = x_\alpha) = \sum_{x_\beta} P(X_\alpha = x_\alpha, X_\beta = x_\beta) = \frac{1}{Z} \sum_{x_\beta} \exp\left(-\frac{E(x)}{T}\right)

\log P(X_\alpha = x_\alpha) = \log \sum_{x_\beta} \exp\left(-\frac{E(x)}{T}\right) - \log Z

Note: Z = \sum_x \exp(-E(x)/T), so \log Z = \log \sum_x \exp(-E(x)/T).

Boltzmann Learning Rule 3. Finally, we get:

L(w) = \sum_{x_\alpha \in \mathcal{T}} \left[ \log \sum_{x_\beta} \exp\left(-\frac{E(x)}{T}\right) - \log \sum_x \exp\left(-\frac{E(x)}{T}\right) \right]

Note that w is involved in E(x) = -\frac{1}{2} \sum_i \sum_{j, j \neq i} w_{ji} x_i x_j. Differentiating L(w) with respect to w_{ji}, we get:

\frac{\partial L(w)}{\partial w_{ji}} = \frac{1}{T} \left[ \sum_{x_\alpha \in \mathcal{T}} \sum_{x_\beta} P(X_\beta = x_\beta \mid X_\alpha = x_\alpha)\, x_j x_i - \sum_{x_\alpha \in \mathcal{T}} \sum_x P(X = x)\, x_j x_i \right]
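
As a companion to the Gibbs sampling procedure above, here is a hedged NumPy sketch of one sampling sweep and a simple chain. gibbs_sweep, sample_chain, and the optional temperature schedule (standing in for simulated annealing) are illustrative assumptions, not a canonical implementation.

```python
# Sketch of the Gibbs sampling sweep described above: each +/-1 unit is
# resampled in turn from its conditional distribution given the rest.
import numpy as np

def gibbs_sweep(x, W, T=1.0, rng=None):
    """One pass j = 1..K: resample x_j from
    P(X_j = +1 | the rest) = sigmoid((2/T) * sum_{i != j} w_ji x_i)."""
    if rng is None:
        rng = np.random.default_rng()
    x = x.copy()
    for j in range(len(x)):
        net = W[j] @ x - W[j, j] * x[j]            # exclude self-connection
        p_on = 1.0 / (1.0 + np.exp(-2.0 * net / T))
        x[j] = 1.0 if rng.random() < p_on else -1.0
    return x

def sample_chain(x0, W, n_sweeps=100, T_schedule=None):
    """Run repeated sweeps; an optional temperature schedule (high -> low)
    plays the role of simulated annealing for faster convergence."""
    x, samples = x0.copy(), []
    for t in range(n_sweeps):
        T = T_schedule[t] if T_schedule is not None else 1.0
        x = gibbs_sweep(x, W, T)
        samples.append(x.copy())               # each sweep yields one sample
    return samples
```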

Boltzmann Learning Rule 3-2: Some hints. To derive the gradient above, note:

\frac{\partial E(x)}{\partial w_{ji}} = -x_i x_j, \quad i \neq j \quad \text{(the factor } \tfrac{1}{2} \text{ cancels because } w_{ji} = w_{ij})

P(X_\beta = x_\beta \mid X_\alpha = x_\alpha) = \frac{P(X_\alpha = x_\alpha, X_\beta = x_\beta)}{P(X_\alpha = x_\alpha)} = \frac{\exp(-E(x)/T)}{Z\, P(X_\alpha = x_\alpha)}

P(X = x) = \frac{\exp(-E(x)/T)}{Z}, \quad Z = \sum_x \exp\left(-\frac{E(x)}{T}\right)

Boltzmann Learning Rule 4. Setting

\rho^+_{ji} = \sum_{x_\alpha \in \mathcal{T}} \sum_{x_\beta} P(X_\beta = x_\beta \mid X_\alpha = x_\alpha)\, x_j x_i, \quad \rho^-_{ji} = \sum_{x_\alpha \in \mathcal{T}} \sum_x P(X = x)\, x_j x_i

we get \frac{\partial L(w)}{\partial w_{ji}} = \frac{1}{T} (\rho^+_{ji} - \rho^-_{ji}). Attempting to maximize L(w), we get

\Delta w_{ji} = \epsilon \frac{\partial L(w)}{\partial w_{ji}} = \eta (\rho^+_{ji} - \rho^-_{ji}), \quad \text{where } \eta = \frac{\epsilon}{T}

This is gradient ascent (a sampling-based sketch of this rule follows this section).

Boltzmann Machine Summary. Theoretically elegant. Very slow in practice, especially the unclamped (free-running) phase.

Logistic or Directed Belief Net. Similar to the Boltzmann machine, but with directed, acyclic connections.

P(X_j = x_j \mid X_1 = x_1, ..., X_{j-1} = x_{j-1}) = P(X_j = x_j \mid \text{parents}(X_j))

Same learning rule: \Delta w_{ji} = \eta \frac{\partial L(w)}{\partial w_{ji}}. With dense connections, the calculation of P becomes intractable.
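
A rough NumPy sketch of the gradient-ascent rule Δw_ji = η(ρ⁺_ji − ρ⁻_ji), estimating the clamped and free-running correlations by Gibbs sampling (it reuses gibbs_sweep from the sketch above). The sampling schedule, estimate_correlations, and boltzmann_step are illustrative choices made for this example, not the canonical Boltzmann machine implementation.

```python
# Crude estimator: one clamped sample per training pattern (rho+) and a
# free-running chain (rho-), then gradient ascent on the weights.
import numpy as np

def estimate_correlations(W, data, n_hidden, n_sweeps=20, T=1.0):
    K = W.shape[0]
    n_vis = K - n_hidden
    rho_plus, rho_minus = np.zeros((K, K)), np.zeros((K, K))
    rng = np.random.default_rng(0)
    for v in data:                                     # clamped (positive) phase
        x = np.concatenate([v, rng.choice([-1.0, 1.0], size=n_hidden)])
        for _ in range(n_sweeps):
            for j in range(n_vis, K):                  # resample hidden units only
                net = W[j] @ x - W[j, j] * x[j]
                p_on = 1.0 / (1.0 + np.exp(-2.0 * net / T))
                x[j] = 1.0 if rng.random() < p_on else -1.0
        rho_plus += np.outer(x, x)
    x = rng.choice([-1.0, 1.0], size=K)                # free-running (negative) phase
    for _ in range(n_sweeps * len(data)):
        x = gibbs_sweep(x, W, T)                       # from the earlier sketch
        rho_minus += np.outer(x, x)
    return rho_plus / len(data), rho_minus / (n_sweeps * len(data))

def boltzmann_step(W, data, n_hidden, eta=0.01):
    rho_plus, rho_minus = estimate_correlations(W, data, n_hidden)
    W += eta * (rho_plus - rho_minus)                  # gradient ascent on L(w)
    np.fill_diagonal(W, 0.0)                           # keep w_ii = 0
    return W
```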

Deep Belief Net. Hinton et al. (2006). Based on the Restricted Boltzmann Machine (RBM): visible and hidden layers, with full layer-to-layer connections but no within-layer connections. RBM back-and-forth update: update the hidden units given the visible units, then update the visible units given the hidden units, and so on; then train w based on \frac{\partial L(w)}{\partial w_{ji}} \propto \rho^{(0)}_{ji} - \rho^{(\infty)}_{ji}.

Deep Belief Net 2. Layer-by-layer training using RBMs (a layer-wise training sketch follows this section). Hybrid architecture: top layer undirected, lower layers directed. Overcomes issues with the Logistic Belief Net. 1. Train an RBM on the input to form a hidden representation. 2. Use the hidden representation as input to train another RBM. 3. Repeat steps 1-2. Applications: MNIST digit recognition.

Deep Convolutional Neural Networks. Krizhevsky et al. (2012). Applied to the ImageNet competition: 1.2 million images, 1,000 classes. Network: 60 million parameters and 650,000 neurons. Top-1 and top-5 error rates of 37.5% and 17.0%. Trained with backprop.

Deep Convolutional Neural Networks 2. Learned kernels (first convolutional layer) resemble mammalian receptive fields: oriented Gabor patterns, color opponency (red-green, blue-yellow).
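
The RBM back-and-forth update and the greedy layer-wise stacking described above can be sketched as follows. This assumes binary 0/1 units, omits biases, and approximates ρ^(0)_ji − ρ^(∞)_ji with a single reconstruction step (a CD-1-style shortcut); cd1_step and train_dbn are illustrative names, not Hinton et al.'s code.

```python
# Sketch of one RBM update and greedy layer-wise stacking of RBMs.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(W, v0, eta=0.1, rng=np.random.default_rng()):
    """One contrastive-divergence-style update for a single RBM.
    v0: batch of visible vectors, shape (n, n_vis); W: (n_vis, n_hid)."""
    h0_prob = sigmoid(v0 @ W)                      # update hidden given visible
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    v1_prob = sigmoid(h0 @ W.T)                    # update visible given hidden
    h1_prob = sigmoid(v1_prob @ W)                 # hidden again
    pos = v0.T @ h0_prob                           # data-driven correlations
    neg = v1_prob.T @ h1_prob                      # reconstruction correlations
    W += eta * (pos - neg) / v0.shape[0]
    return W

def train_dbn(data, layer_sizes, epochs=10):
    """Greedy layer-wise training: train an RBM, use its hidden
    representation as input to the next RBM, and repeat."""
    reps, weights = data, []
    n_in = data.shape[1]
    for n_hid in layer_sizes:
        W = 0.01 * np.random.randn(n_in, n_hid)
        for _ in range(epochs):
            W = cd1_step(W, reps)
        weights.append(W)
        reps = sigmoid(reps @ W)                   # hidden rep feeds the next layer
        n_in = n_hid
    return weights
```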

Deep Convolutional Neural Networks 3. [Figure: left, hits, misses, and close calls; right, test images (first column) vs. training images with the closest hidden representation to the test data.]

Deep Q-Network (DQN). Google DeepMind (Mnih et al., Nature, 2015). Latest application of deep learning to a reinforcement learning domain (Q as in Q-learning). Applied to Atari 2600 video game playing.

DQN Overview. Input: video screen; output: Q(s, a); reward: game score. Q(s, a): action-value function, the value of taking action a when in state s.

DQN Overview 2. Input preprocessing. Experience replay: collect and replay (state, action, reward, resulting state) tuples. Delayed (periodic) update of Q: a moving target \hat{Q} value is used to compute the error (loss) function L_i, parameterized by weights \theta_i. Gradient descent on \frac{\partial L_i}{\partial \theta_i}.
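
A minimal sketch of the DQN ingredients just listed: an experience-replay buffer and the loss built from a periodically updated target network Q̂. The Q-network itself is abstracted as a callable q_fn(states, theta); that callable and every other name here are assumptions made for illustration, not the DeepMind implementation.

```python
# Sketch of experience replay plus the target-network loss used by DQN.
from collections import deque
import numpy as np

replay = deque(maxlen=100_000)        # stores (s, a, r, s_next, done) tuples

def dqn_loss(batch, q_fn, theta, theta_target, gamma=0.99):
    """L_i(theta_i) = mean (y - Q(s, a; theta_i))^2 with the moving target
    y = r + gamma * max_a' Q_hat(s', a'; theta_target)."""
    s, a, r, s_next, done = map(np.array, zip(*batch))
    q_next = q_fn(s_next, theta_target).max(axis=1)     # target network Q_hat
    y = r + gamma * q_next * (1.0 - done)               # no bootstrap at episode end
    q_sa = q_fn(s, theta)[np.arange(len(batch)), a.astype(int)]
    return np.mean((y - q_sa) ** 2)

def sample_minibatch(batch_size=32, rng=np.random.default_rng()):
    """Draw a random minibatch of stored transitions for one update."""
    idx = rng.integers(len(replay), size=batch_size)
    return [replay[int(i)] for i in idx]

# Training-loop skeleton: act (e.g. epsilon-greedily), store the transition
# in `replay`, sample a minibatch, take a gradient step on dqn_loss w.r.t.
# theta, and copy theta into theta_target every C steps (the delayed update).
```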

DQN Algorithm.

DQN Results. Superhuman performance on over half of the games.

DQN Hidden Layer Representation. t-SNE map of hidden-layer activity: states with similar perception and similar reward are clustered together.

DQN Operation. Value vs. game state; game state vs. action value.

Deep Learning Tools. Caffe: UC Berkeley's deep learning toolbox. Google TensorFlow. Microsoft CNTK (Computational Network Toolkit). Other: Apache Mahout (MapReduce-based ML).

Summary. Deep belief network: based on the Boltzmann machine; elegant theory, good performance. Deep convolutional networks: high computational demand, across-the-board great performance. Deep Q-Network: a unique approach to reinforcement learning; end-to-end machine learning; super-human performance. A flood of deep learning tools is available.