Reward-modulated inference

Buck Shlegeris and Matthew Alger. COMP3740, 2014.

Outline: supervised, unsupervised, and reinforcement learning; neural nets; RMI; results with RMI.

Types of machine learning: supervised, unsupervised, reinforcement.

MNIST

Supervised learning: Given some pairs (x, f(x)), approximate f. Examples: classification, regression, prediction.

Unsupervised learning: Given data, find patterns. The goal is to increase the likelihood of the observed data. Related to compression; useful as a preprocessing step.

Reinforcement learning: You see some stuff and get some reward; what do you want to do? This is way more general and much harder, so we make assumptions: a stationary MDP. Value estimation? (Use supervised prediction?) Explore vs. exploit? Preprocessing somehow?

Neural nets: What's a nice class of functions from R^n to R^m?

Neural nets: f(x) = Wx + b (1). Let's use the name θ for our model, the combination of W and b.

Logistic regression: P(Y = i | θ, x) = s(Wx + b)_i (2), where s rescales vectors in R^n to have an L1 norm of 1. prediction(θ, x) = argmax_i P(Y = i | θ, x) (3).
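
As a rough NumPy sketch of equations (2) and (3): the softmax form of s and the helper names softmax_probs and predict are my own assumptions, not from the slides.

import numpy as np

def softmax_probs(W, b, x):
    # s(Wx + b): rescale the scores into a probability vector (L1 norm 1)
    z = W @ x + b
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def predict(W, b, x):
    # equation (3): prediction(theta, x) = argmax_i P(Y = i | theta, x)
    return int(np.argmax(softmax_probs(W, b, x)))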

Logistic regression: L(θ, x, y) = −log(s(Wx + b)_y) (4). Overfitting? Add a regularizer: L(θ, x, y) = −log(s(Wx + b)_y) + R(θ) (5). What if instead of a single input vector x and single label y, we had a whole list of inputs D and a vector of labels y? L(θ, D, y) = Σ_{i ∈ D} L(θ, D_i, y_i) + R(θ) (6).
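
A sketch of losses (4)-(6), reusing softmax_probs from the earlier sketch; the negative sign (negative log-likelihood) and the L2 form of R(θ) are assumptions on my part.

def nll(W, b, x, y):
    # equation (4): negative log-probability of the true label y
    return -np.log(softmax_probs(W, b, x)[y])

def dataset_loss(W, b, D, ys, reg=1e-3):
    # equation (6): sum the per-example losses over the dataset, plus a regularizer R(theta)
    # R(theta) = reg * ||W||^2 is an assumed choice
    return sum(nll(W, b, x, y) for x, y in zip(D, ys)) + reg * float(np.sum(W ** 2))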

Logistic regression: θ*(D, y) = argmin_θ L(θ, D, y) (7). θ ∈ (R^{x×y} × R^y), so good luck finding that analytically.

Logistic regression. Gradient descent:
while the last update was bigger than ε do
    W_new ← W − α ∂L(θ, D, y)/∂W
end while

Logistic regression. Stochastic gradient descent:
for each input x and label y in (D, y) do
    W ← W − α ∂L(θ, x, y)/∂W
end for
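
A sketch of one SGD pass, reusing softmax_probs; the gradient (p − onehot(y)) xᵀ is the standard softmax cross-entropy gradient, and the learning rate α is a placeholder.

def sgd_epoch(W, b, D, ys, alpha=0.1):
    # one pass of stochastic gradient descent over (D, y)
    for x, y in zip(D, ys):
        p = softmax_probs(W, b, x)
        p[y] -= 1.0                     # p - onehot(y): gradient w.r.t. the scores
        W -= alpha * np.outer(p, x)     # W <- W - alpha * dL/dW
        b -= alpha * p                  # b <- b - alpha * dL/db
    return W, b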

Logistic regression: Only linearly separable things can be separated!

Multilayer perceptron. Logistic regression: s(Wx + b). Multilayer perceptron: s(W s(W2 x + b2) + b) = s(Wh + b), where h = s(W2 x + b2).
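
A sketch of the forward pass on this slide, reusing softmax_probs from the earlier sketch; applying the same s at the hidden layer follows the formula literally, though in practice the hidden nonlinearity is usually a sigmoid or tanh (my note, not the slide's).

def mlp_forward(W, b, W2, b2, x):
    # s(W s(W2 x + b2) + b) = s(W h + b)
    h = softmax_probs(W2, b2, x)    # hidden representation h = s(W2 x + b2)
    return softmax_probs(W, b, h)   # class probabilities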

Multilayer perceptron: Why stop at 1 layer?

Autoencoders

Autoencoders: x̂ = s(W s(W2 x + b2) + b) (8). L(θ, x) = ‖x − x̂‖ (9).
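
A sketch of equations (8)-(9); the elementwise sigmoid for s and the squared error for ‖x − x̂‖ are assumed choices on my part.

import numpy as np

def sigmoid(z):
    # elementwise squashing nonlinearity (an assumed choice for s in the autoencoder)
    return 1.0 / (1.0 + np.exp(-z))

def reconstruct(W, b, W2, b2, x):
    # equation (8): x_hat = s(W s(W2 x + b2) + b)
    h = sigmoid(W2 @ x + b2)    # encode
    return sigmoid(W @ h + b)   # decode

def reconstruction_loss(W, b, W2, b2, x):
    # equation (9): ||x - x_hat||, here as a squared error
    x_hat = reconstruct(W, b, W2, b2, x)
    return float(np.sum((x - x_hat) ** 2))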

Denoising autoencoders (das)

Denoising autoencoders (das): x̂ = s(W s(W2 n(x) + b2) + b) (10), where n is a stochastic noise function.
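
A sketch of equation (10), reusing reconstruct from the autoencoder sketch above; masking noise (randomly zeroing input components) is one common choice for n, assumed here.

def masking_noise(x, corruption=0.3, rng=None):
    # n(x): zero out a random fraction of the input components
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= corruption
    return x * mask

def denoising_reconstruct(W, b, W2, b2, x):
    # equation (10): s(W s(W2 n(x) + b2) + b), reconstructing the clean x from a corrupted copy
    return reconstruct(W, b, W2, b2, masking_noise(x))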

Denoising autoencoders (das): s(W s(W2 x + b2) + b) (11).

Denoising autoencoders (das): We now have two losses, L(θ, x, y) = −log(s(Wx + b)_y) (12) and L(θ, x) = ‖x − x̂‖ (13). How do we mix between these?

Reward modulation: Add a time-varying modulation function λ(t):
L(θ, x, y) = λ(t) · (supervised cost) + (1 − λ(t)) · (unsupervised cost) (14)
L(θ, x, y) = λ(t) · (−log(s(Wx + b)_y)) + (1 − λ(t)) · ‖x − x̂‖ (15)
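
A sketch of equation (14); how the classifier and the autoencoder share parameters is not spelled out on these slides, so this version just mixes two already-computed costs, with λ a function of the epoch t.

def reward_modulated_loss(supervised_cost, unsupervised_cost, t, lam):
    # equation (14): mix the supervised and unsupervised costs with lambda(t)
    return lam(t) * supervised_cost + (1.0 - lam(t)) * unsupervised_cost

# equation (15), using the earlier sketches (assumed wiring, shared W, b, W2, b2):
# loss = reward_modulated_loss(nll(W, b, x, y),
#                              reconstruction_loss(W, b, W2, b2, x), t, lam)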

Reward modulation. Motivations: information theory, fine tuning, extreme learning.

Reward modulation. Questions: Does it improve performance on problems? How should we vary reward modulation?

Results

Hyperparameters


Classification problem. Figure: Error rate vs. epochs for step reward modulation, with λ = 0 if t < p and λ = 1 otherwise, for p ∈ {0, 1, 2, 4, 10, 15, 20}.

Classification problem. Figure: Error rate vs. epochs for linear reward modulation, with λ increasing linearly in t at different rates.

Classification problem. Figure: Error rate vs. epochs for hyperbolic reward modulation, λ = 1 − c/(c + kt), with various c and k.
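
Sketches of the three λ(t) schedules the figures describe; the parameter names (p, slope, c, k) are mine, and the hyperbolic form 1 − c/(c + kt) is read off the figure legend, so treat the details as assumptions.

def step_lambda(p):
    # step modulation: fully unsupervised (lambda = 0) until epoch p, then fully supervised
    return lambda t: 0.0 if t < p else 1.0

def linear_lambda(slope):
    # linear modulation: lambda grows linearly with the epoch, capped at 1
    return lambda t: min(1.0, slope * t)

def hyperbolic_lambda(c, k):
    # hyperbolic modulation: lambda = 1 - c / (c + k * t)
    return lambda t: 1.0 - c / (c + k * t)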

Contextual bandit: no results yet.

MDP

Conclusions: RMI works on classification (maybe because it's like fine-tuning). We haven't got contextual bandit results yet. It works nicely on grid world for some reason; more MDP data to come.