Recurrent Neural Networks + Imitation Learning. Hal Daumé III, University of Maryland


Transcription:

Recurrent Neural Networks + Imitation Learning. Hal Daumé III, University of Maryland. me@hal3.name, @haldaume3

Recurrent Neural Networks + Imitation Learning. Non-differentiable? Discontinuous? Non-backpropable? Discrete choices? Hal Daumé III, University of Maryland. me@hal3.name, @haldaume3

Examples of structured prediction (joint prediction). Running example: "The monster ate a big sandwich."

Sequence labeling. POS tagging: x = "the monster ate the sandwich", y = "Dt Nn Vb Dt Nn". Named entity recognition: x = "Yesterday I traveled to Lille", y = "- PER - - LOC". (Image credit: Richard Padgett)

Natural language parsing. Input: "NLP algorithms use a kitchen sink of features"; output: a dependency tree over the sentence ([root], subject, object, n-mod, p-mod arcs).

Segmentation. (Image credit: Daniel Muñoz)

Simultaneous (machine) interpretation. Nuremberg Trials: dozens of defendants, judges from four nations (three languages). Status quo: speak, then translate; the long wait makes for bad conversation. After Nuremberg, simultaneous translation became the norm.

Why simultaneous interpretation is hard. Human languages have vastly different word orders: about half are OV, the other half are VO, and this comes with a lot more baggage than just verb-final order. Gloss example: "man-top store-loc go-past" = "the man went to the store"; "man-top food-obj buy-desire store-loc go-past" = "the man (who wanted to buy food) went to the store". Running (German/English) example: "Ich bin mit dem Zug nach Ulm gefahren", gloss "I am with the train to Ulm traveled"; the interpreter must produce "I (... waiting ...) traveled by train to Ulm".

Model for interpretation decisions. We have a set of actions (predict / translate): wait; predict the clause-final verb; predict the next word; commit ("speak"). In a changing environment (state): the words we've seen so far; our models' internal predictions. With well-defined notions of: reward (or loss) at the end, and the optimal action at training time.
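To make the action/state/loss decomposition concrete, here is a minimal Python sketch; the class names and the penalty constant are illustrative assumptions, not the talk's actual system:

```python
from enum import Enum, auto

class Action(Enum):
    WAIT = auto()          # read one more source word
    PREDICT_VERB = auto()  # guess the clause-final verb before seeing it
    PREDICT_NEXT = auto()  # guess the next source word
    COMMIT = auto()        # "speak": emit the pending translation

class State:
    """The changing environment the policy conditions on."""
    def __init__(self):
        self.source_so_far = []      # the words we've seen so far
        self.model_predictions = []  # our models' internal predictions
        self.output_so_far = []      # translation committed ("spoken") so far

def final_loss(state, reference):
    """Loss is only well defined at the end of the trajectory: translation
    quality traded off against delay (both terms are placeholders)."""
    quality = len(set(state.output_so_far) & set(reference)) / max(len(reference), 1)
    delay_penalty = 0.01 * len(state.source_so_far)
    return (1.0 - quality) + delay_penalty
```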

Example of interpretation trajectory. "Ich bin mit dem Zug nach Ulm gefahren", gloss "I am with the train to Ulm traveled"; output "I (... waiting ...) traveled by train to Ulm". Big challenges: no supervision about when to wait; complicated loss/reward functions.

Back to the original problem... How to optimize a discrete, joint loss? Input: x ∈ X. Truth: y ∈ Y(x). Outputs: Y(x). Predicted: ŷ ∈ Y(x). Loss: loss(y, ŷ). Data: (x, y) ~ D. Example: x = "I can can a can"; Y(x) includes Pro Md Vb Dt Nn, Pro Md Md Dt Vb, Pro Md Md Dt Nn, Pro Md Nn Dt Md, Pro Md Nn Dt Vb, Pro Md Nn Dt Nn, Pro Md Vb Dt Md, Pro Md Vb Dt Vb, ...

Back to the original problem... How to optimize a discrete, joint loss? Input: x ∈ X. Truth: y ∈ Y(x). Outputs: Y(x). Predicted: ŷ ∈ Y(x). Loss: loss(y, ŷ). Data: (x, y) ~ D. Goal: find h ∈ H such that h(x) ∈ Y(x), minimizing E_(x,y)~D [ loss(y, h(x)) ], based on N samples (x_n, y_n) ~ D.
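Spelled out in LaTeX, the slide's goal is:

```latex
\text{find } h \in \mathcal{H} \text{ with } h(x) \in \mathcal{Y}(x)
\text{ minimizing } \mathbb{E}_{(x,y) \sim D}\!\left[ \mathrm{loss}\big(y, h(x)\big) \right],
\text{ based on } N \text{ samples } (x_n, y_n) \sim D.
```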

Search spaces. When y decomposes in an ordered manner, a sequential decision-making process emerges: at each word of "I can can ..." there is a decision among {Pro, Md, Vb, Dt, Nn}, and choosing one tag is an action.

Search spaces. When y decomposes in an ordered manner, a sequential decision-making process emerges. A complete trajectory reaching the end state e encodes an output ŷ = ŷ(e), from which loss(y, ŷ) can be computed (at training time).

Policies. A policy maps observations to actions: π(o) = a, where the observation o contains the input x, the timestep t, the partial trajectory τ, and anything else.
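Putting the last three slides together, here is a hedged sketch of rolling a policy over the tagging example; the names are illustrative, and a real system would use learned features rather than a constant policy:

```python
TAGS = ["Pro", "Md", "Vb", "Dt", "Nn"]

def run_policy(policy, words):
    """Roll the policy left to right; the action sequence is the output ŷ."""
    trajectory = []
    for t in range(len(words)):
        obs = (tuple(words), t, tuple(trajectory))  # input x, timestep t, partial traj τ
        trajectory.append(policy(obs))
    return trajectory  # the end state e encodes ŷ = ŷ(e)

def hamming_loss(y, y_hat):
    """loss(y, ŷ), computable once the trajectory is complete."""
    return sum(a != b for a, b in zip(y, y_hat))

words = "I can can a can".split()
y = ["Pro", "Md", "Vb", "Dt", "Nn"]
y_hat = run_policy(lambda obs: "Nn", words)  # a (bad) constant policy
print(hamming_loss(y, y_hat))                # 4
```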

An analogy from playing Mario (from the Mario AI competition 2009). Input: the game screen; output: Jump ∈ {0,1}, Right ∈ {0,1}, Left ∈ {0,1}, Speed ∈ {0,1}. High-level goal: watch an expert play and learn to mimic her behavior. Extracted 27K+ binary features from the last 4 observations (14 binary features for every cell).

Training (expert) Video credit: Stéphane Ross, Geoff Gordon and Drew Bagnell

Warm-up: Supervised learning. 1. Collect trajectories from expert π_ref. 2. Store as dataset D = { (o, π_ref(o, y)) : o ~ π_ref }. 3. Train classifier π on D. Then let π play the game!
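As a rough sketch of steps 1-3 on the tagging stand-in (scikit-learn and this feature set are assumptions for illustration; the talk's Mario setup used 27K+ binary features instead):

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

sentences = [("I can can a can".split(), ["Pro", "Md", "Vb", "Dt", "Nn"])]

def features(words, t, partial):
    """The observation o: current word, previous word, previous action."""
    return {"w": words[t],
            "w-1": words[t - 1] if t > 0 else "<s>",
            "a-1": partial[-1] if partial else "<s>"}

# Steps 1-2: roll out the expert (here, the gold labels) and record (o, π_ref(o, y)).
D = []
for words, y in sentences:
    partial = []
    for t in range(len(words)):
        D.append((features(words, t, partial), y[t]))
        partial.append(y[t])  # the *expert* drives the trajectory

# Step 3: train a classifier π on D.
vec = DictVectorizer()
X = vec.fit_transform([o for o, _ in D])
pi = LogisticRegression(max_iter=1000).fit(X, [a for _, a in D])
```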

Test-time execution (sup. learning) Video credit: Stéphane Ross, Geoff Gordon and Drew Bagnell

What's the (biggest) failure mode? The expert never gets stuck next to pipes, so the classifier doesn't learn to recover!

Kittens, revisited. (Held & Hein, 1963)

Warm-up II: Imitation learning. 1. Collect trajectories from expert π_ref. 2. Dataset D_0 = { (o, π_ref(o, y)) : o ~ π_ref }. 3. Train π_1 on D_0. 4. Collect new trajectories from π_1, but let the expert steer! 5. Dataset D_1 = { (o, π_ref(o, y)) : o ~ π_1 }. 6. Train π_2 on D_0 ∪ D_1. In general: D_n = { (o, π_ref(o, y)) : o ~ π_n }; train π_{n+1} on ∪_{i ≤ n} D_i. Guarantee: if N = T log T, then L(π_n) ≤ T ε_N + O(1) for some n.
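A compact sketch of the aggregation loop above, again on the tagging stand-in; `train` and the `features` helper from the previous sketch are illustrative assumptions, not the talk's code:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def train(D):
    """Fit π on all aggregated (observation, expert action) pairs."""
    vec = DictVectorizer()
    X = vec.fit_transform([o for o, _ in D])
    clf = LogisticRegression(max_iter=1000).fit(X, [a for _, a in D])
    return lambda o: clf.predict(vec.transform([o]))[0]

def dagger(sentences, features, n_rounds=3):
    D, policy = [], None
    for _ in range(n_rounds):
        for words, y in sentences:           # collect trajectories...
            partial = []
            for t in range(len(words)):
                o = features(words, t, partial)
                D.append((o, y[t]))          # ...labeled by the expert π_ref
                a = y[t] if policy is None else policy(o)
                partial.append(a)            # ...but visited under π_n
        policy = train(D)                    # π_{n+1} on D_0 ∪ ... ∪ D_n
    return policy

# Usage, with `sentences` and `features` as in the previous sketch:
# policy = dagger(sentences, features)
```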

Test-time execution (DAgger) Video credit: Stéphane Ross, Geoff Gordon and Drew Bagnell

What's the biggest failure mode? The classifier only sees "right" versus "not right": no notion of better or worse, no partial credit, and it must have a single target answer.

Learning to search: LOLS. 1. Let the learned policy π drive for t timesteps to observation o. 2. For each possible action a: take action a, let the expert π_ref drive the rest, and record the overall loss c_a. 3. Update π based on the example (o, c_1, c_2, ..., c_K). 4. Go to (1). Side note: LOLS can also be run in bandit mode with sampling.
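A sketch of one LOLS update on the tagging stand-in, with the expert roll-out implemented as "complete with the gold labels"; a cost-sensitive learner would consume the full cost vector, while here we just extract it (all names illustrative):

```python
import random

TAGS = ["Pro", "Md", "Vb", "Dt", "Nn"]

def lols_example(words, y, policy, t_stop):
    """One step of LOLS: roll in with π for t_stop steps, then for each
    action a, roll out the rest with the expert and record the loss c_a."""
    partial = []
    for t in range(t_stop):                   # 1. roll-in with the learned π
        partial.append(policy(words, t, partial))
    obs = (tuple(words), t_stop, tuple(partial))
    costs = {}
    for a in TAGS:                            # 2. one-step deviation to a...
        traj = partial + [a] + list(y[t_stop + 1:])      # ...π_ref finishes
        costs[a] = sum(p != g for p, g in zip(traj, y))  # overall loss c_a
    return obs, costs                         # 3. update π on (o, c_1..c_K)

words = "I can can a can".split()
y = ["Pro", "Md", "Vb", "Dt", "Nn"]
obs, costs = lols_example(words, y, lambda w, t, p: random.choice(TAGS), t_stop=2)
print(min(costs, key=costs.get))  # the lowest-cost action at this state: "Vb"
```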

So... what's the connection? Run over the tagging example ("I can can ..."), the policy is a one-against-all (OAA) neural net applied to features Φ(x, s, a) of the observation: the input x, the timestep t, the partial trajectory τ, and anything else. Unrolled across timesteps, this looks a lot like an RNN!
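A hedged numpy sketch of that unrolling (random weights and illustrative sizes; this is not the rnnlols.py code): the "anything else" in the observation becomes a hidden state h, and the per-step OAA classifier becomes the output layer of an RNN.

```python
import numpy as np

rng = np.random.default_rng(0)
H, V, K = 8, 10, 5   # hidden size, vocab size, number of actions (tags)
Wxh = rng.normal(scale=0.1, size=(H, V))
Whh = rng.normal(scale=0.1, size=(H, H))
Who = rng.normal(scale=0.1, size=(K, H))

def policy_unrolled(word_ids):
    """π(Φ(x, s, a)) applied timestep by timestep, carrying the partial
    trajectory in a hidden state h: exactly an RNN cell, unrolled."""
    h, actions = np.zeros(H), []
    for w in word_ids:
        x = np.eye(V)[w]                          # one-hot current word
        h = np.tanh(Wxh @ x + Whh @ h)            # state update = RNN cell
        actions.append(int(np.argmax(Who @ h)))   # OAA: argmax over K actions
    return actions

print(policy_unrolled([0, 3, 3, 1, 3]))  # e.g., word ids for "I can can a can"
```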

Two quick results. If you don't backprop through time: POS tagging: no change; named entity recognition: marginal improvement; dependency parsing: 1% gain over a strong baseline. If you do backprop through time: on synthetic sequence labeling data with Gaussian observations (constructed so the model cannot exactly fit most generated datasets), the mashup goes from 10.4% error to 1.4% error on training data. Code at http://hal3.name/tmp/rnnlols.py (thanks to the autograd folks for autograd!).

Take-home messages: simultaneous machine interpretation is a super fun problem and you should work on it! Not being able to backprop through something isn't always the end of the world: you're not stuck with RL! The RNN+LOLS mashup appears promising. Collaborators: Alekh Agarwal, Kai-Wei Chang, He He (who is on the job market! umiacs.umd.edu/~hhe), Akshay Krishnamurthy, and John Langford. Papers: ICPR'10, CVPR'11, EMNLP'12, NIPS'12, EMNLP'13, EMNLP'14, NIPS'14, SLT'14, ICC'15, EMNLP'15, Fusion'15, and more. Thanks! Questions?