Maximum Margin Planning
|
|
- Edwina Day
- 6 years ago
- Views:
Transcription
1 Maximum Margin Planning Nathan Ratliff, Drew Bagnell and Martin Zinkevich Presenters: Ashesh Jain, Michael Hu CS6784 Class Presentation
2 Theme 1. Supervised learning 2. Unsupervised learning 3. Reinforcement learning Reward based learning Inverse Reinforcement Learning
3 Motivating Example (Abbeel et. al)
4 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Activity Conclusion
5 Reward-based Learning
6 Markov Decision Process (MDP) MDP is defined by (X, A, p, R) X A p R State space Action space Dynamics model p(x x, a) Reward function R(x) or R(x, a) 0.1 Y X
7 Markov Decision Process (MDP) MDP is defined by (X, Policy A, p, R) X A p R π: X A State space Action space Dynamics model p(x x, a) Reward function R(x) or R(x, a) 0.1 Y Example policy X
8 Graphical representation of MDP a t x t R t a t+1 x t+1 R t+1 Goal: Learn an optimal policy π: X A π = arg max E π γt R t (x t, π(x t )) t=0 x t, π x t x t+1, π x t+1
9 Activity Very Easy!!! Deterministic Dynamics Draw the optimal Policy!
10 Activity Very Easy!!!
11 Activity Solution
12 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Activities Conclusion
13 RL to Inverse-RL (IRL) State space: X Action space:a Optimal Reward Function Policy π π: R(x, a) a Inverse RL Algorithm Dynamics Model p(x x, a) Optimal Reward Function Policy π π: R(x, a) a Inverse problem: Given optimal policy π or samples drawn from it, recover the reward R(x, a)
14 Why Inverse-RL? State space: X Action space:a Reward Function R(x, a) RL Algorithm Specifying R(x, a) is hard, but samples from π are easily available Ratliff et. al. Kober et. al. Abbeel et. al.
15 IRL framework π : x a Expert Interacts y = x 1, a 1 x 2, a 2 x 3, a 3 x n, a n Demonstration w T f y = w T w T w T w T Reward
16 Paper contributions 1. Formulated IRL as structural prediction Maximum margin approach 2. Two optimization algorithms Problem specific solver Sub-gradient method 3. Robotic application: 2D navigation etc.
17 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Activities Conclusion
18 Formulation n Input X i, A i, p i, f i, y i, L i i=1 X i A i p i y i f i ( ) State space Action space Dynamics model Expert demonstration Feature function L i (, y i ) Loss function 0.1 Y X
19 Max-margin formulation λ min w,ξ i 2 w n s. t. i w T f i y i i β i ξ i max y Y i w T f i y + L y, y i Features linearly decompose over path f i y ξ i = F i μ F i = d μ = Feature map X A Frequency counts X A
20 Max-margin formulation μ λ min w,ξ i 2 w n s. t. i i β i ξ i w T F i μ i max μ G i w T F i μ + l i T μ ξ i satisfies Bellman-flow constraints x x,a μ x,a p i x x, a + s i x = a μ x,a Inflow Outflow
21 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Problem specific QP-formulation Sub-gradient method Activities Conclusion
22 Problem specific QP-formulation λ min w,ξ i 2 w n s. t. i i β i ξ i w T F max w T F i μ + l T i μ i s T i v i ξ i i μ ξ i μ G i μ x,a p i x x x, a + s i = μ x,a i, x, a v x i w T F i + l x,a i + p i x x x x, a v i x,a x a Inflow Outflow Can be optimized using off-the-shelf QP solvers
23 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Problem specific QP-formulation Sub-gradient method Activities Conclusion
24 Optimizing via Sub-gradient λ min w,ξ i 2 w n s. t. i i β i ξ i w T F i μ i max μ G i w T F i μ + l i T μ ξ i Re-writing constraint Its tight at optimality ξ i max μ G i w T F i μ + l i T μ w T F i μ i 0 ξ i = max μ G i w T F i μ + l i T μ w T F i μ i μ satisfies Bellman-flow constraints
25 Optimizing via Sub-gradient c w λ = min w,ξ i 2 w n i β i max μ G i w T F i μ + l i T μ w T F i μ i Ratliff, PhD Thesis
26 Weight update via Sub-gradient Standard gradient descent update w t+1 w t α αg wt c(w) λ min w,ξ i 2 w n β i μ = argmax μ G i w t T F i + l i T μ i max μ G i w T F i μ + l i T μ w T F i μ i Loss-augmented reward g wt = λw t + 1 n i β i F i (μ μ i )
27 Convergence summary n c w = λ 2 w n i=1 r i w w w αg w How to chose α? If w g w G then a constant step size 0 < α 1 λ gives linear convergence to a region around the minimum w t+1 w 2 κ t+1 w 0 w 2 + αg2 λ Optimization error 0 < κ < 1
28 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Activity Conclusion
29 Convergence proof We know w g w G w t+1 w 2 = w t αg t w 2 = w t w 2 + α 2 g t 2 2αg t T (w t w )? c(w) w w c w c w + g T w w w + λ 2 λ Substitute w w w w t 2 w t w 2 g t T (w t w ) Use c w c(w t ) w w 2 1 αλ w t w 2 + α 2 G 2
30 Outline Basics of Reinforcement Learning State and action space Activity Introducing Inverse RL Max-margin formulation Optimization algorithms Activity Conclusion
31 2D path planning Expert s Demonstration Learned Reward Planning in a new Environment
32 Car driving (Abbeel et. al)
33 Bad driver (Abbeel et. al)
34 Recent works 1. N. Ratliff, D. Bradley, D. Bagnell, J. Chestnutt. Boosting Structures Prediction for Imitation Learning. In NIPS B. Ziebart, A. Mass, D. Bagnell, A. K. Dey. Maximum Entropy Inverse Reinforcement Learning. In AAAI P. Abbeel, A. Coats, A. Ng. Autonomous helicopter aerobatics through apprenticeship learning. In IJRR S. Ross, G. Gordon, D. Bagnell. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In AISTATS B. Akgun, M. Cakmak, K. Jiang, A. Thomaz. Keyframe-based leaning from demonstrations. In IJSR A. Jain, B. Wojcik, T. Joachims, A. Saxena. Learning Trajectory Preferences for Manipulators via Iterative Improvement. In NIPS 2013
35 Questions Summary of key points: Reward-based learning can be modeled by the MDP framework. Inverse Reinforcement Learning takes perception features as input and outputs a reward function. The max-margin planning problem can be formulated as a tractable QP. Sub-gradient descent solves an equivalent problem with provable convergence bound.
Reinforcement Learning
Reinforcement Learning Inverse Reinforcement Learning Inverse RL, behaviour cloning, apprenticeship learning, imitation learning. Vien Ngo Marc Toussaint University of Stuttgart Outline Introduction to
More informationReinforcement Learning
Reinforcement Learning Inverse Reinforcement Learning LfD, imitation learning/behavior cloning, apprenticeship learning, IRL. Hung Ngo MLR Lab, University of Stuttgart Outline Learning from Demonstrations
More informationRelative Entropy Inverse Reinforcement Learning
Relative Entropy Inverse Reinforcement Learning Abdeslam Boularias Jens Kober Jan Peters Max-Planck Institute for Intelligent Systems 72076 Tübingen, Germany {abdeslam.boularias,jens.kober,jan.peters}@tuebingen.mpg.de
More informationBatch, Off-policy and Model-free Apprenticeship Learning
Batch, Off-policy and Model-free Apprenticeship Learning Edouard Klein 13, Matthieu Geist 1, and Olivier Pietquin 12 1. Supélec-Metz Campus, IMS Research group, France, prenom.nom@supelec.fr 2. UMI 2958
More informationMAP Inference for Bayesian Inverse Reinforcement Learning
MAP Inference for Bayesian Inverse Reinforcement Learning Jaedeug Choi and Kee-Eung Kim bdepartment of Computer Science Korea Advanced Institute of Science and Technology Daejeon 305-701, Korea jdchoi@ai.kaist.ac.kr,
More informationNonparametric Bayesian Inverse Reinforcement Learning
PRML Summer School 2013 Nonparametric Bayesian Inverse Reinforcement Learning Jaedeug Choi JDCHOI@AI.KAIST.AC.KR Sequential Decision Making (1) Multiple decisions over time are made to achieve goals Reinforcement
More informationPreference Elicitation for Sequential Decision Problems
Preference Elicitation for Sequential Decision Problems Kevin Regan University of Toronto Introduction 2 Motivation Focus: Computational approaches to sequential decision making under uncertainty These
More informationIntroduction to Reinforcement Learning. CMPT 882 Mar. 18
Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and
More informationSubgradient Methods for Maximum Margin Structured Learning
Nathan D. Ratliff ndr@ri.cmu.edu J. Andrew Bagnell dbagnell@ri.cmu.edu Robotics Institute, Carnegie Mellon University, Pittsburgh, PA. 15213 USA Martin A. Zinkevich maz@cs.ualberta.ca Department of Computing
More informationDeep Reinforcement Learning: Policy Gradients and Q-Learning
Deep Reinforcement Learning: Policy Gradients and Q-Learning John Schulman Bay Area Deep Learning School September 24, 2016 Introduction and Overview Aim of This Talk What is deep RL, and should I use
More informationChristopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015
Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)
More informationLearning Tetris. 1 Tetris. February 3, 2009
Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are
More informationIntroduction to Reinforcement Learning Part 1: Markov Decision Processes
Introduction to Reinforcement Learning Part 1: Markov Decision Processes Rowan McAllister Reinforcement Learning Reading Group 8 April 2015 Note I ve created these slides whilst following Algorithms for
More informationImproving the Efficiency of Bayesian Inverse Reinforcement Learning
Improving the Efficiency of Bayesian Inverse Reinforcement Learning Bernard Michini* and Jonathan P. How** Aerospace Controls Laboratory Massachusetts Institute of Technology, Cambridge, MA 02139 USA Abstract
More informationReinforcement Learning and Control
CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make
More informationImitation Learning. Richard Zhu, Andrew Kang April 26, 2016
Imitation Learning Richard Zhu, Andrew Kang April 26, 2016 Table of Contents 1. Introduction 2. Preliminaries 3. DAgger 4. Guarantees 5. Generalization 6. Performance 2 Introduction Where we ve been The
More informationInverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics
Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics Michael Herman Tobias Gindele Jörg Wagner Felix Schmitt Wolfram Burgard Robert Bosch GmbH D-70442 Stuttgart, Germany
More informationInverse Optimal Control
Inverse Optimal Control Oleg Arenz Technische Universität Darmstadt o.arenz@gmx.de Abstract In Reinforcement Learning, an agent learns a policy that maximizes a given reward function. However, providing
More informationA Game-Theoretic Approach to Apprenticeship Learning
Advances in Neural Information Processing Systems 20, 2008. A Game-Theoretic Approach to Apprenticeship Learning Umar Syed Computer Science Department Princeton University 35 Olden St Princeton, NJ 08540-5233
More informationarxiv: v1 [cs.ro] 12 Aug 2016
Density Matching Reward Learning Sungjoon Choi 1, Kyungjae Lee 1, H. Andy Park 2, and Songhwai Oh 1 arxiv:1608.03694v1 [cs.ro] 12 Aug 2016 1 Seoul National University, Seoul, Korea {sungjoon.choi, kyungjae.lee,
More informationReinforcement Learning and Deep Reinforcement Learning
Reinforcement Learning and Deep Reinforcement Learning Ashis Kumer Biswas, Ph.D. ashis.biswas@ucdenver.edu Deep Learning November 5, 2018 1 / 64 Outlines 1 Principles of Reinforcement Learning 2 The Q
More informationReinforcement Learning
Reinforcement Learning Ron Parr CompSci 7 Department of Computer Science Duke University With thanks to Kris Hauser for some content RL Highlights Everybody likes to learn from experience Use ML techniques
More informationINF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018
Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)
More informationDeep Reinforcement Learning. STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017
Deep Reinforcement Learning STAT946 Deep Learning Guest Lecture by Pascal Poupart University of Waterloo October 19, 2017 Outline Introduction to Reinforcement Learning AlphaGo (Deep RL for Computer Go)
More informationAdversarial Inverse Optimal Control for General Imitation Learning Losses and Embodiment Transfer
ersarial Inverse Optimal Control for General Imitation Learning Losses and Embodiment Transfer Xiangli Chen Mathew Monfort Brian D. Ziebart University of Illinois at Chicago Chicago, IL 60607 {xchen0,mmonfo,bziebart}@uic.edu
More informationReinforcement Learning. Yishay Mansour Tel-Aviv University
Reinforcement Learning Yishay Mansour Tel-Aviv University 1 Reinforcement Learning: Course Information Classes: Wednesday Lecture 10-13 Yishay Mansour Recitations:14-15/15-16 Eliya Nachmani Adam Polyak
More informationAutonomous Helicopter Flight via Reinforcement Learning
Autonomous Helicopter Flight via Reinforcement Learning Authors: Andrew Y. Ng, H. Jin Kim, Michael I. Jordan, Shankar Sastry Presenters: Shiv Ballianda, Jerrolyn Hebert, Shuiwang Ji, Kenley Malveaux, Huy
More informationReinforcement Learning
Reinforcement Learning Model-Based Reinforcement Learning Model-based, PAC-MDP, sample complexity, exploration/exploitation, RMAX, E3, Bayes-optimal, Bayesian RL, model learning Vien Ngo MLR, University
More informationGenerative Adversarial Imitation Learning
Generative Adversarial Imitation Learning Jonathan Ho OpenAI hoj@openai.com Stefano Ermon Stanford University ermon@cs.stanford.edu Abstract Consider learning a policy from example expert behavior, without
More informationCS788 Dialogue Management Systems Lecture #2: Markov Decision Processes
CS788 Dialogue Management Systems Lecture #2: Markov Decision Processes Kee-Eung Kim KAIST EECS Department Computer Science Division Markov Decision Processes (MDPs) A popular model for sequential decision
More informationApprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods Gergely Neu Budapest University of Technology and Economics Műegyetem rkp. 3-9. H-1111 Budapest, Hungary Csaba Szepesvári
More informationREINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning
REINFORCE Framework for Stochastic Policy Optimization and its use in Deep Learning Ronen Tamari The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (#67679) February 28, 2016 Ronen Tamari
More informationMARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti
1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early
More informationAn Introduction to Reinforcement Learning
An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@csa.iisc.ernet.in Department of Computer Science and Automation Indian Institute of Science August 2014 What is Reinforcement
More informationReinforcement Learning as Variational Inference: Two Recent Approaches
Reinforcement Learning as Variational Inference: Two Recent Approaches Rohith Kuditipudi Duke University 11 August 2017 Outline 1 Background 2 Stein Variational Policy Gradient 3 Soft Q-Learning 4 Closing
More informationReinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina
Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it
More information6 Reinforcement Learning
6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,
More informationarxiv: v2 [cs.lg] 3 Jul 2012
Agnostic System Identification for Model-Based Reinforcement Learning Stéphane Ross Robotics Institute, Carnegie Mellon University, PA USA STEPHANEROSS@CMU.EDU J. Andrew Bagnell DBAGNELL@RI.CMU.EDU Robotics
More informationReinforcement Learning. Introduction
Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control
More informationMDP Preliminaries. Nan Jiang. February 10, 2019
MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process
More informationFast Inverse Reinforcement Learning with Interval Consistent Graph for Driving Behavior Prediction
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17) Fast Inverse Reinforcement Learning with Interval Consistent Graph for Driving Behavior Prediction Masamichi Shimosaka,
More informationA Cascaded Supervised Learning Approach to Inverse Reinforcement Learning
A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning Edouard Klein 1,2, Bilal Piot 2,3, Matthieu Geist 2, Olivier Pietquin 2,3 1 ABC Team LORIA-CNRS, France. 2 Supélec, IMS-MaLIS Research
More informationTopics of Active Research in Reinforcement Learning Relevant to Spoken Dialogue Systems
Topics of Active Research in Reinforcement Learning Relevant to Spoken Dialogue Systems Pascal Poupart David R. Cheriton School of Computer Science University of Waterloo 1 Outline Review Markov Models
More informationReinforcement Learning
Reinforcement Learning Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ Task Grasp the green cup. Output: Sequence of controller actions Setup from Lenz et. al.
More informationMultiagent Value Iteration in Markov Games
Multiagent Value Iteration in Markov Games Amy Greenwald Brown University with Michael Littman and Martin Zinkevich Stony Brook Game Theory Festival July 21, 2005 Agenda Theorem Value iteration converges
More informationThis question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.
This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you
More informationMS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction
MS&E338 Reinforcement Learning Lecture 1 - April 2, 2018 Introduction Lecturer: Ben Van Roy Scribe: Gabriel Maher 1 Reinforcement Learning Introduction In reinforcement learning (RL) we consider an agent
More informationMaximum Entropy Inverse Reinforcement Learning
Maximum Entropy Inverse Reinforcement Learning Brian D. Ziebart, Andrew Maas, J.Andrew Bagnell, and Anind K. Dey School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 bziebart@cs.cmu.edu,
More informationModeling Decision Making with Maximum Entropy Inverse Optimal Control Thesis Proposal
Modeling Decision Making with Maximum Entropy Inverse Optimal Control Thesis Proposal Brian D. Ziebart Machine Learning Department Carnegie Mellon University September 30, 2008 Thesis committee: J. Andrew
More informationReinforcement Learning
Reinforcement Learning Markov decision process & Dynamic programming Evaluative feedback, value function, Bellman equation, optimality, Markov property, Markov decision process, dynamic programming, value
More informationHybrid Machine Learning Algorithms
Hybrid Machine Learning Algorithms Umar Syed Princeton University Includes joint work with: Rob Schapire (Princeton) Nina Mishra, Alex Slivkins (Microsoft) Common Approaches to Machine Learning!! Supervised
More informationRL 14: Simplifications of POMDPs
RL 14: Simplifications of POMDPs Michael Herrmann University of Edinburgh, School of Informatics 04/03/2016 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally
More informationReinforcement Learning. George Konidaris
Reinforcement Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom
More informationLecture 1: March 7, 2018
Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights
More informationOptimal Control with Learned Forward Models
Optimal Control with Learned Forward Models Pieter Abbeel UC Berkeley Jan Peters TU Darmstadt 1 Where we are? Reinforcement Learning Data = {(x i, u i, x i+1, r i )}} x u xx r u xx V (x) π (u x) Now V
More informationActivity Forecasting. Research CMU. Carnegie Mellon University. Kris Kitani Carnegie Mellon University
Carnegie Mellon University Research Showcase @ CMU Robotics Institute School of Computer Science 8-2012 Activity Forecasting Kris Kitani Carnegie Mellon University Brian D. Ziebart Carnegie Mellon University
More informationLinear Least-squares Dyna-style Planning
Linear Least-squares Dyna-style Planning Hengshuai Yao Department of Computing Science University of Alberta Edmonton, AB, Canada T6G2E8 hengshua@cs.ualberta.ca Abstract World model is very important for
More informationInverse Reinforcement Learning in Partially Observable Environments
Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence (IJCAI-09) Inverse Reinforcement Learning in Partially Observable Environments Jaedeug Choi and Kee-Eung Kim Department
More informationMarkov decision processes
CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only
More informationGeneralized Inverse Reinforcement Learning with Linearly Solvable MDP
Generalized Inverse Reinforcement Learning with Linearly Solvable MDP Masahiro Kohjima ( ), Tatsushi Matsubayashi, and Hiroshi Sawada NTT Service Evolution Laboratories, NTT Corporation, Japan {kohjima.masahiro,matsubayashi.tatsushi,sawada.hiroshi}@lab.ntt.co.jp
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course In This Lecture A. LAZARIC Markov Decision Processes
More informationIntroduction to Reinforcement Learning
CSCI-699: Advanced Topics in Deep Learning 01/16/2019 Nitin Kamra Spring 2019 Introduction to Reinforcement Learning 1 What is Reinforcement Learning? So far we have seen unsupervised and supervised learning.
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course How to model an RL problem The Markov Decision Process
More informationLecture 23: Reinforcement Learning
Lecture 23: Reinforcement Learning MDPs revisited Model-based learning Monte Carlo value function estimation Temporal-difference (TD) learning Exploration November 23, 2006 1 COMP-424 Lecture 23 Recall:
More informationComputational Rationalization: The Inverse Equilibrium Problem
Kevin Waugh waugh@cs.cmu.edu Brian D. Ziebart bziebart@cs.cmu.edu J. Andrew Bagnell dbagnell@ri.cmu.edu Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA 15213 Abstract Modeling the purposeful
More informationReinforcement Learning. Machine Learning, Fall 2010
Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30
More informationReinforcement Learning: An Introduction
Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is
More informationCS599 Lecture 1 Introduction To RL
CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming
More informationREINFORCEMENT LEARNING
REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents
More informationImitation Learning by Coaching. Abstract
Imitation Learning by Coaching He He Hal Daumé III Department of Computer Science University of Maryland College Park, MD 20740 {hhe,hal}@cs.umd.edu Abstract Jason Eisner Department of Computer Science
More information15-780: Graduate Artificial Intelligence. Reinforcement learning (RL)
15-780: Graduate Artificial Intelligence Reinforcement learning (RL) From MDPs to RL We still use the same Markov model with rewards and actions But there are a few differences: 1. We do not assume we
More informationApproximation Methods in Reinforcement Learning
2018 CS420, Machine Learning, Lecture 12 Approximation Methods in Reinforcement Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/cs420/index.html Reinforcement
More informationBits of Machine Learning Part 2: Unsupervised Learning
Bits of Machine Learning Part 2: Unsupervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationStochastic Primal-Dual Methods for Reinforcement Learning
Stochastic Primal-Dual Methods for Reinforcement Learning Alireza Askarian 1 Amber Srivastava 1 1 Department of Mechanical Engineering University of Illinois at Urbana Champaign Big Data Optimization,
More informationAdministration. CSCI567 Machine Learning (Fall 2018) Outline. Outline. HW5 is available, due on 11/18. Practice final will also be available soon.
Administration CSCI567 Machine Learning Fall 2018 Prof. Haipeng Luo U of Southern California Nov 7, 2018 HW5 is available, due on 11/18. Practice final will also be available soon. Remaining weeks: 11/14,
More informationBayesian Nonparametric Feature Construction for Inverse Reinforcement Learning
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence Bayesian Nonparametric Feature Construction for Inverse Reinforcement Learning Jaedeug Choi and Kee-Eung Kim Department
More informationOptimal Control. McGill COMP 765 Oct 3 rd, 2017
Optimal Control McGill COMP 765 Oct 3 rd, 2017 Classical Control Quiz Question 1: Can a PID controller be used to balance an inverted pendulum: A) That starts upright? B) That must be swung-up (perhaps
More informationToday s Outline. Recap: MDPs. Bellman Equations. Q-Value Iteration. Bellman Backup 5/7/2012. CSE 473: Artificial Intelligence Reinforcement Learning
CSE 473: Artificial Intelligence Reinforcement Learning Dan Weld Today s Outline Reinforcement Learning Q-value iteration Q-learning Exploration / exploitation Linear function approximation Many slides
More informationThe Art of Sequential Optimization via Simulations
The Art of Sequential Optimization via Simulations Stochastic Systems and Learning Laboratory EE, CS* & ISE* Departments Viterbi School of Engineering University of Southern California (Based on joint
More informationReading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1
Reading Response: Due Wednesday R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Another Example Get to the top of the hill as quickly as possible. reward = 1 for each step where
More informationCompetitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations
Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations Xingyu Wang 1 Diego Klabjan 1 Abstract his paper considers the problem of inverse reinforcement learning in zero-sum
More informationAn online kernel-based clustering approach for value function approximation
An online kernel-based clustering approach for value function approximation N. Tziortziotis and K. Blekas Department of Computer Science, University of Ioannina P.O.Box 1186, Ioannina 45110 - Greece {ntziorzi,kblekas}@cs.uoi.gr
More informationBe able to define the following terms and answer basic questions about them:
CS440/ECE448 Section Q Fall 2017 Final Review Be able to define the following terms and answer basic questions about them: Probability o Random variables, axioms of probability o Joint, marginal, conditional
More informationArtificial Intelligence
Artificial Intelligence Dynamic Programming Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: So far we focussed on tree search-like solvers for decision problems. There is a second important
More informationLecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning
More informationAn Introduction to Reinforcement Learning
An Introduction to Reinforcement Learning Shivaram Kalyanakrishnan shivaram@cse.iitb.ac.in Department of Computer Science and Engineering Indian Institute of Technology Bombay April 2018 What is Reinforcement
More informationReinforcement Learning
CS7/CS7 Fall 005 Supervised Learning: Training examples: (x,y) Direct feedback y for each input x Sequence of decisions with eventual feedback No teacher that critiques individual actions Learn to act
More informationCS Machine Learning Qualifying Exam
CS Machine Learning Qualifying Exam Georgia Institute of Technology March 30, 2017 The exam is divided into four areas: Core, Statistical Methods and Models, Learning Theory, and Decision Processes. There
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationarxiv: v2 [cs.lg] 13 Aug 2018 ABSTRACT
LEARNING ROBUST REWARDS WITH ADVERSARIAL INVERSE REINFORCEMENT LEARNING Justin Fu, Katie Luo, Sergey Levine Department of Electrical Engineering and Computer Science University of California, Berkeley
More informationMarkov Decision Processes
Markov Decision Processes Noel Welsh 11 November 2010 Noel Welsh () Markov Decision Processes 11 November 2010 1 / 30 Annoucements Applicant visitor day seeks robot demonstrators for exciting half hour
More informationLecture 7: Value Function Approximation
Lecture 7: Value Function Approximation Joseph Modayil Outline 1 Introduction 2 3 Batch Methods Introduction Large-Scale Reinforcement Learning Reinforcement learning can be used to solve large problems,
More informationRL 14: POMDPs continued
RL 14: POMDPs continued Michael Herrmann University of Edinburgh, School of Informatics 06/03/2015 POMDPs: Points to remember Belief states are probability distributions over states Even if computationally
More informationLecture 23: Online convex optimization Online convex optimization: generalization of several algorithms
EECS 598-005: heoretical Foundations of Machine Learning Fall 2015 Lecture 23: Online convex optimization Lecturer: Jacob Abernethy Scribes: Vikas Dhiman Disclaimer: hese notes have not been subjected
More informationTutorial on Policy Gradient Methods. Jan Peters
Tutorial on Policy Gradient Methods Jan Peters Outline 1. Reinforcement Learning 2. Finite Difference vs Likelihood-Ratio Policy Gradients 3. Likelihood-Ratio Policy Gradients 4. Conclusion General Setup
More informationA Tour of Reinforcement Learning The View from Continuous Control. Benjamin Recht University of California, Berkeley
A Tour of Reinforcement Learning The View from Continuous Control Benjamin Recht University of California, Berkeley trustable, scalable, predictable Control Theory! Reinforcement Learning is the study
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course Value Iteration: the Idea 1. Let V 0 be any vector in R N A. LAZARIC Reinforcement
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationChapter 3: The Reinforcement Learning Problem
Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which
More informationBasics of reinforcement learning
Basics of reinforcement learning Lucian Buşoniu TMLSS, 20 July 2018 Main idea of reinforcement learning (RL) Learn a sequential decision policy to optimize the cumulative performance of an unknown system
More information