Reinforcement Learning

Inverse Reinforcement Learning
LfD, imitation learning/behavioral cloning, apprenticeship learning, IRL.
Hung Ngo, MLR Lab, University of Stuttgart

Outline
- Learning from Demonstrations (LfD)
- Behavioral Cloning / Imitation Learning
- Inverse Reinforcement Learning (IRL) algorithms

Learning from Demonstrations (LfD)
Setting: an oracle teaches an agent how to perform a given task.
Given: samples of an MDP agent's behavior over time and in different circumstances, drawn from a supposedly optimal policy $\pi^o$, i.e.,
- a set of trajectories $\{\xi_i\}_{i=1}^n$, $\xi_i = \{(s_t, a_t)\}_{t=0}^{H_i-1}$, $a_t \sim \pi^o(s_t)$,
- the reward signal $r_t = R(s_t, a_t, s_{t+1})$ is unobserved,
- the transition model $T(s, a, s') = P(s' \mid s, a)$ may be known or unknown.
Goals:
- Recover the teacher's policy $\pi^o$ directly: behavioral cloning, or imitation learning.
- Recover the teacher's latent reward function $R^o(s, a, s')$: IRL.
- Recover the teacher's policy $\pi^o$ indirectly by first recovering $R^o(s, a, s')$: apprenticeship learning via IRL.

Behavioral Cloning
Formulated as a supervised-learning problem:
- Given training data $\{\xi_i\}_{i=1}^n$, $\xi_i = \{(s_t, a_t)\}_{t=0}^{H_i-1}$, $a_t \sim \pi^o(s_t)$.
- Learn a policy mapping $\hat\pi^o : S \to A$.
- Solved using SVMs, (deep) neural networks, etc. (a minimal sketch follows below).
Limitations of behavioral cloning / imitation learning:
- it can only mimic the teacher's trajectories, with no transfer w.r.t. the task (e.g., the environment changes but the goals stay similar),
- it may fail in non-Markovian environments (e.g., in driving, states from several time steps are sometimes needed).
IRL vs. behavioral cloning: recover $\hat{R}^o$ vs. $\hat\pi^o$. Why not recover $V^{\pi^o}$ instead?
- the reward function is more succinct (more easily generalizable/transferable),
- values are trajectory-dependent.
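To make the supervised-learning view concrete, here is a minimal behavioral-cloning sketch. It assumes vector-valued states, discrete actions, and scikit-learn's LogisticRegression as the policy class; none of this is prescribed by the slides, and the demonstration data is made up for illustration.

```python
# Behavioral cloning as supervised learning: a minimal sketch (assumed setup,
# not the lecture's implementation). States are feature vectors, actions are
# discrete labels; any off-the-shelf classifier can serve as the policy class.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical demonstrations: a list of trajectories of (state, action) pairs.
demos = [
    [(np.array([0.0, 1.0]), 0), (np.array([0.5, 0.8]), 1)],
    [(np.array([0.1, 0.9]), 0), (np.array([0.6, 0.7]), 1)],
]

# Flatten the trajectories into a plain supervised dataset of (s_t, a_t) pairs.
X = np.array([s for traj in demos for (s, a) in traj])
y = np.array([a for traj in demos for (s, a) in traj])

# The "cloned" policy is simply the trained classifier.
classifier = LogisticRegression().fit(X, y)

def pi_hat(state):
    """Imitation policy: predict the teacher's action for a given state."""
    return classifier.predict(state.reshape(1, -1))[0]

print(pi_hat(np.array([0.2, 0.9])))
```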

Why IRL?
As a computational model for learning behaviors in the natural world:
- bee foraging (Montague et al., 1995),
- song-bird vocalization (Doya & Sejnowski, 1995).
For constructing an intelligent agent in a particular domain:
- modeling humans and other adversarial/cooperative agents,
- collaborative robots (learn a reward function and plan for cooperative tasks),
- as an intermediate step in apprenticeship learning (autonomous driving, driver preferences, autonomous flight, e.g., helicopter, etc.): Abbeel et al. '04, Ziebart et al. '08, Andrew Ng et al.

Example: Urban Navigation
(Picture from a tutorial by Pieter Abbeel.)

IRL Formulation #1: Small, Discrete MDPs
Given: an incomplete MDP $M = \langle S, A, T, R, \gamma \rangle$ with
- a known transition model $T(s, a, s') = P(s' \mid s, a)$ for all $s, a, s'$,
- an unobserved but bounded reward signal, $|R(s, a, s')| \le r_{\max}$ for all $s, a, s'$ (for simplicity, consider state-dependent reward functions $R(s)$),
- a known, supposedly optimal policy $\pi^o(s)$ for all $s \in S$, instead of trajectories $\{\xi_i\}_{i=1}^n$.
Find $R : S \to [-r_{\max}, r_{\max}]$ such that the teacher's policy $\pi^o$ is optimal and, furthermore, the reward function is simple and robust.
Notes: in the following we fix an enumeration of the state space, $S = \{s_1, \dots, s_{|S|}\}$. Then $R$ is a column vector in $\mathbb{R}^{|S|}$ with $R_i = R(s_i)$.
Andrew Ng, Stuart Russell: Algorithms for Inverse Reinforcement Learning. ICML 2000.

IRL Formulation #1: Small, Discrete MDPs
Find $R \in \mathbb{R}^{|S|}$ such that the teacher's policy $\pi^o$ is optimal. Recall the Bellman optimality theorem (for a known MDP):
$$\pi^o \text{ is optimal} \iff \pi^o(s) \in \arg\max_a Q^{\pi^o}(s, a), \;\; \forall s \in S \iff Q^{\pi^o}(s, \pi^o(s)) \ge Q^{\pi^o}(s, a), \;\; \forall s \in S, a \in A. \quad (*)$$
Define policy-conditioned transition matrices $P^o, P^a \in [0, 1]^{|S| \times |S|}$:
$$[P^o]_{ij} := P(s_j \mid s_i, \pi^o(s_i)), \qquad [P^a]_{ij} := P(s_j \mid s_i, a), \qquad \forall s_i, s_j \in S.$$
We can represent the constraints (*) on $R$ as
$$(P^o - P^a)(I - \gamma P^o)^{-1} R \ge 0, \quad \forall a \in A. \quad (**)$$
Proof: the Bellman equations give $Q^{\pi^o}(s, a) = R(s) + \gamma \sum_{s'} P(s' \mid s, a) V^{\pi^o}(s')$ and $V^{\pi^o} = (I - \gamma P^o)^{-1} R$. Denote by $Q^{\pi^o}_\pi$ the length-$|S|$ column vector with elements $Q^{\pi^o}_\pi(s) := Q^{\pi^o}(s, \pi(s))$, i.e., $Q^{\pi^o}_\pi = R + \gamma P^\pi V^{\pi^o}$. The set of $|S| \cdot |A|$ constraints in (*) can then be written in matrix form (by fixing an action $a$ for all starting states $s \in S$) as $Q^{\pi^o}_{o} - Q^{\pi^o}_{a} \ge 0$ for all $a \in A$, which equals (**) up to the positive factor $\gamma$.
Here $x \ge y$ denotes vectorial (component-wise) inequality: $x_i \ge y_i$ for every index $i$.
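As a numerical sanity check of constraint (**), the sketch below builds the matrices $(P^o - P^a)(I - \gamma P^o)^{-1}$ for a small random MDP and tests whether a candidate reward vector satisfies the optimality constraints for a given policy. The toy MDP and all variable names are assumptions made purely for illustration.

```python
# Check the IRL feasibility constraints (P^o - P^a)(I - gamma P^o)^{-1} R >= 0
# on a toy discrete MDP (assumed setup, for illustration only).
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 4, 3, 0.9

# Random transition model: P[a] is an |S| x |S| row-stochastic matrix.
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)

pi_o = np.array([0, 1, 2, 0])          # assumed teacher policy
R = rng.random(S)                       # candidate state-dependent reward

P_o = np.array([P[pi_o[s], s] for s in range(S)])   # policy-conditioned matrix
inv_term = np.linalg.inv(np.eye(S) - gamma * P_o)   # (I - gamma P^o)^{-1}

feasible = True
for a in range(A):
    lhs = (P_o - P[a]) @ inv_term @ R               # one block of constraints
    feasible &= np.all(lhs >= -1e-9)
print("pi_o optimal under R:", feasible)
```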

IRL Formulation #1: Small, Discrete MDPs
Challenges:
- What if the teacher is noisy? (i.e., $a_t \ne \pi^o(s_t)$ at some $t$)
- What if, instead of the full $\pi^o(s)$ for all $s \in S$, we are only given sampled trajectories $\{\xi_i\}_{i=1}^n$?
- Computationally expensive/infeasible: $|S| \cdot |A|$ constraints for each candidate $R$.
- Reward-function ambiguity: IRL is ill-posed! ($R = 0$ is a solution.)
From reward-shaping theory: if the MDP $M$ with reward function $R$ admits $\pi^o$ as an optimal policy, then $M$ with the affine-transformed reward function below also admits $\pi^o$ as an optimal policy:
$$R'(s, a, s') = \alpha R(s, a, s') + \gamma \psi(s') - \psi(s), \quad \text{with } \psi : S \to \mathbb{R}, \; \alpha > 0.$$
One solution (to the reward-ambiguity issue): find a simple and robust $R$, e.g., use an $\ell_1$-norm penalty $\|R\|_1$, and maximize the sum over states of the value margin $\Delta V^{\pi^o}(s)$ between $\pi^o$ and the second-best action:
$$\Delta V^{\pi^o}(s) = Q^{\pi^o}(s, \pi^o(s)) - \max_{a \ne \pi^o(s)} Q^{\pi^o}(s, a) = \min_{a \ne \pi^o(s)} \left[ Q^{\pi^o}(s, \pi^o(s)) - Q^{\pi^o}(s, a) \right].$$

IRL Formulation #1: Small, Discrete MDPs
Combining it all together:
$$\max_{R \in \mathbb{R}^{|S|}} \; \sum_{s \in S} \min_{a \in A \setminus \{\pi^o(s)\}} \left\{ (P^o_s - P^a_s)(I - \gamma P^o)^{-1} R \right\} - \lambda \|R\|_1$$
$$\text{s.t.} \quad (P^o - P^a)(I - \gamma P^o)^{-1} R \ge 0, \;\; \forall a \in A, \qquad |R(s)| \le r_{\max}, \;\; \forall s \in S,$$
with $P^a_s$ the row vector of transition probabilities $P(s' \mid s, a)$, $s' \in S$, i.e., $P^o_s$, $P^a_s$ are the $s$-th rows of $P^o$, $P^a$, respectively.
Linear-program hints: introduce two dummy length-$|S|$ column vectors, $U = |R|$ (element-wise) and $\Gamma$ with $s$-th element $\Gamma_s = \min_{a \in A \setminus \{\pi^o(s)\}} \left\{ (P^o_s - P^a_s)(I - \gamma P^o)^{-1} R \right\}$, and stack them into a length-$3|S|$ variable $x = (R, U, \Gamma)$. With the length-$3|S|$ cost vector $c = (0, -\lambda \mathbf{1}, \mathbf{1})$, the LP becomes
$$\max_x \; c^\top x \quad \text{s.t.} \quad -U \le R \le U, \;\; 0 \le U \le r_{\max}\mathbf{1}, \;\; \Gamma \ge 0, \;\; A^a R \ge 0, \;\; \bar{A}^a R \ge \bar{\Gamma}^a, \;\; \forall a \in A,$$
with $A^a = (P^o - P^a)(I - \gamma P^o)^{-1}$, and $\bar{A}^a$, $\bar{\Gamma}^a$ the matrix and vector obtained by deleting from $A^a$, $\Gamma$ the rows $s$ for which $\pi^o(s) = a$.
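Rather than building the stacked LP by hand, one can also state the penalized objective directly and let a disciplined-convex-programming solver perform the dummy-variable reformulation. The sketch below is an illustration under assumptions: the random toy MDP mirrors the previous snippet, and r_max and lam are made-up hyperparameters, not values from the lecture.

```python
# Ng & Russell-style IRL for a toy discrete MDP, stated directly in cvxpy
# (illustrative sketch only; the random MDP, r_max and lam are assumptions).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
S, A, gamma, r_max, lam = 4, 3, 0.9, 1.0, 0.1
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)          # random row-stochastic transitions
pi_o = np.array([0, 1, 2, 0])              # assumed "teacher" policy

P_o = np.array([P[pi_o[s], s] for s in range(S)])
inv_term = np.linalg.inv(np.eye(S) - gamma * P_o)
A_mats = [(P_o - P[a]) @ inv_term for a in range(A)]   # the matrices A^a

R = cp.Variable(S)
margins, constraints = [], [cp.abs(R) <= r_max]
for s in range(S):
    others = [a for a in range(A) if a != pi_o[s]]
    # value margin of pi_o(s) over every other action in state s
    margins.append(cp.min(cp.hstack([A_mats[a][s] @ R for a in others])))
    constraints += [A_mats[a][s] @ R >= 0 for a in others]

objective = cp.Maximize(cp.sum(cp.hstack(margins)) - lam * cp.norm1(R))
cp.Problem(objective, constraints).solve()
print("recovered reward:", R.value)
```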

IRL Formulation #2: With Linear Function Approximation (LFA)
For large/continuous domains, with sampled trajectories. Assume $s_0 \sim P_0(S)$; for the teacher's policy $\pi^o$ to be optimal:
$$E\Big[\sum_{t=0}^{\infty} \gamma^t R(s_t) \,\Big|\, \pi^o\Big] \ge E\Big[\sum_{t=0}^{\infty} \gamma^t R(s_t) \,\Big|\, \pi\Big], \quad \forall \pi.$$
Using LFA: $R(s) = w^\top \phi(s)$, where $w \in \mathbb{R}^n$, $\|w\|_2 \le 1$, and $\phi : S \to \mathbb{R}^n$. Then
$$E\Big[\sum_{t=0}^{\infty} \gamma^t R(s_t) \,\Big|\, \pi\Big] = E\Big[\sum_{t=0}^{\infty} \gamma^t w^\top \phi(s_t) \,\Big|\, \pi\Big] = w^\top E\Big[\sum_{t=0}^{\infty} \gamma^t \phi(s_t) \,\Big|\, \pi\Big] = w^\top \eta(\pi).$$
The problem becomes: find $w$ such that $w^\top \eta(\pi^o) \ge w^\top \eta(\pi)$ for all $\pi$.
$\eta(\pi)$, the feature expectation of policy $\pi$, can be estimated from sampled trajectories of $\pi$:
$$\eta(\pi) = E\Big[\sum_{t=0}^{\infty} \gamma^t \phi(s_t) \,\Big|\, \pi\Big] \approx \frac{1}{N} \sum_{i=1}^{N} \sum_{t=0}^{T_i} \gamma^t \phi\big(s^{(i)}_t\big).$$
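The Monte-Carlo estimate of $\eta(\pi)$ on the last line is short to write in practice; below is a small sketch with an assumed trajectory format (lists of states) and an assumed toy feature map, purely for illustration.

```python
# Monte-Carlo estimate of the feature expectation eta(pi) from sampled
# trajectories (illustrative sketch; trajectory format and phi are assumptions).
import numpy as np

def feature_expectation(trajectories, phi, gamma=0.9):
    """trajectories: list of lists of states; phi: state -> feature vector."""
    est = None
    for traj in trajectories:
        for t, s in enumerate(traj):
            f = (gamma ** t) * phi(s)
            est = f if est is None else est + f
    return est / len(trajectories)

# Toy example: scalar states, two-dimensional features [s, s^2].
phi = lambda s: np.array([s, s ** 2])
trajs = [[0.0, 0.5, 1.0], [0.1, 0.4, 0.9]]
print(feature_expectation(trajs, phi))
```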

Apprenticeship Learning: Literature
- Pieter Abbeel, Andrew Ng: Apprenticeship Learning via Inverse RL. ICML 2004.
- Pieter Abbeel et al.: An Application of RL to Aerobatic Helicopter Flight. NIPS 2006.
- Nathan Ratliff et al.: Maximum Margin Planning. ICML 2006.
- Brian D. Ziebart et al.: Maximum Entropy Inverse Reinforcement Learning. AAAI 2008.
- Adam Coates et al.: Apprenticeship Learning for Helicopter Control. Communications of the ACM, 2009.

Apprenticeship Learning via IRL: Max-Margin
From IRL formulation #2, find a policy $\pi$ whose performance is as close to the performance of the oracle's policy $\pi^o$ as possible:
$$\left| w^\top \eta(\pi^o) - w^\top \eta(\pi) \right| \le \epsilon.$$
Also maximize the value margin $\gamma = \min_\pi \left[ w^\top \eta(\pi^o) - w^\top \eta(\pi) \right]$ (here $\gamma$ denotes the margin, not the discount factor).
Constraint-generation algorithm:
1: Initialize $\pi_0$ (depending on the chosen RL algorithm, e.g., tabular, approximate RL, etc.).
2: for $i = 1, 2, \dots$ do
3: Find a reward function such that the teacher maximally outperforms all previously found controllers:
$$\max_{\gamma, \|w\|_2 \le 1} \; \gamma \quad \text{s.t.} \quad w^\top \eta(\pi^o) \ge w^\top \eta(\pi) + \gamma, \;\; \forall \pi \in \{\pi_0, \pi_1, \dots, \pi_{i-1}\}.$$
4: Find an optimal policy $\pi_i$ for the reward function $R_w$ w.r.t. the current $w$ (using any RL algorithm, e.g., tabular, approximate RL, etc.).
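A skeleton of this loop might look as follows; it is a sketch under assumptions, not the lecture's code. The max-margin step is solved with cvxpy, and the caller supplies two functions, `solve_mdp_for_reward` (any RL algorithm, step 4) and `estimate_feature_expectation` (e.g., the Monte-Carlo estimator sketched earlier); both names are hypothetical placeholders.

```python
# Max-margin apprenticeship learning via IRL: constraint-generation skeleton.
# Illustrative sketch only; `solve_mdp_for_reward` and
# `estimate_feature_expectation` are hypothetical callables supplied by the user.
import cvxpy as cp

def max_margin_step(eta_expert, eta_policies):
    """Step 3: find w (||w||_2 <= 1) maximizing the expert's margin over all
    previously found policies."""
    w = cp.Variable(eta_expert.shape[0])
    margin = cp.Variable()
    constraints = [cp.norm(w, 2) <= 1]
    constraints += [w @ eta_expert >= w @ eta_p + margin for eta_p in eta_policies]
    cp.Problem(cp.Maximize(margin), constraints).solve()
    return w.value, margin.value

def apprenticeship_learning(eta_expert, initial_policy, solve_mdp_for_reward,
                            estimate_feature_expectation, n_iters=20, eps=1e-3):
    policies = [initial_policy]
    etas = [estimate_feature_expectation(initial_policy)]
    w = None
    for _ in range(n_iters):
        w, margin = max_margin_step(eta_expert, etas)       # step 3
        if margin <= eps:            # expert no longer clearly outperforms: stop
            break
        pi_i = solve_mdp_for_reward(w)                      # step 4 (any RL alg)
        policies.append(pi_i)
        etas.append(estimate_feature_expectation(pi_i))
    return w, policies
```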

Other Resources
- An excellent survey on LfD and its various formulations (see also section ~/Current_Work).
- Pieter Abbeel's simulated highway driving.
- MLR Lab's learning to open a door.
- Relational activity processes for the toolbox-assembly LfD task.

Appendix: Quick Review of Convex Optimization
Slides from Marc Toussaint's Introduction to Optimization lectures.
Solvers: CVX (MATLAB), CVXOPT (Python), etc.

Linear and Quadratic Programs
Linear Program (LP): $\min_x \; c^\top x$ s.t. $Gx \le h$, $Ax = b$.
LP in standard form: $\min_x \; c^\top x$ s.t. $x \ge 0$, $Ax = b$.
Quadratic Program (QP): $\min_x \; \tfrac{1}{2} x^\top Q x + c^\top x$ s.t. $Gx \le h$, $Ax = b$,
where $x \in \mathbb{R}^n$ and $Q$ is positive definite.
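Both problem classes can be handed directly to one of the solvers mentioned in the appendix. The snippet below solves a tiny LP and QP with cvxpy on made-up data; the problem data and variable names are illustrative assumptions.

```python
# Solving a small LP and QP with cvxpy (made-up data, for illustration only).
import numpy as np
import cvxpy as cp

c = np.array([1.0, 2.0])
G, h = np.array([[-1.0, 0.0], [0.0, -1.0]]), np.zeros(2)   # encodes x >= 0
A, b = np.array([[1.0, 1.0]]), np.array([1.0])             # x1 + x2 = 1

x = cp.Variable(2)
lp = cp.Problem(cp.Minimize(c @ x), [G @ x <= h, A @ x == b])
lp.solve()
print("LP solution:", x.value)

Q = np.array([[2.0, 0.0], [0.0, 2.0]])                     # positive definite
qp = cp.Problem(cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x),
                [G @ x <= h, A @ x == b])
qp.solve()
print("QP solution:", x.value)
```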

Transforming an LP into standard form
LP problem:
$$\min_x \; c^\top x \quad \text{s.t.} \quad Gx \le h, \; Ax = b.$$
Define slack variables:
$$\min_{x, \xi} \; c^\top x \quad \text{s.t.} \quad Gx + \xi = h, \; Ax = b, \; \xi \ge 0.$$
Express $x = x^+ - x^-$ with $x^+, x^- \ge 0$:
$$\min_{x^+, x^-, \xi} \; c^\top (x^+ - x^-) \quad \text{s.t.} \quad G(x^+ - x^-) + \xi = h, \; A(x^+ - x^-) = b, \; \xi \ge 0, \; x^+ \ge 0, \; x^- \ge 0,$$
where $(x^+, x^-, \xi) \in \mathbb{R}^{2n + m}$.
This now conforms to the standard form (substituting $(x^+, x^-, \xi) \to z$, etc.):
$$\min_z \; w^\top z \quad \text{s.t.} \quad z \ge 0, \; Dz = e.$$
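The substitution can be carried out mechanically by stacking block matrices; here is a small illustrative helper (the function name and interface are assumptions, not part of the slides).

```python
# Convert min c'x s.t. Gx <= h, Ax = b into standard form min w'z s.t. z >= 0, Dz = e
# by splitting x = x+ - x- and adding slack variables (illustrative helper).
import numpy as np

def lp_to_standard_form(c, G, h, A, b):
    m, n = G.shape
    p = A.shape[0]
    # z = (x+, x-, xi), with xi the slacks for the inequality constraints.
    w = np.concatenate([c, -c, np.zeros(m)])
    D = np.block([[G, -G, np.eye(m)],
                  [A, -A, np.zeros((p, m))]])
    e = np.concatenate([h, b])
    return w, D, e
```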

Algorithms for Linear Programming
- Constrained-optimization methods: augmented Lagrangian (LANCELOT software), penalty, log barrier ("interior point method", "[central] path following"), primal-dual Newton.
- The simplex algorithm: walking on the constraints. (The emphasis in the notion of "interior point" methods is to distinguish them from such constraint-walking methods.)
- Interior-point and simplex methods are comparably efficient; which one is better depends on the problem.

Quadratic Programming
$$\min_x \; \tfrac{1}{2} x^\top Q x + c^\top x \quad \text{s.t.} \quad Gx \le h, \; Ax = b.$$
Efficient algorithms: interior point (log barrier), augmented Lagrangian, penalty.
Highly relevant applications: support vector machines and similar max-margin modeling methods.

Example: Support Vector Machine
Primal:
$$\max_{\beta, \|\beta\| = 1} \; M \quad \text{s.t.} \quad \forall i: \; y_i \, (\phi(x_i)^\top \beta) \ge M.$$
Dual:
$$\min_\beta \; \|\beta\|^2 \quad \text{s.t.} \quad \forall i: \; y_i \, (\phi(x_i)^\top \beta) \ge 1.$$
(Figure omitted: two classes A and B separated by a max-margin hyperplane.)
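To connect this with the QP machinery above, the second formulation can be solved directly as a small QP. The snippet uses synthetic linearly separable data and the identity feature map; all data and names are assumptions for illustration.

```python
# Hard-margin SVM as a QP: min ||beta||^2 s.t. y_i (x_i . beta) >= 1
# (toy synthetic data, identity feature map phi(x) = x, for illustration).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[3.0, 3.0], size=(20, 2))
X_neg = rng.normal(loc=[-3.0, -3.0], size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(20), -np.ones(20)])

beta = cp.Variable(2)
constraints = [cp.multiply(y, X @ beta) >= 1]     # y_i (x_i . beta) >= 1
cp.Problem(cp.Minimize(cp.sum_squares(beta)), constraints).solve()
print("beta:", beta.value, "margin:", 1 / np.linalg.norm(beta.value))
```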
