Reinforcement Learning
1 Reinforcement Learning: Inverse Reinforcement Learning
LfD, imitation learning/behavior cloning, apprenticeship learning, IRL.
Hung Ngo, MLR Lab, University of Stuttgart
2 Outline
- Learning from Demonstrations (LfD)
- Behavioral Cloning / Imitation Learning
- Inverse Reinforcement Learning (IRL) Algorithms
3-4 Learning from Demonstrations (LfD)
Setting: An oracle teaches an agent how to perform a given task.
Given: Samples of an MDP agent's behavior over time and in different circumstances, drawn from a supposedly optimal policy $\pi^o$, i.e.,
- A set of trajectories $\{\xi_i\}_{i=1}^n$, $\xi_i = \{(s_t, a_t)\}_{t=0}^{H_i-1}$, $a_t \sim \pi^o(s_t)$.
- Reward signal $r_t = R(s_t, a_t, s_{t+1})$: unobserved.
- Transition model $T(s, a, s') = P(s' \mid s, a)$: known or unknown.
Goals:
- Recover the teacher's policy $\pi^o$ directly: behavioral cloning, or imitation learning.
- Recover the teacher's latent reward function $R^o(s, a, s')$: IRL.
- Recover the teacher's policy $\pi^o$ indirectly by first recovering $R^o(s, a, s')$: apprenticeship learning via IRL.
5-7 Behavioral Cloning
Formulated as a supervised-learning problem:
- Given training data $\{\xi_i\}_{i=1}^n$, $\xi_i = \{(s_t, a_t)\}_{t=0}^{H_i-1}$, $a_t \sim \pi^o(s_t)$.
- Learn a policy mapping $\hat\pi^o: S \to A$.
- Solved using SVMs, (deep) ANNs, etc. (a minimal sketch follows below).
Limitations of behavioral cloning/IL:
- It can only mimic the teacher's trajectories; there is no transfer w.r.t. the task (e.g., when the environment changes but the goals stay similar).
- It may fail in non-Markovian environments (e.g., in driving, states from several time steps are sometimes needed).
IRL vs. behavioral cloning: recover $\hat R^o$ vs. $\hat\pi^o$. Why not recover $V^{\pi^o}$ instead?
- The reward function is more succinct (easily generalizable/transferable).
- Values are trajectory-dependent.
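As a toy illustration of this supervised-learning view (not from the original slides), the sketch below clones a policy from (state, action) pairs with scikit-learn. The trajectory data and the choice of logistic regression are hypothetical stand-ins for whatever classifier (SVM, ANN, ...) one prefers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Demonstrations: a list of trajectories, each a list of (state, action) pairs.
# States here are feature vectors; actions are discrete labels (made-up data).
trajectories = [
    [(np.array([0.1, 0.9]), 0), (np.array([0.4, 0.6]), 1)],
    [(np.array([0.2, 0.8]), 0), (np.array([0.7, 0.3]), 1)],
]

# Flatten the (s_t, a_t) pairs into a supervised dataset.
X = np.array([s for traj in trajectories for (s, a) in traj])
y = np.array([a for traj in trajectories for (s, a) in traj])

# Any classifier works here (the slides mention SVMs and ANNs); logistic
# regression is the simplest stand-in for the policy mapping S -> A.
policy = LogisticRegression().fit(X, y)

def pi_hat(state):
    """Cloned policy: predict the teacher's action for a new state."""
    return policy.predict(state.reshape(1, -1))[0]
```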
8-9 Why IRL?
As a computational model for learning behaviors in the natural world:
- Bee foraging (Montague et al., 1995)
- Song-bird vocalization (Doya & Sejnowski, 1995)
Construction of an intelligent agent in a particular domain:
- Modeling humans and other adversarial/cooperative agents.
- Collaborative robots (learn a reward function & plan for cooperative tasks).
- Intermediate step in apprenticeship learning (autonomous driving, driver preferences, autonomous flight, e.g., helicopter, etc.): Abbeel et al. '04, Ziebart et al. '08, Andrew Ng et al.
10 Example: Urban Navigation
(Picture from a tutorial by Pieter Abbeel.)
11 IRL Formulation #1: Small, Discrete MDPs
Given: An incomplete MDP $M = \langle S, A, T, \cdot, \gamma \rangle$ with
- known transition model $T(s, a, s') = P(s' \mid s, a)$ for all $s, a, s'$,
- unobserved but bounded reward signal, $|R(s, a, s')| \le r_{\max}$ for all $s, a, s'$ (for simplicity, consider state-dependent reward functions $R(s)$),
- known, supposedly optimal policy $\pi^o(s)$ for all $s \in S$, instead of trajectories $\{\xi_i\}_{i=1}^n$.
Find: $R: S \to [-r_{\max}, r_{\max}]$ such that the teacher's policy $\pi^o$ is optimal; furthermore, prefer a simple and robust reward function.
Notes: in the following we fix an enumeration of the state space, $S = \{s_1, \ldots, s_{|S|}\}$. Then $R$ is a column vector in $\mathbb{R}^{|S|}$, with $R_i = R(s_i)$.
Reference: Andrew Ng, Stuart Russell: Algorithms for Inverse Reinforcement Learning. ICML 2000.
12-14 IRL Formulation #1: Small, Discrete MDPs
Find $R \in \mathbb{R}^{|S|}$ such that the teacher's policy $\pi^o$ is optimal. Recall the Bellman optimality theorem (for a known MDP):
$\pi^o$ is optimal $\iff \pi^o(s) \in \arg\max_a Q^{\pi^o}(s, a)$ for all $s \in S$
$\iff Q^{\pi^o}(s, \pi^o(s)) \ge Q^{\pi^o}(s, a)$ for all $s \in S, a \in A$. (*)
Define policy-conditioned transition matrices $P^o, P^a \in [0, 1]^{|S| \times |S|}$:
$[P^o]_{ij} := P(s_j \mid s_i, \pi^o(s_i))$ and $[P^a]_{ij} := P(s_j \mid s_i, a)$ for all $s_i, s_j \in S$.
We can represent the constraints¹ (*) on $R$ as:
$(P^o - P^a)(I - \gamma P^o)^{-1} R \succeq 0$ for all $a \in A$. (**)
Proof: Bellman equations $Q^{\pi^o}(s, a) = R(s) + \gamma \sum_{s'} P(s' \mid s, a) V^{\pi^o}(s')$, and $V^{\pi^o} = (I - \gamma P^o)^{-1} R$. Denote by $Q^{\pi^o}_\pi$ a length-$|S|$ column vector with elements $Q^{\pi^o}_\pi(s) := Q^{\pi^o}(s, \pi(s))$, i.e., $Q^{\pi^o}_\pi = R + \gamma P^\pi V^{\pi^o}$. The set of $|S||A|$ constraints in (*) can then be written in matrix form (by fixing an action $a$ for all starting states $s \in S$) as $Q^{\pi^o}_o - Q^{\pi^o}_a \succeq 0$ for all $a \in A$. (**)
¹ $x \succeq y$ denotes vectorial (component-wise) inequality: $x_i \ge y_i$ for every index $i$.
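To make the construction concrete, here is a minimal NumPy sketch (my own, not from the slides) that builds the matrices $A_a = (P^o - P^a)(I - \gamma P^o)^{-1}$ for a small tabular MDP and checks whether a candidate reward vector satisfies (**). The array layout `P[a, i, j]` and the deterministic integer-valued policy are assumptions.

```python
import numpy as np

def irl_constraint_matrices(P, pi_o, gamma):
    """Build A_a = (P^o - P^a)(I - gamma P^o)^{-1} for every action a.

    P:     array of shape (|A|, |S|, |S|), P[a, i, j] = P(s_j | s_i, a)
    pi_o:  integer array of shape (|S|,), the teacher's deterministic policy
    """
    nA, nS, _ = P.shape
    P_o = P[pi_o, np.arange(nS), :]            # row s_i taken from P[pi_o(s_i)]
    M = np.linalg.inv(np.eye(nS) - gamma * P_o)
    return [(P_o - P[a]) @ M for a in range(nA)]

def satisfies_constraints(R, A_mats, tol=1e-9):
    """True iff (P^o - P^a)(I - gamma P^o)^{-1} R >= 0 holds for all a."""
    return all((A_a @ R >= -tol).all() for A_a in A_mats)
```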
15-16 IRL Formulation #1: Small, Discrete MDPs
Challenges:
- What if the teacher is noisy? (i.e., $a_t \not\sim \pi^o(s_t)$ at some $t$)
- What if, instead of the full $\pi^o(s)$ for all $s \in S$, we are only given sampled trajectories $\{\xi_i\}_{i=1}^n$?
- Computationally expensive/infeasible: $|S||A|$ constraints for each candidate reward $R$.
- Reward-function ambiguity: IRL is ill-posed! ($R = 0$ is a solution.)
From reward-shaping theory: if the MDP $M$ with reward function $R$ admits $\pi^o$ as an optimal policy, then $M$ with the affine-transformed reward function below also admits $\pi^o$ as an optimal policy (a numerical check follows below):
$R'(s, a, s') = \alpha R(s, a, s') + \gamma \psi(s') - \psi(s)$, with $\psi: S \to \mathbb{R}$, $\alpha > 0$.
One solution (to the reward-ambiguity issue): find a simple and robust $R$, e.g., use an $\ell_1$-norm penalty $\|R\|_1$, and maximize the sum of value margins $\Delta V^{\pi^o}(s)$ between $\pi^o$ and the second-best action,
$\Delta V^{\pi^o}(s) = Q^{\pi^o}(s, \pi^o(s)) - \max_{a \ne \pi^o(s)} Q^{\pi^o}(s, a) = \min_{a \ne \pi^o(s)} \big[Q^{\pi^o}(s, \pi^o(s)) - Q^{\pi^o}(s, a)\big]$.
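The shaping invariance can be checked numerically. Below is a small illustrative sketch (assumptions: a random tabular MDP, a greedy policy extracted after value iteration, and made-up $\alpha$ and $\psi$) showing that the affine transformation leaves the optimal policy unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9

# Random MDP with a state-action-next-state reward R(s, a, s').
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a, s']
R = rng.normal(size=(nS, nA, nS))

def greedy_policy(P, R, gamma, iters=500):
    """Value iteration, then the greedy (optimal) policy."""
    V = np.zeros(nS)
    for _ in range(iters):
        Q = (P * (R + gamma * V)).sum(axis=2)   # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Affine transform: R'(s,a,s') = alpha * R(s,a,s') + gamma*psi(s') - psi(s)
psi, alpha = rng.normal(size=nS), 2.0
R_shaped = alpha * R + gamma * psi[None, None, :] - psi[:, None, None]

# The optimal policy is invariant under the transformation.
assert (greedy_policy(P, R, gamma) == greedy_policy(P, R_shaped, gamma)).all()
```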
17-18 IRL Formulation #1: Small, Discrete MDPs
Combining it all together:
$\max_{R \in \mathbb{R}^{|S|}} \sum_{s \in S} \min_{a \in A \setminus \pi^o(s)} \big\{ (P^o_s - P^a_s)(I - \gamma P^o)^{-1} R \big\} - \lambda \|R\|_1$
s.t. $(P^o - P^a)(I - \gamma P^o)^{-1} R \succeq 0$ for all $a \in A$, and $|R(s)| \le r_{\max}$ for all $s \in S$,
with $P^a_s$ the row vector of transition probabilities $P(s' \mid s, a)$, $s' \in S$; i.e., $P^o_s, P^a_s$ are the $s$-th rows of $P^o, P^a$, respectively.
Linear program, hints: introduce two dummy length-$|S|$ column vectors, $U = |R|$ and $\Gamma$ with $s$-th element $\min_{a \in A \setminus \pi^o(s)} \big\{ (P^o_s - P^a_s)(I - \gamma P^o)^{-1} R \big\}$, and create a length-$3|S|$ column vector $x = (R, U, \Gamma)$. Let $c$ denote the length-$3|S|$ column vector $c = (0, -\lambda \mathbf{1}, \mathbf{1})$. The LP becomes
$\max_x c^\top x$ s.t. $-U \preceq R \preceq U$, $0 \preceq U \preceq r_{\max}\mathbf{1}$, $\Gamma \succeq 0$, $A_a R \succeq 0$, $\bar A_a R \succeq \bar\Gamma_a$ for all $a \in A$,
with $A_a = (P^o - P^a)(I - \gamma P^o)^{-1}$, and $\bar A_a, \bar\Gamma_a$ the matrix and vector obtained by deleting from $A_a, \Gamma$ the rows $s$ such that $\pi^o(s) = a$.
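One possible concretization of this LP is sketched below, under the variable layout $x = (R, U, \Gamma)$ described above and using SciPy's `linprog` rather than the solvers named in the appendix; it reuses the $A_a$ matrices from the earlier sketch and is illustrative rather than the paper's reference implementation.

```python
import numpy as np
from scipy.optimize import linprog

def ng_russell_lp(A_mats, pi_o, r_max, lam):
    """Solve the LP above with x = (R, U, Gamma) stacked into one vector."""
    nS = A_mats[0].shape[0]
    # Objective: maximize 1'Gamma - lam*1'U  ->  minimize -1'Gamma + lam*1'U.
    c = np.concatenate([np.zeros(nS), lam * np.ones(nS), -np.ones(nS)])

    I, Z = np.eye(nS), np.zeros((nS, nS))
    rows, rhs = [], []
    # |R| <= U  <=>  R - U <= 0  and  -R - U <= 0
    rows += [np.hstack([I, -I, Z]), np.hstack([-I, -I, Z])]
    rhs += [np.zeros(nS), np.zeros(nS)]
    for a, A_a in enumerate(A_mats):
        # Feasibility: A_a R >= 0  <=>  -A_a R <= 0
        rows.append(np.hstack([-A_a, Z, Z]))
        rhs.append(np.zeros(nS))
        # Margin: Gamma_s <= (A_a R)_s for the states where a != pi_o(s)
        keep = np.flatnonzero(pi_o != a)
        rows.append(np.hstack([-A_a[keep], Z[keep], I[keep]]))
        rhs.append(np.zeros(len(keep)))
    A_ub, b_ub = np.vstack(rows), np.concatenate(rhs)

    bounds = ([(-r_max, r_max)] * nS    # R
              + [(0, r_max)] * nS       # U
              + [(0, None)] * nS)       # Gamma
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:nS]                   # the recovered reward vector R
```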
19-20 IRL Formulation #2: With LFA
For large/continuous domains, with sampled trajectories. Assume $s_0 \sim P_0(S)$; for the teacher's policy $\pi^o$ to be optimal:
$E\big[\sum_{t=0}^\infty \gamma^t R(s_t) \mid \pi^o\big] \ge E\big[\sum_{t=0}^\infty \gamma^t R(s_t) \mid \pi\big]$ for all $\pi$.
Using LFA (linear function approximation): $R(s) = w^\top \phi(s)$, where $w \in \mathbb{R}^n$, $\|w\|_2 \le 1$, and $\phi: S \to \mathbb{R}^n$. Then
$E\big[\sum_{t=0}^\infty \gamma^t R(s_t) \mid \pi\big] = E\big[\sum_{t=0}^\infty \gamma^t w^\top \phi(s_t) \mid \pi\big] = w^\top E\big[\sum_{t=0}^\infty \gamma^t \phi(s_t) \mid \pi\big] = w^\top \eta(\pi)$.
The problem becomes: find $w$ such that $w^\top \eta(\pi^o) \ge w^\top \eta(\pi)$ for all $\pi$.
$\eta(\pi)$, the feature expectation of policy $\pi$, can be estimated from sampled trajectories of $\pi$:
$\eta(\pi) = E\big[\sum_{t=0}^\infty \gamma^t \phi(s_t) \mid \pi\big] \approx \frac{1}{N} \sum_{i=1}^N \sum_{t=0}^{T_i} \gamma^t \phi(s_t^{(i)})$.
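Estimating $\eta(\pi)$ from rollouts is straightforward; here is a hedged NumPy sketch (the trajectory format and the feature map `phi` are assumptions, not from the slides):

```python
import numpy as np

def feature_expectation(trajectories, phi, gamma):
    """Monte Carlo estimate of eta(pi) = E[ sum_t gamma^t phi(s_t) | pi ].

    trajectories: list of state sequences sampled by executing pi
    phi:          feature map S -> R^n, returning a NumPy vector
    """
    total = None
    for states in trajectories:
        # Discounted feature sum along one trajectory.
        disc = np.array([gamma**t * phi(s) for t, s in enumerate(states)]).sum(axis=0)
        total = disc if total is None else total + disc
    return total / len(trajectories)
```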
21 Apprenticeship Learning: Literature
- Pieter Abbeel, Andrew Ng: Apprenticeship Learning via Inverse RL. ICML '04.
- Pieter Abbeel et al.: An Application of RL to Aerobatic Helicopter Flight. NIPS '06.
- Nathan Ratliff et al.: Maximum Margin Planning. ICML '06.
- Brian D. Ziebart et al.: Maximum Entropy Inverse Reinforcement Learning. AAAI '08.
- Adam Coates et al.: Apprenticeship Learning for Helicopter Control. Commun. ACM '09.
22-24 Apprenticeship Learning via IRL: Max-Margin
From IRL formulation #2, find a policy $\pi$ whose performance is as close to the performance of the oracle's policy $\pi^o$ as possible:
$|w^\top \eta(\pi^o) - w^\top \eta(\pi)| \le \epsilon$.
Also maximize the value margin $\gamma = \min_\pi \big[w^\top \eta(\pi^o) - w^\top \eta(\pi)\big]$.
Constraints-generation algorithm (a sketch follows below):
1: Initialize $\pi_0$ (depending on the chosen RL algorithm, e.g., tabular, approximate RL, etc.)
2: for $i = 1, 2, \ldots$ do
3: Find a reward function such that the teacher maximally outperforms all previously found controllers:
$\max_{\gamma, \|w\|_2 \le 1} \gamma$ s.t. $w^\top \eta(\pi^o) \ge w^\top \eta(\pi) + \gamma$ for all $\pi \in \{\pi_0, \pi_1, \ldots, \pi_{i-1}\}$.
4: Find an optimal policy $\pi_i$ for the reward function $R_w$ w.r.t. the current $w$ (using any RL algorithm, e.g., tabular, approximate RL, etc.).
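A sketch of this loop, not the authors' reference implementation: the inner max-margin problem has a quadratic norm constraint $\|w\|_2 \le 1$, so it is solved here with SciPy's SLSQP; the RL solver `solve_rl` and the feature-expectation estimator `estimate_eta` are assumed callables supplied by the user.

```python
import numpy as np
from scipy.optimize import minimize

def max_margin_w(eta_o, etas):
    """max_{gamma, ||w||_2 <= 1} gamma  s.t.  w.(eta_o - eta_i) >= gamma for all i."""
    n = len(eta_o)
    x0 = np.zeros(n + 1)                       # x = (w, gamma); feasible start
    cons = [{"type": "ineq",
             "fun": lambda x, d=eta_o - eta: x[:n] @ d - x[n]}
            for eta in etas]
    cons.append({"type": "ineq", "fun": lambda x: 1.0 - x[:n] @ x[:n]})
    res = minimize(lambda x: -x[n], x0, constraints=cons, method="SLSQP")
    return res.x[:n], res.x[n]                 # (w, achieved margin)

def apprenticeship_learning(eta_o, solve_rl, estimate_eta, pi0, iters=20, eps=1e-3):
    """solve_rl(w) returns an optimal policy for the reward R_w(s) = w.phi(s);
    estimate_eta(pi) returns the Monte Carlo feature expectation of pi."""
    policies, etas = [pi0], [estimate_eta(pi0)]
    for _ in range(iters):
        w, margin = max_margin_w(eta_o, etas)
        if margin <= eps:                      # teacher no longer outperforms: done
            break
        pi = solve_rl(w)                       # step 4 of the algorithm
        policies.append(pi)
        etas.append(estimate_eta(pi))
    return w, policies
```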
25 Other Resources
- An excellent survey on LfD and various formulations (see also section ~/Current_Work).
- Pieter Abbeel's simulated highway driving.
- MLR Lab's learning to open a door.
- Relational activity processes for the toolbox-assembly task (LfD).
26 Appendix: Quick Review of Convex Optimization
Slides from Marc Toussaint's Introduction to Optimization lectures.
Solvers: CVX (MATLAB), CVXOPT (Python), etc.
27 Linear and Quadratic Programs
Linear program (LP): $\min_x c^\top x$ s.t. $Gx \preceq h$, $Ax = b$.
LP in standard form: $\min_x c^\top x$ s.t. $x \succeq 0$, $Ax = b$.
Quadratic program (QP): $\min_x \frac{1}{2} x^\top Q x + c^\top x$ s.t. $Gx \preceq h$, $Ax = b$,
where $x \in \mathbb{R}^n$ and $Q$ is positive definite.
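A toy instance of the inequality-constrained LP form above, solved with SciPy (the numbers are made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# min c^T x  s.t.  G x <= h, x >= 0   (a 2-variable toy instance)
c = np.array([-1.0, -2.0])             # i.e., maximize x1 + 2*x2
G = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
h = np.array([4.0, 3.0, 3.0])

res = linprog(c, A_ub=G, b_ub=h, bounds=[(0, None), (0, None)], method="highs")
print(res.x)                           # optimizer, here (1, 3)
```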
28 Transforming an LP Problem into Standard Form
LP problem: $\min_x c^\top x$ s.t. $Gx \preceq h$, $Ax = b$.
Define slack variables: $\min_{x, \xi} c^\top x$ s.t. $Gx + \xi = h$, $Ax = b$, $\xi \succeq 0$.
Express $x = x^+ - x^-$ with $x^+, x^- \succeq 0$:
$\min_{x^+, x^-, \xi} c^\top (x^+ - x^-)$ s.t. $G(x^+ - x^-) + \xi = h$, $A(x^+ - x^-) = b$, $\xi \succeq 0$, $x^+ \succeq 0$, $x^- \succeq 0$,
where $(x^+, x^-, \xi) \in \mathbb{R}^{2n + m}$. Now this conforms to the standard form (replacing $(x^+, x^-, \xi) \to z$, etc.):
$\min_z w^\top z$ s.t. $z \succeq 0$, $Dz = e$.
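The same transformation in code, as a small NumPy sketch whose block-matrix layout mirrors the derivation above:

```python
import numpy as np

def to_standard_form(c, G, h, A, b):
    """Rewrite  min c^T x  s.t. Gx <= h, Ax = b  as  min w^T z  s.t. z >= 0, Dz = e,
    with z = (x+, x-, xi), x = x+ - x-, and slack xi = h - Gx."""
    m, n = G.shape
    w = np.concatenate([c, -c, np.zeros(m)])                 # c^T(x+ - x-)
    D = np.vstack([
        np.hstack([G, -G, np.eye(m)]),                       # G(x+ - x-) + xi = h
        np.hstack([A, -A, np.zeros((A.shape[0], m))]),       # A(x+ - x-)      = b
    ])
    e = np.concatenate([h, b])
    return w, D, e
```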
29 Algorithms for Linear Programming
Constrained optimization methods:
- augmented Lagrangian (LANCELOT software), penalty
- log barrier (interior point method, [central] path following)
- primal-dual Newton
The simplex algorithm, which walks along the constraints.
(The emphasis in the notion of interior point methods is to distinguish them from constraint-walking methods.)
Interior point and simplex methods are comparably efficient; which is better depends on the problem.
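For intuition, here is a bare-bones log-barrier sketch for the LP $\min_x c^\top x$ s.t. $Gx \preceq h$; it is illustrative only (fixed Newton counts, no convergence tests, feasibility-only backtracking) and requires a strictly feasible starting point.

```python
import numpy as np

def lp_log_barrier(c, G, h, x0, t=1.0, mu=10.0, outer=30, newton_steps=20):
    """Minimize t*c^T x - sum(log(h - Gx)) by Newton's method, then increase t.
    x0 must be strictly feasible: G x0 < h componentwise."""
    x = x0.astype(float)
    for _ in range(outer):
        for _ in range(newton_steps):
            r = h - G @ x                          # slacks, must stay > 0
            grad = t * c + G.T @ (1.0 / r)         # gradient of the barrier objective
            H = G.T @ np.diag(1.0 / r**2) @ G      # Hessian of the barrier objective
            dx = -np.linalg.solve(H, grad)
            step = 1.0
            while (G @ (x + step * dx) >= h).any():  # backtrack to stay interior
                step *= 0.5
            x = x + step * dx
        t *= mu                                    # follow the central path
    return x
```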
30 Quadratic Programming
$\min_x \frac{1}{2} x^\top Q x + c^\top x$ s.t. $Gx \preceq h$, $Ax = b$.
Efficient algorithms: interior point (log barrier), augmented Lagrangian, penalty.
Highly relevant applications: support vector machines and similar max-margin modeling methods.
31 Example: Support Vector Machine
Primal: $\max_{\beta, \|\beta\| = 1} M$ s.t. $y_i (\phi(x_i)^\top \beta) \ge M$ for all $i$.
Dual: $\min_\beta \|\beta\|^2$ s.t. $y_i (\phi(x_i)^\top \beta) \ge 1$ for all $i$.
(Figure: max-margin separating hyperplane between two classes A and B.)
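The second form above is a small quadratic program; on separable toy data it can be solved directly. A sketch using SciPy's SLSQP, with $\phi$ the identity and no bias term, matching the slide's formulation; the data is made up:

```python
import numpy as np
from scipy.optimize import minimize

# Linearly separable toy data (phi = identity), labels in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# min ||beta||^2  s.t.  y_i (x_i . beta) >= 1  for all i
cons = [{"type": "ineq", "fun": lambda b, i=i: y[i] * (X[i] @ b) - 1.0}
        for i in range(len(y))]
res = minimize(lambda b: b @ b, x0=np.ones(2), constraints=cons, method="SLSQP")

beta = res.x
margin = 1.0 / np.linalg.norm(beta)    # geometric margin M = 1 / ||beta||
print(beta, margin)
```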