Laplacian Agent Learning: Representation Policy Iteration

Size: px
Start display at page:

Download "Laplacian Agent Learning: Representation Policy Iteration"

Transcription

1 Laplacian Agent Learning: Representation Policy Iteration Sridhar Mahadevan

2 Example of a Markov Decision Process a1: $0 Heaven $1 Earth What should the agent do? a2: $100 Hell $-1 V a1 ( Earth ) = f(0,1,1,1,1,...) V a2 ( Earth ) = f(100,-1,-1,-1,-1,...)

3 Infinite Horizon Markov Decision Processes! Graded myopia: discounted sum of rewards " Maximize γ t " Hell if γ < 0.98, otherwise Heaven " Maximize average-adjusted sum of rewards n lim rt n Always go to heaven! t r t = 1 t n

4 Sample MDP , , +1 V b (1) = 0.5 x x 2 + γ (0.5) V b (1) + γ (0.5) V b (2) V b (2) = 1 + γ V b (2)

5 Sample MDP , , +1 V b (2) = 1/(1 - γ) = 2 (if γ = 0.5) V b (1) = (1.5 + γ (0.5) V b (2))/(1 - γ(0.5)) = 8/3

6 Sample MDP , , +1 V r (2) = 1/(1 - γ) = 2 (if γ = 0.5) V r (1) = (1 + γ (0.5)V r (2))/(1 - γ(0.5)) = 2 Hence, the blue policy is optimal in state 1

7 Bellman Equation Choose greedy action given V* EXIT WEST Goal! +50 WAIT EXIT EAST

8 Value Function Approximation R S x A LSPI [Lagoudakis and Parr, 2003] Inverted Pendulum with Radial Basis Functions (10) R k

9 VFA by Least-Squares Projection: Minimize Bellman Residual Error T π ( Vˆ( s)) Bellman residual = i V ˆ( s) φ ( s) i w i Subspace Φ (poly, RBF, neural nets)

10 VFA by Least-Squares Projection: Fixed Point Projection T π ( Vˆ( s)) = i V ˆ( s) φ ( s) i w i Minimize Projected Residual Subspace Φ (poly, RBF, neural nets)

11 Parametric Value Function Approximation Can Fail Approximator blind to symmetries! LSPI converges to an incorrect policy Approximator blind to bottlenecks! Goal is to get to center of square grid G [Drummond, JAIR 2002]

12 Divergence of Neural Nets in Mountain Car Task (Boyan and Moore, NIPS 1995)

13 Automating Value Function Approximation " Many approaches to value function approximation " Neural nets, radial basis functions, support vector machines, kernel density estimation, nearest neighbor " How to automate the design of a function approximator? " We want to go beyond model selection!

14 Representation Learning using Harmonic Analysis on Manifolds (Mahadevan and Maggioni, NIPS 2005) 8 6 Fourier 2 ( F) = 0 Div(Grad(F))=0 4 Inverted pendulum Wavelets Dilations of diffusion operator on manifold Samples from random walk

15 Proto-Value Functions (Mahadevan, ICML 2005) Proto-value functions are the representational building blocks of all value functions on a given environment. Proto-value functions are based on modeling the state space manifold Fourier Wavelet

16 How are Proto-Value Functions Learned? Proto-value functions Graph Laplacian

17 Representation Policy Iteration (Mahadevan, UAI 2005) Policy improvement Greedy Policy Actor Policy evaluation Representation Learner Trajectories Critic Proto-value functions

18 Policy Iteration (Howard, 1960) Policy Evaluation: ( Critic ) γ Policy Improvement: ( Actor ) γ

19 Representation Policy Iteration " Learn a set of proto-value functions from a sample of transitions generated from a random walk (or from watching an expert) " These basis functions can then be used in an approximate policy iteration algorithm, such as Least Squares Policy Iteration [Lagoudakis and Parr, 2003]

20 Least-Squares Policy Iteration (Boyan, ICML 1999; Lagoudakis and Parr, JMLR 2003) Do a random walk generating a set of transitions D = (s t, a t, r, s t ) Solve the equation:

21 Learned vs. Handcoded Representations: Chain MDP Time Error Laplacian RBF Poly

22 Results of RPI on a Grid World LSPI RBF LSPI Polynomial RPI Goal is to get to center of square grid

23 Results of RPI on Two Room World G LSPI Polynomial Nonlinearity due to bottleneck is nicely captured by RPI!

24 RPI in Continuous State Spaces (Mahadevan, Maggioni, Ferguson, Osentoski, 2006) " To extend RPI to continuous state spaces, we need to use the Nystrom extension to extend eigenfunctions from sample points to new points " Many issues are involved here, including how many sampled points to choose from, how to construct the graph to reflect the underlying manifold, etc.

25 RPI : Inverted Pendulum Task Normalized Graph Laplacian matrix Sample transitions from random walk (~ ) Proto value functions

26 RPI: Inverted Pendulum Q-value for action Angle Angular Velocity -4 Q-value for action Angle Angular Velocity -4 Learned Proto-Value Functions Handcoded Radial Basis Functions

27 RPI Results on Inverted Pendulum (Mahadevan, Maggioni, Ferguson, Osentoski, 2006) 3200 Pendulum: Proto Value Functions vs Radial Basis Functions Steps RBF 50 PVF 36 RBF 50 RBF Number of training episodes

28 RPI Results for Mountain Car (Mahadevan, Maggioni, Ferguson, Osentoski, 2006) 550 Mountain Car: Proto Value Functions vs. Radial Basis Functions Steps PVF (Z=500, P=100, k=25) PVF (Z=1500, P=100, k=25) 13 RBF (Z=1500) 49 RBF (Z=1500) Number of training episodes

29 Summary: Proto-Value Functions " A major challenge facing MDP research is to how to automate value function approximation " Proto-value functions " Fourier and wavelet representations on manifolds that capture largescale geometry of the state (action) space " Representation Policy Iteration is a general framework for simultaneously learning representations and policies " Extensions of proto-value functions " On-policy proto-value functions [Maggioni and Mahadevan, 2005] " Factored Markov decision processes [Mahadevan, 2006] " Group-theoretic extensions [Mahadevan, in preparation]

Representation Policy Iteration: A Unified Framework for Learning Behavior and Representation

Representation Policy Iteration: A Unified Framework for Learning Behavior and Representation Representation Policy Iteration: A Unified Framework for Learning Behavior and Representation Sridhar Mahadevan Unified Theories of Cognition (Newell, William James Lectures, Harvard) How are we able to

More information

Learning Representation & Behavior:

Learning Representation & Behavior: Learning Representation & Behavior: Manifold and Spectral Methods for Markov Decision Processes and Reinforcement Learning Sridhar Mahadevan, U. Mass, Amherst Mauro Maggioni, Yale Univ. June 25, 26 ICML

More information

Learning on Graphs and Manifolds. CMPSCI 689 Sridhar Mahadevan U.Mass Amherst

Learning on Graphs and Manifolds. CMPSCI 689 Sridhar Mahadevan U.Mass Amherst Learning on Graphs and Manifolds CMPSCI 689 Sridhar Mahadevan U.Mass Amherst Outline Manifold learning is a relatively new area of machine learning (2000-now). Main idea Model the underlying geometry of

More information

Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions

Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions Value Function Approximation with Diffusion Wavelets and Laplacian Eigenfunctions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 13 mahadeva@cs.umass.edu Mauro

More information

Learning Representation and Control In Continuous Markov Decision Processes

Learning Representation and Control In Continuous Markov Decision Processes Learning Representation and Control In Continuous Markov Decision Processes Sridhar Mahadevan Department of Computer Science University of Massachusetts 140 Governor s Drive Amherst, MA 03 mahadeva@cs.umass.edu

More information

Representation Policy Iteration

Representation Policy Iteration Representation Policy Iteration Sridhar Mahadevan Department of Computer Science University of Massachusetts 14 Governor s Drive Amherst, MA 13 mahadeva@cs.umass.edu Abstract This paper addresses a fundamental

More information

Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes Journal of Machine Learning Research 8 (27) 2169-2231 Submitted 6/6; Revised 3/7; Published 1/7 Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

More information

Value Function Approximation in Reinforcement Learning using the Fourier Basis

Value Function Approximation in Reinforcement Learning using the Fourier Basis Value Function Approximation in Reinforcement Learning using the Fourier Basis George Konidaris Sarah Osentoski Technical Report UM-CS-28-19 Autonomous Learning Laboratory Computer Science Department University

More information

Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes

Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA

More information

Constructing Basis Functions from Directed Graphs for Value Function Approximation

Constructing Basis Functions from Directed Graphs for Value Function Approximation Constructing Basis Functions from Directed Graphs for Value Function Approximation Jeff Johns johns@cs.umass.edu Sridhar Mahadevan mahadeva@cs.umass.edu Dept. of Computer Science, Univ. of Massachusetts

More information

Value Function Approximation in Reinforcement Learning Using the Fourier Basis

Value Function Approximation in Reinforcement Learning Using the Fourier Basis Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence Value Function Approximation in Reinforcement Learning Using the Fourier Basis George Konidaris,3 MIT CSAIL gdk@csail.mit.edu

More information

An online kernel-based clustering approach for value function approximation

An online kernel-based clustering approach for value function approximation An online kernel-based clustering approach for value function approximation N. Tziortziotis and K. Blekas Department of Computer Science, University of Ioannina P.O.Box 1186, Ioannina 45110 - Greece {ntziorzi,kblekas}@cs.uoi.gr

More information

An Analysis of Laplacian Methods for Value Function Approximation in MDPs

An Analysis of Laplacian Methods for Value Function Approximation in MDPs An Analysis of Laplacian Methods for Value Function Approximation in MDPs Marek Petrik Department of Computer Science University of Massachusetts Amherst, MA 3 petrik@cs.umass.edu Abstract Recently, a

More information

Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes

Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes Fast Direct Policy Evaluation using Multiscale Analysis of Markov Diffusion Processes Mauro Maggioni mauro.maggioni@yale.edu Department of Mathematics, Yale University, P.O. Box 88, New Haven, CT,, U.S.A.

More information

Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization

Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization Jeff Johns and Sridhar Mahadevan and Chang Wang Computer Science Department University of Massachusetts Amherst Amherst,

More information

Basis Construction from Power Series Expansions of Value Functions

Basis Construction from Power Series Expansions of Value Functions Basis Construction from Power Series Expansions of Value Functions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 3 mahadeva@cs.umass.edu Bo Liu Department of

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Function Approximation Continuous state/action space, mean-square error, gradient temporal difference learning, least-square temporal difference, least squares policy iteration Vien

More information

An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection For Reinforcement Learning

An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection For Reinforcement Learning An Analysis of Linear Models, Linear Value-Function Approximation, and Feature Selection For Reinforcement Learning Ronald Parr, Lihong Li, Gavin Taylor, Christopher Painter-Wakefield, Michael L. Littman

More information

New Frontiers in Representation Discovery

New Frontiers in Representation Discovery New Frontiers in Representation Discovery Sridhar Mahadevan Department of Computer Science University of Massachusetts, Amherst Collaborators: Mauro Maggioni (Duke), Jeff Johns, Sarah Osentoski, Chang

More information

Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization

Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization Compact Spectral Bases for Value Function Approximation Using Kronecker Factorization Jeff Johns and Sridhar Mahadevan and Chang Wang Computer Science Department University of Massachusetts Amherst Amherst,

More information

Feature Selection by Singular Value Decomposition for Reinforcement Learning

Feature Selection by Singular Value Decomposition for Reinforcement Learning Feature Selection by Singular Value Decomposition for Reinforcement Learning Bahram Behzadian 1 Marek Petrik 1 Abstract Linear value function approximation is a standard approach to solving reinforcement

More information

Reinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina

Reinforcement Learning. Donglin Zeng, Department of Biostatistics, University of North Carolina Reinforcement Learning Introduction Introduction Unsupervised learning has no outcome (no feedback). Supervised learning has outcome so we know what to predict. Reinforcement learning is in between it

More information

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes

Today s s Lecture. Applicability of Neural Networks. Back-propagation. Review of Neural Networks. Lecture 20: Learning -4. Markov-Decision Processes Today s s Lecture Lecture 20: Learning -4 Review of Neural Networks Markov-Decision Processes Victor Lesser CMPSCI 683 Fall 2004 Reinforcement learning 2 Back-propagation Applicability of Neural Networks

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

Reinforcement Learning as Classification Leveraging Modern Classifiers

Reinforcement Learning as Classification Leveraging Modern Classifiers Reinforcement Learning as Classification Leveraging Modern Classifiers Michail G. Lagoudakis and Ronald Parr Department of Computer Science Duke University Durham, NC 27708 Machine Learning Reductions

More information

, and rewards and transition matrices as shown below:

, and rewards and transition matrices as shown below: CSE 50a. Assignment 7 Out: Tue Nov Due: Thu Dec Reading: Sutton & Barto, Chapters -. 7. Policy improvement Consider the Markov decision process (MDP) with two states s {0, }, two actions a {0, }, discount

More information

Low-rank Feature Selection for Reinforcement Learning

Low-rank Feature Selection for Reinforcement Learning Low-rank Feature Selection for Reinforcement Learning Bahram Behzadian and Marek Petrik Department of Computer Science University of New Hampshire {bahram, mpetrik}@cs.unh.edu Abstract Linear value function

More information

A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets

A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets A Multiscale Framework for Markov Decision Processes using Diffusion Wavelets Mauro Maggioni Program in Applied Mathematics Department of Mathematics Yale University New Haven, CT 6 mauro.maggioni@yale.edu

More information

Introduction to Reinforcement Learning. CMPT 882 Mar. 18

Introduction to Reinforcement Learning. CMPT 882 Mar. 18 Introduction to Reinforcement Learning CMPT 882 Mar. 18 Outline for the week Basic ideas in RL Value functions and value iteration Policy evaluation and policy improvement Model-free RL Monte-Carlo and

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ Task Grasp the green cup. Output: Sequence of controller actions Setup from Lenz et. al.

More information

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti

MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti 1 MARKOV DECISION PROCESSES (MDP) AND REINFORCEMENT LEARNING (RL) Versione originale delle slide fornita dal Prof. Francesco Lo Presti Historical background 2 Original motivation: animal learning Early

More information

Reinforcement Learning II. George Konidaris

Reinforcement Learning II. George Konidaris Reinforcement Learning II George Konidaris gdk@cs.brown.edu Fall 2017 Reinforcement Learning π : S A max R = t=0 t r t MDPs Agent interacts with an environment At each time t: Receives sensor signal Executes

More information

This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer.

This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. This question has three parts, each of which can be answered concisely, but be prepared to explain and justify your concise answer. 1. Suppose you have a policy and its action-value function, q, then you

More information

Christopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015

Christopher Watkins and Peter Dayan. Noga Zaslavsky. The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Q-Learning Christopher Watkins and Peter Dayan Noga Zaslavsky The Hebrew University of Jerusalem Advanced Seminar in Deep Learning (67679) November 1, 2015 Noga Zaslavsky Q-Learning (Watkins & Dayan, 1992)

More information

Reinforcement Learning II. George Konidaris

Reinforcement Learning II. George Konidaris Reinforcement Learning II George Konidaris gdk@cs.brown.edu Fall 2018 Reinforcement Learning π : S A max R = t=0 t r t MDPs Agent interacts with an environment At each time t: Receives sensor signal Executes

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

CMPSCI 791BB: Advanced ML: Laplacian Learning

CMPSCI 791BB: Advanced ML: Laplacian Learning CMPSCI 791BB: Advanced ML: Laplacian Learning Sridhar Mahadevan Outline! Spectral graph operators! Combinatorial graph Laplacian! Normalized graph Laplacian! Random walks! Machine learning on graphs! Clustering!

More information

Lecture 7: Value Function Approximation

Lecture 7: Value Function Approximation Lecture 7: Value Function Approximation Joseph Modayil Outline 1 Introduction 2 3 Batch Methods Introduction Large-Scale Reinforcement Learning Reinforcement learning can be used to solve large problems,

More information

Basis Construction and Utilization for Markov Decision Processes Using Graphs

Basis Construction and Utilization for Markov Decision Processes Using Graphs University of Massachusetts Amherst ScholarWorks@UMass Amherst Open Access Dissertations 2-2010 Basis Construction and Utilization for Markov Decision Processes Using Graphs Jeffrey Thomas Johns University

More information

Basis Adaptation for Sparse Nonlinear Reinforcement Learning

Basis Adaptation for Sparse Nonlinear Reinforcement Learning Basis Adaptation for Sparse Nonlinear Reinforcement Learning Sridhar Mahadevan, Stephen Giguere, and Nicholas Jacek School of Computer Science University of Massachusetts, Amherst mahadeva@cs.umass.edu,

More information

State Space Abstractions for Reinforcement Learning

State Space Abstractions for Reinforcement Learning State Space Abstractions for Reinforcement Learning Rowan McAllister and Thang Bui MLG RCC 6 November 24 / 24 Outline Introduction Markov Decision Process Reinforcement Learning State Abstraction 2 Abstraction

More information

Analyzing Feature Generation for Value-Function Approximation

Analyzing Feature Generation for Value-Function Approximation Ronald Parr Christopher Painter-Wakefield Department of Computer Science, Duke University, Durham, NC 778 USA Lihong Li Michael Littman Department of Computer Science, Rutgers University, Piscataway, NJ

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems

Least-Squares λ Policy Iteration: Bias-Variance Trade-off in Control Problems : Bias-Variance Trade-off in Control Problems Christophe Thiery Bruno Scherrer LORIA - INRIA Lorraine - Campus Scientifique - BP 239 5456 Vandœuvre-lès-Nancy CEDEX - FRANCE thierych@loria.fr scherrer@loria.fr

More information

Linear Feature Encoding for Reinforcement Learning

Linear Feature Encoding for Reinforcement Learning Linear Feature Encoding for Reinforcement Learning Zhao Song, Ronald Parr, Xuejun Liao, Lawrence Carin Department of Electrical and Computer Engineering Department of Computer Science Duke University,

More information

Markov decision processes

Markov decision processes CS 2740 Knowledge representation Lecture 24 Markov decision processes Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Administrative announcements Final exam: Monday, December 8, 2008 In-class Only

More information

Towards Feature Selection In Actor-Critic Algorithms Khashayar Rohanimanesh, Nicholas Roy, and Russ Tedrake

Towards Feature Selection In Actor-Critic Algorithms Khashayar Rohanimanesh, Nicholas Roy, and Russ Tedrake Computer Science and Artificial Intelligence Laboratory Technical Report MIT-CSAIL-TR-2007-051 November 1, 2007 Towards Feature Selection In Actor-Critic Algorithms Khashayar Rohanimanesh, Nicholas Roy,

More information

CS599 Lecture 2 Function Approximation in RL

CS599 Lecture 2 Function Approximation in RL CS599 Lecture 2 Function Approximation in RL Look at how experience with a limited part of the state set be used to produce good behavior over a much larger part. Overview of function approximation (FA)

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Function approximation Mario Martin CS-UPC May 18, 2018 Mario Martin (CS-UPC) Reinforcement Learning May 18, 2018 / 65 Recap Algorithms: MonteCarlo methods for Policy Evaluation

More information

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning Active Policy Iteration: fficient xploration through Active Learning for Value Function Approximation in Reinforcement Learning Takayuki Akiyama, Hirotaka Hachiya, and Masashi Sugiyama Department of Computer

More information

CS788 Dialogue Management Systems Lecture #2: Markov Decision Processes

CS788 Dialogue Management Systems Lecture #2: Markov Decision Processes CS788 Dialogue Management Systems Lecture #2: Markov Decision Processes Kee-Eung Kim KAIST EECS Department Computer Science Division Markov Decision Processes (MDPs) A popular model for sequential decision

More information

Lecture 3: The Reinforcement Learning Problem

Lecture 3: The Reinforcement Learning Problem Lecture 3: The Reinforcement Learning Problem Objectives of this lecture: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Hidden Markov Models (HMM) and Support Vector Machine (SVM)

Hidden Markov Models (HMM) and Support Vector Machine (SVM) Hidden Markov Models (HMM) and Support Vector Machine (SVM) Professor Joongheon Kim School of Computer Science and Engineering, Chung-Ang University, Seoul, Republic of Korea 1 Hidden Markov Models (HMM)

More information

Reading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1

Reading Response: Due Wednesday. R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Reading Response: Due Wednesday R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction 1 Another Example Get to the top of the hill as quickly as possible. reward = 1 for each step where

More information

Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan

Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 18: Reinforcement Learning Sanjeev Arora Elad Hazan Some slides borrowed from Peter Bodik and David Silver Course progress Learning

More information

CS599 Lecture 1 Introduction To RL

CS599 Lecture 1 Introduction To RL CS599 Lecture 1 Introduction To RL Reinforcement Learning Introduction Learning from rewards Policies Value Functions Rewards Models of the Environment Exploitation vs. Exploration Dynamic Programming

More information

Reinforcement Learning. Introduction

Reinforcement Learning. Introduction Reinforcement Learning Introduction Reinforcement Learning Agent interacts and learns from a stochastic environment Science of sequential decision making Many faces of reinforcement learning Optimal control

More information

Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets.

Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets. Diffusion Geometries, Diffusion Wavelets and Harmonic Analysis of large data sets. R.R. Coifman, S. Lafon, MM Mathematics Department Program of Applied Mathematics. Yale University Motivations The main

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

A Generalized Reduced Linear Program for Markov Decision Processes

A Generalized Reduced Linear Program for Markov Decision Processes Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence A Generalized Reduced Linear Program for Markov Decision Processes Chandrashekar Lakshminarayanan and Shalabh Bhatnagar Department

More information

Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning

Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning Abstract Off-policy reinforcement learning is aimed at efficiently using data samples gathered from a

More information

Markov Decision Processes and Solving Finite Problems. February 8, 2017

Markov Decision Processes and Solving Finite Problems. February 8, 2017 Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning RL in continuous MDPs March April, 2015 Large/Continuous MDPs Large/Continuous state space Tabular representation cannot be used Large/Continuous action space Maximization over action

More information

Approximate Dynamic Programming

Approximate Dynamic Programming Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course Approximate Dynamic Programming (a.k.a. Batch Reinforcement Learning) A.

More information

Batch, Off-policy and Model-free Apprenticeship Learning

Batch, Off-policy and Model-free Apprenticeship Learning Batch, Off-policy and Model-free Apprenticeship Learning Edouard Klein 13, Matthieu Geist 1, and Olivier Pietquin 12 1. Supélec-Metz Campus, IMS Research group, France, prenom.nom@supelec.fr 2. UMI 2958

More information

Introduction to Reinforcement Learning. Part 6: Core Theory II: Bellman Equations and Dynamic Programming

Introduction to Reinforcement Learning. Part 6: Core Theory II: Bellman Equations and Dynamic Programming Introduction to Reinforcement Learning Part 6: Core Theory II: Bellman Equations and Dynamic Programming Bellman Equations Recursive relationships among values that can be used to compute values The tree

More information

Lecture 1: March 7, 2018

Lecture 1: March 7, 2018 Reinforcement Learning Spring Semester, 2017/8 Lecture 1: March 7, 2018 Lecturer: Yishay Mansour Scribe: ym DISCLAIMER: Based on Learning and Planning in Dynamical Systems by Shie Mannor c, all rights

More information

Markov Decision Processes Chapter 17. Mausam

Markov Decision Processes Chapter 17. Mausam Markov Decision Processes Chapter 17 Mausam Planning Agent Static vs. Dynamic Fully vs. Partially Observable Environment What action next? Deterministic vs. Stochastic Perfect vs. Noisy Instantaneous vs.

More information

Reinforcement Learning: An Introduction

Reinforcement Learning: An Introduction Introduction Betreuer: Freek Stulp Hauptseminar Intelligente Autonome Systeme (WiSe 04/05) Forschungs- und Lehreinheit Informatik IX Technische Universität München November 24, 2004 Introduction What is

More information

Regularization and Feature Selection in. the Least-Squares Temporal Difference

Regularization and Feature Selection in. the Least-Squares Temporal Difference Regularization and Feature Selection in Least-Squares Temporal Difference Learning J. Zico Kolter kolter@cs.stanford.edu Andrew Y. Ng ang@cs.stanford.edu Computer Science Department, Stanford University,

More information

Machine Learning I Reinforcement Learning

Machine Learning I Reinforcement Learning Machine Learning I Reinforcement Learning Thomas Rückstieß Technische Universität München December 17/18, 2009 Literature Book: Reinforcement Learning: An Introduction Sutton & Barto (free online version:

More information

Decision making, Markov decision processes

Decision making, Markov decision processes Decision making, Markov decision processes Solved tasks Collected by: Jiří Kléma, klema@fel.cvut.cz Spring 2017 The main goal: The text presents solved tasks to support labs in the A4B33ZUI course. 1 Simple

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Function approximation Daniel Hennes 19.06.2017 University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Eligibility traces n-step TD returns Forward and backward view Function

More information

Reinforcement Learning. George Konidaris

Reinforcement Learning. George Konidaris Reinforcement Learning George Konidaris gdk@cs.brown.edu Fall 2017 Machine Learning Subfield of AI concerned with learning from data. Broadly, using: Experience To Improve Performance On Some Task (Tom

More information

Prof. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be

Prof. Dr. Ann Nowé. Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING AN INTRODUCTION Prof. Dr. Ann Nowé Artificial Intelligence Lab ai.vub.ac.be REINFORCEMENT LEARNING WHAT IS IT? What is it? Learning from interaction Learning about, from, and while

More information

Least-Squares Temporal Difference Learning based on Extreme Learning Machine

Least-Squares Temporal Difference Learning based on Extreme Learning Machine Least-Squares Temporal Difference Learning based on Extreme Learning Machine Pablo Escandell-Montero, José M. Martínez-Martínez, José D. Martín-Guerrero, Emilio Soria-Olivas, Juan Gómez-Sanchis IDAL, Intelligent

More information

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018

INF 5860 Machine learning for image classification. Lecture 14: Reinforcement learning May 9, 2018 Machine learning for image classification Lecture 14: Reinforcement learning May 9, 2018 Page 3 Outline Motivation Introduction to reinforcement learning (RL) Value function based methods (Q-learning)

More information

Manifold Alignment using Procrustes Analysis

Manifold Alignment using Procrustes Analysis Chang Wang chwang@cs.umass.edu Sridhar Mahadevan mahadeva@cs.umass.edu Computer Science Department, University of Massachusetts, Amherst, MA 13 USA Abstract In this paper we introduce a novel approach

More information

Linearly-solvable Markov decision problems

Linearly-solvable Markov decision problems Advances in Neural Information Processing Systems 2 Linearly-solvable Markov decision problems Emanuel Todorov Department of Cognitive Science University of California San Diego todorov@cogsci.ucsd.edu

More information

REINFORCEMENT LEARNING

REINFORCEMENT LEARNING REINFORCEMENT LEARNING Larry Page: Where s Google going next? DeepMind's DQN playing Breakout Contents Introduction to Reinforcement Learning Deep Q-Learning INTRODUCTION TO REINFORCEMENT LEARNING Contents

More information

Reinforcement learning an introduction

Reinforcement learning an introduction Reinforcement learning an introduction Prof. Dr. Ann Nowé Computational Modeling Group AIlab ai.vub.ac.be November 2013 Reinforcement Learning What is it? Learning from interaction Learning about, from,

More information

Open Theoretical Questions in Reinforcement Learning

Open Theoretical Questions in Reinforcement Learning Open Theoretical Questions in Reinforcement Learning Richard S. Sutton AT&T Labs, Florham Park, NJ 07932, USA, sutton@research.att.com, www.cs.umass.edu/~rich Reinforcement learning (RL) concerns the problem

More information

CSE250A Fall 12: Discussion Week 9

CSE250A Fall 12: Discussion Week 9 CSE250A Fall 12: Discussion Week 9 Aditya Menon (akmenon@ucsd.edu) December 4, 2012 1 Schedule for today Recap of Markov Decision Processes. Examples: slot machines and maze traversal. Planning and learning.

More information

Reinforcement Learning. Machine Learning, Fall 2010

Reinforcement Learning. Machine Learning, Fall 2010 Reinforcement Learning Machine Learning, Fall 2010 1 Administrativia This week: finish RL, most likely start graphical models LA2: due on Thursday LA3: comes out on Thursday TA Office hours: Today 1:30-2:30

More information

Final Exam, Fall 2002

Final Exam, Fall 2002 15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work

More information

Regularization and Feature Selection in. the Least-Squares Temporal Difference

Regularization and Feature Selection in. the Least-Squares Temporal Difference Regularization and Feature Selection in Least-Squares Temporal Difference Learning J. Zico Kolter kolter@cs.stanford.edu Andrew Y. Ng ang@cs.stanford.edu Computer Science Department, Stanford University,

More information

Decision Theory: Markov Decision Processes

Decision Theory: Markov Decision Processes Decision Theory: Markov Decision Processes CPSC 322 Lecture 33 March 31, 2006 Textbook 12.5 Decision Theory: Markov Decision Processes CPSC 322 Lecture 33, Slide 1 Lecture Overview Recap Rewards and Policies

More information

CSC321 Lecture 22: Q-Learning

CSC321 Lecture 22: Q-Learning CSC321 Lecture 22: Q-Learning Roger Grosse Roger Grosse CSC321 Lecture 22: Q-Learning 1 / 21 Overview Second of 3 lectures on reinforcement learning Last time: policy gradient (e.g. REINFORCE) Optimize

More information

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016

Course 16:198:520: Introduction To Artificial Intelligence Lecture 13. Decision Making. Abdeslam Boularias. Wednesday, December 7, 2016 Course 16:198:520: Introduction To Artificial Intelligence Lecture 13 Decision Making Abdeslam Boularias Wednesday, December 7, 2016 1 / 45 Overview We consider probabilistic temporal models where the

More information

Final Exam December 12, 2017

Final Exam December 12, 2017 Introduction to Artificial Intelligence CSE 473, Autumn 2017 Dieter Fox Final Exam December 12, 2017 Directions This exam has 7 problems with 111 points shown in the table below, and you have 110 minutes

More information

Model-Based Function Approximation in Reinforcement Learning

Model-Based Function Approximation in Reinforcement Learning In The Sixth International Conference on Autonomous Agents and Multiagent Systems (AAMAS 7), Honolulu, Hawai i, May 27. Model-Based Function Approximation in Reinforcement Learning Nicholas K. Jong The

More information

MS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction

MS&E338 Reinforcement Learning Lecture 1 - April 2, Introduction MS&E338 Reinforcement Learning Lecture 1 - April 2, 2018 Introduction Lecturer: Ben Van Roy Scribe: Gabriel Maher 1 Reinforcement Learning Introduction In reinforcement learning (RL) we consider an agent

More information

1. (3 pts) In MDPs, the values of states are related by the Bellman equation: U(s) = R(s) + γ max a

1. (3 pts) In MDPs, the values of states are related by the Bellman equation: U(s) = R(s) + γ max a 3 MDP (2 points). (3 pts) In MDPs, the values of states are related by the Bellman equation: U(s) = R(s) + γ max a s P (s s, a)u(s ) where R(s) is the reward associated with being in state s. Suppose now

More information

Reinforcement Learning and Control

Reinforcement Learning and Control CS9 Lecture notes Andrew Ng Part XIII Reinforcement Learning and Control We now begin our study of reinforcement learning and adaptive control. In supervised learning, we saw algorithms that tried to make

More information

Ensembles of Extreme Learning Machine Networks for Value Prediction

Ensembles of Extreme Learning Machine Networks for Value Prediction Ensembles of Extreme Learning Machine Networks for Value Prediction Pablo Escandell-Montero, José M. Martínez-Martínez, Emilio Soria-Olivas, Joan Vila-Francés, José D. Martín-Guerrero IDAL, Intelligent

More information

An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning

An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning An Automated Measure of MDP Similarity for Transfer in Reinforcement Learning Haitham Bou Ammar University of Pennsylvania Eric Eaton University of Pennsylvania Matthew E. Taylor Washington State University

More information

6 Reinforcement Learning

6 Reinforcement Learning 6 Reinforcement Learning As discussed above, a basic form of supervised learning is function approximation, relating input vectors to output vectors, or, more generally, finding density functions p(y,

More information

Monte Carlo is important in practice. CSE 190: Reinforcement Learning: An Introduction. Chapter 6: Temporal Difference Learning.

Monte Carlo is important in practice. CSE 190: Reinforcement Learning: An Introduction. Chapter 6: Temporal Difference Learning. Monte Carlo is important in practice CSE 190: Reinforcement Learning: An Introduction Chapter 6: emporal Difference Learning When there are just a few possibilitieo value, out of a large state space, Monte

More information

The Reinforcement Learning Problem

The Reinforcement Learning Problem The Reinforcement Learning Problem Slides based on the book Reinforcement Learning by Sutton and Barto Formalizing Reinforcement Learning Formally, the agent and environment interact at each of a sequence

More information