Laplacian Agent Learning: Representation Policy Iteration
1 Laplacian Agent Learning: Representation Policy Iteration Sridhar Mahadevan
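2 Example of a Markov Decision Process. From Earth, action a1 pays $0 now and leads to Heaven ($1 per step thereafter); action a2 pays $100 now and leads to Hell ($-1 per step thereafter). What should the agent do? \( V_{a_1}(\text{Earth}) = f(0, 1, 1, 1, 1, \ldots) \), \( V_{a_2}(\text{Earth}) = f(100, -1, -1, -1, -1, \ldots) \)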
3 Infinite Horizon Markov Decision Processes
- Graded myopia: discounted sum of rewards. Maximize \( \sum_t \gamma^t r_t \). Hell if \( \gamma < 0.98 \), otherwise Heaven.
- Maximize the average-adjusted sum of rewards, based on the average reward \( \lim_{n \to \infty} \frac{1}{n} \sum_{t=1}^{n} r_t \). Always go to Heaven!
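The 0.98 threshold can be checked directly. The sketch below assumes the reward sequences from the Heaven/Hell example (0, 1, 1, ... for Heaven and 100, -1, -1, ... for Hell) and evaluates the discounted sums in closed form. The function names are illustrative and do not come from the talk.

```python
def discounted_value(first_reward, later_reward, gamma):
    """Discounted return of the sequence (first_reward, later_reward, later_reward, ...)."""
    # Geometric series: sum_{t>=1} gamma^t * c = c * gamma / (1 - gamma), for 0 <= gamma < 1.
    return first_reward + later_reward * gamma / (1.0 - gamma)


def best_action(gamma):
    """Return 'Heaven' or 'Hell', whichever has the larger discounted return from Earth."""
    heaven = discounted_value(0.0, 1.0, gamma)
    hell = discounted_value(100.0, -1.0, gamma)
    return "Heaven" if heaven > hell else "Hell"


# The two returns are equal when 100 = 2 * gamma / (1 - gamma), i.e. gamma = 50/51.
threshold = 50.0 / 51.0

for gamma in (0.5, 0.9, 0.97, 0.99):
    print(gamma, best_action(gamma))
```

The break-even discount factor 50/51 is about 0.9804, which matches the rounded 0.98 on the slide.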
4 Sample MDP. \( V_b(1) = 0.5 \times 1 + 0.5 \times 2 + \gamma (0.5) V_b(1) + \gamma (0.5) V_b(2) \), \( V_b(2) = 1 + \gamma V_b(2) \)
5 Sample MDP. \( V_b(2) = 1/(1 - \gamma) = 2 \) (if \( \gamma = 0.5 \)), \( V_b(1) = (1.5 + \gamma (0.5) V_b(2)) / (1 - \gamma (0.5)) = 8/3 \)
6 Sample MDP. \( V_r(2) = 1/(1 - \gamma) = 2 \) (if \( \gamma = 0.5 \)), \( V_r(1) = (1 + \gamma (0.5) V_r(2)) / (1 - \gamma (0.5)) = 2 \). Hence, the blue policy is optimal in state 1
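The arithmetic on the three Sample MDP slides can be reproduced exactly with rational numbers. This is a minimal sketch that assumes the dynamics implied by the equations: from state 1 the agent stays with probability 0.5 and otherwise moves to state 2, and state 2 is absorbing with reward +1 per step. The helper name `evaluate_state1` is hypothetical.

```python
from fractions import Fraction


def evaluate_state1(reward1, gamma, p_stay=Fraction(1, 2)):
    """Exact values of states 1 and 2 for a policy with expected reward `reward1` in state 1.

    Assumed dynamics (read off the slide equations): from state 1 the agent stays
    with probability p_stay and otherwise moves to state 2; state 2 is absorbing
    with reward +1 per step.
    """
    v2 = Fraction(1) / (1 - gamma)  # solves V(2) = 1 + gamma * V(2)
    v1 = (reward1 + gamma * (1 - p_stay) * v2) / (1 - gamma * p_stay)
    return v1, v2


gamma = Fraction(1, 2)
blue_v1, blue_v2 = evaluate_state1(Fraction(3, 2), gamma)  # 0.5 * 1 + 0.5 * 2 = 1.5
red_v1, red_v2 = evaluate_state1(Fraction(1), gamma)

print("blue:", blue_v1, blue_v2)  # 8/3 and 2
print("red: ", red_v1, red_v2)    # 2 and 2
```

Since 8/3 > 2, the blue policy has the larger value in state 1, as the slide concludes.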
7 Bellman Equation. Choose the greedy action given V*. [Figure: grid world with actions EXIT WEST, WAIT, and EXIT EAST; Goal! +50]
8 Value Function Approximation: \( \mathbb{R}^{|S| \times |A|} \to \mathbb{R}^{k} \). LSPI [Lagoudakis and Parr, 2003]: Inverted Pendulum with Radial Basis Functions (10)
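9 VFA by Least-Squares Projection: Minimize Bellman Residual Error. \( \hat{V}(s) = \sum_i \phi_i(s) w_i \) lies in the subspace \( \Phi \) (poly, RBF, neural nets). Bellman residual \( = T^{\pi}(\hat{V}(s)) - \hat{V}(s) \)
10 VFA by Least-Squares Projection: Fixed Point Projection. \( \hat{V}(s) = \sum_i \phi_i(s) w_i \) lies in the subspace \( \Phi \) (poly, RBF, neural nets). Minimize Projected Residual: project \( T^{\pi}(\hat{V}(s)) \) back onto \( \Phi \) and require it to coincide with \( \hat{V}(s) \)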
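The two projection criteria generally give different weights. The sketch below illustrates this with a single basis function on the blue policy of the Sample MDP (transition matrix P, reward vector R, γ = 0.5). It assumes an unweighted (uniform) least-squares norm, which the slides do not specify, and the helper names are hypothetical.

```python
from fractions import Fraction as F


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(P, v):
    return [dot(row, v) for row in P]


def fit_weights(P, R, phi, gamma):
    """Return (w_fixed_point, w_bellman_residual) for a single basis function phi."""
    P_phi = matvec(P, phi)
    # d = phi - gamma * P phi: how the basis function changes under one backup.
    d = [p - gamma * q for p, q in zip(phi, P_phi)]
    w_fp = dot(phi, R) / dot(phi, d)  # solves phi^T (phi w - R - gamma P phi w) = 0
    w_br = dot(d, R) / dot(d, d)      # minimizes || phi w - R - gamma P phi w ||^2
    return w_fp, w_br


# Blue policy of the Sample MDP (assumed dynamics: stay in state 1 w.p. 0.5).
P = [[F(1, 2), F(1, 2)], [F(0), F(1)]]
R = [F(3, 2), F(1)]
gamma = F(1, 2)

exact = fit_weights(P, R, [F(8, 3), F(2)], gamma)  # basis equals the true values: both weights are 1
rough = fit_weights(P, R, [F(2), F(1)], gamma)     # basis is off: 4/3 versus 38/29
print(exact, rough)
```

When the true value function lies in the subspace, both criteria recover it; otherwise they settle on different approximations.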
11 Parametric Value Function Approximation Can Fail. Approximator blind to symmetries! LSPI converges to an incorrect policy. Approximator blind to bottlenecks! Goal (G) is to get to center of square grid [Drummond, JAIR 2002]
12 Divergence of Neural Nets in Mountain Car Task (Boyan and Moore, NIPS 1995)
13 Automating Value Function Approximation
- Many approaches to value function approximation: neural nets, radial basis functions, support vector machines, kernel density estimation, nearest neighbor
- How to automate the design of a function approximator?
- We want to go beyond model selection!
14 Representation Learning using Harmonic Analysis on Manifolds (Mahadevan and Maggioni, NIPS 2005). Fourier: \( \nabla^2 F = 0 \), i.e. Div(Grad(F)) = 0. Wavelets: dilations of the diffusion operator on the manifold. [Figure: samples from a random walk on the inverted pendulum]
15 Proto-Value Functions (Mahadevan, ICML 2005). Proto-value functions are the representational building blocks of all value functions on a given environment. Proto-value functions are based on modeling the state space manifold. [Figure panels: Fourier, Wavelet]
16 How are Proto-Value Functions Learned? [Figure labels: Proto-value functions, Graph Laplacian]
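As a small illustration of the link between the graph Laplacian and proto-value functions, the sketch below builds the Laplacian of a chain of states and checks that smooth cosine functions are its eigenvectors. Several things here are assumptions rather than the talk's method: it uses the combinatorial Laplacian L = D - A (a later slide uses the normalized Laplacian), an undirected chain as the state graph, and the known closed-form eigenvectors of a path graph instead of a numerical eigensolver.

```python
import math


def chain_laplacian(n):
    """Combinatorial Laplacian L = D - A of an undirected chain (path) graph with n states."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        # Undirected edge between neighbouring states i and i + 1.
        L[i][i] += 1.0
        L[i + 1][i + 1] += 1.0
        L[i][i + 1] -= 1.0
        L[i + 1][i] -= 1.0
    return L


def chain_eigenpair(n, k):
    """k-th smallest eigenvalue (k = 0, 1, ...) and its eigenvector, in closed form."""
    eigenvalue = 2.0 - 2.0 * math.cos(math.pi * k / n)
    eigenvector = [math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
    return eigenvalue, eigenvector


def residual(L, eigenvalue, v):
    """Largest entry of |L v - eigenvalue * v|; close to zero for a true eigenpair."""
    Lv = [sum(a * b for a, b in zip(row, v)) for row in L]
    return max(abs(x - eigenvalue * y) for x, y in zip(Lv, v))


n = 20
L = chain_laplacian(n)
# The four smoothest eigenvectors, usable as basis functions over the 20 states.
basis = [chain_eigenpair(n, k)[1] for k in range(4)]
print(max(residual(L, *chain_eigenpair(n, k)) for k in range(n)))
```

The first basis function is constant and each later one oscillates more, which is the "Fourier" behaviour on a graph that the earlier slides refer to.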
17 Representation Policy Iteration (Mahadevan, UAI 2005). [Diagram labels: Policy improvement, Greedy Policy, "Actor"; Policy evaluation, "Critic"; Representation Learner, Trajectories, Proto-value functions]
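18 Policy Iteration (Howard, 1960). Policy Evaluation ("Critic"): \( V^{\pi}(s) = R(s, \pi(s)) + \gamma \sum_{s'} P(s' \mid s, \pi(s)) V^{\pi}(s') \). Policy Improvement ("Actor"): \( \pi'(s) = \arg\max_a \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V^{\pi}(s') \right] \)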
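The evaluation/improvement loop can be sketched on the Heaven/Hell example from the start of the talk. This is a minimal tabular version for deterministic MDPs, not the talk's implementation. The state and action encoding is an assumption, and policy evaluation uses repeated backups instead of an exact linear solve.

```python
def policy_iteration(next_state, reward, gamma, sweeps=2000):
    """Tabular policy iteration for a deterministic MDP.

    next_state[s][a] and reward[s][a] give the successor and the immediate reward.
    """
    n_states, n_actions = len(next_state), len(next_state[0])
    policy = [0] * n_states
    while True:
        # Policy evaluation (critic): iterate V <- R_pi + gamma * P_pi V.
        V = [0.0] * n_states
        for _ in range(sweeps):
            V = [reward[s][policy[s]] + gamma * V[next_state[s][policy[s]]]
                 for s in range(n_states)]
        # Policy improvement (actor): act greedily with respect to V.
        improved = [
            max(range(n_actions),
                key=lambda a: reward[s][a] + gamma * V[next_state[s][a]])
            for s in range(n_states)
        ]
        if improved == policy:
            return policy, V
        policy = improved


# Assumed encoding: action 0 = a1 (towards Heaven), action 1 = a2 (towards Hell).
EARTH, HEAVEN, HELL = 0, 1, 2
next_state = [[HEAVEN, HELL], [HEAVEN, HEAVEN], [HELL, HELL]]
reward = [[0.0, 100.0], [1.0, 1.0], [-1.0, -1.0]]

for gamma in (0.9, 0.99):
    policy, V = policy_iteration(next_state, reward, gamma)
    print(gamma, "a2 (Hell)" if policy[EARTH] == 1 else "a1 (Heaven)", round(V[EARTH], 3))
```

With γ = 0.9 the greedy policy picks a2 on Earth, and with γ = 0.99 it picks a1, consistent with the threshold near 0.98.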
19 Representation Policy Iteration
- Learn a set of proto-value functions from a sample of transitions generated from a random walk (or from watching an expert)
- These basis functions can then be used in an approximate policy iteration algorithm, such as Least Squares Policy Iteration [Lagoudakis and Parr, 2003]
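20 Least-Squares Policy Iteration (Boyan, ICML 1999; Lagoudakis and Parr, JMLR 2003). Do a random walk generating a set of transitions \( D = \{(s_t, a_t, r_t, s'_t)\} \). Solve the equation \( A w = b \), where \( A = \sum_{D} \phi(s_t, a_t) \left( \phi(s_t, a_t) - \gamma\, \phi(s'_t, \pi(s'_t)) \right)^{\top} \) and \( b = \sum_{D} \phi(s_t, a_t)\, r_t \)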
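A minimal LSPI sketch follows: fit Q-function weights from stored transitions by solving a linear system (the LSTDQ step of Lagoudakis and Parr), then improve the policy greedily and repeat. Everything specific here is an assumption made for illustration: a hypothetical two-state deterministic MDP (action 0 stays, action 1 switches, reward 1 for landing in state 1), one-hot features in place of proto-value functions, and one sample per state-action pair in place of a random walk.

```python
from fractions import Fraction


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact when entries are Fractions)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def lstdq(samples, policy, features, k, gamma):
    """Fit weights w with Q^policy(s, a) ~ features(s, a) . w from (s, a, r, s_next) samples."""
    A = [[Fraction(0)] * k for _ in range(k)]
    b = [Fraction(0)] * k
    for s, a, r, s_next in samples:
        phi = features(s, a)
        phi_next = features(s_next, policy[s_next])
        for i in range(k):
            b[i] += phi[i] * r
            for j in range(k):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def lspi(samples, features, k, n_states, n_actions, gamma):
    """Alternate LSTDQ and greedy improvement until the policy stops changing."""
    policy = [0] * n_states
    while True:
        w = lstdq(samples, policy, features, k, gamma)

        def q(s, a, w=w):
            return sum(x * y for x, y in zip(features(s, a), w))

        improved = [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]
        if improved == policy:
            return policy, w
        policy = improved


def one_hot(s, a):
    """Hypothetical tabular features: one indicator per (state, action) pair."""
    phi = [0, 0, 0, 0]
    phi[2 * s + a] = 1
    return phi


# One transition per (state, action): (s, a, r, s_next).
samples = [(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 0)]
policy, w = lspi(samples, one_hot, 4, 2, 2, Fraction(1, 2))
print(policy, w)
```

In this toy problem the learned policy switches out of state 0 and stays in state 1. With one-hot features the weights are the exact Q-values; with a compact basis such as proto-value functions they would only approximate them.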
21 Learned vs. Handcoded Representations: Chain MDP. [Plots: Time and Error for Laplacian, RBF, and Poly bases]
22 Results of RPI on a Grid World. [Figure panels: LSPI RBF, LSPI Polynomial, RPI] Goal is to get to center of square grid
23 Results of RPI on Two Room World. [Figure: LSPI Polynomial panel; G marks the goal] Nonlinearity due to bottleneck is nicely captured by RPI!
24 RPI in Continuous State Spaces (Mahadevan, Maggioni, Ferguson, Osentoski, 2006)
- To extend RPI to continuous state spaces, we need to use the Nyström extension to extend eigenfunctions from sample points to new points
- Many issues are involved here, including how many sample points to choose, how to construct the graph to reflect the underlying manifold, etc.
25 RPI: Inverted Pendulum Task. [Diagram labels: Normalized Graph Laplacian matrix; Sample transitions from random walk (~ ); Proto-value functions]
26 RPI: Inverted Pendulum. [Figure: Q-value for action plotted over Angle and Angular Velocity, for Learned Proto-Value Functions and for Handcoded Radial Basis Functions]
27 RPI Results on Inverted Pendulum (Mahadevan, Maggioni, Ferguson, Osentoski, 2006). [Plot: "Pendulum: Proto Value Functions vs Radial Basis Functions"; Steps vs. Number of training episodes; curves for PVF and RBF bases]
28 RPI Results for Mountain Car (Mahadevan, Maggioni, Ferguson, Osentoski, 2006). [Plot: "Mountain Car: Proto Value Functions vs. Radial Basis Functions"; Steps vs. Number of training episodes; curves: PVF (Z=500, P=100, k=25), PVF (Z=1500, P=100, k=25), 13 RBF (Z=1500), 49 RBF (Z=1500)]
29 Summary: Proto-Value Functions
- A major challenge facing MDP research is how to automate value function approximation
- Proto-value functions: Fourier and wavelet representations on manifolds that capture large-scale geometry of the state (action) space
- Representation Policy Iteration is a general framework for simultaneously learning representations and policies
- Extensions of proto-value functions:
  - On-policy proto-value functions [Maggioni and Mahadevan, 2005]
  - Factored Markov decision processes [Mahadevan, 2006]
  - Group-theoretic extensions [Mahadevan, in preparation]