Chapter 8: Generalization and Function Approximation

Objectives of this chapter:
- Look at how experience with a limited part of the state set can be used to produce good behavior over a much larger part.
- Overview of function approximation (FA) methods and how they can be adapted to RL.

Value Prediction with FA

As usual: Policy Evaluation (the prediction problem): for a given policy π, compute the state-value function $V^\pi$.

In earlier chapters, value functions were stored in lookup tables. Here, the value function estimate at time t, $V_t$, depends on a parameter vector $\vec\theta_t$, and only the parameter vector is updated. For example, $\vec\theta_t$ could be the vector of connection weights of a neural network.

R. S. Sutton and A. G. Barto: Reinforcement Learning: An Introduction

Adapt Supervised Learning Algorithms

(Diagram: Inputs → Supervised Learning System → Outputs, with Training Info = desired (target) outputs.)

Training example = {input, target output}
Error = (target output - actual output)

Backups as Training Examples

For example, the TD(0) backup:
$$V(s_t) \leftarrow V(s_t) + \alpha\left[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\right]$$

As a training example:
$$\{\underbrace{\text{description of } s_t}_{\text{input}},\ \underbrace{r_{t+1} + \gamma V(s_{t+1})}_{\text{target output}}\}$$
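A minimal sketch of how one TD(0) transition becomes a supervised {input, target output} pair. The helper name `td0_training_example` and the callable `V` standing in for the current estimate are illustrative assumptions.

```python
from typing import Callable, Hashable, Tuple


def td0_training_example(
    s_t: Hashable,
    r_next: float,
    s_next: Hashable,
    V: Callable[[Hashable], float],
    gamma: float,
) -> Tuple[Hashable, float]:
    """Turn one TD(0) transition into a supervised {input, target output} pair.

    Hypothetical helper: the input is (a description of) s_t and the target is
    r_{t+1} + gamma * V(s_{t+1}), computed with the *current* estimate V.
    """
    target = r_next + gamma * V(s_next)
    return s_t, target


# Usage: the current estimate is V(s) = 0.5 for every state, gamma = 0.9.
inp, target = td0_training_example("A", 1.0, "B", lambda s: 0.5, 0.9)
print(inp, target)  # target is 1.0 + 0.9 * 0.5 = 1.45 (up to float rounding)
```

Because the target is computed from `V` itself, the "training set" shifts as learning proceeds. This is one reason RL needs FA methods that can handle nonstationary targets.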

Any FA Method?

In principle, yes: artificial neural networks, decision trees, multivariate regression methods, etc.

But RL has some special requirements:
- usually want to learn while interacting
- ability to handle nonstationarity
- other?

Gradient Descent Methods

$$\vec\theta_t = \left(\theta_t(1), \theta_t(2), \ldots, \theta_t(n)\right)^T \qquad (T \text{ denotes transpose})$$

Assume $V_t$ is a (sufficiently smooth) differentiable function of $\vec\theta_t$, for all $s \in S$.

Assume, for now, training examples of this form:
$$\{\text{description of } s_t,\ V^\pi(s_t)\}$$

Performance Measures

Many are applicable, but a common and simple one is the mean-squared error (MSE) over a distribution $P$:
$$\mathrm{MSE}(\vec\theta_t) = \sum_{s \in S} P(s)\left[V^\pi(s) - V_t(s)\right]^2$$

Why $P$? Why minimize MSE?

Let us assume that $P$ is always the distribution of states with which backups are done.

The on-policy distribution: the distribution created while following the policy being evaluated. Stronger results are available for this distribution.
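A minimal sketch of the weighted MSE, with states as dictionary keys and `P`, `V_pi`, `V_t` as illustrative dictionaries (not data from the slides). It shows why the choice of $P$ matters: the same approximation, with the same per-state errors, scores differently under two distributions.

```python
from typing import Dict, Hashable


def weighted_mse(
    P: Dict[Hashable, float],
    V_pi: Dict[Hashable, float],
    V_t: Dict[Hashable, float],
) -> float:
    """MSE(theta_t) = sum over s of P(s) * (V^pi(s) - V_t(s))**2."""
    return sum(p * (V_pi[s] - V_t[s]) ** 2 for s, p in P.items())


V_pi = {"a": 1.0, "b": 1.0}
V_t = {"a": 0.0, "b": 1.0}  # error 1 in state "a", exact in state "b"

print(weighted_mse({"a": 0.75, "b": 0.25}, V_pi, V_t))  # 0.75: "a" is visited often
print(weighted_mse({"a": 0.25, "b": 0.75}, V_pi, V_t))  # 0.25: "a" is visited rarely
```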

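Gradient Descent

Let $f$ be any function of the parameter space. Its gradient at any point $\vec\theta_t$ in this space is:
$$\nabla_{\vec\theta} f(\vec\theta_t) = \left(\frac{\partial f(\vec\theta_t)}{\partial \theta(1)}, \frac{\partial f(\vec\theta_t)}{\partial \theta(2)}, \ldots, \frac{\partial f(\vec\theta_t)}{\partial \theta(n)}\right)^T$$

Iteratively move down the gradient:
$$\vec\theta_{t+1} = \vec\theta_t - \alpha \nabla_{\vec\theta} f(\vec\theta_t)$$

(Figure: a two-dimensional parameter space with axes $\theta(1)$ and $\theta(2)$ and the point $\vec\theta_t = (\theta_t(1), \theta_t(2))^T$.)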
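A minimal sketch of the iteration $\vec\theta_{t+1} = \vec\theta_t - \alpha \nabla_{\vec\theta} f(\vec\theta_t)$ on a two-parameter example. The function $f$, its minimum at $(3, -1)$, the step size and the helper name `gradient_descent` are illustrative choices, not from the slides.

```python
from typing import Callable, List


def gradient_descent(
    grad: Callable[[List[float]], List[float]],
    theta0: List[float],
    alpha: float,
    steps: int,
) -> List[float]:
    """Repeat theta <- theta - alpha * grad f(theta) for a fixed number of steps."""
    theta = list(theta0)
    for _ in range(steps):
        g = grad(theta)
        theta = [th - alpha * gi for th, gi in zip(theta, g)]
    return theta


def grad_f(theta: List[float]) -> List[float]:
    # Gradient of the example f(theta) = (theta(1) - 3)^2 + 2 * (theta(2) + 1)^2,
    # whose minimum is at (3, -1).
    return [2.0 * (theta[0] - 3.0), 4.0 * (theta[1] + 1.0)]


theta = gradient_descent(grad_f, [0.0, 0.0], alpha=0.1, steps=200)
print(theta)  # approximately [3.0, -1.0]
```

The step size matters. Here each step multiplies the error in $\theta(2)$ by $1 - 4\alpha$, so any $\alpha > 0.5$ makes the iteration diverge.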

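Gradient Descent Cont.

For the MSE given above and using the chain rule:
$$\begin{aligned}
\vec\theta_{t+1} &= \vec\theta_t - \tfrac{1}{2}\alpha \nabla_{\vec\theta_t} \mathrm{MSE}(\vec\theta_t) \\
&= \vec\theta_t - \tfrac{1}{2}\alpha \nabla_{\vec\theta_t} \sum_{s \in S} P(s)\left[V^\pi(s) - V_t(s)\right]^2 \\
&= \vec\theta_t + \alpha \sum_{s \in S} P(s)\left[V^\pi(s) - V_t(s)\right] \nabla_{\vec\theta_t} V_t(s)
\end{aligned}$$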

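Gradient Descent Cont.

Use just the sample gradient instead:
$$\begin{aligned}
\vec\theta_{t+1} &= \vec\theta_t - \tfrac{1}{2}\alpha \nabla_{\vec\theta_t}\left[V^\pi(s_t) - V_t(s_t)\right]^2 \\
&= \vec\theta_t + \alpha\left[V^\pi(s_t) - V_t(s_t)\right] \nabla_{\vec\theta_t} V_t(s_t)
\end{aligned}$$

Since each sample gradient is an unbiased estimate of the true gradient, this converges to a local minimum of the MSE if $\alpha$ decreases appropriately with $t$. With the expectation taken over $s_t$ drawn from $P$:
$$E\left[\left(V^\pi(s_t) - V_t(s_t)\right)\nabla_{\vec\theta_t} V_t(s_t)\right] = \sum_{s \in S} P(s)\left[V^\pi(s) - V_t(s)\right]\nabla_{\vec\theta_t} V_t(s)$$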
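The unbiasedness claim can be checked numerically. This sketch assumes a linear $V_t$ (the form introduced under Linear Methods) only so that $\nabla V_t(s) = \vec\phi_s$ is easy to write. The states, features, values and $\vec\theta$ are made-up illustrative data. It compares the $P$-weighted average of the per-sample update direction with a finite-difference estimate of $-\tfrac12\nabla\mathrm{MSE}$.

```python
from typing import Dict, List

Vec = List[float]


def v(theta: Vec, phi: Vec) -> float:
    # Linear value estimate, used only to have a concrete differentiable V_t.
    return sum(t * f for t, f in zip(theta, phi))


def mse(theta: Vec, P: Dict[str, float], V_pi: Dict[str, float], features: Dict[str, Vec]) -> float:
    return sum(P[s] * (V_pi[s] - v(theta, features[s])) ** 2 for s in P)


def expected_sample_step(theta, P, V_pi, features) -> Vec:
    # E_{s~P}[(V^pi(s) - V_t(s)) * grad V_t(s)]; for the linear V_t, grad V_t(s) = phi_s.
    out = [0.0] * len(theta)
    for s, p in P.items():
        err = V_pi[s] - v(theta, features[s])
        for i, f in enumerate(features[s]):
            out[i] += p * err * f
    return out


def minus_half_grad_mse(theta, P, V_pi, features, h=1e-6) -> Vec:
    # -1/2 * grad MSE, estimated by central finite differences.
    g = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        g.append(-0.5 * (mse(up, P, V_pi, features) - mse(dn, P, V_pi, features)) / (2 * h))
    return g


P = {"a": 0.5, "b": 0.3, "c": 0.2}
V_pi = {"a": 1.0, "b": -0.5, "c": 2.0}
features = {"a": [1.0, 0.0], "b": [0.5, 1.0], "c": [0.0, 2.0]}
theta = [0.2, -0.1]

expected = expected_sample_step(theta, P, V_pi, features)
numeric = minus_half_grad_mse(theta, P, V_pi, features)
print(expected)
print(numeric)  # agrees with the line above up to finite-difference rounding
```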

But We Don't Have These Targets

Suppose we just have targets $v_t$ instead:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\left[v_t - V_t(s_t)\right]\nabla_{\vec\theta_t} V_t(s_t)$$

If each $v_t$ is an unbiased estimate of $V^\pi(s_t)$, i.e., $E\{v_t\} = V^\pi(s_t)$, then gradient descent converges to a local minimum (provided $\alpha$ decreases appropriately).

For example, the Monte Carlo target $v_t = R_t$:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\left[R_t - V_t(s_t)\right]\nabla_{\vec\theta_t} V_t(s_t)$$
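A minimal sketch of the Monte Carlo version of this update, assuming a linear $V_t$ (so $\nabla V_t(s_t) = \vec\phi_{s_t}$) and an episode given as a list of $(s_t, r_{t+1})$ pairs. The helper name `gradient_mc_episode` and the two-state toy episode are illustrative. A constant $\alpha$ is used because the toy returns are deterministic; the convergence statement above needs a decreasing $\alpha$ in the stochastic case.

```python
from typing import Dict, List, Tuple

Vec = List[float]


def v(theta: Vec, phi: Vec) -> float:
    return sum(t * f for t, f in zip(theta, phi))


def gradient_mc_episode(
    theta: Vec,
    episode: List[Tuple[str, float]],
    features: Dict[str, Vec],
    alpha: float,
    gamma: float = 1.0,
) -> Vec:
    """Apply theta <- theta + alpha [R_t - V_t(s_t)] phi_{s_t} for each step of one episode.

    `episode` is a list of (s_t, r_{t+1}) pairs in time order. The returns R_t are
    only known once the episode has ended, so the updates are made afterwards.
    """
    # Compute R_t = r_{t+1} + gamma * R_{t+1} backwards from the end of the episode.
    returns: List[Tuple[str, float]] = []
    G = 0.0
    for s, r in reversed(episode):
        G = r + gamma * G
        returns.append((s, G))
    returns.reverse()

    for s, R_t in returns:
        phi = features[s]
        err = R_t - v(theta, phi)
        theta = [th + alpha * err * f for th, f in zip(theta, phi)]
    return theta


# Usage: two states with one-hot features; every episode is A -> B -> terminal,
# with reward 0 then 1, so both states have return 1 under gamma = 1.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
theta = [0.0, 0.0]
for _ in range(500):
    theta = gradient_mc_episode(theta, [("A", 0.0), ("B", 1.0)], features, alpha=0.1)
print(theta)  # approaches [1.0, 1.0]
```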

What about TD(λ) Targets?

$$\vec\theta_{t+1} = \vec\theta_t + \alpha\left[R^\lambda_t - V_t(s_t)\right]\nabla_{\vec\theta_t} V_t(s_t)$$

Not unbiased for $\lambda < 1$.

But we do it anyway, using the backward view:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha \delta_t \vec e_t$$
where
$$\delta_t = r_{t+1} + \gamma V_t(s_{t+1}) - V_t(s_t), \text{ as usual, and}$$
$$\vec e_t = \gamma\lambda \vec e_{t-1} + \nabla_{\vec\theta_t} V_t(s_t)$$

On-Line Gradient-Descent TD(λ)
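A minimal sketch of on-line gradient-descent TD(λ) with a linear $V_t$ and accumulating traces, following the backward-view equations for $\delta_t$ and $\vec e_t$. The random-walk task, its five states, the step size, $\lambda$ and the helper names (`td_lambda_step`, `run_random_walk`) are illustrative assumptions, not the algorithm box from the original slide.

```python
import random
from typing import List, Optional, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def td_lambda_step(
    theta: Vec, e: Vec, phi_s: Vec, r: float, phi_next: Optional[Vec],
    alpha: float, gamma: float, lam: float,
) -> Tuple[Vec, Vec]:
    """One backward-view update for linear V_t(s) = theta^T phi_s.

    phi_next is None when s_{t+1} is terminal (its value is taken as 0).
    """
    v_next = 0.0 if phi_next is None else dot(theta, phi_next)
    delta = r + gamma * v_next - dot(theta, phi_s)
    # Accumulating trace: e_t = gamma * lam * e_{t-1} + grad V_t(s_t), and grad V_t(s_t) = phi_s.
    e = [gamma * lam * ei + fi for ei, fi in zip(e, phi_s)]
    theta = [th + alpha * delta * ei for th, ei in zip(theta, e)]
    return theta, e


def run_random_walk(episodes: int, n_states: int = 5, alpha: float = 0.05,
                    gamma: float = 1.0, lam: float = 0.8, seed: int = 0) -> Vec:
    """Hypothetical test task: a random walk over n_states non-terminal states,
    one-hot features, reward 1 for exiting on the right and 0 otherwise."""
    rng = random.Random(seed)
    one_hot = [[1.0 if i == j else 0.0 for j in range(n_states)] for i in range(n_states)]
    theta = [0.0] * n_states
    for _ in range(episodes):
        e = [0.0] * n_states          # traces are reset at the start of each episode
        s = n_states // 2             # start in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                r, phi_next = 0.0, None
            elif s_next >= n_states:
                r, phi_next = 1.0, None
            else:
                r, phi_next = 0.0, one_hot[s_next]
            theta, e = td_lambda_step(theta, e, one_hot[s], r, phi_next, alpha, gamma, lam)
            if phi_next is None:
                break
            s = s_next
    return theta


values = run_random_walk(1000)
print(values)  # estimates of V^pi; the true values are 1/6, 2/6, ..., 5/6
```

With one-hot features the linear method reduces to tabular TD(λ), which makes the result easy to compare with earlier chapters.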

Linear Methods

Represent states as feature vectors: for each $s \in S$:
$$\vec\phi_s = \left(\phi_s(1), \phi_s(2), \ldots, \phi_s(n)\right)^T$$
$$V_t(s) = \vec\theta_t^T \vec\phi_s = \sum_{i=1}^{n} \theta_t(i)\,\phi_s(i)$$
$$\nabla_{\vec\theta_t} V_t(s) = \,?$$
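The question on this slide can be checked numerically: for the linear form, a finite-difference gradient of $V_t(s)$ with respect to $\vec\theta_t$ recovers the feature vector $\vec\phi_s$. The weights and features below are arbitrary illustrative numbers, and `numerical_gradient` is a hypothetical helper.

```python
from typing import List


def linear_value(theta: List[float], phi_s: List[float]) -> float:
    # V_t(s) = theta_t^T phi_s = sum_i theta_t(i) * phi_s(i)
    return sum(t * f for t, f in zip(theta, phi_s))


def numerical_gradient(theta: List[float], phi_s: List[float], h: float = 1e-6) -> List[float]:
    # Central finite differences of V_t(s) with respect to each theta(i).
    grad = []
    for i in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[i] += h
        dn[i] -= h
        grad.append((linear_value(up, phi_s) - linear_value(dn, phi_s)) / (2 * h))
    return grad


theta = [0.5, -2.0, 1.0]
phi_s = [1.0, 0.0, 3.0]
print(linear_value(theta, phi_s))        # 3.5
print(numerical_gradient(theta, phi_s))  # approximately [1.0, 0.0, 3.0], i.e. phi_s
```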

Nice Properties of Linear FA Methods

The gradient is very simple: $\nabla_{\vec\theta_t} V_t(s) = \vec\phi_s$.

For MSE, the error surface is simple: a quadratic surface with a single minimum.

Linear gradient-descent TD(λ) converges if:
- the step size decreases appropriately;
- sampling is on-line (states sampled from the on-policy distribution).

It converges to a parameter vector $\vec\theta_\infty$ with the property
$$\mathrm{MSE}(\vec\theta_\infty) \le \frac{1 - \gamma\lambda}{1 - \gamma}\,\mathrm{MSE}(\vec\theta^*)$$
(Tsitsiklis & Van Roy, 1997), where $\vec\theta^*$ is the best parameter vector.
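Plugging numbers into the factor in the bound (the value $\gamma = 0.9$ is chosen only for illustration) shows how the guarantee loosens for TD(0) when $\gamma$ is close to 1 and tightens as $\lambda \to 1$:

```latex
\gamma = 0.9:\quad
\frac{1-\gamma\lambda}{1-\gamma} =
\begin{cases}
\dfrac{1 - 0}{0.1} = 10, & \lambda = 0 \ (\text{TD}(0)),\\[6pt]
\dfrac{1 - 0.81}{0.1} = 1.9, & \lambda = 0.9,\\[6pt]
\dfrac{1 - 0.9}{0.1} = 1, & \lambda = 1.
\end{cases}
```

At $\lambda = 1$ the bound reduces to $\mathrm{MSE}(\vec\theta_\infty) \le \mathrm{MSE}(\vec\theta^*)$, the best achievable MSE.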

Coarse Coding

Learning and Coarse Coding

Tile Coding
- Binary feature for each tile
- Number of features present at any one time is constant
- Binary features means weighted sum easy to compute
- Easy to compute indices of the features present
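A minimal sketch of 2-D grid tile coding that illustrates the properties listed above: exactly one active tile per tiling (so the number of active features is constant), indices computed by simple arithmetic, and a value that is just a sum of weights. The helper names, the tiling sizes and the uniform diagonal offsets are illustrative assumptions; other offset schemes are common in practice.

```python
import math
from typing import List, Tuple


def tile_indices(
    x: float,
    y: float,
    num_tilings: int = 4,
    tiles_per_dim: int = 8,
    low: Tuple[float, float] = (0.0, 0.0),
    high: Tuple[float, float] = (1.0, 1.0),
) -> List[int]:
    """Return the index of the one active tile in each of `num_tilings` grid tilings.

    Hypothetical helper: tiling k is shifted by k/num_tilings of a tile width in both
    dimensions. Each tiling has one extra row and column so shifted tilings still
    cover the whole input range.
    """
    width_x = (high[0] - low[0]) / tiles_per_dim
    width_y = (high[1] - low[1]) / tiles_per_dim
    n = tiles_per_dim + 1                       # tiles per dimension, per tiling
    active = []
    for k in range(num_tilings):
        shift = k / num_tilings
        col = math.floor((x - low[0]) / width_x + shift)
        row = math.floor((y - low[1]) / width_y + shift)
        col = min(max(col, 0), n - 1)           # clip points on or outside the boundary
        row = min(max(row, 0), n - 1)
        active.append(k * n * n + row * n + col)  # flatten (tiling, row, col) to one index
    return active


def tile_value(theta: List[float], active: List[int]) -> float:
    # Binary features: the weighted sum is just the sum of the active tiles' weights.
    return sum(theta[i] for i in active)


num_features = 4 * 9 * 9                        # num_tilings * n * n for the defaults
theta = [0.25] * num_features
active = tile_indices(0.3, 0.7)
print(len(active), tile_value(theta, active))   # 4 1.0
```

Nearby inputs share most of their active tiles, which is what produces generalization between them.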

Tile Coding Cont.
- Irregular tilings
- Hashing
- CMAC: "Cerebellar Model Arithmetic Computer" (Albus, 1971)

Radial Basis Functions (RBFs)

For example, Gaussians:
$$\phi_s(i) = \exp\left(-\frac{\lVert s - c_i \rVert^2}{2\sigma_i^2}\right)$$
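A minimal sketch of Gaussian RBF features computed directly from the formula above, for states given as tuples of floats. The centers $c_i$ and widths $\sigma_i$ in the usage lines are illustrative choices, and `rbf_features` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def rbf_features(
    s: Sequence[float],
    centers: List[Sequence[float]],
    sigmas: List[float],
) -> List[float]:
    """Gaussian RBF features: phi_s(i) = exp(-||s - c_i||^2 / (2 * sigma_i^2))."""
    feats = []
    for c, sigma in zip(centers, sigmas):
        sq_dist = sum((si - ci) ** 2 for si, ci in zip(s, c))
        feats.append(math.exp(-sq_dist / (2.0 * sigma ** 2)))
    return feats


# Usage: a 1-D state and three evenly spaced centers with a common width.
centers = [(0.0,), (0.5,), (1.0,)]
phi = rbf_features((0.5,), centers, sigmas=[0.25, 0.25, 0.25])
print(phi)  # the middle feature is 1.0; the outer two are equal and smaller
```

Unlike the binary features of tile coding, these features vary smoothly with $s$, so $V_t(s) = \vec\theta_t^T\vec\phi_s$ does too.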

Can You Beat the Curse of Dimensionality?

Can you keep the number of features from going up exponentially with the dimension?

Function complexity, not dimensionality, is the problem.

Kanerva coding:
- Select a bunch of binary prototypes
- Use Hamming distance as the distance measure
- Dimensionality is no longer a problem, only complexity

"Lazy learning" schemes:
- Remember all the data
- To get a new value, find nearest neighbors and interpolate
- e.g., locally-weighted regression
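A minimal sketch of Kanerva coding. It assumes that a feature is present when the state lies within a fixed Hamming radius of its prototype, which is one simple choice. The prototype count, state size, radius and helper names are illustrative. The number of features equals the number of prototypes, independent of the state dimension.

```python
import random
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    # Number of positions at which two equal-length binary vectors differ.
    return sum(x != y for x, y in zip(a, b))


def make_prototypes(num: int, dim: int, seed: int = 0) -> List[List[int]]:
    # Randomly chosen binary prototypes; `num`, not `dim`, sets the number of features.
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(dim)] for _ in range(num)]


def kanerva_features(s: Sequence[int], prototypes: List[List[int]], radius: int) -> List[int]:
    # Binary feature i is present if s is within `radius` bits of prototype i.
    return [1 if hamming(s, p) <= radius else 0 for p in prototypes]


# Usage: 50 prototypes over a 100-bit state space give 50 features.
protos = make_prototypes(num=50, dim=100)
rng = random.Random(1)
state = [rng.randint(0, 1) for _ in range(100)]
feats = kanerva_features(state, protos, radius=45)
print(len(feats), sum(feats))  # 50 features, of which some are present
```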

Control with FA

Learning state-action values. Training examples of the form:
$$\{\text{description of } (s_t, a_t),\ v_t\}$$

The general gradient-descent rule:
$$\vec\theta_{t+1} = \vec\theta_t + \alpha\left[v_t - Q_t(s_t, a_t)\right]\nabla_{\vec\theta_t} Q_t(s_t, a_t)$$

Gradient-descent Sarsa(λ) (backward view):
$$\vec\theta_{t+1} = \vec\theta_t + \alpha \delta_t \vec e_t$$
where
$$\delta_t = r_{t+1} + \gamma Q_t(s_{t+1}, a_{t+1}) - Q_t(s_t, a_t)$$
$$\vec e_t = \gamma\lambda \vec e_{t-1} + \nabla_{\vec\theta_t} Q_t(s_t, a_t)$$
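A minimal sketch of one linear gradient-descent Sarsa(λ) step plus ε-greedy action selection. It assumes state-action features $\vec\phi(s,a)$ built by placing one copy of the state features in a per-action block (a common but assumed design), so $\nabla Q_t(s_t,a_t) = \vec\phi(s_t,a_t)$. All helper names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def stacked_features(phi_s: Vec, a: int, num_actions: int) -> Vec:
    """phi(s, a): one copy of the state features per action, zeros elsewhere (assumed design)."""
    out = [0.0] * (len(phi_s) * num_actions)
    out[a * len(phi_s):(a + 1) * len(phi_s)] = phi_s
    return out


def epsilon_greedy(theta: Vec, phi_s: Vec, num_actions: int, epsilon: float, rng: random.Random) -> int:
    # With probability epsilon explore uniformly; otherwise pick the action with the largest Q.
    if rng.random() < epsilon:
        return rng.randrange(num_actions)
    qs = [dot(theta, stacked_features(phi_s, a, num_actions)) for a in range(num_actions)]
    return max(range(num_actions), key=lambda a: qs[a])


def sarsa_lambda_step(
    theta: Vec, e: Vec, phi_sa: Vec, r: float, phi_next_sa: Optional[Vec],
    alpha: float, gamma: float, lam: float,
) -> Tuple[Vec, Vec]:
    """delta_t = r_{t+1} + gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t); e_t = gamma lam e_{t-1} + phi(s_t, a_t).

    phi_next_sa is None when s_{t+1} is terminal (its value is taken as 0).
    """
    q_next = 0.0 if phi_next_sa is None else dot(theta, phi_next_sa)
    delta = r + gamma * q_next - dot(theta, phi_sa)
    e = [gamma * lam * ei + fi for ei, fi in zip(e, phi_sa)]  # grad Q_t(s_t, a_t) = phi(s_t, a_t)
    theta = [th + alpha * delta * ei for th, ei in zip(theta, e)]
    return theta, e


# Usage: two state features, two actions; suppose the chosen action ends the episode with reward 1.
rng = random.Random(0)
phi_s = [1.0, 0.0]
theta, e = [0.0] * 4, [0.0] * 4
a = epsilon_greedy(theta, phi_s, 2, epsilon=0.1, rng=rng)
theta, e = sarsa_lambda_step(theta, e, stacked_features(phi_s, a, 2), 1.0, None,
                             alpha=0.5, gamma=0.9, lam=0.8)
print(a, theta)
```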

Linear Gradient-Descent Sarsa(λ)

Linear Gradient-Descent Watkins' Q(λ) (figure label: GPI)

Mountain-Car Task

Mountain-Car Results

Baird's Counterexample

Baird's Counterexample Cont.

Should We Bootstrap?

Summary
- Generalization
- Adapting supervised-learning function approximation methods
- Gradient-descent methods
- Linear gradient-descent methods
- Radial basis functions
- Tile coding
- Kanerva coding
- Nonlinear gradient-descent methods? Backpropagation?
- Subtleties involving function approximation, bootstrapping and the on-policy/off-policy distinction