What we learned last time


1 What we learned last time
Value-function approximation by stochastic gradient descent enables RL to be applied to arbitrarily large state spaces
Most algorithms just carry over: targets from the tabular case
With bootstrapping (TD), we don't get true gradient-descent methods, but the linear, on-policy case is still guaranteed convergent, and learning is faster with n-step methods (n > 1), as before
For continuous state spaces, coarse/tile coding is a good strategy

2 Chapter 10: On-policy Control with Approximation

3 Value function approximation (VFA) replaces the table with a general parameterized form: the state $S_t$ and action $A_t$ are mapped to an estimate $\hat{q}(S_t, A_t, \boldsymbol{\theta})$, which is updated toward a target $U_t$

4 On-policy Control with Approximation
(Semi-)gradient methods carry over to control in the usual way
Mountain Car example
n-step methods carry over too, with the usual tradeoffs
A new average-reward setting, with differential value functions and differential algorithms
Queuing example (tabular)
The discounting setting is deprecated

5 (Semi-)gradient methods carry over to control in the usual on-policy GPI way
Always learn the action-value function of the current policy
Always act near-greedily wrt the current action-value estimates
The learning rule is the same as in Chapter 9:
$$\boldsymbol{\theta}_{t+1} \doteq \boldsymbol{\theta}_t + \alpha \big[ U_t - \hat{q}(S_t, A_t, \boldsymbol{\theta}_t) \big] \nabla \hat{q}(S_t, A_t, \boldsymbol{\theta}_t)$$
where $U_t$ is an update target, e.g.,
$U_t = G_t$  (MC)
$U_t = R_{t+1} + \gamma\, \hat{q}(S_{t+1}, A_{t+1}, \boldsymbol{\theta}_t)$  (Sarsa)
$U_t = R_{t+1} + \gamma \sum_{a'} \pi(a' \mid S_{t+1})\, \hat{q}(S_{t+1}, a', \boldsymbol{\theta}_t)$  (Expected Sarsa)
$U_t = \sum_{s',r} p(s', r \mid S_t, A_t) \big[ r + \gamma \sum_{a'} \pi(a' \mid s')\, \hat{q}(s', a', \boldsymbol{\theta}_t) \big]$  (DP)
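To make the shared structure concrete, here is a minimal Python sketch (our own, not from the slides) of the common semi-gradient step with linear function approximation, plus two of the targets; `features` and `pi` are hypothetical placeholders for a feature map and policy probabilities.

```python
import numpy as np

def semi_gradient_step(theta, features, s, a, target, alpha):
    """theta += alpha * (U_t - q_hat) * grad q_hat; with linear
    q_hat(s, a, theta) = theta @ x(s, a), the gradient is just x(s, a)."""
    x = features(s, a)
    return theta + alpha * (target - theta @ x) * x

def sarsa_target(theta, features, r, s_next, a_next, gamma):
    # U_t = R_{t+1} + gamma * q_hat(S_{t+1}, A_{t+1}, theta)
    return r + gamma * (theta @ features(s_next, a_next))

def expected_sarsa_target(theta, features, r, s_next, actions, pi, gamma):
    # U_t = R_{t+1} + gamma * sum_a pi(a | S_{t+1}) * q_hat(S_{t+1}, a, theta)
    return r + gamma * sum(pi(a, s_next) * (theta @ features(s_next, a)) for a in actions)
```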

6 (Semi-)gradient methods carry over to control
$$\boldsymbol{\theta}_{t+1} \doteq \boldsymbol{\theta}_t + \alpha \big[ U_t - \hat{q}(S_t, A_t, \boldsymbol{\theta}_t) \big] \nabla \hat{q}(S_t, A_t, \boldsymbol{\theta}_t)$$
Episodic Semi-gradient Sarsa for Estimating $\hat{q} \approx q_*$
Input: a differentiable function $\hat{q} : \mathcal{S} \times \mathcal{A} \times \mathbb{R}^n \to \mathbb{R}$
Initialize value-function weights $\boldsymbol{\theta} \in \mathbb{R}^n$ arbitrarily (e.g., $\boldsymbol{\theta} = \mathbf{0}$)
Repeat (for each episode):
  S, A ← initial state and action of episode (e.g., ε-greedy)
  Repeat (for each step of episode):
    Take action A, observe R, S′
    If S′ is terminal:
      θ ← θ + α [R − q̂(S, A, θ)] ∇q̂(S, A, θ)
      Go to next episode
    Choose A′ as a function of q̂(S′, ·, θ) (e.g., ε-greedy)
    θ ← θ + α [R + γ q̂(S′, A′, θ) − q̂(S, A, θ)] ∇q̂(S, A, θ)
    S ← S′
    A ← A′
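A minimal runnable sketch of this algorithm with linear function approximation follows; the environment interface (`reset`/`step`) and `features` function are assumptions for illustration, not part of the slides.

```python
import numpy as np

def episodic_semi_gradient_sarsa(env, features, n_features, actions,
                                 alpha=0.1, gamma=1.0, epsilon=0.1,
                                 num_episodes=500, rng=np.random.default_rng(0)):
    """Episodic semi-gradient Sarsa with linear q_hat(s, a, theta) = theta @ features(s, a).
    `env` is assumed to expose reset() -> state and step(a) -> (state, reward, done)."""
    theta = np.zeros(n_features)

    def q(s, a):
        return theta @ features(s, a)

    def eps_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q(s, a))

    for _ in range(num_episodes):
        s = env.reset()
        a = eps_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a)
            x = features(s, a)
            if done:
                theta += alpha * (r - theta @ x) * x       # terminal: target is just R
                break
            a_next = eps_greedy(s_next)
            target = r + gamma * q(s_next, a_next)          # Sarsa target
            theta += alpha * (target - theta @ x) * x       # semi-gradient update
            s, a = s_next, a_next
    return theta
```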

7 Example: The Mountain-Car problem
SITUATIONS: the car's position and velocity
ACTIONS: three thrusts: forward, reverse, none
REWARDS: always −1 until the car reaches the goal
"Gravity wins": the car cannot accelerate straight up the slope, so it must first back away from the goal to build momentum
Episodic, no discounting, γ = 1
A Minimum-Time-to-Goal problem
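For reference, here is a minimal sketch of the classic Mountain-Car dynamics as given in the textbook; the `reset`/`step` interface names are our own choice, and the fixed start state is a simplification (the book samples the start position uniformly in [−0.6, −0.4)).

```python
import math

class MountainCar:
    """Classic Mountain-Car dynamics: position in [-1.2, 0.5], velocity in [-0.07, 0.07]."""
    def reset(self):
        self.position, self.velocity = -0.5, 0.0   # simplified fixed start near the valley bottom
        return (self.position, self.velocity)

    def step(self, action):            # action in {-1, 0, +1}: reverse, none, forward
        self.velocity += 0.001 * action - 0.0025 * math.cos(3 * self.position)
        self.velocity = min(max(self.velocity, -0.07), 0.07)
        self.position += self.velocity
        self.position = min(max(self.position, -1.2), 0.5)
        if self.position == -1.2:      # hit the left wall: velocity resets to zero
            self.velocity = 0.0
        done = self.position >= 0.5    # reached the goal at the top of the right hill
        return (self.position, self.velocity), -1.0, done   # reward is always -1
```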

8 Values learned while solving Mountain-Car with tile coding function approximation
[Figure: surfaces of −maxₐ q̂(s, a, θ) over position and velocity at several points during learning (an early step of the first episode and several later episodes); Demo]

9 Learning curves for semi-gradient Sarsa with tile coding (8×8 tilings, tiles3.py)
[Figure: Mountain Car, steps per episode (log scale, averaged over 100 runs) versus episode, for step sizes α = 0.1/8, 0.2/8, and 0.5/8]
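The slide references Sutton's tiles3.py tile-coding software. A minimal sketch of how Mountain-Car state-action features might be built with it, assuming tiles3.py is available locally; the hash-table size and scaling constants follow the usual convention (each state dimension stretched over the number of tilings), but they are our assumptions here.

```python
import numpy as np
from tiles3 import IHT, tiles   # Sutton's tile-coding software, assumed available on the path

NUM_TILINGS = 8
iht = IHT(4096)                 # index hash table; 4096 is a typical size for Mountain Car

def active_tiles(position, velocity, action):
    """Indices of the active tiles for one (state, action) pair; each state
    dimension is scaled so that one tiling spans its full range."""
    return tiles(iht, NUM_TILINGS,
                 [NUM_TILINGS * position / (0.5 + 1.2),
                  NUM_TILINGS * velocity / (0.07 + 0.07)],
                 [action])

def features(state, action, n_features=4096):
    """Binary feature vector with NUM_TILINGS ones (one per tiling)."""
    x = np.zeros(n_features)
    x[active_tiles(state[0], state[1], action)] = 1.0
    return x
```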

10 n-step methods carry over too: n-step semi-gradient Sarsa
The n-step update equation ($t < T$) is
$$\boldsymbol{\theta}_{t+n} \doteq \boldsymbol{\theta}_{t+n-1} + \alpha \big[ G_t^{(n)} - \hat{q}(S_t, A_t, \boldsymbol{\theta}_{t+n-1}) \big] \nabla \hat{q}(S_t, A_t, \boldsymbol{\theta}_{t+n-1}),$$
with $G_t^{(n)} = G_t$ if $t + n \geq T$
As we have seen before, performance is best if an intermediate level of bootstrapping is used, corresponding to an n larger than 1: the algorithm tends to learn faster and obtain a better asymptotic performance at n = 8 than at n = 1 on the Mountain Car task, so n-step semi-gradient Sarsa is better for n > 1
[Figure: effect of α and n on early performance of n-step semi-gradient Sarsa with tile-coding function approximation on the Mountain Car task, measured as steps per episode averaged over the first 50 episodes and 100 runs, plotted against α × number of tilings (8) for n = 1, 2, 4, 8, 16; an intermediate level of bootstrapping (n = 4) performed best, and the main effects are all statistically significant]
Exercise: give pseudocode for semi-gradient one-step Expected Sarsa for control
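A sketch of the single n-step update, written as a standalone Python function; the argument layout (a list of the n rewards plus an optional bootstrap pair) is our own framing of the equation above, not the book's pseudocode.

```python
import numpy as np

def n_step_sarsa_update(theta, features, s_tau, a_tau, rewards, tail, alpha, gamma=1.0):
    """One n-step semi-gradient Sarsa update for the pair (s_tau, a_tau).
    `rewards` holds R_{tau+1}, ..., R_{tau+n}; `tail` is (S_{tau+n}, A_{tau+n}),
    or None if the episode terminated inside the window (then G is just the reward sum)."""
    g = sum(gamma ** i * r for i, r in enumerate(rewards))           # discounted reward sum
    if tail is not None:
        s_n, a_n = tail
        g += gamma ** len(rewards) * (theta @ features(s_n, a_n))    # bootstrap with q_hat
    x = features(s_tau, a_tau)
    return theta + alpha * (g - theta @ x) * x                        # semi-gradient step
```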

11 On-policy Control with Approximation
(Semi-)gradient methods carry over to control in the usual way
Mountain Car example
n-step methods carry over too, with the usual tradeoffs
A new average-reward setting, with differential value functions and differential algorithms
Queuing example (tabular)
The discounting setting is deprecated

12 On-policy Control with Approximation
(Semi-)gradient methods carry over to control in the usual way
Mountain Car example
n-step methods carry over too, with the usual tradeoffs
A new average-reward setting, with differential value functions and differential algorithms
Queuing example (tabular)
The discounting setting is deprecated

13 Average reward: a new problem setting for continuing tasks
A new goal for continuing tasks: maximize the average reward per time step
In the average-reward setting, the quality of a policy $\pi$ is defined as the average rate of reward while following that policy, which we denote $\eta(\pi)$:
$$\eta(\pi) \doteq \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\big[R_t \mid A_{0:t-1} \sim \pi\big] = \lim_{t \to \infty} \mathbb{E}\big[R_t \mid A_{0:t-1} \sim \pi\big] = \sum_s d_\pi(s) \sum_a \pi(a \mid s) \sum_{s',r} p(s', r \mid s, a)\, r \qquad (15)$$
where the expectations are conditioned on the prior actions $A_0, A_1, \ldots, A_{t-1}$ being taken according to $\pi$, and $d_\pi : \mathcal{S} \to [0, 1]$ is the steady-state distribution under $\pi$, also known as the on-policy distribution: $d_\pi(s) = \lim_{t \to \infty} \Pr\{S_t = s \mid A_{0:t-1} \sim \pi\}$, which is assumed to exist and to be independent of $S_0$
This property is known as ergodicity: wherever the MDP starts, and whatever early decisions the agent makes, they have only a temporary effect; in the long run the expectation of being in a state depends only on the policy and the MDP transition probabilities
Ergodicity is sufficient to guarantee the existence of the limits in the equations above
$\eta(\pi)$ is the average amount of reward received per time step
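A short numerical sketch of the last form of (15): compute the stationary distribution of the on-policy chain and weight the expected one-step rewards by it. The two-state chain below is made up purely for illustration.

```python
import numpy as np

def average_reward(P_pi, r_pi):
    """eta(pi) = sum_s d_pi(s) * r_pi(s) for an ergodic on-policy chain.
    P_pi[s, s'] = sum_a pi(a|s) p(s'|s, a); r_pi[s] = expected one-step reward under pi."""
    evals, evecs = np.linalg.eig(P_pi.T)                 # stationary distribution is the
    d = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])  # left eigenvector for eigenvalue 1
    d = d / d.sum()
    return float(d @ r_pi)

# Toy two-state example (illustrative only):
P_pi = np.array([[0.9, 0.1],
                 [0.5, 0.5]])
r_pi = np.array([1.0, 0.0])
print(average_reward(P_pi, r_pi))   # average reward per step under this policy
```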

14 In the average-reward setting, everything is new
The steady-state distribution satisfies
$$\sum_s d_\pi(s) \sum_a \pi(a \mid s, \boldsymbol{\theta})\, p(s' \mid s, a) = d_\pi(s') \qquad (16)$$
Returns: in the average-reward setting, returns are defined in terms of differences between rewards and the average reward:
$$G_t \doteq R_{t+1} - \eta(\pi) + R_{t+2} - \eta(\pi) + R_{t+3} - \eta(\pi) + \cdots \qquad (17)$$
This is known as the differential return, and the corresponding value functions are known as differential value functions; they are defined in the same way and we will use the same notation for them as we have all along: $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$ and $q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$ (similarly for $v_*$ and $q_*$)
Bellman equations: differential value functions also have Bellman equations, just slightly different from those we have seen earlier (cf. Equations 3.14, 4.1, and 4.2); we simply remove all $\gamma$s and replace all rewards by the difference between the reward and the true average reward:
Prediction:
$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{r, s'} p(s', r \mid s, a) \Big[ r - \eta(\pi) + v_\pi(s') \Big]$$
$$q_\pi(s, a) = \sum_{r, s'} p(s', r \mid s, a) \Big[ r - \eta(\pi) + \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Big]$$
Control:
$$v_*(s) = \max_a \sum_{r, s'} p(s', r \mid s, a) \Big[ r - \eta(\pi) + v_*(s') \Big]$$
$$q_*(s, a) = \sum_{r, s'} p(s', r \mid s, a) \Big[ r - \eta(\pi) + \max_{a'} q_*(s', a') \Big]$$
Update targets: $U_t = R_{t+1} - \bar{R}_t + \hat{v}(S_{t+1}, \boldsymbol{\theta})$ or $U_t = R_{t+1} - \bar{R}_t + \hat{q}(S_{t+1}, A_{t+1}, \boldsymbol{\theta})$, where $\bar{R}_t$ is an estimate of $\eta(\pi)$
There is also a differential form of the TD error:
$$\delta_t = R_{t+1} - \bar{R}_t + \hat{v}(S_{t+1}, \boldsymbol{\theta}) - \hat{v}(S_t, \boldsymbol{\theta}) \qquad (18)$$

15 Differential semi-gradient Sarsa for estimating $\hat{q} \approx q_*$
Input: a differentiable function $\hat{q} : \mathcal{S} \times \mathcal{A} \times \mathbb{R}^n \to \mathbb{R}$
Parameters: step sizes $\alpha, \beta > 0$
Initialize value-function weights $\boldsymbol{\theta} \in \mathbb{R}^n$ arbitrarily (e.g., $\boldsymbol{\theta} = \mathbf{0}$)
Initialize average-reward estimate $\bar{R}$ arbitrarily (e.g., $\bar{R} = 0$)
Initialize state S and action A
Repeat (for each step):
  Take action A, observe R, S′
  Choose A′ as a function of q̂(S′, ·, θ) (e.g., ε-greedy)
  δ ← R − R̄ + q̂(S′, A′, θ) − q̂(S, A, θ)
  R̄ ← R̄ + β δ
  θ ← θ + α δ ∇q̂(S, A, θ)
  S ← S′
  A ← A′
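A minimal Python sketch of this algorithm with linear function approximation; as before, the continuing-task environment interface (`reset`, `step` returning only a state and reward) and the `features` function are assumptions, not part of the slides.

```python
import numpy as np

def differential_semi_gradient_sarsa(env, features, n_features, actions,
                                     alpha=0.01, beta=0.01, epsilon=0.1,
                                     num_steps=100_000, rng=np.random.default_rng(0)):
    """Differential semi-gradient Sarsa with linear q_hat and a running
    average-reward estimate r_bar (the R-bar of the pseudocode)."""
    theta = np.zeros(n_features)
    r_bar = 0.0                                           # estimate of eta(pi)

    def q(s, a):
        return theta @ features(s, a)

    def eps_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q(s, a))

    s = env.reset()
    a = eps_greedy(s)
    for _ in range(num_steps):
        s_next, r = env.step(a)                            # continuing: no terminal states
        a_next = eps_greedy(s_next)
        delta = r - r_bar + q(s_next, a_next) - q(s, a)    # differential TD error
        r_bar += beta * delta                              # update average-reward estimate
        theta += alpha * delta * features(s, a)            # semi-gradient update
        s, a = s_next, a_next
    return theta, r_bar
```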

16 Example: The access-control queuing problem, solved by tabular differential Sarsa
Customers wait in line to be served by one of k = 10 servers
Customers pay rewards of 1, 2, 4, or 8 (depending on their priority) for being served
On each step, the customer at the front of the queue is accepted (served) or rejected
The queue never empties; new customers have random priorities
Busy servers become free with probability p = 0.06 on each step
[Figure: the learned policy (ACCEPT/REJECT as a function of priority and number of free servers) and the differential value of the best action versus the number of free servers, one curve per priority]
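A sketch of the task as a continuing environment, suitable for plugging into the differential Sarsa sketch above; the state encoding (number of free servers, priority of the head-of-queue customer) and the class interface are our own choices for illustration.

```python
import numpy as np

class AccessControlQueue:
    """Access-control queuing task: 10 servers, customers of priority (reward) 1, 2, 4, or 8,
    each busy server becoming free with probability 0.06 on every step."""
    PRIORITIES = [1, 2, 4, 8]

    def __init__(self, n_servers=10, p_free=0.06, rng=np.random.default_rng(0)):
        self.n_servers, self.p_free, self.rng = n_servers, p_free, rng

    def reset(self):
        self.free = self.n_servers
        self.priority = self.rng.choice(self.PRIORITIES)
        return (self.free, self.priority)

    def step(self, accept):
        reward = 0.0
        if accept and self.free > 0:                 # serve the customer: collect its priority
            reward = float(self.priority)
            self.free -= 1
        busy = self.n_servers - self.free            # each busy server frees with prob p_free
        self.free += self.rng.binomial(busy, self.p_free)
        self.priority = self.rng.choice(self.PRIORITIES)   # queue never empties
        return (self.free, self.priority), reward
```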

17 Discounting is futile in continuing control settings with function approximation
The problem statement is broken! The goal is broken!
We can no longer give a useful ordering on policies; we can only order a few policies, those that dominate others in all states
It would be OK if we could say what states we care about, but in the control case we can't
Suppose we cared about states according to how often they occur? Surprisingly, discounting then becomes irrelevant!

18 The Futility of Discounting in Continuing Problems
Perhaps discounting can be saved by choosing an objective that sums discounted values over the distribution with which states occur under the policy (where $v_\pi^\gamma$ is the discounted value function):
$$
\begin{aligned}
J(\pi) &= \sum_s d_\pi(s)\, v_\pi^\gamma(s) \\
&= \sum_s d_\pi(s) \sum_a \pi(a \mid s) \sum_{s',r} p(s', r \mid s, a) \big[ r + \gamma\, v_\pi^\gamma(s') \big] && \text{(Bellman Eq.)} \\
&= \eta(\pi) + \sum_s d_\pi(s) \sum_a \pi(a \mid s) \sum_{s',r} p(s', r \mid s, a)\, \gamma\, v_\pi^\gamma(s') && \text{(from (15))} \\
&= \eta(\pi) + \gamma \sum_{s'} v_\pi^\gamma(s') \sum_s d_\pi(s) \sum_a \pi(a \mid s)\, p(s' \mid s, a) && \text{(from (3.8))} \\
&= \eta(\pi) + \gamma \sum_{s'} v_\pi^\gamma(s')\, d_\pi(s') && \text{(from (16))} \\
&= \eta(\pi) + \gamma J(\pi) \\
&= \eta(\pi) + \gamma \eta(\pi) + \gamma^2 J(\pi) \\
&= \eta(\pi) + \gamma \eta(\pi) + \gamma^2 \eta(\pi) + \gamma^3 \eta(\pi) + \cdots \\
&= \frac{1}{1 - \gamma}\, \eta(\pi)
\end{aligned}
$$
The proposed discounted objective orders policies identically to the undiscounted (average-reward) objective; we have failed to save discounting!
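The identity $J(\pi) = \eta(\pi)/(1-\gamma)$ is easy to check numerically. The sketch below does so for a made-up three-state ergodic chain (transition matrix and rewards are illustrative, not from the slides): the discounted values are solved exactly, weighted by the stationary distribution, and compared against the average reward scaled by $1/(1-\gamma)$.

```python
import numpy as np

gamma = 0.9
P = np.array([[0.5, 0.5, 0.0],      # on-policy transition probabilities P[s, s']
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, 2.0])       # expected one-step reward in each state

# Discounted value function: v = (I - gamma P)^{-1} r
v_gamma = np.linalg.solve(np.eye(3) - gamma * P, r)

# Stationary distribution d_pi: left eigenvector of P with eigenvalue 1
evals, evecs = np.linalg.eig(P.T)
d = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
d /= d.sum()

eta = d @ r                          # average reward per step
J = d @ v_gamma                      # discounted values weighted by d_pi
print(J, eta / (1 - gamma))          # the two numbers agree
```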

19 Conclusions
Control is straightforward in the on-policy, episodic, linear case
For the continuing case, we need the average-reward setting, which is a lot like just replacing $R_t$ with $R_t - \eta(\pi)$ everywhere, where $\eta(\pi)$ is the average reward per step, or its estimate
We should probably never use discounting as a control objective
Formal results (bounds) exist for the linear, on-policy case; we get chattering near a good solution, not convergence
