Artificial Intelligence Markov Decision Problems


References: Nilsson (briefly mentioned in one chapter); Russell and Norvig, Chapter 17.

Example: probabilistic blocksworld. Each action has probabilistic outcomes and an execution time (= cost):

action | outcome | probability | time (= cost)
move   | success | 0.9         | 1 minute
move   | failure | 0.1         | 1 minute
paint  | success | 1.0         | 1 minute

How to determine the average plan-execution time of a given plan: let t_i be the average plan-execution time until the goal state is reached, if the agent is in state i and follows the plan. Each non-goal state of the plan contributes one linear equation of the form t_i = c + Σ_j p_ij t_j, where c is the time of the action the plan executes in state i and p_ij is the probability of moving from state i to state j; for the goal state, t = 0. With the move action above this yields equations such as t_1 = 1 + 0.1 t_1 + 0.9 t_2. Solving the system gives the average plan-execution time of the start state.

The slides then compare several plans for the same blocksworld problem, built from actions such as "move F to G", "paint F", "move H to I", "paint J", "move K to L", "move S to table", "move T to U", "move M to N", "move Q to R": different plans that achieve the same goal can have quite different average plan-execution times (for example, one plan averages about 6 minutes and another about 9 minutes).
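These equations form a linear system, one equation per state, so the average plan-execution time of a fixed plan can be computed with a single linear solve. A minimal sketch in Python; the two-state chain and its transition probabilities are illustrative stand-ins, not the blocksworld numbers from the slides:

```python
import numpy as np

# Non-goal states of the plan are indexed 0..n-1; the goal state is absorbing.
# Each t[i] satisfies t[i] = 1 + sum_j P[i, j] * t[j], where P holds the
# probabilities of moving between non-goal states (the remaining probability
# mass goes to the goal) and every action takes 1 minute.
P = np.array([[0.1, 0.9],
              [0.0, 0.1]])   # illustrative probabilities, not from the slides
c = np.ones(2)               # 1 minute per action

# Rearranging gives (I - P) t = c, which a linear solver handles directly.
t = np.linalg.solve(np.eye(2) - P, c)
print(t)  # average plan-execution times of states 0 and 1
```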

How to (almost) determine a plan with minimal average plan-execution time.

Deterministic planning and search expands a tree of actions ("move A to B", "move A to the ground", "paint A", ...), but with probabilistic action outcomes the decision tree is infinite! The values associated with identical chance nodes should be the same ==> the actions associated with identical choice nodes should be the same ==> whenever the configuration of the blocks is the same, one wants to execute the same action (= a policy).

Deterministic planning and search:
- actions have deterministic effects: state and action uniquely determine the successor state
- states are completely observable
- a plan is a sequence of actions (= a path)
- minimize the total cost
- optimal plans are acyclic
- plans can be found using (forward or backward) search

Probabilistic planning and search = Markov Decision Problems (MDPs); actions are probabilistic (for example, the robot can drift):
- Markov property: actions have probabilistic effects; state and action uniquely determine a probability distribution over the successor states
- states are completely observable
- a plan is a mapping from states to actions (= a policy)
- minimize the expected total cost
- optimal plans can be cyclic

How to find such a policy? Determine the expected goal distances of all states, then greedily assign to each state the action that decreases the expected distance the most. One way to represent such an MDP in code is sketched below.
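To make the policy view concrete, here is one way to encode such an MDP directly as dictionaries. A minimal sketch; the two-state example with 0.5 transition probabilities is modeled on the figure in these slides, but its costs and state names are assumptions:

```python
# An MDP as plain dictionaries: the executable actions per state, the cost
# per (state, action), and a probability distribution over successor states.
actions = {
    "s1": ["a1", "a2"],
    "s2": ["a1"],
}
cost = {
    ("s1", "a1"): 1.0,
    ("s1", "a2"): 1.0,
    ("s2", "a1"): 1.0,
}
p = {  # p[(s, a)] maps successor states to probabilities (summing to 1)
    ("s1", "a1"): {"goal": 1.0},
    ("s1", "a2"): {"goal": 0.5, "s2": 0.5},
    ("s2", "a1"): {"s1": 0.5, "s2": 0.5},
}
goal_states = {"goal"}

# A plan for an MDP is a policy: a mapping from states to actions.
policy = {"s1": "a1", "s2": "a1"}
```

The later sketches (value iteration, policy iteration, Q-learning) all work against this encoding.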

Deterministic planning and search:
s = a state
a = an action
A(s) = the set of actions that can be executed in state s
succ(s, a) = the state that results from the execution of action a in state s
c(s, a) = the cost that results from the execution of action a in state s
gd(s) = the goal distance of state s

gd(s) = 0 if s is a goal state
gd(s) = min_{a ∈ A(s)} ( c(s, a) + gd(succ(s, a)) ) if s is not a goal state

Markov Decision Problems:
succ(s, a) = the set of states that can result from the execution of action a in state s
p(s, a, s') = the probability that state s' results from the execution of action a in state s
gd(s) = the expected goal distance of state s

gd(s) = 0 if s is a goal state
gd(s) = min_{a ∈ A(s)} ( c(s, a) + Σ_{s' ∈ succ(s, a)} p(s, a, s') gd(s') ) if s is not a goal state

This is the Bellman equation.

π(s) = the optimal action to execute in state s (for non-goal states s):
π(s) = argmin_{a ∈ A(s)} ( c(s, a) + gd(succ(s, a)) ) in the deterministic case
π(s) = argmin_{a ∈ A(s)} ( c(s, a) + Σ_{s' ∈ succ(s, a)} p(s, a, s') gd(s') ) in the MDP case

Given the expected distances, we can use these definitions to check them; but calculating them from the definitions is a chicken-and-egg problem. The way out, first for the deterministic case: define gd_i(s) = the minimal cost until a goal state is reached or i actions have been executed, if execution starts in state s. For i larger than some constant, gd(s) = gd_i(s) (namely once gd_i(s) = gd_{i-1}(s) for all states s):

gd_0(s) = 0
gd_i(s) = 0 if s is a goal state
gd_i(s) = min_{a ∈ A(s)} ( c(s, a) + gd_{i-1}(succ(s, a)) ) if s is not a goal state
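The chicken-and-egg remark can be made concrete in code: given candidate values gd(s), one sweep of the Bellman equation verifies them, even though computing them requires the very values being defined. A minimal sketch, assuming the dictionary encoding from the earlier sketch:

```python
def bellman_backup(s, gd, actions, cost, p, goal_states):
    """One application of the Bellman equation at state s."""
    if s in goal_states:
        return 0.0
    return min(
        cost[(s, a)] + sum(prob * gd[s2] for s2, prob in p[(s, a)].items())
        for a in actions[s]
    )

def satisfies_bellman(gd, actions, cost, p, goal_states, eps=1e-9):
    """Check that gd is a fixed point of the Bellman equation everywhere."""
    return all(
        abs(gd[s] - bellman_backup(s, gd, actions, cost, p, goal_states)) <= eps
        for s in gd
    )
```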

Markov Decision Problems:
gd_i(s) = the minimal expected cost until a goal state is reached or i actions have been executed, if execution starts in state s
gd(s) = lim_{i -> ∞} gd_i(s) (the limit is not necessarily reached after a finite amount of time)

gd_0(s) = 0
gd_i(s) = 0 if s is a goal state
gd_i(s) = min_{a ∈ A(s)} ( c(s, a) + Σ_{s' ∈ succ(s, a)} p(s, a, s') gd_{i-1}(s') ) if s is not a goal state

Value Iteration: maintain approximations of the distances (= values).
1. i := 0.
2. Set (for all s ∈ S) gd_i(s) := 0.
3. i := i + 1.
4. Set (for all s ∈ S): gd_i(s) := 0 if s is a goal state, and gd_i(s) := min_{a ∈ A(s)} ( c(s, a) + Σ_{s' ∈ succ(s, a)} p(s, a, s') gd_{i-1}(s') ) if s is not a goal state.
5. If (for some s ∈ S) |gd_i(s) - gd_{i-1}(s)| > some small constant, go to 3.
6. Set (for all s ∈ S that are not goal states) π(s) := argmin_{a ∈ A(s)} ( c(s, a) + Σ_{s' ∈ succ(s, a)} p(s, a, s') gd_i(s') ).

Example of Value Iteration (no discounting): [Table: the values gd_i(s) over the iterations i = 0, 1, ..., 6 on a small example with transition probabilities 0.5; intermediate values such as 0.5, 0.75, 0.6875 grow toward their limits.] Which action should be executed in the start state? Compare the one-step lookahead values c(s, a) + Σ_{s'} p(s, a, s') gd(s') of the available actions and execute the action with the smaller value.
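The six steps above translate nearly line for line into code. A minimal sketch of value iteration against the dictionary encoding from earlier; `eps` plays the role of the "small constant" in step 5:

```python
def value_iteration(states, actions, cost, p, goal_states, eps=1e-6):
    gd = {s: 0.0 for s in states}                  # steps 1-2: gd_0(s) = 0
    while True:
        gd_new = {}                                # steps 3-4: one backup per state
        for s in states:
            if s in goal_states:
                gd_new[s] = 0.0
            else:
                gd_new[s] = min(
                    cost[(s, a)]
                    + sum(prob * gd[s2] for s2, prob in p[(s, a)].items())
                    for a in actions[s]
                )
        converged = max(abs(gd_new[s] - gd[s]) for s in states) <= eps
        gd = gd_new
        if converged:                              # step 5: stop when stable
            break
    policy = {                                     # step 6: extract greedy policy
        s: min(actions[s], key=lambda a: cost[(s, a)]
               + sum(prob * gd[s2] for s2, prob in p[(s, a)].items()))
        for s in states if s not in goal_states
    }
    return gd, policy
```

On the illustrative two-state MDP sketched earlier, `value_iteration({"s1", "s2", "goal"}, actions, cost, p, goal_states)` returns the expected goal distances together with the greedy policy.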

Policy Iteration: maintain a policy.
1. i := 0.
2. Set (for all s ∈ S that are not goal states) π_i(s) to an arbitrary action in A(s).
3. Set gd_i(s) to the average plan-execution time until a goal state is reached if the agent starts in state s and follows policy π_i.
4. i := i + 1.
5. Set (for all s ∈ S that are not goal states) π_i(s) := argmin_{a ∈ A(s)} ( c(s, a) + Σ_{s' ∈ succ(s, a)} p(s, a, s') gd_{i-1}(s') ).
6. If (for some s ∈ S that is not a goal state) π_i(s) does not equal π_{i-1}(s), go to 3.
7. Set (for all s ∈ S that are not goal states) π(s) := π_i(s).
Note: the initial policy π_0 has to guarantee that the agent reaches a goal state with probability one, no matter which state it is started in.

Example of Policy Iteration (no discounting): on the small example with transition probabilities 0.5, the initial policy (its action in the start state could have been chosen either way) is evaluated by solving linear equations such as gd_0(s1) = 1 + 0.5 gd_0(s1) + 0.5 gd_0(s2); the policy is then improved greedily, re-evaluated, and after a few iterations it no longer changes, at which point its action in the start state is executed. A runnable sketch follows after this slide.
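A minimal sketch of policy iteration, using a linear solve (as in the average plan-execution-time computation earlier) for the policy-evaluation step; it assumes the dictionary encoding from before and, as the slide's note requires, an initial policy that reaches a goal state with probability one:

```python
import numpy as np

def policy_iteration(states, actions, cost, p, goal_states):
    nongoal = [s for s in states if s not in goal_states]
    idx = {s: i for i, s in enumerate(nongoal)}
    policy = {s: actions[s][0] for s in nongoal}   # step 2: arbitrary actions
    while True:
        # step 3: evaluate the policy by solving (I - P) t = c
        P = np.zeros((len(nongoal), len(nongoal)))
        c_vec = np.zeros(len(nongoal))
        for s in nongoal:
            a = policy[s]
            c_vec[idx[s]] = cost[(s, a)]
            for s2, prob in p[(s, a)].items():
                if s2 not in goal_states:
                    P[idx[s], idx[s2]] += prob
        t = np.linalg.solve(np.eye(len(nongoal)) - P, c_vec)
        gd = {s: t[idx[s]] for s in nongoal}
        gd.update({s: 0.0 for s in goal_states})
        # step 5: greedy improvement; keep the old action on ties so that
        # the loop terminates (step 6)
        changed = False
        for s in nongoal:
            def q(a):
                return cost[(s, a)] + sum(
                    prob * gd[s2] for s2, prob in p[(s, a)].items())
            best = min(actions[s], key=q)
            if q(best) < q(policy[s]) - 1e-12:
                policy[s] = best
                changed = True
        if not changed:                            # step 7: return final policy
            return gd, policy
```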

Extension: no goal states. What if there is no goal state ("living in the world")? One can then no longer minimize the expected cost until a goal state is reached: the expected total cost is infinite for every policy. Instead one can
- minimize the expected cost per action execution, or
- minimize the expected total discounted cost.

The total discounted cost of a cost sequence c_0, c_1, c_2, ... is c_0 + γ c_1 + γ² c_2 + ..., where γ is the discount factor (for example γ = 0.9); one can minimize the expected total discounted cost.

Interpretation via interest: if the interest rate is (1-γ)/γ (for 0 < γ < 1), how much money do I need to pay someone right now so that there is no difference to paying one dollar in each of the following yearly installments? x dollars right now are worth (1 + (1-γ)/γ) x = x/γ dollars in a year, so y dollars in a year are worth γ y dollars right now. Answer: 1 + γ + γ² + γ³ + γ⁴ + ... dollars.

- Discounting makes the total cost finite: for a constant per-step cost c, the expected total discounted cost is c + γ c + γ² c + ... = c / (1-γ).
- Discounting smoothes out the horizon.
- Discounting can be interpreted as the probability of dying: if I die later this year with probability 1-γ, then the expected value of y dollars in a year is γ y right now, matching the interest interpretation above.

Definitions with discounting:
γ = the discount factor (0 < γ < 1); if there is a goal state, one can set γ = 1 (no discounting)
s, a, A(s), succ(s, a), c(s, a), p(s, a, s') = as before
gd(s) = the minimal expected total discounted cost if execution starts in state s

gd(s) = 0 if s is a goal state
gd(s) = min_{a ∈ A(s)} ( c(s, a) + γ Σ_{s' ∈ succ(s, a)} p(s, a, s') gd(s') ) if s is not a goal state

π(s) = the optimal action to execute in state s:
π(s) = argmin_{a ∈ A(s)} ( c(s, a) + γ Σ_{s' ∈ succ(s, a)} p(s, a, s') gd(s') ) if s is not a goal state

gd_i(s) = the minimal expected total discounted cost until a goal state is reached or i actions have been executed, if execution starts in state s:
gd_0(s) = 0
gd_i(s) = 0 if s is a goal state
gd_i(s) = min_{a ∈ A(s)} ( c(s, a) + γ Σ_{s' ∈ succ(s, a)} p(s, a, s') gd_{i-1}(s') ) if s is not a goal state
gd(s) = lim_{i -> ∞} gd_i(s)

Value Iteration works with or without discounting: gd(s) does not necessarily converge after a finite amount of time, but π(s) converges after a finite amount of time if gd(s) is approximated with gd_i(s) for all s.
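The finiteness claim is just the geometric series: with a constant per-step cost c, the total discounted cost is c (1 + γ + γ² + ...) = c / (1-γ). A quick numerical check; the values of γ and c are illustrative:

```python
gamma, c = 0.9, 1.0
total = sum(c * gamma ** k for k in range(10_000))  # truncated infinite sum
print(total, c / (1 - gamma))  # both print (approximately) 10.0
```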

Example of Value Iteration (discount factor γ = 0.9): [Table: the value-iteration trace repeated on the same small example with transition probabilities 0.5; the discounted values, e.g. 0.9, 0.855, ..., converge over the iterations i.] Which action should be executed in the start state? Again compare the discounted one-step lookahead values c(s, a) + γ Σ_{s'} p(s, a, s') gd(s') and execute the action with the smaller value. In general, the optimal action depends on the discount factor!

Learning for optimization: reinforcement learning with Markov Decision Process models. Find a policy (a behavior) that maximizes the expected total discounted reward, even in the presence of delayed rewards. If you don't know the action outcomes (the rewards and the probabilities p(s, a, s')): reinforcement learning.

Approach 1: estimate the probabilities and the rewards from experience, then use Value Iteration on the estimated model. This raises the exploration/exploitation tradeoff: the agent has to keep executing actions whose outcomes it is still uncertain about, while also exploiting the model it has learned so far.
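Approach 1 can be sketched directly: count the observed transitions, average the observed costs, and turn them into maximum-likelihood estimates of p(s, a, s') and c(s, a), on which value iteration can then be run. A minimal sketch; the `observe` interface is a hypothetical stand-in for interaction with the environment:

```python
from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
cost_sum = defaultdict(float)                   # (s, a) -> summed observed cost
tries = defaultdict(int)                        # (s, a) -> number of executions

def observe(s, a, c, s2):
    """Record one executed action with its observed cost and successor."""
    counts[(s, a)][s2] += 1
    cost_sum[(s, a)] += c
    tries[(s, a)] += 1

def estimated_model(s, a):
    """Maximum-likelihood estimates of p(s, a, .) and c(s, a)."""
    n = tries[(s, a)]
    p_hat = {s2: k / n for s2, k in counts[(s, a)].items()}
    c_hat = cost_sum[(s, a)] / n
    return p_hat, c_hat
```

A common way to handle the exploration/exploitation tradeoff on top of this is ε-greedy action selection: act greedily with respect to the estimated model most of the time, but with small probability ε execute some other action, so that every state-action pair keeps being sampled.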

Approach 2: use Q-learning. If you execute action a in state s, receive cost c, and make a transition to state s', then update

Q(s, a) := Q(s, a) + α ( c + γ V(s') - Q(s, a) )

where α is the learning rate, γ is the discount factor (0 < γ < 1), and V(s') = min_{a' ∈ A(s')} Q(s', a').

Q(s, a) = the minimal expected total discounted cost until a goal state is reached, if execution starts in state s and the first action executed is a
V(s) = the minimal expected total discounted cost until a goal state is reached, if execution starts in state s (= gd(s))

Q-learning:
1. Initialize Q(s, a) := 0 for all states s and actions a.
2. s := the current state.
3. If s is a goal state, then stop.
4. Choose an action a to execute in the current state. (The action believed to be best is a := argmin_{a ∈ A(s)} Q(s, a).)
5. Execute action a. Observe the cost c and the successor state s'.
6. Update Q(s, a) := Q(s, a) + α ( c + γ V(s') - Q(s, a) ).
7. Go to 2.

[Figure: example Q-values learned on the small MDP with transition probabilities 0.5.]
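The seven steps are only a few lines of code. A minimal sketch of tabular Q-learning with ε-greedy action selection; the `step(s, a) -> (cost, successor)` environment interface and the episodic restart from `start` are assumptions, not part of the slide:

```python
import random
from collections import defaultdict

def q_learning(start, actions, goal_states, step,
               alpha=0.1, gamma=0.9, epsilon=0.1, episodes=1000):
    Q = defaultdict(float)                     # step 1: Q(s, a) = 0 everywhere
    for _ in range(episodes):
        s = start                              # step 2: the current state
        while s not in goal_states:            # step 3: stop at a goal state
            if random.random() < epsilon:      # step 4: mostly the action
                a = random.choice(actions[s])  #   believed best, sometimes a
            else:                              #   random one (exploration)
                a = min(actions[s], key=lambda a2: Q[(s, a2)])
            c, s2 = step(s, a)                 # step 5: execute and observe
            v = 0.0 if s2 in goal_states else min(Q[(s2, a2)] for a2 in actions[s2])
            Q[(s, a)] += alpha * (c + gamma * v - Q[(s, a)])  # step 6: update
            s = s2                             # step 7: continue from s'
    return Q
```

Note that costs are minimized here, so the greedy action is the argmin over Q(s, a), matching step 4 of the slide.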