CS294-40 Learning for Robotics and Control          Lecture 10, 9/30/2008
Lecturer: Pieter Abbeel          Partially Observable Systems          Scribe: David Nachum

Lecture outline
- POMDP formalism
- Point-based value iteration
- Global methods: polytree, enumeration, filtering, witness

1 Partially Observable Markov Decision Process (POMDP) Formalism

A Partially Observable Markov Decision Process (POMDP) is a tuple (S, A, T, R, γ, O, Ω), where

1. S is the set of possible states for the system
2. A is the set of possible actions
3. T represents the system dynamics
4. R : S × A → R is the reward function
5. O is the set of possible observations we can make
6. Ω is the observation distribution P(O | S)

We can convert a POMDP into a belief state MDP. The belief state space B is the (|S| − 1)-dimensional simplex whose elements are probability distributions over states:

    b(s) = Prob(s | all information available at the current time)

The transition model now describes transitions between beliefs, rather than transitions between states. We define b_a^o to be the belief state reached when starting in belief state b, taking action a, and observing o. Hence,

    b_a^o(s') = P(o_{t+1} = o | s_{t+1} = s') Σ_s P(s_{t+1} = s' | s_t = s, a_t = a) b(s)
                / Σ_{s''} P(o_{t+1} = o | s_{t+1} = s'') Σ_s P(s_{t+1} = s'' | s_t = s, a_t = a) b(s)
              = P(o | s') Σ_s P(s' | s, a) b(s) / P(o | b, a)

As we iterate through time, the belief states are updated with this rule, where the normalizer is

    P(o | b, a) = Σ_{s'} P(o_{t+1} = o | s_{t+1} = s') Σ_s P(s' | s, a) b(s)

Our policies now map beliefs to actions: π : B → A
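The belief update is just a few lines of array arithmetic. Below is a minimal sketch (not from the lecture), assuming NumPy and a tabular model stored as T[a][s, s'] = P(s' | s, a) and Z[s', o] = P(o | s'):

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Compute b_a^o from belief b after taking action a and observing o.

    b : (|S|,) belief vector
    T : (|A|, |S|, |S|) with T[a][s, s'] = P(s' | s, a)
    Z : (|S|, |O|) with Z[s', o] = P(o | s')
    Returns (b_a^o, P(o | b, a)).
    """
    predicted = b @ T[a]          # sum_s P(s' | s, a) b(s), shape (|S|,)
    unnorm = Z[:, o] * predicted  # numerator: P(o | s') * predicted(s')
    p_o = unnorm.sum()            # normalizer P(o | b, a)
    return unnorm / p_o, p_o
```

Each update costs O(|S|²), and the returned normalizer P(o | b, a) is the same quantity that weights the observation branches in the Bellman back-up.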

and we compute the value of a policy as follows:

    V^π(b) = E[ Σ_{t=0}^∞ γ^t R(b_t, a_t) | b_0 = b; π ]

Bellman back-ups for a POMDP:

    V(b) ← max_{a ∈ A} [ Σ_{s ∈ S} R(s, a) b(s) + γ Σ_o P(o | b, a) V(b_a^o) ]

2 Point-based value iteration

A practical issue that arises is that, even when our state space is discrete and finite, the belief state space (a (|S| − 1)-dimensional simplex) is continuous and hence has infinitely many states. One solution: use function approximation, e.g., grid out the belief state space and use nearest neighbor as the function approximator. Here we study another solution: it turns out that for any finite horizon, the value function of a belief state MDP can be represented by the max over a set of linear functions. Concretely, in the first iteration of our value iteration we have:

    V_1(b) = max_{a ∈ A} Σ_{s ∈ S} R(s, a) b(s) = max_{α^(0) ∈ {α_a^(0)}} α^(0)ᵀ b

This remains true for arbitrary n:

    V_n(b) = max_{α^(n) ∈ {α_i^(n)}} α^(n)ᵀ b

We define r_a = [R(1, a), R(2, a), ..., R(|S|, a)]ᵀ.

We iteratively update the value of the belief state:

    V_{n+1}(b) = max_a [ r_aᵀ b + γ Σ_o P(o | b, a) V_n(b_a^o) ]
               = max_a [ r_aᵀ b + γ Σ_o P(o | b, a) max_{α^(n) ∈ {α_i^(n)}} α^(n)ᵀ b_a^o ]
               = max_a [ r_aᵀ b + γ Σ_o P(o | b, a) max_{α^(n) ∈ {α_i^(n)}} Σ_{s'} α^(n)(s') b_a^o(s') ]
               = max_a [ r_aᵀ b + γ Σ_o max_{α^(n) ∈ {α_i^(n)}} Σ_{s'} α^(n)(s') P(o | s') Σ_s P(s' | s, a) b(s) ]
               = max_a [ r_aᵀ b + γ Σ_o max_{g_{a,o}^(n) ∈ {g_{a,o,i}^(n)}} g_{a,o}^(n)ᵀ b ]

where

    g_{a,o,i}^(n)(s) = Σ_{s'} P(o | s') P(s' | s, a) α_i^(n)(s')

(In the fourth step, P(o | b, a) cancels against the denominator of b_a^o, which is what makes each term linear in b.)

To do the backup for b: ∀a, o, compute g_{a,o}^b = argmax_{g ∈ {g_{a,o,i}^(n)}} gᵀ b according to the equation above, and let

    g_a^b = r_a + γ Σ_o g_{a,o}^b
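The derivation above turns the backup at a fixed belief b into a search over finitely many g-vectors. Here is a sketch of that backup (names are illustrative, not from the notes), assuming NumPy and tabular arrays R[a, s] = R(s, a), T[a][s, s'] = P(s' | s, a), Z[s', o] = P(o | s'):

```python
import numpy as np

def point_based_backup(b, alphas, R, T, Z, gamma):
    """Exact Bellman backup at a single belief point b.

    alphas : list of (|S|,) alpha-vectors representing V_n
    Returns the new alpha-vector alpha^(n+1) that is tight at b.
    """
    nA, nS = R.shape
    nO = Z.shape[1]
    best_alpha, best_val = None, -np.inf
    for a in range(nA):
        g_a = R[a].astype(float).copy()          # start from r_a
        for o in range(nO):
            # g_{a,o,i}(s) = sum_{s'} P(o | s') P(s' | s, a) alpha_i(s')
            gs = [T[a] @ (Z[:, o] * alpha) for alpha in alphas]
            g_a += gamma * max(gs, key=lambda g: g @ b)
        val = g_a @ b
        if val > best_val:
            best_alpha, best_val = g_a, val
    return best_alpha
```

Running this backup at every belief in a chosen point set, and collecting the resulting α-vectors, is one iteration of point-based value iteration.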

This yields, for b:

    V_{n+1}(b) = α^(n+1)ᵀ b,    α^(n+1) = argmax_{g ∈ {g_a^b}} gᵀ b

Point-based value iteration is efficient, but inexact due to the discrete choice of belief states. However, there are clever ways to pick the points:

1. Pineau, Gordon, Thrun: Point-Based Value Iteration. Begin at some initial belief. Then pick belief points by forward simulation and prune by distance.

2. Vlassis, Spaan: Perseus. In every iteration, only do the Bellman back-up for a point if the back-ups of other belief points have not yet increased that point's value function. [Assumes you initialize the value function with a lower bound.] This ensures the value function increases for every belief point in every iteration. It works very well in practice.

3 Global methods

Consider a policy with horizon H in a POMDP. We can represent this as a policy tree (assuming the policy is deterministic). We can compute the value of being at state s and following policy tree p:

    V_p^H(s) = R(s, p(s)) + γ Σ_{s' ∈ S} P(s' | s, p(s)) Σ_o P(o | s') V_{p̂_o}^{H−1}(s')

where p̂_o is the subtree of p followed after observing o. If we do not know what state we are in, but do know what belief state we are in, we compute:

    V_p^H(b) = Σ_s b(s) V_p^H(s)

For the optimal (within horizon H) tree, we define

    V^H(b) = max_p V_p^H(b)
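The horizon-H recursion over policy trees can be evaluated bottom-up. A sketch under assumed conventions (a tree is a pair (action, {observation: subtree}), with R[a, s] = R(s, a), T[a][s, s'] = P(s' | s, a), Z[s', o] = P(o | s'); these names are illustrative):

```python
import numpy as np

def tree_value(p, R, T, Z, gamma):
    """Return the vector V_p(s) for policy tree p, computed bottom-up."""
    a, children = p
    V = R[a].astype(float).copy()
    for o, subtree in children.items():
        Vsub = tree_value(subtree, R, T, Z, gamma)
        # gamma * sum_{s'} P(s' | s, a) P(o | s') V_subtree(s')
        V += gamma * (T[a] @ (Z[:, o] * Vsub))
    return V

def tree_value_belief(p, b, R, T, Z, gamma):
    """V_p(b) = sum_s b(s) V_p(s)."""
    return b @ tree_value(p, R, T, Z, gamma)
```

Note that V_p(s) is itself an α-vector: the value of a fixed tree is linear in the belief, which is why the sets V_t below consist of trees (equivalently, their α-vectors).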

Figure 1: Policy tree illustration.

3.1 Global value iteration for the belief state MDP (Algorithm)

    t = 1
    V_1 = set of 1-step policy trees
    Loop:
      1. t++
      2. Compute V_t^+, the set of possibly useful t-step policy trees, from V_{t−1}
      3. Prune/filter V_t^+ to get V_t, the set of useful t-step policy trees
    Until sup_b |V_t(b) − V_{t−1}(b)| < ε

3.2 Basic filtering

∀j, solve:

    max_{b, t}  t
    s.t.  α_jᵀ b ≥ α_iᵀ b + t    ∀i ≠ j
          b ≥ 0
          1ᵀ b = 1

If for j we have that the solution t > 0, then policy tree j is useful: there is a belief point for which policy tree j is the optimal policy. Otherwise, we prune out policy tree j.
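The filtering LP can be handed to an off-the-shelf solver. A sketch using scipy.optimize.linprog (which minimizes, so we maximize t by minimizing -t); the function name and tolerance are illustrative, not from the notes:

```python
import numpy as np
from scipy.optimize import linprog

def is_useful(j, alphas, tol=1e-9):
    """Basic filtering: is alpha_j strictly best at some belief?

    Solves  max_{b,t} t  s.t.  alpha_j^T b >= alpha_i^T b + t (i != j),
            b >= 0, 1^T b = 1,
    and reports whether the optimal t is strictly positive.
    """
    alphas = np.asarray(alphas, dtype=float)
    m, n = alphas.shape
    c = np.zeros(n + 1)
    c[-1] = -1.0                                    # maximize t = minimize -t
    # (alpha_i - alpha_j)^T b + t <= 0  for all i != j
    A_ub = [np.append(alphas[i] - alphas[j], 1.0) for i in range(m) if i != j]
    A_eq = [np.append(np.ones(n), 0.0)]             # beliefs sum to 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(m - 1), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return bool(res.success and -res.fun > tol)
```

For example, with α-vectors [1, 0], [0, 1], and [0.4, 0.4] over two states, the first two are each optimal near a corner of the simplex, while [0.4, 0.4] is dominated everywhere (max(b_0, b_1) ≥ 0.5) and gets pruned.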

3.3 Lark's filtering

Incrementally build up the set of useful α. This way the size of the LP scales with |V_t| rather than |V_t^+|. For j = 1, 2, ..., |V_t^+|:

    max_{b, t}  t
    s.t.  α_jᵀ b ≥ α_kᵀ b + t    ∀α_k ∈ V_t
          b ≥ 0
          1ᵀ b = 1

If t > 0, find max_{j' ∈ {1, 2, ..., |V_t^+|}} bᵀ α_{j'} and add argmax_{j' ∈ {1, 2, ..., |V_t^+|}} bᵀ α_{j'} to V_t.

3.4 Witness algorithm

In the witness algorithm, we try to avoid constructing V_t^+. We do this by only adding a tree if it is optimal for some belief. It is sufficient to check this per subtree: Is there a belief state we can reach after action a and observation o such that the policy tree p ∈ V_{t−1} would be optimal? This problem can be solved by linear programming.
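Lark's filtering from Section 3.3 admits the same LP machinery, run incrementally: keep a set of confirmed-useful vectors, and whenever candidate j wins at some belief, add whichever candidate is truly best at that witness belief. The sketch below is one possible reading (the seeding rule and all names are assumptions, not from the notes; SciPy assumed):

```python
import numpy as np
from scipy.optimize import linprog

def larks_filter(candidates):
    """Incrementally build the useful set V_t from the candidate set V_t^+."""
    candidates = [np.asarray(a, dtype=float) for a in candidates]
    n = len(candidates[0])
    # Seed with the candidate that is best at one corner of the simplex.
    useful = [max(candidates, key=lambda a: a[0])]
    for alpha_j in candidates:
        # LP: max t  s.t.  alpha_j^T b >= alpha_k^T b + t  for all alpha_k in useful
        c = np.zeros(n + 1)
        c[-1] = -1.0
        A_ub = [np.append(a_k - alpha_j, 1.0) for a_k in useful]
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(useful)),
                      A_eq=[np.append(np.ones(n), 0.0)], b_eq=[1.0],
                      bounds=[(0, None)] * n + [(None, None)])
        if res.success and -res.fun > 1e-9:
            witness = res.x[:n]
            # Add the candidate that is actually best at the witness belief.
            useful.append(max(candidates, key=lambda a: a @ witness))
    return useful
```

Each LP only has |useful| inequality constraints, which is the point: the work scales with the size of the final pruned set rather than with |V_t^+|.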