Markov Decision Processes


1 Markov Decision Processes: A Brief Introduction and Overview. Jack L. King, Ph.D., Genoa UK Limited

2 Presentation Outline: Introduction to MDPs (motivation for study, definitions, key points of interest, solution techniques); Partially Observable MDPs (motivation, solution techniques); Graphical Models (description, application techniques).

3 Introduction to Markov Decision Processes

4 Sequential Decision Processes: In a sequential decision process, a series of decisions is made, each resulting in a reward and a new situation. The history of the situation is used in making the decision. In a Markov decision process, at each time period t the system state s provides the decision maker with all the information necessary for choosing an action a. As a result of choosing an action, the decision maker receives a reward r and the system evolves to a (possibly different) state s' with probability p.

5 Applications: inventory, maintenance, service, queuing, pricing, robot guidance, risk management.

6 Key Points of Interest:
1. Is there a policy that a decision maker can use to choose actions that yield the maximum reward available?
2. Can such a policy, if it exists, be computed in finite time (that is, is it computationally feasible)?
3. Are there certain choices of optimality criterion or structure for the basic model that significantly impact 1 and 2?

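7 Definitions:
S: a set of possible world states.
A: a set of possible actions.
R: a real-valued reward function.
T: a description of each action's effect in each state, $T: S \times A \to \mathrm{Prob}(S)$. Each state and action specifies a new transition probability distribution for the next state.
π: a policy, a mapping from S to A, $\pi = \{d_t\}$, where $d_t$ is the decision rule used at time t.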

8 How to evaluate a policy?
Expected total reward: leads to infinite values.
Set a finite horizon: somewhat arbitrary.
Discounted reward: the most studied and implemented. It gives more weight to earlier rewards, its interpretation in economics is clear, and it can also be used as a general stopping criterion.

9 Discounted Value Function: Rewards are discounted for each period:

$$V = R_0 + \rho R_1 + \rho^2 R_2 + \cdots + \rho^T R_T, \qquad 0 < \rho < 1.$$

The value function partially orders the policies, so at least one optimal policy exists. We are generally interested in stationary policies, where the decision rules $d_t$ are the same for all time periods.
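
As a small illustration of the discounted sum above, the sketch below evaluates $R_0 + \rho R_1 + \cdots + \rho^T R_T$ for a finite list of rewards. The function name `discounted_value` is hypothetical; it simply nests the sum as $R_0 + \rho(R_1 + \rho(R_2 + \cdots))$.

```python
def discounted_value(rewards, rho):
    """Return R_0 + rho*R_1 + rho**2*R_2 + ... for a finite reward sequence."""
    if not 0 < rho < 1:
        raise ValueError("discount factor rho must satisfy 0 < rho < 1")
    total = 0.0
    # Evaluate from the last reward backwards: V = R_0 + rho*(R_1 + rho*(R_2 + ...)).
    for reward in reversed(rewards):
        total = reward + rho * total
    return total


# Three periods of reward 1 with rho = 0.5: 1 + 0.5 + 0.25
print(discounted_value([1, 1, 1], 0.5))  # 1.75
```

With a constant reward R over a long horizon the sum approaches R / (1 - ρ), which is why discounting keeps infinite-horizon values finite.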

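10 Solving MDPs. Enumerated value function:

$$V_t^{\pi}(s) = r\big(s, d_t(s)\big) + \rho \sum_{s' \in S} \tau\big(s, d_t(s), s'\big)\, V_{t+1}^{\pi}(s')$$

Dynamic programming:

$$V_n^{\pi}(s) = r\big(s, d_n(s)\big) + \rho \sum_{s' \in S} \tau\big(s, d_n(s), s'\big)\, V_{n-1}^{\pi}(s'), \qquad V_0^{\pi}(s) = 0 \text{ for all } s,$$

where s is the starting state with n steps remaining and $\tau(s, a, s')$ is the probability of moving from s to s' under action a.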
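
The dynamic-programming recursion can be sketched directly, assuming a stationary policy (the same decision rule d for every n) and dictionaries for the model: `r[s][a]` for rewards, `tau[s][a]` mapping next states to probabilities, and `policy[s]` for d(s). All names and the two-state chain are hypothetical.

```python
def evaluate_policy(states, r, tau, policy, rho, n):
    """V_n^pi(s): expected discounted reward of a stationary policy with n steps remaining."""
    v = {s: 0.0 for s in states}  # V_0^pi(s) = 0 for all s
    for _ in range(n):
        # The right-hand side is built entirely from the previous v (that is, V_{n-1}^pi).
        v = {
            s: r[s][policy[s]]
            + rho * sum(p * v[s2] for s2, p in tau[s][policy[s]].items())
            for s in states
        }
    return v


# Hypothetical chain: "a" pays 1 and moves to "b"; "b" pays 0 and moves back to "a".
states = ["a", "b"]
r = {"a": {"go": 1.0}, "b": {"go": 0.0}}
tau = {"a": {"go": {"b": 1.0}}, "b": {"go": {"a": 1.0}}}
policy = {"a": "go", "b": "go"}
print(evaluate_policy(states, r, tau, policy, rho=0.5, n=2))  # {'a': 1.0, 'b': 0.5}
```

Each iteration costs one pass over every state and its successors, so evaluating a fixed policy for n steps is linear in n.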

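11 Optimal Value Function for MDPs: The optimal value function can be used to find the maximum-reward policy:

$$V_n^{*}(s) = \max_{a \in A}\Big[r(s,a) + \rho \sum_{s' \in S} \tau(s,a,s')\, V_{n-1}^{*}(s')\Big]$$

Policy iteration is an alternative. The optimal decision rule selects the maximizing action:

$$d_n^{*}(s) = \arg\max_{a \in A}\Big[r(s,a) + \rho \sum_{s' \in S} \tau(s,a,s')\, V_{n-1}^{*}(s')\Big]$$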
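
A minimal sketch of the Bellman recursion for $V_n^*$ together with the argmax decision rule $d_n^*$, using the same hypothetical dictionary layout (`r[s][a]`, `tau[s][a]` as next-state probabilities). The "stay or quit" model is invented purely for illustration.

```python
def value_iteration(states, actions, r, tau, rho, n):
    """Return V_n^*(s) and the maximizing decision rule d_n^*(s) after n Bellman backups."""
    v = {s: 0.0 for s in states}  # V_0^*(s) = 0
    decision = {}
    for _ in range(n):
        new_v = {}
        for s in states:
            # Q-value of each action: immediate reward plus discounted expected V_{n-1}^*.
            q = {a: r[s][a] + rho * sum(p * v[s2] for s2, p in tau[s][a].items())
                 for a in actions}
            decision[s] = max(q, key=q.get)  # argmax over actions
            new_v[s] = q[decision[s]]
        v = new_v
    return v, decision


# Hypothetical toy: in "s", "stay" pays 1 and stays; "quit" pays 2 and moves to absorbing "end".
states = ["s", "end"]
actions = ["stay", "quit"]
r = {"s": {"stay": 1.0, "quit": 2.0}, "end": {"stay": 0.0, "quit": 0.0}}
tau = {"s": {"stay": {"s": 1.0}, "quit": {"end": 1.0}},
       "end": {"stay": {"end": 1.0}, "quit": {"end": 1.0}}}
for n in (1, 2, 50):
    v, d = value_iteration(states, actions, r, tau, rho=0.9, n=n)
    print(n, round(v["s"], 3), d["s"])
```

With one step remaining the best choice is "quit" (reward 2); from two steps on, "stay" wins because 1 + 0.9 × 2 = 2.8 > 2. For ρ < 1 the decision rule settles into a stationary policy as n grows.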

12 MDP Summary: MDPs are an important class of sequential decision processes. There are well-known algorithms for their solution: the Bellman equation with dynamic programming. Solutions are at best polynomial in actions and states, O(n^3), when converted to an LP. They require a detailed exposition of actions and states, and they become intractable and unmanageable very quickly for interesting problems.
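
The O(n^3) figure above refers to an LP formulation. As a related illustration (not the LP itself), the sketch below implements policy iteration, where each evaluation step solves the |S| × |S| linear system $(I - \rho P_\pi)\,v = r_\pi$ by Gaussian elimination, which is cubic in the number of states. Names and the toy model are hypothetical.

```python
def solve_linear(m_in, b_in):
    """Solve M x = b by Gaussian elimination with partial pivoting (O(n^3) operations)."""
    n = len(b_in)
    m = [row[:] + [b_in[i]] for i, row in enumerate(m_in)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def policy_iteration(states, actions, r, tau, rho):
    """Alternate exact policy evaluation and greedy improvement until the policy is stable."""
    idx = {s: i for i, s in enumerate(states)}
    policy = {s: actions[0] for s in states}  # arbitrary starting policy
    while True:
        # Evaluation: solve (I - rho * P_pi) v = r_pi; the matrix is nonsingular for 0 <= rho < 1.
        size = len(states)
        mat = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
        for s in states:
            for s2, p in tau[s][policy[s]].items():
                mat[idx[s]][idx[s2]] -= rho * p
        sol = solve_linear(mat, [r[s][policy[s]] for s in states])
        v = {s: sol[idx[s]] for s in states}
        # Improvement: switch to a strictly better action wherever one exists.
        stable = True
        for s in states:
            q = {a: r[s][a] + rho * sum(p * v[s2] for s2, p in tau[s][a].items())
                 for a in actions}
            best = max(q, key=q.get)
            if q[best] > q[policy[s]] + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


# Hypothetical toy: "stay" pays 1 forever, "quit" pays 2 once; start from "quit" everywhere.
states = ["s", "end"]
actions = ["quit", "stay"]
r = {"s": {"stay": 1.0, "quit": 2.0}, "end": {"stay": 0.0, "quit": 0.0}}
tau = {"s": {"stay": {"s": 1.0}, "quit": {"end": 1.0}},
       "end": {"stay": {"end": 1.0}, "quit": {"end": 1.0}}}
policy, v = policy_iteration(states, actions, r, tau, rho=0.9)
print(policy["s"], round(v["s"], 6))  # stay 10.0
```

Policy iteration usually needs few improvement rounds, but each round pays the cubic cost of the linear solve; for large state spaces that cost is what makes exact methods unmanageable.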

13 MDP References: Bellman, Dynamic Programming (1957): seminal work in dynamic programming and value functions (the Bellman equation). Cormen et al., Introduction to Algorithms (1990): good description of algorithms for dynamic programming and greedy search (policy iteration). Ross, Applied Probability Models with Optimization Applications: introduction to optimization and MDPs; well written with good examples.

14 Example MDP: Should I refresh the browser when the network is up or down?
[Figure: state diagram with states Up and Down, actions Refresh and Don't Refresh, and rewards 100, 0, and 70]

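15 Example Calculation
[Figure: the Up/Down state diagram with a State / Action / Utility table listing Up-Refresh, Up-Don't Refresh, Down-Refresh, and Down-Don't Refresh]

$$\begin{aligned}
Q(\text{Up}, \text{refresh}) &= 70 + 0.9\,V^*(\text{Up}) + 0.1\,V^*(\text{Down})\\
Q(\text{Up}, \text{don't}) &= 100 + 0.9\,V^*(\text{Up}) + 0.1\,V^*(\text{Down})\\
Q(\text{Down}, \text{refresh}) &= 70 + 0.3\,V^*(\text{Down}) + 0.7\,V^*(\text{Up})\\
Q(\text{Down}, \text{don't}) &= 0 + 0.8\,V^*(\text{Down}) + 0.2\,V^*(\text{Up})\\
V^*(\text{Up}) &= Q(\text{Up}, \text{don't})\\
V^*(\text{Down}) &= Q(\text{Down}, \text{refresh})
\end{aligned}$$

The policy is to refresh when the network is down.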
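
The example can be checked numerically. The sketch below assumes the rewards shown in the figure (100 for not refreshing when up, 0 for not refreshing when down, 70 for refreshing) and adds a discount factor ρ = 0.9, which is an assumption: the slide's Q equations show no discount, and without one the infinite-horizon values would diverge.

```python
# Rewards follow the slide's figure; RHO = 0.9 is an assumed discount factor.
RHO = 0.9
STATES = ["up", "down"]
ACTIONS = ["refresh", "dont"]
R = {"up": {"refresh": 70, "dont": 100},
     "down": {"refresh": 70, "dont": 0}}
TAU = {"up": {"refresh": {"up": 0.9, "down": 0.1},
              "dont": {"up": 0.9, "down": 0.1}},
       "down": {"refresh": {"up": 0.7, "down": 0.3},
                "dont": {"up": 0.2, "down": 0.8}}}


def q_values(v):
    """Q(s, a) = r(s, a) + RHO * sum over s' of tau(s, a, s') V(s')."""
    return {s: {a: R[s][a] + RHO * sum(p * v[s2] for s2, p in TAU[s][a].items())
                for a in ACTIONS} for s in STATES}


def solve(iterations=500):
    v = {s: 0.0 for s in STATES}
    for _ in range(iterations):
        q = q_values(v)
        v = {s: max(q[s].values()) for s in STATES}
    q = q_values(v)
    policy = {s: max(q[s], key=q[s].get) for s in STATES}
    return v, policy


v, policy = solve()
print(policy)  # {'up': 'dont', 'down': 'refresh'}
```

The conclusion does not depend on the assumed ρ: when up, both actions share the same transitions, so "don't" always wins by 30; when down, refreshing wins by 70 + 0.5ρ(V*(Up) − V*(Down)), which is positive because the up state is worth more.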

16 Partially Observable MDPs: POMDPs address the problem that you don't normally know whether the network is functioning well (e.g., up or down). You only observe the response: you see text and pictures appear, and the status bar display.

17 Solving POMDPs: Convert to an MDP. Add a set of observations O. Add a probability distribution U_o for each state. Add an initial state distribution I. States are now called belief states and are updated using Bayes' rule. Then solve the POMDP as you would an MDP over the belief-state probability distribution.
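
A minimal sketch of the Bayes-rule belief update, assuming the observation probability depends only on the state reached: $b'(s') \propto \Pr(z \mid s') \sum_s \tau(s, a, s')\, b(s)$. The function name, the "wait" action, and the observation probabilities are hypothetical.

```python
def belief_update(belief, action, obs, tau, obs_prob):
    """Bayes-rule update of a belief state after taking `action` and observing `obs`."""
    states = list(belief)
    unnormalized = {}
    for s2 in states:
        # Predicted probability of landing in s2 before seeing the observation.
        predicted = sum(belief[s] * tau[s][action].get(s2, 0.0) for s in states)
        unnormalized[s2] = obs_prob[s2].get(obs, 0.0) * predicted
    norm = sum(unnormalized.values())  # Pr(obs | belief, action)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return {s2: p / norm for s2, p in unnormalized.items()}


# Hypothetical network model: "wait" leaves the state unchanged; pages load more often when up.
TAU = {"up": {"wait": {"up": 1.0}}, "down": {"wait": {"down": 1.0}}}
OBS = {"up": {"page_loads": 0.9, "no_page": 0.1},
       "down": {"page_loads": 0.2, "no_page": 0.8}}
b = belief_update({"up": 0.5, "down": 0.5}, "wait", "page_loads", TAU, OBS)
print(b)  # b["up"] = 0.45 / 0.55, about 0.818
```

The normalizing constant is exactly Pr(z | b, a), the quantity that weights each observation branch when the POMDP is solved as an MDP over beliefs.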

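18 POMDP Equations

$$o(s', z) = \Pr(z \mid s'), \qquad z \in Z$$

relates states to an observation in Z using the probability distribution O.

$$\omega(b, a) = \sum_{s \in S} b(s)\, r(s, a)$$

is the reward function over belief states.

$$\psi(b, a, b') = \sum_{z \in Z:\; b_z^a = b'} \Pr(z \mid b, a), \qquad \Pr(z \mid b, a) = \sum_{s \in S} \sum_{s' \in S} b(s)\, \tau(s, a, s')\, o(s', z)$$

is the successor-state transition matrix, where $b_z^a$ is the belief obtained from b by Bayes' rule after taking action a and observing z.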

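19 DP Equation for POMDPs

$$V_n^{*}(b) = \max_{a \in A}\Big[\omega(b, a) + \rho \sum_{b' \in B} \psi(b, a, b')\, V_{n-1}^{*}(b')\Big]$$

Explicit form of the DP equation for POMDPs, giving the optimal value function:

$$V_n^{*}(b) = \max_{a \in A}\Big[\sum_{s \in S} b(s)\, r(s, a) + \rho \sum_{z \in Z} \sum_{s \in S} \sum_{s' \in S} b(s)\, \tau(s, a, s')\, o(s', z)\, V_{n-1}^{*}(b_z^a)\Big]$$

where $b_z^a$ is the belief after taking action a and observing z.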
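
For a tiny model, the explicit DP equation can be evaluated exactly by expanding every action and observation branch. This is a sketch, assuming dictionaries `r[s][a]`, `tau[s][a][s2]`, and `obs_prob[s2][z]`; the observation probabilities in the usage example are hypothetical. The tree has $(|A||Z|)^n$ leaves, which mirrors the exponential cost noted in the summary.

```python
def pomdp_value(belief, n, actions, observations, r, tau, obs_prob, rho):
    """Exact V_n^*(b) for a small POMDP by expanding every action/observation branch."""
    if n == 0:
        return 0.0  # V_0^* = 0
    states = list(belief)
    best = float("-inf")
    for a in actions:
        value = sum(belief[s] * r[s][a] for s in states)  # omega(b, a)
        # Predicted next-state distribution before the observation arrives.
        predicted = {s2: sum(belief[s] * tau[s][a].get(s2, 0.0) for s in states)
                     for s2 in states}
        for z in observations:
            joint = {s2: obs_prob[s2][z] * predicted[s2] for s2 in states}
            pz = sum(joint.values())  # Pr(z | b, a)
            if pz > 0.0:
                next_belief = {s2: p / pz for s2, p in joint.items()}  # b_z^a via Bayes' rule
                value += rho * pz * pomdp_value(next_belief, n - 1, actions,
                                                observations, r, tau, obs_prob, rho)
        best = max(best, value)
    return best


# Browser example as a POMDP; the observation model is a hypothetical assumption.
R = {"up": {"refresh": 70, "dont": 100}, "down": {"refresh": 70, "dont": 0}}
TAU = {"up": {"refresh": {"up": 0.9, "down": 0.1}, "dont": {"up": 0.9, "down": 0.1}},
       "down": {"refresh": {"up": 0.7, "down": 0.3}, "dont": {"up": 0.2, "down": 0.8}}}
OBS = {"up": {"loads": 0.9, "stalls": 0.1}, "down": {"loads": 0.2, "stalls": 0.8}}
for n in (1, 2, 3):
    print(n, pomdp_value({"up": 0.5, "down": 0.5}, n, ["refresh", "dont"],
                         ["loads", "stalls"], R, TAU, OBS, 0.9))
```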

20 Solving POMDPs: POMDP solution approaches must be able to deal with the continuous space of belief states. Exact solutions: LP-based (O(n^x)), enumeration, pruning, witness. Approximate solutions: heuristics, grid-based, stochastic simulation.
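
To make the grid-based idea concrete, here is a sketch for a POMDP with only two hidden states, where a belief is summarized by p = Pr(state 0). It runs value iteration at evenly spaced beliefs and linearly interpolates between grid points; this is one simple variant of grid-based approximation, and all names, the grid size, and the observation probabilities are assumptions.

```python
def grid_value_iteration(r, tau, obs_prob, rho, grid_size=20, iterations=200):
    """Approximate POMDP values on a uniform belief grid (two hidden states only).

    States are 0 and 1, and a belief is summarized by p = Pr(state 0).
    r[s][a]: reward; tau[s][a][s2]: transition probability; obs_prob[s2][z]: Pr(z | s2).
    """
    actions = range(len(r[0]))
    observations = range(len(obs_prob[0]))
    grid = [k / grid_size for k in range(grid_size + 1)]
    values = [0.0] * len(grid)

    def interpolate(p):
        # Value between grid points is approximated by linear interpolation.
        x = min(max(p, 0.0), 1.0) * grid_size
        k = min(int(x), grid_size - 1)
        w = x - k
        return (1.0 - w) * values[k] + w * values[k + 1]

    def backup(p, a):
        # One Bellman backup at belief p for action a (explicit POMDP DP form).
        b = (p, 1.0 - p)
        total = b[0] * r[0][a] + b[1] * r[1][a]
        predicted = [b[0] * tau[0][a][s2] + b[1] * tau[1][a][s2] for s2 in (0, 1)]
        for z in observations:
            joint = [obs_prob[s2][z] * predicted[s2] for s2 in (0, 1)]
            pz = joint[0] + joint[1]  # Pr(z | b, a)
            if pz > 0.0:
                total += rho * pz * interpolate(joint[0] / pz)
        return total

    for _ in range(iterations):
        values = [max(backup(p, a) for a in actions) for p in grid]
    return grid, values


# Hypothetical browser model: state 0 = up, 1 = down; action 0 = refresh, 1 = don't;
# observation 0 = page loads, 1 = it stalls.
R = [[70, 100], [70, 0]]
TAU = [[[0.9, 0.1], [0.9, 0.1]], [[0.7, 0.3], [0.2, 0.8]]]
OBS = [[0.9, 0.1], [0.2, 0.8]]
grid, values = grid_value_iteration(R, TAU, OBS, rho=0.9)
print(values[0], values[len(grid) // 2], values[-1])
```

The interpolation is what turns the continuous belief space into a finite computation; a finer grid improves accuracy at the cost of more backups.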

21 POMDP Summary: A POMDP is a variation of the MDP for when state information is not explicitly known. Convert it to an MDP using belief states and solve an MDP over the belief states. The problem is now continuous, and solution algorithms for MDPs no longer work. Exact solutions are at best exponential in actions and states when converted to an LP. They require a detailed exposition of actions and belief states along with starting values, and they become intractable and unmanageable very quickly for interesting problems.

22 POMDP References: Cassandra, Tony: POMDP website at Brown (tutorial, examples); see his Ph.D. thesis for a nice survey of POMDP algorithms. Sondik (1971), Optimal Control of Partially Observable Markov Processes, Ph.D. thesis, Stanford: early paper on exact solutions to POMDPs. Ross, Applied Probability Models with Optimization Applications: introduction to optimization, MDPs, and POMDPs; well written with good examples.

23 Graphical Models: Main idea: nodes represent variables, and arcs are conditional dependencies. The joint probability for the model is factored using conditional probabilities between variables. The model is structured as a directed acyclic graph (DAG). Advantages: the factorization corresponds to data availability, and probabilities can be built using historical frequencies. Independence between variables is exploited, resulting in a more compact representation. Efficient message-passing algorithms can be used, resulting in tractability for interesting problems.
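
The factorization can be sketched as $P(x_1, \ldots, x_n) = \prod_i P(x_i \mid \mathrm{parents}(x_i))$. The example below uses a hypothetical two-node DAG (Weather → Forecast) with invented probabilities, and answers a query by brute-force enumeration of the factored joint. That is a deliberately simple stand-in for the message-passing algorithms the slide mentions, which avoid enumerating every assignment.

```python
from itertools import product

# Hypothetical DAG and conditional probability tables (illustrative numbers only).
PARENTS = {"weather": [], "forecast": ["weather"]}
DOMAINS = {"weather": ["rain", "no_rain"], "forecast": ["sunny", "cloudy", "rainy"]}
CPT = {
    "weather": {(): {"rain": 0.3, "no_rain": 0.7}},
    "forecast": {
        ("rain",): {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
        ("no_rain",): {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    },
}


def joint_probability(assignment):
    """P(x_1, ..., x_n) as the product of P(x_i | parents(x_i))."""
    prob = 1.0
    for var, parents in PARENTS.items():
        parent_values = tuple(assignment[p] for p in parents)
        prob *= CPT[var][parent_values][assignment[var]]
    return prob


def posterior(query, evidence):
    """P(query | evidence) by enumerating every full assignment consistent with the evidence."""
    variables = list(DOMAINS)
    scores = {}
    for values in product(*(DOMAINS[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(assignment[k] == val for k, val in evidence.items()):
            key = assignment[query]
            scores[key] = scores.get(key, 0.0) + joint_probability(assignment)
    total = sum(scores.values())
    return {value: p / total for value, p in scores.items()}


print(posterior("weather", {"forecast": "rainy"}))  # P(rain | rainy) = 0.18 / 0.25 = 0.72
```

Only 2 + 6 conditional probabilities are stored instead of a full 6-entry joint table here; the savings grow quickly as more conditionally independent variables are added.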

24 Graphical Models: Nodes represent the variables in the model, and the links show which variables depend on others.
[Figure: decision network with nodes Forecast (Sunny, Cloudy, Rainy), Weather (No Rain, Rain), Decide_Umbrella (Take It, Leave At Home), and Satisfaction]
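
The umbrella network can be read as a small decision problem: observe Forecast, infer Weather by Bayes' rule, and pick the Decide_Umbrella option with the highest expected Satisfaction. The prior, forecast model, and utilities below are hypothetical numbers chosen only for illustration.

```python
# Hypothetical numbers throughout: prior, forecast model, and utilities are illustrative.
P_WEATHER = {"rain": 0.3, "no_rain": 0.7}
P_FORECAST = {  # P(Forecast | Weather)
    "rain": {"sunny": 0.15, "cloudy": 0.25, "rainy": 0.6},
    "no_rain": {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
}
SATISFACTION = {  # U(Weather, Decide_Umbrella)
    ("rain", "take_it"): 70, ("rain", "leave_at_home"): 0,
    ("no_rain", "take_it"): 20, ("no_rain", "leave_at_home"): 100,
}
DECISIONS = ["take_it", "leave_at_home"]


def best_decision(forecast):
    """Return the decision with maximum expected Satisfaction, plus all expected utilities."""
    # Bayes' rule: P(weather | forecast) is proportional to P(forecast | weather) P(weather).
    joint = {w: P_FORECAST[w][forecast] * P_WEATHER[w] for w in P_WEATHER}
    total = sum(joint.values())
    post = {w: p / total for w, p in joint.items()}
    expected = {d: sum(post[w] * SATISFACTION[(w, d)] for w in post) for d in DECISIONS}
    return max(expected, key=expected.get), expected


for f in ["sunny", "cloudy", "rainy"]:
    print(f, best_decision(f)[0])
# sunny leave_at_home
# cloudy leave_at_home
# rainy take_it
```

With these numbers a rainy forecast raises P(rain) to 0.72, so taking the umbrella (expected Satisfaction 56) beats leaving it at home (28).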

25 Graphical Model References: Pearl (1987), Probabilistic Reasoning in Expert Systems: early work on building graphical models. Lauritzen (1996), Graphical Models: general description of graphical models and their relationship to Markov models; very thorough. King (2001), Operational Risk: Measurement and Modelling: example of the application of graphical models to risk management.

26 Summary: Markov decision processes are an effective way to address complex problems that deal with uncertainty. POMDPs address the problem of observing the state of the process. Graphical models address the problem of representation and tractability of the model.
