Dennis Bricker, 2001 Dept of Industrial Engineering The University of Iowa. MDP: Taxi page 1



2 A taxi serves three adjacent towns: A, B, and C. Each time the taxi discharges a passenger, the driver must choose from three possible actions:
(1) "Cruise" the streets looking for a passenger.
(2) Go to the nearest taxi stand (hotel, train station, etc.).
(3) Wait for a radio call from the dispatcher with instructions (not possible in town B because of distance and poor reception).

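3 MDP model:
States: {A, B, C}
Action sets: K_A = {1,2,3}, K_B = {1,2}, K_C = {1,2,3} (action 3, the radio call, is unavailable in town B)
Transition probability matrices: $P^1$ (cruising streets), $P^2$ (waiting at taxi stand), $P^3$ (waiting for dispatch). (matrix entries not preserved)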

4 Payoff matrices (expected profit per passenger):
$R_{ij}^k$ = expected profit if action k is selected and the passenger wishes to travel from town i to town j.
$R^1$ (cruising streets), $R^2$ (waiting at taxi stand), $R^3$ (dispatch call). (matrix entries not preserved)
Since our model assumes minimization of cost, we use
$C_i^k = -\sum_j p_{ij}^k R_{ij}^k$
Note: This example was introduced by Ron Howard in his textbook, Dynamic Programming and Markov Processes, MIT Press (1960), in which no consideration was given to the variable amount of time per stage (trip) in the optimization model.
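The cost definition above can be checked numerically. The sketch below is only illustrative: the transition probabilities and payoffs are the numbers usually quoted for Howard's taxicab example, which are an assumption here because the matrices did not survive in this transcript. The names `P`, `R`, and `expected_cost` are introduced for illustration.

```python
from fractions import Fraction as F

# ASSUMPTION: data as usually quoted for Howard's taxicab example; the
# matrices are not present in the text above. Keys are (town, action);
# each list is ordered by destination town A, B, C.
P = {
    ("A", 1): [F(1, 2), F(1, 4), F(1, 4)],
    ("A", 2): [F(1, 16), F(3, 4), F(3, 16)],
    ("A", 3): [F(1, 4), F(1, 8), F(5, 8)],
    ("B", 1): [F(1, 2), F(0), F(1, 2)],
    ("B", 2): [F(1, 16), F(7, 8), F(1, 16)],
    ("C", 1): [F(1, 4), F(1, 4), F(1, 2)],
    ("C", 2): [F(1, 8), F(3, 4), F(1, 8)],
    ("C", 3): [F(3, 4), F(1, 16), F(3, 16)],
}
R = {
    ("A", 1): [10, 4, 8], ("A", 2): [8, 2, 4], ("A", 3): [4, 6, 4],
    ("B", 1): [14, 0, 18], ("B", 2): [8, 16, 8],
    ("C", 1): [10, 2, 8], ("C", 2): [6, 4, 2], ("C", 3): [4, 0, 8],
}


def expected_cost(town, action):
    """C_i^k = -sum_j p_ij^k * R_ij^k (cost is the negative expected payoff)."""
    key = (town, action)
    return -sum(p * r for p, r in zip(P[key], R[key]))


costs = {key: expected_cost(*key) for key in P}
for (town, action), cost in sorted(costs.items()):
    print(f"town {town}, action {action}: cost {float(cost):.2f}")
```

Exact fractions are used so that the row sums and costs can be compared without rounding error; there is no ("B", 3) entry because the radio call is infeasible in town B.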

5 States: 1 = town A, 2 = town B, 3 = town C.
Actions: k = 1 CRUISE, 2 TAXISTAND, 3 RADIO CALL.
Cost matrix (rows ~ states, columns ~ actions): (entries not preserved)
A value of 999 in the cost matrix signals an infeasible action in a state. Note that the algorithm assumes minimization, and so the "cost" is the negative of the expected payoff!

6 Transition probabilities (from town i, to town j), one matrix per action: CRUISE, TAXISTAND, RADIO CALL. (matrix entries not preserved)

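7 Let's first use the criterion: maximize average reward per trip.
LP tableau for the MDP (columns indexed by action k and state i, plus RHS; objective row "Min"): (tableau entries not preserved)
Note that one of the "steady-state" equations (for state C) was eliminated because of redundancy.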

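8 Minimize $\sum_i \sum_{k \in K_i} C_i^k x_i^k$
subject to
$\sum_i \sum_{k \in K_i} x_i^k = 1$
$\sum_{k \in K_j} x_j^k = \sum_i \sum_{k \in K_i} p_{ij}^k x_i^k$ for all states j
$x_i^k \ge 0$ for all states i and actions $k \in K_i$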

9 A Phase One procedure was used to find an initial basic feasible solution.
Iteration 0. Policy (Cost = -8); the P{i} and R{i} columns were not preserved:
1) town A: 3) RADIO CALL
2) town B: 1) CRUISE
3) town C: 2) TAXISTAND

10 (tableau entries not preserved)
Initially the basic variables are {$X_A^3$, $X_B^1$, $X_C^2$} (exactly one per state). The values of these variables are the steady-state probabilities of the Markov chain corresponding to the policy (3, 1, 2). The "most negative" reduced cost is -6 (of variable $X_B^2$), and so that variable should enter the basis, replacing $X_B^1$. (The pivot element is indicated in the tableau.)
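The claim that the basic variables equal the steady-state probabilities of the chain induced by policy (3, 1, 2) can be illustrated with a small exact computation. This is a sketch under the assumption that the data are those usually quoted for Howard's taxicab example (they are not in this transcript); `stationary` is a hypothetical helper, not part of the original material.

```python
from fractions import Fraction as F

# ASSUMPTION: rows of Howard's taxicab data for policy (3, 1, 2):
# town A uses RADIO CALL, town B uses CRUISE, town C uses TAXISTAND.
P_POLICY = [
    [F(1, 4), F(1, 8), F(5, 8)],   # town A, action 3
    [F(1, 2), F(0), F(1, 2)],      # town B, action 1
    [F(1, 8), F(3, 4), F(1, 8)],   # town C, action 2
]
COST = [F(-17, 4), F(-16), F(-4)]  # negative expected payoff per trip


def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 exactly (irreducible chain assumed)."""
    n = len(P)
    # Balance equations for all states but the last (that one is redundant),
    # followed by the normalization row; the last column is the right-hand side.
    A = [[F(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):  # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot_value = A[col][col]
        A[col] = [a / pivot_value for a in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


pi = stationary(P_POLICY)
avg_cost = sum(p * c for p, c in zip(pi, COST))
print("steady-state probabilities:", [str(p) for p in pi])
print("average cost per trip:", avg_cost)
```

With these assumed numbers the average cost works out to exactly -8, which is consistent in magnitude with the cost quoted for the initial policy. Dropping one balance equation mirrors the redundancy noted for the LP tableau.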

11 Iteration 1. Policy (cost value, P{i} and R{i} columns, and tableau not preserved):
1) town A: 3) RADIO CALL
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND
The next pivot should enter $X_A^2$ into the basis, replacing $X_A^3$.

12 Iteration 2. Policy (cost value, P{i} and R{i} columns, and tableau not preserved):
1) town A: 2) TAXISTAND
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND
All reduced costs are nonnegative!

13 Optimal Policy (P{i} and R{i} columns not preserved):
1) town A: 2) TAXISTAND
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND
Average cost/stage = (value not preserved)


16 Value Iteration Method (Note: objective: maximize average reward per passenger)
We want to compute $\lim_{n \to \infty} f_n(i)/n$, where
$f_n(i) = \min_{k \in K_i} \{ C_i^k + \sum_j p_{ij}^k f_{n-1}(j) \}$
Since $\lim_{n \to \infty} f_n(i)/n$ should be independent of the state i, our convergence criterion is to compute
$\Delta f_n(i) = f_n(i) - f_{n-1}(i)$
and terminate when
$\max_i \{\Delta f_n(i)\} - \min_i \{\Delta f_n(i)\} \le \varepsilon$
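The recursion and stopping rule above can be sketched in a few lines of Python. This is a minimal illustration, not the program that produced the slides: the data are assumed to be the numbers usually quoted for Howard's taxicab example (absent from this transcript), and `value_iteration_average` is a hypothetical name. The average cost is estimated as the midpoint of the largest and smallest one-step differences.

```python
# ASSUMPTION: Howard's taxicab data. P[i][k][j] is the probability of moving
# from state i to state j under the (k+1)-th action; C[i][k] is the expected
# cost (negative payoff). Town B (index 1) has no radio-call action.
P = [
    [[0.5, 0.25, 0.25], [0.0625, 0.75, 0.1875], [0.25, 0.125, 0.625]],
    [[0.5, 0.0, 0.5], [0.0625, 0.875, 0.0625]],
    [[0.25, 0.25, 0.5], [0.125, 0.75, 0.125], [0.75, 0.0625, 0.1875]],
]
C = [[-8.0, -2.75, -4.25], [-16.0, -15.0], [-7.0, -4.0, -4.5]]


def value_iteration_average(P, C, eps=1e-6, max_iter=10_000):
    """Iterate f_n(i) = min_k {C_i^k + sum_j p_ij^k f_{n-1}(j)} until the
    span of f_n - f_{n-1} is at most eps."""
    n = len(P)
    f = [0.0] * n
    for iteration in range(1, max_iter + 1):
        new_f, policy = [], []
        for i in range(n):
            vals = [C[i][k] + sum(p * fj for p, fj in zip(P[i][k], f))
                    for k in range(len(C[i]))]
            best = min(range(len(vals)), key=vals.__getitem__)
            policy.append(best + 1)  # 1-based action labels, as in the slides
            new_f.append(vals[best])
        delta = [a - b for a, b in zip(new_f, f)]
        f = new_f
        if max(delta) - min(delta) <= eps:
            # Midpoint of the bounds is the estimate of the average cost.
            return policy, (max(delta) + min(delta)) / 2, iteration
    raise RuntimeError("value iteration did not converge")


policy, gain, iterations = value_iteration_average(P, C)
print("policy:", policy, "average cost per trip:", gain,
      "iterations:", iterations)
```

Because the undiscounted values grow linearly with n, only the differences are tested for convergence; the values themselves are never expected to settle.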

17 Tolerance: 1.00E-6. Minimizing average cost/period.
Iteration log with columns: iteration, Max V, Min V, gap (%). (numeric entries not preserved)
***Converged! (gap value not preserved)
Solution: state, action, Value. (entries not preserved)

18 Plot of $f_n(i) - f_{n-1}(i)$ (figure not preserved)

19 Plot of $\log_{10}\left[\max_i\{f_n(i) - f_{n-1}(i)\} - \min_i\{f_n(i) - f_{n-1}(i)\}\right]$ (figure not preserved)

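20 Next we will solve the problem with the objective of maximizing the total discounted payoff.
Criterion: Discounted Total Cost, with $\beta = (1.2)^{-1} \approx 0.8333$.
(LP tableau entries not preserved)
Note: The initial conditions are specified to be deterministic, with town A as the initial state.
Minimize $\sum_i \sum_{k \in K_i} C_i^k X_i^k$
subject to
$\sum_{k \in K_j} X_j^k - \beta \sum_i \sum_{k \in K_i} p_{ij}^k X_i^k = \alpha_j$ for all states j
$X_i^k \ge 0$ for all states i and actions $k \in K_i$
where $\alpha$ is the initial state distribution ($\alpha_A = 1$, $\alpha_B = \alpha_C = 0$).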

21 Note that X is not a probability in this model, and so the equation $\sum X = 1$ is not included in the LP tableau. There is one equation for each state (not including the objective row), with no redundancy as there was in the average cost/stage LP model. The total number of basic variables is therefore, as before, equal to the number of states, and, as before, a basic feasible solution has one basic variable per state.
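To see why X is not a probability here, one can compute the X values for a fixed policy: they are discounted expected visit counts and sum to $1/(1-\beta)$. The sketch below does this for policy (1, 2, 3) with town A as the initial state. The data are assumed to be the numbers usually quoted for Howard's taxicab example (not present in this transcript), and `solve` is a hypothetical helper.

```python
from fractions import Fraction as F

BETA = 1 / F(12, 10)  # beta = (1.2)^-1 = 5/6

# ASSUMPTION: rows of Howard's taxicab data for policy (1, 2, 3):
# town A CRUISE, town B TAXISTAND, town C RADIO CALL.
P_POLICY = [
    [F(1, 2), F(1, 4), F(1, 4)],
    [F(1, 16), F(7, 8), F(1, 16)],
    [F(3, 4), F(1, 16), F(3, 16)],
]
COST = [F(-8), F(-15), F(-9, 2)]   # negative expected payoff per trip
ALPHA = [F(1), F(0), F(0)]         # start in town A with certainty


def solve(M, b):
    """Gauss-Jordan elimination with exact fractions; returns x with M x = b."""
    n = len(M)
    A = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot_value = A[col][col]
        A[col] = [a / pivot_value for a in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b_ for a, b_ in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


n = len(P_POLICY)
# Row j encodes X_j - beta * sum_i p_ij X_i = alpha_j.
M = [[(1 if i == j else 0) - BETA * P_POLICY[i][j] for i in range(n)]
     for j in range(n)]
X = solve(M, ALPHA)
discounted_cost = sum(c * x for c, x in zip(COST, X))
print("X:", [float(x) for x in X], "sum:", sum(X))
print("discounted cost from town A:", float(discounted_cost))
```

The X values sum to 6 rather than 1. With these assumed numbers the discounted cost is about -61.4, which agrees in magnitude with the cost quoted for the iteration that uses policy (1, 2, 3).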

22 A Phase One procedure was used to find an initial basic feasible solution.
Iteration 0. Policy (cost value, X{i} and V{i} columns, and tableau not preserved; i ~ state, k ~ action):
1) town A: 1) CRUISE
2) town B: 1) CRUISE
3) town C: 3) RADIO CALL

23 Iteration 1. Policy (Cost = -61.4); X{i} and V{i} columns and tableau not preserved; i ~ state, k ~ action:
1) town A: 1) CRUISE
2) town B: 2) TAXISTAND
3) town C: 3) RADIO CALL

24 Iteration 2. Policy (cost value, X{i} and V{i} columns, and tableau not preserved; i ~ state, k ~ action):
1) town A: 1) CRUISE
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND

25 Iteration 3. Policy (cost value, X{i} and V{i} columns, and tableau not preserved; i ~ state, k ~ action):
1) town A: 2) TAXISTAND
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND
Reduced costs are now nonnegative!

26 Optimal Policy (X{i}, V{i}, and alpha{i} columns not preserved):
1) town A: 2) TAXISTAND
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND
Alpha is the initial distribution of the state.
Discounted future costs = (value not preserved)


29 Value Iteration Method. Tolerance: 1.00E-6. Minimizing discounted future costs.
Iteration log with columns: iteration, Max V, Min V, gap (%). (numeric entries not preserved)
***Converged! (gap value not preserved)

30 Solution: state, action, Value. (entries not preserved)

31 Solving again. Tolerance: 1.00E-12. Reduced the tolerance!
Iteration log with columns: iteration, Max V, Min V, gap (%). (numeric entries not preserved)
***Converged! with gap = 4.794E-11%

32 Solution: state, action, Value. (entries not preserved)
Note: The policy is the same as in the earlier run with the larger tolerance, but the objective value is nearer to the true value.
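The effect of tightening the tolerance can be reproduced with a short discounted value-iteration sketch. It is only an illustration: the data are assumed to be the numbers usually quoted for Howard's taxicab example (not present in this transcript), the stopping rule is a plain maximum-change test rather than the percentage gap reported in the slides, and `value_iteration_discounted` is a hypothetical name.

```python
BETA = 1 / 1.2  # discount factor from the slides

# ASSUMPTION: Howard's taxicab data. P[i][k][j] is the probability of moving
# from state i to state j under the (k+1)-th action; C[i][k] is the expected
# cost (negative payoff). Town B (index 1) has no radio-call action.
P = [
    [[0.5, 0.25, 0.25], [0.0625, 0.75, 0.1875], [0.25, 0.125, 0.625]],
    [[0.5, 0.0, 0.5], [0.0625, 0.875, 0.0625]],
    [[0.25, 0.25, 0.5], [0.125, 0.75, 0.125], [0.75, 0.0625, 0.1875]],
]
C = [[-8.0, -2.75, -4.25], [-16.0, -15.0], [-7.0, -4.0, -4.5]]


def value_iteration_discounted(P, C, beta, tol, max_iter=100_000):
    """Iterate v(i) = min_k {C_i^k + beta * sum_j p_ij^k v(j)} until the
    largest change in any state is at most tol."""
    n = len(P)
    v = [0.0] * n
    for iteration in range(1, max_iter + 1):
        new_v, policy = [], []
        for i in range(n):
            vals = [C[i][k] + beta * sum(p * vj for p, vj in zip(P[i][k], v))
                    for k in range(len(C[i]))]
            best = min(range(len(vals)), key=vals.__getitem__)
            policy.append(best + 1)  # 1-based action labels, as in the slides
            new_v.append(vals[best])
        change = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if change <= tol:
            return policy, v, iteration
    raise RuntimeError("value iteration did not converge")


loose_policy, loose_v, loose_iters = value_iteration_discounted(P, C, BETA, 1e-6)
tight_policy, tight_v, tight_iters = value_iteration_discounted(P, C, BETA, 1e-12)
print("tol 1e-6 :", loose_policy, loose_v[0], loose_iters)
print("tol 1e-12:", tight_policy, tight_v[0], tight_iters)
```

Both runs should select the same policy; the tighter tolerance only costs extra iterations and refines the value estimates.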


34 Because the duration (in minutes) of a stage (trip) will depend upon the policy which we select, the objective of maximizing the reward per trip is inappropriate; we should instead maximize the reward per unit time. This requires that we treat this as a Semi-Markov Decision Process (SMDP).

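35 Suppose we have the additional data:
Expected time to obtain a passenger: $W_i^k$ = expected waiting time (minutes) in town i when action k is selected.
Table of $W_i^k$, rows: town A, B, C; columns: Cruising, Taxi stand, Dispatch call. (entries not preserved)
Expected travel time between towns: $T_{ij}$ = expected travel time (minutes) from town i to town j.
Table of $T_{ij}$, rows: town A, B, C; columns: town A, B, C. (entries not preserved)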

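36 Expected travel time (minutes) of a trip = $\sum_j p_{ij}^k T_{ij}$.
Table, rows: town A, B, C; columns: Cruise, Taxi-stand, Radio call. (entries not preserved)
$v_i^k = E\tau_i^k$ = expected total duration of a trip (waiting + traveling) when in town i if action k is selected, i.e.,
$E\tau_i^k = W_i^k + \sum_j p_{ij}^k T_{ij}$
Table, rows: town A, B, C; columns: Cruise, Taxi-stand, Radio call. (entries not preserved)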

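37 Average reward per minute for the optimal policy (2,2,2) found by the MDP:
$\frac{cx}{vx} = \frac{\text{average reward/trip}}{\text{average duration of trip}}$
where $cx = \sum_i \sum_k C_i^k x_i^k$ and $vx = \sum_i \sum_k v_i^k x_i^k$. (the dollar and minute values were not preserved)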
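The ratio above can be sketched for a fixed policy as follows. Two caveats apply: the transition and payoff data are assumed to be the numbers usually quoted for Howard's taxicab example, and the trip durations in `DURATION` are purely hypothetical placeholders, since the waiting-time and travel-time tables did not survive in this transcript. The helper names are illustrative as well.

```python
from fractions import Fraction as F

# ASSUMPTION: Howard's taxicab data for policy (2, 2, 2), TAXISTAND everywhere.
P_POLICY = [
    [F(1, 16), F(3, 4), F(3, 16)],
    [F(1, 16), F(7, 8), F(1, 16)],
    [F(1, 8), F(3, 4), F(1, 8)],
]
PAYOFF = [F(11, 4), F(15), F(4)]   # expected payoff per trip, by town
# HYPOTHETICAL expected trip durations in minutes (waiting + traveling);
# the real values are not available in the text.
DURATION = [F(20), F(30), F(25)]


def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 exactly (irreducible chain assumed)."""
    n = len(P)
    A = [[F(P[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):  # Gauss-Jordan elimination
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        pivot_value = A[col][col]
        A[col] = [a / pivot_value for a in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


def reward_per_minute(P, payoff, duration):
    """(average reward per trip) / (average duration of a trip)."""
    pi = stationary(P)
    per_trip = sum(p * q for p, q in zip(pi, payoff))
    minutes = sum(p * t for p, t in zip(pi, duration))
    return per_trip / minutes


rate = reward_per_minute(P_POLICY, PAYOFF, DURATION)
print("reward per trip:", float(reward_per_minute(P_POLICY, PAYOFF, [1, 1, 1])))
print("reward per minute (hypothetical durations):", float(rate))
```

Setting every duration to 1 recovers the reward per trip, which is a convenient sanity check before substituting real duration data.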

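38 If we treat this as a Semi-Markov Decision Process (SMDP), then we can find the policy which maximizes our reward per minute by solving the LP:
Minimize $\sum_i \sum_{k \in K_i} C_i^k u_i^k$
subject to
$\sum_i \sum_{k \in K_i} v_i^k u_i^k = 1$
$\sum_{k \in K_j} u_j^k = \sum_i \sum_{k \in K_i} p_{ij}^k u_i^k$ for all states j
$u_i^k \ge 0$ for all states i and actions $k \in K_i$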

39 (LP tableau entries not preserved)
Note that this tableau differs from that of the LP for the MDP only in the last row!

40 Optimal Policy (U{i} column not preserved):
1) town A: 1) CRUISE
2) town B: 2) TAXISTAND
3) town C: 2) TAXISTAND
Average cost/unit time = (value not preserved)
The optimal policy (1,2,2) of the SMDP is different in town A, and the average reward per minute is slightly larger than that of the earlier policy (2,2,2). (The dollar amounts were not preserved.)
(In reality, of course, the infinite horizon is a much more problematic assumption in this particular problem!)
