When solving an inventory replenishment problem using an MDP model, knowing that the optimal policy is of the form $(s,S)$ can reduce the computational burden. That is, if it is optimal to replenish the inventory when the inventory level is $n$, then it is optimal to replenish when the inventory level is $n-1$.

Likewise, properties of the optimal policy for equipment replacement & maintenance problems can be used to reduce the computation. Under certain reasonable assumptions, if it is optimal to replace a machine of age $n$, then it is optimal to replace an older machine.

MDP - Monotone Optimal Policies, D.L. Bricker
Definition: Suppose the states and actions of an MDP are real variables, and let $a_n^*(\cdot)$ be the optimal decision rule as a function of the state when $n$ periods remain. Then $a_n^*(\cdot)$ is

monotone nondecreasing if $a_n^*(s) \le a_n^*(s+\gamma)$ for each $\gamma > 0$ and $s \in S$

monotone nonincreasing if $a_n^*(s) \ge a_n^*(s+\gamma)$ for each $\gamma > 0$ and $s \in S$
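For a finite state space, the definition can be checked by comparing the actions at adjacent states only, since the inequalities chain together. The following is a minimal sketch under that assumption; the function names and the example rule are hypothetical and are not part of the original notes.

```python
def is_monotone_nondecreasing(rule):
    """True if a(s) <= a(s') whenever s < s', for a rule given as {state: action}."""
    states = sorted(rule)
    return all(rule[s] <= rule[t] for s, t in zip(states, states[1:]))


def is_monotone_nonincreasing(rule):
    """True if a(s) >= a(s') whenever s < s', for a rule given as {state: action}."""
    states = sorted(rule)
    return all(rule[s] >= rule[t] for s, t in zip(states, states[1:]))


# Hypothetical replacement rule: keep (0) at ages 0-2, replace (1) from age 3 on.
replacement_rule = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
print(is_monotone_nondecreasing(replacement_rule))  # True
print(is_monotone_nonincreasing(replacement_rule))  # False
```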
Examples:

Equipment replacement problem, where $s$ = age of the equipment, and $A_s = \{0, 1\}$, where 1 indicates "replace" and 0 indicates "keep". The optimal policy is monotone nondecreasing, since if $a_n^*(s) = 1$ (the optimal decision is to replace equipment at age $s$), then the optimal decision is also "replace" for older pieces of equipment.

Inventory replenishment problem, where $s$ = inventory level and $a \in A_s$ is the inventory level after replenishment, i.e., the order-up-to level. The optimal policy is monotone nonincreasing, since if the optimal decision is to order up to level $s'$ when the current level is $s$, then if the inventory level were greater, one would not order up to a larger quantity. Rather, if the inventory level $s$ exceeds the reorder point, the order-up-to level is also $s$ (no order is placed), while if the inventory level falls below the reorder point, the order-up-to level increases.
Definitions:

A binary decision process (BDP) is a Markov decision process with a finite number of states and action set $A_s = \{0, 1\}$ for each $s \in S$.

A control limit $V_n$ for a BDP is a quantity such that the optimal action is $a_n(s) = 0$ if and only if $s < V_n$.

Examples:

Equipment replacement, where the actions $a_n(s)$ are 0 "keep" and 1 "replace".

Stopping problem, where the actions $a_n(s)$ are 0 "stop" and 1 "continue".

Processor with two rates, where the state is the number of customers waiting for processing and the two actions are "slow" and "fast" processing rates.
Consider the Binary Decision Process where

$$f_n(s) = \min_{a \in \{0,1\}} \Big\{ c_s^a + \beta \sum_{j \in S} p_{sj}^a \, f_{n-1}(j) \Big\}$$

Define the function

$$\gamma_i(s,a) = \sum_{j \ge i} p_{sj}^a$$

i.e., the conditional probability that, given action $a$ is selected in state $s$, the next state of the system is $i$ or greater.
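The recursion and the tail probability $\gamma_i(s,a)$ can be sketched in a few lines. This is an illustrative sketch only: the terminal value $f_0(s) = 0$, the tie-breaking rule in favour of action 0, the function names, and the 3-state data are all assumptions that do not come from the original notes.

```python
def gamma(P_a, s, i):
    """gamma_i(s, a): probability that the next state is i or greater,
    where P_a[s][j] is the transition probability from s to j under action a."""
    return sum(P_a[s][i:])


def backward_induction(c, P, beta, horizon):
    """Return (f, rule) with f[n][s] = f_n(s) and rule[n][s] = a_n(s).

    c[a][s] is the one-period cost and P[a][s][j] the transition probability.
    Assumptions: f_0(s) = 0, and ties are broken in favour of action 0.
    """
    n_states = len(c[0])
    f = [[0.0] * n_states]
    rule = [None]  # no decision is made with 0 periods remaining
    for n in range(1, horizon + 1):
        f_n, a_n = [], []
        for s in range(n_states):
            q = [c[a][s] + beta * sum(p * v for p, v in zip(P[a][s], f[n - 1]))
                 for a in (0, 1)]
            a_n.append(0 if q[0] <= q[1] else 1)
            f_n.append(min(q))
        f.append(f_n)
        rule.append(a_n)
    return f, rule


# Hypothetical 3-state instance.
c = [[1.0, 2.0, 4.0],   # action 0
     [3.0, 3.0, 3.0]]   # action 1
P = [
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],  # action 0
    [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],  # action 1
]
f, rule = backward_induction(c, P, beta=0.9, horizon=3)
print(gamma(P[0], 1, 2))  # 0.5
print(rule[1])            # [0, 0, 1]
```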
Theorem: Suppose

$$0 \le c_{s+1}^1 - c_s^1 \le c_{s+1}^0 - c_s^0 \quad \text{for } s \in S$$

and for each $i$, the functions $\gamma_i(\cdot,0)$, $\gamma_i(\cdot,1)$ & the difference $\gamma_i(\cdot,0) - \gamma_i(\cdot,1)$ are all nondecreasing. Then for each $n$ there is a control limit $V_n$, i.e., an optimal policy is given by

$$a_n(s) = \begin{cases} 0 & \text{if } s < V_n \\ 1 & \text{if } s \ge V_n \end{cases}$$

Reference: Daniel P. Heyman & Matthew J. Sobel, Stochastic Models in Operations Research, Volume II: Stochastic Optimization, McGraw-Hill Book Company, 1984, page 387.
Interpretation:

The condition $0 \le c_{s+1}^1 - c_s^1 \le c_{s+1}^0 - c_s^0$ states that the single-period cost increases as $s$ increases, but that the incremental cost is greater if $a = 0$ than if $a = 1$.

The condition that $\gamma_i(\cdot,0)$ and $\gamma_i(\cdot,1)$ are nondecreasing means that, regardless of the action, large values of the current state $s_n$ tend to be followed by large values of the next state $s_{n+1}$, while the condition that $\gamma_i(\cdot,0) - \gamma_i(\cdot,1)$ is nondecreasing means that this tendency is greater if $a = 0$ than if $a = 1$.
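The two groups of conditions (on the costs and on the tail probabilities) can be verified numerically for a given finite instance. The sketch below is one possible check and not a procedure taken from the reference; the function names, the tolerance, the capped oldest age in the "keep" matrix, and the cost data are hypothetical.

```python
def gamma(P_a, s, i):
    """gamma_i(s, a): probability that the next state is i or greater."""
    return sum(P_a[s][i:])


def satisfies_conditions(c, P, tol=1e-12):
    """Check the cost and gamma conditions for costs c[a][s] and matrices P[a][s][j]."""
    n_states = len(c[0])
    for s in range(n_states - 1):
        d0 = c[0][s + 1] - c[0][s]
        d1 = c[1][s + 1] - c[1][s]
        # need 0 <= increment under a=1 <= increment under a=0
        if d1 < -tol or d1 > d0 + tol:
            return False
    for i in range(n_states):
        for s in range(n_states - 1):
            g0 = gamma(P[0], s + 1, i) - gamma(P[0], s, i)
            g1 = gamma(P[1], s + 1, i) - gamma(P[1], s, i)
            # gamma_i(.,0), gamma_i(.,1) and their difference must not decrease in s
            if g0 < -tol or g1 < -tol or g0 - g1 < -tol:
                return False
    return True


# Hypothetical deterministic aging chain with 4 ages; the oldest age is capped.
n_ages = 4
keep = [[1.0 if j == min(s + 1, n_ages - 1) else 0.0 for j in range(n_ages)]
        for s in range(n_ages)]
replace = [[1.0 if j == 0 else 0.0 for j in range(n_ages)] for s in range(n_ages)]
P = [keep, replace]
good = [[1, 3, 6, 10], [12, 14, 16, 17]]  # increments under a=1 never exceed those under a=0
bad = [[1, 3, 6, 10], [12, 15, 19, 24]]   # increments under a=1 are too large
print(satisfies_conditions(good, P))  # True
print(satisfies_conditions(bad, P))   # False
```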
Example: Equipment replacement problem, where state $s$ = age of the equipment, and the actions are: $a = 1$ denotes "replacement" and $a = 0$ denotes "keep".

Consider first the deterministic problem with no failures, i.e., $p_{s,s+1}^0 = p_{s,0}^1 = 1$, so that $\gamma_i(s,0) = 0$ for $s + 1 < i$ and $\gamma_i(s,0) = 1$ for $s + 1 \ge i$. Furthermore, $\gamma_0(s,1) = 1$ for all $s$, and $\gamma_i(s,1) = 0$ for $i \ge 1$. It follows that the conditions upon the function $\gamma$ in the theorem are satisfied.
Suppose that $c_s^0$ is the operating cost at age $s$ and that $c_s^1$ has the form $(r - L_s)$, where $r$ is the cost of the new piece of equipment and $L_s$ is the salvage value of the replaced equipment. Then the conditions of the theorem require that

$$0 \le L_s - L_{s+1} \le c_{s+1}^0 - c_s^0$$

that is, operating cost increases with age while salvage value decreases, with operating costs rising with age at a rate at least as great as the reduction in salvage value.

Given these reasonable assumptions, the theorem implies existence of an optimal control limit, i.e., it is optimal to replace the equipment when it exceeds a certain age.
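The requirement on the salvage values follows from substituting the replacement cost $c_s^1 = r - L_s$ into the cost condition of the theorem, since the purchase cost $r$ cancels:

```latex
c_{s+1}^1 - c_s^1 = (r - L_{s+1}) - (r - L_s) = L_s - L_{s+1},
\qquad\text{so}\qquad
0 \le c_{s+1}^1 - c_s^1 \le c_{s+1}^0 - c_s^0
\;\Longleftrightarrow\;
0 \le L_s - L_{s+1} \le c_{s+1}^0 - c_s^0 .
```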
Consider now the more realistic case in which random failures may occur: $b_s$ is the probability of failure of a machine of age $s$. Then

$$p_{s,s+1}^0 = 1 - b_s \quad \& \quad p_{s,0}^0 = b_s$$

since failing units are immediately replaced, so that

$$\gamma_i(s,0) = \begin{cases} 0 & \text{if } s+1 < i \\ 1 - b_s & \text{if } 0 < i \le s+1 \\ 1 & \text{if } i = 0 \end{cases}$$

which is not necessarily nondecreasing in $s$, as is required to apply the theorem.
However, "increasing failure rate", i.e., $b_s \le b_{s+1}$, is often a valid assumption, and in order to apply the theorem we re-define the states: Delete state 0 and let state $N$ denote the "highest" state, such that breakdowns cause a transition to this state, from which replacement is mandatory, i.e., $A_N = \{1\}$.
For all $s$, let $p_{s,N}^1 = b_0$ denote the probability that a new item breaks down, where $N$ is the highest (breakdown) state, so that $p_{s,1}^1 = 1 - b_0$. Then $p_{s,N}^0 = b_s$ and $p_{s,s+1}^0 = 1 - b_s$ for $s < N$, so that

$$\gamma_i(s,0) = \begin{cases} 1 & \text{if } i \le s+1 \\ b_s & \text{if } i > s+1 \end{cases}$$

Therefore, $\gamma_i(\cdot,0)$ is nondecreasing if failure rates are nondecreasing ($b_s \le b_{s+1}$ for all $s$). Similarly, $\gamma_i(s,1)$ is a constant with respect to $s$, and so the conditions of the theorem relating to $\gamma$ are satisfied.
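The effect of moving the breakdown state from the bottom to the top of the state ordering can be seen on a small instance. The sketch below builds the "keep" rows both ways and tests whether every tail probability is nondecreasing in the state. The failure rates, the function names, and the capped oldest age in the first model are hypothetical choices for illustration. In the second model the list positions 0, 1, 2, ... stand for the states 1, 2, ..., with the last position being the breakdown state.

```python
def keep_rows_fail_to_zero(b):
    """Ages 0..len(b)-1; a failure at age s (probability b[s]) leads to state 0."""
    top = len(b) - 1
    rows = []
    for s, b_s in enumerate(b):
        row = [0.0] * (top + 1)
        row[0] += b_s
        row[min(s + 1, top)] += 1.0 - b_s  # assumption: the oldest age is capped
        rows.append(row)
    return rows


def keep_rows_fail_to_top(b):
    """Ages 1..len(b) plus a breakdown state stored in the last position;
    b[pos] is the failure probability at age pos + 1."""
    n_states = len(b) + 1
    rows = []
    for pos, b_s in enumerate(b):
        row = [0.0] * n_states
        row[n_states - 1] += b_s    # breakdown: jump to the highest state
        row[pos + 1] += 1.0 - b_s   # survival: age by one period
        rows.append(row)
    return rows


def gamma_nondecreasing_in_s(rows, tol=1e-12):
    """True if sum_{j >= i} rows[s][j] is nondecreasing in s for every i."""
    n_states = len(rows[0])
    return all(
        sum(rows[s + 1][i:]) >= sum(rows[s][i:]) - tol
        for i in range(n_states)
        for s in range(len(rows) - 1)
    )


b = [0.1, 0.2, 0.4]  # hypothetical increasing failure rates
print(gamma_nondecreasing_in_s(keep_rows_fail_to_zero(b)))  # False
print(gamma_nondecreasing_in_s(keep_rows_fail_to_top(b)))   # True
```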
Let $c_s^1$ be defined as before for $s \ne N$ and $c_N^1 = r - L_N$, where $N$ is the highest (breakdown) state. Also, we should increase expected costs under action 0 ("keep") to include the expected replacement costs, i.e.,

$$c_s^0 = \beta \, b_s \,(r - K'_s) + (1 - b_s)\, K_s$$

where $K'_s$ is the salvage value of a broken-down item and $K_s$ is the operating cost of one that has not broken down.
Then $\{c_s^k\}$ is nondecreasing in $s$ if

$$b_s \le b_{s+1}, \quad (1 - b_s) K_s \le (1 - b_{s+1}) K_{s+1}, \quad b_s K'_s \ge b_{s+1} K'_{s+1}, \quad \text{and} \quad L_s \ge L_{s+1}$$

for all $s$. Under these conditions, the theorem implies an optimal control limit policy.
Suppose that the states are defined not by chronological age, but by condition, so that $p_{sj}^0 > 0$ for many $j$ (not merely for $j = s+1$ and the breakdown state $j = N$). In order for the conditions of the theorem to hold, it is necessary that

$$\sum_{j \ge i} p_{sj}^0$$

be nondecreasing in $s$ for each $i$. If $c_s^1 = r - L_s$, then a control limit policy is optimal.