DYNAMIC PROGRAMMING

Dynamic Programming
It is a useful mathematical technique for making a sequence of interrelated decisions.
It is a systematic procedure for determining the optimal combination of decisions.
There is no standard mathematical formulation of the dynamic programming problem.
Knowing when to apply dynamic programming depends largely on experience with its general structure.

João Miguel da Costa Sousa / Alexandra Moutinho

Prototype example
Stagecoach problem: a fortune seeker wants to go from Missouri (A) to California (J) in the mid-19th century.
The journey has 4 stages.
The cost is that of the life insurance policy for a specific route; the lowest cost is equivalent to the safest trip.

Costs
The cost $c_{ij}$ of going from state $i$ to state $j$ is:

        B   C   D
    A   2   4   3

        E   F   G
    B   7   4   6
    C   3   2   4
    D   4   1   5

        H   I
    E   1   4
    F   6   3
    G   3   3

        J
    H   3
    I   4

Problem: which route minimizes the total cost of the policy?

Solving the problem
Note that the greedy approach does not work: its solution A → B → F → I → J has a total cost of 13.
However, e.g., A → D → F is cheaper than A → B → F.
Other possibility: trial and error. This is too much effort even for this simple problem.
Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
It starts from the last stage of the problem and enlarges it one stage at a time.

Formulation
The decision variables $x_n$ ($n = 1, 2, 3, 4$) are the immediate destinations at stage $n$.
The route is A → $x_1$ → $x_2$ → $x_3$ → $x_4$, where $x_4 = J$.
$f_n(s, x_n)$ is the total cost of the best overall policy for the remaining stages, given that the actual state is $s$, ready to start stage $n$, and $x_n$ is selected as the immediate destination.
$x_n^*$ minimizes $f_n(s, x_n)$, and $f_n^*(s)$ is the minimum value of $f_n(s, x_n)$:
$$f_n^*(s) = \min_{x_n} f_n(s, x_n) = f_n(s, x_n^*)$$
Formulation (cont.)
where
$$f_n(s, x_n) = \text{immediate cost (stage } n) + \text{minimum future cost (stages } n+1 \text{ onward)} = c_{s x_n} + f_{n+1}^*(x_n)$$
The value of $c_{s x_n}$ is given by $c_{ij}$ with $i = s$ (current state) and $j = x_n$ (immediate destination).
Objective: find $f_1^*(A)$ and the corresponding route.
Dynamic programming finds successively $f_4^*(s)$, $f_3^*(s)$, $f_2^*(s)$ and finally $f_1^*(A)$.

Solution procedure
When $n = 4$, the route is determined by its current state $s$ (H or I) and its final destination J.
Since $f_4^*(s) = f_4(s, J) = c_{sJ}$, the solution for $n = 4$ is:

    $s$    $f_4^*(s)$    $x_4^*$
    H      3             J
    I      4             J

Stage n = 3
This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs $c_{F,H} = 6$ or $c_{F,I} = 3$.
Choosing H, the minimum additional cost is $f_4^*(H) = 3$, so the total cost is 6 + 3 = 9.
Choosing I, the total cost is 3 + 4 = 7. This is smaller, so I is the optimal choice for state F.
Similar calculations can be made for the two other possible states, $s = E$ and $s = G$, resulting in the table for $n = 3$, with $f_3(s, x_3) = c_{s x_3} + f_4^*(x_3)$:

    $s$    $x_3 = H$    $x_3 = I$    $f_3^*(s)$    $x_3^*$
    E      4            8            4             H
    F      9            7            7             I
    G      6            7            6             H

Stage n = 2
In this case, $f_2(s, x_2) = c_{s x_2} + f_3^*(x_2)$. Example for node C:
$x_2 = E$: $f_2(C, E) = c_{C,E} + f_3^*(E) = 3 + 4 = 7$ (optimal)
$x_2 = F$: $f_2(C, F) = c_{C,F} + f_3^*(F) = 2 + 7 = 9$
$x_2 = G$: $f_2(C, G) = c_{C,G} + f_3^*(G) = 4 + 6 = 10$
Similar calculations can be made for the two other possible states, $s = B$ and $s = D$, resulting in the table for $n = 2$:

    $s$    $x_2 = E$    $x_2 = F$    $x_2 = G$    $f_2^*(s)$    $x_2^*$
    B      11           11           12           11            E or F
    C      7            9            10           7             E
    D      8            8            11           8             E or F
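The stage tables above all come from the same backward recursion, $f_n^*(s) = \min_{x_n} \{ c_{s x_n} + f_{n+1}^*(x_n) \}$. The following is a minimal Python sketch of that recursion, assuming the arc costs of the prototype example as reconstructed from its cost tables; the names `COSTS`, `STAGES_BACKWARD` and `solve_stagecoach` are illustrative and do not come from the slides.

```python
# Arc costs c_ij of the stagecoach example (assumed reconstruction of the cost tables).
COSTS = {
    "A": {"B": 2, "C": 4, "D": 3},
    "B": {"E": 7, "F": 4, "G": 6},
    "C": {"E": 3, "F": 2, "G": 4},
    "D": {"E": 4, "F": 1, "G": 5},
    "E": {"H": 1, "I": 4},
    "F": {"H": 6, "I": 3},
    "G": {"H": 3, "I": 3},
    "H": {"J": 3},
    "I": {"J": 4},
}

# States at the beginning of each stage, listed from the last stage backward.
STAGES_BACKWARD = [["H", "I"], ["E", "F", "G"], ["B", "C", "D"], ["A"]]


def solve_stagecoach(costs, stages_backward, goal="J"):
    """Return (f_star, x_star): minimum cost-to-go and optimal destinations per state."""
    f_star = {goal: 0}   # no cost remains once the destination is reached
    x_star = {}
    for states in stages_backward:
        for s in states:
            # f_n(s, x) = c_sx + f*_{n+1}(x) for every immediate destination x
            totals = {x: c + f_star[x] for x, c in costs[s].items()}
            best = min(totals.values())
            f_star[s] = best
            # keep every destination that attains the minimum (ties are all optimal)
            x_star[s] = sorted(x for x, t in totals.items() if t == best)
    return f_star, x_star


f_star, x_star = solve_stagecoach(COSTS, STAGES_BACKWARD)
for s in "HIEFGBCD":
    print(s, f_star[s], x_star[s])
```

The printed rows can be compared with the last two columns of the tables for n = 4, 3 and 2; the same dictionaries also hold the entry for the starting state A.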
Stage n = 1
There is just one possible starting state: A.
$x_1 = B$: $f_1(A, B) = c_{A,B} + f_2^*(B) = 2 + 11 = 13$
$x_1 = C$: $f_1(A, C) = c_{A,C} + f_2^*(C) = 4 + 7 = 11$ (optimal)
$x_1 = D$: $f_1(A, D) = c_{A,D} + f_2^*(D) = 3 + 8 = 11$ (optimal)
This results in the table, with $f_1(s, x_1) = c_{s x_1} + f_2^*(x_1)$:

    $s$    $x_1 = B$    $x_1 = C$    $x_1 = D$    $f_1^*(s)$    $x_1^*$
    A      13           11           11           11            C or D

Optimal solution
There are three optimal solutions, all with $f_1^*(A) = 11$. Reading the stage tables forward from A gives A → C → E → H → J, A → D → E → H → J and A → D → F → I → J.

Characteristics of DP
1. The problem can be divided into stages, with a policy decision required at each stage.
Example: 4 stages, with a life insurance policy to choose at each one.
Dynamic programming problems require making a sequence of interrelated decisions.
2. Each stage has a number of states associated with the beginning of that stage.
Example: the states are the possible territories where the fortune seeker could be located.
States are the possible conditions in which the system might be.
3. The policy decision transforms the current state into a state associated with the beginning of the next stage.
Example: the fortune seeker's decision led him from his current state to the next state on his journey.
DP problems can be interpreted in terms of networks: each node corresponds to a state.
The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
In most cases, the objective corresponds to finding the shortest or the longest path.
4. The solution procedure finds an optimal policy for the overall problem.
It finds a prescription of the optimal policy decision at each stage for each of the possible states.
Example: the solution procedure constructed a table for each stage $n$ that prescribed the optimal decision, $x_n^*$, for each possible state $s$.
In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision). This is useful for sensitivity analysis.
5.
Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
The optimal immediate decision depends only on the current state and not on how it was reached: this is the principle of optimality for DP.
Example: at any state, the insurance policy is independent of how the fortune seeker got there.
Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property). Problems lacking this property are not dynamic programming problems.
6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.
7. A recursive relationship that identifies the optimal policy for stage $n$, given the optimal policy for stage $n + 1$, is available.
Example: the recursive relationship was
$$f_n^*(s) = \min_{x_n} \{ c_{s x_n} + f_{n+1}^*(x_n) \}$$
The recursive relationship differs somewhat among dynamic programming problems.
Notation:
$N$ = number of stages.
$n$ = label for the current stage ($n = 1, 2, \ldots, N$).
$s_n$ = current state for stage $n$.
$x_n$ = decision variable for stage $n$.
$x_n^*$ = optimal value of $x_n$ (given $s_n$).
$f_n(s_n, x_n)$ = contribution of stages $n, n + 1, \ldots, N$ to the objective function if the system starts in state $s_n$ at stage $n$, the immediate decision is $x_n$, and optimal decisions are made thereafter.
$f_n^*(s_n) = f_n(s_n, x_n^*)$.
Recursive relationship:
$$f_n^*(s_n) = \max_{x_n} \{ f_n(s_n, x_n) \} \quad \text{or} \quad f_n^*(s_n) = \min_{x_n} \{ f_n(s_n, x_n) \}$$
where $f_n(s_n, x_n)$ is written in terms of $s_n$, $x_n$, $f_{n+1}^*(s_{n+1})$, and probably some measure of the immediate contribution of $x_n$ to the objective function.
8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage. It stops when the optimal policy starting at the initial stage is found; at that point the optimal policy for the entire problem has been found.
Example: the tables for the stages show this procedure.
For DP problems, a table such as the following would be obtained for each stage ($n = N, N - 1, \ldots, 1$):

    $s_n$    $f_n(s_n, x_n)$ for each $x_n$    $f_n^*(s_n)$    $x_n^*$

Deterministic dynamic programming
Deterministic problems: the state at the next stage is completely determined by the state and policy decision at the current stage.
Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.
Set of states: may be discrete or continuous, or a state vector.
Decision variables can also be discrete or continuous.
Example: distributing medical teams
The World Health Council has five medical teams to allocate to three underdeveloped countries.
Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.

Thousands of additional person-years of life:

    Medical teams    Country 1    Country 2    Country 3
    0                0            0            0
    1                45           20           50
    2                70           45           70
    3                90           75           80
    4                105          110          100
    5                120          150          130
Formulation of the problem
The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (stages).
$x_n$ is the number of teams to allocate to stage (country) $n$.
What are the states? What changes from one stage to another?

States to be considered
$s_n$ = number of medical teams still available for the remaining countries ($n, \ldots, 3$).
Thus: $s_1 = 5$, $s_2 = 5 - x_1 = s_1 - x_1$, $s_3 = s_2 - x_2$.

Overall problem
$p_i(x_i)$: measure of performance from allocating $x_i$ medical teams to country $i$.
$$\text{Maximize } \sum_{i=1}^{3} p_i(x_i), \quad \text{subject to } \sum_{i=1}^{3} x_i = 5, \quad x_i \text{ nonnegative integers.}$$

Policy
Recursive relationship relating the functions:
$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \{ p_n(x_n) + f_{n+1}^*(s_n - x_n) \}, \quad \text{for } n = 1, 2$$
$$f_3^*(s_3) = \max_{x_3 = 0, 1, \ldots, s_3} p_3(x_3)$$

Solution procedure, stage n = 3
For the last stage, $n = 3$, the values of $p_3(x_3)$ are in the last column of the table. Here, $x_3^* = s_3$ and $f_3^*(s_3) = p_3(s_3)$.

    $s_3$    $f_3^*(s_3)$    $x_3^*$
    0        0               0
    1        50              1
    2        70              2
    3        80              3
    4        100             4
    5        130             5

Stage n = 2
Here, finding $x_2^*$ requires calculating $f_2(s_2, x_2)$ for the values $x_2 = 0, 1, \ldots, s_2$. Example for $s_2 = 2$:
$x_2 = 0$: $f_2(2, 0) = p_2(0) + f_3^*(2) = 0 + 70 = 70$
$x_2 = 1$: $f_2(2, 1) = p_2(1) + f_3^*(1) = 20 + 50 = 70$
$x_2 = 2$: $f_2(2, 2) = p_2(2) + f_3^*(0) = 45 + 0 = 45$
Stage n = 2 (cont.)
Similar calculations can be made for the other values of $s_2$, with $f_2(s_2, x_2) = p_2(x_2) + f_3^*(s_2 - x_2)$:

    $s_2$    $x_2 = 0$    1      2      3      4      5      $f_2^*(s_2)$    $x_2^*$
    0        0                                               0               0
    1        50           20                                 50              0
    2        70           70     45                          70              0 or 1
    3        80           90     95     75                   95              2
    4        100          100    115    125    110           125             3
    5        130          120    125    145    160    150    160             4

Stage n = 1
The only state is the starting state $s_1 = 5$, with $f_1(s_1, x_1) = p_1(x_1) + f_2^*(s_1 - x_1)$:

    $s_1$    $x_1 = 0$    1      2      3      4      5      $f_1^*(s_1)$    $x_1^*$
    5        160          170    165    160    155    120    170             1

Optimal policy decision
Reading the tables forward: $x_1^* = 1$ leaves $s_2 = 4$, so $x_2^* = 3$, which leaves $s_3 = 1$, so $x_3^* = 1$. The total is 170 thousand additional person-years of life.

Distribution of effort problem
One kind of resource is allocated to a number of activities.
Objective: how to distribute the effort (resource) among the activities most effectively.
DP involves only one (or a few) resources, while LP can deal with thousands of resources.
The LP assumptions of proportionality, divisibility and certainty can be violated by DP.
Only additivity (or its analog for a product of terms) is necessary, because of the principle of optimality.
The World Health Council problem violates proportionality and divisibility (why?).

Formulation of distribution of effort
Stage $n$ = activity $n$ ($n = 1, 2, \ldots, N$).
$x_n$ = amount of resource allocated to activity $n$.
State $s_n$ = amount of resource still available for allocation to the remaining activities ($n, \ldots, N$).
When the system starts at stage $n$ in state $s_n$, the choice $x_n$ results in the next state at stage $n + 1$ being $s_{n+1} = s_n - x_n$:

    Stage:    $n$                  $n + 1$
    State:    $s_n$    --$x_n$-->  $s_n - x_n$

Example: distributing scientists to research teams
Three teams are working on an engineering problem: how to safely fly people to Mars.
Two extra scientists are available to reduce the probability of failure.

Probability of failure:

    New scientists    Team 1    Team 2    Team 3
    0                 0.40      0.60      0.80
    1                 0.20      0.40      0.50
    2                 0.15      0.20      0.30
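Both allocation examples fit the distribution-of-effort recursion with $s_{n+1} = s_n - x_n$; only the way stage contributions are combined changes. The sketch below is one possible Python implementation, not the slides' own procedure. The function `allocate` and its parameters are hypothetical names. For the scientists example it assumes the objective is to minimize the probability that all three teams fail, i.e. the product of the three failure probabilities, which the slides do not state explicitly.

```python
import operator


def allocate(table, total, combine, better, terminal):
    """Backward DP for a distribution-of-effort problem (illustrative sketch).

    table[n][x] : contribution of activity n when it receives x units
    combine     : merges an immediate contribution with the future value
    better      : better(a, b) is True when value a is preferred to value b
    terminal    : value when no activities remain (0 for sums, 1 for products)
    """
    n_activities = len(table)
    f_next = [terminal] * (total + 1)          # f*_{N+1}(s) for s = 0..total
    policy = []                                # policy[n][s] = optimal x_n in state s
    for n in reversed(range(n_activities)):
        f_cur, x_cur = [], []
        for s in range(total + 1):
            best_val, best_x = None, None
            for x in range(min(s, len(table[n]) - 1) + 1):
                val = combine(table[n][x], f_next[s - x])
                if best_val is None or better(val, best_val):
                    best_val, best_x = val, x
            f_cur.append(best_val)
            x_cur.append(best_x)
        policy.insert(0, x_cur)
        f_next = f_cur
    # Forward pass: read the optimal decisions starting from s_1 = total.
    plan, s = [], total
    for n in range(n_activities):
        plan.append(policy[n][s])
        s -= policy[n][s]
    return f_next[total], plan


# Medical teams: maximize the sum of person-years (thousands), 5 teams.
MEDICAL = [
    [0, 45, 70, 90, 105, 120],
    [0, 20, 45, 75, 110, 150],
    [0, 50, 70, 80, 100, 130],
]
medical_value, medical_plan = allocate(MEDICAL, 5, operator.add, operator.gt, 0)
print(medical_value, medical_plan)

# Scientists: minimize the product of failure probabilities (assumed objective), 2 scientists.
FAILURE = [
    [0.40, 0.20, 0.15],
    [0.60, 0.40, 0.20],
    [0.80, 0.50, 0.30],
]
failure_prob, scientist_plan = allocate(FAILURE, 2, operator.mul, operator.lt, 1.0)
print(round(failure_prob, 3), scientist_plan)
```

Passing `combine` and `better` as arguments reflects the remark that only additivity, or its analog for a product of terms, is needed. When several decisions tie, this sketch keeps the smallest one.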
Continuous dynamic programming
The previous examples had a discrete state variable $s_n$ at each stage.
They have all been reversible: the solution procedure could have moved backward or forward stage by stage.
The next example is continuous. As $s_n$ can take any value in certain intervals, the solutions $f_n^*(s_n)$ and $x_n^*$ must be expressed as functions of $s_n$.
The stages in the next example correspond to time periods, so the solution must proceed backwards.

Example: scheduling jobs
The company Local Job Shop needs to schedule its employment levels because of seasonal fluctuations.
Machine operators are difficult to hire and costly to train.
The peak-season payroll should not be maintained afterwards.
Overtime work on a regular basis should be avoided.
Minimum requirements in the near future:

    Season          Spring    Summer    Autumn    Winter    Spring
    Requirements    255       220       240       200       255

Employment above the level in the table costs $2,000 per person per season.
The total cost of changing the level of employment from one season to the next is $200 times the square of the difference in employment levels.
Fractional levels are possible because of part-time employees.

Formulation
From the data, the maximum employment should be 255 (spring).
It is necessary to find the level of employment for the other seasons.
Seasons are stages: one cycle of four seasons, where stage 1 is summer and stage 4 is spring (known employment).
$x_n$ = employment level for stage $n$ ($n = 1, 2, 3, 4$); $x_4 = 255$.
$r_n$ = minimum employment requirement for stage $n$: $r_1 = 220$, $r_2 = 240$, $r_3 = 200$, $r_4 = 255$.
Thus: $r_n \le x_n \le 255$.
Cost for stage $n$ = $200(x_n - x_{n-1})^2 + 2000(x_n - r_n)$.
State $s_n$: employment in the preceding season, $s_n = x_{n-1}$ (for $n = 1$: $s_1 = x_0 = x_4 = 255$).
Problem: choose $x_1$, $x_2$ and $x_3$ so as to
$$\text{minimize } \sum_{i=1}^{4} \left[ 200(x_i - x_{i-1})^2 + 2000(x_i - r_i) \right], \quad \text{subject to } r_i \le x_i \le 255, \text{ for } i = 1, 2, 3, 4.$$

Data

    $n$    $r_n$    Feasible $x_n$           Possible $s_n = x_{n-1}$    Cost
    1      220      $220 \le x_1 \le 255$    $s_1 = 255$                 $200(x_1 - 255)^2 + 2000(x_1 - 220)$
    2      240      $240 \le x_2 \le 255$    $220 \le s_2 \le 255$       $200(x_2 - x_1)^2 + 2000(x_2 - 240)$
    3      200      $200 \le x_3 \le 255$    $240 \le s_3 \le 255$       $200(x_3 - x_2)^2 + 2000(x_3 - 200)$
    4      255      $x_4 = 255$              $200 \le s_4 \le 255$       $200(255 - x_3)^2$
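To get a feel for the objective before solving it, the total cost of any candidate schedule can be evaluated directly from the stage costs in the data table. The following is a small Python sketch; the names `R`, `X0`, `X_MAX` and `total_cost` are illustrative, and the two schedules tried are arbitrary feasible choices rather than solutions from the slides.

```python
R = [220, 240, 200, 255]   # minimum requirements r_1..r_4 (summer, autumn, winter, spring)
X0 = 255                   # employment in the spring preceding stage 1 (s_1 = x_0 = x_4)
X_MAX = 255                # upper bound on every employment level


def total_cost(x):
    """Total cost of a schedule x = [x_1, x_2, x_3, x_4] over the four seasons."""
    if any(not (r <= xi <= X_MAX) for xi, r in zip(x, R)):
        raise ValueError("schedule violates r_n <= x_n <= 255")
    cost, previous = 0.0, X0
    for xi, r in zip(x, R):
        # 200 * (change in employment)^2 + 2000 * (employment above the requirement)
        cost += 200 * (xi - previous) ** 2 + 2000 * (xi - r)
        previous = xi
    return cost


print(total_cost([255, 255, 255, 255]))   # keep peak staffing all year -> 210000.0
print(total_cost([220, 240, 200, 255]))   # follow the minimum requirements -> 1250000.0
```

Neither extreme is cheap: constant staffing pays only for excess employment, while tracking the requirements exactly pays heavily for the quadratic change cost. The optimal schedule balances the two.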
Formulation (cont.)
Recursive relationship:
$$f_n^*(s_n) = \min_{r_n \le x_n \le 255} \{ 200(x_n - s_n)^2 + 2000(x_n - r_n) + f_{n+1}^*(x_n) \}$$
Basic structure of the problem: from state $s_n$ at stage $n$, the decision $x_n$ leads to state $s_{n+1} = x_n$ at stage $n + 1$, with immediate cost $200(x_n - s_n)^2 + 2000(x_n - r_n)$.

Solution procedure
Stage 4: the solution is known to be $x_4^* = 255$.

    $s_4$                    $f_4^*(s_4)$          $x_4^*$
    $200 \le s_4 \le 255$    $200(255 - s_4)^2$    255

Stage 3: for $240 \le s_3 \le 255$,
$$f_3^*(s_3) = \min_{200 \le x_3 \le 255} \{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + f_4^*(x_3) \}$$
$$= \min_{200 \le x_3 \le 255} \{ 200(x_3 - s_3)^2 + 2000(x_3 - 200) + 200(255 - x_3)^2 \}$$

[Figure: graphical solution for $f_3^*(s_3)$]

Calculus solution for $f_3^*(s_3)$
Using calculus:
$$\frac{\partial}{\partial x_3} f_3(s_3, x_3) = 400(x_3 - s_3) + 2000 - 400(255 - x_3) = 400(2x_3 - s_3 - 250) = 0$$
$$x_3^* = \frac{s_3 + 250}{2}$$
Does this guarantee a minimum? Yes: the second derivative is 800, which is positive, and $x_3^*$ lies in the feasible range $200 \le x_3 \le 255$ for every $240 \le s_3 \le 255$.

    $s_3$                    $f_3^*(s_3)$                                            $x_3^*$
    $240 \le s_3 \le 255$    $50(250 - s_3)^2 + 50(260 - s_3)^2 + 1000(s_3 - 150)$    $(s_3 + 250)/2$

Stage 2
Solved in a similar fashion, with
$$f_2(s_2, x_2) = 200(x_2 - s_2)^2 + 2000(x_2 - r_2) + f_3^*(x_2)$$
$$= 200(x_2 - s_2)^2 + 2000(x_2 - 240) + 50(250 - x_2)^2 + 50(260 - x_2)^2 + 1000(x_2 - 150)$$
for $220 \le s_2 \le 255$ (possible values) and $240 \le x_2 \le 255$ (feasible values).
Solving $\frac{\partial}{\partial x_2} f_2(s_2, x_2) = 0$ yields
$$x_2^* = \frac{2 s_2 + 240}{3}$$
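The closed-form stage 3 result can be cross-checked numerically by minimizing $f_3(s_3, x_3)$ over a fine grid of feasible $x_3$ values. This is only a sanity-check sketch in Python, not part of the solution procedure. The grid step of 0.25 is an arbitrary choice, and the function names are illustrative.

```python
def f4_star(s4):
    """Stage 4 cost-to-go: employment must move from s4 to x_4 = 255."""
    return 200 * (255 - s4) ** 2


def f3(s3, x3):
    """Stage 3 cost for decision x3 in state s3, plus the optimal stage 4 cost."""
    return 200 * (x3 - s3) ** 2 + 2000 * (x3 - 200) + f4_star(x3)


def f3_star_closed(s3):
    """Closed form obtained by substituting x3* = (s3 + 250) / 2 into f3."""
    return 50 * (250 - s3) ** 2 + 50 * (260 - s3) ** 2 + 1000 * (s3 - 150)


def f3_star_grid(s3, step=0.25):
    """Brute-force minimum of f3 over the feasible range 200 <= x3 <= 255."""
    n_points = int(round((255 - 200) / step))
    candidates = [200 + k * step for k in range(n_points + 1)]
    x_best = min(candidates, key=lambda x3: f3(s3, x3))
    return f3(s3, x_best), x_best


for s3 in (240, 245, 250, 255):
    value, x_best = f3_star_grid(s3)
    # columns: s3, grid minimizer, formula minimizer, grid minimum, formula minimum
    print(s3, x_best, (s3 + 250) / 2, value, f3_star_closed(s3))
```

For these integer values of $s_3$ the formula's minimizer falls exactly on the grid, so the two pairs of columns should agree. For other states the grid minimum can only be slightly above the closed form.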
Stage 2 (cont.)
The solution has to be feasible for $220 \le s_2 \le 255$ (i.e., $240 \le x_2 \le 255$ for $220 \le s_2 \le 255$)!
$x_2^* = \frac{2 s_2 + 240}{3}$ is only feasible for $240 \le s_2 \le 255$.
We still need to solve for the feasible value of $x_2$ that minimizes $f_2(s_2, x_2)$ when $220 \le s_2 < 240$.
For $s_2 < 240$, $\frac{\partial}{\partial x_2} f_2(s_2, x_2) > 0$ for $240 \le x_2 \le 255$, so $x_2^* = 240$.
Why? The derivative is $200(3x_2 - 2s_2 - 240)$, which is positive whenever $x_2 \ge 240 > s_2$, so $f_2$ increases over the whole feasible range and its minimum is at the lower bound.

Stage 2 and Stage 1

    $s_2$                    $f_2^*(s_2)$                                                                             $x_2^*$
    $220 \le s_2 \le 240$    $200(240 - s_2)^2 + 115{,}000$                                                            240
    $240 \le s_2 \le 255$    $\frac{200}{9}[(240 - s_2)^2 + (255 - s_2)^2 + (270 - s_2)^2] + 2000(s_2 - 195)$    $(2 s_2 + 240)/3$

Stage 1: the procedure is similar.

    $s_1$    $f_1^*(s_1)$    $x_1^*$
    255      185,000         247.5

Solution: $x_1^* = 247.5$, $x_2^* = 245$, $x_3^* = 247.5$, $x_4^* = 255$, with a total cost of $185,000. How?

Deterministic continuous problem
Consider the following nonlinear programming problem:
Maximize Z x x, subject to x x.
(There are no nonnegativity constraints.)
Use dynamic programming to solve this problem.

Probabilistic dynamic programming
The state at the next stage is not completely determined by the state and policy decision at the current stage.
There is a probability distribution for determining the next state (see figure).
$S$ = number of possible states at stage $n + 1$.
The system goes to state $i$ ($i = 1, 2, \ldots, S$) with probability $p_i$, given state $s_n$ and decision $x_n$ at stage $n$.
$C_i$ = contribution of stage $n$ to the objective function.
If the figure is expanded to include all possible states and decisions at all stages, it is a decision tree.

[Figure: basic structure for probabilistic dynamic programming]

Probabilistic dynamic programming
The relation between $f_n(s_n, x_n)$ and $f_{n+1}^*(s_{n+1})$ depends upon the form of the overall objective function.
Example: minimize the expected sum of the contributions from the individual stages.
$f_n(s_n, x_n)$ is the minimum expected sum from stage $n$ onward, given state $s_n$ and policy decision $x_n$ at stage $n$:
$$f_n(s_n, x_n) = \sum_{i=1}^{S} p_i \left[ C_i + f_{n+1}^*(i) \right]$$
with
$$f_{n+1}^*(i) = \min_{x_{n+1}} f_{n+1}(i, x_{n+1})$$
Example: determining reject allowances
The Hit-and-Miss Manufacturing Company received an order to supply one item of a particular type.
The customer has specified stringent quality requirements.
The manufacturer may have to produce more than one item to obtain one that is acceptable.
The number of extra items is the reject allowance.
The probability that an item is acceptable (or defective) is 1/2.
The number of acceptable items in a lot of size $L$ has a binomial distribution: the probability of no acceptable items is $(1/2)^L$.
Setup cost = $300, cost per item = $100.
Maximum number of production runs = 3.
Cost of having no acceptable item after 3 runs = $1,600.

Formulation
Objective: determine the policy regarding the lot size (1 + reject allowance) for the required production run(s) that minimizes the total expected cost.
Stage $n$ = production run $n$ ($n = 1, 2, 3$).
$x_n$ = lot size for stage $n$.
State $s_n$ = number of acceptable items still needed (1 or 0) at the beginning of stage $n$.
At stage 1, the state is $s_1 = 1$.
$f_n(s_n, x_n)$ = total expected cost for stages $n, \ldots, 3$ if optimal decisions are made thereafter:
$$f_n^*(s_n) = \min_{x_n = 0, 1, \ldots} f_n(s_n, x_n), \quad f_n^*(0) = 0.$$
The monetary unit is $100. The contribution to cost from stage $n$ is $[K(x_n) + x_n]$, with
$$K(x_n) = \begin{cases} 0, & \text{if } x_n = 0 \\ 3, & \text{if } x_n > 0 \end{cases}$$
Note that $f_4^*(1) = 16$.

[Figure: basic structure of the problem]

Recursive relationship:
$$f_n^*(1) = \min_{x_n = 0, 1, \ldots} \left\{ K(x_n) + x_n + \left(\tfrac{1}{2}\right)^{x_n} f_{n+1}^*(1) \right\}, \quad \text{for } n = 1, 2, 3$$

Solution procedure
$n = 3$: $f_3(1, x_3) = K(x_3) + x_3 + 16 \left(\tfrac{1}{2}\right)^{x_3}$

    $s_3$    $x_3 = 0$    1     2    3    4    5      $f_3^*(s_3)$    $x_3^*$
    1        16           12    9    8    8    8.5    8               3 or 4

$n = 2$: $f_2(1, x_2) = K(x_2) + x_2 + \left(\tfrac{1}{2}\right)^{x_2} f_3^*(1)$

    $s_2$    $x_2 = 0$    1    2    3    4      $f_2^*(s_2)$    $x_2^*$
    1        8            8    7    7    7.5    7               2 or 3

$n = 1$: $f_1(1, x_1) = K(x_1) + x_1 + \left(\tfrac{1}{2}\right)^{x_1} f_2^*(1)$

    $s_1$    $x_1 = 0$    1      2       3        4       $f_1^*(s_1)$    $x_1^*$
    1        7            7.5    6.75    6.875    7.44    6.75            2

Optimal solution? Produce 2 items on the first run; if none is acceptable, produce 2 or 3 items on the second run; if none is acceptable, produce 3 or 4 items on the third run. The total expected cost is $675.

Probabilistic problem
An enterprising young statistician believes that she has developed a system for winning a popular Las Vegas game. Her colleagues do not believe that her system works, so they have made a large bet with her that if she starts with three chips, she will not have at least five chips after three plays of the game.
Each play of the game involves betting any desired number of available chips and then either winning or losing this number of chips. The statistician believes that her system will give her a probability of 2/3 of winning a given play of the game. Assuming the statistician is correct, use dynamic programming to determine her optimal policy regarding how many chips to bet (if any) at each of the three plays of the game. The decision at each play should take into account the results of earlier plays. The objective is to maximize the probability of winning her bet with her colleagues.
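One way to set this problem up is to take stage $n$ as the $n$-th play, state $s_n$ as the number of chips in hand, and $f_n^*(s_n)$ as the maximum probability of finishing with at least five chips. The Python sketch below follows that formulation, which is an assumption of this example rather than something the slides prescribe. It takes the win probability as 2/3, and the names `P_WIN`, `f_star`, `f` and `optimal_bets` are illustrative. Exact fractions avoid rounding when comparing tied bets.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # assumed probability of winning a single play
START = 3                # chips at the start
TARGET = 5               # chips needed after the last play
PLAYS = 3                # number of plays (stages)


@lru_cache(maxsize=None)
def f_star(n, s):
    """Maximum probability of ending with >= TARGET chips, at play n with s chips."""
    if n > PLAYS:
        # All plays are over: the bet is won only if the target has been reached.
        return Fraction(int(s >= TARGET))
    return max(f(n, s, x) for x in range(s + 1))


def f(n, s, x):
    """Probability of success when betting x chips at play n, then playing optimally."""
    lose = (1 - P_WIN) * f_star(n + 1, s - x)
    win = P_WIN * f_star(n + 1, s + x)
    return lose + win


def optimal_bets(n, s):
    """All bets that attain f_star(n, s)."""
    return [x for x in range(s + 1) if f(n, s, x) == f_star(n, s)]


print(f_star(1, START), optimal_bets(1, START))   # 20/27 [1]
for s in range(6):
    print("play 2, chips", s, f_star(2, s), optimal_bets(2, s))
```

The memoized recursion evaluates the same quantities as a backward pass over stages 3, 2, 1. Calling `optimal_bets(n, s)` for each reachable state gives the full policy, which is what lets the decision at each play depend on the results of earlier plays.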