Approximate Dynamic Programming: Solving the curses of dimensionality

Size: px

Start display at page:

Download "Approximate Dynamic Programming: Solving the curses of dimensionality"

Anthony Harold Turner
5 years ago
Views:

1 Approximate Dynamic Programming: Solving the curses of dimensionality Informs Computing Society Tutorial October, 2008 Warren Powell CASTLE Laboratory Princeton University Warren B. Powell, Princeton University 2008 Warren B. Powell Slide 1

2 2008 Warren B. Powell Slide 2

3 2008 Warren B. Powell Slide 3

4 The fractional jet ownership industry 2008 Warren B. Powell Slide 4

5 NetJets Inc Warren B. Powell Slide 5

6 Schneider National 2008 Warren B. Powell Slide 6

7 Schneider National 2008 Warren B. Powell Slide 7

Planning for a risky world Weather Robust design of emergency response networks.

events. Disease Models of disease propagation for response planning.

8 Planning for a risky world Weather Robust design of emergency response networks. Design of financial instruments to hedge against weather emergencies to protect individuals, companies and municipalities. Design of sensor networks and communication systems to manage responses to major weather events. Disease Models of disease propagation for response planning. Management of medical personnel, equipment and vaccines to respond to a disease outbreak. Robust design of supply chains to mitigate the disruption of transportation systems Warren B. Powell Slide 8

Energy management Energy resource allocation What is the right mix of

How should the use of different energy resources be coordinated over space

Should I invest in nuclear energy? What is the impact of a carbon tax?

9 Energy management Energy resource allocation What is the right mix of energy technologies? How should the use of different energy resources be coordinated over space and time? What should my energy R&D portfolio look like? Should I invest in nuclear energy? What is the impact of a carbon tax? Energy markets How should I hedge energy commodities? How do I price energy assets? What is the right price for energy futures? 2008 Warren B. Powell Slide 9

High value spare parts Electric Power Grid PJM oversees an aging investment in highvoltage transformers.

Typical cost of a replacement ~ $5 million. Failures vary widely in terms of economic impact on the network.

spare parts. Inventory strategy has to determine what to buy, when and where to store it.

10 High value spare parts Electric Power Grid PJM oversees an aging investment in highvoltage transformers. Replacement strategy needs to anticipate a bulge in retirements and failures 1-2 year lag times on orders. Typical cost of a replacement ~ $5 million. Failures vary widely in terms of economic impact on the network. Spare parts for business jets ADP is used to determine purchasing and allocation strategies for over 400 high value spare parts. Inventory strategy has to determine what to buy, when and where to store it. Many parts are very low volume (e.g. 7 spares spread across 15 service centers). Inventories have to meet global targets on level of service and inventory costs Warren B. Powell Slide 10

11 Challenges Real-time control» Scheduling aircraft, pilots, generators, tankers» Pricing stocks, options» Electricity resource allocation Near-term tactical planning» Can I accept a customer request?» Should I lease equipment?» How much energy can I commit with my windmills? Strategic planning» What is the right equipment mix?» What is the value of this contract?» What is the value of more reliable aircraft? 2008 Warren B. Powell Slide 11

12 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

13 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

14 A resource allocation model Attribute vectors: a = Asset class Time invested Type Location Age Location ETA Home Experience Driving hours Location ETA A/C type Fuel level Home shop Crew Eqpt1 Eqpt Warren B. Powell Slide 14

15 A resource allocation model Modeling resources:» The attributes of a single resource: a = The attributes of a single resource a A The attribute space» The resource state vector: R» The information process: ˆ ta R ta = The number of resources with attribute ( ) Rt = Rta a A The resource state vector = The change in the number of resources with attribute a. a 2008 Warren B. Powell Slide 15

16 A resource allocation model Modeling demands:» The attributes of a single demand: b = The attributes of a demand to be served. b B The attribute space» The demand state vector: D = The number of demands with attribute Dt = Dtb The demand state vector b B» The information process: ˆ tb D tb ( ) = The change in the number of demands with attribute b. b 2008 Warren B. Powell Slide 16

17 Energy resource modeling The system state: S ( R D ρ ) =,, = System state, where: t t t t R t D t = Resource state (how much capacity, reserves) = Market demands ρ = "system parameters" t State of the technology (costs, performance) Climate, weather (temperature, rainfall, wind) Government policies (tax rebates on solar panels) Market prices (oil, coal) 2008 Warren B. Powell Slide 17

18 Energy resource modeling The decision variable: x t New capacity Retired capacity for each: = Type Location Technology 2008 Warren B. Powell Slide 18

19 Energy resource modeling Exogenous information: W = ( Rˆ Dˆ ˆ ρ ) New information =,, t t t t Rˆ t = Exogenous changes in capacity, reserves Dˆ t = New demands for energy from each source ˆ ρ = Exogenous changes in parameters. t 2008 Warren B. Powell Slide 19

20 Energy resource modeling The transition function = M t+ 1 t t t+ 1 S S ( S, x, W ) 2008 Warren B. Powell Slide 20

21 A resource allocation model Resources Demands 2008 Warren B. Powell Slide 21

22 A resource allocation model t t+1 t Warren B. Powell Slide 22

23 A resource allocation model Optimizing over time t t+1 t+2 Optimizing at a point in time 2008 Warren B. Powell Slide 23

24 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

25 Introduction to ADP We just solved Bellman s equation: V ( S ) = max C ( S, x ) + E V ( S ) S { } t t t t t t+ 1 t+ 1 t x X» We found the value of being in each state by stepping backward through the tree Warren B. Powell Slide 25

26 Introduction to ADP The challenge of dynamic programming: ( { }) V ( S ) = max C ( S, x ) + E V ( S ) S t t t t t t+ 1 t+ 1 t x X Problem: Curse of dimensionality 2008 Warren B. Powell Slide 26

27 Introduction to ADP The challenge of dynamic programming: ( { }) V ( S ) = max C ( S, x ) + E V ( S ) S t t t t t t+ 1 t+ 1 t x X Three curses Problem: Curse of dimensionality State space Outcome space Action space (feasible region) 2008 Warren B. Powell Slide 27

28 Introduction to ADP The computational challenge: ( { }) V ( S ) = max C ( S, x ) + E V ( S ) S t t t t t t+ 1 t+ 1 t x X How do we find V ( S )? t+ 1 t+ 1 How do we compute the expectation? How do we find the optimal solution? 2008 Warren B. Powell Slide 28

29 Introduction to ADP Classical ADP» Most applications of ADP focus on the challenge of handling multidimensional state variables» Start with ( { }) V ( S ) = max C ( S, x ) + E V ( S ) S t t t t t t+ 1 t+ 1 t x X» Now replace the value function with some sort of approximation V ( S ) V ( S ) t t t t

30 Introduction to ADP Approximating the value function:» We have to exploit the structure of the value function (e.g. concavity).» We might approximate the value function using a simple polynomial V ( S θ) = θ + θ S + θ S 2 t t 0 1 t 2 t».. or a complicated one: V ( S θ) = θ + θ S + θ S + θ ln S + θ sin( S )» Sometimes, they get really messy: ( ) 2 t t 0 1 t 2 t 3 t 4 t

31 t+ 2 t+ 3 (0) (1) (1) t( θ) = θ + θtst, ' tst, ' + θtwt, ' twt, ' s t' = t w t' = t V R R R + θ R + (2) 2 (2) 2 twt, ' twt, ' tst, ' tst, ' w t' s t' + θ R s 1 (3) ts t, st t, s' t S s' t+ 1 t+ 1 (4) θts Rt, st ' Rt, s' t ' s t' = t 2S s' t' = t R 1 θ 2 R t+ 2 t+ 2 (5) θts Rt, st ' Rt, s' t ' s t' = t 3S s' t' = t w s θ R R ( ws,1) tws, twt, tst, t+ 2 ( ws,2) + θtws, Rtwt, Rtst, ' w s t' = t t+ 3 t+ 2 ( ws,3) θtws, Rtwt, ' Rtst, ' w s t' = t t' = t t+ 2 t+ 2 ( ws,4) θtws, Rtwt, ' Rtst, ' w s t ' = t t' = t w s θ 1 R R R ( ws,5) tsw, tst,, + 2 twt, twt,, ,2,3 4,5,6,

32 Introduction to ADP We can write a model of the observed value of being in a state as: vˆ = θ + θ S + θ S + θ ln S + θ sin( S ) + ε ( ) t 2 t 3 t 4 t This is often written as a generic regression model: Y = θ + θ X + θ X + θ X + θ X The ADP community refers to the independent variables as basis functions: Y = θ ϕ ( S) + θϕ ( S) + θ ϕ ( S) + θ ϕ ( S) + θ ϕ ( S) = θϕ ( S) f F f f ϕ ( R) are also known as features. f

33 Introduction to ADP Methods for estimating»» θ 1 2 N ˆ ˆ ˆ Generate observations v, v,..., v, and use traditional regression methods to fit θ. n Stochastic gradient for updating θ : ( ˆ ) θ = θ α V ( S θ ) v V ( S θ ) n n 1 n 1 n n 1 n n 1 n n 1 n 1 ( V ( S ) vˆ ) = θ α θ n 1 n 1 n n 1 n n 1 Error ϕ1( S) ϕ2( S) ϕf ( S) Basis functions

34 Introduction to ADP Methods for estimating θ» Recursive statistics Iterative equations avoid matrix inverse: n n 1 n n n n θ = θ H x ε ε = Error n 1 n 1 T H = B B is F F matrix= XX n γ ( ( ) ) n n 1 1 n 1 n n T n 1 B = B B x x B n γ γ n n T n 1 n = 1+ ( ) x B x 1

35 Introduction to ADP Other statistical methods» Regression trees Combines regression with techniques for discrete variables.» Data mining Good for categorical data» Neural networks Engineers like this for low-dimensional continuous problems

36 Introduction to ADP Notes:» When approximating value functions, we are basically drawing on the entire field of statistics.» Choosing an approximation is primarily an art.» In special cases, the resulting algorithm can produce optimal solutions.» Most of the time, we are hoping for good solutions.» In some cases, it can work terribly.» As a general rule you have to use problem structure. Value function approximations have to capture the right structure. Blind use of polynomials will rarely be successful.

37 Introduction to ADP What you will struggle with:» Stepsizes Can t live with em, can t live without em. Too small, you think you have converged but you have really just stalled ( apparent convergence ) Too large, and the system is unstable.» Stability There are two sources of randomness: The traditional exogenous randomness An evolving policy» Exploration vs. exploitation You sometimes have to choose to visit a state just to collect information about the value of being in a state.

38 Introduction to ADP But we are not out of the woods» Assume we have an approximate value function.» We still have to solve a problem that looks like Vt( St) = max Ct( St, xt) + E θφ f f ( St+ 1) x X f F» This means we still have to deal with a maximization problem (might be a linear, nonlinear or integer program) with an expectation.

39 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

40 Action State Use weather report Do not use weather report Information Forecast rain.1 Forecast cloudy.3 Forecast sunny.6 Schedule game Cancel game State Action Schedule game Cancel game Schedule game Cancel game Schedule game Cancel game Rain.2 -$2000 Clouds.3 $1000 Sun.5 $5000 Rain.2 -$200 Clouds.3 -$200 Sun.5 -$200 Information Rain.8 -$2000 Clouds.2 $1000 Sun.0 $5000 Rain.8 -$200 Clouds.2 -$200 Sun.0 -$200 Rain.1 -$2000 Clouds.5 $1000 Sun.4 $5000 Rain.1 -$200 Clouds.5 -$200 Sun.4 -$200 Rain.1 -$2000 Clouds.2 $1000 Sun.7 $5000 Rain.1 -$200 Clouds.2 -$200 Sun.7 -$200 - Decision nodes - Outcome nodes

41 Schedule game Cancel game Schedule game Cancel game Schedule game Cancel game $2400 -$200 -$1400 -$200 $2300 -$200 $3500 -$200 Forecast rain.1 Forecast cloudy.3 Forecast sunny.6 Schedule game Cancel game Use weather report Do not use weather report

42 -$200 $2300 $3500 $2400 -$200 Forecast rain.1 Forecast cloudy.3 Forecast sunny.6 Schedule game Cancel game Use weather report Do not use weather report

43 $2770 $2400 Use weather report Do not use weather report

44 The post-decision state 2008 Warren B. Powell Slide 44

45 The post-decision state Managing blood inventories over time Week 0 Week 1 Week 2 Week 3 S 0 x 0 Rˆ, Dˆ 1 1 S 1 x 1 Rˆ, Dˆ 2 2 S 2 x 2 S x 2 Rˆ, Dˆ 3 3 x S S 3 3 x 3 t=0 t=1 t=2 t= Warren B. Powell Slide 45

46 The post-decision state What happens if use Bellman s equation?» State variable is: The supply of each type of blood 8 blood types 6 ages The demand for each type of blood 8 blood types 56 dimensional state vector» Decision variable is how much of 8 blood types to supply to 8 demand types dimensional decision vector» Random information Blood donations by week (8 types) New demands for blood (8 types) 16-dimensional information vector 2008 Warren B. Powell Slide 46

47 Rain.8 -$2000 Clouds.2 $1000 Sun.0 $5000 Rain.8 -$200 Clouds.2 -$200 Sun.0 -$200 Rain.1 -$2000 Clouds.5 $1000 Sun.4 $5000 Rain.1 -$200 Clouds.5 -$200 Sun.4 -$200 Rain.1 -$2000 Clouds.2 $1000 Sun.7 $5000 Rain.1 -$200 Clouds.2 -$200 S n 7 $200 Schedule game S t S t + 1 Cancel game Forecast rain.1 Schedule game Forecast cloudy.3 Cancel game Forecast sunny.6 Use weather report Schedule game Cancel game Rain 2 -$2000 Do not weath ( { } ) V ( S ) = max C ( S, x ) + E V ( S ) S t t t t t t+ 1 t+ 1 t x X

48 Rain.8 -$2000 Clouds.2 $1000 Sun.0 $5000 Rain.8 -$200 Clouds.2 -$200 Sun.0 -$200 Rain.1 -$2000 Clouds.5 $1000 Sun.4 $5000 Rain.1 -$200 Clouds.5 -$200 Sun.4 -$200 Rain.1 -$2000 Clouds.2 $1000 Sun.7 $5000 Rain.1 -$200 Clouds.2 -$200 Sun.7 -$200 Schedule game Cancel game Schedule game Cancel game Schedule game Cancel game Rain.2 -$2000 Clouds.3 $1000 Sun.5 $5000 Rain.2 -$200 Clouds.3 -$200 Sun.5 -$200 - Decision nodes - Outcome nodes Forecast rain.1 Forecast cloudy.3 Forecast sunny.6 Schedule game Cancel game Use weather report Do not use weather report

49 The post-decision state New concept:» The pre-decision state variable: St = The information required to make a decision x t Same as a decision node in a decision tree.» The post-decision state variable: x S t = The state of what we know immediately after we make a decision. Same as an outcome node in a decision tree Warren B. Powell Slide 49

50 The post-decision state An inventory problem:» Our basic inventory equation: R t+1 =max{0,r t + x t ˆD t+1 } where R t = Inventory at time t. x t = Amount we ordered. ˆD t+1 = Demand in next time period.» Using pre- and post-decision states: Rt x = R t + x t Post-decision state R t+1 = max{0,rt x ˆD t+1 } Pre-decision state 2008 Warren B. Powell Slide 50

51 The post-decision state Pre-decision, state-action, and post-decision Pre-decision state State Action Post-decision state 9 3 states state-action pairs 9 3 states 2008 Warren B. Powell Slide 51

52 The post-decision state Pre-decision: resources and demands S = ( R, D ) t t t 2008 Warren B. Powell Slide 52

53 The post-decision state S = S ( S, x ) x M, x t t t 2008 Warren B. Powell Slide 53

54 The post-decision state x S t S = S ( S, W ) MW, x t+ 1 t t+ 1 W = ( Rˆ, Dˆ ) t+ 1 t+ 1 t Warren B. Powell Slide 54

55 The post-decision state S t Warren B. Powell Slide 55

56 The post-decision state Classical form of Bellman s equation: { } ( ) V ( S ) = max C ( S, x ) + E V ( S ) S t t t t t t+ 1 t+ 1 t x X Bellman s equations around pre- and post-decision states:» Optimization problem (making the decision): ( x( M, x )) V ( S ) = max C ( S, x ) + V S ( S, x ) t t x t t t t t t t Note: this problem is deterministic!» Simulation problem (the effect of exogenous information): {, } V ( S ) = E V ( S ( S, W )) S x x M W x x t t t t t t 2008 Warren B. Powell Slide 56

57 The post-decision state Challenges» For most practical problems, we are not going to be x x able to compute V ( S ). t ( x x ) V ( S ) = max C ( S, x ) + V ( S ) t t x t t t t t» Concept: replace it with an approximation V ( x t St ) and solve» So now we face: What should the approximation look like? How do we estimate it? t ( x ) V ( S ) = max C ( S, x ) + V ( S ) t t x t t t t t 2008 Warren B. Powell Slide 57

58 The post-decision state Value function approximations:» Linear (in the resource state): x V ( R ) = v R x t t ta ta a A» Piecewise linear, separable: x x Vt ( Rt ) = Vta ( Rta ) a A» Indexed PWL separable: ( x ) x Vt( Rt ) = Vta Rta ( featurest) a A 2008 Warren B. Powell Slide 58

59 The post-decision state Value function approximations:» Ridge regression (Klabjan and Adelman)» Benders cuts ( ) x V ( R ) = V R R = θ R t t tf tf tf fa ta f F a A f V ( R ) t t x 1 x Warren B. Powell Slide 59

60 The post-decision state Comparison to other methods:» Classical MDP (value iteration) ( n 1 γ ) t+ 1 n V ( S) = max C( S, x) + EV ( S ) x» Classical ADP (pre-decision state): n n n n 1 vˆ t = max x Ct( St, xt) + p( s' St, xt) Vt+ 1 s' s' V ( S ) = (1 α ) V ( S ) + α vˆ n n n 1 n n t t n 1 t t n 1 t, 1» Our method (update V x n around post-decision state): (, 1, ) n n x n M x n vˆ = max C ( S, x ) + V ( S ( S, x )) t x t t t t t t V ( S ) = (1 α ) V ( S ) + α vˆ n x, n n 1 x, n n t 1 t 1 n 1 t 1 t 1 n 1 t t ( ) Expectation vˆ updates V ( S ) t t t vˆ updates V ( S ) x t t 1 t Warren B. Powell Slide 60

61 The post-decision state n Step 1: Start with a pre-decision state S t Step 2: Solve the deterministic optimization using an approximate value function: ( 1,, ))) n n n M x n vˆ = max C ( S, x ) + V ( S ( S x t x t t t t t t n x t to obtain. Step 3: Update the value function approximation n x, n n 1 x, n n t 1( t 1) = (1 αn 1) t 1 ( t 1) + αn 1ˆ t V S V S v n Step 4: Obtain Monte Carlo sample of Wt ( ω ) and compute the next pre-decision state: n M n n n St+ 1 = S ( St, xt, Wt+ 1( ω )) Step 5: Return to step 1. Deterministic optimization Recursive statistics Simulation 2008 Warren B. Powell Slide 61

Cannot handle high levels of complexity Simulation Simulation Strengths Extremely flexible High level of detail Easily

62 Approximate dynamic programming Optimization Optimization Strengths Produces optimal decisions. Mature technology Weaknesses Cannot handle uncertainty. Cannot handle high levels of complexity Simulation Simulation Strengths Extremely flexible High level of detail Easily handles uncertainty Weaknesses Models decisions using user-specified rules. Low solution quality. Cplex Approximate dynamic programming combines simulation and optimization in a rigorous yet flexible framework.

63 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

64 Blood management Managing blood inventories 2008 Warren B. Powell Slide 64

65 Blood management Managing blood inventories over time Week 0 Week 1 Week 2 Week 3 S 0 x 0 S x 0 Rˆ, Dˆ 1 1 S 1 x 1 S x 1 Rˆ, Dˆ 2 2 S 2 x 2 S x 2 Rˆ, Dˆ 3 3 x S S 3 3 x 3 t=0 t=1 t=2 t= Warren B. Powell Slide 65

66 S t = ( R, ˆ ) t Dt AB+ D ˆt, AB+ x R t AB+,0 R t,( AB+,0) AB+,0 AB- D ˆt, AB AB+,1 R t,( AB+,1) R t,( AB+,2) AB+,1 AB+,2 A+ A- D ˆt, A+ D ˆt, AB+ AB+,2 AB+,3 R R R t,( O,0) t,( O,1) t,( O,2) O-,0 O-,1 O-,2 B+ B- D ˆt, AB+ D ˆt, AB+ D ˆt, AB+ O-,0 O-,1 O-,2 O+ O-,3 O- D ˆt, AB+ Satisfy a demand Hold

67 R t x R t R t + 1 R t,( AB+,0) AB+,0 AB+,0 R ˆt+ 1, AB+ AB+,0 R t,( AB+,1) AB+,1 AB+,1 AB+,1 R t,( AB+,2) AB+,2 AB+,2 AB+,2 AB+,3 AB+,3 R t,( O,0) O-,0 R t,( O,1) O-,1 O-,0 R ˆt+ 1, O O-,0 R t,( O,2) O-,2 O-,1 O-,1 O-,2 O-,2 O-,3 O-,3 Dˆt

68 R t x R t R t,( AB+,0) R t,( AB+,1) R t,( AB+,2) R R R t,( O,0) t,( O,1) t,( O,2) AB+,0 AB+,1 AB+,2 O-,0 O-,1 O-,2 AB+,0 AB+,1 AB+,2 AB+,3 O-,0 O-,1 O-,2 O-,3 Dˆt

69 R t x R t ( ) F R t R t,( AB+,0) R t,( AB+,1) R t,( AB+,2) R R R t,( O,0) t,( O,1) t,( O,2) AB+,0 AB+,1 AB+,2 O-,0 O-,1 O-,2 AB+,0 AB+,1 AB+,2 AB+,3 O-,0 O-,1 O-,2 Solve this as a linear program. O-,3 Dˆt

70 Duals R t x R t ( ) F R t ˆt ν,( AB +,0) AB+,0 AB+,0 ˆt ν,( AB +,1) AB+,1 AB+,1 ˆt ν,( AB +,2) AB+,2 AB+,2 AB+,3 ν ˆt,( AB,0) O-,0 ν ˆt,( AB,1) O-,1 O-,0 ν ˆt,( AB,2) O-,2 O-,1 O-,2 Dual variables give value additional unit of blood.. O-,3 Dˆt

71 Updating the value function approximation Estimate the gradient at n R t ˆn ν t,( AB+,2) ( ) F R t n R t,( AB+,2) 2008 Warren B. Powell Slide 71

72 Updating the value function approximation Update the value function at xn, Rt 1 V ( R ) n 1 x t 1 t 1 ˆn ν t,( AB+,2) ( ) F R t xn, Rt 1 n R t,( AB+,2) 2008 Warren B. Powell Slide 72

73 Updating the value function approximation Update the value function at xn, Rt 1 V ( R ) n 1 x t 1 t 1 ˆn ν t,( AB+,2) xn, Rt Warren B. Powell Slide 73

74 Updating the value function approximation Update the value function at xn, Rt 1 V ( R ) n 1 x t 1 t 1 V ( R ) n x t 1 t 1 xn, Rt Warren B. Powell Slide 74

75 Iterative learning t 2008 Warren B. Powell Slide 75

76 Iterative learning 2008 Warren B. Powell Slide 76

77 Iterative learning 2008 Warren B. Powell Slide 77

78 Iterative learning 2008 Warren B. Powell Slide 78

79 Approximate dynamic programming With luck, the objective function will improve steadily Objective function Iteration

80 Approximate dynamic programming but performance can be jagged. Chart Title Objective function Iterations

81 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

82 Hierarchical aggregation Attribute vectors: a = Asset class Time invested Blood type Age Frozen? Location ETA Bus. segment Single/team Domicile Drive hours Days from home Location ETA A/C type Fuel level Home shop Crew Eqpt1 Eqpt100

83 Aggregation Estimating value functions $2050 = (1 0.10) $ (0.10) $2500 Location Location Location Fleet v Fleet α v Fleet αvˆ Domicile = + Domicile Domicile DOThrs DaysFromHome n n 1 ( ) (1 ) ( ) ( ) Value function Approximation may have fewer attributes than driver. Drivers may have very detailed attributes

84 Hierarchical aggregation Estimating value functions» Most disaggregate level Location Location Location Fleet v Fleet α v Fleet αvˆ Domicile = + Domicile Domicile DOThrs DaysFromHome n n 1 ( ) (1 ) ( ) ( )

85 Hierarchical aggregation Estimating value functions» Middle level of aggregation Location Fleet Location Location v α v αvˆ Domicile Fleet = + Fleet DOThrs DaysFromHome n n 1 ( ) (1 ) ( ) ( )

86 Hierarchical aggregation Estimating value functions» Most aggregate level Location Fleet n n 1 v ([ Location] ) = (1 α) v ([ Location] ) + αvˆ ( Domicile ) DOThrs DaysFromHome

87 Hierarchical aggregation State-dependent weighted aggregation:» Now we have a linear regression for each attribute v = w v ( g) ( g) a a a g» We cannot solve hundreds of thousands of linear regressions. Instead use ( ( ) ( ) 2 ) β 1 w Var v + w = 1 ( g) ( g) ( g) ( g) a a a a g Estimate of variance Estimate of bias

88 Hierarchical aggregation Objective function Aggregate Disaggregate Iterations

89 Hierarchical aggregation Weighted Combination Objective function Aggregate Disaggregate Iterations

90 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

91 Stepsizes Stepsizes:» Fundamental to ADP is an updating equation that looks like: n x n 1 1( 1) (1 1) x n t t = n t 1 ( t 1) + n 1ˆ t V S α V S α v Updated estimate Old estimate New observation The stepsize Learning rate Smoothing factor

92 Stepsizes High noise data: Best stepsize α = 1/ n n Low noise, changing signal αn

93 Stepsizes Theorem 1 (general knowledge):» If data is stationary, then 1/n is the best possible stepsize. Theorem 2 (e.g. Tsitsiklis 1994)» For general problems, 1/n is provably convergent Theorem 3 (Frazier and Powell):» 1/n works so badly it should (almost) never be used.

94 Stepsizes Lower bound on the value function after n iterations:

95 Stepsizes Bias-adjusted Kalman filter Estimate of the variance α n where: = 1 ( n 1) 2 1 ( n + λ σ + β ) 2 2 n λ = 1 α λ + α ( ) n 1 ( ) n 2 σ 2 As increases, stepsize decreases toward 1/ σ Estimate of the bias n As β increases, stepsize increases toward 1. n 2 Noise n Bias

96 Stepsizes The bias-adjusted Kalman filter Observed values BAKF stepsize rule /n stepsize rule

97 Stepsizes The bias-adjusted Kalman filter Observed values BAKF stepsize rule

98 Stepsizes BAKF formula Our newest formula» where α n = α n = ( ( ) ) n 1 2 n 2 λ σ + β ( ( ) 2 ) n n σ λ σ β ( 1 2 ( ( ) ) ) n n λ σ + γ δ c ( 2 c ) ( n ) n 1 ( n + γ λ σ + λ σ + 1 ( 1 γ) δ ) » First stepsize formula designed specifically for dynamic n programming (note: no need to estimate bias β )» Captures the natural growth of a value function through iterative learning.

99 Stepsizes I recommend:» Start with a constant stepsize Vary it, get a sense of what works best» Next try a deterministic stepsize such as a/(a+n) Choose a so that it declines to roughly the best deterministic stepsize at an iteration comparable to where your constant stepsize seems to stabilize Might be 50 iterations Might be 5000» Try an (optimal) adaptive stepsize rule Can work very well if there is not too much noise Adaptive rules work well when there is a need to keep the stepsize from declining too quickly (but you do not know how quickly)

100 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

101 Two-stage optimization Piecewise linear, separable value function approximations: Piecewise linear, separable: Vt( Rt) = Vtl( Rtl) l L

102 Two-stage optimization Benders decomposition: VR ( ) t t x 1 x 0 Multidimensional cuts produce provably convergent, nonseparable value function approximation.

103 The competition Exact solutions using Benders: L-Shaped decomposition (Van Slyke and Wets) V 0 x 0 Stochastic decomposition (Higle and Sen) V 0 x 0 CUPPS (Chen and Powell) V 0 x 0

104 The competition 45 Percent over Percent optimal from optimal after iterations iterations Increasing problem size Percent error locations 25 locations 50 locations 100 locations SD L-shaped CUPPS SPAR Benders Separable

105 The competition Percent over Percent optimal from optimal after iterations iterations Increasing problem size 35 Percent error locations 25 locations 50 locations 100 locations SD L-shaped CUPPS SPAR Benders Separable

106 Multistage problems Deterministic, single commodity flow

107 Multistage problems Deterministic, single commodity flow» ADP solution as a percent of the optimal solution (from Cplex)

108 Multistage problems Stochastic, single commodity flow = optimal posterior bound 0 Rolling horizon ADP Percent of posterior optimal (20,100) (20,200) (20,400) (40,100) (40,200) (40,400) (80,100) (80,200) (80,400)

109 Multistage problems Deterministic, (integer) multicommodity flow

110 Multistage problems Deterministic, (integer) multicommodity flow = optimal continuous relaxation Percent of optimal Base T_30 T_90 I_10 I_40 C_II C_III C_IV R_1 R_5 R_100 R_400 C_1 C_8

111 Multistage problems Stochastic, (integer) multicommodity flow Rolling horizon ADP Percent of posterior optimal 0 Base I_10 I_40 C_II C_III C_IV R_1 R_5 R_100 R_400 C_1 C_8

112 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

113 Schneider National 2008 Warren B. Powell Slide 113

114 2008 Warren B. Powell Slide 114

115 Schneider National 1400 Revenue per WU US_SOLO US_IC US_TEAM Capacity category Revenue per WU U tiliz atio n Historical maximum Simulation Historical minimum Calibrated Historical min model and max Utilization Historical maximum Simulation Historical minimum US_SOLO US_IC US_TEAM Capacity category 2008 Warren B. Powell Slide 115

116 Princeton team: Warren Powell Belgacem Bouzaiene-Ayari NS team: Clark Cheng Ricardo Fiorillo Junxia Chang Sourav Das

117 2008 Warren B. Powell Slide 117

118 2008 Warren B. Powell Slide 118

119 2008 Warren B. Powell Slide 119

120 2008 Warren B. Powell Slide 120

121 Convergence Train coverage

122 Model calibration Train delay curve October 2007 Delay hours/day Fleet size

123 Model calibration Train delay curve March 2008 Delay hours/day Fleet size

124 Outline A resource allocation model An introduction to ADP ADP and the post-decision state variable A blood management example Hierarchical aggregation Stepsizes How well does it work? Some applications» Transportation» Energy

125 Energy resource modeling 2008 Warren B. Powell Slide 125

126 Energy resource modeling Hourly model» Decisions at time t impact t+1 through the amount of water held in the reservoir. Hour t Hour t Warren B. Powell Slide 126

127 Energy resource modeling Hourly model» Decisions at time t impact t+1 through the amount of water held in the reservoir. Hour t Value of holding water in the reservoir for future time periods Warren B. Powell Slide 127

128 Energy resource modeling 2008 Warren B. Powell Slide 128

129 Energy resource modeling 2008 Hour Warren B. Powell Slide 129

130 Energy resource modeling 2008 Hour Warren B. Powell Slide 130

131 Annual energy model Optimal Reservoir level Precipitation HydroUsage Remaining Wastage Demand Coal Wind Demand Warren B. Powell Slide 131

132 Annual energy model Approximate dynamic programming Reservoir level Precipitation Usage Remaining Wastage Demand Coal Wind Demand Warren B. Powell Slide 132

133 Annual energy model Optimal Reservoir level Precipitation HydroUsage Remaining Wastage Demand Coal Wind Demand Warren B. Powell Slide 133

134 Annual energy model Approximate dynamic programming Reservoir level Precipitation Usage Remaining Wastage Demand Coal Wind Demand Warren B. Powell Slide 134

135 Annual energy model 9000 ADP vs optimal reservoir levels for stochastic rainfall Reservoir level ADP at last iteration Optimal for individual scenarios Time period 2008 Warren B. Powell Slide 135

136 2008 Warren B. Powell Slide 136

The Dynamic Energy Resource Model

The Dynamic Energy Resource Model Group Peer Review Committee Lawrence Livermore National Laboratories July, 2007 Warren Powell Alan Lamont Jeffrey Stewart Abraham George 2007 Warren B. Powell, Princeton