Approximate Dynamic Programming in Rail Operations

1 Approximate Dynamic Programming in Rail Operations June, 2007 Tristan VI Phuket Island, Thailand Warren Powell Belgacem Bouzaiene-Ayari CASTLE Laboratory Princeton University Warren B. Powell, Princeton University

6 A bit of history
» Developed first locomotive optimization model based on the principles of approximate dynamic programming (called the "optimizing-simulator" back then).
» Implemented at Norfolk Southern Railroad as both a strategic and an operational planning system.
2002: New locomotive manager: "Why in the world would anyone do a project like this?"
» Redeveloped modeling library.
» Advances in approximate dynamic programming.
» Used to develop ADP-based optimization model for car distribution.
» Locomotive modeling project restarted using new modeling library, new algorithms, new implementation strategy.

7 Outline
The locomotive planning problem
Approximate dynamic programming
Solving the subproblem
Decomposition strategy
Implementation

9 Optimization models
Normally, we would formulate a big optimization problem:
$$\min \sum_{t \in T} \sum_{k} c_t^k x_t^k$$
subject to:
$$A_t^k x_t^k - B_{t-1}^k x_{t-1}^k = b_t^k$$
$$\sum_{k} x_t^k \le u_t$$
$$x_t^k \ge 0, \qquad x_t^k \in (0,1)$$
Integer!

10 Optimization models: Multicommodity flow formulation [figure, slides 10 and 11]

12 The challenge
Real-world issues:
The future is uncertain
» Trains are added (and dropped) with 1-3 days notice.
» Tonnage per train is not known until the last minute.
» Equipment fails.
» Trains arrive late.
Locomotive assignments are complex
» Locomotives are bundled in consists, and there is a penalty for breaking a consist. This requires that every locomotive be modeled individually.
» Leader locomotives: The lead locomotive has to have certain characteristics ranging from bulletproof glass to flush toilets.
» Shop routing: We need to route power toward shops for maintenance.
Data is not perfect
» Railroads are notorious for incomplete and imperfect data.
» There are multiple explanations when the model does not behave as we would expect.

13 The challenge
Competing methodologies:
Deterministic optimization
» Problem is NP-complete.
» Heuristics provide high-quality overall solutions, but can produce quirky solutions when evaluated up close.
» Puts equal weight on decisions now and in the future.
» Models here and now and the future at the same level of detail.
Simulation
» Able to handle a very high level of detail, but
» Does not attempt to provide the best possible solution.
» Suffers from complex rules needed to make decisions.
Stochastic programming
» Explodes problem size.
Dynamic programming/Markov decision processes
» You have to be kidding.

15 Approximate dynamic programming
The challenge of dynamic programming:
$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + \mathbb{E}\left\{ V_{t+1}(S_{t+1}) \mid S_t \right\} \right)$$
Problem: Curse of dimensionality. Three curses:
State space
Outcome space
Action space (feasible region)
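To give a feel for the state-space curse, the sketch below counts how many distinct resource vectors exist when identical resources are spread over a set of attribute values. The function name and the fleet sizes are illustrative assumptions, not figures from the talk; the count itself is the standard stars-and-bars formula.

```python
from math import comb


def num_resource_states(n_resources: int, n_attributes: int) -> int:
    """Number of ways to spread n identical resources over n_attributes bins."""
    return comb(n_resources + n_attributes - 1, n_attributes - 1)


# Hypothetical sizes: 2 locomotives over 3 locations, then 100 over 10.
small = num_resource_states(2, 3)
large = num_resource_states(100, 10)
```

Even this simplified count, which ignores time, demands and locomotive detail, grows far too quickly to enumerate states in a lookup table.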

16 [Figure: decision tree for scheduling a game, showing decision nodes and outcome nodes.]
First decision: Use weather report / Do not use weather report.
Information (forecast): rain .1, cloudy .3, sunny .6.
Second decision: Schedule game / Cancel game.
Payoffs: Schedule game: rain -$2000, clouds $1000, sun $5000. Cancel game: -$200 in all weather.
Weather probabilities (rain, clouds, sun) on the four outcome branches, in the order extracted: (.2, .3, .5), (.8, .2, .0), (.1, .5, .4), (.1, .2, .7).
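The decision tree can be rolled back numerically. The sketch below is one way to do it, using the payoffs and probabilities from the slide. Which weather distribution belongs to which forecast branch is not recoverable from the extracted text, so the pairing used here (and every identifier) is an assumption.

```python
# Payoffs and probabilities as read from the slide. ASSUMPTION: the pairing of
# each weather distribution with a forecast branch (or with "no report").
PAYOFF = {
    "schedule": {"rain": -2000, "clouds": 1000, "sun": 5000},
    "cancel": {"rain": -200, "clouds": -200, "sun": -200},
}
FORECAST_PROB = {"rain": 0.1, "cloudy": 0.3, "sunny": 0.6}
WEATHER_GIVEN_FORECAST = {
    "rain": {"rain": 0.8, "clouds": 0.2, "sun": 0.0},
    "cloudy": {"rain": 0.1, "clouds": 0.5, "sun": 0.4},
    "sunny": {"rain": 0.1, "clouds": 0.2, "sun": 0.7},
}
WEATHER_NO_REPORT = {"rain": 0.2, "clouds": 0.3, "sun": 0.5}


def best_action(weather_prob):
    """Decision node: take the expected payoff of each action and keep the best."""
    values = {
        action: sum(weather_prob[w] * payoff[w] for w in weather_prob)
        for action, payoff in PAYOFF.items()
    }
    action = max(values, key=values.get)
    return action, values[action]


def value_with_report():
    """Outcome node over forecasts, each followed by a decision node."""
    return sum(
        prob * best_action(WEATHER_GIVEN_FORECAST[forecast])[1]
        for forecast, prob in FORECAST_PROB.items()
    )


value_without_report = best_action(WEATHER_NO_REPORT)[1]
value_of_report = value_with_report() - value_without_report
```

Under the assumed pairing, the game is cancelled only after a rain forecast, and the report is worth the difference between the two rolled-back values.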

17 Transition function
Traditional modeling of dynamics:
$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$
where $S^M$ is the system model (transition function), $S_t$ is the current state of the system, $x_t$ is the decision, $W_{t+1}$ is the new information, and $S_{t+1}$ is the new state.

18 State variables
New concept:
The pre-decision state variable:
» $S_t$ = The information required to make a decision $x_t$.
» Same as a decision node in a decision tree.
The post-decision state variable:
» $S_t^x$ = The state of what we know immediately after we make a decision.
» Same as an outcome node in a decision tree.

19 State variables
Breaking down the system dynamics:
Instead of modeling from pre- to pre-decision state,
$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$
we use one function to go from pre- to post-decision state,
$$S_t^x = S^{M,x}(S_t, x_t)$$
with another function from post- to pre-decision state:
$$S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$$
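The split of the transition function into a decision step and an information step can be sketched as two small functions. The toy model below (one pool of resources, unserved demands lost) and all class and function names are assumptions made for illustration, not the locomotive model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreState:
    resources: int  # R_t
    demands: int    # D_t


@dataclass(frozen=True)
class PostState:
    resources: int  # R_t^x: resources left after the decision


@dataclass(frozen=True)
class Info:
    new_resources: int  # \hat R_{t+1}
    new_demands: int    # \hat D_{t+1}


def transition_decision(s: PreState, x: int) -> PostState:
    """S^{M,x}: deterministic effect of assigning x resources to demands."""
    if not 0 <= x <= min(s.resources, s.demands):
        raise ValueError("infeasible decision")
    return PostState(resources=s.resources - x)


def transition_information(sx: PostState, w: Info) -> PreState:
    """S^{M,W}: new arrivals turn a post-decision state into the next pre-decision state."""
    return PreState(resources=sx.resources + w.new_resources, demands=w.new_demands)


def transition(s: PreState, x: int, w: Info) -> PreState:
    """S^M: the traditional pre- to pre-decision transition, as a composition."""
    return transition_information(transition_decision(s, x), w)
```

Composing the two steps reproduces the traditional transition, while exposing the intermediate post-decision state around which the value function is later approximated.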

20 State variables: Pre-decision: resources and demands $S_t = (R_t, D_t)$, followed by the decision $x_t$ [figure]

21 State variables: Post-decision: $S_t^x = S^{M,x}(S_t, x_t)$ [figure]

22 State variables: New information: $W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1})$, which takes $S_t^x$ to $S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$ [figure]

23 State variables: New pre-decision: $S_{t+1}$ [figure]

24 State variables
Pre- and post-decision attributes for a discrete resource:

        t = 40         t = 40     t = 40          t = 50
        Pre-decision   Decision   Post-decision   Pre-decision
City    Dallas         Chicago    Chicago         Chicago
ETA     41.2           -          54.7            56.2
Equip   Good           -          Good            Repair

25 State variables
Bellman's equations broken into stages:
Optimization problem (making the decision):
$$V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + V_t^x\left(S^{M,x}(S_t, x_t)\right) \right)$$
» Note: this problem is deterministic! (In the decision tree: Schedule game / Cancel game.)
Simulation problem (the effect of exogenous information):
$$V_t^x(S_t^x) = \mathbb{E}\left\{ V_{t+1}\left(S^{M,W}(S_t^x, W_{t+1})\right) \mid S_t^x \right\}$$
» Need to compute expectation. (In the decision tree: Rain .8 -$2000, Clouds .2 $1000, Sun .0 $5000.)
Challenge: What is $V_t^x\left(S^{M,x}(S_t, x_t)\right)$?

26 Our general algorithm
Step 1: Start with a post-decision state $S_{t-1}^{x,n}$.
Step 2 (simulation): Obtain a Monte Carlo sample of $W_t(\omega^n)$ and compute the next pre-decision state:
$$S_t^n = S^{M,W}\left(S_{t-1}^{x,n}, W_t(\omega^n)\right)$$
Step 3 (optimization): Solve the deterministic optimization using an approximate value function:
$$\hat{v}_t^n = \max_{x_t} \left( C_t(S_t^n, x_t) + \bar{V}_t^{n-1}\left(S^{M,x}(S_t^n, x_t)\right) \right)$$
to obtain $x_t^n$.
Step 4 (statistics): Update the value function approximation, so that $\hat{v}_t^n$ updates $\bar{V}_{t-1}(S_{t-1}^x)$:
$$\bar{V}_{t-1}^n(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$$
Step 5: Find the next post-decision state:
$$S_t^{x,n} = S^{M,x}(S_t^n, x_t^n)$$
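The five steps can be sketched on a toy problem. Everything specific below is an assumption made for illustration: the model (hold a few units, see a random price each period, decide how many to sell), the lookup-table value function, the optimistic initial values and the harmonic stepsize. It is not the locomotive model or its implementation.

```python
import random


def adp_sell_or_hold(horizon=5, start=3, iterations=500, seed=0):
    """Forward-pass ADP with a lookup table around the post-decision state."""
    rng = random.Random(seed)
    prices = [1.0, 2.0, 3.0]
    # v_bar[t][r] approximates the value of holding r units after the decision
    # at time t. Optimistic initial values encourage holding early on; the
    # final period is terminal and worth nothing.
    v_bar = [[max(prices) * r for r in range(start + 1)] for _ in range(horizon)]
    v_bar.append([0.0] * (start + 1))

    for n in range(1, iterations + 1):
        alpha = 1.0 / n                 # harmonic stepsize (assumed)
        post = start                    # Step 1: post-decision state
        for t in range(1, horizon + 1):
            # Step 2: sample new information; pre-decision state is (post, price)
            price = rng.choice(prices)
            # Step 3: deterministic optimization against the current approximation
            values = [price * x + v_bar[t][post - x] for x in range(post + 1)]
            x_best = max(range(post + 1), key=values.__getitem__)
            v_hat = values[x_best]
            # Step 4: smooth v_hat into the value of the previous post-decision state
            v_bar[t - 1][post] = (1 - alpha) * v_bar[t - 1][post] + alpha * v_hat
            # Step 5: move to the next post-decision state
            post -= x_best
    return v_bar


v_bar = adp_sell_or_hold()
```

Note that the loop only updates post-decision states that the current policy actually visits, which is why the sketch starts from optimistic values rather than zeros.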

27 Iterative learning [figure sequence, slides 27 to 29; axis: t]

30 Value function approximations
Value function approximations:
Linear (in the resource state):
$$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$$
» Best when assets are complex, which means that $R_{ta}$ is small (typically 0 or 1).
Piecewise linear, separable:
$$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$$
» Best when assets are simple, which means that $R_{ta}$ may be larger.
Indexed PWL separable:
$$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\left(R_{ta}^x \mid \text{features}_t\right)$$
» Helps to capture dependencies.
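The difference between the linear and the piecewise linear separable forms shows up as soon as more than one unit shares an attribute. The sketch below stores each piecewise linear function as a list of per-unit marginal values; the attribute label follows the slides, while the numbers and function names are hypothetical.

```python
def linear_vfa(slopes, resources):
    """Sum over attributes a of v_a * R_a."""
    return sum(slopes[a] * count for a, count in resources.items())


def pwl_separable_vfa(marginals, resources):
    """Sum over attributes a of V_a(R_a), with V_a stored as per-unit marginal values.

    Units beyond the length of the marginal list are treated as worth zero.
    """
    return sum(sum(marginals[a][:count]) for a, count in resources.items())


atl_6hl = ("6HL", "ATL")                      # attribute label as on the slides
slopes = {atl_6hl: 10.0}                      # hypothetical slope
marginals = {atl_6hl: [10.0, 6.0, 2.0, 0.0]}  # hypothetical, diminishing returns

linear_value = linear_vfa(slopes, {atl_6hl: 3})
pwl_value = pwl_separable_vfa(marginals, {atl_6hl: 3})
```

The linear form values the third unit as highly as the first, whereas the piecewise linear form lets the marginal value fall as more units arrive.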

31 Value function approximations
Value function approximations:
Ridge regression (Klabjan and Adelman):
$$\bar{V}_t(R_t) = \sum_{f \in \mathcal{F}} \bar{V}_{tf}(R_{tf}), \qquad R_{tf} = \sum_{a \in \mathcal{A}_f} \theta_{fa} R_{ta}$$
Benders cuts (from stochastic programming) [figure: cuts on $V_t(R_t)$ at the points $x_0$ and $x_1$]
What worked: nested separable approximations

32 Coming in September, 2007??

34 Solving the subproblem
Elements of the problem
Here and now
» Assigning the right number, and the right types, of locomotives to trains.
» Must consider variety of operational constraints:
  Consist breakup
  Leader locomotives
Future
» We need to move power now to serve needs in the future.
» Future train movements consist of:
  Scheduled trains (but uncertain tonnages)
  Unscheduled trains
We have to consider
» Getting locomotives to shop
» Random travel times, equipment failures

35 Solving the subproblem [figure: locomotives, trains and yards at Atlanta, Baltimore and Jacksonville]

36 Solving the subproblem [figure: horsepower, locomotives; consist-breakup costs, shop routing bonuses/penalties, leader logic]

37 Solving the subproblem [figure: horsepower, locomotives; locomotives coming in on same train]

38 Solving the subproblem [figure: horsepower, locomotives; train reward function]

39 Solving the subproblem: The train reward function [figure: train reward against power, marking Minimum, Goal and Overpowering]

40 Solving the subproblem [figure, slides 40 and 41: horsepower, locomotives; train horsepower requirement and solutions]

42 Solving the subproblem [figure: horsepower, locomotives; locomotive buckets]

47 Linear value function approximations: 1990s linear value function approximations [figure: objective function (total reward) by iteration]

48 Measuring performance: Model vs. history (August, 2000) [figure: percent, history against model, for setouts, swaps, nonpreferred consists, underpowered, overpowered]

49 Results from the 1990s
1990s
Solved the locomotive assignment problem using a hybrid LP-relaxation and local search heuristic.
The value of locomotives in the future was estimated using linear functions.
Slopes of the value function were estimated using the dual variables from the LP relaxation.
Implemented in production at Norfolk Southern.
Weaknesses:
Linear value function approximations could be unstable.
Local search heuristic would occasionally produce anomalies.
» The fact that the solution was not as good was irrelevant.
» Anomalies reduced confidence in the model.
» Produced diagnostic problems, since it was not easy to identify whether an odd locomotive assignment was due to a data problem, a model problem, a coding issue, or the heuristic.

50 The challenge: That's not very good.

51 Solving the subproblem [figure, slides 51 and 52: horsepower, locomotives; locomotive buckets]

53 Status in :
Local search heuristic has been replaced with CPLEX.
» We have no trouble solving the IP to optimality, without sacrificing our ability to handle all the operational constraints.
Value function approximations:
» First replaced linear with piecewise linear separable:
  Separate PWL value function for each type of locomotive at each location.
  Works much better than linear, but not well enough.
  Occasionally would move 5 locomotives to a location that needed 4.
» Introduced nested, separable piecewise linear value function approximation.

54 Nested separable, nonlinear [figure: two levels of value function: the value of six-axle high-adhesion locomotives in Atlanta, and the value of locomotives in Atlanta]
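One way to read the nesting is that a type-level function (for example, six-axle high-adhesion locomotives in Atlanta) is combined with a location-level function of the total number of locomotives in Atlanta. The sketch below adds the two levels, which is an assumption made for illustration, as are the numbers and function names; the slides do not spell out how the levels are combined.

```python
def pwl_value(marginals, count):
    """Value of `count` units under a piecewise linear function stored as marginals."""
    return sum(marginals[:count])


def separable_value(type_marginals, resources):
    """Separable form: one function per locomotive type, no location-level term."""
    return sum(pwl_value(type_marginals[k], r) for k, r in resources.items())


def nested_value(type_marginals, location_marginals, resources):
    """Type-level values plus a location-level value of the total count (assumed additive)."""
    total = sum(resources.values())
    return separable_value(type_marginals, resources) + pwl_value(location_marginals, total)


# Hypothetical numbers: the location needs 4 locomotives in total.
flat_by_type = {"6HL": [9.0] * 4 + [0.0] * 2, "4HL": [9.0] * 4 + [0.0] * 2}
nested_by_type = {"6HL": [1.0] * 6, "4HL": [0.0] * 6}
by_location = [9.0, 9.0, 9.0, 9.0, 0.0, 0.0]

four, five = {"6HL": 3, "4HL": 1}, {"6HL": 3, "4HL": 2}
separable_gain = separable_value(flat_by_type, five) - separable_value(flat_by_type, four)
nested_gain = nested_value(nested_by_type, by_location, five) - nested_value(
    nested_by_type, by_location, four
)
```

With purely separable functions each type still sees value in a fifth locomotive, while the location-level term in the nested form recognizes that the yard as a whole is already covered.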

55 Nested separable nonlinear [figure: the resource vector $R_t^n$, with components $R^n_{t,(6HL,ATL)}$, $R^n_{t,(4HL,ATL)}$, $R^n_{t,(6HH,ATL)}$, $R^n_{t,(6HL,JAX)}$, $R^n_{t,(4HL,JAX)}$, $R^n_{t,(6HH,JAX)}$, passing through $F(R_t)$ to give $R_t^{x,n}$]

56 Nested separable nonlinear [figure: the same network from $R_t^n$ through $F(R_t)$ to $R_t^{x,n}$, annotated with $\hat{v}^n_{t,(6HL,ATL)}$, $\hat{v}^n_{t,(4HL,ATL)}$, $\hat{v}^n_{t,(6HH,ATL)}$, $\hat{v}^n_{t,(6HL,JAX)}$, $\hat{v}^n_{t,(4HL,JAX)}$, $\hat{v}^n_{t,(6HH,JAX)}$]

57 Updating the value function approximation: Estimate the gradient at $R_t^n$ [figure: $F(R_t)$ against $R^n_{t,(6HL,ATL)}$, with slope $\hat{\nu}^n_{t,(6HL,ATL)}$]

58 Updating the value function approximation: Update the value function at $R_{t-1}^{x,n}$ [figure: $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ and the slope $\hat{\nu}^n_{t,(6HL,ATL)}$ of $F(R_t)$ at $R^n_{t,(6HL,ATL)}$, applied at $R^{x,n}_{t-1,(6HL,ATL)}$]

59 Updating the value function approximation: Update the value function at $R_{t-1}^{x,n}$ [figure: $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ with slope $\hat{\nu}^n_{t,(6HL,ATL)}$ at $R^{x,n}_{t-1,(6HL,ATL)}$]

60 Updating the value function approximation: Update the value function at $R_{t-1}^{x,n}$ [figure: $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ and the updated $\bar{V}_{t-1}^{n}(R_{t-1}^x)$ at $R^{x,n}_{t-1,(6HL,ATL)}$]
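A slope update of this kind can be sketched as follows. The function smooths a sampled gradient into one slope of a piecewise linear function and then restores concavity by leveling any neighbouring slopes that now violate it. The leveling step, the stepsize and the numbers are assumptions for illustration; the slides do not say which concavity-preserving update was used.

```python
def update_slopes(slopes, r, nu_hat, alpha):
    """Smooth nu_hat into slope r, then restore non-increasing slopes by leveling.

    slopes[i] is the marginal value of the (i+1)-th unit; concavity means the
    list is non-increasing.
    """
    new = list(slopes)
    new[r] = (1 - alpha) * slopes[r] + alpha * nu_hat
    # Slopes to the left may not be smaller than the updated slope.
    for i in range(r):
        if new[i] < new[r]:
            new[i] = new[r]
    # Slopes to the right may not be larger than the updated slope.
    for i in range(r + 1, len(new)):
        if new[i] > new[r]:
            new[i] = new[r]
    return new


# Hypothetical slopes for one locomotive type at one location.
updated = update_slopes([10.0, 8.0, 6.0, 4.0], r=2, nu_hat=12.0, alpha=0.5)
```

Only the slope at the visited resource level is smoothed toward the sample; the leveling pass touches its neighbours just enough to keep the function concave, so the subproblem remains easy to optimize.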

61 Nonlinear VFA: 2007 nested, separable nonlinear approximations [figure: objective by iteration]

63 Decomposition strategy
Spatial decomposition alternatives:
The entire company
» Simultaneously assign locomotives to trains across the entire company at a point in time.
» Makes it possible to assign power to trains in different locations.
One terminal at a time
» This was the strategy when we first used linear approximations.
» Reflects tendency of dispatchers to handle one yard at a time.
» Creates problems when close-by terminals are managed jointly.
Regions
» Decompose the company the way it is actually managed.
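The three alternatives differ only in how train departures are grouped into subproblems at a point in time. A minimal sketch of that grouping, assuming a hypothetical terminal-to-region mapping and made-up train identifiers (the terminal names come from the slides):

```python
from collections import defaultdict


def build_subproblems(trains, terminal_to_region, strategy):
    """Group (train_id, terminal) pairs into subproblems under a decomposition strategy."""
    groups = defaultdict(list)
    for train_id, terminal in trains:
        if strategy == "company":
            key = "ALL"
        elif strategy == "terminal":
            key = terminal
        elif strategy == "region":
            key = terminal_to_region[terminal]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        groups[key].append(train_id)
    return dict(groups)


# Hypothetical data: the region assignment is an assumption.
trains = [("T1", "Atlanta"), ("T2", "Baltimore"), ("T3", "Jacksonville"), ("T4", "Atlanta")]
terminal_to_region = {"Atlanta": "South", "Jacksonville": "South", "Baltimore": "North"}

by_company = build_subproblems(trains, terminal_to_region, "company")
by_terminal = build_subproblems(trains, terminal_to_region, "terminal")
by_region = build_subproblems(trains, terminal_to_region, "region")
```

Grouping by region lets nearby terminals share one subproblem, which is the case that one-terminal-at-a-time decomposition handles poorly.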

64 Decomposition strategy: Option 1: Optimize over entire company at each point in time [figure; axis: t]

65 Decomposition strategy: Option 2: Decompose by terminal [figure sequence, slides 65 to 67; axis: t]

68 [figure: Norfolk Southern]

70 Decomposition strategy: Option 3: Decompose by desk (region) [figure; axis: t]

72 Implementation strategy
Applications:
Strategic planning
» What is the impact of changes in fleet size and mix on train delay?
» How do changes in shop locations affect maintenance routing?
» How do changes in schedule affect train delay for a given locomotive fleet?
» How do changes in operating policies affect performance?
Tactical planning
» How much power will you have at each terminal 1, 2, 3 days out?
» Where do you anticipate being short power?
Real-time planning
» What train should a locomotive be assigned to in order to get it to shop on time?

73 Strategic planning: One supersource node for each type of locomotive [figure sequence, slides 73 to 77]

78 Strategic planning: 99.5 percent train coverage on a historical dataset [figure: coverage by iteration]

79 Strategic planning [figure, slides 79 and 80: total train delay (mins) against change in fleet size]

81 Strategic planning
Fleet sizing system:
Status
» Fleet sizing system has been delivered.
» Full laboratory functionality.
» Hope to have user acceptance by September,
Features
» Provides tradeoff between fleet size and train delay.
» Sensitive to train schedule, operating policies, transit reliability, shop location, fleet mix,
Implementation
» Runs as stand-alone system.
» Requires network, train schedule, business rules and parameters.

82 Operational forecasting: Operational forecasting system [figure, slides 82 and 83: initial inventories]

84 Operational forecasting
Operational forecasting system:
Status
» Extensive calibration as part of the fleet sizing system.
» Project will start fall,
Features
» Updates every 1-2 minutes.
» Models locomotives and trains at a high level of detail.
» Able to incorporate uncertainty into forecasts of train movements, tonnages and transit times.
» Uses same core model as the strategic planning system.
Implementation
» Not yet started.
» Sensitive issues:
  Accuracy of locomotive snapshot
  Ability to model unscheduled trains
  Balancing user expectations for the accuracy of the forecast with the realities of messy railroad operations

85 Real-time locomotive assignment
Real-time system:
Status
» Requires reoptimizing a single subproblem in real-time.
» Project will start summer, 2008.
Features
» Estimated reoptimization time for one subproblem < 1 second.
» Responds in real-time to user overrides.
» Subproblem talks to operational planning model.
Implementation
» Not yet started.
» Sensitive issues:
  Tight integration with existing user interface
