Optimizing a Dynamic Order-Picking Process


Yossi Bukchin, Eugene Khmelnitsky, Pini Yakuel
Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv 69978, ISRAEL

Abstract

This research studies the problem of batching orders in a dynamic, finite-horizon environment to minimize order tardiness and overtime costs of the pickers. The problem introduces the following trade-off: at every period, the picker has to decide whether to go on a tour and pick the accumulated orders, or to wait for more orders to arrive. By waiting, the picker risks higher tardiness of existing orders in exchange for lower tardiness of future orders. We use a Markov Decision Process (MDP) based approach to set an optimal decision-making policy. In order to evaluate the potential improvement of the proposed approach in practice, we compare the optimal policy with two naïve heuristics: (1) go on tour immediately after an order arrives, and (2) wait as long as the current orders can be picked and supplied on time. The optimal policy shows a considerable improvement over the naïve heuristics, in the range of 7%-99%, where the specific values depend on the picking process parameters. We have found that one measure, the slack percentage of the picking process, associated with the difference between the promised lead time and the single-item picking time, predicts quite accurately the cost reduction generated by the optimal policy. The structure and the properties of the optimal solutions have led to the construction of a more comprehensive heuristic method. Numerical results show that the proposed heuristic, MDP-H, outperforms the naïve heuristics in all experiments. As compared to the optimal solution, MDP-H provides close to optimal results for a slack of up to 40%.

1. Introduction

Order-picking is the process of retrieving items from stocking locations in a warehouse to satisfy given demands.
This process may involve as much as 60% of all labor activities in a warehouse and may account for as much as 65% of all operating expenses (Gademann and Van de Velde, 2005).

The performance of an order picking system is typically determined by seven factors: batching, picking sequence, storage policy, zoning, layout design, picking equipment and design of picking information. Some research has been concerned mainly with studying the joint effect of several factors on the performance of order picking systems. Petersen and Aase (2004) evaluated a number of picking, routing and storing methods in order to determine which combination of these factors is best in terms of picking time. Each combination was compared to a basic scenario, in which orders are picked separately, items are stored randomly and the traversal strategy is used for routing. They concluded that batching of orders leads to the largest improvement, especially when small-sized orders are frequent. Moreover, an improved storage policy (one which is not random, for example, class based) also achieves significant improvement, and with less sensitivity to order size. The best combination reduced the picking time by almost 30%. Other papers address each order picking performance factor separately. In that context, batching related studies are very common. Generally, the order batching problem is the problem of simultaneously assigning orders to batches and determining a picking tour for every batch so as to optimize an objective function. The main driver for batching is to reduce the average picking travel distance and thereby increase the throughput and improve due date performance. Gademann and Van de Velde (2005) addressed the problem of batching orders to minimize total travel time in a parallel aisle warehouse. This problem is also referred to as proximity batching, since the obvious motivation is to batch orders that are stored in nearby locations. They proved that the problem is NP-hard in the strong sense, but can be solved in polynomial time when the batch size is no greater than two orders.
In the past, many heuristics have been presented in the literature for proximity batching. Most of these heuristics first select a seed order for a batch and subsequently expand the batch with orders that have "proximity" to the seed order, as long as the picking cart capacity is not exceeded. The distinctive factor is the measure of the proximity of orders. Armstrong et al. (1979) considered proximity batching with fixed batch sizes and presented an integer programming model. Gibson and Sharp (1992) considered order batching in an order picking operation of storage and retrieval (S/R) machines. Elsayed and Lee (1996) investigated automated storage/retrieval (AS/R) systems where a due date is specified for each retrieval order. They considered the inclusion of both order retrieval and storage in the same tour when possible. Their main results include a set of rules for sequencing and batching orders to tours such that the total tardiness of retrievals per group of orders is minimized.

The routing strategies of pickers in the warehouse were investigated by Hall (1993). Three strategies for routing manual pickers are compared: (1) traversal, (2) midpoint, and (3) largest gap. The comparison was made by estimating the expected route length of each strategy. The results include a few rules of thumb which assist in choosing one strategy over another. For example, the third strategy is best when the average number of picks per aisle is relatively small. Another study was conducted by Roodbergen and de Koster (2001), who considered a parallel aisle warehouse where order pickers can change aisles at the ends of every aisle and also at a cross aisle halfway along the aisles. They concluded that in many cases the average order picking time can be decreased significantly by adding a middle aisle to the layout. In zoning, the warehouse is divided into zones, so that each order is divided into sub-orders which are allocated to the different zones. Every sub-order is picked in the respective zone, and the entire order is rejoined in the packing area. Jane and Laih (2005) studied a synchronized zone order picking system. In such a system, the pickers of all zones work on the same order simultaneously. In order to prevent balance loss, the authors suggest storing items which are likely to be a part of the same order in different zones. Next, they developed a natural cluster model for item assignment in the warehouse. In one case study, the proposed item clustering approach improved the system's efficiency by 29% and the order picking time by 8%. Jane (2000) developed a heuristic algorithm for a progressive zone picking system. Unlike synchronized zoning, under progressive zoning each order is processed by one zone picker at a time. The research objective was to balance workloads among all pickers, so that each one has almost the same load, and to adjust the zone size for order volume fluctuations.
The proposed method was illustrated and verified to achieve the objective through empirical data and simulation experiments. As described above, most of the related literature deals with the static problem of picking a fixed number of orders in the most efficient way while finding the best picking sequence or picking strategy (batching or zoning). However, in many warehouses and distribution centers (DC), the picking activity is executed under uncertainty, since the inter-arrival time of customer orders is stochastic by nature. Both a DC satisfying customer orders made via the Internet and an automotive warehouse providing spare parts for auto-shops are examples of such an environment. In this research, we address the problem of batching orders in a dynamic, finite-horizon environment to minimize order tardiness and overtime costs of the pickers. This problem is solved to optimality using a Markov decision process based approach. The performance of the optimal procedure was compared with two naïve heuristics and found to be significantly superior. The structure and the properties of the obtained solutions lead to the construction of an efficient heuristic, called MDP-H. The comparison between the proposed heuristic and the optimal one shows that MDP-H provides close to optimal solutions (up to 0.62%) for a slack of up to 40%. In all experiments, MDP-H provides better solutions than the two naïve heuristics. Although this paper mainly refers to the manual order picking system, we expect the results to be applicable to automatic systems as well, where AS/R machines are responsible for the picking operations. Equipped with a dual or triple shuttle, an AS/R machine is capable of picking a small number of orders simultaneously, just like a human picker who uses a multi-bin picking cart. Given this analogy, one can see the possible applicability to automatic systems: consider, for example, an AS/R machine that operates in a Blockbuster DVD rental center. Customers ask to rent DVDs at random times during the day, and a picking policy for the S/R machine must be defined with the purpose of maximizing the customer service level (minimizing order tardiness). The structure of this paper is as follows. In Section 2 the problem description is presented. Section 3 formulates the problem as a Markov Decision Process (MDP) and briefly outlines the solution algorithm. In Section 4 the optimal solution is compared with naïve batching strategies and some numerical results are presented. Section 5 analyzes a new heuristic, which is developed on the basis of the optimal strategies' properties learned from the MDP solutions. The performance of the heuristic is then compared both with the optimal approach and with the naïve heuristics. Finally, in Section 6 we discuss the main contribution of the paper and indicate further research opportunities.

2. Problem description

The problem studied can be outlined in the following manner. Orders, each of a single line item, are picked by one picker who uses a cart of limited capacity.
Different orders/items are placed in different bins of the cart during the picking tour. This picking method is referred to as sort-while-pick. Orders arrive according to a Poisson process with a mean of λ orders per period of time. All orders are supplied according to the same service level, by having the same customer lead time. Whenever an order is supplied after its due date, a penalty proportional to the number of tardiness periods is incurred. A finite horizon is considered, as the warehouse is closed at the end of each working day, after fulfilling all the orders of that day. Consequently, another kind of penalty is incurred whenever the picker keeps working after the end of the working day. This penalty is proportional to the number of overtime periods. The fundamental trade-off in our problem can be explained as follows. At every period, regardless of whether a new order has arrived, the picker has to decide whether to go on a picking tour and supply the orders accumulated so far, or to wait for more orders to arrive (to batch orders). The former decision may speed up the supply of the currently available orders; however, the picker may thereby miss an opportunity to batch more orders had he waited one more period. That is, by waiting, the picker risks higher tardiness of existing orders in exchange for potentially lower tardiness of future orders. Our goal is to set a decision-making policy that minimizes the average cost of order tardiness and worker overtime during a finite working day. It is clear that the time to pick a batch of n orders changes according to their storage locations in the warehouse. In this model, however, we assume that the picking tour time of n items, T(n), is an increasing function of the number of items, n, and independent of their locations. In addition, we assume that T(n) is a concave function of n, and therefore there is a motivation for batching items before going on a tour.

3. The solution approach

We have modeled the above problem via an MDP. The time horizon over which the optimal order picking policy is obtained is finite and associated with one working day. The working day is divided into periods of time, where the period length is set in such a way that the probability of the arrival of more than one order during a period is negligible. At the beginning of each period (i.e., at each decision epoch), the decision maker has two actions to choose from: to go on a picking tour or to wait another period. If he decides to wait, he receives no reward.
Otherwise, if he decides to go on a picking tour, he incurs a cost proportional to the tardiness of the orders batched so far. The system then evolves into the next decision epoch according to the transition probability matrix of the Markov chain. Each state is defined by the number of orders batched and their corresponding remaining times to supply. The number of states in our problem is finite, since the picker cannot accumulate more orders than the number of bins in the picking cart, and since we have defined the waiting time of an order to be limited (i.e., every order must be supplied within a predefined amount of time). The elements of the MDP model are described next.
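As a side check, the requirement that at most one order arrives per period can be quantified directly for a Poisson stream. The sketch below (with illustrative per-period values of λ, not taken from the paper) computes the probability of more than one arrival in a single period:

```python
from math import exp

def p_more_than_one(lam_per_period: float) -> float:
    """P(more than one Poisson arrival in one period), i.e. 1 - P(0) - P(1)."""
    return 1.0 - exp(-lam_per_period) * (1.0 + lam_per_period)

for lam in (0.05, 0.1, 0.2):  # illustrative per-period arrival rates
    print(lam, round(p_more_than_one(lam), 5))
```

For small λ this probability is of order λ²/2, so shortening the period (and scaling λ down accordingly) makes the single-arrival assumption as accurate as desired.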

Decision epochs

Let {1, 2, ..., N} be a finite set of decision epochs, where decision epoch N denotes the end of a working day. According to the policy of the DC, no orders arrive in the last I periods of the working day, in order to allow the picker to supply all the orders that arrived during the first (N − I) periods. If I is chosen to be relatively small, then there is a good chance that the DC would have to remain open after decision epoch N and therefore pay for overtime. If I is relatively large, then there is only a small chance of overtime.

System states

Let S = S' ∪ Δ denote the set of the possible system states, where S' is the set of states describing the order batching process, and Δ is the set of states describing the picking tour. Let γ_i denote the remaining time to supply order i, and Γ_n = (γ_1, γ_2, ..., γ_n) the vector of remaining times to supply the n orders batched so far, where γ_1 < γ_2 < ... < γ_n. Recall that the strict inequality results from the fact that at most one order can enter the system within a single period. A member of the set S' = {s' | s' = (n, (γ_1, γ_2, ..., γ_n))} contains the number of orders batched, n, and their corresponding remaining times to supply, Γ_n. For example, the system's state s' = (2, (3, 5)) implies that two orders were batched so far: the first order is due in three periods and the second one in five periods. The state set S' is bounded, because the values of n and γ_i are bounded. For all i, γ_i is bounded from above by d, the planned lead time of an order, and from below by L, the lowest time left to the due date, so that L ≤ γ_i ≤ d; n is bounded by the number of bins in the picking cart, C. In case either the remaining time to supply the oldest order, γ_1, reaches the value of L, or the cart is full, the picker is forced to go on a picking tour. The state s' = (0, ∅) describes the system with no orders. As mentioned above, Δ is the set of states describing the picking tour.
The members of this set are defined by the time left to the end of the picking tour and the total length of the tour; i.e., Δ = {δ | δ = (k, T(n))}, where k is the time left to the end of the tour and T(n) is the length of the tour. For example, the system's state δ = (3, T(5)) implies that a picking tour will be over in three periods and its total length is T(5) periods. The Δ state space counts the periods left in the picking tour, in order to determine the epoch at which the system comes back to the S' state space. The tour length is also kept as part of the state in order to calculate the correct transition probability to the S' state space.
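To make the finiteness of S = S' ∪ Δ concrete, the following sketch enumerates both state sets for small, hypothetical values of C, d, L and a hypothetical tour time function (none of these values come from the paper's experiments):

```python
from itertools import combinations

# Illustrative parameters, kept small so the state space is easy to inspect.
C, d, L = 2, 4, 0

def T(n):
    """Hypothetical tour-time function (increasing in n)."""
    return 2 + n

# S': the empty state (0, ()) plus every (n, (gamma_1 < ... < gamma_n))
# with L <= gamma_i <= d and n <= C. combinations() yields strictly
# increasing tuples, matching the strict inequality gamma_1 < ... < gamma_n.
batch_states = [(0, ())]
for n in range(1, C + 1):
    for gammas in combinations(range(L, d + 1), n):
        batch_states.append((n, gammas))

# Delta: (k, T(n)) for k = T(n)-1, ..., 1, plus the absorbing state (0, 0).
tour_states = {(0, 0)}
for n in range(1, C + 1):
    for k in range(1, T(n)):
        tour_states.add((k, T(n)))

print(len(batch_states), len(tour_states))
```

With C = 2, d = 4, L = 0 this yields 1 + 5 + 10 = 16 batching states and 6 tour states, illustrating why backwards induction over the full state space is tractable.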

The state set Δ consists of the following members:

Δ = {(T(n) − 1, T(n)), (T(n) − 2, T(n)), ..., (1, T(n)), (0, 0)} for all n = 1, ..., C,

where δ = (0, 0) is the state of a picking tour that ends in one of the last I periods of the working day (i.e., an absorbing state, since no more orders arrive in the last I periods).

Actions

The action set A_s depends on state s and includes at most two actions for each state. The first action, a_1, is to wait for one more period, and the second action, a_2, is to go on a picking tour. Clearly, a choice of a_2 is prohibited during a picking tour and when no orders have been batched in the system. More precisely,

A_s = {a_1}, if s ∈ Δ, or if s ∈ S' and n = 0;
A_s = {a_2}, if s ∈ S' and n = C or γ_1 = L;
A_s = {a_1, a_2}, otherwise.

Rewards

Whenever action a_1 is chosen, the decision maker receives no reward. If action a_2 is chosen, then an immediate penalty, which is proportional to the tardiness of all the orders accumulated thus far, is incurred. Notice that since the length of the tour given n is assumed known, T(n), the tardiness can be calculated before the tour has actually started. We denote by c_T the tardiness penalty per period and by c_O the overtime penalty per period. Note that the overtime penalty is incurred only once, at the end of the working day, in epoch N. The value of the tardiness penalty at every time of the working day, t, and for every possible action and state combination, is

r_t(s, a) = 0, if a = a_1;
r_t(s, a) = −c_T Σ_{i=1..n} max(T(n) − γ_i, 0), if a = a_2 and s = (n, (γ_1, ..., γ_n)),

and the overtime penalty value at time N is

r_N((k, T(n))) = −k c_O, k = 1, ..., T(n);
r_N((0, 0)) = 0.

Transition probabilities

At each decision epoch t, given the system state s and the action a, we determine the probability of reaching a state j in the subsequent decision epoch t + 1. Two additional assumptions are made: the expected time to pick one order is larger than two periods; and the probability of more than C orders entering during the longest picking tour, T(C), is negligible. The transition probabilities differ in two distinct time frames. The first time frame comprises the first N − I periods, during which orders can enter the system; the second time frame comprises the last I periods, during which orders do not enter the system. The transition probability matrix of the first time frame is presented in (1). For t < N − I,

P_t(j | s, a) =
  1, if s ∈ S', n > 0; j ∈ Δ, j = (T(n) − 1, T(n)); a = a_2;
  1, if s ∈ Δ, s = (k, T(n)); j ∈ Δ, j = (k − 1, T(n)), k = 2, ..., T(n); a = a_1;
  e^(−λ), if s ∈ S', s = (n < C, (γ_1 > L, γ_2, ..., γ_n)); j ∈ S', j = (n, (γ_1 := γ_1 − 1, ..., γ_n := γ_n − 1)); a = a_1;
  1 − e^(−λ), if s ∈ S', s = (n < C, (γ_1 > L, γ_2, ..., γ_n)); j ∈ S', j = (n + 1, (γ_1 := γ_1 − 1, ..., γ_n := γ_n − 1, γ_{n+1} := d)); a = a_1;
  P_{n'}, if s ∈ Δ, s = (1, T(n)); j ∈ S', j = (n', (γ_1 = d − (T(n) − p_1), ..., γ_{n'} = d − (T(n) − p_{n'}))) if n' > 0, or j = (0, ∅) if n' = 0; a = a_1;
  0, otherwise,     (1)

where P_{n'} = [e^(−λT(n)) (λT(n))^{n'} / n'!] / C(T(n), n').

In the first line of (1), action a_2 (go on tour) is chosen, and the system evolves into the set of the picking tour states with probability 1. The remaining tour time in the next state, j, is T(n) − 1. In the second line, the system occupies a state from Δ and moves to another state from Δ with probability 1; the remaining tour time is decreased by one period. This is true for all Δ states apart from δ = (1, T(n)). The third and fourth lines consider a case in which the system occupies a state from S' and does not have to go on a tour immediately; that is, the number of batched orders is smaller than C and the oldest order has more than L periods left to its due date. Then, if action a_1 is chosen, the next state is determined by whether an order has entered the system (line 4) or not (line 3). The fifth line addresses a transition from the state δ = (1, T(n)) into a state from S'.
The transition to a specific state, j, is determined by both the number of orders that entered the system during the picking tour, n', and the time periods of the picking tour, denoted by p_1, p_2, ..., p_{n'}, in which the n' orders entered the system. To elaborate on the transition presented in the fifth line of (1), we consider the following example, presented in Figure 1. Let a picking tour last five periods, and let two orders enter the system during that tour, at periods 2 and 4. Then, n' = 2, p_1 = 2 and p_2 = 4.

Figure 1. Arrivals during a picking tour (after which the system returns to a state from S').

The state to which the system transits is defined as (2, (d − 3, d − 1)): there are two orders to be supplied, with ages of three periods and one period, respectively. The probability P in the example is calculated as follows. The probability that two orders arrive within a picking tour of five periods is e^(−5λ)(5λ)^2 / 2!. Due to the lack-of-memory property of the Poisson distribution, the two orders could have entered in any two of the five periods with the same probability. Since the number of such options is C(5, 2), the transition probability is finally obtained as P = [e^(−5λ)(5λ)^2 / 2!] / C(5, 2).

The transition probability matrix for the second time frame is given in (2). In this frame a picking tour is taken immediately, since no new orders can arrive. For N − I ≤ t < N,

P_t(j | s, a) =
  1, if s ∈ S', n > 0; j ∈ Δ, j = (T(n) − 1, T(n)); a = a_2; N − t < T(n);
  1, if s ∈ S', n > 0; j ∈ Δ, j = (0, 0); a = a_2; N − t ≥ T(n);
  1, if s ∈ S', n = 0; j ∈ Δ, j = (0, 0); a = a_1;
  1, if s ∈ Δ, s = (0, 0); j ∈ Δ, j = (0, 0); a = a_1;
  1, if s ∈ Δ, s = (k, T(n)); j ∈ Δ, j = (k − 1, T(n)), k = 2, ..., T(n); a = a_1;
  P*, if s ∈ Δ, s = (1, T(n)); j ∈ S', j = (n', (γ_1 = d − (T(n) − p_1), ..., γ_{n'} = d − (T(n) − p_{n'}))), or j = (0, ∅) if n' = 0; a = a_1; t + 1 − T(n) < N − I;
  0, otherwise,     (2)

where P* = [e^(−λ(N − I − t − 1 + T(n))) (λ(N − I − t − 1 + T(n)))^{n'} / n'!] / C(N − I − t − 1 + T(n), n').
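The example's probability can be checked numerically. The sketch below assumes an illustrative value λ = 0.1 (the paper does not fix λ at this point), and verifies that summing the per-placement probability over all placements recovers the Poisson probability of exactly two arrivals during the tour:

```python
from math import comb, exp, factorial, isclose

def poisson_pmf(k: int, mean: float) -> float:
    """P(exactly k arrivals) for a Poisson count with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

def p_specific_arrival_pattern(lam: float, tour_len: int, n_arrivals: int) -> float:
    """Probability of one specific placement of n_arrivals single arrivals
    within a tour of tour_len periods (uniform over all placements)."""
    return poisson_pmf(n_arrivals, lam * tour_len) / comb(tour_len, n_arrivals)

lam = 0.1  # illustrative arrival rate per period
P = p_specific_arrival_pattern(lam, tour_len=5, n_arrivals=2)

# Summing over all C(5, 2) equally likely placements recovers the Poisson
# probability of exactly two arrivals during the five-period tour.
assert isclose(comb(5, 2) * P, poisson_pmf(2, 5 * lam))
print(round(P, 6))
```

The same construction, with the arrival window shortened to N − I − t − 1 + T(n) periods, gives P* of the second time frame.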

In the first line of (2), N − t < T(n), and hence the last tour does not end before the end of the working day. In this case, the overtime length is kept for future calculation of the overtime cost. In the second line, there is enough time to complete the tour, and the system evolves to the absorbing state. In the third and fourth lines, the system moves to the absorbing state immediately. The dynamics of the states within a picking tour is addressed in the fifth line. The sixth line is similar to line 5 in (1); the only difference is that orders enter the system only in the first N − I − t − 1 + T(n) periods of the tour, rather than during the entire tour.

An optimal policy

The problem described above is characterized by a finite set of states, S, and a finite set of actions, A_s, for each s ∈ S. Therefore, there exists an optimal deterministic Markovian policy, as stated in Puterman (1994). Let u*_t(s_t) be the maximum total expected reward, starting from state s_t at decision epochs t, t + 1, ..., N. Then, u*_t(s_t) is obtained by the following backwards induction algorithm, which also gives the optimal actions A*_{s_t, t} for each state and each epoch.

The backwards induction algorithm
1. Set t = N and u*_N(s_N) = r_N(s_N) for all s_N ∈ S.
2. Substitute t − 1 for t and compute u*_t(s_t) for each s_t ∈ S by
   u*_t(s_t) = max_{a ∈ A_{s_t}} { r_t(s_t, a) + Σ_{j ∈ S} P_t(j | s_t, a) u*_{t+1}(j) }.
   Set A*_{s_t, t} = arg max_{a ∈ A_{s_t}} { r_t(s_t, a) + Σ_{j ∈ S} P_t(j | s_t, a) u*_{t+1}(j) }.
3. If t = 1, stop. Otherwise return to step 2.

4. Experiments

4.1. MDP model versus naïve heuristics

The main objectives of the experiments conducted in this section are:
- To validate the mathematical model.
- To evaluate the possible cost reduction from applying the proposed approach in a real order picking system.
- To gain insights into the structure and the properties of optimal solutions that will assist in developing new MDP based heuristic methods.
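The backwards induction recursion above can be sketched for a generic finite-horizon MDP. The states, rewards and transition kernel below are a hypothetical toy instance, not the order-picking model itself; the loop structure is what matters:

```python
# Backwards induction for a finite-horizon MDP, following steps 1-3 above.
# Toy instance: two states, penalties expressed as negative rewards.

N = 4                                    # decision epochs 1..N
states = ["wait", "tour"]
actions = {"wait": ["a1", "a2"], "tour": ["a1"]}

def r(t, s, a):
    """Illustrative reward: terminal penalty in 'tour', cost for acting a2."""
    if t == N:
        return -2.0 if s == "tour" else 0.0
    return -1.0 if a == "a2" else 0.0

def P(t, j, s, a):
    """Illustrative transition kernel (each row sums to 1)."""
    if a == "a2":
        return 1.0 if j == "tour" else 0.0
    return 0.5  # under a1, move to either state with equal probability

u = {s: r(N, s, None) for s in states}   # step 1: u*_N = r_N
policy = {}
for t in range(N - 1, 0, -1):            # steps 2-3: t = N-1, ..., 1
    u_next = {}
    for s in states:
        value, best_a = max(
            (r(t, s, a) + sum(P(t, j, s, a) * u[j] for j in states), a)
            for a in actions[s]
        )
        u_next[s] = value
        policy[(s, t)] = best_a
    u = u_next

print(policy[("wait", 1)], u["wait"])
```

In the paper's model, the outer loop runs over the day's periods and the inner loops over S = S' ∪ Δ, with r and P given by the reward and transition definitions of Section 3.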

In order to implement the MDP model, we have developed a computer code and obtained as output a table containing the optimal policy. Each row in the table stands for one of the possible system's states, excluding the Δ states, which do not involve any decision. Each column of the table stands for one time period of the working day. The data within the table specify the optimal action choice: "1" means "go on tour" and "0" means "wait another time period". An example is presented in Figure 2, where the upper left hand side of an optimal policy table is shown. For demonstration purposes, the "go on tour" policy was painted green, while the "wait another time period" policy was painted red. One can see, for example, that when the system contains a single item, the picker waits when the time to supply the order is relatively large; however, when this value is smaller than or equal to 6, the picker goes on a tour.

Figure 2. An example of an optimal policy table.

A simulation model of the order picking system was developed to evaluate the performance of the proposed MDP based solution procedure versus two naïve heuristics. The first heuristic (referred to as the Green heuristic from here on) is quite straightforward: whenever an order is waiting to be picked and the picker is available, the picker goes on a picking tour. The second heuristic (referred to as the Slack heuristic from here on) indicates that "waiting another time period" is preferred as long as no certain tardiness will occur. We say that the system has slack if the orders' picking time is smaller than the orders' remaining times to supply. The heuristic is called Slack since, as long as there is slack available in the system, the action choice is "wait another time period"; once there is no slack, the action choice is "go on tour".
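The two naïve decision rules can be stated compactly. The parameter values and tour time function below are hypothetical, chosen only to exercise the rules:

```python
# Illustrative parameters (hypothetical, not the paper's experimental values).
C, d, L = 3, 10, 0

def T(n):
    """Hypothetical tour time for n items."""
    return 4 + 2 * n

def green_action(gammas):
    """Green heuristic: go on tour whenever any order is waiting."""
    return "tour" if gammas else "wait"

def slack_action(gammas):
    """Slack heuristic: wait as long as no tardiness is certain, i.e. as long
    as the tour time for the current batch does not exceed the remaining time
    to supply of the oldest batched order (and no forced-tour condition holds)."""
    if not gammas:
        return "wait"
    n = len(gammas)
    forced = n == C or min(gammas) == L
    has_slack = T(n) <= min(gammas)
    return "wait" if has_slack and not forced else "tour"

print(green_action(()), slack_action((9,)), slack_action((3, 5)))
```

For example, with one order due in 9 periods and T(1) = 6, Slack waits; with two orders due in 3 and 5 periods and T(2) = 8, tardiness is certain and Slack goes on tour, as would Green.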

When examining the tables of the optimal policies obtained in the experiments, we were able to identify two major effects, demonstrated in Figure 3:

1. Steady state effect: at a certain point in time, far back from the end of the working day, the optimal policy becomes independent of time. In fact, at the steady state, the optimal policy can be expressed by a vector instead of a table, as each element denotes the optimal action given a certain state. The steady state effect is clearly illustrated in the left hand side of Figure 3(a).

2. Transient state effect: toward the end of the working day, the optimal policy shows a time-dependent, irregular pattern, as different actions are associated with the same state at different points in time (see the right hand side of Figure 3(a)). Note that in the last I periods, where no new orders arrive, the only action is "go on tour". Moreover, in some experiments we were able to identify an additional red shape adjacent to the last I periods, denoted as the tail. An example of such a tail is shown in Figure 3(b). In the "tail" region, despite the certain tardiness, the picker chooses to wait in order to save future overtime costs. We were also able to identify an influence of the cost parameters on the transient state, as the length of this state increases with the ratio of the overtime and tardiness cost parameters.

Figure 3. The two major effects identified in the optimal policy ((a) and (b)).

Another observation indicated that the optimal solution is mostly "green", i.e., the action "go on tour" is chosen more frequently than the action "wait another time period". We believe such behavior results from the relatively low order arrival rate. Indeed, when the arrival rate is low, the chance to batch an additional order while waiting another period is relatively low as well.

4.2. Experimental design

When analyzing the results of the preliminary experiments, we have determined the configuration of the final experiments in such a way that all the assumptions of the model are satisfied and all aspects of the optimal policy are clearly expressed. In particular, λ is chosen so that the probability of more than C orders arriving during a picking tour is small enough (1%). For tractability purposes, the value of C was set to three orders. The values of the other parameters are detailed in Table 1. Overall, we have conducted 25 experiments that model 25 different warehouse configurations. The tour time function is chosen with T(2) = T(1) + 1 and T(3) = T(2) + 1, which satisfies the concavity assumption of Section 2.

Table 1. The values of the problem parameters in the main experiments: C = 3; I = T(3) − 1; d = 25, 27, 30, 32, 35; L = 0; Slack: d − T(1) = 5, 7, 10, 12, 15.

4.3. Analysis of results

As noted above, the optimal policy was compared via simulation against the two naïve heuristics, denoted as Green and Slack. The simulation results are presented in Table 2, where each row refers to one experiment. The order lead time, d, and the system's slack (defined as d − T(1)) are displayed in the second and third columns, respectively. Next, the slack as a percentage of d is given. Column 5 contains the MDP optimal steady state vector, determined by the parameters n1 and n2, which represent the threshold between the green area and the red area in terms of the slack for one- and two-order states, respectively. The value Slack indicates that all the slack periods are painted green, and therefore the entire steady state vector is green. The value "0" indicates that all the slack periods are red, and therefore the steady state vector becomes identical to the vector of the Slack heuristic. The value "1" indicates that all of the slack periods, except the last one, are painted red. Surprisingly, in all the experiments, n1 and n2 took only three values: 0, 1 and Slack.
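Under the reading that the 25 configurations pair every lead time d with every slack value (as the sorted listing in Table 4 suggests), the range of slack percentages can be reproduced directly:

```python
# Slack percentage, 100 * slack / d, over the 25 experimental configurations
# (d and slack values as listed in Table 1; the full pairing is inferred).
ds = [25, 27, 30, 32, 35]
slacks = [5, 7, 10, 12, 15]

pcts = sorted(round(100 * s / d) for d in ds for s in slacks)
print(pcts[0], pcts[-1])  # smallest and largest slack percentage
```

The extremes, 5/35 and 15/25, give roughly 14% and 60%, matching the range of slack percentages reported in Tables 3 and 4.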
That is, as slack decreases (from 15 down to 5), n1 and n2 decrease as well, while skipping all intermediate values between Slack and 1. The existence of a tail is indicated in column 6. Note that the tail appears only for states with two orders. Columns 7, 8 and 9 contain the average daily cost (based on a simulation of 10,000 replications of a working day) of the optimal policy, the Green heuristic and the Slack heuristic, respectively. The last column contains the relative improvement of the optimal policy against the best heuristic.

Table 2. Experimental results. Each row reports one experiment: the lead time d, the slack and slack percentage, the MDP steady state vector (n1, n2), whether a tail exists, the average daily costs of the optimal policy, the Green heuristic and the Slack heuristic, and the relative improvement over the best heuristic [%].

The results indicate that the MDP optimal solution outperforms both heuristics in all the experiments; i.e., its average cost is always lower than the average costs of the two heuristics. Another observation, demonstrated in Figure 4, is that the slack percentage predicts the relative improvement over the best heuristic quite accurately, as the improvement percentage increases with the slack percentage.

Figure 4. The relative improvement as a function of the slack value.

Clearly, systems with larger slacks suffer from less tardiness and consequently enjoy lower average costs. One can notice from Table 2 that cases with high relative improvement are associated with low absolute values of improvement, which may sometimes be negligible. To stress this point, we have divided the results into three groups with respect to high, medium and small relative slack (see Table 3). In the medium relative slack scenarios, the average improvement is still significant while the average cost is far from negligible. Therefore, we conclude that the strength of our model lies in medium slack size scenarios.

Table 3. Absolute improvement versus relative improvement, by slack percentage range (44%-60%, 26%-43% and 14%-23%): average cost and improvement over the best heuristic.

The steady state vector of the Green heuristic is characterized by n1 = Slack and n2 = Slack. Similarly, for the Slack heuristic, n1 = 0 and n2 = 0. Now, we can notice that in all of the experiments, the MDP steady state vector is almost identical (there might be a difference of one or two action choices in the entire vector) to the steady state vector of one of the two naïve heuristics. Therefore, we conclude that the major part of the MDP model's benefit is due to the transient state effect. In addition, the structure of the optimal policy indicates that the higher the slack percentage, the more preferable the Green heuristic is against the Slack heuristic.

5. Heuristic methods

5.1. Background

In this section, a heuristic approach for large-scale problems is proposed. To this end, the structure of the optimal policy, expressed by the colored table (see Section 4.1), was analyzed. Fortunately, regular patterns were identified in the optimal policy. These patterns and their characteristics were the cornerstones of our heuristic design. We distinguish between patterns of the steady state and patterns of the transient state, and use these patterns in developing the heuristic. The main purpose of the proposed heuristic is to provide a close to optimal procedure which outperforms the best practice heuristics, named Green and Slack in the previous section. The patterns of the optimal procedure are outlined next.

Patterns in the steady state

We define the steady state as a time frame in which the action choice depends only on the system's state s and not on the decision epoch t. Thus, the steady state can be defined by a policy vector instead of a policy table. According to the optimal results, the steady state vector has only a few configurations. The structure of the steady state is described by the two parameters n1 and n2, each of which takes only three values. Specifically, one form of the steady state is a green vector, which prescribes going on tour for every possible state. Figure 5(a) illustrates such a case. Another form of the steady state vector is one of full slack usage, or one of full slack usage minus one. This is illustrated in Figure 5(b). Rarely, the slack usage could be uneven between states with two orders and states with one order; namely, n1 and n2 are not necessarily equal in all of the optimal solutions.
Full slack usage indicates that system states, in which slack is available, are painted red. Similarly, full slack usage minus one indicates that the same states are painted red, apart from the state with only one slack time period. This state is painted green. 6
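The vector-versus-table distinction can be illustrated with a simplified sketch (invented names; the paper's states also encode the number of waiting orders, whereas state r below only tracks the remaining slack of the oldest order):

```python
# Illustrative sketch: a steady-state policy is a vector over states;
# a policy table additionally indexes decision epochs.
WAIT, GO = "a1", "a2"   # a1 = wait another period, a2 = go on tour

def steady_state_vector(slack: int, minus_one: bool = False) -> dict:
    """Full slack usage: wait (red) whenever slack remains; the
    'minus one' variant turns the one-period-left state green."""
    vec = {r: (WAIT if r > 0 else GO) for r in range(slack + 1)}
    if minus_one:
        vec[1] = GO
    return vec

def to_policy_table(vector: dict, n_epochs: int) -> dict:
    """Paint every decision epoch with the same vector."""
    return {(t, r): a for t in range(n_epochs) for r, a in vector.items()}

vec = steady_state_vector(slack=5, minus_one=True)
table = to_policy_table(vec, n_epochs=3)
```

A green vector would simply map every state to GO; the table form becomes necessary only in the transient stage, where the action also depends on t.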

Figure 5. Steady state and transient patterns for a green solution with four red cubes (a) and a full slack usage solution (b).

When analyzing the results of the main experiments, we noticed that the steady state vector is strongly linked to the slack percentage. This observation was extremely helpful in constructing the heuristic policy. Table 4 shows the 25 experiments, sorted by the slack percentage. It is easily seen that (i) in low slack percentage systems the steady state is described by a green vector (i.e., n1 = n2 = 0); (ii) in medium slack percentage systems the steady state is described by a full slack usage minus one vector (i.e., n1 = n2 = 1); (iii) in high slack percentage systems the steady state is described by a full slack usage vector (i.e., n1 = n2 = Slack). As mentioned above, note that in experiments 3 and 7 the values of n1 and n2 are not equal. We refer to this issue later on.

Table 4: The 25 experiments sorted by the slack percentage.

  # exp.   MDP steady state (n1,n2)   Slack   Slack (%)   Tail (Y/N)   d
  15       (0,0)                      5       14%         Y            35
  20       (0,0)                      5       16%         Y            32
  5        (0,0)                      5       17%         Y            30
  10       (0,0)                      5       19%         Y            27
  14       (0,0)                      7       20%         Y            35
  25       (0,0)                      5       20%         Y            25
  19       (0,0)                      7       22%         Y            32
  4        (0,0)                      7       23%         Y            30
  9        (0,0)                      7       26%         Y            27
  24       (0,0)                      7       28%         Y            25
  13       (0,0)                      10      29%         Y            35
  18       (0,0)                      10      31%         Y            32
  3        (0,1)                      10      33%         N            30
  12       (0,0)                      12      34%         N            35
  8        (1,1)                      10      37%         N            27
  17       (1,1)                      12      38%         N            32
  2        (1,1)                      12      40%         N            30
  23       (1,1)                      10      40%         N            25
  11       (1,1)                      15      43%         N            35
  7        (1,Slack)                  12      44%         N            27
  16       (Slack,Slack)              15      47%         N            32
  22       (Slack,Slack)              12      48%         N            25
  1        (Slack,Slack)              15      50%         N            30
  6        (Slack,Slack)              15      56%         N            27
  21       (Slack,Slack)              15      60%         N            25

Patterns in the transient state

In the transient time, just before the end of the planning period, the system shows unstable behavior. Nevertheless, clear and repetitive patterns still exist. One clear pattern occurs in systems whose steady state vector is not green: in these cases, at least three green holes are seen in the policy table, as illustrated in Figure 5(b). Another noticeable pattern occurs in systems whose steady state vector is green: in these cases, at least four red cubes are observed in the policy table, as illustrated in Figure 5(a). Furthermore, the thickness of the cubes is the same in most of the cases. The transient state patterns also depend strongly on the slack percentage. Interestingly, they also depend on the length of the picking tour. For example, we have discovered that the exact starting position of each of the three green holes is determined by T(1). This is illustrated in Figure 6.

Figure 6. The position of the three green holes, determined by T(1).

The tail patterns are also apparent in the transient state. These patterns usually appear in low slack systems; specifically, in systems with a slack percentage lower than 32% (see Table 4). The tail is typically characterized by a fixed thickness and appears at specified places in the policy table (see Figure 3(b)). The last pattern associated with the transient state shows that the last I periods are always green, since in that time period no orders arrive, and therefore there is no need to wait.

Key points of the heuristic design

Three main principles have guided us in designing the heuristic approach:

1. A rough cut of the optimal policy. The basic idea of our design is to follow the general visual form of the optimal policy table. Consequently, we identify several types of problems based on their parameters and construct a typical generic heuristic policy for each problem type, based on the above patterns of the optimal policy. Still, our heuristic policy does not imitate the exact pattern of the optimal policy. For example, we ignore the jagged left side of the red patterns shown in Figure 6 and replace it with a rectangular pattern.

2. The maximum similarity principle. The heuristic policy comprises several parameters. The strategy for setting all these parameters was based on the results of the optimal solutions. Consequently, we identified empirical properties of the optimal solution with regard to each parameter, and determined the parameters of the heuristic policy accordingly.

3. A don't damage approach. The heuristic policy attempts to achieve better results than the two naive heuristics. Accordingly, we wanted the MDP heuristic policy to indicate action choices different from the naïve heuristics only when such an action yields improved performance over the best naïve heuristic. Therefore, we were very conservative in the parameter setting. When the optimal policy follows a pattern similar to the one in Figure 5(a), the green policy clearly outperforms the slack heuristic; in this case, we added only those cubes that were observed in all cases. Similarly, when the optimal policy follows a pattern similar to the one in Figure 5(b), the slack heuristic clearly outperforms the green heuristic; in these cases, only those green holes that were identified in all of the cases were added.

Figure 7 illustrates the rough cut approach by showing four policies, two optimal and two heuristic, for two problems.

Figure 7. Optimal versus heuristic policy in high and low slack percentage systems: (a) optimal policy (high slack); (b) optimal policy (low slack); (c) heuristic policy (high slack); (d) heuristic policy (low slack).

5.2 Algorithmic formulation

The following steps serve as instructions for constructing the MDP-based heuristic. These instructions are general and fit different warehouses with different configurations. Every parameter in the formulation below was set according to the maximum similarity principle and the don't damage approach described above.

Step 1: Calculate the slack percentage.

Step 2: Set n1 and n2 in the following manner:

  n1 = 0 if 0 <= slack percentage <= 0.36; 1 if 0.36 < slack percentage <= 0.46; Slack if 0.46 < slack percentage.
  n2 = 0 if 0 <= slack percentage <= 0.36; 1 if 0.36 < slack percentage <= 0.44; Slack if 0.44 < slack percentage.

Step 3 (generalize the steady state vector over the entire policy table):
1. According to the values of n1 and n2, set the steady state policy vector.
2. Use the steady state policy vector to paint the entire policy table in a unified manner.

Step 4 (set the three green holes or the four red cubes): If the slack percentage is lower than 0.44, perform the steps in the left column (constructing three green holes); otherwise, perform the steps in the right column (constructing four red cubes).

Three green holes:
1. t1 = N - I - T(1): the starting point of the first green hole.
2. t2 = t1 - T(1): the starting point of the second green hole.
3. t3 = t2 - T(1): the starting point of the third green hole.
4. A = T(1): the length of the first hole.
5. B = T(1): the length of the second hole.
6. C = T(1): the length of the third hole.
7. V1 = {t1, t1+1, ..., t1+A}: the group of decision epochs in which the first hole is present.
8. V2 = {t2, t2+1, ..., t2+B}: the group of decision epochs in which the second hole is present.
9. V3 = {t3, t3+1, ..., t3+C}: the group of decision epochs in which the third hole is present.
10. V = V1 ∪ V2 ∪ V3.
Set the three green holes as follows: for every decision epoch t in V and for every state s in S, set a = a2 (go on tour).

Four red cubes:
1. t1 = N - I: the starting point of the first red cube.
2. t2 = t1 - T(3): the starting point of the second red cube.
3. t3 = t2 - T(3): the starting point of the third red cube.
4. t4 = t3 - T(3): the starting point of the fourth red cube.
5. A = T(3): the length of the first cube.
6. B = T(3): the length of the second cube.
7. C = T(3): the length of the third cube.
8. D = T(3): the length of the fourth cube.
9. W1 = {t1, t1-1, ..., t1-A}: the group of decision epochs in which the first cube is present.
10. W2 = {t2, t2-1, ..., t2-B}: the group of decision epochs in which the second cube is present.
11. W3 = {t3, t3-1, ..., t3-C}: the group of decision epochs in which the third cube is present.
12. W4 = {t4, t4-1, ..., t4-D}: the group of decision epochs in which the fourth cube is present.
13. W = W1 ∪ W2 ∪ W3 ∪ W4.
Set the four red cubes as follows: for every decision epoch t in W and for every state s that uses the slack in full except the last unit (i.e., n1 and n2 equal 1
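Under our reading, Step 4 reduces to computing two sets of decision epochs. The sketch below is a reconstruction, not the authors' code; the sign conventions for the cube sets and names such as green_hole_epochs are ours:

```python
# Reconstruction sketch of Step 4: compute the decision epochs occupied
# by the three green holes or the four red cubes. N = working-day
# length, I = final always-green stretch, t1_pick / t3_pick = the
# picking-tour lengths T(1) and T(3).

def green_hole_epochs(N: int, I: int, t1_pick: int) -> set:
    """Three holes of length T(1), spaced T(1) apart, ending at N - I."""
    first = N - I - t1_pick
    starts = [first - k * t1_pick for k in range(3)]
    return {t for s in starts for t in range(s, s + t1_pick + 1)}

def red_cube_epochs(N: int, I: int, t3_pick: int) -> set:
    """Four cubes of length T(3), placed backwards from N - I."""
    first = N - I
    starts = [first - k * t3_pick for k in range(4)]
    return {t for s in starts for t in range(s - t3_pick, s + 1)}

def carve(table: dict, slack_pct: float, N: int, I: int,
          t1_pick: int, t3_pick: int) -> dict:
    """Apply Step 4 to a policy table keyed by (epoch, state)."""
    if slack_pct < 0.44:
        hole = green_hole_epochs(N, I, t1_pick)
        for (t, s) in table:
            if t in hole:
                table[(t, s)] = "a2"        # go on tour
    else:
        cube = red_cube_epochs(N, I, t3_pick)
        for (t, s) in table:
            if t in cube and s == 1:        # one slack period remaining
                table[(t, s)] = "a1"        # wait another period
    return table
```

Note that under this reading the three holes tile the interval [N - I - 3T(1), N - I] and the four cubes tile [N - I - 4T(3), N - I], which matches the repetitive spacing visible in Figure 6.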

), set a = a1 (wait another time period).

Step 5 (set the tail): On the basis of the slack percentage, set m1 and m2 (which determine whether a tail exists in states of one order and two orders, respectively), the tail thickness, and the stopping state of the tail. The stopping state indicates that no tails appear below it. The tail's parameters are determined according to Table 5.

Table 5. Tail parameter determination (m1 = 1 if a tail exists in one-order states and 0 otherwise; likewise m2 for two-order states; the tail thickness is uniform; the stopping state satisfies γ2 = T(2) in all but the highest slack range, where γ2 = T(2)+1).

Based on these parameters (m1, m2, tail thickness, stopping state), set the action a1 (wait another period) on the regions of the tail.

5.3 Experiments

After completing the design of the MDP-based heuristic, we conducted experiments to evaluate the performance of the heuristic procedure. The main purpose of the experiments was to compare its performance with the two naïve heuristics and the optimal algorithm. In addition, the effect of the length of the planning period was also examined. Since the slack percentage turned out to be very meaningful in the first session of experiments, we now determine T(1) indirectly, by defining the slack percentage as a direct independent parameter. Three parameters were examined. First, the length of the planning period was set to two levels (in decision epochs). Next, the order lead time, d, was set to 15, 30 and 45. Last, the slack percentage, identified in the previous set of experiments as the most influential parameter, was set to five values: 20, 33, 40, 53 and 60 percent. All other parameters were left the same as in the first session of experiments. The experimental results are presented in Table 6.
The first five columns contain the experiment number, the length (in time periods) of the working day N, the order lead time d, the picking time of one order T(1), and the slack percentage. These data define the warehouse configuration. The next four columns contain the average daily cost, evaluated by 10,000 runs of the simulation model (10,000 working days), for each of the four order-picking policies (the optimal MDP policy, the MDP-based heuristic and the two naive heuristics). The tenth column presents the relative improvement (in terms of average daily cost) of the MDP-based heuristic over the better naive heuristic. The next column indicates whether this difference is statistically significant (at a significance level of 95%). Finally, the percentile distance between the optimal policy and the MDP heuristic policy is shown.

Table 6. Experimental results in the second session.

  #   N   d   T(1)   Slack (%)   Opt.   MDP-H   Green   Slack   MDP-H vs. best naive H (%)   Significance   MDP-H vs. optimal (%)

MDP heuristic versus the two naive heuristics

Results show that the MDP-based heuristic (MDP-H) outperforms the best naïve heuristic in all of the experiments. The slack percentage has been identified as the only parameter that significantly affects the difference between MDP-H and the best naïve heuristic; this difference increases with the slack (see Figure 8). Note that the large difference in high slack systems stems from the fact that the total cost in these systems is very low and mostly associated with the transient stage. This issue is addressed next.
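The significance column described above can be reproduced in spirit with a paired comparison of simulated daily costs at the 95% level. The sketch below uses a plain normal approximation and invented cost streams, not the paper's simulator:

```python
import math

def significantly_different(costs_a, costs_b, z_crit=1.96):
    """Two-sided paired z-test on per-day cost differences (~95% level)."""
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((x - mean) ** 2 for x in diffs) / (n - 1)
    se = math.sqrt(var / n)                 # standard error of the mean
    return abs(mean) > z_crit * se

# Invented example: policy A is cheaper by about 1 cost unit per day,
# evaluated over 10,000 simulated working days.
a = [100.0 + (i % 5) for i in range(10_000)]
b = [101.0 + ((i + 2) % 5) for i in range(10_000)]
print(significantly_different(a, b))
```

With 10,000 paired days the standard error shrinks enough that even small average differences register as significant, which is consistent with testing at the 95% level over long simulation runs.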

Figure 8. Average improvement of MDP-H over the best naïve heuristic.

Sensitivity to different working day lengths

Two types of costs were considered in the analysis: the tardiness cost and the overtime cost. The former occurred mostly during the steady state, while the latter was observed in the transient state. When analyzing different lengths of the working day, the relative effect of the transient state clearly decreases with the length of the working day. Since the main advantage of MDP-H over the naïve heuristics is related to the transient state, we expected the difference between the two heuristics to increase as the working day becomes shorter. When we compared the experiments with the shorter working day to those with the longer one, no significant effect of the length of the working day was identified in high slack systems (slacks of 53% and 60%). The reason is that in such systems the tardiness cost, associated with the steady state, was close to zero, and hence the overtime cost became the most significant cost element; as a result, the effect of the working day length became negligible. Small and medium slack systems (up to 40%) demonstrate the increased effectiveness of the shorter planning horizon, as shown in Figure 9.

Distance from the optimal policy

The optimal policy was generated and compared to MDP-H. Not surprisingly, we have seen that in this case, too, the slack percentage significantly affects the distance from optimality. In particular, the distance from optimality increases with the slack. Figure 10 demonstrates this phenomenon: the difference between MDP-H and the optimal solution is relatively small (up to 0.62%) for a slack percentage of up to 0.4. For higher slacks, much larger differences are observed. However, as can be seen in Figure 4, high slack systems are characterized by very low costs; hence, the absolute value of the difference is insignificant.


More information

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05

More information

Chapter 2 Class-Based Storage with a Finite Number of Items in AS/RS

Chapter 2 Class-Based Storage with a Finite Number of Items in AS/RS Chapter 2 Class-Based Storage with a Finite Number of Items in AS/RS Abstract Class-based storage is widely studied in the literature and applied in practice. It divides all stored items into a number

More information

Statistical Quality Control - Stat 3081

Statistical Quality Control - Stat 3081 Statistical Quality Control - Stat 3081 Awol S. Department of Statistics College of Computing & Informatics Haramaya University Dire Dawa, Ethiopia March 2015 Introduction Lot Disposition One aspect of

More information

Real-Time Systems. Event-Driven Scheduling

Real-Time Systems. Event-Driven Scheduling Real-Time Systems Event-Driven Scheduling Hermann Härtig WS 2018/19 Outline mostly following Jane Liu, Real-Time Systems Principles Scheduling EDF and LST as dynamic scheduling methods Fixed Priority schedulers

More information

Chapter 2 SIMULATION BASICS. 1. Chapter Overview. 2. Introduction

Chapter 2 SIMULATION BASICS. 1. Chapter Overview. 2. Introduction Chapter 2 SIMULATION BASICS 1. Chapter Overview This chapter has been written to introduce the topic of discreteevent simulation. To comprehend the material presented in this chapter, some background in

More information

Queueing systems. Renato Lo Cigno. Simulation and Performance Evaluation Queueing systems - Renato Lo Cigno 1

Queueing systems. Renato Lo Cigno. Simulation and Performance Evaluation Queueing systems - Renato Lo Cigno 1 Queueing systems Renato Lo Cigno Simulation and Performance Evaluation 2014-15 Queueing systems - Renato Lo Cigno 1 Queues A Birth-Death process is well modeled by a queue Indeed queues can be used to

More information

On the static assignment to parallel servers

On the static assignment to parallel servers On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/

More information

Online Interval Coloring and Variants

Online Interval Coloring and Variants Online Interval Coloring and Variants Leah Epstein 1, and Meital Levy 1 Department of Mathematics, University of Haifa, 31905 Haifa, Israel. Email: lea@math.haifa.ac.il School of Computer Science, Tel-Aviv

More information

Lecture 13: Dynamic Programming Part 2 10:00 AM, Feb 23, 2018

Lecture 13: Dynamic Programming Part 2 10:00 AM, Feb 23, 2018 CS18 Integrated Introduction to Computer Science Fisler, Nelson Lecture 13: Dynamic Programming Part 2 10:00 AM, Feb 23, 2018 Contents 1 Holidays 1 1.1 Halloween..........................................

More information

Figure 10.1: Recording when the event E occurs

Figure 10.1: Recording when the event E occurs 10 Poisson Processes Let T R be an interval. A family of random variables {X(t) ; t T} is called a continuous time stochastic process. We often consider T = [0, 1] and T = [0, ). As X(t) is a random variable

More information

NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION MH4702/MAS446/MTH437 Probabilistic Methods in OR

NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION MH4702/MAS446/MTH437 Probabilistic Methods in OR NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION 2013-201 MH702/MAS6/MTH37 Probabilistic Methods in OR December 2013 TIME ALLOWED: 2 HOURS INSTRUCTIONS TO CANDIDATES 1. This examination paper contains

More information

250 (headphones list price) (speaker set s list price) 14 5 apply ( = 14 5-off-60 store coupons) 60 (shopping cart coupon) = 720.

250 (headphones list price) (speaker set s list price) 14 5 apply ( = 14 5-off-60 store coupons) 60 (shopping cart coupon) = 720. The Alibaba Global Mathematics Competition (Hangzhou 08) consists of 3 problems. Each consists of 3 questions: a, b, and c. This document includes answers for your reference. It is important to note that

More information

Coordinated Replenishments at a Single Stocking Point

Coordinated Replenishments at a Single Stocking Point Chapter 11 Coordinated Replenishments at a Single Stocking Point 11.1 Advantages and Disadvantages of Coordination Advantages of Coordination 1. Savings on unit purchase costs.. Savings on unit transportation

More information

Optimal Control of a Production-Inventory System with both Backorders and Lost Sales

Optimal Control of a Production-Inventory System with both Backorders and Lost Sales Optimal Control of a Production-Inventory System with both Backorders and Lost Sales Saif Benjaafar Mohsen ElHafsi Tingliang Huang 3 Industrial & Systems Engineering, Department of Mechanical Engineering,

More information

Another max flow application: baseball

Another max flow application: baseball CS124 Lecture 16 Spring 2018 Another max flow application: baseball Suppose there are n baseball teams, and team 1 is our favorite. It is the middle of baseball season, and some games have been played

More information

Latent voter model on random regular graphs

Latent voter model on random regular graphs Latent voter model on random regular graphs Shirshendu Chatterjee Cornell University (visiting Duke U.) Work in progress with Rick Durrett April 25, 2011 Outline Definition of voter model and duality with

More information

Dynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement

Dynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement Dynamic Call Center Routing Policies Using Call Waiting and Agent Idle Times Online Supplement Wyean Chan DIRO, Université de Montréal, C.P. 6128, Succ. Centre-Ville, Montréal (Québec), H3C 3J7, CANADA,

More information

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS

Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS Chapter 2 SOME ANALYTICAL TOOLS USED IN THE THESIS 63 2.1 Introduction In this chapter we describe the analytical tools used in this thesis. They are Markov Decision Processes(MDP), Markov Renewal process

More information

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm

Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Balancing and Control of a Freely-Swinging Pendulum Using a Model-Free Reinforcement Learning Algorithm Michail G. Lagoudakis Department of Computer Science Duke University Durham, NC 2778 mgl@cs.duke.edu

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.262 Discrete Stochastic Processes Midterm Quiz April 6, 2010 There are 5 questions, each with several parts.

More information

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr. Simulation Discrete-Event System Simulation Chapter 0 Output Analysis for a Single Model Purpose Objective: Estimate system performance via simulation If θ is the system performance, the precision of the

More information

Control Theory : Course Summary

Control Theory : Course Summary Control Theory : Course Summary Author: Joshua Volkmann Abstract There are a wide range of problems which involve making decisions over time in the face of uncertainty. Control theory draws from the fields

More information

Appendix: Simple Methods for Shift Scheduling in Multi-Skill Call Centers

Appendix: Simple Methods for Shift Scheduling in Multi-Skill Call Centers Appendix: Simple Methods for Shift Scheduling in Multi-Skill Call Centers Sandjai Bhulai, Ger Koole & Auke Pot Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands Supplementary Material

More information

Contextually found Mathematics: Organizing Goods within a Distribution Center

Contextually found Mathematics: Organizing Goods within a Distribution Center fall '00 Pre-calculus (note: could be adapted for Algebra or Advanced Algebra) Background Name: Date: Contextually found Mathematics: Organizing Goods within a Distribution Center Slot Numbering, p. 1

More information

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints

Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Energy-efficient Mapping of Big Data Workflows under Deadline Constraints Presenter: Tong Shu Authors: Tong Shu and Prof. Chase Q. Wu Big Data Center Department of Computer Science New Jersey Institute

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

Unit 1A: Computational Complexity

Unit 1A: Computational Complexity Unit 1A: Computational Complexity Course contents: Computational complexity NP-completeness Algorithmic Paradigms Readings Chapters 3, 4, and 5 Unit 1A 1 O: Upper Bounding Function Def: f(n)= O(g(n)) if

More information

Summarizing Measured Data

Summarizing Measured Data Summarizing Measured Data 12-1 Overview Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution Summarizing Data by a Single Number: Mean, Median, and Mode, Arithmetic,

More information

Proxel-Based Simulation of Stochastic Petri Nets Containing Immediate Transitions

Proxel-Based Simulation of Stochastic Petri Nets Containing Immediate Transitions Electronic Notes in Theoretical Computer Science Vol. 85 No. 4 (2003) URL: http://www.elsevier.nl/locate/entsc/volume85.html Proxel-Based Simulation of Stochastic Petri Nets Containing Immediate Transitions

More information

Traffic Modelling for Moving-Block Train Control System

Traffic Modelling for Moving-Block Train Control System Commun. Theor. Phys. (Beijing, China) 47 (2007) pp. 601 606 c International Academic Publishers Vol. 47, No. 4, April 15, 2007 Traffic Modelling for Moving-Block Train Control System TANG Tao and LI Ke-Ping

More information

Technical Note: Capacitated Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets

Technical Note: Capacitated Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets Technical Note: Capacitated Assortment Optimization under the Multinomial Logit Model with Nested Consideration Sets Jacob Feldman Olin Business School, Washington University, St. Louis, MO 63130, USA

More information

MDP Preliminaries. Nan Jiang. February 10, 2019

MDP Preliminaries. Nan Jiang. February 10, 2019 MDP Preliminaries Nan Jiang February 10, 2019 1 Markov Decision Processes In reinforcement learning, the interactions between the agent and the environment are often described by a Markov Decision Process

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning Model-Based Reinforcement Learning Model-based, PAC-MDP, sample complexity, exploration/exploitation, RMAX, E3, Bayes-optimal, Bayesian RL, model learning Vien Ngo MLR, University

More information

Stochastic Optimization

Stochastic Optimization Chapter 27 Page 1 Stochastic Optimization Operations research has been particularly successful in two areas of decision analysis: (i) optimization of problems involving many variables when the outcome

More information

A Polynomial-Time Algorithm to Find Shortest Paths with Recourse

A Polynomial-Time Algorithm to Find Shortest Paths with Recourse A Polynomial-Time Algorithm to Find Shortest Paths with Recourse J. Scott Provan Department of Operations Research University of North Carolina Chapel Hill, NC 7599-380 December, 00 Abstract The Shortest

More information

SYMBIOSIS CENTRE FOR DISTANCE LEARNING (SCDL) Subject: production and operations management

SYMBIOSIS CENTRE FOR DISTANCE LEARNING (SCDL) Subject: production and operations management Sample Questions: Section I: Subjective Questions 1. What are the inputs required to plan a master production schedule? 2. What are the different operations schedule types based on time and applications?

More information

The prediction of passenger flow under transport disturbance using accumulated passenger data

The prediction of passenger flow under transport disturbance using accumulated passenger data Computers in Railways XIV 623 The prediction of passenger flow under transport disturbance using accumulated passenger data T. Kunimatsu & C. Hirai Signalling and Transport Information Technology Division,

More information

Chapter 3: The Reinforcement Learning Problem

Chapter 3: The Reinforcement Learning Problem Chapter 3: The Reinforcement Learning Problem Objectives of this chapter: describe the RL problem we will be studying for the remainder of the course present idealized form of the RL problem for which

More information

Logistical and Transportation Planning. QUIZ 1 Solutions

Logistical and Transportation Planning. QUIZ 1 Solutions QUIZ 1 Solutions Problem 1. Patrolling Police Car. A patrolling police car is assigned to the rectangular sector shown in the figure. The sector is bounded on all four sides by a roadway that requires

More information

A New Dynamic Programming Decomposition Method for the Network Revenue Management Problem with Customer Choice Behavior

A New Dynamic Programming Decomposition Method for the Network Revenue Management Problem with Customer Choice Behavior A New Dynamic Programming Decomposition Method for the Network Revenue Management Problem with Customer Choice Behavior Sumit Kunnumkal Indian School of Business, Gachibowli, Hyderabad, 500032, India sumit

More information

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018 Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections

More information

Data Structures in Java

Data Structures in Java Data Structures in Java Lecture 21: Introduction to NP-Completeness 12/9/2015 Daniel Bauer Algorithms and Problem Solving Purpose of algorithms: find solutions to problems. Data Structures provide ways

More information

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Operations and Supply Chain Management Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras Lecture - 27 Flow Shop Scheduling - Heuristics - Palmer, Campbell Dudek

More information

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests

Two Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a

More information

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017

Simple Techniques for Improving SGD. CS6787 Lecture 2 Fall 2017 Simple Techniques for Improving SGD CS6787 Lecture 2 Fall 2017 Step Sizes and Convergence Where we left off Stochastic gradient descent x t+1 = x t rf(x t ; yĩt ) Much faster per iteration than gradient

More information