Joint Optimization of Sampling and Control of Partially Observable Failing Systems


OPERATIONS RESEARCH, Vol. 61, No. 3, May-June 2013, ISSN 0030-364X (print), ISSN 1526-5463 (online), INFORMS

Joint Optimization of Sampling and Control of Partially Observable Failing Systems

Michael Jong Kim
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada; and Department of Decision Sciences, NUS Business School, Singapore, Republic of Singapore

Viliam Makis
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada

Stochastic control problems that arise in reliability and maintenance optimization typically assume that information used for decision-making is obtained according to a predetermined sampling schedule. In many real applications, however, there is a high sampling cost associated with collecting such data. It is therefore of equal importance to determine when information should be collected and to decide how this information should be utilized for maintenance decision-making. This type of joint optimization has been a long-standing problem in the operations research and maintenance optimization literature, and very few results regarding the structure of the optimal sampling and maintenance policy have been published. In this paper, we formulate and analyze the joint optimization of sampling and maintenance decision-making in the partially observable Markov decision process framework. We prove the optimality of a policy that is characterized by three critical thresholds, which have practical interpretation and give new insight into the value of condition-based maintenance programs in lifecycle asset management. Illustrative numerical comparisons are provided that show substantial cost savings over existing suboptimal policies.

Subject classifications: reliability: maintenance/repairs; inspection; failure models; dynamic programming/optimal control: Markov; applications.
Area of review: Stochastic Models.
History: Received July 2011; revisions received June 2012, December 2012; accepted February 2013.

1. Introduction

Modern manufacturing and service industries rely heavily on complex technical systems for their everyday operations. These systems typically deteriorate and are subject to breakdowns due to usage and age. The high cost associated with unplanned breakdowns has stimulated a lot of research activity in the maintenance optimization literature, where the main focus has been on determining the optimal time to preventively repair or replace a system before it fails. One of the earliest and most significant contributions to this class of problems is the celebrated paper of Barlow and Hunter (1960). More recent contributions are given by Dogramaci and Fraiman (2004), Heidergott and Farenhorst-Yuan (2010), Kurt and Kharoufeh (2010), and Kim et al. (2011), among others.

The most advanced state-of-the-art maintenance program applied in practice is known as condition-based maintenance (CBM), which recommends maintenance actions based on information collected through online condition monitoring. CBM initiates maintenance actions only when there is strong evidence of severe system deterioration, which significantly reduces maintenance costs by decreasing the number of unnecessary maintenance operations. For a recent overview of the mathematical models and technologies used in CBM, readers are referred to Jardine et al. (2006) and the references therein. The common assumption made in CBM optimization models is that information used for decision-making is obtained at periodic, equidistant sampling epochs.
Under this assumption, the goal is to determine the optimal maintenance policy that optimizes an objective function over a finite or infinite time horizon. Recent contributions are given by Dayanik and Gurler (2002), Makis and Jiang (2003), Wang et al. (2009), and Juang and Anderson (2004). The problem with the equidistant sampling assumption is that in many applications there is a high sampling cost associated with collecting observable data. It is therefore of equal importance to determine when information should be collected and to decide how this information should be utilized for maintenance decision-making. This type of joint optimization has been a long-standing problem in the operations research and maintenance optimization literature, and very few results regarding the structure of the optimal sampling and maintenance policy have been published.

Most work on the joint optimization of sampling and maintenance assumes that condition monitoring samples provide perfect information. That is, the true state of the

system is fully observable at times of condition monitoring, and unobservable otherwise. The seminal work of Ross (1971) is the most relevant early contribution to this stream of research. Under the expected total discounted reward criterion, the author considered a two-state model and showed that the optimal policy is characterized by at most four regions. The excellent follow-up work of Rosenfield (1976) and White (1978) investigates extensions of this result to the more general N > 2 model. White (1978), for example, showed, using partial ordering of vectors on the probability simplex, that the optimal policy has certain structural properties that correspond to the four-region policy described by Ross. The later work of Ohnishi et al. (1986) considered a similar perfect information problem. Under certain monotonicity assumptions, the authors were able to partially characterize the form of the optimal policy and showed that the times between successive samples are monotonically decreasing. More recently, Yeh (1996) considered a joint sampling and maintenance problem with perfect information and proposed a number of different algorithms to compute optimal policies, but was unable to characterize the form of the optimal policy.

A distinguishing feature of the model considered in this paper is that we do not assume samples provide perfect information. Rather, when the system is sampled, condition monitoring data are only stochastically related to the underlying system state. Indeed, in many real applications, condition monitoring data such as spectrometric oil data or vibration data give only partial (imperfect) information about the true system state. In the imperfect information setting, the aforementioned perfect information approaches cannot be directly used, because taking a sampling action no longer guarantees that the posterior state vector is at a vertex of the probability simplex. Although some work has been done on the joint optimization of sampling and maintenance with imperfect information, to our knowledge, very few structural optimality results have been reported in the maintenance optimization literature, even for the case when N = 2. The classical papers of Eckles (1968) and Smallwood and Sondik (1973) studied partially observable systems with imperfect information under a very general setting. Lovejoy (1987) also provides general results for certain classes of partially observable Markov decision processes (POMDPs) with concave cost-to-go functions. Ehrenfeld (1976) studied a partially observable failing system with two system states: a healthy state and a silent failure state. The author conjectured that under certain conditions, a policy with at most three regions is optimal. However, the author was unable to prove this claim. An excellent recent contribution is given by Maillart (2006), who considered maintenance policies for systems with perfect and imperfect information. The author derived structural properties for the perfect information case and used these properties to motivate heuristic policies for the imperfect information case. Other variants of the joint optimization of sampling and maintenance with imperfect information can be found in the papers of Anderson and Friedman (1978), Dieulle et al. (2003), Jiang (2010), Kander (1978), Kuo (2006), Lam and Yeh (1994), and Tagaras and Nikolaidis (2002) under different model assumptions, but again with few explicit structural results.
Most of the aforementioned papers model both the system state and sampling process in discrete time. Another interesting feature of our formulation is that we model the system state process in continuous time and the observation process in discrete time. The reason we choose this discrete-continuous modeling approach is that in many real systems, the time lengths between sampling epochs can be quite long relative to the system degradation. For example, in the mining industry, it is not uncommon that oil samples collected through condition monitoring can be taken only once every few hundred operational hours. This means that degradation and failure will typically occur at random times in between the sampling epochs, which cannot be captured in a discrete-discrete framework. Surprisingly, little research has been done on discrete-continuous models, which appear to be a better representation of real systems.

In this paper, we consider a system whose state information is unobservable and can only be inferred by taking a sample through condition monitoring. Condition monitoring data provide imperfect information in that they are only stochastically related to the underlying system state. System failure, on the other hand, is fully observable. The decision maker can decide when condition monitoring information should be collected, as well as when to initiate preventive maintenance. The objective is to characterize the structural form of the optimal sampling and maintenance policy that minimizes the long-run expected cost per unit time. The problem is formulated as a partially observable Markov decision process (POMDP). It is shown that monitoring the posterior probability that the system is in a so-called warning state is sufficient for decision-making. The primary contribution of the paper is the proof that the optimal control policy can be represented as a control chart with three critical thresholds, which monitors the posterior probability that the system is in the warning state. Such a control chart has direct practical value because it can be readily implemented for online decision-making. We also show that the three critical thresholds have a straightforward and intuitive interpretation, so that decisions can be easily justified and explained at a managerial level. Implications of the structural results, such as planning maintenance activities into the future, are discussed, and cost comparisons with other suboptimal policies are developed that illustrate the benefits of the joint optimization of sampling and control.

The joint optimization of sampling and maintenance considered in this paper should not be confused with the body of research that deals with pure sampling optimization. Such models are sometimes referred to in the literature as

optimal inspection, sequential sampling, or optimal checking models. In these models, the decision maker has control only over when the system should be sampled, whereas preventive maintenance decisions are either not permitted (e.g., Barlow et al. 1963, Badia et al. 2001, Parmigiani 1996) or follow some predetermined stopping rule (e.g., Ozekici and Pliska 1991, Nakagawa). This makes such problems significantly easier to analyze than the more general joint sampling and maintenance problem considered in this paper.

The remainder of the paper is organized as follows. In §2, we formulate and analyze the joint optimization problem in the POMDP framework. In §3, we determine the structural properties of the optimal policy. The dynamic optimality equation is derived, and we establish the form of the optimal sampling and maintenance policy. In §4, we develop an iterative algorithm to compute the optimal policy and the long-run expected average cost per unit time. We also provide numerical comparisons with other suboptimal policies that illustrate the benefits of the joint optimization of sampling and maintenance. In §5, we give concluding remarks and discuss possible extensions to our model and future research directions.

2. Problem Formulation

Let (Ω, F, P) be a complete probability space on which the following stochastic processes are defined. The state process (X_t, t ≥ 0) is modeled as a continuous-time homogeneous Markov chain with unobservable operational states 0, 1, ..., N − 1 and an observable failure state N, so that the state space of the Markov chain is S = {0, 1, ..., N − 1, N}. The instantaneous transition rates are
\[ q_{ij} = \lim_{h \to 0^+} \frac{P(X_h = j \mid X_0 = i)}{h}, \quad i \neq j, \qquad q_{ii} = -\sum_{j \neq i} q_{ij}, \]
and the state transition rate matrix is Q = (q_{ij})_{(N+1)\times(N+1)}. We assume that if i < j, then state i is not worse than state j, and state 0 denotes the state of a good-as-new or healthy system. To model such monotonic system deterioration, we assume that the state process is nondecreasing with probability 1, i.e., q_{ij} = 0 for all j < i. In particular, this implies that the failure state is absorbing. We also assume that if i < j, then the failure rates satisfy q_{iN} ≤ q_{jN}.

Let ξ = inf{t ≥ 0 : X_t = N} be the observable time of system failure. Upon system failure, mandatory corrective maintenance that takes T_F time units is performed at a cost C_F, which brings the system to a healthy state. To avoid costly failures, the decision maker can take a sample at a cost C_S. In real applications, condition monitoring samples are not available at the instant it is decided to sample the system. In particular, once it is decided to sample the system at time t, there is typically a delay of h > 0 time units (with magnitude that depends on the particular application) before the system is eventually sampled at time t + h. We therefore naturally assume that the decision maker has the opportunity to take (or not take) samples only at time points h, 2h, 3h, ....

Condition monitoring information at time nh is denoted Y_n and takes values in E = {1, ..., L}. Samples Y_n are stochastically related to the operational system state X_{nh}. In particular, while the system is in operational state X_{nh} = i ∈ {0, 1, ..., N − 1}, the sample Y_n has state-dependent distribution
\[ d_{iy} = P(Y_n = y \mid X_{nh} = i), \quad y \in E. \tag{1} \]
The state-observation matrix is denoted D = (d_{iy}). At the beginning of each decision epoch, the decision maker can initiate full system inspection to reveal (with probability 1) the current state of the system at a cost C_I.
If the system is found to be in a deteriorated state i ∈ {1, ..., N − 1}, preventive maintenance is performed that brings the system to a healthy state at a cost C_P(i). If the system is found to be in the healthy state, no preventive maintenance is performed, and the process continues. Full system inspection and preventive maintenance (in state i) take T_I and T_P(i) time units, respectively. We make the standard assumption that C_F ≥ max_i (C_I + C_P(i)). For every time unit the system remains in deteriorated state i ∈ {1, ..., N − 1}, an operating cost C_W(i) is incurred. The objective is to characterize the structural form of the optimal sampling and maintenance policy that minimizes the long-run expected average cost per unit time.

The problem can be formulated in the POMDP framework as follows. While the system is operational, one of the following three actions a_n must be taken at each decision epoch time nh:
1. Do nothing, and take an action at the next decision epoch time (n + 1)h.
2. Take a sample. Information from the sample Y_{n+1} is first made available for decision-making at the beginning of the next decision epoch time (n + 1)h.
3. Initiate full system inspection, possibly followed by preventive maintenance.

If nh time units have elapsed since the last maintenance action (full inspection, preventive maintenance, or corrective maintenance) and k samples Y_{n_1}, ..., Y_{n_k} have been collected at time points 0 < n_1 h < ... < n_k h ≤ nh, then it is well known from the theory of POMDPs (e.g., Bertsekas and Shreve 1978) that the N-dimensional posterior state vector Π_n = [Π_n(0), ..., Π_n(N − 1)], with
\[ \Pi_n(i) = P(X_{nh} = i \mid \xi > nh,\ Y_{n_1}, \ldots, Y_{n_k}), \tag{2} \]
the posterior probability that the system is in state i given all available information until time nh, represents sufficient information for decision-making at the nth decision epoch. Then, if an optimal stationary policy exists, it has the functional form of a stationary decision rule δ(Π) ∈ {1, 2, 3}, where

δ(Π) indicates the action a_n to be chosen when Π_n = Π. Let 𝒟 be the class of all stationary policies. From renewal theory, the long-run expected average cost per unit time is calculated for any stationary policy δ as the expected total cost TC_δ incurred in one cycle divided by the expected cycle length CL_δ, where a cycle is completed when either full system inspection, preventive maintenance, or corrective maintenance is carried out. For any stationary policy δ, let
\[ M_\delta = \inf\{ nh \ge 0 : \delta(\Pi_n) = 3 \} \tag{3} \]
represent the first time at which full system inspection is initiated, and let
\[ N_\delta = \#\{ n : \delta(\Pi_n) = 2,\ nh < M_\delta \} \tag{4} \]
represent the total number of samples collected in a cycle. Then, from the model description given above,
\[ TC_\delta = C_S N_\delta + \int_0^{M_\delta \wedge \xi} \sum_{i=1}^{N-1} C_W(i)\, 1\{X_t = i\}\, dt + C_I\, 1\{X_{M_\delta \wedge \xi} = 0\} + \sum_{i=1}^{N-1} (C_I + C_P(i))\, 1\{X_{M_\delta \wedge \xi} = i\} + C_F\, 1\{X_{M_\delta \wedge \xi} = N\}, \]
\[ CL_\delta = M_\delta \wedge \xi + T_I\, 1\{X_{M_\delta \wedge \xi} = 0\} + \sum_{i=1}^{N-1} (T_I + T_P(i))\, 1\{X_{M_\delta \wedge \xi} = i\} + T_F\, 1\{X_{M_\delta \wedge \xi} = N\}. \]
For the average cost criterion, the problem is to find a stationary policy, if it exists, minimizing the long-run expected average cost per unit time given by
\[ \frac{E_\Pi[TC_\delta]}{E_\Pi[CL_\delta]}, \tag{5} \]
where E_Π is the conditional expectation given Π_0 = Π. We assume that a new system is installed at the beginning of the first cycle, i.e., Π_0 = (1, 0, ..., 0).

We first transform the stochastic control problem (5) to an equivalent parameterized stochastic control problem (with parameter λ) with an additive objective function. This transformation is known as the λ-minimization technique, and its theory is developed in the excellent paper of Aven and Bergman (1986). Define for λ > 0 the function
\[ V_\lambda(\Pi) = \inf_{\delta} E_\Pi[\, TC_\delta - \lambda\, CL_\delta\, ]. \tag{6} \]
Then, Aven and Bergman (1986) showed that λ* determined by the equation
\[ \lambda^* = \inf\{ \lambda > 0 : V_\lambda(\Pi_0) \le 0 \} \tag{7} \]
is the optimal expected average cost for the stochastic control problem (5), and the stationary policy that minimizes the right-hand side of (6) for λ = λ* determines the optimal stationary policy. We refer to the function V_λ defined in (6) as the value function.

We have found, through our experience with real diagnostic data such as spectrometric oil data (e.g., Kim et al. 2011, Makis et al. 2006) and vibration data (e.g., Yang and Makis 2010), that it is usually sufficient (and even preferable) to consider only two operational states: a healthy state (state 0) and a warning state (state 1). In many cases, the system moves through two distinct phases of operation. In the first and longer phase, the system operates under normal conditions, and the observations behave in a stationary manner. Although system degradation can be gradual, it is usually not until degradation has exceeded a certain level that the behavior of the condition monitoring observations changes and there is a nonnegligible increase in the system failure rate. At this point, the system enters the second and shorter phase, which we define to be the warning state. Such a characterization is consistent with the CBM paradigm, as it has the desirable property that maintenance actions are initiated only when the system experiences severe deterioration that can actually cause imminent failure. Accordingly, in this paper we shall focus on the case in which S = {0, 1, 2}, i.e., N = 2. In this setting, it follows that the univariate quantity
\[ \Pi_n = P(X_{nh} = 1 \mid \xi > nh,\ Y_{n_1}, \ldots, Y_{n_k}), \tag{8} \]
the probability that the system is in warning state 1 given all available information until time nh, represents sufficient information for decision-making at the nth decision epoch. We also simplify the notation for the cost and time parameters and write C_P(1) = C_P, C_W(1) = C_W, and T_P(1) = T_P.
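To make the two-phase behavior described above concrete, the following Python sketch simulates the hidden three-state chain in continuous time and draws an imperfect observation at each sampling epoch while the system is still operating. The rate matrix, observation matrix, and sampling interval used here are illustrative placeholders chosen only to make the sketch runnable; they are not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# States: 0 = healthy, 1 = warning, 2 = failure (absorbing).  Upper-triangular
# rates encode nondecreasing deterioration; q12 >= q02 gives the warning state
# a higher failure rate.  All values below are assumed for illustration only.
Q = np.array([[-0.21,  0.20,  0.01],
              [ 0.00, -0.10,  0.10],
              [ 0.00,  0.00,  0.00]])
D = np.array([[0.80, 0.15, 0.05],      # d[i, y] = P(Y = y | X = i), i = 0, 1
              [0.10, 0.30, 0.60]])
h = 2.0                                # delay / spacing between decision epochs

def simulate_until_failure():
    """Simulate the hidden chain via competing exponential clocks and draw an
    (imperfect) observation Y_n at every epoch nh reached before failure."""
    state, t, next_epoch, history = 0, 0.0, h, []
    while True:
        rate = -Q[state, state]
        if rate == 0.0:                                    # absorbing failure state
            return t, history                              # (failure time, [(nh, Y_n)])
        sojourn = rng.exponential(1.0 / rate)
        while next_epoch <= t + sojourn:                   # epochs passed in this state
            y = rng.choice(D.shape[1], p=D[state])
            history.append((next_epoch, y))
            next_epoch += h
        jump_probs = np.maximum(Q[state], 0.0) / rate      # embedded-chain probabilities
        state = rng.choice(len(jump_probs), p=jump_probs)
        t += sojourn

failure_time, observations = simulate_until_failure()
print(f"failure at t = {failure_time:.1f} after {len(observations)} samples")
```

In a typical run, the observations drawn while the chain is still in state 0 come from the first row of D, and their distribution shifts once the warning state is entered, which is exactly the change in observation behavior that the posterior probability Π_n is designed to detect.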
In principle, the N-state version of the problem can be studied and analyzed. However, the majority of the theoretical and practical results obtained in this paper do not carry over to the more general setting, which is another reason we strongly advocate for the three-state model. The main problem is rooted in the fact that for the N-state model, the sufficient statistic for decision-making is no longer a univariate quantity but is an N-dimensional vector, i.e., the posterior state distribution. This makes finding an analytical form for the sampling and control regions extremely difficult (and perhaps not possible). For example, although it can be shown for the N-state model that the optimal preventive maintenance region is a convex subset of the N-dimensional probability simplex, in practice this does not mean much. Without any analytical or explicit structural form, the control regions must be approximated by solving the corresponding multidimensional dynamic program, which is computationally intractable. In particular, solving such a multidimensional dynamic program is a PSPACE-hard problem and requires an exponential amount of memory and computation (see, e.g., Papadimitriou 1995).

To compound this problem, the optimal policy for the N-state model depends on the auxiliary parameter λ, so the optimality equation must be solved for different values of λ until the optimal λ* is found. As will be shown in §3, this is not the case for the three-state model. In particular, our characterization of the optimal policy will imply that the optimal control policy no longer depends on the parameter λ. Lastly, the optimal sampling and maintenance policy can no longer be visualized and represented as a simple univariate control chart, making decisions at a managerial level less intuitive and more difficult to automate and implement.

In the next section, we analyze the value function defined in (6) and determine the structure of the optimal sampling and maintenance policy. For the remainder of the paper, to simplify notation we suppress the dependence on λ when there is no confusion and write, for example, V instead of V_λ.

3. Structural Form of the Optimal Policy

The goal of this section is to characterize the form of the optimal sampling and maintenance policy. The strategy we take is to first analyze the control problem over a restricted subclass of stationary policies 𝒟_k in which full system inspection must be initiated no later than at time kh. The value function V_k for the restricted control problem is derived, and its properties are determined. The restriction is then lifted, and the properties of the restricted value functions V_k are carried over to the infinite-horizon value function V, which can be obtained as the limit V_k ↓ V. The dynamic optimality equation is then derived, and further properties of the infinite-horizon value function V are determined. It is then shown that the optimal policy is characterized by three critical thresholds, which have practical value and intuitive interpretation.

We begin by providing a closed-form expression for the transition probability matrix of the uncontrolled state process X_t. By the model assumptions given in §2, it can be shown by solving the Kolmogorov backward differential equations (e.g., Grimmett and Stirzaker 2001) that the transition probability matrix of the uncontrolled state process is given by
\[ P(t) = (p_{ij}(t)) = \begin{pmatrix} e^{-\nu_0 t} & \dfrac{q_{01}(e^{-\nu_1 t} - e^{-\nu_0 t})}{\nu_0 - \nu_1} & 1 - e^{-\nu_0 t} - \dfrac{q_{01}(e^{-\nu_1 t} - e^{-\nu_0 t})}{\nu_0 - \nu_1} \\ 0 & e^{-\nu_1 t} & 1 - e^{-\nu_1 t} \\ 0 & 0 & 1 \end{pmatrix}, \tag{9} \]
where the transition probabilities are p_{ij}(t) = P(X_t = j | X_0 = i), i, j ∈ S, and the constants are ν_0 = q_{01} + q_{02} and ν_1 = q_{12}.

Suppose at decision epoch n the system has not failed, i.e., ξ > nh, and Π_n = π. Then for any t ≤ h, the probability that the system will not fail by nh + t is given by
\[ R(t \mid \pi) = P(\xi > nh + t \mid \xi > nh,\ \Pi_n = \pi) = (1 - p_{02}(t))(1 - \pi) + (1 - p_{12}(t))\pi. \tag{10} \]
The function R defined in (10) is known as the conditional reliability function. If the decision maker chooses action a_n = 2 (take a sample), then at the beginning of the next decision epoch n + 1, if ξ > (n + 1)h, a sample Y_{n+1} is made available, and the state probability is updated using Bayes' rule (e.g., Schervish 1995):
\[ \Pi_{n+1}(Y_{n+1}) = P(X_{(n+1)h} = 1 \mid \xi > (n+1)h,\ Y_{n+1},\ \Pi_n = \pi) = \frac{d_{1,Y_{n+1}}\,\big(p_{01}(h)(1-\pi) + p_{11}(h)\pi\big)}{d_{0,Y_{n+1}}\,p_{00}(h)(1-\pi) + d_{1,Y_{n+1}}\,\big(p_{01}(h)(1-\pi) + p_{11}(h)\pi\big)}. \tag{11} \]
On the other hand, if the decision maker chooses action a_n = 1 (do nothing), then at the beginning of the next decision epoch n + 1, if ξ > (n + 1)h, no new sample is available, so that the state probability is given by
\[ \Pi_{n+1}(\varnothing) = P(X_{(n+1)h} = 1 \mid \xi > (n+1)h,\ \Pi_n = \pi) = \frac{p_{01}(h)(1-\pi) + p_{11}(h)\pi}{R(h \mid \pi)} = \frac{p_{01}(h)(1-\pi) + p_{11}(h)\pi}{(1 - p_{02}(h))(1-\pi) + (1 - p_{12}(h))\pi}. \tag{12} \]
The empty-set symbol ∅ in (12) is used to indicate that no new sample Y_{n+1} was obtained at the beginning of decision epoch n + 1. We next analyze the control problem over a restricted subclass of stationary policies.
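The quantities in (9)-(12) are straightforward to compute. The following Python sketch implements the closed-form transition probabilities, the conditional reliability function, and the two Bayesian updates; the rates, observation matrix, and sampling interval are assumed placeholder values rather than the paper's.

```python
import numpy as np

# Assumed illustrative parameters (not the paper's): rates of the three-state chain,
# the state-observation matrix D, and the sampling interval h.  The closed form for
# p01(t) below requires nu0 != nu1.
q01, q02, q12 = 0.20, 0.01, 0.10
nu0, nu1 = q01 + q02, q12
D = np.array([[0.80, 0.15, 0.05],     # d[i, y] = P(Y = y | X = i), i = 0, 1
              [0.10, 0.30, 0.60]])
h = 2.0

def P(t):
    """Transition probability matrix (9) of the uncontrolled state process."""
    p00, p11 = np.exp(-nu0 * t), np.exp(-nu1 * t)
    p01 = q01 * (p11 - p00) / (nu0 - nu1)
    return np.array([[p00, p01, 1.0 - p00 - p01],
                     [0.0, p11, 1.0 - p11],
                     [0.0, 0.0, 1.0]])

def R(t, pi):
    """Conditional reliability (10): probability of surviving the next t time units."""
    Pt = P(t)
    return (1.0 - Pt[0, 2]) * (1.0 - pi) + (1.0 - Pt[1, 2]) * pi

def update_no_sample(pi):
    """Posterior (12): one period ahead given survival and no new sample."""
    Ph = P(h)
    return (Ph[0, 1] * (1.0 - pi) + Ph[1, 1] * pi) / R(h, pi)

def update_with_sample(pi, y):
    """Posterior (11): one period ahead given survival and a new sample Y = y."""
    Ph = P(h)
    num = D[1, y] * (Ph[0, 1] * (1.0 - pi) + Ph[1, 1] * pi)
    return num / (D[0, y] * Ph[0, 0] * (1.0 - pi) + num)

pi = 0.2
print(update_no_sample(pi), [round(update_with_sample(pi, y), 3) for y in range(3)])
```

Note that averaging the sampled update (11) over y with the weights g(y | π) defined below recovers the no-sample update (12), a fact used later in the proof of Corollary 2.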
For k ∈ ℕ, let 𝒟_k represent the class of stationary policies such that the time of the first decision epoch at which full system inspection is initiated is less than or equal to kh with probability 1, i.e., M_δ ≤ kh. Then, by the dynamic programming algorithm (e.g., Bertsekas and Shreve 1978), the value function for the restricted control problem,
\[ V_k(\pi) = \inf_{\delta \in \mathcal{D}_k} E_\pi[\, TC_\delta - \lambda\, CL_\delta\,], \tag{13} \]
satisfies the dynamic equations
\[ V_0(\pi) = C_I + C_P \pi - \lambda (T_I + T_P \pi), \qquad V_k(\pi) = \min\{ V_k^1(\pi),\ V_k^2(\pi),\ V_k^3(\pi) \}, \tag{14} \]
where
\[ V_k^1(\pi) = C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi)\, V_{k-1}\big(\Pi_1(\varnothing)\big), \]

\[ V_k^2(\pi) = C_S + C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi) \sum_{y \in E} V_{k-1}\big(\Pi_1(y)\big)\, g(y \mid \pi), \tag{15} \]
\[ V_k^3(\pi) = C_I + C_P \pi - \lambda (T_I + T_P \pi), \]
and
\[ g(y \mid \pi) = \frac{d_{0y}\, p_{00}(h)(1-\pi) + d_{1y}\,\big(p_{01}(h)(1-\pi) + p_{11}(h)\pi\big)}{R(h \mid \pi)}. \tag{16} \]
The first term V_k^1 in (15) is the expected cost if action 1 (do nothing) is chosen, and the decision maker runs the system for one period, updates the state probability to Π_1(∅) using Equation (12), and then continues optimally with k − 1 periods left. The second term V_k^2 is the expected cost if action 2 (take a sample) is chosen, and the decision maker runs the system for one period, collects a sample Y_1 = y, updates the state probability to Π_1(y) using Equation (11), and then continues optimally with k − 1 periods left. The third term V_k^3 is the expected cost if action 3 (full inspection) is chosen, and the decision maker stops the process for full system inspection, possibly followed by preventive maintenance.

It then follows from Equations (14)-(16) that the restricted value functions V_k have the following property.

Lemma 1. For each k, V_k(π) is a concave function of π.

Proof. See appendix.

We also have the following lower bound on the family of restricted value functions V_k.

Lemma 2. The restricted value functions V_k are uniformly bounded from below:
\[ V_k(\pi) \ge \frac{-\lambda (h + T_F)}{1 - R(h \mid 0)}. \tag{17} \]

Proof. See appendix.

Lemmas 1 and 2 allow us to characterize the infinite-horizon value function V defined in (6). For each k, since 𝒟_k ⊆ 𝒟_{k+1}, by the definition of V_k given in Equation (13) it follows that V_k ≥ V_{k+1}. Then by Lemma 2, since the restricted value functions V_k are uniformly bounded from below, lim_{k→∞} V_k = V exists, and by Lemma 1 the value function V is concave and bounded. Furthermore, Bertsekas and Shreve (1978) showed (Lemma 5.1, Proposition 5.4) under much more general conditions that it satisfies the following dynamic optimality equation, which gives us our first important structural result.

Theorem 1. The infinite-horizon value function defined in Equation (6) is obtained as the limit V = lim_{k→∞} V_k. Furthermore, V is a concave, bounded function of π, satisfying the dynamic optimality equation
\[ V(\pi) = \min\{ V^1(\pi),\ V^2(\pi),\ V^3(\pi) \}, \tag{18} \]
where
\[ V^1(\pi) = C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big), \]
\[ V^2(\pi) = C_S + C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi), \]
\[ V^3(\pi) = C_I + C_P \pi - \lambda (T_I + T_P \pi). \tag{19} \]

It then follows that the value function V is also nondecreasing.

Corollary 1. The infinite-horizon value function V is a nondecreasing function of π.

Proof. See appendix.

We next prove a theorem that makes use of the result in the classical paper of Barlow and Hunter (1960).

Theorem 2. Any policy that never stops the process to initiate full system inspection, i.e., M_δ = +∞, is not optimal.

Proof. Consider an age-based policy δ_n that initiates full system inspection at time nh. From renewal theory, the long-run expected average cost per unit time for this policy is given by
\[ g(n) = \frac{C_F\, p_{02}(nh) + C_I\, p_{00}(nh) + (C_I + C_P)\, p_{01}(nh) + C_W \int_0^{nh} p_{01}(t)\, dt + C_S\, E[N_{\delta_n}]}{E[nh \wedge \xi] + T_F\, p_{02}(nh) + T_I\, p_{00}(nh) + (T_I + T_P)\, p_{01}(nh)}. \tag{20} \]
Thus, to prove the claim, it suffices to show that
\[ \arg\min_n g(n) < +\infty. \tag{21} \]

To show (21), we derive an upper bound for arg min_n g(n) by considering a related process in which we remove all incentive to stop the process early, so that full system inspection must be done at a later time. In particular, consider a related process in which full system inspection costs C_I + C_P, whether the system is found to be in the healthy or the warning state, and all maintenance actions (corrective, inspection, and preventive) take 0 time units. We furthermore assume that there is no penalty to run the system longer, so that C_W = C_S = 0. Then, if preventive maintenance is scheduled at time nh, the expected average cost for this process is given by
\[ b(n) = \frac{C_F\, p_{02}(nh) + (C_I + C_P)\big(1 - p_{02}(nh)\big)}{E[nh \wedge \xi]}, \tag{22} \]
and clearly arg min_n g(n) ≤ arg min_n b(n). To complete the proof, we show that arg min_n b(n) < +∞, which implies Equation (21). Since we have assumed q_{12} > q_{02}, the failure rate of ξ is increasing. We have also assumed that C_F > C_I + C_P. Barlow and Hunter (1960) showed that under these assumptions, there exists a positive real value t* < +∞ such that t* is the unique minimizer of b(t). For our problem, arg min_n b(n) is required to be integer-valued. However, since t* is a unique minimizer, the function b(t) is increasing for t > t*. Thus, the minimizing n satisfies nh ≤ t* + h < +∞, which completes the proof.

The optimal sampling and maintenance policy is described by the following theorem.

Theorem 3. The optimal sampling and maintenance policy is characterized by three critical thresholds 0 ≤ π_L ≤ π_U ≤ π̄ ≤ 1. In particular, at decision epoch n:
1. If Π_n < π_L, do nothing, and run the system until the next decision epoch n + 1.
2. If π_L ≤ Π_n < π_U, take a sample.
3. If π_U ≤ Π_n < π̄, do nothing, and run the system until the next decision epoch n + 1.
4. If Π_n ≥ π̄, initiate full system inspection, followed possibly by preventive maintenance.
5. Corrective maintenance is carried out immediately upon system failure.

Proof. We first show that for π = 1, V^3(1) ≤ V^1(1) < V^2(1). We start with the second inequality, V^1(1) < V^2(1). By Equation (19), and using that Π_1(∅) = Π_1(y) = 1 when π = 1,
\[ V^1(1) - V^2(1) = R(h \mid 1)\, V\big(\Pi_1(\varnothing)\big) - R(h \mid 1) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid 1) - C_S = R(h \mid 1)\, V(1) - R(h \mid 1)\, V(1) \sum_{y \in E} g(y \mid 1) - C_S = -C_S < 0, \]
which implies V^1(1) < V^2(1). We next show the first inequality, V^3(1) ≤ V^1(1), using mathematical induction. The inequality is equivalent to V^3(1) = V(1). For k = 1, we assume V_1^3(1) > V_1(1) and derive a contradiction. Since it is then not optimal to stop the process to initiate full system inspection when π = 1, linearity of V_1^1, V_1^2, V_1^3 implies that V_1^3(π) > V_1(π) for all π ∈ [0, 1]. Since for each k, V_k ≥ V_{k+1}, it follows that the limit V(π) = lim_{k→∞} V_k(π) ≤ V_1(π) < V_1^3(π) = V^3(π) for all π, so the policy that never stops the process is optimal, which contradicts Theorem 2. Whence V_1^3(1) = V_1(1), and by Equation (14),
\[ C_I + C_P - \lambda(T_I + T_P) \le C_W \int_0^h p_{11}(t)\, dt - \lambda \int_0^h R(t \mid 1)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid 1)\big) + R(h \mid 1)\big(C_I + C_P - \lambda(T_I + T_P)\big). \]
Suppose now that for some k > 0, V_k^3(1) = V_k(1). Using the above inequality, it follows that
\[ V_{k+1}(1) = \min\{ V_{k+1}^1(1),\ V_{k+1}^2(1),\ V_{k+1}^3(1) \} = \min\{ V_{k+1}^1(1),\ V_{k+1}^3(1) \} = C_I + C_P - \lambda(T_I + T_P) = V_{k+1}^3(1), \]
which completes the inductive step. Therefore, the limit V(1) = lim_{k→∞} V_k(1) = V^3(1). Thus for π = 1, V^3(1) ≤ V^1(1) < V^2(1). Since for π = 0, V^1(0) < V^2(0) < V^3(0), the above inequalities, the concavity of V, and Equation (18) imply that the region {π : V(π) = V^3(π)} is a convex subset of [0, 1] of the form [π̄, 1] for some π̄ ≤ 1, and the region {π : V(π) = V^2(π)} is a convex subset of [0, 1] of the form [π_L, π_U) for some π_L ≤ π_U ≤ π̄, which completes the proof.

Theorem 3 shows that the optimal control policy can be represented as a type of control chart, which monitors the probability Π_n that the system is in a warning state. The intuitive interpretation of the three critical thresholds π_L, π_U, π̄ is as follows.
When the probability that the system is in a warning state is below the lower sampling limit (Π_n < π_L), the decision maker has high confidence that the system is in the healthy state 0, and therefore has little reason to take an expensive sample through condition monitoring to confirm this belief. Similarly, when the state probability is above the upper sampling limit (Π_n ≥ π_U), the decision maker has high confidence that the system is in the warning state 1, and therefore also has little reason to take the sample. It is only when the state probability satisfies π_L ≤ Π_n < π_U that the decision maker is unsure about the system's condition and is willing to pay for a sample to get a better idea about its health. However, once the state probability exceeds π̄, the risk of system failure and of incurring an expensive corrective maintenance cost is too high, so the decision maker should stop the process and initiate full system inspection, possibly followed by preventive maintenance.
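Once the three thresholds are available, implementing the chart is a few lines of code. The Python sketch below encodes the four-region decision rule of Theorem 3 and, anticipating Remark 1 below, computes how far in the future full inspection can be pre-scheduled when the posterior lies in the upper do-nothing band. The threshold values and rates are assumed placeholders, not optimized values from the paper.

```python
import numpy as np

# Assumed illustrative values; the optimized thresholds and rates in the paper differ.
pi_L, pi_U, pi_bar = 0.10, 0.45, 0.70
q01, q02, q12 = 0.20, 0.01, 0.10
nu0, nu1 = q01 + q02, q12
h = 2.0

def action(pi_n):
    """Theorem 3: 1 = do nothing, 2 = take a sample, 3 = initiate full inspection."""
    if pi_n < pi_L:
        return 1
    if pi_n < pi_U:
        return 2
    if pi_n < pi_bar:
        return 1
    return 3

def planned_inspection_time(pi_n, max_periods=10_000):
    """Remark 1: for pi_U <= pi_n < pi_bar no further samples are taken, so the
    posterior evolves deterministically; return the first multiple of h at which
    the m-step no-sample update (Equation (12) with horizon mh) reaches pi_bar."""
    for m in range(1, max_periods + 1):
        t = m * h
        p00, p11 = np.exp(-nu0 * t), np.exp(-nu1 * t)
        p01 = q01 * (p11 - p00) / (nu0 - nu1)
        pi_m = (p01 * (1 - pi_n) + p11 * pi_n) / ((p00 + p01) * (1 - pi_n) + p11 * pi_n)
        if pi_m >= pi_bar:
            return t
    return None          # threshold not reached within the horizon searched

print(action(0.5), planned_inspection_time(0.5))   # e.g. do nothing now, inspect later
```

Because no sample is taken above π_U, the posterior path used by planned_inspection_time is deterministic, so the returned horizon can be communicated to maintenance planners in advance.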

Remark 1. It is important to note that practitioners can also use the control policy described in Theorem 3 as a tool for planning maintenance activities in advance. For example, if π_U ≤ Π_n < π̄, the optimal action is to do nothing and run the system until the next decision epoch n + 1. However, since no sample is taken, the state probability at the next decision epoch,
\[ \Pi_{n+1} = \Pi_{n+1}(\varnothing) = \frac{p_{01}(h)(1 - \Pi_n) + p_{11}(h)\Pi_n}{(1 - p_{02}(h))(1 - \Pi_n) + (1 - p_{12}(h))\Pi_n}, \]
is a deterministic function of Π_n given by Equation (12). Therefore, the next maintenance action (full system inspection) can be scheduled to take place
\[ T = \inf\left\{ mh : \frac{p_{01}(mh)(1 - \Pi_n) + p_{11}(mh)\Pi_n}{(1 - p_{02}(mh))(1 - \Pi_n) + (1 - p_{12}(mh))\Pi_n} \ge \bar{\pi} \right\} \]
time units from now. Planning maintenance activities in advance is particularly useful in practice since suspending a system from operation for full inspection and maintenance could require significant preparation.

Remark 2. It is interesting to note that the result obtained in Theorem 3 is consistent with the "at most four regions" result first introduced by Ross (1971) in the perfect information setting. That is, the three critical thresholds π_L ≤ π_U ≤ π̄ partition the interval [0, 1] into at most four regions (see Figure 2).

Intuitively, one would expect that if the sampling cost C_S = 0, we should always take a sample. On the other hand, if the sampling cost is greater than the cost of full system inspection and preventive maintenance, i.e., C_S > C_I + C_P, one would expect that we should never take a sample. To conclude this section, we show using Jensen's inequality (e.g., Billingsley 1995) that this intuition is mathematically correct.

Corollary 2. If the sampling cost C_S = 0, then π_L = 0 and π_U = π̄. In other words, before full system inspection is initiated, i.e., for all Π_n < π̄, it is always optimal to take a sample, i.e., δ(Π_n) = 2. On the other hand, if the sampling cost C_S > C_I + C_P, then π_L = π_U = π̄. In other words, before full system inspection is initiated, i.e., for all Π_n < π̄, it is never optimal to take a sample, i.e., δ(Π_n) = 1.

Proof of Corollary 2. By Equation (19),
\[ V^1(\pi) - V^2(\pi) = R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big) - R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi) - C_S. \]
Also, Equations (11), (12), and (16) imply
\[ \Pi_1(\varnothing) = \sum_{y \in E} \Pi_1(y)\, g(y \mid \pi). \]
Thus, by concavity of V, it follows from Jensen's inequality that for all π ∈ [0, 1],
\[ R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big) \ge R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi). \]
Thus, if C_S = 0, then V^1(π) ≥ V^2(π) for all π ∈ [0, 1], and it is always optimal to sample if π < π̄. For the case in which C_S > C_I + C_P, since we know by Corollary 1 that the value function V is a nondecreasing function of π, it follows that for all π ∈ [0, 1],
\[ R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big) - R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi) \le C_I + C_P. \]
Thus, if C_S > C_I + C_P, then V^1(π) < V^2(π), i.e., it is never optimal to take a sample.

In the next section, we develop an iterative computational algorithm that determines the optimal values of the critical thresholds π_L, π_U, π̄ and the minimum long-run expected average cost per unit time. We also provide numerical comparisons with other suboptimal policies that illustrate the benefits of the joint optimization of sampling and maintenance.

4. Implementation of the Optimal Policy

In this section, we develop a computational algorithm that determines the optimal values of the critical thresholds π_L, π_U, π̄ and the long-run expected average cost per unit time. We also provide numerical comparisons with other suboptimal policies that illustrate the benefits of the joint optimization of sampling and maintenance. The computational algorithm is based on the λ-minimization technique (Aven and Bergman 1986) and the monotone convergence of the restricted value functions V_k ↓ V.

The Algorithm.
Step 1. Choose ε > 0 and lower and upper bounds λ_Lower and λ_Upper.
Step 2. Put λ = (λ_Lower + λ_Upper)/2, set V_0(π) = C_I + C_P π − λ(T_I + T_P π), and set k = 1.
Step 3. Calculate V_k^λ using the dynamic equations (14) and (15). Stop the iteration of V_k^λ when ‖V_k^λ − V_{k−1}^λ‖ ≤ ε.
Step 4. If V_k^λ(0) > ε, put λ_Lower = λ and go to Step 2. If V_k^λ(0) < −ε, put λ_Upper = λ and go to Step 2. If |V_k^λ(0)| ≤ ε, put λ* = λ and stop.

In the algorithm above, Step 3 and Theorem 1 imply that the restricted value function V_k^λ approximates the value function V_λ for λ = λ*. Step 4 and the λ-minimization technique (Aven and Bergman 1986) imply that λ* is the optimal expected average cost. Furthermore, by Theorem 1, the optimal value of the lower (respectively, upper) sampling limit π_L (respectively, π_U) is the smallest (respectively, largest) value of π such that V_k(π) = V_k^2(π), and π̄ is the smallest value of π such that V_k(π) = V_k^3(π).

In the algorithm above, since λ* > 0, a natural choice for the initial value of the lower bound λ_Lower is 0. However, it is not clear how one should choose the value of the initial upper bound λ_Upper. Fortunately, we have the following result for a feasible choice of the initial upper bound.

Lemma 3. The optimal average cost is bounded by λ* ≤ C_I/T_I. Thus, in the algorithm given above, λ_Lower = 0 and λ_Upper = C_I/T_I are feasible initial values for the lower and upper bounds, respectively.

Proof. Consider an age-based policy that initiates full system inspection immediately at time 0. From renewal theory, it is clear that the long-run expected average cost per unit time for this policy is given by
\[ \frac{C_I + C_P \Pi_0}{T_I + T_P \Pi_0} = \frac{C_I}{T_I}, \]
where the second equality follows since we have assumed that a new system is installed at time 0, i.e., Π_0 = P(X_0 = 1) = 0. Thus it follows that
\[ \lambda^* = \inf_\delta \frac{E[TC_\delta]}{E[CL_\delta]} \le \frac{C_I}{T_I}. \]
Therefore, the optimal average cost is bounded by λ* ≤ C_I/T_I, which completes the proof.

We next illustrate the use of the computational algorithm in the following subsection and determine the optimal values of the critical thresholds π_L, π_U, π̄ and the long-run expected average cost per unit time in a numerical example.

4.1. Constructing the Optimal Control Chart

In this subsection, we construct the cost-optimal control chart described in Theorem 3. Using the computational algorithm described at the beginning of this section, the optimal values of the critical thresholds π_L, π_U, π̄ and the long-run expected average cost per unit time are determined. Consider a given transition rate matrix Q and state-observation matrix D, with maintenance cost parameters C_W = 3, C_F = 85, C_S = 2, C_I = 65, C_P = 2, and maintenance time parameters T_F = T_I = T_P = h = 1. We coded the computational algorithm given above in MATLAB and obtained the optimal values of the three critical thresholds π_L < π_U < π̄, together with the minimum expected average cost λ*. The algorithm was run on an Intel Core 2 E6420 (2.13 GHz, 2 GB RAM). To run the algorithm, we chose a small tolerance ε > 0, and the interval [0, 1] was discretized into a uniform grid of values of π, so that ‖V_k − V_{k−1}‖ in Step 3, for example, is calculated as the maximum of |V_k(π) − V_{k−1}(π)| over the grid points. The value function is graphed in Figure 1.

Theorem 3 implies that the optimal sampling and maintenance policy can be represented as a control chart, which monitors the probability Π_n that the system is in a warning state. To illustrate the use of such a control chart, Figure 2 plots a sample path realization of Π_n. The realizations of the sample values Y_n, posterior probabilities Π_n, and corresponding optimal actions a_n are recorded in Table 1. Figure 2 shows that no sample should be taken at the first decision epoch. From decision epochs 1 to 8, the posterior probability satisfies π_L ≤ Π_n < π_U, so the optimal action is to take a sample (see Table 1 for the realized sample values Y_n). At decision epoch 9, the posterior probability satisfies π_U ≤ Π_n < π̄, so again it is optimal to do nothing. Finally, at decision epoch 10, Π_n ≥ π̄, so the optimal action is full system inspection, possibly followed by preventive maintenance. Such a control chart has direct practical value as it can be readily implemented for online decision-making. Furthermore, since the monitored statistic is univariate and the three critical thresholds have a straightforward and intuitive interpretation, decisions that are made can be easily justified and explained at a managerial level. In the next subsection, we provide numerical comparisons with other policies that illustrate the benefits of the joint optimization of sampling and maintenance.

Figure 1. The graph of the value function V(π).
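The following Python sketch puts the pieces together: a vectorized value iteration for Equations (14)-(16) on a grid of beliefs, an outer bisection on λ over the bracket [0, C_I/T_I] suggested by Lemma 3, and a read-off of the thresholds π_L, π_U, π̄ from which branch attains the minimum. It is a minimal reimplementation under assumed placeholder parameters (rates, observation matrix, costs, grid sizes), not the MATLAB code or the parameter values used for the example above, so its output will not reproduce the paper's numbers.

```python
import numpy as np

# Assumed illustrative parameters; they are not the values used in the paper.
q01, q02, q12 = 0.20, 0.01, 0.10
nu0, nu1 = q01 + q02, q12
D = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.30, 0.60]])
h = 2.0
C_W, C_F, C_S, C_I, C_P = 3.0, 85.0, 2.0, 65.0, 15.0
T_F, T_I, T_P = 1.0, 1.0, 1.0

pis = np.linspace(0.0, 1.0, 2001)                    # discretization of [0, 1]
ts = np.linspace(0.0, h, 401)                        # quadrature nodes on [0, h]
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(ts)) / 2.0)
p00t, p11t = np.exp(-nu0 * ts), np.exp(-nu1 * ts)
p01t = q01 * (p11t - p00t) / (nu0 - nu1)
A00, A01, A11 = trap(p00t), trap(p01t), trap(p11t)   # integrals of p00, p01, p11 on [0, h]
p00h, p01h, p11h = p00t[-1], p01t[-1], p11t[-1]

R_h = (p00h + p01h) * (1 - pis) + p11h * pis         # R(h | pi) on the grid
pi_nosample = (p01h * (1 - pis) + p11h * pis) / R_h  # Eq. (12)
num_y = D[1, :, None] * (p01h * (1 - pis) + p11h * pis)
den_y = D[0, :, None] * p00h * (1 - pis) + num_y
g_y = den_y / R_h                                    # Eq. (16), shape (L, grid)
pi_y = num_y / den_y                                 # Eq. (11), shape (L, grid)

def value_functions(lam, tol=1e-6, max_iter=10_000):
    """Iterate Equations (14)-(16) to (near) convergence for a fixed lambda."""
    base = (C_W * ((1 - pis) * A01 + pis * A11)              # expected warning cost
            - lam * ((1 - pis) * (A00 + A01) + pis * A11)    # -lambda * expected run time
            + (C_F - lam * T_F) * (1.0 - R_h))               # failure within the period
    V3 = C_I + C_P * pis - lam * (T_I + T_P * pis)
    V = V3.copy()                                            # V_0 in Eq. (14)
    for _ in range(max_iter):
        V1 = base + R_h * np.interp(pi_nosample, pis, V)
        V2 = C_S + base + R_h * np.sum(np.interp(pi_y, pis, V) * g_y, axis=0)
        V_new = np.minimum(np.minimum(V1, V2), V3)
        if np.max(np.abs(V_new - V)) <= tol:
            return V_new, V1, V2, V3
        V = V_new
    return V, V1, V2, V3

# Outer bisection on lambda; Lemma 3 gives the initial bracket [0, C_I / T_I].
# A new system has Pi_0 = 0, so lambda* is where V_lambda(0) crosses zero.
lo, hi = 0.0, C_I / T_I
for _ in range(40):
    lam = 0.5 * (lo + hi)
    V, V1, V2, V3 = value_functions(lam)
    if V[0] > 0.0:              # lambda below the optimal average cost
        lo = lam
    else:                       # lambda at or above the optimal average cost
        hi = lam
lam_star = 0.5 * (lo + hi)

# Read the thresholds off the converged value function (Theorem 3, Step 4 discussion).
stop = np.flatnonzero(np.isclose(V, V3))
samp = np.flatnonzero(np.isclose(V, V2) & ~np.isclose(V, V3))
pi_bar = pis[stop[0]]
pi_L, pi_U = (pis[samp[0]], pis[samp[-1]]) if samp.size else (pi_bar, pi_bar)
print(f"lambda* ~ {lam_star:.4f}  pi_L ~ {pi_L:.3f}  pi_U ~ {pi_U:.3f}  pi_bar ~ {pi_bar:.3f}")
```

Because the one-period integrals are linear in π, they are computed once on the quadrature nodes and reused across the whole grid, which keeps each backup step to a handful of vectorized operations.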

Figure 2. The optimal sampling and maintenance policy represented as a control chart (sample path of Π_n versus n, with the regions "do nothing," "take a sample," "do nothing," and "full inspection").

Table 1. Sample path realization of Y_n, Π_n, and a_n.

4.2. Comparison with Other Policies

In this subsection, we compare the performance of our jointly optimal sampling and maintenance policy with the two most widely considered sampling policies: the policy that never takes a sample at any decision epoch, and the policy that always takes a sample at every decision epoch. Under each of these suboptimal sampling policies, the decision maker still has the freedom to initiate full system inspection at any time. On one hand, the policy that never takes a condition monitoring sample incurs no sampling costs but also has the least amount of information. On the other hand, the sampling policy that always takes a sample at every decision epoch carries the most information but also incurs the highest sampling cost. Our joint sampling and maintenance policy is the optimal balance between having the largest amount of information at the least cost.

It is well known that the policy that never takes a sample at any decision epoch is nothing more than the classical age-based policy (e.g., Barlow and Hunter 1960). Within our framework, this policy corresponds to the special case where π_L = π_U = π̄. Similarly, the policy that always takes a sample at every decision epoch corresponds to the special case where π_L = 0 and π_U = π̄. This type of control policy is aptly known as a Bayesian control chart (e.g., Makis 2008, Kim et al. 2011). To facilitate our discussion, we refer to these three policies as the never-sample policy, the always-sample policy, and the jointly optimal policy of Theorem 3, respectively. We should note that the optimality of our joint policy has already been established, so necessarily it will outperform any other chosen suboptimal policy. The objective of this section is not to convince readers of this, but rather to illustrate the possible benefits for a given set of model parameters when compared to well-known suboptimal benchmark policies.

For this comparison, we consider a given transition rate matrix Q and state-observation matrix D, and model parameters C_W = 7, C_F = 110, C_S = 8, C_I = 55, C_P = 55, and T_F = T_I = T_P = h = 1. We obtain the results reported in Table 2. Table 2 shows that the jointly optimal policy performs substantially better than both the optimal never-sample policy and the optimal always-sample policy. In particular, Table 2 shows expected cost savings of 1.29% and 8.51% over the two benchmark policies. Naturally, determining the optimal threshold values π_L, π_U, π̄ for the jointly optimal policy takes longer than determining the optimal threshold values for the optimal never-sample policy and the optimal always-sample policy. However, in practice, since these computations are typically done off-line, a total run time of a few minutes is surely worth the large cost savings obtained by using the jointly optimal policy.

Table 2. Comparison with suboptimal policies (optimal thresholds π_L, π_U, π̄, expected average cost, and run time for the never-sample, always-sample, and jointly optimal policies).
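As a complementary illustration of the comparison above, the following Python sketch estimates the long-run expected average cost of an arbitrary threshold triple by simulating renewal cycles and forming the renewal-reward ratio; the never-sample and always-sample policies are recovered by collapsing the sampling band exactly as described in the text. All parameters and thresholds are assumed placeholders rather than the paper's values, the hidden state is advanced only from epoch to epoch, and both the warning-state operating time and the failure time are accounted at epoch granularity, so the resulting estimates are rough.

```python
import numpy as np

rng = np.random.default_rng(0)
q01, q02, q12 = 0.20, 0.01, 0.10          # assumed rates
nu0, nu1 = q01 + q02, q12
D = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.30, 0.60]])
h = 2.0
C_W, C_F, C_S, C_I, C_P = 3.0, 85.0, 2.0, 65.0, 15.0   # assumed costs
T_F, T_I, T_P = 1.0, 1.0, 1.0

p00, p11 = np.exp(-nu0 * h), np.exp(-nu1 * h)
p01 = q01 * (p11 - p00) / (nu0 - nu1)
Ph = np.array([[p00, p01, 1 - p00 - p01],
               [0.0, p11, 1 - p11],
               [0.0, 0.0, 1.0]])

def simulate_cycle(pi_L, pi_U, pi_bar):
    """One renewal cycle under the threshold policy of Theorem 3.
    Returns (total cost, cycle length), both accounted at epoch granularity."""
    state, pi, cost, length = 0, 0.0, 0.0, 0.0
    while True:
        if pi >= pi_bar:                                   # full inspection
            cost += C_I + (C_P if state == 1 else 0.0)
            length += T_I + (T_P if state == 1 else 0.0)
            return cost, length
        sample = pi_L <= pi < pi_U
        if state == 1:                                     # warning-state operating cost
            cost += C_W * h
        state = rng.choice(3, p=Ph[state])                 # hidden transition, one period
        length += h
        if state == 2:                                     # failure during the period
            cost += C_F
            length += T_F
            return cost, length
        num = p01 * (1 - pi) + p11 * pi
        den = (p00 + p01) * (1 - pi) + p11 * pi            # R(h | pi)
        if sample:                                         # sample collected, Eq. (11)
            cost += C_S
            y = rng.choice(D.shape[1], p=D[state])
            pi = D[1, y] * num / (D[0, y] * p00 * (1 - pi) + D[1, y] * num)
        else:                                              # no sample, Eq. (12)
            pi = num / den

def long_run_cost(pi_L, pi_U, pi_bar, n_cycles=10_000):
    """Renewal-reward estimate of the expected average cost per unit time."""
    totals = np.array([simulate_cycle(pi_L, pi_U, pi_bar) for _ in range(n_cycles)])
    return totals[:, 0].sum() / totals[:, 1].sum()

pi_L, pi_U, pi_bar = 0.10, 0.45, 0.70                      # assumed joint thresholds
print("joint        :", long_run_cost(pi_L, pi_U, pi_bar))
print("never sample :", long_run_cost(pi_bar, pi_bar, pi_bar))
print("always sample:", long_run_cost(0.0, pi_bar, pi_bar))
```

A direct search over (π_L, π_U, π̄) against such a simulation (or against the exact evaluation of §4) is one way to realize the single-optimization approach suggested in §5.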

Table 3. Optimal expected average cost for varying sampling costs C_S (never-sample, always-sample, and jointly optimal policies).

It is also interesting to note that in this example, the optimal threshold π̄ for full system inspection is quite low for all three policies. This is because the cost of corrective maintenance, C_F = 110, is relatively much higher than the cost of system inspection, C_I = 55, and preventive maintenance, C_P = 55. Therefore, it is more beneficial to perform full system inspection more frequently than to run the system longer and risk costly corrective maintenance due to failure.

We next analyze the sensitivity of the optimal policy for different values of the sampling cost C_S. In light of Corollary 2, we already know that the jointly optimal policy coincides with the optimal always-sample policy when C_S = 0, and with the optimal never-sample policy when C_S > C_I + C_P = 110. We obtain the results reported in Table 3. Table 3 provides important managerial insight into the operational value of condition monitoring information and technologies. This insight is best understood visually, so we plot the optimal expected average costs of Table 3 in Figure 3.

Figure 3. Graphical illustration of the optimal expected average cost for varying sampling costs (optimal average cost rate versus sampling cost C_S).

The dashed horizontal line in Figure 3 is the expected average cost for the optimal never-sample policy for different values of the sampling cost C_S. The dotted increasing line is the expected average cost for the optimal always-sample policy, and the solid increasing curve is the expected average cost for the jointly optimal policy. Figure 3 shows that the jointly optimal policy coincides with the optimal always-sample policy when C_S = 0, and with the optimal never-sample policy when C_S ≥ 24. The optimal always-sample policy is better than the optimal never-sample policy from C_S = 0 to around C_S = 9, after which the optimal never-sample policy is always better than the optimal always-sample policy. This implies that once the sampling cost C_S exceeds 9, it is better to never take a sample and be ignorant of the state of the system than to incur regular condition monitoring sample costs to get a better idea of the system state. Although the jointly optimal policy is always better than both the optimal never-sample policy and the optimal always-sample policy for all values of C_S, the benefits are approximately the greatest when C_S = 9, i.e., the point at which the optimal never-sample policy becomes better than the optimal always-sample policy. On the other hand, the benefits of using the jointly optimal policy become quite marginal when C_S is close to 0 and 24. This suggests that a manager is likely not to be willing to invest in condition monitoring technologies if the sampling cost C_S is close to 24. Similarly, a manager should choose to sample the system at every decision epoch, to simplify the scheduling of sampling and maintenance activities, if the sampling cost C_S is close to 0.

The focus of this paper has been on a system with state space S = {0, 1, 2}, i.e., N = 2. In the following section, we discuss how to extend our results to the general case in which N > 2.

5. Concluding Remarks and Future Research

In this paper, a joint sampling and control problem under partial observations has been considered. The problem has been formulated as a partially observable Markov decision process. The objective was to characterize the form of the optimal sampling and maintenance policy that minimizes the long-run expected average cost per unit time.
It was shown that the optimal control policy can be represented as a control chart with three critical thresholds, which monitors the posterior probability that the system is in a so-called warning state. Such a control chart has direct practical value as it can be readily implemented for online decision-making. Furthermore, since the monitored statistic and the three critical thresholds have a straightforward and intuitive interpretation, decisions can be easily justified and explained at a managerial level. It was also shown that the structure of the optimal policy allows practitioners to plan and schedule maintenance activities into the future. A cost comparison with other suboptimal policies has been examined, which illustrates the benefits of the joint

optimization of sampling and control. It was found that the jointly optimal sampling and maintenance policy performed substantially better than existing suboptimal policies. Numerical results indicate that the advantage of using the jointly optimal sampling and maintenance policy becomes less substantial for both very small and large values of the sampling cost C_S.

There are a number of exciting extensions and topics for future research. The numerical results of §4 showed that the run time of our algorithm was around 6 minutes. Although this is not unreasonably long, there is still much room for improvement. In particular, a closer look at Theorem 3 reveals that the result has further computational value. Recall that the original stochastic control problem defined in (5) was transformed to an equivalent parameterized stochastic control problem (with parameter λ) with an additive objective function using the λ-minimization technique. However, the characterization given in Theorem 3 implies that the optimal control policy no longer depends on λ and is completely determined by the ordered triple (π_L, π_U, π̄). This is a useful property from a computational point of view, since it is possible to develop an algorithm that directly finds the optimal values of (π_L, π_U, π̄) that minimize the original objective function defined in (5). Such an algorithm would likely be faster than the algorithm presented in §4, as one would now be solving a single optimization problem as opposed to solving multiple stochastic control problems for different values of λ.

Another interesting research direction would be to consider a more general model in which the system state process need only be stochastically increasing, which would be less restrictive than the current upper triangularity and increasing failure rate assumptions. It is conceivable that the main results given in this paper could still be established under this weaker assumption. While we believe the current assumptions in this paper are reasonable for most mechanical systems that cannot self-repair, such an extension might be useful in studying, for example, medical decision problems in which the patient degradation process might not necessarily be monotonic.

Lastly, an important future research topic would be to develop a full case study in which the effectiveness and quality of our results would be tested on real-world data sets. This could likely also lead to a further refinement of both the model and algorithm.

Appendix. Proofs

Proof of Lemma 1. We prove this lemma using mathematical induction. For k = 1, substituting Equations (10)-(12) into (15) shows that V_1^1, V_1^2, V_1^3 are linear, and hence concave, functions of π. Assume that for some k > 0, V_k is a concave function of π. We want to show that V_{k+1} is also a concave function of π. Since the min operator preserves concavity and R(h | π) is a linear function of π, it suffices to show that R(h | π) V_k(Π_1(∅)) and R(h | π) Σ_{y∈E} V_k(Π_1(y)) g(y | π) are concave functions of π. Fix arbitrary π_1, π_2 ∈ [0, 1] and α ∈ [0, 1], and let π = απ_1 + (1 − α)π_2. Then by Equation (12), since both the numerator of Π_1(∅) and R(h | π) are linear in π,
\[ \Pi_1(\varnothing)\big|_{\pi} = \frac{\alpha R(h \mid \pi_1)}{R(h \mid \pi)}\, \Pi_1(\varnothing)\big|_{\pi_1} + \frac{(1-\alpha) R(h \mid \pi_2)}{R(h \mid \pi)}\, \Pi_1(\varnothing)\big|_{\pi_2}. \]
Then by concavity of V_k,
\[ R(h \mid \pi)\, V_k\big(\Pi_1(\varnothing)\big|_{\pi}\big) \ge \alpha R(h \mid \pi_1)\, V_k\big(\Pi_1(\varnothing)\big|_{\pi_1}\big) + (1-\alpha) R(h \mid \pi_2)\, V_k\big(\Pi_1(\varnothing)\big|_{\pi_2}\big), \]
which shows that R(h | π) V_k(Π_1(∅)) is a concave function of π. Similarly, by Equation (11), for each y ∈ E,
\[ \Pi_1(y)\big|_{\pi} = \frac{\alpha\, g(y \mid \pi_1) R(h \mid \pi_1)}{g(y \mid \pi) R(h \mid \pi)}\, \Pi_1(y)\big|_{\pi_1} + \frac{(1-\alpha)\, g(y \mid \pi_2) R(h \mid \pi_2)}{g(y \mid \pi) R(h \mid \pi)}\, \Pi_1(y)\big|_{\pi_2}. \]
Then by concavity of V_k,
\[ g(y \mid \pi) R(h \mid \pi)\, V_k\big(\Pi_1(y)\big|_{\pi}\big) \ge \alpha\, g(y \mid \pi_1) R(h \mid \pi_1)\, V_k\big(\Pi_1(y)\big|_{\pi_1}\big) + (1-\alpha)\, g(y \mid \pi_2) R(h \mid \pi_2)\, V_k\big(\Pi_1(y)\big|_{\pi_2}\big), \]
and summing over y ∈ E,
\[ R(h \mid \pi) \sum_{y \in E} V_k\big(\Pi_1(y)\big|_{\pi}\big)\, g(y \mid \pi) \ge \alpha R(h \mid \pi_1) \sum_{y \in E} V_k\big(\Pi_1(y)\big|_{\pi_1}\big)\, g(y \mid \pi_1) + (1-\alpha) R(h \mid \pi_2) \sum_{y \in E} V_k\big(\Pi_1(y)\big|_{\pi_2}\big)\, g(y \mid \pi_2), \]
which shows that R(h | π) Σ_{y∈E} V_k(Π_1(y)) g(y | π) is a concave function of π.
Thus by mathematical induction, for each k, V_k is a concave function of π.

Proof of Lemma 2. We prove inequality (17) using mathematical induction. For k = 0, it is clear that V_0(π) ≥ −λ(h + T_F)/(1 − R(h | 0)). Assume that for some k, V_k(π) ≥ −λ(h + T_F)/(1 − R(h | 0)). Then it follows that
\[ V_{k+1}^1(\pi) = C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi)\, V_k\big(\Pi_1(\varnothing)\big) \]
\[ \ge -\lambda h - \lambda T_F\big(1 - R(h \mid \pi)\big) - R(h \mid \pi)\, \frac{\lambda (h + T_F)}{1 - R(h \mid 0)} \ge -\lambda (h + T_F) - \frac{\lambda (h + T_F)\, R(h \mid 0)}{1 - R(h \mid 0)} = \frac{-\lambda (h + T_F)}{1 - R(h \mid 0)}. \]
The same argument applies to V_{k+1}^2(π), which differs only by the additional nonnegative term C_S, and V_{k+1}^3(π) = V_0(π) satisfies the bound by the base case, so V_{k+1}(π) ≥ −λ(h + T_F)/(1 − R(h | 0)), which completes the induction.


More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018 Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections

More information

Strategic Dynamic Jockeying Between Two Parallel Queues

Strategic Dynamic Jockeying Between Two Parallel Queues Strategic Dynamic Jockeying Between Two Parallel Queues Amin Dehghanian 1 and Jeffrey P. Kharoufeh 2 Department of Industrial Engineering University of Pittsburgh 1048 Benedum Hall 3700 O Hara Street Pittsburgh,

More information

Production Policies for Multi-Product Systems with Deteriorating. Process Condition

Production Policies for Multi-Product Systems with Deteriorating. Process Condition Production Policies for Multi-Product Systems with Deteriorating Process Condition Burak Kazaz School of Business, University of Miami, Coral Gables, FL 3324. bkazaz@miami.edu Thomas W. Sloan College of

More information

THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES

THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES MATHEMATICS OF OPERATIONS RESEARCH Vol. 27, No. 4, November 2002, pp. 819 840 Printed in U.S.A. THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES DANIEL S. BERNSTEIN, ROBERT GIVAN, NEIL

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Dynamically scheduling and maintaining a flexible server

Dynamically scheduling and maintaining a flexible server Dynamically scheduling and maintaining a flexible server Jefferson Huang 1, Douglas G. Down 2, Mark E. Lewis 3, and Cheng-Hung Wu 4 1 Operations Research Department, Naval Postgraduate School, Monterey,

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour Bin Hu Saif Benjaafar Department of Operations and Management Science, Ross School of Business, University of Michigan at Ann Arbor,

More information

Optimal Decentralized Control of Coupled Subsystems With Control Sharing

Optimal Decentralized Control of Coupled Subsystems With Control Sharing IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 58, NO. 9, SEPTEMBER 2013 2377 Optimal Decentralized Control of Coupled Subsystems With Control Sharing Aditya Mahajan, Member, IEEE Abstract Subsystems that

More information

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS Xiaofei Fan-Orzechowski Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony

More information

How Much Evidence Should One Collect?

How Much Evidence Should One Collect? How Much Evidence Should One Collect? Remco Heesen October 10, 2013 Abstract This paper focuses on the question how much evidence one should collect before deciding on the truth-value of a proposition.

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS

A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS Abdelhakim Khatab, Daoud Ait-Kadi, Nidhal Rezg To cite this version: Abdelhakim Khatab, Daoud Ait-Kadi,

More information

IEOR 6711: Stochastic Models I, Fall 2003, Professor Whitt. Solutions to Final Exam: Thursday, December 18.

IEOR 6711: Stochastic Models I, Fall 2003, Professor Whitt. Solutions to Final Exam: Thursday, December 18. IEOR 6711: Stochastic Models I, Fall 23, Professor Whitt Solutions to Final Exam: Thursday, December 18. Below are six questions with several parts. Do as much as you can. Show your work. 1. Two-Pump Gas

More information

A Dynamic Programming Approach for Sequential Preventive Maintenance Policies with Two Failure Modes

A Dynamic Programming Approach for Sequential Preventive Maintenance Policies with Two Failure Modes Chapter 1 A Dynamic Programming Approach for Sequential Preventive Maintenance Policies with Two Failure Modes Hiroyuki Okamura 1, Tadashi Dohi 1 and Shunji Osaki 2 1 Department of Information Engineering,

More information

Markov Decision Processes Infinite Horizon Problems

Markov Decision Processes Infinite Horizon Problems Markov Decision Processes Infinite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld 1 What is a solution to an MDP? MDP Planning Problem: Input: an MDP (S,A,R,T)

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Geoff Hollinger Sequential Decision Making in Robotics Spring, 2011 *Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

Simplex Algorithm for Countable-state Discounted Markov Decision Processes Simplex Algorithm for Countable-state Discounted Markov Decision Processes Ilbin Lee Marina A. Epelman H. Edwin Romeijn Robert L. Smith November 16, 2014 Abstract We consider discounted Markov Decision

More information

On the static assignment to parallel servers

On the static assignment to parallel servers On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/

More information

Online Appendix for Sourcing from Suppliers with Financial Constraints and Performance Risk

Online Appendix for Sourcing from Suppliers with Financial Constraints and Performance Risk Online Appendix for Sourcing from Suppliers with Financial Constraints and Performance Ris Christopher S. Tang S. Alex Yang Jing Wu Appendix A: Proofs Proof of Lemma 1. In a centralized chain, the system

More information

The knowledge gradient method for multi-armed bandit problems

The knowledge gradient method for multi-armed bandit problems The knowledge gradient method for multi-armed bandit problems Moving beyond inde policies Ilya O. Ryzhov Warren Powell Peter Frazier Department of Operations Research and Financial Engineering Princeton

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Prioritized Sweeping Converges to the Optimal Value Function

Prioritized Sweeping Converges to the Optimal Value Function Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science

More information

The Simplex Method: An Example

The Simplex Method: An Example The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified

More information

Dynamic Control of a Tandem Queueing System with Abandonments

Dynamic Control of a Tandem Queueing System with Abandonments Dynamic Control of a Tandem Queueing System with Abandonments Gabriel Zayas-Cabán 1 Jungui Xie 2 Linda V. Green 3 Mark E. Lewis 1 1 Cornell University Ithaca, NY 2 University of Science and Technology

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

arxiv: v1 [math.ho] 25 Feb 2008

arxiv: v1 [math.ho] 25 Feb 2008 A Note on Walking Versus Waiting Anthony B. Morton February 28 arxiv:82.3653v [math.ho] 25 Feb 28 To what extent is a traveller called Justin, say) better off to wait for a bus rather than just start walking

More information

Session-Based Queueing Systems

Session-Based Queueing Systems Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

Stochastic Shortest Path Problems

Stochastic Shortest Path Problems Chapter 8 Stochastic Shortest Path Problems 1 In this chapter, we study a stochastic version of the shortest path problem of chapter 2, where only probabilities of transitions along different arcs can

More information

Persuading a Pessimist

Persuading a Pessimist Persuading a Pessimist Afshin Nikzad PRELIMINARY DRAFT Abstract While in practice most persuasion mechanisms have a simple structure, optimal signals in the Bayesian persuasion framework may not be so.

More information

On the Approximate Solution of POMDP and the Near-Optimality of Finite-State Controllers

On the Approximate Solution of POMDP and the Near-Optimality of Finite-State Controllers On the Approximate Solution of POMDP and the Near-Optimality of Finite-State Controllers Huizhen (Janey) Yu (janey@mit.edu) Dimitri Bertsekas (dimitrib@mit.edu) Lab for Information and Decision Systems,

More information

Persuading Skeptics and Reaffirming Believers

Persuading Skeptics and Reaffirming Believers Persuading Skeptics and Reaffirming Believers May, 31 st, 2014 Becker-Friedman Institute Ricardo Alonso and Odilon Camara Marshall School of Business - USC Introduction Sender wants to influence decisions

More information

Coordinating Inventory Control and Pricing Strategies with Random Demand and Fixed Ordering Cost: The Finite Horizon Case

Coordinating Inventory Control and Pricing Strategies with Random Demand and Fixed Ordering Cost: The Finite Horizon Case OPERATIONS RESEARCH Vol. 52, No. 6, November December 2004, pp. 887 896 issn 0030-364X eissn 1526-5463 04 5206 0887 informs doi 10.1287/opre.1040.0127 2004 INFORMS Coordinating Inventory Control Pricing

More information

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania Stochastic Dynamic Programming Jesus Fernande-Villaverde University of Pennsylvania 1 Introducing Uncertainty in Dynamic Programming Stochastic dynamic programming presents a very exible framework to handle

More information

Sequential Decisions

Sequential Decisions Sequential Decisions A Basic Theorem of (Bayesian) Expected Utility Theory: If you can postpone a terminal decision in order to observe, cost free, an experiment whose outcome might change your terminal

More information

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321 Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process

More information

Journal of Process Control

Journal of Process Control Journal of Process Control 22 (2012) 1478 1489 Contents lists available at SciVerse ScienceDirect Journal of Process Control j ourna l ho me pag e: www.elsevier.com/locate/jprocont On defect propagation

More information

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming MATHEMATICS OF OPERATIONS RESEARCH Vol. 37, No. 1, February 2012, pp. 66 94 ISSN 0364-765X (print) ISSN 1526-5471 (online) http://dx.doi.org/10.1287/moor.1110.0532 2012 INFORMS Q-Learning and Enhanced

More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:

More information

Discrete-Time Markov Decision Processes

Discrete-Time Markov Decision Processes CHAPTER 6 Discrete-Time Markov Decision Processes 6.0 INTRODUCTION In the previous chapters we saw that in the analysis of many operational systems the concepts of a state of a system and a state transition

More information

2. Transience and Recurrence

2. Transience and Recurrence Virtual Laboratories > 15. Markov Chains > 1 2 3 4 5 6 7 8 9 10 11 12 2. Transience and Recurrence The study of Markov chains, particularly the limiting behavior, depends critically on the random times

More information

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi.

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi. Optimal Rejuvenation for Tolerating Soft Failures Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi Abstract In the paper we address the problem of determining the optimal time

More information

Optimal Control of Partiality Observable Markov. Processes over a Finite Horizon

Optimal Control of Partiality Observable Markov. Processes over a Finite Horizon Optimal Control of Partiality Observable Markov Processes over a Finite Horizon Report by Jalal Arabneydi 04/11/2012 Taken from Control of Partiality Observable Markov Processes over a finite Horizon by

More information

6 Evolution of Networks

6 Evolution of Networks last revised: March 2008 WARNING for Soc 376 students: This draft adopts the demography convention for transition matrices (i.e., transitions from column to row). 6 Evolution of Networks 6. Strategic network

More information

Appendix (For Online Publication) Community Development by Public Wealth Accumulation

Appendix (For Online Publication) Community Development by Public Wealth Accumulation March 219 Appendix (For Online Publication) to Community Development by Public Wealth Accumulation Levon Barseghyan Department of Economics Cornell University Ithaca NY 14853 lb247@cornell.edu Stephen

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Ordering Policies for Periodic-Review Inventory Systems with Quantity-Dependent Fixed Costs

Ordering Policies for Periodic-Review Inventory Systems with Quantity-Dependent Fixed Costs OPERATIONS RESEARCH Vol. 60, No. 4, July August 2012, pp. 785 796 ISSN 0030-364X (print) ISSN 1526-5463 (online) http://dx.doi.org/10.1287/opre.1110.1033 2012 INFORMS Ordering Policies for Periodic-Review

More information

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7).

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7). Economics 201B Economic Theory (Spring 2017) Bargaining Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7). The axiomatic approach (OR 15) Nash s (1950) work is the starting point

More information

A MODEL FOR THE LONG-TERM OPTIMAL CAPACITY LEVEL OF AN INVESTMENT PROJECT

A MODEL FOR THE LONG-TERM OPTIMAL CAPACITY LEVEL OF AN INVESTMENT PROJECT A MODEL FOR HE LONG-ERM OPIMAL CAPACIY LEVEL OF AN INVESMEN PROJEC ARNE LØKKA AND MIHAIL ZERVOS Abstract. We consider an investment project that produces a single commodity. he project s operation yields

More information

IN THIS paper we investigate the diagnosability of stochastic

IN THIS paper we investigate the diagnosability of stochastic 476 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 50, NO 4, APRIL 2005 Diagnosability of Stochastic Discrete-Event Systems David Thorsley and Demosthenis Teneketzis, Fellow, IEEE Abstract We investigate

More information

Chapter 16 Planning Based on Markov Decision Processes

Chapter 16 Planning Based on Markov Decision Processes Lecture slides for Automated Planning: Theory and Practice Chapter 16 Planning Based on Markov Decision Processes Dana S. Nau University of Maryland 12:48 PM February 29, 2012 1 Motivation c a b Until

More information

Bayesian Persuasion Online Appendix

Bayesian Persuasion Online Appendix Bayesian Persuasion Online Appendix Emir Kamenica and Matthew Gentzkow University of Chicago June 2010 1 Persuasion mechanisms In this paper we study a particular game where Sender chooses a signal π whose

More information

The Complexity of Decentralized Control of Markov Decision Processes

The Complexity of Decentralized Control of Markov Decision Processes The Complexity of Decentralized Control of Markov Decision Processes Daniel S. Bernstein Robert Givan Neil Immerman Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst,

More information

Review Paper Machine Repair Problem with Spares and N-Policy Vacation

Review Paper Machine Repair Problem with Spares and N-Policy Vacation Research Journal of Recent Sciences ISSN 2277-2502 Res.J.Recent Sci. Review Paper Machine Repair Problem with Spares and N-Policy Vacation Abstract Sharma D.C. School of Mathematics Statistics and Computational

More information

Stochastic convexity in dynamic programming

Stochastic convexity in dynamic programming Economic Theory 22, 447 455 (2003) Stochastic convexity in dynamic programming Alp E. Atakan Department of Economics, Columbia University New York, NY 10027, USA (e-mail: aea15@columbia.edu) Received:

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour This paper is motivated by two phenomena observed in many queueing systems in practice. The first is the partitioning of server capacity

More information

Learning with Rejection

Learning with Rejection Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,

More information

Stochastic Analysis of a Two-Unit Cold Standby System with Arbitrary Distributions for Life, Repair and Waiting Times

Stochastic Analysis of a Two-Unit Cold Standby System with Arbitrary Distributions for Life, Repair and Waiting Times International Journal of Performability Engineering Vol. 11, No. 3, May 2015, pp. 293-299. RAMS Consultants Printed in India Stochastic Analysis of a Two-Unit Cold Standby System with Arbitrary Distributions

More information

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974 LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the

More information

Lecture 20 : Markov Chains

Lecture 20 : Markov Chains CSCI 3560 Probability and Computing Instructor: Bogdan Chlebus Lecture 0 : Markov Chains We consider stochastic processes. A process represents a system that evolves through incremental changes called

More information

Common Knowledge and Sequential Team Problems

Common Knowledge and Sequential Team Problems Common Knowledge and Sequential Team Problems Authors: Ashutosh Nayyar and Demosthenis Teneketzis Computer Engineering Technical Report Number CENG-2018-02 Ming Hsieh Department of Electrical Engineering

More information

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 On the Structure of Real-Time Encoding and Decoding Functions in a Multiterminal Communication System Ashutosh Nayyar, Student

More information

Scheduling with Advanced Process Control Constraints

Scheduling with Advanced Process Control Constraints Scheduling with Advanced Process Control Constraints Yiwei Cai, Erhan Kutanoglu, John Hasenbein, Joe Qin July 2, 2009 Abstract With increasing worldwide competition, high technology manufacturing companies

More information

Stochastic modelling of epidemic spread

Stochastic modelling of epidemic spread Stochastic modelling of epidemic spread Julien Arino Centre for Research on Inner City Health St Michael s Hospital Toronto On leave from Department of Mathematics University of Manitoba Julien Arino@umanitoba.ca

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Finding the Value of Information About a State Variable in a Markov Decision Process 1

Finding the Value of Information About a State Variable in a Markov Decision Process 1 05/25/04 1 Finding the Value of Information About a State Variable in a Markov Decision Process 1 Gilvan C. Souza The Robert H. Smith School of usiness, The University of Maryland, College Park, MD, 20742

More information

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal Matthew Johnston, Eytan Modiano Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge,

More information

University of Groningen. Statistical Auditing and the AOQL-method Talens, Erik

University of Groningen. Statistical Auditing and the AOQL-method Talens, Erik University of Groningen Statistical Auditing and the AOQL-method Talens, Erik IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check

More information

Network design for a service operation with lost demand and possible disruptions

Network design for a service operation with lost demand and possible disruptions Network design for a service operation with lost demand and possible disruptions Opher Baron, Oded Berman, Yael Deutsch Joseph L. Rotman School of Management, University of Toronto, 105 St. George St.,

More information

Control Theory : Course Summary

Control Theory : Course Summary Control Theory : Course Summary Author: Joshua Volkmann Abstract There are a wide range of problems which involve making decisions over time in the face of uncertainty. Control theory draws from the fields

More information

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs 2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter

More information