Joint Optimization of Sampling and Control of Partially Observable Failing Systems


OPERATIONS RESEARCH, Vol. 61, No. 3, May-June 2013, ISSN 0030-364X (print), ISSN 1526-5463 (online), INFORMS

Joint Optimization of Sampling and Control of Partially Observable Failing Systems

Michael Jong Kim
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada; and Department of Decision Sciences, NUS Business School, Singapore, Republic of Singapore

Viliam Makis
Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario M5S 3G8, Canada

Stochastic control problems that arise in reliability and maintenance optimization typically assume that information used for decision-making is obtained according to a predetermined sampling schedule. In many real applications, however, there is a high sampling cost associated with collecting such data. It is therefore of equal importance to determine when information should be collected and to decide how this information should be utilized for maintenance decision-making. This type of joint optimization has been a long-standing problem in the operations research and maintenance optimization literature, and very few results regarding the structure of the optimal sampling and maintenance policy have been published. In this paper, we formulate and analyze the joint optimization of sampling and maintenance decision-making in the partially observable Markov decision process framework. We prove the optimality of a policy that is characterized by three critical thresholds, which have practical interpretation and give new insight into the value of condition-based maintenance programs in lifecycle asset management. Illustrative numerical comparisons are provided that show substantial cost savings over existing suboptimal policies.

Subject classifications: reliability: maintenance/repairs; inspection; failure models; dynamic programming/optimal control: Markov; applications.
Area of review: Stochastic Models.
History: Received July 2011; revisions received June 2012, December 2012; accepted February 2013.

1. Introduction

Modern manufacturing and service industries rely heavily on complex technical systems for their everyday operations. These systems typically deteriorate and are subject to breakdowns due to usage and age. The high cost associated with unplanned breakdowns has stimulated a lot of research activity in the maintenance optimization literature, where the main focus has been on determining the optimal time to preventively repair or replace a system before it fails. One of the earliest and most significant contributions to this class of problems is the celebrated paper of Barlow and Hunter (1960). More recent contributions are given by Dogramaci and Fraiman (2004), Heidergott and Farenhorst-Yuan (2010), Kurt and Kharoufeh (2010), and Kim et al. (2011), among others.

The most advanced state-of-the-art maintenance program applied in practice is known as condition-based maintenance (CBM), which recommends maintenance actions based on information collected through online condition monitoring. CBM initiates maintenance actions only when there is strong evidence of severe system deterioration, which significantly reduces maintenance costs by decreasing the number of unnecessary maintenance operations. For a recent overview of the mathematical models and technologies used in CBM, readers are referred to Jardine et al. (2006) and the references therein. The common assumption made in CBM optimization models is that information used for decision-making is obtained at periodic, equidistant sampling epochs.
Under this assumption, the goal is to determine the optimal maintenance policy that optimizes an objective function over a finite or infinite time horizon. Recent contributions are given by Dayanik and Gurler (2002), Makis and Jiang (2003), Wang et al. (2009), and Juang and Anderson (2004). The problem with the equidistant sampling assumption is that in many applications there is a high sampling cost associated with collecting observable data. It is therefore of equal importance to determine when information should be collected and to decide how this information should be utilized for maintenance decision-making. This type of joint optimization has been a long-standing problem in the operations research and maintenance optimization literature, and very few results regarding the structure of the optimal sampling and maintenance policy have been published.

Most work on the joint optimization of sampling and maintenance assumes that condition monitoring samples provide perfect information. That is, the true state of the

system is fully observable at times of condition monitoring, and unobservable otherwise. The seminal work of Ross (1971) is the most relevant early contribution to this stream of research. Under the expected total discounted reward criterion, the author considered a two-state model and showed that the optimal policy is characterized by at most four regions. The excellent follow-up work of Rosenfield (1976) and White (1978) investigates extensions of this result to the more general N > 2 model. White (1978), for example, showed, using partial ordering of vectors on the probability simplex, that the optimal policy has certain structural properties that correspond to the four-region policy described by Ross. The later work of Ohnishi et al. (1986) considered a similar perfect information problem. Under certain monotonicity assumptions, the authors were able to partially characterize the form of the optimal policy and showed that the times between successive samples are monotonically decreasing. More recently, Yeh (1996) considered a joint sampling and maintenance problem with perfect information and proposed a number of different algorithms to compute optimal policies, but was unable to characterize the form of the optimal policy.

A distinguishing feature of the model considered in this paper is that we do not assume samples provide perfect information. Rather, when the system is sampled, condition monitoring data are only stochastically related to the underlying system state. Indeed, in many real applications, condition monitoring data such as spectrometric oil data or vibration data give only partial (imperfect) information about the true system state. In the imperfect information setting, the aforementioned perfect information approaches cannot be directly used, because taking a sampling action no longer guarantees that the posterior state vector is at a vertex of the probability simplex. Although some work has been done on the joint optimization of sampling and maintenance with imperfect information, to our knowledge, very few structural optimality results have been reported in the maintenance optimization literature, even for the case when N = 2. The classical papers of Eckles (1968) and Smallwood and Sondik (1973) studied partially observable systems with imperfect information under a very general setting. Lovejoy (1987) also provides general results for certain classes of partially observable Markov decision processes (POMDPs) with concave cost-to-go functions. Ehrenfeld (1976) studied a partially observable failing system with two system states: a healthy state and a silent failure state. The author conjectured that under certain conditions, a policy with at most three regions is optimal. However, the author was unable to prove this claim. An excellent recent contribution is given by Maillart (2006), who considered maintenance policies for systems with perfect and imperfect information. The author derived structural properties for the perfect information case and used these properties to motivate heuristic policies for the imperfect information case. Other variants of the joint optimization of sampling and maintenance with imperfect information can be found in the papers of Anderson and Friedman (1978), Dieulle et al. (2003), Jiang (2010), Kander (1978), Kuo (2006), Lam and Yeh (1994), and Tagaras and Nikolaidis (2002) under different model assumptions, but again with few explicit structural results.
Most of the aforementioned papers model both the system state and sampling process in discrete time. Another interesting feature of our formulation is that we model the system state process in continuous time and the observation process in discrete time. The reason we choose this discrete-continuous modeling approach is that in many real systems, the time lengths between sampling epochs can be quite long relative to the system degradation. For example, in the mining industry, it is not uncommon that oil samples collected through condition monitoring can be taken only once every few hundred operational hours. This means that degradation and failure will typically occur at random times in between the sampling epochs, which cannot be captured in a discrete-discrete framework. Surprisingly, little research has been done on discrete-continuous models, which appear to be a better representation of real systems.

In this paper, we consider a system whose state information is unobservable and can only be inferred by taking a sample through condition monitoring. Condition monitoring data provide imperfect information in that they are only stochastically related to the underlying system state. System failure, on the other hand, is fully observable. The decision maker can decide when condition monitoring information should be collected, as well as when to initiate preventive maintenance. The objective is to characterize the structural form of the optimal sampling and maintenance policy that minimizes the long-run expected cost per unit time. The problem is formulated as a partially observable Markov decision process (POMDP). It is shown that monitoring the posterior probability that the system is in a so-called warning state is sufficient for decision-making. The primary contribution of the paper is the proof that the optimal control policy can be represented as a control chart with three critical thresholds, which monitors the posterior probability that the system is in the warning state. Such a control chart has direct practical value because it can be readily implemented for online decision-making. We also show that the three critical thresholds have a straightforward and intuitive interpretation, so that decisions can be easily justified and explained at a managerial level. Implications of the structural results, such as planning maintenance activities into the future, are discussed, and cost comparisons with other suboptimal policies are developed that illustrate the benefits of the joint optimization of sampling and control.

The joint optimization of sampling and maintenance considered in this paper should not be confused with the body of research that deals with pure sampling optimization. Such models are sometimes referred to in the literature as

optimal inspection, sequential sampling, or optimal checking models. In these models, the decision maker has control only over when the system should be sampled, whereas preventive maintenance decisions are either not permitted (e.g., Barlow et al. 1963, Badia et al. 2001, Parmigiani 1996) or follow some predetermined stopping rule (e.g., Ozekici and Pliska 1991, Nakagawa). This makes such problems significantly easier to analyze than the more general joint sampling and maintenance problem considered in this paper.

The remainder of the paper is organized as follows. In §2, we formulate and analyze the joint optimization problem in the POMDP framework. In §3, we determine the structural properties of the optimal policy. The dynamic optimality equation is derived, and we establish the form of the optimal sampling and maintenance policy. In §4, we develop an iterative algorithm to compute the optimal policy and the long-run expected average cost per unit time. We also provide numerical comparisons with other suboptimal policies that illustrate the benefits of the joint optimization of sampling and maintenance. In §5, we give concluding remarks and discuss possible extensions to our model and future research directions.

2. Problem Formulation

Let (Ω, F, P) be a complete probability space on which the following stochastic processes are defined. The state process (X_t, t ≥ 0) is modeled as a continuous-time homogeneous Markov chain with unobservable operational states 0, 1, ..., N − 1 and an observable failure state N, so that the state space of the Markov chain is S = {0, 1, ..., N − 1, N}. The instantaneous transition rates are
\[ q_{ij} = \lim_{h \to 0^+} \frac{P(X_h = j \mid X_0 = i)}{h}, \quad i \neq j, \qquad q_{ii} = -\sum_{j \neq i} q_{ij}, \]
and the state transition rate matrix is Q = (q_{ij})_{(N+1)\times(N+1)}. We assume that if i < j, then state i is not worse than state j, and state 0 denotes the state of a good-as-new or healthy system. To model such monotonic system deterioration, we assume that the state process is nondecreasing with probability 1, i.e., q_{ij} = 0 for all j < i. In particular, this implies that the failure state is absorbing. We also assume that if i < j, then the failure rates satisfy q_{iN} ≤ q_{jN}.

Let ξ = inf{t ≥ 0 : X_t = N} be the observable time of system failure. Upon system failure, mandatory corrective maintenance that takes T_F time units is performed at a cost C_F, which brings the system to a healthy state. To avoid costly failures, the decision maker can take a sample at a cost C_S. In real applications, condition monitoring samples are not available at the instant it is decided to sample the system. In particular, once it is decided to sample the system at time t, there is typically a delay of h > 0 time units (with magnitude that depends on the particular application) before the system is eventually sampled at time t + h. We therefore naturally assume that the decision maker has the opportunity to take (or not take) samples only at time points h, 2h, 3h, ....

Condition monitoring information at time nh is denoted Y_n and takes values in E = {1, ..., L}. Samples Y_n are stochastically related to the operational system state X_{nh}. In particular, while the system is in operational state X_{nh} = i ∈ {0, 1, ..., N − 1}, the sample Y_n has state-dependent distribution
\[ d_{iy} = P(Y_n = y \mid X_{nh} = i), \quad y \in E. \tag{1} \]
The state-observation matrix is denoted D = (d_{iy}). At the beginning of each decision epoch, the decision maker can initiate full system inspection to reveal (with probability 1) the current state of the system at a cost C_I.
If the system is found to be in a deteriorated state i ∈ {1, ..., N − 1}, preventive maintenance is performed that brings the system to a healthy state at a cost C_P(i). If the system is found to be in the healthy state, no preventive maintenance is performed, and the process continues. Full system inspection and preventive maintenance (in state i) take T_I and T_P(i) time units, respectively. We make the standard assumption that C_F ≥ max_i (C_I + C_P(i)). For every time unit the system remains in deteriorated state i ∈ {1, ..., N − 1}, an operating cost C_W(i) is incurred. The objective is to characterize the structural form of the optimal sampling and maintenance policy that minimizes the long-run expected average cost per unit time.

The problem can be formulated in the POMDP framework as follows. While the system is operational, one of the following three actions a_n must be taken at each decision epoch time nh:
1. Do nothing, and take an action at the next decision epoch time (n + 1)h.
2. Take a sample. Information from the sample Y_{n+1} is first made available for decision-making at the beginning of the next decision epoch time (n + 1)h.
3. Initiate full system inspection, possibly followed by preventive maintenance.

If nh time units have elapsed since the last maintenance action (full inspection, preventive maintenance, or corrective maintenance) and k samples Y_{n_1}, ..., Y_{n_k} have been collected at time points 0 < n_1 h < ... < n_k h ≤ nh, then it is well known from the theory of POMDPs (e.g., Bertsekas and Shreve 1978) that the N-dimensional posterior state vector Π_n = [Π_n(0), ..., Π_n(N − 1)], with
\[ \Pi_n(i) = P(X_{nh} = i \mid \xi > nh,\ Y_{n_1}, \ldots, Y_{n_k}), \tag{2} \]
the posterior probability that the system is in state i given all available information until time nh, represents sufficient information for decision-making at the nth decision epoch. Then, if an optimal stationary policy exists, it has the functional form of a stationary decision rule δ(Π) ∈ {1, 2, 3}, where

δ(Π) indicates the action a_n to be chosen when Π_n = Π. Let 𝒟 be the class of all stationary policies. From renewal theory, the long-run expected average cost per unit time is calculated for any stationary policy δ as the expected total cost TC_δ incurred in one cycle divided by the expected cycle length CL_δ, where a cycle is completed when either full system inspection, preventive maintenance, or corrective maintenance is carried out. For any stationary policy δ, let
\[ M_\delta = \inf\{ nh \ge 0 : \delta(\Pi_n) = 3 \} \tag{3} \]
represent the first time at which full system inspection is initiated, and let
\[ N_\delta = \#\{ n : \delta(\Pi_n) = 2,\ nh < M_\delta \} \tag{4} \]
represent the total number of samples collected in a cycle. Then, from the model description given above,
\[ TC_\delta = C_S N_\delta + \int_0^{M_\delta \wedge \xi} \sum_{i=1}^{N-1} C_W(i)\, 1\{X_t = i\}\, dt + C_I\, 1\{X_{M_\delta \wedge \xi} = 0\} + \sum_{i=1}^{N-1} (C_I + C_P(i))\, 1\{X_{M_\delta \wedge \xi} = i\} + C_F\, 1\{X_{M_\delta \wedge \xi} = N\}, \]
\[ CL_\delta = M_\delta \wedge \xi + T_I\, 1\{X_{M_\delta \wedge \xi} = 0\} + \sum_{i=1}^{N-1} (T_I + T_P(i))\, 1\{X_{M_\delta \wedge \xi} = i\} + T_F\, 1\{X_{M_\delta \wedge \xi} = N\}. \]
For the average cost criterion, the problem is to find a stationary policy, if it exists, minimizing the long-run expected average cost per unit time given by
\[ \frac{E_\Pi[TC_\delta]}{E_\Pi[CL_\delta]}, \tag{5} \]
where E_Π is the conditional expectation given Π_0 = Π. We assume that a new system is installed at the beginning of the first cycle, i.e., Π_0 = (1, 0, ..., 0).

We first transform the stochastic control problem (5) to an equivalent parameterized stochastic control problem (with parameter λ) with an additive objective function. This transformation is known as the λ-minimization technique, and its theory is developed in the excellent paper of Aven and Bergman (1986). Define for λ > 0 the function
\[ V_\lambda(\Pi) = \inf_{\delta} E_\Pi[\, TC_\delta - \lambda\, CL_\delta\, ]. \tag{6} \]
Then, Aven and Bergman (1986) showed that λ* determined by the equation
\[ \lambda^* = \inf\{ \lambda > 0 : V_\lambda(\Pi_0) \le 0 \} \tag{7} \]
is the optimal expected average cost for the stochastic control problem (5), and the stationary policy that minimizes the right-hand side of (6) for λ = λ* determines the optimal stationary policy. We refer to the function V_λ defined in (6) as the value function.

We have found, through our experience with real diagnostic data such as spectrometric oil data (e.g., Kim et al. 2011, Makis et al. 2006) and vibration data (e.g., Yang and Makis 2010), that it is usually sufficient (and even preferable) to consider only two operational states: a healthy state (state 0) and a warning state (state 1). In many cases, the system moves through two distinct phases of operation. In the first and longer phase, the system operates under normal conditions, and the observations behave in a stationary manner. Although system degradation can be gradual, it is usually not until degradation has exceeded a certain level that the behavior of the condition monitoring observations changes and there is a nonnegligible increase in the system failure rate. At this point, the system enters the second and shorter phase, which we define to be the warning state. Such a characterization is consistent with the CBM paradigm, as it has the desirable property that maintenance actions are initiated only when the system experiences severe deterioration that can actually cause imminent failure. Accordingly, in this paper we shall focus on the case in which S = {0, 1, 2}, i.e., N = 2. In this setting, it follows that the univariate quantity
\[ \Pi_n = P(X_{nh} = 1 \mid \xi > nh,\ Y_{n_1}, \ldots, Y_{n_k}), \tag{8} \]
the probability that the system is in warning state 1 given all available information until time nh, represents sufficient information for decision-making at the nth decision epoch. We also simplify the notation for the cost and time parameters and write C_P(1) = C_P, C_W(1) = C_W, and T_P(1) = T_P.
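To make the two-phase behavior described above concrete, the following Python sketch simulates the hidden three-state chain in continuous time and draws an imperfect observation at each sampling epoch while the system is still operating. The rate matrix, observation matrix, and sampling interval used here are illustrative placeholders chosen only to make the sketch runnable; they are not the paper's values.

```python
import numpy as np

rng = np.random.default_rng(0)

# States: 0 = healthy, 1 = warning, 2 = failure (absorbing).  Upper-triangular
# rates encode nondecreasing deterioration; q12 >= q02 gives the warning state
# a higher failure rate.  All values below are assumed for illustration only.
Q = np.array([[-0.21,  0.20,  0.01],
              [ 0.00, -0.10,  0.10],
              [ 0.00,  0.00,  0.00]])
D = np.array([[0.80, 0.15, 0.05],      # d[i, y] = P(Y = y | X = i), i = 0, 1
              [0.10, 0.30, 0.60]])
h = 2.0                                # delay / spacing between decision epochs

def simulate_until_failure():
    """Simulate the hidden chain via competing exponential clocks and draw an
    (imperfect) observation Y_n at every epoch nh reached before failure."""
    state, t, next_epoch, history = 0, 0.0, h, []
    while True:
        rate = -Q[state, state]
        if rate == 0.0:                                    # absorbing failure state
            return t, history                              # (failure time, [(nh, Y_n)])
        sojourn = rng.exponential(1.0 / rate)
        while next_epoch <= t + sojourn:                   # epochs passed in this state
            y = rng.choice(D.shape[1], p=D[state])
            history.append((next_epoch, y))
            next_epoch += h
        jump_probs = np.maximum(Q[state], 0.0) / rate      # embedded-chain probabilities
        state = rng.choice(len(jump_probs), p=jump_probs)
        t += sojourn

failure_time, observations = simulate_until_failure()
print(f"failure at t = {failure_time:.1f} after {len(observations)} samples")
```

In a typical run, the observations drawn while the chain is still in state 0 come from the first row of D, and their distribution shifts once the warning state is entered, which is exactly the change in observation behavior that the posterior probability Π_n is designed to detect.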
In principle, the N-state version of the problem can be studied and analyzed. However, the majority of the theoretical and practical results obtained in this paper do not carry over to the more general setting, which is another reason we strongly advocate for the three-state model. The main problem is rooted in the fact that for the N-state model, the sufficient statistic for decision-making is no longer a univariate quantity but is an N-dimensional vector, i.e., the posterior state distribution. This makes finding an analytical form for the sampling and control regions extremely difficult (and perhaps not possible). For example, although it can be shown for the N-state model that the optimal preventive maintenance region is a convex subset of the N-dimensional probability simplex, in practice this does not mean much. Without any analytical or explicit structural form, the control regions must be approximated by solving the corresponding multidimensional dynamic program, which is computationally intractable. In particular, solving such a multidimensional dynamic program is a PSPACE-hard problem and requires an exponential amount of memory and computation (see, e.g., Papadimitriou 1995).

To compound this problem, the optimal policy for the N-state model depends on the auxiliary parameter λ, so the optimality equation must be solved for different values of λ until the optimal λ* is found. As will be shown in §3, this is not the case for the three-state model. In particular, our characterization of the optimal policy will imply that the optimal control policy no longer depends on the parameter λ. Lastly, the optimal sampling and maintenance policy can no longer be visualized and represented as a simple univariate control chart, making decisions at a managerial level less intuitive and more difficult to automate and implement.

In the next section, we analyze the value function defined in (6) and determine the structure of the optimal sampling and maintenance policy. For the remainder of the paper, to simplify notation we suppress the dependence on λ when there is no confusion and write, for example, V instead of V_λ.

3. Structural Form of the Optimal Policy

The goal of this section is to characterize the form of the optimal sampling and maintenance policy. The strategy we take is to first analyze the control problem over a restricted subclass of stationary policies 𝒟_k in which full system inspection must be initiated no later than at time kh. The value function V_k for the restricted control problem is derived, and its properties are determined. The restriction is then lifted, and the properties of the restricted value functions V_k are carried over to the infinite-horizon value function V, which can be obtained as the limit V_k ↓ V. The dynamic optimality equation is then derived, and further properties of the infinite-horizon value function V are determined. It is then shown that the optimal policy is characterized by three critical thresholds, which have practical value and intuitive interpretation.

We begin by providing a closed-form expression for the transition probability matrix of the uncontrolled state process X_t. By the model assumptions given in §2, it can be shown by solving the Kolmogorov backward differential equations (e.g., Grimmett and Stirzaker 2001) that the transition probability matrix of the uncontrolled state process is given by
\[ P(t) = (p_{ij}(t)) = \begin{pmatrix} e^{-\nu_0 t} & \dfrac{q_{01}(e^{-\nu_1 t} - e^{-\nu_0 t})}{\nu_0 - \nu_1} & 1 - e^{-\nu_0 t} - \dfrac{q_{01}(e^{-\nu_1 t} - e^{-\nu_0 t})}{\nu_0 - \nu_1} \\ 0 & e^{-\nu_1 t} & 1 - e^{-\nu_1 t} \\ 0 & 0 & 1 \end{pmatrix}, \tag{9} \]
where the transition probabilities are p_{ij}(t) = P(X_t = j | X_0 = i), i, j ∈ S, and the constants are ν_0 = q_{01} + q_{02} and ν_1 = q_{12}.

Suppose at decision epoch n the system has not failed, i.e., ξ > nh, and Π_n = π. Then for any t ≤ h, the probability that the system will not fail by nh + t is given by
\[ R(t \mid \pi) = P(\xi > nh + t \mid \xi > nh,\ \Pi_n = \pi) = (1 - p_{02}(t))(1 - \pi) + (1 - p_{12}(t))\pi. \tag{10} \]
The function R defined in (10) is known as the conditional reliability function. If the decision maker chooses action a_n = 2 (take a sample), then at the beginning of the next decision epoch n + 1, if ξ > (n + 1)h, a sample Y_{n+1} is made available, and the state probability is updated using Bayes' rule (e.g., Schervish 1995):
\[ \Pi_{n+1}(Y_{n+1}) = P(X_{(n+1)h} = 1 \mid \xi > (n+1)h,\ Y_{n+1},\ \Pi_n = \pi) = \frac{d_{1,Y_{n+1}}\,\big(p_{01}(h)(1-\pi) + p_{11}(h)\pi\big)}{d_{0,Y_{n+1}}\,p_{00}(h)(1-\pi) + d_{1,Y_{n+1}}\,\big(p_{01}(h)(1-\pi) + p_{11}(h)\pi\big)}. \tag{11} \]
On the other hand, if the decision maker chooses action a_n = 1 (do nothing), then at the beginning of the next decision epoch n + 1, if ξ > (n + 1)h, no new sample is available, so that the state probability is given by
\[ \Pi_{n+1}(\varnothing) = P(X_{(n+1)h} = 1 \mid \xi > (n+1)h,\ \Pi_n = \pi) = \frac{p_{01}(h)(1-\pi) + p_{11}(h)\pi}{R(h \mid \pi)} = \frac{p_{01}(h)(1-\pi) + p_{11}(h)\pi}{(1 - p_{02}(h))(1-\pi) + (1 - p_{12}(h))\pi}. \tag{12} \]
The empty-set symbol ∅ in (12) is used to indicate that no new sample Y_{n+1} was obtained at the beginning of decision epoch n + 1. We next analyze the control problem over a restricted subclass of stationary policies.
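The quantities in (9)-(12) are straightforward to compute. The following Python sketch implements the closed-form transition probabilities, the conditional reliability function, and the two Bayesian updates; the rates, observation matrix, and sampling interval are assumed placeholder values rather than the paper's.

```python
import numpy as np

# Assumed illustrative parameters (not the paper's): rates of the three-state chain,
# the state-observation matrix D, and the sampling interval h.  The closed form for
# p01(t) below requires nu0 != nu1.
q01, q02, q12 = 0.20, 0.01, 0.10
nu0, nu1 = q01 + q02, q12
D = np.array([[0.80, 0.15, 0.05],     # d[i, y] = P(Y = y | X = i), i = 0, 1
              [0.10, 0.30, 0.60]])
h = 2.0

def P(t):
    """Transition probability matrix (9) of the uncontrolled state process."""
    p00, p11 = np.exp(-nu0 * t), np.exp(-nu1 * t)
    p01 = q01 * (p11 - p00) / (nu0 - nu1)
    return np.array([[p00, p01, 1.0 - p00 - p01],
                     [0.0, p11, 1.0 - p11],
                     [0.0, 0.0, 1.0]])

def R(t, pi):
    """Conditional reliability (10): probability of surviving the next t time units."""
    Pt = P(t)
    return (1.0 - Pt[0, 2]) * (1.0 - pi) + (1.0 - Pt[1, 2]) * pi

def update_no_sample(pi):
    """Posterior (12): one period ahead given survival and no new sample."""
    Ph = P(h)
    return (Ph[0, 1] * (1.0 - pi) + Ph[1, 1] * pi) / R(h, pi)

def update_with_sample(pi, y):
    """Posterior (11): one period ahead given survival and a new sample Y = y."""
    Ph = P(h)
    num = D[1, y] * (Ph[0, 1] * (1.0 - pi) + Ph[1, 1] * pi)
    return num / (D[0, y] * Ph[0, 0] * (1.0 - pi) + num)

pi = 0.2
print(update_no_sample(pi), [round(update_with_sample(pi, y), 3) for y in range(3)])
```

Note that averaging the sampled update (11) over y with the weights g(y | π) defined below recovers the no-sample update (12), a fact used later in the proof of Corollary 2.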
For k ∈ ℕ, let 𝒟_k represent the class of stationary policies such that the time of the first decision epoch at which full system inspection is initiated is less than or equal to kh with probability 1, i.e., M_δ ≤ kh. Then, by the dynamic programming algorithm (e.g., Bertsekas and Shreve 1978), the value function for the restricted control problem,
\[ V_k(\pi) = \inf_{\delta \in \mathcal{D}_k} E_\pi[\, TC_\delta - \lambda\, CL_\delta\,], \tag{13} \]
satisfies the dynamic equations
\[ V_0(\pi) = C_I + C_P \pi - \lambda (T_I + T_P \pi), \qquad V_k(\pi) = \min\{ V_k^1(\pi),\ V_k^2(\pi),\ V_k^3(\pi) \}, \tag{14} \]
where
\[ V_k^1(\pi) = C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi)\, V_{k-1}\big(\Pi_1(\varnothing)\big), \]

\[ V_k^2(\pi) = C_S + C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi) \sum_{y \in E} V_{k-1}\big(\Pi_1(y)\big)\, g(y \mid \pi), \tag{15} \]
\[ V_k^3(\pi) = C_I + C_P \pi - \lambda (T_I + T_P \pi), \]
and
\[ g(y \mid \pi) = \frac{d_{0y}\, p_{00}(h)(1-\pi) + d_{1y}\,\big(p_{01}(h)(1-\pi) + p_{11}(h)\pi\big)}{R(h \mid \pi)}. \tag{16} \]
The first term V_k^1 in (15) is the expected cost if action 1 (do nothing) is chosen, and the decision maker runs the system for one period, updates the state probability to Π_1(∅) using Equation (12), and then continues optimally with k − 1 periods left. The second term V_k^2 is the expected cost if action 2 (take a sample) is chosen, and the decision maker runs the system for one period, collects a sample Y_1 = y, updates the state probability to Π_1(y) using Equation (11), and then continues optimally with k − 1 periods left. The third term V_k^3 is the expected cost if action 3 (full inspection) is chosen, and the decision maker stops the process for full system inspection, possibly followed by preventive maintenance.

It then follows from Equations (14)-(16) that the restricted value functions V_k have the following property.

Lemma 1. For each k, V_k(π) is a concave function of π.

Proof. See appendix.

We also have the following lower bound on the family of restricted value functions V_k.

Lemma 2. The restricted value functions V_k are uniformly bounded from below:
\[ V_k(\pi) \ge \frac{-\lambda (h + T_F)}{1 - R(h \mid 0)}. \tag{17} \]

Proof. See appendix.

Lemmas 1 and 2 allow us to characterize the infinite-horizon value function V defined in (6). For each k, since 𝒟_k ⊆ 𝒟_{k+1}, by the definition of V_k given in Equation (13) it follows that V_k ≥ V_{k+1}. Then by Lemma 2, since the restricted value functions V_k are uniformly bounded from below, lim_{k→∞} V_k = V exists, and by Lemma 1 the value function V is concave and bounded. Furthermore, Bertsekas and Shreve (1978) showed (Lemma 5.1, Proposition 5.4) under much more general conditions that it satisfies the following dynamic optimality equation, which gives us our first important structural result.

Theorem 1. The infinite-horizon value function defined in Equation (6) is obtained as the limit V = lim_{k→∞} V_k. Furthermore, V is a concave, bounded function of π, satisfying the dynamic optimality equation
\[ V(\pi) = \min\{ V^1(\pi),\ V^2(\pi),\ V^3(\pi) \}, \tag{18} \]
where
\[ V^1(\pi) = C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big), \]
\[ V^2(\pi) = C_S + C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi), \]
\[ V^3(\pi) = C_I + C_P \pi - \lambda (T_I + T_P \pi). \tag{19} \]

It then follows that the value function V is also nondecreasing.

Corollary 1. The infinite-horizon value function V is a nondecreasing function of π.

Proof. See appendix.

We next prove a theorem that makes use of the result in the classical paper of Barlow and Hunter (1960).

Theorem 2. Any policy that never stops the process to initiate full system inspection, i.e., M_δ = +∞, is not optimal.

Proof. Consider an age-based policy δ_n that initiates full system inspection at time nh. From renewal theory, the long-run expected average cost per unit time for this policy is given by
\[ g(n) = \frac{C_F\, p_{02}(nh) + C_I\, p_{00}(nh) + (C_I + C_P)\, p_{01}(nh) + C_W \int_0^{nh} p_{01}(t)\, dt + C_S\, E[N_{\delta_n}]}{E[nh \wedge \xi] + T_F\, p_{02}(nh) + T_I\, p_{00}(nh) + (T_I + T_P)\, p_{01}(nh)}. \tag{20} \]
Thus, to prove the claim, it suffices to show that
\[ \arg\min_n g(n) < +\infty. \tag{21} \]

To show (21), we derive an upper bound for arg min_n g(n) by considering a related process in which we remove all incentive to stop the process early, so that full system inspection must be done at a later time. In particular, consider a related process in which full system inspection costs C_I + C_P, whether the system is found to be in the healthy or the warning state, and all maintenance actions (corrective, inspection, and preventive) take 0 time units. We furthermore assume that there is no penalty to run the system longer, so that C_W = C_S = 0. Then, if preventive maintenance is scheduled at time nh, the expected average cost for this process is given by
\[ b(n) = \frac{C_F\, p_{02}(nh) + (C_I + C_P)\big(1 - p_{02}(nh)\big)}{E[nh \wedge \xi]}, \tag{22} \]
and clearly arg min_n g(n) ≤ arg min_n b(n). To complete the proof, we show that arg min_n b(n) < +∞, which implies Equation (21). Since we have assumed q_{12} > q_{02}, the failure rate of ξ is increasing. We have also assumed that C_F > C_I + C_P. Barlow and Hunter (1960) showed that under these assumptions, there exists a positive real value t* < +∞ such that t* is the unique minimizer of b(t). For our problem, arg min_n b(n) is required to be integer-valued. However, since t* is a unique minimizer, the function b(t) is increasing for t > t*. Thus, the minimizing n satisfies nh ≤ t* + h < +∞, which completes the proof.

The optimal sampling and maintenance policy is described by the following theorem.

Theorem 3. The optimal sampling and maintenance policy is characterized by three critical thresholds 0 ≤ π_L ≤ π_U ≤ π̄ ≤ 1. In particular, at decision epoch n:
1. If Π_n < π_L, do nothing, and run the system until the next decision epoch n + 1.
2. If π_L ≤ Π_n < π_U, take a sample.
3. If π_U ≤ Π_n < π̄, do nothing, and run the system until the next decision epoch n + 1.
4. If Π_n ≥ π̄, initiate full system inspection, followed possibly by preventive maintenance.
5. Corrective maintenance is carried out immediately upon system failure.

Proof. We first show that for π = 1, V^3(1) ≤ V^1(1) < V^2(1). We start with the second inequality, V^1(1) < V^2(1). By Equation (19), and using that Π_1(∅) = Π_1(y) = 1 when π = 1,
\[ V^1(1) - V^2(1) = R(h \mid 1)\, V\big(\Pi_1(\varnothing)\big) - R(h \mid 1) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid 1) - C_S = R(h \mid 1)\, V(1) - R(h \mid 1)\, V(1) \sum_{y \in E} g(y \mid 1) - C_S = -C_S < 0, \]
which implies V^1(1) < V^2(1). We next show the first inequality, V^3(1) ≤ V^1(1), using mathematical induction. The inequality is equivalent to V^3(1) = V(1). For k = 1, we assume V_1^3(1) > V_1(1) and derive a contradiction. Since it is then not optimal to stop the process to initiate full system inspection when π = 1, linearity of V_1^1, V_1^2, V_1^3 implies that V_1^3(π) > V_1(π) for all π ∈ [0, 1]. Since for each k, V_k ≥ V_{k+1}, it follows that the limit V(π) = lim_{k→∞} V_k(π) ≤ V_1(π) < V_1^3(π) = V^3(π) for all π, so the policy that never stops the process is optimal, which contradicts Theorem 2. Whence V_1^3(1) = V_1(1), and by Equation (14),
\[ C_I + C_P - \lambda(T_I + T_P) \le C_W \int_0^h p_{11}(t)\, dt - \lambda \int_0^h R(t \mid 1)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid 1)\big) + R(h \mid 1)\big(C_I + C_P - \lambda(T_I + T_P)\big). \]
Suppose now that for some k > 0, V_k^3(1) = V_k(1). Using the above inequality, it follows that
\[ V_{k+1}(1) = \min\{ V_{k+1}^1(1),\ V_{k+1}^2(1),\ V_{k+1}^3(1) \} = \min\{ V_{k+1}^1(1),\ V_{k+1}^3(1) \} = C_I + C_P - \lambda(T_I + T_P) = V_{k+1}^3(1), \]
which completes the inductive step. Therefore, the limit V(1) = lim_{k→∞} V_k(1) = V^3(1). Thus for π = 1, V^3(1) ≤ V^1(1) < V^2(1). Since for π = 0, V^1(0) < V^2(0) < V^3(0), the above inequalities, the concavity of V, and Equation (18) imply that the region {π : V(π) = V^3(π)} is a convex subset of [0, 1] of the form [π̄, 1] for some π̄ ≤ 1, and the region {π : V(π) = V^2(π)} is a convex subset of [0, 1] of the form [π_L, π_U) for some π_L ≤ π_U ≤ π̄, which completes the proof.

Theorem 3 shows that the optimal control policy can be represented as a type of control chart, which monitors the probability Π_n that the system is in a warning state. The intuitive interpretation of the three critical thresholds π_L, π_U, π̄ is as follows.
When the probability that the system is in a warning state is below the lower sampling limit (Π_n < π_L), the decision maker has high confidence that the system is in the healthy state 0, and therefore has little reason to take an expensive sample through condition monitoring to confirm this belief. Similarly, when the state probability is above the upper sampling limit (Π_n ≥ π_U), the decision maker has high confidence that the system is in the warning state 1, and therefore also has little reason to take the sample. It is only when the state probability satisfies π_L ≤ Π_n < π_U that the decision maker is unsure about the system's condition and is willing to pay for a sample to get a better idea about its health. However, once the state probability exceeds π̄, the risk of system failure and of incurring an expensive corrective maintenance cost is too high, so the decision maker should stop the process and initiate full system inspection, possibly followed by preventive maintenance.
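Once the three thresholds are available, implementing the chart is a few lines of code. The Python sketch below encodes the four-region decision rule of Theorem 3 and, anticipating Remark 1 below, computes how far in the future full inspection can be pre-scheduled when the posterior lies in the upper do-nothing band. The threshold values and rates are assumed placeholders, not optimized values from the paper.

```python
import numpy as np

# Assumed illustrative values; the optimized thresholds and rates in the paper differ.
pi_L, pi_U, pi_bar = 0.10, 0.45, 0.70
q01, q02, q12 = 0.20, 0.01, 0.10
nu0, nu1 = q01 + q02, q12
h = 2.0

def action(pi_n):
    """Theorem 3: 1 = do nothing, 2 = take a sample, 3 = initiate full inspection."""
    if pi_n < pi_L:
        return 1
    if pi_n < pi_U:
        return 2
    if pi_n < pi_bar:
        return 1
    return 3

def planned_inspection_time(pi_n, max_periods=10_000):
    """Remark 1: for pi_U <= pi_n < pi_bar no further samples are taken, so the
    posterior evolves deterministically; return the first multiple of h at which
    the m-step no-sample update (Equation (12) with horizon mh) reaches pi_bar."""
    for m in range(1, max_periods + 1):
        t = m * h
        p00, p11 = np.exp(-nu0 * t), np.exp(-nu1 * t)
        p01 = q01 * (p11 - p00) / (nu0 - nu1)
        pi_m = (p01 * (1 - pi_n) + p11 * pi_n) / ((p00 + p01) * (1 - pi_n) + p11 * pi_n)
        if pi_m >= pi_bar:
            return t
    return None          # threshold not reached within the horizon searched

print(action(0.5), planned_inspection_time(0.5))   # e.g. do nothing now, inspect later
```

Because no sample is taken above π_U, the posterior path used by planned_inspection_time is deterministic, so the returned horizon can be communicated to maintenance planners in advance.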

Remark 1. It is important to note that practitioners can also use the control policy described in Theorem 3 as a tool for planning maintenance activities in advance. For example, if π_U ≤ Π_n < π̄, the optimal action is to do nothing and run the system until the next decision epoch n + 1. However, since no sample is taken, the state probability at the next decision epoch,
\[ \Pi_{n+1} = \Pi_{n+1}(\varnothing) = \frac{p_{01}(h)(1 - \Pi_n) + p_{11}(h)\Pi_n}{(1 - p_{02}(h))(1 - \Pi_n) + (1 - p_{12}(h))\Pi_n}, \]
is a deterministic function of Π_n given by Equation (12). Therefore, the next maintenance action (full system inspection) can be scheduled to take place
\[ T = \inf\left\{ mh : \frac{p_{01}(mh)(1 - \Pi_n) + p_{11}(mh)\Pi_n}{(1 - p_{02}(mh))(1 - \Pi_n) + (1 - p_{12}(mh))\Pi_n} \ge \bar{\pi} \right\} \]
time units from now. Planning maintenance activities in advance is particularly useful in practice since suspending a system from operation for full inspection and maintenance could require significant preparation.

Remark 2. It is interesting to note that the result obtained in Theorem 3 is consistent with the "at most four regions" result first introduced by Ross (1971) in the perfect information setting. That is, the three critical thresholds π_L ≤ π_U ≤ π̄ partition the interval [0, 1] into at most four regions (see Figure 2).

Intuitively, one would expect that if the sampling cost C_S = 0, we should always take a sample. On the other hand, if the sampling cost is greater than the cost of full system inspection and preventive maintenance, i.e., C_S > C_I + C_P, one would expect that we should never take a sample. To conclude this section, we show using Jensen's inequality (e.g., Billingsley 1995) that this intuition is mathematically correct.

Corollary 2. If the sampling cost C_S = 0, then π_L = 0 and π_U = π̄. In other words, before full system inspection is initiated, i.e., for all Π_n < π̄, it is always optimal to take a sample, i.e., δ(Π_n) = 2. On the other hand, if the sampling cost C_S > C_I + C_P, then π_L = π_U = π̄. In other words, before full system inspection is initiated, i.e., for all Π_n < π̄, it is never optimal to take a sample, i.e., δ(Π_n) = 1.

Proof of Corollary 2. By Equation (19),
\[ V^1(\pi) - V^2(\pi) = R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big) - R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi) - C_S. \]
Also, Equations (11), (12), and (16) imply
\[ \Pi_1(\varnothing) = \sum_{y \in E} \Pi_1(y)\, g(y \mid \pi). \]
Thus, by concavity of V, it follows from Jensen's inequality that for all π ∈ [0, 1],
\[ R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big) \ge R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi). \]
Thus, if C_S = 0, then V^1(π) ≥ V^2(π) for all π ∈ [0, 1], and it is always optimal to sample if π < π̄. For the case in which C_S > C_I + C_P, since we know by Corollary 1 that the value function V is a nondecreasing function of π, it follows that for all π ∈ [0, 1],
\[ R(h \mid \pi)\, V\big(\Pi_1(\varnothing)\big) - R(h \mid \pi) \sum_{y \in E} V\big(\Pi_1(y)\big)\, g(y \mid \pi) \le C_I + C_P. \]
Thus, if C_S > C_I + C_P, then V^1(π) < V^2(π), i.e., it is never optimal to take a sample.

In the next section, we develop an iterative computational algorithm that determines the optimal values of the critical thresholds π_L, π_U, π̄ and the minimum long-run expected average cost per unit time. We also provide numerical comparisons with other suboptimal policies that illustrate the benefits of the joint optimization of sampling and maintenance.

4. Implementation of the Optimal Policy

In this section, we develop a computational algorithm that determines the optimal values of the critical thresholds π_L, π_U, π̄ and the long-run expected average cost per unit time. We also provide numerical comparisons with other suboptimal policies that illustrate the benefits of the joint optimization of sampling and maintenance. The computational algorithm is based on the λ-minimization technique (Aven and Bergman 1986) and the monotone convergence of the restricted value functions V_k ↓ V.

The Algorithm.
Step 1. Choose ε > 0 and lower and upper bounds λ_Lower and λ_Upper.
Step 2. Put λ = (λ_Lower + λ_Upper)/2, set V_0(π) = C_I + C_P π − λ(T_I + T_P π), and set k = 1.
Step 3. Calculate V_k^λ using the dynamic equations (14) and (15). Stop the iteration of V_k^λ when ‖V_k^λ − V_{k−1}^λ‖ ≤ ε.
Step 4. If V_k^λ(0) > ε, put λ_Lower = λ and go to Step 2. If V_k^λ(0) < −ε, put λ_Upper = λ and go to Step 2. If |V_k^λ(0)| ≤ ε, put λ* = λ and stop.

In the algorithm above, Step 3 and Theorem 1 imply that the restricted value function V_k^λ approximates the value function V_λ for λ = λ*. Step 4 and the λ-minimization technique (Aven and Bergman 1986) imply that λ* is the optimal expected average cost. Furthermore, by Theorem 1, the optimal value of the lower (respectively, upper) sampling limit π_L (respectively, π_U) is the smallest (respectively, largest) value of π such that V_k(π) = V_k^2(π), and π̄ is the smallest value of π such that V_k(π) = V_k^3(π).

In the algorithm above, since λ* > 0, a natural choice for the initial value of the lower bound λ_Lower is 0. However, it is not clear how one should choose the value of the initial upper bound λ_Upper. Fortunately, we have the following result for a feasible choice of the initial upper bound.

Lemma 3. The optimal average cost is bounded by λ* ≤ C_I/T_I. Thus, in the algorithm given above, λ_Lower = 0 and λ_Upper = C_I/T_I are feasible initial values for the lower and upper bounds, respectively.

Proof. Consider an age-based policy that initiates full system inspection immediately at time 0. From renewal theory, it is clear that the long-run expected average cost per unit time for this policy is given by
\[ \frac{C_I + C_P \Pi_0}{T_I + T_P \Pi_0} = \frac{C_I}{T_I}, \]
where the second equality follows since we have assumed that a new system is installed at time 0, i.e., Π_0 = P(X_0 = 1) = 0. Thus it follows that
\[ \lambda^* = \inf_\delta \frac{E[TC_\delta]}{E[CL_\delta]} \le \frac{C_I}{T_I}. \]
Therefore, the optimal average cost is bounded by λ* ≤ C_I/T_I, which completes the proof.

We next illustrate the use of the computational algorithm in the following subsection and determine the optimal values of the critical thresholds π_L, π_U, π̄ and the long-run expected average cost per unit time in a numerical example.

4.1. Constructing the Optimal Control Chart

In this subsection, we construct the cost-optimal control chart described in Theorem 3. Using the computational algorithm described at the beginning of this section, the optimal values of the critical thresholds π_L, π_U, π̄ and the long-run expected average cost per unit time are determined. Consider a given transition rate matrix Q and state-observation matrix D, with maintenance cost parameters C_W = 3, C_F = 85, C_S = 2, C_I = 65, C_P = 2, and maintenance time parameters T_F = T_I = T_P = h = 1. We coded the computational algorithm given above in MATLAB and obtained the optimal values of the three critical thresholds π_L < π_U < π̄, together with the minimum expected average cost λ*. The algorithm was run on an Intel Core 2 E6420 (2.13 GHz, 2 GB RAM). To run the algorithm, we chose a small tolerance ε > 0, and the interval [0, 1] was discretized into a uniform grid of values of π, so that ‖V_k − V_{k−1}‖ in Step 3, for example, is calculated as the maximum of |V_k(π) − V_{k−1}(π)| over the grid points. The value function is graphed in Figure 1.

Theorem 3 implies that the optimal sampling and maintenance policy can be represented as a control chart, which monitors the probability Π_n that the system is in a warning state. To illustrate the use of such a control chart, Figure 2 plots a sample path realization of Π_n. The realizations of the sample values Y_n, posterior probabilities Π_n, and corresponding optimal actions a_n are recorded in Table 1. Figure 2 shows that no sample should be taken at the first decision epoch. From decision epochs 1 to 8, the posterior probability satisfies π_L ≤ Π_n < π_U, so the optimal action is to take a sample (see Table 1 for the realized sample values Y_n). At decision epoch 9, the posterior probability satisfies π_U ≤ Π_n < π̄, so again it is optimal to do nothing. Finally, at decision epoch 10, Π_n ≥ π̄, so the optimal action is full system inspection, possibly followed by preventive maintenance. Such a control chart has direct practical value as it can be readily implemented for online decision-making. Furthermore, since the monitored statistic is univariate and the three critical thresholds have a straightforward and intuitive interpretation, decisions that are made can be easily justified and explained at a managerial level. In the next subsection, we provide numerical comparisons with other policies that illustrate the benefits of the joint optimization of sampling and maintenance.

Figure 1. The graph of the value function V(π).
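The following Python sketch puts the pieces together: a vectorized value iteration for Equations (14)-(16) on a grid of beliefs, an outer bisection on λ over the bracket [0, C_I/T_I] suggested by Lemma 3, and a read-off of the thresholds π_L, π_U, π̄ from which branch attains the minimum. It is a minimal reimplementation under assumed placeholder parameters (rates, observation matrix, costs, grid sizes), not the MATLAB code or the parameter values used for the example above, so its output will not reproduce the paper's numbers.

```python
import numpy as np

# Assumed illustrative parameters; they are not the values used in the paper.
q01, q02, q12 = 0.20, 0.01, 0.10
nu0, nu1 = q01 + q02, q12
D = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.30, 0.60]])
h = 2.0
C_W, C_F, C_S, C_I, C_P = 3.0, 85.0, 2.0, 65.0, 15.0
T_F, T_I, T_P = 1.0, 1.0, 1.0

pis = np.linspace(0.0, 1.0, 2001)                    # discretization of [0, 1]
ts = np.linspace(0.0, h, 401)                        # quadrature nodes on [0, h]
trap = lambda y: float(np.sum((y[1:] + y[:-1]) * np.diff(ts)) / 2.0)
p00t, p11t = np.exp(-nu0 * ts), np.exp(-nu1 * ts)
p01t = q01 * (p11t - p00t) / (nu0 - nu1)
A00, A01, A11 = trap(p00t), trap(p01t), trap(p11t)   # integrals of p00, p01, p11 on [0, h]
p00h, p01h, p11h = p00t[-1], p01t[-1], p11t[-1]

R_h = (p00h + p01h) * (1 - pis) + p11h * pis         # R(h | pi) on the grid
pi_nosample = (p01h * (1 - pis) + p11h * pis) / R_h  # Eq. (12)
num_y = D[1, :, None] * (p01h * (1 - pis) + p11h * pis)
den_y = D[0, :, None] * p00h * (1 - pis) + num_y
g_y = den_y / R_h                                    # Eq. (16), shape (L, grid)
pi_y = num_y / den_y                                 # Eq. (11), shape (L, grid)

def value_functions(lam, tol=1e-6, max_iter=10_000):
    """Iterate Equations (14)-(16) to (near) convergence for a fixed lambda."""
    base = (C_W * ((1 - pis) * A01 + pis * A11)              # expected warning cost
            - lam * ((1 - pis) * (A00 + A01) + pis * A11)    # -lambda * expected run time
            + (C_F - lam * T_F) * (1.0 - R_h))               # failure within the period
    V3 = C_I + C_P * pis - lam * (T_I + T_P * pis)
    V = V3.copy()                                            # V_0 in Eq. (14)
    for _ in range(max_iter):
        V1 = base + R_h * np.interp(pi_nosample, pis, V)
        V2 = C_S + base + R_h * np.sum(np.interp(pi_y, pis, V) * g_y, axis=0)
        V_new = np.minimum(np.minimum(V1, V2), V3)
        if np.max(np.abs(V_new - V)) <= tol:
            return V_new, V1, V2, V3
        V = V_new
    return V, V1, V2, V3

# Outer bisection on lambda; Lemma 3 gives the initial bracket [0, C_I / T_I].
# A new system has Pi_0 = 0, so lambda* is where V_lambda(0) crosses zero.
lo, hi = 0.0, C_I / T_I
for _ in range(40):
    lam = 0.5 * (lo + hi)
    V, V1, V2, V3 = value_functions(lam)
    if V[0] > 0.0:              # lambda below the optimal average cost
        lo = lam
    else:                       # lambda at or above the optimal average cost
        hi = lam
lam_star = 0.5 * (lo + hi)

# Read the thresholds off the converged value function (Theorem 3, Step 4 discussion).
stop = np.flatnonzero(np.isclose(V, V3))
samp = np.flatnonzero(np.isclose(V, V2) & ~np.isclose(V, V3))
pi_bar = pis[stop[0]]
pi_L, pi_U = (pis[samp[0]], pis[samp[-1]]) if samp.size else (pi_bar, pi_bar)
print(f"lambda* ~ {lam_star:.4f}  pi_L ~ {pi_L:.3f}  pi_U ~ {pi_U:.3f}  pi_bar ~ {pi_bar:.3f}")
```

Because the one-period integrals are linear in π, they are computed once on the quadrature nodes and reused across the whole grid, which keeps each backup step to a handful of vectorized operations.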

Figure 2. The optimal sampling and maintenance policy represented as a control chart (sample path of Π_n versus n, with the regions "do nothing," "take a sample," "do nothing," and "full inspection").

Table 1. Sample path realization of Y_n, Π_n, and a_n.

4.2. Comparison with Other Policies

In this subsection, we compare the performance of our jointly optimal sampling and maintenance policy with the two most widely considered sampling policies: the policy that never takes a sample at any decision epoch, and the policy that always takes a sample at every decision epoch. Under each of these suboptimal sampling policies, the decision maker still has the freedom to initiate full system inspection at any time. On one hand, the policy that never takes a condition monitoring sample incurs no sampling costs but also has the least amount of information. On the other hand, the sampling policy that always takes a sample at every decision epoch carries the most information but also incurs the highest sampling cost. Our joint sampling and maintenance policy is the optimal balance between having the largest amount of information at the least cost.

It is well known that the policy that never takes a sample at any decision epoch is nothing more than the classical age-based policy (e.g., Barlow and Hunter 1960). Within our framework, this policy corresponds to the special case where π_L = π_U = π̄. Similarly, the policy that always takes a sample at every decision epoch corresponds to the special case where π_L = 0 and π_U = π̄. This type of control policy is aptly known as a Bayesian control chart (e.g., Makis 2008, Kim et al. 2011). To facilitate our discussion, we refer to these three policies as the never-sample policy, the always-sample policy, and the jointly optimal policy of Theorem 3, respectively. We should note that the optimality of our joint policy has already been established, so necessarily it will outperform any other chosen suboptimal policy. The objective of this section is not to convince readers of this, but rather to illustrate the possible benefits for a given set of model parameters when compared to well-known suboptimal benchmark policies.

For this comparison, we consider a given transition rate matrix Q and state-observation matrix D, and model parameters C_W = 7, C_F = 110, C_S = 8, C_I = 55, C_P = 55, and T_F = T_I = T_P = h = 1. We obtain the results reported in Table 2. Table 2 shows that the jointly optimal policy performs substantially better than both the optimal never-sample policy and the optimal always-sample policy. In particular, Table 2 shows expected cost savings of 1.29% and 8.51% over the two benchmark policies. Naturally, determining the optimal threshold values π_L, π_U, π̄ for the jointly optimal policy takes longer than determining the optimal threshold values for the optimal never-sample policy and the optimal always-sample policy. However, in practice, since these computations are typically done off-line, a total run time of a few minutes is surely worth the large cost savings obtained by using the jointly optimal policy.

Table 2. Comparison with suboptimal policies (optimal thresholds π_L, π_U, π̄, expected average cost, and run time for the never-sample, always-sample, and jointly optimal policies).
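As a complementary illustration of the comparison above, the following Python sketch estimates the long-run expected average cost of an arbitrary threshold triple by simulating renewal cycles and forming the renewal-reward ratio; the never-sample and always-sample policies are recovered by collapsing the sampling band exactly as described in the text. All parameters and thresholds are assumed placeholders rather than the paper's values, the hidden state is advanced only from epoch to epoch, and both the warning-state operating time and the failure time are accounted at epoch granularity, so the resulting estimates are rough.

```python
import numpy as np

rng = np.random.default_rng(0)
q01, q02, q12 = 0.20, 0.01, 0.10          # assumed rates
nu0, nu1 = q01 + q02, q12
D = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.30, 0.60]])
h = 2.0
C_W, C_F, C_S, C_I, C_P = 3.0, 85.0, 2.0, 65.0, 15.0   # assumed costs
T_F, T_I, T_P = 1.0, 1.0, 1.0

p00, p11 = np.exp(-nu0 * h), np.exp(-nu1 * h)
p01 = q01 * (p11 - p00) / (nu0 - nu1)
Ph = np.array([[p00, p01, 1 - p00 - p01],
               [0.0, p11, 1 - p11],
               [0.0, 0.0, 1.0]])

def simulate_cycle(pi_L, pi_U, pi_bar):
    """One renewal cycle under the threshold policy of Theorem 3.
    Returns (total cost, cycle length), both accounted at epoch granularity."""
    state, pi, cost, length = 0, 0.0, 0.0, 0.0
    while True:
        if pi >= pi_bar:                                   # full inspection
            cost += C_I + (C_P if state == 1 else 0.0)
            length += T_I + (T_P if state == 1 else 0.0)
            return cost, length
        sample = pi_L <= pi < pi_U
        if state == 1:                                     # warning-state operating cost
            cost += C_W * h
        state = rng.choice(3, p=Ph[state])                 # hidden transition, one period
        length += h
        if state == 2:                                     # failure during the period
            cost += C_F
            length += T_F
            return cost, length
        num = p01 * (1 - pi) + p11 * pi
        den = (p00 + p01) * (1 - pi) + p11 * pi            # R(h | pi)
        if sample:                                         # sample collected, Eq. (11)
            cost += C_S
            y = rng.choice(D.shape[1], p=D[state])
            pi = D[1, y] * num / (D[0, y] * p00 * (1 - pi) + D[1, y] * num)
        else:                                              # no sample, Eq. (12)
            pi = num / den

def long_run_cost(pi_L, pi_U, pi_bar, n_cycles=10_000):
    """Renewal-reward estimate of the expected average cost per unit time."""
    totals = np.array([simulate_cycle(pi_L, pi_U, pi_bar) for _ in range(n_cycles)])
    return totals[:, 0].sum() / totals[:, 1].sum()

pi_L, pi_U, pi_bar = 0.10, 0.45, 0.70                      # assumed joint thresholds
print("joint        :", long_run_cost(pi_L, pi_U, pi_bar))
print("never sample :", long_run_cost(pi_bar, pi_bar, pi_bar))
print("always sample:", long_run_cost(0.0, pi_bar, pi_bar))
```

A direct search over (π_L, π_U, π̄) against such a simulation (or against the exact evaluation of §4) is one way to realize the single-optimization approach suggested in §5.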

Table 3. Optimal expected average cost for varying sampling costs C_S (never-sample, always-sample, and jointly optimal policies).

It is also interesting to note that in this example, the optimal threshold π̄ for full system inspection is quite low for all three policies. This is because the cost of corrective maintenance, C_F = 110, is relatively much higher than the cost of system inspection, C_I = 55, and preventive maintenance, C_P = 55. Therefore, it is more beneficial to perform full system inspection more frequently than to run the system longer and risk costly corrective maintenance due to failure.

We next analyze the sensitivity of the optimal policy for different values of the sampling cost C_S. In light of Corollary 2, we already know that the jointly optimal policy coincides with the optimal always-sample policy when C_S = 0, and with the optimal never-sample policy when C_S > C_I + C_P = 110. We obtain the results reported in Table 3. Table 3 provides important managerial insight into the operational value of condition monitoring information and technologies. This insight is best understood visually, so we plot the optimal expected average costs of Table 3 in Figure 3.

Figure 3. Graphical illustration of the optimal expected average cost for varying sampling costs (optimal average cost rate versus sampling cost C_S).

The dashed horizontal line in Figure 3 is the expected average cost for the optimal never-sample policy for different values of the sampling cost C_S. The dotted increasing line is the expected average cost for the optimal always-sample policy, and the solid increasing curve is the expected average cost for the jointly optimal policy. Figure 3 shows that the jointly optimal policy coincides with the optimal always-sample policy when C_S = 0, and with the optimal never-sample policy when C_S ≥ 24. The optimal always-sample policy is better than the optimal never-sample policy from C_S = 0 to around C_S = 9, after which the optimal never-sample policy is always better than the optimal always-sample policy. This implies that once the sampling cost C_S exceeds 9, it is better to never take a sample and be ignorant of the state of the system than to incur regular condition monitoring sample costs to get a better idea of the system state. Although the jointly optimal policy is always better than both the optimal never-sample policy and the optimal always-sample policy for all values of C_S, the benefits are approximately the greatest when C_S = 9, i.e., the point at which the optimal never-sample policy becomes better than the optimal always-sample policy. On the other hand, the benefits of using the jointly optimal policy become quite marginal when C_S is close to 0 and 24. This suggests that a manager is likely not to be willing to invest in condition monitoring technologies if the sampling cost C_S is close to 24. Similarly, a manager should choose to sample the system at every decision epoch, to simplify the scheduling of sampling and maintenance activities, if the sampling cost C_S is close to 0.

The focus of this paper has been on a system with state space S = {0, 1, 2}, i.e., N = 2. In the following section, we discuss how to extend our results to the general case in which N > 2.

5. Concluding Remarks and Future Research

In this paper, a joint sampling and control problem under partial observations has been considered. The problem has been formulated as a partially observable Markov decision process. The objective was to characterize the form of the optimal sampling and maintenance policy that minimizes the long-run expected average cost per unit time.
It was shown that the optimal control policy can be represented as a control chart with three critical thresholds, which monitors the posterior probability that the system is in a so-called warning state. Such a control chart has direct practical value as it can be readily implemented for online decision-making. Furthermore, since the monitored statistic and the three critical thresholds have a straightforward and intuitive interpretation, decisions can be easily justified and explained at a managerial level. It was also shown that the structure of the optimal policy allows practitioners to plan and schedule maintenance activities into the future. A cost comparison with other suboptimal policies has been examined, which illustrates the benefits of the joint

optimization of sampling and control. It was found that the jointly optimal sampling and maintenance policy performed substantially better than existing suboptimal policies. Numerical results indicate that the advantage of using the jointly optimal sampling and maintenance policy becomes less substantial for both very small and large values of the sampling cost C_S.

There are a number of exciting extensions and topics for future research. The numerical results of §4 showed that the run time of our algorithm was around 6 minutes. Although this is not unreasonably long, there is still much room for improvement. In particular, a closer look at Theorem 3 reveals that the result has further computational value. Recall that the original stochastic control problem defined in (5) was transformed to an equivalent parameterized stochastic control problem (with parameter λ) with an additive objective function using the λ-minimization technique. However, the characterization given in Theorem 3 implies that the optimal control policy no longer depends on λ and is completely determined by the ordered triple (π_L, π_U, π̄). This is a useful property from a computational point of view, since it is possible to develop an algorithm that directly finds the optimal values of (π_L, π_U, π̄) that minimize the original objective function defined in (5). Such an algorithm would likely be faster than the algorithm presented in §4, as one would now be solving a single optimization problem as opposed to solving multiple stochastic control problems for different values of λ.

Another interesting research direction would be to consider a more general model in which the system state process need only be stochastically increasing, which would be less restrictive than the current upper triangularity and increasing failure rate assumptions. It is conceivable that the main results given in this paper could still be established under this weaker assumption. While we believe the current assumptions in this paper are reasonable for most mechanical systems that cannot self-repair, such an extension might be useful in studying, for example, medical decision problems in which the patient degradation process might not necessarily be monotonic.

Lastly, an important future research topic would be to develop a full case study in which the effectiveness and quality of our results would be tested on real-world data sets. This could likely also lead to a further refinement of both the model and algorithm.

Appendix. Proofs

Proof of Lemma 1. We prove this lemma using mathematical induction. For k = 1, substituting Equations (10)-(12) into (15) shows that V_1^1, V_1^2, V_1^3 are linear, and hence concave, functions of π. Assume that for some k > 0, V_k is a concave function of π. We want to show that V_{k+1} is also a concave function of π. Since the min operator preserves concavity and R(h | π) is a linear function of π, it suffices to show that R(h | π) V_k(Π_1(∅)) and R(h | π) Σ_{y∈E} V_k(Π_1(y)) g(y | π) are concave functions of π. Fix arbitrary π_1, π_2 ∈ [0, 1] and α ∈ [0, 1], and let π = απ_1 + (1 − α)π_2. Then by Equation (12), since both the numerator of Π_1(∅) and R(h | π) are linear in π,
\[ \Pi_1(\varnothing)\big|_{\pi} = \frac{\alpha R(h \mid \pi_1)}{R(h \mid \pi)}\, \Pi_1(\varnothing)\big|_{\pi_1} + \frac{(1-\alpha) R(h \mid \pi_2)}{R(h \mid \pi)}\, \Pi_1(\varnothing)\big|_{\pi_2}. \]
Then by concavity of V_k,
\[ R(h \mid \pi)\, V_k\big(\Pi_1(\varnothing)\big|_{\pi}\big) \ge \alpha R(h \mid \pi_1)\, V_k\big(\Pi_1(\varnothing)\big|_{\pi_1}\big) + (1-\alpha) R(h \mid \pi_2)\, V_k\big(\Pi_1(\varnothing)\big|_{\pi_2}\big), \]
which shows that R(h | π) V_k(Π_1(∅)) is a concave function of π. Similarly, by Equation (11), for each y ∈ E,
\[ \Pi_1(y)\big|_{\pi} = \frac{\alpha\, g(y \mid \pi_1) R(h \mid \pi_1)}{g(y \mid \pi) R(h \mid \pi)}\, \Pi_1(y)\big|_{\pi_1} + \frac{(1-\alpha)\, g(y \mid \pi_2) R(h \mid \pi_2)}{g(y \mid \pi) R(h \mid \pi)}\, \Pi_1(y)\big|_{\pi_2}. \]
Then by concavity of V_k,
\[ g(y \mid \pi) R(h \mid \pi)\, V_k\big(\Pi_1(y)\big|_{\pi}\big) \ge \alpha\, g(y \mid \pi_1) R(h \mid \pi_1)\, V_k\big(\Pi_1(y)\big|_{\pi_1}\big) + (1-\alpha)\, g(y \mid \pi_2) R(h \mid \pi_2)\, V_k\big(\Pi_1(y)\big|_{\pi_2}\big), \]
and summing over y ∈ E,
\[ R(h \mid \pi) \sum_{y \in E} V_k\big(\Pi_1(y)\big|_{\pi}\big)\, g(y \mid \pi) \ge \alpha R(h \mid \pi_1) \sum_{y \in E} V_k\big(\Pi_1(y)\big|_{\pi_1}\big)\, g(y \mid \pi_1) + (1-\alpha) R(h \mid \pi_2) \sum_{y \in E} V_k\big(\Pi_1(y)\big|_{\pi_2}\big)\, g(y \mid \pi_2), \]
which shows that R(h | π) Σ_{y∈E} V_k(Π_1(y)) g(y | π) is a concave function of π.
Thus by mathematical induction, for each k, V_k is a concave function of π.

Proof of Lemma 2. We prove inequality (17) using mathematical induction. For k = 0, it is clear that V_0(π) ≥ −λ(h + T_F)/(1 − R(h | 0)). Assume that for some k, V_k(π) ≥ −λ(h + T_F)/(1 − R(h | 0)). Then it follows that
\[ V_{k+1}^1(\pi) = C_W \int_0^h \big( p_{01}(t)(1-\pi) + p_{11}(t)\pi \big)\, dt - \lambda \int_0^h R(t \mid \pi)\, dt + (C_F - \lambda T_F)\big(1 - R(h \mid \pi)\big) + R(h \mid \pi)\, V_k\big(\Pi_1(\varnothing)\big) \]
\[ \ge -\lambda h - \lambda T_F\big(1 - R(h \mid \pi)\big) - R(h \mid \pi)\, \frac{\lambda (h + T_F)}{1 - R(h \mid 0)} \ge -\lambda (h + T_F) - \frac{\lambda (h + T_F)\, R(h \mid 0)}{1 - R(h \mid 0)} = \frac{-\lambda (h + T_F)}{1 - R(h \mid 0)}. \]
The same argument applies to V_{k+1}^2(π), which differs only by the additional nonnegative term C_S, and V_{k+1}^3(π) = V_0(π) satisfies the bound by the base case, so V_{k+1}(π) ≥ −λ(h + T_F)/(1 − R(h | 0)), which completes the induction.


More information

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains

Markov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time

More information

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018

Section Notes 9. Midterm 2 Review. Applied Math / Engineering Sciences 121. Week of December 3, 2018 Section Notes 9 Midterm 2 Review Applied Math / Engineering Sciences 121 Week of December 3, 2018 The following list of topics is an overview of the material that was covered in the lectures and sections

More information

Strategic Dynamic Jockeying Between Two Parallel Queues

Strategic Dynamic Jockeying Between Two Parallel Queues Strategic Dynamic Jockeying Between Two Parallel Queues Amin Dehghanian 1 and Jeffrey P. Kharoufeh 2 Department of Industrial Engineering University of Pittsburgh 1048 Benedum Hall 3700 O Hara Street Pittsburgh,

More information

Production Policies for Multi-Product Systems with Deteriorating. Process Condition

Production Policies for Multi-Product Systems with Deteriorating. Process Condition Production Policies for Multi-Product Systems with Deteriorating Process Condition Burak Kazaz School of Business, University of Miami, Coral Gables, FL 3324. bkazaz@miami.edu Thomas W. Sloan College of

More information

THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES

THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES MATHEMATICS OF OPERATIONS RESEARCH Vol. 27, No. 4, November 2002, pp. 819 840 Printed in U.S.A. THE COMPLEXITY OF DECENTRALIZED CONTROL OF MARKOV DECISION PROCESSES DANIEL S. BERNSTEIN, ROBERT GIVAN, NEIL

More information

Inference in Bayesian Networks

Inference in Bayesian Networks Andrea Passerini passerini@disi.unitn.it Machine Learning Inference in graphical models Description Assume we have evidence e on the state of a subset of variables E in the model (i.e. Bayesian Network)

More information

Dynamically scheduling and maintaining a flexible server

Dynamically scheduling and maintaining a flexible server Dynamically scheduling and maintaining a flexible server Jefferson Huang 1, Douglas G. Down 2, Mark E. Lewis 3, and Cheng-Hung Wu 4 1 Operations Research Department, Naval Postgraduate School, Monterey,

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour Bin Hu Saif Benjaafar Department of Operations and Management Science, Ross School of Business, University of Michigan at Ann Arbor,

More information

Optimal Decentralized Control of Coupled Subsystems With Control Sharing

Optimal Decentralized Control of Coupled Subsystems With Control Sharing IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 58, NO. 9, SEPTEMBER 2013 2377 Optimal Decentralized Control of Coupled Subsystems With Control Sharing Aditya Mahajan, Member, IEEE Abstract Subsystems that

More information

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS Xiaofei Fan-Orzechowski Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony

More information

How Much Evidence Should One Collect?

How Much Evidence Should One Collect? How Much Evidence Should One Collect? Remco Heesen October 10, 2013 Abstract This paper focuses on the question how much evidence one should collect before deciding on the truth-value of a proposition.

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS

A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS Abdelhakim Khatab, Daoud Ait-Kadi, Nidhal Rezg To cite this version: Abdelhakim Khatab, Daoud Ait-Kadi,

More information

IEOR 6711: Stochastic Models I, Fall 2003, Professor Whitt. Solutions to Final Exam: Thursday, December 18.

IEOR 6711: Stochastic Models I, Fall 2003, Professor Whitt. Solutions to Final Exam: Thursday, December 18. IEOR 6711: Stochastic Models I, Fall 23, Professor Whitt Solutions to Final Exam: Thursday, December 18. Below are six questions with several parts. Do as much as you can. Show your work. 1. Two-Pump Gas

More information

A Dynamic Programming Approach for Sequential Preventive Maintenance Policies with Two Failure Modes

A Dynamic Programming Approach for Sequential Preventive Maintenance Policies with Two Failure Modes Chapter 1 A Dynamic Programming Approach for Sequential Preventive Maintenance Policies with Two Failure Modes Hiroyuki Okamura 1, Tadashi Dohi 1 and Shunji Osaki 2 1 Department of Information Engineering,

More information

Markov Decision Processes Infinite Horizon Problems

Markov Decision Processes Infinite Horizon Problems Markov Decision Processes Infinite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld 1 What is a solution to an MDP? MDP Planning Problem: Input: an MDP (S,A,R,T)

More information

Partially Observable Markov Decision Processes (POMDPs)

Partially Observable Markov Decision Processes (POMDPs) Partially Observable Markov Decision Processes (POMDPs) Geoff Hollinger Sequential Decision Making in Robotics Spring, 2011 *Some media from Reid Simmons, Trey Smith, Tony Cassandra, Michael Littman, and

More information

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden

Selecting Efficient Correlated Equilibria Through Distributed Learning. Jason R. Marden 1 Selecting Efficient Correlated Equilibria Through Distributed Learning Jason R. Marden Abstract A learning rule is completely uncoupled if each player s behavior is conditioned only on his own realized

More information

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

Simplex Algorithm for Countable-state Discounted Markov Decision Processes Simplex Algorithm for Countable-state Discounted Markov Decision Processes Ilbin Lee Marina A. Epelman H. Edwin Romeijn Robert L. Smith November 16, 2014 Abstract We consider discounted Markov Decision

More information

On the static assignment to parallel servers

On the static assignment to parallel servers On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/

More information

Online Appendix for Sourcing from Suppliers with Financial Constraints and Performance Risk

Online Appendix for Sourcing from Suppliers with Financial Constraints and Performance Risk Online Appendix for Sourcing from Suppliers with Financial Constraints and Performance Ris Christopher S. Tang S. Alex Yang Jing Wu Appendix A: Proofs Proof of Lemma 1. In a centralized chain, the system

More information

The knowledge gradient method for multi-armed bandit problems

The knowledge gradient method for multi-armed bandit problems The knowledge gradient method for multi-armed bandit problems Moving beyond inde policies Ilya O. Ryzhov Warren Powell Peter Frazier Department of Operations Research and Financial Engineering Princeton

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Prioritized Sweeping Converges to the Optimal Value Function

Prioritized Sweeping Converges to the Optimal Value Function Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science

More information

The Simplex Method: An Example

The Simplex Method: An Example The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified

More information

Dynamic Control of a Tandem Queueing System with Abandonments

Dynamic Control of a Tandem Queueing System with Abandonments Dynamic Control of a Tandem Queueing System with Abandonments Gabriel Zayas-Cabán 1 Jungui Xie 2 Linda V. Green 3 Mark E. Lewis 1 1 Cornell University Ithaca, NY 2 University of Science and Technology

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

arxiv: v1 [math.ho] 25 Feb 2008

arxiv: v1 [math.ho] 25 Feb 2008 A Note on Walking Versus Waiting Anthony B. Morton February 28 arxiv:82.3653v [math.ho] 25 Feb 28 To what extent is a traveller called Justin, say) better off to wait for a bus rather than just start walking

More information

Session-Based Queueing Systems

Session-Based Queueing Systems Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the

More information

A Parametric Simplex Algorithm for Linear Vector Optimization Problems

A Parametric Simplex Algorithm for Linear Vector Optimization Problems A Parametric Simplex Algorithm for Linear Vector Optimization Problems Birgit Rudloff Firdevs Ulus Robert Vanderbei July 9, 2015 Abstract In this paper, a parametric simplex algorithm for solving linear

More information

Stochastic Shortest Path Problems

Stochastic Shortest Path Problems Chapter 8 Stochastic Shortest Path Problems 1 In this chapter, we study a stochastic version of the shortest path problem of chapter 2, where only probabilities of transitions along different arcs can

More information

Persuading a Pessimist

Persuading a Pessimist Persuading a Pessimist Afshin Nikzad PRELIMINARY DRAFT Abstract While in practice most persuasion mechanisms have a simple structure, optimal signals in the Bayesian persuasion framework may not be so.

More information

On the Approximate Solution of POMDP and the Near-Optimality of Finite-State Controllers

On the Approximate Solution of POMDP and the Near-Optimality of Finite-State Controllers On the Approximate Solution of POMDP and the Near-Optimality of Finite-State Controllers Huizhen (Janey) Yu (janey@mit.edu) Dimitri Bertsekas (dimitrib@mit.edu) Lab for Information and Decision Systems,

More information

Persuading Skeptics and Reaffirming Believers

Persuading Skeptics and Reaffirming Believers Persuading Skeptics and Reaffirming Believers May, 31 st, 2014 Becker-Friedman Institute Ricardo Alonso and Odilon Camara Marshall School of Business - USC Introduction Sender wants to influence decisions

More information

Coordinating Inventory Control and Pricing Strategies with Random Demand and Fixed Ordering Cost: The Finite Horizon Case

Coordinating Inventory Control and Pricing Strategies with Random Demand and Fixed Ordering Cost: The Finite Horizon Case OPERATIONS RESEARCH Vol. 52, No. 6, November December 2004, pp. 887 896 issn 0030-364X eissn 1526-5463 04 5206 0887 informs doi 10.1287/opre.1040.0127 2004 INFORMS Coordinating Inventory Control Pricing

More information

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania

Stochastic Dynamic Programming. Jesus Fernandez-Villaverde University of Pennsylvania Stochastic Dynamic Programming Jesus Fernande-Villaverde University of Pennsylvania 1 Introducing Uncertainty in Dynamic Programming Stochastic dynamic programming presents a very exible framework to handle

More information

Sequential Decisions

Sequential Decisions Sequential Decisions A Basic Theorem of (Bayesian) Expected Utility Theory: If you can postpone a terminal decision in order to observe, cost free, an experiment whose outcome might change your terminal

More information

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321

Lecture 11: Introduction to Markov Chains. Copyright G. Caire (Sample Lectures) 321 Lecture 11: Introduction to Markov Chains Copyright G. Caire (Sample Lectures) 321 Discrete-time random processes A sequence of RVs indexed by a variable n 2 {0, 1, 2,...} forms a discretetime random process

More information

Journal of Process Control

Journal of Process Control Journal of Process Control 22 (2012) 1478 1489 Contents lists available at SciVerse ScienceDirect Journal of Process Control j ourna l ho me pag e: www.elsevier.com/locate/jprocont On defect propagation

More information

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming MATHEMATICS OF OPERATIONS RESEARCH Vol. 37, No. 1, February 2012, pp. 66 94 ISSN 0364-765X (print) ISSN 1526-5471 (online) http://dx.doi.org/10.1287/moor.1110.0532 2012 INFORMS Q-Learning and Enhanced

More information

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem

Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Bayesian Congestion Control over a Markovian Network Bandwidth Process: A multiperiod Newsvendor Problem Parisa Mansourifard 1/37 Bayesian Congestion Control over a Markovian Network Bandwidth Process:

More information

Discrete-Time Markov Decision Processes

Discrete-Time Markov Decision Processes CHAPTER 6 Discrete-Time Markov Decision Processes 6.0 INTRODUCTION In the previous chapters we saw that in the analysis of many operational systems the concepts of a state of a system and a state transition

More information

2. Transience and Recurrence

2. Transience and Recurrence Virtual Laboratories > 15. Markov Chains > 1 2 3 4 5 6 7 8 9 10 11 12 2. Transience and Recurrence The study of Markov chains, particularly the limiting behavior, depends critically on the random times

More information

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi.

Optimal Rejuvenation for. Tolerating Soft Failures. Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi. Optimal Rejuvenation for Tolerating Soft Failures Andras Pfening, Sachin Garg, Antonio Puliato, Miklos Telek, Kishor S. Trivedi Abstract In the paper we address the problem of determining the optimal time

More information

Optimal Control of Partiality Observable Markov. Processes over a Finite Horizon

Optimal Control of Partiality Observable Markov. Processes over a Finite Horizon Optimal Control of Partiality Observable Markov Processes over a Finite Horizon Report by Jalal Arabneydi 04/11/2012 Taken from Control of Partiality Observable Markov Processes over a finite Horizon by

More information

6 Evolution of Networks

6 Evolution of Networks last revised: March 2008 WARNING for Soc 376 students: This draft adopts the demography convention for transition matrices (i.e., transitions from column to row). 6 Evolution of Networks 6. Strategic network

More information

Appendix (For Online Publication) Community Development by Public Wealth Accumulation

Appendix (For Online Publication) Community Development by Public Wealth Accumulation March 219 Appendix (For Online Publication) to Community Development by Public Wealth Accumulation Levon Barseghyan Department of Economics Cornell University Ithaca NY 14853 lb247@cornell.edu Stephen

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Ordering Policies for Periodic-Review Inventory Systems with Quantity-Dependent Fixed Costs

Ordering Policies for Periodic-Review Inventory Systems with Quantity-Dependent Fixed Costs OPERATIONS RESEARCH Vol. 60, No. 4, July August 2012, pp. 785 796 ISSN 0030-364X (print) ISSN 1526-5463 (online) http://dx.doi.org/10.1287/opre.1110.1033 2012 INFORMS Ordering Policies for Periodic-Review

More information

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7).

Economics 201B Economic Theory (Spring 2017) Bargaining. Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7). Economics 201B Economic Theory (Spring 2017) Bargaining Topics: the axiomatic approach (OR 15) and the strategic approach (OR 7). The axiomatic approach (OR 15) Nash s (1950) work is the starting point

More information

A MODEL FOR THE LONG-TERM OPTIMAL CAPACITY LEVEL OF AN INVESTMENT PROJECT

A MODEL FOR THE LONG-TERM OPTIMAL CAPACITY LEVEL OF AN INVESTMENT PROJECT A MODEL FOR HE LONG-ERM OPIMAL CAPACIY LEVEL OF AN INVESMEN PROJEC ARNE LØKKA AND MIHAIL ZERVOS Abstract. We consider an investment project that produces a single commodity. he project s operation yields

More information

IN THIS paper we investigate the diagnosability of stochastic

IN THIS paper we investigate the diagnosability of stochastic 476 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL 50, NO 4, APRIL 2005 Diagnosability of Stochastic Discrete-Event Systems David Thorsley and Demosthenis Teneketzis, Fellow, IEEE Abstract We investigate

More information

Chapter 16 Planning Based on Markov Decision Processes

Chapter 16 Planning Based on Markov Decision Processes Lecture slides for Automated Planning: Theory and Practice Chapter 16 Planning Based on Markov Decision Processes Dana S. Nau University of Maryland 12:48 PM February 29, 2012 1 Motivation c a b Until

More information

Bayesian Persuasion Online Appendix

Bayesian Persuasion Online Appendix Bayesian Persuasion Online Appendix Emir Kamenica and Matthew Gentzkow University of Chicago June 2010 1 Persuasion mechanisms In this paper we study a particular game where Sender chooses a signal π whose

More information

The Complexity of Decentralized Control of Markov Decision Processes

The Complexity of Decentralized Control of Markov Decision Processes The Complexity of Decentralized Control of Markov Decision Processes Daniel S. Bernstein Robert Givan Neil Immerman Shlomo Zilberstein Department of Computer Science University of Massachusetts Amherst,

More information

Review Paper Machine Repair Problem with Spares and N-Policy Vacation

Review Paper Machine Repair Problem with Spares and N-Policy Vacation Research Journal of Recent Sciences ISSN 2277-2502 Res.J.Recent Sci. Review Paper Machine Repair Problem with Spares and N-Policy Vacation Abstract Sharma D.C. School of Mathematics Statistics and Computational

More information

Stochastic convexity in dynamic programming

Stochastic convexity in dynamic programming Economic Theory 22, 447 455 (2003) Stochastic convexity in dynamic programming Alp E. Atakan Department of Economics, Columbia University New York, NY 10027, USA (e-mail: aea15@columbia.edu) Received:

More information

On the Partitioning of Servers in Queueing Systems during Rush Hour

On the Partitioning of Servers in Queueing Systems during Rush Hour On the Partitioning of Servers in Queueing Systems during Rush Hour This paper is motivated by two phenomena observed in many queueing systems in practice. The first is the partitioning of server capacity

More information

Learning with Rejection

Learning with Rejection Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,

More information

Stochastic Analysis of a Two-Unit Cold Standby System with Arbitrary Distributions for Life, Repair and Waiting Times

Stochastic Analysis of a Two-Unit Cold Standby System with Arbitrary Distributions for Life, Repair and Waiting Times International Journal of Performability Engineering Vol. 11, No. 3, May 2015, pp. 293-299. RAMS Consultants Printed in India Stochastic Analysis of a Two-Unit Cold Standby System with Arbitrary Distributions

More information

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974

LIMITS FOR QUEUES AS THE WAITING ROOM GROWS. Bell Communications Research AT&T Bell Laboratories Red Bank, NJ Murray Hill, NJ 07974 LIMITS FOR QUEUES AS THE WAITING ROOM GROWS by Daniel P. Heyman Ward Whitt Bell Communications Research AT&T Bell Laboratories Red Bank, NJ 07701 Murray Hill, NJ 07974 May 11, 1988 ABSTRACT We study the

More information

Lecture 20 : Markov Chains

Lecture 20 : Markov Chains CSCI 3560 Probability and Computing Instructor: Bogdan Chlebus Lecture 0 : Markov Chains We consider stochastic processes. A process represents a system that evolves through incremental changes called

More information

Common Knowledge and Sequential Team Problems

Common Knowledge and Sequential Team Problems Common Knowledge and Sequential Team Problems Authors: Ashutosh Nayyar and Demosthenis Teneketzis Computer Engineering Technical Report Number CENG-2018-02 Ming Hsieh Department of Electrical Engineering

More information

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 On the Structure of Real-Time Encoding and Decoding Functions in a Multiterminal Communication System Ashutosh Nayyar, Student

More information

Scheduling with Advanced Process Control Constraints

Scheduling with Advanced Process Control Constraints Scheduling with Advanced Process Control Constraints Yiwei Cai, Erhan Kutanoglu, John Hasenbein, Joe Qin July 2, 2009 Abstract With increasing worldwide competition, high technology manufacturing companies

More information

Stochastic modelling of epidemic spread

Stochastic modelling of epidemic spread Stochastic modelling of epidemic spread Julien Arino Centre for Research on Inner City Health St Michael s Hospital Toronto On leave from Department of Mathematics University of Manitoba Julien Arino@umanitoba.ca

More information

A Note on Auxiliary Particle Filters

A Note on Auxiliary Particle Filters A Note on Auxiliary Particle Filters Adam M. Johansen a,, Arnaud Doucet b a Department of Mathematics, University of Bristol, UK b Departments of Statistics & Computer Science, University of British Columbia,

More information

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05

More information

INTRODUCTION TO MARKOV CHAIN MONTE CARLO

INTRODUCTION TO MARKOV CHAIN MONTE CARLO INTRODUCTION TO MARKOV CHAIN MONTE CARLO 1. Introduction: MCMC In its simplest incarnation, the Monte Carlo method is nothing more than a computerbased exploitation of the Law of Large Numbers to estimate

More information

CS 7180: Behavioral Modeling and Decisionmaking

CS 7180: Behavioral Modeling and Decisionmaking CS 7180: Behavioral Modeling and Decisionmaking in AI Markov Decision Processes for Complex Decisionmaking Prof. Amy Sliva October 17, 2012 Decisions are nondeterministic In many situations, behavior and

More information

Finding the Value of Information About a State Variable in a Markov Decision Process 1

Finding the Value of Information About a State Variable in a Markov Decision Process 1 05/25/04 1 Finding the Value of Information About a State Variable in a Markov Decision Process 1 Gilvan C. Souza The Robert H. Smith School of usiness, The University of Maryland, College Park, MD, 20742

More information

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal

Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal Channel Probing in Communication Systems: Myopic Policies Are Not Always Optimal Matthew Johnston, Eytan Modiano Laboratory for Information and Decision Systems Massachusetts Institute of Technology Cambridge,

More information

University of Groningen. Statistical Auditing and the AOQL-method Talens, Erik

University of Groningen. Statistical Auditing and the AOQL-method Talens, Erik University of Groningen Statistical Auditing and the AOQL-method Talens, Erik IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check

More information

Network design for a service operation with lost demand and possible disruptions

Network design for a service operation with lost demand and possible disruptions Network design for a service operation with lost demand and possible disruptions Opher Baron, Oded Berman, Yael Deutsch Joseph L. Rotman School of Management, University of Toronto, 105 St. George St.,

More information

Control Theory : Course Summary

Control Theory : Course Summary Control Theory : Course Summary Author: Joshua Volkmann Abstract There are a wide range of problems which involve making decisions over time in the face of uncertainty. Control theory draws from the fields

More information

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs

An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs 2015 IEEE 54th Annual Conference on Decision and Control CDC December 15-18, 2015. Osaka, Japan An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs Abhishek Gupta Rahul Jain Peter

More information