Mapping tightly-coupled applications on volatile resources

Similar documents
Scheduling Tightly-Coupled Applications on Heterogeneous Desktop Grids

Scheduling Parallel Iterative Applications on Volatile Resources

Scheduling Parallel Iterative Applications on Volatile Resources

A new simple recursive algorithm for finding prime numbers using Rosser s theorem

Solving an integrated Job-Shop problem with human resource constraints

Methylation-associated PHOX2B gene silencing is a rare event in human neuroblastoma.

Easter bracelets for years

Analysis of Boyer and Moore s MJRTY algorithm

Multiple sensor fault detection in heat exchanger system

Case report on the article Water nanoelectrolysis: A simple model, Journal of Applied Physics (2017) 122,

A new approach of the concept of prime number

Climbing discrepancy search for flowshop and jobshop scheduling with time-lags

Completeness of the Tree System for Propositional Classical Logic

A Simple Proof of P versus NP

MODal ENergy Analysis

b-chromatic number of cacti

Smart Bolometer: Toward Monolithic Bolometer with Smart Functions

Thomas Lugand. To cite this version: HAL Id: tel

Reduced Models (and control) of in-situ decontamination of large water resources

Dispersion relation results for VCS at JLab

On Solving Aircraft Conflict Avoidance Using Deterministic Global Optimization (sbb) Codes

Factorisation of RSA-704 with CADO-NFS

On the Complexity of Mapping Pipelined Filtering Services on Heterogeneous Platforms

On size, radius and minimum degree

Nonlocal computational methods applied to composites structures

Slack-Time Computation for Temporal Robustness in Embedded Systems

Scheduling divisible loads with return messages on heterogeneous master-worker platforms

Performance analysis of clouds with phase-type arrivals

Learning an Adaptive Dictionary Structure for Efficient Image Sparse Coding

Energy-efficient scheduling

On Symmetric Norm Inequalities And Hermitian Block-Matrices

A proximal approach to the inversion of ill-conditioned matrices

Exact Comparison of Quadratic Irrationals

Influence of a Rough Thin Layer on the Potential

Passerelle entre les arts : la sculpture sonore

A Slice Based 3-D Schur-Cohn Stability Criterion

Soundness of the System of Semantic Trees for Classical Logic based on Fitting and Smullyan

A novel method for estimating the flicker level generated by a wave energy farm composed of devices operated in variable speed mode

Full-order observers for linear systems with unknown inputs

Spatial representativeness of an air quality monitoring station. Application to NO2 in urban areas

Territorial Intelligence and Innovation for the Socio-Ecological Transition

On the longest path in a recursively partitionable graph

Divisible Load Scheduling

STATISTICAL ENERGY ANALYSIS: CORRELATION BETWEEN DIFFUSE FIELD AND ENERGY EQUIPARTITION

3D-CE5.h: Merge candidate list for disparity compensated prediction

Energy-aware checkpointing of divisible tasks with soft or hard deadlines

A non-commutative algorithm for multiplying (7 7) matrices using 250 multiplications

FORMAL TREATMENT OF RADIATION FIELD FLUCTUATIONS IN VACUUM

Voltage Stability of Multiple Distributed Generators in Distribution Networks

From Unstructured 3D Point Clouds to Structured Knowledge - A Semantics Approach

A CONDITION-BASED MAINTENANCE MODEL FOR AVAILABILITY OPTIMIZATION FOR STOCHASTIC DEGRADING SYSTEMS

The magnetic field diffusion equation including dynamic, hysteresis: A linear formulation of the problem

Parallel Repetition of entangled games on the uniform distribution

Decentralized Interference Channels with Noisy Feedback Possess Pareto Optimal Nash Equilibria

Evolution of the cooperation and consequences of a decrease in plant diversity on the root symbiont diversity

The Windy Postman Problem on Series-Parallel Graphs

Power-Aware Linear Programming Based Scheduling for Heterogeneous Computer Clusters

Hardware Operator for Simultaneous Sine and Cosine Evaluation

Question order experimental constraints on quantum-like models of judgement

Comparison of NEWUOA with Different Numbers of Interpolation Points on the BBOB Noiseless Testbed

Vibro-acoustic simulation of a car window

New Basis Points of Geodetic Stations for Landslide Monitoring

Can we reduce health inequalities? An analysis of the English strategy ( )

Exogenous input estimation in Electronic Power Steering (EPS) systems

On path partitions of the divisor graph

A Novel Aggregation Method based on Graph Matching for Algebraic MultiGrid Preconditioning of Sparse Linear Systems

Towards an active anechoic room

L institution sportive : rêve et illusion

Diagnosability Analysis of Discrete Event Systems with Autonomous Components

Quantum efficiency and metastable lifetime measurements in ruby ( Cr 3+ : Al2O3) via lock-in rate-window photothermal radiometry

AC Transport Losses Calculation in a Bi-2223 Current Lead Using Thermal Coupling With an Analytical Formula

How to make R, PostGIS and QGis cooperate for statistical modelling duties: a case study on hedonic regressions

The Accelerated Euclidean Algorithm

Scheduling divisible loads with return messages on heterogeneous master-worker platforms

Linear Quadratic Zero-Sum Two-Person Differential Games

Notes on Birkhoff-von Neumann decomposition of doubly stochastic matrices

Hook lengths and shifted parts of partitions

A Context free language associated with interval maps

Nodal and divergence-conforming boundary-element methods applied to electromagnetic scattering problems

How to deal with uncertainties and dynamicity?

On infinite permutations

Black-Box Optimization Benchmarking the IPOP-CMA-ES on the Noisy Testbed

A Simple Model for Cavitation with Non-condensable Gases

On Newton-Raphson iteration for multiplicative inverses modulo prime powers

approximation results for the Traveling Salesman and related Problems

On production costs in vertical differentiation models

Optimal quorum for a reliable Desktop grid

GENERALIZED OPTICAL BISTABILITY AND CHAOS IN A LASER WITH A SATURABLE ABSORBER

Numerical Modeling of Eddy Current Nondestructive Evaluation of Ferromagnetic Tubes via an Integral. Equation Approach

The core of voting games: a partition approach

Sound intensity as a function of sound insulation partition

Call Detail Records to Characterize Usages and Mobility Events of Phone Users

Improved Adaptive Resolution Molecular Dynamics Simulation

Some approaches to modeling of the effective properties for thermoelastic composites

Stochastic invariances and Lamperti transformations for Stochastic Processes

Predicting the risk of non-compliance to EMC requirements during the life-cycle

On scheduling the checkpoints of exascale applications

On Symmetric Norm Inequalities And Hermitian Block-Matrices

A remark on a theorem of A. E. Ingham.

Dissipative Systems Analysis and Control, Theory and Applications: Addendum/Erratum

Transcription:

Mapping tightly-coupled applications on volatile resources Henri Casanova, Fanny Dufossé, Yves Robert, Frédéric Vivien To cite this version: Henri Casanova, Fanny Dufossé, Yves Robert, Frédéric Vivien. Mapping tightly-coupled applications on volatile resources. PDP 2013, the 21st Euromicro Int. Conf. on Parallel, Distributed, and Network- Based Processing, Feb 2013, Belfast, United Kingdom. IEEE Computer Society Press, 2013. <hal- 00763376> HAL Id: hal-00763376 https://hal.inria.fr/hal-00763376 Submitted on 10 Dec 2012 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Mapping Tightly-Coupled Applications on Volatile Resources Henri Casanova 1, Fanny Dufossé 2, Yves Robert 2,3 and Frédéric Vivien 2 1. Univ. of Hawai i at Manoa, Honolulu, USA, henric@hawaii.edu 2. Ecole Normale Supérieure de Lyon,& INRIA, France {FannyDufosse Yves.Robert Frederic.Vivien}@ens-lyon.fr 3. University of Tennessee Knoxville, USA Abstract Platforms that comprise volatile processors, such as desktop grids, have been traditionally used for executing independent-task applications. In this work we study the scheduling of tightly-coupled iterative masterworker applications onto volatile processors. The main challenge is that workers must be simultaneously available for the application to make progress. We consider two additional complications: one should take into account that workers can become temporarily reclaimed and, for dataintensive applications, one should account for the limited bandwidth between the master and the workers. In this context, our first contribution is a theoretical study of the scheduling problem in its off-line version, i.e., when processor availability is known in advance. Even in this case the problem is NP-hard. Our second contribution is an analytical approximation of the expectation of the time needed by a set of workers to complete a set of tasks and of the probability of success of this computation. This approximation relies on a Markovian assumption for the temporal availability of processors. Our third contribution is a set of heuristics, some of which use the above approximation to favor reliable processors in a sensible manner. We evaluate these heuristics in simulation. We identify some heuristics that significantly outperform their competitors and derive heuristic design guidelines. I. INTRODUCTION In this paper we study the problem of scheduling parallel applications onto volatile processors. We target typical scientific iterative applications in which a master process parallelizes the execution of each iteration across worker processes. Each iteration requires the execution of a fixed number of tasks, with a global synchronization at the end of each iteration. We consider a platform that consists of processors that alternate between periods of availability and periods of unavailability. When available each processor runs a worker process, and a master process can choose to enroll a subset of these workers to participate in the application execution. Worker unavailability can be due to software faults, in which case unavailability may last only the time of a reboot. A hardware failure can lead to a longer unavailability period, until a repair is completed and followed by a reboot. We consider a third source of processor unavailability, which comes from cyclestealing scenarios: when a processor is contributed to the platform by an individual owner, this owner can reclaim it at any time without notice for some unknown length of time. A difference here is that the processor is merely preempted (as opposed to being terminated) until the processor is no longer reclaimed. A worker process on this processor can later resume its computation. Accordingly, we use a 3-state availability model: UP (available), DOWN (crashed, computation is lost) and RECLAIMED (preempted, but computation can resume later). Our platform model also accounts for the fact that, due to bandwidth limitation, the master is only able to communicate simultaneously with a limited number of workers (to send them the application program as well as task data). This limitation corresponds to the bounded multi-port model [1]. It turns out that limiting the communication capacity of the master dramatically complicates the design of scheduling strategies. But without this limitation, it would be in principle possible to enroll thousands of new processors at each iteration, which is simply not feasible in practice even if these many processors are available. Given the above application and platform models, and given a deadline (typically expressed in hours or days), the scheduling problem under study is that of maximizing the expected number of application iterations successfully completed before the deadline. Informally, during each iteration, one must use the best processors among those that are simultaneously UP; these could be the fastest ones, or those expected to remain UP for the longest time. In addition, with processors failing, becoming reclaimed, and becoming UP again later, one has to decide when and how to change the set of currently enrolled processors. Each such change comes at a price: first, the application program needs to be sent to newly enrolled processors, thereby consuming some of the master s bandwidth; second, and more importantly, iteration computation that was only partially completed is lost due to the tight coupling of tasks. Our contribution in this work is threefold. First, we

determine the complexity of the off-line scheduling problem, i.e., when processor availability is known in advance. Even with such knowledge the problem is NP-hard. Second, we compute approximations of the expectation of the time needed by a set of processors to complete a set of tasks and of the probability that this computation succeeds. These approximations provide a sound basis for making sensible scheduling decisions. Third, we design several on-line heuristics that we evaluate in simulation. Some of these contributions assume Markovian processor availability, which is not representative of real-world platforms but provides a tractable framework for obtaining theoretical and experimental results in laboratory conditions. Due to lack of space we refer the reader to the companion research report [2] for a worked-out example, related work, statements and proofs of all NP-completeness results, and complete experimental results. A. Application model II. MODELS AND ASSUMPTIONS We consider an application that performs a sequence of iterations. Each iteration consists of executing m tasks and ends with a global synchronization. All m tasks are identical (in terms of computational cost) and communicate throughout the iteration execution. Therefore, all tasks must make progress at the same rate. If a task is terminated prematurely (due to a worker failure), all computation performed so far for the current iteration is lost, and the entire iteration has to be restarted. If a task is suspended (due to a worker becoming temporarily reclaimed), then the entire execution of the iteration is also suspended. Due to the global synchronization, there is no overlap between communication of task data and computation, but only between communication between tasks and computation. We thus consider that an iteration proceeds in two phases: a communication phase and a computation phase. Finally, before being able to compute, a worker must acquire the application code once (e.g., binary executable, byte code), of constant size V prog in bytes, and the input data for each task and iteration, of constant size V data in bytes. Data messages depend on tasks but have identical size. B. Platform model The platform comprises p processors, or workers. Worker P q, q = 1,..., p, can be in one of three states (UP, RECLAIMED or DOWN ), and transitions between these states occur for each processor at each time-slot independently of the other processors. More precisely: Any UP processor can become DOWN or RECLAIMED. Any UP or RECLAIMED processor can become DOWN. It then loses the application program and all the data for its current tasks. If it was computing some of these tasks, these computations are lost. Any UP processor can become RECLAIMED. The processor does not lose any state. If it was receiving the application program or data for a task, the communication is temporarily suspended. If it was computing a task, the computation on all processors is temporarily suspended. We denote by S q the vector that gives the state of P q at each time-slot starting with time-slot 0. P q can compute a task in w q time-slots if it remains UP. If w q = w for each processor P q, then the processors are homogeneous. The master has network bandwidth BW and communicates with a worker with bandwidth bw, meaning that we assume same capacity links from the master to each worker. Here we equate bandwidth with data transfer rate, acknowledging that in practice the data transfer rate is a fraction of the physical bandwidth. Let n prog be the number of workers receiving the program at time t, and let n data be the number of workers receiving the input data of a task at time t. The constraint on the master s bandwidth writes n prog + n data n com = BW /bw, where n com is the maximum number of. processors that the master can communicate with (sending either program or input data) at each time-slot. Indeed, consider a worker on processor P q that is communicating at time t. Either P q is receiving the program, or it is receiving data for a task. In both cases, it does this at data transfer rate bw. Overall, the master can execute only a limited number n com of such communications simultaneously. The time for a worker to receive the program is T prog = V prog /bw, and the time to receive the data is T data = V data /bw. For simplicity we assume that T prog and T data consist of integral numbers of time-slots. We also assume that the master is always UP, which can be enforced by using, for instance, two dedicated servers with a primary backup mechanism. C. Application execution model Let config(t) denote the set of workers enrolled by the master, or configuration, at time t. The configuration is determined by an application scheduler, and in this work we propose algorithms to be used by this scheduler. To complete an iteration, enrolled workers must progress concurrently throughout the computations. One worker may be assigned several tasks and execute them concurrently if it has enough memory to do so. Formally, we define for each worker P q a bound µ q on the maximum number of tasks that it can execute concurrently. We assume that p q=1 µ q m, otherwise the configuration cannot execute the application. The m tasks are mapped onto k m workers. Each enrolled

worker P q is assigned x q tasks, where k q=1 x q = m. To be able to compute their tasks, the k enrolled workers must have received the application program and all necessary data. More precisely: (i) each enrolled worker P q must receive the program, unless it has received it at some previous time and has not be DOWN since then; (ii) in addition, each worker P q must receive a number x q of data messages (one per task) from the master. Suppose that since the begin of the current iteration P q has received x q data messages. At least (x q x q)t data time-slots are needed for this communication, and likely more since the master can be engaged in at most n com concurrent communications. Overall, the computation can start at a time t only if each of the k enrolled workers is in the UP state, has the program, has the data of all its allocated tasks, and has never been in the DOWN state since receiving these messages. Because tasks must proceed in locked steps, the execution goes at the pace of the slowest worker. Hence the computation of an iteration requires max q (x q w q ) time-slots of concurrent computations (not necessarily consecutive, due to workers possibly being reclaimed). Consider the interval of time between time t 1 and time t 2 = t 1 + max q (x q w q ) + t 1 for some t. For the iteration to be successfully completed by time t 2, between t 1 and t 2 there must be max q (x q w q ) timeslots for which all enrolled workers are simultaneously UP, and there may be t time-slots during which one or more workers are RECLAIMED. The scheduler may choose a new configuration at each time t. If at least one worker in config(t) becomes DOWN, the scheduler must select another configuration and restart the iteration from scratch. Even if all workers in config(t) are UP, the scheduler may decide to change the configuration because more desirable (i.e., faster, more reliable) workers have become available. Let P q be a newly enrolled worker at that point, i.e., P q config(t + 1) \ config(t). P q needs to receive the program unless it already has a copy of it and has not been DOWN since receiving it. In all cases, P q needs to receive task data, i.e., x q messages of V data bytes. This holds true even if P q had been enrolled at time t < t but was un-enrolled since then. In other words, any interrupted communication must be resumed from scratch if the worker became DOWN or was removed from the configuration. III. OFF-LINE COMPLEXITY The scheduling problem is to maximize the expected number of completed application iterations before time N, where N is a specified deadline. In the off-line version of this problem, one assumes full knowledge of future worker states. In other words, S q [j] is known for 1 q p and 1 j N. It is shown in [2] that the simplest off-line and deterministic versions of the problem are NP-hard. IV. ANALYTICAL APPROXIMATIONS In this section, we compute the expectation of the time needed by a configuration to compute a given workload conditioned on this computation being successful (i.e., with no worker becoming DOWN ), as well as the probability of success. Intuitively, these quantities seem relevant for developing scheduling heuristics that account for the need for workers to be UP simultaneously, and for workers that can become temporarily RECLAIMED. To compute the above expectation and probability, we introduce a Markov model of processor availability. The availability of processor P q is described by a 3-state recurrent aperiodic Markov chain, defined by 9 probabilities: P (q) i,j, with i, j {u, r, d}, is the probability for P q to move from state i at time t to state j at time t + 1, which does not depend on t. A. Probability of success and expected duration of a computation Consider a set S of workers all in the UP state at time 0. This set is assigned a workload that requires W time-slots of simultaneous computation. To complete this workload successfully, all the workers in S must be simultaneously UP during another W 1 time-slots. They can possibly become RECLAIMED (thereby temporarily suspending the execution) but must never become DOWN in between. What is the probability of the workload being completed? And, if it is successfully completed, what is the expectation of the number of time-slots until completion? Definition 1: Knowing that all processors in a set S are UP at time-slot t 1, let P (S) + be the conditional probability that they will all be UP simultaneously at a later time-slot, without any of them going to the DOWN state in between. Formally, knowing that P q S, S q [t 1 ] = u, P (S) + is the conditional probability that there exists a time t 2 > t 1 such that P q S, S q [t 2 ] = u and S q [t] d for t 1 < t < t 2. Definition 2: Let E (S) (W) be the conditional expectation of the number of time-slots required by a set of processors S to complete a workload of size W knowing that all processors in S are UP at the current time-slot t 1 and none will become DOWN before completing this workload. Formally, knowing that S q [t 1 ] = u, and that there exist W 1 time-slots t 2 < t 3 < < t W, with t 1 < t 2, S q [t i ] = u for i [2, W ], and S q [t] d for t [t 1, t W ], E (S) (W ) is the expectation of t W t 1 + 1 conditioned on success. Theorem 4.1: It is possible to approximate the values of P (S) + and E (S) (W ) numerically up to an arbitrary precision ε in fully polynomial time.

B. Probability of success and expected duration of a communication Similar approximations cannot be obtained for communications due to complexity added by the n com constraint. Instead, we resort to a coarser approximation as explained hereafter. Let S be a set of enrolled workers. For worker P q S, let n q be the number of time-slots of communication needed to receive the application program and all the data of its allocated tasks. Suppose first that S n com. In this case, the expected communication time on worker P q, E q, can be estimated precisely reusing the result in the previous section: E q = E (Pq) (n q ). We then estimate the expected communication time of the current configuration as E comm (S) = max Pq S{E (Pq) (n q )}. In the case S n com, obtaining an estimate close to the actual expected communication time seems out of reach. { Instead, we use a coarser estimation: E comm (S) = { max max Pq S E (P q) (n q ) } } Pq S, nq n com. Let P (Pq) ND (t) denote the probability that worker P q that was UP at time t does not become DOWN between time t and time t + t. The probability of success is then estimated as P comm (S) = P P (Pq) q S ND (E(S) comm). The expression for P comm (S) does not take into account the time needed after the end of all communications for all workers to be UP simultaneously. The probability of success of an iteration is estimated by multiplying the probability of success of the communications and the probability of success of the computations. V. ON-LINE HEURISTICS We propose heuristics for solving the on-line version of the scheduling problem, i.e., assuming no knowledge of future processor states. Conceptually, we distinguish between two classes of heuristics. Passive heuristics conservatively keep current processors active as long as possible: the current configuration is changed only when one of the enrolled processors becomes DOWN. In this case, all previously executed work is lost. However, a worker that has not become DOWN but has already received task data, can reuse that data if the scheduler reassigns tasks to it. Proactive heuristics allow for a complete reconfiguration even if no worker fails, possibly aborting ongoing computation if a better configuration is found. This makes it possible for an iteration to never complete. A criterion must thus be derived to decide whether and when such an aggressive reconfiguration is worthwhile. Our proactive heuristics are defined by a pair (criterion, passive heuristic). When a new configuration is computed using the heuristic, it is compared to the current configuration according to the criterion. If the new configuration is better than the current one, then it is launched, leading to new communications and task allocations. Otherwise, the execution continues with the current configuration for an additional time slot. A. Passive heuristics Passive heuristics assign tasks to workers, which must be in the UP state, one by one until m tasks are assigned. Each task is assigned to a worker according to a criterion that defines the heuristic. As described hereafter, we consider four different criteria: probability of success, expected completion time, estimated yield, and estimated apparent yield. IP (Incremental: Probability of success) This heuristic attempts to find configurations with high probability of success. The next task is assigned to the worker such that the probability of success of all currently assigned tasks (including the new one) is maximized. More precisely, consider the set S of workers with at least one task already assigned. For each worker P q, either in S or not, we compute the probability P (S) (q) of success of the communication and the computation if the additional task is assigned to P q, using the results of Section IV: P (S) (q) = P (S {Pq}) (W q ) P comm (S {Pq}) with W q the maximal load in S {P q } with an additional task on P q. We assign the next task to worker P q0, with q 0 = ArgMax { P (S) (q) }. This natural idea of the most reliable workers has been used for scheduling independent tasks in [3], [4], [5]. IE (Incremental: Expected completion time) This heuristic attempts to find fast configurations, without considering reliability. The next task is assigned to the worker that minimizes the expected execution time of the iteration. More precisely, consider the set S of workers with at least one task already assigned. For each worker P q, either in S or not, we compute the expected communication time E comm (S {Pq}) and the expected computation time E (S {Pq}) (W q ) with an additional task on P q. We obtain the expected duration of the iteration E (S) (q) = E comm (S {Pq}) + E (S {Pq}) (W q ). We assign the next task to worker P q0, with q 0 = ArgMin { E (S) (q) }. This idea of picking the fatest workers has been used for scheduling independent tasks in [6]. IY (Incremental: Expected yield) This heuristic assigns the next task to the worker that maximizes the yield of the configuration. The yield is the expected value of the inverse of the execution time of the current iteration, which we estimate as follows. For a given configuration with probability of success P and expected completion time E for an iteration that has already been running P E+t. for t time slots, the yield is estimated as Y = Intuitively, we expect the yield to achieve a trade-off between reliability (probability of success) and execution speed. Consider the set S of workers with at least one task already assigned. For each processor P q, either in S

or not, we compute the expected yield with an additional task on P q : let P (S) (q) be the probability computed for heuristic IP, E (S) (q) be the expected completion time computed for heuristic IE, and t be the time spent since the beginning of the current iteration. We assign the next } { P (S) (q) t+e (S) (q) task to worker P q0, with q 0 = ArgMax. This general idea of trading off reliability for speed has been used in the context of independent tasks in many previous works [6], [4], [7], [8]. IAY (Incremental: Expected apparent yield) The yield takes into account the time already spent in the current iteration. It could be worthwhile to consider only future work, i.e., the remaining time until iteration completion. To this end we define the apparent yield as AY = P E. Using the same notations as for heuristic IY, we assign the next } task to processor P q0 with q 0 = ArgMax. { P (S) (q) E (S) (q) B. Proactive heuristics Consider an application executing on a platform using a passive heuristic H and criterion C at some time t. The configuration config(t 1) was selected by H at time t t 1 because of a configuration change due to a proactive decision, due to a worker becoming DOWN, or due to the beginning of a new iteration. Let config 1 = config(t ) = config(t 1). At time t, the configuration was measured by criterion C with value c. Suppose that by time t no worker in this configuration has failed. Between t and t, some work may have been done: some communications may be in process or completed, and computations may have started. Consequently, the measure of this configuration given by C should be updated to account for the progress between t and t. Let c be the updated value of criterion C for the current configuration. At step t, a new configuration is computed from scratch using heuristic H, as if no task were allocated to any worker. Let config 2 be this new configuration and c 2 its measure by C. If c c 2, then the current configuration at time t 1 is kept for another time-slot: config(t) = config 1. Otherwise, the current configuration is interrupted, and the new configuration is config(t) = config 2. For certain criterion choices, a heuristic could diverge and continually change the configuration, even with workers that are reliably UP. To avoid this divergence, proactive criteria have to respect the following constraint: a given configuration that has been running for t + 1 time-slots must be better for the proactive criterion than the same configuration running for t time slots. With this constraint, all possible configurations are ordered by their value for the selected criterion at the beginning of the iteration, and a lower-ranked configuration in this order cannot be chosen to replace the current configuration. As the number of possible configurations is finite, no proactive heuristic can diverge. The four criteria used to define passive heuristics in the previous section meet this constraint. However, AY (Apparent Yield) leads to many (unnecessary) configuration changes before converging, while the other criteria should be stable. Hence, for the proactive criterion C, we only retain P (Probability of success), E (Expected completion time) and Y (Expected yield). Any passive heuristic H can be used as the building block for a proactive heuristic. We thus obtain 3 4 proactive heuristics named C-H where C {P, E, Y} and H {IP, IE, IY, IAY}. VI. CONCLUSION Unlike previous work that has considered looselycoupled master-worker applications, in this study a single processor failure can have a dramatic effect on application execution. The requirement that the application can progress only when all enrolled processors are simultaneously available, dramatically complicates all scheduling decisions. By assuming a Markov model of processor availability, we have proposed polynomial time approximation schemes to compute the expected completion time of a computation, and its probability of success. We have then proposed many heuristics that are easily defined as combinations of two among four sensible metrics: probability of success, expected completion time, expected yield and expected apparent yield. All these heuristics have been extensively evaluated in simulations, see [2]. The main conclusion is that a proactive heuristic that selects processors to maximize expected execution time and changes configuration based on yield or probability of success is very promising. Acknowledgments. This work was supported in part by the ANR RESCUE project. REFERENCES [1] B. Hong and V. K. Prasanna, Adaptive allocation of independent tasks to maximize throughput, IEEE TPDS, vol. 18, no. 10, pp. 1420 1435, 2007. [2] H. Casanova, F. Dufossé, Y. Robert, and F. Vivien, Mapping Tightly-Coupled Applications on Volatile Resources, ENS Lyon, Tech. Rep., May 2012. [Online]. Available: http: //hal-ens-lyon.archives-ouvertes.fr/ensl-00697621 [3] T. Estrada, D. Flores, M. Taufer, P. Teller, A. Kerstens, and D. Anderson, The Effectiveness of Threshold-Based Scheduling Policies in BOINC Projects, in Proc. of e-science 06, 2006. [4] C. Anglano, J. Brevik, M. Canonico, D. Nurmi, and R. Wolski, Fault-aware scheduling for Bag-of-Tasks applications on Desktop Grids, in Proc. of Grid Computing, 2006, pp. 56 63. [5] J. Wingstrom and H. Casanova, Probabilistic Allocation of Tasks on Desktop Grids, in Proc. of PCGrid, 2008. [6] D. Kondo, A. Chien, and H. Casanova, Resource Management for Rapid Application Turnaround on Enterprise Desktop Grids, in Proc. of SC, 2004. [7] E. Byun, S. Choi, M. Baik, J. Gil, C. Park, and C. Hwang, MJSA: Markov job scheduler based on availability in desktop grid computing environment, FGCS, vol. 23, pp. 616 622, 2007. [8] E. Heien, D. Anderson, and K. Hagihara, Computing Low Latency Batches with Unreliable Workers in Volunteer Computing Environments, Journal of Grid Computing, vol. 7, no. 4, pp. 501 518, 2009.