Optimal quorum for a reliable Desktop grid

Ilya Chernov and Natalia Nikitina
Inst. Applied Math Research, Pushkinskaya 11, Petrozavodsk, Russia

Abstract. We consider a reliable desktop grid solving multiple tasks with answers from a given set. A wrong answer can be obtained with some small probability $p$; reliability means that $p$ is small compared to $1 - p$. A penalty is added to the computation cost if a wrong answer is accepted. We consider the optimization problem of choosing the quorum for a finite set of possible answers, a countable set, and a set with a continuously distributed subset.

Keywords: Desktop grid, optimal quorum, replication, reliable computation

1 Introduction

In the present work we consider the problem of task scheduling in a Desktop Grid. The term stands for a distributed computing system whose computational nodes are voluntarily donated by an organization (or a group of organizations pursuing the same research goals) during their idle time. The nodes of such a system are desktop PCs, compute servers, cluster nodes, and other computational resources available in a local network and/or the Internet. The BOINC middleware [1] can be considered a de facto standard for organizing Desktop Grids. Since the late 1990s, BOINC-based Desktop Grids have proven to be an effective, highly scalable tool for solving computationally intensive problems.

However, the definition given above implies that the computational nodes of a Desktop Grid can be highly heterogeneous in their hardware and software characteristics and can become available or unavailable at unpredictable moments. Moreover, the answer returned by a computing node can be wrong due to malfunction, transmission errors, malicious actions, etc. In order to harness the volatile Desktop Grid resources rationally, special scheduling methods and algorithms should be developed. Much effort has recently been devoted to optimizing the calculation process in a Desktop Grid and improving its reliability.
For example, the works [3, 4, 6] consider heuristics for mapping independent tasks onto identical nodes that may leave the Desktop Grid at random moments; the work [5] presents closed-form conditions under which task replication is reasonable; the authors of [2] and [7] consider heterogeneity of computational nodes, etc.

The contribution of the present work is an investigation of the optimal task replication level and the optimal quorum for a Desktop Grid with high reliability of the answers returned from computational nodes. Solving a computational problem with high reliability may significantly increase the cost of computations, such as the overall time, due to high precision of calculations. We try to evaluate the expected cost and select the optimal strategy of task scheduling.

2 The mathematical model

Let us consider a desktop grid computing system that consists of multiple computers and solves numerous similar tasks. Each task has an answer from some set of possible answers. Without loss of generality we can assume that this set is a subset of the real numbers or of the integers. The correct answer is obtained with some probability $q$. Errors are possible due to hardware malfunction, malicious actions, or wrong answers produced by a correct non-deterministic algorithm. The other, wrong answers have total probability $1 - q$; some of them can have non-zero probability, others are distributed continuously.

Such errors can be made less likely by replication: sending copies of each task to different computing nodes until a chosen number $\nu$ of identical answers is received. This number is called the quorum. Obviously, for a given quorum $\nu$ the number of copies is at least $\nu$ but can be more, up to $N(\nu - 1) + 1$ for a finite number $N$ of possible answers. However, the average number of copies is finite even in the case of an infinite set of answers.

If a wrong answer is accepted, this error will later be revealed and can be costly. The losses can be connected with losing a rare desired phenomenon, unnecessary expensive laboratory checks, reputational losses, etc. These penalties can be huge compared to the average cost of a single task.

Here we assume that $q \approx 1$ in the following sense: $1 - q$ is negligibly small compared to $q$. Therefore we can neglect the cases in which different wrong answers compete with each other, as well as any wrong answers received before the correct one is accepted.

First we consider the finite set of possible answers.
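The replication scheme of this section can be illustrated with a small Monte Carlo simulation. The Python sketch below is not part of the paper: the function names, the encoding of answers (index 0 for the correct answer, indices 1 and up for wrong ones) and the probability values in the usage note are assumptions made only for illustration.

```python
import random
from collections import Counter


def replicate_until_quorum(probs, quorum, rng):
    """Send copies of one task until some answer is received `quorum` times.

    probs[0] is the probability of the correct answer and probs[1:] are the
    probabilities of the wrong answers (hypothetical encoding).
    Returns (accepted_answer_index, number_of_copies).
    """
    answers = list(range(len(probs)))
    counts = Counter()
    copies = 0
    while True:
        answer = rng.choices(answers, weights=probs)[0]
        copies += 1
        counts[answer] += 1
        if counts[answer] == quorum:
            return answer, copies


def simulate(probs, quorum, trials, seed=0):
    """Estimate the mean number of copies and the rate of accepted wrong answers."""
    rng = random.Random(seed)
    total_copies = 0
    wrong_accepted = 0
    max_copies = 0
    for _ in range(trials):
        answer, copies = replicate_until_quorum(probs, quorum, rng)
        total_copies += copies
        max_copies = max(max_copies, copies)
        if answer != 0:
            wrong_accepted += 1
    return total_copies / trials, wrong_accepted / trials, max_copies
```

For instance, `simulate([0.9, 0.05, 0.05], quorum=2, trials=2000)` returns the estimated mean number of copies, the fraction of tasks for which a wrong answer was accepted, and the largest number of copies observed; with $N = 3$ possible answers the latter can never exceed $N(\nu - 1) + 1 = 4$.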
After the finite case, we add some simple notes about the more general case of an infinite set of answers, with positive probabilities and/or continuously distributed.

3 Finite set of answers

Assume that the $M + 1$ possible answers have probabilities $p_m$, with $p_0 = q \approx 1$ (the correct answer), and that the penalties of the answers are $F_m$, with $F_0 = 0$. If the quorum $\nu$ is chosen and the correct answer is accepted, the average number of copies is $\nu$ up to the precision discussed above, and the probability of getting the correct answer is $q$. However, for each wrong answer $m$ the duration can be anything between $\nu$ (the same wrong answer $\nu$ times in a row) and $2\nu - 1$ (the final wrong answer is received after $\nu - 1$ correct and $\nu - 1$ wrong answers). The probabilities of the durations $\nu + i$, $i = 0, \ldots, \nu - 1$, are, respectively, equal to

$$\binom{\nu - 1 + i}{i} p_m^{\nu - 1} q^{i} \, p_m \approx \binom{\nu - 1 + i}{i} p_m^{\nu}.$$

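The binomial factor counts the ways to place the $i$ correct answers among the first $\nu - 1 + i$ tries, while the final try yields the answer that is accepted. The mean cost for the quorum $\nu$ is

$$E(\nu) = \nu + \sum_{m=1}^{M} F_m \sum_{i=0}^{\nu - 1} \binom{\nu - 1 + i}{i} p_m^{\nu}.$$

To simplify this expression, we need the following formula:

$$\sum_{i=0}^{\nu - 1} \binom{\nu - 1 + i}{i} = \binom{2\nu - 1}{\nu}.$$

It follows from

$$\binom{2\nu - 1}{\nu} = \sum_{j=1}^{\nu} \binom{2\nu - 1 - j}{\nu - j}$$

if $j = \nu - i$. To prove this equality, we need two well-known formulae (the Pascal triangle):

$$\binom{n}{m} = \binom{n - 1}{m - 1} + \binom{n - 1}{m}, \qquad \binom{n}{m} = \binom{n}{n - m}.$$

Now consider

$$\binom{2\nu - 1}{\nu} = \binom{2\nu - 2}{\nu - 1} + \binom{2\nu - 2}{\nu} = \binom{2\nu - 2}{\nu - 1} + \binom{2\nu - 2}{\nu - 2} = \binom{2\nu - 2}{\nu - 1} + \binom{2\nu - 3}{\nu - 2} + \binom{2\nu - 3}{\nu - 3}.$$

Continue to obtain

$$\binom{2\nu - 1}{\nu} = \sum_{j=1}^{\nu} \binom{2\nu - 1 - j}{\nu - j}.$$

On the final step we applied the obvious equality $\binom{n}{0} = \binom{n - 1}{0}$. This completes the proof.

Now we can rewrite the mean cost as

$$E(\nu) = \nu + \binom{2\nu - 1}{\nu} \sum_{m=1}^{M} F_m p_m^{\nu}. \qquad (1)$$

We want to minimize this cost, keeping in mind that the penalties $F_m$ are not known precisely, so the conditions should tolerate variations of $F_m$. Let us consider the differences $\Delta E = E(\nu + 1) - E(\nu)$ and take the first $\nu$ with $\Delta E \ge 0$. It is easy to check that

$$\Delta E \ge 1 - \binom{2\nu - 1}{\nu} \sum_{m=1}^{M} F_m p_m^{\nu},$$

and this bound is nearly tight for small $p_m$: the penalty term of $E(\nu + 1)$ is smaller than that of $E(\nu)$ by a factor of at most $4 \max_m p_m$.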
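The simplification of the mean cost and the search for the first quorum with a non-negative cost difference can be checked numerically. The following Python sketch is an illustration under stated assumptions, not the authors' code: $\nu$ denotes the quorum, the function names are hypothetical, and the sample values $p = 2^{-10}$, $F = 2^{20}$ in the usage note are chosen only as an example.

```python
from math import comb, isclose


def diagonal_sum(nu):
    """Left-hand side of the identity: sum over i = 0..nu-1 of C(nu-1+i, i)."""
    return sum(comb(nu - 1 + i, i) for i in range(nu))


def mean_cost(nu, probs, penalties):
    """Closed form: E(nu) = nu + C(2nu-1, nu) * sum_m F_m * p_m**nu."""
    penalty_sum = sum(f * p ** nu for p, f in zip(probs, penalties))
    return nu + comb(2 * nu - 1, nu) * penalty_sum


def mean_cost_double_sum(nu, probs, penalties):
    """Same cost written with the inner sum over the durations nu + i."""
    total = 0.0
    for p, f in zip(probs, penalties):
        total += f * sum(comb(nu - 1 + i, i) * p ** nu for i in range(nu))
    return nu + total


def optimal_quorum(probs, penalties, nu_max=100):
    """Return the first nu with E(nu + 1) - E(nu) >= 0 (nu_max is an arbitrary cap)."""
    for nu in range(1, nu_max):
        delta = mean_cost(nu + 1, probs, penalties) - mean_cost(nu, probs, penalties)
        if delta >= 0:
            return nu
    raise ValueError("no quorum below nu_max has a non-negative cost difference")
```

With a single wrong answer, `optimal_quorum([2.0**-10], [2.0**20])` returns 3: the costs are $E(1) = 1025$, $E(2) = 5$, $E(3) \approx 3.01$, and $E(4) > 4$. The check `diagonal_sum(nu) == comb(2 * nu - 1, nu)` holds for every quorum tried, in agreement with the identity proved above.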

Thus the optimal quorum is the smallest $\nu$ such that

$$\binom{2\nu - 1}{\nu} \sum_{m=1}^{M} F_m p_m^{\nu} \le 1. \qquad (2)$$

It is clear that if inequality (2) holds, then each term does not exceed 1, although it is possible to choose the parameters in such a way that all terms are less than one while the sum is greater. However, if we take the smallest $\nu$ that makes each term less than 1, it differs from the optimal value by at most 1.

Let us simplify condition (2) in the case $M = 1$ using the asymptotic formulae

$$\binom{2\nu}{\nu} \approx \frac{2^{2\nu}}{\sqrt{\pi\nu}}, \qquad \binom{2\nu - 1}{\nu} \approx \frac{2^{2\nu - 1}}{\sqrt{\pi\nu}}.$$

Note that they are rather precise even for low $\nu$: the relative error is about 13% for $\nu = 1$ and smaller for larger $\nu$. Then condition (2) can be replaced by the approximate one:

$$\frac{2^{2\nu - 1}}{\sqrt{\pi\nu}} F p^{\nu} \le 1.$$

If $F = 2^{A}$, $p = 2^{-B}$ ($A > 1$, $B > 2$), then

$$\nu (2 - B) + A - 1 \le 0.5 \log_2 (\pi\nu).$$

The right-hand side is less than 3 for $\nu < 20$ and we can safely neglect it; then

$$\nu \ge \frac{A - 1}{B - 2} = \frac{\log_2 F - 1}{-\log_2 p - 2} = -\frac{\ln(F/2)}{\ln(4p)},$$

and the smallest such integer $\nu$ is to be taken. For example, for $A = 20$, $B = 10$ we have $\nu = 3$. The same quorum is obtained by solving the general inequality (2).

4 Infinite set of wrong answers

Assume that besides the correct answer of some probability $q \approx 1$ and $M$ wrong ones with probabilities $p_m > 0$, the set of possible answers contains a subset $S$ of continuously distributed answers of total probability $r$. Obviously these answers are rejected by any replication, because each of them has zero probability and therefore receiving the same answer again is an event of probability zero. However, wrong answers with positive probabilities can be returned several times and be accepted. In this case the cost is increased by the penalty $F_m$. The results here are exactly the same as in the previous section; only the average cost in the case $\nu = 1$ is different, because the expected penalty for an answer from $S$ must be added to it.

Now let us consider the case of a countable ($M = \infty$) set of answers with positive probabilities. This assumption may look natural, for example when answers are integers. However, this case does not allow equal (or at least comparable) probabilities $p_m$: the series $\sum_m p_m$ must converge to some $p$ which is small compared to $1 - p$. Provided that the $p_m$ are given, the average cost

$$E(\nu) = \nu + \binom{2\nu - 1}{\nu} \sum_{m=1}^{\infty} F_m p_m^{\nu}$$

can still diverge for some or all $\nu$, depending on the behaviour of the penalties $F_m$. It is clear that at most polynomial growth of $F_m$ with respect to $m$ is sufficient for convergence. To prove that, note that a binomial coefficient grows at most exponentially:

$$\binom{n}{m} \le 2^{n}, \qquad 0 \le m \le n,$$

and the sequence $p_m$ decreases more quickly than $m^{-1}$ because it forms a convergent series; therefore a sufficiently large $\nu = \nu_0$ makes the terms of the series decrease quickly enough, so that the series converges. Each further increase of $\nu$ reduces the sum by a factor of at least $(\max_m p_m)^{-1}$, so that the second term in the expression for $E$ decreases. So, for high enough values of $\nu$ the expected cost $E$ grows almost linearly. Then it has a unique minimal value, which is the solution to the optimization problem.

We have proven that the optimization problem we are studying always has a solution. It is solved in the same way as the one for the finite set of answers, only the cost function may be equal to infinity for some $\nu$. Note that the solution can equal 1 in the case of low (or no) penalties. This means that no replication is necessary.

It is clear that a countable set of possible answers and a continuously distributed set of zero-probability answers can also be combined.

Acknowledgements

The work was financially supported by grants 13-07-00008 and 15-29-07974 of the Russian Foundation for Basic Research.

References

1. D. Anderson. BOINC: A system for public-resource computing and storage. In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, pages 4-10, Washington, DC, USA, 2004.
2. C. Anglano, J. Brevik, M. Canonico, D. Nurmi, and R. Wolski. Fault-aware scheduling for bag-of-tasks applications on desktop grids.
In Proceedings of the 7th IEEE/ACM International Conference on Grid Computing, pages 56-63, 2006.

3. L.-C. Canon, A. Essafi, G. Mounie, and D. Trystram. A bi-objective scheduling algorithm for desktop grids with uncertain resource availabilities. In Proceedings of the 17th International Conference on Parallel Processing (Euro-Par'11), volume II, pages 238-249, Berlin, Heidelberg, 2011. Springer-Verlag.
4. H.-C. Hwang, K. Lee, and S. Y. Chang. The effect of machine availability on the worst-case performance of LPT. Discrete Applied Mathematics, 148(1):49-61, 2005.
5. O. Ibarra and C. Kim. Heuristic algorithms for scheduling independent tasks on nonidentical processors. Journal of ACM, 24(2):280-289, 1977.
6. C.-Y. Lee. Parallel machines scheduling with nonsimultaneous machine available time. Discrete Applied Mathematics, 30(1):53-61, 1991.
7. A. S. Rumiantsev. Optimizing the execution time of a desktop grid project. Program Systems: Theory and Applications online journal, 5(1(19)):175-182, 2014. (In Russian).