Study on the Lightweight checkpoint based rollback recovery mechanism


2009 International Conference on Computer Engineering and Applications
IPCSIT vol. [...], IACSIT Press, Singapore

Zhang [...]i 1,3, [...]ang Rui 2, Dai Hao 3, Ma Ming[...]ai 3 and [...]i Xianghong 4
1 Institute of Command Automation, PLA University of Science and Technology
2 Train base of General Staff, [...]
3 Institute of China Electronic Equipment System Engineering Company
4 Bureau of Xinhua News Agency Communication Technology

Abstract. The cluster service recovery mechanism is very important in the study of survivable server clusters. After analyzing the related work, this paper points out the deficiencies of the existing recovery mechanisms and proposes a lightweight checkpoint based rollback recovery mechanism that aims to enhance recovery performance. The experimental results validate that its performance exceeds that of the existing methods.

Keywords: Rollback recovery, Cluster, Service survivability.

1. Introduction

With the development of network technology and the increase in users' requirements, the continuity of the network service provided by the cluster must be enhanced. Some applications, such as electronic business, require a very high quality of service, so a survivable server cluster system must have not only the ability of resistance but also the ability of recovery when necessary. The key application data should be integrated and the recovery mechanism should be transparent to the users, so that the application program is affected very little.

Checkpoint based rollback recovery is a common fault-tolerant technology in software/hardware systems. It means [1] that the system or the process starts to run anew from the right point of the status that was saved before the fault time. In server cluster systems, the checkpoint information is necessary to recover the faulty service. The reasons that cause a service interruption may differ between the study of fault tolerance and the study of survivability, but the service survivability study does not focus on the causes, so checkpoint based rollback recovery can also be used in the service survivability study. Although rollback recovery provides reliability, it increases the complexity of the system. A checkpoint [2] is a copy of the application program status that is saved in stable storage; we assume that the stable storage never fails.
The checkpoint information can also be sent to the storage of another cluster node in order to recover the service on that node; this method is commonly used in cluster systems and decreases the recovery time.

Corresponding author. Tel.: 8666885; fax: 8666858. E-mail address: zhangli_ism@63.com.

2. Checkpoint Mechanism

2.1. Checkpoint Information

Setting checkpoints is flexible: some applications can recover from very little information, while other applications can recover from special checkpoint information such as a special time, and the information may be very small. To decrease the overhead of the program, we can decrease the number of checkpoints or the size of the checkpoint files.

In ordinary application programs, checkpoints need to save the following contents [3]:
(1) process data field, content in the user stack;
(2) context related items, including program counter, process or status register, status point, etc.;
(3) active file information, including file descriptor, access mode, file size, read/write pointer, etc.;
(4) related signal information, including shield code, stack pointer, process function handler, and suspended signal flag;
(5) user file content, register content.

These contents of the process status are sufficient for a single-point server, but in server cluster systems the checkpoint information also needs to include:
(6) cluster view of the server node;
(7) backup and interplay information among cluster nodes;
(8) the time and content of received/sent messages;
(9) special information related to the application.

We improve the traditional checkpoint methods: we record as little information as possible when setting a checkpoint, while making sure this information is still usable when recovering the process. For example, for a send and receive message pair we only record one of them; when the program runs correctly, the other message is also correct. We save the checkpoint in data fields, as in Table 1.

Table 1: Lightweight checkpoint information data field
Exec_stat | Pro_ctl | Mem_add | Msg_stat | File_stat | Cluster_view | Reserved

There are seven fields in the lightweight checkpoint; their contents are as follows:
Exec_stat: process running status, including context transition information, register and stack pointer information;
Pro_ctl: process control status, process id, process relationship;
Mem_add: memory and address status, including the content of the stack and the data fields related to the process;
Msg_stat: record of sent messages and of messages in the cache;
File_stat: the file state, including the list of files the process has opened, handler and cache information;
Cluster_view: the status of the cluster, including the node role, backup and interplay information among cluster nodes;
Reserved: fields to save information related to special applications.

These are the whole contents of the lightweight checkpoint. The first three items are sometimes enough to recover a process, so this can greatly reduce the burden that checkpointing brings.
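The seven fields of Table 1 can be pictured as a single record. The following is a minimal Python sketch, not the authors' implementation: the class name `LightweightCheckpoint`, the lowercase field names, the JSON encoding and the `core_only` switch are all assumptions made for illustration. The `core_only` switch keeps only the first three fields, following the remark that these are sometimes enough to recover a process.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical names: the first three fields of Table 1.
CORE_FIELDS = ("exec_stat", "pro_ctl", "mem_add")


@dataclass
class LightweightCheckpoint:
    """Illustrative record mirroring the seven fields of Table 1."""
    exec_stat: dict = field(default_factory=dict)     # context, registers, stack pointer
    pro_ctl: dict = field(default_factory=dict)       # process id and relationships
    mem_add: dict = field(default_factory=dict)       # stack content and data fields
    msg_stat: list = field(default_factory=list)      # sent and cached messages
    file_stat: dict = field(default_factory=dict)     # open files, handlers, cache info
    cluster_view: dict = field(default_factory=dict)  # node role, backup information
    reserved: dict = field(default_factory=dict)      # application-specific data


def save(ckpt: LightweightCheckpoint, core_only: bool = False) -> str:
    """Serialize a checkpoint; with core_only, keep the first three fields only."""
    data = asdict(ckpt)
    if core_only:
        data = {name: data[name] for name in CORE_FIELDS}
    return json.dumps(data)


def restore(text: str) -> LightweightCheckpoint:
    """Rebuild a checkpoint; fields absent from the text fall back to empty values."""
    return LightweightCheckpoint(**json.loads(text))
```

A caller would pass `core_only=True` when a smaller checkpoint file matters more than restoring message, file and cluster state.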
2.2. Set checkpoint

In the early days, the running process had to be suspended while a checkpoint was being set [4], and it continued to run only after the checkpoint was finished. This method guarantees that the checkpoint information is coherent across the whole cluster, but it disturbs normal process execution and its overhead is large, so it is not fit for applications that have many requests and large amounts of data. In order to prevent the checkpoint mechanism from influencing normal applications, some researchers create a new thread dedicated to setting checkpoints. This thread can communicate with the main process, record the status of every thread at the checkpoint time, save the messages, and make the checkpoint file as small as possible. Meanwhile, the other threads can continue running the usual processes. Based on BLCR [5] (Berkeley Lab's Linux Checkpoint/Restart), we improve the checkpoint setting policy, add the lightweight mechanism, and reduce the overhead of the checkpoint. In order to guarantee the reliability of the recovery method, the storage must be absolutely reliable.

There are two phases in setting the lightweight checkpoint. In the first phase, the application initializes with a thread callback function, and this function generates a callback thread. When a checkpoint is to be set, this thread stops blocking while the other threads in the process continue to run their own programs; the file handler is used to save the checkpoint information. When the last thread calls back into the cr_checkpoint status, the callback function returns to the kernel status and the process enters the second phase.

In the second phase, system calls send the checkpoint signal to the other threads. After all the threads have entered the kernel status, the checkpoint information begins to be recorded. After the threads have written their own status into the checkpoint file, the system returns to user space and continues executing the other program code.

2.3. Rollback recovery

To recover the application process from the checkpoint information, we mainly rely on the ioctl call to read the checkpoint file data. Rollback recovery means re-executing the process from the last checkpoint time when the system fails. Under Linux, the function do_fork is first called to generate the process, then clone is used to generate all the threads that the process needs, and the callback process is started to run these threads. The first thread reads the checkpoint file and recovers the shared items and its own id, register and signal status; the other threads then recover their saved status in turn; finally, the relationship among these threads is recovered. The process then returns from the kernel status, and after all the processes have recovered, the other application code begins to execute.

3. Optimal Checkpoint Intervals

When there are few failures and the running environment is good, setting too many checkpoints will hurt normal application performance. On the other hand, if the load is high and the probability of failure is high, long checkpoint intervals may bring a large recovery overhead, because the process must re-execute from a point far before the failure time. We decide the optimal checkpoint intervals in this section.

3.1. Time spending of setting checkpoint

Assume that the server node failures in the cluster follow a Poisson distribution with failure probability [...]. The time spent on a single checkpoint under no-failure conditions is [...]; the checkpoint duration is [...]. The time spent on recovery is [...]. Notice that [...] and [...] are constant for a specific application. In the period of a checkpoint the program executing time increases; the time spending is [...] - [...]. Assume that the application executing time is [...]; under the checkpoint mechanism, the average real executing time of the program is [...]. The task is divided by several checkpoints, whose number is denoted by K; the average time spending of a checkpoint is [...], so [...] = K * [...]. The service duration C can be calculated as C = [...] / [...].

3.2. Status transformation

There are only two kinds of status in the cluster nodes. One is the normal status, in which the process is running, including setting checkpoints; the other is the failure status, in which the process rolls back to recover.
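The two-status view (normal, failure) lends itself to a small simulation. The sketch below is an illustration under stated assumptions rather than the model of this paper: failures are assumed to arrive as a Poisson process with rate `lam`, a failure is assumed to discard only the work done since the last completed checkpoint, recovery is assumed to take a fixed time and to be failure-free, and the names `Status`, `run_task`, `ckpt_cost` and `recovery_cost` are hypothetical.

```python
import random
from enum import Enum


class Status(Enum):
    NORMAL = 0   # running, including setting the checkpoint
    FAILURE = 1  # rolling back to the last checkpoint


def run_task(work, k, ckpt_cost, recovery_cost, lam, rng):
    """Simulated wall-clock time to finish `work` split into k checkpointed segments."""
    segment = work / k + ckpt_cost  # useful work plus one checkpoint
    elapsed = 0.0
    done = 0
    status = Status.NORMAL
    while done < k:
        if status is Status.NORMAL:
            # Exponential inter-failure times are memoryless, so a fresh draw
            # at the start of every attempt is a valid way to sample them.
            time_to_failure = rng.expovariate(lam)
            if time_to_failure >= segment:
                elapsed += segment          # segment and checkpoint completed
                done += 1
            else:
                elapsed += time_to_failure  # work since the last checkpoint is lost
                status = Status.FAILURE
        else:
            elapsed += recovery_cost        # roll back, then resume normal running
            status = Status.NORMAL
    return elapsed
```

Averaging `run_task` over many runs for different values of `k` gives an empirical picture of the trade-off between checkpoint overhead and re-execution time that the next subsection treats analytically.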
Figure 1 shows the status transformation.

Figure 1. Status transformation

In Figure 1, status [...] is the normal status and status [...] is the failure status; the letters beside the arrows are the transformation probabilities. The two-status transformation model reduces the complexity of the system status and is easy to program.

3.3. Optimal checkpoint intervals

The expected checkpoint interval can be solved using the two-state discrete Markov chain presented in Figure 1. For the Poisson process with transition rate $\lambda$, the rate of events in time $t$ is $\lambda t$; if $x$ denotes the number of events, then

$$p(x) = \frac{(\lambda t)^{x} e^{-\lambda t}}{x!}, \qquad x = 0, 1, 2, \ldots \qquad (1)$$

Each transition (x, y) from state X to state Y in the Markov chain has an associated transition probability [...]xy and a cost [...]xy; these variables can be calculated one by one as follows.

[Equations (2)-(4)]

According to the transition probability and the conditional probability, we get the time cost as follows:

[Equations (5)-(9)]

From state [...] to state [...] the process finishes the rollback recovery, and the average time spending is [...]; note that [...] is the average recovery time, so we get formulas (5)-(9). Now we have the average time spending of all the state transitions. Let Ψ denote the average application process running time; according to the time spending and their probabilities, we get Ψ as follows:

[Equation (10)]

We introduce the checkpoint time efficiency:

[Equation (11)]

Note that [...] is the average checkpoint interval; the proper [...] should then satisfy the following formulas (12) and (13):

[Equations (12)-(13)]

We use the ezplot function in Matlab to plot the relationship between [...] and y, where y is [...], and we get the following result.
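How a proper interval can be located numerically is sketched below in Python. This does not reproduce formulas (2)-(13); it assumes the textbook model in which failures arrive as a Poisson process with rate `lam`, a failure during an interval forces the whole interval (useful work `T` plus checkpoint overhead `C`) to be redone after a failure-free recovery of length `R`, and the expected time per interval is then (1/lam + R)(e^{lam(T+C)} - 1). The symbols `T`, `C`, `R`, `lam` and the function names are assumptions chosen for this illustration.

```python
import math


def expected_interval_time(T, C, R, lam):
    """Expected time to complete T units of work plus one checkpoint of cost C."""
    return (1.0 / lam + R) * math.expm1(lam * (T + C))


def overhead_ratio(T, C, R, lam):
    """Relative extra time per unit of useful work for interval length T."""
    return expected_interval_time(T, C, R, lam) / T - 1.0


def optimal_interval(C, lam, tol=1e-9):
    """Interval T that minimizes overhead_ratio, found by bisection.

    Setting the derivative of expm1(lam * (T + C)) / T to zero gives
    g(T) = exp(lam * (T + C)) * (1 - lam * T) - 1 = 0. For C > 0, g is
    positive near T = 0, equals -1 at T = 1 / lam, and is strictly
    decreasing in between, so it has exactly one root there.
    """
    def g(T):
        return math.exp(lam * (T + C)) * (1.0 - lam * T) - 1.0

    lo, hi = 0.0, 1.0 / lam
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under these assumptions the optimum does not depend on `R`, grows with the checkpoint overhead `C`, and shrinks as the failure rate `lam` grows; for small `lam` it is close to the square root of 2C/lam.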

Figure 2. The optimal checkpoint intervals (panels a and b)

Figure 2a shows the derivative function of y for [...] equal to 3, 6 and 8 respectively; the proper [...] is the point where y is zero. We conclude that as [...] increases, the proper [...] increases. From Figure 2b, we conclude that as [...] increases, the proper [...] decreases, so the proper [...] should be decided according to the specific conditions.

4. Performance study

In order to evaluate the performance of the lightweight checkpoint mechanism, we use the Charm language to build the parallel system and implement the checkpoint based rollback recovery method. We choose net-Linux, and the compiler is GNU. This stage can call the initial program of the operating system; the method of this paper is embedded in the application program to evaluate its performance.

Figure 3. Recovery time performance comparison (panels a and b)

Figure 3a compares the recovery method of this paper with the traditional log based method: under several system failures, the average running time of the lightweight checkpoint based rollback recovery is smaller than that of the log based method. When [...] is constant, the recovery method in this paper is more efficient than the pessimistic log based recovery method: not only is the average running time smaller, but CPU resources are also used more economically.

5. Conclusion

We improved the traditional checkpoint based recovery method and simplified the checkpoint file, which makes the checkpoint setting process and the recovery easy and efficient to perform. The recovery process can choose which lightweight checkpoint information fields to use, which can reduce the recovery time considerably. The experimental results show that this method consumes fewer resources and is more efficient.

6. References

[1] E. N. Elnozahy, L. Alvisi, Y. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. 34(3): 375-408, 2002.
[2] N. H. Vaidya. Impact of checkpoint latency on overhead ratio of a checkpointing scheme. IEEE Transactions on Computers, Volume 46, Issue 8, Aug. 1997: 942-947.
[3] Logging Service. http://docs.sun.com/source/86-6687- /logging.html.
[4] T. Tannenbaum, M. Litzkow. The Condor Distributed Processing System [J]. Dr. Dobb's Journal, 1995: 40-48.
[5] J. Duell, P. Hargrove, E. Roman. The Design and Implementation of Berkeley Lab's Linux Checkpoint/Restart.