A Tighter Analysis of Work Stealing


Marc Tchiboukdjian, Nicolas Gast, Denis Trystram, Jean-Louis Roch, Julien Bernard
Laboratoire d'Informatique de Grenoble / INRIA

Parallel programming with task parallel libraries

    Fib(n) {
        if (n <= 1) return n;
        else {
            x = spawn Fib(n-1);
            y = Fib(n-2);
            sync;
            return x + y;
        }
    }

An online scheduler maps the tasks onto the m processors sharing memory (cores C1, C2, C3, C4 in the figure). For the DAG shown: work W = 17, depth D = 9.

The new standard for parallel programming? Cilk, Intel TBB, Microsoft TPL, KAAPI, ...
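As an aside (not on the slide), the work and depth of such a fork-join program can be computed recursively. The sketch below is ours and counts each Fib call as one unit task, which is a simplifying assumption; the W = 17 and D = 9 above refer to the specific DAG drawn on the slide.

    def work_depth(n):
        """Work (total tasks) and depth (critical path) of the Fib(n) task DAG,
        counting each call as one unit task."""
        if n <= 1:
            return 1, 1
        w1, d1 = work_depth(n - 1)   # spawned, runs in parallel
        w2, d2 = work_depth(n - 2)   # executed by the parent
        # the two branches run in parallel; the sync/addition adds one level
        return w1 + w2 + 1, max(d1, d2) + 1

    print(work_depth(10))   # (177, 10)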

Efficiently scheduling task parallel programs

List scheduling
- Greedy scheduler: while tasks are available, no processor is idle
- Makespan bound: $C_{\max} \le \frac{W}{m} + \left(1 - \frac{1}{m}\right) D$
- Problem: contention on the shared list

Work stealing
- Each processor has its own list
- When its list is empty, a processor tries to steal tasks from the list of a victim chosen uniformly at random
- Contention is reduced: it only occurs when several thieves target the same victim
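As an illustration (not on the slides), here is a minimal Python simulation of this stealing rule for the unit independent tasks studied later in the talk: all W tasks start on processor 0; at each step an idle processor picks a victim uniformly at random and, if it is the only successful thief on that victim, takes half of its tasks; active processors execute one task per step. Function and variable names are ours.

    import random

    def simulate(m, W):
        """Work stealing on W unit independent tasks and m processors.
        Returns (makespan, number of steal attempts)."""
        load = [W] + [0] * (m - 1)            # w_i(0): all work on processor 0
        steps, steal_attempts = 0, 0
        while sum(load) > 0:
            new_load = load[:]
            robbed = set()
            for i in range(m):
                if load[i] == 0:              # idle processor: try to steal
                    victim = random.choice([j for j in range(m) if j != i])
                    steal_attempts += 1
                    if victim not in robbed and load[victim] > 1:
                        robbed.add(victim)    # only one thief succeeds per victim
                        half = load[victim] // 2
                        new_load[i] += half
                        new_load[victim] -= half
            for i in range(m):
                if load[i] > 0:               # active processor: run one unit task
                    new_load[i] -= 1
            load, steps = new_load, steps + 1
        return steps, steal_attempts

    print(simulate(25, 2000))   # makespan slightly above W/m = 80; second value = steal attempts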

Previous work on work stealing
- Probabilistic work generation, focus on steady-state results [Mitzenmacher 98, Berenbrink et al. 03]
- Study of the makespan on identical processors [Blumofe Leiserson 99, Arora Blumofe Plaxton 01]
- Extended to processors with varying speeds [Bender Rabin 02]

Work Stealing Scheduler of Arora, Blumofe and Plaxton
- Unit tasks, one source, out-degree at most 2
- Each processor owns a work queue (deque): the worker pushes and pops at the bottom, a thief steals from the top
- Execute depth-first and steal breadth-first
- Analysis based on the critical path:
  $E[C_{\max}] \le \frac{W}{m} + 32\,D$
  $P\left\{ C_{\max} \ge \frac{W}{m} + 64\,D + 16 \log_2 \frac{1}{\epsilon} \right\} \le \epsilon$
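A sketch (ours, not from the talk) of the deque discipline just described: the owner works depth-first at the bottom of its deque while thieves steal breadth-first from the top. Class and method names are hypothetical, and real runtimes synchronize these operations; the sketch ignores concurrency.

    from collections import deque

    class WorkQueue:
        """Owner pushes/pops at the bottom (LIFO, depth-first);
        thieves steal from the top (FIFO, breadth-first)."""
        def __init__(self):
            self.tasks = deque()

        def push(self, task):      # owner: spawn a new task
            self.tasks.append(task)

        def pop(self):             # owner: take the most recently spawned task
            return self.tasks.pop() if self.tasks else None

        def steal(self):           # thief: take the task closest to the root
            return self.tasks.popleft() if self.tasks else None

    q = WorkQueue()
    q.push("Fib(4)")
    q.push("Fib(3)")
    print(q.pop(), q.steal())      # owner gets Fib(3), thief gets Fib(4)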

Why a new analysis of work stealing?

Analysis of Arora, Blumofe and Plaxton
- DAG with only one source and out-degree at most 2 (does not cover independent tasks)
- Fixed steal policy (steal the task at the top of the deque)
- Big constant factor

New analysis
- Applies to several application models: independent tasks, ABP DAG, unrestricted DAG
- Can model different steal policies: standard steal, cooperative steal
- More accurate

Remainder of the talk
1. Proof methodology
2. Example of unit independent tasks
3. Conclusions

Proof based on load balancing
- Each processor owns some amount of work w_i(t)
- When processor j steals from processor i, part of the work of i is transferred to j (e.g. one half):
  $\max\{w_j(t+1),\, w_i(t+1)\} \le \rho\, w_i(t)$ with $\rho < 1$

Potential Function Φ: motivation
- Gantt chart with 25 processors and 2000 unit tasks (white: execution, grey: steal): it is difficult to see any structure due to the random choices
- Idea: a potential function that decreases at each successful steal
- Bounding the number of steal attempts S bounds C_max:
  $m\, C_{\max} = W + S$
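To make the accounting explicit (an added step, not on the slide): at every time unit, each of the m processors either executes one unit task or makes one steal attempt, so summing over the $C_{\max}$ time units of the schedule gives
\[ m\, C_{\max} = W + S \quad\Longrightarrow\quad C_{\max} = \frac{W}{m} + \frac{S}{m}. \]
For instance, with the Gantt chart's m = 25 and W = 2000, every 25 steal attempts cost exactly one extra time unit over the ideal W/m = 80.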

Potential Function Φ: definition

$\Phi(t) = \sum_{1 \le i \le m} \Big( w_i(t) - \frac{w(t)}{m} \Big)^2$

Φ represents how well the load is balanced between the lists.

Potential Function Φ: properties

$\Phi(t) = \sum_{1 \le i \le m} \Big( w_i(t) - \frac{w(t)}{m} \Big)^2$

1. $\Phi = 0$ ⟹ no more steals
2. Every processor executing the same amount of work c leaves Φ unchanged: $\forall i,\ w_i \to w_i - c$ ⟹ $\Delta\Phi = 0$
3. An idle processor i stealing half of the work of an active processor j gives $\Delta\Phi = \frac{w_j^2}{2}$
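A small numeric check of properties 2 and 3 (our own sketch; it assumes an even load w_j and ignores the task executed during the step):

    def phi(loads):
        """Potential: sum of squared deviations from the mean load."""
        mean = sum(loads) / len(loads)
        return sum((w - mean) ** 2 for w in loads)

    # property 3: an idle processor steals half of the work of processor 1
    w = [0, 64, 0, 0]
    before = phi(w)
    w[0], w[1] = w[1] // 2, w[1] - w[1] // 2
    print(before - phi(w), 64 ** 2 / 2)         # both equal 2048.0

    # property 2: executing the same amount c on every processor leaves Φ unchanged
    v = [10, 7, 3, 12]
    print(phi(v) == phi([x - 2 for x in v]))    # True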

Proof Methodology
1. Compute the expected decrease of the potential in one step, when α_t processors are active and m − α_t are stealing:
   $E[\Phi_t - \Phi_{t+1} \mid \Phi_t] \ge h(\alpha_t)\, \Phi_t$
2. Solve this inequality to bound the number of steal attempts S:
   $E[S] \le \lambda\, m \log_2 \Phi_0$
   $P\left\{ S \ge \lambda\, m \left( \log_2 \Phi_0 + \log_2 \frac{1}{\epsilon} \right) \right\} \le \epsilon$
3. Deduce a bound on the execution time:
   $E[C_{\max}] \le \frac{W}{m} + \lambda \log_2 \Phi_0$ with $\lambda = \max_{1 \le \alpha \le m} \frac{m - \alpha}{-m \log_2(1 - h(\alpha))}$

Example: unit independent tasks

Reminder:
- w_i(t): number of tasks on processor i at time t
- w(t): total number of tasks at time t
- a thief steals half of the tasks of its victim
- if several thieves target the same victim, only one succeeds

First step: expected decrease of the potential.

$\Phi_t = \sum_{1 \le i \le m} \Big( w_i(t) - \frac{w(t)}{m} \Big)^2 = \sum_{1 \le i \le m} w_i^2(t) - \frac{w^2(t)}{m}$

$\Delta\Phi_t = \Phi_t - \Phi_{t+1} = \sum_{i \text{ active}} \delta_i(t) - \frac{1}{m} \big( w^2(t) - w^2(t+1) \big)$

where $\delta_i(t)$ is the decrease of the squared loads attributed to active processor i (including the part received by a possible thief, computed on the next slides), and since the α_t active processors each execute one unit task:

$w^2(t) - w^2(t+1) = w^2(t) - (w(t) - \alpha_t)^2 = 2 \alpha_t w(t) - \alpha_t^2$

Expected decrease of Φ in one step

If processor i is not stolen, one unit of work is executed:
$\delta_i(t) = w_i^2(t) - w_i^2(t+1) = w_i^2(t) - (w_i(t) - 1)^2 = 2 w_i(t) - 1$

If processor j steals half of the work of processor i:
$\delta_i(t) = w_i^2(t) - w_i^2(t+1) - w_j^2(t+1) = w_i^2(t) - \Big( \frac{w_i(t)}{2} - 1 \Big)^2 - \Big( \frac{w_i(t)}{2} \Big)^2 = \frac{w_i^2(t)}{2} + w_i(t) - 1$

Expected decrease of Φ in one step (continued)

Expected decrease on active processor i:
$E[\delta_i(t)] = P\{\text{processor } i \text{ is not stolen}\}\, \big( 2 w_i(t) - 1 \big) + P\{\text{processor } i \text{ is stolen}\}\, \Big( \frac{w_i^2(t)}{2} + w_i(t) - 1 \Big)$

As there are m − α_t idle processors attempting to steal:
$P\{\text{processor } i \text{ is stolen}\} = p(\alpha_t) = 1 - \Big( 1 - \frac{1}{m-1} \Big)^{m - \alpha_t}$

Summing δ_i over all active processors, we get:
$E[\Delta\Phi_t \mid \Phi_t] \ge \frac{p(\alpha_t)}{2}\, \Phi_t$
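A quick Monte Carlo check (ours) of this expression: with m − α thieves each choosing a victim uniformly at random among the other m − 1 processors, the probability that a given active processor is targeted at least once should match p(α).

    import random

    def p_theory(m, alpha):
        # probability that a fixed active processor is chosen by at least one thief
        return 1 - (1 - 1 / (m - 1)) ** (m - alpha)

    def p_empirical(m, alpha, trials=200_000):
        hits = 0
        for _ in range(trials):
            # each of the m - alpha thieves picks the watched processor with prob 1/(m-1)
            if any(random.randrange(m - 1) == 0 for _ in range(m - alpha)):
                hits += 1
        return hits / trials

    m, alpha = 25, 10
    print(p_theory(m, alpha), p_empirical(m, alpha))   # both close to 0.47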

Unit independent tasks: result

Expected decrease of the potential in one step:
$E[\Delta\Phi_t \mid \Phi_t] \ge \frac{p(\alpha_t)}{2}\, \Phi_t$

Solving this inequality bounds the number of steal attempts:
$E[S] \le \lambda\, m \log_2 \Phi_0 + m$ with $\lambda = \frac{1}{1 - \log_2(1 + \frac{1}{e})}$

Bound on the makespan:
$E[C_{\max}] \le \frac{W}{m} + \lambda \log_2 \Phi_0 + 1 \le \frac{W}{m} + 3.65 \log_2 W + 1$

Results from simulation: about 2.37 log₂ W (the gap comes from the adversary choosing α_t in the analysis).

Cooperative Stealing
- Standard steal: if several thieves target the same victim, only one succeeds
- Cooperative steal: all thieves targeting the same victim succeed in stealing some work

If k processors steal from processor i:
$\delta_i(t) = w_i^2(t) - \Big( \frac{w_i(t)}{k+1} - 1 \Big)^2 - k \Big( \frac{w_i(t)}{k+1} \Big)^2 \ge \Big( 1 - \frac{1}{k+1} \Big) w_i^2(t)$

The same analysis leads to:
$E[C^{\text{coop}}_{\max}] \le \frac{W}{m} + \frac{2}{-\log_2(1 - \frac{1}{e})} \log_2 W + 1 \le \frac{W}{m} + 3.02 \log_2 W + 1$

About 20% fewer steals.
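A small numeric illustration (ours) of why cooperative stealing helps when k thieves hit the same victim: under the standard steal only one thief succeeds, so the victim's contribution to Φ drops by roughly w²/2, whereas under the cooperative steal it drops by a factor 1 − 1/(k+1).

    def delta_standard(w):
        # one successful thief takes half, and the victim executes one task
        return w**2 / 2 + w - 1

    def delta_cooperative(w, k):
        # the work is split evenly among the victim and its k thieves,
        # and the victim executes one task
        return w**2 - (w / (k + 1) - 1)**2 - k * (w / (k + 1))**2

    w = 100
    for k in (1, 2, 4, 8):
        print(k, delta_standard(w), round(delta_cooperative(w, k)))
    # for k = 1 the two rules coincide; for larger k the cooperative decrease
    # approaches w**2 while the standard one stays near w**2 / 2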

Conclusion

Work stealing analysis:
- Introduced a new technique based on a potential function
- Accurate
- Can model different steal policies

Not in the paper:
- Improved constant factor for the ABP DAG:
  Arora, Blumofe, Plaxton: $\frac{W}{m} + 32\,D$
  Our analysis: $\frac{W}{m} + 5.5\,D + 1$
- Our analysis also applies to weighted independent tasks and unrestricted DAGs