Dependency Graph Approach for Multiprocessor Real-Time Synchronization. TU Dortmund, Germany

Dependency Graph Approach for Multiprocessor Real-Time Synchronization
Jian-Jia Chen, Georg von der Brüggen, Junjie Shi, and Niklas Ueter
TU Dortmund, Germany
14.12.2018 at RTSS
Jian-Jia Chen 1 / 21

Multiprocessor Scheduling
  - Partitioned multiprocessor scheduling [Figure: one ready queue in front of each processor $P_1, P_2, \dots, P_m$]
  - Global multiprocessor scheduling [Figure: a single ready queue feeding all processors $P_1, P_2, \dots, P_m$]
  - Semi-partitioned multiprocessor scheduling [Figure: separate ready queues for the processors $P_1, P_2, \dots, P_m$]

Resource Sharing
  - Shared resources: data structures, variables, main memory areas, files, sets of registers, I/O units
  - Mutual exclusion
  - Critical section
  - Uniprocessor systems:
      - Priority Inheritance Protocol (PIP)
      - Priority Ceiling Protocol (PCP)
      - Stack Resource Policy (SRP)

Existing Multiprocessor Locking Protocols
  - Partitioned scheduling: MPCP (1990), MSRP (2001), MrsP (2013), FMLP (2007)
  - Semi-partitioned scheduling: DPCP (1988), ROP (2016), and ROP-Enforce (2017)

Two Correlated Problems
  - Scheduler Design Problem
      - Design locking protocols to synchronize the critical sections
      - Design scheduling policies to schedule the synchronized tasks
      - Partition the tasks to processors if the protocol is restricted to partitioned or semi-partitioned scheduling
  - Schedulability Test Problem
      - Validate the schedulability of a scheduling algorithm

Open Problems for Multiprocessor Locking Protocols
The performance of these protocols highly depends on:
  - How the tasks are partitioned
      - Lakshmanan et al. (RTSS 2009), Nemati et al. (OPODIS 2010): grouping strategies
      - Hsiu et al. (EMSOFT 2013): distributed execution mechanism, priority-based
      - Huang et al. (RTSS 2016): resource-oriented partitioning (ROP)
  - How the tasks are prioritized
      - von der Brüggen et al. (RTNS 2017): different priority-assignment strategies under ROP
      - Afshar et al. (RTCSA 2017): optimal priority assignment for spin-based protocols
  - Whether a blocked job/task should spin or suspend itself
      - Yang et al. (RTSS 2016): for global scheduling
  - How the resources are shared locally and globally
      - Huang et al. (RTSS 2016): resource-oriented partitioning (ROP)

Research Questions
  - What is the fundamental difficulty?
  - What is the performance gap between partitioned, semi-partitioned, and global scheduling?
  - Is it always beneficial to prioritize critical sections?

A Simple Task/Synchronization Model
[Figure: tasks $\tau_1, \dots, \tau_5$, each drawn as a chain $C_{i,1} \rightarrow A_{i,1} \rightarrow C_{i,2}$; the critical sections are guarded by Mutex-Lock S1 or Mutex-Lock S2]
  - A set of tasks that arrive at the same time
  - Task $\tau_i$ has one critical section using one shared resource: $A_{i,1}$
  - Task $\tau_i$ has two non-critical sections: $C_{i,1}$ and $C_{i,2}$

Abundant Processors?
[Figure: Gantt chart over $t = 0, \dots, 18$ with one processor per task ($P_1$: $\tau_1$, ..., $P_6$: $\tau_6$) and a virtual synchronization processor $P_0$ that executes the critical sections in the order $\tau_2, \tau_5, \tau_1, \tau_6, \tau_3, \tau_4$. Legend: normal execution, critical section, critical-section release]
If the tasks share one resource, imagine a virtual processor $P_0$:
  - the critical section of task $\tau_i$ is released at time $C_{i,1}$
  - the critical section of task $\tau_i$ has a deadline of $H - C_{i,2}$
  - $H$ is the target deadline
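The release time and deadline that the virtual processor sees can be computed directly from the task parameters. The following is a minimal sketch; the `Task` class, the function names, and the numeric values are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    c1: float  # first non-critical section C_{i,1}
    a: float   # critical section A_{i,1}
    c2: float  # second non-critical section C_{i,2}


def sync_window(task: Task, horizon: float) -> tuple[float, float]:
    """Release time and deadline of the critical section on the virtual processor P_0."""
    release = task.c1             # the critical section cannot start before C_{i,1} is done
    deadline = horizon - task.c2  # C_{i,2} must still fit before the target deadline H
    return release, deadline


def fits_in_isolation(task: Task, horizon: float) -> bool:
    """Necessary condition: the task meets H even without any contention."""
    release, deadline = sync_window(task, horizon)
    return release + task.a <= deadline


example = Task(c1=3, a=3, c2=3)  # illustrative values (assumption)
print(sync_window(example, horizon=16))        # (3, 13)
print(fits_in_isolation(example, horizon=8))   # False
```

With these windows, the question of meeting $H$ on abundant processors reduces to sequencing the critical sections on the single virtual processor.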

Strongly NP-Hard
Scheduling problem $1 \mid r_j \mid L_{\max}$, strongly NP-complete:
  - $1$: uniprocessor system
  - $r_j$: a set of jobs $\{J_j\}$ with different arrival times $r_j$ and deadlines $d_j$
  - the default is non-preemptive scheduling; the execution time of job $J_j$ is $p_j$
  - $L_{\max}$ (decision version): whether there is a feasible schedule
Reduction from $1 \mid r_j \mid L_{\max}$:
  - The makespan $H$ is a positive integer with $H > d_j$ for every job $J_j$
  - For every job $J_j$, we construct a task $\tau_j$ with one critical section:
      1. $C_{j,1}$ is set to $r_j$
      2. $C_{j,2}$ is set to $H - d_j$
      3. $A_{j,1}$ is set to $p_j$
  - All constructed tasks share one resource in their critical sections
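The reduction can be written down mechanically. The sketch below is only an illustration of the construction on this slide; the `Job` and `Task` types, the function name, and the choice $H = \max_j d_j + 1$ (any integer larger than every deadline would do) are assumptions.

```python
from typing import NamedTuple


class Job(NamedTuple):
    r: int  # arrival time r_j
    p: int  # execution time p_j
    d: int  # deadline d_j


class Task(NamedTuple):
    c1: int  # first non-critical section C_{j,1}
    a: int   # critical section A_{j,1}
    c2: int  # second non-critical section C_{j,2}


def reduce_to_sync_instance(jobs: list[Job]) -> tuple[int, list[Task]]:
    """Map an instance of 1|r_j|L_max (decision version) to tasks sharing one resource."""
    horizon = max(job.d for job in jobs) + 1  # assumption: smallest integer H with H > d_j
    tasks = [Task(c1=job.r, a=job.p, c2=horizon - job.d) for job in jobs]
    return horizon, tasks


jobs = [Job(r=0, p=2, d=3), Job(r=1, p=1, d=5)]  # illustrative instance
print(reduce_to_sync_instance(jobs))
```

A job finishes by $d_j$ exactly when the corresponding task, after its trailing section of length $H - d_j$, finishes by $H$, which is why the two decision problems coincide.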

Computational Complexity and Implications
Assume that the task set needs synchronization.
  - Makespan:
      - Theorem: The makespan problem on $M$ processors is NP-hard in the strong sense even if $M$ is sufficiently large under any scheduling paradigm.
  - Bin packing:
      - Theorem: Minimizing the number of processors for a given common deadline of $T$ is NP-hard in the strong sense under any scheduling paradigm.
      - Theorem: There is no polynomial-time (approximation) algorithm to minimize the number of processors for a given common deadline of $T$ under any scheduling paradigm unless P = NP.

Dependency-Graph Approach (DGA)
A two-step approach:
  - First step: create a dependency graph $G = (V, E)$
      - A task $\tau_i$ has three vertices, for $C_{i,1}$, $A_{i,1}$, and $C_{i,2}$, in a chain
      - If $\tau_i$ and $\tau_j$ share the same binary mutex lock, a precedence constraint must be defined between $A_{i,1}$ and $A_{j,1}$, so that one of them precedes the other
[Figure: tasks $\tau_1, \dots, \tau_5$ as chains $C_{i,1} \rightarrow A_{i,1} \rightarrow C_{i,2}$, with precedence edges among the critical sections guarded by Mutex-Lock S1 and among those guarded by Mutex-Lock S2]
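The first step can be sketched as follows, assuming the order of critical sections per lock has already been decided. The vertex encoding `(kind, task_id)`, the function name, and the assignment of tasks to the locks S1 and S2 are hypothetical.

```python
def build_dependency_graph(lock_orders: dict[str, list[int]]) -> list[tuple]:
    """Return the edge list of G = (V, E).

    lock_orders maps each lock to the chosen execution order of the tasks
    that use it. Vertices are encoded as (kind, task_id) with kind in
    {"C1", "A", "C2"}.
    """
    edges = []
    for order in lock_orders.values():
        # Each task contributes the chain C_{i,1} -> A_{i,1} -> C_{i,2}.
        for i in order:
            edges.append((("C1", i), ("A", i)))
            edges.append((("A", i), ("C2", i)))
        # Critical sections guarded by the same lock are totally ordered.
        for prev, nxt in zip(order, order[1:]):
            edges.append((("A", prev), ("A", nxt)))
    return edges


# Hypothetical example: tasks 1-3 use lock S1, tasks 4-5 use lock S2.
edges = build_dependency_graph({"S1": [1, 2, 3], "S2": [4, 5]})
print(len(edges))  # 13 edges: 10 chain edges plus 3 lock-order edges
```

Because every lock induces a total order over its critical sections, the resulting graph is acyclic and mutual exclusion holds in any schedule that respects the edges.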

Generate A Dependency Graph
Jackson's rule (non-preemptive EDF) for the sync. processor: the deadline of task $\tau_i$ is $H - C_{i,2}$.

  Task       $C_{i,1}$   $A_{i,1}$   $C_{i,2}$
  $\tau_1$   3           3           3
  $\tau_2$   1           2           5
  $\tau_3$   4           3           2
  $\tau_4$   9           1           4
  $\tau_5$   2           2           4
  $\tau_6$   3           2           0.5

[Figure: Gantt chart over $t = 0, \dots, 18$ with the synchronization processor $P_0$ and one processor per task ($P_1$: $\tau_1$, ..., $P_6$: $\tau_6$). Legend: normal execution, critical section, critical-section release]
Resulting order of the critical sections on $P_0$: $\tau_2, \tau_5, \tau_1, \tau_3, \tau_4, \tau_6$
Resulting dependency graph: each task keeps its chain $C_{i,1} \rightarrow A_{i,1} \rightarrow C_{i,2}$, and the critical sections are ordered as $A_{2,1} \rightarrow A_{5,1} \rightarrow A_{1,1} \rightarrow A_{3,1} \rightarrow A_{4,1} \rightarrow A_{6,1}$
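The order on the synchronization processor can be reproduced with a short simulation of non-preemptive EDF. This is a sketch under the assumption that every task has its own processor for the non-critical sections; the function name and data layout are illustrative. Since all deadlines are $H - C_{i,2}$ with a common $H$, the earliest deadline is the one with the largest $C_{i,2}$, so $H$ itself is not needed.

```python
def jackson_order(tasks: dict[int, tuple[float, float, float]]):
    """Non-preemptive EDF on the sync. processor.

    tasks maps a task id to (C_{i,1}, A_{i,1}, C_{i,2}).
    Returns the order of critical sections and the resulting makespan
    when each task runs its non-critical sections on its own processor.
    """
    pending = dict(tasks)
    t = 0
    order, finish = [], {}
    while pending:
        ready = [i for i, (c1, _, _) in pending.items() if c1 <= t]
        if not ready:
            # Idle until the next critical section is released.
            t = min(c1 for c1, _, _ in pending.values())
            continue
        # Earliest deadline H - C_{i,2} means largest C_{i,2}.
        i = max(ready, key=lambda k: pending[k][2])
        _, a, c2 = pending.pop(i)
        t += a                  # run the critical section without preemption
        order.append(i)
        finish[i] = t + c2      # the second non-critical section follows directly
    return order, max(finish.values())


table = {1: (3, 3, 3), 2: (1, 2, 5), 3: (4, 3, 2),
         4: (9, 1, 4), 5: (2, 2, 4), 6: (3, 2, 0.5)}
print(jackson_order(table))  # ([2, 5, 1, 3, 4, 6], 16)
```

For the table on this slide the simulation yields the order $\tau_2, \tau_5, \tau_1, \tau_3, \tau_4, \tau_6$, and the last task to finish is $\tau_4$ at time 16.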

Schedules for the DAG tasks (List Scheduling)
[Figure: dependency graph of tasks $\tau_1, \dots, \tau_6$ with two shared resources (Mutex-Lock S1 and Semaphore S2), and the resulting list schedule on two processors $P_1$ and $P_2$ over $t = 0, \dots, 20$. Legend: normal execution (1st/2nd), critical section (S1/S2)]
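A generic non-preemptive list scheduler for a dependency graph on $M$ identical processors can be sketched as below. It is not the authors' implementation: the function name, the vertex encoding, and the two-task example are assumptions, and the sketch lets the vertices of one task run on different processors (the semi-partitioned flavor).

```python
import heapq


def list_schedule(durations: dict, edges: list[tuple], m: int):
    """Greedy list scheduling of a DAG on m processors.

    durations maps each vertex to its execution time; edges are (u, v)
    pairs meaning u must finish before v starts. Ready vertices are
    started in the order in which they became ready.
    Returns (start_times, makespan).
    """
    succs = {v: [] for v in durations}
    indeg = {v: 0 for v in durations}
    for u, v in edges:
        succs[u].append(v)
        indeg[v] += 1
    ready = [v for v in durations if indeg[v] == 0]
    running = []  # min-heap of (finish_time, vertex)
    start, t = {}, 0
    while ready or running:
        # Fill idle processors with ready vertices.
        while ready and len(running) < m:
            v = ready.pop(0)
            start[v] = t
            heapq.heappush(running, (t + durations[v], v))
        # Advance time to the next completion and release successors.
        t, done = heapq.heappop(running)
        for w in succs[done]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return start, t


# Hypothetical example: two tasks sharing one lock, A_1 ordered before A_2.
durations = {("C1", 1): 1, ("A", 1): 2, ("C2", 1): 1,
             ("C1", 2): 1, ("A", 2): 1, ("C2", 2): 2}
edges = [(("C1", 1), ("A", 1)), (("A", 1), ("C2", 1)),
         (("C1", 2), ("A", 2)), (("A", 2), ("C2", 2)),
         (("A", 1), ("A", 2))]
start, makespan = list_schedule(durations, edges, m=2)
print(makespan)  # 6
```

In the example the makespan of 6 equals the longest path $C_{1,1} \rightarrow A_{1,1} \rightarrow A_{2,1} \rightarrow C_{2,2}$, so two processors suffice to match the critical path.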

Makespan Analysis
Algorithms used in the two steps:
  - First step: an $\alpha$-approximation algorithm for the problem $1 \mid r_j \mid L_{\max}$ under the delivery-time model ($d_j$ is negative)
      - Extended Jackson's rule: $\alpha = 2$
      - Potts' iterative process: $\alpha = 1.5$
      - Hall and Shmoys' bidirectional iterative process: $\alpha = 4/3$
      - Hall and Shmoys' polynomial-time approximation scheme: $\alpha = 1 + \epsilon$ for any $\epsilon > 0$
  - Second step: list scheduling (semi-partitioned) to schedule a DAG for the problem $P \mid prec \mid C_{\max}$

Limitation of DGA
  - The two-step approach in DGA results in non-optimal solutions even if both steps are done optimally, regardless of complexity
      - proof based on a concrete example
  - For minimizing the makespan, the approximation ratio when both steps are solved optimally is at least
      - $2 - \frac{2}{M} + \frac{1}{M^2}$ under any scheduling paradigm
      - $2 - \frac{1}{M}$ under partitioned or semi-partitioned scheduling
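To get a feeling for these lower bounds, they can be evaluated for the processor counts used later in the evaluation. The sketch assumes the two expressions read $2 - 2/M + 1/M^2$ (any scheduling paradigm) and $2 - 1/M$ (partitioned or semi-partitioned), which is a reading of the garbled slide text; the function name is illustrative.

```python
from fractions import Fraction


def dga_ratio_lower_bounds(m: int) -> tuple[Fraction, Fraction]:
    """Assumed lower bounds on the DGA approximation ratio for m processors."""
    any_paradigm = 2 - Fraction(2, m) + Fraction(1, m * m)  # assumed: 2 - 2/M + 1/M^2
    partitioned = 2 - Fraction(1, m)                        # assumed: 2 - 1/M
    return any_paradigm, partitioned


for m in (4, 8, 16):
    any_paradigm, partitioned = dga_ratio_lower_bounds(m)
    print(m, float(any_paradigm), float(partitioned))
# First line printed: 4 1.5625 1.75
```

Both expressions approach 2 as $M$ grows, and the bound for partitioned or semi-partitioned scheduling is never smaller than the bound that holds under any paradigm.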

Evaluation Setup
  - Evaluations with $M = 4$, $8$, and $16$ processors
  - 1000 task sets, each with $10M$ tasks
  - RandomFixedSum by Emberson and Davis was modified
  - The number of shared resources: $z \in \{4, 8, 16\}$
  - The length $A_{i,1}$ is a fraction of the total execution time $C_{i,1} + C_{i,2} + A_{i,1}$ of task $\tau_i$, depending on $\beta$, which ranges from 5% to 50%

Makespan Performance
[Figure (a): $M = 8$, $z = 8$, $\beta$ = 10%-40%. x-axis: D/LB from 1.0 to 1.8; y-axis: acceptance ratio (%) from 0 to 100. Curves: JKS-SP-NP, JKS-P-NP, JKS-SP-P, JKS-P-P, POTTS-SP-NP, POTTS-P-NP, POTTS-SP-P, POTTS-P-P]
  - LB is the lower bound (detailed in the paper)
  - D is the target deadline
  - D/LB: performance gap between the approximation and the optimal solutions
  - Acceptance ratio: percentage of task sets meeting their deadlines
  - Curve names: partitioned (P) or semi-partitioned (SP), followed by preemptive (P) or non-preemptive (NP)

Experimental Results: Schedulability
[Figure: acceptance ratio (%) versus utilization (%) / M from 40 to 100 for JKS-SP-P, POTTS-SP-P, ROP-EDF, and ROP-FP. Panel (a): $M = 8$, $z = 8$, $\beta$ = 5%-10%. Panel (b): $M = 8$, $z = 8$, $\beta$ = 40%-50%]
  - ROP-FP: ROP under fixed-priority scheduling and release enforcement
  - ROP-EDF: ROP under EDF and release enforcement
  - POTTS-SP-P: our approach, applying algorithm Potts for generating $G$, semi-partitioned list scheduling, and preemptive scheduling for the second non-critical section
  - JKS-SP-P: our approach, applying algorithm JKS for generating $G$, semi-partitioned list scheduling, and preemptive scheduling for the second non-critical section

Conclusion
  - The difficulty is mainly due to sequencing the mutually exclusive accesses to shared resources. None of the following tricks helps:
      - adding more processors
      - removing periodicity and job recurrence
      - introducing task migration
      - allowing preemption
  - The performance gap between partitioned and semi-partitioned scheduling is mainly due to scheduling the dependency graph
  - The partitioned scheduling problem $P \mid prec, tied \mid C_{\max}$ is less understood

Open Problems
  - Work-conserving is not the best
      - Existing protocols assume work-conserving behavior for critical sections
      - DGA reveals the potential of non-work-conserving protocols
  - Extension to periodic task sets
      - One special case is directly applicable: a mutex lock is only shared among tasks that have the same period
      - For each of the $z$ mutex locks, a DAG is constructed
      - The $z$ resulting DAGs can be scheduled using any approach for multiprocessor DAG scheduling
      - Other cases remain open
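The special case for periodic task sets can be checked and prepared with a simple grouping step. The sketch below only groups tasks per lock and verifies the same-period condition; the tuple layout `(task_id, period, lock)` and the function name are assumptions, and building and scheduling each DAG is left to the two DGA steps.

```python
from collections import defaultdict


def group_tasks_by_lock(tasks: list[tuple[int, int, str]]) -> dict[str, list[int]]:
    """Group task ids per mutex lock, one group per DAG to be constructed.

    tasks holds (task_id, period, lock) triples. Raises ValueError if a lock
    is shared among tasks with different periods, since that case is not
    covered by the directly applicable special case.
    """
    period_of_lock = {}
    groups = defaultdict(list)
    for task_id, period, lock in tasks:
        if period_of_lock.setdefault(lock, period) != period:
            raise ValueError(f"lock {lock} is shared across different periods")
        groups[lock].append(task_id)
    return dict(groups)


# Hypothetical task set with z = 2 locks.
print(group_tasks_by_lock([(1, 10, "S1"), (2, 10, "S1"), (3, 20, "S2")]))
# {'S1': [1, 2], 'S2': [3]}
```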