Supporting Intra-Task Parallelism in Real-Time Multiprocessor Systems
José Fonseca

Technical Report CISTER-TR-121007
Version:
Date: 1/1/2014

José Fonseca
CISTER Research Unit
Polytechnic Institute of Porto (ISEP-IPP)
Rua Dr. António Bernardino de Almeida, 431
4200-072 Porto, Portugal
Tel.: +351.22.8340509, Fax: +351.22.8340509
E-mail: jaf@det.ua.pt
http://www.cister.isep.ipp.pt

Abstract

Multiple programming models are emerging to address the increased need for dynamic task-level parallelism in applications for multi-core processors and shared-memory parallel computing, presenting promising solutions from a user-level perspective. Nonetheless, while high-level parallel languages offer a simple way for application programmers to specify parallelism in a form that easily scales with problem size, they still leave the actual scheduling of tasks to be performed at runtime. Therefore, if the underlying system cannot efficiently map those tasks onto the available cores, the benefits will be lost. This is particularly important in modern real-time systems, as their average workload is rapidly growing more parallel, complex and computing-intensive, whilst preserving stringent timing constraints. However, as real-time scheduling theory has mostly focused on sequential task models, a shift to parallel task models introduces a completely new dimension to the scheduling problem.

Within this context, the work presented in this thesis considers how to dynamically schedule highly heterogeneous parallel applications that require real-time performance guarantees on multi-core processors. A novel scheduling approach called RTWS is proposed. RTWS combines the G-EDF scheduler with a priority-aware work-stealing load balancing scheme, enabling parallel real-time tasks to be executed on more than one processor at a given time instant. Two stealing sub-policies have arisen from this proposal and their suitability is discussed in detail.

Furthermore, this thesis describes the implementation of a new scheduling class for RTWS in the Linux kernel and extensively evaluates its feasibility. Experimental results demonstrate the greater scalability and lower scheduling overhead of the proposed approach compared with an existing real-time deadline-driven scheduling policy for the Linux kernel. They also reveal better performance for tasks with intra-task parallelism than for tasks without it, even for short-lived applications. We show that busy-aware stealing is robust to small deviations from a strict priority schedule and conclude that some priority inversion may actually be acceptable, provided it helps reduce contention, communication, synchronisation and coordination between parallel threads.


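$m$: number of processors
$\tau = \{\tau_1, \ldots, \tau_n\}$: task set
$\tau_i$: the $i$-th task, $1 \le i \le n$
$J_{i,j}$: the $j$-th job of $\tau_i$, $j \ge 1$
$C_i$: $\tau_i$'s worst-case execution time, $C_i > 0$
$O_i$: $\tau_i$'s offset, $O_i \ge 0$
$T_i$: $\tau_i$'s period, $T_i > C_i$
$D_i$: $\tau_i$'s relative deadline, $D_i \ge C_i$
$u_i$: $\tau_i$'s utilisation, $u_i = C_i / T_i$
$a_{i,j}$: $J_{i,j}$'s arrival time, $a_{i,j} \ge a_{i,j-1} + T_i$
$d_{i,j}$: $J_{i,j}$'s absolute deadline, $d_{i,j} = a_{i,j} + D_i$
$f_{i,j}$: $J_{i,j}$'s finishing time, $f_{i,j} > a_{i,j}$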

$D_i \le T_i$ (constrained deadlines); $D_i = T_i$ (implicit deadlines)

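$u_{sum}(\tau) = \sum_{i=1}^{n} u_i$, with $\ln(2) \approx 69.3\%$

$u_{sum}(\tau) = \frac{2}{11} + \frac{3}{13} + \frac{2}{15} + \frac{3}{17} \approx 72.2\%$

[Figure: schedule involving $\tau_1$, $\tau_3$ and $\tau_4$]

[Table: parameters $C_i$, $T_i$, $D_i$, $O_i$ of tasks $\tau_1$, $\tau_2$, $\tau_3$, $\tau_4$]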

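[Table: parameters $C_i$, $T_i$, $u_i$ of tasks $\tau_1$, $\tau_2$, $\tau_3$, expressed in terms of $\epsilon$]

$m = 2$, $n = 3$, $t = 0$: tasks $\tau_1$, $\tau_2$, $\tau_3$ (parameterised by $\epsilon$) on processors $P_1$ and $P_2$; jobs $J_{1,1}$ and $J_{2,1}$ in $[0, 2\epsilon]$; job $J_{3,1}$.

$\sum_{i=1}^{n} u_i < 2$; as $\epsilon \to 0$, $\sum_{i=1}^{n} u_i \to 1$.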

For each task $\tau_i$, $P(\tau_i)$ is defined with respect to $P$: $P(\tau_i) = 1$; $P(\tau_i) = m$.

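On $m$ processors: $(m + 1)/2$.

$u_{sum}(\tau) \le m - (m - 1)\,u_{max}(\tau)$.

For $\zeta = m/(2m - 1)$: $u_{sum}(\tau) \le m^2/(2m - 1)$.

For $k$: $m \ge (k - 1) + \left\lceil \frac{u_{sum}(\tau^{(k+1)})}{1 - u_k} \right\rceil$, where $u_k$ is the utilisation of the $k$-th task.

For $\zeta = 1/2$: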

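$u_{sum}(\tau) \le (m + 1)/2$.

$k$, $k_{min}$; $1/2$; $m$.

$u_{sum}(\tau) = (m + 1)/2 + \epsilon$ on $m$ processors.

$u_{sum}(\tau) \le \frac{\lfloor 1/u_{max}(\tau) \rfloor\, m + 1}{\lfloor 1/u_{max}(\tau) \rfloor + 1}$, for $n > m / \lfloor 1/u_{max}(\tau) \rfloor$, where $n$ is the number of tasks in $\tau$. For $u_{max}(\tau) = 1$ this reduces to $(m + 1)/2$.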
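The last bound above can be evaluated numerically. The following is a minimal sketch, assuming the brackets lost around $1/u_{max}(\tau)$ were floor brackets; the function name `utilisation_bound` is hypothetical and not taken from the report.

```python
import math


def utilisation_bound(m: int, u_max: float) -> float:
    """Evaluate (beta * m + 1) / (beta + 1) with beta = floor(1 / u_max).

    Assumption: the brackets around 1/u_max in the source are floor brackets.
    """
    if m < 1 or not 0.0 < u_max <= 1.0:
        raise ValueError("need m >= 1 and 0 < u_max <= 1")
    beta = math.floor(1.0 / u_max)
    return (beta * m + 1) / (beta + 1)


if __name__ == "__main__":
    # With u_max = 1 the bound collapses to (m + 1) / 2.
    for u_max in (1.0, 0.5, 0.25):
        print(f"m=4, u_max={u_max}: bound={utilisation_bound(4, u_max)}")
```

Under this reading, lighter tasks (smaller $u_{max}(\tau)$) give a bound closer to $m$, while a single task with utilisation 1 brings it down to $(m + 1)/2$.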

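$T_1$, $T_\infty$, and $T_p$ on $p$ processors: $T_p \le \frac{T_1}{p} + T_\infty$.

With $T_\infty$ and $T_1$, the speedup $S_p$ satisfies $S_p \le p$; with $T_\infty$, $S_{max}$; $P$, $S_{max}$.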
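To make the bound relating $T_p$, $T_1$ and $p$ concrete, here is a small sketch. It assumes that $T_1$ denotes the total work of a task graph and that the unsubscripted $T$ in the bound denotes its critical-path length (written $T_\infty$ here). The helper names `work_and_span` and `time_bound` and the fork-join graph are illustrative only.

```python
from functools import lru_cache


def work_and_span(cost, preds):
    """Return (T_1, T_inf) for an acyclic task graph.

    cost:  dict mapping node -> execution time
    preds: dict mapping node -> iterable of predecessor nodes
    """
    work = sum(cost.values())  # T_1: time on a single processor

    @lru_cache(maxsize=None)
    def finish(node):
        # Earliest finish time with unlimited processors.
        return cost[node] + max((finish(p) for p in preds.get(node, ())), default=0)

    span = max((finish(n) for n in cost), default=0)  # T_inf: critical path
    return work, span


def time_bound(work, span, p):
    """Upper bound T_1 / p + T_inf on the execution time with p processors."""
    return work / p + span


if __name__ == "__main__":
    cost = {"fork": 1, "left": 3, "right": 2, "join": 1}
    preds = {"left": ["fork"], "right": ["fork"], "join": ["left", "right"]}
    t1, t_inf = work_and_span(cost, preds)
    print(t1, t_inf, time_bound(t1, t_inf, p=2))
```

For this fork-join graph the work is 7 and the critical path is 5, so the bound for two processors evaluates to 8.5 time units.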

$O(\log n)$ in $n$


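$m$ processors $p_1, p_2, \ldots, p_m$; tasks $\tau_1, \ldots, \tau_n$ on the $m$ processors. For the $j$-th job of $\tau_i$: $a_{i,j}$, $r_{i,j}$, $s_{i,j}$, and $d_{i,j} = r_{i,j} + T_i$, where $T_i$ is the period of $\tau_i$.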

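$f_{i,j}$, with $a_{i,j} \le r_{i,j} \le s_{i,j} \le f_{i,j}$.

The $j$-th job of $\tau_i$ consists of threads $w^k_{i,j}$, $1 \le k \le n_i$, where $n_i$ is the number of threads of the $j$-th job of $\tau_i$; $n_i \ge 2$ for a parallel $\tau_i$.

Thread $w^k_{i,j}$ of $\tau_i$ has execution time $e^k_{i,j}$; $C_i$ is the worst-case execution time of $\tau_i$.

For $\tau_i$: $u_i = C_i / T_i$. For $\Pi$ on $m$ processors: $U_\Pi = \sum_{i=1}^{n} u_i$ and $u_\Pi = \max_{1 \le i \le n} u_i$.

$\Gamma$ and $A$: $A$, $\Gamma$; $\tau_i$ in $\Gamma$ and its deadline $d_i$.

For $\Pi$ on $m$ processors: $u_\Pi \le 1$; $U_\Pi \le m - u_\Pi (m - 1)$.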



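$P_s$ for $m$ processors: $P = \{p_1, p_2, \ldots, p_m\}$ and $P_s = \{P_s \mid P_s \subseteq P,\ n_{p_i} \ge 1\}$, where $n_{p_i}$ refers to processor $p_i$.

$w_{edf}(P_s)$: $\min d$ over $r_k(P_s)$, $|P_s| \ge 1$.

Example: $\tau_1 = (5, 10)$, $\tau_2 = (10, 20)$, $\tau_3 = (4, 19)$. For $\tau_1$, $\tau_2$, $\tau_3$: $u_\Pi = 0.5$ and $U_\Pi = 1.21$. At $t = 0$: $\tau_1$ and $\tau_3$, $t = [0, 5]$.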
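The figures $u_\Pi = 0.5$ and $U_\Pi = 1.21$ for the example task set can be reproduced with a few lines of code. This is a sketch only: it assumes each pair is $(C_i, T_i)$, that the missing relational operators in the utilisation condition are "$\le$" (that is, $u_\Pi \le 1$ and $U_\Pi \le m - u_\Pi (m - 1)$), and that $m = 2$ is a reasonable platform size for illustration. The function names are hypothetical.

```python
def utilisations(tasks):
    """Return (U_Pi, u_Pi): total and maximum utilisation of (C_i, T_i) pairs."""
    per_task = [c / t for c, t in tasks]
    return sum(per_task), max(per_task)


def meets_condition(tasks, m):
    """Check u_Pi <= 1 and U_Pi <= m - u_Pi * (m - 1).

    Assumption: both inequalities are non-strict; the operators are
    missing in the source text.
    """
    total, largest = utilisations(tasks)
    return largest <= 1 and total <= m - largest * (m - 1)


if __name__ == "__main__":
    example = [(5, 10), (10, 20), (4, 19)]  # tau_1, tau_2, tau_3
    total, largest = utilisations(example)
    print(f"U_Pi = {total:.2f}, u_Pi = {largest:.2f}")
    print("condition holds for m = 2:", meets_condition(example, m=2))
```

With $m = 2$ (an assumed value) the right-hand side is $2 - 0.5 = 1.5$, so the condition holds for this task set, whereas on a single processor the total utilisation of 1.21 already exceeds the limit.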

$\tau_1$, $t = 5$; $\tau_2$, $t = 7$; $w^2_{2,1}$, $t = 10$; $\tau_1$, $w^3_{2,1}$; $t = 11$; $t = 12$; $\tau_1$, $t = 13$.


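Parameters: $u_{min}$, $u_{max}$, $T_{min}$, $T_{max}$.

$T_i = T_{min} + x\,(T_{max} - T_{min})$, for each $T_i$ with a value $x$.

$[U_{\Pi min}, U_{\Pi max}]$: $[0.38, 0.40]$, $[0.58, 0.60]$, $[0.73, 0.75]$.

$u_i = u_{min} + x\,(u_{max} - u_{min})$, for each $u_i$, subject to $\sum_{k=1}^{n} u_k \ge U_{\Pi min}$ and $\sum_{k=1}^{n} u_k \le U_{\Pi max}$.

$C_i = T_i\,u_i$.

$n_i = x\,(m - 2)$.

$n$, $U_{\Pi max}$, $u_{max}$; $n$, $C_i$.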
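The generation formulas above can be turned into a small task set generator. The sketch below makes several assumptions that the surviving text does not confirm: $x$ is drawn uniformly from $[0, 1]$ and independently for each parameter, the total utilisation window is enforced by discarding and regenerating whole task sets, and the thread count $n_i$ is left out because its formula is ambiguous. The function name and all parameter values in the usage example are hypothetical.

```python
import random


def generate_task_set(n, total_lo, total_hi, t_min, t_max, u_min, u_max,
                      rng, max_tries=100_000):
    """Draw n tasks as (C_i, T_i, u_i) with total utilisation in [total_lo, total_hi].

    Assumptions: x is uniform in [0, 1], drawn independently for T_i and u_i,
    and out-of-range task sets are rejected and regenerated.
    """
    for _ in range(max_tries):
        tasks = []
        for _ in range(n):
            period = t_min + rng.random() * (t_max - t_min)  # T_i
            util = u_min + rng.random() * (u_max - u_min)    # u_i
            tasks.append((period * util, period, util))      # C_i = T_i * u_i
        total = sum(u for _, _, u in tasks)
        if total_lo <= total <= total_hi:
            return tasks
    raise RuntimeError("no task set found within the utilisation window")


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed for reproducibility
    for c, t, u in generate_task_set(4, 1.5, 1.6, 10.0, 100.0, 0.1, 0.7, rng):
        print(f"C={c:7.2f}  T={t:7.2f}  u={u:.3f}")
```

Rejection sampling is simple but can need many attempts when the utilisation window is narrow relative to the spread of the individual utilisations, which is why the number of attempts is capped.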

$m$, $C_i$. For $m = 4$ and $[0.38, 0.40]$: $\frac{457}{128} \cdot \frac{8}{4} = 7.14$.

$L$. For $m = 4$: $\frac{158}{82} = 1.93$. For $m = 8$: $\frac{217}{50}$, $\frac{320}{82}$, $\frac{401}{98} \approx 4$.
