Regression-based Statistical Bounds on Software Execution Time


1 VECoS'17, Montreal. Regression-based Statistical Bounds on Software Execution Time. Ayoub Nouri, P. Poplavko, L. Angelis, A. Zerzelidis, S. Bensalem, P. Katsaros. University of Grenoble Alpes, France. August 24, 2017


3 WCET: Statistical vs. Common Approaches
WCET is useful for:
- schedulability of real-time systems
- performance evaluation, e.g., Network Calculus
- model-based design, i.e., building faithful Hw/Sw models
Statistical approaches: probabilistic upper bounds, with correctness probability 1 - α.
Common approaches: 100%-certain upper bounds, α = 0.
Figure: density of execution-time measurements t_1, ..., t_n, where T is a random variable modeling execution time and Pr(T > t_i) = α; the statistical bound t_{α>0} lies below the certain bound t_{α=0}.


6 Advantages and Shortcomings
Both approaches are accepted in the context of safety-critical systems with hard real-time constraints: they achieve small α (EVT) or α = 0 (common approaches).
Extreme Value Theory (EVT): requires i.i.d. execution times, hence random Hw, e.g., caches.
Common approaches: require a costly, faithful model of the Hw.
Our approach: Maximal Execution Time (MET), statistical and measurement-based, for non-safety-critical systems with soft real-time constraints, e.g., video-streaming systems.


8 Measurement-Based Timing Analysis (MBTA)
Flow: program + input data samples -> execution-time measurements t_i -> analysis -> WCET/MET bound τ.
Probabilistic MBTA: given α, compute the probabilistic upper bound τ as τ = min t_i s.t. Pr{T < t_i} >= 1 - α.
Figure: T is a random variable modeling execution time; Pr(T < t_i) >= 1 - α and Pr(T > t_i) <= α.

9 A Simplified View
Figure: histogram and density of execution times. Figure: quantile function.
Each execution-time observation t_i is modeled as a random variable T_i. Assume there were a probability distribution: suppose the T_i are i.i.d. and T_i ~ f. Then, given α, the solution τ s.t. Pr{T < τ} = 1 - α is τ = Q_f(1 - α), where Q_f is the quantile function of f.
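Under the i.i.d. assumption on this slide, τ = Q_f(1 - α) can be approximated by an empirical quantile. A minimal sketch with synthetic, hypothetical timing data (the gamma distribution and its parameters are stand-ins, not from the talk):

```python
import numpy as np

# Hypothetical i.i.d. execution-time sample (Mcycles); under the assumption
# T_i ~ f, the empirical (1 - alpha)-quantile approximates Q_f(1 - alpha).
rng = np.random.default_rng(0)
times = rng.gamma(shape=4.0, scale=250.0, size=1000)  # stand-in measurements

alpha = 0.05
tau = np.quantile(times, 1.0 - alpha)  # probabilistic upper bound

# Roughly a (1 - alpha) fraction of the observed times falls below tau.
coverage = np.mean(times <= tau)
```

For a right-skewed execution-time distribution like this one, the bound sits well above the mean, which is why quantile bounds are more informative than averages here.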

10 Execution Time Is Not Purely Random
Figure: execution time in practice (multi-modal density).
The T_i are not i.i.d.: there is no single f such that T_i ~ f.
Execution-time dependencies:
- input data, e.g., images: X_1 captures the size of input images, X_2 captures their type (mono/multi-chromatic), ...
- Hw effects, e.g., caches.


13 Main Contributions
Maximal Regression Model: a model that captures dependencies and enables deriving execution-time upper bounds from
T = h(X_1, X_2, ...) + ɛ
where T is the execution time, h is an oracle for the execution time, and ɛ is a random error; based on Linear Regression and Confidence Intervals.
Construction methods: a method to identify the set of most pertinent X_i, and a method to control the quality of the input data (measurements).
An automated design flow to compute MET: integrates and implements the previous contributions; used to analyze a JPEG decoder.

14 Outline: 1. The Maximal Regression Model; 2. Model Construction Techniques; 3. A JPEG Decoder Study; 4. Conclusions


16 Linear Regression (LR)
Why is LR an appropriate model? Execution time can always be decomposed as a linear combination of code-block contributions.
Ideal model: Y = β_0 + β_1 X_1 + β_2 X_2 + ... + β_p X_p + ɛ
Y is the explained variable, i.e., the execution time; the X_i are the explanatory variables (predictors), i.e., block counters; the β_i are the parameters/coefficients, i.e., block costs; ɛ is the regression error.


18 Linear Regression (LR)
Approximated model:
Y = β_0 + β_1 X_1 + β_2 X_2 + ... + β_p X_p + ɛ
Ŷ = b_0 + b_1 X_1 + b_2 X_2 + ... + b_p X_p
Ŷ approximates the explained variable, i.e., the execution time; the X_i are the explanatory variables (predictors); the b_i are estimates of the β_i, computed from measurements; ɛ can be neglected (on average).
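The estimates b_i are typically obtained by ordinary least squares. A self-contained sketch with synthetic block counters; the coefficients, ranges, and noise level are all hypothetical:

```python
import numpy as np

# Synthetic ground truth: Y = 5 + 3*X1 + 2*X2 + noise (hypothetical costs).
rng = np.random.default_rng(1)
n = 200
X = rng.integers(0, 50, size=(n, 2)).astype(float)  # block counters X1, X2
beta = np.array([5.0, 3.0, 2.0])                    # beta_0, beta_1, beta_2
y = beta[0] + X @ beta[1:] + rng.normal(0.0, 0.5, size=n)

# Least-squares estimates b_i of the coefficients beta_i.
design = np.column_stack([np.ones(n), X])           # prepend intercept column
b, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ b                                  # fitted execution times
```

With enough measurements, the b_i recover the per-block costs closely, which is the sense in which ɛ "can be neglected on average".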


20 LR in a Nutshell
Greatest Common Divisor (GCD):
Input: int n, k (n >= k)
Output: GCD(n, k)
repeat {
  p := n; n := k; k := p mod n;
} until k = 0;
return n;
The execution time Y is proportional to the cost of a loop iteration (β_1); the number of iterations (X_1) depends on the input parameters (n, k).
Y = β_0 + β_1 X_1 + ɛ
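The predictor X_1 can be obtained by instrumenting the loop. A sketch (the function name and inputs are illustrative, not from the talk); in a real measurement campaign Y would be the measured cycle count and X_1 this iteration counter:

```python
# Instrumented version of the slide's GCD loop: returns both the result and
# the iteration count, i.e., the predictor X_1 of the regression model.
def gcd_instrumented(n, k):
    assert n >= k > 0
    iterations = 0
    while True:
        p = n
        n = k
        k = p % n
        iterations += 1
        if k == 0:
            break
    return n, iterations

g, x1 = gcd_instrumented(48, 18)  # g = 6, x1 = 3 iterations
```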

21 LR in a Nutshell
Artificial program:
Input: int i1, i2, i3
Output: out
out := i1 - i2;
if (out < 0) { out := -out; }
while (out > i3) { out := out * 0.9; }
return out;
The execution time Y is proportional to the cost of the if body (β_1) and of a while-loop iteration (β_2); the number of times the if branch is taken (X_1) and the number of iterations (X_2) depend on the input parameters (i1, i2, i3).
Y = β_0 + β_1 X_1 + β_2 X_2 + ɛ


24 The Maximal Regression Model
Approximated LR model: Ŷ = b_0 + b_1 X_1 + b_2 X_2 + ... + b_p X_p, generally used to compute the average execution time.
Figure: confidence interval [Ŷ^-, Ŷ^+] on Ŷ. Figure: confidence interval [b_i^-, b_i^+] on β_i.
Our model: Ŷ^+ = b_0^+ + b_1^+ X_1 + b_2^+ X_2 + ... + b_p^+ X_p + ɛ^+
where ɛ^+ is obtained by the same principle, using a χ² distribution.
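A sketch of how the maximal model can be built: each OLS coefficient is replaced by the upper limit of its confidence interval, and the error term by an upper bound on the residual spread. For simplicity this sketch uses normal quantiles throughout, whereas the talk uses t quantiles for the coefficients and a χ² bound for the error; all data are synthetic:

```python
import numpy as np

# Maximal-model sketch (normal-approximation version of the slide's b_i^+, eps^+).
rng = np.random.default_rng(2)
n, p = 120, 2
X = rng.uniform(1.0, 20.0, size=(n, p))            # positive block counters
y = 4.0 + X @ np.array([2.5, 1.5]) + rng.normal(0.0, 0.5, size=n)

D = np.column_stack([np.ones(n), X])               # design matrix with intercept
b, *_ = np.linalg.lstsq(D, y, rcond=None)

resid = y - D @ b
sigma2 = float(resid @ resid) / (n - p - 1)        # error-variance estimate
cov_b = sigma2 * np.linalg.inv(D.T @ D)            # covariance of the b_i

z = 1.96                                           # ~95% normal quantile
b_plus = b + z * np.sqrt(np.diag(cov_b))           # upper limits b_i^+
eps_plus = z * np.sqrt(sigma2)                     # upper bound on the error term

y_plus = D @ b_plus + eps_plus                     # maximal-model prediction
```

With positive predictors, raising every coefficient to its upper limit pushes Ŷ^+ above the bulk of the observed times, which is the intended one-sided behavior.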

25 Probabilistic Bound
The associated probability for Ŷ^+ = b_0^+ + b_1^+ X_1 + b_2^+ X_2 + ... + b_p^+ X_p + ɛ^+ is
Pr{Y < Ŷ^+} >= 1 - (p + 2)α / 2
assuming (1 - α) confidence intervals for each b_i and for ɛ.
Challenges:
- find the predictors X_i most influential on the execution time
- avoid redundancy and inter-correlation
- consider a representative set of measurements w.r.t. the input data
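With (1 - α) confidence intervals for each of the p + 1 coefficients and for the error term, i.e., p + 2 intervals in total, a union bound over their one-sided halves gives Pr{Y < Ŷ^+} >= 1 - (p + 2)α/2. Plugging in the values used later in the talk:

```python
# Union-bound guarantee Pr{Y < Y_plus} >= 1 - (p + 2) * alpha / 2,
# evaluated for p = 6 predictors and alpha = 0.05.
p, alpha = 6, 0.05
guarantee = 1.0 - (p + 2) * alpha / 2.0  # 0.8
```

So a 95% confidence level per interval only yields an 80% overall guarantee for six predictors, which is why keeping p small matters.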


27 Identifying Predictors: Step-wise Regression
Potential predictors: P = { one predictor per block of code, e.g., if, switch/case, loops }.
Why identify only the most relevant predictors?
- parsimony: the smallest relevant, sufficient subset p ⊆ P; rule of thumb N > 5p, i.e., a smaller model needs fewer measurements
- keeping too many variables induces over-fitting
Step-wise procedure (flattened flowchart): start from p = {X_0}; repeatedly move into p the predictor X_i = argmax SNR(X_i, P) as long as SNR(X_i, P) > α, and move back out of p any X_i = argmin SNR(X_i, p) with SNR(X_i, p) < α, until no move applies.
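The forward half of the procedure can be sketched with a greedy selection loop. Here the reduction in residual sum of squares stands in for the slide's SNR criterion, and the stopping threshold `min_gain` is a hypothetical stand-in for its α threshold; the data are synthetic:

```python
import numpy as np

# Forward-selection sketch: greedily add the predictor that most reduces the
# residual sum of squares, stopping when the relative gain falls below a
# threshold (stand-in for the SNR criterion on the slide).
def forward_select(X, y, min_gain=0.05):
    n, m = X.shape
    chosen, remaining = [], list(range(m))
    design = np.ones((n, 1))                      # start from the intercept X_0
    rss = float(np.sum((y - y.mean()) ** 2))
    while remaining:
        best = None
        for j in remaining:
            cand = np.column_stack([design, X[:, j]])
            coef, *_ = np.linalg.lstsq(cand, y, rcond=None)
            score = float(np.sum((y - cand @ coef) ** 2))
            if best is None or score < best[0]:
                best = (score, j)
        new_rss, j = best
        if (rss - new_rss) / rss < min_gain:      # not enough signal: stop
            break
        chosen.append(j)
        remaining.remove(j)
        design = np.column_stack([design, X[:, j]])
        rss = new_rss
    return chosen

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 5))
y = 3.0 * X[:, 1] + 1.5 * X[:, 4] + rng.normal(scale=0.3, size=150)
picked = forward_select(X, y)                     # recovers predictors 1 and 4
```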

28 What About Input Data?
... identify algorithmic complexity factors, and come up with an input data set in which every combination of these factors is represented fairly.
Cook's distance: a distance measure D(t_i) that quantifies the influence of measurement t_i on the regression model. Recalling that execution time depends on input data, it can be seen as a distance measure on the input data.
Measurements shouldn't be dominated by outliers, i.e., points with D(t_i) > θ: either add more similar samples, or remove them from the measurement set (keeping them for testing).
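Cook's distance has a closed form from the hat matrix and residuals. A sketch on synthetic data with one injected dominating measurement; the threshold 4/n is a common rule of thumb, a stand-in for the slide's unspecified θ:

```python
import numpy as np

# Cook's distance sketch: D_i measures how much the whole fit moves when
# measurement i is deleted; points with D_i above a threshold dominate the model.
def cooks_distance(X, y):
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat (projection) matrix
    h = np.diag(H)
    resid = y - H @ y
    s2 = float(resid @ resid) / (n - k)           # residual variance estimate
    return resid ** 2 * h / (k * s2 * (1.0 - h) ** 2)

rng = np.random.default_rng(4)
n = 60
x = rng.uniform(0.0, 10.0, size=n)
y = 2.0 + 1.0 * x + rng.normal(scale=0.2, size=n)
y[0] += 15.0                                      # inject one dominating measurement

X = np.column_stack([np.ones(n), x])
D = cooks_distance(X, y)
theta = 4.0 / n                                   # rule-of-thumb threshold (assumed)
outliers = np.flatnonzero(D > theta)
```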

29 Computing MET
Pragmatic MET, from Ŷ^+ = b_0^+ + b_1^+ X_1 + b_2^+ X_2 + ... + b_p^+ X_p + ɛ^+:
τ = b_0^+ + Σ_{i=1..p} (bX)_i^+ + ɛ^+, where (bX)_i^+ = b_i^+ X_i^+ if b_i^+ > 0, and b_i^+ X_i^- if b_i^+ < 0.
Assumption: the min/max bounds X_i^-, X_i^+ of each X_i are known. Often an intuitive human interpretation can be given for X_i, so an expert can give min/max bounds on it; e.g., if X_1 is the image size, the maximal image size can be determined from the camera resolution.
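The pragmatic-MET formula is a direct worst-case substitution: take X_i^+ for positive coefficients and X_i^- for negative ones. A sketch with entirely hypothetical numbers:

```python
# Pragmatic-MET sketch: plug expert-supplied min/max bounds for each predictor
# into the maximal model, taking X_i^+ when b_i^+ > 0 and X_i^- otherwise.
# All numbers below are hypothetical.
def pragmatic_met(b0_plus, b_plus, x_min, x_max, eps_plus):
    total = b0_plus + eps_plus
    for b, lo, hi in zip(b_plus, x_min, x_max):
        total += b * (hi if b > 0 else lo)       # worst-case contribution
    return total

tau = pragmatic_met(
    b0_plus=100.0,
    b_plus=[2.0, -0.5],                          # one (rare) negative coefficient
    x_min=[0.0, 10.0],
    x_max=[500.0, 40.0],
    eps_plus=50.0,
)
# tau = 100 + 2.0*500 + (-0.5)*10 + 50 = 1145.0
```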


31 Experimental Setting
JPEG files: a sequence of compressed MCUs (16x16 or 8x8 pixels). Each MCU contains pixel blocks: in color format 4:1:1 it contains 6 blocks, in monochromatic format 1 block. Pixel blocks are represented by a matrix of DCT coefficients.
JPEG decoder platform: FPGA, SPARC V8 with a 7-stage pipeline, double-precision FPU, 4 KB instruction cache, 4 KB data cache, a 256 L2 cache, and an SDRAM. Data caches are reset for each new program run (i.e., each new image).
Input: 99 JPEG images of different sizes and color formats. Output: 99 measurements (the X_i and the execution time Y).
MCU: Minimum Coded Unit. DCT: Discrete Cosine Transform.

32 Measurements
We identified 95 predictors, P = {X_1, ..., X_95}. X is the 99 × 96 design matrix with columns X_0 (intercept), X_1, ..., X_95, one row per measured image; Y = (t_1, ..., t_99) is the vector of measured execution times.
Some observations: the maximal measured execution time was 23,600 Mcycles, corresponding to a particularly large image; the average execution time was about 1000 Mcycles.
Pre-processing: split the set of measurements into 70 measurements for training and 29 measurements for testing.

33 Training Set vs. Test Set
Training set: X_train (rows 1..70 of X) and Y_train = (t_1, ..., t_70), used to compute the coefficients b_i^+ and ɛ^+.
Test set: X_test (rows 71..99 of X) and Y_test = (t_71, ..., t_99), used to test whether the obtained model correctly generalizes to unseen inputs.
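The split described on this slide can be sketched as follows, with stand-in data (the predictor matrix and timings below are synthetic placeholders, not the 99 real measurements):

```python
import numpy as np

# Split sketch: the first 70 measurements (t_1..t_70) train the model, the
# remaining 29 (t_71..t_99) are held out to check generalization.
rng = np.random.default_rng(5)
X = rng.uniform(size=(99, 3))                    # stand-in predictor matrix
y = rng.uniform(size=99)                         # stand-in execution times

X_train, y_train = X[:70], y[:70]
X_test, y_test = X[70:], y[70:]
```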


36 Identified Predictors
- X_1 (cost per byte): byte count in the main body of the JPEG.
- X_2 (cost per pixel block): pixel-block count for those blocks whose 0-th DCT coefficient was correctly predicted.
- X_3 (cost per MCU block): number of elements in the color format (e.g., 5 for color, 0 for monochromatic).
- X_4: number of padded image dimensions, i.e., those not exactly proportional to the MCU size; such dimensions imply less processing/data copying for partial MCU blocks.
- X_5: total number of MCUs in monochromatic images; not costly in terms of the bytes needed for encoding (the contribution of costly blocks is captured by X_1); complementary to X_3.


38 Results
Basic model (p = 1, α = 0.05): Y = β_0 + ɛ; error ɛ^+ = 6650 Mcycles; pragmatic MET 8000 Mcycles, which under-estimates the maximal measured time (23,600 Mcycles).
Our method (p = 6, α = 0.05): Y = β_0 + Σ_{i=1..6} β_i X_i + ɛ.
Without outlier detection: error ɛ^+ = 240; the pragmatic MET still under-estimates on the test set.
With outlier detection: error ɛ^+ = 52; the pragmatic MET over-estimates for all tests.

39 Results
Our method (p = 8): Y = β_0 + Σ_{i=1..8} β_i X_i + ɛ; error ɛ^+ = 35; pragmatic MET computed for α = 0.05 and for a second α value.
Figure: residuals in the training set (Mcycles). Figure: actual vs. predicted execution times (Mcycles) on the test set for the two values of α.


41 Summary
Problem: computing a probabilistic upper bound on embedded Sw execution time.
Approach: for non-safety-critical systems with soft real-time constraints; statistical, based on Linear Regression; measurement-based.
Contributions: the Maximal Regression Model (LR + confidence intervals); construction techniques (step-wise regression, Cook's distance, etc.); an integrated design flow; validation on a real-life system.

42 Future Work
The pragmatic MET is pessimistic: it is likely to incur extra over-estimation by including unfeasible execution paths. E.g., for switch/case branching it associates a separate predictor with every case and assumes that all of them take their maximal values simultaneously. Combining it with the implicit path enumeration technique (IPET) would keep only realistic, feasible execution paths.
Further directions: model Hw effects using specially defined predictors; investigate a possible connection to Extreme Value Theory.

Full versions of the paper: "Regression-based Statistical Bounds on Software Execution Time", Peter Poplavko, Ayoub Nouri, Lefteris Angelis, Alexandros Zerzelidis, Saddek Bensalem, and Panagiotis Katsaros, Univ. Grenoble Alpes; also available as Verimag Research Report TR-2016-7.


More information

Analysing data: regression and correlation S6 and S7

Analysing data: regression and correlation S6 and S7 Basic medical statistics for clinical and experimental research Analysing data: regression and correlation S6 and S7 K. Jozwiak k.jozwiak@nki.nl 2 / 49 Correlation So far we have looked at the association

More information

Integrating Cache Related Preemption Delay Analysis into EDF Scheduling

Integrating Cache Related Preemption Delay Analysis into EDF Scheduling Integrating Cache Related Preemption Delay Analysis into EDF Scheduling Will Lunniss 1 Sebastian Altmeyer 2 Claire Maiza 3 Robert I. Davis 1 1 Real-Time Systems Research Group, University of York, UK {wl510,

More information

Measuring the fit of the model - SSR

Measuring the fit of the model - SSR Measuring the fit of the model - SSR Once we ve determined our estimated regression line, we d like to know how well the model fits. How far/close are the observations to the fitted line? One way to do

More information

Lectures on Simple Linear Regression Stat 431, Summer 2012

Lectures on Simple Linear Regression Stat 431, Summer 2012 Lectures on Simple Linear Regression Stat 43, Summer 0 Hyunseung Kang July 6-8, 0 Last Updated: July 8, 0 :59PM Introduction Previously, we have been investigating various properties of the population

More information

STAT 111 Recitation 7

STAT 111 Recitation 7 STAT 111 Recitation 7 Xin Lu Tan xtan@wharton.upenn.edu October 25, 2013 1 / 13 Miscellaneous Please turn in homework 6. Please pick up homework 7 and the graded homework 5. Please check your grade and

More information

The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems

The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems Sanjoy Baruah 1 Vincenzo Bonifaci 2 3 Haohan Li 1 Alberto Marchetti-Spaccamela 4 Suzanne Van Der Ster

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

1 Introduction to Minitab

1 Introduction to Minitab 1 Introduction to Minitab Minitab is a statistical analysis software package. The software is freely available to all students and is downloadable through the Technology Tab at my.calpoly.edu. When you

More information

23. Inference for regression

23. Inference for regression 23. Inference for regression The Practice of Statistics in the Life Sciences Third Edition 2014 W. H. Freeman and Company Objectives (PSLS Chapter 23) Inference for regression The regression model Confidence

More information

Small Area Confidence Bounds on Small Cell Proportions in Survey Populations

Small Area Confidence Bounds on Small Cell Proportions in Survey Populations Small Area Confidence Bounds on Small Cell Proportions in Survey Populations Aaron Gilary, Jerry Maples, U.S. Census Bureau U.S. Census Bureau Eric V. Slud, U.S. Census Bureau Univ. Maryland College Park

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Su Liu 1, Alexandros Papakonstantinou 2, Hongjun Wang 1,DemingChen 2

Su Liu 1, Alexandros Papakonstantinou 2, Hongjun Wang 1,DemingChen 2 Real-Time Object Tracking System on FPGAs Su Liu 1, Alexandros Papakonstantinou 2, Hongjun Wang 1,DemingChen 2 1 School of Information Science and Engineering, Shandong University, Jinan, China 2 Electrical

More information

Math 2311 Written Homework 6 (Sections )

Math 2311 Written Homework 6 (Sections ) Math 2311 Written Homework 6 (Sections 5.4 5.6) Name: PeopleSoft ID: Instructions: Homework will NOT be accepted through email or in person. Homework must be submitted through CourseWare BEFORE the deadline.

More information

Runtime Model Predictive Verification on Embedded Platforms 1

Runtime Model Predictive Verification on Embedded Platforms 1 Runtime Model Predictive Verification on Embedded Platforms 1 Pei Zhang, Jianwen Li, Joseph Zambreno, Phillip H. Jones, Kristin Yvonne Rozier Presenter: Pei Zhang Iowa State University peizhang@iastate.edu

More information

Information Theory, Statistics, and Decision Trees

Information Theory, Statistics, and Decision Trees Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010

More information

An Introduction to Sparse Approximation

An Introduction to Sparse Approximation An Introduction to Sparse Approximation Anna C. Gilbert Department of Mathematics University of Michigan Basic image/signal/data compression: transform coding Approximate signals sparsely Compress images,

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks Delivered by Mark Ebden With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007

STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 STA 302 H1F / 1001 HF Fall 2007 Test 1 October 24, 2007 LAST NAME: SOLUTIONS FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 302 STA 1001 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator.

More information

STAT 704 Sections IRLS and Bootstrap

STAT 704 Sections IRLS and Bootstrap STAT 704 Sections 11.4-11.5. IRLS and John Grego Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1 / 14 LOWESS IRLS LOWESS LOWESS (LOcally WEighted Scatterplot Smoothing)

More information

Recent Developments in Compressed Sensing

Recent Developments in Compressed Sensing Recent Developments in Compressed Sensing M. Vidyasagar Distinguished Professor, IIT Hyderabad m.vidyasagar@iith.ac.in, www.iith.ac.in/ m vidyasagar/ ISL Seminar, Stanford University, 19 April 2018 Outline

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

ANOVA: Analysis of Variation

ANOVA: Analysis of Variation ANOVA: Analysis of Variation The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative variables depend on which group (given by categorical

More information

2. the basis functions have different symmetries. 1 k = 0. x( t) 1 t 0 x(t) 0 t 1

2. the basis functions have different symmetries. 1 k = 0. x( t) 1 t 0 x(t) 0 t 1 In the next few lectures, we will look at a few examples of orthobasis expansions that are used in modern signal processing. Cosine transforms The cosine-i transform is an alternative to Fourier series;

More information

STAT 4385 Topic 03: Simple Linear Regression

STAT 4385 Topic 03: Simple Linear Regression STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis

More information

The Perceptron algorithm

The Perceptron algorithm The Perceptron algorithm Tirgul 3 November 2016 Agnostic PAC Learnability A hypothesis class H is agnostic PAC learnable if there exists a function m H : 0,1 2 N and a learning algorithm with the following

More information

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015

Sequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015 Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative

More information

Single and multiple linear regression analysis

Single and multiple linear regression analysis Single and multiple linear regression analysis Marike Cockeran 2017 Introduction Outline of the session Simple linear regression analysis SPSS example of simple linear regression analysis Additional topics

More information

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections 3.4 3.6 by Iain Pardoe 3.4 Model assumptions 2 Regression model assumptions.............................................

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com

on a per-coecient basis in large images is computationally expensive. Further, the algorithm in [CR95] needs to be rerun, every time a new rate of com Extending RD-OPT with Global Thresholding for JPEG Optimization Viresh Ratnakar University of Wisconsin-Madison Computer Sciences Department Madison, WI 53706 Phone: (608) 262-6627 Email: ratnakar@cs.wisc.edu

More information

IMAGE COMPRESSION-II. Week IX. 03/6/2003 Image Compression-II 1

IMAGE COMPRESSION-II. Week IX. 03/6/2003 Image Compression-II 1 IMAGE COMPRESSION-II Week IX 3/6/23 Image Compression-II 1 IMAGE COMPRESSION Data redundancy Self-information and Entropy Error-free and lossy compression Huffman coding Predictive coding Transform coding

More information

Homework 1: Solutions

Homework 1: Solutions Homework 1: Solutions Statistics 413 Fall 2017 Data Analysis: Note: All data analysis results are provided by Michael Rodgers 1. Baseball Data: (a) What are the most important features for predicting players

More information