Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Similar documents
Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Caches in WCET Analysis

Timing analysis and timing predictability

Design and Analysis of Time-Critical Systems Cache Analysis and Predictability

Toward Precise PLRU Cache Analysis

Embedded Systems 23 BF - ES

Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources

ICS 233 Computer Architecture & Assembly Language

A Definition and Classification of Timing Anomalies

Timing analysis and predictability of architectures

Task Models and Scheduling

Evaluation and Validation

Scheduling Periodic Real-Time Tasks on Uniprocessor Systems. LS 12, TU Dortmund

A framework for the timing analysis of dynamic branch predictors

Non-Preemptive and Limited Preemptive Scheduling. LS 12, TU Dortmund

Theory of Computer Science. Theory of Computer Science. D7.1 Introduction. D7.2 Turing Machines as Words. D7.3 Special Halting Problem

Fall 2008 CSE Qualifying Exam. September 13, 2008

ECE 571 Advanced Microprocessor-Based Design Lecture 10

Unit 6: Branch Prediction

Special Nodes for Interface

Portland State University ECE 587/687. Branch Prediction

CS20a: Turing Machines (Oct 29, 2002)

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

Non-Work-Conserving Non-Preemptive Scheduling: Motivations, Challenges, and Potential Solutions

COSE312: Compilers. Lecture 17 Intermediate Representation (2)

ECS 120 Lesson 23 The Class P

Real-Time Calculus. LS 12, TU Dortmund

Caches in WCET Analysis Predictability, Competitiveness, Sensitivity

Generalizing Data-flow Analysis

V Honors Theory of Computation

Lecture notes on Turing machines

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Abstractions and Decision Procedures for Effective Software Model Checking

Binary Decision Diagrams and Symbolic Model Checking

THE ZCACHE: DECOUPLING WAYS AND ASSOCIATIVITY. Daniel Sanchez and Christos Kozyrakis Stanford University

Decidability (What, stuff is unsolvable?)

Complexity: Some examples

CSCI-564 Advanced Computer Architecture

Disconnecting Networks via Node Deletions

Algorithmic verification

Software Verification

Real-Time Scheduling and Resource Management

Multiprocessor Scheduling I: Partitioned Scheduling. LS 12, TU Dortmund

3/11/18. Final Code Generation and Code Optimization

CMP 338: Third Class

(a) Definition of TMs. First Problem of URMs

On the analysis of random replacement caches using static probabilistic timing methods for multi-path programs

Compiling Techniques

Schedulability and Optimization Analysis for Non-Preemptive Static Priority Scheduling Based on Task Utilization and Blocking Factors

CEC 450 Real-Time Systems

CSE 200 Lecture Notes Turing machine vs. RAM machine vs. circuits

Time and Schedulability Analysis of Stateflow Models

[2] Predicting the direction of a branch is not enough. What else is necessary?

Part I: Definitions and Properties

19. Extending the Capacity of Formal Engines. Bottlenecks for Fully Automated Formal Verification of SoCs

Decision Procedures An Algorithmic Point of View

Accelerating Decoupled Look-ahead to Exploit Implicit Parallelism

Mixed Criticality in Safety-Critical Systems. LS 12, TU Dortmund

Schedulability of Periodic and Sporadic Task Sets on Uniprocessor Systems

Mm7 Intro to distributed computing (jmp) Mm8 Backtracking, 2-player games, genetic algorithms (hps) Mm9 Complex Problems in Network Planning (JMP)

ENEE350 Lecture Notes-Weeks 14 and 15

Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units

CMP N 301 Computer Architecture. Appendix C

CPU SCHEDULING RONG ZHENG

Cache-Oblivious Algorithms

Processor Design & ALU Design

Outline. policies for the first part. with some potential answers... MCS 260 Lecture 10.0 Introduction to Computer Science Jan Verschelde, 9 July 2014

State Machines. Example FSM: Roboant

Cache-Oblivious Algorithms

There are two main techniques for showing that problems are undecidable: diagonalization and reduction

Fall 2011 Prof. Hyesoon Kim

Proportional Share Resource Allocation Outline. Proportional Share Resource Allocation Concept

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Turing Machines, diagonalization, the halting problem, reducibility

Branch Prediction using Advanced Neural Methods

Intelligent Agents. Formal Characteristics of Planning. Ute Schmid. Cognitive Systems, Applied Computer Science, Bamberg University

Data Structures in Java

1 Reals are Uncountable

Introduction to Computer Science and Programming for Astronomers

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

Analysis of Algorithms [Reading: CLRS 2.2, 3] Laura Toma, csci2200, Bowdoin College

CPSC 421: Tutorial #1

Efficient TDM-based Arbitration for Mixed-Criticality Systems on Multi-Cores

Computational Models Lecture 8 1

CS154, Lecture 10: Rice s Theorem, Oracle Machines

/ : Computer Architecture and Design

Microprocessor Power Analysis by Labeled Simulation

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Static Program Analysis using Abstract Interpretation

[2] Predicting the direction of a branch is not enough. What else is necessary?

Automatic Resource Bound Analysis and Linear Optimization. Jan Hoffmann

Performance Prediction Models

CS20a: Turing Machines (Oct 29, 2002)

COVER SHEET: Problem#: Points

Unit 1A: Computational Complexity

CSE 105 THEORY OF COMPUTATION. Spring 2018 review class

Undecidable Problems. Z. Sawa (TU Ostrava) Introd. to Theoretical Computer Science May 12, / 65

Input Decidable Language -- Program Halts on all Input Encoding of Input -- Natural Numbers Encoded in Binary or Decimal, Not Unary

Further discussion of Turing machines

CSE 105 THEORY OF COMPUTATION

Transcription:

Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 09/10, Jan., 2018 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 43

Most Essential Assumptions for Real-Time Systems Upper bound on the execution times: Commonly, called the Worst-Case Execution Time(WCET) Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 2 / 43

What does Execution Time Depend on Input parameters Algorithm parameters Problem size etc. Initial states and intermediate states of the system for executing Cache configuration, replacement policies Pipelines Speculations etc. Interferences from the environment Scheduling Interrupts etc. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 3 / 43

How to Derive the Worst-Case Execution Time (WCET) Most of industry s best practice Measure it: determine WCET directly by running or simulating a set of inputs. There is no guarantee to give an upper bound of the WCET. The derived WCET could be too optimistic. Exhaustive execution: by considering the set of all the possible inputs In general, not possible The inputs have to cover all the possible initial states and intermediate states of the system, which is also usually not possible. Compute it In general, not possible neither, as computing (tight) WCET for a program is uncomputable by Turing machines. Based on some structures, it is possible and the derived solution is a safe upper bound of the WCET. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 4 / 43

Why is It Uncomputable? Halting Problem Given the description of a Turing machine m and its input x, the problem is to answer whether the machine halts on x. Theorem Halting Problem is undecidable (uncomputable). In other words, one cannot use an algorithm to decide whether another algorithm m halts on a specific input. WCET is undecidable It is even undecidable if it terminates at all. Deriving the WCET is of course of undecidable. Please refer to the textbook of Computational Complexity by Prof. Papadimitriou. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 5 / 43

Execution Time Distribution Our objectives: Upper bound of execution time as tightly as possible. All control-flow paths, by considering all possible inputs. All paths through the architecture, resulting from the potential initial and assumed intermediate architectural states. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 6 / 43

Timing Analysis By considering systems, in general, with finite architectural configurations, finite input domains, and bounded loops and recursion, WCET is computable. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 7 / 43

Timing Analysis By considering systems, in general, with finite architectural configurations, finite input domains, and bounded loops and recursion, WCET is computable. But..., the search space is too large to explore it exhaustively! Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 7 / 43

Why is It Hard for Analyzing WCET? Execution time e(i) of machine instruction i In the good old time: e(i) is a constant c, which could be found in the data sheet Nowadays, especially for high-performance processors: e(i) also depends on the (architectural) execution state s. min{e(i, s) s S} e(i) max{e(i, s) s S}, where S is the set of all states. Using max{e(i, s) s S} is safe for WCET, but might be not tight since some states in S might not be possible reached by some inputs. Execution history, resulting in a smaller set of reachable execution states, has to be enforced to improve the tightness of the analysis. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 8 / 43

Variability of Execution Times Wilhelm et al. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 9 / 43

Timing Accidents and Penalties Timing Accident: cause for an increase of the execution time of an instruction Timing Penalty: the associated increase Types of timing accidents Cache misses Pipeline stalls Branch mispredictions Bus collisions Memory refresh of DRAM TLB miss Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 10 / 43

Overall Approach: Modularization Architecture Analysis: Use Abstract Interpretation. Exclude as many Timing Accidents as possible during analysis. Certain timing accidents will never happen, e.g., at a certain program point, instruction fetch will never cause a cache miss. The more accidents excluded, the lower (better) the upper bound. Determine WCET for basic blocks, based on context information. Worst-Case Path Determination: Map control flow graph to an Integer Linear Program (ILP). Determine upper bound and associated path. High-Level Objectives: Upper Bound of WCET It must be safe, i.e., not underestimate. It should be tight, i.e., not far away from real WCET. The analysis effort must be tolerable. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 11 / 43

Overall Structure Executable Binary Program Control Flow Graph (CFG) Reconstruction Loop Analysis and Unfolding Loop Bounds Static Analysis Value Analyzer Micro Architecture Abstraction Path Analysis ILP Generator Cache/Pipeline Analyzer Timing Information ILP Solver Micro architecture Analysis WCET Visualization and Analysis Evaluation Worst Case Path Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 12 / 43

Outline Introduction Program Path Analysis Static Analysis Value Analysis Cache Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 13 / 43

Control Flow Graph (CFG) Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 14 / 43

Basic Blocks Definition: A basic block is a sequence of instructions where the control flow enters at the beginning and exits at the end, in which it is highly amenable to analysis. a[0] := b[0] + c[0] a[1] := b[3] + c[3] a[2] := b[6] + c[6] d := a[0] a[1] e := d/a[2] if e < 10 goto L Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 15 / 43

Basic Blocks Definition: A basic block is a sequence of instructions where the control flow enters at the beginning and exits at the end, in which it is highly amenable to analysis. Determining the basic blocks Beginning: the first instruction targets of un/conditional jumps instructions that follow un/conditional jumps a[0] := b[0] + c[0] a[1] := b[3] + c[3] a[2] := b[6] + c[6] d := a[0] a[1] e := d/a[2] if e < 10 goto L Ending: the basic block consists of the block beginning and runs until the next block beginning (exclusive) or until the program ends Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 15 / 43

Program Path Analysis Problem: Which sequence of instructions is executed in the worst case (i.e., the longest execution time)? Input: Timing information for each basic block, derived from static analysis (value/cache/pipeline analysis) Loop bounds by specification CFG derived from the executable binary program Basic Concept: Transform structure of CFG into a set of (integer) linear equations Solution of the Integer Linear Program (ILP) yields bound on the WCET. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 16 / 43

Program Path Analysis: Formal Definition Input A CFG with N basic blocks, in which each basic block B i has a worst-case execution time c i, given by static analysis. Output Suppose that each block B i is executed exactly x i times. What is the worst-case execution time WCET = N c i x i, i=1 such that the values of x i s satisfy the structural constraints in the CFG? Note that additional constraints provided by the programmer (bounds for loop counters, etc.) can also be included. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 17 / 43

Example for CFG Constraints d9 d1 s=k; d2 while (k < 20) d3 if (ok) d4 d5 j++; j=0; ok=true; d8 Flow equations: (x i is a variable) d 1 = d 2 = x 1 d 3 + d 9 = d 2 + d 8 = x 2 d 4 + d 5 = d 3 = x 3 d 6 + d 7 = d 8 = x 4 d 4 = d 6 = x 5 d 5 = d 7 = x 6 d6 k++; d7 r=j d10 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 18 / 43

Example for Additional Constraints d9 d1 s=k; d2 while (k < 20) d3 if (ok) d4 d5 j++; j=0; ok=true; d6 d7 k++; d8 The loop is executed for at most 20 times when k is initialized with a non-negative number: x 3 20x 1. The basic block for j = 0; ok = true; is executed for at most one time: r=j d10 x 6 x 1. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 19 / 43

WCET: ILP Formulation maximize N c i x i i=1 such that d 1 = 1 j in(b i ) d j = k out(b j ) d k = x i, additional linear constraints i = 1,..., N Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 20 / 43

Outline Introduction Program Path Analysis Static Analysis Value Analysis Cache Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 21 / 43

Overall Structure Executable Binary Program Control Flow Graph (CFG) Reconstruction Loop Analysis and Unfolding Loop Bounds Static Analysis Value Analyzer Micro Architecture Abstraction Path Analysis ILP Generator Cache/Pipeline Analyzer Timing Information ILP Solver Micro architecture Analysis WCET Visualization and Analysis Evaluation Worst Case Path Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 22 / 43

Outline Introduction Program Path Analysis Static Analysis Value Analysis Cache Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 23 / 43

Value Analysis: Motivation and Method Motivation Provide access information to data-cache/pipeline analysis Detect infeasible paths Derive loop bounds Method Calculate intervals at all program points By considering addresses, register contents, local and global variables. Abstract Interpretation Perform the program s computation using value descriptions or abstract values in place of the concrete values. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 24 / 43

Outline Introduction Program Path Analysis Static Analysis Value Analysis Cache Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 25 / 43

Overall Structure Executable Binary Program Control Flow Graph (CFG) Reconstruction Loop Analysis and Unfolding Loop Bounds Static Analysis Value Analyzer Micro Architecture Abstraction Path Analysis ILP Generator Cache/Pipeline Analyzer Timing Information ILP Solver Micro architecture Analysis WCET Visualization and Analysis Evaluation Worst Case Path Analysis Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 26 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy hit [ab] CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy hit [ab] a 1 CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy a 1! hit [ab] CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy c 3? miss [ab] CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy miss [ab] c 3? c? CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy miss [ac] c 3? c = c 1 c 2 c 3 c 4! CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy c 3! miss [ac] CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy c 4? hit [ac] CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Caches: Fast Memory to Deal with the Memory Wall How they work: dynamically managed by replacement policy c 4! hit [ac] CPU Cache Main Memory Capacity: Latency: 32 KB 3 cycles 2 MB 100 cycles Why they work: principle of locality spatial temporal Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 27 / 43

Fully Associative Caches B log2 (s) Address: b log2 (8 b) Block offset Tag Tag Tag... Data Block Data Block Tag Data Block =? Yes: Hit! k = associativity No: Miss! MUX Data Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 28 / 43

k s k s s Set-Associative Caches log2 (s) Address: Tag Index B log log22(s) (8 b)log2 (8 b) b Block offset Cache Set: Tag Tag Tag log2 (s) Special cases:... Data Block Data Block Cache Set:... Data Block log2 (8 b) Tag Tag... Data Block Data Block Tag Data Block =? Yes: Hit! k s MUX No: Miss! Data direct-mapped cache: only one line per cache set fully-associative cache: only one cache set Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 29 / 43

Replacement Policies Least-Recently-Used (LRU) used in Intel Pentium I and MIPS 24K/34K First-In First-Out (FIFO or Round-Robin) used in Motorola PowerPC 56x, Intel XScale, ARM9, ARM11 Pseudo-LRU (PLRU) used in Intel Pentium II-IV and PowerPC 75x Most Recently Used (MRU) as described in literature Each cache set is treated independently: Set-associative caches are compositions of fully-associative caches. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 30 / 43

Cache Analysis for LRU Two types of cache analyses: 1 Local guarantees: classification of individual accesses Must-Analysis Underapproximates cache contents May-Analysis Overapproximates cache contents 2 Global guarantees: bounds on cache hits/misses Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 31 / 43

Challenges for Cache Analysis read z Always a cache hit/always a miss? read y read x write z Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 32 / 43

Challenges for Cache Analysis read z Always a cache hit/always a miss? read y read x 1. Initial cache contents unknown. 2. Different paths lead to these points. write z 3. Cannot resolve address of z. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 32 / 43

Using Abstract Interpretation read z Collecting Semantics = set of states at each program point that any execution may encounter there read y read x Two approximations: Collecting Semantics uncomputable write z Cache Semantics computable γ(abstract Cache Sem.) efficiently computable Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 33 / 43

Using Abstract Interpretation read z Collecting Semantics = set of states at each program point that any execution may encounter there read y read x Two approximations: Collecting Semantics uncomputable write z Cache Semantics computable γ(abstract Cache Sem.) efficiently computable Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 33 / 43

Using Abstract Interpretation read z Collecting Semantics = set of states at each program point that any execution may encounter there read y read x Two approximations: Collecting Semantics uncomputable write z Cache Semantics computable γ(abstract Cache Sem.) efficiently computable Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 33 / 43

Using Abstract Interpretation read z Collecting Semantics = set of states at each program point that any execution may encounter there read y read x Two approximations: Collecting Semantics uncomputable write z Cache Semantics computable γ(abstract Cache Sem.) efficiently computable Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 33 / 43

Using Abstract Interpretation read z Collecting Semantics = set of states at each program point that any execution may encounter there read y read x Two approximations: Collecting Semantics uncomputable write z Cache Semantics computable γ(abstract Cache Sem.) efficiently computable Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 33 / 43

Least-Recently-Used (LRU): Concrete Behavior s Cache Miss : z y x t s z y x LRU has notion of age s Cache Hit : z y s t s z y t Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 34 / 43

LRU: Must-Analysis: Abstract Domain Used to predict cache hits. Maintains upper bounds on ages of memory blocks. Upper bound associativity memory block definitely cached. Example Abstract state: {x} age 0 {s,t} age 3... and its interpretation: Describes the set of all concrete cache states in which x, s, and t occur, x with an age of 0, s and t with an age not older than 2. γ([{x},, {s, t}, ]) = {[x, s, t, a], [x, t, s, a], [x, s, t, b],...} Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 35 / 43

Sound Update Local Consistency Abstract Update (must) (must ) γ Lifted Concrete Update γ concrete cache states concrete cache states Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 36 / 43

LRU: Must-Analysis: Update Potential Cache Miss : Definite Cache Hit : {x} {s,t} {x} {s,t} z s {z} {x} {s,t} {s} {x} {t} Why does not t age in the second case under Must-Cache analysis? Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 37 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: Must-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {c,f} {e} {a} {a,c} Intersection + Maximal Age How many memory blocks can be in the must-cache? Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 38 / 43

LRU: May-Analysis: Abstract Domain Used to predict cache misses. Maintains lower bounds on ages of memory blocks. Lower bound associativity memory block definitely not cached. Abstract state: {x,y} age 0 {s,t} {u} age 3 and its interpretation: Describes a set of all concrete cache states, where no memory blocks except x, y, s, t, and u occur, x and y with an age of at least 0, s and t with an age of at least 2, u with an age of at least 3. γ([{x, y},, {s, t}, {u}]) = {[x, y, s, t], [y, x, s, t], [x, y, s, u],...} Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 39 / 43

LRU: May-Analysis: Update Definite Cache Miss : Potential Cache Hit : {x} {s,t} {y} {x} {s,t} {y} z s {z} {x} {s,t} {s} {x} {y,t} Why does t age in the second case? Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 40 / 43

LRU: May-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {a,c} {c,f} {e} {a} = {e} {f} Union + Minimal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 41 / 43

LRU: May-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {a,c} {c,f} {e} {a} = {e} {f} Union + Minimal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 41 / 43

LRU: May-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {a,c} {c,f} {e} {a} = {e} {f} Union + Minimal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 41 / 43

LRU: May-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {a,c} {c,f} {e} {a} = {e} {f} Union + Minimal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 41 / 43

LRU: May-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {a,c} {c,f} {e} {a} = {e} {f} Union + Minimal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 41 / 43

LRU: May-Analysis: Join Need to combine information where control-flow merges. Join should be conservative: γ(a) γ(a B) γ(b) γ(a B) {a} {c} {a,c} {c,f} {e} {a} = {e} {f} Union + Minimal Age Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 41 / 43

Summary of WCET Analysis Value analysis Cache analysis using statically computed effective addresses and loop bounds Pipeline analysis Model CPU as a big state machine. There have been some new results published in 2015 from Prof. Jan Reineke s group from Uni des Saarlandes. Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 42 / 43

ait-tool Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 43 / 43