Space-aware data flow analysis


C. Bernardeschi, G. Lettieri, L. Martini, P. Masci
Dip. di Ingegneria dell'Informazione, Università di Pisa, Via Diotisalvi 2, 56126 Pisa, Italy
{cinzia,g.lettieri,luca.martini,paolo.masci}@iet.unipi.it

Technical Report. This paper may not be published or distributed, either entirely or in part, without the explicit permission of the authors.

1 Introduction

Data Flow Analysis (DFA for short) is a basic technique for collecting static information on the run-time behavior of programs: it is essential in optimizing compilers and is also used in type inference problems [1]. When performing a DFA, the program to be analysed is modeled by its Control Flow Graph (CFG) and by a set of transformation functions, one for each node in the graph. The set of program execution states is generally abstracted in a lattice. Each function models the effect of executing the corresponding program instruction on the (abstracted) program execution state. Then, starting from the initial node in the CFG and from an abstraction of the initial execution state, every possible execution path in the CFG should be tried, and the execution states encountered at each node collected. Standard DFA uses a fixed point iteration to find an approximation of this information.

DFA is expensive in time (because of the fixed point iteration), but also in space: a representation of the program execution state must be stored for each CFG node for the entire duration of the analysis. In this paper we propose a method to reduce space usage during some kinds of DFA. The method proceeds as follows: first, we define a formal proof system for sentences like "all execution paths that start at node n, with initial execution state f, and reach node m, have been tried"; the rules of the proof system deduce sentences about sets of paths, based on the validity of sentences on sub-paths. We show that every proof in our system contains an approximation of the information collected at nodes when all paths are executed. Then, we give an algorithm to find proofs in our system. The algorithm constructs the proof in a goal directed (or backward) fashion. The algorithm has the potential to save space since it can release some program state information each time it has constructed a proof for a sub-goal. This space saving is possible, for example, when we are interested in the (non)existence of a given execution state or in the collection of information only at some program points. In particular, Java bytecode verification fits these requirements, and the proposed algorithm can be used to develop a Verifier on memory limited embedded systems, e.g. Java cards [4].

2 Basic concepts

Definition 1 (Directed graph) A directed graph $G = (V, E)$ consists of a set of nodes $V$ together with a set of directed edges $E \subseteq V \times V$. The function $succ : V \to 2^V$ returns the successors of a node: $succ(n) = \{m \mid (n, m) \in E\}$.

Definition 2 (Path) A path from $n$ to $m$ is a sequence of nodes $n = n_1, \ldots, n_j = m$ such that $(n_i, n_{i+1}) \in E$ for $1 \le i < j$. Note that the null path $n$ is always a valid path from $n$ to $n$.

In the following, to denote a path $n_1, \ldots, n_j$ we will use the string $n_1 \cdots n_j$. We write $n \in p$ if $p = p' n p''$ for some possibly empty strings $p'$ and $p''$. Given a graph $G = (V, E)$ and a set of nodes $\Delta \subseteq V$, $G_\Delta$ is the subgraph of $G$ obtained by pruning the edges from each $n \in \Delta$ to its successors:

$$G_\Delta = (V,\ E \setminus \{(n, m) \mid n \in \Delta\})$$

The set of all paths from node $n$ to node $m$ in $G_\Delta$ is denoted as $n \Rightarrow^{\Delta}_{G} m$. We use the notation $n \Rightarrow^{\Delta} m$ instead of $n \Rightarrow^{\Delta}_{G} m$ when $G$ is clear from the context. Moreover, when $\Delta = \emptyset$ we write $n \Rightarrow m$.
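The pruned subgraph and its path sets can be made concrete with a small sketch. The graph `EDGES`, the helper names `prune` and `paths`, and the length bound `max_nodes` are assumptions introduced for illustration only; the bound is needed because a cyclic graph has infinitely many paths.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]

def succ(edges: Set[Edge], n: int) -> Set[int]:
    """succ(n) = {m | (n, m) in E}."""
    return {m for (a, m) in edges if a == n}

def prune(edges: Set[Edge], delta: Set[int]) -> Set[Edge]:
    """Edge set of G_Delta: drop every edge leaving a node of delta."""
    return {(a, b) for (a, b) in edges if a not in delta}

def paths(edges: Set[Edge], delta: Set[int], n: int, m: int,
          max_nodes: int) -> List[List[int]]:
    """Paths from n to m in G_Delta that contain at most max_nodes nodes."""
    pruned = prune(edges, delta)
    found: List[List[int]] = []

    def walk(path: List[int]) -> None:
        if path[-1] == m:
            found.append(path)          # the null path counts when n == m
        if len(path) < max_nodes:
            for nxt in sorted(succ(pruned, path[-1])):
                walk(path + [nxt])

    walk([n])
    return found

# Hypothetical graph with a cycle between nodes 2 and 3.
EDGES = {(1, 2), (2, 3), (3, 2), (2, 4)}
```

With `delta = {3}` the cycle through node 3 is cut, so only the path 1 2 4 reaches node 4, while node 3 itself can still be reached as the last node of a path.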

[Figure 1: (a) A control flow graph. (b) The control tree of the graph, listed here root first with children indented:]

  $1 \Rightarrow 5$
    $2 \Rightarrow^{\{1\}} 5$
      $3 \Rightarrow^{\{1,2\}} 5$
        $3 \Rightarrow^{\{1,2\}} 2$
          $2 \Rightarrow^{\{1,2,3\}} 2$
        $2 \Rightarrow^{\{1,2\}} 5$
      $5 \Rightarrow^{\{1,2\}} 5$
    $4 \Rightarrow^{\{1\}} 5$
      $5 \Rightarrow^{\{1,4\}} 5$

The CFG of a program is a directed graph containing the control dependences among the instructions of the program [1]. Each node in the CFG represents an instruction or a basic block.

Definition 3 (CFG) A CFG is a 4-tuple $(V, E, start, end)$ where $(V, E)$ is a directed graph, $start, end \in V$, $succ(end) = \emptyset$ and there is a path from $start$ to every node in $V$.

An example of control flow graph is shown in Fig. 1(a), where $start$ is node 1 and $end$ is node 5. The immediate postdominator of a node $n$ is the first node common to all the paths from $n$ to $end$.

Definition 4 (Postdomination) Let $n, m$ be nodes in a CFG: $m$ postdominates $n$, denoted by $m\ pd\ n$, if $m$ is on every path from $n$ to $end$. We write $m\ pd^+\ n$ iff $m\ pd\ n$ and $m \neq n$. Moreover, $m$ immediately postdominates $n$, denoted by $m = ipd(n)$, if $m\ pd^+\ n$ and there is no node $r$ such that $m\ pd^+\ r\ pd^+\ n$.

For example, in Fig. 1(a), $ipd(1) = ipd(2) = ipd(4) = 5$ and $ipd(3) = 2$. In the following, we assume that $end$ is reachable from any other node of the CFG. As a consequence, the immediate postdominator always exists for each node different from $end$.

Lemma 1 ($pd$ and $pd^+$ transitivity) Let $n_1, n_2, n_3$ be nodes in a CFG: (i) if $n_1\ pd\ n_2 \wedge n_2\ pd\ n_3$, then $n_1\ pd\ n_3$; (ii) if $n_1\ pd^+\ n_2 \wedge n_2\ pd^+\ n_3$, then $n_1\ pd^+\ n_3$.

Proof. (i) Omitted for brevity. (ii) Since $pd^+$ implies $pd$, by (i) we have $n_1\ pd\ n_3$. It remains to show that $n_1 \neq n_3$. By contradiction, assume $n_1 = n_3$. We have $n_1\ pd^+\ n_2 \wedge n_2\ pd^+\ n_1$. Take now a path $p_1$ from $n_1$ to $end$ in which $n_1$ occurs only as the first node (one exists: cut any path from $n_1$ to $end$ at its last occurrence of $n_1$). Since $n_2\ pd^+\ n_1$, we have $n_2 \in p_1$. Now take the subpath $p_2$ of $p_1$ from the last occurrence of $n_2$ to $end$. Then $p_2 \in n_2 \Rightarrow^{\{n_1\}} end$ and $n_1 \notin p_2$, contradicting $n_1\ pd^+\ n_2$.

2.1 Collecting abstract execution states

DFA calculates a representation of the collecting semantics of an abstract interpretation of a program [2]. For our purposes, we can define abstract interpretation and collecting semantics in the following way. We assume a finite complete lattice $(F, \sqsubseteq)$ ranged over by $f, f', g, \ldots$, partially ordered by $\sqsubseteq$. Given $f, g \in F$, $f \sqsubset g$ means $f \sqsubseteq g$ and $f \neq g$. Moreover, $f \sqcup g$ denotes the least upper bound of $f$ and $g$. We use $\bot$ to denote the least element in $F$.

Given a CFG $= (V, E, start, end)$, an abstract execution state is denoted as $n{:}f$, with $n \in V$ and $f \in F$, where $f$ represents the per-node information needed by the analysis (frame, from now on). For each node $n$ in the CFG, we introduce a transformation function $I_n : F \to F$. The transformation function models the effect of the execution of the instruction at node $n$ on the frame part of the abstract execution state. Given an abstract execution state $n{:}f$, the abstract interpretation of the CFG produces a set of next abstract execution states, given by $m{:}I_n(f)$ for each $m \in succ(n)$. This gives an abstract execution tree, rooted at $start{:}f_0$, where $f_0$ models the initial program state. We assume that, $\forall n$, $I_n$ is a monotonic function. Given a path $p = n_1 \cdots n_k$, we denote by $I_p$ the composition of the transformation functions of the path: $\forall f$, $I_p(f) = I_{n_k}(\ldots(I_{n_1}(f))\ldots)$. If $p$ is empty, $I_p$ is the identity function. The collecting semantics associates to each node of the CFG the set of frames that appear at that node in the abstract execution tree.
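As an illustration of frames, transformation functions and their composition along a path, the following sketch uses a hypothetical lattice (finite sets of variable names ordered by inclusion) and hypothetical functions `I[1]`, `I[2]`, `I[3]` over a three-node graph `SUCC`. None of these names come from the paper; they are assumptions chosen so that every function is monotonic.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Frame = FrozenSet[str]          # a frame is a set of variable names

# Hypothetical successor map and transformation functions I_n.
SUCC: Dict[int, Set[int]] = {1: {2, 3}, 2: {3}, 3: set()}
I: Dict[int, Callable[[Frame], Frame]] = {
    1: lambda f: f | {"x"},     # node 1 adds x
    2: lambda f: f | {"y"},     # node 2 adds y
    3: lambda f: f - {"x"},     # node 3 removes x
}

def apply_path(path: List[int], f: Frame) -> Frame:
    """I_p(f): apply I_n for each node n of the path, first node first."""
    for n in path:
        f = I[n](f)
    return f                    # the empty path leaves f unchanged

def next_states(n: int, f: Frame) -> Set[Tuple[int, Frame]]:
    """Next abstract execution states m:I_n(f), one per m in succ(n)."""
    g = I[n](f)
    return {(m, g) for m in SUCC[n]}
```

Repeatedly calling `next_states` from `start:f_0` unfolds the abstract execution tree; collecting the frames seen at each node gives the collecting semantics described above.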
We are interested in all frames that appear in paths that start at a given node, and end in any node belonging to a given set. To this end, we introduce the following definition:

Definition 5 ($N^{\Delta}_{n:f}$) Let $G = (V, E)$ with $\Delta \subseteq V$. $N^{\Delta}_{n:f} \subseteq V \times F$ is defined as the set of pairs such that: $m{:}g \in N^{\Delta}_{n:f}$ iff $\exists\, pm \in n \Rightarrow^{\Delta} m$ and $g = I_p(f)$.

Note that $N^{\emptyset}_{start:f_0}$ is the collecting semantics of the program. The following lemma can

be easily proved:

Lemma 2 Let $\Delta_1, \Delta_2 \subseteq V$, $n \in V$, $f \in F$. If $\Delta_1 \subseteq \Delta_2$ then $N^{\Delta_2}_{n:f} \subseteq N^{\Delta_1}_{n:f}$.

The following relation is defined on sets of abstract states:

Definition 6 ($\sqsubseteq$) Given $A, B \subseteq V \times F$, $A \sqsubseteq B$ iff $\forall n{:}g \in A$, $\exists n{:}g' \in B$ such that $g \sqsubseteq g'$.

Note that $A \subseteq B$ implies $A \sqsubseteq B$. The following lemma can be proved by monotonicity of $I_n$.

Lemma 3 Let $\Delta \subseteq V$, $n \in V$ and $f, f' \in F$ with $f \sqsubseteq f'$. Then $N^{\Delta}_{n:f} \sqsubseteq N^{\Delta}_{n:f'}$.

3 A formal proof system

In this section we give a formal proof system for sentences of the form $\Gamma \vdash n{:}f \leadsto m{:}g$, where the partial function $\Gamma : V \to F$ represents the constraints, $n \in V$ is the source node, $f \in F$ is the source frame, $m \in V$ is the target node, and $g \in F$ is the target frame. Assume first that the constraints are empty ($dom(\Gamma) = \emptyset$). In this scenario, the sentence means that, in all execution paths that start with execution state $n{:}f$ and end in node $m$, the execution arrives at node $m$ with a frame $g' \sqsubseteq g$. If the constraints are not empty, the sentence has the same meaning as before for the paths that never reach a node in $dom(\Gamma)$. For the paths that reach a node $n' \in dom(\Gamma)$, instead, the sentence says that, the first time node $n'$ is reached, the execution state is $n'{:}f'$, with $f' \sqsubseteq \Gamma(n')$. In this case, nothing is said on the rest of the path (including the target frame).

The general form of a rule in the formal proof system is a number of premise sentences above a horizontal line with a single conclusion sentence below the line. The number of premises may be zero (axiom). Each rule asserts that the conclusion sentence is valid if all the premise sentences are valid and the side conditions are met. Fig. 2 reports a formal proof system for sentences. In the rules, the notation $\Gamma, n{:}f$ (with $n \notin dom(\Gamma)$) represents the function $\Gamma'$ such that $\Gamma'(n) = f$, and $\Gamma'(n') = \Gamma(n')$, $\forall n' \in dom(\Gamma)$. Rule NULL considers the case when the target node is the same as the source node (and the source node is not in the constraints). In this case, there is

only one path to consider: the null path that starts and ends in the source node. Here the sentence is clearly true if the target frame is equal to the source frame.

$$\text{NULL}\quad \frac{}{\Gamma \vdash n{:}f \leadsto n{:}f} \quad \text{if } n \notin dom(\Gamma)$$

$$\text{CYCLE}\quad \frac{}{\Gamma \vdash n{:}f \leadsto m{:}\bot} \quad \text{if } n \in dom(\Gamma),\ f \sqsubseteq \Gamma(n)$$

$$\text{IPD}\quad \frac{\Gamma, n{:}f' \vdash m_1{:}I_n(f') \leadsto ipd(n){:}f_1 \quad \cdots \quad \Gamma, n{:}f' \vdash m_s{:}I_n(f') \leadsto ipd(n){:}f_s}{\Gamma \vdash n{:}f \leadsto ipd(n){:}\bigsqcup_{1 \le i \le s} f_i} \quad \text{if } f \sqsubseteq f',\ n \notin dom(\Gamma) \quad (succ(n) = \{m_1, \ldots, m_s\})$$

$$\text{PD}\quad \frac{\Gamma \vdash n{:}f \leadsto ipd(n){:}f'' \qquad \Gamma \vdash ipd(n){:}f'' \leadsto m{:}f'}{\Gamma \vdash n{:}f \leadsto m{:}f'} \quad \text{if } m\ pd^+\ n,\ m \neq ipd(n),\ n \notin dom(\Gamma)$$

Figure 2: A formal proof system

Rule CYCLE says that, if the source node is in the constraints, the sentence is true, provided that the source frame verifies the constraint. The truth of this sentence could be asserted for any target frame. However, the axiom only considers the frame $\bot$, to simplify the formal proof system. Rule IPD considers the case when the target node is the immediate postdominator of the source node. The rule says that the sentence $s$ is true if there exists a frame $f'$ such that all the premises are true. Each premise is a new sentence, where the source node is a successor of the source node of $s$, and the target node is the same as in $s$. In each premise, a constraint of $f'$ on the source node of $s$ is added. Rule PD considers the case when the target node postdominates the source node, but it is not the ipd of the source node. The rule says that the sentence is true if we can prove a similar sentence on all the paths from the source node to its ipd, and then from the ipd to the target node.

Note that the soundness of the formal proof system could be preserved by choosing, in each rule, target frames greater than, or equal to, the ones given in Fig. 2. The rules are given in the present form to increase the accuracy (i.e. to prove Theorem 5).

A derivation tree is a tree of sentences with leaves at the top and a root at the bottom, where each sentence is obtained from the ones immediately above it by some rule of the proof system. A sentence is derivable if it is the root of a derivation. Note that a given sentence may be the root of more than one derivation. In particular, there may be different premises that meet the requirements of rule IPD, each with a different value for $f'$. Each derivation tree $T$ has the form:

$$T = \frac{T_1 \ \cdots \ T_n}{\Gamma \vdash n{:}f \leadsto m{:}f'}$$

where $T_1, \ldots, T_n$ are (possibly empty) derivation trees. We use $\lfloor T \rfloor$ to indicate the root of the tree (the conclusion sentence). We use $\lceil T \rceil$ to indicate the set of derivations that are a proof for the premise sentences of $T$. That is, $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto m{:}f'$ and $\lceil T \rceil = \{T_1, \ldots, T_n\}$. We are interested in properties of entire derivations, rather than in properties of derivable sentences. We write $\widehat{T}$ to indicate the set of pairs $n{:}f$ such that $n{:}f$ appears as the source of at least one sentence of $T$. The set $\widehat{T}$ can be recursively defined as follows: (i) $\widehat{\emptyset} = \emptyset$; (ii) given $T$ with $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto m{:}f'$, $\widehat{T} = \bigcup_{T' \in \lceil T \rceil} \widehat{T'} \cup \{n{:}f\}$.

The following lemma states that the frame $\Gamma(n)$ in a derivable sentence is an upper bound for all the frames paired with node $n$ in all the proofs of the sentence. The proof is by structural induction on the shape of the derivation.

Lemma 4 Given a derivation tree $T$ with $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto m{:}f'$, $\forall \xi \in dom(\Gamma)$, if $\exists g$ such that $\xi{:}g \in \widehat{T}$, then $g \sqsubseteq \Gamma(\xi)$.

The following lemma states that the target frame of a derivable sentence is an upper bound of all the frames paired with the target node in all the proofs of the sentence. The proof is by structural induction on the shape of the derivation.

Lemma 5 Given $T$ with $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto m{:}f'$, if $\exists g$ such that $m{:}g \in \widehat{T}$, then $g \sqsubseteq f'$.

The following theorem shows that every proof for a given sentence $s$ contains a

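safe approximation of the collection of abstract execution states obtained starting from the source of $s$, and ending in the nodes that are either a constraint or the target of $s$. Below, $\lfloor T \rfloor$ is the root sentence of $T$, $\lceil T \rceil$ the set of its premise derivations, and $\widehat{T}$ the set of source pairs of $T$.

Theorem 1 (safety) Given $T$ with $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto m{:}f'$, it is $N^{dom(\Gamma) \cup \{m\}}_{n:f} \sqsubseteq \widehat{T}$.

Proof. The proof is by structural induction on the shape of the derivation.

NULL. Trivial.

CYCLE. Trivial.

IPD. Let $\Delta = dom(\Gamma) \cup \{ipd(n)\}$ and $\Delta' = \Delta \cup \{n\}$. Assume $succ(n) = \{m_1, \ldots, m_s\}$. It is $\forall T_i \in \lceil T \rceil$, $\lfloor T_i \rfloor = \Gamma, n{:}f' \vdash m_i{:}I_n(f') \leadsto ipd(n){:}f_i$ and $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto ipd(n){:}\bigsqcup_{1 \le i \le s} f_i$. The induction hypothesis is: $\forall i \in \{1, \ldots, s\}$, $N^{\Delta'}_{m_i:I_n(f')} \sqsubseteq \widehat{T_i}$. The induction thesis is: $N^{\Delta}_{n:f} \sqsubseteq \widehat{T}$. Consider $\xi{:}g \in N^{\Delta}_{n:f}$. By Definition 5, $\exists p \in n \Rightarrow^{\Delta} \xi$, with $p = q\xi$ and $I_q(f) = g$. If $q$ is empty, then it must be $\xi = n$ and $g = f$. Since $n{:}f$ is the source of $\lfloor T \rfloor$, $n{:}f \in \widehat{T}$. If $q$ is not empty, let $p = n p' \xi$ with $p'\xi \in m_j \Rightarrow^{\Delta} \xi$ for some $m_j \in succ(n)$. Now, two cases may arise.

Case 1. $n \notin p'$: then $p'\xi \in m_j \Rightarrow^{\Delta'} \xi$. By Definition 5, $\xi{:}g = \xi{:}I_{p'}(I_n(f)) \in N^{\Delta'}_{m_j:I_n(f)}$. Since $f \sqsubseteq f'$ and $I_n$ is monotone, by Lemma 3 it is $N^{\Delta'}_{m_j:I_n(f)} \sqsubseteq N^{\Delta'}_{m_j:I_n(f')} \sqsubseteq \widehat{T_j}$ (by induction hypothesis). Moreover, $\widehat{T_j} \subseteq \widehat{T}$ (by definition of $\widehat{T}$).

Case 2. $n \in p'$. Let $p' = p_1 n p_2 n \cdots p_{h-1} n p_h$ with $p_1 n \in m_{j_1} \Rightarrow^{\Delta'} n$ for some $m_{j_1} \in succ(n)$; $p_l n \in m_{j_l} \Rightarrow^{\Delta'} n$ $(1 < l < h)$; and $p_h \xi \in m_{j_h} \Rightarrow^{\Delta'} \xi$. Consider the first sub-path. By Definition 5 we can write: $n{:}I_{p_1}(I_n(f)) \in N^{\Delta'}_{m_{j_1}:I_n(f)}$. By Lemma 3, since $f \sqsubseteq f'$ and $I_n$ is monotone, $N^{\Delta'}_{m_{j_1}:I_n(f)} \sqsubseteq N^{\Delta'}_{m_{j_1}:I_n(f')} \sqsubseteq \widehat{T_{j_1}}$ (by induction hypothesis). Let $f_1 = I_{p_1}(I_n(f))$. By Lemma 4, $f_1 \sqsubseteq f'$, since $n{:}f'$ is a constraint in $T_{j_1}$. Consider now the second sub-path. Now $n{:}I_{p_2}(I_n(f_1)) \in N^{\Delta'}_{m_{j_2}:I_n(f_1)}$. Since $f_1 \sqsubseteq f'$ by the result of the first sub-path, $N^{\Delta'}_{m_{j_2}:I_n(f_1)} \sqsubseteq N^{\Delta'}_{m_{j_2}:I_n(f')} \sqsubseteq \widehat{T_{j_2}}$ (by induction hypothesis). Let $f_2 = I_{p_2}(I_n(f_1))$. By Lemma 4, $f_2 \sqsubseteq f'$. The same reasoning applies to all sub-paths until $p_{h-1}$, and we obtain $f_{h-1} \sqsubseteq f'$. In the last sub-path we have $g = I_{p_h}(I_n(f_{h-1}))$. It is $\xi{:}g \in N^{\Delta'}_{m_{j_h}:I_n(f_{h-1})} \sqsubseteq N^{\Delta'}_{m_{j_h}:I_n(f')} \sqsubseteq \widehat{T_{j_h}} \subseteq \widehat{T}$.

PD. Let $\Delta = dom(\Gamma) \cup \{ipd(n)\}$ and $\Delta' = dom(\Gamma) \cup \{m\}$.
It is $\lfloor T \rfloor = \Gamma \vdash n{:}f \leadsto m{:}f'$ and $\lceil T \rceil = \{T_1, T_2\}$, with $\lfloor T_1 \rfloor = \Gamma \vdash n{:}f \leadsto ipd(n){:}f''$ and $\lfloor T_2 \rfloor = \Gamma \vdash ipd(n){:}f'' \leadsto m{:}f'$. The induction hypothesis is: $N^{\Delta}_{n:f} \sqsubseteq \widehat{T_1}$ and $N^{\Delta'}_{ipd(n):f''} \sqsubseteq \widehat{T_2}$. The induction thesis is: $N^{\Delta'}_{n:f} \sqsubseteq \widehat{T}$. Consider $\xi{:}g \in N^{\Delta'}_{n:f}$. By Definition 5, $\exists p \in n \Rightarrow^{\Delta'} \xi$, with $p = q\xi$ and $I_q(f) = g$. We have two cases. Case 1. $ipd(n) \notin$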

$p$: then $p \in n \Rightarrow^{\Delta \cup \Delta'} \xi$. By Definition 5, it is $\xi{:}I_q(f) \in N^{\Delta \cup \Delta'}_{n:f} \subseteq N^{\Delta}_{n:f}$ (by Lemma 2). By induction hypothesis, $N^{\Delta}_{n:f} \sqsubseteq \widehat{T_1} \subseteq \widehat{T}$. Case 2. $ipd(n) \in p$. Let $p = p'\, ipd(n)\, p'' \xi$, with $p'\, ipd(n) \in n \Rightarrow^{\Delta \cup \Delta'} ipd(n)$, $ipd(n)\, p'' \xi \in ipd(n) \Rightarrow^{\Delta'} \xi$ and $g = I_{p''}(I_{ipd(n)}(I_{p'}(f)))$. By Definition 5, we have: $ipd(n){:}I_{p'}(f) \in N^{\Delta \cup \Delta'}_{n:f} \subseteq N^{\Delta}_{n:f}$ (by Lemma 2). By induction hypothesis, $N^{\Delta}_{n:f} \sqsubseteq \widehat{T_1}$. By Lemma 5, we have that $I_{p'}(f) \sqsubseteq f''$. Now we can write: $\xi{:}g \in N^{\Delta'}_{ipd(n):I_{p'}(f)} \sqsubseteq N^{\Delta'}_{ipd(n):f''}$ by Lemma 3. Finally, by induction hypothesis, $N^{\Delta'}_{ipd(n):f''} \sqsubseteq \widehat{T_2} \subseteq \widehat{T}$.

4 The algorithm

In the following we give an algorithm that constructs a proof in the formal proof system in Fig. 2. The algorithm is given in two steps. In the first step we give an algorithm to explore paths in the CFG, disregarding frames. The paths are encoded in a tree structure, called the control tree, defined operationally as the result of the recursive procedure $ctree(\gamma, n, m)$ shown in Fig. 3, where $\gamma \subseteq V$ and $n, m \in V$. The control tree for the CFG of Fig. 1(a) is shown in Fig. 1(b). The control tree relates sets of paths to other (smaller) sets of paths. The relation among sets of paths in the control tree is similar to the relation among sentences in a derivation in the proof system of Fig. 2, as far as the CFG nodes are concerned. The paths from a node $n$ to a node $m$ that postdominates $n$ are split into the paths from node $n$ to its ipd, joined to the paths that go from $ipd(n)$ to $m$. The paths that go from a node $n$ to its ipd are the union of the paths that go from every successor of $n$ to $ipd(n)$. The set $\gamma$ is used to break cycles, and plays a role similar to $dom(\Gamma)$ in the formal proof system. The following theorem states that the control tree for the paths that go from a node $n$ to a node $m$ always exists, provided that $m\ pd\ n$.

Theorem 2 (existence of control tree) $\forall n, m \in V$ such that $m\ pd\ n$, $\forall \gamma \subseteq V$ it is true that: (i) $ctree(\gamma, n, m)$ never fails; (ii) $ctree(\gamma, n, m)$ always terminates.

Proof sketch. (i) By definition, $ctree(\gamma, n, m)$ in Fig. 3 fails iff $m$ does not postdominate $n$.
It is sufficient to show that, whenever $ctree(\gamma, n, m)$ is called with $m\ pd\ n$, all recursive invocations of ctree preserve this same property. The proof can be given

by examination of the definition of ctree. (ii) By contradiction. Let $s = s_1 s_2 \ldots$ be an infinite sequence of recursive invocations of $ctree(\gamma, n, m)$. First we note that $\gamma$ may never decrease in such a sequence. Therefore $s$ cannot contain an infinite number of invocations of type (*), since, at each such invocation, $\gamma$ is augmented and, at most, it can contain all the nodes in $V$, which are finite. Let $s_k$ be the last invocation of type (*). Then $s' = s_{k+1} s_{k+2} \ldots$ only contains invocations of type (**). Moreover, since the first of the two invocations in (**) will cause a subsequent invocation of type (*), $s'$ only contains instances of the second invocation in (**). Consider invocation $s_i$ in $s'$, with arguments $\gamma_i$, $n_i$ and $m_i$. Then, invocation $s_{i+1}$ will have arguments $\gamma_i$, $ipd(n_i)$ and $m_i$. Thus, the second arguments of all recursive invocations in $s'$ form an infinite sequence $n_0, n_1, n_2, \ldots$ where $n_{i+1} = ipd(n_i)$. Since nodes are finite, at least one node $\bar{n}$ must appear twice in this sequence. Suppose node $\bar{n}$ appears at positions $j$ and $j + k$. We have $\bar{n} = n_{j+k}\ pd^+\ n_{j+k-1}\ pd^+ \cdots pd^+\ n_j = \bar{n}$, implying $\bar{n}\ pd^+\ \bar{n}$ (by transitivity of $pd^+$), which is a contradiction.

ctree($\gamma$, $n$, $m$) {
  if ($n = m \lor n \in \gamma$)
    return $\dfrac{}{n \Rightarrow^{\gamma} m}$;
  if ($m = ipd(n)$)   /* let $succ(n) = \{m_1, \ldots, m_s\}$ */
    return $\dfrac{ctree(\gamma \cup \{n\}, m_1, m) \ \cdots \ ctree(\gamma \cup \{n\}, m_s, m)}{n \Rightarrow^{\gamma} m}$;   (*)
  if ($\neg(m\ pd\ n)$) fail;
  return $\dfrac{ctree(\gamma, n, ipd(n)) \quad ctree(\gamma, ipd(n), m)}{n \Rightarrow^{\gamma} m}$;   (**)
}

Figure 3: An algorithm for finding the control tree

Fig. 4 shows an algorithm to find a proof for the sentence $\Gamma \vdash n{:}f \leadsto m{:}f'$. Given a proof $T$ we define $tf(T)$ as the target frame of the root sentence of $T$. In Fig. 4, the statement throw

$n{:}f$ raises an exception of the form $n{:}f$. An exception normally causes immediate termination of the current instance of dfa, and is rethrown in the enclosing instance, unless the exception is caught. The statement do $\{c\}$ while catch($n{:}f''$) catches all exceptions $n{:}f''$ thrown by command $c$. If an exception is caught, $f''$ is assigned to $f'$ and command $c$ is reexecuted. Otherwise, the do statement terminates. The algorithm constructs the proof in a goal oriented fashion, using a form of backtracking (embodied by the exception mechanism) to find a suitable frame $f'$ in the application of rule IPD.

dfa($t = \dfrac{t_1 \ \cdots \ t_s}{n \Rightarrow^{\gamma} m}$, $\Gamma$, $f$) {
  (1) if ($n = m$) return $\dfrac{}{\Gamma \vdash n{:}f \leadsto n{:}f}$;
  (2) if ($n \in \gamma$) {
        if ($f \sqsubseteq \Gamma(n)$) return $\dfrac{}{\Gamma \vdash n{:}f \leadsto m{:}\bot}$;
        else throw $n{:}f \sqcup \Gamma(n)$;
      }
  (3) if ($m = ipd(n)$) {
        $f' := f$;
        do {
          for ($1 \le i \le s$) $T_i := dfa(t_i, (\Gamma, n{:}f'), I_n(f'))$;
        } while catch($n{:}f''$);
        return $\dfrac{T_1 \ \cdots \ T_s}{\Gamma \vdash n{:}f \leadsto ipd(n){:}\bigsqcup_{1 \le i \le s} tf(T_i)}$;
      }
  (4) if ($m\ pd\ n$) {
        $T_1 := dfa(t_1, \Gamma, f)$; $T_2 := dfa(t_2, \Gamma, tf(T_1))$;
        return $\dfrac{T_1 \quad T_2}{\Gamma \vdash n{:}f \leadsto m{:}tf(T_2)}$;
      }
}

Figure 4: An algorithm for constructing a proof

The following theorems show that the algorithm is correct.

Theorem 3 (termination of dfa) For each $t = ctree(\gamma, n, m)$, with $m\ pd\ n$, for each $\Gamma$ such that $\gamma \subseteq dom(\Gamma)$, and for each $f \in F$, $dfa(t, \Gamma, f)$ either terminates correctly, or it throws $\bar{n}{:}\bar{f}$ such that $\bar{n} \in \gamma$ and $\Gamma(\bar{n}) \sqsubset \bar{f}$.

Proof sketch. The proof is by induction on the depth $d$ of $t$. Base step. If $d = 1$, then it must be (by construction of $t$) that either $n = m$ or $n \in \gamma$. In the former case, the algorithm will take branch (1) in Fig. 4, terminating

correctly. In the latter case, it will take branch (2). There, it will either terminate correctly, or (if $f \not\sqsubseteq \Gamma(n)$) it will throw $\bar{n}{:}\bar{f}$, with $\bar{n} = n \in \gamma$ and $\bar{f} = f \sqcup \Gamma(n)$. Note that $\Gamma(n) \sqsubseteq \bar{f}$, and it cannot be $\Gamma(n) = \bar{f}$, since this would mean that $f \sqsubseteq \Gamma(n)$. Induction step. Assume that the theorem holds for all control trees with depth $d' < d$, with $d > 1$. We want to show that the theorem also holds for all control trees with depth $d$. If $d > 1$, we have two cases. Case 1. $m = ipd(n)$: branch (3) is taken. Each recursive invocation of dfa will (by induction hypothesis) either terminate correctly or throw $\bar{n}{:}\bar{f}$, with $\bar{n} \in \gamma \cup \{n\}$. If all recursive invocations terminate correctly, dfa will also terminate correctly. If a recursive invocation throws $\bar{n}{:}\bar{f}$, we have two cases. Case 1.1. $\bar{n} = n$: the exception is caught, and the do loop is repeated, with $f' = \bar{f}$. Note that, by induction hypothesis, the new value of $f'$ is strictly greater than the previous one. Thus, this step cannot be repeated indefinitely, since lattice $F$ is of finite height. Case 1.2. $\bar{n} \neq n$: the exception is not caught. The current instance of dfa rethrows $\bar{n}{:}\bar{f}$. The conditions on $\bar{n}$ and $\bar{f}$ are met by the induction hypothesis. Case 2. $m \neq ipd(n)$: branch (4) is taken. By induction hypothesis, the two recursive invocations of dfa will either terminate correctly or throw $\bar{n}{:}\bar{f}$. It is easy to see that, in both cases, the theorem holds.

Corollary 1 For each control tree $t$ and for each $f \in F$, $dfa(t, \emptyset, f)$ always terminates correctly.

Proof. By Theorem 3, no exception can be thrown, since $dom(\Gamma) = \emptyset$.

Theorem 4 Let $T = dfa(ctree(dom(\Gamma), n, m), \Gamma, f)$. $T$ is a derivation for the sentence $\Gamma \vdash n{:}f \leadsto m{:}tf(T)$ in the proof system.

Proof sketch. By induction on the depth of $t = ctree(dom(\Gamma), n, m)$.
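The two procedures of Fig. 3 and Fig. 4 can be tried out with the following minimal sketch. Several things here are assumptions made for illustration and are not part of the paper: the successor map `SUCC` is read off the control tree of Fig. 1(b), frames are sets of strings ordered by inclusion (so the least upper bound is set union and the least element is the empty set), the transformation functions `I` are hypothetical, the exception class is named `Revisit`, and `dfa` returns only the target frame $tf(T)$ instead of the whole derivation.

```python
from typing import Callable, Dict, FrozenSet, List

Frame = FrozenSet[str]
BOTTOM: Frame = frozenset()

# Edges read off the control tree of Fig. 1(b); ipd values as in Section 2.
SUCC: Dict[int, List[int]] = {1: [2, 4], 2: [3, 5], 3: [2], 4: [5], 5: []}
IPD: Dict[int, int] = {1: 5, 2: 5, 3: 2, 4: 5}
# Hypothetical transformation functions: node k adds the fact "dk".
I: Dict[int, Callable[[Frame], Frame]] = {
    k: (lambda f, k=k: f | {f"d{k}"}) for k in SUCC
}

class Revisit(Exception):
    """Plays the role of `throw n:f` in Fig. 4 (the name is an assumption)."""
    def __init__(self, node: int, frame: Frame) -> None:
        super().__init__(node, frame)
        self.node, self.frame = node, frame

def postdominates(m: int, n: int) -> bool:
    """m pd n, decided by walking the ipd chain upwards from n."""
    while n != m and n in IPD:
        n = IPD[n]
    return n == m

def ctree(gamma: FrozenSet[int], n: int, m: int) -> tuple:
    """Control tree as a tuple (gamma, n, m, children), following Fig. 3."""
    if n == m or n in gamma:
        return (gamma, n, m, [])
    if IPD.get(n) == m:                                    # case (*)
        return (gamma, n, m, [ctree(gamma | {n}, s, m) for s in SUCC[n]])
    if not postdominates(m, n):
        raise ValueError("m does not postdominate n")
    return (gamma, n, m, [ctree(gamma, n, IPD[n]),         # case (**)
                          ctree(gamma, IPD[n], m)])

def dfa(t: tuple, Gamma: Dict[int, Frame], f: Frame) -> Frame:
    """Branches (1)-(4) of Fig. 4, returning only the target frame."""
    gamma, n, m, kids = t
    if n == m:                                             # (1) rule NULL
        return f
    if n in gamma:                                         # (2) rule CYCLE
        if f <= Gamma[n]:
            return BOTTOM
        raise Revisit(n, f | Gamma[n])
    if IPD[n] == m:                                        # (3) rule IPD
        guess = f                                          # the frame f'
        while True:
            try:
                out = BOTTOM
                for kid in kids:
                    out = out | dfa(kid, {**Gamma, n: guess}, I[n](guess))
                return out
            except Revisit as e:
                if e.node != n:
                    raise                                  # not ours: rethrow
                guess = e.frame                            # retry, larger f'
    first, second = kids                                   # (4) rule PD
    return dfa(second, Gamma, dfa(first, Gamma, f))
```

Calling `dfa(ctree(frozenset(), 1, 5), {}, BOTTOM)` explores the loop between nodes 2 and 3 twice: the first attempt throws at node 2, the enlarged guess is then accepted by branch (2). Copying `Gamma` on every call keeps the sketch short; it is not how a space-conscious implementation would store the constraints.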
Let $MOP(n)$ be the least upper bound of the frames paired with node $n$ in the abstract execution states, that is, the meet over all paths frame for $n$, which for a distributive problem is also the solution of the dataflow equations (the information collected during the DFA) [3]:

$$\forall n \in CFG,\quad MOP(n) = \bigsqcup \{f \mid n{:}f \in N^{\emptyset}_{start:f_0}\}.$$

We now prove that $MOP(n)$ is an upper bound for all the frames paired with source node $n$ in the sentences of the derivation $dfa(ctree(\emptyset, start, end), \emptyset, f_0)$.
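For comparison, the solution of the dataflow equations can be computed with a standard worklist iteration, which keeps one frame per CFG node for the whole analysis. The sketch below is such an iteration, not the algorithm of this paper. It assumes the hypothetical set-union lattice and the hypothetical functions `I` (node k adds the fact "dk") on the CFG edges read off Fig. 1(b); these functions are distributive, so the fixed point coincides with MOP.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Frame = FrozenSet[str]

# Assumed CFG (edges read off Fig. 1(b)) and hypothetical functions I_n.
SUCC: Dict[int, List[int]] = {1: [2, 4], 2: [3, 5], 3: [2], 4: [5], 5: []}
I: Dict[int, Callable[[Frame], Frame]] = {
    k: (lambda f, k=k: f | {f"d{k}"}) for k in SUCC
}

def solve(start: int, f0: Frame) -> Dict[int, Optional[Frame]]:
    """Worklist fixed point; at[n] is the frame arriving at node n."""
    at: Dict[int, Optional[Frame]] = {n: None for n in SUCC}  # None: unreached
    at[start] = f0
    work = [start]
    while work:
        n = work.pop()
        out = I[n](at[n])
        for m in SUCC[n]:
            new = out if at[m] is None else at[m] | out       # join at m
            if new != at[m]:
                at[m] = new
                work.append(m)                                # m changed
    return at
```

The dictionary `at` holds a frame for every node until the iteration ends, which is the space cost that the goal directed construction aims to avoid.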

Lemma 6 Let $I_n$ be distributive over $\sqcup$. If $f \sqsubseteq MOP(n)$, then $\forall v \in succ(n)$, $I_n(f) \sqsubseteq MOP(v)$.

Proof. By monotonicity of $I_n$, we have

$$I_n(f) \sqsubseteq I_n\Big(\bigsqcup \{g \mid n{:}g \in N^{\emptyset}_{start:f_0}\}\Big) = \bigsqcup \{I_n(g) \mid n{:}g \in N^{\emptyset}_{start:f_0}\} \sqsubseteq \bigsqcup \{I_m(g) \mid v \in succ(m) \wedge m{:}g \in N^{\emptyset}_{start:f_0}\} = MOP(v),$$

where the first equality holds by distributivity of $I_n$. The last inequality holds because $v$ can also be the successor of a node different from $n$.

Theorem 5 Let $T = dfa(ctree(dom(\Gamma), n, m), \Gamma, f)$, with $I_n$ distributive for each $n$, and let $\widehat{T}$ be the set of source pairs of $T$. If $f \sqsubseteq MOP(n)$ and $\forall n' \in dom(\Gamma)$, $\Gamma(n') \sqsubseteq MOP(n')$, then: (i) if dfa terminates, then $\forall v{:}g \in \widehat{T}$, $g \sqsubseteq MOP(v)$, and $tf(T) \sqsubseteq MOP(m)$; (ii) if dfa raises an exception $v{:}g$, then $g \sqsubseteq MOP(v)$.

Proof sketch. The proof is by induction on the depth $d$ of $t = ctree(dom(\Gamma), n, m)$. Base step. If $d = 1$, then either branch (1) or (2) in Fig. 4 is taken. The step is proved by inspection of Fig. 4. If branch (1) is taken, dfa terminates, $\widehat{T} = \{n{:}f\}$, $m = n$ and $tf(T) = f$. The theorem holds because $f \sqsubseteq MOP(n)$ by hypothesis. If branch (2) is taken, assume dfa terminates. We have that $\widehat{T} = \{n{:}f\}$ and $tf(T) = \bot$. Point (i) is satisfied since $f \sqsubseteq MOP(n)$ by hypothesis, and $\bot \sqsubseteq MOP(m)$ by lattice definition. Assume dfa raises an exception. By construction the exception is $n{:}f \sqcup \Gamma(n)$. Point (ii) holds because $f \sqcup \Gamma(n) \sqsubseteq MOP(n)$ by definition of $\sqcup$ (it is $f \sqsubseteq MOP(n)$ and $\Gamma(n) \sqsubseteq MOP(n)$ by hypothesis). Induction step. Assume that the theorem holds for all control trees with depth $d' < d$, with $d > 1$. We prove that the theorem holds for all control trees with depth $d$. Since $d > 1$, either branch (3) or (4) in Fig. 4 is taken. If branch (3) is taken, then $m = ipd(n)$, the root of $T$ is $\Gamma \vdash n{:}f \leadsto ipd(n){:}\bigsqcup_{1 \le i \le s} tf(T_i)$ and its premise derivations are $T_1, \ldots, T_s$, where $T_i = dfa(t_i, (\Gamma, n{:}f'), I_n(f'))$, with the depth of $t_i$ at most $d - 1$, and $v_i$ is the source node of the root of $T_i$. The first time the loop is executed, $f' = f \sqsubseteq MOP(n)$ (by hypothesis). Moreover, $I_n(f') \sqsubseteq MOP(v_i)$, by Lemma 6. Since the theorem holds on each recursive call (by induction hypothesis), and the hypotheses of the theorem are easily verified, we know that either point (i) or (ii) holds for each call.
If a recursive call raises an exception $v{:}g$ (point (ii)) we have two cases: if $v \neq n$, the exception is rethrown, with $g \sqsubseteq MOP(v)$

and the induction step is proved; if $v = n$, the exception is caught and the loop is repeated with $f' = g \sqsubseteq MOP(n)$. The same reasoning can be repeated until all recursive calls terminate correctly (we know by Theorem 3 that this will eventually occur). Then, we have $tf(T_i) \sqsubseteq MOP(ipd(n))$ $(1 \le i \le s)$ and $\bigsqcup_{1 \le i \le s} tf(T_i) \sqsubseteq MOP(ipd(n))$ (by definition of $\sqcup$). Finally, point (i) for the induction step is proved by observing that the set of source pairs of $T$ is $\widehat{T} = \bigcup_{1 \le i \le s} \widehat{T_i} \cup \{n{:}f\}$, that $f \sqsubseteq MOP(n)$ by hypothesis, and that point (i) already holds for each $T_i$. If branch (4) is taken, the induction step is easily proved by applying the theorem to the two recursive calls, in sequence.

The complete analysis is performed by $dfa(ctree(\emptyset, start, end), \emptyset, f_0)$ that, by Theorem 3, always terminates correctly (no exception can be thrown, since $dom(\Gamma) = \emptyset$). Moreover, by Theorem 1 every proof $T$ is a safe approximation of the collection of abstract execution states, and by Theorem 5 every node $n$ is visited with a frame $f \sqsubseteq MOP(n)$.

In embedded systems, RAM is a severely limited resource. The dfa algorithm in Fig. 4 is able to reduce RAM usage in DFA provided that: (i) the CFG and the ipd function (which are read only information) are stored in PROM; (ii) the control tree is built on the fly; (iii) $\Gamma$ is implemented with a global stack that grows/shrinks whenever branch (3) is taken/returns; (iv) dfa only returns the target frame of the conclusion of the sub-derivation. Note that branch (3) is non trivial (that is, it causes recursive calls to dfa that do not terminate immediately) when the source node represents a control instruction of the program (that is, it has more than one successor). In this case, branch (3) causes all paths that belong to the scope of the control instruction to be tried, collecting information at the control instruction exit (that is, the point where all paths originating from the control instruction meet: the ipd).
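The four provisions above can be sketched as follows. This is an illustrative variant under stated assumptions, not the authors' implementation: the CFG edges are read off Fig. 1(b), frames are sets of strings with union as least upper bound, the functions `I` and the names `STACK`, `PEAK`, `bound` and `Revisit` are hypothetical, and the caller is assumed to invoke `dfa(n, m, f)` only with $m\ pd\ n$. The control tree is never materialised, $\Gamma$ lives in one global stack, and only target frames are returned.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Frame = FrozenSet[str]
BOTTOM: Frame = frozenset()

# Read-only data (would sit in PROM): assumed CFG, ipd table, functions I_n.
SUCC: Dict[int, List[int]] = {1: [2, 4], 2: [3, 5], 3: [2], 4: [5], 5: []}
IPD: Dict[int, int] = {1: 5, 2: 5, 3: 2, 4: 5}
I: Dict[int, Callable[[Frame], Frame]] = {
    k: (lambda f, k=k: f | {f"d{k}"}) for k in SUCC
}

class Revisit(Exception):
    def __init__(self, node: int, frame: Frame) -> None:
        super().__init__(node, frame)
        self.node, self.frame = node, frame

STACK: List[list] = []   # Gamma as a global stack of [node, frame] entries
PEAK = [0]               # largest stack depth observed

def bound(n: int) -> Optional[Frame]:
    """Gamma(n), or None when n is not currently constrained."""
    for node, frame in reversed(STACK):
        if node == n:
            return frame
    return None

def dfa(n: int, m: int, f: Frame) -> Frame:
    """Target frame for source n:f and target m; assumes m pd n."""
    if n == m:                                   # branch (1)
        return f
    limit = bound(n)
    if limit is not None:                        # branch (2)
        if f <= limit:
            return BOTTOM
        raise Revisit(n, f | limit)
    if IPD[n] != m:                              # branch (4)
        return dfa(IPD[n], m, dfa(n, IPD[n], f))
    STACK.append([n, f])                         # branch (3): push n:f'
    PEAK[0] = max(PEAK[0], len(STACK))
    try:
        while True:
            try:
                out = BOTTOM
                for s in SUCC[n]:
                    out = out | dfa(s, m, I[n](STACK[-1][1]))
                return out
            except Revisit as e:
                if e.node != n:
                    raise
                STACK[-1][1] = e.frame           # enlarge f' and retry
    finally:
        STACK.pop()                              # shrink on return or rethrow
```

On this example `dfa(1, 5, BOTTOM)` yields the same frame at node 5 as a per-node fixed point would, while `PEAK[0]` records that at most three constraints were alive at any time.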
The maximum size of the $\Gamma$ stack, then, is almost equal to the maximum nesting depth of control instructions of the program. The number of frames that must be stored in RAM can be estimated as twice the maximum depth of the global stack (for each element of the stack we need one frame for the element itself, and one frame for the accumulation at ipd targets). By contrast, we can think of standard DFA as requiring a number of frames roughly equal to the total number of control instructions. Since the maximum nesting depth is largely independent of the total number of control instructions, our approach has the

potential to be feasible for a larger number of programs, for a given amount of RAM.

References

[1] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Publishing Company, Reading, Massachusetts, 1986.

[2] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In 4th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages Proceedings, pages 238-252. Los Angeles, California, 1977.

[3] John B. Kam and Jeffrey D. Ullman. Monotone data flow analysis frameworks. Acta Inf., 7:305-317, 1977.

[4] X. Leroy. Bytecode verification on Java smart cards. Software Practice & Experience, 32:319-340, 2002.