Given a string manipulating program, string analysis determines all possible values that a string expression can take during any program execution

Similar documents
Verification of String Manipulating Programs Using Multi-Track Automata

CS243, Logic and Computation Nondeterministic finite automata

Chapter 6. Properties of Regular Languages

1 Alphabets and Languages

Closure under the Regular Operations

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Sri vidya college of engineering and technology

Optimizing Finite Automata

T (s, xa) = T (T (s, x), a). The language recognized by M, denoted L(M), is the set of strings accepted by M. That is,

Finite Universes. L is a fixed-length language if it has length n for some

CS 455/555: Finite automata

CSE 105 THEORY OF COMPUTATION

Chapter 5. Finite Automata

Automata-based Verification - III

Nondeterminism. September 7, Nondeterminism

How do regular expressions work? CMSC 330: Organization of Programming Languages

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Chap. 1.2 NonDeterministic Finite Automata (NFA)

Finite Automata. Seungjin Choi

Turing s thesis: (1930) Any computation carried out by mechanical means can be performed by a Turing Machine

arxiv: v2 [cs.fl] 29 Nov 2013

CS 154, Lecture 2: Finite Automata, Closure Properties Nondeterminism,

Theory of Computation

CS:4330 Theory of Computation Spring Regular Languages. Finite Automata and Regular Expressions. Haniel Barbosa

NODIA AND COMPANY. GATE SOLVED PAPER Computer Science Engineering Theory of Computation. Copyright By NODIA & COMPANY

Great Theoretical Ideas in Computer Science. Lecture 4: Deterministic Finite Automaton (DFA), Part 2

Constructions on Finite Automata

Automata-based Verification - III

Sanjit A. Seshia EECS, UC Berkeley

Clarifications from last time. This Lecture. Last Lecture. CMSC 330: Organization of Programming Languages. Finite Automata.

CS 275 Automata and Formal Language Theory

CS 154. Finite Automata, Nondeterminism, Regular Expressions

Equivalence of DFAs and NFAs

Theory of Computation Lecture 1. Dr. Nahla Belal

Foreword. Grammatical inference. Examples of sequences. Sources. Example of problems expressed by sequences Switching the light

Lecture 3: Nondeterministic Finite Automata

Nondeterministic Finite Automata

CS 21 Decidability and Tractability Winter Solution Set 3

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

Computational Theory

Concatenation. The concatenation of two languages L 1 and L 2

Finite Automata. Wen-Guey Tzeng Computer Science Department National Chiao Tung University

Size reduction of multitape automata

2. Elements of the Theory of Computation, Lewis and Papadimitrou,

Pushdown automata. Twan van Laarhoven. Institute for Computing and Information Sciences Intelligent Systems Radboud University Nijmegen

CS 154, Lecture 3: DFA NFA, Regular Expressions

10. Finite Automata and Turing Machines

UNIT-III REGULAR LANGUAGES

Author: Vivek Kulkarni ( )

Unit 6. Non Regular Languages The Pumping Lemma. Reading: Sipser, chapter 1

CPS 220 Theory of Computation REGULAR LANGUAGES

CS375 Midterm Exam Solution Set (Fall 2017)

Lecture Notes On THEORY OF COMPUTATION MODULE -1 UNIT - 2

Introduction to Turing Machines. Reading: Chapters 8 & 9

Deterministic Finite Automata (DFAs)

CPSC 421: Tutorial #1

Deterministic Finite Automata (DFAs)

CS21004 Formal Languages and Automata Theory, Spring

Languages, regular languages, finite automata

REGular and Context-Free Grammars

Turing Machines (TM) Deterministic Turing Machine (DTM) Nondeterministic Turing Machine (NDTM)

Regular Expressions. Definitions Equivalence to Finite Automata

Regular Model Checking and Verification of Cellular Automata

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

CSE 135: Introduction to Theory of Computation Nondeterministic Finite Automata (cont )

A Universal Turing Machine

Homework 02 Solution updated

Finite State Transducers

Languages. A language is a set of strings. String: A sequence of letters. Examples: cat, dog, house, Defined over an alphabet:

Inf2A: Converting from NFAs to DFAs and Closure Properties

Chapter 3. Regular grammars

CSci 311, Models of Computation Chapter 4 Properties of Regular Languages

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Deterministic Finite Automaton (DFA)

Fooling Sets and. Lecture 5

CSE182-L7. Protein Sequence Analysis Patterns (regular expressions) Profiles HMM Gene Finding CSE182

Automata & languages. A primer on the Theory of Computation. Laurent Vanbever. ETH Zürich (D-ITET) September,

CS481F01 Prelim 2 Solutions

CMSC 330: Organization of Programming Languages. Theory of Regular Expressions Finite Automata

CSE 105 THEORY OF COMPUTATION

CHAPTER 1 Regular Languages. Contents

Advanced Automata Theory 7 Automatic Functions

Regular expressions and Kleene s theorem

CSE 105 THEORY OF COMPUTATION

Deterministic Finite Automata (DFAs)

Warshall s algorithm

CISC 4090: Theory of Computation Chapter 1 Regular Languages. Section 1.1: Finite Automata. What is a computer? Finite automata

Notes on State Minimization

Recap DFA,NFA, DTM. Slides by Prof. Debasis Mitra, FIT.

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

Kleene Algebras and Algebraic Path Problems

} Some languages are Turing-decidable A Turing Machine will halt on all inputs (either accepting or rejecting). No infinite loops.

More on Finite Automata and Regular Languages. (NTU EE) Regular Languages Fall / 41

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Before we show how languages can be proven not regular, first, how would we show a language is regular?

CSE 105 Theory of Computation

CS 121, Section 2. Week of September 16, 2013

Pushdown Automata. We have seen examples of context-free languages that are not regular, and hence can not be recognized by finite automata.

Transcription:

l Given a string manipulating program, string analysis determines all possible values that a string expression can take during any program execution l Using string analysis we can verify properties of string manipulating programs l For example, we can identify all possible input values of sensitive functions in a web application and then check whether inputs of sensitive functions can contain attack strings

Configurations/Transitions are represented using word equations Word equations are represented/approximated using (aligned) multi-track DFAs which are closed under intersection, union, complement and projection Operations required for reachability analysis (such as equivalence checking) are computed on DFAs

Let X (the first track), Y (the second track), be two string variables λ : a padding symbol that appears only on the tail of each track (aligned) A multi-track automaton that encodes X = Y.txt (t, λ) (x, λ) (t, λ) (a,a), (b,b)

Compute the post-conditions of statements Given a multi-track automata M and an assignment statement: X := sexp Post(M, X := sexp) denotes the post-condition of X := sexp with respect to M Post(M, X := sexp) = ( X, M CONSTRUCT(X = sexp, +))[X/X ]

We implement a symbolic forward reachability computation using the post-condition operations The forward fixpoint computation is not guaranteed to converge in the presence of loops and recursion We use an automata based widening operation to over-approximate the fixpoint Widening operation over-approximates the union operations and accelerates the convergence of the fixpoint computation

The alphabet of an n-track automaton is Σ n The size of multi-track automata could be huge during computations On the other hand, we may carry more information than we need to verify the property More Abstractions: We propose alphabet abstraction to reduce Σ We propose relation abstraction to reduce n

Select a subset of alphabet characters (Σ ) to analyze distinctly and merge the remaining alphabet characters into a special symbol (u) For example: Let Σ={<, a, b, c} and Σ ={<}, L(M) = a<b +, we have: α Σ,Σ (M) = M α and γ Σ,Σ (M α ) = M γ, where L(M α )=u<u +, and L(M γ ) = (a b c)<(a b c) +

We use an alphabet transducer M Σ,Σ to construct abstract automata α denotes any character in Σ β denotes any character in Σ\Σ (α,α) (λ,λ) (λ,λ) (β,u)

M a < b b M α u < u u (a,*) (<,*) (b,*) (b,*) α (a,u) (<,<) (b,u) (b,u) (<,<) (λ,λ) M Σ,Σ (λ,λ) (a,u), (b,u), (c,u)

M γ a,b,c < a,b,c a,b,c M α u < u u (a,u), (b,u), (c,u) (<,<) (a,u), (b,u), (c,u) (a,u), (b,u), (c,u) γ (*,u) (*,<) (*,u) (*,u) (<,<) (λ,λ) M Σ,Σ (λ,λ) (a,u), (b,u), (c,u)

1:<?php 2: $www = $_GET[ www ]; 3: $l_otherinfo = URL ; 4: $www = str_replace(<,,$www); 5: echo <td>. $l_otherinfo. :. $www. </ td> ; 6:?> Consider the above example, choosing Σ ={<, s} (instead of all ASCII characters) is sufficient to conclude that the echo string does not contain any substring that matches <script

Consider the following abstraction: We map all the symbols in the alphabet to a single symbol The automaton we generate with this abstraction will be a unary automaton (an automaton with a unary alphabet) The only information that this automaton will give us will be the length of the strings So alphabet abstraction corresponds to length abstraction

Select sets of string variables to analyze relationally (using multi-track automata), and analyze the rest independently (using single-track automata) For example, consider three string variables n 1, n 2, n 3. Let χ={{n 1,n 2 }, n 3 } and χ ={{n 1 }, {n 2 }, {n 3 }} Let M = {M 1,2, M 3 } that consists of a 2-track automaton for n 1 and n 2 and a single track automaton for n 3 We have α χ,χ (M) = M α γ χ,χ (M α ) = M γ, where

M α = {M 1, M 2, M 3 } such that M 1 and M 2 are constructed by the projection of M 1,2 to the first track and the second track respectively M Υ = {M 1,2, M 3 } such that M 1,2 is constructed by the intersection of M 1,* and M *,2, where M 1,* is the two-track automaton extended from M1 with arbitrary values in the second track M *,2 is the two-track automaton extended from M2 with arbitrary values in the first track

M1,2 (b,b) (a,a) (c,c) α M 1, M 2 M 1,* b a (b,*) c (c,*) M 1,2 (b,b) (b,a) (a,a) (a,b) (c,c) γ M *,2 (a,*) (*,b) (*,a) (*,c)

1:<?php 2: $usr = $_GET[ usr ]; 3: $passwd = $_GET[ passwd ]; 4: $key = $usr.$passwd; 5: if($key = admin1234 ) 6: echo $usr; 7:?> Consider the above example, choosing χ ={{$usr, $key}, {$passwd}} is sufficient to identify the echo string is a prefix of admin1234 and does not contain any substring that matches <script

Both alphabet and relation abstractions form abstraction lattices, which allow different levels of abstractions Combining these abstractions leads a product lattice, where each point is an abstraction class that corresponds to a particular alphabet abstraction and a relation abstraction The top is a non relational analysis using unary alphabet The bottom is a complete relational analysis using full alphabet

Some abstraction from the abstraction lattice and the corresponding analyses (, ) size analysis relational size analysis (, )... (, ) string analysis (, ) relational string analysis

Select an abstraction class Ideally, the choice should be as abstract as possible while remaining precise enough to prove the property in question Heuristics Let the property guide the choice Collect constants and relations from assertions and their dependency graphs It forms the lower bound of the abstraction class Select an initial abstraction class, e.g., characters and relations appearing in assertions Refine the abstraction class toward the lower bound