Intrusion Detection and Malware Analysis

1 Intrusion Detection and Malware Analysis: IDS feature extraction
Pavel Laskov, Wilhelm Schickard Institute for Computer Science

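2 Metric embedding of byte sequences
[Figure: Sequences -> Subsequences -> Histograms of subsequences -> Geometry. Four example sequences (1. "blabla blubla blablabu aa", 2. "bla blablaa bulab bb abla", 3. "a blabla blabla ablub bla", 4. "blab blab abba blabla blu") are split into subsequences, counted over the features blablabu, blablaa, blablu, blabla, bulab, ablub, blab, abla, abba, blu, bla, bb, aa, b, a, and shown as points 1 to 4 in a geometric space.]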

3 Formalization of embedding
- A sequence x from an alphabet $\Sigma$ of cardinality N: $x \in \Sigma^*$
- A language L of pre-defined words: $L \subseteq \Sigma^* = \{ w \mid w \in \Sigma^* \}$
- Example languages: n-grams, bag-of-words, all subsequences, bag-of-delimiters
- Embedding function defined over the language: $\phi_w(x)$ is the frequency of w in x, the count for w in x, or a binary flag
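
The embedding can be illustrated for the n-gram language with a minimal Python sketch. The function name ngram_embedding and its parameters are assumptions made for this illustration, not part of the slides; only the words that actually occur in x are stored, so the feature vector stays sparse.

```python
from collections import Counter


def ngram_embedding(x: bytes, n: int, binary: bool = False) -> dict:
    """Return the sparse embedding {w: phi_w(x)} over all n-grams w of x.

    With binary=False, phi_w(x) is the number of occurrences of w in x;
    with binary=True it is a 0/1 flag (absent words are simply not stored).
    """
    # Slide a window of length n over the sequence and count each word.
    counts = Counter(x[i:i + n] for i in range(len(x) - n + 1))
    if binary:
        return {w: 1 for w in counts}
    return dict(counts)


if __name__ == "__main__":
    print(ngram_embedding(b"abbaa", 3))
    print(ngram_embedding(b"baaaab", 3, binary=True))
```

A sequence shorter than n yields an empty embedding, because the window never fits.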

4 Similarity measures for embeddings
Metric embedding enables application of various vectorial similarity measures over sequences, e.g.
Kernels k(x, y):
- Linear: $\sum_{w \in L} \phi_w(x)\,\phi_w(y)$
- RBF: $\exp(-d(x, y)^2 / \sigma)$
Distances d(x, y):
- Manhattan: $\sum_{w \in L} |\phi_w(x) - \phi_w(y)|$
- Minkowski: $\sqrt[k]{\sum_{w \in L} |\phi_w(x) - \phi_w(y)|^k}$
- Hamming: $\sum_{w \in L} \mathrm{sgn}\,|\phi_w(x) - \phi_w(y)|$
- Chebyshev: $\max_{w \in L} |\phi_w(x) - \phi_w(y)|$
Similarity coefficients: Jaccard, Kulczynski, ...
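
As a rough illustration, the measures above can be evaluated directly on sparse embeddings stored as dictionaries. This is a sketch under assumptions: the function names are invented for this example, absent words are treated as having value 0, and the example vectors are the 3-gram counts of abbaa and baaaab.

```python
import math


def _words(px, py):
    # Only words present in at least one sequence can contribute a non-zero term.
    return set(px) | set(py)


def linear_kernel(px, py):
    return sum(p * py.get(w, 0) for w, p in px.items())


def minkowski(px, py, k):
    total = sum(abs(px.get(w, 0) - py.get(w, 0)) ** k for w in _words(px, py))
    return total ** (1.0 / k)


def chebyshev(px, py):
    return max((abs(px.get(w, 0) - py.get(w, 0)) for w in _words(px, py)), default=0)


def hamming(px, py):
    return sum(1 for w in _words(px, py) if px.get(w, 0) != py.get(w, 0))


def rbf_kernel(px, py, sigma):
    d = minkowski(px, py, 2)  # Euclidean distance
    return math.exp(-d * d / sigma)


if __name__ == "__main__":
    px = {"abb": 1, "bba": 1, "baa": 1}   # 3-grams of abbaa
    py = {"aaa": 2, "aab": 1, "baa": 1}   # 3-grams of baaaab
    print(linear_kernel(px, py), minkowski(px, py, 1), chebyshev(px, py), hamming(px, py))
    print(rbf_kernel(px, py, sigma=2.0))
```

The Manhattan distance is the Minkowski distance with k = 1. Iterating over the union of stored words is what keeps the cost proportional to the sequence lengths rather than to the size of the language L.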

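5 Abstract similarity measure
Outer loop: $s(x, y) = \bigoplus_{w \in L} m(x, y, w)$, where $\bigoplus$ aggregates the per-word terms (a sum for the linear kernel and the sum-based distances, a maximum for the Chebyshev distance).
Inner function:
$$m(x, y, w) = \begin{cases} m^+(\phi_w(x), \phi_w(y)) & \text{if } w \text{ matches } x, y \\ m_x(\phi_w(x)) & \text{if } w \text{ mismatches } x \\ m_y(\phi_w(y)) & \text{if } w \text{ mismatches } y \end{cases}$$
A match means that w occurs in both x and y. A mismatch for x means that w occurs only in x, and a mismatch for y that w occurs only in y.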

6 Inner function computation
Measure            m^+(p, q)     m_x(p)    m_y(q)
Kernel functions
  Linear           p · q         0         0
Distances
  Manhattan        |p - q|       p         q
  Minkowski        |p - q|^k     p^k       q^k
  Chebyshev        |p - q|       p         q      (aggregated with max)
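
The outer loop and the inner functions can be combined in one generic routine. The following sketch assumes sparse dictionary embeddings and uses hypothetical names (similarity, m_plus, m_x, m_y, combine); it is meant to show how the table rows plug into the abstract measure, not to reproduce a particular implementation.

```python
def similarity(px, py, m_plus, m_x, m_y, combine=sum):
    """Generic measure: combine the inner-function values over all words.

    px, py  : sparse embeddings {word: value}
    m_plus  : applied when the word occurs in both sequences (match)
    m_x     : applied when the word occurs only in x (mismatch)
    m_y     : applied when the word occurs only in y (mismatch)
    combine : outer operation, e.g. sum or max
    """
    terms = []
    for w in set(px) | set(py):
        if w in px and w in py:
            terms.append(m_plus(px[w], py[w]))
        elif w in px:
            terms.append(m_x(px[w]))
        else:
            terms.append(m_y(py[w]))
    return combine(terms) if terms else 0


if __name__ == "__main__":
    px = {"ab": 1, "bb": 1, "ba": 1, "aa": 1}  # 2-grams of abbaa
    py = {"ba": 1, "aa": 3}                    # 2-grams of baaaa
    linear = similarity(px, py, lambda p, q: p * q, lambda p: 0, lambda q: 0)
    manhattan = similarity(px, py, lambda p, q: abs(p - q), lambda p: p, lambda q: q)
    chebyshev = similarity(px, py, lambda p, q: abs(p - q), lambda p: p, lambda q: q,
                           combine=max)
    print(linear, manhattan, chebyshev)
```

For the Minkowski distance the same routine would be called with the k-th powers from the table, and the k-th root taken on the result.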


8 Data structures: an overview
How should we store subsequences to ensure linear-time extraction and matching?
- Hash tables: simple and relatively efficient; limited embeddings; hash table size is difficult to choose.
- Sorted arrays: simple and highly efficient (contiguous storage!); limited embeddings.
- Tries: moderately complex and efficient; limited embeddings.
- Suffix trees: unlimited embeddings; very complex, with high constants and memory consumption.

9 Sorted array representation
- Extract subsequences and store them in an array.
- Sort the array.
- For any pair of sequences, find matching and mismatching entries by looping over the sorted arrays.
From Rieck and Laskov: the length of an array X is denoted by |X|. In order to support efficient comparison, the fields of X are sorted by the contained words, e.g. using the lexicographic order of the alphabet. Figure 1 illustrates the sorted arrays of 3-grams extracted from the two example sequences x and y.
Example: x = abbaa, y = baaaab
X   word[x]   phi[x]
    abb       1
    baa       1
    bba       1
Y   word[y]   phi[y]
    aaa       2
    aab       1
    baa       1
Figure 1: Sorted arrays of 3-grams for x = abbaa and y = baaaab. The number in each field indicates the number of occurrences.
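
The loop over two sorted arrays works like the merge step of merge sort. The sketch below computes the Manhattan distance for the example in Figure 1; the function names are assumptions made for this illustration, and Python's built-in comparison sort stands in for the sorting step, so the construction here is not linear-time.

```python
from collections import Counter


def sorted_ngrams(x: str, n: int):
    """Return the n-grams of x as a list of (word, count) pairs sorted by word."""
    counts = Counter(x[i:i + n] for i in range(len(x) - n + 1))
    return sorted(counts.items())


def manhattan_sorted(X, Y):
    """Manhattan distance via one simultaneous pass over two sorted arrays."""
    i = j = 0
    d = 0
    while i < len(X) and j < len(Y):
        (wx, p), (wy, q) = X[i], Y[j]
        if wx == wy:        # match: word occurs in both sequences
            d += abs(p - q)
            i += 1
            j += 1
        elif wx < wy:       # mismatch: word occurs only in x
            d += p
            i += 1
        else:               # mismatch: word occurs only in y
            d += q
            j += 1
    # Whatever remains in either array occurs in one sequence only.
    d += sum(p for _, p in X[i:])
    d += sum(q for _, q in Y[j:])
    return d


if __name__ == "__main__":
    X = sorted_ngrams("abbaa", 3)
    Y = sorted_ngrams("baaaab", 3)
    print(X)
    print(Y)
    print(manhattan_sorted(X, Y))
```

Each array index only ever moves forward, so the comparison takes at most |X| + |Y| steps.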


13 How to sort (sub-)sequences?
- Radix sort at byte level: simple, linear running time.
- Store subsequences in machine words and use numeric sorting: simple, superlinear running time, extremely low constants.
- Ditto, but use radix sorting at bit level.
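
The second option can be sketched in Python as follows. This is only an illustration under assumptions: Python integers are not fixed-width machine words, the helper names are invented here, and list.sort is a comparison sort, which corresponds to the "numeric sorting, superlinear" variant rather than to radix sorting.

```python
from itertools import groupby


def packed_ngrams(x: bytes, n: int):
    """Pack every n-gram of x (n <= 8) into one integer and sort numerically.

    Bytes are packed big-endian, so numeric order equals lexicographic order
    for words of the same length.
    """
    if not 1 <= n <= 8:
        raise ValueError("n must fit into a 64-bit word")
    mask = (1 << (8 * n)) - 1
    word = 0
    out = []
    for i, b in enumerate(x):
        # Rolling update: shift in the next byte and drop the oldest one.
        word = ((word << 8) | b) & mask
        if i >= n - 1:
            out.append(word)
    out.sort()
    return out


def count_runs(sorted_words):
    """Collapse a sorted list into (word, count) pairs."""
    return [(w, len(list(g))) for w, g in groupby(sorted_words)]


def unpack(word: int, n: int) -> bytes:
    return word.to_bytes(n, "big")


if __name__ == "__main__":
    arr = count_runs(packed_ngrams(b"baaaab", 3))
    print([(unpack(w, 3), c) for w, c in arr])
```

The rolling update touches each input byte once, so extraction is linear; only the sorting step is superlinear in this sketch.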

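14 Suffix tree: a definition
A suffix tree for an m-character string S stores all suffixes of S.
Example: S = ababc$
[Figure: suffix tree for S = ababc$. The root has outgoing edges ab, b and c$ (the latter ending in leaf 5); the nodes below ab and below b each have two outgoing edges, abc$ and c$.]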

15 Properties of suffix trees
- A suffix tree has exactly m leaves, numbered 1 to m.
- Each internal node has at least two children.
- Each edge is labeled by a non-empty substring of S.
- All edges leaving the same node begin with different symbols.
- For any leaf i, the concatenation of the labels on the path from the root to i is the suffix of S starting at position i, i.e. S[i..m].

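16 What are suffix trees good for?
Problem: Given a string S of length n and a pattern P of length m, m ≤ n, find the positions of all occurrences of P in S.
- Classical solution: O(m + n) (e.g. Knuth-Morris-Pratt)
- Suffix tree solution: O(m) for the lookup, given a prebuilt suffix tree for S
Example: S = ababc$, P = ab
[Figure: suffix tree for S = ababc$ with edges ab, b, c$ at the root and abc$, c$ below ab and below b; the pattern P = ab is matched along the edge ab.]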
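
The lookup idea can be tried out with a much simpler structure than a real suffix tree. The sketch below builds an uncompressed suffix trie in quadratic time and space, which is an assumption made to keep the example short; it is not a linear-time construction such as Ukkonen's. The names build_suffix_trie and find_occurrences are hypothetical.

```python
def build_suffix_trie(s: str):
    """Insert every suffix of s into a trie, character by character.

    Each node records the 1-based start positions of all suffixes passing
    through it, so a lookup needs no extra traversal to report positions.
    """
    root = {"children": {}, "starts": []}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(i + 1)
    return root


def find_occurrences(root, p: str):
    """Walk down the trie along p; takes len(p) steps regardless of len(s)."""
    node = root
    for ch in p:
        node = node["children"].get(ch)
        if node is None:
            return []       # p does not occur in s
    return node["starts"]


if __name__ == "__main__":
    trie = build_suffix_trie("ababc$")
    print(find_occurrences(trie, "ab"))
    print(find_occurrences(trie, "ca"))
```

A real suffix tree reaches the same node by following edge labels instead of single characters and reads the positions off the leaves below it.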

17 Compact suffix tree storage
- Labels are replaced by index ranges.
- Internal nodes contain depth and leaf counts.
- Suffix links point to subtrees corresponding to the next suffix.
Example: S = ababc$. The edge labels become index ranges into S: ab -> [1, 2], b -> [2, 2], c$ -> [5, e], abc$ -> [3, e], where e marks the end of the string.

18-24 Chang & Lawler Algorithm: an example
Given a suffix tree for S, we can count matching substrings in S and P by walking along P and S.
Example: S = ababc$, P = baaaba
Steps (one per slide):
1. scan b
2. scan a: MATCH, count 1
3. scan a: MISMATCH
4. scan a: MISMATCH
5. scan b: MATCH, count 2
6. scan a: MATCH, count 1
[Figure: suffix tree for S = ababc$ with edges ab, b, c$ at the root and abc$, c$ below ab and below b; the current position of the walk is highlighted at each step.]

25 Generalized suffix tree (GST)
A suffix tree for more than one string.
Creation: concatenate two strings with different delimiters and build a single suffix tree.
Example: GST for x = abbaa and y = baaaa, with x terminated by # and y by $.
[Figure: generalized suffix tree for abbaa# and baaaa$; edge labels include a, b, #, $, bbaa#, aa, baa#, aa$ and a$.]

26-30 Similarity computation using GST
Example: linear kernel over 2-grams for x = abbaa and y = baaaa. The tree is traversed once and the kernel value is accumulated word by word, starting from 0.
2-gram   abbaa   baaaa   running total
aa       1       3       3
ab       1       0       3
ba       1       1       4
bb       1       0       4
Result: k(abbaa, baaaa) = 4
[Figure: generalized suffix tree for abbaa# and baaaa$; the nodes at depth 2 correspond to the 2-grams.]
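
The traversal can be imitated with a depth-limited trie that keeps one counter per input string at every node. This is a simplified stand-in chosen for this sketch, not a generalized suffix tree: it omits delimiters, edge compression and suffix links, and the function names are hypothetical.

```python
def build_generalized_trie(strings, max_depth):
    """Insert all suffixes of all strings, truncated to max_depth characters.

    Every node stores, per input string, how many suffixes pass through it;
    at depth n this is the number of occurrences of the corresponding n-gram.
    """
    root = {}
    for idx, s in enumerate(strings):
        for i in range(len(s)):
            node = root
            for ch in s[i:i + max_depth]:
                entry = node.setdefault(
                    ch, {"counts": [0] * len(strings), "children": {}})
                entry["counts"][idx] += 1
                node = entry["children"]
    return root


def ngram_kernel(root, n):
    """Linear kernel over n-grams for exactly two strings stored in the trie."""
    total = 0
    stack = [(root, 1)]
    while stack:
        node, depth = stack.pop()
        for entry in node.values():
            if depth == n:
                cx, cy = entry["counts"]
                total += cx * cy        # inner function of the linear kernel
            else:
                stack.append((entry["children"], depth + 1))
    return total


if __name__ == "__main__":
    trie = build_generalized_trie(["abbaa", "baaaa"], max_depth=2)
    print(ngram_kernel(trie, 2))
```

Suffixes shorter than n never reach depth n, so they do not contribute, which matches the n-gram counts in the table.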

31 Lessons learned
- Extraction of features from packet payloads is tricky but can be done efficiently with specialized data structures.
- In practice, sorted arrays work best for computation of similarity measures.
- Suffix trees are the most powerful data structure for feature extraction; they will be used for other problems.

32 Recommended reading
- D. Gusfield. Algorithms on strings, trees, and sequences. Cambridge University Press.
- K. Rieck and P. Laskov. Linear-time computation of similarity measures for sequential data. Journal of Machine Learning Research, 9(Jan):23-48.
- E. Ukkonen. Online construction of suffix trees. Algorithmica, 14(3), 1995.


More information

REGular and Context-Free Grammars

REGular and Context-Free Grammars REGular and Context-Free Grammars Nicholas Mainardi 1 Dipartimento di Elettronica e Informazione Politecnico di Milano nicholas.mainardi@polimi.it March 26, 2018 1 Partly Based on Alessandro Barenghi s

More information

Lecture 18 April 26, 2012

Lecture 18 April 26, 2012 6.851: Advanced Data Structures Spring 2012 Prof. Erik Demaine Lecture 18 April 26, 2012 1 Overview In the last lecture we introduced the concept of implicit, succinct, and compact data structures, and

More information

arxiv: v1 [cs.ds] 9 Apr 2018

arxiv: v1 [cs.ds] 9 Apr 2018 From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract

More information

arxiv: v2 [cs.ds] 16 Mar 2015

arxiv: v2 [cs.ds] 16 Mar 2015 Longest common substrings with k mismatches Tomas Flouri 1, Emanuele Giaquinta 2, Kassian Kobert 1, and Esko Ukkonen 3 arxiv:1409.1694v2 [cs.ds] 16 Mar 2015 1 Heidelberg Institute for Theoretical Studies,

More information

More Properties of Regular Languages

More Properties of Regular Languages More Properties of Regular anguages 1 We have proven Regular languages are closed under: Union Concatenation Star operation Reverse 2 Namely, for regular languages 1 and 2 : Union 1 2 Concatenation Star

More information

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS 1. [6 POINTS] For language L 1 = {0 n 1 m n, m 1, m n}, which string is in L 1? ANSWER: 0001111 is in L 1 (with n =

More information

ON THE STAR-HEIGHT OF SUBWORD COUNTING LANGUAGES AND THEIR RELATIONSHIP TO REES ZERO-MATRIX SEMIGROUPS

ON THE STAR-HEIGHT OF SUBWORD COUNTING LANGUAGES AND THEIR RELATIONSHIP TO REES ZERO-MATRIX SEMIGROUPS ON THE STAR-HEIGHT OF SUBWORD COUNTING LANGUAGES AND THEIR RELATIONSHIP TO REES ZERO-MATRIX SEMIGROUPS TOM BOURNE AND NIK RUŠKUC Abstract. Given a word w over a finite alphabet, we consider, in three special

More information

Mining Frequent Closed Unordered Trees Through Natural Representations

Mining Frequent Closed Unordered Trees Through Natural Representations Mining Frequent Closed Unordered Trees Through Natural Representations José L. Balcázar, Albert Bifet and Antoni Lozano Universitat Politècnica de Catalunya Pascal Workshop: learning from and with graphs

More information

A canonical semi-deterministic transducer

A canonical semi-deterministic transducer A canonical semi-deterministic transducer Achilles A. Beros Joint work with Colin de la Higuera Laboratoire d Informatique de Nantes Atlantique, Université de Nantes September 18, 2014 The Results There

More information

String Search. 6th September 2018

String Search. 6th September 2018 String Search 6th September 2018 Search for a given (short) string in a long string Search problems have become more important lately The amount of stored digital information grows steadily (rapidly?)

More information

CS6901: review of Theory of Computation and Algorithms

CS6901: review of Theory of Computation and Algorithms CS6901: review of Theory of Computation and Algorithms Any mechanically (automatically) discretely computation of problem solving contains at least three components: - problem description - computational

More information

CS 154, Lecture 3: DFA NFA, Regular Expressions

CS 154, Lecture 3: DFA NFA, Regular Expressions CS 154, Lecture 3: DFA NFA, Regular Expressions Homework 1 is coming out Deterministic Finite Automata Computation with finite memory Non-Deterministic Finite Automata Computation with finite memory and

More information

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1. Theory of Computer Science March 20, 2017 C1. Formal Languages and Grammars Theory of Computer Science C1. Formal Languages and Grammars Malte Helmert University of Basel March 20, 2017 C1.1 Introduction

More information

Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer.

Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer. Context Free Languages (CFL) Language Recognizer A device that accepts valid strings. The FA are formalized types of language recognizer. Language Generator: Context free grammars are language generators,

More information

Deterministic Finite Automata (DFAs)

Deterministic Finite Automata (DFAs) Algorithms & Models of Computation CS/ECE 374, Spring 29 Deterministic Finite Automata (DFAs) Lecture 3 Tuesday, January 22, 29 L A TEXed: December 27, 28 8:25 Chan, Har-Peled, Hassanieh (UIUC) CS374 Spring

More information

CS 121, Section 2. Week of September 16, 2013

CS 121, Section 2. Week of September 16, 2013 CS 121, Section 2 Week of September 16, 2013 1 Concept Review 1.1 Overview In the past weeks, we have examined the finite automaton, a simple computational model with limited memory. We proved that DFAs,

More information

Exam 1 CSU 390 Theory of Computation Fall 2007

Exam 1 CSU 390 Theory of Computation Fall 2007 Exam 1 CSU 390 Theory of Computation Fall 2007 Solutions Problem 1 [10 points] Construct a state transition diagram for a DFA that recognizes the following language over the alphabet Σ = {a, b}: L 1 =

More information

Roberto Perdisci^+, Guofei Gu^, Wenke Lee^ presented by Roberto Perdisci. ^Georgia Institute of Technology, Atlanta, GA, USA

Roberto Perdisci^+, Guofei Gu^, Wenke Lee^ presented by Roberto Perdisci. ^Georgia Institute of Technology, Atlanta, GA, USA U s i n g a n E n s e m b l e o f O n e - C l a s s S V M C l a s s i f i e r s t o H a r d e n P a y l o a d - B a s e d A n o m a l y D e t e c t i o n S y s t e m s Roberto Perdisci^+, Guofei Gu^, Wenke

More information

p 3 p 2 p 4 q 2 q 7 q 1 q 3 q 6 q 5

p 3 p 2 p 4 q 2 q 7 q 1 q 3 q 6 q 5 Discrete Fréchet distance Consider Professor Bille going for a walk with his personal dog. The professor follows a path of points p 1,..., p n and the dog follows a path of points q 1,..., q m. We assume

More information

Automata Theory CS F-04 Non-Determinisitic Finite Automata

Automata Theory CS F-04 Non-Determinisitic Finite Automata Automata Theory CS411-2015F-04 Non-Determinisitic Finite Automata David Galles Department of Computer Science University of San Francisco 04-0: Non-Determinism A Deterministic Finite Automata s transition

More information

{a, b, c} {a, b} {a, c} {b, c} {a}

{a, b, c} {a, b} {a, c} {b, c} {a} Section 4.3 Order Relations A binary relation is an partial order if it transitive and antisymmetric. If R is a partial order over the set S, we also say, S is a partially ordered set or S is a poset.

More information