Finding the Longest Similar Subsequence of Thumbprints for Intrusion Detection

Ming D. Wan, Shou-Hsuan Stephen Huang, and Jianhua Yang
Department of Computer Science, University of Houston, Houston, Texas, 77204, USA
Email: {mingwan, shuang, jhyang}@cs.uh.edu
http://www.cs.uh.edu

Technical Report Number UH-CS-05-26, December 13, 2005

Keywords: Network Security, Intrusion Detection, Thumbprint, Similarity, Dynamic Programming.

Abstract

One way to detect intruders on the Internet is to compare the similarity of two thumbprints. A thumbprint is a summary of a connection that characterizes the connection. The packet gap thumbprint consists of a sequence of non-negative real numbers representing the time gaps between sent packets. This paper formalizes the definition of similarity between two sequences of non-negative real numbers by introducing ε-similarity, the partial sum, and the longest ε-similar subsequence (LSS). The length of the LSS is a measure of similarity between two sequences. The Longest ε-similar Subsequence (LSS) problem is a generalization of the well-known Longest Common Subsequence (LCS) problem. The goal of this paper is to find an optimal solution to the LSS problem. We analyze the properties of partial sums and propose to focus on the minimum matched partial sum, which leads to an optimal solution to LSS while reducing the problem space. As the LSS problem has optimal substructure, we propose an algorithm based on the dynamic programming technique. The time complexity of this algorithm is $O(m^2 n^2)$. By using a property of the partial sums, we reduce the time complexity to $O(mn(m+n))$.

1. Introduction

One way of detecting stepping-stones is by monitoring a site's incoming and outgoing traffic. In general we have the problem of determining whether two connections belong to the same connection chain [ZP00]. In a recent paper, we proposed to use time gaps between packets as a temporal thumbprint to identify a connection [YH05]. The method requires us to compare two such thumbprints to see if they are similar. Even though there are efficient heuristic algorithms to compare two gap sequences, the issue has never been studied formally.

We propose a formal definition of the problem by introducing ε-similarity, the partial sum, and the longest ε-similar subsequence (LSS), and cast it as a generalization of the well-known Longest Common Subsequence (LCS) problem. It is not a trivial task to define what is similar in two sequences of numbers which may differ in length. The definition and solution we derive here can be used in comparing many different thumbprints.

The LSS problem is much more complicated than the LCS problem because partial sums are involved. We analyze the properties of partial sums and propose to focus on the minimum matched partial sum (MMPS). The LSS problem has an optimal substructure similar to that of the LCS problem. With the dynamic programming technique, we obtain an $O(m^2 n^2)$ algorithm that finds the optimal solution to the LSS problem. Based on the properties of partial sums as defined in this paper, we derive a more efficient optimal algorithm with time complexity $O(mn(m+n))$. In practice, matches between very large minimum matched partial sums, which sum up a large number of elements, are likely to be false match-ups in the thumbprint application. By limiting the size of an MMPS to a constant s, we reduce the complexity of a suboptimal solution to $O(smn)$.

The rest of this paper is organized as follows. In Section 2, we propose formal definitions. In Section 3, we define a particular partial sum, which sums up consecutive elements before one element. In Section 4, the properties of LSS are studied and the dynamic programming technique is used to solve the LSS problem. In Section 5, a more efficient optimal algorithm and a heuristic algorithm are proposed to reduce the time complexity to $O(mn(m+n))$ and $O(smn)$ respectively.

2. Definitions

The first major difficulty of our problem is the issue of similarity. The following definitions define the ε-similarity between two sequences.

ε-similarity: Given a ratio ε between 0 and 1 inclusive and two non-negative real numbers a and b, we define a and b to be ε-similar to each other if $|a-b|/(a+b) \le \varepsilon$. In other words, the two numbers are within ε of each other proportionally (difference divided by the sum). We then generalize the similarity to two sequences of the same length. Two sequences a[1..n] and b[1..n] are ε-similar if a[i] and b[i] are ε-similar for all i = 1, …, n.

Partial Sum: Given a sequence a[1], …, a[m], a partial sum of the sequence is the sum of one or more consecutive elements of the sequence, a[i] + a[i+1] + … + a[j] for some $i \le j$.

Partial Sum Subsequence: A partial sum subsequence of a[1], …, a[m] is a sequence of partial sums a'[1], …, a'[m'] such that

$$a'[i] = \sum_{k=s[i]}^{t[i]} a[k], \qquad 1 \le s[i] \le t[i] < s[i+1] \le t[i+1] \le m \quad \text{for all } i = 1, \dots, m'-1.$$

The length of the (partial sum) subsequence is defined as m'.

Longest ε-similar Subsequence (LSS): Given two sequences of non-negative real numbers a[1], …, a[m] and b[1], …, b[n], if (i) there exist a (partial sum) subsequence a'[1..p] of a and a (partial sum) subsequence b'[1..p] of b such that a' and b' are ε-similar, and (ii) there are no other such subsequences with a longer length, we define a' and b' as the longest ε-similar subsequence (LSS) of a and b. The length of the LSS can then be used to measure the similarity of the two sequences. The similarity ratio of the two sequences is defined as p/min(|a|, |b|), that is, p/min(m, n). An example of an LSS of length 4 is shown in Figure 1 below.

X: 10 20 30 15 15 40 50 80
Y: 30 60 70 20 30 50
Figure 1: LSS example with ε=0

In Figure 1 the matched partial sums are 10+20 = 30, 30+15+15 = 60, 40+50 = 70+20 = 90, and 80 = 30+50.

If we choose ε=0 and s[i]=t[i] for all i in the definition of partial sum subsequence (i.e., no non-trivial partial sums allowed), the LSS becomes the LCS. Therefore, the LSS problem is a generalized LCS problem. See Figure 2 for an LSS example that is also an LCS.

X: 10 20 30 15 15 40 50 80
Y: 30 60 70 20 30 50
Figure 2: LCS example
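The definitions above can be checked mechanically on the Figure 1 data. The following is a minimal Python sketch, not code from the paper: the function names are hypothetical, and the test is written as |a - b| <= ε(a + b) so that a = b = 0 is treated as similar, which is an assumption the text does not address.

```python
def eps_similar(a, b, eps):
    """True if non-negative a and b are eps-similar.

    Written as |a - b| <= eps * (a + b) to avoid dividing by zero;
    treating a = b = 0 as similar is an assumption of this sketch.
    """
    return abs(a - b) <= eps * (a + b)


def partial_sum_subsequence(seq, runs):
    """Collapse each run (s, t) of 1-based inclusive indices into its sum."""
    previous_end = 0
    sums = []
    for s, t in runs:
        # Runs must be non-empty, in increasing order, and disjoint.
        if not (previous_end < s <= t <= len(seq)):
            raise ValueError("runs must be non-empty, ordered and disjoint")
        sums.append(sum(seq[s - 1:t]))
        previous_end = t
    return sums


def sequences_similar(a, b, eps):
    """Two equal-length sequences are eps-similar if they are so element-wise."""
    return len(a) == len(b) and all(eps_similar(p, q, eps) for p, q in zip(a, b))


# Sequences from Figure 1 with the run boundaries of its length-4 LSS.
X = [10, 20, 30, 15, 15, 40, 50, 80]
Y = [30, 60, 70, 20, 30, 50]
x_sub = partial_sum_subsequence(X, [(1, 2), (3, 5), (6, 7), (8, 8)])
y_sub = partial_sum_subsequence(Y, [(1, 1), (2, 2), (3, 4), (5, 6)])
print(x_sub, y_sub, sequences_similar(x_sub, y_sub, 0.0))
```

Both partial sum subsequences come out as 30, 60, 90, 80, so they are ε-similar even for ε=0.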

To solve the LSS problem, a brute-force approach is to enumerate all partial sum subsequences of a[1..m] and check each one to see if it is ε-similar to a partial sum subsequence of b[1..n], keeping track of the longest subsequence found. A sequence a[1..m] has exponentially many partial sum subsequences, making this approach impractical for long sequences. The well-known LCS problem has an O(mn) algorithm [CL01], where m and n are the lengths of the two sequences, based on the dynamic programming technique [KC72, MP80]. Like the LCS problem, the LSS problem has an optimal-substructure property, so it is possible to use dynamic programming to solve it. However, because partial sums are involved, the solution of the LSS problem is more complex and requires more time than that of the LCS problem.

3. Partial Sum Selection

A partial sum may contain one or many consecutive elements of a sequence. For one element of a sequence, there are many ways to choose the starting and ending elements of a partial sum that contains it, which introduces extra complexity. To reduce the problem space, we focus on one particular way to define the starting and ending elements of a partial sum. If there is any optimal solution to the LSS problem, there exists an optimal solution consisting of partial sums selected by this method. We start by defining the partial sum selection and follow with the proof of its optimality.

Given sequences X[1..m] and Y[1..n] and the maximum length s of a partial sum, we calculate partial sums backward starting from $x_i$ and $y_j$, $1 \le i \le m$, $1 \le j \le n$. There are s partial sums starting from $x_i$ backward with maximum length s: $\{(x_i), (x_{i-1}+x_i), \dots, (x_{i-s+1}+x_{i-s+2}+\dots+x_i)\}$. There is the same number of partial sums for $y_j$. These sets are called the Partial Sums ending at $x_i$ and the Partial Sums ending at $y_j$ respectively. Figure 3 shows the Partial Sums ending at $x_i$ and $y_j$.

X: $x_1 \; \dots \; x_{i-s} \; x_{i-s+1} \; \dots \; x_{i-1} \; x_i \; \dots \; x_m$
Y: $y_1 \; \dots \; y_{j-s} \; y_{j-s+1} \; \dots \; y_{j-1} \; y_j \; \dots \; y_n$
Figure 3: Partial Sums ending at $x_i$ and $y_j$

To check for ε-similarity between the Partial Sums ending at $x_i$ and $y_j$, there are $s^2$ match evaluations. It is possible that several pairs of partial sums are ε-similar at the same time. To reduce the problem space, we do not need to consider all of those matched partial sums as long as we can find the one which can produce the Longest ε-similar Subsequence. The Minimum Matched Partial Sums can produce the longest ε-similar subsequence, so we focus only on this form of partial sum. For convenience, we write $a \approx b$ when a is ε-similar to b and $a \not\approx b$ when a is not ε-similar to b.

Minimum Matched Partial Sum (MMPS): Given sequences $X=\{x_1, x_2, \dots, x_m\}$ and $Y=\{y_1, y_2, \dots, y_n\}$, let $x'$ and $x''$ be any Partial Sums ending at $x_i$ and let $y'$ and $y''$ be any Partial Sums ending at $y_j$:

$$x' = \sum_{k=u}^{i} x_k, \quad x'' = \sum_{k=u'}^{i} x_k, \quad y' = \sum_{k=v}^{j} y_k, \quad y'' = \sum_{k=v'}^{j} y_k,$$

where $1 \le i \le m$, $1 \le j \le n$, $1 \le u, u' \le i$ and $1 \le v, v' \le j$. If $x' \approx y'$ and, for every other matched pair $x'' \approx y''$, $x' < x''$ and $y' < y''$, then $x'$ and $y'$ are a pair of Minimum Matched Partial Sums (MMPS).
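To make the selection rule concrete, the sketch below enumerates the Partial Sums ending at a given pair of positions, lists every ε-similar pair, and picks the minimum one. It is an illustrative Python sketch under stated assumptions rather than the paper's code: the function names are hypothetical, positions are 1-based as in the text, and a pair is reported by the start indices (u, v) of its two runs.

```python
def eps_similar(a, b, eps):
    # Division-free form of |a - b| / (a + b) <= eps (assumed convention).
    return abs(a - b) <= eps * (a + b)


def sums_ending_at(seq, i, s):
    """Partial sums seq[start..i] (1-based, inclusive) of at most s elements.

    Returned as (start, total) pairs, shortest run first.
    """
    out = []
    total = 0
    for start in range(i, max(i - s, 0), -1):
        total += seq[start - 1]
        out.append((start, total))
    return out


def matched_pairs(x, y, i, j, eps, s):
    """All (u, v) with sum(x[u..i]) eps-similar to sum(y[v..j]); up to s*s checks."""
    return [(u, v)
            for u, sx in sums_ending_at(x, i, s)
            for v, sy in sums_ending_at(y, j, s)
            if eps_similar(sx, sy, eps)]


def minimum_matched(x, y, i, j, eps, s):
    """Start indices of the MMPS ending at (i, j), or None if nothing matches.

    Pairs are generated smallest sums first, so the first match is the minimum.
    """
    pairs = matched_pairs(x, y, i, j, eps, s)
    return pairs[0] if pairs else None


X = [10, 20, 30, 15, 15, 40, 50, 80]
Y = [30, 60, 70, 20, 30, 50]
print(matched_pairs(X, Y, 5, 2, 0.0, 5))    # two matched pairs end at (x_5, y_2)
print(minimum_matched(X, Y, 5, 2, 0.0, 5))  # the one with the smaller sums
```

For the sample sequences with ε=0, two pairs end at $x_5$ and $y_2$: 30+15+15 = 60 and 10+20+30+15+15 = 30+60 = 90. The first is the MMPS because it leaves the longer prefixes X[1..2] and Y[1..1] available for earlier matches.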

Figure 4 shows two pairs of ε-similar partial sums, where $x' = \sum_{k=u}^{i} x_k$ is ε-similar to $y' = \sum_{k=v}^{j} y_k$, and $x'' = \sum_{k=u'}^{i} x_k$ is ε-similar to $y'' = \sum_{k=v'}^{j} y_k$, with $x' < x''$ and $y' < y''$. Even though $x''$ and $y''$ are ε-similar, only $x'$ and $y'$ form a pair of matched partial sums with minimum value, so they are the Minimum Matched Partial Sum (MMPS).

X = $x_1 \; x_2 \; \dots \; x_{u'} \; \dots \; x_u \; \dots \; x_{i-1} \; x_i \; x_{i+1} \; \dots \; x_m$
Y = $y_1 \; y_2 \; \dots \; y_{v'} \; \dots \; y_v \; \dots \; y_{j-1} \; y_j \; y_{j+1} \; \dots \; y_n$
Figure 4: Minimum Matched Partial Sums

Theorem 1: Given sequences $X=\{x_1, x_2, \dots, x_m\}$ and $Y=\{y_1, y_2, \dots, y_n\}$, let $x' = \sum_{k=u}^{i} x_k$ be a Partial Sum ending at $x_i$ and $y' = \sum_{k=v}^{j} y_k$ a Partial Sum ending at $y_j$, where $1 \le i \le m$, $1 \le j \le n$, $1 \le u \le i$, $1 \le v \le j$. If $x'$ and $y'$ are a pair of Minimum Matched Partial Sums (MMPS), then among all matched pairs of partial sums ending at $x_i$ and $y_j$ the MMPS yields an ε-similar subsequence at least as long as any other. Restricting attention to the MMPS therefore preserves an optimal solution to the LSS problem for X and Y.

Proof: The MMPS leaves the longest prefix sequences X[1..u-1] and Y[1..v-1]. Larger matched partial sums leave shorter prefixes X[1..u'-1] and Y[1..v'-1], which are themselves prefixes of X[1..u-1] and Y[1..v-1]. Any ε-similar subsequence of the shorter prefixes is therefore also an ε-similar subsequence of the longer ones, so the MMPS produces an ε-similar subsequence at least as long as any other matched pair does. If the MMPS cannot produce a longer ε-similar subsequence, other matched partial sums cannot do so either.

By considering only the Minimum Matched Partial Sum ending at $x_i$ and $y_j$, we avoid evaluating all other larger candidate partial sums, which reduces the computation time.

4. Property and Solution of LSS

4.1 Characterizing a Longest ε-similar Subsequence

Like the well-known LCS problem, the LSS problem has an optimal-substructure property. The natural classes of sub-problems correspond to pairs of prefixes of the two input sequences. We write $X_i$ and $Y_j$ for the prefixes X[1..i] and Y[1..j].

Theorem 2 (Optimal substructure of an LSS): Let $X=\{x_1, x_2, \dots, x_m\}$ and $Y=\{y_1, y_2, \dots, y_n\}$ be sequences, and let $Z=\{z_1, z_2, \dots, z_k\}$ be any LSS of X and Y. Let $x'_m = \sum_{i=u}^{m} x_i$, $1 \le u \le m$, be an MMPS ending at $x_m$, and let $y'_n = \sum_{j=v}^{n} y_j$, $1 \le v \le n$, be an MMPS ending at $y_n$; if there is no MMPS ending at $x_m$ and $y_n$, take $x'_m = x_m$ and $y'_n = y_n$. Then there are 3 cases:
1. If $x'_m$ is ε-similar to $y'_n$, then $z_k$ is the matched pair $(x'_m, y'_n)$, and $Z_{k-1}$ is an LSS of $X_{u-1}$ and $Y_{v-1}$.
2. If $x'_m$ is not ε-similar to $y'_n$, then $z_k$ not ending at $x_m$ implies that Z is an LSS of $X_{m-1}$ and Y.
3. If $x'_m$ is not ε-similar to $y'_n$, then $z_k$ not ending at $y_n$ implies that Z is an LSS of X and $Y_{n-1}$.

Proof: (1) If $z_k$ were not the pair $(x'_m, y'_n)$, then we could append that matched pair to Z to obtain an ε-similar subsequence of X and Y of length k+1, contradicting the supposition that Z is a longest ε-similar subsequence of X and Y. Thus $z_k$ must be the pair $(x'_m, y'_n)$. Now, the prefix $Z_{k-1}$ is an ε-similar subsequence of $X_{u-1}$ and $Y_{v-1}$ of length k-1. We wish to show that it is an LSS. Suppose for the purpose of contradiction that there is an ε-similar subsequence W of $X_{u-1}$ and $Y_{v-1}$ with length greater than k-1. Then appending the pair $(x'_m, y'_n)$ to W produces an ε-similar subsequence of X and Y whose length is greater than k, which is a contradiction.

(2) If $x'_m$ is not ε-similar to $y'_n$ and $z_k$ does not end at $x_m$, then Z is an ε-similar subsequence of $X_{m-1}$ and Y. If there were an ε-similar subsequence W of $X_{m-1}$ and Y with length greater than k, then W would also be an ε-similar subsequence of $X_m$ and Y, contradicting the assumption that Z is an LSS of X and Y.

(3) The proof is symmetric to (2).

The theorem shows that an LSS of two sequences contains an LSS of prefixes of the two sequences. Thus, the LSS problem has an optimal-substructure property.

4.2 A Recursive Solution

Since the LSS problem has an optimal-substructure property, we can apply the dynamic programming technique to solve it. To find an LSS of X and Y, we may need to find the LSSs of X and $Y_{n-1}$ and of $X_{m-1}$ and Y. But each of these sub-problems shares the same sub-sub-problems. To summarize, we enumerate the sub-problems:
1. if $x'_m$ is ε-similar to $y'_n$, then the sub-problem is the LSS of $X_{u-1}$ and $Y_{v-1}$;
2. if $x'_m$ is not ε-similar to $y'_n$, then the sub-problems are the LSS of $X_m$ and $Y_{n-1}$ and the LSS of $X_{m-1}$ and $Y_n$.

Let c[i,j] be the length of an LSS of the prefixes $X_i$ and $Y_j$. If either i=0 or j=0, one of the sequences has length 0, so the LSS has length 0. The optimal substructure of the LSS problem gives the recursive formula

$$c[i,j] = \max\{\, c[u-1,v-1] + \delta,\; c[i-1,j],\; c[i,j-1] \,\}$$

where δ=1 if there is an MMPS ending at $x_i$ and $y_j$, that is, $x'_i = \sum_{k=u}^{i} x_k$ is ε-similar to $y'_j = \sum_{k=v}^{j} y_k$ with $1 \le i \le m$, $1 \le u \le i$, $1 \le j \le n$, $1 \le v \le j$, and δ=0 otherwise.

As stated in the above formula, a condition in the problem restricts which sub-problems we may consider. When $x'_i$ is ε-similar to $y'_j$, we need to find the LSS of $X_{u-1}$ and $Y_{v-1}$. Otherwise, we need to find the LSS of $X_i$ and $Y_{j-1}$ and of $X_{i-1}$ and $Y_j$.

4.3 Algorithm for Computing LSS

Procedure LSS_LENGTH takes two sequences $X=\{x_1, x_2, \dots, x_m\}$ and $Y=\{y_1, y_2, \dots, y_n\}$ as inputs. It stores the c[i,j] values in a table c[0..m, 0..n] whose entries are computed in row-major order. The first row and column are initialized to 0. Two further tables record where an ε-similar Minimum Matched Partial Sum (MMPS) is found:
- b[1..m, 1..n]: stores a pointer to the optimal structure.
- s[1..m, 1..n]: s[i,j]=1 if (i,j) is an entry element of an MMPS, otherwise 0.
The procedure returns the c, b, and s tables.
Entry c[m,n] contains the length of an LSS of X and Y. The subroutine MMPS1() is called for each $x_i$ and $y_j$ to check whether there exists an ε-similar Minimum Matched Partial Sum (MMPS) ending at $x_i$ and $y_j$.

LSS_LENGTH(X, Y)
    m = length[X]; n = length[Y]
    c[0..m, 0..n] = 0
    for i = 1 to m
        for j = 1 to n
            (found, u, v) = MMPS1(i, j)
            if found and c[u-1, v-1] + 1 >= max(c[i-1, j], c[i, j-1])
                c[i, j] = c[u-1, v-1] + 1
                b[i, j] = (u-1, v-1)
                s[i, j] = 1
            else
                s[i, j] = 0
                if c[i, j-1] > c[i-1, j]
                    c[i, j] = c[i, j-1]
                    b[i, j] = (i, j-1)
                else
                    c[i, j] = c[i-1, j]
                    b[i, j] = (i-1, j)
    return (c, b, s)

Subroutine MMPS1() checks whether there is an MMPS ending at $x_i$ and $y_j$, working backward from the smallest partial sums to larger ones. It returns the starting indices of the two partial sums as soon as a Minimum Matched Partial Sum is found, and saves running time by skipping the check of all larger partial sums.

MMPS1(i, j)
    for u = i downto 1
        sum1 = sum(X[u..i])
        for v = j downto 1
            sum2 = sum(Y[v..j])
            if sum1 and sum2 are ε-similar
                return (true, u, v)
    return (false, 0, 0)

Figure 7 shows the result of running the algorithm on an example. The cell in row i and column j contains c[i,j], b[i,j], and s[i,j]. We combine the three tables into one for easier display. Angle brackets represent the pointer b[i,j]; the number before them is the c value and the number after them is the s value. The c value of 4 in the lower right corner is the length of the LSS. Entries with s[i,j]=1 are valid MMPS entries (shown as shaded boxes in the figure). The LSS computed in this example is {30, 60, 90, 80}.

Computing the tables takes O(mn) iterations, and for each table entry we need to check for an MMPS, which takes O(mn) time. Thus the total running time of the algorithm is $O(m^2 n^2)$.

4.4 Constructing an LSS

The b and s tables returned by LSS_LENGTH() can be used to construct an LSS of $X=\{x_1, x_2, \dots, x_m\}$ and $Y=\{y_1, y_2, \dots, y_n\}$. For row i and column j, when s[i,j]=1 there is a pair of valid partial sums ending at (i,j). The pointer b[i,j]=(u-1, v-1) points to the next element in the optimal structure, and the pair of partial sums is X[u..i] and Y[v..j]. PRINT_LSS_X() traces the path of the optimal structure and prints the elements of the partial sums X[u..i].

The following procedure PRINT_LSS_X(), started from the lower-right corner of the b table, prints an LSS in terms of X elements in proper order:

PRINT_LSS_X(X, s, i, j)
    if i = 0 or j = 0
        return
    PRINT_LSS_X(X, s, b[i, j])
    if s[i, j] = 1        // (i, j) is the entry element of a partial sum ending at x_i
        print the partial sum elements X[u..i]

For b[i,j]=(u-1, v-1), the elements of the partial sum ending at $x_i$ are X[u..i], and the elements of the partial sum ending at $y_j$ are Y[v..j].
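The procedures of Sections 4.3 and 4.4 can be put together in a short runnable form. This is a Python sketch under stated assumptions and not the authors' implementation: the names are hypothetical, tables are stored as lists of lists with a zero row and column, ε-similarity uses the division-free test |a - b| <= ε(a + b), and the MMPS check keeps running sums instead of re-summing each run.

```python
def eps_similar(a, b, eps):
    # Division-free form of |a - b| / (a + b) <= eps (assumed convention).
    return abs(a - b) <= eps * (a + b)


def mmps1(x, y, i, j, eps):
    """Start indices (u, v), 1-based, of the MMPS ending at x[i], y[j], or None."""
    sum_x = 0
    for u in range(i, 0, -1):
        sum_x += x[u - 1]            # sum of x[u..i]
        sum_y = 0
        for v in range(j, 0, -1):
            sum_y += y[v - 1]        # sum of y[v..j]
            if eps_similar(sum_x, sum_y, eps):
                return u, v
    return None


def lss_length(x, y, eps):
    """Fill the c (length), b (back pointer) and s (MMPS entry flag) tables."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    s = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = mmps1(x, y, i, j, eps)
            skip = max(c[i - 1][j], c[i][j - 1])
            if match is not None and c[match[0] - 1][match[1] - 1] + 1 >= skip:
                u, v = match
                c[i][j] = c[u - 1][v - 1] + 1
                b[i][j] = (u - 1, v - 1)
                s[i][j] = 1
            elif c[i][j - 1] > c[i - 1][j]:
                c[i][j] = c[i][j - 1]
                b[i][j] = (i, j - 1)
            else:
                c[i][j] = c[i - 1][j]
                b[i][j] = (i - 1, j)
    return c, b, s


def reconstruct(x, y, b, s):
    """Follow the back pointers and return the matched runs in sequence order."""
    i, j = len(x), len(y)
    pairs = []
    while i > 0 and j > 0:
        prev_i, prev_j = b[i][j]
        if s[i][j]:
            pairs.append((x[prev_i:i], y[prev_j:j]))
        i, j = prev_i, prev_j
    pairs.reverse()
    return pairs


X = [10, 20, 30, 15, 15, 40, 50, 80]
Y = [30, 60, 70, 20, 30, 50]
c, b, s = lss_length(X, Y, 0.0)
print(c[len(X)][len(Y)])
for run_x, run_y in reconstruct(X, Y, b, s):
    print(run_x, run_y)
```

On the sample sequences with ε=0 this yields an LSS of length 4 whose matched runs sum to 30, 60, 90 and 80, in agreement with the example in Section 4.3. Unlike PRINT_LSS_X(), the iterative `reconstruct` returns the runs of both sequences; that is a convenience of this sketch.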

           j      1         2         3         4         5         6
  i   X \ Y       30        60        70        20        30        50
  1    10      0<1,0>0   0<1,0>0   0<1,2>0   0<1,3>0   0<1,4>0   0<1,5>0
  2    20      1<0,0>1   1<2,1>0   1<2,2>0   1<1,3>1   1<0,4>1   1<2,5>0
  3    30      1<2,0>1   1<0,1>1   1<3,2>0   1<3,3>0   2<2,4>1   2<3,5>0
  4    15      1<3,1>0   1<4,1>0   1<4,2>0   1<4,3>0   2<3,5>0   2<4,5>0
  5    15      1<3,0>1   2<2,1>1   2<5,2>0   2<5,3>0   2<3,4>1   2<5,5>0
  6    40      1<5,1>0   2<5,2>0   2<3,2>1   2<6,3>0   2<6,4>0   2<2,3>1
  7    50      1<6,1>0   2<6,2>0   2<7,2>0   3<5,2>1   3<6,3>1   3<6,5>1
  8    80      1<7,1>0   2<7,2>0   2<6,1>1   3<7,4>0   3<8,4>0   4<7,4>1

Figure 7: The combined table computed by LSS_LENGTH on sample sequences X={10, 20, 30, 15, 15, 40, 50, 80} and Y={30, 60, 70, 20, 30, 50}.

5. More Efficient Algorithms

As mentioned in Section 4.3, LSS_LENGTH() takes $O(m^2 n^2)$ time. The algorithm is not practical if the sequences are long. By examining the properties of the partial sums, there is room to improve the efficiency of the algorithm.

5.1 A More Efficient Optimal Algorithm

As described in Section 3, for sequences X[1..m] and Y[1..n], the Partial Sums ending at $x_i$ are $\{x_i\}, \{x_i+x_{i-1}\}, \{x_i+x_{i-1}+x_{i-2}\}, \dots$, and the Partial Sums ending at $y_j$ are $\{y_j\}, \{y_j+y_{j-1}\}, \{y_j+y_{j-1}+y_{j-2}\}, \dots$. Because the elements are non-negative, both lists are in monotonically non-decreasing order. Using this property, we can replace MMPS1() with a more efficient procedure. The first step of MMPS2() puts the partial sums ending at $x_i$ and $y_j$ into arrays psum_x[] and psum_y[]. It then compares them starting from the smallest ones until an ε-similar match is found. If a match is found, it returns true with the positions of the starting elements of the pair of matched partial sums; otherwise it returns false. Since the two partial sum arrays are monotonically ordered lists, we can use the Merge procedure [HS97] to compare elements between them: whenever the current pair does not match, the list holding the smaller sum is advanced, because the smaller sum cannot match any of the remaining, larger sums in the other list. Merging two sorted lists of sizes m and n takes O(m+n) time.

MMPS2(i, j)
    psum_x[i+1] = 0; psum_y[j+1] = 0
    for u = i downto 1
        psum_x[u] = psum_x[u+1] + x[u]
    for v = j downto 1
        psum_y[v] = psum_y[v+1] + y[v]
    u = i; v = j
    while u > 0 and v > 0
        if psum_x[u] is ε-similar to psum_y[v]
            return (true, u, v)
        else if psum_x[u] > psum_y[v]
            v = v - 1        // the Y sum is too small: extend it
        else
            u = u - 1        // the X sum is too small: extend it
    return (false, 0, 0)

The time complexity of MMPS2() is O(m+n), so the total time complexity of procedure LSS_LENGTH() is reduced to $O(mn(m+n))$.
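The merge-style search can be sketched in Python as follows. This is an illustrative variation rather than the paper's listing: instead of filling the psum_x and psum_y arrays first, it keeps two running sums and extends whichever run currently has the smaller sum, which visits the same pairs in the same order. The function name and the division-free similarity test are assumptions of this sketch.

```python
def eps_similar(a, b, eps):
    # Division-free form of |a - b| / (a + b) <= eps (assumed convention).
    return abs(a - b) <= eps * (a + b)


def mmps2(x, y, i, j, eps):
    """Start indices (u, v), 1-based, of the MMPS ending at x[i], y[j], or None.

    Requires 1 <= i <= len(x) and 1 <= j <= len(y). Each step extends one of
    the two runs by one element, so at most i + j steps are taken.
    """
    u, v = i, j
    sum_x, sum_y = x[i - 1], y[j - 1]
    while True:
        if eps_similar(sum_x, sum_y, eps):
            return u, v
        if sum_x > sum_y:
            # The y run is too small to match this or any longer x run.
            v -= 1
            if v == 0:
                return None
            sum_y += y[v - 1]
        else:
            # The x run is too small to match this or any longer y run.
            u -= 1
            if u == 0:
                return None
            sum_x += x[u - 1]


X = [10, 20, 30, 15, 15, 40, 50, 80]
Y = [30, 60, 70, 20, 30, 50]
print(mmps2(X, Y, 7, 4, 0.0))  # runs x[6..7] = 40+50 and y[3..4] = 70+20
```

Discarding the smaller sum is safe because the sums on the other side only grow: a sum that is already too small for the current partner is also too small for every later one.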

5.2 A Heuristic Algorithm

When comparing the similarity of thumbprints, a pair of matched partial sums consisting of a large number of elements is highly unlikely to reflect a true similarity. We have observed a match between a pair of partial sums with 50 and 101 elements respectively. By limiting the size of a partial sum to s, we obtain a heuristic algorithm MMPS3(). The time complexity of MMPS3() is O(s), so the total time complexity of LSS_LENGTH() is O(smn). Due to space limitations and its similarity to MMPS2(), the algorithm is not listed here.

6. Experimental Result

We test our algorithm on two thumbprints with sequences of 224 and 176 positive real numbers. Values of the sequence elements range between 50,000 and 4,000,000 (packet gaps in microseconds). The program runs on a PC with an Intel Pentium III CPU. Different ε values are used to determine the performance of the algorithm. The result is shown in Table 1. The average run time is about 0.8 seconds.

The result of experiment 1 shows that the LSS length decreases when ε decreases. When ε is very small, the LSS elements consist of large partial sums. For example, when ε=0, the partial sum consists of 50 elements of X and 101 elements of Y. In the thumbprint application, partial sums that are too large may not represent the true similarity of two thumbprints. The experiment also suggests an ε around 0.1 as an ideal similarity ratio. The maximum partial sum size of 3 to 4 agrees with our intuitive understanding of the thumbprints. If we limit the size of partial sums to a small constant s, the time complexity decreases to O(smn).

              LSS      Max. partial sum size    Total elements of matched partial sums
    ε        length        X          Y               X              Y
  0.200       134          3          4              168            165
  0.150       127          4          3              164            146
  0.100       114          4          4              155            153
  0.050        95          8          5              149            146
  0.010        65         11          6              125            125
  0.001        34          7          7              101            121
  0.000         1         50        101               50            101

Table 1: No size limitation on partial sums. (X length = 254, Y length = 176)

By limiting the partial sum size to s=5, we obtain a suboptimal solution as shown in Table 2. The average run time is about 0.06 seconds. For large enough ε (10% or more), the solutions are actually optimal. For smaller ε, the solutions are not as good as the optimal solutions.

              LSS      Max. partial sum size    Total elements of matched partial sums
    ε        length        X          Y               X              Y
  0.200       134          3          4              168            165
  0.150       127          4          3              164            146
  0.100       114          4          4              155            153
  0.050        95          5          5              149            146
  0.010        64          5          5              118            113
  0.001        32          5          5               84             86
  0.000         0          0          0                0              0

Table 2: Maximum partial sum size is limited to s=5. (X length = 254, Y length = 176)
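The size-limited variant behind Table 2 is not listed in the paper. The Python sketch below shows one way such a check could look, assuming it is the merge-style search with both runs capped at s elements; `mmps3` and `lss_length_bounded` are hypothetical names, and the code makes no claim to reproduce the authors' program or its measurements.

```python
def eps_similar(a, b, eps):
    # Division-free form of |a - b| / (a + b) <= eps (assumed convention).
    return abs(a - b) <= eps * (a + b)


def mmps3(x, y, i, j, eps, s):
    """Like the merge-style search, but neither run may exceed s elements.

    Takes at most 2*s steps, so the check is O(s). Returns 1-based start
    indices (u, v) or None.
    """
    u, v = i, j
    sum_x, sum_y = x[i - 1], y[j - 1]
    while True:
        if eps_similar(sum_x, sum_y, eps):
            return u, v
        if sum_x > sum_y:
            v -= 1
            if v == 0 or j - v + 1 > s:
                return None
            sum_y += y[v - 1]
        else:
            u -= 1
            if u == 0 or i - u + 1 > s:
                return None
            sum_x += x[u - 1]


def lss_length_bounded(x, y, eps, s):
    """Length of the best eps-similar subsequence with runs of at most s elements."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = max(c[i - 1][j], c[i][j - 1])
            match = mmps3(x, y, i, j, eps, s)
            if match is not None:
                u, v = match
                best = max(best, c[u - 1][v - 1] + 1)
            c[i][j] = best
    return c[m][n]


X = [10, 20, 30, 15, 15, 40, 50, 80]
Y = [30, 60, 70, 20, 30, 50]
print(lss_length_bounded(X, Y, 0.0, 3))  # runs of up to 3 elements allowed
print(lss_length_bounded(X, Y, 0.0, 1))  # single elements only: plain LCS
```

On the small sample sequences from Section 2 with ε=0, a limit of s=3 still reaches length 4, while s=1 reduces the problem to the ordinary LCS (20, 30, 50), of length 3.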

Figure 6 compares the optimal and the suboptimal results. For the suboptimal algorithm, with the size of partial sums limited to 5, we still get a result almost identical to the optimal solution. The benefit of the heuristic algorithm is that, without sacrificing too much precision, it runs much faster than the optimal solution (about 10 times faster), making it feasible for real-time applications.

[Figure 6: Comparison of optimal and suboptimal solutions. LSS length (0 to 160) plotted against ε-similarity (0, 0.001, 0.01, 0.05, 0.1, 0.15, 0.2) for s=176 and s=5.]

7. Conclusion

The Longest ε-similar Subsequence (LSS) problem is a generalization of the Longest Common Subsequence (LCS) problem. In intrusion detection applications on the Internet, it may be necessary to compare two thumbprint sequences to see if they are similar. In order to do so, we provided a definition of similarity in this context. Even though we use a specific thumbprint (packet gap) as an example, the algorithm applies in most other cases, since most thumbprints consist of a sequence of numbers. By analyzing the properties of partial sums, we limited our computation to the minimum matched partial sum (MMPS), which leads to our optimal solution of LSS while reducing the problem space.

With the dynamic programming technique, an $O(m^2 n^2)$ algorithm for the optimal solution to the LSS problem was described. Based on the properties of partial sums, we derived a more efficient algorithm with time complexity $O(mn(m+n))$. In practice, match-ups between very large minimum matched partial sums, which sum up a large number of elements, are likely to be false match-ups in the thumbprint application. By limiting the size of an MMPS to a small constant s, we obtain an algorithm with O(smn) time complexity.

References:

[ZP00] Yin Zhang, Vern Paxson, Detecting Stepping-Stone, Proceedings of the 9th USENIX Security Symposium, Denver, CO, August 2000, pp. 67-81.
[YH05] Jianhua Yang, Shou-Hsuan Stephen Huang, Matching TCP Packets and Its Application to the Detection of Long Connection Chains, Proceedings of IEEE International Conference on Advanced Information Networking and Applications, March 2005, Taipei, Taiwan, pp. 1005-1010.
[CL01] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest and Clifford Stein, Introduction to Algorithms, Second Edition, pp. 350-355. The MIT Press, 2001.
[KC72] V. Chvatal, D. A. Klarner, and D. E. Knuth, Selected combinatorial research problems. Technical Report STAN-CS-72-292, Computer Science Department, Stanford University, 1972.
[MP80] William J. Masek and Michael S. Paterson, A faster algorithm computing string edit distances, Journal of Computer and System Sciences, 20(1):18-31, 1980.
[HS97] Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran, Computer Algorithms, pp. 146-147. Computer Science Press, 1997.