A PACKET CLASSIFIER USING LUT CASCADES BASED ON EVMDDS (K)


Hiroki Nakahara, Tsutomu Sasao, Munehiro Matsuura
Kagoshima University, Japan; Meiji University, Japan; Kyushu Institute of Technology, Japan

ABSTRACT

This paper presents a packet classifier using multiple LUT cascades based on edge-valued multi-valued decision diagrams (EVMDDs (k)). First, a set of rules for a packet classifier is partitioned into groups. Second, each group is decomposed into field functions and a Cartesian product function. Third, these functions are represented by EVMDDs (k). Finally, the EVMDDs (k) are converted to LUT cascades using adders. We implemented the proposed circuit on a Virtex VC evaluation board. The system throughput is. Gbps for minimum packet size ( Bytes). As for the normalized throughput (efficiency), the proposed one is. times better than existing FPGA implementations.

1. INTRODUCTION

1.1. Demands of Packet Classifier

Packet classification [] is a key technology in routers and firewalls. A packet header includes a protocol number, a source address, a destination address, and a port number. The packet classifier performs a predefined action for the rule that the header matches. Applications of the packet classifier include a firewall (FW), an access control list (ACL), and an IP chain for an IP masquerading technique.

With the rapid increase of traffic, core routers dissipate a major part of the total network power []. Thus, we cannot use ternary content addressable memories (TCAMs), since they dissipate too much power. In addition, a reconfigurable architecture is necessary to update policy rules of the packet classifier. With the rapid growth of the Internet, packet classifiers have become the bottleneck in network traffic management. Recently, a core router works at a Gbps link speed for a minimum packet size ( bytes). Thus, parallel processing is an effective method to increase the system throughput. In this case, the throughput per area (area efficiency) is an important measure [].
A modern FPGA consists of look-up tables (Slices), on-chip memories (BRAMs), arithmetic circuits (DSP8Es), and so on. A balanced usage of hardware resources in FPGAs is the key to realizing a high throughput per area. This paper considers a memory-based architecture on the FPGA, which dissipates less power than TCAMs.

1.2. Contributions of the Paper

This paper proposes an architecture using multiple look-up table (LUT) cascades based on edge-valued multi-valued decision diagrams (EVMDDs (k)). Our contributions are as follows:

We propose a compact and high-speed packet classifier using LUT cascades based on EVMDDs (k). Conventional methods use only Slices and BRAMs, while the proposed architecture uses DSP8E blocks in addition to Slices and BRAMs. Thus, the proposed architecture uses the available FPGA resources effectively.

We implemented a packet classifier using multiple LUT cascades on an FPGA. Its system throughput is more than Gbps. As for the efficiency measure (throughput per normalized memory size), the proposed architecture is higher than existing methods.

The rest of the paper is organized as follows: Chapter 2 introduces a packet classifier; Chapter 3 shows the LUT cascade based on an MTMDD (k); Chapter 4 shows the LUT cascade based on an EVMDD (k); Chapter 5 shows experimental results; and Chapter 6 concludes the paper.

2. PACKET CLASSIFIER

2.1. 5-tuple Packet Classification

A packet classification table consists of a set of rules. Each rule has five input fields: source address (SA), destination address (DA), source port (SP), destination port (DP), and protocol number (PRT). Also, it generates a rule number (Rule). A field has entries. In this paper, since we consider a realization of the packet classifier for the Internet protocol version (IPv), we assume that SA and DA have bits, DP and SP have bits, and PRT has 8 bits.
An entry for SA or DA is specified by an IP address; that for SP or DP is specified by an interval [x, y], where x and y denote port numbers; and that for PRT is specified by a protocol number. SA and DA are detected by a longest prefix match; SP and DP are detected by a range match; and PRT is detected by an exact match. A packet classifier detects matched rules using the packet classification table. In this paper, we assume that the rule with the largest number has the highest priority. Note that any packet matches the default rule, whose rule number is zero. Obviously, the default rule has the lowest priority. When two or more rules are matched, the rule having the highest priority is selected.

Example. Table shows an example of the packet classification table, where an asterisk * in an entry matches both 0 and 1, while a dash - in a field matches any pattern. In Table, each field has four bits, rather than the actual number of bits, to simplify the example. Consider the packet classification table shown in Table. The packet header with SA =, DA =, SP = 8, DP = 8, and PRT = TCP matches rule,

rule, and the default rule. Since the rule has the highest priority, the rule is selected.

Table. An example of a packet classification table. Inputs: SA, DA, SP, DP, PRT. Output: Rule.
* [,8] [8,9] ICMP
** *** [,9] [,8] TCP
* [8,] [,] UDP
*** ** [8,9] [,] TCP
**** **** [,] [,] - (default)

[Fig.. Decomposition of packet classification table by Cartesian product method. The inputs X SA, X DA, X SP, X DP, and X PRT (8 bit) feed the field functions, whose outputs feed the Cartesian product function.]

2.2. Decomposition of Packet Classification Table by Cartesian Product Method

Let p be the number of rules. Since X SA = X DA =, X SP = X DP =, and X PRT = 8, the direct memory realization requires log (p) bits, which is too large to implement. We use the Cartesian product method [9], which decomposes the packet classification table into field functions and a Cartesian product function. (In [9], the Cartesian product was called the cross product.)

[Fig.. Examples of segment.]

[Fig.. An example of Cartesian product method.]

An entry of a rule can be represented by an interval function []:

$$IN(X : A, B) = \begin{cases} 1 & (A \le X \le B) \\ 0 & (\text{otherwise}) \end{cases}$$

where X, A, and B are integers. Consider a prefix entry consisting of n specified bits $x_i \in \{0, 1\}$ followed by m don't-care bits $y_i = *$, and let A be the integer represented by the n specified bits. Any entry for SA is then represented by $IN(X_{SA} : A \cdot 2^m, (A + 1) \cdot 2^m - 1)$. Similarly, any entry for DA can be represented by an interval function. Any entry for PRT is represented by $IN(X_{PRT} : b, b)$, where b is a protocol number.
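As a minimal sketch of the interval function (not the authors' implementation), the following Python code evaluates IN(X : A, B) and converts a ternary prefix entry into the interval it covers. The helper names `interval_fn`, `prefix_to_interval`, and `exact_to_interval`, the string encoding of entries, and the 4-bit sample entry are assumptions made for illustration.

```python
def interval_fn(x: int, a: int, b: int) -> int:
    """IN(X : A, B): 1 if A <= X <= B, otherwise 0."""
    return 1 if a <= x <= b else 0


def prefix_to_interval(entry: str) -> tuple[int, int]:
    """Convert a ternary prefix such as '01**' into the interval [lo, hi].

    Assumption: the specified bits come first (most significant), followed
    by m trailing '*' characters, as in a longest-prefix-match entry.
    """
    prefix = entry.rstrip("*")
    m = len(entry) - len(prefix)            # number of don't-care bits
    if any(c not in "01" for c in prefix):
        raise ValueError("'*' may only appear at the end of a prefix entry")
    a = int(prefix, 2) if prefix else 0     # integer value of the prefix bits
    return a << m, ((a + 1) << m) - 1       # [A * 2^m, (A + 1) * 2^m - 1]


def exact_to_interval(b: int) -> tuple[int, int]:
    """An exact-match entry (e.g. a protocol number b) is the interval [b, b]."""
    return b, b


# Hypothetical 4-bit entry: '01**' covers the values 4 to 7.
lo, hi = prefix_to_interval("01**")
matches = [x for x in range(16) if interval_fn(x, lo, hi)]
```

With this encoding, prefix, range, and exact-match entries all reduce to the same interval test, which is why one field-function mechanism can serve all five fields.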
As shown in Example., multiple rules may match in a packet classification table. In such a case, we use a vectorized interval function. Let r be the number of rules. A vectorized interval function is

$$H(X) = \bigvee_{i=1}^{r} \vec{e}_i \, IN(X : A_i, B_i),$$

where $\vec{e}_i$ is a unit vector with r elements, in which only the i-th bit is one and the other bits are zero. For each value of H(X), we assign a segment, which is an interval or a set of intervals. Then, we define a field function F(X), which generates a unique integer index $I_i$ corresponding to the i-th segment $[C_i, D_i]$ satisfying $C_i \le X \le D_i$. Note that, to distinguish it from an interval, we denote a segment consisting of an interval [C, D] as [C : D].

Next, we define the Cartesian product function $G : Y \to Z$, where $Y = I_1 \times I_2 \times \cdots \times I_k$ is the set of Cartesian products of indices generated by the field functions. As shown in Fig., the packet classification table is decomposed into field functions and a Cartesian product function. We can assign an arbitrary index to a segment. In this paper, we assign indices so as to make an $M_1$-monotone increasing function [8], which reduces the amount of memory. Let I be a set of integers including 0. An integer function $f(X) : I \to Z$ such that $0 \le f(X + 1) - f(X) \le 1$ and $f(0) = 0$ is an $M_1$-monotone increasing function on I.

Example. Fig. shows examples of segments for SA and DP shown in Table. Note that rules are represented by intervals.

3. LUT CASCADE BASED ON MTMDD (K)

3.1. LUT Cascade Realization of an $M_1$-monotone Increasing Function

A binary decision diagram (BDD) [] is obtained by applying Shannon expansions repeatedly to a logic function

f. Each non-terminal node labeled with a variable $x_i$ has two outgoing edges, which point to the nodes representing the cofactors of f with respect to $x_i$. A multi-terminal BDD (MTBDD) [] is an extension of a BDD and represents an integer-valued function. In the MTBDD, the terminal nodes are labeled by integers.

[Fig.. An LUT cascade based on an MTMDD (k).]

[Fig.. Conversion of an MTBDD node into an EVBDD node.]

Let $X = (X_1, X_2, \ldots, X_u)$ be a partition of the input variables, and $|X_i|$ be the number of inputs in $X_i$. $X_i$ is called a super variable. When the Shannon expansions are performed with respect to super variables $X_i$, where $|X_i| = k$, all the non-terminal nodes have $2^k$ edges. In this case, we have a multi-terminal multi-valued decision diagram (MTMDD (k)) []. Note that an MTMDD (1) means an MTBDD. The width $\mu_i$ of the MDD (k) at height i is the number of edges crossing the section of the MDD (k) between two adjacent super variables, where the edges incident to the same node are counted as one.

Let p be the number of rules, and $|X| = n$. An $M_1$-monotone increasing function can be realized by an LUT cascade [] shown in Fig.. The connection between adjacent LUTs requires $r_i = \lceil \log_2 \mu_i \rceil$ rails. Since a modern FPGA has BRAMs and distributed RAMs (realized by Slices), LUT cascades are easy to implement. The amount of memory for an LUT based on an MTMDD (k) is the number of its output rails multiplied by 2 raised to the number of its inputs, that is, the k bits of the super variable plus the rails coming from the adjacent LUT. The total amount of memory M for an LUT cascade is the sum of these amounts over the u LUTs. The number of unique indices for the $M_1$-monotone increasing function is equal to the number of segments. A reduction of $r_i$ reduces the amount of memory for an LUT cascade. Thus, to reduce the amount of memory for the LUT cascade, we partition the rules into subrules so that the number of segments increases as little as possible.

Example. Fig. converts the field function for SP shown in Fig. into the LUT cascade.
First, the given function is converted to the MTBDD. Then, it is converted to the MTMDD (k). Next, by realizing each index of the MTMDD (k) by an LUT, we have the LUT cascade. In this example, the amount of memory for the LUT cascade is = bits. As for an $M_1$-monotone increasing function, the upper bound on the number of rails in the LUT cascade has been analyzed.

[Fig.. Conversion of an LUT cascade from an MTMDD (k).]

[Fig.. Packet classifier by LUT cascades based on an MTMDD (k).]

Theorem. [] Let p be the number of unique indices for the $M_1$-monotone increasing function. The upper bound on the number of rails in the LUT cascade is r = log (p ).

(In [], the $M_1$-monotone increasing function is called a segment index encoder function.)

3.2. Partition of Rules by Greedy Algorithm

Since a field function produces at most p segments, it is compactly realized by an LUT cascade []. However, the

Cartesian product function produces O(p ) segments [9]. Thus, a direct realization by an LUT cascade is hard. To reduce the number of segments, we partition the rules into subrules. Then, we realize the subrules by the circuits shown in Fig.. Since two or more rules may match at the same time, we attach a maximum selector to the output. Let [x, y] be an entry for a field. Then, y − x is the size of the interval. We propose the following greedy algorithm to partition rules:

Algorithm. (Partition of rules) Let $R = \{r_1, r_2, \ldots, r_p\}$ be the set of rules, p be the number of rules, $G = \{G_1, G_2, \ldots, G_q\}$ be the partition of rules, and q be the number of groups of rules.
1. Compute the sum of sizes of intervals d for each rule. Then, sort the rules in decreasing order of d as $R = (r_1, r_2, \ldots, r_p)$.
2. $q \leftarrow 1$, $i \leftarrow 1$.
3. $G_q \leftarrow \{r_i\}$.
4. Do Steps 4.1 to 4.4 until i > p.
4.1. For each $j \le q$, decompose $G_j \cup \{r_i\}$ by the Cartesian product method, then generate LUT cascades, and obtain the amount of memory $M_{grp}$ for the LUT cascades.
4.2. Decompose $r_i$ by the Cartesian product method, then generate LUT cascades, and obtain the amount of memory $M_{single}$ for the LUT cascades.
4.3. If $M_{grp} < M_{single}$, then $G_j \leftarrow G_j \cup \{r_i\}$. Otherwise, create a new group: $q \leftarrow q + 1$ and $G_q \leftarrow \{r_i\}$.
4.4. $i \leftarrow i + 1$.
5. Terminate.

Algorithm. partitions the packet classification table efficiently by using its properties. The inherent data structure of real-life packet classification tables is analyzed in []. Since many packet classification tables are maintained by humans, global controls (wide port ranges) are used in global networks, while detailed controls (narrow port ranges) are used in local networks. Thus, in practice, the worst case seldom occurs. A simple partition algorithm can suppress the increase of segments. As a result, we can reduce the memory size.

4. LUT CASCADE BASED ON AN EVMDD (K)

To reduce the amount of memory for an LUT cascade, we introduce an LUT cascade based on an edge-valued multi-valued decision diagram (EVMDD (k)) [], which is an extension of an EVBDD.
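Before turning to the EVBDD, the greedy partition of rules described in the previous section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the memory of the generated LUT cascades is replaced by a stand-in cost (the product of per-field elementary segment counts), and a rule joins the existing group whose cost grows least, provided that growth is smaller than the cost of a new single-rule group. The names `num_segments`, `cost`, and `greedy_partition`, and the sample rules, are hypothetical.

```python
from math import prod

Interval = tuple[int, int]
Rule = tuple[Interval, ...]          # one closed interval per field


def num_segments(intervals: list[Interval], size: int) -> int:
    """Upper bound on the segments that the intervals induce on 0..size-1."""
    cuts = {0}                       # a segment starts at 0 ...
    for lo, hi in intervals:
        cuts.add(lo)                 # ... at every lower end ...
        if hi + 1 < size:
            cuts.add(hi + 1)         # ... and just past every upper end
    return len(cuts)


def cost(group: list[Rule], sizes: tuple[int, ...]) -> int:
    """Stand-in for cascade memory: size of the Cartesian product index space."""
    return prod(num_segments([rule[f] for rule in group], sizes[f])
                for f in range(len(sizes)))


def greedy_partition(rules: list[Rule], sizes: tuple[int, ...]) -> list[list[Rule]]:
    # Step 1: sort by the sum of interval sizes, widest rules first.
    ordered = sorted(rules, key=lambda r: sum(hi - lo for lo, hi in r),
                     reverse=True)
    groups: list[list[Rule]] = []
    for rule in ordered:
        m_single = cost([rule], sizes)
        best_j, best_inc = None, None
        for j, group in enumerate(groups):
            inc = cost(group + [rule], sizes) - cost(group, sizes)
            if best_inc is None or inc < best_inc:
                best_j, best_inc = j, inc
        if best_j is not None and best_inc < m_single:
            groups[best_j].append(rule)      # cheaper to extend a group
        else:
            groups.append([rule])            # cheaper to start a new group
    return groups


# Two wide rules and two narrow rules over two hypothetical 4-bit fields.
wide_a: Rule = ((0, 15), (0, 15))
wide_b: Rule = ((0, 7), (0, 15))
narrow_c: Rule = ((3, 3), (5, 5))
narrow_d: Rule = ((4, 4), (6, 6))
groups = greedy_partition([narrow_c, wide_a, narrow_d, wide_b], (16, 16))
```

In this small case the wide rules end up in one group and the narrow rules in another, which mirrors the observation that wide-range and narrow-range rules tend to separate.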
An EVBDD consists of one terminal node representing zero and non-terminal nodes with a weighted 1-edge, where the weight has an integer value α. An EVBDD is obtained by recursively applying the conversion shown in Fig. to each non-terminal node in an MTBDD. Note that, in the EVBDD, 0-edges have zero weights. When one subfunction is obtained by adding α to another subfunction, as happens in an $M_1$-monotone increasing function, the EVBDD can share the two subfunctions through an edge with weight α, and thus may have smaller widths (Fig. 8 (a)). The MTBDD only shares prefixes, while the EVBDD shares both prefixes and postfixes (Fig. 8 (b)).

[Fig. 8. Principle of reduction of width in an EVBDD.]

[Fig. 9. An example of rewrite to $M_1$-monotone increasing function.]

By rewriting the terminals of the MTBDD for the Cartesian product function, we have the $M_1$-monotone increasing function. Fig. 9 shows an example of obtaining an $M_1$-monotone increasing function. To recover the original function, we use a translation memory. The size of the translation memory is equal to the number of terminal nodes in the MTBDD. Experimental results show that its amount of memory tends to be small.

An edge-valued MDD (k) (EVMDD (k)) is an extension of the MDD (k), and represents a multi-valued input $M_1$-monotone increasing function. It consists of one terminal node representing zero and non-terminal nodes with edges having integer weights, where 0-edges always have zero weights. Let p be the number of rules, and $|X| = n$. An $M_1$-monotone increasing function is efficiently realized by an LUT cascade with adders [9] shown in Fig.. In this case, the rails represent sub-functions in the EVMDD (k). The outputs from each LUT other than the rails represent the sum of weights of edges. We call such outputs Arails, which consist of $ar_i$ bits. Since the width of the EVMDD (k) for an $M_1$-monotone increasing function is smaller than that of the MTMDD (k), we can reduce the amount of memory for the LUT cascade by using an EVMDD (k).
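The width reduction can be checked on small functions with a brute-force sketch. The Python code below works directly on a truth table rather than on a decision diagram package, fixes the variable order (most significant bits first), and counts the distinct sub-functions at each cut; the function name `widths` and the sample functions are assumptions for illustration.

```python
def widths(table: list[int], k: int, edge_valued: bool) -> list[int]:
    """Count distinct sub-functions at each internal cut of the diagram.

    table[x] is f(x) for x in range(2**n), and n must be a multiple of k.
    Inputs are consumed k bits at a time, most significant bits first.
    With edge_valued=True every sub-function is normalised so that its
    value at input 0 is zero (the offset moves onto the incoming edge),
    which is the sharing rule of an EVMDD (k). With edge_valued=False
    sub-functions are compared as they are, as in an MTMDD (k).
    """
    n = (len(table) - 1).bit_length()
    if len(table) != 1 << n or n % k:
        raise ValueError("table length must be 2**n with n a multiple of k")

    def norm(sub: tuple[int, ...]) -> tuple[int, ...]:
        return tuple(v - sub[0] for v in sub) if edge_valued else sub

    level = {norm(tuple(table))}
    result = []
    for _ in range(n // k - 1):
        nxt = set()
        for sub in level:
            step = len(sub) >> k                  # size of one cofactor
            for v in range(1 << k):               # one cofactor per edge
                nxt.add(norm(sub[v * step:(v + 1) * step]))
        result.append(len(nxt))
        level = nxt
    return result


# f(X) = X on 4 bits satisfies f(0) = 0 and f(X + 1) - f(X) = 1.
identity = list(range(16))
mt = widths(identity, 1, edge_valued=False)       # multi-terminal widths
ev = widths(identity, 1, edge_valued=True)        # edge-valued widths

# A hypothetical field function with seven segments on 4 bits.
field = [0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 6]
mt_field = widths(field, 2, edge_valued=False)
ev_field = widths(field, 2, edge_valued=True)
```

For the identity function the multi-terminal widths double at every level, while every edge-valued width is one, because all cofactors differ only by a constant offset.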
Since we realize the adders by DSP blocks (DSP8Es), FPGA resources are efficiently used. The amount of memory for an LUT is its number of output bits, $r_i + ar_i$, multiplied by 2 raised to the number of its inputs, that is, the k bits of the super variable plus the rails coming from the adjacent LUT. Let $|X| = n$ be the number of inputs, and $k = |X_i|$. The LUT cascade has $u = \lceil n / k \rceil$ LUTs. Thus, the total amount of memory in bits for the LUT cascade based on an EVMDD (k) is the sum of these amounts over the u LUTs. Also, it requires u adders. Generally, an increase of k increases the amount of memory, while it decreases the number of adders. Thus, in this paper, we find a value of k that uses FPGA resources efficiently.

Example. Fig. shows the EVBDD obtained from the MTBDD shown in Fig.. At the first level, the width of the MTBDD is four, while that of the EVBDD is two. First,

convert the EVBDD to the EVMDD (k). Then, convert it to the LUT cascade. Its memory size is ( ) = bits. A single memory implementation of this function requires = 8 bits. Thus, the LUT cascade can reduce the total memory size.

[Fig.. Conversion of EVMDD (k) into an LUT cascade.]

[Fig.. Conversion of an EVBDD into an LUT cascade.]

[Fig.. Comparison of memory sizes [KB] for a single-memory realization, LUT cascades based on MTMDD (k), and LUT cascades based on EVMDD (k).]

[Fig.. Number of adders for EVMDD (k).]

[Fig.. Overall architecture. Each subrule is realized by field functions, a Cartesian product LUT cascade, and a translation memory; a maximum selector combines the outputs of the subrules.]

5. EXPERIMENTAL RESULTS

5.1. Implementation Setup

We implemented the proposed circuit on the Virtex VC evaluation board (FPGA: Xilinx, XCVX8T-FFG,,9 Slices,, KbBRAMs, and,8 DSP8E Blocks). We used the Xilinx PlanAhead version. for the synthesis. As for the LUT cascade implementation, an LUT whose size is equal to or more than Kb is implemented by Kb BRAMs, while an LUT whose size is less than Kb is implemented by distributed RAMs using Slices. To increase the system throughput, we set the memories to the dual port mode. By Algorithm., we partitioned 9,8 ACL rules generated by ClassBench [] into two: Subrule (9, rules) and Subrule ( rules). Then, each subrule is decomposed into five field functions and a Cartesian product function. Finally, each function is realized by an LUT cascade based on an EVMDD (k). To reduce the widths of an EVMDD (k), we used the sifting method [].

5.2. Comparison of EVMDD (k) with MTMDD (k)

We realized the packet classifier by three different methods:
A single memory.
LUT cascades based on MTMDDs (k).
LUT cascades based on EVMDDs (k).
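To make the comparison between the three realizations concrete, the following Python sketch computes memory sizes from diagram widths. It is an illustration only: the stage indexing (stage i reads the rails written by stage i-1), the choice of Arail bit counts, and the sample widths are assumptions, and none of the numbers are measurements from the paper. The names `cascade_memory_bits` and `single_memory_bits` are hypothetical.

```python
from typing import Optional


def cascade_memory_bits(widths: list[int], k: int, out_bits: int,
                        arail_bits: Optional[list[int]] = None) -> int:
    """Total memory of an LUT cascade in bits.

    widths[i] is the diagram width between stage i and stage i + 1, so the
    cascade has len(widths) + 1 stages. Assumption: stage i reads k input
    bits plus the rails from stage i - 1 and writes the rails for stage
    i + 1; the last stage writes out_bits instead of rails.
    arail_bits[i], if given, is the number of extra weight output bits of
    stage i (cascade with adders); None models a cascade without adders.
    """
    rails = [(w - 1).bit_length() for w in widths]    # ceil(log2(width))
    stages = len(widths) + 1
    total = 0
    for i in range(stages):
        r_in = rails[i - 1] if i > 0 else 0
        r_out = rails[i] if i < stages - 1 else out_bits
        extra = arail_bits[i] if arail_bits else 0
        total += (r_out + extra) << (k + r_in)        # outputs * 2^inputs
    return total


def single_memory_bits(n: int, out_bits: int) -> int:
    """A single memory with n address bits and out_bits data bits."""
    return out_bits << n


# Hypothetical 4-input function with 4-bit output, k = 1.
single = single_memory_bits(4, 4)
mt_cascade = cascade_memory_bits([2, 4, 8], k=1, out_bits=4)
# Edge-valued cascade: no rails are needed when every width is 1; each
# stage is assumed to emit a 4-bit weight that the adders accumulate.
ev_cascade = cascade_memory_bits([1, 1, 1], k=1, out_bits=0,
                                 arail_bits=[4, 4, 4, 4])
```

In this toy case the multi-terminal cascade is larger than the single memory, while the edge-valued cascade is smaller than both, at the price of one adder per stage.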
To find the smallest LUT cascade, we changed the size of super variables k from one to four. Fig. compares the memory sizes. It shows that, for all k, EVMDDs (k) produced smaller LUT cascades than MTMDDs (k). Also, the memory size takes its minimum when k =. As for Cartesian product functions, EVMDDs (k) required smaller memory than MTMDDs (k) even if the translation memories are

used. Fig. shows the number of adders for EVMDD (k). Although EVMDD (k) requires DSP8Es, it requires less than.8% of available resources. Thus, the usage of DSP8Es is negligible. As shown in this part, the LUT cascades based on EVMDDs (k) efficiently utilize the resources of an FPGA.

Table. Comparison with other methods. Columns: Architecture, #Rules, Memory [KB], Memory [B]/#Rule, Throughput [Gbps], Efficiency [Gbps Rules/KB].
BV-TCAM (FPGA ) [8].8. 8.
Memory-based DCFL (FCCM 8) [] 8 8...9
sbfce (FCCM 8) [], 8...
Simplified Hyper Cuts (ANCS 8) [], 8 9.8.8 9.
Cartesian-Product with Quadtrees (ASAP 9) [] 9,. 9. 9.
Optimized Hyper Cuts (IEEE Trans. VLSI) [] 9,. 8. 8.9
Multiple LUT cascades (Proposed) 9,8.. 9.

5.3. Comparison with Other Methods

According to the result of Section., we implemented the packet classifier by the LUT cascades based on an EVMDD (k) shown in Fig., which consumes, Slices (.%), BRAMs (.%), and DSP8E blocks (.8%). Since the maximum clock frequency was. MHz, we set the system clock frequency to MHz. Thus, the system throughput is. (MHz) (ports) (Bits) =. Gbps for minimum packet size ( Bytes). Table compares the proposed method with other methods. Since different methods use different numbers of rules of ClassBench, we use the efficiency measure []

$$\text{Efficiency} = \frac{\text{Throughput [Gbps]}}{\text{Normalized area (Memory [B]/\#Rules)}}$$

to compare them. The proposed architecture implemented 9,8 rules by [KB] memory, and its system throughput was. [Gbps]. Thus, the efficiency measure is 9. [Gbps rules/KB]. From Table, the efficiency measure of the proposed architecture is. times higher than that of the Cartesian-Product with Quadtrees method [], which was the best among the existing methods. In this way, we implemented a high-speed and area-efficient system.

6. CONCLUSION

In this paper, we showed a method to implement a packet classifier. First, the packet classification table was decomposed into two subrules. Second, they were decomposed into five field functions and a Cartesian product function.
Finally, each function was realized by an LUT cascade based on an EVMDD (). We implemented the proposed architecture on a Virtex VC evaluation board. Experimental results showed that the efficiency measure (throughput per normalized area) is. times higher than that of an existing method. The rules for the packet classifier must be updated (added and deleted) frequently. The addition and deletion of a registered vector can be done in time that is proportional to the number of cells in the cascade []. One future project is to apply this update method to the proposed architecture.

7. ACKNOWLEDGMENTS

This research is supported in part by the Grants in Aid for Scientific Research of JSPS. Reviewers' comments were useful to improve the paper.

8. REFERENCES

[] R. E. Bryant, "Graph-based algorithms for boolean function manipulation," IEEE Trans. Comput., Vol. C-, No. 8, 98, pp. -9.
[] E. M. Clarke, K. L. McMillan, X. Zhao, M. Fujita, and J. Yang, "Spectral transforms for large Boolean functions with applications to technology mapping," DAC99, 99, pp. -.
[] G. S. Jedhe, A. Ramamoorthy, and K. Varghese, "A scalable high throughput firewall in FPGA," FCCM8, 8, pp. -.
[] T. Kam, T. Villa, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, "Multi-valued decision diagrams: Theory and applications," Multiple-Valued Logic: An International Journal, Vol., No. -, 998, pp. 9-.
[] A. Kennedy, X. Wang, Z. Liu, and B. Liu, "Low power architecture for high speed packet classification," ANCS8, 8, pp. -.
[] Y-T. Lai and S. Sastry, "Edge-valued binary decision diagrams for multi-level hierarchical verification," DAC99, 99, pp. 8-.
[] L. Xu, B. Yang, Y. Xue, and J. Li, "Packet classification algorithms: From theory to practice," INFOCOM9, 9, pp. 8-.
[8] S. Nagayama and T. Sasao, "Complexities of graph-based representations for elementary functions," IEEE Trans. Comput., Vol. 8, No., Jan. 9, pp. -9.
[9] S. Nagayama, T. Sasao, and J. T. Butler, "Design method for numerical function generators using recursive segmentation and EVBDDs," IEICE Trans. Fundamentals, Vol. E9-A, No.,, pp. -.
[] H. Nakahara, T. Sasao, and M. Matsuura, "A CAM emulator using look-up table cascades," RAW,, CD-ROM-RAW, paper-.
[] A. Nikitakis and I. Papaefstathiou, "A memory-efficient FPGA-based classification engine," FCCM8, 8, pp. -.
[] W. Jiang and V. K. Prasanna, "Scalable packet classification on FPGA," IEEE Trans. on VLSI, Vol., No. 9,, pp. 8-8.
[] W. Jiang and V. K. Prasanna, "A FPGA-based parallel architecture for scalable high-speed packet classification," ASAP9, 9, pp. -.
[] R. Rudell, "Dynamic variable ordering for ordered binary decision diagrams," ICCAD99, 99, pp. -.
[] T. Sasao, Memory-based logic synthesis, Springer,.
[] T. Sasao, "On the complexity of classification functions," ISMVL8, 8.
[] T. Sasao, M. Matsuura, and Y. Iguchi, "A cascade realization of multiple-output function for reconfigurable hardware," IWLS,, pp. -.
[8] H. Song and J. W. Lockwood, "Efficient packet classification for network intrusion detection using FPGA," FPGA,, pp. 8-.
[9] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, "Fast and scalable layer four switching," SIGCOM998, 998, pp. 9-.
[] D. E. Taylor, "Survey and taxonomy of packet classification techniques," ACM Computing Surveys, Vol., No.,, pp. 8-.
[] D. E. Taylor and J. S. Turner, "ClassBench: a packet classification benchmark," INFOCOM,, Vol., pp. 8-9.
[] R. Tucker, "Optical packet-switched WDM networks: a cost and energy perspective," OFC/NFOEC8, 8.