LUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations

Tsutomu Sasao and Alan Mishchenko
Dept. of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka 820-8502, Japan
Dept. of EECS, University of California, Berkeley, CA, USA

Abstract
First, this paper considers the number of LUTs needed to implement logic functions using MUX-based and cascade realizations. These results are useful for quickly estimating the number of LUTs needed to implement a function on an FPGA. Second, this paper presents an algorithm that realizes logic functions with 6-LUTs using cascade and MUX-based realizations. It often produces smaller circuits than previous methods when the number of the input variables is smaller than.

1. Introduction
A K-LUT (look-up table) is a module that realizes an arbitrary K-variable function. This paper considers the number of K-LUTs needed to realize a given function. FPGAs with K =or input LUTs are believed to be the most efficient [, ]. However, in the current technology, FPGAs with K = 6 are standard [,, ]. Thus, we have the following:

Problem. Given an n-variable (multiple-output) function f, find the minimum number of 6-LUTs needed to implement f, where n 0.

Unfortunately, this is a very hard problem in general. So, we try to obtain an upper bound by using properties or measures of the function. A property of the function, such as symmetry, should be quickly detectable, and a measure should be quickly computable. Such measures include the number of variables (i.e., the support size), the C-measure of a function, the number of products in the SOP [], the number of literals in the factored expression [], and the number of nodes in the decision diagram [8].

In this paper, we present two types of realizations: MUX-based realization and cascade realization. We also show various theorems on the number of LUTs. After that, we show an algorithm to realize logic functions by using 6-input LUTs (6-LUTs).
The algorithm is suitable for the functions with up to inputs, and often produces better solutions than previous methods. Due to the page limitation, all proofs of theorems are omitted.

2. MUX-based Realization
In this part, we show a universal method to implement an n-variable function, and obtain upper bounds on the number of LUTs. Such bounds are useful when we only know the number of input variables n.

Definition. A multiplexer with a single control input (1-MUX) is the selection circuit shown in Fig... It performs the logical operation $g(x, y_0, y_1) = \bar{x} y_0 \vee x y_1$. A t-MUX is shown in Fig... It is a multiplexer with $t$ control inputs $(x_1, \ldots, x_t)$ and $2^t$ data inputs $(y_0, y_1, \ldots, y_{2^t-1})$. Let $g(x_1, \ldots, x_t, y_0, y_1, \ldots, y_{2^t-1})$ be the output function. Then, $g = y_a$ when the decimal representation of the control input $(x_1, \ldots, x_t)$ is $a$. That is, when the control input is $(0, 0, \ldots, 0)$, the top data input $y_0$ is selected. When the control input is $(0, 0, \ldots, 1)$, the second data input $y_1$ is selected. Also, when the control input is $(1, 1, \ldots, 1)$, the last data input $y_{2^t-1}$ is selected.

Lemma. An n-MUX is realized by using $2^n - 1$ copies of 1-MUXs.

Example. A 3-MUX is realized with $2^3 - 1 = 7$ copies of 1-MUXs as shown in Fig... (End of Example)

Figure.. 1-MUX.
Figure.. Realization of 3-MUX by using 1-MUXs.
Figure.. K-MUX.

Definition. Let $B = \{0, 1\}$. Let $X = (x_1, x_2, \ldots, x_t)$. Then, we can consider that $X$ takes its value from $P = \{0, 1, \ldots, 2^t - 1\}$. Let $S$ be a subset of $P$ ($S \subseteq P$). Then, $X^S$ is a literal of $X$. When $X \in S$, $X^S = 1$, and when $X \notin S$, $X^S = 0$. Let $S_i \subseteq P_i$ $(i = 1, 2, \ldots, n)$; then $X_1^{S_1} X_2^{S_2} \cdots X_n^{S_n}$ is a logical product, and
$$\bigvee_{(S_1, S_2, \ldots, S_n)} X_1^{S_1} X_2^{S_2} \cdots X_n^{S_n}$$
is a sum-of-products expression (SOP). When $S_i = P_i$, $X_i^{S_i} = 1$ and the logical product is independent of $X_i$. In this case, the literal $X_i^{P_i}$ is redundant and can be deleted. A logical product is also called a term, or a product term.

Theorem. An arbitrary n-variable function is expanded as follows:
$$f(X_1, X_2) = \bigvee_{i \in P} g_i(X_1) X_2^{i},$$
where $X_1 = (x_1, x_2, \ldots, x_k)$ and $X_2 = (x_{k+1}, x_{k+2}, \ldots, x_n)$, $P = \{0, 1, \ldots, 2^{n-k} - 1\}$, and the OR is performed with respect to $2^{n-k}$ elements.

Example. Consider the realization of a 7-variable function $f(X_1, X_2)$, where $X_1 = (x_1, x_2, \ldots, x_5)$ and $X_2 = (x_6, x_7)$. $f$ is expanded into a sum of four products:
$$f(X_1, X_2) = \bigvee_{i=0}^{3} g_i(X_1) X_2^{i} = g_0(X_1) X_2^{0} \vee g_1(X_1) X_2^{1} \vee g_2(X_1) X_2^{2} \vee g_3(X_1) X_2^{3}.$$
As shown in Fig.., a 2-MUX can be implemented by using three copies of 1-MUXs. The top LUT in the leftmost column realizes $g_0$, which is selected when $(x_6, x_7) = (0, 0)$. The second LUT in the leftmost column realizes $g_1$, which is selected when $(x_6, x_7) = (0, 1)$. Other LUTs are derived similarly. (End of Example)

Theorem. When $K \le n$, an arbitrary n-variable function is realized by using at most $2^{n-K} - 1$ copies of 1-MUXs and $2^{n-K}$ copies of K-LUTs.

Figure.. Realization of an arbitrary 7-variable function using -LUTs.

To implement a 2-MUX, there is a more efficient realization than Fig... When K = 6, a 2-MUX can be realized by a single 6-LUT instead of three copies of 1-MUXs.

Lemma. An arbitrary function of $n = K + 1$ variables can be implemented with at most three K-LUTs, where $K \ge 3$.

Lemma. An arbitrary function of $n = K + 2$ variables can be implemented with at most five K-LUTs, where $K \ge 6$.

Theorem. [9, ] The number of 6-LUTs to implement an arbitrary n-variable function $f$ is $(2^{n-4} - 1)/3$ or less when $n$ is even, and $(2^{n-4} + 1)/3$ or less when $n$ is odd.
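The bound in the theorem above can be checked numerically. The following Python sketch is not taken from the paper; it assumes K = 6, charges one 6-LUT per cofactor function and one 6-LUT per 4-to-1 multiplexer (or per final 2-to-1 multiplexer when the number of control inputs is odd), and compares the total with the closed form read as $(2^{n-4} - 1)/3$ for even n and $(2^{n-4} + 1)/3$ for odd n. All function names are illustrative.

```python
K = 6  # LUT size assumed throughout this sketch


def mux_lut_count(t: int) -> int:
    """6-LUTs needed for a multiplexer with t control inputs.

    A 4-to-1 MUX (2 control + 4 data inputs) fits in one 6-LUT, and so
    does a 2-to-1 MUX. The tree uses 4-to-1 stages while at least four
    signals remain, then one 2-to-1 stage if two signals are left.
    """
    count = 0
    signals = 2 ** t  # data signals still to be merged
    while signals > 1:
        signals //= 4 if signals >= 4 else 2
        count += signals  # one LUT per MUX in this stage
    return count


def mux_based_lut_count(n: int) -> int:
    """6-LUTs used by the MUX-based realization of an n-variable function."""
    if n <= K:
        return 1
    t = n - K  # variables handled by the multiplexer tree
    return 2 ** t + mux_lut_count(t)  # cofactor LUTs + MUX LUTs


def closed_form_bound(n: int) -> int:
    """Closed form as read from the theorem (an assumption of this sketch)."""
    sign = -1 if n % 2 == 0 else 1
    return (2 ** (n - 4) + sign) // 3


for n in (8, 9, 10):
    print(n, mux_based_lut_count(n), closed_form_bound(n))
```

Under these assumptions the constructive count and the closed form agree: 4 + 1 LUTs for n = 8, 8 + 3 for n = 9, and 16 + 5 for n = 10.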

Figure.. Realization of an arbitrary 8-variable function using 6-LUTs.
Figure.. Realization of an arbitrary 9-variable function using 6-LUTs.
Figure..7. Realization of an arbitrary 10-variable function using 6-LUTs.

Example. The number of 6-LUTs to implement an n-variable function is
5 or less, when n = 8. In this case, $g_i(X_1)$ $(i = 0, 1, 2, 3)$ are implemented by four copies of 6-LUTs, while the 2-MUX is implemented by a single 6-LUT, as shown in Fig... In the figure, the numbers in squares denote the numbers of necessary LUTs.
11 or less, when n = 9. In this case, $g_i(X_1)$ $(i = 0, 1, 2, \ldots, 7)$ are implemented by 8 copies of 6-LUTs, while the 3-MUX is implemented by three 6-LUTs, as shown in Fig...
21 or less, when n = 10. In this case, $g_i(X_1)$ $(i = 0, 1, 2, \ldots, 15)$ are implemented by 16 copies of 6-LUTs, while the 4-MUX is implemented by five 6-LUTs, as shown in Fig..7.
(End of Example)

3. Cascade Realization
In this part, we show a cascade realization of a logic function. This gives more compact circuits than the MUX-based realization, but is only applicable to functions with special properties.

Figure.. Decomposition table of a logic function. (Columns are labeled by $X_1 = (x_1, x_2)$ and rows by $X_2 = (x_3, x_4)$.)

3.1. Functional Decomposition
Definition. [] Let $f(X)$ be a logic function, and let $(X_1, X_2)$ be a partition of the input variables, where $X_1 = (x_1, x_2, \ldots, x_k)$ and $X_2 = (x_{k+1}, x_{k+2}, \ldots, x_n)$. The decomposition table for $f$ is a two-dimensional matrix with $2^k$ columns and $2^{n-k}$ rows, where each column and row is labeled by a unique binary code, and each element corresponds to the truth value of $f$. The function represented by a column is a column function. Variables in $X_1$ are bound variables, while variables in $X_2$ are free variables. In the decomposition table, the column multiplicity, denoted by $\mu_k$, is the number of different column patterns.

Example. Fig.. shows a decomposition table of a 4-variable function. Since all the column patterns are different, the column multiplicity is $\mu = 4$.
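To make the definition of column multiplicity concrete, the following Python sketch builds the columns of a decomposition table by enumeration and counts the distinct ones. It is an illustration only: the two 4-variable functions are hypothetical and are not the function of the figure, and the helper name `column_multiplicity` is an assumption of this sketch. Exhaustive enumeration is practical only for small n.

```python
from itertools import product


def column_multiplicity(f, n, bound):
    """Number of distinct columns in the decomposition table of f.

    f     -- callable mapping a tuple of n bits to 0 or 1
    n     -- number of input variables
    bound -- 0-based indices of the bound variables; the rest are free
    """
    free = [i for i in range(n) if i not in bound]
    columns = set()
    for b in product((0, 1), repeat=len(bound)):     # one column per bound assignment
        column = []
        for r in product((0, 1), repeat=len(free)):  # one row per free assignment
            x = [0] * n
            for i, v in zip(bound, b):
                x[i] = v
            for i, v in zip(free, r):
                x[i] = v
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)


# Hypothetical 4-variable functions, assumed for illustration only.
def all_columns_differ(x):
    return (x[0] & x[2]) | (x[1] & x[3])


def parity(x):
    return x[0] ^ x[1] ^ x[2] ^ x[3]


print(column_multiplicity(all_columns_differ, 4, (0, 1)))  # 4
print(column_multiplicity(parity, 4, (0, 1)))              # 2
```

With the first two variables as the bound set, the first function has four different column patterns, while the parity function has only two (a column and its complement).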
(End of Example)

Theorem. [] For a given function $f$, let $X_1$ be the bound variables, let $X_2$ be the free variables, and let $\mu_k$ be the column multiplicity of the decomposition table. Then, the function $f$ can be realized with the network shown in Fig... In this case, the number of signal lines connecting the two blocks H and G is $\lceil \log_2 \mu_k \rceil$.

When the number of signal lines connecting the two blocks is smaller than the number of input variables in $X_1$, we can often reduce the total amount of memory by the realization in Fig.. [7]. Such a technique is called a Curtis decomposition [6] in this paper.

Figure.. Realization of a logic function by decomposition.
Figure.. Realization of a 9-variable function with C-measure at most 8.

3.2. C-Measure
Definition. Let $f(x_1, x_2, \ldots, x_n)$ be a logic function. Let $\mu_k$ be the column multiplicity of the decomposition table for $f(X_1, X_2)$, where $X_1 = (x_1, x_2, \ldots, x_k)$ and $X_2 = (x_{k+1}, \ldots, x_n)$. The C-measure of the function $f$ is $\max(\mu_1, \mu_2, \ldots, \mu_n)$, and is denoted by $\mu(f)$.

By repeatedly applying functional decompositions to a given function, we have an LUT cascade [20] shown in Fig... An LUT cascade consists of cells, and the signal lines connecting adjacent cells are rails. A logic function with a small C-measure can be realized by a compact LUT cascade.

To obtain the C-measure, a decomposition table is not necessary. A quasi-reduced binary decision diagram (QRBDD) that represents the logic function directly shows the C-measure: the C-measure is equal to the maximum width of the QRBDD, where the variable ordering is $(x_1, x_2, \ldots, x_n)$ [7].

3.3. LUT Cascade
Lemma. [20] Let $\mu(f)$ be the C-measure of a function $f$. Then, $f$ can be implemented by an LUT cascade whose cells have at most $\lceil \log_2 \mu(f) \rceil + 1$ inputs and $\lceil \log_2 \mu(f) \rceil$ outputs.

Lemma. [] In an LUT cascade that realizes the function $f$, let $n$ be the number of input variables; $s$ be the number of cells; $w = \lceil \log_2 \mu(f) \rceil$ be the maximum number of rails; $K$ be the number of inputs for a cell; $n \ge K + 1$; and $K \ge \lceil \log_2 \mu(f) \rceil + 1$. Then, an LUT cascade satisfying the following condition exists:
$$s \le \left\lceil \frac{n - w}{K - w} \right\rceil.$$

Figure.. LUT cascade.

3.4. Bounds for Functions with Small C-Measure
From the two lemmas above, we can derive the following:

Theorem. [] The number of 6-LUTs to implement an arbitrary n-variable function, where n 8 is:. n or less, when µ(f).. n or less, when µ(f).. n or less, when µ(f) 8(n =r).. n or less, when µ(f) 8(n r).

Example. Let $f(X_1, X_2)$ be a 9-variable function, where $X_1 = (x_1, x_2, \ldots, x_6)$ and $X_2 = (x_7, x_8, x_9)$.
Let $\mu$ be the column multiplicity of the decomposition of $f$ with respect to $(X_1, X_2)$. If $\mu \le 8$, then $f(X_1, X_2)$ can be implemented with four copies of 6-LUTs, as shown in Fig... To check that a given 9-variable function can be realized as in Fig.., we need to check the column multiplicities for only (9 ) = combinations.

3.5. Bound for Symmetric Functions
When the given function is symmetric, it can be implemented more efficiently than a general function [7, ]. Efficient algorithms to detect symmetric functions exist, e.g., [13].

Lemma. Let $f$ be a symmetric function of n variables. Then, $\mu(f) \le n + 1$.

Theorem. The number of K-LUTs to implement an n-variable symmetric function is
. or less, when n = 9 and K =. Fig.. shows the realization.
. 7 or less, when n = and K =. Fig.. shows the realization.
. or less, when n = and K =. Fig.. shows the realization.

Figure.. Realization of a symmetric function of variables by -LUTs.
Figure.. Realization of a symmetric function of variables by -LUTs.

4. Logic Synthesis with 6-LUTs
For some classes of functions, LUT cascade realizations require many fewer LUTs than other methods. The current version of the program works only for small problems. The algorithm obtains the column multiplicity $\mu$ and calculates the number of cells in the LUT cascade. When $\lceil \log_2 \mu \rceil \ge K$, the algorithm uses a MUX-based realization. The outline of the method is as follows:

Algorithm. (LUTMIN: Logic synthesis using 6-LUTs)
1. For a multi-output function, decompose it into single-output functions.
2. For each single-output function F:
(a) Minimize the support of F by removing redundant variables.
(b) If the support size of F is less than or equal to K (the LUT size), implement it using one K-LUT.
(c) If the support size of F is exactly K + 1, the best we can do is to realize it by two K-LUTs. In this case, we check the cofactors of F w.r.t. each variable. If the support size of one cofactor is K − 2 or less, implement F using two K-LUTs: the larger cofactor in one LUT, and the smaller cofactor together with the MUX in the other LUT.
(d) If F is not decomposed in steps (2.b) and (2.c), reorder the BDD of F to minimize the number of nodes.
(e) Consider $M = \lceil \log_2 \mu \rceil$, where $\mu$ is the column multiplicity of F w.r.t. the bound set composed of the K variables on top of the BDD.
(f) If M < K, use Curtis decomposition w.r.t. the K topmost variables as the bound set. Implement the M bound-set nodes using K-LUTs and perform recursive decomposition of the composition function, whose support size is equal to Support size(F) − K + M.
(g) If M ≥ K, create a 2-MUX w.r.t. the two topmost variables in the BDD, and recursively decompose the four cofactors.
3. As a last post-processing step, detect and merge functionally equivalent nodes in the resulting LUT network. (Such nodes could be created in step (2.g) when, for example, two of the four cofactors of F are equal.)

Example. Consider sym [9], a symmetric function of 12 variables. sym is 1 iff the number of 1s in the inputs is between and 8.
1. The column multiplicity of the function with respect to the bound set composed of K = 6 variables is $\mu = 7$. Thus, $M = \lceil \log_2 7 \rceil = 3$.
2. Since M < K, we use Curtis decomposition with respect to the 6 topmost variables in the bound set: $x_1, x_2, \ldots, x_6$. Realize the first cell using 6-LUTs, which corresponds to the leftmost cell in Fig...
3. The composition function has 12 − 6 + 3 = 9 variables. The top 6 of the variables are the three outputs of the leftmost cell, and $x_7, x_8, x_9$. In this case, the column multiplicity is $\mu_9 = 8$. Thus, $M = \lceil \log_2 8 \rceil = 3$.
4. Since M < K, we use Curtis decomposition with respect to the 6 topmost variables in the bound set: the three outputs of the leftmost cell, and $x_7, x_8, x_9$. Realize the second cell using 6-LUTs, which corresponds to the middle cell in Fig...
5. The composition function has 9 − 6 + 3 = 6 variables. Since the support size is equal to K = 6, implement the function by a 6-LUT, which corresponds to the rightmost cell in Fig...
In this way, sym is realized by 3 + 3 + 1 = 7 LUTs of 6 inputs. Note that this realization is different from Fig...
(End of Example)

Figure.. sym implemented by 6-LUTs. (Inputs shown: $x_1, \ldots, x_{12}$.)

Table.. Number of 6-LUTs to realize functions.
Columns: Name, PI, PO, MBC (LUT, lev), LUTMIN (LUT, lev), SAS (LUT, lev)
alu 8 8 08 8
amd 8 09 9
apex 9 9 87
chkn 9 7 7 7 0 8
cordic 9
cps 09 7 70 9
ex00 0 0 98 9 0
exep 0 0 0 8
in 0 0 0 0 98
intb 7 9 7 7 9
misex 0
misj 7 9
pdc 0 7 0
spla 8 0
ti 8 7 7 9 8 97 8
tial 8 08 89 8

5. Experimental Results
The algorithm (LUTMIN) was implemented in ABC [4] as command lutmin, applicable to logic networks in ABC. The implementation was tested on benchmarks from the Espresso/MCNC benchmark suites. The experiments were run on a laptop with Intel Core Duo U900.GHz CPU with Gb RAM (only a few Mb of RAM was used). The total runtime for benchmarks was 8 sec. The resulting LUT networks were verified using the SAT-based combinational equivalence checking command cec in ABC.

Table. shows the numbers of LUTs and logic levels when K = 6. The columns headed MBC show the results of an improved version of Mishchenko et al. [10]. The columns headed LUTMIN show the results obtained by lutmin. When the number of inputs is less than, the algorithm often produced better solutions than Mishchenko et al. [10]. The columns headed SAS show the upper bounds obtained by using an additional idea. (It obtains only the number of LUTs, to show the possibility of further improvement.)

In Manohararajah et al. [8] and Mishchenko et al. [10], the numbers of 6-LUTs needed to implement some benchmark functions are shown. For example, consider ex00, a 10-input 10-output function. The example on MUX-based realization shows that 21 LUTs are sufficient to implement any 10-variable function. Thus, implementing ex00, which is a 10-output function, requires at most 10 × 21 = 210 LUTs.
Note that 9 LUTs are used in [8] and 09 LUTs are used in [10]. apex is a 9-input 19-output function. The example on MUX-based realization shows that 11 LUTs are sufficient to implement any 9-variable function. Thus, apex, which is a 19-output function, requires at most 19 × 11 = 209 LUTs. Note that 78 LUTs are used in [8] and 7 LUTs are used in [10].

These results show that even state-of-the-art logic synthesis algorithms still have room for improvement, which is also noted by Cong and Minkovich [5]. The reasons why [10] failed to obtain the optimum solutions are as follows:
1. [10] uses AND-inverter graphs as initial solutions, which can lead to locally minimal solutions.
2. [10] uses a heuristic method to design larger circuits, and some optimization is omitted to save computation time.
It should be noted that [10] is applicable to large circuits, while the algorithm (LUTMIN) is applicable to only small problems.

6. Conclusion and Comments
In this paper, we considered the number of 6-LUTs needed to implement logic functions. We also presented a synthesis method using 6-LUTs. The method combines MUX-based realizations and cascade-based realizations. Although this method produces circuits with more levels and is practical for functions with at most inputs, it often produces circuits with fewer LUTs than existing methods [10, 8]. A possible extension is to optimize the encodings of the rail outputs [9]. By considering the encodings, we can often replace one or more output functions of LUTs by primary input variables. In such a case, the number of LUTs can be further reduced. Another extension is to extract decomposable component whose number of inputs is at most.

In this paper, to implement multiple-output functions, the functions are decomposed into single-output ones. However, decomposition into groups of a few functions also produces very good circuits []. Yet another possible extension is to use such partitions of the outputs.

7. Acknowledgments
This research is supported in part by the Grants in Aid for Scientific Research of JSPS, and the Grant of Knowledge Cluster Project of MEXT. Discussion with Prof. R. K. Brayton was quite useful.

References
[1] Altera, Stratix IV FPGA Core Fabric Architecture, http://www.altera.com/
[2] E. Ahmed and J. Rose, The effect of LUT and cluster size on deep-submicron FPGA performance and density, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol., No., March 00, pp.88-98.
[3] R. L. Ashenhurst, The decomposition of switching functions, International Symposium on the Theory of Switching, pp. 7-, April 97.
[4] Berkeley Logic Synthesis and Verification Group, ABC: A System for Sequential Synthesis and Verification, Release 709. http://www.eecs.berkeley.edu/~alanmi/abc/
[5] J. Cong and K. Minkovich, Optimality study of logic synthesis for LUT-Based FPGAs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol., Issue, Feb. 007, pp.0-9.
[6] H. A. Curtis, A New Approach to the Design of Switching Circuits, D. Van Nostrand Co., Princeton, NJ, 9.
[7] V. N. Kravets and K. A. Sakallah, Constructive library-aware synthesis using symmetries, DATE 000, pp.08-, March 000.
[8] V. Manohararajah, S. D. Brown, and Z. G. Vranesic, Heuristics for area minimization in LUT-based FPGA technology mapping, IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol., No., November 00, pp. -0.
[9] A. Mishchenko and T. Sasao, Encoding of Boolean functions and its application to LUT cascade synthesis, International Workshop on Logic and Synthesis (IWLS00), New Orleans, Louisiana, June -7, 00, pp.-0.
[10] A. Mishchenko, R. K. Brayton, and S. Chatterjee, Boolean factoring and decomposition of logic networks, Proc. ICCAD 08, Nov. 008, pp.8-.
[11] R. Murgai, R. Brayton, and A. Sangiovanni-Vincentelli, Logic Synthesis for Field-Programmable Gate Arrays, Springer, July 99.
[12] H. Nakahara and T. Sasao, A PC-based logic simulator using a look-up table cascade emulator, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, Vol. E89-A, No., Dec. 00, pp. 7-8.
[13] S. Panda, F. Somenzi, and B. F. Plessier, Symmetry detection and dynamic variable ordering of decision diagrams, ICCAD99, Nov. 99, pp.8-.
[14] C. A. Papachristou, Characteristic measures of switching functions, Information Science, Vol., No., pp.-7, 977.
[15] J. Rose, R. J. Francis, D. Lewis, and P. Chow, Architecture of field programmable gate arrays: The effect of logic block functionality on area efficiency, IEEE J. Solid State Circ.,, pp. 7-, Oct. 990.
[16] J. Rose, A. El Gamal, and A. Sangiovanni-Vincentelli, Architecture of field-programmable gate arrays, Proc. IEEE, Vol. 8, No. 7, pp.0-09, July 99.
[17] T. Sasao, FPGA design by generalized functional decomposition, in Logic Synthesis and Optimization, Sasao ed., Kluwer Academic Publisher, pp. -8, 99.
[18] T. Sasao and J. T. Butler, A design method for look-up table type FPGA by pseudo-Kronecker expansion, IEEE International Symposium on Multiple-Valued Logic, Boston, May 99, pp. 97-0.
[19] T. Sasao, Switching Theory for Logic Synthesis, Kluwer Academic Publishers, 999.
[20] T. Sasao, M. Matsuura, and Y. Iguchi, A cascade realization of multiple-output function for reconfigurable hardware, International Workshop on Logic and Synthesis (IWLS0), Lake Tahoe, CA, June -, 00, pp. -0.
[21] T. Sasao, Analysis and synthesis of weighted-sum functions, IEEE TCAD, Special issue on International Workshop on Logic and Synthesis, Vol., No., May 00, pp. 789-79.
[22] T. Sasao, A new expansion of symmetric functions and their application to non-disjoint functional decompositions for LUT-type FPGAs, International Workshop on Logic Synthesis, Dana Point, California, U.S.A., May -June, 000, pp.0-0.
[23] T. Sasao, On the number of LUTs to realize sparse logic functions, International Workshop on Logic and Synthesis IWLS-009, July -Aug., 009, Berkeley, U.S.A.
[24] Xilinx, Advantages of the Virtex- FPGA, -Input LUT Architecture, WP8 (v.0), Dec. 9, 007, http://www.xilinx.com/