
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS

TurboMOR-RC: an Efficient Model Order Reduction Technique for RC Networks with Many Ports

Denis Oyaro, Student Member, IEEE, and Piero Triverio, Member, IEEE

Abstract: Model order reduction (MOR) techniques play a crucial role in the computer-aided design of modern integrated circuits, where they are used to reduce the size of parasitic networks. Unfortunately, the efficient reduction of passive networks with many ports is still an open problem. Existing techniques do not scale well with the number of ports, and lead to dense reduced models that burden subsequent simulations. In this paper, we propose TurboMOR-RC, a novel MOR technique for the efficient reduction of passive RC networks. TurboMOR-RC is based on moment matching at DC, achieved through efficient congruence transformations based on Householder reflections. A novel feature of TurboMOR-RC is the block-diagonal structure of the reduced models, which makes them more efficient than the dense models produced by existing techniques. Moreover, the model structure allows for an insightful interpretation of the reduction process in terms of system theory. Numerical results show that TurboMOR-RC scales more favourably than existing techniques in terms of reduction time, simulation time and memory consumption.

Index Terms: Model order reduction, many ports, moment matching, parasitics, partitioning.

I. INTRODUCTION

WHILE designing VLSI chips, engineers need to take into account the parasitic resistance, capacitance and inductance of signal- and power-delivery interconnects, in order to prevent signal and power integrity issues 3. Electromagnetic solvers are used to extract RC or RLC interconnect models, which are then connected to non-linear devices for system-level simulations. Unfortunately, parasitic networks can be very large, featuring a huge number of components, nodes and ports. Direct simulation of such large networks is often prohibitive.
Model order reduction (MOR) is frequently used to reduce parasitic models to a manageable size, and to accelerate subsequent simulations. Several approaches to MOR have been proposed in recent decades, such as node elimination 4, Krylov subspaces 5, 6, and balancing 7. Krylov methods are widely used for parasitic reduction, since they are more scalable than balancing methods. Among them, PRIMA 8 is one of the most popular and widely used Krylov algorithms. PRIMA's success is due to its ability to guarantee the passivity of the reduced model, a mandatory property to prevent divergent transient simulations 9. Unfortunately, PRIMA can become very inefficient when applied to networks with many ports. PRIMA generates the reduced model through a congruence transformation with an orthogonal matrix that spans a suitable Krylov subspace. The orthogonal projection matrix is dense and can become very large when there are many ports. Generating the reduced model becomes very time consuming, since it involves products between large and dense matrices. In some cases, even storing the projection matrix can be challenging. Moreover, the obtained reduced model is dense, large and frequently slower to simulate than the original system. These issues affect most existing techniques and remain an outstanding problem in MOR.

Manuscript received...; revised... This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (Discovery Grant program) and in part by the Canada Research Chairs program. D. Oyaro and P. Triverio are with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, M5S 3G4 Canada (e-mail: piero.triverio@utoronto.ca).

A number of techniques have been recently proposed to address such challenges. Methods like SVDMOR, ESVDMOR, RECMOR 3 and several others 4, 5 aim at reducing the number of ports before applying PRIMA. This is done by exploiting the correlation that may exist between different ports.
However, practical networks with many ports rarely exhibit a high degree of correlation 6. In 7 9, the problem of reducing networks with many ports is simplified by clustering inputs into small groups, and reducing each subsystem individually. These methods generate accurate, block-diagonal reduced models that are sparse. However, since subsystems are treated independently, passivity is not always guaranteed. Another method, known as SIP, offers a more efficient approach to moment matching for RC networks. Rather than explicitly constructing the projection matrix, sparse matrix manipulations are used to generate the reduced matrices directly using the Schur complement, an idea also used in PACT. This makes SIP more efficient than PRIMA for large networks with many ports. However, SIP can match only two moments per expansion point. This level of accuracy is not always sufficient for practical applications, as we will show in Sec. IV. The authors of SIP suggest using multi-point moment matching 5, 6 to achieve more accuracy. However, the obtained reduced matrices can be singular, and avoiding this issue does not seem to be trivial. In 3, the SparseRC method is proposed, combining graph-partitioning techniques 4 7 with a SIP-like reduction process. A divide and conquer strategy is used to partition the original system into smaller subsystems, which are then reduced separately with a method similar to SIP. The resulting model has the same partitioned structure as the original system. Such a reduction strategy is efficient in terms of memory and CPU time

Copyright (c) 5 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

for large networks, since the problem of reducing the large system simplifies to reducing smaller subsystems that can be managed efficiently. The generated model is also sparse. By default, SparseRC matches two moments around $s = 0$. Additional moments can be added after the first iteration by employing a standard Krylov-based moment-matching projection at arbitrary expansion points 3. While this provides a flexible way to improve accuracy, we demonstrate in this paper that a more efficient strategy can be devised, which also leads to a sparser model.

In this paper, we propose TurboMOR-RC, a novel MOR technique for RC networks with many ports. TurboMOR-RC achieves moment matching at DC without explicitly computing a dense projection matrix as in PRIMA. Efficient and memory-conscious Householder reflections 8 are used to generate the reduced model, and match two moments per iteration. Unlike previous methods such as SIP, an arbitrary number of moments can be matched, providing full control over accuracy. TurboMOR-RC can be combined with partitioning 3 7 to reduce very large networks. A key feature of TurboMOR-RC is the block-diagonal structure of the reduced models, which addresses the poor efficiency of the dense models produced by general-purpose techniques for moment matching, such as PRIMA 8. The proposed method provides a novel way to generate block-sparse reduced order models, which can be combined with the sparsity provided by partitioning 3 7 to maximize the efficiency of the reduced model. The block-diagonal structure also lends itself to a novel and insightful interpretation of moment matching in terms of cascaded subsystems. The reduced models produced by TurboMOR-RC, like the models generated by related techniques 3, 4, are passive, retain the input-output structure of the original system, and can be synthesized into an equivalent RC netlist 9.
Numerical tests demonstrate the superior scalability of TurboMOR-RC in terms of reduction time, simulation time, and memory consumption. The rest of the paper is organized as follows. In Sec. II, we state the problem and briefly review the foundations of moment matching. In Sec. III, we discuss the theoretical derivation and practical implementation of TurboMOR-RC. Sec. IV compares TurboMOR-RC against the state of the art. In Sec. V we draw our conclusions, and in the Appendix we provide some mathematical proofs.

II. PROBLEM FORMULATION

We consider a passive network made of resistors and capacitors with $m$ nodes and $p$ ports. Using nodal analysis 3, the network can be described in the Laplace domain by the system of equations

$$G x(s) + s C x(s) = B u(s), \qquad y(s) = B^T x(s) \tag{1}$$

where vectors $u(s) \in \mathbb{R}^p$ and $y(s) \in \mathbb{R}^p$ collect all port currents and port voltages, respectively. Vector $x(s) \in \mathbb{R}^m$ contains all nodal voltages. Matrices $G, C \in \mathbb{R}^{m \times m}$ are the conductance and capacitance matrices, respectively. They are symmetric and positive semi-definite. Matrix $B \in \mathbb{R}^{m \times p}$ maps input ports to the nodal equations, and $^T$ denotes transposition. The transfer function of (1) reads

$$H(s) = B^T (G + sC)^{-1} B \tag{2}$$

The goal of MOR is to approximate (1) with a model of much lower order $n \ll m$

$$\hat{G} \hat{x}(s) + s \hat{C} \hat{x}(s) = \hat{B} u(s), \qquad \hat{y}(s) = \hat{B}^T \hat{x}(s) \tag{3}$$

where $\hat{G}, \hat{C} \in \mathbb{R}^{n \times n}$, $\hat{B} \in \mathbb{R}^{n \times p}$ and $\hat{x}(s) \in \mathbb{R}^n$. This model must accurately capture the response of the original system across the frequency range of interest. One way of ensuring accuracy is through Padé approximation, also known as moment matching. Around $s = 0$, the Taylor series expansion of (2) reads

$$H(s) = M_0 + M_1 s + M_2 s^2 + \dots \tag{4}$$

The coefficients $M_k$ are called moments of (1) at DC 5, 6, 8, and can be related to the system matrices as

$$M_k = B^T \left( -G^{-1} C \right)^k G^{-1} B, \qquad k = 0, 1, 2, \dots \tag{5}$$

The moments of the reduced model are defined similarly, as the Taylor expansion coefficients of the transfer function

$$\hat{H}(s) = \hat{B}^T (\hat{G} + s\hat{C})^{-1} \hat{B} \tag{6}$$

of reduced model (3).
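To make the moment definition (5) concrete, the following minimal sketch computes the first few moments of a small RC ladder and checks them against a direct evaluation of the transfer function (2) near DC. The test network `rc_ladder`, its element values and the helper names are assumptions made for illustration only; they are not taken from the paper.

```python
import numpy as np

def rc_ladder(m, p):
    """Hypothetical test network: RC ladder with m nodes, ports on the first p nodes."""
    G = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)  # SPD conductance matrix
    C = np.diag(1.0 + 0.1 * np.arange(m)) + 0.2 * G         # SPD capacitance matrix
    B = np.eye(m)[:, :p]                                    # port incidence matrix
    return G, C, B

def moments(G, C, B, count):
    """Return [M_0, ..., M_{count-1}] with M_k = B^T (-G^{-1} C)^k G^{-1} B."""
    X = np.linalg.solve(G, B)            # G^{-1} B
    out = []
    for _ in range(count):
        out.append(B.T @ X)
        X = -np.linalg.solve(G, C @ X)   # multiply by -G^{-1} C
    return out

def transfer(G, C, B, s):
    """Direct evaluation of H(s) = B^T (G + sC)^{-1} B."""
    return B.T @ np.linalg.solve(G + s * C, B)

G, C, B = rc_ladder(6, 2)
M = moments(G, C, B, 4)
s = 1e-3                                 # small frequency, close to the expansion point
taylor = sum(Mk * s**k for k, Mk in enumerate(M))
err = np.max(np.abs(transfer(G, C, B, s) - taylor))
```

Each moment is obtained from one extra linear solve with $G$, which is why moment-based methods only need a single factorization of the conductance matrix per expansion point.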
The goal of moment matching is to generate a reduced model (3) that matches the first moments of the original system

$$M_k = \hat{M}_k, \qquad k = 0, \dots, 2q - 1 \tag{7}$$

up to a given order controlled by $q$. Since, for RC networks, moments are typically matched in pairs, we denote the number of matched moments as $2q$. By increasing $q$, the reduced model becomes more accurate, but also larger. While moment matching is not an optimal MOR method, it is one of the most popular approaches for electronic design automation. The high computational cost of optimal methods, such as Hankel MOR and truncated balanced realizations, makes them unsuitable for large circuit problems, 7. In PRIMA, moment matching is performed with a congruence transformation applied to the matrices of the original system (1)

$$\hat{G} = Q^T G Q, \qquad \hat{C} = Q^T C Q, \qquad \hat{B} = Q^T B \tag{8}$$

The columns of $Q \in \mathbb{R}^{m \times qp}$ span the Krylov subspace

$$\mathcal{K}_q(A, R) = \mathrm{span}\{R, AR, A^2 R, \dots, A^{q-1} R\} \tag{9}$$

where $A = -G^{-1} C$ and $R = G^{-1} B$. It can be shown that reduced model (8) matches the first $2q$ moments at DC of the original system. The reduced model is of size $n = qp$, and is passive by construction, since congruence transformation (8) maintains the positive semi-definite nature of $G$ and $C$. The projection matrix $Q$ is constructed numerically with the block Arnoldi process 6, an orthogonalization procedure similar to the modified Gram-Schmidt process 8. Unfortunately, orthogonalization leads to a dense $Q$. As a result, when $p$

is high, computing $Q$ and the projection products (8) can be very expensive. For very large networks, even storing $Q$ becomes an issue, since its size can easily exceed several gigabytes. Moreover, transformations (8) lead to a dense reduced model, which will burden any subsequent circuit simulation. These bottlenecks, which make existing methods quite inefficient for many-port networks, are tackled by the proposed method.

III. PROPOSED METHOD

In this section, we discuss the theoretical derivation of TurboMOR-RC and how it can be implemented for maximum efficiency. The method works recursively, matching two moments per iteration. We discuss the first two iterations in detail, before generalizing.

A. Theoretical Derivation

1) Matching Two Moments: The first iteration of the proposed method is analogous to the reduction used in SIP and SparseRC 3. Nodes are first reordered in such a way that port nodes come first, followed by internal nodes. After reordering, system (1) reads

$$\left( \begin{bmatrix} G_{11} & \star \\ G_{21} & G_{22} \end{bmatrix} + s \begin{bmatrix} C_{11} & \star \\ C_{21} & C_{22} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \tag{10a}$$

$$y = \begin{bmatrix} B_1^T & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \tag{10b}$$

where $x_1 \in \mathbb{R}^p$ and $x_2 \in \mathbb{R}^{m-p}$ denote port and internal node voltages, respectively. The symbol $\star$ is used in symmetric matrices to denote the transpose of the symmetric block across the diagonal. To shorten our notation, we do not indicate explicitly the dependency on $s$ for input, output and state variables. Submatrix $G_{21}$ describes the resistive couplings present between internal and port nodes. We eliminate this block through Gaussian elimination, using the congruence transformation (8) with $Q$ given by

$$Q^{(1)} = \begin{bmatrix} I_p & 0 \\ -K^{-T} K^{-1} G_{21} & I_{m-p} \end{bmatrix} \tag{11}$$

Matrix $K$ is the Cholesky factor 8 of $G_{22}$, so that $G_{22} = K K^T$. For the time being, we assume $G_{22}$ to be strictly positive definite. In Sec. III-C, we will discuss how a singular $G_{22}$ can be handled. Matrix $I_p$ is the identity matrix of size $p \times p$.
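The effect of the elimination step can be checked numerically. The sketch below builds the congruence matrix $Q^{(1)} = [I_p, 0; -K^{-T}K^{-1}G_{21}, I_{m-p}]$ for a small dense example and verifies that it removes the resistive coupling block while leaving the port map untouched. The toy network `rc_ladder` and the dense NumPy solves are assumptions made for illustration; a practical implementation would rely on sparse factorizations.

```python
import numpy as np

def rc_ladder(m, p):
    """Hypothetical test network: RC ladder with m nodes, ports on the first p nodes."""
    G = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)  # SPD conductance matrix
    C = np.diag(1.0 + 0.1 * np.arange(m)) + 0.2 * G         # SPD capacitance matrix
    B = np.eye(m)[:, :p]                                    # ports already ordered first
    return G, C, B

m, p = 8, 2
G, C, B = rc_ladder(m, p)
G21, G22 = G[p:, :p], G[p:, p:]

K = np.linalg.cholesky(G22)                        # lower triangular, G22 = K K^T
X = np.linalg.solve(K.T, np.linalg.solve(K, G21))  # K^{-T} K^{-1} G21

Q1 = np.eye(m)
Q1[p:, :p] = -X                                    # lower-left block of the congruence

Gt = Q1.T @ G @ Q1                                 # transformed conductance matrix
Bt = Q1.T @ B                                      # transformed port map
```

After the transformation, `Gt` is block diagonal and `Bt` equals `B`, so the ports of the transformed system are still driven through $B_1$ only.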
After the congruence, equations (10a) and (10b) become

$$\left( \begin{bmatrix} G_{11}^{(1)} & 0 \\ 0 & G_{22} \end{bmatrix} + s \begin{bmatrix} C_{11}^{(1)} & \star \\ C_{21}^{(1)} & C_{22} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix} = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \tag{12a}$$

$$y = \begin{bmatrix} B_1^T & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2^{(1)} \end{bmatrix} \tag{12b}$$

where

$$G_{11}^{(1)} = G_{11} - G_{21}^T K^{-T} K^{-1} G_{21} \tag{13}$$

$$C_{11}^{(1)} = C_{11} - G_{21}^T K^{-T} K^{-1} C_{21} - C_{21}^T K^{-T} K^{-1} G_{21} + G_{21}^T K^{-T} K^{-1} C_{22} K^{-T} K^{-1} G_{21} \tag{14}$$

$$C_{21}^{(1)} = C_{21} - C_{22} K^{-T} K^{-1} G_{21} \tag{15}$$

With Gaussian elimination, all resistive couplings between port nodes and internal nodes have been eliminated, leaving only capacitive couplings. The obtained equations lend themselves to a useful interpretation in terms of system theory, depicted in Fig. 1. System (12a)-(12b) can be seen as the cascade of a system $\Sigma^{(1)}$ of order $p$

$$\Sigma^{(1)}: \quad G_{11}^{(1)} x_1 + s C_{11}^{(1)} x_1 = -\tilde{u}^{(1)} + B_1 u, \qquad y = B_1^T x_1 \tag{16}$$

and a system $\Sigma^{(2)}$ of order $m - p$

$$\Sigma^{(2)}: \quad G_{22} x_2^{(1)} + s C_{22} x_2^{(1)} = -C_{21}^{(1)} u^{(2)}, \qquad y^{(2)} = (C_{21}^{(1)})^T x_2^{(1)} \tag{17}$$

Only the first subsystem $\Sigma^{(1)}$ is directly connected to the input/output ports of the network. Subsystem $\Sigma^{(2)}$ is instead connected only to $\Sigma^{(1)}$, through the equations $\tilde{u}^{(1)} = s y^{(2)}$ and $u^{(2)} = s x_1$, which define derivatives. Here $u^{(2)}$ is the signal that drives $\Sigma^{(2)}$ and $\tilde{u}^{(1)}$ is the signal fed back into $\Sigma^{(1)}$. The coupling between the two subsystems is thus purely dynamical. At DC, the second system is completely decoupled from $\Sigma^{(1)}$ and the network ports, and has no influence on the transfer function $H(s)$ between input $u$ and output $y$. At low frequency, the coupling between the two is weak, and the overall system response is given mainly by $\Sigma^{(1)}$. Therefore, the first subsystem alone can be interpreted as a reduced model of order $p$ of the original system

$$G_{11}^{(1)} x_1 + s C_{11}^{(1)} x_1 = B_1 u, \qquad \hat{y} = B_1^T x_1 \tag{18}$$

Fig. 1. System theory interpretation of (12a) and (12b). The original system has been decomposed into two subsystems $\Sigma^{(1)}$ and $\Sigma^{(2)}$, decoupled at DC and coupled only through time derivatives ($d/dt$ blocks).

In the Appendix, we indeed prove that (18) matches the first two moments of the original system at $s = 0$. From an accuracy standpoint, the proposed reduced model is thus equivalent in size and accuracy to the models generated by other moment matching techniques. Its computation, however, requires less effort, since its matrices (13) and (14) can be computed cheaply using sparse matrix techniques.
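The claim that the order-$p$ model matches two moments can be verified on a toy example. The sketch below forms the Schur-complement matrices $G_{11}^{(1)}$ and $C_{11}^{(1)}$ and compares the moments of the reduced and original systems. The network `rc_ladder`, the helper names and the use of dense solves are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rc_ladder(m, p):
    """Hypothetical test network: RC ladder with m nodes, ports on the first p nodes."""
    G = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    C = np.diag(1.0 + 0.1 * np.arange(m)) + 0.2 * G
    B = np.eye(m)[:, :p]
    return G, C, B

def moments(G, C, B, count):
    """Moments at DC: M_k = B^T (-G^{-1} C)^k G^{-1} B."""
    X = np.linalg.solve(G, B)
    out = []
    for _ in range(count):
        out.append(B.T @ X)
        X = -np.linalg.solve(G, C @ X)
    return out

def first_iteration(G, C, p):
    """Return the order-p matrices obtained by eliminating the internal nodes at DC."""
    G11, G21, G22 = G[:p, :p], G[p:, :p], G[p:, p:]
    C11, C21, C22 = C[:p, :p], C[p:, :p], C[p:, p:]
    K = np.linalg.cholesky(G22)                        # G22 = K K^T
    X = np.linalg.solve(K.T, np.linalg.solve(K, G21))  # K^{-T} K^{-1} G21
    G11_1 = G11 - G21.T @ X
    C11_1 = C11 - X.T @ C21 - C21.T @ X + X.T @ C22 @ X
    return G11_1, C11_1

m, p = 8, 2
G, C, B = rc_ladder(m, p)
G_red, C_red = first_iteration(G, C, p)
B_red = B[:p, :]                                       # B_1 is preserved by the reduction

M_full = moments(G, C, B, 3)
M_red = moments(G_red, C_red, B_red, 3)
```

On this example the first two moments agree to machine precision while the third one differs, which is the behaviour expected from a single iteration.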
It is interesting to note that the idea of decoupling, used here to perform MOR, has several analogies with waveform relaxation 3, where subdivision into weakly-coupled systems is used to enable parallel computations.

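2) Matching Four Moments: In order to match more than two moments, the presence of $\Sigma^{(2)}$ must be taken into account. Instead of applying PRIMA to $\Sigma^{(2)}$ as in 3, losing efficiency, we show how additional moments can be efficiently matched by further decomposing $\Sigma^{(2)}$. First, we apply a congruence transformation to (17) using $Q = K^{-T}$ in (8)

$$I_{m-p} z^{(1)} + s K^{-1} C_{22} K^{-T} z^{(1)} = -K^{-1} C_{21}^{(1)} u^{(2)}, \qquad y^{(2)} = (C_{21}^{(1)})^T K^{-T} z^{(1)} \tag{19}$$

where $x_2^{(1)} = K^{-T} z^{(1)}$. This step turns $G_{22}$ into the identity matrix, and does not require expensive computations since $K$ is already available from the previous iteration. Then, with a series of Householder reflections 8, we compute the QR factorization of the input-to-state matrix in (19)

$$(Q^{(2)})^T K^{-1} C_{21}^{(1)} = \begin{bmatrix} R^{(2)} \\ 0 \end{bmatrix} \tag{20}$$

where $R^{(2)} \in \mathbb{R}^{p \times p}$ is upper triangular and $Q^{(2)} \in \mathbb{R}^{(m-p) \times (m-p)}$ is an orthogonal matrix given by the product of Householder reflectors 8. After $Q^{(2)}$ is applied to (19) with a congruence transformation, the system reads

$$\left( \begin{bmatrix} I_p & 0 \\ 0 & I_{m-2p} \end{bmatrix} + s \begin{bmatrix} C_{11}^{(2)} & \star \\ C_{21}^{(2)} & C_{22}^{(2)} \end{bmatrix} \right) \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} = -\begin{bmatrix} R^{(2)} \\ 0 \end{bmatrix} u^{(2)} \tag{21a}$$

$$y^{(2)} = \begin{bmatrix} (R^{(2)})^T & 0 \end{bmatrix} \begin{bmatrix} x_1^{(2)} \\ x_2^{(2)} \end{bmatrix} \tag{21b}$$

where

$$C_{22}^{(2)} = \begin{bmatrix} 0 & I_{m-2p} \end{bmatrix} (Q^{(2)})^T K^{-1} C_{22} K^{-T} Q^{(2)} \begin{bmatrix} 0 \\ I_{m-2p} \end{bmatrix} \tag{22}$$

$$C_{21}^{(2)} = \begin{bmatrix} 0 & I_{m-2p} \end{bmatrix} (Q^{(2)})^T K^{-1} C_{22} K^{-T} Q^{(2)} \begin{bmatrix} I_p \\ 0 \end{bmatrix} \tag{23}$$

$$C_{11}^{(2)} = \begin{bmatrix} I_p & 0 \end{bmatrix} (Q^{(2)})^T K^{-1} C_{22} K^{-T} Q^{(2)} \begin{bmatrix} I_p \\ 0 \end{bmatrix} \tag{24}$$

System (21a)-(21b) is now in the form (12a)-(12b), and the reduction process used in iteration 1 can be applied again. System (21a)-(21b) can be seen as the cascade of a first system $\Sigma^{(2)}$ of order $p$

$$\Sigma^{(2)}: \quad I_p x_1^{(2)} + s C_{11}^{(2)} x_1^{(2)} = -\tilde{u}^{(2)} - R^{(2)} u^{(2)}, \qquad y^{(2)} = (R^{(2)})^T x_1^{(2)} \tag{25}$$

and a second system $\Sigma^{(3)}$ of order $m - 2p$

$$\Sigma^{(3)}: \quad I_{m-2p} x_2^{(2)} + s C_{22}^{(2)} x_2^{(2)} = -C_{21}^{(2)} u^{(3)}, \qquad y^{(3)} = (C_{21}^{(2)})^T x_2^{(2)} \tag{26}$$

The two systems are only dynamically coupled, through the equations $\tilde{u}^{(2)} = s y^{(3)}$ and $u^{(3)} = s x_1^{(2)}$, where $\tilde{u}^{(2)}$ is the signal fed back into $\Sigma^{(2)}$ and $u^{(3)}$ is the signal that drives $\Sigma^{(3)}$. Overall, the original system (1) is now decomposed into three blocks, all coupled dynamically, as shown in Fig. 2.

Fig. 2. Structure of the system obtained after two iterations of the proposed method: three subsystems $\Sigma^{(1)}$, $\Sigma^{(2)}$ and $\Sigma^{(3)}$, coupled only through time derivatives ($d/dt$ blocks).

If we retain the first two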
blocks, and neglect $\Sigma^{(3)}$, we obtain a reduced model of order $2p$

$$\left( \begin{bmatrix} G_{11}^{(1)} & 0 \\ 0 & I_p \end{bmatrix} + s \begin{bmatrix} C_{11}^{(1)} & \star \\ R^{(2)} & C_{11}^{(2)} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_1^{(2)} \end{bmatrix} = \begin{bmatrix} B_1 \\ 0 \end{bmatrix} u \tag{27a}$$

$$\hat{y} = \begin{bmatrix} B_1^T & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_1^{(2)} \end{bmatrix} \tag{27b}$$

As shown in the Appendix, this model matches the first 4 moments of the original system.

3) Matching More Than Four Moments: Additional moments can be matched by iterating the proposed process, further decomposing subsystem $\Sigma^{(3)}$ in Fig. 2. This goal can be achieved by computing, at each iteration $j \geq 3$, the QR decomposition

$$(Q^{(j)})^T C_{21}^{(j-1)} = \begin{bmatrix} R^{(j)} \\ 0 \end{bmatrix} \tag{29}$$

of the input-to-state matrix of the innermost system (at iteration $j = 3$, matrix $C_{21}^{(2)}$ in (26)). The QR decomposition is obtained with a series of Householder reflectors that form the congruence matrix $Q^{(j)}$. The obtained system will have the same structure as (21a)-(21b), and can be seen as the cascade of two blocks. The first system $\Sigma^{(j)}$, of size $p$, will add two matched moments to the reduced model computed up to that point. The second system will be further decomposed if $j < q$. Otherwise, at the last iteration, it will be discarded. After $q$ iterations, the obtained reduced model will have order $pq$, and will be in the form shown in equation (28). In the Appendix, we prove that the obtained model matches $2q$ moments of the original network. The proposed technique therefore leads to a reduced model of the same size and accuracy as PRIMA, but in a more efficient way, which avoids the explicit construction of a huge and dense projection matrix. In comparison to SIP, which can match only two moments per frequency point, the proposed method can match an arbitrary number of moments, and does not suffer from the singularity issues of multipoint SIP. In TurboMOR-RC, additional moments are matched by iterating

$$\left( \begin{bmatrix} G_{11}^{(1)} & & & \\ & I_p & & \\ & & \ddots & \\ & & & I_p \end{bmatrix} + s \begin{bmatrix} C_{11}^{(1)} & \star & & \\ R^{(2)} & C_{11}^{(2)} & \ddots & \\ & \ddots & \ddots & \star \\ & & R^{(q)} & C_{11}^{(q)} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_1^{(2)} \\ \vdots \\ x_1^{(q)} \end{bmatrix} = \begin{bmatrix} B_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} u \tag{28}$$

the same procedure instead of switching, after a first SIP-like iteration, to a different method (PRIMA or pole matching) as in SparseRC 3. Being able to match additional moments at the same expansion point avoids the need to compute a new Cholesky factorization of a shifted $G$ matrix. With the proposed procedure, one can refine model accuracy using only the Cholesky factorization of $G_{22}$, which is already available from the first iteration. The iterative nature of the proposed algorithm also leads to an interesting feature of the proposed method, the block-diagonal structure of the generated reduced model (28). Unlike PRIMA, which generates dense models, the proposed method naturally leads to a block-sparse representation. This reduces the memory footprint of the models, and accelerates subsequent simulations, as we shall see in Sec. IV. Although PRIMA models can be sparsified with an eigenvalue decomposition 6, this operation costs extra CPU cycles. Since the sparsity provided by TurboMOR-RC arises from the moment-matching process itself, it is independent from the sparsity produced by partitioning 3 7. Therefore, the two strategies to sparsify the reduced model can be combined, as we will discuss in Sec. III-D. The obtained models are stable and passive by construction, since only congruence transformations like (8) have been used to generate the model matrices. The positive-definite nature of $G$ and $C$ in (1) is thus preserved, which implies passivity and guarantees stable transient simulations 9. We also note that TurboMOR-RC preserves the matrix $B_1$ in (10a) and (10b) that maps input ports to state equations. As discussed in 9, this property facilitates the connection of the reduced model to the surrounding components.
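The two-iteration construction can be sketched end to end on a small dense example. The code below performs the DC elimination, rescales the internal block with the Cholesky factor, applies a Householder-based QR factorization and assembles the order-$2p$ model with its block structure, then compares moments with the original system. The toy network `rc_ladder`, the helper names and the use of `numpy.linalg.qr` with an explicitly formed orthogonal factor are assumptions chosen for readability; they do not reproduce the memory-efficient factored implementation described in the paper.

```python
import numpy as np

def rc_ladder(m, p):
    """Hypothetical test network: RC ladder with m nodes, ports on the first p nodes."""
    G = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    C = np.diag(1.0 + 0.1 * np.arange(m)) + 0.2 * G
    B = np.eye(m)[:, :p]
    return G, C, B

def moments(G, C, B, count):
    """Moments at DC: M_k = B^T (-G^{-1} C)^k G^{-1} B."""
    X = np.linalg.solve(G, B)
    out = []
    for _ in range(count):
        out.append(B.T @ X)
        X = -np.linalg.solve(G, C @ X)
    return out

def two_iterations(G, C, B, p):
    """Dense sketch of two reduction iterations; returns an order-2p model."""
    G11, G21, G22 = G[:p, :p], G[p:, :p], G[p:, p:]
    C11, C21, C22 = C[:p, :p], C[p:, :p], C[p:, p:]
    K = np.linalg.cholesky(G22)                            # G22 = K K^T
    X = np.linalg.solve(K.T, np.linalg.solve(K, G21))      # K^{-T} K^{-1} G21
    G11_1 = G11 - G21.T @ X
    C11_1 = C11 - X.T @ C21 - C21.T @ X + X.T @ C22 @ X
    C21_1 = C21 - C22 @ X
    A = np.linalg.solve(K, C21_1)                          # K^{-1} C21^(1)
    Ct = np.linalg.solve(K, np.linalg.solve(K, C22).T).T   # K^{-1} C22 K^{-T}
    Q2, R_full = np.linalg.qr(A, mode="complete")          # Householder-based QR
    R2 = R_full[:p, :]                                     # p-by-p triangular factor
    C11_2 = (Q2.T @ Ct @ Q2)[:p, :p]
    Z = np.zeros((p, p))
    G_hat = np.block([[G11_1, Z], [Z, np.eye(p)]])
    C_hat = np.block([[C11_1, R2.T], [R2, C11_2]])
    B_hat = np.vstack([B[:p, :], np.zeros((p, B.shape[1]))])
    return G_hat, C_hat, B_hat

m, p = 8, 2
G, C, B = rc_ladder(m, p)
G_hat, C_hat, B_hat = two_iterations(G, C, B, p)
M_full = moments(G, C, B, 4)
M_red = moments(G_hat, C_hat, B_hat, 4)
```

The reduced conductance matrix is block diagonal with an identity block, and the only coupling between the two retained blocks is the triangular factor `R2` in the capacitance matrix.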
Finally, the obtained reduced model can be converted into an RC equivalent circuit using, for example, RLCSYN 9, for seamless integration into existing tools for electronic design automation.

B. Practical Implementation

We now discuss how TurboMOR-RC can be implemented for maximum efficiency in terms of CPU time and memory consumption. The Cholesky decomposition of $G_{22}$ can be obtained using efficient routines for the factorization of sparse, positive-definite matrices, such as the supernodal method 3 available in MATLAB's chol routine. The first congruence matrix (11) consists of sparse factors, which can be stored efficiently. The QR decomposition in (20) is computed with the Householder method. In our MATLAB implementation of TurboMOR-RC, we used a direct call to the compiled LAPACK routine DGEQRF 33, which returns the orthogonal $Q^{(j)}$ matrix in factored form 8. Such a matrix is never computed explicitly, but kept in factored form, which consists of a series of vectors representing elementary reflectors 8. The size of these vectors progressively decreases with iterations, in contrast to the orthogonal vectors that form the projection matrix in PRIMA-like algorithms, whose size is always $m$.

Algorithm 1: $(\hat{G}, \hat{C})$ = TurboMOR-RC($G$, $C$, portvec, $q$)
Input: $G$, $C$, vector of port nodes portvec, desired number of iterations $q$
Output: $\hat{G}$, $\hat{C}$ of the reduced model
 1: Reorder nodes so that port nodes come first
 2: $K = \mathrm{chol}(G_{22})$ (Cholesky factorization)
 3: Compute $G_{11}^{(1)}$, $C_{11}^{(1)}$ using equations (13), (14)
 4: if $q > 1$ then
 5:   Compute $C_{21}^{(1)}$ using equation (15)
 6:   Let $C_{22}^{(1)} = K^{-1} C_{22} K^{-T}$
 7:   for $j = 2, \dots, q - 1$ do
 8:     if $j = 2$ then
 9:       $(Q^{(j)}, R^{(j)}) = \mathrm{qr}(K^{-1} C_{21}^{(1)})$ (QR factorization)
10:     else
11:       $(Q^{(j)}, R^{(j)}) = \mathrm{qr}(C_{21}^{(j-1)})$ (QR factorization)
12:     end if
13:     $\begin{bmatrix} C_{11}^{(j)} \\ C_{21}^{(j)} \end{bmatrix} \leftarrow (Q^{(j)})^T C_{22}^{(j-1)} Q^{(j)} \begin{bmatrix} I_p \\ 0 \end{bmatrix}$
14:   end for
15:   $(Q^{(q)}, R^{(q)}) = \mathrm{qr}(C_{21}^{(q-1)})$ (QR factorization)
16:   $C_{11}^{(q)} \leftarrow \begin{bmatrix} I_p & 0 \end{bmatrix} (Q^{(q)})^T C_{22}^{(q-1)} Q^{(q)} \begin{bmatrix} I_p \\ 0 \end{bmatrix}$
17:   Form $\hat{G}$ and $\hat{C}$ using (28)
18: else
19:   $\hat{G} = G_{11}^{(1)}$, $\hat{C} = C_{11}^{(1)}$
20: end if
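The idea of keeping the orthogonal factor as a list of elementary reflectors can be illustrated with a small hand-written Householder QR. The sketch below stores one reflector vector per column, each shorter than the previous one, and applies the transposed orthogonal factor to a matrix without ever forming it. The functions `householder_factored` and `apply_Qt` are hypothetical helpers written for this illustration; they mimic the role of LAPACK's DGEQRF and DORMQR but do not use LAPACK's storage format or scaling conventions.

```python
import numpy as np

def householder_factored(A):
    """Return reflector vectors v_k and R such that H_n ... H_1 A = [R; 0]."""
    A = A.astype(float).copy()
    rows, cols = A.shape
    vs = []
    for k in range(cols):
        x = A[k:, k]
        v = x.copy()
        v[0] += np.copysign(np.linalg.norm(x), x[0])   # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        A[k:, :] -= 2.0 * np.outer(v, v @ A[k:, :])    # apply H_k = I - 2 v v^T
        vs.append(v)                                   # reflector k has length rows - k
    return vs, np.triu(A[:cols, :])

def apply_Qt(vs, X):
    """Compute Q^T X using only the stored reflectors (Q is never formed)."""
    Y = X.astype(float).copy()
    for k, v in enumerate(vs):
        Y[k:, :] -= 2.0 * np.outer(v, v @ Y[k:, :])
    return Y

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2))        # stands in for a tall input-to-state matrix
vs, R = householder_factored(A)
QtA = apply_Qt(vs, A)                  # should equal [R; 0]
X = rng.standard_normal((6, 3))
Y = apply_Qt(vs, X)                    # orthogonal transformation of another matrix
stored = sum(len(v) for v in vs)       # numbers stored for the factored form
```

For this 6-by-2 example the factored form needs 11 numbers instead of the 36 entries of the explicit orthogonal matrix, and the saving grows with the number of internal nodes.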
This feature, together with the sparsity of the other factors used in the reduction, contributes to the memory efficiency of the proposed algorithm. The LAPACK routine DORMQR 33 can be used to compute products involving $Q^{(j)}$ directly from its factorization. Being large and dense, matrix $C_{22}^{(j)}$ is also never computed explicitly. Its factored form is always used, which is given by (22) for $j = 2$ and by

$$C_{22}^{(j)} = \begin{bmatrix} 0 & I_{m-jp} \end{bmatrix} (Q^{(j)})^T C_{22}^{(j-1)} Q^{(j)} \begin{bmatrix} 0 \\ I_{m-jp} \end{bmatrix} \tag{30}$$

for $j > 2$. Algorithm 1 summarizes the entire flow of the proposed technique.

C. On the Singularity of $G_{22}$

Throughout the derivation of TurboMOR-RC, we assumed the block $G_{22}$ in (10a) to be strictly positive definite, hence

invertible. When this is not the case, we adopt the solution proposed in 3 for SparseRC. The rows and columns that make $G_{22}$ singular are promoted into the first set of equations, and not eliminated. Since the number of such rows is typically very low, this does not significantly increase the size of the obtained models.

D. TurboMOR-RC with partitioning

Graph partitioning techniques 3 7 can be integrated into TurboMOR-RC to reduce very large networks, such as the power grid models that we will consider in Sec. IV. A possible partitioning strategy, used in 3 and 4, is to partition the given network into subnetworks that interact only through a limited set of nodes, called separator nodes. An optimal partitioning can be found with the nested dissection algorithm nesdis from the SuiteSparse package 3. Once the network nodes are reordered according to the partitions identified by nesdis, the matrices in (1) assume a bordered block diagonal form 34. To illustrate this, consider a three-component partitioning of (1)

$$\left( \begin{bmatrix} G_{11} & & \star \\ & G_{22} & \star \\ G_{31} & G_{32} & G_{33} \end{bmatrix} + s \begin{bmatrix} C_{11} & & \star \\ & C_{22} & \star \\ C_{31} & C_{32} & C_{33} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix} u \tag{31}$$

Blocks $G_{11}$, $C_{11}$ and $G_{22}$, $C_{22}$ correspond to two decoupled subsystems, which interact only through a set of separator nodes associated with $G_{33}$, via coupling matrices $G_{31}$, $C_{31}$, $G_{32}$, $C_{32}$. Subsystems 1 and 2 can be reduced individually. The coupling matrices are then updated accordingly. For instance, for reducing subsystem 1, we first form its nodal equations

$$\left( \begin{bmatrix} G_{11} & \star \\ G_{31} & G_{33} \end{bmatrix} + s \begin{bmatrix} C_{11} & \star \\ C_{31} & C_{33} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_3 \end{bmatrix} = \begin{bmatrix} B_1 \\ B_3 \end{bmatrix} u \tag{32}$$

and then reorder its nodes such that port nodes and separator nodes come first, forming the state vector $x_1$ in (10a), while internal nodes come second, forming $x_2$ in (10a). Then, we perform the reduction as in Sec. III-A.
After all subsystems have been reduced, the obtained reduced model will read

$$\left( \begin{bmatrix} \hat{G}_{11} & & \star \\ & \hat{G}_{22} & \star \\ \hat{G}_{31} & \hat{G}_{32} & G_{33} \end{bmatrix} + s \begin{bmatrix} \hat{C}_{11} & & \star \\ & \hat{C}_{22} & \star \\ \hat{C}_{31} & \hat{C}_{32} & C_{33} \end{bmatrix} \right) \begin{bmatrix} \hat{x}_1 \\ \hat{x}_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} \hat{B}_1 \\ \hat{B}_2 \\ B_3 \end{bmatrix} u \tag{33}$$

As numerical results will show, partitioning reduces the overall cost of the reduction, since TurboMOR-RC is applied to subsystems of smaller size 7. Additionally, it reduces the number of fill-ins in the reduced model, since the zero blocks in (31) are maintained in (33), making the reduced model block-sparse. However, if the reduction of the subsystems is performed with existing methods, the non-zero blocks in (33) will be full. With the proposed method, instead, such blocks will have the block-sparse structure of (28). TurboMOR-RC thus provides a novel way to generate block-sparse models within the moment matching process, which complements the sparsity provided by partitioning.

E. On the extension to the RLC case

While beyond the scope of this paper, we briefly comment on the feasibility of extending the proposed method to the RLC case. It has been shown that SIP can be generalized to such a case by replacing the Cholesky decomposition with an LU decomposition. Unlike in the RC case, in the RLC case this process matches only one moment. While this is not sufficient from an accuracy standpoint, and using multiple expansion points is more costly than in the RC case, this result provides a starting point to generalize the proposed method to the RLC case. The main challenge in the RLC case is the non-symmetric nature of the circuit matrices. The extension to the RLC case will be the subject of future investigations.

IV. NUMERICAL RESULTS

The proposed TurboMOR-RC algorithm has been implemented in MATLAB, with direct calls to compiled LAPACK libraries for a few key operations, namely the QR decomposition of (20), and the computation of the products with the Householder matrices $Q^{(j)}$. In this section, we compare the performance of TurboMOR-RC against PRIMA 8 and SparseRC 3.
Computations were performed on a 3.4 GHz Intel i7 CPU, with 6 GB of memory and MATLAB R3b.

A. Reduction Time

Table I shows the time needed by the different methods to reduce various test networks. Example 1 is an on-chip bus consisting of 8 signal lines. The bus was modelled with lumped RC segments, and has the characteristics of a global interconnect in the 65 nm technology node 35. Examples 2-6 are power grid benchmarks obtained from 36. The original benchmarks include some inductors, which were neglected. A variable number of input current sources has been considered to investigate the scalability of the MOR methods with respect to port count. We first compare the proposed method without partitioning against PRIMA, in order to assess its intrinsic efficiency in matching moments. For each test case, reduced order models have been generated to match 2, 4, and 6 moments. From the results in Table I, we observe that TurboMOR-RC is consistently faster than PRIMA, up to 9.3 times. Savings are particularly high when order and port count are high, as in example 6. While PRIMA takes hours and 43 minutes (987 s) to match 6 moments at DC, TurboMOR-RC achieves the same result in only 7.5 minutes (5 s). This speedup is due to the fact that TurboMOR-RC achieves moment matching without computing and storing a large projection matrix as PRIMA does. Then, we compare TurboMOR-RC with partitioning against the recently proposed SparseRC method 3. The approximate size of the partitions produced by nesdis was established by inspection of the network order and number of ports,

based on the guidelines in 3. This approach was adopted since the determination of an optimal criterion to estimate the number of partitions in MOR is still an open problem 3. Two partitions were found sufficient for examples 1 to 6, while eleven were necessary for the PLL example. From Table I, we observe that partitioning improves reduction time substantially, especially for large networks (examples 3, 4, 5 and 6). Comparing the proposed method and SparseRC, we see that for two moments matched ($q = 1$), both methods have almost the same reduction time. This is expected since, in this case, the methods perform the same operations. However, when additional moments are matched ($q = 2$ and $q = 3$), the proposed method is always faster than SparseRC, which employs a standard Krylov-based moment-matching projection to capture extra moments. While this gives the flexibility to add moments at other expansion points, it reduces efficiency. With the Householder transformations proposed in Sec. III-A, accuracy can be increased at a lower computational cost.

TABLE I
REDUCTION TIME AND REDUCED MODEL SIZE FOR THE DIFFERENT METHODS ON VARIOUS TEST NETWORKS. VARIABLES p AND m DENOTE THE NUMBER OF PORTS AND ORDER OF THE ORIGINAL SYSTEM, RESPECTIVELY. ALL TIMES ARE IN SECONDS.
Columns: Examples | q | PRIMA (reduction time, model size) | SparseRC (reduction time, model size w.r.t. PRIMA) | Proposed (reduction time, model size w.r.t. PRIMA) | Proposed w/ partitioning (reduction time, model size w.r.t. PRIMA)
Rows: on-chip bus; ibmpgt (RC) power grids; PLL 3.

B. Accuracy of the Reduced Models

In this section we confirm that, from an accuracy standpoint, TurboMOR-RC is equivalent to PRIMA, and that matching only two moments is not always sufficient to achieve satisfactory accuracy.
A transient or AC simulation was performed for all test cases, and the results are summarized in Table II. The on-chip bus (example 1) was loaded with fF capacitors and excited with a V peak-to-peak pulse train, with clock frequency of 85 MHz and. ns of rise and fall time. All power grids (examples 2-6) were run in the original testbench 3, where they are connected to.8 V power supplies and switching currents derived from actual chips. For the PLL benchmark (example 7), an AC simulation was performed as in 3. It should be noted that in this case, due to the network size, we could not simulate the original network.

Fig. 3. Transient response of the original system and the reduced models of the ibmpgt power grid of Sec. IV-B, obtained with TurboMOR-RC and PRIMA. The reduced models match two moments ($q = 1$). Axes: voltage [V] versus time [ns].

All simulations were performed with a circuit simulator written in MATLAB, using compiled libraries for the key numerical calculations. Since the simulator can directly load the matrices of the reduced model, and stamp them into the nodal equations

of the whole circuit, we did not realize the reduced model as an RC network with RLCSYN 9, although this is possible. This choice has no influence on the accuracy and run time of the reduced models, since the nodal equations of the final circuit remain the same whether RLCSYN is used or not. We first consider the power grid ibmpgt from 3, which corresponds to example 3 in the tables. Fig. 3 shows the response obtained with the original system and the reduced models from TurboMOR-RC and PRIMA, for the case of two moments matched ($q = 1$). In Fig. 4, the maximum error for the two reduced models is depicted.

TABLE II
SIMULATION TIME FOR THE REDUCED MODELS OBTAINED WITH THE DIFFERENT METHODS. FOR EXAMPLES 1-6, A TRANSIENT ANALYSIS WAS PERFORMED. FOR EXAMPLE 7, AN AC ANALYSIS WAS PERFORMED. IN THE LATTER CASE, WE REPORT THE SIMULATION TIME PER FREQUENCY POINT. ALL TIMES IN SECONDS. VARIABLES p AND m DENOTE THE NUMBER OF PORTS AND ORDER OF THE ORIGINAL SYSTEM, RESPECTIVELY.
Columns: Examples | Sim. time of the original network | q | PRIMA (sim. time, max. abs. error in mV) | SparseRC (sim. time, max. abs. error in mV) | Proposed (sim. time, max. abs. error in mV) | Proposed w/ partitioning (sim. time, max. abs. error in mV)
Rows: on-chip bus; ibmpgt power grids; PLL 3.

Fig. 4. Error between the response of the original system and the response of the reduced models of the ibmpgt power grid of Sec. IV-B, computed with PRIMA and the proposed method. The reduced models match two moments ($q = 1$). Axes: maximum absolute error [V] versus time [ns].

Fig. 5. As in Fig. 3, but for four moments matched ($q = 2$). Axes: voltage [V] versus time [ns].
The figures show that a reduced model with only two moments matched is not suitable for an accurate assessment of the voltage drop across the power grid. Indeed, the reduced models underestimate the voltage drop, by as much as 5 mV. In Fig. 5, we show the transient results obtained with and TurboMOR-RC models that match four moments (q = 2). Now, both models lead to a very accurate prediction of the original system response. The worst-case transient error is indeed below mV, as shown by Fig. 6.

Fig. 6. As in Fig. 4, but for four moments matched (q = 2). (Axes: maximum absolute error in V versus time in ns.)

This example shows that matching only two moments as in SIP is not accurate enough for some applications. The simulation errors provided in Table II for the other examples further support this conclusion. In examples and, the maximum absolute error decreases below mV only when six moments are matched (q = 3). For example, the mV error threshold corresponds to a relative error of %. From Figures 3 to 6, we see that and TurboMOR-RC provide results of similar accuracy, as is also evident from the errors reported in Table II. This outcome is expected since, as demonstrated in the Appendix, both methods match the same number of moments for a given model size, thus providing the same level of accuracy.

C. Efficiency of the Reduced Models

We now evaluate the efficiency of the reduced models generated by the proposed method,, and SparseRC. In Table II, the simulation time for the original network and the various reduced models is reported. Without partitioning, TurboMOR-RC produces reduced models that are consistently faster than models. This is attributed to the block-diagonal structure of the reduced models, which reduces the cost of the LU factorizations used to perform subsequent transient simulations. TurboMOR-RC models are faster by up to five times. Comparing now the simulation times for the methods with partitioning (SparseRC and TurboMOR-RC with partitioning), we observe that when two moments are matched (q = 1), the simulation times are essentially the same, which is expected since both methods adopt the same reduction strategy. However, when additional moments are matched, TurboMOR-RC delivers models that are always faster than those from SparseRC, because of their higher sparsity. SparseRC uses a standard Krylov-based method to match additional moments, which introduces some dense blocks in the reduced model.
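To give a feel for why a block-diagonal structure lowers the factorization cost mentioned above, the following sketch compares leading-order LU flop counts for one dense matrix and for a set of independent dense diagonal blocks of the same total size. It is an idealized cost model under stated assumptions (fully decoupled dense blocks, the textbook 2n^3/3 estimate), not a measurement of the reduced models in Table II; the function names and the sizes used are hypothetical.

```python
def dense_lu_flops(n):
    """Textbook leading-order cost, about (2/3) n^3 flops, of an LU
    factorization of a dense n x n matrix."""
    return 2.0 * n**3 / 3.0


def block_diagonal_lu_flops(block_sizes):
    """Cost of factorizing each dense diagonal block on its own.
    Assumption: the blocks are not coupled to each other."""
    return sum(dense_lu_flops(b) for b in block_sizes)


# Hypothetical sizes: one 600 x 600 dense model versus three 200 x 200 blocks.
n = 600
blocks = [200, 200, 200]
ratio = dense_lu_flops(n) / block_diagonal_lu_flops(blocks)
print(f"dense:          {dense_lu_flops(n):.3e} flops")
print(f"block-diagonal: {block_diagonal_lu_flops(blocks):.3e} flops")
print(f"ratio:          {ratio:.1f}")
```

With k equal blocks the ratio in this model grows as k^2, which is only meant to indicate the trend; coupling blocks, sparsity inside the blocks and the simulator's ordering strategy all change the actual figures.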
For the largest network that we analysed (PLL, example 7), the proposed models reduce the simulation time for q = 3 from 3.69 s to 5. s per frequency point. It is interesting to note that, in the first example, which consists of a bus of transmission lines with capacitive couplings, the proposed method without partitioning outperforms all other methods, including those with partitioning. This network, commonly found in integrated circuits, cannot be partitioned as effectively as the other circuits. Many separator nodes have to be introduced, resulting in a larger reduced model. The proposed method without partitioning gives the best simulation times, with an improvement of 3-4x with respect to SparseRC and .5x with respect to the proposed method with partitioning. This example demonstrates the usefulness of the proposed method, which improves MOR scalability for networks that are easy to partition (in combination with partitioning), as well as for those networks that cannot be partitioned efficiently.

D. Comparison against SparseRC with frequency shifting

SparseRC uses a standard Krylov-based method to match extra moments in addition to the two at s = 0 matched with a SIP-like iteration. This approach gives the flexibility of matching moments at an arbitrary point in the complex plane, using the so-called frequency shifting. We compared SparseRC with frequency shifting against the proposed method for example in Table I. The comparison was performed for 4 and 6 moments, and for different expansion points, as described in Table III. The expansion points were chosen with respect to the first cut-off frequency of the RC network, which is at $\omega_c$ = 8 rad/s. Expansion points were located at the cut-off frequency, one decade before, and one decade after. The numerical results in Table III show that frequency shifting can lead to an increase in accuracy, depending on the choice of the expansion point.
However, this comes at the expense of both a higher reduction time and a higher simulation time. The increase in reduction time is due to the Cholesky factorization of $G + s_0 C$ needed at each new expansion point $s_0$. In contrast, the proposed method provides a way to increase accuracy without additional Cholesky factorizations. Since $G + s_0 C$ is less sparse than $G$, the additional factorizations required by frequency shifting cost more than the factorization of $G$, especially when many capacitive couplings are present. The lower simulation time provided by the proposed method is due to the higher sparsity of the obtained reduced models.

E. Scalability

Finally, we investigate the scalability of TurboMOR-RC and existing methods with respect to network order and number of ports. Tests are performed on the first example (on-chip bus) for the case of six moments matched.

1) Varying Number of Ports, Constant Node-to-Port Ratio: In the first test, we vary the number of signal lines and, consequently, ports. Since the bus length is kept constant, the network order increases linearly with the number of ports. The node-to-port ratio remains constant at 5.5. Fig. 7 depicts the reduction time for TurboMOR-RC (without partitioning) and versus the number of ports. We observe that TurboMOR-RC scales better than, and savings grow as port count increases. In Fig. 8, the analysis is repeated for the proposed method with partitioning and SparseRC. Also in this case, TurboMOR-RC scales better than existing methods.

TABLE III. Comparison between the proposed TurboMOR-RC method and SparseRC with frequency shifting (see Sec. IV-D): number of moments matched, expansion points location, reduced model size, reduction time, simulation time, maximum simulation error. Columns: method; moments matched; expansion points (rad/s); model size; reduction time (s); simulation time (s); max. abs. error (mV). Rows compare the proposed method and SparseRC for 4 and 6 moments matched. (Entries not reproduced.)

Fig. 7. On-chip bus of Sec. IV-E: reduction time for and TurboMOR-RC without partitioning vs number of ports. Both methods match six moments (q = 3).

Fig. 8. On-chip bus of Sec. IV-E: reduction time for SparseRC and TurboMOR-RC with partitioning vs number of ports. Both methods match six moments (q = 3).

Fig. 9. On-chip bus of Sec. IV-E: reduction time for and proposed method without partitioning, as a function of the ratio of network order and number of ports. Both methods match six moments (q = 3).

Fig. 10. As in Fig. 9, but for SparseRC and proposed method with partitioning. Both methods match six moments (q = 3).

2) Varying Node-to-Port Ratio, Constant Number of Ports: In the second test, we keep the number of ports constant at 4, which corresponds to 5 lines. We increase the number of nodes and, consequently, the order by making the bus longer. Fig. 9 shows the reduction time for the two methods without partitioning (proposed and ) as a function of the number of nodes. Beyond a certain point, the reduction time for increases dramatically, because the projection matrix becomes larger than the 6 GB of memory available on the machine. The reduction then resorts to slow swap memory, and becomes very inefficient. With TurboMOR-RC, large projection matrices are avoided. The matrices used to perform the congruence transformations are either sparse (Cholesky factor $K$) or stored in efficient factored form (Householder reflectors in $Q^{(j)}$). This results in lower memory consumption, and allows TurboMOR-RC to achieve high scalability even for very large port counts. In Fig. 10, the analysis is repeated for TurboMOR-RC with partitioning and SparseRC. The figure confirms the efficiency of the proposed models, which are faster than those generated by SparseRC, especially for large systems with many ports.

V. CONCLUSION

We introduced TurboMOR-RC, a new model order reduction method for large RC networks with many ports. TurboMOR-RC achieves moment matching at DC via efficient

Householder transformations, sparse matrix factorizations, and graph partitioning techniques. Unlike popular methods such as, no large and dense projection matrices need to be computed or stored. A key novelty of the proposed method is the ability to cheaply match an arbitrary number of moments at DC. This feature provides full control over accuracy, at a lower computational cost compared to existing techniques. The ability to match an arbitrary number of moments is particularly useful for networks that cannot be partitioned well, for which partitioning-based methods become inefficient. TurboMOR-RC generates block-sparse reduced models by construction, which run efficiently when imported in a circuit simulator. The block structure of the reduced models also enables a nice interpretation of moment matching in terms of system theory. TurboMOR-RC models are passive by construction, and can be cast into an equivalent RC circuit, for seamless integration into electronic design automation tools. Numerical results demonstrate the superior performance of TurboMOR-RC in reducing large passive networks with many ports, which arise more and more frequently in practice.

APPENDIX A
PROOF OF MOMENT MATCHING

We prove that the reduced model (8), obtained after q iterations of the proposed method, matches 2q moments. We assume $G$ invertible, since otherwise moments (4) are not defined. If $G$ is singular, the proposed method will still work, but one cannot speak of moment matching. The starting point of the proof is realization (a)-(b), which is obtained from the original system (a)-(b) by means of congruence transformation (). Since () is invertible by construction, the transformation does not change the transfer function nor the system moments.
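The invariance of the transfer function under an invertible congruence transformation can be checked numerically on a toy case. The sketch below is an illustration under assumptions, not the authors' implementation: it uses a hypothetical two-node network with transfer function H(s) = B^T (G + sC)^{-1} B, an arbitrary invertible matrix T, and exact rational arithmetic so that the comparison is free of round-off. All matrix values and helper names are made up for the example.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inv2(A):
    """Exact inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def transfer(G, C, B, s):
    """H(s) = B^T (G + sC)^{-1} B for a two-node network."""
    A = [[G[i][j] + s * C[i][j] for j in range(2)] for i in range(2)]
    return matmul(matmul(transpose(B), inv2(A)), B)


def congruence(M, T):
    """Congruence transformation T^T M T."""
    return matmul(matmul(transpose(T), M), T)


# Hypothetical conductance, capacitance and input matrices (one port).
G = [[F(3), F(-1)], [F(-1), F(2)]]
C = [[F(2), F(-1)], [F(-1), F(4)]]
B = [[F(1)], [F(0)]]
T = [[F(1), F(2)], [F(0), F(1)]]  # any invertible matrix works

s = F(1, 3)  # arbitrary evaluation point
H_original = transfer(G, C, B, s)
H_transformed = transfer(congruence(G, T), congruence(C, T),
                         matmul(transpose(T), B), s)
print(H_original, H_transformed)
```

Because H(s) is unchanged for every s, all coefficients of its expansion around s = 0, that is, the moments, are unchanged as well.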
The key argument of the proposed proof is the derivation of the relation between the moments $M_k$ of the original system (a)-(b) and the moments of the inner subsystem (7) extracted by TurboMOR-RC after one iteration. The transfer function of the original system (a)-(b) can be written as, 3

$$H(s) = B^T \left[ G^{(1)} + sC^{(1)} - s^2 H_1(s) \right]^{-1} B \tag{34}$$

where

$$H_1(s) = \left( C_{21}^{(1)} \right)^T \left( G_{22}^{(1)} + sC_{22}^{(1)} \right)^{-1} C_{21}^{(1)} \tag{35}$$

is the transfer function of the inner subsystem $\Sigma^{(1)}$. Here $C_{21}^{(1)}$ is the capacitive coupling block between the outer and inner subsystems, and $G_{22}^{(1)}$, $C_{22}^{(1)}$ are the conductance and capacitance blocks of the inner subsystem. The moments of this subsystem are denoted with $N_l$, so we have

$$H_1(s) = \sum_{l=0}^{+\infty} N_l s^l \tag{36}$$

After substituting (4) and (36) into (34), we obtain

$$\sum_{k=0}^{+\infty} M_k s^k = B^T \left[ G^{(1)} + sC^{(1)} - \sum_{l=0}^{+\infty} N_l s^{l+2} \right]^{-1} B \tag{37}$$

For circuits, matrix $B$ is typically a permutation of the identity matrix, and is thus invertible. We can thus rewrite (37) as

$$\left[ G^{(1)} + sC^{(1)} - \sum_{l=0}^{+\infty} N_l s^{l+2} \right] B^{-T} \sum_{k=0}^{+\infty} M_k s^k = B \tag{38}$$

where superscript $-T$ denotes the inverse of the transpose. After exchanging the two series, we have

$$\sum_{k=0}^{+\infty} \left[ G^{(1)} B^{-T} M_k s^k + C^{(1)} B^{-T} M_k s^{k+1} - \sum_{l=0}^{+\infty} N_l B^{-T} M_k s^{k+l+2} \right] = B \tag{39}$$

Both sides of (39) are power series in $s$ that, in order to be equal, must have the same coefficients. Imposing the equality between the coefficients of $s^0$ we obtain

$$G^{(1)} B^{-T} M_0 = B \quad \Longrightarrow \quad M_0 = B^T \left( G^{(1)} \right)^{-1} B \tag{40}$$

The inverse of $G^{(1)}$ exists since we assumed $G$ non-singular. By equating the coefficients of $s^1$, we have

$$M_1 = -B^T \left( G^{(1)} \right)^{-1} C^{(1)} \left( G^{(1)} \right)^{-1} B \tag{41}$$

Equations (40) and (41) show that the first two moments of the original system just depend on the matrices $B$, $G^{(1)}$ and $C^{(1)}$. Such matrices are preserved in reduced model (8), which thus matches the first two moments of the original system. By equating the coefficients of a generic power $s^r$ in (39) for $r \ge 2$, we obtain the recursive relation

$$M_r = -B^T \left( G^{(1)} \right)^{-1} C^{(1)} B^{-T} M_{r-1} + B^T \left( G^{(1)} \right)^{-1} \sum_{l=0}^{r-2} N_l B^{-T} M_{r-l-2} \tag{42}$$

Equation (42) shows that the moment $M_r$ of order $r$ of the original system (a)-(b) depends on: 1) the matrices $B$, $G^{(1)}$ and $C^{(1)}$ of the outer subsystem (6), which are always preserved in the reduced model (8); 2) the moments $N_l$ of the inner subsystem (7) up to order $r - 2$.
Therefore, if one replaces the nested subsystem (7) with a reduced model that preserves its moments up to order $r - 2$, then the overall model will match the moments of the original system up to order $r$. By iterating this argument, it is straightforward to prove that reduced model (8) matches 2q moments of the original system. The developed relation between the moments of the original system and the moments of its inner subsystem (7) plays a fundamental role in the proposed method. It allows us to match moments recursively, two at a time, by iterative application of the same transformation to subsystems of decreasing size. If $B$ is not full rank, a correlation between some inputs exists, which can be extracted before the reduction, making the reduced model smaller and leading to a full-rank $B$.
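The recursive relation between the moments of the overall system and those of its inner subsystem can be verified on a small worked case. The sketch below is a toy check under assumptions, not the paper's algorithm: the outer and inner subsystems are scalars (hypothetical values g1, c1 for the outer block, g2, c2 for the inner block, c21 for the capacitive coupling, b for the input), so that the matrix recursion reduces to scalar arithmetic. The moments obtained from the recursion are compared with the moments computed directly from the definition M_k = B^T (-G^{-1} C)^k G^{-1} B of the full two-node system, using exact fractions.

```python
from fractions import Fraction as F


def direct_moments(g1, c1, c21, g2, c2, b, count):
    """Moments M_k = B^T (-G^{-1} C)^k G^{-1} B of the full system with
    G = diag(g1, g2), C = [[c1, c21], [c21, c2]], B = [b, 0]^T."""
    x = [b / g1, F(0)]  # x_0 = G^{-1} B
    moments = []
    for _ in range(count):
        moments.append(b * x[0])  # M_k = B^T x_k
        cx = [c1 * x[0] + c21 * x[1], c21 * x[0] + c2 * x[1]]
        x = [-cx[0] / g1, -cx[1] / g2]  # x_{k+1} = -G^{-1} C x_k
    return moments


def inner_moments(c21, g2, c2, count):
    """Moments N_l of the inner transfer function c21^2 / (g2 + s c2)."""
    return [c21**2 / g2 * (-c2 / g2) ** l for l in range(count)]


def recursive_moments(g1, c1, b, N, count):
    """Scalar form of the recursion: M_0 and M_1 depend on the outer block
    only; M_r (r >= 2) also uses N_0 ... N_{r-2}."""
    M = [b * b / g1]
    if count > 1:
        M.append(-c1 * M[0] / g1)
    for r in range(2, count):
        acc = -c1 * M[r - 1] + sum(N[l] * M[r - l - 2] for l in range(r - 1))
        M.append(acc / g1)
    return M


# Hypothetical element values.
g1, c1, c21, g2, c2, b = F(2), F(3), F(1), F(5), F(4), F(1)
count = 6
N = inner_moments(c21, g2, c2, count)
M_direct = direct_moments(g1, c1, c21, g2, c2, b, count)
M_recursive = recursive_moments(g1, c1, b, N, count)

# Keep only N_0 of the inner subsystem: M_0, M_1, M_2 are still exact.
M_truncated = recursive_moments(g1, c1, b, [N[0]] + [F(0)] * (count - 1), count)
print(M_direct[:3], M_truncated[:3])
```

The last computation mirrors the argument of the proof: an inner model that preserves only N_0 still reproduces M_0, M_1 and M_2, while M_3 already differs because it involves N_1.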

The proposed proof is also applicable to the reduced models obtained from other techniques such as SparseRC [23]. The main differences between our proof and the one in [23] are two. First, the proof in [23] considers only the first two moments, while ours is general. Second, [23] proves moment matching for the moments of the network admittance. Our proof is instead based on the original impedance representation of network (). Our contribution therefore establishes the equivalence, from a moment-matching perspective, of fast MOR methods (proposed, SparseRC) and.

REFERENCES

[1] H. H. Chen and J. S. Neely, "Interconnect and circuit modeling techniques for full-chip power noise analysis," IEEE Trans. Adv. Packag., vol., no. 3, pp. 9 5, 1998.
[2] J. M. Silva and L. M. Silveira, "Issues in model reduction of power grids," in Vlsi-Soc: From Systems To Silicon. Springer, 7, pp.
[3] S. R. Nassif, "Power grid analysis benchmarks," in Proceedings of the 8 Asia and South Pacific Design Automation Conference. IEEE Computer Society Press, 8, pp.
[4] B. N. Sheehan, "TICER: Realizable reduction of extracted RC circuits," in Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design. IEEE Press, 1999, pp.
[5] E. J. Grimme, "Krylov projection methods for model reduction," Ph.D. dissertation, University of Illinois at Urbana-Champaign.
[6] M. Celik, L. Pileggi, and A. Odabasioglu, IC interconnect analysis. Springer Science & Business Media.
[7] A. C. Antoulas, Approximation of large-scale dynamical systems. SIAM, 5, vol.
[8] A. Odabasioglu, M. Celik, and L. T. Pileggi, "PRIMA: passive reduced-order interconnect macromodeling algorithm," in Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design. IEEE Computer Society, 1997, pp.
[9] P. Triverio, S. Grivet-Talocia, M. S. Nakhla, F. G. Canavero, and R. Achar, "Stability, causality, and passivity in electrical interconnect models," IEEE Trans. Adv. Packag., vol. 3, no. 4, pp., 7.
[10] J. M. Silva, J. F. Villena, P. Flores, and L. M. Silveira, "Outstanding issues in model order reduction," in Scientific Computing in Electrical Engineering. Springer, 7, pp.
[11] P. Feldmann, "Model order reduction techniques for linear systems with large numbers of terminals," in Proceedings of the conference on Design, automation and test in Europe-Volume. IEEE Computer Society, 4, p.
[12] P. Liu, S. X.-D. Tan, B. Yan, and B. McGaughy, "An efficient terminal and model order reduction algorithm," Integration, the VLSI journal, vol. 4, no., pp. 8, 8.
[13] P. Feldmann and T. Liu, "Sparse and efficient reduced order modeling of linear subcircuits with large number of terminals," in IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 4, pp.
[14] P. Benner and A. Schneider, "Model order and terminal reduction approaches via matrix decomposition and low rank approximation," in Scientific Computing in Electrical Engineering SCEE 8. Springer,, pp.
[15] P. Li and W. Shi, "Model order reduction of linear networks with massive ports via frequency-dependent port packing," in Proceedings of the 43rd annual Design Automation Conference. ACM, 6, pp.
[16] B. Yan, S.-D. Tan, L. Zhou, J. Chen, and R. Shen, "Decentralized and passive model order reduction of linear networks with massive ports," IEEE Trans. VLSI Syst., vol., no. 5, pp.,.
[17] P. Benner, L. Feng, and E. B. Rudnyi, "Using the superposition property for model reduction of linear systems with a large number of inputs," in Proceedings of the 8th International Symposium on Mathematical Theory of Networks & Systems, 8.
[18] Z. Zhang, X. Hu, C.-K. Cheng, and N. Wong, "A block-diagonal structured model reduction scheme for power grid networks," in Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE,, pp.
[19] B. Nouri, M. S. Nakhla, and R. Achar, "Efficient reduced-order macromodels of massively coupled interconnect structures via clustering," IEEE Trans. Compon., Packag., Manuf. Technol., vol. 3, no. 5, pp., 3.
[20] Z. Ye, D. Vasilyev, Z. Zhu, and J. R. Phillips, "Sparse implicit projection (SIP) for reduction of general many-terminal networks," in Proceedings of the 8 IEEE/ACM International Conference on Computer-Aided Design. IEEE Press, 8, pp.
[21] K. J. Kerns and A. T. Yang, "Stable and efficient reduction of large, multiport RC networks by pole analysis via congruence transformations," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 6, no. 7, pp., 1997.
[22] S. Tan and L. He, Advanced model order reduction techniques in VLSI design. Cambridge University Press, 7.
[23] R. Ionuţiu, J. Rommes, and W. H. Schilders, "SparseRC: sparsity preserving model reduction for RC circuits with many terminals," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 3, no., pp.,.
[24] P. Miettinen, M. Honkala, J. Roos, and M. Valtonen, "PartMOR: Partitioning-based realizable model-order reduction method for RLC circuits," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 3, no. 3, pp.,.
[25] Y.-M. Lee, Y. Cao, T.-H. Chen, J. M. Wang, and C. C.-P. Chen, "HiPRIME: hierarchical and passivity preserved interconnect macromodeling engine for RLKC power delivery," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 4, no. 6, pp., 5.
[26] D. Li, S. X.-D. Tan, and L. Wu, "Hierarchical Krylov subspace based reduction of large interconnects," INTEGRATION, the VLSI journal, vol. 4, no., pp. 93, 9.
[27] P. Miettinen, M. Honkala, J. Roos, and M. Valtonen, "Benefits of Partitioning in a Projection-based and Realizable Model-order Reduction Flow," Journal of Electronic Testing, vol. 3, no. 3, pp. 7 85, 4.
[28] G. H. Golub and C. F. Van Loan, Matrix computations. JHU Press,, vol.
[29] F. Yang, X. Zeng, Y. Su, and D. Zhou, "RLCSYN: RLC equivalent circuit synthesis for structure-preserved reduced-order model of interconnect," in International Symposium on Circuits and Systems (ISCAS). IEEE, 7, pp.
[30] C.-W. Ho, A. E. Ruehli, and P. A. Brennan, "The modified nodal approach to network analysis," IEEE Trans. Circuits Syst., vol., no. 6, pp.
[31] M. D. Al-Khaleel, M. J. Gander, and A. E. Ruehli, "Optimization of transmission conditions in waveform relaxation techniques for RC circuits," SIAM Journal on Numerical Analysis, vol. 5, no., pp. 76, 4.
[32] T. Davis, SuiteSparse, html.
[33] Netlib, LAPACK.
[34] A. Zecevic and D. Siljak, "Balanced decompositions of sparse systems for multilevel parallel processing," IEEE Trans. Circuits Syst., vol. 4, no. 3, pp. 33.
[35] Nanoscale Integration and Modeling (NIMO) Group, Predictive technology model.
[36] Z. Li, P. Li, and S. R. Nassif, IBM Power Grid Benchmarks, http: //dropzone.tamu.edu/ pli/pgbench/.

Denis Oyaro (S 5) received the B.Sc. degree in Electronic and Computer Engineering from Politecnico di Torino in, and the M.A.Sc degree in Electrical and Computer Engineering from the University of Toronto in 5. His research interests are in mathematical modelling and simulation algorithms for the computer aided design of electronic and electromagnetic systems.

Piero Triverio (S 6 M 9) received the M.Sc. and Ph.D. degrees in Electronic Engineering from Politecnico di Torino, Italy in 5 and 9, respectively. He is an Assistant Professor with the Department of Electrical and Computer Engineering at the University of Toronto, where he holds the Canada Research Chair in Modeling of Electrical Interconnects. From 9 to, he was a research assistant with the Electromagnetic Compatibility group at Politecnico di Torino, Italy. He has been a visiting researcher at Carleton University in Ottawa, Canada, and at the Massachusetts Institute of Technology in Boston. His research interests include signal integrity, electromagnetic compatibility, and model order reduction. He received several international awards, including the 7 Best Paper Award of the IEEE Transactions on Advanced Packaging, the EuMIC Young Engineer Prize at the 3th European Microwave Week, and the Best Paper Award at the IEEE 7th Topical Meeting on Electrical Performance of Electronic Packaging (EPEP 8).
