Model order reduction for multi-terminal systems: with applications to circuit simulation
Ionuţiu, R.


Published: 01/01/2011. Document version: Publisher's PDF, also known as Version of Record (includes final page, issue and volume numbers).

Citation for published version (APA): Ionuţiu, R. (2011). Model order reduction for multi-terminal systems: with applications to circuit simulation. Eindhoven: Technische Universiteit Eindhoven. DOI: /IR

Model Order Reduction for Multi-terminal Systems with Applications to Circuit Simulation

Copyright © 2011 by Roxana Ionuţiu, Eindhoven, The Netherlands. All rights are reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior permission of the author. This work has been carried out under a dual doctorate agreement between Jacobs University - Bremen, Germany and Technische Universiteit Eindhoven - Eindhoven, The Netherlands, in collaboration with NXP Semiconductors - Eindhoven, The Netherlands. A catalogue record is available from the Eindhoven University of Technology Library. ISBN

Model Order Reduction for Multi-terminal Systems with Applications to Circuit Simulation

PROEFSCHRIFT

ter verkrijging van de graad van doctor aan de Technische Universiteit Eindhoven, op gezag van de rector magnificus, prof.dr.ir. C.J. van Duijn, voor een commissie aangewezen door het College voor Promoties in het openbaar te verdedigen op maandag 26 september 2011 om uur door Roxana Ionuţiu geboren te Oradea, Roemenië

Dit proefschrift is goedgekeurd door de promotoren: prof.dr. W.H.A. Schilders en prof.dr. A.C. Antoulas

Copromotor: dr. J. Rommes

Contents

1 Introduction
    Motivation
    Preliminaries from linear system theory
    External description
    Internal description
    Stability and passivity
    Model order reduction
    A classification
    Model reduction by moment matching
    Exploiting the structure of electrical circuits
    Special methods for multi-terminal systems
    Thesis outline

2 Reduction and synthesis framework for multi-terminal circuits
    Introduction
    Problem formulation
    Reduction setup for multi-terminal systems
    A simple admittance to impedance conversion
    Input-output structure preservation with SPRIM/IOPOR
    Synthesis
    Synthesis by unstamping: RLCSYN
    Foster synthesis of rational transfer functions
    Numerical examples
    Illustrative example
    SISO RLC network
    MIMO RLC network
    Concluding remarks

3 Reduction of multi-terminal R/RC networks
    Introduction
    Reduction setup for R/RC networks
    Reduction of R/RC networks with positive-only resistors
    Properties of G and unstamping
    Unstamping the reduced Ĝ
    Partition-based reduction of RC networks
    Reduction per subnetwork
    Moment matching, passivity, terminal connectivity
    Revealing path resistances
    Computational complexity
    Improving sparsity per subnetwork
    Numerical results
    Low Noise Amplifier (LNA) circuit (CMOS045)
    Mixer circuit
    Interconnect structure
    Very large networks
    Concluding remarks

4 SparseRC: Sparsity preserving model reduction for multi-terminal RC networks
    Introduction
    Problem formulation
    Model reduction
    Multi-terminal RC reduction with moment matching
    Fill-in and its effect in synthesis
    SparseRC reduction via graph partitioning
    Partitioning and the BBD form
    Mathematical formulation
    Moment matching, passivity and synthesis
    SparseRC algorithm
    Numerical results and circuit simulations
    General results
    Advanced comparisons
    Concluding remarks
    Appendix
    Reflection on the partition-based RC reduction alternatives
    Additional experiments

5 Reduction of multi-terminal RLC networks
    Introduction
    Reduction without partitioning
    First-order form
    Second-order form
    A parallel between 1st and 2nd order form
    Partition-based reduction
    Towards solving the DC moment matching problem
    Identifying fill-in
    Moment matching and the dimensions of the reduced blocks
    When partitioning is advantageous
    Numerical results
    Concluding remarks
    Appendix
    An SPRIM partition-based reduction approach

6 Using general reduced order models in simulations
    Introduction
    Problem definition
    Ungrounded vs. grounded systems
    Generalized frequency response/transfer function
    Recovering the reference node from the grounded system
    Implications
    Model reduction
    Transforming a reduced model into synthesis-ready form
    Numerical results and simulations
    Simple RC circuit
    RC transmission line
    Concluding remarks

7 Graph partitioning with separation of terminals
    Introduction
    Formulation of multi-terminal partitioning criterion
    2-way partitioning of multi-terminal networks
    N-way multi-terminal partitioning
    Concluding remarks

8 Conclusions and future research


Chapter 1

Introduction

1.1 Motivation

Over the past decades, developments in microelectronics have followed the path predicted by the American scientist and Intel co-founder Gordon E. Moore, who, already in the early days of the integrated circuit, extrapolated that the number of transistors that can be packed onto a chip of silicon would double approximately every two years. What became known as Moore's Law has since dictated the speed with which the complexity of integrated circuits increases and, with that, the rate at which the price of electronics goes down. From a circuit design perspective, however, more transistors per silicon area means that components are more densely packed together, and that the behaviors of different parts of the chip are no longer independent. The design of current integrated circuits is therefore hampered by parasitic electromagnetic effects that strongly influence the behavior of the device. During the physical verification of circuit designs, it is thus vital to take these parasitic effects into account. This requires simulation of large-scale electrical networks, with numbers of nodes and electronic circuit components (resistors, capacitors, inductors) exceeding hundreds of thousands. Standard circuit simulation tools are often insufficient for this task, as they may be unable to compute solutions to the differential-algebraic systems of very high order underlying these circuits. To provide a solution to this challenging industrial problem, a new methodology is developed in this thesis, which combines techniques from electrical engineering, numerical linear algebra, and graph theory.

Model order reduction (MOR) aims at constructing a model of lower dimension than an original system while well approximating its behavior. For instance, if the original system is an electrical circuit, a reduced circuit is sought which is to replace the original one within the desired simulation setup.
When this is achieved, more time- and memory-efficient simulations can be performed with the available computing resources. Model reduction, however, involves a mathematical procedure through which the physical interpretation of the original system is usually lost. That is, if the original system is, say, an electrical circuit, the reduction returns a smaller mathematical model, rather than an actual circuit. The synthesis problem then arises: that of mapping the reduced mathematical model back into an equivalent electrical circuit. Also, as different parts of a circuit design often need to be analyzed separately, another important question is whether these can be reduced independently and re-assembled in a simple way. This seems to be a natural feature; however, it is not directly satisfied with traditional reduction approaches, because the physical meaning of interconnection points between different parts (also called terminals) is lost. A distinguishing feature of the methods developed in this thesis is their ability to convert reduced models into circuit representations which are easily re-coupled to one another and re-used within the original simulation setup. Yet another critical challenge arises when the original systems are so large that even the process of reduction itself is hampered by limited CPU and memory resources. The emerging question is then: how does one reduce a system that is too big to fit within the available reduction methodology? This scenario is especially relevant when reducing circuits with a very large number of terminal nodes, for which preserving sparsity is a critical requirement. Here, these problems are overcome with the help of graph partitioning and fill-reducing node reorderings. Very large, multi-terminal networks are reduced in a divide-and-conquer manner, while the behavior of the reduced circuit remains close to that of the original.
The reduced circuits thus obtained contain much fewer nodes and elements than otherwise obtained from conventional reduction techniques and, as industrial examples show, allow up to 50 times faster simulations at little loss of accuracy. In addition, the proposed multi-terminal model reduction methods make circuit simulations possible for designs which could not be handled in their original dimension by SPICE-like tools. While the applications in this thesis come from the electronics industry, many of the problems addressed here are fundamental, and the solutions proposed could resolve similar issues arising in other disciplines. Whether they are models of electrical circuits, mechanical systems, neuronal networks in the brain, or links between web-pages, one can always benefit from numerical algorithms that can approximate their functionality efficiently, either by parts or in whole, enabling a simpler and faster understanding of their behavior. The following sections review some general facts and established results concerning linear dynamical systems and their reduction. The material is based mostly on the references [72], [4], [78], with some adaptations pertaining to the context of this thesis. Then, an overview of the remaining chapters is provided.

1.2 Preliminaries from linear system theory

Dynamical systems arise in various disciplines: chemical processing, biomedical engineering, acoustics or circuit design, and can describe different physical processes. They share, however, some common features. To quote [72], a system can be viewed as a process in which input signals are transformed by the system or cause the system to respond in a certain way, resulting in other signals as outputs. For instance, an electrical circuit is a system which produces some voltages or currents in response to applied currents or voltages. Fig. 1.1 gives a simple representation of a linear system with driving inputs u(t) and observed outputs y(t):

$y(t) = (h * u)(t) = \int_{-\infty}^{t} h(t-\tau)\,u(\tau)\,d\tau. \qquad (1.1)$

[Figure 1.1: System with input u and output y. The output is the convolution of the input with the impulse response h(t).]

The word dynamic refers to the fact that the system has memory, i.e., the system's future behavior depends on its past evolution. An electrical network with capacitors or inductors is a dynamical system, where the memory has a physical interpretation related to the storage of energy; for example, the capacitor stores energy by accumulating electrical charge. The voltage across a capacitor is thus an integral of the current, namely:

$y(t) = \frac{1}{C}\int_{-\infty}^{t} u(\tau)\,d\tau. \qquad (1.2)$

Whichever the underlying application, the systems which fit in the reduction framework of this thesis are linear, dynamic, continuous, time-invariant¹, and share a common mathematical description, presented next.

1.2.1 External description

The external description characterizes a system by means of the input variables u and the outputs y, related through the convolution operation (1.1), where h(t) is the system's impulse response: the response of the system to the Dirac delta input, denoted here informally as the unit impulse function

$\delta(t) = \begin{cases} 1, & t = 0 \\ 0, & t \neq 0 \end{cases}$².

¹ Continuous refers to the fact that continuous-time input signals result in continuous-time signals at the outputs, while time-invariance refers to the system's behavior being fixed over time (for an electrical circuit, this holds if the resistances/capacitors/inductors are constant over time).
² See [21, 83] for formal definitions of the Dirac delta distribution.

1.2.2 Internal description

In addition to the inputs and outputs, one can use the internal variables to characterize a dynamical system. The internal variable is by definition the least amount of information required at time $t = t_0$ so that, together with the excitation for $t > t_0$, one can compute the future behavior of the system. This input/internal-variable/output representation is called the internal description of the system and is governed by a set of differential-algebraic equations:

$\Sigma: \begin{cases} E\dot{x}(t) = Ax(t) + Bu(t) \\ y(t) = Cx(t) + Du(t), \end{cases} \qquad (1.3)$

where $u \in \mathbb{R}^p$ is the input of the system, $x \in \mathbb{R}^n$ are the internal variables, $y \in \mathbb{R}^m$ is the output (the variables observed), and $E \in \mathbb{R}^{n \times n}$, $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{m \times n}$, $D \in \mathbb{R}^{m \times p}$ are the system matrices. The first equation of (1.3) is the state equation, which describes the system's dynamics, and the second is the output equation, which describes the observation. The number of internal variables n is the dimension of Σ. If m, p > 1, the system is called multiple-input multiple-output (MIMO), and if m = p = 1 it is called single-input single-output (SISO). In the most general case, which also covers the scenario of a singular E, the system Σ is a descriptor system [68, 76], while for the particular case of E invertible, it is a state space system, and the internal variables x are called states [4]. Systems describing electrical circuits often have a special structure (see Sect. ), in particular $C = B^T$ and D = 0.

It turns out that the external and internal descriptions are intimately related. In particular, using the Laplace transform [72], (1.3) is expressed in the frequency domain, where the inputs, outputs, and internal variables as a function of frequency s are U(s), Y(s), and X(s) respectively. More precisely, the Laplace transform of x(t) is $X(s) = (sE - A)^{-1}Ex(0^-) + (sE - A)^{-1}BU(s)$, and it is assumed that the initial condition is $x(0^-) = 0$ [4, Chapter 4].
Eliminating now the internal variables X(s) from the first equation of (1.3) and replacing them in the second, one obtains the system's transfer function:

Definition. The transfer function of the dynamical system (1.3) is defined as:

$H(s) = C(sE - A)^{-1}B + D \in \mathbb{C}^{m \times p}. \qquad (1.4)$

H(s) describes the input/output behavior of the system in the frequency domain, and is precisely the Laplace transform of the impulse response h(t), introduced previously in the external description. For the transfer function to be well defined, the pencil (A, E) must be regular³ [a matrix pair (A, E) is regular if there exists at least one $\lambda \in \mathbb{C}$ such that $\det(A - \lambda E) \neq 0$]. Most of the existing model reduction methods rest on this assumption; however, the methods developed in chapters 3, 4, 5, and 6 also apply to a particular type of singular pencils, mainly those describing un-grounded electrical networks.

The poles of a system lie at the heart of its behavior; in particular, they determine its stability (see Sect. ). They are defined as follows:

Definition. The poles of the transfer function H(s) are the points $\lambda \in \mathbb{C}$ for which $\lim_{s \to \lambda} \|H(s)\| = \infty$, i.e., the generalized eigenvalues of the pair (A, E).

1.2.3 Stability and passivity

Original systems Σ describing electrical circuits are stable and passive; it is hence desirable for the reduced Σ̃ to be stable and passive as well. These are among the most important system properties that should ideally be preserved by any reduction method.

Stability. In the external description, the system Σ characterized by the convolution (1.1) is bounded-input, bounded-output stable if any bounded input u(t) results in a bounded output y(t). This has an equivalent interpretation via the internal description, namely that a system is stable if and only if all poles lie in the closed left half of the complex plane (all poles have non-positive real parts and all purely imaginary eigenvalues have multiplicity one) [4, Theorem 5.10].

Passivity. Passive systems are those which do not generate energy. According to [17], the system (1.3) is passive if there exists a non-negative-valued function $\Theta: \mathbb{R}^n \to \mathbb{R}_+$ such that:

$\Theta(x(t_1)) - \Theta(x(t_0)) \leq \int_{t_0}^{t_1} u^T(\tau)\,y(\tau)\,d\tau, \qquad (1.5)$

for $t_0, t_1 \in \mathbb{R}$, $t_1 \geq t_0$, and all trajectories (u, x, y) which satisfy the system equations (1.3). If it exists, Θ is called a storage function. The interpretation of equation (1.5) is that the change in internal storage $\Theta(x(t_1)) - \Theta(x(t_0))$ can never exceed what is supplied to the system [4, Chapter 5.9].

³ For electrical networks, the pencil is regular if the network is made consistent by grounding one of the nodes.
Electrical networks with passive components (resistors, capacitors, inductors and ideal transformers) are passive systems; they absorb supply (energy) in the form of electrical power, which is the sum of the products of the voltages and currents at the external ports: $u^T(t)\,y(t) = \sum_k V_k I_k$.

⁴ For $A \in \mathbb{C}^{n \times n}$, $E \in \mathbb{C}^{n \times n}$, λ is a generalized eigenvalue of (A, E) if there exists $x \in \mathbb{C}^n$, $x \neq 0$, such that $Ax = \lambda Ex$.
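The stability criterion above can be checked numerically (a sketch with made-up matrices, not an example from the thesis): for a grounded RC network in the form (1.3), the poles are the generalized eigenvalues of (A, E), and with positive definite conductance and capacitance matrices they are real and negative.

```python
import numpy as np
from scipy.linalg import eigvals

# Sketch with illustrative matrices: for a grounded RC network, MNA gives
# Cap * v'(t) = -G * v(t) + B * u(t), i.e. E = Cap and A = -G in (1.3).
G = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])      # conductance matrix (symmetric positive definite)
Cap = np.array([[1e-3, 0.0 ],
                [0.0,  2e-3]])    # capacitance matrix (symmetric positive definite)

E, A = Cap, -G
lam = eigvals(A, E)               # poles = generalized eigenvalues of (A, E)

# Stability: all poles in the closed left half of the complex plane
stable = bool(np.all(lam.real <= 1e-12))
```

Here the poles are the eigenvalues of the pencil $(-G, \text{Cap})$; positive definiteness of both matrices forces them onto the negative real axis, consistent with the passivity-implies-stability remark below.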

An established result which characterizes passivity in a unifying manner is that a descriptor system Σ with m = p is passive if and only if its transfer function H(s) is positive real [3, 76]. By definition [17]:

Definition. A rational function $H(s) \in \mathbb{C}^{m \times m}$ is positive real if and only if it satisfies both of the following conditions:
1. H(s) is analytic in $\mathbb{C}_+$,
2. $H(s) + H^H(s) \geq 0$ for all $s \in \mathbb{C}_+$⁵.

[4, Theorem 5.22] gives a complete characterization of positive realness in terms of the external representation of Σ. By ensuring that the positive realness condition remains satisfied after reduction, several passivity preserving model reduction methods have been developed. Among those which make no assumptions about the structure of the underlying system matrices are [77], based on balancing, and [44], based on moment matching. Other model reduction methods [27, 71, 89] exploit the special structure and matrix properties of descriptor systems describing electrical circuits to preserve passivity. In short, starting from an original system in a passive form [57], the reduced models retain the passive form. To this type of methods belong also the ones developed in this thesis. Passive systems are also stable [4, Theorem 5.22], [57]; hence all reduction methods which preserve passivity are implicitly stability preserving.

1.3 Model order reduction

At the heart of model reduction lies the desire to approximate the behavior of a large dynamical system in an efficient manner, so that the resulting approximation error is small. Other requirements are: the preservation of important system properties and of the system's physical interpretation, and an efficient implementation. In short, starting from an n-dimensional system Σ, a reduced k-dimensional system Σ̃ is sought:

$\tilde{\Sigma}: \begin{cases} \tilde{E}\dot{\tilde{x}}(t) = \tilde{A}\tilde{x}(t) + \tilde{B}u(t) \\ \tilde{y}(t) = \tilde{C}\tilde{x}(t) + \tilde{D}u(t), \end{cases} \qquad (1.6)$

where $k \ll n$, so that the output approximation error $\|y(t) - \tilde{y}(t)\|$ (in an appropriate norm) is small.
Through reduction, the number of inputs and outputs stays the same, but the internal variables and the system matrices are reduced in dimension: $\tilde{x} \in \mathbb{R}^k$, $\tilde{E} \in \mathbb{R}^{k \times k}$, $\tilde{A} \in \mathbb{R}^{k \times k}$, $\tilde{B} \in \mathbb{R}^{k \times p}$, $\tilde{C} \in \mathbb{R}^{m \times k}$. The inputs of the reduced system are the same as for the original. The transfer function of Σ̃ is:

$\tilde{H}(s) = \tilde{C}(s\tilde{E} - \tilde{A})^{-1}\tilde{B} + \tilde{D} \in \mathbb{C}^{m \times p}. \qquad (1.7)$

The unifying approach for obtaining a reduced model from an original system is via a Petrov-Galerkin⁶ projection [6]:

$\tilde{\Sigma}(\tilde{E}, \tilde{A}, \tilde{B}, \tilde{C}, \tilde{D}) = (W^T E V,\ W^T A V,\ W^T B,\ C V,\ D), \qquad (1.8)$

where $V, W \in \mathbb{R}^{n \times k}$ are matrices whose $k \ll n$ columns form bases for the relevant subspaces pertaining to the reduction method chosen. In this projection framework it is common to set $\tilde{D} = D$, but other scenarios are possible, as described in [6].

⁵ For $s = \sigma + j\omega$ and H(s) defined as in (1.4), the Hermitian complex conjugate reads: $(H(s))^H = B^H((\sigma - j\omega)E^H - A^H)^{-1}C^H + D^H$.

1.3.1 A classification

The governing principle behind all reduction methods is that, after a suitable decomposition is found, the non-dominant⁷ internal variables are eliminated from the system. Reduction methods differ in the way the decomposition is performed. This in turn dictates how the projection matrices V and W are constructed. Roughly speaking, reduction methods are classified into (a) spectral- and (b) Krylov-based methods. Among the volumes which give a comprehensive coverage of the various methods are [4, 12, 82]. A comparison of spectral and Krylov-based methods is available in [39]. By no means attempting to give an exhaustive literature review, a few well-known methods are mentioned here, with a stronger emphasis on Krylov-based methods. The latter are the foundation for the multi-terminal reduction methodology in this thesis.

Spectral methods. Among the spectral methods, one distinguishes between those based on the SVD (singular value decomposition) and those based on the EVD (eigenvalue decomposition) of relevant system quantities. From the SVD methods, balanced truncation [11] and positive real balanced truncation [74, 77] use SVD-based projections to reduce a balanced representation of the system, i.e. one where the observability and controllability Gramians are equal and diagonal [4, Chapter 7].
Compared to balancing, which uses both the observability and controllability Gramians as the basis for decomposition, Poor man's TBR [75] decomposes only one of the system Gramians. Proper orthogonal decomposition (POD) [4, Chapter 9.1] constructs the reducing projection based on the SVD of a matrix of time snapshots (samples of the state computed at given time instants). Among the EVD methods, modal approximation [78] constructs a reduced model which interpolates dominant poles of the original system, based on the generalized eigenvalue decomposition of the pair (A, E).

⁶ When W = V the projection is of Galerkin type.
⁷ Non-dominant here refers to internal variables which contribute the least to the system's response.
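The POD construction just mentioned can be sketched in a few lines (an illustration on synthetic snapshot data, not the thesis's code): the SVD of the snapshot matrix yields the dominant modes, and the leading left singular vectors form the Galerkin basis V.

```python
import numpy as np

# POD sketch: collect state snapshots in X, compute its SVD, and keep the
# k dominant left singular vectors as the projection basis V (here W = V).
rng = np.random.default_rng(0)
n, k, n_snap = 50, 5, 200

# Synthetic snapshots living (almost) in a k-dimensional subspace
modes = rng.standard_normal((n, k))
X = modes @ rng.standard_normal((k, n_snap)) + 1e-8 * rng.standard_normal((n, n_snap))

U, s, _ = np.linalg.svd(X, full_matrices=False)
V = U[:, :k]                                    # dominant POD modes

# Galerkin projection of an illustrative stable system E x' = A x + B u
E = np.eye(n)
A = -np.eye(n) - modes @ modes.T
B = rng.standard_normal((n, 1))
Er, Ar, Br = V.T @ E @ V, V.T @ A @ V, V.T @ B  # reduced k x k system
```

The sharp drop in the singular values after the first k reveals the dimension of the subspace in which the snapshots (approximately) live; truncating there gives the reducing basis.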

Krylov-based methods. Krylov-based reduction methods exploit Krylov subspace iterations to achieve system approximation by moment matching (explained in Sect. ). Among these are PRIMA [71], its structure preserving version SPRIM [27], the second order variants SOAR [9] and SAPOR [66, 89], the spectral zero method (SZM) [5, 44, 45, 86], and optimal H₂ [32]. Next, the concept of reduction by moment matching is explained.

1.3.2 Model reduction by moment matching

The starting point for reduction by moment matching is the desire to approximate a transfer function H(s) by a rational function of lower degree, H̃(s). The questions are then: (a) what are the coefficients of a reduced H̃(s) which accurately approximates H(s) and, once these are identified, (b) how are the projections V, W constructed so that the reduced model Σ̃ (2.1) is characterized precisely by the H̃(s) with the desired coefficients? The answer to (a) is found by constructing H̃(s) to match some terms of the Laurent series expansion of H(s) at various points of the complex plane. In particular, the k-th moment of H(s) at $s_0 \in \mathbb{C}$ is the k-th derivative of H(s) evaluated at $s = s_0$:

$\eta_k(s_0) = (-1)^k \frac{d^k}{ds^k}H(s)\Big|_{s=s_0} \in \mathbb{R}^{m \times p}, \quad k \geq 0, \qquad (1.9)$

so the Laurent expansion of H(s) around $s_0 \in \mathbb{C}$ is:

$H(s) = \eta_0(s_0) + \eta_1(s_0)\frac{(s-s_0)}{1!} + \eta_2(s_0)\frac{(s-s_0)^2}{2!} + \ldots + \eta_k(s_0)\frac{(s-s_0)^k}{k!} + \ldots \qquad (1.10)$

Model reduction by moment matching thus amounts to finding, given the original Σ, a reduced model Σ̃ with transfer function H̃(s), such that H̃(s) has the same moments as H(s) up to a desired number k. More precisely:

$\tilde{H}(s) = \tilde{\eta}_0(s_0) + \tilde{\eta}_1(s_0)\frac{(s-s_0)}{1!} + \tilde{\eta}_2(s_0)\frac{(s-s_0)^2}{2!} + \ldots + \tilde{\eta}_k(s_0)\frac{(s-s_0)^k}{k!} + \ldots, \quad \text{with} \quad \tilde{\eta}_i = \eta_i, \ i = 1 \ldots k. \qquad (1.11)$

From (1.9), the moments can be expressed directly in terms of the system matrices, as derivatives of (1.4) for the original system [and (1.7) for the reduced system].
In particular, introducing the following notation:

$\mathcal{A} = (s_0 E - A)^{-1}E, \quad \mathcal{R} = (s_0 E - A)^{-1}B, \qquad (1.12)$

$\tilde{\mathcal{A}} = (s_0 \tilde{E} - \tilde{A})^{-1}\tilde{E}, \quad \tilde{\mathcal{R}} = (s_0 \tilde{E} - \tilde{A})^{-1}\tilde{B}, \qquad (1.13)$

the moments (1.9) for the original and reduced systems respectively are:

$\eta_0(s_0) = C\mathcal{R} + D, \quad \eta_i(s_0) = [i!(-1)^i]\,C\mathcal{A}^i\mathcal{R}, \ i \geq 1, \quad \text{and} \qquad (1.14)$

$\tilde{\eta}_0(s_0) = \tilde{C}\tilde{\mathcal{R}} + \tilde{D}, \quad \tilde{\eta}_i(s_0) = [i!(-1)^i]\,\tilde{C}\tilde{\mathcal{A}}^i\tilde{\mathcal{R}}, \ i \geq 1. \qquad (1.15)$

The general case of matching moments around arbitrary points $s_0 \in \mathbb{C}$ is called rational interpolation. Two special cases are when the expansion is around s = 0 (Padé approximation) and $s = \infty$ (partial realization, where the moments are the Markov parameters of the system). The Laurent expansion and moments at s = 0 are easily derived by setting $s_0 = 0$ in (1.10)-(1.11) and (1.12)-(1.13) respectively, and requiring $\mathcal{A}$ and $\tilde{\mathcal{A}}$ to be invertible. For matching around $s = \infty$, the Laurent expansion and moments (Markov parameters)⁸ are:

$H(s) = \eta_0(\infty) + \eta_1(\infty)s^{-1} + \eta_2(\infty)s^{-2} + \ldots + \eta_k(\infty)s^{-k} + \ldots, \qquad (1.16)$

$\eta_0(\infty) = D, \quad \eta_i(\infty) = C(E^{-1}A)^{i-1}(E^{-1}B), \ i \geq 1, \qquad (1.17)$

similarly for the reduced transfer function, requiring E, Ẽ to be invertible.

An important result in reduction by moment matching is that reducing projections W and V can be constructed to ensure that a desired number of moments of Σ are matched by Σ̃. Following [4, Sect. 11.3] and [55, Theorem 2.1] (to which we refer for proofs), this result is repeated here, with some adaptations in notation to make the theory in this section uniform:

Theorem. Let $\mathcal{A} \in \mathbb{R}^{n \times n}$, $\mathcal{R} \in \mathbb{R}^{n \times p}$ be the matrices from (1.12), $V \in \mathbb{R}^{n \times k}$, $W \in \mathbb{R}^{n \times k}$, $W^T V = I$, $k < n$, $m \leq n$, $p \leq n$. If:

$\mathrm{span}(\mathcal{A}^i\mathcal{R}) \subseteq \mathrm{span}(V), \ i \in \{0, 1, \ldots, q_1 - 1\}, \quad \text{and} \qquad (1.18)$

$\mathrm{span}\big((\mathcal{A}^T)^i C^T\big) \subseteq \mathrm{span}(W), \ i \in \{0, 1, \ldots, q_2 - 1\}, \qquad (1.19)$

then

$C\mathcal{A}^i\mathcal{R} = \tilde{C}\tilde{\mathcal{A}}^i\tilde{\mathcal{R}}, \quad i \in \{0, 1, \ldots, q_1 + q_2 - 1\}, \qquad (1.20)$

where $\tilde{C} = CV$, $\tilde{\mathcal{A}} = W^T\mathcal{A}V$, $\tilde{\mathcal{R}} = W^T\mathcal{R}$. More precisely, the Σ̃ obtained from the projection (2.1) matches $q_1 + q_2$ moments of the original system Σ at a chosen expansion point $s_0$. In practice, W and V are not formed explicitly, due to the fact that computing the moments is numerically problematic [4].
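The theorem can be verified numerically with a small one-sided sketch (W = V, made-up matrices; the Krylov vectors are formed explicitly here, which is acceptable at toy scale but, as just noted, not numerically robust in general):

```python
import numpy as np
from scipy.linalg import solve, qr

rng = np.random.default_rng(1)
n, q, s0 = 30, 4, 0.0

# Illustrative well-conditioned matrices (not from the thesis)
A = -2.0 * np.eye(n) - 0.1 * rng.standard_normal((n, n))
E = np.eye(n) + 0.1 * np.diag(rng.random(n))
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

M = s0 * E - A
calA = solve(M, E)                      # (s0*E - A)^{-1} E, cf. (1.12)
R = solve(M, B)                         # (s0*E - A)^{-1} B

# V spans [R, calA R, ..., calA^{q-1} R]; orthonormalize so W^T V = I with W = V
K = np.hstack([np.linalg.matrix_power(calA, i) @ R for i in range(q)])
V, _ = qr(K, mode='economic')

Er, Ar, Br, Cr = V.T @ E @ V, V.T @ A @ V, V.T @ B, C @ V

# Compare the quantities C calA^i R of (1.20) for original and reduced models
Mr = s0 * Er - Ar
calAr, Rr = solve(Mr, Er), solve(Mr, Br)
moments = [(C @ np.linalg.matrix_power(calA, i) @ R).item() for i in range(q)]
moments_r = [(Cr @ np.linalg.matrix_power(calAr, i) @ Rr).item() for i in range(q)]
```

With W = V (so $q_2 = 0$), the first $q_1 = 4$ moments at $s_0$ agree to machine precision, exactly as (1.20) predicts.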
Rather, exploiting the analogy between the V which satisfies (1.18) and the Krylov subspace $\mathcal{K}_q = \mathrm{span}\,[\mathcal{R}, \mathcal{A}\mathcal{R}, \ldots, \mathcal{A}^{q_1-1}\mathcal{R}]$ (known as the reachability subspace in system theory, while W is associated with the observability subspace), V and W are constructed iteratively based on the Lanczos [58] or Arnoldi [7] algorithms. [4, Chapter 11.2] gives a detailed analogy between Lanczos/Arnoldi and moment matching, and discusses numerical issues such as the loss of orthogonality or break-down, and the means to overcome them. One of the disadvantages of Lanczos/Arnoldi-based moment matching algorithms is that they do not automatically preserve the stability or the passivity of the original system (some variants do preserve passivity, based on either assumptions on the structure of the original system or special ways to pick the interpolation points). Devising robust implementations of Krylov methods and their use in model reduction has received the attention of numerous works, among which [30], [31], [26], [36].

A related problem is that of matching moments at several different points $s_1, \ldots, s_k$, rather than more moments at one point $s_0$. The solution to this problem is generally known as rational Krylov [4, Section 11.3], [81], and brings in three important additional questions: (a) what is the appropriate selection of points $s_1, \ldots, s_k$ to ensure passivity preservation, (b) how can the associated projection matrices W, V be computed efficiently, and (c) is there a suitable selection of $s_1, \ldots, s_k$ which ensures good approximation and, if so, is there an efficient procedure to compute the corresponding interpolating projection? The answers are provided by the spectral zero interpolation approach, of which a brief overview follows. Antoulas [5] showed that if the expansion points are chosen among the roots of $H(s) + H^T(-s) = 0$ (the so-called spectral zeros), then the reduced Σ̃ which matches one moment at each selected spectral zero is passive.

⁸ The following notation is used: $\eta_k(\infty) = \lim_{s \to \infty} \eta_k(s)$.
At the same time, Sorensen [86] shows the analogy between spectral zeros and the eigenvalues of an associated Hamiltonian pair, demonstrating that the projection matrices W, V which interpolate at the spectral zeros can be computed from the eigenvectors of the Hamiltonian eigenvalue problem. Later, Ionuţiu et al. [44] show that interpolating the spectral zeros which are dominant in an appropriate residue norm ensures approximation quality, and they propose to compute the corresponding W, V with a specialized iterative eigenvalue solver.

1.3.3 Exploiting the structure of electrical circuits

While the above literature overview applies to a general type of descriptor systems, some reduction methods exploit the special properties and structure of systems arising in circuit simulation. The most encountered representation which describes the dynamics of electrical circuits is obtained using modified nodal analysis (MNA) [37]. From Kirchhoff's current and voltage laws and the branch constitutive equations, the MNA representation of an RLC circuit leads to the following descriptor system:

$\underbrace{\begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}}_{E}\frac{d}{dt}\underbrace{\begin{bmatrix} v(t) \\ i_L(t) \end{bmatrix}}_{x} + \underbrace{\begin{bmatrix} G & E \\ -E^T & 0 \end{bmatrix}}_{A}\underbrace{\begin{bmatrix} v(t) \\ i_L(t) \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} B \\ 0 \end{bmatrix}}_{B}u(t), \qquad (1.21)$

where, without loss of generality, it is assumed that the inputs are current sources applied at the port nodes and that the outputs are voltages measured at the ports. An alternative representation with input voltages and output currents is treated in Chapter 2, from which the representation (1.21) can be obtained, as shown therein. The MNA system (1.21) has a special structure. The internal variables x are split into node voltages v and currents through inductors i_L. G is the conductance matrix, C is the capacitance matrix, L is the inductance matrix (a diagonal with the inductor values if there are no mutual inductances; otherwise the mutuals appear as off-diagonal entries). E is the incidence matrix which determines the topological connections for the inductors. B is the incidence matrix of current injections (the i-th column of B is the i-th unit vector for each input i). For demonstrative purposes, two simple RLC examples in MNA form are given in Sect. and Sect. respectively, and an RC example in Sect. . The blocks of the MNA matrices have important numerical properties, which are often exploited by reduction methods, especially to preserve passivity: G, C, L are symmetric nonnegative definite (they have non-negative eigenvalues), and $C = B^T$ (the outputs are measured at the nodes to which the inputs are applied).

One of the most popular reduction methods for electrical circuits is PRIMA [71], which matches moments of the original system at $s_0 = 0$, and is based on a block-Arnoldi implementation. The most important advantages offered by PRIMA are its applicability to MIMO systems and passivity preservation. With PRIMA, passivity is preserved by means of a congruence transformation (i.e. the left and right projection matrices are equal, W = V) applied on the MNA matrices (1.21); the passivity proof is given in [71], and holds for all reduction methods based on congruence transforms applied to systems in the MNA form (1.21).
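To make the assembly of the MNA matrices concrete, here is a small illustrative "stamping" sketch, restricted to the RC case of (1.21); the topology, element values, and the stamp helper are made up for this example, not taken from the thesis.

```python
import numpy as np

# Tiny grounded RC network with two non-ground nodes; node -1 denotes the
# ground/reference node, which is eliminated from the equations.
n = 2
G = np.zeros((n, n))         # conductance matrix
Cap = np.zeros((n, n))       # capacitance matrix

def stamp(M, i, j, val):
    """Stamp a two-terminal element of value `val` between nodes i and j."""
    for a in (i, j):
        if a >= 0:
            M[a, a] += val
    if i >= 0 and j >= 0:
        M[i, j] -= val
        M[j, i] -= val

stamp(G, -1, 0, 1.0 / 1.0)   # R = 1 ohm from ground to node 0
stamp(G, 0, 1, 1.0 / 2.0)    # R = 2 ohm between nodes 0 and 1
stamp(Cap, 1, -1, 1e-3)      # C = 1 mF from node 1 to ground

B = np.array([[1.0], [0.0]]) # unit current injected at node 0 (the terminal)
# Resulting MNA system: Cap * v'(t) + G * v(t) = B * u(t)
```

Each element contributes only to the rows/columns of its own nodes, which is why G and Cap come out symmetric and nonnegative definite, the very properties exploited by the congruence-based passivity proofs above.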
One of the limitations of PRIMA is that the MNA structure of the original Σ is lost during the projection (2.1); as a consequence, finding a netlist representation for the reduced model is not straightforward. Later, the structure-preserving SPRIM [27] and the second-order SAPOR [66, 89] variants were introduced which, through structure preservation, allow a simpler realization of the reduced model as a circuit, while maintaining the desired moment-matching and passivity-preserving properties. In this thesis, Chapter 2 addresses the realization problem from different angles, including that of structure preservation via SPRIM. Chapter 6 brings an additional contribution by demonstrating how even models reduced with non-structure-preserving methods such as PRIMA can be easily realized as netlists and re-used in simulations.

Special methods for multi-terminal systems

Despite the recent advances in model reduction, mostly aimed at improving the accuracy and efficiency of state-of-the-art methods, there remains a class of systems to which the applicability of existing approaches is limited. These are systems with a large number of inputs and outputs, in short, multi-terminal systems, which receive special attention in this thesis. Examples of large multi-terminal systems are electrical networks resulting from parasitic extraction, which can have terminal numbers exceeding thousands. The terminals can be either the user-defined input-output nodes, or interconnection nodes between the linear, parasitic network and other non-linear circuit elements such as diodes or transistors. When reducing a multi-terminal system via a traditional approach, be it spectral- or Krylov-based, the resulting reduced model is usually much denser than the original (even though its dimension is smaller). As explained and demonstrated in this thesis (see especially Chapter 4), re-using a dense reduced model in simulation requires the same, or even more, computational effort than the original system, so the entire reduction effort becomes useless. Aside from the multi-terminal challenge, there is yet another limitation of traditional reduction approaches: forming the reducing projection is often too expensive (computation- and memory-wise) or even unfeasible for systems with a large number of internal variables (e.g., circuits with node numbers exceeding tens of thousands). It is thus vital to develop new methods which can efficiently handle such challenging systems arising in industrial problems.

Recently, the ReduceR [80] method has provided promising results in reducing very large, multi-terminal resistor networks. At the heart of ReduceR are tools from graph theory and fill-reducing node reorderings, which ensure that only those variables are eliminated from the network that do not generate excessive fill-in. Using graph partitioning and a special hierarchical ordering of the system matrices, new, efficient methods for reducing large, multi-terminal RC networks are developed in this thesis (Chapters 3 and 4), with extensions to RLC networks in Chapter 5. The foundation for the multi-terminal framework of these chapters is the congruence-based reduction methodology PACT [56, 57] which, compared to the PRIMA/SPRIM approach, exploits a different splitting of the system variables and matrices to construct related Krylov subspaces (the PACT methods are described in the aforementioned chapters).
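The fill-in trade-off exploited by ReduceR-style methods can be seen on a toy conductance matrix: eliminating an internal node via a Schur complement connects all of its neighbours, so a low-degree node is cheap to eliminate while a hub creates a dense clique. A minimal sketch (topologies and values invented):

```python
import numpy as np

def eliminate_node(G, k):
    """Schur complement: eliminate internal node k from conductance matrix G."""
    keep = [i for i in range(G.shape[0]) if i != k]
    return G[np.ix_(keep, keep)] - np.outer(G[keep, k], G[k, keep]) / G[k, k]

def nnz(G, tol=1e-12):
    return int(np.sum(np.abs(G) > tol))

def laplacian(n, edges):
    G = np.zeros((n, n))
    for i, j in edges:
        G[i, i] += 1.0; G[j, j] += 1.0
        G[i, j] -= 1.0; G[j, i] -= 1.0
    return G + np.eye(n) * 1e-3   # small grounding keeps G nonsingular

# Chain node (degree 2): elimination creates a single bridging resistor.
chain = laplacian(3, [(0, 1), (1, 2)])
# Star hub (degree 5): elimination creates a dense clique on the neighbours.
star = laplacian(6, [(0, i) for i in range(1, 6)])

print(nnz(eliminate_node(chain, 1)))  # 4  (2x2 dense block)
print(nnz(eliminate_node(star, 0)))   # 25 (5x5 clique: full fill-in)
```

This is why fill-reducing orderings pick low-degree nodes first and keep high-degree nodes in the reduced network, which is the strategy referenced above.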
Modeling multi-port systems from measured data

While all of the above methods construct a reduced model starting from an original descriptor system (1.3), a different problem arises when such a representation is not available. Rather, for a multi-port device, a reduced-order model must be constructed directly from frequency response measurements. A recent solution to this problem is provided by the Loewner-based methodology [60-64] (an earlier approach is Vector Fitting [20, 35]). Although the Loewner approach may not directly apply to networks with a very large number of terminals (due to the sparsity considerations described above), [60-64] show that it provides promising results for those with a moderate number of terminals. For such systems, Chapter 6 shows that low-order macromodels obtained with the Loewner framework can also be synthesized and re-inserted in a circuit simulation flow.

1.4 Thesis outline

Throughout this thesis, model reduction and synthesis are closely linked. Therefore, the research is focused both on the approximation quality and efficiency of reduction, and

on whether and how the resulting reduced model can be cast into a circuit and reconnected with other sub-circuits, sources, or non-linear devices in the actual simulation phase. We call this the use (or "re-use") of the reduced model. Hence, almost all experiments in this thesis involve reduction, synthesis, and the simulation of the reduced circuit. Except for the self-created demonstrative examples, all circuits are provided by NXP Semiconductors. Unless otherwise stated, all circuit simulations are performed with Spectre [16].

This thesis is organized as a collection of articles, hence each chapter can be read individually. The material nevertheless builds up from chapter to chapter, and the appropriate connections among the different topics are made throughout the text. An outline of the thesis follows next.

With Chapter 1, the reader should become familiar with the problem background and with basic concepts related to system theory and model order reduction. The tie between reduction and synthesis is made in Chapter 2, which proposes a new framework for dealing with multi-terminal systems. The framework allows the decoupling of all sources or non-linear devices from the linear circuit which must be reduced, and their re-insertion after reduction and synthesis. Although this is intuitive from a physical perspective, an appropriate set-up of the original system equations is necessary if this procedure is also to succeed numerically. The mathematical formulation which ensures this is proposed in Chapter 2. The reduction framework can be thought of as independent of the types of inputs/outputs chosen for the circuit. The synthesis problem is also analyzed, and it is shown that unstamping (also denoted as RLCSYN [93]) is the most suitable synthesis method in the derived multi-terminal framework. The material in this chapter has been published as [40, 41, 43] and has been reorganized here for presentation clarity.
The proposed framework for multi-terminal reduction and synthesis gives the theoretical foundation for the approaches taken in later chapters. The reduction of RC circuits is addressed in Chapters 3 and 4. Chapter 3 shows that the same projection which underlies the reduction of R-networks [80] extends immediately to RC networks, and it brings two main additional contributions. First, it proves that the governing projection for multi-terminal R/RC reduction ensures that resistors in the reduced network are positive. Then, a partition-based reduction for RC networks is derived and shown to be mathematically equivalent to the unpartitioned approach. This result guarantees that reduction accuracy is maintained even when the network is reduced by parts. The proposed partition-based scheme also gives a simple solution for computing path resistances between the terminals of a network, a problem which often occurs in circuit simulation [80]. The properties identified in Chapter 3 also hold for the more advanced method of Chapter 4.

In Chapter 4, the attention is turned to reducing very large RC networks with many terminals, for which a new method, SparseRC, is developed. As the name suggests, the main feature of SparseRC is that it preserves sparsity during reduction. Retaining sparsity is crucial when reducing circuits with terminal numbers exceeding thousands, as otherwise the fill-in would render the reduced models useless during simulation.

SparseRC preserves sparsity with the help of graph partitioning and the identification of nodes responsible for fill-in. Another important feature of SparseRC is that it matches moments at DC for each subnet resulting from partitioning, as well as for the recombined network. In addition, it can efficiently reduce very large networks for which existing techniques are inappropriate. Extensive experiments demonstrate that SparseRC achieves significant reduction rates and simulation speed-ups with little computational effort and at almost no accuracy loss. This chapter is published as [48]. Other publications related to this work are [42, 46, 49].

Chapter 5 addresses multi-terminal RLC reduction. Two approaches are compared, based on the first and second order formulation of the system equations. A partition-based implementation is derived based on the second order form. An extensive comparison between the first and second order forms is drawn, through which the advantages and limitations of each approach are identified. Among the most important contributions of Chapter 5 is a rank-revealing decomposition of the reduced inductive susceptance matrix, which ensures that the synthesized model simulates successfully. Other problems, such as recovering circuit behavior at DC, are also analyzed.

Chapter 6 takes a new turn and resolves two important problems which usually limit the applicability of traditional reduction methods to multi-terminal circuits: (1) the reduction of ungrounded systems and (2) synthesis without controlled sources. For ungrounded systems, the underlying matrix pencil (A, E) is singular. An example would be a sub-circuit which is extracted from a larger network. While such circuits can be reduced with the methods in Chapters 3, 4 and 5, other approaches such as PRIMA [71] would be immediately dismissed, as they generally assume that the pair (A, E) is regular.
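The regularity issue for ungrounded circuits can be reproduced in a few lines: for a floating RC network both G and C are graph Laplacians, the all-ones vector lies in their common nullspace, and the pencil is singular; grounding one node restores regularity. A sketch with invented values:

```python
import numpy as np

# Ungrounded RC network: both G and C are weighted graph Laplacians,
# so the all-ones vector spans a common nullspace and det(sC + G) = 0
# for every s -- the pencil (A, E) = (-G, C) is singular.
def laplacian(n, edges, w=1.0):
    M = np.zeros((n, n))
    for i, j in edges:
        M[i, i] += w; M[j, j] += w
        M[i, j] -= w; M[j, i] -= w
    return M

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
G = laplacian(4, edges)        # resistive Laplacian
C = laplacian(4, edges, 0.5)   # capacitive Laplacian (floating caps)

ones = np.ones(4)
print(np.allclose(G @ ones, 0), np.allclose(C @ ones, 0))  # True True

# det(sC + G) vanishes for any s: the pencil is singular
for s in (0.7, 3.0, 11.0):
    assert abs(np.linalg.det(s * C + G)) < 1e-9

# Grounding node 0 (deleting its row/column) makes the pencil regular:
Gg, Cg = G[1:, 1:], C[1:, 1:]
print(abs(np.linalg.det(1.0 * Cg + Gg)) > 1e-9)  # True
```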
Chapter 6 proposes a terminal removal and recovery action, which allows ungrounded multi-terminal models to be reduced with standard methods as well. The second problem also finds a solution in Chapter 6, using transformations based on the input/output matrices of the reduced model.

As the key to preserving sparsity in multi-terminal MOR is graph partitioning, Chapter 7 gives a closing analysis on this topic. The standard partitioning objectives of state-of-the-art software are revisited, and new objectives aimed at explicitly minimizing fill-in are derived. Although implementing these objectives exceeds the scope of this research, a new partitioning problem in the context of multi-terminal model reduction is proposed. Selections from this material appear in [47].

Chapter 2

Reduction and synthesis framework for multi-terminal circuits

A framework is presented for the reduction and synthesis of multi-terminal systems arising in circuit simulation. Two main problems are addressed: (1) setting up the appropriate circuit equations, so that reduction can be used without any constraints on the types of inputs applied to the circuit; this feature becomes especially useful when reducing sub-circuits of bigger systems individually, or the linear part of circuits containing non-linear devices; and (2) ensuring that reduced models recover their physical interpretation and can be re-inserted naturally in the original simulation flow via the relevant interconnection nodes.

2.1 Introduction

Although many model order reduction methods have been developed and have evolved since the 1990s (see for instance [4] for an overview), it is usually less clear how to use these methods efficiently in industrial practice, e.g., in a circuit simulator. One reason can be that the reduced-order model does not satisfy certain physical properties; for instance, it may not be stable or passive while the original system is. Failing to preserve these properties is typically inherent to the reduction method used (or its implementation). Passivity (and implicitly stability) can nowadays be preserved via several methods [5, 27, 44, 71, 74, 77, 86], but none addresses explicitly the practical aspect of (re)using the reduced-order models with circuit simulation software (e.g., SPICE [92]).

Difficulties can occur at several levels when inserting a reduced model in place of the original circuit in a simulation:

1. If the reduced model (2.1) was obtained based on certain restrictive assumptions about the types of applied inputs u and measured outputs y, the simulation is usually also constrained to the same input/output types. Hence, if the model must be re-used in different simulation runs which require other input and output types, a new reduced model would have to be generated for each scenario. A more elegant solution is a single reduced circuit which can be re-coupled via its terminal nodes to any kind of driving inputs: voltage sources, current sources, non-linear devices, or other circuit blocks. In this chapter, the appropriate reduction setup is derived which ensures that reduced models can indeed accommodate any types of driving sources in simulations.

2. The linear circuit to be reduced is represented by a netlist, which is a description of the circuit element values (R, L, C) and their connections to the circuit nodes. One such example is a simple netlist as in Fig. 2.1. However, reduced-order models are usually represented in terms of system matrices or of the input-output transfer function. Typically, the default input format for circuit simulators is a netlist with nodes and circuit elements; handling mathematical representations directly would require additional software. Thus, a reduced model in netlist representation is more easily coupled to other circuit blocks, simulated, or used by circuit designers for further analysis. This chapter therefore also addresses synthesis methods, which convert reduced models into netlists. Here, two methods are presented: Foster synthesis for single-input single-output (SISO) systems, and synthesis by un-stamping for multi-input multi-output (MIMO) systems.
The latter especially suits the multi-terminal reduction setup mentioned in item 1 above. With the proposed multi-terminal reduction and synthesis framework, reduced circuits are obtained which contain no controlled sources or transformers, and which are easily re-inserted in any SPICE-like circuit simulation environment.

The material is organized as follows. Sect. 2.2 formulates the two main problems of this chapter. The first is treated in Sect. 2.3, which provides the setup for multi-terminal model reduction. The second problem, namely synthesis, is addressed in Sect. 2.4, which describes two procedures: RLCSYN unstamping and Foster synthesis. Experiments in Sect. 2.5 demonstrate the functionality of the proposed reduction and synthesis framework. Sect. 2.6 concludes.

2.2 Problem formulation

Following Sect. 1: given an original system Σ(E, A, B, C, D) of dimension n in the form (1.3), a reduced model Σ̃(Ẽ, Ã, B̃, C̃, D̃) of dimension k ≪ n is sought. In particular:

Σ̃(Ẽ, Ã, B̃, C̃, D̃) = (W^T E V, W^T A V, W^T B, C V, D),    (2.1)

where V, W ∈ R^{n×k} are matrices whose k ≪ n columns form bases for relevant subspaces of the space in which the original internal variables lie. Rather than addressing explicitly the construction of V, W here, the aim of this chapter is to derive a reduction and synthesis framework which overcomes the two global problems identified in Sect. 2.1. Chapters 3, 4 and 5 are based on this framework and are concerned explicitly with forming the projection matrices V, W appropriate to the context therein. For clarity, the two problems identified in Sect. 2.1 are re-stated:

1. What is the appropriate modeling of the equations (1.3) governing the original system Σ(E, A, B, C, D), which ensures that the reduced Σ̃(Ẽ, Ã, B̃, C̃, D̃) accommodates any input/output types in later simulation stages?

2. How can Σ̃(Ẽ, Ã, B̃, C̃, D̃) recover its physical interpretation? For instance, if the original Σ(E, A, B, C, D) describes a circuit with R, L, C elements, which methods are suitable to convert Σ̃(Ẽ, Ã, B̃, C̃, D̃) into a circuit representation with R, L, C elements only, without introducing new elements such as controlled sources or transformers?

Sect. 2.3 provides the setup for multi-terminal model reduction which underlies all reduction methods in this thesis. In this chapter in particular, the reduction itself is performed with SPRIM/IOPOR [27, 93], but for demonstration purposes only. Then, Sect. 2.4 discusses two synthesis methods, from which the unstamping method is chosen as the most suitable for many-terminal systems. The proposed framework offers flexibility in choosing the input/output types of the original or reduced system during the simulation stage, demonstrates that synthesis is possible with R, L, C elements only, and shows that the reduced circuit can easily be re-connected via its terminal nodes to the desired simulation environment. Further on, the following naming conventions are used.
Σ(E, A, B, C, D) will denote a system which hides any further information about the structure of the underlying matrices. It will be clear from the context when the special structure of the system matrices is exploited, such as in (1.21). Also, if the inputs u are currents and the outputs y are voltages, then H(s) is an impedance function: each (i, j) entry of H(s), H^{(i,j)}(s) = Y_i(s)/U_j(s), represents a voltage divided by a current. We denote such a current-driven circuit to be in impedance form. If, on the other hand, the inputs are voltages and the outputs are currents, then H(s) is an admittance function, and we denote the circuit to be in admittance form. We will also often refer to reduction methods that are input-output structure preserving. These ensure that the reduced input-output matrices B̃, C̃ retain the special structure of the original input-output matrices B, C respectively (simply put, B̃, C̃ are sub-blocks of B, C). Preserving input-output structure during reduction is an important feature which later enables synthesis without controlled sources or transformers. Among the input-output structure preserving methods are SPRIM/IOPOR [27, 93], PACT [56] for RC networks, its RLC version [57], and the methods developed in Chapters 3, 4 and 5. Chapter 6 presents another reduction approach which enables synthesis

without controlled sources even for methods that are not input-output structure preserving by default. The approach taken in Chapter 6 is nevertheless also based on the reduction setup presented next.

[Figure 2.1: Circuit with terminal a, internal node 1, port P, and port Q(a,0).]

2.3 Reduction setup for multi-terminal systems

The terms internal nodes, terminals (or external nodes), and ports are often found in electronic-engineering-related papers. In short, external nodes are those related to the system inputs u and outputs y, while all the rest are internal nodes. Thus, a terminal (external node) is a node that is visible on the outside of a circuit, i.e., a node in which currents can be injected (cf. node a in Fig. 2.1). An internal node is one which is not visible on the outside of a circuit, i.e., no currents can be injected into it (cf. node 1 in Fig. 2.1). A port consists of two terminals that can be connected, for instance, by a source, a non-linear circuit element, or another (sub)circuit (cf. port P in Fig. 2.1). Sometimes terminals are referred to as ports and vice versa; from the context it should then be clear which terminal(s) complete the ports. Usually it is implicitly assumed that the ground node completes the ports. In Fig. 2.1, for instance, terminal a can be seen as a port (Q) by including the ground node.

Most of the multi-terminal systems in this thesis arise from full device-parasitic simulations. The linear part, i.e., the parasitic network, is first decoupled from the non-linear elements such as transistors or diodes, and then reduced. Through the decoupling, the interconnection points between the linear subcircuit and the non-linear elements become terminals of the linear part to be reduced.
A similar scenario occurs when reducing parts of a network individually, for instance after partitioning a large circuit, as is the case in Chapters 3, 4 and 5. The reduction setup proposed in this chapter ensures that the removal of non-linear elements or other circuit blocks before reduction, and their re-insertion after reduction, has a theoretical foundation. Without loss of generality, in the following section this is exemplified through the removal/re-insertion of voltage sources from a circuit; the result holds for the other afore-mentioned scenarios as well. By reducing a current-driven model, the reduced netlist can be easily coupled to other circuitry in place of the original netlist, and (re)using the reduced model in simulation becomes straightforward.

In the following, it is shown that the impedance form (defined in Sect. 2.2) is the one which gives the desired freedom in connecting the circuit in practice to other types of elements than current sources. First, we motivate reduction in impedance form and show how it also applies to systems that are originally in admittance form. The impedance-based reduction setup is demonstrated via the input-output structure preserving method SPRIM/IOPOR [27, 93]. Finally, a note on numerical aspects concerning the SPRIM/IOPOR projection is given.

A simple admittance to impedance conversion

The strength of input-output structure preserving methods is that the input/output connectivity can be synthesized after reduction without controlled sources, provided that the system is in impedance form: the inputs are currents injected into the circuit terminals, and the outputs are voltages measured at the terminals. The emerging question is: how to ensure synthesis without controlled sources if the original model is in admittance form (i.e., it is voltage driven)? We show that reduction and synthesis via the input impedance transfer function can be performed also when the original circuit is initially voltage driven. The same principle of an impedance-based reduction serves as the basis for reducing the linear part of circuits with non-linear elements, as is done in Chapters 3, 4, 5 and 6, and in other methods such as [80, 88]. The admittance-to-impedance conversion proposed herein (also published as [43]) enables the reduction, synthesis and, most importantly, re-use of complex linear systems which contain couplings to other circuit blocks (e.g., non-linear devices, sources, etc.).
Consider the modified nodal analysis (MNA) description of an admittance-type¹ RLC circuit, driven by n_s voltage sources:

\underbrace{\begin{bmatrix} C & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & L \end{bmatrix}}_{E_Y} \frac{d}{dt}\underbrace{\begin{bmatrix} v(t) \\ i_S(t) \\ i_L(t) \end{bmatrix}}_{x_Y} + \underbrace{\begin{bmatrix} G & E_v & E_L \\ -E_v^T & 0 & 0 \\ -E_L^T & 0 & 0 \end{bmatrix}}_{A_Y} \begin{bmatrix} v(t) \\ i_S(t) \\ i_L(t) \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ -I_{n_s} \\ 0 \end{bmatrix}}_{B_Y} u(t),    (2.2)

where u(t) ∈ R^{n_s} are the input voltages and y(t) = C_Y x_Y(t) ∈ R^{n_s} are the output currents, with C_Y = B_Y^T. The states are x_Y(t) = [v(t)^T, i_S(t)^T, i_L(t)^T]^T, with v(t) ∈ R^{n_v} the node voltages, i_S(t) ∈ R^{n_s} the currents through the voltage sources, and i_L(t) ∈ R^{n_L} the currents through the inductors, n_v + n_s + n_L = n. The n_v = n_1 + n_2 node voltages are formed by the n_1 external nodes/terminals² and the n_2 internal nodes (assuming that the voltage sources may be ungrounded, n_1 satisfies n_s < n_1 ≤ 2n_s + 1).

¹ The subscript Y refers to quantities associated with a system in admittance form.
² The MNA form (2.2) corresponds to the ungrounded circuit (i.e., the reference node is counted among the n_1 external nodes), resulting in a defective matrix pencil (A_Y, E_Y). For subsequent computations, such as the construction of a Krylov subspace, the pencil (A_Y, E_Y) must be regular. Thus in (2.2), one node must be chosen as the ground (reference) node by removing the row/column corresponding to that node; this ensures that the regularity conditions (i) and (ii) from [76, Assumption 4] are satisfied. The positive definiteness of C, L, G is also a necessary condition to ensure the circuit's passivity.

The dimensions of the underlying matrices are: C ∈ R^{n_v×n_v}, G ∈ R^{n_v×n_v}, E_v ∈ R^{n_v×n_s}, L

∈ R^{n_L×n_L}, E_L ∈ R^{n_v×n_L}, B ∈ R^{n_1×n_s}. Assuming without loss of generality that (2.2) is permuted such that the first n_1 nodes are the external nodes, the input voltages are determined by a linear combination of v_{1:n_1}(t) only. Thus the following holds:

E_v = \begin{bmatrix} B_v \\ 0_{n_2×n_s} \end{bmatrix} ∈ R^{n_v×n_s},  B_v ∈ R^{n_1×n_s},  B = B_v.    (2.3)

We now derive the first-order impedance-type system associated with (2.2). Note that by definition, i_S(t) flows out of the circuit terminals into the voltage source (i.e., from the + to the − terminal of the voltage source; see also the example later in this chapter). We can define new input currents as the currents flowing into the circuit terminals: i_{in}(t) = −i_S(t). Since v_{1:n_1}(t) are the terminal voltages, they describe the new output equations, and it is straightforward to rewrite (2.2) in the impedance form:

\underbrace{\begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix}}_{E} \frac{d}{dt}\underbrace{\begin{bmatrix} v(t) \\ i_L(t) \end{bmatrix}}_{x} + \underbrace{\begin{bmatrix} G & E_L \\ -E_L^T & 0 \end{bmatrix}}_{A} \begin{bmatrix} v(t) \\ i_L(t) \end{bmatrix} = \underbrace{\begin{bmatrix} E_v \\ 0 \end{bmatrix}}_{\mathcal{B}} i_{in}(t),
y(t) = \underbrace{\begin{bmatrix} E_v^T & 0 \end{bmatrix}}_{\mathcal{C}} \begin{bmatrix} v(t) \\ i_L(t) \end{bmatrix} = B_v^T v_{1:n_1}(t),  E_v^T = \begin{bmatrix} B_v^T & 0_{n_s×n_2} \end{bmatrix},    (2.4)

where \mathcal{B} is the new input incidence matrix, corresponding to the input currents i_{in}, and \mathcal{C} is the new output incidence matrix, corresponding to the voltage drops at the circuit terminals. We emphasize that (2.4) has fewer unknowns than (2.2), since the currents i_S have been eliminated. The transfer function associated with (2.4) is an input impedance: H^{(i,j)}(s) = Y_i(s)/I_{in,j}(s), where Y(s) and I_{in}(s) are the Laplace transforms of y(t) and i_{in}(t), respectively.
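The conversion can be checked numerically on a toy RC circuit with a single grounded voltage source (no inductors; all values invented for illustration): building the admittance form (2.2) and the impedance form (2.4) from the same blocks, the two transfer functions are exact reciprocals, Y(s)Z(s) = 1, in this SISO case:

```python
import numpy as np

# Toy RC network, one grounded voltage source at terminal node 0.
n = 4
G = 2.0 * np.eye(n)
for i in range(n - 1):
    G[i, i + 1] = G[i + 1, i] = -1.0
C = np.diag([1.0, 0.5, 0.5, 1.0])
Ev = np.zeros((n, 1)); Ev[0, 0] = 1.0     # source incidence (grounded source)

# Admittance form (2.2) without inductors: states (v, i_S)
EY = np.block([[C, np.zeros((n, 1))], [np.zeros((1, n + 1))]])
AY = np.block([[G, Ev], [-Ev.T, np.zeros((1, 1))]])
BY = np.zeros((n + 1, 1)); BY[n, 0] = -1.0
CY = BY.T                                  # y = -i_S = current into terminal

def Y(s):   # admittance transfer function of (2.2)
    return (CY @ np.linalg.solve(s * EY + AY, BY)).item()

def Z(s):   # impedance transfer function of the converted system (2.4)
    return (Ev.T @ np.linalg.solve(s * C + G, Ev)).item()

s = 2.0
print(abs(Y(s) * Z(s) - 1.0) < 1e-12)  # True: the two forms are consistent
```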
In the next section we explain how to obtain an impedance-type reduced-order model in the input/output structure preserved form:

\underbrace{\begin{bmatrix} C̃ & 0 \\ 0 & L̃ \end{bmatrix}}_{Ẽ} \frac{d}{dt}\underbrace{\begin{bmatrix} ṽ(t) \\ ĩ_L(t) \end{bmatrix}}_{x̃} + \underbrace{\begin{bmatrix} G̃ & Ẽ_L \\ -Ẽ_L^T & 0 \end{bmatrix}}_{Ã} \begin{bmatrix} ṽ(t) \\ ĩ_L(t) \end{bmatrix} = \underbrace{\begin{bmatrix} Ẽ_v \\ 0 \end{bmatrix}}_{B̃} i_{in}(t),
ỹ(t) = \underbrace{\begin{bmatrix} Ẽ_v^T & 0 \end{bmatrix}}_{C̃} \begin{bmatrix} ṽ(t) \\ ĩ_L(t) \end{bmatrix} = B_v^T ṽ_{1:n_1}(t),  Ẽ_v^T = \begin{bmatrix} B_v^T & 0_{n_s×k_2} \end{bmatrix},    (2.5)

where C̃, L̃, G̃, Ẽ_v are the reduced MNA matrices, and the reduced input impedance transfer function is H̃(s) = C̃(sẼ + Ã)^{-1}B̃. Due to the input/output preservation, the circuit terminals are easily preserved in the reduced model (2.5). It turns out that after reduction and synthesis, the reduced model (2.5) can still be used as a voltage-driven admittance block in simulation. This is shown next. We can rewrite the output equation of (2.5) as [Ẽ_v^T 0 0][ṽ(t)^T, ĩ_S(t)^T, ĩ_L(t)^T]^T = u(t).

This result, together with i_{in}(t) = −i_S(t), reveals that (2.5) can be rewritten as:

\underbrace{\begin{bmatrix} C̃ & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & L̃ \end{bmatrix}}_{Ẽ_Y} \frac{d}{dt}\underbrace{\begin{bmatrix} ṽ(t) \\ ĩ_S(t) \\ ĩ_L(t) \end{bmatrix}}_{x̃_Y} + \underbrace{\begin{bmatrix} G̃ & Ẽ_v & Ẽ_L \\ -Ẽ_v^T & 0 & 0 \\ -Ẽ_L^T & 0 & 0 \end{bmatrix}}_{Ã_Y} \begin{bmatrix} ṽ(t) \\ ĩ_S(t) \\ ĩ_L(t) \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ -I_{n_s} \\ 0 \end{bmatrix}}_{B̃_Y} u(t),    (2.6)

which has the same structure as the original admittance model (2.2). Conceptually, one could have reduced system (2.2) directly via the input admittance. In that case, synthesis would have required controlled sources [36], because the structure of the input/output matrix would not have been preserved. As shown above, this is avoided by applying the simple admittance-to-impedance conversion from (2.2) to (2.4), reducing (2.4) to (2.5), and finally re-inserting the voltage sources as in (2.6). In other words, the input-output structure preserved reduced admittance (2.6) shows that, after synthesizing the reduced impedance model (2.5) into a reduced netlist with all terminal nodes preserved, the voltage source elements can safely be reconnected at the terminals.

Input-output structure preservation with SPRIM/IOPOR

A reduced model of the impedance form (2.5) can be obtained, for instance, via the input-output structure preserving SPRIM/IOPOR projection [27, 93], as follows. Let V be the projection matrix obtained with PRIMA [71], split according to the block dimensions of (2.4): V = [V_1^T, V_2^T, V_3^T]^T ∈ R^{(n_1+n_2+n_L)×k}, where n_1 + n_2 = n_v, V_1 ∈ R^{n_1×k}, V_2 ∈ R^{n_2×k}, V_3 ∈ R^{n_L×k}. After appropriate orthonormalization (e.g., via modified Gram-Schmidt [78, Chapter 1]), we obtain Ṽ_i = orth(V_i) ∈ R^{n_i×k_i}, k_i ≤ k, i = 1, 2, 3. The SPRIM [27] block structure preserving projection is:

Ṽ = \begin{bmatrix} Ṽ_1 & 0 & 0 \\ 0 & Ṽ_2 & 0 \\ 0 & 0 & Ṽ_3 \end{bmatrix} ∈ R^{n×(k_1+k_2+k_3)},

which preserves the block structure of (2.4) but does not yet preserve the structure of the input and output matrices.
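The block splitting and re-assembly described above can be sketched as follows (random data standing in for a PRIMA basis; sizes and values are illustrative): re-orthonormalizing each block and stacking the blocks block-diagonally keeps the reduced descriptor matrix in the [C 0; 0 L] form, while the IOPOR variant additionally keeps an identity on the n_1 terminal rows:

```python
import numpy as np
from scipy.linalg import orth, block_diag

rng = np.random.default_rng(0)
n1, n2, nL, k = 2, 5, 3, 2
n = n1 + n2 + nL

def spd(m):   # illustrative symmetric positive definite MNA blocks
    M = rng.standard_normal((m, m))
    return M @ M.T + m * np.eye(m)

Cc, L = spd(n1 + n2), spd(nL)
E = block_diag(Cc, L)                      # descriptor matrix of (2.4)

V = orth(rng.standard_normal((n, k)))      # stand-in for a flat PRIMA basis
V1, V2, V3 = V[:n1], V[n1:n1 + n2], V[n1 + n2:]

# SPRIM: orthonormalize each block, then re-assemble block-diagonally
Vt = block_diag(orth(V1), orth(V2), orth(V3))
Es = Vt.T @ E @ Vt                         # keeps the [C 0; 0 L] split

# SPRIM/IOPOR: keep all n1 terminal rows untouched (identity top block)
Vio = block_diag(np.eye(n1), orth(V2), orth(V3))
Et = Vio.T @ E @ Vio

k2, k3 = orth(V2).shape[1], orth(V3).shape[1]
print(np.allclose(Et[:n1 + k2, n1 + k2:], 0))  # True: off-diagonal block is zero
```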
The input-output structure preserving projection is obtained via SPRIM/IOPOR as proposed in [93]:

Ṽ_{IOPOR} = \begin{bmatrix} I_{n_1} & 0 & 0 \\ 0 & Ṽ_2 & 0 \\ 0 & 0 & Ṽ_3 \end{bmatrix} ∈ R^{n×(n_1+k_2+k_3)},

where the top block is:

W = \begin{bmatrix} I_{n_1} & 0 \\ 0 & Ṽ_2 \end{bmatrix} ∈ R^{(n_1+n_2)×(n_1+k_2)}.    (2.7)

Recalling (2.3), we obtain the reduced system matrices in (2.5):

C̃ = W^T C W,  G̃ = W^T G W,  L̃ = Ṽ_3^T L Ṽ_3,  Ẽ_L = W^T E_L Ṽ_3,    (2.8)
Ẽ_v = W^T E_v = \begin{bmatrix} B_v^T & 0_{n_s×k_2} \end{bmatrix}^T,    (2.9)

which, compared to (2.3), clearly preserves the input-output structure. Therefore a netlist representation can be obtained for the reduced impedance-type model, driven by injected currents just as the original circuit. To this end, we use the Laplace transform and convert (2.5) to the second-order form:

\begin{cases} \left[ s C̃ + G̃ + \frac{1}{s} Γ̃ \right] ṽ(s) = Ẽ_v i_{in}(s) \\ ỹ(s) = Ẽ_v^T ṽ(s), \end{cases}    (2.10)

where ĩ_L(s) = \frac{1}{s} L̃^{-1} Ẽ_L^T ṽ(s) are the eliminated current variables and Γ̃ = Ẽ_L L̃^{-1} Ẽ_L^T. The reduced model (2.10) is synthesized via RLCSYN unstamping according to [93] (revised in Sect. 2.4).

On SPRIM/IOPOR and rank loss of Ã

In some cases it was observed (and shown by the results in Sect. 2.5) that models reduced with SPRIM or SPRIM/IOPOR exhibit poles and zeros at 0. This section explains when this happens and supports theoretically the interpretation of those results.

Proposition. Let W and Ṽ_3 be the SPRIM/IOPOR projection matrices (2.7), with full column rank. If Ẽ_L = W^T E_L Ṽ_3 in (2.8) has deficient column rank, then the SPRIM/IOPOR reduced model (2.5) has at least one pole-zero pair at 0.

Proof. Recalling that Ṽ_2 has full column rank, it is clear from (2.7) that W also has full column rank. Nonetheless, W^T ∈ R^{(n_1+k_2)×(n_1+n_2)} has more columns than rows (usually k_2 ≪ n_2), thus W^T has deficient column rank. Hence Ẽ_L = W^T E_L Ṽ_3 ∈ R^{(n_1+k_2)×k_3} may also lose column rank (even if E_L and Ṽ_3 have full column rank). If Ẽ_L has deficient column rank, then in (2.5) we have rank(Ã) < #cols(Ã). Thus Ã has deficient column rank ⇒ Ã has at least one eigenvalue at 0 ⇒ 0 is also an eigenvalue of the pencil (Ã, Ẽ), i.e., a pole at 0.
The zeros of the reduced system (2.5) are determined as the eigenvalues of the extended matrix pair (Ã_z, Ẽ_z), where Ã_z is the Rosenbrock matrix [78, Chapter 5]:

Ã_z = \begin{bmatrix} Ã & B̃ \\ C̃ & 0 \end{bmatrix} = \begin{bmatrix} G̃ & Ẽ_L & Ẽ_v \\ -Ẽ_L^T & 0 & 0 \\ Ẽ_v^T & 0 & 0 \end{bmatrix},

and Ẽ_z = \begin{bmatrix} Ẽ & 0 \\ 0 & 0 \end{bmatrix}. From the structure of Ã_z it is seen that if Ẽ_L has deficient column rank, then Ã_z also loses rank and will have a 0 eigenvalue. Consequently, (Ã_z, Ẽ_z) will also have an eigenvalue at 0, which means that the SPRIM/IOPOR reduced system (2.5) has a zero at 0.

Analytically, pole-zero pairs at 0 are harmless, since they cancel in theory. Numerically this may not be the case, altering the approximation at low frequencies (as seen, for instance, in the results of Sect. 2.5). As also reported in Chapter 5, the deficiency in the column rank of Ẽ_L (and implicitly of Ã) may prejudice the computation of the DC solution when the reduced, synthesized RLC model is re-simulated. A rank-revealing procedure for Γ̃ which avoids these numerical limitations is proposed in Chapter 5.

2.4 Synthesis

Synthesis is the realization step needed to map the reduced-order model from its mathematical representation (in terms of system matrices or transfer function) into a netlist consisting of electrical circuit components [34]. In [15] it was shown that passive systems (with positive real transfer functions) can be synthesized with positive R, L, C elements and transformers (see also [76]). Later developments [14] propose a method to circumvent the introduction of transformers; however, the resulting realization is non-minimal (i.e., the number of electrical components generated during synthesis is too large). Allowing for possibly negative R, L, C values, other methods have been proposed via, e.g., direct stamping [57, 71] or full realization [36, 73]. These mostly model the input/output connections of the reduced model with controlled sources. The term synthesis is usually tied to circuit realizations which contain only positive circuit elements, but which may include transformers or controlled sources. The main reason for requiring that circuit elements in the reduced model be positive is to guarantee passivity. This is, however, not a necessary requirement.
A reduced circuit with negative elements can also be passive as long as its underlying transfer function is positive real. This holds for all reduced models obtained in this thesis. For this reason, we drop the positiveness condition on circuit elements when using the term synthesis. Furthermore, only synthesis methods which introduce neither transformers nor controlled sources are considered in this work. Two such methods are: (1) RLCSYN synthesis by unstamping [93] and (2) Foster synthesis [34], discussed next.

2.4.1 Synthesis by unstamping: RLCSYN

This section focuses on synthesis via the RLCSYN method [93], which unstamps the reduced matrix data directly into a netlist representation. It is suitable for synthesizing models which were reduced via methods that preserve the MNA structure

and the input-output connectivity at the circuit terminals. All reduction methods in this thesis satisfy these properties; hence RLCSYN unstamping is the synthesis method of choice in the remaining chapters. The presentation of RLCSYN follows [93, Sect. 4] and [40], and is briefly described here. In circuit simulation, the process of forming the C, G, L system matrices from the individual branch element values is called stamping. The reverse operation of unstamping decomposes, entry-wise, the values of the reduced system matrices in (2.10) into the corresponding R, L, and C values. The resulting elements are connected in the reduced netlist according to the MNA topology. The reduced input/output matrices of (2.10) directly reveal the input connections of the reduced model via injected currents, without any controlling elements. The unstamping procedure is best understood via an example, provided in Sect. 2.5.1. The general framework, reduction followed by RLCSYN unstamping, is as follows:

1. The system to be reduced is in MNA impedance form (2.4). If the system is of admittance type (2.2), apply the admittance-to-impedance conversion described earlier. If the circuit contains non-linear elements, they are removed in a similar manner, so that the linear part in impedance form (2.4) remains.

2. System (2.4) is reduced with an input-output structure preserving method and converted to second order form (2.10). The alternative is to obtain the second order form of the original system first, and reduce it directly in second order form [8, 93].

3. According to [93], to obtain a reduced RLC netlist which simulates successfully, the reduced Γ from (2.10) must be diagonalized and regularized as proposed therein. Diagonalization ensures that all inductors in the synthesized model are connected to ground (i.e., there are no inductor loops). Regularization eliminates spurious, overly large inductors.
These steps are, however, not needed for purely RC circuits.

4. The original RLCSYN formulation [93] imposes that in (2.4) no inductors are directly connected to the input terminals, so that diagonalization and regularization after reduction preserve the input/output structure. In Chapter 5 it is shown that this restrictive assumption is unnecessary: the rank revealing decomposition of Γ can be performed without affecting the input-output structure, irrespective of whether there are inductors connected to terminals or not. In fact, the reduced model can be cast into the form (2.5) and synthesized as an equivalent RC netlist; this procedure allows multi-terminal circuits with inductors connected to ports to be reduced, synthesized and re-simulated.

5. Finally, the reduced netlist is inserted via its terminal nodes into the original simulation setup. Voltage sources or non-linear elements that were removed from the network are simply reconnected at the terminal nodes.

With unstamping, roughly speaking, every non-zero entry in the upper triangle of the reduced matrices maps to a circuit element in the reduced netlist. Hence, the main drawback of unstamping is that, when the reduced system matrices are dense

and the number of terminals is large [p of O(10^3)], the RLCSYN procedure yields dense netlists. For a dense reduced network with p terminals and k internal nodes, the RLCSYN synthesized netlist has O[(p + k)^2] circuit elements. This situation is remedied, however, when the reduction itself ensures that the reduced matrices remain sparse even when p is large. Then the O[(p + k)^2] factor becomes only a loose upper bound on the actual number of elements resulting from unstamping, and in practice far fewer elements are generated. Hence, RLCSYN is especially suitable for synthesizing multi-terminal reduced models obtained by sparsity preserving reduction methods. Devising such methods is a challenging task; one solution is the SparseRC method developed in Chapter 4. Other sparsity-related contributions are given in Chapters 5 and 6.

2.4.2 Foster synthesis of rational transfer functions

This section describes the Foster synthesis method, developed in the 1930s by Foster and Cauer [34], which realizes a model from its transfer function rather than from its system matrices. Hence, the Foster approach can be used to realize any reduced order model computed by standard projection based model order reduction techniques (standard meaning that no structure preserving requirements are necessary).

Realization of impedance functions

Realizations in this section are described in terms of SISO impedances (Z-parameters): the circuit input is a current injection, and the output is a voltage. Given the reduced system (2.1) and assuming that the matrix pencil (Ã, Ẽ) is non-defective³, consider the partial fraction expansion [50] of its transfer function:

H̃(s) = Σ_{i=1}^k r_i / (s − p_i) + D̃ + s r_∞,    (2.11)

The residues are r_i = (C̃ x_i)(ỹ_i* B̃) / (ỹ_i* Ẽ x_i), the poles are p_i and, if non-zero, D̃ and r_∞ give additional contributions from poles at ∞.
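The pole-residue data in (2.11) can be computed from the generalized eigentriplets of the pencil. A minimal sketch with illustrative matrices (not a model from this chapter), assuming a nonsingular Ẽ so that D̃ = 0 and r_∞ = 0:

```python
import numpy as np
from scipy.linalg import eig

# Small RC-type descriptor system E x' = A x + b u, y = c^T x,
# with A = -G (conductance stamp) and E = C (capacitance stamp).
# Values are illustrative, not taken from the thesis examples.
G = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
Cm = np.diag([1.0, 2.0, 3.0])
A, E = -G, Cm
b = np.array([1.0, 0.0, 0.0])
c = b.copy()

# Generalized eigentriplets (p_i, x_i, y_i): A x_i = p_i E x_i, y_i^* A = p_i y_i^* E.
p, Yl, Xr = eig(A, E, left=True, right=True)

# Residues r_i = (c^T x_i)(y_i^* b) / (y_i^* E x_i); the denominator makes
# the formula independent of the eigenvector scaling.
r = np.array([(c @ Xr[:, i]) * (Yl[:, i].conj() @ b)
              / (Yl[:, i].conj() @ E @ Xr[:, i]) for i in range(len(p))])

def H(s):
    """Transfer function c^T (sE - A)^{-1} b."""
    return c @ np.linalg.solve(s * E - A, b)

s0 = 1.0j                       # test frequency on the imaginary axis
H_pf = np.sum(r / (s0 - p))     # partial-fraction evaluation of (2.11)
assert abs(H(s0) - H_pf) < 1e-10
```

Because the pencil here is symmetric definite, all poles are real and negative, and the partial-fraction sum reproduces H(s) exactly up to rounding.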
An eigentriplet (p_i, x_i, ỹ_i) consists of an eigenvalue p_i of (Ã, Ẽ) and the corresponding right and left eigenvectors x_i, ỹ_i ∈ C^k. The expansion (2.11) consists of basic summands of the form:

Z(s) = r_1 + r_2/(s − p_2) + r_3/s + ( r_4/(s − p_4) + r̄_4/(s − p̄_4) ) + s r_6 + ( r_7/(s − p_7) + r̄_7/(s − p̄_7) ),    (2.12)

where for completeness we assume that any kind of pole may appear, i.e., purely real, purely imaginary, in complex conjugate pairs, at ∞ or at 0 (see also Table 2.1). The Foster realization converts each term in (2.12) into the corresponding circuit

³ For every eigenvalue of the pencil (Ã, Ẽ), its algebraic multiplicity is equal to its geometric multiplicity.

block with R, L, C components, and connects these blocks in series in the final netlist. This is shown in Fig. 2.2, where each summand from (2.12) is represented, in order, as a circuit block between two nodes. Note that any reordering of the circuit blocks in Fig. 2.2 is still a realization of (2.12). The values of the circuit components in Fig. 2.2 are determined according to Table 2.1 (for a full derivation see [40]).

Figure 2.2: Realization of a general impedance transfer function as a series RLC circuit.

Table 2.1: Circuit element values for Fig. 2.2 for the Foster impedance realization of (2.12). The recoverable entries are:
- p_1 = ∞, r_1 ∈ R: resistor R = r_1;
- p_2 ∈ R, r_2 ∈ R: parallel RC block with R = −r_2/p_2, C = 1/r_2;
- p_3 = 0, r_3 ∈ R: capacitor C = 1/r_3;
- p_4 = σ + iω ∈ C, r_4 = α + iβ ∈ C (with p_5 = p̄_4, r_5 = r̄_4): RLCG block with coefficients a_0 = −2(ασ + βω), a_1 = 2α, b_0 = σ² + ω², b_1 = −2σ (the element formulas in terms of a_0, a_1, b_0, b_1 are derived in [40]);
- p_6 = ∞, r_6 ∈ R: inductor L = r_6;
- p_7 ∈ iR, r_7 (with p_8 = p̄_7, r_8 = r̄_7): parallel LC block with C = 1/(2 r_7), L = 2 r_7/(p_7 p̄_7).

Realization of admittance functions

Realizations in this section are described, following [65], in terms of SISO admittances (Y-parameters): the inputs are voltage sources and the outputs are currents. Consider the admittance function:

Y(s) = r_1 + r_2/(s − p_2) + ( r_3/(s − p_3) + r̄_3/(s − p̄_3) ) + r_5/s,

where r_1, r_2, p_2, r_5 ∈ R and r_3 = ν + iµ, p_3 = α + iβ ∈ C. This admittance function can be realized by the RLCG network in Fig. 2.3, with the elements defined by:

Figure 2.3: Realization of a general admittance transfer function as a parallel RLCG circuit [65].

The recoverable element values are:
- p_1 = ∞, r_1 ∈ R: conductance G = r_1;
- p_2 ∈ R, r_2 ∈ R: series RL branch with R = −p_2/r_2, L = 1/r_2;
- p_3 = α + iβ ∈ C, r_3 = ν + iµ (with p_4 = p̄_3, r_4 = r̄_3): RLCG branch (the element formulas in terms of ν, µ, α, β are given in [65]);
- p_5 = 0, r_5 ∈ R: inductor L = 1/r_5.

Multi-port transfer functions

If the transfer function is an input admittance [H(s) = Y(s)], a MIMO Foster realization can be determined as described in [90]. To the author's best knowledge, MIMO Foster realizations for general impedance transfer functions Z(s) are only known to be feasible for 2-terminal systems. Even in that case, an admittance Foster network is determined first [90], which is then converted to an impedance network using a Π-Y transformation. Some developments on multiport realizations of RC impedance transfer functions can be found in [87].

The realization in netlist form can be implemented in any circuit description language, such as SPICE [92], so that it can be reused and combined with other circuits. The advantages of Foster synthesis are: (1) its straightforward implementation for single-input-single-output (SISO) transfer functions, via either the impedance or the admittance transfer function; (2) after reducing purely RC or RL circuits via modal approximation [78], the reduced netlists obtained from Foster synthesis are guaranteed to have positive RC or RL values, respectively (see [40] for a proof). Note, however, that Foster synthesis does not guarantee positive circuit elements in general (e.g., when used to synthesize reduced models originating from RLC circuits, or reduced models of RC and RL circuits obtained with methods other than modal approximation). The main disadvantage is that, for systems with many terminals, the MIMO Foster realization (see for instance [90]) gives dense reduced netlists, since the transfer function between every pair of external nodes has to be realized. So, for a k-dimensional reduced system with p terminals, Foster synthesis yields a total of p²k circuit elements. In contrast, synthesis by unstamping is more straightforward for MIMO systems and, when applied after structure and sparsity preserving reductions, can yield reduced netlists with far fewer circuit elements (as seen in Chapter 4). To summarize the synthesis section, a comparison between Foster synthesis and unstamping is drawn in Table 2.2.

Table 2.2: Summary and comparison of synthesis methods

Unstructured projection and Foster realization of Ĥ(s):
- Properties: passivity guaranteed by positive realness of Ĥ(s), obtained from appropriate V, W.
- Advantages: Σ need not be in MNA form; RC and RL circuits reduced with modal approximation yield positive elements.
- Main hurdles: MIMO realization for more than 2 terminals only for admittance H(s) and Ĥ(s); dense MIMO netlists: for p ports and k poles, O(p²k) circuit elements; positive R, L, C not guaranteed in general.

I/O preserving reduction and RLCSYN unstamping:
- Properties: passivity ensured via congruence transformation W, W^T W = I, on Σ in MNA form.
- Advantages: preserved I/O and MNA structure; unstamping easy for MIMO via impedance Σ; benefits from sparsity preserving reduction, giving sparse MIMO netlists.
- Main hurdles: G̃ may lose rank with SPRIM/IOPOR; when the reduced G, C, Γ are dense, for p terminals and k internal nodes, O((p + k)²) circuit elements; positive R, L, C not guaranteed in general.

2.5 Numerical examples

Three circuits are chosen to demonstrate the applicability of the reduction and synthesis framework presented in this chapter. The first is a simple circuit which illustrates the complete admittance-to-impedance formulation and the RLCSYN unstamping procedure of Sect. 2.4.1. The second example is a SISO transmission line model, while the third is a MIMO model of a spiral inductor. For the single-input-single-output (SISO) examples, one can easily provide synthesized models via

both Foster and RLCSYN. For the multi-input-multi-output (MIMO) example, a synthesized model can be obtained straightforwardly with RLCSYN; thus RLCSYN synthesis is preferred over Foster synthesis.

2.5.1 Illustrative example

Figure 2.4: The admittance-type circuit driven by input voltages, from [71]. G_{1,2,3} = 0.1 S, L_1 = 10⁻³ H, C_{1,2} = 10⁻⁶ F, C_c = 10⁻⁴ F, u_{1,2} = 1 V.

The circuit in Fig. 2.4 is a simple RLC netlist driven by two voltage sources, u_1 and u_2. The terminals are v_1 and v_4, and the currents flowing out of the voltage sources into the terminals are i_1 and i_2, respectively. The MNA equations in admittance form (2.2) stamp the conductances G_1, G_2, G_3, the capacitances C_1, C_2, C_c, the inductance L_1 and the source incidences into the unknown vector x = [v_1, v_2, v_3, v_4, i_S1, i_S2, i_L]^T, giving system (2.13). Notice that:

i_in = [i_1, i_2]^T = [i_S1, i_S2]^T,    (2.14)
u = [u_1, u_2]^T = [v_1, v_4]^T,    (2.15)

thus the external nodes (input nodes/terminals) are v_1 and v_4, and the internal nodes are v_2 and v_3. As described earlier, (2.13) has an equivalent impedance formulation (2.4), in which the conductance matrix G carries the stamps of G_1, G_2, G_3, the capacitance matrix C carries the stamps C_1 + C_c, −C_c, C_2 + C_c on the internal nodes (2.16), and L = [L_1], with E_L, E_v the inductor and source incidence matrices and B = B_v = E_v (2.17).

Matrices (2.16), (2.17) are reduced in first order form using SPRIM/IOPOR and then synthesized by unstamping via RLCSYN. Note that there is an inductor directly connected to the second input node v_4; thus assumption 4 of Sect. 2.4.1 from RLCSYN is not satisfied. For simplicity, we therefore reduce and synthesize only the single-input-single-output version of (2.13), where the second input i_2 is removed. The new incidence matrices are:

E_{v1} = [1, 0, 0, 0]^T,   B_1 = E_{v1},   B_{v1} = B_1.    (2.18)

We choose an underlying PRIMA projection matrix V ∈ C^{n×k} spanning a k = 2-dimensional Krylov subspace (with expansion point s_0 = 0). After splitting V and appropriate re-orthonormalization, the dimensions of the input-output structure preserving partitioning are:

n_1 = 1, n_2 = 3, n_L = 1, k_2 = 2, k_3 = 1,    (2.19)

and the SPRIM/IOPOR projection is:

W̃ ∈ C^{5×4},   with   W ∈ C^{4×3}.    (2.20)

After diagonalization and regularization, the SPRIM/IOPOR reduced system matrices

in (2.10) are the matrices G̃, C̃, Γ̃ ∈ R^{3×3} and the incidence vector Ẽ_{v1} ∈ R^3 of (2.21). Reduced matrices (2.21) are now unstamped individually using RLCSYN. The reduced system dimension in second order form is N = 3, indicating that the reduced netlist has three nodes plus a ground node. In the following, M_{i,j}, i = 1…N, j = 0…N−1, denotes a circuit element connected between nodes (i, j) in the resulting netlist, where M represents a circuit element of type R, L, C or current source J.

By unstamping G̃, we obtain the resistor values:

R_{1,0} = [Σ_{k=1}^3 G̃(1,k)]⁻¹,  R_{1,2} = [−G̃(1,2)]⁻¹,  R_{1,3} = [−G̃(1,3)]⁻¹,
R_{2,0} = [Σ_{k=1}^3 G̃(2,k)]⁻¹,  R_{2,3} = [−G̃(2,3)]⁻¹,  R_{3,0} = [Σ_{k=1}^3 G̃(3,k)]⁻¹.

By unstamping C̃, we obtain the capacitor values:

C_{2,0} = Σ_{k=1}^3 C̃(2,k),  C_{2,3} = −C̃(2,3),  C_{3,0} = Σ_{k=1}^3 C̃(3,k).

By unstamping Γ̃, we obtain the inductor value:

L_{3,0} = [Σ_{k=1}^3 Γ̃(3,k)]⁻¹.

By unstamping Ẽ_{v1}, we obtain the current source J_{1,0} of amplitude 1 A. The equivalent netlist, written in circuit simulation (SPICE [92]) language, is shown in Fig. 2.5.

Figure 2.5: Netlist description obtained from RLCSYN unstamping for the reduced model (2.21).

.subckt small_rlc 1 0
r_1_0 1 0 …e+000
r_1_2 1 2 …e+001
r_1_3 1 3 …e+001
r_2_0 2 0 …e+000
r_2_3 2 3 …e+001
r_3_0 3 0 …e+001
l_3_0 3 0 …e-002
c_2_0 2 0 …e-005
c_2_3 2 3 …e-005
c_3_0 3 0 …e-004
* Connect source between terminals 1 and 0
* Resistors: 6, Capacitors: 3, Inductors: 1
.ends small_rlc

Table 2.3 summarizes the reduction and synthesis results. Even though the number of internal variables (states) generated by the simulator is smaller for the SPRIM/IOPOR model than for the original, the number of circuit elements generated by RLCSYN is larger in the reduced model than in the original (note that this is a demonstrative example only, hence no real candidate for reduction). Chapters 4, 5 and 6 explicitly address the problem of minimizing the fill generated inside the reduced matrices. Fig. 2.6 shows that the approximation with SPRIM/IOPOR is more accurate than with PRIMA. The circuit re-simulation of the RLCSYN synthesized model also matches the MATLAB simulation of the reduced transfer function.

Table 2.3: Input impedance reduction (SPRIM/IOPOR) and synthesis (RLCSYN)
System | Dimension | R | C | L | States | Inputs/Outputs
Original | … | … | … | … | … | …
SPRIM/IOPOR | … | … | … | … | … | …

2.5.2 SISO RLC network

We reduce the SISO RLC transmission line in Fig. 2.7. The circuit is driven by the voltage u, thus it is of admittance type (2.2). In [44], reduction via the admittance transfer function was shown using various methods, in particular the dominant spectral zero method (Dominant SZM). Here we also reduce the circuit via its impedance formulation (2.4), by removing the voltage source and driving the circuit via a current flowing into its terminal. After reduction and synthesis via the input impedance, we show that the reduced admittance is recovered in simulation as well. This is done easily by driving the reduced synthesized impedance model via a voltage at the input terminal.
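The entry-wise unstamping rules used in Sect. 2.5.1 apply to any reduced conductance matrix. A hedged sketch follows; the matrix values and the helper name unstamp_conductance are illustrative, not part of RLCSYN itself:

```python
import numpy as np

def unstamp_conductance(G, tol=1e-12):
    """Unstamp a conductance matrix into resistor values:
    R_ij = -1/G[i,j] between nodes i and j, and R_i0 = 1/(row sum)
    between node i and ground (nodes are numbered from 1, ground is 0)."""
    n = G.shape[0]
    resistors = {}
    for i in range(n):
        for j in range(i + 1, n):
            if abs(G[i, j]) > tol:          # off-diagonal entry -> R between i, j
                resistors[(i + 1, j + 1)] = -1.0 / G[i, j]
        row_sum = G[i, :].sum()
        if row_sum > tol:                   # nonzero row sum -> R to ground
            resistors[(i + 1, 0)] = 1.0 / row_sum
    return resistors

# Illustrative 2-node stamp: 1 Ohm between nodes 1 and 2,
# plus 2 Ohm from node 1 to ground.
G = np.array([[ 1.5, -1.0],
              [-1.0,  1.0]])
R = unstamp_conductance(G)
assert abs(R[(1, 2)] - 1.0) < 1e-12 and abs(R[(1, 0)] - 2.0) < 1e-12
```

Node 2 has a zero row sum, so no resistor to ground is generated for it, mirroring how unstamping preserves the grounding pattern of the stamped matrix.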

Figure 2.6: Original, reduced and synthesized systems: PRIMA, SPRIM/IOPOR. The MATLAB reduced (red) and SPICE synthesized (green) models overlap, as expected.

Figure 2.7: Transmission line for the example of Sect. 2.5.2.

Fig. 2.8 shows the input impedance of the original model (2.4) and of the reduction with Dominant SZM. The model is passive and stable [44] and is synthesized using the impedance Foster realization of Sect. 2.4.2 (the circuit simulation of the synthesized model is superimposed on the MATLAB simulation). For comparison, the direct admittance-based Dominant SZM reduction [44], synthesized with the Foster admittance approach (Sect. 2.4.2), is shown in Fig. 2.10. Both the reduced impedance (Fig. 2.8) and admittance (Fig. 2.10) capture the behavior of the original model well over the entire frequency range, and can also reproduce oscillations at dominant frequency points. The SPRIM/IOPOR reduced impedance model is shown in Fig. 2.9, together with the response obtained in the re-simulation of the RLCSYN synthesized model. Note that if the original circuit had been reduced directly from the admittance form (2.2), synthesis by unstamping via RLCSYN would have required controlled sources to model the input of the reduced network [36]. The benefit of the admittance-to-impedance transformation is seen in Fig. 2.11. By reducing the system in impedance form with SPRIM/IOPOR and synthesizing (2.5) [via RLCSYN unstamping of the second order form (2.10)], we are

able to recover the reduced admittance (2.6) as well. This result is shown in Fig. 2.11: the original admittance transfer function (2.2) is compared to the circuit re-simulation of the synthesized model, after reinserting the voltage source as an input according to (2.6). The approximation is good over the entire frequency range, except close to 0, where the oscillating behavior is over-estimated in the reduced model. Table 2.4 summarizes the reduction and synthesis. With both methods, the dimension (number of nodes in the netlist), the number of internal variables (states), and the number of circuit elements were all reduced.

Figure 2.8: Input impedance transfer function: original, reduced with Dominant SZM (red) and synthesized via Foster impedance (green).
Figure 2.9: Input impedance transfer function: original, reduced with SPRIM/IOPOR (red) and synthesized with RLCSYN (green).
Figure 2.10: Input admittance transfer function: original, reduced with Dominant SZM in admittance form (red) and synthesized with Foster admittance (green).
Figure 2.11: Input admittance transfer function: the original admittance response is compared to the admittance response of the reduced, RLCSYN synthesized SPRIM/IOPOR model.
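The admittance recovery used above can be checked numerically: for a one-terminal model in MNA form, voltage-driving the terminal yields exactly the reciprocal of the current-driven impedance. A sketch with illustrative matrices, not the transmission line model:

```python
import numpy as np

# One-terminal model in MNA form: current-driven gives the impedance
# Z(s) = e1^T (G + sC)^{-1} e1; voltage-driving the same terminal gives
# the admittance Y(s) = 1/Z(s). Matrix values are illustrative.
G = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])
C = np.diag([1.0, 0.5])

def Z(s):
    """Impedance: inject 1 A into node 1, read the node-1 voltage."""
    A = G + s * C
    return np.linalg.solve(A, np.array([1.0, 0.0]))[0]

def Y(s):
    """Admittance: set v1 = 1 V, solve the internal node, read the
    injected current; this is the Schur complement of the internal block."""
    A = G + s * C
    v_int = np.linalg.solve(A[1:, 1:], -A[1:, 0])
    return A[0, 0] + A[0, 1:] @ v_int

s0 = 1.0j
assert abs(Y(s0) - 1.0 / Z(s0)) < 1e-12
```

This is the same mechanism exploited in Fig. 2.11: reconnecting a voltage source at the terminal of the synthesized impedance netlist reproduces the admittance response without any extra realization step.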

Table 2.4: Impedance reduction and synthesis for the transmission line in Fig. 2.7
System | Dimension | R | C | L | States | Simulation time
Original | … | … | … | … | … | … s
Dominant SZM | … | … | … | … | … | … s
SPRIM/IOPOR | … | … | … | … | … | … s

2.5.3 MIMO RLC network

We reduce the MIMO RLC netlist resulting from the parasitic extraction of a coil structure [33]. The model has 4 pins (external nodes). Pin 4 is connected to other circuit nodes only via capacitors, which causes the original model (2.4) to have a pole at 0. The example shows that: (1) the SPRIM/IOPOR model preserves the terminals and is synthesizable with RLCSYN without controlled sources, and (2) structure preserving projections may perturb the location of poles and zeros at 0, affecting the approximation quality at low frequencies (a technique which remedies this effect is proposed in Chapter 5). Fig. 2.12 shows the simulation of the transfer function from input 4 to output 4, which clearly reflects the presence of the pole at 0 through the large response magnitude at low frequencies. By inspecting the response from input 3 to output 3, however, we notice in Fig. 2.13 that the SPRIM/IOPOR model is less accurate around DC than PRIMA. The SPRIM/IOPOR projection (2.7) approximates the original pole at 0 by a small pole (not at 0), but still places zeros at 0, due to the dependencies created in the Rosenbrock matrix Ã_z discussed earlier. Such pole-zero pairs can no longer cancel numerically, affecting the approximation quality at low frequencies. The alternative is to ground pin 4 prior to reduction. This removes the pole at 0 from the original model, resolves the corresponding numerical cancellation effects, and improves the approximation around DC. As seen from Fig. 2.15, SPRIM/IOPOR applied to the remaining 3-terminal system gives a better approximation than PRIMA over the entire frequency range. The transient simulation in Fig. 2.14 confirms that the SPRIM/IOPOR model is both accurate and stable.
With pin 4 grounded, however, we lose the ability to (re)connect the synthesized model in simulation via all the terminals. This scheme was adopted here only to demonstrate possible numerical sensitivities of certain reduction projections to the presence/absence of system poles at 0. In Chapters 4 and 5, robust reduction methods are presented which overcome such numerical limitations when reducing large, multi-terminal circuits, while preserving the connectivity at all terminal nodes.

2.6 Concluding remarks

A framework was presented for the reduction and synthesis of multi-terminal systems arising in circuit simulation. An admittance-to-impedance conversion was proposed as a pre-model-reduction step and shown to enable synthesis without controlled sources.

This simple yet elegant approach gives the theoretical foundation for detaching voltage sources or non-linear elements before the reduction phase and reinserting them easily afterwards. Two synthesis approaches were described: RLCSYN [93] synthesis by unstamping (for MIMO systems) and Foster realization (for SISO transfer functions). The reduction and synthesis framework was tested on several examples of moderate size. It forms the basis for the multi-terminal reduction methods of Chapters 3, 4, 5 and 6, which are focused on more challenging industrial problems.

Figure 2.12: Input impedance transfer function with v_4 kept: H_44 for PRIMA, SPRIM/IOPOR and RLCSYN realization.
Figure 2.13: Input impedance transfer function with v_4 kept: H_33 for PRIMA, SPRIM/IOPOR and RLCSYN realization.
Figure 2.14: Transient simulation with v_4 grounded: voltage measured at node 2 for SPRIM/IOPOR (from RLCSYN realization).
Figure 2.15: Input impedance transfer function with v_4 grounded: H_33 for PRIMA, SPRIM/IOPOR and RLCSYN realization.

Chapter 3

Reduction of multi-terminal R/RC networks

A multi-terminal reduction framework for R/RC circuits is presented, based on two related methodologies: ReduceR [80] and PACT [56]. The underlying reducing projection is shown here to preserve the positivity of resistors and the path resistance between circuit terminals, and also to support a partition-based implementation. This is an important result, which guarantees that when circuits are reduced by parts, the same desirable properties are retained (moment matching, passivity, terminal connectivity) which are normally ensured by an unpartitioned approach. Several realistic multi-terminal parasitic networks are reduced in this framework, demonstrating approximation quality and significant computational speed-ups in re-simulations.

3.1 Introduction

With the decrease in feature sizes and the increasing complexity of VLSI circuit designs, parasitic effects have to be analyzed in order to properly understand and control the relative impact of different components on a chip. The simulation of parasitic R and RC networks is, however, often a difficult task, as these networks may contain millions of nodes, among which thousands are connection nodes (terminals) to non-linear devices such as diodes or transistors. Reducing such multi-terminal networks to circuits with fewer nodes and elements can help to perform the desired simulations at much lower computational cost. In this chapter, the focus is on a multi-terminal reduction framework based on [80] for R and [56] for RC networks, denoted here as ReduceR and PACT, respectively. These

methods turn out to be governed by the same reducing projection, whose important advantages are: passivity preservation, moment matching at DC, and an immediate, natural preservation of the connectivity at the terminal nodes (what in Chapter 2 is referred to as input/output structure preservation). ReduceR achieves tremendous reduction rates for very large resistor networks with thousands of terminals, but the question of whether the resulting reduced resistor networks have only positive resistors remains open in [80]. Here, it is shown that the resistors in the reduced network are indeed positive (a different proof was derived independently in [91]). This result extends immediately to the reduction of multi-terminal RC networks governed by the same underlying projection as ReduceR, for instance that underlying PACT, although this was not shown in the original reference [56]. Turning to the PACT-based reduction of multi-terminal RC networks, this chapter proposes a partition-based implementation of PACT, which has the potential to reduce very large RC networks more efficiently, in cases where forming a reducing projection directly on the entire network is too costly or unfeasible. It can also improve the overall sparsity of the reduced network, by applying fill-reducing node reorderings per subnet to identify fill-creating nodes. More advanced partitioning and reordering strategies, especially aimed at improving sparsity, are given in Chapter 4. The partition-based RC reduction in addition reveals a simple and efficient solution to the problem of computing path resistances between terminals.

This chapter is organized as follows. In Sect. 3.1.1, the general setup for R/RC reduction is described. Sect. 3.2 gives the proof of the positivity of resistors. Sect. 3.3 derives the partition-based reduction of RC circuits, shows its mathematical equivalence to the PACT projection, provides the solution for the computation of path resistances, and describes the actions for further improving sparsity. Numerical results are provided in Sect. 3.4, and Sect. 3.5 concludes.

3.1.1 Reduction setup for R/RC networks

The model reduction setup for R and RC networks in this chapter follows [80] and [56], respectively, and is presented here in preparation for the additional derivations of this chapter. Consider the modified nodal analysis (MNA) [37] description of an RC circuit:

(G + sC) x(s) = B u(s),    (3.1)

where the MNA matrices G, C are symmetric and non-negative definite, corresponding to the stamps of resistor and capacitor values, respectively. x ∈ R^{m+p} denotes the node voltages (measured at the m internal nodes and the p terminals), and m + p is the dimension of (3.1). u ∈ R^p are the currents injected into the terminals. The outputs are the voltage drops at the terminal nodes: y(s) = B^T x(s). The underlying matrix dimensions are: G, C ∈ R^{(m+p)×(m+p)}, B ∈ R^{(m+p)×p}. Let the nodes x be split into terminal nodes x_P to

be preserved, and internal nodes x_I to be eliminated:

( [ G_I  G_C ; G_C^T  G_P ] + s [ C_I  C_K ; C_K^T  C_P ] ) [ x_I ; x_P ] = [ 0 ; B_P ] u.    (3.2)

Let the following projection:

V = [ −G_I^{−1} G_C ; I ],    (3.3)

where I ∈ R^{p×p} is the identity matrix, be applied to (3.2). From simple linear algebra, the Galerkin projection Ĝ = V^T G V, Ĉ = V^T C V, B̂ = V^T B reduces (3.2) to:

(Ĝ + sĈ) x_P = B̂ u,   where:    (3.4)
Ĝ = G_P − G_C^T G_I^{−1} G_C,   B̂ = B_P,    (3.5)
Ĉ = C_P + C_K^T W + W^T C_K + W^T C_I W,   W = −G_I^{−1} G_C.    (3.6)

The projection matrix (3.3) is the one governing ReduceR [80] for the reduction of resistor networks, and PACT [56] for RC networks. An important property of (3.3) is that it preserves the path resistance (defined in Sect. 3.3) of the original network (3.2). Especially relevant for RC reduction is that (3.3) preserves the 0th and 1st moments at DC (s = 0) of the original network, where the path resistance is precisely the first moment. This is formalized as a proposition in Chapter 4, where a more detailed presentation of PACT is provided. Furthermore, being a congruence transformation acting on non-negative definite matrices, (3.3) preserves the passivity of (3.2) (the proof is provided in [56]). Finally, notice that, since B̂ = B_P, the input incidence matrix of the reduced network is the same as in the original, and so the connectivity via the terminal nodes is preserved (the importance of this feature, also known as input/output structure preservation in the context of synthesis, is discussed in Chapter 2).

In the following sections, the solution to two problems is presented. In Sect. 3.2, it is shown that the resistor network characterized by the reduced conductance matrix (3.5) contains only positive resistor values. This result guarantees that the ReduceR [80] methodology generates positive-only resistors, and it also holds for the multi-terminal RC reduction methods of Sect. 3.3 and Chapter 4. In Sect. 3.3, a partition-based reduction of multi-terminal RC networks is derived and shown to be mathematically equivalent to PACT, thus inheriting its afore-mentioned properties. The partition-based reduction provides additional advantages, such as the ability to split a network into sub-parts and reduce them individually, while satisfying the same requirements on accuracy, moment matching, passivity preservation and terminal connectivity.

3.2 Reduction of R/RC networks with positive-only resistors

The graph-based multi-terminal model reduction methods for R [80] and RC networks [49] are based on the projection (3.3). Notice from (3.5) that the reduced Ĝ is the Schur complement of the block G_I in G. It is demonstrated here that, if the resistors in the original network are positive, the resistors in the reduced circuit characterized by Ĝ are also guaranteed to be positive (for a different, independently derived proof see [91]). As a special case, it is also shown that if the original network has no resistors connected to the ground node, the reduced network also has no resistors to ground. For simplicity of presentation, the following derivations are based on the reduction of purely resistive networks. The generalization to RC follows immediately, as the underlying projection matrix (3.3) is the same in both cases.

Consider a resistor network described by the conductance matrix G of (3.1), with n = m + p nodes [of which m are internal nodes and p are terminals (external nodes)]. Assume that the graph associated with this resistor network is strongly connected, i.e., there is a path of resistive connections between every pair of nodes in the network; otherwise the network is split into its strongly connected components, and the derivations are applied per component (this is the subject of Sect. 3.3). Naturally, assume also that all resistors in the original network are positive. We show that if the original conductance matrix G is characterized by positive resistors only, the reduced conductance matrix Ĝ = G_P − G_C^T G_I^{−1} G_C is unstamped into positive resistors as well.

3.2.1 Properties of G and unstamping

Before giving the proof, we describe the structure of the conductance matrix G and identify its special properties. G has the following form:

G = [ g_{1,1}  −g_{1,2}  ⋯  −g_{1,n} ; −g_{2,1}  g_{2,2}  ⋯  −g_{2,n} ; ⋮ ; −g_{n,1}  −g_{n,2}  ⋯  g_{n,n} ],    (3.7)

where g_{i,j} > 0 for i, j = 1, n.¹ The resistor values in the network and their topology are unstamped ("read off") from G as follows:

1. the resistor between node i and node j is r_{i,j} = g_{i,j}^{−1}, i ≠ j;

¹ i = 1, n stands for i = 1 … n
The resistor values in the network and their topology are unstamped ("read off") from $G$ as follows:
1. the resistor between node $i$ and node $j$ is: $r_{i,j} = g_{i,j}^{-1}$, $i \neq j$;

¹ $i = \overline{1,n}$ stands for $i = 1, \ldots, n$.

2. the resistor between node $i$ and the ground node is: $r_i = \big( g_{i,i} - \sum_{j=1, j\neq i}^{n} g_{i,j} \big)^{-1}$; if there is no resistor between node $i$ and ground ($r_i = \infty$), then the sum of the $i$-th row of $G$ is zero: $g_{i,i} - \sum_{j=1, j\neq i}^{n} g_{i,j} = 0$.

Throughout the following derivations, we denote by $G_{i,j}$ the $(i,j)$-th entry of matrix $G$; thus $G_{i,j} = -g_{i,j}$ for $i \neq j$ and $G_{i,i} = g_{i,i}$. The properties of a resistor network characterized by the conductance matrix (3.7) are summarized in Claim 3.2.1.

Claim 3.2.1 The conductance matrix $G$ of a resistive network with positive resistors satisfies the following properties:
1. $G \succeq 0$, $G = G^T$, i.e., $G$ is symmetric positive semi-definite;
   (a) if the network is grounded (i.e., voltages are measured with respect to a reference node), then $G \succ 0$, i.e., $G$ is positive definite;
2. $G_{i,i} > 0$, $i = \overline{1,n}$, i.e., diagonal entries are positive;
3. $G_{i,j} \leq 0$, $i,j = \overline{1,n}$, $i \neq j$, i.e., off-diagonal entries are negative (or 0) and at least one is non-zero;
4. $G$ is diagonally dominant, i.e.:
$$G_{i,i} \geq \sum_{j=1, j\neq i}^{n} |G_{i,j}|; \quad (3.8)$$
   (a) if the network is ungrounded, then (3.8) is satisfied with equality (each row/column sum of $G$ is zero).

Unstamping the reduced Ĝ

For the reduced resistive network to be characterized by positive resistors only, $\hat{G}$ must also satisfy the properties of Claim 3.2.1. We prove the following:

Theorem 3.2.1 Given a conductance matrix $G$ which satisfies Claim 3.2.1, the reduced matrix $\hat{G} = G_P - G_C^T G_I^{-1} G_C$ obtained by eliminating the internal nodes from the network also satisfies Claim 3.2.1. Hence the reduced $\hat{G}$ is unstamped into positive resistors only.

Proof The proof follows by induction. To this end, we introduce the following notation: let $G^{(1)}, G^{(2)}, \ldots, G^{(m)}$ denote the reduced matrices obtained by eliminating one, two, ..., $m$ internal nodes respectively from $G$. With this notation, $G^{(0)} = G$ is the original conductance matrix and $G^{(m)} = \hat{G}$ is the final reduced conductance matrix, with all $m$ internal nodes eliminated.
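Before working through the induction, the statement of Theorem 3.2.1 can be checked numerically. The sketch below (the 4-node network and its conductance values are invented for illustration) stamps a positive-resistor network, eliminates the internal nodes via the Schur complement $\hat{G} = G_P - G_C^T G_I^{-1} G_C$, and verifies that the reduced matrix still satisfies the sign and dominance properties of Claim 3.2.1:

```python
import numpy as np

def stamp(n, resistors, grounded):
    """Stamp a conductance matrix: G[i,i] accumulates incident conductances,
    G[i,j] = -g for a resistor of conductance g between nodes i and j."""
    G = np.zeros((n, n))
    for i, j, g in resistors:
        G[i, i] += g; G[j, j] += g
        G[i, j] -= g; G[j, i] -= g
    for i, g in grounded:
        G[i, i] += g
    return G

# Hypothetical network: nodes 0,1 internal; nodes 2,3 terminals;
# one resistor to ground at node 3. All conductances positive.
G = stamp(4, [(0, 1, 1.0), (0, 2, 2.0), (1, 3, 0.5), (2, 3, 1.0)], [(3, 0.25)])

# Internal block G_I, coupling G_C, terminal block G_P; Schur complement:
G_I, G_C, G_P = G[:2, :2], G[:2, 2:], G[2:, 2:]
G_hat = G_P - G_C.T @ np.linalg.solve(G_I, G_C)

# Claim 3.2.1 holds for the reduced matrix:
assert np.allclose(G_hat, G_hat.T)                 # symmetric
assert np.all(np.diag(G_hat) > 0)                  # positive diagonal
off = G_hat - np.diag(np.diag(G_hat))
assert np.all(off <= 1e-12)                        # non-positive off-diagonals
assert np.all(np.diag(G_hat) + 1e-12 >= np.sum(np.abs(off), axis=1))  # dominance
```

Unstamping `G_hat` by the rules above then yields positive resistors only, in line with the theorem.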

Case: $m = 1$ eliminated internal node. We show that Theorem 3.2.1 holds for the reduced network obtained by eliminating the first internal node. Partition
$$G = \begin{bmatrix} G_{1,1} & G_{1,2} & \cdots & G_{1,n} \\ G_{2,1} & G_{2,2} & \cdots & G_{2,n} \\ \vdots & & \ddots & \vdots \\ G_{n,1} & G_{n,2} & \cdots & G_{n,n} \end{bmatrix} = \begin{bmatrix} G_{1,1} & G_{c_1} \\ G_{c_1}^T & G_{P_1} \end{bmatrix}, \quad X_1 = \begin{bmatrix} 1 & -G_{1,1}^{-1} G_{c_1} \\ 0 & I_{n-1} \end{bmatrix}, \quad (3.9)$$
$$G_1 = X_1^T G X_1 = \begin{bmatrix} G_{1,1} & 0 \\ 0 & G_{P_1} - G_{c_1}^T G_{1,1}^{-1} G_{c_1} \end{bmatrix}. \quad (3.10)$$
Let $G^{(1)} := G_{P_1} - G_{c_1}^T G_{1,1}^{-1} G_{c_1}$, $G^{(1)} \in \mathbb{R}^{(n-1)\times(n-1)}$, which represents the reduced network with the first internal node eliminated. We have:
$$G^{(1)} = G_{P_1} - G_{c_1}^T G_{1,1}^{-1} G_{c_1} \quad (3.11)$$
$$= \begin{bmatrix} G_{2,2} - \tfrac{1}{G_{1,1}} G_{2,1} G_{1,2} & G_{2,3} - \tfrac{1}{G_{1,1}} G_{2,1} G_{1,3} & \cdots & G_{2,n} - \tfrac{1}{G_{1,1}} G_{2,1} G_{1,n} \\ G_{3,2} - \tfrac{1}{G_{1,1}} G_{3,1} G_{1,2} & G_{3,3} - \tfrac{1}{G_{1,1}} G_{3,1} G_{1,3} & \cdots & G_{3,n} - \tfrac{1}{G_{1,1}} G_{3,1} G_{1,n} \\ \vdots & \vdots & \ddots & \vdots \\ G_{n,2} - \tfrac{1}{G_{1,1}} G_{n,1} G_{1,2} & G_{n,3} - \tfrac{1}{G_{1,1}} G_{n,1} G_{1,3} & \cdots & G_{n,n} - \tfrac{1}{G_{1,1}} G_{n,1} G_{1,n} \end{bmatrix}. \quad (3.12)$$

We show that $G^{(1)}$ as in (3.12) satisfies the properties of Claim 3.2.1:

1. $G^{(1)} \succeq 0$, i.e., $G^{(1)}$ is positive semi-definite: $X_1$ from (3.10) defines a congruence transformation, hence, because $G \succeq 0$, also $G_1 \succeq 0$. This implies that for all $Y_1$, $Y_1^T G_1 Y_1 \succeq 0$. In particular, let $Y_1 = \begin{bmatrix} 0 \\ I_{n-1} \end{bmatrix}$; then $Y_1^T G_1 Y_1 = G_{P_1} - G_{c_1}^T G_{1,1}^{-1} G_{c_1} = G^{(1)} \succeq 0$.

2. $G^{(1)}_{k,k} > 0$, $k = \overline{1,n-1}$, i.e., $G^{(1)}$ has positive diagonal entries: this follows from $G^{(1)} \succeq 0$, thus $y^T G^{(1)} y \geq 0$ for $y \in \mathbb{C}^{n-1}$. In particular let $y = e_k$, the $k$-th unit vector; then $G^{(1)}_{k,k} = e_k^T G^{(1)} e_k \geq 0$. It remains to show that $e_k^T G^{(1)} e_k \neq 0$. Assume for a moment that $e_k^T G^{(1)} e_k = 0$. Then, for the corresponding index $i \in \overline{2,n}$:
$$e_k^T G^{(1)} e_k = G^{(1)}_{k,k} = G_{i,i} - \frac{1}{G_{1,1}} G_{i,1} G_{1,i} = 0 \quad (3.13)$$
$$\Rightarrow \quad G_{i,i}\, G_{1,1} = G_{i,1}\, G_{1,i}. \quad (3.14)$$

Due to the diagonal dominance (3.8), we know $G_{i,i} \geq |G_{i,1}|$ and $G_{1,1} \geq |G_{1,i}|$, with strict inequality outside the degenerate case discussed below. Hence (3.14) is false, and the assumption that $e_k^T G^{(1)} e_k = G^{(1)}_{k,k} = 0$ is false.

N.B. A final note on the possibility of (3.14) being true: this holds only if $G_{i,i} = G_{1,1} = |G_{i,1}| = |G_{1,i}|$ (recall from $G = G^T$ that $G_{i,1} = G_{1,i}$). This would mean that, prior to the elimination of node 1, node $i$ and node 1 form one strongly connected component of their own, with a resistor of value $\frac{1}{G_{1,1}}$ as the edge between them, and no other outgoing edges from nodes 1 and $i$ to the other circuit nodes. This however violates the assumption that $G$ forms one strongly connected component.

3. $G^{(1)}_{k,l} \leq 0$ for $k,l = \overline{1,n-1}$, $k \neq l$, i.e., $G^{(1)}$ has non-positive off-diagonal entries:
$$G^{(1)}_{k,l}\Big|_{k \neq l} = G_{i,j} - \frac{1}{G_{1,1}} G_{i,1} G_{1,j}, \quad i,j = \overline{2,n}, \; i \neq j.$$
From the properties of $G$ (see Claim 3.2.1) we know that for $i,j = \overline{2,n}$, $i \neq j$: $G_{i,j} \leq 0$, $G_{i,1} \leq 0$, $G_{1,j} \leq 0$ and $G_{1,1} > 0$, thus:
$$G_{i,j} - \frac{1}{G_{1,1}} G_{i,1} G_{1,j} \leq 0.$$

4. $G^{(1)}$ is diagonally dominant², i.e.:
$$G^{(1)}_{k,k} \geq \sum_{l=1, l\neq k}^{n-1} |G^{(1)}_{k,l}|. \quad (3.15)$$
This is equivalent to showing that, for $i \geq 2$:
$$G_{i,i} - \frac{1}{G_{1,1}} G_{i,1} G_{1,i} \geq \sum_{j=2, j\neq i}^{n} \Big| G_{i,j} - \frac{1}{G_{1,1}} G_{i,1} G_{1,j} \Big|. \quad (3.16)$$
Using the triangle inequality:
$$\sum_{j=2, j\neq i}^{n} \Big| G_{i,j} - \frac{1}{G_{1,1}} G_{i,1} G_{1,j} \Big| \leq \underbrace{\sum_{j=2, j\neq i}^{n} |G_{i,j}|}_{:=S_1} + \underbrace{\sum_{j=2, j\neq i}^{n} \frac{1}{G_{1,1}} |G_{i,1}|\,|G_{1,j}|}_{:=S_2}. \quad (3.17)$$
Recall from Claim 3.2.1 that $G$ is diagonally dominant, thus:
$$G_{i,i} \geq \sum_{j=1, j\neq i}^{n} |G_{i,j}| = |G_{i,1}| + \sum_{j=2, j\neq i}^{n} |G_{i,j}| \quad (3.18)$$
$$\Rightarrow \quad S_1 \leq G_{i,i} - |G_{i,1}|. \quad (3.19)$$

² Part of this proof is inspired by a homework assignment from the course CAAM 453/553 Numerical Analysis I, taken at Rice University in 2005, taught by Prof. Mark Embree.

Also:
$$S_2 = \sum_{j=2, j\neq i}^{n} \frac{1}{G_{1,1}} |G_{i,1}|\,|G_{1,j}| = \frac{|G_{i,1}|}{G_{1,1}} \sum_{j=2, j\neq i}^{n} |G_{1,j}|. \quad (3.20)$$
Again, since $G$ is diagonally dominant, we know:
$$G_{1,1} \geq \sum_{j=2}^{n} |G_{1,j}| = |G_{1,i}| + \sum_{j=2, j\neq i}^{n} |G_{1,j}| \quad (3.21)$$
$$\Rightarrow \quad \sum_{j=2, j\neq i}^{n} |G_{1,j}| \leq G_{1,1} - |G_{1,i}|. \quad (3.22)$$
Inserting (3.22) into (3.20) yields:
$$S_2 \leq \frac{|G_{i,1}|}{G_{1,1}} \big( G_{1,1} - |G_{1,i}| \big) = |G_{i,1}| - \frac{1}{G_{1,1}} |G_{i,1}|\,|G_{1,i}|. \quad (3.23)$$
Adding (3.19) and (3.23):
$$S_1 + S_2 \leq G_{i,i} - \frac{1}{G_{1,1}} |G_{i,1}|\,|G_{1,i}| = G_{i,i} - \frac{1}{G_{1,1}} G_{i,1} G_{1,i}. \quad (3.24)$$
Recalling (3.17), we conclude that for $i \geq 2$:
$$\sum_{j=2, j\neq i}^{n} \Big| G_{i,j} - \frac{1}{G_{1,1}} G_{i,1} G_{1,j} \Big| \leq S_1 + S_2 \leq G_{i,i} - \frac{1}{G_{1,1}} G_{i,1} G_{1,i},$$
which is nothing but (3.16). Hence $G^{(1)}$ is diagonally dominant.

(a) We also show that if the original network is ungrounded [item 4(a) of Claim 3.2.1 holds for $G$], then the network determined by $G^{(1)}$ is also ungrounded. In other words, if the row sums of $G$ are zero, then the row sums of $G^{(1)}$ are also zero, i.e.:
$$G^{(1)}_{k,k} = \sum_{l=1, l\neq k}^{n-1} |G^{(1)}_{k,l}|. \quad (3.25)$$
From $G^{(1)}_{k,k} > 0$ and $G^{(1)}_{k,l} \leq 0$ for $k \neq l$, (3.25) is equivalent to the row sums of $G^{(1)}$ being zero:
$$G^{(1)}_{k,k} + \sum_{l=1, l\neq k}^{n-1} G^{(1)}_{k,l} = 0 \quad (3.26)$$
$$\Big[ G_{i,i} - \frac{1}{G_{1,1}} G_{i,1} G_{1,i} \Big] + \sum_{j=2, j\neq i}^{n} \Big[ G_{i,j} - \frac{1}{G_{1,1}} G_{i,1} G_{1,j} \Big] = 0 \quad (3.27)$$
$$\Big[ G_{i,i} + \sum_{j=2, j\neq i}^{n} G_{i,j} \Big] - \frac{1}{G_{1,1}} G_{i,1} \Big[ G_{1,i} + \sum_{j=2, j\neq i}^{n} G_{1,j} \Big] = 0. \quad (3.28)$$

From (3.12), note that the above derivations hold for $i \geq 2$. Also recall the original assumption that the row/column sums of $G$ are zero. From the $i$-th row sum of $G$ being zero, we have:
$$G_{i,i} + \sum_{j=1, j\neq i}^{n} G_{i,j} = 0 \quad \Rightarrow \quad G_{i,1} = -\Big[ G_{i,i} + \sum_{j=2, j\neq i}^{n} G_{i,j} \Big]. \quad (3.29)$$
From the 1st row sum of $G$ being zero, we have:
$$G_{1,1} + \sum_{j=2}^{n} G_{1,j} = 0 \quad \Rightarrow \quad G_{1,1} = -\Big[ G_{1,i} + \sum_{j=2, j\neq i}^{n} G_{1,j} \Big]. \quad (3.30)$$
Replacing (3.29) and (3.30) in (3.28), we arrive at:
$$-G_{i,1} - \frac{1}{G_{1,1}} G_{i,1} \cdot (-G_{1,1}) = 0 \quad \Leftrightarrow \quad 0 = 0. \quad (3.31)$$
This shows that (3.25) holds and that, if the row sums of $G$ are zero, the row sums of $G^{(1)}$ are also zero. This implies that the reduced $G^{(1)}$ has no resistors to ground.

With all four properties of Claim 3.2.1 shown for $G^{(1)}$, this concludes the first induction step.

Case: $m - 1$ eliminated internal nodes. Assume that the reduced $G^{(m-1)} \in \mathbb{R}^{(p+1)\times(p+1)}$ obtained by eliminating $m-1$ internal nodes satisfies the properties of Claim 3.2.1. Hence $G^{(m-1)}$ is characterized by positive-only resistors.

Case: $m$ eliminated internal nodes. At the $m$-th elimination step, the final reduced matrix $G^{(m)}$ is obtained from $G^{(m-1)}$ as follows:
$$G^{(m-1)} = \begin{bmatrix} G^{(m-1)}_{1,1} & G^{(m-1)}_{1,2} & \cdots & G^{(m-1)}_{1,p+1} \\ G^{(m-1)}_{2,1} & G^{(m-1)}_{2,2} & \cdots & G^{(m-1)}_{2,p+1} \\ \vdots & & \ddots & \vdots \\ G^{(m-1)}_{p+1,1} & G^{(m-1)}_{p+1,2} & \cdots & G^{(m-1)}_{p+1,p+1} \end{bmatrix} = \begin{bmatrix} G^{(m-1)}_{1,1} & G^{(m-1)}_{c_1} \\ G^{(m-1)\,T}_{c_1} & G^{(m-1)}_{P_1} \end{bmatrix},$$
$$X_m = \begin{bmatrix} 1 & -\big(G^{(m-1)}_{1,1}\big)^{-1} G^{(m-1)}_{c_1} \\ 0 & I_p \end{bmatrix}, \quad G_m = X_m^T G^{(m-1)} X_m = \begin{bmatrix} G^{(m-1)}_{1,1} & 0 \\ 0 & G^{(m-1)}_{P_1} - G^{(m-1)\,T}_{c_1} \big(G^{(m-1)}_{1,1}\big)^{-1} G^{(m-1)}_{c_1} \end{bmatrix}.$$

Then $G^{(m)} := G^{(m-1)}_{P_1} - G^{(m-1)\,T}_{c_1} \big(G^{(m-1)}_{1,1}\big)^{-1} G^{(m-1)}_{c_1}$, $G^{(m)} \in \mathbb{R}^{p\times p}$, which represents the reduced network with all $m$ internal nodes eliminated. Similarly to the case of one eliminated internal node (3.12), we have:
$$G^{(m)} = G^{(m-1)}_{P_1} - G^{(m-1)\,T}_{c_1} \big(G^{(m-1)}_{1,1}\big)^{-1} G^{(m-1)}_{c_1} \quad (3.32)$$
$$= \begin{bmatrix} G^{(m-1)}_{2,2} - \dfrac{G^{(m-1)}_{2,1} G^{(m-1)}_{1,2}}{G^{(m-1)}_{1,1}} & \cdots & G^{(m-1)}_{2,p+1} - \dfrac{G^{(m-1)}_{2,1} G^{(m-1)}_{1,p+1}}{G^{(m-1)}_{1,1}} \\ \vdots & \ddots & \vdots \\ G^{(m-1)}_{p+1,2} - \dfrac{G^{(m-1)}_{p+1,1} G^{(m-1)}_{1,2}}{G^{(m-1)}_{1,1}} & \cdots & G^{(m-1)}_{p+1,p+1} - \dfrac{G^{(m-1)}_{p+1,1} G^{(m-1)}_{1,p+1}}{G^{(m-1)}_{1,1}} \end{bmatrix}. \quad (3.33)$$
Note that (3.33) has the same structure and properties as (3.12). Hence the proof that $G^{(m)}$ satisfies Claim 3.2.1 [based on the fact that $G^{(m-1)}$ satisfies these properties] follows in the same manner as for $G^{(1)}$. Having completed the final induction step, we conclude that the final reduced conductance matrix $\hat{G} = G^{(m)}$ inherits the properties of the original $G$ and is characterized by positive-only resistors. Also, if $G$ has no resistors to ground, then $\hat{G} = G^{(m)}$ will have no resistors to ground either. ∎

3.3 Partition-based reduction of RC networks

In this section, a partition-based reduction for multi-terminal RC networks is derived, and its equivalence to PACT [56] is demonstrated. A change in notation is introduced from bold to calligraphic letters, so as to more easily distinguish between blocks of an unpartitioned vs. a partitioned matrix, respectively. The starting system matrices underlying (3.1) are now $\mathcal{G}$, $\mathcal{C}$, $\mathcal{B}$, to be partitioned and reduced as described next. Let $Q$ be the permutation of the circuit nodes which reveals the strongly connected components³ (SCCs) of $\mathcal{G}$, $QQ^T = I$. The original system permuted with $Q$ is then:
$$(Q^T \mathcal{G} Q + s\, Q^T \mathcal{C} Q)\, Q^T x(s) = Q^T \mathcal{B} u(s) \quad (3.34)$$
$$\left( \begin{bmatrix} \mathcal{G}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix} + s \begin{bmatrix} \mathcal{C}_1 & \mathcal{C}_{12} \\ \mathcal{C}_{12}^T & \mathcal{C}_2 \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \mathcal{B}_1 \\ \mathcal{B}_2 \end{bmatrix} u. \quad (3.35)$$
If $\mathcal{G}$ has $N$ SCCs, in (3.35) $\mathcal{G}_1$ denotes the first SCC of $\mathcal{G}$, and the remaining $N-1$ SCCs are grouped into $\mathcal{G}_2 = \mathrm{blockdiag}(\mathcal{G}_{2,2}, \ldots, \mathcal{G}_{N,N})$.
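The permutation $Q$ and the SCC grouping can be obtained with standard sparse-graph routines. A minimal sketch (the 5-node conductance matrix is invented for illustration; for the symmetric $\mathcal{G}$ of a resistor network, strongly connected components coincide with the connected components of the undirected graph on its non-zero pattern):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Hypothetical conductance matrix with two resistive islands: nodes {0,1,2}
# and {3,4} are not connected to each other by any resistor.
G = np.array([[ 2., -1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.,  0.],
              [-1., -1.,  2.,  0.,  0.],
              [ 0.,  0.,  0.,  1., -1.],
              [ 0.,  0.,  0., -1.,  1.]])

# Component labels on the non-zero pattern of G (undirected, since G = G^T).
n_comp, labels = connected_components(csr_matrix(G != 0), directed=False)

# Permutation that groups nodes component by component, as in (3.35).
perm = np.argsort(labels, kind="stable")
G_perm = G[np.ix_(perm, perm)]

assert n_comp == 2
# No resistive coupling between nodes carrying different component labels:
assert np.all(G[labels[:, None] != labels[None, :]] == 0)
```

After this permutation, $\mathcal{G}$ is block diagonal as in (3.39), and each diagonal block can be handed to the per-subnetwork reduction described next.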
This splitting separates the original network into sub-networks that are only connected by capacitors: $\mathcal{C}_{12}$ is the connectivity block from subnet-1 (specified by $\mathcal{G}_1$, $\mathcal{C}_1$) to subnet-2 (specified by $\mathcal{G}_2$, $\mathcal{C}_2$). With (3.35) a

³ A graph is called (strongly) connected if there exists a path from any vertex to any other vertex in the graph [19].

network topology has been identified, where subnetwork $\mathcal{G}_1$ can be reduced separately from the remaining $\mathcal{G}_2$. Even though $\mathcal{G}_2$ further contains SCCs of $\mathcal{G}$, it is interpreted as one block during the reduction of $\mathcal{G}_1$. $\mathcal{B}_1$ and $\mathcal{B}_2$ contain the input incidence of current injections into the terminals of subnetworks 1 and 2, respectively. Note that the terminals of the original netlist are split between subnetworks 1 and 2; thus the subnetworks have fewer nodes and terminals than the original. For clarity, from here onwards $\mathcal{G}$ and $\mathcal{C}$ are assumed to be already permuted according to the SCCs of $\mathcal{G}$, and to have the structure (3.35).

Reduction per subnetwork

Let $V = \mathrm{blockdiag}(V_1, I_2)$ be the projection reducing the original network (3.35), where $V_1$ reduces subnetwork-1 and $I_2$ is the identity matrix keeping subnetwork-2 unreduced. Projecting (3.35) with $V$, one obtains the reduced model (4.13, 4.14) in the form:
$$\left( \begin{bmatrix} \hat{\mathcal{G}}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix} + s \begin{bmatrix} \hat{\mathcal{C}}_1 & \hat{\mathcal{C}}_{12} \\ \hat{\mathcal{C}}_{12}^T & \mathcal{C}_2 \end{bmatrix} \right) \begin{bmatrix} \hat{x}_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} \hat{\mathcal{B}}_1 \\ \mathcal{B}_2 \end{bmatrix} u, \quad (3.36)$$
where subnetwork-1 is reduced to:
$$\hat{\mathcal{G}}_1 = V_1^T \mathcal{G}_1 V_1, \quad \hat{\mathcal{C}}_1 = V_1^T \mathcal{C}_1 V_1, \quad (3.37)$$
$$\hat{\mathcal{C}}_{12} = V_1^T \mathcal{C}_{12}, \quad \hat{\mathcal{B}}_1 = V_1^T \mathcal{B}_1, \quad \hat{x}_1 = V_1^T x_1. \quad (3.38)$$
The appropriate $V_1$ can be chosen according to the preferred MOR method. Here, if the ratio #terminals/#nodes in subnetwork-1 is small, $V_1$ is constructed with the PACT-based method, as follows. The initial partitioning according to the SCCs of $\mathcal{G}$ has split the network in such a way that applying PACT successively on all subnetworks (i.e., SCCs of $\mathcal{G}$) is equivalent to applying PACT directly on the full circuit. This is formalized in Theorem 3.3.1. The partitioning has computational and structural advantages: (a) PACT reduction applied on each subnetwork is computationally cheaper than on the full problem, and (b) the matrix fill-in introduced by PACT can be monitored on each subnetwork separately (see the section on improving sparsity per subnetwork below).
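The projection (3.36)-(3.38) can be sketched numerically. In the example below (all matrix values invented; subnet-1 has one internal node and two terminals, subnet-2 has two terminals), $V_1$ is the PACT-style congruence built from $W_1 = -\mathcal{G}_R^{-1}\mathcal{G}_K$, and the check confirms the moment-matching property stated later: the first two multi-port admittance moments at $s = 0$ are preserved.

```python
import numpy as np

# Hypothetical partitioned RC system: node 0 internal (subnet-1),
# nodes 1,2 terminals of subnet-1, nodes 3,4 terminals of subnet-2.
# The subnets are coupled only through the capacitor between nodes 0 and 3.
G = np.array([[ 2.0, -1.0, -1.0,  0.0,  0.0],
              [-1.0,  1.5,  0.0,  0.0,  0.0],
              [-1.0,  0.0,  1.5,  0.0,  0.0],
              [ 0.0,  0.0,  0.0,  1.0, -1.0],
              [ 0.0,  0.0,  0.0, -1.0,  1.25]])
C = np.zeros((5, 5))
C[0, 0] = C[3, 3] = 0.1; C[0, 3] = C[3, 0] = -0.1   # coupling capacitor
C[1, 1] = 0.2; C[4, 4] = 0.3                         # grounded capacitors
B = np.eye(5)[:, 1:]                                 # injections at the 4 terminals

# V1 = [W1; I] with W1 = -G_R^{-1} G_K eliminates the internal node of
# subnet-1; V = blockdiag(V1, I2) leaves subnet-2 untouched.
G_R, G_K = G[:1, :1], G[:1, 1:3]
V1 = np.vstack([-np.linalg.solve(G_R, G_K), np.eye(2)])
V = np.block([[V1, np.zeros((3, 2))], [np.zeros((2, 2)), np.eye(2)]])

G_hat, C_hat, B_hat = V.T @ G @ V, V.T @ C @ V, V.T @ B

# First two admittance moments at s = 0 are preserved by the reduction:
Y0  = B.T @ np.linalg.solve(G, B)
Y0r = B_hat.T @ np.linalg.solve(G_hat, B_hat)
Y1  = -B.T @ np.linalg.solve(G, C @ np.linalg.solve(G, B))
Y1r = -B_hat.T @ np.linalg.solve(G_hat, C_hat @ np.linalg.solve(G_hat, B_hat))
assert np.allclose(Y0, Y0r) and np.allclose(Y1, Y1r)
```

Note how $\hat{\mathcal{C}}_{12} = V_1^T \mathcal{C}_{12}$ emerges automatically inside `C_hat`, since the coupling capacitor touches the eliminated internal node.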
Theorem 3.3.1 Given an RC circuit as in (3.35), partitioned into subnetworks corresponding to the strongly connected components of the conductance matrix $\mathcal{G}$, the reduced model obtained from applying PACT successively on each subnetwork of (3.35) is the same (up to permutations) as the reduced model obtained from applying PACT directly on (4.12).

Proof The proof follows by induction. We show that the claim of Theorem 3.3.1 holds while reducing the subnetwork corresponding to the first strongly connected component (SCC) of $\mathcal{G}$. Then, assuming that the $N-1$ subnets corresponding to the first $N-1$ SCCs of $\mathcal{G}$ have been reduced, we show that Theorem 3.3.1 holds when reducing subnet $N$, corresponding to the $N$-th SCC of $\mathcal{G}$.

Step 1. At the first reduction step, we have subnetwork-1 defined by $\mathcal{G}_1 := \mathcal{G}_{1,1}$ (the first SCC of $\mathcal{G}$), and subnetwork-2 defined by $\mathcal{G}_2 := \mathrm{blockdiag}(\mathcal{G}_{2,2}, \ldots, \mathcal{G}_{N,N})$ [the remaining $N-1$ SCCs of $\mathcal{G}$]:
$$\mathcal{G} = \begin{bmatrix} \mathcal{G}_{1,1} & & & \\ & \mathcal{G}_{2,2} & & \\ & & \ddots & \\ & & & \mathcal{G}_{N,N} \end{bmatrix} := \begin{bmatrix} \mathcal{G}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix}, \quad (3.39)$$
$$\mathcal{C} = \begin{bmatrix} \mathcal{C}_{1,1} & \mathcal{C}_{1,2} & \mathcal{C}_{1,3} & \cdots & \mathcal{C}_{1,N} \\ \mathcal{C}_{1,2}^T & \mathcal{C}_{2,2} & \mathcal{C}_{2,3} & \cdots & \mathcal{C}_{2,N} \\ \mathcal{C}_{1,3}^T & \mathcal{C}_{2,3}^T & \mathcal{C}_{3,3} & \cdots & \mathcal{C}_{3,N} \\ \vdots & & & \ddots & \vdots \\ \mathcal{C}_{1,N}^T & \mathcal{C}_{2,N}^T & \mathcal{C}_{3,N}^T & \cdots & \mathcal{C}_{N,N} \end{bmatrix} := \begin{bmatrix} \mathcal{C}_1 & \mathcal{C}_{12} \\ \mathcal{C}_{12}^T & \mathcal{C}_2 \end{bmatrix}. \quad (3.40)$$
The goal is to show that the claim of Theorem 3.3.1 holds when reducing subnetwork-1 while keeping subnetwork-2 unreduced. This is done by constructing two reduced models, shown to be the same: (1) M1: subnetwork-1 is considered individually and reduced with PACT, and (2) M2: the entire network is considered, from which the internal nodes of subnetwork-1 are eliminated, while the terminals of subnetwork-1 together with all the nodes of subnetwork-2 are preserved. For each of these cases, consider (3.35) with the splitting of nodes $x = [x_1^T, x_2^T]^T$, where $x_1 = [x_{1_R}^T, x_{1_S}^T]^T$.

1. M1: Subnetwork-1 is considered individually and reduced with PACT using the partitioning of $x_1$ into internal nodes to be eliminated, $x_R := x_{1_R}$, and selected nodes to be preserved, $x_S := x_{1_S}$. Subnetwork-2 is kept unreduced. To visualize this, consider (3.35) with the following splitting:
$$\underbrace{\left( \begin{bmatrix} \mathcal{G}_R & \mathcal{G}_K & 0 \\ \mathcal{G}_K^T & \mathcal{G}_S & 0 \\ 0 & 0 & \mathcal{G}_2 \end{bmatrix} + s \begin{bmatrix} \mathcal{C}_R & \mathcal{C}_K & \mathcal{C}_{12_R} \\ \mathcal{C}_K^T & \mathcal{C}_S & \mathcal{C}_{12_S} \\ \mathcal{C}_{12_R}^T & \mathcal{C}_{12_S}^T & \mathcal{C}_2 \end{bmatrix} \right)}_{\mathcal{G} + s\mathcal{C}} \underbrace{\begin{bmatrix} x_{1_R} \\ x_{1_S} \\ x_2 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 0 \\ \mathcal{B}_{1_S} \\ \mathcal{B}_2 \end{bmatrix}}_{\mathcal{B}} u. \quad (3.41)$$

2. M2: The entire original network (3.35) is reduced directly with PACT using the partitioning of $x$ into internal nodes to be eliminated, $x_R := x_{1_R}$, and selected nodes to be preserved, $x_S := [x_{1_S}^T, x_2^T]^T$. Again, to visualize, consider (3.35) with the corresponding splitting:
$$\underbrace{\left( \begin{bmatrix} \mathcal{G}_R & \mathcal{G}_K & 0 \\ \mathcal{G}_K^T & \mathcal{G}_S & 0 \\ 0 & 0 & \mathcal{G}_2 \end{bmatrix} + s \begin{bmatrix} \mathcal{C}_R & \mathcal{C}_K & \mathcal{C}_{12_R} \\ \mathcal{C}_K^T & \mathcal{C}_S & \mathcal{C}_{12_S} \\ \mathcal{C}_{12_R}^T & \mathcal{C}_{12_S}^T & \mathcal{C}_2 \end{bmatrix} \right)}_{\mathcal{G} + s\mathcal{C}} \underbrace{\begin{bmatrix} x_{1_R} \\ x_{1_S} \\ x_2 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 0 \\ \mathcal{B}_{1_S} \\ \mathcal{B}_2 \end{bmatrix}}_{\mathcal{B}} u. \quad (3.42)$$

We show that the PACT transformation $V$ which reduces the network by eliminating $x_{1_R}$ is the same for both M1 and M2. Therefore the reduced models M1 and M2 are the same at the end of Step 1.

1. In view of obtaining M1, we start from (3.41) and let the reducing projection $V$ be partitioned into:
$$V = \begin{bmatrix} -\mathcal{G}_R^{-1}\mathcal{G}_K & 0 \\ I_S & 0 \\ 0 & I_2 \end{bmatrix} := \begin{bmatrix} V_1 & 0 \\ 0 & I_2 \end{bmatrix}, \quad (3.43)$$
where $V_1 := \begin{bmatrix} -\mathcal{G}_R^{-1}\mathcal{G}_K \\ I_S \end{bmatrix} := \begin{bmatrix} W_1 \\ I_S \end{bmatrix}$ reduces subnetwork-1 with PACT, and $I_2$ keeps subnetwork-2 unreduced. The reduced matrices for subnetwork-1 are:
$$\hat{\mathcal{G}}_1 = \mathcal{G}_S - \mathcal{G}_K^T \mathcal{G}_R^{-1} \mathcal{G}_K, \quad W_1 := -\mathcal{G}_R^{-1}\mathcal{G}_K, \quad (3.44)$$
$$\hat{\mathcal{C}}_1 = \mathcal{C}_S + W_1^T \mathcal{C}_R W_1 + W_1^T \mathcal{C}_K + \mathcal{C}_K^T W_1, \quad (3.45)$$
$$\hat{\mathcal{C}}_{12} = \mathcal{C}_{12_S} + W_1^T \mathcal{C}_{12_R}, \quad (3.46)$$
$$\hat{\mathcal{B}}_1 = \mathcal{B}_{1_S}, \quad \hat{x}_1 = x_{1_S}. \quad (3.47)$$
The reduced model M1 for (3.41) follows from (3.36):
$$\hat{\mathcal{G}} = V^T \mathcal{G} V = \begin{bmatrix} \hat{\mathcal{G}}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix} = \begin{bmatrix} \hat{\mathcal{G}}_{1,1} & & & \\ & \mathcal{G}_{2,2} & & \\ & & \ddots & \\ & & & \mathcal{G}_{N,N} \end{bmatrix}, \quad (3.48)$$
$$\hat{\mathcal{C}} = V^T \mathcal{C} V = \begin{bmatrix} \hat{\mathcal{C}}_1 & \hat{\mathcal{C}}_{12} \\ \hat{\mathcal{C}}_{12}^T & \mathcal{C}_2 \end{bmatrix} = \begin{bmatrix} \hat{\mathcal{C}}_{1,1} & \hat{\mathcal{C}}_{1,2} & \hat{\mathcal{C}}_{1,3} & \cdots & \hat{\mathcal{C}}_{1,N} \\ \hat{\mathcal{C}}_{1,2}^T & \mathcal{C}_{2,2} & \mathcal{C}_{2,3} & \cdots & \mathcal{C}_{2,N} \\ \hat{\mathcal{C}}_{1,3}^T & \mathcal{C}_{2,3}^T & \mathcal{C}_{3,3} & \cdots & \mathcal{C}_{3,N} \\ \vdots & & & \ddots & \vdots \\ \hat{\mathcal{C}}_{1,N}^T & \mathcal{C}_{2,N}^T & \mathcal{C}_{3,N}^T & \cdots & \mathcal{C}_{N,N} \end{bmatrix}, \quad (3.49)$$
$$\hat{x} = V^T x = \begin{bmatrix} \hat{x}_1 \\ x_2 \end{bmatrix}, \quad \hat{\mathcal{B}} = V^T \mathcal{B} = \begin{bmatrix} \hat{\mathcal{B}}_1 \\ \mathcal{B}_2 \end{bmatrix}. \quad (3.50)$$

2. In view of obtaining M2, consider (3.42) with the following block assignments:
$$\mathbf{G}_R := \mathcal{G}_R, \quad \mathbf{G}_K := \begin{bmatrix} \mathcal{G}_K & 0 \end{bmatrix}, \quad (3.51)$$
$$\mathbf{C}_R := \mathcal{C}_R, \quad \mathbf{C}_K := \begin{bmatrix} \mathcal{C}_K & \mathcal{C}_{12_R} \end{bmatrix}, \quad (3.52)$$
$$\mathbf{G}_S := \begin{bmatrix} \mathcal{G}_S & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix}, \quad \mathbf{C}_S := \begin{bmatrix} \mathcal{C}_S & \mathcal{C}_{12_S} \\ \mathcal{C}_{12_S}^T & \mathcal{C}_2 \end{bmatrix}. \quad (3.53)$$

The PACT transformation which reduces the network (3.42) up to the selected nodes $x_S = [x_{1_S}^T, x_2^T]^T$ has the same construction. Recalling (3.51) and denoting $I_{S_2} := \mathrm{blockdiag}(I_S, I_2)$, this becomes:
$$\mathbf{V} = \begin{bmatrix} -\mathbf{G}_R^{-1}\mathbf{G}_K \\ I_{S_2} \end{bmatrix} = \begin{bmatrix} -\mathcal{G}_R^{-1}\mathcal{G}_K & 0 \\ I_S & 0 \\ 0 & I_2 \end{bmatrix} = V, \quad (3.54)$$
the same as (3.43). The reduced M2 model for (3.42) is thus computed using the assignments (3.51)-(3.53). Performing the computations reveals that the M2 reduced PACT system is:
$$\hat{\mathbf{G}}_S = \begin{bmatrix} \hat{\mathcal{G}}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix} = \hat{\mathcal{G}}, \quad \hat{\mathbf{C}}_S = \begin{bmatrix} \hat{\mathcal{C}}_1 & \hat{\mathcal{C}}_{12} \\ \hat{\mathcal{C}}_{12}^T & \mathcal{C}_2 \end{bmatrix} = \hat{\mathcal{C}}, \quad (3.55)$$
$$\hat{\mathbf{B}}_S = \begin{bmatrix} \hat{\mathcal{B}}_1 \\ \mathcal{B}_2 \end{bmatrix} = \hat{\mathcal{B}}, \quad \hat{\mathbf{x}}_S = \begin{bmatrix} \hat{x}_1 \\ x_2 \end{bmatrix} = \hat{x}, \quad (3.56)$$
where (3.44)-(3.47) hold. This is the same as the reduced model M1, (3.48)-(3.50). We have thus shown the equivalence between reduced models M1 and M2 after the reduction of the first subnet.

We proceed to reduce the next candidate, subnet-2, corresponding to the 2nd SCC of $\mathcal{G}$. Recall that $\mathcal{G}_2 = \mathrm{blockdiag}(\mathcal{G}_{2,2}, \ldots, \mathcal{G}_{N,N})$ contains the unreduced SCCs of $\mathcal{G}$. The reduced system (3.48)-(3.50) is permuted next, so that the reduced subnetwork-1 moves to the bottom and $\mathcal{G}_{2,2}$ (and the corresponding blocks) is promoted as the next candidate for reduction:
$$P = \begin{bmatrix} 0 & \hat{I}_1 \\ I_2 & 0 \end{bmatrix}, \quad P^T \hat{\mathcal{G}} P = \begin{bmatrix} \mathcal{G}_2 & 0 \\ 0 & \hat{\mathcal{G}}_1 \end{bmatrix} := \mathcal{G}, \quad (3.57)$$
$$P^T \hat{\mathcal{C}} P = \begin{bmatrix} \mathcal{C}_2 & \hat{\mathcal{C}}_{12}^T \\ \hat{\mathcal{C}}_{12} & \hat{\mathcal{C}}_1 \end{bmatrix} := \mathcal{C}, \quad P^T \hat{x} = \begin{bmatrix} x_2 \\ \hat{x}_1 \end{bmatrix} := x, \quad P^T \hat{\mathcal{B}} = \begin{bmatrix} \mathcal{B}_2 \\ \hat{\mathcal{B}}_1 \end{bmatrix} := \mathcal{B}. \quad (3.58)$$
The matrices are redefined for the reduction at step 2 as follows:
$$\mathcal{G} = \begin{bmatrix} \mathcal{G}_{2,2} & & & & \\ & \mathcal{G}_{3,3} & & & \\ & & \ddots & & \\ & & & \mathcal{G}_{N,N} & \\ & & & & \hat{\mathcal{G}}_{1,1} \end{bmatrix} := \begin{bmatrix} \mathcal{G}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix}, \quad (3.59)$$
$$\mathcal{C} = \begin{bmatrix} \mathcal{C}_{2,2} & \mathcal{C}_{2,3} & \cdots & \mathcal{C}_{2,N} & \hat{\mathcal{C}}_{2,1} \\ \mathcal{C}_{2,3}^T & \mathcal{C}_{3,3} & \cdots & \mathcal{C}_{3,N} & \hat{\mathcal{C}}_{3,1} \\ \vdots & & \ddots & \vdots & \vdots \\ \mathcal{C}_{2,N}^T & \mathcal{C}_{3,N}^T & \cdots & \mathcal{C}_{N,N} & \hat{\mathcal{C}}_{N,1} \\ \hat{\mathcal{C}}_{2,1}^T & \hat{\mathcal{C}}_{3,1}^T & \cdots & \hat{\mathcal{C}}_{N,1}^T & \hat{\mathcal{C}}_{1,1} \end{bmatrix} := \begin{bmatrix} \mathcal{C}_1 & \mathcal{C}_{12} \\ \mathcal{C}_{12}^T & \mathcal{C}_2 \end{bmatrix}, \quad (3.60)$$
where $\mathcal{G}_1, \mathcal{C}_1$ describe the new subnetwork-1 to be reduced, $\mathcal{G}_2, \mathcal{C}_2$ describe the new subnetwork-2, and $\mathcal{C}_{12}$ is the new connectivity block between them.

Step N. At the end of reduction step $N-1$, we have the reduced matrices:
$$\hat{\mathcal{G}} = \begin{bmatrix} \hat{\mathcal{G}}_{N-1,N-1} & & & & \\ & \mathcal{G}_{N,N} & & & \\ & & \hat{\mathcal{G}}_{1,1} & & \\ & & & \ddots & \\ & & & & \hat{\mathcal{G}}_{N-2,N-2} \end{bmatrix}, \quad (3.61)$$
$$\hat{\mathcal{C}} = \begin{bmatrix} \hat{\mathcal{C}}_{N-1,N-1} & \hat{\mathcal{C}}_{N-1,N} & \hat{\mathcal{C}}_{N-1,1} & \cdots & \hat{\mathcal{C}}_{N-1,N-2} \\ \hat{\mathcal{C}}_{N-1,N}^T & \mathcal{C}_{N,N} & \hat{\mathcal{C}}_{N,1} & \cdots & \hat{\mathcal{C}}_{N,N-2} \\ \hat{\mathcal{C}}_{N-1,1}^T & \hat{\mathcal{C}}_{N,1}^T & \hat{\mathcal{C}}_{1,1} & \cdots & \hat{\mathcal{C}}_{1,N-2} \\ \vdots & \vdots & & \ddots & \vdots \\ \hat{\mathcal{C}}_{N-1,N-2}^T & \hat{\mathcal{C}}_{N,N-2}^T & \hat{\mathcal{C}}_{1,N-2}^T & \cdots & \hat{\mathcal{C}}_{N-2,N-2} \end{bmatrix}. \quad (3.62)$$
With appropriate dimensions, the permutation $P$ of (3.57) is applied to move the reduced subnetwork-1 (now defined by the reduced $\hat{\mathcal{G}}_{N-1,N-1}$ and the corresponding matrix blocks from $\hat{\mathcal{C}}$) to the bottom and to promote $\mathcal{G}_{N,N}$ as the next (final) subnetwork for reduction. After the permutation, the active matrices are redefined:
$$\mathcal{G} := \begin{bmatrix} \mathcal{G}_{N,N} & & & \\ & \hat{\mathcal{G}}_{1,1} & & \\ & & \ddots & \\ & & & \hat{\mathcal{G}}_{N-1,N-1} \end{bmatrix} := \begin{bmatrix} \mathcal{G}_1 & 0 \\ 0 & \mathcal{G}_2 \end{bmatrix}, \quad (3.63)$$
$$\mathcal{C} := \begin{bmatrix} \mathcal{C}_{N,N} & \hat{\mathcal{C}}_{N,1} & \cdots & \hat{\mathcal{C}}_{N,N-2} & \hat{\mathcal{C}}_{N,N-1} \\ \hat{\mathcal{C}}_{N,1}^T & \hat{\mathcal{C}}_{1,1} & \cdots & \hat{\mathcal{C}}_{1,N-2} & \hat{\mathcal{C}}_{1,N-1} \\ \vdots & & \ddots & \vdots & \vdots \\ \hat{\mathcal{C}}_{N,N-2}^T & \hat{\mathcal{C}}_{1,N-2}^T & \cdots & \hat{\mathcal{C}}_{N-2,N-2} & \hat{\mathcal{C}}_{N-2,N-1} \\ \hat{\mathcal{C}}_{N,N-1}^T & \hat{\mathcal{C}}_{1,N-1}^T & \cdots & \hat{\mathcal{C}}_{N-2,N-1}^T & \hat{\mathcal{C}}_{N-1,N-1} \end{bmatrix} := \begin{bmatrix} \mathcal{C}_1 & \mathcal{C}_{12} \\ \mathcal{C}_{12}^T & \mathcal{C}_2 \end{bmatrix}. \quad (3.64)$$
The procedure of Step 1 for obtaining the reduced models M1 and M2 is repeated on the new system (3.63)-(3.64), with subnetwork-1 defined by $\mathcal{G}_1 := \mathcal{G}_{N,N}$ (the last SCC of $\mathcal{G}$) and subnetwork-2 defined by $\mathcal{G}_2 := \mathrm{blockdiag}(\hat{\mathcal{G}}_{1,1}, \ldots, \hat{\mathcal{G}}_{N-1,N-1})$, each with the corresponding blocks from $\mathcal{C}$. Thus, after $N$ reduction steps the M1 and M2 models remain identical. In particular, after the appropriate permutation $P$, the final reduced form is:
$$\hat{\mathcal{G}} = \begin{bmatrix} \hat{\mathcal{G}}_{1,1} & & & \\ & \hat{\mathcal{G}}_{2,2} & & \\ & & \ddots & \\ & & & \hat{\mathcal{G}}_{N,N} \end{bmatrix}, \quad \hat{\mathcal{C}} = \begin{bmatrix} \hat{\mathcal{C}}_{1,1} & \hat{\mathcal{C}}_{1,2} & \hat{\mathcal{C}}_{1,3} & \cdots & \hat{\mathcal{C}}_{1,N} \\ \hat{\mathcal{C}}_{1,2}^T & \hat{\mathcal{C}}_{2,2} & \hat{\mathcal{C}}_{2,3} & \cdots & \hat{\mathcal{C}}_{2,N} \\ \hat{\mathcal{C}}_{1,3}^T & \hat{\mathcal{C}}_{2,3}^T & \hat{\mathcal{C}}_{3,3} & \cdots & \hat{\mathcal{C}}_{3,N} \\ \vdots & & & \ddots & \vdots \\ \hat{\mathcal{C}}_{1,N}^T & \hat{\mathcal{C}}_{2,N}^T & \hat{\mathcal{C}}_{3,N}^T & \cdots & \hat{\mathcal{C}}_{N,N} \end{bmatrix}. \quad (3.65)$$ ∎

Moment matching, passivity, terminal connectivity

Theorem 3.3.1 shows that the reduced model obtained from the SCC($\mathcal{G}$) partitioning is the same as the PACT reduced model derived earlier; the reduced model (3.65) thus matches the two multi-port admittance moments at $s = 0$ of the original (3.40). The consequence of Theorem 3.3.1 is that all aforementioned properties of the PACT reduced model, such as passivity preservation and preservation of terminal connectivity, hold for the partition-based reduced model as well.

Revealing path resistances

An important analysis step during the design of interconnect structures is the computation of path resistances between terminals [80]. Here we show how to compute path resistances in a straightforward manner for RC networks that are reduced with the SCC($\mathcal{G}$)-based partitioning method. First we define the path resistance of a general network. Then, we show that the multi-port RC reduction proposed in this chapter preserves the path resistance. Finally, we show how path resistances are computed from the reduced network.

For a general multi-port resistor network, Kirchhoff's current law gives:
$$Gx = Bu, \quad (3.66)$$
where the columns of $B$ are the unit vectors $B_{[:,i]} = e_i$ describing the incidence of current injections into terminals. Within the scope of this section, it is assumed that the conductance matrix $G$ is invertible, which is the case if (1) the network is grounded, and one of the following holds: (2) the network is completely connected via resistors, i.e., the graph defined by the non-zero pattern of $G$ has one strongly connected component, or (3) $G$ is block diagonal and each block is itself invertible. The path resistance between two nodes of a circuit is defined as the ratio of the voltage across the nodes to the current flow injected into them [28].
The path resistance from terminal $i$ to terminal $j$ is given by:
$$r_{ij} = (e_i - e_j)^T G^{-1} (e_i - e_j), \quad (3.67)$$
where $e_i$, $e_j$ are the $i$-th and $j$-th unit vectors, respectively. If neither $i$ nor $j$ is the grounded node, (3.67) becomes:
$$r_{ij} = G^{-1}_{[i,i]} + G^{-1}_{[j,j]} - 2\, G^{-1}_{[i,j]}, \quad (3.68)$$
where $G_{[i,j]}$ is the entry in the $i$-th row and $j$-th column of $G$ [not to be mistaken for $\mathcal{G}_{i,i}$, denoting the $i$-th strongly connected component of $\mathcal{G}$ as in (3.39)]. If $j$ is the grounded

terminal, thus $e_j = 0$, then (3.67) is simply:
$$r_{ii} = G^{-1}_{[i,i]}. \quad (3.69)$$
A more compact form entering the computation of (3.67), (3.68) and (3.69) is the matrix of resistive paths:
$$\mathcal{R} := B^T G^{-1} B. \quad (3.70)$$
As explained in [80], since $G$ is (here) positive definite, in practice the computation of (3.70) is based on the Cholesky factorization [29] of $G$: $G = LL^T$, $Q = L^{-1}B$, $\mathcal{R} = Q^T Q$. For the sake of capturing the contribution of all terminals to the resistive-path computation, from here on we shall refer to the matrix quantity (3.70) as the path resistance, rather than to (3.67).

Let (3.66) be partitioned according to the voltages measured at terminals, $x_P$, and the voltages measured at the internal nodes, $x_I$:
$$\begin{bmatrix} G_I & G_C \\ G_C^T & G_P \end{bmatrix} \begin{bmatrix} x_I \\ x_P \end{bmatrix} = \begin{bmatrix} 0 \\ B_P \end{bmatrix} u, \quad (3.71)$$
where $B_P = I$, the identity block corresponding to current injections into each of the $x_P$ terminal nodes. Then:
$$G^{-1} = \begin{bmatrix} I & -G_I^{-1} G_C \\ 0 & I \end{bmatrix} \begin{bmatrix} G_I^{-1} & 0 \\ 0 & (G_P - G_C^T G_I^{-1} G_C)^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -G_C^T G_I^{-1} & I \end{bmatrix}. \quad (3.72)$$
Plugging (3.72) into (3.70) gives:
$$\mathcal{R} = B^T G^{-1} B = B_P^T (G_P - G_C^T G_I^{-1} G_C)^{-1} B_P = (G_P - G_C^T G_I^{-1} G_C)^{-1}. \quad (3.73)$$
Notice that (3.73) is the inverse of the first moment at $s = 0$ of the multi-port admittance, and also the inverse of the reduced conductance matrix of the PACT reduced model. This demonstrates that PACT reduction preserves the original path resistances between terminals. Theorem 3.3.1 showed that the PACT reduced model for an unpartitioned network is equivalent to the reduced model obtained via the SCC($\mathcal{G}$)-based partitioning; as a consequence, the latter also preserves the path resistances.

Recall however that path resistances are only computable if the inverse of $G$ exists, which does not generally hold for RC circuits, since not all nodes need be connected to each other via resistors. In that case, the conductance matrix $G$ has more than one strongly connected component.
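The Cholesky-based evaluation of (3.70) and the pairwise formula (3.68) can be sketched as follows (the 3-terminal grounded network and its conductance values are invented for illustration):

```python
import numpy as np

# Hypothetical grounded 3-terminal resistor network; G is symmetric,
# irreducibly diagonally dominant, hence positive definite.
G = np.array([[ 3.0, -1.0, -2.0],
              [-1.0,  2.5, -1.0],
              [-2.0, -1.0,  3.5]])
B = np.eye(3)                     # current injections into all three terminals

# R = B^T G^{-1} B via Cholesky, as in (3.70): G = L L^T, Q = L^{-1} B.
L = np.linalg.cholesky(G)
Q = np.linalg.solve(L, B)
R = Q.T @ Q                       # matrix of resistive paths

# Path resistance between terminals i and j, from (3.68):
i, j = 0, 1
r_ij = R[i, i] + R[j, j] - 2 * R[i, j]

# Cross-check against the defining expression (3.67):
d = np.eye(3)[:, i] - np.eye(3)[:, j]
assert np.isclose(r_ij, d @ np.linalg.solve(G, d))
```

The same computation applies per component to the reduced model, where by (3.75) the factorized matrix is simply the diagonal block $\hat{\mathcal{G}}_{i,i}$.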
The SCC(G)-based partitioning and reduction therefore provides an additional structural advantage: the path resistances between

terminal nodes in the circuit become readily computable per component. The immediate consequence of partitioning the network according to the strongly connected components of $\mathcal{G}$ is that subnets are identified that are not connected to each other via resistors. In matrix terms, this corresponds to the block-diagonal structure of (3.39): there is no resistive path from nodes in one component to nodes in another component (there may be capacitive connections, but these do not enter the computation of path resistances). As reduction progresses per component, the reduced model retains the SCC($\hat{\mathcal{G}}$) structure on the block diagonal, as seen from (3.65). The path resistances for an RC network (3.39), where $\mathcal{G}$ has $N$ strongly connected components, can thus be computed per component (after appropriate grounding) as follows:
$$\mathcal{R}_i = \mathcal{B}_i^T\, \mathcal{G}_{i,i}^{-1}\, \mathcal{B}_i \quad (3.74)$$
$$= \hat{\mathcal{G}}_{i,i}^{-1}, \quad i = 1 \ldots N. \quad (3.75)$$
The last equality follows from (3.73), where $\mathcal{G}_{i,i}$, $\mathcal{B}_i$ are split as in (3.71), and $\hat{\mathcal{G}}_{i,i} = \mathcal{G}_{i_P} - \mathcal{G}_{i_C}^T \mathcal{G}_{i_I}^{-1} \mathcal{G}_{i_C}$ are the diagonal blocks of the reduced model (3.65). The computation of path resistances is performed for the example of Sect. 3.4.

Computational complexity

With the SCC($\mathcal{G}$)-based reduction being mathematically equivalent to PACT, it remains to quantify the computational advantage of the former compared to the latter. Consider the original model before the first reduction step (3.41) and the corresponding reducing projection (3.43), and note that only the submatrices of $\mathcal{G}_1$ enter in forming the projection. This holds during the reduction of each component, whose computational cost is determined by forming $\mathcal{G}_R^{-1} \mathcal{G}_K$. Assuming that each component $i = 1 \ldots N$ has $m_i$ internal nodes and $p_i$ terminals, the corresponding sub-block dimensions are $\mathcal{G}_R \in \mathbb{R}^{m_i \times m_i}$, $\mathcal{G}_S \in \mathbb{R}^{p_i \times p_i}$. The reduction cost per component is thus $O(m_i^{\alpha_i} p_i)$, where typically $1 < \alpha_i \leq 2$ for circuit matrices [75].
For a total of $N$ components, the total cost is $O\big(\sum_{i=1}^{N} m_i^{\alpha_i} p_i\big)$. For an unpartitioned circuit with $m$ internal nodes and $p$ terminals, the cost of PACT is $O(m^{\alpha} p)$; after the SCC($\mathcal{G}$) partitioning one attains $m_i < m$ and $p_i < p$. Therefore, especially when $m$ and $p$ are large, the cost of the component-wise reduction is smaller than the $O(m^{\alpha} p)$ of PACT. The large examples from Sect. 3.4 confirm this in practice.

Improving sparsity per subnetwork

The partitioning according to the SCCs of $\mathcal{G}$ provides structural and computational advantages, as smaller subnets are reduced individually, with the appropriate reduction

of the capacitive blocks connecting the subnets. This block-wise reduction alone, however, does not directly minimize fill-in: circuit partitioning according to the SCCs($\mathcal{G}$) only identifies sub-networks to be reduced individually, but has no immediate effect on the sparsity of the reduced model. As seen from Theorem 3.3.1, the PACT reduction per subnet is mathematically equivalent to applying PACT directly on the unpartitioned network. Therefore, reduced models obtained with the SCC($\mathcal{G}$)-based partitioning will have the same fill-in as otherwise obtained from applying PACT without partitioning. Nevertheless, overall sparsity can be improved in this framework as well, by applying fill-reducing orderings per subnet, as described next.

Reconsider the partitioning (3.35), where subnet-1 is to be reduced. Applying a constrained approximate minimum degree (CAMD) reordering [1, 2] on the graph $\mathbb{G}_1 := \mathrm{nzp}(\mathcal{G}_1 + \mathcal{C}_1)$ [where nzp stands for the non-zero pattern of a matrix] reorders the nodes $x_1$ so that fill-in is minimized during the PACT reduction of subnet-1. In particular, the special internal nodes of subnet-1 are identified which, if eliminated, would introduce the most fill-in in $\hat{\mathcal{G}}_1$ and $\hat{\mathcal{C}}_1$. Consequently, these internal nodes are preserved and promoted along with the terminals to form the selected nodes $x_{1_S}$ of subnet-1. The PACT reduction of subnet-1 then follows as before. The effect of the CAMD reordering per subnet is demonstrated in the Mixer example of Sect. 3.4.

As the CAMD reordering is applied only to the individual graphs $\mathbb{G}_i := \mathrm{nzp}(\mathcal{G}_{ii} + \mathcal{C}_{ii})$ of each subnet, the resulting ordering of the capacitive connection blocks $\mathcal{C}_{ij}$, $i \neq j$, between the subnets does not minimize their fill-in. Consequently, the reduced $\hat{\mathcal{C}}_{ij}$ blocks would become too dense if the $\mathcal{C}_{ij}$ blocks are large. Especially for circuits with $p > 100$, sparsity is better improved by considering the graph $\mathbb{G}$ associated with the entire RC topology (i.e.,
by considering all R and C connections from the start). While the permutation according to the strongly connected components of $\mathbb{G} := \mathrm{nzp}(\mathcal{G})$ revealed a partitioning with $\mathcal{G}$ in block-diagonal form (with no resistive communication between blocks), the extension to the general case is the BBD-based partitioning of $\mathbb{G} := \mathrm{nzp}(\mathcal{G} + \mathcal{C})$ from Chapter 4. The methodology therein achieves better sparsity levels and is thus recommended for reducing very large circuits with terminal counts exceeding thousands.

3.4 Numerical results

A selection of circuits from the electronics industry was reduced with the SCC-based partitioning proposed here. The examples shown have relatively few terminals, $p = O(10^2)$, and are thus suitable for this framework, as sparsity preservation is not of major concern. Rather, the examples demonstrate the performance of the SCC-based reduction strategy in terms of approximation quality, preservation of terminal connectivity, synthesis and re-simulation. As in Chapter 4, the reduced netlists are obtained by unstamping the reduced model (3.65) with RLCSYN [93]. In the following tables, several parameters are recorded: $p$, the number of terminals; $n$ ($k$), the number of internal nodes in the original (reduced) circuit; $P_{nk} = 100\,\frac{n-k}{n}$, the percentage reduction in internal nodes;

#R, the number of resistors; $P_R = 100\,\frac{\#R_{orig} - \#R_{red}}{\#R_{orig}}$, the percentage reduction (or increase) in the number of resistors; #C, the number of capacitors; $P_C = 100\,\frac{\#C_{orig} - \#C_{red}}{\#C_{orig}}$, the percentage reduction (or increase) in the number of capacitors; CPUt, the CPU simulation time for the original and reduced circuits; and Speed-up $= \frac{CPUt_{orig}}{CPUt_{red}}$. It should be noted that the recorded #C quantities for the reduced circuits also include possibly generated negative capacitance values. For reasons explained earlier, these are kept in the reduced netlist.

Low Noise Amplifier (LNA) circuit (CMOS045)

Three RC parasitic extracted models (TL$_{1,2,3}$) of the LNA circuit are reduced according to Sect. 3.3. The models belong to the same family, so the emphasis is placed here on the first, TL$_1$. The original $\mathcal{G}$ and $\mathcal{C}$ matrices are shown in Fig. 3.1, while in Fig. 3.2 they are permuted and partitioned according to the SCCs($\mathcal{G}$) [24 components]. These subnetworks are visible in the diagonal blocks of $\hat{\mathcal{G}}$. The reduced $\hat{\mathcal{G}}$ and $\hat{\mathcal{C}}$ retain this structure (see Fig. 3.3). The reduced block diagonals of $\hat{\mathcal{G}}$ (and correspondingly of $\hat{\mathcal{C}}$) are entirely associated with the preserved terminal nodes from each subnet (no internal nodes are preserved). Table 3.1 collects the reduction and re-simulation results of the three circuits. As no fill-in-minimizing node reorderings were used, all nodes were eliminated except the terminals (100% reduction rate in internal nodes). From the element reduction rates $P_R$ and $P_C$, it is clear that the reduced circuits contain far fewer circuit elements than the originals. The benefit of reduction is immediately reflected in the tremendous speed-ups attained when the reduced circuits were re-simulated, compared to the original simulations. The reduction time itself is very small (below 1 s), and the re-simulation time for the reduced circuits is also more than 100 times smaller than for the original simulations.
Moreover, simulating the TL$_3$ circuit was only possible after reduction (the original simulation failed to find a DC solution, possibly due to ill-conditioning of the underlying matrices). The simulations consisted of the following analyses: AC, Noise, SP (S-parameter), and PSS (periodic steady state). Several figures compare, for the different simulation types, the responses of the original vs. the reduced circuit; these match perfectly up to very high frequencies. Further figures show the waveforms obtained from simulating circuit TL$_3$ after reduction. Even though the original simulation failed due to a convergence error in computing the DC solution, the reduced simulation ran without errors. Without a reference solution to compare the reduced simulation against, these responses were nevertheless judged by the circuit designers to follow the desired behavior. Furthermore, the accuracy of the results obtained from reducing TL$_1$ and TL$_2$ (of the same family as TL$_3$) predicts desirable performance from reducing TL$_3$.

Figure 3.1: Example TL1: original, unordered G (left) and C (right) matrices.
Figure 3.2: Example TL1: G (left) and C (right) reordered according to SCCs(G).
Figure 3.3: Example TL1: reduced Ĝ (left) and Ĉ (right). Note the preserved block diagonal structure of Ĝ, showing each reduced strongly connected component.

3.4.2 Mixer circuit

The mixer circuit example (layout in Fig. 3.8) shows how using fill-in minimizing node reorderings (CAMD) on each subnetwork can improve the sparsity of the reduced model, by identifying special internal nodes which are not eliminated. Table 3.2 collects the reduction results for the two strategies: method 3.3.5, where CAMD reorders the nodes in each subnet for minimum fill-in, and the method without reorderings. The reduced models were 2 times faster to simulate than the original.

Table 3.1: Reduction summary for Example 3.4.1

                          TL1               TL2               TL3
                      Orig.   Red.      Orig.   Red.      Orig.   Red.
p
n
k
P_nk                          100%              100%              100%
#R
P_R                           99.8%             99.8%             99.8%
#C
P_C                           91.3%             92.4%             89.5%
Reduction time                0.44 s            0.39 s            0.46 s
Sim. CPUt:  AC                0.12 s                      NA
            Noise             0.13 s                      NA      0.14 s
            SP                0.21 s                      NA      0.22 s
            PSS               2.93 s                      NA      3.05 s
Speed-up                      > 270x            > 127x

Table 3.2: Reduction summary for Example 3.4.2

                      Orig.    Red. (3.3.5)    Red. (no reorderings)
p
n
k                              13              0 (100%)
#R
#C
Sim.        CPUt (Orig.)  CPUt (Red. 3.3.5)  Speed-up  CPUt (Red. no reord.)  Speed-up
QPSS                                          2x        13.2 s                 2x
QPSP        1735 s        754 s               2x        746 s                  2x

Fig. 3.9 shows how the fill-in is monitored during the reduction of one subnet (here, subnet 11): the x-axis shows the index of the node to be eliminated, after the nodes have been reordered based on CAMD(G_11 + C_11); the y-axis shows #R + #C as the node elimination progresses. The red circle marks the minimum number of circuit elements, min(#R + #C), here attained after eliminating node 8. Eliminating node 9 as well would significantly increase #R + #C in the reduced subnet 11; thus node 9 of subnet 11 is not eliminated. Tracking the fill-in per subnet in this manner identified in total the k = 13 internal nodes preserved in the final reduced model (see Table 3.2, first column). The improvement in sparsity is reflected in the fact that the reduced model obtained with method 3.3.5 has fewer circuit elements than the one obtained without reorderings.
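The fill-in tracking just described can be sketched in a few lines: eliminating an internal node of a resistive network acts as a star-mesh transform on the connectivity graph, so the element count #R + #C can be monitored after every elimination and the minimum retained. Everything below (the toy graph, and the elimination order standing in for CAMD) is made up for illustration:

```python
# Sketch of the fill-in monitoring idea of Fig. 3.9: eliminate internal nodes
# one by one, track the element count, and stop at the minimum. The graph and
# the elimination order are hypothetical; in the chapter the order comes from
# CAMD(G11 + C11).

def eliminate(adj, v):
    """Star-mesh step: remove v and connect its neighbours pairwise (fill-in)."""
    nbrs = adj.pop(v)
    for u in nbrs:
        adj[u].discard(v)
    for u in nbrs:
        for w in nbrs:
            if u != w:
                adj[u].add(w)

def count_elements(adj):
    return sum(len(s) for s in adj.values()) // 2   # each edge stored twice

# toy network: node 0 is a hub of degree 6, nodes 1 and 2 are cheap to remove
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 3), (2, 4)]
adj = {i: set() for i in range(7)}
for a, b in edges:
    adj[a].add(b); adj[b].add(a)

order = [1, 2, 0]                    # hypothetical CAMD-like order: hub last
history = [count_elements(adj)]
for v in order:
    eliminate(adj, v)
    history.append(count_elements(adj))

best = min(range(len(history)), key=history.__getitem__)
print(history)   # [8, 6, 4, 6]: eliminating the hub last re-creates fill-in
print(best)      # 2: stop after two eliminations, preserve the hub node
```

The minimum at step 2 plays the role of the red circle in Fig. 3.9: the hub is the analogue of node 9 of subnet 11 and stays in the reduced netlist.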

Figure 3.4: TL1: AC analysis, node "in", magnitude. Comparison: original vs. reduced, together with the approximation error, which is very small (the difference between the original and reduced curves is around 0 dB).
Figure 3.5: TL1: PSS analysis, time domain, node "out". Comparison: original vs. reduced.

3.4.3 Interconnect structure

The RC extracted parasitics of four interconnect structures, each with p = 12 terminals, were reduced with the proposed strategy, with results recorded in Table 3.3. The reduction rates for nodes and circuit elements are above 95%, and the reduction time is on the order of 1 s. Two types of analyses are required: an AC simulation and the computation of the path resistances R_path between terminals. The AC analysis is used to compare the response of the original netlist with that of the reduced netlist over a large frequency range. As seen in Fig. 3.10, these match perfectly.
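A minimal sketch of such a path-resistance computation between two terminals of a resistive net (the 3-node ladder below is made up; the actual procedure applies formula (3.74) to the partitioned conductance matrix, as discussed later in this section):

```python
import numpy as np

# Path resistance between terminals a and b of an ungrounded resistive
# network: ground b, inject a unit current into a, solve G v = i, and read
# off v_a. The two 1-Ohm series resistors below are illustrative values.

def path_resistance(G, a, b):
    """R_path(a, b) for a (singular) ungrounded conductance matrix G."""
    n = G.shape[0]
    keep = [k for k in range(n) if k != b]      # grounding removes row/col b
    Gg = G[np.ix_(keep, keep)]
    i = np.zeros(n); i[a] = 1.0                 # unit current from a to b
    v = np.linalg.solve(Gg, i[keep])            # in practice: a sparse solve
    return v[keep.index(a)]

# nodes 0-1-2 chained by two 1-Ohm resistors: R_path(0, 2) = 2 Ohm
g = 1.0
G = np.array([[ g,   -g,   0.0],
              [-g,  2*g,   -g ],
              [0.0,  -g,    g ]])
print(path_resistance(G, 0, 2))   # prints 2.0
```

For many terminal pairs, the right-hand side becomes a thin matrix B with one column per injection, which is the sparse multi-column solve the text attributes to mldivide.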

Figure 3.6: TL3: PSS: node Vse2lse 0:2.
Figure 3.7: TL3: Noise: NF.
Figure 3.8: Example 3.4.2: mixer circuit layout.

Table 3.3: Reduction summary for Example 3.4.3

                          ICL1            ICL2            ICL3            ICL4
                      Orig.   Red.    Orig.   Red.    Orig.   Red.    Orig.   Red.
p                     12      12      12      12      12      12      12      12
n
k
P_nk                          99.9%           99.9%           99.9%           99.9%
#R
P_R                           99.8%           99.3%           99.9%           99.9%
#C
P_C                           98.3%           97.5%           99.9%           99.9%
Reduction time                0.06 s          0.05 s          1.2 s           0.8 s
Numerical solution
CPU time: R_path              0 s             0 s     0.19 s  0 s     0.16 s  0 s

Figure 3.9: Example 3.4.2: monitoring the fill-in generated during one subnet reduction. The minimum value of #R + #C is attained after eliminating node 8, while node 9 is preserved.
Figure 3.10: Example 3.4.3: AC analysis for the original and reduced netlists of ICL1, showing a perfect match.

While an AC analysis is an operation performed with the circuit simulator (here, Spectre [16]), the computation of path resistances is sometimes not a direct functionality of

commercial tools. It is, however, easily performed numerically via the partitioned conductance matrix, as described in Sect. . For ICL3, the original matrices are shown in Fig. 3.11. The SCC(G) partitioning reveals six components, each with only two terminals. As seen in Fig. 3.12, the reduced model preserves the six components, each component having only two nodes (i.e., the terminal nodes), as expected. The path resistance between each pair of terminals is computed using formula (3.74) for each component, after appropriate grounding. For this example, these path resistances can in fact simply be read off from the upper diagonal of Ĝ. Note in Table 3.3 that the time for computing the path resistances for the original net via (3.74) is very small, and close to the time required to compute them for the reduced net via (3.75). This is because the inverses in (3.74), (3.75) are formed via the backslash operator (mldivide) in Matlab, which exploits the sparsity of G and performs a sparse LL^T factorization based on AMD reordering [67]; this turns out to be very efficient, especially as the number of columns of B is small (only 12 terminals). As already emphasized in [80], DC problems such as the computation of path resistances can often be solved efficiently as solutions of sparse linear systems involving the G matrix directly (so that a network reduction is unnecessary). In contrast, simulations requiring a sweep over a large frequency range (such as an AC analysis) require solutions of linear systems involving a denser problem, (G + s_i C)^{-1} B, for each frequency point s_i.
These repeated operations are inefficient, especially when the number of terminals exceeds thousands; hence the need for reduced networks.

3.4.4 Very large networks

Two very large networks (part of a phase-locked loop (PLL) and of a receiver design (RX), respectively), with more than 10^5 internal nodes and 10^3 terminals, were reduced based on the SCC(G) partitioning, with results recorded in Table 3.4. Due to their large dimension, the direct PACT reduction fails for lack of computational resources when forming (4.5)-(4.7). The block-wise reduction of subnets based on the SCC(G) partitioning, in contrast, was able to reduce these networks within minutes, which is explained by the analysis in Sect. . Reduction rates of 90% in internal nodes, resistors and capacitors were thus achieved.

3.5 Concluding remarks

The reduction of multi-terminal R and RC networks, which share a common projection matrix, was addressed in this chapter. Based on the properties of the conductance matrix G underlying these circuits, and on the special structure of the projection matrix, it was shown that the reduced models obtained in this framework have only positive resistors. A partition-based implementation for multi-terminal RC reduction was also derived,

Figure 3.11: Example 3.4.3: original G (left) and C (right) matrices.
Figure 3.12: Example 3.4.3: reduced Ĝ (left) and Ĉ (right) matrices. The strongly connected components of Ĝ appear as blocks on the diagonal. These are directly used to compute the path resistances (3.74).

Table 3.4: Reduction summary for Example 3.4.4

                                    PLL              Receiver
                                Orig.   Red.     Orig.   Red.
p
n
k
P_nk                                    99.9%            99.9%
#R
P_R                                     99.8%            99.1%
#C
P_C                                     89%              93.3%
Reduction time (SCC(G)-based)           42.4 s           760 s
Reduction time (PACT)                   NA               NA

based on the strongly connected components of the conductance matrix G. A proof of the mathematical equivalence between the unpartitioned and the partitioned approach was provided. This ensures that the two approaches share the same moment matching, passivity preservation and terminal re-connectivity properties. The partition provides additional structural advantages, such as a simple way to compute path resistances in a network component-wise. Numerical experiments show that the reduced circuits thus obtained contain far fewer elements than the originals and attain significant speed-ups in re-simulation. The methods in this chapter inspired the development of the sparsity preserving RC reduction methodology of Chapter 4, which is especially suited for circuits with terminal numbers exceeding thousands.

Chapter 4

SparseRC: Sparsity preserving model reduction for multi-terminal RC networks

A novel model order reduction (MOR) method for multi-terminal RC circuits is proposed: SparseRC. Specifically tailored to systems with many terminals, SparseRC employs graph partitioning and fill-in reducing orderings to improve sparsity during model reduction, while maintaining accuracy via moment matching. The reduced models are easily converted to their circuit representation. They contain far fewer nodes and circuit elements than models obtained with conventional MOR techniques, allowing faster simulations at little accuracy loss.

4.1 Introduction

During the design and verification phase of VLSI circuits, coupling effects between the various components on a chip have to be analyzed. This requires the simulation of electrical circuits consisting of many non-linear devices together with extracted parasitics. Due to the increasing amount of parasitics, full device-parasitic simulations are too costly and often impossible. Hence, reduced models are sought for the parasitics which, when re-coupled to the devices, can reproduce the original circuit behavior. Parasitic circuits are very large network models containing millions of nodes interconnected via basic circuit elements: R, RC or RLC(k). Among the circuit nodes, a special subset forms the terminals: the designer-specified input/output nodes and the nodes connecting the parasitics to the non-linear devices. Parasitic networks with millions

of nodes and RC elements, and with thousands of terminals, are often encountered in real chip designs. A reduced order model for the parasitics ideally has fewer nodes and circuit elements than the original, and preserves the terminal nodes for re-connectivity. The presence of many terminals introduces additional structural and computational challenges during model order reduction (MOR). Existing MOR methods may be unsuitable for circuits with many terminals, as they produce dense reduced models. These correspond to circuits with fewer circuit nodes, but more circuit elements (Rs, Cs), than the original circuit, and may even require longer simulation times than the original. Furthermore, if terminal connectivity is affected, additional elements such as current/voltage controlled sources must be introduced to appropriately model the re-connection of the reduced parasitics to other devices. The emerging problem is to develop efficient model reduction schemes for large multi-terminal circuits that are accurate, sparsity preserving, and also preserve terminal connectivity. The method proposed here, SparseRC, achieves these goals by efficiently combining the strengths of existing MOR methodology with graph partitioning and fill-reducing node reordering strategies, achieving tremendous reduction rates even for circuits with terminal numbers exceeding thousands. Reduced RC models thus obtained are sparser than those computed via conventional techniques, have shorter simulation times, and accurately approximate the input/output behavior of the original RC circuit. In addition, the reduced RC parasitics can be reconnected directly via the terminal nodes to the remaining circuitry (e.g. non-linear devices), without introducing new circuit elements. A comprehensive coverage of established MOR methods is available in [4], while [82] collects more circuit-simulation-specific contributions.
Broadly, MOR methods are classified into truncation-based methods (modal approximation [79], balancing [77]) and Krylov-based methods, among which we mention PRIMA [71], SPRIM [27], and the dominant spectral zero method [44], as they are passivity preserving¹. Generally, however, the applicability of traditional MOR techniques to very large circuits with many terminals is limited due to computational limitations together with the aforementioned sparsity and re-connectivity considerations. While the multi-terminal problem has been addressed in numerous works such as [24], [13], [95], [59], it is usually less clear whether their performance scales with the number of ports, especially as this exceeds thousands. Recent developments in model reduction for very large multi-terminal R-networks were achieved in [80] (denoted here as ReduceR), which uses graph-theoretical tools, fill-in minimizing node reorderings and node elimination to obtain sparse reduced R-networks. Towards obtaining sparse reduced models for multi-terminal RC(L) networks, the sparse implicit projection (SIP) method [94] also proposes reordering actions prior to eliminating unimportant internal nodes, and makes several important analogies between related node-elimination-based methods (e.g. TICER [84] and [22]) and moment-matching MOR by projection (e.g. PRIMA [71]). In fact, the fundamental projection behind SIP can be traced back to the PACT methods [56, 57] for reducing multi-terminal RC(L) networks.

¹ Only passive reduced order models guarantee stable results when re-coupled to other circuit blocks in subsequent simulation stages [71].

As will be shown, SparseRC combines the advantages of ReduceR and SIP/PACT into an efficient procedure, while overcoming some of their computational limitations: using graph partitioning, circuit components are identified which are reduced individually via a PACT-like projection [denoted here as the extended moment matching projection (EMMP)], while appropriately accounting for the interconnections between components. The reduction process is simplified computationally, as smaller components are reduced individually. The final SparseRC reduced circuit matches by default two moments at DC of the original circuit's multi-port admittance, and can be extended with dominant poles [79] or additional moments to improve accuracy at higher frequency points if needed. Through partitioning, the relevant nodes responsible for fill-in are identified automatically; SparseRC preserves these along with the terminals to ensure the sparsity of the reduced model. This feature makes SparseRC more efficient than ReduceR or SIP: it avoids the unnecessary computational cost of monitoring fill-in at each node elimination step. A related method is PartMOR [70], which is based on the same partitioning philosophy, but constructs the reduced models in a different manner. PartMOR realizes selected moments from each subnet into a netlist equivalent, while SparseRC is implemented as a block moment matching projection operating on the matrix hierarchy which results from partitioning. This construction enables SparseRC to match admittance moments per subnet as well as for the recombined network. With global accuracy thus ensured, the approximation quality of the final SparseRC reduced model is guaranteed irrespective of the partitioning tool used or the number/sizes of the partitions. This work focuses on RC reduction. For RC, a reducing transformation which matches multi-port admittance moments is sufficient to ensure accuracy, and can be constructed so as to improve sparsity.
For RLC, however, additional accuracy considerations have to be accounted for, so as to capture oscillatory behavior. Hence, constructing a projection which simultaneously ensures accuracy and sparsity is more involved. RLC circuits can be partitioned within the same framework using a second-order, susceptance-based system formulation which reveals the network topology [93]. On the other hand, the dense nature of the inductive couplings in RLCK circuits may prevent finding a good partition. These topics are treated in Chapter 5; for an existing RLC partition-based approach we refer to PartMOR [70]. The rest of the chapter is structured as follows. Sect. 4.2 formulates the multi-terminal model reduction problem. The SparseRC partitioning-based strategy is detailed in Sect. 4.3, the main focus of the chapter. Numerical results and circuit simulations are presented in Sect. 4.4; Sect. 4.5 concludes. Some conventions on notation and terminology follow next. Matrices: the plain and bold-face symbols G are used interchangeably for the conductance matrix, depending on whether the context refers to unpartitioned or partitioned matrices, respectively (similarly for the capacitance matrix C and the incidence matrix B). Graphs: G (non-bold, non-calligraphic) is a graph associated with the non-zero pattern of the circuit matrices, C (non-bold, non-calligraphic) is a component of G, and nzp denotes the non-zero pattern of a matrix, i.e. its graph topology. Dimensions: p is the number of circuit terminals (external nodes), the same for the original and reduced matrices/circuit; n is

the number of internal nodes of the original circuit; k is the number of internal nodes of the reduced circuit; N is the number of matrix partitions. Nodes: circuit nodes prior to partitioning are classified into terminals and internal nodes; separator nodes are a subset of the original nodes, identified through partitioning as the communication nodes among individual components. Terminology: partition/subnet/block describe the same concept, namely an individual graph/circuit/matrix component identified from partitioning; similarly, a separator or border is a component containing only separator nodes.

4.2 Problem formulation

This section provides the preliminaries for the model reduction of general RC circuits and identifies the challenges emerging in multi-terminal model reduction. The building block for SparseRC is described: EMMP, an extended moment matching projection for reducing multi-terminal RC circuits.

4.2.1 Model reduction

Similarly to [56], consider the modified nodal analysis (MNA) description of an RC circuit:

(G + sC) x(s) = B u(s),   (4.1)

where the MNA matrices G, C are symmetric and non-negative definite, corresponding to the stamps of resistor and capacitor values respectively. x ∈ R^{n+p} denotes the node voltages (measured at the n internal nodes and the p terminals), and n + p is the dimension of (4.1). u ∈ R^p are the currents injected into the terminals. The outputs are the voltage drops at the terminal nodes: y(s) = B^T x(s). The underlying matrix dimensions are G, C ∈ R^{(n+p)×(n+p)}, B ∈ R^{(n+p)×p}. In model reduction, an appropriate V ∈ R^{(n+p)×(k+p)}, k ≥ 0, is sought, such that the system matrices and unknowns are reduced to

Ĝ = V^T G V ∈ R^{(k+p)×(k+p)},  Ĉ = V^T C V ∈ R^{(k+p)×(k+p)},
B̂ = V^T B ∈ R^{(k+p)×p},  x̂ = V^T x ∈ R^{k+p},

and satisfy (Ĝ + sĈ) x̂(s) = B̂ u(s). The transfer function H(s) = B^T (G + sC)^{-1} B characterizes the system's behavior at the input/output ports (here, at the terminal nodes) over the frequency sweep s.
After reduction, this becomes Ĥ(s) = B̂^T (Ĝ + sĈ)^{-1} B̂. A good reduced model/circuit generally satisfies the following:

(a) it gives a small approximation error ||H − Ĥ|| in a suitably chosen norm, for instance by ensuring that Ĥ matches moments of the original H at selected frequency points;
(b) it preserves passivity (and, implicitly, stability); and
(c) it can be computed efficiently.

For multi-terminal circuits especially, new conditions emerge:

(d) for re-connectivity purposes, the incidence of current injections into terminal nodes is preserved (i.e., B̂ is a submatrix of B); and
(e) Ĝ and Ĉ are sparse.

SparseRC is a multi-terminal RC reduction method which meets targets (a)-(e), as will be shown.

4.2.2 Multi-terminal RC reduction with moment matching

The extended moment matching projection (EMMP) is a moment matching reduction method for multi-terminal RC circuits derived from PACT [56] (and conceptually similar to SIP [94]). Being suitable only for multi-terminal RC circuits with relatively few terminals, this projection will be applied, after partitioning, in a block-wise manner inside SparseRC, as described in Sect. 4.3. The description here covers only the material from [56] that is relevant for SparseRC.

Original circuit model (4.1): G, C ∈ R^{(n+p)×(n+p)}, B ∈ R^{(n+p)×p}. Recalling (4.1), let the nodes x be split into selected nodes x_S (terminals and separator nodes²) to be preserved, and internal nodes x_R to be eliminated, revealing the following structure:

( [G_R   G_K ]       [C_R   C_K ] ) [x_R]   [ 0 ]
( [G_K^T G_S ]  +  s [C_K^T C_S ] ) [x_S] = [B_S] u.   (4.2)

Note that [56] uses a simple block separation into purely terminal nodes x_S and internal nodes x_R. Promoting separator nodes along with terminals inside x_S will ultimately positively influence the sparsity of the reduced model. The congruence transform applied to (4.2), X^T G X, X^T C X, X^T B, where [56]

    [I   −G_R^{-1} G_K]
X = [0        I       ],  x = X x̃,   (4.3)

yields:

² See Sect. 4.1 and Sect. 4.3.1 for the definition of separator nodes.

( [G_R   0   ]       [C_R    C̃_K] ) [x̃_R]   [ 0 ]
( [0     G̃_S]  +  s [C̃_K^T  C̃_S] ) [x_S ] = [B_S] u,   (4.4)

where:

G̃_S = G_S − G_K^T G_R^{-1} G_K,   W = −G_R^{-1} G_K,   (4.5)
C̃_S = C_S + W^T C_R W + W^T C_K + C_K^T W,   (4.6)
C̃_K = C_K + C_R W.

Expressing x̃_R in terms of x_S from the first block-row of (4.4), and replacing it in the second, gives:

[ (G̃_S + s C̃_S) − s² C̃_K^T (G_R + s C_R)^{-1} C̃_K ] x_S = B_S u,   (4.7)

where Y_S(s) = G̃_S + s C̃_S and Y_R(s) = s² C̃_K^T (G_R + s C_R)^{-1} C̃_K. The expression (4.7) represents the circuit's multi-port admittance, defined with respect to the selected nodes x_S. Y_S(s) captures the first two multi-port admittance moments at s = 0 [56]. This is formalized next.

Proposition. For a multi-terminal RC circuit of the form (4.2), the first two moments at s = 0 of the multi-port admittance are given by G̃_S and C̃_S from (4.5), (4.6).

Proof. Expressing x_R in terms of x_S from the first block-row of (4.2), and replacing it in the second, gives the circuit's multi-port admittance Y(s) x_S = B_S u, where

Y(s) = (G_S + s C_S) − (G_K + s C_K)^T (G_R + s C_R)^{-1} (G_K + s C_K).

The first two moments of Y(s) at s = 0 are computed from Y(s)|_{s=0} and dY/ds(s)|_{s=0}:

Y(s)|_{s=0} = G_S − G_K^T G_R^{-1} G_K = G̃_S,

dY/ds(s) = C_S − C_K^T (G_R + s C_R)^{-1} (G_K + s C_K) − (G_K + s C_K)^T (G_R + s C_R)^{-1} C_K
           + (G_K + s C_K)^T (G_R + s C_R)^{-1} C_R (G_R + s C_R)^{-1} (G_K + s C_K),

dY/ds(s)|_{s=0} = C_S − C_K^T G_R^{-1} G_K − G_K^T G_R^{-1} C_K + G_K^T G_R^{-1} C_R G_R^{-1} G_K = C̃_S,

the same as G̃_S and C̃_S from (4.5), (4.6). □

The practical consequence of this proposition is that, as with ReduceR, the path resistance of the original circuit is precisely (4.5) and, as shown next, is preserved by the reduced

model. In addition to the path resistance, the slope of an RC circuit's response is captured by the second moment, namely (4.6).

Reduced circuit model: Ĝ, Ĉ ∈ R^{(k+p)×(k+p)}, k ≥ 0. By the proposition above, the reduced model which preserves the first two admittance moments of the original (4.2) is revealed: eliminate the nodes x_R (and the contribution Y_R) and retain the nodes x_S. The corresponding moment matching projection is obtained by removing from X of (4.3) the columns corresponding to x_R:

    [−G_R^{-1} G_K]
V = [      I      ],   (4.8)

Ĝ = V^T G V = G̃_S,  Ĉ = V^T C V = C̃_S,   (4.9)
B̂ = V^T B = B_S,  x̂ = V^T x = x_S.   (4.10)

For simplicity, the reducing projection V from (4.8) shall be referred to further on as the extended moment matching projection (EMMP). The term "extended" denotes that the moments matched are those of the multi-port admittance defined by the terminals together with the preserved internal nodes, rather than, as in PACT, by the terminal nodes only. In other words, the EMMP is the extension of the original projection from PACT [56] or SIP [94] to include the separator nodes.

On the singularity of G. Conductance matrices G and capacitance matrices C describing parasitic RC circuits in MNA form are often singular; thus one must ensure that the EMMP projection (4.8) inverts only non-singular G_R blocks. This is easily achieved by exploiting the properties of MNA matrices (e.g., definiteness, diagonal dominance) and a simple grouping of nodes, so that the internal nodes (i.e. rows/columns) responsible for the singularity of G are excluded from G_R (and promoted to G_S) without any accuracy loss. The lemma below formalizes these actions; similar actions for ensuring the invertibility of G_R are detailed in [56].

Lemma. From the singular conductance matrix G underlying the MNA circuit equations (4.2), an invertible sub-block G_R can always be found.
Proof. Towards showing the invertibility of G_R, an analogy is first emphasized between the circuit's topology and the properties of the G, C matrices describing the MNA equations (4.1). G and C correspond to ungrounded³ circuits, and consequently the matrix pencil (G, C) is singular

³ A ground node is only chosen in the actual circuit simulation phase, depending on the simulation requirements and on how the RC parasitics are connected to other circuitry. The ground node is thus interpreted

(the vector of all 1's forms their column null-space). Therefore, traditional MOR approaches which assume that (G, C) is a regular pencil describing a grounded circuit cannot be directly applied in this context. SparseRC, however, can handle singular (G, C) matrix pencils, as long as the G_R block involved in the EMMP projection is invertible, as shown next. Matrices G, C underlying an ungrounded circuit have the following properties: G_{i,i} ≥ 0, C_{i,i} ≥ 0, G_{i,j} ≤ 0, C_{i,j} ≤ 0 for i ≠ j. Furthermore, they are diagonally dominant, satisfying G_{i,i} + Σ_{j≠i} G_{i,j} = 0 and C_{i,i} + Σ_{j≠i} C_{i,j} = 0. In other words, they are symmetric, positive semi-definite Z-matrices, with positive diagonal entries, negative (or 0) off-diagonal entries, and possibly rows/columns consisting entirely of 0's⁴. As an illustrative example, the circuit in Fig. 4.1, with its nodes ordered according to (4.2) as x = [V_3, V_4, V_1, V_2, V_5, V_6]^T, has the MNA representation (4.11), with G and C assembled from the conductances g_1, …, g_6 and capacitances c_1, …, c_4 of Fig. 4.1, and with u collecting the terminal current injections. In general G_R, being a sub-block of G, will also have positive diagonal entries, negative (or 0) off-diagonals and possibly 0 rows/columns. The structural singularity of G_R due to the 0 rows/columns is easily removed by promoting the empty rows/columns to the G_S block. Assuming this procedure performed, the following scenarios apply to the remaining structure of G_R (as a sub-block of G):

1. G_{R,i,i} > Σ_{j≠i} |G_{R,i,j}| for all i, i.e., G_R is strictly diagonally dominant and thus invertible.

2. G_R may still contain some (but not all) rows/columns whose sum is 0.
In this case G_R satisfies the following property: the diagonal entries of G_R are positive, and there exists a diagonal matrix D such that G_R D is strictly diagonally dominant [assume w.l.o.g. that row i is such that G_{R,i,i} + Σ_{j≠i} G_{R,i,j} = 0 and that G_{R,j,j} ≥ Σ_{k≠j} |G_{R,j,k}| for all other rows j ≠ i; then one

as a terminal, and must be preserved after reduction. Grounding the netlist before MOR would fix a particular node, remove the corresponding row and column from G, C, and destroy connectivity via this node, which is undesirable in practice.

⁴ A 0 row/column in G (respectively C) denotes a node to which no resistor (respectively capacitor) is connected, which is common in practice.

can take D = diag(1, …, 1, g_l, 1, …, 1), with g_l > G_{i,i} placed in position i]. This is equivalent to G_R being invertible [25].

3. G_R contains only rows/columns whose sum is 0 (and is thus still singular). In this case the corresponding x_R nodes are eliminated for free: since the corresponding G_K = 0, the two moments at s = 0 of the multi-port admittance are preserved automatically, and no inversion of G_R is required.

The G_R block thus remains invertible. Singularity in C can be treated similarly, but this is not needed in this chapter. □

Reduction via the EMMP already meets some of the challenges defined at the beginning of Sect. 4.2: (a) two multi-port admittance moments are preserved irrespective of how x is split into x_R and x_S, provided that the x_R are internal nodes (so that the incidence matrix B_R = 0); this ensures that accuracy is maintained via moment matching also when the EMMP is later applied in the partitioned framework (see Sect. 4.3); (b) passivity is preserved [56], as V is a congruence transformation projecting the original positive semi-definite matrices G and C onto reduced matrices Ĝ, Ĉ which remain positive semi-definite; and (d) the input/output incidence matrix B_S remains unaltered after reduction; consequently, the reduced model can be reconnected directly via the terminal nodes to the remaining circuitry (e.g. non-linear devices), without introducing new circuit elements such as controlled sources. The efficiency (c) and sparsity (e) considerations, however, are not met by the EMMP alone when the circuits to be reduced have nodes, circuit elements, and terminals exceeding thousands. On the one hand, constructing G_R^{-1} G_K is either too costly or unfeasible; on the other, the Ĝ, Ĉ resulting from (4.9) may become too dense.

4.2.3 Fill-in and its effect in synthesis

Usually, the G and C describing circuits from real chip designs are large and sparse, while the Ĝ and Ĉ obtained from (4.9) are small, but dense.
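As a numerical sanity check of the two-moment property stated in the proposition above, one can verify on small random blocks (made up here, not the thesis code) that Ĝ and Ĉ from (4.5), (4.6) equal the value and the derivative at s = 0 of the multi-port admittance Y(s):

```python
import numpy as np

# EMMP reduced matrices vs. the first two admittance moments at s = 0.
# All blocks are random, well-conditioned stand-ins for MNA sub-blocks.

rng = np.random.default_rng(0)
nR, nS = 5, 3                                     # internal / preserved nodes

def spd(n):                                       # random symmetric pos.-def.
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

G_R, C_R, G_S, C_S = spd(nR), spd(nR), spd(nS), spd(nS)
G_K = rng.standard_normal((nR, nS))
C_K = rng.standard_normal((nR, nS))

W = -np.linalg.solve(G_R, G_K)                    # W = -G_R^{-1} G_K, (4.5)
Ghat = G_S - G_K.T @ np.linalg.solve(G_R, G_K)    # (4.5)
Chat = C_S + W.T @ C_R @ W + W.T @ C_K + C_K.T @ W   # (4.6)

def Y(s):                                         # multi-port admittance
    M = G_K + s * C_K
    return (G_S + s * C_S) - M.T @ np.linalg.solve(G_R + s * C_R, M)

h = 1e-6                                          # central difference for dY/ds
assert np.allclose(Y(0.0), Ghat)
assert np.allclose((Y(h) - Y(-h)) / (2 * h), Chat, atol=1e-4)
print("first two admittance moments preserved")
```

The first assertion is the path-resistance (zeroth moment) property; the second checks the slope (first moment) captured by (4.6).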
Furthermore, they are only mathematical constructions; a synthesis procedure is thus required to convert the reduced matrix representation back into an RC netlist. This is obtained by unstamping the non-zero entries of Ĝ and Ĉ into the corresponding resistor and capacitor topologies respectively, while B̂ (being a sub-matrix of B) is mapped directly onto the original current injections at the terminals. Unstamping is done via RLCSYN [93]. The dimension of Ĝ and Ĉ gives the number of nodes, while the number of their non-zero entries dictates how many Rs and Cs are present in the reduced netlist. Therefore, while limiting the size of Ĝ and Ĉ, it is critical to also ensure their sparsity. The simple example in Fig. 4.1 compares two reduced netlists derived from a small circuit. The dense reduced model has fewer nodes but more R, C elements than the

Figure 4.1: Top: RC circuit to be reduced, containing p = 4 terminals and n = 2 internal nodes (#R = 6, #C = 3). Node 3 is a special internal node with many connections to other nodes. Bottom left: a dense reduced model, where all internal nodes (3 and 4) were eliminated (k = 0), but more circuit elements are generated (#R = 6, #C = 6). Bottom right: a sparse reduced model with fewer circuit elements (#R = 5), obtained by keeping node 3 (k = 1) and eliminating only node 4.

original, while the sparse reduced model has both fewer nodes and fewer R, C elements. The sparse model was obtained by preserving a node which would introduce fill-in if eliminated. Naturally, identifying such nodes by inspection is no longer possible for very large circuits. Next, it is explained how avoiding fill-creating nodes is possible in practice using reordering techniques.
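The effect illustrated in Fig. 4.1 can be reproduced on a made-up conductance pattern (not the actual values of Fig. 4.1): eliminating a high-degree internal node via the Schur complement of (4.5) fills in the retained block, while keeping that node preserves sparsity:

```python
import numpy as np

# Dense vs. sparse reduction on a toy 6-node pattern: node 0 is a hub tied to
# the four terminals 2-5, node 1 is a low-degree internal node. The unit
# resistor values are illustrative only.

def schur_eliminate(G, elim):
    """Reduce G by the Schur complement over index set `elim` (cf. (4.5))."""
    keep = [i for i in range(G.shape[0]) if i not in elim]
    G_R = G[np.ix_(elim, elim)]
    G_K = G[np.ix_(elim, keep)]
    G_S = G[np.ix_(keep, keep)]
    return G_S - G_K.T @ np.linalg.solve(G_R, G_K)

def n_resistors(G, tol=1e-12):
    """Nonzeros of the strict upper triangle = resistor count after unstamping."""
    return int(np.count_nonzero(np.abs(np.triu(G, 1)) > tol))

# graph Laplacian of: hub 0 -- {1,2,3,4,5}, plus edge 1 -- 2 (unit resistors)
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2)]
G = np.zeros((6, 6))
for i, j in edges:
    G[i, j] = G[j, i] = -1.0
np.fill_diagonal(G, -G.sum(axis=1))

dense = schur_eliminate(G, [0, 1])     # eliminate both internal nodes
sparse = schur_eliminate(G, [1])       # keep the hub, eliminate node 1 only
print(n_resistors(G), n_resistors(dense), n_resistors(sparse))   # 6 6 4
```

The dense variant couples all four terminals pairwise (6 resistors on 4 nodes), while preserving the hub yields only 4 resistors, mirroring the bottom-left vs. bottom-right netlists of Fig. 4.1.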

Improving sparsity with node reorderings

At the basis of sparsity preserving MOR lies the following observation: the congruence transform X from (4.3) is analogous to a partial Cholesky factorization [29] of G [94]. Just as fill-reducing matrix reorderings are used for obtaining sparser factorizations, so can they be applied for sparsity preserving model reduction. They lie at the heart of ReduceR [80] and SIP [94], where the idea is to pre-order the matrix, for instance with Constrained Approximate Minimum Degree (CAMD) [1, 2], so that the nodes responsible for fill-in are placed towards the end of the elimination sequence, along with the terminals. By eliminating the nodes one by one and keeping track of the fill-in generated at each step, the circuit with the fewest elements can be determined. For circuits with challenging topologies, however (i.e. with more internal nodes, terminals or circuit elements), these actions may be either too costly or even unfeasible (see the results in Sect. 4.4). SparseRC avoids such hurdles by exploiting graph partitioning and an appropriate matrix structure, which allow the separator (fill-creating) nodes to be identified and skipped automatically in the reduction process, thus ensuring a desirable level of sparsity. The following sections show how SparseRC, building upon the EMMP in combination with graph partitioning and, at times, additional fill-reducing orderings, maintains the (a), (b), (d) properties and in addition meets the (c) efficiency and (e) sparsity requirements. These are crucial for successfully reducing the very large networks with many terminals arising in industrial problems.

4.3 SparseRC reduction via graph partitioning

The analogy between circuits and graphs is immediate: the circuit nodes are the graph vertices, while the connections among them via R, C elements form the edges of the graph.
For very large multi-terminal circuits to be manageable at all with limited computational resources, a global partitioning scheme is proposed, visualized with the help of Fig. 4.2. Essentially, an original large multi-terminal circuit is split into subnetworks which have fewer nodes and terminals and are minimally connected among each other via separator nodes. SparseRC reduces each subnetwork individually, up to its terminals and separator (cut) nodes. As all separator nodes are automatically preserved, sparsity is improved.⁵

⁵ The two-way partitioning is presented here for simplicity; a natural extension is partitioning into N > 2 subnets.

Figure 4.2: Graph partitioning with separation of terminals. A network with n nodes and p terminals is split into subnetworks with n1 and n2 nodes, p1 and p2 terminals, and k1 and k2 cut nodes respectively; there are few connections between subnetworks, and each subnetwork has fewer nodes and terminals.

Partitioning and the BBD form

Implemented as a divide and conquer reduction strategy, SparseRC first uses graph decompositions [based on the non-zero pattern (nzp) of G + C] to identify individual subnets, as well as the separator nodes through which these communicate. Through partitioning, the original circuit matrices are reordered into the bordered block diagonal (BBD) [97] form: individual blocks form the subnets to be reduced, while the border blocks collect the separator nodes, which are all preserved. The separator nodes are identified with the help of a separator tree indicating which of the subnets resulting from partitioning are in fact separators (this is usually a direct functionality of the partitioning algorithm; see for instance the MATLAB [67] manual pages of the nesdis reordering). In the conquer phase, individual blocks are reduced with the EMMP introduced earlier, and the border blocks are correspondingly updated to maintain the moment matching property. A graphical representation of the 7-component BBD partitioning and reduction is shown in Fig. 4.3. It is emphasized that, as reduction progresses, fill-in is isolated in the reduced parts of C_1, C_2, C_4, C_5, the separator blocks C_3, C_6, C_7, and the corresponding connection borders.⁶ As components are minimally connected, fill-in generated on the border is minimized.

⁶ In implementation both G and C are in BBD form; in Fig. 4.3, G denotes simultaneously the corresponding blocks from both matrices.
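The BBD reordering itself is just a symmetric permutation. A tiny Python illustration on a bisected conductance pattern (a hand-made bisection stands in for a real partitioner such as NESDIS; topology and values are made up):

```python
import numpy as np

# Two subnets A = {0, 2} and B = {1, 3}, coupled only through separator node 4
edges = [(0, 2), (0, 4), (2, 4), (1, 3), (1, 4), (3, 4)]
n = 5
G = np.zeros((n, n))
for i, j in edges:
    G[i, i] += 1; G[j, j] += 1; G[i, j] -= 1; G[j, i] -= 1

# BBD ordering: subnet A first, subnet B second, separator last (the border)
order = [0, 2, 1, 3, 4]
P = np.eye(n)[order]          # row-permutation matrix
Gbbd = P @ G @ P.T            # Gbbd[i, j] = G[order[i], order[j]]

# Independent blocks do not touch each other: the A-B coupling block is zero,
# and all communication sits in the last (border) row/column
assert not Gbbd[np.ix_([0, 1], [2, 3])].any()
print(Gbbd[:4, 4])            # border column carries the separator couplings
```

Because the off-diagonal coupling between independent blocks is exactly zero, reducing one block can only touch its own border entries, which is the structural property exploited below.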

Figure 4.3: Circuit matrices after partitioning, in BBD form (original, left, vs. reduced, right). For clarity, block dimensions are not drawn to scale: in practice the separators are much smaller than the independent components, thus the borders are thin. The individual blocks are reduced up to terminals; the borders are retained and updated. The number inside each independent component denotes the reduction step; this number is also stamped into the corresponding border blocks to mark fill-in. Example: reducing C_1 also updates the separators C_3 and C_7 and the corresponding borders. The root separator C_7 is updated from reducing all individual blocks C_{1,2,4,5}.

Mathematical formulation

The mathematical derivation of SparseRC follows, showing how reduction progressively traverses the BBD matrix structure, reducing individual components and updating the connectivity among them. Herein, G and C denote the original circuit matrices, while subscripted blocks G_R, G_K, G_S (and similarly for C) refer directly to the matrix blocks associated with the EMMP reduction described earlier. Reconsider the original RC circuit in MNA form:

\[ (G + sC)\,x(s) = B\,u(s), \quad (4.12) \]

of dimension n + p, where n is the number of internal nodes and p the number of terminals. The appropriate projection V \in \mathbb{R}^{(n+p)\times(k+p)}, k \geq 0, is sought, which reduces (4.12) to:

\[ \widehat{G} = V^T G V \in \mathbb{R}^{(k+p)\times(k+p)}, \qquad \widehat{C} = V^T C V \in \mathbb{R}^{(k+p)\times(k+p)}, \quad (4.13) \]
\[ \widehat{x} = V^T x \in \mathbb{R}^{(k+p)}, \qquad \widehat{B} = V^T B \in \mathbb{R}^{(k+p)\times p}. \quad (4.14) \]

As illustrated above, V is constructed step-wise using the BBD matrix reordering. Mathematically this is shown via the simplest example of a BBD partitioning: a bisection into two independent components communicating via one separator (border) block.

General reduction for a multi-level BBD partitioned system follows similarly. Consider the bisection of (4.12):

\[ \left( \begin{bmatrix} G_{11} & 0 & G_{13} \\ 0 & G_{22} & G_{23} \\ G_{13}^T & G_{23}^T & G_{33} \end{bmatrix} + s \begin{bmatrix} C_{11} & 0 & C_{13} \\ 0 & C_{22} & C_{23} \\ C_{13}^T & C_{23}^T & C_{33} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} B_1 \\ B_2 \\ B_3 \end{bmatrix}. \quad (4.15) \]

Reducing (4.15) amounts to applying the EMMP on the individual components [here C_1 := nzp(G_{11} + C_{11}) and C_2 := nzp(G_{22} + C_{22})]. The separator [here C_3 := nzp(G_{33} + C_{33})] is kept and updated twice, with the projections reducing C_1 and C_2 respectively. Naturally, the reduction is also applied to the communication blocks G_{13}, C_{13}, G_{23}, C_{23}. Updating the separator and communication blocks at each individual reduction step ensures admittance moment preservation for the total recombined circuit (see Theorem 4.3.1).

Step 1. Consider the reduction of subnetwork C_1 with EMMP, based on splitting x_1 of (4.15) into x_{1_R} and x_{1_S}, i.e. into the internal nodes (to be eliminated) and selected nodes (to be preserved) of subnet C_1:

\[ \begin{bmatrix} G_{11_R} & G_{11_K} & 0 & G_{13_R} \\ G_{11_K}^T & G_{11_S} & 0 & G_{13_S} \\ 0 & 0 & G_{22} & G_{23} \\ G_{13_R}^T & G_{13_S}^T & G_{23}^T & G_{33} \end{bmatrix} + s \begin{bmatrix} C_{11_R} & C_{11_K} & 0 & C_{13_R} \\ C_{11_K}^T & C_{11_S} & 0 & C_{13_S} \\ 0 & 0 & C_{22} & C_{23} \\ C_{13_R}^T & C_{13_S}^T & C_{23}^T & C_{33} \end{bmatrix}, \]
\[ x^T = [x_{1_R}^T,\ x_{1_S}^T,\ x_2^T,\ x_3^T], \qquad B^T = [0,\ B_{1_S}^T,\ B_2^T,\ B_3^T]. \quad (4.16) \]

Recognizing the analogy with (4.2), the EMMP-based transformation which reduces the network (4.16) by eliminating the internal nodes x_{1_R} is given by:

\[ V_1 = \begin{bmatrix} -G_{11_R}^{-1} G_{11_K} & 0 & -G_{11_R}^{-1} G_{13_R} \\ I_{S_1} & 0 & 0 \\ 0 & I_2 & 0 \\ 0 & 0 & I_3 \end{bmatrix} = \begin{bmatrix} -G_R^{-1} G_K \\ I_{S_1 2 3} \end{bmatrix}, \quad (4.17) \]

where I_{S_1 2 3} := blockdiag(I_{S_1}, I_2, I_3). Let:

\[ W_{11} = -G_{11_R}^{-1} G_{11_K}, \qquad W_{13} = -G_{11_R}^{-1} G_{13_R}. \quad (4.18) \]

As with (4.8)-(4.10), the reduced model for (4.16) is computed from G_S = V_1^T G V_1, C_S = V_1^T C V_1. The reduced system at step 1 has the form:

\[ G_S = \begin{bmatrix} \widehat{G}_{11} & 0 & \widehat{G}_{13} \\ 0 & G_{22} & G_{23} \\ \widehat{G}_{13}^T & G_{23}^T & \widetilde{G}_{33} \end{bmatrix}, \qquad C_S = \begin{bmatrix} \widehat{C}_{11} & 0 & \widehat{C}_{13} \\ 0 & C_{22} & C_{23} \\ \widehat{C}_{13}^T & C_{23}^T & \widetilde{C}_{33} \end{bmatrix}, \quad (4.19) \]
\[ B_S = \begin{bmatrix} \widehat{B}_1 \\ B_2 \\ B_3 \end{bmatrix}, \qquad x_S = \begin{bmatrix} \widehat{x}_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad (4.20) \]

where:

\[ \widehat{G}_{11} = G_{11_S} - G_{11_K}^T G_{11_R}^{-1} G_{11_K}, \quad (4.21) \]
\[ \widehat{G}_{13} = G_{13_S} - G_{11_K}^T G_{11_R}^{-1} G_{13_R}, \quad (4.22) \]
\[ \widetilde{G}_{33} = G_{33} - G_{13_R}^T G_{11_R}^{-1} G_{13_R}, \quad (4.23) \]
\[ \widehat{C}_{11} = C_{11_S} + W_{11}^T C_{11_R} W_{11} + W_{11}^T C_{11_K} + C_{11_K}^T W_{11}, \quad (4.24) \]
\[ \widehat{C}_{13} = C_{13_S} + W_{11}^T C_{13_R} + C_{11_K}^T W_{13} + W_{11}^T C_{11_R} W_{13}, \quad (4.25) \]
\[ \widetilde{C}_{33} = C_{33} + W_{13}^T C_{11_R} W_{13} + W_{13}^T C_{13_R} + C_{13_R}^T W_{13}, \quad (4.26) \]
\[ \widehat{B}_1 = B_{1_S}, \qquad \widehat{x}_1 = x_{1_S}. \quad (4.27) \]

The BBD form provides an important structural advantage, both in identifying fill-in and in implementation: reducing one subnet only affects the entries of the corresponding separator and border blocks, leaving the rest of the independent subnets intact. Notice from (4.19)-(4.20) how reducing subnet C_1 has only affected its connection blocks to C_3 and the separator block C_3 itself. The blocks of subnet C_2 are not affected, because C_1 communicates with C_2 only via the separator C_3. Therefore, while reducing C_2, the already computed blocks of the reduced C_1 will no longer be affected; only the connection blocks from C_2 to C_3 and the separator C_3 itself will be updated. Mathematically, this is shown next.

Step 2. Partition now the reduced G_S, C_S matrices (4.19)-(4.20) by splitting component C_2 according to x_{2_R} and x_{2_S}:

\[ \begin{bmatrix} \widehat{G}_{11} & 0 & 0 & \widehat{G}_{13} \\ 0 & G_{22_R} & G_{22_K} & G_{23_R} \\ 0 & G_{22_K}^T & G_{22_S} & G_{23_S} \\ \widehat{G}_{13}^T & G_{23_R}^T & G_{23_S}^T & \widetilde{G}_{33} \end{bmatrix} + s \begin{bmatrix} \widehat{C}_{11} & 0 & 0 & \widehat{C}_{13} \\ 0 & C_{22_R} & C_{22_K} & C_{23_R} \\ 0 & C_{22_K}^T & C_{22_S} & C_{23_S} \\ \widehat{C}_{13}^T & C_{23_R}^T & C_{23_S}^T & \widetilde{C}_{33} \end{bmatrix}, \quad (4.28) \]
\[ x_S^T = [\widehat{x}_1^T,\ x_{2_R}^T,\ x_{2_S}^T,\ x_3^T], \qquad B_S^T = [\widehat{B}_1^T,\ 0,\ B_{2_S}^T,\ B_3^T]. \quad (4.29) \]

As before, the EMMP-based transformation which reduces the network (4.28) by eliminating nodes x_{2_R} is given by:

\[ V_2 = \begin{bmatrix} I_{\widehat{S}_1} & 0 & 0 \\ 0 & -G_{22_R}^{-1} G_{22_K} & -G_{22_R}^{-1} G_{23_R} \\ 0 & I_{S_2} & 0 \\ 0 & 0 & I_3 \end{bmatrix}. \quad (4.30) \]

The reduced model is obtained by projecting (4.28)-(4.29) with V_2: \widehat{G} = V_2^T G_S V_2, \widehat{C} = V_2^T C_S V_2, \widehat{B} = V_2^T B_S, \widehat{x} = V_2^T x_S:

\[ \widehat{G} = \begin{bmatrix} \widehat{G}_{11} & 0 & \widehat{G}_{13} \\ 0 & \widehat{G}_{22} & \widehat{G}_{23} \\ \widehat{G}_{13}^T & \widehat{G}_{23}^T & \widehat{G}_{33} \end{bmatrix}, \qquad \widehat{C} = \begin{bmatrix} \widehat{C}_{11} & 0 & \widehat{C}_{13} \\ 0 & \widehat{C}_{22} & \widehat{C}_{23} \\ \widehat{C}_{13}^T & \widehat{C}_{23}^T & \widehat{C}_{33} \end{bmatrix}, \quad (4.31) \]
\[ \widehat{B} = \begin{bmatrix} \widehat{B}_1 \\ \widehat{B}_2 \\ B_3 \end{bmatrix}, \qquad \widehat{x} = \begin{bmatrix} \widehat{x}_1 \\ \widehat{x}_2 \\ x_3 \end{bmatrix}, \quad (4.32) \]

where (4.21)-(4.27) hold and:

\[ \widehat{G}_{22} = G_{22_S} - G_{22_K}^T G_{22_R}^{-1} G_{22_K}, \quad (4.33) \]
\[ \widehat{G}_{23} = G_{23_S} - G_{22_K}^T G_{22_R}^{-1} G_{23_R}, \quad (4.34) \]
\[ \widehat{G}_{33} = \widetilde{G}_{33} - G_{23_R}^T G_{22_R}^{-1} G_{23_R}, \quad (4.35) \]
\[ \widehat{C}_{22} = C_{22_S} + W_{22}^T C_{22_R} W_{22} + W_{22}^T C_{22_K} + C_{22_K}^T W_{22}, \quad (4.36) \]
\[ \widehat{C}_{23} = C_{23_S} + W_{22}^T C_{23_R} + C_{22_K}^T W_{23} + W_{22}^T C_{22_R} W_{23}, \quad (4.37) \]
\[ \widehat{C}_{33} = \widetilde{C}_{33} + W_{23}^T C_{22_R} W_{23} + W_{23}^T C_{23_R} + C_{23_R}^T W_{23}, \quad (4.38) \]
\[ \widehat{B}_2 = B_{2_S}, \qquad \widehat{x}_2 = x_{2_S}, \quad \text{with} \quad (4.39) \]
\[ W_{22} = -G_{22_R}^{-1} G_{22_K}, \qquad W_{23} = -G_{22_R}^{-1} G_{23_R}. \quad (4.40) \]

As seen from (4.35) and (4.38), the separator blocks \widehat{G}_{33} and \widehat{C}_{33} are further updates of the blocks \widetilde{G}_{33}, \widetilde{C}_{33} (previously obtained from reducing C_1). The reduced model retains the BBD form, and the separator nodes are retained in the blocks corresponding to \widehat{G}_{33} and \widehat{C}_{33}. The p terminals are distributed among C_1, C_2, C_3, as seen from the form of \widehat{B} in (4.32). Equations (4.27), (4.39) and (4.32) together show that the input/output incidence matrix is preserved after reduction; thus the reduced netlist obtained from RLCSYN [93] unstamping preserves connectivity via the terminal nodes. In the general case, block-wise reduction of finer BBD partitions (into N > 3 components) follows similarly, with the appropriate projections of separator and border blocks. The moment-matching, terminal connectivity and passivity requirements remain satisfied.
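The two-step reduction above can be checked numerically. The sketch below (dense numpy, illustrative Laplacian-stamped values; `emmp` is a hypothetical helper mirroring the congruences (4.17)/(4.30)) verifies that reducing subnet 1 and then subnet 2 leaves the first two admittance moments at s = 0 of the recombined circuit unchanged:

```python
import numpy as np

def lap(n, edges):
    """Laplacian-type stamp for symmetric element values (guarantees PSD)."""
    L = np.zeros((n, n))
    for i, j, g in edges:
        L[i, i] += g; L[j, j] += g; L[i, j] -= g; L[j, i] -= g
    return L

def moments(G, C, B):
    """0th and 1st multi-port admittance moments of Y(s) = B^T (G+sC)^{-1} B at s=0."""
    X = np.linalg.solve(G, B)
    return B.T @ X, -X.T @ C @ X

def emmp(G, C, B, iR):
    """One EMMP step: congruence with V = [W; I], W = -G_R^{-1} G_K, as in
    (4.17)/(4.30); B must be zero on the eliminated rows iR."""
    iS = [i for i in range(G.shape[0]) if i not in iR]
    W = -np.linalg.solve(G[np.ix_(iR, iR)], G[np.ix_(iR, iS)])
    V = np.zeros((G.shape[0], len(iS)))
    V[iR, :] = W
    V[iS, :] = np.eye(len(iS))
    return V.T @ G @ V, V.T @ C @ V, V.T @ B

# Bisection as in (4.15): subnet 1 = nodes 0-2, subnet 2 = nodes 3-5,
# separator = node 6; terminals at nodes 2, 5, 6 (all values illustrative)
gE = [(0,1,1.),(1,2,2.),(0,2,1.),(0,6,1.),(2,6,3.),(3,4,2.),(4,5,1.),(3,6,1.),(5,6,2.)]
cE = [(0,1,.5),(1,2,.1),(0,6,.2),(3,4,.3),(4,5,.2),(5,6,.4)]
G = lap(7, gE) + 1e-2 * np.eye(7)   # small ground conductances: G nonsingular
C = lap(7, cE) + 1e-2 * np.eye(7)
B = np.zeros((7, 3)); B[[2, 5, 6], [0, 1, 2]] = 1.0

M0, M1 = moments(G, C, B)
G1, C1, B1 = emmp(G, C, B, iR=[0, 1])     # Step 1: subnet 1 internals out
G2, C2, B2 = emmp(G1, C1, B1, iR=[1, 2])  # Step 2: subnet 2 internals (old 3, 4)
M0r, M1r = moments(G2, C2, B2)
print(np.allclose(M0, M0r), np.allclose(M1, M1r))   # both moments preserved
```

The 7-node network collapses to 3 nodes (two terminals plus the separator) while the DC offset and slope of the multi-port admittance are reproduced exactly, in line with Theorem 4.3.1.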

Options for further improving sparsity

Partitioning provides an additional structural and computational advantage: if necessary, additional reordering and minimum-fill tracking options, such as those employed by ReduceR/SIP, can be applied per subnet. Naturally, such operations come at additional computational cost, but are still more efficient than monitoring fill-in directly on the unpartitioned circuit. So, while separator nodes already improve sparsity globally and automatically, fill-monitoring actions may identify additional internal nodes to be preserved locally in each subnet. For instance, in the reduction scenario of Step 1, the G_{11}, C_{11} blocks of (4.15) would be reordered with CAMD, and fill-tracking would identify which additional internal nodes should be preserved along with the terminals in x_{1_S}. This may improve sparsity inside \widehat{G}_{11}, \widehat{C}_{11} (and correspondingly in \widehat{G}_{13}, \widehat{C}_{13}, \widetilde{G}_{33}, \widetilde{C}_{33}) even beyond the level already achieved by preserving the separator nodes. In Sect. 4.4, examples are provided to illustrate this effect. Nevertheless, such fill-monitoring operations are not always necessary: the sparsity level achieved directly from partitioning and preserving separator nodes is often sufficient. This is discussed in Sect. 4.4.

Moment matching, passivity and synthesis

Note that as \widehat{G} = V_2^T G_S V_2 = V_2^T V_1^T G V_1 V_2 (similarly for \widehat{C}, \widehat{B}), the reducing projection from (4.13)-(4.14) is V := V_1 V_2, with V_1, V_2 as deduced in (4.17) and (4.30) respectively. In efficient implementations, however, V_1, V_2 and V are never formed directly; rather, they are formed block-wise as just shown. Only W_{11}, W_{13}, W_{22}, W_{23} from (4.18) and (4.40) are explicitly formed. Next, it is shown that the V constructed from successive EMMPs matches the first two admittance moments at the end of the reduction procedure.
Theorem 4.3.1. Consider the original circuit (4.15), with matrices partitioned and structured in BBD form, reduced by applying successive EMMP projections on each subnet. The final reduced model (4.31)-(4.32) preserves the first two multi-port admittance moments around s = 0 of each individual subnet and of the entire recombined circuit (4.15).

Proof (of Theorem 4.3.1). By the moment-matching property of the EMMP, the reduced subnet 1, defined by \widehat{G}_{11} (4.21) and \widehat{C}_{11} (4.24), preserves the first two multi-port admittance moments at s = 0 of the original subnet 1, defined by G_{11}, C_{11}. The same holds for the reduced subnet 2, defined by \widehat{G}_{22} (4.33) and \widehat{C}_{22} (4.36). It remains to prove the admittance moment matching between the original (4.15) and reduced (4.31)-(4.32) recombined circuits. This is shown by reconstructing, from the individual EMMPs, an EMMP projection V associated with the entire circuit (4.15), as follows. Recall G from (4.16), where in addition nodes x_2 of the second subnet are split into x_{2_R}

and x_{2_S} as in Step 2:

\[ G = \begin{bmatrix} G_{11_R} & G_{11_K} & 0 & 0 & G_{13_R} \\ G_{11_K}^T & G_{11_S} & 0 & 0 & G_{13_S} \\ 0 & 0 & G_{22_R} & G_{22_K} & G_{23_R} \\ 0 & 0 & G_{22_K}^T & G_{22_S} & G_{23_S} \\ G_{13_R}^T & G_{13_S}^T & G_{23_R}^T & G_{23_S}^T & G_{33} \end{bmatrix}. \]

Recall V = V_1 V_2, with V_1 from (4.17) and V_2 from (4.30). Inside V_1, let I_2 = blockdiag(I_{R_2}, I_{S_2}) be partitioned according to the splitting of x_2 into x_{2_R} and x_{2_S} respectively. Then, by straightforward matrix multiplication:

\[ V = V_1 V_2 = \begin{bmatrix} -G_{11_R}^{-1} G_{11_K} & 0 & -G_{11_R}^{-1} G_{13_R} \\ I_{S_1} & 0 & 0 \\ 0 & -G_{22_R}^{-1} G_{22_K} & -G_{22_R}^{-1} G_{23_R} \\ 0 & I_{S_2} & 0 \\ 0 & 0 & I_3 \end{bmatrix}. \quad (4.41) \]

Let P be the permutation swapping the second with the third block-rows of (4.41). Then, denoting V_P = PV:

\[ V_P = \begin{bmatrix} -G_{11_R}^{-1} G_{11_K} & 0 & -G_{11_R}^{-1} G_{13_R} \\ 0 & -G_{22_R}^{-1} G_{22_K} & -G_{22_R}^{-1} G_{23_R} \\ I_{S_1} & 0 & 0 \\ 0 & I_{S_2} & 0 \\ 0 & 0 & I_3 \end{bmatrix}. \quad (4.42) \]

Define the permuted matrices G_P = P G P^T, C_P = P C P^T, B_P = P B, and notice their structure:

\[ G_P = \begin{bmatrix} G_{11_R} & 0 & G_{11_K} & 0 & G_{13_R} \\ 0 & G_{22_R} & 0 & G_{22_K} & G_{23_R} \\ G_{11_K}^T & 0 & G_{11_S} & 0 & G_{13_S} \\ 0 & G_{22_K}^T & 0 & G_{22_S} & G_{23_S} \\ G_{13_R}^T & G_{23_R}^T & G_{13_S}^T & G_{23_S}^T & G_{33} \end{bmatrix} =: \begin{bmatrix} G_R & G_K \\ G_K^T & G_S \end{bmatrix}, \quad \text{similarly } C_P =: \begin{bmatrix} C_R & C_K \\ C_K^T & C_S \end{bmatrix}, \quad (4.43) \]
\[ B_P = \begin{bmatrix} 0 \\ 0 \\ B_{1_S} \\ B_{2_S} \\ B_3 \end{bmatrix} =: \begin{bmatrix} 0 \\ B_S \end{bmatrix}. \quad (4.44) \]

From (4.43), (4.44) and the analogy with the EMMP described earlier, one recognizes immediately in (4.42) the EMMP:

\[ V_P = \begin{bmatrix} -G_R^{-1} G_K \\ I_{S_1 S_2 3} \end{bmatrix}, \qquad I_{S_1 S_2 3} := \mathrm{blockdiag}(I_{S_1}, I_{S_2}, I_3). \quad (4.45) \]

By the moment-matching property of the EMMP, the reduced model obtained by projecting (4.43), (4.44) with (4.45) matches the first two moments around s = 0 of the multi-port admittance (defined with respect to B_S, i.e. the total set of terminals and separator nodes of the recombined circuit). Since P is a permutation satisfying P^T P = I, it follows immediately that this reduced model is precisely:

\[ V_P^T G_P V_P = V^T G V = \widehat{G}, \qquad V_P^T C_P V_P = V^T C V = \widehat{C}, \qquad V_P^T B_P = V^T B = \widehat{B}, \]

where (4.31)-(4.39) hold. ∎

Matching behavior at higher frequencies

Although SparseRC is mainly based on matching the first two admittance moments at s = 0 (which proved sufficient for most experiments of Sect. 4.4), it is possible, when necessary, to additionally improve the approximation at higher frequency points. One possibility is to include contributions from the otherwise neglected term Y_R(s) of the admittance response (4.7). Let for simplicity Q be a transformation which reduces Y_R(s). One can for instance perform a traditional PRIMA [71] reduction of Y_R(s), which constructs for a chosen expansion point s_i the Krylov subspace:

\[ \mathcal{K}_m(A^{-1} C_R,\, A^{-1} C_K) = \mathrm{span}\{ A^{-1} C_K,\ (A^{-1} C_R) A^{-1} C_K,\ \ldots,\ (A^{-1} C_R)^{m-1} A^{-1} C_K \}, \]

where A = G_R + s_i C_R. If \mathcal{K}_m \subseteq \mathrm{span}(Q), then Q matches 2m multi-port admittance moments of Y_R(s) at s_i. As a special case, if s_i = 0, then Q matches 2m multi-port admittance moments at zero of (4.7), additionally to the 0th and 1st, which are matched by default. Another option is to include in Q eigenvectors associated with the dominant eigenvalues of Y_R(s), similarly to [56].
Be it obtained from moment or pole matching (or a combination of both), it is easily verified that the projection Q which reduces Y_R(s) enters the transformation (4.3) as follows:

\[ X_Q = \begin{bmatrix} Q & W \\ 0 & I \end{bmatrix}, \qquad W = -G_R^{-1} G_K. \quad (4.46) \]

To form Q within the partitioned framework, reconsider the reduction of subnet 1 from Step 1. The reducing transformation there was V_1 (4.17), which preserved the first two admittance moments at s = 0 and eliminated entirely the contribution of the internal nodes x_{1_R}. Rather than eliminating x_{1_R}, let Q_1 be the transformation which reduces the internal matrices G_{11_R} and C_{11_R}. Similarly, during the reduction of subnet 2 (Step 2), let Q_2 be the transformation which reduces G_{22_R} and C_{22_R}. Q_1 and Q_2 enter the reducing transformation for the recombined network as follows:

\[ X_Q = \begin{bmatrix} Q_1 & W_{11} & 0 & 0 & W_{13} \\ 0 & I_{S_1} & 0 & 0 & 0 \\ 0 & 0 & Q_2 & W_{22} & W_{23} \\ 0 & 0 & 0 & I_{S_2} & 0 \\ 0 & 0 & 0 & 0 & I_3 \end{bmatrix}, \quad (4.47) \]

where (4.18) and (4.40) hold. Note that (4.47) is the extension with Q_1 and Q_2 of the default projection (4.41), which matches the first two multi-port admittance moments at s = 0. In a similar manner, a projection can be constructed which directly matches moments at s_i ≠ 0 of the entire multi-port admittance (4.7) (see [57] for details, which pertain more to RLC reduction). Given that a circuit's offset and slope at DC are precisely the first two admittance moments at s = 0 [56], matching these is a natural choice. We emphasize that, in contrast with SparseRC, direct moment matching at s = 0 cannot be achieved via PRIMA [71] or SPRIM [27], since the original G matrix is singular. To summarize: should additional accuracy be necessary when reducing RC circuits with SparseRC, it is safer to match the first two admittance moments at s = 0 and improve the response with contributions from the internal Y_R(s) term as described above. This approach was implemented in the partitioned framework for two examples in Sect. 4.4: the RC transmission line example, where moments of Y_R(s) are matched in addition, and the low noise amplifier (LNA), where dominant eigenmodes of Y_R(s) are additionally matched. SparseRC matches two moments at s = 0 by default; hence extra poles/moments can be added, if needed, after the default reduction, by re-arranging the projection X_Q so that the extra Q_i columns are formed in a separate reduction run. The decision on whether to add extra poles or moments is difficult to make a priori, as is the case for any moment-based reduction method. One approach would be to compare the responses of the original and the default reduced order model at high frequencies; should significant deviations be observed, the addition of extra poles/moments is recommended.
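A sketch of this construction in numpy (random PSD blocks of hypothetical sizes; a plain block-Krylov basis with QR orthogonalization stands in for a robust PRIMA-style Arnoldi):

```python
import numpy as np

rng = np.random.default_rng(0)

def krylov_basis(M, R0, m):
    """Orthonormal basis of span{R0, M R0, ..., M^{m-1} R0} (block Krylov)."""
    blocks, V = [R0], R0
    for _ in range(m - 1):
        V = M @ V
        blocks.append(V)
    Q, _ = np.linalg.qr(np.hstack(blocks))
    return Q

# Symmetric PSD test matrices with 6 internal (R) and 2 selected (S) nodes
Zg, Zc = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
G = Zg @ Zg.T + 8 * np.eye(8)
C = Zc @ Zc.T
iR, iS = list(range(6)), [6, 7]
GR, GK = G[np.ix_(iR, iR)], G[np.ix_(iR, iS)]
CR, CK = C[np.ix_(iR, iR)], C[np.ix_(iR, iS)]

W = -np.linalg.solve(GR, GK)
si = 1.0                                   # chosen expansion point s_i
A = GR + si * CR
Q = krylov_basis(np.linalg.solve(A, CR), np.linalg.solve(A, CK), m=2)

# Extended congruence (4.46): X_Q = [[Q, W], [0, I]]
XQ = np.block([[Q, W], [np.zeros((2, Q.shape[1])), np.eye(2)]])
Gh, Ch = XQ.T @ G @ XQ, XQ.T @ C @ XQ
print(Gh.shape, np.allclose(Gh, Gh.T), np.linalg.eigvalsh(Gh).min() > 0)
```

Because X_Q is a congruence transformation, symmetry and non-negative definiteness, hence passivity, survive no matter how Q is chosen; only the accuracy of the extra columns depends on the choice of s_i and m.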
In PACT [56], poles are added based on a specified error and maximum operating frequency.

Passivity and synthesis

As with PACT [56], the reducing projection V is a congruence transformation applied to original symmetric, non-negative definite matrices, which gives reduced symmetric, non-negative definite matrices (4.31)-(4.32). Consequently [56], the final reduced model (4.31)-(4.32) is passive and stable, and the reduced netlist obtained from RLCSYN [93] unstamping remains passive irrespective of the values of the resulting R, C elements. If reduction is performed with the default two-moment matching at s = 0 (which was sufficient in all experiments except the two-port RC line of Sect. 4.4), the projection V of (4.41) guarantees that the unstamping of \widehat{G} generates only positive resistors. This is ensured by the special form of V, which performs a Schur-complement operation on the original matrix G [91] (note that a standard moment matching projection as in PRIMA [71] does not guarantee positive resistors from unstamping, even though the reduced \widehat{G} is symmetric positive definite). While there may be negative capacitances resulting from unstamping \widehat{C}, they violate neither the passivity of the netlist nor its direct usability inside a simulator such as Spectre [16]. Furthermore, they do not prejudice the quality of the re-simulation results, as confirmed by Sect. 4.4. In fact, as also motivated in SIP [94], dropping negative capacitors from the reduced netlist is in practice a dangerous operation, so all capacitors are safely kept in. In the case of reduction with additional accuracy as described above, the unstamping of \widehat{G} may generate negative resistors; these again posed no difficulties in the simulations performed (e.g. AC, transient, periodic steady state). PartMOR [70] presents an alternative reduction and synthesis strategy which ensures positive-only elements.

SparseRC algorithm

The SparseRC pseudocode is outlined in Algorithms 1 and 2, which describe the most relevant reduction case of matching the first two admittance moments at s = 0. To summarize Alg. 1: from the original circuit matrices G, C and a vector of indices e denoting the original location of terminals (external nodes), SparseRC outputs the reduced circuit matrices \widehat{G}, \widehat{C} and the vector \widetilde{e} denoting the new terminal locations. As an advanced option, do_minfill specifies whether additional fill-reducing orderings should be employed per partition. The graph G defined by the circuit topology [the non-zero pattern (nzp) of G + C] is partitioned into N components.
A permutation P (which reorders the circuit matrices in BBD form) is obtained, together with a vector Sep indicating which of the N components is a separator. For each non-separator component C_k, defined by nodes i_k, the corresponding matrix blocks are reduced with EMMP, while accounting for the communication of C_k with the remaining components via the separator nodes i_sep. All separator components are kept, after having been appropriately updated inside EMMP. The G, C matrices supplied at each step to EMMP (line 8 of Alg. 1) are updated in place, and the reduction follows according to Alg. 2. The i_k index selects from the supplied G, C the component to be reduced (say, C_k), while i_sep are the indices of the separator nodes through which i_k communicates with the rest of the circuit. If desired, at the entry of EMMP these nodes are reordered with CAMD, so as to identify additional internal nodes which may further improve sparsity when reducing C_k (this operation however is only an advanced feature and often unnecessary). Internal and external nodes of component C_k are identified: the internal nodes i_R will be removed, and the selected nodes i_S will be preserved (i.e. terminals of C_k, the corresponding separator nodes, and possibly some additional internal nodes obtained from step 2). The corresponding matrix blocks are identified and the update matrix W is formed. The blocks corresponding to the selected nodes i_S are updated, while those corresponding to the eliminated i_R nodes are removed. At output, \widehat{G}, \widehat{C} are the reduced matrices: internal nodes were eliminated only from the component defined by node indices i_k, while nodes corresponding to the other components are untouched. The terminal locations of the reduced model are indexed by \widetilde{e}.

Algorithm 1: (\widehat{G}, \widehat{C}, \widetilde{e}) = SparseRC(G, C, e, do_minfill)
Given: original G, C, original vector of terminal indices e, do_minfill (0/1) option for minimum-fill reordering per subnet
Output: reduced \widehat{G}, \widehat{C}, updated vector of terminal indices \widetilde{e}
 1: Let graph G := nzp(G + C)
 2: (P, Sep) = partition(G, N)
 3: G = G(P, P), C = C(P, P), e = e(P)                        ▷ reorder in BBD form
 4: for component C_k = 1 ... N do
 5:   if C_k ∉ Sep then                                       ▷ C_k is not a separator
 6:     i_k = index of nodes of component C_k
 7:     i_sep = index of separator nodes connecting C_k to components C_{k+1} ... C_N
 8:     (G, C, e) = EMMP(G, C, i_k, i_sep, e, do_minfill)     ▷ reduce C_k with EMMP
 9:   else                                                    ▷ keep separator component C_k
10:   end if
11: end for
12: \widehat{G} = G, \widehat{C} = C, \widetilde{e} = e

A few computational remarks: to ensure numerical stability while forming the reduced matrix blocks (steps 10-11 of Alg. 2), we apply rescaling to C (and/or G). Based on Theorem 4.3.1, it is also ensured that the global moment matching projection which underlies SparseRC inherits the proven [94] full-column-rank properties of the SIP/PACT projection.

On the partitioning strategy

It was seen how preserving internal nodes along with terminals improves the sparsity of the reduced model. Good reduced models are sparse and small, i.e. they have minimum fill and few preserved internal nodes. Towards obtaining reduced models with a suitable trade-off between sparsity and dimension, one may naturally ask: (a) what are the optimal partitioning criteria and the number of generated subnets N, and (b) when are additional fill-reducing node reorderings and fill-monitoring actions needed aside from partitioning?
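Before turning to these questions, the Alg. 1 loop with the Alg. 2 kernel inlined can be sketched in dense Python (a user-supplied partition stands in for NESDIS and the BBD reordering, the do_minfill branch is omitted, and all topology and values are illustrative):

```python
import numpy as np

def sparse_rc(G, C, e, components):
    """Dense sketch of Algorithms 1-2: `components` lists the node labels of
    each non-separator block; labels outside every component are separators
    and are kept. The Schur update is applied over all kept nodes, which is
    equivalent block-wise: W columns for nodes not coupled to i_R are zero."""
    labels = list(range(G.shape[0]))
    for comp in components:
        iR = [labels.index(v) for v in comp if v not in e]    # internal nodes
        iS = [i for i in range(len(labels)) if i not in iR]   # keep the rest
        W = -np.linalg.solve(G[np.ix_(iR, iR)], G[np.ix_(iR, iS)])  # Alg.2, l.9
        GK, CK, CR = G[np.ix_(iR, iS)], C[np.ix_(iR, iS)], C[np.ix_(iR, iR)]
        G = G[np.ix_(iS, iS)] + GK.T @ W                            # Alg.2, l.10
        C = C[np.ix_(iS, iS)] + CK.T @ W + W.T @ CK + W.T @ CR @ W  # Alg.2, l.11
        labels = [labels[i] for i in iS]                            # Alg.2, l.12
    return G, C, [labels.index(v) for v in e]

def lap(n, edges):
    L = np.zeros((n, n))
    for i, j, g in edges:
        L[i, i] += g; L[j, j] += g; L[i, j] -= g; L[j, i] -= g
    return L

# 7-node bisection: subnets {0,1,2} and {3,4,5}, separator {6}, terminals {2,5}
gE = [(0,1,1.),(1,2,2.),(0,6,1.),(2,6,3.),(3,4,2.),(4,5,1.),(3,6,1.),(5,6,2.)]
G = lap(7, gE) + 1e-2 * np.eye(7)
C = lap(7, [(0,1,.5),(2,6,.1),(3,4,.3),(5,6,.2)]) + 1e-2 * np.eye(7)
e = [2, 5]

Gh, Ch, eh = sparse_rc(G, C, e, components=[[0, 1, 2], [3, 4, 5]])

# DC check: the multi-port conductance seen at the terminals is unchanged
B = np.zeros((7, 2)); B[e, [0, 1]] = 1.0
Bh = np.zeros((3, 2)); Bh[eh, [0, 1]] = 1.0
print(Gh.shape, np.allclose(B.T @ np.linalg.solve(G, B),
                            Bh.T @ np.linalg.solve(Gh, Bh)))
```

The separator node is never eliminated, so the reduced model keeps the BBD skeleton, and the terminal index vector is re-mapped exactly as \widetilde{e} is in Alg. 2.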
Algorithm 2: (\widehat{G}, \widehat{C}, \widetilde{e}) = EMMP(G, C, i_k, i_sep, e, do_minfill)
Given: initial G, C, corresponding vector of terminal indices e, do_minfill (0/1) option for minimum-fill node reordering
Output: reduced \widehat{G}, \widehat{C}, corresponding vector of terminal indices \widetilde{e}
 1: if do_minfill then                          ▷ find additional internal nodes to preserve
 2:   (i_k, i_sep, e) = reorder_camd(G, C, i_k, i_sep, e)
 3:                                             ▷ find optimal minimum-fill ordering per subnet
 4: end if
 5: (i_int, i_ext) = split(i_k, e)              ▷ split i_k into internal and external nodes
 6: i_R = i_int                                 ▷ internal nodes to eliminate
 7: i_S = [i_ext, i_sep]                        ▷ selected nodes to keep
 8: G_R = G(i_R, i_R), C_R = C(i_R, i_R)
    G_K = G(i_R, i_S), C_K = C(i_R, i_S)
    G_S = G(i_S, i_S), C_S = C(i_S, i_S)
 9: W = -G_R^{-1} G_K                           ▷ construct reducing projection
10: G(i_S, i_S) = G_S + G_K^T W                 ▷ update entries for selected nodes
11: C(i_S, i_S) = C_S + C_K^T W + W^T C_K + W^T C_R W
12: G(i_R, i_R) = { }, C(i_R, i_R) = { }        ▷ eliminate the i_R nodes
    G(i_R, i_S) = { }, C(i_R, i_S) = { }, e(i_R) = { }
13: \widehat{G} = G, \widehat{C} = C, \widetilde{e} = e

The choice of N

Through partitioning, the aim is to minimize the communication among subnets (via a small number of separator nodes) and to spread the terminals across partitions, so as to minimize the fill-in generated from reducing each partition (up to its terminals) and inside the separator blocks. Towards this goal, this chapter relies on the general purpose nested dissection partitioner (NESDIS, part of [18]); the choice however is by no means exclusive. In [47] and Appendix 4.6.2, for instance, the usage of the hypergraph partitioner Mondriaan [96] is documented. NESDIS partitions a graph so that communication among subnets is minimized, but cannot explicitly control the distribution of terminals across parts; with NESDIS, terminals get spread indirectly, as a consequence of partitioning. While estimating an optimal N is an open problem which must simultaneously account for multiple factors (number of terminals, internal nodes, elements, potential fill-in), we provide some guidelines for quickly determining a satisfactory value to be used with NESDIS. For our experiments, N was mostly determined immediately by inspecting the ratio of terminals to internal nodes in the original graph, p/n. For netlists with small p/n [for instance p/n < 10^{-1}], a coarse partitioning is already sufficient to achieve few terminals per subnet and ensure sparsity (certainly, as long as the resulting number of nodes per subnet enables the computation of the corresponding block projection). Such circuits are the ideal candidates for SparseRC reduction based on NESDIS partitioning alone, without extra fill-reducing ordering actions.

Additional fill-monitoring actions

For circuits with large p/n ratios though, finer NESDIS partitions are needed to achieve a small enough number of terminals per subnet (see the Filter net in Sect. 4.4). In this case, additional fill-reducing orderings and minimum-fill tracking actions may further improve the sparsity attained from partitioning, at the cost of preserving more internal nodes. In Sect. 4.4, examples illustrate the sparsity, dimensionality and computational implications of the partitioning fineness and, where needed, of additional fill-monitoring actions.

Clarifying remarks

The functionality of SparseRC is not tied strictly to the partitioner used or the choice of N. It is assumed that the original circuits (graphs) are sparse; the sparsity of the original circuit will dictate the partitioning performance and, consequently, the dimension and sparsity level of the reduced circuit. Precise judgements on the optimal N or the necessity of additional fill-monitoring operations cannot be made a priori. These could be resolved by the following multi-terminal graph optimization problem [47]: for a multi-terminal network G = (V, E), of which P ⊆ V are terminals, |P| = p, find an N-way partitioning with the objective of minimizing the number of separator nodes, subject to the following constraints: (a) the separator nodes and terminals are balanced across the N parts, and (b) eliminating the internal nodes from each subnet introduces minimum fill in the parts determined by terminals and separator nodes. Chapter 7 analyzes the multi-terminal partitioning problem in more detail. As concerns the approximation quality, the SparseRC model will always be at least as good as a PACT [56] reduced model: due to Theorem 4.3.1, SparseRC guarantees not only local but also global moment matching for the recombined network, irrespective of N or the partitioner used.

Computational complexity

The computational complexity of SparseRC is dominated by the cost of computing W inside EMMP (line 9 of Alg. 2), for each of the N_max < N non-separator components. With n_max denoting the maximum number of internal nodes of a component (i.e. the maximum size of block G_R), and m_max the maximum size of G_S, the cost of one EMMP operation is at most O(n_max^α m_max), with 1 < α ≤ 2 [75]. When n and p are large and the circuit is partitioned, one aims at n_max ≪ n and m_max ≪ p [note that m_max = p_max + s_max, with p_max denoting the maximum number of terminals per component (i.e. the length of i_ext) and s_max the maximum number of separator nodes connecting a component C_k to components C_{k+1} ... C_N (i.e. the length of i_sep)]. Therefore, especially for netlists with many internal nodes n and many terminals p, the total cost O(N_max n_max^α m_max) of SparseRC is much smaller than the O(n^α p) cost of constructing (if at all feasible) a PACT reducing projection directly from the original, unpartitioned matrices. The results in Table 4.1 (Sect. 4.4) confirm this through netlists 6.RX and 7.PLL, which contain a very large number of internal nodes and about 4000 terminals. For a graph G = (V, E) with |V| = n + p nodes and |E| edges, the cost of NESDIS partitioning (being based on METIS [53]) is O(|E|), hence cheap to perform for sparse graphs (here, |V| is the number of circuit nodes and |E| the number of resistors and capacitors). Should additional reorderings be employed per subnet, the cost of CAMD is O(|V| |E|) [1, 2, 38], also a fast operation. The advanced option of tracking fill-in is more expensive, especially for networks with nodes and elements exceeding several thousand: the operation becomes a partial Gaussian elimination up to the terminals, which may reach O(|V|^3) in the worst case. Nonetheless, such fill-monitoring operations are only an optional, advanced feature of SparseRC which was rarely needed in our experiments.

With the ingredients of SparseRC in hand, we summarize its properties in light of the reduction criteria defined earlier in this chapter. SparseRC meets the accuracy (a), passivity (b) and terminal re-connectivity (d) requirements while reducing multi-terminal RC circuits via a block-wise EMMP reducing projection. The efficiency (e) of SparseRC is ensured via a partitioning-based implementation, which reduces much smaller subnets (also with fewer terminals) individually while maintaining the accuracy, passivity and re-connectivity requirements for the entire circuit. The sparsity (c) of the SparseRC reduced model is enhanced by preserving a subset of internal nodes, identified automatically from partitioning and, where necessary, from additional fill-reducing orderings. The performance of SparseRC in practice is shown by the numerical and simulation results presented next.

4.4 Numerical results and circuit simulations

Several parasitic extracted RC circuits from industrial applications are reduced with SparseRC. For each circuit, the terminals are nodes to which non-linear devices (such as diodes or transistors) are connected. During a parsing phase, the multi-terminal RC circuit is stamped into the MNA form (4.12) and reduced with SparseRC. The reduced model (4.31)-(4.32) is synthesized with RLCSYN [93] into its netlist description. As connectivity via the external nodes is preserved with SparseRC, no voltage/current controlled sources are generated during synthesis; the non-linear devices are directly re-coupled via the terminal nodes to the reduced parasitics. The reduced circuit is re-simulated with Spectre and its performance is compared to the original simulation. Most of the circuits are reduced with the default options of SparseRC: matching only the 0th and 1st order multi-port admittance moments at s = 0, without employing additional fill-tracking options.
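The parsing/stamping step can be sketched as follows (a hypothetical helper with made-up element values; node 0 is taken as ground and removed from the unknowns):

```python
import numpy as np

def stamp_mna(num_nodes, resistors, capacitors, terminals):
    """Assemble the MNA form (4.12): G, C by element stamping, B selecting
    the terminal nodes. Elements are (node_i, node_j, value) with node 0
    as ground (illustrative sketch, not the production parser)."""
    n = num_nodes - 1                     # unknowns exclude ground
    G = np.zeros((n, n)); C = np.zeros((n, n))
    def stamp(M, i, j, val):
        for a in (i, j):
            if a:                         # skip ground rows/columns
                M[a - 1, a - 1] += val
        if i and j:
            M[i - 1, j - 1] -= val; M[j - 1, i - 1] -= val
    for i, j, r in resistors:
        stamp(G, i, j, 1.0 / r)           # conductance stamp
    for i, j, c in capacitors:
        stamp(C, i, j, c)                 # capacitance stamp
    B = np.zeros((n, len(terminals)))     # current injection at terminals
    for k, t in enumerate(terminals):
        B[t - 1, k] = 1.0
    return G, C, B

# Two-resistor divider with a capacitor to ground (hypothetical values)
G, C, B = stamp_mna(4,
                    resistors=[(1, 2, 1e3), (2, 3, 2e3), (3, 0, 1e3)],
                    capacitors=[(2, 0, 1e-12)],
                    terminals=[1, 3])
print(G.shape, np.allclose(G, G.T))
```

The resulting symmetric, non-negative definite (G, C) pair and the 0/1 incidence matrix B are exactly the objects SparseRC reduces and RLCSYN unstamps back into a netlist.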
For some examples, the functionality of advanced options within SparseRC is demonstrated, such as additional moment or pole matching, or additional fill-monitoring actions per subnet. All results presented next are based on partitioning with nested dissection (for additional experiments with Mondriaan hypergraph partitioning, see Appendix 4.6.2). SparseRC was implemented in Matlab (version R2007a); all reduction experiments were performed on a Linux (Ubuntu) machine with 3.9 GB of main memory and two Intel(R) Core(TM)2 Duo 2.4 GHz CPUs. All Spectre simulations were run on a Linux (Red Hat) machine with 16 GB of main memory and six Intel(R) Xeon(R) X5460 3.1 GHz CPUs.

SparseRC: Sparsity preserving model reduction for multi-terminal RC networks

Table 4.1: Reduction for various netlists with SparseRC vs. PACT (SparseRC with no partitioning). Sim. time: Spectre [16] netlist simulation time, Total red. time: partitioning plus reduction time.

[The absolute node/element counts, timings and RMS errors in this table are garbled in this transcription; the reduction rates and simulation speed-ups, matched to the netlists in the order they appear, are:

Net                             Variant    Red. rate n_i / #R / #C       Speed-up
1. Transmission line (TL)       SparseRC   99.35% / 97.20% / 80.69%      51 X
                                PACT       100%   / 99.32% / 92.46%      51 X
2. Low noise amplifier (LNA)    SpRC-dp    99.8%  / 99.4%  / 78.3%       117 X
                                PACT       100%   / 99.79% / 91.02%      169 X
3. Mixer 3 (MX3)                SpRC-mf    94.72% / 90.17% / 65.58%      2.5 X
                                PACT       100%   / 92.03% / 52.53%      2.3 X
4. Interconnect structure (IS)  SpRC       98.72% / 61.28% / 58.02%      96 X
                                PACT       99.9%  / 74.09% / 61.47%      101 X
5. Mixer 7 (MX7)                SpRC-mf    83.58% / 50.42% / 3.61%       1.3 X
                                PACT       100%   / 52.94% / 58.76%      0.8 X
6. Receiver (RX)                SpRC       99.15% / 93.28% / 56.88%      orig. sim. NA
                                PACT       NA (projection not computable)
7. Phase-locked loop (PLL)      SpRC       --     / 92.17% / 43.78%      orig. sim. NA
                                PACT       NA (projection not computable)
8. Filter (p = 5852)            SpRC       78.59% / 32.95% / 25.32%      1.6 X
                                PACT       100%   / --     / --          sim. > 24 h]

General results

Table 4.1 collects the main SparseRC reduction results for various multi-terminal netlists obtained from real chip designs, and compares them with the results obtained with PACT 7. Each block row consists of a netlist example with the corresponding number of terminals p (the same before and after reduction). For each netlist, the sparsity before and after reduction is recorded as the number of circuit elements, i.e., #R, #C. The reduction rate (Red. rate) shows the percentage reduction for the corresponding column quantity. For instance, the percentage reduction in internal nodes n_i is: P_n = 100 · (n_i,Orig − n_i,SpRC)/n_i,Orig, and similarly for #R and #C. The Red. rate in simulation time is computed as a speed-up factor: (Orig. sim. time)/(SpRC sim. time), and similarly for PACT. The approximation error is measured as the root-mean-square (RMS) value of the difference between the signals of the original and the SpRC reduced circuit. Several simulations are performed, including: AC, noise, transient, (quasi) periodic steady state, and (quasi) S-parameter analyses, usually a combination of these for each circuit depending on the underlying application. Due to space limitations we can only present some of the simulation figures; however, the simulation timings recorded in the tables represent the sum of all analysis types performed for one circuit. The RMS error value is recorded for one analysis only, but is representative of all analysis types performed for that circuit. For most examples, excellent reduction rates (above 80%) in the number of internal nodes n_i were obtained. The n_i internal nodes of the reduced model are precisely the internal nodes which, if otherwise eliminated, would have introduced too much fill-in. They are the separator nodes identified automatically from partitioning (plus, where suitable, some additional internal nodes identified from fill-monitoring operations).
With n_i thus preserved, very good reduction rates were obtained in the number of circuit elements: mostly above 60% for resistors and above 50% for capacitors. The effect of reducing the internal nodes as well as the number of circuit elements is revealed by the significant speed-ups (mostly above 2X) attained when simulating the reduced circuits instead of the originals. Moreover, for the largest examples (netlists RX, PLL), simulation was only possible after reduction, as the original simulations failed due to insufficient CPU and memory resources. In addition, the reduction times recorded in Table 4.1 show that these reduced netlists were obtained efficiently.

Reduction without partitioning

For comparison, results for SparseRC without partitioning (essentially, PACT) are also recorded in Table 4.1. PACT amounts to running the SparseRC algorithm (Algorithm 1) with N = 1 in line 2. The results reveal the strength of SparseRC especially when reducing challenging circuits with very large node and terminal numbers (e.g., nets 6, 7, 8).

7 For simplicity, PACT is used here only as a short term to denote SparseRC reduction without partitioning; the full PACT methodology [56] also includes more advanced analysis such as pole matching.
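The metrics tabulated above (reduction rate, simulation speed-up, RMS error) can be computed with a few simple helpers; the function names below are ours, for illustration only.

```python
import math

def reduction_rate(orig, reduced):
    """Percentage reduction, e.g. P_n = 100*(n_i,Orig - n_i,SpRC)/n_i,Orig."""
    return 100.0 * (orig - reduced) / orig

def speedup(orig_sim_time, red_sim_time):
    """Simulation speed-up factor: original over reduced simulation time."""
    return orig_sim_time / red_sim_time

def rms_error(y_orig, y_red):
    """Root-mean-square value of the difference between two signals."""
    diffs = [a - b for a, b in zip(y_orig, y_red)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

For instance, eliminating 97 of 100 internal nodes gives reduction_rate(100, 3) = 97.0.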

First, the computational advantages of partitioning for very large netlists are revealed through examples PLL and RX, for which an unpartitioned PACT projection could not even be computed. For the smaller examples, the PACT reduction times are smaller than the SparseRC ones, indicating that partitioning is not necessary there. Second, partitioning and the preservation of separator nodes improve the sparsity of the reduced models. This is confirmed by the Filter and MX7 nets, where the unpartitioned approach resulted in dense reduced netlists which were slower to simulate than the originals (the effect would be the same for PLL and RX).

Figure 4.4: MX3. Transient simulation of the original (red) and SparseRC (blue) overlap. The error signal (black) has a small RMS value (in A; the numeric value is lost in this transcription).

Next, selected simulation results are presented. The MX3 net comes from a mixer circuit. Here, the SparseRC model was obtained by further re-ordering the partitioned circuit matrices (obtained via NESDIS) with CAMD and by keeping track of fill-in during the block-wise reduction process. The transient simulation in Fig. 4.4 shows that the original and SparseRC curves are indistinguishable.

Improving accuracy at higher frequencies

Two examples in particular demonstrate possibilities for improving the approximation accuracy beyond matching the 0th and 1st moments at s = 0:

The low noise amplifier (LNA)

Net LNA from Table 4.1 is part of a low noise amplifier circuit (C45 technology), and was reduced in 53.4 seconds to a SparseRC-dp 8 model which was 99% faster to simulate. SparseRC-dp denotes a reduced model obtained from a partition into N = 3 components where, for each component, the two default moments at s = 0 are matched plus eight dominant poles of the internal contribution Y_R. These dominant poles were computed with the subspace accelerated dominant pole method [79]. In Fig. 4.5 the simulation results of the noise analysis are shown for the original, SparseRC-dp and PACT models. The effect of improving the SparseRC response with the additional dominant poles is visible in Fig. 4.5. This was also confirmed in transient simulation, which gave a smaller RMS error for SparseRC-dp than for PACT (see Table 4.1).

Figure 4.5: LNA. Noise analysis comparison of original (red) vs. two reduced models: in blue, SparseRC-dp (moment matching at s = 0 with additional dominant poles), and in pink, PACT with default moment matching at s = 0.

Two-port RC transmission line

A uniform RC transmission line with two terminals (one node at the beginning and one node at the end of the line) is considered, built as a cascade of R-C sections. The circuit was partitioned into N = 2 subnets (and one separator block). Two reduced SparseRC models were computed: without and with additional matching at higher moments. The latter model was computed with the projection (4.47), which matched a combination of moments of the Y_R term per subnet: two moments at s = 0 plus one moment at each of two higher frequency points. The Bode magnitude plot of the frequency response for the original and the two reduced models is shown in

8 dp is short for dominant pole

Fig. 4.6. The behavior at higher frequencies is indeed approximated more accurately by the reduced model which preserves the additional moments.

Figure 4.6: RCtline: Bode plot for the original and two reduced SparseRC models: matching the first two admittance moments at s = 0 only (red), and matching in addition higher moments from the Y_R term per subnet (blue).

Advanced comparisons

Nets PLL (phase-locked loop) and Filter, two of the largest and most challenging netlists from Table 4.1, are analyzed in detail in Table 4.2. The purpose of the analysis is threefold: (1) the advantages of SparseRC over existing methodologies are revealed, (2) the effects of various partitioning sizes and of additional reorderings are shown, and (3) possible limitations and improvement directions for SparseRC are identified.

The PLL

Performing an original simulation was unfeasible for this circuit, as it failed due to insufficient CPU and memory resources even on a larger machine, but reduction makes the simulation possible. The original G and C matrices, reordered and partitioned in BBD form, are shown in Fig. 4.7. The borders are clearly visible, collecting the separator nodes that will be preserved along with the terminals. The reduced matrices retain the BBD structure and are sparse, as seen in Fig. 4.8.
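The BBD pattern, and why per-subnet elimination preserves it, can be illustrated on a toy matrix. In the sketch below (block sizes, values and the helper name are our own illustration, not from the thesis), two subnet blocks couple only through a separator border; eliminating a subnet's internal nodes via a Schur complement touches only that subnet's border coupling and the separator block, never the other subnet.

```python
import numpy as np

def bbd(A1, A2, F1, F2, As):
    """Assemble a bordered-block-diagonal matrix from two subnet blocks
    A1, A2, their separator couplings F1, F2, and the separator block As."""
    z = np.zeros((A1.shape[0], A2.shape[1]))
    return np.block([[A1, z, F1],
                     [z.T, A2, F2],
                     [F1.T, F2.T, As]])

A1 = np.array([[2.0, -1.0], [-1.0, 2.0]])   # subnet 1 (2 internal nodes)
A2 = np.array([[3.0]])                      # subnet 2 (1 internal node)
F1 = np.array([[0.0], [-1.0]])              # subnet 1 <-> separator coupling
F2 = np.array([[-1.0]])                     # subnet 2 <-> separator coupling
As = np.array([[2.0]])                      # separator block
M = bbd(A1, A2, F1, F2, As)

# Schur-complement elimination of subnet 1's internals updates only the
# separator block; the subnet-1 <-> subnet-2 blocks stay zero (no fill).
S = As - F1.T @ np.linalg.solve(A1, F1)
```

This zero-fill-between-subnets property is exactly what keeps the reduced matrices in Fig. 4.8 sparse.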

Table 4.2: Advanced comparisons for very large examples. N is the number of subnets, AvgN size is the average size of a subnet, AvgS size/Avg-p size are the average number of separator nodes/terminals per subnet, Avg. red. time is the average reduction time per subnet.

[The tabulated data are garbled in this transcription. For net 7 (PLL) the table compares Orig, SpRC_c, SpRC_f, PartMOR, PACT (NA throughout) and SIP-mf (fill-monitoring > 24 h); for net 8 (Filter, p = 5852) it compares Orig, SpRC, SpRC-mf, PACT (re-simulation > 24 h) and SIP-mf. The columns record n_i and P_ni (%), #R and P_R (%), #C and P_C (%), simulation time (s) and speed-up (X), reduction and partitioning times (s), and the partition statistics N (#part.), AvgN size, AvgS size, Avg-p size and average reduction time per subnet.]

Figure 4.7: PLL: re-ordered G and C in BBD form after NESDIS partitioning.

Figure 4.8: PLL: reduced Ĝ and Ĉ obtained with SparseRC (dimension n + p = 7946 nodes). The BBD structure is retained and the matrices remain sparse.

Two SparseRC reduced models were computed: SpRC_c and SpRC_f, based on a coarse and a fine NESDIS partitioning respectively, with the relevant statistics shown in Table 4.2. Both reduced models achieved excellent reduction rates in internal nodes and circuit elements, and were fast to simulate. After the coarse partitioning, the reduction time was smaller than after the fine partitioning, due to less computational overhead in forming the reduced matrices per subnet. The SpRC_f reduced model, however, was faster to simulate than SpRC_c, possibly because, although larger in number of nodes, SpRC_f has fewer circuit elements than SpRC_c. Determining the appropriate balance between preserved internal nodes n_i and sparsity, and its influence on simulation time, remains to be studied further. A direct PACT reduction is immediately dismissed, due to prohibitive computational cost and density. An SIP-based reduced model was attempted, but the fill-in monitoring actions were too expensive (> 24 hours).

Comparison with PartMOR

For the PLL, statistics of a reduced PartMOR model are also included, thanks to the authors of [70]. Compared to PartMOR, SpRC_c has fewer internal nodes and circuit elements. The SparseRC models were faster to simulate, and were also obtained in a shorter partitioning and reduction time. Although there is no original simulation to compare the reductions against, Fig. 4.9 shows an AC simulation waveform for the two SparseRC models and PartMOR. SpRC_c and SpRC_f overlap, confirming that the accuracy of SparseRC is robust to changes in the partitioning strategy, due to guaranteed local and global moment matching. The reduced PartMOR model was determined by local matching of the 0th and 1st moments at DC, however with no guarantee of matched moments for the recombined network (personal communication with the PartMOR [70] authors, March 14, 2011). Thus, since the accuracy of PartMOR depends on the number of partitions, finer partitioning would likely be needed to reach what we infer are the correct waveforms produced by SparseRC. While PartMOR offers advantages in other respects, such as guaranteed positive elements, these results motivate a more thorough investigation into combining the strengths of both methods in the future.

Figure 4.9: PLL. AC analysis of reduced models: SpRC_c (red) and SpRC_f (blue) are overlapping, as expected. PartMOR (magenta) deviates slightly.

The Filter

This netlist is more challenging due to its large terminal-to-internal-node ratio p/n_i. Two SparseRC reduced models were computed: SpRC, based on NESDIS partitioning alone, and SpRC-mf 9, where after NESDIS partitioning a CAMD-based re-ordering was applied on each subnet and additional internal nodes were preserved via fill-in monitoring operations. In both cases a fine partitioning into N = 2065 components was needed to distribute the terminals, giving Avg-p = 4 terminals per component. Although SpRC-mf has far fewer circuit elements than SpRC, it takes longer to simulate, due to the presence of many internal nodes n_i. The fill-monitoring operations inside SpRC-mf also make the reduction time significantly longer than for SpRC. The PACT reduced model is the smallest in dimension (it has no preserved internal nodes) but extremely dense and useless in simulation (the re-simulation was stopped after 24 hours). An SIP reduced model was also computed: CAMD reordering and fill-monitoring actions were applied on the original netlist's graph to determine the reduced model with minimum fill. The result is shown in Fig. 4.11, where the minimum fill point is identified after the elimination of the first 1200 internal nodes. The optimal SIP reduced circuit therefore achieves a much smaller reduction in internal nodes and elements than SparseRC, and is slower to simulate. Also, the SIP reduction time was much larger than that of SparseRC. The AC analysis curves of SparseRC and SIP match the original, as shown in Figure 4.10. This further strengthens the advantages of partitioning: aside from enhancing sparsity by preserving separator nodes, it also makes fill-monitoring actions cheaper to perform. In summary, SparseRC achieves the best trade-off in terms of accuracy, dimension, sparsity, reduction time and re-simulation time. Finally, the Filter example reveals several directions for improving the SparseRC methodology. It indicates that other reduction transformations and/or further sparsification methods may be appropriate for circuits with many more capacitors than resistors. One option could be to post-process the reduced system by dropping negative capacitances as in TICER [84], at the price of losing the two-moment matching property.

9 mf stands for minimum fill
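The minimum-fill search behind Fig. 4.11 can be mimicked on a toy graph: eliminate internal nodes one by one in a given (e.g. CAMD-like) order, symbolically applying the Gaussian-elimination fill-in, and record the elimination count at which the remaining element count is smallest. The sketch below is our illustrative reconstruction of that idea, not the SIP implementation.

```python
def min_fill_point(adj, order):
    """adj: dict node -> set of neighbours (symmetric graph); order:
    elimination order of internal nodes. Returns (best_k, best_nnz):
    eliminating the first best_k nodes leaves the fewest edges."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy

    def nnz():
        return sum(len(vs) for vs in adj.values()) // 2  # undirected edges

    best_k, best_nnz = 0, nnz()
    for k, u in enumerate(order, start=1):
        nbrs = adj.pop(u)
        for v in nbrs:
            adj[v].discard(u)
        # eliminating u connects all of its neighbours pairwise (fill-in)
        for v in nbrs:
            for w in nbrs:
                if v != w:
                    adj[v].add(w)
        if nnz() < best_nnz:
            best_k, best_nnz = k, nnz()
    return best_k, best_nnz

# Star graph: eliminating the hub creates a clique among the leaves, so
# the best point is to eliminate nothing at all.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
k, e = min_fill_point(star, [0])
```

On a path graph, by contrast, eliminating the middle node reduces the edge count, so the minimum-fill point lies after that elimination; the trade-off Fig. 4.11 plots is exactly this curve over thousands of nodes.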
It was also seen that circuits with large p/n_i ratios are the most difficult to partition and reduce with a satisfactory sparsity level. This could be resolved with the help of partitioners which can directly control the distribution of terminals, and remains for further research [47].

4.5 Concluding remarks

SparseRC was presented, a robust and efficient reduction strategy for large RC circuits with many terminals. It efficiently reduces test cases where traditional model reduction fails, due either to the large dimension of the problem at hand or to the density of the final reduced circuit. Developed on the divide-and-conquer principle, SparseRC uses graph partitioning and fill-reducing node reorderings to separate a network into minimally connected components. These are reduced individually, with an appropriate update of the interconnections among them. This guarantees that two multi-port admittance moments are matched for the entire net irrespective of the partitioning size or strategy. SparseRC reduced circuits contain fewer nodes and circuit elements compared to conventional MOR results. As challenging industrial test cases show, significant speed-ups are obtained when re-simulating SparseRC reduced models in place of the original circuits.

Figure 4.10: Filter. AC analysis of the original (red), the reduced SparseRC model (blue), and the reduced SIP model with minimum fill-tracking (magenta) match perfectly.

Figure 4.11: Filter. Determining the dimension of the SIP reduced model from CAMD reordering and node-wise elimination. The minimum fill point is reached after eliminating the first 1200 internal nodes (for clarity only the first 1400 internal nodes are shown).

4.6 Appendix

Reflection on the partition-based RC reduction alternatives

Two reduction approaches for RC networks were proposed in this thesis: the SparseRC approach of this chapter, and the one based on the strongly connected components of G from Chapter 3 [referred to here as SCC(G)-based]. At this point it is appropriate to reconsider for which types of circuits the two alternatives are most suitable. The first indicator is the number of terminals p of the circuit: if p is within a few hundreds (what we call "small"), then the SCC(G)-based reduction is usually sufficient to achieve good reduction rates in both the number of internal nodes and the number of circuit elements. Recall from Chapter 3 that reduction based on SCC(G) partitioning is mathematically equivalent to eliminating all internal nodes up-front. Hence, when p is small, the fill generated in the p × p reduced matrices is not a major concern. Nevertheless, when the circuit has a very large number of internal nodes (exceeding hundreds of thousands), the SCC(G) partitioning brings computational benefits, as smaller subnets are reduced individually. This partitioning also gives a natural solution for computing path resistances between network terminals. In short, the SCC(G)-based reduction is recommended for circuits with a very large number of internal nodes and a small number of terminals. When p exceeds thousands, the fill generated in the p × p block from eliminating all internal nodes becomes non-negligible. Thus, fill-creating nodes must be identified and kept in the reduced model so that its sparsity is improved. This is achieved with SparseRC, based on the partitioning of the entire R, C topology (rather than on the R-topology only, as is done with the SCC(G)-based reduction). So, especially when the number of internal nodes is also large (hundreds of thousands), SparseRC is the method of choice.
One limitation of SparseRC is the reduction of networks with many more capacitors than resistors. This could be improved with the help of better partitioners (see Chapter 7) and a reducing projection based on the C matrix rather than the G matrix (this would place more emphasis on capturing the behavior at higher frequencies rather than at low frequencies). The second indicator is the ratio of terminals p to internal nodes n_i in the network. In general, as for most reduction methods, if p/n_i is small (less than 1/10), good reduction rates in internal nodes can be obtained. Hence, if p/n_i is small, the SCC(G)-based reduction is recommended when p is small, while SparseRC is recommended when p is large. On the other hand, when p/n_i is high, only few internal nodes can be eliminated. This is a challenge both for the SCC(G)-based reduction and for SparseRC. In this case, more advanced partitioners would help to find a more appropriate balance between the number of eliminated internal nodes and sparsity. Further considerations on partitioning are given in Chapter 7.
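The selection guidance of this reflection can be condensed into a small decision helper. The numeric thresholds follow the text ("small" p up to a few hundred, p/n_i < 1/10); the function itself, and the concrete cutoff small_p = 500, are our own illustration.

```python
def recommend(p, n_i, small_p=500):
    """Suggest a reduction approach for a multi-terminal RC network with
    p terminals and n_i internal nodes, following the chapter's guidance."""
    if p / n_i >= 0.1:
        # high p/n_i: only few internal nodes can be eliminated; both
        # approaches struggle and better partitioning criteria are needed
        return "hard case: few nodes can be eliminated; advanced partitioning needed"
    if p <= small_p:
        return "SCC(G)-based reduction"
    return "SparseRC"
```

For example, a net with 200 terminals and 100000 internal nodes maps to the SCC(G)-based reduction, while one with thousands of terminals (such as the Filter, p = 5852) maps to SparseRC.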

Additional experiments

Additional RC experiments

Additional SparseRC reduction experiments, based on NESDIS graph partitioning, were performed on the multi-terminal netlists in Table 4.3. The expectations regarding the improvements in sparsity and re-simulation time, and the excellent approximation quality, are again confirmed by the SparseRC results. For the SymmRC1 circuit, challenging due to the presence of symmetries in the original design, the re-simulation plots are shown in Fig. 4.12. The SparseRC reduced circuit captures the symmetry and matches the response of the original circuit.

Table 4.3: Reduction for various netlists with SparseRC. Sim. time: Spectre [16] netlist simulation time, Total red. time: partitioning plus reduction time.

[Most tabulated values are garbled in this transcription; the recoverable reduction rates (n_i / #R / #C, plus simulation speed-up where available) are:
9. Symmetric RC (SymmRC1), p = 1383, SparseRC: 90.84% / 43.71% / 60.74%, 2.7 X
10. Symmetric RC (SymmRC2), p = 732, SparseRC: 93.63% / 75.93% / 31.15%
11. Shift register (Sband), p = 5003, SpRC-mf: 66.93% / 54.80% / 6.83%, 2.2 X]

Using Mondriaan hypergraph partitioning

To demonstrate that SparseRC can accommodate partitioning algorithms other than nested dissection, Tables 4.4, 4.5, and 4.6 collect reduction results based on Mondriaan hypergraph partitioning [96]. With Mondriaan, the resulting subnets are allowed to differ in size (by specifying an Imbalance factor), while nested dissection gives equal-sized partitions. The higher the Imbalance (the maximum value is 1), the more the subnets generated by Mondriaan are allowed to differ in size. This feature allows experimenting with different partition sizes and their impact on sparsity. In the following tables, each block row corresponds to the experiments performed for one netlist. SpRC_B denotes the reduction statistics obtained after a balanced decomposition, and SpRC_I after an imbalanced one.
The quantities recorded in each column are the same as those of Table 4.2, with the difference that column Max-N size denotes the size of the largest subnet resulting from partitioning, and Max-p size the number of terminals falling in that largest subnet.

Figure 4.12: SymmRC1. Transient analysis of the original (red) and the SparseRC reduced (blue) circuit shows a perfect match. The RMS error (in V) is also small.

There are no significant differences in sparsity after reduction based on Mondriaan partitioning compared to the experiments based on nested dissection (Tables 4.1 and 4.2); however, one experiment revealed an interesting insight. In particular, for the IS netlist, the imbalanced partition (SpRC_I) achieved significantly better reduction rates in the number of elements than the balanced partition, and also better than with nested dissection (see Table 4.1). This confirms that enforcing balanced decompositions may not be optimal for the multi-terminal reduction problem, and that more advanced partitioning criteria could significantly improve sparsity rates (this topic is addressed separately in Chapter 5). Partitioning with Mondriaan was slower than with nested dissection, especially for the large netlists (see the PLL and Filter examples); this is expected, as Mondriaan is a hypergraph partitioner, while nested dissection is based on the METIS graph partitioner.

Table 4.4: Reduction statistics with Mondriaan partitioning: part 1.

[The tabulated data are garbled in this transcription. For nets 1 (TL), 2 (LNA), 3 (MX3), 4 (IS) and 5 (MX7, p = 66; the other terminal counts are lost), the table compares Orig, SpRC_B (balanced), SpRC_I (imbalanced) and PACT, recording n_i and P_ni (%), #R and P_R (%), #C and P_C (%), reduction and partition times (s), N (#part.), Imbalance, Max-N size and Max-p size.]

Table 4.5: Reduction statistics with Mondriaan partitioning: part 2.

[The tabulated data are garbled in this transcription. For net 7 (PLL) the table compares Orig, SpRC_I, SpRC_B and PACT (NA), and for net 8 (Filter, p = 5852) Orig, two SpRC_B/SpRC_I pairs and PACT, with the same columns as Table 4.4.]

Table 4.6: Reduction statistics with Mondriaan partitioning: part 3.

[The tabulated data are garbled in this transcription. The table covers nets 9 (LNA 3), LNA 2, IF2cHB, PPFHB and TIAHB (p = 142 for TIAHB; the other terminal counts are lost), each comparing Orig, SpRC_B, SpRC_I and PACT, with the same columns as Table 4.4.]


Chapter 5

Reduction of multi-terminal RLC networks

The reduction of multi-terminal RLC circuits is analyzed in the context of partitioning. Comparisons between reduction in first- vs. second-order form are provided, and their potential for implementation in a partitioning setup is discussed. Based on the second-order form, the BBD-based reduction of RLC networks is derived, and a post-processing procedure is proposed which allows the reduced model to be synthesized and re-used successfully with a circuit simulator. Advantages and limitations within this framework are identified, and directions for future research are provided.

5.1 Introduction

Taking the methodology of Chapter 4 one step further, this chapter investigates the extension of partition-based reduction to large RLC networks. As for RC networks, the aim is to reduce large, multi-terminal RLC netlists efficiently, so that their re-use in a simulation setup is possible and more efficient than an original simulation, at little accuracy loss. Achieving these goals simultaneously is more difficult for RLC than for RC networks, as this chapter reveals. The content herein thus provides a skeleton for partition-based RLC reduction which identifies the main partition-reduction steps, the limitations encountered, and the possibilities to overcome them. Among the popular approaches to multi-terminal RLC reduction are PRIMA [71] and its structure-preserving followers SPRIM [27] and SAPOR [66]. These methods construct the reducing projection as a (block-)Krylov subspace, and preserve the passivity of the circuit via congruence transforms. The structure-preserving versions allow voltage and current variables to be reduced separately, the main goals being to (a) preserve terminal connectivity and (b) obtain a reduced model which can be cast back into an RLC netlist. While these methods may work well for relatively small networks, their performance is limited when applied to large networks with many terminals. On the one hand, computing the Krylov subspaces for the original system is either too expensive or impossible; on the other, the resulting matrices are prohibitively dense. Furthermore, the synthesis envisioned after an SPRIM or SAPOR projection is not as straightforward as it seems, because the reduced models may contain singularities which in turn cause simulation failures. Here, a framework for multi-terminal RLC reduction is derived, with emphasis on obtaining netlists that are re-usable in simulations. As a continuation of Chapter 4, which developed sparse RC reduction based on the PACT method [56], the starting point here is the extension of PACT to RLC circuits, namely [57]. In particular, the potential of [57] in a partition-based context is studied. In Sect. 5.2 the setup for unpartitioned reduction is described based on two system formulations: the first- and the second-order form, respectively. For reasons explained there, the second-order form is chosen for partition-based reduction, which is presented in Sect. 5.3. A demonstrative numerical example is provided in Sect. 5.4, and Sect. 5.5 concludes.
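The generic recipe mentioned in this introduction (a block-Krylov basis applied as a congruence transform, as in PRIMA) can be sketched on a toy symmetric RC system. Everything below is our own minimal illustration of that recipe, not the thesis algorithms: with q Krylov blocks, 2q... well, at least q block moments at the expansion point are matched, and for symmetric non-negative G and C the congruence V^T(·)V keeps the reduced pencil symmetric non-negative, which is the passivity argument referenced above.

```python
import numpy as np

def krylov_basis(G, C, B, s0=0.0, q=2):
    """Orthonormal basis V of span{A^-1 B, (A^-1 C) A^-1 B, ...},
    with A = G + s0*C (illustrative helper, dense arithmetic)."""
    A = G + s0 * C
    blocks = [np.linalg.solve(A, B)]
    for _ in range(q - 1):
        blocks.append(np.linalg.solve(A, C @ blocks[-1]))
    V, _ = np.linalg.qr(np.hstack(blocks))
    return V

# 4-node grounded RC ladder with one terminal (node 1)
n = 4
G = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # conductance (SPD)
C = np.diag([1e-3] * n)                               # capacitance
B = np.zeros((n, 1)); B[0, 0] = 1.0                   # injection at node 1

V = krylov_basis(G, C, B, q=2)
Gh, Ch, Bh = V.T @ G @ V, V.T @ C @ V, V.T @ B        # congruence reduction

# the 0th port-response moment at s = 0 is matched by construction
m0 = (B.T @ np.linalg.solve(G, B)).item()
m0h = (Bh.T @ np.linalg.solve(Gh, Bh)).item()
```

The same congruence structure, split per voltage and current variables, is what the structure-preserving methods above enforce so that the reduced model remains synthesizable as a circuit.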
5.2 Reduction without partitioning

Consider the impedance-based Modified Nodal Analysis (MNA) [37] representation of an RLC(k) circuit:

\left( \begin{bmatrix} G & E^T \\ -E & 0 \end{bmatrix} + s \begin{bmatrix} C & 0 \\ 0 & L \end{bmatrix} \right) \begin{bmatrix} v \\ i_L \end{bmatrix} = \begin{bmatrix} B \\ 0 \end{bmatrix} u(s),   (5.1)

where v ∈ R^{n_v} are the voltage unknowns (voltage drops measured at all nodes in the circuit), i_L ∈ R^{n_L} are the currents through inductors, G ∈ R^{n_v × n_v} is the conductance matrix, C ∈ R^{n_v × n_v} is the capacitance matrix, and L ∈ R^{n_L × n_L} is the inductance matrix (diagonal, holding the inductor values, if there are no mutual inductances; otherwise the mutuals appear as off-diagonal entries). E ∈ R^{n_v × n_L} is the incidence matrix which determines the topological connections of the inductors. The inputs are u ∈ R^p, the current injections into the p terminals of the circuit, and B ∈ R^{n_v × p} is the incidence matrix of the current injections (the i-th column of B is the i-th unit vector for each input i = 1, ..., p). Let also n be the number of internal nodes, so that n_v = n + p. The starting point for multi-terminal RLC reduction is [57], which reduces (5.1) in first-order form using so-called split congruence transforms. The splitting projects the voltage unknowns separately from the current unknowns, essentially preserving the structure of the RLC circuit, and is necessary to ensure that passivity is preserved [55, 57]. The same concept of splitting the reducing projection is seen in SPRIM [27], the Krylov framework [8] and

RLCSYN [93]. In Sect. 5.2.1 we present the main derivations for reduction in the first-order form based on split congruence transforms, according to [57]. Then, in Sect. 5.2.2, the new reduction in second-order form is presented which, as will be seen, no longer requires split transformations.

First-order form

In (5.1), let the unknown voltages v be split into selected nodes v_S to be preserved (the p terminals and, if desired, m additional internal nodes), and internal nodes v_R to be eliminated. This dictates the following structure of (5.1):

\left( \begin{bmatrix} G_S & G_K^T & E_S^T \\ G_K & G_R & E_R^T \\ -E_S & -E_R & 0 \end{bmatrix} + s \begin{bmatrix} C_S & C_K^T & 0 \\ C_K & C_R & 0 \\ 0 & 0 & L \end{bmatrix} \right) \begin{bmatrix} v_S \\ v_R \\ i_L \end{bmatrix} = \begin{bmatrix} B_S \\ 0 \\ 0 \end{bmatrix} u(s)   (5.2)

We make the analogy between the form (5.2) and the congruence-based framework of [57]. Notice that the G matrix in (5.2) is unsymmetric. The derivations of [57] are based on congruence transformations applied to symmetric matrices, hence a sign change is required for the last equation of (5.2), which gives:

\left( \begin{bmatrix} G_S & G_K^T & E_S^T \\ G_K & G_R & E_R^T \\ E_S & E_R & 0 \end{bmatrix} + s \begin{bmatrix} C_S & C_K^T & 0 \\ C_K & C_R & 0 \\ 0 & 0 & -L \end{bmatrix} \right) \begin{bmatrix} v_S \\ v_R \\ i_L \end{bmatrix} = \begin{bmatrix} B_S \\ 0 \\ 0 \end{bmatrix} u(s)   (5.3)

\left( \begin{bmatrix} G_P & G_C^T \\ G_C & G_I \end{bmatrix} + s \begin{bmatrix} C_P & C_C^T \\ C_C & C_I \end{bmatrix} \right) \begin{bmatrix} x_P \\ X_I \end{bmatrix} = \begin{bmatrix} b_P \\ 0 \end{bmatrix} u(s)   (5.4)

The matrix blocks of (5.3) are redefined as in (5.4), so as to directly identify the analogy with [57, equation (12)]; in particular, x_P = v_S, b_P = B_S, and the internal unknowns become X_I = \begin{bmatrix} v_R \\ i_L \end{bmatrix}, containing both voltage and current variables. This will later require a structure-preserving (also called split) reduction, so as to guarantee passivity preservation [55]. In [57] the following congruence transformation:

X = \begin{bmatrix} I & 0 \\ W & I \end{bmatrix}, \quad where \quad W = -C_I^{-1} C_C   (5.5)

is applied to (5.4) so as to zero-out C_C:

\left( \begin{bmatrix} \tilde{G}_P & \tilde{G}_C^T \\ \tilde{G}_C & G_I \end{bmatrix} + s \begin{bmatrix} \tilde{C}_P & 0 \\ 0 & C_I \end{bmatrix} \right) \begin{bmatrix} x_P \\ X_I \end{bmatrix} = \begin{bmatrix} b_P \\ 0 \end{bmatrix} u(s)   (5.6)

with:

\tilde{G}_P = G_P + W^T G_I W + W^T G_C + G_C^T W, \quad \tilde{G}_C = G_C + G_I W   (5.7)

\tilde{C}_P = C_P - C_C^T C_I^{-1} C_C.   (5.8)

At this point, it is worth pausing to answer the following questions: why is the transformation that zeroes out the capacitive connection matrix C_C needed, and why not zero out G_C instead, as was done for RC circuits in Chapter 4? There are two main reasons: passivity preservation and moment matching. First, [57, Theorems 1, 2] show that, for RLC reduction to preserve passivity, the underlying projection must be split, so as to preserve the structure dictated by the voltage and current variables [see v_R, i_L in (5.3)]. As explained in [55, Sect. 6.3], a transformation of the form W = -G_I^{-1} G_C, which would zero out G_C from (5.4), is not naturally split, due to the presence of the E_R blocks in G_I. However, since C_I is block diagonal, the transformation (5.5) which zeroes out C_C is naturally split, and maintains the necessary structure for passivity-preserving reduction. Furthermore, based on the assumption that the model has the form (5.6), namely that the capacitive connection blocks are zero, [57, Theorem 4] shows that reducing the internal blocks of (5.6) via a structure-preserving (split) Krylov projection matches moments of the original system (5.4). This is outlined next. The multi-port admittance response of (5.6) describes the circuit behavior at the ports, and is obtained by eliminating the X_I variables from the second equation of (5.6) and substituting them into the first.
This is expressed as follows:

$$Y(s)\,x_P(s) = b_P\,u(s), \quad \text{where:} \qquad (5.9)$$
$$Y(s) = \underbrace{\left(\widetilde G_P + s\widetilde C_P\right)}_{Y_P(s)} - \underbrace{\widetilde G_C^T\left(G_I + sC_I\right)^{-1}\widetilde G_C}_{Y_I(s)}. \qquad (5.10)$$

At this point, [57, Theorem 4] enables the construction of a reduced model which matches multi-port admittance moments of the original system (5.6), via a congruence transform:

$$\begin{bmatrix} I & 0 \\ 0 & V^T \end{bmatrix}\left(\begin{bmatrix} \widetilde G_P & \widetilde G_C^T \\ \widetilde G_C & G_I \end{bmatrix} + s\begin{bmatrix} \widetilde C_P & 0 \\ 0 & C_I \end{bmatrix}\right)\begin{bmatrix} I & 0 \\ 0 & V \end{bmatrix}, \qquad (5.11)$$

which reduces (5.6) to:

$$\left(\begin{bmatrix} \widetilde G_P & \widehat G_C^T \\ \widehat G_C & \widehat G_I \end{bmatrix} + s\begin{bmatrix} \widetilde C_P & 0 \\ 0 & \widehat C_I \end{bmatrix}\right)\begin{bmatrix} x_P \\ \widehat x_I \end{bmatrix} = \begin{bmatrix} b_P \\ 0 \end{bmatrix} u(s). \qquad (5.12)$$

In [57, Theorem 4] it is proven that the reduced model (5.12) matches $2m$ moments of (5.6) at $s_0$ if $A = G_I + s_0 C_I$ is non-singular, $G_I$ and $C_I$ are symmetric, and $V$ is a subspace of

linearly independent columns which satisfies:

$$\mathrm{span}\left\{\left(A^{-1}C_I\right)^i A^{-1}\widetilde G_C,\ i = 0 \ldots m-1\right\} \subseteq V. \qquad (5.13)$$

From (5.12), the block matrices corresponding to the port nodes $x_P$ are preserved, and the internal matrices are reduced to: $\widehat G_I = V^T G_I V$, $\widehat G_C = V^T \widetilde G_C$, $\widehat C_I = V^T C_I V$. The reduced multi-port admittance response which characterizes (5.12) is then:

$$\widehat Y(s) = \underbrace{\left(\widetilde G_P + s\widetilde C_P\right)}_{Y_P(s)} - \underbrace{\widehat G_C^T\left(\widehat G_I + s\widehat C_I\right)^{-1}\widehat G_C}_{\widehat Y_I(s)}. \qquad (5.14)$$

Two important conditions must be satisfied to ensure moment matching and passivity preservation: (a) $G_I$ and $C_I$ must be symmetric, and (b) $V$ must be split according to the structure of the internal matrices from (5.3). In this sense, a $V$ which satisfies (5.13) with equality is nothing but an SPRIM [27] projection of the internal admittance response $Y_I(s)$ from (5.10). The splitting of $V$ allows the reduced model (5.12) to be expressed as a counterpart of (5.3):

$$\left(\begin{bmatrix} \widetilde G_S & \widehat G_K^T & \widehat E_S^T \\ \widehat G_K & \widehat G_R & \widehat E_R^T \\ \widehat E_S & \widehat E_R & 0 \end{bmatrix} + s\begin{bmatrix} \widetilde C_S & 0 & 0 \\ 0 & \widehat C_R & 0 \\ 0 & 0 & -\widehat L \end{bmatrix}\right)\begin{bmatrix} v_S \\ \widehat v_R \\ \widehat i_L \end{bmatrix} = \begin{bmatrix} B_S \\ 0 \\ 0 \end{bmatrix} u(s). \qquad (5.15)$$

System (5.15), being in a symmetric form, can be unstamped into an RC-equivalent netlist (where negative R and C values are also generated).

Difficulties with the first-order form

Several difficulties have been encountered in practice with the first-order reduction setup.

1. In practice, it was observed that the reduced block $\begin{bmatrix} \widehat E_S^T \\ \widehat E_R^T \end{bmatrix}$ in (5.15) does not always have linearly independent columns. This in turn caused failures in computing the DC solution during the re-simulation of the synthesized reduced model. Hence a post-processing step would be required to ensure linear independence, without affecting the structure of the input/output matrix. We will see how this is easily achieved from the second-order form.

2. If partitioning is employed, the internal matrix $G_I = \begin{bmatrix} G_R & E_R^T \\ E_R & 0 \end{bmatrix}$ resulting per sub-block may become singular (even when for the unpartitioned system it is not

singular). This happens, for instance, when nodes to which inductors are connected are promoted as separators (these nodes become terminals and, when shorted together, may cause inductor loops [57]). It can also happen that, from partitioning, the $E_R^T$ block per subnet has columns of zeros, which is undesirable. All these structural side-effects of partitioning make it difficult to construct a Krylov subspace (5.13) per subnet, especially in the context of matching moments at DC. A more detailed discussion on this issue follows in the treatment of partition-based reduction (Sect. 5.3). Next, a framework is derived for multi-terminal reduction in the second-order form, through which item 1 finds a solution, while item 2 can be resolved in certain scenarios.

Second-order form

By eliminating the current variables $i_L$ from the second equation of (5.1) and replacing them in the first, it is possible to express (5.1) in second-order form:

$$\left(G + sC + \frac{1}{s}\Gamma\right)v = Bu(s), \quad \text{with } \Gamma = E^T L^{-1} E. \qquad (5.16)$$

The second-order form (5.16) has the following main advantages: (a) congruence transforms on (5.16) automatically preserve passivity without the need for splitting, as all unknowns are voltages; (b) it defines the circuit graph topology in an intuitive way, giving a natural avenue for partition-based reduction; and (c) some of the numerical limitations encountered from partitioning in the first-order form can be avoided. In addition to the resistive and capacitive topology determined by $G$ and $C$ respectively (as was the case for RC circuits), $\Gamma$ dictates the topology of the inductors. Hence, the graph associated with the RLC circuit is given by the non-zero pattern $\mathcal{G} := \mathrm{nzp}(G + C + \Gamma)$. The following derivations are based on the second-order form.
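The inductive stamp $\Gamma = E^T L^{-1} E$ and the graph pattern $\mathrm{nzp}(G + C + \Gamma)$ can be formed directly. A minimal NumPy sketch (the function names and the toy two-inductor incidence are our own illustration, not from the thesis):

```python
import numpy as np

def second_order_inductance_stamp(E, L):
    """Gamma = E^T L^{-1} E, the inductive stamp of the second-order form.

    E : (n_L, n_v) inductor incidence matrix, one row per inductor with
        +1/-1 at the two nodes it connects (ground column dropped).
    L : (n_L, n_L) inductance matrix (diagonal when there are no mutuals).
    """
    return E.T @ np.linalg.solve(L, E)

def graph_pattern(G, C, Gamma, tol=0.0):
    """Boolean non-zero pattern nzp(G + C + Gamma) defining the circuit graph."""
    return (np.abs(G) + np.abs(C) + np.abs(Gamma)) > tol

# Toy example: two inductors over three nodes (hypothetical values).
E = np.array([[1.0, -1.0,  0.0],    # L1 between nodes 1 and 2
              [0.0,  1.0, -1.0]])   # L2 between nodes 2 and 3
L = np.diag([1e-9, 2e-9])
Gamma = second_order_inductance_stamp(E, L)
# Gamma is symmetric positive semi-definite by construction.
```

Because $E$ has one row per inductor, the pattern of $\Gamma$ couples exactly the node pairs that share an inductive path, which is what makes the second-order graph partitionable.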
As usual, the unknown voltages in (5.16) are split into selected nodes $v_S$ to be preserved ($p$ terminals and, if desired, $m$ additional internal nodes¹) and internal nodes $v_R$ to be eliminated, revealing the following structure:

$$\left(\begin{bmatrix} G_R & G_K \\ G_K^T & G_S \end{bmatrix} + s\begin{bmatrix} C_R & C_K \\ C_K^T & C_S \end{bmatrix} + \frac{1}{s}\begin{bmatrix} \Gamma_R & \Gamma_K \\ \Gamma_K^T & \Gamma_S \end{bmatrix}\right)\begin{bmatrix} v_R \\ v_S \end{bmatrix} = \begin{bmatrix} 0 \\ B_S \end{bmatrix} u(s). \qquad (5.17)$$

For the same moment-matching considerations as in the first-order case, a congruence transformation is applied first which zeroes out the $C_K$ connections (this transformation can be

¹These are, for instance, the separator nodes that are revealed from partitioning, as in Chapter 4.

easily shown to be analogous to the first-order case of zeroing out $C_C$). Let:

$$X = \begin{bmatrix} I & W \\ 0 & I \end{bmatrix}, \qquad W = -C_R^{-1}C_K. \qquad (5.18)$$

$X \in \mathbb{R}^{n_v \times n_v}$ is only a transformation (no reduction) of (5.17) into:

$$\widetilde G = X^T G X, \quad \widetilde C = X^T C X, \quad \widetilde \Gamma = X^T \Gamma X, \quad \widetilde B = X^T B, \qquad (5.19)$$

$$\left(\begin{bmatrix} G_R & \widetilde G_K \\ \widetilde G_K^T & \widetilde G_S \end{bmatrix} + s\begin{bmatrix} C_R & 0 \\ 0 & \widetilde C_S \end{bmatrix} + \frac{1}{s}\begin{bmatrix} \Gamma_R & \widetilde\Gamma_K \\ \widetilde\Gamma_K^T & \widetilde\Gamma_S \end{bmatrix}\right)\begin{bmatrix} v_R \\ v_S \end{bmatrix} = \begin{bmatrix} 0 \\ B_S \end{bmatrix} u(s), \qquad (5.20)$$

where:

$$\widetilde G_S = G_S + W^T G_R W + W^T G_K + G_K^T W, \qquad \widetilde G_K = G_K + G_R W, \qquad (5.21)$$
$$\widetilde\Gamma_S = \Gamma_S + W^T \Gamma_R W + W^T \Gamma_K + \Gamma_K^T W, \qquad \widetilde\Gamma_K = \Gamma_K + \Gamma_R W, \qquad (5.22)$$
$$\widetilde C_S = C_S - C_K^T C_R^{-1} C_K, \qquad \widetilde C_K = C_K + C_R W = 0. \qquad (5.23)$$

As in the first-order form, let $Y(s)$ be the multi-port admittance response of (5.20), obtained by eliminating the $v_R$ unknowns:

$$Y(s) = \underbrace{\left(\widetilde G_S + s\widetilde C_S + \tfrac{1}{s}\widetilde\Gamma_S\right)}_{Y_S(s)} - \underbrace{\left(\widetilde G_K + \tfrac{1}{s}\widetilde\Gamma_K\right)^T\left(G_R + sC_R + \tfrac{1}{s}\Gamma_R\right)^{-1}\left(\widetilde G_K + \tfrac{1}{s}\widetilde\Gamma_K\right)}_{Y_R(s)}. \qquad (5.24)$$

Towards understanding how (5.20) can be reduced by moment matching, we make the following important analogy between the first- and second-order formulations:

Lemma. The multi-port admittance $Y(s)$ given by (5.24), which characterizes the second-order system (5.20), is equal to $Y(s)$ given by (5.10), which pertains to the first-order form (5.6).

Proof. Without detailing all the derivations, based on the assignments (5.3)-(5.4), the form of (5.7)-(5.8), and of (5.21)-(5.23), it can be shown that:

$$Y_P(s) = \widetilde G_S + s\widetilde C_S, \qquad (5.25)$$
$$Y_I(s) = -\frac{1}{s}\widetilde\Gamma_S + \underbrace{\left(\widetilde G_K + \tfrac{1}{s}\widetilde\Gamma_K\right)^T\left(G_R + sC_R + \tfrac{1}{s}\Gamma_R\right)^{-1}\left(\widetilde G_K + \tfrac{1}{s}\widetilde\Gamma_K\right)}_{Y_R(s)}, \qquad (5.26)$$

so that $Y(s) = Y_P(s) - Y_I(s) = Y_S(s) - Y_R(s)$.

The lemma allows reduction to be performed in the second-order form, similarly to

that in the first-order form. This amounts to reducing the $Y_R(s)$ term of (5.24). Let, for the moment, $Q$ be a (tall, full-column-rank) transformation which reduces the matrix blocks of (5.17) corresponding to the internal nodes $v_R$. As in the first-order case, the following congruence transform applied to (5.20):

$$\begin{bmatrix} Q^T & 0 \\ 0 & I \end{bmatrix}\left(\begin{bmatrix} G_R & \widetilde G_K \\ \widetilde G_K^T & \widetilde G_S \end{bmatrix} + s\begin{bmatrix} C_R & 0 \\ 0 & \widetilde C_S \end{bmatrix} + \frac{1}{s}\begin{bmatrix} \Gamma_R & \widetilde\Gamma_K \\ \widetilde\Gamma_K^T & \widetilde\Gamma_S \end{bmatrix}\right)\begin{bmatrix} Q & 0 \\ 0 & I \end{bmatrix} \qquad (5.27)$$

reduces (5.20) to:

$$\left(\begin{bmatrix} \widehat G_R & \widehat G_K \\ \widehat G_K^T & \widetilde G_S \end{bmatrix} + s\begin{bmatrix} \widehat C_R & 0 \\ 0 & \widetilde C_S \end{bmatrix} + \frac{1}{s}\begin{bmatrix} \widehat\Gamma_R & \widehat\Gamma_K \\ \widehat\Gamma_K^T & \widetilde\Gamma_S \end{bmatrix}\right)\begin{bmatrix} \widehat v_R \\ v_S \end{bmatrix} = \begin{bmatrix} 0 \\ B_S \end{bmatrix} u(s), \qquad (5.28)$$

where (5.21)-(5.23) hold and:

$$\widehat G_R = Q^T G_R Q, \quad \widehat G_K = Q^T \widetilde G_K, \quad \widehat\Gamma_R = Q^T \Gamma_R Q, \quad \widehat\Gamma_K = Q^T \widetilde\Gamma_K, \quad \widehat C_R = Q^T C_R Q. \qquad (5.29)$$

The multi-port admittance function of the reduced model (5.28) is similarly expressed by eliminating the $\widehat v_R$ unknowns:

$$\widehat Y(s) = \left(\widetilde G_S + s\widetilde C_S + \tfrac{1}{s}\widetilde\Gamma_S\right) - \underbrace{\left(\widehat G_K + \tfrac{1}{s}\widehat\Gamma_K\right)^T\left(\widehat G_R + s\widehat C_R + \tfrac{1}{s}\widehat\Gamma_R\right)^{-1}\left(\widehat G_K + \tfrac{1}{s}\widehat\Gamma_K\right)}_{\widehat Y_R(s)} \qquad (5.30)$$
$$= Y_S(s) - \widehat Y_R(s). \qquad (5.31)$$

As regards the appropriate construction of $Q$ in (5.27), one possibility is to obtain it via a SAPOR [66] reduction of $Y_R(s)$ [similarly to how $V$ is constructed as an SPRIM projection of $Y_I(s)$ in the first-order form]. Via SAPOR, a linearization of $Y_R(s)$ is needed to form an associated Krylov subspace (for a given expansion point $s_0$), from which the reducing projection $Q$ is obtained [details on the construction of the SAPOR projection $Q$ are given in [66]]. One apparent hurdle for SAPOR is the frequency dependency on $s$ inside the input-output term $\widetilde G_K + \frac{1}{s}\widetilde\Gamma_K$. Nevertheless, it is shown next that the linearization proposed in [66] eliminates the dependency on $s$ when computing the actual Krylov subspace, so that the usual SAPOR reduction applies.

Linearization and SAPOR

Following [66], we outline the linearization of $Y_R(s)$ from (5.24). This then leads to the formation of the Krylov subspace from which the reducing projection $Q$ is extracted. Considered separately from (5.24), $Y_R(s)$ represents the transfer function of

the following internal system:

$$\begin{cases}\left(G_R + sC_R + \tfrac{1}{s}\Gamma_R\right)v_R(s) = \left(\widetilde G_K + \tfrac{1}{s}\widetilde\Gamma_K\right)u_R(s), \\[4pt] y_R(s) = \left(\widetilde G_K + \tfrac{1}{s}\widetilde\Gamma_K\right)^T v_R(s),\end{cases} \qquad (5.32)$$

where $u_R(s)$ are the inputs of the internal system and $y_R(s)$ are the outputs. Multiplying (5.32) by $s$, one obtains:

$$\left(s^2 C_R + sG_R + \Gamma_R\right)v_R(s) = \left(s\widetilde G_K + \widetilde\Gamma_K\right)u_R(s). \qquad (5.33)$$

Shifting (5.33) by setting $s = s_0 + \sigma$, one obtains the shifted system:

$$\left(\sigma^2 C_R + \sigma D_R + K_R\right)v_R(\sigma) = \left(B_0 + \sigma B_1\right)u_R(\sigma), \qquad (5.34)$$

where:

$$D_R = G_R + 2s_0 C_R, \quad K_R = s_0^2 C_R + s_0 G_R + \Gamma_R, \quad B_0 = s_0\widetilde G_K + \widetilde\Gamma_K, \quad B_1 = \widetilde G_K, \qquad (5.35)$$

and $s_0$ is chosen such that $K_R$ is non-singular. System (5.34) is linearized as follows. Let the intermediate variable $z(\sigma)$ be such that:

$$\begin{cases}\sigma C_R\, v_R(\sigma) + z(\sigma) = B_1 u_R(\sigma), \\[2pt] \left(\sigma D_R + K_R\right)v_R(\sigma) - \sigma z(\sigma) = B_0 u_R(\sigma).\end{cases} \qquad (5.36)$$

Multiplying the second equation of (5.36) by $K_R^{-1}$ results in:

$$\left(I + \sigma K_R^{-1}D_R\right)v_R(\sigma) - \sigma K_R^{-1}z(\sigma) = K_R^{-1}B_0\, u_R(\sigma). \qquad (5.37)$$

Combining (5.37) with the first equation of (5.36) leads to the following linearized system:

$$\Bigg(\underbrace{\begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} - \sigma\begin{bmatrix} -K_R^{-1}D_R & K_R^{-1} \\ -C_R & 0 \end{bmatrix}}_{:=\, I - \sigma T}\Bigg)\begin{bmatrix} v_R(\sigma) \\ z(\sigma) \end{bmatrix} = \begin{bmatrix} K_R^{-1}B_0 \\ B_1 \end{bmatrix}u_R(\sigma). \qquad (5.38)$$

Notice that the input matrix of the linearized system (5.38) has no dependency on the frequency $\sigma$ [$B_0$ and $B_1$ in (5.35) are independent of $\sigma$]. Based on (5.38), the Block SAPOR method [66] computes an orthonormal basis $Q$ which spans a desired number of block moments of $v_R(\sigma)$. This is then used to project the internal system (5.32):

$$\left(Q^T G_R Q + sQ^T C_R Q + \frac{1}{s}Q^T\Gamma_R Q\right)\widehat v_R(s) = \left(Q^T\widetilde G_K + \frac{1}{s}Q^T\widetilde\Gamma_K\right)u_R(s), \qquad (5.39)$$

which involves the same matrices underlying our reduced second-order system (5.28)-(5.31). Note also that if moments at $s_0 = 0$ are to be matched, then from (5.35) we have that

$K_R = \Gamma_R$. From (5.38), this would then require $\Gamma_R$ to be invertible. As in the first-order case, we revisit the question of why not to zero out $G_K$ instead of $C_K$ in (5.17). Indeed, since in the second-order form the current variables $i_L$ are eliminated, the transformation (5.18) preserves the passive form automatically via congruence, without any splitting. Hence, contrary to the first-order case, having $W = -G_R^{-1}G_K$ in (5.18) is justified as far as passivity is concerned. Although the option to zero out $G_K$ is not completely ruled out, two words of caution are in order for this scenario. (a) The input-output matrices of $Y_R(s)$ would have the form $sC_K + \frac{1}{s}\Gamma_K$, in which case the linearization described above would not eliminate the dependency on $s$ therein. A possible direction to resolve this could rest on [10], which derives moment-matching projections for systems with input-output matrices that may depend on different powers of $s$. (b) Due to the analogy between the first- and second-order admittance responses from the lemma above (both based on $C_C$, respectively $C_K$, being zeroed out), the reduction in the second-order form inherits the moment matching properties from the first-order form. Whether (and which) moments would be matched by zeroing out $G_K$ remains an open question. The same arguments could be raised in case $\Gamma_K$ is zeroed out. While the PACT-based approach [55, 57] analyzed here needs the zeroing out of $C_K$ to match moments, an alternative to avoid this altogether would be to reduce each subnet with SPRIM [27] in the first-order form, or with the second-order Arnoldi-based methods [9, 66, 89] in the second-order form.

Synthesis from the second-order form

Writing the reduced system (5.28) in compact form:

$$\left(\widehat G + s\widehat C + \frac{1}{s}\widehat\Gamma\right)\widehat v = \widehat B u(s), \qquad (5.40)$$

the final step is to synthesize (5.40) as a netlist to be re-used in simulation.
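Before turning to synthesis, the construction of $Q$ from the linearization (5.38) can be sketched numerically. The following is a simplified dense sketch in the spirit of Block SAPOR [66] (plain QR orthonormalization of stacked block moments instead of the second-order Arnoldi recurrence of [66]; all names and the random test data are our own assumptions):

```python
import numpy as np

def sapor_basis(C_R, G_R, Gamma_R, Gk, Gamma_K, s0, n_moments):
    """Sketch of a SAPOR-style projection basis from the linearized
    pencil I - sigma*T of (5.38): the block moments of [v_R; z] are
    T^i @ R0, and only the v_R part is kept and orthonormalized."""
    n = C_R.shape[0]
    D_R = G_R + 2.0 * s0 * C_R
    K_R = s0**2 * C_R + s0 * G_R + Gamma_R      # must be non-singular
    B0 = s0 * Gk + Gamma_K
    B1 = Gk
    Kinv = np.linalg.inv(K_R)
    T = np.block([[-Kinv @ D_R, Kinv],
                  [-C_R, np.zeros((n, n))]])
    blocks = [np.vstack([Kinv @ B0, B1])]       # zeroth block moment
    for _ in range(n_moments - 1):
        blocks.append(T @ blocks[-1])           # next block moment
    M = np.hstack(blocks)[:n, :]                # keep only the v_R rows
    Q, _ = np.linalg.qr(M)                      # orthonormal columns
    return Q

# Hypothetical small internal system (random, symmetric positive definite).
rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
G_R = A @ A.T + n * np.eye(n)
C_R = np.eye(n)
Gamma_R = np.eye(n)
Gk = rng.standard_normal((n, 2))
Gamma_K = rng.standard_normal((n, 2))
Q = sapor_basis(C_R, G_R, Gamma_R, Gk, Gamma_K, s0=1.0, n_moments=2)
```

By construction, the span of $Q$ contains $K_R^{-1}B_0$, the $v_R$ part of the zeroth block moment, which is the minimal requirement for matching one block moment at $s_0$.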
While most publications related to the second-order form consider the reduced model (5.40) to be readily synthesizable, in practice this is not directly achieved. It seems natural to unstamp the matrices from (5.40) into the corresponding resistor, capacitor and inductor values, respectively. While unstamping $\widehat G$ and $\widehat C$ poses no challenges, $\widehat\Gamma$ usually contains inductor loops. These in turn generate simulation failures: inductor loops cause short circuits when the circuit operates at DC (at DC, a capacitor acts as an open circuit and an inductor as a short). These limitations have been reported in RLCSYN [93]. There, a post-processing of $\widehat\Gamma$ is proposed, which diagonalizes $\widehat\Gamma_R$ so as to ensure that inductors are connected only to ground, while at the same time preserving the structure of the input matrix in (5.28). Due to this latter requirement, the procedure from [93] only holds when $\widehat\Gamma_K = 0$ (based on the assumption that $\Gamma_K = 0$) and is thus restricted to netlists where no inductors are connected to terminals. In practice, however, parasitic extracted RLC netlists have been encountered which do have inductors connected to terminals. Next, we show that a post-processing of $\widehat\Gamma$ which eliminates inductor loops and preserves the input-incidence structure can be applied in general, also to circuits where inductors are

connected to ports. Since $\Gamma$ is real, symmetric and positive semi-definite, and $\widehat\Gamma$ is obtained from congruence transformations, $\widehat\Gamma$ is also real, symmetric and positive semi-definite. Hence a rank-revealing Schur decomposition $\widehat\Gamma = UDU^T$, with $U$ unitary and $D$ diagonal, can be partitioned into:

$$\widehat\Gamma = \begin{bmatrix}\widetilde U & U_0\end{bmatrix}\begin{bmatrix}\widetilde D & 0 \\ 0 & D_0\end{bmatrix}\begin{bmatrix}\widetilde U^T \\ U_0^T\end{bmatrix} = \widetilde U\widetilde D\widetilde U^T + U_0 D_0 U_0^T = \widetilde\Gamma + \Gamma_0, \qquad (5.41)$$

where $D_0$ contains the zero eigenvalues of $\widehat\Gamma$. Similarly to [93], eigenvalues smaller than a given tolerance may also be collected in the $\Gamma_0$ term, as these would correspond to over-large inductors. Hence the $\Gamma_0$ term is disregarded and $\widetilde\Gamma$ is retained. Defining $\widetilde L = \widetilde D^{-1}$, we have $\widetilde\Gamma = \widetilde U\widetilde L^{-1}\widetilde U^T$; notice the analogy with (5.16). It is thus possible to cast the reduced model (5.40) directly in the first-order form:

$$\left(\begin{bmatrix}\widehat G & \widetilde U \\ \widetilde U^T & 0\end{bmatrix} + s\begin{bmatrix}\widehat C & 0 \\ 0 & -\widetilde L\end{bmatrix}\right)\begin{bmatrix}\widehat v \\ \widehat i_L\end{bmatrix} = \begin{bmatrix}\widehat B \\ 0\end{bmatrix}u(s), \qquad (5.42)$$

where the last equation contains a sign change so as to obtain a symmetric representation. The symmetric reduced model (5.42) can now be unstamped into an RC-equivalent circuit and re-simulated. As with most netlists obtained from unstamping, the reduced netlist will contain some negative R and C values; nevertheless, it successfully passes the Spectre [16] simulation. Two important observations are made regarding (5.42). (a) $\widetilde U$ has full column rank, in contrast to (5.15), which had rank deficiencies. Hence a DC solution of the synthesized netlist is found during the re-simulation. (b) The rank-revealing decomposition of $\widehat\Gamma$ allows the conversion from the reduced second-order form (5.40) into (5.42) without spoiling the structure of the input matrix $\widehat B$, and makes no assumptions about the structure of the original $\Gamma$. In this respect, the rank-revealing procedure proposed here is a step further from that in [93].

Difficulties with the second-order form

The main difficulty with reduction in the second-order form arises when attempting to match moments of $Y_R(s)$ from (5.24) at $s = 0$.
This requires $\Gamma_R$ to be invertible, which is not always the case in practice. A more detailed discussion of this aspect is given below, in the context of partition-based reduction (Sect. 5.3).

A parallel between 1st and 2nd order form

As a summary of the first- and second-order derivations above, Table 5.1 collects the main derivation steps for reduction in the first vs. the second-order form in the PACT framework, and the analogies between the two. The reduction flow for the first- or second-order form, respectively, is shown in the corresponding column. The link column makes the connection between the operations on the first- and second-order forms at each step.

5.3 Partition-based reduction

As motivated earlier, the second-order formulation provides an intuitive approach for reducing an RLC network by parts. To illustrate the partition-based reduction in the second-order form, let (5.16) be reordered according to the BBD structure², as was done for RC circuits in Chapter 4:

$$\left(\begin{bmatrix}
G_{11_R} & G_{11_K} & 0 & 0 & G_{13_R} \\
G_{11_K}^T & G_{11_S} & 0 & 0 & G_{13_S} \\
0 & 0 & G_{22_R} & G_{22_K} & G_{23_R} \\
0 & 0 & G_{22_K}^T & G_{22_S} & G_{23_S} \\
G_{13_R}^T & G_{13_S}^T & G_{23_R}^T & G_{23_S}^T & G_{33}
\end{bmatrix} + s\begin{bmatrix}
C_{11_R} & C_{11_K} & 0 & 0 & C_{13_R} \\
C_{11_K}^T & C_{11_S} & 0 & 0 & C_{13_S} \\
0 & 0 & C_{22_R} & C_{22_K} & C_{23_R} \\
0 & 0 & C_{22_K}^T & C_{22_S} & C_{23_S} \\
C_{13_R}^T & C_{13_S}^T & C_{23_R}^T & C_{23_S}^T & C_{33}
\end{bmatrix} + \frac{1}{s}\begin{bmatrix}
\Gamma_{11_R} & \Gamma_{11_K} & 0 & 0 & \Gamma_{13_R} \\
\Gamma_{11_K}^T & \Gamma_{11_S} & 0 & 0 & \Gamma_{13_S} \\
0 & 0 & \Gamma_{22_R} & \Gamma_{22_K} & \Gamma_{23_R} \\
0 & 0 & \Gamma_{22_K}^T & \Gamma_{22_S} & \Gamma_{23_S} \\
\Gamma_{13_R}^T & \Gamma_{13_S}^T & \Gamma_{23_R}^T & \Gamma_{23_S}^T & \Gamma_{33}
\end{bmatrix}\right)\begin{bmatrix}x_{1_R} \\ x_{1_S} \\ x_{2_R} \\ x_{2_S} \\ x_3\end{bmatrix} = \begin{bmatrix}0 \\ B_1 \\ 0 \\ B_2 \\ B_3\end{bmatrix}u(s), \qquad (5.43)$$

where each subnet is further split into blocks corresponding to internal nodes $x_{i_R}$ to be eliminated and terminals $x_{i_S}$ to be preserved, $i = 1, 2$, and $x_3$ are the separator nodes. The projection which, for each subnet $i = 1, 2$, zeroes out the $C_{ii_K}$, $C_{i3_R}$ connection blocks and reduces the internal matrices $G_{ii_R}$, $C_{ii_R}$, $\Gamma_{ii_R}$ together with the connection blocks $G_{ii_K}$, $\Gamma_{ii_K}$, is:

$$X_Q = \begin{bmatrix}
Q_1 & W_{11} & 0 & 0 & W_{13} \\
0 & I_S & 0 & 0 & 0 \\
0 & 0 & Q_2 & W_{22} & W_{23} \\
0 & 0 & 0 & I_S & 0 \\
0 & 0 & 0 & 0 & I_3
\end{bmatrix}, \quad \text{where } \begin{aligned}
W_{11} &= -C_{11_R}^{-1}C_{11_K}, & W_{13} &= -C_{11_R}^{-1}C_{13_R}, \\
W_{22} &= -C_{22_R}^{-1}C_{22_K}, & W_{23} &= -C_{22_R}^{-1}C_{23_R},
\end{aligned} \qquad (5.44)$$

²For clarity, only the 2-way partitioning is shown: two subnets and one separator.

Table 5.1: Diagram with the main derivations for reduction in the first- and second-order form, in the PACT framework. Each row lists the first-order step, the second-order step, and the link between them.

- Start: 1st order: the partitioned symmetric system (5.3); 2nd order: the partitioned system (5.17). Link: $\Gamma_R = E_R^T L^{-1} E_R$, $\Gamma_S = E_S^T L^{-1} E_S$, $\Gamma_K = E_S^T L^{-1} E_R$.
- Transformation: 1st order: $X = \begin{bmatrix} I & 0 \\ W & I\end{bmatrix}$, $W = -C_I^{-1}C_C$; 2nd order: $X = \begin{bmatrix} I & W \\ 0 & I\end{bmatrix}$, $W = -C_R^{-1}C_K$. Link: $C_I^{-1}C_C = \begin{bmatrix} C_R^{-1}C_K \\ 0\end{bmatrix}$.
- Transformed model: 1st order: (5.6); 2nd order: (5.20).
- Transfer function: 1st order: $Y(s) = Y_P(s) - Y_I(s)$, eq. (5.10); 2nd order: $Y(s) = Y_S(s) - Y_R(s)$, eq. (5.24). Link: the lemma, with $Y_P(s) = \widetilde G_S + s\widetilde C_S$ and $Y_I(s) = -\frac{1}{s}\widetilde\Gamma_S + Y_R(s)$.
- Matching moments: 1st order: rational Krylov on $Y_I(s)$ [for matching at $s_0 = 0$, $G_I$ must be invertible; otherwise apply the transformations proposed in [55, 57] to isolate the singularities at DC]; 2nd order: rational Krylov on the linearized version of $Y_R(s)$, i.e. SAPOR [66] [for matching at $s_0 = 0$, $\Gamma_R$ must be invertible].
- Reduced transfer function: 1st order: $\widehat Y(s) = Y_P(s) - \widehat Y_I(s)$, eq. (5.14); 2nd order: $\widehat Y(s) = Y_S(s) - \widehat Y_R(s)$, eq. (5.30). Link: $\widehat Y(s) = \widehat Y(s)$.
- Reduced model: 1st order: (5.12), split into the form (5.15); 2nd order: (5.28). Link: $\widehat\Gamma_R = \widehat E_R^T \widehat L^{-1}\widehat E_R$, $\widetilde\Gamma_S = \widehat E_S^T\widehat L^{-1}\widehat E_S$, $\widehat\Gamma_K = \widehat E_S^T\widehat L^{-1}\widehat E_R$.
- Model for synthesis: 1st order: (5.15); 2nd order: (5.42), obtained from the compact form (5.40) via the rank-revealing transformation $\widehat\Gamma = \begin{bmatrix}\widetilde U & U_0\end{bmatrix}\begin{bmatrix}\widetilde D & 0 \\ 0 & D_0\end{bmatrix}\begin{bmatrix}\widetilde U^T \\ U_0^T\end{bmatrix} = \widetilde\Gamma + \Gamma_0$, with $\widetilde L = \widetilde D^{-1}$.

and $Q_1$, $Q_2$ are the projections (for instance obtained via SAPOR) reducing the internal contributions $Y_{i_R}(s)$ of each subnet. Projecting (5.43) with $X_Q$ gives the reduced model in BBD form:

$$\left(\begin{bmatrix}
\widehat G_{11_R} & \widehat G_{11_K} & 0 & 0 & \widehat G_{13_R} \\
\widehat G_{11_K}^T & \widetilde G_{11_S} & 0 & 0 & \widetilde G_{13_S} \\
0 & 0 & \widehat G_{22_R} & \widehat G_{22_K} & \widehat G_{23_R} \\
0 & 0 & \widehat G_{22_K}^T & \widetilde G_{22_S} & \widetilde G_{23_S} \\
\widehat G_{13_R}^T & \widetilde G_{13_S}^T & \widehat G_{23_R}^T & \widetilde G_{23_S}^T & \widetilde G_{33}
\end{bmatrix} + s\begin{bmatrix}
\widehat C_{11_R} & 0 & 0 & 0 & 0 \\
0 & \widetilde C_{11_S} & 0 & 0 & \widetilde C_{13_S} \\
0 & 0 & \widehat C_{22_R} & 0 & 0 \\
0 & 0 & 0 & \widetilde C_{22_S} & \widetilde C_{23_S} \\
0 & \widetilde C_{13_S}^T & 0 & \widetilde C_{23_S}^T & \widetilde C_{33}
\end{bmatrix} + \frac{1}{s}\begin{bmatrix}
\widehat\Gamma_{11_R} & \widehat\Gamma_{11_K} & 0 & 0 & \widehat\Gamma_{13_R} \\
\widehat\Gamma_{11_K}^T & \widetilde\Gamma_{11_S} & 0 & 0 & \widetilde\Gamma_{13_S} \\
0 & 0 & \widehat\Gamma_{22_R} & \widehat\Gamma_{22_K} & \widehat\Gamma_{23_R} \\
0 & 0 & \widehat\Gamma_{22_K}^T & \widetilde\Gamma_{22_S} & \widetilde\Gamma_{23_S} \\
\widehat\Gamma_{13_R}^T & \widetilde\Gamma_{13_S}^T & \widehat\Gamma_{23_R}^T & \widetilde\Gamma_{23_S}^T & \widetilde\Gamma_{33}
\end{bmatrix}\right)\begin{bmatrix}\widehat x_{1_R} \\ x_{1_S} \\ \widehat x_{2_R} \\ x_{2_S} \\ x_3\end{bmatrix} = \begin{bmatrix}0 \\ B_1 \\ 0 \\ B_2 \\ B_3\end{bmatrix}u(s). \qquad (5.45)$$

Finally, the rank-revealing decomposition of $\widehat\Gamma$ in (5.45) can be performed to cast the reduced model (5.45) in the symmetric first-order form (5.42), which is then synthesized as an RC netlist and simulated.

Towards solving the DC moment matching problem

In circuit analysis, often the first simulation performed is a DC analysis, which computes a solution of the circuit equations when all capacitors act as open circuits and all inductors as short circuits. It is thus desirable for the reduced circuit to have the same DC solution as the original circuit. In other words, the reduction should preserve at least one moment of the original transfer function at DC. For RC circuits this is easily achieved (see Chapter 4), while for RLC circuits additional numerical challenges arise when the underlying conductance matrix is singular. These are discussed next, as well as possibilities to overcome them. Recalling (1.13), for moment matching at DC (at $s = 0$) the system matrix $A$ must be invertible (w.l.o.g., in the discussion that follows we refer to a generic system matrix $A$). In typical Krylov-based reduction setups, if $A$ is singular, a non-zero expansion point $s_0 \neq 0$ is chosen and moments are matched around this point. In this manner, however, the explicit matching at DC is not directly addressed. Here we are interested in computing a DC solution even when $A$ is singular.
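Whether a given subnet admits a DC expansion point can be diagnosed directly from the ranks of the relevant matrices. A small NumPy diagnostic sketch (the helper and the toy data, an internal node carrying two inductors, are our own illustration of the criteria discussed in this section):

```python
import numpy as np

def dc_expansion_feasible(G_R, E_R, Gamma_R, tol=1e-12):
    """Check whether a subnet admits moment matching at s0 = 0:
    the first-order form needs G_I = [[G_R, E_R^T], [E_R, 0]] to be
    non-singular, the second-order form needs Gamma_R non-singular."""
    n, m = G_R.shape[0], E_R.shape[0]
    G_I = np.block([[G_R, E_R.T], [E_R, np.zeros((m, m))]])
    first_ok = np.linalg.matrix_rank(G_I, tol) == n + m
    second_ok = np.linalg.matrix_rank(Gamma_R, tol) == Gamma_R.shape[0]
    return first_ok, second_ok

# Toy subnet: one internal node, two inductors incident to it and a
# third inductor elsewhere, so E_R has dependent rows and a zero row.
G_R = np.array([[2.0]])
E_R = np.array([[-1.0], [-1.0], [0.0]])
Gamma_R = np.array([[3.0]])                 # gamma_1 + gamma_2 > 0
first_ok, second_ok = dc_expansion_feasible(G_R, E_R, Gamma_R)
```

In this configuration the first-order check fails while the second-order one passes, which is exactly the situation exploited by the example that follows.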
In the RC case, singularities in A = G occur only if there are nodes with no connections to resistors, or nodes which are isolated at DC from the port nodes (what [55, 57] call type-1 singularities). In Chapter 4, the reducing projection for RC circuits, whether in the partitioned or unpartitioned case, was able to easily match moments at DC, even when the relevant A matrix was singular. This was possible because the singularities could be easily isolated from the network (where easily means without additional fill-

creating transformations), so that there always remained an invertible sub-block of $A$ (details are given in Chapter 4). In the RLC case, type-1 singularities can be removed in a similar manner; the so-called type-2 singularities [55, 57], however, pose additional challenges (we refer to singularities of $A$, which represents the conductance matrix of an RLC circuit in the first-order form). Type-2 singularities can occur if the network (a) has inductor loops or (b) has inductive paths between terminals. If a network is well-defined, such situations do not occur in practice. However, even when type-2 singularities are not present in the original, unpartitioned problem, inductive paths can appear between terminals and separator nodes, or between separator nodes, in the subnets resulting from partitioning. We discuss possibilities to resolve these. One elegant solution to avoid type-2 singularities would be to ensure that partitioning does not pick as separator nodes those which would create inductive paths between a subnetwork's terminals (recall that the terminals of a subnet are a subset of the network terminals plus the separator nodes pertaining to the subnet). Another solution (assuming that the partitioning is done naively, without accounting for the singularities) would be to isolate the part of a subnet which contains an inductive path between terminals and not reduce it (similar actions have been proposed in [69]). This alternative is justified, as in practice such subnets do not appear often or are typically small. A third alternative was proposed in [55, 57] and involves a set of split transformations based on the range and nullspace of $A$. These split the network into a part with no DC singularities, to be reduced, and one containing all DC singularities, which is preserved.
The transformations however introduce fill-in, which is undesirable (an example on how to achieve the same effect in a sparse manner is given in [55], however the feature is not documented in detail). Next, we investigate whether the second-order form provides additional alternatives. Singularities at DC: first vs. second-order form One of the motivations for performing reduction in the second vs. the first-order form was to avoid singularities which may be introduced in the G I internal matrices per subnet. This can happen when nodes to which inductors are connected become separator nodes. An example is provided next, which illustrates how a singularity which appears in the first-order form, can be by-passed, under certain conditions, in the second-order form. Fig. 5.1 shows an RLC subnet that may result from partitioning, where the blue nodes v 1, v 2 are separator nodes, connecting this subnet to the rest of the circuit, v 3 is an actual terminal (in red) and v 4 is an internal node. Hence, when considered separately from the rest of the network, nodes v 1, v 2 become terminals of the subnet, along with v 3. The governing MNA equations for this subnet, in the first-order form partitioned according to (5.3)-(5.4), are shown on the right of Fig Notice that the internal matrix G I re-

131 122 Reduction of multi-terminal RLC networks v3 R3 L3 v1 R1 C1 R2 v2 L1 v4 L2 (G + sc)x=bu g g g 2 + g 3 g 3 g g 3 g g 1 g 2 0 g 1 + g v v c 1 c v s 0 0 c 1 c v 4 = i 1 i L i L 2 0 L i i L L 3 i L Figure 5.1: Left: RLC sub-component of a larger circuit illustrating singularities at DC. Right: MNA circuit matrices partitioned according to (5.3)-(5.4) with a singular G I block. sulting from the structure of Fig. 5.1 is singular: E R has two linearly dependent columns and also a zero column. Matching moments at s 0 = 0 (DC) via the Krylov subspace (5.13) is not possible directly for this subnet, hence [57] propose further transformations to eliminate the singularities from G I, at the price of introducing more fill-in. It would be possible though to form a Krylov subspace (5.13) for an expansion point s 0 = 0. When written in the second-order form (5.17), the governing MNA equations for this RLC subnet are: g g 1 0 g 2 + g 3 g 3 g 2 0 g 3 g 3 0 g 1 g 2 0 g 1 + g s γ 1 + γ 3 0 γ 3 γ 1 0 γ 2 0 γ 2 γ 3 0 γ 3 0 γ 1 γ 2 0 γ 1 + γ 2 v 1 v 2 v 3 v 4 = i i 2, i where the internal matrix blocks are G R = g 1 + g 2, Γ R = γ 1 + γ 2. After the linearization of Y R (s) according to (5.38), it becomes possible to express and match the moment at DC for this subnet. This avoids the fill-introducing operations that would otherwise be necessary in the first-order form. In this example, matching at DC is possible in the second-order form because Γ R is invertible (although in the first-order form G I is otherwise singular); in practice, networks have been encountered where Γ R is singular, in which case the DC moment matching problem in the second-order form still remains open. An extended version of the previous subnet is presented in Fig. 5.2, which has the same three terminals v 1,..., v 3 and five internal nodes v 4,... v 8. Note that, as in the previous example, if the system is expressed in first-order form, the internal matrix G I will

132 5.3 Partition-based reduction 123 be singular. This circuit is reduced nonetheless in the second-order form with SAPOR based on matching one moment of Y R (s) at s = 0. The comparison between the response of the original and the reduced circuit is shown in Fig. 5.2, where the accurate approximation for the low frequencies is visible. v1 R5 v6 v3 L3 R1 C1 L1 v4 L4 C2 L6 v5 R3 R2 L2 v7 L5 v2 R4 v8 L7 Figure 5.2: Left: RLC subnet reduced with moment matching at DC, in the second-order form. Right: the curves for the maximum and minimum singular values of the frequency response, for the original (black) and reduced (red) circuits respectively. The reduced circuit matches the behavior of the original well for low frequencies, expected due to moment matching at DC Identifying fill-in For simplicity, in this discussion we denote by an arbitrary block from the reduced matrices in (5.45). As for multi-terminal RC reduction, the fill generated in the S and 33 blocks is minimized through partitioning and the preservation of separator nodes. For RC circuits, reduction by matching the two moments at DC usually guarantees sufficient accuracy, these being entirely captured by the S blocks (see Chapter 4). In contrast, in the RLC reduction scenario contributions from the reduced internal blocks are present in the form of R and K, as seen from (5.45). So, while the S and 33 terms are sparse, the R and K blocks are dense. The density of R and K is due to two factors: (a) the transformation which zeros-out the C iik, C i3r connection blocks from the original model (5.43), and (b) the formation of the Krylov subspaces reducing Y i R (s) per subnet, which results in dense projections Q i. It is nevertheless possible to improve sparsity by simultaneously diagonalizing the Ĝii R, Ĉ iir blocks via an eigenvalue decomposition [these are symmetric, positive semi-definite matrices hence the pencil (Ĝii R, Ĉii R ) has real eigenvalues]. After this operation, the

133 124 Reduction of multi-terminal RLC networks reduced Ĉ matrix becomes very sparse as seen from its structure in (5.45). The remaining dense block appears as U when (5.45) is cast in the first-order form (5.42). U could be obtained in a sparse manner for instance via a sparse LDL T decomposition of Γ, provided that Γ is sparse. In most examples encountered however, the resulting Γ of (5.45) has dense R and K blocks as previously explained. Hence, the final reduced system (5.42) when unstamped as an RC circuit will typically have very few capacitors and many resistors, sometimes more resistors than in the original circuit. A much sparser representation could be obtained if (a), the transform to zero-out the C iik, C i3r blocks would be by-passed, (b) a sparse representation for the Krylov subspaces Q i would be found and (c) Γ could be factorized as into an LDL T in a sparse manner. Point (a) could be avoided if for instance a splitting of nodes into x ir and x is per subnet would be found, as to ensure directly that C iik = 0, C i3r = 0. Point (b) especially raises a new question in itself: the formation of sparse Krylov subspaces from sparse matrices. Point (c) would be possible as long as Γ is sparse, which in turn would be satisfied with the help of (a) and (b). Having identified the causes of fill-in and the potential directions to reduce it, implementing these aspects remains for further research. For another partition-based approach which relies on macro-model realization rather than on unstamping, and thus controls sparsity in a different manner, we refer to PartMOR [70] Moment matching and the dimensions of the reduced blocks The internal contributions Y i R (s) to be reduced per subnet have the form (5.24). For instance, during the reduction of subnet i (i = 1, 2), the internal blocks corresponding to (5.24) are as follows: G K := [ G R := G iir, C R := C iir, Γ R := Γ iir (5.46) ] G ii K G i3, Γ [ ] R K := Γ ii K Γ i3. 
(5.47) R As for any Krylov-based reduction, the number of columns of G K + 1 s Γ K dictates the dimension M i for one moment of Y i R (s). This is equal to the number of terminals of subnet i plus the number of separator nodes through which subnet i communicates with the rest of the circuit, i.e.: M i = x is + x 33. Thus, if m moments of Y i R (s) are to be matched per subnet, the dimension of the reduced internal matrices iir will be m M i (this will be also the number of rows of iik and i3r ). Hence, as the accuracy of the reduced model increases with the number of moments matched, it is desirable to match as many moments as possible, however without introducing much fill-in. This can be controlled by ensuring that the partitioning spreads terminals per partition sufficiently, and by generating as few separator nodes as possible. In this manner, M i, the dimension of a block moment per subnet, can be kept small. In addition, a careful selection of

expansion points at which the moments of Y^i_R(s) are generated could help to achieve a suitable trade-off between small dimensionality and accuracy. For instance, with knowledge from [78], the optimal shifts could be chosen close to the dominant eigenvalues of a suitable representation of Y^i_R(s) [for instance via a linearized version of Y^i_R(s)]. Finding an appropriate linearization, or even more, an analogy between the eigenmodes of the Y^i_R(s) term and those of the start-point system (5.43), are new research questions and remain for further investigation.

When partitioning is advantageous

For very large circuits with node numbers exceeding tens of thousands, partitioning becomes advantageous irrespective of how many terminals the circuit may have. Most importantly, it provides the computational benefit of performing reduction per subnet, when for the unpartitioned problem it is too costly or even unfeasible to form a Krylov projection as with PRIMA [71], SPRIM [27], SAPOR [66], or [8]. These advantages could be brought to full potential, provided that the sparsity considerations identified above are resolved. The partition-based derivations of this section, however, do not directly apply to RLCk circuits, that is, circuits with mutual inductances. First, for RLCk circuits the inductance matrix L is typically dense, due to the presence of mutuals. Hence Γ is dense to start with, and a reasonable partition of the graph G := nzp(G + C + Γ) cannot be found, because there are no good separator nodes (the graph G is fully connected). A good partition could be found by ignoring the mutuals; however, the true Γ (with mutuals re-included after the appropriate permutations) is then no longer in BBD form, which was otherwise a necessary assumption for the derivations above. Hence, a more efficient way to partition RLCk networks is needed.
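The effect of a good separator can be sketched on a toy graph (a hypothetical 7-node connectivity pattern standing in for nzp(G + C + Γ); not taken from the thesis): once two subnets communicate only through separator nodes, ordering the separator last permutes the matrix into border-block-diagonal form, with zero coupling between the subnet blocks.

```python
import numpy as np

# Hypothetical connectivity: subnets {0, 2, 5} and {1, 4, 6} touch
# only through separator node 3.
edges = [(0, 2), (2, 5), (0, 3), (1, 4), (4, 6), (6, 3)]
n = 7
A = np.eye(n)                       # pattern matrix (diagonal always nonzero)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Order subnet 1, then subnet 2, then the separator last:
perm = [0, 2, 5, 1, 4, 6, 3]
A_bbd = A[np.ix_(perm, perm)]

# The two subnet blocks are decoupled; coupling survives only in the border:
print(np.count_nonzero(A_bbd[:3, 3:6]))   # 0 -> border-block-diagonal form
```

With mutual inductances the pattern matrix becomes fully connected, which is exactly why no such separator (and hence no BBD permutation) exists for RLCk circuits.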
5.4 Numerical results

An RLC transmission line was reduced according to the partition-based approach in second-order form, as described in the preceding section. After partitioning, two moments at s_1 = 10^4 and one moment at s_2 are matched per subnet. Table 5.2 collects the reduction statistics, where a partition-based and an unpartitioned reduction are attempted. The dimension n_i denotes the number of internal variables^3 of the original model (5.1) compared to the reduced model (5.42). The partition-based reduction was faster to perform than the unpartitioned version, and also resulted in a more accurate model. This is also reflected in the Matlab bode plots from Fig. 5.3, and the AC Spectre simulation comparisons from Fig. 5.4. As expected from the analysis above, there are more resistors in the reduced than in the original circuit; this is also reflected in the re-simulation time, which is almost the same as for the original circuit. Hence further investigation is required into improving the sparsity of the netlist, especially in the number of resistors. This example clearly reveals the importance of re-using the reduced models in simulation. Only when this final step is carried out is one able to truly assess both the quality and the efficiency gains obtained from model reduction.

^3 Internal variables are all those except the voltages at the terminal nodes.

Table 5.2: RLC reduction statistics: partitioned vs. unpartitioned (columns: Net, Type, n_i, #R, #C, #L, Sim. time (s), Red. time (s); rows for net 1, Tline with p = 22 terminals: Original, Red: partitioned (N = 4 subnets), Red: unpartitioned)

5.5 Concluding remarks

The reduction of multi-terminal RLC circuits is addressed in this chapter. In particular, the PACT approach [57] is analyzed, especially in the context of partitioning. Based on the second-order formulation, a hierarchical procedure to reduce RLC networks is derived. Through the border-block-diagonal hierarchy, the blocks which contain fill-in are identified, based on which directions to further improve sparsity are provided. The problem of preserving moments at DC was also analyzed and suggestions to overcome it were provided. An example from circuit simulation was reduced in this framework, through which the computational advantages of partitioning are revealed. The results motivate a more thorough investigation into achieving a better trade-off between small dimensionality, sparsity and accuracy. The first recommendation in this respect is to derive a similar partition-based framework using SPRIM [27] for the first-order form or SAPOR [66] for the second-order form to reduce each subnet, rather than the PACT approach [57] analyzed here (the basics of such an approach are included in Appendix 5.6.1). For future work, the extension to RLCk circuits is also relevant.
As a summarizing comparison between reduction in the first- and second-order forms, Table 5.3 collects the advantages, limitations and future research questions for each scenario, for the unpartitioned as well as the partitioned framework.

Figure 5.3: RLC tline. Transfer function of the original model compared to two reduced models: partition-based (left) and unpartitioned (right). The partition-based model captures the oscillations better.

Figure 5.4: RLC tline: AC analysis of the synthesized circuits. Original (red) and reduced with partitioning (blue) match very well, while the reduced model obtained without partitioning (magenta) deviates slightly. The error plots on the right also reflect that the partitioning-based model is more accurate.

5.6 Appendix

An SPRIM partition-based reduction approach

The structure-preserving method SPRIM [27], or its block version BSMOR [95], have recently become popular for reducing RLC circuits. The main benefit of such approaches is that, from a subspace V which spans a moment-matching Krylov subspace, a block

version can be constructed which matches more moments (proportional to the number of blocks [95]). As a consequence, any block structure in the original circuit matrices will also be preserved in the reduced matrices. This concept has been further exploited in the RLCSYN paper [93], so as to also preserve the structure of the input/output incidence matrices; this ensures that reduced models can be unstamped into RLC netlists without controlled sources (the RLCSYN [93] authors denote this special input-output structure preserving projection as SPRIM/IOPOR). Here, an SPRIM/IOPOR reduced model is constructed based on the border-block-diagonal (BBD) matrix structure resulting from partitioning the RLC netlist. The reduced model (a) preserves the input/output structure of the original system, and with that, the terminal connectivity, and (b) preserves the BBD structure of the original partitioned model. The partition-based SPRIM/IOPOR procedure proposed here shows strong potential for reducing very large netlists with many terminals. Furthermore, it exploits the terminal grounding and recovery action proposed in Chapter 6, and demonstrates its functionality on industrial RLC examples. The main concept of the partition-based SPRIM/IOPOR reduction and synthesis procedure is outlined next, using a partitioning into two subnets and one separator (finer partitions are treated similarly):

1. Start with the multi-terminal RLC system in first-order form (5.1).

2. Partition the RLC network using for instance nested dissection [e.g., based on the second-order form (5.16), the partitioning induces the BBD-reordered system (5.43)].

3. Rearrange the partitioned RLC system in first-order form^4:

( [ G̃   Ẽ ]      [ C̃   0 ] ) [ x̃  ]   [ B̃ ]
( [ -Ẽ^T 0 ] + s [ 0   L ] ) [ i_L ] = [ 0  ] u(s),    (5.48)

where the node-level matrices are in BBD form,

G̃ =
[ G_{11_R}   G_{11_K}   0          0          G_{13_R} ]
[ G_{11_K}^T G_{11_S}   0          0          G_{13_S} ]
[ 0          0          G_{22_R}   G_{22_K}   G_{23_R} ]
[ 0          0          G_{22_K}^T G_{22_S}   G_{23_S} ]
[ G_{13_R}^T G_{13_S}^T G_{23_R}^T G_{23_S}^T G_{33}   ],

C̃ has the analogous BBD pattern with blocks C_{11_R}, ..., C_{33}, the unknowns are x̃ = [x_{1_R}; x_{1_S}; x_{2_R}; x_{2_S}; x_3], B̃ = [0; B_1; 0; B_2; B_3], and Ẽ is the reordered incidence matrix of inductor connections (the partitioning induces a special structure in Ẽ as well, but this is ignored here for simplicity). In (5.48), the nodes of each non-separator subnet i = 1, 2 are split into internal nodes x_{i_R} and terminals x_{i_S}.

^4 The procedure could continue directly in the second-order form, using for instance the second-order approach SAPOR/IOPOR [66, 93]; however, this in turn would require a linearization to a first-order form.

4. System (5.48) is usually ungrounded, hence the underlying matrix pencil is singular. To make the system numerically consistent, ground the network by removing one terminal (together with its corresponding row and column) from (5.48). To avoid overloading the notation, we simply denote (5.48) as being already grounded, with underlying system matrices G_g, C_g, B_g.

5. Pick a set of expansion points s_k, and construct a full-rank matrix V which spans the Krylov subspace associated with the desired number of moments for each expansion point.

6. Split V according to the block structure of (5.48):

V^T = [ V_{1_R}^T  V_{1_S}^T  V_{2_R}^T  V_{2_S}^T  V_3^T  V_L^T ],    (5.49)

compress the blocks corresponding to internal nodes and inductor currents to full column rank, giving Ṽ_{1_R}, Ṽ_{2_R}, Ṽ_L, and replace the blocks corresponding to terminals and separator nodes with the appropriately sized identity matrices I_{1_S}, I_{2_S}, I_3.

7. Construct the block-diagonal projection:

Ṽ = blockdiag( Ṽ_{1_R}, I_{1_S}, Ṽ_{2_R}, I_{2_S}, I_3, Ṽ_L ).    (5.50)

8. Construct the reduced model Ĝ_g = Ṽ^T G_g Ṽ, Ĉ_g = Ṽ^T C_g Ṽ, B̂_g = Ṽ^T B_g, which reveals the following structure:

( [ Ĝ   Ê ]      [ Ĉ   0 ] ) [ x̂   ]   [ B̃ ]
( [ -Ê^T 0 ] + s [ 0   L̂ ] ) [ î_L ] = [ 0  ] u(s),    (5.51)

where the reduced node matrices keep the BBD pattern of (5.48), with the internal blocks reduced and the terminal/separator blocks unchanged:

Ĝ =
[ Ĝ_{11_R}   Ĝ_{11_K}   0          0          Ĝ_{13_R} ]
[ Ĝ_{11_K}^T G_{11_S}   0          0          G_{13_S} ]
[ 0          0          Ĝ_{22_R}   Ĝ_{22_K}   Ĝ_{23_R} ]
[ 0          0          Ĝ_{22_K}^T G_{22_S}   G_{23_S} ]
[ Ĝ_{13_R}^T G_{13_S}^T Ĝ_{23_R}^T G_{23_S}^T G_{33}   ],

and Ĉ analogously.

9. Remove possible dependencies in Ê using the Schur decomposition of Γ̂ = Ê L̂^{-1} Ê^T [see equation (5.41)], and cast the reduced system in the symmetric first-order form (5.42).

10. Synthesize the reduced model (5.42) via RLCSYN [93] unstamping into the equivalent RC netlist. During the unstamping process, all R, C elements which would usually be connected to ground are instead connected to the terminal which was removed by grounding in step 4 (see Chapter 6 for the theoretical interpretation of the terminal grounding and recovery step). Because the projection (5.50) preserves input/output structure, the resulting netlist contains no controlled sources.

Notice that in the reduced system (5.51), the matrix blocks corresponding to terminals and separator nodes are unchanged, hence remain sparse. The reduced blocks corresponding to internal nodes, Ĝ_{ii_R}, Ĉ_{ii_R}, i = 1, 2, are usually dense; however, they can easily be simultaneously diagonalized using the eigendecomposition of the matrix pencils (Ĝ_{ii_R}, Ĉ_{ii_R}), or tridiagonalized according to [85]. Hence, ultimately the only dense parts will be Ĝ_{ii_K}, Ĝ_{i3_R}, Ĉ_{ii_K}, Ĉ_{i3_R}, and Ê [or U of (5.42) after the rank-revealing decomposition of Γ]. Future research could address means to directly construct a sparse basis for the internal projection blocks Ṽ_{i_R}, Ṽ_L, and for U, by exploiting for instance Householder transformations on these matrices. Rather than constructing the projection V as in (5.49) from the whole matrices and then splitting it into (5.50), an alternative would be to construct small SPRIM/IOPOR projections directly per subnet. This would require more careful book-keeping, as the structure of Ẽ induced by the partitioning would have to be additionally accounted for. A consequence of exploiting structure in Ẽ is that the number of columns of Ê in (5.51) would be larger, and therefore the reduced model would have more current unknowns î_L.
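Steps 6-8 of the procedure can be sketched numerically. This is an illustrative sketch only (random stand-in data, arbitrary block sizes, QR standing in for any full-column-rank compression), not the industrial implementation: the point is that the block-diagonal projection (5.50) leaves the terminal incidence blocks intact, so the reduced incidence matrix keeps its identity structure and no controlled sources are needed in synthesis.

```python
import numpy as np

def blkdiag(*blocks):
    """Assemble a block-diagonal matrix (hand-rolled to stay NumPy-only)."""
    out = np.zeros((sum(b.shape[0] for b in blocks),
                    sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

rng = np.random.default_rng(1)
n1r, n1s, n2r, n2s, n3, nL = 8, 2, 7, 3, 2, 5   # illustrative block sizes
n = n1r + n1s + n2r + n2s + n3 + nL
V = rng.standard_normal((n, 4))                  # stand-in Krylov basis, k = 4

# Step 6: split V by block rows; QR compresses internal/inductor blocks.
V1r, V1s, V2r, V2s, V3, VL = np.vsplit(V, np.cumsum([n1r, n1s, n2r, n2s, n3]))
q = lambda M: np.linalg.qr(M)[0]

# Step 7: block-diagonal projection (5.50), identities on terminal/separator blocks.
Vt = blkdiag(q(V1r), np.eye(n1s), q(V2r), np.eye(n2s), np.eye(n3), q(VL))

# Step 8: the congruence leaves the terminal incidence intact.
B = np.zeros((n, n1s + n2s))                     # currents enter at x_1S and x_2S
B[n1r:n1r + n1s, :n1s] = np.eye(n1s)
B[n1r + n1s + n2r:n1r + n1s + n2r + n2s, n1s:] = np.eye(n2s)
B_hat = Vt.T @ B
# Every column of B_hat is a unit vector: terminal connectivity is preserved.
print(np.count_nonzero(B_hat) == n1s + n2s)      # True
```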
This could further improve the quality of the reduced model. Computational advantages may also be gained, as smaller Krylov subspaces would be computed for each subnet. Implementing this more advanced version remains for future work. The partition-based SPRIM reduction was applied to the RLC circuits in the table below: two RLC transmission line models with 22 terminals (Tline1 and Tline2), and one RLC inductor model with 3 terminals (Inductor). The reduction statistics, as well as the number of partitions N used and the number of moments matched, are listed in the table. The best reduction rates, both in internal nodes and circuit elements, were achieved for the Inductor model. For the other two circuits, further sparsification methods as described above may be appropriate, so as to also drive down the number of circuit elements and improve the re-simulation time. The Spectre [16] simulation of the Inductor circuit in Fig. 5.5 shows a perfect match between the responses of the original and the reduced circuits. The same holds for the simulation of Tline2 in Figure 5.6.

Table: RLC reduction results for the partition-based SPRIM approach (columns: Net, Type, n_i, #R, #C, #L, Sim. time (s), N #parts, Moments matched; rows: 1. Tline1 (p = 22) Original / partSPRIM, 2. Inductor (p = 3) Original / partSPRIM, 3. Tline2 (p = 22) Original / partSPRIM; per net, one or two moments are matched at expansion points 10^5 and 10^7)

Figure 5.5: Inductor: S-parameters of the original (red) and reduced (blue) circuit overlap.

Figure 5.6: Tline2: AC simulations of the original (red) and reduced (blue) circuits.

Table 5.3: Comparison between first- and second-order form RLC reduction

No partition — Advantages
- First order (developed in [57]): passivity preserved by split congruence transforms; moments matched at s_i after C_C is zeroed, based on a structure-preserving (split) Krylov projection of Y_I(s) in (5.10); possibilities to match at DC (when G_I is singular, based on split transformations involving the null and spanning subspaces of G_I); applies to RLCk.
- Second order (new): analogous with first order (Lemma 5.2.1); passivity preserved by congruence transforms (no splitting); moments matched after C_K is zeroed out, via a SAPOR reduction of Y_R(s) in (5.24); rank deficiencies of the reduced model are eliminated via a decomposition of Γ; successful re-simulations; applies to RLCk.

No partition — Limitations
- First order: the transformation to zero out C_C introduces fill-in; when G_I is singular, matching moments at DC requires further network transformations; the reduced Ĝ may have rank deficiencies which prejudice the re-simulation; expensive/unfeasible for circuits with many nodes; dense reduced matrices for many-terminal systems.
- Second order: the transformation to zero out C_K introduces fill-in; the dimension of the linearized system of Y_R(s) from (5.24) is twice that of G_R; expensive/unfeasible for circuits with many nodes; dense reduced matrices for many-terminal systems.

No partition — Future work
- First order: moment matching without zeroing out C_C; moment matching at DC avoiding the dense transformations from [55, 57] to eliminate singularities at DC.
- Second order: moment matching without zeroing out C_K; develop theory for matching moments at DC.

Partition based (new) — Advantages
- First order: an SPRIM partition-based alternative (see Appendix 5.6.1).
- Second order: natural form for partitioning, with G, C, Γ in BBD structure; simple implementation via congruence transforms operating on the BBD hierarchy, no need for splitting; efficient for systems with many nodes; especially for many-terminal systems, sparsity improved in the G_S, C_S, Γ_S blocks compared to the unpartitioned case.

Partition based (new) — Limitations (aside from the unpartitioned case)
- First order: G_I matrices per subnet become singular if there are inductors connected to the separator nodes; fill-in still present from internal reduction per subnet.
- Second order: moments at DC not well defined in general (possible when Γ_R is invertible); fill-in still present from internal reduction per subnet, and from the final rank-revealing decomposition of Γ; does not apply to RLCk.

Partition based (new) — Future work (aside from the unpartitioned case)
- First order: finding partitions which do not promote inductively connected nodes as separators; extension to RLCk; apply SPRIM [27] directly per subnet rather than [57].
- Second order: finding sparse LDL^T decompositions of Γ; finding suitable partitions for RLCk; apply SAPOR [66] directly per subnet.

Chapter 6

Using general reduced order models in simulations

A unifying framework is presented for the reduction of multi-terminal systems, which enables the use of reduced models in simulations irrespective of the reduction method chosen. In particular, it is shown how, even when the reduction does not preserve the structure of the input/output incidence matrix, the reduced model can still be synthesized, using the unstamping approach, without controlled sources. The framework also allows ungrounded systems to be reduced via the numerically sound grounded representation, with the guarantee that an ungrounded reduced netlist is finally recovered. This is then re-inserted via all terminal connections in a simulation flow.

6.1 Introduction

To speed up the simulation of RLC electrical circuits, such as interconnect models or parasitic networks, various model order reduction (MOR) methods [4] can nowadays be employed. Along with the reduction, a synthesis step is often necessary to convert the reduced model back into an electrical network. This must then be inserted in the desired simulation environment in place of the original circuit. A simple synthesis approach which reads the reduced MNA matrix entries into a circuit topology is the unstamping method [93], which is also used in this chapter. Two major constraints^1 are known to limit the applicability of traditional reduction

^1 A third possible limitation of unstamping is that it does not guarantee positive circuit elements, which may be unacceptable for some circuit simulators. This problem lies outside the scope of this chapter; all simulations herein are performed with Spectre [16], which accepts negative elements.

methods and of the unstamping synthesis approach to multi-terminal systems (e.g., circuits with many input/output nodes): (a) the matrix pencils underlying multi-terminal systems are often singular, and (b) the underlying reducing projection may destroy the structure of the input/output matrices and, with that, the physical interpretation of terminal nodes. An undesirable consequence of (b) is that controlled sources would be introduced during synthesis to model the connectivity of the reduced circuit at the terminal nodes. The multi-terminal reduction methods presented in Chapters 4, 3, 5 automatically bypass both of these limitations due to the special way in which the reducing projection is formed. More specifically, the underlying projections preserve the structure of the input/output incidence matrices, allowing for synthesis without controlled sources. This holds both for grounded and for ungrounded systems. Another class of methods, which are still limited by (a) but overcome (b), are input-output structure preserving methods such as SPRIM/IOPOR from [93]. Many other, more general MOR methods, such as the established PRIMA [71] or the Loewner method [60], share none of these properties. Hence, although such methods may be qualitatively or computationally efficient, their reuse in practice (for instance inside a simulation setup) is cumbersome or at least inelegant. This chapter brings a new contribution by showing how, despite the known limitations, even more general reduction methods are able to handle multi-terminal systems. The reduction-synthesis framework proposed here eliminates the pencil singularity using a simple pre-processing of the original circuit, and recovers the connectivity at all terminal nodes using a post-processing of the reduced model. With the proposed procedure, the re-use of reduced models obtained from various reduction techniques becomes straightforward.
It is shown, for instance, how reduced-order macromodels obtained with the Loewner interpolation-based method [60] can easily be re-inserted in a circuit simulation flow, with all terminal connections preserved. The proposed framework enables measurements to be taken for systems which have an original ungrounded representation (hence an underlying singular pencil). This is especially important when reducing multi-terminal circuits which do not have a direct connection to ground (such a circuit could be, for instance, a sub-net of a larger netlist, with the ground node belonging to another part of this larger netlist).

Problem definition

Consider the modified nodal analysis (MNA) description of an RC^2 circuit:

(G + sC)v(s) = Bu(s),  y(s) = B^T v,    (6.1)

^2 For simplicity, the derivations are based on RC reduction, but are generalized immediately to the RLC case, especially when the system is expressed in second-order form, e.g., as in [93].

where the MNA matrices G, C are symmetric and non-negative definite, corresponding to the stamps of resistor and capacitor values respectively. v ∈ R^{n+p} denotes the node voltages, measured at the p terminals and n internal nodes, and u ∈ R^p are the currents injected into the terminals. Assuming that the first p nodes are the terminals, the input incidence matrix has the special structure:

B = [ I_p ; 0 ] ∈ R^{(n+p)×p},    (6.2)

where I_p ∈ R^{p×p} is the identity matrix of size p. The outputs are the voltage drops at the terminal nodes: y(s) = B^T v = v_p. In model reduction, an appropriate V ∈ R^{(n+p)×(k+p)}, k ≥ 0, is sought, such that the system matrices and unknowns are reduced to:

Ĝ = V^T G V,  Ĉ = V^T C V ∈ R^{(k+p)×(k+p)},    (6.3)
B̂ = V^T B ∈ R^{(k+p)×p},  x̂ = V^T x ∈ R^{k+p},    (6.4)

and satisfy (Ĝ + sĈ) x̂(s) = B̂ u(s). It is emphasized that the representation (6.1) corresponds to an ungrounded circuit. Ungrounded means that a reference node in (6.1) has not yet been chosen, this being fixed only in the simulation phase. From a mathematical viewpoint, we do not yet know which terminal will be set to ground in simulation - this is decided by the designer. Consequently, the matrix pair (G, C) underlying the ungrounded system (6.1) is singular. Furthermore, the projection V resulting from many model reduction methods, such as PRIMA [71] or the Loewner method [60], does not retain the original structure of current injections (6.2), so the reduced B̂ from (6.4) is dense. Hence, when a SPICE-equivalent circuit is derived from the reduced model using an unstamping method (e.g., RLCSYN [93]), controlled sources would have to be used to unstamp the dense B̂ [36, 73]. Two problems emerge from this setup, which are resolved in this chapter:

(a) How can one use general projection-based methods [which assume a regular (G, C) pencil] to reduce ungrounded circuits in such a way that all terminals re-appear in the reduced model?
(b) Assuming that such a reduction is possible, how can one unstamp a general reduced-order model (6.3)-(6.4) without controlled sources?

Towards answering (a), the notion of a generalized frequency response, or generalized transfer function, is introduced. This expresses the input/output response of an ungrounded multi-port circuit in terms of the transfer function of the grounded circuit. In this manner, it becomes possible to reduce a grounded circuit while ensuring that the response of the ungrounded circuit at all terminals is recovered. The solution to (b) is given by an equivalence transformation which converts the reduced B̂ into T^T B̂ = [ I ; 0 ].

6.2 Ungrounded vs. grounded systems

In preparation for solving the above problems, the structure of the matrices underlying ungrounded vs. grounded circuits is derived, together with the relationship between the solutions of the corresponding linear systems. Consider the simple resistor circuit example in Fig. 6.1: on the left, the ungrounded circuit is shown (with p = 2 terminals), while the grounded version (with p = 1 terminal) is on the right.

Figure 6.1: Simple resistive circuit with two terminals. The left version is ungrounded, with currents flowing into nodes 2 and 3 (terminals). The right circuit is the grounded version, where node 3 is set to ground.

Consider first the ungrounded version on the left. Nodes 2 and 3 are terminals through which input currents i_2, i_3 are injected. The outputs are the voltages v_2, v_3 measured at the terminals. Node 1 is internal. From Kirchhoff's Current Law (KCL) and the branch constitutive equations, the following MNA system results:

G v = B u,  with  G = [ g_1+g_2  -g_1  -g_2 ; -g_1  g_1  0 ; -g_2  0  g_2 ],  v = [ v_1 ; v_2 ; v_3 ],  B = [ 0 0 ; 1 0 ; 0 1 ],  u = [ i_2 ; i_3 ],    (6.5)
y = B^T v.    (6.6)

Note that G is singular, which is always the case for ungrounded circuit matrices. It is easy to see that the vector of all 1's lies in Null(G) [each row (and column) of G sums to 0]. Thus the solution to (6.5) cannot be determined by inverting G. Denoting 1 = [1 1 1]^T,

the solution v can however be expressed as v = α1 + v_0, where v_0 is a particular solution of (6.5) and α is a free parameter. Let us find v_0 and the interpretation of α. Expressing v_1 from the first row of (6.5) and replacing it in the second and third rows gives:

v_1 = ( g_1 v_2 + g_2 v_3 ) / ( g_1 + g_2 ),    (6.7)
( g_1 - g_1^2/(g_1+g_2) ) v_2 - ( g_1 g_2/(g_1+g_2) ) v_3 = i_2,    (6.8)
-( g_1 g_2/(g_1+g_2) ) v_2 + ( g_2 - g_2^2/(g_1+g_2) ) v_3 = i_3.    (6.9)

Note that g_1 - g_1^2/(g_1+g_2) = g_2 - g_2^2/(g_1+g_2) = g_1 g_2/(g_1+g_2) =: A. Therefore the last two equations read:

A v_2 - A v_3 = i_2,    (6.10)
-A v_2 + A v_3 = i_3  ⟹  i_2 + i_3 = 0.    (6.11)

As expected, one of the equations from (6.5) is redundant, and the unknown voltages are dependent. Let v_3 = α. From (6.10) one obtains v_2 = α + A^{-1} i_2 = α + (1/g_1 + 1/g_2) i_2. Replacing v_2 and v_3 in the first row of (6.5), one obtains v_1 = α + (1/g_2) i_2. Finally, the general solution to the ungrounded system (6.5) is:

v = [ α + (1/g_2) i_2 ; α + (1/g_1 + 1/g_2) i_2 ; α ] = α [ 1 ; 1 ; 1 ] + [ (1/g_2) i_2 ; (1/g_1 + 1/g_2) i_2 ; 0 ] = α1 + v_0.    (6.12)

Next, it is shown that v_0 is nothing but the solution of the grounded representation, where terminal node 3 is chosen as the reference node (i.e., when α = v_3 = 0). Consider now the circuit version on the right of Fig. 6.1: the current injection into terminal 3 is removed and node 3 is grounded (v_3 = 0). The input is the current i_2 flowing into terminal 2, and the output is measured as the voltage of this node, namely v_2. The

corresponding MNA equations are:

G_g v_g = B_g u_g,  with  G_g = [ g_1+g_2  -g_1 ; -g_1  g_1 ],  v_g = [ v_1 ; v_2 ],  B_g = [ 0 ; 1 ],  u_g = i_2,    (6.13)
y_g = B_g^T v_g.    (6.14)

G_g is now invertible. The solution to (6.13) is completely determined, namely:

v_g = G_g^{-1} B_g u_g = [ (1/g_2) i_2 ; (1/g_1 + 1/g_2) i_2 ].    (6.15)

Finally, note that the general solution (6.12) of the ungrounded system is expressed in terms of the solution (6.15) of the grounded system as follows:

v = α1 + [ v_g ; 0 ],  α = v_3,    (6.16)

where α = v_3 is taken as the reference voltage.

6.3 Generalized frequency response/transfer function

With the result (6.16) at hand, we proceed towards expressing the input/output transfer function of an ungrounded system in terms of that of the grounded system. From (6.14), the output of the grounded system is:

y_g = B_g^T v_g = B_g^T G_g^{-1} B_g u_g =: H_g u_g,    (6.17)

from which the transfer function H_g of the grounded circuit is identified. Next, we form the generalized transfer function H of the ungrounded system (6.5) in terms of H_g. Note that B is expressed in terms of B_g as follows: B = [ B_g  0_c ; 0_r  1 ] ∈ R^{(n+p)×p}, where B_g ∈ R^{(n+p-1)×(p-1)}, and 0_c and 0_r are the column and row vectors, respectively, of all 0's, with dimensions corresponding to those of B_g. For an ungrounded system of dimension n+p, we have 1 ∈ R^{n+p}, and for the grounded system 1_g ∈ R^{n+p-1}. Recall also that B = [ I_p ; 0 ], so that B^T 1 = 1_p ∈ R^p is the vector of all 1's of length p. Similarly, B_g^T 1_g = 1_{p-1} ∈ R^{p-1}. Using (6.16), the general output of the ungrounded system (6.5)
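The relation (6.16) is easy to verify numerically for the resistor example. A minimal sketch, with illustrative conductance values g1, g2 (i3 = -i2 is forced by (6.11), and alpha is the free reference voltage):

```python
import numpy as np

g1, g2, i2, alpha = 2.0, 5.0, 3.0, 0.7

# Ungrounded MNA system (6.5): singular G with zero row/column sums.
G = np.array([[g1 + g2, -g1, -g2],
              [-g1,      g1, 0.0],
              [-g2,     0.0,  g2]])
B = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
u = np.array([i2, -i2])

# Grounded system (6.13): node 3 removed, then solved as in (6.15).
Gg = G[:2, :2]
vg = np.linalg.solve(Gg, np.array([0.0, i2]))

# General ungrounded solution (6.16): v = alpha*1 + [vg; 0].
v = alpha * np.ones(3) + np.append(vg, 0.0)
print(np.allclose(G @ v, B @ u))   # True: v solves the singular system (6.5)
```

Any other choice of alpha works equally well, which is exactly the one-dimensional null-space freedom of the ungrounded circuit.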

is expressed as:

y = B^T v = α B^T 1 + B^T [ v_g ; 0 ] = α [ B_g^T 1_g ; 1 ] + [ B_g^T v_g ; 0 ]
  = α [ 1_{p-1} ; 1 ] + [ H_g u_g ; 0 ] = [ H_g  1_{p-1} ; 0  1 ] [ u_g ; α ] =: H u_α.    (6.18)

From (6.18) the generalized transfer function H of an ungrounded system is identified. It is expressed in terms of H_g, the transfer function of the grounded circuit. The generalized input u_α consists of the input u_g of the grounded circuit and the free parameter α, the voltage of the reference node [e.g., the node v_3 from (6.5) which was set to ground in (6.13)]. The result (6.18) also holds for expressing the generalized transfer function H(s) ∈ C^{p×p} of an ungrounded multi-terminal RC circuit in terms of a grounded version, where H_g(s) = B_g^T (G_g + sC_g)^{-1} B_g ∈ C^{(p-1)×(p-1)}. It is emphasized that, for the ungrounded system, expressing H(s) = B^T (G + sC)^{-1} B directly is not allowed, since the pencil (G, C) is singular.

Recovering the reference node from the grounded system

From the generalized frequency response (6.18), system matrices associated with the ungrounded representation can be recovered from the grounded representation as follows:

H(s) = [ B_g^T  1_{p-1} ; 0_c^T  1 ] ( [ G_g  0_c ; 0_r  1 ] + s [ C_g  0_c ; 0_r  0 ] )^{-1} [ B_g  0_c ; 0_r  1 ].    (6.19)

With the system matrices so redefined, we are interested in rewriting the equations so that the node which was set to ground is re-introduced. Let v_p be the voltage drop at this node. The dynamical equations associated with (6.19) now read:

( [ G_g  0_c ; 0_r  1 ] + s [ C_g  0_c ; 0_r  0 ] ) [ v_g ; v_p ] = [ B_g  0_c ; 0_r  1 ] [ u_g ; α ]  ⟺  { (G_g + sC_g) v_g = B_g u_g,  v_p = α }.    (6.20)

Let the voltage drops of the grounded net be split into voltages measured at internal and terminal nodes respectively: v_g = [ v_{g_i} ; v_{g_t} ]. Note that if the ungrounded net has p terminals, there are p-1 terminals in the grounded net, with voltages v_{g_t} ∈ R^{p-1}. It follows that B_g^T v_g = v_{g_t}. The output equation of (6.19) then reads:

y = [ B_g^T  1_{p-1} ; 0_r  1 ] [ v_g ; v_p ] = [ B_g^T v_g + v_p 1_{p-1} ; v_p ] = [ v_{g_t} + α 1_{p-1} ; α ].    (6.21)

In other words, the outputs of the ungrounded net are the voltage drops at the p-1 terminals of the grounded net, measured as v_{g_t} ∈ R^{p-1}, plus the value of the reference voltage v_p = α. The last output is the p-th terminal, the voltage v_p = α itself of the node which was removed from the network by grounding. This allows one to recover the ungrounded netlist by unstamping the grounded net specified by G_g, C_g, B_g as usual. However, the branches that would usually be connected to a so-called ground are now connected to the reference terminal node v_p. In this manner, the recovered netlist contains all terminals and can be reconnected directly to other circuit blocks and simulated.

Implications

Several implications emerge from the above derivations. These extend the applicability of state-of-the-art reduction methods to systems with dependencies, such as ungrounded multi-terminal circuits.

1. The parameter α is independent of the frequency s, and represents the voltage of the chosen reference node.

2. After a reference node is chosen from one of the terminals, the grounded circuit (now with p-1 terminals) can be reduced with any reduction method which assumes a regular pencil [what we previously called (G_g, C_g)]. For instance, constructing reduced models from measurements of the original frequency response via the Loewner method [60], or via PRIMA [71], becomes possible for ungrounded multi-terminal circuits.
Although measurements of H(s) cannot be taken explicitly [the inverse of (G + s_i C) does not exist, since the pencil (G, C) is singular], they can be taken implicitly by measuring H_g(s) at different frequencies s_i [the pencil (G_g, C_g) is regular].

3. Connectivity via all terminals is recovered after reduction: the chosen reference node is simply re-inserted in the physical netlist as the p-th terminal (with the appropriate connections to the other circuit nodes) during the synthesis of the reduced model.

In the following section, the reduction and terminal recovery (without controlled sources) are described.

6.4 Model reduction

Let, as before, v_p be the terminal set to ground, and let the grounded system be reduced via a projection V ∈ R^{(n+p)×(k+p)}, k+p being the dimension of the reduced model: Ĝ_g = V^T G_g V, Ĉ_g = V^T C_g V, B̂_g = V^T B_g. The transfer function of the grounded reduced model is Ĥ_g(s) = B̂_g^T (Ĝ_g + sĈ_g)^{-1} B̂_g ∈ C^{(p-1)×(p-1)}. Then the generalized transfer function of the ungrounded reduced model follows similarly to (6.18):

Ĥ(s) = [ Ĥ_g(s)  1_{p-1} ; 0_r  1 ] = [ B̂_g^T  1_{p-1} ; 0_r  1 ] ( [ Ĝ_g  0_c ; 0_r  1 ] + s [ Ĉ_g  0_c ; 0_r  0 ] )^{-1} [ B̂_g  0_c ; 0_r  1 ].    (6.22)

Let the unknowns in the reduced, ungrounded system be split as follows: v̂ = [ v̂_g ; v̂_p ] ∈ R^{k+p}, where v̂_g ∈ R^{k+p-1} are the voltage drops at the nodes of the grounded circuit. Similarly to the original network, B̂_g^T v̂_g = v̂_{g_t}. The dynamical system describing the reduced, ungrounded network associated with (6.22) reads:

(Ĝ_g + sĈ_g) v̂_g = B̂_g u_g,  v̂_p = α,  ŷ = [ v̂_{g_t} + α 1_{p-1} ; α ].    (6.23)

The reduced netlist containing the p-1 terminals is unstamped from the reduced grounded net specified by Ĝ_g, Ĉ_g, B̂_g. The output equation from (6.23) introduces the grounded terminal back into the physical netlist as a node with voltage v̂_p = α. In other words, the ground node is no longer fixed to voltage v_p = α = 0, but is kept free as the p-th terminal, as in the original ungrounded model. As a final step, the reduced input/output matrix B̂_g (which is usually dense) must be transformed to a form which enables straightforward synthesis of terminal re-connectivity, without introducing controlled sources. This is shown next.

Transforming a reduced model into synthesis-ready form

Towards transforming the reduced grounded model Ĝ_g, Ĉ_g, B̂_g into a form appropriate for unstamping without controlled sources, the terminal locations have to be retrieved

from $\hat{B}_g$. This is achieved as follows. Assume that the dimension of the reduced model is $K$, where $K \geq p-1$, as is usually the case for projection-based MOR methods. Thus $\hat{B}_g \in \mathbb{R}^{K\times(p-1)}$ is tall and thin (having linearly independent columns). A transformation $T \in \mathbb{R}^{K\times K}$ is sought so that
$$\widetilde{B} = T^T\hat{B}_g = \begin{bmatrix} I_{p-1} \\ 0 \end{bmatrix} \in \mathbb{R}^{K\times(p-1)}.$$
This allows one to inject currents into the $p-1$ terminals of the reduced network just as in the original network. The transformation is based on the singular value decomposition (SVD) of $\hat{B}_g$: $\hat{B}_g = U\Sigma V^T$, where $U \in \mathbb{R}^{K\times K}$, $\Sigma \in \mathbb{R}^{K\times(p-1)}$, $V \in \mathbb{R}^{(p-1)\times(p-1)}$. More precisely, $T$ is obtained from:
$$\hat{B}_g = U \begin{bmatrix} \Sigma_{p-1} \\ 0 \end{bmatrix} V^T \;\Rightarrow\; T = U \begin{bmatrix} \Sigma_{p-1}^{-1}V^T & 0 \\ 0 & I_{K-p+1} \end{bmatrix}, \qquad (6.24)$$
which transforms the reduced system into:
$$\widetilde{B}_g = T^T\hat{B}_g = \begin{bmatrix} I_{p-1} \\ 0 \end{bmatrix} \in \mathbb{R}^{K\times(p-1)}, \qquad \widetilde{G}_g = T^T\hat{G}_g T, \qquad \widetilde{C}_g = T^T\hat{C}_g T. \qquad (6.25)$$
In (6.25), the new incidence matrix $\widetilde{B}_g$ has the required form for direct synthesis via unstamping. The identity block $I_{p-1} \in \mathbb{R}^{(p-1)\times(p-1)}$ represents the incidence of current injections into the $p-1$ terminals. The $p$-th terminal is re-inserted instead of the ground node as previously described [see equations (6.23) and the comment thereafter]. Sect. 6.5.1 illustrates the procedure via a small example.

6.5 Numerical results and simulations

To demonstrate the functionality of the proposed multi-terminal reduction with terminal recovery, two examples are provided. The first is a simple RC network which clearly shows how the grounded terminal is re-introduced after reduction and the input/output transformation. The second is a larger industrial example of an RC transmission line with 22 terminals.
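The input/output transformation (6.24)-(6.25) is straightforward to prototype. Below is a minimal numerical sketch (NumPy, random illustrative data; the function name io_transformation is not from the thesis) which builds $T$ from the SVD of a dense $K \times (p-1)$ input matrix and checks that the congruence yields the stacked identity:

```python
import numpy as np

def io_transformation(B_hat):
    """Build T as in (6.24): from B_hat = U [Sigma; 0] V^T take
    T = U * blkdiag(Sigma^{-1} V^T, I), so that T^T B_hat = [I; 0]."""
    K, m = B_hat.shape                       # m = p - 1 injected terminals
    U, s, Vt = np.linalg.svd(B_hat)          # full SVD: U is K x K
    T = np.zeros((K, K))
    T[:m, :m] = np.diag(1.0 / s) @ Vt        # Sigma_{p-1}^{-1} V^T block
    T[m:, m:] = np.eye(K - m)                # identity on the complement
    return U @ T

rng = np.random.default_rng(0)
K, m = 5, 3                                  # reduced order K, p - 1 = 3
B_hat = rng.standard_normal((K, m))          # dense reduced input matrix
G_hat = rng.standard_normal((K, K)); G_hat += G_hat.T   # symmetric G_hat
T = io_transformation(B_hat)
B_new = T.T @ B_hat                          # equals [I_{p-1}; 0], eq. (6.25)
G_new = T.T @ G_hat @ T                      # congruence keeps symmetry
```

Since $\hat{v} = T\tilde{v}$ is applied congruently to all system matrices, the reduced transfer function $\hat{B}_g^T(\hat{G}_g + s\hat{C}_g)^{-1}\hat{B}_g$ is left unchanged by the transformation.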

[Figure 6.2: Left: RC ungrounded circuit, where nodes 1, 2, 5, 6 are terminals and nodes 3, 4 are internal; currents are injected into the terminals. Right: KCL for the ungrounded network, where all nodes are viewed as one node: the sum of all currents entering this node is zero, $i_1 + i_2 + i_5 + i_6 = 0$.]

6.5.1 Simple RC circuit

For the ungrounded simple RC network in Fig. 6.2, consider the MNA equations, with the unknowns ordered as $v = [V_3(s), V_4(s), V_1(s), V_2(s), V_5(s), V_6(s)]^T$ and the inputs as $u = [i_1, i_2, i_5, i_6]^T$:
$$(G + sC)\,v = B\,u, \qquad (6.26)$$
where
$$G = \begin{bmatrix} g_1{+}g_2{+}g_3{+}g_4{+}g_6 & -g_4 & -g_1 & -g_2 & -g_3 & -g_6 \\ -g_4 & g_4 & 0 & 0 & 0 & 0 \\ -g_1 & 0 & g_1 & 0 & 0 & 0 \\ -g_2 & 0 & 0 & g_2 & 0 & 0 \\ -g_3 & 0 & 0 & 0 & g_3{+}g_5 & -g_5 \\ -g_6 & 0 & 0 & 0 & -g_5 & g_5{+}g_6 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
and $C$ is assembled analogously from the capacitances $c_1$, $c_2$, $c_3$, stamped between their respective node pairs. Observe that the row/column sums of $G$ and $C$ are zero, since the network is ungrounded. Before reduction, we ground the circuit by choosing a reference node, for example node 6. Hence the corresponding row, column, and input incidence are removed from (6.26). The grounded system is thus obtained:
$$\begin{cases} (G_g + sC_g)\,v_g = B_g u_g \\ y_g = B_g^T v_g. \end{cases} \qquad (6.27)$$
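The element-stamping pattern behind (6.26) and the grounding step of (6.27) can be sketched as follows. The conductance values below are illustrative placeholders, and the node pairs are an assumption read off the matrix pattern above, not values from the thesis:

```python
import numpy as np

def stamp_conductances(n_nodes, elements):
    """Assemble an ungrounded conductance matrix by element stamping:
    a conductance g between nodes i and j adds +g to G[i,i] and G[j,j]
    and -g to G[i,j] and G[j,i], so row/column sums stay zero."""
    G = np.zeros((n_nodes, n_nodes))
    for i, j, g in elements:
        G[i, i] += g; G[j, j] += g
        G[i, j] -= g; G[j, i] -= g
    return G

# indices 0..5 stand for circuit nodes 1..6; values are hypothetical
elements = [(0, 2, 1.0),   # R1 between nodes 1 and 3
            (1, 2, 2.0),   # R2 between nodes 2 and 3
            (2, 4, 0.5),   # R3 between nodes 3 and 5
            (2, 3, 1.5),   # R4 between nodes 3 and 4
            (2, 5, 1.0),   # R6 between nodes 3 and 6
            (4, 5, 2.5)]   # R5 between nodes 5 and 6
G = stamp_conductances(6, elements)

# ground at node 6 (index 5): drop its row and column; the remaining
# matrix is regular, so the grounded pencil can be inverted
keep = [0, 1, 2, 3, 4]
G_g = G[np.ix_(keep, keep)]
```

The same stamping routine applies unchanged to the capacitance matrix $C$.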

We reduce (6.27) with the well-known method PRIMA [71], and apply the terminal transformation and ground recovery proposed in Sect. 6.4. Note the dimensions of the original grounded system: $G_g, C_g \in \mathbb{R}^{5\times 5}$, $B_g \in \mathbb{R}^{5\times 3}$. Let $V$ be the projection which matches one moment of (6.27) at $s_0 = 10^2$: $V = (G_g + s_0C_g)^{-1}B_g$, $V \in \mathbb{R}^{5\times 3}$. The reduced matrices are $\hat{G}_g = V^TG_gV \in \mathbb{R}^{3\times 3}$, $\hat{C}_g = V^TC_gV \in \mathbb{R}^{3\times 3}$, $\hat{B}_g = V^TB_g \in \mathbb{R}^{3\times 3}$. Note that $\hat{B}_g$ is dense. Next, the transformed reduced model (6.25) is computed so that the new $\widetilde{B}_g = I_3$. The new reduced model as in (6.25) has the form:
$$\Bigg(\underbrace{\begin{bmatrix} \hat{g}_{1,1} & \hat{g}_{1,2} & \hat{g}_{1,3} \\ \hat{g}_{1,2} & \hat{g}_{2,2} & \hat{g}_{2,3} \\ \hat{g}_{1,3} & \hat{g}_{2,3} & \hat{g}_{3,3} \end{bmatrix}}_{\hat{G}_g} + s\underbrace{\begin{bmatrix} \hat{c}_{1,1} & \hat{c}_{1,2} & \hat{c}_{1,3} \\ \hat{c}_{1,2} & \hat{c}_{2,2} & \hat{c}_{2,3} \\ \hat{c}_{1,3} & \hat{c}_{2,3} & \hat{c}_{3,3} \end{bmatrix}}_{\hat{C}_g}\Bigg)\underbrace{\begin{bmatrix} V_1(s)-V_6(s) \\ V_2(s)-V_6(s) \\ V_5(s)-V_6(s) \end{bmatrix}}_{\hat{v}_g} = \underbrace{I_3}_{\widetilde{B}_g}\underbrace{\begin{bmatrix} i_1 \\ i_2 \\ i_5 \end{bmatrix}}_{u_g} \qquad (6.28)$$
In (6.28), $\hat{v}_g$ is expressed as voltage drops measured between each node and the reference node $V_6$, which was set to ground. Since the transformations $V$ and $T$ preserve symmetry, the off-diagonal entries satisfy $\hat{g}_{i,j} = \hat{g}_{j,i}$, $\hat{c}_{i,j} = \hat{c}_{j,i}$, for $i \neq j$, $1 \leq i,j \leq 3$. Rewriting (6.28) so that $V_6$ becomes a separate unknown gives:
$$\Bigg(\begin{bmatrix} \hat{g}_{1,1} & \hat{g}_{1,2} & \hat{g}_{1,3} & -\sum_{i=1}^{3}\hat{g}_{1,i} \\ \hat{g}_{1,2} & \hat{g}_{2,2} & \hat{g}_{2,3} & -\sum_{i=1}^{3}\hat{g}_{2,i} \\ \hat{g}_{1,3} & \hat{g}_{2,3} & \hat{g}_{3,3} & -\sum_{i=1}^{3}\hat{g}_{3,i} \end{bmatrix} + s\begin{bmatrix} \hat{c}_{1,1} & \hat{c}_{1,2} & \hat{c}_{1,3} & -\sum_{i=1}^{3}\hat{c}_{1,i} \\ \hat{c}_{1,2} & \hat{c}_{2,2} & \hat{c}_{2,3} & -\sum_{i=1}^{3}\hat{c}_{2,i} \\ \hat{c}_{1,3} & \hat{c}_{2,3} & \hat{c}_{3,3} & -\sum_{i=1}^{3}\hat{c}_{3,i} \end{bmatrix}\Bigg)\begin{bmatrix} V_1(s) \\ V_2(s) \\ V_5(s) \\ V_6(s) \end{bmatrix} = \begin{bmatrix} i_1 \\ i_2 \\ i_5 \end{bmatrix} \qquad (6.29)$$
Notice how in (6.29) the matrix row sums are zero. Let:
$$\hat{g}_{j,4} = -\sum_{i=1}^{3}\hat{g}_{j,i}, \qquad \hat{c}_{j,4} = -\sum_{i=1}^{3}\hat{c}_{j,i}, \qquad j \in \{1, 2, 3\}. \qquad (6.30)$$
From the symmetry property and (6.30), summing the rows of (6.29) gives the additional equation:
$$\begin{aligned} &\sum_{i=1}^{3}\hat{g}_{1,i}V_1 + \sum_{i=1}^{3}\hat{g}_{2,i}V_2 + \sum_{i=1}^{3}\hat{g}_{3,i}V_5 + (\hat{g}_{1,4}+\hat{g}_{2,4}+\hat{g}_{3,4})V_6 \\ &\quad + s\Big[\sum_{i=1}^{3}\hat{c}_{1,i}V_1 + \sum_{i=1}^{3}\hat{c}_{2,i}V_2 + \sum_{i=1}^{3}\hat{c}_{3,i}V_5 + (\hat{c}_{1,4}+\hat{c}_{2,4}+\hat{c}_{3,4})V_6\Big] = i_1 + i_2 + i_5, \end{aligned} \qquad (6.31)$$
which, using (6.30), becomes:
$$\hat{g}_{1,4}V_1 + \hat{g}_{2,4}V_2 + \hat{g}_{3,4}V_5 - \Big(\sum_{i=1}^{3}\hat{g}_{i,4}\Big)V_6 + s\hat{c}_{1,4}V_1 + s\hat{c}_{2,4}V_2 + s\hat{c}_{3,4}V_5 - s\Big(\sum_{i=1}^{3}\hat{c}_{i,4}\Big)V_6 = -(i_1 + i_2 + i_5). \qquad (6.32)$$

Denote:
$$\hat{g}_{4,4} = -\sum_{i=1}^{3}\hat{g}_{i,4}, \qquad \hat{c}_{4,4} = -\sum_{i=1}^{3}\hat{c}_{i,4}, \qquad i_6 = -(i_1 + i_2 + i_5). \qquad (6.33)$$
Appending (6.32) to (6.29) and using (6.33) gives a new system of equations:
$$\Bigg(\underbrace{\begin{bmatrix} \hat{g}_{1,1} & \hat{g}_{1,2} & \hat{g}_{1,3} & \hat{g}_{1,4} \\ \hat{g}_{1,2} & \hat{g}_{2,2} & \hat{g}_{2,3} & \hat{g}_{2,4} \\ \hat{g}_{1,3} & \hat{g}_{2,3} & \hat{g}_{3,3} & \hat{g}_{3,4} \\ \hat{g}_{1,4} & \hat{g}_{2,4} & \hat{g}_{3,4} & \hat{g}_{4,4} \end{bmatrix}}_{\hat{G}} + s\underbrace{\begin{bmatrix} \hat{c}_{1,1} & \hat{c}_{1,2} & \hat{c}_{1,3} & \hat{c}_{1,4} \\ \hat{c}_{1,2} & \hat{c}_{2,2} & \hat{c}_{2,3} & \hat{c}_{2,4} \\ \hat{c}_{1,3} & \hat{c}_{2,3} & \hat{c}_{3,3} & \hat{c}_{3,4} \\ \hat{c}_{1,4} & \hat{c}_{2,4} & \hat{c}_{3,4} & \hat{c}_{4,4} \end{bmatrix}}_{\hat{C}}\Bigg)\underbrace{\begin{bmatrix} V_1(s) \\ V_2(s) \\ V_5(s) \\ V_6(s) \end{bmatrix}}_{\hat{v}} = \underbrace{I_4}_{\hat{B}}\underbrace{\begin{bmatrix} i_1 \\ i_2 \\ i_5 \\ i_6 \end{bmatrix}}_{u} \qquad (6.34)$$
System (6.34) represents the ungrounded reduced model with node 6 re-inserted. It can be readily verified that the rows/columns of $\hat{G}$ and $\hat{C}$ respectively add up to zero. Hence the reduced netlist with all terminals recovered is obtained as usual by unstamping (6.34) with RLCSYN (see Chapter 2). In particular, the resistor and capacitor values between nodes $V_i$ and $V_j$, $i, j \in \{1, 2, 5, 6\}$, are:
$$r(V_1, V_2) = -\frac{1}{\hat{g}_{1,2}}, \qquad r(V_1, V_5) = -\frac{1}{\hat{g}_{1,3}}, \qquad r(V_1, V_6) = -\frac{1}{\hat{g}_{1,4}}, \qquad (6.35)$$
$$c(V_1, V_2) = -\hat{c}_{1,2}, \qquad c(V_1, V_5) = -\hat{c}_{1,3}, \qquad c(V_1, V_6) = -\hat{c}_{1,4}. \qquad (6.36)$$
As the row/column sums of (6.34) are zero, there are no connections from any of the nodes to ground. The ground was the reference node $V_6$, which is mapped back as a terminal. A final intuitive note regarding the current equation from (6.33): this is Kirchhoff's current law for the reduced net, which says that the sum of all currents entering the network is zero. This holds for the original network as well [summing the rows of the original ungrounded MNA system (6.26) gives $i_1 + i_2 + i_5 + i_6 = 0$]. This can be visualized on the right of Fig. 6.2. Finally, a comparison between the AC simulation of the original and reduced netlist for this simple circuit is shown in Fig. 6.3. As this is a very small example, the quality of the approximation is not of major interest (although one can observe that the reduced response follows the original well, especially for higher frequencies).
More important is the synthesis result: the mathematical reduced model was converted to a simple RC representation without controlled sources and with all terminals recovered. Actually, during the AC simulation itself, node V 1 was set to ground, and the response at node V 6 was measured (shown in Fig. 6.3). This confirms that choosing a reference node prior to the reduction step can be done arbitrarily. Provided that this reference node is re-inserted as a terminal during synthesis, the response at all terminal nodes will be recovered.
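The ungrounding construction (6.29)-(6.34) amounts to bordering the reduced matrix with its negated row sums; a minimal sketch with hypothetical reduced conductance values:

```python
import numpy as np

def unground(M):
    """Append the reference node to a reduced grounded symmetric matrix
    (eqs. 6.29-6.34): the border entries are the negated row sums,
    m_{j,4} = -sum_i m_{j,i}, and the new diagonal entry closes the
    column sum, so the result has zero row/column sums."""
    border = -M.sum(axis=1)
    corner = -border.sum()
    top = np.hstack([M, border[:, None]])
    bottom = np.hstack([border, [corner]])
    return np.vstack([top, bottom])

# illustrative symmetric reduced conductance block (values hypothetical)
G_hat = np.array([[ 2.0, -0.5, -0.3],
                  [-0.5,  1.5, -0.4],
                  [-0.3, -0.4,  1.2]])
G_full = unground(G_hat)     # 4 x 4 ungrounded reduced matrix, eq. (6.34)
```

The same helper applies verbatim to $\hat{C}$; unstamping the bordered matrices then yields the element values as in (6.35)-(6.36).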

Figure 6.3: AC analysis of the simple RC circuit. Original (red, dimension 6) vs. reduced with terminal recovery (blue, dimension 4). The voltage at node n6 ($V_6$) is measured.

Figure 6.4: Example AC simulation of original (red, 3253 nodes), PACT reduced (magenta, 22 nodes), PRIMA reduced (blue, 22 nodes), and PRIMA reduced (black, 43 nodes).

6.5.2 RC transmission line

An RC transmission line with $n + p = 3253$ nodes, of which 22 are terminals, is reduced and synthesized in this generalized framework using three methods: PACT, PRIMA [71], and the Loewner method [60]. Note that, while PACT applies by default to ungrounded systems and also automatically preserves the structure of the input/output matrices, the more general methods PRIMA and the Loewner method satisfy neither of these properties. Nevertheless, using the framework proposed in this chapter, they too can be applied to multi-terminal systems. This is shown next. The original network is ungrounded, and the procedure follows in the same manner as for the small example in Sect. 6.5.1. As before, from the 22 terminals, one node is arbitrarily chosen as a reference node, and the grounded system (6.27) (now with 21 terminals) is obtained. The three reduction methods PACT, PRIMA [71], and the Loewner method [60] are applied to the grounded system. The reducing projections are constructed to match moments of the original response around $s = 0$. Because the reduced models obtained by PRIMA and the Loewner method do not preserve the input/output structure, they are transformed via the input/output transformation (6.24) into the form (6.25) [the PACT reduced model does not need this transformation, since it naturally preserves the input/output structure]. Note that all reduced models are of the form (6.28), where the states are the voltages measured with respect to the grounded node. As in Sect. 6.5.1, for each reduced system the ungrounded reduced representation is obtained, with the reference node re-inserted as the 22nd terminal. Finally, the synthesized netlist without controlled sources is obtained by RLCSYN unstamping (see Chapter 2). Figure 6.4 shows the comparison between the AC simulations (with Spectre [16]) of the synthesized netlists for the PACT model and two PRIMA-reduced models. The PACT model and the small PRIMA model have dimension 21 (22 nodes after terminal recovery) and are superimposed. These models deviate very little from the original, and only beyond 1 GHz.
The larger PRIMA model of dimension 42 (43 nodes in the synthesized netlist) follows the original perfectly. Two reduced models of dimension 21 and 42 [(LM1) and (LM2) respectively] were also obtained³ with the Loewner method [60], by taking measurements of the original frequency response (of the grounded system). The maximum and minimum singular values $\sigma_{max}(\hat{H}_g(s))$ and $\sigma_{min}(\hat{H}_g(s))$ of the reduced models are compared to those of the original, $\sigma_{max}(H_g(s))$ and $\sigma_{min}(H_g(s))$. For LM1, Fig. 6.5 shows that the approximation for low frequencies may not be sufficiently accurate. The singular value plots for LM2 show, in Fig. 6.6, that this larger model provides a good approximation over the whole frequency range. Again, this is confirmed by the re-simulation result in Fig. 6.7. The AC simulations of the reduced synthesized models compared to the original simulation are shown in Fig. 6.7. The smaller model (LM1, with 22 nodes after synthesis) shows a slight mismatch at DC, which is expected given the result obtained in Fig. 6.5. However, the larger model (LM2, with 43 nodes after synthesis) matches the original.

[Footnote 3: The Loewner reduction itself was performed by the first author of [60].]

Figure 6.5: Example maximum and minimum singular value plots for the original (black, 3252 nodes) and reduced LM1 (red, 21 nodes).

Figure 6.6: Example maximum and minimum singular value plots for the original (black, 3252 nodes) and reduced LM2 (red, 42 nodes).

Figure 6.7: Example AC simulation of original (red, 3253 nodes) and two reduced netlists obtained from measurements [60]: LM1 (blue, 22 nodes) and LM2 (magenta, 43 nodes). The original and LM2 are indistinguishable.

6.6 Concluding remarks

Two problems are resolved which usually limit the applicability of traditional reduction methods to multi-terminal circuits: (a) the reduction of ungrounded systems and (b) the synthesis of general reduced models without controlled sources. A terminal removal and recovery action is proposed, which allows the reduction of ungrounded multi-terminal models. The synthesis problem is resolved using a transformation based on the input/output matrices of the reduced model. The framework enables the reduction of multi-terminal circuits with conventional MOR methods, and the usability of such reduced-order models in simulations.


Chapter 7

Graph partitioning with separation of terminals

A graph partitioning problem is described, specifically targeted towards improving sparsity during the reduction of multi-terminal circuits. The traditional partitioning objectives are revised and new conditions are derived, aimed at directly minimizing fill-in. Upper bounds on fill-in are provided, which could serve as initial criteria for distributing nodes and terminals across partitions. Examples illustrate that these upper bounds may be too loose to achieve the best sparsity results. The analysis motivates the need for a fill-estimating measure to be used directly as a partitioning objective. If incorporated in the new partitioning setup, the fill-minimizing objectives may improve sparsity in multi-terminal MOR beyond the level achieved via existing partitioning software.

7.1 Introduction

Graph partitioning has become a key ingredient in developing efficient solutions for large-scale problems in many disciplines, with application areas ranging from the technological sciences to the natural and social sciences [23]. Put simply, the objective of graph partitioning is to distribute a large data set into parts that communicate with each other as little as possible, usually such that the resulting parts are equal in size. For applications where these are the governing criteria (e.g. parallel computing or sparse matrix factorizations [18]), state-of-the-art partitioning software (to name a few: METIS [52], hMETIS [51], Mondriaan [96]) applies successfully. In the context of multi-terminal model reduction, where the ultimate goal is to preserve

sparsity in the reduced model, a new partitioning problem emerges: find the partitioning which distributes terminals across components so that the fill generated from eliminating the internal nodes of each component is minimized. As in the usual partitioning setup, components should be minimally connected to one another, so as to preserve as few internal nodes (i.e. separator nodes) as possible (the fewer the preserved internal nodes, the smaller the dimension of the reduced model). Chapter 4 showed how, using graph partitioning, very large multi-terminal networks were split into components which were treated individually in subsequent reduction steps. The partitioning helped to efficiently obtain, from an original large circuit, a reduced circuit which is small and sparse [i.e., its graph representation ideally has fewer vertices (circuit nodes) and fewer edges (basic circuit elements) than the original]. It was also seen that the sizes of the different parts do not influence the approximation quality of the reduced model. However, as demonstrated next, the number of terminals falling in each part and the number of separator nodes do influence the sparsity level achieved during reduction. The purpose of this chapter is to provide an avenue for improving sparsity in multi-terminal MOR, even beyond the level achieved via existing partitioning software. A new partitioning criterion is proposed which eliminates some of the standard constraints (such as enforcing equal-sized partitions) and instead directly minimizes a fill-in objective, in addition to the usual partitioning objectives of minimizing (a) the edge cut or (b) the total communication volume¹. See [52] for a definition of these objectives. We give an abstract formulation of the objective function which distributes terminals across partitions, so that the fill-in generated from reducing each partition up to its terminals is minimized.
[Footnote 1: Minimizing the total communication volume becomes relevant especially for partitions of size > 2.]

If incorporated inside partitioning tools, the terminal constraint could further enhance the reduction of challenging multi-terminal electrical circuits.

7.2 Formulation of multi-terminal partitioning criterion

In the following section, we formulate the multi-terminal partitioning criterion for a bisection. Afterwards, a generalized formulation for a partitioning into $N$ components is derived.

7.2.1 2-way partitioning of multi-terminal networks

Fig. 4.2 provides a rough problem visualization and formulates the partitioning task. More precisely, let the undirected graph $G$ be partitioned into two subnets $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$, which communicate via a separator net $G_3 = (K, E_K)$. In other words, the nodes $V$ are partitioned into three disjoint sets, $V_1 \cup V_2 \cup K = V$, such that there is no

direct edge between any node from $V_1$ to a node from $V_2$, and the only path from a node of $V_1$ to a node of $V_2$ is through the separator nodes $K$. Denoting $|V_1| = N_1$, $|V_2| = N_2$, and $|K| = k$, the edges induced by $V_1$ are $E_1 = \{(v_i, v_j) \in E \mid v_i, v_j \in V_1,\ i \neq j\}$, by $V_2$ are $E_2 = \{(v_i, v_j) \in E \mid v_i, v_j \in V_2,\ i \neq j\}$, and by $K$ are $E_K = \{(v_i, v_j) \in E \mid v_i, v_j \in K,\ i \neq j\}$. Let the remaining edges connecting $K$ to $V_1$ be $E_{1K} = \{(v_i, v_j) \in E \mid v_i \in V_1,\ 1 \leq i \leq N_1,\ v_j \in K,\ 1 \leq j \leq k\}$, and the edges connecting $K$ to $V_2$ be $E_{2K} = \{(v_i, v_j) \in E \mid v_i \in V_2,\ 1 \leq i \leq N_2,\ v_j \in K,\ 1 \leq j \leq k\}$. The complete set of edges is thus $E = E_1 \cup E_2 \cup E_K \cup E_{1K} \cup E_{2K}$, and the individual sets are disjoint. Through this partitioning, the terminals are split into the following disjoint sets: $P_1 \subset V_1$ terminals fall under $V_1$, $P_2 \subset V_2$ terminals fall under $V_2$, and $P_K \subset K$ terminals² fall inside the separator $K$, with $|P_1| = p_1$, $|P_2| = p_2$, $|P_K| = p_k$, and $P_1 \cup P_2 \cup P_K = P$. Let there be $n_1$ internal nodes in $G_1$ and $n_2$ in $G_2$, so that $|V_1| = p_1 + n_1$, $|V_2| = p_2 + n_2$. Figure 7.1 (left) gives the representation of a matrix $G$ partitioned in this manner, which is to be reduced by eliminating the internal nodes $n_1$ and $n_2$. As the reduction progresses through the BBD hierarchy, fill is introduced in the blocks corresponding to the preserved nodes $p_1$, $p_2$ and $k$ (as was also described in Chapter 4). For instance, the reduced $\hat{G}$ of (4.31) numerically represents the Schur complement with respect to the blocks of eliminated internal nodes, and will clearly have different dimensions and fill rates depending on the $p_1$, $n_1$, $p_2$, $n_2$, and $k$ nodes resulting from each partition. The form of the reduced matrix is shown on the right of Fig. 7.1. In the worst case, all matrix blocks ($G_1$, $G_2$, $G_3$, $G_{13}$, $G_{23}$) in the reduced matrix are completely filled.
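The separator property defining the 2-way partition (no direct edge between $V_1$ and $V_2$, all communication through $K$) is easy to verify programmatically; a small sketch on an illustrative path graph (the helper name is not from the thesis):

```python
import numpy as np

def is_valid_separator(adj, V1, V2):
    """Check the 2-way partition property: the separator K (all nodes
    outside V1 and V2) must intercept every path, i.e. there must be
    no direct edge between a node of V1 and a node of V2."""
    return not adj[np.ix_(V1, V2)].any()

# path graph 0-1-2-3-4: node 2 separates {0, 1} from {3, 4}
adj = np.zeros((5, 5), dtype=bool)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = True
print(is_valid_separator(adj, [0, 1], [3, 4]))    # True: K = {2} works
print(is_valid_separator(adj, [0, 1, 2], [3, 4])) # False: edge 2-3 crosses
```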
[Footnote 2: For multi-terminal MOR the $p_k$ terminals are not relevant, as all the $k$ separator nodes become terminals.]

In this setup, the 2-way multi-terminal partitioning problem is formulated:

Problem 1: multi-terminal 2-way partitioning
Find the smallest number of separator nodes $k$ which partitions $G$ into two parts such that the elimination of the internal nodes $n_1$ and $n_2$ introduces the smallest amount of fill in the terminal blocks $p_1$, $p_2$, the separator block $k$, and the communication blocks from $p_1$ and $p_2$ to $k$ respectively.

Towards formalizing the above partitioning criterion, let
$$e_{fill_1}(n_1, [p_1, k]) \qquad (7.1)$$
be an abstract function which estimates the fill created during the reduction of the first subnet. More precisely, the notation in (7.1) specifies the fill generated from the elimination of the internal nodes $n_1$ inside the blocks corresponding to the terminal nodes $p_1$ and separator nodes $k$. Similarly, a fill-estimating function for the reduction of the second subnet is $e_{fill_2}(n_2, [p_2, k])$. The partitioning objective of Problem 1 can be written compactly as:
$$\min_{\min(k),\ k = |K|,\ K \subset V} \Big[ e_{fill_1}(n_1, [p_1, k]) + e_{fill_2}(n_2, [p_2, k]) \Big], \qquad (7.2)$$
where the additional constraint $\min(k)$ specifies that a small number $k$ of separator

nodes is desired (as all separator nodes are preserved in the reduced model, the smaller the $k$, the smaller also the dimension of the reduced model).

[Figure 7.1: Original and reduced circuit matrices for a two-way partitioning in BBD form. In the reduction step, internal nodes are eliminated, and blocks corresponding to terminal nodes and separator nodes are preserved but filled in. The dimensions of the reduced model give the maximum fill value from (7.3). A note on scale: in practice the number of internal nodes is larger than the number of terminals per block, but is kept small in the figure due to space limitations.]

An upper bound for the fill-estimating function (7.2) is obtained immediately by inspection from Fig. 7.1 (right):
$$\min_{\min(k),\ k = |K|,\ K \subset V} \underbrace{\frac{p_1(p_1-1)}{2} + \frac{p_2(p_2-1)}{2} + \frac{k(k-1)}{2} + (p_1 + p_2)k}_{Maxfill}, \qquad (7.3)$$
which represents the maximum fill-in possibly generated from reducing each subnet individually up to its terminals $p_i$ and to the $k$ separator nodes. More precisely, the upper bound on $e_{fill_1}$ is $\frac{p_1(p_1-1)}{2} + p_1k + \frac{k(k-1)}{2}$, and the upper bound on $e_{fill_2}$ is $\frac{p_2(p_2-1)}{2} + p_2k + \frac{k(k-1)}{2}$, so that the upper bound on (7.2)³ becomes (7.3). Note that (7.3) is independent of the internal nodes $n_1$ and $n_2$. Hence in practice it may be a loose upper bound (this effect is shown in Fig. 7.8 for the more general case of an N-way partition). It should be emphasized that in the multi-terminal partitioning setup defined above, the resulting subnet sizes $N_1$ and $N_2$ need not be the same.
In other words, the partitioning need not be balanced in the dimension of the generated components (decompositions into equal parts pertain more to parallel-computing applications, where the main goal is to balance computation among processors).

[Footnote 3: The upper bound on the sum of the two $\frac{k(k-1)}{2}$ factors is still $\frac{k(k-1)}{2}$, as the fill inside the upper triangle of a $k \times k$ matrix can never be greater than $\frac{k(k-1)}{2}$.]

In the context of multi-terminal

MOR, partitions should be balanced in a different measure: that which minimizes fill-in. For that, terminals must be appropriately distributed across the subnets, together with the usual condition of minimizing connectivity between subnets. Therefore, significant improvements in sparsity could be achieved with the help of a partitioner which does not impose an equal-sized decomposition, but rather uses condition (7.2) to directly control the distribution of terminals. This remains a motivating problem for future research, as is the formulation of an explicit measure (7.2) [tighter than (7.3)] for estimating fill-in. Closely related is the approximate minimum degree criterion which underlies the AMD [1, 2] reordering algorithm for sparse matrix factorizations. AMD uses upper bounds on the degrees of the nodes in the graph, rather than exact degrees, to determine the order in which nodes should be eliminated from a graph (matrix) so that the least amount of fill-in is produced. The pivot selection at each step is done by choosing the node with the smallest upper bound on the degree [1, 2]. Hence, an estimator for (7.1) could use the approximate minimum degree criterion to determine which $n_1$ internal nodes should be eliminated and what the predicted fill-in would be.

Example for 2-way partitioning

To understand the influence of different partitions on the sparsity of the reduced model, the biconnected component network from [80] is chosen. In Fig. 7.2 (left) the graph representation of the original network is shown: it has $n + p = 244$ nodes, of which $p = 54$ are terminals (shown in red). The nodes of the graph are actual circuit nodes, while the edges between nodes correspond to the resistive connections. The network has visible biconnected components (if the articulation nodes in the middle are removed, the two parts become disconnected).
Three partitioning choices are made (denoted P1, P2, and P3 respectively), each with different separator nodes. For each partitioning scenario, the statistics are recorded in Table 7.1: $N_1 = n_1 + p_1$ and $N_2 = n_2 + p_2$ are the sizes of the subnets, $p_1$ and $p_2$ are the numbers of terminals falling in each partition, $k$ is the number of separator nodes, and Maxfill is the corresponding upper bound on fill-in (7.3). Reduced networks are obtained from each partition, and the true fill-in, i.e. the number of resistors in the reduced circuit, is recorded and compared to the Maxfill.

Table 7.1: Comparison of different partitions and reductions for the bi-component circuit

                P1 (Fig. 7.2)         P2 (Fig. 7.3)          P3 (Fig. 7.4)
  Subnet 1      N1 = 60,  p1 = 1      N1 = 120, p1 = 37      N1 = 79,  p1 = 27
  Subnet 2      N2 = 183, p2 = 53     N2 = 119, p2 = 17      N2 = 153, p2 = 27
  Separator     k  = 1,   pk = 0      k  = 5,   pk = 0       k  = 12,  pk = 0
  Maxfill       1432                  1082                   1416
  True fill     1432                  1031                   828
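The Maxfill entries in Table 7.1 follow directly from (7.3) and are cheap to evaluate; a quick sketch reproducing the table row:

```python
def maxfill(p1, p2, k):
    """Upper bound (7.3) on the fill created by reducing a 2-way
    partition: complete graphs on the p1 and p2 terminal blocks and on
    the k separator block, plus full terminal-to-separator coupling."""
    return (p1 * (p1 - 1)) // 2 + (p2 * (p2 - 1)) // 2 \
         + (k * (k - 1)) // 2 + (p1 + p2) * k

# reproduce the Maxfill row of Table 7.1 for P1, P2, P3
print(maxfill(1, 53, 1), maxfill(37, 17, 5), maxfill(27, 27, 12))
# -> 1432 1082 1416
```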

The graph representations for the original and reduced network in each partitioning scenario are shown in Figures 7.2, 7.3 and 7.4 respectively. The corresponding reduced $\hat{G}$ matrices appear in Fig. 7.5. For P1, one articulation node is chosen as the separator $K$, shown in blue in Fig. 7.2 (left). The reduced model (right) is dense: there is an edge (resistor element) between almost every pair of nodes. The reduced matrix (left of Fig. 7.5) clearly shows the fill-in. The Maxfill (1432) and True fill (1432) values in Table 7.1 are the same.

Figure 7.2: P1. Original circuit (left) with a biconnected component (terminals shown in red). The connection node (blue) is picked as separator. The circuit after reduction, with all terminals and the separator node preserved (right), is dense (55 nodes, 1432 resistors).

The partitioning P2 is obtained with nested dissection (NESDIS) [18], [52, 54], and picks the five separator nodes shown in blue in Fig. 7.3, left. The corresponding reduced network (Fig. 7.3, right) is sparser than the one from P1. This is also reflected by the reduced matrix (middle of Fig. 7.5), which is sparser. As there are 1031 resistors in this reduced net, P2 is a better partition than P1. For the P3 partitioning, separator nodes were chosen as shown in Fig. 7.4 (left). The resulting reduced model on the right is sparser than those from P2 and P1, as also reflected by the reduced matrix in Fig. 7.5, right. There are 66 nodes and 828 resistors in this reduced network. Comparing the number of resistors (True fill) in the reduced networks, it is clear that P3 is the partitioning which yields the reduced model with the least fill-in. It should be noted that this partition (and selection of separator nodes) was done by inspection, rather than by a partitioning algorithm (as was the case for P2, which was obtained from NESDIS).
This supports the initial conjecture that, in contrast to the usual partitioning objectives, partitioning for multi-terminal circuits requires a specialized criterion which directly minimizes fill-in. Another important observation confirmed by this example is that the Maxfill value (7.3) is indeed only an upper bound, which may be too loose to consider as a partitioning

objective in practice. Notice in Table 7.1 that although P3 yielded a sparser reduced model than P2 (828 vs. 1031 resistors respectively), partitioning P3 has a higher Maxfill value than P2 (1416 vs. 1082 respectively). If the Maxfill criterion (7.3) were used as a partitioning criterion, P2 would be considered a better partition than P3, which was clearly not the case. Hence, for better results, a partitioning criterion based on a tighter fill-estimating function [denoted abstractly as (7.2)] is necessary, which also accounts for the degrees of the internal nodes to be eliminated, rather than for the number of terminals and separator nodes alone.

Figure 7.3: P2. Original circuit (left) with a biconnected component (terminals shown in red). Separator nodes (blue) are the ones given by NESDIS. The circuit after reduction (59 nodes, 1031 resistors), with all terminals and separator nodes preserved (right), is sparser than the one from Fig. 7.2, but still rather dense.

Figure 7.4: P3. Original circuit (left) with a biconnected component (terminals shown in red). More nodes are picked as separators (blue). The circuit after reduction (66 nodes, 828 resistors), with all terminals and separator nodes preserved (right), is sparser than the ones from Fig. 7.2 and Fig. 7.3.
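A tighter criterion must look at the actual elimination. As a reference point (not a cheap estimator), the exact fill of a subnet can be computed by forming the Schur complement onto the kept terminal/separator block and counting its off-diagonal non-zeros; this is what the abstract $e_{fill}$ of (7.1) should approximate. The helper name schur_fill below is illustrative:

```python
import numpy as np

def schur_fill(G, internal):
    """Count the fill produced by eliminating the internal nodes of one
    subnet: form the Schur complement on the kept (terminal + separator)
    block and count its off-diagonal nonzeros, i.e. the resistors that
    remain in the reduced subnet."""
    n = G.shape[0]
    kept = [i for i in range(n) if i not in set(internal)]
    A = G[np.ix_(internal, internal)]        # block to be eliminated
    B = G[np.ix_(internal, kept)]            # coupling block
    D = G[np.ix_(kept, kept)]                # preserved block
    S = D - B.T @ np.linalg.solve(A, B)      # Schur complement
    S[np.abs(S) < 1e-12] = 0.0               # drop numerical noise
    return np.count_nonzero(np.triu(S, k=1)) # upper-triangle fill

# path graph 0-1-2-3: eliminating interior nodes 1, 2 connects 0 and 3
G = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
print(schur_fill(G, [1, 2]))                 # -> 1 (a single resistor)
```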

Figure 7.5: Reduced matrices resulting from each partition (P1, P2, P3, from left to right). These correspond to the reduced circuits on the right of Figures 7.2 (55 nodes, 1432 resistors), 7.3 (59 nodes, 1031 resistors) and 7.4 (66 nodes, 828 resistors) respectively.

A final reduced model is computed based on partition P1, where the nodes in each subnet are additionally re-ordered with CAMD [1, 2]. From each subnet, internal nodes are eliminated one by one in the order determined by CAMD, and the fill generated in the entire matrix is monitored at each step. These actions were described in Chapter 4. In this manner, the points of minimum fill in each subnet are recorded, and the internal nodes beyond the minimum-fill point are preserved along with the terminals to improve sparsity. The final reduced circuit is shown in Fig. 7.6 and is much sparser than the previous models (224 resistors), due to the additional internal nodes preserved (aside from terminals and separator nodes). There are 182 nodes in this circuit, only about 25% fewer than the original 244, demonstrating the trade-off between small dimensionality and sparsity. These results indicate that reduced models with the least fill-in could be determined with the combined framework: partitioning + CAMD reordering per subnet + fill-tracking. For an original network (graph) $G = (V, E)$ with $|V|$ nodes and $|E|$ edges (circuit elements), the partitioning cost is $O(|E|)$ [53] and the CAMD reordering cost is $O(|V|\,|E|)$ [1, 2, 38], hence both are very cheap to perform. The extra cost of tracking fill-in can, however, become too expensive, especially for networks with nodes and elements exceeding $10^4$ and with many terminals. To monitor the actual fill-in from node-wise elimination, a reduced matrix must be formed at each node-elimination step, and the corresponding number of non-zero entries recorded.
For example, for a subnet with $n_1 + p_1$ nodes which communicates via $k$ separator nodes with other subnets, $n_1$ internal nodes must be eliminated one by one. Hence $n_1$ reduced matrices must be formed, which also become denser and denser as fill is generated in the blocks corresponding to the non-eliminated nodes. The extra cost of repeatedly storing and retrieving these reduced matrices from memory may dominate the otherwise cheap partitioning and reordering operations. This effect was already shown for the Filter example of Chapter 4; see in particular the result in Table 4.2. There, the SpRC-mf reduced model formed with

the combined framework (partitioning + CAMD reordering per subnet + fill-tracking) was indeed sparser than SpRC obtained from partitioning alone, but was almost 10 times slower to compute than SpRC. A partitioning criterion for multi-terminal MOR is thus envisioned which uses a fill-estimating function (7.2) [similar to the approximate minimum degree criterion of [1, 2]] as a direct objective to determine the best partition. In this manner, reduced models with minimum fill could be ensured already from the partitioning itself, and the extra fill-monitoring costs would be avoided.

Figure 7.6: P1 + CAMD. Sparse reduced circuit (left) with the single connection node as separator (blue), but with additional internal nodes preserved [identified from CAMD re-ordering and fill-in tracking]. The corresponding sparse reduced matrix is on the right (182 nodes, 224 resistors).

7.2.2 N-way multi-terminal partitioning

The N-way partitioning of a graph $G = (V, E)$ is defined here as follows: find a partitioning of $V$ into $2N - 1$ disjoint subsets $V = V_1 \cup V_2 \cup \ldots \cup V_N \cup K_1 \cup K_2 \cup \ldots \cup K_{N-1}$, where the $V_i$, $1 \leq i \leq N$, are the nodes of the individual subnets, such that there is no edge from any node in $V_i$ to any node from $V_j$ for all $i \neq j$. The $K_j$, $1 \leq j \leq N-1$, are the sets of separator nodes comprising the communication among the subnets induced by the $V_i$'s. This is best visualized using the example of a 4-way partitioning, shown in Fig. 7.7 in the form of a border-block-diagonal (BBD) matrix ordering together with the corresponding separator tree. In particular, assuming $N = 2^m$, $m \geq 1$, the N-way partitioning produces a binary tree of height $m$, with $N = 2^m$ leaf blocks⁴ and $N - 1 = 2^m - 1$ non-leaf blocks. The leaves of the tree are the individual subnets to be reduced; the non-leaves are the sets of separator nodes to be preserved. As for the 2-way partitioning, in each $V_i$, $1 \leq i \leq N$, there are $n_i$ internal nodes and $p_i$ terminals.
⁴ The traditional node of a tree is denoted here as a block, so as not to mistake it for a node in the graph G itself.

Figure 7.7: Border-block-diagonal (BBD) matrix structure and separator tree for a 4-way partitioning, with leaf blocks V_1 (n_1 + p_1), ..., V_4 (n_4 + p_4) and separator blocks K_1, K_2, K_3. As N = 4 = 2^2, the separator tree has height m = 2, N = 4 leaves (subnets to be reduced) and N − 1 = 3 sets of separator nodes.

The number of separator nodes in each K_j set, 1 ≤ j ≤ N − 1, is k_j. The reduction eliminates the n_i internal nodes from the V_i subnets (the leaves of the separator tree). The multi-terminal partitioning criterion is formulated analogously to the 2-way case:

Problem 2: multi-terminal N-way partitioning
Given a number of partitions N, find the smallest numbers of separator nodes k_j, 1 ≤ j ≤ N − 1, partitioning G into N subnets, such that the elimination of the internal nodes n_i, 1 ≤ i ≤ N, from each subnet introduces the smallest amount of fill in the terminal blocks p_i, the separator blocks k_j, and the communication blocks from the p_i to the k_j nodes.

Another natural question is how to determine the optimal number of partitions N itself; hence a generalized problem is formulated:

Problem 3: generalized multi-terminal partitioning
For a multi-terminal network G, find the number N of partitions which distributes the p terminals across the partitions so that:
1. the number of separator nodes k_j, 1 ≤ j ≤ N − 1, is minimized, and
2. the elimination of the internal nodes n_i, 1 ≤ i ≤ N, from each subnet introduces the smallest amount of fill in the terminal blocks p_i, the separator blocks k_j, and the communication blocks from the p_i to the k_j nodes.

As for the 2-way partitioning, in the N-way multi-terminal partitioning the sizes of the different parts need not be equal. Rather, an objective function similar to (7.2), estimating the fill described by Problem 3 in the N-way case, could be used directly as the partitioning criterion. For a partition into N = 2^m subnets (and N − 1 separator blocks),

Identification of Electrical Circuits for Realization of Sparsity Preserving Reduced Order Models

Identification of Electrical Circuits for Realization of Sparsity Preserving Reduced Order Models Identification of Electrical Circuits for Realization of Sparsity Preserving Reduced Order Models Christof Kaufmann 25th March 2010 Abstract Nowadays very-large scale integrated circuits contain a large

More information

Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis Rommes, J.; Vorst, van der, H.A.; ter Maten, E.J.W.

Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis Rommes, J.; Vorst, van der, H.A.; ter Maten, E.J.W. Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis Rommes, J.; Vorst, van der, H.A.; ter Maten, E.J.W. Published: 01/01/2003 Document Version Publisher s PDF, also known

More information

EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science. CASA-Report January 2011

EINDHOVEN UNIVERSITY OF TECHNOLOGY Department of Mathematics and Computer Science. CASA-Report January 2011 EINDHOVEN UNIVESITY OF TECHNOLOGY Department of Mathematics and Computer Science CASA-eport 11-05 January 2011 SparseC: sparsity preserving model reduction for C circuits with many terminals by. Ionutiu,

More information

Model reduction of large-scale dynamical systems

Model reduction of large-scale dynamical systems Model reduction of large-scale dynamical systems Lecture III: Krylov approximation and rational interpolation Thanos Antoulas Rice University and Jacobs University email: aca@rice.edu URL: www.ece.rice.edu/

More information

Model Order Reduction

Model Order Reduction Model Order Reduction Wil Schilders NXP Semiconductors & TU Eindhoven November 26, 2009 Utrecht University Mathematics Staff Colloquium Outline Introduction and motivation Preliminaries Model order reduction

More information

Stability and Passivity of the Super Node Algorithm for EM Modeling of IC s

Stability and Passivity of the Super Node Algorithm for EM Modeling of IC s Stability and Passivity of the Super Node Algorithm for EM Modeling of IC s M.V. Ugryumova and W.H.A. Schilders Abstract The super node algorithm performs model order reduction based on physical principles.

More information

Review of Linear Time-Invariant Network Analysis

Review of Linear Time-Invariant Network Analysis D1 APPENDIX D Review of Linear Time-Invariant Network Analysis Consider a network with input x(t) and output y(t) as shown in Figure D-1. If an input x 1 (t) produces an output y 1 (t), and an input x

More information

Approximation of the Linearized Boussinesq Equations

Approximation of the Linearized Boussinesq Equations Approximation of the Linearized Boussinesq Equations Alan Lattimer Advisors Jeffrey Borggaard Serkan Gugercin Department of Mathematics Virginia Tech SIAM Talk Series, Virginia Tech, April 22, 2014 Alan

More information

Model Order Reduction using SPICE Simulation Traces. Technical Report

Model Order Reduction using SPICE Simulation Traces. Technical Report Model Order Reduction using SPICE Simulation Traces Paul Winkler, Henda Aridhi, and Sofiène Tahar Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada pauwink@web.de,

More information

Model Reduction for Unstable Systems

Model Reduction for Unstable Systems Model Reduction for Unstable Systems Klajdi Sinani Virginia Tech klajdi@vt.edu Advisor: Serkan Gugercin October 22, 2015 (VT) SIAM October 22, 2015 1 / 26 Overview 1 Introduction 2 Interpolatory Model

More information

Some of the different forms of a signal, obtained by transformations, are shown in the figure. jwt e z. jwt z e

Some of the different forms of a signal, obtained by transformations, are shown in the figure. jwt e z. jwt z e Transform methods Some of the different forms of a signal, obtained by transformations, are shown in the figure. X(s) X(t) L - L F - F jw s s jw X(jw) X*(t) F - F X*(jw) jwt e z jwt z e X(nT) Z - Z X(z)

More information

Krylov Techniques for Model Reduction of Second-Order Systems

Krylov Techniques for Model Reduction of Second-Order Systems Krylov Techniques for Model Reduction of Second-Order Systems A Vandendorpe and P Van Dooren February 4, 2004 Abstract The purpose of this paper is to present a Krylov technique for model reduction of

More information

The colpitts oscillator family

The colpitts oscillator family Downloaded from orbit.dtu.dk on: Oct 17, 2018 The colpitts oscillator family Lindberg, Erik; Murali, K.; Tamasevicius, A. Publication date: 2008 Document Version Publisher's PDF, also known as Version

More information

SparseRC : Sparsity preserving model reduction for RC circuits with many terminals Ionutiu, R.; Rommes, J.; Schilders, W.H.A.

SparseRC : Sparsity preserving model reduction for RC circuits with many terminals Ionutiu, R.; Rommes, J.; Schilders, W.H.A. SparseC : Sparsity preserving model reduction for C circuits with many terminals Ionutiu,.; ommes, J.; Schilders, W.H.A. Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits

More information

Passive Interconnect Macromodeling Via Balanced Truncation of Linear Systems in Descriptor Form

Passive Interconnect Macromodeling Via Balanced Truncation of Linear Systems in Descriptor Form Passive Interconnect Macromodeling Via Balanced Truncation of Linear Systems in Descriptor Form Boyuan Yan, Sheldon X.-D. Tan,PuLiu and Bruce McGaughy Department of Electrical Engineering, University of

More information

Recent developments for MOR in the electronics industry

Recent developments for MOR in the electronics industry Recent developments for MOR in the electronics industry Wil Schilders Reduced Order Models in Computational Science and Engineering Aachen, January 30-31, 2014 Full proposal oc-2013-1-15312 for a new COST

More information

Model order reduction of electrical circuits with nonlinear elements

Model order reduction of electrical circuits with nonlinear elements Model order reduction of electrical circuits with nonlinear elements Andreas Steinbrecher and Tatjana Stykel 1 Introduction The efficient and robust numerical simulation of electrical circuits plays a

More information

Contents Part I Introduction The COMSON Project Part II Partial Differential Algebraic Equations PDAE Modeling and Discretization

Contents Part I Introduction The COMSON Project Part II Partial Differential Algebraic Equations PDAE Modeling and Discretization Contents Part I Introduction 1 The COMSON Project... 3 Michael Günther and Uwe Feldmann 1.1 Trends in Microelectronics... 3 1.2 Scope of the COMSON Project... 4 1.3 Methodology... 5 1.3.1 The Demonstrator

More information

Model reduction of nonlinear circuit equations

Model reduction of nonlinear circuit equations Model reduction of nonlinear circuit equations Tatjana Stykel Technische Universität Berlin Joint work with T. Reis and A. Steinbrecher BIRS Workshop, Banff, Canada, October 25-29, 2010 T. Stykel. Model

More information

Online algorithms for parallel job scheduling and strip packing Hurink, J.L.; Paulus, J.J.

Online algorithms for parallel job scheduling and strip packing Hurink, J.L.; Paulus, J.J. Online algorithms for parallel job scheduling and strip packing Hurink, J.L.; Paulus, J.J. Published: 01/01/007 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue

More information

QUESTION BANK SUBJECT: NETWORK ANALYSIS (10ES34)

QUESTION BANK SUBJECT: NETWORK ANALYSIS (10ES34) QUESTION BANK SUBJECT: NETWORK ANALYSIS (10ES34) NOTE: FOR NUMERICAL PROBLEMS FOR ALL UNITS EXCEPT UNIT 5 REFER THE E-BOOK ENGINEERING CIRCUIT ANALYSIS, 7 th EDITION HAYT AND KIMMERLY. PAGE NUMBERS OF

More information

EE5900 Spring Lecture 5 IC interconnect model order reduction Zhuo Feng

EE5900 Spring Lecture 5 IC interconnect model order reduction Zhuo Feng EE59 Spring Parallel VLSI CAD Algorithms Lecture 5 IC interconnect model order reduction Zhuo Feng 5. Z. Feng MU EE59 In theory we can apply moment matching for any order of approximation But in practice

More information

Parallel VLSI CAD Algorithms. Lecture 1 Introduction Zhuo Feng

Parallel VLSI CAD Algorithms. Lecture 1 Introduction Zhuo Feng Parallel VLSI CAD Algorithms Lecture 1 Introduction Zhuo Feng 1.1 Prof. Zhuo Feng Office: EERC 513 Phone: 487-3116 Email: zhuofeng@mtu.edu Class Website http://www.ece.mtu.edu/~zhuofeng/ee5900spring2012.html

More information

Iterative Rational Krylov Algorithm for Unstable Dynamical Systems and Generalized Coprime Factorizations

Iterative Rational Krylov Algorithm for Unstable Dynamical Systems and Generalized Coprime Factorizations Iterative Rational Krylov Algorithm for Unstable Dynamical Systems and Generalized Coprime Factorizations Klajdi Sinani Thesis submitted to the Faculty of the Virginia Polytechnic Institute and State University

More information

Upper and Lower Bounds of Frequency Interval Gramians for a Class of Perturbed Linear Systems Shaker, Hamid Reza

Upper and Lower Bounds of Frequency Interval Gramians for a Class of Perturbed Linear Systems Shaker, Hamid Reza Aalborg Universitet Upper and Lower Bounds of Frequency Interval Gramians for a Class of Perturbed Linear Systems Shaker, Hamid Reza Published in: 7th IFAC Symposium on Robust Control Design DOI (link

More information

The Wien Bridge Oscillator Family

The Wien Bridge Oscillator Family Downloaded from orbit.dtu.dk on: Dec 29, 207 The Wien Bridge Oscillator Family Lindberg, Erik Published in: Proceedings of the ICSES-06 Publication date: 2006 Link back to DTU Orbit Citation APA): Lindberg,

More information

Krylov-Subspace Based Model Reduction of Nonlinear Circuit Models Using Bilinear and Quadratic-Linear Approximations

Krylov-Subspace Based Model Reduction of Nonlinear Circuit Models Using Bilinear and Quadratic-Linear Approximations Krylov-Subspace Based Model Reduction of Nonlinear Circuit Models Using Bilinear and Quadratic-Linear Approximations Peter Benner and Tobias Breiten Abstract We discuss Krylov-subspace based model reduction

More information

Methods for eigenvalue problems with applications in model order reduction

Methods for eigenvalue problems with applications in model order reduction Methods for eigenvalue problems with applications in model order reduction Methoden voor eigenwaardeproblemen met toepassingen in model orde reductie (met een samenvatting in het Nederlands) Proefschrift

More information

University of Groningen. Morphological design of Discrete-Time Cellular Neural Networks Brugge, Mark Harm ter

University of Groningen. Morphological design of Discrete-Time Cellular Neural Networks Brugge, Mark Harm ter University of Groningen Morphological design of Discrete-Time Cellular Neural Networks Brugge, Mark Harm ter IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you

More information

The application of preconditioned Jacobi-Davidson methods in pole-zero analysis

The application of preconditioned Jacobi-Davidson methods in pole-zero analysis The application of preconditioned Jacobi-Davidson methods in pole-zero analysis Citation for published version (APA): Rommes, J., Bomhof, C. W., Vorst, van der, H. A., & Maten, ter, E. J. W. (2003). The

More information

Spatial decay rate of speech in open plan offices: the use of D2,S and Lp,A,S,4m as building requirements Wenmaekers, R.H.C.; Hak, C.C.J.M.

Spatial decay rate of speech in open plan offices: the use of D2,S and Lp,A,S,4m as building requirements Wenmaekers, R.H.C.; Hak, C.C.J.M. Spatial decay rate of speech in open plan offices: the use of D2,S and Lp,A,S,4m as building requirements Wenmaekers, R.H.C.; Hak, C.C.J.M. Published in: Euronoise 2015, the 10th European Congress and

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Comparison of Model Reduction Methods with Applications to Circuit Simulation

Comparison of Model Reduction Methods with Applications to Circuit Simulation Comparison of Model Reduction Methods with Applications to Circuit Simulation Roxana Ionutiu, Sanda Lefteriu, and Athanasios C. Antoulas Department of Electrical and Computer Engineering, Rice University,

More information

Notes for course EE1.1 Circuit Analysis TOPIC 10 2-PORT CIRCUITS

Notes for course EE1.1 Circuit Analysis TOPIC 10 2-PORT CIRCUITS Objectives: Introduction Notes for course EE1.1 Circuit Analysis 4-5 Re-examination of 1-port sub-circuits Admittance parameters for -port circuits TOPIC 1 -PORT CIRCUITS Gain and port impedance from -port

More information

Basic. Theory. ircuit. Charles A. Desoer. Ernest S. Kuh. and. McGraw-Hill Book Company

Basic. Theory. ircuit. Charles A. Desoer. Ernest S. Kuh. and. McGraw-Hill Book Company Basic C m ш ircuit Theory Charles A. Desoer and Ernest S. Kuh Department of Electrical Engineering and Computer Sciences University of California, Berkeley McGraw-Hill Book Company New York St. Louis San

More information

Optimal free parameters in orthonormal approximations

Optimal free parameters in orthonormal approximations Optimal free parameters in orthonormal approximations den Brinker, A.C.; Belt, H.J.W. Published in: IEEE Transactions on Signal Processing DOI: 10.1109/78.705414 Published: 01/01/1998 Document Version

More information

Analog Signals and Systems and their properties

Analog Signals and Systems and their properties Analog Signals and Systems and their properties Main Course Objective: Recall course objectives Understand the fundamentals of systems/signals interaction (know how systems can transform or filter signals)

More information

A comparison of model reduction techniques from structural dynamics, numerical mathematics and systems and control

A comparison of model reduction techniques from structural dynamics, numerical mathematics and systems and control A comparison of model reduction techniques from structural dynamics, numerical mathematics and systems and control B. Besselink a, A. Lutowska b, U. Tabak c, N. van de Wouw a, H. Nijmeijer a, M.E. Hochstenbach

More information

9. Introduction and Chapter Objectives

9. Introduction and Chapter Objectives Real Analog - Circuits 1 Chapter 9: Introduction to State Variable Models 9. Introduction and Chapter Objectives In our analysis approach of dynamic systems so far, we have defined variables which describe

More information

A note on non-periodic tilings of the plane

A note on non-periodic tilings of the plane A note on non-periodic tilings of the plane de Bruijn, N.G. Published: 01/01/1985 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue and volume numbers) Please

More information

Second-Order Balanced Truncation for Passive Order Reduction of RLCK Circuits

Second-Order Balanced Truncation for Passive Order Reduction of RLCK Circuits IEEE RANSACIONS ON CIRCUIS AND SYSEMS II, VOL XX, NO. XX, MONH X Second-Order Balanced runcation for Passive Order Reduction of RLCK Circuits Boyuan Yan, Student Member, IEEE, Sheldon X.-D. an, Senior

More information

University of Groningen. Taking topological insulators for a spin de Vries, Eric Kornelis

University of Groningen. Taking topological insulators for a spin de Vries, Eric Kornelis University of Groningen Taking topological insulators for a spin de Vries, Eric Kornelis IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it.

More information

MODEL-order reduction is emerging as an effective

MODEL-order reduction is emerging as an effective IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 52, NO. 5, MAY 2005 975 Model-Order Reduction by Dominant Subspace Projection: Error Bound, Subspace Computation, and Circuit Applications

More information

Model Order Reduction for Electronic Circuits: Mathematical and Physical Approaches

Model Order Reduction for Electronic Circuits: Mathematical and Physical Approaches Proceedings of the 2nd Fields MITACS Industrial Problem-Solving Workshop, 2008 Model Order Reduction for Electronic Circuits: Mathematical and Physical Approaches Problem Presenter: Wil Schilders, NXP

More information

Notes on cooperative research in the measurement of Gottwein-temperature Veenstra, P.C.

Notes on cooperative research in the measurement of Gottwein-temperature Veenstra, P.C. Notes on cooperative research in the measurement of Gottwein-temperature Veenstra, P.C. Published: 01/01/1969 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue

More information

On a set of diophantine equations

On a set of diophantine equations On a set of diophantine equations Citation for published version (APA): van Lint, J. H. (1968). On a set of diophantine equations. (EUT report. WSK, Dept. of Mathematics and Computing Science; Vol. 68-WSK-03).

More information

Model Reduction via Projection onto Nonlinear Manifolds, with Applications to Analog Circuits and Biochemical Systems

Model Reduction via Projection onto Nonlinear Manifolds, with Applications to Analog Circuits and Biochemical Systems Model Reduction via Projection onto Nonlinear Manifolds, with Applications to Analog Circuits and Biochemical Systems Chenjie Gu and Jaijeet Roychowdhury {gcj,jr}@umn.edu ECE Department, University of

More information

AN INDEPENDENT LOOPS SEARCH ALGORITHM FOR SOLVING INDUCTIVE PEEC LARGE PROBLEMS

AN INDEPENDENT LOOPS SEARCH ALGORITHM FOR SOLVING INDUCTIVE PEEC LARGE PROBLEMS Progress In Electromagnetics Research M, Vol. 23, 53 63, 2012 AN INDEPENDENT LOOPS SEARCH ALGORITHM FOR SOLVING INDUCTIVE PEEC LARGE PROBLEMS T.-S. Nguyen *, J.-M. Guichon, O. Chadebec, G. Meunier, and

More information

Graphs with given diameter maximizing the spectral radius van Dam, Edwin

Graphs with given diameter maximizing the spectral radius van Dam, Edwin Tilburg University Graphs with given diameter maximizing the spectral radius van Dam, Edwin Published in: Linear Algebra and its Applications Publication date: 2007 Link to publication Citation for published

More information

Model reduction of large-scale systems by least squares

Model reduction of large-scale systems by least squares Model reduction of large-scale systems by least squares Serkan Gugercin Department of Mathematics, Virginia Tech, Blacksburg, VA, USA gugercin@mathvtedu Athanasios C Antoulas Department of Electrical and

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 16-02 The Induced Dimension Reduction method applied to convection-diffusion-reaction problems R. Astudillo and M. B. van Gijzen ISSN 1389-6520 Reports of the Delft

More information

Linear dynamical systems with inputs & outputs

Linear dynamical systems with inputs & outputs EE263 Autumn 215 S. Boyd and S. Lall Linear dynamical systems with inputs & outputs inputs & outputs: interpretations transfer function impulse and step responses examples 1 Inputs & outputs recall continuous-time

More information

Citation for published version (APA): Kooistra, F. B. (2007). Fullerenes for organic electronics [Groningen]: s.n.

Citation for published version (APA): Kooistra, F. B. (2007). Fullerenes for organic electronics [Groningen]: s.n. University of Groningen Fullerenes for organic electronics Kooistra, Floris Berend IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

Massively parallel molecular-continuum simulations with the macro-micro-coupling tool Neumann, P.; Harting, J.D.R.

Massively parallel molecular-continuum simulations with the macro-micro-coupling tool Neumann, P.; Harting, J.D.R. Massively parallel molecular-continuum simulations with the macro-micro-coupling tool Neumann, P.; Harting, J.D.R. Published in: Hybrid particle-continuum methods in computational materials physics, 4-7

More information

Model Reduction for Linear Dynamical Systems

Model Reduction for Linear Dynamical Systems Summer School on Numerical Linear Algebra for Dynamical and High-Dimensional Problems Trogir, October 10 15, 2011 Model Reduction for Linear Dynamical Systems Peter Benner Max Planck Institute for Dynamics

More information

Two-Layer Network Equivalent for Electromagnetic Transients

Two-Layer Network Equivalent for Electromagnetic Transients 1328 IEEE TRANSACTIONS ON POWER DELIVERY, VOL. 18, NO. 4, OCTOBER 2003 Two-Layer Network Equivalent for Electromagnetic Transients Mohamed Abdel-Rahman, Member, IEEE, Adam Semlyen, Life Fellow, IEEE, and

More information

RANA03-02 January Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis

RANA03-02 January Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis RANA03-02 January 2003 Jacobi-Davidson methods and preconditioning with applications in pole-zero analysis by J.Rommes, H.A. van der Vorst, EJ.W. ter Maten Reports on Applied and Numerical Analysis Department

More information

A Trajectory Piecewise-Linear Approach to Model Order Reduction and Fast Simulation of Nonlinear Circuits and Micromachined Devices

A Trajectory Piecewise-Linear Approach to Model Order Reduction and Fast Simulation of Nonlinear Circuits and Micromachined Devices IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 22, NO. 2, FEBRUARY 2003 155 A Trajectory Piecewise-Linear Approach to Model Order Reduction and Fast Simulation of Nonlinear

More information

Model reduction via tangential interpolation

Model reduction via tangential interpolation Model reduction via tangential interpolation K. Gallivan, A. Vandendorpe and P. Van Dooren May 14, 2002 1 Introduction Although most of the theory presented in this paper holds for both continuous-time

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

The M/G/1 FIFO queue with several customer classes

The M/G/1 FIFO queue with several customer classes The M/G/1 FIFO queue with several customer classes Boxma, O.J.; Takine, T. Published: 01/01/2003 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue and volume

More information

University of Groningen. Bifurcations in Hamiltonian systems Lunter, Gerard Anton

University of Groningen. Bifurcations in Hamiltonian systems Lunter, Gerard Anton University of Groningen Bifurcations in Hamiltonian systems Lunter, Gerard Anton IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please

More information

The role of camp-dependent protein kinase A in bile canalicular plasma membrane biogenesis in hepatocytes Wojtal, Kacper Andrze

The role of camp-dependent protein kinase A in bile canalicular plasma membrane biogenesis in hepatocytes Wojtal, Kacper Andrze University of Groningen The role of camp-dependent protein kinase A in bile canalicular plasma membrane biogenesis in hepatocytes Wojtal, Kacper Andrze IMPORTANT NOTE: You are advised to consult the publisher's

More information

Practical Considerations For Passive Reduction of RLC Circuits

Practical Considerations For Passive Reduction of RLC Circuits Practical Considerations For Passive Reduction of RLC Circuits Altan Odabasioglu and Mustafa Celik {altan, celik}@montereydesign.com Monterey Design Systems, Inc. Sunnyvale, CA 94089 Lawrence T. Pileggi

More information

3 Gramians and Balanced Realizations

3 Gramians and Balanced Realizations 3 Gramians and Balanced Realizations In this lecture, we use an optimization approach to find suitable realizations for truncation and singular perturbation of G. It turns out that the recommended realizations

More information

DOWNLOAD OR READ : UNIFORM OUTPUT REGULATION OF NONLINEAR SYSTEMS A CONVERGENT DYNAMICS APPROACH 1ST EDITION PDF EBOOK EPUB MOBI

DOWNLOAD OR READ : UNIFORM OUTPUT REGULATION OF NONLINEAR SYSTEMS A CONVERGENT DYNAMICS APPROACH 1ST EDITION PDF EBOOK EPUB MOBI DOWNLOAD OR READ : UNIFORM OUTPUT REGULATION OF NONLINEAR SYSTEMS A CONVERGENT DYNAMICS APPROACH 1ST EDITION PDF EBOOK EPUB MOBI Page 1 Page 2 edition uniform output regulation of pdf edition 43rd IEEE

More information

CME 345: MODEL REDUCTION

CME 345: MODEL REDUCTION CME 345: MODEL REDUCTION Balanced Truncation Charbel Farhat & David Amsallem Stanford University cfarhat@stanford.edu These slides are based on the recommended textbook: A.C. Antoulas, Approximation of

More information

Citation for published version (APA): Brienza, M. (2018). The life cycle of radio galaxies as seen by LOFAR [Groningen]: Rijksuniversiteit Groningen

Citation for published version (APA): Brienza, M. (2018). The life cycle of radio galaxies as seen by LOFAR [Groningen]: Rijksuniversiteit Groningen University of Groningen The life cycle of radio galaxies as seen by LOFAR Brienza, Marisa IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it.

More information

Passivity Preserving Model Reduction for Large-Scale Systems. Peter Benner.

Passivity Preserving Model Reduction for Large-Scale Systems. Peter Benner. Passivity Preserving Model Reduction for Large-Scale Systems Peter Benner Mathematik in Industrie und Technik Fakultät für Mathematik Sonderforschungsbereich 393 S N benner@mathematik.tu-chemnitz.de SIMULATION

More information

Multi (building)physics modeling

Multi (building)physics modeling Multi (building)physics modeling van Schijndel, A.W.M. Published: 01/01/2010 Document Version Publisher s PDF, also known as Version of Record (includes final page, issue and volume numbers) Please check

More information

Empirical Gramians and Balanced Truncation for Model Reduction of Nonlinear Systems

Empirical Gramians and Balanced Truncation for Model Reduction of Nonlinear Systems Empirical Gramians and Balanced Truncation for Model Reduction of Nonlinear Systems Antoni Ras Departament de Matemàtica Aplicada 4 Universitat Politècnica de Catalunya Lecture goals To review the basic

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT -09 Computational and Sensitivity Aspects of Eigenvalue-Based Methods for the Large-Scale Trust-Region Subproblem Marielba Rojas, Bjørn H. Fotland, and Trond Steihaug

More information

Circuit Analysis for Power Engineering Handbook

Circuit Analysis for Power Engineering Handbook Circuit Analysis for Power Engineering Handbook Circuit Analysis for Power Engineering Handbook Arieh L. Shenkman SPRINGER SCIENCE+BUSINESS MEDIA, B.V A c.i.p. Catalogue record for this book is available

More information

Geometry explains the large difference in the elastic properties of fcc and hcp crystals of hard spheres Sushko, N.; van der Schoot, P.P.A.M.

Geometry explains the large difference in the elastic properties of fcc and hcp crystals of hard spheres Sushko, N.; van der Schoot, P.P.A.M. Geometry explains the large difference in the elastic properties of fcc and hcp crystals of hard spheres Sushko, N.; van der Schoot, P.P.A.M. Published in: Physical Review E DOI: 10.1103/PhysRevE.72.067104

More information

Krylov Subspace Methods for Nonlinear Model Reduction

Krylov Subspace Methods for Nonlinear Model Reduction MAX PLANCK INSTITUT Conference in honour of Nancy Nichols 70th birthday Reading, 2 3 July 2012 Krylov Subspace Methods for Nonlinear Model Reduction Peter Benner and Tobias Breiten Max Planck Institute

More information

Introduction to Model Order Reduction

Introduction to Model Order Reduction Introduction to Model Order Reduction Wil Schilders 1,2 1 NXP Semiconductors, Eindhoven, The Netherlands wil.schilders@nxp.com 2 Eindhoven University of Technology, Faculty of Mathematics and Computer

More information

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works

CS168: The Modern Algorithmic Toolbox Lecture #8: How PCA Works CS68: The Modern Algorithmic Toolbox Lecture #8: How PCA Works Tim Roughgarden & Gregory Valiant April 20, 206 Introduction Last lecture introduced the idea of principal components analysis (PCA). The

More information

Model Reduction of Inhomogeneous Initial Conditions

Model Reduction of Inhomogeneous Initial Conditions Model Reduction of Inhomogeneous Initial Conditions Caleb Magruder Abstract Our goal is to develop model reduction processes for linear dynamical systems with non-zero initial conditions. Standard model

More information

University of Groningen. Enabling Darwinian evolution in chemical replicators Mattia, Elio

University of Groningen. Enabling Darwinian evolution in chemical replicators Mattia, Elio University of Groningen Enabling Darwinian evolution in chemical replicators Mattia, Elio IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it.

More information

Balanced Truncation 1

Balanced Truncation 1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.242, Fall 2004: MODEL REDUCTION Balanced Truncation This lecture introduces balanced truncation for LTI

More information

Robust Network Codes for Unicast Connections: A Case Study
Salim Y. El Rouayheb, Alex Sprintson, and Costas Georghiades, Department of Electrical and Computer Engineering, Texas A&M University, College Station.

EEE582 Topical Outline
A.A. Rodriguez, Fall 2007, GWC 352, 965-3712. A detailed topical outline of the course, highlighting most of the key concepts to be covered. Excerpted state-space model:
ẋ_n = f_n(x_1, ..., x_n, u_1, ..., u_m)  (5)
y_1 = g_1(x_1, ..., x_n, u_1, ..., u_m)  (6)
y_p = g_p(x_1, ..., x_n, u_1, ..., u_m)  (7)

Physiological and genetic studies towards biofuel production in cyanobacteria
R.M. Schuurmans, UvA-DARE (Digital Academic Repository).

10 Transfer Matrix Models
MIT EECS 6.241 lecture notes by A. Megretski. So far, transfer matrices were introduced for finite-order state-space LTI models, in which case they serve as an important representation.

Control Systems: Laplace Domain Analysis
L. Lanari. Outline: introduce the unilateral Laplace transform, define its properties, and show its advantages in turning ODEs into algebraic equations.

Wavelet-Based Passivity Preserving Model Order Reduction for Wideband Interconnect Characterization
Mehboob Alam, Arthur Nieuwoudt, and Yehia Massoud, Rice Automated Nanoscale Design Group, Rice University.

Atom Trap Trace Analysis of Calcium Isotopes
Steven Hoekstra, University of Groningen (2005).

Event-based simulation of quantum phenomena
Shuang Zhao, University of Groningen.

Fast On-Chip Inductance Simulation Using a Precorrected-FFT Method
Haitian Hu, Member, IEEE. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 22, No. 1, January 2003, p. 49.

Coherent X-ray scattering of charge order dynamics and phase separation in titanates
B. Shi, UvA-DARE (Digital Academic Repository) (2017).

Bordered Block-Diagonal Preserved Model-Order Reduction for RLC Circuits
Xu Lei, School of Electrical Engineering, Espoo. Thesis submitted for examination for the degree of Master of Science in Technology.

Simulation of three mutually coupled oscillators
R. Mirzavand, W.H.A. Schilders, and A. Abdipour. Publisher's PDF, also known as Version of Record.

Polydiagnostic study on a surfatron plasma at atmospheric pressure
J. M. Palomares, E. I. Iordanova, A. Gamero, A. Sola, and J. J. A. M. van der Mullen (2009).

Periodic pulse solutions to slowly nonlinear reaction-diffusion
B. de Rijk, Leiden University dissertation. The handle http://hdl.handle.net/1887/45233 holds various files of this dissertation.

Introduction to Differential Equations with Dynamical Systems
Stephen L. Campbell and Richard Haberman. Princeton University Press, 2008.

Chapter 2: Direct Current Circuits
2.1 Introduction. Our lives are increasingly dependent on devices that make extensive use of electric circuits.

E2.2 Analogue Electronics
Instructor: Christos Papavassiliou (EE 915, c.papavas@imperial.ac.uk). Lectures: Monday 2pm, room 408 (weeks 2-11); Thursday 3pm, room 509 (weeks 4-11).

A Note on Powers in Finite Fields
Andreas Aabrandt and Vagn Lundsgaard Hansen. Published in: International Journal of Mathematical Education in Science and Technology. Downloaded from orbit.dtu.dk.

Stability preserving post-processing methods applied in the Loewner framework
Ion Victor Gosea and Athanasios C. Antoulas (Jacobs University Bremen and Rice University), May 11, 2016, Houston.

ECEN 420 Linear Control Systems, Lecture 6: Mathematical Representation of Physical Systems II
State-variable models for dynamic systems with inputs u_1, ..., u_r, internal variables x_1, ..., x_n, and outputs y_1, ..., y_m.