Variability Aware Statistical Timing Modelling Using SPICE Simulations


Master Thesis by Di Wang
Informatics and Mathematical Modelling, Technical University of Denmark
January 23, 2008


Contents

List of Figures
List of Tables
Acknowledgements
Abstract
Introduction
1 Background
  1.1 Sources of Variations
  1.2 Specific Variations on Timing
  1.3 Variation Probability Distribution
  1.4 Trends in Variations
  1.5 Traditional Corner-based Timing Modelling
  1.6 Statistical timing modelling
    1.6.1 Modelling in parameter-space
    1.6.2 Modelling in performance-space
      1.6.2.1 Analytical Models
      1.6.2.2 Semi-empirical Models
      1.6.2.3 Table Lookup Models
    1.6.3 CMOS 65nm technology variation modelling
  1.7 Summary
2 Static Timing Modelling of Interconnect
  2.1 Wires and Vias Parameter Extraction
  2.2 Distributed RC Delay Model
  2.3 Interconnect Timing Characterization
  2.4 Uniform Buffer Insertion
    2.4.1 Different Routing Layers
      2.4.1.1 Local Layer
      2.4.1.2 Intermediate Layers
    2.4.2 Delay Estimation Model
    2.4.3 Simulation Results and Comparisons
  2.5 Summary
3 Statistical Timing Modelling
  3.1 Statistical Standard Cell Characterization
    3.1.1 Standard Cell Timing under Global Variations
    3.1.2 Standard Cell Timing under Local Variations
    3.1.3 Standard Cell Statistical Timing Models
  3.2 Statistical Interconnect Characterization
    3.2.1 Interconnect Structure
    3.2.2 Interconnect Geometry Parameter Variation
    3.2.3 Dielectrics Variation
    3.2.4 Interconnect Electrical Parameter Variation
    3.2.5 Interconnect Statistical Timing Models
  3.3 Uniform Buffer Insertion under Parametric Variations
  3.4 Summary
4 Design Flow and Automation
  4.1 Statistical Timing Analysis Flow
  4.2 Characterization Automation
  4.3 Summary
5 Conclusion
Bibliography
A Tcl Descriptions
  A.1 Tcl for Standard Cell Characterization
  A.2 Tcl for Interconnect Characterization
  A.3 Tcl for Uniform Buffer Insertion
B Principal Component Analysis Methods
  B.1 Multiple Linear Regression Method
  B.2 Pareto Analysis

List of Figures

1 Relative importance of variations in Device, Interconnect and Environment from [9]
1.1 Technology parameter variations from [9]
2.1 Wire segment model
2.2 Interconnect static timing characterization
2.3 Circuit for HSPICE simulations
2.4 Delay per Unit vs. Wire segment length, local layer
2.5 Best wire segment length vs. Buffer drive strength, local layer
2.6 Minimum delay per unit length vs. Buffer drive strength, local layer
2.7 Wire segment with vias between metal layers
2.8 Delay per Unit vs. Wire segment length, intermediate layer
3.1 Standard cell INVX2 propagation delay distribution under global variations
3.2 Propagation delay sensitivities to different sources of variations
3.3 V_th variations of identical transistors
3.4 U0 variations of identical transistors
3.5 Standard cell INVX2 timing distribution under local variations
3.6 INVX31 timing distribution
3.7 Statistical standard cell timing characterization
3.8 Interconnect structure
3.9 Interconnect timing distribution under parametric variations
3.10 Statistical interconnect characterization
3.11 DPU + 3σ vs. Wire segment length
3.12 Standard deviation of DPU (σ) vs. Wire segment length
4.1 Statistical timing analysis flow
4.2 Timing characterization automation flow

List of Tables

1 CMOS technology roadmap from [4]
2.1 Wires parameters
2.2 Vias parameters
2.3 Delay of Π-models with different number of segments
2.4 Local layer interconnect static timing model
2.5 Optimal buffer number and wire delay comparison between simulations and calculations for the local layer with maximal C_unit
2.6 Optimal buffer number and wire delay comparison between simulations and calculations for the intermediate layer with maximal C_unit
3.1 Weight of each parameter for delay variation
3.2 Coefficients of Equation 3.1
3.3 Standard cell statistical timing data
3.4 Weight of each parameter for delay variation
3.5 Standard cell delay variation (standard deviation) caused by global and local parametric variations separately
3.6 Statistical timing model of INVX2
3.7 Statistical timing model of INVX31
3.8 Statistical timing model of NAND2X2
3.9 Statistical timing model of NOR2X6
3.10 CMOS 65nm interconnect geometry parameter variation
3.11 CMOS 65nm dielectric relative permittivity variation
3.12 Interconnect unit length resistance and unit length capacitance variations with minimal width and minimal spacing
3.13 Weight of each parameter variation for the unit length resistance variation, local layer with minimal wire width and spacing
3.14 Weight of each parameter variation for the unit length capacitance variation, local layer with minimal wire width and spacing
3.15 Weight of each parameter variation for the interconnect delay, local layer with minimal wire width and spacing
3.16 Local layer interconnect statistical timing models
3.17 Best wire segment length with and without parametric variations
4.1 Path timing data


Acknowledgements

First and foremost, I would like to thank my supervisor, Associate Professor Alberto Nannarelli, for his guidance, concern and help throughout my MSc. His advice and confidence in me have been encouraging and were very important to the completion of this thesis. I am grateful to have had the pleasure of knowing and working with him.

I also want to thank my thesis co-supervisor, Dr. Tobias Bjerregaard, CEO of Teklatech, for the idea behind this thesis project. His insight, advice and comments regarding this thesis were crucial in completing this work.

Special thanks go to Associate Professor Flemming Stassen for giving me helpful advice during my MSc study.

I owe my gratitude to my family and friends. In particular, I would like to thank my parents for their constant support and encouragement. I want to thank Wei Liu and Bing Zhang for their various discussions and help. Last but not least, I want to give special thanks to Xiaojun for her trust, patience and support throughout my MSc.


Abstract

As technology scaling enters nanometer geometries, there is a significant increase in the performance uncertainty of SoC designs due to parametric variations. Variability poses an increasing challenge for designers in timing analysis. The traditional corner-based approach is no longer effective and induces pessimism. Therefore, statistical timing analysis has become very important in overcoming the weaknesses of the traditional methods. By constructing statistical timing models for variations, chip performance and parametric yield can be predicted more accurately. In this thesis, variability-aware statistical timing models of standard cells and system-level interconnects are developed using SPICE simulations.


Introduction

As CMOS technology scales down to nanometer dimensions, chip devices and interconnects are increasingly susceptible to process variations introduced during the manufacturing process [1][2][3]. Table 1 shows that the magnitude of parameter variations cannot scale down as fast as the nominal values, so the parameter variation, as a percentage of the nominal value, becomes larger and larger [4]. Thus, without properly addressing and predicting variations in devices and interconnects, the yield of chips can drop to unacceptably low levels due to timing violations. It has been shown that process variations can cause up to 30% variation in the frequency of a chip fabricated in 180nm CMOS technology [5], and the timing spread due to process variations has been increasing by more than 25% for each new technology [6]. An approach to accurately identify, model and analyze these variations is essential for 65nm technology and below to achieve timing closure and high yields.

Table 1: CMOS technology roadmap from [4]

             Nominal values                        3σ values
Parameter    1997   1999   2002   2005   2006      1997   1999   2002   2005   2006
L_eff [nm]   250    180    130    100    70        80     60     50     40     33
T_ox [nm]    5.00   4.50   4.00   3.50   3.00      0.40   0.36   0.39   0.42   0.48
Vdd [V]      2.50   1.80   1.50   1.20   0.90      0.25   0.18   0.15   0.12   0.09
V_th [mV]    500    450    400    350    300       50     45     40     40     40
W [µm]       0.80   0.55   0.50   0.40   0.30      0.20   0.17   0.14   0.12   0.10
H [µm]       1.20   1.00   0.90   0.80   0.70      0.30   0.30   0.27   0.27   0.25
ρ [mΩ/sq]    45     50     55     60     75        10     12     15     19     25

The traditional corner-based static timing approach cannot address variations in an efficient way and tends to be very pessimistic. The corner-based analysis focuses on the extreme points of the parameters and considers all parameters to move in the same direction. It does not address the variations of parameters when they move in different directions, which can result in over-design or even chip failure. Moreover, an extremely large number of corners must be generated and analyzed in order to model parametric variations.

Recent efforts in the area of statistical static timing analysis aim to overcome this barrier in the traditional static timing methodology. They focus on propagating gate delay variability along paths and solving for timing margins probabilistically. Although these approaches aim to precisely predict chip performance and parametric yield as probability distributions, their accuracy strongly depends on the modelling of the gate and interconnect timing variability on which they are built [7].

This work falls into two major parts: static timing modelling of chip interconnects and statistical timing modelling of both standard cells and chip interconnects. In conventional IC design, standard cells are well characterized, but the interconnect characteristics are difficult to obtain until the detailed routing is finished. In the first part of this work, we develop an early system interconnect characterization method which is based on [8] and derive static timing models for different interconnect layers. Moreover, we consider the uniform buffer insertion method for delay minimization and build a delay estimation model from simulations.

Our main objective is to construct statistical timing models for both standard cells and chip interconnects to address their timing variations, given their relative importance in chip timing performance as shown in Figure 1. In the first portion of the statistical timing modelling part, we identify the process parameters that most affect standard cell timing variation among the different variation sources. Based on these parametric variations, we build statistical timing models for standard cells (NAND2, NOR2, INV) under global and local variations. In the next portion, we extend our static interconnect characterization method to take interconnect (wire) parametric variations into consideration and statistically characterize system interconnects with different dimensions. Furthermore, the uniform buffer insertion method under standard cell and interconnect variations, for both delay minimization and yield optimization, is compared to the uniform buffer insertion method without considering process variations.

In order to apply these statistical characterization results, we develop a design flow for statistical timing analysis. We apply our standard cell and interconnect statistical timing models to a simple circuit and compare the results to Monte Carlo simulation results.

This work is carried out by running HSPICE simulations of standard cells and interconnects in CMOS 65nm technology. The simulation automation is implemented by Tcl script files which generate the HSPICE decks and extract useful results automatically.
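The pessimism of the corner-based approach discussed in this introduction can be illustrated with a small numerical sketch. It is not taken from the thesis: it assumes a hypothetical linear delay model with independent Gaussian parameters, and the sensitivities and sigmas below are made-up illustration values. Under those assumptions, a corner that pushes every parameter to its 3σ extreme in the same direction gives a larger delay bound than the statistical 3σ point of the delay distribution.

```python
import math

# Hypothetical linear delay model (an assumption for illustration only):
#   delay = nominal + sum(sensitivity_i * deviation_i)
# with independent, zero-mean Gaussian deviations of standard deviation sigma_i.
NOMINAL_PS = 100.0
SENSITIVITIES_PS = [2.0, 1.5, 1.0]   # made-up delay change per unit of each parameter
SIGMAS = [1.0, 1.0, 1.0]             # made-up standard deviation of each parameter


def corner_delay(nominal, sensitivities, sigmas):
    """Worst-case corner: every parameter sits at its 3-sigma extreme at once."""
    return nominal + 3.0 * sum(abs(s) * sg for s, sg in zip(sensitivities, sigmas))


def statistical_delay(nominal, sensitivities, sigmas):
    """3-sigma point of the delay distribution for independent parameters."""
    delay_sigma = math.sqrt(sum((s * sg) ** 2 for s, sg in zip(sensitivities, sigmas)))
    return nominal + 3.0 * delay_sigma


corner = corner_delay(NOMINAL_PS, SENSITIVITIES_PS, SIGMAS)
statistical = statistical_delay(NOMINAL_PS, SENSITIVITIES_PS, SIGMAS)
print(f"corner bound:      {corner:.2f} ps")
print(f"statistical bound: {statistical:.2f} ps")
print(f"extra margin taken by the corner: {corner - statistical:.2f} ps")
```

The gap between the two bounds grows with the number of independent parameters, which is one way to read the claim that corner-based analysis tends to be pessimistic.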

[Figure 1: Relative importance of variations in Device, Interconnect and Environment from [9]. Vertical axis: Percent (0 to 70); horizontal axis: Year (1997 to 2007); curves: Wire, Device, Environment.]

This work is organized as follows. Chapter 1 introduces background concepts related to statistical timing modelling and previous work in this area. Chapter 2 describes static timing modelling for chip interconnect. Chapter 3 presents statistical timing modelling for standard cells and chip interconnects. Chapter 4 summarizes the statistical timing analysis flow and characterization automation. Chapter 5 draws the conclusions. There are two appendices: Appendix A includes the Tcl descriptions for the characterization automation and Appendix B gives the principal component analysis methods we use in this work.


Chapter 1

Background

Introduction

The main purpose of this chapter is to provide the necessary background for the concepts and methods presented in this work. First, we classify the sources of variations and illustrate some important parametric variations which affect the timing performance. Then, we investigate the distributions of different parameter variations and observe the trends of variations across different technologies. In the last part of the chapter, we briefly describe the traditional corner-based timing modelling and the existing work in statistical timing modelling.

1.1 Sources of Variations

Variation is the deviation from intended or designed values for a structure or circuit parameter of concern [2]. There are various ways of categorizing variability. The sources of variations which affect IC performance fall into two categories [9][10]:

- Environmental variations, which are due to unpredictable operating conditions such as the power supply voltage, the chip's temperature, coupling noise and switching activity.
- Fabrication/manufacturing variations, which are subject to fabrication processing and masking imperfections and various wearout mechanisms, and result in Front End of Line (FEOL) and Back End of Line (BEOL) variations that are essentially permanent. According to [14], FEOL variations (active transistors and gates) have transistor gate length and gate width, gate oxide thickness, and doping-related variations as the primary components. BEOL variations (interconnecting metal and dielectric layers) entail line width and spacing, metal/dielectric thickness, metal resistivity and via resistance as the major components.

Variability can also be characterized as die-to-die variations and within-die variations based on its inherent length scale [15].

- Die-to-die or inter-die variation: Inter-die variation considers the difference of a parameter across nominally identical dies (they can be on the same wafer, on different wafers, or in different lots) [2]; no parameter variation within the same die is taken into account. Inter-die variations are caused by the fabrication process and they are mainly independent of the design implementation.
- Within-die or intra-die variation: Intra-die variation is the difference of a parameter spatially within the same die. Intra-die variations are both fabrication and design implementation dependent.

Intra-die variations can be further divided into systematic and random variations.

- Systematic variation: Systematic variations, such as those caused by lithography and chemical mechanical polishing (CMP), are deterministic and can be modeled. These variations depend on neighboring layout feature dimensions or local layout density, and their effects on circuit performance tend to cancel each other and leave spatial correlation of process parameters as the major contribution to the circuit performance variation [26].
- Random variation: Random variations include the phenomena that are generally not predictable, such as dopant fluctuations in a channel.

1.2 Specific Variations on Timing

Given the different categories of variations, some variations are more important than others because of their impact on the timing performance. In this section, some specific variations which have a high impact on the timing performance are listed according to the previous work [2][15][29][30].

- Device geometry variations: gate length, width and oxide thickness have an impact on the gate timing, and gate length in particular has the highest impact.

- Device electrical parameter variations: threshold voltage ($V_{th}$) has a strong effect on the gate timing.
- Interconnect geometry variations: wire width and spacing between wires, and metal and dielectric thickness, all impact the interconnect timing.
- Interconnect material parameter variations: metal resistivity and contact and via resistances are factors affecting the interconnect timing.
- Environmental variation: supply voltage ($V_{DD}$) also has an impact on timing.

The high impact of the above variations on timing is mostly established analytically, based on existing models that describe the relationship between the performance and the corresponding parameter. For parameters which do not have a direct model, the sensitivities of the timing performance to the parametric variations can be obtained by simulation. However, most of the previous work is based on assumptions or experiments within a small set of parameters, and due to the correlations between these variations, there might be other sources of variation which also have an impact on timing.

1.3 Variation Probability Distribution

Most researchers assume process variations to follow Gaussian (normal) distributions. One major reason is that a large body of mathematical framework exists to support this assumption. Experiments in [27] also show that non-Gaussian distributions of parameter variations can be approximated by Gaussian ones with reasonable accuracy in many cases.

However, variability in parameters may not always be accurately modeled as a Gaussian distribution. Some process parameters have significantly non-Gaussian probability distributions. For example, via resistance is known to have an asymmetric probability distribution [10][27]. Forcing such a significantly non-Gaussian distribution to be Gaussian may introduce large errors. The authors of [28] model a non-Gaussian parameter variation as a quadratic function of some other Gaussian random variable $X$,

$$ Y = aX^2 + bX + c \tag{1.1} $$

where $Y$ is the non-Gaussian parameter variation. They show a reasonable fitting accuracy with different non-Gaussian distributions. The authors of [27] extend the first-order canonical form (1.8) to handle arbitrary non-Gaussian and nonlinear parameter variations,

$$ A = a_0 + \sum_{i=1}^{n} a_i \Delta X_i + f_A(\Delta X_N) + a_{n+1} \Delta R_a \tag{1.2} $$

where $\Delta X_N = (\Delta X_{N,1}, \Delta X_{N,2}, \ldots)$ is a vector of nonlinear and/or non-Gaussian parameters and $f_A$ is a function describing the dependence on nonlinear and non-Gaussian parameters. Note that the only difference between (1.8) and (1.2) is the term $f_A(\Delta X_N)$, which can be specified by a table for numerical computations. The experimental results show that their methods come closer to Monte Carlo simulation results than the first-order canonical timing model.

1.4 Trends in Variations

Three trends in variations have been observed:

- Most of the technology parameter variations, as percentages of their scaled values, are increasing in each technology, and the effective channel length ($L_{eff}$) variation percentage is increasing very fast. Figure 1.1 shows the trends of various parameters.
- Interconnect variations are becoming more important because of their increasing influence in advanced technologies [9][1]. Figure 1 shows the relative importance of variations for timing performance in device, interconnect and environment, based on the analysis in [9].
- In the past, die-to-die variability, which is managed by worst-case design methods, dominated over within-die variability. At present and in the future, within-die variability becomes increasingly significant [18].

Figure 1.1: Technology parameter variations from [9]
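The first trend above can be checked directly against the values in Table 1 from the Introduction. The sketch below is only a convenience calculation: the numbers are transcribed from Table 1, while the dictionary layout and the function name `relative_variation` are illustrative choices, not part of the thesis.

```python
# Nominal and 3-sigma values transcribed from Table 1 (CMOS technology roadmap from [4]).
YEARS = [1997, 1999, 2002, 2005, 2006]
TABLE_1 = {
    # parameter: (nominal values, 3-sigma values), one entry per year in YEARS
    "L_eff [nm]": ([250, 180, 130, 100, 70], [80, 60, 50, 40, 33]),
    "T_ox [nm]": ([5.00, 4.50, 4.00, 3.50, 3.00], [0.40, 0.36, 0.39, 0.42, 0.48]),
    "Vdd [V]": ([2.50, 1.80, 1.50, 1.20, 0.90], [0.25, 0.18, 0.15, 0.12, 0.09]),
    "V_th [mV]": ([500, 450, 400, 350, 300], [50, 45, 40, 40, 40]),
    "W [um]": ([0.80, 0.55, 0.50, 0.40, 0.30], [0.20, 0.17, 0.14, 0.12, 0.10]),
    "H [um]": ([1.20, 1.00, 0.90, 0.80, 0.70], [0.30, 0.30, 0.27, 0.27, 0.25]),
    "rho [mOhm/sq]": ([45, 50, 55, 60, 75], [10, 12, 15, 19, 25]),
}


def relative_variation(nominal, three_sigma):
    """Return the 3-sigma variation as a percentage of the nominal value, per year."""
    return [100.0 * s / n for n, s in zip(nominal, three_sigma)]


print("parameter".ljust(16) + "".join(str(y).rjust(8) for y in YEARS))
for name, (nominal, three_sigma) in TABLE_1.items():
    percentages = relative_variation(nominal, three_sigma)
    print(name.ljust(16) + "".join(f"{p:8.1f}" for p in percentages))
```

Reading the printed rows left to right shows how each parameter's 3σ spread, relative to its nominal value, evolves from 1997 to 2006.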

Traditional Corner-based Timing Modelling 13 1.5 Traditional Corner-based Timing Modelling The deterministic, static timing analysis deals with variations in a corner-based manner, or case-based manner, e.g., best case, worst case, and normal timing. It tries to identify the process corners to ensure that the performance of the design is acceptable within the extreme boundaries. It has been proved that the inter-die variation is well considered by worst-case corners [9][18]. Foundries usually provide at least four corners (nmos:fast, pmos:fast), (nmos:slow, pmos:slow), (nmos:fast, pmos:slow), (nmos:slow, pmos:fast) and together with the typical point (nmos:typical, pmos: typical) to characterize transistors. However, the corner-based modelling has several serious limitations. First, it requires an exponential number of corners as the number of parameter variations increases. Second, it may over-estimate or underestimate variations. It is very pessimistic to assume that the worst-case occurs when every parameter is at its worst-case value, which is also unrealistic and increases the design effort; whereas, since it is impossible to analyze all possible corners, the missing corners may cause parametric yield loss. Third, the corner-based modelling can only provide designers with quantitative information about the robustness and sensitivities of their designs [16]. 1.6 Statistical timing modelling The limitations of corner-based modelling drive us to look at the timing in a fundamentally new way. That s why the statistical timing comes into consideration. Statistical timing analysis models gate and wire delays as probability distributions with complex correlations and predicts chip performances and parametric yields as probability distributions. The advantages of the new paradigm include fast turnaround, incremental operation for optimization, pessimism reduction, sensitivity prediction,

14 Background and the enabling of performance-versus-yield trade-offs [19]. Statistical timing modelling can be broadly characterized into parameter-space (or input space) and performance-space (or output space) [16][20]. 1.6.1 Modelling in parameter-space Modelling in parameter-space is actually a variation extraction process. Process variations are usually represented by continuous probability distributions and most researchers assume them to be normal or Gaussian distributions. The modelling involves decomposition of the variations into different components [14][16][17]. The process variations in parameter P can be modeled as, P var = P interdie (global) + P intradie (x, y) + P intradie (random) (1.3) where P var is the total variation, P interdie (global) represents die-to-die variations, P intradie (x, y) represents within-die spatially correlated variations, (x,y) is the coordinate system on the die, and P intradie (random) represents within-die random variations. Then the parameter P can be expressed as, P = P norm + P var (1.4), where P norm is the normal value of parameter P. As an example, Pelgrom [23] proposed a mismatch model for a transistor pair with the same size. The variations are modeled by normal distributions with zero mean and variance defined as, σ 2 ( P ) = s 2 pd 2 12 + a p W L (1.5), where W and L are the gate width and gate length of the transistors; D 12 is the distance between the pair of the transistors; s p and a p are process dependent

constants. Compared to Equation (1.3), there is no inter-die component of variation; the first term on the right-hand side of Equation (1.5) models the intra-die spatially correlated variation and the second term represents the intra-die random variation.

1.6.2 Modelling in performance-space

Modelling in performance-space predicts the impact of parameter variations on performance, which involves modelling performance variables such as delays, arrival times, required arrival times and slacks [20]. In general, these models can be expressed as

$$r = F(d, p) \tag{1.6}$$

where $r$ represents the performance variable, such as propagation delay; $d$ is a vector of designable parameters (device sizes); and $p$ is a vector of process (e.g. $T_{ox}$), environmental (e.g. temperature $T$) and empirical (e.g. mobility reduction) parameters [17]. These models can be categorized into three classes [21] according to the way the function $F$ is evaluated.

1.6.2.1 Analytical Models

These models for variability are usually based on a nominal model which describes the typical relationship between the underlying physics and the corresponding performance. Variations are then characterized, and once the characterization results are available, two methods can be employed: either running simulations (e.g. Monte Carlo algorithm) which are used to study the resulting variation in performance and

find the fitting model parameter; or theoretically calculating the effect of variability on performance. As an example, the authors of [15][24][7] built their variability-aware timing models on the following equation,

$$T_d = \frac{C_L V_{DD}}{I_{on}} \tag{1.7}$$

where $T_d$ is the gate delay; $C_L$ is the load capacitance; $V_{DD}$ is the supply voltage; and $I_{on}$ is the saturation current. Based on this nominal model, [15] obtained the delay variation model by simulations, while [24][7] derived their models theoretically.

1.6.2.2 Semi-empirical Models

These models capture variability through various approximation methods and are based upon the most important sources of variations. They are usually easier to derive but less accurate than analytical models. As an example, Visweswariah [25] proposed a canonical first-order delay model for all timing quantities. The model allows for both global correlations and independent randomness. All gate and wire delays, arrival times, slacks, slews and so forth are expressed as

$$a_0 + \sum_{i=1}^{n} a_i \Delta X_i + a_{n+1} \Delta R_a \tag{1.8}$$

where $a_0$ is the nominal value; $\Delta X_i$, $i = 1, 2, \ldots, n$, represent the variations of the $n$ global sources of variation $X_i$; $a_i$, $i = 1, 2, \ldots, n$, give the sensitivities to each of the global sources of variation; $\Delta R_a$ is the variation of an independent random variable $R_a$; and $a_{n+1}$ is the sensitivity of the timing quantity to $R_a$.
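To make the canonical form of Equation (1.8) more concrete, the following minimal sketch stores a timing quantity as a nominal value plus sensitivities and derives its standard deviation, the sum of two quantities and their covariance. It assumes that all variations (the global ones and the independent random ones) are uncorrelated, zero-mean and normalized to unit variance; the class name, the helper functions and the numeric sensitivities are hypothetical and are not taken from [25].

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Canonical:
    """First-order canonical form a0 + sum(a_i * dX_i) + a_rand * dR."""
    a0: float        # nominal value
    a: List[float]   # sensitivities to the n global sources
    a_rand: float    # sensitivity to the independent random term

    def std(self) -> float:
        # All variations are assumed uncorrelated with unit variance.
        return math.sqrt(sum(ai * ai for ai in self.a) + self.a_rand ** 2)


def add(x: Canonical, y: Canonical) -> Canonical:
    """Sum of two quantities: global sensitivities add term by term,
    the two independent random parts combine as a root sum of squares."""
    return Canonical(
        a0=x.a0 + y.a0,
        a=[p + q for p, q in zip(x.a, y.a)],
        a_rand=math.hypot(x.a_rand, y.a_rand),
    )


def covariance(x: Canonical, y: Canonical) -> float:
    """Only the shared global sources contribute to the covariance."""
    return sum(p * q for p, q in zip(x.a, y.a))


# Hypothetical example values (arbitrary time units).
d1 = Canonical(a0=10.0, a=[0.3, 0.4], a_rand=0.0)
d2 = Canonical(a0=20.0, a=[0.3, -0.4], a_rand=1.2)
path = add(d1, d2)

if __name__ == "__main__":
    print(path.a0, path.std(), covariance(d1, d2))
```

The design choice worth noting is that the independent term is kept separate from the global sensitivities: it never contributes to the covariance between two quantities, which is what lets the form represent both global correlation and independent randomness.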

1.6.2.3 Table Lookup Models

These models are simulation based, and variations can easily be modeled in the lookup tables. This type of model is technology independent and is easier to derive than the above two types. However, it gives no physical insight into device behavior and it is time consuming to build. As an example, Bao [26] proposed a parameterized current source gate model for process variations by constructing two lookup tables, $I_o(V_i, V_o, \varepsilon)$ and $C_g(V_i, V_o, \varepsilon)$, where $I_o$ is the gate output current; $C_g$ is the intrinsic gate capacitance; $\varepsilon$ represents process variations; and $V_i$ and $V_o$ are a pair of gate input and output voltages. He also constructed a gate performance model based on this current source gate model. The gate performance model is also a lookup table, establishing functional relationships between the gate performance, the process variations and the input signal waveform variation,

$$D_g(g_i) = f(D_g(g_j), \varepsilon) \tag{1.9}$$

where gate $g_j$ precedes gate $g_i$ in a signal propagation path, and $D_g(g_i)$ is a vector of signal arrival times.

1.6.3 CMOS 65nm technology variation modelling

Many good points of CMOS 65nm technology variation modelling have been presented in [31]. Two types of variations are defined, global and local variations, which correspond to the inter-die and intra-die variations described in Section 1.1, respectively. Variation distributions are specified in three different ways: Gaussian distribution, Lognormal distribution (the logarithmic value of the variation is defined by a Gaussian distribution), and Uniform distribution. Three kinds of correlations

among global variations are assumed: Intrinsic, Intra-Family, and Extra-Family. Three modelling methodologies are available for the technology variations: statistical, user-defined corner, and pre-defined corner models. The library provides the user with parameters to enable and define these models. All these modelling methods are based on typical models, and each variation is linked to at least one model parameter. For statistical models, the Monte Carlo simulation method is employed. For user-defined corner models, the combination of the technology variations can be fully defined by the user. The pre-defined corner modelling is the traditional best-case, worst-case method (e.g. SS, FF, SF, FS corners).

1.7 Summary

Categorizing variations helps in modelling them, and two ways of classifying variations have been summarized: environmental and manufacturing variations; inter-die (global) and intra-die (local) variations. Intra-die variations can be further divided into systematic and random categories. Three trends in variations have also been observed. In practice, most of the variation distributions can be approximated by Gaussian distributions, and only some significantly non-Gaussian distributions need special consideration. Some variations ($L_{eff}$, $V_{th}$, etc.) have a high impact on timing performance.

In order to characterize these variations and their impact on timing performance, an accurate modelling method is needed. Traditional corner-based methods are too pessimistic and cost considerable design effort. Statistical timing modelling comes as an alternative, which deals with variations in a statistical way. The statistical timing models can be broadly classified into parameter space and performance space. In parameter space, variations are extracted and characterized by inter-die, intra-die

systematic and intra-die random variations. In performance space, models can be further categorized into three classes according to the way of deriving these models: analytical, approximation based semi-empirical and simulation based lookup table models.


Chapter 2 Static Timing Modelling of Interconnect

Introduction

In this chapter, we describe our work on static timing modelling of chip interconnect (wire). In the first part of the chapter, we summarize the parameter extraction for wires and vias of the different layers. Then the distributed RC delay model used in this work is described, followed by the interconnect timing characterization. Finally, the uniform buffer insertion method for interconnects in different routing layers, the delay estimation model and the simulation results are described.

2.1 Wires and Vias Parameter Extraction

The wire resistance is calculated as

$$R = R_{sheet} \, (L / W) \tag{2.1}$$

where $R_{sheet}$ is the sheet resistance of the metal layer, $L$ is the wire length and $W$ is the wire width.

The wire capacitance is much more complicated to extract. It consists of the following types of capacitance: the parallel plate capacitance from the bottom and top of the wire to the metal layers below and above, respectively; the fringing capacitance arising from fringing fields along the edges of the wire; and the coupling capacitance to neighboring wires in the same layer. The wire capacitance is calculated as

$$C = C_{unit} \, L \tag{2.2}$$

where $C_{unit}$ is the capacitance per unit length of the wire. There are many ways to estimate $C_{unit}$; in this section, we use the HSPICE built-in two-dimensional field solver, which can compute $C_{unit}$ from a cross-sectional description of the wire. Since $C_{unit}$ depends heavily on the actual layout, we give a maximal and a minimal estimate for the minimal-width wire. The maximal case has metal planes above and below the wire and neighboring wires at minimal spacing in the same layer. The minimal case has the substrate below, nothing above, and no neighboring wires in the same layer.

The parameters of the different metal layers have been extracted for CMOS 65nm technology as listed in Table 2.1. The parameters of the different vias between metal layers have also been extracted for CMOS 65nm technology as listed in Table 2.2.

| Metal layer | Width [µm] | Space [µm] | Thickness [µm] | Sheet resistance [Ω/sq] | Maximal unit capacitance [aF/µm] | Minimal unit capacitance [aF/µm] |
|---|---|---|---|---|---|---|
| 1 | 0.09 | 0.09 | 0.18 | 0.175 | 215 | 42 |
| 2 | 0.10 | 0.10 | 0.22 | 0.135 | 201 | 40 |
| 3 | 0.10 | 0.10 | 0.22 | 0.135 | 201 | 36 |
| 4 | 0.10 | 0.10 | 0.22 | 0.135 | 201 | 33 |
| 5 | 0.10 | 0.10 | 0.22 | 0.135 | 210 | 31 |
| 6 | 0.40 | 0.40 | 0.90 | 0.024 | 287 | 59 |

Table 2.1: Wires parameters

| Via layer | W (µm) | L (µm) | T (µm) | S (µm) | R (Ω) | C (aF) |
|---|---|---|---|---|---|---|
| Via1 | 0.10 | 0.10 | 0.16 | 0.10 | 1.0 | 3-15 |
| Via2 | 0.10 | 0.10 | 0.16 | 0.10 | 1.0 | 3-15 |
| Via3 | 0.10 | 0.10 | 0.16 | 0.10 | 1.0 | 3-15 |
| Via4 | 0.10 | 0.10 | 0.16 | 0.10 | 1.0 | 3-15 |
| Via5 | 0.36 | 0.36 | 0.60 | 0.34 | 0.5 | 4-16 |

Table 2.2: Vias parameters
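As a worked illustration of Equations (2.1) and (2.2), the sketch below evaluates the resistance and the capacitance bounds of a minimal-width metal layer 1 wire using the values of Table 2.1. The 1 mm wire length and the function names are chosen here only for illustration.

```python
# Metal layer 1 values copied from Table 2.1.
R_SHEET_OHM_PER_SQ = 0.175
WIDTH_UM = 0.09
C_UNIT_MAX_AF_PER_UM = 215.0
C_UNIT_MIN_AF_PER_UM = 42.0


def wire_resistance_ohm(r_sheet: float, length_um: float, width_um: float) -> float:
    """Equation (2.1): R = R_sheet * (L / W)."""
    return r_sheet * length_um / width_um


def wire_capacitance_ff(c_unit_af_per_um: float, length_um: float) -> float:
    """Equation (2.2): C = C_unit * L, converted from aF to fF."""
    return c_unit_af_per_um * length_um / 1000.0


LENGTH_UM = 1000.0  # illustrative 1 mm wire

r_ohm = wire_resistance_ohm(R_SHEET_OHM_PER_SQ, LENGTH_UM, WIDTH_UM)
c_max_ff = wire_capacitance_ff(C_UNIT_MAX_AF_PER_UM, LENGTH_UM)
c_min_ff = wire_capacitance_ff(C_UNIT_MIN_AF_PER_UM, LENGTH_UM)

if __name__ == "__main__":
    print(f"R = {r_ohm:.1f} ohm, C between {c_min_ff:.0f} fF and {c_max_ff:.0f} fF")
```

For this wire the resistance is about 1.94 kΩ and the capacitance lies between 42 fF and 215 fF, which shows how strongly the layout-dependent $C_{unit}$ affects the RC product.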

2.2 Distributed RC Delay Model

In this work, the wire delay is modeled as a distributed RC delay. A four-segment Π-model is used for HSPICE simulation, as shown in Figure 2.1.

Figure 2.1: Wire segment model

In order to assess the accuracy of the four-segment model compared with the true distributed RC delay, a series of experiments has been carried out for different numbers of segments and different wire segment lengths; the results are given in Table 2.3. The wire segment length varies from 0.1mm to 3mm, and the inaccuracy of the four-segment model compared with the true distributed RC delay is within 2%. The best wire segment length is less than 1mm for 65nm technology, so the four-segment Π-model is accurate enough to represent the segmented wire delay.
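The accuracy claim can be checked directly against the delays reported in Table 2.3. The sketch below does so under one assumption that is not made in the text: the 12-segment result is used as a stand-in for the true distributed RC delay. The function names are illustrative.

```python
# Delays in ps copied from Table 2.3, keyed by wire length (mm) and
# number of Pi segments. Only the columns used below are listed.
DELAYS_PS = {
    0.1: {1: 3.83, 4: 3.86, 12: 3.86},
    0.5: {1: 62.50, 4: 69.10, 12: 69.30},
    1.0: {1: 212.00, 4: 240.00, 12: 242.00},
    2.0: {1: 755.00, 4: 849.00, 12: 857.00},
    3.0: {1: 1629.00, 4: 1818.00, 12: 1833.00},
}


def relative_error_percent(length_mm: float, segments: int,
                           reference_segments: int = 12) -> float:
    """Error of an n-segment model relative to the reference model.
    Assumption: the 12-segment delay approximates the distributed delay."""
    reference = DELAYS_PS[length_mm][reference_segments]
    return abs(DELAYS_PS[length_mm][segments] - reference) / reference * 100.0


def worst_case_error_percent(segments: int) -> float:
    """Largest relative error over all tabulated wire lengths."""
    return max(relative_error_percent(length, segments) for length in DELAYS_PS)


if __name__ == "__main__":
    for n in (1, 4):
        print(f"{n}-segment worst-case error: {worst_case_error_percent(n):.2f}%")
```

Under this assumption the four-segment model stays below 1% error for all tabulated lengths, while a single lumped segment is off by more than 10% for wires of 1mm and longer.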

| L (mm) | 1-seg (ps) | 2-seg (ps) | 4-seg (ps) | 6-seg (ps) | 8-seg (ps) | 10-seg (ps) | 12-seg (ps) |
|---|---|---|---|---|---|---|---|
| 0.1 | 3.83 | 3.86 | 3.86 | 3.86 | 3.86 | 3.86 | 3.86 |
| 0.5 | 62.50 | 67.80 | 69.10 | 69.20 | 69.30 | 69.30 | 69.30 |
| 1.0 | 212.00 | 233.00 | 240.00 | 241.00 | 242.00 | 242.00 | 242.00 |
| 2.0 | 755.00 | 829.00 | 849.00 | 854.00 | 856.00 | 857.00 | 857.00 |
| 3.0 | 1629.00 | 1781.00 | 1818.00 | 1827.00 | 1831.00 | 1832.00 | 1833.00 |

Table 2.3: Delay of Π-models with different number of segments

2.3 Interconnect Timing Characterization

The goal of the interconnect timing characterization is similar to that of standard cell characterization: to provide timing tables representing the timing behavior (propagation delay and output transition time) of wires with respect to their driving gates and loading gates. The characterization process is illustrated in Figure 2.2 and is based on [8]. A number of HSPICE transient analysis simulations are run for all relevant combinations of the driving gates, the system wires with different dimensions, and the loading gates. The driving gates and loading gates are taken from the standard cell library; they are all inverters in our work. The timing behaviors (propagation delay and output transition time) of interconnects are collected in tabular form, and the static timing model of the local layer interconnect (first metal layer) is shown in Table 2.4. The propagation delay does not include the delay of the driving gate and loading gate.

Figure 2.2: Interconnect static timing characterization

| Width (µm) | Spacing (µm) | Length (mm) | Driver | Load | Propagation delay (ps) | Output transition time (ns) |
|---|---|---|---|---|---|---|
| 0.09 | 0.09 | 0.1 | INVX2 | INVX2 | 2.3 | 0.290 |
| 0.09 | 0.09 | 0.1 | INVX31 | INVX31 | 3.4 | 0.036 |
| 0.09 | 0.09 | 0.4 | INVX2 | INVX2 | 34.5 | 1.100 |
| 0.09 | 0.09 | 0.4 | INVX31 | INVX31 | 41.6 | 0.096 |
| 0.09 | 0.18 | 0.1 | INVX2 | INVX2 | 1.7 | 0.220 |
| 0.09 | 0.18 | 0.1 | INVX31 | INVX31 | 2.9 | 0.031 |
| 0.09 | 0.18 | 0.4 | INVX2 | INVX2 | 25.6 | 0.800 |
| 0.09 | 0.18 | 0.4 | INVX31 | INVX31 | 31.2 | 0.073 |
| 0.18 | 0.09 | 0.1 | INVX2 | INVX2 | 1.3 | 0.340 |
| 0.18 | 0.09 | 0.1 | INVX31 | INVX31 | 2.0 | 0.039 |
| 0.18 | 0.09 | 0.4 | INVX2 | INVX2 | 20.3 | 1.300 |
| 0.18 | 0.09 | 0.4 | INVX31 | INVX31 | 23.3 | 0.110 |
| 0.18 | 0.18 | 0.1 | INVX2 | INVX2 | 1.1 | 0.270 |
| 0.18 | 0.18 | 0.1 | INVX31 | INVX31 | 1.6 | 0.034 |
| 0.18 | 0.18 | 0.4 | INVX2 | INVX2 | 15.7 | 0.990 |
| 0.18 | 0.18 | 0.4 | INVX31 | INVX31 | 18.7 | 0.088 |

Table 2.4: Local layer interconnect static timing model

2.4 Uniform Buffer Insertion

We use the uniform buffer insertion method for system wires to minimize delay. Buffers with the same drive strength are inserted evenly along the wire, so that each buffer drives a wire segment of the same length. The circuit for HSPICE simulations is shown in Figure 2.3, and each wire segment is modeled as in Figure 2.1.

Figure 2.3: Circuit for HSPICE simulations

The uniform buffer insertion problem can be expressed as finding, for a given wire length, the optimal wire segment length that achieves the minimal end-to-end signal propagation delay. The metric delay per unit length (DPU) [11] is used; it is calculated by dividing the segment delay (the delay of one buffer and the wire segment it drives) by the distance traveled (the wire segment length). A similar mathematical derivation can be found in [12][13]. We use the circuit in Figure 2.3 to find out how the DPU varies by increasing the wire segment length from 0.1µm to 1mm and recording the corresponding segment delay for different routing metal layers.

2.4.1 Different Routing Layers

We divide the routing layers (6 metal layers) into three classes: the local layer (metal layer 1), the intermediate layers (metal layers 2, 3, 4 and 5) and the top layer (metal layer 6). Since the top layer is often used for power grids, we consider only the local and intermediate layers here.

2.4.1.1 Local Layer

The DPU distribution of the local layer is shown in Figure 2.4. The DPU first decreases as the wire segment length increases, reaches its smallest value at the optimal length, and then increases as the wire segment length continues to increase. The best wire segment length is therefore the length at which the DPU reaches its smallest value. HSPICE has built-in optimization capabilities which can be used in this case to find the smallest delay per unit length of the wire by tweaking the wire segment length. We use Tcl scripts to do the same thing; they can be found in Appendix A.3. Once the minimal delay per unit length is found, the best wire segment length is also known. For any given wire length in the specified metal layer and any buffer drive strength, the number of buffers can then be estimated by dividing the given wire length by the best wire segment length.

Figures 2.4(a) and 2.4(b) show the DPU distributions simulated with the maximal $C_{unit}$ and the minimal $C_{unit}$ from Table 2.1, respectively; the drive strength of the buffer is X31. The best wire segment length is 0.25mm and the delay per unit length is 282ps/mm for maximal $C_{unit}$, while the best wire segment length is 0.57mm and the delay per unit length is 90ps/mm for minimal $C_{unit}$.
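The search for the best wire segment length can be sketched without a circuit simulator. The example below is written in Python rather than the Tcl of Appendix A.3, and it replaces the HSPICE segment delay with a toy quadratic model whose three coefficients are purely hypothetical; only the procedure (sweep the segment length, compute the DPU, take the minimum, divide the wire length by the best length) follows the text.

```python
import math

# Hypothetical coefficients of a toy segment-delay model
#   t_seg(l) = T0 + K1 * l + K2 * l**2
# standing in for the HSPICE-simulated delay of one buffer plus its wire.
T0_PS = 20.0            # length-independent part (buffer), hypothetical
K1_PS_PER_MM = 100.0    # linear part, hypothetical
K2_PS_PER_MM2 = 320.0   # quadratic (distributed RC) part, hypothetical


def segment_delay_ps(length_mm: float) -> float:
    return T0_PS + K1_PS_PER_MM * length_mm + K2_PS_PER_MM2 * length_mm ** 2


def dpu_ps_per_mm(length_mm: float) -> float:
    """Delay per unit length: segment delay divided by segment length."""
    return segment_delay_ps(length_mm) / length_mm


def best_segment_length_mm(lengths_mm):
    """Length in the sweep with the smallest delay per unit length."""
    return min(lengths_mm, key=dpu_ps_per_mm)


def estimated_buffer_number(wire_length_mm: float, best_length_mm: float) -> float:
    """Given wire length divided by the best wire segment length."""
    return wire_length_mm / best_length_mm


SWEEP_MM = [i / 100.0 for i in range(1, 101)]  # 0.01 mm to 1.00 mm
best_mm = best_segment_length_mm(SWEEP_MM)
# For this toy model the minimum is also known in closed form.
analytic_best_mm = math.sqrt(T0_PS / K2_PS_PER_MM2)

if __name__ == "__main__":
    print(best_mm, dpu_ps_per_mm(best_mm), estimated_buffer_number(10.0, best_mm))
```

The toy model reproduces the qualitative shape described above: the constant term dominates for short segments (DPU falls with length) and the quadratic term dominates for long ones (DPU rises again).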

Figure 2.4: Delay per Unit vs. Wire segment length, local layer. (a) Maximal $C_{unit}$ (b) Minimal $C_{unit}$

There are 20 different drive strengths for inverters, ranging from X2 to X284, in the CMOS 65nm technology library. Figures 2.5 and 2.6 show how the best wire segment length and the delay per unit length change with the buffer drive strength for the local layer. From these figures, X106 with maximal $C_{unit}$ achieves the minimal delay per unit length of 190.7ps/mm with a best wire segment length of 0.25mm, while X62 with minimal $C_{unit}$ achieves the minimal delay per unit length of 86.3ps/mm with a best wire segment length of 0.56mm.

Figure 2.5: Best wire segment length vs. Buffer drive strength, local layer (x-axis: drive strength (X); y-axis: best wire segment length (mm); curves: maximal and minimal wire unit capacitance)

Figure 2.6: Minimum delay per unit length vs. Buffer drive strength, local layer (x-axis: drive strength (X); y-axis: minimal delay per unit length (ps/mm); curves: maximal and minimal wire unit capacitance)

2.4.1.2 Intermediate Layers

Metal layers 2, 3, 4 and 5 have the same dimensions, the same sheet resistance and similar unit capacitances, as can be seen from Table 2.1. Here, we use metal layer 3 as the intermediate layer for simulations and compare it with the local layer. The other intermediate layers have similar delays and best wire segment lengths to metal layer 3.

Vias need to be taken into account for intermediate layers. In order to use the same circuit as in Figure 2.3, the wire segment has to be changed to include vias. The wire segment is now modeled as in Figure 2.7 to consider vias between different metal layers. Each via is modeled as a lumped RC, and the parasitics are taken from Table 2.2. We take a typical 10aF and 1Ω for each via, and two vias between different metal layers.

Figure 2.7: Wire segment with vias between metal layers

The DPU distribution of the intermediate layer is shown in Figure 2.8. It can be seen that the best wire segment lengths are 0.32mm and 0.75mm and the DPU values are 248.7ps/mm and 71.5ps/mm for maximal $C_{unit}$ and minimal $C_{unit}$, respectively. Due to the very small delay of vias, the intermediate layer has a longer delay than the local layer only below the micrometer level; at the millimeter level, the intermediate layer has

shorter delay than the local layer because of its smaller $R_{sheet}$ and $C_{unit}$.

2.4.2 Delay Estimation Model

A static timing model for system wires can be developed from the previous section. The delay of the system wire can be estimated as

$$D_w = D_{pu} \, L \tag{2.3}$$

where $D_{pu}$ is the delay per unit length from HSPICE simulation and $L$ is the system wire length. Since the source driver has a nearly constant delay for a given buffer drive strength, we can also extend this model to calculate the end-to-end propagation delay by adding a constant delay of the source driver. However, due to the input shaping during the first few segments and the load in the last segment, we add empirical parameters into this model for different ranges of lengths in order to achieve a better accuracy. Equations 2.4 and 2.5 are for the local layer and the intermediate layer, respectively, with buffer drive strength X31. These empirical parameters are extracted from a series of HSPICE simulations with different wire lengths.

$$D_w = \begin{cases} D_{pu} L - 50\,\text{ps}, & L \le 1\,\text{mm} \\ D_{pu} L - 40\,\text{ps}, & 1\,\text{mm} < L \le 2\,\text{mm} \\ D_{pu} L - 30\,\text{ps}, & L > 2\,\text{mm} \end{cases} \tag{2.4}$$

$$D_w = \begin{cases} D_{pu} L - 70\,\text{ps}, & L \le 1\,\text{mm} \\ D_{pu} L - 60\,\text{ps}, & 1\,\text{mm} < L \le 1.3\,\text{mm} \\ D_{pu} L - 50\,\text{ps}, & 1.3\,\text{mm} < L \le 1.5\,\text{mm} \\ D_{pu} L - 40\,\text{ps}, & L > 1.5\,\text{mm} \end{cases} \tag{2.5}$$
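The piecewise delay model can be written down in a few lines. The sketch below reads the empirical parameters of Equations 2.4 and 2.5 as offsets subtracted from $D_{pu} L$, a reading that reproduces the estimated delays of Tables 2.5 and 2.6 for the lengths checked here, and it uses the $D_{pu}$ values quoted in the text for maximal $C_{unit}$ and drive strength X31 (282 ps/mm for the local layer, 248.7 ps/mm for the intermediate layer). The function names are illustrative.

```python
# Delay per unit length for buffer drive strength X31 and maximal C_unit,
# as quoted in Sections 2.4.1.1 and 2.4.1.2.
D_PU_LOCAL_PS_PER_MM = 282.0
D_PU_INTERMEDIATE_PS_PER_MM = 248.7


def local_wire_delay_ps(length_mm: float) -> float:
    """Equation 2.4, with the empirical offsets read as subtractions."""
    base = D_PU_LOCAL_PS_PER_MM * length_mm
    if length_mm <= 1.0:
        return base - 50.0
    if length_mm <= 2.0:
        return base - 40.0
    return base - 30.0


def intermediate_wire_delay_ps(length_mm: float) -> float:
    """Equation 2.5, with the empirical offsets read as subtractions."""
    base = D_PU_INTERMEDIATE_PS_PER_MM * length_mm
    if length_mm <= 1.0:
        return base - 70.0
    if length_mm <= 1.3:
        return base - 60.0
    if length_mm <= 1.5:
        return base - 50.0
    return base - 40.0


if __name__ == "__main__":
    for length in (1.0, 1.5, 3.0, 10.0):
        print(length, local_wire_delay_ps(length), intermediate_wire_delay_ps(length))
```

For example, a 3 mm local-layer wire gives 282 × 3 − 30 = 816 ps, and a 2 mm intermediate-layer wire gives 248.7 × 2 − 40 = 457.4 ps.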

Figure 2.8: Delay per Unit vs. Wire segment Length, intermediate layer. (a) Maximal $C_{unit}$ (b) Minimal $C_{unit}$

The buffer number of the system wire can be estimated as

$$N = \frac{L}{L_{seg}} \tag{2.6}$$

where $L$ is the system wire length and $L_{seg}$ is the best wire segment length. Similarly, some empirical parameters can be added into Equation 2.6. Equations 2.7 and 2.8 are the buffer number estimations for the local and intermediate layers, respectively, with buffer drive strength X31.

$$N = \begin{cases} 1, & L \le 1\,\text{mm} \\ \frac{L}{L_{seg}} - 2, & 1\,\text{mm} < L \le 4\,\text{mm} \\ \frac{L}{L_{seg}} - 1, & L > 4\,\text{mm} \end{cases} \tag{2.7}$$

$$N = \begin{cases} 1, & L \le 1.4\,\text{mm} \\ \frac{L}{L_{seg}} - 3, & 1.4\,\text{mm} < L \le 2\,\text{mm} \\ \frac{L}{L_{seg}} - 2, & 2\,\text{mm} < L \le 5\,\text{mm} \\ \frac{L}{L_{seg}} - 1, & 5\,\text{mm} < L \le 15\,\text{mm} \\ \frac{L}{L_{seg}}, & 15\,\text{mm} < L \le 25\,\text{mm} \\ \frac{L}{L_{seg}} + 1, & 25\,\text{mm} < L \le 35\,\text{mm} \\ \frac{L}{L_{seg}} + 2, & L > 35\,\text{mm} \end{cases} \tag{2.8}$$

2.4.3 Simulation Results and Comparisons

We use standard cell inverters as buffers. The power supply is 1.2V and the temperature is nominal (25 °C). The typical input transition time (0.1Vdd to 0.9Vdd) is 0.02ns. In order to find out the accuracy of our delay estimation model, a series of simulations

have been carried out. Local and intermediate layers are simulated with buffer drive strength X31. In Tables 2.5 and 2.6, both the simulated and the estimated best buffer numbers and the corresponding minimal delays are listed.

| Wire length (mm) | Simulated wire delay (ns) | Estimated wire delay (ns) | Simulated buffer number | Estimated buffer number |
|---|---|---|---|---|
| 1.0 | 0.232 | 0.232 | 1 | 1 |
| 1.1 | 0.266 | 0.270 | 2 | 2 |
| 1.2 | 0.298 | 0.298 | 3 | 3 |
| 1.5 | 0.385 | 0.383 | 4 | 4 |
| 2.0 | 0.528 | 0.524 | 6 | 6 |
| 3.0 | 0.812 | 0.816 | 10 | 10 |
| 4.0 | 1.095 | 1.098 | 14/15 | 14 |
| 5.0 | 1.377 | 1.370 | 19 | 19 |
| 6.0 | 1.659 | 1.662 | 23 | 23 |
| 10.0 | 2.788 | 2.790 | 39 | 39 |
| 20.0 | 5.608 | 5.610 | 79 | 79 |
| 25.0 | 7.018 | 7.020 | 99 | 99 |
| 30.0 | 8.428 | 8.430 | 119 | 119 |
| 40.0 | 11.248 | 11.250 | 159/160/161 | 159 |
| 50.0 | 14.068 | 14.070 | 199/200/201 | 199 |

Table 2.5: Optimal buffer number and wire delay comparison between simulations and calculations for the local layer with maximal $C_{unit}$

From Tables 2.5 and 2.6, we can see that the inaccuracy of the wire delay estimation by our model is within 5% and the buffer number estimation is within 1 compared with HSPICE simulations. Even without the empirical parameters, our model can still achieve within 10% inaccuracy and buffer number estimation inaccuracy within 2

| Wire length (mm) | Simulated wire delay (ns) | Estimated wire delay (ns) | Simulated buffer number | Estimated buffer number |
|---|---|---|---|---|
| 1.0 | 0.179 | 0.179 | 1 | 1 |
| 1.4 | 0.296 | 0.298 | 1 | 1 |
| 1.5 | 0.325 | 0.323 | 2 | 2 |
| 2.0 | 0.456 | 0.457 | 4 | 4 |
| 3.0 | 0.708 | 0.706 | 8 | 8 |
| 4.0 | 0.957 | 0.955 | 11 | 11 |
| 5.0 | 1.207 | 1.203 | 14 | 14 |
| 6.0 | 1.456 | 1.452 | 18 | 18 |
| 10.0 | 2.452 | 2.447 | 31/32 | 31 |
| 20.0 | 4.938 | 4.934 | 63 | 63 |
| 30.0 | 7.425 | 7.421 | 95 | 95 |
| 35.0 | 8.668 | 8.664 | 110/111/112 | 111 |
| 40.0 | 9.912 | 9.907 | 127 | 127 |
| 50.0 | 12.398 | 12.394 | 158/159 | 159 |

Table 2.6: Optimal buffer number and wire delay comparison between simulations and calculations for the intermediate layer with maximal $C_{unit}$

for system wire longer than 2mm.

2.5 Summary

In this chapter, the interconnect and via geometry and electrical parameters were extracted for CMOS 65nm technology. Since the wire capacitance cannot be known exactly until the routing is finished, we gave a best-case and a worst-case estimate based on interconnect dimensions and proximity effects. Wires were modeled as distributed RCs. The four-segment Π-model was shown by HSPICE simulations to be accurate enough. The timing behaviors (propagation delay and output transition time) of interconnects in different layers have been characterized for different combinations of interconnect dimensions, drivers and loads. The uniform buffer insertion method was developed for long interconnects. The best wire segment length is determined by the minimal delay per unit length (DPU). A delay estimation model was derived based on the DPU and the wire length. With empirical parameters added, the inaccuracy of this model in delay estimation is within 5% compared with HSPICE simulations.


Chapter 3 Statistical Timing Modelling

Introduction

In this chapter, we first present the statistical standard cell (INV, NAND2, NOR2) characterization, which considers both global and local device variations. Then we describe the interconnect characterization under variations of the geometric and dielectric parameters. In the last part of this chapter, the uniform buffer insertion method under both standard cell and interconnect variations is investigated for both delay minimization and yield optimization.

3.1 Statistical Standard Cell Characterization

The goal of the statistical standard cell characterization is very similar to that of the traditional characterization approach, which is to provide timing tables representing the timing behaviors (propagation delay and output transition time) of cells with respect to their operating conditions, namely the input transition time and the output load [35]. The main differences are that the statistical characterization method employs Monte Carlo simulations rather than the normal transient analysis, and that pairs of mean and standard deviation values of all timings, rather than deterministic values, are collected in tabular form. In the characterization process, we also identify the process parameters that contribute most to the cell timing variation.

3.1.1 Standard Cell Timing under Global Variations

First, only global variations are introduced into standard cells. Five types of global variations are provided by the CMOS 65nm technology library: effective channel length variation (xl), effective channel width variation (xw), oxide thickness variation (tox), NMOS specific variation (nsvt) and PMOS specific variation (psvt). NMOS and PMOS specific variations are mainly due to implant variations, including the variations of threshold voltage (vt), mobility, junction capacitance and leakage. We assume that all these global variation distributions are Gaussian; by running Monte Carlo HSPICE simulations, the cell timing variation can be derived. Taking an inverter cell as an example, the propagation delay of INVX2 after 1000 runs of Monte Carlo simulations is shown in Figure 3.1. The delay variation distribution of INVX2 is near Gaussian and can be represented with a mean value 14.58ps and a standard

deviation (σ) value 0.75ps.

Figure 3.1: Standard cell INVX2 propagation delay distribution under global variations (x-axis: propagation delay (s), ×10⁻¹¹; y-axis: probability density)

By sweeping the σ value of each parameter variation, we can find out the sensitivity of the timing variation to each parameter. From Figure 3.2, the delay variation is most sensitive to the variations of xl, nsvt and psvt, since the relation between the delay and the parameter variation is nearly linear; xw and tox are secondary sources of variation in affecting the cell timing compared to the most sensitive ones. In order to characterize how these parameters account for the delay variations, we use the multiple linear regression method (see Appendix B.1) to analyze the Monte Carlo simulation results. The weights of each parameter accounting for the delay variation are given in Table 3.1.

Figure 3.2: Propagation delay sensitivities to different sources of variations. (a) Delay sensitivity to xw variation (b) Delay sensitivity to xl variation (c) Delay sensitivity to tox variation (d) Delay sensitivity to nsvt variation (e) Delay sensitivity to psvt variation (each panel: x-axis: sigma of the parameter, from -3 to 3; y-axis: propagation delay (s), ×10⁻¹¹)

| Parameter | Weight |
|---|---|
| Effective channel width (xw) | -0.5% |
| Effective channel length (xl) | 58.7% |
| Oxide thickness (tox) | 0.1% |
| NMOS specific (nsvt) | -19.2% |
| PMOS specific (psvt) | -21.5% |

Table 3.1: Weight of each parameter for delay variation

Table 3.1 shows that 99.4% of the INVX2 delay variation is due to the variations of three parameters: xl, nsvt and psvt, which explains the linear relation in Figure 3.2. Moreover, the sign of each weight indicates the direction in which that parameter contributes to the MAX/MIN delay variation spread corners, which can also be seen from Figure 3.2. Taking global variations into account, the inverter delay can be estimated as

$$D = D_{nom} + a_1 \, xw + a_2 \, xl + a_3 \, tox + a_4 \, nsvt + a_5 \, psvt \tag{3.1}$$

where $D_{nom}$ is the mean delay of INVX2 and $a_1$ to $a_5$ are the coefficients representing the sensitivities of the delay variation to each parameter variation. The coefficient values are given in Table 3.2. The variables xw, xl, tox, nsvt and psvt all have the standard Gaussian distribution N(0,1). The accuracy of using Equation 3.1 to estimate the propagation delay variation compared to Monte Carlo simulation is within 1% and the simulation time is much shorter, which can be seen from Table 3.3.

| Coefficient | a1 | a2 | a3 | a4 | a5 |
|---|---|---|---|---|---|
| Value | -0.05E-12 | 0.56E-12 | 0.02E-12 | -0.32E-12 | -0.34E-12 |

Table 3.2: Coefficients of Equation 3.1

| Standard cell | Model estimation mean (ps) | Model estimation std. (ps) | Monte Carlo mean (ps) | Monte Carlo std. (ps) | Error in mean (%) | Error in std. (%) |
|---|---|---|---|---|---|---|
| INVX2 | 14.57 | 0.75 | 14.58 | 0.75 | 0.10% | 0.0% |
| INVX31 | 10.16 | 0.68 | 10.16 | 0.68 | 0.00% | 0.0% |
| NAND2X2 | 25.54 | 1.32 | 25.55 | 1.33 | 0.04% | 0.8% |
| NOR2X2 | 26.78 | 1.29 | 26.77 | 1.29 | 0.04% | 0.0% |

Table 3.3: Standard cell statistical timing data
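Equation 3.1 also allows the delay spread to be computed in closed form. The sketch below does this for the INVX2 coefficients of Table 3.2 under the assumption, not stated in the text, that the five N(0,1) variables are mutually independent. It also reports each parameter's signed share of the variance; whether the weights of Table 3.1 were obtained in exactly this way is not stated either, so the comparison is only indicative. The function names are illustrative.

```python
import math

# Coefficients of Equation 3.1 for INVX2, copied from Table 3.2 (in ps).
COEFF_PS = {"xw": -0.05, "xl": 0.56, "tox": 0.02, "nsvt": -0.32, "psvt": -0.34}


def delay_std_ps(coeffs):
    """Standard deviation of D when all parameters are independent N(0,1).
    Independence is an assumption of this sketch."""
    return math.sqrt(sum(a * a for a in coeffs.values()))


def signed_variance_shares_percent(coeffs):
    """Share of the total variance per parameter, carrying the sign of the
    corresponding sensitivity coefficient."""
    total = sum(a * a for a in coeffs.values())
    return {name: math.copysign(a * a / total * 100.0, a)
            for name, a in coeffs.items()}


sigma_ps = delay_std_ps(COEFF_PS)
shares = signed_variance_shares_percent(COEFF_PS)

if __name__ == "__main__":
    print(f"sigma = {sigma_ps:.3f} ps")
    for name, share in shares.items():
        print(f"{name}: {share:+.1f}%")
```

With the rounded coefficients of Table 3.2 this gives a standard deviation of about 0.73 ps, close to the 0.75 ps of Table 3.3, and variance shares of about 58.7% for xl, -19.2% for nsvt and -21.6% for psvt, close to the weights listed in Table 3.1.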

3.1.2 Standard Cell Timing under Local Variations

In contrast to global variations, which affect all devices in a given chip equally, local variations can cause identically designed devices in a chip to behave differently under the same bias condition. We consider two sources of local variations for device timing variation: threshold voltage ($V_{th}$) and mobility (U0). In the 65nm technology library, local variations of the impacted parameters $P$ ($V_{th}$ and U0) are modeled as Gaussian distributions whose standard deviation is defined by the following equation:

$$\sigma_P = \frac{A}{\sqrt{2 \cdot mult \cdot W \cdot L}} + B \tag{3.2}$$

where $mult$ is the number of devices in parallel, $W$ and $L$ are the gate width and gate length of the transistors, and $A$ and $B$ are process dependent constants.

As shown in Figure 3.3 and Figure 3.4, local variations are perpendicular to global variations in the scatter diagrams of $V_{th}$ and U0, and local variations obey a Gaussian distribution. $V_{thA}$ and $V_{thB}$, U0A and U0B are the threshold voltage and mobility values from HSPICE simulation of two neighboring NMOS transistors of the same size.

Monte Carlo simulations of INVX2 with only local variations give the timing distributions in Figure 3.5. The timing distributions are Gaussian or near Gaussian; the propagation delay has a mean value 14.64ps and a standard deviation value 0.63ps, and the output transition time has a mean value 14.14ps and a standard deviation value 0.21ps.

Unlike global variations, there is no linear relation between the cell timing and

Figure 3.3: $V_{th}$ variations of identical transistors. (a) Combination of both global and local variations (b) Local variation

Figure 3.4: U0 variations of identical transistors. (a) Combination of both global and local variations (b) Local variation

Figure 3.5: Standard cell INVX2 timing distribution under local variations. (a) Propagation delay distribution (b) Output transition time distribution (x-axis: time (s), ×10⁻¹¹; y-axis: probability density)

local parametric variations, so the multiple linear regression method is not suitable for analyzing local variations. Instead, we use Pareto analysis (see Appendix B.2) as the principal component analysis method; the weight of each parameter accounting for the local variation is given in Table 3.4.

| Parameter | Weight |
|---|---|
| $V_{th}$ | 62.8% |
| U0 | -37.2% |

Table 3.4: Weight of each parameter for delay variation

Table 3.5 reports the standard cell delay variations under global and local parametric variations separately. It can be seen from Table 3.5 that local variations are comparable with global variations in affecting standard cell delay and cannot be neglected, especially for small drive strengths, in 65nm technology.

| Standard cell | Std. (ps), global | Std. (ps), local |
|---|---|---|
| INVX2 | 0.75 | 0.63 |
| INVX31 | 0.68 | 0.21 |
| INVX284 | 0.62 | 0.06 |
| NAND2X2 | 1.32 | 0.96 |
| NAND2X7 | 0.89 | 0.29 |
| NOR2X2 | 1.29 | 0.81 |
| NOR2X19 | 0.74 | 0.17 |

Table 3.5: Standard cell delay variation (standard deviation) caused by global and local parametric variations separately
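The trend in Table 3.5 can be quantified with a short calculation. The sketch below computes the ratio of local to global standard deviation for each cell and, as an illustration only, a combined standard deviation under the assumption that global and local contributions are independent and add in root-sum-square fashion; that assumption and the function names are not taken from the text.

```python
import math

# (global std, local std) in ps, copied from Table 3.5.
STD_PS = {
    "INVX2": (0.75, 0.63),
    "INVX31": (0.68, 0.21),
    "INVX284": (0.62, 0.06),
    "NAND2X2": (1.32, 0.96),
    "NAND2X7": (0.89, 0.29),
    "NOR2X2": (1.29, 0.81),
    "NOR2X19": (0.74, 0.17),
}


def local_to_global_ratio(cell: str) -> float:
    """How large the local spread is relative to the global spread."""
    global_std, local_std = STD_PS[cell]
    return local_std / global_std


def combined_std_ps(cell: str) -> float:
    """Root-sum-square of the two contributions.
    Assumption: global and local variations are independent."""
    global_std, local_std = STD_PS[cell]
    return math.hypot(global_std, local_std)


if __name__ == "__main__":
    for cell in STD_PS:
        print(f"{cell}: ratio {local_to_global_ratio(cell):.2f}, "
              f"combined {combined_std_ps(cell):.2f} ps")
```

For the inverters the ratio drops from about 0.84 for INVX2 to about 0.10 for INVX284, which is the quantitative form of the statement that local variations matter most for small drive strengths.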

3.1.3 Standard Cell Statistical Timing Models

The timing distributions of standard cells under both global and local device variations are mostly Gaussian or near Gaussian, as shown in Figure 3.6, and can therefore be represented by a pair of mean and standard deviation values. Our standard cell statistical timing models provide pairs of mean and standard deviation values of the cell timing behaviors (propagation delay and output transition time) in lookup tables indexed by combinations of the input transition time and the output load. Both global and local device parameter variations are introduced into the standard cells. In order to also take the variations of the input transition time and the output load into consideration, we model the input transition time and the output load as Gaussian distributions instead of the deterministic values used in the traditional characterization approach. Monte Carlo HSPICE simulations are employed to derive these models.

The statistical timing characterization process is illustrated in Figure 3.7. A number of Monte Carlo HSPICE simulations are run for all relevant combinations of the input transition time, the standard cells with parametric variations, and the output load. The simulation results are stored in a data structure (lookup table). Table 3.6, Table 3.7, Table 3.8 and Table 3.9 show the statistical timing models of standard cells INVX2, INVX31, NAND2X2 and NOR2X6, respectively.

3.2 Statistical Interconnect Characterization

The goal of the statistical interconnect characterization is similar to that of the statistical standard cell characterization, which is to model the variations of interconnect timing behaviors (propagation delay and output transition time) and find out the most

Figure 3.6: INVX31 Timing Distribution. (a) Propagation delay distribution; (b) Output transition time distribution. Both panels plot probability density against time in units of 10^-11 s.

Figure 3.7: Statistical standard cell timing characterization
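The characterization flow of Figure 3.7 can be sketched in Python as follows. This is a minimal illustration, not the actual flow: `toy_cell_delay` is a hypothetical stand-in for one HSPICE run, its coefficients and the 5% threshold-voltage spread are invented for the example, and only the propagation delay is modelled. The grid points reuse the (mean, std.) pairs of input transition time and output load that appear in Table 3.6.

```python
import random
import statistics


def toy_cell_delay(input_slew_ps, load_ff, vth_shift):
    """HYPOTHETICAL stand-in for a single HSPICE simulation.

    Returns a delay in ps. The coefficients are made up for illustration.
    """
    return 5.0 + 0.1 * input_slew_ps + 2.0 * load_ff * (1.0 + vth_shift)


def characterize(slew_mean, slew_std, load_mean, load_std,
                 n_samples=2000, seed=0):
    """Monte Carlo estimate of (mean, std) of the delay at one grid point."""
    rng = random.Random(seed)
    delays = []
    for _ in range(n_samples):
        # Input transition time and output load are Gaussian, not fixed.
        slew = rng.gauss(slew_mean, slew_std)
        load = rng.gauss(load_mean, load_std)
        # ASSUMED device variation: relative Vth shift with sigma = 5%.
        vth_shift = rng.gauss(0.0, 0.05)
        delays.append(toy_cell_delay(slew, load, vth_shift))
    return statistics.mean(delays), statistics.stdev(delays)


# ((slew mean, slew std) in ps, (load mean, load std) in fF), as in Table 3.6.
GRID = [
    ((33.0, 0.0), (0.8, 0.0)),
    ((28.0, 2.0), (85.2, 4.0)),
    ((1400.0, 151.0), (85.2, 4.0)),
]

# Look-up table indexed by (mean input transition time, mean output load).
lut = {}
for (slew_mean, slew_std), (load_mean, load_std) in GRID:
    lut[(slew_mean, load_mean)] = characterize(
        slew_mean, slew_std, load_mean, load_std)

if __name__ == "__main__":
    for key, (mean, std) in lut.items():
        print(f"slew={key[0]} ps, load={key[1]} fF -> "
              f"delay mean={mean:.1f} ps, std={std:.2f} ps")
```

In a real flow each call to `toy_cell_delay` would be replaced by a circuit simulation, and a second table would hold the output transition time.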

Input Transition     Output Load       Propagation       Output Transition
Time (ps)            (fF)              Delay (ps)        Time (ps)
Mean      Std.       Mean     Std.     Mean     Std.     Mean      Std.
33.0      0.0        0.8      0.0      20.2     1.6      22.3      1.5
28.0      2.0        85.2     4.0      543.6    51.6     1098.0    108.0
1400.0    151.0      85.2     4.0      874.9    74.8     1196.3    101.3

Table 3.6: Statistical timing model of INVX2

Input Transition     Output Load       Propagation       Output Transition
Time (ps)            (fF)              Delay (ps)        Time (ps)
Mean      Std.       Mean     Std.     Mean     Std.     Mean      Std.
33.0      0.0        0.9      0.0      10.1     0.8      12.1      0.48
28.0      2.0        85.2     4.0      51.2     3.8      88.0      7.0
1400.0    151.0      85.2     4.0      216.3    23.3     341.0     21.8

Table 3.7: Statistical timing model of INVX31

Input Transition     Output Load       Propagation       Output Transition
Time (ps)            (fF)              Delay (ps)        Time (ps)
Mean      Std.       Mean     Std.     Mean     Std.     Mean      Std.
33.0      0.0        0.9      0.0      26.2     2.0      27.3      3.5
28.0      2.0        85.2     4.0      560.3    70.9     1119.9    132.6
1400.0    105.0      85.2     4.0      910.3    93.3     1218.5    129.2

Table 3.8: Statistical timing model of NAND2X2

Input Transition     Output Load       Propagation       Output Transition
Time (ps)            (fF)              Delay (ps)        Time (ps)
Mean      Std.       Mean     Std.     Mean     Std.     Mean      Std.
33.0      0.0        0.9      0.0      18.5     1.7      18.0      2.1
28.0      2.0        85.2     4.0      227.0    32.9     444.0     69.4
1400.0    105.0      85.2     4.0      532.7    50.5     633.0     69.3

Table 3.9: Statistical timing model of NOR2X6

important parameters affecting the interconnect timing variation.

3.2.1 Interconnect Structure

We consider a typical interconnect structure consisting of three parallel wire segments in the same layer between two ground planes (the top metal layer has only one ground plane, below it). The cross section of the structure is shown in Figure 3.8. The wires are uniform, with length l, width w, thickness t, distance h to the ground plane, and spacing s between adjacent wires.

3.2.2 Interconnect Geometry Parameter Variation

The variations of the geometry parameters (w, t, h, s), extracted from the CMOS 65nm design rule manual [34], are listed in Table 3.10. Table 3.10 shows that w, h and s each have a ±10% variation and t exhibits a ±15% variation.
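To give a feel for what these ranges imply, the sketch below enumerates the min/max corners of (w, t, h, s) and evaluates the relative wire resistance, using the standard relation R = ρl/(wt) with resistivity ρ and length l held fixed. Treating the ± ranges as hard corner bounds is an assumption of this example, since the text does not say whether they are limits or multiples of a standard deviation. The function names are hypothetical.

```python
import itertools

# Relative variation of each geometry parameter, as stated in the text.
VARIATION = {"w": 0.10, "t": 0.15, "h": 0.10, "s": 0.10}


def corner_scales(variation):
    """Yield every min/max corner as scale factors relative to nominal."""
    names = sorted(variation)
    for signs in itertools.product((-1, 1), repeat=len(names)):
        yield {name: 1.0 + sign * variation[name]
               for name, sign in zip(names, signs)}


def relative_resistance(scale):
    """Wire resistance relative to nominal.

    R = rho * l / (w * t); with rho and l fixed, R scales as 1 / (w * t).
    """
    return 1.0 / (scale["w"] * scale["t"])


# ASSUMPTION: the +/- ranges are treated as hard bounds (corner analysis).
corners = list(corner_scales(VARIATION))
r_values = [relative_resistance(c) for c in corners]
r_min, r_max = min(r_values), max(r_values)

if __name__ == "__main__":
    print(f"{len(corners)} corners, R/R_nominal from {r_min:.3f} to {r_max:.3f}")
```

Under this assumption the thinnest, narrowest wire has roughly 31% more resistance than nominal, and the thickest, widest wire roughly 21% less. The parameters h and s do not enter the resistance but would affect the capacitance.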

Figure 3.8: Interconnect structure. (a) Cross section of local and intermediate layer interconnects; (b) Cross section of top layer interconnect.