Buffered Clock Tree Sizing for Skew Minimization under Power and Thermal Budgets

Krit Athikulwongse, Xin Zhao, and Sung Kyu Lim
School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, Georgia, U.S.A.

Outline
- Introduction
- Problem formulation
- Algorithm design
- Experiments and results
- Conclusions
2/25

Review
- Sequential-linear-programming (SLP) based clock sizing
  - Low-power and skew minimization [Wang et al., IEEE Trans. 05]
  - Process-variation-aware clock skew minimization under power budget [Guthaus et al., DAC 06]
- Thermal-aware clock network design
  - Balanced thermal-aware skew under given thermal profiles [Cho et al., ICCAD 05]
  - Thermal-aware bottom-up merging based on thermal sensitivity [Yu et al., ISPD 07]
  - Thermal-aware 3D clock routing algorithm design [Minz et al., ASPDAC 08]
3/25

Challenges
- Wire/buffer sizing affects delay, temperature, and power
- The impact of sizing on thermal variations cannot be ignored
- Thermal profiles change during optimization
- Power and thermal budgets must be considered simultaneously
4/25

Contributions
- Thermal-Aware Sequential-Linear-Programming (TA-SLP)
  - Thermal-aware clock skew minimization under power and thermal budgets
  - Single and multiple thermal profiles
- Thermal-aware skew or skew range minimization
  - Minimize skew value under a single steady-state thermal profile
  - Minimize skew range under multiple thermal profiles
- Loosely-Thermal-Aware Sequential-Linear-Programming (LTA-SLP)
  - A speedup scheme to update delay, thermal, and power sensitivity
5/25

Problem formulation: Thermal-aware clock sizing
- Input:
  - Non-sized buffered clock tree
  - Thermal profile (single or multiple)
  - Power and thermal budgets
- Output:
  - A sized clock tree: wire widths and buffer sizes are determined
- Objective:
  - Minimize thermal-aware clock skew (skew value or skew range)
- Constraints:
  - Power budget
  - Temperature budget
6/25

Overview: TA-SLP sizing flow 7/25

Thermal analysis
- Power consumption
  - Underlying power: substrate devices
  - Clock power: clock wires and buffers
- Compact substrate thermal model*
  - n x n thermal tiles
  - R: thermal resistance matrix
  - t, p: temperature and power vectors
*C.-H. Tsai and S.-M. Kang. Cell-level placement for improving substrate thermal distribution. IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 19(2):253-266, Feb. 2000.
8/25
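The compact model maps tile powers to tile temperatures through the thermal resistance matrix R. The following is a minimal sketch, assuming the linear form t = R p + t_ambient; the ambient offset, the function name `tile_temperatures`, and every number in the example are illustrative assumptions, not values from the slides.

```python
def tile_temperatures(R, p, t_ambient=0.0):
    """Return t = R p + t_ambient for a flattened list of thermal tiles.

    R[i][j] is the temperature rise of tile i per unit power dissipated
    in tile j (assumed linear model).
    """
    n = len(p)
    if len(R) != n or any(len(row) != n for row in R):
        raise ValueError("R must be square and match the length of p")
    return [t_ambient + sum(R[i][j] * p[j] for j in range(n)) for i in range(n)]


# Hypothetical 2-tile example (all values made up for illustration).
R = [[2.0, 0.5],
     [0.5, 2.0]]
underlying_power = [10.0, 4.0]   # substrate devices
clock_power = [1.0, 2.0]         # clock wires and buffers
p = [u + c for u, c in zip(underlying_power, clock_power)]
t = tile_temperatures(R, p, t_ambient=25.0)
print(t)
```

Because the clock power term depends on wire and buffer sizes, any resizing changes p and therefore t, which is why the thermal profile has to be refreshed during optimization.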

Thermal-aware delay models
- Thermal impact on wire model
  - Temperature-dependent resistance
- Thermal impact on buffer model
  - Driving resistance depends linearly on temperature within a small region
  - Intrinsic delay with buffer size S and temperature T
  - Fit over buffer sizes 10x to 70x at 60 to 110 °C: α = 0.005, τ = 0.00013, mean absolute error = 0.04 ps
9/25
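To make the temperature dependence concrete, here is a rough sketch of one buffered wire stage evaluated with an Elmore-style delay. It assumes both the wire resistance and the driving resistance scale as R0 (1 + k T) with T in °C; the coefficients `beta_wire` and `alpha_buf` are placeholders rather than the fitted values on the slide, the intrinsic buffer delay is omitted, and the temperature is taken as uniform along the wire. The 0 °C electrical values come from the experiment settings.

```python
# Assumed linear temperature models; beta_wire and alpha_buf are placeholder
# coefficients (per degree C), not values taken from the slides.
R_WIRE_0 = 0.15      # ohm/um at 0 C, minimum width (experiment settings)
C_WIRE = 0.2e-15     # F/um
R_DRV_0 = 4.7e3      # ohm at 0 C, minimum-size buffer
C_LOAD = 0.47e-15    # F, used here as the load at the end of the wire


def wire_resistance(length_um, temp_c, beta_wire=0.004):
    """Total wire resistance at a uniform temperature temp_c."""
    return R_WIRE_0 * length_um * (1.0 + beta_wire * temp_c)


def driver_resistance(temp_c, alpha_buf=0.005):
    """Buffer driving resistance, linear in temperature."""
    return R_DRV_0 * (1.0 + alpha_buf * temp_c)


def stage_delay(length_um, temp_c):
    """Elmore delay (seconds) of driver + distributed wire + lumped load."""
    r_w = wire_resistance(length_um, temp_c)
    c_w = C_WIRE * length_um
    r_d = driver_resistance(temp_c)
    return r_d * (c_w + C_LOAD) + r_w * (c_w / 2.0 + C_LOAD)


for temp in (60, 85, 110):
    print(f"{temp} C: {stage_delay(500.0, temp) * 1e12:.2f} ps")
```

Under these assumptions the delay grows monotonically with temperature, so two sinks in tiles at different temperatures see different delays even when their paths are electrically identical.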

Sensitivity computation
- For a wire/buffer j whose size changes from w_j0 to w_j0 + ε_j:
- G: delay sensitivity matrix
  - Delay variation caused by sizing
10/25
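A perturbation-based sensitivity of this kind can be sketched with forward finite differences: perturb one element at a time by ε_j and record how each sink delay moves. The helper name `delay_sensitivity` and the toy delay function below are hypothetical; a real flow would call a thermal-aware delay evaluator instead.

```python
def delay_sensitivity(delay_fn, sizes, eps):
    """Approximate G[i][j] = (d_i(w + eps_j e_j) - d_i(w)) / eps_j.

    delay_fn maps a list of element sizes to a list of sink delays.
    """
    base = delay_fn(sizes)
    G = [[0.0] * len(sizes) for _ in base]
    for j, e in enumerate(eps):
        perturbed = list(sizes)
        perturbed[j] += e                      # perturb element j only
        d = delay_fn(perturbed)
        for i in range(len(base)):
            G[i][j] = (d[i] - base[i]) / e
    return G


# Toy delay function (an assumption): element 0 is shared by both sinks,
# element 1 feeds only sink 0, element 2 feeds only sink 1.
def toy_delays(w):
    return [2.0 * w[0] + 1.0 * w[1],
            2.0 * w[0] + 3.0 * w[2]]


G = delay_sensitivity(toy_delays, [1.0, 2.0, 4.0], [0.5, 0.5, 0.5])
print(G)
```

In the toy case the shared element has the same entry in both rows, so resizing it shifts both sink delays equally and leaves the skew unchanged.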

Sensitivity computation (cont.)
- Γ: thermal sensitivity matrix
  - Average thermal variation of thermal tiles caused by sizing
- β: power sensitivity vector
  - Total power variation caused by sizing
11/25

Linear sub-problem in TA-SLP
Given the current wire/buffer sizes w_0, decide the size change δ
Minimize
Subject to
12/25
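One plausible form of such a sub-problem, assembled from the sensitivities G, Γ, and β defined on the preceding slides, is written out below. It is an assumed formulation, not the slide's own: the auxiliary variables d_max and d_min, the superscript-0 quantities (current sink delays, tile temperatures, and total power), and the step bound Δ_j are notation introduced here for illustration.

```latex
\begin{aligned}
\min_{\delta,\; d_{\max},\; d_{\min}} \quad & d_{\max} - d_{\min} \\
\text{s.t.} \quad
 & d_{\min} \;\le\; d_i^{0} + \sum_j G_{ij}\,\delta_j \;\le\; d_{\max}
   && \forall \text{ sinks } i \\
 & P^{0} + \beta^{\top}\delta \;\le\; P_{\text{budget}} \\
 & t_k^{0} + \sum_j \Gamma_{kj}\,\delta_j \;\le\; T_{\text{budget}}
   && \forall \text{ tiles } k \\
 & |\delta_j| \;\le\; \Delta_j && \forall j
\end{aligned}
```

Every term is linear in δ, so each sub-problem can be handed to a standard LP solver; the nonlinearity of the true delay, power, and temperature is handled by re-linearizing around the updated sizes in the next iteration.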

Wire/buffer size update and control
- New size after sizing:
- Restrict to:
- ε_j: size perturbation
- In each linear sub-problem, the variable δ_j for each wire/buffer is restricted to a small range, within which the linear approximation of the delay remains valid
13/25
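The update-and-restrict step can be sketched as two clamps: first limit each δ_j to a small step, then keep the resulting size inside the legal range. The function name `update_sizes` and the per-element `step_limit` are assumptions made for this sketch; the 0.24um to 0.96um wire-width bounds are taken from the experiment settings.

```python
def update_sizes(sizes, delta, step_limit, w_min, w_max):
    """Apply w_j = w_j0 + delta_j with a bounded step and legal size range."""
    new_sizes = []
    for w, d, lim in zip(sizes, delta, step_limit):
        d = max(-lim, min(lim, d))                    # keep the step small
        new_sizes.append(max(w_min, min(w_max, w + d)))  # stay within bounds
    return new_sizes


# Wire widths in um; the requested changes are deliberately too large.
widths = [0.25, 0.5, 0.9375]
requested = [-0.5, 0.0625, 0.125]
limits = [0.0625, 0.0625, 0.0625]
new_widths = update_sizes(widths, requested, limits, w_min=0.24, w_max=0.96)
print(new_widths)
```

In an LP-based flow the same limits would normally appear as bound constraints on δ inside the sub-problem, so the clamp here only mirrors what the solver would already enforce.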

Extension: multi-thermal profiles
- Single thermal profile -> multiple thermal profiles
- Minimization: thermal-aware skew -> thermal-aware skew range
- Our strategy:
  - Single average thermal profile: the average of the multiple non-uniform thermal profiles
  - Perform thermal-aware clock sizing under the single average thermal profile, then evaluate the result using the multiple profiles
14/25
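The averaging strategy can be sketched in a few lines: average the profiles tile by tile, size the tree once against that average, then re-evaluate the sink delays under each individual profile. In this sketch the skew range is taken to be the interval between the smallest and largest skew observed across profiles, which is an assumption about the definition; the function names and all numbers are illustrative.

```python
def average_profile(profiles):
    """Tile-wise mean of several thermal profiles (lists of temperatures)."""
    n = len(profiles)
    return [sum(tile_temps) / n for tile_temps in zip(*profiles)]


def skew(sink_delays):
    """Clock skew: spread between the latest and earliest sink."""
    return max(sink_delays) - min(sink_delays)


def skew_range(delays_per_profile):
    """(min skew, max skew) over all profiles; assumed definition."""
    skews = [skew(d) for d in delays_per_profile]
    return min(skews), max(skews)


# Two made-up profiles over two tiles, in degrees C.
avg = average_profile([[60.0, 80.0], [70.0, 100.0]])

# Made-up sink delays (ps) of one sized tree, evaluated per profile.
rng = skew_range([[10.0, 12.0, 11.0], [10.0, 13.5, 11.0]])
print(avg, rng)
```

Sizing against one averaged profile keeps the optimization cost independent of the number of profiles; only the final evaluation scales with it.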

Loosely-Thermal-Aware Sequential-Linear-Programming (LTA-SLP)
- Motivation:
  - TA-SLP spends most of its time solving LP sub-problems
  - The density of the G matrix affects the runtime
- A speed-up scheme: LTA-SLP
  - Keep the thermal profile fixed during G matrix calculation
  - Update the thermal profile in each iteration
15/25

Detailed experiment settings
- 65nm technology:
  - Wire: R = 0.15 Ω/um, C = 0.2 fF/um, at 0 °C, minimum size
    - Size: 0.24um to 0.96um
  - Buffer: R_d = 4.7kΩ, C_L = 0.47fF, t_d = 17.4ps, at 0 °C, minimum size
    - Size: 12X to 64X
- Frequency: 5GHz, V_dd: 1.2V
- Thermal profile:
  - Multiple non-uniform: 10 power/thermal profiles

Circuits  # Sinks
r1        267
r2        598
r3        862
r4        1903
r5        3101
16/25

Impact of sizing on power and temperature
- Wire and buffer sizing greatly impacts power and temperature
- Three cases: min-size, mid-size, and max-size
  - Total power increases by 4 times
  - Temperature increases by 10 °C to 13 °C
17/25

Impact of initial sizing
- Three initial sizings: min-size, mid-size, and max-size
- The single average thermal profile is computed from 10 thermal profiles
- TA-SLP reduces thermal-aware clock skew to near zero, regardless of the initial solution

Skew value based on single average thermal profile (ps)
Ckt.  Min-size (0.24um, 12X)  Mid-size (0.60um, 38X)  Max-size (0.96um, 64X)
r1    0.00                    0.00                    0.77
r2    0.00                    0.00                    0.44
r3    0.01                    0.00                    0.31
r4    0.01                    0.01                    0.42
r5    0.01                    0.01                    0.44
18/25

Impact of initial sizing (cont.)
- Clock skew range based on 10 thermal profiles
  - The mid-size case is slightly better than the max-size case
  - The min-size case is the worst
- We choose mid-size as the initial sizing
19/25

Comparison: NTA-SLP vs TA-SLP
- NTA-SLP: non-thermal-aware SLP-based sizing
  - Minimizes skew under a power budget only, which resembles an existing work*

        Ptotal (W)                  Tmax (°C)
Ckt.    Budget  NTA-SLP  TA-SLP    Budget  NTA-SLP  TA-SLP
r1      2.87    2.86     2.86      74.18   71.16    71.12
r2      6.07    6.05     6.04      77.07   72.98    72.91
r3      8.65    8.60     8.59      76.96   73.62    73.60
r4      18.32   18.25    18.23     80.88   76.59    76.59
r5      28.61   28.46    28.52     82.85   78.00    78.03

*M. R. Guthaus, D. Sylvester, and R. B. Brown. Clock buffer and wire sizing using sequential programming. In Proc. ACM Design Automation Conf., pages 1041-1046, San Francisco, CA, July 24-28, 2006.
20/25

Comparison: NTA-SLP vs TA-SLP (cont.)
- Skew reduction:
  - Avg. skew saving: 59%-77%
  - Max skew saving: 52%-69%
21/25

Impact of multiple thermal profiles
- TA-SLP with mid-size initial solution
- Under 10, 100, and 1000 thermal profiles:
  - Average skew stays stable
  - Skew range increases with more thermal profiles
22/25

Runtime
- Comparisons among NTA-SLP, TA-SLP, and LTA-SLP

        NTA-SLP          TA-SLP           LTA-SLP
Ckt.    # itr  CPU/itr   # itr  CPU/itr   # itr  CPU/itr
r1      12     2.0       12     12.0      17     11.4
r2      12     8.8       13     43.8      20     29.3
r3      14     18.4      15     84.0      16     46.6
r4      15     75.9      13     692.2     14     139.9
r5      15     184.5     15     2190.5    16     292.8

- Density of G matrix:
  - r5: G matrix in TA-SLP has 23.6 million non-zero elements
  - r5: G matrix in LTA-SLP has 2.0 million non-zero elements
23/25
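The table gives iterations and CPU time per iteration separately, so the end-to-end comparison takes one more step of arithmetic. The sketch below assumes total CPU time is simply iterations times CPU per iteration (setup cost ignored); the slide does not state the time unit, so only ratios are computed. The function names are illustrative.

```python
# (iterations, CPU per iteration) copied from the runtime table.
RUNTIME = {
    "r1": {"NTA-SLP": (12, 2.0),   "TA-SLP": (12, 12.0),   "LTA-SLP": (17, 11.4)},
    "r2": {"NTA-SLP": (12, 8.8),   "TA-SLP": (13, 43.8),   "LTA-SLP": (20, 29.3)},
    "r3": {"NTA-SLP": (14, 18.4),  "TA-SLP": (15, 84.0),   "LTA-SLP": (16, 46.6)},
    "r4": {"NTA-SLP": (15, 75.9),  "TA-SLP": (13, 692.2),  "LTA-SLP": (14, 139.9)},
    "r5": {"NTA-SLP": (15, 184.5), "TA-SLP": (15, 2190.5), "LTA-SLP": (16, 292.8)},
}


def total_cpu(ckt, method):
    """Assumed total CPU time: iterations x CPU per iteration."""
    itr, per_itr = RUNTIME[ckt][method]
    return itr * per_itr


def speedup(ckt, base="TA-SLP", fast="LTA-SLP"):
    """Ratio of total CPU time, base over fast."""
    return total_cpu(ckt, base) / total_cpu(ckt, fast)


for ckt in RUNTIME:
    print(f"{ckt}: LTA-SLP is {speedup(ckt):.2f}x faster than TA-SLP overall")
```

Under this assumption the advantage of LTA-SLP grows with circuit size: on r1 the extra iterations outweigh the cheaper iterations, while on r5 the total is roughly 7x lower, in line with the drop in G matrix non-zeros from 23.6 million to 2.0 million.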

Conclusions
- Proposed thermal-aware sequential-linear-programming (TA-SLP) based clock sizing
  - Addressed the impact of clock sizing on power, temperature, and delay
  - Minimized thermal-aware clock skew under power and thermal budgets
  - Narrowed clock skew range under multiple thermal profiles
- Proposed loosely-thermal-aware sequential-linear-programming (LTA-SLP)
24/25

Thank you 25/25