L15: Custom and ASIC VLSI Integration

Similar documents
Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.

Digital Integrated Circuits A Design Perspective. Arithmetic Circuits

DSP Design Lecture 7. Unfolding cont. & Folding. Dr. Fredrik Edman.

Digital Integrated Circuits A Design Perspective

EE241 - Spring 2006 Advanced Digital Integrated Circuits

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

School of EECS Seoul National University

Integrated Circuits & Systems

Homework 4 due today Quiz #4 today In class (80min) final exam on April 29 Project reports due on May 4. Project presentations May 5, 1-4pm

VLSI Signal Processing

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

The Inverter. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic

Chapter 8. Low-Power VLSI Design Methodology

Pipelining and Parallel Processing

CMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes

Lecture 1: Circuits & Layout

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective

Implementation of Clock Network Based on Clock Mesh

DSP Design Lecture 5. Dr. Fredrik Edman.

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006

A COMBINED 16-BIT BINARY AND DUAL GALOIS FIELD MULTIPLIER. Jesus Garcia and Michael J. Schulte

Clock Strategy. VLSI System Design NCKUEE-KJLEE

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

Digital Integrated Circuits A Design Perspective

EECS 427 Lecture 14: Timing Readings: EECS 427 F09 Lecture Reminders

CMPEN 411 VLSI Digital Circuits Spring Lecture 19: Adder Design

EE141- Spring 2007 Digital Integrated Circuits

Administrative Stuff

EE241 - Spring 2001 Advanced Digital Integrated Circuits

CMPEN 411. Spring Lecture 18: Static Sequential Circuits

Parallel Multipliers. Dr. Shoab Khan

Digital Integrated Circuits A Design Perspective

Area-Time Optimal Adder with Relative Placement Generator

L16: Power Dissipation in Digital Systems. L16: Spring 2007 Introductory Digital Systems Laboratory

Retiming. delay elements in a circuit without affecting the input/output characteristics of the circuit.

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

VLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1

Issues on Timing and Clocking

Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

9. Datapath Design. Jacob Abraham. Department of Electrical and Computer Engineering The University of Texas at Austin VLSI Design Fall 2017

L8/9: Arithmetic Structures

S No. Questions Bloom s Taxonomy Level UNIT-I

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

Designing Sequential Logic Circuits

Combinational Logic Design

EECS Components and Design Techniques for Digital Systems. FSMs 9/11/2007

GMU, ECE 680 Physical VLSI Design 1

Professor Fearing EECS150/Problem Set Solution Fall 2013 Due at 10 am, Thu. Oct. 3 (homework box under stairs)

For smaller NRE cost For faster time to market For smaller high-volume manufacturing cost For higher performance

Lecture 34: Portable Systems Technology Background Professor Randy H. Katz Computer Science 252 Fall 1995

Homework #2 10/6/2016. C int = C g, where 1 t p = t p0 (1 + C ext / C g ) = t p0 (1 + f/ ) f = C ext /C g is the effective fanout

Digital VLSI Design. Lecture 8: Clock Tree Synthesis

ECE321 Electronics I

Adders, subtractors comparators, multipliers and other ALU elements

Hw 6 due Thursday, Nov 3, 5pm No lab this week

Pipelining and Parallel Processing

Memory, Latches, & Registers

EE241 - Spring 2000 Advanced Digital Integrated Circuits. References

Example: vending machine

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

CSE477 VLSI Digital Circuits Fall Lecture 20: Adder Design

EE141-Fall 2010 Digital Integrated Circuits. Announcements. An Intel Microprocessor. Bit-Sliced Design. Class Material. Last lecture.

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

Clocking Issues: Distribution, Energy

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

Errata of K Introduction to VLSI Systems: A Logic, Circuit, and System Perspective

9/18/2008 GMU, ECE 680 Physical VLSI Design

GMU, ECE 680 Physical VLSI Design

Itanium TM Processor Clock Design

EEC 216 Lecture #2: Metrics and Logic Level Power Estimation. Rajeevan Amirtharajah University of California, Davis

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Semiconductor Memories

Serial Parallel Multiplier Design in Quantum-dot Cellular Automata

Arithmetic Building Blocks

CMPE12 - Notes chapter 1. Digital Logic. (Textbook Chapter 3)

EECS 151/251A Spring 2018 Digital Design and Integrated Circuits. Instructors: Nick Weaver & John Wawrzynek. Lecture 10 EE141

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

Models for representing sequential circuits

Skew-Tolerant Circuit Design

Adders, subtractors comparators, multipliers and other ALU elements

The Wire. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. July 30, 2002

Table of Content. Chapter 11 Dedicated Microprocessors Page 1 of 25

CprE 281: Digital Logic

VLSI. Faculty. Srikanth

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M.

Hardware Design I Chap. 4 Representative combinational logic

Lecture 22 Chapters 3 Logic Circuits Part 1

Transformation Techniques for Real Time High Speed Implementation of Nonlinear Algorithms

The Linear-Feedback Shift Register

Floating Point Representation and Digital Logic. Lecture 11 CS301

University of Toronto. Final Exam

Lecture 9: Sequential Logic Circuits. Reading: CH 7

Transcription:

L15: Custom and ASIC VLSI Integration Average Cost of one transistor 10 1 0.1 0.01 0.001 0.0001 0.00001 $ 0.000001 Gordon Moore, Keynote Presentation at ISSCC 2003 0.0000001 '68 '70 '72 '74 '76 '78 '80 '82 '84 '86 '88 '90 '92 '94 '96 '98 '00 '02 Acknowledgement: J. Rabaey, A. Chandrakasan, B. Nikolic, igital Integrated Circuits: A esign Perspective Prentice Hall, 2003. Curt Schurgers L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 1

Moore s Law LOG 2 OF THE NUMBER OF COMPONENTS PER INTEGRATE FUNCTION Transistors Shipped Per Year 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 10 18 10 17 10 16 10 15 10 14 10 13 10 12 10 11 10 10 10 9 Electronics, April 19, 1965. 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 Gordon Moore, Keynote Presentation at ISSCC 2003 Source: ataquest/intel, 12/02 E.O. Wilson, the famous Harvard biologist who is an expert on ants, estimates that there are 10 to the 16th and 10 to the 17th ants on earth. But if you look at this curve, this year we re making one transistor for every ant. Gordon Moore, An Update on Moore s Law '68 '70 '72 '74 '76 '78 '80 '82 '84 '86 '88 '90 '92 '94 '96 '98 '00 '02 In 1965, Gordon Moore was preparing a speech and made a memorable observation. When he started to graph data about the growth in memory chip performance, he realized there was a striking trend. Each new chip contained roughly twice as much capacity as its predecessor, and each chip was released within 18-24 months of the previous chip. If this trend continued, he reasoned, computing power would rise exponentially over relatively brief periods of time. P6 Pentium proc 486 386 286 8085 8086 8080 4004 8008 L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 2 Transistors (MT) 1000 100 10 1 0.1 0.01 0.001 2X growth in 1.96 years! 1970 1980 1990 2000 2010 Year Courtesy of S. Borkar (Intel)

Layout 101 3- Cross-Section V p-type substrate n-type well metal/pdiff contact W p L p N-channel MOSFET P-channel MOSFET IN OUT V W n G S Circuit Representation GN L n contact frommetal to ndiff IN OUT metal poly Layout n+ diff p+ diff G S Follow simple design rules (contract between process and circuit designers) L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 3

Custom esign/layout 9-1 Mux 5-1 Mux a CARRYGEN g64 Itanium has 6 integer execution units like this node1 ck1 SUMSEL REG sum sumb to Cache 9-1 Mux 2-1 Mux b SUMGEN + LU s0 s1 1000um LU : Logical Unit From register files / Cache / Bypass Multiplexers Shifter Adder stage 1 Loopback Bus Loopback Bus Wiring Adder stage 2 Wiring Loopback Bus ie photograph of the Itanium integer datapath Bit slice 63 Adder stage 3 Sum Select Bit slice 2 Bit slice 1 Bit slice 0 Bit-slice esign Methodology To register files / Cache Hand crafting the layout to achieve maximum clock rates (> 1Ghz) Exploits regularity in datapath structure to optimize interconnects L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 4

The ASIC Approach esign Capture Behavioral esign Iteration Pre-Layout Simulation Post-Layout Simulation Verilog (or (or VHL )) Logic Synthesis Floorplanning Placement Structural Physical Circuit Extraction Routing Tape-out Most Common esign Approach for esigns up to 500Mhz Clock Rates L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 5

Standard Cell Example Power Supply Line (V ) elay in (ns)!! 3-input NAN cell (from ST Microelectronics): C = Load capacitance T = input rise/fall time Ground Supply Line (GN) Each library cell (FF, NAN, NOR, INV, etc.) and the variations on size (strength of the gate) is fully characterized across temperature, loading, etc. L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 6

Standard Cell Layout Methodology 2-level metal technology Current ay Technology Cell-structure hidden under interconnect layers With limited interconnect layers, dedicated routing channels between rows of standard cells are needed Width of the cell allowed to vary to accommodate complexity Interconnect plays a significant role in speed of a digital circuit L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 7

Verilog to ASIC Layout (the push button approach) module adder64 (a, b, sum); input [63:0] a, b; output [63:0] sum; After Synthesis assign sum = a + b; endmodule After Routing After Placement L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 8

The esign Closure Problem V BUS d 1 l 1 d 2 l 2 C L C C I λ = = 5 C I L Wire-to-wire capacitance causes inter-wire delay dependencies C L Iterative Removal of Timing Violations (white lines) L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 9

Macro Modules 256 32 (or 8192 bit) SRAM Generated by hard-macro module generator Generate highly regular structures (entire memories, multipliers, etc.) with a few lines of code Verilog models for memories automatically generated based on size L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 10

Clock istribution Q Clock skew, courtesy Alpha Q For 1Ghz clock, skew budget is 100ps. Variations along different paths arise from: evice: V T, W/L, etc. Environment: V, C Interconnect: dielectric thickness variation IBM Clock Routing L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 11

The Power Supply Wires are Not Ideal! To V Grid To V Grid To V Grid C coup Receiver C int R d C d river GROUN GRI Pad Pad The IR-drop problem causes internal power supply voltage to be less than the external source Courtesy avid Blaauw L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 12

Analog Circuits: Clock Frequency Multiplication (Phase Locked Loop) up PLL down Intel 486, 50Mhz VCO produces high frequency square wave ivider divides down VCO frequency PF compares phase of ref and div Loop filter extracts phase error information Used widely in digital systems for clock synthesis (a standard IP block in most ASIC flows) Courtesy M. Perrott L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 13

Behavioral Transformations There are a large number of implementations of the same functionality These implementations present a different point in the area-time-power design space Behavioral transformations allow exploring the design space a high-level Optimization metrics: 1. Area of the design 2. Throughput or sample time T S 3. Latency: clock cycles between the input and associated output change 4. Power consumption 5. Energy of executing a task 6. time power area L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 14

Fixed-Coefficient Multiplication Conventional Multiplication Z = X Y Y 3 Y 0 X 0 X 0 Y 1 X 0 Y 2 X 0 Y 3 Y 2 Y 1 X 0 X 3 Y 0 X 2 Y 0 X 1 Y 0 0 Y X 3 Y 1 X 2 Y 1 X 1 Y 1 X 3 Y 2 X 2 Y 2 X 1 Y 2 X 3 Y 3 X 2 Y 3 X 1 Y 3 Z 7 Z 6 Constant multiplication (become hardwired shifts and adds) Z 6 Z 5 Z 5 Z 4 Z 4 Z 3 X 3 Z 3 X 2 X 1 Z 2 Z 1 Z 0 Z = X (1001) 2 1 0 0 1 X 3 X 2 X 1 X 0 X 3 X 2 X 1 X 0 Z 7 X 3 X 2 X 1 X 0 Z 2 Z 1 Z 0 Y = (1001) 2 = 2 3 + 2 0 X << 3 Z shifts using wiring L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 15

Transform: Canonical Signed igits (CS) Canonical signed digit representation is used to increase the number of zeros. It uses digits {-1, 0, 1} instead of only {0, 1}. Iterative encoding: replace string of consecutive 1 s 0 1 1 1 1 2 N-2 + + 2 1 + 2 0 Worst case CS has 50% non zero bits 1 0 0 0-1 2 N-1-2 0 01101111 0 1 1 0 1 1 1 1 0 1 1 1 0 0 0-1 = 10010001 1 0 0-1 0 0 0-1 X << 7 Z << 4 Shift translates to re-wiring L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 16

Algebraic Transformations Commutativity istributivity A B A B A B C A C B A + B = B + A (A + B) C = AB + BC Associativity Common sub-expressions A B B C X Y X X Y C A (A + B) + C = A + (B+C) A B A B L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 17

Transforms for Efficient Resource Utilization A B C E FG H I 1 Time multiplexing: mapped to 3 multipliers and 3 adders 2 distributivity A C B E FG H I 1 Reduce number of operators to 2 multipliers and 2 adders 2 L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 18

A Very Useful Transform: Retiming Retiming is the action of moving delay around in the systems elays have to be moved from ALL inputs to ALL outputs or vice versa Cutset retiming: A cutset intersects the edges, such that this would result in two disjoint partitions of these edges being cut. To retime, delays are moved from the ingoing to the outgoing edges or vice versa. Benefits of retiming: Modify critical path delay Reduce total number of registers L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 19

Retiming Example: FIR Filter x(n) h(0) h(1) h(2) h(3) Symbol for multiplication y( n) y(n) irect form = h( n) x( n) = K i= 0 x( n i) h( i) x(n) associativity of the addition (10) h(0) h(1) h(2) h(3) T clk = 22 ns y(n) (4) retime x(n) h(0) h(1) h(2) h(3) Transposed form T clk = 14 ns y(n) Note: here we use a first cut analysis that assumes the delay of a chain of operators is the sum of their individual delays. This is not accurate. L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 20

Pipelining, Just Another Transformation (Pipelining = Adding elays + Retiming) Contrary to retiming, pipelining adds extra registers to the system add input registers How to pipeline: 1. Add extra registers at all inputs 2. Retime retime L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 21

The Power of Transforms: Lookahead y(n) = x(n) + A y(n-1) x(n) y(n) A loop unrolling x(n) A A 2 y(n) Try pipelining this structure distributivity y(n) = x(n) + A[x(n-1) + A y(n-2)] x(n) y(n) How about pipelining this structure! associativity A A A 2 x(n) y(n) x(n) y(n) A retiming A A 2 2 A 2 precomputed L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 22

Scan Testing shift out shift in ScanShift CLK... Idea: have a mode in which all registers are chained into one giant shift register which can be loaded/ 0 read-out bit serially. Test remaining (combinational) 1 logic by ScanShift (1) in test mode, shift in new values for all register bits thus setting up the inputs to the combinational logic 0 (2) clock the circuit once in normal mode, latching 1 the outputs of the combinational logic back into ScanShift the registers (3) in test mode, shift out the values of all shift in register bits and compare against expected results. Courtesy of C. Terman and IEEE Press L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 23

Trends: Chip in a ay (Matlab/Simulink to Silicon ) Mult1 S reg Mac1 X reg Mac2 Add, Sub, Shift Mult2 Map algorithms directly to silicon - bypass writing Verilog! Courtesy of R. Brodersen L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 24

Trends: Watermarking of igital esigns Fingerprinting is a technique to deter people from illegally redistributing legally obtained IP by enabling the author of the IP to uniquely identify the original buyer of the resold copy. The essence of the watermarking approach is to encode the author's signature. The selection, encoding, and embedding of the signature must result in minimal performance and storage overhead. same functionality, same area, same performance watermark of 4768 bits embedded (courtesy of G. Qu,, M. Potkonjak) L15: 6.111 Spring 2004 Introductory igital Systems Laboratory 25