Performance/Complexity Space Exploration : Bulk vs. SOI

Similar documents
Power Dissipation. Where Does Power Go in CMOS?

Lecture 2: CMOS technology. Energy-aware computing

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

A Novel Low Power 1-bit Full Adder with CMOS Transmission-gate Architecture for Portable Applications

EE115C Winter 2017 Digital Electronic Circuits. Lecture 6: Power Consumption

Delay Modelling Improvement for Low Voltage Applications

Lecture 7 Circuit Delay, Area and Power

MODULE III PHYSICAL DESIGN ISSUES

EE241 - Spring 2001 Advanced Digital Integrated Circuits

EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis

ECE 342 Electronic Circuits. Lecture 35 CMOS Delay Model

EECS 427 Lecture 11: Power and Energy Reading: EECS 427 F09 Lecture Reminders

Digital Integrated Circuits A Design Perspective

Last Lecture. Power Dissipation CMOS Scaling. EECS 141 S02 Lecture 8

University of Toronto. Final Exam

Chapter 8. Low-Power VLSI Design Methodology

EECS 141: FALL 05 MIDTERM 1

Physical Extension of the Logical Effort Model

CMPEN 411 VLSI Digital Circuits. Lecture 04: CMOS Inverter (static view)

EE241 - Spring 2003 Advanced Digital Integrated Circuits

Toward More Accurate Scaling Estimates of CMOS Circuits from 180 nm to 22 nm

Lecture 8-1. Low Power Design

Transistor Sizing for Radiation Hardening

ECE 438: Digital Integrated Circuits Assignment #4 Solution The Inverter

A Novel LUT Using Quaternary Logic

Design for Manufacturability and Power Estimation. Physical issues verification (DSM)

Announcements. EE141- Fall 2002 Lecture 7. MOS Capacitances Inverter Delay Power

Fig. 1 CMOS Transistor Circuits (a) Inverter Out = NOT In, (b) NOR-gate C = NOT (A or B)

High-to-Low Propagation Delay t PHL

Dynamic operation 20

Digital Electronics Part II - Circuits

VLSI Design, Fall Logical Effort. Jacob Abraham

2007 Fall: Electronic Circuits 2 CHAPTER 10. Deog-Kyoon Jeong School of Electrical Engineering

THE INVERTER. Inverter

Where Does Power Go in CMOS?

5.0 CMOS Inverter. W.Kucewicz VLSICirciuit Design 1

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

KINGS COLLEGE OF ENGINEERING DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING QUESTION BANK

Topic 4. The CMOS Inverter

Introduction to CMOS VLSI Design (E158) Lecture 20: Low Power Design

Control gates as building blocks for reversible computers

Lecture 5: DC & Transient Response

EE141Microelettronica. CMOS Logic

CHAPTER 15 CMOS DIGITAL LOGIC CIRCUITS

EE115C Digital Electronic Circuits Homework #4

EE 330 Lecture 37. Digital Circuits. Other Logic Families. Propagation Delay basic characterization Device Sizing (Inverter and multiple-input gates)

CMOS logic gates. João Canas Ferreira. March University of Porto Faculty of Engineering

Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier

DC and Transient Responses (i.e. delay) (some comments on power too!)

CSE493/593. Designing for Low Power

Objective and Outline. Acknowledgement. Objective: Power Components. Outline: 1) Acknowledgements. Section 4: Power Components

EE141-Fall 2011 Digital Integrated Circuits

9/18/2008 GMU, ECE 680 Physical VLSI Design

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1

ECE321 Electronics I

Floating Point Representation and Digital Logic. Lecture 11 CS301

VLSI VLSI CIRCUIT DESIGN PROCESSES P.VIDYA SAGAR ( ASSOCIATE PROFESSOR) Department of Electronics and Communication Engineering, VBIT

EEC 118 Lecture #5: CMOS Inverter AC Characteristics. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

EE 230 Lecture 31. THE MOS TRANSISTOR Model Simplifcations THE Bipolar Junction TRANSISTOR

Svoboda-Tung Division With No Compensation

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Heap Charge Pump Optimisation by a Tapered Architecture

EE 434 Lecture 33. Logic Design

Lecture 1: Gate Delay Models

Errata of K Introduction to VLSI Systems: A Logic, Circuit, and System Perspective

On-Line Hardware Implementation for Complex Exponential and Logarithm

Chapter 5. The Inverter. V1. April 10, 03 V1.1 April 25, 03 V2.1 Nov Inverter

Integrated Circuits & Systems

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Lecture 4: CMOS Transistor Theory

ASIC FPGA Chip hip Design Pow Po e w r e Di ssipation ssipa Mahdi Shabany

Lecture 6: DC & Transient Response

mobility reduction design rule series resistance lateral electrical field transversal electrical field

Introduction to Computer Engineering ECE 203

Very Large Scale Integration (VLSI)

P. R. Nelson 1 ECE418 - VLSI. Midterm Exam. Solutions

Status. Embedded System Design and Synthesis. Power and temperature Definitions. Acoustic phonons. Optic phonons

Mixed Analog-Digital VLSI Circuits & Systems Laboratory

EE 330 Lecture 39. Digital Circuits. Propagation Delay basic characterization Device Sizing (Inverter and multiple-input gates)

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

Transistor Implementation of Reversible Comparator Circuit Using Low Power Technique

Midterm. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Lecture Outline. Pass Transistor Logic. Restore Output.

Slide Set 6. for ENEL 353 Fall Steve Norman, PhD, PEng. Electrical & Computer Engineering Schulich School of Engineering University of Calgary

Lecture 4: CMOS review & Dynamic Logic

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

Low Power CMOS Dr. Lynn Fuller Webpage:

ENEE 359a Digital VLSI Design

ECE 342 Solid State Devices & Circuits 4. CMOS

CMOS Transistors, Gates, and Wires

CMOS Inverter (static view)

VTCMOS characteristics and its optimum conditions predicted by a compact analytical model

PLA Minimization for Low Power VLSI Designs

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

EEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation

! Memory. " RAM Memory. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips

Digital Electronics Part II Electronics, Devices and Circuits

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

VLSI Design I; A. Milenkovic 1

EECS150 - Digital Design Lecture 22 Power Consumption in CMOS. Announcements

Transcription:

Performance/Complexity Space Exploration : Bulk vs. SOI S. J. Abou-Samra and A. Guyot TIMA laboratory 6, av. Félix Viallet, F-38031 Grenoble - France Tel : (+33) 76 57 8 1 Fax : (+33) 76 7 38 1 E-Mail: Abou-Samra@imag.fr Abstract : Performance and complexity are considered here as two orthogonal axes. Performance metrics are recalled. Then different complexity metrics and scales are proposed. Different definitions of complexity are used depending on the considered level of abstraction. Finally, SOI and bulk CMOS technologies are compared in this space. 1. INTRODUCTION It is now well known [1,, 3] that SOI technologies deliver higher performance than their bulk counterparts. But today SOI designs are ported from bulk with only slight modifications and without taking advantage of some design subtleties that are SOI-specific []. One of these subtleties is the better use of complex cells that one can make in SOI compared to bulk. We will see how the reduced source/drain to body capacitance favours complex cells. Higher performance can then be achieved, but first, detailed explanations of what is meant by performance and complexity are given. Only the gate and transistor levels are addressed here, and not the architecture and algorithm ones [13]. This paper is structured as follows : In the first part, performance metrics are recalled, and the different figures of merit used are detailed. In the second part, an attempt is made to quantify and measure complexity : complexity scales are introduced. In the third part, comparative results for bulk and SOI CMOS are shown and discussed; finally, some concluding remarks are given as well as a brief presentation of the related work.. WHAT'S PERFORMANCE? The word performance is subjective and the way it is commonly used in microelectronics can sometimes be misleading, as many things can be put behind depending on what is expected : throughput, energy per operation or energy per throughput. The problem of performance metrics has been addressed by C. Piguet [5, 6] amongst others. An overview of the different components of performance are defined or reminded here. This work is supported by the ESPRIT project HIPERLOGIC (# 003)

.1. Throughput If one is looking for the best throughput, then performance will be the delay D, regardless to any other parameter (area, energy consumption etc.). The delay is a decreasing function with V dd. As a rule of thumb, it can be assumed that D ~ α.v dd /(V dd -V t ) (1) Where α is a constant depending on the technology for a given design style. This equation does not account for short channel effects, it is only a first order approximation. From equation (1) one can see that the delay requirements can always be fulfilled by increasing the supply voltage - in the limits of the technology tolerance and the model boundaries though (fig. 1). The delay is measured in nanoseconds. The throughput is the inverse of the delay, and is given in MHz. 6 Delay (arbitray units) 5 3 1 0 1 3 5 6 7 V t Supply voltage Figure 1: Gate delay vs. supply voltage Better delay models [1] are of course needed for accurate predictions... Battery life If the only constraint for a design is battery life (like in a watch for example), then the energy per operation is the metrics to use. Energy per operation EPO is the same as power delay product PDP, but the term Energy Per Operation is preferred here, as the delay (or the throughput) is not a constraint. Indeed, the energy required for an operation is proportional to V dd (say E = β.v dd ) and the power dissipated depends on the frequency : P = β.v dd.f = β.v dd /D; thus, P x D = PDP = β.v dd = EPO ()

So, EPO is a monotone function of V dd (fig. ) meaning that it can be made as small as required simply by lowering the supply voltage. Of course, this strategy is not compatible with increasing the throughput, and furthermore, there are technological limits for the reduction of the supply voltage. The EPO also depends on the architecture, i.e. on the operation to perform. This can be influenced by parallelism or pipelining schemes [6] that do not appear in eq. (). The EPO is measured in picojoules. EPO (Arbitrary unit) 0 35 30 5 0 15 10 5 0 0 1 3 5 6 7 Supply voltage.3. Efficiency Figure : EPO vs. supply voltage And what if both constraints, throughput and battery life, need to be satisfied? This means that the EPO has to be minimised in the same time as the delay. A compromise will be necessary, as, on one hand EPO is improved by lowering V dd (fig. ) and on the other hand, delay improvement requires higher V dd (fig. 1). This compromise is the minimum of the Energy Delay product EDP. EDP = EPOxDelay (3) Using the coarse power and delay models shown in the previous paragraphs, it is easy to solve : EDP V dd = 0 () The result is V dd = 3V t. Actually, this is a first order approximation, further refinements require more technology based considerations. The optimal supply voltage is lower in SOI than in bulk [1]. On figure 3 the EDP is plotted against V dd for a ring oscillator designed in a 0.1µm technology [7, 11, 1] with ( V tn + V tp )/ = 0.7V. the

EDP is measured in pj/mhz. In this paper, the energy delay product will be called performance, and the nearly optimal supply voltage of V (fig. 3) will be used for further optimisation of the EDP in the performance complexity space for both SOI and bulk technologies. 1600 100 Energy Delay Product (*1e-5 J.s) 100 1000 800 600 00 1 3 5 6 Vdd (V) Figure 3: EDP vs. V dd for bulk S i 3. COMPLEXITY QUANTIFICATION At first sight, one might be tempted to define complexity as being equal to the number of transistors regardless to the level of abstraction - very simple! But as this might be true at the system level, it becomes false at the gate level. In fact, the complex cell implementation of a given function takes usually less transistors (fig. ). The reduction of the transistor count has a cost : a logical cost and a performance cost. The logical cost is the loss of the internal nodes (γ on fig. ). This is a purely formal cost as it does not affect the behaviour of the circuit. The performance cost will be discussed in the following sections. γ γ Figure : Building complex gates Gate decomposition for higher speed or lower power is a well explored topic [8, 9,

10]. Alternative implementation of a logical function are given, but always without a complexity scale. The attempt is made here to define such a scale. Complexity metrics actually depend on the level of abstraction one is dealing with. Here, only the gate and transistor levels will be considered. In this section, complexity quantification schemes are proposed, and the experimental protocol is described. 3.1. Gate level At the gate level, the complexity of a design is related to the length of the critical path, thus at this level of abstraction it seems suitable to define complexity as being the logical depth needed to achieve a given logical function. In this case, a Nand 16 is taken. It can be implemented in one step - with 16 transistors in series (corresponding to Complexity = 1), in three steps like in figure 5a (complexity = 3) or in five steps like in figure 5b which corresponds to a complexity of 5. Other decomposition schemes are possible, but only those in which all gates have the same number of inputs are retained. Thus, the complexity scale at the gate level does not depend on the physical implementation (transistor level) addressed in the next section. The different performance figures can now be plotted on this complexity scale. The experimental protocol is described later. It should be noted that this definition of complexity depends on the logical function performed. Figure 5a : Complexity = 3 Figure 5b : Complexity = 5 3.. Transistor level A finer granularity is needed at the transistor level, and, at this level, the complexity of a gate does not depend on the function it performs. Complexity can though be considered as being proportional to the number of transistors in series. Comparisons can

then be easily performed after normalisation. On this scale, the results are divided by the number of bits, and thus all figures are normalised per bit. 3.3. Experimental Protocol This section discusses the benchmark circuits used to explore the performance/complexity space. All results are obtained by HSPICE simulations. 3.3.1. Gate level benchmarks The benchmark used here is a 16 input Nand gate. It is convenient to choose a number of inputs that can be written as n, thus the gate can be decomposed in a balanced tree like manner. For example, the Nand 16 can be decomposed using only input gates (fig. 5a) or input ones (fig. 5b). The load capacitance is the same for all cases : C l = 100fF. The measured delay is worst case, i.e. we make sure to always switch the transistor which is the further from the output for all gates. These inputs are marked on figures 5a and 5b. The EPO is taken as the average of the current flowing through V dd over the duration of the transition, times V dd, times the gate delay. 3.3.. Transistor level benchmarks As the scale here is the number of transistors in series, the logical function performed doesn t matter. We assume here a Nand gate with to 8 inputs (fig. 6). All figures are given per bit, thus the loading capacitance has to be proportional to the number of inputs. The load is C l = n.50ff where n is the number of inputs of the gate. The measured delay is worst case, i.e. the closest input to ground changes. 1 V dd... i... n 1 C l... i... n Figure 6 : n input Nand gate Electrical simulations were carried out for the bulk and the SOI versions of the same technology; the results are discussed in the following section.

. RESULTS AND DISCUSSION A bulk and a SOI versions of the same CMOS technology are compared here using the benchmarks discussed in the previous section. The figure 7 shows the EPO and the EDP as a function of the logical depth for a 16 input Nand gate loading 100fF. 1,05 10-1 1 10-1 9,5 10-13 Bulk EPO SOI EPO 3 10-1,5 10-1 EPO (Joule) 9 10-13 8,5 10-13 Bulk EDP SOI EDP 10-1 1,5 10-1 EDP (Joule/Hz) 8 10-13 7,5 10-13 1 10-1 7 10-13 5 10-0 1 3 5 6 Logical depth Figure 7 : EPO and EDP vs. logical depth On the figure 8 the EPO/bit and the EDP/bit are plotted against the number of transistors in series. 1,6 10-13 5,5 10-3 1, 10-13 Bulk EPO/bit SOI EPO/bit 5 10-3,5 10-3 EPO/bit (Joules) 1, 10-13 1, 10-13 1,38 10-13 10-3 3,5 10-3 3 10-3 EDP/bit (Joules/Hz) 1,36 10-13 Bulk EDP/bit SOI EDP/bit,5 10-3 10-3 1,3 10-13 1 3 5 6 7 8 9 Number of transistors in series 1,5 10-3 Figure 8 : EPO/bit and EDP/bit vs. number of transistors in series

Some comments and design directives can be drawn from these results : There is an optimal decomposition scheme for large functions in terms of EDP but not in terms of EPO (fig. 7). In this case, the implementations with logical depth equals 3 and 5 (fig. 7) give approximately the same EDP. Then of course, one should choose the implementation with the lower EPO and transistor count. The logical depth that minimises the EDP depends on the function being analysed and on the technology. From figure 8 it can be seen that the difference in terms of EDP/bit between SOI and bulk increases with the number of transistors in series. This means that SOI favours complex gates design. This increasing difference comes mainly from the very small junction capacitance in SOI as compared to bulk, leading to faster switching of a larger number of transistors in series. The EPO/bit curves for bulk and SOI are parallel with a lower value for SOI. For both SOI and bulk, the EPO/bit decreases with complexity (the number of transistors in series) where the EDP/bit increases (fig. 8). Thus, whether using SOI or bulk, if the battery life is the critical parameter, then one should make extensive use of complex cells. On the other hand, if the EDP (or energy per throughput) is the main issue, then complex function should be mapped on smaller cells. 5. CONCLUSION The originality of this paper is the introduction of quantitative measures for complexity. The scales that are introduced are adapted to the considered level of abstraction. Also, a quantitative comparison between bulk and SOI is carried out, enabling to make a better use of SOI technologies for higher speed and lower energy consumption. The small differences shown here between SOI and bulk are due to the fact that in these benchmarks the loading capacitance C l are the same for both technologies. This is of course not the case in real designs where the loading capacitance are usually 5 to 30 percent smaller in SOI which leads to much higher improvements in terms of energy delay product (a factor of 3 to 5) [1]. This work is part of wider investigations concerning design methodologies and design tuning for a three dimensional 0.1µm SOI on SOI technology [7, 11, 1].

6. REFERENCES [1] J.-P. Collinge, Silicon-On Insulator Technology : Materials to VLSI, nd Edition, Kluwer, 1997, ISBN : 0-793-8007-X [] P. Zdebel, Low Power/Low Voltage CMOS Technologies, A Comparative Analysis, Microelectronics Engineering, Vol. 39, Elsevier, Dec. 1997, pp. 13-137 [3] B. A. Chen et al., In proc. of the th Intl. Conference on Solid-State and Integrated Circuits Techniques, 1995, p. 60 [] S. R. Wilson, M. A. Mendicino, M. L. Alles, TFSOI Circuit Applications, In proc. of the 8 th International Symposium on SOI Technology and Devices (ECS'97), Paris, France, September 1997, pp. 359-37 [5] C. Piguet, Circuit and Logic Level Design, in Low Power Design in Deep Submicron Electronics, ed. by W. Nebel and J. Mermet, Kluwer, 1997, ISBN 0-793-569-X, pp. 105-133 [6] C. Piguet, Low-Power and Low-Voltage CMOS Digital Design, Microelectronics Engineering, Vol. 39, Elsevier, Dec. 1997, pp. 179-08 [7] S.J. Abou-Samra, V. Dudek, F. Ayache, A. Guyot, B. Courtois and B. Höfflinger, Designing With 3D SOI CMOS, In proc. of the 8 th International Symposium on SOI Technology and Devices (ECS'97), Paris, France, September 1997, pp. 38-388. [8] J.-M. Masgonty et al., Technology and Power-Supply -Independent Cell- Library, IEEE CICC 91, San Diego, CA., USA, May 1-15 1991 [9] C. Piguet et al., Low Power Design of a Standard Cell Library, Low-Voltage Low-Power Workshop during ESSCIRC 95, Lille, France, Sept. 1995 [10] C. Mead, M. Renn, Cost and Performance of VLSI Computing Structures, IEEE JSSC-1, April 1979, pp. 55-6 [11] S.J. Abou-Samra, P. A. Aisa *, A. Guyot and B. Courtois, 3D CMOS SOI for High Performance Computing, In proc. of the 1998 International Symposium on Low-Power Electronics and Design (ISLPED), Monterey, CA, August 10-1, 1998 ( * University of Bologna, Bologna, Italy) [1] D. Auvergne, J. M. Daga and S. Turgis, Power and Delay Macro-Modelling for Submicronic CMOS Process : Application to Low Power Design, Microelectronics Engineering, Vol. 39, Elsevier, Dec. 1997, pp. 09-33 [13] J. Smit, Energy Complexity & Archtecture, In proc. of the 7 th International Workshop Power and Timing Modeling Optimization and Simulation (PATMOS'97), Louvain la Neuve, Belgium, September 1997, pp. 357-368 [1] S.J. Abou-Samra, J. Arweiler * and A. Guyot, Low Power SOI CMOS Multipliers : D vs. 3D, In proc. of the th European Solid State CIRcuits Conference (ESSCIRC 98), The Hague, The Netherlands, September - 1998 ( * Technische Hochschule Darmstadt, Germany)