The concept of Finite-State Machine: FSM (Hoffman Model)


Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Wiley-Interscience and IEEE Press, January
Nov. 4, Digital System Clocking: Oklobdzija, Stojanovic, Markovic, Nedovic

Clock Frequency of Selected Historic Computers and Supercomputers
(The columns Intro Date, Nominal Clock Period (ns), and Nominal Clock Frequency (MHz) are not recoverable from the transcription; model numbers are shown as transcribed.)

System | Technology | Class
Cray-X-MP | MSI ECL | Vector Processor
Cray-,-M | MSI ECL | Vector Processor
Cyber 8/99 | ECL | Mainframe
IBM 9 | ECL | Mainframe
Amdahl 58 | LSI ECL | Mainframe
IBM 8X | LSI TTL | Mainframe
Univac /9 | LSI ECL | Mainframe
MIPS-X | VLSI MOS | Microprocessor
HP-9 | VLSI MOS | Micro-mainframe
Motorola 68 | VLSI MOS | Microprocessor
Bellmac-A | VLSI MOS | Microprocessor

The Concept of Finite-State Machine: FSM (Hoffman Model)
[Figure: inputs (X) and the present state $S_n$ feed the combinational logic, which produces the outputs (Y) and the next state $S_{n+1}$; clocked storage elements, driven by the clock, hold the state.]
$Y = Y(X, S_n)$
$S_{n+1} = f(S_n, X)$

Another Representation of FSM
Clocked Storage Elements: Flip-Flops and Latches should be viewed as synchronization elements, not merely as storage elements!
Their main purpose is to synchronize fast and slow paths: prevent the fast path from corrupting the state.
[Figure: the clocked storage element (CSE) drawn as a path blocker that is either blocking or transparent. Logic signals are adjusted not to arrive earlier; while the signals are blocked they cannot corrupt the present state $S_n$. The orderly change of state from $S_n$ to $S_{n+1}$ happens at the blocker.]
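The two FSM relations above, $S_{n+1} = f(S_n, X)$ and $Y = Y(X, S_n)$, can be illustrated with a short Python sketch. The helper name `run_fsm` and the rising-edge detector used as the example machine are assumptions made for illustration only; they do not come from the slides.

```python
from typing import Callable, List, Sequence, Tuple


def run_fsm(
    next_state: Callable[[int, int], int],
    output: Callable[[int, int], int],
    initial_state: int,
    inputs: Sequence[int],
) -> Tuple[List[int], int]:
    """Step a finite-state machine once per clock cycle.

    Each loop iteration models one clock period: the combinational logic
    computes Y = Y(X, S_n) and S_{n+1} = f(S_n, X), and the clocked storage
    element then replaces S_n with S_{n+1} in one orderly step.
    """
    state = initial_state
    outputs = []
    for x in inputs:
        outputs.append(output(x, state))  # Y = Y(X, S_n)
        state = next_state(state, x)      # S_{n+1} = f(S_n, X)
    return outputs, state


# Hypothetical example machine: detect a 0 -> 1 transition on a 1-bit input.
def edge_next_state(state: int, x: int) -> int:
    return x  # the state simply remembers the previous input


def edge_output(x: int, state: int) -> int:
    return int(state == 0 and x == 1)


outs, final_state = run_fsm(edge_next_state, edge_output, 0, [0, 1, 1, 0, 1])
print(outs, final_state)
```

The state is updated only once per iteration, after the output is computed, which mirrors the role of the storage element as the single point where the state is allowed to change.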

State Changes in the Finite-State Machine
[Figure: combinational logic maps $X_t$ to $Y_t$ and $X_{t+1}$ to $Y_{t+1}$; the state changes from $S_n$ to $S_{n+1}$ on the clock, shown along a time axis.]

Diagram of a Pipelined System
[Figure: instruction cache, instruction register, register file, decode, ALU, data cache, register file; pipeline stages Instruction Fetch, Decode, Execute, Cache Access, Write Back; clock phases φ with WRITE and READ marked.]

Machine Execution Phases with Respect to the Machine Cycle
[Figure: clock cycles for Block Address, Instruction Fetch (Instruction Cache, Cache Block), Dependency Resolution Blocks: Rename and Allocate, and Instruction Issue.]

Increase in the Clock Frequency and Decrease in the Number of Logic Levels
[Plot: clock frequency (MHz) and gate delays per clock period by year, with series for Intel, IBM Power PC, and DEC processors. Annotation: processor frequency scales 2X per technology generation.]

System clocking schemes: (a) single-phase clock; (b) two-phase clock; (c) multiple-phase clock

Local generation of Two-Phase Clocks as used in IBM S/390 G4
[Figure: local clock generator with signals CLKG, A_CLK, B_CLK, SCAN_IN, IN, L1, L2 (SCAN_OUT), and enable/disable controls.]

[Figure: (a) crystal oscillator; (b) temperature-compensated crystal oscillator, with control voltage, temperature sensor, and compensation network.]

On-Chip Clock Insertion Delay
[Figure: external clock, clock driver, internal clock; the insertion delay is the delay from the external to the internal clock.]

Phase-Locked Loop Block Diagram and Operation
[Figure: PLL loop with control voltage and VCO driving the clock driver and load; external and internal clock waveforms shown out of phase.]

Delay-Locked Loop Block Diagram and Operation
[Figure: DLL loop with control voltage and voltage-controlled delay line driving the clock driver and load; external and internal clock waveforms shown out of phase.]

PLL Frequency Multiplication
[Figure: PLL with regulated supply, VCO, clock driver, and load, relating the internal clock frequency to the external clock frequency.]

Ring-Oscillator-Based VCO with CMOS Inverters as Delay Elements
$n = 2k + 1$
$f_{osc} = \frac{1}{2 n T_{inv}}$
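As a small worked example of the ring-oscillator relation on this page, the sketch below evaluates $f_{osc} = 1/(2 n T_{inv})$ for an odd number of stages $n = 2k + 1$. The function name and the 50 ps inverter delay are illustrative assumptions, not values from the slides.

```python
import math


def ring_oscillator_frequency(k: int, t_inv: float) -> float:
    """Oscillation frequency of a ring of n = 2k + 1 inverters.

    A transition must travel around the ring twice (once for each polarity)
    to complete one period, so T_osc = 2 * n * T_inv.
    """
    if k < 1:
        raise ValueError("k must be at least 1 (three or more inverters)")
    if t_inv <= 0:
        raise ValueError("inverter delay must be positive")
    n = 2 * k + 1
    return 1.0 / (2 * n * t_inv)


# Hypothetical numbers: 5 stages (k = 2) with a 50 ps inverter delay.
f_osc = ring_oscillator_frequency(k=2, t_inv=50e-12)
print(f"{f_osc / 1e9:.2f} GHz")
```

In a VCO the control voltage changes $T_{inv}$, which is why the frequency can be tuned without changing the number of stages.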

LC Tank-Based VCO, Equivalent AC Circuit Model and Current Waveform
[Figure: LC tank with tail current source, equivalent circuit, and current waveform.]

Clock Parameters: Period (T), Width, Rise and Fall Times
[Figure: clock waveform with period T, width W, rise time $t_{rise}$, and fall time $t_{fall}$.]
$w = W / T$ (duty cycle)

Clock Parameters: Period, Width, Clock Skew and Clock Jitter
[Figure: reference clock and received clock with skew $t_{skew}$ and jitter $t_{jit}$ marked on each edge.]

Clock Uncertainties
[Figure: reference clock and received clock over period T.]
Clock uncertainty: jitter + skew

The Concept of Logic Islands (Wagner 1988), Copyright 1988 IEEE
[Figure: system oscillator, clock divider/buffer, on-board clock-control chips, and clock-distribution chips feeding islands and subislands A and B.]

Clock Tuning Points (Wagner 1988), Copyright 1988 IEEE
[Figure: delay chain from the crystal oscillator to the bistable-element clock input, with observation points, delay elements, and tuning delays at each tune-point level. Delays shown: clock-waveform manipulation plus cable delay to the on-board input; divide, shape, and buffer; clock gating plus clock-chopping delay; shape and buffer; clock-powering delay; on-chip delay; bistable-element clock-input delay.]

Various Clock Shaping Elements and Obtained Clock Signals (Wagner 1988), Copyright 1988 IEEE
[Figure: clock-shaping elements built from delay elements, each with Clock In and Clock Out, and the resulting waveforms: (a) positive pulse; (b) input clock, chop, and shrink operations that change the pulse width W by a delay d.]

Clock distribution network (a) within a system, (b) on the board, and (c) tuning of the clock (Wagner 1988), Copyright 1988 IEEE
[Figure: (a) crystal oscillator and system clock (square wave) distributed to CPU sections and a remote section; (b) board clock in, clock-distribution chip with shaping and powering, fan-out to groups of boards, RAM write strobe, logic chip, and RAM chip; (c) board clock in, tune points, tunable clock choppers, and clock-powering trees producing early, normal, and late clocks.]

Clock distribution methods: (a) an RC matched tree, and (b) a grid (Bailey and Benschneider 1998), Copyright 1988, IEEE

RC delay matched clock distribution topologies: (a) a binary tree, (b) an H tree, (c) an X tree, (d) an arbitrary matched tree (From Bailey in Chandrakasan et al.), Copyright 1988, IEEE

Clock distribution grid used in a DEC Alpha 600-MHz processor (Bailey and Benschneider 1998), Copyright 1988, IEEE

Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 2: Clocked Storage Elements Operation
Wiley-Interscience and IEEE Press, January
Nov. 5, Digital System Clocking, Oklobdzija, Stojanovic, Markovic, Nedovic

Evolution of a Latch
(a) keeper; information is latched
(b) S-R latch; information can be modified
(c) S-o-P latch; S-R latch that can implement a function. IMPORTANT!

Adding Clock Control to a Latch
Latching can be done only when the clock is active; otherwise no change is allowed.
[Figure: (a) clocked latch; (b) timing diagram of the clocked latch.]

Importance of Latch S-o-P Configuration
[Figure: (a) basic Earle's latch; (b) implementing the carry function.]

Master-Slave Latch
[Figure: (a) non-overlapping clocks; (b) single external clock; (c) timing diagram; (d) PowerPC 603 MS latch (Gerosa, JSSC 12/94), Copyright 1994 IEEE.]

IBM LSSD Compatible Clocked Storage Element
[Figure]

True-single-phase-clock (TSPC) M-S latch, Yuan and Svensson (1989), Copyright 1989 IEEE
[Figure]

TSPC M-S Latch Operation
[Figure: latches L1 and L2 alternate between transparent and holding as the clock toggles; data is transferred between L1 and L2.]

Pulse latch (Kozu et al. 1996), Copyright 1996 IEEE
[Figure]

Pulse latch: Intel's explicit pulsed latch (Tschanz et al.), Copyright IEEE
[Figure: (a) local clock generator; (b) single latch; (c) clock signals.]

[Figure: (a) early version of a flip-flop; (b) PDP-8 direct set-reset sequential element.]

Difference between Flip-flop and M-S Latch
[Figure: (a) general flip-flop structure: a pulse generator driven by clock and data, followed by a pulse-capturing (slave) latch with no clock; (b) general M-S latch structure: master (L1) latch and slave (L2) latch, each with its own clock Φ.]

Black-box view of the Flip-flop and M-S latch
[Figure: flip-flop (F-F) with data and clock inputs; M-S latch with master L1 and slave L2, each with data and clock inputs.]
Experiment causing the flip-flop to fail while the M-S latch is still operational.
[Figure: clock and data waveforms applied to the flip-flop and the M-S latch; the flip-flop fails.]

Texas Instruments SN7474 Flip-flop
Very much used and analyzed.
[Figure]

Pulse generator block of SN7474: malfunctioning due to a gate delay
[Figure: (a), (b)]

Systematic Derivation of a FF using Karnaugh map
[Figure: Karnaugh map with regions marked Not Allowed, Hold previous state, Hold "1", Capture "1", Capture "0", Hold "0".]
Derivation of the pulse-generating stage of a flip-flop (only the $S_n$ signal is shown).
[Derived Boolean expressions for the pulse-generating stage; not recoverable from the transcription.]

Creating the time reference points for opening and shutting the flip-flop
[Figure: (a) pulse generator stage of the sense-amplifier flip-flop (Madden and Bowhill, 1990); (b) improvement for floating nodes (Dobberpuhl, 1997); (c) pulse generator stage improvement by proper design (Nikolic and Oklobdzija, 1999).]

Hybrid-Latch Flip-Flop (HLFF) introduced by Partovi et al.
[Figure: circuit with pulse generator, enable, and second-stage latch; signal at node X.]

Logic representation of Partovi's flip-flop (HLFF)
[Figure: pulse generator and second-stage latch; signal at node X.]

Systematically derived single-ended flip-flop (according to the slide no. 8)
[Figure: (a) circuit diagram with first and second stages; (b) logic representation. (Nedovic and Oklobdzija). Copyright IEEE.]

Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 3: Timing and Energy Parameters
Wiley-Interscience and IEEE Press, January

Basic timing diagram in flip-flops
[Figure: clock, data, and output waveforms showing $t_{CQ,LH}$, $t_{CQ,HL}$, setup time U, hold time H, and a setup time violation when data arrives less than U before the clock.]
Definitions:
Clock-to-Q delay $t_{CQ}$: low-to-high $= t_{CQ,LH}$, high-to-low $= t_{CQ,HL}$
Setup time U
Hold time H

Setup and hold time behavior as a function of clock-to-output delay
[Plot: Clk-Output delay (ps) versus Data-Clk (ps), with setup and hold regions.]
Neither setup nor hold time is a fixed constant parameter. They are functions of the Data-to-Clock time distance.

Setup time behavior as a function of data-to-output delay
[Plot: data-to-output delay versus data-to-clock delay, showing the constant Clk-Q region, the variable Clk-Q region, the failure region, and the optimum $U_{opt}$ between early and late data arrival.]
When the D-to-Q delay is observed, we can see that data can come closer to the triggering event than we thought.
Data-to-Q is the RELEVANT parameter, NOT Clock-to-Q as many think!

D-Q and Clk-Q delay as a function of D-Clk offset
[Plot]
Determining the optimal setup time $U_{opt}$ and hold time $H_{opt}$.

Setup time, hold time, sampling window and clock width in a flip-flop
[Figure: sampling window spanning U and H relative to the clock of width $t_W$.]
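The point that Data-to-Q, not Clock-to-Q, is the relevant delay can be made concrete with a small sketch. Since D-Q equals the D-Clk offset plus the Clk-Q delay measured at that offset, the optimal setup point is the offset that minimizes this sum. The sample values below are hypothetical, chosen only to mimic the shape of the curve (Clk-Q rising sharply as data approaches the clock edge); they are not measurements from the slides.

```python
from typing import List, Tuple


def optimal_setup(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (D-Clk offset, D-Q delay) at the minimum D-Q point.

    Each sample is (d_clk_offset_ps, clk_q_ps): how long before the clock
    edge the data arrived, and the Clk-Q delay observed at that offset.
    D-Q delay = D-Clk offset + Clk-Q delay.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    offset, clk_q = min(samples, key=lambda s: s[0] + s[1])
    return offset, offset + clk_q


# Hypothetical characterization data, in picoseconds.
samples = [(100, 120), (60, 122), (40, 128), (30, 140), (20, 170), (10, 260)]
u_opt, d_q_min = optimal_setup(samples)
print(u_opt, d_q_min)
```

With these numbers the minimum D-Q delay occurs at an offset where Clk-Q has already started to degrade, which is the sense in which data can come closer to the triggering event than a fixed setup time would suggest.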

Latch: setup and hold time
Unlike a Flip-Flop, a Latch has two situations:
(a) Data is ready, waiting for the clock.
(b) The latch is open; data arrives during the active clock signal.
[Figure: (a) early data arrival; (b) late data arrival, with U, H, and clock width $t_W$ marked.]

Illustration of a data path
[Figure: storage elements A and B separated by logic, with logic delay, clock skew $t_{skew}$, U, and H marked.]

Late Data Arrival and Time Borrowing in a pipelined design
[Figure: source and destination storage elements across successive cycles, with data windows dw1, dw2, dw3.]
The Data-to-Output (D-Q) time window moves around the time axis, becoming larger or smaller.

Early Data Arrival and Internal Race Immunity
The maximum clock skew that a system can tolerate is determined by the clocked storage elements:
If the Clock-to-Q delay of the CSE is shorter than H, a race can occur if there is no logic in between.
If the Clock-to-Q delay is greater than H plus the possible skew, there is no problem: $t_{CQ} > H + t_{skew}$
Internal race immunity: $R = t_{CQ} - H$

Impact of supply voltage on the sampling window
[Plot: sampling window (ps) versus supply voltage (V).]
The sampling window determines the minimum required duration of the data signal.

Energy Parameters

Components of Energy Consumption
Switching Energy
Short-Circuit Energy
Leakage Energy
Static Energy

Components of Energy Consumption
$E = \int_{t}^{t+T} V_{DD} \, i_{V_{DD}}(\tau) \, d\tau$
This is the energy consumed by the CSE during one clock period T, where t is chosen to include all relevant transitions: arrival of new data, clock pulse, and output transition.
This energy has four components:
$E = E_{switching} + E_{short\,circuit} + E_{leakage} + E_{static}$
Switching energy:
$E_{switching} = \sum_{i=1}^{N} \alpha_{0-1}(i) \, C_i \, V_{swing}(i) \, V_{DD}$
N is the number of nodes.
$C_i$ is the capacitance of node i.
$\alpha_{0-1}(i)$ is the probability that a transition occurs at node i.
$V_{swing}(i)$ is the voltage swing of node i.

Short-circuit current in an inverter
[Figure: (a) pull-up; (b) pull-down operation, showing the pull-up, pull-down, short-circuit, and load currents.]

Projected leakage currents
Leakage power will soon become a significant portion of the total power consumption in modern microprocessors.
[Plot: Ioff (nA/µm) versus temperature (C) for several technology generations.]
Assuming doubling of transistors / generation the leakage current will increase about 7.5 times corresponding to a 5 times increase in total leakage power.

Where does the Energy go in CSE?
1. Internal clocked nodes in storage elements
2. Internal non-clocked nodes in storage elements
3. Data and clock input load
4. Output load

Energy Breakdown
1. Internal clocking energy
2. Data and Clock Input Energy
3. Energy in Internal Non-clocked Nodes
4. Energy in Output Load
5. Energy per Transition
6. Glitching Energy
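The switching-energy sum defined earlier on this page, $E_{switching} = \sum_i \alpha_{0-1}(i) C_i V_{swing}(i) V_{DD}$, can be sketched in a few lines of Python. The `Node` class, the two example nodes, and the 1.8 V supply are assumptions for illustration; they do not describe any particular storage element from the slides.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Node:
    alpha: float    # probability of a 0-to-1 transition per clock cycle
    cap_f: float    # node capacitance in farads
    v_swing: float  # voltage swing of the node in volts


def switching_energy(nodes: Iterable[Node], v_dd: float) -> float:
    """Sum alpha(i) * C_i * V_swing(i) * V_DD over all nodes, in joules."""
    return sum(n.alpha * n.cap_f * n.v_swing * v_dd for n in nodes)


# Hypothetical storage element: one data-dependent node and one clocked node.
nodes = [
    Node(alpha=0.5, cap_f=10e-15, v_swing=1.8),  # internal data node
    Node(alpha=1.0, cap_f=5e-15, v_swing=1.8),   # clocked node, switches every cycle
]
e_sw = switching_energy(nodes, v_dd=1.8)
print(f"{e_sw * 1e15:.1f} fJ")
```

In this made-up example the small clocked node costs as much energy as the larger data node, because it switches every cycle regardless of data activity.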

Energy Breakdown in Clocked-Storage Elements during one of the possible input data transitions
[Table: for each input data transition, whether the clock energy, the internal energy $E_{int}$, and the external energy $E_{ext}$ are consumed (Y, N, or Y/N); the cell layout is not recoverable from the transcription.]
Two cases:
Storage elements without pre-charge nodes
Storage elements with pre-charge nodes

CSE characterization and Test setup
[Figure: test setup separating clock energy, internal energy, and energy in the output load.]

Energy per transition
The energy-per-transition is the total energy consumed in a CSE during one clock cycle for a specified input data transition: 0-0, 0-1, 1-0, or 1-1.
This metric is crucial in that it yields significant insight about circuit energy.
By inspection of the node activity in a CSE for different input data transitions, the energy-per-transition can be utilized to obtain the energy breakdown between clocked nodes, internal nodes, and the external output load.
This forms a good basis for the study of alternative circuit techniques that deal with internal clock gating.
The energy breakdown information also offers valuable information about the tradeoffs associated with reduced clocking energy and the energy penalty incurred by the clock-gating logic, thus providing a better understanding of the optimization goals for the overall design.

Average energy per transition
$E_{average} = p_{0-0} E_{0-0} + p_{0-1} E_{0-1} + p_{1-0} E_{1-0} + p_{1-1} E_{1-1}$

Glitching Energy in CSEs
Glitches generated by unintended transitions propagating from the fan-in gates are termed propagating glitches.
Glitches produced by non-glitch transitions at the inputs are called generated glitches.
[Figure: four cases, positive and negative glitches for each initial state of the internal nodes.]
$E_{avg\,glitch} = \sum_{i=1}^{4} \beta_i E_{gi}$

Interface with Clock Network and Combinational Logic
We assumed that the data and clock inputs were supplied by drivers with sufficient drive strength.
The input clock and data capacitances are important interface parameters for the clock network and logic design.
The clock network designer and logic designer need to be aware of these capacitances in order to design circuits that drive storage elements.
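Returning to the average energy per transition defined on this page, the weighted sum over the four input data transitions (0-0, 0-1, 1-0, 1-1) can be sketched as follows. The dictionary keys, the equal transition probabilities, and the energy values in femtojoules are all hypothetical and serve only to show the arithmetic.

```python
import math
from typing import Dict

TRANSITIONS = ("0-0", "0-1", "1-0", "1-1")


def average_energy(p: Dict[str, float], e: Dict[str, float]) -> float:
    """E_avg = sum over transitions of p(transition) * E(transition)."""
    if set(p) != set(TRANSITIONS) or set(e) != set(TRANSITIONS):
        raise ValueError("both tables need exactly the four transitions")
    if not math.isclose(sum(p.values()), 1.0, rel_tol=1e-9):
        raise ValueError("transition probabilities must sum to 1")
    return sum(p[t] * e[t] for t in TRANSITIONS)


# Hypothetical values: equally likely transitions, energies in fJ.
probabilities = {"0-0": 0.25, "0-1": 0.25, "1-0": 0.25, "1-1": 0.25}
energies_fj = {"0-0": 10.0, "0-1": 40.0, "1-0": 30.0, "1-1": 12.0}
e_avg = average_energy(probabilities, energies_fj)
print(e_avg)
```

Changing the probabilities to reflect low data activity (mostly 0-0 and 1-1) shifts the average toward the energy spent on clocking alone, which is the case where clock gating is most attractive.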

Interface with Clock Network and Combinational Logic
Interface with Combinational Logic:
The relevant parameters to the combinational logic designer are:
CSE input data slope
Input data capacitance
The data slope affects performance and energy consumption of both driving logic and storage elements.
Clock and data slopes are generally not equal.

Interface with Clock Network and Combinational Logic
Interface with Clock Network:
CSEs are affected by clock skew and clock slope.
The total load of the clock distribution network is defined by the input capacitance of the clock node and the number of CSEs on a chip.
An increase in clock slope results in degradation of the CSE performance; the clock network designer has to know what slopes the CSE can tolerate. This is especially important if Flip-Flops are used.
The clock slope also affects energy consumption of the clock distribution network.
If larger clock drivers with smaller fanout are used, the clock edges are sharper and the storage element performance is better, at the expense of an increase in energy consumption of the clock network.
The optimal tradeoff is achieved with the minimal energy consumption that delivers the desired storage element performance.

Interface with Clock Network and Combinational Logic
To evaluate the total clocking energy per clock cycle in the entire clock subsystem, one needs to add the energy consumed in the clock distribution network.
The energy consumed in the clock distribution network depends on the total switched capacitance, which is determined by the total number of clocked storage elements on a chip and the input capacitance of their clock inputs, the total wiring capacitance, and the total switched capacitance of clock drivers, as given by:
$C_{distrib\,net} = N_{FF} \cdot C_{in,FF} + C_{wire} + C_{sw\,buff}$
The last two terms depend on the buffer insertion/placement strategy and should be minimized.
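The switched capacitance of the clock distribution network described above adds up as the number of clocked storage elements times their clock input capacitance, plus the wiring and buffer terms. The sketch below evaluates that sum; the function name and every numeric value are illustrative assumptions rather than data from the slides.

```python
import math


def clock_network_capacitance(
    n_ff: int, c_in_ff: float, c_wire: float, c_sw_buff: float
) -> float:
    """C_distrib_net = N_FF * C_in,FF + C_wire + C_sw_buff, in farads."""
    if n_ff < 0 or min(c_in_ff, c_wire, c_sw_buff) < 0:
        raise ValueError("counts and capacitances must be non-negative")
    return n_ff * c_in_ff + c_wire + c_sw_buff


# Hypothetical chip: 10,000 storage elements with 2 fF clock inputs,
# 30 pF of clock wiring, and 15 pF switched inside the clock buffers.
c_total = clock_network_capacitance(
    n_ff=10_000, c_in_ff=2e-15, c_wire=30e-12, c_sw_buff=15e-12
)
print(f"{c_total * 1e12:.1f} pF")
```

In this made-up case the wiring and buffers account for more than two thirds of the total, which illustrates why the last two terms are the ones the buffer insertion strategy should minimize.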

Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 4: Pipelining and Timing Analysis
Wiley-Interscience and IEEE Press, January

Timing in a digital system using a single clock and flip-flops
[Figure: source and destination flip-flops separated by combinational logic, with clock skew between them. Timing diagram over two cycles shows the critical path and useful time, the critical race, setup time U, hold time H, and the cases of early and late data and clock arrival.]

Late Data Arrival Analysis (single clock, FF)
We set the time reference to $t = 0$ for the leading edge of the clock.
We set the maximum clock uncertainties to $T_L$ from the nominal time of arrival ($T_T$ for the trailing edge).
$D_{CQm}$ represents the minimal Clock-to-Q (output) delay of the Flip-Flop.
$D_{Lm}$ represents the minimal delay through the logic (as opposed to index M, where $D_{CQM}$ and $D_{LM}$ represent maximal delays).

Late Data Arrival Analysis (single clock, FF)
The latest data arrival in the next cycle is:
$t_{DLArrN} = T_L + D_{CQM} + D_{LM}$
But it should be there no later than this time:
$t_{DLArrN} \le P - T_L - U$
So that:
$P - T_L - U \ge T_L + D_{CQM} + D_{LM}$
Giving us:
$P \ge D_{CQM} + D_{LM} + U + 2 T_L$

Early Data Arrival Analysis (single clock, FF)
It is commonly misunderstood that the Flip-Flop provides edge-to-edge timing and is thus easier to use than a Latch-based system, because it does not need to be checked for fast paths in the logic (hold-time violation).
This is not true, and the simple analysis that follows demonstrates that even with a Flip-Flop design the fast paths can represent a hazard and invalidate the system operation.
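The late-arrival bound derived above, before the discussion turns to fast paths, gives the shortest usable clock period for a flip-flop pipeline stage: the maximal Clock-to-Q delay, plus the longest logic delay, plus the setup time, plus twice the clock uncertainty $T_L$. The sketch below simply evaluates it; the function name and the picosecond values are hypothetical.

```python
def min_clock_period_ff(d_cqm: float, d_lm: float, u: float, t_l: float) -> float:
    """Smallest period P satisfying P >= D_CQM + D_LM + U + 2*T_L.

    d_cqm: maximal Clock-to-Q delay of the flip-flop
    d_lm:  maximal (critical path) delay through the logic
    u:     setup time
    t_l:   clock uncertainty; it counts twice because the launching edge
           may be late by T_L while the capturing edge is early by T_L
    """
    return d_cqm + d_lm + u + 2 * t_l


# Hypothetical stage, all values in picoseconds.
p_min = min_clock_period_ff(d_cqm=150, d_lm=700, u=50, t_l=25)
print(p_min, "ps ->", round(1e6 / p_min), "MHz")
```

Only 700 of the 950 ps in this example are useful logic time; the remainder is storage-element overhead and clock uncertainty, which is the quantity the later chapters try to reduce.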

Early Data Arrival Analysis (single clock, FF)
If the clock controlling the Flip-Flop releasing the data is skewed so that it arrives early, and the clock controlling the Flip-Flop that receives this data arrives late, a hazard situation exists.
This same hazard situation is present if the data travels through a fast path in the logic.
A fast path is a path that contains very few logic blocks, or none at all.
This hazard is also referred to as a critical race (or race-through).

Early Data Arrival Analysis (single clock, FF)
The earliest data arrival in the same cycle is:
$t_{DEArrN} = -T_L + D_{CQm} + D_{Lm}$
But it should be there not before this time:
$t_{DEArrN} > T_L + H$
So that:
$-T_L + D_{CQm} + D_{Lm} > T_L + H$
Giving us limits on the fast paths:
$D_{Lm} > D_{LmB} = 2 T_L + H - D_{CQm}$

Analysis of a System using a Single Latch

System using a Single Latch
A system using a single Latch is more complex to analyze than a Flip-Flop based one. A single Latch is transparent while the clock is active, so the possibility of race-through exists.
This analysis is still much simpler than a general analysis of a system using two Latches (Master-Slave Latch based system).
Use of a single Latch represents a hazard due to the transparency of the Latch, which introduces a possibility of races in the system. Therefore, the conditions for the single-latch based system must account for critical race conditions.
The presence of the CSE delay decreases the useful time in the pipeline cycle. Therefore, in spite of the hazards introduced by such a design, the additional performance gain may well be worth the risk.

Two ways of using a latch in a single-latch-based system
[Figure: (a) one latch and one block of combinational logic per clock period; (b) two latches, each followed by half of the combinational logic, transparent on alternate halves of the period. W is the clock width.]

Late Data Arrival Analysis
In the case of a Latch, the input signal needs to arrive at least a setup time U before the trailing edge. This edge could arrive earlier. Thus, the latest arrival of data into the latch that assures reliable capture after the period P has to be:
$t_{DLArr} \le W - T_T - U + P$
Data captured at the end of the clock period could be the result of two events (whichever is later):
a) The data was ready, the clock arrived at the latest possible moment $T_L$, and the worst-case delay of the Latch, $D_{CQM}$, was incurred.
b) The clock was active and the data arrived at the last possible moment, which is a setup time U and clock skew time $T_T$ before the trailing edge of the clock.
In both cases the path through the logic was the longest path $D_{LM}$.
Under the worst scenario the data must arrive in time:
$\max\{T_L + D_{CQM},\; W - T_T - U + D_{DQM}\} + D_{LM} \le W - T_T - U + P$

Late Data Arrival Analysis
This gives a constraint for the clock speed in terms of P:
$P \ge \max\{T_L + T_T + U + D_{CQM} - W,\; D_{DQM}\} + D_{LM}$
This inequality breaks down into two inequalities:
$P \ge P_m = D_{DQM} + D_{LM}$
$P \ge D_{CQM} + D_{LM} + T_L + T_T + U - W$
The first shows the minimal bound $P_m$, which is the time to traverse the loop. The second states that, starting from the leading edge of a clock pulse, there must be time, under the worst case, before the trailing edge of the clock in the next cycle, for a signal to pass through the Latch and the logic block in time to meet the setup time constraint.
The value of $P = P_m$ determines the highest frequency of the clock.

Early Signal Arrival Analysis (Single Latch Based System)
The fastest signal should arrive at the minimum a hold time H after the latest possible arrival of the trailing edge of the same clock:
$t_{DEArrN} \ge W + T_T + H$
There are two possible scenarios:
(a) the signal was latched early and it passed through a fast path in the logic;
(b) it arrived early while the Latch was transparent and passed through the fast Latch and a fast path in the logic.
$t_{DEArrN} = \min\{t_{CEL} + D_{CQm},\; t_{DEArr} + D_{DQm}\} + D_{Lm}$

Early Signal Arrival Analysis (Single Latch Based System)
The earliest arrival of the clock, $t_{CEL}$, happens when the leading edge of the clock is skewed to arrive at $-T_L$. Thus, the condition for preventing a race in the system is expressed as:
$\min\{-T_L + D_{CQm},\; t_{DEArr} + D_{DQm}\} + D_{Lm} \ge W + T_T + H$
The earliest possible arrival of the clock, plus the clock-to-output delay of the Latch, has to occur earlier in time than the early arrival of the data, thus:
$-T_L + D_{CQm} + D_{Lm} \ge W + T_T + H$
which gives us a lower bound on the signal delay in the logic:
$D_{Lm} > D_{LmB} = W + T_T + T_L + H - D_{CQm}$

Early Signal Arrival Analysis (Single Latch Based System)
The conditions for reliable operation of a system using a single Latch are:
$P \ge P_m = D_{LM} + D_{DQM}$
$P \ge D_{LM} + D_{CQM} + T_L + T_T + U - W$
$D_{Lm} > D_{LmB} = W + T_T + T_L + H - D_{CQm}$
An increase of the clock width W may be beneficial for speed, but it increases the minimal bound for the fast paths.

Early Signal Arrival Analysis (Single Latch Based System)
The maximum useful value for W is obtained when the period P is minimal:
$W_{opt} = T_L + T_T + U + D_{CQM} - D_{DQM}$
Substituting the optimal clock width $W_{opt}$, we obtain the values for the maximal speed and the minimal signal delay in the logic which has to be maintained in order to satisfy the conditions for optimal single-latch system clocking:
$P = D_{LM} + D_{DQM}$
$D_{LmB} = 2 (T_L + T_T) + H + U + D_{CQM} - D_{CQm} - D_{DQM}$
In a single Latch system, it is possible to make the clock period P as small as the sum of the delays in the signal path: the Latch delay and the critical path delay in the logic. This can be achieved by adjusting the clock width W, while taking care of $D_{LmB}$.

Analysis of a System with two-phase Clock and two Latches in an M-S arrangement

System using two-phase clock and two latches in M-S arrangement
[Figure: latch L1, combinational logic, and latch L2 clocked by the two phases φ; widths W of each phase and the clock overlap period V are marked.]

System using two-phase clock and two latches in M-S arrangement
From the latest signal arrival analysis, several conditions can be derived.
First, we need to assure an orderly transfer into the L2 Latch (Slave) from the L1 Latch (Master), even if the signal arrived late (at the last possible moment) into the L1 Latch (Master). This analysis yields the following conditions:
[Two conditions on the clock widths W and the overlap V; transcribed as "W V + U U + M + TT + T L" and "W + W V + U + M + T T + TT", subscripts and signs not recoverable.]
These conditions assure timely arrival of the signal into the L2 Latch, thus an orderly L1-L2 transfer (from Master to Slave).

System using two-phase clock and two latches in M-S arrangement
The analysis of the latest arrival of the signal into the L1 Latch in the next cycle (critical path analysis) yields the equations:
[Three lower bounds on P; transcribed as "P + M + M", "W P + M + M + U + LM + T L + T T", and "P V + M + U + LM + T T + T L", subscripts and signs not recoverable.]
These conditions assure timely arrival of the signal that starts on the leading edge of φ, traverses the path through L2 and the longest path in the logic, and arrives before the trailing edge of φ, in time to be captured.
The last equation shows that the amount of overlap V between the two clock phases allows the system to run at greater speed.

System using two-phase clock and two latches in M-S arrangement
If we increase V we can tolerate a longer critical path $D_{LM}$. However, an increase of the clock overlap introduces a possibility of race conditions, thus requiring a fast path analysis.
High-performance systems are designed with the objective of maximizing performance. Therefore, overlapping of the clocks is commonly employed, leading to the constraint on the minimal signal delay in the logic, $D_{LmB}$:
[Transcribed as "Lm > LmB = V + H + T T + T L m"; the subtracted minimal-delay terms are not recoverable.]
The maximal amount of overlap V that can be used is:
[Transcribed as "Vmax = T + T + + U", with further maximal-delay terms not recoverable.]
For maximal performance, it is possible to adjust the clock overlap V so that the system runs at the maximal frequency.

M-S (L1-L2 latch) with non-overlapping clocks Φ obtained by locally generating the second clock phase (this arrangement is also commonly referred to as a flip-flop)
[Figure: source and destination, each a Master L1 and Slave L2 pair, separated by combinational logic. Timing diagram over two cycles shows data arrival and the critical path: L2 is driving through the logic into L1. Negative overlap between the two clock phases: no race.]

Example: Clocking in the first generation of Alpha Processor (W64)

Timing arrangement and Latches used in the first-generation Alpha processor
[Figure: (a), (b)]

Example: Clocking the Alpha Processor
[Figure: latches L1 and L2 separated by two logic stages; clock waveform with uncertainties $T_L$ and $T_T$, widths W for each phase, and period P; L1 and L2 alternate between opaque and transparent.]
Clock skew: T L = T T = ps, for both edges of the clock.
Latch L1 parameters are: clock-to-Q delay D_CQM = 5ps, D_CQm = ps, D-to-Q delay D_DQM = 6ps, setup time U = ps, hold time H = ps.
Latch L2 parameters are: D_CQM = 6ps, D_CQm = 4ps, D_DQM = 7ps, U = ps, H = 4ps.
The critical paths in the two logic sections are: LM =p and LM =7ps.

Example: Clocking the Alpha Processor
For the given clock setup, $V = 0$ and clearly $P = W_1 + W_2$.

Example: Clocking the Alpha Processor
The nominal time, $t = 0$, is set at the leading edge of the clock.
The latest allowed data arrival times into latches L1 and L2, respectively:
$t_{DLArr1} \le W_1 - T_T - U_1$
$t_{DLArr2} \le P - T_T - U_2$
The latest arrival time of data into latch L2 is limited by the time at which latch L1 releases the data into the first logic stage:
$t_{DLArr2} = \max\{t_{DLArr1} + D_{DQM1},\; T_L + D_{CQM1}\} + D_{LM1}$

20 Example: Clocking the Alpha Processor
[Figures: timing diagrams for the Alpha example.]

Digital system using a single-phase clock and dual-edge-triggered storage elements
[Figure: combinational logic with its critical path between two dual-edge-triggered CSEs. The clock has period P, pulse width W = w*P and low time P - W = (1 - w)P; T_L and T_T mark the skews of the leading and trailing clock edges.]

Two-stage dual-edge-triggered system
[Figure: two combinational logic stages separated by CSEs, annotated with the setup time U, hold time H, edge skews T_L and T_T, and the longest and shortest logic delays D_LM and D_Lm.]

Allowed clock period as a function of the clock duty cycle in the dual-edge-triggered system
[Plot: clock period P [ns] versus clock duty cycle w. The leading-edge and trailing-edge setup time requirements bound the allowed region; the optimal point is at w_Opt = 0.479.]

21 Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 5: High-Performance System Issues
Wiley-Interscience and IEEE Press, January

Absorbing Clock Uncertainties
Clocking in high-performance digital systems is most seriously affected by clock skew and clock jitter. We will treat both of them as clock uncertainties.
Trends:
- The relative portion of the clock cycle budgeted for clock uncertainty increases.
- Clock distribution becomes progressively more difficult due to load mismatch and process, voltage, and temperature variations.
- The clock uncertainties occupy an increasing portion of the cycle time; typically FO4.
The ability to reduce the impact of these uncertainties is one of the most important properties of a high-performance system.

Clock Generation and Distribution Non-idealities
Jitter
- Jitter is a temporal variation of the clock signal, manifested as uncertainty of consecutive edges of a periodic clock signal.
- It is caused by temporal noise events.
- It is manifested as cycle-to-cycle or short-term jitter, t_JS, and as long-term jitter, t_JL.
- It is mainly characteristic of the clock generation system.
Skew
- Skew is a time difference between temporally equivalent or concurrent edges of two periodic signals.
- It is caused by spatial variations in signal propagation.
- It manifests as CSE-to-CSE fluctuation of clock arrival at the same time instance.
- It is characteristic of the clock distribution system.

Clock Uncertainties
[Figure: Ref_Clock and Received Clock waveforms separated by the clock driver delay t_DRV_CLK, showing t_skew, t_jit, and the resulting clock uncertainty T_jitter+skew.]

Clock Uncertainty Absorption Using Soft Clock Edge
[Figure: output of a flip-flop in the presence of clock jitter (Partovi et al. 1996).]
A recent design of a flip-flop, controlled by a narrow, locally generated clock pulse and having a negative setup time, exhibits some degree of clock uncertainty absorption.

22 Properties of Time-Window based Flip-Flops
The relationship between the clock and the output, in the presence of clock jitter, shows how the variation in the arrival time of the clock is partly absorbed by the flip-flop, resulting in a smaller variation of the time at which the output changes.
This behavior can be explained as follows: if the capturing pulse is sufficiently wide, the flip-flop exhibits a short transparency to the data signal, resembling latch behavior, and its timing is less sensitive to the clock arrival. This short transparency period is also known as the soft clock edge.
As clock uncertainties grow in importance, the practical use of a clocked storage element in high-performance systems will depend to a large extent on its ability to absorb them.

Data-to-output characteristics in the presence of clock uncertainty
[Plot: D-Q delay [ps] versus clock arrival time [ps], with the data arrival time held constant; the nominal arrival time, D_DQm, D_DQM and t_CU are marked.]
When no clock uncertainties are present, the clock is scheduled to arrive so that the D-Q delay is smallest (D_DQm), in order to minimize the CSE overhead.

Dependence of data-to-output delay on clock arrival
[Figure: D-Q delay for early, nominal, and late clock arrival; the worst case over the clock uncertainty window t_CU is marked.]

Timing Analysis with Clock Uncertainty Absorption
The key role of a CSE is to minimize the propagation of clock uncertainty to the CSE output. The worst-case D-Q delay over the uncertainty window is:
$D_{DQM} = \max_t \left[ D_{DQ}(U_{Opt} + t) \right], \quad t \in [-t_{CU}/2,\ t_{CU}/2]$
We formulate the clock uncertainty absorption α_CU of a storage element as the portion of the total clock uncertainty not reflected at the output:
$\alpha_{CU} = \dfrac{t_{CU} - (D_{DQM} - D_{DQm})}{t_{CU}}$

Total delay versus clock uncertainty
$D_{DQm} + (1 - \alpha_{CU})\, t_{CU} + D_{LM} \le P$
[Figure: D-Q delay characteristics for two values of clock uncertainty, panels (a) and (b); panel (b) is labeled α_CU = 56%.]

Critical race in the presence of clock uncertainty
[Figure: clock buffers Buf A and Buf B drive the source and destination CSEs of a fast path; skew and jitter arise between points A and B.]
Clock uncertainties induced prior to A have no effect on fast paths. With early clock arrival at the source and late arrival at the destination, a hold time violation occurs when the fast-path delay D_CQm + D_Lm is shorter than the hold time H plus the clock uncertainty t_CU.
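The absorption figure of merit defined on this page, α_CU = (t_CU - (D_DQM - D_DQm)) / t_CU, can be evaluated numerically from any D-Q delay characteristic. The following is a minimal sketch, not the authors' procedure: the function names, the sampling approach, and the idealized "flat bottom" delay characteristic with its numbers are all assumptions made for illustration.

```python
def clock_uncertainty_absorption(dq_delay, t_cu, samples=201):
    """Return alpha_CU = (t_CU - (D_DQM - D_DQm)) / t_CU.

    dq_delay(t) gives the D-Q delay when the clock arrives t later than
    nominal (negative t means early). The clock offset is swept over the
    uncertainty window [-t_cu/2, +t_cu/2].
    """
    step = t_cu / (samples - 1)
    delays = [dq_delay(-t_cu / 2 + k * step) for k in range(samples)]
    return (t_cu - (max(delays) - min(delays))) / t_cu


def soft_edge_dq(t, d_min=100.0, flat_half_width=20.0):
    """Hypothetical idealized characteristic, values in ps (assumed).

    The delay stays at d_min while the clock offset is inside the flat
    region and then grows one-for-one with the offset.
    """
    return d_min + max(0.0, abs(t) - flat_half_width)


if __name__ == "__main__":
    for t_cu in (30.0, 100.0):
        alpha = clock_uncertainty_absorption(soft_edge_dq, t_cu)
        print(f"t_CU = {t_cu} ps, alpha_CU = {alpha:.2f}")
```

With these assumed numbers, an uncertainty window that fits inside the flat region is fully absorbed, while a wider window is only partly absorbed.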

23 Idealized D-Q delay characteristic as a function of clock arrival
[Plot: D-Q delay [ps] for early, nominal, and late clock arrival.]

Time Borrowing
We define time borrowing as:
- Dynamic time borrowing: the essential condition is that there are no hard boundaries between stages, i.e. the storage elements are transparent at the time when data arrives. It occurs in two clocking styles, level-sensitive and soft-edge clocking.
- Static time borrowing: a technique of intentionally delaying the clock by inserting delay between the clock inputs of the clocked storage elements. The clocks are scheduled to arrive so that the slower paths obtain more time to evaluate, taking time away from the faster paths. It can operate with conventional hard-edge flip-flops. It is also called opportunistic skew scheduling.

Timing of two-phase level-sensitive pipeline with time borrowing
[Figure: latches L1 to L5 clocked alternately by Φ1 and Φ2, each phase P/2 wide, with nodes a through h. A stage whose latch D-Q delay plus logic delay exceeds P/2 borrows time from the following stage; the figure marks the borrowed time of one stage and the total borrowed time at node f.]

Timing Analysis with Time Borrowing: Late Data Arrival
The arrival time of the input to the subsequent latch, t_{D,i+1}, is equal to the sum of the arrival time of the input to the preceding latch, t_{D,i}, the data-to-output delay of that latch, D_{DQ,i}, and the logic delay D_{LM,i} of the current stage:
$t_{D,i+1} = t_{D,i} + D_{DQ,i} + D_{LM,i}$
The arrival of the input to the (N+1)th latch (the input to the (N+1)th stage) is:
$t_{D,N+1} = t_{D,1} + \sum_{i=1}^{N} \left( D_{DQ,i} + D_{LM,i} \right)$   (5.9)
(Assumption: all logic blocks are used in time borrowing.)

We assume that after N stages the pipeline produces data at the same point in the transparency period of clock phase Φ1 at which the input data was acquired in the first clock cycle. Therefore, t_{D,N+1} - t_{D,1} is equal to N clock periods P:
$t_{D,N+1} - t_{D,1} = N P$   (5.10)
Combining Eq. (5.9) and Eq. (5.10), we obtain the requirement for the minimum clock period under late data arrival:
$P = \dfrac{1}{N} \sum_{i=1}^{N} \left( D_{DQ,i} + D_{Logic,i} \right)$   (5.11)
This shows that the minimum clock cycle time of the pipeline is not determined by the delay of the slowest stage in the pipeline. It is rather the average delay of the logic and latches through all stages. Note that Eq. (5.11) is valid only if the data arrive at each latch during the time it is transparent:
$\dfrac{i-1}{2} P - D_{DQ,i} + D_{CQ,i} < t_{D,i} < \dfrac{i}{2} P - U_i, \quad i \in [1..N]$   (5.12)
where it is assumed that the first leading edge of Φ1 occurs at time zero.
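The late-arrival result above says that, with time borrowing, the minimum clock period is the average of the per-stage latch and logic delays rather than the maximum. The sketch below only illustrates that arithmetic under the stated assumption that data always arrives while each latch is transparent; the function names and the delay values (in ps) are made up for the example.

```python
def min_period_with_borrowing(latch_dq, logic_delay):
    """Average of (D_DQ,i + D_Logic,i) over all stages.

    Valid only if data arrives at every latch while it is transparent.
    """
    n = len(logic_delay)
    if n == 0 or len(latch_dq) != n:
        raise ValueError("need one latch delay per logic stage")
    return sum(d + l for d, l in zip(latch_dq, logic_delay)) / n


def min_period_hard_boundaries(latch_dq, logic_delay):
    """For comparison: without borrowing, the slowest stage sets the period."""
    return max(d + l for d, l in zip(latch_dq, logic_delay))


if __name__ == "__main__":
    dq = [50.0, 50.0, 50.0]          # assumed latch D-Q delays, ps
    logic = [400.0, 250.0, 250.0]    # assumed stage logic delays, ps
    print(min_period_with_borrowing(dq, logic))   # average over stages
    print(min_period_hard_boundaries(dq, logic))  # slowest stage
```

In this made-up case the slow first stage borrows from the two faster stages that follow it, so the average rather than the 450 ps stage limits the period.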

24 Fast-path hazard
[Figure: fast path through one stage between latches L3 and L4, nodes d, e, f, g, clocked by Φ1 and Φ2.]

Early Data Arrival
In fast paths, the analysis must assume that the data arrives at the earliest possible time, i.e. it must disregard the effects of time borrowing.
[Figure: waveforms at nodes d, e, f. With the earliest arrival and no time borrowing, a hold time violation occurs; with late arrival due to time borrowing, there is no hold time violation.]

Time borrowing and signal loops
[Figure: loop through a mux and an ALU.]
The timing of signals in loops should be treated separately. If the signal propagating around the loop arrives later with each cycle, it will eventually result in a setup time violation. Any signal loop that borrows time from itself will eventually cause a timing violation.

Static Time Borrowing
Opportunistic skew scheduling scheme
[Figure: the reference clock reaches the CSEs through inserted delays; the slower stage receives the time borrowed from the faster stage.]
Advantages:
- It can operate with conventional flip-flops.
- It places fewer constraints on the circuit design, allowing longer critical paths where necessary.
Disadvantages:
- It increases the complexity of the clock distribution system.
- It is hard to control the inserted delays over process, supply, and temperature variations.
- The analysis of clock skew is also complicated in this asymmetric clock distribution network.
While these difficulties make the technique impractical on a large scale, it is very useful in localized critical paths where every improvement directly increases the system clock rate.

25 Time Borrowing and Clock Uncertainty
Both clock uncertainty absorption and dynamic time borrowing use the property of the CSE to reduce the effect of an indeterminate data-to-clock delay on the data-to-output delay:
- for clock uncertainty absorption, the indeterminate data-to-clock delay occurs due to the uncertainty of clock arrival;
- for time borrowing, it takes place due to uncertain data arrival.
In both cases, the transparency of the storage elements is used to suppress the input uncertainty (either that of the clock or of the data).
Clock uncertainty absorption and time borrowing are essentially equivalent properties:
- If a clocking strategy allows dynamic time borrowing, it will also be capable of absorbing the clock uncertainty.
- The soft clock edge property of flip-flops, which accounts for the clock uncertainty absorption, can also be used for time borrowing between the stages.

Level-Sensitive Clocking
Clock uncertainty immunity in the pipeline with level-sensitive clocking
[Figure: Φ1 and Φ2 waveforms with edge uncertainties T_L and T_T, pulse width W, overlap V, borrowed time t_B, and clock uncertainty t_CU. The annotations give the setup time requirement (slow paths), the hold time requirement (fast paths), and the condition that keeps the latch transparent when data arrives.]

Impact of clock uncertainties on the critical path in the pipeline with time borrowing
[Figure: three stages, each split into logic a and logic b, separated by latches L1 to L7 clocked alternately by Φ1 and Φ2; clock edges 1 to 8 are marked with early and late arrivals, and the critical path runs through the latches and logic blocks of all stages.]

Effects of clock uncertainties on a time-borrowing system
- The margins for time borrowing decrease.
- The pipeline absorbs the uncertainties for data that arrives during the transparency period of the latch.
- The effect of the uncertainties is reduced to an average uncertainty over all stages in the path.

Soft-Edge Sensitive Clocking

26 Time borrowing with uncertainty-absorbing clocked storage elements
[Figure: two logic stages between three CSEs. For each stage the nominal and actual logic delays are shown, together with the nominal data arrival, the actual arrival due to time borrowing, the borrowed time t_B, and the clock uncertainty t_CU. The figure is annotated with the term (t_B + t_CU)[1 - α_CU].]

27 Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 6: Low-Energy System Issues
Wiley-Interscience and IEEE Press, January

Switching Energy
- The clock is the highest-activity signal.
- Switching energy is dominant.
$E_{switching} = \sum_{i=1}^{N} \alpha_{0\text{-}1}(i) \cdot C_i \cdot V_{swing}(i) \cdot V_{DD}$
where
α_0-1(i): probability of a 0-1 transition at node i
C_i: total switched capacitance at node i
V_swing(i): voltage swing at node i
V_DD: supply voltage
N: number of nodes
Energy is best reduced by scaling down V_DD, but this also means performance degradation.

Pulsed Designs Scale the Best with V_DD
[Plots (a) and (b): impact of V_DD on (a) delay and (b) internal race immunity (.5 µm, light load). (Markovic et al.), Copyright IEEE]
(Delay is normalized to FO4 at its respective V_DD. FO4 as defined in this example increases as V_DD is scaled down, which means performance degradation.)

Other Energy Reduction Techniques
Low-Swing Circuit Techniques
- Reduced-swing drivers
- CSE redesign
- N-only CSEs with Low-Vcc
Clock Gating
- Global
- Local
Dual-Edge Triggering
- Latch-mux
- Pulsed-latch
- Flip-flop

Low-Swing Clocking, Option #1: Clock Driver Re-design
[Figure: clock driver for half-swing clocking (Kojima et al. 1995), Copyright 1995 IEEE. The driver stacks its PMOS and NMOS output stages between V_DD and GND and produces two half-swing clocks.]
5% power reduction with half-swing clock (minus some penalty in clock drivers).
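The switching-energy sum at the start of this chapter is straightforward to evaluate once each node's activity, capacitance, and swing are known. The sketch below is only an illustration of that sum: the function name, the tuple layout, and all node values are assumptions, not data from the slides.

```python
def switching_energy(nodes, vdd):
    """Sum alpha_i * C_i * V_swing_i * V_DD over all nodes.

    nodes: iterable of (alpha_0_to_1, capacitance_F, v_swing_V) tuples.
    vdd:   supply voltage in volts.
    Returns energy in joules.
    """
    return sum(alpha * cap * v_swing * vdd for alpha, cap, v_swing in nodes)


if __name__ == "__main__":
    vdd = 1.8
    # Hypothetical nodes: a clock node that switches every cycle and a
    # data node with low activity, both with full swing.
    nodes = [(1.0, 2e-12, vdd), (0.1, 5e-12, vdd)]
    print(switching_energy(nodes, vdd))

    # Scaling the supply (and the full swing with it) by one half.
    half = [(a, c, v / 2) for a, c, v in nodes]
    print(switching_energy(half, vdd / 2))
```

Because full-swing nodes contribute a term proportional to V_DD squared, halving the supply in this made-up example cuts the switching energy to one quarter, which is why supply scaling is the most effective knob despite its performance cost.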

28 Low-Swing Clocking, Option #2: CSE Re-design
[Figure: reduced clock-swing flip-flop (Kawaguchi and Sakurai, 1998), Copyright 1998 IEEE, with clock driver options (a), (b), and (c) that generate reduced-swing clocks. Annotations: the PMOS does not fully turn off; the well is biased at V_well > V_DD.]

Low-Swing Clocking, Option #3: N-only CSEs
N-only clocked transistors, M-S Latch Example (N and N improve pull-up on M)
[Figure: N-only clocked M-S latch with transistors N1 to N4.]

Clock Gating, Option #1: Global Clock Gating
Used to save clocking energy when data activity is low.
[Figure: (a) nongated clock circuit, in which the load signal time-multiplexes the register input (no gating); (b) gated clock circuit, in which the enable signal gates the clock globally. (Kitahara et al. 1998), Copyright 1998 IEEE]

Clock Gating, Option #2: Local Clock Gating
[Figure: data-transition look-ahead latch (Nogawa and Ohtomo, 1998), Copyright 1998 IEEE, consisting of a data-transition look-ahead circuit, clock control, and a pulse generator.]

Local Clock Gating, Another Example
[Figure: conditional capture flip-flop (Kong et al.), Copyright IEEE]

29 Dual-Edge Triggering, Option #1: Latch-Mux
Used to save clocking energy regardless of data activity!
[Figure: dual-edge-triggered latch-mux design.]

DET Latch-Mux: Circuit Example
[Figure: dual-edge-triggered latch-mux circuit (Llopis and Sachdev, 1996), Copyright 1996 IEEE]

Dual-Edge Triggering, Option #2: Pulsed-Latch
[Figure: dual-edge-triggered pulsed-latch design with two pulse generators.]

DET Pulsed-Latch: Circuit Examples
[Figure: pulsed latch, (a) single-edge-triggered; (b) dual-edge-triggered.]

Dual-Edge Triggered Flip-Flop
[Figure: dual-edge-triggered flip-flop design.]

30 DET Flip-Flop: Circuit Example
[Figure: DET symmetric pulse-generator flip-flop; the first stage is a pulse generator (PG) and the second stage is a latch, with internal nodes X and Y.]

Clock Distribution
[Figure: H-tree clock distribution network driven from a PLL, with storage elements at the leaves.]

Clocking Power: SET vs. DET
[Plot: clocking power in single- and dual-edge-triggered systems, shown as the ratio of clocking power in the two systems versus the ratio of CSE clock capacitance to clock wire capacitance; regions where single edge is better and where double edge is better are marked.]

Glitch Robust Design
Average Glitching Energy in CSEs
[Plot: comparison of average glitching energy in CSEs (Markovic et al.), Copyright IEEE]

Comparison of CSE glitching and CSE switching
[Plot: glitching energy as a percentage of switching energy in representative CSEs, showing the greatest glitch sensitivity of the gated designs.]

31 Summary
Energy is best reduced by V_DD scaling
- Penalty in performance
Reducing swing only reduces E
- Still a penalty in performance
Clock gating
- Reduces E at low activity
- No penalty in performance if gating is outside the critical path
Dual-Edge Triggering
- Reduces E ideally by 2x
- Small or no performance degradation

32 Digital System Clocking: High-Performance and Low-Power Aspects
Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic
Chapter 7: Simulation Techniques
Wiley-Interscience and IEEE Press, January

Simulation Techniques
- Results and conclusions about the performance of different CSEs depend significantly on the simulation setup and evaluation environment.
- A CSE is just one of the elements in the pipeline, and has to be sized so as to achieve optimum performance for a given output load.
- The CSE output loads vary widely across the processor core, depending on the level of parallelism in each unit and on whether the CSE is on the critical path or not.
- In modern datapaths, CSEs experience heavy load due to the parallel execution units and the increase in interconnect capacitance.
- It is the performance of these CSEs, on the critical path, that has the highest impact on the choice of the processor cycle time.
- In high-speed designs, the design and evaluation of CSEs is focused on the elements on the critical path, and performance comparisons often implicitly assume such conditions.
- Many CSEs are placed on non-critical paths with relatively light load. While these CSEs do not directly impact the performance of the processor, careful design of these elements can significantly reduce the energy consumption and alleviate the clock distribution problems.
- The purpose of this presentation is to recommend simulation techniques that designers can use to evaluate the performance of CSEs, depending on the desired application.
- We try to build an understanding of the issues involved in creating a simulation environment for the CSE, so that designers can build their own setups tailored to the specific application.
- There is no universal setup that is good for all CSE applications.

The Method of Logical Effort
Contribution of Bart Zeydel, PhD candidate, ACSEL group, University of California Davis

Logical Effort Components
Logical Effort Basics
[Figure: inverter template with transistor sizes W_p/L_p and W_n/L_n, input In and output Out, next to a copy of arbitrary size.]
For a gate of arbitrary size α relative to the template:
- Input capacitance increases to α · C_template.
- Resistance decreases to R_template / α.

33 Logical Effort Input Capacitance
$C_{ox} = \varepsilon_{ox} / t_{ox}$
$L_{eff} = L - 2x_d$
$C_{gate} = C_{ox} W L_{eff} = C_{ox} W L (1 - 2x_d / L)$
$\kappa_1 = C_{ox} (1 - 2x_d / L)$
$C_{template}(inv) = \kappa_1 W_n L_n + \kappa_1 W_p L_p$
$C_{in}(inv) = \alpha \cdot C_{template}(inv)$
Input capacitance is the sum of the gate capacitances. Using LE, α denotes size in multiples of the template.

Logical Effort Resistance
$R = (\rho / t)(L / W)$ ohms $= R_{sheet} (L / W)$ ohms
$R_{channel} = R_{sheet} (L / W)$ ohms
$R_{sheet} = 1 / (\mu C_{ox} (V_{gs} - V_t))$ ohms, where µ = surface mobility
$\kappa_2 = C_{ox} (V_{gs} - V_t)$
$R_{channel} = L / (\kappa_2 \mu W)$ ohms
$R_{up}(inv) = R_{up\text{-}template}(inv) / \alpha$
$R_{down}(inv) = R_{down\text{-}template}(inv) / \alpha$
Resistance is dependent on transistor size and process. Larger values of α result in lower resistance.

Logical Effort Parasitics
C_ja: junction capacitance per µm²
C_jp: periphery capacitance per µm
W: width of diffusion region, µm
L_diff: length of diffusion region, µm
$C_j = C_{ja} W L_{diff} + C_{jp} (2W + 2L_{diff})$
Larger values of α result in increased parasitic capacitance; however, R decreases at the same rate, thus the delay is constant.

Model for CMOS logic gate
[Figure: RC model of a CMOS logic gate.]

LE Delay derivation for step input
$I_{sat} = \dfrac{\mu C_{ox}}{2} \dfrac{W}{L} (V_G - V_t)^2$
$I_{linear} = \mu C_{ox} \dfrac{W}{L} \left[ (V_G - V_t) V_{out} - \dfrac{V_{out}^2}{2} \right]$
$t_f = t_{f(sat)} + t_{f(linear)} = \int_{V_{DD} - V_t}^{V_{DD}} \dfrac{C_{out}\, dV_{out}}{I_{sat}} + \int_{V_{DD}/2}^{V_{DD} - V_t} \dfrac{C_{out}\, dV_{out}}{I_{linear}}$

LE Model derivation (cont.)
With a step input, V_G = V_DD:
$t_{f(sat)} = \dfrac{2 C_{out} V_t}{\mu C_{ox} \frac{W}{L} (V_{DD} - V_t)^2}$
$t_{f(linear)} = \dfrac{C_{out}}{\mu C_{ox} \frac{W}{L} (V_{DD} - V_t)} \ln\!\left( \dfrac{3 V_{DD} - 4 V_t}{V_{DD}} \right)$
Substituting, we obtain:
$t_f = \dfrac{2 C_{out}}{\mu C_{ox} \frac{W}{L} (V_{DD} - V_t)} \left[ \dfrac{V_t}{V_{DD} - V_t} + \dfrac{1}{2} \ln\!\left( \dfrac{3 V_{DD} - 4 V_t}{V_{DD}} \right) \right]$
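The step-input derivation on this page splits the output fall into a saturation part and a linear part. Assuming the square-law currents I_sat = (β/2)(V_DD - V_t)² and I_linear = β[(V_DD - V_t)V_out - V_out²/2], with β standing for µ·C_ox·W/L, and assuming the delay is measured until the output reaches V_DD/2, the closed form is t_f = 2C_out/(β(V_DD - V_t)) · [V_t/(V_DD - V_t) + ½ ln((3V_DD - 4V_t)/V_DD)]. The sketch below checks that closed form against direct numerical integration; the shorthand β, the function names, and the device values are illustrative assumptions.

```python
import math


def fall_time_closed_form(c_out, beta, vdd, vt):
    """Closed-form step-input fall time from V_DD down to V_DD/2."""
    a = vdd - vt
    return (2.0 * c_out / (beta * a)) * (
        vt / a + 0.5 * math.log((3.0 * vdd - 4.0 * vt) / vdd)
    )


def fall_time_numeric(c_out, beta, vdd, vt, steps=20000):
    """Integrate C dV / I over the saturation and linear regions."""
    a = vdd - vt
    # Saturation: V_out falls from V_DD to V_DD - V_t at constant current.
    i_sat = 0.5 * beta * a * a
    t_sat = c_out * vt / i_sat
    # Linear region: V_out falls from V_DD - V_t to V_DD/2 (midpoint rule).
    lo, hi = vdd / 2.0, a
    dv = (hi - lo) / steps
    t_lin = 0.0
    for k in range(steps):
        v = lo + (k + 0.5) * dv
        i_lin = beta * (a * v - 0.5 * v * v)
        t_lin += c_out * dv / i_lin
    return t_sat + t_lin


if __name__ == "__main__":
    # Assumed values: 10 fF load, beta = 100 uA/V^2, 2.5 V supply, 0.5 V threshold.
    args = (1e-14, 1e-4, 2.5, 0.5)
    print(fall_time_closed_form(*args), fall_time_numeric(*args))
```

The check requires V_t < V_DD/2 so that the linear region is entered before the V_DD/2 crossing.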

34 Logical Effort Gate Delay Model (Harris, Sutherland, Sproull)
$d = \kappa R (C_{out} + C_{parasitic})$
With $R = R_{template} / \alpha$, $C_{in} = \alpha\, C_{template}$ and $C_{parasitic} = \alpha\, C_{parasitic\text{-}template}$:
$d = (\kappa R_{template} C_{template}) \dfrac{C_{out}}{C_{in}} + \kappa R_{template} C_{parasitic\text{-}template}$
Defining
$\tau = \kappa R_{inv} C_{inv}$
$g = \dfrac{R_{template} C_{template}}{R_{inv} C_{inv}}$
$h = \dfrac{C_{out}}{C_{in}}$
$p = \dfrac{R_{template} C_{parasitic\text{-}template}}{R_{inv} C_{inv}}$
gives
$d = \tau (g h + p)$

LE Path Delay
[Figure: three gates in series, each characterized by its input capacitance C_in, logical effort g, and parasitic delay p, driving C_out.]
Stage effort: $f_i = g_i h_i$
$D = \left[ (g_1 h_1 + p_1) + (g_2 h_2 + p_2) + (g_3 h_3 + p_3) \right] \tau$
Any n-stage path can be described using Logical Effort.

LE Path Delay Optimization
$h_1 = \dfrac{C_1}{C_{in}}, \quad h_2 = \dfrac{C_2}{C_1}, \quad h_3 = \dfrac{C_{out}}{C_2}, \quad h_1 h_2 h_3 = \dfrac{C_{out}}{C_{in}} = H$
$D = \left[ \left( g_1 \dfrac{C_1}{C_{in}} + p_1 \right) + \left( g_2 \dfrac{C_2}{C_1} + p_2 \right) + \left( g_3 \dfrac{C_{out}}{C_2} + p_3 \right) \right] \tau$

LE Path Delay Optimization (cont.)
By definition, C_in and C_out are fixed. Solve for C_1 and C_2 by setting the partial derivatives of D to zero:
$\dfrac{g_1}{C_{in}} = \dfrac{g_2 C_2}{C_1^2} \;\Rightarrow\; g_1 \dfrac{C_1}{C_{in}} = g_2 \dfrac{C_2}{C_1} \;\Rightarrow\; f_1 = f_2$
$\dfrac{g_2}{C_1} = \dfrac{g_3 C_{out}}{C_2^2} \;\Rightarrow\; g_2 \dfrac{C_2}{C_1} = g_3 \dfrac{C_{out}}{C_2} \;\Rightarrow\; f_2 = f_3$
The minimum occurs when the stage efforts are equal: $f_1 = f_2 = f_3$.

Simplified Path Optimization (Harris, Sutherland, Sproull)
We want the effort of each stage to be equal. To quickly solve for the stage effort:
$F = f_1 f_2 f_3 \cdots f_n = g_1 h_1\, g_2 h_2\, g_3 h_3 \cdots g_n h_n$
$H = h_1 h_2 h_3 \cdots h_n = \dfrac{C_{out}}{C_{in}}$
$G = g_1 g_2 g_3 \cdots g_n$
$F = G H$
$\hat{f} = F^{1/n}$

LE Delay Model for Path Optimization
- Input slope effects on g and p are ignored.
- Parasitics are assumed independent of input slope, and therefore have no effect on the optimization result.
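The simplified path optimization above (F = G·H, stage effort f = F^(1/n), then sizing from the output back toward the input) can be sketched in a few lines. This is a minimal illustration under the same assumptions as the slides, namely no branching and parasitics that do not affect sizing; the function names and the example values are made up.

```python
import math


def size_path(g, c_in, c_out):
    """Return (stage effort, list of gate input capacitances).

    g:     logical efforts of the n gates, first gate first.
    c_in:  fixed input capacitance of the path.
    c_out: fixed load capacitance.
    """
    n = len(g)
    path_effort = math.prod(g) * c_out / c_in   # F = G * H
    f_hat = path_effort ** (1.0 / n)            # equal stage effort
    caps = [0.0] * n
    load = c_out
    # Work backwards: f = g_i * C_load_i / C_in_i for every stage.
    for i in range(n - 1, -1, -1):
        caps[i] = g[i] * load / f_hat
        load = caps[i]
    return f_hat, caps


def path_delay(g, p, c_in, c_out):
    """Minimum path delay in units of tau: n * f_hat + sum of parasitics."""
    f_hat, _ = size_path(g, c_in, c_out)
    return len(g) * f_hat + sum(p)


if __name__ == "__main__":
    f_hat, caps = size_path([1.0, 4.0 / 3.0, 5.0 / 3.0], c_in=3.0, c_out=60.0)
    print(f_hat, caps)
```

A useful sanity check on any such sizing is that the computed input capacitance of the first gate comes back equal to the given C_in.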

35 LE Slopes Assumed for Path Optimization
- Slopes are drawn for each gate assuming a step input.
- Slopes are equal for each gate at completion of optimization.
- f_i is constant, thus RC is constant (ignoring parasitics).

LE Optimization Assumptions
- gh is constant when using LE optimization.
- Slope variations are due to differences in p and are not accounted for in optimization.

LE Slope Variation Effects
- Input slope degradation causes output slope degradation.
- Different gate types also cause output slope variation.

Logical Effort for Multi-path
[Figure: a common input C_in drives two paths. Path a consists of Gate 0, Gate 1, Gate 2 and drives C_out1; path b consists of Gate 3, Gate 4, Gate 5 and drives C_out2.]
From LE, $f_0 = f_1 = f_2$ and $f_3 = f_4 = f_5$.
Minimum delay occurs when $D_a = D_b$, or $F_a = F_b$ (ignoring parasitics):
$f_0 f_1 f_2 = f_3 f_4 f_5$

Logical Effort for Multi-path (cont.)
$D_a = \left[ (g_0 h_0 + p_0) + (g_1 h_1 + p_1) + (g_2 h_2 + p_2) \right] \tau$
$D_b = \left[ (g_3 h_3 + p_3) + (g_4 h_4 + p_4) + (g_5 h_5 + p_5) \right] \tau$
With C_a and C_b the input capacitances of the two paths:
$C_a = \dfrac{g_0 g_1 g_2\, C_{out1}}{f_0 f_1 f_2}, \qquad C_b = \dfrac{g_3 g_4 g_5\, C_{out2}}{f_3 f_4 f_5}$
Branching is the ratio of total capacitance to on-path capacitance:
$b_a = \dfrac{C_a + C_b}{C_a}, \qquad b_b = \dfrac{C_a + C_b}{C_b}$
Substituting C_a and C_b, and assuming that minimum delay occurs when $f_0 f_1 f_2 = f_3 f_4 f_5$:
$b_a = \dfrac{g_0 g_1 g_2\, C_{out1} + g_3 g_4 g_5\, C_{out2}}{g_0 g_1 g_2\, C_{out1}}$
Similarly,
$b_b = \dfrac{g_3 g_4 g_5\, C_{out2} + g_0 g_1 g_2\, C_{out1}}{g_3 g_4 g_5\, C_{out2}}$
Branching = 2 when $G_a = G_b$ and $C_{out1} = C_{out2}$.
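The branching result for two parallel paths follows from equating the two path efforts and taking the ratio of total to on-path input capacitance. The sketch below evaluates it for given path logical efforts and loads; it assumes, as the slides do, that parasitics are ignored, and the function name and example numbers are illustrative.

```python
def branching_factors(g_path_a, c_out_a, g_path_b, c_out_b):
    """Return (b_a, b_b) for two paths that share one input.

    g_path_a, g_path_b: products of the logical efforts along each path.
    c_out_a, c_out_b:   load capacitances of each path.
    With equal path efforts, each path's input capacitance is
    proportional to G * C_out, so the branching follows directly.
    """
    weight_a = g_path_a * c_out_a
    weight_b = g_path_b * c_out_b
    total = weight_a + weight_b
    return total / weight_a, total / weight_b


if __name__ == "__main__":
    # Identical paths: each sees half of the total input capacitance.
    print(branching_factors(2.0, 10.0, 2.0, 10.0))
    # Path b is three times "heavier", so it takes most of the input.
    print(branching_factors(1.0, 10.0, 3.0, 10.0))
```

Since the two on-path fractions must add up to the whole input capacitance, 1/b_a + 1/b_b equals 1 for any inputs, which is a convenient check.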

36 Complex Multi-path Optimization
If each path has internal branching, b_a and b_b are as follows:
$b_a = \dfrac{g_0 b_0\, g_1 b_1\, g_2 b_2\, C_{out1} + g_3 b_3\, g_4 b_4\, g_5 b_5\, C_{out2}}{g_0 b_0\, g_1 b_1\, g_2 b_2\, C_{out1}}$
$b_b = \dfrac{g_3 b_3\, g_4 b_4\, g_5 b_5\, C_{out2} + g_0 b_0\, g_1 b_1\, g_2 b_2\, C_{out1}}{g_3 b_3\, g_4 b_4\, g_5 b_5\, C_{out2}}$
Note: this solution and the previous solution differ from those described in the LE book (all simplify by ignoring parasitics).

Slopes in LE
- LE optimization assumes a step input for each gate; hence, slope variations are not accounted for.
- Spice characterization is invalid for the step-input derivation of Logical Effort, as g and p relate to non-practical values.
- Fortunately, Logical Effort can be derived assuming equal slopes if we assume no dependence of g and p on input slope variation.
- These improved g and p values can be obtained from Spice characterization.

LE Characterization Setup for Static Gates
[Figure: chain of identical gates driving a variable load; t_LH, t_HL, and energy are measured.]

LE Characterization Setup for Dynamic Gates
[Figure: chain of gates driving a variable load; t_HL and energy are measured.]

L.E. for Adder Gates (from Mathew Sanu / D. Harris)
[Plot: delay (ps) versus fanout for Inverter, Static CM, Dyn PG, Dyn CM, and Mux.]
- Logical effort parameters are obtained from simulation for standard cells.
- Define the logical effort g of the inverter = 1.
- The delay of complex gates can be defined with respect to the inverter.
From M. Sanu, Intel Advanced Microprocessor Research Laboratories

Normalized L.E. (from Mathew Sanu)
[Table: logical effort (g) and parasitics (P_inv) for the gate types Inverter, Dyn. Nand, Dyn. CM, Dyn. CM-4N, Static CM, Mux, and XOR. Logical effort and parasitic delay are normalized to those of the inverter.]

37 Static CMOS Gates: Delay Graphs
Dynamic CMOS: Delay Graphs
[Plots: delay versus fanout for static CMOS gates (inverter, NAND, NOR, AOI, OAI, transmission-gate XOR and MUX variants) and for dynamic CMOS gates. From V. Oklobdzija, R. Krishnamurthy, ARITH-16 Tutorial]

Simple logic gates: (a) reference static inverter, (b) two-input static NAND gate, (c) two-input static NOR gate, (d) domino-style inverter
[Figure: transistor sizing for each gate. Relative to the reference inverter input capacitance, the NAND input capacitance is 4/3 and the NOR input capacitance is 5/3, so g_inv = 1, g_nand = 4/3, and g_nor = 5/3.]

Logical effort of a gate driving a transmission gate: (a) reference inverter, (b) slow-data input, (c) fast-data input sizing
[Figure: inverter followed by a transmission gate, sized for each case.]

Environment Setup
Several studies used different simulation setups. We present some of the important concepts and a global simulation setup framework that can be further tuned for the particular intended application:
- The setup for the comparison of the CSEs in (Stojanovic and Oklobdzija 1999) used a single load size. The load was chosen to resemble the typical situation in a moderately to heavily loaded critical path in a processor with a lot of parallelism.
- All the CSEs were sized to achieve optimum data-to-output delay for the given output load, while driven from fixed-size inverters.
- In most practical situations the CSEs are designed in a discrete set of sizes, each optimized for a particular load.
- It is useful to examine the performance of the CSE for a range of loads around the load for which the CSE was optimized.
- This technique is illustrated in (Nikolic and Oklobdzija 1999). Different CSEs are initially sized to drive a fixed load, and the load is then varied.
- The delay of a CSE exhibits a linear dependence on the load. The slope of the delay curve illustrates the logical effort of the driving stage of the CSE, and the zero-load crossing illustrates the parasitic delay of the driving stage together with the delay of the inner stages of the CSE.
- This is not the optimal behavior of the CSE delay curve, but it is the best that can be achieved when there are only a few CSE sizes available in the library.
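Because the delay of a fixed-size CSE is linear in its load, the slope and the zero-load intercept described above can be extracted from a load sweep with an ordinary least-squares line fit. The sketch below shows that extraction on synthetic data; the function name and the sample points are assumptions, and delays are taken to be already normalized to τ.

```python
def fit_effort_and_parasitic(fanouts, delays):
    """Least-squares fit of delay = slope * fanout + intercept.

    The slope estimates the logical effort of the driving stage; the
    intercept estimates the parasitic delay of the driving stage plus
    the delay of the inner stages (both in units of tau).
    """
    n = len(fanouts)
    if n < 2 or len(delays) != n:
        raise ValueError("need at least two (fanout, delay) points")
    mean_h = sum(fanouts) / n
    mean_d = sum(delays) / n
    sxx = sum((h - mean_h) ** 2 for h in fanouts)
    sxy = sum((h - mean_h) * (d - mean_d) for h, d in zip(fanouts, delays))
    slope = sxy / sxx
    return slope, mean_d - slope * mean_h


if __name__ == "__main__":
    # Synthetic sweep generated from an assumed g = 4/3 and offset of 2.
    fanouts = [1.0, 2.0, 4.0, 8.0]
    delays = [4.0 / 3.0 * h + 2.0 for h in fanouts]
    print(fit_effort_and_parasitic(fanouts, delays))
```

On real simulation data the points will not fall exactly on a line, and a poor fit is itself a hint that the CSE was re-optimized between load points rather than held at a fixed size.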

38 Environment etup In case where E can be re-optimized for each particular load, further speedup can be achieved since the effort can be shared between stages rather than resting solely on the output stage. This approach was illustrated in (Heo and Asanovic ). ontrary to their conclusions, in case when a general performance of a E needs to be assessed, the proper approach is to optimize the E for the most important application that determines the performance of the whole system, not the most frequent application. In high-speed systems the most important are the elements on the critical path which is typically moderately-to-heavy loaded due to branching to parallel execution units and wire capacitance. The small number of critical paths in a processor does not decrease their importance since it is their delay that determines the clock rate of the whole system. The performance of the large number of the lightly loaded Es that are placed off of the critical path is of concern only if it can be traded for energy savings. The simulation approach should attempt to resemble the actual datapath environment. The number of logic stages in a E and their complexity highly depend on particular circuit implementation, leading to differences in logical effort, parasitic delay and energy consumption. Every E structure needs to be optimized to drive the load with best possible effort delay. Nov. 4, igital ystem locking: Oklobdzija, tojanovic, Markovic, Nedovic 7 FO4 slope The size of the data input is fixed for all Es in order to exclude the impact of pipeline logic on E comparison. ata signal has signal slope identical to that of a FO4 inverter, which is the case in a welldesigned pipeline. General simulation setup In =const FO4 In =const E FO4 FO4 slope Buffer Load ( ) # stages Out (# stages ) Nov. 4, igital ystem locking: Oklobdzija, tojanovic, Markovic, Nedovic 8 Out This setting is typical in designs where delay and energy requirements are balanced. 
In the high-speed design methodologies of Intel, Sun Microsystems, and the former Digital Corp., the FO3 inverter metric is more common than FO4 because of the more aggressive design style.

The size of the clocked transistors is set to the size needed in order not to compromise the speed of the whole structure. A direct tradeoff exists between CSE delay and clock energy (the size of the clocked transistors), because some of the clocked transistors are always on the critical path of the CSE. Increasing the size of clocked transistors on a critical path yields diminishing returns, since the data input is fixed. Depending on the CSE topology, some structures can trade delay for clocked transistor size more efficiently than others, so we allow this to happen to a certain extent. Our goal is to examine CSEs that are used on a critical path, hence the assumption that the designer might be willing to spend a bit more clock power to achieve better performance. Differences in clock load among devices illustrate potential drawbacks in terms of clock power requirements, and serve as one of the performance metrics. Clock inputs have a signal slope identical to that of an FO4 inverter. This can be changed depending on the clock distribution design methodology.

The question of how to compare differential and single-ended structures has always been one of the key issues among those characterizing and designing CSEs. Differential and single-ended structures should not be compared directly with each other, because of the overhead that single-ended structures incur to generate the complementary output. We do not require that a single-ended structure generate both the true and the complementary value at the output. Worst-case analysis requires that the CSE generate the output that has the worse data-to-output delay. However, it is also beneficial to measure both the D-Q and the D-Q̄ delay.
Any imbalance between the two can lead to large delay savings in cases where proper logic polarity manipulation in the stages preceding or following the CSE can change the polarity requirement of the CSE, and hence its data-to-output delay. The load model always consists of several inverters in a chain, to avoid the delay error caused by Miller capacitance effects from the fast-switching load back to the driver.

The logical effort framework offers an analogy between the CSE and a simple logic gate. At light load, the delay of a logic gate is dominated by its parasitic delay, i.e. self-loading. At high load, effort delay becomes the dominant factor. Similarly, at light load, the delay of a CSE with a large number of stages is entirely dominated by parasitic delay. At high load, however, more stages are beneficial in reducing the effort delay, which then dominates over parasitic delay. Therefore, the performance of a CSE is best assessed by evaluating it over the range of output loads of interest for the particular application. CSE evaluation can be performed either with a representative critical-path load, or with a set of loads, in which case the CSE has to be re-optimized for each load setting.

Additional buffering in simulation test bed

[Figure: test bed with constant-size data and clock inputs, a CSE of K stages (logical effort G_CSE, path effort F_CSE = f^K), a buffer of N-K inverters (g_INV = 1, F_INV = f^(N-K)), and the output load.]

Depending on the choice of the output load size, some CSE structures with an inherently small number of stages and high logical effort may require additional buffering in order to achieve the best effort delay.
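The trade-off between parasitic delay at light load and effort delay at heavy load can be illustrated with the standard logical-effort path delay, D = N (G H)^(1/N) + P. The sketch below is a rough model under stated assumptions: the function names, the logical effort G = 2, the base parasitic delay of 3, and the inverter parasitic delay of 1 (all in units of the technology time constant) are hypothetical values chosen for illustration, not figures from the slides.

```python
def path_delay(g_total, fanout, n_stages, parasitic):
    """Logical-effort path delay with equal effort in every stage."""
    stage_effort = (g_total * fanout) ** (1.0 / n_stages)
    return n_stages * stage_effort + parasitic


def best_stage_count(g_total, fanout, base_stages, base_parasitic,
                     p_inv=1.0, max_extra=4):
    """Try appending 0..max_extra inverters (g = 1) after the CSE.

    Returns (delay, total_stages) for the fastest option. Output polarity
    is ignored here; an odd number of added inverters flips it.
    """
    candidates = []
    for extra in range(max_extra + 1):
        n = base_stages + extra
        delay = path_delay(g_total, fanout, n, base_parasitic + extra * p_inv)
        candidates.append((delay, n))
    return min(candidates)


# Hypothetical two-stage CSE: at light load no buffer is needed,
# at heavy load extra stages reduce the total delay.
for fanout in (4, 16, 64):
    delay, stages = best_stage_count(2.0, fanout, base_stages=2, base_parasitic=3.0)
    print(f"fanout {fanout:3d}: {stages} stages, delay {delay:.2f}")
```

With these assumed numbers the two-stage structure is fastest at a fanout of 4, while at a fanout of 64 the model prefers two additional inverters.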

Additional buffering in simulation test bed (continued)

For each CSE we need to find the optimal effort per stage and the number of stages needed to drive the required load. Starting from the total electrical fan-out $H$, the optimal number of stages $N$ is obtained by rounding the logarithm of the total effort of the path (assuming $g_{INV}$ is unity):

$$H = \frac{C_{OUT}}{C_{IN}}, \qquad N = \mathrm{round}\left(\log_4\left(G_{CSE}\,H\right)\right)$$

The logarithm is of base 4 because a stage effort of 4 is the target for optimal speed. Once the integer number of stages is obtained, the updated value of the stage effort is found from:

$$f = \sqrt[N]{G_{CSE}\,H}$$

After the stage effort is obtained, the internal stages of the CSE have to be resized for the new stage effort, and so do the external inverters, if there are any. This sizing approach is optimal even when no additional inverters are required, since it serves to distribute the effort between the internal stages of the CSE.

HLFF Sizing Example

In this problem we observe the change in the minimum data-to-output (D-Q or D-Q̄) delay as the output load of the CSE increases. Before investigating the effect of different loads on the sizing of the HLFF, we show how the logical effort can be calculated for a given sizing. It is relatively easy to recognize that the HLFF is built from a three-input static NAND gate as the first stage and a domino-like three-input NAND in the second stage. Minor variations from the standard static NAND sizing for equal logical effort on all inputs are needed to speed up the data input and enable the first stage to evaluate before the transparency window closes. A similar situation occurs in the second stage. This HLFF sizing example also illustrates the application of logical effort to skewed gates (gates in which one output transition is faster than the other) and to gates with keepers.

[Figure: HLFF schematic annotated with transistor widths and the load, with the logical effort g and stage effort f computed for each of the two stages.]

The critical path of the HLFF is exercised with a "0-to-1" transition at the data input. The first stage of the HLFF is a skewed NAND gate.

The logical effort calculation of the second stage is slightly more complicated because of the keeper inverter pair. The keeper sinks a portion of the current that the PMOS transistor sources to node Q, which can be taken into account as a negative conductance. This negative conductance is accounted for by subtracting the conductance of the NMOS transistor of the shaded keeper inverter from the conductance of the driving PMOS transistor. For the particular load, the efforts per stage were calculated to be 4.7 and 4.5, which is near the optimum value of 4, indicating that this example sizing is nearly optimal.

The sizing in the example above is somewhat simplified: because the short-channel stack effect has not been taken into account, the logical effort values for the NMOS transistor stack are somewhat pessimistic. Once the logical effort of each stage is known, it can be used to adjust the sizing of each stage as the load is increased or decreased. The alternative is to use one of the automated circuit optimizers. This is not recommended as the initial method, because it is essential that the designer gets to know the circuit through manual sizing and logical effort estimation. Doing so builds intuition about the circuit and the ability to verify optimizer results.
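The stage-count rule from the additional-buffering slides, N = round(log4(G·H)) followed by f = (G·H)^(1/N), can be sketched in a few lines. The function name `size_path` and the `min_stages` parameter are assumptions introduced here: `min_stages` simply keeps N from dropping below the number of stages the CSE already contains. Note that Python's `round` rounds exact halves to the nearest even integer, which may differ from the rounding a designer would apply by hand.

```python
import math


def size_path(g_cse, c_out, c_in, min_stages=1):
    """Return (N, f): number of stages and effort per stage.

    g_cse      total logical effort of the CSE (inverters assumed g = 1)
    c_out      load capacitance
    c_in       capacitance of the fixed data input
    min_stages stages already present inside the CSE (hypothetical helper
               parameter, not part of the original formula)
    """
    fanout = c_out / c_in                      # H = C_OUT / C_IN
    path_effort = g_cse * fanout               # G * H
    n_stages = max(min_stages, round(math.log(path_effort, 4)))
    stage_effort = path_effort ** (1.0 / n_stages)
    return n_stages, stage_effort


# Hypothetical example: unit logical effort and an electrical fanout of 64.
n, f = size_path(g_cse=1.0, c_out=64.0, c_in=1.0)
print(f"N = {n}, stage effort = {f:.2f}")
```

When N exceeds the number of stages inside the CSE, the difference is the number of external inverters to add, each sized for the same stage effort.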

HLFF Delay (normalized to FO4 inverter delay) vs. Fanout for Different HLFF Cell Sizes

[Table: HLFF delay, normalized to the FO4 inverter delay, versus fanout for three cell sizes (Small-A, Medium-B, Large-C), with the load size and number of stages of each.]

There is only one optimal solution for each load size.

Sizing versus load, HLFF example: linear scale

[Figure: delay [FO4] versus fanout for sizes A, B, and C, together with the "best sizing vs. load" curve.]

According to logical effort theory, the optimal delay versus fan-out curve should have a logarithmic shape, which indeed holds for the "best sizing vs. load" curve.

Sizing versus load, HLFF example: log4 scale

[Figure: delay [FO4] versus log4(fanout) for sizes A and B, together with the "best sizing vs. load" curve.]

Similarly, the optimal delay is a linear function of the logarithm of the electrical effort (fanout). The logarithmic fanout scale makes it easy to see whether the stage effort is properly determined. Delay is a linear function of the stage effort and the number of stages. The logarithm of the electrical effort approximately indicates the needed number of stages; if the delay checks out to be a multiple of the number of stages and the FO4 delay, then the optimal effort per stage has been chosen.

Case of Modified SAFF

M-SAFF Delay vs. Fanout for Different M-SAFF Cell Sizes

[Table: M-SAFF delay versus fanout for three cell sizes (Small-A, Medium-B, Large-C), with the optimal load size and number of stages of each.]

In this case we again observe the minimum data-to-output (D-Q) delay as the output load of the CSE increases. The performance of three different sizing solutions is shown versus the electrical fanout, normalized to the delay of the FO4 inverter.

Modified SAFF

The sizing is done in a way similar to that described in the HLFF example.
We recognize that the logical effort of the input stage is very small, better than that of an inverter, because of the small input capacitance. This implies that the sizing changes will mostly be located in the output stage, since the input stage can accommodate larger load variations without the need for resizing. While it was relatively easy in the case of the HLFF to find different sizes that perform better at certain loads, this was not so in the case of the M-SAFF. The small logical effort of the whole structure enables it to cover a huge range of loads with a single size while achieving relatively good performance. This is the case with the structure of size B in the table. Size A is only slightly better than size B, and only for a very light load of FO4. Size B then takes the lead all the way up to FO64, after which an additional inverter is needed to prevent excessive delay.

Sizing versus load, M-SAFF example: (a) linear, (b) log4 scale

[Figure: delay [FO4] versus (a) fanout and (b) log4(fanout) for sizes A and B, together with the "best sizing vs. load" curve.]

Energy Measurements

So far we have been concerned with the performance aspects of the simulation setup, but it is equally important to prepare the simulation environment so that the energy parameters of the CSE are measured accurately. We only need to set the measurements to capture the energy for each of the four possible binary transitions. With these values, accurate average energy estimates can be made based on the statistics of the incoming data. More formal methods, using state transition diagrams (Zyuban and Kogge 1999), can be used to evaluate exactly the effect of regular transitions and glitches on the total switching energy of the CSE. It is essential to provide separate supply voltages for different stages of the CSE in order to measure the different energies.

Automating the Simulations

The delay vs. load CSE evaluation described in the examples can be automated. Perl is suggested as one of the most convenient scripting languages today. For each CSE, we need to determine:
- the logical effort of every stage based on its topology (e.g. for two NAND-like stages followed by an inverter stage it would be 4/3, 4/3, 1), or, better yet, exact logical-effort values obtained from simulation;
- the FO4 delay and other data for the given technology process.

The product of the logical efforts of all stages should equal the total logical effort of the CSE. After the total logical effort is found, the optimal number of stages and the updated stage effort can be calculated.
With the stage effort and the logical efforts obtained from the topology of the CSE, taking the data input to be of fixed size, and assuming that the clock is on (i.e. treating the structure as a cascade of logic gates), transistor sizes for every stage can be calculated, progressing from the data input to the final load in the simulation setup. When a library of CSEs is created, a pre-simulation should be run for each environment parameter setup. This includes various process corners, supply voltages, etc., to determine the FO4 inverter slope and set that value as the rise/fall time of the signals that drive data and clock into the CSE.

CSE flow simulation

Flow:
- Create a library of CSEs for a few fixed loads.
- Select a device from the library and sweep the load.
- Sweep D-Clk and measure D-Q and Clk-Q.
- Find min(D-Q or D-Q̄).
- Plot min(D-Q or D-Q̄) vs. load for each device in the library.

Notes:
- For each device in the library, the D-Q (D-Q̄) delay and the Clk-Q (Clk-Q̄) delay are stored in each run, while the delay between the input data edge and the clock edge (setup time) is decreased.
- The script should check for setup/hold time failure (i.e. when the CSE fails to pass the input value to the output). This is typically detected by a large Clk-Q delay (i.e. the measurement target occurred in the next cycle), or by a failure to measure the delay if only one cycle is simulated.
- The script automatically finds the minimum delay point at all the specified outputs.
- The whole procedure is repeated for a range of loads to obtain the best sizing curve.

Appendix A of this book contains an example script written in Perl that can serve as the basis of a more sophisticated tool for CSE characterization. It also contains example SPICE decks for the HLFF and M-SAFF used in this example. These files are a good start for a designer who wants to evaluate various CSE topologies.
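The post-processing step of the flow above (sweep D-Clk, discard failed captures, pick the minimum D-Q point) can be sketched as follows. This is not the Perl script from the book's appendix: the function `min_d_to_q`, its input format, and the sample numbers are hypothetical, and the measurements are assumed to have been extracted from simulator output already. A missing Clk-Q value or one that reaches the clock period is treated as a setup failure.

```python
def min_d_to_q(measurements, clock_period):
    """Find the minimum D-Q delay over a D-Clk sweep.

    measurements: list of (d_to_clk, clk_to_q) pairs; clk_to_q is None when
                  the simulator could not measure the output transition.
    Returns (d_to_q, d_to_clk) at the optimum, or None if every run failed.
    """
    best = None
    for d_to_clk, clk_to_q in measurements:
        # Failed capture: no output transition, or it landed in the next cycle.
        if clk_to_q is None or clk_to_q >= clock_period:
            continue
        d_to_q = d_to_clk + clk_to_q
        if best is None or d_to_q < best[0]:
            best = (d_to_q, d_to_clk)
    return best


# Hypothetical sweep in picoseconds: data arrives closer and closer to the clock
# edge, Clk-Q grows, and the last run fails to capture.
sweep = [(100, 50), (60, 50), (40, 52), (20, 60), (10, 85), (0, None)]
print(min_d_to_q(sweep, clock_period=1000))
```

Repeating this for every device in the library and every load gives the points from which the best-sizing curve is drawn.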

Chapter 8: State-of-the-Art Clocked Storage Elements in CMOS Technology

Outline:
- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Transmission gate latches

- (a) Simplest implementation: only 4 transistors; dynamic storage while the transmission gate is off; susceptible to noise.
- (b) Basic static latch: pull-up/pull-down keeper; conflict at the storage node whenever new data is written.
- (c) Complete implementation: feedback turned off when writing to the latch; no conflict; larger clock load.

Transmission Gate Master-Slave Latch (MSL)

[Figure: master transmission gate latch followed by slave transmission gate latch. MSL with unprotected input (Gerosa et al. 1994), Copyright 1994 IEEE.]

Transmission Gate MS Latch (continued)

[Figure: MSL with input gate isolation, providing protection from input noise (Markovic et al.), Copyright IEEE.]

Noise Robustness of MS Latch

Sources of noise affecting the latch state node (Partovi in Chandrakasan et al.), Copyright IEEE:
- noise on the input from a distant driver
- leakage
- α-particle and cosmic rays
- unrelated signal coupling
- power supply ripple

Clocked CMOS (C²MOS) Latch

[Figure: transmission gate latch with gate isolation (dynamic) and C²MOS latch (dynamic).]

Clocked CMOS (C²MOS) MS Latch

[Figure: C²MOS MS latch with state-keeping feedbacks outside the D-to-Q path (Suzuki et al. 1973), Copyright 1973 IEEE.]

MS Latches: Comparison

[Figure panels: delay (D-Q) and race immunity (R); energy per cycle.]

C²MOS uses larger clock transistors:
- Smaller delay and race immunity (8% of MSL)
- Higher energy consumption (.4x more than MSL)

Outline:
- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Hybrid Latch Flip-Flop (HLFF)

- Transparent to D only when Clk and its delayed, inverted copy are both high
- Limited clock uncertainty absorption
- Small delay
- Small clock load

(Partovi et al. 1996), Copyright 1996 IEEE

Semidynamic Flip-Flop (SDFF)

- Dynamic-style first stage
- Fast, small clock load, logic embedding
- Consumes energy for evaluation whenever D = 1
- Dynamic-to-static latch in second stage
- Static hazard

(Klass 1998), Copyright 1998 IEEE

Static hazard in SDFF

- If D = Q = 1 in the previous cycle, a race between two internal signals causes Q to falsely switch to 0, generating a glitch
- Also seen in HLFF

Sense-amplifier Flip-Flop (SAFF)

- When Clk = 0, S̄ and R̄ are high, and Q and Q̄ are unchanged
- At the rising edge of Clk, the sense amplifier in the 1st stage generates a low pulse on either S̄ or R̄, based on which of D and D̄ is higher
- The other node (R̄ or S̄) is driven high, preventing further changes
- The latch captures the low level of S̄ or R̄ and updates the output
- Both NAND gates must switch sequentially to change Q and Q̄

Original design (Montanaro et al. 1996), Copyright 1996 IEEE

SAFF: Evolution of 2nd Stage Latch

- all-n-MOS push-pull (Gieseke et al. 99)
- complementary push-pull (Oklobdzija and Stojanovic)
- complementary push-pull with gated keeper (Nikolic et al. 1999)

Modified Sense Amplifier Flip-Flop (MSAFF)

- The sense amplifier in the 1st stage generates a low pulse on either S̄ or R̄, based on which of D and D̄ is higher
- Symmetric latch in the 2nd stage: outputs are simultaneously pulled to Vdd and Gnd, so it is fast
- Large drive capability: the second-stage latch can be small
- The keeper in the latch is active only when there is no change: no conflict

(Nikolic et al. 1999), Copyright 1999 IEEE

Flip-Flops and MS Latches: Delay Comparison (D-Q)

[Figure: delay [FO4] of MSL, C²MOS, HLFF, SDFF, SAFF, and M-SAFF. CSE delay comparison (.8 µm, high load).]

- MS latches are slow: positive setup time, two latches in the critical path
- SAFF is slow: it waits for one output to switch the other
- The fastest structures are simple flip-flops with negative setup time

Flip-Flops: Timing Comparisons with Voltage Scaling

- Delay comparison: relative delay reduces with supply voltage due to the reduction of body effect
- Internal race immunity comparison: small race immunity, usually not a concern in critical paths

(.5 µm, light load)

Flip-Flops and MS Latches: Energy Comparison

[Figure: energy [fJ] of MSL, C²MOS, HLFF, SDFF, SAFF, and M-SAFF, broken down into external clock, external data, internal clock, and internal non-clock energy. CSE energy breakdown (.8 µm, 5% activity, high load).]

- In MS latches, internal nodes change only when the input changes
- SAFF, M-SAFF: very small clock load, small 2nd-stage latch
- Most energy is consumed in HLFF and SDFF, which have a pulse generator and high internal switching activity

Flip-Flops and MS Latches: Energy Comparisons

[Figure: energy comparison (.5 µm, light load) (Markovic et al.), Copyright IEEE.]

Outline:
- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

Gated Transmission Gate MS Latch

- Concept: inhibit clock switching when the new D equals Q
- comp = D XNOR Q
- If comp = 0 (D ≠ Q), the circuit works as an MSL
- If comp = 1 (D = Q), the internal clocks are held: latches closed, no output change, no internal power

[Figure: gated MSL (Markovic et al.), Copyright IEEE.]

Gated TG MS Latch: Timing and Energy

[Figure panels: setup time (U) and hold time (H) comparison with MSL; energy comparison with MSL.]

- Increased setup time in the gated MSL due to inclusion of the comparator in the critical path: slower than the conventional MSL
- Smaller energy per transition if the switching activity of D is sufficiently low
- For higher switching activity, the comparator and clock generator dominate the energy consumption

Data-transition look-ahead latch

[Figure: DTLA latch with pulse generator, DTLA circuit, and clock control (Nogawa and Ohtomo 1998), Copyright 1998 IEEE.]

- Pulsed latch in which the generation of clock pulses is gated by an XOR-based data-transition look-ahead (DTLA) circuit
- When the DTLA circuit detects D ≠ Q, the circuit operates as a conventional pulsed latch
- If D = Q, the clock pulse is suppressed: no output change or energy consumption in the latch
- The XOR circuit and clock control are in the critical path: large setup time and D-Q delay
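The dependence of the gated-latch advantage on data activity can be illustrated by weighting the energy of the four binary data transitions by their probabilities. The sketch below assumes equally likely ones and zeros, so that with switching activity α each switching transition (0→1, 1→0) has probability α/2 and each non-switching one (0→0, 1→1) has probability (1-α)/2. The function name and all energy values are hypothetical placeholders, not measured data.

```python
def average_energy(transition_energy, alpha):
    """Average energy per cycle from the four per-transition energies.

    transition_energy: dict with keys "00", "01", "10", "11" (previous and new
                       data value), e.g. in femtojoules.
    alpha:             probability that the data input changes in a cycle.
    """
    p_switch = alpha / 2.0          # each of 0->1 and 1->0
    p_hold = (1.0 - alpha) / 2.0    # each of 0->0 and 1->1
    return (p_switch * (transition_energy["01"] + transition_energy["10"])
            + p_hold * (transition_energy["00"] + transition_energy["11"]))


# Hypothetical numbers: the gated latch is cheap when data is idle but pays for
# its comparator and clock control when data switches.
conventional = {"00": 10.0, "11": 10.0, "01": 30.0, "10": 30.0}
gated = {"00": 2.0, "11": 2.0, "01": 45.0, "10": 45.0}

for alpha in (0.1, 0.3, 0.5):
    e_conv = average_energy(conventional, alpha)
    e_gated = average_energy(gated, alpha)
    print(f"alpha = {alpha:.1f}: conventional {e_conv:.1f}, gated {e_gated:.1f}")
```

With these assumed values the gated latch wins at low activity and loses at high activity, which is the qualitative behavior the slides describe.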

DTLA-L: Analysis of Energy Consumption

[Equations: energy per cycle of the MSL and of the DTLA-L with and without clock gating, written in terms of the input switching activity α, the idle energy, and the clock, latch, internal, external, and pulse-generator energies, with the pulse-generator energy divided among the N DTLA-Ls that share it.]

- α: input switching activity
- The pulse generator is shared among N DTLA-Ls

[Figure: energy comparison of DTLA-L and MSL, showing the regions where E(DTLA-L) < E(MSL) and E(DTLA-L) > E(MSL).]

DTLA-L is more energy-efficient than MSL when N exceeds a threshold and α < .5.

Clock-on-demand PL

- Pulsed latch in which the generation of clock pulses is gated by an XNOR-based DTLA circuit
- If XNOR = 0 (D ≠ Q), a clock pulse starts at the clock edge and ends after Q has changed to D
- If D = Q, XNOR = 1 and no pulse is generated: no output change or energy consumption in the latch
- Data-transition look-ahead
- The pulse generator includes clock control
- The pulse generator cannot be shared among multiple PLs

(Hamada et al. 1999), Copyright 1999 IEEE

Energy-Efficient Pulse Generator in COD-PL

- Straightforward implementation with CMOS gates: the internal node int switches in each cycle, which is energy-inefficient
- Compound AND-NOR gate: energy-efficient

Impact of circuit sizing on the energy efficiency of COD-PL

COD-PL is more effective in high-speed sizing due to large clock transistors (Markovic et al.), Copyright IEEE

Conditional capture flip-flop

- First stage: pulse generator with internal clock gating
- The internal set and reset nodes can switch low only when the input differs from the stored output; otherwise they stay high and no energy is consumed
- Second stage: pass-gate implementation of the M-SAFF latch (Oklobdzija, Stojanovic)
- No setup time degradation due to clock gating

(Kong et al.), Copyright IEEE

Comparison of latches and flip-flops with local clock gating: Timing

- Delay comparison: delay is relatively constant with supply voltage; latches with clock gating have a very large delay due to their large setup time
- Internal race immunity comparison: generally R(CCFF) < R(MSL) < R(gated MSL); COD-PL has low race immunity due to its wide clock pulse

(Markovic et al.), Copyright IEEE

Comparison of latches and flip-flops with local clock gating: Energy, EDP

- Energy comparison: latches with a gated clock consume less energy than MSL if α is low
- Energy-Delay Product comparison: G-MSL is best at low α, DTLA-L is best at intermediate α, and the conventional MSL is best at high α

(Markovic et al.), Copyright IEEE

Outline:
- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

N-only clocked latches

[Figure: (a) N-MSL, (b) N-FF, (c) N-PL, (d) N-PPL, with a pulse generator for (b)-(d). Derived from (a) conventional TG MSL, (b) pulsed latch, (c) conventional PL, (d) push-pull PL.]

- Concept: bring the clock only to n-MOS transistors, to allow reduced clock swing without conflict with partially turned-off p-MOS transistors
- Reduced clock swing reduces clocking energy with some penalty in performance
- The clock is always in the critical path, as its edge signals when to change the output

Low clock swing CSEs comparison: energy and delay

[Figure: energy per cycle (normalized) versus data-to-Q delay for (a) high-Vdd and (b) low-swing clocking.]

- Full-swing: PL preferred for high speed, MSL preferred for low energy
- Low-swing clock: N-FF preferred for high speed, N-PPL preferred for low energy

Effect of clock noise on low-swing clock latch delay

[Figure: D-Q delay degradation versus noise on the low-swing clock for N-L, N-PPL, and N-FF.]

- All latches fail once the clock noise exceeds a certain fraction of the clock voltage
- N-FF gives the best clock noise rejection

Outline:
- Master-Slave Latches
- Flip-Flops
- CSEs with local clock gating
- Low clock swing
- Dual-edge triggering

DET Latch-mux circuit (DET-LM)

- Pass-gate latches: one transparent when Clk = 1, one transparent when Clk = 0
- Pass-gate multiplexer that selects the output of the opaque latch

(Llopis and Sachdev 1996), Copyright 1996 IEEE

C²MOS Latch-mux (C²MOS-LM)

- C²MOS latches: one transparent when Clk = 1, one transparent when Clk = 0
- Multiplexer: two C²MOS inverters that propagate the output of the opaque latch
- Large clock transistors shared between the latches and the multiplexer

(Gago et al. 1993), Copyright 1993 IEEE

Pulsed-latch (DET-PL)

[Figure: pulsed latch, (a) single-edge, (b) dual-edge triggered.]

- Pulse generator: the latch is transparent to D only for a short time after both edges of the clock
- DET PL consumes a lot of energy for four clocked pass gates
- To improve speed, modified from the original design (Strollo et al. 1999), which implemented an n-MOS-only pass gate and a p-MOS-only keeper

DET Symmetric pulse generator flip-flop (SPGFF)

- Two pulse generators in the 1st stage: X active at the rising edge of the clock, Y active at the falling edge of the clock
- X and Y alternately precharge and evaluate
- At any moment, one of X and Y keeps the value of the data sampled at the most recent clock edge; the other is precharged high
- Pulses at X and Y have the same width as the clock
- The second stage is a simple NAND gate: no need for a latch

SET vs. DET: Delay comparison

[Figure: delay [FO4] of SET and DET versions of MSL/LM, C²MOS-LM, PL, and SPGFF.]

- Latch-MUXs have two equally critical paths, somewhat shorter than that of the MSL
- DET PL is more complex, adding more capacitance to the critical path compared to SET PL
- SPGFF has a short domino-like critical path: fastest

SET vs. DET: Power consumption comparison

[Figure: power [µW], clock and non-clock components, for SET MSL, DET LM, SET PL, DET PL, SET C²MOS, DET C²MOS, and SPGFF (.8 µm, 5MHz for SET, 5MHz for DET, high load).]

- LMs benefit from a clever implementation of the latch-mux structure with clock transistor sharing
- DET PL adds extra high-activity capacitance compared to SET PL
- SPGFF power consumption is in the middle, mainly due to the alternate switching of nodes X and Y

SET vs. DET: EDP comparison

[Figure: EDP of single-edge and double-edge versions of MSL/LM, C²MOS, PL, and SPGFF (.8 µm, 5MHz for SET, 5MHz for DET, high load).]

- Latch-MUXs have similar or better EDP than their SET counterparts
- DET PL exhibits worse delay and energy compared to SET PL, due to its more complex design
- SPGFF is fastest with moderate energy consumption: lowest EDP
- EDP(SPGFF) < EDP(LM) < EDP(PL)

Chapter 9: Microprocessor Examples

Microprocessor Examples

- Clocking for Intel Microprocessors
  - IA-32 Pentium Pro
  - First IA-64 Microprocessor
  - Pentium 4
- Sun Microsystems UltraSPARC-III Clocking
  - Clocking and CSEs
- Alpha Clocking: A Historical Overview
  - Clocking and CSEs
- IBM Microprocessors
  - Level-Sensitive Scan Design
  - Examples of CSEs

Intel Microprocessor Features

[Table: issue date, clock speed, pipeline stages, transistors, cache (I/D/L2), die size, IC process, and max power for the Pentium II, Pentium III, and Pentium 4. Source: Microprocessor Report Journal.]

IA-32 Pentium Pro

[Figure: clock distribution network with deskewing circuit: external clock, feedback clock generator, delay lines, deskew control, left spine, right spine, and core (Geannopoulos and Dai 1998), Copyright 1998 IEEE.]

[Figure: delay shift register (Geannopoulos and Dai 1998), Copyright 1998 IEEE.]

IA-32 Pentium Pro

[Figure: phase detector with bandwidth control and "left leads" / "right leads" outputs (Geannopoulos and Dai 1998), Copyright 1998 IEEE.]

First IA-64 Microprocessor

[Figure: clock distribution topology: PLL, core clock, reference clock, and deskew clusters (Rusu and Tam), Copyright IEEE.]

[Figure: deskew buffer architecture: global clock input, TAP interface, reference clock, phase detector, deskew buffer, digital filter, control FSM, deskew settings, regional clock grid, and regional feedback clock (Rusu and Tam), Copyright IEEE.]

[Figure: digitally controlled delay line with enable, output, and delay control register (Rusu and Tam), Copyright IEEE.]

[Figure: simulated regional clock-grid skew (Rusu and Tam), Copyright IEEE.]

[Figure: measured regional clock skew (Rusu and Tam), Copyright IEEE.]

Pentium 4

[Figure: core and I/O clock generation: core PLL and I/O PLL with clock distribution, clock enable generation and distribution, outbound clocks and deskew state machine, inbound input buffers and latching, strobe glitch protection and detection, and the connection to the Test Access Port (Kurd et al.), Copyright IEEE.]

[Figure: logical diagram of core clock distribution: PLL, binary tree of clock repeaters, domain buffers with phase detectors, local clock macros, and sequential elements (Kurd et al.), Copyright IEEE.]

[Figure: example of local clock buffers generating clocks of various frequencies, phases, and types (slow, medium, and fast frequency; pulse and normal clocks), with enable, stretch, and adjustable delay buffer controls (Kurd et al.), Copyright IEEE.]

Microprocessor Examples

- Clocking for Intel Microprocessors
  - IA-32 Pentium Pro
  - First IA-64 Microprocessor
  - Pentium 4
- Sun Microsystems UltraSPARC-III Clocking
  - Clocking and CSEs
- Alpha Clocking: A Historical Overview
  - Clocking and CSEs
- IBM Microprocessors
  - Level-Sensitive Scan Design
  - Examples of CSEs

UltraSPARC Family Characteristics

[Table: year, architecture (SPARC V9, 4-issue for all three), die size, number of transistors, clock frequency, supply voltage, process, metal layers (Al), and power consumption for the UltraSPARC-I, UltraSPARC-II, and UltraSPARC-III.]

UltraSPARC-III: Clocking

[Figure: clock distribution delay in UltraSPARC-III (Heald et al.), Copyright IEEE.]

UltraSPARC-III: Clock Storage Elements

[Figure: semidynamic flip-flop (Klass 1998), Copyright 1998 IEEE.]

[Figure: (a) logic embedding in a semidynamic flip-flop; (b) two-input XOR function (Klass 1998), Copyright 1998 IEEE.]

[Figure: dynamic versions of the semidynamic flip-flop: (a) single-ended; (b) differential (Klass 1998), Copyright 1998 IEEE.]

[Figure: UltraSPARC-III flip-flop (Heald et al.), Copyright IEEE.]

Microprocessor Examples

- Clocking for Intel Microprocessors
  - IA-32 Pentium Pro
  - First IA-64 Microprocessor
  - Pentium 4
- Sun Microsystems UltraSPARC-III Clocking
  - Clocking and CSEs
- Alpha Clocking: A Historical Overview
  - Clocking and CSEs
- IBM Microprocessors
  - Level-Sensitive Scan Design
  - Examples of CSEs

Alpha Microprocessor Features

[Table: number of transistors [M], die size [mm²], process, supply [V], power [W], frequency [MHz], and gates per cycle for four Alpha microprocessor generations.]

Alpha Microprocessors: Clocking
Alpha microprocessor final clock driver location on the clock grid: (a) 21064, (b) 21164, (c) 21264
21064 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE
21164 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE
21264 clock hierarchy (Gronowski et al. 1998), Copyright 1998 IEEE. Figure labels: ext. clk, PLL, GCLK grid, box grid, local clk, cond. local clk
21264 clock skew (Gronowski et al. 1998), Copyright 1998 IEEE
21364 major clock domains (Xanthopoulos et al.), Copyright. Figure labels: NCLK, DLL, GCLK grid

Alpha Microprocessors: Clocking
21364, NCLK clock skew (Xanthopoulos et al.), Copyright IEEE

Alpha µP: Clock Storage Elements
21064 modified TSPC latches (Gronowski et al. 1998), Copyright 1998
21164: (a) phase-a latch, (b) phase-b latch (Gronowski et al. 1998), Copyright 1998 IEEE
Embedding of logic into a latch: (a) 21064 TSPC latch, one level of logic; (b) 21164 latch, two levels of logic. (Gronowski et al. 1998), Copyright 1998 IEEE
21264 flip-flop (Gronowski et al. 1998), Copyright 1998 IEEE

Alpha Microprocessors: Timing
Critical Path Definition and Criteria
- Identify common clock, and
- Maximize
- Minimize cycle + UT
Race Definition and Criteria
- Identify common clock, and
- Minimize
- Maximize H +
Critical-path and race analysis for clock buffering and conditioning (Gronowski et al. 1998), Copyright 1998 IEEE
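The critical-path and race criteria on the timing slide can be illustrated with a small check. What follows is a minimal sketch of the generic textbook max-delay (setup) and min-delay (hold) constraints, not necessarily the exact expressions used by Gronowski et al.; the `Path` fields, the function names, and the example delays are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """Delays in picoseconds for one launch-to-capture path (hypothetical model)."""
    clk_to_q_max: float  # slowest clock-to-output delay of the launching element
    clk_to_q_min: float  # fastest clock-to-output delay of the launching element
    logic_max: float     # longest combinational delay between the two elements
    logic_min: float     # shortest combinational delay between the two elements
    setup: float         # setup time of the capturing element
    hold: float          # hold time of the capturing element


def critical_path_slack(p, cycle, skew):
    """Max-delay check: data must settle a setup time before the capturing edge.

    Skew is applied pessimistically, as if the capturing clock arrived early.
    """
    return cycle - skew - (p.clk_to_q_max + p.logic_max + p.setup)


def race_slack(p, skew):
    """Min-delay check: new data must not arrive before the hold time has passed.

    Skew is applied pessimistically, as if the capturing clock arrived late.
    """
    return (p.clk_to_q_min + p.logic_min) - (p.hold + skew)


# Assumed example numbers, not taken from any of the processors discussed.
path = Path(clk_to_q_max=120, clk_to_q_min=80, logic_max=700,
            logic_min=30, setup=50, hold=40)
print(critical_path_slack(path, cycle=1000, skew=60))  # positive: cycle time is met
print(race_slack(path, skew=60))                       # positive: no race
```

A negative critical-path slack means the cycle time must grow or the slow path must shrink; a negative race slack cannot be fixed by slowing the clock, which is why fast paths are padded or the skew between the two clock branches is reduced.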

Digital System Clocking: High-Performance and Low-Power Aspects. Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic. Wiley-Interscience and IEEE Press, January 2003.

More information

Designing Sequential Logic Circuits

Designing Sequential Logic Circuits igital Integrated Circuits (83-313) Lecture 5: esigning Sequential Logic Circuits Semester B, 2016-17 Lecturer: r. Adam Teman TAs: Itamar Levi, Robert Giterman 26 April 2017 isclaimer: This course was

More information

Clocking Issues: Distribution, Energy

Clocking Issues: Distribution, Energy EE M216A.:. Fall 2010 Lecture 12 Clocking Issues: istribution, Energy Prof. ejan Marković ee216a@gmail.com Clock istribution Goals: eliver clock to all memory elements with acceptable skew eliver clock

More information

Skew-Tolerant Circuit Design

Skew-Tolerant Circuit Design Skew-Tolerant Circuit Design David Harris David_Harris@hmc.edu December, 2000 Harvey Mudd College Claremont, CA Outline Introduction Skew-Tolerant Circuits Traditional Domino Circuits Skew-Tolerant Domino

More information

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan,

More information

Lecture 9: Sequential Logic Circuits. Reading: CH 7

Lecture 9: Sequential Logic Circuits. Reading: CH 7 Lecture 9: Sequential Logic Circuits Reading: CH 7 Sequential Logic FSM (Finite-state machine) Inputs Current State COMBINATIONAL LOGIC Registers Outputs = f(current, inputs) Next state 2 storage mechanisms

More information

EECS 427 Lecture 14: Timing Readings: EECS 427 F09 Lecture Reminders

EECS 427 Lecture 14: Timing Readings: EECS 427 F09 Lecture Reminders EECS 427 Lecture 14: Timing Readings: 10.1-10.3 EECS 427 F09 Lecture 14 1 Reminders CA assignments Please submit CA6 by tomorrow noon CA7 is due in a week Seminar by Prof. Bora Nikolic SRAM variability

More information

The Linear-Feedback Shift Register

The Linear-Feedback Shift Register EECS 141 S02 Timing Project 2: A Random Number Generator R R R S 0 S 1 S 2 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 0 1 1 0 0 1 1 0 0 The Linear-Feedback Shift Register 1 Project Goal Design a 4-bit LFSR SPEED, SPEED,

More information

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits igital Integrated Circuits A esign Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic esigning i Sequential Logic Circuits November 2002 Sequential Logic Inputs Current State COMBINATIONAL

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective igital Integrated Circuits A esign Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic esigning Sequential Logic Circuits November 2002 Naming Conventions In our text: a latch is level sensitive

More information

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM Mark McDermott Electrical and Computer Engineering The University of Texas at Austin 9/27/18 VLSI-1 Class Notes Why Clocking?

More information

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. 1 2 Introduction Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements. Defines the precise instants when the circuit is allowed to change

More information

Issues on Timing and Clocking

Issues on Timing and Clocking ECE152B TC 1 Issues on Timing and Clocking X Combinational Logic Z... clock clock clock period ECE152B TC 2 Latch and Flip-Flop L CK CK 1 L1 1 L2 2 CK CK CK ECE152B TC 3 Clocking X Combinational Logic...

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective igital Integrated Circuits A esign Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic esigning Sequential Logic Circuits November 2002 Sequential Logic Inputs Current State COMBINATIONAL LOGIC

More information

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació.

Xarxes de distribució del senyal de. interferència electromagnètica, consum, soroll de conmutació. Xarxes de distribució del senyal de rellotge. Clock skew, jitter, interferència electromagnètica, consum, soroll de conmutació. (transparències generades a partir de la presentació de Jan M. Rabaey, Anantha

More information

9/18/2008 GMU, ECE 680 Physical VLSI Design

9/18/2008 GMU, ECE 680 Physical VLSI Design ECE680: Physical VLSI esign Chapter IV esigning Sequential Logic Circuits (Chapter 7) 1 Sequential Logic Inputs Current State COMBINATIONAL LOGIC Registers Outputs Next state 2 storage mechanisms positive

More information

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis EE115C Winter 2017 Digital Electronic Circuits Lecture 19: Timing Analysis Outline Timing parameters Clock nonidealities (skew and jitter) Impact of Clk skew on timing Impact of Clk jitter on timing Flip-flop-

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Designing Sequential Logic Circuits November 2002 Sequential Logic Inputs Current State COMBINATIONAL

More information

Clock Strategy. VLSI System Design NCKUEE-KJLEE

Clock Strategy. VLSI System Design NCKUEE-KJLEE Clock Strategy Clocked Systems Latch and Flip-flops System timing Clock skew High speed latch design Phase locked loop ynamic logic Multiple phase Clock distribution Clocked Systems Most VLSI systems are

More information

Lecture 27: Latches. Final presentations May 8, 1-5pm, BWRC Final reports due May 7 Final exam, Monday, May :30pm, 241 Cory

Lecture 27: Latches. Final presentations May 8, 1-5pm, BWRC Final reports due May 7 Final exam, Monday, May :30pm, 241 Cory EE241 - Spring 2008 Advanced Digital Integrated Circuits Lecture 27: Latches Timing Announcements Wrapping-up the class: Final presentations May 8, 1-5pm, BWRC Final reports due May 7 Final exam, Monday,

More information

GMU, ECE 680 Physical VLSI Design

GMU, ECE 680 Physical VLSI Design ECE680: Physical VLSI esign Chapter IV esigning Sequential Logic Circuits (Chapter 7) 1 Sequential Logic Inputs Current State COMBINATIONAL LOGIC Registers Outputs Next state 2 storage mechanisms positive

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective igital Integrated Circuits esign Perspective esigning Combinational Logic Circuits 1 Combinational vs. Sequential Logic In Combinational Logic Circuit Out In Combinational Logic Circuit Out State Combinational

More information

Integrated Circuits & Systems

Integrated Circuits & Systems Federal University of Santa Catarina Center for Technology Computer Science & Electronics Engineering Integrated Circuits & Systems INE 5442 Lecture 18 CMOS Sequential Circuits - 1 guntzel@inf.ufsc.br

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 18: March 27, 2018 Dynamic Logic, Charge Injection Lecture Outline! Sequential MOS Logic " D-Latch " Timing Constraints! Dynamic Logic " Domino

More information

MODULE 5 Chapter 7. Clocked Storage Elements

MODULE 5 Chapter 7. Clocked Storage Elements MODULE 5 Chapter 7 Clocked Storage Elements 3/9/2015 1 Outline Background Clocked Storage Elements Timing, terminology, classification Static CSEs Latches Registers Dynamic CSEs Latches Registers 3/9/2015

More information

Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan

Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan Chapter 7 Sequential Circuits Jin-Fu Li Advanced Reliable Systems (ARES) Lab. epartment of Electrical Engineering National Central University it Jungli, Taiwan Outline Latches & Registers Sequencing Timing

More information

Chapter #6: Sequential Logic Design

Chapter #6: Sequential Logic Design Chapter #6: equential Logic esign Contemporary Logic esign No. 6- Cross-Coupled NO Gates ust like cascaded inverters, with capability to force output to (reset) or (set) \ eset Hold et eset et ace \ Forbidden

More information

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018 ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018 ΔΙΑΛΕΞΕΙΣ 12-13: esigning ynamic and Static CMOS Sequential Circuits ΧΑΡΗΣ ΘΕΟΧΑΡΙΔΗΣ (ttheocharides@ucy.ac.cy) (ack: Prof. Mary Jane Irwin and

More information

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006

Managing Physical Design Issues in ASIC Toolflows Complex Digital Systems Christopher Batten February 21, 2006 Managing Physical Design Issues in ASI Toolflows 6.375 omplex Digital Systems hristopher Batten February 1, 006 Managing Physical Design Issues in ASI Toolflows Logical Effort Physical Design Issues lock

More information

DG417/418/419. Precision CMOS Analog Switches. Features Benefits Applications. Description. Functional Block Diagram and Pin Configuration

DG417/418/419. Precision CMOS Analog Switches. Features Benefits Applications. Description. Functional Block Diagram and Pin Configuration G417/418/419 Precision MO Analog witches Features Benefits Applications 1-V Analog ignal Range On-Resistance r (on) : 2 Fast witching Action t ON : 1 ns Ultra Low Power Requirements P :3 nw TTL and MO

More information

Y. Tsiatouhas. VLSI Systems and Computer Architecture Lab

Y. Tsiatouhas. VLSI Systems and Computer Architecture Lab CMOS INTEGRATE CIRCUIT ESIGN TECHNIUES University of Ioannina Memory Elements and other Circuits ept. of Computer Science and Engineering Y. Tsiatouhas CMOS Integrated Circuit esign Techniques Overview.

More information

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003 Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić Timing Issues January 2003 1 Synchronous Timing CLK In R Combinational 1 R Logic 2 C in C out Out 2

More information

EE141. Lecture 28 Multipliers. Lecture #20. Project Phase 2 Posted. Sign up for one of three project goals today

EE141. Lecture 28 Multipliers. Lecture #20. Project Phase 2 Posted. Sign up for one of three project goals today EE141-pring 2008 igital Integrated ircuits Lecture 28 Multipliers 1 Announcements Project Phase 2 Posted ign up for one of three project goals today Graded Phase 1 and Midterm 2 will be returned next Fr

More information

GMU, ECE 680 Physical VLSI Design 1

GMU, ECE 680 Physical VLSI Design 1 ECE680: Physical VLSI Design Chapter VII Timing Issues in Digital Circuits (chapter 10 in textbook) GMU, ECE 680 Physical VLSI Design 1 Synchronous Timing (Fig. 10 1) CLK In R Combinational 1 R Logic 2

More information

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining

EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 1 EE382 Processor Design Winter 1999 Chapter 2 Lectures Clocking and Pipelining Slide 2 Topics Clocking Clock Parameters Latch Types Requirements for reliable clocking Pipelining Optimal pipelining

More information

Chapter # 3: Multi-Level Combinational Logic

Chapter # 3: Multi-Level Combinational Logic hapter # 3: Multi-Level ombinational Logic ontemporary Logic esign Randy H. Katz University of alifornia, erkeley June 993 No. 3- hapter Overview Multi-Level Logic onversion to NN-NN and - Networks emorgan's

More information

Chapter 8. Low-Power VLSI Design Methodology

Chapter 8. Low-Power VLSI Design Methodology VLSI Design hapter 8 Low-Power VLSI Design Methodology Jin-Fu Li hapter 8 Low-Power VLSI Design Methodology Introduction Low-Power Gate-Level Design Low-Power Architecture-Level Design Algorithmic-Level

More information

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II

CSE241 VLSI Digital Circuits Winter Lecture 07: Timing II CSE241 VLSI Digital Circuits Winter 2003 Lecture 07: Timing II CSE241 L3 ASICs.1 Delay Calculation Cell Fall Cap\Tr 0.05 0.2 0.5 0.01 0.02 0.16 0.30 0.5 2.0 0.04 0.32 0.178 0.08 0.64 0.60 1.20 0.1ns 0.147ns

More information

EE141Microelettronica. CMOS Logic

EE141Microelettronica. CMOS Logic Microelettronica CMOS Logic CMOS logic Power consumption in CMOS logic gates Where Does Power Go in CMOS? Dynamic Power Consumption Charging and Discharging Capacitors Short Circuit Currents Short Circuit

More information

Chapter 3. Chapter 3 :: Topics. Introduction. Sequential Circuits

Chapter 3. Chapter 3 :: Topics. Introduction. Sequential Circuits Chapter 3 Chapter 3 :: Topics igital esign and Computer Architecture, 2 nd Edition avid Money Harris and Sarah L. Harris Introduction Latches and Flip Flops Synchronous Logic esign Finite State Machines

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 17: March 23, 2017 Energy and Power Optimization, Design Space Exploration, Synchronous MOS Logic Lecture Outline! Energy and Power Optimization

More information

Data AND. Output AND. An Analysis of Pipeline Clocking. J. E. Smith. March 19, 1990

Data AND. Output AND. An Analysis of Pipeline Clocking. J. E. Smith. March 19, 1990 An Analysis of Pipeline locking J E Smith March 19, 1990 Abstract This note examines timing constraints on latches in pipelined systems The goal is to determine required clock characteristics based on

More information

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements

EE241 - Spring 2007 Advanced Digital Integrated Circuits. Announcements EE241 - Spring 2007 Advanced Digital Integrated Circuits Lecture 25: Synchronization Timing Announcements Homework 5 due on 4/26 Final exam on May 8 in class Project presentations on May 3, 1-5pm 2 1 Project

More information

Time Allowed 3:00 hrs. April, pages

Time Allowed 3:00 hrs. April, pages IGITAL ESIGN COEN 32 Prof. r. A. J. Al-Khalili Time Allowed 3: hrs. April, 998 2 pages Answer All uestions No materials are allowed uestion a) esign a half subtractor b) esign a full subtractor c) Using

More information

VLSI Design I; A. Milenkovic 1

VLSI Design I; A. Milenkovic 1 ourse dministration PE/EE 47, PE 57 VLI esign I L6: omplementary MO Logic Gates epartment of Electrical and omputer Engineering University of labama in Huntsville leksandar Milenkovic ( www.ece.uah.edu/~milenka

More information

EE141- Spring 2007 Digital Integrated Circuits

EE141- Spring 2007 Digital Integrated Circuits EE141- Spring 27 igital Integrated Circuits Lecture 19 Sequential Circuits 1 Administrative Stuff Project Ph. 2 due Tu. 5pm 24 Cory box + email ee141- project@bwrc.eecs.berkeley.edu Hw 8 Posts this Fr.,

More information

Quad SPST CMOS Analog Switches

Quad SPST CMOS Analog Switches Quad PT MO Analog witches Low On-Resistance: Low Leakage: 8 pa Low Power onsumption:.2 mw Fast witching Action t ON : 1 ns Low harge Injection Q: 1 p G21A/G22 Upgrades TTL/MO-ompatible Logic ingle upply

More information

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences

UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences UNIVERSITY OF CALIFORNIA, BERKELEY College of Engineering Department of Electrical Engineering and Computer Sciences Elad Alon Homework #9 EECS141 PROBLEM 1: TIMING Consider the simple state machine shown

More information

Dynamic operation 20

Dynamic operation 20 Dynamic operation 20 A simple model for the propagation delay Symmetric inverter (rise and fall delays are identical) otal capacitance is linear t p Minimum length devices R W C L t = 0.69R C = p W L 0.69

More information

COMP 103. Lecture 16. Dynamic Logic

COMP 103. Lecture 16. Dynamic Logic COMP 03 Lecture 6 Dynamic Logic Reading: 6.3, 6.4 [ll lecture notes are adapted from Mary Jane Irwin, Penn State, which were adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] COMP03

More information

Chapter 7: Digital Components. Oregon State University School of Electrical Engineering and Computer Science. Review basic digital design concepts:

Chapter 7: Digital Components. Oregon State University School of Electrical Engineering and Computer Science. Review basic digital design concepts: hapter 7: igital omponents Prof. en Lee Oregon tate University chool of Electrical Engineering and omputer cience hapter Goals Review basic digital design concepts: esigning basic digital components using

More information

Itanium TM Processor Clock Design

Itanium TM Processor Clock Design Itanium TM Processor Design Utpal Desai 1, Simon Tam, Robert Kim, Ji Zhang, Stefan Rusu Intel Corporation, M/S SC12-502, 2200 Mission College Blvd, Santa Clara, CA 95052 ABSTRACT The Itanium processor

More information

Fundamentals of Computer Systems

Fundamentals of Computer Systems Fundamentals of Computer Systems Sequential Logic Stephen A. Edwards Columbia University Summer 2017 State-Holding Elements Bistable Elements S Latch Latch Positive-Edge-Triggered Flip-Flop Flip-Flop with

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective Digital Integrated Circuits Design Perspective Jan M. Rabaey nantha Chandrakasan orivoje Nikolić Designing Combinational Logic Circuits November 2002. 1 Combinational vs. Sequential Logic In Combinational

More information

EE241 - Spring 2006 Advanced Digital Integrated Circuits

EE241 - Spring 2006 Advanced Digital Integrated Circuits EE241 - Spring 2006 Advanced Digital Integrated Circuits Lecture 20: Asynchronous & Synchronization Self-timed and Asynchronous Design Functions of clock in synchronous design 1) Acts as completion signal

More information

CPE/EE 427, CPE 527 VLSI Design I L07: CMOS Logic Gates, Pass Transistor Logic. Review: CMOS Circuit Styles

CPE/EE 427, CPE 527 VLSI Design I L07: CMOS Logic Gates, Pass Transistor Logic. Review: CMOS Circuit Styles PE/EE 427, PE 527 VLI esign I L07: MO Logic Gates, Pass Transistor Logic epartment of Electrical and omputer Engineering University of labama in Huntsville leksandar Milenkovic ( www.ece.uah.edu/~milenka

More information

Energy Delay Optimization

Energy Delay Optimization EE M216A.:. Fall 21 Lecture 8 Energy Delay Optimization Prof. Dejan Marković ee216a@gmail.com Some Common Questions Is sizing better than V DD for energy reduction? What are the optimal values of gate

More information

EE 330 Lecture 6. Improved Switch-Level Model Propagation Delay Stick Diagrams Technology Files

EE 330 Lecture 6. Improved Switch-Level Model Propagation Delay Stick Diagrams Technology Files EE 330 Lecture 6 Improved witch-level Model Propagation elay tick iagrams Technology Files Review from Last Time MO Transistor Qualitative iscussion of n-channel Operation Bulk ource Gate rain rain G Gate

More information

VLSI Design I; A. Milenkovic 1

VLSI Design I; A. Milenkovic 1 Why Power Matters PE/EE 47, PE 57 VLSI Design I L5: Power and Designing for Low Power Department of Electrical and omputer Engineering University of labama in Huntsville leksandar Milenkovic ( www.ece.uah.edu/~milenka

More information

Properties of CMOS Gates Snapshot

Properties of CMOS Gates Snapshot MOS logic 1 Properties of MOS Gates Snapshot High noise margins: V OH and V OL are at V DD and GND, respectively. No static power consumption: There never exists a direct path between V DD and V SS (GND)

More information

CPE/EE 427, CPE 527 VLSI Design I L18: Circuit Families. Outline

CPE/EE 427, CPE 527 VLSI Design I L18: Circuit Families. Outline CPE/EE 47, CPE 57 VLI Design I L8: Circuit Families Department of Electrical and Computer Engineering University of labama in Huntsville leksandar Milenkovic ( www.ece.uah.edu/~milenka ) www.ece.uah.edu/~milenka/cpe57-05f

More information

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation

EEC 118 Lecture #6: CMOS Logic. Rajeevan Amirtharajah University of California, Davis Jeff Parkhurst Intel Corporation EEC 118 Lecture #6: CMOS Logic Rajeevan mirtharajah University of California, Davis Jeff Parkhurst Intel Corporation nnouncements Quiz 1 today! Lab 2 reports due this week Lab 3 this week HW 3 due this

More information

ESE 570: Digital Integrated Circuits and VLSI Fundamentals

ESE 570: Digital Integrated Circuits and VLSI Fundamentals ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 19: March 29, 2018 Memory Overview, Memory Core Cells Today! Charge Leakage/Charge Sharing " Domino Logic Design Considerations! Logic Comparisons!

More information

Digital Integrated Circuits A Design Perspective

Digital Integrated Circuits A Design Perspective Designing ombinational Logic ircuits dapted from hapter 6 of Digital Integrated ircuits Design Perspective Jan M. Rabaey et al. opyright 2003 Prentice Hall/Pearson 1 ombinational vs. Sequential Logic In

More information

TAU 2014 Contest Pessimism Removal of Timing Analysis v1.6 December 11 th,

TAU 2014 Contest Pessimism Removal of Timing Analysis v1.6 December 11 th, TU 2014 Contest Pessimism Removal of Timing nalysis v1.6 ecember 11 th, 2013 https://sites.google.com/site/taucontest2014 1 Introduction This document outlines the concepts and implementation details necessary

More information

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1

Lecture Outline. ESE 570: Digital Integrated Circuits and VLSI Fundamentals. Total Power. Energy and Power Optimization. Worksheet Problem 1 ESE 570: Digital Integrated Circuits and VLSI Fundamentals Lec 16: March 20, 2018 Energy and Power Optimization, Design Space Exploration Lecture Outline! Energy and Power Optimization " Tradeoffs! Design

More information

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University

EE 466/586 VLSI Design. Partha Pande School of EECS Washington State University EE 466/586 VLSI Design Partha Pande School of EECS Washington State University pande@eecs.wsu.edu Lecture 9 Propagation delay Power and delay Tradeoffs Follow board notes Propagation Delay Switching Time

More information

EECS 427 Lecture 15: Timing, Latches, and Registers Reading: Chapter 7. EECS 427 F09 Lecture Reminders

EECS 427 Lecture 15: Timing, Latches, and Registers Reading: Chapter 7. EECS 427 F09 Lecture Reminders EECS 427 Lecture 15: Timing, Latches, and Registers Reading: Chapter 7 1 Reminders CA assignments CA7 is due Thursday at noon ECE Graduate Symposium Poster session in ECE Atrium on Friday HW4 (detailed

More information

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut

Topics. CMOS Design Multi-input delay analysis. John A. Chandy Dept. of Electrical and Computer Engineering University of Connecticut Topics CMO Design Multi-input delay analysis pring 25 Transmission Gate OUT Z OUT Z pring 25 Transmission Gate OUT When is low, the output is at high impedance When is high, the output follows However,

More information

! Charge Leakage/Charge Sharing. " Domino Logic Design Considerations. ! Logic Comparisons. ! Memory. " Classification. " ROM Memories.

! Charge Leakage/Charge Sharing.  Domino Logic Design Considerations. ! Logic Comparisons. ! Memory.  Classification.  ROM Memories. ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec 9: March 9, 8 Memory Overview, Memory Core Cells Today! Charge Leakage/ " Domino Logic Design Considerations! Logic Comparisons! Memory " Classification

More information

Combinational Logic Design

Combinational Logic Design PEN 35 - igital System esign ombinational Logic esign hapter 3 Logic and omputer esign Fundamentals, 4 rd Ed., Mano 2008 Pearson Prentice Hall esign oncepts and utomation top-down design proceeds from

More information

VLSI Design I; A. Milenkovic 1

VLSI Design I; A. Milenkovic 1 ourse dministration PE/EE 47, PE 57 VLI esign I L6: tatic MO Logic epartment of Electrical and omputer Engineering University of labama in Huntsville leksandar Milenkovic ( www. ece.uah.edu/~milenka )

More information

CMPEN 411. Spring Lecture 18: Static Sequential Circuits

CMPEN 411. Spring Lecture 18: Static Sequential Circuits CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 18: Static Sequential Circuits [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11

More information

VLSI Design I; A. Milenkovic 1

VLSI Design I; A. Milenkovic 1 PE/EE 47, PE 57 VLI esign I L6: tatic MO Logic epartment of Electrical and omputer Engineering University of labama in Huntsville leksandar Milenkovic ( www. ece.uah.edu/~milenka ) www. ece.uah.edu/~milenka/cpe57-3f

More information

MODULE III PHYSICAL DESIGN ISSUES

MODULE III PHYSICAL DESIGN ISSUES VLSI Digital Design MODULE III PHYSICAL DESIGN ISSUES 3.2 Power-supply and clock distribution EE - VDD -P2006 3:1 3.1.1 Power dissipation in CMOS gates Power dissipation importance Package Cost. Power

More information

UNIVERSITY OF CALIFORNIA

UNIVERSITY OF CALIFORNIA UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences Last modified on April 14, 2004 by Brian Leibowitz (bsl@eecs.berkeley.edu) Jan Rabaey Homework

More information

ALU, Latches and Flip-Flops

ALU, Latches and Flip-Flops CSE14: Components and Design Techniques for Digital Systems ALU, Latches and Flip-Flops Tajana Simunic Rosing Where we are. Last time: ALUs Plan for today: ALU example, latches and flip flops Exam #1 grades

More information

SA CH SEGMENT /COMMON DRIVER FOR DOT MATRIX LCD

SA CH SEGMENT /COMMON DRIVER FOR DOT MATRIX LCD A06 A06 0 H EGMENT /OMMON RIVER FOR OT MATRIX L Ver. July, 000 A06 INTROUTION The A06 is an L driver LI which is fabricated by low power MO high voltage process technology. In segment driver mode, it can

More information

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering

ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN. Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering TIMING ANALYSIS Overview Circuits do not respond instantaneously to input changes

More information

CPE/EE 427, CPE 527 VLSI Design I Pass Transistor Logic. Review: CMOS Circuit Styles

CPE/EE 427, CPE 527 VLSI Design I Pass Transistor Logic. Review: CMOS Circuit Styles PE/EE 427, PE 527 VLI Design I Pass Transistor Logic Department of Electrical and omputer Engineering University of labama in Huntsville leksandar Milenkovic ( www.ece.uah.edu/~milenka ) Review: MO ircuit

More information

9/18/2008 GMU, ECE 680 Physical VLSI Design

9/18/2008 GMU, ECE 680 Physical VLSI Design ECE680: Physical VLSI Design Chapter III CMOS Device, Inverter, Combinational circuit Logic and Layout Part 3 Combinational Logic Gates (textbook chapter 6) 9/18/2008 GMU, ECE 680 Physical VLSI Design

More information

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics

Lecture 23. Dealing with Interconnect. Impact of Interconnect Parasitics Lecture 23 Dealing with Interconnect Impact of Interconnect Parasitics Reduce Reliability Affect Performance Classes of Parasitics Capacitive Resistive Inductive 1 INTERCONNECT Dealing with Capacitance

More information
