Design of Control Modules for Use in a Globally Asynchronous, Locally Synchronous Design Methodology

Similar documents
CMPEN 411 VLSI Digital Circuits Spring 2012 Lecture 17: Dynamic Sequential Circuits And Timing Issues

MOSIS REPORT. Spring MOSIS Report 1. MOSIS Report 2. MOSIS Report 3

CMPEN 411. Spring Lecture 18: Static Sequential Circuits

Vidyalankar S.E. Sem. III [ETRX] Digital Circuits and Design Prelim Question Paper Solution

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

ELEC Digital Logic Circuits Fall 2014 Sequential Circuits (Chapter 6) Finite State Machines (Ch. 7-10)

L4: Sequential Building Blocks (Flip-flops, Latches and Registers)

Integrated Circuits & Systems

Sequential vs. Combinational

Chapter 9 Asynchronous Sequential Logic

Synchronous Sequential Circuit

Lecture 9: Clocking, Clock Skew, Clock Jitter, Clock Distribution and some FM

Problem Set 9 Solutions

CPE/EE 422/522. Chapter 1 - Review of Logic Design Fundamentals. Dr. Rhonda Kay Gaede UAH. 1.1 Combinational Logic

ECE 407 Computer Aided Design for Electronic Systems. Simulation. Instructor: Maria K. Michael. Overview

CARNEGIE MELLON UNIVERSITY DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING DIGITAL INTEGRATED CIRCUITS FALL 2002

Lecture 9: Digital Electronics

VLSI Design Verification and Test Simulation CMPE 646. Specification. Design(netlist) True-value Simulator

King Fahd University of Petroleum and Minerals College of Computer Science and Engineering Computer Engineering Department

EXPERIMENT Traffic Light Controller

Synthesizing Asynchronous Burst-Mode Machines without the Fundamental-Mode Timing Assumption

Testability. Shaahin Hessabi. Sharif University of Technology. Adapted from the presentation prepared by book authors.

Implementation of Clock Network Based on Clock Mesh

DIGITAL LOGIC CIRCUITS

Integrated Circuits & Systems

Gates and Flip-Flops

Synchronous Sequential Circuit Design. Digital Computer Design

Digital Integrated Circuits A Design Perspective

CMPEN 411 VLSI Digital Circuits Spring Lecture 14: Designing for Low Power

The Design Procedure. Output Equation Determination - Derive output equations from the state table

Chapter 3. Digital Design and Computer Architecture, 2 nd Edition. David Money Harris and Sarah L. Harris. Chapter 3 <1>

ΗΜΥ 307 ΨΗΦΙΑΚΑ ΟΛΟΚΛΗΡΩΜΕΝΑ ΚΥΚΛΩΜΑΤΑ Εαρινό Εξάμηνο 2018

Reg. No. Question Paper Code : B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER Second Semester. Computer Science and Engineering

EE241 - Spring 2006 Advanced Digital Integrated Circuits

COE 202: Digital Logic Design Sequential Circuits Part 4. Dr. Ahmad Almulhem ahmadsm AT kfupm Phone: Office:

Time Allowed 3:00 hrs. April, pages

Synchronous Sequential Logic

A New Method for Converting Trace Theoretic Specifications to Signal Transition Graphs

Digital Electronics. Delay Max. FF Rate Power/Gate High Low (ns) (MHz) (mw) (V) (V) Standard TTL (7400)

Clock signal in digital circuit is responsible for synchronizing the transfer to the data between processing elements.

Sequential Logic Design: Controllers

Digital Logic Design - Chapter 4

Chapter 13. Clocked Circuits SEQUENTIAL VS. COMBINATIONAL CMOS TG LATCHES, FLIP FLOPS. Baker Ch. 13 Clocked Circuits. Introduction to VLSI

Hold Time Illustrations

Laboratory Exercise #8 Introduction to Sequential Logic

Data Synchronization Issues in GALS SoCs

MOSIS REPORT. Report:

Clock Strategy. VLSI System Design NCKUEE-KJLEE

University of Toronto Faculty of Applied Science and Engineering Edward S. Rogers Sr. Department of Electrical and Computer Engineering

Chapter 3. Chapter 3 :: Topics. Introduction. Sequential Circuits

VLSI Design, Fall Logical Effort. Jacob Abraham

Lecture 7: Logic design. Combinational logic circuits

Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic. November Digital Integrated Circuits 2nd Sequential Circuits

EE115C Digital Electronic Circuits Homework #5


EE141Microelettronica. CMOS Logic

Programmable Logic Devices II

The Linear-Feedback Shift Register

Simulation of Logic Primitives and Dynamic D-latch with Verilog-XL

Homework Assignment #1 Solutions EE 477 Spring 2017 Professor Parker

Counters. We ll look at different kinds of counters and discuss how to build them

EE 447 VLSI Design. Lecture 5: Logical Effort

ECE 341. Lecture # 3

Digital Integrated Circuits A Design Perspective

EE115C Winter 2017 Digital Electronic Circuits. Lecture 19: Timing Analysis

ELCT201: DIGITAL LOGIC DESIGN

ECE429 Introduction to VLSI Design

COEN 312 DIGITAL SYSTEMS DESIGN - LECTURE NOTES Concordia University

Lecture 10: Sequential Networks: Timing and Retiming

Review: Designing with FSM. EECS Components and Design Techniques for Digital Systems. Lec 09 Counters Outline.

Timing Issues. Digital Integrated Circuits A Design Perspective. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolić. January 2003

Chapter 4. Sequential Logic Circuits

Boolean Algebra and Digital Logic 2009, University of Colombo School of Computing

Multi-Level Logic Optimization. Technology Independent. Thanks to R. Rudell, S. Malik, R. Rutenbar. University of California, Berkeley, CA

UNIVERSITY OF CALIFORNIA

NTE74HC173 Integrated Circuit TTL High Speed CMOS, 4 Bit D Type Flip Flop with 3 State Outputs

Digital Integrated Circuits A Design Perspective

Metastability. Introduction. Metastability. in Altera Devices

Integrated Circuits & Systems

MAHALAKSHMI ENGINEERING COLLEGE TIRUCHIRAPALLI

Combinational Logic Design

EE40 Lec 15. Logic Synthesis and Sequential Logic Circuits

Logical Effort. Sizing Transistors for Speed. Estimating Delays

Adders, subtractors comparators, multipliers and other ALU elements

Design of Arithmetic Logic Unit (ALU) using Modified QCA Adder

University of Toronto Faculty of Applied Science and Engineering Edward S. Rogers Sr. Department of Electrical and Computer Engineering

9/18/2008 GMU, ECE 680 Physical VLSI Design

Very Large Scale Integration (VLSI)

Digital Integrated Circuits A Design Perspective

Sequential Logic Circuits

Written exam for IE1204/5 Digital Design with solutions Thursday 29/

FSM model for sequential circuits

Fundamentals of Computer Systems

L2: Combinational Logic Design (Construction and Boolean Algebra)

Final Exam. ECE 25, Spring 2008 Thursday, June 12, Problem Points Score Total 90

Chapter #6: Sequential Logic Design

Sequential Logic Worksheet

EE241 - Spring 2001 Advanced Digital Integrated Circuits

GMU, ECE 680 Physical VLSI Design 1

Synchronous 4 Bit Counters; Binary, Direct Reset

Transcription:

Design of Control Modules for Use in a Globally Asynchronous, Locally Synchronous Design Methodology Pradnya Deokar Department of Electrical and Computer Engineering, VLSI Design Research Laboratory, Southern Illinois University Edwardsville, Illinois, USA, 62025 ECE595: Masters Project Abstract In the proposed blended GALS design methodology, macromodular control elements are used to facilitate communication across different clocked domains. This paper describes the design and implementation of seven delay-insensitive control elements (Transfer, Call, Merge, Branch, Rendezvous, Decision, and Interlock) as standard cells. All seven control elements have been designed, successfully simulated, and layed out using the TSMC 0.25 micron process, available through MOSIS. The results of our simulations prove that this nearly universal set of control modules are completely free of both combinational and metastability hazards. 1.0 Introduction Circuit design methods can be classified in two major categories: synchronous and asynchronous. The inherent adaptive nature, low-power circuit operation, composability and most important no need of global verification make asynchronous Page 1 of 26

methods attractive. While possessing these advantages, asynchronous design is also generally viewed as more difficult and is foreign to the majority of present day digital designers. The Globally Asynchronous Locally Synchronous (GALS) [1] design approach combines advantages of both synchronous and asynchronous operations, eliminating the need for global timing verification. The specific methodology that we advocate (one of many proposed GALS design methodologies), blends clockless and clocked systems using a restartable clock generator [2] and seven macromodular control modules. Use of a clock generator eliminates the need for a synchronizer between the local and global domains, if the local domain is clocked. Macromodular control elements facilitate communication across subsystems in different clocked domains. Control signals generated by macromodular control elements provide necessary sequence controls for events in the system. By creating a universal set of macromodular control modules as standard cells that can be incorporated into an existing standard cell library, we make it possible for designers with little or no previous asynchronous design experience to quickly create these blended systems and achieve the advantages of both synchronous and asynchronous operation. This macromodular set of seven control elements is not universal but can be made universal by the addition of the S element [8] which is not described in this paper. Page 2 of 26

2.0 Synthesis of Control Elements The seven macromodular control elements are Transfer, Branch, Merge, Rendezvous, Call, Decision and Interlock. The control elements use 2-phase or transition signaling where transitions represent a meaningful event. Hazard-free implementations of the macromodular control elements prevent transient errors in combinational circuit outputs due to unintended component delays. The design and implementation of these seven control elements is explained in this paper, and the Interlock element is described in additional detail in [3]. All seven of the control elements are synthesized using a method which we will summarize here. In the design method we employ, Petri Net models for the elements and the associated reachability graphs drive our synthesis procedure. Using an improvement well suited to automation, of the synthesis procedure described by Molnar et. al. in [10], logic equations for the control element are derived. These equations are then subject to Unger's ternary test for hazards [11]. The combinational hazards (logic races) found are eliminated by a trial-and-error procedure that is repeated until the ternary test proves the absence of combinational hazards. In addition to combinational hazards, some of the circuits used to implement the various control modules display metastability hazards. These metastability hazards are identified, and an appropriate metastability detection circuit is constructed that detects when metastability has subsided. The falling edge of the metastability detection Page 3 of 26

circuit s output transfers the now stable intermediate outputs levels to the flipflops that drive the control element s final output lines. The target technology used to implement these control elements is the TSMC n-well 0.25 µm process, available through MOSIS. The concept of minimizing logical effort is used to size transistors for minimal delay through the cell, with an assumption that the output of the cell is loaded with an external capacitance representing three times the gate capacitance of an inverter with drive strength of 2 (inv_2 from the Virginia Tech Standard Cell Library [9]). This method of minimizing delay is explained in [4]. Simulations to verify performance were executed using Cadence s electrical simulator, Spectre. Each of the control modules were layed out so that they could, at some point in the future, be fully incorporated into the TSMC 0.25 micron standard cell library created by researchers at Virginia Tech University [9]. In this standard cell library, cells are 15.2 μm high and the supply rails are 1.32 μm wide. The area occupied by each control element will be expressed in terms of the area of the Virginia Tech TSMC standard cell library buffer of driving strength two (buf_2). 2.1 Transfer Element The Transfer element [5] is useful when a timed process must be initiated by a clockless sequencing network; thus, providing an interface between a clockless domain and a timed domain. The circuit used to implement the T element is shown in Figure 1. Page 4 of 26

The INIT and COMPLETE events belong to the clockless domain which is shown on the left in Figure 1, and the DONE and RUN events belong to the timed domain on the right. The INIT event produces a rising-edge RUN pulse which initiates a finite set of operations in a timed domain. After completion of these operations in the timed domain a DONE pulse produces a COMPELTE transition in the clockless sequencing network. The T element (as it was implemented) can be shown to be free of all combinational and metatstability hazards if the INIT event is constrained to occur only after a COMPLETE event and if the RUN and DONE events are separated by a finite interval. As shown in Figure 1(a), the output of the lower T-FF (flip-flop) is delayed by about 1 ns and fed back to one of the inputs of XOR to produce a RUN pulse of width of about 1 ns whenever there is an INIT event i.e. a transition. The following equations for the T- FF are given in [6]. Y 1 = xy2 + y1 y2 + xy1; Y2 = xy2 + y1 y2 + xy1 = z Eq. 1 The logic diagram for the T-FF used in the T element is shown in Figure 1(b). The CLR signal is used to initialize outputs RUN and COMPLETE to low. It is important that the user of the T-module ensures that the INIT signal is low when CLR is made active. Page 5 of 26

COMPLETE T < DONE INIT > T RUN Transfer (T) Figure1a. Block diagram of Transfer element x CLR x' y2 T FF y1 y2 y1 x y1 CLR x' y2 y1' y2 x y1' CLR y2 z Note -Subscript ' represents inverted signal Figure1b. Logic diagram for T-FF Page 6 of 26

Simulation results shown in Figure 2(a), demonstrate correct operation of the T element. An INIT event (a transition) at 5.0ns causes a RUN pulse of width approximated 1ns, and DONE pulse at 8.3ns causes a COMPLETE. The physical layout is shown in Figure 2(b). The area (167 μm by 15 μm) occupied by the T element is equivalent to that occupied by 34 standard cell buffers. 2.2 Call The Call element [7] effectively allows two or more processors to use a shared resource. It provides mutually exclusive access to a shared resource (perhaps another processor) and after use of the shared resource, returns control to the calling processor. The Petri net model of the hazard free single return Call element is shown in Figure 3(a). The synthesis of the hazard-free C element without feedback delay is explained in [5]. The resulting logic equations for a single return C element are as follows, C' = C( B + D E) + ( B + C)( ADE + AD E) E' = E( D + B C) + ( D + E)( ABC + ABC) F' = B D Eq. 2 The superscript represents the next state of output. These equations describe the three output variables C, E and F inferred from the Petri net reachability graph. Our implementation of these logic equations is given in Figure 3(b). The CLR signal is added to the outputs of the Call element in order that the outputs C and E can be initialized to a LOW logical level. Simulation results are presented in Figure 4(a). The layout of the Call module is shown in Figure 4(b). The area (186μm by 15μm) occupied by the C element is equivalent to 38 standard cell buffers. Page 7 of 26

Figure2a. Simulation results for T element Figure2b. Layout of T element Page 8 of 26

B g C f h A m E k F D j Call (C) Figure3a. Petri net model of single return Call element Call A D E B D B' C A' D E B' C' CLR C A D' E' B C D' E A' B C D' E' CLR E A B' C' Note -Subscript ' shows inverted signal Figure3b. Logic diagram of Call element Page 9 of 26

Page 10 of 26

Figure4a. Simulation results for Call element Figure4b. Layout of Call element Page 11 of 26

2.3 Merge The Merge element [7], as its name suggests merges two input into one output. In other words, if a transition occurs on either of its input lines, the output should make a transition. It is implemented as a XOR gate. The M element does not have a feedback path, and only one input can change at a time, if it is to be a hazard-free element. The Petri net model of the M module is presented in Figure 5(a). The logic equation for the Merge is derived from the Petri net model and its reachability graph in [7] and is c' = a b Eq.3 The logic diagram of the Merge is shown in Figure 5(b). Simulation results demonstrating successful implementation of the XOR gate is presented in Figure 5(c). The layout is shown in Figure 5(d). The area (16 μm by 15 μm) occupied by the M element is equivalent to 3 standard cell buffers. The maximum speed of operation when loaded with our agreed upon standard load is 2.5GHz. 2.4 Branch The Branch element [7] includes two buffers placed between its single input and two outputs. This is also hazard-free element as long as input transitions are properly placed. b' = a c' = a Eq. 4 The Petri net model of the Branch is shown in Figure 6 (a). A logic diagram is shown in Figure 6(b). Simulation results are presented in Figure 6(c), which shows two outputs b and c as buffered version of the input a. Page 12 of 26

d A B e C Merge (M) Figure5a. Petri net model of Merge element a b Merge c' Figure5b. Logic diagram of Merge element Figure5c. Simulation results of Merge element Figure5d. Layout of Merge element Page 13 of 26

The layout is shown in Figure 6(d). The area (14μm by 15μm) occupied by the B element is equivalent to 3 standard cell buffers. 2.5 Rendezvous The Rendezvous element [7] effectively combines two inputs into single output with the output also providing feedback to the input. A positive-going transition must occur on both of its inputs before the output will transition high. When both inputs have returned to zero, the output will return to zero. At initialization it is important that the user ensure that both inputs are low or both inputs are high. The Petri net model of the R element used to drive the synthesis procedure is given in Figure 7(a). The derivation of the logic equations for the R element is detailed in [7]. The resulting logic equation is c ' = ab + ac + bc Eq. 5 The superscript represents the next state of the output. Our implementation of these logic equations is given in Figure 7(b). The R element is also conflict-free and hazardfree. Simulation results are shown in Figure 7(c). The layout is shown in Figure 7(d). The area (22μm by 15μm) occupied by the R element is equivalent to 4 standard cell buffers. 2.6 Decision The Decision element [5] selects one or the other of its two outputs depending on two biasing level inputs. The D element has an input A and two inputs B and C (for biasing Page 14 of 26

levels) as well as two outputs D and E. The biasing inputs (B and C) will decide whether a transition is to occur on either the D or the E output line. Page 15 of 26

e g C Figure6a. Petri net model of Branch Branch b' a Figure6b. Schematic diagram of Branch element c' Figure6c. Simulation results of Branch element Figure6d. Layout of Branch element Page 16 of 26

d A f C e B g Rendezvous (R) Figure7a. Petri net model of Rendezvous element a b Rendezvous a c c b c Figure7b. Schematic diagram of Rendezvous Figure7c. Simulation results of Rendezvous element Figure7d. Layout of Rendezvous element Page 17 of 26

The inputs B and C can be considered as transition signals or levels, but care should be taken to avoid them from being concurrent with input A. The Petri net model of the D element is shown in Figure 8(a). The logic equations for D and E are D' = ( B + D)( A E) + BD E' = ( C + E)( A D) + CE Eq.6 The superscript represents the next state of an output. The resulting logic diagram for the D element is shown in Figure 8(b). When both level inputs B and C are false, A has no effect on D and E. Thus D and E remain unchanged. When one of the inputs B or C is asserted, it is responsible for its corresponding output to make a transition whenever input A transitions. With both B and C true, a transition is selected randomly either on output D or on output E. This last case may force the system into a metastable state leading oscillations on the outputs. Addition of a mutual exclusion circuit postpones any final output until both intermediate outputs have become fully stable. As shown in Figure 9(a), oscillations occurred on D and E at 140ns and 190ns are not seen on D and E. The metastability detector circuit prevents these hazards from being reflected at outputs D and E. The layout for this hazard-free D element is shown in Figure 9(b). The area (340μm by 15μm) occupied by the D element is equivalent to 68 standard cell buffers. Page 18 of 26

g B D g A E g C Biased Decision (D) Figure8a. Petri net model of biased Decision element B' D' A B C B' D Decision D A E C' E C' E' E A D Note -Subscript ' shows inverted signal Figure8b. Logic diagram of biased Decision Page 19 of 26

Figure9a. Simulation results of biased Decision element Figure9b. Layout of biased Decision element Page 20 of 26

2.7 Interlock The Interlock element ensures mutual exclusion between concurrent requests for a shared resource originating in remote processors. The Petri net model of the I element [3] is shown in Figure 10(a). The Interlock described in detail in [3] contained combinational hazards, which are removed by augmenting the following logic equations from [7]. a' = c d' = b ( g h) + c d ( g h) + b c d e' = g h' = f ( c d) + g h ( c d) + f g h Eq.7 The superscript represents the next state of an output. Implementation of these logic equations is shown in Figure 10(b). When inputs to the interlock transition simultaneously, the interlock exhibits oscillatory behavior while trying to resolve the conflict. As in the D element, addition of an appropriate metastability detector circuit prevents oscillations from appearing in the final outputs. The metastability detector used in the I element is described in [3]. Simulation results presented in Figure 11(a) demonstrate how the interlock resolves the conflict between the two contenders successfully without any hazards. The layout for this hazard-free I element is shown in Figure 11(b). The area (298μm by 15μm) occupied by the I element is equivalent to 60 standard cell buffers. Page 21 of 26

j A k C n B m D t p F r H s E q G Interlock (I)) Figure 10(a). Petri net model of Interlock Element g h Interlock b c g c d b c d d c d f g h h f g h Note- Subscript ' represents inverted signal Figure 10(b). Logic Diagram of Interlock Element Page 22 of 26

Figure 11(a). Simulation results for Interlock Element Figure 11(b). Layout of Interlock Element Page 23 of 26

3.0 Summary The set these seven delay insensitive is not a universal set, but it can be made by addition of S element described well in [8]. Delay-insensitive control and data paths are composed from this set of control elements implemented as standard cells and designed to be completely free of both combinational and metastability hazards. This guarantee is valid providing the composition of elements follows a few simple rules. Table 1 compares sizes and maximum speeds of all seven control elements discussed above. Table1. Comparison of control modules in terms of area and speed Input Type Metastability Size Speed T element SIC No 167um x 15um (34 beq) 250MHz C element SIC No 186um x 15um (38 beq) 1.1GHz M element SIC No 16um x 15um (3 beq) 2.5GHz B element SIC No 14um x 15um (3 beq) 3.3GHz R element MIC No 22um x 15um (4 beq) 2.0GHz D element MIC Yes 340um x 15um (68 beq) 1.4GHz I element MIC Yes 298um x 15um (60 beq) 1.6GHz Page 24 of 26

References [1] D. M. Chapiro, Globally-asynchronous locally-synchronous systems, Ph.D. thesis, Stanford University, Oct. 1984 [2] S. K. Tallapragada, Design of a Restartable Crystal Controlled Clock for use in a Globally Asynchronous, Locally Synchronous Design methodology, ECE595 Masters Project Report, Southern Illinois University, Edwardsville, Dec. 2005. [3] U. G. Swamy, J.R. Cox, G. L. Engel, D. M. Zar, Design of an interlock module for use in a globally asynchronous, locally synchronous design methodology, WU CSE Tech. Report, 2005-53. [4] J. M. Rabaey, A. Chandrakasan, B. Nikolic, Digital Integrated Circuits, Prentice Hall, 2003. [5] J.R. Cox, Blending Clockless and Clocked Systems: Call, Decision and Translate Elements, May 2006 [6] Personal communication with Dr. J.R. Cox [7] J. R. Cox, D. M. Zar, Synthesis of Control Elements from Petri Net Models, WU CSE Tech Report, 2005-43, Sep. 2005. [8] J.R. Cox, A Universal Set of Delay-Insensitive Elements, July 2006 [9] J. B. Sulistyo, J. Perry, and D. S. Ha, "Developing Standard Cells for TSMC 0.25um Technology under MOSIS DEEP Rules", Department of Electrical and Computer Engineering, Virginia Tech, Technical Report VISC-2003-01, November 2003. Page 25 of 26

[10] C. E. Molnar, T. P. Fang, F. U. Rosenberger, Synthesis of delay-insensitive modules, In: H. Fuchs, ed. Proc. 1985 Chapel Hill Conference on Very Large Scale Integration, Computer Science Press, pp 67-86, 1985. [11] Stephen Unger, Asynchronous sequential switching circuits, Wiley-Interscience, pp.177-183, 1969. Page 26 of 26