ECEN 248: INTRODUCTION TO DIGITAL SYSTEMS DESIGN Week 9 Dr. Srinivas Shakkottai Dept. of Electrical and Computer Engineering
TIMING ANALYSIS
Overview
Circuits do not respond instantaneously to input changes.
Even after the inputs change, the output of a circuit maintains its original value for a short time.
There is a predictable delay in transferring inputs to outputs: the propagation delay.
Sequential circuits require a periodic clock.
Goal: analyze a clocked circuit to determine its maximum clock frequency.
This requires analysis of the paths from flip-flop outputs to flip-flop inputs.
Sequential Circuits
Sequential circuits can contain both combinational logic and edge-triggered flip-flops.
A clock signal determines when data is stored in the flip-flops.
Goal: how fast can the circuit operate?
Minimum clock period: T_min. Maximum clock frequency: f_max.
The maximum clock frequency is the inverse of the minimum clock period: f_max = 1/T_min.
[Figure: clock waveform with one clock period marked]
Combinational Logic Timing: Inverter
[Figure: inverter with input A and output Y; waveforms of A and Y with t_cd and t_pd marked]
Combinational logic is made from electronic circuits, so an input change takes time to propagate to the output.
The output remains unchanged for a time equal to the contamination delay, t_cd.
The new output value is guaranteed to be valid after a time equal to the propagation delay, t_pd.
Combinational Logic Timing: NOR Gate
The output is guaranteed to be stable at its old value until the contamination delay has elapsed.
Between the contamination delay and the propagation delay the output is unknown; waveforms show unknown values as Xs.
The output is guaranteed to be stable at its new value after the propagation delay.
Combinational Logic Timing: Complex Circuits
Find the propagation and contamination delay of the new, combined circuit.
[Figure: a block with t_pd = 2 ns, t_cd = 1 ns and a block with t_pd = 3 ns, t_cd = 1 ns are combined into Circuit X (inputs A, B; output C) with t_pd = 5 ns, t_cd = 1 ns]
Propagation delays are additive along a path: locate the path with the largest total t_pd.
Contamination delays may not be additive: locate the path with the smallest total t_cd.
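The two rules above can be sketched in a few lines of Python. This is only an illustration: the 2 ns, 3 ns, and 1 ns values come from the slide, but the path list, the helper name `combined_delays`, and in particular the short path through the second block only are assumptions, chosen because they reproduce the combined values t_pd = 5 ns and t_cd = 1 ns.

```python
# Each block is (t_pd, t_cd) in ns. Values are taken from the slide.
BLOCK_1 = (2, 1)
BLOCK_2 = (3, 1)

# Assumed topology (not stated in the slide text): one input passes through
# both blocks in series, another reaches the output through BLOCK_2 only.
PATHS = [
    [BLOCK_1, BLOCK_2],  # long path
    [BLOCK_2],           # hypothetical short path
]


def combined_delays(paths):
    """Return (t_pd, t_cd) of the combined circuit.

    t_pd is the largest sum of propagation delays over all paths;
    t_cd is the smallest sum of contamination delays over all paths.
    """
    t_pd = max(sum(block[0] for block in path) for path in paths)
    t_cd = min(sum(block[1] for block in path) for path in paths)
    return t_pd, t_cd


t_pd, t_cd = combined_delays(PATHS)
print(f"t_pd = {t_pd} ns, t_cd = {t_cd} ns")
```

With a purely serial connection the contamination delays would add to 2 ns; the shorter path is what keeps t_cd at 1 ns in this sketch.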
Clocked Device: Contamination and Propagation Delay
[Figure: D flip-flop with inputs D and Clk and output Q; waveform with t_cd and t_clk-Q marked after the rising clock edge]
Timing parameters for clocked devices are specified relative to the clock input (rising edge).
The output is unchanged for a time equal to the contamination delay, t_cd, after the rising clock edge.
The new output is guaranteed valid after a time equal to the propagation delay, t_clk-Q, following the rising clock edge.
Clocked Devices: Setup and Hold Times
[Figure: D flip-flop (D, Clk, Q); waveform with t_s and t_h marked around the rising clock edge]
Timing parameters for clocked devices are specified relative to the clock input (rising edge).
The D input must be valid at least t_s (setup time) before the rising clock edge.
The D input must be held steady for at least t_h (hold time) after the rising clock edge.
Setup and hold are input restrictions; failure to meet them causes the circuit to operate incorrectly.
Edge-Triggered Flip-Flop Timing
[Figure: D and CLK waveforms with t_s = setup time and t_h = hold time marked]
The logic driving the flip-flop must ensure that the setup and hold requirements are met.
Timing values: t_cd, t_pd, t_clk-Q, t_s, t_h.
Analyzing Sequential Circuits
[Figure: CLK drives FFA and FFB; D -> FFA -> X -> combinational logic G -> Y -> FFB -> Z]
FFA: t_clk-Q = 5 ns. G: t_pd = 5 ns. FFB: t_clk-Q = 5 ns, t_s = 2 ns.
What is the minimum time between rising clock edges?
T_min = t_clk-Q(FFA) + t_pd(G) + t_s(FFB) = 5 ns + 5 ns + 2 ns = 12 ns
Analyzing Sequential Circuits
[Figure: CLK drives FFA and FFB; FFA -> X -> combinational logic H -> Y -> FFB -> Z, with combinational logic F also feeding H]
FFA: t_clk-Q = 5 ns. FFB: t_clk-Q = 4 ns, t_s = 2 ns. F: t_pd = 4 ns. H: t_pd = 5 ns.
What is the minimum clock period (T_min) of this circuit?
Hint: evaluate all FF-to-FF paths.
The maximum clock frequency is 1/T_min.
Analyzing Sequential Circuits
Same circuit: FFA t_clk-Q = 5 ns; FFB t_clk-Q = 4 ns, t_s = 2 ns; F t_pd = 4 ns; H t_pd = 5 ns.
Path FFA to FFB: t_clk-Q(FFA) + t_pd(H) + t_s(FFB) = 5 ns + 5 ns + 2 ns = 12 ns
Path FFB to FFB: t_clk-Q(FFB) + t_pd(F) + t_pd(H) + t_s(FFB) = 4 ns + 4 ns + 5 ns + 2 ns = 15 ns
The longest path sets T_min = 15 ns, so f_max = 1/T_min = 1/(15 ns), about 66.7 MHz.
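The path-by-path calculation can be written as a short Python sketch. The delay values are the ones on the slide; the dictionary layout and the names `path_delay` and `min_clock_period` are illustrative assumptions, not part of the course material.

```python
# Flip-flop and logic parameters in ns, as given on the slide.
T_CLK_Q = {"FFA": 5, "FFB": 4}
T_SETUP = {"FFB": 2}
T_PD = {"F": 4, "H": 5}

# Each path: (launching flip-flop, logic blocks traversed, capturing flip-flop).
PATHS = [
    ("FFA", ["H"], "FFB"),
    ("FFB", ["F", "H"], "FFB"),
]


def path_delay(src, blocks, dst):
    """Clock-to-Q of the source + logic propagation delays + setup of the destination."""
    return T_CLK_Q[src] + sum(T_PD[b] for b in blocks) + T_SETUP[dst]


def min_clock_period(paths):
    """The slowest flip-flop to flip-flop path sets the minimum clock period."""
    return max(path_delay(*p) for p in paths)


t_min_ns = min_clock_period(PATHS)
f_max_mhz = 1e3 / t_min_ns  # a period of t ns corresponds to 1000/t MHz
print(f"T_min = {t_min_ns} ns, f_max = {f_max_mhz:.1f} MHz")
```

Adding another flip-flop or logic block only requires extending the dictionaries and the path list; the maximum over all paths still determines T_min.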
Analyzing Sequential Circuits: Hold Time Violation
[Figure: CLK drives FFA and FFB; D -> FFA -> X -> combinational logic G -> Y -> FFB -> Z]
FFA: t_cd = 1 ns. G: t_cd = 2 ns. FFB: t_h = 2 ns.
One more issue: make sure Y remains stable for the hold time (t_h) after the rising clock edge.
Remember: the contamination delay guarantees that the signal doesn't change before that time.
How long before the first change arrives at Y?
Requirement: t_cd(FFA) + t_cd(G) >= t_h. Here 1 ns + 2 ns = 3 ns > 2 ns, so the hold time is met.
Analyzing Sequential Circuits: Hold Time Violations
All paths must satisfy the hold requirement.
[Figure: the two-flip-flop circuit with combinational logic F and H]
FFA: t_cd = 1 ns. FFB: t_cd = 1 ns, t_h = 2 ns. F: t_cd = 1 ns. H: t_cd = 2 ns.
Path FFA to FFB: t_cd(FFA) + t_cd(H) > t_h(FFB): 1 ns + 2 ns = 3 ns > 2 ns
Path FFB to FFB: t_cd(FFB) + t_cd(F) + t_cd(H) > t_h(FFB): 1 ns + 1 ns + 2 ns = 4 ns > 2 ns
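A hold check has the same shape as the clock-period calculation, with contamination delays in place of propagation delays. The sketch below uses the slide's values; the names `hold_slack` and `hold_violations` are hypothetical, and a path is treated as failing only when the earliest change arrives before the hold time has elapsed.

```python
# Contamination delays and hold time in ns, as given on the slide.
T_CD = {"FFA": 1, "FFB": 1, "F": 1, "H": 2}
T_HOLD = {"FFB": 2}

# Each path: (launching flip-flop, logic blocks traversed, capturing flip-flop).
PATHS = [
    ("FFA", ["H"], "FFB"),
    ("FFB", ["F", "H"], "FFB"),
]


def hold_slack(src, blocks, dst):
    """Earliest arrival of a change at the destination minus its hold time."""
    earliest_change = T_CD[src] + sum(T_CD[b] for b in blocks)
    return earliest_change - T_HOLD[dst]


def hold_violations(paths):
    """Return the paths whose earliest change arrives before the hold time ends."""
    return [p for p in paths if 0 > hold_slack(*p)]


for p in PATHS:
    print(p[0], "->", p[2], "slack:", hold_slack(*p), "ns")
print("violations:", hold_violations(PATHS))
```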
Summary
Maximum clock frequency is a fundamental parameter in sequential computer systems.
It is possible to determine the clock frequency from propagation delays and setup time.
The longest path determines the clock frequency.
All flip-flop to flip-flop paths must be checked.
Hold times are checked by examining contamination delays.
The path with the shortest contamination delay determines whether hold times are met.
Carry-Look Ahead Adders
Overview Ripple Adders are slow. Need time for each carry to propagate to the next step. Can we generate the carry inputs on the same timescale as the sum? Complexity versus delay issues.
Ripple Adder Ripple adder delays output due to rippling
Full Adder (See Week 4)
Full adder includes carry in C_i.
Truth table:
C_i  x_i  y_i   S_i  C_{i+1}
0    0    0     0    0
0    0    1     1    0
0    1    0     1    0
0    1    1     0    1
1    0    0     1    0
1    0    1     0    1
1    1    0     0    1
1    1    1     1    1
Karnaugh map for S_i (rows C_i, columns x_i y_i):
C_i   00  01  11  10
0     0   1   0   1
1     1   0   1   0
S_i = C_i' x_i' y_i + C_i' x_i y_i' + C_i x_i' y_i' + C_i x_i y_i
Full Adder: Implementation of Carry Out
Minimize the circuit for carry out, C_{i+1}.
Truth table:
C_i  x_i  y_i   S_i  C_{i+1}
0    0    0     0    0
0    0    1     1    0
0    1    0     1    0
0    1    1     0    1
1    0    0     1    0
1    0    1     0    1
1    1    0     0    1
1    1    1     1    1
Karnaugh map for C_{i+1} (rows C_i, columns x_i y_i):
C_i   00  01  11  10
0     0   0   1   0
1     0   1   1   1
C_{i+1} = x_i y_i + C_i x_i + C_i y_i
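As a quick sanity check, the sum-of-minterms expression for S_i and the minimized expression for C_{i+1} can be evaluated in Python against ordinary addition for all eight input combinations. The function names `inv` and `full_adder` are illustrative.

```python
from itertools import product


def inv(bit):
    """Complement of a single bit (0 or 1)."""
    return 1 - bit


def full_adder(c_in, x, y):
    """Return (sum, carry_out) using the Boolean expressions from the slides."""
    s = (
        (inv(c_in) & inv(x) & y)
        | (inv(c_in) & x & inv(y))
        | (c_in & inv(x) & inv(y))
        | (c_in & x & y)
    )
    c_out = (x & y) | (c_in & x) | (c_in & y)
    return s, c_out


# Compare against arithmetic: carry_out and sum together encode c_in + x + y.
for c_in, x, y in product((0, 1), repeat=3):
    s, c_out = full_adder(c_in, x, y)
    assert 2 * c_out + s == c_in + x + y
    print(c_in, x, y, "->", s, c_out)
```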
Delays Delay in generating n th sum bit:
Carry look-ahead Generate Function: Propagate Function:
Carry Look Ahead Adder
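The full-adder carry expression C_{i+1} = x_i y_i + C_i x_i + C_i y_i can be regrouped as C_{i+1} = g_i + p_i C_i. The sketch below assumes the common definitions g_i = x_i y_i (generate) and p_i = x_i + y_i (propagate), and expands every carry into a two-level sum of products of g, p, and c_0, so no carry waits on the previous one. The function names and the LSB-first list representation are illustrative choices.

```python
def to_bits(value, width):
    """Integer -> list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant first) -> integer."""
    return sum(b * 2**i for i, b in enumerate(bits))


def cla_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists with look-ahead carries.

    Assumes g_i = x_i AND y_i and p_i = x_i OR y_i. Each carry is
    c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 c_0.
    """
    n = len(x_bits)
    g = [x & y for x, y in zip(x_bits, y_bits)]  # generate
    p = [x | y for x, y in zip(x_bits, y_bits)]  # propagate
    carries = [c0]
    for i in range(n):
        terms = []
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            terms.append(term)
        term = c0
        for k in range(i + 1):
            term &= p[k]
        terms.append(term)
        carries.append(int(any(terms)))
    sums = [x ^ y ^ c for x, y, c in zip(x_bits, y_bits, carries)]
    return sums, carries[n]


sums, c_out = cla_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), c_out)
```

The number of product terms per carry grows with the bit position, which is the complexity-versus-delay trade-off mentioned in the overview.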
Complete Circuit Delays
Two Level
To do: Read up on hierarchical adders. Read the notes on carry-save adders.
STATE REDUCTION
Overview
It is important to minimize the size of digital circuitry.
Analysis of state machines leads to a state table (or diagram).
In many cases, reducing the number of states reduces the number of gates and flip-flops, and so generally reduces complexity.
This is not true 100% of the time.
In this course we attempt state reduction by examining the state table; other, more advanced approaches are possible.
FSM Comparison
Solution A: Moore machine
Output is a function only of PS.
May need more states.
Synchronous outputs: no glitching, one-cycle delay, full cycle of stable output.
Solution B: Mealy machine
Output is a function of both PS and input.
May need fewer states.
Asynchronous outputs: if the input glitches, so does the output.
Output is immediately available, but may not be stable long enough to be useful.
FSM Recap
[Figure: block diagrams of a Moore machine and a Mealy machine]
Both machine types allow one-hot implementations.
FSM Optimization
State reduction motivation: lower cost.
Fewer flip-flops in one-hot implementations.
Possibly fewer flip-flops in encoded implementations.
More don't cares in the next-state logic, and so fewer gates in the next-state logic.
It is simpler to design with extra states and then reduce them later.
Example: odd parity checker (Moore machine).
State Reduction
Row matching is based on the state-transition table.
If two states have the same output and, for each input, both transition to the same next state, or both transition to each other, or both self-loop, then they are equivalent.
Combine the equivalent states into a new, renamed state.
Repeat until no more states can be combined.
State Transition Table
PS   NS (x=0)   NS (x=1)   output
S0   S0         S1         0
S1   S1         S2         1
S2   S2         S1         0
FSM Optimization
Example: odd parity checker.
Merge state S2 into S0 and eliminate S2.
The new state machine shows the same I/O behavior.
State Transition Table
PS   NS (x=0)   NS (x=1)   output
S0   S0         S1         0
S1   S1         S0         1
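Row matching can be sketched as a small Python routine. The table layout (state mapped to an output and a tuple of next states, one per input value) and the function names are assumptions made for this illustration; the matching rule is the one stated on the State Reduction slide.

```python
def rows_match(table, a, b):
    """True if states a and b have the same output and, for every input,
    go to the same next state or stay within the pair {a, b}
    (both self-loop, or each transitions to the other)."""
    out_a, next_a = table[a]
    out_b, next_b = table[b]
    if out_a != out_b:
        return False
    return all(
        na == nb or {na, nb}.issubset({a, b})
        for na, nb in zip(next_a, next_b)
    )


def reduce_states(table):
    """Merge matching rows repeatedly until no two rows match."""
    table = dict(table)
    merged = True
    while merged:
        merged = False
        states = sorted(table)
        for i, a in enumerate(states):
            for b in states[i + 1:]:
                if rows_match(table, a, b):
                    # Merge b into a: drop row b and rename b to a everywhere.
                    del table[b]
                    table = {
                        s: (out, tuple(a if n == b else n for n in nxt))
                        for s, (out, nxt) in table.items()
                    }
                    merged = True
                    break
            if merged:
                break
    return table


# Odd parity checker: state -> (output, (next state for x=0, next state for x=1))
PARITY = {
    "S0": (0, ("S0", "S1")),
    "S1": (1, ("S1", "S2")),
    "S2": (0, ("S2", "S1")),
}
print(reduce_states(PARITY))
```

The same routine works for a Mealy table if the output entry is a tuple with one output per input value.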
Row Matching Example (Mealy)
State Transition Table
PS   NS (x=0)   NS (x=1)   output (x=0)   output (x=1)
a    a          b          0              0
b    c          d          0              0
c    a          d          0              0
d    e          f          0              1
e    a          f          0              1
f    g          f          0              1
g    a          f          0              1
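Row Matching Example
Rows e and g match, so g is merged into e:
PS   NS (x=0)   NS (x=1)   output (x=0)   output (x=1)
a    a          b          0              0
b    c          d          0              0
c    a          d          0              0
d    e          f          0              1
e    a          f          0              1
f    e          f          0              1
Rows d and f now match, so f is merged into d.
Reduced State Transition Table
PS   NS (x=0)   NS (x=1)   output (x=0)   output (x=1)
a    a          b          0              0
b    c          d          0              0
c    a          d          0              0
d    e          d          0              1
e    a          d          0              1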
Partitioning Minimization (Moore)
State Transition Table
PS   NS (x=0)   NS (x=1)   output
a    b          c          1
b    d          f          1
c    f          e          0
d    b          g          1
e    f          c          0
f    e          d          0
g    f          g          0
Step 1: P1 = (abcdefg)
Step 2: P2 = (abd)(cefg), splitting by output
Step 3:
(abd) 0-successors: (bdb)
(abd) 1-successors: (cfg)
(cefg) 0-successors: (ffef)
(cefg) 1-successors: (ecdg); d lies in a different block, so f must be removed
P3 = (abd)(ceg)(f)
Partitioning Minimization
Step 3: P3 = (abd)(ceg)(f)
State Transition Table
PS   NS (x=0)   NS (x=1)   output
a    b          c          1
b    d          f          1
c    f          e          0
d    b          g          1
e    f          c          0
f    e          d          0
g    f          g          0
Step 4:
(abd) 0-successors: (bdb)
(abd) 1-successors: (cfg); b must be removed
(ceg) 0-successors: (fff)
(ceg) 1-successors: (ecg)
P4 = (ad)(b)(ceg)(f)
Partitioning Minimization
Step 4: P4 = (ad)(b)(ceg)(f)
Reduced State Transition Table (a stands for a and d; c stands for c, e, and g)
PS   NS (x=0)   NS (x=1)   output
a    b          c          1
b    a          f          1
c    f          c          0
f    c          a          0
Step 5: (verify no change)
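The partitioning procedure can be sketched in Python as repeated refinement: start from blocks of states with equal outputs, then split any block whose members send their 0- or 1-successors to different blocks. The table is the one from the slides; the data layout and the name `minimize` are illustrative assumptions.

```python
# state -> (next state for x=0, next state for x=1, output)
TABLE = {
    "a": ("b", "c", 1),
    "b": ("d", "f", 1),
    "c": ("f", "e", 0),
    "d": ("b", "g", 1),
    "e": ("f", "c", 0),
    "f": ("e", "d", 0),
    "g": ("f", "g", 0),
}


def minimize(table):
    """Return the final partition as a sorted list of sorted state lists."""
    # Initial split by output (P2 on the slide).
    by_output = {}
    for state, (_, _, out) in table.items():
        by_output.setdefault(out, []).append(state)
    partition = [sorted(block) for block in by_output.values()]

    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        refined = []
        for block in partition:
            # States stay together only if their successors fall in the same blocks.
            groups = {}
            for s in block:
                n0, n1, _ = table[s]
                groups.setdefault((block_of[n0], block_of[n1]), []).append(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # no block was split: done
            return sorted(refined)
        partition = refined


print(minimize(TABLE))
```

Each pass of the loop corresponds to one step on the slides, and the final pass that changes nothing plays the role of Step 5.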
STATE ASSIGNMENT
Encoding State Variables
Option 1: Binary values: 000, 001, 010, 011, 100
Option 2: Gray code: 000, 001, 011, 010, 110
Option 3: One-hot encoding: one bit for every state, and only one bit is a one at a given time. For a 5-state machine: 00001, 00010, 00100, 01000, 10000
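The three encodings can be generated with a few lines of Python. The function names are illustrative, and the Gray code uses the standard reflected construction i XOR (i shifted right by one), which reproduces the sequence listed on the slide.

```python
def code_width(num_states):
    """Number of bits needed for a binary or Gray encoding."""
    return max(1, (num_states - 1).bit_length())


def binary_codes(num_states):
    width = code_width(num_states)
    return [format(i, f"0{width}b") for i in range(num_states)]


def gray_codes(num_states):
    width = code_width(num_states)
    return [format(i ^ (i >> 1), f"0{width}b") for i in range(num_states)]


def one_hot_codes(num_states):
    # One flip-flop per state; exactly one bit is 1 in each code.
    return [format(2**i, f"0{num_states}b") for i in range(num_states)]


for name, codes in [
    ("binary", binary_codes(5)),
    ("gray", gray_codes(5)),
    ("one-hot", one_hot_codes(5)),
]:
    print(name, codes)
```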
Summary
It is important to create the smallest possible FSMs.
This course uses the visual inspection method.
It is often possible to reduce logic and flip-flops.
State encoding is important.
One-hot coding is popular for flip-flop-intensive designs.
SHIFT REGISTERS
Overview
Multiple flip-flops can be combined to form a data register.
Shift registers allow data to be transported one bit at a time.
Registers also allow for parallel transfer: many bits transferred at the same time.
Shift registers can be used with adders to build arithmetic units.
Remember: most digital hardware can be built from combinational logic (AND, OR, invert) and flip-flops, the basic components of most computers.
Register with Parallel Load
Register: a group of flip-flops (e.g., D flip-flops).
Holds a word of data.
Loads in parallel on a clock transition.
Asynchronous clear (reset).
Register with Load Control
Load control = 1: new data is loaded on the next positive clock edge.
Load control = 0: old data is reloaded on the next positive clock edge.
Shift Registers
A cascaded chain of flip-flops.
Bits travel on clock edges.
Serial in, serial out; can also have parallel load/read.
Parallel Data Transfer
All data transfers on the rising clock edge.
Data is clocked into register Y.
Parallel versus Serial
Serial communication provides a binary number as a sequence of binary digits, one after another, through one data line.
Parallel communication provides a binary number through multiple data lines at the same time.
Shift Register Application
Parallel-to-serial conversion for serial transmission.
[Figure: parallel inputs -> serial transmission -> parallel outputs]
Serial Transfer
Data is transferred one bit at a time, with data loopback for register A.
Time   Reg A   Reg B
T0     1011    0011
T1     1101    1001
T2     1110    1100
T3     0111    0110
T4     1011    1011
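The table can be reproduced with a small simulation. The sketch assumes both registers shift right, with the bit leaving the right end of A entering the left end of B and also looping back into the left end of A; this reading is inferred from the table values, and the function names are illustrative.

```python
def serial_transfer_step(reg_a, reg_b):
    """One clock edge. Registers are strings of '0'/'1'.

    The rightmost bit of A is shifted out, fed into the left end of B,
    and looped back into the left end of A.
    """
    out_bit = reg_a[-1]
    return out_bit + reg_a[:-1], out_bit + reg_b[:-1]


def serial_transfer(reg_a, reg_b, num_clocks):
    """Return the list of (A, B) contents at T0, T1, ..., T(num_clocks)."""
    history = [(reg_a, reg_b)]
    for _ in range(num_clocks):
        reg_a, reg_b = serial_transfer_step(reg_a, reg_b)
        history.append((reg_a, reg_b))
    return history


for t, (a, b) in enumerate(serial_transfer("1011", "0011", 4)):
    print(f"T{t}  A={a}  B={b}")
```

After four clocks B holds a copy of A's original contents, and the loopback has restored A.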
Serial Transfer of Data Transfer from register X to register Y (negative clock edges for this example)
Serial Addition (D Flip-Flop)
Slower than parallel addition, but low cost.
Shares fast hardware on slow data.
Serial Addition (D Flip-Flop)
Only one full adder, reused for each bit.
Start with the low-order bit addition.
Note that the carry (Q) is saved.
To add multiple values, new values are placed in shift register B.
Serial Addition (D Flip-Flop)
Shift control is used to stop the addition.
It is generally not a good idea to gate the clock.
The shift registers can be of arbitrary length.
The FA (full adder) is built from combinational logic.
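A behavioral sketch of the serial adder in Python: one full adder is reused for every bit position, starting with the low-order bit, and a single variable stands in for the D flip-flop that saves the carry between clock cycles. The LSB-first list representation and the function name are assumptions for illustration.

```python
def serial_add(a_bits, b_bits):
    """Add two equal-length bit lists, low-order bit first.

    Returns (sum_bits, final_carry). Each loop iteration models one clock
    cycle: the full adder sees one bit of each operand plus the saved carry.
    """
    carry = 0  # carry flip-flop, cleared before the addition starts
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry
        carry = (a & b) | (carry & a) | (carry & b)  # saved for the next cycle
        sum_bits.append(s)
    return sum_bits, carry


# 6 + 7 = 13: bits are listed low-order first.
print(serial_add([0, 1, 1, 0], [1, 1, 1, 0]))
```

An n-bit addition takes n clock cycles here, which is the speed cost paid for using a single full adder.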
Universal Shift Register
[Figure: universal shift register with clear, clock, shift right, shift left, load, and read controls]
Summary
Shift registers can be combined to allow for data transfer.
Serial transfer is used in modems and computer peripherals (e.g., a mouse).
D flip-flops allow for a simple design.
Data is clocked in during a clock transition (rising or falling edge).
Serial addition takes less chip area but is slow.
A universal shift register allows for many operations: the register is programmable, so it allows different operations at different times.
Next time: counters (circuits that count!)
READ ONLY MEMORIES (ROM)
Overview
Read-only memory can normally only be read.
Internal organization is similar to SRAM.
ROMs are effective at implementing truth tables.
Any logic function can be implemented using ROMs, and multiple single-bit functions can be embedded in a single ROM.
Also used in computer systems for initialization.
A ROM doesn't lose its stored values when power is removed.
Very useful for implementing FSMs.
Read-Only Memory (ROM)
An array of semiconductor devices: diodes, transistors, field-effect transistors.
2^N words by M bits.
Data can be read but not changed (under normal operating conditions).
Read-Only Memory (ROM)
N input bits; 2^N words by M bits.
Implements M arbitrary functions of N variables.
Example, 8 words by 5 bits: 3 input lines (A, B, C) and 5 output lines (F0, F1, F2, F3, F4).
ROM Implementation
ROM = "Read Only Memory": the values of the memory locations are fixed ahead of time.
A ROM can be used to implement a truth table.
If the address is m bits, we can address 2^m entries in the ROM.
The outputs are the bits of data that the address points to.
A ROM is a combinational device, not a sequential one.
address (m = 3)   data (n = 4)
0 0 0             0 0 1 1
0 0 1             1 1 0 0
0 1 0             1 1 0 0
0 1 1             1 0 0 0
1 0 0             0 0 0 0
1 0 1             0 0 0 1
1 1 0             0 1 1 0
1 1 1             0 1 1 1
The number of address bits m sets the "height" (2^m rows), and n is the "width".
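Because a ROM is combinational, it behaves like a lookup table indexed by the address. The sketch below stores the eight 4-bit words from the table in a Python list; the names `ROM` and `rom_read` are illustrative.

```python
# Contents of the 8-word x 4-bit ROM from the table, indexed by address.
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]
WORD_WIDTH = 4


def rom_read(address_bits):
    """Return the data word (most significant bit first) for a 3-bit address."""
    address = int("".join(str(b) for b in address_bits), 2)
    word = ROM[address]
    return [(word >> i) & 1 for i in reversed(range(WORD_WIDTH))]


print(rom_read((0, 0, 1)))
print(rom_read((1, 1, 0)))
```

Each of the four output columns is one Boolean function of the three address bits, so this single ROM implements four truth tables at once.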
ROM Implementation
Suppose there are 10 inputs: 10 address lines (i.e., 2^10 = 1024 different addresses).
Suppose there are 20 outputs: the ROM is 2^10 x 20 = 20K bits.
Rather wasteful, since it needs lots of storage bits.
For logic functions, a ROM doesn't take advantage of K-maps or other minimizations.
Read-Only Memory (ROM)
Each minterm of each function can be specified.
Example, 8 words by 5 bits: 3 input lines (A, B, C) and 5 output lines (F0, F1, F2, F3, F4).
ROM Internal Structure
[Figure: n input lines -> n-bit decoder -> memory array (2^n words x m bits) -> m output lines]
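ROM Memory Array
[Figure: inputs A, B, C drive a 3-to-8 decoder; its outputs m0 through m7 select rows of the memory array, whose columns are the outputs F0, F1, F2, F3, F4]
m0 = A'B'C', m1 = A'B'C, m2 = A'BC', m3 = A'BC, m4 = AB'C', m5 = AB'C, m6 = ABC', m7 = ABC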
Inside the ROM
Alternate view: each horizontal/vertical intersection indicates a possible connection.
The OR gates at the bottom output the word selected by the decoder (32 x 8).
ROM Example Specify a truth table for a ROM which implements: F = AB + A BC G = A B C + C H = AB C + ABC + A B C
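Function Implementation
[Figure: inputs A, B, C drive a 3-to-8 decoder with outputs m0 through m7; the memory array columns are the functions F, G, H]
m0 = A'B'C', m1 = A'B'C, m2 = A'BC', m3 = A'BC, m4 = AB'C', m5 = AB'C, m6 = ABC', m7 = ABC
Each column is a new function.
Note: two outputs unused!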
ROM Implementation of a Moore Machine
ROMs implement combinational logic; note that ROMs do not hold state.
How would you determine the maximum clock frequency of this circuit? Look at the FF-to-FF path (NS to PS).
[Figure: block diagram labeled Inputs, ROM, Next State, Present State, ROM, Outputs]
ROM Implementation of a Mealy Machine
ROMs implement combinational logic; note that ROMs do not hold state.
How would you determine the maximum clock frequency of this circuit? Look at the FF-to-FF path (NS to PS).
[Figure: block diagram labeled Inputs, ROM, Next State, Present State, ROM, Outputs]
Summary
ROMs provide stable storage for data.
ROMs have address inputs and data outputs.
ROMs directly implement truth tables.
ROMs can be used effectively in Mealy and Moore machines to implement combinational logic.
In normal use ROMs are only read, not written.
ROMs are often used by computers to store critical information.
Unlike SRAM, they maintain their contents after the power is turned off.
PROGRAMMABLE LOGIC ARRAYS (PLA)
Programmable logic arrays
A ROM is potentially inefficient because it uses a decoder, which generates all possible minterms. No circuit minimization is done.
Using a ROM to implement an n-input function requires:
An n-to-2^n decoder, with n inverters and 2^n n-input AND gates.
An OR gate with up to 2^n inputs.
The number of gates roughly doubles for each additional ROM input.
Programmable logic arrays
A programmable logic array, or PLA, makes the decoder part of the ROM programmable too. Instead of generating all minterms, you can choose which products (not necessarily minterms) to generate.
A blank 3 x 4 x 3 PLA
This is a 3 x 4 x 3 PLA (3 inputs, up to 4 product terms, and 3 outputs), ready to be programmed.
[Figure: inputs feed the AND array, which feeds the OR array, which drives the outputs]
PLA example
[Figure: inputs x, y, z; the AND array produces xy'z', xy, x'z, x'yz'; the OR array produces V2, V1, V0]
V2 = Σm(1,2,3,4) = xy'z' + x'z + x'yz'
V1 = Σm(2,6,7) = x'yz' + xy
V0 = Σm(4,6,7) = xy'z' + xy
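The example can be checked with a small Python model of the two planes: an AND plane that evaluates four product terms, and an OR plane that sums selected terms for each output. The complemented literals used here (xy'z', xy, x'z, x'yz') are reconstructed from the minterm lists on the slide, and the data layout and function names are illustrative assumptions.

```python
# AND plane: product term -> required literal values (absent variable = don't care).
PRODUCTS = {
    "xy'z'": {"x": 1, "y": 0, "z": 0},
    "xy": {"x": 1, "y": 1},
    "x'z": {"x": 0, "z": 1},
    "x'yz'": {"x": 0, "y": 1, "z": 0},
}

# OR plane: output -> product terms connected to its OR gate.
OUTPUTS = {
    "V2": ["xy'z'", "x'z", "x'yz'"],
    "V1": ["x'yz'", "xy"],
    "V0": ["xy'z'", "xy"],
}


def pla(x, y, z):
    """Evaluate the PLA for one input combination."""
    inputs = {"x": x, "y": y, "z": z}
    and_plane = {
        name: all(inputs[var] == val for var, val in literals.items())
        for name, literals in PRODUCTS.items()
    }
    return {
        out: int(any(and_plane[t] for t in terms))
        for out, terms in OUTPUTS.items()
    }


def minterms(output):
    """List the minterm numbers (x is the most significant bit) where output is 1."""
    return [m for m in range(8) if pla((m >> 2) & 1, (m >> 1) & 1, m & 1)[output]]


for name in OUTPUTS:
    print(name, minterms(name))
```

Only four product terms are generated, and the terms xy'z', xy, and x'yz' are each shared between two outputs, whereas a ROM for the same functions would decode all eight minterms.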
PLA evaluation
A k x m x n PLA can implement up to n functions of k inputs, each of which must be expressible with no more than m product terms.
Unlike ROMs, PLAs allow you to choose which products are generated. This can significantly reduce the fan-in (number of inputs) of gates, as well as the total number of gates.
However, a PLA is less general than a ROM. Not all functions may be expressible with the limited number of AND gates in a given PLA.