2IN35 VLSI Programming Lab Work

Communication Protocols: A Synchronous and an Asynchronous One

René Gabriëls, r.gabriels@student.tue.nl

July 1, 2008
Contents

1 Introduction
2 Problem Description
3 Synchronous solution
   3.1 Protocol
   3.2 Composition
   3.3 Verilog implementation
4 Asynchronous solution
   4.1 Protocol
   4.2 Composition
   4.3 Verilog implementation
A Verilog code listings
   A.1 Synchronous protocol
   A.2 Asynchronous protocol
1 Introduction

This document describes the problem of communication between two hardware components and how to solve it with a communication protocol. Two solutions are presented: a synchronous method (i.e. using a shared clock) and an asynchronous method (i.e. using a handshake protocol).

2 Problem Description

Imagine a situation with two hardware components, a producer P and a consumer C, connected by a wire (see figure 1). P sends data to C over this wire. But how does C know when to look for a data item, and how does P know when C has received the data item and is ready for the next one? The next two sections tackle this problem with a synchronous and an asynchronous approach, respectively.

Figure 1: Producer P and consumer C with shared communication wire data.

3 Synchronous solution

3.1 Protocol

The first solution is to introduce a shared notion of time between the sender and the receiver, known as a clock (hence the name synchronous), plus a set of conventions that indicate when data items are sent and received. A clock can be any periodic signal, but a square wave with equal low and high periods is most common. For such a clock, a communication convention might be that one data item is sent and received every clock cycle. The number of values exchanged per second then equals the clock frequency.

This scheme works, unless the sender cannot send a data item every clock cycle or the receiver cannot receive one every clock cycle. This can be solved by introducing signaling wires over which the sender and the receiver notify each other whether they are ready to communicate. When both are ready during the same clock cycle, communication proceeds according to the convention outlined above.

Each side in the communication can be either active or passive.
An active side signals that it is ready regardless of the state of the other side, while a passive side only signals readiness after the other side has already signaled readiness. A communication channel always has a passive and an active component connected to it. This gives rise to two possible configurations: a pull configuration, where the consumer is active and the producer is passive, and a push configuration, where the producer is active and the consumer is passive.

Figure 2: Producer P and consumer C with communication wire data, signaling wires ready and enabled, and a shared clock.

A pull configuration is shown in figure 2. Whenever C is ready to accept new data in this configuration, it signals this by making the ready wire high. If C is ready, P can signal that it has data available by making the enabled wire high. In every clock cycle in which enabled is high, one data item is sent over the wire. Such a system can be in four states:

ready  enabled  meaning
0      0        P and C are both busy.
0      1        Forbidden! No data items may be sent when ready is low.
1      0        C is ready to accept a value, P is busy.
1      1        C is ready to accept a value, P sends a value.

Figure 3 shows an example of this scheme in action between a producer P and a consumer C. In every clock cycle in which both ready and enabled are high, one data item is exchanged, so in this example 4 data items are exchanged between P and C in total.

Figure 3: Operation of a synchronous protocol with a shared clock, signaling wires ready and enabled, and data wire data.

3.2 Composition

The disadvantage of a pull or push configuration is that one side must always be passive (P in the pull configuration above), while the other side must always be active (C in the pull configuration above). To prevent mismatched components, one could adopt the convention that everything is constructed in either a pull or a push configuration. However, it is more attractive to choose a third option: make both sides active. Then a data item is exchanged in every clock cycle in which both sides are ready. This can be implemented straightforwardly with an AND gate that combines the two ready signals and feeds the result back to both sides as their enabled signals. Figure 4 shows a pipeline consisting of a producer, a combined producer/consumer, and a consumer, connected together using this scheme.

Figure 4: Sequential composition of three active synchronous components P, C&P and C, interconnected by passivators.

3.3 Verilog implementation

A skeleton Verilog implementation of the pipeline shown in figure 4 is listed in appendix A.1. It consists of four components: a producer prod, a combined producer and consumer prodcons, a consumer cons, and a passivator passivator (the AND gate of figure 4). These components are instantiated in the top-level module pipeline. The components cons and prodcons have an input port consisting of the wires in_ready, in_enabled and in_data. Conversely, the components prod and prodcons have an output port consisting of the wires out_ready, out_enabled and out_data. Additionally, each component has a parameter DWIDTH that specifies the width of the data wires. In each component, a buffer (register) is provided for every output wire to keep it stable (see figure 5). Implementing the protocol amounts to controlling these buffers in the behavioral section (the statements in always @(...)). The behavior is a state machine that alternates between computation and communication.

Figure 5: Generic synchronous producer/consumer component.
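To make the transfer rule of section 3.1 concrete, the following is a minimal, simulation-oriented sketch of a monitor for one synchronous channel. The module name sync_monitor and its outputs transfers and violation are hypothetical and not part of the original design; the sketch assumes that ready and enabled are sampled on the rising clock edge, as in the components of appendix A.1. It counts one data item for every cycle in which ready and enabled are both high, and it flags the forbidden state from the table above (enabled high while ready is low).

```verilog
// Hypothetical monitor for one synchronous ready/enabled channel
// (not part of the original design).
module sync_monitor (
    input             clock,
    input             reset,
    input             ready,
    input             enabled,
    output reg [31:0] transfers,  // number of data items exchanged so far
    output reg        violation   // sticky flag: forbidden state was seen
);
    always @(posedge clock) begin
        if (reset) begin
            transfers <= 0;
            violation <= 0;
        end
        else begin
            // One data item per clock cycle in which both wires are high,
            // so n consecutive such cycles exchange n items
            if (ready && enabled)
                transfers <= transfers + 1;
            // Forbidden state: enabled high while ready is low
            if (enabled && !ready)
                violation <= 1;
        end
    end
endmodule
```

Attaching an instance to, for example, rdy1 and ena1 of the pipeline in appendix A.1 would count the data items leaving the first stage. With the AND-gate passivator the forbidden state cannot occur, because enabled is only high when both ready signals are high.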
4 Asynchronous solution

4.1 Protocol

The other solution uses the signaling wires in such a way that no global clock is needed, and hence there is no shared notion of time. This opens the door to components that run at different clock speeds internally but can still communicate with each other. Components might even be asynchronous internally. Such a design is shown in figure 6.

Figure 6: Producer P and consumer C with communication wire data, and signaling wires request and acknowledge.

The most common asynchronous protocols are the 2-phase and the 4-phase handshake protocol. We describe a 4-phase protocol, as illustrated in figure 7. The four phases of the protocol are:

1. Whenever C is ready to receive a value, it sends a request to P (by making the request wire high).
2. Once C has sent a request, P can send an acknowledge back to C (by making the acknowledge wire high) as soon as it has data available.
3. When C receives an acknowledge, it reads the data item and signals that it has finished reading by making the request wire low again.
4. Whenever the request wire becomes low, P responds by making the acknowledge wire low as well, ending the 4-phase handshake.

Note that after a cycle of this protocol, the system is in the same state as before. In order, the four phases are:

request  acknowledge  meaning
0        0            P and C are both busy.
1        0            C has requested a data item, P is busy.
1        1            C has requested a data item, P has sent it and acknowledged.
0        1            C has received the data item, P has not yet lowered acknowledge.

Note that although the synchronous and the asynchronous solution both use two signaling wires, they are not the same! The asynchronous protocol always has to go through all four phases, in order, to exchange a single data item.
The synchronous protocol, on the other hand, does not have to complete a full cycle to exchange one data item: if both signaling wires are high during n consecutive clock cycles, n data items are exchanged. It is the clock signal that separates consecutive data items, which is impossible in an asynchronous system.
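The four phases listed above can be illustrated with a small simulation-only model in which time is expressed by delays instead of a clock. In this sketch, the module four_phase_demo, its parameter NITEMS, its outputs and all delay values are hypothetical. An active consumer drives request, a passive producer drives acknowledge and data, and each pass through the consumer's loop is one complete 4-phase handshake that exchanges exactly one data item.

```verilog
// Simulation-only sketch (hypothetical, not from the original listings):
// an active consumer C and a passive producer P exchange NITEMS data
// items over a 4-phase request/acknowledge handshake, without a clock.
module four_phase_demo #(parameter NITEMS = 4)
   (output reg       done,   // set once C has read NITEMS items
    output reg [7:0] last);  // last data item read by C

    reg       request     = 0;
    reg       acknowledge = 0;
    reg [7:0] data        = 0;
    integer   i;

    // Consumer C (active): drives request
    initial begin
        done = 0;
        last = 0;
        for (i = 0; i < NITEMS; i = i + 1) begin
            #10 request = 1;       // phase 1: C requests a data item
            wait (acknowledge);    // phase 2 observed: data is valid
            last = data;           // C reads the data item
            #5 request = 0;        // phase 3: C has finished reading
            wait (!acknowledge);   // phase 4 observed: handshake done
        end
        done = 1;
    end

    // Producer P (passive): drives acknowledge and data
    always @(posedge request) begin
        #3 data = data + 1;        // compute the next data item
        acknowledge = 1;           // phase 2: data item is available
        wait (!request);           // phase 3 observed
        #2 acknowledge = 0;        // phase 4: return to the idle phase
    end
endmodule
```

Each side only proceeds after observing the other side's last transition (the wait statements), so the outcome does not depend on the chosen delays, only the timing does. With NITEMS = 4 the producer emits the items 1, 2, 3, 4, one per complete handshake.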
Figure 7: Operation of an asynchronous protocol with signaling wires request and acknowledge and data wire data.

4.2 Composition

As in the synchronous solution, this solution also has an active and a passive side: the consumer is active (because it controls the request wire), and the producer is passive (because it controls the acknowledge wire). This is again a pull configuration. Reversing the roles of the producer and the consumer would result in a push configuration. In both cases, an active side and a passive side have to be matched. To prevent mismatches, we could standardize on one of these two approaches. But again, another solution is more attractive: make all components active, and insert a simple two-port passive component, known as a passivator, between them. This device is simpler than it may seem. Instead of an AND gate (which worked for the synchronous solution), we need a symmetric Muller-C element. A Muller-C element can be implemented as a 3-input majority gate with its output fed back as one of the inputs. On an FPGA, this gate can be mapped onto a single LUT.

A composition of three components in a pipeline that uses the asynchronous protocol to communicate is shown in figure 8. Note the similarity to the synchronous solution. The differences are the absence of a global clock signal (although there might be one) and the Muller-C element instead of the AND gate to connect the signaling wires.

Figure 8: An active producer P and consumer C with a passivator in between.

4.3 Verilog implementation

The Verilog implementation of the asynchronous protocol for the pipeline of figure 8 is very similar to the implementation of the synchronous pipeline. Only the names of the registers and wires and the guards of the conditionals have changed. See appendix A.2 for a complete listing.
Note that this implementation still uses a global clock, but in principle each component can have its own clock. Such a design is called globally asynchronous, locally synchronous (GALS).
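As a sketch of the Muller-C element described in section 4.2, the gate can be written on its own as a 3-input majority function whose third input is its own output. The module name muller_c is hypothetical; the passivator in appendix A.2 uses the same expression inline. When a and b agree, c follows them; when they differ, the feedback terms hold the previous value of c.

```verilog
// Hypothetical stand-alone Muller-C element: a majority gate on
// (a, b, c) with the output c fed back as the third input.
module muller_c (
    input  a,
    input  b,
    output c
);
    // c = majority(a, b, c):
    //   a = b = 1 -> c = 1
    //   a = b = 0 -> c = 0
    //   a != b    -> c keeps its previous value
    assign c = (a & b) | (a & c) | (b & c);
endmodule
```

Compared with the AND gate of the synchronous passivator, the only difference is the hold behavior: the acknowledge only rises once both sides request, and only falls once both sides have withdrawn their request, which is exactly what the 4-phase handshake on both channels requires.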
A Verilog code listings

A.1 Synchronous protocol

    module prod #(parameter DWIDTH = 8)
       (input               clock,
        input               reset,
        output              out_ready,
        input               out_enabled,
        output [0:DWIDTH-1] out_data);

        // Register for output ready signal
        reg out_ready_buf;
        assign out_ready = out_ready_buf;

        // Register for output data
        reg [0:DWIDTH-1] out_data_buf;
        assign out_data = out_data_buf;

        // Skeleton placeholders (hypothetical names), driven by the
        // component's own computation
        wire              output_ready_next_cycle;
        wire [0:DWIDTH-1] output_value;

        always @(posedge clock) begin
            // Clear all registers on reset
            if (reset) begin
                out_ready_buf <= 0;
                out_data_buf  <= 0;
            end
            else begin
                // Stop providing data if output request was acknowledged
                if (out_enabled) begin
                    out_ready_buf <= 0;
                end
                // Compute if no output request is open
                if (!out_ready) begin
                    if (output_ready_next_cycle) begin
                        out_ready_buf <= 1;
                        out_data_buf  <= output_value;
                    end
                    // compute: advance the computation here
                end
            end
        end

    endmodule
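The prod skeleton above leaves the computation open. The following is a minimal sketch of one way to fill it in, assuming a hypothetical computation that simply produces the values 0, 1, 2, and so on. The module name count_prod and the register count are illustrative and not part of the original design; the ports and the protocol handling are those of prod.

```verilog
// Hypothetical filled-in synchronous producer: same ports and protocol
// as prod, with a counter standing in for the real computation.
module count_prod #(parameter DWIDTH = 8)
   (input               clock,
    input               reset,
    output              out_ready,
    input               out_enabled,
    output [0:DWIDTH-1] out_data);

    reg              out_ready_buf;
    reg [0:DWIDTH-1] out_data_buf;
    reg [0:DWIDTH-1] count;          // internal state of the computation
    assign out_ready = out_ready_buf;
    assign out_data  = out_data_buf;

    always @(posedge clock) begin
        if (reset) begin
            out_ready_buf <= 0;
            out_data_buf  <= 0;
            count         <= 0;
        end
        else begin
            // Stop providing data once the value has been taken
            if (out_enabled)
                out_ready_buf <= 0;
            // Compute while no value is offered; the result is offered
            // from the next cycle on
            if (!out_ready) begin
                out_ready_buf <= 1;
                out_data_buf  <= count;
                count         <= count + 1;
            end
        end
    end
endmodule
```

Because a new value is only computed in a cycle in which out_ready is low, this producer alternates between communication and computation and offers at most one value every two clock cycles, even when its partner is always ready.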
    module cons #(parameter DWIDTH = 8)
       (input               clock,
        input               reset,
        output              in_ready,
        input               in_enabled,
        input  [0:DWIDTH-1] in_data);

        // Register for input ready signal
        reg in_ready_buf;
        assign in_ready = in_ready_buf;

        // Buffer to store incoming values
        reg [0:DWIDTH-1] buffer;

        // Skeleton placeholder (hypothetical name), driven by the
        // component's own computation
        wire input_required_next_cycle;

        always @(posedge clock) begin
            // Clear all registers on reset
            if (reset) begin
                in_ready_buf <= 0;
                buffer       <= 0;
            end
            else begin
                // Take data if input request was acknowledged
                if (in_enabled) begin
                    buffer       <= in_data;
                    in_ready_buf <= 0;
                end
                // Compute if no input request is open
                if (!in_ready) begin
                    if (input_required_next_cycle) begin
                        in_ready_buf <= 1;
                    end
                    // compute: advance the computation here
                end
            end
        end

    endmodule
    module prodcons #(parameter DWIDTH = 8)
       (input               clock,
        input               reset,
        output              in_ready,
        input               in_enabled,
        input  [0:DWIDTH-1] in_data,
        output              out_ready,
        input               out_enabled,
        output [0:DWIDTH-1] out_data);

        // Register for input ready signal
        reg in_ready_buf;
        assign in_ready = in_ready_buf;

        // Register for output ready signal
        reg out_ready_buf;
        assign out_ready = out_ready_buf;

        // Register for output data
        reg [0:DWIDTH-1] out_data_buf;
        assign out_data = out_data_buf;

        // Buffer to store incoming values
        reg [0:DWIDTH-1] buffer;

        // Skeleton placeholders (hypothetical names), driven by the
        // component's own computation
        wire              output_ready_next_cycle;
        wire [0:DWIDTH-1] output_value;
        wire              input_required_next_cycle;

        always @(posedge clock) begin
            // Clear all registers on reset
            if (reset) begin
                in_ready_buf  <= 0;
                out_ready_buf <= 0;
                out_data_buf  <= 0;
                buffer        <= 0;
            end
            else begin
                // Take data if input request was acknowledged
                if (in_enabled) begin
                    buffer       <= in_data;
                    in_ready_buf <= 0;
                end
                // Stop providing data if output request was acknowledged
                if (out_enabled) begin
                    out_ready_buf <= 0;
                end
                // Compute if no requests are open
                if (!in_ready && !out_ready) begin
                    if (output_ready_next_cycle) begin
                        out_ready_buf <= 1;
                        out_data_buf  <= output_value;
                    end
                    if (input_required_next_cycle) begin
                        in_ready_buf <= 1;
                    end
                    // compute: advance the computation here
                end
            end
        end
    endmodule
    module passivator #(parameter DWIDTH = 8)
       (input               in_ready,
        output              in_enabled,
        input  [0:DWIDTH-1] in_data,
        input               out_ready,
        output              out_enabled,
        output [0:DWIDTH-1] out_data);

        // Passivator behaviour (AND gate, 1 LUT)
        assign in_enabled  = in_ready & out_ready;
        assign out_enabled = in_enabled;

        // Data pass-through
        assign out_data = in_data;

    endmodule

    module pipeline #(parameter DWIDTH = 8)
       (input clock,
        input reset);

        // Interconnects
        wire rdy1, rdy2, rdy3, rdy4;
        wire ena1, ena2, ena3, ena4;
        wire [0:DWIDTH-1] data1, data2, data3, data4;

        // Instantiate the pipeline
        prod       #(DWIDTH) stage1 (clock, reset, rdy1, ena1, data1);
        passivator #(DWIDTH) pass1  (rdy1, ena1, data1, rdy2, ena2, data2);
        prodcons   #(DWIDTH) stage2 (clock, reset, rdy2, ena2, data2, rdy3, ena3, data3);
        passivator #(DWIDTH) pass2  (rdy3, ena3, data3, rdy4, ena4, data4);
        cons       #(DWIDTH) stage3 (clock, reset, rdy4, ena4, data4);

    endmodule
A.2 Asynchronous protocol

    module prod #(parameter DWIDTH = 8)
       (input               clock,
        input               reset,
        output              out_request,
        input               out_acknowledge,
        output [0:DWIDTH-1] out_data);

        // Register for output request signal
        reg out_request_buf;
        assign out_request = out_request_buf;

        // Register for output data
        reg [0:DWIDTH-1] out_data_buf;
        assign out_data = out_data_buf;

        // Skeleton placeholders (hypothetical names), driven by the
        // component's own computation
        wire              output_ready_next_cycle;
        wire [0:DWIDTH-1] output_value;

        always @(posedge clock) begin
            // Clear all registers on reset
            if (reset) begin
                out_request_buf <= 0;
                out_data_buf    <= 0;
            end
            else begin
                // Stop providing data if output request was acknowledged
                if (out_request && out_acknowledge) begin
                    out_request_buf <= 0;
                end
                // Compute if no requests are open
                if (!out_request && !out_acknowledge) begin
                    if (output_ready_next_cycle) begin
                        out_request_buf <= 1;
                        out_data_buf    <= output_value;
                    end
                    // compute: advance the computation here
                end
            end
        end

    endmodule
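The same hypothetical counting computation can be placed in the asynchronous prod skeleton above. This sketch (the module name count_prod_4phase and the register count are illustrative, not from the original listings) keeps the ports of the A.2 prod; as described in section 4.3, only the guards differ from the synchronous version.

```verilog
// Hypothetical filled-in 4-phase producer: same ports and protocol as
// the asynchronous prod, with a counter standing in for the computation.
module count_prod_4phase #(parameter DWIDTH = 8)
   (input               clock,
    input               reset,
    output              out_request,
    input               out_acknowledge,
    output [0:DWIDTH-1] out_data);

    reg              out_request_buf;
    reg [0:DWIDTH-1] out_data_buf;
    reg [0:DWIDTH-1] count;          // internal state of the computation
    assign out_request = out_request_buf;
    assign out_data    = out_data_buf;

    always @(posedge clock) begin
        if (reset) begin
            out_request_buf <= 0;
            out_data_buf    <= 0;
            count           <= 0;
        end
        else begin
            // Withdraw the request once it has been acknowledged (phase 3)
            if (out_request && out_acknowledge)
                out_request_buf <= 0;
            // Compute only when the handshake is back in its idle phase
            // (request and acknowledge both low), then raise the request
            if (!out_request && !out_acknowledge) begin
                out_request_buf <= 1;
                out_data_buf    <= count;
                count           <= count + 1;
            end
        end
    end
endmodule
```

The guard that waits for both request and acknowledge to be low ensures that a new value is only offered after the previous 4-phase handshake has fully completed, so the data wire stays stable for the whole handshake.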
    module cons #(parameter DWIDTH = 8)
       (input               clock,
        input               reset,
        output              in_request,
        input               in_acknowledge,
        input  [0:DWIDTH-1] in_data);

        // Register for input request signal
        reg in_request_buf;
        assign in_request = in_request_buf;

        // Buffer to store incoming values
        reg [0:DWIDTH-1] buffer;

        // Skeleton placeholder (hypothetical name), driven by the
        // component's own computation
        wire input_required_next_cycle;

        always @(posedge clock) begin
            // Clear all registers on reset
            if (reset) begin
                in_request_buf <= 0;
                buffer         <= 0;
            end
            else begin
                // Take data if input request was acknowledged
                if (in_request && in_acknowledge) begin
                    buffer         <= in_data;
                    in_request_buf <= 0;
                end
                // Compute if no requests are open
                if (!in_request && !in_acknowledge) begin
                    if (input_required_next_cycle) begin
                        in_request_buf <= 1;
                    end
                    // compute: advance the computation here
                end
            end
        end

    endmodule
    module prodcons #(parameter DWIDTH = 8)
       (input               clock,
        input               reset,
        output              in_request,
        input               in_acknowledge,
        input  [0:DWIDTH-1] in_data,
        output              out_request,
        input               out_acknowledge,
        output [0:DWIDTH-1] out_data);

        // Register for input request signal
        reg in_request_buf;
        assign in_request = in_request_buf;

        // Register for output request signal
        reg out_request_buf;
        assign out_request = out_request_buf;

        // Register for output data
        reg [0:DWIDTH-1] out_data_buf;
        assign out_data = out_data_buf;

        // Buffer to store incoming values
        reg [0:DWIDTH-1] buffer;

        // Skeleton placeholders (hypothetical names), driven by the
        // component's own computation
        wire              output_ready_next_cycle;
        wire [0:DWIDTH-1] output_value;
        wire              input_required_next_cycle;

        always @(posedge clock) begin
            // Clear all registers on reset
            if (reset) begin
                in_request_buf  <= 0;
                out_request_buf <= 0;
                out_data_buf    <= 0;
                buffer          <= 0;
            end
            else begin
                // Take data if input request was acknowledged
                if (in_request && in_acknowledge) begin
                    buffer         <= in_data;
                    in_request_buf <= 0;
                end
                // Stop providing data if output request was acknowledged
                if (out_request && out_acknowledge) begin
                    out_request_buf <= 0;
                end
                // Compute if no requests are open
                if (!in_request && !in_acknowledge &&
                    !out_request && !out_acknowledge) begin
                    if (output_ready_next_cycle) begin
                        out_request_buf <= 1;
                        out_data_buf    <= output_value;
                    end
                    if (input_required_next_cycle) begin
                        in_request_buf <= 1;
                    end
                    // compute: advance the computation here
                end
            end
        end
    endmodule
    module passivator #(parameter DWIDTH = 8)
       (input               in_req,
        output              in_ack,
        input  [0:DWIDTH-1] in_data,
        input               out_req,
        output              out_ack,
        output [0:DWIDTH-1] out_data);

        // Passivator behaviour: Muller-C element built from a majority
        // gate with in_ack fed back (1 LUT)
        assign in_ack  = (in_req & out_req) | (in_req & in_ack) | (out_req & in_ack);
        assign out_ack = in_ack;

        // Data pass-through
        assign out_data = in_data;

    endmodule

    module pipeline #(parameter DWIDTH = 8)
       (input clock,
        input reset);

        // Interconnects
        wire req1, req2, req3, req4;
        wire ack1, ack2, ack3, ack4;
        wire [0:DWIDTH-1] data1, data2, data3, data4;

        // Instantiate the pipeline
        prod       #(DWIDTH) stage1 (clock, reset, req1, ack1, data1);
        passivator #(DWIDTH) pass1  (req1, ack1, data1, req2, ack2, data2);
        prodcons   #(DWIDTH) stage2 (clock, reset, req2, ack2, data2, req3, ack3, data3);
        passivator #(DWIDTH) pass2  (req3, ack3, data3, req4, ack4, data4);
        cons       #(DWIDTH) stage3 (clock, reset, req4, ack4, data4);

    endmodule