ECE251. VLSI System Design
Project 4: SRAM Cell and Memory Array Operation

Area
  Memory core (256 bit)     4661 µm²
  Row Decoder               204.7 µm²
  Column Decoder            1420 µm²
    Predecoder              156.1 µm²
    Mux                     629.2 µm²
    Tristate Buffer         641.6 µm²
  Overall Design            8567 µm²

Due Date: 6/12/2002
Instructor: Dr. A. Kavianpour
ID: 64032780
Name: Jiwon Hahn
File Directory: ~jhahn/ece251/project4
1. Objective
Design a 256-bit SRAM with the following characteristics:
1. The memory cell is based on the 6-transistor CMOS SR latch.
2. The memory cells are organized as an 8 x 4 byte rectangular array.
3. The row decoder addresses the 8 word lines, while the column decoder addresses each of the 4 byte-wide columns. Each row word line is 32 bits, and each column is one byte wide and 8 rows deep.
4. Use a NAND design for these decoders.
5. I/O signals swing between GND and Vdd and are assumed to be ideal steps.
6. Delays should be measured from the 50% point of the input to the 50% point of the outputs of the slower decoder.
7. The memory cell should be sized so that the overall memory achieves the goals of performance, power, and area.
8. There is no need to include a sense amplifier or the tri-state driver.
9. Assume the row word line load is 100fF and the column bit line load is 20pF.
10. Take only the capacitance of the bit lines and word lines into account. Ignore the resistance. This gives an optimistic and simplified analysis.

2. Description
Static Random-Access Memory (SRAM) is a type of memory that is faster and more reliable than the more common DRAM. SRAM doesn't need to be refreshed like dynamic RAM, and it can be about five times faster than DRAM. As a trade-off, it costs much more than DRAM, which limits its use in practice. This project implements a six-transistor CMOS SRAM cell, which means it requires 6 transistors per bit. A0-A4 specify the address of the memory location to write or read, and R/W selects the operation, either Read or Write. D0-D7 are the 8-bit (1-byte) data lines. φ is the clock signal for refreshing the dynamic decoder.
<Block diagram: 256-bit SRAM with inputs R/W, A0-A4, and φ, and data lines D0-D7>
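The organization described above can be sketched as a small behavioural model. This is only an illustration, not the layout's actual netlist: the names `split_address` and `Sram256` are hypothetical, and the bit order (A0 as the least significant row bit, A3 as the least significant column bit) is an assumption, since the report does not state it.

```python
# Behavioural sketch of the 8 x 4-byte organization (illustrative only).
# Assumption: A0 is the least significant row bit and A3 the least
# significant column bit; the report does not state the bit order.

ROWS, COLS = 8, 4  # 8 word lines, 4 byte-wide columns


def split_address(addr):
    """Split a 5-bit address into (row, column) indices."""
    if not 0 <= addr < ROWS * COLS:
        raise ValueError("address must fit in 5 bits")
    row = addr & 0b111        # A0..A2 go to the 3-to-8 row decoder
    col = (addr >> 3) & 0b11  # A3..A4 go to the 2-to-4 column decoder
    return row, col


class Sram256:
    """32 bytes of storage addressed through a row and a column index."""

    def __init__(self):
        self.cells = [[0] * COLS for _ in range(ROWS)]

    def write(self, addr, byte):
        row, col = split_address(addr)
        self.cells[row][col] = byte & 0xFF

    def read(self, addr):
        row, col = split_address(addr)
        return self.cells[row][col]


mem = Sram256()
mem.write(0b01101, 0xFF)
print(split_address(0b01101), hex(mem.read(0b01101)))
```

Under the assumed bit order, every one of the 32 addresses selects a distinct (row, column) pair, which is the property the two decoders together have to provide.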
<Floor plan of SRAM: a 3x8 NAND row decoder (A0, A1, A2) drives the 8 word lines of the storage cell array (Byte0, Byte1, Byte2, Byte3, 8 bits each); a BUS routes the columns to eight 4-to-1 multiplexers M0-M7, which are selected by the 2x4 NAND predecoder (A3, A4) of the column decoder; the multiplexers connect through the tristate buffer (R/W) to D0-D7>

* The BUS collects the same bits of each column as a wired-OR line and distributes each line to the corresponding MUX input. For example, the first bits of the first column are all wired together as one input of the BUS, and after passing through the BUS, the line is directed to the first input of MUX0. In the same way, the first bits of the second column are all wired together and directed to the first input of MUX1.
* The number of each Mux corresponds to the bit position within one byte.

3. Procedure
1) Logic Schematics
Gate Level Design
i) 1-bit Memory
<Gate-level schematic: memory cell with BLbar, WL, and BL>
ii) 2-to-4 NAND decoder (Column decoder)
<Gate-level schematic: inputs A3 and A4; four 74LS00 NAND gates drive S3, S2, S1, and S0>
iii) 3-to-8 NAND decoder (Row decoder)
<Gate-level schematic: inputs A0, A1, and A2; eight 4023 NAND gates drive WL7 through WL0>
iv) Tristate Buffer
<Gate-level schematic: Din and Do connected through a 74LS126 enabled by R and a 74LS125 enabled by Rbar>

Transistor Level Diagram
i) 1-bit Memory
<Transistor-level schematic: memory cell with BLbar, WL, BL, and Vdd>
ii) 2-to-4 NAND Decoder
<Transistor-level schematic: clocked PMOS pull-ups from Vdd on S0-S3; inputs A3, A3bar, A4, A4bar>
iii) 3-to-8 NAND Decoder
<Transistor-level schematic: clocked PMOS pull-ups from Vdd on the WLbar lines; inputs A0, A0bar, A1, A1bar, A2, A2bar>
iv) 4-to-1 multiplexer
<Transistor-level schematic: inputs B0-B3, selects S0-S3 and S0bar-S3bar, output D>
v) Tristate Buffer
<Transistor-level schematic: Din and Dout, controlled by R and Rbar>

2) Magic Layout
i) Memory core
One bit memory cell without & with bit-line inverter
256-bit memory (area = 589 x 977, 4661 µm²)
ii) 3-to-8 Row Decoder (area = 178 x 142, 204.7 µm²)
iii) 2-to-4 Column Predecoder (area = 119 x 162, 156.2 µm²)
iv) Tristate buffer
One bit:
Altogether (8 bits) (area = 88 x 411, 293 µm²)
v) Column Decoder cell (Predecoder + Mux + Tristate Buffer) (area = 169 x 1037, 1420 µm²)
vi) Complete Layout (area = 924 x 1145, 8567 µm²)
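As a cross-check of the layout figures, the reported areas are consistent with the dimensions being given in λ units and the areas in µm², with λ = 0.09 µm. That value is an assumption inferred from the numbers themselves (for example 589 x 977 x 0.09² ≈ 4661); it is not stated in the report. A minimal sketch of the conversion, with the hypothetical helper `area_um2`:

```python
# Cross-check of the reported layout areas.
# Assumption: dimensions are in lambda and LAMBDA_UM = 0.09 um; this value is
# inferred from the reported numbers, not stated in the report.
LAMBDA_UM = 0.09

# name: (width, height, reported area in um^2)
blocks = {
    "memory core": (589, 977, 4661),
    "row decoder": (178, 142, 204.7),
    "column predecoder": (119, 162, 156.2),
    "tristate buffer (8b)": (88, 411, 293),
    "column decoder cell": (169, 1037, 1420),
    "complete layout": (924, 1145, 8567),
}


def area_um2(width, height, lam=LAMBDA_UM):
    """Convert a width x height box in lambda units to square micrometres."""
    return width * height * lam * lam


for name, (w, h, reported) in blocks.items():
    print(f"{name:22s} computed {area_um2(w, h):8.1f}  reported {reported}")

core_share = blocks["memory core"][2] / blocks["complete layout"][2]
print(f"memory core share of complete layout: {core_share:.1%}")
```

Under this assumption the per-block values agree with the reported ones to within rounding, and the complete layout agrees to within about 3 µm².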
3) Simulation
IRSIM
i) Memory Core
One-bit Read
    model linear
    h Vdd!
    l GND!
    clock Q 0 0 0 1 1 0 1 1
    clock Qb 1 1 1 0 0 1 0 0
    clock WL 1 1 1 1 1 1 1 1
    ana Q Qb WL BL
    c
One-bit Write
    model linear
    h Vdd!
    l GND!
    clock BL 0 0 0 1 1 0 1 1
    clock WL 1 1 1 1 1 1 1 1
    ana WL BL Q Qb
    c
One-byte Read
    model linear
    stepsize 10
    vector rowaddr WL0 WL1 WL2 WL3 WL4 WL5 WL6 WL7
    vector Data1 C11 C12 C13 C14 C15 C16 C17 C18
    vector Data2 C21 C22 C23 C24 C25 C26 C27 C28
    vector Data3 C31 C32 C33 C34 C35 C36 C37 C38
    vector Data4 C41 C42 C43 C44 C45 C46 C47 C48
    vector Q62 Q621 Q622 Q623 Q624 Q625 Q626 Q627 Q628
    vector Q62b Q62b1 Q62b2 Q62b3 Q62b4 Q62b5 Q62b6 Q62b7 Q62b8
    vector Q72 Q721 Q722 Q723 Q724 Q725 Q726 Q727 Q728
    vector Q72b Q72b1 Q72b2 Q72b3 Q72b4 Q72b5 Q72b6 Q72b7 Q72b8
    h vdd!
    l GND!
    set Q62 10101010
    set Q62b 01010101
    set Q72 10010110
    set Q72b 01101001
    ana rowaddr Q62 Q62b Q72 Q72b Data1 Data2 Data3 Data4
    set rowaddr 00000100
    s 100
    set rowaddr 00000010
    s 100
One-byte Write (all ones and all zeros)
    model linear
    stepsize 10
    vector rowaddr WL0 WL1 WL2 WL3 WL4 WL5 WL6 WL7
    vector Data1 C11 C12 C13 C14 C15 C16 C17 C18
    vector Data2 C21 C22 C23 C24 C25 C26 C27 C28
    vector Data3 C31 C32 C33 C34 C35 C36 C37 C38
    vector Data4 C41 C42 C43 C44 C45 C46 C47 C48
    vector Q62 Q621 Q622 Q623 Q624 Q625 Q626 Q627 Q628
    vector Q62b Q62b1 Q62b2 Q62b3 Q62b4 Q62b5 Q62b6 Q62b7 Q62b8
    vector Q72 Q721 Q722 Q723 Q724 Q725 Q726 Q727 Q728
    vector Q72b Q72b1 Q72b2 Q72b3 Q72b4 Q72b5 Q72b6 Q72b7 Q72b8
    h vdd!
    l GND!
    set rowaddr 00000100
    set Data1 XXXXXXXX
    set Data2 11111111
    set Data3 XXXXXXXX
    set Data4 XXXXXXXX
    ana rowaddr Data1 Data2 Data3 Data4 Q62 Q62b Q72 Q72b
    s 100
    set rowaddr 00000010
    set Data1 XXXXXXXX
    set Data2 00000000
    set Data3 XXXXXXXX
    set Data4 XXXXXXXX
    s 100
This writes 11111111 (ff) from Data2 into the sixth row of the second column, and then writes 00000000 (00) from Data2 into the seventh row of the second column.
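The row numbers quoted above follow from the order of the nodes in the rowaddr vector. A minimal sketch of that mapping, assuming the leftmost bit of the set string applies to the first node listed in the vector (the helper `selected_rows` is hypothetical, not an IRSIM command):

```python
# Map an IRSIM-style bit string onto the word lines of the rowaddr vector.
# Assumption: the leftmost bit of the string belongs to the first node
# listed in the vector definition (WL0).
WORD_LINES = ["WL0", "WL1", "WL2", "WL3", "WL4", "WL5", "WL6", "WL7"]


def selected_rows(bits):
    """Return the word lines driven high by a rowaddr bit string."""
    if len(bits) != len(WORD_LINES):
        raise ValueError("expected one bit per word line")
    return [name for name, bit in zip(WORD_LINES, bits) if bit == "1"]


print(selected_rows("00000100"))  # first write in the script
print(selected_rows("00000010"))  # second write in the script
```

With this ordering the two strings select WL5 and WL6, i.e. the sixth and seventh rows when counting from one, which matches the node names Q62 and Q72 used for the second column.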
ii) 2-to-4 Column Predecoder
    model linear
    h Vdd!
    l GND!
    clock a3 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    clock a3b 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
    clock a4 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    clock a4b 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
    clock clk 0 0 0 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
    ana clk a3 a3b a4 a4b WL0 WL1 WL2 WL3
    c
The output line is properly selected when the clock goes low. The reason is explained in the Discussion.
iii) 3-to-8 Row Decoder
    model linear
    h Vdd!
    l GND!
    clock a0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
    clock a0b 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
    clock a1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
    clock a1b 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
    clock a2 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
    clock a2b 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
    clock clk 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
    ana clk a0 a1 a2 a3 WL0 WL1 WL2 WL3 WL4 WL5 WL6 WL7
    c
Again, the output lines are properly selected when the clock goes low.
iv) Tri-State Buffer
One-bit Read
The input is propagated to the output when the Read line is activated (high).
One-bit Write
One-byte Read
    model linear
    h Vdd!
    l GND!
    clock r 0 0 1 1 0 0 1 1
    clock W 1 1 0 0 1 1 0 0
    clock DI0 0 0 0 1 1 0 1 1
    clock DI1 0 1 1 0 1 1 0 0
    clock DI2 1 0 1 1 0 0 0 1
    clock DI3 1 1 0 0 0 1 1 0
    clock DI4 0 0 0 1 1 0 1 1
    clock DI5 0 1 1 0 1 1 0 0
    clock DI6 1 0 1 1 0 0 0 1
    clock DI7 1 1 0 0 0 1 1 0
    ana r w DI0 D0 DI1 D1 DI2 D2 DI3 D3 DI4 D4 DI5 D5 DI6 D6 DI7 D7
    c
When Read is high, the output pattern on D follows the input pattern on DI.
One-byte Write
    model linear
    h Vdd!
    l GND!
    clock r 0 0 1 1 0 0 1 1
    clock W 1 1 0 0 1 1 0 0
    clock D0 0 0 0 1 1 0 1 1
    clock D1 0 1 1 0 1 1 0 0
    clock D2 1 0 1 1 0 0 0 1
    clock D3 1 1 0 0 0 1 1 0
    clock D4 0 0 0 1 1 0 1 1
    clock D5 0 1 1 0 1 1 0 0
    clock D6 1 0 1 1 0 0 0 1
    clock D7 1 1 0 0 0 1 1 0
    ana r w D0 DI0 D1 DI1 D2 DI2 D3 DI3 D4 DI4 D5 DI5 D6 DI6 D7 DI7
    c
When Read is low, the output pattern on DI follows the input pattern on D.
v) Predecoder and Mux together
    model linear
    h Vdd!
    l GND!
    clock a3 0 0 1 1 0 0 1 1
    clock a3b 1 1 0 0 1 1 0 0
    clock a4 0 1 0 1 0 1 0 1
    clock a4b 1 0 1 0 1 0 1 0
    clock B1 1 0 0 0 1 0 0 0
    clock B2 0 1 0 0 0 1 0 0
    clock B3 0 0 1 0 0 0 1 0
    clock B4 0 0 0 1 0 0 0 1
    clock clk 0 0 0 0 0 0 0 0
    ana clk a3 a4 B1 B2 B3 B4 Din
    c
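The predecoder and mux stimulus above can be read as a self-checking pattern: in each step exactly one of B1-B4 is high. A behavioural sketch of what the mux output would be, assuming the select index is 2*a3 + a4 with index 0 selecting B1 (this mapping, and the helpers `predecode` and `mux4`, are assumptions for illustration and are not taken from the layout):

```python
# Behavioural sketch of the 2-to-4 predecoder driving a 4-to-1 mux.
# Assumption: select index = 2*a3 + a4, and index 0 selects B1.


def predecode(a3, a4):
    """Return one-hot select lines [S0, S1, S2, S3] for address bits a3, a4."""
    index = 2 * a3 + a4
    return [int(i == index) for i in range(4)]


def mux4(inputs, selects):
    """Return the input whose select line is high (exactly one must be)."""
    if sum(selects) != 1:
        raise ValueError("selects must be one-hot")
    return next(bit for bit, sel in zip(inputs, selects) if sel)


# Stimulus copied from the IRSIM script above, one entry per clock step.
a3 = [0, 0, 1, 1, 0, 0, 1, 1]
a4 = [0, 1, 0, 1, 0, 1, 0, 1]
B1 = [1, 0, 0, 0, 1, 0, 0, 0]
B2 = [0, 1, 0, 0, 0, 1, 0, 0]
B3 = [0, 0, 1, 0, 0, 0, 1, 0]
B4 = [0, 0, 0, 1, 0, 0, 0, 1]

din = [
    mux4((B1[t], B2[t], B3[t], B4[t]), predecode(a3[t], a4[t]))
    for t in range(len(a3))
]
print(din)
```

Under the assumed mapping the selected input is the one that is high in every step, so the modelled output stays at 1 throughout; any other mapping would show zeros in the pattern.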
vi) Column Decoder (decoder + mux + tristate buffer)
    model linear
    stepsize 10
    vector M1 m11 m12 m13 m14
    vector M2 m21 m22 m23 m24
    vector M3 m31 m32 m33 m34
    vector M4 m41 m42 m43 m44
    vector M5 m51 m52 m53 m54
    vector M6 m61 m62 m63 m64
    vector M7 m71 m72 m73 m74
    vector M8 m81 m82 m83 m84
    vector DI Di0 Di1 Di2 Di3 Di4 Di5 Di6 Di7
    vector D D0 D1 D2 D3 D4 D5 D6 D7
    vector in a3 a3b a4 a4b
    vector input a3 a4
    vector test t0 t1 t2 t3
    h Vdd!
    l GND!
    l clk
    set M1 0001
    set M2 0101
    set M3 0001
    set M4 0101
    set M5 0011
    set M6 0111
    set M7 0011
    set M8 0111
    h R
    l W
    ana input M1 M2 M3 M4 M5 M6 M7 M8 test DI D
    set in 0101
    s 100
    set in 0110
    s 100
    set in 1001
    s 100
    set in 1010
    s 100
vii) ETC (testing with all components together)
SPICE
i) Writing one and zero in one bit memory cell
    * HSPICE file created from memcell.ext - technology: scmos
    .lib 'log018_1.l' TT
    .option scale=0.1u
    m0 vdd Q Qb Vdd pch w=3 l=3
    + ad=105 pd=102 as=19 ps=18
    m1 vdd Qb Q Vdd pch w=3 l=3
    + ad=0 pd=0 as=19 ps=18
    m2 GND Q Qb Gnd nch w=4 l=2
    + ad=52 pd=42 as=44 ps=34
    m3 Q Qb GND Gnd nch w=4 l=2
    + ad=44 pd=34 as=0 ps=0
    m4 Qb WL a_58_n244 Gnd nch w=3 l=2
    + ad=0 pd=0 as=39 ps=36
    m5 Q WL BL Gnd nch w=3 l=2
    + ad=0 pd=0 as=19 ps=18
    m6 GND BL a_58_n244 Gnd nch w=4 l=2
    + ad=0 pd=0 as=0 ps=0
    m7 a_58_n244 BL vdd Vdd pch w=3 l=2
    + ad=19 pd=18 as=0 ps=0
    C1 BL GND 20pF
    C2 WL GND 100fF
    Vdd Vdd GND DC 1.8V
    Vb BL GND PULSE (0 1.8 50ns 5ns 5ns 50ns 100ns)
    Vw WL GND PULSE (1.8 0 25ns 5ns 5ns 25ns 50ns)
    .measure tran tplh trig V(WL) val=0.18 rise=1 targ V(Q) val=1.62 rise=1
    .measure tran tphl trig V(WL) val=0.18 rise=2 targ V(Q) val=0.18 fall=1
    .tran 1ns 150ns start=0
    .option POST=2 brief
    .end
    ** hspice subcircuit dictionary
Output:
    ****** * hspice file created from memcell.ext - technology: scmos
    ****** transient analysis tnom= 25.000 temp= 25.000 ******
    tplh= 2.3264E-09 targ= 5.7826E-08 trig= 5.5500E-08
    tphl= 3.6179E-09 targ= 1.0912E-07 trig= 1.0550E-07
    ***** job concluded
    ****** Star-HSPICE -- 2001.2 (20010615) 10:00:34 06/12/2002 solaris
    ****** * hspice file created from memcell.ext - technology: scmos
    ****** job statistics summary tnom= 25.000 temp= 25.000 ******
    total memory used 462 kbytes
    # nodes = 23 # elements= 11
    # diodes= 0 # bjts = 0 # jfets = 0 # mosfets = 8
    analysis     time    # points   tot. iter   conv.iter
    op point     0.01    1          9
    transient    0.14    151        304         102    rev= 12
    readin       0.72
    errchk       0.04
    setup        0.00
    output       0.00
    total cpu time 1.08 seconds
    job started at 10:00:34 06/12/2002
    job ended at 10:00:41 06/12/2002
    lic: Release hspice token(s)
    >info: ***** hspice job concluded
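Each delay reported by the .measure statements is the target time minus the trigger time. The short check below recomputes both delays from the values printed in the HSPICE output and expresses the trigger and target voltages as a fraction of the 1.8 V supply; the dictionary layout and the helper `delay` are only illustrative.

```python
# Recompute the measured delays from the trig/targ times in the HSPICE output.
VDD = 1.8  # supply voltage from the netlist (Vdd Vdd GND DC 1.8V)

measurements = {
    "tplh": {"trig": 5.5500e-08, "targ": 5.7826e-08, "reported": 2.3264e-09},
    "tphl": {"trig": 1.0550e-07, "targ": 1.0912e-07, "reported": 3.6179e-09},
}


def delay(m):
    """Delay is the time of the target crossing minus the trigger crossing."""
    return m["targ"] - m["trig"]


for name, m in measurements.items():
    print(f"{name}: {delay(m) * 1e9:.3f} ns "
          f"(reported {m['reported'] * 1e9:.3f} ns)")

# Threshold voltages used in the .measure statements, relative to Vdd.
for level in (0.18, 1.62):
    print(f"{level} V is {round(100 * level / VDD)}% of Vdd")
```

The small differences from the reported values come from the rounding of the printed trig and targ times. The thresholds in these two statements correspond to the 10% and 90% points of the supply.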
Writing a one to the one-bit cell takes 2.326 ns, and writing a zero takes 3.62 ns.
Awaves
<Waveform: writing one and zero in one bit memory cell>

4. Discussion
In this project, I tried to save as much area as possible. Since the memory bit is the most crucial part, I made sure the transistor sizing was appropriate and didn't modify anything from the original technique, but I saved area by placing the inverter between the bit line and bit line bar and by letting cells share the Vdd and GND lines. This way, the total memory core became quite small. Both decoders were also carefully designed so that they waste no space while still operating correctly.

One thing to mention about the decoder design is that I implemented a pull-up device rather than a precharge device. In other words, the clock is actually grounded, and the circuit is always active. This was done by increasing the length of the PMOS transistors that connect the Vdd supply to the word select lines. Since the selected lines are connected directly to inverters, a weak zero on the selected line still produces a strong one, which gives the right operation. In this way, even though I am using a dynamic pass-transistor decoder design, there is no need for a precharge period or for an external clock signal. The disadvantage, however, could be the static power consumption.

I used bus wires to connect the memory lines to the mux lines. This gives a neat design, but as a trade-off it wastes some area. I spent most of the time on the column decoder. I aligned the 2-to-4 predecoder, mux, and tristate buffer in one rectangular space of similar width to the memory core, which uses the area very efficiently. At first I used an NMOS pass-transistor mux, but passing a weak one became a problem. After trying a width of 19, which seemed to work in isolation but didn't work in the complete design, I changed the whole mux design to CPL (complementary pass-transistor logic). The picture below is the former design I used. Both designs use exactly the same area, since CPL doesn't need an NMOS width of 19.
<Figure: former NMOS pass-transistor mux design>

5. Conclusion
In this project, I designed a 256-bit (32-byte) SRAM with a six-transistor memory cell. The whole design consists of a 256-bit memory core with an area of 4661 µm², which is about half of the overall 8567 µm². The peripheral devices include the row decoder and column decoder, which together map the 5-bit address input to one of the 32 bytes of memory, and the tristate buffer, which controls the Read/Write operation.