Developing an Architecture for a Single-Flux Quantum Based Reconfigurable Accelerator
1 Developing an Architecture for a Single-Flux Quantum Based Reconfigurable Accelerator. F. Mehdipour, Hiroaki Honda*, * H. Kataoka, K. Inoue and K. Murakami. Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan. *Institute of Systems, Information Technologies and Nanotechnologies (ISIT), Fukuoka, Japan. farhad@c.csce.kyushu-u.ac.jp
2 Agenda: Introduction; SFQ-LSRDP General Architecture; The Design Procedure and Tool Chain; Input/Output Nodes Placement; Area Minimization; Experimental Results; Conclusions.
3 CREST-JST SFQ-RDP Project (2006~): a low-power, high-performance reconfigurable processor based on single-flux quantum circuits (SFQ-LSRDP).
Yokohama National Univ. (Prof. N. Yoshikawa et al.): SFQ-FPU chip, cell library.
Nagoya Univ. (Prof. A. Fujimaki et al.): SFQ-RDP chip, cell library, and wiring.
Kyushu Univ. (Prof. K. Murakami et al.): architecture, compiler and applications.
Nagoya Univ. (Prof. N. Takagi (Leader) et al.): CAD for logic design and arithmetic circuits.
Superconducting Research Lab. (SRL) (Dr. S. Nagasawa et al.): SFQ process.
4 Goals: discovering appropriate scientific applications; developing compiler tools; developing performance analysis tools; designing and implementing the SFQ-LSRDP architecture considering the features and limitations of SFQ circuits.
5 How a reconfigurable processor works. [Figure: application code in main memory is split into non-critical code, executed on the GPP, and computation-intensive (critical) code, executed on the LSRDP, which is drawn as rows of PEs separated by ORNs.]
6 Single-flux quantum (SFQ) versus CMOS.
CMOS main issues in implementing a large accelerator: high electric power consumption; high heat radiation; difficulties in high-density packing.
SFQ features: high-speed switching and signal transmission; low power consumption; compact implementation (smaller area); suitable for pipeline processing of data streams.
[Figure labels: single flux quantum; superconducting loop; Josephson junction.]
7 Outline of the large-scale reconfigurable data-path (LSRDP) processor. [Figure: block diagram with the GPP, the LSRDP (rows of PEs separated by ORNs), a streaming buffer (SB), SMAC, scratchpad memory and main memory. ORN: Operand Routing Network.]
Reconfigurable data-path components: a matrix of a large number of floating-point Functional Units (FUs); a reconfigurable Operand Routing Network (ORN); dynamic reconfiguration facilities; a Streaming Buffer (SB) for I/O ports.
Features: handling data flow graphs (DFGs) extracted from scientific applications; pipeline execution; burst transfer of rearranged input/output data from/to memory; reduced number of memory accesses (alleviating the memory wall problem).
8 SFQ-LSRDP General Architecture
9 LSRDP architecture. Processing Elements (PEs): each PE includes two components.
FU (Functional Unit): implements basic 64-bit double-precision floating-point operations, including ADD/SUB and MUL.
TU (Transfer Unit): a routing resource for transferring data between non-consecutive rows.
[Figure: array of PEs (FU and TU pairs) between the input ports and the output ports; labels: MUL, Node 15, four functionalities.]
10 PE structures. [Figure: three PE structures built from FUs and TUs.] PE basic arch.: 3 inputs / 2 outputs. PE arch. I: 4 inputs / 3 outputs. PE arch. II: 3 inputs / 3 outputs.
11 Layout types: Type I. Each PE implements ADD/SUB and MUL. Flexible, but consumes a lot of resources. [Figure: W x H array of PEs with an ORN between rows. Legend: A: ADD/SUB; M: MUL; T: Transfer Unit (TU).]
12 Layout types: Type II. Each PE implements ADD/SUB or MUL. [Figure: W x H array of PEs with an ORN between rows. Legend: A: ADD/SUB; M: MUL; T: Transfer Unit (TU).]
13 Maximum connection length (MCL): definition. MCL: the maximum horizontal distance between two PEs located in two subsequent rows.
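As a rough illustration of this definition, the sketch below computes the MCL of a given placement. The data layout (a dict from node name to a (row, column) pair, plus a list of edges) and the function name `max_connection_length` are assumptions made for this example; they are not taken from the authors' tool chain.

```python
def max_connection_length(placement, edges):
    """Return the largest horizontal distance over edges joining subsequent rows."""
    mcl = 0
    for src, dst in edges:
        src_row, src_col = placement[src]
        dst_row, dst_col = placement[dst]
        # Only connections between two subsequent rows count toward the MCL.
        if dst_row - src_row == 1:
            mcl = max(mcl, abs(dst_col - src_col))
    return mcl


# Hypothetical placement: node -> (row, column).
placement = {"a": (0, 0), "b": (0, 3), "c": (1, 1), "d": (2, 1)}
edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(max_connection_length(placement, edges))  # 2, from the b -> c connection
```

A smaller MCL lets each ORN span fewer columns, which is why the later slides try to reduce it.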
14 An ORN structure. [Figure: two rows of T and FPU blocks connected through an ORN built from CB, ½CB and T2 cells; T2: 2-bit shift register.] The ORN consists of 2-bit shift registers and 1-by-2 and 2-by-2 crossbar switches. A. Fujimaki, et al., Demonstration of an SFQ-Based Accelerator Prototype for a High-Performance Computer, ASC08, 2008.
15 Dynamic reconfiguration architecture. Three bit-stream lines for dynamic reconfiguration of: immediate registers (64-bit) in each PE; selector bits for the muxes selecting the input data of FUs; crossbar switches in ORNs.
16 What should be decided during the design procedure: Width and height? The number of I/O ports? Maximum Connection Length (MCL)? ORN size and structure? Layout: FU types (ADD/SUB and MUL)? Reconfiguration mechanism (PE, ORN, immediate data)? On-chip memory configuration?
17 The Design Procedure and Tool Chain
18 Compiler and design flow. DFGs are manually generated. DFG mapping results are employed for: analyzing LSRDP architecture statistics (a quantitative approach); generating LSRDP configuration bit-streams.
19 Benchmark applications.
Finite difference method calculation of 2nd-order partial differential equations: 1-dim. heat equation (Heat); 1-dim. vibration equation (Vibration); 2-dim. Poisson equation (Poisson).
Quantum chemistry application: recursive parts of the Electron Repulsion Integral calculation (ERI-Rec).
Types of operations in the calculations: ADD/SUB and MUL.
20 DFG extraction: heat equation.
1-dim. heat equation for T(x,t): $\frac{\partial T(x,t)}{\partial t} = A\,\frac{\partial^2 T(x,t)}{\partial x^2}$ (A is const.).
Calculation by the Finite Difference Method (FDM): $T(x_i, t_{j+1}) = D \cdot T(x_i, t_j) + B \cdot \left[ T(x_{i+1}, t_j) + T(x_{i-1}, t_j) \right]$.
[Figure: basic DFG with inputs T(i-1,j), T(i,j), T(i+1,j), constants D and B, two additions and two multiplications, and output T(i,j+1).]
The basic DFG can be extended in the horizontal and vertical directions to make a larger DFG.
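A minimal sketch of one time step of this scheme, read here as T(x_i, t_j+1) = D * T(x_i, t_j) + B * [T(x_i+1, t_j) + T(x_i-1, t_j)]. The slide only names the constants D and B; the relations B = A*dt/dx^2 and D = 1 - 2B used below are the usual explicit-scheme choices and are an assumption, as are the fixed end points and the function name `heat_step`.

```python
def heat_step(temps, d_coef, b_coef):
    """Advance one time step; the two end points are held fixed (assumed boundary)."""
    nxt = list(temps)
    for i in range(1, len(temps) - 1):
        # Basic DFG: one ADD of the two neighbours, two MULs, one final ADD.
        neighbours = temps[i - 1] + temps[i + 1]
        nxt[i] = d_coef * temps[i] + b_coef * neighbours
    return nxt


b_coef = 0.25             # assumed value of A*dt/dx**2
d_coef = 1 - 2 * b_coef   # assumed relation for the explicit scheme
row = [0.0, 0.0, 1.0, 0.0, 0.0]
print(heat_step(row, d_coef, b_coef))  # [0.0, 0.25, 0.5, 0.25, 0.0]
```

Each interior point uses exactly the four operations of the basic DFG, so repeating the loop body across i and across time steps corresponds to extending the DFG horizontally and vertically.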
21 A sample DFG (Heat). Inputs: 32; outputs: 16; operations: 721; immediates: 364.
22 DFG mapping flow. Inputs: DFG and LSRDP architecture description. Modified mapping flow: placing DFG nodes on the LSRDP; placing I/O nodes; re-placing DFG nodes on the LSRDP (considering I/O node positions); routing connections; re-placing output nodes; routing input/output connections. Output: configuration file. [Figure annotation: longest connections, MCL = 2.]
23 Placing Input/Output Nodes
24 Fan-out based I/O nodes placement.
n_i: the number of children of input node i. C_{i,1}, C_{i,2}, C_{i,3}, ..., C_{i,n_i}: locations of the children. X: location of the input node i.
Total Connection Length: $TCL = |C_{i,1} - X| + |C_{i,2} - X| + \dots + |C_{i,n_i} - X|$. Objective: minimize TCL.
n_i = 1: X = C_{i,1}. n_i = 2: C_{i,1} <= X <= C_{i,2}. n_i = 3: X = C_{i,2}. n_i >= 2: X = C_{i,j}, j = 2, ..., n_i - 1.
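The cases listed on this slide match the well-known fact that a sum of absolute distances is minimized at a median of the points. The sketch below places an input node at the lower median of its children's columns; the function names and the choice of the lower median are assumptions for illustration only.

```python
def total_connection_length(x, children):
    """TCL for an input node at column x with children at the given columns."""
    return sum(abs(c - x) for c in children)


def place_input_node(children):
    """Pick a column that minimizes TCL: a median of the children's columns."""
    ordered = sorted(children)
    return ordered[(len(ordered) - 1) // 2]  # lower median


children = [2, 9, 4]
x = place_input_node(children)
print(x, total_connection_length(x, children))  # 4 7
```

For two children any column between them gives the same TCL, which is the C_{i,1} <= X <= C_{i,2} case above.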
25 One main reason for the large MCL: input ports are far from each other.
26 Proximity-factor based placement.
The proximity factor indicates how far a pair of input ports should be located from each other. For a pair of input nodes, the larger the number of closer common descendants, the higher the proximity factor assigned.
S_{i,j}: the set of common descendants of input nodes i and j.
D_{k,i} (= D_{k,j}): distance of common descendant node k to the input nodes i and j (it is equal to the ASAP execution level of the node).
$P = \begin{pmatrix} p_{1,1} & p_{1,2} & \dots & p_{1,n} \\ p_{2,1} & p_{2,2} & \dots & p_{2,n} \\ \vdots & & & \vdots \\ p_{n,1} & p_{n,2} & \dots & p_{n,n} \end{pmatrix}$, with $p_{i,j} = p_{j,i} = \sum_{k \in S_{i,j}} \frac{1}{D_{k,i}}$ if $i \neq j$ (the value for i = j is not legible in this transcription).
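The following sketch computes a proximity factor for a small DFG. It assumes the factor is the sum, over the common descendants k of two input nodes, of 1/D_k, where D_k is the ASAP level of k; this reading of the garbled slide formula is an assumption, as are the graph encoding (a dict from node to its list of children) and all function names.

```python
def asap_levels(children):
    """ASAP level per node: 0 for nodes without parents, else 1 + max parent level."""
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(node)
    levels = {}

    def level(node):
        if node not in levels:
            levels[node] = 1 + max((level(p) for p in parents.get(node, [])), default=-1)
        return levels[node]

    for node in set(children) | set(parents):
        level(node)
    return levels


def descendants(children, start):
    """All nodes reachable from start (excluding start itself)."""
    seen, stack = set(), list(children.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children.get(node, []))
    return seen


def proximity(children, i, j):
    """Assumed proximity factor: sum of 1/ASAP-level over common descendants."""
    levels = asap_levels(children)
    common = descendants(children, i) & descendants(children, j)
    return sum(1.0 / levels[k] for k in common)


# Hypothetical DFG: I1 and I2 feed n4; n4 and I3 feed n5.
dfg = {"I1": ["n4"], "I2": ["n4"], "I3": ["n5"], "n4": ["n5"], "n5": []}
print(proximity(dfg, "I1", "I2"))  # 1.5
print(proximity(dfg, "I1", "I3"))  # 0.5
```

In this toy graph I1 and I2 share a descendant at level 1, so their factor is higher than that of I1 and I3, which only meet at level 2.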
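27 Proximity factor: example. [Figure: a DFG with input nodes I1, I2 and I3, the common-descendant sets S_{1,2}, S_{1,3} and S_{2,3}, and the resulting factors p_{1,2}, p_{1,3} and p_{2,3}; the numeric values are not recoverable from this transcription.] Input nodes I1 and I2 should be located closer to each other than to I3.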
28 Input nodes placement alg.: example.
$C(l) = \sum_{i=l+1}^{r-1} p_{ij} \cdot \frac{1}{i-l}$, $C(r) = \sum_{i=l+1}^{r-1} p_{ij} \cdot \frac{1}{r-i}$.
If C(l) > C(r): l = l+1, L[l] = j; else: r = r+1, L[r] = j.
Placing the 1st input node with the highest proximity factor: it takes the centre position N/2 of the row (positions ..., N/2-3, N/2-2, N/2-1, N/2, N/2+1, N/2+2, N/2+3, ...).
Placing the 2nd input node with the highest proximity factor: it takes a position next to the first one.
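29 Input ports placement alg.: example. Placing the i-th input node: the nodes placed so far occupy positions N/2-K to N/2+M, and l and r are the free positions to their left and right. If C(l) > C(r), node i is placed at l; if C(r) > C(l), node i is placed at r.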
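One possible reading of this centre-out placement is sketched below. It is only an illustration: the rule for choosing the first node (largest total proximity), the rule for choosing the next candidate (largest proximity to the nodes already placed), the inverse-distance weighting of C(l) and C(r), and the tie-breaking are all assumptions, and the sketch does not reproduce the slide's exact index bookkeeping for l and r.

```python
from collections import deque


def place_inputs(p):
    """Greedy centre-out ordering of input nodes from a symmetric proximity matrix p."""
    unplaced = set(range(len(p)))
    # Assumed rule: start with the node having the largest total proximity.
    first = max(unplaced, key=lambda i: (sum(p[i]), -i))
    order = deque([first])
    unplaced.remove(first)
    while unplaced:
        # Assumed rule: next candidate is the one most attracted to placed nodes.
        j = max(unplaced, key=lambda c: (sum(p[i][c] for i in order), -c))
        m = len(order)
        # Attraction of the free slot on each side, weighted by inverse distance.
        c_left = sum(p[node][j] / (t + 1) for t, node in enumerate(order))
        c_right = sum(p[node][j] / (m - t) for t, node in enumerate(order))
        if c_left > c_right:
            order.appendleft(j)
        else:
            order.append(j)
        unplaced.remove(j)
    return list(order)


# Hypothetical proximity matrix for four input nodes (zero diagonal assumed).
p = [
    [0.0, 1.5, 0.5, 1.0],
    [1.5, 0.0, 0.5, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
print(place_inputs(p))  # [3, 0, 1, 2]
```

Node 3 is attracted only to node 0, so it ends up on the left next to node 0, while nodes 1 and 2 extend the row to the right.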
30 Area Minimization
31 Estimating the area of a PE.
Area(FU) = Area(ADD/SUB) = Area(MUL). Area(TU) = Area(MUX) ~ 0.1 x Area(FU).
PE basic arch.: Layout I: Area(PE) = 2.1 x Area(FU); Layout II: Area(PE) = 1.1 x Area(FU).
PE arch. I: Layout I: Area(PE) = 2.2 x Area(FU); Layout II: Area(PE) = 1.2 x Area(FU).
PE arch. II: Layout I: Area(PE) = 2.2 x Area(FU); Layout II: Area(PE) = 1.2 x Area(FU).
[Figure: the three PE structures with inputs A, B, C, an op selector (sel) and mux.]
32 Estimating the ORN area: PE basic arch. (3 inputs / 2 outputs). Number of rows = 1.5 x W; number of columns = 4 x MCL. Area(ORN) = 1.5 x W x (4 x MCL) x Area(CB). W: the number of PEs in an RDP row. [Figure drawn for MCL = 1.]
33 Estimating the ORN area: PE arch. I (4 inputs / 3 outputs). Number of rows = 2 x W; number of columns = 6 x MCL + 2. Area(ORN) = 2 x W x (6 x MCL + 2) x Area(CB). [Figure drawn for MCL = 1.]
34 Estimating the ORN area: PE arch. II (3 inputs / 3 outputs). Number of rows = 1.5 x W; number of columns = 4 x MCL + 1. Area(ORN) = 1.5 x W x (4 x MCL + 1) x Area(CB). [Figure drawn for MCL = 2.]
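The three ORN area estimates share one shape: (rows factor x W) x (columns per MCL x MCL + constant columns) x Area(CB). The sketch below tabulates them for comparison. The dictionary keys, the function name and the example values W = 8 and MCL = 2 are assumptions chosen for illustration; areas are expressed in units of one crossbar (CB).

```python
# architecture: (rows factor, columns per unit of MCL, constant columns)
ORN_SHAPE = {
    "basic": (1.5, 4, 0),    # Area = 1.5 x W x (4 x MCL) x Area(CB)
    "arch_I": (2.0, 6, 2),   # Area = 2 x W x (6 x MCL + 2) x Area(CB)
    "arch_II": (1.5, 4, 1),  # Area = 1.5 x W x (4 x MCL + 1) x Area(CB)
}


def orn_area(arch, width, mcl, cb_area=1.0):
    """ORN area of one row boundary for the given PE architecture."""
    rows_factor, cols_per_mcl, cols_const = ORN_SHAPE[arch]
    return rows_factor * width * (cols_per_mcl * mcl + cols_const) * cb_area


for arch in ORN_SHAPE:
    # Hypothetical configuration: 8 PEs per row, MCL = 2.
    print(arch, orn_area(arch, width=8, mcl=2))
```

Because MCL multiplies the column count, reducing MCL by one removes 4 or 6 crossbar columns per row boundary, depending on the PE architecture.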
35 A modified connection length measurement. New measurement technique for the net length, where d_h and d_v are the horizontal and vertical distances between the source (src) and the destination (dest): initial C.L. = d_h; modified C.L. = d_h / d_v. Example: for dest1, C.L.(previous) = 3 and C.L.(new) = 3; for dest2, C.L.(previous) = 3 and C.L.(new) = 1.
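A small sketch of the two measurements, initial C.L. = d_h and modified C.L. = d_h / d_v. The (row, column) coordinates below are hypothetical and chosen so that the results match the slide's example values (3 and 3 for dest1, 3 and 1 for dest2); the function name and the error raised for d_v = 0 are assumptions.

```python
def connection_length(src, dest, modified=True):
    """Connection length between (row, column) positions.

    Initial measure: d_h. Modified measure: d_h / d_v.
    """
    d_v = abs(dest[0] - src[0])
    d_h = abs(dest[1] - src[1])
    if not modified:
        return d_h
    if d_v == 0:
        raise ValueError("source and destination must be in different rows")
    return d_h / d_v


src = (0, 0)
dest1 = (1, 3)  # next row, three columns away
dest2 = (3, 3)  # three rows down, three columns away
print(connection_length(src, dest1, modified=False), connection_length(src, dest1))  # 3 3.0
print(connection_length(src, dest2, modified=False), connection_length(src, dest2))  # 3 1.0
```

Under the modified measure, a long horizontal hop spread over several rows counts as a short connection per row.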
36 A modified connection length measurement: example. [Figure: candidate positions relative to Parent 1 and Parent 2, each labeled with its d_h and d_h/d_v values.] When C.L. is measured as d_h/d_v, Parent 2 is chosen and MCL = 1. When C.L. is measured as d_h, Parent 1 is chosen and MCL = 2.
37 MCL minimization: using an MCL threshold. A maximum threshold is assumed for the MCL. During the placement process, the PE with the minimum C.L. to the source is selected; for each C.L. larger than the threshold, the vertical distance increases as d_v = d_v + [CL / MCL_Threshold], where [ ] denotes the integer part. Example: max permitted length = 2, d_h = 3 > max permitted length, d_v = 1, so d_v = d_v + [3/2] = d_v + 1 = 2.
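The threshold rule can be sketched as below, following the slide's example in which d_h = 3 exceeds the permitted length 2 and d_v grows from 1 to 1 + [3/2] = 2. Treating the bracket as integer division and the function name are assumptions.

```python
def adjust_vertical_distance(d_h, d_v, mcl_threshold):
    """Push the destination down extra rows when d_h exceeds the MCL threshold."""
    if d_h > mcl_threshold:
        # Integer part of CL / MCL_Threshold extra rows (assumed reading of [ ]).
        d_v += d_h // mcl_threshold
    return d_v


print(adjust_vertical_distance(d_h=3, d_v=1, mcl_threshold=2))  # 2
print(adjust_vertical_distance(d_h=2, d_v=1, mcl_threshold=2))  # 1 (within the threshold)
```

The extra rows are crossed through transfer units, which trades a larger number of utilized rows for a shorter horizontal hop per row.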
38 Basic placement and routing vs. integrated placement and routing.
Basic placement and routing flow (inputs: DFG, LSRDP architecture description): placing input nodes; placing operational and output nodes; routing nets; routing I/O nets; final map.
Integrated placement and routing flow (inputs: DFG, LSRDP architecture description): placing input nodes using the PF-based alg.; placing operational nodes and routing nets (node by node); placing output nodes; routing output nets; final map.
39 Experimental Results
40 Specifications of the benchmark DFGs. Columns: DFG; # of nodes; # of inputs; # of outputs; # of ops; pure nodes; max. inp. fan-out; max. fan-out. Rows: Heat-8x, Heat-8x, Heat-16x, Poisson-3x, Vibration-4x, Vibration-8x, ERI, ERI, ERI, ERI, Max. [The table values are not preserved in this transcription.]
41 Evaluation results for various architectures: MCL and ORN sizes. [Table: MCL and overall ORN size (x CB) for PE basic arch., PE arch. I and PE arch. II, under Layout-I and Layout-II, each with S1 and S2; the values are not preserved in this transcription.]
S1: fan-out based nodes placement, connection length measured as l_h. S2: proximity-factor based nodes placement, connection length measured as l_hv.
S2 results in smaller MCL and ORN size for both layout types.
42 Evaluation results for various architectures: number of utilized PEs. [Table: overall number of PEs (x PE) for PE basic arch., PE arch. I and PE arch. II, under Layout-I and Layout-II, each with S1 and S2; the values are not preserved in this transcription.] By using l_hv, a larger number of RDP rows is utilized, so a larger number of PEs is employed for S2.
43 Evaluation results for various architectures: overall LSRDP area (KJJ). PE basic arch.: 3 inputs / 2 outputs; PE arch. I: 4 inputs / 3 outputs; PE arch. II: 3 inputs / 3 outputs. [Table: overall LSRDP area (KJJ) for the three PE architectures, under Layout-I and Layout-II, each with S1 and S2; the values are not preserved in this transcription.]
S2 results in a smaller overall area in terms of KJJ for both layout types. Layout II results in a smaller area. PE arch. II gives a smaller area.
44 A sample ORN implementation. [Figure: block diagram of a high-frequency test bench with signals clkin_hf, clkin_lfin, clkin_lfout, data_in and data_out, and blocks ladder, input shift register, circuit under test and output shift register.] [Figure: a photograph of a chip with a 1-to-3 ORN prototype, showing the test bench, circuit under test, ladder, input shift register and output shift register.]
45 Conclusions. SFQ-LSRDP is a basic core of a high-performance, low-power computer. Data Flow Graphs (DFGs) extracted from scientific applications are mapped onto the LSRDP. The LSRDP micro-architecture is designed based on the characteristics of DFGs via a quantitative approach. LSRDP is promising for resolving issues originating from CMOS technology as well as achieving remarkable performance.
Acknowledgement: This research was supported in part by Core Research for Evolutional Science and Technology (CREST) of Japan Science and Technology Corporation (JST).
46 Thanks for your attention! Any questions?
A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor
A Combined Analytical and Simulation-Based Model for Performance Evaluation of a Reconfigurable Instruction Set Processor Farhad Mehdipour, H. Noori, B. Javadi, H. Honda, K. Inoue, K. Murakami Faculty
More information12 th Superconducting SFQ VLSI Workshop (SSV 2019) Technical Program
12 th Superconducting SFQ VLSI Workshop (SSV 2019) Technical Program Venue: Advanced ICT Research Institute, National Institute of Information and Communications Technology (NICT), Kobe, Japan Wednesday,
More informationSerial Parallel Multiplier Design in Quantum-dot Cellular Automata
Serial Parallel Multiplier Design in Quantum-dot Cellular Automata Heumpil Cho and Earl E. Swartzlander, Jr. Application Specific Processor Group Department of Electrical and Computer Engineering The University
More informationProcessor Design & ALU Design
3/8/2 Processor Design A. Sahu CSE, IIT Guwahati Please be updated with http://jatinga.iitg.ernet.in/~asahu/c22/ Outline Components of CPU Register, Multiplexor, Decoder, / Adder, substractor, Varity of
More informationISSN (PRINT): , (ONLINE): , VOLUME-4, ISSUE-10,
A NOVEL DOMINO LOGIC DESIGN FOR EMBEDDED APPLICATION Dr.K.Sujatha Associate Professor, Department of Computer science and Engineering, Sri Krishna College of Engineering and Technology, Coimbatore, Tamilnadu,
More informationVLSI Signal Processing
VLSI Signal Processing Lecture 1 Pipelining & Retiming ADSP Lecture1 - Pipelining & Retiming (cwliu@twins.ee.nctu.edu.tw) 1-1 Introduction DSP System Real time requirement Data driven synchronized by data
More informationPerformance Evaluation of a Reconfigurable Instruction Set Processor
九州大学学術情報リポジトリ Kyushu University Institutional Repository Performance Evaluation of a Reconfigurable Instruction Set Processor Mehdipour, Farhad Faculty of Information Science and Electrical Engineering,
More informationNovel Devices and Circuits for Computing
Novel Devices and Circuits for Computing UCSB 594BB Winter 2013 Lecture 4: Resistive switching: Logic Class Outline Material Implication logic Stochastic computing Reconfigurable logic Material Implication
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEMORY INPUT-OUTPUT CONTROL DATAPATH
More informationAn Optical Parallel Adder Towards Light Speed Data Processing
An Optical Parallel Adder Towards Light Speed Data Processing Tohru ISHIHARA, Akihiko SHINYA, Koji INOUE, Kengo NOZAKI and Masaya NOTOMI Kyoto University NTT Nanophotonics Center / NTT Basic Research Laboratories
More informationCMPEN 411 VLSI Digital Circuits Spring Lecture 21: Shifters, Decoders, Muxes
CMPEN 411 VLSI Digital Circuits Spring 2011 Lecture 21: Shifters, Decoders, Muxes [Adapted from Rabaey s Digital Integrated Circuits, Second Edition, 2003 J. Rabaey, A. Chandrakasan, B. Nikolic] Sp11 CMPEN
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH
More informationArea-Time Optimal Adder with Relative Placement Generator
Area-Time Optimal Adder with Relative Placement Generator Abstract: This paper presents the design of a generator, for the production of area-time-optimal adders. A unique feature of this generator is
More informationHardware Design I Chap. 4 Representative combinational logic
Hardware Design I Chap. 4 Representative combinational logic E-mail: shimada@is.naist.jp Already optimized circuits There are many optimized circuits which are well used You can reduce your design workload
More informationEE 660: Computer Architecture Out-of-Order Processors
EE 660: Computer Architecture Out-of-Order Processors Yao Zheng Department of Electrical Engineering University of Hawaiʻi at Mānoa Based on the slides of Prof. David entzlaff Agenda I4 Processors I2O2
More informationDesign for Testability
Design for Testability Outline Ad Hoc Design for Testability Techniques Method of test points Multiplexing and demultiplexing of test points Time sharing of I/O for normal working and testing modes Partitioning
More informationNCU EE -- DSP VLSI Design. Tsung-Han Tsai 1
NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using
More informationGALOP : A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate adders
GALOP : A Generalized VLSI Architecture for Ultrafast Carry Originate-Propagate adders Dhananjay S. Phatak Electrical Engineering Department State University of New York, Binghamton, NY 13902-6000 Israel
More informationDesign for Testability
Design for Testability Outline Ad Hoc Design for Testability Techniques Method of test points Multiplexing and demultiplexing of test points Time sharing of I/O for normal working and testing modes Partitioning
More information[2] Predicting the direction of a branch is not enough. What else is necessary?
[2] When we talk about the number of operands in an instruction (a 1-operand or a 2-operand instruction, for example), what do we mean? [2] What are the two main ways to define performance? [2] Predicting
More information3. (2) What is the difference between fixed and hybrid instructions?
1. (2 pts) What is a "balanced" pipeline? 2. (2 pts) What are the two main ways to define performance? 3. (2) What is the difference between fixed and hybrid instructions? 4. (2 pts) Clock rates have grown
More informationEEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture. Rajeevan Amirtharajah University of California, Davis
EEC 216 Lecture #3: Power Estimation, Interconnect, & Architecture Rajeevan Amirtharajah University of California, Davis Outline Announcements Review: PDP, EDP, Intersignal Correlations, Glitching, Top
More informationVHDL DESIGN AND IMPLEMENTATION OF C.P.U BY REVERSIBLE LOGIC GATES
VHDL DESIGN AND IMPLEMENTATION OF C.P.U BY REVERSIBLE LOGIC GATES 1.Devarasetty Vinod Kumar/ M.tech,2. Dr. Tata Jagannadha Swamy/Professor, Dept of Electronics and Commn. Engineering, Gokaraju Rangaraju
More information[2] Predicting the direction of a branch is not enough. What else is necessary?
[2] What are the two main ways to define performance? [2] Predicting the direction of a branch is not enough. What else is necessary? [2] The power consumed by a chip has increased over time, but the clock
More informationALUs and Data Paths. Subtitle: How to design the data path of a processor. 1/8/ L3 Data Path Design Copyright Joanne DeGroat, ECE, OSU 1
ALUs and Data Paths Subtitle: How to design the data path of a processor. Copyright 2006 - Joanne DeGroat, ECE, OSU 1 Lecture overview General Data Path of a multifunction ALU Copyright 2006 - Joanne DeGroat,
More informationUsing A54SX72A and RT54SX72S Quadrant Clocks
Application Note AC169 Using A54SX72A and RT54SX72S Quadrant Clocks Architectural Overview The A54SX72A and RT54SX72S devices offer four quadrant clock networks (QCLK0, 1, 2, and 3) that can be driven
More informationALU A functional unit
ALU A functional unit that performs arithmetic operations such as ADD, SUB, MPY logical operations such as AND, OR, XOR, NOT on given data types: 8-,16-,32-, or 64-bit values A n-1 A n-2... A 1 A 0 B n-1
More informationWhere are we? Data Path Design
Where are we? Subsystem Design Registers and Register Files dders and LUs Simple ripple carry addition Transistor schematics Faster addition Logic generation How it fits into the datapath Data Path Design
More informationEEC 116 Lecture #5: CMOS Logic. Rajeevan Amirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation
EEC 116 Lecture #5: CMOS Logic Rajeevan mirtharajah Bevan Baas University of California, Davis Jeff Parkhurst Intel Corporation nnouncements Quiz 1 today! Lab 2 reports due this week Lab 3 this week HW
More informationCSE370: Introduction to Digital Design
CSE370: Introduction to Digital Design Course staff Gaetano Borriello, Brian DeRenzi, Firat Kiyak Course web www.cs.washington.edu/370/ Make sure to subscribe to class mailing list (cse370@cs) Course text
More informationSpecial Nodes for Interface
fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster
More informationIntroduction to CMOS VLSI Design Lecture 1: Introduction
Introduction to CMOS VLSI Design Lecture 1: Introduction David Harris, Harvey Mudd College Kartik Mohanram and Steven Levitan University of Pittsburgh Introduction Integrated circuits: many transistors
More informationCMOS Digital Integrated Circuits Lec 13 Semiconductor Memories
Lec 13 Semiconductor Memories 1 Semiconductor Memory Types Semiconductor Memories Read/Write (R/W) Memory or Random Access Memory (RAM) Read-Only Memory (ROM) Dynamic RAM (DRAM) Static RAM (SRAM) 1. Mask
More informationEE141-Fall 2011 Digital Integrated Circuits
EE4-Fall 20 Digital Integrated Circuits Lecture 5 Memory decoders Administrative Stuff Homework #6 due today Project posted Phase due next Friday Project done in pairs 2 Last Lecture Last lecture Logical
More informationLogic BIST. Sungho Kang Yonsei University
Logic BIST Sungho Kang Yonsei University Outline Introduction Basics Issues Weighted Random Pattern Generation BIST Architectures Deterministic BIST Conclusion 2 Built In Self Test Test/ Normal Input Pattern
More informationVLSI System Design Part V : High-Level Synthesis(2) Oct Feb.2007
VLSI System Design Part V : High-Level Synthesis(2) Oct.2006 - Feb.2007 Lecturer : Tsuyoshi Isshiki Dept. Communications and Integrated Systems, Tokyo Institute of Technology isshiki@vlsi.ss.titech.ac.jp
More informationGrasping The Deep Sub-Micron Challenge in POWERFUL Integrated Circuits
E = B; H = J + D D = ρ ; B = 0 D = ρ ; B = 0 Yehia Massoud ECE Department Rice University Grasping The Deep Sub-Micron Challenge in POWERFUL Integrated Circuits ECE Affiliates 10/8/2003 Background: Integrated
More informationCSCI-564 Advanced Computer Architecture
CSCI-564 Advanced Computer Architecture Lecture 8: Handling Exceptions and Interrupts / Superscalar Bo Wu Colorado School of Mines Branch Delay Slots (expose control hazard to software) Change the ISA
More informationFloating Point Representation and Digital Logic. Lecture 11 CS301
Floating Point Representation and Digital Logic Lecture 11 CS301 Administrative Daily Review of today s lecture w Due tomorrow (10/4) at 8am Lab #3 due Friday (9/7) 1:29pm HW #5 assigned w Due Monday 10/8
More informationBuilding a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI
Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Charles Lo and Paul Chow {locharl1, pc}@eecg.toronto.edu Department of Electrical and Computer Engineering
More informationCombinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #13
Combinatorial Logic Design Multiplexers and ALUs CS 64: Computer Organization and Design Logic Lecture #13 Ziad Matni Dept. of Computer Science, UCSB Administrative Re: Midterm Exam #2 Graded! 5/22/18
More informationA Novel LUT Using Quaternary Logic
A Novel LUT Using Quaternary Logic 1*GEETHA N S 2SATHYAVATHI, N S 1Department of ECE, Applied Electronics, Sri Balaji Chockalingam Engineering College, Arani,TN, India. 2Assistant Professor, Department
More informationMicroprocessor Power Analysis by Labeled Simulation
Microprocessor Power Analysis by Labeled Simulation Cheng-Ta Hsieh, Kevin Chen and Massoud Pedram University of Southern California Dept. of EE-Systems Los Angeles CA 989 Outline! Introduction! Problem
More informationImplementation of Clock Network Based on Clock Mesh
International Conference on Information Technology and Management Innovation (ICITMI 2015) Implementation of Clock Network Based on Clock Mesh He Xin 1, a *, Huang Xu 2,b and Li Yujing 3,c 1 Sichuan Institute
More information1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished?
1. (2 )Clock rates have grown by a factor of 1000 while power consumed has only grown by a factor of 30. How was this accomplished? 2. (2 )What are the two main ways to define performance? 3. (2 )What
More informationReduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs
Article Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs E. George Walters III Department of Electrical and Computer Engineering, Penn State Erie,
More informationHw 6 due Thursday, Nov 3, 5pm No lab this week
EE141 Fall 2005 Lecture 18 dders nnouncements Hw 6 due Thursday, Nov 3, 5pm No lab this week Midterm 2 Review: Tue Nov 8, North Gate Hall, Room 105, 6:30-8:30pm Exam: Thu Nov 10, Morgan, Room 101, 6:30-8:00pm
More informationCircuit Modeling for Practical Many-core Architecture Design Exploration
Circuit Modeling for Practical Many-core Architecture Design Exploration Redefining design abstractions Dean Truong Bevan Baas VLSI Computation Lab University of California, Davis Outline Motivation Circuit
More informationJin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering. Jungli, Taiwan
Chapter 7 Sequential Circuits Jin-Fu Li Advanced Reliable Systems (ARES) Lab. epartment of Electrical Engineering National Central University it Jungli, Taiwan Outline Latches & Registers Sequencing Timing
More information! Memory. " RAM Memory. ! Cell size accounts for most of memory array size. ! 6T SRAM Cell. " Used in most commercial chips
ESE 57: Digital Integrated Circuits and VLSI Fundamentals Lec : April 3, 8 Memory: Core Cells Today! Memory " RAM Memory " Architecture " Memory core " SRAM " DRAM " Periphery Penn ESE 57 Spring 8 - Khanna
More informationECE/CS 250 Computer Architecture
ECE/CS 250 Computer Architecture Basics of Logic Design: Boolean Algebra, Logic Gates (Combinational Logic) Tyler Bletsch Duke University Slides are derived from work by Daniel J. Sorin (Duke), Alvy Lebeck
More informationCMOS Ising Computer to Help Optimize Social Infrastructure Systems
FEATURED ARTICLES Taking on Future Social Issues through Open Innovation Information Science for Greater Industrial Efficiency CMOS Ising Computer to Help Optimize Social Infrastructure Systems As the
More informationNumbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture
Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose
More informationVLSI Design. [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1
VLSI Design Adder Design [Adapted from Rabaey s Digital Integrated Circuits, 2002, J. Rabaey et al.] ECE 4121 VLSI DEsign.1 Major Components of a Computer Processor Devices Control Memory Input Datapath
More informationItanium TM Processor Clock Design
Itanium TM Processor Design Utpal Desai 1, Simon Tam, Robert Kim, Ji Zhang, Stefan Rusu Intel Corporation, M/S SC12-502, 2200 Mission College Blvd, Santa Clara, CA 95052 ABSTRACT The Itanium processor
More informationECE 250 / CPS 250 Computer Architecture. Basics of Logic Design Boolean Algebra, Logic Gates
ECE 250 / CPS 250 Computer Architecture Basics of Logic Design Boolean Algebra, Logic Gates Benjamin Lee Slides based on those from Andrew Hilton (Duke), Alvy Lebeck (Duke) Benjamin Lee (Duke), and Amir
More information4. (3) What do we mean when we say something is an N-operand machine?
1. (2) What are the two main ways to define performance? 2. (2) When dealing with control hazards, a prediction is not enough - what else is necessary in order to eliminate stalls? 3. (3) What is an "unbalanced"
More informationMark Redekopp, All rights reserved. Lecture 1 Slides. Intro Number Systems Logic Functions
Lecture Slides Intro Number Systems Logic Functions EE 0 in Context EE 0 EE 20L Logic Design Fundamentals Logic Design, CAD Tools, Lab tools, Project EE 357 EE 457 Computer Architecture Using the logic
More informationQuantum Dot Structures Measuring Hamming Distance for Associative Memories
Article Submitted to Superlattices and Microstructures Quantum Dot Structures Measuring Hamming Distance for Associative Memories TAKASHI MORIE, TOMOHIRO MATSUURA, SATOSHI MIYATA, TOSHIO YAMANAKA, MAKOTO
More informationExploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units
Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Anoop Bhagyanath and Klaus Schneider Embedded Systems Chair University of Kaiserslautern
More informationECE 2300 Digital Logic & Computer Organization