An Approximate Parallel Multiplier with Deterministic Errors for Ultra-High Speed Integrated Optical Circuits

Jun Shiomi 1, Tohru Ishihara 1, Hidetoshi Onodera 1, Akihiko Shinya 2, Masaya Notomi 2
1 Graduate School of Informatics, Kyoto University, Japan
2 NTT Nanophotonics Center / NTT Basic Research Laboratories, Japan

Beyond Optical Communication
Background: advancement in nanophotonic devices
 - Enables on-chip interconnects for broadband communication
[Figure: CPU cores attached through network interfaces to an optical network-on-chip; an inter-chip optical network leads to the higher hierarchy (e.g., server-to-server)]
Goal: add functional units that go beyond on-chip communication
 - E.g., pattern recognition

Approximate Parallel Multiplier with Nanophotonic Devices
[Figure: CPU cores, network interfaces, optical network-on-chip, and inter-chip optical network to the higher hierarchy (e.g., server-to-server)]
Optical implementation of an approximate parallel multiplier
 - For machine learning (ML) and approximate computing (AC)
 - 11% deterministic error in the worst case
Enables signal processing with ultra-low latency
 - Example (32-bit parallel multiplier)
   W/o approximation: > 800 ps (< 1.25 GHz)
   This work: 123 ps (about 8.1 GHz), a 6.5× boost

Outline
 - Background
 - Parallel Multiplier Using Nanophotonic Devices
 - Approximate Parallel Multiplier
   - Overview
   - Detailed Implementation
   - Performance Evaluation
 - Conclusion

Photonic Crystal-Based Optical Pass-Gate (OPG) [2]
 - Directional coupler (length > 100 μm): electric signal 0: cross, 1: pass; light delay < 1 ps
 - Modulator (length 1.3 μm, n-InP / p-InP): electric signal 1: pass, 0: block; light delay < 1 ps
 - Modulator, inverted control: electric signal 0: pass, 1: block; light delay < 1 ps
[2] T. Ishihara, et al., International SoC Design Conference, 2016

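Optoelectric Conversion Delay
 - Serial connection: the optical signal passes through the gates at light speed (about 1 ps per gate, $\tau_{OPG}$)
 - Cascade connection: the optical signal goes through an optoelectric (OE) conversion to an electric signal (about 25 ps, $\tau_{OE}$)
Reducing the number of OE conversions on the critical path is the key to ultra-fast operation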

Issues in Conventional Optical Parallel Multiplier
[Figure: 5-bit array multiplier. Partial products PP0 to PP4 are accumulated by optical full adders (FA) [2] with inputs X, Y, CI and outputs CO, S, followed by a CLA that produces the sum.]
Issue: #(OE conversions) = bit width n, so the latency is large for large n
 - $1\tau_{OE}$ = 25 ps (40 GHz)
 - $32\tau_{OE}$ = 800 ps (1.25 GHz)
 - $64\tau_{OE}$ = 1600 ps (625 MHz)
Too slow!
[2] T. Ishihara, et al., International SoC Design Conference, 2016
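The latency figures on this slide follow from one OE conversion of 25 ps per bit of operand width. A minimal Python sketch of that arithmetic is given here; the function names are hypothetical and the model ignores everything except the OE conversions, so it is a lower bound rather than a full timing model.

```python
# Assumption: the critical path of the array multiplier contains exactly
# n OE conversions and nothing else (a lower bound, as on the slide).
TAU_OE_PS = 25.0  # OE conversion delay in picoseconds (from the slide)


def array_multiplier_latency_ps(n_bits: int) -> float:
    """Latency in ps when the critical path has n_bits OE conversions."""
    return n_bits * TAU_OE_PS


def max_frequency_ghz(latency_ps: float) -> float:
    """Highest operating frequency in GHz for a given latency in ps."""
    return 1000.0 / latency_ps


for n in (1, 32, 64):
    t = array_multiplier_latency_ps(n)
    print(f"n={n:2d}: {t:6.0f} ps -> {max_frequency_ghz(t):.3f} GHz")
```

For n = 1, 32, and 64 this reproduces the 40 GHz, 1.25 GHz, and 625 MHz values listed above.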

Basic Idea of the Proposed Approximate Multiplier
$X_n \cdot Y_n = 2^{\log_2 X_n + \log_2 Y_n}$, where $X_n, Y_n \in \mathbb{Z}^+$ are n-bit integers
Data flow (optical signal path, with electric control signals):
 1. $X_n$, $Y_n$ (n-bit integers) → approximate logarithm (one per operand) → fixed-point values
 2. Add [2]
 3. Approximate antilogarithm → result $X_n \cdot Y_n$ (2n-bit integer)
Serial connections only! Light speed signal processing
Three OE converters on the critical path for any bit width n
[2] T. Ishihara, et al., International SoC Design Conference, 2016

Concept of Approximate Logarithm [3]
[Figure: $\log_2 X_n$ plotted against $X_n$ (1, 2, 4, 7, 8), comparing the exact curve (w/o approximation) with the piecewise linear approximation (this work)]
E.g., approximation of $\log_2 7$, where 7 = 0111 (MSB to LSB, bit positions 3 2 1 0):
 1. Find the highest digit that holds a 1: position $10_2$ (= priority encoder)
 2. Simply return the remainder below that digit: $11_2$ (= shifter)
Approximate $\log_2 7$ result: $10.110_2$ (fixed point)
Approximate antilogarithm: inverse function of the approximate logarithm
[3] J. N. Mitchell, IRE Transactions on Electronic Computers, 1962
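The two steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not of the optical circuit; the function name `approx_log2_fixed` is hypothetical, and the fraction is assumed to be n - 1 bits wide, matching the three fraction bits in the n = 4 example.

```python
def approx_log2_fixed(x: int, n: int = 4) -> tuple[int, int]:
    """Return (k, f): integer part and fraction bits of the approximate log2.

    k is the position of the leading 1 (the priority encoder step).
    f is the remainder below the leading 1, left-aligned into n - 1
    fraction bits (the shifter step).
    """
    if not 0 < x < (1 << n):
        raise ValueError("x must be a positive n-bit integer")
    k = x.bit_length() - 1          # step 1: highest digit that holds a 1
    remainder = x - (1 << k)        # step 2: bits below the leading 1
    f = remainder << (n - 1 - k)    # left-align into n - 1 fraction bits
    return k, f


k, f = approx_log2_fixed(7, n=4)
print(f"{k:02b}.{f:03b}")  # prints 10.110
```

The fixed-point value 10.110 in binary is 2.75, compared with the exact log2(7) of about 2.807.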

Priority Encoder-Based Approximate Logarithm (n = 4)
[Figure: inputs $x_3$ (high priority) down to $x_0$ (low priority) drive optical pass-gates (0: cross / 1: pass, 0: block / 1: pass, 0: pass / 1: block) fed by light sources. A priority encoder produces the integer part of $\log_2 X_n$ (bits 1 0, MSB to LSB) and a shifter produces the fractional part of $\log_2 X_n$ (bits 2 1 0, MSB to LSB).]
Only one OE converter on the critical path for any bit width n

Barrel Shifter-Based Approximate Antilogarithm (n = 4)
[Figure: the input $\log_2 X_n + \log_2 Y_n$ (fixed point, optical signal) is split into an integer part and a fractional part (bits 2 1 0, MSB to LSB), each passing through OE converters. Together with the carry from the adder of the fractional part, they control a barrel shifter built from optical pass-gates (0: cross / 1: pass, 0: block / 1: pass) fed by a light source. The output is the multiplication result (bits 7 to 0, MSB to LSB).]
One OE converter on the critical path
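As a rough software analogue of the barrel shifter, the antilogarithm can be written as a shift of the mantissa 1.f by the integer part. The sketch below is an assumption-laden illustration: the name `approx_antilog2_fixed` is hypothetical, and fraction bits that fall below the binary point are simply truncated, which may differ from what the optical circuit does.

```python
def approx_antilog2_fixed(k: int, f: int, n: int = 4) -> int:
    """Approximate 2**(k + f / 2**(n-1)) as an integer.

    k is the integer part of the log sum (including any carry from the
    fraction adder); f holds the n - 1 fraction bits.
    """
    frac_bits = n - 1
    if k < 0 or not 0 <= f < (1 << frac_bits):
        raise ValueError("k must be >= 0 and f must fit in n - 1 bits")
    mantissa = (1 << frac_bits) | f      # fixed-point value 1.f
    # Shift left by k (barrel shifter), then drop the fraction bits.
    # Assumption: bits shifted below the binary point are truncated.
    return (mantissa << k) >> frac_bits


# Log sum for 15 x 15 with n = 4: 11.111 + 11.111 = 111.110 in binary,
# i.e. integer part 7 (after the carry) and fraction bits 110.
print(f"{approx_antilog2_fixed(7, 0b110):08b}")  # prints 11100000
```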

Performance Analysis
Critical path (serial connections only!): $X_n$ → approximate logarithm → Add [2] → approximate antilogarithm → result $X_n \cdot Y_n$
Latency = $3\tau_{OE} + O(n\,\tau_{OPG})$, with $\tau_{OE}$ = 25 ps and $\tau_{OPG}$ = 1 ps
Error: deterministic [3]
 - 11% (when $X_n = 3 \cdot 2^i$ and $Y_n = 3 \cdot 2^j$)
 - 0% (when $X_n = 2^i$ or $Y_n = 2^j$)
 - $i, j \in \mathbb{Z}^+$
[2] T. Ishihara, et al., International SoC Design Conference, 2016
[3] J. N. Mitchell, IRE Transactions on Electronic Computers, 1962
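The error bounds can be checked numerically. The following Python sketch evaluates the log-add-antilog approximation with exact rational arithmetic and searches all operand pairs below 64 for the worst relative error. It is a check of the arithmetic only, under the assumption that no bits are lost to fixed-point truncation; the function names are hypothetical.

```python
from fractions import Fraction


def mitchell_product(x: int, y: int) -> Fraction:
    """Approximate x * y via linear log2 and antilog (no truncation)."""
    kx, ky = x.bit_length() - 1, y.bit_length() - 1
    fx = Fraction(x, 1 << kx) - 1    # fraction of approximate log2(x)
    fy = Fraction(y, 1 << ky) - 1    # fraction of approximate log2(y)
    s = fx + fy
    if s < 1:                        # no carry out of the fraction adder
        return (1 + s) * (1 << (kx + ky))
    return s * (1 << (kx + ky + 1))  # carry: integer part grows by one


def relative_error(x: int, y: int) -> Fraction:
    """Relative shortfall of the approximation against the exact product."""
    return 1 - mitchell_product(x, y) / (x * y)


worst = max(relative_error(x, y) for x in range(1, 64) for y in range(1, 64))
print(worst, float(worst))
print(relative_error(3, 3), relative_error(6, 12), relative_error(4, 13))
```

In this model the worst case is exactly 1/9, about 11%, reached when both operands have the form 3 times a power of two, and the error is zero whenever either operand is a power of two.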

Performance Comparison
                                   Latency                              Error
Conventional (array multiplier)    > $n\tau_{OE}$                       0
Proposed                           $3\tau_{OE} + O(n\,\tau_{OPG})$      0% to 11%
For n = 32:
 - Conventional: > 800 ps (< 1.25 GHz)
 - Proposed: 123 ps (about 8.1 GHz), a 6.5× boost
More than 6.5× faster, with deterministic errors
Application: machine learning, approximate computing
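The n = 32 comparison can be reproduced from the reported numbers. In this short Python sketch the 123 ps figure is taken from the slide as given; the slide does not state the constant hidden in the O(n) term, so the pass-gate share computed here is only what the reported total implies, not a measured value.

```python
TAU_OE_PS = 25.0                      # OE conversion delay (from the slide)
N_BITS = 32

conventional_ps = N_BITS * TAU_OE_PS  # lower bound n * tau_OE
proposed_ps = 123.0                   # latency reported on the slide

freq_ghz = 1000.0 / proposed_ps       # operating frequency of the proposal
speedup = conventional_ps / proposed_ps

# Assumption: whatever is left after three OE conversions is spent in
# optical pass-gates; this is implied by the totals, not stated directly.
opg_share_ps = proposed_ps - 3 * TAU_OE_PS

print(f"{freq_ghz:.1f} GHz, {speedup:.1f}x faster, {opg_share_ps:.0f} ps in OPGs")
```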

Verification Using Optoelectronic Circuit Simulator (n = 4)
[Figure: multiplication result (optical power) of each output bit, from bit 7 (MSB) to bit 0 (LSB), over time from 0 s to 1 ns]
Input patterns and results:
 - X = 0001, Y = 0001: XY = 00000001
 - X = 1111, Y = 0001: XY = 00001111
 - X = 1111, Y = 1111: XY = 11100000
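The three simulated patterns match a bit-level software model of the log, add, and antilog steps. The Python sketch below is such a model under stated assumptions: the name `approx_multiply` is hypothetical, the fraction is n - 1 bits wide, the final shift truncates, and zero operands (which the slides do not cover) are mapped to zero.

```python
def approx_multiply(x: int, y: int, n: int = 4) -> int:
    """Approximate x * y for n-bit operands via log, add, and antilog."""
    if not (0 <= x < (1 << n) and 0 <= y < (1 << n)):
        raise ValueError("operands must be n-bit unsigned integers")
    if x == 0 or y == 0:
        return 0                          # assumption: not covered by the slides
    frac_bits = n - 1
    kx, ky = x.bit_length() - 1, y.bit_length() - 1
    # Approximate logarithms: leading-one position plus left-aligned remainder.
    fx = (x - (1 << kx)) << (frac_bits - kx)
    fy = (y - (1 << ky)) << (frac_bits - ky)
    # Fixed-point add; a carry from the fractions moves into the integer part.
    total = ((kx + ky) << frac_bits) + fx + fy
    k, f = total >> frac_bits, total & ((1 << frac_bits) - 1)
    # Approximate antilogarithm: shift 1.f left by k, then truncate.
    return (((1 << frac_bits) | f) << k) >> frac_bits


for x, y in [(0b0001, 0b0001), (0b1111, 0b0001), (0b1111, 0b1111)]:
    print(f"X={x:04b} Y={y:04b} XY={approx_multiply(x, y):08b}")
```

For 1111 times 1111 the model returns 11100000 (224) instead of the exact 225, the same value as in the simulation result above.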

Conclusion and Future Work
Approximate parallel multiplier using optical pass-gates
 - Ultra-fast: 8.1 GHz, more than 6.5× faster operation
 - Deterministic error: from 0% to 11%
 - Correct operation confirmed by an optoelectronic circuit simulator
Future work
 - Power and area evaluation
 - Comparison with a CMOS-based parallel multiplier