A Deep Convolutional Neural Network Based on Nested Residue Number System

1 A Deep Convolutional Neural Network Based on Nested Residue Number System. Hiroki Nakahara (Ehime University, Japan) and Tsutomu Sasao (Meiji University, Japan)

2 Outline: Background; Deep convolutional neural network (DCNN); Residue number system (RNS); DCNN using nested RNS (NRNS); Experimental results; Conclusion

3 Background: Deep Neural Network. A multi-layer neuron model, used for embedded vision systems. FPGA realization is suitable for real-time systems: faster than a CPU, lower power consumption than a GPU, and fixed-point representation is sufficient. High performance per area is desired.

4 Deep Convolutional Neural Network (DCNN)

5 Artificial Neuron. With a constant input x_0 = 1 and w_0 acting as the bias, the neuron computes the internal state u = Σ_{i=0}^{N} w_i x_i and outputs y = f(u). x_i: input signal, w_i: weight, u: internal state, f(u): activation function (Sigmoid, ReLU, etc.), y: output signal.
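
A minimal Python sketch of this neuron model (plain software, not the paper's FPGA realization; the ReLU choice and the example weights are only illustrative):

```python
def relu(u):
    # One possible activation function f(u)
    return max(0.0, u)

def neuron(x, w, f=relu):
    """Artificial neuron: u = sum_{i=0..N} w_i * x_i with x_0 = 1 (bias), y = f(u)."""
    inputs = [1.0] + list(x)                       # constant bias input x_0 = 1
    u = sum(wi * xi for wi, xi in zip(w, inputs))  # internal state u
    return f(u)                                    # output signal y

# Example: three inputs; w[0] is the bias weight
print(neuron([0.5, -1.0, 2.0], [0.1, 0.4, 0.3, 0.2]))   # 0.4
```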

6 Deep Convolutional Neural Network (DCNN) for ImageNet: 2D convolutional layers, pooling layers, and fully connected layers.

7 2D Convolutional Layer. Consumes more than 90% of the computation time; a multiply-accumulate (MAC) operation is performed: z_ij = y_ij + Σ_{m=0}^{K-1} Σ_{n=0}^{K-1} x_{i+m, j+n} · w_{mn}. x_ij: input signal, y_ij: bias, w_mn: weight, K: kernel size, z_ij: output signal.
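
The MAC loop above can be written directly as a nested-loop reference in Python; this is only a sketch with made-up sizes, not the time-multiplexed circuit the talk describes next:

```python
def conv2d_mac(x, w, bias):
    """z[i][j] = bias[i][j] + sum_{m,n} x[i+m][j+n] * w[m][n]  (valid convolution, K x K kernel)."""
    K = len(w)
    H, W = len(x), len(x[0])
    z = []
    for i in range(H - K + 1):
        row = []
        for j in range(W - K + 1):
            acc = bias[i][j]                      # start from the bias y_ij
            for m in range(K):                    # K*K multiply-accumulate operations
                for n in range(K):
                    acc += x[i + m][j + n] * w[m][n]
            row.append(acc)
        z.append(row)
    return z

x = [[r * 5 + c for c in range(5)] for r in range(5)]   # 5x5 input feature map
w = [[1] * 3 for _ in range(3)]                          # 3x3 kernel of ones
bias = [[0] * 3 for _ in range(3)]
print(conv2d_mac(x, w, bias))                            # 3x3 map of local sums
```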

8 Realization of the 2D Convolutional Layer. Requires more than a billion MACs! Our realization: time multiplexing and the Nested Residue Number System (NRNS). The FPGA datapath consists of binary-to-RNS converters, parallel multiply-add units fed from BRAMs, RNS-to-binary converters, and off-chip memory.

9 Residue Number System (RNS)

10 Residue Number System (RNS). Defined by a set of L mutually prime integer constants m_1, m_2, ..., m_L: no modulus has a common factor with any other. Typically, prime numbers are used as the moduli set. An arbitrary integer X can be uniquely represented by a tuple of L integers (X_1, X_2, ..., X_L), where X_i ≡ X (mod m_i). Dynamic range: M = Π_{i=1}^{L} m_i.
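
A minimal Python illustration of this representation and its dynamic range, using the <3,4,5> moduli set that appears in the next slide's example:

```python
from functools import reduce

def to_rns(x, moduli):
    """Represent integer X as the tuple (X mod m_1, ..., X mod m_L)."""
    return tuple(x % m for m in moduli)

moduli = (3, 4, 5)                               # pairwise coprime moduli
M = reduce(lambda a, b: a * b, moduli)           # dynamic range M = product of the moduli
print(to_rns(8, moduli), "dynamic range M =", M) # (2, 0, 3) dynamic range M = 60
```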

11 Multiplication on RNS. Moduli set <3,4,5>, X = 8, Y = 2, Z = X × Y = 16 = (1,0,1). Binary→RNS conversion: X = (2,0,3), Y = (2,2,2). Parallel multiplication: Z = (4 mod 3, 0 mod 4, 6 mod 5) = (1,0,1). RNS→Binary conversion: (1,0,1) = 16.
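
The same example can be checked in software; the CRT-based to_binary helper below is an illustrative stand-in for the RNS-to-binary converter hardware discussed later, not its implementation:

```python
from functools import reduce

MODULI = (3, 4, 5)

def to_rns(x):
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    # Digit-wise (parallel, carry-free) modular multiplication
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def to_binary(r):
    """Chinese Remainder Theorem reconstruction (software stand-in for the RNS-to-binary converter)."""
    M = reduce(lambda p, q: p * q, MODULI)
    x = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        x += ri * Mi * pow(Mi, -1, mi)   # Mi^{-1} mod mi (Python 3.8+)
    return x % M

X, Y = 8, 2
Z = rns_mul(to_rns(X), to_rns(Y))
print(to_rns(X), to_rns(Y), Z, to_binary(Z))   # (2, 0, 3) (2, 2, 2) (1, 0, 1) 16
```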

12 Binary→RNS Converter: converts X into its residues X mod m_i, one for each modulus of the moduli set.

13 Functional Decomposition. The input variables are split into bound variables X1 = (x1, x2) and free variables X2 = (x3, x4), so that the function is realized as f(X1, X2) = g(h(X1), X2); the column multiplicity of the decomposition chart determines how many bits the intermediate function h(X1) needs.

14 Decomposition Chart for X mod 3. Bound variables X1 = (x1, x2), free variables X2 = (x3, x4, x5); each entry of the chart is the residue i mod 3 of the corresponding 5-bit input i (0 mod 3 = 0, 1 mod 3 = 1, 2 mod 3 = 2, 3 mod 3 = 0, and so on).

15 Decomposition Chart for X mod 3 (with the intermediate function h(X1)). Identical columns are merged; the number of distinct column patterns is the column multiplicity, which determines how many values h(X1) must distinguish.
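
As an illustrative check, the column multiplicity of this chart can be counted with a few lines of Python (assuming x1 is the most significant bit, which the slide does not state explicitly):

```python
from itertools import product

def column_multiplicity(f, n_bound, n_free):
    """Count the distinct columns of the decomposition chart of f over (bound, free) variables."""
    columns = set()
    for bound in product((0, 1), repeat=n_bound):
        col = tuple(f(bound + free) for free in product((0, 1), repeat=n_free))
        columns.add(col)
    return len(columns)

def x_mod_3(bits):
    # bits[0] is taken as the MSB here (an assumption about the slide's bit ordering)
    x = int("".join(map(str, bits)), 2)
    return x % 3

# Bound variables (x1, x2), free variables (x3, x4, x5)
print(column_multiplicity(x_mod_3, 2, 3))   # 3 distinct columns -> h(X1) fits in 2 bits
```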

16 Binary→RNS Converter. One LUT cascade per modulus (a cascade for X mod m_1, another for X mod m_2, and so on), with each cascade realized by BRAMs.
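
A software emulation of such a cascade, sketched under the assumption that every cell consumes a fixed chunk of input bits and passes only the partial residue to the next cell:

```python
def lut_cascade_mod(x_bits, m, chunk=2):
    """Emulate a LUT cascade computing X mod m, consuming 'chunk' input bits per cell (MSB first)."""
    assert len(x_bits) % chunk == 0
    # One shared cell table: (incoming partial residue, next bits) -> outgoing partial residue
    table = {(r, b): (r * (1 << chunk) + b) % m
             for r in range(m) for b in range(1 << chunk)}
    r = 0
    for i in range(0, len(x_bits), chunk):
        b = int("".join(map(str, x_bits[i:i + chunk])), 2)
        r = table[(r, b)]            # one cell (table lookup) of the cascade
    return r

bits = [1, 0, 1, 1, 0, 1]                  # X = 45
print(lut_cascade_mod(bits, 3), 45 % 3)    # both print 0
```

Each cell's table has only m · 2^chunk entries, which is why a cell fits in a small LUT or BRAM.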

17 RNS→Binary Converter (m = 3): realized by a chain of mod-m adders connected by carries.

18 Problem. Since the moduli set of an RNS consists of mutually prime numbers, the sizes of the circuits are all different. Example: for <7,11,13>, the Binary→RNS converter (realized by BRAMs) and the RNS→Binary converter (realized by DSP blocks and BRAMs) need LUTs of different input sizes for each modulus.

19 DCNN using Nested RNS

20 Nested RNS: (Z_1, Z_2, ..., Z_i, ..., Z_L) → (Z_1, Z_2, ..., (Z_i1, Z_i2, ..., Z_ij), ..., Z_L). Ex: <7,11,13> (original moduli) → <7, <5,6,7>_11, <5,6,7>_13>. 1. Reuse the same moduli set. 2. Decompose a large modulus into smaller ones.

21 Example of Nested RNS: 19 × 22 (= 418) on <7, <5,6,7>_11, <5,6,7>_13>. Binary→NRNS conversion: 19 = <5,8,6> and 22 = <1,0,9>; nesting the residues for moduli 11 and 13 gives 19 = <5, <3,2,1>_11, <1,0,6>_13> and 22 = <1, <0,0,0>_11, <4,3,2>_13>. Modulo multiplication (digit-wise): <5, <0,0,0>_11, <4,0,5>_13>. RNS→Bin on the nested tuples followed by Bin→RNS (reducing by 11 and 13) gives <5,0,2> = 418.
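
This worked example can be reproduced with the short Python sketch below; the moduli sets follow the slide, while the CRT helper is an illustrative stand-in for the converters rather than the paper's LUT/DSP realization:

```python
from functools import reduce

OUTER = (7, 11, 13)      # original moduli set
INNER = (5, 6, 7)        # nested moduli set reused for the moduli 11 and 13

def to_rns(x, moduli):
    return tuple(x % m for m in moduli)

def crt(r, moduli):
    # Chinese Remainder Theorem reconstruction (software stand-in for the converters)
    M = reduce(lambda p, q: p * q, moduli)
    return sum(ri * (M // mi) * pow(M // mi, -1, mi) for ri, mi in zip(r, moduli)) % M

def to_nrns(x):
    """<7, <5,6,7>_11, <5,6,7>_13> representation of x."""
    z = to_rns(x, OUTER)
    return (z[0], to_rns(z[1], INNER), to_rns(z[2], INNER))

def nrns_mul(a, b):
    # Digit-wise multiplication; the nested digits are multiplied inside <5,6,7>
    return (a[0] * b[0] % 7,
            tuple(x * y % m for x, y, m in zip(a[1], b[1], INNER)),
            tuple(x * y % m for x, y, m in zip(a[2], b[2], INNER)))

def from_nrns(z):
    # Undo the nesting (reduce back mod 11 and mod 13), then undo the outer RNS
    outer = (z[0], crt(z[1], INNER) % 11, crt(z[2], INNER) % 13)
    return crt(outer, OUTER)

a, b = to_nrns(19), to_nrns(22)
p = nrns_mul(a, b)
print(a, b, p, from_nrns(p))   # last value is 418
```

The largest nested product (12 × 12 = 144) stays within the <5,6,7> dynamic range of 210, which is what keeps the digit-wise multiplication exact.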

22 Realization of Nested RNS. The Bin→<7,11,13> and Bin→<5,6,7> converters (Binary→NRNS) are realized by BRAMs; the modulo multipliers are realized by 6-input LUTs; the <5,6,7>→Bin and <7,11,13>→Bin converters (NRNS→Binary) are realized by BRAMs and DSP blocks.

23 Moduli Set for NRNS. Conventional RNS (uses 23 moduli): <3,4,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83>. Applying the NRNS to the moduli greater than 15: <3,4,5,7,11,13, <3,4,5,7,11,13>_17, <3,4,5,7,11,13>_19, <3,4,5,7,11,13,<3,4,5,7,11,13>_17>_23, <3,4,5,7,11,13,<3,4,5,7,11,13>_17>_29, ..., <3,4,5,7,11,13,<3,4,5,7,11,13>_17>_83>. All the 48-bit MAC operations are decomposed into 4-bit ones.

24 DCNN Architecture using the NRNS. Parallel Bin→NRNS converters (BRAMs), parallel modulo m_i 2D convolutional units, tree-based NRNS→Bin converters, a sequencer, on-chip memory, and a DDR3 controller attached to external DDR3 SODIMM.

25 Experimental Results

26 Implementation Setup. FPGA board: Xilinx VC707; FPGA: Virtex-7 VX485T; 1 GB DDR3 SODIMM (64-bit width). Realized the pre-trained ImageNet DCNN by Convnet with 48-bit fixed-point precision. Synthesis tool: Xilinx Vivado 2014; timing constraint: 400 MHz.

27 Comparison with Other Implementations: precision, FPGA device, maximum frequency [MHz], performance [GOPS], and performance per area [GOPS/Slice].
ASAP'09: 16-bit fixed, Virtex-5 LX330T
PACT: fixed point, Virtex-5 SX240T
FPL'09: 48-bit fixed, Spartan-3A DSP
ISCA: 48-bit fixed, Virtex-5 SX240T
ICCD'13: fixed point, Virtex-6 VLX240T
FPGA'15: 32-bit float, Virtex-7 VX485T
Proposed: 48-bit fixed, Virtex-7 VX485T

28 Conclusion. Realized the DCNN on the FPGA using time multiplexing and the nested RNS; the MAC operation is realized by small LUTs. Functional decomposition is used as follows: the Bin→NRNS converter is realized by BRAMs, and the NRNS→Bin converter is realized by DSP blocks and BRAMs. Performance per area (GOPS/Slice) is 5.86 times higher than ISCA's.
