A Deep Convolutional Neural Network Based on Nested Residue Number System
|
|
- Agnes Gordon
- 5 years ago
- Views:
Transcription
1 A Deep Convolutional Neural Network Based on Nested Residue Number System Hiroki Nakahara Tsutomu Sasao Ehime University, Japan Meiji University, Japan
2 Outline Background Deep convolutional neural network (DCNN) Residue number system (RNS) DCNN using nested RNS (NRNS) Experimental results Conclusion
3 Background Deep Neural Network Multi-layer neuron model Used for embedded vision system FPGA realization is suitable for real-time systems faster than the CPU Lower power consumption than the GPU Fixed point representation is sufficient High-performance per area is desired 3
4 Deep Convolutional Neural Network (DCNN) 4
5 x = Artificial Neuron w (Bias) y f (u) x x w w + u f(u) y u N i w i x i x N... w N x i : Input signal w i : Weight u: Internal state f(u): Activation function (Sigmoid, ReLU, etc.) y: Output signal 5
6 Deep Convolutional Neural Network(DCNN) for ImageNet D convolutional layer, pooling layer, and fully connection layer 6
7 D Convolutional Layer Consumes more than 9% of the computation time Multiply-accumulation (MAC) operation is performed K z ij y ij K K m n x i m, j n w mn K x ij : Input signal y ij : Bias w mn : Weight K: Kernel size z ij : Output signal 7
8 Realization of D Convolutional Layer Requires more than billion MACs! Our realization Time multiplexing Nested Residue Number System(NRNS) + * * + BRAM + * * FPGA + * * + BRAM + * * FPGA Converter Converter Converter Converter * * * * * * * * * * * * * * * * * * * * * * * * * * * * Converter Converter Converter Converter BRAM BRAM BRAM BRAM Off chip Memory Off chip Memory 8
9 Residue Number System(RNS) 9
10 Residue Number System (RNS) Defined by a set of L mutually prime integer constants m,m,...,m L No pair modulus have a common factor with any other Typically, prime number is used as moduli set An arbitrary integer X can be uniquely represented by a tuple of L integers (X,X,,X L ), where X i X (mod m i Dynamic range M L i m i )
11 Multiplication on RNS Moduli set 3,4,5, X=8, Y= Z=X Y=6=(,,) X=(,,3), Y=(,,) Z=(4 mod 3, mod 4,6 mod 5) =(,,)=6 BinaryRNS Conversion Parallel Multiplication RNSBinary Conversion
12 BinaryRNS Converter X mod mod 3 mod
13 Functional Decomposition Free variables X=(x3, x4) Bound variables X=(x, x) h(x) Column multiplicity= x x h(x) h(x) x3,x4 4 x=6 [bit] x+ 3 x= [bit] 3
14 Decomposition Chart for X mod 3 X=(x, x) Free variables Bound variables X=(x3, x4, x5) mod 3 = mod 3 = mod 3 = 3 mod 3 = 4 mod 3 = 5 mod 3 = 6 mod 3 = 7 mod 3 = 8 mod 3 = 9 mod 3 = mod 3 = 4
15 Decomposition Chart for X mod 3 Free Bound X=(x3, x4, x5) X=(x, x) x3 x4 x5 h(x ) mod 3 = mod 3 = mod 3 = 3 mod 3 = 4 mod 3 = 5 mod 3 = 6 mod 3 = 7 mod 3 = 8 mod 3 = 9 mod 3 = mod 3 = 5
16 BinaryRNS Converter LUT cascade for X mod m LUT cascade for X mod m BRAM BRAM BRAM 6
17 RNSBinary Converter (m=3) x y 5 x y x3 y Mod m Adder Mod m Adder carry carry 7
18 Problem Moduli set of RNS consists of mutually prime numbers sizes of circuits are all different Example: <7,,3> 3 6 input 3 4 LUT BinaryRNS Converter by BRAMs input LUT 8 input LUT 4 4 RNSBinary Converter by DSP blocks and BRAMs 8
19 DCNN using Nested RNS 9
20 Nested RNS (Z,Z,,Z i,, Z L ) (Z,Z,,(Z i,z i,,z ij ),, Z L ) Ex: <7,,3> <7,,3> Original modulus <7,<5,6,7>,<5,6,7> 3 > <7,<5,6,7>,<5,6,7> 3 >. Reuse the same moduli set. Decompose a large modulo into smaller ones
21 Example of Nested RNS 9x(=48) on <7,<5,6,7>,<5,6,7> 3 > 9 BinaryNRNS Conversion =<5,8,6> <,,9> =<5,<3,,>,<,,6> 3 > <,<,,>,<4,3,> 3 > =<5,<,,>,<4,,5> 3 > Modulo Multiplication =<5,,> =48 RNSBin BinRNS on NRNS
22 Realization of Nested RNS Bin <7,,3> Bin <5,6,7> Bin <5,6,7> input LUT 6 input LUT 6 input LUT 6 input LUT 3 3 <5,6,7> Bin <7,,3> Bin Bin <7,,3> Binary NRNS Bin <5,6,7> Bin <5,6,7> 6 input LUT 6 input LUT 6 input LUT <5,6,7> Bin NRNS Binary Realized by BRAMs LUTs BRAMs and DSP blocks
23 Moduli Set for NRNS Conventional RNS (uses 3 moduli) <3,4,5,7,,3,7,9,3,9,3,37,4,43,47,53,59, 6,67,7,73,79,83> Applied the NRNS to moduli that are greater than 5 <3,4,5,7,,3, <3,4,5,7,,3> 7, <3,4,5,7,,3> 9, <3,4,5,7,,3,<3,4,5,7,,3> 7 > 3, <3,4,5,7,,3,<3,4,5,7,,3> 7 > 9,, <3,4,5,7,,3,<3,4,5,7,,3> 7 > 83 > All the 48-bit MAC operations are decomposed into 4-bit ones 3
24 DCNN Architecture using the NRNS Parallel BinNRNS Converters BRAM BRAM... BRAM BRAM BRAM... BRAM... BRAM BRAM... BRAM 6 parallel modulo m i D convolutional units Tree based NRNSBin Converters RNS Bin RNS Bin RNS Bin RNS Bin RNS Bin... Sequencer DDR3 Ctrl. On chip Memory External DDR3SODIMM 4
25 Experimental Results 5
26 Implementation Setup FPGA board: Xilinx VC77 FPGA: Virtex7 VC485T GB DDR3SODIMM 64 bit width) Realized the pre-trained ImageNet by Convnet 48-bit fixed precision Synthesis tool: Xilinx Vivado4. Timing constrain: 4MHz 6
27 Comparison with Other Precision Implementations Max. Freq. [MHz] FPGA Performance [GOPS] Performance per area [GOPS/ Slice x 4 ] ASAP9 6bit fixed 5 Viretex5 LX33T PACT fixed 5 Viretex5 SX4T 7..9 FPL9 48bit fixed 5 Spartax3A DSP ISCA 48bit fixed Virtex5 SX4T ICCD3 fixed 5 Virtex6 LVX4T FPGA5 3bit float Virtex7 VX485T Proposed 48bit fixed 4 Virtex7 VX485T
28 Conclusion Realized the DCNN on the FPGA Time multiplexing Nested RNS MAC operation is realized by small LUTs Functional decomposition are used as follows: BinNRNS converter is realized by BRAMs NRNSBin converter is realized by DSP blocks and BRAMs Performance per area (GOPS/Slice) 5.86 times higher than ISCA s 8
A Deep Convolutional Neural Network Based on Nested Residue Number System
A Deep Convolutional Neual Netwok Based on Nested Residue Numbe System Hioki Nakahaa Ehime Univesity, Japan Tsutomu Sasao Meiji Univesity, Japan Abstact A pe-tained deep convolutional neual netwok (DCNN)
More informationOptimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks
2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks Yufei Ma, Yu Cao, Sarma Vrudhula,
More informationEfficient random number generation on FPGA-s
Proceedings of the 9 th International Conference on Applied Informatics Eger, Hungary, January 29 February 1, 2014. Vol. 1. pp. 313 320 doi: 10.14794/ICAI.9.2014.1.313 Efficient random number generation
More informationImplementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System
Implementation Of Digital Fir Filter Using Improved Table Look Up Scheme For Residue Number System G.Suresh, G.Indira Devi, P.Pavankumar Abstract The use of the improved table look up Residue Number System
More informationOn LUT Cascade Realizations of FIR Filters
On LUT Cascade Realizations of FIR Filters Tsutomu Sasao 1 Yukihiro Iguchi 2 Takahiro Suzuki 2 1 Kyushu Institute of Technology, Dept. of Comput. Science & Electronics, Iizuka 820-8502, Japan 2 Meiji University,
More informationOn the Complexity of Error Detection Functions for Redundant Residue Number Systems
On the Complexity of Error Detection Functions for Redundant Residue Number Systems Tsutomu Sasao 1 and Yukihiro Iguchi 2 1 Dept. of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka
More informationA PACKET CLASSIFIER USING LUT CASCADES BASED ON EVMDDS (K)
A PACKET CLASSIFIER USING CASCADES BASED ON EVMDDS (K) Hiroki Nakahara Tsutomu Sasao Munehiro Matsuura Kagoshima University, Japan Meiji University, Japan Kyushu Institute of Technology, Japan ABSTRACT
More informationAnalysis and Synthesis of Weighted-Sum Functions
Analysis and Synthesis of Weighted-Sum Functions Tsutomu Sasao Department of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka 820-8502, Japan April 28, 2005 Abstract A weighted-sum
More informationBuilding a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI
Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Charles Lo and Paul Chow {locharl1, pc}@eecg.toronto.edu Department of Electrical and Computer Engineering
More informationReduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs
Article Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs E. George Walters III Department of Electrical and Computer Engineering, Penn State Erie,
More informationMapping Sparse Matrix-Vector Multiplication on FPGAs
Mapping Sparse Matrix-Vector Multiplication on FPGAs Junqing Sun 1, Gregory Peterson 1, Olaf Storaasli 2 1 University of Tennessee, Knoxville 2 Oak Ridge National Laboratory July 20, 2007 Outline Introduction
More information8-Bit Dot-Product Acceleration
White Paper: UltraScale and UltraScale+ FPGAs WP487 (v1.0) June 27, 2017 8-Bit Dot-Product Acceleration By: Yao Fu, Ephrem Wu, and Ashish Sirasao The DSP architecture in UltraScale and UltraScale+ devices
More informationComputer Architecture 10. Residue Number Systems
Computer Architecture 10 Residue Number Systems Ma d e wi t h Op e n Of f i c e. o r g 1 A Puzzle What number has the reminders 2, 3 and 2 when divided by the numbers 7, 5 and 3? x mod 7 = 2 x mod 5 =
More informationdoi: /TCAD
doi: 10.1109/TCAD.2006.870407 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 25, NO. 5, MAY 2006 789 Short Papers Analysis and Synthesis of Weighted-Sum Functions Tsutomu
More informationInternational Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research)
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Emerging Technologies in Computational
More informationAn Implementation of an Address Generator Using Hash Memories
An Implementation of an Address Generator Using Memories Tsutomu Sasao and Munehiro Matsuura Department of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka 820-8502, Japan Abstract
More informationDigital Architecture for Real-Time CNN-based Face Detection for Video Processing
ASPC lab sb229@zips.uakron.edu Grant # June 1509754 27, 2017 1 / 26 igital Architecture for Real-Time CNN-based Face etection for Video Processing Smrity Bhattarai 1, Arjuna Madanayake 1 Renato J. Cintra
More informationSum-of-Generalized Products Expressions Applications and Minimization
Sum-of-Generalized Products Epressions Applications and Minimization Tsutomu Sasao Department of Computer Science and Electronics Kyushu Institute of Technology Iizuka 80-850 Japan Abstract This paper
More informationSimple neuron model Components of simple neuron
Outline 1. Simple neuron model 2. Components of artificial neural networks 3. Common activation functions 4. MATLAB representation of neural network. Single neuron model Simple neuron model Components
More informationKEYWORDS: Multiple Valued Logic (MVL), Residue Number System (RNS), Quinary Logic (Q uin), Quinary Full Adder, QFA, Quinary Half Adder, QHA.
GLOBAL JOURNAL OF ADVANCED ENGINEERING TECHNOLOGIES AND SCIENCES DESIGN OF A QUINARY TO RESIDUE NUMBER SYSTEM CONVERTER USING MULTI-LEVELS OF CONVERSION Hassan Amin Osseily Electrical and Electronics Department,
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationAn Application of Autocorrelation Functions to Find Linear Decompositions for Incompletely Specified Index Generation Functions
03 IEEE 43rd International Symposium on Multiple-Valued Logic An Application of Autocorrelation Functions to Find Linear Decompositions for Incompletely Specified Index Generation Functions Tsutomu Sasao
More informationPower Consumption Analysis. Arithmetic Level Countermeasures for ECC Coprocessor. Arithmetic Operators for Cryptography.
Power Consumption Analysis General principle: measure the current I in the circuit Arithmetic Level Countermeasures for ECC Coprocessor Arnaud Tisserand, Thomas Chabrier, Danuta Pamula I V DD circuit traces
More informationA High-Speed Realization of Chinese Remainder Theorem
Proceedings of the 2007 WSEAS Int. Conference on Circuits, Systems, Signal and Telecommunications, Gold Coast, Australia, January 17-19, 2007 97 A High-Speed Realization of Chinese Remainder Theorem Shuangching
More informationLUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations
LUTMIN: FPGA Logic Synthesis with MUX-Based and Cascade Realizations Tsutomu Sasao and Alan Mishchenko Dept. of Computer Science and Electronics, Kyushu Institute of Technology, Iizuka 80-80, Japan Dept.
More informationA Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form (2 n (2 p ± 1))
The Computer Journal, 47(1), The British Computer Society; all rights reserved A Suggestion for a Fast Residue Multiplier for a Family of Moduli of the Form ( n ( p ± 1)) Ahmad A. Hiasat Electronics Engineering
More informationDNPU: An Energy-Efficient Deep Neural Network Processor with On-Chip Stereo Matching
DNPU: An Energy-Efficient Deep Neural Network Processor with On-hip Stereo atching Dongjoo Shin, Jinmook Lee, Jinsu Lee, Juhyoung Lee, and Hoi-Jun Yoo Semiconductor System Laboratory School of EE, KAIST
More informationArithmetic Operators for Pairing-Based Cryptography
Arithmetic Operators for Pairing-Based Cryptography J.-L. Beuchat 1 N. Brisebarre 2 J. Detrey 3 E. Okamoto 1 1 University of Tsukuba, Japan 2 École Normale Supérieure de Lyon, France 3 Cosec, b-it, Bonn,
More informationApproximate operations in Convolutional Neural Networks with RNS data representation
Approximate operations in Convolutional Neural Networks with RNS data representation Valentina Arrigoni1, Beatrice Rossi2, Pasqualina Fragneto2 Giuseppe Desoli2 1- Universita degli Studi di Milano - Dipartimento
More informationNeural Network Approximation. Low rank, Sparsity, and Quantization Oct. 2017
Neural Network Approximation Low rank, Sparsity, and Quantization zsc@megvii.com Oct. 2017 Motivation Faster Inference Faster Training Latency critical scenarios VR/AR, UGV/UAV Saves time and energy Higher
More informationPAPER Design Methods of Radix Converters Using Arithmetic Decompositions
IEICE TRANS. INF. & SYST., VOL.E90 D, NO.6 JUNE 2007 905 PAPER Design Methods of Radix Converters Using Arithmetic Decompositions Yukihiro IGUCHI a), Tsutomu SASAO b), and Munehiro MATSUURA c), Members
More informationBinary Convolutional Neural Network on RRAM
Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua
More informationA Regular Expression Matching Circuit Based on a Decomposed Automaton
Regular Expression Matching Circuit Based on a Decomposed utomaton Hiroki Nakahara, Tsutomu Sasao, and Munehiro Matsuura Kyushu Institute of Technology, Japan bstract. In this paper, we propose a regular
More informationWITH rapid growth of traditional FPGA industry, heterogeneous
INTL JOURNAL OF ELECTRONICS AND TELECOMMUNICATIONS, 2012, VOL. 58, NO. 1, PP. 15 20 Manuscript received December 31, 2011; revised March 2012. DOI: 10.2478/v10177-012-0002-x Input Variable Partitioning
More informationSolving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster
Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Erich Wenger and Paul Wolfger Graz University of Technology WECC 2014, Chennai, India We solved the discrete logarithm of
More informationHardware Acceleration of the Tate Pairing in Characteristic Three
Hardware Acceleration of the Tate Pairing in Characteristic Three CHES 2005 Hardware Acceleration of the Tate Pairing in Characteristic Three Slide 1 Introduction Pairing based cryptography is a (fairly)
More informationABHELSINKI UNIVERSITY OF TECHNOLOGY
On Repeated Squarings in Binary Fields Kimmo Järvinen Helsinki University of Technology August 14, 2009 K. Järvinen On Repeated Squarings in Binary Fields 1/1 Introduction Repeated squaring Repeated squaring:
More informationSP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay
SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain
More informationA FPGA Implementation of Large Restricted Boltzmann Machines. Charles Lo. Supervisor: Paul Chow April 2010
A FPGA Implementation of Large Restricted Boltzmann Machines by Charles Lo Supervisor: Paul Chow April 2010 Abstract A FPGA Implementation of Large Restricted Boltzmann Machines Charles Lo Engineering
More informationKaratsuba with Rectangular Multipliers for FPGAs
Karatsuba with Rectangular Multipliers for FPGAs Martin Kumm, Oscar Gustafsson, Florent De Dinechin, Johannes Kappauf, Peter Zipf To cite this version: Martin Kumm, Oscar Gustafsson, Florent De Dinechin,
More informationAccelerating Transfer Entropy Computation
Accelerating Transfer Entropy Computation Shengjia Shao, Ce Guo and Wayne Luk Department of Computing Imperial College London United Kingdom E-mail: {shengjia.shao12, ce.guo10, w.luk}@imperial.ac.uk Stephen
More informationAn Exact Optimization Algorithm for Linear Decomposition of Index Generation Functions
An Exact Optimization Algorithm for Linear Decomposition of Index Generation Functions Shinobu Nagayama Tsutomu Sasao Jon T. Butler Dept. of Computer and Network Eng., Hiroshima City University, Hiroshima,
More informationMulti-Terminal Multi-Valued Decision Diagrams for Characteristic Function Representing Cluster Decomposition
22 IEEE 42nd International Symposium on Multiple-Valued Logic Multi-Terminal Multi-Valued Decision Diagrams for Characteristic Function Representing Cluster Decomposition Hiroki Nakahara, Tsutomu Sasao,
More informationArithmetic Operators for Pairing-Based Cryptography
Arithmetic Operators for Pairing-Based Cryptography Jean-Luc Beuchat Laboratory of Cryptography and Information Security Graduate School of Systems and Information Engineering University of Tsukuba 1-1-1
More informationFUZZY PERFORMANCE ANALYSIS OF NTT BASED CONVOLUTION USING RECONFIGURABLE DEVICE
FUZZY PERFORMANCE ANALYSIS OF NTT BASED CONVOLUTION USING RECONFIGURABLE DEVICE 1 Dr.N.Anitha, 2 V.Lambodharan, 3 P.Arunkumar 1 Assistant Professor, 2 Assistant Professor, 3 Assistant Professor 1 Department
More informationAn Effective New CRT Based Reverse Converter for a Novel Moduli Set { 2 2n+1 1, 2 2n+1, 2 2n 1 }
An Effective New CRT Based Reverse Converter for a Novel Moduli Set +1 1, +1, 1 } Edem Kwedzo Bankas, Kazeem Alagbe Gbolagade Department of Computer Science, Faculty of Mathematical Sciences, University
More informationOn Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli
On Equivalences and Fair Comparisons Among Residue Number Systems with Special Moduli Behrooz Parhami Department of Electrical and Computer Engineering University of California Santa Barbara, CA 93106-9560,
More informationHigh Performance GHASH Function for Long Messages
High Performance GHASH Function for Long Messages Nicolas Méloni 1, Christophe Négre 2 and M. Anwar Hasan 1 1 Department of Electrical and Computer Engineering University of Waterloo, Canada 2 Team DALI/ELIAUS
More informationFAST FIR ALGORITHM BASED AREA-EFFICIENT PARALLEL FIR DIGITAL FILTER STRUCTURES
FAST FIR ALGORITHM BASED AREA-EFFICIENT PARALLEL FIR DIGITAL FILTER STRUCTURES R.P.MEENAAKSHI SUNDHARI 1, Dr.R.ANITA 2 1 Department of ECE, Sasurie College of Engineering, Vijayamangalam, Tamilnadu, India.
More informationCompact Ring LWE Cryptoprocessor
1 Compact Ring LWE Cryptoprocessor CHES 2014 Sujoy Sinha Roy 1, Frederik Vercauteren 1, Nele Mentens 1, Donald Donglong Chen 2 and Ingrid Verbauwhede 1 1 ESAT/COSIC and iminds, KU Leuven 2 Electronic Engineering,
More informationA Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series
A Parallel Method for the Computation of Matrix Exponential based on Truncated Neumann Series V. S. Dimitrov 12, V. Ariyarathna 3, D. F. G. Coelho 1, L. Rakai 1, A. Madanayake 3, R. J. Cintra 4 1 ECE Department,
More informationLRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation
LRADNN: High-Throughput and Energy- Efficient Deep Neural Network Accelerator using Low Rank Approximation Jingyang Zhu 1, Zhiliang Qian 2, and Chi-Ying Tsui 1 1 The Hong Kong University of Science and
More informationLow-Complexity FPGA Implementation of Compressive Sensing Reconstruction
2013 International Conference on Computing, Networking and Communications, Multimedia Computing and Communications Symposium Low-Complexity FPGA Implementation of Compressive Sensing Reconstruction Jerome
More informationSmall FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m )
1 / 19 Small FPGA-Based Multiplication-Inversion Unit for Normal Basis over GF(2 m ) Métairie Jérémy, Tisserand Arnaud and Casseau Emmanuel CAIRN - IRISA July 9 th, 2015 ISVLSI 2015 PAVOIS ANR 12 BS02
More informationLarge-Scale FPGA implementations of Machine Learning Algorithms
Large-Scale FPGA implementations of Machine Learning Algorithms Philip Leong ( ) Computer Engineering Laboratory School of Electrical and Information Engineering, The University of Sydney Computer Engineering
More informationOptimization of new Chinese Remainder theorems using special moduli sets
Louisiana State University LSU Digital Commons LSU Master's Theses Graduate School 2010 Optimization of new Chinese Remainder theorems using special moduli sets Narendran Narayanaswamy Louisiana State
More informationJanus: FPGA Based System for Scientific Computing Filippo Mantovani
Janus: FPGA Based System for Scientific Computing Filippo Mantovani Physics Department Università degli Studi di Ferrara Ferrara, 28/09/2009 Overview: 1. The physical problem: - Ising model and Spin Glass
More informationIntroduction to Deep Learning CMPT 733. Steven Bergner
Introduction to Deep Learning CMPT 733 Steven Bergner Overview Renaissance of artificial neural networks Representation learning vs feature engineering Background Linear Algebra, Optimization Regularization
More informationA Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )
A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) Stefan Tillich, Johann Großschädl Institute for Applied Information Processing and
More informationFinite Fields. SOLUTIONS Network Coding - Prof. Frank H.P. Fitzek
Finite Fields In practice most finite field applications e.g. cryptography and error correcting codes utilizes a specific type of finite fields, namely the binary extension fields. The following exercises
More informationA Design Method of a Regular Expression Matching Circuit Based on Decomposed Automaton
364 IEICE TRANS. INF. & SYST., VOL.E95 D, NO.2 FEBRUARY 2012 PAPER Special Section on Reconfigurable Systems A Design Method of a Regular Expression Matching Circuit Based on Decomposed Automaton Hiroki
More informationA Virus Scanning Engine Using an MPU and an IGU Based on Row-Shift Decomposition
IEICE TRANS. INF. & SYST., VOL.E96 D, NO.8 AUGUST 2013 1667 PAPER Special Section on Reconfigurable Systems A Virus Scanning Engine Using an MPU and an IGU Based on Row-Shift Decomposition Hiroki NAKAHARA
More informationSajid Anwar, Kyuyeon Hwang and Wonyong Sung
Sajid Anwar, Kyuyeon Hwang and Wonyong Sung Department of Electrical and Computer Engineering Seoul National University Seoul, 08826 Korea Email: sajid@dsp.snu.ac.kr, khwang@dsp.snu.ac.kr, wysung@snu.ac.kr
More informationThe GigaFitter for Fast Track Fitting based on FPGA DSP Arrays
The GigaFitter for Fast Track Fitting based on FPGA DSP Arrays Pierluigi Catastini (Universita di Siena - INFN Pisa) For the SVT Collaboration SVT: A Success for CDF 35µm 33µm resol beam σ = 48µm SVT provides
More informationA VLSI Algorithm for Modular Multiplication/Division
A VLSI Algorithm for Modular Multiplication/Division Marcelo E. Kaihara and Naofumi Takagi Department of Information Engineering Nagoya University Nagoya, 464-8603, Japan mkaihara@takagi.nuie.nagoya-u.ac.jp
More informationMultivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA
Multivariate Gaussian Random Number Generator Targeting Specific Resource Utilization in an FPGA Chalermpol Saiprasert, Christos-Savvas Bouganis and George A. Constantinides Department of Electrical &
More informationLecture 3 : Introduction to Binary Convolutional Codes
Lecture 3 : Introduction to Binary Convolutional Codes Binary Convolutional Codes 1. Convolutional codes were first introduced by Elias in 1955 as an alternative to block codes. In contrast with a block
More informationPublic Key Encryption
Public Key Encryption 3/13/2012 Cryptography 1 Facts About Numbers Prime number p: p is an integer p 2 The only divisors of p are 1 and p s 2, 7, 19 are primes -3, 0, 1, 6 are not primes Prime decomposition
More informationDeep Learning (CNNs)
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Deep Learning (CNNs) Deep Learning Readings: Murphy 28 Bishop - - HTF - - Mitchell
More informationConvolutional Neural Networks II. Slides from Dr. Vlad Morariu
Convolutional Neural Networks II Slides from Dr. Vlad Morariu 1 Optimization Example of optimization progress while training a neural network. (Loss over mini-batches goes down over time.) 2 Learning rate
More informationTable-Based Polynomials for Fast Hardware Function Evaluation
ASAP 05 Table-Based Polynomials for Fast Hardware Function Evaluation Jérémie Detrey Florent de Dinechin Projet Arénaire LIP UMR CNRS ENS Lyon UCB Lyon INRIA 5668 http://www.ens-lyon.fr/lip/arenaire/ CENTRE
More informationIndex Generation Functions: Minimization Methods
7 IEEE 47th International Symposium on Multiple-Valued Logic Index Generation Functions: Minimization Methods Tsutomu Sasao Meiji University, Kawasaki 4-857, Japan Abstract Incompletely specified index
More informationCompact Numerical Function Generators Based on Quadratic Approximation: Architecture and Synthesis Method
3510 PAPER Special Section on VLSI Design and CAD Algorithms Compact Numerical Function Generators Based on Quadratic Approximation: Architecture and Synthesis Method Shinobu NAGAYAMA a), Tsutomu SASAO
More informationFPGA Implementation of a Predictive Controller
FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan
More informationResidue Number Systems Ivor Page 1
Residue Number Systems 1 Residue Number Systems Ivor Page 1 7.1 Arithmetic in a modulus system The great speed of arithmetic in Residue Number Systems (RNS) comes from a simple theorem from number theory:
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationCENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 8. Sinan Kalkan
CENG 783 Special topics in Deep Learning AlchemyAPI Week 8 Sinan Kalkan Loss functions Many correct labels case: Binary prediction for each label, independently: L i = σ j max 0, 1 y ij f j y ij = +1 if
More informationIntroduction to Machine Learning (67577)
Introduction to Machine Learning (67577) Shai Shalev-Shwartz School of CS and Engineering, The Hebrew University of Jerusalem Deep Learning Shai Shalev-Shwartz (Hebrew U) IML Deep Learning Neural Networks
More informationNovel Modulo 2 n +1Multipliers
Novel Modulo Multipliers H. T. Vergos Computer Engineering and Informatics Dept., University of Patras, 26500 Patras, Greece. vergos@ceid.upatras.gr C. Efstathiou Informatics Dept.,TEI of Athens, 12210
More informationNumbering Systems. Computational Platforms. Scaling and Round-off Noise. Special Purpose. here that is dedicated architecture
Computational Platforms Numbering Systems Basic Building Blocks Scaling and Round-off Noise Computational Platforms Viktor Öwall viktor.owall@eit.lth.seowall@eit lth Standard Processors or Special Purpose
More informationGENERALIZED ARYABHATA REMAINDER THEOREM
International Journal of Innovative Computing, Information and Control ICIC International c 2010 ISSN 1349-4198 Volume 6, Number 4, April 2010 pp. 1865 1871 GENERALIZED ARYABHATA REMAINDER THEOREM Chin-Chen
More informationarxiv: v1 [cs.ar] 11 Dec 2017
Multi-Mode Inference Engine for Convolutional Neural Networks Arash Ardakani, Carlo Condo and Warren J. Gross Electrical and Computer Engineering Department, McGill University, Montreal, Quebec, Canada
More informationDesign and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives
Design and FPGA Implementation of Radix-10 Algorithm for Division with Limited Precision Primitives Miloš D. Ercegovac Computer Science Department Univ. of California at Los Angeles California Robert McIlhenny
More informationAn FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields
. Motivation and introduction An FPGA-based Accelerator for Tate Pairing on Edwards Curves over Prime Fields Marcin Rogawski Ekawat Homsirikamol Kris Gaj Cryptographic Engineering Research Group (CERG)
More informationEnumeration of Bent Boolean Functions by Reconfigurable Computer
Enumeration of Bent Boolean Functions by Reconfigurable Computer J. L. Shafer S. W. Schneider J. T. Butler P. Stănică ECE Department Department of ECE Department of Applied Math. US Naval Academy Naval
More informationDesign and Implementation of Efficient Modulo 2 n +1 Adder
www..org 18 Design and Implementation of Efficient Modulo 2 n +1 Adder V. Jagadheesh 1, Y. Swetha 2 1,2 Research Scholar(INDIA) Abstract In this brief, we proposed an efficient weighted modulo (2 n +1)
More informationGlobal Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond
Global Optimality in Matrix and Tensor Factorization, Deep Learning & Beyond Ben Haeffele and René Vidal Center for Imaging Science Mathematical Institute for Data Science Johns Hopkins University This
More informationCase Studies of Logical Computation on Stochastic Bit Streams
Case Studies of Logical Computation on Stochastic Bit Streams Peng Li 1, Weikang Qian 2, David J. Lilja 1, Kia Bazargan 1, and Marc D. Riedel 1 1 Electrical and Computer Engineering, University of Minnesota,
More informationVolume 3, No. 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at
Volume 3, No 1, January 2012 Journal of Global Research in Computer Science RESEARCH PAPER Available Online at wwwjgrcsinfo A NOVEL HIGH DYNAMIC RANGE 5-MODULUS SET WHIT EFFICIENT REVERSE CONVERTER AND
More informationFixed-Point Trigonometric Functions on FPGAs
Fixed-Point Trigonometric Functions on FPGAs Florent de Dinechin Matei Iştoan Guillaume Sergent LIP, Université de Lyon (CNRS/ENS-Lyon/INRIA/UCBL) 46, allée d Italie, 69364 Lyon Cedex 07 June 14th, 2013
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More information2. Accelerated Computations
2. Accelerated Computations 2.1. Bent Function Enumeration by a Circular Pipeline Implemented on an FPGA Stuart W. Schneider Jon T. Butler 2.1.1. Background A naive approach to encoding a plaintext message
More informationIntroduction to Convolutional Neural Networks 2018 / 02 / 23
Introduction to Convolutional Neural Networks 2018 / 02 / 23 Buzzword: CNN Convolutional neural networks (CNN, ConvNet) is a class of deep, feed-forward (not recurrent) artificial neural networks that
More informationQuantisation. Efficient implementation of convolutional neural networks. Philip Leong. Computer Engineering Lab The University of Sydney
1/51 Quantisation Efficient implementation of convolutional neural networks Philip Leong Computer Engineering Lab The University of Sydney July 2018 / PAPAA Workshop Australia 2/51 3/51 Outline 1 Introduction
More informationFailures of Gradient-Based Deep Learning
Failures of Gradient-Based Deep Learning Shai Shalev-Shwartz, Shaked Shammah, Ohad Shamir The Hebrew University and Mobileye Representation Learning Workshop Simons Institute, Berkeley, 2017 Shai Shalev-Shwartz
More informationLow-complexity generation of scalable complete complementary sets of sequences
University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2006 Low-complexity generation of scalable complete complementary sets
More informationLecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) Sanjeev Arora Elad Hazan Recap: Structure of a deep
More informationFast Fir Algorithm Based Area- Efficient Parallel Fir Digital Filter Structures
Fast Fir Algorithm Based Area- Efficient Parallel Fir Digital Filter Structures Ms. P.THENMOZHI 1, Ms. C.THAMILARASI 2 and Mr. V.VENGATESHWARAN 3 Assistant Professor, Dept. of ECE, J.K.K.College of Technology,
More informationTTIC 31230, Fundamentals of Deep Learning David McAllester, April Vanishing and Exploding Gradients. ReLUs. Xavier Initialization
TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Vanishing and Exploding Gradients ReLUs Xavier Initialization Batch Normalization Highway Architectures: Resnets, LSTMs and GRUs Causes
More informationVectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier
Vectorized 128-bit Input FP16/FP32/ FP64 Floating-Point Multiplier Espen Stenersen Master of Science in Electronics Submission date: June 2008 Supervisor: Per Gunnar Kjeldsberg, IET Co-supervisor: Torstein
More information