Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI
Charles Lo and Paul Chow ({locharl1, pc}@eecg.toronto.edu)
Department of Electrical and Computer Engineering, University of Toronto
FPGA
Motivation
Restricted Boltzmann Machines (RBMs) are two-layer artificial neural networks. Deep neural networks can be trained efficiently with RBMs and have been applied successfully to interesting machine learning problems, but learning is slow with thousands of nodes per layer.
Motivation
RBM training is easily parallelized, and FPGA implementations achieve dramatic speed-ups, but RBM size is limited by the available on-chip RAM. A previously proposed virtualized architecture [1] removes this size limit, at a significant performance cost from limited memory bandwidth. This work builds upon the virtualized architecture to accelerate large RBMs and increase performance.
[1] D. Ly et al., "High-Performance Reconfigurable Hardware Architecture for Restricted Boltzmann Machines," IEEE Trans. Neural Networks, Nov.
The Rest of the Talk
- Restricted Boltzmann Machines
- Baseline Architecture
- Performance Improvements
- Embedded Message Passing Interface (MPI)
Restricted Boltzmann Machines
An RBM consists of a visible layer and a hidden layer of nodes, with connections between the two layers but none within a layer.
Alternating Gibbs Sampling
The visible and hidden layers are sampled alternately over time steps t = 0, 1, 2, 3, ...
Alternating Gibbs Sampling
Each step applies the weight matrix between the layers: weight w_{i,j} connects visible node i to hidden node j (here a 4x4 matrix, w_{0,0} through w_{3,3}).
RBM size is limited by on-chip memory.
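One full Gibbs step can be pictured in software. The following is a minimal pure-Python sketch of alternating sampling; the 4x4 weights, the seed, and the function names are illustrative, not part of the hardware design:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_layer(states, weights, transpose=False):
    """One half-step of alternating Gibbs sampling: compute activation
    probabilities for the opposite layer and draw binary samples."""
    n_vis, n_hid = len(weights), len(weights[0])
    if transpose:  # hidden -> visible
        probs = [sigmoid(sum(weights[i][j] * states[j] for j in range(n_hid)))
                 for i in range(n_vis)]
    else:          # visible -> hidden
        probs = [sigmoid(sum(weights[i][j] * states[i] for i in range(n_vis)))
                 for j in range(n_hid)]
    return [1 if random.random() < p else 0 for p in probs]

# Toy 4x4 weight matrix: w[i][j] links visible node i to hidden node j.
w = [[0.5 if i == j else -0.1 for j in range(4)] for i in range(4)]
v = [1, 0, 1, 0]                         # t = 0: clamp the visible layer
h = sample_layer(v, w)                   # t = 1: sample the hidden layer
v1 = sample_layer(h, w, transpose=True)  # t = 2: reconstruct the visible layer
```

Every half-step is a matrix-vector product followed by a sigmoid and a Bernoulli draw, which is exactly the operation the hardware parallelizes.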
Virtualization
Train a virtual RBM by time-multiplexing the hardware: the weight matrix is split into partitions W_{0,0}, W_{0,1}, W_{1,0}, W_{1,1}, and the physical engine processes one partition at a time, stepping through them in time.
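The time-multiplexing idea can be modeled in software. A minimal sketch, assuming a b x b physical engine streaming partitions of an n x n virtual weight matrix (function names are illustrative):

```python
def partition(W, b):
    """Split an n x n weight matrix into (n//b)^2 blocks of size b x b."""
    n = len(W)
    return {(I, J): [row[J*b:(J+1)*b] for row in W[I*b:(I+1)*b]]
            for I in range(n // b) for J in range(n // b)}

def virtual_forward(W, v, b):
    """Hidden pre-activations for a 'virtual' RBM, computed by loading one
    b x b partition at a time into the fixed-size engine (a context switch)
    and accumulating partial dot products."""
    n = len(W)
    acts = [0.0] * n
    context_switches = 0
    for (I, J), blk in sorted(partition(W, b).items()):
        context_switches += 1      # load partition W_{I,J} into block RAM
        for jj in range(b):        # partial sums for this block's hidden columns
            acts[J*b + jj] += sum(blk[ii][jj] * v[I*b + ii] for ii in range(b))
    return acts, context_switches
```

With n = 4 and b = 2 this takes four context switches and produces the same pre-activations as a direct matrix-vector product, which is the sense in which the virtual RBM is equivalent to the physical one.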
Baseline Architecture
A processor, DDR2 memory with its memory controller, and a DMA engine sit on the Processor Local Bus (PLB); the compute engines are connected through a Message Passing Interface (MPI) network, all on one FPGA.
Improving Performance
Three approaches:
- Partition size
- Memory interface
- Multi-FPGA extension
Partition Size
A partition W_{0,0} of the weight matrix is stored across block RAMs (BRAM 0, BRAM 1, ...), one matrix row per address.
- Larger partitions reduce context switches and increase parallel weight access.
- Diagonal assignment (row i rotated by i, e.g. address 1 holds w_{1,1} w_{1,2} w_{1,3} w_{1,0}) allows parallel access to both rows and columns.
- Block diagonal assignment provides access to packed weights.
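The diagonal assignment can be checked with a small software model. This sketch stripes an n x n matrix across n BRAMs so that any row lives at a single address and any column lives in n distinct BRAMs (function names are illustrative):

```python
def diagonal_store(W):
    """Store row a rotated left by a, so BRAM b at address a holds
    W[a][(a + b) % n].  A whole row then sits at one address, and a whole
    column is spread over n distinct BRAMs -- either can be read in one
    parallel cycle with single-ported memories."""
    n = len(W)
    return [[W[a][(a + b) % n] for b in range(n)] for a in range(n)]

def read_row(bram, r):
    """Read row r: fix address r, undo the rotation across BRAMs."""
    n = len(bram)
    return [bram[r][(c - r) % n] for c in range(n)]

def read_col(bram, c):
    """Read column c: element W[r][c] sits in BRAM (c - r) % n at address r,
    so the n accesses hit n different BRAMs -- no port conflict."""
    n = len(bram)
    return [bram[r][(c - r) % n] for r in range(n)]
```

Row access matters for the visible-to-hidden pass and column access for the hidden-to-visible pass, so supporting both from one copy of the weights avoids duplicating the matrix.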
Memory Interface
The baseline moves weights from DDR2 memory over the Processor Local Bus (PLB) via the DMA engine. The improved design adds a dedicated Memory Access Core attached to the memory controller's Native Port Interface (NPI), feeding the compute engines without crossing the PLB.
Single-FPGA Performance
The design is compared to the original virtualized architecture using Connection Updates per Second (CUPS) as the performance metric. Single-FPGA performance increased 4.2-fold over the original architecture.
Multi-FPGA Extension
With the weight matrix partitioned into a 4x4 grid of blocks W_{0,0} through W_{3,3}, 16 context switches are required on a single FPGA. Virtualization allows the partitions to be distributed in both time and space: the 16 partitions are spread across FPGA 0 through FPGA 3, each FPGA stepping through its share in time.
Inter-FPGA Accumulation
Accumulation in space requires inter-FPGA communication: partial results computed from partitions on different FPGAs must be combined. A tree adder dataflow combines the partial sums from F0 through F3 pairwise (F0 + F1 and F2 + F3, then the two intermediate results).
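The tree adder reduces p partial vectors in log2(p) communication stages rather than p - 1 sequential hops. A pure-Python sketch of the same pairwise reduction (names illustrative):

```python
def tree_reduce(partials):
    """Pairwise tree reduction of per-FPGA partial sums: at each stage,
    neighbouring vectors are added elementwise, halving the count until
    one fully accumulated vector remains."""
    vecs = list(partials)
    while len(vecs) > 1:
        nxt = []
        for i in range(0, len(vecs), 2):
            a = vecs[i]
            # Pad with zeros if an odd vector is left without a partner.
            b = vecs[i + 1] if i + 1 < len(vecs) else [0.0] * len(a)
            nxt.append([x + y for x, y in zip(a, b)])
        vecs = nxt
    return vecs[0]
```

For four FPGAs this is two stages, matching the F0/F1 and F2/F3 pairing on the slide.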
Inter-FPGA Communication
[Figure: performance scaling with network size — speed (MCUPS, up to 12,000) vs. number of nodes in the visible and hidden layers (up to 8,000), for 1-FPGA and 4-FPGA configurations.]
Embedded Message Passing Interface
ArchES-MPI [2] provides:
- A scalable, high-performance on-chip network
- System component implementations abstracted by ranks
- Simple incremental improvements
- Reconfigurable control and dataflow
- Design portability
[2] M. Saldaña et al., "MPI as an Abstraction for Software-Hardware Interaction for HPRCs," HPRCTA, Nov.
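The rank abstraction can be pictured with a toy in-process model: every component, whether a software process or a hardware engine, is addressed only by rank, so swapping one implementation for another leaves the communication code untouched. This is an illustrative sketch, not the ArchES-MPI API:

```python
from queue import Queue

class Rank:
    """Toy model of an MPI rank: each component owns an inbox keyed by
    its rank id and communicates only through send/recv, never through
    knowledge of what the peer actually is."""
    def __init__(self, rank_id, network):
        self.id = rank_id
        self.net = network
        self.net.setdefault(rank_id, Queue())

    def send(self, dest, payload):
        self.net[dest].put((self.id, payload))

    def recv(self):
        return self.net[self.id].get()

network = {}
processor = Rank(0, network)   # could be software on the embedded CPU
engine = Rank(1, network)      # could be a hardware compute engine
processor.send(1, "weights-ready")
```

Because the processor addresses rank 1 rather than a bus address, replacing the engine behind rank 1 (software model, then hardware core) is invisible to the rest of the system — the "simple incremental improvements" point above.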
Conclusion
- Virtualization allows large RBMs to be accelerated by distributing work in time and space
- Single-FPGA performance increased 4.2-fold
- Embedded MPI simplifies design changes and increases design portability
RBM Bibliography
- G. Hinton et al., "A fast learning algorithm for deep belief nets," Neural Computation, 2006
- G. Hinton et al., "Reducing the dimensionality of data with neural networks," Science, Jul. 2006
Weight Pipeline (backup)
[Diagram: diagonally stored weights read from BRAM 0 and BRAM 1 (e.g. address 0 holds w_{0,0} w_{0,1}; address 1 holds w_{1,1} w_{1,0}) pass through buffer stages that restore row order before reaching the compute engines.]
Mini-Batch Size Effects (backup)
[Figure: effect of mini-batch size on speed (MCUPS, up to 8,000) as the mini-batch size grows.]
FPGA Performance Comparison (backup)

Implementation          Network Size   mBatch Size   Absolute Performance
Virtualized 1-FPGA      1024x…         …             … MCUPS
Virtualized 4-FPGA      1024x…         …             … MCUPS
Virtualized 1-FPGA      1024x…         …             … MCUPS
Virtualized 4-FPGA      1024x…         …             … MCUPS
Virtualized 1-FPGA      8192x…         …             … MCUPS
Virtualized 4-FPGA      8192x…         …             … MCUPS
Virtualized 1-FPGA      256x…          …             … MCUPS
Kim et al. [3] 4-FPGA   …              …             … MCUPS

CUPS = n^2 / T

[3] S. Kim et al., "A Large-Scale Architecture for Restricted Boltzmann Machines," FPL.
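The CUPS formula above lends itself to a one-line helper. The layer size and update time below are illustrative inputs, not measurements from the paper:

```python
def mcups(n, seconds_per_update):
    """Connection Updates Per Second for an n x n RBM, in millions:
    every Gibbs step touches all n*n weight connections, so
    CUPS = n^2 / T."""
    return (n * n) / seconds_per_update / 1e6

# Illustrative only: a 1024x1024 RBM updated in 1 ms.
rate = mcups(1024, 0.001)
```

The quadratic numerator is why doubling the layer size quadruples the work, and why the metric rewards architectures that keep large matrices fed.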
More informationIntroducing a Bioinformatics Similarity Search Solution
Introducing a Bioinformatics Similarity Search Solution 1 Page About the APU 3 The APU as a Driver of Similarity Search 3 Similarity Search in Bioinformatics 3 POC: GSI Joins Forces with the Weizmann Institute
More informationRestricted Boltzmann Machines
Restricted Boltzmann Machines Boltzmann Machine(BM) A Boltzmann machine extends a stochastic Hopfield network to include hidden units. It has binary (0 or 1) visible vector unit x and hidden (latent) vector
More informationChapter 11. Stochastic Methods Rooted in Statistical Mechanics
Chapter 11. Stochastic Methods Rooted in Statistical Mechanics Neural Networks and Learning Machines (Haykin) Lecture Notes on Self-learning Neural Algorithms Byoung-Tak Zhang School of Computer Science
More informationImplementation of a Restricted Boltzmann Machine in a Spiking Neural Network
Implementation of a Restricted Boltzmann Machine in a Spiking Neural Network Srinjoy Das Department of Electrical and Computer Engineering University of California, San Diego srinjoyd@gmail.com Bruno Umbria
More informationNeural Networks. Mark van Rossum. January 15, School of Informatics, University of Edinburgh 1 / 28
1 / 28 Neural Networks Mark van Rossum School of Informatics, University of Edinburgh January 15, 2018 2 / 28 Goals: Understand how (recurrent) networks behave Find a way to teach networks to do a certain
More informationDigital Integrated Circuits A Design Perspective. Arithmetic Circuits. Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic.
Digital Integrated Circuits A Design Perspective Jan M. Rabaey Anantha Chandrakasan Borivoje Nikolic Arithmetic Circuits January, 2003 1 A Generic Digital Processor MEM ORY INPUT-OUTPUT CONTROL DATAPATH
More informationGoogle s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Google s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Y. Wu, M. Schuster, Z. Chen, Q.V. Le, M. Norouzi, et al. Google arxiv:1609.08144v2 Reviewed by : Bill
More informationARestricted Boltzmann machine (RBM) [1] is a probabilistic
1 Matrix Product Operator Restricted Boltzmann Machines Cong Chen, Kim Batselier, Ching-Yun Ko, and Ngai Wong chencong@eee.hku.hk, k.batselier@tudelft.nl, cyko@eee.hku.hk, nwong@eee.hku.hk arxiv:1811.04608v1
More informationCS470: Computer Architecture. AMD Quad Core
CS470: Computer Architecture Yashwant K. Malaiya, Professor malaiya@cs.colostate.edu AMD Quad Core 1 Architecture Layers Building blocks Gates, flip-flops Functional bocks: Combinational, Sequential Instruction
More informationBias-Variance Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions
- Trade-Off in Hierarchical Probabilistic Models Using Higher-Order Feature Interactions Simon Luo The University of Sydney Data61, CSIRO simon.luo@data61.csiro.au Mahito Sugiyama National Institute of
More informationBinary Convolutional Neural Network on RRAM
Binary Convolutional Neural Network on RRAM Tianqi Tang, Lixue Xia, Boxun Li, Yu Wang, Huazhong Yang Dept. of E.E, Tsinghua National Laboratory for Information Science and Technology (TNList) Tsinghua
More informationGaussian Cardinality Restricted Boltzmann Machines
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence Gaussian Cardinality Restricted Boltzmann Machines Cheng Wan, Xiaoming Jin, Guiguang Ding and Dou Shen School of Software, Tsinghua
More informationWhy DNN Works for Acoustic Modeling in Speech Recognition?
Why DNN Works for Acoustic Modeling in Speech Recognition? Prof. Hui Jiang Department of Computer Science and Engineering York University, Toronto, Ont. M3J 1P3, CANADA Joint work with Y. Bao, J. Pan,
More informationLarge-Scale FPGA implementations of Machine Learning Algorithms
Large-Scale FPGA implementations of Machine Learning Algorithms Philip Leong ( ) Computer Engineering Laboratory School of Electrical and Information Engineering, The University of Sydney Computer Engineering
More informationConvolutional Neural Networks
Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»
More informationExploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units
Exploring the Potential of Instruction-Level Parallelism of Exposed Datapath Architectures with Buffered Processing Units Anoop Bhagyanath and Klaus Schneider Embedded Systems Chair University of Kaiserslautern
More informationHopfield Networks and Boltzmann Machines. Christian Borgelt Artificial Neural Networks and Deep Learning 296
Hopfield Networks and Boltzmann Machines Christian Borgelt Artificial Neural Networks and Deep Learning 296 Hopfield Networks A Hopfield network is a neural network with a graph G = (U,C) that satisfies
More informationCOVER SHEET: Problem#: Points
EEL 4712 Midterm 3 Spring 2017 VERSION 1 Name: UFID: Sign here to give permission for your test to be returned in class, where others might see your score: IMPORTANT: Please be neat and write (or draw)
More information