Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI

1 Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI
Charles Lo and Paul Chow
{locharl1, pc}@eecg.toronto.edu
Department of Electrical and Computer Engineering, University of Toronto
FPGA

3 Motivation
Restricted Boltzmann Machines (RBMs)
- Two-layer artificial neural network
Deep neural networks
- Efficiently trained with RBMs
- Successfully applied to interesting machine learning problems
- Learning slow with thousands of nodes per layer

4 Motivation
- RBM training easily parallelized
- FPGA implementations achieve dramatic speed-up
- RBM size limited by available on-chip RAM
- Virtualized architecture [1]
- Significant performance hit from limited memory bandwidth
This work:
- Builds upon the virtualized architecture
- Accelerates large RBMs
- Increases performance
[1] D. Ly et al., High-Performance Reconfigurable Hardware Architecture for Restricted Boltzmann Machines, TNN, Nov

5 The Rest of the Talk
- Restricted Boltzmann Machines
- Baseline Architecture
- Performance Improvements
- Embedded Message Passing Interface (MPI)

6 Restricted Boltzmann Machines
[Figure: RBM with a hidden layer and a visible layer]

7 Alternating Gibbs Sampling
[Figure: alternating Gibbs sampling steps t = 0, t = 1, t = 2, t = 3]

15 Alternating Gibbs Sampling
[Figure: alternating Gibbs sampling steps t = 0, t = 1, t = 2, t = 3]
Weight matrix between visible nodes and hidden nodes:
w0,0 w0,1 w0,2 w0,3
w1,0 w1,1 w1,2 w1,3
w2,0 w2,1 w2,2 w2,3
w3,0 w3,1 w3,2 w3,3
RBM size limited by on-chip memory
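
The sampling chain on these slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware implementation from the talk: units are binary, biases are omitted, `W[i][j]` is taken to connect visible node i to hidden node j, and odd steps are assumed to sample the hidden layer. The names `sample_hidden`, `sample_visible`, and `alternating_gibbs` are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_hidden(v, W, rng):
    # Hidden unit j accumulates over column j of W.
    probs = [sigmoid(sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(W[0]))]
    return [1 if rng.random() < p else 0 for p in probs]


def sample_visible(h, W, rng):
    # Visible unit i accumulates over row i of W.
    probs = [sigmoid(sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(W))]
    return [1 if rng.random() < p else 0 for p in probs]


def alternating_gibbs(v0, W, t_max=3, seed=0):
    """Return the layer states for t = 0 .. t_max (assumed: odd t is hidden)."""
    rng = random.Random(seed)
    states = [list(v0)]
    for t in range(1, t_max + 1):
        if t % 2 == 1:
            states.append(sample_hidden(states[-1], W, rng))
        else:
            states.append(sample_visible(states[-1], W, rng))
    return states
```

In this sketch, one sampling direction walks the columns of the weight matrix and the other walks the rows, so both access patterns need to be fast.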

20 Virtualization
[Figure: sampling steps t = 0 to t = 3; the weight matrix between visible nodes and hidden nodes is split into partitions that are processed one after another over time, in the order W0,0, W0,1, W1,1, W1,0]
RBM size limited by on-chip memory
Train a virtual RBM by time-multiplexing hardware
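
A minimal software sketch of the time-multiplexing idea, assuming a square weight matrix split into square partitions of a hypothetical size `block`: only one partition is "resident" at a time, and the partial sums for the hidden layer are accumulated across context switches. The serpentine visiting order matches the order in the figure (W0,0, W0,1, W1,1, W1,0). The function name and the integer arithmetic are illustrative choices, not details from the talk.

```python
def virtual_hidden_sums(v, W, block):
    """Accumulate hidden-layer partial sums one partition at a time.

    Returns the accumulated sums and the order in which partitions
    (block_row, block_col) were loaded.
    """
    n = len(W)
    assert n % block == 0 and all(len(row) == n for row in W)
    nb = n // block
    sums = [0] * n
    order = []
    for r in range(nb):
        # Serpentine order over block columns, as in the slide figure.
        cols = range(nb) if r % 2 == 0 else range(nb - 1, -1, -1)
        for c in cols:
            # "Context switch": load partition W[r, c] into on-chip memory.
            part = [W[r * block + i][c * block:(c + 1) * block]
                    for i in range(block)]
            order.append((r, c))
            for i in range(block):
                for j in range(block):
                    sums[c * block + j] += v[r * block + i] * part[i][j]
    return sums, order
```

For a 4x4 matrix with `block=2` this performs four context switches and yields the same sums as the unpartitioned product.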

21 Baseline Architecture
[Block diagram of the FPGA system: DDR2 Memory, Processor, Memory Controller, Processor Local Bus (PLB), DMA Engine, Message Passing Interface (MPI) Network, Compute Engines]

25 Improving Performance
[Block diagram of the FPGA system: DDR2 Memory, Processor, Memory Controller, Processor Local Bus (PLB), DMA Engine, Message Passing Interface (MPI) Network, Compute Engines]
Three Approaches
- Partition Size
- Memory Interface
- Multi-FPGA Extension

27 Partition Size
Weight matrix W0,0:
w0,0 w0,1 w0,2 w0,3
w1,0 w1,1 w1,2 w1,3
w2,0 w2,1 w2,2 w2,3
w3,0 w3,1 w3,2 w3,3

Weights in Block RAM:
Address | BRAM 0 | BRAM 1 | BRAM 2 | BRAM 3
0       | w0,0   | w0,1   | w0,2   | w0,3
1       | w1,1   | w1,2   | w1,3   | w1,0
2       | w2,2   | w2,3   | w2,0   | w2,1
3       | w3,3   | w3,0   | w3,1   | w3,2

Larger partitions:
- Reduce context switches
- Increase parallel weight access
Diagonal assignment allows parallel access to rows and columns
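
The diagonal assignment on this slide can be checked with a short sketch. It assumes the mapping read off the slide's layout, namely that BRAM b at address a holds weight w[a][(a + b) mod n]; the helper names are hypothetical. A row is read by presenting the same address to every BRAM, and a column by presenting a different address to each BRAM, so both take a single parallel access.

```python
def diagonal_layout(n):
    """layout[bram][address] -> (row, col) of the weight stored there."""
    return [[(addr, (addr + bram) % n) for addr in range(n)]
            for bram in range(n)]


def read_row(layout, i):
    # Row i: every BRAM is read at the same address i.
    return [layout[bram][i] for bram in range(len(layout))]


def read_column(layout, j):
    # Column j: BRAM b holds it at address (j - b) mod n,
    # so each BRAM is accessed exactly once, at a distinct address.
    n = len(layout)
    return [layout[bram][(j - bram) % n] for bram in range(n)]
```

With `n = 4`, `read_row(layout, 1)` returns the weights in the order the slide lists them at address 1: w1,1, w1,2, w1,3, w1,0.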

30 Partition Size
Weight matrix W0,0:
w0,0 w0,1 w0,2 w0,3
w1,0 w1,1 w1,2 w1,3
w2,0 w2,1 w2,2 w2,3
w3,0 w3,1 w3,2 w3,3

Weights in Block RAM:
Address | BRAM 0    | BRAM 1
0       | w0,0 w0,1 | w0,2 w0,3
1       | w1,0 w1,1 | w1,2 w1,3
2       | w2,2 w2,3 | w2,0 w2,1
3       | w3,2 w3,3 | w3,0 w3,1

Block diagonal assignment provides access to packed weights
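
One way to generate a block diagonal layout like the one on this slide is sketched below. The mapping is an assumption inferred from the slide residue, which is partly garbled: each word packs `pack` adjacent weights from one row, and the word for block column c of row i goes to BRAM (c - i // pack) mod (n / pack) at address i. The function name and the `pack` parameter are hypothetical.

```python
def block_diagonal_layout(n, pack):
    """layout[bram][address] -> list of (row, col) packed into one word."""
    assert n % pack == 0
    nb = n // pack  # number of BRAMs, one per block column
    layout = [[None] * n for _ in range(nb)]
    for i in range(n):
        for c in range(nb):
            # Rotate block columns by the block row index (assumed mapping).
            bram = (c - i // pack) % nb
            layout[bram][i] = [(i, c * pack + k) for k in range(pack)]
    return layout
```

With `pack=1` the same function reduces to the plain diagonal assignment, where BRAM b at address a holds w[a][(a + b) mod n].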

32 Memory Interface
[Block diagram: DDR2 Memory, Processor, Processor Local Bus (PLB), DMA Engine, Memory Controller, Memory Access Core, Native Port Interface (NPI), Message Passing Interface (MPI) Network, Compute Engines]
New relative to the baseline architecture: Memory Access Core, Native Port Interface (NPI)

33 Single-FPGA Performance
- Design compared to the original virtualized architecture
- Performance metric: Connection Updates per Second (CUPS)
- Single-FPGA performance increased 4.2-fold over the original architecture

35 Multi-FPGA Extension
Partitioned weight matrix:
W0,0 W0,1 W0,2 W0,3
W1,0 W1,1 W1,2 W1,3
W2,0 W2,1 W2,2 W2,3
W3,0 W3,1 W3,2 W3,3
16 context switches required on a single FPGA

Distribution of partitions (rows: time, columns: space):
FPGA 0 | FPGA 1 | FPGA 2 | FPGA 3
W0,0   | W0,1   | W0,2   | W0,3
W1,0   | W1,1   | W1,2   | W1,3
W2,0   | W2,1   | W2,2   | W2,3
W3,0   | W3,1   | W3,2   | W3,3
Virtualization allows for distribution of partitions in time and space
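
The time and space distribution can be sketched as a simple schedule. The assignment used here (FPGA f handles block column f, one partition per time step) is an assumption read off the slide's grid, and `time_space_schedule` is a hypothetical helper, not part of the presented design.

```python
def time_space_schedule(nb, num_fpgas):
    """Return per_fpga[f]: the time-ordered partitions (r, c) handled by FPGA f.

    nb is the number of partitions per matrix dimension; block column c is
    assigned to FPGA c mod num_fpgas (assumed mapping).
    """
    assert nb % num_fpgas == 0
    per_fpga = [[] for _ in range(num_fpgas)]
    for r in range(nb):
        for c in range(nb):
            per_fpga[c % num_fpgas].append((r, c))
    return per_fpga
```

For a 4x4 grid of partitions, a single FPGA gets all 16 partitions (16 context switches), while with four FPGAs each one steps through four.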

40 Inter-FPGA Accumulation
[Figure: partitioned weight matrix W0,0 to W3,3 laid out in time (rows) and space (columns FPGA 0 to FPGA 3); a dataflow diagram connecting FPGA 0, FPGA 1, FPGA 2 and FPGA 3; a tree adder over F0, F1, F2, F3]
Accumulation in space requires inter-FPGA communication
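
The arithmetic of a tree adder over per-FPGA partial sums can be illustrated as follows. This only shows the pairwise reduction; how the actual design maps it onto inter-FPGA messages is not given on the slide, and `tree_add` is a hypothetical name.

```python
def tree_add(partials):
    """Pairwise (tree) reduction of per-FPGA partial-sum vectors.

    Expects at least one vector. Returns the summed vector and the number
    of adder levels used.
    """
    level = [list(p) for p in partials]
    depth = 0
    while len(level) > 1:
        nxt = []
        for k in range(0, len(level) - 1, 2):
            nxt.append([a + b for a, b in zip(level[k], level[k + 1])])
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an odd vector passes through unchanged
        level = nxt
        depth += 1
    return level[0], depth
```

With four partial-sum vectors (F0 to F3) the reduction takes two levels, compared with three sequential additions in a chain.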

41 Inter-FPGA Communication
[Plot: Performance Scaling with Network Size. x-axis: number of nodes in visible and hidden layers; y-axis: speed [MCUPS]; series: 1-FPGA, 4-FPGA]

42 Embedded Message Passing Interface
ArchES-MPI [2]
- Scalable, high performance on-chip network
- System component implementations abstracted by ranks
- Simple incremental improvements
- Reconfigurable control and dataflow
- Design portability
[2] M. Saldaña et al., MPI as an Abstraction for Software Hardware Interaction for HPRCs, HPRCTA, Nov

43 Conclusion
- Virtualization allows for the acceleration of large RBMs through the distribution of work in time and space
- Single-FPGA performance increased 4.2-fold
- Embedded MPI simplifies design changes and increases design portability

44 RBM Bibliography
- G. Hinton et al., A fast learning algorithm for deep belief nets, Neural Computation, 2006
- G. Hinton et al., Reducing the dimensionality of data with neural networks, Science, Jul

45 Weight Pipeline
[Figure: packed weights stored by address in BRAM 0 and BRAM 1, feeding BRAM buffers]

46 Mini-Batch Size Effects
[Plot: Effect of Mini-Batch Size. x-axis: mini-batch size; y-axis: speed [MCUPS]]

47 FPGA Performance Comparison
Implementation         | Network Size | mbatch Size | Absolute Performance
Virtualized 1-FPGA     | 1024x        |             | MCUPS
Virtualized 4-FPGA     | 1024x        |             | MCUPS
Virtualized 1-FPGA     | 1024x        |             | MCUPS
Virtualized 4-FPGA     | 1024x        |             | MCUPS
Virtualized 1-FPGA     | 8192x        |             | MCUPS
Virtualized 4-FPGA     | 8192x        |             | MCUPS
Virtualized 1-FPGA     | 256x         |             | MCUPS
Kim et al. [3] 4-FPGA  | x            |             | MCUPS
[Equation: definition of CUPS in terms of n, T, and L]
[3] S. Kim et al., A Large-Scale Architecture for Restricted Boltzmann Machines, FPL

A FPGA Implementation of Large Restricted Boltzmann Machines. Charles Lo. Supervisor: Paul Chow April 2010

A FPGA Implementation of Large Restricted Boltzmann Machines. Charles Lo. Supervisor: Paul Chow April 2010 A FPGA Implementation of Large Restricted Boltzmann Machines by Charles Lo Supervisor: Paul Chow April 2010 Abstract A FPGA Implementation of Large Restricted Boltzmann Machines Charles Lo Engineering

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

Knowledge Extraction from DBNs for Images

Knowledge Extraction from DBNs for Images Knowledge Extraction from DBNs for Images Son N. Tran and Artur d Avila Garcez Department of Computer Science City University London Contents 1 Introduction 2 Knowledge Extraction from DBNs 3 Experimental

More information

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay

SP-CNN: A Scalable and Programmable CNN-based Accelerator. Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay SP-CNN: A Scalable and Programmable CNN-based Accelerator Dilan Manatunga Dr. Hyesoon Kim Dr. Saibal Mukhopadhyay Motivation Power is a first-order design constraint, especially for embedded devices. Certain

More information

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks Yufei Ma, Yu Cao, Sarma Vrudhula,

More information

A Deep Convolutional Neural Network Based on Nested Residue Number System

A Deep Convolutional Neural Network Based on Nested Residue Number System A Deep Convolutional Neural Network Based on Nested Residue Number System Hiroki Nakahara Tsutomu Sasao Ehime University, Japan Meiji University, Japan Outline Background Deep convolutional neural network

More information

<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)

<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation) Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation

More information

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari

Deep Learning Srihari. Deep Belief Nets. Sargur N. Srihari Deep Belief Nets Sargur N. Srihari srihari@cedar.buffalo.edu Topics 1. Boltzmann machines 2. Restricted Boltzmann machines 3. Deep Belief Networks 4. Deep Boltzmann machines 5. Boltzmann machines for continuous

More information

Restricted Boltzmann Machines for Collaborative Filtering

Restricted Boltzmann Machines for Collaborative Filtering Restricted Boltzmann Machines for Collaborative Filtering Authors: Ruslan Salakhutdinov Andriy Mnih Geoffrey Hinton Benjamin Schwehn Presentation by: Ioan Stanculescu 1 Overview The Netflix prize problem

More information

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight) CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information

More information

Deep unsupervised learning

Deep unsupervised learning Deep unsupervised learning Advanced data-mining Yongdai Kim Department of Statistics, Seoul National University, South Korea Unsupervised learning In machine learning, there are 3 kinds of learning paradigm.

More information

Modeling Documents with a Deep Boltzmann Machine

Modeling Documents with a Deep Boltzmann Machine Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava, Ruslan Salakhutdinov & Geoffrey Hinton UAI 2013 Presented by Zhe Gan, Duke University November 14, 2014 1 / 15 Outline Replicated Softmax

More information

Lecture 16 Deep Neural Generative Models

Lecture 16 Deep Neural Generative Models Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed

More information

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch

Course Structure. Psychology 452 Week 12: Deep Learning. Chapter 8 Discussion. Part I: Deep Learning: What and Why? Rufus. Rufus Processed By Fetch Psychology 452 Week 12: Deep Learning What Is Deep Learning? Preliminary Ideas (that we already know!) The Restricted Boltzmann Machine (RBM) Many Layers of RBMs Pros and Cons of Deep Learning Course Structure

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

LOCKHEED MARTIN SITE UPDATE 11 APRIL 2018 MUNICH, GERMANY Kristen Pudenz Senior Quantum Applications Engineer

LOCKHEED MARTIN SITE UPDATE 11 APRIL 2018 MUNICH, GERMANY Kristen Pudenz Senior Quantum Applications Engineer LOCKHEED MARTIN SITE UPDATE 11 APRIL 2018 MUNICH, GERMANY Kristen Pudenz Senior Quantum Applications Engineer THE USC-LM QUANTUM COMPUTING CENTER Dr. Edward H. Ned Allen Chief Scientist and LM Senior Fellow

More information

Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber

Representational Power of Restricted Boltzmann Machines and Deep Belief Networks. Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber Representational Power of Restricted Boltzmann Machines and Deep Belief Networks Nicolas Le Roux and Yoshua Bengio Presented by Colin Graber Introduction Representational abilities of functions with some

More information

Restricted Boltzmann Machines

Restricted Boltzmann Machines Restricted Boltzmann Machines http://deeplearning4.org/rbm-mnist-tutorial.html Slides from Hugo Larochelle, Geoffrey Hinton, and Yoshua Bengio CSC321: Intro to Machine Learning and Neural Networks, Winter

More information

On the Use of a Many core Processor for Computational Fluid Dynamics Simulations

On the Use of a Many core Processor for Computational Fluid Dynamics Simulations On the Use of a Many core Processor for Computational Fluid Dynamics Simulations Sebastian Raase, Tomas Nordström Halmstad University, Sweden {sebastian.raase,tomas.nordstrom} @ hh.se Preface based on

More information

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight)

Deep Learning. What Is Deep Learning? The Rise of Deep Learning. Long History (in Hind Sight) CSCE 636 Neural Networks Instructor: Yoonsuck Choe Deep Learning What Is Deep Learning? Learning higher level abstractions/representations from data. Motivation: how the brain represents sensory information

More information

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m )

A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) A Simple Architectural Enhancement for Fast and Flexible Elliptic Curve Cryptography over Binary Finite Fields GF(2 m ) Stefan Tillich, Johann Großschädl Institute for Applied Information Processing and

More information

CMOS Ising Computer to Help Optimize Social Infrastructure Systems

CMOS Ising Computer to Help Optimize Social Infrastructure Systems FEATURED ARTICLES Taking on Future Social Issues through Open Innovation Information Science for Greater Industrial Efficiency CMOS Ising Computer to Help Optimize Social Infrastructure Systems As the

More information

Deep Boltzmann Machines

Deep Boltzmann Machines Deep Boltzmann Machines Ruslan Salakutdinov and Geoffrey E. Hinton Amish Goel University of Illinois Urbana Champaign agoel10@illinois.edu December 2, 2016 Ruslan Salakutdinov and Geoffrey E. Hinton Amish

More information

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence

Stochastic Gradient Estimate Variance in Contrastive Divergence and Persistent Contrastive Divergence ESANN 0 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 7-9 April 0, idoc.com publ., ISBN 97-7707-. Stochastic Gradient

More information

Greedy Layer-Wise Training of Deep Networks

Greedy Layer-Wise Training of Deep Networks Greedy Layer-Wise Training of Deep Networks Yoshua Bengio, Pascal Lamblin, Dan Popovici, Hugo Larochelle NIPS 2007 Presented by Ahmed Hefny Story so far Deep neural nets are more expressive: Can learn

More information

The Origin of Deep Learning. Lili Mou Jan, 2015

The Origin of Deep Learning. Lili Mou Jan, 2015 The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets

More information

A graph contains a set of nodes (vertices) connected by links (edges or arcs)

A graph contains a set of nodes (vertices) connected by links (edges or arcs) BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,

More information

Dynamical Systems and Deep Learning: Overview. Abbas Edalat

Dynamical Systems and Deep Learning: Overview. Abbas Edalat Dynamical Systems and Deep Learning: Overview Abbas Edalat Dynamical Systems The notion of a dynamical system includes the following: A phase or state space, which may be continuous, e.g. the real line,

More information

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, 2017 Spis treści Website Acknowledgments Notation xiii xv xix 1 Introduction 1 1.1 Who Should Read This Book?

More information

UNSUPERVISED LEARNING

UNSUPERVISED LEARNING UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training

More information

Deep Belief Networks are Compact Universal Approximators

Deep Belief Networks are Compact Universal Approximators Deep Belief Networks are Compact Universal Approximators Franck Olivier Ndjakou Njeunje Applied Mathematics and Scientific Computation May 16, 2016 1 / 29 Outline 1 Introduction 2 Preliminaries Universal

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers

A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers A Study of Complex Deep Learning Networks on High Performance, Neuromorphic, and Quantum Computers Thomas E. Potok, Ph.D. Computational Data Analytics Group Oak Ridge National Laboratory ORNL is managed

More information

7.1 Basis for Boltzmann machine. 7. Boltzmann machines

7.1 Basis for Boltzmann machine. 7. Boltzmann machines 7. Boltzmann machines this section we will become acquainted with classical Boltzmann machines which can be seen obsolete being rarely applied in neurocomputing. It is interesting, after all, because is

More information

Accelerated Monte Carlo simulations with restricted Boltzmann machines

Accelerated Monte Carlo simulations with restricted Boltzmann machines Accelerated simulations with restricted Boltzmann machines Li Huang 1 Lei Wang 2 1 China Academy of Engineering Physics 2 Institute of Physics, Chinese Academy of Sciences July 6, 2017 Machine Learning

More information

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines. COMP9444 c Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 11. Boltzmann Machines COMP9444 17s2 Boltzmann Machines 1 Outline Content Addressable Memory Hopfield Network Generative Models Boltzmann Machine Restricted Boltzmann

More information

Deep Learning & Neural Networks Lecture 4

Deep Learning & Neural Networks Lecture 4 Deep Learning & Neural Networks Lecture 4 Kevin Duh Graduate School of Information Science Nara Institute of Science and Technology Jan 23, 2014 2/20 3/20 Advanced Topics in Optimization Today we ll briefly

More information

arxiv: v2 [cs.ne] 22 Feb 2013

arxiv: v2 [cs.ne] 22 Feb 2013 Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint arxiv:1301.3533v2 [cs.ne] 22 Feb 2013 Xanadu C. Halkias DYNI, LSIS, Universitè du Sud, Avenue de l Université - BP20132, 83957 LA

More information

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks

Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Reading Group on Deep Learning Session 4 Unsupervised Neural Networks Jakob Verbeek & Daan Wynen 206-09-22 Jakob Verbeek & Daan Wynen Unsupervised Neural Networks Outline Autoencoders Restricted) Boltzmann

More information

Learning Tetris. 1 Tetris. February 3, 2009

Learning Tetris. 1 Tetris. February 3, 2009 Learning Tetris Matt Zucker Andrew Maas February 3, 2009 1 Tetris The Tetris game has been used as a benchmark for Machine Learning tasks because its large state space (over 2 200 cell configurations are

More information

arxiv: v1 [cs.ar] 11 Dec 2017

arxiv: v1 [cs.ar] 11 Dec 2017 Multi-Mode Inference Engine for Convolutional Neural Networks Arash Ardakani, Carlo Condo and Warren J. Gross Electrical and Computer Engineering Department, McGill University, Montreal, Quebec, Canada

More information

Tunable Floating-Point for Energy Efficient Accelerators

Tunable Floating-Point for Energy Efficient Accelerators Tunable Floating-Point for Energy Efficient Accelerators Alberto Nannarelli DTU Compute, Technical University of Denmark 25 th IEEE Symposium on Computer Arithmetic A. Nannarelli (DTU Compute) Tunable

More information

Jakub Hajic Artificial Intelligence Seminar I

Jakub Hajic Artificial Intelligence Seminar I Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6 Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects

More information

An FPGA Implementation of Reciprocal Sums for SPME

An FPGA Implementation of Reciprocal Sums for SPME An FPGA Implementation of Reciprocal Sums for SPME Sam Lee and Paul Chow Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto Objectives Accelerate part of Molecular

More information

Artificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen

Artificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition

More information

RegML 2018 Class 8 Deep learning

RegML 2018 Class 8 Deep learning RegML 2018 Class 8 Deep learning Lorenzo Rosasco UNIGE-MIT-IIT June 18, 2018 Supervised vs unsupervised learning? So far we have been thinking of learning schemes made in two steps f(x) = w, Φ(x) F, x

More information

Deep Learning Architectures and Algorithms

Deep Learning Architectures and Algorithms Deep Learning Architectures and Algorithms In-Jung Kim 2016. 12. 2. Agenda Introduction to Deep Learning RBM and Auto-Encoders Convolutional Neural Networks Recurrent Neural Networks Reinforcement Learning

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Does the Wake-sleep Algorithm Produce Good Density Estimators?

Does the Wake-sleep Algorithm Produce Good Density Estimators? Does the Wake-sleep Algorithm Produce Good Density Estimators? Brendan J. Frey, Geoffrey E. Hinton Peter Dayan Department of Computer Science Department of Brain and Cognitive Sciences University of Toronto

More information

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel?

CRYSTAL in parallel: replicated and distributed (MPP) data. Why parallel? CRYSTAL in parallel: replicated and distributed (MPP) data Roberto Orlando Dipartimento di Chimica Università di Torino Via Pietro Giuria 5, 10125 Torino (Italy) roberto.orlando@unito.it 1 Why parallel?

More information

IBM Systems for Cognitive Solutions

IBM Systems for Cognitive Solutions IBM Q Quantum Computing IBM Systems for Cognitive Solutions Ehningen 12 th of July 2017 Albert Frisch, PhD - albert.frisch@de.ibm.com 2017 IBM 1 st wave of Quantum Revolution lasers atomic clocks GPS sensors

More information

Basic Principles of Unsupervised and Unsupervised

Basic Principles of Unsupervised and Unsupervised Basic Principles of Unsupervised and Unsupervised Learning Toward Deep Learning Shun ichi Amari (RIKEN Brain Science Institute) collaborators: R. Karakida, M. Okada (U. Tokyo) Deep Learning Self Organization

More information
