DMTK: An Ultra-Large-Scale Deep Learning Framework and Its Applications in Text Understanding

1 DMTK: An Ultra-Large-Scale Deep Learning Framework and Its Applications in Text Understanding. Taifeng Wang, Lead Researcher, Machine Learning Group, MSRA. 2016 GTC China, 9/13/2016.

2 Microsoft Research Lab Locations: Redmond, Washington, USA (Sep 1991); Cambridge, UK (July 1997); Beijing, China (Nov 1998); Bangalore, India (Jan 2005); Cambridge, Massachusetts, USA (2008); New York, USA (May 2012).

3 Microsoft Research Asia 1.0: technologies transferred into all major Microsoft products.

4 About DMTK (Distributed Machine Learning Toolkit). We focus on providing distributed machine learning infrastructure and algorithms to handle big-data and big-model learning tasks. Released to GitHub by the Machine Learning Group of MSRA.

5 DMTK User Engagement. Within just one week after release: a large number of stars and 200+ forks on GitHub; 1M+ visits to the DMTK homepage; 300K+ downloads of binary executables. Major upgrade in 2016.9.

6 The Evolution of DMTK
- Parameter server 1.0; distributed LightLDA; distributed Word2Vec
- Richer programming-language support for the parameter server (Python, Lua); connectors for Torch/Caffe/Theano
- Innovation on distributed optimization: DC-ASGD, ensemble models, accelerated optimization methods
- Model parallelism for deep learning models
- 2C-RNN for text understanding; graph embedding
- Parameter server 2.0: simpler SDK usage; system performance enhancements, e.g. reduced memory and network cost; deep integration with CNTK
- Distributed logistic regression with online updates (FTRL)
- Distributed gradient boosting decision trees (GBDT)

7 Microsoft Distributed Machine Learning Toolkit (DMTK)
- Distributed machine learning algorithms: 2C-RNN, Logistic Regression, LightGBM, LightLDA, Distributed Word Embedding
- Parallelization of different machine learning toolkits: AzureML, CNTK, other single-node DNN tools (Theano/Caffe/Torch)
- Multiverso Parameter Server: distributed synchronization mechanisms (ASGD / DC-ASGD, MA / ADMM / BMUF); hybrid model store with model slicing (matrix/tensor, hash table, tree); rich communication interfaces (MPI, ZeroMQ, RDMA, GPU Direct)
- Execution engines: YARN

8 Workloads supported
- LightLDA — model: 20M vocab, 1M topics (largest topic model); data: 200B tokens (Bing web chunk); training time: 60 hrs on 24 machines (nearly linear speed-up)
- Word2Vec — model: 10M vocab, 1000 dims (largest word embedding); data: 200B samples (Bing web chunk); training time: 40 hrs on 8 machines (nearly linear speed-up)
- GBDT — model: 3000 trees (120 nodes each); data: 7M records (Bing HRS data); training time: 3 hrs on 8 machines (4x speed-up)
- LSTM — model: 20M parameters (4 hidden layers); data: 1570 hrs of speech (Windows Phone data); training time: 1 day on 16 GPUs (15.9x speed-up)
- CNN — model: 41M parameters (GoogLeNet); data: 2M images (ImageNet 1K dataset); training time: 30 hrs on 16 GPUs (10x speed-up)
- Online FTRL — model: 800M parameters (logistic regression); data: 6.4B impressions (Bing Ads click log); training time: 2400 s on 24 machines (12x speed-up)

9 Forward looking: the Microsoft Cognitive Toolkit.

10 How to Advance Large-Scale Machine Learning. System innovation: one needs to leverage the full power of distributed systems and pursue almost-linear scale-out/speed-up; new distributed training paradigms need to be invented to resolve the bottlenecks of existing distributed machine learning systems. Algorithmic innovation: machine learning algorithms themselves need sufficiently high efficiency and throughput; existing designs and implementations might not have considered this requirement, so redesign and re-implementation might be needed.

11 Evolution of Distributed ML: from synchronous Iterative MapReduce (LDA, LR), to the asynchronous Parameter Server (deep learning, LDA, GBDT, LR), to asynchronous Dataflow (deep learning); in parallelization style, from data parallelism to model parallelism to irregular parallelism.

12 Evolution of Distributed Machine Learning: Iterative MapReduce. Local computation with synchronous updates; MapReduce / AllReduce is used to sync parameters among workers; only synchronous updates are supported. Examples: Spark and other derived systems.

13 Evolution of Distributed Machine Learning: Iterative MapReduce → Parameter Server. Parameter-server (PS) based solutions were proposed to support: asynchronous updates; different mechanisms for model aggregation, especially in an asynchronous manner; model parallelism. Examples: DistBelief (Google, NIPS'12), Petuum (Eric Xing, NIPS'13), Parameter Server (Mu Li, OSDI'14), Multiverso PS, etc.

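To make the parameter-server pattern concrete, here is a minimal, self-contained sketch (a toy in-process illustration of the paradigm, not the Multiverso API): workers repeatedly pull the current model, compute a gradient on local data, and push it back with no synchronization barrier, so a pushed gradient may have been computed against a slightly stale model.

```python
import threading
import numpy as np

class ParameterServer:
    """Toy in-process parameter server: holds the global model and
    applies pushed gradients without any barrier between workers."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()   # per-update atomicity only

    def pull(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad, lr=0.1):
        with self.lock:
            self.w -= lr * grad        # asynchronous update: no global sync

def worker(ps, data, labels, steps=100):
    for _ in range(steps):
        w = ps.pull()                                  # may be stale
        grad = data.T @ (data @ w - labels) / len(labels)  # local LS gradient
        ps.push(grad)                                  # push without waiting

rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 8)), rng.normal(size=256)
ps = ParameterServer(dim=8)
threads = [threading.Thread(target=worker, args=(ps, X, y)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("trained w:", ps.pull())
```

The staleness tolerated here is exactly the delay τ that DC-ASGD (slides 16-18) compensates for.
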
14 Evolution of Distributed Machine Learning: Iterative MapReduce → Parameter Server → Dataflow. Task scheduling and execution are based on (1) data dependency and (2) resource availability. Dataflow-based solutions were proposed to support: irregular parallelism (e.g., hybrid data- and model-parallelism), particularly in deep learning; both high-level abstraction and low-level flexibility in implementation. Examples: TensorFlow (Google), Dryad (Microsoft, EuroSys'07), Spark (AMP Lab, NSDI'12).

15 Delay-Compensated ASGD: our work on system innovation.

16 Delayed Gradients. Sequential SGD: $w_{t+\tau+1} = w_{t+\tau} - \eta\, g(w_{t+\tau})$. Async SGD: $w_{t+\tau+1} = w_{t+\tau} - \eta\, g(w_t)$. By Taylor expansion, $g(w_{t+\tau}) = g(w_t) + \nabla g(w_t)(w_{t+\tau} - w_t) + O(\|w_{t+\tau} - w_t\|^2)$, where $\nabla g(w_t)$ corresponds to the Hessian matrix.

17 Unbiased and Efficient Approximation of the Hessian Matrix. Theorem: assume $Y$ is a discrete random variable with $P(Y = k \mid X = x, w) = \sigma_k(x; w)$, where $0 < \sigma_k(x; w) < 1$ for all $x, w$ and $k = 1, \dots, K$. Let $L(x, y, w) = -\sum_k I_{y=k} \log \sigma_k(x; w)$. Then there exists a function $\varphi$ such that $E_{Y \mid x, w}\!\left[\tfrac{\partial^2}{\partial w^2} L(X, Y; w)\right] = E_{Y \mid x, w}\!\left[\varphi\!\left(\tfrac{\partial}{\partial w} L(X, Y; w)\right)\right]$. That is, for the cross-entropy loss, the second-order derivatives can be derived from the first-order derivatives in an unbiased manner.

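For the softmax cross-entropy case the identity can be checked numerically at the logit level, where the Hessian is diag(p) − pp^T and φ is the outer product g g^T; the following sketch is my own illustration, not code from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=5)
p = np.exp(logits) / np.exp(logits).sum()      # sigma_k(x; w)

# Exact Hessian of the cross-entropy loss w.r.t. the logits.
hessian = np.diag(p) - np.outer(p, p)

# Expectation over Y ~ p of the outer product of first-order gradients:
# for label k, the gradient w.r.t. the logits is (p - e_k).
expected_outer = sum(
    p[k] * np.outer(p - np.eye(5)[k], p - np.eye(5)[k]) for k in range(5)
)

assert np.allclose(hessian, expected_outer)    # unbiased: phi(g) = g g^T
print("E_y[g g^T] matches the Hessian")
```
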
18 Delay-Compensated ASGD (DC-ASGD). ASGD: $w_{t+\tau+1} = w_{t+\tau} - \eta\, g(w_t)$. DC-ASGD: $w_{t+\tau+1} = w_{t+\tau} - \eta\left(g(w_t) + \lambda\, \varphi(g(w_t)) \odot (w_{t+\tau} - w_t)\right)$.

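A minimal sketch of the two update rules, using the cheap diagonal approximation φ(g) = g ⊙ g for the Hessian term (variable names are mine; λ is the compensation strength):

```python
import numpy as np

def asgd_update(w_now, g_old, lr):
    """Plain ASGD: applies a gradient computed at a stale model w_t."""
    return w_now - lr * g_old

def dc_asgd_update(w_now, w_old, g_old, lr, lam):
    """DC-ASGD: adds the first-order Taylor correction for the delay,
    approximating the Hessian action by the element-wise product g*g."""
    compensation = lam * g_old * g_old * (w_now - w_old)
    return w_now - lr * (g_old + compensation)

w_old = np.array([0.5, -0.3])          # model when the gradient was computed
w_now = np.array([0.8, -0.1])          # model after other workers' updates
g_old = np.array([0.2, 0.4])           # delayed gradient g(w_t)
print(dc_asgd_update(w_now, w_old, g_old, lr=0.1, lam=0.5))
```
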
19 Experimental results (based on ResNet): convergence curves on CIFAR and ImageNet. (Plots omitted.)

20 2C-RNN: A Super-Efficient and Scalable Deep Algorithm for Text Understanding — our work on algorithm innovation, published at NIPS 2016.

21 Recurrent Neural Networks for text applications: a widely used model for sequence representation and learning — language modeling, machine translation, conversation bots, image/video captioning. Major challenges: efficiency and scalability.

22 Language modeling: $h_t = \sigma(U x_t + W h_{t-1} + b)$, $o_t = V h_t$, $y_t = \mathrm{softmax}(o_t)$. Symbols and dimensions:
- $x_t$: (input) embedding vector of the word at position $t$; dimension $w$
- $U$: parameter matrix, input → hidden state; dimension $h \times w$
- $W$: parameter matrix, hidden state → hidden state; dimension $h \times h$
- $V$: output embedding matrix, hidden state → output; dimension $|V| \times h$
- $y_t$: predicted probability for each word; dimension $|V|$

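A small NumPy sketch of one forward step of this language model, with toy dimensions (the symbols follow the table above; the choice σ = tanh is an assumption):

```python
import numpy as np

V, w, h = 1000, 32, 64                    # vocab size, embedding dim, hidden dim
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, w))    # input word embeddings (rows are x_t)
U = rng.normal(scale=0.1, size=(h, w))    # input -> hidden
W = rng.normal(scale=0.1, size=(h, h))    # hidden -> hidden
Vout = rng.normal(scale=0.1, size=(V, h)) # output embedding matrix
b = np.zeros(h)

def step(word_id, h_prev):
    x_t = E[word_id]
    h_t = np.tanh(U @ x_t + W @ h_prev + b)   # sigma = tanh here
    o_t = Vout @ h_t
    y_t = np.exp(o_t - o_t.max())
    return h_t, y_t / y_t.sum()               # softmax over the vocabulary

h_t, y_t = step(word_id=42, h_prev=np.zeros(h))
print(y_t.argmax(), y_t.sum())                # predicted word id, probs sum to 1
```
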
23 Challenge in text applications: model size. Large-scale dataset: ClueWeb09 (en), 143,820,387,816 tokens, vocabulary 10,784,180. The resulting model is too large for current GPUs to hold:
- $x$ (input embedding table): $|V| \times w$ → 10M × 1024 × 4 B = 40 GB
- $U$ (input → hidden): $h \times w$ → 1024 × 1024 × 4 B = 4 MB
- $W$ (hidden → hidden): $h \times h$ → 1024 × 1024 × 4 B = 4 MB
- $V$ (output embedding matrix): $|V| \times h$ → 10M × 1024 × 4 B = 40 GB
- $y_t$ (predicted probabilities): $|V|$ → 10M × 4 B = 40 MB

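The memory figures are simple shape arithmetic (float32, 4 bytes per parameter); a quick check that reproduces the table up to the slide's rounding:

```python
V, w, h = 10_000_000, 1024, 1024   # ClueWeb09-scale vocabulary, dims from slide 22
B = 4                              # bytes per float32 parameter

print(f"input embeddings x : |V| x w -> {V*w*B/2**30:5.1f} GB")   # ~40 GB
print(f"U (input->hidden)  : h  x w -> {h*w*B/2**20:5.1f} MB")    # 4 MB
print(f"W (hidden->hidden) : h  x h -> {h*h*B/2**20:5.1f} MB")    # 4 MB
print(f"output embeddings V: |V| x h -> {V*h*B/2**30:5.1f} GB")   # ~40 GB
print(f"y_t                : |V|     -> {V*B/2**20:5.1f} MB")     # ~40 MB
```
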
24 Challenge in text applications: running time. Same large-scale dataset: ClueWeb09 (en), 143,820,387,816 tokens, vocabulary 10,784,180. The computational complexity is huge: to choose one word, we need to go through all the words in the vocabulary. With $h_t = \sigma(U x_t + W h_{t-1} + b)$, $o_t = V h_t$, $y_t = \mathrm{softmax}(o_t)$, computing $h_t$ takes about 2 million float operations per token, while computing $o_t$ takes about 10 billion.

25 Training time estimation on mainstream hardware (ClueWeb09 (en): 143,820,387,816 tokens, vocabulary 10,784,180). Estimated as #tokens × #operations per token × #epochs × 2 (forward and backward propagation) / device FLOPS, i.e. 0.143T × 10G × 10 × 2 / FLOPS:
- Xeon Broadwell (14 nm, 2 × 20 cores, 0.736 TFLOPS; 8 × 32 GB DDR4 (256 GB) / 95 GBps): 0.143T×10G×10×2 / 0.736T / 3600 / 24 / 365 = 1232 years
- GPU K40 (28 nm, 5.0 TFLOPS float32, 2880 cores; 12 GB GDDR5 / 288 GBps): 181 years
- GPU M40 (28 nm, 6.8 TFLOPS float32, 3072 cores; 24 GB GDDR5 / 288 GBps): 133 years
- GPU P100 (16 nm, 10.6 TFLOPS float32 / 21.2 TFLOPS float16; 16 GB HBM2 / 720 GBps): 85 years

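The same back-of-the-envelope formula as executable arithmetic (device FLOPS as listed above; results match the slide's figures up to rounding):

```python
tokens = 143_820_387_816          # ClueWeb09 (en)
ops_per_token = 10e9              # dominated by o_t = V h_t
epochs, fwd_bwd = 10, 2
seconds_per_year = 3600 * 24 * 365

for device, tflops in [("Xeon Broadwell", 0.736), ("K40", 5.0),
                       ("M40", 6.8), ("P100 fp32", 10.6)]:
    years = (tokens * ops_per_token * epochs * fwd_bwd
             / (tflops * 1e12) / seconds_per_year)
    print(f"{device:15s} {years:7.0f} years")
```
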
26 A big challenge for algorithm innovation and hardware manufacturing. The key problem is the huge vocabulary: as in the table on slide 23, the two $|V| \times w$ and $|V| \times h$ embedding matrices cost ~40 GB each, dwarfing the 4 MB recurrent matrices.

27 Our proposal: 2-Component (2C) shared embedding (accepted by NIPS 2016). Current practice: one embedding vector per word (January → x_1, February → x_2, one → x_3, two → x_4, ...), i.e. |V| vectors in total. Our approach: arrange the vocabulary in a 2D table and represent each word by the vectors of its row and column (January → (x_1, y_1), February → (x_1, y_2), one → (x_2, y_1), two → (x_2, y_2), ...), so only 2√|V| vectors are needed. 2C: each word is partitioned and represented by two vectors (x, y). Shared embedding: x is shared by all words in the same row, y by all words in the same column.

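A sketch of the 2C shared-embedding lookup: the vocabulary is laid out in a √|V| × √|V| table, and a word's representation is the pair of its row vector and column vector. The layout and names here are illustrative, not the released implementation:

```python
import math
import numpy as np

V, dim = 10_000_000, 1024
side = math.isqrt(V - 1) + 1            # table is side x side, side = ceil(sqrt(V))
rng = np.random.default_rng(0)
X_rows = rng.normal(scale=0.1, size=(side, dim))   # shared by all words in a row
Y_cols = rng.normal(scale=0.1, size=(side, dim))   # shared by all words in a column

def embed(word_id):
    r, c = divmod(word_id, side)        # word's cell in the 2D allocation table
    return X_rows[r], Y_cols[c]         # 2-component representation (x, y)

x, y = embed(123_456)
print("vectors stored:", 2 * side, "instead of", V)   # ~6.3K vs 10M
```

The two tables hold 2√|V| ≈ 6.3K vectors instead of 10M, which is where the ~25 MB embedding figures on slide 29 come from.
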
28 2C-RNN. (Architecture diagram: a 2C-RNN unrolled over time next to a standard RNN. In the 2C-RNN, each word contributes two input steps — its row component x^r and its column component x^c — with separate input matrices X_r, X_c, projections U_r, U_c, and output matrices Y_r, Y_c; the two resulting distributions factorize the word probability as P(w_t) = P_r(w_t) · P_c(w_t), whereas the standard RNN on the right computes a single P(w_t) over the whole vocabulary.)

29 Analysis of model size. With the same recurrence $h_t = \sigma(U x_t + W h_{t-1} + b)$, $o_t = V h_t$, $y_t = \mathrm{softmax}(o_t)$, the model shrinks from roughly 80 GB to roughly 70 MB:
- $x, y$ (input embedding tables): $2\sqrt{|V|} \times w$ → 2 × (10M)^{1/2} × 1024 × 4 B ≈ 25 MB
- $U_x, U_y$ (input → hidden): $2 \times h \times w$ → 2 × 1024 × 1024 × 4 B = 8 MB
- $W_x, W_y$ (hidden → hidden): $2 \times h \times h$ → 8 MB
- $V_x, V_y$ (output embedding matrices): $2\sqrt{|V|} \times h$ → ≈ 25 MB
- $y_t$ (predicted probabilities): $2\sqrt{|V|}$ → < 1 MB

30 Analysis of computational complexity. With the same recurrence, the per-token cost drops from about 10 G float operations to about 10 M:
- $h_t$ (both components): ≈ 4 M float operations
- $o_t$ (both components): 2 × √(10M) × 1024 ≈ 6 M float operations
On a K40 (5.0 TFLOPS float32): 0.143T × 10M × 10 × 2 / 5T / 3600 / 24 / 365 ≈ 0.18 years. Training time estimation: 181 years → 0.18 years. Now we can easily parallelize training with the parameter-server framework: with 20 machines, 2-3 days.

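The corresponding per-token cost and training-time estimate after the 2C factorization, again as plain arithmetic:

```python
h, w, side = 1024, 1024, 3163            # side ~ sqrt(10M vocabulary)

ops_hidden = 2 * (h * w + h * h)         # h_t for both components (~4M ops)
ops_output = 2 * side * h                # o_t over rows + columns (~6M ops)
ops_per_token = ops_hidden + ops_output

tokens, epochs, fwd_bwd = 143_820_387_816, 10, 2
k40_flops = 5.0e12
years = tokens * ops_per_token * epochs * fwd_bwd / k40_flops / (3600 * 24 * 365)
print(f"{ops_per_token/1e6:.1f}M ops/token -> {years:.2f} years on one K40")
```

This prints roughly 0.2 years, matching the slide's 0.18-year estimate up to rounding.
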
31 How to allocate words into the 2D table. Cold start: partition rows according to word prefixes and columns according to suffixes — for example, react / real / return (re...) land in the same row, while Billion / Million / Trillion (...llion) land in the same column, as do sure / prepare (...re). Bootstrap: train with the current partitions for several iterations, adjust the partitions based on the training loss, then continue training (see the sketch below).

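The bootstrap step can be viewed as a min-cost bipartite matching between words and table cells: estimate the loss each word would incur in each candidate cell, then solve the assignment problem. A toy sketch of that view using SciPy's exact solver (the cost matrix here is random for illustration, and at real vocabulary scale an approximate matching is needed since |V| is in the millions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
side = 4                              # toy 4 x 4 table -> 16 words, 16 cells
V = side * side

# cost[i, j]: estimated training loss if word i is placed in cell j.
# In practice this would be accumulated from the row/column probabilities
# observed during the previous training iterations.
cost = rng.random((V, V))

words, cells = linear_sum_assignment(cost)      # min-cost perfect matching
allocation = {wd: divmod(c, side) for wd, c in zip(words, cells)}
print(allocation[0])                            # word 0 -> (row, col)
```

Slide 36's "bipartite graph matching" refers to this reallocation step.
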
32 Experimental results on a middle-sized dataset (the 2013 ACL Workshop dataset). [Table: test perplexity on ACLW-Spanish and ACLW-French plus model size for KN-4 [1], MLBL [1], LSTM word-in/word-out, LSTM char-CNN-in/word-out [2], and our 2C-RNN (cold start and bootstrap); the numeric cells did not survive transcription.] To achieve the same PPL as the HSM baseline on one GPU, 2C-RNN needs substantially less runtime than HSM (its runtime split between reallocation and training). [1] Non-LSTM RNN baselines. [2] The previous state-of-the-art method using a character-CNN input, "Character-Aware Neural Language Models".

33 Experimental results on a large-scale dataset (the One Billion Word benchmark). [Table: test perplexity and model size for KN-5 [1] (PPL 68, ~2 GB model), HSM [2], BlackOut-RNN [3], and our 2C-RNN (cold start and bootstrap; model size in MB rather than GB), plus the interpolations KN + HSM, KN + BlackOut-RNN, and KN + 2C-RNN; most numeric cells did not survive transcription.] Again, to achieve the same PPL as the HSM baseline on one GPU, 2C-RNN needs substantially less runtime. [1] "One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling". [2] "Strategies for Training Large Vocabulary Neural Language Models". [3] "BlackOut: Speeding up Recurrent Neural Network Language Models with Very Large Vocabularies".

34 Summary and forward looking. DMTK includes innovations from both the system and the algorithm side: excellent speed-up and widely available system integration; advanced distributed optimization methods; many world-leading algorithms. Machine learning for distributed deep learning: learning how to acquire, select, and partition the data; learning the optimal network structure; learning how to perform model updates; learning how to tune the hyper-parameters; learning how to aggregate local models. Create an AI that can automatically create new AI!

35 DMTK resources. You are welcome to join our WeChat group, 分布式机器学习联盟 (Distributed Machine Learning Alliance), to discuss big data and artificial intelligence together. Thanks!

36 Bootstrap: bipartite graph matching. (Diagram of the word-to-cell matching omitted.)

37 Comparison with other softmax variants. (Diagram: class-based softmax groups the vocabulary into classes c_1, ..., c_k, each containing words w_{i,1}, ..., w_{i,k}.)
- Standard softmax — model size O(|V|·w); training time O(|V|·w); test time O(|V|·w); generalization time O(|V|·w)
- Tree-based softmax — model size O(|V|·w); training time O(log|V|·w); test time O(log|V|·w); generalization time O(|V|·w)
- Class-based softmax — model size O(|V|·w); training time O(√|V|·w); test time O(√|V|·w); generalization time O(|V|·w)
- Our 2C — model size O(√|V|·w); training time O(√|V|·w); test time O(√|V|·w); generalization time O(√|V|·w)
