Bilinear Attention Networks
1 Bilinear Attention Networks. 2018 VQA Challenge runner-up (1st single model). Jin-Hwa Kim, Jaehyun Jun, Byoung-Tak Zhang. Biointelligence Lab., Seoul National University
2 Visual Question Answering. Learning a joint representation of multiple modalities: linguistic and visual information. Credit: visualqa.org
3 VQA 2.0 Dataset
204K MS COCO images

        Images   Questions   Answers
Train   80K      443K        4.4M
Val     40K      214K        2.1M
Test    80K      447K        unknown

760K questions (10 answers for each question from unique AMT workers)
Splits: train, val, test-dev (remote validation), test-standard (publications), test-challenge (competitions), and test-reserve (check overfitting)
VQA Score is the avg. of the 10 choose 9 accuracies (Goyal et al.): each subset is scored min(# matching annotations / 3, 1), so 3 or more matching annotations give 1.0
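Concretely, the metric averages the min(# matches / 3, 1) score over the ten leave-one-out subsets of nine human answers. A minimal sketch (the function name is illustrative, not from the official evaluation code):

```python
from itertools import combinations

def vqa_score(pred, human_answers):
    """Average of the 10-choose-9 accuracies: for each subset of 9 of the
    10 human answers, the score is min(#matches / 3, 1)."""
    assert len(human_answers) == 10
    subset_scores = [
        min(sum(a == pred for a in subset) / 3.0, 1.0)
        for subset in combinations(human_answers, 9)
    ]
    return sum(subset_scores) / len(subset_scores)

# 4 of 10 annotators agree: every 9-answer subset has >= 3 matches, so the score is 1.0
print(vqa_score("yes", ["yes"] * 4 + ["no"] * 6))  # 1.0
```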
4 Objective
Introducing bilinear attention
- Interactions between words and visual concepts are meaningful
- Proposing an efficient method (with the same time complexity) on top of low-rank bilinear pooling
Residual learning of attention
- Residual learning with an attention mechanism for incremental inference
Integration of the counting module (Zhang et al., 2018)
5 Preliminary
Question embedding (fine-tuning)
- Use all outputs of the GRU (every time step)
- X ∈ R^{N×ρ}, where N is the hidden dim. and ρ = |{x_i}| is the # of tokens
Image embedding (fixed bottom-up attention)
- Select detected objects (rectangles) using a pre-trained Faster R-CNN to extract rich features for each object (1600 classes, 400 attributes)
- Y ∈ R^{M×φ}, where M is the feature dim. and φ = |{y_j}| is the # of objects
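As a shape check, a minimal PyTorch sketch of producing X and Y with these dimensions; all sizes and the 300-d word-vector input are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

N, M = 1024, 2048      # hidden dim (question) and feature dim (objects); illustrative
rho, phi = 14, 36      # number of tokens and detected objects; illustrative

gru = nn.GRU(input_size=300, hidden_size=N, batch_first=True)  # 300-d word vectors assumed
words = torch.randn(1, rho, 300)
X, _ = gru(words)                 # all hidden states: (1, rho, N)
X = X.transpose(1, 2)             # (1, N, rho), matching X in R^{N x rho}

Y = torch.randn(1, M, phi)        # fixed bottom-up-attention features, Y in R^{M x phi}
```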
6 Low-rank Bilinear Pooling
Bilinear model and its approximation (Wolf et al., 2007; Pirsiavash et al., 2009):

Low-rank bilinear model. Pirsiavash et al. proposed a low-rank bilinear model to reduce the rank of the bilinear weight matrix W_i to give regularity. For this, W_i is replaced with the multiplication of two smaller matrices U_i V_i^T, where U_i ∈ R^{N×d} and V_i ∈ R^{M×d}. As a result, this replacement makes the rank of W_i at most d ≤ min(N, M). For the scalar output f_i (bias terms are omitted without loss of generality):

    f_i = x^T W_i y = x^T U_i V_i^T y = 1^T (U_i^T x ∘ V_i^T y)    (1)

where 1 ∈ R^d is a vector of ones and ∘ denotes the Hadamard product (element-wise multiplication).

Low-rank bilinear pooling (Kim et al., 2017). For a vector output f, a pooling matrix P is introduced:

    f = P^T (U^T x ∘ V^T y)    (2)

where P ∈ R^{d×c}, U ∈ R^{N×d}, and V ∈ R^{M×d}. Replacing the vector of ones with the pooling matrix P allows U and V to remain two-dimensional tensors (three two-dimensional tensors in total) while giving a vector output f ∈ R^c, significantly reducing the number of parameters.

Unitary attention networks. Attention provides an efficient mechanism to reduce input channels by selectively utilizing given information: it uses a single-channel input (question vector) to combine the other multi-channel input (image features) as a single-channel intermediate representation (attended feature). Assuming that a multi-channel input Y consists of φ = |{y_i}| column vectors, we want to get a single channel ŷ from Y using the weights {α_i}:

    ŷ = Σ_i α_i y_i    (3)
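A sketch of Eq. (2) in PyTorch, with tanh activations as in MLB (Kim et al., 2017); the class name and sizes are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    """f = P^T (tanh(U^T x) * tanh(V^T y)), i.e. Eq. (2) with activations."""
    def __init__(self, n, m, d, c):
        super().__init__()
        self.U = nn.Linear(n, d, bias=False)
        self.V = nn.Linear(m, d, bias=False)
        self.P = nn.Linear(d, c, bias=False)

    def forward(self, x, y):
        # Hadamard product of the two d-dim embeddings, then pooled to c dims
        return self.P(torch.tanh(self.U(x)) * torch.tanh(self.V(y)))

pool = LowRankBilinearPooling(n=1024, m=2048, d=512, c=256)
f = pool(torch.randn(8, 1024), torch.randn(8, 2048))   # (8, 256)
```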
7 Unitary Attention
This pooling is used to get attention weights with a question embedding (single-channel) vector and visual feature vectors (multi-channel) as the two inputs. We call it unitary attention since the question embedding vector queries the feature vectors unidirectionally.
[Diagram: Q passes through Linear, Tanh, and Replicate; V passes through Conv and Tanh; the combination goes through Linear and Softmax to produce attention A, followed by Linear and Tanh (Kim et al., 2017)]
8 Bilinear Attention Maps
U and V are linear embeddings; p is a learnable projection vector.

    A := softmax(((1 · p^T) ∘ X^T U) V^T Y) ∈ R^{ρ×φ}

[Diagram: shapes of the matrix chain: X^T U is ρ×K, V^T Y is K×φ, and the element-wise multiplication with 1 · p^T yields the ρ×φ attention map A]
9 Bilinear Attention Maps
Exactly the same approach as low-rank bilinear pooling:

    A := softmax(((1 · p^T) ∘ X^T U) V^T Y)

Each entry is a low-rank bilinear form of one word and one object:

    A_{i,j} = p^T ((U^T X_i) ∘ (V^T Y_j))
10 Bilinear Attention Maps
Multiple bilinear attention maps are acquired by different projection vectors p_g as:

    A_g := softmax(((1 · p_g^T) ∘ X^T U) V^T Y)
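A sketch of the multi-glimpse attention maps in PyTorch, using the equivalent per-entry form A_g[i,j] = p_g^T ((U^T X_i) ∘ (V^T Y_j)); shapes follow the slides, but the module itself is an illustrative re-implementation rather than the released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearAttentionMaps(nn.Module):
    def __init__(self, n, m, k, glimpses):
        super().__init__()
        self.U = nn.Linear(n, k, bias=False)
        self.V = nn.Linear(m, k, bias=False)
        self.p = nn.Parameter(torch.randn(glimpses, k))  # one projection vector per glimpse

    def forward(self, X, Y):
        # X: (B, rho, N) question states; Y: (B, phi, M) object features
        Xu, Yv = self.U(X), self.V(Y)                    # (B, rho, K), (B, phi, K)
        # A_g[i, j] = p_g^T ((U^T X_i) * (V^T Y_j)), all glimpses at once
        logits = torch.einsum('brk,gk,bpk->bgrp', Xu, self.p, Yv)
        B, G, r, c = logits.shape
        # softmax over all rho x phi entries of each map (a joint distribution)
        return F.softmax(logits.reshape(B, G, -1), dim=-1).reshape(B, G, r, c)

att = BilinearAttentionMaps(n=1024, m=2048, k=1024, glimpses=2)
A = att(torch.randn(4, 14, 1024), torch.randn(4, 36, 2048))   # (4, 2, 14, 36)
```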
11 Bilinear Attention Networks
Each entry of the multimodal joint feature is computed with the following equation (k is the index over K; broadcasting in PyTorch lets you avoid a for-loop over k):

    f'_k = (X^T U')_k^T A (Y^T V')_k

where (X^T U')_k and (Y^T V')_k denote the k-th columns. Broadcasting automatically repeats tensor operations at the API level, supported by NumPy, TensorFlow, and PyTorch; see the sketch below.
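For instance, the whole joint feature f' can be computed for every k at once with einsum-style broadcasting instead of a Python loop; the sizes and weight names below are illustrative:

```python
import torch

B, N, M, K, rho, phi = 2, 1024, 2048, 1024, 14, 36   # illustrative sizes
X  = torch.randn(B, N, rho)                           # question states
Y  = torch.randn(B, M, phi)                           # object features
Up = torch.randn(N, K)                                # U' (hypothetical weights)
Vp = torch.randn(M, K)                                # V'
A  = torch.softmax(torch.randn(B, rho * phi), -1).view(B, rho, phi)

XU = torch.einsum('bnr,nk->brk', X, Up)               # X^T U': (B, rho, K)
YV = torch.einsum('bmp,mk->bpk', Y, Vp)               # Y^T V': (B, phi, K)
# f'_k = (X^T U')_k^T A (Y^T V')_k for every k, no for-loop over K
f = torch.einsum('brk,brp,bpk->bk', XU, A, YV)        # (B, K)
```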
12 Bilinear Attention Networks
One can show that this is equivalent to a bilinear attention model where each feature is pooled by a low-rank bilinear approximation:

    f'_k = (X^T U')_k^T A (Y^T V')_k
         = Σ_{i=1}^{ρ} Σ_{j=1}^{φ} A_{i,j} (X_i^T U'_k)(V'^T_k Y_j)
         = Σ_{i=1}^{ρ} Σ_{j=1}^{φ} A_{i,j} X_i^T (U'_k V'^T_k) Y_j

where U'_k V'^T_k is the rank-1 weight of a low-rank bilinear pooling.
13 Bilinear Attention Networks
One can show that this is equivalent to a bilinear attention model where each feature is pooled by a low-rank bilinear approximation: low-rank bilinear feature learning inside bilinear attention.

    f'_k = Σ_{i=1}^{ρ} Σ_{j=1}^{φ} A_{i,j} (X_i^T U'_k)(V'^T_k Y_j) = Σ_{i=1}^{ρ} Σ_{j=1}^{φ} A_{i,j} X_i^T (U'_k V'^T_k) Y_j
14 Bilinear Attention Networks
One can show that this is equivalent to a bilinear attention model where each feature is pooled by a low-rank bilinear approximation: low-rank bilinear feature learning inside bilinear attention. Similarly to MLB (Kim et al., ICLR 2017), activation functions can be applied.
15 Bilinear Attention Networks
[Diagram: network architecture: X and Y each pass through Linear+ReLU embeddings; their interaction goes through Linear and Softmax to form the attention map; the attended features pass through further Linear+ReLU layers and a transpose into the final Linear classifier]
16 Time Complexity
Assuming that M ≥ N > K > φ ≥ ρ, the time complexity of bilinear attention networks is O(KMφ), where K denotes the hidden size, since BAN consists of matrix chain multiplication.
Empirically, BAN takes 284s/epoch while the unitary attention control takes 190s/epoch, largely due to the increase of the input size for the softmax function, from φ to φ × ρ.
17 Residual Learning of Attention
Residual learning exploits multiple attention maps (f_0 is X and {α_i} is fixed to ones):

    f_{i+1} = BAN_i(f_i, Y; A_i) · 1^T + f_i

where A_i is the i-th bilinear attention map, BAN_i is the i-th bilinear attention network, f_i is the shortcut, and the outer product with 1 repeats the joint feature {# of tokens} times.
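A sketch of this residual stacking; ban_layers[i] is assumed to return the K-dim joint feature of the i-th BAN (a hypothetical interface, not the released module):

```python
import torch

def residual_attention(X, Y, ban_layers, att_maps):
    """f_{i+1} = BAN_i(f_i, Y; A_i) . 1^T + f_i, with f_0 = X of shape (B, K, rho).
    The outer product with 1 repeats the joint feature over the rho token positions."""
    f = X
    rho = f.size(-1)
    for ban_i, A_i in zip(ban_layers, att_maps):
        joint = ban_i(f, Y, A_i)                         # (B, K) joint feature
        f = joint.unsqueeze(-1).expand(-1, -1, rho) + f  # broadcast over tokens + shortcut
    return f
```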
18 Overview
After getting bilinear attention maps, we can stack multiple BANs.
[Diagram: the question "What is the mustache made of?" goes through a GRU (all hidden states) to X ∈ R^{N×ρ}; the image goes through object detection to Y ∈ R^{M×φ}.
Step 1. Bilinear Attention Maps: U^T X and V^T Y produce att_1 and att_2 via Softmax (K = N).
Step 2. Bilinear Attention Networks: residual learning over Att_1 and Att_2 with the joint feature repeated over ρ, followed by sum pooling and an MLP classifier.]
19 Multiple Attention Maps
Single model on validation score (VQA 2.0): Bottom-Up (Teney et al., 2017) compared against BAN variants with increasing numbers of attention maps.
20 Residual Learning
Validation VQA 2.0 score (+/-):
- BAN-4 (Residual), ±0.09
- BAN-4 (Sum): Σ_i BAN_i(X, Y; A_i)
- BAN-4 (Concat): ‖_i BAN_i(X, Y; A_i)
21 Comparison with Co-attention
Validation VQA 2.0 score (+/-):
- BAN-1 (Bilinear Attention), ±0.14
- Co-Attention
- Unitary Attention
The number of parameters is controlled (all comparison models have 32M parameters).
22 Comparison with Co-attention
[Plots: (a) training and validation curves over epochs for Bi-Att, Co-Att, and Uni-Att; (b) validation score vs. the number of parameters (0M to 60M) for Uni-Att, Co-Att, BAN-1, BAN-4, and BAN-1+MFB; (c) validation score vs. used glimpses for BAN-2, BAN-4, BAN-8, and BAN-12]
The number of parameters is controlled (all comparison models have 32M parameters).
23 Visualization
24 Integration of Counting Module with BAN
The counter (Zhang et al., 2018) takes a multinoulli distribution and detected-box info. Our method applies maxout to go from the bilinear attention distribution to a unitary attention distribution, which makes it easy to adapt to multitask learning.
[Diagram: bilinear attention logits (ρ×φ) pass through Maxout over ρ and Sigmoid into the counting module]

    f_{i+1} = (BAN_i(f_i, Y; A_i) + g_i(c_i)) · 1^T + f_i

where i indexes the inference steps for multi-glimpse attention, g_i is a linear embedding of the i-th output c_i of the counter module, and the shortcut f_i gives residual learning of attention.
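A sketch of the maxout step only (the counter itself is from Zhang et al., 2018); the reduction dimension is an assumption consistent with the diagram, collapsing the ρ question tokens to a φ-dim distribution over boxes:

```python
import torch

def bilinear_to_unitary(att_logits):
    """Reduce bilinear attention logits (B, rho, phi) to a unitary attention
    over the phi detected boxes, the input expected by the counting module."""
    unitary_logits, _ = att_logits.max(dim=1)   # maxout over question tokens
    return torch.sigmoid(unitary_logits)        # (B, phi), sigmoid per the diagram
```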
25 Comparison with State-of-the-arts
Single model on test-dev score (VQA 2.0): Prior; Language-Only; MCB (ResNet; 2016 winner); Bottom-Up (FRCNN; 2017 winner); MFH (ResNet; 2017 runner-up); MFH (FRCNN); BAN (Ours; FRCNN); BAN-Glove (Ours; FRCNN); BAN-Glove-Counter (Ours; FRCNN).
[Chart: "Numbers" sub-category on test-dev, Zhang et al. (2018) vs. Ours; annotations mark the contributions of the image feature, the attention model, and the counting feature]
26 Flickr30k Entities
Visual grounding task: mapping entity phrases to regions in an image.
[Example image with predicted boxes; one prediction has IoU 0.48 < 0.5]
[/EN#40120/people A girl] in [/EN#40122/clothing a yellow tennis suit], [/EN#40125/other green visor] and [/EN#40128/clothing white tennis shoes] holding [/EN#40124/other a tennis racket] in a position where she is going to hit [/EN#40121/other the tennis ball].
27 Flickr30k Entities
Visual grounding task: mapping entity phrases to regions in an image.
[/EN#38656/people A male conductor] wearing [/EN#38657/clothing all black] leading [/EN#38653/people a orchestra] and [/EN#38658/people choir] on [/EN#38659/scene a brown stage] playing and singing [/EN#38664/other a musical number].
28 Flickr30k Entities
Overall comparison: Zhang et al. (2016), SCRC (2016), DSPE (2016), GroundeR (2016), MCB (2016), CCA (2017), Yeh et al. (2017), Hinami & Satoh (2017), BAN (ours).
29 Flickr30k Entities
Per-type comparison over the entity categories people, clothing, bodyparts, animals, vehicles, instruments, scene, and other, for GroundeR (2016), CCA (2017), Yeh et al. (2017), Hinami & Satoh (2017), and BAN (ours).
30 Conclusions
Bilinear attention networks gracefully extend unitary attention networks, with low-rank bilinear pooling inside bilinear attention. Furthermore, residual learning of attention efficiently uses multiple attention maps.
2018 VQA Challenge runners-up (shared 2nd place); 1st single model (70.35 on test-standard).
Challenge Joint Runner-Up Team: SNU-BI. Jin-Hwa Kim (Seoul National University), Jaehyun Jun (Seoul National University), Byoung-Tak Zhang (Seoul National University & Surromind Robotics). Challenge Accuracy: 71.69
31 arXiv:1805.07932. Code & pretrained model in PyTorch: github.com/jnhwkim/ban-vqa
Thank you! Any questions?
We would like to thank Kyoung-Woon On, Bohyung Han, and Hyeonwoo Noh for helpful comments and discussion. This work was supported by the 2017 Google Ph.D. Fellowship in Machine Learning, the Ph.D. Completion Scholarship from the College of Humanities, Seoul National University, and the Korea government (IITP VTT, IITP-R SW.StarLab, RMI, KEIT HRI.MESSI, KEIT RISF). Part of the computing resources used in this study was generously shared by Standigm Inc.