Deep Learning for Natural Language Processing
1 Deep Learning for Natural Language Processing Xipeng Qiu Fudan University 2016/5/29, CCF ADL, Beijing Xipeng Qiu (Fudan University) Deep Learning for Natural Language Processing 1 / 131
2 Table of Contents
1 Basic Concepts: Artificial Intelligence; Machine Learning; Deep Learning
2 Neural Models for Representation Learning: General Architecture; Convolutional Neural Network; Recurrent Neural Network; Recursive Neural Network; Attention Model
3 Applications: Question Answering; Machine Translation; Text Matching
4 Challenges & Open Problems
4 Basic Concepts Artificial Intelligence Begin with AI. Human: Memory, Computation. Computer: Learning, Thinking, Creativity.
5 Basic Concepts Artificial Intelligence Turing Test
6 Basic Concepts Artificial Intelligence Artificial Intelligence
Definition from Wikipedia: Artificial intelligence (AI) is the intelligence exhibited by machines. Colloquially, the term artificial intelligence is likely to be applied when a machine uses cutting-edge techniques to competently perform or mimic cognitive functions that we intuitively associate with human minds, such as learning and problem solving.
Research Topics: Knowledge Representation; Machine Learning; Natural Language Processing; Computer Vision
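7 Basic Concepts Artificial Intelligence Challenge: Semantic Gap. Text for a human: 床前明月光, 疑是地上霜. 举头望明月, 低头思故乡. (Li Bai, "Quiet Night Thought": Before my bed, the bright moonlight; I take it for frost on the ground. I raise my head to gaze at the bright moon; I lower it and think of home.)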
8 Basic Concepts Artificial Intelligence Challenge: Semantic Gap. Text for a computer.
9 Basic Concepts Artificial Intelligence Challenge: Semantic Gap. Figure: Guernica (Picasso)
11 Basic Concepts Machine Learning Machine Learning. Input: x; Model; Output: y. Training Data: (x, y); Learning Algorithm.
12 Basic Concepts Machine Learning Basic Concepts of Machine Learning
Input Data: $(x_i, y_i)$, $1 \le i \le m$
Model:
Linear Model: $y = f(x) = w^T x + b$
Generalized Linear Model: $y = f(x) = w^T \phi(x) + b$
Non-linear Model: Neural Network
Criterion:
Loss Function: $L(y, f(x))$ (Optimization)
Empirical Risk: $Q(\theta) = \frac{1}{m} \sum_{i=1}^{m} L(y_i, f(x_i, \theta))$ (Minimization)
Regularization: $\|\theta\|^2$
Objective Function: $Q(\theta) + \lambda \|\theta\|^2$
13 Basic Concepts Machine Learning Loss Function
Given a test sample $(x, y)$, the predicted label is $f(x, \theta)$.
0-1 Loss:
$L(y, f(x, \theta)) = 0$ if $y = f(x, \theta)$, and $1$ if $y \ne f(x, \theta)$ (1)
$= I(y \ne f(x, \theta))$, (2)
where $I$ is the indicator function.
Quadratic Loss:
$L(y, \hat{y}) = (y - f(x, \theta))^2$ (3)
14 Basic Concepts Machine Learning Loss Function
Cross-entropy Loss: we regard $f_i(x, \theta)$ as the conditional probability of class $i$:
$f_i(x, \theta) \in [0, 1]$, $\sum_{i=1}^{C} f_i(x, \theta) = 1$ (4)
$f_y(x, \theta)$ is the likelihood function of $y$. The negative log-likelihood function is
$L(y, f(x, \theta)) = -\log f_y(x, \theta)$. (5)
15 Basic Concepts Machine Learning Loss Function
We use a one-hot vector $y$ to represent class $c$, in which $y_c = 1$ and all other elements are 0. The negative log-likelihood function can then be rewritten as
$L(y, f(x, \theta)) = -\sum_{i=1}^{C} y_i \log f_i(x, \theta)$. (6)
$y_i$ is the distribution of gold labels. Thus, Eq. (6) is the cross-entropy loss function.
16 Basic Concepts Machine Learning Loss Function
Hinge Loss: for binary classification, $y$ and $f(x, \theta)$ are in $\{-1, +1\}$. The hinge loss is
$L(y, f(x, \theta)) = \max(0, 1 - y f(x, \theta))$ (7)
$= |1 - y f(x, \theta)|_+$. (8)
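The four losses in Eqs. (1) to (8) are easy to compare side by side. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not from the slides, and `hinge_loss` accepts a real-valued score (an assumption that is slightly more general than the $\{-1, +1\}$ outputs stated above).

```python
import math

def zero_one_loss(y, y_pred):
    """0-1 loss, Eq. (2): 0 if the prediction equals the label, else 1."""
    return 0 if y == y_pred else 1

def quadratic_loss(y, y_pred):
    """Quadratic loss, Eq. (3)."""
    return (y - y_pred) ** 2

def cross_entropy_loss(y_onehot, probs):
    """Cross-entropy, Eq. (6): -sum_i y_i log f_i; terms with y_i = 0 are skipped."""
    return -sum(y * math.log(p) for y, p in zip(y_onehot, probs) if y > 0)

def hinge_loss(y, score):
    """Hinge loss, Eq. (7), for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

# Toy values (made up for illustration).
ce = cross_entropy_loss([0, 1, 0], [0.2, 0.7, 0.1])
print(zero_one_loss(1, 0), quadratic_loss(1.0, 0.5), ce, hinge_loss(+1, 0.3))
```

With a one-hot target, the cross-entropy reduces to the negative log-probability of the gold class, which is Eq. (5).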
17 Basic Concepts Machine Learning Loss Function
For binary classification, $y$ and $f(x, \theta)$ are in $\{-1, +1\}$; let $z = y f(x, \theta)$. Figure source: yandongl/loss.html
18 Basic Concepts Machine Learning Parameter Learning
In ML, our objective is to learn the parameter $\theta$ that minimizes the loss function:
$\theta^* = \arg\min_{\theta} R(\theta_t)$ (9)
$= \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L(y^{(i)}, f(x^{(i)}, \theta))$. (10)
Gradient Descent:
$a_{t+1} = a_t - \lambda \frac{\partial R(\theta)}{\partial \theta_t}$ (11)
$= a_t - \lambda \sum_{i=1}^{N} \frac{\partial R(\theta_t; x^{(i)}, y^{(i)})}{\partial \theta}$, (12)
where $a_t$ denotes the parameter value $\theta_t$ at iteration $t$. $\lambda$ is also called the learning rate in ML.
19 Basic Concepts Machine Learning Stochastic Gradient Descent (SGD)
$a_{t+1} = a_t - \lambda \frac{\partial R(\theta_t; x^{(t)}, y^{(t)})}{\partial \theta}$, (13)
where $a_t$ is the parameter value at step $t$ and $(x^{(t)}, y^{(t)})$ is the single sample used at that step.
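To make the per-sample update of Eq. (13) concrete, here is a minimal sketch that fits a one-dimensional linear model with the quadratic loss. The toy data, learning rate, epoch count, and the name `sgd_linear` are assumptions chosen for illustration; the reshuffle on every pass anticipates the "shuffle" trick.

```python
import random

def sgd_linear(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b with one parameter update per sample (SGD)."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)               # visit the samples in a new order each pass
        for x, y in data:
            err = (w * x + b) - y       # prediction error for this sample
            w -= lr * 2.0 * err * x     # gradient of (y - f(x))^2 w.r.t. w
            b -= lr * 2.0 * err         # gradient of (y - f(x))^2 w.r.t. b
    return w, b

# Noise-free toy data generated from y = 2x + 1 (hypothetical).
data = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w, b = sgd_linear(data)
print(w, b)
```

Because the data are noise-free, the per-sample updates settle on the generating parameters; with noisy data a decaying learning rate is the usual remedy.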
20 Basic Concepts Machine Learning Two Tricks of SGD: Early-Stop; Shuffle
21 Basic Concepts Machine Learning Linear Classification
For binary classification, $y \in \{0, 1\}$, the classifier is
$\hat{y} = 1$ if $w^T x > 0$, and $0$ if $w^T x \le 0$; that is, $\hat{y} = I(w^T x > 0)$. (14)
Figure: Binary Linear Classification (axes $x_1$, $x_2$; decision boundary $w^T x = 0$ with normal vector $w$)
22 Basic Concepts Machine Learning Logistic Regression
How to learn the parameter $w$: Perceptron, Logistic Regression, etc.
The posterior probability of $y = 1$ is
$P(y = 1 \mid x) = \sigma(w^T x) = \frac{1}{1 + \exp(-w^T x)}$, (15)
where $\sigma(\cdot)$ is the logistic function. The posterior probability of $y = 0$ is $P(y = 0 \mid x) = 1 - P(y = 1 \mid x)$.
23 Basic Concepts Machine Learning Logistic Regression
Given $N$ samples $(x^{(i)}, y^{(i)})$, $1 \le i \le N$, we use the cross-entropy loss function
$J(w) = -\sum_{i=1}^{N} \left( y^{(i)} \log \sigma(w^T x^{(i)}) + (1 - y^{(i)}) \log\left(1 - \sigma(w^T x^{(i)})\right) \right)$. (16)
The gradient of $J(w)$ is
$\frac{\partial J(w)}{\partial w} = \sum_{i=1}^{N} x^{(i)} \left( \sigma(w^T x^{(i)}) - y^{(i)} \right)$. (17)
Initialize $w_0 = 0$, and update by gradient descent on $J$:
$w_{t+1} = w_t - \lambda \frac{\partial J(w)}{\partial w}$. (18)
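The training loop implied by Eqs. (16) to (18) can be sketched as follows, taking the update as a descent step on the cross-entropy loss. The toy data set, the learning rate, the step count, and the constant last feature that stands in for a bias term are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_logistic(samples, lr=0.1, steps=500):
    """Batch gradient descent on the cross-entropy loss of logistic regression."""
    dim = len(samples[0][0])
    w = [0.0] * dim                                  # w_0 = 0
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in samples:
            p = sigmoid(dot(w, x))
            for j in range(dim):
                grad[j] += x[j] * (p - y)            # Eq. (17)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]  # descent step on J
    return w

# Hypothetical 1-D data; the last feature is fixed to 1 and acts as a bias.
samples = [([-2.0, 1.0], 0), ([-1.0, 1.0], 0), ([1.0, 1.0], 1), ([2.0, 1.0], 1)]
w = train_logistic(samples)
preds = [1 if dot(w, x) > 0 else 0 for x, _ in samples]
print(w, preds)
```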
24 Basic Concepts Machine Learning Multiclass Classification
Generally, $y \in \{1, \dots, C\}$, and we define $C$ discriminant functions
$f_c(x) = w_c^T x$, $c = 1, \dots, C$, (19)
where $w_c$ is the weight vector of class $c$. Thus,
$\hat{y} = \arg\max_{c=1}^{C} w_c^T x$. (20)
25 Basic Concepts Machine Learning Softmax Regression
Softmax regression is a generalization of logistic regression to multi-class classification problems. With softmax, the posterior probability of $y = c$ is
$P(y = c \mid x) = \mathrm{softmax}(w_c^T x) = \frac{\exp(w_c^T x)}{\sum_{i=1}^{C} \exp(w_i^T x)}$. (21)
We represent class $c$ by the one-hot vector
$y = [I(1 = c), I(2 = c), \dots, I(C = c)]^T$, (22)
where $I(\cdot)$ is the indicator function.
26 Basic Concepts Machine Learning Softmax Regression
Redefining Eq. (21) in vector form,
$\hat{y} = \mathrm{softmax}(W^T x) = \frac{\exp(W^T x)}{1^T \exp(W^T x)} = \frac{\exp(z)}{1^T \exp(z)}$, (23)
where $W = [w_1, \dots, w_C]$, $\hat{y}$ is the predicted posterior probability, and $z = W^T x$ is the input of the softmax function.
27 Basic Concepts Machine Learning Softmax Regression
Given a training set $(x^{(i)}, y^{(i)})$, $1 \le i \le N$, the cross-entropy loss is
$J(W) = -\sum_{i=1}^{N} \sum_{c=1}^{C} y_c^{(i)} \log \hat{y}_c^{(i)} = -\sum_{i=1}^{N} (y^{(i)})^T \log \hat{y}^{(i)}$.
The gradient of $J(W)$ is
$\frac{\partial J(W)}{\partial w_c} = -\sum_{i=1}^{N} x^{(i)} \left( y^{(i)} - \hat{y}^{(i)} \right)_c$. (24)
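A small sketch of the softmax function of Eq. (23) and the per-class gradient of Eq. (24) follows. It stores `W` as a list of class weight vectors $w_c$ and subtracts the maximum score before exponentiating, a common stability measure that is not on the slides; all names and toy values are illustrative.

```python
import math

def softmax(z):
    m = max(z)                           # shift by the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def grad_wc(samples, W, c):
    """Gradient of J(W) w.r.t. w_c, Eq. (24): -sum_i x_i * (y_i - yhat_i)_c."""
    dim = len(samples[0][0])
    g = [0.0] * dim
    for x, y_onehot in samples:
        z = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]  # z = W^T x
        yhat = softmax(z)
        for j in range(dim):
            g[j] -= x[j] * (y_onehot[c] - yhat[c])
    return g

probs = softmax([1.0, 2.0, 3.0])
# One hypothetical sample of class 0, with all weights at zero.
samples = [([1.0, 2.0], [1, 0, 0])]
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
g0 = grad_wc(samples, W, 0)
g1 = grad_wc(samples, W, 1)
print(probs, g0, g1)
```

With zero weights every class gets probability 1/3, so the gradient for the gold class points opposite to $x$ and the gradient for each other class points along $x$.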
28 Basic Concepts Machine Learning The ideal pipeline of NLP: Word Segmentation, POS Tagging, Syntactic Parsing, Semantic Parsing, Knowledge. Applications: Question Answering, Machine Translation, Sentiment Analysis.
29 Basic Concepts Machine Learning But in practice: End-to-End. Example inputs: "I like this movie." and "I dislike this movie." Components: Model + Model Selection; Decoding/Inference; Feature Extraction; Parameter Learning.
30 Basic Concepts Machine Learning Feature Extraction: Bag-of-Words
31 Basic Concepts Machine Learning Text Classification
33 Basic Concepts Deep Learning Artificial Neural Network
Artificial neural networks (ANNs) are a family of models inspired by biological neural networks (the central nervous systems of animals, in particular the brain). They are generally presented as systems of interconnected neurons that exchange messages with each other.
34 Basic Concepts Deep Learning Artificial Neuron
Input: $x = (x_1, x_2, \dots, x_n)$; State: $z$; Output: $a$
$z = w^T x + b$ (25)
$a = f(z)$ (26)
Figure: a neuron with inputs $x_1, \dots, x_n$ weighted by $w_1, \dots, w_n$, a bias $b$ on a constant input 1, and an activation $\sigma$ producing a 0/1 output.
35 Basic Concepts Deep Learning Activation Function
Sigmoid functions:
$\sigma(x) = \frac{1}{1 + e^{-x}}$ (27)
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ (28)
$\tanh(x) = 2\sigma(2x) - 1$
36 Basic Concepts Deep Learning Activation Function
Rectifier function [2], also called rectified linear unit (ReLU) [3]:
$\mathrm{rectifier}(x) = \max(0, x)$ (29)
Softplus function [4]:
$\mathrm{softplus}(x) = \log(1 + e^{x})$ (30)
[2] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In: International Conference on Artificial Intelligence and Statistics. 2011.
[3] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). 2010.
[4] C. Dugas et al. Incorporating second-order functional knowledge for better option pricing. In: Advances in Neural Information Processing Systems (2001).
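The logistic, tanh, rectifier, and softplus functions of Eqs. (27) to (30) can be written directly from their definitions. This is a minimal sketch with illustrative function names; it also checks the identity $\tanh(x) = 2\sigma(2x) - 1$ numerically at one arbitrary point.

```python
import math

def logistic(x):
    """Eq. (27)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Eq. (28), written out from the definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def rectifier(x):
    """Eq. (29), also known as ReLU."""
    return max(0.0, x)

def softplus(x):
    """Eq. (30), a smooth approximation of the rectifier."""
    return math.log(1.0 + math.exp(x))

x = 0.7  # arbitrary test point
identity_gap = abs(tanh(x) - (2.0 * logistic(2.0 * x) - 1.0))
print(logistic(x), tanh(x), rectifier(x), softplus(x), identity_gap)
```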
37 Basic Concepts Deep Learning Activation Function. Figure: (a) logistic, (b) tanh, (c) rectifier, (d) softplus.
38 Basic Concepts Deep Learning Types of Artificial Neural Network: feedforward neural network, also called Multilayer Perceptron (MLP); recurrent neural network.
39 Basic Concepts Deep Learning Basic Concepts of Deep Learning
Model: artificial neural networks that consist of multiple hidden non-linear layers.
Function: the non-linear function $y = \sigma\left(\sum_i w_i x_i + b\right)$.
40 Basic Concepts Deep Learning Feedforward Neural Network
In a feedforward neural network, information moves in only one direction, forward: from the input nodes, data goes through the hidden nodes (if any) and on to the output nodes. There are no cycles or loops in the network.
Figure: a feedforward network with an input layer ($x_1, \dots, x_4$), two hidden layers, and an output layer ($y$).
41 Basic Concepts Deep Learning Feedforward Computing
Definitions:
$L$: number of layers;
$n^{l}$: number of neurons in the $l$-th layer;
$f_l(\cdot)$: activation function of the $l$-th layer;
$W^{(l)} \in \mathbb{R}^{n^{l} \times n^{l-1}}$: weight matrix between the $(l-1)$-th layer and the $l$-th layer;
$b^{(l)} \in \mathbb{R}^{n^{l}}$: bias vector between the $(l-1)$-th layer and the $l$-th layer;
$z^{(l)} \in \mathbb{R}^{n^{l}}$: state vector of the neurons in the $l$-th layer;
$a^{(l)} \in \mathbb{R}^{n^{l}}$: activation vector of the neurons in the $l$-th layer.
$z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$ (31)
$a^{(l)} = f_l(z^{(l)})$ (32)
42 Basic Concepts Deep Learning Feedforward Computing
Thus, substituting $a^{(l-1)} = f_{l-1}(z^{(l-1)})$,
$z^{(l)} = W^{(l)} f_{l-1}(z^{(l-1)}) + b^{(l)}$ (33)
$x = a^{(0)} \to z^{(1)} \to a^{(1)} \to z^{(2)} \to \cdots \to a^{(L-1)} \to z^{(L)} \to a^{(L)} = y$ (34)
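The chain of Eq. (34) is a loop that alternates the affine map of Eq. (31) with the activation of Eq. (32). Below is a minimal pure-Python sketch; the 2-3-1 network, its weights, and the choice of a sigmoid hidden layer with an identity output layer are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def identity(z):
    return z

def matvec(W, a):
    """Matrix-vector product with W given as a list of rows."""
    return [sum(wij * aj for wij, aj in zip(row, a)) for row in W]

def feedforward(x, weights, biases, activations):
    a = x                                                   # a^(0) = x
    for W, b, f in zip(weights, biases, activations):
        z = [zi + bi for zi, bi in zip(matvec(W, a), b)]    # Eq. (31)
        a = [f(zi) for zi in z]                             # Eq. (32)
    return a                                                # a^(L) = y

# Hypothetical 2-3-1 network.
weights = [[[0.5, -0.5], [1.0, 1.0], [0.0, 2.0]], [[1.0, -1.0, 0.5]]]
biases = [[0.0, 0.0, -1.0], [0.1]]
y = feedforward([1.0, 1.0], weights, biases, [sigmoid, identity])
print(y)
```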
43 Basic Concepts Deep Learning Combining feedforward network and Machine Learning
Given training samples $(x^{(i)}, y^{(i)})$, $1 \le i \le N$, and a feedforward network $f(x \mid W, b)$, the objective function is
$J(W, b) = \sum_{i=1}^{N} L(y^{(i)}, f(x^{(i)} \mid W, b)) + \frac{1}{2} \lambda \|W\|_F^2$ (35)
$= \sum_{i=1}^{N} J(W, b; x^{(i)}, y^{(i)}) + \frac{1}{2} \lambda \|W\|_F^2$, (36)
where $\|W\|_F^2 = \sum_{l=1}^{L} \sum_{i=1}^{n^{l}} \sum_{j=1}^{n^{l-1}} \left(W^{(l)}_{ij}\right)^2$.
44 Basic Concepts Deep Learning Learning by GD
$W^{(l)} = W^{(l)} - \alpha \frac{\partial J(W, b)}{\partial W^{(l)}}$ (37)
$= W^{(l)} - \alpha \left( \sum_{i=1}^{N} \frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial W^{(l)}} + \lambda W^{(l)} \right)$, (38)
$b^{(l)} = b^{(l)} - \alpha \frac{\partial J(W, b)}{\partial b^{(l)}}$ (39)
$= b^{(l)} - \alpha \sum_{i=1}^{N} \frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial b^{(l)}}$. (40)
45 Basic Concepts Deep Learning Backpropagation
How to compute $\frac{\partial J(W, b; x, y)}{\partial W^{(l)}}$?
$\frac{\partial J(W, b; x, y)}{\partial W^{(l)}_{ij}} = \left( \frac{\partial J(W, b; x, y)}{\partial z^{(l)}} \right)^{T} \frac{\partial z^{(l)}}{\partial W^{(l)}_{ij}}$. (42)
We define $\delta^{(l)} = \frac{\partial J(W, b; x, y)}{\partial z^{(l)}} \in \mathbb{R}^{n^{l}}$.
46 Basic Concepts Deep Learning Backpropagation
Because $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$,
$\frac{\partial z^{(l)}}{\partial W^{(l)}_{ij}} = \frac{\partial (W^{(l)} a^{(l-1)} + b^{(l)})}{\partial W^{(l)}_{ij}} = [0, \dots, a^{(l-1)}_j, \dots, 0]^{T}$, with $a^{(l-1)}_j$ in the $i$-th row. (43)
Therefore,
$\frac{\partial J(W, b; x, y)}{\partial W^{(l)}_{ij}} = \delta^{(l)}_i a^{(l-1)}_j$. (44)
47 Basic Concepts Deep Learning Backpropagation
$\frac{\partial J(W, b; x, y)}{\partial W^{(l)}} = \delta^{(l)} (a^{(l-1)})^{T}$. (46)
In the same way,
$\frac{\partial J(W, b; x, y)}{\partial b^{(l)}} = \delta^{(l)}$. (47)
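48 Basic Concepts Deep Learning How to compute $\delta^{(l)}$?
$\delta^{(l)} \triangleq \frac{\partial J(W, b; x, y)}{\partial z^{(l)}}$ (48)
$= \frac{\partial a^{(l)}}{\partial z^{(l)}} \cdot \frac{\partial z^{(l+1)}}{\partial a^{(l)}} \cdot \frac{\partial J(W, b; x, y)}{\partial z^{(l+1)}}$ (49)
$= \mathrm{diag}(f_l'(z^{(l)})) \cdot (W^{(l+1)})^{T} \cdot \delta^{(l+1)}$ (50)
$= f_l'(z^{(l)}) \odot \left( (W^{(l+1)})^{T} \delta^{(l+1)} \right)$, (51)
where $\odot$ is the element-wise product.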
49 Basic Concepts Deep Learning Backpropagation Algorithm
Input: training set $(x^{(i)}, y^{(i)})$, $i = 1, \dots, N$; number of iterations $T$
Output: $W$, $b$
Initialize $W$, $b$;
for $t = 1 \dots T$ do
    for $i = 1 \dots N$ do
        (1) Feedforward computing;
        (2) Compute $\delta^{(l)}$ by Eq. (51);
        (3) Compute the gradients of the parameters by Eqs. (46) and (47):
            $\frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial W^{(l)}} = \delta^{(l)} (a^{(l-1)})^{T}$; $\frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial b^{(l)}} = \delta^{(l)}$;
    end
    (4) Update the parameters:
        $W^{(l)} = W^{(l)} - \alpha \left( \sum_{i=1}^{N} \frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial W^{(l)}} + \lambda W^{(l)} \right)$;
        $b^{(l)} = b^{(l)} - \alpha \sum_{i=1}^{N} \frac{\partial J(W, b; x^{(i)}, y^{(i)})}{\partial b^{(l)}}$;
end
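Steps (1) to (3) of the algorithm can be sketched for a single sample as below. The slides leave the loss and activations general; this sketch assumes sigmoid activations in every layer and the loss $J = \frac{1}{2}\|a^{(L)} - y\|^2$, so that $f'(z) = a(1 - a)$. The network sizes and weights are hypothetical, and a central finite difference is used as a sanity check on one weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, Ws, bs):
    """Return the activations a^(0), ..., a^(L) (sigmoid in every layer)."""
    acts = [x]
    for W, b in zip(Ws, bs):
        z = [sum(w * a for w, a in zip(row, acts[-1])) + bi
             for row, bi in zip(W, b)]
        acts.append([sigmoid(v) for v in z])
    return acts

def loss(x, y, Ws, bs):
    out = forward(x, Ws, bs)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

def backprop(x, y, Ws, bs):
    acts = forward(x, Ws, bs)
    # Output layer: delta^(L) = (a^(L) - y) * f'(z^(L)), with f' = a(1 - a).
    delta = [(a - t) * a * (1.0 - a) for a, t in zip(acts[-1], y)]
    grads_W, grads_b = [], []
    for l in range(len(Ws) - 1, -1, -1):
        a_prev = acts[l]
        grads_W.insert(0, [[d * ap for ap in a_prev] for d in delta])  # Eq. (46)
        grads_b.insert(0, list(delta))                                 # Eq. (47)
        if l > 0:
            # Eq. (51): delta^(l) = f'(z^(l)) * ((W^(l+1))^T delta^(l+1))
            back = [sum(Ws[l][i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(a_prev))]
            delta = [a * (1.0 - a) * bk for a, bk in zip(a_prev, back)]
    return grads_W, grads_b

# Hypothetical 2-2-1 network and one training sample.
Ws = [[[0.1, -0.2], [0.4, 0.3]], [[0.5, -0.6]]]
bs = [[0.0, 0.1], [-0.1]]
x, y = [1.0, 0.5], [1.0]
gW, gb = backprop(x, y, Ws, bs)

# Finite-difference check of dJ/dW^(1)_{12}.
eps = 1e-6
Wp = [[row[:] for row in W] for W in Ws]
Wm = [[row[:] for row in W] for W in Ws]
Wp[0][0][1] += eps
Wm[0][0][1] -= eps
numeric = (loss(x, y, Wp, bs) - loss(x, y, Wm, bs)) / (2.0 * eps)
print(gW[0][0][1], numeric)
```

Agreement between the analytic and numeric values is a cheap way to catch indexing mistakes in a hand-written backward pass.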
50 Basic Concepts Deep Learning Gradient Vanishing
$\delta^{(l)} = f_l'(z^{(l)}) \odot \left( (W^{(l+1)})^{T} \delta^{(l+1)} \right)$, (52)
When we use sigmoid-type functions, such as the logistic $\sigma(x)$ and tanh,
$\sigma'(x) = \sigma(x)(1 - \sigma(x)) \in [0, 0.25]$ (53)
$\tanh'(x) = 1 - (\tanh(x))^2 \in [0, 1]$. (54)
Figure: Gradient, panels (e) logistic and (f) tanh.
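The bound in Eq. (53) explains why the error term shrinks with depth: each layer multiplies it by a derivative of at most 0.25. The sketch below scans the logistic derivative on a grid and computes the resulting bound on the derivative factors alone for a stack of 10 layers; the depth and the grid are arbitrary choices, and the weight matrices in Eq. (52) are deliberately ignored.

```python
import math

def logistic_grad(x):
    """sigma'(x) = sigma(x) * (1 - sigma(x)), Eq. (53)."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# The largest derivative on a grid over [-10, 10] is attained at x = 0.
max_grad = max(logistic_grad(i / 100.0) for i in range(-1000, 1001))

# Upper bound on the product of the derivative factors across 10 layers.
depth = 10
bound = max_grad ** depth
print(max_grad, bound)
```

Under these assumptions the derivative factors alone can shrink the error signal by about six orders of magnitude over 10 layers.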
51 Basic Concepts Deep Learning
Difficulties: Huge Parameters; Non-convex Optimization; Gradient Vanishing; Poor Interpretability
Requirements: High Computation; Big Data; Good Algorithms
52 Basic Concepts Deep Learning Tricks and Skills [6] [7]
Use ReLU non-linearities
Use cross-entropy loss for classification
SGD + mini-batch
Shuffle the training samples (very important)
Early-Stop
Normalize the input variables (zero mean, unit variance)
Schedule to decrease the learning rate
Use a bit of L1 or L2 regularization on the weights (or a combination)
Use dropout for regularization
Data Augmentation
[6] G. B. Orr and K.-R. Müller. Neural Networks: Tricks of the Trade. Springer.
[7] Geoff Hinton, Yoshua Bengio & Yann LeCun. Deep Learning. NIPS 2015 Tutorial.
55 Neural Models for Representation Learning General Architecture General Neural Architectures for NLP
How to use neural networks for NLP tasks? Distributed Representation
56 Neural Models for Representation Learning General Architecture General Neural Architectures for NLP [8]
1 represent the words/features with dense vectors (embeddings) by a lookup table;
2 concatenate the vectors;
3 multi-layer neural networks (classification, matching, ranking).
[8] R. Collobert et al. Natural language processing (almost) from scratch. In: The Journal of Machine Learning Research (2011).
57 Neural Models for Representation Learning General Architecture Difference from the traditional methods
Features: traditional methods use discrete, high-dimension vectors (one-hot representation); neural methods use dense, low-dimension vectors (distributed representation).
Classifier: traditional methods are linear; neural methods are non-linear.
58 Neural Models for Representation Learning General Architecture
The key question is how to encode a word, phrase, sentence, paragraph, or even a whole document into a distributed representation: Representation Learning.
59 Neural Models for Representation Learning General Architecture Representation Learning for NLP
Word Level: NNLM; C&W; CBOW & Skip-Gram
Sentence Level: NBOW; Sequence Models: Recurrent NN (LSTM/GRU), Paragraph Vector; Topological Models: Recursive NN; Convolutional Models: Convolutional NN
Document Level: NBOW; Hierarchical Models: two-level CNN; Sequence Models: LSTM, Paragraph Vector
60 Neural Models for Representation Learning General Architecture Let's start with Language Model
A statistical language model is a probability distribution over sequences of words. A sequence $W$ consists of $L$ words:
$P(W) = P(w_{1:L}) = P(w_1, \dots, w_L) = P(w_1) P(w_2 \mid w_1) P(w_3 \mid w_1 w_2) \cdots P(w_L \mid w_{1:(L-1)}) = \prod_{i=1}^{L} P(w_i \mid w_{1:(i-1)})$. (55)
n-gram model:
$P(W) = \prod_{i=1}^{L} P(w_i \mid w_{(i-n+1):(i-1)})$. (56)
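For $n = 2$, Eq. (56) reduces to a product of bigram probabilities, which can be estimated from counts. The sketch below uses plain maximum-likelihood estimates without smoothing; the three-sentence corpus and the `<s>` start token are assumptions made for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams; <s> marks the start of a sentence."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split()
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return contexts, bigrams

def sentence_prob(sentence, contexts, bigrams):
    """P(W) = prod_i P(w_i | w_{i-1}), Eq. (56) with n = 2, MLE estimates."""
    tokens = ["<s>"] + sentence.split()
    p = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if contexts[prev] == 0:
            return 0.0                       # unseen context
        p *= bigrams[(prev, cur)] / contexts[prev]
    return p

corpus = ["i like this movie", "i dislike this movie", "i like this book"]
contexts, bigrams = train_bigram(corpus)
p_seen = sentence_prob("i like this movie", contexts, bigrams)
p_unseen = sentence_prob("i like that movie", contexts, bigrams)
print(p_seen, p_unseen)
```

The second sentence gets probability zero because the bigram "like that" never occurs in the corpus, a small instance of the data sparsity that n-gram models suffer from.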
61 Neural Models for Representation Learning General Architecture Neural Probabilistic Language Model [9]
turn unsupervised learning into supervised learning;
avoid the data sparsity of the n-gram model;
project each word into a low-dimensional space.
[9] Y. Bengio, R. Ducharme, and P. Vincent. A neural probabilistic language model. In: Journal of Machine Learning Research (2003).
62 Neural Models for Representation Learning General Architecture Problem of Very Large Vocabulary
Softmax output:
$P^{h}_{\theta}(w) = \frac{\exp(s_\theta(w, h))}{\sum_{w'} \exp(s_\theta(w', h))}$, (57)
Unfortunately, both evaluating $P^{h}_{\theta}$ and computing the corresponding likelihood gradient require normalizing over the entire vocabulary.
Hierarchical Softmax: a tree-structured vocabulary [10]
Negative Sampling [11], noise-contrastive estimation (NCE) [12]
[10] A. Mnih and G. Hinton. A scalable hierarchical distributed language model. In: Advances in Neural Information Processing Systems (2009); F. Morin and Y. Bengio. Hierarchical probabilistic neural network language model. In: AISTATS. Vol. 5. Citeseer. 2005.
[11] T. Mikolov et al. Efficient estimation of word representations in vector space. In: arXiv preprint (2013).
[12] A. Mnih and K. Kavukcuoglu. Learning word embeddings efficiently with noise-contrastive estimation. In: Advances in Neural Information Processing Systems. 2013.
63 Neural Models for Representation Learning General Architecture Linguistic Regularities of Word Embeddings
T. Mikolov, W.-t. Yih, and G. Zweig. Linguistic regularities in continuous space word representations. In: HLT-NAACL. 2013.
64 Neural Models for Representation Learning General Architecture Skip-Gram Model
Mikolov et al., Efficient estimation of word representations in vector space.
65 Neural Models for Representation Learning General Architecture Skip-Gram Model
Given a pair of words $(w, c)$, the probability that the word $c$ is observed in the context of the target word $w$ is given by
$\Pr(D = 1 \mid w, c) = \frac{1}{1 + \exp(-\mathbf{w}^{T} \mathbf{c})}$,
where $\mathbf{w}$ and $\mathbf{c}$ are the embedding vectors of $w$ and $c$ respectively. The probability of not observing word $c$ in the context of $w$ is given by
$\Pr(D = 0 \mid w, c) = \frac{\exp(-\mathbf{w}^{T} \mathbf{c})}{1 + \exp(-\mathbf{w}^{T} \mathbf{c})}$.
66 Neural Models for Representation Learning General Architecture Skip-Gram Model with Negative Sampling
Given a training set $D$, the word embeddings are learned by maximizing the following objective function:
$J(\theta) = \sum_{w, c \in D} \Pr(D = 1 \mid w, c) + \sum_{w, c \in D'} \Pr(D = 0 \mid w, c)$,
where the set $D'$ consists of randomly sampled negative examples, all assumed to be incorrect.
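The objective above can be evaluated directly once embeddings are given. The sketch below follows the slide's form, a sum of probabilities over observed and sampled pairs, and uses $\Pr(D = 0 \mid w, c) = 1 - \Pr(D = 1 \mid w, c)$; the two-dimensional embeddings and the word pairs are made-up values, and no training step is shown.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def prob_observed(w_vec, c_vec):
    """Pr(D = 1 | w, c) = 1 / (1 + exp(-w . c))."""
    return 1.0 / (1.0 + math.exp(-dot(w_vec, c_vec)))

def objective(word_vecs, ctx_vecs, positives, negatives):
    """Sum of Pr(D=1) over observed pairs plus Pr(D=0) over negative samples."""
    j = sum(prob_observed(word_vecs[w], ctx_vecs[c]) for w, c in positives)
    j += sum(1.0 - prob_observed(word_vecs[w], ctx_vecs[c]) for w, c in negatives)
    return j

# Hypothetical embeddings: "like" points the same way as "movie", "frost" the opposite way.
word_vecs = {"movie": [1.0, 0.0]}
ctx_vecs = {"like": [2.0, 0.0], "frost": [-2.0, 0.0]}
good = objective(word_vecs, ctx_vecs, [("movie", "like")], [("movie", "frost")])
bad = objective(word_vecs, ctx_vecs, [("movie", "frost")], [("movie", "like")])
print(good, bad)
```

The objective is larger when observed pairs have aligned embeddings and sampled negatives have opposed ones, which is the direction that maximizing $J(\theta)$ pushes the vectors.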
68 Neural Models for Representation Learning Convolutional Neural Network Convolutional Neural Network
A convolutional neural network (CNN, or ConvNet) is a type of feed-forward artificial neural network. Distinguishing features [15]:
1 Local connectivity
2 Shared weights
3 Subsampling
[15] Y. LeCun et al. Gradient-based learning applied to document recognition. In: Proceedings of the IEEE 11 (1998).
69 Neural Models for Representation Learning Convolutional Neural Network Convolution
Convolution is an integral that expresses the amount of overlap of one function as it is shifted over another function. Given an input sequence $x_t$, $t = 1, \dots, n$, and a filter $f_t$, $t = 1, \dots, m$, the convolution is
$y_t = \sum_{k=1}^{m} f_k \cdot x_{t-k+1}$. (58)
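Eq. (58) can be sketched in a few lines, assuming the sum runs over the $m$ filter taps and keeping only positions where every index $t - k + 1$ falls inside the input (often called a "valid" convolution, an assumption not spelled out on the slide). The function name and the example filter are illustrative.

```python
def conv1d(x, f):
    """y_t = sum_k f_k * x_{t-k+1}, evaluated only where all indices are valid."""
    n, m = len(x), len(f)
    # With 0-based indexing, the term f_k * x_{t-k+1} becomes f[k] * x[t - k].
    return [sum(f[k] * x[t - k] for k in range(m)) for t in range(m - 1, n)]

x = [1, 2, 3, 4, 5]
diff_filter = [1, 0, -1]        # computes x_t - x_{t-2}
y = conv1d(x, diff_filter)
print(y)
```

The output has $n - m + 1$ entries, which is the layer-size relation $n^{(l+1)} = n^{(l)} - m + 1$.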
70 Neural Models for Representation Learning Convolutional Neural Network One-dimensional convolution (figure)
71 Neural Models for Representation Learning Convolutional Neural Network Two-dimensional convolution
Given an image $x_{ij}$, $1 \le i \le M$, $1 \le j \le N$, and a filter $f_{ij}$, $1 \le i \le m$, $1 \le j \le n$, the convolution is
$y_{ij} = \sum_{u=1}^{m} \sum_{v=1}^{n} f_{uv} \cdot x_{i-u+1, j-v+1}$. (59)
Mean filter: $f_{uv} = \frac{1}{mn}$
72 Neural Models for Representation Learning Convolutional Neural Network Convolutional Layer
$a^{(l)} = f(w^{(l)} \otimes a^{(l-1)} + b^{(l)})$, (60)
where $\otimes$ is the convolution operation. $w^{(l)}$ is shared by all the neurons of the $l$-th layer, so the layer needs only $m + 1$ parameters, and $n^{(l+1)} = n^{(l)} - m + 1$.
73 Neural Models for Representation Learning Convolutional Neural Network Fully Connected Layer vs. Convolutional Layer. Figure: (a) Fully Connected Layer, (b) Convolutional Layer.
74 Neural Models for Representation Learning Convolutional Neural Network Pooling Layer
It is common to periodically insert a pooling layer between successive convolutional layers, in order to:
progressively reduce the spatial size of the representation;
reduce the number of parameters and the amount of computation in the network;
avoid overfitting.
For a feature map $X^{(l)}$, we divide it into several (non-)overlapping regions $R_k$, $k = 1, \dots, K$. With a pooling function $\mathrm{down}(\cdot)$,
$X^{(l+1)}_k = f(Z^{(l+1)}_k)$ (61)
$= f\left( w^{(l+1)} \cdot \mathrm{down}(R_k) + b^{(l+1)} \right)$. (62)
75 Neural Models for Representation Learning Convolutional Neural Network Pooling Layer
$X^{(l+1)} = f(Z^{(l+1)})$ (63)
$= f\left( w^{(l+1)} \cdot \mathrm{down}(X^{l}) + b^{(l+1)} \right)$, (64)
Two choices of $\mathrm{down}(\cdot)$: Maximum Pooling and Average Pooling.
$\mathrm{pool}_{\max}(R_k) = \max_{i \in R_k} a_i$ (65)
$\mathrm{pool}_{\mathrm{avg}}(R_k) = \frac{1}{|R_k|} \sum_{i \in R_k} a_i$. (66)
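Both choices of $\mathrm{down}(\cdot)$ in Eqs. (65) and (66) can be sketched on a one-dimensional feature map split into non-overlapping regions. The region size, the `mode` argument, and the input values are illustrative assumptions; the weight $w^{(l+1)}$, bias, and activation of Eq. (64) are left out.

```python
def pool(values, size, mode="max"):
    """Split `values` into non-overlapping regions R_k of length `size` and reduce each."""
    regions = [values[i:i + size] for i in range(0, len(values), size)]
    if mode == "max":
        return [max(r) for r in regions]             # Eq. (65)
    if mode == "avg":
        return [sum(r) / len(r) for r in regions]    # Eq. (66)
    raise ValueError("mode must be 'max' or 'avg'")

feature_map = [1, 3, 2, 8, 5, 4]
pooled_max = pool(feature_map, 2, "max")
pooled_avg = pool(feature_map, 2, "avg")
print(pooled_max, pooled_avg)
```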
76 Neural Models for Representation Learning Convolutional Neural Network Pooling Layer (figure)
77 Neural Models for Representation Learning Convolutional Neural Network Large Scale Visual Recognition Challenge
78 Neural Models for Representation Learning Convolutional Neural Network DeepMind's AlphaGo
79 Neural Models for Representation Learning Convolutional Neural Network DeepMind's AlphaGo
80 Neural Models for Representation Learning Convolutional Neural Network CNN for Sentence Modeling
Input: a sentence of length $n$. After the lookup layer, $X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{d \times n}$.
Variable-length input; convolution; pooling
81 Neural Models for Representation Learning Convolutional Neural Network CNN for Sentence Modeling [16]
Key steps:
Convolution: $z_{t:t+m-1} = x_t \oplus x_{t+1} \oplus \cdots \oplus x_{t+m-1} \in \mathbb{R}^{dm}$
Matrix-vector operation: $x^{l}_t = f(W^{l} z_{t:t+m-1} + b^{l})$
Pooling (max over time): $x^{l}_i = \max_t x^{l-1}_{i,t}$
[16] Collobert et al., Natural language processing (almost) from scratch.
82 Neural Models for Representation Learning Convolutional Neural Network CNN for Sentence Modeling [17]
Key steps:
Convolution: $z_{t:t+m-1} = x_t \oplus x_{t+1} \oplus \cdots \oplus x_{t+m-1} \in \mathbb{R}^{dm}$
Vector-vector operation: $x^{l}_t = f((w^{l})^{T} z_{t:t+m-1} + b^{l})$
Multiple filters / multiple channels
Pooling (max over time): $x^{l}_i = \max_t x^{l-1}_{i,t}$
[17] Y. Kim. Convolutional neural networks for sentence classification. In: arXiv preprint (2014).
83 Neural Models for Representation Learning Convolutional Neural Network Dynamic CNN for Sentence Modeling [18]
Key steps:
One-dimensional convolution: $x^{l}_{i,t} = f(w^{l}_i x_{i,t:t+m-1} + b^{l}_i)$
k-max pooling (max over time): $k_l = \max\left(k_{top}, \left\lceil \frac{L - l}{L} n \right\rceil\right)$
(Optional) folding: sums every two rows in a feature map component-wise.
Multiple filters / multiple channels
[18] N. Kalchbrenner, E. Grefenstette, and P. Blunsom. A Convolutional Neural Network for Modelling Sentences. In: Proceedings of ACL.
84 Neural Models for Representation Learning Convolutional Neural Network CNN for Sentence Modeling [19]
Key steps:
Convolution: $z_{t:t+m-1} = x_t \oplus x_{t+1} \oplus \cdots \oplus x_{t+m-1} \in \mathbb{R}^{dm}$
Matrix-vector operation: $x^{l}_t = f(W^{l} z^{l-1}_{t:t+m-1} + b^{l})$
Binary max pooling (over time): $x^{l}_{i,t} = \max(x^{l-1}_{i,2t-1}, x^{l-1}_{i,2t})$
[19] B. Hu et al. Convolutional neural network architectures for matching natural language sentences. In: Advances in Neural Information Processing Systems.
86 Neural Models for Representation Learning Recurrent Neural Network Recurrent Neural Network (RNN)
The RNN has recurrent hidden states whose output at each time step depends on that of the previous time step. More formally, given a sequence $x_{1:n} = (x_1, \dots, x_t, \dots, x_n)$, the RNN updates its recurrent hidden state $h_t$ by
$h_t = 0$ if $t = 0$, and $h_t = f(h_{t-1}, x_t)$ otherwise. (67)
Figure: an RNN with input $x_t$, hidden state $h_t$, an output layer, and a delay unit that feeds $h_{t-1}$ back into the hidden layer.
87 Neural Models for Representation Learning Recurrent Neural Network Simple Recurrent Network [20]
$h_t = f(U h_{t-1} + W x_t + b)$, (68)
where $f$ is a non-linear function.
Figure: the network unrolled in time, with inputs $x_1, \dots, x_T$, hidden states $h_1, \dots, h_T$, and outputs $y_1, \dots, y_T$.
[20] J. L. Elman. Finding structure in time. In: Cognitive Science 2 (1990).
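The recurrence of Eq. (68) can be sketched as a loop over the input sequence, starting from $h_0 = 0$. The choice of tanh for $f$, the one-dimensional hidden state, and the weights below are assumptions made for illustration.

```python
import math

def rnn_step(h_prev, x, U, W, b):
    """One step of Eq. (68): h_t = tanh(U h_{t-1} + W x_t + b)."""
    return [
        math.tanh(
            sum(u * h for u, h in zip(U[i], h_prev))
            + sum(w * xi for w, xi in zip(W[i], x))
            + b[i]
        )
        for i in range(len(b))
    ]

def run_rnn(xs, U, W, b):
    h = [0.0] * len(b)          # h_0 = 0
    states = []
    for x in xs:
        h = rnn_step(h, x, U, W, b)
        states.append(h)
    return states

# Hypothetical 1-D example: the second input is zero, so h_2 depends only on h_1.
U, W, b = [[0.5]], [[1.0]], [0.0]
states = run_rnn([[1.0], [0.0]], U, W, b)
print(states)
```

The second state is non-zero even though its input is zero, showing how the hidden state carries information forward in time.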
88 Neural Models for Representation Learning Recurrent Neural Network Backpropagation Through Time (BPTT)
The loss at time $t$ is $J_t$, and the whole loss is $J = \sum_{t=1}^{T} J_t$. The gradient of $J$ is
$\frac{\partial J}{\partial U} = \sum_{t=1}^{T} \frac{\partial J_t}{\partial U}$ (69)
$= \sum_{t=1}^{T} \frac{\partial h_t}{\partial U} \frac{\partial J_t}{\partial h_t}$, (70)
Figure: the unrolled network with inputs $x_1, \dots, x_T$, hidden states $h_1, \dots, h_T$, and per-step losses $J_1, \dots, J_T$.
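89 Neural Models for Representation Learning Recurrent Neural Network Gradient of RNN
$\frac{\partial J}{\partial U} = \sum_{t=1}^{T} \sum_{k=1}^{t} \frac{\partial h_k}{\partial U} \frac{\partial h_t}{\partial h_k} \frac{\partial y_t}{\partial h_t} \frac{\partial J_t}{\partial y_t}$, (71)
$\frac{\partial h_t}{\partial h_k} = \prod_{i=k+1}^{t} \frac{\partial h_i}{\partial h_{i-1}}$ (72)
$= \prod_{i=k+1}^{t} U^{T} \mathrm{diag}[f'(h_{i-1})]$. (73)
$\frac{\partial J}{\partial U} = \sum_{t=1}^{T} \sum_{k=1}^{t} \frac{\partial h_k}{\partial U} \left( \prod_{i=k+1}^{t} U^{T} \mathrm{diag}(f'(h_{i-1})) \right) \frac{\partial y_t}{\partial h_t} \frac{\partial J_t}{\partial y_t}$. (74)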
90 Long-Term Dependencies
Define $\gamma = \| U^{T} \operatorname{diag}(f'(h_{i-1})) \|$.
Exploding gradient problem: when $\gamma > 1$ and $t - k \to \infty$, $\gamma^{t-k} \to \infty$.
Vanishing gradient problem: when $\gamma < 1$ and $t - k \to \infty$, $\gamma^{t-k} \to 0$.
There are various ways to address the long-term dependency problem.
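To make the effect of the factor γ concrete, the following sketch treats γ as a single scalar and raises it to the power t - k. Treating γ as a constant scalar is a simplifying assumption (in the slide it is defined from a matrix that changes at every step), and `gradient_scale` is a hypothetical helper name.

```python
def gradient_scale(gamma: float, distance: int) -> float:
    """Factor by which a gradient is scaled after `distance` = t - k steps,
    assuming the same scalar gamma applies at every step."""
    return gamma ** distance


if __name__ == "__main__":
    for gamma in (0.9, 1.0, 1.1):
        scales = [gradient_scale(gamma, d) for d in (1, 10, 100)]
        print(gamma, scales)
```

Even values of γ close to 1 shrink or grow the gradient by several orders of magnitude over a hundred steps, which is why long-range dependencies are hard to learn with a plain recurrent network.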
91 Long Short-Term Memory Neural Network (LSTM) [21]
The core of the LSTM model is a memory cell $c$ that encodes, at every time step, what inputs have been observed up to that step. The behavior of the cell is controlled by three gates: the input gate $i$, the output gate $o$, and the forget gate $f$.
\begin{align}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1}), \tag{75} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1}), \tag{76} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + V_o c_t), \tag{77} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1}), \tag{78} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \tag{79} \\
h_t &= o_t \odot \tanh(c_t), \tag{80}
\end{align}
where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.
[21] S. Hochreiter and J. Schmidhuber. Long short-term memory. In: Neural Computation 8 (1997).
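Equations (75) to (80) translate almost line by line into code. The sketch below uses plain Python lists; the parameter dictionary layout and the helpers `matvec`, `activate`, and `zero_params` are assumptions made for illustration, and the V matrices are kept as full matrices although peephole weights are often restricted to be diagonal.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def activate(act, *terms: Vector) -> Vector:
    """Sum the given vectors element-wise, then apply `act` to each entry."""
    return [act(sum(parts)) for parts in zip(*terms)]


def lstm_step(x: Vector, h_prev: Vector, c_prev: Vector,
              p: Dict[str, Matrix]) -> Tuple[Vector, Vector]:
    """One LSTM step following equations (75)-(80); returns (h_t, c_t)."""
    i = activate(sigmoid, matvec(p["W_i"], x), matvec(p["U_i"], h_prev),
                 matvec(p["V_i"], c_prev))                        # (75)
    f = activate(sigmoid, matvec(p["W_f"], x), matvec(p["U_f"], h_prev),
                 matvec(p["V_f"], c_prev))                        # (76)
    c_tilde = activate(math.tanh, matvec(p["W_c"], x),
                       matvec(p["U_c"], h_prev))                  # (78)
    c = [fk * ck + ik * gk
         for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]        # (79)
    # The output gate looks at the new cell state c_t, as in (77).
    o = activate(sigmoid, matvec(p["W_o"], x), matvec(p["U_o"], h_prev),
                 matvec(p["V_o"], c))                             # (77)
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]              # (80)
    return h, c


def zero_params(n_in: int, n_hid: int) -> Dict[str, Matrix]:
    """Hypothetical helper: all-zero weights, so every gate equals 0.5."""
    p: Dict[str, Matrix] = {}
    for g in "ifoc":
        p["W_" + g] = [[0.0] * n_in for _ in range(n_hid)]
        p["U_" + g] = [[0.0] * n_hid for _ in range(n_hid)]
    for g in "ifo":
        p["V_" + g] = [[0.0] * n_hid for _ in range(n_hid)]
    return p
```

With all-zero weights every gate is 0.5 and the candidate is 0, so the new cell state is simply half of the previous one; this makes the role of the forget gate in (79) easy to check by hand.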
92 LSTM Architecture
[Figure: LSTM cell. $x_t$ and $h_{t-1}$ feed three sigmoid gates ($f_t$, $i_t$, $o_t$) and a tanh unit ($\tilde{c}_t$); the cell state is updated from $c_{t-1}$ to $c_t$ through an addition, and the cell outputs $h_t$.]
93 Gated Recurrent Unit (GRU) [22]
Two gates: the update gate $z$ and the reset gate $r$.
\begin{align}
r_t &= \sigma(W_r x_t + U_r h_{t-1}), \tag{81} \\
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \tag{82} \\
\tilde{h}_t &= \tanh(W_c x_t + U(r_t \odot h_{t-1})), \tag{83} \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t. \tag{84}
\end{align}
[22] K. Cho et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: arXiv preprint (2014); J. Chung et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. In: arXiv preprint (2014).
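The GRU update in equations (81) to (84) can be sketched the same way. The parameter dictionary keys and the helpers `matvec`, `sigmoid`, and `zero_params` are illustrative assumptions; only the four update rules come from the slide.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gru_step(x: Vector, h_prev: Vector, p: Dict[str, Matrix]) -> Vector:
    """One GRU step following equations (81)-(84); returns h_t."""
    r = [sigmoid(a + b) for a, b in
         zip(matvec(p["W_r"], x), matvec(p["U_r"], h_prev))]       # (81)
    z = [sigmoid(a + b) for a, b in
         zip(matvec(p["W_z"], x), matvec(p["U_z"], h_prev))]       # (82)
    reset_h = [rk * hk for rk, hk in zip(r, h_prev)]               # r_t * h_{t-1}
    h_tilde = [math.tanh(a + b) for a, b in
               zip(matvec(p["W_c"], x), matvec(p["U"], reset_h))]  # (83)
    return [zk * hk + (1.0 - zk) * gk
            for zk, hk, gk in zip(z, h_prev, h_tilde)]             # (84)


def zero_params(n_in: int, n_hid: int) -> Dict[str, Matrix]:
    """Hypothetical helper: all-zero weights, so both gates equal 0.5."""
    p: Dict[str, Matrix] = {}
    for name in ("W_r", "W_z", "W_c"):
        p[name] = [[0.0] * n_in for _ in range(n_hid)]
    for name in ("U_r", "U_z", "U"):
        p[name] = [[0.0] * n_hid for _ in range(n_hid)]
    return p
```

Compared with the LSTM, there is no separate cell state: the update gate z interpolates directly between the previous hidden state and the candidate.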
94 GRU Architecture
[Figure: GRU cell. $x_t$ and $h_{t-1}$ feed two sigmoid gates ($z_t$, $r_t$) and a tanh unit ($\tilde{h}_t$); the cell outputs $h_t$.]
95 Stacked RNN
[Figure: three-layer stacked RNN unrolled over time steps 1 to T. Inputs $x_t$ feed the first-layer states $h^{(1)}_t$, which feed $h^{(2)}_t$, then $h^{(3)}_t$, which produce the outputs $y_t$.]
96 Bidirectional RNN
\begin{align}
h^{(1)}_t &= f(U^{(1)} h^{(1)}_{t-1} + W^{(1)} x_t + b^{(1)}), \tag{86} \\
h^{(2)}_t &= f(U^{(2)} h^{(2)}_{t+1} + W^{(2)} x_t + b^{(2)}), \tag{87} \\
h_t &= h^{(1)}_t \oplus h^{(2)}_t, \tag{88}
\end{align}
where $\oplus$ denotes concatenation.
[Figure: bidirectional RNN unrolled over time steps 1 to T. Inputs $x_t$ feed a forward layer $h^{(1)}_t$ and a backward layer $h^{(2)}_t$, which together produce the outputs $y_t$.]
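The only structural difference between the two directions is the order in which the sequence is read. The sketch below makes that explicit with a generic step function; `run_rnn`, `bidirectional`, and the toy `running_sum` step (identity activation, unit weights) are hypothetical names introduced for illustration.

```python
from typing import Callable, List

Vector = List[float]
Step = Callable[[Vector, Vector], Vector]  # (h_prev, x_t) -> h_t


def run_rnn(xs: List[Vector], step: Step, h0: Vector) -> List[Vector]:
    """Apply `step` along the sequence and collect every hidden state."""
    states, h = [], h0
    for x in xs:
        h = step(h, x)
        states.append(h)
    return states


def bidirectional(xs: List[Vector], fwd_step: Step, bwd_step: Step,
                  h0_fwd: Vector, h0_bwd: Vector) -> List[Vector]:
    """Concatenate forward and backward states at each position, as in (88)."""
    forward = run_rnn(xs, fwd_step, h0_fwd)               # reads x_1 ... x_T
    backward = run_rnn(xs[::-1], bwd_step, h0_bwd)[::-1]  # reads x_T ... x_1
    return [f + b for f, b in zip(forward, backward)]     # list concatenation


def running_sum(h_prev: Vector, x: Vector) -> Vector:
    """Toy step used only to make the data flow visible."""
    return [h + v for h, v in zip(h_prev, x)]


if __name__ == "__main__":
    xs = [[1.0], [2.0], [3.0]]
    print(bidirectional(xs, running_sum, running_sum, [0.0], [0.0]))
```

With the toy step, the first half of each output vector summarizes the inputs up to position t and the second half summarizes the inputs from position t onward, so every position sees the whole sequence.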
97 Application of RNN: Sequence to Category
Examples: text classification, sentiment classification.
[Figure: two ways of obtaining the sequence representation $h$ from which $y$ is predicted. (c) Mean: the average of the hidden states $h_1, \ldots, h_T$. (d) Last: the final hidden state $h_T$.]
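The two pooling strategies named in the figure panels (Mean and Last) can be written as two small functions over a list of hidden states. The function names are hypothetical, and the classifier that would consume the pooled vector is left out.

```python
from typing import List

Vector = List[float]


def mean_pool(states: List[Vector]) -> Vector:
    """Average the hidden states h_1..h_T dimension by dimension."""
    count = len(states)
    return [sum(column) / count for column in zip(*states)]


def last_pool(states: List[Vector]) -> Vector:
    """Use only the final hidden state h_T."""
    return list(states[-1])


if __name__ == "__main__":
    hidden = [[1.0, 2.0], [3.0, 4.0]]
    print(mean_pool(hidden), last_pool(hidden))
```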
98 Application of RNN: Synchronous Sequence to Sequence
Sequence labeling, such as Chinese word segmentation and POS tagging.
[Figure: RNN unrolled over time steps 1 to T; each input $x_t$ feeds a hidden state $h_t$, which produces one output $y_t$.]
99 Application of RNN: Asynchronous Sequence to Sequence
Example: machine translation.
[Figure: RNN whose hidden states $h_1, \ldots, h_T$ read the inputs $x_1, \ldots, x_T$ followed by <EOS>, and whose subsequent hidden states $h_{T+1}, \ldots, h_{T+M}$ produce the outputs $y_1, \ldots, y_M$.]
100 Unfolded LSTM for Text Classification
[Figure: LSTM unrolled over inputs $x_1, \ldots, x_T$ with hidden states $h_1, \ldots, h_T$; a softmax layer produces $y$.]
Drawback: long-term dependencies need to be transmitted one by one along the sequence.
101 Unfolded Multi-Timescale LSTM [23]
[Figure: MT-LSTM unrolled over inputs $x_1, \ldots, x_T$, with three groups of units $g^1_t$, $g^2_t$, $g^3_t$ at each time step; a softmax layer produces $y$.]
[23] P. Liu et al. Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
102 LSTM for Sentiment Analysis
[Figure: comparison of LSTM and MT-LSTM on three example sentences: "<s> Is this progress? </s>", "<s> He'd create a movie better than this. </s>", and "<s> It's not exactly a gourmet meal but the fare is fair, even coming from the drive. </s>"]
103 Paragraph Vector [24]
[24] Q. V. Le and T. Mikolov. Distributed representations of sentences and documents. In: arXiv preprint (2014).
104 Memory Mechanism
How do the various models differ from the point of view of memory?

Model      Short-term   Long short-term   Global   External
SRN        Yes          No                No       No
LSTM/GRU   Yes          Yes               Maybe    No
PV         Yes          Yes               Yes      No
NTM/DMN    Yes          Yes               Maybe    Yes
105 Neural Models for Representation Learning: Recursive Neural Network
106 Recursive Neural Network (RecNN) [25]
[Figure: parse tree of "a red bike". The root (a red bike, NP) has children (a, Det) and (red bike, NP); the latter has children (red, JJ) and (bike, NN).]
Topological models compose the sentence representation following a given topological structure over the words.
Given a labeled binary parse tree, $((p_2 \to a\,p_1), (p_1 \to b\,c))$, the node representations are computed by
$$p_1 = f\left(W \begin{bmatrix} b \\ c \end{bmatrix}\right), \qquad p_2 = f\left(W \begin{bmatrix} a \\ p_1 \end{bmatrix}\right).$$
[25] R. Socher et al. Parsing with compositional vector grammars. In: Proceedings of ACL.
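The bottom-up composition over a binary parse tree can be sketched as a short recursive function. The tree encoding as nested tuples, the choice f = tanh, the absence of a bias term, and the names `compose` and `encode` are assumptions for illustration; the slide only specifies that the same matrix W is applied to the concatenated children at every node.

```python
import math
from typing import Dict, List, Tuple, Union

Vector = List[float]
Matrix = List[List[float]]
Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf word or a (left, right) pair


def compose(left: Vector, right: Vector, W: Matrix) -> Vector:
    """Parent = f(W [left; right]) with f = tanh (assumed)."""
    children = left + right  # stacked vector [left; right]
    return [math.tanh(sum(w * c for w, c in zip(row, children))) for row in W]


def encode(tree: Tree, W: Matrix, leaves: Dict[str, Vector]) -> Vector:
    """Compute the representation of a tree bottom-up with shared weights W."""
    if isinstance(tree, str):
        return leaves[tree]
    left, right = tree
    return compose(encode(left, W, leaves), encode(right, W, leaves), W)


if __name__ == "__main__":
    # Toy 1-dimensional word vectors; W has shape d x 2d with d = 1.
    leaves = {"a": [0.1], "b": [0.2], "c": [0.3]}
    W = [[1.0, 1.0]]
    print(encode(("a", ("b", "c")), W, leaves))  # p_2 for the tree (a (b c))
```

Because the parent has the same dimension as each child, the same function can be applied at every level of the tree.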
107 Recursive Convolutional Neural Network [26]
A recursive neural network can only process binary combinations and is therefore not suitable for dependency parsing.
[Figure: dependency structure for "a red bike" with nodes (a, Det), (red, JJ), and (bike, NN). Convolution produces (a bike, NN) and (red bike, NN); pooling produces (a red bike, NN).]
The recursive convolutional neural network addresses this by:
- introducing convolution and pooling layers;
- modeling the complicated interactions between the head word and its children.
[26] C. Zhu et al. A Re-Ranking Model for Dependency Parser with Recursive Convolutional Neural Network. In: Proceedings of Annual Meeting of the Association for Computational Linguistics.
108 Tree-Structured LSTMs [27]
Natural language exhibits syntactic properties that naturally combine words into phrases.
Two variants:
- Child-Sum Tree-LSTMs
- N-ary Tree-LSTMs
[27] K. S. Tai, R. Socher, and C. D. Manning. Improved semantic representations from tree-structured long short-term memory networks. In: arXiv preprint (2015).
109 Gated Recursive Neural Network [28]
[Figure: GRNN applied to the Chinese sentence 下雨天地面积水 (glosses: Rainy, Day, Ground, Accumulated water), with segmentation tags B, M, E, S.]
- DAG-based recursive neural network
- Gating mechanism
- A relatively complicated solution
GRNN models complicated combinations of features, and selects and preserves the useful combinations via reset and update gates.
[28] X. Chen et al. Gated Recursive Neural Network for Chinese Word Segmentation. In: Proceedings of Annual Meeting of the Association for Computational Linguistics. 2015; X. Chen et al. Sentence Modeling with Gated Recursive Neural Network. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
110 Neural Models for Representation Learning: Attention Model
111 Attention
112 Attention Model [29]
1. The context vector $c_i$ is computed as a weighted sum of the $h_j$:
$$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$$
2. The weight $\alpha_{ij}$ is computed by
$$\alpha_{ij} = \operatorname{softmax}\left(v^{T} \tanh(W s_{i-1} + U h_j)\right),$$
where the softmax normalizes over $j$.
[29] D. Bahdanau, K. Cho, and Y. Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. In: ArXiv e-prints (Sept. 2014). arXiv: [cs.CL].
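The two steps above (scoring each $h_j$ against $s_{i-1}$, then taking the weighted sum) can be sketched as follows. The helper names `matvec`, `softmax`, and `attention` and the toy parameters are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(s_prev: Vector, hs: List[Vector],
              W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Return (alpha_i, c_i) for one decoder state s_{i-1} and all h_j."""
    ws = matvec(W, s_prev)
    scores = []
    for h in hs:
        uh = matvec(U, h)
        # Score e_ij = v^T tanh(W s_{i-1} + U h_j)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, ws, uh)))
    alphas = softmax(scores)  # normalize over j
    dim = len(hs[0])
    context = [sum(a * h[k] for a, h in zip(alphas, hs)) for k in range(dim)]
    return alphas, context


if __name__ == "__main__":
    # Toy 1-dimensional example: the second state gets the larger weight.
    alphas, context = attention([0.0], [[0.0], [2.0]],
                                W=[[0.0]], U=[[1.0]], v=[1.0])
    print(alphas, context)
```

Because the weights sum to one, the context vector is a convex combination of the encoder states, recomputed for every output position.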
113 Applications: Question Answering
114 Question Answering
115 LSTM [30]
[30] K. M. Hermann et al. Teaching machines to read and comprehend. In: Advances in Neural Information Processing Systems. 2015.
116 Dynamic Memory Networks [31]
[31] A. Kumar et al. Ask me anything: Dynamic memory networks for natural language processing. In: arXiv preprint (2015).
117 Memory Networks [32]
[32] S. Sukhbaatar, J. Weston, R. Fergus, et al. End-to-end memory networks. In: Advances in Neural Information Processing Systems. 2015.
118 Neural Reasoner [33]
[33] B. Peng et al. Towards Neural Network-based Reasoning. In: arXiv preprint (2015).
119 Applications: Machine Translation
120 Sequence to Sequence Machine Translation [34]
Neural machine translation is a recently proposed framework for machine translation based purely on neural networks.
[34] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems. 2014.
121 Image Caption [35, 36, 37]
[35] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[36] O. Vinyals et al. Show and tell: A neural image caption generator. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[37] K. Xu et al. Show, attend and tell: Neural image caption generation with visual attention. In: arXiv preprint (2015).
122 Image Caption
123 Applications: Text Matching
124 Text Matching
Among many natural language processing (NLP) tasks, such as text classification, question answering, and machine translation, a common problem is modelling the relevance or similarity of a pair of texts, which is also called text semantic matching.
Three types of interaction models:
- Weak interaction models
- Semi-interaction models
- Strong interaction models
125 Weak Interaction Models
Some early works focus on sentence-level interactions, such as ARC-I [38] and CNTN [39]. These models first encode the two sequences into continuous dense vectors with separate neural models, and then compute the matching score from the sentence encodings.
[Figure: Convolutional Neural Tensor Network. The question and the answer are encoded separately and combined by a function f into a matching score.]
[38] Hu et al., Convolutional neural network architectures for matching natural language sentences.
[39] X. Qiu and X. Huang. Convolutional Neural Tensor Network Architecture for Community-based Question Answering. In: Proceedings of International Joint Conference on Artificial Intelligence.
126 Semi-interaction Models
Another kind of model uses a soft attention mechanism to obtain the representation of one sentence conditioned on the representation of the other sentence, such as ABCNN [40] and Attention LSTM [41]. These models can alleviate the weak interaction problem to some extent.
[Figure: Attention LSTM [42]]
[40] W. Yin et al. ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs. In: arXiv preprint (2015).
[41] Hermann et al., Teaching machines to read and comprehend.
[42] T. Rocktäschel et al. Reasoning about Entailment with Neural Attention. In: arXiv preprint (2015).
127 Strong Interaction Models
These models build the interaction at different granularities (word, phrase, and sentence level), such as ARC-II [43], MV-LSTM [44], and coupled-LSTMs [45]. The final matching score depends on these different levels of interaction.
[Figure: coupled-LSTMs]
[43] Hu et al., Convolutional neural network architectures for matching natural language sentences.
[44] S. Wan et al. A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations. In: AAAI.
[45] P. Liu, X. Qiu, and X. Huang. Modelling Interaction of Sentence Pair with coupled-LSTMs. In: arXiv preprint (2016).
128 Challenges & Open Problems: Conclusion of DL4NLP (just kidding)
- Long, long ago: you had to know the intrinsic rules of the data.
- Past ten years: you just had to know effective features of the data.
- Nowadays: you just need to have big data.
129 Challenges & Open Problems
- Depth of network
- Rare words
- Fundamental data structure
- Long-term dependencies
- Natural language understanding & reasoning
- Biology-inspired models
- Unsupervised learning
130 DL4NLP from scratch
- Select a real problem.
- Translate your problem into a (supervised) learning problem.
- Prepare your data and hardware (GPU).
- Select a library: Theano, Keras, TensorFlow.
- Find an open-source implementation.
- Incrementally write your own code.
- Run it.
131 More Information
《神经网络与深度学习》 (Neural Networks and Deep Learning), latest lecture notes:
More informationIndex. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,
Index A Activation functions, neuron/perceptron binary threshold activation function, 102 103 linear activation function, 102 rectified linear unit, 106 sigmoid activation function, 103 104 SoftMax activation
More informationDeep Learning. Recurrent Neural Network (RNNs) Ali Ghodsi. October 23, Slides are partially based on Book in preparation, Deep Learning
Recurrent Neural Network (RNNs) University of Waterloo October 23, 2015 Slides are partially based on Book in preparation, by Bengio, Goodfellow, and Aaron Courville, 2015 Sequential data Recurrent neural
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationDeep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
More informationCS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning
CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning Lei Lei Ruoxuan Xiong December 16, 2017 1 Introduction Deep Neural Network
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationAdvanced Machine Learning
Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear
More informationarxiv: v1 [cs.cl] 31 May 2015
Recurrent Neural Networks with External Memory for Language Understanding Baolin Peng 1, Kaisheng Yao 2 1 The Chinese University of Hong Kong 2 Microsoft Research blpeng@se.cuhk.edu.hk, kaisheny@microsoft.com
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationNatural Language Processing
Natural Language Processing Word vectors Many slides borrowed from Richard Socher and Chris Manning Lecture plan Word representations Word vectors (embeddings) skip-gram algorithm Relation to matrix factorization
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More informationtext classification 3: neural networks
text classification 3: neural networks CS 585, Fall 2018 Introduction to Natural Language Processing http://people.cs.umass.edu/~miyyer/cs585/ Mohit Iyyer College of Information and Computer Sciences University
More informationModelling Time Series with Neural Networks. Volker Tresp Summer 2017
Modelling Time Series with Neural Networks Volker Tresp Summer 2017 1 Modelling of Time Series The next figure shows a time series (DAX) Other interesting time-series: energy prize, energy consumption,
More informationMachine Translation. 10: Advanced Neural Machine Translation Architectures. Rico Sennrich. University of Edinburgh. R. Sennrich MT / 26
Machine Translation 10: Advanced Neural Machine Translation Architectures Rico Sennrich University of Edinburgh R. Sennrich MT 2018 10 1 / 26 Today s Lecture so far today we discussed RNNs as encoder and
More informationGated Recurrent Neural Tensor Network
Gated Recurrent Neural Tensor Network Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani and Satoshi Nakamura Faculty of Computer Science, Universitas Indonesia, Indonesia Email: andros.tjandra@gmail.com,
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Neural Networks: A brief touch Yuejie Chi Department of Electrical and Computer Engineering Spring 2018 1/41 Outline
More informationJakub Hajic Artificial Intelligence Seminar I
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014 Outline Key concepts Deep Belief Networks Convolutional Neural Networks A couple of questions Convolution Perceptron Feedforward Neural Network
More informationMachine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group
Machine Learning for Computer Vision 8. Neural Networks and Deep Learning Vladimir Golkov Technical University of Munich Computer Vision Group INTRODUCTION Nonlinear Coordinate Transformation http://cs.stanford.edu/people/karpathy/convnetjs/
More informationRecurrent and Recursive Networks
Neural Networks with Applications to Vision and Language Recurrent and Recursive Networks Marco Kuhlmann Introduction Applications of sequence modelling Map unsegmented connected handwriting to strings.
More informationUNSUPERVISED LEARNING
UNSUPERVISED LEARNING Topics Layer-wise (unsupervised) pre-training Restricted Boltzmann Machines Auto-encoders LAYER-WISE (UNSUPERVISED) PRE-TRAINING Breakthrough in 2006 Layer-wise (unsupervised) pre-training
More informationRecurrent Neural Networks with Flexible Gates using Kernel Activation Functions
2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 18) Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions Authors: S. Scardapane, S. Van Vaerenbergh,
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationStructured Neural Networks (I)
Structured Neural Networks (I) CS 690N, Spring 208 Advanced Natural Language Processing http://peoplecsumassedu/~brenocon/anlp208/ Brendan O Connor College of Information and Computer Sciences University
More informationarxiv: v2 [cs.cl] 1 Jan 2019
Variational Self-attention Model for Sentence Representation arxiv:1812.11559v2 [cs.cl] 1 Jan 2019 Qiang Zhang 1, Shangsong Liang 2, Emine Yilmaz 1 1 University College London, London, United Kingdom 2
More informationA Tutorial On Backward Propagation Through Time (BPTT) In The Gated Recurrent Unit (GRU) RNN
A Tutorial On Backward Propagation Through Time (BPTT In The Gated Recurrent Unit (GRU RNN Minchen Li Department of Computer Science The University of British Columbia minchenl@cs.ubc.ca Abstract In this
More informationFeature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
More informationRecurrent Neural Networks (Part - 2) Sumit Chopra Facebook
Recurrent Neural Networks (Part - 2) Sumit Chopra Facebook Recap Standard RNNs Training: Backpropagation Through Time (BPTT) Application to sequence modeling Language modeling Applications: Automatic speech
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationTask-Oriented Dialogue System (Young, 2000)
2 Review Task-Oriented Dialogue System (Young, 2000) 3 http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Speech Signal Speech Recognition Hypothesis are there any action movies to see
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationNeural Word Embeddings from Scratch
Neural Word Embeddings from Scratch Xin Li 12 1 NLP Center Tencent AI Lab 2 Dept. of System Engineering & Engineering Management The Chinese University of Hong Kong 2018-04-09 Xin Li Neural Word Embeddings
More informationSeq2Tree: A Tree-Structured Extension of LSTM Network
Seq2Tree: A Tree-Structured Extension of LSTM Network Weicheng Ma Computer Science Department, Boston University 111 Cummington Mall, Boston, MA wcma@bu.edu Kai Cao Cambia Health Solutions kai.cao@cambiahealth.com
More informationRecurrent Neural Networks
Charu C. Aggarwal IBM T J Watson Research Center Yorktown Heights, NY Recurrent Neural Networks Neural Networks and Deep Learning, Springer, 218 Chapter 7.1 7.2 The Challenges of Processing Sequences Conventional
More informationBits of Machine Learning Part 1: Supervised Learning
Bits of Machine Learning Part 1: Supervised Learning Alexandre Proutiere and Vahan Petrosyan KTH (The Royal Institute of Technology) Outline of the Course 1. Supervised Learning Regression and Classification
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationLanguage Modeling. Hung-yi Lee 李宏毅
Language Modeling Hung-yi Lee 李宏毅 Language modeling Language model: Estimated the probability of word sequence Word sequence: w 1, w 2, w 3,., w n P(w 1, w 2, w 3,., w n ) Application: speech recognition
More informationDeep Learning for NLP
Deep Learning for NLP Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Greg Durrett Outline Motivation for neural networks Feedforward neural networks Applying feedforward neural networks
More informationRecurrent Neural Networks 2. CS 287 (Based on Yoav Goldberg s notes)
Recurrent Neural Networks 2 CS 287 (Based on Yoav Goldberg s notes) Review: Representation of Sequence Many tasks in NLP involve sequences w 1,..., w n Representations as matrix dense vectors X (Following
More informationCSCI567 Machine Learning (Fall 2018)
CSCI567 Machine Learning (Fall 2018) Prof. Haipeng Luo U of Southern California Sep 12, 2018 September 12, 2018 1 / 49 Administration GitHub repos are setup (ask TA Chi Zhang for any issues) HW 1 is due
More informationNeural Networks and Deep Learning.
Neural Networks and Deep Learning www.cs.wisc.edu/~dpage/cs760/ 1 Goals for the lecture you should understand the following concepts perceptrons the perceptron training rule linear separability hidden
More information