Global and Local Feature Learning for Ego-Network Analysis


1 Global and Local Feature Learning for Ego-Network Analysis
Fatemeh Salehi Rizi, Michael Granitzer, and Konstantin Ziegler
TIR Workshop, University of Passau, 29 August 2017

2-3 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

4-5 Graph Embedding
Given a graph G with a set of nodes V = {v_1, ..., v_n}, learn a mapping f : v_i ↦ y_i ∈ R^d with d ≪ |V|.
Applications: anomaly detection, attribute prediction, clustering, link prediction, ...
- Methods based on eigen-decomposition of the adjacency matrix
- Methods inspired by NLP and deep learning
[Figure: an adjacency matrix mapped to latent dimensions; taken from Bryan Perozzi, DeepWalk: Online Learning of Social Representations.]

6 What is an Ego-Network?
Social graphs can be divided into several subgraphs (ego-networks) [1] for:
- extracting features for nodes
- detecting distinct neighborhood patterns
- studying social relationships
An ego-network consists of an ego, its alters, and social circles (e.g. sport teammates, colleagues, friends, family members).
[Figure: an ego-network with four social circles.]

7-8 Local Neighborhood Analysis
The neighborhood around each ego has a different pattern [2]. Clustering ego graphs by their structure yields eight distinct trends:
(a) linked neighbors, (b) strongly linked, (c) dense, (d) complete, (e) powerful ego node, (f) strong ego neighbor, (g) less cohesive star, (h) star.
Goals:
- finding a vector representation for each ego-network
- social circle detection and prediction
[Figure from [2]: clustering results of ego graphs, with the ego node (red) in the middle and connections to first- and second-degree neighbors.]

9 Social Circle Prediction
Predicting the social circle for a newly added alter in the ego-network.

10 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

11 Our Contributions
- We introduce local vector representations for egos to capture neighborhood structures.
- We apply the local vectors to the circle prediction problem.
- We replace global representations with local ones to improve performance.

12 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

13-15 Vector Representation for Social Graphs
DeepWalk [3]:
- walks globally over the graph and samples sequences of nodes
- treats all these sequences as an artificial corpus
- feeds the corpus to a Skip-Gram based Word2Vec model [5]
Word2Vec: given a sequence of words {w_1, w_2, ..., w_{t-1}, w_t, w_{t+1}, ..., w_n}, a language model aims to maximize P(w_t | w_1, ..., w_{t-1}).
[Figure: a simple CBOW model with only one word in the context.]
Given a sequence of nodes {v_1, v_2, ..., v_{t-1}, v_t, v_{t+1}, ..., v_n}, DeepWalk learns a global embedding glo : V → R^d by maximizing
Σ_{t=1}^{n} log Pr(v_t | v_{t-c}, ..., v_{t-1}, v_{t+1}, ..., v_{t+c}).
Visual example on Zachary's karate graph: the input graph and the output embedding (from the DeepWalk paper [3]).
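To make the DeepWalk pipeline above concrete, here is a minimal sketch (not the authors' implementation): unbiased random walks over the whole graph are collected as an artificial corpus and fed to a Skip-Gram Word2Vec model. The use of networkx and gensim and all hyper-parameter values are illustrative assumptions.

```python
# Minimal DeepWalk-style sketch: global random walks fed to Skip-Gram Word2Vec.
# networkx/gensim usage and hyper-parameters are illustrative assumptions.
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(graph, start, length):
    """Sample one unbiased random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:                # dead end: stop the walk early
            break
        walk.append(random.choice(neighbors))
    return [str(v) for v in walk]        # gensim expects string "tokens"

def deepwalk_embeddings(graph, num_walks=10, walk_length=40, dim=128, window=5):
    """Treat the walks as an artificial corpus and train Skip-Gram Word2Vec (sg=1)."""
    corpus = [random_walk(graph, v, walk_length)
              for _ in range(num_walks) for v in graph.nodes()]
    model = Word2Vec(corpus, vector_size=dim, window=window, sg=1, min_count=0, workers=4)
    return {v: model.wv[str(v)] for v in graph.nodes()}     # glo(v) for every node

if __name__ == "__main__":
    glo = deepwalk_embeddings(nx.karate_club_graph())       # Zachary's karate club
    print(len(glo), "node vectors of dimension", len(glo[0]))
```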

16-17 Vector Representation for Social Graphs
node2vec [4]: similar to DeepWalk, but with two additional hyper-parameters p ∈ R+ and q ∈ R+ that control the random walks:
- q > 1 and p < min(q, 1): walk locally (BFS-like)
- p > 1 and q < min(p, 1): walk exploratively (DFS-like)
Even the local walk can exceed the ego-network.
[Figure from [4]: BFS and DFS search strategies from node u (k = 3).]
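The effect of p and q can be sketched as a second-order walk step whose bias depends on the previous node. This is a simplified version for unweighted networkx-style graphs, not the node2vec reference implementation (which precomputes transition probabilities).

```python
# Sketch of node2vec's biased walk (unweighted graph); illustrative, not the reference code.
import random

def node2vec_step(graph, prev, curr, p, q):
    """Pick the node after the move prev -> curr, biased by the return (p) and in-out (q) parameters."""
    neighbors = list(graph.neighbors(curr))
    weights = []
    for x in neighbors:
        if x == prev:                      # going back to the previous node
            weights.append(1.0 / p)
        elif graph.has_edge(x, prev):      # staying close to prev (BFS-like, local)
            weights.append(1.0)
        else:                              # moving further away (DFS-like, explorative)
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(graph, start, length, p=1.0, q=1.0):
    """Sample one second-order random walk of at most `length` nodes."""
    first_neighbors = list(graph.neighbors(start))
    if not first_neighbors:
        return [start]
    walk = [start, random.choice(first_neighbors)]
    while len(walk) < length and list(graph.neighbors(walk[-1])):
        walk.append(node2vec_step(graph, walk[-2], walk[-1], p, q))
    return walk
```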

18 Social Circle Prediction by McAuley and Leskovec [1]
- A probabilistic classifier over circle memberships.
- Features are constructed from user profiles: profiles are tree structured, and features compare paths in the trees of two users (e.g. a binary indicator for a shared leaf such as work position "Cryptanalyst", or a coarser count that only records working at the same institution).
- Time complexity in Big-O notation: O(n^3); their MCMC algorithm computes updates in worst case O(|N|^2) per iteration, whereas the pseudo-boolean optimization algorithm (based on maximum flow) requires O(|N|^3), both loose worst-case bounds.
[Excerpts from [1]: Algorithm 2 (updating the membership of node x in circle k) and Figure 3 (feature construction from tree-structured profiles).]

19 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

20-21 Local Representations using Paragraph Vector
- walking locally over an ego-network to generate a sequence of nodes
- treating this sequence as an artificial paragraph
- applying Paragraph Vector [6] to learn the vector representation
Given an artificial paragraph v_1, v_2, v_3, ..., v_t, ..., v_l for ego u_i, it learns a local embedding loc : U → R^d by maximizing the average log probability
Σ_{t=1}^{l} log Pr(v_t | u_i, v_{t+c}, ..., v_{t-c}).
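A minimal sketch of this local step, assuming a networkx graph and reusing the random_walk helper from the DeepWalk sketch above: walks are restricted to each ego-network, tagged with the ego's id, and given to gensim's Doc2Vec (an implementation of Paragraph Vector). Hyper-parameters are illustrative.

```python
# Sketch: local ego embeddings loc(u) via Paragraph Vector (gensim Doc2Vec); settings illustrative.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

def ego_paragraphs(graph, egos, walks_per_ego=10, walk_length=40):
    """One tagged artificial 'paragraph' per local walk inside each ego-network."""
    docs = []
    for ego in egos:
        ego_net = graph.subgraph([ego] + list(graph.neighbors(ego)))   # the ego and its alters
        for _ in range(walks_per_ego):
            walk = random_walk(ego_net, ego, walk_length)              # helper from the DeepWalk sketch
            docs.append(TaggedDocument(words=walk, tags=[f"ego_{ego}"]))
    return docs

def local_embeddings(graph, egos, dim=128, window=5):
    model = Doc2Vec(ego_paragraphs(graph, egos),
                    vector_size=dim, window=window, min_count=0, epochs=20)
    return {ego: model.dv[f"ego_{ego}"] for ego in egos}               # loc(u) for every ego u
```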

22 Social Circle Prediction
Setting:
- a social network G = (V, E) with egos U and alters V \ U
- profile information (v.feat_1, ..., v.feat_f) for every v ∈ V
Input/Output: predict the social circle c : V \ U → {C_1, ..., C_k}, given several labeled samples.
Approach:
- feature selection (users' profile information, graph embeddings)
- a neural network classifier

23 Incorporating Profile Information
The similarity of the ego's and the alter's profiles is used as a feature:
- ego's profile features: u.feat_1, ..., u.feat_f
- alter's profile features: v.feat_1, ..., v.feat_f
sim(u, v) = (b_1, ..., b_f), where b_i = 1 if u.feat_i = v.feat_i and b_i = 0 otherwise. (1)
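Equation (1) amounts to a feature-wise equality test; a small sketch, with hypothetical feature names and a dict-based profile representation:

```python
# Sketch of Eq. (1): b_i = 1 iff the ego and the alter agree on the i-th profile feature.
def profile_similarity(ego_profile, alter_profile, feature_names):
    """Return the binary vector sim(u, v) = (b_1, ..., b_f)."""
    return [1 if ego_profile.get(name) == alter_profile.get(name) else 0
            for name in feature_names]

# Hypothetical feature names and values, purely for illustration:
features = ["education", "employer", "hometown"]
u = {"education": "Uni Passau", "employer": "ACME", "hometown": "Passau"}
v = {"education": "Uni Passau", "employer": "Globex", "hometown": "Passau"}
print(profile_similarity(u, v, features))   # -> [1, 0, 1]
```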

24 Social Circle Prediction
A feed-forward neural network classifier predicting the social circle of alter v, which belongs to the ego-network of ego u.
Input layer (one of the following feature combinations, ⊕ denoting concatenation):
- locglo: loc(u) ⊕ glo(v)
- gloglo: glo(u) ⊕ glo(v)
- locgloglo: loc(u) ⊕ glo(u) ⊕ glo(v)
- locglosim: loc(u) ⊕ glo(v) ⊕ sim(u, v)
- gloglosim: glo(u) ⊕ glo(v) ⊕ sim(u, v)
- locgloglosim: loc(u) ⊕ glo(u) ⊕ glo(v) ⊕ sim(u, v)
Hidden layer: a single dense layer with ReLU activation units.
Output layer: softmax units (one per circle).
Ground truth: the alter's circle label (family, colleagues, etc.).
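A minimal Keras sketch of this classifier for the locgloglosim variant; the hidden-layer width, optimizer, embedding dimension, and feature count are assumptions, not the configuration reported in the talk.

```python
# Sketch of the feed-forward circle classifier (locgloglosim input); all sizes are assumptions.
import numpy as np
from tensorflow import keras

d, f, k = 128, 576, 10            # embedding dim, number of profile features, number of circles
input_dim = 3 * d + f             # loc(u), glo(u), glo(v), sim(u, v), concatenated

model = keras.Sequential([
    keras.layers.Dense(256, activation="relu", input_shape=(input_dim,)),  # single ReLU hidden layer
    keras.layers.Dense(k, activation="softmax"),                           # one softmax unit per circle
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# x rows: concatenation of loc(u), glo(u), glo(v), sim(u, v); y: the alter's circle label.
x = np.random.rand(1000, input_dim).astype("float32")      # placeholder training data
y = np.random.randint(0, k, size=1000)
model.fit(x, y, epochs=5, batch_size=32, validation_split=0.1)
```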

25 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

26 Dataset
Table 1: Statistics of the social network datasets [7]
              Facebook    Twitter      Google+
nodes |V|     4,039       81,306       107,614
edges |E|     88,234      1,768,149    13,673,453
egos |U|
circles |C|
features f    576         2,271        4,122
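The statistics in Table 1 match the SNAP ego-network collections released with [1]; assuming those are the datasets used, here is a small loading sketch for one Facebook ego-network, based on the SNAP file layout (`<ego>.edges`, `<ego>.circles`).

```python
# Sketch: load one SNAP ego-network (Facebook) plus its ground-truth circles.
# Assumes the SNAP file layout: <ego>.edges and <ego>.circles inside `path`.
import networkx as nx

def load_ego_network(path, ego):
    g = nx.read_edgelist(f"{path}/{ego}.edges", nodetype=int)   # edges among the alters
    g.add_edges_from((ego, v) for v in list(g.nodes()))         # the ego is linked to every alter
    circles = {}
    with open(f"{path}/{ego}.circles") as fh:
        for line in fh:                                         # "circleN  v1  v2 ..."
            name, *members = line.split()
            circles[name] = [int(v) for v in members]
    return g, circles

g, circles = load_ego_network("facebook", 0)                    # ego 0 of the Facebook collection
print(g.number_of_nodes(), "nodes,", len(circles), "circles")
```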

27 Experimental Results
Table 2: Performance (F1-score) of different embeddings for circle prediction on the three datasets (Facebook, Twitter, Google+); the standard deviation is less than 0.02 for all experiments. The compared approaches are gloglo, locglo, locgloglo, gloglosim, locglosim, locgloglosim, and the Φ1 baseline of McAuley & Leskovec [1].

28 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

29 Future Work
- Using embeddings to approximate more complex measures (e.g. shortest-path distance)
- Using embeddings to find similar egos
- Learning embeddings for directed graphs

30 Outline: 1 Introduction (Graph Embedding, Ego-Network Analysis), 2 Contributions, 3 State-of-the-art, 4 Approach, 5 Experimental Results, 6 Future Work, 7 References

31 References
[1] McAuley, Julian J., and Jure Leskovec. Learning to Discover Social Circles in Ego Networks. In NIPS, 2012.
[2] Muhammad, Syed Agha, and Kristof Van Laerhoven. DUKE: A Solution for Discovering Neighborhood Patterns in Ego Networks. In ICWSM, 2015.
[3] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online Learning of Social Representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2014.
[4] Grover, Aditya, and Jure Leskovec. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016.
[5] Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In NIPS, 2013.
[6] Le, Quoc V., and Tomas Mikolov. Distributed Representations of Sentences and Documents. In ICML, 2014.

32 Word2vec [5]
Language modeling and the distributional hypothesis in natural languages: semantically similar words tend to appear in similar word neighborhoods.
Given the sequence of words {w_1, w_2, ..., w_{t-1}, w_t, ..., w_n}, a language model aims to maximize P(w_t | w_1, ..., w_{t-1}).
Word2vec fixes a context window of length c around each word; for the sequence {w_1, w_2, ..., w_{t-1}, w_t, ..., w_n} the goal is to maximize
Σ_{t=1}^{n} log P(w_t | w_{t+c}, ..., w_{t-c}).
The neural network which learns the word representations:
- one hidden layer
- the number of input-layer units equals the vocabulary size of the text
- the number of hidden-layer units determines the dimensionality of the vectors

33 Word2vec [5]
Given a sequence of words {w_1, w_2, ..., w_{t-1}, w_t, ..., w_n} and a context window of length one: predict one target word given one context word, P(w_t | w_{t-1}).
[Figure: a simple CBOW model with only one word in the context — one-hot input layer x of size V, hidden layer h of size N, output layer y of size V, with weight matrices W (V × N) and W' (N × V).]
- Each input is a one-hot encoded vector; the layers are fully connected.
- Hidden layer: h = W^T x, which simply copies the row of W associated with the input word (a linear activation).
- Output scores: u = W'^T h.
- The probability is computed with the softmax function: P(w_t | w_{t-1}) = y_t = exp(u_t) / Σ_{j=1}^{V} exp(u_j).
- After several training iterations the matrix W stops changing; its rows are the learned word vectors.
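The forward pass on this slide fits in a few lines of numpy; vocabulary size and dimensionality below are illustrative.

```python
# Forward pass of the one-context-word CBOW model above (numpy sketch).
import numpy as np

V, N = 10, 4                        # vocabulary size and embedding dimension (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))         # input -> hidden weights; row i is the vector of word i
W_prime = rng.normal(size=(N, V))   # hidden -> output weights

def forward(k):
    """P(w_t | w_{t-1}) for every word in the vocabulary, given context word index k."""
    x = np.zeros(V)
    x[k] = 1.0                      # one-hot context word
    h = W.T @ x                     # h = W^T x, i.e. row k of W
    u = W_prime.T @ h               # scores u_j over the vocabulary
    y = np.exp(u - u.max())
    return y / y.sum()              # softmax

print(forward(3).round(3), forward(3).sum())   # a probability distribution summing to 1
```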

34 Paragraph Vector [6]
Given a sequence of paragraphs p_1, p_2, ..., p_q and training words w_1, w_2, w_3, ..., w_t, ..., w_n, the idea of Paragraph Vector [6] is to maximize p(w_t | p_j, w_{t-c}, ..., w_{t+c}).
In the distributed-memory model, every paragraph is mapped to a unique vector (a column of a paragraph matrix) and every word to a column of matrix W; the paragraph and word vectors are averaged or concatenated to predict the next word. At prediction time the vector of a new paragraph is inferred by gradient descent while the word vectors and the softmax weights stay fixed. With N paragraphs, M vocabulary words, p-dimensional paragraph vectors and q-dimensional word vectors, the model has N·p + M·q parameters (excluding the softmax parameters); although this can be large, the updates during training are typically sparse and thus efficient.
Example with 2 paragraphs and a window size of 3:
P1: The cat sat on the mat.
P2: I ate potato crisps for evening snack.
p(on | P1, The, cat, sat)
[Figure from [6]: the framework for learning paragraph vectors.]
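As a toy version of the slide's example, one can train gensim's Doc2Vec in distributed-memory mode (dm=1, which corresponds to this framework) on the two paragraphs; dimensions and epochs are illustrative.

```python
# Toy Paragraph Vector (PV-DM) run on the slide's two example paragraphs; illustrative only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

p1 = "the cat sat on the mat".split()
p2 = "i ate potato crisps for evening snack".split()
docs = [TaggedDocument(p1, tags=["P1"]), TaggedDocument(p2, tags=["P2"])]

# dm=1 is the distributed-memory model: a word is predicted from the paragraph vector
# plus its context words, as in p(on | P1, The, cat, sat).
model = Doc2Vec(docs, dm=1, vector_size=16, window=3, min_count=1, epochs=100)
print(model.dv["P1"][:4])                                          # learned paragraph vector for P1
print(model.infer_vector("the dog sat on the rug".split())[:4])    # inference for an unseen paragraph
```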
