Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec


1 Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec Jiezhong Qiu Tsinghua University February 21, 2018 Joint work with Yuxiao Dong (MSR), Hao Ma (MSR), Jian Li (IIIS, Tsinghua), Kuansan Wang (MSR), Jie Tang (DCST, Tsinghua)

2 Motivation and Problem Formulation
Problem Formulation: Given a network $G = (V, E)$, we aim to learn a function $f : V \to \mathbb{R}^p$ that captures neighborhood similarity and community membership.
Applications: link prediction, community detection, label classification.
Figure 1: A toy example (figure from DeepWalk).

3 History of Network Embedding
2017 metapath2vec [Dong et al.]
2016 node2vec [Grover & Leskovec]
2015 LINE & PTE [Tang et al.]
2014 DeepWalk [Perozzi et al.]
word2vec (skip-gram) [Mikolov et al.]
2009 SocDim [Tang & Liu]
Spectral Clustering vs. Kernel k-means [Dhillon et al.]; Spectral Clustering [Ng et al.]; Image Segmentation [Shi & Malik]
A large body of literature: [Pothen et al.], [Simon], [Bolla], [Hagen & Kahng], [Hendrickson & Leland], [Van Driessche & Roose], [Barnard et al.], [Spielman & Teng], [Guattery & Miller]
Fiedler Vector [Fiedler]; Spectral Partitioning [Donath, Hoffman]

4 Contents
Preliminaries
  Notations
Main Theoretic Results
  DeepWalk (KDD 14)
  LINE (WWW 15)
  PTE (KDD 15)
  node2vec (KDD 16)
NetMF
  NetMF for a Small Window Size T
  NetMF for a Large Window Size T
Experiments

5 Notations
Consider an undirected weighted graph $G = (V, E)$, where $|V| = n$ and $|E| = m$.
Adjacency matrix $A \in \mathbb{R}^{n \times n}_+$: $A_{i,j} = a_{i,j} > 0$ if $(i, j) \in E$, and $A_{i,j} = 0$ if $(i, j) \notin E$.
Degree matrix $D = \mathrm{diag}(d_1, \dots, d_n)$, where $d_i$ is the generalized degree of vertex $i$.
Volume of the graph $G$: $\mathrm{vol}(G) = \sum_i \sum_j A_{i,j}$.
Assumption: $G = (V, E)$ is connected, undirected, and not bipartite, which guarantees that the random walk on $G$ has a unique stationary distribution $P(w) = d_w / \mathrm{vol}(G)$.
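To make the notation concrete, here is a minimal NumPy sketch on an assumed toy graph (five vertices with a triangle, so it is connected and not bipartite, matching the assumption above); the names A, d, D, vol_G, P mirror the slide and are reused by later sketches.

    import numpy as np

    # Assumed toy graph: 5 vertices, connected, contains a triangle
    # (hence not bipartite).
    n = 5
    A = np.zeros((n, n))
    for i, j in [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]:
        A[i, j] = A[j, i] = 1.0        # A_{i,j} = a_{i,j} > 0 iff (i,j) in E

    d = A.sum(axis=1)                  # generalized degrees d_i
    D = np.diag(d)                     # degree matrix
    vol_G = A.sum()                    # vol(G) = sum_{i,j} A_{i,j}

    P = A / d[:, None]                 # random-walk transition matrix D^{-1} A
    pi = d / vol_G                     # stationary distribution d_w / vol(G)
    assert np.allclose(pi @ P, pi)     # pi is indeed stationary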

6 DeepWalk Roadmap
Input: G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding

7 DeepWalk: a Two-step Algorithm
Algorithm 1: DeepWalk
1 for n = 1, 2, ..., N do
2   Pick $w_1^n$ according to a probability distribution $P(w_1)$;
3   Generate a vertex sequence $(w_1^n, \dots, w_L^n)$ of length $L$ by a random walk on network $G$;
4   for j = 1, 2, ..., L − T do
5     for r = 1, ..., T do
6       Add vertex-context pair $(w_j^n, w_{j+r}^n)$ to multiset $\mathcal{D}$;
7       Add vertex-context pair $(w_{j+r}^n, w_j^n)$ to multiset $\mathcal{D}$;
8 Run SGNS on $\mathcal{D}$ with $b$ negative samples.
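A minimal Python sketch of the corpus-building step of Algorithm 1 (not the authors' implementation; choosing $w_1$ uniformly and sampling neighbors proportionally to edge weight are assumptions):

    import random

    def deepwalk_corpus(A, N, L, T, seed=0):
        """Build the vertex-context multiset D of Algorithm 1 (a sketch).
        N walks of length L, window size T, as in the slide."""
        rng = random.Random(seed)
        n = len(A)
        D = []
        for _ in range(N):
            walk = [rng.randrange(n)]          # one choice of P(w_1): uniform
            for _ in range(L - 1):
                v = walk[-1]
                nbrs = [u for u in range(n) if A[v][u] > 0]
                wts = [A[v][u] for u in nbrs]
                walk.append(rng.choices(nbrs, weights=wts)[0])
            for j in range(L - T):
                for r in range(1, T + 1):
                    D.append((walk[j], walk[j + r]))   # (w_j, w_{j+r})
                    D.append((walk[j + r], walk[j]))   # (w_{j+r}, w_j)
        return D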

8 DeepWalk Roadmap
Input: G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding
Levy & Goldberg (NIPS 14) notation:
$\#(w,c)$: co-occurrence count of $w$ and $c$
$|\mathcal{D}|$: total number of word-context pairs
$b$: number of negative samples
$\#(w)$: occurrence count of word $w$
$\#(c)$: occurrence count of context $c$

9 Skip-gram with Negative Sampling (SGNS)
SGNS maintains a multiset $\mathcal{D}$ that counts the occurrences of each word-context pair $(w, c)$. Objective:
$\mathcal{L} = \sum_w \sum_c \left( \#(w,c) \log g(x_w^\top y_c) + \frac{b \#(w) \#(c)}{|\mathcal{D}|} \log g(-x_w^\top y_c) \right)$,
where $x_w, y_c \in \mathbb{R}^d$, $g$ is the sigmoid function, and $b$ is the number of negative samples for SGNS.
For sufficiently large dimensionality $d$, this is equivalent to factorizing the PMI matrix (Levy & Goldberg, NIPS 14):
$\log \left( \frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)} \right)$.
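The Levy & Goldberg matrix can be tabulated directly from the multiset D; a sketch (zero-count entries, which are $-\infty$ in theory, are left at 0 here purely for illustration, an assumption):

    import numpy as np
    from collections import Counter

    def shifted_pmi(D, n, b=1.0):
        """The matrix log(#(w,c)|D| / (b #(w) #(c))) that SGNS implicitly
        factorizes (Levy & Goldberg). A dense sketch for small n."""
        pairs = Counter(D)
        count_w = Counter(w for w, _ in D)     # #(w)
        count_c = Counter(c for _, c in D)     # #(c)
        total = len(D)                         # |D|
        M = np.zeros((n, n))
        for (w, c), k in pairs.items():
            M[w, c] = np.log(k * total / (b * count_w[w] * count_c[c]))
        return M

For example, shifted_pmi(deepwalk_corpus(A, N=10, L=80, T=2), n) on the toy graph gives the empirical matrix that the theorems on the following slides characterize in closed form.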

10 DeepWalk Roadmap
Input: G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding
Levy & Goldberg (NIPS 14) notation:
$\#(w,c)$: co-occurrence count of $w$ and $c$
$|\mathcal{D}|$: total number of word-context pairs
$b$: number of negative samples
$\#(w)$: occurrence count of word $w$
$\#(c)$: occurrence count of context $c$

11 DeepWalk
[Figure: toy graph on vertices a, b, c, d, e, with vertex-context pairs (c, a), (c, d), (c, e)]
Question: Suppose the multiset $\mathcal{D}$ is constructed from random walks on a graph. Can we interpret $\log \frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)}$ in graph-theoretic terms?

12 DeepWalk
[Figure: toy graph on vertices a, b, c, d, e, with vertex-context pairs (c, a), (c, d), (c, e)]
Question: Suppose the multiset $\mathcal{D}$ is constructed from random walks on a graph. Can we interpret $\log \frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)}$ in graph-theoretic terms?
Challenge: We mix many things together, i.e., direction and distance.

13 DeepWalk
[Figure: toy graph on vertices a, b, c, d, e, with vertex-context pairs (c, a), (c, d), (c, e)]
Question: Suppose the multiset $\mathcal{D}$ is constructed from random walks on a graph. Can we interpret $\log \frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)}$ in graph-theoretic terms?
Challenge: We mix many things together, i.e., direction and distance.
Solution: Let's distinguish them!

14 DeepWalk
Partition the multiset $\mathcal{D}$ into several sub-multisets according to the way in which a vertex and its context appear in a random-walk sequence. More formally, for $r = 1, \dots, T$, define
$\mathcal{D}_{r\to} = \{ (w, c) : (w, c) \in \mathcal{D},\ w = w_j^n,\ c = w_{j+r}^n \}$,
$\mathcal{D}_{r\leftarrow} = \{ (w, c) : (w, c) \in \mathcal{D},\ w = w_{j+r}^n,\ c = w_j^n \}$.
[Figure: the toy example's pairs assigned to sub-multisets, e.g. (c, d) to $\mathcal{D}_{1\to}$ and (c, a), (c, e) to $\mathcal{D}_{2\to}$ / $\mathcal{D}_{2\leftarrow}$]

15 DeepWalk as Implicit Matrix Factorization
Some observations:
Observation 1: $\log \frac{\#(w,c)\,|\mathcal{D}|}{b\,\#(w)\,\#(c)} = \log \frac{\#(w,c)/|\mathcal{D}|}{b \left( \#(w)/|\mathcal{D}| \right) \left( \#(c)/|\mathcal{D}| \right)}$.
Observation 2: $\frac{\#(w,c)}{|\mathcal{D}|} = \frac{1}{2T} \sum_{r=1}^T \left( \frac{\#(w,c)_{r\to}}{|\mathcal{D}_{r\to}|} + \frac{\#(w,c)_{r\leftarrow}}{|\mathcal{D}_{r\leftarrow}|} \right)$.
So it is sufficient to characterize $\#(w,c)_{r\to} / |\mathcal{D}_{r\to}|$ and $\#(w,c)_{r\leftarrow} / |\mathcal{D}_{r\leftarrow}|$.

16 DeepWalk Theorems
Theorem. Denote $P = D^{-1} A$. When the length of the random walk $L \to \infty$,
$\frac{\#(w,c)_{r\to}}{|\mathcal{D}_{r\to}|} \xrightarrow{p} \frac{d_w}{\mathrm{vol}(G)} (P^r)_{w,c}$ and $\frac{\#(w,c)_{r\leftarrow}}{|\mathcal{D}_{r\leftarrow}|} \xrightarrow{p} \frac{d_c}{\mathrm{vol}(G)} (P^r)_{c,w}$.
Theorem. When the length of the random walk $L \to \infty$, we have
$\frac{\#(w,c)}{|\mathcal{D}|} \xrightarrow{p} \frac{1}{2T} \sum_{r=1}^T \left( \frac{d_w}{\mathrm{vol}(G)} (P^r)_{w,c} + \frac{d_c}{\mathrm{vol}(G)} (P^r)_{c,w} \right)$.
Theorem. For DeepWalk, when the length of the random walk $L \to \infty$,
$\frac{\#(w,c)\,|\mathcal{D}|}{\#(w)\,\#(c)} \xrightarrow{p} \frac{\mathrm{vol}(G)}{2T} \left( \frac{1}{d_c} \sum_{r=1}^T (P^r)_{w,c} + \frac{1}{d_w} \sum_{r=1}^T (P^r)_{c,w} \right)$.

17 DeepWalk Conclusion
Theorem. DeepWalk is asymptotically and implicitly factorizing
$\log \left( \frac{\mathrm{vol}(G)}{b} \left( \frac{1}{T} \sum_{r=1}^T \left( D^{-1} A \right)^r \right) D^{-1} \right)$.

18 DeepWalk Roadmap
Input: G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding
Levy & Goldberg (NIPS 14); $A$: adjacency matrix, $D$: degree matrix, $b$: number of negative samples.

19 LINE
Objective of LINE:
$\mathcal{L} = \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \left( A_{i,j} \log g(x_i^\top y_j) + \frac{b\, d_i d_j}{\mathrm{vol}(G)} \log g(-x_i^\top y_j) \right)$.
Align it with the objective of SGNS:
$\mathcal{L} = \sum_w \sum_c \left( \#(w,c) \log g(x_w^\top y_c) + \frac{b \#(w) \#(c)}{|\mathcal{D}|} \log g(-x_w^\top y_c) \right)$.
LINE is actually factorizing
$\log \left( \frac{\mathrm{vol}(G)}{b} D^{-1} A D^{-1} \right)$.
Recall DeepWalk's matrix form:
$\log \left( \frac{\mathrm{vol}(G)}{b} \left( \frac{1}{T} \sum_{r=1}^T (D^{-1} A)^r \right) D^{-1} \right)$.
Observation: LINE is a special case of DeepWalk ($T = 1$).
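The $T = 1$ claim is easy to verify numerically; a sketch of both closed-form matrices (before the elementwise log), with the commented assert showing they coincide on any graph such as the toy A above:

    import numpy as np

    def deepwalk_matrix(A, T, b=1.0):
        """vol(G)/(bT) * (sum_{r=1}^T (D^{-1}A)^r) D^{-1}, the matrix inside
        DeepWalk's log (dense NumPy sketch)."""
        d = A.sum(axis=1)
        P = A / d[:, None]
        S = sum(np.linalg.matrix_power(P, r) for r in range(1, T + 1))
        return (A.sum() / (b * T)) * S @ np.diag(1.0 / d)

    def line_matrix(A, b=1.0):
        """vol(G)/b * D^{-1} A D^{-1}, the matrix inside LINE's log."""
        d = A.sum(axis=1)
        return (A.sum() / b) * np.diag(1.0 / d) @ A @ np.diag(1.0 / d)

    # LINE is the T = 1 special case of DeepWalk:
    # assert np.allclose(deepwalk_matrix(A, T=1), line_matrix(A))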

20 PTE
Figure 2: Heterogeneous Text Network.
word-word network $G_{ww}$, $A_{ww} \in \mathbb{R}^{\#\text{word} \times \#\text{word}}$.
document-word network $G_{dw}$, $A_{dw} \in \mathbb{R}^{\#\text{doc} \times \#\text{word}}$.
label-word network $G_{lw}$, $A_{lw} \in \mathbb{R}^{\#\text{label} \times \#\text{word}}$.

21 PTE as Implicit Matrix Factorization
$\log \begin{pmatrix} \alpha\, \mathrm{vol}(G_{ww}) (D_{\mathrm{row}}^{ww})^{-1} A_{ww} (D_{\mathrm{col}}^{ww})^{-1} \\ \beta\, \mathrm{vol}(G_{dw}) (D_{\mathrm{row}}^{dw})^{-1} A_{dw} (D_{\mathrm{col}}^{dw})^{-1} \\ \gamma\, \mathrm{vol}(G_{lw}) (D_{\mathrm{row}}^{lw})^{-1} A_{lw} (D_{\mathrm{col}}^{lw})^{-1} \end{pmatrix} - \log b$.
The matrix is of shape $(\#\text{word} + \#\text{doc} + \#\text{label}) \times \#\text{word}$.
$b$ is the number of negative samples in training.
$\{\alpha, \beta, \gamma\}$ are hyper-parameters balancing the weights of the three networks.
In PTE, $\{\alpha, \beta, \gamma\}$ satisfy $\alpha\, \mathrm{vol}(G_{ww}) = \beta\, \mathrm{vol}(G_{dw}) = \gamma\, \mathrm{vol}(G_{lw})$.

22 node2vec: 2nd-Order Random Walk
$T_{u,v,w} = \begin{cases} \frac{1}{p} & (u,v) \in E, (v,w) \in E, u = w; \\ 1 & (u,v) \in E, (v,w) \in E, u \neq w, (w,u) \in E; \\ \frac{1}{q} & (u,v) \in E, (v,w) \in E, u \neq w, (w,u) \notin E; \\ 0 & \text{otherwise}. \end{cases}$
$P_{u,v,w} = \mathrm{Prob}(w_{j+1} = u \mid w_j = v, w_{j-1} = w) = \frac{T_{u,v,w}}{\sum_u T_{u,v,w}}$.
Stationary distribution: $\sum_w P_{u,v,w} X_{v,w} = X_{u,v}$.
Existence is guaranteed by the Perron-Frobenius theorem, but it may not be unique.
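A brute-force sketch of this edge-level stationary distribution via power iteration; the dense $O(n^3)$ tensor restricts it to toy graphs, and everything beyond the slide's definitions (initialization, iteration count) is an assumption:

    import numpy as np

    def node2vec_stationary(A, p, q, iters=500):
        """Edge-level stationary distribution X of the 2nd-order walk by
        power iteration (a sketch). Existence follows from Perron-Frobenius;
        uniqueness is not guaranteed, per the slide."""
        n = len(A)
        E = A > 0
        # T3[u, v, w]: unnormalized weight of stepping v -> u, given the
        # previous vertex was w (the slide's T_{u,v,w}).
        T3 = np.zeros((n, n, n))
        for v in range(n):
            for w in range(n):
                if not E[w, v]:
                    continue
                for u in range(n):
                    if E[v, u]:
                        T3[u, v, w] = 1/p if u == w else (1.0 if E[w, u] else 1/q)
        Z = T3.sum(axis=0, keepdims=True)
        P2 = np.divide(T3, Z, out=np.zeros_like(T3), where=Z > 0)
        X = E.T / E.sum()                      # start uniform over edge states
        for _ in range(iters):
            X = np.einsum('uvw,vw->uv', P2, X) # X_{u,v} = sum_w P_{u,v,w} X_{v,w}
        return X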

23 node2vec as Implicit Matrix Factorization
Theorem. node2vec is asymptotically and implicitly factorizing a matrix whose entry at the $w$-th row and $c$-th column is
$\log \frac{ \frac{1}{2T} \sum_{r=1}^T \left( \sum_u X_{w,u} P^r_{c,w,u} + \sum_u X_{c,u} P^r_{w,c,u} \right) }{ b \left( \sum_u X_{w,u} \right) \left( \sum_u X_{c,u} \right) }$.

24 Contents
Preliminaries
  Notations
Main Theoretic Results
  DeepWalk (KDD 14)
  LINE (WWW 15)
  PTE (KDD 15)
  node2vec (KDD 16)
NetMF
  NetMF for a Small Window Size T
  NetMF for a Large Window Size T
Experiments

25 Roadmap
Input: G = (V, E) → Random Walk → Skip-gram → Output: Node Embedding
Levy & Goldberg (NIPS 14) → Matrix Factorization

26 NetMF
Factorize the DeepWalk matrix:
$\log \left( \frac{\mathrm{vol}(G)}{b} \left( \frac{1}{T} \sum_{r=1}^T \left( D^{-1} A \right)^r \right) D^{-1} \right)$.
For numerical reasons, we use the truncated logarithm $\widetilde{\log}(x) = \log(\max(1, x))$.
Figure 3: Truncated Logarithm.

27 NetMF for a Small Window Size T
Algorithm 2: NetMF for a Small Window Size T
1 Compute $P^1, \dots, P^T$;
2 Compute $M = \frac{\mathrm{vol}(G)}{bT} \left( \sum_{r=1}^T P^r \right) D^{-1}$;
3 Compute $M' = \max(M, 1)$;
4 Rank-$d$ approximation by SVD: $\log M' = U_d \Sigma_d V_d^\top$;
5 return $U_d \sqrt{\Sigma_d}$ as network embedding.
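A dense NumPy sketch of Algorithm 2 (a re-implementation in spirit, not the authors' code; the defaults for T, b, dim are assumptions):

    import numpy as np

    def netmf_small_T(A, T=10, b=1.0, dim=128):
        """Sketch of Algorithm 2 (NetMF for a small window size T)."""
        n = A.shape[0]
        d = A.sum(axis=1)
        vol_G = A.sum()
        P = A / d[:, None]                      # P = D^{-1} A
        S, Pr = np.zeros((n, n)), np.eye(n)
        for _ in range(T):                      # steps 1-2: sum_{r=1}^T P^r
            Pr = Pr @ P
            S += Pr
        M = (vol_G / (b * T)) * S @ np.diag(1.0 / d)
        logM = np.log(np.maximum(M, 1.0))       # step 3: truncated logarithm
        U, sigma, _ = np.linalg.svd(logM)       # step 4: rank-d SVD
        dim = min(dim, n)
        return U[:, :dim] * np.sqrt(sigma[:dim])  # step 5: U_d sqrt(Sigma_d)

The released code at github.com/xptree/netmf implements the same steps with sparse matrices and more care.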

28 NetMF for a Large Window Size T: Observations
We want to factorize $\log \left( \frac{\mathrm{vol}(G)}{b} \left( \frac{1}{T} \sum_{r=1}^T \left( D^{-1} A \right)^r \right) D^{-1} \right)$.
We know the spectral properties of the normalized adjacency matrix:
$D^{-1/2} A D^{-1/2} = U \Lambda U^\top$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ and $\lambda_i \in [-1, 1]$.
Then
$\frac{1}{T} \sum_{r=1}^T \left( D^{-1} A \right)^r D^{-1} = D^{-1/2} \left( \frac{1}{T} \sum_{r=1}^T \left( D^{-1/2} A D^{-1/2} \right)^r \right) D^{-1/2} = D^{-1/2} U \underbrace{\left( \frac{1}{T} \sum_{r=1}^T \Lambda^r \right)}_{\text{a polynomial}} U^\top D^{-1/2}$.
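The identity can be checked numerically on any small graph (such as the toy A above); a sketch:

    import numpy as np

    def check_filter_identity(A, T=5):
        """Numerically verify (1/T) sum_r (D^{-1}A)^r D^{-1}
        = D^{-1/2} U ((1/T) sum_r Lambda^r) U^T D^{-1/2}."""
        d = A.sum(axis=1)
        Dis = np.diag(1.0 / np.sqrt(d))          # D^{-1/2}
        lam, U = np.linalg.eigh(Dis @ A @ Dis)   # Lambda, U
        f = sum(lam ** r for r in range(1, T + 1)) / T   # the polynomial filter
        lhs = sum(np.linalg.matrix_power(np.diag(1.0 / d) @ A, r)
                  for r in range(1, T + 1)) / T @ np.diag(1.0 / d)
        rhs = Dis @ U @ np.diag(f) @ U.T @ Dis
        assert np.allclose(lhs, rhs)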

29 NetMF for a Large Window Size T: Observations
[Figure 4: the filter $f(\lambda) = \frac{1}{T} \sum_{r=1}^T \lambda^r$, plotting eigenvalues after filtering against eigenvalues before filtering for several window sizes (T = 1, 2, 5, ...)]
Idea: This polynomial implicitly filters out negative eigenvalues and small positive eigenvalues, so why not do it explicitly?

30 NetMF for a Large Window Size T: Algorithm
Algorithm 3: NetMF for a Large Window Size T
1 Eigen-decomposition $D^{-1/2} A D^{-1/2} \approx U_h \Lambda_h U_h^\top$;
2 Approximate $M$ with $\hat{M} = \frac{\mathrm{vol}(G)}{b} D^{-1/2} U_h \left( \frac{1}{T} \sum_{r=1}^T \Lambda_h^r \right) U_h^\top D^{-1/2}$;
3 Compute $\hat{M}' = \max(\hat{M}, 1)$;
4 Rank-$d$ approximation by SVD: $\log \hat{M}' = U_d \Sigma_d V_d^\top$;
5 return $U_d \sqrt{\Sigma_d}$ as network embedding.
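A sketch of Algorithm 3 using SciPy's Lanczos-based eigsh for the top-h eigenpairs (the defaults for T, b, h, dim and the 'LA' ordering are assumptions, not from the slide):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigsh

    def netmf_large_T(A, T=10, b=1.0, h=256, dim=128):
        """Sketch of Algorithm 3: rank-h eigen-approximation of the
        normalized adjacency, then the spectral filter. A is a scipy.sparse
        adjacency matrix."""
        n = A.shape[0]
        d = np.asarray(A.sum(axis=1)).ravel()
        vol_G = d.sum()
        Dis = sp.diags(1.0 / np.sqrt(d))                # D^{-1/2}
        lam, U = eigsh(Dis @ A @ Dis, k=min(h, n - 1), which='LA')  # step 1
        f = sum(lam ** r for r in range(1, T + 1)) / T  # (1/T) sum_r Lambda^r
        Y = Dis @ U                                     # D^{-1/2} U_h
        M_hat = (vol_G / b) * (Y * f) @ Y.T             # step 2
        logM = np.log(np.maximum(M_hat, 1.0))           # step 3
        Ud, sigma, _ = np.linalg.svd(logM)              # step 4
        dim = min(dim, n)
        return Ud[:, :dim] * np.sqrt(sigma[:dim])       # step 5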

31 Setup
Label classification on BlogCatalog, PPI, Wikipedia, and Flickr, with logistic regression.
NetMF (T = 1) vs. LINE; NetMF (T = 10) vs. DeepWalk.
Table 1: Statistics of Datasets.
Dataset    BlogCatalog   PPI      Wikipedia   Flickr
|V|        10,312        3,890    4,777       80,513
|E|        333,983       76,584   184,812     5,899,882
#Labels    39            50       40          195
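A scikit-learn sketch of the evaluation protocol (one-vs-rest logistic regression, Micro-/Macro-F1; using plain predict() here is a simplification of the usual top-k multi-label protocol, an assumption):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier

    def evaluate(emb, Y, train_ratio=0.5, seed=0):
        """One-vs-rest logistic regression on node embeddings, returning
        (Micro-F1, Macro-F1). Y is a binary label-indicator matrix."""
        Xtr, Xte, Ytr, Yte = train_test_split(
            emb, Y, train_size=train_ratio, random_state=seed)
        clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
        clf.fit(Xtr, Ytr)
        pred = clf.predict(Xte)
        return (f1_score(Yte, pred, average='micro'),
                f1_score(Yte, pred, average='macro'))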

32 Experimental Results
[Figure 5: Predictive performance on varying the ratio of training data, with one column per dataset (BlogCatalog, PPI, Wikipedia, Flickr) and curves for NetMF (T=1), LINE, NetMF (T=10), and DeepWalk. The x-axis is the ratio of labeled data (%); the y-axes in the top and bottom rows are Micro-F1 (%) and Macro-F1 (%), respectively.]

33 Conclusion
Table 2: The matrices that are implicitly approximated and factorized by DeepWalk, LINE, PTE, and node2vec.
DeepWalk: $\log \left( \mathrm{vol}(G) \left( \frac{1}{T} \sum_{r=1}^T \left( D^{-1} A \right)^r \right) D^{-1} \right) - \log b$
LINE: $\log \left( \mathrm{vol}(G)\, D^{-1} A D^{-1} \right) - \log b$
PTE: $\log \begin{pmatrix} \alpha\, \mathrm{vol}(G_{ww}) (D_{\mathrm{row}}^{ww})^{-1} A_{ww} (D_{\mathrm{col}}^{ww})^{-1} \\ \beta\, \mathrm{vol}(G_{dw}) (D_{\mathrm{row}}^{dw})^{-1} A_{dw} (D_{\mathrm{col}}^{dw})^{-1} \\ \gamma\, \mathrm{vol}(G_{lw}) (D_{\mathrm{row}}^{lw})^{-1} A_{lw} (D_{\mathrm{col}}^{lw})^{-1} \end{pmatrix} - \log b$
node2vec: $\log \frac{ \frac{1}{2T} \sum_{r=1}^T \left( \sum_u X_{w,u} P^r_{c,w,u} + \sum_u X_{c,u} P^r_{w,c,u} \right) }{ \left( \sum_u X_{w,u} \right) \left( \sum_u X_{c,u} \right) } - \log b$

34 Thanks
"Standing on the shoulders of giants." — Isaac Newton
Code available at github.com/xptree/netmf
Q&A

35 DeepWalk: Sketched Proof
Theorem. Denote $P = D^{-1} A$. When $L \to \infty$, we have
$\frac{\#(w,c)_{r\to}}{|\mathcal{D}_{r\to}|} \xrightarrow{p} \frac{d_w}{\mathrm{vol}(G)} (P^r)_{w,c}$ and $\frac{\#(w,c)_{r\leftarrow}}{|\mathcal{D}_{r\leftarrow}|} \xrightarrow{p} \frac{d_c}{\mathrm{vol}(G)} (P^r)_{c,w}$.
Proof. Consider the special case $N = 1$, so we have a single vertex sequence $w_1, \dots, w_L$ generated by a random walk. Let $Y_j$ ($j = 1, \dots, L - T$) be the indicator of the event that $w_j = w$ and $w_{j+r} = c$.

36 Proof (Cont'd)
Observation:
$\mathbb{E}[Y_j] = \mathrm{Prob}(w_j = w, w_{j+r} = c)$;
$\frac{\#(w,c)_{r\to}}{|\mathcal{D}_{r\to}|} = \frac{1}{L-T} \sum_{j=1}^{L-T} Y_j$;
$\mathrm{Cov}(Y_i, Y_j) \to 0$ as $|i - j| \to \infty$.
Lemma (S. N. Bernstein's Law of Large Numbers). Let $Y_1, Y_2, \dots$ be a sequence of random variables with finite expectations $\mathbb{E}[Y_j]$, variances $\mathrm{Var}(Y_j) < K$ for all $j \geq 1$, and covariances such that $\mathrm{Cov}(Y_i, Y_j) \to 0$ as $|i - j| \to \infty$. Then the law of large numbers (LLN) holds. Therefore
$\frac{\#(w,c)_{r\to}}{|\mathcal{D}_{r\to}|} = \frac{1}{L-T} \sum_{j=1}^{L-T} Y_j \xrightarrow{p} \frac{1}{L-T} \sum_{j=1}^{L-T} \mathbb{E}[Y_j] \to \frac{d_w}{\mathrm{vol}(G)} (P^r)_{w,c}$.
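The convergence can be spot-checked by simulation; a sketch that runs one long walk and compares the empirical co-occurrence frequency with its limit (the walk length L is an assumed setting):

    import numpy as np

    def cooccurrence_check(P, d, vol_G, w, c, r, L=200_000, seed=0):
        """Monte Carlo spot-check: along one long walk, the frequency of
        (w_j = w, w_{j+r} = c) approaches (d_w / vol(G)) (P^r)_{w,c}."""
        rng = np.random.default_rng(seed)
        n = P.shape[0]
        walk = [rng.integers(n)]
        for _ in range(L - 1):
            walk.append(rng.choice(n, p=P[walk[-1]]))
        walk = np.asarray(walk)
        empirical = np.mean((walk[:-r] == w) & (walk[r:] == c))
        limit = d[w] / vol_G * np.linalg.matrix_power(P, r)[w, c]
        return empirical, limit

    # e.g. cooccurrence_check(P, d, vol_G, w=2, c=3, r=2) on the toy graph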

37 Time Complexity
Eigen-decomposition (implicitly restarted Lanczos method): $O(mhI + nh^2 I + h^3 I)$.
Reconstruction: $O(n^2 h)$.
Element-wise logarithm: $O(n^2)$.
SVD (a naive implementation via eigen-decomposition): $O(n^2 d I + n d^2 I + d^3 I)$.

38 Future Work
Comprehend high-order cases, e.g., node2vec:
$\log \frac{ \frac{1}{2T} \sum_{r=1}^T \left( \sum_u X_{w,u} P^r_{c,w,u} + \sum_u X_{c,u} P^r_{w,c,u} \right) }{ b \left( \sum_u X_{w,u} \right) \left( \sum_u X_{c,u} \right) }$
Design scalable algorithms (e.g., using spectral sparsification of random-walk polynomials) for
$\log \left( \frac{\mathrm{vol}(G)}{b} \left( \frac{1}{T} \sum_{r=1}^T \left( D^{-1} A \right)^r \right) D^{-1} \right)$.
Connection with graph convolutional networks (Kipf & Welling, ICLR 17).
