Data Mining and Matrices
|
|
- Jeffrey Barnett
- 5 years ago
- Views:
Transcription
1 Data Mining and Matrices 6 Non-Negative Matrix Factorization Rainer Gemulla, Pauli Miettinen May 23, 23
2 Non-Negative Datasets Some datasets are intrinsically non-negative: Counters (e.g., no. occurrences of each word in a text document) Quantities (e.g., amount of each ingredient in a chemical experiment) Intensities (e.g., intensity of each color in an image) The corresponding data matrix D has only non-negative values. Decompositions such as SVD and SDD may involve negative values in factors and components Negative values describe the absence of something Often no natural interpretation Can we find a decomposition that is more natural to non-negative data? 2 / 39
3 Example (SVD) Consider the following bridge matrix and its truncated SVD: D = U Σ V T Here are the corresponding components: D = U D V T + U 2 D 22 V T 2 Negative values make interpretation unnatural or difficult. 3 / 39
4 Outline Non-Negative Matrix Factorization 2 Algorithms 3 Probabilistic Latent Semantic Analysis 4 Summary 4 / 39
5 Non-Negative Matrix Factorization (NMF) Definition (Non-negative matrix factorization, basic form) Given a non-negative matrix D R m n +, a non-negative matrix factorization of rank k is D LR, where L R m r + and R R r n + are both non-negative. Additive decomposition: factors and components non-negative No cancellation effects Rows of R can be thought as parts Row of D obtained by mixing (or assembling ) parts in L Smallest r such that D = LR exists is called non-negative rank of D rank(d) rank + (D) min { m, n } 5 / 39
6 Example (NMF) Consider the following bridge matrix and its rank-2 NMF: D = L R Here are the corresponding components: D = L R + L 2 R 2 Non-negative matrix decomposition encourage a more natural, part-based representation and (sometimes) sparsity. 6 / 39
7 Decomposing faces (PCA) D i (original) [LR] i = L i R [UΣV T ] i = U i Σ V T Lee and Seung, 999. PCA factors are hard to interpret. 7 / 39
8 Decomposing faces (NMF) D i (original) [LR] i = L i R NMF factors correspond to parts of faces. Lee and Seung, / 39
9 Decomposing digits (NMF) D R NMF factors correspond to parts of digits and background. Cichocki et al., / 39
10 Some applications Cichocki et al., 29. Text mining (more later) Bioinformatics Microarray analysis Mineral exploration Neuroscience Image understanding Air pollution research Chemometrics Spectral data analysis Linear sparse coding Image classification Clustering Neural learning process Sound recognition Remote sensing Object characterization... / 39
11 Gaussian NMF Gaussian NMF is the most basic form of non-negative factorizations: minimize D LR 2 F s. t. L R m r + R R r n + Truncated SVD minimizes the same objective (but without non-negativity constraints) Many other variants exist Different objective functions (e.g., KL-divergence) Additional regularizations (e.g., L -regularization) Different constraints (e.g., orthogonality of R) Different compositions (e.g., 3 matrices) multi-layer NMF, semi-nmf, sparse NMF, tri-nmf, symmetric NMF, orthogonal NMF, non-smooth NMF (nsnmf), overlapping NMF, convolutive NMF (CNMF), k-means,... / 39
12 k-means can be seen as a variant of NMF Additional constraint: L contains exactly one in each row, rest D i (original) [LR] i = L i R k-means factors correspond to prototypical faces. Lee and Seung, / 39
13 NMF is not unique Factors are not ordered = = One way of ordering: decreasing Frobenius norm of components (i.e., order by L k R k F ) Factors/components are not unique = + = = + Additional constraints or regularization can encourage uniqueness. 3 / 39
14 NMF is not hierarchical Rank- NMF = Rank-2 NMF = + = Best rank-k approximation may differ significantly from best rank-(k ) approximation Rank influences sparsity, interpretability, and statistical fidelity Optimum choice of rank is not well-studied (often requires experimentation) 4 / 39
15 Outline Non-Negative Matrix Factorization 2 Algorithms 3 Probabilistic Latent Semantic Analysis 4 Summary 5 / 39
16 NMF is difficult We focus on minimizing L(L, R) = D LR 2 F. For varying m, n, and r, problem is NP-hard When rank(d) = (or r = ), can be solved in polynomial time Take first non-zero column of D as L m 2 Determine R n entry by entry (using the fact that D j = LR j ) Problem is not convex Local optimum may not correspond to global optimum Generally little hope to find global optimum But: Problem is biconvex For fixed R, f (L) = D LR 2 F is convex f (L) = i D i L i R 2 F Lik f (L) = 2(D i L i R)R T k (chain rule) (product rule) 2 L ik f (L) = 2R k R T k (does not depend on L) For fixed L, f (R) = D LR 2 F Allows for efficient algorithms is convex 6 / 39
17 General framework Gradient descent generally slow Stochastic gradient descent inappropriate Key approach: alternating minimization : Pick starting point L and R 2: while not converged do 3: Keep R fixed, optimize L 4: Keep L fixed, optimize R 5: end while Update steps 3 and 4 easier than full problem Also called alternating projections or (block) coordinate descent Starting point Random Multi-start initialization: try multiple random starting points, run a few epochs, continue with best Based on SVD... 7 / 39
18 Example Ignore non-negativity for now. Consider the regularized least-square error: L(L, R) = D LR 2 F + λ( L 2 F + R 2 F ) By setting m = n = r =, D = () and λ =.5, we obtain L(l, r) = ( lr) 2 +.5(l 2 + r 2 ) L(l,r) 5 l f (l) = 2r( lr) +.l 4 r f (r) = 2l( lr) +.r Local optima: 3 ( ) ( , 9 9 2, 2, 2 r Stationary point: (,) 9 2 ) 2 2 l 8 / 39
19 Example (ALS) f (l, r) = ( lr) 2 +.5(l 2 + r 2 ) l min l f (l) = r min r f (r) = Step l r Converges to local minimum 2r 2r l 2l 2 +. r / 39
20 Example (ALS) f (l, r) = ( lr) 2 +.5(l 2 + r 2 ) l min l f (l) = r min r f (r) = Step l r 2.. Converges to stationary point. 2r 2r l 2l 2 +. r / 39
21 Alternating non-negative least squares (ANLS) Uses non-negative least squares approximation of L and R: argmin D LR 2 F and argmin D LR 2 F L R m r + R R r n + Equivalently: find non-negative least squares solution to LR = D Common approach: Solve unconstrained least squares problems and remove negative values. E.g., when columns (rows) of L (R) are linearly independent, set L = [DR ] ɛ and R = [L D] ɛ where R = R T (RR T ) is the right pseudo-inverse of R L = (L T L) L T is the left pseudo-inverse of L [a]ɛ = max { ɛ, a } for ɛ = or some small constant (e.g., ɛ = 9 ) Difficult to analyze due to non-linear update steps Often slow convergence to a bad local minimum (better when regularized) 2 / 39
22 Example (ANLS) f (l, r) = ( lr) 2 +.5(l 2 + r 2 ) and set ɛ = 9 [ ] 2r l 2r 2 +. ɛ [ ] r 2l 2l 2 +. Step l r Converges to local minimum ɛ r / 39
23 Hierarchical alternating least squares (HALS) Work locally on a single factor, then proceed to next factor, and so on Let D (k) be the residual matrix (error) when k-th factor is removed: D (k) = D LR + L k R k = D k k L k R k HALS minimizes D (k) L k R k 2 F for k =, 2,..., r,,... (equivalently: finds best solution for k-th factor, fixing the rest) In each iteration, set (once or multiple times): [ ] L k = R k 2 D (k) R T k and R T k = ] [(D (k) F ɛ L k 2 ) T L k F D (k) can be incrementally maintained fast implementation D (k+) = D (k) + L k R k L (k+) R (k+) ɛ Often better performance in practice than ANLS Converges to stationary point when initialized with positive matrix and sufficiently small ɛ 23 / 39
24 Multiplicative updates Gradient descent step with step size η kj R kj R kj + η kj ([L T D] kj [L T LR] kj ) Setting η kj = R kj (L T LR) kj, we obtain the multiplicative update rules L L DRT LRR T and R R LT D L T LR, where multiplication ( ) and division are element-wise Does not necessarily find optimum L (or R), but can be shown to never increase loss Faster than ANLS (no computation of pseudo-inverse), easy to implement and parallelize Zeros in factors are problematic (divisions become undefined) L L [DRT ] ɛ LRR T + ɛ and R R [LT D] ɛ L T LR + ɛ Lee and Seung, 2. Cichocki et al., / 39
25 Example (multiplicative updates) f (l, r) = ( lr) 2 +.5(l 2 + r 2 ) l l r.5l lr 2 r r l.5r l 2 r Step l r Converges to local minimum r / 39
26 Outline Non-Negative Matrix Factorization 2 Algorithms 3 Probabilistic Latent Semantic Analysis 4 Summary 26 / 39
27 Topic modeling Consider a document-word matrix constructed from some corpus D = air water pollution democrat republican doc doc doc 3 doc doc 5 Documents seem to talk about two topics Environment (with words air, water, and pollution) 2 Congress (with words democrat and republican) Can we automatically detect topics in documents? 27 / 39
28 A probabilistic viewpoint Let s normalize such that the entries sum to unity D = air water pollution democrat republican doc doc doc doc doc Put all words in an urn and draw. The probability to draw word w from document d is given by P(d, w) = D dw Matrix D can represent any probability distribution plsa tries to find a distribution that is close to D but exposes information about topics 28 / 39
29 Probabilistic latent semantic analysis (plsa) Definition (plsa, NMF formulation) Given a rank r, find matrices L, Σ, and R such that where D LΣR L m r is a non-negative, column-stochastic matrix (columns sum to unity), Σ r r is a non-negative, diagonal matrix that sums to unity, and R r n is a non-negative, row-stochastic matrix (rows sum to unity). is usually taken to be the (generalized) KL divergence Additional regularization or tempering necessary to avoid overfitting Hofmann, / 39
30 Example plsa factorization of example matrix air wat pol dem rep air wat pol dem rep D L Σ R Rank r corresponds to number of topics Σ kk corresponds to overall frequency of topic k L dk corresponds to contribution of document d to topic k R kw corresponds to frequency of word w in topic k.9.6 plsa constraints allow for probabilistic interpretation P(d, w) [LΣR] dw = k Σ kkl dk R kw = k P(k)P(d k)p(w k) plsa model imposes conditional independence constraints restricted space of distributions 3 / 39
31 Another example Concepts ( of 28) extracted from Science Magazine articles (2K) Hofmann, / 39
32 plsa geometry Rewrite probabilistic formulation P(d, w) = k P(w d) = z P(k)P(d k)p(w k) P(w z)p(z d) Generative process of creating a word Pick a document according to P(d) 2 Select a topic acc. to P(z d) 3 Select a word acc. to P(w z) Hofmann, / 39
33 Kullback-Leibler divergence () Let D be the unnormalized word-count data and denote by N total number of words Likelihood of seeing D when drawing N words with replacement is proportional to m n P(d, w) D dw d= w= plsa maximizes the log-likelihood of seeing the data given the model m n log P( D L, Σ, R) D dw log P(d, w L, Σ, R) Gaussier and Goutte, 25. d= w= = m n D log [LΣR] d= w= dw m n D dw D dw log +c D [LΣR] d= w= dw }{{} Kullback-Leibler divergence 33 / 39
34 Kullback-Leibler divergence (2) KL divergence D KL (P Q) = m n d= w= P dw log P dw Q dw Interpretation: expected number of extra bits for encoding a value drawn from P using an optimum code for distribution Q D KL (P Q) D KL (P P) = D KL (P Q) D KL (Q P) NMF-based plsa algorithms minimize the generalized KL divergence D GKL ( P Q) = m d= w= where P = D and Q = L ΣR n ( P dw log P dw Q P dw + Q dw ), dw 34 / 39
35 Multiplicative updates for GKL (w/o tempering) We first find a decomposition D L R, where L and R are non-negative matrices Update rules L L D R T diag(/ rowsums( R)) L R R R diag(/ colsums( L)) L T D L R GKL is non-increasing under these update rules Normalize by rescaling columns of L and rows of R to obtain L = L diag(/ colsums( L)) R = diag(/ rowsums( R)) R Σ = diag(colsums( L) rowsums( R)) Σ = Σ/ k Σ kk Lee and Seung, / 39
36 Applications of plsa Topic modeling Clustering documents Clustering terms Information retrieval Treat query q as a new document (new row in D and L) Determine P(k q) by keeping Σ and R fixed ( fold in the query) Retrieve documents with similar topic mixture as query Can deal with synonymy and polysemy Better generalization performance than LSA (=SVD), esp. with tempering In practice, outperformed by Latent Dirichlet Allocation (LDA) Hofmann, / 39
37 Outline Non-Negative Matrix Factorization 2 Algorithms 3 Probabilistic Latent Semantic Analysis 4 Summary 37 / 39
38 Lessons learned Non-negative matrix factorization (NMF) appears natural for non-negative data NMF encourages parts-based decomposition, interpretability, and (sometimes) sparseness Many variants, many applications Usually solved via alternating minimization algorithms Alternating non-negative least squares (ANLS) Projected gradient local hierarchical ALS (HALS) Multiplicative updates plsa is an approach to topic modeling that can be seen as an NMF 38 / 39
39 Literature David Skillicorn Understanding Complex Datasets: Data Mining with Matrix Decompositions (Chapter 8) Chapman and Hall, 27 Andrzej Cichocki, Rafal Zdunek, Anh Huy Phan, and Shun-ichi Amari Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation Wiley, 29 Yifeng Li and Alioune Ngom The NMF MATLAB Toolbox Renaud Gaujoux and Cathal Seoighe NMF R package References given at bottom of slides 39 / 39
Non-Negative Matrix Factorization
Chapter 3 Non-Negative Matrix Factorization Part 2: Variations & applications Geometry of NMF 2 Geometry of NMF 1 NMF factors.9.8 Data points.7.6 Convex cone.5.4 Projections.3.2.1 1.5 1.5 1.5.5 1 3 Sparsity
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationData Mining and Matrices
Data Mining and Matrices 05 Semi-Discrete Decomposition Rainer Gemulla, Pauli Miettinen May 16, 2013 Outline 1 Hunting the Bump 2 Semi-Discrete Decomposition 3 The Algorithm 4 Applications SDD alone SVD
More informationNon-Negative Matrix Factorization
Chapter 3 Non-Negative Matrix Factorization Part : Introduction & computation Motivating NMF Skillicorn chapter 8; Berry et al. (27) DMM, summer 27 2 Reminder A T U Σ V T T Σ, V U 2 Σ 2,2 V 2.8.6.6.3.6.5.3.6.3.6.4.3.6.4.3.3.4.5.3.5.8.3.8.3.3.5
More informationPROBABILISTIC LATENT SEMANTIC ANALYSIS
PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications
More informationNon-negative Matrix Factorization: Algorithms, Extensions and Applications
Non-negative Matrix Factorization: Algorithms, Extensions and Applications Emmanouil Benetos www.soi.city.ac.uk/ sbbj660/ March 2013 Emmanouil Benetos Non-negative Matrix Factorization March 2013 1 / 25
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Yuriy Sverchkov Intelligent Systems Program University of Pittsburgh October 6, 2011 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa)
More informationMatrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang
Matrix Factorization & Latent Semantic Analysis Review Yize Li, Lanbo Zhang Overview SVD in Latent Semantic Indexing Non-negative Matrix Factorization Probabilistic Latent Semantic Indexing Vector Space
More informationCOMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017
COMS 4721: Machine Learning for Data Science Lecture 18, 4/4/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University TOPIC MODELING MODELS FOR TEXT DATA
More informationNon-negative matrix factorization with fixed row and column sums
Available online at www.sciencedirect.com Linear Algebra and its Applications 9 (8) 5 www.elsevier.com/locate/laa Non-negative matrix factorization with fixed row and column sums Ngoc-Diep Ho, Paul Van
More informationNonnegative Matrix Factorization
Nonnegative Matrix Factorization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationNote on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing
Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing 1 Zhong-Yuan Zhang, 2 Chris Ding, 3 Jie Tang *1, Corresponding Author School of Statistics,
More informationSimplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm
Simplicial Nonnegative Matrix Tri-Factorization: Fast Guaranteed Parallel Algorithm Duy-Khuong Nguyen 13, Quoc Tran-Dinh 2, and Tu-Bao Ho 14 1 Japan Advanced Institute of Science and Technology, Japan
More informationRank Selection in Low-rank Matrix Approximations: A Study of Cross-Validation for NMFs
Rank Selection in Low-rank Matrix Approximations: A Study of Cross-Validation for NMFs Bhargav Kanagal Department of Computer Science University of Maryland College Park, MD 277 bhargav@cs.umd.edu Vikas
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa VS model in practice Document and query are represented by term vectors Terms are not necessarily orthogonal to each other Synonymy: car v.s. automobile Polysemy:
More informationCS 572: Information Retrieval
CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).
More informationCPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018
CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,
More information1 Non-negative Matrix Factorization (NMF)
2018-06-21 1 Non-negative Matrix Factorization NMF) In the last lecture, we considered low rank approximations to data matrices. We started with the optimal rank k approximation to A R m n via the SVD,
More informationMULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION
MULTIPLICATIVE ALGORITHM FOR CORRENTROPY-BASED NONNEGATIVE MATRIX FACTORIZATION Ehsan Hosseini Asl 1, Jacek M. Zurada 1,2 1 Department of Electrical and Computer Engineering University of Louisville, Louisville,
More informationc Springer, Reprinted with permission.
Zhijian Yuan and Erkki Oja. A FastICA Algorithm for Non-negative Independent Component Analysis. In Puntonet, Carlos G.; Prieto, Alberto (Eds.), Proceedings of the Fifth International Symposium on Independent
More informationProbabilistic Latent Variable Models as Non-Negative Factorizations
1 Probabilistic Latent Variable Models as Non-Negative Factorizations Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis Abstract In this paper, we present a family of probabilistic latent variable models
More informationLatent Semantic Analysis. Hongning Wang
Latent Semantic Analysis Hongning Wang CS@UVa Recap: vector space model Represent both doc and query by concept vectors Each concept defines one dimension K concepts define a high-dimensional space Element
More informationOrthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds
Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds Jiho Yoo and Seungjin Choi Department of Computer Science Pohang University of Science and Technology San 31 Hyoja-dong,
More informationInformation Retrieval
Introduction to Information CS276: Information and Web Search Christopher Manning and Pandu Nayak Lecture 13: Latent Semantic Indexing Ch. 18 Today s topic Latent Semantic Indexing Term-document matrices
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Probabilistic Latent Semantic Analysis Denis Helic KTI, TU Graz Jan 16, 2014 Denis Helic (KTI, TU Graz) KDDM1 Jan 16, 2014 1 / 47 Big picture: KDDM
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationInformation Retrieval and Topic Models. Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin)
Information Retrieval and Topic Models Mausam (Based on slides of W. Arms, Dan Jurafsky, Thomas Hofmann, Ata Kaban, Chris Manning, Melanie Martin) Sec. 1.1 Unstructured data in 1620 Which plays of Shakespeare
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering
More informationQuick Introduction to Nonnegative Matrix Factorization
Quick Introduction to Nonnegative Matrix Factorization Norm Matloff University of California at Davis 1 The Goal Given an u v matrix A with nonnegative elements, we wish to find nonnegative, rank-k matrices
More informationInformation retrieval LSI, plsi and LDA. Jian-Yun Nie
Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and
More informationNon-Negative Matrix Factorization with Quasi-Newton Optimization
Non-Negative Matrix Factorization with Quasi-Newton Optimization Rafal ZDUNEK, Andrzej CICHOCKI Laboratory for Advanced Brain Signal Processing BSI, RIKEN, Wako-shi, JAPAN Abstract. Non-negative matrix
More informationcross-language retrieval (by concatenate features of different language in X and find co-shared U). TOEFL/GRE synonym in the same way.
10-708: Probabilistic Graphical Models, Spring 2015 22 : Optimization and GMs (aka. LDA, Sparse Coding, Matrix Factorization, and All That ) Lecturer: Yaoliang Yu Scribes: Yu-Xiang Wang, Su Zhou 1 Introduction
More informationSingle-channel source separation using non-negative matrix factorization
Single-channel source separation using non-negative matrix factorization Mikkel N. Schmidt Technical University of Denmark mns@imm.dtu.dk www.mikkelschmidt.dk DTU Informatics Department of Informatics
More informationData Mining and Matrices
Data Mining and Matrices 08 Boolean Matrix Factorization Rainer Gemulla, Pauli Miettinen June 13, 2013 Outline 1 Warm-Up 2 What is BMF 3 BMF vs. other three-letter abbreviations 4 Binary matrices, tiles,
More informationNotes on Latent Semantic Analysis
Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically
More informationMachine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component
More informationLatent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology
Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2014 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,
More informationhttps://goo.gl/kfxweg KYOTO UNIVERSITY Statistical Machine Learning Theory Sparsity Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT OF INTELLIGENCE SCIENCE AND TECHNOLOGY 1 KYOTO UNIVERSITY Topics:
More informationLatent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology
Latent Semantic Indexing (LSI) CE-324: Modern Information Retrieval Sharif University of Technology M. Soleymani Fall 2016 Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276,
More informationRegularized Alternating Least Squares Algorithms for Non-negative Matrix/Tensor Factorization
Regularized Alternating Least Squares Algorithms for Non-negative Matrix/Tensor Factorization Andrzej CICHOCKI and Rafal ZDUNEK Laboratory for Advanced Brain Signal Processing, RIKEN BSI, Wako-shi, Saitama
More informationCS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization
CS264: Beyond Worst-Case Analysis Lecture #15: Topic Modeling and Nonnegative Matrix Factorization Tim Roughgarden February 28, 2017 1 Preamble This lecture fulfills a promise made back in Lecture #1,
More informationDimensionality Reduction and Principle Components Analysis
Dimensionality Reduction and Principle Components Analysis 1 Outline What is dimensionality reduction? Principle Components Analysis (PCA) Example (Bishop, ch 12) PCA vs linear regression PCA as a mixture
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationAutomatic Rank Determination in Projective Nonnegative Matrix Factorization
Automatic Rank Determination in Projective Nonnegative Matrix Factorization Zhirong Yang, Zhanxing Zhu, and Erkki Oja Department of Information and Computer Science Aalto University School of Science and
More informationLinear Algebra Background
CS76A Text Retrieval and Mining Lecture 5 Recap: Clustering Hierarchical clustering Agglomerative clustering techniques Evaluation Term vs. document space clustering Multi-lingual docs Feature selection
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationCONVOLUTIVE NON-NEGATIVE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT
CONOLUTIE NON-NEGATIE MATRIX FACTORISATION WITH SPARSENESS CONSTRAINT Paul D. O Grady Barak A. Pearlmutter Hamilton Institute National University of Ireland, Maynooth Co. Kildare, Ireland. ABSTRACT Discovering
More informationhe Applications of Tensor Factorization in Inference, Clustering, Graph Theory, Coding and Visual Representation
he Applications of Tensor Factorization in Inference, Clustering, Graph Theory, Coding and Visual Representation Amnon Shashua School of Computer Science & Eng. The Hebrew University Matrix Factorization
More informationNonnegative matrix factorization (NMF) for determined and underdetermined BSS problems
Nonnegative matrix factorization (NMF) for determined and underdetermined BSS problems Ivica Kopriva Ruđer Bošković Institute e-mail: ikopriva@irb.hr ikopriva@gmail.com Web: http://www.lair.irb.hr/ikopriva/
More informationLatent Semantic Models. Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze
Latent Semantic Models Reference: Introduction to Information Retrieval by C. Manning, P. Raghavan, H. Schutze 1 Vector Space Model: Pros Automatic selection of index terms Partial matching of queries
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationProblems. Looks for literal term matches. Problems:
Problems Looks for literal term matches erms in queries (esp short ones) don t always capture user s information need well Problems: Synonymy: other words with the same meaning Car and automobile 电脑 vs.
More informationIntelligent Data Analysis. Mining Textual Data. School of Computer Science University of Birmingham
Intelligent Data Analysis Mining Textual Data Peter Tiňo School of Computer Science University of Birmingham Representing documents as numerical vectors Use a special set of terms T = {t 1, t 2,..., t
More informationSingular Value Decompsition
Singular Value Decompsition Massoud Malek One of the most useful results from linear algebra, is a matrix decomposition known as the singular value decomposition It has many useful applications in almost
More informationENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition
ENGG5781 Matrix Analysis and Computations Lecture 10: Non-Negative Matrix Factorization and Tensor Decomposition Wing-Kin (Ken) Ma 2017 2018 Term 2 Department of Electronic Engineering The Chinese University
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationNMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION. Julian M. Becker, Christian Sohn and Christian Rohlfing
NMF WITH SPECTRAL AND TEMPORAL CONTINUITY CRITERIA FOR MONAURAL SOUND SOURCE SEPARATION Julian M. ecker, Christian Sohn Christian Rohlfing Institut für Nachrichtentechnik RWTH Aachen University D-52056
More informationCS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu
CS598 Machine Learning in Computational Biology (Lecture 5: Matrix - part 2) Professor Jian Peng Teaching Assistant: Rongda Zhu Feature engineering is hard 1. Extract informative features from domain knowledge
More informationLecture Notes 10: Matrix Factorization
Optimization-based data analysis Fall 207 Lecture Notes 0: Matrix Factorization Low-rank models. Rank- model Consider the problem of modeling a quantity y[i, j] that depends on two indices i and j. To
More informationNovel Alternating Least Squares Algorithm for Nonnegative Matrix and Tensor Factorizations
Novel Alternating Least Squares Algorithm for Nonnegative Matrix and Tensor Factorizations Anh Huy Phan 1, Andrzej Cichocki 1,, Rafal Zdunek 1,2,andThanhVuDinh 3 1 Lab for Advanced Brain Signal Processing,
More informationOn the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing
Computational Statistics and Data Analysis 52 (2008) 3913 3927 www.elsevier.com/locate/csda On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing Chris
More informationNonnegative Matrix Factorization. Data Mining
The Nonnegative Matrix Factorization in Data Mining Amy Langville langvillea@cofc.edu Mathematics Department College of Charleston Charleston, SC Yahoo! Research 10/18/2005 Outline Part 1: Historical Developments
More informationIntroduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin
1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)
More informationCSE 494/598 Lecture-6: Latent Semantic Indexing. **Content adapted from last year s slides
CSE 494/598 Lecture-6: Latent Semantic Indexing LYDIA MANIKONDA HT TP://WWW.PUBLIC.ASU.EDU/~LMANIKON / **Content adapted from last year s slides Announcements Homework-1 and Quiz-1 Project part-2 released
More informationEmbeddings Learned By Matrix Factorization
Embeddings Learned By Matrix Factorization Benjamin Roth; Folien von Hinrich Schütze Center for Information and Language Processing, LMU Munich Overview WordSpace limitations LinAlgebra review Input matrix
More informationCollaborative Filtering: A Machine Learning Perspective
Collaborative Filtering: A Machine Learning Perspective Chapter 6: Dimensionality Reduction Benjamin Marlin Presenter: Chaitanya Desai Collaborative Filtering: A Machine Learning Perspective p.1/18 Topics
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language
More informationUsing Matrix Decompositions in Formal Concept Analysis
Using Matrix Decompositions in Formal Concept Analysis Vaclav Snasel 1, Petr Gajdos 1, Hussam M. Dahwa Abdulla 1, Martin Polovincak 1 1 Dept. of Computer Science, Faculty of Electrical Engineering and
More informationModeling Environment
Topic Model Modeling Environment What does it mean to understand/ your environment? Ability to predict Two approaches to ing environment of words and text Latent Semantic Analysis (LSA) Topic Model LSA
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationLanguage Information Processing, Advanced. Topic Models
Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:
More informationAn Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition
An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition Yu-Seop Kim 1, Jeong-Ho Chang 2, and Byoung-Tak Zhang 2 1 Division of Information and Telecommunication
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationCPSC 340: Machine Learning and Data Mining. More PCA Fall 2017
CPSC 340: Machine Learning and Data Mining More PCA Fall 2017 Admin Assignment 4: Due Friday of next week. No class Monday due to holiday. There will be tutorials next week on MAP/PCA (except Monday).
More informationFast Coordinate Descent methods for Non-Negative Matrix Factorization
Fast Coordinate Descent methods for Non-Negative Matrix Factorization Inderjit S. Dhillon University of Texas at Austin SIAM Conference on Applied Linear Algebra Valencia, Spain June 19, 2012 Joint work
More informationStochastic Variational Inference
Stochastic Variational Inference David M. Blei Princeton University (DRAFT: DO NOT CITE) December 8, 2011 We derive a stochastic optimization algorithm for mean field variational inference, which we call
More informationAbstract. LANDI, AMANDA KIM. The Nonnegative Matrix Factorization: Methods and Applications. (Under the direction of Kazufumi Ito.
Abstract LANDI, AMANDA KIM. The Nonnegative Matrix Factorization: Methods and Applications. (Under the direction of Kazufumi Ito.) In today s world, data continues to grow in size and complexity. Thus,
More informationCPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2017
CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2017 Admin Assignment 4: Due Friday. Assignment 5: Posted, due Monday of last week of classes Last Time: PCA with Orthogonal/Sequential
More informationPrincipal Component Analysis
Machine Learning Michaelmas 2017 James Worrell Principal Component Analysis 1 Introduction 1.1 Goals of PCA Principal components analysis (PCA) is a dimensionality reduction technique that can be used
More informationManning & Schuetze, FSNLP, (c)
page 554 554 15 Topics in Information Retrieval co-occurrence Latent Semantic Indexing Term 1 Term 2 Term 3 Term 4 Query user interface Document 1 user interface HCI interaction Document 2 HCI interaction
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Dan Oneaţă 1 Introduction Probabilistic Latent Semantic Analysis (plsa) is a technique from the category of topic models. Its main goal is to model cooccurrence information
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationDimension Reduction Using Nonnegative Matrix Tri-Factorization in Multi-label Classification
250 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'15 Dimension Reduction Using Nonnegative Matrix Tri-Factorization in Multi-label Classification Keigo Kimura, Mineichi Kudo and Lu Sun Graduate
More informationNon-Negative Matrix Factorization And Its Application to Audio. Tuomas Virtanen Tampere University of Technology
Non-Negative Matrix Factorization And Its Application to Audio Tuomas Virtanen Tampere University of Technology tuomas.virtanen@tut.fi 2 Contents Introduction to audio signals Spectrogram representation
More informationNonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation
Nonnegative Matrix Factor 2-D Deconvolution for Blind Single Channel Source Separation Mikkel N. Schmidt and Morten Mørup Technical University of Denmark Informatics and Mathematical Modelling Richard
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationNonnegative Matrix Factorization with Orthogonality Constraints
Nonnegative Matrix Factorization with Orthogonality Constraints Jiho Yoo and Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology (POSTECH) Pohang, Republic
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationNonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, Chi-square Statistic, and a Hybrid Method
Nonnegative Matrix Factorization and Probabilistic Latent Semantic Indexing: Equivalence, hi-square Statistic, and a Hybrid Method hris Ding a, ao Li b and Wei Peng b a Lawrence Berkeley National Laboratory,
More informationDictionary Learning Using Tensor Methods
Dictionary Learning Using Tensor Methods Anima Anandkumar U.C. Irvine Joint work with Rong Ge, Majid Janzamin and Furong Huang. Feature learning as cornerstone of ML ML Practice Feature learning as cornerstone
More informationFast Nonnegative Matrix Factorization with Rank-one ADMM
Fast Nonnegative Matrix Factorization with Rank-one Dongjin Song, David A. Meyer, Martin Renqiang Min, Department of ECE, UCSD, La Jolla, CA, 9093-0409 dosong@ucsd.edu Department of Mathematics, UCSD,
More informationDATA MINING AND MACHINE LEARNING. Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane
DATA MINING AND MACHINE LEARNING Lecture 4: Linear models for regression and classification Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Linear models for regression Regularized
More informationMatrix decompositions and latent semantic indexing
18 Matrix decompositions and latent semantic indexing On page 113, we introduced the notion of a term-document matrix: an M N matrix C, each of whose rows represents a term and each of whose columns represents
More informationNote for plsa and LDA-Version 1.1
Note for plsa and LDA-Version 1.1 Wayne Xin Zhao March 2, 2011 1 Disclaimer In this part of PLSA, I refer to [4, 5, 1]. In LDA part, I refer to [3, 2]. Due to the limit of my English ability, in some place,
More information