Dictionary Learning Using Tensor Methods
1 Dictionary Learning Using Tensor Methods
Anima Anandkumar, U.C. Irvine
Joint work with Rong Ge, Majid Janzamin and Furong Huang.
5 Feature learning as cornerstone of ML
Find an efficient representation of data, e.g. based on sparsity, invariances, low-dimensional structures, etc.
[Figure labels: ML Practice, ML Papers]
Feature engineering is typically critical for good performance.
Deep learning has shown considerable promise for feature learning.
Can we provide principled approaches which are guaranteed to learn good features?
8 Applications of Representation Learning
Compressed sensing
Extensive literature on compressed sensing: few linear measurements to recover sparse signals.
What if the signal is not sparse in the input representation?
What if the dictionary has invariances, e.g. shift or rotation?
Can we learn a representation where the signal is sparse?
Topic modeling
Unsupervised learning of admixtures in text documents, social networks (community modeling), biological models, ...
12 Dictionary Learning Model
Goal: find a dictionary $A$ with $k$ elements such that each data point is a sparse linear combination of dictionary elements, $X = AH$.
Topic models: $x_i$ is a document, $A$ contains topics, $h_i$ gives the topics in document $i$.
Compressed sensing: $x_i$ are the signals, $A$ is a basis in which they have a sparse representation.
Images: $x_i$ is an image, $A$ contains filters, $h_i$ gives the filters present in image $i$ (invariances also need to be incorporated).
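As a small illustration of the model $X = AH$, the following sketch draws a random dictionary and sparse coefficients and forms the data matrix. The sizes `d`, `k`, `n`, the sparsity `s` and the Gaussian choices are assumptions made for the example only; they do not come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n, s = 8, 12, 5, 3  # ambient dim, dictionary size, samples, nonzeros per sample (illustrative)

# Dictionary A: k unit-norm elements stored as columns.
A = rng.standard_normal((d, k))
A /= np.linalg.norm(A, axis=0)

# Coefficients H: each column h_i has exactly s nonzero entries.
H = np.zeros((k, n))
for i in range(n):
    support = rng.choice(k, size=s, replace=False)
    H[support, i] = rng.standard_normal(s)

# Each data point x_i = A h_i is a sparse combination of dictionary elements.
X = A @ H
```

Here `k > d`, so the dictionary in this toy example is overcomplete.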
13 Outline 1 Introduction 2 Tensor Methods for Dictionary Learning 3 Convolutional Dictionary Models 4 Conclusion
14 Computational Challenges
Learning dictionary models by maximum likelihood: non-convex optimization, NP-hard.
In practice, local search approaches such as gradient descent, EM and variational Bayes have no consistency guarantees. They can get stuck in bad local optima, have poor convergence rates and are hard to parallelize.
Tensor methods can yield guaranteed learning.
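15 Moment Matrices and Tensors
Multivariate moments: $M_1 := \mathbb{E}[x]$, $M_2 := \mathbb{E}[x \otimes x]$, $M_3 := \mathbb{E}[x \otimes x \otimes x]$.
The matrix $\mathbb{E}[x \otimes x] \in \mathbb{R}^{d \times d}$ is a second-order tensor with entries $\mathbb{E}[x \otimes x]_{i_1, i_2} = \mathbb{E}[x_{i_1} x_{i_2}]$. For matrices, $\mathbb{E}[x \otimes x] = \mathbb{E}[x x^\top]$.
The tensor $\mathbb{E}[x \otimes x \otimes x] \in \mathbb{R}^{d \times d \times d}$ is a third-order tensor with entries $\mathbb{E}[x \otimes x \otimes x]_{i_1, i_2, i_3} = \mathbb{E}[x_{i_1} x_{i_2} x_{i_3}]$.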
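The moments above can be estimated from samples by averaging outer products. A minimal sketch with `numpy.einsum`, using placeholder Gaussian data (the data and sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 4
X = rng.standard_normal((n, d))  # rows are samples x_1, ..., x_n (placeholder data)

# Empirical first, second and third moments.
M1 = X.mean(axis=0)                                   # shape (d,)
M2 = np.einsum("ni,nj->ij", X, X) / n                 # shape (d, d), equals X^T X / n
M3 = np.einsum("ni,nj,nk->ijk", X, X, X) / n          # shape (d, d, d)
```

Entry `M3[i1, i2, i3]` is the sample average of `x[i1] * x[i2] * x[i3]`, matching the entrywise definition of the third-order moment.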
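17 Spectral Decomposition of Tensors
Matrix: $M_2 = \sum_i \lambda_i \, u_i \otimes v_i = \lambda_1 u_1 \otimes v_1 + \lambda_2 u_2 \otimes v_2 + \dots$
Tensor: $M_3 = \sum_i \lambda_i \, u_i \otimes v_i \otimes w_i = \lambda_1 u_1 \otimes v_1 \otimes w_1 + \lambda_2 u_2 \otimes v_2 \otimes w_2 + \dots$
$u \otimes v \otimes w$ is a rank-1 tensor since its $(i_1, i_2, i_3)$-th entry is $u_{i_1} v_{i_2} w_{i_3}$.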
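18 Moment forms for Dictionary Models
$x_i = A h_i$, $i \in [n]$.
Independent component analysis (ICA): the entries of $h_i$ are independent, e.g. Bernoulli-Gaussian.
$M_4 := \mathbb{E}[x \otimes x \otimes x \otimes x] - T$, where
$T_{i_1, i_2, i_3, i_4} := \mathbb{E}[x_{i_1} x_{i_2}]\,\mathbb{E}[x_{i_3} x_{i_4}] + \mathbb{E}[x_{i_1} x_{i_3}]\,\mathbb{E}[x_{i_2} x_{i_4}] + \mathbb{E}[x_{i_1} x_{i_4}]\,\mathbb{E}[x_{i_2} x_{i_3}]$.
Let $\kappa_j := \mathbb{E}[h_j^4] - 3\,\mathbb{E}^2[h_j^2]$, $j \in [k]$. Then we have $M_4 = \sum_{j \in [k]} \kappa_j \, a_j \otimes a_j \otimes a_j \otimes a_j$.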
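The fourth-order identity can be checked exactly on a toy example. The sketch below assumes a hypothetical coefficient distribution in which each $h_j$ independently takes the values $-1, 0, +1$ with probabilities $p/2, 1-p, p/2$ (zero mean, so $\kappa_j = p - 3p^2$); this choice is made only so that all expectations can be computed by enumeration and is not taken from the slides.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
d, k, p = 4, 3, 0.3                     # illustrative sizes and activation probability
A = rng.standard_normal((d, k))         # dictionary with columns a_j

values = np.array([-1.0, 0.0, 1.0])     # support of each independent coefficient h_j
probs = np.array([p / 2, 1 - p, p / 2])

# Exact moments of x = A h, obtained by enumerating all 3^k coefficient vectors.
M2 = np.zeros((d, d))
M4_raw = np.zeros((d, d, d, d))
for idx in itertools.product(range(3), repeat=k):
    h = values[list(idx)]
    weight = np.prod(probs[list(idx)])
    x = A @ h
    M2 += weight * np.einsum("i,j->ij", x, x)
    M4_raw += weight * np.einsum("i,j,k,l->ijkl", x, x, x, x)

# T collects the three pairings of second moments.
T = (np.einsum("ij,kl->ijkl", M2, M2)
     + np.einsum("ik,jl->ijkl", M2, M2)
     + np.einsum("il,jk->ijkl", M2, M2))
M4 = M4_raw - T

# Model side: sum_j kappa_j a_j (x) a_j (x) a_j (x) a_j with kappa_j = E[h^4] - 3 E[h^2]^2.
kappa = p - 3 * p**2
M4_model = kappa * np.einsum("ij,kj,lj,mj->iklm", A, A, A, A)
```

With these definitions `M4` and `M4_model` agree up to floating-point error, which is the statement $M_4 = \sum_j \kappa_j a_j^{\otimes 4}$ for this toy distribution.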
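19 Moment forms for Dictionary Models
General (sparse) coefficients: $x_i = A h_i$, $i \in [n]$, $\mathbb{E}[\|h_i\|_0] = s$.
Moment conditions (here $i, j$ index coordinates of the coefficient vector): $\mathbb{E}[h_i^4] = \mathbb{E}[h_i^2] = \beta s / k$, $\mathbb{E}[h_i^2 h_j^2] \le \tau$ for $i \ne j$, $\mathbb{E}[h_i^3 h_j] = 0$ for $i \ne j$.
$\mathbb{E}[x \otimes x \otimes x \otimes x] = \sum_{j \in [k]} \kappa_j \, a_j \otimes a_j \otimes a_j \otimes a_j + E$, where $\|E\| \le \tau \|A\|^4$.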
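22 Tensor Rank and Tensor Decomposition
Rank-1 tensor: $T = w \cdot a \otimes b \otimes c$, i.e. $T(i, j, l) = w \cdot a(i) \cdot b(j) \cdot c(l)$.
CANDECOMP/PARAFAC (CP) decomposition: $T = \sum_{j \in [k]} w_j \, a_j \otimes b_j \otimes c_j \in \mathbb{R}^{d \times d \times d}$, with $a_j, b_j, c_j \in \mathbb{S}^{d-1}$, i.e. $T = w_1 \, a_1 \otimes b_1 \otimes c_1 + w_2 \, a_2 \otimes b_2 \otimes c_2 + \dots$
$k$: tensor rank, $d$: ambient dimension. $k \le d$: undercomplete; $k > d$: overcomplete.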
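To make the CP form concrete, the following sketch assembles a rank-$k$ tensor from factor matrices whose columns are unit vectors. The sizes, weights and the helper name `unit_columns` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 3  # ambient dimension and tensor rank (undercomplete, since k <= d)

def unit_columns(M):
    """Normalize every column to unit Euclidean norm (a point on the sphere S^{d-1})."""
    return M / np.linalg.norm(M, axis=0)

A, B, C = (unit_columns(rng.standard_normal((d, k))) for _ in range(3))
w = rng.uniform(1.0, 2.0, size=k)

# T = sum_j w_j a_j (x) b_j (x) c_j, so T[i, l, m] = sum_j w_j A[i, j] B[l, j] C[m, j].
T = np.einsum("j,ij,lj,mj->ilm", w, A, B, C)

# A single rank-1 term, for comparison with the entrywise formula T(i, j, l) = w a(i) b(j) c(l).
T1 = w[0] * np.einsum("i,l,m->ilm", A[:, 0], B[:, 0], C[:, 0])
```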
29 Orthogonal Tensor Power Method
Symmetric orthogonal tensor $T \in \mathbb{R}^{d \times d \times d}$: $T = \sum_{i \in [k]} \lambda_i \, v_i \otimes v_i \otimes v_i$.
Recall the matrix power method: $v \mapsto \frac{M(I, v)}{\|M(I, v)\|}$.
Algorithm (tensor power method): $v \mapsto \frac{T(I, v, v)}{\|T(I, v, v)\|}$.
How do we avoid spurious solutions (not part of the decomposition)?
The $\{v_i\}$ are the only robust fixed points. All other eigenvectors are saddle points.
For an orthogonal tensor, no spurious local optima!
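A minimal sketch of the tensor power update $v \mapsto T(I, v, v) / \|T(I, v, v)\|$, combined with deflation to extract all components of a synthetic orthogonal tensor. The function names, the iteration count and the test tensor are assumptions for illustration, not the authors' implementation, and positive $\lambda_i$ are assumed.

```python
import numpy as np

def power_iteration(T, rng, n_iter=60):
    """Iterate v <- T(I, v, v) / ||T(I, v, v)|| from a random unit vector."""
    v = rng.standard_normal(T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = np.einsum("ijk,j,k->i", T, v, v)       # T(I, v, v)
        v = u / np.linalg.norm(u)
    lam = np.einsum("ijk,i,j,k->", T, v, v, v)     # eigenvalue T(v, v, v)
    return lam, v

def decompose_orthogonal(T, k, seed=0):
    """Extract k components one at a time, subtracting each found rank-1 term."""
    rng = np.random.default_rng(seed)
    residual = T.copy()
    lams, vecs = [], []
    for _ in range(k):
        lam, v = power_iteration(residual, rng)
        lams.append(lam)
        vecs.append(v)
        residual = residual - lam * np.einsum("i,j,k->ijk", v, v, v)  # deflation
    return np.array(lams), np.stack(vecs, axis=1)

# Synthetic symmetric orthogonal tensor with known components.
rng = np.random.default_rng(1)
d, k = 6, 3
V, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal columns v_i
lam_true = np.array([3.0, 2.0, 1.0])
T = np.einsum("r,ir,jr,kr->ijk", lam_true, V, V, V)

lam_hat, V_hat = decompose_orthogonal(T, k)
```

Each run of `power_iteration` converges to one of the $v_i$ (which one depends on the random start), so the recovered components are only determined up to ordering.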
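31 Putting it together
Non-orthogonal tensor: $M_3 = \sum_i w_i \, a_i \otimes a_i \otimes a_i$, $M_2 = \sum_i w_i \, a_i \otimes a_i$.
Whitening matrix $W$: maps the components $a_1, a_2, a_3$ to orthogonal vectors $v_1, v_2, v_3$.
Multilinear transform: $T = M_3(W, W, W)$ takes the tensor $M_3$ to the orthogonal tensor $T$.
Tensor decomposition in the undercomplete case: solved!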
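The whitening step can be sketched as follows, assuming $W$ is built from the top-$k$ eigenpairs of $M_2$ so that $W^\top M_2 W = I_k$ (one common construction; the slides do not spell out how $W$ is computed). The sizes and weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 6, 3
A = rng.standard_normal((d, k))       # non-orthogonal components a_i as columns
w = np.array([0.5, 0.3, 0.2])         # positive weights w_i

M2 = np.einsum("r,ir,jr->ij", w, A, A)
M3 = np.einsum("r,ir,jr,kr->ijk", w, A, A, A)

# Whitening matrix from the top-k eigenpairs of M2 (eigh returns ascending eigenvalues).
evals, evecs = np.linalg.eigh(M2)
W = evecs[:, -k:] / np.sqrt(evals[-k:])           # d x k, with W^T M2 W = I_k

# Multilinear transform T = M3(W, W, W), a k x k x k tensor.
T = np.einsum("abc,ai,bj,ck->ijk", M3, W, W, W)

# Whitened components v_i = sqrt(w_i) W^T a_i are orthonormal,
# and T = sum_i w_i^{-1/2} v_i (x) v_i (x) v_i.
V = (W.T @ A) * np.sqrt(w)
T_expected = np.einsum("r,ir,jr,kr->ijk", 1.0 / np.sqrt(w), V, V, V)
```

After this step the tensor power method for orthogonal tensors applies to `T`; the original components can be recovered from the $v_i$ by undoing the whitening.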
34 Overcomplete Setting
In general, tensor decomposition is NP-hard.
It is tractable when $A$ is incoherent, i.e. $\langle a_i, a_j \rangle \sim \frac{1}{\sqrt{d}}$ for $i \ne j$.
SVD initialization: find the top singular vectors of $T(I, I, \theta)$ for $\theta \sim \mathcal{N}(0, I)$ and use them to initialize the power method. Repeat for $L$ trials.
Assumptions: number of initializations $L \ge k^{\Omega(k/d)^2}$; tensor rank $k = O(d)$; number of iterations $N = \Theta(\log(1/\|E\|))$. Recall $\|E\|$: recovery error.
Theorem (global convergence) [AGJ-COLT2015]: $\|a_1 - \hat{a}^{(N)}\| \le O(\|E\|)$.
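The SVD initialization step can be sketched as below: contract the tensor with a random Gaussian vector along one mode and take the top left singular vector of the resulting matrix. The function name `svd_initialization` and the rank-1 sanity check are assumptions for illustration; the sketch returns only a single starting vector per trial.

```python
import numpy as np

def svd_initialization(T, rng):
    """Return the top left singular vector of T(I, I, theta) for theta ~ N(0, I)."""
    theta = rng.standard_normal(T.shape[2])
    M = np.einsum("ijk,k->ij", T, theta)     # the matrix T(I, I, theta)
    U, _, _ = np.linalg.svd(M)
    return U[:, 0]

# Sanity check on a rank-1 tensor a (x) a (x) a: the initializer aligns with a up to sign.
rng = np.random.default_rng(0)
d = 5
a = rng.standard_normal(d)
a /= np.linalg.norm(a)
T = np.einsum("i,j,k->ijk", a, a, a)

v0 = svd_initialization(T, rng)
alignment = abs(v0 @ a)
```

In the procedure described above, such a vector would be handed to the power iterations, and the draw of $\theta$ repeated over the $L$ trials.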
35 Improved Sample Complexity Analysis
Dictionary $A \in \mathbb{R}^{d \times k}$ satisfying RIP; sparse-ICA model with sub-Gaussian variables. Sparsity level $s$, number of samples $n$.
$\|\hat{M}_4 - M_4\| = \tilde{O}\left(\frac{s^2}{n} + \sqrt{\frac{s^4}{d^3 n}}\right)$
Proof technique: careful $\epsilon$-net covering and bucketing.
36 Outline 1 Introduction 2 Tensor Methods for Dictionary Learning 3 Convolutional Dictionary Models 4 Conclusion
37 Convolutional Dictionary Model
So far, invariances in the dictionary are not incorporated.
Convolutional models incorporate invariances such as shift invariance.
[Figure: an image and its dictionary elements]
38 Rewriting as a standard dictionary model
[Figure: (a) convolutional model, $x = \sum_i f_i * w_i$; (b) reformulated model, $x = F w$]
$x = \sum_i f_i * w_i = \sum_i \mathrm{Cir}(f_i)\, w_i = F w$
Assume the coefficients $w_i$ are independent (convolutional ICA model).
The cumulant tensor has a decomposition with components $F_i$.
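The reformulation $\sum_i f_i * w_i = \sum_i \mathrm{Cir}(f_i) w_i = F w$ can be verified numerically. The sketch below assumes circular convolution and builds $F$ by concatenating the circulant blocks side by side; the helper `circulant` and the sizes are illustrative.

```python
import numpy as np

def circulant(f):
    """Circulant matrix whose column j is f cyclically shifted down by j positions."""
    n = len(f)
    return np.stack([np.roll(f, j) for j in range(n)], axis=1)

rng = np.random.default_rng(0)
n, L = 8, 3                              # signal length and number of filters (illustrative)
filters = rng.standard_normal((L, n))    # f_1, ..., f_L
coeffs = rng.standard_normal((L, n))     # w_1, ..., w_L

# (a) Convolutional model: sum of circular convolutions, computed via the FFT.
x_conv = sum(np.fft.ifft(np.fft.fft(f) * np.fft.fft(w)).real
             for f, w in zip(filters, coeffs))

# (b) Reformulated model: F = [Cir(f_1), ..., Cir(f_L)] and w = stacked coefficients.
F = np.hstack([circulant(f) for f in filters])   # shape (n, n * L)
w = coeffs.reshape(-1)                           # w_1 followed by w_2, ..., w_L
x_dict = F @ w
```

Both constructions give the same signal, so the convolutional model is a standard dictionary model with a structured (block-circulant) dictionary.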
39 Moment forms and optimization
$x = \sum_i f_i * w_i = \sum_i \mathrm{Cir}(f_i)\, w_i = F w$
Assume the coefficients $w_i$ are independent (convolutional ICA model).
The cumulant tensor has a decomposition with components $F_i$: $\text{Cumulant} = \lambda_1 (F_1)^{\otimes 3} + \lambda_2 (F_2)^{\otimes 3} + \dots$
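44 Efficient Optimization Techniques
$\text{Cumulant} = \sum_j \lambda_j F_j^{\otimes 3}$, or in matricized form, $\text{Cumulant} = F \Lambda (F \odot F)^\top$.
Objective function: $\min_F \|\text{Cumulant} - F \Lambda (F \odot F)^\top\|_F^2$ s.t. $\mathrm{blk}_l(F) = U\,\mathrm{Diag}(\mathrm{FFT}(f_l))\,U^H$, $\|f_l\|_2 = 1$.
Alternating minimization: relax $F \Lambda (F \odot F)^\top$ to $F \Lambda (H \odot G)^\top$.
Under full column rank of $H \odot G$, form $T := \text{Cumulant} \cdot ((H \odot G)^\top)^\dagger$.
Main result: the optimal solution $f_l^{\mathrm{opt}}$ is given, for all $p \in [n]$ and with $q := (i - j) \bmod n$, by
$f_l^{\mathrm{opt}}(p) = \frac{\sum_{i,j \in [n]} \|\mathrm{blk}_l(T)^j\|^{-1} \cdot \mathrm{blk}_l(T)^i_j \cdot I^q_{p-1}}{\sum_{i,j \in [n]} I^q_{p-1}}$.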
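The matricized form and the pseudoinverse step can be illustrated on a small unstructured example. The sketch below reads the product in the matricization as the column-wise Kronecker (Khatri-Rao) product, ignores the block-circulant constraint on $F$, and sets $H = G = F$; the helper `khatri_rao` and all sizes are assumptions for illustration only.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: column j equals kron(B[:, j], C[:, j])."""
    return np.einsum("bj,cj->bcj", B, C).reshape(B.shape[0] * C.shape[0], -1)

rng = np.random.default_rng(0)
d, k = 4, 3
F = rng.standard_normal((d, k))
lam = rng.uniform(1.0, 2.0, size=k)

# Third-order tensor sum_j lam_j F_j (x) F_j (x) F_j and its mode-1 unfolding.
cumulant = np.einsum("j,aj,bj,cj->abc", lam, F, F, F)
unfolded = cumulant.reshape(d, d * d)

# Matricization identity: unfolded = F Lambda (F kr F)^T.
matricized = F @ np.diag(lam) @ khatri_rao(F, F).T

# Relaxed step with H = G = F: T := Cumulant ((H kr G)^T)^+, which here recovers F Lambda.
T = unfolded @ np.linalg.pinv(khatri_rao(F, F).T)
```

When $H \odot G$ has full column rank, multiplying by the pseudoinverse removes the Khatri-Rao factor, leaving a matrix from which the filters are read off in the closed-form step.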
45 Efficient Optimization Techniques
Under full column rank of $H \odot G$, form $T := \text{Cumulant} \cdot ((H \odot G)^\top)^\dagger$. The optimal solution is then computed in closed form.
Bottleneck computation: $((H \odot G)^\top)^\dagger$. Naive implementation: $O(n^6)$ time, where $n$ is the length of the signal.
Running time of our method: for length-$n$ signals and $L$ filters, $O(\log n + \log L)$ time with $O(L^2 n^3)$ processors.
Involves $2L$ FFTs, some matrix multiplications, and inverses of diagonal matrices.
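The FFTs and diagonal inverses mentioned above rest on the fact that a circulant block is diagonalized by the discrete Fourier transform, $\mathrm{Cir}(f) = U\,\mathrm{Diag}(\mathrm{FFT}(f))\,U^H$. The sketch below checks this numerically, assuming $U$ is the conjugate transpose of the unitary DFT matrix (the slides do not define $U$ explicitly) and using a random placeholder filter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
f = rng.standard_normal(n)

# Circulant matrix of f: column j is f cyclically shifted by j.
C = np.stack([np.roll(f, j) for j in range(n)], axis=1)

# Unitary DFT matrix Omega; assume U = Omega^H.
Omega = np.fft.fft(np.eye(n)) / np.sqrt(n)
U = Omega.conj().T

# Diagonalization: Cir(f) = U Diag(FFT(f)) U^H.
C_from_fft = U @ np.diag(np.fft.fft(f)) @ U.conj().T

# Inverting the circulant block only requires inverting a diagonal matrix.
C_inv = (U @ np.diag(1.0 / np.fft.fft(f)) @ U.conj().T).real
```

This is why operations on circulant blocks reduce to FFTs plus elementwise operations on the diagonal, rather than dense matrix factorizations.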
46 Experiments (synthetic)
Proposed convolutional tensor method (CT) vs. baseline alternating minimization (AM).
[Figure: (a) reconstruction error vs. iteration for CT and AM (filters $f_1$, $f_2$ and overall reconstruction); (b) running time in seconds vs. number of filters $L$; (c) running time in seconds vs. number of samples $N$]
47 Experiments (NLP)
Microsoft paraphrase dataset sentence pairs.
Unsupervised convolutional tensor method: no outside information. Metric: F score.

Method             Description                             Outside information                    F score
Vector Similarity  cosine similarity with tf-idf weights   word similarity                        75.3%
ESA                explicit semantic space                 word semantic profiles                 79.3%
LSA                latent semantic space                   word semantic profiles                 79.9%
RMLMG              graph subsumption                       lexical & syntactic & synonymy info    80.5%
CT (proposed)      convolutional dictionary learning       none                                   80.7%
MCS                combine word similarity measures        word similarity                        81.3%
STS                combine semantic & string similarity    semantic similarity                    81.3%
SSA                salient semantic space                  word semantic profiles                 81.4%
matrixjcn          JCN WordNet similarity with matrix      word similarity                        82.4%

Paraphrase detected:
(1) Amrozi accused his brother, whom he called the witness, of deliberately distorting his evidence.
(2) Referring to him as only the witness, Amrozi accused his brother of deliberately distorting his evidence.
Non-paraphrase detected:
(1) I never organised a youth camp for the diocese of Bendigo.
(2) I never attended a youth camp organised by that diocese.
48 Outline 1 Introduction 2 Tensor Methods for Dictionary Learning 3 Convolutional Dictionary Models 4 Conclusion
51 Summary and Outlook
Summary
Method of moments for learning dictionary elements.
Invariances in convolutional models can be handled efficiently.
Outlook
Analyze the optimization landscape of tensor methods for convolutional models.
Extend to other kinds of invariances (e.g. rotation).
How is feature learning useful for classification?
Precise characterization for training neural networks: first polynomial-time methods! See "Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods" by Majid Janzamin, Hanie Sedghi and A.
Contents Preface xvii 1 Introduction 1 1.1 Linear representation of multivariate data 1 1.1.1 The general statistical setting 1 1.1.2 Dimension reduction methods 2 1.1.3 Independence as a guiding principle
More informationVector Space Models. wine_spectral.r
Vector Space Models 137 wine_spectral.r Latent Semantic Analysis Problem with words Even a small vocabulary as in wine example is challenging LSA Reduce number of columns of DTM by principal components
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationLecture 21: Spectral Learning for Graphical Models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation
More informationMachine Learning for Signal Processing Sparse and Overcomplete Representations
Machine Learning for Signal Processing Sparse and Overcomplete Representations Abelino Jimenez (slides from Bhiksha Raj and Sourish Chaudhuri) Oct 1, 217 1 So far Weights Data Basis Data Independent ICA
More informationCourse Notes for EE227C (Spring 2018): Convex Optimization and Approximation
Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Instructor: Moritz Hardt Email: hardt+ee227c@berkeley.edu Graduate Instructor: Max Simchowitz Email: msimchow+ee227c@berkeley.edu
More informationGraphical Models for Collaborative Filtering
Graphical Models for Collaborative Filtering Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Sequence modeling HMM, Kalman Filter, etc.: Similarity: the same graphical model topology,
More informationMIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications. Class 19: Data Representation by Design
MIT 9.520/6.860, Fall 2017 Statistical Learning Theory and Applications Class 19: Data Representation by Design What is data representation? Let X be a data-space X M (M) F (M) X A data representation
More informationOverview. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Overview Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 1/25/2016 Sparsity Denoising Regression Inverse problems Low-rank models Matrix completion
More informationNeural Network Approximation. Low rank, Sparsity, and Quantization Oct. 2017
Neural Network Approximation Low rank, Sparsity, and Quantization zsc@megvii.com Oct. 2017 Motivation Faster Inference Faster Training Latency critical scenarios VR/AR, UGV/UAV Saves time and energy Higher
More informationCSC 576: Variants of Sparse Learning
CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in
More informationTENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS. Cris Cecka Senior Research Scientist, NVIDIA GTC 2018
TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS Cris Cecka Senior Research Scientist, NVIDIA GTC 2018 Tensors Computations and the GPU AGENDA Tensor Networks and Decompositions Tensor Layers in
More informationLatent semantic indexing
Latent semantic indexing Relationship between concepts and words is many-to-many. Solve problems of synonymy and ambiguity by representing documents as vectors of ideas or concepts, not terms. For retrieval,
More informationCompressive Sensing and Beyond
Compressive Sensing and Beyond Sohail Bahmani Gerorgia Tech. Signal Processing Compressed Sensing Signal Models Classics: bandlimited The Sampling Theorem Any signal with bandwidth B can be recovered
More informationStatistical Geometry Processing Winter Semester 2011/2012
Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian
More informationClustering. SVD and NMF
Clustering with the SVD and NMF Amy Langville Mathematics Department College of Charleston Dagstuhl 2/14/2007 Outline Fielder Method Extended Fielder Method and SVD Clustering with SVD vs. NMF Demos with
More informationCPSC 340: Machine Learning and Data Mining. Sparse Matrix Factorization Fall 2018
CPSC 340: Machine Learning and Data Mining Sparse Matrix Factorization Fall 2018 Last Time: PCA with Orthogonal/Sequential Basis When k = 1, PCA has a scaling problem. When k > 1, have scaling, rotation,
More informationPhase Retrieval from Local Measurements: Deterministic Measurement Constructions and Efficient Recovery Algorithms
Phase Retrieval from Local Measurements: Deterministic Measurement Constructions and Efficient Recovery Algorithms Aditya Viswanathan Department of Mathematics and Statistics adityavv@umich.edu www-personal.umich.edu/
More informationIntroduction to the Tensor Train Decomposition and Its Applications in Machine Learning
Introduction to the Tensor Train Decomposition and Its Applications in Machine Learning Anton Rodomanov Higher School of Economics, Russia Bayesian methods research group (http://bayesgroup.ru) 14 March
More informationc Springer, Reprinted with permission.
Zhijian Yuan and Erkki Oja. A FastICA Algorithm for Non-negative Independent Component Analysis. In Puntonet, Carlos G.; Prieto, Alberto (Eds.), Proceedings of the Fifth International Symposium on Independent
More informationRecovering any low-rank matrix, provably
Recovering any low-rank matrix, provably Rachel Ward University of Texas at Austin October, 2014 Joint work with Yudong Chen (U.C. Berkeley), Srinadh Bhojanapalli and Sujay Sanghavi (U.T. Austin) Matrix
More informationPCA, Kernel PCA, ICA
PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per
More informationMachine learning for pervasive systems Classification in high-dimensional spaces
Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version
More informationRobust Principal Component Analysis
ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M
More informationShort Course Robust Optimization and Machine Learning. Lecture 4: Optimization in Unsupervised Learning
Short Course Robust Optimization and Machine Machine Lecture 4: Optimization in Unsupervised Laurent El Ghaoui EECS and IEOR Departments UC Berkeley Spring seminar TRANSP-OR, Zinal, Jan. 16-19, 2012 s
More informationLinear Factor Models. Sargur N. Srihari
Linear Factor Models Sargur N. srihari@cedar.buffalo.edu 1 Topics in Linear Factor Models Linear factor model definition 1. Probabilistic PCA and Factor Analysis 2. Independent Component Analysis (ICA)
More informationRank Selection in Low-rank Matrix Approximations: A Study of Cross-Validation for NMFs
Rank Selection in Low-rank Matrix Approximations: A Study of Cross-Validation for NMFs Bhargav Kanagal Department of Computer Science University of Maryland College Park, MD 277 bhargav@cs.umd.edu Vikas
More information