Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms

1 Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms François Caron Department of Statistics, Oxford STATLEARN 2014, Paris April 7, 2014 Joint work with Adrien Todeschini, Marie Chavent (INRIA, U. of Bordeaux) F. Caron 1 / 44

2 Outline Introduction Complete case Matrix completion Experiments Conclusion and perspectives F. Caron 2 / 44

3 Matrix Completion Netflix prize: 480k users and 18k movies providing 1-5 ratings; 99% of the ratings are missing. Objective: predict missing entries in order to make recommendations. [Illustration: users × movies rating matrix] F. Caron 3 / 44

4 Matrix Completion Objective Complete a matrix X of size m × n from a subset of its entries Applications Recommender systems Image inpainting Imputation of missing data F. Caron 4 / 44

5 Matrix Completion Potentially large matrices (each dimension of order ) Very sparsely observed (1%-10%) F. Caron 5 / 44

6 Low rank Matrix Completion Assume that the complete matrix Z is of low rank: $\underbrace{Z}_{m \times n} \simeq \underbrace{A}_{m \times k} \underbrace{B^T}_{k \times n}$ with $k \ll \min(m, n)$. F. Caron 6 / 44

7 Low rank Matrix Completion Assume that the complete matrix Z is of low rank: $\underbrace{Z}_{m \times n} \simeq \underbrace{A}_{m \times k} \underbrace{B^T}_{k \times n}$ with $k \ll \min(m, n)$. [Illustration: users × items matrix factorized through latent features] F. Caron 6 / 44

8 Low rank Matrix Completion Let $\Omega \subset \{1,\dots,m\} \times \{1,\dots,n\}$ be the subset of observed entries. For $(i, j) \in \Omega$, $X_{ij} = Z_{ij} + \varepsilon_{ij}$, $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$ where $\sigma^2 > 0$. F. Caron 7 / 44

9 Low rank Matrix Completion Optimization problem $\operatorname*{minimize}_Z \; \underbrace{\frac{1}{2\sigma^2} \sum_{(i,j)\in\Omega} (X_{ij} - Z_{ij})^2}_{\text{loglikelihood}} + \underbrace{\lambda\,\mathrm{rank}(Z)}_{\text{penalty}}$ where $\lambda > 0$ is some regularization parameter. Non-convex; computationally hard for a general subset $\Omega$. F. Caron 8 / 44

10 Low rank Matrix Completion Matrix completion with nuclear norm penalty $\operatorname*{minimize}_Z \; \underbrace{\frac{1}{2\sigma^2} \sum_{(i,j)\in\Omega} (X_{ij} - Z_{ij})^2}_{\text{loglikelihood}} + \underbrace{\lambda \|Z\|_*}_{\text{penalty}}$ where $\|Z\|_*$ is the nuclear norm of Z, i.e. the sum of the singular values of Z. Convex relaxation of the rank penalty optimization [Fazel, 2002, Candès and Recht, 2009, Candès and Tao, 2010] F. Caron 9 / 44

11 Low rank Matrix Completion Soft-Impute algorithm Start with an initial matrix Z (0) At each iteration t = 1, 2,... Replace the missing elements in X with those in Z (t 1) Perform a soft-thresholded SVD on the completed matrix, with shrinkage λ to obtain the low rank matrix Z (t) [Mazumder et al., 2010] F. Caron 10 / 44
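As a concrete illustration, here is a minimal NumPy sketch of the Soft-Impute iteration described above. The helper names, the zero initialization and the stopping rule are illustrative choices, not the authors' released code.

```python
import numpy as np

def soft_thresholded_svd(M, lam):
    """SVD of M with every singular value shrunk by lam and clipped at zero."""
    U, d, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(d - lam, 0.0)) @ Vt

def soft_impute(X, mask, lam, n_iter=200, tol=1e-5):
    """Soft-Impute sketch. X: data matrix; mask: boolean array, True on observed entries."""
    Z = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        X_filled = np.where(mask, X, Z)            # replace missing entries with current estimate
        Z_new = soft_thresholded_svd(X_filled, lam)
        if np.linalg.norm(Z_new - Z, 'fro') <= tol * max(np.linalg.norm(Z, 'fro'), 1.0):
            return Z_new
        Z = Z_new
    return Z
```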

12 Low rank Matrix Completion Soft-Impute algorithm Soft-thresholded SVD yields a low-rank representation Each iteration decreases the value of the nuclear norm objective function towards its minimum Various strategies proposed to scale the algorithm to problems where n, m are of order $10^6$ Same shrinkage applied to all singular values [Mazumder et al., 2010] F. Caron 11 / 44

13 Contributions Probabilistic interpretation of the nuclear norm objective function: Maximum A Posteriori estimation assuming exponential priors on the singular values; Soft-Impute = Expectation-Maximization algorithm Construction of alternative non-convex objective functions building on hierarchical priors, bridging the gap between the rank penalty and the nuclear norm penalty EM: adaptive algorithm that iteratively adjusts the shrinkage coefficients for each singular value, similar to the adaptive lasso in multivariate regression Numerical results show the benefits of the approach on various datasets F. Caron 12 / 44

14 Outline Introduction Complete case Hierarchical adaptive spectral penalty EM algorithm for MAP estimation Matrix completion Experiments Conclusion and perspectives F. Caron 13 / 44

15 Nuclear Norm penalty Complete matrix X. Nuclear norm objective function $\operatorname*{minimize}_Z \; \frac{1}{2\sigma^2} \|X - Z\|_F^2 + \lambda \|Z\|_*$ where $\|\cdot\|_F$ is the Frobenius norm. Global solution given by a soft-thresholded SVD $\widehat{Z} = S_{\lambda\sigma^2}(X)$ where $S_\lambda(X) = \widetilde{U} \widetilde{D}_\lambda \widetilde{V}^T$ with $\widetilde{D}_\lambda = \mathrm{diag}\left((\widetilde{d}_1 - \lambda)_+, \dots, (\widetilde{d}_r - \lambda)_+\right)$ and $t_+ = \max(t, 0)$. [Cai et al., 2010, Mazumder et al., 2010] F. Caron 14 / 44
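A small numerical sanity check of this closed form, assuming the operator and objective are implemented as below (the function names and the random perturbation test are illustrative, not part of the talk): the soft-thresholded SVD $S_{\lambda\sigma^2}(X)$ should not be improved by nearby perturbations.

```python
import numpy as np

def nuclear_norm_objective(X, Z, lam, sigma2):
    """(1 / 2 sigma^2) ||X - Z||_F^2 + lam * ||Z||_*"""
    return (np.linalg.norm(X - Z, 'fro') ** 2) / (2 * sigma2) \
        + lam * np.linalg.svd(Z, compute_uv=False).sum()

def soft_svd(X, threshold):
    """S_threshold(X): shrink every singular value of X by `threshold` and clip at zero."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(d - threshold, 0.0)) @ Vt

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 20))
lam, sigma2 = 2.0, 1.0
Z_hat = soft_svd(X, lam * sigma2)                 # closed-form global minimizer
best = nuclear_norm_objective(X, Z_hat, lam, sigma2)
for _ in range(5):
    Z_pert = Z_hat + 1e-2 * rng.standard_normal(X.shape)
    assert nuclear_norm_objective(X, Z_pert, lam, sigma2) >= best
```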

16 Nuclear Norm penalty Maximum A Posteriori (MAP) estimate $\widehat{Z} = \arg\max_Z \left[\log p(X \mid Z) + \log p(Z)\right]$ under the prior $p(Z) \propto \exp\left(-\lambda \|Z\|_*\right)$ where $Z = U D V^T$ with $D = \mathrm{diag}(d_1, d_2, \dots, d_r)$, U and V given iid Haar uniform priors on unitary matrices, and $d_i \overset{iid}{\sim} \mathrm{Exp}(\lambda)$. F. Caron 15 / 44

17 Hierarchical adaptive spectral penalty Each singular value has its own random shrinkage coefficient. Hierarchical model, for each singular value $i = 1, \dots, r$: $d_i \mid \gamma_i \sim \mathrm{Exp}(\gamma_i)$, $\gamma_i \sim \mathrm{Gamma}(a, b)$. Marginal distribution over $d_i$: $p(d_i) = \int_0^\infty \mathrm{Exp}(d_i; \gamma_i)\, \mathrm{Gamma}(\gamma_i; a, b)\, d\gamma_i = \frac{a\, b^a}{(d_i + b)^{a+1}}$, a Pareto distribution with heavier tails than the exponential distribution. [Todeschini et al., 2013] F. Caron 16 / 44
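A short sketch that evaluates this closed-form marginal and checks the exponential-gamma mixture by Monte Carlo; the values of a and b below are arbitrary and chosen only for illustration.

```python
import numpy as np

def hasp_marginal_density(d, a, b):
    """Closed-form marginal p(d) = a * b**a / (d + b)**(a + 1) (Pareto-type density)."""
    return a * b**a / (d + b) ** (a + 1)

# Monte-Carlo check of the mixture d | gamma ~ Exp(gamma), gamma ~ Gamma(a, b) (rate b):
rng = np.random.default_rng(1)
a, b = 2.0, 3.0                                              # arbitrary illustrative values
gamma = rng.gamma(shape=a, scale=1.0 / b, size=500_000)      # rate b   <->  scale 1/b
d_samples = rng.exponential(scale=1.0 / gamma)               # rate gamma <-> scale 1/gamma
t = 1.5
print((d_samples > t).mean(), (b / (b + t)) ** a)            # empirical vs closed-form P(d > t)
```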

18 Hierarchical adaptive spectral penalty [Figure: Marginal distribution $p(d_i)$ with $a = b = \beta$, for several values of β] HASP penalty: $\mathrm{pen}(Z) = -\log p(Z) = (a + 1) \sum_{i=1}^r \log(b + d_i)$ Admits as a special case the nuclear norm penalty $\lambda \|Z\|_*$ when $a = \lambda b$ and $b \to \infty$. F. Caron 17 / 44

19 Hierarchical adaptive spectral penalty [Figure: Top: manifolds of constant penalty for a symmetric 2×2 matrix Z = [x, y; y, z] for (a) the nuclear norm, the hierarchical adaptive spectral penalty with a = b = β for (b) β = 1 and (c) β = 0.1, and (d) the rank penalty. Bottom: contours of constant penalty for a diagonal matrix [x, 0; 0, z], where one recovers the classical (e) lasso, (f-g) hierarchical adaptive lasso (HAL) and (h) $\ell_0$ penalties.] F. Caron 18 / 44

20 EM algorithm for MAP estimation Expectation Maximization (EM) algorithm to obtain a MAP estimate $\widehat{Z} = \arg\max_Z \left[\log p(X \mid Z) + \log p(Z)\right]$, i.e. to minimize $L(Z) = \frac{1}{2\sigma^2} \|X - Z\|_F^2 + (a + 1) \sum_{i=1}^r \log(b + d_i)$. F. Caron 19 / 44

21 EM algorithm for MAP estimation Latent variables: $\gamma = (\gamma_1, \dots, \gamma_r)$. E step: $Q(Z, Z^*) = \mathbb{E}\left[\log p(X, Z, \gamma) \mid Z^*, X\right] = C - \frac{1}{2\sigma^2} \|X - Z\|_F^2 - \sum_{i=1}^r \omega_i d_i$ where $\omega_i = \mathbb{E}[\gamma_i \mid d_i^*] = \frac{a+1}{b + d_i^*}$. F. Caron 20 / 44

22 EM algorithm for MAP estimation M step: $\operatorname*{minimize}_Z \; \frac{1}{2\sigma^2} \|X - Z\|_F^2 + \sum_{i=1}^r \omega_i d_i \quad (1)$ Problem (1) is an adaptive spectral penalty regularized optimization problem, with weights $\omega_i = \frac{a+1}{b + d_i^*}$ satisfying $d_1^* \geq d_2^* \geq \dots \geq d_r^* \geq 0 \;\Rightarrow\; \omega_1 \leq \omega_2 \leq \dots \leq \omega_r \quad (2)$ Given condition (2), the solution is given by a weighted soft-thresholded SVD $\widehat{Z} = S_{\sigma^2 \omega}(X) \quad (3)$ where $S_\omega(X) = \widetilde{U} \widetilde{D}_\omega \widetilde{V}^T$ with $\widetilde{D}_\omega = \mathrm{diag}\left((\widetilde{d}_1 - \omega_1)_+, \dots, (\widetilde{d}_r - \omega_r)_+\right)$. [Gaïffas and Lecué, 2011] F. Caron 21 / 44
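A minimal sketch of the M-step operator and the corresponding weights (function names and argument order are illustrative). Since $\omega_i = (a+1)/(b + d_i^*)$ is decreasing in $d_i^*$, passing the previous singular values in decreasing order automatically produces non-decreasing weights, as condition (2) requires.

```python
import numpy as np

def em_weights(d_prev, a, b):
    """Adaptive weights omega_i = (a + 1) / (b + d_i*) from the previous singular values
    (assumed sorted in decreasing order, so the weights come out non-decreasing)."""
    return (a + 1.0) / (b + d_prev)

def weighted_soft_svd(X, omega):
    """S_omega(X): shrink the i-th singular value of X by omega[i] and clip at zero."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(d - omega[: len(d)], 0.0)) @ Vt
```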

23 EM algorithm for MAP estimation [Figure: Thresholding rules on the singular values $\widetilde{d}_i$ of X, for the nuclear norm and the HASP penalty with β = 2 and β = 0.1] The weights penalize the higher singular values less heavily, hence reducing bias. F. Caron 22 / 44

24 Low rank estimation of complete matrices Hierarchical Adaptive Soft Thresholded (HAST) algorithm for low rank estimation of complete matrices: Initialize $Z^{(0)}$. At iteration $t \geq 1$: For $i = 1, \dots, r$, compute the weights $\omega_i^{(t)} = \frac{a+1}{b + d_i^{(t-1)}}$. Set $Z^{(t)} = S_{\sigma^2 \omega^{(t)}}(X)$. If $\frac{L(Z^{(t-1)}) - L(Z^{(t)})}{L(Z^{(t-1)})} < \varepsilon$ then return $\widehat{Z} = Z^{(t)}$. Admits the soft-thresholded SVD operator as a special case when $a = \lambda b$, $b = \beta$ and $\beta \to \infty$. F. Caron 23 / 44
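Putting the two steps together, one possible reading of the HAST loop for a fully observed matrix; the zero initialization and convergence tolerance below are placeholder assumptions, not the authors' implementation. Since X is fixed in the complete case, a single SVD of X suffices and only the thresholds change across iterations.

```python
import numpy as np

def hast(X, a, b, sigma2, n_iter=200, eps=1e-8):
    """Hierarchical Adaptive Soft Thresholded (HAST) estimation, complete case (sketch)."""
    def loss(Z):
        d = np.linalg.svd(Z, compute_uv=False)
        return np.linalg.norm(X - Z, 'fro') ** 2 / (2 * sigma2) + (a + 1) * np.log(b + d).sum()

    Z = np.zeros_like(X, dtype=float)                  # placeholder initialization
    d_prev = np.linalg.svd(Z, compute_uv=False)
    prev_loss = loss(Z)
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X is fixed: one SVD suffices
    for _ in range(n_iter):
        omega = (a + 1.0) / (b + d_prev)               # adaptive weights
        d_new = np.maximum(d - sigma2 * omega, 0.0)    # weighted soft-thresholding
        Z = U @ np.diag(d_new) @ Vt
        d_prev = d_new
        new_loss = loss(Z)
        if (prev_loss - new_loss) / prev_loss < eps:   # relative decrease of the objective
            break
        prev_loss = new_loss
    return Z
```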

25 Settings Parametrization: we set $b = \beta$ and $a = \lambda\beta$, where λ and β are tuning parameters that can be chosen by cross-validation. It is also possible to estimate σ within the EM algorithm. Initialization: initialize with the soft-thresholded SVD with parameter $\sigma^2 \lambda$. F. Caron 24 / 44

26 Outline Introduction Complete case Matrix completion Experiments Conclusion and perspectives F. Caron 25 / 44

27 Matrix completion Only a subset $\Omega \subset \{1,\dots,m\} \times \{1,\dots,n\}$ of the entries of the matrix X is observed. Subset operators: $P_\Omega(X)(i,j) = X_{ij}$ if $(i,j) \in \Omega$, 0 otherwise; $P_\Omega^\perp(X)(i,j) = 0$ if $(i,j) \in \Omega$, $X_{ij}$ otherwise. Same prior over Z. The MAP estimate is obtained by minimizing $L(Z) = \frac{1}{2\sigma^2} \|P_\Omega(X) - P_\Omega(Z)\|_F^2 + (a+1) \sum_{i=1}^r \log(b + d_i)$. F. Caron 26 / 44
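With a boolean mask encoding $\Omega$, the two projection operators reduce to one-liners; the helper layout below is a hypothetical convenience, not the authors' code.

```python
import numpy as np

def P_Omega(X, mask):
    """P_Omega(X): keep the observed entries (mask True), zero elsewhere."""
    return np.where(mask, X, 0.0)

def P_Omega_perp(X, mask):
    """P_Omega_perp(X): keep the unobserved entries, zero on the observed ones."""
    return np.where(mask, 0.0, X)
```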

28 EM algorithm for MAP estimation Latent variables: $\gamma$ and $P_\Omega^\perp(X)$. Hierarchical Adaptive Soft Impute (HASI) algorithm for matrix completion: Initialize $Z^{(0)}$. At iteration $t \geq 1$: For $i = 1, \dots, r$, compute the weights $\omega_i^{(t)} = \frac{a+1}{b + d_i^{(t-1)}}$. Set $Z^{(t)} = S_{\sigma^2 \omega^{(t)}}\left(P_\Omega(X) + P_\Omega^\perp(Z^{(t-1)})\right)$. If $\frac{L(Z^{(t-1)}) - L(Z^{(t)})}{L(Z^{(t-1)})} < \varepsilon$ then return $\widehat{Z} = Z^{(t)}$. F. Caron 27 / 44
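A compact sketch of this HASI iteration under the definitions above; the zero initialization, the way sigma2 enters the thresholds and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def hasi(X, mask, a, b, sigma2, n_iter=200, eps=1e-6):
    """Hierarchical Adaptive Soft Impute (HASI) sketch.
    X: data matrix (values outside mask are ignored); mask: True on observed entries."""
    def loss(Z, d):
        resid = np.where(mask, X - Z, 0.0)
        return np.linalg.norm(resid, 'fro') ** 2 / (2 * sigma2) + (a + 1) * np.log(b + d).sum()

    Z = np.zeros_like(X, dtype=float)                  # placeholder initialization
    d_prev = np.zeros(min(X.shape))
    prev_loss = loss(Z, d_prev)
    for _ in range(n_iter):
        omega = (a + 1.0) / (b + d_prev)               # adaptive weights
        X_filled = np.where(mask, X, Z)                # P_Omega(X) + P_Omega_perp(Z)
        U, d, Vt = np.linalg.svd(X_filled, full_matrices=False)
        d_new = np.maximum(d - sigma2 * omega, 0.0)    # weighted soft-thresholded SVD
        Z = U @ np.diag(d_new) @ Vt
        new_loss = loss(Z, d_new)
        if (prev_loss - new_loss) / prev_loss < eps:   # relative decrease of the objective
            return Z
        d_prev, prev_loss = d_new, new_loss
    return Z
```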

29 EM algorithm for MAP estimation The HASI algorithm admits the Soft-Impute algorithm as a special case when $a = \lambda b$, $b = \beta$ and $\beta \to \infty$: in this case, $\omega_i^{(t)} = \lambda$ for all i. When $\beta < \infty$, the algorithm adaptively updates the weights so as to penalize the higher singular values less heavily. F. Caron 28 / 44

30 Initialization Non-convex objective function: different initializations may lead to different modes. We set $a = \lambda b$ and $b = \beta$ and initialize the algorithm with the Soft-Impute algorithm with regularization parameter $\sigma^2 \lambda$. F. Caron 29 / 44

31 Scaling Similarly to the Soft-Impute algorithm, the computational bottleneck is the computation of the weighted soft-thresholded SVD $S_{\sigma^2 \omega^{(t)}}\left(P_\Omega(X) + P_\Omega^\perp(Z^{(t-1)})\right)$. For large matrices, one can resort to the PROPACK algorithm. It efficiently computes the truncated SVD of the sparse plus low rank matrix $P_\Omega(X) + P_\Omega^\perp(Z^{(t-1)}) = \underbrace{P_\Omega(X) - P_\Omega(Z^{(t-1)})}_{\text{sparse}} + \underbrace{Z^{(t-1)}}_{\text{low rank}}$ and can thus handle large matrices. [Larsen, 2004] F. Caron 30 / 44
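The same sparse-plus-low-rank trick can be sketched with SciPy's iterative SVD instead of PROPACK: wrap the matrix as a LinearOperator whose matrix-vector products combine the sparse residual with the thin factors of $Z^{(t-1)}$, so the dense matrix is never formed. The function name and the choice of scipy.sparse.linalg.svds are assumptions for illustration, not the talk's implementation.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

def truncated_svd_sparse_plus_lowrank(R_sparse, U_prev, d_prev, Vt_prev, k):
    """Truncated SVD of  R_sparse + U_prev @ diag(d_prev) @ Vt_prev  without densifying it.
    R_sparse: scipy sparse matrix holding P_Omega(X) - P_Omega(Z_prev);
    (U_prev, d_prev, Vt_prev): thin SVD factors of the previous low-rank iterate Z_prev."""
    m, n = R_sparse.shape

    def matvec(v):
        v = np.asarray(v).ravel()
        return R_sparse @ v + U_prev @ (d_prev * (Vt_prev @ v))

    def rmatvec(v):
        v = np.asarray(v).ravel()
        return R_sparse.T @ v + Vt_prev.T @ (d_prev * (U_prev.T @ v))

    op = LinearOperator((m, n), matvec=matvec, rmatvec=rmatvec)
    U, d, Vt = svds(op, k=k)
    order = np.argsort(d)[::-1]      # svds does not guarantee an ordering; sort descending
    return U[:, order], d[order], Vt[order, :]
```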

32 Outline Introduction Complete case Matrix completion Experiments Simulated data Collaborative filtering examples Conclusion and perspectives F. Caron 31 / 44

33 Simulated data Procedure We generate matrices $Z = AB^T$ of low rank q, where A and B are Gaussian matrices of size m × q and n × q, with m = n = 100, and add Gaussian noise with σ = 1. The signal to noise ratio is defined as $\mathrm{SNR} = \frac{\mathrm{var}(Z)}{\sigma^2}$. For the HASP penalty, we set $a = \lambda\beta$ and $b = \beta$. Grid of 50 values of the regularization parameter λ. Metrics: $\mathrm{err} = \frac{\|\widehat{Z} - Z\|_F^2}{\|Z\|_F^2}$ and $\mathrm{err}_\Omega = \frac{\|P_\Omega(\widehat{Z}) - P_\Omega(Z)\|_F^2}{\|P_\Omega(Z)\|_F^2}$. F. Caron 32 / 44
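For reference, a sketch of this simulation protocol and of the relative-error metric; the RNG seeding, default arguments and helper names are arbitrary illustrative choices.

```python
import numpy as np

def simulate_low_rank(m=100, n=100, q=5, sigma=1.0, frac_missing=0.5, seed=0):
    """Low-rank matrix Z = A B^T plus Gaussian noise, with a random subset of entries observed."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, q))
    B = rng.standard_normal((n, q))
    Z = A @ B.T
    X = Z + sigma * rng.standard_normal((m, n))
    mask = rng.random((m, n)) >= frac_missing     # True on observed entries
    return X, Z, mask

def relative_error(Z_hat, Z, mask=None):
    """||Z_hat - Z||_F^2 / ||Z||_F^2, optionally restricted to the entries selected by mask."""
    if mask is not None:
        Z_hat, Z = np.where(mask, Z_hat, 0.0), np.where(mask, Z, 0.0)
    return np.linalg.norm(Z_hat - Z, 'fro') ** 2 / np.linalg.norm(Z, 'fro') ** 2
```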

34 Simulated data Complete case [Figure: Relative error w.r.t. the rank obtained by varying the value of the regularization parameter λ, for ST, HT and HAST with β = 100, 10, 1; SNR=1, complete, rank=10] The HASP penalty provides a bridge/tradeoff between the nuclear norm and the rank penalty. For example, the value β = 10 shows a minimum at the true rank q = 10, as HT does, but with a lower error when the rank is overestimated. F. Caron 33 / 44

35 Simulated data Incomplete case [Figure: Test error w.r.t. the rank obtained by varying the value of the regularization parameter λ, averaged over 50 replications, for MMMF, SoftImp, SoftImp+, HardImp and HASI with β = 100, 10, 1; (a) SNR=1, 50% missing, rank=5 and (b) SNR=10, 80% missing, rank=5] Similar behavior is observed, with the HASI algorithm attaining a minimum at the true rank q = 5. F. Caron 34 / 44

36 Simulated data Incomplete case We then remove 20% of the observed entries as a validation set to estimate the regularization parameters, and use the unobserved entries as a test set. [Figure: Boxplots of the test error and ranks obtained over 50 replications for MMMF, SoftImp, SoftImp+, HardImp and HASI; (a)-(b) SNR=1, 50% missing and (c)-(d) SNR=10, 80% missing] F. Caron 35 / 44

37 Collaborative filtering examples (Jester) Procedure We randomly select two ratings per user as a test set, and two other ratings per user as a validation set to select the parameters λ and β. The results are computed over four values β = 1000, 100, 10, 1. We compare the different methods using the Normalized Mean Absolute Error (NMAE) $\mathrm{NMAE} = \frac{1}{\mathrm{card}(\Omega_{\mathrm{test}})} \sum_{(i,j) \in \Omega_{\mathrm{test}}} \frac{|X_{ij} - \widehat{Z}_{ij}|}{\max(X) - \min(X)}$ F. Caron 36 / 44
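The NMAE is straightforward to compute from a test mask; the helper below is a sketch in which the rating range is passed explicitly rather than hard-coded, and the names are illustrative.

```python
import numpy as np

def nmae(X, Z_hat, test_mask, rating_min, rating_max):
    """Normalized mean absolute error over the entries flagged by test_mask (Omega_test)."""
    abs_err = np.abs(X - Z_hat)[test_mask]
    return abs_err.mean() / (rating_max - rating_min)
```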

38 Collaborative filtering examples (Jester) [Table: Results on the Jester 1, Jester 2 and Jester 3 datasets (27.3% and 75.3% missing), averaged over 10 replications; NMAE and rank for MMMF, Soft Imp, Soft Imp+, Hard Imp and HASI] F. Caron 37 / 44

39 Collaborative filtering examples (Jester) [Figure: NMAE w.r.t. the rank obtained by varying the regularization parameter λ, for MMMF, SoftImp, SoftImp+, HardImp and HASI with β = 1000, 100, 10, 1; (a) Jester 1 and (b) Jester 3] F. Caron 38 / 44

40 Collaborative filtering examples (MovieLens) Procedure We randomly select 20% of the entries as a test set, and the remaining entries are split between a training set (80%) and a validation set (20%). For all the methods, we stop the regularization path as soon as the estimated rank exceeds $r_{\max} = 100$. For the larger MovieLens 1M dataset, the precision, maximum number of iterations and maximum rank are decreased to $\epsilon = 10^{-6}$, $t_{\max} = 100$ and $r_{\max} = 30$. F. Caron 39 / 44

41 Collaborative filtering examples (MovieLens) [Table: Results on the MovieLens 100k and MovieLens 1M (95.8% missing) datasets, averaged over 5 replications; NMAE and rank for MMMF, Soft Imp, Soft Imp+, Hard Imp and HASI] F. Caron 40 / 44

42 Outline Introduction Complete case Matrix completion Experiments Conclusion and perspectives F. Caron 41 / 44

43 Conclusion and perspectives Conclusion: Good results compared to several alternative low rank matrix completion methods. Bridge between nuclear norm and rank regularization algorithms. Can be extended to binary matrices Non-convex optimization, but experiments show that initializing the algorithm with the Soft-Impute algorithm provides very satisfactory results. Matlab code available online Perspectives: Fully Bayesian approach Larger datasets F. Caron 42 / 44

44 Bibliography I
Cai, J., Candès, E., and Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4).
Candès, E. and Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6).
Candès, E. J. and Tao, T. (2010). The power of convex relaxation: Near-optimal matrix completion. IEEE Transactions on Information Theory, 56(5).
Fazel, M. (2002). Matrix rank minimization with applications. PhD thesis, Stanford University.
Gaïffas, S. and Lecué, G. (2011). Weighted algorithms for compressed sensing and matrix completion. arXiv preprint.
Larsen, R. M. (2004). PROPACK: software for large and sparse SVD calculations. Available online.
F. Caron 43 / 44

45 Bibliography II
Mazumder, R., Hastie, T., and Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. The Journal of Machine Learning Research, 11.
Todeschini, A., Caron, F., and Chavent, M. (2013). Probabilistic low-rank matrix completion with adaptive spectral regularization algorithms. In Advances in Neural Information Processing Systems.
F. Caron 44 / 44
