Divide-and-Conquer Matrix Factorization

1 Divide-and-Conquer Matrix Factorization
Lester Mackey (Stanford University)
Collaborators: Ameet Talwalkar (UCLA) and Michael I. Jordan (UC Berkeley)
December 14, 2015

2 Motivation: Large-scale Matrix Completion
Goal: Estimate a matrix $L_0 \in \mathbb{R}^{m \times n}$ given a subset of its entries.
[Figure: a ratings matrix with mostly missing "?" entries.]
Examples:
- Collaborative filtering: How will user i rate movie j? (Netflix: 40 million users, 200K movies and television shows)
- Ranking on the web: Is URL j relevant to user i? (Google News: millions of articles, 1 billion users)
- Link prediction: Is user i friends with user j? (Facebook: 1.5 billion users)

3 Motivation: Large-scale Matrix Completion
Goal: Estimate a matrix $L_0 \in \mathbb{R}^{m \times n}$ given a subset of its entries.
State-of-the-art MC algorithms:
- Strong estimation guarantees
- Plagued by expensive subroutines (e.g., truncated SVD)
This talk: divide-and-conquer approaches for scaling up any MC algorithm while maintaining strong estimation guarantees.

4 Exact Matrix Completion
Goal: Estimate a matrix $L_0 \in \mathbb{R}^{m \times n}$ given a subset of its entries.

5 Noisy Matrix Completion
Goal: Given entries from a matrix $M = L_0 + Z \in \mathbb{R}^{m \times n}$, where $Z$ is entrywise noise and $L_0$ has rank $r \ll m, n$, estimate $L_0$.
Good news: $L_0$ has only $(m + n)r \ll mn$ degrees of freedom.
Factored form: $L_0 = A B^\top$ for $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{n \times r}$.
Bad news: Not all low-rank matrices can be recovered.
Question: What can go wrong?

6 What can go wrong?
An entire column missing: no hope of recovering that column!
Solution: Uniform observation model. Assume that the set of $s$ observed entries $\Omega$ is drawn uniformly at random: $\Omega \sim \mathrm{Unif}(m, n, s)$.
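
A minimal sketch of this sampling scheme (numpy assumed; the helper name sample_mask is mine and is reused by the later sketches):

```python
import numpy as np

def sample_mask(m, n, s, rng=None):
    """Draw Omega ~ Unif(m, n, s): s entry indices sampled uniformly
    at random without replacement, returned as a 0/1 observation mask."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(m * n, size=s, replace=False)
    mask = np.zeros(m * n)
    mask[idx] = 1.0
    return mask.reshape(m, n)
```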

7 What can go wrong?
Bad spread of information: if $L = e_1 e_1^\top$ (a single nonzero entry), $L$ can only be recovered if $L_{11}$ is observed.
Solution: Incoherence with the standard basis (Candès and Recht, 2009). A matrix $L = U \Sigma V^\top \in \mathbb{R}^{m \times n}$ with $\mathrm{rank}(L) = r$ is incoherent if its singular vectors are not too skewed,
  $\max_i \|U U^\top e_i\|^2 \le \mu r / m$ and $\max_i \|V V^\top e_i\|^2 \le \mu r / n$,
and not too cross-correlated,
  $\|U V^\top\|_\infty \le \sqrt{\mu r / (mn)}$.
(In this literature, it's good to be incoherent.)
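
To make the definition concrete, here is a small sketch (the function name coherence and its tolerance are mine, not from the talk) that computes the smallest µ for which all three bounds hold, using the fact that $\|U U^\top e_i\|^2$ equals the squared norm of the i-th row of $U$:

```python
import numpy as np

def coherence(L, tol=1e-10):
    """Smallest mu satisfying the three incoherence bounds for L = U S V^T."""
    m, n = L.shape
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int((s > tol * s[0]).sum())                  # numerical rank
    U, V = U[:, :r], Vt[:r, :].T
    mu_U = (m / r) * np.max(np.sum(U**2, axis=1))    # max_i ||U U^T e_i||^2 * m/r
    mu_V = (n / r) * np.max(np.sum(V**2, axis=1))
    mu_UV = (m * n / r) * np.max(np.abs(U @ V.T))**2 # ||U V^T||_inf^2 * mn/r
    return max(mu_U, mu_V, mu_UV)
```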

8 How do we estimate $L_0$?
First attempt:
  minimize over $A$: $\mathrm{rank}(A)$  subject to  $\sum_{(i,j) \in \Omega} (A_{ij} - M_{ij})^2 \le \Delta^2$.
Problem: Computationally intractable!
Solution: Solve a convex relaxation (Fazel, Hindi, and Boyd, 2001; Candès and Plan, 2010):
  minimize over $A$: $\|A\|_*$  subject to  $\sum_{(i,j) \in \Omega} (A_{ij} - M_{ij})^2 \le \Delta^2$,
where $\|A\|_* = \sum_k \sigma_k(A)$ is the trace/nuclear norm of $A$.
Questions: Will the nuclear norm heuristic successfully recover $L_0$? Can nuclear norm minimization scale to large MC problems?
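
The tailored solvers cited two slides below are all built around the prox operator of the nuclear norm, singular value thresholding. The following is a minimal proximal-gradient sketch of the Lagrangian form of the heuristic, my simplification rather than the exact constrained solver; mask is a 0/1 observation matrix and M is assumed zero-filled off $\Omega$:

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_complete(M, mask, lam=1.0, n_iters=200):
    """Proximal gradient on 0.5*||P_Omega(A - M)||_F^2 + lam*||A||_*,
    a Lagrangian-form stand-in for the constrained heuristic above.
    M must be zero-filled at unobserved entries."""
    A = np.zeros_like(M)
    for _ in range(n_iters):
        grad = mask * (A - M)   # gradient of the smooth term
        A = svt(A - grad, lam)  # unit step is valid: the gradient is 1-Lipschitz
    return A
```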

9 Noisy Nuclear Norm Heuristic: Does it work?
Yes, with high probability.
Typical Theorem: If $L_0$ with rank $r$ is incoherent, $s \gtrsim r n \log^2(n)$ entries of $M \in \mathbb{R}^{m \times n}$ are observed uniformly at random, and $\hat{L}$ solves the noisy nuclear norm heuristic, then
  $\|\hat{L} - L_0\|_F \le \Delta \, f(m, n)$
with high probability when $\|M - L_0\|_F \le \Delta$.
See Candès and Plan (2010); Mackey, Talwalkar, and Jordan (2014b); Keshavan, Montanari, and Oh (2010); Negahban and Wainwright (2010).
Implies exact recovery in the noiseless setting ($\Delta = 0$).

10 Noisy Nuclear Norm Heuristic: Does it scale?
Not quite...
Standard interior point methods (Candès and Recht, 2009): $O(|\Omega|(m + n)^3 + |\Omega|^2 (m + n)^2 + |\Omega|^3)$.
More efficient, tailored algorithms:
- Singular Value Thresholding (SVT) (Cai, Candès, and Shen, 2010)
- Augmented Lagrange Multiplier (ALM) (Lin, Chen, Wu, and Ma, 2009a)
- Accelerated Proximal Gradient (APG) (Toh and Yun, 2010)
All require a rank-k truncated SVD on every iteration.
Take-away: These provably accurate MC algorithms are too expensive for large-scale or real-time matrix completion.
Question: How can we scale up a given matrix completion algorithm and still retain estimation guarantees?

11 Divide-Factor-Combine (DFC)
Our solution: divide and conquer.
1. Divide M into submatrices.
2. Factor each submatrix in parallel.
3. Combine submatrix estimates to estimate $L_0$.
Advantages:
- Submatrix completion is often much cheaper than completing M.
- Multiple submatrix completions can be carried out in parallel.
- DFC works with any base MC algorithm.
- With the right choice of division and recombination, DFC yields estimation guarantees comparable to those of the base algorithm.

12 DFC-Proj: Partition and Project
1. Randomly partition $M$ into $t$ column submatrices: $M = [C_1 \; C_2 \; \cdots \; C_t]$, where each $C_i \in \mathbb{R}^{m \times l}$.
2. Complete the submatrices in parallel to obtain $[\hat{C}_1 \; \hat{C}_2 \; \cdots \; \hat{C}_t]$.
   - Reduced cost: expect a t-fold speed-up per iteration.
   - Parallel computation: pay the cost of one cheaper MC.
3. Project the submatrices onto a single low-dimensional column space: estimate the column space of $L_0$ with the column space of $\hat{C}_1$,
   $\hat{L}^{\mathrm{proj}} = \hat{C}_1 \hat{C}_1^{+} [\hat{C}_1 \; \hat{C}_2 \; \cdots \; \hat{C}_t]$.
   - A common technique for randomized low-rank approximation (Frieze, Kannan, and Vempala, 1998).
   - Minimal cost: $O(mk^2 + lk^2)$, where $k = \mathrm{rank}(\hat{L}^{\mathrm{proj}})$.
4. Ensemble: project onto the column space of each $\hat{C}_j$ and average (see the sketch after this list).
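
A compact sketch of steps 1-3 (base_complete stands for any base MC routine; the ensemble variant of step 4 would additionally average over projections onto each Ĉ_j):

```python
import numpy as np

def dfc_proj(M, mask, base_complete, t=4, rng=None):
    """DFC-Proj sketch: divide the columns into t groups, run base_complete
    on each group, then project every submatrix estimate onto the column
    space of the first estimate."""
    rng = np.random.default_rng(rng)
    m, n = M.shape
    blocks = np.array_split(rng.permutation(n), t)
    # Factor step: independent subproblems (parallel in practice, serial here)
    C_hat = [base_complete(M[:, b], mask[:, b]) for b in blocks]
    # Combine step: L_proj = C1 C1^+ [C1 ... Ct]
    proj = C_hat[0] @ np.linalg.pinv(C_hat[0])
    L = np.empty((m, n))
    for b, Ch in zip(blocks, C_hat):
        L[:, b] = proj @ Ch
    return L
```

For instance, dfc_proj(M, mask, nuclear_norm_complete, t=10) wires in the earlier nuclear-norm sketch as the base algorithm.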

13 DFC: Does it work?
Yes, with high probability.
Theorem (Mackey, Talwalkar, and Jordan, 2014b): If $L_0$ with rank $r$ is incoherent and $s = \omega(r^2 n \log^2(n) / \epsilon^2)$ entries of $M \in \mathbb{R}^{m \times n}$ are observed uniformly at random, then $l = o(n)$ random columns suffice to have
  $\|\hat{L}^{\mathrm{proj}} - L_0\|_F \le (2 + \epsilon) \, \Delta \, f(m, n)$
with high probability when $\|M - L_0\|_F \le \Delta$ and the noisy nuclear norm heuristic is used as the base algorithm.
- Can sample a vanishingly small fraction of columns ($l/n \to 0$).
- Implies exact recovery in the noiseless setting ($\Delta = 0$).
- Analysis streamlined by a matrix Bernstein inequality.

14 DFC: Does it work? Proof ideas
1. If $L_0$ is incoherent (has good spread of information), its partitioned submatrices are incoherent w.h.p.
2. Each submatrix has sufficiently many observed entries w.h.p., so submatrix completion succeeds.
3. A random submatrix captures the full column space of $L_0$ w.h.p., so column projection succeeds. (Analysis builds on the randomized $\ell_2$ regression work of Drineas, Mahoney, and Muthukrishnan, 2008.)

15 DFC Noisy Recovery Error
[Figure: RMSE vs. % of revealed entries for DFC-Proj-10%, DFC-Proj-Ens-10%, and the base MC algorithm (APG), with m = 10K and r = 10.]

16 DFC Speed-up
[Figure: MC time (s) vs. m for DFC-Proj-10%, DFC-Proj-Ens-10%, and the base algorithm (APG), on random matrices with r = 0.001m and 4% of entries revealed.]

17 Application: Collaborative Filtering
Task: Given a sparsely observed matrix of user-item ratings, predict the unobserved ratings.
Issues: full-rank rating matrix; noisy, non-uniform observations.
The data: Netflix Prize dataset. 100 million ratings in {1, ..., 5}; 17,770 movies; 480,189 users.

18 Application: Collaborative Filtering
Task: Predict unobserved user-item ratings.
[Table: Netflix results. Methods: APG, DFC-Proj-25%, DFC-Proj-10%, DFC-Proj-Ens-25%, DFC-Proj-Ens-10%; columns: RMSE and time (s).]

19 Robust Matrix Factorization
Goal: Given a matrix $M = L_0 + S_0 + Z$, where $L_0$ is low-rank, $S_0$ is sparse, and $Z$ is entrywise noise, recover $L_0$ (Chandrasekaran, Sanghavi, Parrilo, and Willsky, 2009; Candès, Li, Ma, and Wright, 2011; Zhou, Li, Wright, Candès, and Ma, 2010).
Example: background modeling / foreground activity detection.
[Figure: video frames M decomposed into a static background L_0 and a sparse foreground S (Candès, Li, Ma, and Wright, 2011).]

20 Robust Matrix Factorization
Goal: Given a matrix $M = L_0 + S_0 + Z$, where $L_0$ is low-rank, $S_0$ is sparse, and $Z$ is entrywise noise, recover $L_0$ (Chandrasekaran, Sanghavi, Parrilo, and Willsky, 2009; Candès, Li, Ma, and Wright, 2011; Zhou, Li, Wright, Candès, and Ma, 2010).
- $S_0$ can be viewed as an outlier/gross corruption matrix.
- Ordinary PCA breaks down in this setting.
- Harder than MC: the outlier locations are unknown.
- More expensive than MC: dense, fully observed matrices.

21 How do we recover $L_0$?
First attempt:
  minimize over $L, S$: $\mathrm{rank}(L) + \lambda \, \mathrm{card}(S)$  subject to  $\|M - L - S\|_F \le \Delta$.
Problem: Computationally intractable!
Solution: Convex relaxation:
  minimize over $L, S$: $\|L\|_* + \lambda \|S\|_1$  subject to  $\|M - L - S\|_F \le \Delta$,
where $\|S\|_1 = \sum_{ij} |S_{ij}|$ is the entrywise $\ell_1$ norm of $S$.
Question (does it work?): Will noisy Principal Component Pursuit (PCP) recover $L_0$?
Question (is it efficient?): Can noisy PCP scale to large RMF problems?
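
Both prox operators behind the tailored PCP solvers are cheap to state. The sketch below is a minimal alternating-minimization stand-in for noisy PCP, my simplification of the Lagrangian form, reusing svt from the earlier matrix-completion sketch; the talk's actual solvers are the APG and ALM methods on a later slide:

```python
import numpy as np

def soft(X, tau):
    """Entrywise soft-thresholding: the prox operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def pcp_alternating(M, lam_L=1.0, lam_S=None, n_iters=100):
    """Exact alternating minimization of
    0.5*||M - L - S||_F^2 + lam_L*||L||_* + lam_S*||S||_1 over L and S."""
    if lam_S is None:
        lam_S = lam_L / np.sqrt(max(M.shape))  # mirrors lambda = 1/sqrt(n)
    L, S = np.zeros_like(M), np.zeros_like(M)
    for _ in range(n_iters):
        L = svt(M - S, lam_L)    # exact minimizer in L (svt from the MC sketch)
        S = soft(M - L, lam_S)   # exact minimizer in S
    return L, S
```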

22 Noisy Principal Component Pursuit: Does it work?
Yes, with high probability.
Theorem (Zhou, Li, Wright, Candès, and Ma, 2010): Suppose $L_0$ with rank $r = O(m / \log^2 n)$ is incoherent and $S_0 \in \mathbb{R}^{m \times n}$ contains $s \le c \, mn$ non-zero entries with uniformly distributed locations. If $\hat{L}$ minimizes
  $\|L\|_* + \lambda \|S\|_1$  subject to  $\|M - L - S\|_F \le \Delta$
with $\lambda = 1/\sqrt{n}$, then
  $\|\hat{L} - L_0\|_F \le \Delta \, f(m, n)$
with high probability when $\|M - L_0 - S_0\|_F \le \Delta$.
See also Agarwal, Negahban, and Wainwright (2011).

23 Noisy Principal Component Pursuit: Is it efficient?
Not quite...
Standard interior point methods: $O(n^6)$ (Chandrasekaran, Sanghavi, Parrilo, and Willsky, 2009).
More efficient, tailored algorithms:
- Accelerated Proximal Gradient (APG) (Lin, Ganesh, Wright, Wu, Chen, and Ma, 2009b)
- Augmented Lagrange Multiplier (ALM) (Lin, Chen, Wu, and Ma, 2009a)
These require a rank-k truncated SVD on every iteration; best case $\mathrm{SVD}(m, n, k) = O(mnk)$.
Idea: Leverage the divide-and-conquer techniques developed for MC in the RMF setting.

24 DFC: Does it work?
Yes, with high probability.
Theorem (Mackey, Talwalkar, and Jordan, 2014b): If $L_0$ with rank $r$ is incoherent, and $S_0 \in \mathbb{R}^{m \times n}$ contains $s \le c \, mn$ non-zero entries with uniformly distributed locations, then
  $l = O(r^2 \log^2(n) / \epsilon^2)$
random columns suffice to have
  $\|\hat{L}^{\mathrm{proj}} - L_0\|_F \le (2 + \epsilon) \, \Delta \, f(m, n)$
with high probability when $\|M - L_0 - S_0\|_F \le \Delta$ and noisy principal component pursuit is used as the base algorithm.
- Can sample a polylogarithmic number of columns.
- Implies exact recovery in the noiseless setting ($\Delta = 0$).
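
Composing the earlier sketches illustrates DFC in this RMF setting: the (sketched) PCP routine becomes the base algorithm, and since RMF inputs are fully observed the mask is all ones:

```python
import numpy as np

def base_rmf(X, mask):
    """Base algorithm for DFC: keep only the low-rank part from the PCP
    sketch. RMF inputs are fully observed, so the mask argument is ignored."""
    return pcp_alternating(X)[0]

# L_hat = dfc_proj(M, np.ones_like(M), base_rmf, t=10)
```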

25 DFC Estimation Error
[Figure: RMF RMSE vs. % of outliers for DFC-Proj-10%, DFC-Proj-Ens-10%, and the base algorithm (APG), with m = 1K and r = 10.]

26 DFC Speed-up
[Figure: RMF time (s) vs. m for DFC-Proj-10%, DFC-Proj-Ens-10%, and the base algorithm (APG), on random matrices with r = 0.01m and 10% of entries corrupted.]

27 Application: Video Background Modeling
Task: Each video frame forms one column of a matrix M; decompose M into a stationary background $L_0$ and moving foreground objects $S_0$.
[Figure: example frames of M and their decomposition into L_0 and S_0.]
Challenges:
- Video is noisy.
- Foreground corruption is often clustered, not uniform.

28 Application: Video Background Modeling
Example: significant foreground variation.
Specs: 1 minute of airport surveillance (Li, Huang, Gu, and Tian, 2004); 1000 frames.
Base algorithm: half an hour. DFC: 7 minutes.

29 Application: Video Background Modeling
Example: changes in illumination.
Specs: 1.5 minutes of lobby surveillance (Li, Huang, Gu, and Tian, 2004); 1546 frames.
Base algorithm: 1.5 hours. DFC: 8 minutes.

30 Future Directions
New applications and datasets: practical problems with large-scale or real-time requirements.

31 Example: Large-scale Affinity Estimation
Goal: Estimate semantic similarity between pairs of datapoints.
Motivation: Assign class labels to datapoints based on similarity.
Examples from computer vision:
- Image tagging: tree vs. firefighter vs. Tony Blair
- Video / multimedia content detection: wedding vs. concert
- Face clustering
Application: content detection on 9K YouTube videos, 20 classes.
- Baseline: Low-Rank Representation (Liu, Lin, and Yu, 2010). Strong guarantees, but 1.5 days to run.
- Divide and conquer (Talwalkar, Mackey, Mu, Chang, and Jordan, 2013): comparable guarantees and comparable performance in 1 hour (5 subproblems).

32 Future Directions
- New applications and datasets: practical problems with large-scale or real-time requirements.
- New divide-and-conquer strategies: other ways to reduce computation while preserving accuracy.

33 DFC-Nys: Generalized Nyström Decomposition
1. Choose a random column submatrix $C \in \mathbb{R}^{m \times l}$ and a random row submatrix $R \in \mathbb{R}^{d \times n}$ from $M$; call their intersection $W$:
   $M = \begin{bmatrix} W & M_{12} \\ M_{21} & M_{22} \end{bmatrix}$, $\quad C = \begin{bmatrix} W \\ M_{21} \end{bmatrix}$, $\quad R = \begin{bmatrix} W & M_{12} \end{bmatrix}$.
2. Recover the low-rank components of $C$ and $R$ in parallel to obtain $\hat{C}$ and $\hat{R}$.
3. Recover $L_0$ from $\hat{C}$, $\hat{R}$, and their intersection $\hat{W}$:
   $\hat{L}^{\mathrm{nys}} = \hat{C} \hat{W}^{+} \hat{R}$.
   - The generalized Nyström method (Goreinov, Tyrtyshnikov, and Zamarashkin, 1997).
   - Minimal cost: $O(mk^2 + lk^2 + dk^2)$, where $k = \mathrm{rank}(\hat{L}^{\mathrm{nys}})$.
4. Ensemble: run $p$ times in parallel and average the estimates. (A sketch of steps 1-3 follows.)
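
A sketch of steps 1-3 under the same assumptions as the earlier DFC-Proj code (base_complete is any base low-rank recovery routine; Ŵ is read off the recovered column block):

```python
import numpy as np

def dfc_nys(M, mask, base_complete, l, d, rng=None):
    """DFC-Nys sketch: recover a random column block C and row block R with
    any base routine, then combine via the generalized Nystrom formula
    L_nys = C_hat W_hat^+ R_hat."""
    rng = np.random.default_rng(rng)
    m, n = M.shape
    cols = rng.choice(n, size=l, replace=False)
    rows = rng.choice(m, size=d, replace=False)
    C_hat = base_complete(M[:, cols], mask[:, cols])  # factor C ...
    R_hat = base_complete(M[rows, :], mask[rows, :])  # ... and R in parallel
    W_hat = C_hat[rows, :]                            # intersection estimate
    return C_hat @ np.linalg.pinv(W_hat) @ R_hat
```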

34 Future Directions
- New applications and datasets: practical problems with large-scale or real-time requirements.
- New divide-and-conquer strategies: other ways to reduce computation while preserving accuracy; more extensive use of ensembling.
- New theory: analyze the statistical implications of divide-and-conquer algorithms (the trade-off between statistical and computational efficiency; the impact of ensembling); develop a suite of matrix concentration inequalities to aid in the analysis of randomized algorithms with matrix data.

35 Concentration Inequalities
Matrix concentration: just as scalar concentration bounds $P\{|X - \mathbb{E} X| \ge t\} \le \delta$, matrix concentration bounds $P\{\lambda_{\max}(X - \mathbb{E} X) \ge t\} \le \delta$.
Non-asymptotic control of random matrices with complex distributions.
Applications:
- Matrix completion from sparse random measurements (Gross, 2011; Recht, 2011; Negahban and Wainwright, 2010; Mackey, Talwalkar, and Jordan, 2014b)
- Randomized matrix multiplication and factorization (Drineas, Mahoney, and Muthukrishnan, 2008; Hsu, Kakade, and Zhang, 2011)
- Convex relaxation of robust or chance-constrained optimization (Nemirovski, 2007; So, 2011; Cheung, So, and Wang, 2011)
- Random graph analysis (Christofides and Markström, 2008; Oliveira, 2009)

36 Concentration Inequalities
Matrix concentration: $P\{\lambda_{\max}(X - \mathbb{E} X) \ge t\} \le \delta$.
Difficulty: matrix multiplication is not commutative, so in general $e^{X+Y} \ne e^X e^Y \ne e^Y e^X$.
Past approaches (Ahlswede and Winter, 2002; Oliveira, 2009; Tropp, 2011):
- Rely on deep results from matrix analysis.
- Apply to sums of independent matrices and matrix martingales.
Our work (Mackey, Jordan, Chen, Farrell, and Tropp, 2014a; Paulin, Mackey, and Tropp, 2015):
- Stein's method of exchangeable pairs (1972), as advanced by Chatterjee (2007) for scalar concentration.
- Improved exponential tail inequalities (Hoeffding, Bernstein, bounded differences).
- Polynomial moment inequalities (Khintchine, Rosenthal).
- Dependent sums and more general matrix functionals.

37 Example: Matrix Bounded Differences Inequality
Corollary (Paulin, Mackey, and Tropp, 2015): Suppose $Z = (Z_1, \ldots, Z_n)$ has independent coordinates, and
  $\big(H(z_1, \ldots, z_j, \ldots, z_n) - H(z_1, \ldots, z_j', \ldots, z_n)\big)^2 \preceq A_j^2$
for all $j$ and all values $z_1, \ldots, z_n, z_j'$. Define the boundedness parameter $\sigma^2 := \big\| \sum_{j=1}^n A_j^2 \big\|$. If each $A_j$ is $d \times d$, then, for all $t \ge 0$,
  $P\{\lambda_{\max}(H(Z) - \mathbb{E} H(Z)) \ge t\} \le d \, e^{-t^2 / (2\sigma^2)}$.
- Improves prior results in the literature (e.g., Tropp, 2011).
- Useful for analyzing multiclass classifier performance (Machart and Ralaivola, 2012), crowdsourcing accuracy (Dalvi, Dasgupta, Kumar, and Rastogi, 2013), and convergence in non-differentiable optimization (Zhou and Hu, 2014).
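
As a sanity check, the bound can be simulated for a matrix Rademacher series $H(Z) = \sum_j Z_j B_j$, where flipping one sign changes $H$ by $2 B_j$, so $A_j^2 = 4 B_j^2$. This is a self-contained numerical sketch, not from the talk; the summands and dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 50, 5, 2000

# Symmetric summands B_j; sigma^2 = lambda_max(sum_j 4 B_j^2).
B = rng.standard_normal((n, d, d))
B = (B + B.transpose(0, 2, 1)) / 2
sigma2 = np.linalg.eigvalsh(4 * np.einsum('jab,jbc->ac', B, B)).max()

def lam_max_dev():
    z = rng.choice([-1.0, 1.0], size=n)
    return np.linalg.eigvalsh(np.einsum('j,jab->ab', z, B)).max()  # E H(Z) = 0

# The corollary predicts P{lambda_max >= t} <= 0.01 at this t.
t = np.sqrt(2 * sigma2 * np.log(d / 0.01))
emp = np.mean([lam_max_dev() >= t for _ in range(trials)])
print(f"empirical tail {emp:.4f} vs. bound 0.01")
```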

38 Future Directions
- New applications and datasets: practical problems with large-scale or real-time requirements.
- New divide-and-conquer strategies: other ways to reduce computation while preserving accuracy; more extensive use of ensembling.
- New theory: statistical implications of divide-and-conquer algorithms (statistical vs. computational efficiency; impact of ensembling); a suite of matrix concentration inequalities for analyzing randomized algorithms with matrix data.

39 The End. Thanks!
Divide-Factor-Combine (Project): $P_\Omega(M)$ → $P_\Omega(C_1), P_\Omega(C_2), \ldots, P_\Omega(C_t)$ → $\hat{C}_1, \hat{C}_2, \ldots, \hat{C}_t$ → $\hat{L}^{\mathrm{proj}}$
Divide-Factor-Combine (Nyström): $P_\Omega(M)$ → $P_\Omega(C), P_\Omega(R)$ → $\hat{C}, \hat{R}$ → $\hat{L}^{\mathrm{nys}}$

40 References I
Agarwal, A., Negahban, S., and Wainwright, M. J. Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. In International Conference on Machine Learning, 2011.
Ahlswede, R. and Winter, A. Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3), 2002.
Cai, J. F., Candès, E. J., and Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 2010.
Candès, E. J. and Recht, B. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 2009.
Candès, E. J., Li, X., Ma, Y., and Wright, J. Robust principal component analysis? Journal of the ACM, 58(3):1-37, 2011.
Candès, E. J. and Plan, Y. Matrix completion with noise. Proceedings of the IEEE, 98(6), 2010.
Chandrasekaran, V., Sanghavi, S., Parrilo, P. A., and Willsky, A. S. Sparse and low-rank matrix decompositions. In Allerton Conference on Communication, Control, and Computing, 2009.
Chandrasekaran, V., Parrilo, P. A., and Willsky, A. S. Latent variable graphical model selection via convex optimization. Preprint.
Chatterjee, S. Stein's method for concentration inequalities. Probab. Theory Related Fields, 138, 2007.
Cheung, S.-S., So, A. M.-C., and Wang, K. Chance-constrained linear matrix inequalities with dependent perturbations: a safe tractable approximation approach. Preprint, 2011.
Christofides, D. and Markström, K. Expansion properties of random Cayley graphs and vertex transitive graphs via matrix martingales. Random Struct. Algorithms, 32(1):88-100, 2008.
Dalvi, N., Dasgupta, A., Kumar, R., and Rastogi, V. Aggregating crowdsourced binary ratings. In Proceedings of the 22nd International Conference on World Wide Web (WWW '13), 2013.
Drineas, P., Mahoney, M. W., and Muthukrishnan, S. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30, 2008.
Fazel, M., Hindi, H., and Boyd, S. P. A rank minimization heuristic with application to minimum order system approximation. In Proceedings of the 2001 American Control Conference, 2001.

41 References II
Frieze, A., Kannan, R., and Vempala, S. Fast Monte-Carlo algorithms for finding low-rank approximations. In Foundations of Computer Science, 1998.
Goreinov, S. A., Tyrtyshnikov, E. E., and Zamarashkin, N. L. A theory of pseudoskeleton approximations. Linear Algebra and its Applications, 261(1-3):1-21, 1997.
Gross, D. Recovering low-rank matrices from few coefficients in any basis. IEEE Trans. Inform. Theory, 57(3), 2011.
Hsu, D., Kakade, S. M., and Zhang, T. Dimension-free tail inequalities for sums of random matrices. arXiv preprint, 2011.
Keshavan, R. H., Montanari, A., and Oh, S. Matrix completion from noisy entries. Journal of Machine Learning Research, 99, 2010.
Li, L., Huang, W., Gu, I. Y. H., and Tian, Q. Statistical modeling of complex backgrounds for foreground object detection. IEEE Transactions on Image Processing, 13(11), 2004.
Lin, Z., Chen, M., Wu, L., and Ma, Y. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. UIUC Technical Report UILU-ENG, 2009a.
Lin, Z., Ganesh, A., Wright, J., Wu, L., Chen, M., and Ma, Y. Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. UIUC Technical Report UILU-ENG, 2009b.
Liu, G., Lin, Z., and Yu, Y. Robust subspace segmentation by low-rank representation. In International Conference on Machine Learning, 2010.
Machart, P. and Ralaivola, L. Confusion matrix stability bounds for multiclass classification. Preprint, 2012.
Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B., and Tropp, J. A. Matrix concentration inequalities via the method of exchangeable pairs. The Annals of Probability, 42(3), 2014a.
Mackey, L., Talwalkar, A., and Jordan, M. I. Distributed matrix completion and robust factorization. Journal of Machine Learning Research, 2014b. In press.

42 References III
Min, K., Zhang, Z., Wright, J., and Ma, Y. Decomposing background topics from keywords by principal component pursuit. In Conference on Information and Knowledge Management.
Negahban, S. and Wainwright, M. J. Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. arXiv preprint, 2010.
Nemirovski, A. Sums of random symmetric matrices and quadratic optimization under orthogonality constraints. Math. Program., 109, 2007.
Oliveira, R. I. Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. arXiv preprint, 2009.
Paulin, D., Mackey, L., and Tropp, J. A. Efron-Stein inequalities for random matrices. The Annals of Probability, to appear, 2015.
Recht, B. Simpler approach to matrix completion. J. Mach. Learn. Res., 12, 2011.
So, A. M.-C. Moment inequalities for sums of random matrices and their applications in optimization. Math. Program., 130(1), 2011.
Stein, C. A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. 6th Berkeley Symp. Math. Statist. Probab., Univ. California Press, Berkeley, 1972.
Talwalkar, A., Mackey, L., Mu, Y., Chang, S.-F., and Jordan, M. I. Distributed low-rank subspace segmentation. 2013.
Toh, K. and Yun, S. An accelerated proximal gradient algorithm for nuclear norm regularized least squares problems. Pacific Journal of Optimization, 6(3), 2010.
Tropp, J. A. User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 2011.
Zhou, E. and Hu, J. Gradient-based adaptive stochastic search for non-differentiable optimization. IEEE Transactions on Automatic Control, 59(7), 2014.
Zhou, Z., Li, X., Wright, J., Candès, E. J., and Ma, Y. Stable principal component pursuit. In IEEE International Symposium on Information Theory (ISIT), 2010.
