COR-OPT Seminar Reading List, Spring 2018


Damek Davis
January 28, 2018

References

[1] S. Tu, R. Boczar, M. Simchowitz, M. Soltanolkotabi, and B. Recht. Low-rank Solutions of Linear Matrix Equations via Procrustes Flow. arXiv:1507.03566 [math], July 13, 2015. URL: http://arxiv.org/abs/1507.03566.
[2] R. Meka, P. Jain, and I. S. Dhillon. Guaranteed Rank Minimization via Singular Value Projection. arXiv:0909.5457 [cs, math], Sept. 30, 2009. URL: http://arxiv.org/abs/0909.5457.
[3] R. Ge, C. Jin, and Y. Zheng. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis. arXiv:1704.00708 [cs, math, stat], Apr. 3, 2017. URL: http://arxiv.org/abs/1704.00708.
[4] S. Bhojanapalli, B. Neyshabur, and N. Srebro. Global Optimality of Local Search for Low Rank Matrix Recovery. arXiv:1605.07221 [cs, math, stat], May 23, 2016. URL: http://arxiv.org/abs/1605.07221.
[5] R. Ge, J. D. Lee, and T. Ma. Matrix Completion has No Spurious Local Minimum. arXiv:1605.07272 [cs, stat], May 23, 2016. URL: http://arxiv.org/abs/1605.07272.
[6] R. Ge and T. Ma. On the Optimization Landscape of Tensor Decompositions. arXiv:1706.05598 [cs, math, stat], June 17, 2017. URL: http://arxiv.org/abs/1706.05598.
[7] R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping From Saddle Points – Online Stochastic Gradient for Tensor Decomposition. arXiv:1503.02101 [cs, math, stat], Mar. 6, 2015. URL: http://arxiv.org/abs/1503.02101.
[8] A. S. Bandeira, N. Boumal, and V. Voroninski. On the low-rank approach for semidefinite programs arising in synchronization and community detection. arXiv:1602.04426 [math], Feb. 14, 2016. URL: http://arxiv.org/abs/1602.04426.
[9] C. Kim, A. S. Bandeira, and M. X. Goemans. Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares. arXiv:1705.02973 [cs, math, stat], May 8, 2017. URL: http://arxiv.org/abs/1705.02973.
[10] E. Abbe. Community detection and stochastic block models: recent developments. arXiv:1703.10146 [cs, math, stat], Mar. 29, 2017. URL: http://arxiv.org/abs/1703.10146.
[11] K. Kawaguchi. Deep Learning without Poor Local Minima. arXiv:1605.07110 [cs, math, stat], May 23, 2016. URL: http://arxiv.org/abs/1605.07110.
[12] M. Hardt and T. Ma. Identity Matters in Deep Learning. arXiv:1611.04231 [cs, stat], Nov. 13, 2016. URL: http://arxiv.org/abs/1611.04231.
[13] M. Hardt, T. Ma, and B. Recht. Gradient Descent Learns Linear Dynamical Systems. arXiv:1609.05191 [cs, math, stat], Sept. 16, 2016. URL: http://arxiv.org/abs/1609.05191.
[14] A. S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. In: Mathematical Programming 163.1 (May 2017), pp. 145–167. ISSN: 0025-5610, 1436-4646. DOI: 10.1007/s10107-016-1059-6. arXiv:1411.3272. URL: http://arxiv.org/abs/1411.3272.
[15] A. Bandeira, P. Rigollet, and J. Weed. Optimal rates of estimation for multi-reference alignment. arXiv:1702.08546 [math, stat], Feb. 27, 2017. URL: http://arxiv.org/abs/1702.08546.
[16] D. Boob and G. Lan. Theoretical properties of the global optimizer of two layer neural network. arXiv:1710.11241 [cs], Oct. 30, 2017. URL: http://arxiv.org/abs/1710.11241.
[17] L. Wang and A. Singer. Exact and Stable Recovery of Rotations for Robust Synchronization. arXiv:1211.2441 [cs, math], Nov. 11, 2012. URL: http://arxiv.org/abs/1211.2441.
[18] R. Vershynin. Estimation in high dimensions: a geometric perspective. arXiv:1405.5103 [math, stat], May 20, 2014. URL: http://arxiv.org/abs/1405.5103.
[19] H. Liu, M.-C. Yue, and A. M.-C. So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. arXiv:1603.00211 [math], Mar. 1, 2016. URL: http://arxiv.org/abs/1603.00211.
[20] N. Boumal. Nonconvex phase synchronization. arXiv:1601.06114 [math], Jan. 22, 2016. URL: http://arxiv.org/abs/1601.06114.
[21] Y. Zhong and N. Boumal. Near-optimal bounds for phase synchronization. arXiv:1703.06605 [math], Mar. 20, 2017. URL: http://arxiv.org/abs/1703.06605.
[22] V. Roulet, N. Boumal, and A. d'Aspremont. Computational Complexity versus Statistical Performance on Sparse Recovery Problems. arXiv:1506.03295 [math], June 10, 2015. URL: http://arxiv.org/abs/1506.03295.
[23] M. Simchowitz, A. E. Alaoui, and B. Recht. On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation. arXiv:1704.04548 [cs, math, stat], Apr. 14, 2017. URL: http://arxiv.org/abs/1704.04548.
[24] Y. Chen and E. Candès. The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences. arXiv:1609.05820 [cs, math, stat], Sept. 19, 2016. URL: http://arxiv.org/abs/1609.05820.
[25] P.-L. Loh. Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. arXiv:1501.00312 [cs, math, stat], Jan. 1, 2015. URL: http://arxiv.org/abs/1501.00312.
[26] G. B. Arous, S. Mei, A. Montanari, and M. Nica. The landscape of the spiked tensor model. arXiv:1711.05424 [math, stat], Nov. 15, 2017. URL: http://arxiv.org/abs/1711.05424.
[27] E. Abbe, L. Massoulié, A. Montanari, A. Sly, and N. Srivastava. Group Synchronization on Grids. arXiv:1706.08561 [cs, math, stat], June 26, 2017. URL: http://arxiv.org/abs/1706.08561.
[28] S. S. Du, J. D. Lee, Y. Tian, B. Póczos, and A. Singh. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima. arXiv:1712.00779 [cs, math, stat], Dec. 3, 2017. URL: http://arxiv.org/abs/1712.00779.
[29] Q. Qu, Y. Zhang, Y. C. Eldar, and J. Wright. Convolutional Phase Retrieval via Gradient Descent. arXiv:1712.00716 [cs, math, stat], Dec. 3, 2017. URL: http://arxiv.org/abs/1712.00716.
[30] J. C. Duchi and F. Ruan. Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. arXiv:1705.02356 [cs, math, stat], May 5, 2017. URL: http://arxiv.org/abs/1705.02356.
[31] E. Abbe, J. Fan, K. Wang, and Y. Zhong. Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank. arXiv:1709.09565 [math, stat], Sept. 27, 2017. URL: http://arxiv.org/abs/1709.09565.
[32] Y. Chen, Y. Chi, and A. Goldsmith. Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming. arXiv:1310.0807 [cs, math, stat], Oct. 2, 2013. URL: http://arxiv.org/abs/1310.0807.
[33] J. Tang, F. Bach, M. Golbabaee, and M. Davies. Structure-Adaptive, Variance-Reduced, and Accelerated Stochastic Optimization. arXiv:1712.03156 [math], Dec. 8, 2017. URL: http://arxiv.org/abs/1712.03156.
[34] M. Soltanolkotabi. Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization. arXiv:1702.06175 [cs, math, stat], Feb. 20, 2017. URL: http://arxiv.org/abs/1702.06175.
[35] A. Ahmed, B. Recht, and J. Romberg. Blind Deconvolution using Convex Programming. arXiv:1211.5608 [cs, math], Nov. 21, 2012. URL: http://arxiv.org/abs/1211.5608.
[36] H. Namkoong and J. C. Duchi. Variance-based Regularization with Convex Objectives. In: Advances in Neural Information Processing Systems, 2017, pp. 2975–2984.
[37] G. Liu, Q. Liu, and X. Yuan. A New Theory for Matrix Completion. In: Advances in Neural Information Processing Systems, 2017, pp. 785–794.
[38] N. Chatterji and P. L. Bartlett. Alternating minimization for dictionary learning with random initialization. In: Advances in Neural Information Processing Systems, 2017, pp. 1994–2003.
[39] Z. Artstein and R. J.-B. Wets. Consistency of minimizers and the SLLN for stochastic programs. IBM Thomas J. Watson Research Division, 1994.
[40] S. Goel and A. Klivans. Eigenvalue decay implies polynomial-time learnability for neural networks. In: Advances in Neural Information Processing Systems, 2017, pp. 2189–2199.
[41] K. Hayashi and Y. Yoshida. Fitting Low-Rank Tensors in Constant Time. In: Advances in Neural Information Processing Systems, 2017, pp. 2470–2478.
[42] C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. arXiv:1711.10467, 2017.
[43] V. I. Norkin and R. J.-B. Wets. Law of small numbers as concentration inequalities for sums of independent random sets and random set-valued mappings. In: The Association of Lithuanian Serials, July 3, 2012, pp. 94–99. ISBN: 978-609-95241-4-6. DOI: 10.5200/stoprog.2012.17. URL: http://www.moksloperiodika.lt/STOPROG_2012/abstract/017.html.
[44] M. Soltanolkotabi. Learning ReLUs via Gradient Descent. arXiv:1705.04591, 2017.
[45] S. S. Du, Y. Wang, and A. Singh. On the Power of Truncated SVD for General High-rank Matrix Estimation Problems. arXiv:1702.06861, 2017.
[46] G. Wang, G. B. Giannakis, Y. Saad, and J. Chen. Solving Almost all Systems of Random Quadratic Equations. arXiv:1705.10407, 2017.
[47] X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation Synchronization via Truncated Least Squares. In: Advances in Neural Information Processing Systems, 2017, pp. 1458–1467.
[48] D. Cohen, Y. C. Eldar, and G. Leus. Universal lower bounds on sampling rates for covariance estimation. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 3272–3276.
[49] S. Mei, Y. Bai, and A. Montanari. The Landscape of Empirical Risk for Non-convex Losses. arXiv:1607.06534 [stat], July 21, 2016. URL: http://arxiv.org/abs/1607.06534.