COR-OPT Seminar Reading List Sp 18

Damek Davis
January 28, 2018

References

[1] S. Tu, R. Boczar, M. Simchowitz, M. Soltanolkotabi, and B. Recht. Low-rank Solutions of Linear Matrix Equations via Procrustes Flow. arXiv preprint [math], July 13, 2015.
[2] R. Meka, P. Jain, and I. S. Dhillon. Guaranteed Rank Minimization via Singular Value Projection. arXiv preprint [cs, math], Sept. 30, 2009.
[3] R. Ge, C. Jin, and Y. Zheng. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis. arXiv preprint [cs, math, stat], Apr. 3, 2017.
[4] S. Bhojanapalli, B. Neyshabur, and N. Srebro. Global Optimality of Local Search for Low Rank Matrix Recovery. arXiv preprint [cs, math, stat], May 23, 2016.
[5] R. Ge, J. D. Lee, and T. Ma. Matrix Completion has No Spurious Local Minimum. arXiv preprint [cs, stat], May 23, 2016.
[6] R. Ge and T. Ma. On the Optimization Landscape of Tensor Decompositions. arXiv preprint [cs, math, stat], June 17, 2017.
[7] R. Ge, F. Huang, C. Jin, and Y. Yuan. Escaping From Saddle Points: Online Stochastic Gradient for Tensor Decomposition. arXiv preprint [cs, math, stat], Mar. 6, 2015.
[8] A. S. Bandeira, N. Boumal, and V. Voroninski. On the low-rank approach for semidefinite programs arising in synchronization and community detection. arXiv preprint [math], Feb. 14, 2016.
[9] C. Kim, A. S. Bandeira, and M. X. Goemans. Community Detection in Hypergraphs, Spiked Tensor Models, and Sum-of-Squares. arXiv preprint [cs, math, stat], May 8, 2017.
[10] E. Abbe. Community detection and stochastic block models: recent developments. arXiv preprint [cs, math, stat], Mar. 29, 2017.
[11] K. Kawaguchi. Deep Learning without Poor Local Minima. arXiv preprint [cs, math, stat], May 23, 2016.
[12] M. Hardt and T. Ma. Identity Matters in Deep Learning. arXiv preprint [cs, stat], Nov. 13, 2016.
[13] M. Hardt, T. Ma, and B. Recht. Gradient Descent Learns Linear Dynamical Systems. arXiv preprint [cs, math, stat], Sept. 16, 2016.
[14] A. S. Bandeira, N. Boumal, and A. Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, May 2017.
[15] A. Bandeira, P. Rigollet, and J. Weed. Optimal rates of estimation for multi-reference alignment. arXiv preprint [math, stat], Feb. 27, 2017.
[16] D. Boob and G. Lan. Theoretical properties of the global optimizer of two layer neural network. arXiv preprint [cs], Oct. 30, 2017.
[17] L. Wang and A. Singer. Exact and Stable Recovery of Rotations for Robust Synchronization. arXiv preprint [cs, math], Nov. 11, 2012.
[18] R. Vershynin. Estimation in high dimensions: a geometric perspective. arXiv preprint [math, stat], May 20, 2014.
[19] H. Liu, M.-C. Yue, and A. M.-C. So. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. arXiv preprint [math], Mar. 1, 2016.
[20] N. Boumal. Nonconvex phase synchronization. arXiv preprint [math], Jan. 22, 2016.
[21] Y. Zhong and N. Boumal. Near-optimal bounds for phase synchronization. arXiv preprint [math], Mar. 20, 2017.
[22] V. Roulet, N. Boumal, and A. d'Aspremont. Computational Complexity versus Statistical Performance on Sparse Recovery Problems. arXiv preprint [math], June 10, 2015.
[23] M. Simchowitz, A. E. Alaoui, and B. Recht. On the Gap Between Strict-Saddles and True Convexity: An Omega(log d) Lower Bound for Eigenvector Approximation. arXiv preprint [cs, math, stat], Apr. 14, 2017.
[24] Y. Chen and E. Candes. The Projected Power Method: An Efficient Algorithm for Joint Alignment from Pairwise Differences. arXiv preprint [cs, math, stat], Sept. 19, 2016.
[25] P.-L. Loh. Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. arXiv preprint [cs, math, stat], Jan. 1, 2015.
[26] G. B. Arous, S. Mei, A. Montanari, and M. Nica. The landscape of the spiked tensor model. arXiv preprint [math, stat], Nov. 15, 2017.
[27] E. Abbe, L. Massoulie, A. Montanari, A. Sly, and N. Srivastava. Group Synchronization on Grids. arXiv preprint [cs, math, stat], June 26, 2017.
[28] S. S. Du, J. D. Lee, Y. Tian, B. Poczos, and A. Singh. Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima. arXiv preprint [cs, math, stat], Dec. 3, 2017.
[29] Q. Qu, Y. Zhang, Y. C. Eldar, and J. Wright. Convolutional Phase Retrieval via Gradient Descent. arXiv preprint [cs, math, stat], Dec. 3, 2017.
[30] J. C. Duchi and F. Ruan. Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. arXiv preprint [cs, math, stat], May 5, 2017.
[31] E. Abbe, J. Fan, K. Wang, and Y. Zhong. Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank. arXiv preprint [math, stat], Sept. 27, 2017.
[32] Y. Chen, Y. Chi, and A. Goldsmith. Exact and Stable Covariance Estimation from Quadratic Sampling via Convex Programming. arXiv preprint [cs, math, stat], Oct. 2, 2013.
[33] J. Tang, F. Bach, M. Golbabaee, and M. Davies. Structure-Adaptive, Variance-Reduced, and Accelerated Stochastic Optimization. arXiv preprint [math], Dec. 8, 2017.
[34] M. Soltanolkotabi. Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization. arXiv preprint [cs, math, stat], Feb. 20, 2017.
[35] A. Ahmed, B. Recht, and J. Romberg. Blind Deconvolution using Convex Programming. arXiv preprint [cs, math], Nov. 21, 2012.
[36] H. Namkoong and J. C. Duchi. Variance-based Regularization with Convex Objectives. In Advances in Neural Information Processing Systems, 2017.
[37] G. Liu, Q. Liu, and X. Yuan. A New Theory for Matrix Completion. In Advances in Neural Information Processing Systems, 2017.
[38] N. Chatterji and P. L. Bartlett. Alternating minimization for dictionary learning with random initialization. In Advances in Neural Information Processing Systems, 2017.
[39] Z. Artstein and R. J.-B. Wets. Consistency of minimizers and the SLLN for stochastic programs. IBM Thomas J. Watson Research Division.
[40] S. Goel and A. Klivans. Eigenvalue decay implies polynomial-time learnability for neural networks. In Advances in Neural Information Processing Systems, 2017.
[41] K. Hayashi and Y. Yoshida. Fitting Low-Rank Tensors in Constant Time. In Advances in Neural Information Processing Systems, 2017.
[42] C. Ma, K. Wang, Y. Chi, and Y. Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. arXiv preprint, 2017.
[43] V. I. Norkin and R. J.-B. Wets. Law of small numbers as concentration inequalities for sums of independent random sets and random set-valued mappings. The Association of Lithuanian Serials, July 3, 2012. url: http://www.moksloperiodika.lt/STOPROG_2012/abstract/017.html.
[44] M. Soltanolkotabi. Learning ReLUs via Gradient Descent. arXiv preprint, 2017.
[45] S. S. Du, Y. Wang, and A. Singh. On the Power of Truncated SVD for General High-rank Matrix Estimation Problems. arXiv preprint, 2017.
[46] G. Wang, G. B. Giannakis, Y. Saad, and J. Chen. Solving Almost all Systems of Random Quadratic Equations. arXiv preprint, 2017.
[47] X. Huang, Z. Liang, C. Bajaj, and Q. Huang. Translation Synchronization via Truncated Least Squares. In Advances in Neural Information Processing Systems, 2017.
[48] D. Cohen, Y. C. Eldar, and G. Leus. Universal lower bounds on sampling rates for covariance estimation. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015.
[49] S. Mei, Y. Bai, and A. Montanari. The Landscape of Empirical Risk for Non-convex Losses. arXiv preprint [stat], July 21, 2016.
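Many of the papers above analyze plain gradient descent on a factorized, nonconvex low-rank objective. For orientation only, here is a minimal NumPy sketch of that recurring setup (matrix sensing on the factor U, in the spirit of Procrustes flow [1] and the landscape analyses [3]-[5]). The problem sizes, step size, and near-truth initialization are illustrative assumptions, not parameters from any cited paper.

```python
import numpy as np

# Toy sketch, not code from any reference above: gradient descent on
#   f(U) = (1 / (2m)) * sum_k ( <A_k, U U^T> - b_k )^2 .
rng = np.random.default_rng(0)
n, r, m = 20, 2, 300                       # dimension, rank, number of measurements

U_star = rng.standard_normal((n, r))       # ground-truth factor
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2         # symmetric sensing matrices A_k
b = np.einsum('kij,ij->k', A, U_star @ U_star.T)   # noiseless measurements b_k

def gradient(U):
    # grad f(U) = (2/m) * sum_k ( <A_k, U U^T> - b_k ) A_k U
    residual = np.einsum('kij,ij->k', A, U @ U.T) - b
    return (2.0 / m) * np.einsum('k,kij,jl->il', residual, A, U)

U = U_star + 0.1 * rng.standard_normal((n, r))     # start near the truth
for _ in range(500):
    U -= 0.005 * gradient(U)

# U is identifiable only up to rotation, so compare U U^T rather than U.
print(np.linalg.norm(U @ U.T - U_star @ U_star.T))  # should be tiny
```

The sketch starts near the truth, which sidesteps the harder question, studied in, e.g., [3] and [4], of when such iterations succeed from spectral or random initialization despite nonconvexity.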
