A Stochastic PCA Algorithm with an Exponential Convergence Rate. Ohad Shamir


A Stochastic PCA Algorithm with an Exponential Convergence Rate
Ohad Shamir, Weizmann Institute of Science
NIPS Optimization Workshop, December 2014

Principal Component Analysis (PCA)
Input: x_1, ..., x_n \in \mathbb{R}^d
Goal: find the k directions with the most variance:
    \max_{U \in \mathbb{R}^{d \times k} : U^\top U = I} \; \frac{1}{n} \sum_{i=1}^n \| U^\top x_i \|^2
For k = 1: find the leading eigenvector of the covariance matrix:
    \max_{w \in \mathbb{R}^d : \|w\| = 1} \; w^\top \Big( \frac{1}{n} \sum_{i=1}^n x_i x_i^\top \Big) w

Existing Approaches
    \max_{w \in \mathbb{R}^d : \|w\| = 1} \; w^\top \Big( \frac{1}{n} \sum_{i=1}^n x_i x_i^\top \Big) w
Regime: n, d large, non-sparse matrix
Approach 1: Eigendecomposition
Compute the leading eigenvector of \frac{1}{n} \sum_{i=1}^n x_i x_i^\top exactly (e.g. via QR decomposition)
Runtime: O(d^3)
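For reference, a minimal numpy sketch of this exact approach (added here for illustration, not from the slides): form the d x d covariance matrix and take its top eigenvector; the O(nd^2 + d^3) cost is what the stochastic methods below aim to avoid.

```python
import numpy as np

def exact_pca_top_eigvec(X):
    """Leading eigenvector of A = (1/n) sum_i x_i x_i^T via a full eigendecomposition.

    X: (n, d) data matrix. Cost: O(n d^2) to form A plus O(d^3) for the
    decomposition -- fine for modest d, prohibitive when d is large.
    """
    A = X.T @ X / X.shape[0]              # d x d (uncentered) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    return eigvecs[:, -1]                 # eigenvector of the largest eigenvalue
```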

Existing Approaches
Approach 2: Power Iterations
Initialize w_1 randomly on the unit sphere
For t = 1, 2, ...:
    w_{t+1} := \Big( \frac{1}{n} \sum_{i=1}^n x_i x_i^\top \Big) w_t = \frac{1}{n} \sum_{i=1}^n \langle w_t, x_i \rangle\, x_i
    w_{t+1} := w_{t+1} / \|w_{t+1}\|
O((1/λ) log(1/ε)) iterations for ε-optimality, where λ is the eigengap
O(nd) runtime per iteration
Overall runtime: O((nd/λ) log(d/ε))

Approach 2.5: Lanczos Iterations
More complex algorithm, but roughly similar runtime per iteration and only O((1/√λ) log(1/ε)) iterations [Kuczyński and Woźniakowski 1989]
Overall runtime: O((nd/√λ) log(d/ε))
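A minimal numpy sketch of the power iteration above (illustration only, under the slide's setup): each iteration is one full pass over the data.

```python
import numpy as np

def power_iteration(X, n_iters, rng=None):
    """Power iteration for the leading eigenvector of A = (1/n) X^T X.

    Each iteration is a full pass over the data (O(nd)); roughly
    O((1/lambda) log(1/eps)) iterations are needed, where lambda is the eigengap.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # random start on the unit sphere
    for _ in range(n_iters):
        w = X.T @ (X @ w) / n              # w <- A w, without forming A explicitly
        w /= np.linalg.norm(w)             # normalize back to the unit sphere
    return w
```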

Existing Approaches
Approach 3: Stochastic/Incremental Algorithms
Example (Oja's algorithm):
Initialize w_1 randomly on the unit sphere
For t = 1, 2, ...:
    Pick i_t \in \{1, ..., n\} (randomly or otherwise)
    w_{t+1} := w_t + \eta_t\, x_{i_t} x_{i_t}^\top w_t
    w_{t+1} := w_{t+1} / \|w_{t+1}\|
Also Krasulina 1969; Arora, Cotter, Livescu, Srebro 2012; Mitliagkas, Caramanis, Jain 2013; De Sa, Olukotun, Ré 2014
O(d) runtime per iteration
Iteration bounds:
    Balsubramani, Dasgupta, Freund 2013: Õ((d/λ²)(1/ε + d))
    De Sa, Olukotun, Ré 2014 (for a different SGD method): Õ(d/(λ²ε))
Runtime: Õ(d²/(λ²ε))
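A minimal numpy sketch of Oja's algorithm as stated above (illustration only): each iteration touches a single example, so it costs O(d).

```python
import numpy as np

def oja(X, step_sizes, rng=None):
    """Oja's algorithm: stochastic rank-one updates, one example at a time (O(d) per step).

    step_sizes: iterable of step sizes eta_t, e.g. (c / (t + 1) for t in range(T)).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # random start on the unit sphere
    for eta in step_sizes:
        x = X[rng.integers(n)]             # pick i_t uniformly at random
        w = w + eta * x * (x @ w)          # w <- w + eta_t x_{i_t} x_{i_t}^T w
        w /= np.linalg.norm(w)             # renormalize to the unit sphere
    return w
```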

Existing Approaches
Up to constants/log-factors:

    Algorithm        Time per iter.    # iter.       Runtime
    Exact            --                --            d^3
    Power/Lanczos    nd                1/λ^p         nd/λ^p
    Incremental      d                 d/(λ²ε)       d²/(λ²ε)

(p = 1 for power iterations, p = 1/2 for Lanczos.)

Main Question: Can we get the best of both worlds? O(d) time per iteration and fast convergence (logarithmic dependence on ε)?

Convex Optimization to the Rescue?
Our problem is equivalent to:
    \min_{w : \|w\| = 1} \; \frac{1}{n} \sum_{i=1}^n \big( -\langle w, x_i \rangle^2 \big)
Much recent progress in solving strongly convex + smooth problems with finite-sum structure:
    \min_{w \in \mathcal{W}} \; \frac{1}{n} \sum_{i=1}^n f_i(w)
Stochastic algorithms with O(d) runtime per iteration and exponential convergence [Le Roux, Schmidt, Bach 2012; Shalev-Shwartz and Zhang 2012; Johnson and Zhang 2013; Zhang, Mahdavi, Jin 2013; Konečný and Richtárik 2013; Xiao and Zhang 2014; Zhang and Xiao]

Convex Optimization to the Rescue?
Unfortunately, for
    \min_{w : \|w\| = 1} \; \frac{1}{n} \sum_{i=1}^n \big( -\langle w, x_i \rangle^2 \big)
The function is not strongly convex, or even convex (in fact, it is concave everywhere)
It has more than one global optimum, plateaus, ...
Existing results don't work as-is
But: maybe we can borrow some ideas...

Algorithm: Oja Iteration
    \min_{w : \|w\| = 1} \; \frac{1}{n} \sum_{i=1}^n \big( -\langle w, x_i \rangle^2 \big)
Choose i_t \in \{1, ..., n\} at random
    w_{t+1} = w_t + \eta_t \langle w_t, x_{i_t} \rangle\, x_{i_t}
    w_{t+1} := w_{t+1} / \|w_{t+1}\|
Essentially projected stochastic gradient descent
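To make the projected-SGD connection explicit (a short derivation added here, not on the slide): the gradient of the i-th summand reproduces the Oja step,
\[
\nabla \big( -\langle w, x_i \rangle^2 \big) = -2 \langle w, x_i \rangle\, x_i = -2\, x_i x_i^\top w,
\qquad
w - \eta_t \nabla \big( -\langle w, x_{i_t} \rangle^2 \big) = w + 2 \eta_t\, x_{i_t} x_{i_t}^\top w,
\]
so a stochastic gradient step on the i_t-th summand, followed by projection onto the unit sphere (i.e. normalization), is exactly the Oja update with the factor 2 absorbed into η_t.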

Algorithm
Letting A = \frac{1}{n} \sum_{i=1}^n x_i x_i^\top, the update step is
    w_{t+1} = w_t + \eta_t\, x_{i_t} x_{i_t}^\top w_t
            = w_t + \underbrace{\eta_t A w_t}_{\text{power/gradient step}} + \underbrace{\eta_t \big( x_{i_t} x_{i_t}^\top - A \big) w_t}_{\text{zero-mean noise}}
Main idea: replace this by
    w_{t+1} = w_t + \underbrace{\eta A w_t}_{\text{power/gradient step}} + \underbrace{\eta \big( x_{i_t} x_{i_t}^\top - A \big) (w_t - \tilde{u})}_{\text{zero-mean noise}}
where \tilde{u} is close to w_t (similar to SVRG of Johnson and Zhang (2013))
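Why this substitution helps (spelled out here; the slide leaves it implicit): the modified noise term is still unbiased, but its magnitude is now controlled by the distance to the reference point,
\[
\mathbb{E}_{i_t}\big[ (x_{i_t} x_{i_t}^\top - A)(w_t - \tilde{u}) \big] = 0,
\qquad
\big\| (x_{i_t} x_{i_t}^\top - A)(w_t - \tilde{u}) \big\| \le \big\| x_{i_t} x_{i_t}^\top - A \big\|\, \| w_t - \tilde{u} \|,
\]
so when \tilde{u} is close to w_t the stochastic update is nearly the exact power/gradient step, which is what allows running with a constant step size η.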

Algorithm: VR-PCA
Parameters: step size η, epoch length m
Input: data set \{x_i\}_{i=1}^n, initial unit vector \tilde{w}_0
For s = 1, 2, ...:
    \tilde{u} = \frac{1}{n} \sum_{i=1}^n x_i x_i^\top \tilde{w}_{s-1}
    w_0 = \tilde{w}_{s-1}
    For t = 1, 2, ..., m:
        Pick i_t \in \{1, ..., n\} uniformly at random
        w'_t = w_{t-1} + \eta \big( x_{i_t} x_{i_t}^\top (w_{t-1} - \tilde{w}_{s-1}) + \tilde{u} \big)
        w_t = w'_t / \|w'_t\|
    \tilde{w}_s = w_m
To get k > 1 directions: either repeat, or perform orthogonal-iteration-like updates:
    Replace all vectors by d × k matrices
    Replace the normalization step by an orthogonalization step
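Below is a minimal Python/numpy sketch of the k = 1 procedure above; it is an illustration of the pseudocode, not the author's reference implementation. The per-iteration cost is O(d), plus one exact matrix-vector product per epoch to compute \tilde{u}.

```python
import numpy as np

def vr_pca(X, eta, m, n_epochs, w0=None, rng=None):
    """VR-PCA sketch for the leading eigenvector (k = 1), following the slide.

    X: (n, d) data matrix; eta: step size; m: epoch length; w0: initial unit vector.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    w_tilde = w0 if w0 is not None else rng.standard_normal(d)
    w_tilde = w_tilde / np.linalg.norm(w_tilde)

    for s in range(n_epochs):
        # Full pass: u_tilde = (1/n) sum_i x_i x_i^T w_tilde (one exact matrix-vector product)
        u_tilde = X.T @ (X @ w_tilde) / n
        w = w_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)                        # pick i_t uniformly at random
            x = X[i]
            # Variance-reduced stochastic step: x_i x_i^T (w - w_tilde) + u_tilde
            w = w + eta * (x * (x @ (w - w_tilde)) + u_tilde)
            w = w / np.linalg.norm(w)                  # project back onto the unit sphere
        w_tilde = w
    return w_tilde
```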

Analysis

Theorem. Suppose max_i ||x_i||² ≤ r, and that A has leading eigenvector v_1 and eigengap λ. Assuming \langle \tilde{w}_0, v_1 \rangle \ge 1/\sqrt{2}, then for any δ, ε ∈ (0, 1), if
    \eta \le \frac{c_1 \delta^2 \lambda}{r^2}, \qquad m \ge \frac{c_2 \log(2/\delta)}{\eta \lambda}, \qquad m \eta^2 r^2 + r \sqrt{m \eta^2 \log(2/\delta)} \le c_3
(where c_1, c_2, c_3 are constants), and we run T = \lceil \log(1/\epsilon) / \log(2/\delta) \rceil epochs, then
    \Pr\big( \langle \tilde{w}_T, v_1 \rangle^2 \ge 1 - \epsilon \big) \ge 1 - 2 \log(1/\epsilon)\, \delta.

Corollary. Picking η, m appropriately, ε-convergence holds w.h.p. in O(d(n + 1/λ²) log(1/ε)) runtime:
    Exponential convergence with O(d)-time iterations
    Runtime proportional to the number of examples plus 1/λ²
    Proportional to a single data pass if λ ≥ 1/√n

Proof Idea
Track the decay of F(w_t) = 1 - \langle w_t, v_1 \rangle^2

Key Lemma. Assuming η = αλ and F(w_t) ≤ 3/4,
    \mathbb{E}\big[ F(w_{t+1}) \mid w_t \big] \le \big( 1 - \Theta(\alpha \lambda^2) \big) F(w_t) + O\big( \alpha^2 \lambda^2\, F(\tilde{w}_{s-1}) \big).

Proof Idea
Assume η = αλ (with α ≪ 1)
Using martingale arguments: w.h.p., the iterates never reach the flat region within m ≤ O(1/(α²λ²)) iterations
For all t ≤ m:
    \mathbb{E}\big[ F(w_{t+1}) \mid w_t \big] \le \big( 1 - \Theta(\alpha \lambda^2) \big) F(w_t) + O\big( \alpha^2 \lambda^2\, F(\tilde{w}_{s-1}) \big)
Carefully unwinding the recursion, and using w_0 = \tilde{w}_{s-1}:
    \mathbb{E}\big[ F(w_m) \mid w_0 \big] \le \Big( \big( 1 - \Theta(\alpha \lambda^2) \big)^m + O(\alpha) \Big) F(\tilde{w}_{s-1})
If m ≥ Ω(1/(αλ²)), then F(w_m) is smaller than F(\tilde{w}_{s-1}) by some constant factor
Overall, if 1/(αλ²) ≲ m ≲ 1/(α²λ²), every epoch shrinks F by a constant factor
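Filling in the unwinding step (our own expansion; the slide only states the result): write γ = Θ(αλ²) and c = O(α²λ²) for the two constants in the key lemma, iterate it over one epoch, and sum the geometric series; using F(w_0) = F(\tilde{w}_{s-1}),
\[
\mathbb{E}\big[ F(w_m) \mid w_0 \big]
\le (1 - \gamma)^m F(w_0) + c\, F(\tilde{w}_{s-1}) \sum_{j=0}^{m-1} (1 - \gamma)^j
\le \Big( (1 - \gamma)^m + \tfrac{c}{\gamma} \Big) F(\tilde{w}_{s-1}),
\]
and c/γ = O(α), which gives the bound on the slide. Taking m on the order of 1/(αλ²) makes (1-γ)^m a small constant, so each epoch contracts F by a constant factor provided α is a sufficiently small constant.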

Experiments
Ran VR-PCA with:
    Random initialization
    m = n (one epoch = one pass over the data)
    η = 0.05/√n (based on theory)
Compared to Oja's algorithm with a hand-tuned step size
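A hypothetical way to reproduce this setup with the sketches defined earlier (vr_pca and exact_pca_top_eigvec are the illustrative functions from above; the synthetic data here is our own, not the dataset used in the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10000, 100
v = rng.standard_normal(d); v /= np.linalg.norm(v)              # planted leading direction
X = rng.standard_normal((n, d)) + 3.0 * rng.standard_normal((n, 1)) * v
X /= np.sqrt((X ** 2).sum(axis=1).max())                        # rescale so max_i ||x_i||^2 <= 1

w = vr_pca(X, eta=0.05 / np.sqrt(n), m=n, n_epochs=10, rng=rng) # parameters from this slide
print("alignment |<w, v1>|:", abs(w @ exact_pca_top_eigvec(X)))
```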

Experiments
[Figure: results for several values of the eigengap λ, comparing VR-PCA against Oja's algorithm with step sizes η_t = 1/t, 3/t, 9/t, 27/t]

Preliminary Experiments
[Figure: comparison of VR-PCA, step sizes η_t = 3, 9, 27, 81, 243, and a hybrid method]

Work-in-progress and Open Questions
Runtime is O(d(n + 1/λ²) log(1/ε)); can we get O(d(n + 1/λ) log(1/ε)) or better, analogous to the convex case?
Experimentally, the required parameters do seem somewhat different
Generalizing the analysis to PCA with several directions
Theoretically optimal parameters without knowing λ
Other fast stochastic approaches
Other non-convex problems

More details: arXiv 1409.2848
THANKS!
