Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo

1 Approximate Second Order Algorithms Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo

2 Why Second Order Algorithms?
- Invariance under affine transformations: stretching a function preserves the behavior of Newton's method.
  Example: consider $f(x) = x^2$ and $g(x) = f(x/2) = x^2/4$. Gradient descent takes smaller steps on $g$, whereas Newton's method solves either function in a single step (a short numerical sketch follows below). Second order methods therefore potentially require less hyperparameter tuning.
- Hopefully improve training speed: first order methods attaining the theoretical lower bound already exist, so can we improve further? A smaller number of iterations to converge may offset the higher per-iteration cost.
- Disadvantages:
  - What if $H(x)$ is not invertible? Use the Moore-Penrose pseudo-inverse.
  - Computing $H^{-1}(x)\nabla f(x)$ is expensive, so approximate it.
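
As a concrete illustration of the affine-invariance point, the following minimal Python sketch (not part of the original slides) runs gradient descent and Newton's method on $f$ and $g$ from the same starting point with the same step size; the function names and constants are illustrative assumptions.

```python
# Affine-invariance example: f(x) = x^2 vs. g(x) = f(x/2) = x^2 / 4.
def grad_f(x): return 2.0 * x    # f'(x)
def hess_f(x): return 2.0        # f''(x)
def grad_g(x): return 0.5 * x    # g'(x)
def hess_g(x): return 0.5        # g''(x)

x0, lr = 4.0, 0.4   # same starting point and learning rate for both functions

# Gradient descent: the same step size behaves very differently on f and g.
xf, xg = x0, x0
for _ in range(10):
    xf -= lr * grad_f(xf)
    xg -= lr * grad_g(xg)
print("GD after 10 steps:  f-iterate =", xf, " g-iterate =", xg)

# Newton's method: one step reaches the minimizer of either function,
# because the step x - f'(x)/f''(x) is invariant to the rescaling.
print("Newton, one step:   f-iterate =", x0 - grad_f(x0) / hess_f(x0),
      " g-iterate =", x0 - grad_g(x0) / hess_g(x0))
```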

3 Stochastic Newton Step?
- Suppose we are minimizing $f(x) = \sum_{k=1}^{m} f_k(x)$ with each $f_k$ $\mu$-strongly convex and $L$-smooth.
- For the analysis of second order algorithms we need one more assumption: the Hessian is $M$-Lipschitz, i.e. $\|H(x) - H(y)\| \le M\|x - y\|$.
- Naive generalization of SGD: $x \leftarrow x - H_k^{-1}(x)\,\nabla f_k(x)$.
- Problem: the noisy estimate of the curvature hurts performance.

4 Hessian-Free (HF) Optimization
- To avoid computing $H$ explicitly, we only compute Hessian-vector products $Hv$ (for arbitrary vectors $v$), which can be approximated for a small $\varepsilon$ by
  $Hv \approx \dfrac{\nabla f(x + \varepsilon v) - \nabla f(x)}{\varepsilon}$.
- To avoid inverting $H$ to obtain the Newton direction from $Hy = -\nabla f(x)$, we instead solve
  $\min_y \; f(x) + \nabla f(x)^T y + \tfrac{1}{2} y^T H y$
  using Conjugate Gradient (CG), which only requires products $Hv$.
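
A minimal sketch of the finite-difference Hessian-vector product above; the function name `hvp_fd` and the tiny test problem are illustrative assumptions, not code from the slides. The resulting products are what CG consumes (see the sketch after the next slide).

```python
import numpy as np

def hvp_fd(grad, x, v, eps=1e-5):
    """Finite-difference Hessian-vector product: Hv ~ (grad(x + eps*v) - grad(x)) / eps."""
    return (grad(x + eps * v) - grad(x)) / eps

# Tiny check on a quadratic f(x) = 0.5 x^T A x, whose exact Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad = lambda x: A @ x
x, v = np.ones(2), np.array([1.0, -1.0])
print(hvp_fd(grad, x, v), "vs exact", A @ v)
```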

5 Hessian-Free Optimization
Off-the-shelf HF algorithms are not feasible for large scale problems, so several modifications are used:
- Damping: make a more conservative curvature estimate by adding $\lambda \|d\|^2$ to the quadratic model, adapting $\lambda$ according to the reduction ratio
  $\rho = \dfrac{f(x + p) - f(x)}{q_x(p) - q_x(0)}$, where $q_x$ is the local quadratic model.
- Matrix-vector products: use $G$ instead of $H$ for the products, where $G$ is the Gauss-Newton approximation of the Hessian and is positive semidefinite.
- Termination conditions for CG: CG finds the solution of $Ax = b$ not by minimizing $\|Ax - b\|^2$ but by minimizing the quadratic $\varphi(x) = \tfrac{1}{2} x^T A x - b^T x$. $\varphi(x)$ decreases with every step, whereas $\|Ax - b\|^2$ fluctuates a lot before tending towards 0. Terminate when the relative improvement of $\varphi(x)$ over the last $k$ steps drops below a constant $k\varepsilon$.
- Many methods exist to better precondition CG.
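
The sketch below combines two ideas from this slide: CG driven only by Hessian-vector products, terminated by the relative improvement of $\varphi$ rather than by the fluctuating residual. It is a hedged illustration under assumed parameter choices (the window $k$ and tolerance), not the original HF implementation.

```python
import numpy as np

def cg_hf(hvp, b, max_iter=500, tol=5e-4):
    """Conjugate gradient for H y = b using only Hessian-vector products `hvp`.

    Stops, in the spirit of the criterion above, when the relative improvement of
    phi(y) = 0.5 * y^T H y - b^T y over the last k steps drops below k * tol
    (with k growing with the iteration count).  A sketch, not Martens' exact code.
    """
    y = np.zeros_like(b)
    r = b.copy()                       # residual b - H y for y = 0
    p = r.copy()
    phis = []
    for i in range(max_iter):
        Hp = hvp(p)
        alpha = (r @ r) / (p @ Hp)
        y = y + alpha * p
        r_new = r - alpha * Hp
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
        phis.append(0.5 * y @ hvp(y) - b @ y)
        k = max(10, i // 10)
        if i >= k and phis[-1] < 0 and (phis[-1] - phis[-1 - k]) / phis[-1] < k * tol:
            break
        if np.linalg.norm(r) < 1e-12 * np.linalg.norm(b):   # safeguard for tiny exact problems
            break
    return y

# Newton direction H y = -grad on a random ill-conditioned quadratic.
rng = np.random.default_rng(0)
M = rng.standard_normal((200, 200))
H = M @ M.T + 1e-2 * np.eye(200)
grad = rng.standard_normal(200)
y = cg_hf(lambda v: H @ v, -grad)
print(np.linalg.norm(H @ y + grad) / np.linalg.norm(grad))   # relative residual of the approximate Newton direction
```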

6 Lower Bounds
- First order methods require $\Omega\big((m + \sqrt{m\kappa}) \log \tfrac{1}{\varepsilon}\big)$ calls to the gradient oracle to reach an $\varepsilon$-approximate solution.
  - Linear dependence on the condition number is obtained by SVRG, SAGA, ...
  - The minimal bound is attained by Katyusha and AccSDCA.
- Second order methods (lower bound setup):
  - The algorithm can use at most $m/2$ Hessians per update.
  - Indices $k \in [m]$ are sampled uniformly at random.
  - Input dimension of the hard construction: $d = O(1 + \kappa/m)$.
  - Oracle calls: $\Omega\big((m + \sqrt{m\kappa}) \log \tfrac{1}{\varepsilon}\big)$.
  - A randomized construction improves the bound by a logarithmic factor.

7 Lower Bounds Discussion
- This lower bound suggests that second order algorithms cannot improve the rates of optimization by much.
- Even given the oracles, a plain second order method must compute the Hessian ($O(md^2)$) and invert it ($O(d^3)$), so simple second order algorithms are not attractive.
- Because of the assumption that the algorithm cannot use the Hessians of all samples $\{H_k(x)\}_{k \in [m]}$ at once, we lose the quadratic convergence rate.
- Suggestion: an algorithm that violates this assumption may achieve faster convergence than the bound presented. LiSSA-Sample uses leverage score sampling to pick Hessians non-uniformly and, in the high accuracy regime, achieves a convergence rate faster than any first order algorithm.

8 Overview
- Problem: $\min_x f(x) = \frac{1}{m}\sum_{k=1}^{m} f_k(x) + \lambda\|x\|^2$, where each $f_k(x)$ is $\mu$-strongly convex, $L$-smooth, and has an $M$-Lipschitz Hessian.
- LiSSA and LiSSA-Sample focus on Generalized Linear Models (GLMs): $f_k(x) = \ell(v_k^T x, y_k)$ with $\ell$ $\mu$-strongly convex and $L$-smooth, e.g. linear regression with mean squared error loss and data $(v_k, y_k)$.
- This structure yields rank-one sample Hessians: $H_k(x) = \alpha_k v_k v_k^T$.
- Condition number: $\kappa = \dfrac{\max_x \lambda_{\max}(\nabla^2 f(x))}{\min_x \lambda_{\min}(\nabla^2 f(x))}$.
- $\mu$-strong convexity and $L$-smoothness imply $\mu I \preceq \nabla^2 f(x) \preceq L I$.
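
Because each $H_k(x) = \alpha_k v_k v_k^T$ is rank one, Hessian-vector products for the full objective cost $O(md)$ rather than $O(md^2)$. The small sketch below illustrates this; the function name is hypothetical, and $\alpha_k = 1$ corresponds to the squared-error example above.

```python
import numpy as np

def glm_hvp(V, u, lam, alpha=None):
    """Hessian-vector product for a GLM with H_k(x) = alpha_k v_k v_k^T plus the
    2*lam*I term from the L2 regularizer: H u = (1/m) V^T (alpha * (V u)) + 2*lam*u.
    For linear regression with squared error, alpha_k = 1 for every sample."""
    m = V.shape[0]
    if alpha is None:
        alpha = np.ones(m)
    return V.T @ (alpha * (V @ u)) / m + 2.0 * lam * u

# Check against the explicitly formed Hessian on a tiny ridge-regression problem.
rng = np.random.default_rng(0)
V = rng.standard_normal((5, 3))
lam, u = 0.1, rng.standard_normal(3)
H = V.T @ V / 5 + 2 * lam * np.eye(3)
print(np.allclose(glm_hvp(V, u, lam), H @ u))
```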

9 LiSSA Description (LiSSA and LiSSA-Sample)
- Key Idea 1 (Estimator): avoid direct inversion of the Hessian by using the matrix Taylor (Neumann) series $A^{-1} = \sum_{i=0}^{\infty} (I - A)^i$, valid when $\|I - A\| < 1$.
- Key Idea 2 (Concentration): with a sufficient number of random matrices, the matrix Bernstein inequality gives a tail bound: for i.i.d. $A_i \in \mathbb{R}^{d \times d}$ drawn from $P_A$ with $\mathbb{E}[A_i] = 0$ and $\|A_i\| \le R$,
  $\Pr\big(\big\|\textstyle\sum_i A_i\big\| \ge t\big) \le d \exp\big(-\tfrac{t^2}{4R^2}\big)$.
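
A small numerical check of Key Idea 1 (a sketch assuming the matrix has been scaled so that $\|I - A\| < 1$, which the algorithms above arrange by dividing by the smoothness constant):

```python
import numpy as np

def neumann_inverse(A, num_terms):
    """Truncated matrix Taylor (Neumann) series: A^{-1} ~ sum_{i=0}^{j} (I - A)^i,
    valid when ||I - A|| < 1 (e.g. after scaling A so its eigenvalues lie in (0, 1])."""
    d = A.shape[0]
    approx = np.zeros((d, d))
    term = np.eye(d)                      # (I - A)^0
    for _ in range(num_terms):
        approx += term
        term = term @ (np.eye(d) - A)
    return approx

# Demo: a well-conditioned PSD matrix scaled so that its eigenvalues lie in (0, 1].
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)
A = A / np.linalg.norm(A, 2)
print(np.linalg.norm(neumann_inverse(A, 200) - np.linalg.inv(A)))   # error of the truncated series (small)
```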

10 LiSSA Pseudocode
- Run any (fast) first order algorithm to obtain $x_0$ such that $\|x_0 - x^*\| \le \frac{1}{4\kappa_\ell M}$ (in practice, use some estimate).
- For each iteration $t = 0, \dots, T-1$:
  - Compute the full gradient $\nabla f(x_t) = \frac{1}{m}\sum_k \nabla f_k(x_t)$ and initialize $X_i = \nabla f(x_t)$ for each $i \in [S_1]$, where $S_1$ is a parameter.
  - Inner loop, for each $i \in [S_1]$: iterate $S_2 = 2\kappa_\ell \ln(2\kappa_\ell)$ times, each time computing the Hessian of a single random sample $k$ and updating $X_i \leftarrow \nabla f(x_t) + (I - H_k(x_t))\,X_i$.
  - Update $x_{t+1} = x_t - \frac{1}{S_1}\sum_{i \in [S_1]} X_i$ (a runnable sketch of one such step follows below).
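
A sketch of one outer LiSSA iteration following the pseudocode above, applied to a toy ridge-regression problem. The function names, the rescaling by a per-sample smoothness constant (so that $\|H_k\| \le 1$), and the specific values of $S_1$ and $S_2$ are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def lissa_step(x, grad_full, hvp_sample, m, S1, S2, rng):
    """One LiSSA outer iteration.  grad_full(x) is the full gradient; hvp_sample(x, k, u)
    returns H_k(x) @ u for sample k, assumed scaled so ||H_k|| <= 1, which is what makes
    the recursion X <- g + (I - H_k) X contract."""
    g = grad_full(x)
    estimates = []
    for _ in range(S1):                        # S1 independent estimators of H^{-1} g
        X = g.copy()
        for _ in range(S2):                    # depth-S2 stochastic Neumann recursion
            k = rng.integers(m)
            X = g + X - hvp_sample(x, k, X)
        estimates.append(X)
    return x - np.mean(estimates, axis=0)      # Newton-like step with the averaged estimate

# Toy run on ridge regression f(x) = 1/(2m) ||Vx - y||^2 + (lam/2) ||x||^2,
# with gradient and sample Hessians rescaled by the largest per-sample smoothness constant.
rng = np.random.default_rng(2)
m, d, lam = 200, 5, 1.0
V = rng.standard_normal((m, d)); y = rng.standard_normal(m)
Lmax = np.max(np.sum(V**2, axis=1)) + lam
grad = lambda x: (V.T @ (V @ x - y) / m + lam * x) / Lmax
hvp_k = lambda x, k, u: (V[k] * (V[k] @ u) + lam * u) / Lmax      # O(d) rank-one product
x = np.zeros(d)
for _ in range(5):
    x = lissa_step(x, grad, hvp_k, m, S1=100, S2=100, rng=rng)
x_star = np.linalg.solve(V.T @ V / m + lam * np.eye(d), V.T @ y / m)
print(np.linalg.norm(x - x_star) / np.linalg.norm(x_star))       # small, up to the estimator's sampling noise
```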

11 LiSSA Analysis
- Time complexity (convergence rate combined with per-iteration cost): reaching $f(x_T) - f(x^*) \le \varepsilon$ takes time $O\big((m + \kappa_\ell^3)\, d \log\tfrac{1}{\varepsilon}\big)$ for small $\varepsilon$, with high probability.

12 LiSSA-Sample Description
- Key Idea 1 (Hessian sketch): for a sketch $B$ of the Hessian,
  $B H^{-1}\nabla f(x) = \arg\min_y \; \nabla f(x)^T y + \tfrac{1}{2} y^T H(x) B^{-1} y$.
- Leverage score: a measure of how much a sample deviates from the other observations.
- Sample $O(d\log d)$ Hessians uniformly at random, without replacement; use these to compute (generalized) leverage scores for all samples; then build
  $B = \sum_{k=1}^{m} \frac{H_k}{p_k}\,\mathrm{Bernoulli}(p_k)$, where $p_k \propto$ the leverage score of sample $k$.
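
A sketch of the construction of $B$ under the rank-one GLM structure from slide 8: each sample Hessian is kept independently with probability proportional to its leverage score and reweighted by $1/p_k$, so that $E[B] = H$. The function name, oversampling constant, and the use of exact (rather than estimated) leverage scores are assumptions of this illustration.

```python
import numpy as np

def sample_sketch(V, alpha, scores, rng, c=2.0):
    """Importance-sampled Hessian sketch B = sum_k (H_k / p_k) * Bernoulli(p_k), where
    H_k = alpha_k v_k v_k^T and p_k is proportional to the (approximate) leverage score
    of sample k."""
    m, d = V.shape
    p = np.minimum(1.0, c * np.log(d) * d * scores / scores.sum())   # ~O(d log d) kept in expectation
    keep = rng.random(m) < p
    return sum(alpha[k] / p[k] * np.outer(V[k], V[k]) for k in np.where(keep)[0])

# With p_k proportional to exact leverage scores, E[B] equals the full Hessian H = sum_k H_k.
rng = np.random.default_rng(3)
V = rng.standard_normal((500, 4)); alpha = np.ones(500)
H = V.T @ V
scores = np.einsum('ij,jk,ik->i', V, np.linalg.inv(H), V)            # exact leverage scores v_k^T H^{-1} v_k
B = sample_sketch(V, alpha, scores, rng)
print(np.linalg.norm(B - H, 2) / np.linalg.norm(H, 2))               # spectral error of the small sketch
```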

13 LiSSA-Sample Pseudocode
Repeat $\log\log\tfrac{1}{\varepsilon}$ times:
1. Sample Hessians $H_k \sim p_k$, with $p_k \propto$ leverage score, to build $B = \sum_{k=1}^{O(d\log d)} H_k(x)$ such that $\tfrac{1}{2}B \preceq H(x) \preceq 2B$.
2. Minimize the quadratic objective (approximately): $y \approx B H^{-1}\nabla f(x) = \arg\min_y \nabla f(x)^T y + \tfrac{1}{2} y^T H(x) B^{-1} y$.
3. Approximately solve for $u$ to obtain $H^{-1}\nabla f(x) \approx u \approx B^{-1} y$.
4. Update: $x \leftarrow x - u$.

14 Computing Leverage Scores Efficiently
- Computing leverage scores requires computing $\gamma_i = v_i^T H^{-1} v_i = \|A^{-1} v_i\|_2^2$, where $H = \sum_{k=1}^{d\log d} H_k = A A^T$.
- Instead, randomly sample $G \in \mathbb{R}^{O(\log m) \times d}$ with i.i.d. standard normal entries and compute $\tilde{\gamma}_i = \|G A^{-1} v_i\|_2^2$.
- By the Johnson-Lindenstrauss lemma, with high probability $\tilde{\gamma}_i \in \big[\tfrac{1}{2}\|A^{-1} v_i\|_2^2,\; 2\|A^{-1} v_i\|_2^2\big]$.
- All of this takes $O(d^2\log m + md + d)$ time. Note: $d^2 \le md \le \kappa d$.
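
The sketch below contrasts exact generalized leverage scores with the Johnson-Lindenstrauss approximation described above. The sketch size $s = O(\log m)$, the constant in it, and the $1/s$ normalization of $\|G A^{-1} v_i\|^2$ are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 2000, 30
V = rng.standard_normal((m, d))

# H = sum_k v_k v_k^T = A A^T via a Cholesky factor A (a stand-in for the sampled Hessian sum).
H = V.T @ V
A = np.linalg.cholesky(H)

# Exact generalized leverage scores: gamma_i = v_i^T H^{-1} v_i = ||A^{-1} v_i||^2.
exact = np.einsum('ij,jk,ik->i', V, np.linalg.inv(H), V)

# JL sketch: an s x d Gaussian matrix with s = O(log m) rows; gamma_i ~ ||G A^{-1} v_i||^2 / s.
s = int(np.ceil(8 * np.log(m)))
G = rng.standard_normal((s, d))
GAinv = np.linalg.solve(A.T, G.T).T            # equals G @ A^{-1}, computed without inverting A
approx = np.sum((V @ GAinv.T) ** 2, axis=1) / s

print(np.max(approx / exact), np.min(approx / exact))   # ratios concentrate around 1
```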

15 LiSSA-Sample Analysis
- In the high accuracy regime (i.e. $\varepsilon$ small), LiSSA-Sample enjoys a convergence rate of (with high probability)
  $O\big(md\log\tfrac{1}{\varepsilon} + (d + \sqrt{\kappa d})\, d \log^2\tfrac{1}{\varepsilon}\, \log\log\tfrac{1}{\varepsilon}\big) \approx O\big((md + d\sqrt{\kappa d}) \log^2\tfrac{1}{\varepsilon}\big)$ when $\kappa > \tfrac{m}{d}$ (and $d^2 \le md$).
- This is faster than accelerated first order methods: $O\big((md + d\sqrt{\kappa m}) \log\tfrac{1}{\varepsilon}\big)$.

16 Extensions to Nonconvex Optimization
1. The same authors used a similar observation to extend the algorithm to non-convex optimization, proving a convergence rate (to a local minimum) that is faster than gradient descent.
2. Similar techniques, such as non-uniform sampling or sketching, could be applied to the Saddle-Free Newton method proposed by Dauphin et al., which updates with $|H|^{-1}\nabla f(x)$.
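
For item 2, a dense-eigendecomposition sketch of the saddle-free Newton direction $|H|^{-1}\nabla f(x)$; this is only an illustration of the formula (Dauphin et al. approximate $|H|$ in a low-dimensional Krylov subspace rather than forming it exactly).

```python
import numpy as np

def saddle_free_direction(H, g):
    """Saddle-free Newton direction |H|^{-1} g: take absolute values of the Hessian's
    eigenvalues so that negative-curvature directions are descended rather than ascended."""
    w, Q = np.linalg.eigh(H)
    return Q @ ((Q.T @ g) / np.abs(w))

# f(x, y) = x^2 - y^2 at the point (0.5, -0.5): g = (1, 1), H = diag(2, -2).
# The plain Newton step x - H^{-1} g jumps exactly onto the saddle at (0, 0),
# while the saddle-free step x - |H|^{-1} g escapes along the negative-curvature direction.
H = np.diag([2.0, -2.0])
g = np.array([1.0, 1.0])
print("Newton direction:      ", np.linalg.solve(H, g))          # [0.5, -0.5]
print("Saddle-free direction: ", saddle_free_direction(H, g))    # [0.5,  0.5]
```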

17 LiSSA Experiments

18 Empirical study: sketch size and convergence speed

19 Sketched Hessian vs. computing the exact Hessian

20 Red curves: using a sketch introduces deviation in the optimal point found, shown over independent trials. As the sketch size increases, the trials converge to the center path.

21 References
- Zeyuan Allen-Zhu. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), Montreal, Canada, June 19-23, 2017.
- Naman Agarwal, Brian Bullins, and Elad Hazan. Second-Order Stochastic Optimization for Machine Learning in Linear Time. The Journal of Machine Learning Research, 18(1), 2017.
- Alekh Agarwal and Leon Bottou. A Lower Bound for the Optimization of Finite Sums. Journal of Machine Learning Research.
- Yossi Arjevani and Ohad Shamir. Oracle Complexity of Second-Order Methods for Finite-Sum Problems. arXiv preprint.
- Naman Agarwal et al. Finding Approximate Local Minima Faster than Gradient Descent. arXiv preprint.
- Cohen et al. Uniform Sampling for Matrix Approximation. In Proceedings of the 6th Conference on Innovations in Theoretical Computer Science (ITCS).
- James Martens. Deep Learning via Hessian-Free Optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
- Dauphin et al. Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization. NIPS, 2014.
