Approximate Second Order Algorithms. Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo
1 Approximate Second Order Algorithms Seo Taek Kong, Nithin Tangellamudi, Zhikai Guo
2 Why Second Order Algorithms?
- Invariant under affine transformations: stretching a function preserves the convergence rate of Newton's method.
  - Example: consider f(x) = x^2 and g(x) = f(x/2) = x^2/4. Gradient descent takes smaller steps on the second function, whereas Newton's method solves both in a single step.
- Thus second-order methods potentially require less hyperparameter tuning, and will hopefully improve training speed.
- First-order methods that achieve the theoretical lower bound already exist. Can we improve further? Fewer iterations to converge may balance a higher per-iteration cost.
- Disadvantages:
  - What if H(x) is not invertible? Use the (Moore-Penrose) pseudo-inverse.
  - Computing H(x)^{-1} ∇f(x) is expensive, so approximate it.
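A minimal numerical sketch of the affine-invariance point above, using the slide's f(x) = x^2 and g(x) = f(x/2) = x^2/4 (the step size and tolerance are illustrative choices):

```python
# Gradient descent slows down on the rescaled quadratic; Newton's method
# solves both in a single step.

def gd_steps(grad, x0, lr=0.1, tol=1e-8, max_iter=100_000):
    x, steps = x0, 0
    while abs(grad(x)) > tol and steps < max_iter:
        x -= lr * grad(x)
        steps += 1
    return steps

f_grad = lambda x: 2.0 * x    # f(x) = x^2,   f''(x) = 2
g_grad = lambda x: 0.5 * x    # g(x) = x^2/4, g''(x) = 1/2

print(gd_steps(f_grad, 1.0), gd_steps(g_grad, 1.0))  # g needs ~4x more steps

for grad, second_deriv in [(f_grad, 2.0), (g_grad, 0.5)]:
    x = 1.0
    print(x - grad(x) / second_deriv)  # Newton step lands on the minimizer: 0.0
```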
3 Stochastic Newton Step?
- Suppose we are minimizing f(x) = Σ_{k=1}^m f_k(x), with each f_k μ-strongly convex and L-smooth.
- For the analysis of second-order algorithms we also need one more assumption: H(x) is M-Lipschitz, i.e. ||H(x) - H(y)|| ≤ M ||x - y||.
- Naive generalization: x ← x - H_k(x)^{-1} ∇f_k(x) for a randomly sampled k. The noisy estimate of the curvature hurts performance.
4 Hessian-Free (HF) Optimization
- To avoid computing H explicitly, we only compute Hessian-vector products Hv, where v is any vector. For a small ε, Hv ≈ (∇f(x + εv) - ∇f(x)) / ε.
- To avoid inverting H to solve Hy = -∇f(x), we minimize the local quadratic model
  q(y) = f(x) + ∇f(x)^T y + ½ y^T H y
  using the Conjugate Gradient (CG) method, which only needs Hessian-vector products.
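A minimal sketch of the two ingredients above on a toy quadratic (the objective, dimensions, and tolerances are illustrative assumptions): a finite-difference Hessian-vector product and a CG solve that only calls that product.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
A = Q @ Q.T + 5 * np.eye(5)            # positive definite stand-in for H
b = rng.standard_normal(5)

def grad_f(x):
    return A @ x - b

def hess_vec(x, v, eps=1e-6):
    # Hv ~ (grad f(x + eps v) - grad f(x)) / eps : no explicit Hessian needed
    return (grad_f(x + eps * v) - grad_f(x)) / eps

def conjugate_gradient(matvec, rhs, tol=1e-8, max_iter=100):
    # Minimizes phi(y) = 0.5 y^T A y - rhs^T y using only matrix-vector products.
    y = np.zeros_like(rhs)
    r = rhs - matvec(y)
    p = r.copy()
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = (r @ r) / (p @ Ap)
        y += alpha * p
        r_new = r - alpha * Ap
        if np.linalg.norm(r_new) < tol:
            break
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return y

x = rng.standard_normal(5)
newton_dir = conjugate_gradient(lambda v: hess_vec(x, v), -grad_f(x))
print(np.linalg.norm(grad_f(x + newton_dir)))  # ~0 (up to finite-difference error)
```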
5 Hessian Free Optimization
- Off-the-shelf HF algorithms are not feasible for large-scale problems.
- Damping makes a more conservative curvature estimate: add λ||d||^2 to the curvature estimate, adjusting λ depending on the reduction ratio
  ρ = (f(x + p) - f(x)) / (q_x(p) - q_x(0)).
- Computing matrix-vector products: use G instead of H for Hv, where G is the Gauss-Newton approximation of the Hessian and is positive semidefinite.
- Terminating conditions for CG: CG finds the solution of Ax = b not by minimizing ||Ax - b||^2 but by minimizing the quadratic φ(x) = ½ x^T A x - b^T x. φ(x) decreases at every step, whereas ||Ax - b||^2 fluctuates a lot before tending to 0. Terminate when the relative improvement of φ(x) over the last k steps drops below kε.
- Many methods exist to better precondition CG.
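A minimal sketch of the damping adjustment described above; the thresholds 1/4, 3/4 and factors 3/2, 2/3 are the common choices from Martens' HF paper, and should be treated as illustrative:

```python
def update_damping(lmbda, rho):
    # rho = (f(x + p) - f(x)) / (q_x(p) - q_x(0)): how well the damped
    # quadratic model predicted the actual decrease along the step p.
    if rho > 0.75:              # model is trustworthy: damp less
        return lmbda * 2.0 / 3.0
    if rho < 0.25:              # model over-promised: damp more
        return lmbda * 3.0 / 2.0
    return lmbda
```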
6 Lower Bounds
- First-order methods require Ω(m + √(mκ) log(1/ε)) calls to a gradient oracle to reach an ε-approximate solution.
  - Linear dependence on the condition number is achieved by SVRG, SAGA, and related methods; the minimal bound is attained by Katyusha and AccSDCA.
- Second-order methods, under the assumptions of the lower bound:
  - the algorithm can use at most m/2 Hessians per update;
  - indices k ∈ [m] are sampled uniformly at random;
  - input dimension d = O(1 + κ/m).
  - Oracle calls: Ω(m + √(mκ) log(1/ε)). The bound improves by a logarithmic factor via a randomized construction.
7 Lower Bounds Discussion
- This lower bound suggests that second-order algorithms cannot improve optimization rates by much.
- Since, even with the oracles, a simple second-order method must compute the Hessian (O(md^2)) and invert it (O(d^3)), such algorithms are not attractive.
- Because of the assumption that the algorithm cannot use the Hessians of all samples {H_k(x) : k ∈ [m]}, we lose the quadratic convergence rate.
- Suggestion: an algorithm that does not satisfy this assumption may achieve faster convergence than the bound presented.
- LiSSA-Sample uses leverage scores to sample the Hessians non-uniformly and, in the high-accuracy regime, achieves a convergence rate faster than any first-order algorithm.
8 Overview
- min f(x) = (1/m) Σ_{k=1}^m f_k(x) + (λ/2)||x||^2, where each f_k(x) is μ-strongly convex, L-smooth, and has M-Lipschitz Hessian.
- LiSSA and LiSSA-Sample focus on Generalized Linear Models (GLMs): f_k(x) = ℓ(v_k^T x, y_k) with ℓ μ-strongly convex and L-smooth, e.g. linear regression with mean squared error loss on data (v_k, y_k).
- This results in rank-one per-sample Hessians H_k(x) = α_k v_k v_k^T.
- Condition number: κ = max_x λ_max(∇²f(x)) / min_x λ_min(∇²f(x)). μ-strong convexity and L-smoothness imply μI ≼ ∇²f(x) ≼ LI.
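As a concrete instance of the rank-one structure above, here is a small sketch for ridge-regularized least squares, an assumed example of a GLM with squared loss (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, lam = 100, 5, 0.1
V = rng.standard_normal((m, d))          # rows are the data vectors v_k
y = rng.standard_normal(m)

# f_k(x) = 0.5 (v_k^T x - y_k)^2 + (lam/2)||x||^2  =>  H_k(x) = v_k v_k^T + lam I
def per_sample_hessian(k):
    return np.outer(V[k], V[k]) + lam * np.eye(d)

H_full = V.T @ V / m + lam * np.eye(d)                 # Hessian of the average
H_avg = sum(per_sample_hessian(k) for k in range(m)) / m
print(np.allclose(H_full, H_avg))                      # True
```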
9 LiSSA Description (LiSSA, LiSSA-Sample)
- Key Idea 1 (Estimator): avoid direct inversion of the Hessian by using the recursive Taylor (Neumann) series for matrices: A^{-1} = Σ_{i=0}^∞ (I - A)^i, valid when 0 ≺ A ≼ I.
- Key Idea 2 (Concentration): with a sufficient number of random matrices, the matrix Bernstein inequality gives a tail bound. For A_i ∈ R^{d×d} drawn i.i.d. from P_A with E[A_i] = 0 and ||A_i|| ≤ R,
  P(||Σ_i A_i|| ≥ t) ≤ d · exp(-t² / (4R²)).
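A minimal sketch of Key Idea 1 in isolation; the scaling by the spectral norm is an assumption made here so the series converges (LiSSA instead scales by a smoothness constant):

```python
import numpy as np

# Truncated Neumann series A^{-1} ~ sum_{i=0}^{j} (I - A)^i, written in the
# recursive form X_j = I + (I - A) X_{j-1}. Requires eigenvalues of A in (0, 1].

def neumann_inverse(A, depth=200):
    d = A.shape[0]
    approx = np.eye(d)
    for _ in range(depth):
        approx = np.eye(d) + (np.eye(d) - A) @ approx
    return approx

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)
A = A / np.linalg.norm(A, 2)          # scale so eigenvalues lie in (0, 1]
print(np.linalg.norm(neumann_inverse(A) @ A - np.eye(4)))  # small residual
```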
10 LiSSA Pseudocode
- Run any (fast) first-order algorithm to obtain x_0 such that ||x_0 - x*|| ≤ 1/(4 κ_ℓ M) (in practice, use some estimate).
- For each iteration t = 0, ..., T-1:
  - Compute the full gradient ∇f(x_t) = (1/m) Σ_k ∇f_k(x_t) and initialize X_i = ∇f(x_t) for each i ∈ [S_1], where S_1 is a parameter.
  - Inner loop, repeated S_2 = 2 κ_ℓ ln(2 κ_ℓ) times for each i: compute the Hessian H_k(x_t) of a single random sample k and update X_i ← ∇f(x_t) + (I - H_k(x_t)) X_i.
  - Update x_{t+1} = x_t - (1/S_1) Σ_{i ∈ [S_1]} X_i.
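Below is a minimal, self-contained sketch of this pseudocode on a ridge-regression instance. The hyperparameters (S1, S2, T, the scaling constant) are illustrative assumptions, not the theoretically prescribed values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, lam = 200, 10, 1.0
V = rng.standard_normal((m, d))
y_data = V @ rng.standard_normal(d) + 0.01 * rng.standard_normal(m)

def full_grad(x):
    return V.T @ (V @ x - y_data) / m + lam * x

def sample_hessian(k):                     # rank-one GLM Hessian + regularizer
    return np.outer(V[k], V[k]) + lam * np.eye(d)

scale = np.max(np.sum(V ** 2, axis=1)) + lam  # bound on each ||H_k||, so H_k/scale <= I

x = np.zeros(d)
S1, S2, T = 10, 300, 20
for _ in range(T):
    g = full_grad(x)
    directions = []
    for _ in range(S1):
        X = g.copy()
        for _ in range(S2):                # Neumann recursion with fresh samples:
            k = rng.integers(m)            #   X <- g + (I - H_k/scale) X
            X = g + X - sample_hessian(k) @ X / scale
        directions.append(X / scale)       # estimate of H(x)^{-1} g
    x = x - np.mean(directions, axis=0)    # (approximate) Newton step

print(np.linalg.norm(full_grad(x)))        # much smaller than at x = 0
```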
11 LiSSA Analysis
- Time complexity (convergence rate combined with per-iteration cost): reaching f(x_T) - f(x*) ≤ ε requires time O((m + κ_ℓ³) d log(1/ε)) for small ε, with high probability.
12 LiSSA-Sample Description
- Key Idea 1 (Hessian sketch): y = arg min_y { ½ y^T (H(x) B^{-1}) y - ∇f(x)^T y } = B H(x)^{-1} ∇f(x).
- Leverage score: a measure of how much a sample deviates from the other observations.
- Sample O(d log d) Hessians uniformly at random, without replacement, and use them to compute (generalized) leverage scores for all samples.
- Build the sketch B = Σ_{k=1}^m (H_k(x)/p_k) · Bernoulli(p_k), where p_k ∝ leverage score of sample k.
13 LiSSA-Sample Pseudocode
Repeat O(log log(1/ε)) times:
1. Sample Hessians H_k with probabilities p_k ∝ leverage score to compute B = Σ_{k=1}^{O(d log d)} H_k(x) such that ½ B ≼ H(x) ≼ 2B.
2. Minimize the quadratic objective (approximately): y ≈ arg min_y { ½ y^T (H(x) B^{-1}) y - ∇f(x)^T y } = B H(x)^{-1} ∇f(x).
3. Approximately solve for u: u ≈ B^{-1} y ≈ H(x)^{-1} ∇f(x).
4. Update: x ← x - u.
14 Computing Leverage Scores Efficiently
- Computing leverage scores requires computing γ_i = v_i^T H^{-1} v_i = ||A^{-1} v_i||²₂, where H = Σ_{k=1}^{O(d log d)} H_k = A A^T.
- Instead, randomly sample G ∈ R^{O(log m) × d} with i.i.d. standard normal entries and compute γ̃_i = ||G A^{-1} v_i||²₂.
- By the Johnson-Lindenstrauss lemma, with high probability γ̃_i ∈ [ ½ ||A^{-1} v_i||²₂, 2 ||A^{-1} v_i||²₂ ].
- All of this takes O(d² log m + md) time. (Note: d² ≤ md, so the cost is linear in the data size up to log factors.)
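A minimal sketch of the sketched leverage-score computation above; the data, the tiny ridge term added for invertibility, and the constant 8 in the sketch size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
m, d = 2000, 20
V = rng.standard_normal((m, d))

# Uniformly subsample O(d log d) rows; their rank-one Hessians sum to H = A A^T.
s = int(2 * d * np.log(d))
idx = rng.choice(m, size=s, replace=False)
A = V[idx].T                                   # d x s
H = A @ A.T + 1e-8 * np.eye(d)                 # tiny ridge for invertibility

# Exact generalized leverage scores: gamma_i = v_i^T H^{-1} v_i = ||L^{-1} v_i||^2.
L = np.linalg.cholesky(H)                      # H = L L^T
Linv_V = np.linalg.solve(L, V.T)               # columns are L^{-1} v_i
gamma_exact = np.sum(Linv_V ** 2, axis=0)

# JL sketch: an O(log m) x d Gaussian G preserves these norms up to constants,
# and G L^{-1} is formed once instead of solving a system per sample.
r = int(8 * np.log(m))
G = rng.standard_normal((r, d)) / np.sqrt(r)
S = G @ np.linalg.inv(L)                       # r x d
gamma_sketch = np.sum((S @ V.T) ** 2, axis=0)

ratio = gamma_sketch / gamma_exact
print(ratio.min(), ratio.max())                # within constant factors of 1 w.h.p.
```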
15 LiSSA-Sample Analysis
- In the high-accuracy regime (i.e. ε small), LiSSA-Sample enjoys, with high probability, a running time of
  O( md log(1/ε) + (d + √(κd)) d log²(1/ε) log log(1/ε) ),
  which is O( (md + d√(κd)) log²(1/ε) ) when κ > m/d (equivalently κd² > md).
- This is faster than accelerated first-order methods, whose running time is O( (md + d√(κm)) log(1/ε) ).
16 Extensions to Nonconvex Optimization
1. The same authors used a similar observation to extend the algorithm to non-convex optimization, proving a convergence rate (to a local minimum) that is faster than gradient descent.
2. Similar techniques, such as non-uniform sampling or sketching, can be used with the Saddle-Free Newton method proposed by Dauphin et al., whose step is |H|^{-1} ∇f(x).
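A minimal sketch of the saddle-free Newton step |H|^{-1} ∇f(x); a dense eigendecomposition is used for clarity, and the example function and evaluation point are illustrative:

```python
import numpy as np

def saddle_free_newton_step(hessian, grad, eps=1e-6):
    # Replace each Hessian eigenvalue by its absolute value so the step
    # still descends along negative-curvature directions.
    eigvals, eigvecs = np.linalg.eigh(hessian)
    abs_inv = 1.0 / np.maximum(np.abs(eigvals), eps)   # |H|^{-1}, regularized
    return eigvecs @ (abs_inv * (eigvecs.T @ grad))

# f(x, y) = x^2 - y^2 has a saddle at the origin. At (0.1, 0.1) the plain
# Newton step H^{-1} grad moves exactly onto the saddle, while the
# saddle-free step escapes along the -y direction.
H = np.array([[2.0, 0.0], [0.0, -2.0]])
g = np.array([0.2, -0.2])             # gradient at (0.1, 0.1)
print(saddle_free_newton_step(H, g))  # [0.1, -0.1]: x - step = (0, 0.2)
```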
17 LiSSA Experiments
18 Empirical study: sketch size and convergence speed
19 Sketch Hessian vs computing exact Hessian
20 The red curves show that sketching introduces deviation in finding the optimal point; independent trials are run to verify this. As the sketch size increases, the trajectories converge to the central path.
21 References
- Zeyuan Allen-Zhu. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), Montreal, Canada, June 2017.
- Naman Agarwal, Brian Bullins, and Elad Hazan. Second-Order Stochastic Optimization for Machine Learning in Linear Time. Journal of Machine Learning Research, 18(1), 2017.
- Alekh Agarwal and Leon Bottou. A Lower Bound for the Optimization of Finite Sums. Journal of Machine Learning Research.
- Yossi Arjevani and Ohad Shamir. Oracle Complexity of Second-Order Methods for Finite-Sum Problems. arXiv preprint.
- Naman Agarwal et al. Finding Approximate Local Minima Faster than Gradient Descent. arXiv preprint.
- Cohen et al. Uniform Sampling for Matrix Approximation. In Proceedings of the 6th Conference on Innovations in Theoretical Computer Science (ITCS).
- James Martens. Deep Learning via Hessian-Free Optimization. In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010.
- Dauphin et al. Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization. In Advances in Neural Information Processing Systems (NIPS), 2014.
More information