R-Linear Convergence of Limited Memory Steepest Descent
R-Linear Convergence of Limited Memory Steepest Descent
Frank E. Curtis, Lehigh University
joint work with Wei Guo, Lehigh University
OP17, Vancouver, British Columbia, Canada, 24 May 2017
R-Linear Convergence of Limited Memory Steepest Descent 1 of 26
Outline
Introduction
Limited Memory Steepest Descent (LMSD)
R-Linear Convergence of LMSD
Numerical Demonstrations
Summary
Unconstrained optimization: Steepest descent
Consider the unconstrained optimization problem
\min_{x \in \mathbb{R}^n} f(x), where f : \mathbb{R}^n \to \mathbb{R} is C^1.
Let us focus exclusively on a steepest descent framework:
Algorithm SD (Steepest Descent)
Require: x_1 \in \mathbb{R}^n
1: for k \in \mathbb{N} do
2:   Compute g_k \gets \nabla f(x_k)
3:   Choose \alpha_k \in (0, \infty)
4:   Set x_{k+1} \gets x_k - \alpha_k g_k
5: end for
All that remains to be determined are the stepsizes \{\alpha_k\}.
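As a concrete illustration, Algorithm SD can be sketched for a strongly convex quadratic; the matrix, right-hand side, and constant stepsize below are illustrative choices, not part of the algorithm's statement.

```python
import numpy as np

# Minimal sketch of Algorithm SD on the strongly convex quadratic
# f(x) = 0.5 x^T A x - b^T x, whose gradient is g(x) = A x - b.
# The constant stepsize alpha = 1/lambda_n is one illustrative choice.
A = np.diag([1.0, 4.0, 10.0])        # eigenvalues lambda_1 <= ... <= lambda_n
b = np.array([1.0, 1.0, 1.0])
x = np.zeros(3)                      # x_1
alpha = 1.0 / 10.0                   # 1/lambda_n

for _ in range(200):
    g = A @ x - b                    # g_k = grad f(x_k)
    x = x - alpha * g                # x_{k+1} = x_k - alpha_k g_k

converged = np.allclose(x, np.linalg.solve(A, b), atol=1e-8)
print(converged)
```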
Minimizing strongly convex quadratics
Suppose f(x) = \tfrac{1}{2} x^T A x - b^T x, where A has eigenvalues \lambda_1 \le \cdots \le \lambda_n.
[Figure: the real line with 0, 1/\lambda_n, and 1/\lambda_1 marked, illustrating the range of stepsizes.]
Convergence (rate) of the algorithm depends on the choices for \{\alpha_k\}.
Choosing \alpha_k \equiv 1/\lambda_n leads to Q-linear convergence with constant (1 - \lambda_1/\lambda_n)...
... but certain components of the gradient vanish for stepsizes in a larger range.
Goal: Allow large stepsizes, and shrink the range (automatically) to catch the entire gradient.
Contributions
Consider Fletcher's limited memory steepest descent (LMSD) method.
It extends the Barzilai-Borwein (BB) two-point stepsize strategy.
BB methods are known to have an R-linear convergence rate; Dai and Liao (2002).
We prove that LMSD also attains R-linear convergence.
Although the proved convergence rate is not necessarily better than that for BB, one can see reasons for LMSD's improved empirical performance.
Decomposition
\min_{x \in \mathbb{R}^n} f(x), where f(x) = \tfrac{1}{2} x^T A x - b^T x.
Let A have the eigendecomposition A = Q \Lambda Q^T, where Q = [q_1 \cdots q_n] is orthogonal and \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) with \lambda_n \ge \cdots \ge \lambda_1 > 0.
Let g := \nabla f. For any x \in \mathbb{R}^n, the gradient of f at x can be expressed as
g(x) = \sum_{i=1}^n d_i q_i, where d_i \in \mathbb{R} for all i \in [n] := \{1, \ldots, n\}.
Recursion
Let g := \nabla f. For any x \in \mathbb{R}^n, the gradient of f at x can be expressed as
g(x) = \sum_{i=1}^n d_i q_i, where d_i \in \mathbb{R} for all i \in [n] := \{1, \ldots, n\}. (1)
If x^+ \gets x - \alpha g(x), then the weights satisfy the recursive property
d_i^+ = (1 - \alpha \lambda_i) d_i for all i \in [n].
Proof (Sketch). Since g(x) = Ax - b,
x^+ = x - \alpha g(x) \implies Ax^+ = Ax - \alpha A g(x) \implies g(x^+) = (I - \alpha A) g(x) = (I - \alpha Q \Lambda Q^T) g(x),
then decompose according to (1).
Idea: Choose stepsizes as reciprocals of (estimates of) eigenvalues of A.
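The recursion can be verified numerically by decomposing the gradient in the eigenbasis before and after one step; the matrix, right-hand side, starting point, and stepsize below are illustrative.

```python
import numpy as np

# Check of the recursion d_i^+ = (1 - alpha lambda_i) d_i: decompose the
# gradient in the eigenbasis of A before and after one steepest-descent
# step. A, b, x, and alpha below are illustrative.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)        # symmetric positive definite
b = rng.standard_normal(5)
x = rng.standard_normal(5)
alpha = 0.1

lam, Q = np.linalg.eigh(A)           # A = Q diag(lam) Q^T
d = Q.T @ (A @ x - b)                # weights d_i with g(x) = sum_i d_i q_i

x_plus = x - alpha * (A @ x - b)     # x^+ = x - alpha g(x)
d_plus = Q.T @ (A @ x_plus - b)      # new weights d_i^+

match = np.allclose(d_plus, (1.0 - alpha * lam) * d)
print(match)
```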
LMSD method: Main idea
Fletcher (2012): Repeated cycles (or "sweeps") of m iterations.
At the start of the (k+1)st cycle, suppose one has the kth-cycle values in G_k := [g_{k,1} \cdots g_{k,m}], corresponding to \{x_{k,1}, \ldots, x_{k,m}\}.
The iterate displacements lie in the Krylov sequence initiated from g_{k,1}.
Performing a QR decomposition to obtain G_k = Q_k R_k, one obtains m eigenvalue estimates (Ritz values) as the eigenvalues of the (symmetric tridiagonal) matrix T_k \gets Q_k^T A Q_k, which are contained in the spectrum of A in an optimal sense (more later).
One can also obtain these estimates more cheaply and with less storage...
LMSD method: Efficient eigenvalue estimation
Storing the kth-cycle reciprocal stepsizes in the (m+1) \times m bidiagonal matrix
J_k := \begin{bmatrix} \alpha_{k,1}^{-1} & & \\ -\alpha_{k,1}^{-1} & \ddots & \\ & \ddots & \alpha_{k,m}^{-1} \\ & & -\alpha_{k,m}^{-1} \end{bmatrix},
one finds that by computing the (partially extended) Cholesky factorization
G_k^T [G_k \; g_{k,m+1}] = R_k^T [R_k \; r_k],
one has T_k \gets [R_k \; r_k] J_k R_k^{-1}.
Long story short: One can obtain the Ritz values (and stepsizes) in \tfrac{1}{2} m^2 n flops... and this is done only once every m steps.
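A small numerical check of this identity, under the reading that J_k is the bidiagonal matrix of reciprocal stepsizes: for a quadratic, A g_j = (g_j - g_{j+1})/\alpha_j, so A G_k = [G_k \; g_{k,m+1}] J_k and hence Q_k^T A Q_k = [R_k \; r_k] J_k R_k^{-1}. For simplicity, R_k and r_k are obtained here from a QR factorization of G_k rather than from the Cholesky factorization; all problem data are illustrative.

```python
import numpy as np

# Sketch of the identity T_k = [R_k r_k] J_k R_k^{-1}. For a quadratic,
# A g_j = (g_j - g_{j+1}) / alpha_j, so A G = [G g_{m+1}] J with J the
# bidiagonal matrix of reciprocal stepsizes, and hence
# Q^T A Q = [R r] J R^{-1}. Here R and r come from a QR factorization of
# G for simplicity; all problem data are illustrative.
rng = np.random.default_rng(1)
n, m = 40, 4
A = np.diag(np.linspace(1.0, 60.0, n))
b = rng.standard_normal(n)
x = rng.standard_normal(n)
alphas = np.array([0.02, 0.05, 0.1, 0.3])

G = np.zeros((n, m))
for j in range(m):
    g = A @ x - b
    G[:, j] = g
    x = x - alphas[j] * g
g_next = A @ x - b                   # g_{m+1}

J = np.zeros((m + 1, m))             # 1/alpha_j on the diagonal,
for j in range(m):                   # -1/alpha_j just below it
    J[j, j] = 1.0 / alphas[j]
    J[j + 1, j] = -1.0 / alphas[j]

Q, R = np.linalg.qr(G)
r = Q.T @ g_next
T_cheap = np.hstack([R, r[:, None]]) @ J @ np.linalg.inv(R)
T_direct = Q.T @ A @ Q               # reference: form T directly

same = np.allclose(T_cheap, T_direct, atol=1e-6)
print(same)
```

The point of the cheap formula is that it needs only inner products of stored gradients, never a product with A or the explicit factor Q_k.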
Algorithm LMSD (Limited Memory Steepest Descent)
Require: x_{1,1} \in \mathbb{R}^n, m \in \mathbb{N}, and \epsilon \in \mathbb{R}_+
1: Choose stepsizes \{\alpha_{1,j}\}_{j \in [m]} \subset \mathbb{R}_{++}
2: Compute g_{1,1} \gets \nabla f(x_{1,1})
3: if \|g_{1,1}\| \le \epsilon, then return x_{1,1}
4: for k \in \mathbb{N} do
5:   for j \in [m] do
6:     Set x_{k,j+1} \gets x_{k,j} - \alpha_{k,j} g_{k,j}
7:     Compute g_{k,j+1} \gets \nabla f(x_{k,j+1})
8:     if \|g_{k,j+1}\| \le \epsilon, then return x_{k,j+1}
9:   end for
10:  Set x_{k+1,1} \gets x_{k,m+1} and g_{k+1,1} \gets g_{k,m+1}
11:  Set G_k and J_k
12:  Compute (R_k, r_k), then compute T_k
13:  Compute \{\theta_{k,j}\}_{j \in [m]} \subset \mathbb{R}_{++} as the eigenvalues of T_k
14:  Set \{\alpha_{k+1,j}\}_{j \in [m]} \gets \{\theta_{k,j}^{-1}\}_{j \in [m]} \subset \mathbb{R}_{++}
15: end for
(Note: There is also a version using harmonic Ritz values.)
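The whole algorithm can be sketched for the quadratic case. For clarity, this version forms T_k directly as Q_k^T A Q_k from a QR factorization of the gradient matrix, rather than via the cheaper Cholesky-based recovery; the problem data, initial stepsizes, and iteration limit are illustrative.

```python
import numpy as np

# Compact LMSD sketch for f(x) = 0.5 x^T A x - b^T x, following the
# slide's structure: each cycle takes m steps with the stepsizes from the
# previous cycle, then computes m new Ritz values. T_k is formed directly
# as Q_k^T A Q_k (rather than via the cheaper Cholesky-based recovery).
def lmsd(A, b, x, m=5, eps=1e-8, max_cycles=500):
    n = len(x)
    alphas = np.full(m, 1.0 / np.linalg.norm(A, 2))   # initial stepsizes
    for _ in range(max_cycles):
        G = np.zeros((n, m))
        for j in range(m):
            g = A @ x - b
            if np.linalg.norm(g) <= eps:
                return x
            G[:, j] = g
            x = x - alphas[j] * g
        Q, _ = np.linalg.qr(G)
        theta = np.linalg.eigvalsh(Q.T @ A @ Q)       # Ritz values
        alphas = 1.0 / theta                          # next cycle's stepsizes
    return x

rng = np.random.default_rng(2)
n = 30
A = np.diag(np.linspace(1.0, 50.0, n))
b = rng.standard_normal(n)
x_final = lmsd(A, b, rng.standard_normal(n))
resid = np.linalg.norm(A @ x_final - b)
print(resid <= 1e-8)
```

In exact arithmetic the Ritz values lie in [\lambda_1, \lambda_n], so the stepsizes stay positive; a production implementation would also guard against a (nearly) rank-deficient G_k.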
Known convergence properties
BB methods (m = 1):
R-superlinear when n = 2; Barzilai and Borwein (1988).
Convergent for any n from any starting point; Raydan (1993).
R-linear for any n; Dai and Liao (2002).
LMSD methods (m \ge 1):
Convergent for any n from any starting point; Fletcher (2012).
Prior to our work: the convergence rate had not yet been analyzed.
Basic assumptions
Assumption 1.
(i) Algorithm LMSD is run with \epsilon = 0 and g_{k,j} \neq 0 for all (k, j) \in \mathbb{N} \times [m].
(ii) For all k \in \mathbb{N}, the matrix G_k has linearly independent columns. Further, there exists \rho \in [1, \infty) such that, for all k \in \mathbb{N},
\|R_k^{-1}\| \le \rho \|g_{k,1}\|^{-1}. (2)
To justify (2), note that when m = 1, one has Q_k R_k = G_k = g_{k,1}, where Q_k = g_{k,1}/\|g_{k,1}\| and R_k = \|g_{k,1}\|. Hence, (2) holds with \rho = 1.
Intuition
Lemma 2. For all k \in \mathbb{N}, the eigenvalues of T_k satisfy
\theta_{k,j} \in [\lambda_{m+1-j}, \lambda_{n+1-j}] \subseteq [\lambda_1, \lambda_n] for all j \in [m].
[Figure: recall the real line with 0, 1/\lambda_n, and 1/\lambda_1 marked.]
We essentially prove that...
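Lemma 2's containment can be spot-checked numerically, under the reading that the Ritz values are indexed so that \theta_{k,1} is the largest; the spectrum, stepsize, and cycle length below are illustrative.

```python
import numpy as np

# Spot-check of Lemma 2 (a Poincare-separation-type bound): with
# G_k = Q_k R_k and T_k = Q_k^T A Q_k, the Ritz values sorted so that
# theta_{k,1} is the largest satisfy
# theta_{k,j} in [lambda_{m+1-j}, lambda_{n+1-j}]. Data are illustrative.
rng = np.random.default_rng(5)
n, m = 30, 4
lam = np.linspace(1.0, 20.0, n)      # lambda_1 <= ... <= lambda_n
A = np.diag(lam)
b = rng.standard_normal(n)
x = rng.standard_normal(n)

G = np.zeros((n, m))
for j in range(m):
    g = A @ x - b
    G[:, j] = g
    x = x - 0.04 * g

Q, _ = np.linalg.qr(G)
theta = np.sort(np.linalg.eigvalsh(Q.T @ A @ Q))[::-1]   # descending

tol = 1e-8
bounds_hold = all(lam[m - j] - tol <= theta[j - 1] <= lam[n - j] + tol
                  for j in range(1, m + 1))
print(bounds_hold)
```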
Worst-case blow-up of weights over a cycle
Lemma 3. For each (k, j, i) \in \mathbb{N} \times [m] \times [n]:
|d_{k,j+1,i}| \le \delta_{j,i} |d_{k,j,i}|, where \delta_{j,i} := \max\left\{ \left|1 - \tfrac{\lambda_i}{\lambda_{m+1-j}}\right|, \left|1 - \tfrac{\lambda_i}{\lambda_{n+1-j}}\right| \right\}.
Hence, for each (k, j, i) \in \mathbb{N} \times [m] \times [n]:
|d_{k+1,j,i}| \le \Delta_i |d_{k,j,i}|, where \Delta_i := \prod_{j=1}^m \delta_{j,i}.
Furthermore, for each (k, j, p) \in \mathbb{N} \times [m] \times [n]:
\sum_{i=1}^p d_{k,j+1,i}^2 \le \hat\delta_{j,p}^2 \sum_{i=1}^p d_{k,j,i}^2, where \hat\delta_{j,p} := \max_{i \in [p]} \delta_{j,i},
while, for each (k, j) \in \mathbb{N} \times [m]:
\|g_{k+1,j}\| \le \Delta \|g_{k,j}\|, where \Delta := \max_{i \in [n]} \Delta_i.
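The per-step factor \delta_{j,i} can be spot-checked: with stepsize \alpha = 1/\theta and \theta anywhere in Lemma 2's interval, the update factor |1 - \alpha\lambda_i| = |1 - \lambda_i/\theta| is monotone in \theta and hence maximized at an endpoint. The spectrum and cycle length below are illustrative.

```python
import numpy as np

# Spot-check of the factor delta_{j,i} in Lemma 3: for a stepsize
# alpha = 1/theta with theta in [lambda_{m+1-j}, lambda_{n+1-j}], the
# weight update factor |1 - alpha lambda_i| is bounded by delta_{j,i},
# the maximum of |1 - lambda_i/theta| over the interval's endpoints.
rng = np.random.default_rng(4)
lam = np.sort(rng.uniform(1.0, 100.0, size=10))   # lambda_1 <= ... <= lambda_n
n, m = len(lam), 3

bound_holds = True
for j in range(1, m + 1):                  # step j of a cycle (1-based)
    lo, hi = lam[m - j], lam[n - j]        # lambda_{m+1-j}, lambda_{n+1-j}
    delta = np.maximum(np.abs(1.0 - lam / lo), np.abs(1.0 - lam / hi))
    for theta in np.linspace(lo, hi, 25):  # any Ritz value in the interval
        bound_holds &= bool(np.all(np.abs(1.0 - lam / theta) <= delta + 1e-12))
print(bound_holds)
```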
Q-linear convergence of weight i = 1
Lemma 4. If \Delta_1 = 0, then d_{1+\hat k, \hat\jmath, 1} = 0 for all (\hat k, \hat\jmath) \in \mathbb{N} \times [m]. Otherwise, if \Delta_1 > 0, then:
(i) for (k, j) \in \mathbb{N} \times [m] with d_{k,j,1} = 0, it follows that d_{k+\hat k, \hat\jmath, 1} = 0 for all (\hat k, \hat\jmath) \in \mathbb{N} \times [m];
(ii) for (k, j) \in \mathbb{N} \times [m] with |d_{k,j,1}| > 0 and any \epsilon_1 \in (0, 1), it follows that
|d_{k+\hat k, \hat\jmath, 1}| \le \epsilon_1 |d_{k,j,1}| for all \hat k \ge 1 + \tfrac{\log \epsilon_1}{\log \Delta_1} and \hat\jmath \in [m].
Ritz value representation
Lemma 5. For all (k, j) \in \mathbb{N} \times [m], let q_{k,j} \in \mathbb{R}^m denote the unit eigenvector corresponding to the eigenvalue \theta_{k,j} of T_k, i.e., that with T_k q_{k,j} = \theta_{k,j} q_{k,j} and \|q_{k,j}\| = 1. Then, defining
D_k := \begin{bmatrix} d_{k,1,1} & \cdots & d_{k,m,1} \\ \vdots & & \vdots \\ d_{k,1,n} & \cdots & d_{k,m,n} \end{bmatrix} and c_{k,j} := D_k R_k^{-1} q_{k,j},
it follows that, with the diagonal matrix of eigenvalues (namely, \Lambda = Q^T A Q),
\theta_{k,j} = c_{k,j}^T \Lambda c_{k,j} and c_{k,j}^T c_{k,j} = 1.
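Lemma 5's representation can be verified numerically. Working in the eigenbasis of A (A diagonal, Q = I), the weight matrix D_k coincides with the gradient matrix G_k; the spectrum, stepsize, and cycle length below are illustrative.

```python
import numpy as np

# Check of Lemma 5: theta_{k,j} = c^T Lambda c and c^T c = 1 for
# c = D_k R_k^{-1} q_{k,j}. In the eigenbasis of A (A diagonal, Q = I),
# the weight matrix D_k is just the gradient matrix G.
rng = np.random.default_rng(3)
n, m = 20, 4
lam = np.linspace(2.0, 30.0, n)
A = np.diag(lam)
b = rng.standard_normal(n)
x = rng.standard_normal(n)

G = np.zeros((n, m))
for j in range(m):
    g = A @ x - b
    G[:, j] = g                      # columns hold the weights d_{k,j,i}
    x = x - 0.03 * g

Qk, R = np.linalg.qr(G)
theta, V = np.linalg.eigh(Qk.T @ A @ Qk)   # eigenpairs of T_k

lemma_holds = True
for j in range(m):
    c = G @ np.linalg.solve(R, V[:, j])    # c = D R^{-1} q
    lemma_holds &= bool(np.isclose(c @ c, 1.0))
    lemma_holds &= bool(np.isclose(lam @ c**2, theta[j]))
print(lemma_holds)
```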
If the first p weights are small, then a bound holds for weight p + 1
(We express \hat\delta_p \in [1, \infty) dependent only on m, p, and the spectrum of A.)
Lemma 6 (simplified). For any (k, p) \in \mathbb{N} \times [n-1], if there exists (\epsilon_p, K_p) \in (0, \tfrac{1}{2 \hat\delta_p \rho}) \times \mathbb{N} with
\sum_{i=1}^p d_{k+\hat k,1,i}^2 \le \epsilon_p^2 \|g_{k,1}\|^2 for all \hat k \ge K_p,
then there exists K_{p+1} \ge K_p dependent only on \epsilon_p, \rho, and the spectrum of A with
d_{k+K_{p+1},1,p+1}^2 \le 4 \hat\delta_p^2 \rho^2 \epsilon_p^2 \|g_{k,1}\|^2.
Proof (Key step). The first p elements of c_{k+\hat k, j} are small enough such that
\theta_{k+\hat k, j} = \sum_{i=1}^n \lambda_i c_{k+\hat k, j, i}^2 \ge \tfrac{3}{4} \lambda_{p+1} for all \hat k \ge K_p and j \in [m].
If the first p weights are small, then a bound holds for all of the first p + 1 weights...
Lemma 7. For any (k, p) \in \mathbb{N} \times [n-1], if there exists (\epsilon_p, K_p) \in (0, \tfrac{1}{2 \hat\delta_p \rho}) \times \mathbb{N} with
\sum_{i=1}^p d_{k+\hat k,1,i}^2 \le \epsilon_p^2 \|g_{k,1}\|^2 for all \hat k \ge K_p,
then, with \epsilon_{p+1}^2 := (1 + 4 \max\{1, \Delta_{p+1}^4\} \hat\delta_p^2 \rho^2) \epsilon_p^2 and some K_{p+1} \in \mathbb{N},
\sum_{i=1}^{p+1} d_{k+\hat k,1,i}^2 \le \epsilon_{p+1}^2 \|g_{k,1}\|^2 for all \hat k \ge K_{p+1}.
R-linear convergence of LMSD
Lemma 8. There exists K \in \mathbb{N} dependent only on the spectrum of A such that
\|g_{k+K,1}\| \le \tfrac{1}{2} \|g_{k,1}\| for all k \in \mathbb{N}.
Theorem 9. The sequence \{\|g_{k,1}\|\} vanishes R-linearly in the sense that
\|g_{k,1}\| \le c_1 c_2^k \|g_{1,1}\|, where c_1 := 2^{K-1} and c_2 := 2^{-1/K} \in (0, 1).
Numerical demonstrations with n = 100: m = 1 and m = 5
Figure: \{\lambda_1, \ldots, \lambda_{100}\} \subset [1, 1.9].
Figure: \{\lambda_1, \ldots, \lambda_{100}\} \subset [1, 100].
Figure: \{\lambda_1, \ldots, \lambda_{100}\} in 5 clusters; m = 5.
Figure: \{\lambda_1, \ldots, \lambda_{100}\} in 2 clusters (low heavy); m = 5.
Figure: \{\lambda_1, \ldots, \lambda_{100}\} in 2 clusters (high heavy); m = 5.
Summary
Consider Fletcher's limited memory steepest descent (LMSD) method.
It extends the Barzilai-Borwein (BB) two-point stepsize strategy.
BB methods are known to have an R-linear convergence rate; Dai and Liao (2002).
We prove that LMSD also attains R-linear convergence.
Although the proved convergence rate is not necessarily better than that for BB, one can see reasons for LMSD's improved empirical performance.
F. E. Curtis and W. Guo. R-Linear Convergence of Limited Memory Steepest Descent. Technical Report 16T-010, COR@L Laboratory, Department of ISE, Lehigh University. To appear in IMA Journal of Numerical Analysis.
More informationLecture 2 INF-MAT : , LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky
Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, Positve (semi)definite, Cholesky, Semi-Cholesky Tom Lyche and Michael Floater Centre of Mathematics for Applications, Department of Informatics,
More informationReduced-Hessian Methods for Constrained Optimization
Reduced-Hessian Methods for Constrained Optimization Philip E. Gill University of California, San Diego Joint work with: Michael Ferry & Elizabeth Wong 11th US & Mexico Workshop on Optimization and its
More information6.4 Krylov Subspaces and Conjugate Gradients
6.4 Krylov Subspaces and Conjugate Gradients Our original equation is Ax = b. The preconditioned equation is P Ax = P b. When we write P, we never intend that an inverse will be explicitly computed. P
More informationTMA4180 Solutions to recommended exercises in Chapter 3 of N&W
TMA480 Solutions to recommended exercises in Chapter 3 of N&W Exercise 3. The steepest descent and Newtons method with the bactracing algorithm is implemented in rosenbroc_newton.m. With initial point
More informationIterative methods for Linear System
Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline Basics: Matrices and their properties Eigenvalues, Condition Number Iterative Methods Direct and
More informationNew Inexact Line Search Method for Unconstrained Optimization 1,2
journal of optimization theory and applications: Vol. 127, No. 2, pp. 425 446, November 2005 ( 2005) DOI: 10.1007/s10957-005-6553-6 New Inexact Line Search Method for Unconstrained Optimization 1,2 Z.
More informationA derivative-free nonmonotone line search and its application to the spectral residual method
IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral
More informationHow to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization
How to Characterize the Worst-Case Performance of Algorithms for Nonconvex Optimization Frank E. Curtis Department of Industrial and Systems Engineering, Lehigh University Daniel P. Robinson Department
More informationCME342 Parallel Methods in Numerical Analysis. Matrix Computation: Iterative Methods II. Sparse Matrix-vector Multiplication.
CME342 Parallel Methods in Numerical Analysis Matrix Computation: Iterative Methods II Outline: CG & its parallelization. Sparse Matrix-vector Multiplication. 1 Basic iterative methods: Ax = b r = b Ax
More informationThe Q Method for Symmetric Cone Programmin
The Q Method for Symmetric Cone Programming The Q Method for Symmetric Cone Programmin Farid Alizadeh and Yu Xia alizadeh@rutcor.rutgers.edu, xiay@optlab.mcma Large Scale Nonlinear and Semidefinite Progra
More information5 Quasi-Newton Methods
Unconstrained Convex Optimization 26 5 Quasi-Newton Methods If the Hessian is unavailable... Notation: H = Hessian matrix. B is the approximation of H. C is the approximation of H 1. Problem: Solve min
More informationOptimization Methods. Lecture 18: Optimality Conditions and. Gradient Methods. for Unconstrained Optimization
5.93 Optimization Methods Lecture 8: Optimality Conditions and Gradient Methods for Unconstrained Optimization Outline. Necessary and sucient optimality conditions Slide. Gradient m e t h o d s 3. The
More informationNumerical methods part 2
Numerical methods part 2 Alain Hébert alain.hebert@polymtl.ca Institut de génie nucléaire École Polytechnique de Montréal ENE6103: Week 6 Numerical methods part 2 1/33 Content (week 6) 1 Solution of an
More informationMidterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015
Midterm for Introduction to Numerical Analysis I, AMSC/CMSC 466, on 10/29/2015 The test lasts 1 hour and 15 minutes. No documents are allowed. The use of a calculator, cell phone or other equivalent electronic
More informationLine Search Methods for Unconstrained Optimisation
Line Search Methods for Unconstrained Optimisation Lecture 8, Numerical Linear Algebra and Optimisation Oxford University Computing Laboratory, MT 2007 Dr Raphael Hauser (hauser@comlab.ox.ac.uk) The Generic
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationKrylov subspace projection methods
I.1.(a) Krylov subspace projection methods Orthogonal projection technique : framework Let A be an n n complex matrix and K be an m-dimensional subspace of C n. An orthogonal projection technique seeks
More informationThis manuscript is for review purposes only.
1 2 3 4 5 6 7 8 9 10 11 12 THE USE OF QUADRATIC REGULARIZATION WITH A CUBIC DESCENT CONDITION FOR UNCONSTRAINED OPTIMIZATION E. G. BIRGIN AND J. M. MARTíNEZ Abstract. Cubic-regularization and trust-region
More informationSymmetric matrices and dot products
Symmetric matrices and dot products Proposition An n n matrix A is symmetric iff, for all x, y in R n, (Ax) y = x (Ay). Proof. If A is symmetric, then (Ax) y = x T A T y = x T Ay = x (Ay). If equality
More information1 Conjugate gradients
Notes for 2016-11-18 1 Conjugate gradients We now turn to the method of conjugate gradients (CG), perhaps the best known of the Krylov subspace solvers. The CG iteration can be characterized as the iteration
More informationUnconstrained optimization
Chapter 4 Unconstrained optimization An unconstrained optimization problem takes the form min x Rnf(x) (4.1) for a target functional (also called objective function) f : R n R. In this chapter and throughout
More informationConjugate Gradient Method
Conjugate Gradient Method Tsung-Ming Huang Department of Mathematics National Taiwan Normal University October 10, 2011 T.M. Huang (NTNU) Conjugate Gradient Method October 10, 2011 1 / 36 Outline 1 Steepest
More informationBasic Concepts in Matrix Algebra
Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1
More information1. Introduction. We develop an active set method for the box constrained optimization
SIAM J. OPTIM. Vol. 17, No. 2, pp. 526 557 c 2006 Society for Industrial and Applied Mathematics A NEW ACTIVE SET ALGORITHM FOR BOX CONSTRAINED OPTIMIZATION WILLIAM W. HAGER AND HONGCHAO ZHANG Abstract.
More informationA globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications
A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth
More informationAdaptive two-point stepsize gradient algorithm
Numerical Algorithms 27: 377 385, 2001. 2001 Kluwer Academic Publishers. Printed in the Netherlands. Adaptive two-point stepsize gradient algorithm Yu-Hong Dai and Hongchao Zhang State Key Laboratory of
More information