Optimization 2. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng
1 Optimization 2
CS5240 Theoretical Foundations in Multimedia
Leow Wee Kheng
Department of Computer Science, School of Computing, National University of Singapore
Leow Wee Kheng (NUS) Optimization 2 1 / 38
2 Last Time...
Gradient descent updates the variables by
$x_{k+1} = x_k - \alpha \nabla f(x_k)$. (1)
Gradient descent can be slow. To increase the convergence rate, find a good step length $\alpha$ using line search.
3 Line Search
Gradient descent minimizes $f(x)$ by updating $x_k$ along $-\nabla f(x_k)$:
$x_{k+1} = x_k - \alpha \nabla f(x_k)$. (2)
In general, other descent directions $p_k$ are possible if $\nabla f(x_k)^\top p_k < 0$. Then, update $x_k$ according to
$x_{k+1} = x_k + \alpha p_k$. (3)
Ideally, we want to find the $\alpha > 0$ that minimizes $f(x_k + \alpha p_k)$, but this is too expensive. In practice, find an $\alpha > 0$ that reduces $f(x)$ sufficiently (not by too small an amount). The simple condition $f(x_k + \alpha p_k) < f(x_k)$ may not be sufficient.
4 Line Search
A popular sufficient-decrease condition for line search is the Armijo condition:
$f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^\top p_k$, (4)
where $c_1 > 0$ is usually quite small, say $c_1 = 10^{-4}$.
The Armijo condition may accept very small $\alpha$. So, add another condition (the curvature condition) to rule out steps that are too short:
$\nabla f(x_k + \alpha p_k)^\top p_k \ge c_2 \nabla f(x_k)^\top p_k$, (5)
with $0 < c_1 < c_2 < 1$, typically $c_2 = 0.9$.
These two conditions together are called the Wolfe conditions.
Known result: gradient descent will find a local minimum if $\alpha$ satisfies the Wolfe conditions.
5 Line Search
Another way is to also bound $\alpha$ from below, giving the Goldstein conditions:
$f(x_k) + (1-c)\,\alpha \nabla f(x_k)^\top p_k \le f(x_k + \alpha p_k) \le f(x_k) + c\,\alpha \nabla f(x_k)^\top p_k$, (6)
with $0 < c < 1/2$.
Another simple way is to start with a large $\alpha$ and reduce it iteratively:
Backtracking Line Search
1. Choose a large $\alpha > 0$ and $0 < \rho < 1$.
2. Repeat until the Armijo condition is met:
   2.1 $\alpha \leftarrow \rho \alpha$.
In this case, the Armijo condition alone is enough.
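The backtracking procedure above can be sketched in a few lines. This is a minimal pure-Python illustration on a made-up toy problem ($f(x) = x^2$ with steepest-descent direction); the values $c_1 = 10^{-4}$ and $\rho = 0.5$ are common textbook choices, not prescribed by the slides.

```python
# Backtracking line search sketch (toy 1-D problem, made-up values).
# Shrinks alpha by rho until the Armijo condition (Eq. 4) holds.

def backtracking(f, grad, x, alpha0=1.0, rho=0.5, c1=1e-4):
    g = grad(x)
    p = -g                     # steepest-descent direction
    alpha = alpha0
    # Armijo: f(x + alpha*p) <= f(x) + c1 * alpha * g * p
    while f(x + alpha * p) > f(x) + c1 * alpha * g * p:
        alpha *= rho
    return alpha

f = lambda x: x * x
grad = lambda x: 2.0 * x

x = 5.0
for _ in range(100):
    x = x - backtracking(f, grad, x) * grad(x)
```

After the loop, `x` has reached the minimizer at 0; with a non-toy objective the same loop structure applies, only `f` and `grad` change.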
6 Gauss-Newton Method
(Portraits of Carl Friedrich Gauss and Sir Isaac Newton.)
7 Gauss-Newton Method
The Gauss-Newton method estimates the Hessian $H$ using 1st-order derivatives. Express $E(\mathbf{a})$ as the sum of squares of $r_i(\mathbf{a})$:
$E(\mathbf{a}) = \sum_{i=1}^n r_i(\mathbf{a})^2$, (7)
where $r_i$ is the error or residual of each data point $x_i$:
$r_i(\mathbf{a}) = v_i - f(x_i; \mathbf{a})$, (8)
and
$\mathbf{r} = [r_1 \cdots r_n]^\top$. (9)
Then,
$\dfrac{\partial E}{\partial a_j} = 2 \sum_{i=1}^n r_i \dfrac{\partial r_i}{\partial a_j}$, (10)
$\dfrac{\partial^2 E}{\partial a_j \partial a_l} = 2 \sum_{i=1}^n \left[ \dfrac{\partial r_i}{\partial a_j}\dfrac{\partial r_i}{\partial a_l} + r_i \dfrac{\partial^2 r_i}{\partial a_j \partial a_l} \right] \approx 2 \sum_{i=1}^n \dfrac{\partial r_i}{\partial a_j}\dfrac{\partial r_i}{\partial a_l}$. (11)
8 Gauss-Newton Method
The derivative $\partial r_i / \partial a_j$ is the $(i,j)$-th element of the Jacobian $J_r$ of $\mathbf{r}$:
$J_r = \begin{bmatrix} \partial r_1/\partial a_1 & \cdots & \partial r_1/\partial a_m \\ \vdots & & \vdots \\ \partial r_n/\partial a_1 & \cdots & \partial r_n/\partial a_m \end{bmatrix}$. (12)
The gradient and Hessian are related to the Jacobian by
$\nabla E = 2 J_r^\top \mathbf{r}$, (13)
$H \approx 2 J_r^\top J_r$. (14)
So, the update equation becomes
$\mathbf{a}_{k+1} = \mathbf{a}_k - (J_r^\top J_r)^{-1} J_r^\top \mathbf{r}$. (15)
9 Gauss-Newton Method
We can also use the Jacobian $J_f$ of $f(x_i)$:
$J_f = \begin{bmatrix} \partial f(x_1)/\partial a_1 & \cdots & \partial f(x_1)/\partial a_m \\ \vdots & & \vdots \\ \partial f(x_n)/\partial a_1 & \cdots & \partial f(x_n)/\partial a_m \end{bmatrix}$. (16)
Then, $J_f = -J_r$, and the update equation becomes
$\mathbf{a}_{k+1} = \mathbf{a}_k + (J_f^\top J_f)^{-1} J_f^\top \mathbf{r}$. (17)
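The update in Eq. (17) can be sketched end to end on a tiny problem. This is a pure-Python illustration with made-up data: fitting the hypothetical model $f(x; \mathbf{a}) = a_1 e^{a_2 x}$, with the $2 \times 2$ normal equations solved by Cramer's rule.

```python
import math

# Gauss-Newton sketch: fit f(x; a) = a1 * exp(a2 * x) to (x_i, v_i)
# using a_{k+1} = a_k + (Jf^T Jf)^{-1} Jf^T r  (Eq. 17).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
vs = [2.0 * math.exp(-0.7 * x) for x in xs]   # data from a1 = 2, a2 = -0.7

def gauss_newton(a1, a2, iters=50):
    for _ in range(iters):
        # residuals r_i = v_i - f(x_i; a) and Jacobian of f w.r.t. (a1, a2)
        r  = [v - a1 * math.exp(a2 * x) for x, v in zip(xs, vs)]
        J1 = [math.exp(a2 * x) for x in xs]            # df/da1
        J2 = [a1 * x * math.exp(a2 * x) for x in xs]   # df/da2
        # normal equations (Jf^T Jf) da = Jf^T r, solved by Cramer's rule
        A11 = sum(j * j for j in J1)
        A22 = sum(j * j for j in J2)
        A12 = sum(p * q for p, q in zip(J1, J2))
        b1 = sum(j * ri for j, ri in zip(J1, r))
        b2 = sum(j * ri for j, ri in zip(J2, r))
        det = A11 * A22 - A12 * A12
        a1 += (b1 * A22 - b2 * A12) / det
        a2 += (b2 * A11 - b1 * A12) / det
    return a1, a2

a1, a2 = gauss_newton(1.5, -0.5)
```

From the starting guess $(1.5, -0.5)$ the iterates recover $(2, -0.7)$; as the slides note, a starting guess far from the minimum may instead diverge.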
10 Gauss-Newton Method
Compare to linear fitting:
  linear fitting:   $\mathbf{a} = (D^\top D)^{-1} D^\top \mathbf{v}$
  Gauss-Newton:     $\Delta\mathbf{a} = (J_f^\top J_f)^{-1} J_f^\top \mathbf{r}$
$(J_f^\top J_f)^{-1} J_f^\top$ is analogous to the pseudo-inverse.
Compare to gradient descent:
  gradient descent: $\Delta\mathbf{a} = -\alpha \nabla E$
  Gauss-Newton:     $\Delta\mathbf{a} = -\frac{1}{2}(J_r^\top J_r)^{-1} \nabla E$
Gauss-Newton updates $\mathbf{a}$ along the negative gradient at a variable rate. The Gauss-Newton method may converge slowly or diverge if the initial guess $\mathbf{a}^{(0)}$ is far from the minimum, or if the matrix $J^\top J$ is ill-conditioned.
11 Gauss-Newton Method: Condition Number
The condition number of a (real) matrix $D$ is given by the ratio of its singular values:
$\kappa(D) = \dfrac{\sigma_{\max}(D)}{\sigma_{\min}(D)}$. (18)
If $D$ is normal, i.e., $D^\top D = D D^\top$, then its condition number is given by the ratio of the magnitudes of its eigenvalues:
$\kappa(D) = \dfrac{|\lambda_{\max}(D)|}{|\lambda_{\min}(D)|}$. (19)
A large condition number means the elements of $D$ are less balanced.
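For a diagonal matrix the singular values are simply the absolute diagonal entries, so Eq. (18) can be evaluated directly. A minimal sketch (illustrative only; a general matrix needs an SVD routine):

```python
# Condition number of a diagonal matrix: kappa = max|d_i| / min|d_i|.
def cond_diagonal(diag):
    mags = [abs(d) for d in diag]
    return max(mags) / min(mags)

kappa = cond_diagonal([10.0, 1.0, 0.1])   # sigma_max = 10, sigma_min = 0.1
```

Here `kappa` is 100: small imbalances in the entries already produce a condition number orders of magnitude above 1.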
12 Gauss-Newton Method: Condition Number
For the linear equation $D\mathbf{a} = \mathbf{v}$, the condition number indicates how the solution $\mathbf{a}$ changes with a change in $\mathbf{v}$.
Small condition number:
- A small change in $\mathbf{v}$ gives a small change in $\mathbf{a}$: stable.
- A small error in $\mathbf{v}$ gives a small error in $\mathbf{a}$.
- $D$ is well-conditioned.
Large condition number:
- A small change in $\mathbf{v}$ gives a large change in $\mathbf{a}$: unstable.
- A small error in $\mathbf{v}$ gives a large error in $\mathbf{a}$.
- $D$ is ill-conditioned.
13 Gauss-Newton Method: Practical Issues
The update equation, with $J = J_f$,
$\mathbf{a}_{k+1} = \mathbf{a}_k + (J^\top J)^{-1} J^\top \mathbf{r}$, (20)
requires the computation of a matrix inverse, which can be expensive. Can use the singular value decomposition (SVD) to help. The SVD of $J$ is
$J = U \Sigma V^\top$. (21)
Substituting it into the update equation yields (Homework)
$\mathbf{a}_{k+1} = \mathbf{a}_k + V \Sigma^{-1} U^\top \mathbf{r}$. (22)
14 Gauss-Newton Method: Practical Issues
Convergence criteria: same as gradient descent, e.g.,
$\dfrac{|E_{k+1} - E_k|}{|E_k|} < \epsilon$, or
$\dfrac{|a_{j,k+1} - a_{j,k}|}{|a_{j,k}|} < \epsilon$ for each component $j$.
If divergence occurs, can introduce a step length $\alpha$:
$\mathbf{a}_{k+1} = \mathbf{a}_k + \alpha (J^\top J)^{-1} J^\top \mathbf{r}$, (23)
where $0 < \alpha < 1$.
Is there an alternative way to control convergence? Yes: the Levenberg-Marquardt method.
15 Levenberg-Marquardt Method
The Gauss-Newton method updates $\mathbf{a}$ according to
$(J^\top J)\,\Delta\mathbf{a} = J^\top \mathbf{r}$. (24)
Levenberg modifies the update equation to
$(J^\top J + \lambda I)\,\Delta\mathbf{a} = J^\top \mathbf{r}$. (25)
$\lambda$ is adjusted at each iteration:
- If the reduction of error $E$ is large, can use a small $\lambda$: more similar to Gauss-Newton.
- If the reduction of error $E$ is small, can use a large $\lambda$: more similar to gradient descent.
16 Levenberg-Marquardt Method
If $\lambda$ is very large, $(J^\top J + \lambda I) \approx \lambda I$, i.e., just doing gradient descent. Marquardt scales each component of the gradient:
$(J^\top J + \lambda\,\mathrm{diag}(J^\top J))\,\Delta\mathbf{a} = J^\top \mathbf{r}$, (26)
where $\mathrm{diag}(J^\top J)$ is a diagonal matrix of the diagonal elements of $J^\top J$. The $(i,i)$-th component of $\mathrm{diag}(J^\top J)$ is
$\sum_{l=1}^n \left( \dfrac{\partial f(x_l)}{\partial a_i} \right)^2$. (27)
If the $(i,i)$-th component of the gradient is small, $a_i$ is updated more: faster convergence along the $i$-th direction. Then, $\lambda$ doesn't have to be too large.
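The Levenberg update of Eq. (25), with the usual accept/reject adjustment of $\lambda$, can be sketched on the same made-up exponential model $f(x; \mathbf{a}) = a_1 e^{a_2 x}$. The halving/doubling schedule for $\lambda$ is a common simple choice, not one fixed by the slides.

```python
import math

# Levenberg-style damping sketch: solve (J^T J + lambda I) da = J^T r
# (Eq. 25) and adapt lambda based on whether the error actually drops.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
vs = [2.0 * math.exp(-0.7 * x) for x in xs]   # data from a1 = 2, a2 = -0.7

def error(a1, a2):
    return sum((v - a1 * math.exp(a2 * x)) ** 2 for x, v in zip(xs, vs))

def levenberg(a1, a2, lam=1.0, iters=100):
    for _ in range(iters):
        r  = [v - a1 * math.exp(a2 * x) for x, v in zip(xs, vs)]
        J1 = [math.exp(a2 * x) for x in xs]
        J2 = [a1 * x * math.exp(a2 * x) for x in xs]
        A11 = sum(j * j for j in J1) + lam    # J^T J + lambda I
        A22 = sum(j * j for j in J2) + lam
        A12 = sum(p * q for p, q in zip(J1, J2))
        b1 = sum(j * ri for j, ri in zip(J1, r))
        b2 = sum(j * ri for j, ri in zip(J2, r))
        det = A11 * A22 - A12 * A12
        d1 = (b1 * A22 - b2 * A12) / det
        d2 = (b2 * A11 - b1 * A12) / det
        if error(a1 + d1, a2 + d2) < error(a1, a2):
            a1, a2 = a1 + d1, a2 + d2   # good step: lean toward Gauss-Newton
            lam *= 0.5
        else:
            lam *= 2.0                  # bad step: lean toward gradient descent
    return a1, a2

a1, a2 = levenberg(0.5, 0.0)
```

Note the deliberately poor starting guess $(0.5, 0.0)$: the damping keeps the iteration stable where plain Gauss-Newton might struggle.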
17 Quasi-Newton
Quasi-Newton also minimizes $f(x)$ by iteratively updating $x$:
$x_{k+1} = x_k + \alpha p_k$, (28)
where the descent direction $p_k$ is
$p_k = -H_k^{-1} \nabla f(x_k)$. (29)
There are many methods of estimating $H^{-1}$. The most effective is the method by Broyden, Fletcher, Goldfarb and Shanno (BFGS).
18 Quasi-Newton: BFGS Method
Let $\Delta x_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. Then, estimate $H_k$ or $H_k^{-1}$ by
$H_{k+1} = H_k - \dfrac{H_k \Delta x_k \Delta x_k^\top H_k}{\Delta x_k^\top H_k \Delta x_k} + \dfrac{y_k y_k^\top}{y_k^\top \Delta x_k}$, (30)
$H_{k+1}^{-1} = \left( I - \dfrac{\Delta x_k y_k^\top}{y_k^\top \Delta x_k} \right) H_k^{-1} \left( I - \dfrac{y_k \Delta x_k^\top}{y_k^\top \Delta x_k} \right) + \dfrac{\Delta x_k \Delta x_k^\top}{y_k^\top \Delta x_k}$. (31)
(For the derivation, please refer to [1].)
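A quick sanity check on Eq. (31): the updated inverse must satisfy the secant condition $H_{k+1}^{-1} y_k = \Delta x_k$. The 2-D vectors below are made up purely for illustration; the little matrix helpers stand in for a linear-algebra library.

```python
# Verify the BFGS inverse update (Eq. 31) in 2-D, pure Python.
def outer(u, v):
    return [[u[0]*v[0], u[0]*v[1]], [u[1]*v[0], u[1]*v[1]]]

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

def matvec(A, v):
    return [A[0][0]*v[0] + A[0][1]*v[1], A[1][0]*v[0] + A[1][1]*v[1]]

def bfgs_inverse_update(Hinv, dx, y):
    rho = 1.0 / (y[0]*dx[0] + y[1]*dx[1])          # 1 / (y^T dx)
    I = [[1.0, 0.0], [0.0, 1.0]]
    L = [[I[i][j] - rho * dx[i] * y[j] for j in range(2)] for i in range(2)]
    R = [[I[i][j] - rho * y[i] * dx[j] for j in range(2)] for i in range(2)]
    H = matmul(matmul(L, Hinv), R)
    S = outer(dx, dx)
    return [[H[i][j] + rho * S[i][j] for j in range(2)] for i in range(2)]

Hinv = [[1.0, 0.0], [0.0, 1.0]]      # start from H^{-1} = I
dx, y = [1.0, -0.5], [2.0, 1.0]      # y^T dx = 1.5 > 0
Hnew = bfgs_inverse_update(Hinv, dx, y)
hy = matvec(Hnew, y)                 # secant condition: hy should equal dx
```

The secant condition holds exactly by construction: multiplying the rightmost factor of Eq. (31) by $y_k$ gives zero, leaving only $\Delta x_k$.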
19 Quasi-Newton: BFGS Algorithm
1. Set $\alpha$ and initialize $x_0$ and $H_0$.
2. For $k = 0, 1, \ldots$ until convergence:
   2.1 Compute the search direction $p_k = -H_k^{-1} \nabla f(x_k)$.
   2.2 Perform line search to find $\alpha$ that satisfies the Wolfe conditions.
   2.3 Compute $\Delta x_k = \alpha p_k$, and set $x_{k+1} = x_k + \Delta x_k$.
   2.4 Compute $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
   2.5 Compute $H_{k+1}^{-1}$ by Eq. (31).
There is no standard way to initialize $H_0$. Can try $H_0 = I$.
20 Approximation of Gradient
All previous methods require the gradient $\nabla f$ (or $\nabla E$). In many complex problems, it is very difficult to derive $\nabla f$. Several methods to overcome this problem:
- Use symbolic differentiation to derive the math expression of $\nabla f$. Provided in Matlab, Maple, etc.
- Approximate the gradient numerically.
- Optimize without the gradient.
21 Approximation of Gradient: Forward Difference
From the Taylor series expansion,
$f(x + \Delta x) = f(x) + \nabla f(x)^\top \Delta x + \cdots$ (32)
Denote the $j$-th component of $\Delta x$ as $\epsilon e_j$, where $e_j$ is the unit vector of the $j$-th dimension. Then, the $j$-th component of $\nabla f(x)$ is
$\dfrac{\partial f(x)}{\partial x_j} \approx \dfrac{f(x + \epsilon e_j) - f(x)}{\epsilon}$. (33)
Note that $x + \epsilon e_j$ updates only the $j$-th component:
$x + \epsilon e_j = [x_1 \cdots x_{j-1}\; x_j\; x_{j+1} \cdots x_m]^\top + [0 \cdots 0\; \epsilon\; 0 \cdots 0]^\top$. (34)
Then,
$\nabla f(x) \approx \left[ \dfrac{\partial f(x)}{\partial x_1} \cdots \dfrac{\partial f(x)}{\partial x_m} \right]^\top$. (35)
22 Approximation of Gradient: Central Difference
The forward difference method has a forward bias: $x + \epsilon e_j$. Error from this bias can accumulate quickly. To be less biased, use the central difference:
$f(x + \epsilon e_j) \approx f(x) + \epsilon \dfrac{\partial f(x)}{\partial x_j}$,
$f(x - \epsilon e_j) \approx f(x) - \epsilon \dfrac{\partial f(x)}{\partial x_j}$.
So,
$\dfrac{\partial f(x)}{\partial x_j} \approx \dfrac{f(x + \epsilon e_j) - f(x - \epsilon e_j)}{2\epsilon}$. (36)
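The accuracy gap between the two schemes is easy to see numerically. A small sketch on the made-up function $f(x) = x^3$ at $x = 1$ (true derivative 3):

```python
# Forward (Eq. 33) vs central (Eq. 36) difference on f(x) = x**3 at x = 1.
def f(x):
    return x ** 3

eps = 1e-3
forward = (f(1 + eps) - f(1)) / eps              # error ~ O(eps)
central = (f(1 + eps) - f(1 - eps)) / (2 * eps)  # error ~ O(eps**2)

err_fwd = abs(forward - 3.0)
err_cen = abs(central - 3.0)
```

With $\epsilon = 10^{-3}$ the forward error is about $3 \times 10^{-3}$ while the central error is about $10^{-6}$: the odd-order Taylor terms cancel in the central scheme.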
23 Optimization without Gradient
A simple idea is to descend along each coordinate direction iteratively. Let $e_1, \ldots, e_m$ denote the unit vectors along coordinates $1, \ldots, m$.
Coordinate Descent
1. Choose initial $x_0$.
2. Repeat until convergence:
   2.1 For each $j = 1, \ldots, m$:
       2.1.1 Find $x_{k+1}$ along $e_j$ that minimizes $f(x_k)$.
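The algorithm above can be sketched on a made-up coupled quadratic, $f(x,y) = x^2 + y^2 + xy - 3x$, whose minimizer is $(2, -1)$. Here each coordinate minimization has a closed form (set the partial derivative to zero with the other variable fixed), so no line search is needed.

```python
# Coordinate descent sketch (pure Python) on f(x, y) = x**2 + y**2 + x*y - 3*x.
def coordinate_descent(x=0.0, y=0.0, sweeps=50):
    for _ in range(sweeps):
        x = (3.0 - y) / 2.0   # argmin over x with y fixed: 2x + y - 3 = 0
        y = -x / 2.0          # argmin over y with x fixed: 2y + x = 0
    return x, y

x, y = coordinate_descent()
```

Because the variables are coupled through the $xy$ term, each sweep only contracts the error by a constant factor, illustrating why coordinate descent can be slow.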
24 Optimization without Gradient
Alternative for Step 2.1.1: find $x_{k+1}$ along $e_j$ using line search to reduce $f(x_k)$ sufficiently.
Strength: simplicity.
Weakness: can be slow; can iterate around the minimum, never approaching it.
Powell's Method
Overcomes the problem of coordinate descent. Basic idea: choose directions that do not interfere with or undo previous effort.
25 Alternating Optimization
Let $S \subseteq \mathbb{R}^m$ denote the set of $x$ that satisfy the constraints. Then, the constrained optimization problem can be written as
$\min_{x \in S} f(x)$. (37)
Alternating optimization (AO) is similar to coordinate descent except that the variables $x_j$, $j = 1, \ldots, m$, are split into groups. Partition $x \in S$ into $p$ non-overlapping parts $y_l$ such that
$x = (y_1, \ldots, y_l, \ldots, y_p)$. (38)
Example: $x = (\underbrace{x_1, x_2, x_4}_{y_1}, \underbrace{x_3, x_7}_{y_2}, \underbrace{x_5, x_6, x_8, x_9}_{y_3})$.
Denote $S_l \subseteq S$ as the set that contains $y_l$.
26 Alternating Optimization
1. Choose initial $x^{(0)} = (y_1^{(0)}, \ldots, y_p^{(0)})$.
2. For $k = 0, 1, 2, \ldots$, until convergence:
   2.1 For $l = 1, \ldots, p$: Find $y_l^{(k+1)}$ such that
       $y_l^{(k+1)} = \arg\min_{y_l \in S_l} f(y_1^{(k+1)}, \ldots, y_{l-1}^{(k+1)}, y_l, y_{l+1}^{(k)}, \ldots, y_p^{(k)})$.
That is, find $y_l^{(k+1)}$ while fixing the other variables.
(Alternative) Find $y_l^{(k+1)}$ that decreases $f(x)$ sufficiently:
$f(y_1^{(k+1)}, \ldots, y_{l-1}^{(k+1)}, y_l^{(k)}, y_{l+1}^{(k)}, \ldots, y_p^{(k)}) - f(y_1^{(k+1)}, \ldots, y_{l-1}^{(k+1)}, y_l^{(k+1)}, y_{l+1}^{(k)}, \ldots, y_p^{(k)}) > c > 0$.
27 Alternating Optimization
Example:
initially:     $(y_1^{(0)}, y_2^{(0)}, y_3^{(0)}, \ldots, y_l^{(0)}, \ldots, y_p^{(0)})$
$k=0, l=1$:    $(y_1^{(1)}, y_2^{(0)}, y_3^{(0)}, \ldots, y_l^{(0)}, \ldots, y_p^{(0)})$
$k=0, l=2$:    $(y_1^{(1)}, y_2^{(1)}, y_3^{(0)}, \ldots, y_l^{(0)}, \ldots, y_p^{(0)})$
$\vdots$
$k=0, l=p$:    $(y_1^{(1)}, y_2^{(1)}, y_3^{(1)}, \ldots, y_l^{(1)}, \ldots, y_p^{(1)})$
$k=1, l=1$:    $(y_1^{(2)}, y_2^{(1)}, y_3^{(1)}, \ldots, y_l^{(1)}, \ldots, y_p^{(1)})$
$\vdots$
Obviously, we want to partition the variables so that:
- each sub-problem of finding $y_l$ is relatively easy to solve;
- solving for $y_l$ does not undo previous partial solutions.
28 Alternating Optimization
Example: k-means clustering. k-means clustering tries to group data points $x$ into $k$ clusters.
(Figure: data points grouped into three clusters with prototypes $c_1$, $c_2$, $c_3$.)
Let $c_j$ denote the prototype of cluster $C_j$. Then, k-means clustering finds the $C_j$ and $c_j$, $j = 1, \ldots, k$, that minimize the within-cluster difference
$D = \sum_{j=1}^k \sum_{x \in C_j} \|x - c_j\|^2$. (39)
29 Alternating Optimization
Suppose the $C_j$ are already known. Then, the $c_j$ that minimizes $D$ is given by
$\dfrac{\partial D}{\partial c_j} = -2 \sum_{x \in C_j} (x - c_j) = 0$. (40)
That is, $c_j$ is the centroid of $C_j$:
$c_j = \dfrac{1}{|C_j|} \sum_{x \in C_j} x$. (41)
How to find $C_j$? Use alternating optimization.
30 Alternating Optimization
k-means clustering
1. Initialize $c_j$, $j = 1, \ldots, k$.
2. Repeat until convergence:
   2.1 For each $x$, assign it to the nearest cluster $C_j$ (fix $c_j$, update $C_j$):
       $\|x - c_j\| \le \|x - c_l\|$, for all $l \ne j$.
   2.2 Update the cluster centroids (fix $C_j$, update $c_j$):
       $c_j = \dfrac{1}{|C_j|} \sum_{x \in C_j} x$.
Step 2.2 is (locally) optimal. Step 2.1 does not undo step 2.2. Why? So, k-means clustering can converge to a local minimum.
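The two alternating steps above can be sketched in a few lines. This is a minimal 1-D version on made-up data and made-up initial centroids:

```python
# Minimal 1-D k-means sketch: alternate between assigning points to the
# nearest centroid (step 2.1) and recomputing centroids as cluster means
# (step 2.2).
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        # Step 2.1: fix centroids, assign each point to the nearest cluster
        clusters = [[] for _ in centroids]
        for x in points:
            j = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
            clusters[j].append(x)
        # Step 2.2: fix clusters, move each centroid to its cluster mean;
        # keep the old centroid if a cluster happens to be empty
        centroids = [sum(cl) / len(cl) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids

c = kmeans_1d([1.0, 2.0, 10.0, 11.0], [0.0, 5.0])
```

The two well-separated groups are found after a single sweep, with centroids at 1.5 and 10.5; on harder data the result depends on initialization, matching the local-minimum caveat above.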
31 Convergence Rates
Suppose the sequence $\{x_k\}$ converges to $x^*$. Then, the sequence converges to $x^*$ linearly if there exists $\mu \in (0,1)$ such that
$\lim_{k \to \infty} \dfrac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = \mu$. (42)
The sequence converges sublinearly if $\mu = 1$, and superlinearly if $\mu = 0$.
If the sequence converges sublinearly and
$\lim_{k \to \infty} \dfrac{\|x_{k+2} - x_{k+1}\|}{\|x_{k+1} - x_k\|} = 1$, (43)
then the sequence converges logarithmically.
32 Convergence Rates
We can distinguish between different rates of superlinear convergence. The sequence converges with order $q$, for $q > 1$, if there exists $\mu > 0$ such that
$\lim_{k \to \infty} \dfrac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|^q} = \mu > 0$. (44)
If $q = 1$, the sequence converges q-linearly. If $q = 2$, the sequence converges quadratically (q-quadratically). If $q = 3$, the sequence converges cubically (q-cubically), etc.
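The order $q$ can also be estimated numerically from a sequence of errors: if $e_{k+1} \approx \mu\, e_k^q$, then $\log e_{k+1} / \log e_k \to q$ as $e_k \to 0$. A sketch using Newton's method for $x^2 = 2$ (expected $q \approx 2$); the log-ratio estimator is a standard diagnostic, not something from the slides:

```python
import math

# Estimate the order of convergence of Newton's method for f(x) = x**2 - 2.
root = math.sqrt(2.0)
x = 2.0
errors = []
for _ in range(4):
    x = 0.5 * (x + 2.0 / x)        # Newton step: x - f(x)/f'(x)
    errors.append(abs(x - root))

# successive log-errors: log(e_{k+1}) / log(e_k) approaches q
q = math.log(errors[3]) / math.log(errors[2])
```

The estimate lands close to 2, consistent with the quadratic convergence of Newton-type methods listed on the next slide.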
33 Convergence Rates: Known Convergence Rates
- Gradient descent converges linearly, often slowly.
- Gauss-Newton converges quadratically, if it converges.
- Levenberg-Marquardt converges quadratically.
- Quasi-Newton converges superlinearly.
- Alternating optimization converges q-linearly if $f$ is convex around a local minimizer $x^*$.
34 Summary
Method                    For                        Use
gradient descent          optimization               $\nabla f$
Gauss-Newton              nonlinear least-squares    $\nabla E$, estimate of $H$
Levenberg-Marquardt       nonlinear least-squares    $\nabla E$, estimate of $H$
quasi-Newton              optimization               $\nabla f$, estimate of $H$
Newton                    optimization               $\nabla f$, $H$
coordinate descent        optimization               $f$
alternating optimization  optimization               $f$
35 Summary
SciPy supports many optimization methods, including:
- linear least-squares
- the Levenberg-Marquardt method
- the quasi-Newton BFGS method
- line search that satisfies the Wolfe conditions
- etc.
For more details and other methods, refer to [1].
36 Probing Questions
- Can you apply an algorithm specialized for nonlinear least squares, e.g., Gauss-Newton, to generic optimization?
- Gauss-Newton may converge slowly or diverge if the matrix $J^\top J$ is ill-conditioned. How can you make the matrix well-conditioned?
- How can your alternating optimization program detect interference of a previous update by the current update?
- If your alternating optimization program detects frequent interference between updates, what can you do about it?
- So far, we have looked at minimizing a scalar function $f(x)$. How to minimize a vector function $\mathbf{f}(x)$, or multiple scalar functions $f_l(x)$, simultaneously?
- How can you ensure that your optimization program will always converge regardless of the test condition?
37 Homework
1. Describe the essence of Gauss-Newton, Levenberg-Marquardt, quasi-Newton, coordinate descent and alternating optimization, each in one sentence.
2. With the SVD of $J$ given by $J = U\Sigma V^\top$, show that $(J^\top J)^{-1} J^\top = V \Sigma^{-1} U^\top$. Hint: for matrices $A$ and $B$, $(AB)^{-1} = B^{-1} A^{-1}$.
3. Q3(c) of AY2015/16 Final Evaluation.
4. Q5 of AY2015/16 Final Evaluation.
38 References
1. J. Nocedal and S. J. Wright, Numerical Optimization, Springer-Verlag.
2. W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery, Numerical Recipes: The Art of Scientific Computing, 3rd Ed., Cambridge University Press.
3. J. C. Bezdek and R. J. Hathaway, Some notes on alternating optimization, in Proc. Int. Conf. on Fuzzy Systems.
More informationMatrix Derivatives and Descent Optimization Methods
Matrix Derivatives and Descent Optimization Methods 1 Qiang Ning Department of Electrical and Computer Engineering Beckman Institute for Advanced Science and Techonology University of Illinois at Urbana-Champaign
More informationNon-polynomial Least-squares fitting
Applied Math 205 Last time: piecewise polynomial interpolation, least-squares fitting Today: underdetermined least squares, nonlinear least squares Homework 1 (and subsequent homeworks) have several parts
More information1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0
Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =
More informationOutline. Scientific Computing: An Introductory Survey. Nonlinear Equations. Nonlinear Equations. Examples: Nonlinear Equations
Methods for Systems of Methods for Systems of Outline Scientific Computing: An Introductory Survey Chapter 5 1 Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign
More informationMath 409/509 (Spring 2011)
Math 409/509 (Spring 2011) Instructor: Emre Mengi Study Guide for Homework 2 This homework concerns the root-finding problem and line-search algorithms for unconstrained optimization. Please don t hesitate
More informationCLASS NOTES Computational Methods for Engineering Applications I Spring 2015
CLASS NOTES Computational Methods for Engineering Applications I Spring 2015 Petros Koumoutsakos Gerardo Tauriello (Last update: July 27, 2015) IMPORTANT DISCLAIMERS 1. REFERENCES: Much of the material
More informationmin f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;
Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many
More informationA Trust-region-based Sequential Quadratic Programming Algorithm
Downloaded from orbit.dtu.dk on: Oct 19, 2018 A Trust-region-based Sequential Quadratic Programming Algorithm Henriksen, Lars Christian; Poulsen, Niels Kjølstad Publication date: 2010 Document Version
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Gradient Descent, Newton-like Methods Mark Schmidt University of British Columbia Winter 2017 Admin Auditting/registration forms: Submit them in class/help-session/tutorial this
More informationIntroduction. New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems
New Nonsmooth Trust Region Method for Unconstraint Locally Lipschitz Optimization Problems Z. Akbari 1, R. Yousefpour 2, M. R. Peyghami 3 1 Department of Mathematics, K.N. Toosi University of Technology,
More informationE5295/5B5749 Convex optimization with engineering applications. Lecture 8. Smooth convex unconstrained and equality-constrained minimization
E5295/5B5749 Convex optimization with engineering applications Lecture 8 Smooth convex unconstrained and equality-constrained minimization A. Forsgren, KTH 1 Lecture 8 Convex optimization 2006/2007 Unconstrained
More informationLine Search Algorithms
Lab 1 Line Search Algorithms Investigate various Line-Search algorithms for numerical opti- Lab Objective: mization. Overview of Line Search Algorithms Imagine you are out hiking on a mountain, and you
More informationComparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming Problems
International Journal of Scientific and Research Publications, Volume 3, Issue 10, October 013 1 ISSN 50-3153 Comparative study of Optimization methods for Unconstrained Multivariable Nonlinear Programming
More informationScientific Computing: An Introductory Survey
Scientific Computing: An Introductory Survey Chapter 5 Nonlinear Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More information1 Newton s Method. Suppose we want to solve: x R. At x = x, f (x) can be approximated by:
Newton s Method Suppose we want to solve: (P:) min f (x) At x = x, f (x) can be approximated by: n x R. f (x) h(x) := f ( x)+ f ( x) T (x x)+ (x x) t H ( x)(x x), 2 which is the quadratic Taylor expansion
More informationStatic unconstrained optimization
Static unconstrained optimization 2 In unconstrained optimization an objective function is minimized without any additional restriction on the decision variables, i.e. min f(x) x X ad (2.) with X ad R
More informationImproved Damped Quasi-Newton Methods for Unconstrained Optimization
Improved Damped Quasi-Newton Methods for Unconstrained Optimization Mehiddin Al-Baali and Lucio Grandinetti August 2015 Abstract Recently, Al-Baali (2014) has extended the damped-technique in the modified
More informationWritten Examination
Division of Scientific Computing Department of Information Technology Uppsala University Optimization Written Examination 202-2-20 Time: 4:00-9:00 Allowed Tools: Pocket Calculator, one A4 paper with notes
More informationIterative Methods for Smooth Objective Functions
Optimization Iterative Methods for Smooth Objective Functions Quadratic Objective Functions Stationary Iterative Methods (first/second order) Steepest Descent Method Landweber/Projected Landweber Methods
More informationQuasi-Newton Methods
Quasi-Newton Methods Werner C. Rheinboldt These are excerpts of material relating to the boos [OR00 and [Rhe98 and of write-ups prepared for courses held at the University of Pittsburgh. Some further references
More informationOutline. Scientific Computing: An Introductory Survey. Optimization. Optimization Problems. Examples: Optimization Problems
Outline Scientific Computing: An Introductory Survey Chapter 6 Optimization 1 Prof. Michael. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002. Reproduction
More informationSolving linear equations with Gaussian Elimination (I)
Term Projects Solving linear equations with Gaussian Elimination The QR Algorithm for Symmetric Eigenvalue Problem The QR Algorithm for The SVD Quasi-Newton Methods Solving linear equations with Gaussian
More informationMotivation: We have already seen an example of a system of nonlinear equations when we studied Gaussian integration (p.8 of integration notes)
AMSC/CMSC 460 Computational Methods, Fall 2007 UNIT 5: Nonlinear Equations Dianne P. O Leary c 2001, 2002, 2007 Solving Nonlinear Equations and Optimization Problems Read Chapter 8. Skip Section 8.1.1.
More information(One Dimension) Problem: for a function f(x), find x 0 such that f(x 0 ) = 0. f(x)
Solving Nonlinear Equations & Optimization One Dimension Problem: or a unction, ind 0 such that 0 = 0. 0 One Root: The Bisection Method This one s guaranteed to converge at least to a singularity, i not
More informationNonlinear Optimization for Optimal Control
Nonlinear Optimization for Optimal Control Pieter Abbeel UC Berkeley EECS Many slides and figures adapted from Stephen Boyd [optional] Boyd and Vandenberghe, Convex Optimization, Chapters 9 11 [optional]
More informationAM 205: lecture 19. Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods
AM 205: lecture 19 Last time: Conditions for optimality, Newton s method for optimization Today: survey of optimization methods Quasi-Newton Methods General form of quasi-newton methods: x k+1 = x k α
More informationSECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS
SECTION: CONTINUOUS OPTIMISATION LECTURE 4: QUASI-NEWTON METHODS HONOUR SCHOOL OF MATHEMATICS, OXFORD UNIVERSITY HILARY TERM 2005, DR RAPHAEL HAUSER 1. The Quasi-Newton Idea. In this lecture we will discuss
More informationQuasi-Newton Methods. Javier Peña Convex Optimization /36-725
Quasi-Newton Methods Javier Peña Convex Optimization 10-725/36-725 Last time: primal-dual interior-point methods Consider the problem min x subject to f(x) Ax = b h(x) 0 Assume f, h 1,..., h m are convex
More informationMarch 8, 2010 MATH 408 FINAL EXAM SAMPLE
March 8, 200 MATH 408 FINAL EXAM SAMPLE EXAM OUTLINE The final exam for this course takes place in the regular course classroom (MEB 238) on Monday, March 2, 8:30-0:20 am. You may bring two-sided 8 page
More informationUniversity of Maryland at College Park. limited amount of computer memory, thereby allowing problems with a very large number
Limited-Memory Matrix Methods with Applications 1 Tamara Gibson Kolda 2 Applied Mathematics Program University of Maryland at College Park Abstract. The focus of this dissertation is on matrix decompositions
More informationLBFGS. John Langford, Large Scale Machine Learning Class, February 5. (post presentation version)
LBFGS John Langford, Large Scale Machine Learning Class, February 5 (post presentation version) We are still doing Linear Learning Features: a vector x R n Label: y R Goal: Learn w R n such that ŷ w (x)
More informationSearch Directions for Unconstrained Optimization
8 CHAPTER 8 Search Directions for Unconstrained Optimization In this chapter we study the choice of search directions used in our basic updating scheme x +1 = x + t d. for solving P min f(x). x R n All
More information