Stochastic Quasi-Newton Methods


1 Stochastic Quasi-Newton Methods. Donald Goldfarb, Department of IEOR, Columbia University. UCLA Distinguished Lecture Series, May 17-19.

2 Outline.
Stochastic approximation: stochastic gradient descent and variance reduction techniques.
Newton-like and quasi-Newton methods for convex stochastic optimization problems using limited memory block BFGS updates.
Numerical results on problems from machine learning.
Quasi-Newton methods for nonconvex stochastic optimization problems using damped and modified limited memory BFGS updates.

3 Stochastic optimization.
min f(x) = E[f(x, ξ)], where ξ is a random variable;
or a finite sum (with f_i(x) ≡ f(x, ξ_i) for i = 1, ..., n and very large n): min f(x) = (1/n) Σ_{i=1}^n f_i(x).
f and ∇f are very expensive to evaluate; stochastic gradient descent (SGD) methods choose a random subset S ⊂ [n] and evaluate
f_S(x) = (1/|S|) Σ_{i∈S} f_i(x) and ∇f_S(x) = (1/|S|) Σ_{i∈S} ∇f_i(x).
Essentially, only noisy information about f, ∇f and ∇²f is available.
Challenge: how to smooth the variability of stochastic methods.
Challenge: how to design methods that take advantage of noisy 2nd-order information?
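
For concreteness, a minimal sketch of evaluating the subsampled f_S and ∇f_S, assuming per-example callables fi(i, x) and grad_fi(i, x) (hypothetical names, not from the slides):

```python
import numpy as np

def subsampled_value_and_grad(x, fi, grad_fi, n, batch_size, rng):
    """Evaluate f_S(x) and grad f_S(x) over a random subset S of [n]."""
    S = rng.choice(n, size=batch_size, replace=False)
    f_S = np.mean([fi(i, x) for i in S])
    g_S = np.mean([grad_fi(i, x) for i in S], axis=0)
    return f_S, g_S
```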

4 Stochastic optimization. [Figure: iterates of a deterministic gradient method vs. a stochastic gradient method.]

5 Stochastic Variance Reduced Gradients. Stochastic methods converge slowly near the optimum due to the variance of the gradient estimates ∇f_S(x), hence requiring a decreasing step size. We use the control variates approach of Johnson and Zhang (2013) for an SGD method: SVRG. It uses d = ∇f_S(x_t) − ∇f_S(w_k) + ∇f(w_k), where w_k is a reference point, in place of ∇f_S(x_t). w_k, and the full gradient, are computed after each full pass over the data, hence doubling the work of computing stochastic gradients.
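
A minimal sketch of the SVRG scheme just described, assuming a per-example gradient callable grad_fi(i, x) (hypothetical name); the step size and inner-loop length below are illustrative choices, not values from the slides:

```python
import numpy as np

def svrg(x0, grad_fi, n, eta=0.1, num_epochs=10, m=None, batch_size=1, rng=None):
    """Sketch of SVRG: variance-reduced SGD with reference point w_k."""
    rng = rng or np.random.default_rng(0)
    m = m or n                      # one inner pass per reference point by default
    x = x0.copy()
    for _ in range(num_epochs):
        w = x.copy()                                                    # reference point w_k
        full_grad = np.mean([grad_fi(i, w) for i in range(n)], axis=0)  # full gradient at w_k
        for _ in range(m):
            S = rng.integers(0, n, size=batch_size)                     # random subset S
            g_x = np.mean([grad_fi(i, x) for i in S], axis=0)           # grad f_S(x_t)
            g_w = np.mean([grad_fi(i, w) for i in S], axis=0)           # grad f_S(w_k)
            d = g_x - g_w + full_grad                                   # control-variate direction
            x -= eta * d
    return x
```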

6 Stochastic Average Gradient. At iteration t:
sample i from {1, ..., N};
update y_i^{t+1} = ∇f_i(x_t) and y_j^{t+1} = y_j^t for all j ≠ i;
compute g^{t+1} = (1/N) Σ_{j=1}^N y_j^{t+1};
set x_{t+1} = x_t − α_{t+1} g^{t+1}.
Provable linear convergence in expectation. Other SGD variance reduction techniques have recently been proposed, including SAGA, SDCA and S2GD.
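
A minimal sketch of the SAG iteration above, again assuming a per-example gradient callable (hypothetical name); the stored-gradient table y and running average g follow the update rule on the slide:

```python
import numpy as np

def sag(x0, grad_fi, n, alpha=0.01, num_iters=1000, rng=None):
    """Sketch of SAG: keep a table y of the most recent gradient of each f_i."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    y = np.zeros((n, x0.size))          # stored per-example gradients y_j
    g = np.zeros_like(x0)               # running average (1/N) sum_j y_j
    for _ in range(num_iters):
        i = rng.integers(n)
        new_grad = grad_fi(i, x)
        g += (new_grad - y[i]) / n      # update the average without re-summing
        y[i] = new_grad                 # overwrite the stored gradient for example i
        x -= alpha * g
    return x
```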

7 Quasi-Newton Method for min f(x), f ∈ C¹.
Gradient method: x_{k+1} = x_k − α_k ∇f(x_k).
Newton's method: x_{k+1} = x_k − α_k [∇²f(x_k)]^{-1} ∇f(x_k).
Quasi-Newton method: x_{k+1} = x_k − α_k B_k^{-1} ∇f(x_k), where B_k ≻ 0 approximates the Hessian matrix.
Update so that B_{k+1} s_k = y_k (secant equation), where s_k = x_{k+1} − x_k = α_k d_k and y_k = ∇f_{k+1} − ∇f_k.

8 BFGS. BFGS quasi-Newton method:
B_{k+1} = B_k + (y_k y_k^T)/(s_k^T y_k) − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k),
where s_k := x_{k+1} − x_k and y_k := ∇f(x_{k+1}) − ∇f(x_k).
B_{k+1} ≻ 0 if B_k ≻ 0 and s_k^T y_k > 0 (curvature condition).
The secant equation has a solution if s_k^T y_k > 0.
When f is strongly convex, s_k^T y_k > 0 holds automatically.
If f is nonconvex, use a line search to guarantee s_k^T y_k > 0.
Inverse form: H_{k+1} = (I − s_k y_k^T / (s_k^T y_k)) H_k (I − y_k s_k^T / (s_k^T y_k)) + s_k s_k^T / (s_k^T y_k).
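
A minimal sketch of the inverse-Hessian form of the BFGS update written above; skipping the update when the curvature condition fails is one common safeguard (an assumption here, not stated on the slide):

```python
import numpy as np

def bfgs_inverse_update(H, s, y, curvature_tol=1e-10):
    """One BFGS update of the inverse-Hessian approximation H from the pair (s, y).

    Implements H_{k+1} = (I - rho*s*y^T) H_k (I - rho*y*s^T) + rho*s*s^T with rho = 1/(s^T y).
    """
    sy = float(s @ y)
    if sy <= curvature_tol:          # curvature condition s^T y > 0 violated
        return H                     # keep the previous approximation
    rho = 1.0 / sy
    I = np.eye(H.shape[0])
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```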

9 Prior work on Quasi-Newton Methods for Stochastic Optimization.
P1: N.N. Schraudolph, J. Yu and S. Günter. A stochastic quasi-Newton method for online convex optim. Int'l. Conf. AI & Stat., 2007. Modifies the BFGS and L-BFGS updates by reducing the step s_k and the last term in the update of H_k; uses step size α_k = β/k for small β > 0.
P2: A. Bordes, L. Bottou and P. Gallinari. SGD-QN: Careful quasi-Newton stochastic gradient descent. JMLR vol. 10, 2009. Uses a diagonal matrix approximation to [∇²f(·)]^{-1} which is updated (hence the name SGD-QN) on each iteration; α_k = 1/(k + α).

10 Prior work on Quasi-Newton Methods for Stochastic Optimization.
P3: A. Mokhtari and A. Ribeiro. RES: Regularized stochastic BFGS algorithm. IEEE Trans. Signal Process., no. 10. Replaces y_k by y_k − δ s_k for some δ > 0 in the BFGS update and also adds δI to the update; uses α_k = β/k; converges in expectation at a sub-linear rate E[f(x_k) − f*] ≤ C/k.
P4: A. Mokhtari and A. Ribeiro. Global convergence of online limited memory BFGS. To appear in J. Mach. Learn. Res. Uses L-BFGS without regularization and α_k = β/k; converges in expectation at a sub-linear rate E[f(x_k) − f*] ≤ C/k.

11 Prior work on Quasi-Newton Methods for Stochastic Optimization.
P5: R.H. Byrd, S.L. Hansen, J. Nocedal and Y. Singer. A stochastic quasi-Newton method for large-scale optim. arXiv v2, 2015. Averages iterates over L steps keeping H_k fixed; uses the averaged iterates to update H_k, with a subsampled Hessian used to compute y_k; α_k = β/k; converges in expectation at a sub-linear rate E[f(x_k) − f*] ≤ C/k.
P6: P. Moritz, R. Nishihara and M.I. Jordan. A linearly-convergent stochastic L-BFGS algorithm, 2015, arXiv v1. Combines [P5] with SVRG; uses a fixed step size α; converges in expectation at a linear rate.

12 Using Stochastic 2nd-order information. Assumption: f(x) = (1/n) Σ_{i=1}^n f_i(x) is strongly convex and twice continuously differentiable. Choose (compute) a sketching matrix S_k (the columns of S_k are a set of directions). We do not use differences of noisy gradients to estimate curvature, but rather compute the action of the sub-sampled Hessian on S_k, i.e., compute Y_k = (1/|T|) Σ_{i∈T} ∇²f_i(x) S_k, where T ⊂ [n].

13 Example of Hessian-Vector Computation. In a binary classification problem, the sample function (logistic loss) is
f(w; x_i, z_i) = −z_i log(c(w; x_i)) − (1 − z_i) log(1 − c(w; x_i)),
where c(w; x_i) = 1/(1 + exp(−x_i^T w)), x_i ∈ R^n, w ∈ R^n, z_i ∈ {0, 1}.
Gradient: ∇f(w; x_i, z_i) = (c(w; x_i) − z_i) x_i.
Action of the Hessian on s: ∇²f(w; x_i, z_i) s = c(w; x_i)(1 − c(w; x_i)) (x_i^T s) x_i.
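
A small sketch of these per-example formulas, assuming a feature vector x and a label z in {0, 1} as on the slide; the last helper applies the same Hessian action to a d-by-q sketch S, as used to form Y_k on the previous slide (names are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_grad(w, x, z):
    """Gradient of the per-example logistic loss: (c(w; x) - z) * x."""
    c = sigmoid(x @ w)
    return (c - z) * x

def logistic_hess_vec(w, x, s):
    """Action of the per-example Hessian on s: c(1 - c) (x^T s) x."""
    c = sigmoid(x @ w)
    return c * (1.0 - c) * (x @ s) * x

def logistic_hess_sketch(w, x, S):
    """Action of the per-example Hessian on a d-by-q sketching matrix S."""
    c = sigmoid(x @ w)
    return c * (1.0 - c) * np.outer(x, x @ S)
```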

14 Block BFGS. The block BFGS method computes a least-change update to the current approximation H_k of the inverse Hessian [∇²f(x)]^{-1} at the current point x, by solving
min ||H − H_k||  s.t.  H = H^T,  H Y_k = S_k,
where ||A|| = ||(∇²f(x_k))^{1/2} A (∇²f(x_k))^{1/2}||_F (F = Frobenius norm).
This gives the updating formula (analogous to the updates derived by Broyden, Fletcher, Goldfarb and Shanno, 1970):
H_{k+1} = (I − S_k [S_k^T Y_k]^{-1} Y_k^T) H_k (I − Y_k [S_k^T Y_k]^{-1} S_k^T) + S_k [S_k^T Y_k]^{-1} S_k^T,
or, by the Sherman-Morrison-Woodbury formula:
B_{k+1} = B_k − B_k S_k [S_k^T B_k S_k]^{-1} S_k^T B_k + Y_k [S_k^T Y_k]^{-1} Y_k^T.
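
A minimal sketch of the block update of H written above, assuming S holds the sketch directions and Y the subsampled-Hessian action on S; names are illustrative:

```python
import numpy as np

def block_bfgs_update(H, S, Y):
    """One block BFGS update of the inverse-Hessian approximation H.

    H_{k+1} = (I - S L Y^T) H (I - Y L S^T) + S L S^T with L = (S^T Y)^{-1}.
    """
    d = H.shape[0]
    L = np.linalg.inv(S.T @ Y)      # small q-by-q inverse, q = number of sketch columns
    V = np.eye(d) - S @ L @ Y.T
    return V @ H @ V.T + S @ L @ S.T
```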

15 Limited Memory Block BFGS. After M block BFGS steps starting from H_{k+1−M}, one can express H_{k+1} as
H_{k+1} = V_k H_k V_k^T + S_k Λ_k S_k^T
        = V_k V_{k−1} H_{k−1} V_{k−1}^T V_k^T + V_k S_{k−1} Λ_{k−1} S_{k−1}^T V_k^T + S_k Λ_k S_k^T
        = V_{k:k+1−M} H_{k+1−M} V_{k:k+1−M}^T + Σ_{i=k+1−M}^{k} V_{k:i+1} S_i Λ_i S_i^T V_{k:i+1}^T,
where V_k = I − S_k Λ_k Y_k^T, Λ_k = (S_k^T Y_k)^{-1} and V_{k:i} = V_k ⋯ V_i.

16 Limited Memory Block BFGS Intuition. Hence, when the number of variables d is large, instead of storing the d × d matrix H_k, we store the previous M block curvature triples (S_{k+1−M}, Y_{k+1−M}, Λ_{k+1−M}), ..., (S_k, Y_k, Λ_k). Then, analogously to the standard L-BFGS method, for any vector v ∈ R^d, H_k v can be computed efficiently using a two-loop block recursion (in O(Mp(d + p) + p³) operations) if all S_i ∈ R^{d×p}. The limited memory, least-change aspect of BFGS is important. Each block update acts like a sketching procedure.

17 Two-Loop Recursion. [Algorithm box omitted from transcription: block two-loop recursion for computing H_k v from the stored triples (S_i, Y_i, Λ_i).]
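
Since the slide's algorithm box did not survive transcription, here is a block two-loop recursion consistent with the update H_{k+1} = V_k H_k V_k^T + S_k Λ_k S_k^T above; it is a sketch reconstructed from that formula, not the authors' exact pseudocode. Memory is assumed to be a list of (S_i, Y_i, Lambda_i) triples, oldest first:

```python
import numpy as np

def block_two_loop(v, memory, H0_scale=1.0):
    """Compute H_k @ v from stored block triples (S_i, Y_i, Lambda_i), oldest first.

    Uses H_{i+1} v = V_i (H_i (V_i^T v)) + S_i Lambda_i S_i^T v with V_i = I - S_i Lambda_i Y_i^T.
    """
    alphas = []
    # First loop (newest to oldest): apply V_i^T and record alpha_i = Lambda_i S_i^T v.
    for S, Y, Lam in reversed(memory):
        alpha = Lam @ (S.T @ v)
        alphas.append(alpha)
        v = v - Y @ alpha
    r = H0_scale * v                    # apply the initial approximation H_{k+1-M}
    # Second loop (oldest to newest): apply V_i and add back the S_i alpha_i term.
    for (S, Y, Lam), alpha in zip(memory, reversed(alphas)):
        beta = Lam @ (Y.T @ r)
        r = r + S @ (alpha - beta)
    return r
```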

18 Choices for the Sketching Matrix S_k. We employ one of the following strategies.
Gaussian: S_k ∼ N(0, I) has Gaussian entries sampled i.i.d. at each iteration.
Previous search directions, delayed: store the previous L search directions S_k = [s_{k+1−L}, ..., s_k], then update H_k only once every L iterations.
Self-conditioning: sample the columns of the Cholesky factor L_k of H_k (i.e., L_k L_k^T = H_k) uniformly at random. Fortunately, we can maintain and update L_k efficiently with limited memory.
S_k is a sketching matrix in the sense that we are sketching the (possibly very large) equation ∇²f(x) H = I, whose solution is the inverse Hessian. Right-multiplying by S_k compresses/sketches the equation, yielding ∇²f(x) H S_k = S_k.
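
Minimal sketches of the three choices, assuming NumPy; the helper names and the exact sampling details are illustrative:

```python
import numpy as np

def gaussian_sketch(d, p, rng):
    """Fresh i.i.d. Gaussian sketch S_k of size d-by-p."""
    return rng.standard_normal((d, p))

def previous_directions_sketch(recent_directions):
    """Stack the last L search directions s_i as the columns of S_k."""
    return np.column_stack(recent_directions)

def self_conditioning_sketch(L_chol, p, rng):
    """Sample p columns of a Cholesky factor L (L @ L.T = H_k) uniformly at random."""
    cols = rng.choice(L_chol.shape[1], size=p, replace=False)
    return L_chol[:, cols]
```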

19 The Basic Algorithm. [Algorithm box omitted from transcription.]
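
The algorithm box was likewise lost in transcription. The sketch below is only a schematic reading of the surrounding slides (SVRG-style variance-reduced gradients combined with the limited-memory block BFGS metric, with curvature refreshed from the sketched subsampled Hessian); it reuses the block_two_loop sketch above, assumes a helper hess_sketch(T, x, S) returning (1/|T|) Σ_{i∈T} ∇²f_i(x) S, and is not the authors' exact algorithm:

```python
import numpy as np

def stochastic_block_bfgs(x0, grad_fi, hess_sketch, n, eta, num_epochs, m, p, L, rng):
    """Schematic outer/inner loop: variance-reduced gradient d, step along -H_t d."""
    x = x0.copy()
    memory = []                                          # list of (S_i, Y_i, Lambda_i)
    for _ in range(num_epochs):
        w = x.copy()
        full_grad = np.mean([grad_fi(i, w) for i in range(n)], axis=0)
        for t in range(m):
            S_idx = rng.integers(0, n, size=p)           # gradient minibatch
            d = (np.mean([grad_fi(i, x) for i in S_idx], axis=0)
                 - np.mean([grad_fi(i, w) for i in S_idx], axis=0) + full_grad)
            x = x - eta * block_two_loop(d.copy(), memory)   # metric step
            if t % L == 0:                               # refresh curvature every L steps
                S_k = rng.standard_normal((x.size, p))   # Gaussian sketch
                T = rng.integers(0, n, size=p)           # Hessian subsample
                Y_k = hess_sketch(T, x, S_k)
                Lam = np.linalg.inv(S_k.T @ Y_k)
                memory = (memory + [(S_k, Y_k, Lam)])[-5:]   # keep last M = 5 triples (illustrative)
    return x
```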

20 Convergence - Assumptions. There exist constants λ, Λ ∈ R_+ such that
f is λ-strongly convex: f(w) ≥ f(x) + ∇f(x)^T (w − x) + (λ/2) ||w − x||₂²;
f is Λ-smooth: f(w) ≤ f(x) + ∇f(x)^T (w − x) + (Λ/2) ||w − x||₂².
These assumptions imply that λI ⪯ ∇²f_S(x) ⪯ ΛI for all x ∈ R^d and S ⊆ [n], from which we can prove that there exist constants γ, Γ ∈ R_+ such that γI ⪯ H_k ⪯ ΓI for all k.

21 Bounds on Spectrum of H_k. Lemma. Assume there exist 0 < λ < Λ such that λI ⪯ ∇²f_T(x) ⪯ ΛI for all x ∈ R^d and T ⊆ [n]. Then γI ⪯ H_k ⪯ ΓI, where
1/(1 + MΛ) ≤ γ,   Γ ≤ (1 + √κ)^{2M} (1 + 1/(λ(2√κ + κ))),   κ = Λ/λ.
The bounds in MNJ depend on the problem dimension d: 1/((d + M)Λ) ≤ γ and Γ ≤ [(d + M)Λ]^{d+M−1} / λ^{d+M}, i.e., on the order of (dκ)^{d+M}.

22 Linear Convergence. Theorem. Suppose that the Assumptions hold and let w* be the unique minimizer of f(w). Then in our Algorithm, for all k ≥ 0,
E[f(w_k)] − f(w*) ≤ ρ^k (E[f(w_0)] − f(w*)),
where the convergence rate is
ρ = ( 1/(2mη) + η Γ² Λ(Λ − λ) ) / ( γλ − η Γ² Λ² ) < 1,
assuming we have chosen η < γλ/(2Γ²Λ²) and that we choose m large enough to satisfy
m ≥ 1 / ( 2η (γλ − η Γ² Λ(2Λ − λ)) ),
which is a positive lower bound given our restriction on η.

23 Numerical Experiments. Empirical risk minimization test problems: logistic loss with an ℓ₂ regularizer,
min_w Σ_{i=1}^n log(1 + exp(−y_i ⟨a_i, w⟩)) + L ||w||₂²,
given data A = [a_1, a_2, ..., a_n] ∈ R^{d×n} and y ∈ {0, 1}^n.
For each method, we chose the step size η ∈ {1, .5, .1, .05, ..., 10^{-8}} that gave the best results.
The full gradient was computed after each full data pass. The vertical axis in the figures below is log(relative error).

24 gisette-scale (d = 5,000, n = 6,000). [Figure: relative error vs. time (s) for gauss_18_m_5, prev_18_m_5, fact_18_m_3, MNJ_bH_330, SVRG.]

25 covtype-libsvm-binary (d = 54, n = 581,012). [Figure: relative error vs. time (s) for gauss_8_m_5, prev_8_m_5, fact_8_m_3, MNJ_bH_3815, SVRG.]

26 Higgs (d = 28, n = 11,000,000). [Figure: relative error vs. time (s) for gauss_4_m_5, prev_4_m_5, fact_4_m_3, MNJ_bH_16585, SVRG.]

27 SUSY (d = 18, n = 3,548,466). [Figure: relative error vs. time (s) for gauss_5_m_5, prev_5_m_5, fact_5_m_3, MNJ_bH_9420, SVRG.]

28 epsilon-normalized (d = 2,000, n = 400,000). [Figure: relative error vs. time (s) for gauss_45_m_5, prev_45_m_5, fact_45_m_3, MNJ_bH_3165, SVRG.]

29 rcv1-training (d = 47,236, n = 20,242). [Figure: relative error vs. time (s) for gauss_10_m_5, prev_10_m_5, fact_10_m_3, MNJ_bH_715, SVRG.]

30 url-combined (d = 3,231,961, n = 2,396,130). [Figure: relative error vs. time (s) for gauss_2_m_5, prev_2_m_5, fact_2_m_3, MNJ_bH_7740, SVRG.]

31 Contributions.
New metric learning framework: a block BFGS framework for gradually learning the metric of the underlying function using a sketched form of the subsampled Hessian matrix.
New limited memory block BFGS method; may also be of interest for non-stochastic optimization.
Several sketching matrix possibilities.
More reasonable bounds on the eigenvalues of H_k, hence more reasonable conditions on the step size.

32 Nonconvex stochastic optimization. Most stochastic quasi-Newton optimization methods are for strongly convex problems; this is needed to ensure a curvature condition required for the positive definiteness of B_k (H_k). This is not possible for problems min f(x) ≡ E[F(x, ξ)] where f is nonconvex. In the deterministic setting, one can do a line search to guarantee the curvature condition, and hence the positive definiteness of B_k (H_k); line search is not possible for stochastic optimization. To address these issues we develop a stochastic damped and a stochastic modified L-BFGS method.

33 Stochastic Damped BFGS (Wang, Ma, G., Liu, 2015). Let
y_k = (1/m) Σ_{i=1}^m ( ∇f(x_{k+1}, ξ_{k,i}) − ∇f(x_k, ξ_{k,i}) )
and define ȳ_k = θ_k y_k + (1 − θ_k) B_k s_k, where
θ_k = 1, if s_k^T y_k ≥ 0.25 s_k^T B_k s_k;
θ_k = (0.75 s_k^T B_k s_k) / (s_k^T B_k s_k − s_k^T y_k), if s_k^T y_k < 0.25 s_k^T B_k s_k.
Update H_k (replace y_k by ȳ_k):
H_{k+1} = (I − ρ_k s_k ȳ_k^T) H_k (I − ρ_k ȳ_k s_k^T) + ρ_k s_k s_k^T, where ρ_k = 1/(s_k^T ȳ_k).
Implemented in a limited memory version. Work in progress: combine with variance-reduced stochastic gradients (SVRG).
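
A sketch of the damping rule above (compute θ_k, form ȳ_k, then do the usual inverse update with ȳ_k in place of y_k); this full-matrix illustration is not the authors' limited-memory implementation:

```python
import numpy as np

def damped_pair(s, y, B):
    """Return y_bar = theta*y + (1-theta)*B s using the damping rule on the slide."""
    sBs = float(s @ B @ s)
    sy = float(s @ y)
    if sy >= 0.25 * sBs:
        theta = 1.0
    else:
        theta = 0.75 * sBs / (sBs - sy)
    return theta * y + (1.0 - theta) * (B @ s)

def damped_bfgs_inverse_update(H, B, s, y):
    """BFGS update of H using the damped pair (s, y_bar); s^T y_bar > 0 whenever B is PD."""
    y_bar = damped_pair(s, y, B)
    rho = 1.0 / float(s @ y_bar)
    I = np.eye(H.shape[0])
    V = I - rho * np.outer(s, y_bar)
    return V @ H @ V.T + rho * np.outer(s, s)
```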

34 Convergence of Stochastic Damped BFGS Method. Assumptions:
[AS1] f ∈ C¹, bounded below, and ∇f is L-Lipschitz continuous.
[AS2] For any iteration k, the stochastic gradient satisfies E_{ξ_k}[∇f(x_k, ξ_k)] = ∇f(x_k) and E_{ξ_k}[ ||∇f(x_k, ξ_k) − ∇f(x_k)||² ] ≤ σ².
Theorem (Global convergence): Assume AS1-AS2 hold (and α_k = β/k ≤ γ/(LΓ²) for all k). Then there exist positive constants γ, Γ such that γI ⪯ H_k ⪯ ΓI for all k, and lim inf_{k→∞} ||∇f(x_k)|| = 0 with probability 1.
Under the additional assumption E_{ξ_k}[ ||∇f(x_k, ξ_k)||² ] ≤ M, lim_{k→∞} ||∇f(x_k)|| = 0 with probability 1.
We do not need to assume convexity of f.

35 Block L-BFGS Method for Non-Convex Stochastic Optimization. Block update:
H_{k+1} = (I − S_k Λ_k^{-1} Y_k^T) H_k (I − Y_k Λ_k^{-1} S_k^T) + S_k Λ_k^{-1} S_k^T,
where Λ_k = S_k^T Y_k = S_k^T ∇²f(x_k) S_k.
In the non-convex case Λ_k may not be positive definite. Λ_k ⊁ 0 is discovered while computing the LDLᵀ (Cholesky-type) factorization of Λ_k. If during the factorization d_j ≥ δ or |(LD^{1/2})_{ij}| ≤ β is not satisfied, d_j is increased by τ_j, i.e., (Λ_k)_{jj} ← (Λ_k)_{jj} + τ_j. This has the effect of moving the search direction −H_{k+1} ∇f(x_{k+1}) toward one of negative curvature.
A modification based on Gershgorin discs is also possible.
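
A sketch of a modified LDLᵀ factorization of the kind described above: diagonal entries are increased during the factorization whenever the bounds d_j ≥ δ or |(L D^{1/2})_{ij}| ≤ β would be violated. The thresholds delta and beta are illustrative, and this is a generic modified-Cholesky sketch (Gill-Murray style), not the authors' exact rule for choosing τ_j:

```python
import numpy as np

def modified_ldlt(A, delta=1e-8, beta=1e2):
    """LDL^T factorization of symmetric A, shifting diagonal entries as needed.

    Enforces d_j >= delta and |(L D^{1/2})_{ij}| <= beta by adding tau_j to A_jj,
    so L diag(d) L^T = A + diag(tau) is positive definite even if A is not.
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    tau = np.zeros(n)                       # diagonal shifts actually applied
    for j in range(n):
        c_jj = A[j, j] - L[j, :j] @ (d[:j] * L[j, :j])
        c_col = A[j+1:, j] - L[j+1:, :j] @ (d[:j] * L[j, :j])   # unscaled column j
        theta = np.max(np.abs(c_col)) if j < n - 1 else 0.0
        d[j] = max(abs(c_jj), (theta / beta) ** 2, delta)       # enforce both bounds
        tau[j] = d[j] - c_jj                                    # (Lambda_k)_jj increase
        if j < n - 1:
            L[j+1:, j] = c_col / d[j]
    return L, d, tau
```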
