Lecture 3: Huge-scale optimization problems


1 Liege University: Francqui Chair
Lecture 3: Huge-scale optimization problems
Yurii Nesterov, CORE/INMA (UCL)
March 9, 2012

2 Outline
1. Problem sizes
2. Random coordinate search
3. Confidence level of solutions
4. Sparse optimization problems
5. Sparse updates for linear operators
6. Fast updates in computational trees
7. Simple subgradient methods
8. Application examples

3 Nonlinear Optimization: problem sizes

Class        Operations   Dimension        Iter. cost    Memory
Small-size   All          10^0 - 10^2      n^4 -> n^3    Kilobyte: 10^3
Medium-size  A^{-1}       10^3 - 10^4      n^3 -> n^2    Megabyte: 10^6
Large-scale  Ax           10^5 - 10^7      n^2 -> n      Gigabyte: 10^9
Huge-scale   x + y        10^8 - 10^12     n -> log n    Terabyte: 10^12

Sources of huge-scale problems:
- Internet (new)
- Telecommunications (new)
- Finite-element schemes (old)
- Partial differential equations (old)

4 Very old optimization idea: Coordinate Search

Problem: $\min_{x \in \mathbb{R}^n} f(x)$ (f is convex and differentiable).

Coordinate relaxation algorithm: for $k \geq 0$ iterate
1. Choose the active coordinate $i_k$.
2. Update $x_{k+1} = x_k - h_k \nabla_{i_k} f(x_k) e_{i_k}$, ensuring $f(x_{k+1}) \leq f(x_k)$.

($e_i$ is the $i$th coordinate vector in $\mathbb{R}^n$.)

Main advantage: very simple implementation.
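As an illustration only (not part of the slides), here is a minimal Python sketch of this coordinate relaxation scheme. The callables f and grad, the constant L, and the random coordinate choice are assumptions; the step $h_k = 1/L$ anticipates the next slide, and a practical implementation would evaluate only the single partial derivative it needs.

```python
import numpy as np

def coordinate_relaxation(f, grad, x0, L, n_iters=1000, rng=None):
    """Coordinate relaxation: change one coordinate of x per iteration.

    f, grad : objective and its full gradient (illustrative; in practice one
              would evaluate only the partial derivative of the chosen coordinate).
    L       : Lipschitz constant of the gradient, used as the step h_k = 1/L.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        i = rng.integers(x.size)           # strategy 2 of the next slide: random i_k
        x_new = x.copy()
        x_new[i] -= grad(x)[i] / L         # x_{k+1} = x_k - h_k * grad_i f(x_k) * e_i
        if f(x_new) <= f(x):               # keep the monotonicity f(x_{k+1}) <= f(x_k)
            x = x_new
    return x
```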

5 Possible strategies

1. Cyclic moves. (Difficult to analyze.)
2. Random choice of coordinate. (Why?)
3. Choose the coordinate with the maximal directional derivative.

Complexity estimate: assume $\|\nabla f(x) - \nabla f(y)\| \leq L \|x - y\|$ for all $x, y \in \mathbb{R}^n$, and choose $h_k = \frac{1}{L}$. Then

$f(x_k) - f(x_{k+1}) \geq \frac{1}{2L} (\nabla_{i_k} f(x_k))^2 \geq \frac{1}{2nL} \|\nabla f(x_k)\|^2 \geq \frac{1}{2nLR^2} (f(x_k) - f^*)^2$

(here $R$ bounds the distance from the initial level set to the optimum). Hence, $f(x_k) - f^* \leq \frac{2nLR^2}{k}$, $k \geq 1$. (For the gradient method, drop the factor $n$.)

This is the only known theoretical result for CDM!

6 Criticism

Theoretical justification: complexity bounds are not known for most of the schemes. The only justified scheme needs computation of the whole gradient. (Why not use GM then?)

Computational complexity: fast differentiation: if the function is defined by a sequence of operations, then $C(\nabla f) \leq 4\, C(f)$. Can we do anything without computing the function values?

Result: CDMs have almost disappeared from computational practice.

7 Google problem

Let $E \in \mathbb{R}^{n \times n}$ be an incidence matrix of a graph. Denote $e = (1, \dots, 1)^T$ and $\bar{E} = E \,\mathrm{diag}(E^T e)^{-1}$. Thus, $\bar{E}^T e = e$.

Our problem is as follows: find $x^* \geq 0$ such that $\bar{E} x^* = x^*$.

Optimization formulation:

$f(x) \stackrel{\mathrm{def}}{=} \tfrac{1}{2} \|\bar{E} x - x\|^2 + \tfrac{\gamma}{2} [\langle e, x \rangle - 1]^2 \;\to\; \min_{x \in \mathbb{R}^n}.$
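A hypothetical sketch of how $\bar{E}$ and the objective $f$ could be assembled with SciPy sparse matrices, assuming every node of the graph has at least one outgoing link; the function name google_objective and the default value of $\gamma$ are illustrative, not from the lecture.

```python
import numpy as np
import scipy.sparse as sp

def google_objective(E, gamma=1.0):
    """Build E_bar = E diag(E^T e)^{-1} and
    f(x) = 0.5*||E_bar x - x||^2 + 0.5*gamma*(<e, x> - 1)^2
    from a sparse incidence matrix E (every column must have a nonzero)."""
    out_deg = np.asarray(E.sum(axis=0)).ravel()   # column sums, i.e. E^T e
    E_bar = E @ sp.diags(1.0 / out_deg)           # now each column of E_bar sums to 1
    def f(x):
        r = E_bar @ x - x
        return 0.5 * (r @ r) + 0.5 * gamma * (x.sum() - 1.0) ** 2
    return E_bar, f
```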

8 Huge-scale problems

Main features:
- The size is very big ($n \geq 10^7$).
- The data is distributed in space.
- The requested parts of the data are not always available.
- The data is changing in time.

Consequences: the simplest operations are expensive or infeasible:
- Update of the full vector of variables.
- Matrix-vector multiplication.
- Computation of the objective function's value, etc.

9 Structure of the Google Problem

Let us look at the gradient of the objective:

$\nabla_i f(x) = \langle a_i, g(x) \rangle + \gamma [\langle e, x \rangle - 1], \quad i = 1, \dots, n,$

where $g(x) = \bar{E} x - x \in \mathbb{R}^n$ and $\bar{E} = (a_1, \dots, a_n)$.

Main observations:
- The coordinate move $x_+ = x - h_i \nabla_i f(x) e_i$ needs $O(p_i)$ a.o. ($p_i$ is the number of nonzero elements in $a_i$.)
- The diagonal elements $d_i = \left( \mathrm{diag}\, \nabla^2 f \right)_i$ of $\nabla^2 f \stackrel{\mathrm{def}}{=} \bar{E}^T \bar{E} + \gamma e e^T$, namely $d_i = \gamma + \frac{1}{p_i}$, are available. We can use them for choosing the step sizes ($h_i = \frac{1}{d_i}$).

Reasonable coordinate choice strategy? Random!

10 Random coordinate descent methods (RCDM)

$\min_{x \in \mathbb{R}^N} f(x)$ (f is convex and differentiable).

Main assumption: $|\nabla_i f(x + h_i e_i) - \nabla_i f(x)| \leq L_i |h_i|$, $h_i \in \mathbb{R}$, $i = 1, \dots, N$, where $e_i$ is a coordinate vector. Then

$f(x + h_i e_i) \leq f(x) + \nabla_i f(x)\, h_i + \frac{L_i}{2} h_i^2, \quad x \in \mathbb{R}^N,\; h_i \in \mathbb{R}.$

Define the coordinate steps: $T_i(x) \stackrel{\mathrm{def}}{=} x - \frac{1}{L_i} \nabla_i f(x) e_i$. Then

$f(x) - f(T_i(x)) \geq \frac{1}{2 L_i} [\nabla_i f(x)]^2, \quad i = 1, \dots, N.$

11 Random coordinate choice

We need a special random counter $R_\alpha$, $\alpha \in \mathbb{R}$:

$\mathrm{Prob}[i] = p_\alpha^{(i)} = L_i^\alpha \left[ \sum_{j=1}^N L_j^\alpha \right]^{-1}, \quad i = 1, \dots, N.$

Note: $R_0$ generates the uniform distribution.

Method RCDM($\alpha$, $x_0$). For $k \geq 0$ iterate:
1) Choose $i_k = R_\alpha$.
2) Update $x_{k+1} = T_{i_k}(x_k)$.
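A compact sketch of RCDM($\alpha$, $x_0$), assuming the coordinate constants $L_i$ and a routine partial_grad(x, i) returning $\nabla_i f(x)$ are supplied (both names are hypothetical); sampling is done here with numpy's generic choice rather than the special counter $R_\alpha$ of slide 19.

```python
import numpy as np

def rcdm(partial_grad, L, x0, alpha=1.0, n_iters=10000, rng=None):
    """RCDM(alpha, x0): pick i with probability L_i^alpha / S_alpha, then
    apply the coordinate step T_i(x) = x - (1/L_i) * grad_i f(x) * e_i."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.asarray(L, dtype=float)
    p = L ** alpha
    p /= p.sum()                         # the random counter R_alpha
    x = np.array(x0, dtype=float)
    for _ in range(n_iters):
        i = rng.choice(L.size, p=p)
        x[i] -= partial_grad(x, i) / L[i]
    return x
```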

12 Complexity bounds for RCDM

We need to introduce the following norms for $x, g \in \mathbb{R}^N$:

$\|x\|_\alpha = \left[ \sum_{i=1}^N L_i^\alpha [x^{(i)}]^2 \right]^{1/2}, \quad \|g\|_\alpha^* = \left[ \sum_{i=1}^N \frac{1}{L_i^\alpha} [g^{(i)}]^2 \right]^{1/2}.$

After $k$ iterations, RCDM($\alpha$, $x_0$) generates a random output $x_k$, which depends on $\xi_k = \{i_0, \dots, i_k\}$. Denote $\phi_k = E_{\xi_{k-1}} f(x_k)$.

Theorem. For any $k \geq 1$ we have

$\phi_k - f^* \leq \frac{2}{k} \left[ \sum_{j=1}^N L_j^\alpha \right] R_{1-\alpha}^2(x_0),$

where $R_\beta(x_0) = \max_x \left\{ \max_{x^* \in X^*} \|x - x^*\|_\beta : f(x) \leq f(x_0) \right\}$.

13 Interpretation

1. $\alpha = 0$. Then $S_0 = N$, and we get $\phi_k - f^* \leq \frac{2N}{k} R_1^2(x_0)$.

Note: we use the metric $\|x\|_1^2 = \sum_{i=1}^N L_i [x^{(i)}]^2$. A matrix with diagonal $\{L_i\}_{i=1}^N$ can have its norm (in this metric) equal to $n$. Hence, for GM we can guarantee the same bound. But its cost of iteration is much higher!

14 Interpretation

2. $\alpha = \frac{1}{2}$. Denote

$D_\infty(x_0) = \max_x \left\{ \max_{y \in X^*} \max_{1 \leq i \leq N} |x^{(i)} - y^{(i)}| : f(x) \leq f(x_0) \right\}.$

Then $R_{1/2}^2(x_0) \leq S_{1/2} D_\infty^2(x_0)$, and we obtain

$\phi_k - f^* \leq \frac{2}{k} \left[ \sum_{i=1}^N L_i^{1/2} \right]^2 D_\infty^2(x_0).$

Note: for first-order methods, the worst-case complexity of minimizing over a box depends on $N$. Since $S_{1/2}$ can be bounded, RCDM can be applied in situations where the usual GM fails.

15 Interpretation

3. $\alpha = 1$. Then $R_0(x_0)$ is the size of the initial level set in the standard Euclidean norm. Hence,

$\phi_k - f^* \leq \frac{2}{k} \left[ \sum_{i=1}^N L_i \right] R_0^2(x_0) \equiv \frac{2N}{k} \left[ \frac{1}{N} \sum_{i=1}^N L_i \right] R_0^2(x_0).$

The rate of convergence of GM can be estimated as $f(x_k) - f^* \leq \frac{\gamma}{k} R_0^2(x_0)$, where $\gamma$ satisfies the condition $\nabla^2 f(x) \preceq \gamma I$, $x \in \mathbb{R}^N$.

Note: the maximal eigenvalue of a symmetric matrix can reach its trace. In the worst case, the rate of convergence of GM is the same as that of RCDM.

16 Minimizing strongly convex functions

Theorem. Let $f(x)$ be strongly convex with respect to $\|\cdot\|_{1-\alpha}$ with convexity parameter $\sigma_{1-\alpha} > 0$. Then, for $\{x_k\}$ generated by RCDM($\alpha$, $x_0$) we have

$\phi_k - f^* \leq \left( 1 - \frac{\sigma_{1-\alpha}}{S_\alpha} \right)^k (f(x_0) - f^*).$

Proof: Let $x_k$ be generated by RCDM after $k$ iterations. Let us estimate the expected result of the next iteration:

$f(x_k) - E_{i_k}(f(x_{k+1})) = \sum_{i=1}^N p_\alpha^{(i)} [f(x_k) - f(T_i(x_k))] \geq \sum_{i=1}^N \frac{p_\alpha^{(i)}}{2 L_i} [\nabla_i f(x_k)]^2 = \frac{1}{2 S_\alpha} \left( \|\nabla f(x_k)\|_{1-\alpha}^* \right)^2 \geq \frac{\sigma_{1-\alpha}}{S_\alpha} (f(x_k) - f^*).$

It remains to compute the expectation in $\xi_{k-1}$.

17 Confidence level of the answers

Note: we have proved that the expected values of the random $f(x_k)$ are good. Can we guarantee anything after a single run?

Confidence level: the probability $\beta \in (0, 1)$ that some statement about the random output is correct.

Main tool: Chebyshev inequality (for $\xi \geq 0$): $\mathrm{Prob}[\xi \geq T] \leq \frac{E(\xi)}{T}$.

Our situation: $\mathrm{Prob}[f(x_k) - f^* \geq \epsilon] \leq \frac{1}{\epsilon} [\phi_k - f^*] \leq 1 - \beta$. We need $\phi_k - f^* \leq \epsilon (1 - \beta)$.

Too expensive for $\beta \to 1$?

18 Regularization technique

Consider $f_\mu(x) = f(x) + \frac{\mu}{2} \|x - x_0\|_{1-\alpha}^2$. It is strongly convex. Therefore, we can obtain $\phi_k - f_\mu^* \leq \epsilon (1 - \beta)$ in $O\left( \frac{S_\alpha}{\mu} \ln \frac{1}{\epsilon (1 - \beta)} \right)$ iterations.

Theorem. Define $\alpha = 1$, $\mu = \frac{\epsilon}{4 R_0^2(x_0)}$, and choose

$k \geq 1 + \frac{8 S_1 R_0^2(x_0)}{\epsilon} \left[ \ln \frac{2 S_1 R_0^2(x_0)}{\epsilon} + \ln \frac{1}{1 - \beta} \right].$

Let $x_k$ be generated by RCDM(1, $x_0$) as applied to $f_\mu$. Then $\mathrm{Prob}(f(x_k) - f^* \leq \epsilon) \geq \beta$.

Note: for $\beta = 1 - 10^{-p}$ we have $\ln \frac{1}{1-\beta} = \ln 10^p \approx 2.3\, p$.

19 Implementation details: Random Counter

Given the values $L_i$, $i = 1, \dots, N$, generate efficiently a random $i \in \{1, \dots, N\}$ with probabilities $\mathrm{Prob}[i = k] = L_k / \sum_{j=1}^N L_j$.

Solution:
a) Trivial: $O(N)$ operations.
b) Assume $N = 2^p$. Define $p + 1$ vectors $S_k \in \mathbb{R}^{2^{p-k}}$, $k = 0, \dots, p$:

$S_0^{(i)} = L_i, \quad i = 1, \dots, N,$
$S_k^{(i)} = S_{k-1}^{(2i)} + S_{k-1}^{(2i-1)}, \quad i = 1, \dots, 2^{p-k}, \quad k = 1, \dots, p.$

Algorithm: make the choice in $p$ steps, from top to bottom. If element $i$ of $S_k$ is chosen, then choose in $S_{k-1}$ either $2i$ or $2i-1$ with probabilities $\frac{S_{k-1}^{(2i)}}{S_k^{(i)}}$ or $\frac{S_{k-1}^{(2i-1)}}{S_k^{(i)}}$.

Difference: for $N = 2^{20} > 10^6$ we have $p = \log_2 N = 20$.
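A sketch of the binary-tree counter just described, assuming $N = 2^p$ and 0-based indexing (children $2i$ and $2i+1$ instead of $2i$ and $2i-1$); preprocessing is $O(N)$ and each draw costs $p = \log_2 N$ steps. The class name RandomCounter is illustrative.

```python
import numpy as np

class RandomCounter:
    """Sample i with Prob[i] = L[i] / sum(L) in O(log N) per draw (N = 2^p)."""

    def __init__(self, L):
        L = np.asarray(L, dtype=float)
        self.levels = [L]                       # S_0 = L
        while self.levels[-1].size > 1:         # S_k[i] = S_{k-1}[2i] + S_{k-1}[2i+1]
            prev = self.levels[-1]
            self.levels.append(prev[0::2] + prev[1::2])

    def draw(self, rng):
        i = 0
        for S in reversed(self.levels[:-1]):    # walk from the root down to the leaves
            left = S[2 * i]
            total = left + S[2 * i + 1]
            i = 2 * i if rng.random() * total < left else 2 * i + 1
        return i

rng = np.random.default_rng(0)
counter = RandomCounter([1.0, 2.0, 3.0, 2.0])
samples = [counter.draw(rng) for _ in range(10)]
```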

20 Sparse problems

Problem: $\min_{x \in Q} f(x)$, where $Q$ is closed and convex in $\mathbb{R}^N$, and $f(x) = \Psi(Ax)$, where $\Psi$ is a simple convex function:

$\Psi(y_1) \geq \Psi(y_2) + \langle \Psi'(y_2), y_1 - y_2 \rangle, \quad y_1, y_2 \in \mathbb{R}^M,$

and $A : \mathbb{R}^N \to \mathbb{R}^M$ is a sparse matrix.

Let $p(x) \stackrel{\mathrm{def}}{=}$ the number of nonzeros in $x$. Sparsity coefficient: $\gamma(A) \stackrel{\mathrm{def}}{=} \frac{p(A)}{MN}$.

Example 1: matrix-vector multiplication. Computation of the vector $Ax$ needs $p(A)$ operations. The initial complexity $MN$ is reduced by a factor of $\gamma(A)$.

21 Gradient Method

$x_0 \in Q, \quad x_{k+1} = \pi_Q(x_k - h f'(x_k)), \quad k \geq 0.$

Main computational expenses:
- Projection onto the simple set $Q$ needs $O(N)$ operations.
- The displacement $x_k \to x_k - h f'(x_k)$ needs $O(N)$ operations.
- $f'(x) = A^T \Psi'(Ax)$. If $\Psi$ is simple, then the main effort is spent on two matrix-vector multiplications: $2 p(A)$.

Conclusion: as compared with full matrices, we accelerate by a factor of $\gamma(A)$.

Note: for large- and huge-scale problems, $\gamma(A)$ is often very small. Can we get more?

22 Sparse updating strategy

Main idea: after the update $x_+ = x + d$ we have

$y_+ \stackrel{\mathrm{def}}{=} A x_+ = \underbrace{Ax}_{y} + Ad.$

What happens if $d$ is sparse? Denote $\sigma(d) = \{ j : d^{(j)} \neq 0 \}$. Then $y_+ = y + \sum_{j \in \sigma(d)} d^{(j)} A e_j$. Its complexity,

$\kappa_A(d) \stackrel{\mathrm{def}}{=} \sum_{j \in \sigma(d)} p(A e_j) = M \sum_{j \in \sigma(d)} \gamma(A e_j) = \gamma(d) \left[ \frac{1}{p(d)} \sum_{j \in \sigma(d)} \gamma(A e_j) \right] MN \leq \gamma(d) \max_{1 \leq j \leq N} \gamma(A e_j) \cdot MN,$

can be VERY small! If $\gamma(d) \leq c\, \gamma(A)$ and $\gamma(A e_j) \leq c\, \gamma(A)$, then $\kappa_A(d) \leq c^2 \gamma^2(A) \cdot MN$.

Expected acceleration: $(10^{-6})^2 = 10^{-12}$, i.e., one second instead of $10^{12}$ seconds (about 32,000 years).
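A sketch of the update $y_+ = y + \sum_{j \in \sigma(d)} d^{(j)} A e_j$ for a matrix stored in column-compressed (CSC) form, so that touching column $j$ costs $p(Ae_j)$ operations; the helper name sparse_update and the toy dimensions are illustrative.

```python
import numpy as np
import scipy.sparse as sp

def sparse_update(y, A_csc, d_idx, d_val):
    """Update y = A x after the sparse step x+ = x + d, where d has nonzeros
    d_val at positions d_idx. Cost: sum of the touched columns' nnz counts."""
    for j, dj in zip(d_idx, d_val):
        start, end = A_csc.indptr[j], A_csc.indptr[j + 1]
        y[A_csc.indices[start:end]] += dj * A_csc.data[start:end]   # y += d_j * A e_j
    return y

# usage: maintain y alongside x and never recompute A @ x from scratch
A = sp.random(1000, 1000, density=1e-3, format="csc", random_state=0)
x = np.zeros(1000); y = A @ x
x[[3, 7]] += [0.5, -1.0]
y = sparse_update(y, A, [3, 7], [0.5, -1.0])
assert np.allclose(y, A @ x)
```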

23 When can it work?

Simple methods: no full-vector operations! (Is it possible?)

Simple problems: functions with sparse gradients.

Examples:
1. Quadratic function $f(x) = \frac{1}{2} \langle Ax, x \rangle - \langle b, x \rangle$. The gradient $f'(x) = Ax - b$, $x \in \mathbb{R}^N$, is not sparse even if $A$ is sparse.
2. Piecewise linear function $g(x) = \max_{1 \leq i \leq m} [\langle a_i, x \rangle - b^{(i)}]$. Its subgradient $g'(x) = a_{i(x)}$, with $i(x) : g(x) = \langle a_{i(x)}, x \rangle - b^{(i(x))}$, can be sparse if $a_{i(x)}$ is sparse!

But: we need a fast procedure for updating max-operations.

24 Fast updates in short computational trees

Def: A function $f(x)$, $x \in \mathbb{R}^n$, is short-tree representable if it can be computed by a short binary tree with height $\approx \ln n$.

Let $n = 2^k$ and let the tree have $k + 1$ levels: $v_{0,i} = x^{(i)}$, $i = 1, \dots, n$. The size of each next level is half the size of the previous one:

$v_{i+1,j} = \psi_{i+1,j}(v_{i,2j-1}, v_{i,2j}), \quad j = 1, \dots, 2^{k-i-1}, \quad i = 0, \dots, k-1,$

where the $\psi_{i,j}$ are some bivariate functions.

[Figure: binary computational tree with leaves $v_{0,1}, \dots, v_{0,n}$, intermediate nodes $v_{1,1}, \dots, v_{1,n/2}$, $v_{2,1}, \dots, v_{2,n/4}$, ..., and root $v_{k,1}$.]

25 Main advantages

Important examples (symmetric functions):
- $f(x) = \|x\|_p$, $p \geq 1$: $\psi_{i,j}(t_1, t_2) \equiv [\,|t_1|^p + |t_2|^p\,]^{1/p}$,
- $f(x) = \ln \left( \sum_{i=1}^n e^{x^{(i)}} \right)$: $\psi_{i,j}(t_1, t_2) \equiv \ln(e^{t_1} + e^{t_2})$,
- $f(x) = \max_{1 \leq i \leq n} x^{(i)}$: $\psi_{i,j}(t_1, t_2) \equiv \max\{t_1, t_2\}$.

The binary tree requires only $n - 1$ auxiliary cells. Computing its value needs $n - 1$ applications of $\psi_{i,j}(\cdot, \cdot)$ (operations). If $x_+$ differs from $x$ in one entry only, then re-computing $f(x_+)$ needs only $k = \log_2 n$ operations.

Thus, we can have pure subgradient minimization schemes with sublinear iteration cost.
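A sketch of such a computational tree for $f(x) = \max_i x^{(i)}$, stored as a flat array of $2n$ cells; building it costs $n - 1$ operations and changing one entry re-computes only about $\log_2 n$ internal nodes. The class name MaxTree is illustrative, and $n$ is assumed to be a power of two as on the slide.

```python
import numpy as np

class MaxTree:
    """Binary computational tree for f(x) = max_i x^{(i)}.
    Full evaluation: n - 1 operations; single-entry update: ~log2 n operations."""

    def __init__(self, x):
        n = len(x)
        self.n = n
        self.t = np.empty(2 * n)
        self.t[n:] = x                                  # leaves v_{0,i}
        for v in range(n - 1, 0, -1):                   # internal nodes, bottom-up
            self.t[v] = max(self.t[2 * v], self.t[2 * v + 1])

    def value(self):
        return self.t[1]                                # root holds f(x)

    def update(self, i, xi):
        v = self.n + i                                  # 0-based leaf index
        self.t[v] = xi
        while v > 1:                                    # re-compute ancestors only
            v //= 2
            self.t[v] = max(self.t[2 * v], self.t[2 * v + 1])

tree = MaxTree([3.0, 1.0, 4.0, 1.0])
tree.update(1, 7.0)          # change the second entry to 7
print(tree.value())          # 7.0
```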

26 Simple subgradient methods

I. Problem: $f^* \stackrel{\mathrm{def}}{=} \min_{x \in Q} f(x)$, where $Q$ is closed and convex, $\|f'(x)\| \leq L(f)$ for all $x \in Q$, and the optimal value $f^*$ is known.

Consider the following optimization scheme (B. Polyak, 1967):

$x_0 \in Q, \quad x_{k+1} = \pi_Q \left( x_k - \frac{f(x_k) - f^*}{\|f'(x_k)\|^2} f'(x_k) \right), \quad k \geq 0.$

Denote $f_k^* = \min_{0 \leq i \leq k} f(x_i)$. Then for any $k \geq 0$ we have:

$f_k^* - f^* \leq \frac{L(f)\, \|x_0 - \pi_{X^*}(x_0)\|}{(k+1)^{1/2}}, \quad \|x_k - x^*\| \leq \|x_0 - x^*\|, \quad x^* \in X^*.$
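A sketch of Polyak's scheme, assuming the optimal value f_star is known, subgrad(x) returns an arbitrary subgradient, and project implements $\pi_Q$ (identity by default); all names are illustrative.

```python
import numpy as np

def polyak_method(f, subgrad, f_star, x0, n_iters=1000, project=lambda z: z):
    """Polyak's subgradient method with step (f(x_k) - f*) / ||f'(x_k)||^2."""
    x = np.array(x0, dtype=float)
    f_rec = f(x)                                   # record value f_k^*
    for _ in range(n_iters):
        gap = f(x) - f_star
        if gap <= 0.0:                             # already optimal
            break
        g = subgrad(x)
        x = project(x - (gap / (g @ g)) * g)
        f_rec = min(f_rec, f(x))
    return x, f_rec
```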

27 Proof

Let us fix $x^* \in X^*$. Denote $r_k(x^*) = \|x_k - x^*\|$. Then

$r_{k+1}^2(x^*) \leq \left\| x_k - \frac{f(x_k) - f^*}{\|f'(x_k)\|^2} f'(x_k) - x^* \right\|^2 = r_k^2(x^*) - 2 \frac{f(x_k) - f^*}{\|f'(x_k)\|^2} \langle f'(x_k), x_k - x^* \rangle + \frac{(f(x_k) - f^*)^2}{\|f'(x_k)\|^2}$
$\leq r_k^2(x^*) - \frac{(f(x_k) - f^*)^2}{\|f'(x_k)\|^2} \leq r_k^2(x^*) - \frac{(f_k^* - f^*)^2}{L^2(f)}.$

From this reasoning, $\|x_{k+1} - x^*\|^2 \leq \|x_k - x^*\|^2$ for all $x^* \in X^*$.

Corollary: Assume $X^*$ has a recession direction $d^*$. Then

$\|x_k - \pi_{X^*}(x_0)\| \leq \|x_0 - \pi_{X^*}(x_0)\|, \quad \langle d^*, x_k \rangle \geq \langle d^*, x_0 \rangle.$

(Proof: consider $x^* = \pi_{X^*}(x_0) + \alpha d^*$, $\alpha \geq 0$.)

28 Constrained minimization (N. Shor (1964) & B. Polyak)

II. Problem: $\min_{x \in Q} \{ f(x) : g(x) \leq 0 \}$, where $Q$ is closed and convex and $f$, $g$ have uniformly bounded subgradients.

Consider the following method with step-size parameter $h > 0$:

If $g(x_k) > h \|g'(x_k)\|$, then (A): $x_{k+1} = \pi_Q \left( x_k - \frac{g(x_k)}{\|g'(x_k)\|^2} g'(x_k) \right)$,
else (B): $x_{k+1} = \pi_Q \left( x_k - \frac{h}{\|f'(x_k)\|} f'(x_k) \right)$.

Let $F_k \subseteq \{0, \dots, k\}$ be the set of (B)-iterations, and $f_k^* = \min_{i \in F_k} f(x_i)$.

Theorem: If $k > \|x_0 - x^*\|^2 / h^2$, then $F_k \neq \emptyset$ and

$f_k^* - f^* \leq h L(f), \quad \max_{i \in F_k} g(x_i) \leq h L(g).$
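A sketch of the switching scheme under the same assumptions (hypothetical oracles f, g, subgrad_f, subgrad_g and a projection onto Q); the record value is taken over the (B)-iterations, as in the theorem.

```python
import numpy as np

def switching_subgradient(f, g, subgrad_f, subgrad_g, x0, h,
                          n_iters=1000, project=lambda z: z):
    """Shor/Polyak switching method for min{ f(x) : g(x) <= 0, x in Q }."""
    x = np.array(x0, dtype=float)
    f_rec = None                                    # record value over (B)-iterations
    for _ in range(n_iters):
        gx, gsub = g(x), subgrad_g(x)
        if gx > h * np.linalg.norm(gsub):           # (A): constraint step
            x = project(x - (gx / (gsub @ gsub)) * gsub)
        else:                                       # (B): objective step of length h
            f_rec = f(x) if f_rec is None else min(f_rec, f(x))
            fsub = subgrad_f(x)
            x = project(x - (h / np.linalg.norm(fsub)) * fsub)
    return x, f_rec
```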

29 Computational strategies

1. The constants $L(f)$, $L(g)$ are known (e.g. Linear Programming). We can take $h = \frac{\epsilon}{\max\{L(f), L(g)\}}$. Then we need to decide on the number of steps $N$ (easy!). Note: the standard advice is $h = \frac{R}{\sqrt{N+1}}$ (much more difficult!).

2. The constants $L(f)$, $L(g)$ are not known. Start from a guess. Restart from scratch each time we see the guess is wrong. The guess is doubled after each restart.

3. Tracking the record value $f_k^*$. Double run. Other ideas are welcome!

30 Application examples

Observations:
1. Very often, large- and huge-scale problems have repetitive sparsity patterns and/or limited connectivity:
   - Social networks.
   - Mobile phone networks.
   - Truss topology design (local bars).
   - Finite-element models (2D: four neighbors, 3D: six neighbors).
2. For $p$-diagonal matrices, $\kappa(A) \leq p^2$.

31 Nonsmooth formulation of the Google Problem

Main property of the spectral radius ($A \geq 0$): if $A \in \mathbb{R}^{n \times n}_+$, then

$\rho(A) = \min_{x > 0} \max_{1 \leq i \leq n} \frac{1}{x^{(i)}} \langle e_i, Ax \rangle.$

The minimum is attained at the corresponding eigenvector.

Since $\rho(\bar{E}) = 1$, our problem is as follows:

$f(x) \stackrel{\mathrm{def}}{=} \max_{1 \leq i \leq N} [\langle e_i, \bar{E} x \rangle - x^{(i)}] \;\to\; \min_{x \geq 0}.$

Interpretation: maximizing the self-esteem!

Since $f^* = 0$, we can apply Polyak's method with sparse updates. Additional features: the optimal set $X^*$ is a convex cone. If $x_0 = e$, then the whole sequence is separated from zero:

$\langle x^*, e \rangle \leq \langle x^*, x_k \rangle \leq \|x^*\|_1 \|x_k\|_\infty = \langle x^*, e \rangle \|x_k\|_\infty.$

Goal: find $\bar{x} \geq 0$ such that $\|\bar{x}\|_\infty \geq 1$ and $f(\bar{x}) \leq \epsilon$. (The first condition is satisfied automatically.)
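A sketch of an oracle for this max-form objective, assuming $\bar{E}$ is given as a scipy.sparse.csr_matrix; the active row yields the subgradient (row $i$ of $\bar{E}$, transposed, minus $e_i$), which is sparse whenever few pages link to page $i$. It is returned dense here only for simplicity, and the function name is illustrative.

```python
import numpy as np
import scipy.sparse as sp

def google_max_oracle(E_bar, x):
    """f(x) = max_i [ <e_i, E_bar x> - x^{(i)} ] and one subgradient of it.

    E_bar: column-stochastic matrix stored as scipy.sparse.csr_matrix.
    Note: E_bar @ x is formed from scratch here only for illustration; the
    actual method maintains y = E_bar x incrementally via sparse updates."""
    y = E_bar @ x - x
    i = int(np.argmax(y))
    g = E_bar.getrow(i).toarray().ravel()   # row i of E_bar (dense for simplicity)
    g[i] -= 1.0                             # subgradient: (row i of E_bar)^T - e_i
    return y[i], g
```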

32 Computational experiments: Iteration Cost

We compare Polyak's GM with sparse updates (GM_s) with the standard one (GM).

Setup: each agent has exactly $p$ random friends. Thus, $\kappa(A) \approx p^2$.

Iteration cost: GM_s: $p^2 \log_2 N$; GM: $p N$.

[Table: time for $10^4$ iterations ($p = 32$); columns: $N$, $\kappa(A)$, GM_s, GM.]
[Table: time for $10^3$ iterations ($p = 16$); columns: $N$, $\kappa(A)$, GM_s, GM; the sparse version finishes in seconds versus about 100 minutes for GM.]
