Lecture 3: Huge-scale optimization problems

Liege University: Francqui Chair
Lecture 3: Huge-scale optimization problems
Yurii Nesterov, CORE/INMA (UCL)
March 9, 2012

Outline
1. Problem sizes
2. Random coordinate search
3. Confidence level of solutions
4. Sparse optimization problems
5. Sparse updates for linear operators
6. Fast updates in computational trees
7. Simple subgradient methods
8. Application examples

Nonlinear Optimization: problem sizes

  Class         Operations   Iteration cost   Memory
  Small-size    All          n^4 -> n^3       Kilobyte: 10^3
  Medium-size   A^{-1}       n^3 -> n^2       Megabyte: 10^6
  Large-scale   A x          n^2 -> n         Gigabyte: 10^9
  Huge-scale    x + y        n -> log n       Terabyte: 10^12

Sources of huge-scale problems:
- Internet (new)
- Telecommunications (new)
- Finite-element schemes (old)
- Partial differential equations (old)

Very old optimization idea: Coordinate Search

Problem: min_{x ∈ R^n} f(x)   (f is convex and differentiable).

Coordinate relaxation algorithm. For k ≥ 0 iterate:
1. Choose the active coordinate i_k.
2. Update x_{k+1} = x_k − h_k ∇_{i_k} f(x_k) e_{i_k}, ensuring f(x_{k+1}) ≤ f(x_k).
(e_i is the i-th coordinate vector in R^n.)

Main advantage: very simple implementation.

Possible strategies
1. Cyclic moves. (Difficult to analyze.)
2. Random choice of coordinate. (Why?)
3. Choose the coordinate with the maximal directional derivative.

Complexity estimate for strategy 3: assume ||∇f(x) − ∇f(y)|| ≤ L ||x − y|| for all x, y ∈ R^n, and choose h_k = 1/L. Then
  f(x_k) − f(x_{k+1}) ≥ (1/(2L)) [∇_{i_k} f(x_k)]² ≥ (1/(2nL)) ||∇f(x_k)||² ≥ (1/(2nLR²)) (f(x_k) − f*)²,
where R bounds the distance from the initial level set to the optimum. Hence,
  f(x_k) − f* ≤ 2nLR²/k,  k ≥ 1.
(For the Gradient Method, drop the factor n.)
This is the only known theoretical result for CDM!

Criticism

Theoretical justification:
- Complexity bounds are not known for most of the schemes.
- The only justified scheme needs computation of the whole gradient. (Why not use the Gradient Method then?)

Computational complexity:
- Fast differentiation: if the function is defined by a sequence of operations, then C(∇f) ≤ 4 C(f).
- Can we do anything without computing the function values?

Result: CDM are almost out of computational practice.

Google problem

Let E ∈ R^{n×n} be an incidence matrix of a graph. Denote e = (1, ..., 1)^T and Ē = E · diag(E^T e)^{-1}. Thus, Ē^T e = e.

Our problem is as follows: find x* ≥ 0 such that Ē x* = x*.

Optimization formulation:
  f(x) := (1/2) ||Ē x − x||² + (γ/2) [⟨e, x⟩ − 1]²  →  min over x ∈ R^n.
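
As an illustration only (not part of the lecture), here is a minimal sketch of this objective in Python, assuming E is a sparse 0/1 link matrix whose columns correspond to outgoing links and every node has at least one of them; the names make_E_bar, f and gamma are mine.

```python
import numpy as np
import scipy.sparse as sp

def make_E_bar(E):
    """Column-normalize E: Ē = E * diag(E^T e)^{-1}, so each column sums to one."""
    out_deg = np.asarray(E.sum(axis=0)).ravel()    # p_j = number of nonzeros in column j
    return (E @ sp.diags(1.0 / out_deg)).tocsc()   # assumes out_deg > 0 for every column

def f(x, E_bar, gamma):
    """f(x) = 1/2 ||Ē x - x||^2 + gamma/2 (<e, x> - 1)^2."""
    r = E_bar @ x - x
    return 0.5 * (r @ r) + 0.5 * gamma * (x.sum() - 1.0) ** 2
```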

Huge-scale problems

Main features:
- The size is very big (n ≥ 10^7).
- The data is distributed in space.
- The requested parts of the data are not always available.
- The data is changing in time.

Consequences: the simplest operations become expensive or infeasible:
- Update of the full vector of variables.
- Matrix-vector multiplication.
- Computation of the objective function value, etc.

Structure of the Google Problem

Let us look at the gradient of the objective:
  ∇_i f(x) = ⟨a_i, g(x)⟩ + γ [⟨e, x⟩ − 1],  i = 1, ..., n,
where g(x) = Ē x − x ∈ R^n and Ē = (a_1, ..., a_n).

Main observations:
- The coordinate move x_+ = x − h_i ∇_i f(x) e_i needs O(p_i) arithmetic operations (p_i is the number of nonzero elements in a_i).
- The diagonal entries of ∇²f = Ē^T Ē + γ e e^T, namely d_i = γ + 1/p_i, are available. We can use them for choosing the step sizes: h_i = 1/d_i.

Reasonable coordinate choice strategy? Random!
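
A hedged sketch of one such O(p_i) coordinate step (my illustration, not the lecture's code): the state y = Ē x and s = ⟨e, x⟩ is maintained between iterations, and only the p_i nonzeros of the column a_i are touched.

```python
import numpy as np
import scipy.sparse as sp

def coordinate_step(i, x, y, s, E_bar_csc, gamma, d):
    """One coordinate update for the Google objective.

    E_bar_csc is Ē in CSC format, y = Ē x, s = <e, x>, d[i] = gamma + 1/p_i.
    x and y are modified in place; the updated s is returned."""
    start, end = E_bar_csc.indptr[i], E_bar_csc.indptr[i + 1]
    rows, vals = E_bar_csc.indices[start:end], E_bar_csc.data[start:end]
    grad_i = vals @ (y[rows] - x[rows]) + gamma * (s - 1.0)  # <a_i, g(x)> + gamma(<e,x> - 1)
    delta = -grad_i / d[i]                                   # step h_i = 1/d_i
    x[i] += delta
    y[rows] += delta * vals                                  # keep y = Ē x in O(p_i)
    return s + delta
```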

Random coordinate descent methods (RCDM)

  min_{x ∈ R^N} f(x)   (f is convex and differentiable).

Main Assumption: |∇_i f(x + h_i e_i) − ∇_i f(x)| ≤ L_i |h_i| for all h_i ∈ R, i = 1, ..., N, where e_i is a coordinate vector. Then
  f(x + h_i e_i) ≤ f(x) + ∇_i f(x) h_i + (L_i/2) h_i²,  x ∈ R^N, h_i ∈ R.

Define the coordinate steps: T_i(x) := x − (1/L_i) ∇_i f(x) e_i. Then
  f(x) − f(T_i(x)) ≥ (1/(2L_i)) [∇_i f(x)]²,  i = 1, ..., N.

Random coordinate choice

We need a special random counter R_α, α ∈ R:
  Prob[i] = p_α^{(i)} = L_i^α [ Σ_{j=1}^N L_j^α ]^{-1},  i = 1, ..., N.
Note: R_0 generates the uniform distribution.

Method RCDM(α, x_0). For k ≥ 0 iterate:
1) Choose i_k = R_α.
2) Update x_{k+1} = T_{i_k}(x_k).
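
A minimal sketch of RCDM(α, x_0), assuming a user-supplied partial-derivative oracle grad_i(x, i); the oracle name and the use of numpy's random generator are my choices, not the lecture's.

```python
import numpy as np

def rcdm(grad_i, L, x0, alpha, num_iters, seed=0):
    """Random coordinate descent: x_{k+1} = x_k - (1/L_i) grad_i(x_k) e_i."""
    rng = np.random.default_rng(seed)
    L = np.asarray(L, dtype=float)
    p = L ** alpha
    p /= p.sum()                        # probabilities p_alpha^{(i)}
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        i = rng.choice(len(L), p=p)     # i_k = R_alpha
        x[i] -= grad_i(x, i) / L[i]     # x_{k+1} = T_{i_k}(x_k)
    return x
```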

Complexity bounds for RCDM

We need the following norms for x, g ∈ R^N:
  ||x||_α = [ Σ_{i=1}^N L_i^α (x^{(i)})² ]^{1/2},   ||g||_α* = [ Σ_{i=1}^N (1/L_i^α) (g^{(i)})² ]^{1/2}.

After k iterations, RCDM(α, x_0) generates a random output x_k, which depends on ξ_k = {i_0, ..., i_k}. Denote φ_k = E_{ξ_{k−1}} f(x_k).

Theorem. For any k ≥ 1 we have
  φ_k − f* ≤ (2/k) [ Σ_{j=1}^N L_j^α ] R²_{1−α}(x_0),
where R_β(x_0) = max_x { max_{x* ∈ X*} ||x − x*||_β : f(x) ≤ f(x_0) }.

Interpretation

1. α = 0. Then S_0 = N (here S_α := Σ_{j=1}^N L_j^α), and we get
  φ_k − f* ≤ (2N/k) R_1²(x_0).
Note: we use the metric ||x||_1² = Σ_{i=1}^N L_i (x^{(i)})². A symmetric matrix with diagonal {L_i}_{i=1}^N can have norm equal to N in this metric. Hence, for the Gradient Method we can guarantee the same bound, but its iteration cost is much higher!

Interpretation

2. α = 1/2. Denote
  D_∞(x_0) = max_x { max_{y ∈ X*} max_{1≤i≤N} |x^{(i)} − y^{(i)}| : f(x) ≤ f(x_0) }.
Then R²_{1/2}(x_0) ≤ S_{1/2} D_∞²(x_0), and we obtain
  φ_k − f* ≤ (2/k) [ Σ_{i=1}^N L_i^{1/2} ]² D_∞²(x_0).
Note: for first-order methods, the worst-case complexity of minimizing over a box depends on N. Since S_{1/2} can be bounded, RCDM can be applied in situations where the usual Gradient Method fails.

Interpretation

3. α = 1. Then R_0(x_0) is the size of the initial level set in the standard Euclidean norm. Hence,
  φ_k − f* ≤ (2/k) [ Σ_{i=1}^N L_i ] R_0²(x_0) = (2N/k) [ (1/N) Σ_{i=1}^N L_i ] R_0²(x_0).
The rate of convergence of the Gradient Method can be estimated as
  f(x_k) − f* ≤ (γ/k) R_0²(x_0),
where γ satisfies ∇²f(x) ⪯ γ I, x ∈ R^N.
Note: the maximal eigenvalue of a symmetric matrix can reach its trace. In the worst case, the rate of convergence of GM is the same as that of RCDM.

Minimizing strongly convex functions

Theorem. Let f be strongly convex with respect to ||·||_{1−α} with convexity parameter σ_{1−α} > 0. Then, for {x_k} generated by RCDM(α, x_0), we have
  φ_k − f* ≤ (1 − σ_{1−α}/S_α)^k (f(x_0) − f*).

Proof. Let x_k be generated by RCDM after k iterations. Let us estimate the expected result of the next iteration:
  f(x_k) − E_{i_k}[f(x_{k+1})] = Σ_{i=1}^N p_α^{(i)} [f(x_k) − f(T_i(x_k))]
    ≥ Σ_{i=1}^N (p_α^{(i)}/(2L_i)) [∇_i f(x_k)]²
    = (1/(2S_α)) (||∇f(x_k)||_{1−α}*)²
    ≥ (σ_{1−α}/S_α) (f(x_k) − f*).
It remains to take the expectation in ξ_{k−1}.

Confidence level of the answers

Note: we have proved that the expected values of the random f(x_k) are good. Can we guarantee anything after a single run?

Confidence level: the probability β ∈ (0, 1) that some statement about the random output is correct.

Main tool: the Chebyshev-type inequality for ξ ≥ 0:
  Prob[ξ ≥ T] ≤ E(ξ)/T.

Our situation:
  Prob[f(x_k) − f* ≥ ε] ≤ (1/ε) [φ_k − f*] ≤ 1 − β.
We need φ_k − f* ≤ ε(1 − β). Too expensive for β ≈ 1?

Regularization technique

Consider f_μ(x) = f(x) + (μ/2) ||x − x_0||²_{1−α}. It is strongly convex. Therefore, we can obtain φ_k − f_μ* ≤ ε(1 − β) in O( (S_α/μ) ln( 1/(ε(1 − β)) ) ) iterations.

Theorem. Define α = 1, μ = ε/(4 R_0²(x_0)), and choose
  k ≥ 1 + (8 S_1 R_0²(x_0)/ε) [ ln( 2 S_1 R_0²(x_0)/ε ) + ln( 1/(1 − β) ) ].
Let x_k be generated by RCDM(1, x_0) as applied to f_μ. Then
  Prob( f(x_k) − f* ≤ ε ) ≥ β.
Note: for β = 1 − 10^{-p} we have ln( 1/(1 − β) ) = ln 10^p ≈ 2.3 p.

Implementation details: Random Counter

Given the values L_i, i = 1, ..., N, generate efficiently a random i ∈ {1, ..., N} with probabilities Prob[i = k] = L_k / Σ_{j=1}^N L_j.

Solution:
a) Trivial: O(N) operations.
b) Assume N = 2^p. Define p + 1 vectors S_k ∈ R^{2^{p−k}}, k = 0, ..., p:
  S_0^{(i)} = L_i,  i = 1, ..., N,
  S_k^{(i)} = S_{k−1}^{(2i)} + S_{k−1}^{(2i−1)},  i = 1, ..., 2^{p−k},  k = 1, ..., p.
Algorithm: make the choice in p steps, from top to bottom. If element i of S_k is chosen, then choose in S_{k−1} either 2i or 2i − 1 with probabilities S_{k−1}^{(2i)}/S_k^{(i)} and S_{k−1}^{(2i−1)}/S_k^{(i)}, respectively.

Difference: for N = 2^20 > 10^6 we have p = log_2 N = 20.
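
A sketch of this binary-tree counter in Python (0-based indexing, whereas the slide uses 1-based indices; the function names are mine).

```python
import numpy as np

def build_counter(L):
    """Return the partial-sum levels S_0, ..., S_p (requires len(L) = 2^p)."""
    levels = [np.asarray(L, dtype=float)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append(prev[0::2] + prev[1::2])   # S_k^(i) = S_{k-1}^(2i-1) + S_{k-1}^(2i)
    return levels

def sample(levels, rng):
    """Draw i with probability L_i / sum_j L_j in p steps, walking from the root down."""
    i = 0
    for level in reversed(levels[:-1]):          # S_{p-1}, S_{p-2}, ..., S_0
        left, right = level[2 * i], level[2 * i + 1]
        i = 2 * i if rng.random() < left / (left + right) else 2 * i + 1
    return i
```

For example, sample(build_counter(L), np.random.default_rng(0)) returns an index with the desired distribution; when one L_i changes, the p sums on the path from leaf i to the root can be refreshed in O(log N) time, so the counter also supports dynamic weights.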

Sparse problems

Problem: min_{x ∈ Q} f(x), where Q is closed and convex in R^N, and f(x) = Ψ(Ax), where Ψ is a simple convex function:
  Ψ(y_1) ≥ Ψ(y_2) + ⟨Ψ'(y_2), y_1 − y_2⟩,  y_1, y_2 ∈ R^M,
and A : R^N → R^M is a sparse matrix.

Let p(x) := number of nonzeros in x. Sparsity coefficient: γ(A) := p(A)/(MN).

Example 1: Matrix-vector multiplication. Computation of the vector Ax needs p(A) operations: the initial complexity MN is reduced by the factor γ(A).

Gradient Method

  x_0 ∈ Q,  x_{k+1} = π_Q( x_k − h ∇f(x_k) ),  k ≥ 0.

Main computational expenses:
- Projection onto the simple set Q needs O(N) operations.
- The displacement x_k → x_k − h ∇f(x_k) needs O(N) operations.
- ∇f(x) = A^T Ψ'(Ax); if Ψ is simple, then the main effort is spent on two matrix-vector multiplications: 2 p(A) operations.

Conclusion: as compared with full matrices, we accelerate by the factor γ(A).
Note: for large- and huge-scale problems we often have a very small γ(A). Can we get more?

Sparse updating strategy

Main idea: after the update x_+ = x + d we have
  y_+ := A x_+ = Ax + Ad = y + Ad.
What happens if d is sparse? Denote σ(d) = {j : d^{(j)} ≠ 0}. Then
  y_+ = y + Σ_{j ∈ σ(d)} d^{(j)} A e_j.
Its complexity, κ_A(d) := Σ_{j ∈ σ(d)} p(A e_j), can be VERY small! Indeed,
  κ_A(d) = M Σ_{j ∈ σ(d)} γ(A e_j) ≤ γ(d) · max_{1≤j≤N} γ(A e_j) · MN.
If γ(d) ≤ c γ(A) and γ(A e_j) ≤ c γ(A), then κ_A(d) ≤ c² γ²(A) · MN.
Expected acceleration: (10^{-6})² = 10^{-12}, i.e. one second instead of about 32,000 years.
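
A small illustration (mine, not the lecture's code) of the sparse update y_+ = y + Σ_{j∈σ(d)} d^{(j)} A e_j, using a CSC matrix so that each column A e_j is directly available with its p(A e_j) nonzeros.

```python
import numpy as np
import scipy.sparse as sp

def sparse_update(y, A_csc, d_idx, d_val):
    """In-place update of y = A x after the sparse step x <- x + d.

    d_idx, d_val list the nonzeros of d (d[d_idx[t]] = d_val[t]).
    The cost is kappa_A(d) = total number of nonzeros in the touched columns."""
    for j, dj in zip(d_idx, d_val):
        start, end = A_csc.indptr[j], A_csc.indptr[j + 1]
        y[A_csc.indices[start:end]] += dj * A_csc.data[start:end]   # y += d_j * (A e_j)
    return y
```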

When can it work?

Simple methods: no full-vector operations! (Is it possible?)
Simple problems: functions with sparse gradients.

Examples:
1. Quadratic function f(x) = (1/2) ⟨Ax, x⟩ − ⟨b, x⟩. The gradient ∇f(x) = Ax − b, x ∈ R^N, is not sparse even if A is sparse.
2. Piece-wise linear function g(x) = max_{1≤i≤m} [ ⟨a_i, x⟩ − b^{(i)} ]. Its subgradient g'(x) = a_{i(x)}, where i(x) is an index with g(x) = ⟨a_{i(x)}, x⟩ − b^{(i(x))}, can be sparse if a_{i(x)} is sparse!

But: we need a fast procedure for updating the max-operations.

Fast updates in short computational trees

Definition: A function f(x), x ∈ R^n, is short-tree representable if it can be computed by a short binary tree with height ≈ ln n.

Let n = 2^k and let the tree have k + 1 levels: v_{0,i} = x^{(i)}, i = 1, ..., n. The size of each level is half the size of the previous one:
  v_{i+1,j} = ψ_{i+1,j}( v_{i,2j−1}, v_{i,2j} ),  j = 1, ..., 2^{k−i−1},  i = 0, ..., k−1,
where the ψ_{i,j} are some bivariate functions.

[Figure: binary tree with leaves v_{0,1}, ..., v_{0,n}, intermediate levels v_{1,1}, ..., v_{1,n/2} and v_{2,1}, ..., v_{2,n/4}, up to the root v_{k,1}.]

Main advantages

Important examples (symmetric functions):
  f(x) = ||x||_p, p ≥ 1:               ψ_{i,j}(t_1, t_2) ≡ [ |t_1|^p + |t_2|^p ]^{1/p},
  f(x) = ln( Σ_{i=1}^n e^{x^{(i)}} ):   ψ_{i,j}(t_1, t_2) ≡ ln( e^{t_1} + e^{t_2} ),
  f(x) = max_{1≤i≤n} x^{(i)}:           ψ_{i,j}(t_1, t_2) ≡ max{ t_1, t_2 }.

- The binary tree requires only n − 1 auxiliary cells.
- Computing its value needs n − 1 applications of ψ_{i,j}(·,·) (operations).
- If x_+ differs from x in one entry only, then re-computing f(x_+) needs only k = log_2 n operations.

Thus, we can have pure subgradient minimization schemes with sublinear iteration cost.
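
As an illustration (my sketch, assuming n = 2^k), a binary tree for ψ(t_1, t_2) = max{t_1, t_2} that re-computes f(x) = max_i x^{(i)} in log_2 n operations after a single-entry change.

```python
class MaxTree:
    """Short computational tree for f(x) = max_i x_i (n must be a power of two)."""

    def __init__(self, x):
        self.n = len(x)
        self.tree = [0.0] * (2 * self.n)        # internal nodes 1..n-1, leaves n..2n-1
        self.tree[self.n:] = [float(t) for t in x]
        for v in range(self.n - 1, 0, -1):      # psi(t1, t2) = max(t1, t2)
            self.tree[v] = max(self.tree[2 * v], self.tree[2 * v + 1])

    def value(self):
        return self.tree[1]                     # the root holds f(x)

    def update(self, i, xi):
        """Set x_i = xi and refresh f(x) with log2(n) applications of psi."""
        v = self.n + i
        self.tree[v] = float(xi)
        v //= 2
        while v >= 1:
            self.tree[v] = max(self.tree[2 * v], self.tree[2 * v + 1])
            v //= 2
```

Replacing max by ln(e^{t1} + e^{t2}) or by [|t1|^p + |t2|^p]^{1/p} in the two marked lines gives the log-sum-exp and ||·||_p examples above.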

Simple subgradient methods

I. Problem: f* := min_{x ∈ Q} f(x), where Q is closed and convex, ||f'(x)|| ≤ L(f) for all x ∈ Q, and the optimal value f* is known.

Consider the following optimization scheme (B. Polyak, 1967):
  x_0 ∈ Q,  x_{k+1} = π_Q( x_k − ( (f(x_k) − f*) / ||f'(x_k)||² ) f'(x_k) ),  k ≥ 0.

Denote f_k* = min_{0≤i≤k} f(x_i). Then for any k ≥ 0 we have:
  f_k* − f* ≤ L(f) ||x_0 − π_{X*}(x_0)|| / (k + 1)^{1/2},
  ||x_k − x*|| ≤ ||x_0 − x*||  for all x* ∈ X*.
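
A minimal sketch of Polyak's method with known optimal value, assuming user-supplied callables f, f_prime (a subgradient oracle) and project_Q (the names are my own).

```python
import numpy as np

def polyak_subgradient(f, f_prime, project_Q, x0, f_star, num_iters):
    """x_{k+1} = pi_Q( x_k - (f(x_k) - f*)/||f'(x_k)||^2 * f'(x_k) )."""
    x = np.array(x0, dtype=float)
    best = f(x)                                   # running record value f_k^*
    for _ in range(num_iters):
        g = f_prime(x)
        gg = g @ g
        if gg == 0.0:                             # zero subgradient: x is optimal
            break
        x = project_Q(x - ((f(x) - f_star) / gg) * g)
        best = min(best, f(x))
    return x, best
```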

Proof. Let us fix x* ∈ X*. Denote r_k(x*) = ||x_k − x*||. Then
  r_{k+1}²(x*) ≤ || x_k − ( (f(x_k) − f*)/||f'(x_k)||² ) f'(x_k) − x* ||²
    = r_k²(x*) − 2 ( (f(x_k) − f*)/||f'(x_k)||² ) ⟨ f'(x_k), x_k − x* ⟩ + (f(x_k) − f*)²/||f'(x_k)||²
    ≤ r_k²(x*) − (f(x_k) − f*)²/||f'(x_k)||²
    ≤ r_k²(x*) − (f_k* − f*)²/L²(f).
From this reasoning, ||x_{k+1} − x*||² ≤ ||x_k − x*||² for all x* ∈ X*.

Corollary: Assume X* has a recession direction d*. Then
  ||x_k − π_{X*}(x_0)|| ≤ ||x_0 − π_{X*}(x_0)||,   ⟨ d*, x_k ⟩ ≥ ⟨ d*, x_0 ⟩.
(Proof: consider x* = π_{X*}(x_0) + α d*, α ≥ 0.)

Constrained minimization (N. Shor (1964) & B. Polyak)

II. Problem: min_{x ∈ Q} { f(x) : g(x) ≤ 0 }, where Q is closed and convex, and f, g have uniformly bounded subgradients.

Consider the following method with step-size parameter h > 0:
  if g(x_k) > h ||g'(x_k)||, then (A): x_{k+1} = π_Q( x_k − ( g(x_k)/||g'(x_k)||² ) g'(x_k) ),
  else (B): x_{k+1} = π_Q( x_k − ( h/||f'(x_k)|| ) f'(x_k) ).

Let F_k ⊆ {0, ..., k} be the set of (B)-iterations, and f_k* = min_{i ∈ F_k} f(x_i).

Theorem: If k > ||x_0 − x*||²/h², then F_k ≠ ∅ and
  f_k* − f(x*) ≤ h L(f),   max_{i ∈ F_k} g(x_i) ≤ h L(g).
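
A hedged sketch of this switching scheme, again with user-supplied oracles f, fp, g, gp and a projector onto Q (all names are mine; degenerate zero-subgradient cases are not handled).

```python
import numpy as np

def switching_subgradient(f, fp, g, gp, project_Q, x0, h, num_iters):
    """Shor/Polyak switching scheme; returns the last iterate and the record value f_k^*."""
    x = np.array(x0, dtype=float)
    best_f = np.inf                                     # record over the (B)-iterations
    for _ in range(num_iters):
        gx, gpx = g(x), gp(x)
        if gx > h * np.linalg.norm(gpx):                # (A): infeasibility is too large
            x = project_Q(x - (gx / (gpx @ gpx)) * gpx)
        else:                                           # (B): objective step of length h
            best_f = min(best_f, f(x))
            fpx = fp(x)
            x = project_Q(x - (h / np.linalg.norm(fpx)) * fpx)
    return x, best_f
```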

Computational strategies

1. The constants L(f), L(g) are known (e.g. Linear Programming). We can take h = ε / max{L(f), L(g)}. Then we only need to decide on the number of steps N (easy!). Note: the standard advice is h = R/(N + 1)^{1/2} (much more difficult!).
2. The constants L(f), L(g) are not known. Start from a guess. Restart from scratch each time we see that the guess is wrong. The guess is doubled after each restart.
3. Tracking the record value f_k*. Double run. Other ideas are welcome!

Application examples

Observations:
1. Very often, large- and huge-scale problems have repetitive sparsity patterns and/or limited connectivity:
   - social networks,
   - mobile phone networks,
   - truss topology design (local bars),
   - finite-element models (2D: four neighbors, 3D: six neighbors).
2. For p-diagonal matrices, κ(A) ≤ p².

Nonsmooth formulation of the Google Problem

Main property of the spectral radius (A ≥ 0): if A ∈ R_+^{n×n}, then
  ρ(A) = min_{x > 0} max_{1≤i≤n} (1/x^{(i)}) ⟨ e_i, Ax ⟩,
and the minimum is attained at the corresponding eigenvector.

Since ρ(Ē) = 1, our problem is as follows:
  f(x) := max_{1≤i≤N} [ ⟨ e_i, Ēx ⟩ − x^{(i)} ]  →  min over x ≥ 0.
Interpretation: maximizing the self-esteem!

Since f* = 0, we can apply Polyak's method with sparse updates.

Additional features: the optimal set X* is a convex cone. If x_0 = e, then the whole sequence is separated from zero:
  ⟨ x*, e ⟩ ≤ ⟨ x*, x_k ⟩ ≤ ||x*||_1 ||x_k||_∞ = ⟨ x*, e ⟩ ||x_k||_∞,  hence ||x_k||_∞ ≥ 1.

Goal: find x̄ ≥ 0 such that ||x̄||_∞ ≥ 1 and f(x̄) ≤ ε. (The first condition is satisfied automatically.)
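
A plain illustration (my sketch; it recomputes Ē x at every step, so it does not yet use the sparse updating strategy described earlier) of Polyak's method on this nonsmooth objective with f* = 0 and projection onto the nonnegative orthant.

```python
import numpy as np
import scipy.sparse as sp

def google_polyak(E_bar, x0, num_iters):
    """Minimize f(x) = max_i [<e_i, Ē x> - x_i] over x >= 0 with Polyak steps (f* = 0)."""
    E_bar = sp.csr_matrix(E_bar)
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        r = E_bar @ x - x                        # residual Ē x - x
        i = int(np.argmax(r))                    # active index i(x)
        if r[i] <= 0.0:                          # f(x) <= 0 = f*: done
            break
        g = E_bar.getrow(i).toarray().ravel()    # subgradient: (row i of Ē) - e_i
        g[i] -= 1.0
        x = np.maximum(x - (r[i] / (g @ g)) * g, 0.0)   # Polyak step + projection on x >= 0
    return x
```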

Computational experiments: Iteration Cost

We compare Polyak's GM with sparse updates (GM_s) with the standard one (GM).
Setup: each agent has exactly p random friends; thus κ(A) ≈ p².
Iteration cost: GM_s: p² log_2 N;  GM: p N.

[Tables: running time for 10^4 iterations with p = 32 and for 10^3 iterations with p = 16; columns N, κ(A), GM_s, GM.]

Bottom line: the sparse-update method finishes in seconds, against roughly 100 minutes for the standard GM.