Lecture 3: Huge-scale optimization problems


Liege University: Francqui Chair 2011-2012
Lecture 3: Huge-scale optimization problems
Yurii Nesterov, CORE/INMA (UCL)
March 9, 2012

Outline
1. Problem sizes
2. Random coordinate search
3. Confidence level of solutions
4. Sparse optimization problems
5. Sparse updates for linear operators
6. Fast updates in computational trees
7. Simple subgradient methods
8. Application examples

Nonlinear optimization: problem sizes

Class        Operations  Dimension      Iter. cost    Memory
Small-size   All         10^0 - 10^2    n^4 -> n^3    Kilobyte: 10^3
Medium-size  A^{-1}      10^3 - 10^4    n^3 -> n^2    Megabyte: 10^6
Large-scale  Ax          10^5 - 10^7    n^2 -> n      Gigabyte: 10^9
Huge-scale   x + y       10^8 - 10^12   n -> log n    Terabyte: 10^12

Sources of huge-scale problems:
- Internet (new)
- Telecommunications (new)
- Finite-element schemes (old)
- Partial differential equations (old)

Very old optimization idea: Coordinate Search

Problem: \min_{x \in R^n} f(x) (f is convex and differentiable).

Coordinate relaxation algorithm: for k \ge 0 iterate
1. Choose the active coordinate i_k.
2. Update x_{k+1} = x_k - h_k \nabla_{i_k} f(x_k) e_{i_k}, ensuring f(x_{k+1}) \le f(x_k).
(e_i is the i-th coordinate vector in R^n.)

Main advantage: very simple implementation.
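
As a quick illustration, here is a minimal Python sketch of this loop. The quadratic test function, the step h_k = 1/L and the maximal-derivative coordinate choice are assumptions made for the example, not part of the slide:

```python
import numpy as np

def coordinate_search(grad, x0, L, iters=500):
    """Coordinate relaxation: at each step move along a single coordinate of the gradient."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)                       # full gradient (only one entry is used below)
        i = int(np.argmax(np.abs(g)))     # strategy: maximal directional derivative
        x[i] -= g[i] / L                  # x_{k+1} = x_k - h_k * grad_i f(x_k) * e_i, h_k = 1/L
    return x

# toy smooth convex example: f(x) = 1/2 ||A x - b||^2
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)            # Lipschitz constant of the gradient
print(coordinate_search(grad, np.zeros(2), L), np.linalg.solve(A, b))
```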

Possible strategies
1. Cyclic moves. (Difficult to analyze.)
2. Random choice of coordinate. (Why?)
3. Choose the coordinate with the maximal directional derivative.

Complexity estimate for strategy 3: assume \|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| for all x, y \in R^n, and choose h_k = 1/L. Then

f(x_k) - f(x_{k+1}) \ge \frac{1}{2L} [\nabla_{i_k} f(x_k)]^2 \ge \frac{1}{2nL} \|\nabla f(x_k)\|^2 \ge \frac{1}{2nLR^2} (f(x_k) - f^*)^2.

Hence, f(x_k) - f^* \le \frac{2nLR^2}{k}, k \ge 1. (For the gradient method, drop the factor n.)

This is the only known theoretical result for CDM!

Criticism

Theoretical justification: complexity bounds are not known for most of the schemes. The only justified scheme needs computation of the whole gradient. (Why not use the gradient method then?)

Computational complexity: fast differentiation — if the function is defined by a sequence of operations, then C(\nabla f) \le 4 C(f). Can we do anything without computing the function's values?

Result: CDM are almost out of computational practice.

Google problem

Let E \in R^{n \times n} be an incidence matrix of a graph. Denote e = (1, ..., 1)^T and \bar{E} = E \cdot diag(E^T e)^{-1}. Thus \bar{E}^T e = e.

Our problem is as follows: find x^* \ge 0 such that \bar{E} x^* = x^*.

Optimization formulation:

f(x) := \frac{1}{2} \|\bar{E} x - x\|^2 + \frac{\gamma}{2} [\langle e, x \rangle - 1]^2 \to \min_{x \in R^n}.
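
A small sketch of this construction with scipy.sparse; the toy link matrix and the value gamma = 1 are assumptions made for the illustration:

```python
import numpy as np
import scipy.sparse as sp

def google_objective(E, gamma=1.0):
    """Build Ebar = E * diag(E^T e)^{-1} and f(x) = 1/2||Ebar x - x||^2 + gamma/2 (<e,x> - 1)^2."""
    deg = np.asarray(E.sum(axis=0)).ravel()          # column sums = E^T e
    Ebar = E @ sp.diags(1.0 / deg)                   # column-stochastic matrix
    def f(x):
        r = Ebar @ x - x
        return 0.5 * r @ r + 0.5 * gamma * (x.sum() - 1.0) ** 2
    return Ebar, f

# tiny directed graph given by a 0/1 link matrix (column j lists the outgoing links of page j)
E = sp.csc_matrix(np.array([[0, 1, 1],
                            [1, 0, 1],
                            [1, 1, 0]], dtype=float))
Ebar, f = google_objective(E)
x = np.full(3, 1.0 / 3.0)                            # uniform vector is the fixed point here
print(f(x))                                          # ~0: Ebar x = x and <e, x> = 1
```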

Huge-scale problems

Main features:
- The size is very big (n \ge 10^7).
- The data is distributed in space.
- The requested parts of the data are not always available.
- The data is changing in time.

Consequences: the simplest operations become expensive or infeasible:
- Update of the full vector of variables.
- Matrix-vector multiplication.
- Computation of the objective function's value, etc.

Structure of the Google problem

Let us look at the gradient of the objective:

\nabla_i f(x) = \langle a_i, g(x) \rangle + \gamma [\langle e, x \rangle - 1], i = 1, ..., n,

where g(x) = \bar{E} x - x \in R^n and \bar{E} = (a_1, ..., a_n).

Main observations:
- The coordinate move x_+ = x - h_i \nabla_i f(x) e_i needs O(p_i) a.o. (p_i is the number of nonzero elements in a_i).
- The diagonal elements d_i = (diag \nabla^2 f)_i of the Hessian \nabla^2 f := \bar{E}^T \bar{E} + \gamma e e^T, namely d_i = \gamma + 1/p_i, are available. We can use them for choosing the step sizes (h_i = 1/d_i).

Reasonable coordinate choice strategy? Random!
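
A sketch of such an O(p_i) coordinate move in Python. CSC storage, the no-self-links assumption, the exact diagonal Hessian entry used for the step, and the tiny cyclic driver are illustrative choices, not the slide's prescription; the partial derivative is taken directly from the definition of f above, which adds a -g^(i) term to the shorthand formula:

```python
import numpy as np
import scipy.sparse as sp

def coordinate_step(Ebar, x, g, s, i, gamma):
    """One coordinate move for f(x) = 1/2||Ebar x - x||^2 + gamma/2 (<e,x> - 1)^2.

    Ebar is CSC (column a_i available in O(p_i)); g = Ebar x - x and s = <e,x> - 1
    are maintained incrementally, so the whole move costs O(p_i) a.o.
    Assumes no self-links (Ebar_ii = 0).
    """
    lo, hi = Ebar.indptr[i], Ebar.indptr[i + 1]
    idx, vals = Ebar.indices[lo:hi], Ebar.data[lo:hi]   # support and values of a_i
    grad_i = vals @ g[idx] - g[i] + gamma * s           # df/dx_i = <a_i - e_i, g> + gamma*s
    d_i = vals @ vals + 1.0 + gamma                     # exact diagonal Hessian entry
    delta = -grad_i / d_i                               # step h_i = 1/d_i
    x[i] += delta
    g[idx] += delta * vals                              # Ebar x changes only on the support of a_i
    g[i] -= delta                                       # ... and the -x part changes in entry i
    return s + delta

# hypothetical driver on a tiny column-stochastic matrix with zero diagonal
Ebar = sp.csc_matrix(np.array([[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]]))
x = np.zeros(3)
g = Ebar @ x - x
s = x.sum() - 1.0
for k in range(200):
    s = coordinate_step(Ebar, x, g, s, k % 3, gamma=1.0)
print(x)                                                # approaches the normalized fixed point
```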

Random coordinate descent methods (RCDM)

\min_{x \in R^N} f(x) (f is convex and differentiable).

Main assumption: |\nabla_i f(x + h_i e_i) - \nabla_i f(x)| \le L_i |h_i|, h_i \in R, i = 1, ..., N, where e_i is a coordinate vector. Then

f(x + h_i e_i) \le f(x) + \nabla_i f(x) h_i + \frac{L_i}{2} h_i^2, x \in R^N, h_i \in R.

Define the coordinate steps: T_i(x) := x - \frac{1}{L_i} \nabla_i f(x) e_i. Then

f(x) - f(T_i(x)) \ge \frac{1}{2 L_i} [\nabla_i f(x)]^2, i = 1, ..., N.

Random coordinate choice

We need a special random counter R_\alpha, \alpha \in R:

Prob[i] = p_\alpha^{(i)} = L_i^\alpha \left[ \sum_{j=1}^N L_j^\alpha \right]^{-1}, i = 1, ..., N.

Note: R_0 generates the uniform distribution.

Method RCDM(\alpha, x_0). For k \ge 0 iterate:
1) Choose i_k = R_\alpha.
2) Update x_{k+1} = T_{i_k}(x_k).
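
A compact sketch of RCDM(alpha, x_0) in Python. The coordinate-gradient oracle, the quadratic test problem and the fixed iteration budget are assumptions; the sampler here is the trivial O(N) one, an O(log N) counter is sketched further below:

```python
import numpy as np

def rcdm(grad_i, L, x0, alpha=0.0, iters=5000, rng=np.random.default_rng(0)):
    """Random coordinate descent: pick i with prob L_i^alpha / sum_j L_j^alpha, step 1/L_i."""
    x = x0.copy()
    p = L ** alpha
    p /= p.sum()
    for _ in range(iters):
        i = rng.choice(len(x), p=p)          # random counter R_alpha
        x[i] -= grad_i(x, i) / L[i]          # T_i(x) = x - (1/L_i) grad_i f(x) e_i
    return x

# toy example: f(x) = 1/2 <Ax, x> - <b, x>, so grad_i f(x) = <A_i, x> - b_i and L_i = A_ii
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])
grad_i = lambda x, i: A[i] @ x - b[i]
L = np.diag(A).copy()
print(rcdm(grad_i, L, np.zeros(3), alpha=1.0), np.linalg.solve(A, b))
```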

Complexity bounds for RCDM

We need the following norms for x, g \in R^N:

\|x\|_\alpha = \left[ \sum_{i=1}^N L_i^\alpha [x^{(i)}]^2 \right]^{1/2}, \|g\|_\alpha^* = \left[ \sum_{i=1}^N \frac{1}{L_i^\alpha} [g^{(i)}]^2 \right]^{1/2}.

After k iterations, RCDM(\alpha, x_0) generates a random output x_k, which depends on \xi_k = \{i_0, ..., i_k\}. Denote \phi_k = E_{\xi_{k-1}} f(x_k).

Theorem. For any k \ge 1 we have

\phi_k - f^* \le \frac{2}{k} \left[ \sum_{j=1}^N L_j^\alpha \right] R_{1-\alpha}^2(x_0),

where R_\beta(x_0) = \max_x \left\{ \max_{x^* \in X^*} \|x - x^*\|_\beta : f(x) \le f(x_0) \right\}.

Interpretation

1. \alpha = 0. Then S_0 = N, and we get

\phi_k - f^* \le \frac{2N}{k} R_1^2(x_0).

Note: we use the metric \|x\|_1^2 = \sum_{i=1}^N L_i [x^{(i)}]^2. A matrix with diagonal \{L_i\}_{i=1}^N can have its norm in this metric equal to N. Hence, for GM we can only guarantee the same bound, but its iteration cost is much higher!

Interpretation

2. \alpha = 1/2. Denote

D_\infty(x_0) = \max_x \left\{ \max_{y \in X^*} \max_{1 \le i \le N} |x^{(i)} - y^{(i)}| : f(x) \le f(x_0) \right\}.

Then R_{1/2}^2(x_0) \le S_{1/2} D_\infty^2(x_0), and we obtain

\phi_k - f^* \le \frac{2}{k} \left[ \sum_{i=1}^N L_i^{1/2} \right]^2 D_\infty^2(x_0).

For first-order methods, the worst-case complexity of minimizing over a box depends on N. Since S_{1/2} can be bounded, RCDM can be applied in situations where the usual GM fails.

Interpretation

3. \alpha = 1. Then R_0(x_0) is the size of the initial level set in the standard Euclidean norm. Hence,

\phi_k - f^* \le \frac{2}{k} \left[ \sum_{i=1}^N L_i \right] R_0^2(x_0) = \frac{2N}{k} \left[ \frac{1}{N} \sum_{i=1}^N L_i \right] R_0^2(x_0).

The rate of convergence of GM can be estimated as f(x_k) - f^* \le \frac{\gamma}{k} R_0^2(x_0), where \gamma satisfies the condition \nabla^2 f(x) \preceq \gamma I, x \in R^N.

Note: the maximal eigenvalue of a symmetric matrix can reach its trace. In the worst case, the rate of convergence of GM is the same as that of RCDM.

Minimizing strongly convex functions

Theorem. Let f(x) be strongly convex with respect to \|\cdot\|_{1-\alpha} with convexity parameter \sigma_{1-\alpha} > 0. Then, for \{x_k\} generated by RCDM(\alpha, x_0) we have

\phi_k - \phi^* \le \left( 1 - \frac{\sigma_{1-\alpha}}{S_\alpha} \right)^k (f(x_0) - f^*).

Proof: Let x_k be generated by RCDM after k iterations. Let us estimate the expected result of the next iteration:

f(x_k) - E_{i_k}(f(x_{k+1})) = \sum_{i=1}^N p_\alpha^{(i)} [f(x_k) - f(T_i(x_k))]
\ge \sum_{i=1}^N \frac{p_\alpha^{(i)}}{2 L_i} [\nabla_i f(x_k)]^2
= \frac{1}{2 S_\alpha} \left( \|\nabla f(x_k)\|_{1-\alpha}^* \right)^2
\ge \frac{\sigma_{1-\alpha}}{S_\alpha} (f(x_k) - f^*).

It remains to compute the expectation in \xi_{k-1}.

Confidence level of the answers

Note: we have proved that the expected values of the random f(x_k) are good. Can we guarantee anything after a single run?

Confidence level: probability \beta \in (0, 1) that some statement about the random output is correct.

Main tool: Chebyshev inequality (\xi \ge 0): Prob[\xi \ge T] \le \frac{E(\xi)}{T}.

Our situation:

Prob[f(x_k) - f^* \ge \epsilon] \le \frac{1}{\epsilon} [\phi_k - f^*] \le 1 - \beta.

So we need \phi_k - f^* \le \epsilon (1 - \beta). Too expensive for \beta \approx 1?

Regularization technique

Consider f_\mu(x) = f(x) + \frac{\mu}{2} \|x - x_0\|_{1-\alpha}^2. It is strongly convex. Therefore, we can obtain \phi_k - f_\mu^* \le \epsilon (1 - \beta) in O\left( \frac{S_\alpha}{\mu} \ln \frac{1}{\epsilon (1 - \beta)} \right) iterations.

Theorem. Define \alpha = 1, \mu = \frac{\epsilon}{4 R_0^2(x_0)}, and choose

k \ge 1 + \frac{8 S_1 R_0^2(x_0)}{\epsilon} \left[ \ln \frac{2 S_1 R_0^2(x_0)}{\epsilon} + \ln \frac{1}{1 - \beta} \right].

Let x_k be generated by RCDM(1, x_0) as applied to f_\mu. Then Prob(f(x_k) - f^* \le \epsilon) \ge \beta.

Note: for \beta = 1 - 10^{-p} we have \ln \frac{1}{1 - \beta} = \ln 10^p \approx 2.3 p.

Implementation details: random counter

Given the values L_i, i = 1, ..., N, generate efficiently a random i \in \{1, ..., N\} with probabilities Prob[i = k] = L_k / \sum_{j=1}^N L_j.

Solution:
a) Trivial: O(N) operations.
b) Assume N = 2^p. Define p + 1 vectors S_k \in R^{2^{p-k}}, k = 0, ..., p:

S_0^{(i)} = L_i, i = 1, ..., N,
S_k^{(i)} = S_{k-1}^{(2i)} + S_{k-1}^{(2i-1)}, i = 1, ..., 2^{p-k}, k = 1, ..., p.

Algorithm: make the choice in p steps, from top to bottom. If element i of S_k is chosen, then choose in S_{k-1} either 2i or 2i - 1 with probabilities S_{k-1}^{(2i)} / S_k^{(i)} or S_{k-1}^{(2i-1)} / S_k^{(i)}.

Difference: for N = 2^{20} > 10^6 we need only p = \log_2 N = 20 steps.
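
A sketch of this counter in Python (the class name and API are illustrative; N is padded to a power of two, and both sampling and changing a single L_i cost O(log N)):

```python
import random

class RandomCounter:
    """Sample i with probability L_i / sum_j L_j in O(log N) via a binary tree of partial sums."""
    def __init__(self, L):
        self.n = 1
        while self.n < len(L):                 # pad to N = 2^p
            self.n *= 2
        self.tree = [0.0] * (2 * self.n)       # tree[1] = total sum, leaves at tree[n:2n]
        self.tree[self.n:self.n + len(L)] = list(L)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def sample(self, rng=random):
        u = rng.random() * self.tree[1]
        i = 1
        while i < self.n:                      # descend p = log2(N) levels
            i *= 2                             # go to the left child ...
            if u >= self.tree[i]:              # ... or to the right one
                u -= self.tree[i]
                i += 1
        return i - self.n                      # leaf index = chosen coordinate

    def update(self, i, new_L):                # change L_i and refresh the partial sums
        i += self.n
        self.tree[i] = new_L
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

counter = RandomCounter([3.0, 1.0, 4.0, 2.0])
print(sum(counter.sample() == 2 for _ in range(10000)) / 10000)   # ~0.4
```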

Sparse problems

Problem: \min_{x \in Q} f(x), where Q is closed and convex in R^N, and f(x) = \Psi(Ax), where \Psi is a simple convex function:

\Psi(y_1) \ge \Psi(y_2) + \langle \Psi'(y_2), y_1 - y_2 \rangle, y_1, y_2 \in R^M,

and A : R^N \to R^M is a sparse matrix.

Let p(x) := number of nonzeros in x. Sparsity coefficient: \gamma(A) := \frac{p(A)}{MN}.

Example 1: matrix-vector multiplication. Computation of the vector Ax needs p(A) operations, so the initial complexity MN is reduced by the factor \gamma(A).

Gradient method

x_0 \in Q, x_{k+1} = \pi_Q(x_k - h \nabla f(x_k)), k \ge 0.

Main computational expenses:
- Projection onto a simple set Q needs O(N) operations.
- Displacement x_k \to x_k - h \nabla f(x_k) needs O(N) operations.
- \nabla f(x) = A^T \Psi'(Ax). If \Psi is simple, then the main effort is spent on two matrix-vector multiplications: 2 p(A).

Conclusion: as compared with full matrices, we accelerate by the factor \gamma(A).

Note: for Large- and Huge-scale problems we often have \gamma(A) \approx 10^{-4} ... 10^{-6}. Can we get more?

Sparse updating strategy

Main idea: after the update x_+ = x + d we have

y_+ := A x_+ = Ax + Ad = y + Ad.

What happens if d is sparse? Denote \sigma(d) = \{j : d^{(j)} \ne 0\}. Then

y_+ = y + \sum_{j \in \sigma(d)} d^{(j)} A e_j.

Its complexity, \kappa_A(d) := \sum_{j \in \sigma(d)} p(A e_j), can be VERY small! Indeed,

\kappa_A(d) = M \sum_{j \in \sigma(d)} \gamma(A e_j) = \gamma(d) \left[ \frac{1}{p(d)} \sum_{j \in \sigma(d)} \gamma(A e_j) \right] MN \le \gamma(d) \max_{1 \le j \le N} \gamma(A e_j) \cdot MN.

If \gamma(d) \le c \gamma(A) and \gamma(A e_j) \le c \gamma(A), then \kappa_A(d) \le c^2 \gamma^2(A) \cdot MN.

Expected acceleration: (10^{-6})^2 = 10^{-12}, i.e. 1 sec instead of 32,000 years.
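
A sketch of this update with a CSC matrix in Python (the random test matrix and the helper name are assumptions): after a sparse change d of x, the product y = Ax is refreshed by touching only the columns in sigma(d).

```python
import numpy as np
import scipy.sparse as sp

def sparse_update(A_csc, y, d_idx, d_vals):
    """Refresh y = A x in place after x+ = x + d, where d is nonzero only on d_idx.

    Cost: kappa_A(d) = sum of nnz(A e_j) over j in sigma(d), instead of a full product A x+.
    """
    for j, dj in zip(d_idx, d_vals):
        lo, hi = A_csc.indptr[j], A_csc.indptr[j + 1]
        y[A_csc.indices[lo:hi]] += dj * A_csc.data[lo:hi]   # y += d_j * A e_j
    return y

rng = np.random.default_rng(0)
A = sp.random(200, 300, density=0.01, format="csc", random_state=0)
x = rng.standard_normal(300)
y = A @ x
d_idx, d_vals = [5, 42], [0.3, -1.2]        # a 2-sparse direction d
x_plus = x.copy()
x_plus[d_idx] += d_vals
sparse_update(A, y, d_idx, d_vals)
print(np.allclose(y, A @ x_plus))           # True
```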

When can it work?

Simple methods: no full-vector operations! (Is it possible?)

Simple problems: functions with sparse gradients.

Examples:
1. Quadratic function f(x) = \frac{1}{2} \langle Ax, x \rangle - \langle b, x \rangle. The gradient \nabla f(x) = Ax - b, x \in R^N, is not sparse even if A is sparse.
2. Piecewise linear function g(x) = \max_{1 \le i \le m} [\langle a_i, x \rangle - b^{(i)}]. Its subgradient g'(x) = a_{i(x)}, with i(x) : g(x) = \langle a_{i(x)}, x \rangle - b^{(i(x))}, can be sparse if a_{i(x)} is sparse!

But: we need a fast procedure for updating max-operations.

Fast updates in short computational trees

Def: A function f(x), x \in R^n, is short-tree representable if it can be computed by a short binary tree of height \approx \ln n.

Let n = 2^k and the tree have k + 1 levels: v_{0,i} = x^{(i)}, i = 1, ..., n. The size of each next level is half the size of the previous one:

v_{i+1,j} = \psi_{i+1,j}(v_{i,2j-1}, v_{i,2j}), j = 1, ..., 2^{k-i-1}, i = 0, ..., k - 1,

where \psi_{i,j} are some bivariate functions.

(Figure: binary tree with leaves v_{0,1}, ..., v_{0,n}, intermediate levels v_{1,*}, ..., v_{k-1,*}, and root v_{k,1}.)

Main advantages

Important examples (symmetric functions):

f(x) = \|x\|_p, p \ge 1: \psi_{i,j}(t_1, t_2) \equiv [|t_1|^p + |t_2|^p]^{1/p};
f(x) = \ln \left( \sum_{i=1}^n e^{x^{(i)}} \right): \psi_{i,j}(t_1, t_2) \equiv \ln(e^{t_1} + e^{t_2});
f(x) = \max_{1 \le i \le n} x^{(i)}: \psi_{i,j}(t_1, t_2) \equiv \max\{t_1, t_2\}.

The binary tree requires only n - 1 auxiliary cells. Its value needs n - 1 applications of \psi_{i,j}(\cdot, \cdot) (operations). If x_+ differs from x in one entry only, then re-computing f(x_+) needs only k = \log_2 n operations.

Thus, we can have pure subgradient minimization schemes with sublinear iteration cost.
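
A sketch for the max example in Python (an array-based tree with an illustrative class name; n is padded to a power of two, and a single-entry change of x touches only log2 n cells on the path to the root):

```python
import math

class MaxTree:
    """Short binary tree for f(x) = max_i x_i with O(log n) updates of a single entry."""
    def __init__(self, x):
        self.n = 1 << math.ceil(math.log2(len(x)))     # pad to n = 2^k
        self.v = [float("-inf")] * (2 * self.n)
        self.v[self.n:self.n + len(x)] = list(x)
        for i in range(self.n - 1, 0, -1):             # v_{i+1,j} = max(v_{i,2j-1}, v_{i,2j})
            self.v[i] = max(self.v[2 * i], self.v[2 * i + 1])

    def value(self):
        return self.v[1]                               # root = f(x)

    def set(self, i, xi):                              # change one entry of x
        i += self.n
        self.v[i] = xi
        while i > 1:                                   # re-compute only the path to the root
            i //= 2
            self.v[i] = max(self.v[2 * i], self.v[2 * i + 1])

t = MaxTree([3.0, -1.0, 2.0, 0.5])
t.set(0, -5.0)                                         # only log2(4) = 2 internal cells change
print(t.value())                                       # 2.0
```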

Simple subgradient methods

I. Problem: f^* := \min_{x \in Q} f(x), where Q is closed and convex, \|f'(x)\| \le L(f) for x \in Q, and the optimal value f^* is known.

Consider the following optimization scheme (B. Polyak, 1967):

x_0 \in Q, x_{k+1} = \pi_Q \left( x_k - \frac{f(x_k) - f^*}{\|f'(x_k)\|^2} f'(x_k) \right), k \ge 0.

Denote f_k^* = \min_{0 \le i \le k} f(x_i). Then for any k \ge 0 we have:

f_k^* - f^* \le \frac{L(f) \|x_0 - \pi_{X^*}(x_0)\|}{(k+1)^{1/2}}, \|x_k - x^*\| \le \|x_0 - x^*\|, x^* \in X^*.
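
A sketch of Polyak's step rule in Python. The l1 test problem with known f* = 0 and the unconstrained setting Q = R^n are assumptions; with a nontrivial Q one would project each iterate back onto Q:

```python
import numpy as np

def polyak_method(f, subgrad, f_star, x0, iters=2000):
    """Subgradient method with Polyak's step (f* known): x+ = x - (f(x) - f*)/||g||^2 * g."""
    x, best = x0.copy(), np.inf
    for _ in range(iters):
        g = subgrad(x)
        fx = f(x)
        best = min(best, fx)
        if fx <= f_star or not g.any():
            break
        x = x - (fx - f_star) / (g @ g) * g      # apply pi_Q here if Q != R^n
    return x, best

# toy example: f(x) = ||A x - b||_1 with a consistent system, so f* = 0
A = np.array([[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]])
b = A @ np.array([1.0, -1.0])
f = lambda x: np.abs(A @ x - b).sum()
subgrad = lambda x: A.T @ np.sign(A @ x - b)
x, best = polyak_method(f, subgrad, 0.0, np.zeros(2))
print(x, best)                                   # x close to (1, -1), best close to 0
```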

Proof: Let us fix x^* \in X^*. Denote r_k(x^*) = \|x_k - x^*\|. Then

r_{k+1}^2(x^*) \le \left\| x_k - \frac{f(x_k) - f^*}{\|f'(x_k)\|^2} f'(x_k) - x^* \right\|^2
= r_k^2(x^*) - 2 \frac{f(x_k) - f^*}{\|f'(x_k)\|^2} \langle f'(x_k), x_k - x^* \rangle + \frac{(f(x_k) - f^*)^2}{\|f'(x_k)\|^2}
\le r_k^2(x^*) - \frac{(f(x_k) - f^*)^2}{\|f'(x_k)\|^2} \le r_k^2(x^*) - \frac{(f_k^* - f^*)^2}{L^2(f)}.

From this reasoning, \|x_{k+1} - x^*\|^2 \le \|x_k - x^*\|^2 for all x^* \in X^*.

Corollary: Assume X^* has a recession direction d^*. Then

\|x_k - \pi_{X^*}(x_0)\| \le \|x_0 - \pi_{X^*}(x_0)\|, \langle d^*, x_k \rangle \ge \langle d^*, x_0 \rangle.

(Proof: consider x^* = \pi_{X^*}(x_0) + \alpha d^*, \alpha \ge 0.)

Constrained minimization (N. Shor (1964) & B. Polyak)

II. Problem: \min_{x \in Q} \{f(x) : g(x) \le 0\}, where Q is closed and convex, and f, g have uniformly bounded subgradients.

Consider the following method with step-size parameter h > 0:

If g(x_k) > h \|g'(x_k)\|, then (A): x_{k+1} = \pi_Q \left( x_k - \frac{g(x_k)}{\|g'(x_k)\|^2} g'(x_k) \right),
else (B): x_{k+1} = \pi_Q \left( x_k - \frac{h}{\|f'(x_k)\|} f'(x_k) \right).

Let F_k \subseteq \{0, ..., k\} be the set of (B)-iterations, and f_k^* = \min_{i \in F_k} f(x_i).

Theorem: If k > \|x_0 - x^*\|^2 / h^2, then F_k \ne \emptyset and

f_k^* - f^* \le h L(f), \max_{i \in F_k} g(x_i) \le h L(g).
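
A sketch of the switching rule in Python (Q = R^n here for simplicity; the tolerance h, the iteration budget and the toy problem are assumptions made for the example):

```python
import numpy as np

def switching_subgradient(f, fg, g, gg, x0, h, iters=5000):
    """Shor/Polyak switching scheme: constraint step when g(x) > h*||g'(x)||, else an f-step."""
    x, best = x0.copy(), np.inf
    for _ in range(iters):
        gx, gsub = g(x), gg(x)
        if gx > h * np.linalg.norm(gsub):                # (A): reduce the constraint violation
            x = x - gx / (gsub @ gsub) * gsub
        else:                                            # (B): objective step of length h
            best = min(best, f(x))                       # record value over the (B)-iterations F_k
            fsub = fg(x)
            x = x - h / np.linalg.norm(fsub) * fsub
    return x, best

# toy problem: min x1 + x2  s.t.  ||x||^2 - 1 <= 0
f  = lambda x: x[0] + x[1]
fg = lambda x: np.array([1.0, 1.0])
g  = lambda x: x @ x - 1.0
gg = lambda x: 2.0 * x
x, best = switching_subgradient(f, fg, g, gg, np.array([2.0, 2.0]), h=1e-3)
print(x, best)                 # close to (-1/sqrt(2), -1/sqrt(2)), best ~ -sqrt(2)
```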

Computational strategies

1. Constants L(f), L(g) are known (e.g. Linear Programming). We can take h = \frac{\epsilon}{\max\{L(f), L(g)\}}. Then we need to decide on the number of steps N (easy!). Note: the standard advice is h = \frac{R}{\sqrt{N+1}} (much more difficult!).

2. Constants L(f), L(g) are not known. Start from a guess. Restart from scratch each time we see that the guess is wrong. The guess is doubled after each restart.

3. Tracking the record value f_k^*. Double run.

Other ideas are welcome!

Application examples

Observations:
1. Very often, Large- and Huge-scale problems have repetitive sparsity patterns and/or limited connectivity:
- Social networks.
- Mobile phone networks.
- Truss topology design (local bars).
- Finite element models (2D: four neighbors, 3D: six neighbors).
2. For p-diagonal matrices, \kappa(A) \le p^2.

Nonsmooth formulation of the Google problem

Main property of the spectral radius (A \ge 0): if A \in R_+^{n \times n}, then

\rho(A) = \min_{x > 0} \max_{1 \le i \le n} \frac{1}{x^{(i)}} \langle e_i, Ax \rangle.

The minimum is attained at the corresponding eigenvector.

Since \rho(\bar{E}) = 1, our problem is as follows:

f(x) := \max_{1 \le i \le N} [\langle e_i, \bar{E} x \rangle - x^{(i)}] \to \min_{x \ge 0}.

Interpretation: maximizing the self-esteem!

Since f^* = 0, we can apply Polyak's method with sparse updates.

Additional features: the optimal set X^* is a convex cone. If x_0 = e, then the whole sequence is separated from zero:

\langle x^*, e \rangle \le \langle x^*, x_k \rangle \le \|x^*\|_1 \|x_k\|_\infty = \langle x^*, e \rangle \|x_k\|_\infty,

hence \|x_k\|_\infty \ge 1.

Goal: find \bar{x} \ge 0 such that \|\bar{x}\|_\infty \ge 1 and f(\bar{x}) \le \epsilon. (The first condition is satisfied automatically.)
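
A sketch of this formulation in Python (dense and without the sparse bookkeeping, purely for readability; the small test matrix is an assumption). Since f* = 0 the Polyak step needs no parameters, and the subgradient is a single row of Ebar minus a coordinate vector:

```python
import numpy as np

def pagerank_polyak(Ebar, iters=5000):
    """Minimize f(x) = max_i [(Ebar x)_i - x_i] over x >= 0 with Polyak's step (f* = 0)."""
    x = np.ones(Ebar.shape[0])                      # x_0 = e keeps the sequence away from zero
    for _ in range(iters):
        y = Ebar @ x - x
        i = int(np.argmax(y))
        fx = y[i]
        if fx <= 1e-12:
            break
        g = Ebar[i].copy()                          # subgradient: row i of Ebar minus e_i
        g[i] -= 1.0
        x = np.maximum(x - fx / (g @ g) * g, 0.0)   # Polyak step, then projection onto x >= 0
    return x / x.sum()

# tiny column-stochastic test matrix (columns sum to one, zero diagonal)
Ebar = np.array([[0.0, 0.5, 0.3],
                 [0.5, 0.0, 0.7],
                 [0.5, 0.5, 0.0]])
print(pagerank_polyak(Ebar))                        # roughly the Perron eigenvector, ~[0.29, 0.38, 0.33]
```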

Computational experiments: iteration cost

We compare Polyak's GM with sparse updates (GM_s) with the standard one (GM).

Setup: each agent has exactly p random friends. Thus \kappa(A) \approx p^2.

Iteration cost: GM_s: p^2 \log_2 N; GM: pN.

Time for 10^4 iterations (p = 32):

N      κ(A)   GM_s   GM
1024   1632   3.00   2.98
2048   1792   3.36   6.41
4096   1888   3.75   15.11
8192   1920   4.20   139.92
16384  1824   4.69   408.38

Time for 10^3 iterations (p = 16):

N        κ(A)  GM_s   GM
131072   576   0.19   213.9
262144   592   0.25   477.8
524288   592   0.32   1095.5
1048576  608   0.40   2590.8

1 sec against 100 min!