Olga Galinina (olga.galinina@tut.fi), ELT-53656 Network Analysis and Dimensioning II, Department of Electronics and Communications Engineering, Tampere University of Technology, Tampere, Finland. January 29, 2014
A Bit of History... Leonhard Euler (1707-1783): "nothing at all takes place in the Universe in which some rule of maximum or minimum does not appear"
Where Does Network Optimization Arise? The optimization discipline deals with finding the maxima and minima of functions subject to some constraints.
Transportation Systems: transportation of goods over transportation networks; scheduling of fleets of airplanes.
Manufacturing Systems: scheduling of goods for manufacturing; flow of manufactured items within inventory systems.
Communication Systems: design and expansion of communication systems; flow of information across networks.
Energy Systems, Financial Systems, and much more.
Examples
Portfolio optimization: variables: amounts invested in different assets; constraints: budget, max./min. investment per asset, minimum return; objective: overall risk or return variance.
Device sizing in electronic circuits: variables: device widths and lengths; constraints: manufacturing limits, timing requirements, maximum area; objective: power consumption.
Data fitting: variables: model parameters; constraints: prior information, parameter limits; objective: measure of prediction error.
Conventional Design Method
Initial design -> System analysis -> Is the design satisfactory? If no, correct the design based on experience and repeat.
Depends on the designer's intuition, experience, and skills. A trial-and-error method: not easy to apply to a complex system, and does not always lead to the best possible design. Qualitative design.
Mathematical Model
Consider an optimization (mathematical programming) problem:
minimize f(x), x ∈ R^n, subject to x ∈ Ω.
Definition. The function f(x) : R^n → R is a real-valued function, called the objective function, or cost function.
Definition. The variables x = [x_1, ..., x_n] are decision variables.
Definition. An optimal solution x^0 has the smallest value of f(x) among all feasible vectors.
Constraint Set
Definition. The set Ω ⊆ R^n is the constraint set, or feasible set/region.
Definition. The above problem is the general form of a constrained optimization problem. If Ω = R^n, we refer to the problem as unconstrained.
Ω typically takes the form {x : h_i(x) = 0, g_j(x) ≤ 0}, where h_i(x), g_j(x) are constraint functions.
Definition. If Ω = ∅, the problem is infeasible; otherwise it is feasible.
Solving Optimization Problems
The general optimization problem is very difficult to solve; methods involve some compromise, e.g., very long computation time, or not always finding the solution.
Exceptions: certain problem classes can be solved efficiently and reliably:
least-squares problems
linear programming problems
convex optimization problems
Least-squares problem
minimize ||Ax − b||_2^2
analytical solution: x^0 = (A^T A)^{−1} A^T b
reliable and efficient algorithms and software
computation time proportional to n^2 k (A ∈ R^{k×n}); less if structured
a mature technology
Using least-squares: least-squares problems are easy to recognize; a few standard techniques increase flexibility (e.g., including weights, adding regularization terms)
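As an illustration (not from the slides), the normal-equations solution above can be sketched for a straight-line fit y ≈ m·x + c; the data points and the 2×2 Cramer's-rule solve are made-up example choices:

```python
# Ordinary least squares for a line fit y ~ m*x + c via the normal
# equations (A^T A) z = A^T b, where A = [[x_i, 1], ...], b = [y_i, ...].
# The data points below are made-up example values.

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.9, 5.1, 7.0, 9.1]   # roughly y = 2x + 1 with noise

n = len(xs)
sxx = sum(x * x for x in xs)     # entries of A^T A and A^T b
sx = sum(xs)
sxy = sum(x * y for x, y in zip(xs, ys))
sy = sum(ys)

# Solve the 2x2 system [[sxx, sx], [sx, n]] [m, c] = [sxy, sy] by Cramer's rule.
det = sxx * n - sx * sx
m = (sxy * n - sx * sy) / det
c = (sxx * sy - sx * sxy) / det

print(f"fit: y = {m:.3f} x + {c:.3f}")   # -> fit: y = 2.030 x + 0.960
```

In practice a dedicated solver (e.g., a QR-based routine) is preferred over forming A^T A explicitly, for numerical stability.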
Linear programming
minimize c^T x subject to a_i^T x ≤ b_i, i = 1, ..., m
Solving linear programs: no analytical formula for the solution; reliable and efficient algorithms and software; computation time proportional to n^2 m if m ≥ n; less with structure; a mature technology.
Using linear programming: not as easy to recognize as least-squares problems; a few standard tricks are used to convert problems into linear programs (e.g., problems involving l_1- or l_∞-norms, piecewise-linear functions)
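A tiny two-variable LP can be solved by brute-force vertex enumeration, exploiting the fact that an optimum of a bounded feasible LP lies at a vertex (an intersection of two active constraints). This sketch, with made-up data, is for illustration only; real solvers use the simplex or interior-point methods:

```python
# minimize c^T x subject to a_i^T x <= b_i, by enumerating vertices.
from itertools import combinations

c = (-1.0, -2.0)                             # minimize -x1 - 2*x2
A = [(1.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]   # x1 + x2 <= 4, x1 >= 0, x2 >= 0
b = [4.0, 0.0, 0.0]

def intersect(i, j):
    """Intersection of constraint boundaries i and j (None if parallel)."""
    (a11, a12), (a21, a22) = A[i], A[j]
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None
    x1 = (b[i] * a22 - a12 * b[j]) / det
    x2 = (a11 * b[j] - b[i] * a21) / det
    return (x1, x2)

def feasible(x):
    return all(ai[0] * x[0] + ai[1] * x[1] <= bi + 1e-9
               for ai, bi in zip(A, b))

vertices = [v for i, j in combinations(range(len(A)), 2)
            if (v := intersect(i, j)) is not None and feasible(v)]
best = min(vertices, key=lambda x: c[0] * x[0] + c[1] * x[1])
print(best)   # -> (0.0, 4.0): put everything into x2
```

Enumerating all constraint pairs is exponential in general, which is exactly why the efficient algorithms mentioned above matter.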
Convex optimization problem
minimize f(x) subject to g_i(x) ≤ b_i, i = 1, ..., m
objective and constraint functions are convex:
g_i(α_1 x + α_2 y) ≤ α_1 g_i(x) + α_2 g_i(y), if α_1 + α_2 = 1, α_1 ≥ 0, α_2 ≥ 0.
includes least-squares problems and linear programs as special cases
Convex optimization problems
Solving convex optimization problems: no analytical solution; reliable and efficient algorithms; computation time (roughly) proportional to max{n^3, n^2 m, F}, where F is the cost of evaluating f(x) and its first and second derivatives; almost a technology.
Using convex optimization: often difficult to recognize; many tricks for transforming problems into convex form; surprisingly many problems can be solved via convex optimization
Nonlinear optimization
Traditional techniques for general nonconvex problems involve compromises.
Local optimization methods (nonlinear programming): find a point that minimizes f among feasible points near it; fast, can handle large problems; require an initial guess; provide no information about the distance to the (global) optimum.
Global optimization methods: find the (global) solution; worst-case complexity grows exponentially with problem size.
These algorithms are often based on solving convex subproblems.
Brief history of convex optimization
Theory (convex analysis): ca. 1900-1970
Algorithms:
1947: simplex algorithm for linear programming (Dantzig)
1960s: early interior-point methods (Fiacco and McCormick, Dikin, ...)
1970s: ellipsoid method and other subgradient methods
1980s: polynomial-time interior-point methods for linear programming (Karmarkar 1984)
late 1980s-now: polynomial-time interior-point methods for nonlinear convex optimization (Nesterov and Nemirovski 1994)
Applications:
before 1990: mostly in operations research; few in engineering
since 1990: many new applications in engineering (control, signal processing, communications, circuit design, ...); new problem classes
Mental break
A kitten has a 50/50 chance to be male or female. My cat just delivered two adorable kittens. My veterinarian said that at least one of them is female. What is the probability that the other kitten is a boy?
There are 4 equally likely variants: female-female, female-male, male-female, male-male. The last possibility is ruled out.
Answer: 2/(4 − 1) = 2/3
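The 2/3 answer can be checked with a quick Monte Carlo simulation (an illustration, not part of the slides): condition on "at least one female" and count how often the pair contains a male.

```python
# Monte Carlo check of the kitten puzzle.
import random

random.seed(4)
hits = total = 0
for _ in range(100_000):
    pair = (random.choice("MF"), random.choice("MF"))
    if "F" in pair:          # the veterinarian's information
        total += 1
        hits += "M" in pair  # the "other" kitten is a boy
print(hits / total)          # close to 2/3
```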
Affine and Convex Sets
Definition. S ⊆ R^n is affine if [x, y ∈ S, α ∈ R] implies αx + (1 − α)y ∈ S.
Definition. S ⊆ R^n is convex if for all [x, y ∈ S, 0 < α < 1]: z = αx + (1 − α)y ∈ S (z is a convex combination of x and y).
If x_1, ..., x_m ∈ R^n, Σ_j α_j = 1, α_j ≥ 0, then x = α_1 x_1 + ... + α_m x_m is a convex combination of x_1, ..., x_m.
The intersection of (any number of) convex sets is convex.
Compact Sets
Let B_δ(x_0) denote the open ball of radius δ centered at the point x_0: B_δ(x_0) = {x : ||x − x_0|| < δ}.
Definition. A set S ⊆ R^n is said to be open if for each point x_0 ∈ S there is δ > 0 such that B_δ(x_0) ⊆ S. A set S ⊆ R^n is said to be closed if its complement R^n \ S is open.
Definition. A set S is compact if each of its open covers has a finite subcover: for every family {C_i}, i ∈ A, with S ⊆ ∪_{i∈A} C_i, there exists a finite J ⊆ A such that S ⊆ ∪_{j∈J} C_j.
Alternative: every sequence in S has a convergent subsequence whose limit lies in S.
Note: If S ⊆ R^n is closed and bounded, then S is compact (Heine-Borel theorem).
Convex functions
Definition. Let C ⊆ R^n be a nonempty convex set. Then f : C → R is convex (on C) if for all x, y ∈ C and all α ∈ (0, 1):
f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y)
If strict inequality holds whenever x ≠ y, then f is said to be strictly convex. The negative of a (strictly) convex function is called a (strictly) concave function.
Convex functions
nonnegative multiple: αf is convex if f is convex, α ≥ 0
sum: f_1 + f_2 is convex if f_1, f_2 are convex (extends to infinite sums, integrals)
composition with an affine function: f(Ax + b) is convex if f is convex
Some univariate convex functions:
1. exponential f(x) = e^{αx} (for all real α)
2. powers f(x) = x^p if x ≥ 0 and 1 ≤ p < ∞
3. powers of absolute value f(x) = |x|^p for p ≥ 1; powers f(x) = x^p if x > 0 and p ≤ 0
Concave:
1. powers f(x) = x^p if x ≥ 0 and 0 ≤ p ≤ 1
2. logarithm: f(x) = log x if x > 0.
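The defining inequality can be probed numerically; as a sanity check (not a proof, and not from the slides), here it is sampled for the convex function f(x) = e^x:

```python
# Sampling check of f(a*x + (1-a)*y) <= a*f(x) + (1-a)*f(y) for exp.
import math
import random

random.seed(1)
f = math.exp
ok = True
for _ in range(10_000):
    x = random.uniform(-3.0, 3.0)
    y = random.uniform(-3.0, 3.0)
    a = random.random()
    # small slack absorbs floating-point rounding
    if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-9:
        ok = False
print(ok)   # -> True
```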
Differentials
f is differentiable if dom(f) is open and the gradient of f exists:
∇f(x) = (∂f/∂x_1, ..., ∂f/∂x_n)^T, x ∈ dom(f)
f is twice differentiable if dom(f) is open and the Hessian of f exists, the n×n matrix:
H = D^2 f(x) = [∂^2 f/∂x_i ∂x_j], i, j = 1, ..., n, x ∈ dom(f)
Note: Not all convex functions are differentiable.
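A gradient formula can be validated against central finite differences; the function below is a hypothetical example, not one from the slides:

```python
# Central-difference check of a hand-computed gradient.
# f(x) = x1**2 + 3*x1*x2, so grad f = (2*x1 + 3*x2, 3*x1).
def f(x1, x2):
    return x1 * x1 + 3 * x1 * x2

def grad_f(x1, x2):
    return (2 * x1 + 3 * x2, 3 * x1)

h = 1e-6
x1, x2 = 1.0, 2.0
fd = ((f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h),
      (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h))
exact = grad_f(x1, x2)    # (8.0, 3.0)
print(all(abs(a - b) < 1e-6 for a, b in zip(fd, exact)))   # -> True
```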
First-order condition
Theorem (gradient inequality). A differentiable f is convex on a convex set C ⊆ R^n if and only if for all x, y ∈ C:
f(y) ≥ f(x) + (∇f(x))^T (y − x).
Theorem. Minimizing a differentiable convex function f(x) s.t. x ∈ C is equivalent to: find x∗ ∈ C such that (∇f(x∗))^T (y − x∗) ≥ 0 for all y ∈ C (a variational inequality problem).
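As an illustration (not from the slides), the gradient inequality can be sampled for the convex function f(x) = x^2, where it reduces to (y − x)^2 ≥ 0:

```python
# Sampling check of f(y) >= f(x) + f'(x)*(y - x) for f(x) = x**2.
import random

random.seed(3)
f = lambda t: t * t
df = lambda t: 2 * t

ok = all(
    f(y) >= f(x) + df(x) * (y - x) - 1e-9   # slack for rounding
    for x, y in ((random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0))
                 for _ in range(10_000))
)
print(ok)   # -> True
```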
Second-order condition
Theorem. A twice differentiable f is convex on C ⊆ R^n if and only if the Hessian matrix ∇^2 f(x) is positive semidefinite for all x ∈ C.
Note: If ∇^2 f(x) is positive definite for all x ∈ C, then f is strictly convex on C. The converse is false; consider, for example, the function f(x) = x^4.
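The x^4 counterexample from the note can be checked numerically (an illustration, not a proof): the function is strictly convex, yet its second derivative vanishes at the origin.

```python
# f(x) = x**4: strictly convex on R, but f''(0) = 12*0**2 = 0, so the
# Hessian is not positive definite everywhere.
import random

random.seed(2)
f = lambda t: t ** 4
d2f = lambda t: 12.0 * t * t

assert d2f(0.0) == 0.0          # not positive definite at the origin
for _ in range(1_000):          # strict convexity still holds on chords
    x = random.uniform(-2.0, 2.0)
    y = random.uniform(-2.0, 2.0)
    a = random.uniform(0.01, 0.99)
    if abs(x - y) > 1e-3:       # skip nearly-equal points (rounding)
        assert f(a * x + (1 - a) * y) < a * f(x) + (1 - a) * f(y)
print("strictly convex, but f''(0) = 0")
```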
Minimization
Find an optimal decision x∗ (a minimizer; we consider minimization WLOG):
Definition. x∗ ∈ Ω is a local minimizer (minimum) of f over Ω if there exists ε > 0 such that f(x) ≥ f(x∗) for all x ∈ Ω ∩ N(x∗), where N(x∗) is a neighborhood of x∗. Typically, N(x∗) is just some open ball B_ε(x∗).
Definition. x∗ ∈ Ω is a global minimizer (minimum) of f over Ω if f(x) ≥ f(x∗) for all x ∈ Ω.
If we replace ≥ with > for all x ≠ x∗, then we have a strict local minimizer and a strict global minimizer, respectively. Then f(x∗) is the global minimum value.
The Method of Lagrange Multipliers
minimize f(x) s.t. c_i(x) = 0, i = 1, ..., m, x ∈ R^n, m ≤ n
Jacobian matrix of the mapping c(x) = (c_1(x), ..., c_m(x)):
∇c(x) = [∂c_i/∂x_j], the m×n matrix whose rows are the gradients ∇c_i(x)^T.
Lagrange theorem. For a local minimizer x∗ and continuously differentiable f, c_1, ..., c_m, there exist multipliers y∗_1, ..., y∗_m such that
∇f(x∗) − Σ_{i=1}^m y∗_i ∇c_i(x∗) = 0
The Method of Lagrange Multipliers
Lagrange multipliers: y_1, ..., y_m
Lagrange function (Lagrangian):
L(x, y) = f(x) − Σ_{i=1}^m y_i c_i(x)
Partial gradients:
∇_x L = (∂L/∂x_1, ..., ∂L/∂x_n) = ∇f(x) − Σ_{i=1}^m y_i ∇c_i(x)
∇_y L = (∂L/∂y_1, ..., ∂L/∂y_m) = −c(x)
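A minimal worked example of the Lagrange conditions (a made-up problem, not from the slides): minimize f(x_1, x_2) = x_1^2 + x_2^2 subject to x_1 + x_2 − 1 = 0. Stationarity gives (2x_1, 2x_2) = y·(1, 1), so x_1 = x_2 = y/2, and the constraint forces y = 1, x_1 = x_2 = 1/2.

```python
# Verify the Lagrange conditions at the candidate point.
def grad_f(x1, x2):
    return (2 * x1, 2 * x2)

grad_c = (1.0, 1.0)      # gradient of c(x) = x1 + x2 - 1

y = 1.0                  # multiplier found above
x1 = x2 = y / 2          # stationary point

gf = grad_f(x1, x2)
assert gf[0] - y * grad_c[0] == 0 and gf[1] - y * grad_c[1] == 0  # stationarity
assert x1 + x2 - 1 == 0                                          # feasibility
print(x1, x2, y)   # -> 0.5 0.5 1.0
```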
The Karush-Kuhn-Tucker Theorem
minimize f(x) s.t. c_i(x) ≥ 0, i = 1, ..., m, and h_i(x) = 0, i = 1, ..., l
An active or binding constraint at x_0: c_i(x_0) = 0.
Theorem. For a local minimizer x∗ and continuously differentiable f, c_i, h_i, there exist multipliers λ_0, λ_1, ..., λ_m, μ_1, ..., μ_l such that
λ_0 ∇f(x∗) − Σ_{i=1}^m λ_i ∇c_i(x∗) − Σ_{i=1}^l μ_i ∇h_i(x∗) = 0
λ_i c_i(x∗) = 0, i = 1, ..., m (complementary slackness)
λ_0, λ_i ≥ 0, i = 1, ..., m (dual feasibility)
c_i(x∗) ≥ 0, h_i(x∗) = 0 (primal feasibility)
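A one-dimensional toy problem (a made-up example, not from the slides) shows the KKT conditions in action: minimize f(x) = (x − 2)^2 subject to c(x) = 1 − x ≥ 0. The unconstrained minimum x = 2 is infeasible, so the constraint is active at x∗ = 1 and its multiplier is strictly positive.

```python
# Verify the KKT conditions at x* = 1 with lambda = 2 (and lambda_0 = 1).
x_star = 1.0
lam = 2.0                  # candidate multiplier

df = 2 * (x_star - 2)      # f'(x*) = -2
dc = -1.0                  # c'(x*) = -1
c = 1 - x_star             # c(x*)  =  0 (active constraint)

assert df - lam * dc == 0          # stationarity
assert lam * c == 0                # complementary slackness
assert lam >= 0 and c >= 0         # dual and primal feasibility
print("KKT conditions hold at x* = 1 with lambda = 2")
```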
Mental break
How many apples at equal distances from each other can I have?
Place three apples on a plane and add one more below the plane so that they form a regular tetrahedron. 3 + 1 = 4.
Computational Complexity
Answers the questions: What is an efficient algorithm? How do we measure efficiency?
Computational complexity of an algorithm: a measure of how many steps the algorithm will require in the worst case for an input of a given size. Computational complexity also classifies problems according to tractability or intractability.
Algorithms
A problem, e.g., Traveling Salesman: given a graph with nodes and edges and costs associated with the edges, what is a least-cost closed walk (or tour) containing each of the nodes exactly once?
An instance of a problem: the graph contains nodes 1, 2, 3, 4, 5, 6, and edges (1, 2) with cost 10, (1, 3) with cost 14, ...
A problem can be thought of as a function p that maps an instance x to an output p(x) (an answer).
An algorithm: a finite procedure for computing p(x) for any given input x.
Measuring Computational Complexity
By counting the number of elementary operations: addition (a + b), subtraction (a − b), multiplication (a · b), finite-precision division (a / b), comparison of two numbers (a < b).
Or by the running time of the algorithm: a simple function of the input size that is a reasonably tight upper bound on the actual number of steps.
Big-O notation. We say that f(t) = O(g(t)), t ≥ 0, if there exists c > 0 such that for all t > 0, f(t) ≤ c·g(t).
Examples: 100(t^2 + t) = O(t^2), but 0.0001 t^3 ≠ O(t^2).
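The two examples can be checked numerically (an illustration; the choice c = 200 is one valid witness, not part of the slides):

```python
# 100*(t**2 + t) = O(t**2): c = 200 works for all t >= 1, since
# 100*t <= 100*t**2 there. By contrast, 0.0001*t**3 outgrows c*t**2
# for any fixed c once t > 10000*c.
c = 200
assert all(100 * (t**2 + t) <= c * t**2 for t in range(1, 10_000))

t = 10_000 * c + 1        # beyond t = 10000*c the cubic term wins
assert 0.0001 * t**3 > c * t**2
print("consistent with the Big-O claims")
```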
Complexity classes