LMI Methods in Optimal and Robust Control


Matthew M. Peet, Arizona State University
Lecture 02: Optimization (Convex and Otherwise)

What is Optimization?
An Optimization Problem has 3 parts:

    min_{x ∈ F} f(x)
    subject to g_i(x) ≤ 0,  i = 1, ..., K_1
               h_i(x) = 0,  i = 1, ..., K_2

Variables: x ∈ F. The things you must choose. F represents the set of possible choices for the variables. These can be vectors, matrices, functions, systems, locations, colors... However, computers prefer vectors or matrices.

Objective: f(x). A function which assigns a scalar value to any choice of variables, e.g. [x_1, x_2] ↦ x_1 - x_2; red ↦ 4; etc.

Constraints: g(x) ≤ 0; h(x) = 0. These define what is a minimally acceptable choice of variables. Equality and inequality constraints are common. x is OK if g(x) ≤ 0 and h(x) = 0. Constraints mean the variables are not independent. (Constrained optimization is much harder.)

Formulating Optimization Problems: What do we need to know?
Topics to Cover:
- Formulating Constraints: tricks of the trade for expressing constraints; converting everything to equality and inequality constraints.
- Equivalence: how to recognize when two optimization problems are equivalent, which may be true despite different variables, constraints and objectives.
- Knowing which Problems are Solvable: the convex ones, and some others if the problem is relatively small.

Least Squares (Unconstrained Optimization)
Problem: Given a bunch of data in the form
- Inputs: a_i
- Outputs: b_i
find the function f(a) = b which best fits the data.
For Least Squares: Assume f(a) = z^T a + z_0, where z ∈ R^n and z_0 ∈ R are the variables, with objective

    h(z) := Σ_{i=1}^K (f(a_i) - b_i)^2 = Σ_{i=1}^K (z^T a_i + z_0 - b_i)^2.

The Optimization Problem is

    min_{z ∈ R^n} ||Az - b||^2,   where  A := [a_1^T 1; ...; a_K^T 1]  and  b := [b_1; ...; b_K].

Least Squares (Unconstrained Optimization)
Least squares problems are easy-ish to solve:

    z* = (A^T A)^{-1} A^T b.

Note that A is assumed to be skinny: more rows than columns, i.e. more data points than inputs (the dimension of a_i is small).
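
For concreteness, here is a minimal MATLAB sketch of this normal-equations solution; the data, dimensions and noise level below are assumptions for illustration, not from the lecture.

    % Least-squares fit of f(a) = z'*a + z0 via the normal equations (assumed example data).
    K = 50; n = 2;
    a = randn(K, n);                       % inputs a_i, one per row
    b = a*[1; -2] + 3 + 0.1*randn(K, 1);   % noisy outputs b_i
    A = [a, ones(K, 1)];                   % stack the rows [a_i^T, 1]
    zbar = (A'*A) \ (A'*b);                % zbar = [z; z0], the least-squares solution
    % Equivalently, and more numerically stable: zbar = A \ b;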

Integer Programming Example: MAX-CUT
Optimization over a graph. Graphs have Nodes and Edges.
Figure: Division of a set of nodes to maximize the weighted cost of separation
Goal: Assign each node i an index x_i = -1 or x_i = +1 to maximize the overall cost.
- The cost if x_i and x_j do not share the same index is w_ij.
- The cost if they share an index is 0.
- The weights w_ij are given.

Integer Programming Example: MAX-CUT
Goal: Assign each node i an index x_i = -1 or x_i = +1 to maximize the overall cost.
Variables: x ∈ {-1, 1}^n. Referred to as Integer Variables or Binary Variables.
Binary constraints can be incorporated explicitly: x_i^2 = 1.
Integer/Binary variables may be declared directly in YALMIP:
> x = intvar(n,1);
> y = binvar(n,1);

Integer Programming Example: MAX-CUT
Objective: We use the trick:
- (1 - x_i x_j) = 0 if x_i and x_j have the same sign (Together).
- (1 - x_i x_j) = 2 if x_i and x_j have the opposite sign (Apart).
Then the objective function is

    (1/2) Σ_{i,j} w_ij (1 - x_i x_j).

The optimization problem is the integer program:

    max_{x_i^2 = 1} (1/2) Σ_{i,j} w_ij (1 - x_i x_j).

MAX-CUT Example
The optimization problem is the integer program:

    max_{x_i^2 = 1} (1/2) Σ_{i,j} w_ij (1 - x_i x_j).

Consider the MAX-CUT problem with 5 nodes and weights

    w_12 = w_23 = w_45 = w_15 = w_34 = 0.5  and  w_14 = w_24 = w_25 = 0,  where w_ij = w_ji.

An optimal cut is x_1 = x_3 = x_4 = 1, x_2 = x_5 = -1. This cut has objective value

    f(x) = 2.5 - 0.5 x_1 x_2 - 0.5 x_2 x_3 - 0.5 x_3 x_4 - 0.5 x_4 x_5 - 0.5 x_1 x_5 = 4.

Figure: An Optimal Cut
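
Since this graph is tiny, the optimal cut can be checked by brute force; a minimal MATLAB sketch (the enumeration approach is an illustration, not from the slides):

    % Brute-force verification of the 5-node MAX-CUT example above.
    n = 5;
    W = zeros(n);
    W(1,2) = 0.5; W(2,3) = 0.5; W(4,5) = 0.5; W(1,5) = 0.5; W(3,4) = 0.5;
    W = W + W';                              % make the weights symmetric, w_ij = w_ji
    best = -inf;
    for k = 0:2^n-1
        x = 2*(dec2bin(k, n) - '0')' - 1;    % enumerate all x in {-1,+1}^n
        val = 0.5*sum(sum(W.*(1 - x*x')));   % the MAX-CUT objective
        if val > best, best = val; xbest = x; end
    end
    best, xbest'                             % best = 4, matching the cut on the slide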

Dynamic Programming and LQR: Open-Loop Case
Objective Function: Let's minimize a quadratic cost

    h(x, u) := x(N)^T S x(N) + Σ_{k=1}^{N-1} [ x(k)^T Q x(k) + u(k)^T R u(k) ].

Variables: The states x(k) and the inputs u(k).
Constraint: The dynamics define how u affects x:

    x(k+1) = A x(k) + B u(k),   x(0) = 1,   k = 0, ..., N.

Note: The choice of x(0) = 1 does not affect the solution.
Combined:

    min_{x,u}  x(N)^T S x(N) + Σ_{k=1}^{N-1} ( x(k)^T Q x(k) + u(k)^T R u(k) )
    subject to x(k+1) = A x(k) + B u(k),  k = 0, ..., N,   x(0) = 1.
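
The open-loop problem is a convex quadratic program and can be posed almost verbatim in YALMIP. A minimal sketch under assumed data (the matrices A, B, Q, R, S, the horizon N and the initial state are illustrative, and the indices are shifted by one for MATLAB's 1-based arrays):

    % Open-loop finite-horizon LQR posed as a QP in YALMIP (assumed example data).
    A = [1 0.1; 0 1];  B = [0; 0.1];
    Q = eye(2);  R = 1;  S = 10*eye(2);  N = 20;
    x = sdpvar(2, N+1);  u = sdpvar(1, N);
    constraints = [x(:,1) == [1; 1]];              % initial condition (here: all-ones state)
    objective = x(:,N+1)'*S*x(:,N+1);
    for k = 1:N
        constraints = [constraints, x(:,k+1) == A*x(:,k) + B*u(:,k)];
        objective = objective + x(:,k)'*Q*x(:,k) + u(:,k)'*R*u(:,k);
    end
    optimize(constraints, objective);              % requires YALMIP and a QP-capable solver
    uopt = value(u);                               % the optimal open-loop input sequence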

Dynamic Programming and LQR: Closed-Loop Case
Objective Function: Let's minimize a quadratic cost

    x(N)^T S x(N) + Σ_{k=1}^{N-1} [ x(k)^T Q x(k) + u(k)^T R u(k) ].

Variables: We want a fixed policy (gain matrix K) which determines u(k) based on x(k) as u(k) = K x(k).
Constraint: The dynamics define how u affects x:

    x(k+1) = A x(k) + B u(k),   u(k) = K x(k),   x(0) = 1,   k = 0, ..., N.

Combined:

    min_{x,u}  x(N)^T S x(N) + Σ_{k=1}^{N-1} ( x(k)^T Q x(k) + u(k)^T R u(k) )
    subject to x(k+1) = A x(k) + B u(k),  u(k) = K x(k),  x(0) = 1,  k = 0, ..., N.

Question: Are the Closed-Loop and Open-Loop Problems Equivalent?
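
To make the closed-loop formulation concrete, the cost of any fixed candidate gain K can be evaluated by simulating the dynamics; a short MATLAB sketch with assumed system data and an arbitrary K (indices again shifted for 1-based arrays):

    % Closed-loop cost for a fixed gain K, u(k) = K*x(k) (assumed example data).
    A = [1 0.1; 0 1];  B = [0; 0.1];
    Q = eye(2);  R = 1;  S = 10*eye(2);  N = 20;
    K = -[1 2];                          % an arbitrary candidate static gain
    x = [1; 1];  J = 0;
    for k = 1:N
        u = K*x;                         % the fixed policy
        J = J + x'*Q*x + u'*R*u;
        x = A*x + B*u;                   % x(k+1) = A x(k) + B u(k)
    end
    J = J + x'*S*x                       % finite-horizon cost achieved by this K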

Formulating Optimization Problems: Equivalence
Definition 1. Two optimization problems are Equivalent if a solution to one can be used to construct a solution to the other.
Example 1: Equivalent Objective Functions
Problem 1:  min_x f(x)            subject to A^T x ≤ b
Problem 2:  min_x 10 f(x) - 12    subject to A^T x ≤ b
Problem 3:  max_x 1/f(x)          subject to A^T x ≤ b
In this case x*_1 = x*_2 = x*_3.
Proof: For any x ≠ x*_1 (both feasible), we have f(x) > f(x*_1). Thus 10 f(x) - 12 > 10 f(x*_1) - 12 and 1/f(x) < 1/f(x*_1), i.e. x is suboptimal for all three problems.

Formulating Optimization Problems: Equivalence in Variables
Example 2: Equivalent Variables
Problem 1:  min_x f(x)          subject to A^T x ≤ b
Problem 2:  min_x f(Tx + c)     subject to (T^T A)^T x ≤ b - A^T c
Here x*_1 = T x*_2 + c and x*_2 = T^{-1}(x*_1 - c). The change of variables is invertible (given x ≠ x*_2, you can show it is suboptimal).
Example 3: Variable Separability
Problem 1:  min_{x,y} f(x) + g(y)   subject to A_1^T x ≤ b_1,  A_2^T y ≤ b_2
Problem 2:  min_x f(x)              subject to A_1^T x ≤ b_1
Problem 3:  min_y g(y)              subject to A_2^T y ≤ b_2
Here x*_1 = x*_2 and y*_1 = y*_3. Neither feasibility nor minimality is coupled (the objective function is Separable).

Formulating Optimization Problems: Constraint Equivalence
Example 4: Constraint/Objective Equivalence
Problem 1:  min_x f(x)     subject to g(x) ≤ 0
Problem 2:  min_{x,t} t    subject to g(x) ≤ 0,  t ≥ f(x)
Here x*_1 = x*_2 and t*_2 = f(x*_1).
Some other Equivalences:
- Redundant Constraints: {x ∈ R : x > 1} vs. {x ∈ R : x > 1, x > 0}
- Polytopes (Vertices vs. Hyperplanes): {x ∈ R^n : x = Σ_i A_i α_i, α_i ≥ 0, Σ_i α_i = 1} vs. {x ∈ R^n : Cx > b}

Machine Learning: Classification and Support-Vector Machines
In Classification we have inputs (data) x_i, each of which has a binary label y_i ∈ {-1, +1}:
- y_i = +1 means the output of x_i belongs to group 1.
- y_i = -1 means the output of x_i belongs to group 2.
We want to find a rule (a classifier) which takes the data x and predicts which group it is in. Our rule has the form of a function f(x) = w^T x - b. Then
- x is in group 1 if f(x) = w^T x - b > 0.
- x is in group 2 if f(x) = w^T x - b < 0.
Question: How to find the best w and b?
Figure: We want to find a rule which separates two sets of data.

Machine Learning: Classification and Support-Vector Machines
Definition 2. A Hyperplane is the generalization of the concept of a line/plane to multiple dimensions:

    {x ∈ R^n : w^T x - b = 0}.

Half-Spaces are the parts above and below a Hyperplane:

    {x ∈ R^n : w^T x - b ≥ 0}   OR   {x ∈ R^n : w^T x - b ≤ 0}.

Machine Learning: Classification and Support-Vector Machines
We want to separate the data into disjoint half-spaces and maximize the distance between these half-spaces.
Variables: w ∈ R^n and b define the hyperplane.
Constraint: Each existing data point should be correctly labelled:

    w^T x_i - b > 1 when y_i = +1   and   w^T x_i - b < -1 when y_i = -1   (Strict Separation).

Alternatively: y_i (w^T x_i - b) ≥ 1. These two constraints are Equivalent.
Figure: Maximizing the distance between two sets of Data
Objective: The distance between the hyperplanes {x : w^T x - b = 1} and {x : w^T x - b = -1} is 2/√(w^T w), so maximizing it corresponds to minimizing

    f(w, b) = (1/2) w^T w.

Machine Learning: Unconstrained Form (Soft-Margin SVM)
Machine Learning algorithms solve

    min_{w ∈ R^p, b ∈ R} (1/2) w^T w,   subject to y_i (w^T x_i - b) ≥ 1,  i = 1, ..., K.

Soft-Margin Problems: The hard-margin problem can be relaxed to maximize the distance between hyperplanes PLUS the magnitude of the classification errors:

    min_{w ∈ R^p, b ∈ R} (1/2) ||w||^2 + c Σ_{i=1}^n max(0, 1 - (w^T x_i - b) y_i).

Figure: Data separation using the soft-margin metric and distances to the associated hyperplanes
Link: Repository of Interesting Machine Learning Data Sets
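
The soft-margin problem is convex and can be written almost verbatim in YALMIP; a minimal sketch in which the data X, y, the trade-off weight c and the solver are all assumptions for illustration:

    % Soft-margin SVM in YALMIP (assumed example data; requires YALMIP + a QP-capable solver).
    X = [randn(20,2) + 2; randn(20,2) - 2];   % two assumed 2-D clusters
    y = [ones(20,1); -ones(20,1)];            % labels in {-1,+1}
    c = 1;                                    % soft-margin trade-off weight
    p = size(X, 2);
    w = sdpvar(p, 1);  b = sdpvar(1);
    hinge = max(0, 1 - y.*(X*w - b));         % classification errors (YALMIP handles max)
    optimize([], 0.5*(w'*w) + c*sum(hinge));
    wopt = value(w);  bopt = value(b);        % the learned rule is f(x) = wopt'*x - bopt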

Geometric vs. Functional Constraints
These problems are all equivalent.
The Classical Representation:

    min_{x ∈ R^n} f(x)   subject to g_i(x) ≤ 0,  i = 1, ..., k.

The Geometric Representation:

    min_{x ∈ R^n} f(x)   subject to x ∈ S,

where S := {x ∈ R^n : g_i(x) ≤ 0, i = 1, ..., k}.
The Pure Geometric Representation (x is eliminated!):

    min_γ γ   subject to S_γ ≠ ∅   (S_γ has at least one element),

where S_γ := {x ∈ R^n : γ - f(x) ≥ 0,  g_i(x) ≤ 0,  i = 1, ..., k}.
Proposition: Optimization is only as hard as determining feasibility!

Solving by Bisection (Do you have an Oracle?)
Optimization Problem:

    γ* = max_γ γ   subject to S_γ ≠ ∅.

Bisection Algorithm (Convexity???):
1. Initialize infeasible γ_u = b
2. Initialize feasible γ_l = a
3. Set γ = (γ_u + γ_l)/2
4. If S_γ is infeasible, set γ_u = (γ_u + γ_l)/2
5. If S_γ is feasible, set γ_l = (γ_u + γ_l)/2
6. k = k + 1
7. Go to 3
Then γ* ∈ [γ_l, γ_u] and γ_u - γ_l ≤ (b - a)/2^k.
Bisection with an oracle also solves the Primary Problem.
Figure: Bisection search over γ, with feasible and infeasible levels of S_γ
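
A minimal MATLAB sketch of the bisection loop; the feasibility oracle below is a toy stand-in (in practice it would be an LP or SDP feasibility test), and the bounds a, b are assumed:

    % Bisection over gamma using a feasibility oracle for S_gamma (toy oracle assumed).
    oracle = @(gamma) gamma <= 4;    % stand-in: "S_gamma is nonempty iff gamma <= gamma* = 4"
    gl = 0;                          % gamma_l: known feasible level  (a)
    gu = 10;                         % gamma_u: known infeasible level (b)
    for k = 1:30
        g = (gl + gu)/2;
        if oracle(g), gl = g; else, gu = g; end
    end
    gammastar = (gl + gu)/2          % within (b - a)/2^k of gamma* = 4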

Computational Complexity
In Computer Science, we focus on the complexity of the PROBLEM, NOT the complexity of the algorithm. On a Turing machine, the number of steps is a function of problem size (number of variables):
- NL: A logarithmic number (SORT)
- P: A polynomial number (LP)
- NP: A polynomial number for verification (TSP)
- NP-HARD: at least as hard as NP (TSP)
- NP-COMPLETE: A set of Equivalent* NP problems (MAX-CUT, TSP)
- EXPTIME: Solvable in 2^{p(n)} steps, p polynomial (Chess)
- EXPSPACE: Solvable with 2^{p(n)} memory
*Equivalent means there is a polynomial-time reduction from one to the other.

How Hard is Optimization?
The Classical Representation:

    min_{x ∈ R^n} f(x)   subject to g_i(x) ≤ 0,  i = 1, ..., k,   h_i(x) = 0,  i = 1, ..., k.

Answer: Easy (P) if f and the g_i are all Convex and the h_i are affine.
The Geometric Representation:

    min_{x ∈ R^n} f(x)   subject to x ∈ S.

Answer: Easy (P) if f is Convex and S is a Convex Set.
The Pure Geometric Representation:

    max_{γ, x ∈ R^n} γ   subject to (γ, x) ∈ S.

Answer: Easy (P) if S is a Convex Set.

Convex Functions
Definition 3. An OBJECTIVE FUNCTION or CONSTRAINT function is convex if

    f(λ x_1 + (1 - λ) x_2) ≤ λ f(x_1) + (1 - λ) f(x_2)   for all λ ∈ [0, 1].

Useful Facts:
- e^{ax} and |x| are convex; x^n (n ≥ 1 or n ≤ 0) and -log x are convex on x > 0.
- If f_1 is convex and f_2 is convex, then f_3(x) := f_1(x) + f_2(x) is convex.
- A function is convex if ∇²f(x) is positive semidefinite for all x.
- If f_1, f_2 are convex, then f_3(x) := max(f_1(x), f_2(x)) is convex.
- If f_1, f_2 are convex and f_1 is increasing, then f_3(x) := f_1(f_2(x)) is convex.
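
The Hessian test is easy to spot-check numerically; a small MATLAB sketch for an assumed example function (not one from the slides):

    % Spot-checking convexity via the Hessian for f(x) = exp(x1) + x1^2 + 3*x2^2 - 2*x1*x2.
    hessf = @(x) [exp(x(1)) + 2, -2; -2, 6];   % Hessian of the assumed example function
    pts = randn(2, 20);                         % random sample points
    isPSD = true;
    for i = 1:size(pts, 2)
        isPSD = isPSD && all(eig(hessf(pts(:,i))) >= 0);
    end
    isPSD                                       % true: the Hessian is PSD at every sampled point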

Convex Sets
Definition 4. A FEASIBLE SET Q is convex if for any x, y ∈ Q,

    {μ x + (1 - μ) y : μ ∈ [0, 1]} ⊂ Q.

The line connecting any two points lies in the set.
Facts:
- If f is convex, then {x : f(x) ≤ 0} is convex.
- The intersection of convex sets is convex: if S_1 and S_2 are convex, then S_3 := {x : x ∈ S_1, x ∈ S_2} is convex.

Descent Algorithms (Why Convex Optimization is Easy): Unconstrained Optimization
All descent algorithms are iterative, with a search direction (Δx ∈ R^n) and step size (t > 0):

    x_{k+1} = x_k + t Δx.

Gradient Descent:   Δx = -∇f(x).
Newton's Algorithm: Δx = -(∇²f(x))^{-1} ∇f(x). Tries to solve the equation ∇f(x) = 0.
Both converge for sufficiently small step size.
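
A minimal MATLAB sketch of both updates on an assumed smooth convex test function (the function, starting point and step size are illustrative):

    % Gradient descent and Newton's method on an assumed convex test function.
    gradf = @(x) [2*(x(1) - 1); 8*(x(2) + 2)];   % gradient of f(x) = (x1-1)^2 + 4*(x2+2)^2
    hessf = @(x) [2 0; 0 8];                      % its (constant) Hessian
    x = [5; 5];  t = 0.1;                         % gradient descent
    for k = 1:200
        x = x - t*gradf(x);
    end
    xgd = x;
    x = [5; 5];                                   % Newton's method: solve gradf(x) = 0
    for k = 1:10
        x = x - hessf(x) \ gradf(x);
    end
    xnewton = x;                                  % both approach the minimizer [1; -2]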

Descent Algorithms: Dealing with Constraints
Method 1: Gradient Projection
Figure: Must project the step (t Δx) onto the feasible set
Method 2: Barrier Functions

    min_x f(x) - log(-g(x))

Converts a constrained problem to an unconstrained problem (Interior-Point Methods).
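
A minimal MATLAB sketch of the log-barrier idea on a one-dimensional problem; the example f, g, the step sizes and the barrier-weight schedule are all assumptions for illustration:

    % Log-barrier method for min (x-3)^2 subject to x <= 1 (assumed example).
    gradf = @(x) 2*(x - 3);        % gradient of f(x) = (x-3)^2
    g     = @(x) x - 1;            % constraint g(x) = x - 1 <= 0
    x = 0;                         % strictly feasible start, g(0) < 0
    for mu = [1, 0.1, 0.01]        % shrink the barrier weight
        for inner = 1:200          % gradient steps on f(x) - mu*log(-g(x))
            grad = gradf(x) - mu/g(x);
            x = x - (0.01*mu)*grad;   % small step, scaled with mu for stability
        end
    end
    x                              % approaches the constrained optimum x = 1 from inside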

Non-Convexity and Local Optima
1. For convex optimization problems, descent methods always find the globally optimal point.
2. For non-convex optimization, descent algorithms may get stuck at local optima.
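
This sensitivity to the starting point is easy to see numerically; a short MATLAB sketch on an assumed nonconvex test function:

    % Gradient descent on a nonconvex function: the answer depends on where you start.
    f     = @(x) x.^4 - 3*x.^2 + x;        % assumed nonconvex test function
    gradf = @(x) 4*x.^3 - 6*x + 1;
    for x0 = [2, -2]                       % two different starting points
        x = x0;
        for k = 1:500
            x = x - 0.05*gradf(x);
        end
        fprintf('start %+d  ->  x = %+.3f,  f(x) = %+.3f\n', x0, x, f(x));
    end
    % From x0 = +2 it stops at a local minimum (x ~ +1.13); from x0 = -2 it reaches the global one (x ~ -1.30).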

Important Classes of Optimization Problems: Linear Programming
Linear Programming (LP):

    min_{x ∈ R^n} c^T x   subject to  Ax = b,   Cx ≤ d.

EASY (P): Simplex/Ellipsoid Algorithms. Can solve for >10,000 variables.
Example:

    minimize   x_1 + x_2
    subject to 3x_1 + x_2 ≥ 3,   x_2 ≥ 1,   x_1 ≤ 4,   x_1 + 5x_2 ≤ 20,   x_1 + 4x_2 ≤ 20.

Link: A List of Solvers, Performance and Benchmark Problems
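
The example LP can be handed to MATLAB's linprog; a sketch in which the inequality directions follow the reconstruction above (they are an assumption, since the relation symbols did not survive in the transcript):

    % Solving the small example LP with linprog (Optimization Toolbox).
    f = [1; 1];                  % minimize x1 + x2
    A = [-3 -1;                  % 3x1 +  x2 >= 3   ->  -3x1 -  x2 <= -3
          0 -1;                  %        x2 >= 1   ->        -x2 <= -1
          1  0;                  %  x1       <= 4
          1  5;                  %  x1 + 5x2 <= 20
          1  4];                 %  x1 + 4x2 <= 20
    b = [-3; -1; 4; 20; 20];
    x = linprog(f, A, b)         % optimum at x = [2/3; 1] under these assumed directions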

Important Classes of Optimization Problems: Quadratic Programming
Quadratic Programming (QP):

    min_{x ∈ R^n} x^T Q x + c^T x   subject to  Ax ≤ b.

EASY (P): if Q ⪰ 0.
HARD (NP-Hard): otherwise.
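
For the easy case (Q ⪰ 0), MATLAB's quadprog applies directly; a minimal sketch with assumed data (note that quadprog minimizes (1/2) x^T H x + f^T x, so H = 2Q for the form above):

    % Solving a small convex QP with quadprog (Optimization Toolbox; data assumed).
    Q = [2 0; 0 1];  c = [-2; -4];   % Q is positive definite, so this instance is easy (P)
    A = [1 1];  b = 3;               % constraint x1 + x2 <= 3
    x = quadprog(2*Q, c, A, b)       % minimizes x'*Q*x + c'*x subject to A*x <= b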

Important Classes of Optimization Problems: Mixed-Integer Programming
Mixed-Integer Linear Programming (MILP):

    min_{x ∈ R^n} c^T x   subject to  Ax ≤ b,   x_i ∈ Z,  i = 1, ..., K.

HARD (NP-Hard)
Mixed-Integer Nonlinear Programming (MINLP):

    min_{x ∈ R^n} f(x)   subject to  g_i(x) ≤ 0,   x_i ∈ Z,  i = 1, ..., K.

Very Hard

Next Time: Positive Matrices, SDP and LMIs
Also a bit on Duality and Relaxations.