LMI Methods in Optimal and Robust Control


Matthew M. Peet, Arizona State University
Lecture 02: Optimization (Convex and Otherwise)

What is Optimization?
An Optimization Problem has 3 parts:

    min_{x ∈ F} f(x)
    subject to g_i(x) ≤ 0,  i = 1, ..., K_1
               h_i(x) = 0,  i = 1, ..., K_2

Variables: x ∈ F. The things you must choose. F represents the set of possible choices for the variables. These can be vectors, matrices, functions, systems, locations, colors... However, computers prefer vectors or matrices.

Objective: f(x). A function which assigns a scalar value to any choice of variables, e.g. [x_1, x_2] ↦ x_1 - x_2; red ↦ 4; etc.

Constraints: g(x) ≤ 0; h(x) = 0. These define what is a minimally acceptable choice of variables. Equality and inequality constraints are common. x is OK if g(x) ≤ 0 and h(x) = 0. Constraints mean the variables are not independent. (Constrained optimization is much harder.)

Formulating Optimization Problems: What do we need to know?
Topics to Cover:
- Formulating Constraints: tricks of the trade for expressing constraints; converting everything to equality and inequality constraints.
- Equivalence: how to recognize when two optimization problems are equivalent, which may be true despite different variables, constraints and objectives.
- Knowing which Problems are Solvable: the convex ones, and some others if the problem is relatively small.

Least Squares (Unconstrained Optimization)
Problem: Given a bunch of data in the form
- Inputs: a_i
- Outputs: b_i
find the function f(a) = b which best fits the data.
For Least Squares: Assume f(a) = z^T a + z_0, where z ∈ R^n and z_0 ∈ R are the variables, with objective

    h(z) := Σ_{i=1}^K (f(a_i) - b_i)^2 = Σ_{i=1}^K (z^T a_i + z_0 - b_i)^2.

The Optimization Problem is

    min_{z ∈ R^n} ||Az - b||^2,   where  A := [a_1^T 1; ...; a_K^T 1]  and  b := [b_1; ...; b_K].

Least Squares (Unconstrained Optimization)
Least squares problems are easy-ish to solve:

    z* = (A^T A)^{-1} A^T b.

Note that A is assumed to be skinny: more rows than columns, i.e. more data points than inputs (the dimension of a_i is small).
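
For concreteness, here is a minimal MATLAB sketch of this normal-equations solution; the data, dimensions and noise level below are assumptions for illustration, not from the lecture.

    % Least-squares fit of f(a) = z'*a + z0 via the normal equations (assumed example data).
    K = 50; n = 2;
    a = randn(K, n);                       % inputs a_i, one per row
    b = a*[1; -2] + 3 + 0.1*randn(K, 1);   % noisy outputs b_i
    A = [a, ones(K, 1)];                   % stack the rows [a_i^T, 1]
    zbar = (A'*A) \ (A'*b);                % zbar = [z; z0], the least-squares solution
    % Equivalently, and more numerically stable: zbar = A \ b;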

Integer Programming Example: MAX-CUT
Optimization over a graph. Graphs have Nodes and Edges.
Figure: Division of a set of nodes to maximize the weighted cost of separation
Goal: Assign each node i an index x_i = -1 or x_i = +1 to maximize the overall cost.
- The cost if x_i and x_j do not share the same index is w_ij.
- The cost if they share an index is 0.
- The weights w_ij are given.

Integer Programming Example: MAX-CUT
Goal: Assign each node i an index x_i = -1 or x_i = +1 to maximize the overall cost.
Variables: x ∈ {-1, 1}^n. Referred to as Integer Variables or Binary Variables.
Binary constraints can be incorporated explicitly: x_i^2 = 1.
Integer/Binary variables may be declared directly in YALMIP:
> x = intvar(n,1);
> y = binvar(n,1);

Integer Programming Example: MAX-CUT
Objective: We use the trick:
- (1 - x_i x_j) = 0 if x_i and x_j have the same sign (Together).
- (1 - x_i x_j) = 2 if x_i and x_j have the opposite sign (Apart).
Then the objective function is

    (1/2) Σ_{i,j} w_ij (1 - x_i x_j).

The optimization problem is the integer program:

    max_{x_i^2 = 1} (1/2) Σ_{i,j} w_ij (1 - x_i x_j).

MAX-CUT Example
The optimization problem is the integer program:

    max_{x_i^2 = 1} (1/2) Σ_{i,j} w_ij (1 - x_i x_j).

Consider the MAX-CUT problem with 5 nodes and weights

    w_12 = w_23 = w_45 = w_15 = w_34 = 0.5  and  w_14 = w_24 = w_25 = 0,  where w_ij = w_ji.

An optimal cut is x_1 = x_3 = x_4 = 1, x_2 = x_5 = -1. This cut has objective value

    f(x) = 2.5 - 0.5 x_1 x_2 - 0.5 x_2 x_3 - 0.5 x_3 x_4 - 0.5 x_4 x_5 - 0.5 x_1 x_5 = 4.

Figure: An Optimal Cut
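
Since this graph is tiny, the optimal cut can be checked by brute force; a minimal MATLAB sketch (the enumeration approach is an illustration, not from the slides):

    % Brute-force verification of the 5-node MAX-CUT example above.
    n = 5;
    W = zeros(n);
    W(1,2) = 0.5; W(2,3) = 0.5; W(4,5) = 0.5; W(1,5) = 0.5; W(3,4) = 0.5;
    W = W + W';                              % make the weights symmetric, w_ij = w_ji
    best = -inf;
    for k = 0:2^n-1
        x = 2*(dec2bin(k, n) - '0')' - 1;    % enumerate all x in {-1,+1}^n
        val = 0.5*sum(sum(W.*(1 - x*x')));   % the MAX-CUT objective
        if val > best, best = val; xbest = x; end
    end
    best, xbest'                             % best = 4, matching the cut on the slide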

Dynamic Programming and LQR: Open-Loop Case
Objective Function: Let's minimize a quadratic cost

    h(x, u) := x(N)^T S x(N) + Σ_{k=1}^{N-1} [ x(k)^T Q x(k) + u(k)^T R u(k) ].

Variables: The states x(k) and the inputs u(k).
Constraint: The dynamics define how u affects x:

    x(k+1) = A x(k) + B u(k),   x(0) = 1,   k = 0, ..., N.

Note: The choice of x(0) = 1 does not affect the solution.
Combined:

    min_{x,u}  x(N)^T S x(N) + Σ_{k=1}^{N-1} ( x(k)^T Q x(k) + u(k)^T R u(k) )
    subject to x(k+1) = A x(k) + B u(k),  k = 0, ..., N,   x(0) = 1.
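
The open-loop problem is a convex quadratic program and can be posed almost verbatim in YALMIP. A minimal sketch under assumed data (the matrices A, B, Q, R, S, the horizon N and the initial state are illustrative, and the indices are shifted by one for MATLAB's 1-based arrays):

    % Open-loop finite-horizon LQR posed as a QP in YALMIP (assumed example data).
    A = [1 0.1; 0 1];  B = [0; 0.1];
    Q = eye(2);  R = 1;  S = 10*eye(2);  N = 20;
    x = sdpvar(2, N+1);  u = sdpvar(1, N);
    constraints = [x(:,1) == [1; 1]];              % initial condition (here: all-ones state)
    objective = x(:,N+1)'*S*x(:,N+1);
    for k = 1:N
        constraints = [constraints, x(:,k+1) == A*x(:,k) + B*u(:,k)];
        objective = objective + x(:,k)'*Q*x(:,k) + u(:,k)'*R*u(:,k);
    end
    optimize(constraints, objective);              % requires YALMIP and a QP-capable solver
    uopt = value(u);                               % the optimal open-loop input sequence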

Dynamic Programming and LQR: Closed-Loop Case
Objective Function: Let's minimize a quadratic cost

    x(N)^T S x(N) + Σ_{k=1}^{N-1} [ x(k)^T Q x(k) + u(k)^T R u(k) ].

Variables: We want a fixed policy (gain matrix K) which determines u(k) based on x(k) as u(k) = K x(k).
Constraint: The dynamics define how u affects x:

    x(k+1) = A x(k) + B u(k),   u(k) = K x(k),   x(0) = 1,   k = 0, ..., N.

Combined:

    min_{x,u}  x(N)^T S x(N) + Σ_{k=1}^{N-1} ( x(k)^T Q x(k) + u(k)^T R u(k) )
    subject to x(k+1) = A x(k) + B u(k),  u(k) = K x(k),  x(0) = 1,  k = 0, ..., N.

Question: Are the Closed-Loop and Open-Loop Problems Equivalent?
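
To make the closed-loop formulation concrete, the cost of any fixed candidate gain K can be evaluated by simulating the dynamics; a short MATLAB sketch with assumed system data and an arbitrary K (indices again shifted for 1-based arrays):

    % Closed-loop cost for a fixed gain K, u(k) = K*x(k) (assumed example data).
    A = [1 0.1; 0 1];  B = [0; 0.1];
    Q = eye(2);  R = 1;  S = 10*eye(2);  N = 20;
    K = -[1 2];                          % an arbitrary candidate static gain
    x = [1; 1];  J = 0;
    for k = 1:N
        u = K*x;                         % the fixed policy
        J = J + x'*Q*x + u'*R*u;
        x = A*x + B*u;                   % x(k+1) = A x(k) + B u(k)
    end
    J = J + x'*S*x                       % finite-horizon cost achieved by this K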

Formulating Optimization Problems: Equivalence
Definition 1. Two optimization problems are Equivalent if a solution to one can be used to construct a solution to the other.
Example 1: Equivalent Objective Functions
Problem 1:  min_x f(x)            subject to A^T x ≤ b
Problem 2:  min_x 10 f(x) - 12    subject to A^T x ≤ b
Problem 3:  max_x 1/f(x)          subject to A^T x ≤ b
In this case x*_1 = x*_2 = x*_3.
Proof: For any x ≠ x*_1 (both feasible), we have f(x) > f(x*_1). Thus 10 f(x) - 12 > 10 f(x*_1) - 12 and 1/f(x) < 1/f(x*_1), i.e. x is suboptimal for all three problems.

Formulating Optimization Problems: Equivalence in Variables
Example 2: Equivalent Variables
Problem 1:  min_x f(x)          subject to A^T x ≤ b
Problem 2:  min_x f(Tx + c)     subject to (T^T A)^T x ≤ b - A^T c
Here x*_1 = T x*_2 + c and x*_2 = T^{-1}(x*_1 - c). The change of variables is invertible (given x ≠ x*_2, you can show it is suboptimal).
Example 3: Variable Separability
Problem 1:  min_{x,y} f(x) + g(y)   subject to A_1^T x ≤ b_1,  A_2^T y ≤ b_2
Problem 2:  min_x f(x)              subject to A_1^T x ≤ b_1
Problem 3:  min_y g(y)              subject to A_2^T y ≤ b_2
Here x*_1 = x*_2 and y*_1 = y*_3. Neither feasibility nor minimality is coupled (the objective function is Separable).

Formulating Optimization Problems: Constraint Equivalence
Example 4: Constraint/Objective Equivalence
Problem 1:  min_x f(x)     subject to g(x) ≤ 0
Problem 2:  min_{x,t} t    subject to g(x) ≤ 0,  t ≥ f(x)
Here x*_1 = x*_2 and t*_2 = f(x*_1).
Some other Equivalences:
- Redundant Constraints: {x ∈ R : x > 1} vs. {x ∈ R : x > 1, x > 0}
- Polytopes (Vertices vs. Hyperplanes): {x ∈ R^n : x = Σ_i A_i α_i, α_i ≥ 0, Σ_i α_i = 1} vs. {x ∈ R^n : Cx > b}

Machine Learning: Classification and Support-Vector Machines
In Classification we have inputs (data) x_i, each of which has a binary label y_i ∈ {-1, +1}:
- y_i = +1 means the output of x_i belongs to group 1.
- y_i = -1 means the output of x_i belongs to group 2.
We want to find a rule (a classifier) which takes the data x and predicts which group it is in. Our rule has the form of a function f(x) = w^T x - b. Then
- x is in group 1 if f(x) = w^T x - b > 0.
- x is in group 2 if f(x) = w^T x - b < 0.
Question: How to find the best w and b?
Figure: We want to find a rule which separates two sets of data.

Machine Learning: Classification and Support-Vector Machines
Definition 2. A Hyperplane is the generalization of the concept of a line/plane to multiple dimensions:

    {x ∈ R^n : w^T x - b = 0}.

Half-Spaces are the parts above and below a Hyperplane:

    {x ∈ R^n : w^T x - b ≥ 0}   OR   {x ∈ R^n : w^T x - b ≤ 0}.

Machine Learning: Classification and Support-Vector Machines
We want to separate the data into disjoint half-spaces and maximize the distance between these half-spaces.
Variables: w ∈ R^n and b define the hyperplane.
Constraint: Each existing data point should be correctly labelled:

    w^T x_i - b > 1 when y_i = +1   and   w^T x_i - b < -1 when y_i = -1   (Strict Separation).

Alternatively: y_i (w^T x_i - b) ≥ 1. These two constraints are Equivalent.
Figure: Maximizing the distance between two sets of Data
Objective: The distance between the hyperplanes {x : w^T x - b = 1} and {x : w^T x - b = -1} is 2/√(w^T w), so maximizing it corresponds to minimizing

    f(w, b) = (1/2) w^T w.

Machine Learning: Unconstrained Form (Soft-Margin SVM)
Machine Learning algorithms solve

    min_{w ∈ R^p, b ∈ R} (1/2) w^T w,   subject to y_i (w^T x_i - b) ≥ 1,  i = 1, ..., K.

Soft-Margin Problems: The hard-margin problem can be relaxed to maximize the distance between hyperplanes PLUS the magnitude of the classification errors:

    min_{w ∈ R^p, b ∈ R} (1/2) ||w||^2 + c Σ_{i=1}^n max(0, 1 - (w^T x_i - b) y_i).

Figure: Data separation using the soft-margin metric and distances to the associated hyperplanes
Link: Repository of Interesting Machine Learning Data Sets
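
The soft-margin problem is convex and can be written almost verbatim in YALMIP; a minimal sketch in which the data X, y, the trade-off weight c and the solver are all assumptions for illustration:

    % Soft-margin SVM in YALMIP (assumed example data; requires YALMIP + a QP-capable solver).
    X = [randn(20,2) + 2; randn(20,2) - 2];   % two assumed 2-D clusters
    y = [ones(20,1); -ones(20,1)];            % labels in {-1,+1}
    c = 1;                                    % soft-margin trade-off weight
    p = size(X, 2);
    w = sdpvar(p, 1);  b = sdpvar(1);
    hinge = max(0, 1 - y.*(X*w - b));         % classification errors (YALMIP handles max)
    optimize([], 0.5*(w'*w) + c*sum(hinge));
    wopt = value(w);  bopt = value(b);        % the learned rule is f(x) = wopt'*x - bopt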

Geometric vs. Functional Constraints
These problems are all equivalent.
The Classical Representation:

    min_{x ∈ R^n} f(x)   subject to g_i(x) ≤ 0,  i = 1, ..., k.

The Geometric Representation:

    min_{x ∈ R^n} f(x)   subject to x ∈ S,

where S := {x ∈ R^n : g_i(x) ≤ 0, i = 1, ..., k}.
The Pure Geometric Representation (x is eliminated!):

    min_γ γ   subject to S_γ ≠ ∅   (S_γ has at least one element),

where S_γ := {x ∈ R^n : γ - f(x) ≥ 0,  g_i(x) ≤ 0,  i = 1, ..., k}.
Proposition: Optimization is only as hard as determining feasibility!

Solving by Bisection (Do you have an Oracle?)
Optimization Problem:

    γ* = max_γ γ   subject to S_γ ≠ ∅.

Bisection Algorithm (Convexity???):
1. Initialize infeasible γ_u = b
2. Initialize feasible γ_l = a
3. Set γ = (γ_u + γ_l)/2
4. If S_γ is infeasible, set γ_u = (γ_u + γ_l)/2
5. If S_γ is feasible, set γ_l = (γ_u + γ_l)/2
6. k = k + 1
7. Go to 3
Then γ* ∈ [γ_l, γ_u] and γ_u - γ_l ≤ (b - a)/2^k.
Bisection with an oracle also solves the Primary Problem.
Figure: Bisection search over γ, with feasible and infeasible levels of S_γ
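
A minimal MATLAB sketch of the bisection loop; the feasibility oracle below is a toy stand-in (in practice it would be an LP or SDP feasibility test), and the bounds a, b are assumed:

    % Bisection over gamma using a feasibility oracle for S_gamma (toy oracle assumed).
    oracle = @(gamma) gamma <= 4;    % stand-in: "S_gamma is nonempty iff gamma <= gamma* = 4"
    gl = 0;                          % gamma_l: known feasible level  (a)
    gu = 10;                         % gamma_u: known infeasible level (b)
    for k = 1:30
        g = (gl + gu)/2;
        if oracle(g), gl = g; else, gu = g; end
    end
    gammastar = (gl + gu)/2          % within (b - a)/2^k of gamma* = 4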

Computational Complexity
In Computer Science, we focus on the complexity of the PROBLEM, NOT the complexity of the algorithm. On a Turing machine, the number of steps is a function of problem size (number of variables):
- NL: A logarithmic number (SORT)
- P: A polynomial number (LP)
- NP: A polynomial number for verification (TSP)
- NP-HARD: at least as hard as NP (TSP)
- NP-COMPLETE: A set of Equivalent* NP problems (MAX-CUT, TSP)
- EXPTIME: Solvable in 2^{p(n)} steps, p polynomial (Chess)
- EXPSPACE: Solvable with 2^{p(n)} memory
*Equivalent means there is a polynomial-time reduction from one to the other.

How Hard is Optimization?
The Classical Representation:

    min_{x ∈ R^n} f(x)   subject to g_i(x) ≤ 0,  i = 1, ..., k,   h_i(x) = 0,  i = 1, ..., k.

Answer: Easy (P) if f and the g_i are all Convex and the h_i are affine.
The Geometric Representation:

    min_{x ∈ R^n} f(x)   subject to x ∈ S.

Answer: Easy (P) if f is Convex and S is a Convex Set.
The Pure Geometric Representation:

    max_{γ, x ∈ R^n} γ   subject to (γ, x) ∈ S.

Answer: Easy (P) if S is a Convex Set.

Convex Functions
Definition 3. An OBJECTIVE FUNCTION or CONSTRAINT function is convex if

    f(λ x_1 + (1 - λ) x_2) ≤ λ f(x_1) + (1 - λ) f(x_2)   for all λ ∈ [0, 1].

Useful Facts:
- e^{ax} and |x| are convex; x^n (n ≥ 1 or n ≤ 0) and -log x are convex on x > 0.
- If f_1 is convex and f_2 is convex, then f_3(x) := f_1(x) + f_2(x) is convex.
- A function is convex if ∇²f(x) is positive semidefinite for all x.
- If f_1, f_2 are convex, then f_3(x) := max(f_1(x), f_2(x)) is convex.
- If f_1, f_2 are convex and f_1 is increasing, then f_3(x) := f_1(f_2(x)) is convex.
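
The Hessian test is easy to spot-check numerically; a small MATLAB sketch for an assumed example function (not one from the slides):

    % Spot-checking convexity via the Hessian for f(x) = exp(x1) + x1^2 + 3*x2^2 - 2*x1*x2.
    hessf = @(x) [exp(x(1)) + 2, -2; -2, 6];   % Hessian of the assumed example function
    pts = randn(2, 20);                         % random sample points
    isPSD = true;
    for i = 1:size(pts, 2)
        isPSD = isPSD && all(eig(hessf(pts(:,i))) >= 0);
    end
    isPSD                                       % true: the Hessian is PSD at every sampled point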

Convex Sets
Definition 4. A FEASIBLE SET Q is convex if for any x, y ∈ Q,

    {μ x + (1 - μ) y : μ ∈ [0, 1]} ⊂ Q.

The line connecting any two points lies in the set.
Facts:
- If f is convex, then {x : f(x) ≤ 0} is convex.
- The intersection of convex sets is convex: if S_1 and S_2 are convex, then S_3 := {x : x ∈ S_1, x ∈ S_2} is convex.

Descent Algorithms (Why Convex Optimization is Easy): Unconstrained Optimization
All descent algorithms are iterative, with a search direction (Δx ∈ R^n) and step size (t > 0):

    x_{k+1} = x_k + t Δx.

Gradient Descent:   Δx = -∇f(x).
Newton's Algorithm: Δx = -(∇²f(x))^{-1} ∇f(x). Tries to solve the equation ∇f(x) = 0.
Both converge for sufficiently small step size.
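
A minimal MATLAB sketch of both updates on an assumed smooth convex test function (the function, starting point and step size are illustrative):

    % Gradient descent and Newton's method on an assumed convex test function.
    gradf = @(x) [2*(x(1) - 1); 8*(x(2) + 2)];   % gradient of f(x) = (x1-1)^2 + 4*(x2+2)^2
    hessf = @(x) [2 0; 0 8];                      % its (constant) Hessian
    x = [5; 5];  t = 0.1;                         % gradient descent
    for k = 1:200
        x = x - t*gradf(x);
    end
    xgd = x;
    x = [5; 5];                                   % Newton's method: solve gradf(x) = 0
    for k = 1:10
        x = x - hessf(x) \ gradf(x);
    end
    xnewton = x;                                  % both approach the minimizer [1; -2]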

Descent Algorithms: Dealing with Constraints
Method 1: Gradient Projection
Figure: Must project the step (t Δx) onto the feasible set
Method 2: Barrier Functions

    min_x f(x) - log(-g(x))

Converts a constrained problem to an unconstrained problem (Interior-Point Methods).
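
A minimal MATLAB sketch of the log-barrier idea on a one-dimensional problem; the example f, g, the step sizes and the barrier-weight schedule are all assumptions for illustration:

    % Log-barrier method for min (x-3)^2 subject to x <= 1 (assumed example).
    gradf = @(x) 2*(x - 3);        % gradient of f(x) = (x-3)^2
    g     = @(x) x - 1;            % constraint g(x) = x - 1 <= 0
    x = 0;                         % strictly feasible start, g(0) < 0
    for mu = [1, 0.1, 0.01]        % shrink the barrier weight
        for inner = 1:200          % gradient steps on f(x) - mu*log(-g(x))
            grad = gradf(x) - mu/g(x);
            x = x - (0.01*mu)*grad;   % small step, scaled with mu for stability
        end
    end
    x                              % approaches the constrained optimum x = 1 from inside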

Non-Convexity and Local Optima
1. For convex optimization problems, descent methods always find the globally optimal point.
2. For non-convex optimization, descent algorithms may get stuck at local optima.
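
This sensitivity to the starting point is easy to see numerically; a short MATLAB sketch on an assumed nonconvex test function:

    % Gradient descent on a nonconvex function: the answer depends on where you start.
    f     = @(x) x.^4 - 3*x.^2 + x;        % assumed nonconvex test function
    gradf = @(x) 4*x.^3 - 6*x + 1;
    for x0 = [2, -2]                       % two different starting points
        x = x0;
        for k = 1:500
            x = x - 0.05*gradf(x);
        end
        fprintf('start %+d  ->  x = %+.3f,  f(x) = %+.3f\n', x0, x, f(x));
    end
    % From x0 = +2 it stops at a local minimum (x ~ +1.13); from x0 = -2 it reaches the global one (x ~ -1.30).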

Important Classes of Optimization Problems: Linear Programming
Linear Programming (LP):

    min_{x ∈ R^n} c^T x   subject to  Ax = b,   Cx ≤ d.

EASY (P): Simplex/Ellipsoid Algorithms. Can solve for >10,000 variables.
Example:

    minimize   x_1 + x_2
    subject to 3x_1 + x_2 ≥ 3,   x_2 ≥ 1,   x_1 ≤ 4,   x_1 + 5x_2 ≤ 20,   x_1 + 4x_2 ≤ 20.

Link: A List of Solvers, Performance and Benchmark Problems
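
The example LP can be handed to MATLAB's linprog; a sketch in which the inequality directions follow the reconstruction above (they are an assumption, since the relation symbols did not survive in the transcript):

    % Solving the small example LP with linprog (Optimization Toolbox).
    f = [1; 1];                  % minimize x1 + x2
    A = [-3 -1;                  % 3x1 +  x2 >= 3   ->  -3x1 -  x2 <= -3
          0 -1;                  %        x2 >= 1   ->        -x2 <= -1
          1  0;                  %  x1       <= 4
          1  5;                  %  x1 + 5x2 <= 20
          1  4];                 %  x1 + 4x2 <= 20
    b = [-3; -1; 4; 20; 20];
    x = linprog(f, A, b)         % optimum at x = [2/3; 1] under these assumed directions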

Important Classes of Optimization Problems: Quadratic Programming
Quadratic Programming (QP):

    min_{x ∈ R^n} x^T Q x + c^T x   subject to  Ax ≤ b.

EASY (P): if Q ⪰ 0.
HARD (NP-Hard): otherwise.
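
For the easy case (Q ⪰ 0), MATLAB's quadprog applies directly; a minimal sketch with assumed data (note that quadprog minimizes (1/2) x^T H x + f^T x, so H = 2Q for the form above):

    % Solving a small convex QP with quadprog (Optimization Toolbox; data assumed).
    Q = [2 0; 0 1];  c = [-2; -4];   % Q is positive definite, so this instance is easy (P)
    A = [1 1];  b = 3;               % constraint x1 + x2 <= 3
    x = quadprog(2*Q, c, A, b)       % minimizes x'*Q*x + c'*x subject to A*x <= b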

Important Classes of Optimization Problems: Mixed-Integer Programming
Mixed-Integer Linear Programming (MILP):

    min_{x ∈ R^n} c^T x   subject to  Ax ≤ b,   x_i ∈ Z,  i = 1, ..., K.

HARD (NP-Hard)
Mixed-Integer Nonlinear Programming (MINLP):

    min_{x ∈ R^n} f(x)   subject to  g_i(x) ≤ 0,   x_i ∈ Z,  i = 1, ..., K.

Very Hard

Next Time: Positive Matrices, SDP and LMIs
Also a bit on Duality and Relaxations.