Convex Optimization and Modeling

Convex Optimization and Modeling. Introduction and a quick repetition of analysis/linear algebra. First lecture, 12.04.2010. Jun.-Prof. Matthias Hein

Organization of the lecture. Advanced course, 2+2 hours, 6 credit points. Exercises: time+location: Friday, 16-18, E2.4, SR 216; teaching assistant: Shyam Sundar Rangapuram; weekly exercises, theoretical and practical work; practical exercises will be in Matlab (available in the CIP-pools); 50% of the points in the exercises are needed to take part in the exams. Exams: End-term: 28.7.; Re-exam: to be determined. Grading: An exam is passed if you get at least 50% of the points. The grading is based on the best out of end-term and re-exam.

Lecture Material. The course is based (to a large extent) on the book Boyd, Vandenberghe: Convex Optimization. The book is freely available: http://www.stanford.edu/~boyd/cvxbook/. Other material: Bertsekas: Nonlinear Programming; Hiriart-Urruty, Lemaréchal: Fundamentals of Convex Analysis; original papers. For the exercises in Matlab we will use CVX: Matlab Software for Disciplined Convex Programming, available at: http://www.stanford.edu/~boyd/cvx/ (Version 1.21).

Roadmap of the lecture: Introduction/Motivation - Review of Linear Algebra; Theory: Convex Analysis: Convex sets and functions; Theory: Convex Optimization, Duality theory; Algorithms: Unconstrained/equality-constrained Optimization; Algorithms: Interior Point Methods, Alternatives; Applications: Image Processing, Machine Learning, Statistics; Applications: Convex relaxations of combinatorial problems.

Roadmap for today: Introduction - Motivation; Reminder of Analysis; Reminder of Linear Algebra; Inner Product and Norms.

Introduction. What is Optimization? We want to find the best parameters for a certain problem, e.g. the best investment, the best function which fits the data, or the best tradeoff between fitting the data and having a smooth function (machine learning, image denoising). The parameters are subject to restrictions (constraints): the total investment is limited and positive, images have to be positive, the total intensity is preserved.

Introduction II. Mathematical Optimization/Programming:
$$\min_{x \in D} f(x) \quad \text{subject to: } g_i(x) \le 0,\ i = 1,\dots,r, \qquad h_j(x) = 0,\ j = 1,\dots,s.$$
$f$ is the objective or cost function. The domain $D$ of the optimization problem is $D = \operatorname{dom} f \,\cap\, \bigcap_{i=1}^r \operatorname{dom} g_i \,\cap\, \bigcap_{j=1}^s \operatorname{dom} h_j$. A point $x \in D$ is feasible if the inequality and equality constraints hold at $x$. The optimal value $p^*$ of the optimization problem is
$$p^* = \inf\{f(x) \mid g_i(x) \le 0,\ i = 1,\dots,r,\ h_j(x) = 0,\ j = 1,\dots,s,\ x \in D\}.$$

Introduction III. Linear Programming: the objective $f$ and the constraints $g_1,\dots,g_n$, $h_1,\dots,h_m$ are all linear. Example of Linear Programming: We want to fit a linear function $\varphi(x) = \langle w, x\rangle + b$ to a set of $k$ data points $(x_i, y_i)_{i=1}^k$:
$$\arg\min_{w,b} \sum_{i=1}^k \bigl|\langle w, x_i\rangle + b - y_i\bigr|.$$
This non-linear problem can be formulated as a linear program:
$$\min_{w \in \mathbb{R}^n,\; b,\, \gamma_1,\dots,\gamma_k \in \mathbb{R}} \ \sum_{i=1}^k \gamma_i, \quad \text{subject to: } \langle w, x_i\rangle + b - y_i \le \gamma_i, \quad -\bigl(\langle w, x_i\rangle + b - y_i\bigr) \le \gamma_i, \quad i = 1,\dots,k.$$
Note that $\gamma_i \ge \max\bigl\{\langle w, x_i\rangle + b - y_i,\ -(\langle w, x_i\rangle + b - y_i)\bigr\} = \bigl|\langle w, x_i\rangle + b - y_i\bigr|$. In particular, at the optimum $\gamma_i = \bigl|\langle w, x_i\rangle + b - y_i\bigr|$.
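As a rough illustration, here is a minimal CVX sketch of this linear program; the data X, y and the sizes k, n are invented for this example, and CVX is the Matlab package mentioned above for the exercises.

    % Hypothetical data: k points x_i (rows of X) with targets y_i.
    k = 20; n = 2;
    X = randn(k, n);
    y = X * [1; -2] + 0.5 + 0.1 * randn(k, 1);
    cvx_begin
        variables w(n) b t(k)      % t plays the role of the gamma_i above
        minimize( sum(t) )
        subject to
            X*w + b - y <= t;      % <w,x_i> + b - y_i <= gamma_i
            -(X*w + b - y) <= t;   % -(<w,x_i> + b - y_i) <= gamma_i
    cvx_end

At the optimum the entries of t equal the absolute residuals, in line with the note above.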

Introduction IV. Convex Optimization: the objective $f$ and the inequality constraints $g_1,\dots,g_n$ are convex; the equality constraints $h_1,\dots,h_m$ are linear. Distance between convex hulls - the hard margin Support Vector Machine (SVM): We want to separate two classes of points $(x_i, y_i)_{i=1}^k$, where $y_i = 1$ or $y_i = -1$, with a hyperplane such that the hyperplane has maximal distance to the classes:
$$\min_{w \in \mathbb{R}^n,\, b \in \mathbb{R}} \|w\|_2 \quad \text{subject to: } y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1, \quad i = 1,\dots,k.$$
This problem has a feasible point only if the two classes are (linearly) separable.
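A CVX sketch of the hard margin SVM, assuming linearly separable toy data; the data X, y below are invented, and if the classes are not separable the solver reports infeasibility.

    % Hypothetical linearly separable toy data with labels y_i in {-1,+1}.
    k = 40; n = 2;
    X = [randn(k/2, n) + 2; randn(k/2, n) - 2];
    y = [ones(k/2, 1); -ones(k/2, 1)];
    cvx_begin
        variables w(n) b
        minimize( norm(w, 2) )
        subject to
            y .* (X*w + b) >= 1;   % y_i (<w,x_i> + b) >= 1
    cvx_end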

Visualization of the SVM solution. Figure 1: A linearly separable problem, showing the classes $-1$ and $+1$, the separating hyperplane $\langle w,x\rangle + b = 0$ and the margin hyperplanes $\langle w,x\rangle + b = \pm 1$. The hard margin solution of the SVM is shown together with the convex hulls of the positive and negative class. The points on the margin, that is $\langle w,x\rangle + b = \pm 1$, are called support vectors.

Introduction V. Unconstrained convex optimization: total variation denoising,
$$\min_{f} \ \|Y - f\|_2 + \lambda \|\nabla f\|_1.$$
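A minimal one-dimensional CVX sketch of this problem, with a finite-difference matrix D standing in for the gradient; the signal y, the length n and the value of lambda are illustrative assumptions.

    % Noisy piecewise-constant signal and a discrete difference operator.
    n = 100; lambda = 2;
    y = [zeros(n/2, 1); ones(n/2, 1)] + 0.1 * randn(n, 1);
    D = spdiags([-ones(n,1) ones(n,1)], [0 1], n-1, n);   % (D*f)_i = f_(i+1) - f_i
    cvx_begin
        variable f(n)
        minimize( norm(y - f, 2) + lambda * norm(D*f, 1) )
    cvx_end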

Introduction VI. Classification of optimization problems: Linear (good properties, polynomial-time algorithms); Convex (share a lot of the properties of linear problems, good complexity properties); Nonlinear and non-convex (difficult, global optimality statements are usually not possible). Instead of linear versus nonlinear, consider convex versus non-convex problem classes.

Introduction VII. Goodies of convex optimization problems: many interesting problems can be formulated as convex optimization problems; the dual problem of non-convex problems is convex, which yields lower bounds for difficult problems; efficient algorithms are available - but this is still an active research area. Goal of this course: overview of the theory of convex analysis and convex optimization; the modeling aspect in applications: how to recognize and formulate a convex optimization problem; introduction to nonlinear programming, interior point methods and specialized methods.

Reminder of Analysis. Properties of sets (in $\mathbb{R}^n$): Definition 1. A point $x \in C$ lies in the interior of $C$ if $\exists\, \varepsilon > 0$ such that $B(x,\varepsilon) \subseteq C$. A point $x \in C$ lies at the boundary if for every $\varepsilon > 0$ the ball around $x$ contains a point $y \notin C$. A set $C$ is open if every point $x$ in $C$ is an interior point. A set $C$ is closed if the complement $\mathbb{R}^n \setminus C$ is open. A set $C \subseteq \mathbb{R}^n$ is compact if it is closed and bounded. The closure of $C$ is the set $C$ plus the limits of all convergent sequences of elements in $C$. A closed set $C$ contains all limits of sequences of elements in $C$.

Reminder of Analysis II. Continuous functions: Definition 2. A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is continuous at $x$ if for all $\varepsilon > 0$ there exists a $\delta > 0$ such that $\|x - y\| \le \delta \implies \|f(x) - f(y)\| \le \varepsilon$. In particular, for a continuous function we have
$$\lim_{n \to \infty} f(x_n) = f\Bigl(\lim_{n \to \infty} x_n\Bigr).$$
Closed functions, level set: Definition 3. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called closed if for each $\alpha$ the (sub)level set $L_\alpha = \{x \in \operatorname{dom} f \mid f(x) \le \alpha\}$ is closed. The level set of a discontinuous function need not be closed.

Reminder of Analysis III. Definition 4. A function $f : \mathbb{R}^n \to \mathbb{R}$ has a local minimum at $x$ if $\exists\, \varepsilon > 0$ such that $f(x) \le f(y)$ for all $y \in B(x,\varepsilon)$. Properties: on a compact set every continuous function attains its global maximum/minimum; convex functions are (almost) continuous (except possibly at the boundary of their domain).

Reminder of Analysis IV. Discontinuous functions: there exist functions which are everywhere discontinuous, e.g. the Dirichlet function
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{if } x \in \mathbb{R}\setminus\mathbb{Q}, \end{cases}$$
and functions which are continuous only at the irrational numbers, e.g. the Thomae function
$$f(x) = \begin{cases} 1/q & \text{if } x = \tfrac{p}{q} \text{ with } \gcd(p,q) = 1, \\ 0 & \text{if } x \in \mathbb{R}\setminus\mathbb{Q}. \end{cases}$$
A discontinuous function can fail to attain a global maximum/minimum on a compact set ($\operatorname{dom} f = [-\tfrac{\pi}{2}, \tfrac{\pi}{2}]$, $f(x) = \tan(x)$ in the interior and $f(\pi/2) = f(-\pi/2) = 0$), and there exist discontinuous functions which have no local minima/maxima.

Reminder of Analysis V. Jacobian, Gradient: Definition 5. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ and $x \in \operatorname{int} \operatorname{dom} f$; the derivative or Jacobian of $f$ at $x$ is the matrix $Df(x) \in \mathbb{R}^{m \times n}$ given by
$$Df(x)_{ij} = \frac{\partial f_i}{\partial x_j}, \quad i = 1,\dots,m, \ j = 1,\dots,n.$$
The affine function $g$ of $z$ given by $g(z) = f(x) + Df(x)(z - x)$ is the (best) first-order approximation of $f$ at $x$. Definition 6. If $f : \mathbb{R}^n \to \mathbb{R}$, the Jacobian reduces to the gradient, which we usually write as a (column) vector: $\nabla f(x) = Df(x)^T$, that is, $\bigl(\nabla f(x)\bigr)_i = \frac{\partial f}{\partial x_i}$, $i = 1,\dots,n$.

Reminder of Analysis VI. Second derivative, Hessian and Taylor's theorem: Definition 7. Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice differentiable and $x \in \operatorname{int} \operatorname{dom} f$; the Hessian matrix of $f$ at $x$ is the matrix $Hf(x) \in \mathbb{R}^{n \times n}$ given by
$$Hf(x)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}, \quad i,j = 1,\dots,n.$$
BV use $\nabla^2 f$ for the Hessian matrix. The quadratic function $g$ of $z$ given by
$$g(z) = f(x) + \langle \nabla f(x), z - x\rangle + \tfrac{1}{2}\langle z - x, Hf(x)(z - x)\rangle$$
is the (best) second-order approximation of $f$ at $x$. Theorem 1 (Taylor second-order expansion). Let $\Omega \subseteq \mathbb{R}^n$, $f \in C^2(\Omega)$ and $x \in \Omega$; then for all $h \in \mathbb{R}^n$ with $[x, x+h] \subseteq \Omega$ there exists $\theta \in [0,1]$ such that
$$f(x + h) = f(x) + \langle \nabla f(x), h\rangle + \tfrac{1}{2}\langle h, Hf(x + \theta h)\, h\rangle.$$
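A quick numerical sanity check of the first- and second-order approximations for f(x) = sin(x) at x = pi/4 (cf. Figure 2 below); the step size h is an arbitrary choice.

    x = pi/4; h = 0.1;
    fxh = sin(x + h);                                % exact value
    g1  = sin(x) + cos(x) * h;                       % first-order approximation
    g2  = sin(x) + cos(x) * h - 0.5 * sin(x) * h^2;  % second-order approximation
    [fxh - g1, fxh - g2]                             % second-order error is much smaller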

Taylor expansion. Figure 2: The first- and second-order Taylor approximations of $f(x) = \sin(x)$ at $x = \tfrac{\pi}{4}$; $f(\pi/4) = \tfrac{\sqrt{2}}{2}$, $f'(\pi/4) = \tfrac{\sqrt{2}}{2}$, $f''(\pi/4) = -\tfrac{\sqrt{2}}{2}$.

Reminder of Linear Algebra. Range and kernel of linear mappings: Definition 8. Let $A \in \mathbb{R}^{m \times n}$. The range of $A$ is the subspace of $\mathbb{R}^m$ defined as $\operatorname{ran} A = \{x \in \mathbb{R}^m \mid x = Ay,\ y \in \mathbb{R}^n\}$. The dimension of $\operatorname{ran} A$ is the rank of $A$. The null space or kernel of $A$ is the subspace of $\mathbb{R}^n$ defined as $\ker A = \{y \in \mathbb{R}^n \mid Ay = 0\}$. Theorem 2. One has $\dim \ker A + \dim \operatorname{ran} A = n$. Moreover, one has the orthogonal decomposition $\mathbb{R}^n = \ker A \oplus \operatorname{ran} A^T$.
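A small numerical illustration of Theorem 2; the matrix A below is an arbitrary example.

    A = [1 2 3; 2 4 6];    % a rank-1 matrix in R^(2x3), so n = 3
    r = rank(A);           % dim ran A = 1
    N = null(A);           % orthonormal basis of ker A
    size(N, 2) + r         % dim ker A + dim ran A = n = 3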

Reminder of Linear Algebra II. Symmetric matrices: Every real, symmetric matrix $A \in \mathbb{R}^{n \times n}$ can be written as $A = Q \Lambda Q^T$, where $Q \in \mathbb{R}^{n \times n}$ is orthogonal ($QQ^T = \mathbb{1}$) and $\Lambda$ is a diagonal matrix having the eigenvalues $\lambda_i$ on the diagonal. Alternatively, one can write $A = \sum_{i=1}^n \lambda_i q_i q_i^T$, where $q_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i$. Moreover,
$$\det A = \prod_{i=1}^n \lambda_i, \qquad \operatorname{tr} A = \sum_{i=1}^n \lambda_i.$$
One can find the eigenvalues via the so-called Rayleigh-Ritz principle:
$$\lambda_{\min} = \inf_{v \in \mathbb{R}^n,\, v \ne 0} \frac{\langle v, Av\rangle}{\langle v, v\rangle}, \qquad \lambda_{\max} = \sup_{v \in \mathbb{R}^n,\, v \ne 0} \frac{\langle v, Av\rangle}{\langle v, v\rangle}.$$
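The spectral decomposition and the trace/determinant identities can be checked numerically; the symmetric matrix A below is a random example.

    B = randn(4); A = (B + B') / 2;   % a random symmetric matrix
    [Q, L] = eig(A);                  % A = Q*L*Q' with Q orthogonal
    norm(A - Q*L*Q', 'fro')           % ~ 0 (up to rounding)
    [det(A), prod(diag(L))]           % determinant = product of eigenvalues
    [trace(A), sum(diag(L))]          % trace = sum of eigenvalues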

Reminder of Linear Algebra III. Positive definite matrices: Definition 9. A real, symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive semi-definite if $\langle w, Aw\rangle \ge 0$ for all $w \in \mathbb{R}^n$. The real, symmetric matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if $\langle w, Aw\rangle > 0$ for all $w \in \mathbb{R}^n$ with $w \ne 0$. Notation: $S^n$ is the set of symmetric matrices in $\mathbb{R}^{n \times n}$, $S^n_+$ the set of positive semi-definite matrices, and $S^n_{++}$ the set of positive definite matrices.
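Numerically, positive definiteness of a symmetric matrix can be checked via its eigenvalues or a Cholesky factorization; the construction A = B'*B below is just one way to produce a positive (semi-)definite example.

    B = randn(5, 3);
    A = B' * B;                  % positive semi-definite by construction
    min(eig(A))                  % > 0 here, since B has full column rank almost surely
    [~, p] = chol(A); p == 0     % Cholesky succeeds exactly when A is positive definite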

Reminder of Linear Algebra IV. Singular value decomposition: Every real matrix $A \in \mathbb{R}^{m \times n}$ can be written as $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal and $\Sigma \in \mathbb{R}^{m \times n}$ is a (rectangular) diagonal matrix having the singular values $\sigma_i$ on the diagonal. Facts: the singular values $\sigma_i$ are non-negative; the number of non-zero singular values equals the rank of $A$; $U$ contains the left singular vectors (eigenvectors of $AA^T$); $V$ contains the right singular vectors (eigenvectors of $A^T A$); the squared singular values $\sigma_i^2$ are the eigenvalues of $AA^T$ (equivalently, of $A^T A$).
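A numerical check of these facts; the matrix A is a random example of rank at most 3.

    A = randn(4, 3) * randn(3, 5);         % a 4x5 matrix of rank at most 3
    [U, S, V] = svd(A);
    sv = diag(S);                          % singular values, sorted descending
    nnz(sv > 1e-10)                        % = rank(A) = 3
    norm(sort(sv.^2) - sort(eig(A*A')))    % sigma_i^2 are the eigenvalues of A*A'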

Norms. Definition 10. Let $V$ be a vector space. A norm $\|\cdot\| : V \to \mathbb{R}$ satisfies: non-negativity: $\|x\| \ge 0$ for all $x \in V$, and $\|x\| = 0 \iff x = 0$; homogeneity: $\|\alpha x\| = |\alpha|\,\|x\|$; triangle inequality: $\|x + y\| \le \|x\| + \|y\|$. A norm induces a distance measure (metric): $d(x,y) = \|x - y\|$. In $\mathbb{R}^n$ we have the $p$-norms ($p \ge 1$):
$$\|x\|_p = \Bigl(\sum_{i=1}^n |x_i|^p\Bigr)^{1/p}.$$
On matrices $\mathbb{R}^{m \times n}$ this can be defined entrywise in the same way:
$$\|X\|_p = \Bigl(\sum_{i=1}^m \sum_{j=1}^n |X_{ij}|^p\Bigr)^{1/p}.$$
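In Matlab the vector p-norms are available directly, and the entrywise matrix norm for p = 2 coincides with the Frobenius norm; x and X below are arbitrary examples.

    x = [3; -4; 12];
    [norm(x, 1), norm(x, 2), norm(x, Inf)]   % 19, 13, 12
    sum(abs(x).^2)^(1/2)                     % same as norm(x, 2)
    X = randn(3, 4);
    abs(norm(X(:), 2) - norm(X, 'fro'))      % entrywise 2-norm = Frobenius norm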

p-norm. Level sets of the p-norms: the figure shows the unit balls of several p-norms. Note that for $p < 1$ the unit ball is not convex, hence $\|\cdot\|_p$ is no norm.

Norms II. Operator/matrix norm: Definition 11. Let $\|\cdot\|_\alpha$ be a norm on $\mathbb{R}^m$ and $\|\cdot\|_\beta$ a norm on $\mathbb{R}^n$. The operator norm of $A : \mathbb{R}^m \to \mathbb{R}^n$ is defined as
$$\|A\|_{\alpha,\beta} = \sup_{v \in \mathbb{R}^m,\, \|v\|_\alpha = 1} \|Av\|_\beta.$$
This is equivalent to $\|A\|_{\alpha,\beta} = \sup_{v \in \mathbb{R}^m,\, v \ne 0} \frac{\|Av\|_\beta}{\|v\|_\alpha}$. If both norms are Euclidean, then the operator norm is $\|A\|_{2,2} = \sigma_{\max}(A) = \sqrt{\lambda_{\max}(A^T A)}$. Proof:
$$\frac{\|Av\|_2^2}{\|v\|_2^2} = \frac{\langle v, A^T A v\rangle}{\langle v, v\rangle},$$
and by the Rayleigh-Ritz principle the supremum of the right-hand side over $v \ne 0$ is $\lambda_{\max}(A^T A)$.
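For the Euclidean case the three expressions for the operator norm can be compared directly; A is a random example.

    A = randn(5, 3);
    [norm(A, 2), max(svd(A)), sqrt(max(eig(A'*A)))]   % all three coincide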

Norms III. Equivalent norms: Definition 12. We say that two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a vector space $V$ are equivalent if there exist $a, b > 0$ such that
$$a\,\|x\|_1 \le \|x\|_2 \le b\,\|x\|_1, \quad \forall\, x \in V.$$
Remarks: all norms on $\mathbb{R}^n$ are equivalent to each other, e.g.
$$\|x\|_2 = \sqrt{\sum_i x_i^2} \;\le\; \sum_i |x_i| = \|x\|_1 \;\le\; \sqrt{\sum_i x_i^2}\,\sqrt{\sum_i 1} = \sqrt{n}\,\|x\|_2,$$
$$\|x\|_\infty = \max_i |x_i| \;\le\; \sum_i |x_i| = \|x\|_1 \;\le\; n \max_i |x_i| = n\,\|x\|_\infty.$$
The definition of a continuous function $f : \mathbb{R}^n \to \mathbb{R}^m$ therefore does not depend on the choice of the norm.

Inner Product. Definition 13. Let $V$ be a vector space. An inner product $\langle\cdot,\cdot\rangle$ over $\mathbb{R}$ is a bilinear form $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{R}$ such that: symmetry: $\langle x, y\rangle = \langle y, x\rangle$; non-negativity: $\langle x, x\rangle \ge 0$; non-degeneracy: $\langle x, x\rangle = 0 \iff x = 0$. Remarks: An inner product space is a vector space with an inner product; a complete inner product space is a Hilbert space; a complete normed space is a Banach space.

Inner Product II. The standard inner product on $\mathbb{R}^n$ is given for $x, y \in \mathbb{R}^n$ as $\langle x, y\rangle = x^T y = \sum_{i=1}^n x_i y_i$. One can extend this to the set of matrices $\mathbb{R}^{n \times m}$: for $X, Y \in \mathbb{R}^{n \times m}$, $\langle X, Y\rangle = \operatorname{tr}(X^T Y) = \sum_{i=1}^n \sum_{j=1}^m X_{ij} Y_{ij}$. Clearly, an inner product induces a norm via $\|x\| = \sqrt{\langle x, x\rangle}$. The norm for the inner product on matrices is the Frobenius norm $\|X\|_F = \sqrt{\operatorname{tr}(X^T X)} = \bigl(\sum_{i=1}^n \sum_{j=1}^m X_{ij}^2\bigr)^{1/2}$. Every inner product fulfills the Cauchy-Schwarz inequality $|\langle x, y\rangle| \le \|x\|\,\|y\|$.
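The matrix inner product, the Frobenius norm and the Cauchy-Schwarz inequality can be verified in Matlab; x, y, X, Y below are random examples.

    x = randn(5, 1); y = randn(5, 1);
    abs(x' * y) <= norm(x) * norm(y)             % Cauchy-Schwarz for vectors
    X = randn(3, 4); Y = randn(3, 4);
    ip = trace(X' * Y);                          % <X,Y> = tr(X^T Y)
    abs(ip) <= norm(X, 'fro') * norm(Y, 'fro')   % Cauchy-Schwarz for matrices
    sqrt(trace(X' * X)) - norm(X, 'fro')         % Frobenius norm from the inner product (~ 0)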

Hierarchy of mathematical structures. The hierarchy of mathematical structures - an arrow denotes inclusion (e.g. a Banach space is also a metric space, or $\mathbb{R}^n$ is also a manifold). Drawing from the Teubner-Taschenbuch der Mathematik.