Convex Optimization: Applications

Convex Optimization: Applications
Lecturer: Pradeep Ravikumar
Co-instructor: Aarti Singh
Convex Optimization 10-725/36-725
Based on material from Boyd & Vandenberghe

Norm Approximation

minimize $\|Ax - b\|$

($A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $\|\cdot\|$ is a norm on $\mathbb{R}^m$)

Interpretations of the solution $x^\star = \operatorname{argmin}_x \|Ax - b\|$:
- geometric: $Ax^\star$ is the point in $\mathcal{R}(A)$ closest to $b$
- estimation: linear measurement model $y = Ax + v$; $y$ are measurements, $x$ is unknown, $v$ is measurement error; given $y = b$, the best guess of $x$ is $x^\star$
- optimal design: $x$ are design variables (input), $Ax$ is the result (output); $x^\star$ is the design that best approximates the desired result $b$

Norm Approximation: $\ell_2$ norm

minimize $\|Ax - b\|$

($A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $\|\cdot\|$ is a norm on $\mathbb{R}^m$)

Least-squares approximation ($\|\cdot\|_2$): the solution satisfies the normal equations
$$A^T A x = A^T b$$
($x^\star = (A^T A)^{-1} A^T b$ if $\operatorname{rank} A = n$)
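
The normal-equations view is easy to check numerically. Below is a minimal NumPy sketch on a randomly generated instance (the matrix $A$, vector $b$, and sizes are made up for illustration, not taken from the lecture); it assumes $A$ has full column rank.

```python
import numpy as np

# illustrative random instance (not from the slides)
rng = np.random.default_rng(0)
m, n = 100, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# solve the normal equations A^T A x = A^T b (assumes rank A = n)
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# numerically preferable: SVD-based least squares
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # the two solutions agree
```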

Norm Approximation: $\ell_\infty$ norm

minimize $\|Ax - b\|$

($A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $\|\cdot\|$ is a norm on $\mathbb{R}^m$)

Chebyshev approximation ($\|\cdot\|_\infty$): can be solved as an LP

minimize $t$
subject to $-t\mathbf{1} \preceq Ax - b \preceq t\mathbf{1}$
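
As a sketch of the LP reformulation (not part of the lecture), the following sets up the Chebyshev approximation problem in CVXPY on synthetic data; the instance and the use of CVXPY are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# illustrative random instance (not from the slides)
rng = np.random.default_rng(0)
m, n = 100, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# explicit LP formulation of the Chebyshev (l_inf) approximation problem
x = cp.Variable(n)
t = cp.Variable()
constraints = [A @ x - b <= t * np.ones(m), A @ x - b >= -t * np.ones(m)]
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()

# sanity check: the optimal t equals the l_inf norm of the residual
print(prob.value, np.linalg.norm(A @ x.value - b, np.inf))
```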

Norm Approximation: $\ell_1$ norm

minimize $\|Ax - b\|$

($A \in \mathbb{R}^{m \times n}$ with $m \ge n$, $\|\cdot\|$ is a norm on $\mathbb{R}^m$)

Sum of absolute residuals approximation ($\|\cdot\|_1$): can be solved as an LP

minimize $\mathbf{1}^T y$
subject to $-y \preceq Ax - b \preceq y$
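
A minimal CVXPY sketch of the same LP, with the auxiliary variable $y$ written out explicitly; the random data are illustrative, not from the lecture.

```python
import cvxpy as cp
import numpy as np

# illustrative random instance (not from the slides)
rng = np.random.default_rng(0)
m, n = 100, 10
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# LP formulation of the l1 approximation problem with auxiliary variable y
x = cp.Variable(n)
y = cp.Variable(m)
constraints = [A @ x - b <= y, A @ x - b >= -y]
prob = cp.Problem(cp.Minimize(cp.sum(y)), constraints)
prob.solve()

# the optimal value equals ||Ax* - b||_1
print(prob.value, np.linalg.norm(A @ x.value - b, 1))
```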

Penalty Function Approximation

minimize $\phi(r_1) + \cdots + \phi(r_m)$
subject to $r = Ax - b$

($A \in \mathbb{R}^{m \times n}$, $\phi : \mathbb{R} \to \mathbb{R}$ is a convex penalty function)

Examples:
- quadratic: $\phi(u) = u^2$
- deadzone-linear with width $a$: $\phi(u) = \max\{0, |u| - a\}$
- log-barrier with limit $a$: $\phi(u) = -a^2 \log(1 - (u/a)^2)$ for $|u| < a$, $\infty$ otherwise

[Figure: the quadratic, deadzone-linear, and log-barrier penalties plotted as functions of $u$.]

Penalty Function Approximation

Huber penalty function (with parameter $M$):
$$\phi_{\mathrm{hub}}(u) = \begin{cases} u^2 & |u| \le M \\ M(2|u| - M) & |u| > M \end{cases}$$

Linear growth for large $u$ makes the approximation less sensitive to outliers.

[Figure, left: Huber penalty for $M = 1$. Right: affine function $f(t) = \alpha + \beta t$ fitted to data points $(t_i, y_i)$ (circles) using the quadratic (dashed) and Huber (solid) penalties.]
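
To illustrate the outlier-robustness of the Huber penalty, here is a small CVXPY sketch of a line fit on synthetic data with injected outliers; the data set, noise level, and parameter $M = 1$ are all made up for illustration (CVXPY's `huber` atom matches the definition above).

```python
import cvxpy as cp
import numpy as np

# synthetic robust line fit with a few outliers (not the slide's data set)
rng = np.random.default_rng(0)
t = np.linspace(-10, 10, 40)
y = 1.0 + 0.5 * t + 0.3 * rng.standard_normal(40)
y[::10] += 8.0  # inject outliers

alpha, beta = cp.Variable(), cp.Variable()
residual = alpha + beta * t - y

# quadratic (least-squares) fit
cp.Problem(cp.Minimize(cp.sum_squares(residual))).solve()
ls_fit = (alpha.value, beta.value)

# Huber-penalty fit with parameter M = 1
cp.Problem(cp.Minimize(cp.sum(cp.huber(residual, M=1)))).solve()
print("least squares:", ls_fit, "huber:", (alpha.value, beta.value))
```

The Huber fit stays close to the true slope and intercept, while the quadratic fit is pulled toward the outliers.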

Least Norm Problems

minimize $\|x\|$
subject to $Ax = b$

($A \in \mathbb{R}^{m \times n}$ with $m \le n$, $\|\cdot\|$ is a norm on $\mathbb{R}^n$)

Interpretations of the solution $x^\star = \operatorname{argmin}_{Ax = b} \|x\|$:
- geometric: $x^\star$ is the point in the affine set $\{x \mid Ax = b\}$ with minimum distance to $0$
- estimation: $b = Ax$ are (perfect) measurements of $x$; $x^\star$ is the smallest ("most plausible") estimate consistent with the measurements
- design: $x$ are design variables (inputs), $b$ are required results (outputs); $x^\star$ is the smallest ("most efficient") design that satisfies the requirements

Least Norm Problems: $\ell_1$ norm

minimize $\|x\|$
subject to $Ax = b$

($A \in \mathbb{R}^{m \times n}$ with $m \le n$, $\|\cdot\|$ is a norm on $\mathbb{R}^n$)

Minimum sum of absolute values ($\|\cdot\|_1$): can be solved as an LP

minimize $\mathbf{1}^T y$
subject to $-y \preceq x \preceq y$, $Ax = b$

Tends to produce a sparse solution $x^\star$.
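
The sparsity-promoting behavior can be seen on a small synthetic example. The CVXPY sketch below compares the minimum $\ell_2$-norm and minimum $\ell_1$-norm solutions of an underdetermined system; the instance (sizes, sparsity level) is an illustrative assumption.

```python
import cvxpy as cp
import numpy as np

# illustrative underdetermined system with a sparse ground truth (not from the slides)
rng = np.random.default_rng(0)
m, n = 30, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
b = A @ x_true

x = cp.Variable(n)

# minimum l2-norm solution (dense in general)
cp.Problem(cp.Minimize(cp.norm(x, 2)), [A @ x == b]).solve()
x_l2 = x.value

# minimum l1-norm solution (tends to be sparse)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == b]).solve()
print("nonzeros (l2):", np.sum(np.abs(x_l2) > 1e-6),
      "nonzeros (l1):", np.sum(np.abs(x.value) > 1e-6))
```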

Least Norm Problems: least-penalty extension

minimize $\|x\|$
subject to $Ax = b$

($A \in \mathbb{R}^{m \times n}$ with $m \le n$, $\|\cdot\|$ is a norm on $\mathbb{R}^n$)

Extension: least-penalty problem

minimize $\phi(x_1) + \cdots + \phi(x_n)$
subject to $Ax = b$

where $\phi : \mathbb{R} \to \mathbb{R}$ is a convex penalty function.

Regularized Approximation

minimize (w.r.t. $\mathbb{R}^2_+$) $(\|Ax - b\|, \|x\|)$

$A \in \mathbb{R}^{m \times n}$; the norms on $\mathbb{R}^m$ and $\mathbb{R}^n$ can be different.

Interpretation: find a good approximation $Ax \approx b$ with small $x$.
- estimation: linear measurement model $y = Ax + v$, with prior knowledge that $\|x\|$ is small
- optimal design: small $x$ is cheaper or more efficient, or the linear model $y = Ax$ is only valid for small $x$
- robust approximation: a good approximation $Ax \approx b$ with small $x$ is less sensitive to errors in $A$ than a good approximation with large $x$

Regularized Approximation

minimize $\|Ax - b\| + \gamma \|x\|$

The solution for $\gamma > 0$ traces out the optimal trade-off curve.

Other common method: minimize $\|Ax - b\|^2 + \delta \|x\|^2$ with $\delta > 0$.

Tikhonov regularization

minimize $\|Ax - b\|_2^2 + \delta \|x\|_2^2$

can be solved as a least-squares problem

minimize $\left\| \begin{bmatrix} A \\ \sqrt{\delta}\, I \end{bmatrix} x - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2$

with solution $x^\star = (A^T A + \delta I)^{-1} A^T b$.
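
The closed form and the stacked least-squares reformulation can be verified numerically. The NumPy sketch below uses a random instance and an arbitrary value of $\delta$, both illustrative assumptions.

```python
import numpy as np

# Tikhonov-regularized least squares on a synthetic instance (illustrative only)
rng = np.random.default_rng(0)
m, n, delta = 100, 20, 0.1
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# closed form: x = (A^T A + delta I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + delta * np.eye(n), A.T @ b)

# equivalent stacked least-squares problem: minimize ||[A; sqrt(delta) I] x - [b; 0]||_2^2
A_stack = np.vstack([A, np.sqrt(delta) * np.eye(n)])
b_stack = np.concatenate([b, np.zeros(n)])
x_stack, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)

print(np.allclose(x_closed, x_stack))
```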

Signal Reconstruction

minimize (w.r.t. $\mathbb{R}^2_+$) $(\|\hat{x} - x_{\mathrm{cor}}\|_2, \phi(\hat{x}))$

- $x \in \mathbb{R}^n$ is the unknown signal
- $x_{\mathrm{cor}} = x + v$ is the (known) corrupted version of $x$, with additive noise $v$
- the variable $\hat{x}$ (reconstructed signal) is our estimate of $x$
- $\phi : \mathbb{R}^n \to \mathbb{R}$ is a regularization function or smoothing objective

Examples: quadratic smoothing, total variation smoothing:

$$\phi_{\mathrm{quad}}(\hat{x}) = \sum_{i=1}^{n-1} (\hat{x}_{i+1} - \hat{x}_i)^2, \qquad \phi_{\mathrm{tv}}(\hat{x}) = \sum_{i=1}^{n-1} |\hat{x}_{i+1} - \hat{x}_i|$$
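
Before the figures that follow, here is a small CVXPY sketch of both smoothing objectives (scalarized with an arbitrary trade-off weight) on a synthetic piecewise-constant signal; the signal, noise level, and weight are illustrative assumptions, not the lecture's example.

```python
import cvxpy as cp
import numpy as np

# synthetic piecewise-constant signal plus noise (not the slide's example)
rng = np.random.default_rng(0)
n = 200
x = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])  # one sharp transition
x_cor = x + 0.2 * rng.standard_normal(n)

xhat = cp.Variable(n)
gamma = 5.0  # trade-off weight, chosen arbitrarily

# quadratic smoothing: scalarized bi-criterion problem
phi_quad = cp.sum_squares(cp.diff(xhat))
cp.Problem(cp.Minimize(cp.sum_squares(xhat - x_cor) + gamma * phi_quad)).solve()
x_quad = xhat.value

# total variation smoothing (cp.tv is the sum of absolute differences for a 1-D signal)
cp.Problem(cp.Minimize(cp.sum_squares(xhat - x_cor) + gamma * cp.tv(xhat))).solve()
x_tv = xhat.value
# x_tv preserves the sharp transition; x_quad smears it out
```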

Signal Reconstruction: Quadratic Smoothing

[Figure, left column: original signal $x$ and noisy signal $x_{\mathrm{cor}}$. Right column: three solutions $\hat{x}$ on the trade-off curve of $\|\hat{x} - x_{\mathrm{cor}}\|_2$ versus $\phi_{\mathrm{quad}}(\hat{x})$.]

Signal Reconstruction: Total Variation Smoothing

[Figure, left column: original signal $x$ and noisy signal $x_{\mathrm{cor}}$. Right column: three solutions on the trade-off curve of $\|\hat{x} - x_{\mathrm{cor}}\|_2$ versus $\phi_{\mathrm{quad}}(\hat{x})$.]

Quadratic smoothing smooths out the noise, but also the sharp transitions in the signal.

Signal Reconstruction: Total Variation Smoothing

[Figure, left column: original signal $x$ and noisy signal $x_{\mathrm{cor}}$. Right column: three solutions on the trade-off curve of $\|\hat{x} - x_{\mathrm{cor}}\|_2$ versus $\phi_{\mathrm{tv}}(\hat{x})$.]

Total variation smoothing preserves the sharp transitions in the signal.

Robust Approximation

minimize $\|Ax - b\|$ with uncertain $A$

Two approaches:
- stochastic: assume $A$ is random, minimize $\mathbb{E}\,\|Ax - b\|$
- worst-case: set $\mathcal{A}$ of possible values of $A$, minimize $\sup_{A \in \mathcal{A}} \|Ax - b\|$

Tractable only in special cases (certain norms $\|\cdot\|$, distributions, sets $\mathcal{A}$).

Robust Approximation

minimize $\|Ax - b\|$ with uncertain $A$

Two approaches:
- stochastic: assume $A$ is random, minimize $\mathbb{E}\,\|Ax - b\|$
- worst-case: set $\mathcal{A}$ of possible values of $A$, minimize $\sup_{A \in \mathcal{A}} \|Ax - b\|$

Tractable only in special cases (certain norms $\|\cdot\|$, distributions, sets $\mathcal{A}$).

Example: $A(u) = A_0 + u A_1$
- $x_{\mathrm{nom}}$ minimizes $\|A_0 x - b\|_2^2$
- $x_{\mathrm{stoch}}$ minimizes $\mathbb{E}\,\|A(u)x - b\|_2^2$ with $u$ uniform on $[-1, 1]$
- $x_{\mathrm{wc}}$ minimizes $\sup_{-1 \le u \le 1} \|A(u)x - b\|_2^2$

[Figure: $r(u) = \|A(u)x - b\|_2$ as a function of $u$ for the three solutions.]

Robust Approximation: Stochastic

Stochastic robust LS with $A = \bar{A} + U$, $U$ random, $\mathbb{E}\,U = 0$, $\mathbb{E}\,U^T U = P$:

minimize $\mathbb{E}\,\|(\bar{A} + U)x - b\|_2^2$

Explicit expression for the objective:

$$\mathbb{E}\,\|Ax - b\|_2^2 = \mathbb{E}\,\|\bar{A}x - b + Ux\|_2^2 = \|\bar{A}x - b\|_2^2 + \mathbb{E}\,x^T U^T U x = \|\bar{A}x - b\|_2^2 + x^T P x$$

Hence, the robust LS problem is equivalent to the LS problem

minimize $\|\bar{A}x - b\|_2^2 + \|P^{1/2} x\|_2^2$

For $P = \delta I$, we recover the Tikhonov-regularized problem

minimize $\|\bar{A}x - b\|_2^2 + \delta \|x\|_2^2$
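
A NumPy sketch of this reduction on a synthetic instance (the matrices $\bar{A}$, $P$, and $b$ here are made up for illustration); it checks that the $x^T P x$ term can be folded into a stacked least-squares problem via a matrix square root of $P$.

```python
import numpy as np

# stochastic robust LS reduced to ordinary LS (illustrative instance)
rng = np.random.default_rng(0)
m, n = 100, 20
A_bar = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# P = E[U^T U], here an arbitrary positive definite matrix
G = rng.standard_normal((n, n))
P = G.T @ G / n

# minimize ||A_bar x - b||_2^2 + x^T P x  <=>  (A_bar^T A_bar + P) x = A_bar^T b
x_robust = np.linalg.solve(A_bar.T @ A_bar + P, A_bar.T @ b)

# equivalently, stacked LS with a square root of P (Cholesky: P = L L^T, use L^T)
P_half = np.linalg.cholesky(P).T
A_stack = np.vstack([A_bar, P_half])
b_stack = np.concatenate([b, np.zeros(n)])
x_stack, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)
print(np.allclose(x_robust, x_stack))
```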

Robust Approximation: Worst-Case

Worst-case robust LS with $\mathcal{A} = \{\bar{A} + u_1 A_1 + \cdots + u_p A_p \mid \|u\|_2 \le 1\}$:

minimize $\sup_{A \in \mathcal{A}} \|Ax - b\|_2^2 = \sup_{\|u\|_2 \le 1} \|P(x)u + q(x)\|_2^2$

where $P(x) = [A_1 x \;\; A_2 x \;\; \cdots \;\; A_p x]$, $q(x) = \bar{A}x - b$.

Robust Approximation: Worst-Case

Worst-case robust LS with $\mathcal{A} = \{\bar{A} + u_1 A_1 + \cdots + u_p A_p \mid \|u\|_2 \le 1\}$:

minimize $\sup_{A \in \mathcal{A}} \|Ax - b\|_2^2 = \sup_{\|u\|_2 \le 1} \|P(x)u + q(x)\|_2^2$

where $P(x) = [A_1 x \;\; A_2 x \;\; \cdots \;\; A_p x]$, $q(x) = \bar{A}x - b$.

It can be shown (see the duality lecture, page 5-14) that strong duality holds between the following problems:

maximize $\|Pu + q\|_2^2$ subject to $\|u\|_2^2 \le 1$

minimize $t + \lambda$ subject to
$$\begin{bmatrix} I & P & q \\ P^T & \lambda I & 0 \\ q^T & 0 & t \end{bmatrix} \succeq 0$$

Hence, the robust LS problem is equivalent to the SDP

minimize $t + \lambda$ subject to
$$\begin{bmatrix} I & P(x) & q(x) \\ P(x)^T & \lambda I & 0 \\ q(x)^T & 0 & t \end{bmatrix} \succeq 0$$
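
As a rough sketch (not from the lecture), the SDP above can be set up in CVXPY on a tiny random instance; the sizes, data, and the particular way the linear matrix inequality is assembled with `cp.bmat`/`cp.reshape` are all illustrative assumptions, and an SDP-capable solver (e.g., the bundled SCS) is assumed.

```python
import cvxpy as cp
import numpy as np

# worst-case robust LS as an SDP on a small synthetic instance (illustrative sketch)
rng = np.random.default_rng(0)
m, n, p = 10, 4, 2
A_bar = rng.standard_normal((m, n))
A_pert = [rng.standard_normal((m, n)) for _ in range(p)]  # A_1, ..., A_p
b = rng.standard_normal(m)

x = cp.Variable(n)
t = cp.Variable()
lam = cp.Variable()

# P(x) has columns A_i x; q(x) = A_bar x - b
P = cp.hstack([cp.reshape(Ai @ x, (m, 1)) for Ai in A_pert])
q = cp.reshape(A_bar @ x - b, (m, 1))

# block LMI [[I, P, q], [P^T, lam I, 0], [q^T, 0, t]] >= 0 (symmetric by construction)
M = cp.bmat([
    [np.eye(m),        P,                  q],
    [P.T,              lam * np.eye(p),    np.zeros((p, 1))],
    [q.T,              np.zeros((1, p)),   cp.reshape(t, (1, 1))],
])
prob = cp.Problem(cp.Minimize(t + lam), [M >> 0])
prob.solve()
print("worst-case objective bound:", prob.value)
```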

Statistical Estimation

Distribution estimation problem: estimate the probability density $p(y)$ of a random variable from observed values.

Parametric distribution estimation: choose from a family of densities $p_x(y)$, indexed by a parameter $x$.

Maximum likelihood estimation:

maximize (over $x$) $\log p_x(y)$

- $y$ is the observed value
- $l(x) = \log p_x(y)$ is called the log-likelihood function
- can add constraints $x \in C$ explicitly, or define $p_x(y) = 0$ for $x \notin C$
- this is a convex optimization problem if $\log p_x(y)$ is concave in $x$ for fixed $y$

Statistical Estimation: Linear Measurements with Noise

Linear measurement model: $y_i = a_i^T x + v_i$, $i = 1, \ldots, m$
- $x \in \mathbb{R}^n$ is a vector of unknown parameters
- $v_i$ is IID measurement noise with density $p(z)$
- $y_i$ is the measurement: $y \in \mathbb{R}^m$ has density $p_x(y) = \prod_{i=1}^m p(y_i - a_i^T x)$

Maximum likelihood estimate: any solution $x$ of

maximize $l(x) = \sum_{i=1}^m \log p(y_i - a_i^T x)$

($y$ is the observed value)

Statistical Estimation: Linear Measurements with Noise

- Gaussian noise $\mathcal{N}(0, \sigma^2)$: $p(z) = (2\pi\sigma^2)^{-1/2} e^{-z^2/(2\sigma^2)}$,
$$l(x) = -\frac{m}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^m (a_i^T x - y_i)^2$$
ML estimate is the LS solution.

- Laplacian noise: $p(z) = (1/(2a)) e^{-|z|/a}$,
$$l(x) = -m \log(2a) - \frac{1}{a}\sum_{i=1}^m |a_i^T x - y_i|$$
ML estimate is the $\ell_1$-norm solution.

- Uniform noise on $[-a, a]$:
$$l(x) = \begin{cases} -m \log(2a) & |a_i^T x - y_i| \le a, \; i = 1, \ldots, m \\ -\infty & \text{otherwise} \end{cases}$$
ML estimate is any $x$ with $|a_i^T x - y_i| \le a$.
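
A short CVXPY sketch comparing the first two cases on synthetic data: under the Gaussian model the ML estimate is the least-squares solution, under the Laplacian model it is the $\ell_1$ solution. The data (with heavy-tailed noise) and sizes are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

# ML estimation under different noise models (synthetic data, illustrative only)
rng = np.random.default_rng(0)
m, n = 200, 5
A = rng.standard_normal((m, n))                   # rows are a_i^T
x_true = rng.standard_normal(n)
y = A @ x_true + rng.laplace(scale=0.5, size=m)   # heavy-tailed noise

x = cp.Variable(n)

# Gaussian noise model: ML estimate = least-squares solution
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - y))).solve()
x_gauss = x.value

# Laplacian noise model: ML estimate = l1-norm solution
cp.Problem(cp.Minimize(cp.norm(A @ x - y, 1))).solve()
x_laplace = x.value

print("error (Gaussian model):", np.linalg.norm(x_gauss - x_true),
      "error (Laplacian model):", np.linalg.norm(x_laplace - x_true))
```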

Statistical Estimation: Logistic Regression

Random variable $y \in \{0, 1\}$ with distribution

$$p = \operatorname{prob}(y = 1) = \frac{\exp(a^T u + b)}{1 + \exp(a^T u + b)}$$

- $a$, $b$ are parameters; $u \in \mathbb{R}^n$ are (observable) explanatory variables
- estimation problem: estimate $a$, $b$ from $m$ observations $(u_i, y_i)$

Log-likelihood function (for $y_1 = \cdots = y_k = 1$, $y_{k+1} = \cdots = y_m = 0$):

$$l(a, b) = \log\left( \prod_{i=1}^k \frac{\exp(a^T u_i + b)}{1 + \exp(a^T u_i + b)} \prod_{i=k+1}^m \frac{1}{1 + \exp(a^T u_i + b)} \right) = \sum_{i=1}^k (a^T u_i + b) - \sum_{i=1}^m \log(1 + \exp(a^T u_i + b))$$

which is concave in $a$, $b$.
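
A minimal CVXPY sketch of the ML problem, written for arbitrary 0/1 labels as $\sum_i \big(y_i z_i - \log(1 + e^{z_i})\big)$ with $z_i = a^T u_i + b$, which reduces to the slide's sorted-label form; the synthetic data and true parameters are illustrative assumptions (`cp.logistic(z)` is $\log(1 + e^{z})$).

```python
import cvxpy as cp
import numpy as np

# logistic regression by maximum likelihood on synthetic data (illustrative sketch)
rng = np.random.default_rng(0)
m, n = 200, 3
U = rng.standard_normal((m, n))                      # rows are u_i^T
a_true, b_true = np.array([1.0, -2.0, 0.5]), 0.3
p = 1.0 / (1.0 + np.exp(-(U @ a_true + b_true)))
y = (rng.random(m) < p).astype(float)                # labels in {0, 1}

a = cp.Variable(n)
b = cp.Variable()
z = U @ a + b

# concave log-likelihood: sum_i [ y_i z_i - log(1 + exp(z_i)) ]
log_likelihood = cp.sum(cp.multiply(y, z) - cp.logistic(z))
cp.Problem(cp.Maximize(log_likelihood)).solve()
print("estimated a:", a.value, "estimated b:", b.value)
```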

Minimum Volume Ellipsoid Around a Set

Minimum Volume Ellipsoid Around a Set

Löwner-John ellipsoid of a set $C$: the minimum-volume ellipsoid $\mathcal{E}$ such that $C \subseteq \mathcal{E}$.

- parametrize $\mathcal{E}$ as $\mathcal{E} = \{v \mid \|Av + b\|_2 \le 1\}$; w.l.o.g. assume $A \in S^n_{++}$
- $\operatorname{vol} \mathcal{E}$ is proportional to $\det A^{-1}$; to compute the minimum-volume ellipsoid, solve

minimize (over $A$, $b$) $\log \det A^{-1}$
subject to $\sup_{v \in C} \|Av + b\|_2 \le 1$

This is convex, but evaluating the constraint can be hard (for general $C$).

Minimum Volume Ellipsoid Around a Set

Löwner-John ellipsoid of a set $C$: the minimum-volume ellipsoid $\mathcal{E}$ such that $C \subseteq \mathcal{E}$.

- parametrize $\mathcal{E}$ as $\mathcal{E} = \{v \mid \|Av + b\|_2 \le 1\}$; w.l.o.g. assume $A \in S^n_{++}$
- $\operatorname{vol} \mathcal{E}$ is proportional to $\det A^{-1}$; to compute the minimum-volume ellipsoid, solve

minimize (over $A$, $b$) $\log \det A^{-1}$
subject to $\sup_{v \in C} \|Av + b\|_2 \le 1$

This is convex, but evaluating the constraint can be hard (for general $C$).

Finite set $C = \{x_1, \ldots, x_m\}$:

minimize (over $A$, $b$) $\log \det A^{-1}$
subject to $\|Ax_i + b\|_2 \le 1$, $i = 1, \ldots, m$

This also gives the Löwner-John ellipsoid for the polyhedron $\operatorname{conv}\{x_1, \ldots, x_m\}$.
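
For the finite-set case, the problem can be posed directly with CVXPY's `log_det` atom (minimizing $\log\det A^{-1}$ is the same as maximizing $\log\det A$). The point set below is random and purely illustrative; a solver supporting log-det (e.g., the bundled SCS) is assumed.

```python
import cvxpy as cp
import numpy as np

# minimum-volume ellipsoid covering a finite point set (illustrative sketch)
rng = np.random.default_rng(0)
pts = rng.standard_normal((20, 2))    # x_1, ..., x_m in R^2

n = pts.shape[1]
A = cp.Variable((n, n), PSD=True)     # parametrizes E = {v : ||Av + b||_2 <= 1}
b = cp.Variable(n)

# minimize log det A^{-1}  <=>  maximize log det A
constraints = [cp.norm(A @ x + b, 2) <= 1 for x in pts]
prob = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)
prob.solve()
print("optimal A:\n", A.value, "\noptimal b:", b.value)
```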

Maximum Volume Inscribed Ellipsoid

Maximum Volume Inscribed Ellipsoid

Maximum-volume ellipsoid $\mathcal{E}$ inside a convex set $C \subseteq \mathbb{R}^n$:

- parametrize $\mathcal{E}$ as $\mathcal{E} = \{Bu + d \mid \|u\|_2 \le 1\}$; w.l.o.g. assume $B \in S^n_{++}$
- $\operatorname{vol} \mathcal{E}$ is proportional to $\det B$; can compute $\mathcal{E}$ by solving

maximize $\log \det B$
subject to $\sup_{\|u\|_2 \le 1} I_C(Bu + d) \le 0$

(where $I_C(x) = 0$ for $x \in C$ and $I_C(x) = \infty$ for $x \notin C$)

This is convex, but evaluating the constraint can be hard (for general $C$).

Maximum Volume Inscribed Ellipsoid

Maximum-volume ellipsoid $\mathcal{E}$ inside a convex set $C \subseteq \mathbb{R}^n$:

- parametrize $\mathcal{E}$ as $\mathcal{E} = \{Bu + d \mid \|u\|_2 \le 1\}$; w.l.o.g. assume $B \in S^n_{++}$
- $\operatorname{vol} \mathcal{E}$ is proportional to $\det B$; can compute $\mathcal{E}$ by solving

maximize $\log \det B$
subject to $\sup_{\|u\|_2 \le 1} I_C(Bu + d) \le 0$

(where $I_C(x) = 0$ for $x \in C$ and $I_C(x) = \infty$ for $x \notin C$)

This is convex, but evaluating the constraint can be hard (for general $C$).

Polyhedron $\{x \mid a_i^T x \le b_i, \; i = 1, \ldots, m\}$:

maximize $\log \det B$
subject to $\|B a_i\|_2 + a_i^T d \le b_i$, $i = 1, \ldots, m$

(the constraint follows from $\sup_{\|u\|_2 \le 1} a_i^T(Bu + d) = \|B a_i\|_2 + a_i^T d$)
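
A CVXPY sketch of the polyhedral case on a simple example: the polyhedron is taken to be the box $[-1, 1]^2$ (an illustrative choice, not from the lecture), for which the maximum-volume inscribed ellipsoid is the unit disk.

```python
import cvxpy as cp
import numpy as np

# maximum-volume ellipsoid inscribed in a polyhedron {x : a_i^T x <= b_i} (illustrative sketch)
# polyhedron: the unit box [-1, 1]^2, described by 4 inequalities
A_poly = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b_poly = np.ones(4)

n = A_poly.shape[1]
B = cp.Variable((n, n), PSD=True)     # parametrizes E = {Bu + d : ||u||_2 <= 1}
d = cp.Variable(n)

constraints = [cp.norm(B @ a, 2) + a @ d <= bi for a, bi in zip(A_poly, b_poly)]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), constraints)
prob.solve()
print("center d:", d.value)           # for the box, the center is the origin
print("shape B:\n", B.value)          # for the box, B is (approximately) the identity
```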