Convex Optimization: Applications
Lecturer: Pradeep Ravikumar
Co-instructor: Aarti Singh
Convex Optimization 10-725/36-725
Based on material from Boyd, Vandenberghe
Norm Approximation

minimize ‖Ax − b‖

(A ∈ R^{m×n} with m ≥ n, ‖·‖ is a norm on R^m)

interpretations of solution x⋆ = argmin_x ‖Ax − b‖:
- geometric: Ax⋆ is point in R(A) closest to b
- estimation: linear measurement model y = Ax + v; y are measurements, x is unknown, v is measurement error; given y = b, best guess of x is x⋆
- optimal design: x are design variables (input), Ax is result (output); x⋆ is design that best approximates desired result b
Norm Approximation: ℓ₂ norm

minimize ‖Ax − b‖₂

(A ∈ R^{m×n} with m ≥ n, ‖·‖ is a norm on R^m)

least-squares approximation (‖·‖₂): solution satisfies the normal equations

AᵀAx = Aᵀb

(x⋆ = (AᵀA)⁻¹Aᵀb if rank A = n)
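As an aside (not in the original slides), a minimal numpy sketch with made-up random data, solving the normal equations and cross-checking against a library least-squares routine:

```python
import numpy as np

# A minimal sketch (random data, for illustration):
# least-squares solution of min ||Ax - b||_2 via the normal equations.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # m = 20 > n = 5
b = rng.standard_normal(20)

# Solve A^T A x = A^T b (assumes rank A = n).
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# In practice, np.linalg.lstsq is preferred for numerical stability.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)
```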
Norm Approximation: ℓ∞ norm

minimize ‖Ax − b‖∞

(A ∈ R^{m×n} with m ≥ n, ‖·‖ is a norm on R^m)

Chebyshev approximation (‖·‖∞): can be solved as an LP

minimize t
subject to −t1 ⪯ Ax − b ⪯ t1
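A sketch of this LP reformulation using scipy's linprog, with decision vector z = [x; t]; the data is random and purely illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# A minimal sketch: Chebyshev approximation min ||Ax - b||_inf recast as
# the LP  min t  s.t.  -t*1 <= Ax - b <= t*1,  decision vector z = [x; t].
rng = np.random.default_rng(0)
m, n = 30, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

c = np.zeros(n + 1)
c[-1] = 1.0                                  # minimize t
A_ub = np.block([[A, -np.ones((m, 1))],      #  Ax - b <= t*1
                 [-A, -np.ones((m, 1))]])    # -(Ax - b) <= t*1
b_ub = np.concatenate([b, -b])
bounds = [(None, None)] * n + [(0, None)]    # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
x_cheb = res.x[:n]
print(np.linalg.norm(A @ x_cheb - b, np.inf), "=", res.x[-1])
```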
Norm Approximation: ℓ₁ norm

minimize ‖Ax − b‖₁

(A ∈ R^{m×n} with m ≥ n, ‖·‖ is a norm on R^m)

sum of absolute residuals approximation (‖·‖₁): can be solved as an LP

minimize 1ᵀy
subject to −y ⪯ Ax − b ⪯ y
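A minimal sketch of the same LP reformulation in cvxpy (cvxpy is an assumption here, not part of the lecture material), again with random data:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: l1 approximation min ||Ax - b||_1
# via the LP  min 1^T y  s.t.  -y <= Ax - b <= y.
rng = np.random.default_rng(0)
m, n = 30, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
y = cp.Variable(m)
prob = cp.Problem(cp.Minimize(cp.sum(y)),
                  [A @ x - b <= y, -(A @ x - b) <= y])
prob.solve()
print(prob.value, np.linalg.norm(A @ x.value - b, 1))  # should agree
```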
Penalty Function Approximation

minimize φ(r₁) + ··· + φ(r_m)
subject to r = Ax − b

(A ∈ R^{m×n}, φ : R → R is a convex penalty function)

examples
- quadratic: φ(u) = u²
- deadzone-linear with width a: φ(u) = max{0, |u| − a}
- log-barrier with limit a: φ(u) = −a² log(1 − (u/a)²) for |u| < a, ∞ otherwise

[figure: quadratic, deadzone-linear, and log-barrier penalties φ(u) versus u]
Penalty Function Approximation

Huber penalty function (with parameter M)

φ_hub(u) = u² for |u| ≤ M, M(2|u| − M) for |u| > M

linear growth for large u makes approximation less sensitive to outliers

[figure: left, Huber penalty for M = 1; right, affine function f(t) = α + βt fitted to 42 points (t_i, y_i) (circles) using quadratic (dashed) and Huber (solid) penalty]
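To illustrate the robustness claim, a minimal cvxpy sketch fitting f(t) = α + βt under the Huber penalty; the data and outliers are synthetic, and cp.huber matches φ_hub above:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch (synthetic data with injected outliers):
# fit f(t) = alpha + beta*t by minimizing the Huber penalty of the residuals.
# cvxpy's cp.huber(r, M) equals r^2 for |r| <= M and M(2|r| - M) otherwise.
rng = np.random.default_rng(0)
t = np.linspace(-10, 10, 42)
y = 1.0 + 0.5 * t + rng.standard_normal(42)
y[::10] += 15 * rng.standard_normal(len(y[::10]))  # outliers

alpha, beta = cp.Variable(), cp.Variable()
resid = y - (alpha + beta * t)
cp.Problem(cp.Minimize(cp.sum(cp.huber(resid, M=1.0)))).solve()
print(alpha.value, beta.value)  # close to (1.0, 0.5) despite the outliers
```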
Least Norm Problems

minimize ‖x‖
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ‖·‖ is a norm on R^n)

interpretations of solution x⋆ = argmin_{Ax=b} ‖x‖:
- geometric: x⋆ is point in affine set {x | Ax = b} with minimum distance to 0
- estimation: b = Ax are (perfect) measurements of x; x⋆ is smallest ('most plausible') estimate consistent with measurements
- design: x are design variables (inputs); b are required results (outputs); x⋆ is smallest ('most efficient') design that satisfies requirements
Least Norm Problems: ℓ₁ norm

minimize ‖x‖₁
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ‖·‖ is a norm on R^n)

minimum sum of absolute values (‖·‖₁): can be solved as an LP

minimize 1ᵀy
subject to −y ⪯ x ⪯ y, Ax = b

tends to produce sparse solution x⋆
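A minimal cvxpy sketch (random wide A, for illustration) comparing the minimum ℓ₁-norm and minimum ℓ₂-norm solutions, showing the sparsity effect:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: minimum l1-norm solution of Ax = b (tends to be sparse)
# versus the minimum l2-norm solution (typically dense).
rng = np.random.default_rng(0)
m, n = 10, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(x, 1)), [A @ x == b]).solve()
x_l1 = x.value

x_l2 = np.linalg.pinv(A) @ b  # minimum l2-norm solution
print("nonzeros:", (abs(x_l1) > 1e-6).sum(), "vs", (abs(x_l2) > 1e-6).sum())
```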
Least Norm Problems: least-penalty extension

minimize ‖x‖
subject to Ax = b

(A ∈ R^{m×n} with m ≤ n, ‖·‖ is a norm on R^n)

extension: least-penalty problem

minimize φ(x₁) + ··· + φ(x_n)
subject to Ax = b

φ : R → R is convex penalty function
Regularized Approximation

minimize (w.r.t. R₊²) (‖Ax − b‖, ‖x‖)

A ∈ R^{m×n}; norms on R^m and R^n can be different

interpretation: find good approximation Ax ≈ b with small x
- estimation: linear measurement model y = Ax + v, with prior knowledge that ‖x‖ is small
- optimal design: small x is cheaper or more efficient, or the linear model y = Ax is only valid for small x
- robust approximation: good approximation Ax ≈ b with small x is less sensitive to errors in A than good approximation with large x
Regularized Approximation

minimize ‖Ax − b‖ + γ‖x‖

- solution for γ > 0 traces out optimal trade-off curve
- other common method: minimize ‖Ax − b‖² + δ‖x‖² with δ > 0

Tikhonov regularization

minimize ‖Ax − b‖₂² + δ‖x‖₂²

can be solved as a least-squares problem

minimize ‖ [A; √δ I] x − [b; 0] ‖₂²

solution x⋆ = (AᵀA + δI)⁻¹Aᵀb
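A minimal numpy sketch (random data) checking that the closed-form solution and the stacked least-squares formulation agree:

```python
import numpy as np

# A minimal sketch: Tikhonov regularization
# min ||Ax - b||_2^2 + delta*||x||_2^2, solved two equivalent ways.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
delta = 0.1

# 1. closed form: x = (A^T A + delta*I)^{-1} A^T b
x1 = np.linalg.solve(A.T @ A + delta * np.eye(5), A.T @ b)

# 2. stacked least-squares problem with [A; sqrt(delta)*I] and [b; 0]
A_stack = np.vstack([A, np.sqrt(delta) * np.eye(5)])
b_stack = np.concatenate([b, np.zeros(5)])
x2, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)

assert np.allclose(x1, x2)
```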
Signal Reconstruction

minimize (w.r.t. R₊²) (‖x̂ − x_cor‖₂, φ(x̂))

- x ∈ R^n is unknown signal
- x_cor = x + v is (known) corrupted version of x, with additive noise v
- variable x̂ (reconstructed signal) is estimate of x
- φ : R^n → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:

φ_quad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,  φ_tv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
Signal Reconstruction: Quadratic Smoothing

[figure: left, original signal x and noisy signal x_cor; right, three solutions on the trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂)]
Signal Reconstruction: Total Variation Smoothing

[figure: left, original signal x and noisy signal x_cor; right, three solutions on the trade-off curve ‖x̂ − x_cor‖₂ versus φ_quad(x̂)]

quadratic smoothing smooths out noise and sharp transitions in signal
Signal Reconstruction: Total Variation Smoothing

[figure: left, original signal x and noisy signal x_cor; right, three solutions on the trade-off curve ‖x̂ − x_cor‖₂ versus φ_tv(x̂)]

total variation smoothing preserves sharp transitions in signal
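A minimal cvxpy sketch of total variation smoothing on a synthetic piecewise-constant signal; the scalarization weight lam is a hypothetical choice of one point on the trade-off curve:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: TV smoothing via the scalarized problem
# min ||xhat - x_cor||_2 + lam * phi_tv(xhat).
rng = np.random.default_rng(0)
x = np.concatenate([np.zeros(50), np.ones(50), -0.5 * np.ones(50)])
x_cor = x + 0.2 * rng.standard_normal(x.size)

xhat = cp.Variable(x.size)
lam = 1.0                             # hypothetical trade-off weight
phi_tv = cp.norm(cp.diff(xhat), 1)    # sum_i |xhat_{i+1} - xhat_i|
cp.Problem(cp.Minimize(cp.norm(xhat - x_cor, 2) + lam * phi_tv)).solve()
# xhat.value keeps the sharp jumps while removing most of the noise
```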
Robust Approximation

minimize ‖Ax − b‖ with uncertain A

two approaches:
- stochastic: assume A is random, minimize E‖Ax − b‖
- worst-case: set A of possible values of A, minimize sup_{A∈A} ‖Ax − b‖

tractable only in special cases (certain norms ‖·‖, distributions, sets A)
Robust Approximation

example: A(u) = A₀ + uA₁
- x_nom minimizes ‖A₀x − b‖₂²
- x_stoch minimizes E‖A(u)x − b‖₂² with u uniform on [−1, 1]
- x_wc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖₂²

[figure: r(u) = ‖A(u)x − b‖₂ versus u for x_nom, x_stoch, and x_wc]
Robust Approximation: Stochastic

stochastic robust LS with A = Ā + U, U random, E U = 0, E UᵀU = P

minimize E‖(Ā + U)x − b‖₂²

explicit expression for objective:

E‖Ax − b‖₂² = E‖Āx − b + Ux‖₂² = ‖Āx − b‖₂² + E xᵀUᵀUx = ‖Āx − b‖₂² + xᵀPx

hence, robust LS problem is equivalent to LS problem

minimize ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²

for P = δI, get Tikhonov regularized problem

minimize ‖Āx − b‖₂² + δ‖x‖₂²
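A minimal numpy sketch (random data, assumed-known P) of the equivalent deterministic LS problem, stacking Ā with a square root of P:

```python
import numpy as np

# A minimal sketch: solve min ||Abar x - b||_2^2 + x^T P x as an ordinary
# LS problem with stacked rows [Abar; L^T], where P = L L^T.
rng = np.random.default_rng(0)
m, n = 20, 5
Abar = rng.standard_normal((m, n))
b = rng.standard_normal(m)
P = np.diag(rng.uniform(0.1, 1.0, n))   # E U^T U, assumed known

L = np.linalg.cholesky(P)               # any square root of P works
A_stack = np.vstack([Abar, L.T])        # ||L^T x||_2^2 = x^T P x
b_stack = np.concatenate([b, np.zeros(n)])
x, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)
```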
Robust Approximation: Worst-Case

worst-case robust LS with A = {Ā + u₁A₁ + ··· + u_pA_p | ‖u‖₂ ≤ 1}

minimize sup_{A∈A} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²

where P(x) = [A₁x  A₂x  ···  A_px], q(x) = Āx − b
Robust Approximation: Worst-Case

it can be shown that strong duality holds between the following two problems

maximize ‖Pu + q‖₂²
subject to ‖u‖₂² ≤ 1

minimize t + λ
subject to [ I    P    q
             Pᵀ   λI   0
             qᵀ   0    t ] ⪰ 0

hence, the robust LS problem is equivalent to the SDP

minimize t + λ
subject to [ I       P(x)   q(x)
             P(x)ᵀ   λI     0
             q(x)ᵀ   0      t ] ⪰ 0
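A sketch of this SDP in cvxpy on a small random instance (cvxpy and the instance are assumptions, not lecture material); the PSD constraint is imposed on a symmetrized block matrix:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: worst-case robust LS as the SDP
#   min t + lam  s.t.  [[I, P(x), q(x)], [P(x)^T, lam*I, 0], [q(x)^T, 0, t]] PSD
# with P(x) = [A_1 x ... A_p x] and q(x) = Abar x - b.
rng = np.random.default_rng(0)
m, n, p = 6, 3, 2
Abar = rng.standard_normal((m, n))
As = [0.1 * rng.standard_normal((m, n)) for _ in range(p)]
b = rng.standard_normal(m)

x = cp.Variable(n)
t, lam = cp.Variable(), cp.Variable()
P = cp.hstack([cp.reshape(A @ x, (m, 1)) for A in As])
q = cp.reshape(Abar @ x - b, (m, 1))
M = cp.bmat([[np.eye(m), P,                q],
             [P.T,       lam * np.eye(p),  np.zeros((p, 1))],
             [q.T,       np.zeros((1, p)), cp.reshape(t, (1, 1))]])
# symmetrize so cvxpy accepts the PSD constraint on the expression
cp.Problem(cp.Minimize(t + lam), [(M + M.T) / 2 >> 0]).solve()
print("robust LS x:", x.value)
```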
Statistical Estimation

distribution estimation problem: estimate probability density p(y) of a random variable from observed values

parametric distribution estimation: choose from a family of densities p_x(y), indexed by a parameter x

maximum likelihood estimation

maximize (over x) log p_x(y)

- y is observed value
- l(x) = log p_x(y) is called log-likelihood function
- can add constraints x ∈ C explicitly, or define p_x(y) = 0 for x ∉ C
- a convex optimization problem if log p_x(y) is concave in x for fixed y
Statistical Estimation: Linear Measurements with Noise

linear measurement model

y_i = a_iᵀx + v_i,  i = 1,...,m

- x ∈ R^n is vector of unknown parameters
- v_i is IID measurement noise, with density p(z)
- y_i is measurement: y ∈ R^m has density p_x(y) = ∏_{i=1}^m p(y_i − a_iᵀx)

maximum likelihood estimate: any solution x of

maximize l(x) = Σ_{i=1}^m log p(y_i − a_iᵀx)

(y is observed value)
Statistical Estimation: Linear Measurements with Noise

- Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)},

  l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^m (a_iᵀx − y_i)²

  ML estimate is LS solution

- Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a},

  l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |a_iᵀx − y_i|

  ML estimate is ℓ₁-norm solution

- uniform noise on [−a, a]:

  l(x) = −m log(2a) if |a_iᵀx − y_i| ≤ a, i = 1,...,m; −∞ otherwise

  ML estimate is any x with |a_iᵀx − y_i| ≤ a, i = 1,...,m
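A minimal cvxpy sketch (synthetic data) contrasting the Gaussian ML estimate (least squares) with the Laplacian ML estimate (ℓ₁) when the noise actually is Laplacian:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: ML estimation under Gaussian noise is least squares;
# under Laplacian noise it is l1-norm minimization.
rng = np.random.default_rng(0)
m, n = 100, 5
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = A @ x_true + rng.laplace(scale=0.5, size=m)   # Laplacian noise

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(A @ x - y, 2))).solve()   # Gaussian ML (LS)
x_ls = x.value
cp.Problem(cp.Minimize(cp.norm(A @ x - y, 1))).solve()   # Laplacian ML (l1)
x_l1 = x.value
print(np.linalg.norm(x_ls - x_true), np.linalg.norm(x_l1 - x_true))
```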
Statistical Estimation: Logistic Regression

random variable y ∈ {0, 1} with distribution

p = prob(y = 1) = exp(aᵀu + b) / (1 + exp(aᵀu + b))

- a, b are parameters; u ∈ R^n are (observable) explanatory variables
- estimation problem: estimate a, b from m observations (u_i, y_i)

log-likelihood function (for y₁ = ··· = y_k = 1, y_{k+1} = ··· = y_m = 0):

l(a, b) = log( ∏_{i=1}^k exp(aᵀu_i + b)/(1 + exp(aᵀu_i + b)) · ∏_{i=k+1}^m 1/(1 + exp(aᵀu_i + b)) )
        = Σ_{i=1}^k (aᵀu_i + b) − Σ_{i=1}^m log(1 + exp(aᵀu_i + b))

concave in a, b
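A minimal cvxpy sketch of ML logistic regression on synthetic 1-D data, using cp.logistic(z) = log(1 + exp(z)); the general-label form Σᵢ yᵢ(aᵀu_i + b) − Σᵢ log(1 + exp(aᵀu_i + b)) reduces to the expression above when the observations are sorted:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: maximize the concave log-likelihood of logistic regression.
rng = np.random.default_rng(0)
m = 50
u = rng.standard_normal((m, 1))
y = (rng.uniform(size=m) < 1 / (1 + np.exp(-(2 * u[:, 0] - 1)))).astype(float)

a, b = cp.Variable(1), cp.Variable()
z = u @ a + b
loglik = cp.sum(cp.multiply(y, z)) - cp.sum(cp.logistic(z))
cp.Problem(cp.Maximize(loglik)).solve()
print(a.value, b.value)
```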
Minimum Volume Ellipsoid Around a Set

Löwner-John ellipsoid of a set C: minimum volume ellipsoid E s.t. C ⊆ E

- parametrize E as E = {v | ‖Av + b‖₂ ≤ 1}; w.l.o.g. assume A ∈ S^n₊₊
- vol E is proportional to det A⁻¹; to compute minimum volume ellipsoid, solve

minimize (over A, b) log det A⁻¹
subject to sup_{v∈C} ‖Av + b‖₂ ≤ 1

convex, but evaluating the constraint can be hard (for general C)
Minimum Volume Ellipsoid Around a Set

finite set C = {x₁, ..., x_m}:

minimize (over A, b) log det A⁻¹
subject to ‖Ax_i + b‖₂ ≤ 1, i = 1, ..., m

also gives Löwner-John ellipsoid for the polyhedron conv{x₁, ..., x_m}
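A minimal cvxpy sketch for random points in R², maximizing log det A (equivalently minimizing log det A⁻¹) subject to the containment constraints:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: Lowner-John ellipsoid E = {v : ||A v + b|| <= 1} of a
# finite point set, via  min log det A^{-1}  s.t. ||A x_i + b|| <= 1.
rng = np.random.default_rng(0)
pts = rng.standard_normal((10, 2))

A = cp.Variable((2, 2), PSD=True)
b = cp.Variable(2)
constraints = [cp.norm(A @ x + b, 2) <= 1 for x in pts]
# minimizing log det A^{-1} is the same as maximizing log det A
cp.Problem(cp.Maximize(cp.log_det(A)), constraints).solve()
```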
Maximum Volume Inscribed Ellipsoid

maximum volume ellipsoid E inside a convex set C ⊆ R^n

- parametrize E as E = {Bu + d | ‖u‖₂ ≤ 1}; w.l.o.g. assume B ∈ S^n₊₊
- vol E is proportional to det B; can compute E by solving

maximize log det B
subject to sup_{‖u‖₂≤1} I_C(Bu + d) ≤ 0

(where I_C(x) = 0 for x ∈ C and I_C(x) = ∞ for x ∉ C)

convex, but evaluating the constraint can be hard (for general C)
Maximum Volume Inscribed Ellipsoid

polyhedron {x | a_iᵀx ≤ b_i, i = 1, ..., m}:

maximize log det B
subject to ‖Ba_i‖₂ + a_iᵀd ≤ b_i, i = 1, ..., m

(constraint follows from sup_{‖u‖₂≤1} a_iᵀ(Bu + d) = ‖Ba_i‖₂ + a_iᵀd)
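A minimal cvxpy sketch for the unit box in R², where the optimal inscribed ellipsoid is the unit disk:

```python
import cvxpy as cp
import numpy as np

# A minimal sketch: maximum volume ellipsoid {Bu + d : ||u|| <= 1} inscribed
# in the polyhedron {x : a_i^T x <= b_i} (here, the unit box in R^2).
a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

B = cp.Variable((2, 2), PSD=True)
d = cp.Variable(2)
constraints = [cp.norm(B @ a[i], 2) + a[i] @ d <= b[i] for i in range(len(b))]
cp.Problem(cp.Maximize(cp.log_det(B)), constraints).solve()
# For the unit box, the optimum is B = I, d = 0 (the unit disk).
```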