Final exam.


EE364b Convex Optimization II    June 4-8, 2015
Prof. John C. Duchi

Final exam

By now, you know how it works, so we won't repeat it here. (If not, see the instructions for the EE364a final exam.) Since you have 96 hours to work on the final, your solutions must be typeset using LaTeX. We expect your solutions to be typo-free, clear, and correctly typeset. (And yes, we will deduct points for poor typesetting, typos, or unclear solutions.) All code submitted must be clear, commented, and readable.

To download Matlab or Julia files containing problem data, you'll have to type the whole URL given in the problem into your browser; there are no links on the course web page pointing to these files. To get a file called filename.m, for example, you would retrieve the corresponding URL with your browser.

Please make sure each problem starts on a new page, say, by using the \clearpage command. (This generates a new page after printing out any figures that have floated forward.) Email your solutions to ee364b.submission@gmail.com by Monday June 7th, 5pm at the latest.

1. Robust truss design. A truss is a construction composed of thin elastic bars linked at nodes that, when subjected to a load, deforms until the reaction forces caused by the deformations of the bars compensate the external forces. In truss design problems, one wishes to minimize the deformations of the truss under different (typical) loading patterns. The goal in this problem is to develop truss designs robust to deviations from typical loads.

A truss consists of p fixed nodes (attached to the ground or other immobile surface) and n free nodes. In a planar (two-dimensional) truss, each free node may move in two dimensions, so the truss's displacement is represented by a vector in R^{2n}. A design is a selection of m nonnegative bar volumes t ∈ R^m_+ connecting the n + p nodes. We are given a total volume V of usable material, so we have the constraint that ∑_{i=1}^m t_i ≤ V. Associated with a truss is a bar-stiffness matrix A(t) = ∑_{i=1}^m t_i b_i b_i^T, parameterized by the volumes t ∈ R^m_+. The vectors b_i ∈ R^{2n} are determined by the structure's geometry (nominal node locations) and the characteristics of the bars' material.

(a) Given a load (vector of forces) f ∈ R^{2n}, the compliance is a measure of the internal work done by the truss with respect to the load and is given by

    c_f(t) = sup{ 2 f^T u − u^T A(t) u : u ∈ R^{2n} },

and the goal is to design a stiff (small compliance) truss. Formulate the problem of designing a truss with the smallest possible compliance as a tractable convex optimization problem. Your final answer should not involve the inverse of A(t).

(b) The design in part (a) is a single-load design: it minimizes the compliance for a nominal load f ∈ R^{2n}, which may be brittle to even small loads other than f. In multi-load compliance, the goal is to find the vector of bar volumes which results in the smallest possible worst-case compliance for all f in an uncertainty set F,

    c_F(t) = sup{ 2 f^T u − u^T A(t) u : u ∈ R^{2n}, f ∈ F }.

Letting F be the ellipsoid F = { Qe : e ∈ R^k, ||e||_2 ≤ 1 } for some matrix Q ∈ R^{2n×k}, formulate the problem of minimizing c_F(t) as a tractable convex optimization problem. Your final answer should not involve the inverse of A(t).

(c) Given the data in robust_truss_data.[m|jl], find the optimal truss designs for parts (a) (use f = f_nominal) and (b) using your optimization formulation. The function plot_truss.[m|jl] plots the truss and its displacement under the forces f_nominal and f_occasional, the latter a small perturbation. Plot your truss designs as well as the truss t = (V/m)1, which distributes the material uniformly. Include your code, plots of the displacement for each of the three truss designs, the compliance under f_nominal, and the distances displaced (printed by plot_truss) under load.

Note: Julia will not give sufficiently accurate solutions to this problem; we recommend Matlab for the most interpretable results. The matrix Q represents nominal forces as well as small perturbations. A common choice is to take known loads f_1, f_2, ..., f_l ∈ R^{2n}, a small ǫ > 0, and set Q = [ f_1 f_2 ··· f_l ǫ I_{2n×2n} ] ∈ R^{2n×(2n+l)}.
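For part (a), one standard route (a sketch, not necessarily the intended formulation) is to show that c_f(t) ≤ τ exactly when the block matrix [A(t) f; f^T τ] is positive semidefinite, which avoids A(t)^{-1}. A minimal CVX sketch in Matlab, assuming robust_truss_data.m defines a matrix B ∈ R^{2n×m} whose columns are the vectors b_i (the name B is an assumption), together with V and f_nominal:

    % One possible single-load formulation (part (a)) as an SDP in CVX.
    % Assumes robust_truss_data.m defines B (2n x m, columns b_i), V, and f_nominal;
    % "B" is an assumed variable name.
    robust_truss_data;
    m = size(B, 2);
    f = f_nominal;
    cvx_begin sdp
        variables t(m) tau
        minimize(tau)
        subject to
            t >= 0;
            sum(t) <= V;
            % c_f(t) <= tau  <=>  [A(t) f; f' tau] is PSD, where A(t) = B*diag(t)*B'
            [B*diag(t)*B', f; f', tau] >= 0;
    cvx_end

One can check that the multi-load compliance in part (b) admits a similar block form, with Q in place of f and τI in the lower-right corner.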

2. Convex functions of matrix eigenvalues. In this question, we explore an elegant construction of a wide variety of convex functions of matrices. Let S^n denote the space of symmetric n × n matrices. For any such matrix A, we let λ(A) ∈ R^n denote its eigenvalues in non-increasing order, so λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_n(A). Now, let f : R^n → R be a closed convex function that is symmetric, meaning that for every permutation matrix P,[2] we have f(Px) = f(x). For such a function f, let f_M be the matricization of f, the function defined on S^n by f_M(A) = f(λ(A)). We use convex conjugacy to show that f_M is convex and to evaluate its derivatives.

For this question (parts (b) and (d)), you may use von Neumann's trace inequality, which states that

    Tr(AB) ≤ λ(A)^T λ(B) = ∑_{i=1}^n λ_i(A) λ_i(B),

where equality is attained in the inequality if and only if A = U diag(λ(A)) U^T and B = U diag(λ(B)) U^T for an orthonormal matrix U.

(a) Show that for A, B ∈ S^n, the function ⟨A, B⟩ = Tr(AB) defines an inner product.

(b) Show that convex conjugation and matricization commute, that is, show that for any matrix A ∈ S^n, (f^*)_M(A) = (f_M)^*(A). (For a function f : S^n → R, we let f^*(A) = sup_B { Tr(BA) − f(B) }.)

(c) Using the result of part (2b), show that f_M is convex by arguing that (f_M)^{**} = f_M.

(d) Show that if A = U diag(λ(A)) U^T is the eigen-decomposition of A, then the subdifferential

    ∂f_M(A) = U diag( ∂f(λ(A)) ) U^T,

where ∂f(λ(A)) denotes the subdifferential of f evaluated at λ(A). (The subdifferential of a function f : S^n → R at a point A is the set of matrices G ∈ S^n such that f(B) ≥ f(A) + Tr(G(B − A)) for all B ∈ S^n.) Hint. The result of Question 1.9 in the homework exercises, that is, that g ∈ ∂f(x) if and only if g^T x = f(x) + f^*(g), may be useful.

(e) Using the results of part (2d), argue that for A ≻ 0, ∇ log det(A) = A^{−1}.

[2] A matrix P is a permutation matrix if P ∈ {0,1}^{n×n} and P1 = 1 and P^T 1 = 1. This implies that P^T P = I_{n×n}.
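Part (e) is a pencil-and-paper argument, but the identity it asserts, ∇ log det(A) = A^{−1} for A ≻ 0, is easy to sanity-check numerically. A small Matlab sketch (all names are local to the sketch):

    % Finite-difference check that grad log det(A) = inv(A) for A > 0 (part (e)).
    n = 5;
    R = randn(n);
    A = R*R' + n*eye(n);          % a random positive definite matrix
    G = inv(A);                   % claimed gradient
    Gfd = zeros(n);
    h = 1e-6;
    for i = 1:n
        for j = 1:n
            E = zeros(n); E(i,j) = 1; E = (E + E')/2;   % symmetric coordinate direction
            Gfd(i,j) = (log(det(A + h*E)) - log(det(A - h*E))) / (2*h);
        end
    end
    fprintf('max deviation from inv(A): %.2e\n', max(abs(G(:) - Gfd(:))));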

3. Neural spike train decoding via non-convex methods. Neurons in the retina, auditory cortex, and brain propagate signals rapidly by generating electrical pulses known as action potentials, which in signal processing we represent as spike trains: sequences of activations in which typically only a few elements of the signal are large and non-zero (above the activation threshold for the neuron). A standard problem in neuroscience and neural coding is to take (noisy) signals and resolve them into clean spike trains. In this problem, we study decoding a spike train x ∈ R^n from a noisy signal y ∈ R^n. As neurons are not constantly activated and have a refractory period (it takes time for an excitable membrane to transmit additional stimuli), we wish to encode both sparsity in x and the property that non-zero x_i locally inhibit other elements of the vector x. We thus formulate spike train recovery as a non-convex problem with variable x ∈ R^n:

    minimize   (1/2)||x − y||_2^2
    subject to card(x) ≤ k
               x_i x_{i+1} = 0,  i = 1, ..., n−1,

where k is a constant. We explore three heuristic approaches for this problem. Throughout this problem, use the data in the file spike_train_data.[m|jl] for all implementation parts. To plot the resulting spike train (and original signal), use the method plot_spike_train.[m|jl], passing the true signal and the decoded one. (You may find it interesting to use the stem function to plot the original signal y as well.)

(a) Lasso. We first ignore the inhibitory properties of the signal and use l_1-regularization as a heuristic for cardinality. Give a closed form solution to

    minimize (1/2)||x − y||_2^2 + λ||x||_1.

Find the resulting signal x for each λ ∈ {0.8, 0.9, 1.0, 1.1}. Include your code, the solution plot for λ = 0.9, and the output of plot_spike_train for each λ. (A brief code sketch of this closed form appears after part (c) below.)

(b) Sequential convex programming. We extend this l_1-regularization heuristic and solve a sequence of convex approximations to the (non-convex) problem

    minimize (1/2)||x − y||_2^2 + ν ∑_{i=1}^{n−1} |x_i x_{i+1}| + λ||x||_1.

  i. Prove that for all α ≥ 0,

        |ab| ≤ (α/2) a^2 + (1/(2α)) b^2,

     and that there is an α attaining equality. (Treat 0^2 · ∞ and 0^2/0 as 0.)

  ii. Using the relaxation of |ab| in part 3(b)i, give an objective function f(x, α) in variables x ∈ R^n and α ∈ R^{n−1}_+, where f(x, α) is convex in x and convex in α, and which satisfies

        inf_{α ≥ 0} f(x, α) = (1/2)||x − y||_2^2 + ν ∑_{i=1}^{n−1} |x_i x_{i+1}| + λ||x||_1.

  iii. Implement an alternating minimization procedure for your function f(x, α). Is your procedure guaranteed to converge? Using λ = 0.9, ν = 1, and initializing from x = 0 and α = 1, run 200 iterations of alternating minimization on your function f(x, α), treating any 0/0 terms as 0. Include code, the solution plot (using the data in spike_train_data), and the output of plot_spike_train.

(c) ADMM. It is possible to use ADMM for non-convex problems: consider solving

    minimize f(x) + g(x),

where f and g are (potentially) non-convex functions for which it is still possible to find an x_v ∈ argmin_x { f(x) + (1/2)||x − v||_2^2 } (and likewise for g). Then we may introduce a variable z = x to form the augmented Lagrangian

    L_ρ(x, z, y) = f(x) + g(z) + y^T(x − z) + (ρ/2)||x − z||_2^2,

performing the usual ADMM steps over x, z, and y. This procedure is not guaranteed to converge but can be quite effective.

Let I_even and I_odd be the even and odd indices of {1, ..., n−1}, respectively. Introducing variables x^odd ∈ R^n and x^even ∈ R^n, we consider the problem

    minimize   (1/2)||x − y||_2^2 + λ||x||_1
    subject to x^odd_i x^odd_{i+1} = 0,  i ∈ I_odd
               x^even_i x^even_{i+1} = 0,  i ∈ I_even
               x = x^odd = x^even.

Using 1{·} for the {0, +∞}-valued indicator function, this has augmented Lagrangian

    L_ρ(x, x^odd, x^even) = (1/2)||x − y||_2^2 + λ||x||_1 + (ν^odd)^T(x^odd − x) + (ν^even)^T(x^even − x)
        + ∑_{i ∈ I_odd} 1{x^odd_i x^odd_{i+1} = 0} + ∑_{i ∈ I_even} 1{x^even_i x^even_{i+1} = 0}
        + (ρ/2)||x^odd − x||_2^2 + (ρ/2)||x^even − x||_2^2.

  i. Give a closed form solution to

        x^+ = argmin_x { ||x − v||_2^2 : x_i x_{i+1} = 0 for i ∈ I_odd }.

  ii. Give exact forms for the ADMM updates of the three vectors x^odd, x^even, and the consensus vector x.

  iii. Implement your non-convex ADMM procedure with λ = 0.9, and run it for 200 iterations on the data in spike_train_data, initialized at x = x^odd = x^even = 0, using augmented Lagrangian multiplier ρ = 4. Take the final x of your ADMM iterations as your solution. Include code, a plot of your solution x, and the output of plot_spike_train.
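For part (a) above, the closed form is the entrywise soft-thresholding (shrinkage) operator applied to y. A minimal Matlab sketch, assuming spike_train_data.m defines the noisy signal as a variable y (an assumed name), with the λ values as reconstructed above:

    % Soft-thresholding solution of  minimize (1/2)||x - y||_2^2 + lambda*||x||_1  (part (a)).
    % Assumes spike_train_data.m defines the noisy signal y.
    spike_train_data;
    soft = @(v, lam) sign(v) .* max(abs(v) - lam, 0);
    for lam = [0.8, 0.9, 1.0, 1.1]
        x = soft(y, lam);
        fprintf('lambda = %.1f: %d nonzero entries\n', lam, nnz(x));
        % call plot_spike_train here with the true and decoded signals,
        % using the argument order given in the course file
    end

For part (c)i, the constraint couples only the disjoint pairs (i, i+1) with i ∈ I_odd, so the minimization separates across pairs: in each pair, zero out whichever entry of v has smaller magnitude. A sketch of a hypothetical helper (project_pairs is not a course-provided name):

    % project_pairs.m -- hypothetical helper for part (c)i.
    % Returns argmin_x ||x - v||_2^2 subject to x(i)*x(i+1) = 0 for all i in I.
    % The constrained pairs are disjoint, so each is handled independently:
    % keep the entry of larger magnitude, zero the other.
    function x = project_pairs(v, I)
    x = v;
    for i = I(:)'
        if abs(v(i)) >= abs(v(i+1))
            x(i+1) = 0;
        else
            x(i) = 0;
        end
    end
    end

Completing the square in the augmented Lagrangian above shows that the x^odd update is this projection applied to x − ν^odd/ρ with I = I_odd, and symmetrically for x^even.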

4. ADMM for support vector machines (SVMs). In this problem, we investigate the performance of ADMM in relation to a subgradient method on a problem for which ADMM is quite natural. We consider solving

    minimize ∑_{i=1}^N [1 − a_i^T x]_+ + (Nλ/2)||x||_2^2        (1)

in the variable x ∈ R^n, where [t]_+ = max{t, 0}.

(a) Introducing variables x_i ∈ R^n for i = 1, ..., N (and associated dual variables y_i) with central variable z = x_i, write an augmented Lagrangian for the problem

    minimize ∑_{i=1}^N [1 − a_i^T x]_+ + (Nλ/2)||x||_2^2.

(The variables x_i should correspond to the functions f_i(x) = [1 − a_i^T x]_+, while the consensus variable z should also incorporate the (Nλ/2)||x||_2^2 term of the objective.)

(b) Compute and give exact (closed form) updates for ADMM for the variables x_i, z, and y_i with your augmented Lagrangian form.

(c) Using the data in svm_admm_data.[m|jl], implement both projected subgradient descent and your ADMM algorithm for this problem. For the projected subgradient method, use projections onto the l_2-ball of radius 2/λ (that is, the domain X = { x ∈ R^n : ||x||_2 ≤ 2/λ }; this is not strictly necessary but can be done without any loss of generality) and use the stepsize sequence α_k = 1/(Nλk). For ADMM, use multiplier ρ = 3. Initializing each algorithm with all-zero vectors, run each algorithm for 200 iterations, and plot the gaps to optimality from the true solution (as calculated, say, by CVX) for each algorithm, using z^k as your iterates for ADMM. Include the plot of optimality gaps and your code.
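As a point of reference for part (c), a minimal Matlab sketch of the projected subgradient method, assuming svm_admm_data.m defines a matrix A ∈ R^{N×n} whose rows are the a_i^T and the scalar lambda (the names A and lambda are assumptions):

    % Projected subgradient method for (1):
    %   f(x) = sum_i [1 - a_i'x]_+ + (N*lambda/2)*||x||_2^2.
    % Assumes svm_admm_data.m defines A (N x n, rows a_i') and lambda.
    svm_admm_data;
    [N, n] = size(A);
    R = 2/lambda;                                 % radius of the projection ball
    x = zeros(n, 1);
    for k = 1:200
        g = -A' * double(A*x < 1) + N*lambda*x;   % a subgradient of f at x
        x = x - g / (N*lambda*k);                 % stepsize alpha_k = 1/(N*lambda*k)
        if norm(x) > R                            % project onto {x : ||x||_2 <= R}
            x = (R / norm(x)) * x;
        end
    end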
