Final exam.
EE364b Convex Optimization II June 4–8, 2015
Prof. John C. Duchi

Final exam

By now, you know how it works, so we won't repeat it here. (If not, see the instructions for the EE364a final exam.) Since you have 96 hours to work on the final, your solutions must be typeset using LaTeX. We are expecting your solutions to be typo-free, clear, and correctly typeset. (And yes, we will deduct points for poor typesetting, typos, or unclear solutions.) All code submitted must be clear, commented, and readable.

To download Matlab or Julia files containing problem data, you'll have to type the whole URL given in the problem into your browser; there are no links on the course web page pointing to these files. To get a file called filename.m, for example, you would retrieve it with your browser.

Please make sure each problem starts on a new page, say, by using the \clearpage command. (This generates a new page after printing out any figures that have floated forward.)

Email your solutions to ee364b.submission@gmail.com by Monday June 7th 5pm at the latest.
1. Robust truss design. A truss is a construction composed of thin elastic bars linked at nodes that, when subjected to a load, deform until the reaction forces caused by deformations of the bars compensate the external forces. In truss design problems, one wishes to limit the deformations of the truss under different (typical) loading patterns. The goal in this problem is to develop truss designs robust to deviations from typical loads.

A truss consists of $p$ fixed nodes (attached to the ground or other immobile surface) and $n$ free nodes. In a planar (two-dimensional) truss, each free node may move in two dimensions, so the truss's displacement is represented by a vector in $\mathbf{R}^{2n}$. A design is a selection of $m$ nonnegative bar volumes $t \in \mathbf{R}^m_+$ connecting the $n + p$ nodes. We are given a total volume $V$ of usable material, so we have the constraint that $\sum_{i=1}^m t_i \le V$. Associated with a truss is a bar-stiffness matrix $A(t) = \sum_{i=1}^m t_i b_i b_i^T$ parameterized by the volumes $t \in \mathbf{R}^m_+$. The vectors $b_i \in \mathbf{R}^{2n}$ are determined by the structure's geometry (nominal node locations) and characteristics of the bars' material.

(a) Given a load (vector of forces) $f \in \mathbf{R}^{2n}$, the compliance is a measure of internal work done by the truss with respect to the load and is given by
$$c_f(t) = \sup\{2f^T u - u^T A(t) u \mid u \in \mathbf{R}^{2n}\},$$
and the goal is to design a stiff (small compliance) truss. Formulate the problem of designing a truss with the smallest possible compliance as a tractable convex optimization problem. Your final answer should not involve the inverse of $A(t)$.

(b) The design in part (a) is a single-load design: it minimizes the compliance for a nominal load $f \in \mathbf{R}^{2n}$, and may be brittle to even small loads other than $f$. In multi-load compliance, the goal is to find the vector of bar volumes which results in the smallest possible worst-case compliance over all $f$ in an uncertainty set $F$,
$$c_F(t) = \sup\{2f^T u - u^T A(t) u \mid u \in \mathbf{R}^{2n},\ f \in F\}.$$
Letting $F$ be the ellipsoid $F = \{Qe \mid e \in \mathbf{R}^k,\ \|e\|_2 \le 1\}$ for some matrix $Q \in \mathbf{R}^{2n \times k}$, formulate the problem of minimizing $c_F(t)$ as a tractable convex optimization problem. Your final answer should not involve the inverse of $A(t)$.

(c) Given the data in robust_truss_data.[m|jl], find the optimal truss designs for parts (a) (use f = f_nominal) and (b) using your optimization formulations. The function plot_truss.[m|jl] plots a truss and its displacement under the forces f_nominal and f_occasional, the latter a small perturbation. Plot your truss designs as well as the truss $t = (V/m)\mathbf{1}$ that distributes the material uniformly. Include your code, plots of the displacement for each of the three truss designs, the compliance under f_nominal, and the distances displaced (printed by plot_truss) under load.

Note: Julia will not give sufficiently accurate solutions to this problem; we recommend Matlab for the most interpretable results. The matrix $Q$ represents nominal forces as well as small perturbations. A common choice is to take known loads $f_1, f_2, \ldots, f_l \in \mathbf{R}^{2n}$, a small $\epsilon > 0$, and set $Q = [f_1 \; f_2 \; \cdots \; f_l \; \epsilon I_{2n \times 2n}] \in \mathbf{R}^{2n \times (2n+l)}$.
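The following is a minimal numerical sketch (not part of any required solution) of how the quantities above fit together for a fixed design: it builds the bar-stiffness matrix $A(t) = \sum_i t_i b_i b_i^T$ and evaluates the single-load compliance, using the standard fact that the supremum defining $c_f(t)$ is attained at the $u$ solving $A(t)u = f$ when $A(t)$ is invertible. The variable names B, f, and V are assumptions about what robust_truss_data.m provides; the actual file may use different names.

```matlab
% Minimal numerical sketch: evaluate the single-load compliance c_f(t) for a
% given design t. B is assumed to be a 2n-by-m matrix with columns b_i (an
% assumption about robust_truss_data.m), so A(t) = B*diag(t)*B'.
m = size(B, 2);
t = (V / m) * ones(m, 1);        % uniform design: spread material evenly
A = B * diag(t) * B';            % bar-stiffness matrix A(t) = sum_i t_i b_i b_i'
u = A \ f;                       % maximizer of 2 f'*u - u'*A(t)*u when A(t) is invertible
c = 2 * f' * u - u' * A * u;     % equals f' * inv(A) * f at the maximizer
fprintf('compliance under f: %.4f\n', c);
```

The uniform design is used here only as an example; parts (a)–(c) ask for the optimized designs.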
2. Convex functions of matrix eigenvalues. In this question, we explore an elegant construction of a wide variety of convex functions of matrices. Let $\mathbf{S}^n$ denote the space of symmetric $n \times n$ matrices. For any such matrix $A$, we let $\lambda(A) \in \mathbf{R}^n$ denote its eigenvalues in non-increasing order, so $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$. Now, let $f : \mathbf{R}^n \to \mathbf{R}$ be a closed convex function that is symmetric, meaning that for every permutation matrix $P$ (a matrix $P \in \{0,1\}^{n \times n}$ with $P\mathbf{1} = \mathbf{1}$ and $P^T\mathbf{1} = \mathbf{1}$, which implies $P^T P = I_{n \times n}$), we have $f(Px) = f(x)$. For such a function $f$, let $f_M$ be the matricization of $f$, the function defined on $\mathbf{S}^n$ by $f_M(A) = f(\lambda(A))$. We use convex conjugacy to show that $f_M$ is convex and to evaluate its derivatives.

For this question (parts (b) and (d)), you may use von Neumann's trace inequality, which is that
$$\mathrm{Tr}(AB) \le \lambda(A)^T \lambda(B) = \sum_{i=1}^n \lambda_i(A)\lambda_i(B),$$
where equality is obtained in the inequality if and only if $A = U \,\mathrm{diag}(\lambda(A))\, U^T$ and $B = U \,\mathrm{diag}(\lambda(B))\, U^T$ for an orthonormal matrix $U$.

(a) Show that for $A, B \in \mathbf{S}^n$, the function $\langle A, B \rangle = \mathrm{Tr}(AB)$ defines an inner product.

(b) Show that convex conjugation and matricization commute, that is, show that for any matrix $A \in \mathbf{S}^n$, $(f^*)_M(A) = (f_M)^*(A)$. (For a function $f : \mathbf{S}^n \to \mathbf{R}$, we let $f^*(A) = \sup_B \{\mathrm{Tr}(BA) - f(B)\}$.)

(c) Using the result of part (2b), show that $f_M$ is convex by arguing that $(f_M)^{**} = f_M$.

(d) Show that if $A = U \,\mathrm{diag}(\lambda(A))\, U^T$ is the eigen-decomposition of $A$, then the subdifferential is
$$\partial f_M(A) = U \,\mathrm{diag}\big(\partial f(\lambda(A))\big)\, U^T,$$
where $\partial f(\lambda(A))$ denotes the subdifferential of $f$ evaluated at $\lambda(A)$. (The subdifferential of a function $f : \mathbf{S}^n \to \mathbf{R}$ at a point $A$ is the set of matrices $G \in \mathbf{S}^n$ such that $f(B) \ge f(A) + \mathrm{Tr}(G(B - A))$ for all $B \in \mathbf{S}^n$.)

Hint. The result of Question 1.9 in the homework exercises, that is, that $g \in \partial f(x)$ if and only if $g^T x = f(x) + f^*(g)$, may be useful.

(e) Using the result of part (2d), argue that for $A \succ 0$, $\nabla \log\det(A) = A^{-1}$.
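As a quick illustration of the definitions (not required for the solution), the following sketch evaluates the matricization $f_M(A) = f(\lambda(A))$ for one example of a symmetric closed convex function and spot-checks von Neumann's trace inequality on random symmetric matrices. The choice $f(x) = \log \sum_i e^{x_i}$ is an assumption made purely for the example; any permutation-invariant convex $f$ works.

```matlab
% Minimal sketch: matricization f_M(A) = f(lambda(A)) and a numerical spot
% check of von Neumann's trace inequality Tr(AB) <= lambda(A)'*lambda(B).
n = 5;
f = @(x) log(sum(exp(x)));                   % an example symmetric convex function on R^n
A = randn(n); A = (A + A') / 2;              % random symmetric matrices
B = randn(n); B = (B + B') / 2;
lamA = sort(eig(A), 'descend');              % eigenvalues in non-increasing order
lamB = sort(eig(B), 'descend');
fprintf('f_M(A) = f(lambda(A)) = %.4f\n', f(lamA));
fprintf('Tr(AB) = %.4f <= lambda(A)''*lambda(B) = %.4f\n', ...
        trace(A * B), lamA' * lamB);
```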
3. Neural spike train decoding via non-convex methods. Neurons in the retina, auditory cortex, and brain propagate signals rapidly by generating electrical pulses known as action potentials, which in signal processing we represent as spike trains: sequences of activations where typically only a few elements of the signal are large and non-zero (above the activation threshold for the neuron). A standard problem in neuroscience and neural coding is to take (noisy) signals and resolve them into clean spike trains.

In this problem, we study decoding a spike train $x \in \mathbf{R}^n$ from a noisy signal $y \in \mathbf{R}^n$. As neurons are not constantly activated and have a refractory period (it takes time for an excitable membrane to transmit additional stimuli), we wish to encode sparsity in $x$ and that non-zero $x_i$ locally inhibit other elements of the vector $x$. We thus formulate spike train recovery as a non-convex problem with variable $x \in \mathbf{R}^n$:
$$\begin{array}{ll} \mbox{minimize} & \frac{1}{2}\|x - y\|_2^2 \\ \mbox{subject to} & \mathbf{card}(x) \le k \\ & x_i x_{i+1} = 0, \quad i = 1, \ldots, n-1, \end{array}$$
where $k$ is a constant. We explore three heuristic approaches for this problem. Throughout this problem, use the data in the file spike_train_data.[m|jl] for all implementation parts. To plot the resulting spike train (and original signal), use the method plot_spike_train.[m|jl], passing the true signal and the decoded one. (You may find it interesting to use the stem function to plot the original signal $y$ as well.)

(a) Lasso. We first ignore the inhibitory properties of the signal and use $\ell_1$-regularization as a heuristic for cardinality. Give a closed form solution to
$$\mbox{minimize} \quad \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_1.$$
Find the resulting signal $x$ for each $\lambda \in \{0.8, 0.9, 1.0, 1.1\}$. Include your code, the solution plot for $\lambda = 0.9$, and the output of plot_spike_train for each $\lambda$.

(b) Sequential convex programming. We extend this $\ell_1$-regularization heuristic and solve a sequence of convex approximations to the (non-convex) problem
$$\mbox{minimize} \quad \tfrac{1}{2}\|x - y\|_2^2 + \nu \sum_{i=1}^{n-1} |x_i x_{i+1}| + \lambda \|x\|_1.$$

i. Prove that for all $\alpha \ge 0$,
$$|ab| \le \frac{\alpha}{2} a^2 + \frac{1}{2\alpha} b^2,$$
and that there is an $\alpha$ attaining equality. (Treat $0^2 \cdot \infty$ and $0^2/0$ as 0.)

ii. Using the relaxation of $|ab|$ in 3(b)i, give an objective function $f(x, \alpha)$ in variables $x \in \mathbf{R}^n$ and $\alpha \in \mathbf{R}^{n-1}_+$, where $f(x, \alpha)$ is convex in $x$ and convex in $\alpha$, and which satisfies
$$\inf_{\alpha \succeq 0} f(x, \alpha) = \tfrac{1}{2}\|x - y\|_2^2 + \nu \sum_{i=1}^{n-1} |x_i x_{i+1}| + \lambda \|x\|_1.$$
iii. Implement an alternating minimization procedure for your function $f(x, \alpha)$. Is your procedure guaranteed to converge? Using $\lambda = 0.9$, $\nu = 1$, and initializing from $x = 0$ and $\alpha = \mathbf{1}$, run 200 iterations of alternating minimization on your function $f(x, \alpha)$, treating any $0/0$ terms as 0. Include code, the solution plot (using data spike_train_data), and the output of plot_spike_train.

(c) ADMM. It is possible to use ADMM for non-convex problems: consider minimizing $f(x) + g(x)$, where $f$ and $g$ are (potentially) non-convex functions for which it is still possible to find an $x_v \in \mathop{\rm argmin}_x \{f(x) + \tfrac{1}{2}\|x - v\|_2^2\}$ (and likewise for $g$). Then we may introduce a variable $z = x$ to form the augmented Lagrangian
$$L_\rho(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2,$$
performing the usual ADMM steps over $x$, $z$, and $y$. This procedure is not guaranteed to converge but can be quite effective.

Let $I_{\rm even}$ and $I_{\rm odd}$ be the even and odd indices of $\{1, \ldots, n-1\}$, respectively. Introducing variables $x^{\rm odd} \in \mathbf{R}^n$ and $x^{\rm even} \in \mathbf{R}^n$, we consider the problem
$$\begin{array}{ll} \mbox{minimize} & \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_1 \\ \mbox{subject to} & x^{\rm odd}_i x^{\rm odd}_{i+1} = 0, \quad i \in I_{\rm odd} \\ & x^{\rm even}_i x^{\rm even}_{i+1} = 0, \quad i \in I_{\rm even} \\ & x = x^{\rm odd} = x^{\rm even}. \end{array}$$
Using $I\{\cdot\}$ for the $\{0, +\infty\}$-valued indicator function, this has augmented Lagrangian
$$L_\rho(x, x^{\rm odd}, x^{\rm even}) = \tfrac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_1 + (\nu^{\rm odd})^T(x^{\rm odd} - x) + (\nu^{\rm even})^T(x^{\rm even} - x) + \sum_{i \in I_{\rm odd}} I\{x^{\rm odd}_i x^{\rm odd}_{i+1} = 0\} + \sum_{i \in I_{\rm even}} I\{x^{\rm even}_i x^{\rm even}_{i+1} = 0\} + \frac{\rho}{2}\|x^{\rm odd} - x\|_2^2 + \frac{\rho}{2}\|x^{\rm even} - x\|_2^2.$$

i. Give a closed form solution to
$$x^+ = \mathop{\rm argmin}_x \left\{ \|x - v\|_2^2 \mid x_i x_{i+1} = 0 \mbox{ for } i \in I_{\rm odd} \right\}.$$

ii. Give exact forms for the ADMM updates for the three vectors $x^{\rm odd}$, $x^{\rm even}$, and the consensus vector $x$.

iii. Implement your non-convex ADMM procedure with $\lambda = 0.9$, and run it for 200 iterations on the data in spike_train_data, initialized at $x = x^{\rm odd} = x^{\rm even} = 0$, using augmented Lagrangian multiplier $\rho = 4$. Take the final $x$ of your ADMM iterations as your solution. Include code, a plot of your solution $x$, and the output of plot_spike_train.
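For reference, the $\ell_1$-regularized subproblem in part (a) (which also reappears in the consensus $x$-update of part (c)) is the proximal operator of $\lambda\|\cdot\|_1$, whose closed form is the standard soft-thresholding operation; a minimal Matlab sketch follows. This is a generic fact about the $\ell_1$ proximal operator rather than the exam's model solution, and the small test vector is made up for illustration.

```matlab
% Minimal sketch: soft-thresholding, the proximal operator of lambda*||.||_1.
% soft_threshold(y, lambda) solves minimize (1/2)*||x - y||_2^2 + lambda*||x||_1
% elementwise; this is the standard closed form for the subproblem.
soft_threshold = @(y, lambda) sign(y) .* max(abs(y) - lambda, 0);

% Example usage on a made-up vector (spike_train_data.m supplies the real y):
y = [3.2; -0.1; 0.0; -2.5; 0.7];
x = soft_threshold(y, 0.9);       % lambda = 0.9, as in the problem
disp([y x]);                      % small entries are zeroed, large ones shrunk toward 0
```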
4. ADMM for support vector machines (SVMs). In this problem, we investigate the performance of ADMM in relation to a subgradient method on a problem for which ADMM is quite natural. We consider solving
$$\mbox{minimize} \quad \sum_{i=1}^N \big[1 - a_i^T x\big]_+ + \frac{N\lambda}{2}\|x\|_2^2 \qquad (1)$$
in the variable $x \in \mathbf{R}^n$, where $[t]_+ = \max\{t, 0\}$.

(a) Introducing variables $x_i \in \mathbf{R}^n$ for $i = 1, \ldots, N$ (and associated dual variables $y_i$) with central variable $z = x_i$, write an augmented Lagrangian for the problem (1). (The variables $x_i$ should correspond to the functions $f_i(x) = [1 - a_i^T x]_+$, while the consensus variable $z$ should also incorporate the $(N\lambda/2)\|x\|_2^2$ term of the objective.)

(b) Compute and give exact (closed form) updates for ADMM for the variables $x_i$, $z$, and $y_i$ with your augmented Lagrangian form.

(c) Using the data in svm_admm_data.[m|jl], implement both projected subgradient descent and your ADMM algorithm for this problem. For the projected subgradient algorithm, use projections onto the $\ell_2$-ball of radius $2/\lambda$ (that is, the domain $X = \{x \in \mathbf{R}^n \mid \|x\|_2 \le 2/\lambda\}$; this is not strictly necessary but can be done without any loss of generality) and use the stepsize sequence $\alpha_k = 1/(N\lambda k)$. For ADMM, use multiplier $\rho = 3$. Initializing each algorithm with all-zero vectors, run each algorithm for 200 iterations, and plot the gaps to optimality from the true solution (as calculated, say, by CVX) for each algorithm, using $z^k$ as your iterates for ADMM. Include the plot of optimality gaps and your code.
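As a point of reference for part (c), here is a minimal sketch of the projected subgradient method described above; it is not the official solution. It assumes svm_admm_data.m provides a matrix A whose rows are $a_i^T$ and a scalar lambda (these names are guesses), and the optimal value f_star used for the optimality gaps would come from a separate CVX solve.

```matlab
% Minimal sketch of the projected subgradient method for
%   minimize sum_i [1 - a_i'*x]_+ + (N*lambda/2)*||x||_2^2.
% Assumes A (N-by-n, rows a_i') and lambda are in the workspace; these names
% are guesses for what svm_admm_data.m provides.
[N, n] = size(A);
R = 2 / lambda;                          % radius of the l2-ball we project onto
x = zeros(n, 1);
fvals = zeros(200, 1);
for k = 1:200
    margins = 1 - A * x;                 % 1 - a_i'*x for each i
    fvals(k) = sum(max(margins, 0)) + (N * lambda / 2) * norm(x)^2;
    % A subgradient: -sum of a_i over indices with positive margin, plus the ridge term.
    g = -A' * double(margins > 0) + N * lambda * x;
    x = x - (1 / (N * lambda * k)) * g;  % stepsize alpha_k = 1/(N*lambda*k)
    if norm(x) > R                       % project back onto {x : ||x||_2 <= R}
        x = (R / norm(x)) * x;
    end
end
% Optimality gaps would then be fvals - f_star, with f_star computed separately by CVX.
```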