Final exam.
EE364b Convex Optimization II June 4–8, 2015
Prof. John C. Duchi

Final exam

By now, you know how it works, so we won't repeat it here. (If not, see the instructions for the EE364a final exam.) Since you have 96 hours to work on the final, your solutions must be typeset using LaTeX. We are expecting your solutions to be typo-free, clear, and correctly typeset. (And yes, we will deduct points for poor typesetting, typos, or unclear solutions.) All code submitted must be clear, commented, and readable.

To download Matlab or Julia files containing problem data, you'll have to type the whole URL given in the problem into your browser; there are no links on the course web page pointing to these files. To get a file called filename.m, for example, you would retrieve it with your browser.

Please make sure each problem starts on a new page, say, by using the \clearpage command. (This generates a new page after printing out any figures that have floated forward.)

Email your solutions to ee364b.submission@gmail.com by Monday June 7th 5pm at the latest.
1. Robust truss design. A truss is a construction composed of thin elastic bars linked at nodes that, when subjected to a load, deform until the reaction forces caused by deformations of the bars compensate the external forces. In truss design problems, one wishes to limit the deformations of the truss under different (typical) loading patterns. The goal in this problem is to develop truss designs robust to deviations from typical loads.

A truss consists of $p$ fixed nodes (attached to the ground or other immobile surface) and $n$ free nodes. In a planar (two-dimensional) truss, each free node may move in two dimensions, so the truss's displacement is represented by a vector in $\mathbf{R}^{2n}$. A design is a selection of $m$ nonnegative bar volumes $t \in \mathbf{R}^m_+$ connecting the $n + p$ nodes. We are given a total volume $V$ of usable material, so we have the constraint that $\sum_{i=1}^m t_i \le V$. Associated with a truss is a bar-stiffness matrix $A(t) = \sum_{i=1}^m t_i b_i b_i^T$ parameterized by the volumes $t \in \mathbf{R}^m_+$. The vectors $b_i \in \mathbf{R}^{2n}$ are determined by the structure's geometry (nominal node locations) and characteristics of the bars' material.

(a) Given a load (vector of forces) $f \in \mathbf{R}^{2n}$, the compliance is a measure of internal work done by the truss with respect to the load and is given by
$$c_f(t) = \sup\{2f^T u - u^T A(t) u \mid u \in \mathbf{R}^{2n}\},$$
and the goal is to design a stiff (small compliance) truss. Formulate the problem of designing a truss with the smallest possible compliance as a tractable convex optimization problem. Your final answer should not involve the inverse of $A(t)$.

(b) The design in part (a) is a single-load design: it minimizes the compliance for a nominal load $f \in \mathbf{R}^{2n}$, and may be brittle to even small loads other than $f$. In multi-load compliance, the goal is to find the vector of bar volumes which results in the smallest possible worst-case compliance over all $f$ in an uncertainty set $F$,
$$c_F(t) = \sup\{2f^T u - u^T A(t) u \mid u \in \mathbf{R}^{2n},\ f \in F\}.$$
Letting $F$ be the ellipsoid $F = \{Qe \mid e \in \mathbf{R}^k,\ \|e\|_2 \le 1\}$ for some matrix $Q \in \mathbf{R}^{2n \times k}$, formulate the problem of minimizing $c_F(t)$ as a tractable convex optimization problem. Your final answer should not involve the inverse of $A(t)$.

(c) Given the data in robust_truss_data.[m|jl], find the optimal truss designs for parts (a) (use f = f_nominal) and (b) using your optimization formulations. The function plot_truss.[m|jl] plots a truss and its displacement under the forces f_nominal and f_occasional, the latter a small perturbation. Plot your truss designs as well as the truss $t = (V/m)\mathbf{1}$ that distributes the material uniformly. Include your code, plots of the displacement for each of the three truss designs, the compliance under f_nominal, and the distances displaced (printed by plot_truss) under load.

Note: Julia will not give sufficiently accurate solutions to this problem; we recommend Matlab for the most interpretable results. The matrix $Q$ represents nominal forces as well as small perturbations. A common choice is to take known loads $f_1, f_2, \ldots, f_l \in \mathbf{R}^{2n}$, a small $\epsilon > 0$, and set $Q = [f_1 \; f_2 \; \cdots \; f_l \; \epsilon I_{2n \times 2n}] \in \mathbf{R}^{2n \times (2n+l)}$.
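The following is a minimal numerical sketch (not part of any required solution) of how the quantities above fit together for a fixed design: it builds the bar-stiffness matrix $A(t) = \sum_i t_i b_i b_i^T$ and evaluates the single-load compliance, using the standard fact that the supremum defining $c_f(t)$ is attained at the $u$ solving $A(t)u = f$ when $A(t)$ is invertible. The variable names B, f, and V are assumptions about what robust_truss_data.m provides; the actual file may use different names.

```matlab
% Minimal numerical sketch: evaluate the single-load compliance c_f(t) for a
% given design t. B is assumed to be a 2n-by-m matrix with columns b_i (an
% assumption about robust_truss_data.m), so A(t) = B*diag(t)*B'.
m = size(B, 2);
t = (V / m) * ones(m, 1);        % uniform design: spread material evenly
A = B * diag(t) * B';            % bar-stiffness matrix A(t) = sum_i t_i b_i b_i'
u = A \ f;                       % maximizer of 2 f'*u - u'*A(t)*u when A(t) is invertible
c = 2 * f' * u - u' * A * u;     % equals f' * inv(A) * f at the maximizer
fprintf('compliance under f: %.4f\n', c);
```

The uniform design is used here only as an example; parts (a)–(c) ask for the optimized designs.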
2. Convex functions of matrix eigenvalues. In this question, we explore an elegant construction of a wide variety of convex functions of matrices. Let $\mathbf{S}^n$ denote the space of symmetric $n \times n$ matrices. For any such matrix $A$, we let $\lambda(A) \in \mathbf{R}^n$ denote its eigenvalues in non-increasing order, so $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_n(A)$. Now, let $f : \mathbf{R}^n \to \mathbf{R}$ be a closed convex function that is symmetric, meaning that for every permutation matrix $P$ (a matrix $P \in \{0,1\}^{n \times n}$ with $P\mathbf{1} = \mathbf{1}$ and $P^T\mathbf{1} = \mathbf{1}$, which implies $P^T P = I_{n \times n}$), we have $f(Px) = f(x)$. For such a function $f$, let $f_M$ be the matricization of $f$, the function defined on $\mathbf{S}^n$ by $f_M(A) = f(\lambda(A))$. We use convex conjugacy to show that $f_M$ is convex and to evaluate its derivatives.

For this question (parts (b) and (d)), you may use von Neumann's trace inequality, which is that
$$\mathrm{Tr}(AB) \le \lambda(A)^T \lambda(B) = \sum_{i=1}^n \lambda_i(A)\lambda_i(B),$$
where equality is obtained in the inequality if and only if $A = U \,\mathrm{diag}(\lambda(A))\, U^T$ and $B = U \,\mathrm{diag}(\lambda(B))\, U^T$ for an orthonormal matrix $U$.

(a) Show that for $A, B \in \mathbf{S}^n$, the function $\langle A, B \rangle = \mathrm{Tr}(AB)$ defines an inner product.

(b) Show that convex conjugation and matricization commute, that is, show that for any matrix $A \in \mathbf{S}^n$, $(f^*)_M(A) = (f_M)^*(A)$. (For a function $f : \mathbf{S}^n \to \mathbf{R}$, we let $f^*(A) = \sup_B \{\mathrm{Tr}(BA) - f(B)\}$.)

(c) Using the result of part (2b), show that $f_M$ is convex by arguing that $(f_M)^{**} = f_M$.

(d) Show that if $A = U \,\mathrm{diag}(\lambda(A))\, U^T$ is the eigen-decomposition of $A$, then the subdifferential is
$$\partial f_M(A) = U \,\mathrm{diag}\big(\partial f(\lambda(A))\big)\, U^T,$$
where $\partial f(\lambda(A))$ denotes the subdifferential of $f$ evaluated at $\lambda(A)$. (The subdifferential of a function $f : \mathbf{S}^n \to \mathbf{R}$ at a point $A$ is the set of matrices $G \in \mathbf{S}^n$ such that $f(B) \ge f(A) + \mathrm{Tr}(G(B - A))$ for all $B \in \mathbf{S}^n$.)

Hint. The result of Question 1.9 in the homework exercises, that is, that $g \in \partial f(x)$ if and only if $g^T x = f(x) + f^*(g)$, may be useful.

(e) Using the result of part (2d), argue that for $A \succ 0$, $\nabla \log\det(A) = A^{-1}$.
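As a quick illustration of the definitions (not required for the solution), the following sketch evaluates the matricization $f_M(A) = f(\lambda(A))$ for one example of a symmetric closed convex function and spot-checks von Neumann's trace inequality on random symmetric matrices. The choice $f(x) = \log \sum_i e^{x_i}$ is an assumption made purely for the example; any permutation-invariant convex $f$ works.

```matlab
% Minimal sketch: matricization f_M(A) = f(lambda(A)) and a numerical spot
% check of von Neumann's trace inequality Tr(AB) <= lambda(A)'*lambda(B).
n = 5;
f = @(x) log(sum(exp(x)));                   % an example symmetric convex function on R^n
A = randn(n); A = (A + A') / 2;              % random symmetric matrices
B = randn(n); B = (B + B') / 2;
lamA = sort(eig(A), 'descend');              % eigenvalues in non-increasing order
lamB = sort(eig(B), 'descend');
fprintf('f_M(A) = f(lambda(A)) = %.4f\n', f(lamA));
fprintf('Tr(AB) = %.4f <= lambda(A)''*lambda(B) = %.4f\n', ...
        trace(A * B), lamA' * lamB);
```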
3. Neural spike train decoding via non-convex methods. Neurons in the retina, auditory cortex, and brain propagate signals rapidly by generating electrical pulses known as action potentials, which in signal processing we represent as spike trains: sequences of activations where typically only a few elements of the signal are large and non-zero (above the activation threshold for the neuron). A standard problem in neuroscience and neural coding is to take (noisy) signals and resolve them into clean spike trains.

In this problem, we study decoding a spike train $x \in \mathbf{R}^n$ from a noisy signal $y \in \mathbf{R}^n$. As neurons are not constantly activated and have a refractory period (it takes time for an excitable membrane to transmit additional stimuli), we wish to encode sparsity in $x$ and that non-zero $x_i$ locally inhibit other elements of the vector $x$. We thus formulate spike train recovery as a non-convex problem with variable $x \in \mathbf{R}^n$:
$$\begin{array}{ll} \mbox{minimize} & \frac{1}{2}\|x - y\|_2^2 \\ \mbox{subject to} & \mathbf{card}(x) \le k \\ & x_i x_{i+1} = 0, \quad i = 1, \ldots, n-1, \end{array}$$
where $k$ is a constant. We explore three heuristic approaches for this problem. Throughout this problem, use the data in the file spike_train_data.[m|jl] for all implementation parts. To plot the resulting spike train (and original signal), use the method plot_spike_train.[m|jl], passing the true signal and the decoded one. (You may find it interesting to use the stem function to plot the original signal $y$ as well.)

(a) Lasso. We first ignore the inhibitory properties of the signal and use $\ell_1$-regularization as a heuristic for cardinality. Give a closed form solution to
$$\mbox{minimize} \quad \tfrac{1}{2}\|x - y\|_2^2 + \lambda \|x\|_1.$$
Find the resulting signal $x$ for each $\lambda \in \{0.8, 0.9, 1.0, 1.1\}$. Include your code, the solution plot for $\lambda = 0.9$, and the output of plot_spike_train for each $\lambda$.

(b) Sequential convex programming. We extend this $\ell_1$-regularization heuristic and solve a sequence of convex approximations to the (non-convex) problem
$$\mbox{minimize} \quad \tfrac{1}{2}\|x - y\|_2^2 + \nu \sum_{i=1}^{n-1} |x_i x_{i+1}| + \lambda \|x\|_1.$$

i. Prove that for all $\alpha \ge 0$,
$$|ab| \le \frac{\alpha}{2} a^2 + \frac{1}{2\alpha} b^2,$$
and that there is an $\alpha$ attaining equality. (Treat $0^2 \cdot \infty$ and $0^2/0$ as 0.)

ii. Using the relaxation of $|ab|$ in 3(b)i, give an objective function $f(x, \alpha)$ in variables $x \in \mathbf{R}^n$ and $\alpha \in \mathbf{R}^{n-1}_+$, where $f(x, \alpha)$ is convex in $x$ and convex in $\alpha$, and which satisfies
$$\inf_{\alpha \succeq 0} f(x, \alpha) = \tfrac{1}{2}\|x - y\|_2^2 + \nu \sum_{i=1}^{n-1} |x_i x_{i+1}| + \lambda \|x\|_1.$$
iii. Implement an alternating minimization procedure for your function $f(x, \alpha)$. Is your procedure guaranteed to converge? Using $\lambda = 0.9$, $\nu = 1$, and initializing from $x = 0$ and $\alpha = \mathbf{1}$, run 200 iterations of alternating minimization on your function $f(x, \alpha)$, treating any $0/0$ terms as 0. Include code, the solution plot (using data spike_train_data), and the output of plot_spike_train.

(c) ADMM. It is possible to use ADMM for non-convex problems: consider minimizing $f(x) + g(x)$, where $f$ and $g$ are (potentially) non-convex functions for which it is still possible to find an $x_v \in \mathop{\rm argmin}_x \{f(x) + \tfrac{1}{2}\|x - v\|_2^2\}$ (and likewise for $g$). Then we may introduce a variable $z = x$ to form the augmented Lagrangian
$$L_\rho(x, z, y) = f(x) + g(z) + y^T(x - z) + \frac{\rho}{2}\|x - z\|_2^2,$$
performing the usual ADMM steps over $x$, $z$, and $y$. This procedure is not guaranteed to converge but can be quite effective.

Let $I_{\rm even}$ and $I_{\rm odd}$ be the even and odd indices of $\{1, \ldots, n-1\}$, respectively. Introducing variables $x^{\rm odd} \in \mathbf{R}^n$ and $x^{\rm even} \in \mathbf{R}^n$, we consider the problem
$$\begin{array}{ll} \mbox{minimize} & \frac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_1 \\ \mbox{subject to} & x^{\rm odd}_i x^{\rm odd}_{i+1} = 0, \quad i \in I_{\rm odd} \\ & x^{\rm even}_i x^{\rm even}_{i+1} = 0, \quad i \in I_{\rm even} \\ & x = x^{\rm odd} = x^{\rm even}. \end{array}$$
Using $I\{\cdot\}$ for the $\{0, +\infty\}$-valued indicator function, this has augmented Lagrangian
$$L_\rho(x, x^{\rm odd}, x^{\rm even}) = \tfrac{1}{2}\|x - y\|_2^2 + \lambda\|x\|_1 + (\nu^{\rm odd})^T(x^{\rm odd} - x) + (\nu^{\rm even})^T(x^{\rm even} - x) + \sum_{i \in I_{\rm odd}} I\{x^{\rm odd}_i x^{\rm odd}_{i+1} = 0\} + \sum_{i \in I_{\rm even}} I\{x^{\rm even}_i x^{\rm even}_{i+1} = 0\} + \frac{\rho}{2}\|x^{\rm odd} - x\|_2^2 + \frac{\rho}{2}\|x^{\rm even} - x\|_2^2.$$

i. Give a closed form solution to
$$x^+ = \mathop{\rm argmin}_x \left\{ \|x - v\|_2^2 \mid x_i x_{i+1} = 0 \mbox{ for } i \in I_{\rm odd} \right\}.$$

ii. Give exact forms for the ADMM updates for the three vectors $x^{\rm odd}$, $x^{\rm even}$, and the consensus vector $x$.

iii. Implement your non-convex ADMM procedure with $\lambda = 0.9$, and run it for 200 iterations on the data in spike_train_data, initialized at $x = x^{\rm odd} = x^{\rm even} = 0$, using augmented Lagrangian multiplier $\rho = 4$. Take the final $x$ of your ADMM iterations as your solution. Include code, a plot of your solution $x$, and the output of plot_spike_train.
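For reference, the $\ell_1$-regularized subproblem in part (a) (which also reappears in the consensus $x$-update of part (c)) is the proximal operator of $\lambda\|\cdot\|_1$, whose closed form is the standard soft-thresholding operation; a minimal Matlab sketch follows. This is a generic fact about the $\ell_1$ proximal operator rather than the exam's model solution, and the small test vector is made up for illustration.

```matlab
% Minimal sketch: soft-thresholding, the proximal operator of lambda*||.||_1.
% soft_threshold(y, lambda) solves minimize (1/2)*||x - y||_2^2 + lambda*||x||_1
% elementwise; this is the standard closed form for the subproblem.
soft_threshold = @(y, lambda) sign(y) .* max(abs(y) - lambda, 0);

% Example usage on a made-up vector (spike_train_data.m supplies the real y):
y = [3.2; -0.1; 0.0; -2.5; 0.7];
x = soft_threshold(y, 0.9);       % lambda = 0.9, as in the problem
disp([y x]);                      % small entries are zeroed, large ones shrunk toward 0
```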
4. ADMM for support vector machines (SVMs). In this problem, we investigate the performance of ADMM in relation to a subgradient method on a problem for which ADMM is quite natural. We consider solving
$$\mbox{minimize} \quad \sum_{i=1}^N \big[1 - a_i^T x\big]_+ + \frac{N\lambda}{2}\|x\|_2^2 \qquad (1)$$
in the variable $x \in \mathbf{R}^n$, where $[t]_+ = \max\{t, 0\}$.

(a) Introducing variables $x_i \in \mathbf{R}^n$ for $i = 1, \ldots, N$ (and associated dual variables $y_i$) with central variable $z = x_i$, write an augmented Lagrangian for the problem (1). (The variables $x_i$ should correspond to the functions $f_i(x) = [1 - a_i^T x]_+$, while the consensus variable $z$ should also incorporate the $(N\lambda/2)\|x\|_2^2$ term of the objective.)

(b) Compute and give exact (closed form) updates for ADMM for the variables $x_i$, $z$, and $y_i$ with your augmented Lagrangian form.

(c) Using the data in svm_admm_data.[m|jl], implement both projected subgradient descent and your ADMM algorithm for this problem. For the projected subgradient algorithm, use projections onto the $\ell_2$-ball of radius $2/\lambda$ (that is, the domain $X = \{x \in \mathbf{R}^n \mid \|x\|_2 \le 2/\lambda\}$; this is not strictly necessary but can be done without any loss of generality) and use the stepsize sequence $\alpha_k = 1/(N\lambda k)$. For ADMM, use multiplier $\rho = 3$. Initializing each algorithm with all-zero vectors, run each algorithm for 200 iterations, and plot the gaps to optimality from the true solution (as calculated, say, by CVX) for each algorithm, using $z^k$ as your iterates for ADMM. Include the plot of optimality gaps and your code.
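As a point of reference for part (c), here is a minimal sketch of the projected subgradient method described above; it is not the official solution. It assumes svm_admm_data.m provides a matrix A whose rows are $a_i^T$ and a scalar lambda (these names are guesses), and the optimal value f_star used for the optimality gaps would come from a separate CVX solve.

```matlab
% Minimal sketch of the projected subgradient method for
%   minimize sum_i [1 - a_i'*x]_+ + (N*lambda/2)*||x||_2^2.
% Assumes A (N-by-n, rows a_i') and lambda are in the workspace; these names
% are guesses for what svm_admm_data.m provides.
[N, n] = size(A);
R = 2 / lambda;                          % radius of the l2-ball we project onto
x = zeros(n, 1);
fvals = zeros(200, 1);
for k = 1:200
    margins = 1 - A * x;                 % 1 - a_i'*x for each i
    fvals(k) = sum(max(margins, 0)) + (N * lambda / 2) * norm(x)^2;
    % A subgradient: -sum of a_i over indices with positive margin, plus the ridge term.
    g = -A' * double(margins > 0) + N * lambda * x;
    x = x - (1 / (N * lambda * k)) * g;  % stepsize alpha_k = 1/(N*lambda*k)
    if norm(x) > R                       % project back onto {x : ||x||_2 <= R}
        x = (R / norm(x)) * x;
    end
end
% Optimality gaps would then be fvals - f_star, with f_star computed separately by CVX.
```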