MS&E 318 (CME 338) Large-Scale Numerical Optimization. A Lasso Solver


Stanford University, Dept of Management Science and Engineering
MS&E 318 (CME 338) Large-Scale Numerical Optimization
Instructor: Michael Saunders                                  Spring 2011
Final Project                                                 Due Friday June 10

A Lasso Solver

We follow the terminology of van den Berg and Friedlander [7] in defining three parameterized problems arising in the estimation of linear models Ax ≈ b (where A is m × n). The problems are equivalent in the sense that if one of them is solved with a specified tuning parameter, there exist parameter values for which the other two problems have the same optimal solution (though it is nontrivial to find the matching parameter values).

The first problem was defined by Tibshirani (1996) [6] to be the Lasso problem (least absolute shrinkage and selection operator):

    LS_τ :   minimize_x  ||Ax - b||_2^2   subject to  ||x||_1 ≤ τ,

where smaller values of the tuning parameter τ tend to shrink more components of x to be exactly zero. Models with m > n were probably envisaged at the time, although the Lasso problem is well defined for any m and n if 0 ≤ τ < ∞.

For signal analysis problems in which A was likely to be a fast operator, Chen, Donoho, and Saunders (1998) [2, 3] defined the parameter-free Basis Pursuit problem

    BP :     minimize_x  ||x||_1   subject to  Ax = b,

and also the Basis Pursuit Denoising problem (BPDN), which is a convex quadratic optimization problem with tuning parameter λ:

    QP_λ :   minimize_x  ||Ax - b||_2^2 + λ ||x||_1   (with no constraints).

Models with m ≪ n were envisaged for BP and BPDN, although problem QP_λ is well defined for any m and n if 0 < λ < ∞.

Finally, van den Berg and Friedlander [7, 5] regard the following problem as a more practical formulation of BPDN:

    BP_σ :   minimize_x  ||x||_1   subject to  ||Ax - b||_2^2 ≤ σ^2,

because σ is more likely to be known than λ, and σ = 0 gives the BP problem naturally (whereas QP_λ needs λ → 0, where it is not well defined unless A has full column rank). Problem BP_σ is well defined for any m and n and for all σ ≥ 0.
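As a purely illustrative aside (a sketch, not part of the project files), problem BP can be posed as a linear program by splitting x = v - w with v, w ≥ 0, since ||x||_1 = e^T v + e^T w at an optimal split. The snippet below assumes the Optimization Toolbox function linprog is available.

  % Illustrative sketch only: posing BP as a linear program.
  % min e'v + e'w   s.t.  [A -A][v; w] = b,  v, w >= 0.
  m = 10;  n = 50;
  A  = randn(m,n);
  x0 = zeros(n,1);  p = randperm(n);  x0(p(1:5)) = randn(5,1);  % planted sparse x0
  b  = A*x0;                               % consistent right-hand side
  f   = ones(2*n,1);                       % objective e'v + e'w
  Aeq = [A, -A];   beq = b;                % equality constraints A(v - w) = b
  lb  = zeros(2*n,1);                      % v, w >= 0
  vw  = linprog(f,[],[],Aeq,beq,lb,[]);    % requires the Optimization Toolbox
  x   = vw(1:n) - vw(n+1:end);             % recover x = v - w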

At the time Lasso was proposed, good solvers were not known for problem LS_τ. The BPDN solvers in [2, 3] could handle problem QP_λ with a given value of λ, allowing A to be a large fast operator (typically an over-complete dictionary composed of wavelets, chirplets, warplets, curvelets, etc.). Efron, Hastie, Johnstone, and Tibshirani (2004) [4] later developed the LARS algorithm, a version of which can solve problem QP_λ for all 0 < λ < ∞. In principle, LARS can be used to solve LS_τ and BP_σ with any specific τ or σ. Later, van den Berg and Friedlander (2008) [7, 5] developed algorithm SPGL1 to solve LS_τ with a given τ (allowing A to be a fast operator). They include a root-finding algorithm to solve problem BP_σ with a specified σ ≥ 0 (by solving a sequence of Lasso problems LS_τ). They further generalized SPGL1 to handle complex data, group sparsity, and multiple measurement vectors (MMV); see [5].

Although SPGL1 is a highly versatile solver, it should be of interest to develop a conventional active-set solver for the Lasso problem, using established algorithms for the classical problem of nonnegative least squares (NNLS). Our project is a step in this direction.

Smooth Formulations

Note that the Lasso problem can be solved as the linearly constrained least-squares problem

    lasso_τ :    minimize_{v,w}  1/2 ||Av - Aw - b||_2^2   subject to  e^T v + e^T w ≤ τ,   v, w ≥ 0,

where e is a vector of 1s and x = v - w. For simplicity, we first consider the nonnegative Lasso problem

    nnlasso_τ :  minimize_x  1/2 ||Ax - b||_2^2   subject to  e^T x ≤ τ,   x ≥ 0.

Keep in mind that an algorithm for solving nnlasso_τ can easily be used to solve lasso_τ (a sketch of this reduction appears after the Project Resources paragraph below).

The Lasso problem is not very interesting unless the constraint ||x||_1 ≤ τ is binding. Thus our strategy for nnlasso_τ is to assume that e^T x = τ. We can then apply the reduced-gradient method (an active-set method). There will be a single basic variable, and the remaining variables will be superbasic or nonbasic. If the basic variable is chosen fortuitously, it will never be knocked out of the basis (it will stay strictly positive). Also, if τ is small enough there will not be many superbasic variables. Most variables will remain nonbasic at zero.

Project Resources

We will use Matlab and/or Tomlab to solve nnlasso_τ directly using algorithms in the optim toolbox or elsewhere. The files mentioned below can be downloaded from the course website.
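The reduction mentioned above can be sketched as follows (illustrative only, not prescribed by the handout): apply any nnlasso_τ solver to the stacked matrix [A -A], since z = [v; w] ≥ 0 with e^T z ≤ τ corresponds to x = v - w with ||x||_1 ≤ τ. The solver string is just one of the options listed in nnlasso.m below.

  % Sketch of the lasso_tau -> nnlasso_tau reduction.
  [A,b,x1norm] = lassoproblem(10,50,5,1);    % type=1: normal Lasso data
  tau = 0.5*x1norm;
  AA  = [A, -A];                             % columns for v and for w
  [z,inform] = nnlasso(AA,b,tau,'matlab/lsqnonneg');   % any nnlasso solver
  n = size(A,2);
  x = z(1:n) - z(n+1:end);                   % recover x = v - w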

Project Tasks

 1. File lassoproblem.m generates some typical data (A, b). Use it to generate an example nnlasso_τ problem (type=0) and a sensible value for τ. In general we trust that the norms of the columns of A will be roughly equal. Since A is explicit and dense, we can compute these norms reasonably efficiently. Find a vectorized way to do this (no for loops!); one possible approach is sketched after this list. Report the largest and smallest norm.

 2. File spgl1problem.m generates slightly different data (A, b). Do the same for the problem suggested in the help info.

 3. File nnlasso.m knows about some of the solvers available in Matlab and Tomlab. It also knows how to start Tomlab on the campus Linux cluster. Give a brief description of the solvers lsqlin and lssol. Are they suitable for problem nnlasso_τ?

 4. Set the Matlab and Tomlab paths inside nnlasso.m and solve the problems in Q1 and Q2 using matlab/lsqlin, tomlab/lsqlin, and tomlab/lssol. Report your results and any differences that you notice.

 5. In order to develop our new solver for nnlasso_τ, we need a good choice for the basic variable in the reduced-gradient algorithm. Let's suppose that exactly one variable is positive. By studying r = Ax - b for each of the n possible 1-variable solutions, deduce a reasonable method for choosing a good basic variable. (This method is used in nnlasso.m. We need to justify it.)

 6. Suppose the chosen variable is x_B and the remaining variables are x_S. We can use the constraint e^T x = τ to eliminate the basic variable. If we ignore the implicit constraint x_B ≥ 0, derive a problem that x_S should solve. (It is an NNLS problem. We need to describe it algebraically.)

 7. [x,inform] = nnlasso(A,b,tau,'matlab/lsqnonneg');  or  [x,inform] = nnlasso(A,b,tau,'tomlab/lsqnonneg');  solves the nnlasso_τ problem. Run either or both of these and report your findings.

 8. nnlasso warns us if x_B has gone negative. Suggest a strategy for choosing a different basic variable from the solution obtained so far. (Bear in mind that a good NNLS solver would be able to warm start from the active set that you construct.)

 9. Michael Friedlander's solver BCLS [1] is designed for bound-constrained least-squares problems. Study the BCLS website and describe the features that would make BCLS a good solver for implementing nnlasso_τ.

10. Given our solver for nnlasso_τ, use it to construct a solver for the original problem lasso_τ. Test it using the problem generators with type=1.
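The vectorized column-norm computation mentioned in Task 1 could look like this (a sketch; other vectorized approaches are equally acceptable):

  % Column 2-norms of a dense explicit A (from lassoproblem), no for loop.
  colnorms = sqrt(sum(A.^2, 1));        % 1-by-n vector of ||A(:,j)||_2
  fprintf(' largest  column norm = %9.3e\n', max(colnorms))
  fprintf(' smallest column norm = %9.3e\n', min(colnorms))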

function [A,b,x1norm] = lassoproblem(m,n,k,type)

% [A,b,x1norm] = lassoproblem(m,n,k,type);
% generates data for two types of Lasso problem:
%   type=0:  min 1/2 ||A*x - b||_2^2   st  sum(x)    <= tau,  x >= 0;
%   type=1:  min 1/2 ||A*x - b||_2^2   st  norm(x,1) <= tau,
% where A is size m x n and there would be k nonzeros in x if tau = infinity.
% We assume m>0, n>0, 1<=k<=n.  It doesn't matter if m<=n or m>=n.
%
% Example:
%    matlab                                  % get into Matlab
%    cd <projectlasso>                       % your own directory
%    m = 10;  n = 50;  k = 5;                % example dimensions
%    [A,b,x1norm] = lassoproblem(m,n,k,0);   % 0 => nonneg Lasso
% or [A,b,x1norm] = lassoproblem(m,n,k,1);   % 1 => normal Lasso
%    tau = 0.5*x1norm;                       % smaller than x1norm
%
% 19 May 2011: First version for class project.
%              Michael Saunders, MS&E318/CME338 instructor, Stanford University.

  rand ('state',0);                 % Initialize the good old way
  randn('state',0);                 % (easy to remember)
  p  = randperm(n);  p = p(1:k);    % Position of nonzeros in x
  x  = zeros(n,1);                  % Generate sparse solution x
  xk = randn(k,1);

  switch type
    case 0
      x(p) = abs(xk);               % x >= 0
    case 1
      x(p) = xk;                    % x positive or negative
    otherwise
      error('type must be 0 or 1')
  end

  A      = randn(m,n);              % Gaussian m-by-n ensemble
  b      = A*x;                     % The rhs vector
  x1norm = norm(xk,1);
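A quick sanity check on this generator (illustrative, not required by the handout): with type=0 data and τ equal to the planted solution's 1-norm, the planted nonnegative x is feasible and gives a zero residual, so a correct run of nnlasso.m (listed below) should return a residual near zero.

  % Sanity check: tau = x1norm admits the planted solution exactly.
  m = 10;  n = 50;  k = 5;
  [A,b,x1norm] = lassoproblem(m,n,k,0);
  [x,inform]   = nnlasso(A,b,x1norm,'matlab/lsqnonneg');
  fprintf(' ||Ax - b||_2 = %9.2e  (should be essentially zero)\n', norm(A*x - b))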

function [x,inform] = nnlasso(A,b,tau,solver)

% [x,inform] = nnlasso(A,b,tau,solver);
% solves the nonnegative Lasso problem
%    min 1/2 ||A*x - b||_2^2   s.t.  sum(x) <= tau,  x >= 0,
% where tau > 0 and "solver" is a string that specifies the solver to be used.
%
% Examples:
%    [x,inform] = nnlasso(A,b,tau,'matlab/lsqlin');
% or [x,inform] = nnlasso(A,b,tau,'tomlab/lsqlin');
% or [x,inform] = nnlasso(A,b,tau,'tomlab/lssol');
% or [x,inform] = nnlasso(A,b,tau,'matlab/lsqnonneg');
% or [x,inform] = nnlasso(A,b,tau,'tomlab/lsqnonneg');
%
% The first 3 examples solve the nonnegative Lasso problem directly.
% The last 2 examples convert it to a nonnegative least-squares problem.
% Before running nnlasso, set matlabpath and tomlabpath inside nnlasso.m.
%
% 22 May 2011: First version for class project.
%              Michael Saunders, MS&E318/CME338 instructor, Stanford University.

% Select 2 of the next 4 lines:
  matlabpath = '/afs/ir/software/matlab-2010b';          % on Linux cluster
  tomlabpath = '/afs/ir/software/matlab-2010b/tomlab';
  matlabpath = '/Applications/MATLAB_R2009b.app';        % on Michael's iMac
  tomlabpath = '/Applications/tomlab';

% Save wd, A, b, and size of problem
  wd = cd;
  AA = A;   bb = b;
  [m,n] = size(A);

  lab = solver(1:6);        % matlab or tomlab
  alg = solver(8:end);      % lsqlin, lssol, or lsqnonneg

% Start TOMLAB if necessary
  if lab=='tomlab' & ~exist('tomlab')
    cd(tomlabpath);  startup
  end

% Set path to solver
  switch solver
    case 'matlab/lsqlin'   ;  cd([matlabpath '/toolbox/optim/optim']);
    case 'tomlab/lsqlin'   ;  cd([tomlabpath '/optim']);
    case 'tomlab/lssol'    ;  cd([tomlabpath '/mex']);
    case 'matlab/lsqnonneg';  cd([matlabpath '/toolbox/matlab/matfun']);
    case 'tomlab/lsqnonneg';  cd([tomlabpath '/optim']);
    otherwise                 error('Unknown solver')
  end

% Setup inputs for the specified algorithm, then solve the problem
  switch alg
    case 'lsqlin'           % Solve nonnegative Lasso directly
      C   = A;            d   = b;
      A   = ones(1,n);    b   = tau;
      Aeq = [];           beq = [];
      bl  = zeros(n,1);   bu  = inf(n,1);
      x0  = zeros(n,1);
      [x,resnorm,residual,exitflag,output,lambda] = lsqlin(C,d,A,b,Aeq,beq,bl,bu,x0);
      inform = exitflag;
      output

    case 'lssol'            % Solve nonnegative Lasso directly
      H  = A;
      A  = ones(1,n);
      bl = [zeros(n,1); 0  ];
      bu = [ inf(n,1) ; tau];
      c  = [];              % zeros(n,1)
      x  = zeros(n,1);
      optpar = [];
      [x,inform,istate,clamda,iter,fObj,r,kx] = lssol(A,bl,bu,c,x,optpar,H,b);

    case 'lsqnonneg'        % Reduce the problem to nonnegative least squares
                            % by choosing a basic variable (jB) and eliminating
                            % it via  x(jB) = tau - sum(x(j), j~=jB)
                            % (assuming the linear constraint is binding)
      [rB,jB] = max(A'*b);
      aB = A(:,jB);
      C  = A;   C(:,jB) = [];
      C  = C - aB*ones(1,n-1);
      d  = b - aB*tau;
      x0 = zeros(n-1,1);
      [xS,resnorm,residual,exitflag] = lsqnonneg(C,d);
      inform = exitflag;
      xB = tau - sum(xS);   % Evaluate the basic variable
      jS = 1:n;             % Expand xB,xS into full x
      jS(jB) = [];
      x     = zeros(n,1);
      x(jB) = xB;
      x(jS) = xS;
      fprintf('\n The basic variable is x')
      fprintf('\n %7i %15.7e', jB,xB)
      if xB <= -1e-12
        fprintf('\n WARNING: it has the wrong sign')
        inform = -1;
      end

    otherwise
      error('Unknown algorithm')
  end
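% Note (derivation sketch, consistent with how C and d are formed above):
% with aB = A(:,jB), A_S the remaining columns, and x_S the remaining
% variables, the binding constraint gives  x_B = tau - e'*x_S,  so
%     A*x - b = A_S*x_S + aB*x_B - b
%             = A_S*x_S + aB*(tau - e'*x_S) - b
%             = (A_S - aB*e')*x_S - (b - aB*tau)
%             = C*x_S - d.
% Ignoring the implicit bound x_B >= 0, the subproblem handed to lsqnonneg
% is therefore the NNLS problem  min_{x_S >= 0} 1/2 ||C*x_S - d||_2^2.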

% Restore wd and output a few things about the solution
  cd(wd);
  x1norm = norm(x,1);
  r2norm = norm(bb - AA*x);
  fprintf('\n')
  fprintf('\n Solution obtained by %s', solver)
  fprintf('\n ||x||_1 = %13.7e', x1norm)
  fprintf('\n || r||_2 = %13.7e', r2norm)
  fprintf('\n\n Nonzero x')
  K = find(x > 1e-12)';
  for j = K
    fprintf('\n %7i %15.7e', j,x(j))
  end
  fprintf('\n')


References

[1] BCLS: solver for bound-constrained least-squares. mpf/bcls.
[2] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33-61, 1998.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129-159, 2001. SIGEST article.
[4] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Ann. Statist., 32(2):407-499, 2004.
[5] SPGL1: solver for large-scale sparse reconstruction.
[6] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267-288, 1996.
[7] E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890-912, 2008.
