arxiv: v1 [math.st] 22 Feb 2018


Bayesian Lasso: Concentration and MCMC Diagnosis

arXiv:18.857v1 [math.ST] Feb 18

D. Ounaissi (1), N. Rahmania (1)

Laboratoire LAMA, UFR de Mathématiques, Université de Paris 12, France.
Laboratoire Paul Painlevé, UFR de Mathématiques, Université de Lille, France.

Email addresses: daoud.ounaissi@u-pec.fr (D. Ounaissi), nadji.rahmania@univ-lille1.fr (N. Rahmania)
(1) Corresponding author
Preprint submitted to Elsevier November 8, 18

Abstract

Using the posterior distribution of the Bayesian LASSO we construct a semi-norm on the parameter space. We show that the partition function depends on the ratio of the $\ell_1$ and $\ell_2$ norms and present three regimes. We derive the concentration of the Bayesian LASSO and present an MCMC convergence diagnosis.

Keywords: LASSO, Bayes, MCMC, log-concave, geometry, incomplete Gamma function

1. Introduction

Let $p \ge n$ be two positive integers, $y \in \mathbb{R}^n$, and let $A$ be an $n \times p$ matrix with real entries. The Bayesian LASSO

$$c(x) = \frac{1}{Z} \exp\Big(-\frac{\|Ax - y\|_2^2}{2} - \|x\|_1\Big) \tag{1}$$

is a typical posterior distribution used in the linear regression

$$y = Ax + w.$$

Here

$$Z = \int_{\mathbb{R}^p} \exp\Big(-\frac{\|Ax - y\|_2^2}{2} - \|x\|_1\Big)\,dx \tag{2}$$

is the partition function, and $\|\cdot\|_2$ and $\|\cdot\|_1$ are respectively the Euclidean and the $\ell_1$ norms. The vector $y \in \mathbb{R}^n$ contains the observations, $x \in \mathbb{R}^p$ is the

unknown signal to recover, $w \in \mathbb{R}^n$ is standard Gaussian noise, and $A$ is a known matrix which maps the signal domain $\mathbb{R}^p$ into the observation domain $\mathbb{R}^n$. If we suppose that $x$ is drawn from the Laplace distribution, i.e. the distribution proportional to

$$\exp(-\|x\|_1), \tag{3}$$

then the posterior of $x$ given $y$ is the distribution $c$ of (1). The mode

$$\arg\min\Big\{\frac{\|Ax - y\|_2^2}{2} + \|x\|_1 : x \in \mathbb{R}^p\Big\} \tag{4}$$

of $c$ was first introduced in [18] and called LASSO. It is also called the Basis Pursuit De-Noising method [4]. In this work we use the term LASSO throughout. In general LASSO is not a singleton, i.e. the mode of the distribution $c$ is not unique. In this case LASSO is a set, and we denote by lasso any element of this set. A large number of theoretical results have been provided for LASSO; see [5], [6], [9], [12], [15] and the references therein. The most popular algorithms to find LASSO are the LARS algorithm [8] and the ISTA and FISTA algorithms, see e.g. [2] and the review article [14].

The aim of this work is to study the geometry of the Bayesian LASSO and to derive an MCMC convergence diagnosis.

2. Polar integration

Using polar coordinates $s = x/\|x\| \in S$, $r = \|x\|$, the partition function (2) becomes

$$Z_p = \int_S Z_p(s)\,ds, \tag{5}$$

where $\|\cdot\|$ denotes one of the $\ell_2$ or $\ell_1$ norms in $\mathbb{R}^p$, $ds$ denotes the surface measure on the unit sphere $S$ of the norm $\|\cdot\|$, and

$$Z_p(s) = \int_0^{+\infty} \exp\{-f(rs)\}\, r^{p-1}\,dr, \tag{6}$$

where

$$f(x) := \frac{\|Ax - y\|_2^2}{2} + \|x\|_1, \quad x \in \mathbb{R}^p.$$

We express the partition function (6) using the parabolic cylinder function. We also give a concentration inequality and a geometric interpretation of the partition function $Z_p$.
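To make the mode (4) concrete, here is a minimal sketch of ISTA (proximal gradient with soft thresholding) for the objective $\|Ax - y\|_2^2/2 + \|x\|_1$. It is an illustration only, not the authors' implementation: the function names, the dense list-of-rows matrix format, the iteration count, and the use of the squared Frobenius norm as a conservative Lipschitz bound are all assumptions made here.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(A, y, n_iter=2000):
    """Minimise ||A x - y||_2^2 / 2 + ||x||_1 by ISTA.

    A is a list of n rows of length p, y a list of length n.
    The step is 1/L, where L is the squared Frobenius norm of A, an upper
    bound on the Lipschitz constant of the gradient (assumed, conservative).
    """
    n, p = len(A), len(A[0])
    L = sum(a * a for row in A for a in row) or 1.0
    x = [0.0] * p
    for _ in range(n_iter):
        # gradient of the quadratic part: A^T (A x - y)
        residual = [sum(A[i][j] * x[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(A[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # gradient step followed by the l1 proximal step
        x = [soft_threshold(x[j] - grad[j] / L, 1.0 / L) for j in range(p)]
    return x
```

For instance, with the hypothetical data `A = [[2.0, 1.0]]` and `y = [2.0]`, the iterates settle on a point whose second coordinate is zero, which illustrates the sparsity induced by the $\ell_1$ term.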

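3. Parabolic cylinder function and partition function

We extend the function $s \in S \mapsto Z_p(s)$ to

$$x \in \mathbb{R}^p \mapsto Z_p(x) = \int_0^{+\infty} \exp\{-f(rx)\}\, r^{p-1}\,dr. \tag{7}$$

This extension is homogeneous of order $-p$. If $Ax = 0$, then $f(rx) = \frac{\|y\|_2^2}{2} + r\|x\|_1$, and moreover if $x \neq 0$, then

$$Z_p(x) = (p-1)!\,\|x\|_1^{-p} \exp\Big(-\frac{\|y\|_2^2}{2}\Big).$$

If $Ax \neq 0$, then we express $Z_p(x)$ using the parabolic cylinder function. We recall that for $b \in \mathbb{R}$ the parabolic cylinder function satisfies

$$D_b(z) = z^b \exp\Big(-\frac{z^2}{4}\Big)\,[1 + O(z^{-2})] \tag{8}$$

as $z \to +\infty$ [16]. We also recall the integral representation of Erdélyi [7] for the parabolic cylinder function,

$$\exp\Big(\frac{z^2}{4}\Big)\,\Gamma(\nu)\,D_{-\nu}(z) = \int_0^{+\infty} \exp\Big(-\frac{r^2}{2} - zr\Big)\, r^{\nu-1}\,dr, \quad \nu > 0,$$

where $\Gamma(\nu) = \int_0^{+\infty} \exp(-t)\,t^{\nu-1}\,dt$ is the Gamma function.

Proposition 3.1. The variable

$$\omega_{lasso} := \frac{\|x\|_1 - \langle Ax, y\rangle}{\|Ax\|_2} \tag{9}$$

will play an important role. It depends only on $s = x/\|x\|_1 \in S_{p-1,1}$, and the function $s \in S_{p-1,1} \mapsto \omega_{lasso}(s)$ is bounded below by $\min\big\{\frac{1 - \langle As, y\rangle}{\|As\|_2} : s \in S_{p-1,1}\big\}$.

Now we can state the following result.

Proposition 3.2. For $Ax \neq 0$ we have

$$Z_p(x) = (p-1)!\,\exp\Big(-\frac{\|y\|_2^2}{2}\Big)\,\|Ax\|_2^{-p}\,\exp\Big(\frac{\omega_{lasso}^2}{4}\Big)\,D_{-p}(\omega_{lasso}).$$

If $Ax \to 0$, then $\omega_{lasso} \to +\infty$ and

$$Z_p(x) = (p-1)!\,\exp\Big(-\frac{\|y\|_2^2}{2}\Big)\,\|x\|_1^{-p}\,[1 + O(\omega_{lasso}^{-2})].$$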
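The two elementary properties of the radial integral (7) stated above, its closed form on the null space of $A$ and its homogeneity of order $-p$, can be checked numerically. The sketch below uses a truncated composite Simpson rule; the truncation radius `r_max`, the number of panels `m`, and the function names are choices made here for illustration and are not taken from the paper.

```python
import math


def f(A, y, x):
    """f(x) = ||A x - y||_2^2 / 2 + ||x||_1."""
    res = [sum(a * xj for a, xj in zip(row, x)) - yi for row, yi in zip(A, y)]
    return 0.5 * sum(r * r for r in res) + sum(abs(xj) for xj in x)


def Zp(A, y, x, r_max=60.0, m=20000):
    """Composite Simpson approximation of int_0^inf exp(-f(r x)) r^(p-1) dr.

    The integral is truncated at r_max (assumed large enough for the
    exponential tail to be negligible); m must be even.
    """
    p = len(x)
    h = r_max / m

    def integrand(r):
        return math.exp(-f(A, y, [r * xj for xj in x])) * r ** (p - 1)

    total = integrand(0.0) + integrand(r_max)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3
```

With a vector `x` in the null space of `A`, `Zp(A, y, x)` can be compared with $(p-1)!\,\|x\|_1^{-p}\exp(-\|y\|_2^2/2)$, and for any `x` the value `Zp(A, y, [t * v for v in x])` can be compared with `t ** -p * Zp(A, y, x)`.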

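Corollary 3.3. If $y = 0$ then $\omega_{lasso} = \frac{1}{\|As\|_2}$ is bounded below by $\frac{1}{\lambda_{1,2}}$, where $\lambda_{1,2} = \max(\|As\|_2 : s \in S_{p-1,1})$ is the norm of the operator $A : (\mathbb{R}^p, \|\cdot\|_1) \to (\mathbb{R}^n, \|\cdot\|_2)$. The partition function

$$Z_p(s) = (p-1)!\,\omega_{lasso}^p\,\exp\Big(\frac{\omega_{lasso}^2}{4}\Big)\,D_{-p}(\omega_{lasso})$$

is convex and decreasing as a function of $\|As\|_2^2$.

Proof 3.4. It suffices to remark that

$$Z_p(s) = \int_0^{+\infty} \exp\Big\{-\frac{\|As\|_2^2\, r^2}{2} - r\Big\}\, r^{p-1}\,dr.$$

4. Geometric interpretation of the partition function

First we represent $f(rx)$ for $Ax \neq 0$ in the form

$$f(rx) = \frac{\|y\|_2^2}{2} - \frac{\omega_{lasso}^2}{2} + \frac{(r\|Ax\|_2 + \omega_{lasso})^2}{2}. \tag{10}$$

The function $\exp\{-f(x)\}$, $x \in \mathbb{R}^p$, is log-concave and integrable on $\mathbb{R}^p$. Observe that $Z_p^{-1/p}$ is a norm on the null space $\mathrm{Ker}(A)$ of $A$. A general result [1] tells us that

$$x \in \mathbb{R}^p \mapsto Z_p^{-1/p}(x) := \|x\|_c$$

is a quasi-norm on $\mathbb{R}^p$. The unit ball of $\|\cdot\|_c$ is defined by

$$\begin{aligned}
B(A, y) &:= \{x \in \mathbb{R}^p : \|x\|_c \le 1\} \\
&= \{x \in \mathbb{R}^p : Z_p(x) \ge 1\} \\
&= \{x = rs \in \mathbb{R}^p : Z_p(s) \ge r^p\} \\
&= \Big\{x = rs \in \mathbb{R}^p : (p-1)!\,\exp\Big(-\frac{\|y\|_2^2}{2}\Big)\,\|As\|_2^{-p}\,\exp\Big(\frac{\omega_{lasso}^2}{4}\Big)\,D_{-p}(\omega_{lasso}) \ge r^p\Big\}.
\end{aligned}$$

Its contour is equal to

$$\begin{aligned}
C(A, y) &:= \{x \in \mathbb{R}^p : \|x\|_c = 1\} \\
&= \{x \in \mathbb{R}^p : Z_p(x) = 1\} \\
&= \{x = rs \in \mathbb{R}^p : Z_p(s) = r^p\} \\
&= \Big\{x = rs \in \mathbb{R}^p : (p-1)!\,\exp\Big(-\frac{\|y\|_2^2}{2}\Big)\,\|As\|_2^{-p}\,\exp\Big(\frac{\omega_{lasso}^2}{4}\Big)\,D_{-p}(\omega_{lasso}) = r^p\Big\}.
\end{aligned}$$

We summarize our results in the following proposition.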
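For readers who want the intermediate steps, the representation (10) and the formula of Proposition 3.2 follow from completing the square. The worked derivation below is a sketch written for this purpose; the substitution variable $t$ is introduced here and does not appear in the original text.

```latex
\begin{aligned}
f(rx) &= \tfrac12\|rAx - y\|_2^2 + r\|x\|_1 \\
      &= \tfrac12\|y\|_2^2 + \tfrac12 r^2\|Ax\|_2^2
         + r\big(\|x\|_1 - \langle Ax, y\rangle\big) \\
      &= \tfrac12\|y\|_2^2 + \tfrac12 t^2 + \omega_{lasso}\, t,
         \qquad t := r\|Ax\|_2, \\
      &= \tfrac12\|y\|_2^2 - \tfrac12\omega_{lasso}^2
         + \tfrac12\,(t + \omega_{lasso})^2 . \\[4pt]
Z_p(x) &= \int_0^{+\infty} e^{-f(rx)}\, r^{p-1}\,dr
        = e^{-\|y\|_2^2/2}\,\|Ax\|_2^{-p}
          \int_0^{+\infty} e^{-t^2/2 - \omega_{lasso} t}\, t^{p-1}\,dt \\
       &= (p-1)!\; e^{-\|y\|_2^2/2}\,\|Ax\|_2^{-p}\,
          e^{\omega_{lasso}^2/4}\, D_{-p}(\omega_{lasso}),
\end{aligned}
```

where the last equality is Erdélyi's integral representation with $\nu = p$ and $z = \omega_{lasso}$, and $\Gamma(p) = (p-1)!$.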

Proposition 4.1.
1) For each $s \in S_{p-1,1}$, the longest segment $[0, r]s$ contained in $B(A, y)$ is obtained for $r = r_{\max}(s)$, the solution of the equation

$$r^p = (p-1)!\,\exp\Big(-\frac{\|y\|_2^2}{2}\Big)\,\|As\|_2^{-p}\,\exp\Big(\frac{\omega_{lasso}^2}{4}\Big)\,D_{-p}(\omega_{lasso}).$$

2) The ball $B(A, y) = \bigcup_{s \in S_{p-1,1}} [0, r_{\max}(s)]s$, and its contour is equal to

$$C(A, y) = \{r_{\max}(s)s : s \in S_{p-1,1}\}.$$

3) The volume of $B(A, y)$ is

$$\int_{S_{p-1,1}} \frac{r_{\max}(s)^p}{p}\,ds = \frac{Z_p}{p}.$$

5. Necessary and sufficient condition to have lasso equal zero

Now we can give the necessary and sufficient condition to have lasso $= \{0\}$.

Proposition 5.1. The following assertions are equivalent.
1) $0 =$ lasso.
2) $\omega_{lasso}(s) \ge 0$ for all $s \in S_{p-1,1}$.
3) $\|A^{\top} y\|_\infty \le 1$.

6. Concentration around the lasso

6.1. The case lasso null

The polar coordinate formula tells us that we can draw a vector $x$ from $c(x)\,dx$ by drawing its angle $s$ uniformly, and then simulating its distance $r$ to the origin from

$$c(r, s)\,dr = \frac{1}{Z_p(s)} \exp\{-f(rs)\}\, r^{p-1}\,dr. \tag{11}$$
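The equivalence between assertions 2) and 3) of Proposition 5.1 can be illustrated numerically in the plane. In the sketch below, which assumes $p = 2$, the sign of $\omega_{lasso}(s)$ is read off its numerator $\|s\|_1 - \langle As, y\rangle$ on a grid of the $\ell_1$ unit circle and compared with the test $\|A^{\top}y\|_\infty \le 1$. The function names and the grid size are hypothetical.

```python
def lasso_is_zero(A, y):
    """Assertion 3) of Proposition 5.1: ||A^T y||_inf <= 1."""
    n, p = len(A), len(A[0])
    return max(abs(sum(A[i][j] * y[i] for i in range(n))) for j in range(p)) <= 1.0


def min_omega_numerator(A, y, m=400):
    """Minimum over a grid of the l1 unit circle (p = 2 only) of
    ||s||_1 - <A s, y>, which has the same sign as omega_lasso(s)."""
    best = float("inf")
    for k in range(4 * m):
        t = (k % m) / m
        # walk the four edges of the l1 unit circle in turn
        s = [(1 - t, t), (-t, 1 - t), (t - 1, -t), (t, t - 1)][k // m]
        As = [row[0] * s[0] + row[1] * s[1] for row in A]
        best = min(best, 1.0 - sum(a * yi for a, yi in zip(As, y)))
    return best
```

Because the numerator is linear in $s$ along each edge, its minimum over the circle is attained at a vertex, so the grid (which contains the four vertices) finds it exactly up to rounding.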

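Now let us estimate, for $r > 0$, the probability

$$P(\|x\| > r) = \int_S \int_r^{+\infty} c(r', s)\,dr'\,\frac{ds}{|S|},$$

where $|S|$ denotes the Lebesgue measure of $S$. We introduce for each pair $a, b \in \mathbb{R}$ the functions

$$g_{a,b}(r) := \frac{(ar + b)^2}{2}, \qquad g_{a,b,p}(r) := g_{a,b}(r) - (p-1)\ln(r), \quad r > 0. \tag{12}$$

In the following

$$a = \|As\|_2, \quad b = \omega_{lasso}.$$

The function $r \mapsto g_{a,b}(r)$ is increasing (because $b := \omega_{lasso} \ge 0$). The function $r \mapsto g_{a,b,p}(r)$ is convex and attains its minimum at the point $r(s)$, the solution of the equation

$$\|As\|_2\,(r\|As\|_2 + \omega_{lasso}) = \frac{p-1}{r}.$$

The positive root is given by

$$r(s) = \frac{-\omega_{lasso} + \sqrt{\omega_{lasso}^2 + 4(p-1)}}{2\|As\|_2}. \tag{13}$$

On one hand

$$\int_0^{+\infty} \exp\{-g_{a,b,p}(r)\}\,dr \ge \exp\{-g_{a,b}(r(s))\} \int_0^{r(s)} r^{p-1}\,dr = \exp\{-g_{a,b,p}(r(s))\}\,\frac{r(s)}{p}.$$

On the other hand, by the convexity of $r \mapsto g_{a,b}(r)$, we have for all $r > 0$,

$$g_{a,b}(r) \ge g_{a,b}(r(s)) + \frac{(p-1)(r - r(s))}{r(s)},$$

because $\partial_r g_{a,b}(r(s)) = \frac{p-1}{r(s)}$. We deduce for $q > 0$,

$$\begin{aligned}
\int_{q r(s)}^{+\infty} \exp\{-g_{a,b,p}(r)\}\,dr
&\le \exp\{-g_{a,b}(r(s)) + p - 1\} \int_{q r(s)}^{+\infty} \exp\Big\{-\frac{p-1}{r(s)}\,r\Big\}\, r^{p-1}\,dr \\
&= \exp\{-g_{a,b}(r(s)) + p - 1\}\,\frac{r(s)^p}{(p-1)^p} \int_{q(p-1)}^{+\infty} \exp(-r)\, r^{p-1}\,dr \\
&= \exp\{-g_{a,b}(r(s)) + p - 1\}\,\frac{r(s)^p}{(p-1)^p}\,\Gamma(p, q(p-1)),
\end{aligned}$$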
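The root formula (13) is easy to check numerically. The sketch below evaluates it in an algebraically equivalent form, obtained by multiplying numerator and denominator by $\omega_{lasso} + \sqrt{\omega_{lasso}^2 + 4(p-1)}$, which avoids cancellation when $\omega_{lasso}$ is large. It takes $g_{a,b}(r) = (ar+b)^2/2$, and the function names are introduced here for illustration.

```python
import math


def r_mode(a, b, p):
    """Positive root of a (a r + b) = (p - 1) / r, i.e. formula (13) with
    a = ||As||_2 and b = omega_lasso, written as
    2 (p - 1) / (a (b + sqrt(b^2 + 4 (p - 1)))) to avoid cancellation."""
    return 2.0 * (p - 1) / (a * (b + math.sqrt(b * b + 4.0 * (p - 1))))


def g_p(a, b, p, r):
    """g_{a,b,p}(r) = (a r + b)^2 / 2 - (p - 1) ln r."""
    return 0.5 * (a * r + b) ** 2 - (p - 1) * math.log(r)
```

Evaluating `g_p` slightly to the left and to the right of `r_mode(a, b, p)` should give larger values, in line with the convexity argument above.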

where $\Gamma(\nu, x) = \int_x^{+\infty} \exp(-t)\,t^{\nu-1}\,dt$ is the upper incomplete gamma function. Finally we get the following result.

Proposition 6.1. We have for all $q > 0$,

$$P(\|x\| \ge q\,r(s)) \le \frac{p \exp(p-1)}{(p-1)^p}\,\Gamma(p, q(p-1)) := P(q, p). \tag{14}$$

Using the following estimate [?]

$$x^{a-1}\exp(-x) < \Gamma(a, x) < B\,x^{a-1}\exp(-x), \quad a > 1,\ B > 1,\ x > \frac{B}{B-1}(a-1),$$

we get for $q > 1$,

$$\Gamma(p, q(p-1)) \approx q^{p-1}(p-1)^{p-1}\exp(-q(p-1)).$$

Therefore the quantity

$$P(q, p) \approx \frac{p\,q^{p-1}}{p-1}\,\exp\{-(q-1)(p-1)\}.$$

In summary, if $x$ is drawn from the density $c$, then $\|x\| \le q\,r(\theta)$, where $\theta = x/\|x\|$, with probability at least $1 - P(q, p)$. In Figure 1 we plot, for $p = 2$, $n = 1$, $A = (1\ \ 1)$ and $y = 0$, the density $c(r, s)\,dr$ for a fixed value of $s$. We notice that the mode of $c(r, s)$, located at .6, is very close to the value $r(s) = .69$ given by (13) for the same fixed $s$.

6.2. The general case

We take a vector $l \in$ lasso and study the concentration of $c$ around $l$. The variable of interest is $u = x - l$. The law of $u$ has density

$$c(u + l)\,du = \frac{1}{Z_p} \exp\{-f(u + l)\}\,du.$$

The change of variables formula gives, for each norm $\|\cdot\|$,

$$c(u + l)\,du = \frac{1}{Z_p} \exp\{-f(r\theta + l)\}\, r^{p-1}\,dr\,d\theta, \quad r > 0,\ \theta \in S.$$
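The bound $P(q, p)$ of (14) is straightforward to evaluate when $p$ is an integer, because the upper incomplete gamma function then reduces to a finite sum, $\Gamma(p, x) = (p-1)!\,e^{-x}\sum_{k=0}^{p-1} x^k/k!$. The following sketch relies on that identity; the function names are introduced here for illustration.

```python
import math


def upper_gamma_int(p, x):
    """Upper incomplete gamma function Gamma(p, x) for a positive integer p,
    via the finite sum (p - 1)! exp(-x) sum_{k < p} x^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, p):
        term *= x / k
        total += term
    return math.factorial(p - 1) * math.exp(-x) * total


def tail_bound(q, p):
    """P(q, p) of (14): p e^(p-1) (p-1)^(-p) Gamma(p, q (p-1))."""
    return p * math.exp(p - 1) / (p - 1) ** p * upper_gamma_int(p, q * (p - 1))


def tail_bound_approx(q, p):
    """Closed-form approximation p q^(p-1) / (p-1) * exp(-(q-1)(p-1))."""
    return p * q ** (p - 1) / (p - 1) * math.exp(-(q - 1) * (p - 1))
```

Since the estimate quoted above gives $\Gamma(a, x) > x^{a-1}e^{-x}$, `tail_bound_approx(q, p)` always sits below `tail_bound(q, p)`, so the closed-form expression should be read as an order of magnitude rather than as an upper bound.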

Figure 1: $r \in [.1; 1] \mapsto c(r, s)$.

By definition, for any direction $s$, the convex function $r \mapsto f(rs + l)$ reaches its minimum at the point $r = 0$. Therefore $r \mapsto f(rs + l)$ is increasing. The function

$$f(rs + l, p) := f(rs + l) - (p-1)\ln(r), \quad r > 0, \tag{15}$$

is strictly convex. Its critical point $r_l(s)$ is the solution of the equation

$$\partial_r f(rs + l) = \frac{p-1}{r}.$$

By a proof similar to that of Proposition 6.1 we have the following result.

Proposition 6.2. If $x$ is drawn from the density $c$, and $s = \frac{x - l}{\|x - l\|}$, then for all $q > 0$,

$$P(\|x - l\| \ge q\,r_l(s)) \le \frac{p \exp(p-1)}{(p-1)^p}\,\Gamma(p, q(p-1)) := P(q, p). \tag{16}$$

7. Applications

7.1. The contour in the case $p = 2$, $n = 1$

Let $A := (a_1, a_2)$ be a matrix of order $1 \times 2$. Its null space is $\mathrm{Ker}(A) = \{(x_1, x_2) : a_1 x_1 + a_2 x_2 = 0\}$. We have that $B(a_1, a_2, y)$ contains $\mathrm{Ker}(A) \cap B_{2,1}$.

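This intersection is a symmetric segment denoted

$$[-(x_1(a_1, a_2), x_2(a_1, a_2)),\ (x_1(a_1, a_2), x_2(a_1, a_2))].$$

To determine the other points of the set $B(a_1, a_2, y)$, we directly calculate $Z_2(s)$. A simple calculation gives

$$Z_2(s) = \exp\Big(-\frac{\|y\|_2^2}{2} + \frac{\omega_{lasso}^2}{2}\Big) \int_0^{+\infty} \exp\Big\{-\frac{(\|As\|_2\, r + \omega_{lasso})^2}{2}\Big\}\, r\,dr,$$

and

$$\|As\|_2 \int_0^{+\infty} \exp\Big\{-\frac{(\|As\|_2\, r + \omega_{lasso})^2}{2}\Big\}\, r\,dr + \omega_{lasso} \int_0^{+\infty} \exp\Big\{-\frac{(\|As\|_2\, r + \omega_{lasso})^2}{2}\Big\}\,dr = \frac{1}{\|As\|_2}\exp\Big(-\frac{\omega_{lasso}^2}{2}\Big).$$

Finally we have the following proposition.

Proposition 7.1.
1) If $As \neq 0$, then

$$Z_2(s) = \exp\Big(-\frac{\|y\|_2^2}{2} + \frac{\omega_{lasso}^2}{2}\Big)\,\|As\|_2^{-2}\,\Big\{\exp\Big(-\frac{\omega_{lasso}^2}{2}\Big) - \omega_{lasso}\sqrt{2\pi}\,(1 - F(\omega_{lasso}))\Big\},$$

where $F$ is the distribution function of the standard normal law.
2) If $As \neq 0$ and $y = 0$, then

$$Z_2(s) = \omega_{lasso}^2 \exp\Big(\frac{\omega_{lasso}^2}{2}\Big)\Big\{\exp\Big(-\frac{\omega_{lasso}^2}{2}\Big) - \omega_{lasso}\sqrt{2\pi}\,(1 - F(\omega_{lasso}))\Big\}.$$

3) If $s \in S_{1,1}$, $As \neq 0$ and $y = 0$, then the function

$$b^2 \mapsto z_2(b^2) = \frac{1}{b^2}\exp\Big(\frac{1}{2b^2}\Big)\Big\{\exp\Big(-\frac{1}{2b^2}\Big) - \frac{1}{b}\sqrt{2\pi}\,\Big(1 - F\Big(\frac{1}{b}\Big)\Big)\Big\},$$

defined on $(0, \lambda_{1,2}^2]$, is convex and decreasing, where $\lambda_{1,2} = \max_{s \in S_{1,1}} \|As\|_2$.
4) We have for $s \in S_{1,1}$

$$Z_2(s) = z_2\Big(\frac{1}{\omega_{lasso}^2}\Big).$$

The ball $B(A, 0) = \{rs : Z_2(s) \ge r^2\}$ is contained in the unit disk $\|x\|_1 \le 1$ of the norm $\ell_1$. The contour is defined by the equation $Z_2(s) = r^2$.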
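For $p = 2$ and $y = 0$ the partition function reduces to $\int_0^{+\infty} \exp(-b^2r^2/2 - r)\,r\,dr$ with $b = \|As\|_2$, and completing the square gives the closed form $\omega^2\{1 - \omega\,e^{\omega^2/2}\sqrt{2\pi}\,(1 - F(\omega))\}$ with $\omega = 1/b$. The sketch below compares this closed form with a truncated Simpson quadrature. The function names, the truncation radius and the panel count are assumptions made for this check.

```python
import math


def normal_sf(x):
    """1 - F(x) for the standard normal distribution function F."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def z2_closed(b):
    """Closed form of int_0^inf exp(-b^2 r^2 / 2 - r) r dr with omega = 1 / b."""
    w = 1.0 / b
    tail = w * math.exp(0.5 * w * w) * math.sqrt(2.0 * math.pi) * normal_sf(w)
    return w * w * (1.0 - tail)


def z2_quad(b, r_max=60.0, m=60000):
    """Composite Simpson approximation of the same integral, truncated at
    r_max (assumed large enough); m must be even."""
    h = r_max / m

    def g(r):
        return math.exp(-0.5 * (b * r) ** 2 - r) * r

    total = g(0.0) + g(r_max)
    for k in range(1, m):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3
```

The values stay below 1 and decrease as `b` grows, which matches the statement that the ball $B(A, 0)$ sits inside the $\ell_1$ unit disk.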

The norm of the linear operator $A : (\mathbb{R}^2, \|\cdot\|_1) \to (\mathbb{R}, |\cdot|)$ is defined by

$$\lambda_{1,2} = \sup_{s : \|s\|_1 = 1} |As|.$$

The function $s \mapsto Z_2(s) = z_2(\lambda_{1,2}^2)$ is constant on

$$\Omega_{1,2} = \{s : \|s\|_1 = 1,\ |As| = \lambda_{1,2}\}.$$

If $A = (1, 1)$ then

$$\Omega_{1,2} = [(1, 0), (0, 1)] \cup [(-1, 0), (0, -1)].$$

If $A = (a_1, a_2)$ with $|a_1| < |a_2|$, then

$$\Omega_{1,2} = \{(0, \mathrm{sgn}(a_2)),\ (0, -\mathrm{sgn}(a_2))\}.$$

In both cases $\{z_2(\lambda_{1,2}^2)\}^{1/2}\,\Omega_{1,2}$ is part of the contour. The other points of the contour are deduced from the equation

$$z_2(b^2) = a^2, \quad b \in (0, \lambda_{1,2}).$$

Each pair $(a, b)$ generates four points of $B((a_1, a_2), 0)$ of the form $as$, where $|s_1| + |s_2| = 1$ and $|a_1 s_1 + a_2 s_2| = b$. We plot in Figure 2 the contour of $B(a_1, a_2, 0)$ for different choices of the matrix $(a_1, a_2)$. We notice that the area of $B((a_1, a_2), 0)$ is a decreasing function of the norm $\lambda_{1,2}$ of the matrix $A$.

Remark 7.2. Numerical evaluation of $z_2$ blows up for large values of $\omega_{lasso}$, that is, when $s$ is close to the null space of $A$. To eliminate this blow-up we need to estimate the tail of the Gaussian density. Using the Gordon estimate [10]

$$\frac{x}{x^2 + 1}\exp\Big(-\frac{x^2}{2}\Big) \le \sqrt{2\pi}\,(1 - F(x)) \le \frac{1}{x}\exp\Big(-\frac{x^2}{2}\Big), \quad x > 0,$$

we have the following approximation near $0$: 1 b 1 b z (b ) 1 b 1, 1 + b3 (17)
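Gordon's bounds on the Gaussian tail, and the numerical difficulty mentioned in Remark 7.2, can both be seen in a few lines. In the sketch below the scaled tail $\sqrt{2\pi}\,(1 - F(x))\,e^{x^2/2}$ is computed naively, so it overflows once $e^{x^2/2}$ exceeds the floating-point range, whereas the two bounds remain finite. The function names are hypothetical.

```python
import math


def gordon_bounds(x):
    """Lower and upper Gordon bounds on sqrt(2 pi) (1 - F(x)) exp(x^2 / 2), x > 0."""
    return x / (x * x + 1.0), 1.0 / x


def scaled_tail(x):
    """sqrt(2 pi) (1 - F(x)) exp(x^2 / 2) computed directly.

    math.exp raises OverflowError for large x, which is one way the
    numerical blow-up can show up in double precision."""
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))
    return math.sqrt(2.0 * math.pi) * survival * math.exp(0.5 * x * x)
```

For moderate `x` the direct value lies between the two bounds; for a large argument such as `x = 40.0` the direct evaluation fails, while `gordon_bounds(40.0)` still brackets the quantity tightly.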

Figure 2: Contours of $B(a_1, a_2, 0)$ for different matrices $A$, $p = 2$ and $n = 1$.

8. MCMC diagnosis

Here we take $p = 7$, $n = 4$, $A \sim B(\pm\frac{1}{\sqrt{n}})$ and for simplicity we consider $y = 0$. We sample from the distribution $c$ of (1) using the Hastings-Metropolis algorithm $(x^{(t)})$ and propose the test $\|x^{(t)}\| \le q\,r(\theta^{(t)})$ as a criterion for convergence. Here $\theta^{(t)} := \frac{x^{(t)}}{\|x^{(t)}\|}$. We recall that if $x$ is drawn from the target distribution $c$, then $\|x\| \le q\,r(\theta)$ with probability at least $1 - P(q, p)$. Table 1 gives the values of the probability $1 - P(q, p)$. Note that for $q \ge 2.5$ the criterion $\|x^{(t)}\| \le q\,r(\theta^{(t)})$ is satisfied with a large probability.

q            2       2.5     3       3.5     4       4.5     5
1 - P(q, p)  0.6672  0.9446  0.9924  0.9991  0.9999  1.0000  1.0000

Table 1: Values of the probability $1 - P(q, p)$ for $p = 7$.

8.1. Independent sampler (IS)

The proposal distribution is

$$Q(x_1, x_2) = p(x_2) = \frac{1}{2^p}\exp(-\|x_2\|_1), \quad x_1, x_2 \in \mathbb{R}^p.$$

The ratio

$$\frac{c(x)}{p(x)} \le \frac{2^p}{Z}, \quad x \in \mathbb{R}^p.$$

It is known that the MCMC chain $(x^{(t)})$ with target distribution $c$ and proposal distribution $p$ is uniformly ergodic [13]:

$$\sup_{A \in \mathcal{B}(\mathbb{R}^p)} \Big|P(x^{(t)} \in A \mid x^{(0)}) - \int_A c(x)\,dx\Big| \le \Big(1 - \frac{Z}{2^p}\Big)^t.$$

Here $Z \approx .14$ and then $(1 - \frac{Z}{2^p}) = .987$. Figure 3(a) shows the plots of $t \mapsto 5\,r(\theta^{(t)})$ and $t \mapsto \|x^{(t)}\|$.

8.2. Random-walk (RW) Metropolis algorithm

We do not know whether the target distribution $c$ satisfies the curvature condition in [17], Section 6. Here we propose to analyse the convergence of the random-walk Metropolis algorithm $(x^{(t)})$ using the criterion $\|x^{(t)}\| \le q\,r(\theta^{(t)})$. Figure 3(b) shows the plots of $t \mapsto 5\,r(\theta^{(t)})$ and $t \mapsto \|x^{(t)}\|$.

Figure 3 shows that, contrary to the independent sampler, the random-walk (RW) algorithm satisfies the criterion $\|x^{(t)}\| \le 5\,r(\theta)$ early. More precisely:
1) The independent sampler (IS) algorithm begins to satisfy the criterion $\|x^{(t)}\| \le 5\,r(\theta^{(t)})$ at iteration $t = 8 \times 10^5$.
2) The RW algorithm begins to satisfy the criterion $\|x^{(t)}\| \le 3.5\,r(\theta^{(t)})$ at iteration $t = 93965$, but the IS algorithm never satisfies the criterion $\|x^{(t)}\| \le 3.5\,r(\theta^{(t)})$.

We finally compare the IS and RW algorithms using the fact that $\int_{\mathbb{R}^p} x\,c(x)\,dx = 0$. The best algorithm will furnish the best approximation of the integral $\int_{\mathbb{R}^p} x\,c(x)\,dx$. Table 2 gives the estimators $\frac{1}{N}\sum_{t=1}^N x^{(t)}_{IS} \approx \int_{\mathbb{R}^p} x\,c(x)\,dx$ and $\frac{1}{N}\sum_{t=1}^N x^{(t)}_{RW} \approx \int_{\mathbb{R}^p} x\,c(x)\,dx$. It follows that $\big\|\frac{1}{N}\sum_{t=1}^N x^{(t)}_{IS}\big\| = .187$ and $\big\|\frac{1}{N}\sum_{t=1}^N x^{(t)}_{RW}\big\| = .41$. We conclude that the random-walk algorithm outperforms the independent sampler on both criteria.

            x_1    x_2    x_3    x_4    x_5    x_6    x_7
\hat{x}_IS  -.5    -.37   .16    .164   .5     .1     -.58
\hat{x}_RW  .5     -.19   -.     .1     -.5    .31    -.11

Table 2: IS and RW estimators using $N = 10^6$ iterations.

Figure 3: (a) Test of convergence of the MCMC algorithm with proposal distribution $p(x_2)$. (b) Test of convergence of the MCMC algorithm with $N(0, .5\,I_p)$ proposal distribution. $N = 10^6$ iterations.

9. Conclusion

We studied the geometry of the Bayesian LASSO using polar coordinates and calculated the partition function. We obtained a concentration inequality and derived an MCMC convergence diagnosis for the Hastings-Metropolis algorithm. We showed that the random-walk MCMC with variance .5 outperforms the independent sampler with Laplace proposal distribution.

References

[1] K. Ball, Logarithmically concave functions and sections of convex sets in R^n, Stud. Math. T. LXXXVIII (1988) 69-84.
[2] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci. 2 (1) (2009) 183-202.
[3] E.T. Copson, Asymptotic Expansions, Cambridge University Press, 1965.
[4] S. Chen, D.L. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1) (1998) 33-61.
[5] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (11) (2004) 1413-1457.
[6] A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis-Hastings and simulated annealing algorithms around LASSO estimator, Math. Comput. Simulation (2015).
[7] A. Erdélyi, Higher transcendental functions, California Institute of Technology, Bateman Manuscript Project, vol. 2, p. 119 (1953).
[8] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 32 (2) (2004) 407-499.
[9] G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, arXiv:131.5658v3 [math.ST] (2015).
[10] R.D. Gordon, Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument, Annals of Mathematical Statistics, vol. 12, 364-366 (1941).
[11] B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geom. Dedicata 112 (1) (2005) 169-182.
[12] M. Mendoza, A. Allegra, T.P. Coleman, Bayesian Lasso Posterior Sampling Via Parallelized Measure Transport, arXiv:181.16v1 [stat.CO] (2018).

[13] K. Mengersen, R.L. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, Ann. Statist. 24 (1) (1996) 101-121.
[14] N. Parikh, S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (3) (2013) 123-231.
[15] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. (2015) 1-16.
[16] N.M. Temme, Numerical and asymptotic aspects of parabolic cylinder functions, Journal of Computational and Applied Mathematics 121 (2000) 221-246.
[17] G.O. Roberts, R.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 (1) (1996) 95-110.
[18] R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B Stat. Methodol. 58 (1) (1996) 267-288.
[19] R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (2013) 1456-1490.