arxiv: v1 [math.st] 4 Dec 2015

Similar documents
arxiv: v1 [math.st] 22 Feb 2018

Optimization methods

Optimization methods

Geometric ergodicity of the Bayesian lasso

Markov Chain Monte Carlo (MCMC)

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Monte Carlo methods for sampling-based Stochastic Optimization

Computational statistics

Perturbed Proximal Gradient Algorithm

Integrated Non-Factorized Variational Inference

Sparsity Regularization

Lecture 8: The Metropolis-Hastings Algorithm

Bayesian Inference and MCMC

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003

Markov Chain Monte Carlo methods

6 Markov Chain Monte Carlo (MCMC)

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic

17 : Markov Chain Monte Carlo

Part III. A Decision-Theoretic Approach and Bayesian testing

Introduction to Machine Learning CMU-10701

Stochastic Proximal Gradient Algorithm

Quantitative Non-Geometric Convergence Bounds for Independence Samplers

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Inference in state-space models with multiple paths from conditional SMC

Weak convergence of Markov chain Monte Carlo II

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Reconstruction from Anisotropic Random Measurements

Monte Carlo in Bayesian Statistics

STAT 200C: High-dimensional Statistics

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

University of Toronto Department of Statistics

Multimodal Nested Sampling

7. Estimation and hypothesis testing. Objective. Recommended reading

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Tractable Upper Bounds on the Restricted Isometry Constant

Sparse Linear Models (10/7/13)

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo

CSC 2541: Bayesian Methods for Machine Learning

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

arxiv: v2 [math.st] 12 Feb 2008

Optimisation Combinatoire et Convexe.

Nonlinear Programming Models

arxiv: v1 [math.st] 19 Dec 2013

The Metropolis-Hastings Algorithm. June 8, 2012

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.

Density Estimation. Seungjin Choi

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo

On isotropicity with respect to a measure

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Can we do statistical inference in a non-asymptotic way? 1

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10

Bayesian Methods for Machine Learning

Some Results on the Ergodicity of Adaptive MCMC Algorithms

I. Bayesian econometrics


INTRODUCTION TO BAYESIAN STATISTICS

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Or How to select variables Using Bayesian LASSO

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

Controlled sequential Monte Carlo

Reversible Markov chains

A Dirichlet Form approach to MCMC Optimal Scaling

Bayesian Regression Linear and Logistic Regression

Monte Carlo Methods. Leon Gu CSD, CMU

Answers and expectations

Markov chain Monte Carlo

Statistics & Data Sciences: First Year Prelim Exam May 2018

Log-concave sampling: Metropolis-Hastings algorithms are fast!

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Advances and Applications in Perfect Sampling

Kobe University Repository : Kernel

On John type ellipsoids

STA 4273H: Sta-s-cal Machine Learning

Adaptive Monte Carlo methods

Stat 516, Homework 1

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Lasso: Algorithms and Extensions

Inverse Problems in the Bayesian Framework

Adaptive Population Monte Carlo

Example: Ground Motion Attenuation

The lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

STA414/2104 Statistical Methods for Machine Learning II

Asymptotics of minimax stochastic programs

First Hitting Time Analysis of the Independence Metropolis Sampler

arxiv: v1 [stat.co] 18 Feb 2012

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris

A Geometric Interpretation of the Metropolis Hastings Algorithm

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J

Probabilistic Graphical Models

1 Sparsity and l 1 relaxation

The lasso, persistence, and cross-validation

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Transcription:

MCMC convergence diagnosis using geometry of Bayesian LASSO A. Dermoune, D.Ounaissi, N.Rahmania Abstract arxiv:151.01366v1 [math.st] 4 Dec 015 Using posterior distribution of Bayesian LASSO we construct a semi-norm on the parameter space. We show that the partition function depends on the ratio of the l 1 and l norms and present three regimes. We derive the concentration of Bayesian LASSO, and present MCMC convergence diagnosis. keyword: LASSO, Bayes, MCMC, log-concave, geometry, incomplete Gamma function 1 Introduction Let p n be two positive integers, y R n and A be an n p matrix with real numbers entries. Bayesian LASSO c(x = 1 ( Z exp Ax y x 1 is a typically posterior distribution used in the linear regression Here y = Ax + w. ( Z = exp Ax y x 1 dx ( R p is the partition function, and 1 are respectively the Euclidean and the l 1 norms. The vector y R n are the observations, x R p is the unknown signal to recover, w R n is the standard Gaussian noise, and A is a known matrix which maps the signal domain R p into the observation domain R n. If we suppose that x is drawn from Laplace distribution i.e. the distribution proportional to (1 exp( x 1, (3 then the posterior of x known y is drawn from the distribution c (1. The mode { Ax y arg min + x 1 : x R p} (4 1

of c was first introduced in [14] and called LASSO. It is also called Basis Pursuit De-Noising method [4]. In our work we select the term LASSO and keep it for the rest of the article. In general LASSO is not a singleton, i.e. the mode of the distribution c is not unique. In this case LASSO is a set and we will denote by lasso any element of this set. A large number of theoretical results has been provided for LASSO. See [5], [6], [8], [1] and the references herein. The most popular algorithms to find LASSO are LARS algorithm [7], ISTA and FISTA algorithms see e.g. [] and the review article [10]. The aim of this work is to study geometry of bayesian LASSO and to derive MCMC convergence diagnosis. Polar integration Using polar coordinates, the partition function ( Z = J p (θdθ, (5 where dθ denotes the surface measure on the unit sphere S, and Here where J p (θ = 0 S exp( g(r, θr p 1 dr. (6 g(r, θ = 1 (r Aθ + r Aθ β + y, (7 β := θ 1 Aθ y s, (8 and s denotes the cosine of the angle (Aθ, y i.e. cos((aθ, y. Using known estimate θ θ 1, we observe that β is bounded below by 1 A y, (9 and β + as Aθ 0. Here A is the square root of the largest eigenvalue of A A. Observe that c(xdx = 1 Z exp( g(r, θrp 1 drdθ. Hence, we can sample from Bayesian LASSO c (1 as following. We draw uniformly θ := x x from the unit sphere, and then draw the norm x following the distribution µ θ (r := 1 exp( ϕ(r, θ, (10 J p (θ

where ϕ(r, θ := g(r, θ (p 1 ln(r, r > 0. (11 Moreover, observe that the modes {x lasso = r lasso θ lasso : g(r lasso, θ lasso = min r 0,θ S g(r, θ} and {(r, θ : ϕ(r, θ = min r 0,θ S ϕ(r, θ} respectively of the distributions c(xdx and 1 Z exp( g(r, θrp 1 drdθ are different. We will show that (r, θ contains more information than x lasso. 3 Geometric interpretation of the partition function The volume (Lebesgue measure of the set K(A, y := {x R p : J p (x 1} is vol(k(a, y = 1 p Z. Observe that J 1 p p is a norm on the null-space N(A of A. A general result [1] tells us that if f is even, log-concave and integrable on an Euclidean space E, then x E ( 0 f(rxr p 1 dr 1 p is a norm on E. It follows that in the case E = N(A or E = {x R p : Ax, y = 0}, the map is a norm on E. The map x E J 1 p p (x x R p J 1 p p (x := x LASSO has nearly all the properties of a norm. Only the evenness is missing. The set K(A, y = {x R p : x LASSO 1} is convex, compact and contains the origin. See [9] for more details. 3.1 Necessary and sufficient condition to have LASS0 = {0} If β 0, then r [0, + g(r, θ is increasing, its minimizer is equal to r = 0, and its smallest value is y. If β < 0, then its minimizer is equal β to r = Aθ, and its smallest value is less than y. If the set {β < 0} is empty, then LASSO = {0}, if not β l LASSO = {l = θ l : Aθ l β l 0, s.t. βl = sup β }. (1 β 0 As an illustration we consider the case n = 4, p = 7 and the entries of the matrix A B(± 1 n are a realization of i.i.d. Bernoulli random variables 3

x 1 x x 3 x 4 x 5 x 6 x 7 LASSO F IST A 1.7744 0.6019-0.383 0 0-1.0050 0 LASSO P OLAR 0.999 0.3890-1.3980 0.0769-0.0070-0.8699-0.0379 Table 1: N = 10 5, p = 7, n = 4. with the values ± 1 n. We draw uniformly N = 10 5 vectors from the sphere S and estimate LASSO using Formula (1. Table 1 gives the value of LASSO using respectively FISTA algorithm and Formula (1. Observe that necessary and sufficient condition for β 0 for all θ is θ 1 y inf{ Aθ s : θ S}. Using known estimate θ θ 1, we obtain y 1 A as a sufficient condition for β 0 for all θ. 4 Closed form of the partition function We introduce for a R and for a couple p, r 1 of integers, the notations (a r = (a 1... (a r, (13 c(p, r = p 1 ( p 1 ( 1 p 1 k ( k + 1 k r. (14 k=0 Now, we can announce the following result. Proposition 4.1. 1 If β = +, then where If β 0, then θ p 1 J p(θ = (p 1! exp( y. θ p 1 J p(θ := Φ(β, Φ(β = exp( y (β + s y p p 1 k=0 k 1 exp( β k+1 Γ(, β ( p 1 k ( β p 1 k, (15 4

Here Γ(a, x = x exp( tt a 1 dt, a > 0, x 0, is the upper incomplete Gamma function. 3 If β < 0, then θ p 1 J p(θ = exp( y (β + s y p p 1 k=0 k 1 exp( β ( p 1 k ( β p 1 k k 1 (Γ( k+1 + ( 1k γ( k+1, β. (16 Here γ(a, x = x 0 exp( tta 1 dt is the lower incomplete Gamma function. where 4 If β = 0, then θ p 1 J p(θ = exp( y p y p sp Γ( p, 0. 5 If β > 0 then for M p + 1, θ p 1 J p(θ := Φ(β, M + R(β, M, Φ(β, M = (p 1! exp( y M 1 + p 1 exp( y ( 1 + y s pc(p, ( β r β and the remainder term r=p ( R(β, M 1 + y s p y exp( c(p, M β p 1 ( β M (p 1 (18 Proof 4.. Only the first part of assertions and 5 needs the proof. Let us prove the first part of. From the equality J p (θ = exp( y + β and the change of the variable 0 τ = Aθ r + β, exp( ( Aθ r + β r p 1 dr, p 1 r(17 we obtain 0 ( exp ( Aθ r + β r p 1 dr = = 1 Aθ p exp( τ β (τ βp 1 dτ 1 p 1 ( p 1 + ( β Aθ p p 1 k exp( τ k β τ k dτ. k=0 5

As β > 0, then the change of variable implies β ω = τ, exp( τ τ k dτ = k 1 k + 1 Γ(, β. The equality β = θ 1 Aθ y s achieves the proof of. Now we prove Assertion 5. We extend the incomplete Gamma function as following Γ(a, x = and we use known estimate see [3] page 14 x exp( tt a 1 dt, x > 0, a R, M 1 Γ(a, x = exp( xx a 1 + (a r exp( xx a 1 r + R(a, x, M, (19 r=1 where M > a 1, x > 0, a R, and the remainder term R(a, x, M (a M 1 exp( xx a 1 M. If β > 0, then from β = θ 1 Aθ y s, we have J p (θ = ( p 1 θ p exp( y 1 + y s β 1 ( exp( β β p ( β p 1 p 1 k=0 k 1 Γ( k+1, β. Using the expansion (19, and the fact that J p (θ (p 1! θ p 1 β +, we obtain for r < p 1 c(p, r = 0, 1 r < p 1, c(p, p 1 = It follows the following expansion: ( 1 + y s β ( p 1 k ( 1 p 1 k (p 1! p 1. p θ M 1 p 1 exp( y J p(θ = (p 1! + p 1 where the remainder term which achieves the proof. R(β, M p 1 c(p, M M (p 1, ( β r=p ( β exp( y as c(p, r r (p 1 + R(β, M, 6

4.1 Numerical calculations As an illustration we consider n = 4, p = 7, A B(± 1 n, and y = 0. The choice M = 17 corresponds to the relative error R(β,M Φ(β,M 10 4 for β 7.5. Figure 1: Curves of Φ(β and Φ(β, 17, β = [6, 45], p = 7, n = 4. Numerical calculations show that the function Φ(β (15 explodes for β > 13.8. To compass these explosions we use the expansion (19, and then we use the function Φ(, M (17. In Fig.(1 and Fig.( dashed and black curves represent respectively the function Φ(, M and Φ. In Fig. (1 we plot Φ(, M and Φ for β (6, 45. By zooming on β (1.09, 7.5, β (7.5, 13.8, and β (13.8, 15 we show that the behavior of Φ becomes abnormal from β β Φ = 13.8 and obtain Fig.(. 4. The case LASS0={ 0 }: Partition function estimate and concentration inequality The following is a consequence of [9] Lemma.1. Proposition 4.3. 1 The function r (0, + ϕ(r, θ is convex and its unique critical point is the mode of (10. r(θ = β + β + 4(p 1 Aθ (0 7

( By denoting M(θ = exp ϕ(r(θ, θ, we obtain M(θr(θ p J p (θ M(θr(θ(p 1! exp(p 1 (p 1 p. qr(θ 3 We have for q > 0, exp( ϕ(r, θdr pγ(p, (p 1q exp(p 1 (p 1 p 0 exp( ϕ(r, θdr, and Z min := S inf θ S M(θr(θ p Z S (p 1! exp(p 1 (p 1 p sup θ S M(θr(θ := Z max, where S denotes the surface of the unit sphere S. 4 If x is drawn from Bayesian LASSO distribution c (1, then for q > 0, x qr(θ with the probability at least equal to P (q, p := 1 pγ(p,(p 1q exp(p 1 (p 1 p exp( (p 1.. In particular for q = 5 we have pγ(p,(p 1q exp(p 1 (p 1 p Figure : Curves of Φ(β and Φ(β, 17, p = 7 and n = 4. Remark 4.4. If y = 0, then LASS0=0 and the mode (0 becomes r(θ θ 1 = β( β + β + 4(p 1. 8

Figure 3: Curve of r(θ θ 1, β min := 1 A = 0.4987, β max = 44.5, min(r(θ θ 1 = 1.1035, p = 7 and n = 4. In Fig.(3 we plot β [0.4987, 44.5] β( β+ β +4(p 1. The mode of the distribution of 1 Z exp( ϕ(r, θdrdθ is equal to arg min ϕ(r(θ, θ = r>0,θ S (r(θ, θ. As an illustration we consider p = 7, n = 4, A B(± 1 n. We draw uniformly N = 10 5 sample θ i S from the unit sphere S. For each i, we calculate ϕ(r(θ i, θ i, and we derive θ. Notice that β = θ 1 Aθ = 14.01 β Φ is nearly equal to the beginning of abnormality of Φ. Using Formula (5 and Monte Carlo method, we obtain Z.14, Z min 0.0058 and Z max 10.3654. If we draw N = 10 5 vectors using Laplace distribution (3 and calculate the value of Z using Formula ( and Monte Carlo method, then we obtain Z 0.0036 < Z min. Hence Monte Carlo method using Formula (5 wins against Monte Carlo method using Formula (. 5 The case 0 / LASSO If 0 / LASSO, then the assertions of Proposition (4.3 are no longer valid. But we are going to show that these assertions becomes valid if we work 9

around LASSO. We consider for l LASSO, h(x = A(x + l y x + l 1, h(x := h(x h(0, f(x = exp ( h(x. (1 Contrary to the map x c(x, the map x f(x attains its supremum at the origin. Observe that f(x l c(x = R f(xdx, p Z = exp(h(0z f := exp(h(0 f(xdx. R p If x is drawn from c, then x l is drawn from f Z f. Moreover where Z f = f(xdx = R p J p (θ, l := 0 S J p (θ, ldθ, ( f(rθr p 1 dr. (3 The map x R p J 1 p p (x, l := x A,y,l is nearly a norm (only the eveness is missing. The set K(A, y, l = {x R p : x A,y,l 1} (4 is convex, compact and contains the origin. The volume V ol(k(a, y, l = Z f p. If x is drawn from c, then x l is drawn from f Z f, or equivalently if x is drawn from f Z f, then x + l is drawn from c. To draw x from f Z f, we draw θ = uniformely on S, and then we draw x from x x µ θ,l (r := f(rθ J p (θ, l. (5 We have from [9] Lemma.1 and Remarks page 14 the following result. 10

Proposition 5.1. 1 The function r (0, + ϕ(r, θ, l := A(rθ + l y + rθ + l 1 (p 1 ln(r + h(0 is convex and its unique critical point ( r(θ, l is the mode of (5. By denoting M(θ, l = exp ϕ(r(θ, l, θ, l, we obtain M(θ, lr(θ, l p J p (θ, l M(θ, lr(θ, l(p 1! exp(p 1 (p 1 p, and for q > 0, exp( ϕ(r, θ, ldr qr(θ,l pγ(p, (p 1q exp(p 1 (p 1 p 0 exp( ϕ(r, θ, ldr. 3 We have S inf θ S M(θ, lr(θ, l p Z f S (p 1! exp(p 1 (p 1 p sup M(θ, lr(θ, l. θ S 4 if x is drawn from the distribution c (1, then x l qr(θ, l with the probability at least equal to 1 pγ(p,(p 1q exp(p 1 (p 1 p. 5.1 Calculation of the mode of (5 and the partition function (3 Now, we are going to calculate the mode r(θ, l, and the partition function J p (θ, l. The calculations are similar to the case LASSO={0}, but we need new notations. The vector y l = y Al, s l = cos(θ l where θ l denotes the angle (Aθ, y l, b l = y l s l. The components of the vector l LASSO are denoted by l 1,..., l p. For θ R p, we set S 0 = {i {1,..., p} : θ i = 0}, S + = {i {1,..., p} : θ i 0, θ i l i 0}, S = {i {1,..., p} : θ i 0, θ i l i < 0}. The cardinality of S is denoted by S, and the order statistic of the sequence l i θ i, for i S is denoted by lθ(0 := 0 lθ(1 := l θ (1... lθ( S := l θ ( S lθ( S + 1 := +. Using these new notations, we obtain rθ + l 1 = l i + θ i (r + l i + θ i r l i θ i θ i. i S 0 i S + i S 11

If lθ(k r < lθ(k + 1, then where c k := i S 0 l i + θ 1,k := rθ + l 1 = θ 1,k r + c k, i S + θ i + i S + l i k k i=0,i S l(i + i=0,i S θ(i Observe that θ 1, S = θ 1, and if Aθ = 0 then J p (θ, l = exp( y l Now, we have the following. S lθ(k+1 k=0 lθ(k S i=k+1 S l(i, i=k+1,i S θ(i. exp( θ 1,k rr p 1 dr. Proposition 5.. If Aθ = 0, then exp( y l J p (θ, l = exp( c k ((lθ(k + 1 p (lθ(k p + p k I 1 (θ ( exp( c k Γ(p, lθ(k Γ(p, lθ(k + 1, (6 where k I (θ θ p 1,k I 1 (θ = {k {0,..., S }, such that θ 1,k = 0}, I (θ = {k {0,..., S }, such that θ 1,k > 0}. Now, we are going to give the closed form of J p (θ, l when Aθ 0. We observe for lθ(k r < lθ(k + 1 that where raθ + Al y + rθ + l 1 = α k + ( Aθ r + β k, β k = θ 1,k Aθ b l, α k = Al y β k + c k. 1

Moreover if k I 1 (θ, then β k = b l. Observe also that β S is bounded below by y l sup(s l : θ S. It follows that J p (θ, l = exp( α k J p,k (θ, l + exp( α k J p,k (θ, l, where k I 1 (θ J p,k (θ, l = The calculation of lθ(k+1 lθ(k lθ(k+1 lθ(k k I (θ exp( ( Aθ r + β k r p 1 dr. exp( ( Aθ r + β k r p 1 dr is similar to Proposition (4.1, and depends on the sign of x k = Aθ lθ(k + β k, y k = Aθ lθ(k + 1 + β k. Let k 0 = max(k : x k < 0 and k 1 = min(k : y k > 0. Observe that k 0 + 1 k 1, and then y k 0 for k < k 1 k 0 + 1. It follows for k < k 1 that J p,k (θ, l = 1 p 1 Aθ p j=0 j 1 ( p 1 If k 1 < k 0 + 1, then x k1 < 0 < y k1 J p,k1 (θ, l = and for k > k 1, J p,k (θ, l = 1 Aθ p p 1 j=0 1 Aθ p j 1 p 1 j=0 p 1 j=0 j j 1 ( p 1 j j 1 ( β k p 1( γ( j + 1, x k γ(j + 1, y k. and ( p 1 j ( β k p 1 j γ( j + 1, y k + 1 Aθ p ( β k p 1 γ( j + 1, x k, ( p 1 If k 1 = k 0 + 1, then 0 x k1 < y k1, and for k k 1, J p,k (θ, l = 1 p 1 Aθ p j=0 j 1 ( p 1 j j ( β k p 1( γ( j + 1, y k γ(j + 1, x k. ( β k p 1( γ( j + 1, y k γ(j + 1, x k. Now we can show that J p (θ, l converges to (6 as Aθ 0, and we obtain an approximation similar to (19 for J p,k (θ, l as Aθ 0 for each k I 1 (θ. 13

6 MCMC diagnosis Here we take p = 7, n = 4, A B(± 1 n and for simplicity we consider y = 0. We sample from the distribution c (1 using Hastings-Metropolis algorithm (x (t and propose the test x (t qr(θ (t as a criterion for the convergence. Here θ (t := x(t. We recall that if x is drawn from the x (t target distribution c, then x qr(θ with the probability at least equal to P (q, p. Table gives the values of the probability P (q, p. Note that for q.5 the criterion x (t qr(θ (t is satisfied with a large probability. q.5 3 3.5 4 4.5 5 P (q, p 0.667 0.9446 0.994 0.9991 0.9999 1.0000 1.0000 Table : Values of the probability P (q, p for p = 7. 6.1 Independent sampler (IS The proposal distribution Q(x, x 1 = p(x = 1 p exp( x 1, x 1, x. The ratio c(x p(x p Z, x. It s known that MCMC (x (t with the target distribution c and the proposal distribution p is uniformly ergodic [11]: sup P(x (t A x (0 c(xdx (1 Z A B(R p A p t. Here Z.14 and then (1 Z p = 0.987. Figure 4(a shows respectively the plot of t 5r(θ (t and t x (t. 6. Random-walk (RW Metropolis algorithm We do not know if the target distribution c satisfies the curvature condition in [13] Section 6. Here we propose to analyse the convergence of the Random walk Metropolis algorithm (x (t using the criterion x (t qr(θ (t. Figure 4(b shows respectively the plot of t 5r(θ (t and t x (t. Figures 4 show that contrary to independent sampler algorithm, the random walk (RW algorithm satisfies early the criterion x (t 5r(θ. More precisely 14

1 N 1 N 1 the independent sampler (IS algorithm begins to satisfy the criterion x (t 5r(θ (t at t = 8 10 5 iteration. The RW algorithm begins to satisfy the criterion x (t 3.5r(θ (t at t = 939065 iteration, but the IS algorithm never satisfies the criterion x (t 3.5r(θ (t. We finally compare IS and RW algorithms using the fact that R xc(xdx = p 0. The best algorithm will furnish the best approximation of the integral R xc(xdx. Table 3 gives the estimators 1 N p N t=1 x(t IS R xc(xdx and p N t=1 x(t RW R xc(xdx. It follows that 1 N p N t=1 x(t IS = 0.0187 and N t=1 x(t RW = 0.0041. We conclude that the random walk algorithm wins for both criteria against independent sampler algorithm. x 1 x x 3 x 4 x 5 x 6 x 7 ˆx IS -0.0005-0.0037 0.0016 0.0164 0.0050 0.001-0.0058 ˆx RW 0.0005-0.0019-0.000 0.001-0.0005 0.0031-0.0011 Table 3: N = 10 6, p = 7, n = 4 and q = 5. Figure 4: (a: Test of convergence of MCMC algorithm with proposal distribution p(x. (b: Test of convergence of MCMC algorithm with N (0, 0.5I p proposal distribution. N = 10 6 iterations, p = 7, n = 4, q = 5, A B(± 1 n, y = 0 and l = 0. 15

7 Conclusion We studied the geometry of bayesian LASSO using polar coordinates and calculated the partition function. We obtained a concentration inequality and derived MCMC convergence diagnosis for the convergence of hasting metropolis algorithm. We showed that the random walk MCMC with the variance 0.5 wins again the independent sampler with the Laplace proposal distribution. 8 References References [1] K. Ball, Logarithmically concave functions and sections of convex sets in R n, Studia Mathematica, T. LXXXVIII (1988 70 84. [] A. Beck, M. Teboulle, A Fast Iterative Shrinkage-thresholding Algorithm for linear Inverse Problem, SIAM J. Imaging Sci. (009 183 0. [3] E.T. Copson, Asymptotic Expansions, Cambridge at the university press (1965. [4] S. Chen, D. L. Donoho, M. Saunders. Atomic decomposition by basis pursuit, SIAM J. Sci. Computing, Vol. 0 No. 1 (1998 33 61. [5] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics Vol. LVII (004, 1413 1457. [6] A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis- Hasting and simulated annealing algorithms around penalized least squares estimator, Math. Comput.Simulation (015. [7] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 37 (004 407 499. [8] G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, arxiv:131.5658 (015. [9] B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geometriae Dedicata, Volume 11 Issue 1 (005 169 18. [10] N. Parikh, S. Boyed, Proximal algorithms, Foundation and Trends in Optimization, 1 (3 (003 13 31. 16

[11] K. Mengersen, R.L Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, The Annals of Statistics (1994 4:101 11. [1] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. (015 1 16. [13] G.0. Roberts, A.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 1 (1996 95 110. [14] R. Tibshirani, Regression shrinkage and selection via Lasso, Journal of the Royal Statistical Society. Series B. Methodological 58 1 (1996 67 88. [15] R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (013 1456 1490. 17