arxiv: v1 [math.st] 22 Feb 2018
Bayesian Lasso: Concentration and MCMC Diagnosis

arXiv:18.857v1 [math.ST], 22 Feb 2018

D. Ounaissi, N. Rahmania

Laboratoire LAMA, UFR de Mathématiques, Université de Paris 12, France.
Laboratoire Paul Painlevé, UFR de Mathématiques, Université de Lille, France.

Email addresses: daoud.ounaissi@u-pec.fr (D. Ounaissi), nadji.rahmania@univ-lille1.fr (N. Rahmania)

Preprint submitted to Elsevier, November 8, 2018

Abstract

Using the posterior distribution of the Bayesian LASSO we construct a semi-norm on the parameter space. We show that the partition function depends on the ratio of the $l^1$ and $l^2$ norms and present three regimes. We derive the concentration of the Bayesian LASSO, and present an MCMC convergence diagnosis.

Keywords: LASSO, Bayes, MCMC, log-concave, geometry, incomplete Gamma function

1. Introduction

Let $p \ge n$ be two positive integers, $y \in \mathbb{R}^n$ and $A$ be an $n \times p$ matrix with real entries. The Bayesian LASSO

\[ c(x) = \frac{1}{Z} \exp\Big( -\frac{\|Ax - y\|_2^2}{2} - \|x\|_1 \Big) \tag{1} \]

is a typical posterior distribution used in the linear regression

\[ y = Ax + w. \]

Here

\[ Z = \int_{\mathbb{R}^p} \exp\Big( -\frac{\|Ax - y\|_2^2}{2} - \|x\|_1 \Big)\, dx \tag{2} \]

is the partition function, and $\|\cdot\|_2$ and $\|\cdot\|_1$ are respectively the Euclidean and the $l^1$ norms. The vector $y \in \mathbb{R}^n$ contains the observations, $x \in \mathbb{R}^p$ is the
unknown signal to recover, $w \in \mathbb{R}^n$ is the standard Gaussian noise, and $A$ is a known matrix which maps the signal domain $\mathbb{R}^p$ into the observation domain $\mathbb{R}^n$. If we suppose that $x$ is drawn from the Laplace distribution, i.e. the distribution proportional to

\[ \exp(-\|x\|_1), \tag{3} \]

then the posterior of $x$ given $y$ is the distribution $c$ of (1). The mode

\[ \arg\min \Big\{ \frac{\|Ax - y\|_2^2}{2} + \|x\|_1 : x \in \mathbb{R}^p \Big\} \tag{4} \]

of $c$ was first introduced in [18] and called LASSO. It is also called the Basis Pursuit De-Noising method [4]. In our work we select the term LASSO and keep it for the rest of the article.

In general LASSO is not a singleton, i.e. the mode of the distribution $c$ is not unique. In this case LASSO is a set and we will denote by lasso any element of this set. A large number of theoretical results have been provided for LASSO. See [5], [6], [9], [12], [15] and the references therein. The most popular algorithms to find LASSO are the LARS algorithm [8] and the ISTA and FISTA algorithms, see e.g. [2] and the review article [14].

The aim of this work is to study the geometry of the Bayesian LASSO and to derive an MCMC convergence diagnosis.

2. Polar integration

Using polar coordinates $s = x / \|x\| \in S$, $r = \|x\|$, the partition function (2) becomes

\[ Z_p = \int_S Z_p(s)\, ds, \tag{5} \]

where $\|\cdot\|$ denotes one of the $l^2$ or $l^1$ norms in $\mathbb{R}^p$, $ds$ denotes the surface measure on the unit sphere $S$ of the norm $\|\cdot\|$, and

\[ Z_p(s) = \int_0^{+\infty} \exp\{-f(rs)\}\, r^{p-1}\, dr, \tag{6} \]

where $f(x) := \frac{\|Ax - y\|_2^2}{2} + \|x\|_1$, $x \in \mathbb{R}^p$. We express the partition function (6) using the parabolic cylinder function. We also give a concentration inequality and a geometric interpretation of the partition function $Z_p$.
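3. Parabolic cylinder function and partition function

We extend the function $s \in S \mapsto Z_p(s)$ to

\[ x \in \mathbb{R}^p \mapsto Z_p(x) = \int_0^{+\infty} \exp\{-f(rx)\}\, r^{p-1}\, dr. \tag{7} \]

This extension is homogeneous of order $-p$. If $Ax = 0$, then $f(rx) = \frac{\|y\|_2^2}{2} + r \|x\|_1$, and moreover if $x \neq 0$, then

\[ Z_p(x) = (p-1)!\, \|x\|_1^{-p} \exp\Big( -\frac{\|y\|_2^2}{2} \Big). \]

If $Ax \neq 0$, then we will express $Z_p(x)$ using the parabolic cylinder function. We recall that for $b \in \mathbb{R}$ the parabolic cylinder function satisfies

\[ D_b(z) = z^b \exp\Big( -\frac{z^2}{4} \Big) \big[ 1 + O(z^{-2}) \big], \tag{8} \]

when $z \to +\infty$ [16]. We also recall the integral representation of Erdélyi [7] for the parabolic cylinder function

\[ \exp\Big( \frac{z^2}{4} \Big) \Gamma(\nu) D_{-\nu}(z) = \int_0^{+\infty} \exp\Big( -\frac{r^2}{2} - zr \Big) r^{\nu - 1}\, dr, \quad \nu > 0, \]

where $\Gamma(\nu) = \int_0^{+\infty} \exp(-t)\, t^{\nu - 1}\, dt$ is the $\Gamma$ function.

Proposition 3.1. The variable

\[ \omega_{lasso} := \frac{\|x\|_1 - \langle Ax, y \rangle}{\|Ax\|_2} \tag{9} \]

will play an important role. It depends only on $s = x / \|x\|_1 \in S_{p-1,1}$ (the unit sphere of the $l^1$ norm), and the function $s \in S_{p-1,1} \mapsto \omega_{lasso}(s)$ is bounded below by $\min\big\{ \frac{1 - \langle As, y \rangle}{\|As\|_2} : s \in S_{p-1,1} \big\}$.

Now we can announce the following result.

Proposition 3.2. We have for $Ax \neq 0$

\[ Z_p(x) = (p-1)! \exp\Big( -\frac{\|y\|_2^2}{2} \Big) \|Ax\|_2^{-p} \exp\Big( \frac{\omega_{lasso}^2}{4} \Big) D_{-p}(\omega_{lasso}). \]

If $Ax \to 0$, then $\omega_{lasso} \to +\infty$ and

\[ Z_p(x) = (p-1)! \exp\Big( -\frac{\|y\|_2^2}{2} \Big) \|x\|_1^{-p} \big[ 1 + O(\omega_{lasso}^{-2}) \big]. \]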
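The two elementary properties of the extension (7), namely the closed form on Ker(A) and the homogeneity of order $-p$, can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `f` and `radial_partition`, the Simpson quadrature, the truncation at `r_max` and the toy data `A`, `y` are illustrative assumptions, not part of the paper.

```python
import math

def f(x, A, y):
    """f(x) = ||Ax - y||_2^2 / 2 + ||x||_1, with A given as a list of rows."""
    residual = [sum(a * v for a, v in zip(row, x)) - yi for row, yi in zip(A, y)]
    return 0.5 * sum(v * v for v in residual) + sum(abs(v) for v in x)

def radial_partition(x, A, y, r_max=60.0, steps=60000):
    """Composite Simpson approximation of int_0^r_max exp(-f(r x)) r^(p-1) dr."""
    p = len(x)
    h = r_max / steps  # steps must be even for Simpson's rule

    def integrand(r):
        return math.exp(-f([r * v for v in x], A, y)) * r ** (p - 1)

    total = integrand(0.0) + integrand(r_max)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0

A = [[1.0, 1.0]]  # n = 1, p = 2 (illustrative choice)
y = [0.5]

# Direction in Ker(A) with ||x||_1 = 1: compare with (p-1)! ||x||_1^{-p} exp(-||y||^2 / 2).
x_null = [0.5, -0.5]
closed_form = math.factorial(1) * 1.0 ** (-2) * math.exp(-0.5 * 0.5 ** 2)
z_null = radial_partition(x_null, A, y)

# Generic direction: homogeneity of order -p, i.e. Z_p(2x) = 2^{-p} Z_p(x).
x = [0.7, 0.3]
z_x = radial_partition(x, A, y)
z_2x = radial_partition([2.0 * v for v in x], A, y)

print(z_null, closed_form)
print(4.0 * z_2x, z_x)
```

The truncation is harmless here because the integrand decays at least like $\exp(-r\|x\|_1)$; for directions with a very small $l^1$ norm, `r_max` would have to be increased.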
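Corollary 3.3. If $y = 0$ then $\omega_{lasso} = \frac{1}{\|As\|_2}$ is bounded below by $\frac{1}{\lambda_{1,2}}$, where $\lambda_{1,2} = \max( \|As\|_2 : s \in S_{p-1,1} )$ is the norm of the operator $A : (\mathbb{R}^p, \|\cdot\|_1) \to (\mathbb{R}^n, \|\cdot\|_2)$. The partition function

\[ Z_p(s) = (p-1)!\, \omega_{lasso}^p \exp\Big( \frac{\omega_{lasso}^2}{4} \Big) D_{-p}(\omega_{lasso}) \]

is convex and decreasing as a function of $\|As\|_2^2$.

Proof 3.4. It suffices to remark that

\[ Z_p(s) = \int_0^{+\infty} \exp\Big\{ -\frac{\|As\|_2^2}{2} r^2 - r \Big\}\, r^{p-1}\, dr. \]

4. Geometric interpretation of the partition function

First we represent $f(rx)$ for $Ax \neq 0$ in the form

\[ f(rx) = \frac{\|y\|_2^2}{2} - \frac{\omega_{lasso}^2}{2} + \frac{(r \|Ax\|_2 + \omega_{lasso})^2}{2}. \tag{10} \]

The function $x \in \mathbb{R}^p \mapsto \exp\{-f(x)\}$ is log-concave and integrable in $\mathbb{R}^p$. Observe that $Z_p^{-1/p}$ is a norm on the null space Ker(A) of $A$. A general result [1] tells us that

\[ x \in \mathbb{R}^p \mapsto Z_p^{-1/p}(x) := \|x\|_c \]

is a quasi-norm on $\mathbb{R}^p$. The unit ball of $\|\cdot\|_c$ is defined by

\[ B(A, y) := \{ x \in \mathbb{R}^p : \|x\|_c \le 1 \} = \{ x \in \mathbb{R}^p : Z_p(x) \ge 1 \} = \{ x = rs \in \mathbb{R}^p : Z_p(s) \ge r^p \} \]
\[ = \Big\{ x = rs \in \mathbb{R}^p : (p-1)! \exp\Big( -\frac{\|y\|_2^2}{2} \Big) \|As\|_2^{-p} \exp\Big( \frac{\omega_{lasso}^2}{4} \Big) D_{-p}(\omega_{lasso}) \ge r^p \Big\}. \]

Its contour is equal to

\[ C(A, y) := \{ x \in \mathbb{R}^p : \|x\|_c = 1 \} = \{ x \in \mathbb{R}^p : Z_p(x) = 1 \} = \{ x = rs \in \mathbb{R}^p : Z_p(s) = r^p \} \]
\[ = \Big\{ x = rs \in \mathbb{R}^p : (p-1)! \exp\Big( -\frac{\|y\|_2^2}{2} \Big) \|As\|_2^{-p} \exp\Big( \frac{\omega_{lasso}^2}{4} \Big) D_{-p}(\omega_{lasso}) = r^p \Big\}. \]

We summarize our results in the following proposition.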
Proposition 4.1.

1) For each $s \in S_{p-1,1}$, the longest segment $[0, r]s$ contained in $B(A, y)$ is obtained for $r = r_{max}(s)$, the solution of the equation

\[ r^p = (p-1)! \exp\Big( -\frac{\|y\|_2^2}{2} \Big) \|As\|_2^{-p} \exp\Big( \frac{\omega_{lasso}^2}{4} \Big) D_{-p}(\omega_{lasso}). \]

2) The ball $B(A, y) = \bigcup_{s \in S_{p-1,1}} [0, r_{max}(s)]s$, and its contour is equal to

\[ C(A, y) = \{ r_{max}(s) s : s \in S_{p-1,1} \}. \]

3) The volume of $B(A, y)$ is

\[ \int_{S_{p-1,1}} \frac{r_{max}(s)^p}{p}\, ds = \frac{Z_p}{p}. \]

5. Necessary and sufficient condition to have lasso equal zero

Now we can give the necessary and sufficient condition to have lasso $= \{0\}$.

Proposition 5.1. The following assertions are equivalent.

1) $0 =$ lasso.
2) $\omega_{lasso}(s) \ge 0$ for all $s \in S_{p-1,1}$.
3) $\|A^{\top} y\|_\infty \le 1$.

6. Concentration around the lasso

6.1. The case lasso null

The polar coordinate formula tells us that we can draw a vector $x$ from $c(x)dx$ by drawing its angle $s$ uniformly, and then simulating its distance $r$ to the origin from

\[ c(r, s)\, dr = \frac{1}{Z_p(s)} \exp\{-f(rs)\}\, r^{p-1}\, dr. \tag{11} \]
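Now let us estimate, for $r > 0$, the probability

\[ P(\|x\| > r) = \int_S \int_r^{+\infty} c(\rho, s)\, d\rho\, \frac{ds}{|S|}, \]

where $|S|$ denotes the Lebesgue measure of $S$. We introduce for each pair $a, b \in \mathbb{R}$ the functions

\[ g_{a,b}(r) := \frac{(ar + b)^2}{2}, \qquad g_{a,b,p}(r) := g_{a,b}(r) - (p-1) \ln(r), \quad r > 0. \tag{12} \]

In the following

\[ a = \|As\|_2, \qquad b = \omega_{lasso}. \]

The function $r \mapsto g_{a,b}(r)$ is increasing (because $b := \omega_{lasso} \ge 0$). The function $r \mapsto g_{a,b,p}(r)$ is convex and attains its minimum at the point $r(s)$, the solution of the equation

\[ \|As\|_2 \big( r \|As\|_2 + \omega_{lasso} \big) = \frac{p-1}{r}. \]

The positive root is given by

\[ r(s) = \frac{-\omega_{lasso} + \sqrt{\omega_{lasso}^2 + 4(p-1)}}{2 \|As\|_2}. \tag{13} \]

On one hand

\[ \int_0^{+\infty} \exp\{-g_{a,b,p}(r)\}\, dr \ge \exp\{-g_{a,b}(r(s))\} \int_0^{r(s)} r^{p-1}\, dr = \exp\{-g_{a,b,p}(r(s))\}\, \frac{r(s)}{p}. \]

On the other hand, by using the convexity of $r \mapsto g_{a,b}(r)$, we have for all $r > 0$,

\[ g_{a,b}(r) \ge g_{a,b}(r(s)) + \frac{(p-1)(r - r(s))}{r(s)}, \]

because $\partial_r g_{a,b}(r(s)) = \frac{p-1}{r(s)}$. We deduce for $q > 0$,

\[ \int_{q r(s)}^{+\infty} \exp\{-g_{a,b,p}(r)\}\, dr \le \exp\{-g_{a,b}(r(s)) + p - 1\} \int_{q r(s)}^{+\infty} \exp\Big\{ -\frac{p-1}{r(s)} r \Big\}\, r^{p-1}\, dr \]
\[ = \exp\{-g_{a,b}(r(s)) + p - 1\}\, \frac{r(s)^p}{(p-1)^p} \int_{q(p-1)}^{+\infty} \exp(-r)\, r^{p-1}\, dr \]
\[ = \exp\{-g_{a,b}(r(s)) + p - 1\}\, \frac{r(s)^p}{(p-1)^p}\, \Gamma(p, q(p-1)), \]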
where $\Gamma(\nu, x) = \int_x^{+\infty} \exp(-t)\, t^{\nu - 1}\, dt$ is the upper incomplete gamma function. Finally we get the following result.

Proposition 6.1. We have for all $q > 0$,

\[ P(\|x\| \ge q r(s)) \le \frac{p \exp(p-1)}{(p-1)^p}\, \Gamma(p, q(p-1)) := P(q, p). \tag{14} \]

Using the following estimate [?]

\[ x^{a-1} \exp(-x) < \Gamma(a, x) < B x^{a-1} \exp(-x), \quad a > 1, \ B > 1, \ x > \frac{B}{B-1}(a-1), \]

we get for $q > 1$,

\[ \Gamma(p, q(p-1)) \approx q^{p-1} (p-1)^{p-1} \exp(-q(p-1)). \]

Therefore the quantity

\[ P(q, p) \approx \frac{p\, q^{p-1}}{p-1} \exp\{-(q-1)(p-1)\}. \]

In summary, if $x$ is drawn from the density $c$ and $\theta = x / \|x\|$, then $\frac{x}{r(\theta)} \in B(0, q)$ with probability at least $1 - P(q, p)$.

In Figure 1 we plot, for $p = 2$, $n = 1$, $A = (1 \ 1)$ and $y = 0$, the density $c(r, s)dr$ for a fixed value of $s$. We notice that the mode of $c(r, s)$, equal to .6, is very close to the value $r(s) = .69$ given by (13) for the same fixed $s$.

6.2. The general case

We take a vector $l \in$ lasso. We will study the concentration of $c$ around $l$. The variable of interest is $u = x - l$. The law of $u$ has density

\[ c(u + l)\, du = \frac{1}{Z_p} \exp\{-f(u + l, A, y)\}\, du. \]

The change of variables formula gives for each norm $\|\cdot\|$

\[ c(u + l)\, du = \frac{1}{Z_p} \exp\{-f(r\theta + l)\}\, r^{p-1}\, dr\, d\theta, \quad r > 0, \ \theta \in S. \]
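The radius (13) and the bound $P(q, p)$ of (14) are straightforward to evaluate. The sketch below assumes that $p \ge 2$ is an integer, so that the upper incomplete gamma function reduces to the standard finite sum $\Gamma(p, x) = (p-1)!\, e^{-x} \sum_{k=0}^{p-1} x^k / k!$; the function names are hypothetical and chosen only for this illustration.

```python
import math

def upper_incomplete_gamma_int(p, x):
    """Gamma(p, x) for a positive integer p via (p-1)! e^{-x} sum_{k<p} x^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, p):
        term *= x / k
        total += term
    return math.factorial(p - 1) * math.exp(-x) * total

def tail_bound(q, p):
    """P(q, p) = p e^{p-1} (p-1)^{-p} Gamma(p, q (p-1)), as in (14); needs p >= 2."""
    prefactor = p * math.exp(p - 1) / (p - 1) ** p
    return prefactor * upper_incomplete_gamma_int(p, q * (p - 1))

def radius(a, omega, p):
    """Positive root r(s) of a (a r + omega) = (p - 1) / r, as in (13)."""
    return (-omega + math.sqrt(omega ** 2 + 4 * (p - 1))) / (2 * a)

# The bound is informative only once it drops below 1, which happens for q large enough.
for q in (1.5, 2.5, 3.5, 5.0):
    print(q, tail_bound(q, 7))

# Sanity check of (13) for a = 1, omega = 1, p = 2.
r = radius(1.0, 1.0, 2)
print(r, 1.0 * (1.0 * r + 1.0) * r)  # second value should equal p - 1 = 1
```

For non-integer $p$ the finite sum does not apply and a library routine for the regularized incomplete gamma function would be needed instead.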
Figure 1: $c(r, s)$ for $r \in [.1; 1]$.

By definition, for any vector $s$, the convex function $r \mapsto f(rs + l)$ reaches its minimum at the point $r = 0$. Therefore $r \mapsto f(rs + l)$ is increasing. The function

\[ f(rs + l, p) := f(rs + l) - (p-1) \ln(r), \quad r > 0, \tag{15} \]

is strictly convex. Its critical point $r_l(s)$ is the solution of the equation

\[ \partial_r f(rs + l) = \frac{p-1}{r}. \]

By a proof similar to that of Proposition 6.1 we have the following result.

Proposition 6.2. If $x$ is drawn from the density $c$, and $s = \frac{x - l}{\|x - l\|}$, then for all $q > 0$,

\[ P(\|x - l\| \ge q r_l(s)) \le \frac{p \exp(p-1)}{(p-1)^p}\, \Gamma(p, q(p-1)) := P(q, p). \tag{16} \]

7. Applications

7.1. The contour in the case $p = 2$, $n = 1$

Let $A := (a_1, a_2)$ be a matrix of order $1 \times 2$. Its null space is Ker(A) $= \{ (x_1, x_2) : a_1 x_1 + a_2 x_2 = 0 \}$. We have that $B(a_1, a_2, y)$ contains Ker(A) $\cap\, B_{2,1}$.
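This intersection is a symmetric segment denoted

\[ [ (x_1(a_1, a_2), x_2(a_1, a_2)),\ (-x_1(a_1, a_2), -x_2(a_1, a_2)) ]. \]

To determine the other points of the set $B(a_1, a_2, y)$, we will directly calculate $Z_2(s)$. A simple calculation gives

\[ Z_2(s) = \exp\Big( -\frac{\|y\|_2^2}{2} + \frac{\omega_{lasso}^2}{2} \Big) \int_0^{+\infty} \exp\Big\{ -\frac{(\|As\|_2 r + \omega_{lasso})^2}{2} \Big\}\, r\, dr, \]

and

\[ \|As\|_2 \int_0^{+\infty} \exp\Big\{ -\frac{(\|As\|_2 r + \omega_{lasso})^2}{2} \Big\}\, r\, dr + \omega_{lasso} \int_0^{+\infty} \exp\Big\{ -\frac{(\|As\|_2 r + \omega_{lasso})^2}{2} \Big\}\, dr = \frac{1}{\|As\|_2} \exp\Big( -\frac{\omega_{lasso}^2}{2} \Big). \]

Finally we have the following proposition.

Proposition 7.1.

1) If $As \neq 0$, then

\[ Z_2(s) = \exp\Big( -\frac{\|y\|_2^2}{2} + \frac{\omega_{lasso}^2}{2} \Big) \frac{1}{\|As\|_2^2} \Big\{ \exp\Big( -\frac{\omega_{lasso}^2}{2} \Big) - \omega_{lasso} \sqrt{2\pi}\, (1 - F(\omega_{lasso})) \Big\}, \]

where $F$ is the distribution function of the standard normal law.

2) If $As \neq 0$ and $y = 0$, then

\[ Z_2(s) = \omega_{lasso}^2 \exp\Big( \frac{\omega_{lasso}^2}{2} \Big) \Big\{ \exp\Big( -\frac{\omega_{lasso}^2}{2} \Big) - \omega_{lasso} \sqrt{2\pi}\, (1 - F(\omega_{lasso})) \Big\}. \]

3) If $s \in S_{1,1}$, $As \neq 0$ and $y = 0$, then the function

\[ b^2 \mapsto z_2(b^2) = \frac{1}{b^2} \exp\Big( \frac{1}{2b^2} \Big) \Big\{ \exp\Big( -\frac{1}{2b^2} \Big) - \frac{1}{b} \sqrt{2\pi}\, \Big( 1 - F\Big( \frac{1}{b} \Big) \Big) \Big\}, \]

defined on $(0, \lambda_{1,2}^2]$, is convex and decreasing, where $\lambda_{1,2} = \max_{s \in S_{1,1}} \|As\|_2$.

4) We have for $s \in S_{1,1}$

\[ Z_2(s) = z_2\Big( \frac{1}{\omega_{lasso}^2} \Big). \]

The ball

\[ B(A, 0) = \{ rs : Z_2(s) \ge r^2 \} \]

is contained in the unit disk $\|x\|_1 \le 1$ of the $l^1$ norm. The contour is defined by the equation $Z_2(s) = r^2$.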
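As a consistency check, the Gaussian integral defining $Z_2(s)$ can be compared with the closed form obtained from it by the substitution $t = \|As\|_2 r + \omega_{lasso}$, namely $\exp(-\|y\|_2^2/2 + \omega^2/2)\, \|As\|_2^{-2} \{ \exp(-\omega^2/2) - \omega \sqrt{2\pi}(1 - F(\omega)) \}$. The sketch below is illustrative only: the function names, the Simpson quadrature and the test values of $a = \|As\|_2$ and $\omega$ are assumptions made for this example.

```python
import math

def normal_tail(x):
    """1 - F(x) for the standard normal distribution function F."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def z2_closed(a, omega, y_norm_sq):
    """Closed form of Z_2(s) with a = ||As||_2 > 0 and y_norm_sq = ||y||_2^2."""
    bracket = math.exp(-0.5 * omega ** 2) - omega * math.sqrt(2.0 * math.pi) * normal_tail(omega)
    return math.exp(-0.5 * y_norm_sq + 0.5 * omega ** 2) / a ** 2 * bracket

def z2_quadrature(a, omega, y_norm_sq, r_max=40.0, steps=40000):
    """Simpson approximation of exp(-|y|^2/2 + w^2/2) int_0^r_max exp(-(a r + w)^2 / 2) r dr."""
    h = r_max / steps  # steps must be even

    def integrand(r):
        return math.exp(-0.5 * (a * r + omega) ** 2) * r

    total = integrand(0.0) + integrand(r_max)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return math.exp(-0.5 * y_norm_sq + 0.5 * omega ** 2) * total * h / 3.0

def contour_radius(a, omega, y_norm_sq):
    """r_max(s) for p = 2: the solution of Z_2(s) = r^2."""
    return math.sqrt(z2_closed(a, omega, y_norm_sq))

# With y = 0 one has omega = 1 / a on the l1 unit sphere.
for a in (1.0, 0.5):
    omega = 1.0 / a
    print(a, z2_closed(a, omega, 0.0), z2_quadrature(a, omega, 0.0), contour_radius(a, omega, 0.0))
```

For large $\omega$ the bracket in `z2_closed` is a difference of two nearly equal tiny numbers multiplied by a huge exponential, so the direct evaluation loses accuracy, which is the numerical issue discussed in the remark on tail estimates.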
The norm of the linear operator $A : (\mathbb{R}^2, \|\cdot\|_1) \to (\mathbb{R}, |\cdot|)$ is defined by

\[ \lambda_{1,2} = \sup_{s : \|s\|_1 = 1} \|As\|_2. \]

The function $s \mapsto Z_2(s) = z_2(\lambda_{1,2}^2)$ is constant on

\[ \Omega_{1,2} = \{ s : \|s\|_1 = 1, \ \|As\|_2 = \lambda_{1,2} \}. \]

If $A = (1, 1)$ then

\[ \Omega_{1,2} = \{ s : \|s\|_1 = 1, \ \|As\|_2 = \lambda_{1,2} \} = [(1, 0), (0, 1)] \cup [(-1, 0), (0, -1)]. \]

If $A = (a_1, a_2)$ with $|a_1| < |a_2|$, then

\[ \Omega_{1,2} = \{ s : \|s\|_1 = 1, \ \|As\|_2 = \lambda_{1,2} \} = \{ (0, \mathrm{sgn}(a_2)), (0, -\mathrm{sgn}(a_2)) \}. \]

In both cases $\{ z_2(\lambda_{1,2}^2) \}^{1/2}\, \Omega_{1,2}$ is part of the contour. The other points of the contour are deduced from the equation

\[ z_2(b^2) = a^2, \quad b \in (0, \lambda_{1,2}). \]

Each pair $(a, b)$ generates four points of $B((a_1, a_2), 0)$ of the form $as$, where

\[ |s_1| + |s_2| = 1, \qquad |a_1 s_1 + a_2 s_2| = b. \]

We plot in Figure 2 the contour of $B(a_1, a_2, 0)$ for different choices of the matrix $(a_1, a_2)$. We notice that the area of $B((a_1, a_2), 0)$ is a decreasing function of the norm $\lambda_{1,2}$ of the matrix $A$.

Remark 7.2. The numerics show that $z_2(\omega_{lasso})$ explodes for large values of $\omega_{lasso}$, which means that $s$ is close to the null space of $A$. To eliminate that explosion we need to estimate the tail of the Gaussian density. Using the Gordon estimate [10]

\[ \frac{x}{x^2 + 1} \exp\Big( -\frac{x^2}{2} \Big) \le \sqrt{2\pi}\, (1 - F(x)) \le \frac{1}{x} \exp\Big( -\frac{x^2}{2} \Big), \quad x > 0, \]

we obtain the approximation (17) of $z_2(b^2)$ for $b$ near $0$.
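Gordon's two-sided estimate of the Gaussian tail can be checked numerically in its scaled form $\frac{x}{x^2+1} \le \exp(x^2/2)\sqrt{2\pi}(1 - F(x)) \le \frac{1}{x}$. The sketch below uses only `math.erfc` from the Python standard library; the function names and the list of test points are illustrative assumptions.

```python
import math

def scaled_tail(x):
    """exp(x^2 / 2) * sqrt(2 pi) * (1 - F(x)), evaluated directly (moderate x only)."""
    return math.exp(0.5 * x * x) * math.sqrt(0.5 * math.pi) * math.erfc(x / math.sqrt(2.0))

def gordon_bounds(x):
    """Lower and upper bounds x / (x^2 + 1) and 1 / x on the scaled tail, for x > 0."""
    return x / (x * x + 1.0), 1.0 / x

for x in (0.5, 1.0, 2.0, 5.0, 8.0, 20.0):
    lower, upper = gordon_bounds(x)
    print(x, lower, scaled_tail(x), upper)
```

The scaled quantity stays bounded while its two factors do not: in double precision `math.exp(0.5 * x * x)` overflows for $x$ a little below 38 and the `erfc` factor underflows in the same range, so for very large arguments the bounds themselves are the safer thing to evaluate.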
Figure 2: Contours of $B(a_1, a_2, 0)$ for different matrices $A$, $p = 2$ and $n = 1$.

8. MCMC diagnosis

Here we take $p = 7$, $n = 4$, $A \sim \mathcal{B}(\pm \frac{1}{\sqrt{n}})$ and for simplicity we consider $y = 0$. We sample from the distribution $c$ of (1) using the Hastings-Metropolis algorithm $(x^{(t)})$ and propose the test $\|x^{(t)}\| \le q r(\theta^{(t)})$ as a criterion for convergence. Here $\theta^{(t)} := \frac{x^{(t)}}{\|x^{(t)}\|}$. We recall that if $x$ is drawn from the target distribution $c$, then $\|x\| \le q r(\theta)$ with probability at least $1 - P(q, p)$. Table 1 gives the values of the probability $P(q, p)$. Note that for $q \ge 2.5$ the criterion $\|x^{(t)}\| \le q r(\theta^{(t)})$ is satisfied with a large probability.

Table 1: Values of the probability $P(q, p)$ for $p = 7$ (columns $q$ and $P(q, p)$).

8.1. Independent sampler (IS)

The proposal distribution is

\[ Q(x_1, x_2) = p(x_2) = \frac{1}{2^p} \exp(-\|x_2\|_1), \quad x_1, x_2 \in \mathbb{R}^p. \]
The ratio satisfies

\[ \frac{c(x)}{p(x)} \le \frac{2^p}{Z}, \quad \forall x. \]

It is known that the MCMC $(x^{(t)})$ with the target distribution $c$ and the proposal distribution $p$ is uniformly ergodic [13]:

\[ \sup_{A \in \mathcal{B}(\mathbb{R}^p)} \Big| P(x^{(t)} \in A \mid x^{(0)}) - \int_A c(x)\, dx \Big| \le \Big( 1 - \frac{Z}{2^p} \Big)^t. \]

Here $Z \approx .14$ and then $(1 - \frac{Z}{2^p}) = .987$. Figure 3(a) shows the plots of $t \mapsto 5 r(\theta^{(t)})$ and $t \mapsto \|x^{(t)}\|$.

8.2. Random-walk (RW) Metropolis algorithm

We do not know if the target distribution $c$ satisfies the curvature condition in [17], Section 6. Here we propose to analyse the convergence of the random-walk Metropolis algorithm $(x^{(t)})$ using the criterion $\|x^{(t)}\| \le q r(\theta^{(t)})$. Figure 3(b) shows the plots of $t \mapsto 5 r(\theta^{(t)})$ and $t \mapsto \|x^{(t)}\|$.

Figure 3 shows that, contrary to the independent sampler algorithm, the random walk (RW) algorithm satisfies the criterion $\|x^{(t)}\| \le 5 r(\theta)$ early. More precisely:

1) The independent sampler (IS) algorithm begins to satisfy the criterion $\|x^{(t)}\| \le 5 r(\theta^{(t)})$ at iteration $t =$ [value missing].

2) The RW algorithm begins to satisfy the criterion $\|x^{(t)}\| \le 3.5 r(\theta^{(t)})$ at iteration $t =$ [value missing], but the IS algorithm never satisfies the criterion $\|x^{(t)}\| \le 3.5 r(\theta^{(t)})$.

We finally compare the IS and RW algorithms using the fact that $\int_{\mathbb{R}^p} x\, c(x)\, dx = 0$. The best algorithm will furnish the best approximation of the integral $\int_{\mathbb{R}^p} x\, c(x)\, dx$. Table 2 gives the estimators

\[ \frac{1}{N} \sum_{t=1}^N x_{IS}^{(t)} \approx \int_{\mathbb{R}^p} x\, c(x)\, dx \quad \text{and} \quad \frac{1}{N} \sum_{t=1}^N x_{RW}^{(t)} \approx \int_{\mathbb{R}^p} x\, c(x)\, dx. \]

It follows that $\big\| \frac{1}{N} \sum_{t=1}^N x_{IS}^{(t)} \big\| = .187$ and $\big\| \frac{1}{N} \sum_{t=1}^N x_{RW}^{(t)} \big\| = .41$. We conclude that the random walk algorithm wins against the independent sampler algorithm for both criteria.
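The two samplers and the criterion $\|x^{(t)}\| \le q r(\theta^{(t)})$ can be sketched in a few lines for $y = 0$. Everything below is an illustrative assumption rather than the authors' implementation: the small fixed matrix (with $p = 3$, $n = 2$), the use of the $l^1$ norm with $\theta = x / \|x\|_1$, the chain length and the function names. For $y = 0$ one has $\|A\theta\|_2\, \omega_{lasso} = 1$, so (13) can be rewritten without cancellation as $r(\theta) = 2(p-1) / (1 + \sqrt{1 + 4 \|A\theta\|_2^2 (p-1)})$.

```python
import math
import random

def half_sq_norm(A, x):
    """||Ax||_2^2 / 2 for A given as a list of rows."""
    return 0.5 * sum(sum(a * v for a, v in zip(row, x)) ** 2 for row in A)

def l1(x):
    return sum(abs(v) for v in x)

def radius(A, x):
    """r(theta) of (13) for theta = x / ||x||_1 and y = 0, in a cancellation-free form."""
    p = len(x)
    n1 = l1(x)
    a_sq = 2.0 * half_sq_norm(A, [v / n1 for v in x])  # ||A theta||_2^2
    return 2.0 * (p - 1) / (1.0 + math.sqrt(1.0 + 4.0 * a_sq * (p - 1)))

def independent_sampler(A, x0, steps, rng):
    """Metropolis-Hastings with i.i.d. Laplace proposals p(x) = 2^{-p} exp(-||x||_1)."""
    x, chain = list(x0), []
    for _ in range(steps):
        prop = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in x]
        # c / p is proportional to exp(-||Ax||^2 / 2), so only that term enters the ratio.
        log_ratio = half_sq_norm(A, x) - half_sq_norm(A, prop)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = prop
        chain.append(x)
    return chain

def random_walk(A, x0, steps, rng, variance=0.5):
    """Random-walk Metropolis with N(0, variance * I) increments."""
    sd = math.sqrt(variance)
    x, chain = list(x0), []
    for _ in range(steps):
        prop = [v + rng.gauss(0.0, sd) for v in x]
        log_ratio = (half_sq_norm(A, x) + l1(x)) - (half_sq_norm(A, prop) + l1(prop))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = prop
        chain.append(x)
    return chain

def criterion_rate(A, chain, q):
    """Fraction of iterations with ||x||_1 <= q r(theta)."""
    return sum(l1(x) <= q * radius(A, x) for x in chain) / len(chain)

rng = random.Random(0)
s = 1.0 / math.sqrt(2.0)
A = [[s, s, s], [s, -s, s]]  # n = 2, p = 3, entries +-1/sqrt(n) (illustrative)
x0 = [1.0, 1.0, 1.0]
chain_is = independent_sampler(A, x0, 20000, rng)
chain_rw = random_walk(A, x0, 20000, rng)
rate_is = criterion_rate(A, chain_is, 5.0)
rate_rw = criterion_rate(A, chain_rw, 5.0)
print(rate_is, rate_rw)
```

Plotting $t \mapsto q\, r(\theta^{(t)})$ against $t \mapsto \|x^{(t)}\|_1$ for each chain would give the kind of picture described for the convergence test; the rate computed here is only a one-number summary of it.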
Table 2: IS and RW estimators using $N = 10^6$ iterations (columns $x_1, \dots, x_7$; rows $\hat{x}_{IS}$ and $\hat{x}_{RW}$).

Figure 3: (a): Test of convergence of the MCMC algorithm with proposal distribution $p(x_2)$. (b): Test of convergence of the MCMC algorithm with $\mathcal{N}(0, .5 I_p)$ proposal distribution. $N = 10^6$ iterations.

9. Conclusion

We studied the geometry of the Bayesian LASSO using polar coordinates and calculated the partition function. We obtained a concentration inequality and derived an MCMC convergence diagnosis for the Hastings-Metropolis algorithm. We showed that the random walk MCMC with the variance .5 wins against the independent sampler with Laplace proposal distribution.
References

[1] K. Ball, Logarithmically concave functions and sections of convex sets in R^n, Stud. Math. LXXXVIII (1988).
[2] A. Beck, M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imag. Sci. 2 (2009).
[3] E.T. Copson, Asymptotic Expansions, Cambridge University Press.
[4] S. Chen, D.L. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM J. Sci. Comput. 20 (1) (1998).
[5] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure Appl. Math. 57 (11) (2004).
[6] A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis-Hastings and simulated annealing algorithms around LASSO estimator, Math. Comput. Simulation (2015).
[7] A. Erdélyi, Higher Transcendental Functions, California Institute of Technology, Bateman Manuscript Project, vol. 2, p. 119 (1953).
[8] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 32 (2) (2004).
[9] G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, arXiv preprint, v3 [math.ST] (2015).
[10] R.D. Gordon, Values of Mills' ratio of area to bounding ordinate and of the normal probability integral for large values of the argument, Ann. Math. Statist. 12 (1941).
[11] B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geom. Dedicata 112 (1) (2005).
[12] M. Mendoza, A. Allegra, T.P. Coleman, Bayesian Lasso Posterior Sampling Via Parallelized Measure Transport, arXiv:181.16v1 [stat.CO] (2018).
[13] K. Mengersen, R.L. Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, Ann. Statist. 24 (1996).
[14] N. Parikh, S. Boyd, Proximal algorithms, Found. Trends Optim. 1 (3) (2013).
[15] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. (2015).
[16] N.M. Temme, Numerical and asymptotic aspects of parabolic cylinder functions, Journal of Computational and Applied Mathematics (2000).
[17] G.O. Roberts, R.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 (1) (1996).
[18] R. Tibshirani, Regression shrinkage and selection via the Lasso, J. R. Stat. Soc. Ser. B Stat. Methodol. 58 (1) (1996).
[19] R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (2013).
Sequential Monte Carlo Methods in High Dimensions Alexandros Beskos Statistical Science, UCL Oxford, 24th September 2012 Joint work with: Dan Crisan, Ajay Jasra, Nik Kantas, Andrew Stuart Imperial College,
More informationThe Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA
The Adaptive Lasso and Its Oracle Properties Hui Zou (2006), JASA Presented by Dongjun Chung March 12, 2010 Introduction Definition Oracle Properties Computations Relationship: Nonnegative Garrote Extensions:
More informationA Dirichlet Form approach to MCMC Optimal Scaling
A Dirichlet Form approach to MCMC Optimal Scaling Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard. g.zanella@warwick.ac.uk, w.s.kendall@warwick.ac.uk, mylene.bedard@umontreal.ca Supported by EPSRC
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationMarkov Chain Monte Carlo Methods for Stochastic
Markov Chain Monte Carlo Methods for Stochastic Optimization i John R. Birge The University of Chicago Booth School of Business Joint work with Nicholas Polson, Chicago Booth. JRBirge U Florida, Nov 2013
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationSpectral Gap and Concentration for Some Spherically Symmetric Probability Measures
Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures S.G. Bobkov School of Mathematics, University of Minnesota, 127 Vincent Hall, 26 Church St. S.E., Minneapolis, MN 55455,
More informationGenerating Random Variables
Generating Random Variables Christian Robert Université Paris Dauphine and CREST, INSEE George Casella University of Florida Keywords and Phrases: Random Number Generator, Probability Integral Transform,
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationWeak convergence of Markov chain Monte Carlo II
Weak convergence of Markov chain Monte Carlo II KAMATANI, Kengo Mar 2011 at Le Mans Background Markov chain Monte Carlo (MCMC) method is widely used in Statistical Science. It is easy to use, but difficult
More informationPoint spread function reconstruction from the image of a sharp edge
DOE/NV/5946--49 Point spread function reconstruction from the image of a sharp edge John Bardsley, Kevin Joyce, Aaron Luttman The University of Montana National Security Technologies LLC Montana Uncertainty
More informationRiemann Manifold Methods in Bayesian Statistics
Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationIn English, this means that if we travel on a straight line between any two points in C, then we never leave C.
Convex sets In this section, we will be introduced to some of the mathematical fundamentals of convex sets. In order to motivate some of the definitions, we will look at the closest point problem from
More informationComputer Practical: Metropolis-Hastings-based MCMC
Computer Practical: Metropolis-Hastings-based MCMC Andrea Arnold and Franz Hamilton North Carolina State University July 30, 2016 A. Arnold / F. Hamilton (NCSU) MH-based MCMC July 30, 2016 1 / 19 Markov
More informationThe Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic
he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationON THE ZERO-ONE LAW AND THE LAW OF LARGE NUMBERS FOR RANDOM WALK IN MIXING RAN- DOM ENVIRONMENT
Elect. Comm. in Probab. 10 (2005), 36 44 ELECTRONIC COMMUNICATIONS in PROBABILITY ON THE ZERO-ONE LAW AND THE LAW OF LARGE NUMBERS FOR RANDOM WALK IN MIXING RAN- DOM ENVIRONMENT FIRAS RASSOUL AGHA Department
More informationAn Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models
Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong
More informationMarkov Chains and MCMC
Markov Chains and MCMC CompSci 590.02 Instructor: AshwinMachanavajjhala Lecture 4 : 590.02 Spring 13 1 Recap: Monte Carlo Method If U is a universe of items, and G is a subset satisfying some property,
More informationLeast squares under convex constraint
Stanford University Questions Let Z be an n-dimensional standard Gaussian random vector. Let µ be a point in R n and let Y = Z + µ. We are interested in estimating µ from the data vector Y, under the assumption
More informationOn matrix variate Dirichlet vectors
On matrix variate Dirichlet vectors Konstencia Bobecka Polytechnica, Warsaw, Poland Richard Emilion MAPMO, Université d Orléans, 45100 Orléans, France Jacek Wesolowski Polytechnica, Warsaw, Poland Abstract
More informationExamples of Adaptive MCMC
Examples of Adaptive MCMC by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September, 2006.) Abstract. We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters
More informationMCMC Sampling for Bayesian Inference using L1-type Priors
MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling
More informationAnomalous transport of particles in Plasma physics
Anomalous transport of particles in Plasma physics L. Cesbron a, A. Mellet b,1, K. Trivisa b, a École Normale Supérieure de Cachan Campus de Ker Lann 35170 Bruz rance. b Department of Mathematics, University
More informationAnalysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*
Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.
More informationThe two subset recurrent property of Markov chains
The two subset recurrent property of Markov chains Lars Holden, Norsk Regnesentral Abstract This paper proposes a new type of recurrence where we divide the Markov chains into intervals that start when
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationBayesian Analysis of Massive Datasets Via Particle Filters
Bayesian Analysis of Massive Datasets Via Particle Filters Bayesian Analysis Use Bayes theorem to learn about model parameters from data Examples: Clustered data: hospitals, schools Spatial models: public
More informationOn Bayesian Computation
On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationPure Mathematics Paper II
MATHEMATICS TUTORIALS H AL TARXIEN A Level 3 hours Pure Mathematics Question Paper This paper consists of five pages and ten questions. Check to see if any pages are missing. Answer any SEVEN questions.
More informationLecture 1: September 25
0-725: Optimization Fall 202 Lecture : September 25 Lecturer: Geoff Gordon/Ryan Tibshirani Scribes: Subhodeep Moitra Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have
More information