arxiv: v1 [math.st] 4 Dec 2015

Size: px
Start display at page:

Download "arxiv: v1 [math.st] 4 Dec 2015"

Transcription

1 MCMC convergence diagnosis using geometry of Bayesian LASSO A. Dermoune, D.Ounaissi, N.Rahmania Abstract arxiv: v1 [math.st] 4 Dec 015 Using posterior distribution of Bayesian LASSO we construct a semi-norm on the parameter space. We show that the partition function depends on the ratio of the l 1 and l norms and present three regimes. We derive the concentration of Bayesian LASSO, and present MCMC convergence diagnosis. keyword: LASSO, Bayes, MCMC, log-concave, geometry, incomplete Gamma function 1 Introduction Let p n be two positive integers, y R n and A be an n p matrix with real numbers entries. Bayesian LASSO c(x = 1 ( Z exp Ax y x 1 is a typically posterior distribution used in the linear regression Here y = Ax + w. ( Z = exp Ax y x 1 dx ( R p is the partition function, and 1 are respectively the Euclidean and the l 1 norms. The vector y R n are the observations, x R p is the unknown signal to recover, w R n is the standard Gaussian noise, and A is a known matrix which maps the signal domain R p into the observation domain R n. If we suppose that x is drawn from Laplace distribution i.e. the distribution proportional to (1 exp( x 1, (3 then the posterior of x known y is drawn from the distribution c (1. The mode { Ax y arg min + x 1 : x R p} (4 1

2 of c was first introduced in [14] and called LASSO. It is also called Basis Pursuit De-Noising method [4]. In our work we select the term LASSO and keep it for the rest of the article. In general LASSO is not a singleton, i.e. the mode of the distribution c is not unique. In this case LASSO is a set and we will denote by lasso any element of this set. A large number of theoretical results has been provided for LASSO. See [5], [6], [8], [1] and the references herein. The most popular algorithms to find LASSO are LARS algorithm [7], ISTA and FISTA algorithms see e.g. [] and the review article [10]. The aim of this work is to study geometry of bayesian LASSO and to derive MCMC convergence diagnosis. Polar integration Using polar coordinates, the partition function ( Z = J p (θdθ, (5 where dθ denotes the surface measure on the unit sphere S, and Here where J p (θ = 0 S exp( g(r, θr p 1 dr. (6 g(r, θ = 1 (r Aθ + r Aθ β + y, (7 β := θ 1 Aθ y s, (8 and s denotes the cosine of the angle (Aθ, y i.e. cos((aθ, y. Using known estimate θ θ 1, we observe that β is bounded below by 1 A y, (9 and β + as Aθ 0. Here A is the square root of the largest eigenvalue of A A. Observe that c(xdx = 1 Z exp( g(r, θrp 1 drdθ. Hence, we can sample from Bayesian LASSO c (1 as following. We draw uniformly θ := x x from the unit sphere, and then draw the norm x following the distribution µ θ (r := 1 exp( ϕ(r, θ, (10 J p (θ

3 where ϕ(r, θ := g(r, θ (p 1 ln(r, r > 0. (11 Moreover, observe that the modes {x lasso = r lasso θ lasso : g(r lasso, θ lasso = min r 0,θ S g(r, θ} and {(r, θ : ϕ(r, θ = min r 0,θ S ϕ(r, θ} respectively of the distributions c(xdx and 1 Z exp( g(r, θrp 1 drdθ are different. We will show that (r, θ contains more information than x lasso. 3 Geometric interpretation of the partition function The volume (Lebesgue measure of the set K(A, y := {x R p : J p (x 1} is vol(k(a, y = 1 p Z. Observe that J 1 p p is a norm on the null-space N(A of A. A general result [1] tells us that if f is even, log-concave and integrable on an Euclidean space E, then x E ( 0 f(rxr p 1 dr 1 p is a norm on E. It follows that in the case E = N(A or E = {x R p : Ax, y = 0}, the map is a norm on E. The map x E J 1 p p (x x R p J 1 p p (x := x LASSO has nearly all the properties of a norm. Only the evenness is missing. The set K(A, y = {x R p : x LASSO 1} is convex, compact and contains the origin. See [9] for more details. 3.1 Necessary and sufficient condition to have LASS0 = {0} If β 0, then r [0, + g(r, θ is increasing, its minimizer is equal to r = 0, and its smallest value is y. If β < 0, then its minimizer is equal β to r = Aθ, and its smallest value is less than y. If the set {β < 0} is empty, then LASSO = {0}, if not β l LASSO = {l = θ l : Aθ l β l 0, s.t. βl = sup β }. (1 β 0 As an illustration we consider the case n = 4, p = 7 and the entries of the matrix A B(± 1 n are a realization of i.i.d. Bernoulli random variables 3

4 x 1 x x 3 x 4 x 5 x 6 x 7 LASSO F IST A LASSO P OLAR Table 1: N = 10 5, p = 7, n = 4. with the values ± 1 n. We draw uniformly N = 10 5 vectors from the sphere S and estimate LASSO using Formula (1. Table 1 gives the value of LASSO using respectively FISTA algorithm and Formula (1. Observe that necessary and sufficient condition for β 0 for all θ is θ 1 y inf{ Aθ s : θ S}. Using known estimate θ θ 1, we obtain y 1 A as a sufficient condition for β 0 for all θ. 4 Closed form of the partition function We introduce for a R and for a couple p, r 1 of integers, the notations (a r = (a 1... (a r, (13 c(p, r = p 1 ( p 1 ( 1 p 1 k ( k + 1 k r. (14 k=0 Now, we can announce the following result. Proposition If β = +, then where If β 0, then θ p 1 J p(θ = (p 1! exp( y. θ p 1 J p(θ := Φ(β, Φ(β = exp( y (β + s y p p 1 k=0 k 1 exp( β k+1 Γ(, β ( p 1 k ( β p 1 k, (15 4

5 Here Γ(a, x = x exp( tt a 1 dt, a > 0, x 0, is the upper incomplete Gamma function. 3 If β < 0, then θ p 1 J p(θ = exp( y (β + s y p p 1 k=0 k 1 exp( β ( p 1 k ( β p 1 k k 1 (Γ( k+1 + ( 1k γ( k+1, β. (16 Here γ(a, x = x 0 exp( tta 1 dt is the lower incomplete Gamma function. where 4 If β = 0, then θ p 1 J p(θ = exp( y p y p sp Γ( p, 0. 5 If β > 0 then for M p + 1, θ p 1 J p(θ := Φ(β, M + R(β, M, Φ(β, M = (p 1! exp( y M 1 + p 1 exp( y ( 1 + y s pc(p, ( β r β and the remainder term r=p ( R(β, M 1 + y s p y exp( c(p, M β p 1 ( β M (p 1 (18 Proof 4.. Only the first part of assertions and 5 needs the proof. Let us prove the first part of. From the equality J p (θ = exp( y + β and the change of the variable 0 τ = Aθ r + β, exp( ( Aθ r + β r p 1 dr, p 1 r(17 we obtain 0 ( exp ( Aθ r + β r p 1 dr = = 1 Aθ p exp( τ β (τ βp 1 dτ 1 p 1 ( p 1 + ( β Aθ p p 1 k exp( τ k β τ k dτ. k=0 5

6 As β > 0, then the change of variable implies β ω = τ, exp( τ τ k dτ = k 1 k + 1 Γ(, β. The equality β = θ 1 Aθ y s achieves the proof of. Now we prove Assertion 5. We extend the incomplete Gamma function as following Γ(a, x = and we use known estimate see [3] page 14 x exp( tt a 1 dt, x > 0, a R, M 1 Γ(a, x = exp( xx a 1 + (a r exp( xx a 1 r + R(a, x, M, (19 r=1 where M > a 1, x > 0, a R, and the remainder term R(a, x, M (a M 1 exp( xx a 1 M. If β > 0, then from β = θ 1 Aθ y s, we have J p (θ = ( p 1 θ p exp( y 1 + y s β 1 ( exp( β β p ( β p 1 p 1 k=0 k 1 Γ( k+1, β. Using the expansion (19, and the fact that J p (θ (p 1! θ p 1 β +, we obtain for r < p 1 c(p, r = 0, 1 r < p 1, c(p, p 1 = It follows the following expansion: ( 1 + y s β ( p 1 k ( 1 p 1 k (p 1! p 1. p θ M 1 p 1 exp( y J p(θ = (p 1! + p 1 where the remainder term which achieves the proof. R(β, M p 1 c(p, M M (p 1, ( β r=p ( β exp( y as c(p, r r (p 1 + R(β, M, 6

7 4.1 Numerical calculations As an illustration we consider n = 4, p = 7, A B(± 1 n, and y = 0. The choice M = 17 corresponds to the relative error R(β,M Φ(β,M 10 4 for β 7.5. Figure 1: Curves of Φ(β and Φ(β, 17, β = [6, 45], p = 7, n = 4. Numerical calculations show that the function Φ(β (15 explodes for β > To compass these explosions we use the expansion (19, and then we use the function Φ(, M (17. In Fig.(1 and Fig.( dashed and black curves represent respectively the function Φ(, M and Φ. In Fig. (1 we plot Φ(, M and Φ for β (6, 45. By zooming on β (1.09, 7.5, β (7.5, 13.8, and β (13.8, 15 we show that the behavior of Φ becomes abnormal from β β Φ = 13.8 and obtain Fig.(. 4. The case LASS0={ 0 }: Partition function estimate and concentration inequality The following is a consequence of [9] Lemma.1. Proposition The function r (0, + ϕ(r, θ is convex and its unique critical point is the mode of (10. r(θ = β + β + 4(p 1 Aθ (0 7

8 ( By denoting M(θ = exp ϕ(r(θ, θ, we obtain M(θr(θ p J p (θ M(θr(θ(p 1! exp(p 1 (p 1 p. qr(θ 3 We have for q > 0, exp( ϕ(r, θdr pγ(p, (p 1q exp(p 1 (p 1 p 0 exp( ϕ(r, θdr, and Z min := S inf θ S M(θr(θ p Z S (p 1! exp(p 1 (p 1 p sup θ S M(θr(θ := Z max, where S denotes the surface of the unit sphere S. 4 If x is drawn from Bayesian LASSO distribution c (1, then for q > 0, x qr(θ with the probability at least equal to P (q, p := 1 pγ(p,(p 1q exp(p 1 (p 1 p exp( (p 1.. In particular for q = 5 we have pγ(p,(p 1q exp(p 1 (p 1 p Figure : Curves of Φ(β and Φ(β, 17, p = 7 and n = 4. Remark 4.4. If y = 0, then LASS0=0 and the mode (0 becomes r(θ θ 1 = β( β + β + 4(p 1. 8

9 Figure 3: Curve of r(θ θ 1, β min := 1 A = , β max = 44.5, min(r(θ θ 1 = , p = 7 and n = 4. In Fig.(3 we plot β [0.4987, 44.5] β( β+ β +4(p 1. The mode of the distribution of 1 Z exp( ϕ(r, θdrdθ is equal to arg min ϕ(r(θ, θ = r>0,θ S (r(θ, θ. As an illustration we consider p = 7, n = 4, A B(± 1 n. We draw uniformly N = 10 5 sample θ i S from the unit sphere S. For each i, we calculate ϕ(r(θ i, θ i, and we derive θ. Notice that β = θ 1 Aθ = β Φ is nearly equal to the beginning of abnormality of Φ. Using Formula (5 and Monte Carlo method, we obtain Z.14, Z min and Z max If we draw N = 10 5 vectors using Laplace distribution (3 and calculate the value of Z using Formula ( and Monte Carlo method, then we obtain Z < Z min. Hence Monte Carlo method using Formula (5 wins against Monte Carlo method using Formula (. 5 The case 0 / LASSO If 0 / LASSO, then the assertions of Proposition (4.3 are no longer valid. But we are going to show that these assertions becomes valid if we work 9

10 around LASSO. We consider for l LASSO, h(x = A(x + l y x + l 1, h(x := h(x h(0, f(x = exp ( h(x. (1 Contrary to the map x c(x, the map x f(x attains its supremum at the origin. Observe that f(x l c(x = R f(xdx, p Z = exp(h(0z f := exp(h(0 f(xdx. R p If x is drawn from c, then x l is drawn from f Z f. Moreover where Z f = f(xdx = R p J p (θ, l := 0 S J p (θ, ldθ, ( f(rθr p 1 dr. (3 The map x R p J 1 p p (x, l := x A,y,l is nearly a norm (only the eveness is missing. The set K(A, y, l = {x R p : x A,y,l 1} (4 is convex, compact and contains the origin. The volume V ol(k(a, y, l = Z f p. If x is drawn from c, then x l is drawn from f Z f, or equivalently if x is drawn from f Z f, then x + l is drawn from c. To draw x from f Z f, we draw θ = uniformely on S, and then we draw x from x x µ θ,l (r := f(rθ J p (θ, l. (5 We have from [9] Lemma.1 and Remarks page 14 the following result. 10

11 Proposition The function r (0, + ϕ(r, θ, l := A(rθ + l y + rθ + l 1 (p 1 ln(r + h(0 is convex and its unique critical point ( r(θ, l is the mode of (5. By denoting M(θ, l = exp ϕ(r(θ, l, θ, l, we obtain M(θ, lr(θ, l p J p (θ, l M(θ, lr(θ, l(p 1! exp(p 1 (p 1 p, and for q > 0, exp( ϕ(r, θ, ldr qr(θ,l pγ(p, (p 1q exp(p 1 (p 1 p 0 exp( ϕ(r, θ, ldr. 3 We have S inf θ S M(θ, lr(θ, l p Z f S (p 1! exp(p 1 (p 1 p sup M(θ, lr(θ, l. θ S 4 if x is drawn from the distribution c (1, then x l qr(θ, l with the probability at least equal to 1 pγ(p,(p 1q exp(p 1 (p 1 p. 5.1 Calculation of the mode of (5 and the partition function (3 Now, we are going to calculate the mode r(θ, l, and the partition function J p (θ, l. The calculations are similar to the case LASSO={0}, but we need new notations. The vector y l = y Al, s l = cos(θ l where θ l denotes the angle (Aθ, y l, b l = y l s l. The components of the vector l LASSO are denoted by l 1,..., l p. For θ R p, we set S 0 = {i {1,..., p} : θ i = 0}, S + = {i {1,..., p} : θ i 0, θ i l i 0}, S = {i {1,..., p} : θ i 0, θ i l i < 0}. The cardinality of S is denoted by S, and the order statistic of the sequence l i θ i, for i S is denoted by lθ(0 := 0 lθ(1 := l θ (1... lθ( S := l θ ( S lθ( S + 1 := +. Using these new notations, we obtain rθ + l 1 = l i + θ i (r + l i + θ i r l i θ i θ i. i S 0 i S + i S 11

12 If lθ(k r < lθ(k + 1, then where c k := i S 0 l i + θ 1,k := rθ + l 1 = θ 1,k r + c k, i S + θ i + i S + l i k k i=0,i S l(i + i=0,i S θ(i Observe that θ 1, S = θ 1, and if Aθ = 0 then J p (θ, l = exp( y l Now, we have the following. S lθ(k+1 k=0 lθ(k S i=k+1 S l(i, i=k+1,i S θ(i. exp( θ 1,k rr p 1 dr. Proposition 5.. If Aθ = 0, then exp( y l J p (θ, l = exp( c k ((lθ(k + 1 p (lθ(k p + p k I 1 (θ ( exp( c k Γ(p, lθ(k Γ(p, lθ(k + 1, (6 where k I (θ θ p 1,k I 1 (θ = {k {0,..., S }, such that θ 1,k = 0}, I (θ = {k {0,..., S }, such that θ 1,k > 0}. Now, we are going to give the closed form of J p (θ, l when Aθ 0. We observe for lθ(k r < lθ(k + 1 that where raθ + Al y + rθ + l 1 = α k + ( Aθ r + β k, β k = θ 1,k Aθ b l, α k = Al y β k + c k. 1

13 Moreover if k I 1 (θ, then β k = b l. Observe also that β S is bounded below by y l sup(s l : θ S. It follows that J p (θ, l = exp( α k J p,k (θ, l + exp( α k J p,k (θ, l, where k I 1 (θ J p,k (θ, l = The calculation of lθ(k+1 lθ(k lθ(k+1 lθ(k k I (θ exp( ( Aθ r + β k r p 1 dr. exp( ( Aθ r + β k r p 1 dr is similar to Proposition (4.1, and depends on the sign of x k = Aθ lθ(k + β k, y k = Aθ lθ(k β k. Let k 0 = max(k : x k < 0 and k 1 = min(k : y k > 0. Observe that k k 1, and then y k 0 for k < k 1 k It follows for k < k 1 that J p,k (θ, l = 1 p 1 Aθ p j=0 j 1 ( p 1 If k 1 < k 0 + 1, then x k1 < 0 < y k1 J p,k1 (θ, l = and for k > k 1, J p,k (θ, l = 1 Aθ p p 1 j=0 1 Aθ p j 1 p 1 j=0 p 1 j=0 j j 1 ( p 1 j j 1 ( β k p 1( γ( j + 1, x k γ(j + 1, y k. and ( p 1 j ( β k p 1 j γ( j + 1, y k + 1 Aθ p ( β k p 1 γ( j + 1, x k, ( p 1 If k 1 = k 0 + 1, then 0 x k1 < y k1, and for k k 1, J p,k (θ, l = 1 p 1 Aθ p j=0 j 1 ( p 1 j j ( β k p 1( γ( j + 1, y k γ(j + 1, x k. ( β k p 1( γ( j + 1, y k γ(j + 1, x k. Now we can show that J p (θ, l converges to (6 as Aθ 0, and we obtain an approximation similar to (19 for J p,k (θ, l as Aθ 0 for each k I 1 (θ. 13

14 6 MCMC diagnosis Here we take p = 7, n = 4, A B(± 1 n and for simplicity we consider y = 0. We sample from the distribution c (1 using Hastings-Metropolis algorithm (x (t and propose the test x (t qr(θ (t as a criterion for the convergence. Here θ (t := x(t. We recall that if x is drawn from the x (t target distribution c, then x qr(θ with the probability at least equal to P (q, p. Table gives the values of the probability P (q, p. Note that for q.5 the criterion x (t qr(θ (t is satisfied with a large probability. q P (q, p Table : Values of the probability P (q, p for p = Independent sampler (IS The proposal distribution Q(x, x 1 = p(x = 1 p exp( x 1, x 1, x. The ratio c(x p(x p Z, x. It s known that MCMC (x (t with the target distribution c and the proposal distribution p is uniformly ergodic [11]: sup P(x (t A x (0 c(xdx (1 Z A B(R p A p t. Here Z.14 and then (1 Z p = Figure 4(a shows respectively the plot of t 5r(θ (t and t x (t. 6. Random-walk (RW Metropolis algorithm We do not know if the target distribution c satisfies the curvature condition in [13] Section 6. Here we propose to analyse the convergence of the Random walk Metropolis algorithm (x (t using the criterion x (t qr(θ (t. Figure 4(b shows respectively the plot of t 5r(θ (t and t x (t. Figures 4 show that contrary to independent sampler algorithm, the random walk (RW algorithm satisfies early the criterion x (t 5r(θ. More precisely 14

15 1 N 1 N 1 the independent sampler (IS algorithm begins to satisfy the criterion x (t 5r(θ (t at t = iteration. The RW algorithm begins to satisfy the criterion x (t 3.5r(θ (t at t = iteration, but the IS algorithm never satisfies the criterion x (t 3.5r(θ (t. We finally compare IS and RW algorithms using the fact that R xc(xdx = p 0. The best algorithm will furnish the best approximation of the integral R xc(xdx. Table 3 gives the estimators 1 N p N t=1 x(t IS R xc(xdx and p N t=1 x(t RW R xc(xdx. It follows that 1 N p N t=1 x(t IS = and N t=1 x(t RW = We conclude that the random walk algorithm wins for both criteria against independent sampler algorithm. x 1 x x 3 x 4 x 5 x 6 x 7 ˆx IS ˆx RW Table 3: N = 10 6, p = 7, n = 4 and q = 5. Figure 4: (a: Test of convergence of MCMC algorithm with proposal distribution p(x. (b: Test of convergence of MCMC algorithm with N (0, 0.5I p proposal distribution. N = 10 6 iterations, p = 7, n = 4, q = 5, A B(± 1 n, y = 0 and l = 0. 15

16 7 Conclusion We studied the geometry of bayesian LASSO using polar coordinates and calculated the partition function. We obtained a concentration inequality and derived MCMC convergence diagnosis for the convergence of hasting metropolis algorithm. We showed that the random walk MCMC with the variance 0.5 wins again the independent sampler with the Laplace proposal distribution. 8 References References [1] K. Ball, Logarithmically concave functions and sections of convex sets in R n, Studia Mathematica, T. LXXXVIII ( [] A. Beck, M. Teboulle, A Fast Iterative Shrinkage-thresholding Algorithm for linear Inverse Problem, SIAM J. Imaging Sci. ( [3] E.T. Copson, Asymptotic Expansions, Cambridge at the university press (1965. [4] S. Chen, D. L. Donoho, M. Saunders. Atomic decomposition by basis pursuit, SIAM J. Sci. Computing, Vol. 0 No. 1 ( [5] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics Vol. LVII (004, [6] A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis- Hasting and simulated annealing algorithms around penalized least squares estimator, Math. Comput.Simulation (015. [7] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 37 ( [8] G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, arxiv: (015. [9] B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geometriae Dedicata, Volume 11 Issue 1 ( [10] N. Parikh, S. Boyed, Proximal algorithms, Foundation and Trends in Optimization, 1 (3 (

17 [11] K. Mengersen, R.L Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, The Annals of Statistics (1994 4: [1] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. ( [13] G.0. Roberts, A.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 1 ( [14] R. Tibshirani, Regression shrinkage and selection via Lasso, Journal of the Royal Statistical Society. Series B. Methodological 58 1 ( [15] R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (

arxiv: v1 [math.st] 22 Feb 2018

arxiv: v1 [math.st] 22 Feb 2018 Bayesian Lasso : Concentration and MCMC Diagnosis arxiv:18.857v1 [math.st] Feb 18 D.Ounaissi 1, N.Rahmania 1 Laboratoire LAMA, UFR de Mathématiques, Université de Paris 1, France. Laboratoire Paul Painlevé,

More information

Optimization methods

Optimization methods Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

Geometric ergodicity of the Bayesian lasso

Geometric ergodicity of the Bayesian lasso Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components

More information

Markov Chain Monte Carlo (MCMC)

Markov Chain Monte Carlo (MCMC) Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can

More information

Lecture 8: Bayesian Estimation of Parameters in State Space Models

Lecture 8: Bayesian Estimation of Parameters in State Space Models in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space

More information

Monte Carlo methods for sampling-based Stochastic Optimization

Monte Carlo methods for sampling-based Stochastic Optimization Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from

More information

Computational statistics

Computational statistics Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated

More information

Perturbed Proximal Gradient Algorithm

Perturbed Proximal Gradient Algorithm Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing

More information

Integrated Non-Factorized Variational Inference

Integrated Non-Factorized Variational Inference Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Lecture 8: The Metropolis-Hastings Algorithm

Lecture 8: The Metropolis-Hastings Algorithm 30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003

Statistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003 Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

6 Markov Chain Monte Carlo (MCMC)

6 Markov Chain Monte Carlo (MCMC) 6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS

ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS The Annals of Applied Probability 1998, Vol. 8, No. 4, 1291 1302 ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS By Gareth O. Roberts 1 and Jeffrey S. Rosenthal 2 University of Cambridge

More information

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely

More information

17 : Markov Chain Monte Carlo

17 : Markov Chain Monte Carlo 10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo

More information

Part III. A Decision-Theoretic Approach and Bayesian testing

Part III. A Decision-Theoretic Approach and Bayesian testing Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information

Stochastic Proximal Gradient Algorithm

Stochastic Proximal Gradient Algorithm Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind

More information

Quantitative Non-Geometric Convergence Bounds for Independence Samplers

Quantitative Non-Geometric Convergence Bounds for Independence Samplers Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)

More information

Bayesian Nonparametric Point Estimation Under a Conjugate Prior

Bayesian Nonparametric Point Estimation Under a Conjugate Prior University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda

More information

Inference in state-space models with multiple paths from conditional SMC

Inference in state-space models with multiple paths from conditional SMC Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September

More information

Weak convergence of Markov chain Monte Carlo II

Weak convergence of Markov chain Monte Carlo II Weak convergence of Markov chain Monte Carlo II KAMATANI, Kengo Mar 2011 at Le Mans Background Markov chain Monte Carlo (MCMC) method is widely used in Statistical Science. It is easy to use, but difficult

More information

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique

Master 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some

More information

Reconstruction from Anisotropic Random Measurements

Reconstruction from Anisotropic Random Measurements Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013

More information

Monte Carlo in Bayesian Statistics

Monte Carlo in Bayesian Statistics Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

University of Toronto Department of Statistics

University of Toronto Department of Statistics Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704

More information

Multimodal Nested Sampling

Multimodal Nested Sampling Multimodal Nested Sampling Farhan Feroz Astrophysics Group, Cavendish Lab, Cambridge Inverse Problems & Cosmology Most obvious example: standard CMB data analysis pipeline But many others: object detection,

More information

7. Estimation and hypothesis testing. Objective. Recommended reading

7. Estimation and hypothesis testing. Objective. Recommended reading 7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011

LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011 LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic

More information

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo

Winter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

arxiv: v2 [math.st] 12 Feb 2008

arxiv: v2 [math.st] 12 Feb 2008 arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig

More information

Optimisation Combinatoire et Convexe.

Optimisation Combinatoire et Convexe. Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix

More information

Nonlinear Programming Models

Nonlinear Programming Models Nonlinear Programming Models Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Nonlinear Programming Models p. Introduction Nonlinear Programming Models p. NLP problems minf(x) x S R n Standard form:

More information

arxiv: v1 [math.st] 19 Dec 2013

arxiv: v1 [math.st] 19 Dec 2013 A Shrinkage-Thresholding Metropolis adjusted Langevin algorithm for Bayesian Variable Selection Amandine Schreck arxiv:131.5658v1 [math.st] 19 Dec 13 LTCI, Télécom ParisTech and CNRS, Paris, France. Gersende

More information

The Metropolis-Hastings Algorithm. June 8, 2012

The Metropolis-Hastings Algorithm. June 8, 2012 The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings

More information

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation

Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of

More information

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.

Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J. Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox fox@physics.otago.ac.nz Richard A. Norton, J. Andrés Christen Topics... Backstory (?) Sampling in linear-gaussian hierarchical

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo

Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Assaf Weiner Tuesday, March 13, 2007 1 Introduction Today we will return to the motif finding problem, in lecture 10

More information

On isotropicity with respect to a measure

On isotropicity with respect to a measure On isotropicity with respect to a measure Liran Rotem Abstract A body is said to be isoptropic with respect to a measure µ if the function θ x, θ dµ(x) is constant on the unit sphere. In this note, we

More information

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*

Analysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal* Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.

More information

Lecture 2: From Linear Regression to Kalman Filter and Beyond

Lecture 2: From Linear Regression to Kalman Filter and Beyond Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10 COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Some Results on the Ergodicity of Adaptive MCMC Algorithms

Some Results on the Ergodicity of Adaptive MCMC Algorithms Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship

More information

I. Bayesian econometrics

I. Bayesian econometrics I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods

More information

SC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.

More information

INTRODUCTION TO BAYESIAN STATISTICS

INTRODUCTION TO BAYESIAN STATISTICS INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Or How to select variables Using Bayesian LASSO

Or How to select variables Using Bayesian LASSO Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection

More information

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning

April 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions

More information

Controlled sequential Monte Carlo

Controlled sequential Monte Carlo Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation

More information

Reversible Markov chains

Reversible Markov chains Reversible Markov chains Variational representations and ordering Chris Sherlock Abstract This pedagogical document explains three variational representations that are useful when comparing the efficiencies

More information

A Dirichlet Form approach to MCMC Optimal Scaling

A Dirichlet Form approach to MCMC Optimal Scaling A Dirichlet Form approach to MCMC Optimal Scaling Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard. g.zanella@warwick.ac.uk, w.s.kendall@warwick.ac.uk, mylene.bedard@umontreal.ca Supported by EPSRC

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Monte Carlo Methods. Leon Gu CSD, CMU

Monte Carlo Methods. Leon Gu CSD, CMU Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Markov chain Monte Carlo

Markov chain Monte Carlo Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Log-concave sampling: Metropolis-Hastings algorithms are fast!

Log-concave sampling: Metropolis-Hastings algorithms are fast! Proceedings of Machine Learning Research vol 75:1 5, 2018 31st Annual Conference on Learning Theory Log-concave sampling: Metropolis-Hastings algorithms are fast! Raaz Dwivedi Department of Electrical

More information

Bayesian Inference for DSGE Models. Lawrence J. Christiano

Bayesian Inference for DSGE Models. Lawrence J. Christiano Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.

More information

Advances and Applications in Perfect Sampling

Advances and Applications in Perfect Sampling and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC

More information

Kobe University Repository : Kernel

Kobe University Repository : Kernel Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

Adaptive Monte Carlo methods

Adaptive Monte Carlo methods Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert

More information

Stat 516, Homework 1

Stat 516, Homework 1 Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball

More information

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC

Stat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline

More information

Lasso: Algorithms and Extensions

Lasso: Algorithms and Extensions ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions

More information

Inverse Problems in the Bayesian Framework

Inverse Problems in the Bayesian Framework Inverse Problems in the Bayesian Framework Daniela Calvetti Case Western Reserve University Cleveland, Ohio Raleigh, NC, July 2016 Bayes Formula Stochastic model: Two random variables X R n, B R m, where

More information

Adaptive Population Monte Carlo

Adaptive Population Monte Carlo Adaptive Population Monte Carlo Olivier Cappé Centre Nat. de la Recherche Scientifique & Télécom Paris 46 rue Barrault, 75634 Paris cedex 13, France http://www.tsi.enst.fr/~cappe/ Recent Advances in Monte

More information

Example: Ground Motion Attenuation

Example: Ground Motion Attenuation Example: Ground Motion Attenuation Problem: Predict the probability distribution for Peak Ground Acceleration (PGA), the level of ground shaking caused by an earthquake Earthquake records are used to update

More information

The lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding

The lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding Patrick Breheny February 15 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/24 Introduction Last week, we introduced penalized regression and discussed ridge regression, in which the penalty

More information

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis

ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Asymptotics of minimax stochastic programs

Asymptotics of minimax stochastic programs Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming

More information

First Hitting Time Analysis of the Independence Metropolis Sampler

First Hitting Time Analysis of the Independence Metropolis Sampler (To appear in the Journal for Theoretical Probability) First Hitting Time Analysis of the Independence Metropolis Sampler Romeo Maciuca and Song-Chun Zhu In this paper, we study a special case of the Metropolis

More information

arxiv: v1 [stat.co] 18 Feb 2012

arxiv: v1 [stat.co] 18 Feb 2012 A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania

More information

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris

Simulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Simulation of truncated normal variables Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Abstract arxiv:0907.4010v1 [stat.co] 23 Jul 2009 We provide in this paper simulation algorithms

More information

A Geometric Interpretation of the Metropolis Hastings Algorithm

A Geometric Interpretation of the Metropolis Hastings Algorithm Statistical Science 2, Vol. 6, No., 5 9 A Geometric Interpretation of the Metropolis Hastings Algorithm Louis J. Billera and Persi Diaconis Abstract. The Metropolis Hastings algorithm transforms a given

More information

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J

r=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J 7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured

More information

Probabilistic Graphical Models

Probabilistic Graphical Models 2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector

More information

1 Sparsity and l 1 relaxation

1 Sparsity and l 1 relaxation 6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the

More information

The lasso, persistence, and cross-validation

The lasso, persistence, and cross-validation The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University

More information

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics

Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,

More information