arxiv: v1 [math.st] 4 Dec 2015
|
|
- Cecilia Peters
- 6 years ago
- Views:
Transcription
1 MCMC convergence diagnosis using geometry of Bayesian LASSO A. Dermoune, D.Ounaissi, N.Rahmania Abstract arxiv: v1 [math.st] 4 Dec 015 Using posterior distribution of Bayesian LASSO we construct a semi-norm on the parameter space. We show that the partition function depends on the ratio of the l 1 and l norms and present three regimes. We derive the concentration of Bayesian LASSO, and present MCMC convergence diagnosis. keyword: LASSO, Bayes, MCMC, log-concave, geometry, incomplete Gamma function 1 Introduction Let p n be two positive integers, y R n and A be an n p matrix with real numbers entries. Bayesian LASSO c(x = 1 ( Z exp Ax y x 1 is a typically posterior distribution used in the linear regression Here y = Ax + w. ( Z = exp Ax y x 1 dx ( R p is the partition function, and 1 are respectively the Euclidean and the l 1 norms. The vector y R n are the observations, x R p is the unknown signal to recover, w R n is the standard Gaussian noise, and A is a known matrix which maps the signal domain R p into the observation domain R n. If we suppose that x is drawn from Laplace distribution i.e. the distribution proportional to (1 exp( x 1, (3 then the posterior of x known y is drawn from the distribution c (1. The mode { Ax y arg min + x 1 : x R p} (4 1
2 of c was first introduced in [14] and called LASSO. It is also called Basis Pursuit De-Noising method [4]. In our work we select the term LASSO and keep it for the rest of the article. In general LASSO is not a singleton, i.e. the mode of the distribution c is not unique. In this case LASSO is a set and we will denote by lasso any element of this set. A large number of theoretical results has been provided for LASSO. See [5], [6], [8], [1] and the references herein. The most popular algorithms to find LASSO are LARS algorithm [7], ISTA and FISTA algorithms see e.g. [] and the review article [10]. The aim of this work is to study geometry of bayesian LASSO and to derive MCMC convergence diagnosis. Polar integration Using polar coordinates, the partition function ( Z = J p (θdθ, (5 where dθ denotes the surface measure on the unit sphere S, and Here where J p (θ = 0 S exp( g(r, θr p 1 dr. (6 g(r, θ = 1 (r Aθ + r Aθ β + y, (7 β := θ 1 Aθ y s, (8 and s denotes the cosine of the angle (Aθ, y i.e. cos((aθ, y. Using known estimate θ θ 1, we observe that β is bounded below by 1 A y, (9 and β + as Aθ 0. Here A is the square root of the largest eigenvalue of A A. Observe that c(xdx = 1 Z exp( g(r, θrp 1 drdθ. Hence, we can sample from Bayesian LASSO c (1 as following. We draw uniformly θ := x x from the unit sphere, and then draw the norm x following the distribution µ θ (r := 1 exp( ϕ(r, θ, (10 J p (θ
3 where ϕ(r, θ := g(r, θ (p 1 ln(r, r > 0. (11 Moreover, observe that the modes {x lasso = r lasso θ lasso : g(r lasso, θ lasso = min r 0,θ S g(r, θ} and {(r, θ : ϕ(r, θ = min r 0,θ S ϕ(r, θ} respectively of the distributions c(xdx and 1 Z exp( g(r, θrp 1 drdθ are different. We will show that (r, θ contains more information than x lasso. 3 Geometric interpretation of the partition function The volume (Lebesgue measure of the set K(A, y := {x R p : J p (x 1} is vol(k(a, y = 1 p Z. Observe that J 1 p p is a norm on the null-space N(A of A. A general result [1] tells us that if f is even, log-concave and integrable on an Euclidean space E, then x E ( 0 f(rxr p 1 dr 1 p is a norm on E. It follows that in the case E = N(A or E = {x R p : Ax, y = 0}, the map is a norm on E. The map x E J 1 p p (x x R p J 1 p p (x := x LASSO has nearly all the properties of a norm. Only the evenness is missing. The set K(A, y = {x R p : x LASSO 1} is convex, compact and contains the origin. See [9] for more details. 3.1 Necessary and sufficient condition to have LASS0 = {0} If β 0, then r [0, + g(r, θ is increasing, its minimizer is equal to r = 0, and its smallest value is y. If β < 0, then its minimizer is equal β to r = Aθ, and its smallest value is less than y. If the set {β < 0} is empty, then LASSO = {0}, if not β l LASSO = {l = θ l : Aθ l β l 0, s.t. βl = sup β }. (1 β 0 As an illustration we consider the case n = 4, p = 7 and the entries of the matrix A B(± 1 n are a realization of i.i.d. Bernoulli random variables 3
4 x 1 x x 3 x 4 x 5 x 6 x 7 LASSO F IST A LASSO P OLAR Table 1: N = 10 5, p = 7, n = 4. with the values ± 1 n. We draw uniformly N = 10 5 vectors from the sphere S and estimate LASSO using Formula (1. Table 1 gives the value of LASSO using respectively FISTA algorithm and Formula (1. Observe that necessary and sufficient condition for β 0 for all θ is θ 1 y inf{ Aθ s : θ S}. Using known estimate θ θ 1, we obtain y 1 A as a sufficient condition for β 0 for all θ. 4 Closed form of the partition function We introduce for a R and for a couple p, r 1 of integers, the notations (a r = (a 1... (a r, (13 c(p, r = p 1 ( p 1 ( 1 p 1 k ( k + 1 k r. (14 k=0 Now, we can announce the following result. Proposition If β = +, then where If β 0, then θ p 1 J p(θ = (p 1! exp( y. θ p 1 J p(θ := Φ(β, Φ(β = exp( y (β + s y p p 1 k=0 k 1 exp( β k+1 Γ(, β ( p 1 k ( β p 1 k, (15 4
5 Here Γ(a, x = x exp( tt a 1 dt, a > 0, x 0, is the upper incomplete Gamma function. 3 If β < 0, then θ p 1 J p(θ = exp( y (β + s y p p 1 k=0 k 1 exp( β ( p 1 k ( β p 1 k k 1 (Γ( k+1 + ( 1k γ( k+1, β. (16 Here γ(a, x = x 0 exp( tta 1 dt is the lower incomplete Gamma function. where 4 If β = 0, then θ p 1 J p(θ = exp( y p y p sp Γ( p, 0. 5 If β > 0 then for M p + 1, θ p 1 J p(θ := Φ(β, M + R(β, M, Φ(β, M = (p 1! exp( y M 1 + p 1 exp( y ( 1 + y s pc(p, ( β r β and the remainder term r=p ( R(β, M 1 + y s p y exp( c(p, M β p 1 ( β M (p 1 (18 Proof 4.. Only the first part of assertions and 5 needs the proof. Let us prove the first part of. From the equality J p (θ = exp( y + β and the change of the variable 0 τ = Aθ r + β, exp( ( Aθ r + β r p 1 dr, p 1 r(17 we obtain 0 ( exp ( Aθ r + β r p 1 dr = = 1 Aθ p exp( τ β (τ βp 1 dτ 1 p 1 ( p 1 + ( β Aθ p p 1 k exp( τ k β τ k dτ. k=0 5
6 As β > 0, then the change of variable implies β ω = τ, exp( τ τ k dτ = k 1 k + 1 Γ(, β. The equality β = θ 1 Aθ y s achieves the proof of. Now we prove Assertion 5. We extend the incomplete Gamma function as following Γ(a, x = and we use known estimate see [3] page 14 x exp( tt a 1 dt, x > 0, a R, M 1 Γ(a, x = exp( xx a 1 + (a r exp( xx a 1 r + R(a, x, M, (19 r=1 where M > a 1, x > 0, a R, and the remainder term R(a, x, M (a M 1 exp( xx a 1 M. If β > 0, then from β = θ 1 Aθ y s, we have J p (θ = ( p 1 θ p exp( y 1 + y s β 1 ( exp( β β p ( β p 1 p 1 k=0 k 1 Γ( k+1, β. Using the expansion (19, and the fact that J p (θ (p 1! θ p 1 β +, we obtain for r < p 1 c(p, r = 0, 1 r < p 1, c(p, p 1 = It follows the following expansion: ( 1 + y s β ( p 1 k ( 1 p 1 k (p 1! p 1. p θ M 1 p 1 exp( y J p(θ = (p 1! + p 1 where the remainder term which achieves the proof. R(β, M p 1 c(p, M M (p 1, ( β r=p ( β exp( y as c(p, r r (p 1 + R(β, M, 6
7 4.1 Numerical calculations As an illustration we consider n = 4, p = 7, A B(± 1 n, and y = 0. The choice M = 17 corresponds to the relative error R(β,M Φ(β,M 10 4 for β 7.5. Figure 1: Curves of Φ(β and Φ(β, 17, β = [6, 45], p = 7, n = 4. Numerical calculations show that the function Φ(β (15 explodes for β > To compass these explosions we use the expansion (19, and then we use the function Φ(, M (17. In Fig.(1 and Fig.( dashed and black curves represent respectively the function Φ(, M and Φ. In Fig. (1 we plot Φ(, M and Φ for β (6, 45. By zooming on β (1.09, 7.5, β (7.5, 13.8, and β (13.8, 15 we show that the behavior of Φ becomes abnormal from β β Φ = 13.8 and obtain Fig.(. 4. The case LASS0={ 0 }: Partition function estimate and concentration inequality The following is a consequence of [9] Lemma.1. Proposition The function r (0, + ϕ(r, θ is convex and its unique critical point is the mode of (10. r(θ = β + β + 4(p 1 Aθ (0 7
8 ( By denoting M(θ = exp ϕ(r(θ, θ, we obtain M(θr(θ p J p (θ M(θr(θ(p 1! exp(p 1 (p 1 p. qr(θ 3 We have for q > 0, exp( ϕ(r, θdr pγ(p, (p 1q exp(p 1 (p 1 p 0 exp( ϕ(r, θdr, and Z min := S inf θ S M(θr(θ p Z S (p 1! exp(p 1 (p 1 p sup θ S M(θr(θ := Z max, where S denotes the surface of the unit sphere S. 4 If x is drawn from Bayesian LASSO distribution c (1, then for q > 0, x qr(θ with the probability at least equal to P (q, p := 1 pγ(p,(p 1q exp(p 1 (p 1 p exp( (p 1.. In particular for q = 5 we have pγ(p,(p 1q exp(p 1 (p 1 p Figure : Curves of Φ(β and Φ(β, 17, p = 7 and n = 4. Remark 4.4. If y = 0, then LASS0=0 and the mode (0 becomes r(θ θ 1 = β( β + β + 4(p 1. 8
9 Figure 3: Curve of r(θ θ 1, β min := 1 A = , β max = 44.5, min(r(θ θ 1 = , p = 7 and n = 4. In Fig.(3 we plot β [0.4987, 44.5] β( β+ β +4(p 1. The mode of the distribution of 1 Z exp( ϕ(r, θdrdθ is equal to arg min ϕ(r(θ, θ = r>0,θ S (r(θ, θ. As an illustration we consider p = 7, n = 4, A B(± 1 n. We draw uniformly N = 10 5 sample θ i S from the unit sphere S. For each i, we calculate ϕ(r(θ i, θ i, and we derive θ. Notice that β = θ 1 Aθ = β Φ is nearly equal to the beginning of abnormality of Φ. Using Formula (5 and Monte Carlo method, we obtain Z.14, Z min and Z max If we draw N = 10 5 vectors using Laplace distribution (3 and calculate the value of Z using Formula ( and Monte Carlo method, then we obtain Z < Z min. Hence Monte Carlo method using Formula (5 wins against Monte Carlo method using Formula (. 5 The case 0 / LASSO If 0 / LASSO, then the assertions of Proposition (4.3 are no longer valid. But we are going to show that these assertions becomes valid if we work 9
10 around LASSO. We consider for l LASSO, h(x = A(x + l y x + l 1, h(x := h(x h(0, f(x = exp ( h(x. (1 Contrary to the map x c(x, the map x f(x attains its supremum at the origin. Observe that f(x l c(x = R f(xdx, p Z = exp(h(0z f := exp(h(0 f(xdx. R p If x is drawn from c, then x l is drawn from f Z f. Moreover where Z f = f(xdx = R p J p (θ, l := 0 S J p (θ, ldθ, ( f(rθr p 1 dr. (3 The map x R p J 1 p p (x, l := x A,y,l is nearly a norm (only the eveness is missing. The set K(A, y, l = {x R p : x A,y,l 1} (4 is convex, compact and contains the origin. The volume V ol(k(a, y, l = Z f p. If x is drawn from c, then x l is drawn from f Z f, or equivalently if x is drawn from f Z f, then x + l is drawn from c. To draw x from f Z f, we draw θ = uniformely on S, and then we draw x from x x µ θ,l (r := f(rθ J p (θ, l. (5 We have from [9] Lemma.1 and Remarks page 14 the following result. 10
11 Proposition The function r (0, + ϕ(r, θ, l := A(rθ + l y + rθ + l 1 (p 1 ln(r + h(0 is convex and its unique critical point ( r(θ, l is the mode of (5. By denoting M(θ, l = exp ϕ(r(θ, l, θ, l, we obtain M(θ, lr(θ, l p J p (θ, l M(θ, lr(θ, l(p 1! exp(p 1 (p 1 p, and for q > 0, exp( ϕ(r, θ, ldr qr(θ,l pγ(p, (p 1q exp(p 1 (p 1 p 0 exp( ϕ(r, θ, ldr. 3 We have S inf θ S M(θ, lr(θ, l p Z f S (p 1! exp(p 1 (p 1 p sup M(θ, lr(θ, l. θ S 4 if x is drawn from the distribution c (1, then x l qr(θ, l with the probability at least equal to 1 pγ(p,(p 1q exp(p 1 (p 1 p. 5.1 Calculation of the mode of (5 and the partition function (3 Now, we are going to calculate the mode r(θ, l, and the partition function J p (θ, l. The calculations are similar to the case LASSO={0}, but we need new notations. The vector y l = y Al, s l = cos(θ l where θ l denotes the angle (Aθ, y l, b l = y l s l. The components of the vector l LASSO are denoted by l 1,..., l p. For θ R p, we set S 0 = {i {1,..., p} : θ i = 0}, S + = {i {1,..., p} : θ i 0, θ i l i 0}, S = {i {1,..., p} : θ i 0, θ i l i < 0}. The cardinality of S is denoted by S, and the order statistic of the sequence l i θ i, for i S is denoted by lθ(0 := 0 lθ(1 := l θ (1... lθ( S := l θ ( S lθ( S + 1 := +. Using these new notations, we obtain rθ + l 1 = l i + θ i (r + l i + θ i r l i θ i θ i. i S 0 i S + i S 11
12 If lθ(k r < lθ(k + 1, then where c k := i S 0 l i + θ 1,k := rθ + l 1 = θ 1,k r + c k, i S + θ i + i S + l i k k i=0,i S l(i + i=0,i S θ(i Observe that θ 1, S = θ 1, and if Aθ = 0 then J p (θ, l = exp( y l Now, we have the following. S lθ(k+1 k=0 lθ(k S i=k+1 S l(i, i=k+1,i S θ(i. exp( θ 1,k rr p 1 dr. Proposition 5.. If Aθ = 0, then exp( y l J p (θ, l = exp( c k ((lθ(k + 1 p (lθ(k p + p k I 1 (θ ( exp( c k Γ(p, lθ(k Γ(p, lθ(k + 1, (6 where k I (θ θ p 1,k I 1 (θ = {k {0,..., S }, such that θ 1,k = 0}, I (θ = {k {0,..., S }, such that θ 1,k > 0}. Now, we are going to give the closed form of J p (θ, l when Aθ 0. We observe for lθ(k r < lθ(k + 1 that where raθ + Al y + rθ + l 1 = α k + ( Aθ r + β k, β k = θ 1,k Aθ b l, α k = Al y β k + c k. 1
13 Moreover if k I 1 (θ, then β k = b l. Observe also that β S is bounded below by y l sup(s l : θ S. It follows that J p (θ, l = exp( α k J p,k (θ, l + exp( α k J p,k (θ, l, where k I 1 (θ J p,k (θ, l = The calculation of lθ(k+1 lθ(k lθ(k+1 lθ(k k I (θ exp( ( Aθ r + β k r p 1 dr. exp( ( Aθ r + β k r p 1 dr is similar to Proposition (4.1, and depends on the sign of x k = Aθ lθ(k + β k, y k = Aθ lθ(k β k. Let k 0 = max(k : x k < 0 and k 1 = min(k : y k > 0. Observe that k k 1, and then y k 0 for k < k 1 k It follows for k < k 1 that J p,k (θ, l = 1 p 1 Aθ p j=0 j 1 ( p 1 If k 1 < k 0 + 1, then x k1 < 0 < y k1 J p,k1 (θ, l = and for k > k 1, J p,k (θ, l = 1 Aθ p p 1 j=0 1 Aθ p j 1 p 1 j=0 p 1 j=0 j j 1 ( p 1 j j 1 ( β k p 1( γ( j + 1, x k γ(j + 1, y k. and ( p 1 j ( β k p 1 j γ( j + 1, y k + 1 Aθ p ( β k p 1 γ( j + 1, x k, ( p 1 If k 1 = k 0 + 1, then 0 x k1 < y k1, and for k k 1, J p,k (θ, l = 1 p 1 Aθ p j=0 j 1 ( p 1 j j ( β k p 1( γ( j + 1, y k γ(j + 1, x k. ( β k p 1( γ( j + 1, y k γ(j + 1, x k. Now we can show that J p (θ, l converges to (6 as Aθ 0, and we obtain an approximation similar to (19 for J p,k (θ, l as Aθ 0 for each k I 1 (θ. 13
14 6 MCMC diagnosis Here we take p = 7, n = 4, A B(± 1 n and for simplicity we consider y = 0. We sample from the distribution c (1 using Hastings-Metropolis algorithm (x (t and propose the test x (t qr(θ (t as a criterion for the convergence. Here θ (t := x(t. We recall that if x is drawn from the x (t target distribution c, then x qr(θ with the probability at least equal to P (q, p. Table gives the values of the probability P (q, p. Note that for q.5 the criterion x (t qr(θ (t is satisfied with a large probability. q P (q, p Table : Values of the probability P (q, p for p = Independent sampler (IS The proposal distribution Q(x, x 1 = p(x = 1 p exp( x 1, x 1, x. The ratio c(x p(x p Z, x. It s known that MCMC (x (t with the target distribution c and the proposal distribution p is uniformly ergodic [11]: sup P(x (t A x (0 c(xdx (1 Z A B(R p A p t. Here Z.14 and then (1 Z p = Figure 4(a shows respectively the plot of t 5r(θ (t and t x (t. 6. Random-walk (RW Metropolis algorithm We do not know if the target distribution c satisfies the curvature condition in [13] Section 6. Here we propose to analyse the convergence of the Random walk Metropolis algorithm (x (t using the criterion x (t qr(θ (t. Figure 4(b shows respectively the plot of t 5r(θ (t and t x (t. Figures 4 show that contrary to independent sampler algorithm, the random walk (RW algorithm satisfies early the criterion x (t 5r(θ. More precisely 14
15 1 N 1 N 1 the independent sampler (IS algorithm begins to satisfy the criterion x (t 5r(θ (t at t = iteration. The RW algorithm begins to satisfy the criterion x (t 3.5r(θ (t at t = iteration, but the IS algorithm never satisfies the criterion x (t 3.5r(θ (t. We finally compare IS and RW algorithms using the fact that R xc(xdx = p 0. The best algorithm will furnish the best approximation of the integral R xc(xdx. Table 3 gives the estimators 1 N p N t=1 x(t IS R xc(xdx and p N t=1 x(t RW R xc(xdx. It follows that 1 N p N t=1 x(t IS = and N t=1 x(t RW = We conclude that the random walk algorithm wins for both criteria against independent sampler algorithm. x 1 x x 3 x 4 x 5 x 6 x 7 ˆx IS ˆx RW Table 3: N = 10 6, p = 7, n = 4 and q = 5. Figure 4: (a: Test of convergence of MCMC algorithm with proposal distribution p(x. (b: Test of convergence of MCMC algorithm with N (0, 0.5I p proposal distribution. N = 10 6 iterations, p = 7, n = 4, q = 5, A B(± 1 n, y = 0 and l = 0. 15
16 7 Conclusion We studied the geometry of bayesian LASSO using polar coordinates and calculated the partition function. We obtained a concentration inequality and derived MCMC convergence diagnosis for the convergence of hasting metropolis algorithm. We showed that the random walk MCMC with the variance 0.5 wins again the independent sampler with the Laplace proposal distribution. 8 References References [1] K. Ball, Logarithmically concave functions and sections of convex sets in R n, Studia Mathematica, T. LXXXVIII ( [] A. Beck, M. Teboulle, A Fast Iterative Shrinkage-thresholding Algorithm for linear Inverse Problem, SIAM J. Imaging Sci. ( [3] E.T. Copson, Asymptotic Expansions, Cambridge at the university press (1965. [4] S. Chen, D. L. Donoho, M. Saunders. Atomic decomposition by basis pursuit, SIAM J. Sci. Computing, Vol. 0 No. 1 ( [5] I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Communications on Pure and Applied Mathematics Vol. LVII (004, [6] A. Dermoune, D. Ounaissi, N. Rahmania, Oscillation of Metropolis- Hasting and simulated annealing algorithms around penalized least squares estimator, Math. Comput.Simulation (015. [7] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression, Ann. Statist. 37 ( [8] G. Fort, S. Le Corff, E. Moulines, A. Schreck, A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection, arxiv: (015. [9] B. Klartag, V.D. Milman, Geometry of log-concave functions and measures, Geometriae Dedicata, Volume 11 Issue 1 ( [10] N. Parikh, S. Boyed, Proximal algorithms, Foundation and Trends in Optimization, 1 (3 (
17 [11] K. Mengersen, R.L Tweedie, Rates of convergence of the Hastings and Metropolis algorithms, The Annals of Statistics (1994 4: [1] M. Pereyra, Proximal Markov chain Monte Carlo algorithms, Stat. Comput. ( [13] G.0. Roberts, A.L. Tweedie, Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms, Biometrika 83 1 ( [14] R. Tibshirani, Regression shrinkage and selection via Lasso, Journal of the Royal Statistical Society. Series B. Methodological 58 1 ( [15] R. Tibshirani, The Lasso problem and uniqueness, Electron. J. Stat. 7 (
arxiv: v1 [math.st] 22 Feb 2018
Bayesian Lasso : Concentration and MCMC Diagnosis arxiv:18.857v1 [math.st] Feb 18 D.Ounaissi 1, N.Rahmania 1 Laboratoire LAMA, UFR de Mathématiques, Université de Paris 1, France. Laboratoire Paul Painlevé,
More informationOptimization methods
Optimization methods Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda /8/016 Introduction Aim: Overview of optimization methods that Tend to
More informationOptimization methods
Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,
More informationGeometric ergodicity of the Bayesian lasso
Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationLecture 8: Bayesian Estimation of Parameters in State Space Models
in State Space Models March 30, 2016 Contents 1 Bayesian estimation of parameters in state space models 2 Computational methods for parameter estimation 3 Practical parameter estimation in state space
More informationMonte Carlo methods for sampling-based Stochastic Optimization
Monte Carlo methods for sampling-based Stochastic Optimization Gersende FORT LTCI CNRS & Telecom ParisTech Paris, France Joint works with B. Jourdain, T. Lelièvre, G. Stoltz from ENPC and E. Kuhn from
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationPerturbed Proximal Gradient Algorithm
Perturbed Proximal Gradient Algorithm Gersende FORT LTCI, CNRS, Telecom ParisTech Université Paris-Saclay, 75013, Paris, France Large-scale inverse problems and optimization Applications to image processing
More informationIntegrated Non-Factorized Variational Inference
Integrated Non-Factorized Variational Inference Shaobo Han, Xuejun Liao and Lawrence Carin Duke University February 27, 2014 S. Han et al. Integrated Non-Factorized Variational Inference February 27, 2014
More informationSparsity Regularization
Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation
More informationLecture 8: The Metropolis-Hastings Algorithm
30.10.2008 What we have seen last time: Gibbs sampler Key idea: Generate a Markov chain by updating the component of (X 1,..., X p ) in turn by drawing from the full conditionals: X (t) j Two drawbacks:
More informationBayesian Inference and MCMC
Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationStatistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003
Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)
More informationMarkov Chain Monte Carlo methods
Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationSparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda
Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic
More informationON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS
The Annals of Applied Probability 1998, Vol. 8, No. 4, 1291 1302 ON CONVERGENCE RATES OF GIBBS SAMPLERS FOR UNIFORM DISTRIBUTIONS By Gareth O. Roberts 1 and Jeffrey S. Rosenthal 2 University of Cambridge
More informationThe Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic
he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely
More information17 : Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models, Spring 2015 17 : Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Heran Lin, Bin Deng, Yun Huang 1 Review of Monte Carlo Methods 1.1 Overview Monte Carlo
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationQuantitative Non-Geometric Convergence Bounds for Independence Samplers
Quantitative Non-Geometric Convergence Bounds for Independence Samplers by Gareth O. Roberts * and Jeffrey S. Rosenthal ** (September 28; revised July 29.) 1. Introduction. Markov chain Monte Carlo (MCMC)
More informationBayesian Nonparametric Point Estimation Under a Conjugate Prior
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda
More informationInference in state-space models with multiple paths from conditional SMC
Inference in state-space models with multiple paths from conditional SMC Sinan Yıldırım (Sabancı) joint work with Christophe Andrieu (Bristol), Arnaud Doucet (Oxford) and Nicolas Chopin (ENSAE) September
More informationWeak convergence of Markov chain Monte Carlo II
Weak convergence of Markov chain Monte Carlo II KAMATANI, Kengo Mar 2011 at Le Mans Background Markov chain Monte Carlo (MCMC) method is widely used in Statistical Science. It is easy to use, but difficult
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationReconstruction from Anisotropic Random Measurements
Reconstruction from Anisotropic Random Measurements Mark Rudelson and Shuheng Zhou The University of Michigan, Ann Arbor Coding, Complexity, and Sparsity Workshop, 013 Ann Arbor, Michigan August 7, 013
More informationMonte Carlo in Bayesian Statistics
Monte Carlo in Bayesian Statistics Matthew Thomas SAMBa - University of Bath m.l.thomas@bath.ac.uk December 4, 2014 Matthew Thomas (SAMBa) Monte Carlo in Bayesian Statistics December 4, 2014 1 / 16 Overview
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationThe Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model
Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for
More informationUniversity of Toronto Department of Statistics
Norm Comparisons for Data Augmentation by James P. Hobert Department of Statistics University of Florida and Jeffrey S. Rosenthal Department of Statistics University of Toronto Technical Report No. 0704
More informationMultimodal Nested Sampling
Multimodal Nested Sampling Farhan Feroz Astrophysics Group, Cavendish Lab, Cambridge Inverse Problems & Cosmology Most obvious example: standard CMB data analysis pipeline But many others: object detection,
More information7. Estimation and hypothesis testing. Objective. Recommended reading
7. Estimation and hypothesis testing Objective In this chapter, we show how the election of estimators can be represented as a decision problem. Secondly, we consider the problem of hypothesis testing
More informationParameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn
Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Temperature fluctuations Variance at multipole l (angle ~180o/l) C. Porciani Estimation
More informationTractable Upper Bounds on the Restricted Isometry Constant
Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationLARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS. S. G. Bobkov and F. L. Nazarov. September 25, 2011
LARGE DEVIATIONS OF TYPICAL LINEAR FUNCTIONALS ON A CONVEX BODY WITH UNCONDITIONAL BASIS S. G. Bobkov and F. L. Nazarov September 25, 20 Abstract We study large deviations of linear functionals on an isotropic
More informationWinter 2019 Math 106 Topics in Applied Mathematics. Lecture 9: Markov Chain Monte Carlo
Winter 2019 Math 106 Topics in Applied Mathematics Data-driven Uncertainty Quantification Yoonsang Lee (yoonsang.lee@dartmouth.edu) Lecture 9: Markov Chain Monte Carlo 9.1 Markov Chain A Markov Chain Monte
More informationCSC 2541: Bayesian Methods for Machine Learning
CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 3 More Markov Chain Monte Carlo Methods The Metropolis algorithm isn t the only way to do MCMC. We ll
More informationReview. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda
Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with
More informationarxiv: v2 [math.st] 12 Feb 2008
arxiv:080.460v2 [math.st] 2 Feb 2008 Electronic Journal of Statistics Vol. 2 2008 90 02 ISSN: 935-7524 DOI: 0.24/08-EJS77 Sup-norm convergence rate and sign concentration property of Lasso and Dantzig
More informationOptimisation Combinatoire et Convexe.
Optimisation Combinatoire et Convexe. Low complexity models, l 1 penalties. A. d Aspremont. M1 ENS. 1/36 Today Sparsity, low complexity models. l 1 -recovery results: three approaches. Extensions: matrix
More informationNonlinear Programming Models
Nonlinear Programming Models Fabio Schoen 2008 http://gol.dsi.unifi.it/users/schoen Nonlinear Programming Models p. Introduction Nonlinear Programming Models p. NLP problems minf(x) x S R n Standard form:
More informationarxiv: v1 [math.st] 19 Dec 2013
A Shrinkage-Thresholding Metropolis adjusted Langevin algorithm for Bayesian Variable Selection Amandine Schreck arxiv:131.5658v1 [math.st] 19 Dec 13 LTCI, Télécom ParisTech and CNRS, Paris, France. Gersende
More informationThe Metropolis-Hastings Algorithm. June 8, 2012
The Metropolis-Hastings Algorithm June 8, 22 The Plan. Understand what a simulated distribution is 2. Understand why the Metropolis-Hastings algorithm works 3. Learn how to apply the Metropolis-Hastings
More informationMessage Passing Algorithms for Compressed Sensing: II. Analysis and Validation
Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of
More informationDeblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox Richard A. Norton, J.
Deblurring Jupiter (sampling in GLIP faster than regularized inversion) Colin Fox fox@physics.otago.ac.nz Richard A. Norton, J. Andrés Christen Topics... Backstory (?) Sampling in linear-gaussian hierarchical
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationIntroduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo
Introduction to Computational Biology Lecture # 14: MCMC - Markov Chain Monte Carlo Assaf Weiner Tuesday, March 13, 2007 1 Introduction Today we will return to the motif finding problem, in lecture 10
More informationOn isotropicity with respect to a measure
On isotropicity with respect to a measure Liran Rotem Abstract A body is said to be isoptropic with respect to a measure µ if the function θ x, θ dµ(x) is constant on the unit sphere. In this note, we
More informationAnalysis of the Gibbs sampler for a model. related to James-Stein estimators. Jeffrey S. Rosenthal*
Analysis of the Gibbs sampler for a model related to James-Stein estimators by Jeffrey S. Rosenthal* Department of Statistics University of Toronto Toronto, Ontario Canada M5S 1A1 Phone: 416 978-4594.
More informationLecture 2: From Linear Regression to Kalman Filter and Beyond
Lecture 2: From Linear Regression to Kalman Filter and Beyond Department of Biomedical Engineering and Computational Science Aalto University January 26, 2012 Contents 1 Batch and Recursive Estimation
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationPattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods
Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 10
COS53: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 0 MELISSA CARROLL, LINJIE LUO. BIAS-VARIANCE TRADE-OFF (CONTINUED FROM LAST LECTURE) If V = (X n, Y n )} are observed data, the linear regression problem
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationSome Results on the Ergodicity of Adaptive MCMC Algorithms
Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship
More informationI. Bayesian econometrics
I. Bayesian econometrics A. Introduction B. Bayesian inference in the univariate regression model C. Statistical decision theory D. Large sample results E. Diffuse priors F. Numerical Bayesian methods
More informationSC7/SM6 Bayes Methods HT18 Lecturer: Geoff Nicholls Lecture 2: Monte Carlo Methods Notes and Problem sheets are available at http://www.stats.ox.ac.uk/~nicholls/bayesmethods/ and via the MSc weblearn pages.
More informationINTRODUCTION TO BAYESIAN STATISTICS
INTRODUCTION TO BAYESIAN STATISTICS Sarat C. Dass Department of Statistics & Probability Department of Computer Science & Engineering Michigan State University TOPICS The Bayesian Framework Different Types
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationApril 20th, Advanced Topics in Machine Learning California Institute of Technology. Markov Chain Monte Carlo for Machine Learning
for for Advanced Topics in California Institute of Technology April 20th, 2017 1 / 50 Table of Contents for 1 2 3 4 2 / 50 History of methods for Enrico Fermi used to calculate incredibly accurate predictions
More informationControlled sequential Monte Carlo
Controlled sequential Monte Carlo Jeremy Heng, Department of Statistics, Harvard University Joint work with Adrian Bishop (UTS, CSIRO), George Deligiannidis & Arnaud Doucet (Oxford) Bayesian Computation
More informationReversible Markov chains
Reversible Markov chains Variational representations and ordering Chris Sherlock Abstract This pedagogical document explains three variational representations that are useful when comparing the efficiencies
More informationA Dirichlet Form approach to MCMC Optimal Scaling
A Dirichlet Form approach to MCMC Optimal Scaling Giacomo Zanella, Wilfrid S. Kendall, and Mylène Bédard. g.zanella@warwick.ac.uk, w.s.kendall@warwick.ac.uk, mylene.bedard@umontreal.ca Supported by EPSRC
More informationBayesian Regression Linear and Logistic Regression
When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we
More informationMonte Carlo Methods. Leon Gu CSD, CMU
Monte Carlo Methods Leon Gu CSD, CMU Approximate Inference EM: y-observed variables; x-hidden variables; θ-parameters; E-step: q(x) = p(x y, θ t 1 ) M-step: θ t = arg max E q(x) [log p(y, x θ)] θ Monte
More informationAnswers and expectations
Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E
More informationMarkov chain Monte Carlo
Markov chain Monte Carlo Karl Oskar Ekvall Galin L. Jones University of Minnesota March 12, 2019 Abstract Practically relevant statistical models often give rise to probability distributions that are analytically
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationLog-concave sampling: Metropolis-Hastings algorithms are fast!
Proceedings of Machine Learning Research vol 75:1 5, 2018 31st Annual Conference on Learning Theory Log-concave sampling: Metropolis-Hastings algorithms are fast! Raaz Dwivedi Department of Electrical
More informationBayesian Inference for DSGE Models. Lawrence J. Christiano
Bayesian Inference for DSGE Models Lawrence J. Christiano Outline State space-observer form. convenient for model estimation and many other things. Bayesian inference Bayes rule. Monte Carlo integation.
More informationAdvances and Applications in Perfect Sampling
and Applications in Perfect Sampling Ph.D. Dissertation Defense Ulrike Schneider advisor: Jem Corcoran May 8, 2003 Department of Applied Mathematics University of Colorado Outline Introduction (1) MCMC
More informationKobe University Repository : Kernel
Kobe University Repository : Kernel タイトル Title 著者 Author(s) 掲載誌 巻号 ページ Citation 刊行日 Issue date 資源タイプ Resource Type 版区分 Resource Version 権利 Rights DOI URL Note on the Sampling Distribution for the Metropolis-
More informationOn John type ellipsoids
On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to
More informationSTA 4273H: Sta-s-cal Machine Learning
STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our
More informationAdaptive Monte Carlo methods
Adaptive Monte Carlo methods Jean-Michel Marin Projet Select, INRIA Futurs, Université Paris-Sud joint with Randal Douc (École Polytechnique), Arnaud Guillin (Université de Marseille) and Christian Robert
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationLasso: Algorithms and Extensions
ELE 538B: Sparsity, Structure and Inference Lasso: Algorithms and Extensions Yuxin Chen Princeton University, Spring 2017 Outline Proximal operators Proximal gradient methods for lasso and its extensions
More informationInverse Problems in the Bayesian Framework
Inverse Problems in the Bayesian Framework Daniela Calvetti Case Western Reserve University Cleveland, Ohio Raleigh, NC, July 2016 Bayes Formula Stochastic model: Two random variables X R n, B R m, where
More informationAdaptive Population Monte Carlo
Adaptive Population Monte Carlo Olivier Cappé Centre Nat. de la Recherche Scientifique & Télécom Paris 46 rue Barrault, 75634 Paris cedex 13, France http://www.tsi.enst.fr/~cappe/ Recent Advances in Monte
More informationExample: Ground Motion Attenuation
Example: Ground Motion Attenuation Problem: Predict the probability distribution for Peak Ground Acceleration (PGA), the level of ground shaking caused by an earthquake Earthquake records are used to update
More informationThe lasso. Patrick Breheny. February 15. The lasso Convex optimization Soft thresholding
Patrick Breheny February 15 Patrick Breheny High-Dimensional Data Analysis (BIOS 7600) 1/24 Introduction Last week, we introduced penalized regression and discussed ridge regression, in which the penalty
More informationECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis
ECE 8201: Low-dimensional Signal Models for High-dimensional Data Analysis Lecture 7: Matrix completion Yuejie Chi The Ohio State University Page 1 Reference Guaranteed Minimum-Rank Solutions of Linear
More informationSTA414/2104 Statistical Methods for Machine Learning II
STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More informationFirst Hitting Time Analysis of the Independence Metropolis Sampler
(To appear in the Journal for Theoretical Probability) First Hitting Time Analysis of the Independence Metropolis Sampler Romeo Maciuca and Song-Chun Zhu In this paper, we study a special case of the Metropolis
More informationarxiv: v1 [stat.co] 18 Feb 2012
A LEVEL-SET HIT-AND-RUN SAMPLER FOR QUASI-CONCAVE DISTRIBUTIONS Dean Foster and Shane T. Jensen arxiv:1202.4094v1 [stat.co] 18 Feb 2012 Department of Statistics The Wharton School University of Pennsylvania
More informationSimulation of truncated normal variables. Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris
Simulation of truncated normal variables Christian P. Robert LSTA, Université Pierre et Marie Curie, Paris Abstract arxiv:0907.4010v1 [stat.co] 23 Jul 2009 We provide in this paper simulation algorithms
More informationA Geometric Interpretation of the Metropolis Hastings Algorithm
Statistical Science 2, Vol. 6, No., 5 9 A Geometric Interpretation of the Metropolis Hastings Algorithm Louis J. Billera and Persi Diaconis Abstract. The Metropolis Hastings algorithm transforms a given
More informationr=1 r=1 argmin Q Jt (20) After computing the descent direction d Jt 2 dt H t d + P (x + d) d i = 0, i / J
7 Appendix 7. Proof of Theorem Proof. There are two main difficulties in proving the convergence of our algorithm, and none of them is addressed in previous works. First, the Hessian matrix H is a block-structured
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More information1 Sparsity and l 1 relaxation
6.883 Learning with Combinatorial Structure Note for Lecture 2 Author: Chiyuan Zhang Sparsity and l relaxation Last time we talked about sparsity and characterized when an l relaxation could recover the
More informationThe lasso, persistence, and cross-validation
The lasso, persistence, and cross-validation Daniel J. McDonald Department of Statistics Indiana University http://www.stat.cmu.edu/ danielmc Joint work with: Darren Homrighausen Colorado State University
More informationRegularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics
Regularization Parameter Selection for a Bayesian Multi-Level Group Lasso Regression Model with Application to Imaging Genomics arxiv:1603.08163v1 [stat.ml] 7 Mar 016 Farouk S. Nathoo, Keelin Greenlaw,
More information