Bayesian Statistics in High Dimensions


1 Bayesian Statistics in High Dimensions Lecture 2: Sparsity Aad van der Vaart Universiteit Leiden, Netherlands 47th John H. Barrett Memorial Lectures, Knoxville, Tennessee, May 2017

2 Contents Sparsity Bayesian Sparsity Frequentist Bayes Model Selection Prior Horseshoe Prior

3 Sparsity

4 Sparsity sequence model A sparse model has many parameters, but most of them are (nearly) zero.

5 Sparsity sequence model A sparse model has many parameters, but most of them are (nearly) zero. In this lecture, (Bayesian) theory for: $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. $n$ independent observations $Y^n_1,\ldots,Y^n_n$. $n$ unknowns $\theta_1,\ldots,\theta_n$. $Y^n_i = \theta_i + \varepsilon_i$, for standard normal noise $\varepsilon_i$. $n$ is large. Many of $\theta_1,\ldots,\theta_n$ are (almost) zero.
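A minimal simulation of this sequence model may help fix ideas; the sketch below (plain NumPy, with illustrative values of $n$ and $s_n$) draws a sparse mean vector and one observation $Y^n = \theta + \varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, s_n = 500, 25               # illustrative sizes: n parameters, s_n nonzero
theta = np.zeros(n)
theta[:s_n] = 5.0              # a few big signals; the rest exactly zero

eps = rng.standard_normal(n)   # standard normal noise
Y = theta + eps                # Y_i = theta_i + eps_i, i = 1, ..., n
```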

6 History Given $Y^n \mid \theta \sim N_n(\theta, I)$ the obvious estimator of $\theta$ is $Y^n$: ML, UMVU, best-equivariant, minimax

7 History Given $Y^n \mid \theta \sim N_n(\theta, I)$ the obvious estimator of $\theta$ is $Y^n$: ML, UMVU, best-equivariant, minimax, but inadmissible. Theorem [Stein, 1956] If $n \ge 3$, then there exists $T$ such that for every $\theta \in \mathbb{R}^n$, $E_\theta\|T(Y^n) - \theta\|^2 < E_\theta\|Y^n - \theta\|^2$.

8 History Given $Y^n \mid \theta \sim N_n(\theta, I)$ the obvious estimator of $\theta$ is $Y^n$: ML, UMVU, best-equivariant, minimax, but inadmissible. Theorem [Stein, 1956] If $n \ge 3$, then there exists $T$ such that for every $\theta \in \mathbb{R}^n$, $E_\theta\|T(Y^n) - \theta\|^2 < E_\theta\|Y^n - \theta\|^2$. Empirical Bayes method [Robbins, 1960s]: Working hypothesis: $\theta_1,\ldots,\theta_n$ iid $\sim G$. Estimate $G$ using $Y^n$, pretending this is true. $T(Y^n) := E_{\hat G(Y^n)}(\theta \mid Y^n)$.

9 History Given $Y^n \mid \theta \sim N_n(\theta, I)$ the obvious estimator of $\theta$ is $Y^n$: ML, UMVU, best-equivariant, minimax, but inadmissible. Theorem [Stein, 1956] If $n \ge 3$, then there exists $T$ such that for every $\theta \in \mathbb{R}^n$, $E_\theta\|T(Y^n) - \theta\|^2 < E_\theta\|Y^n - \theta\|^2$. Empirical Bayes method [Robbins, 1960s]: Working hypothesis: $\theta_1,\ldots,\theta_n$ iid $\sim G$. Estimate $G$ using $Y^n$, pretending this is true. $T(Y^n) := E_{\hat G(Y^n)}(\theta \mid Y^n)$. For large $n$ the gain can be substantial ("borrowing of strength"), and can be targeted to special subsets of $\mathbb{R}^n$, e.g. sparse vectors.
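One concrete instance of Robbins's scheme, with the working prior $G = N(0, \sigma^2)$, admits a closed form: marginally $Y_i \sim N(0, \sigma^2 + 1)$, so $\sigma^2$ can be estimated by moments, and the resulting posterior mean shrinks $Y$ linearly, a James-Stein-type rule. A sketch under this Gaussian working hypothesis (not Robbins's general construction):

```python
import numpy as np

def eb_posterior_mean(y):
    """Empirical Bayes posterior mean under the working model
    theta_i iid N(0, sigma2) and Y_i | theta_i ~ N(theta_i, 1).
    Marginally Y_i ~ N(0, sigma2 + 1), so sigma2 is estimated by
    moments; E(theta_i | Y_i) then shrinks Y_i linearly toward 0."""
    sigma2_hat = max(np.mean(y**2) - 1.0, 0.0)  # moment estimate of var(theta)
    shrink = sigma2_hat / (sigma2_hat + 1.0)
    return shrink * y
```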

10 Sparsity regression $Y^n \mid \theta \sim N_n(X_{n\times p}\,\theta, \sigma^2 I)$, for $\theta = (\theta_1,\ldots,\theta_p) \in \mathbb{R}^p$. $Y^n_i$: measurement on individual $i = 1,\ldots,n$. $X_{ij}$: score of individual $i$ on feature $j = 1,\ldots,p$. $\theta_j$: effect of feature $j$. Sparse if only few features matter. If $p > n$, then sparsity is necessary to recover $\theta$, and $X$ must be sparse-invertible, e.g.: Compatibility: $\inf_{\theta:\,|S_\theta| \le 5 s_n} \frac{\|X\theta\|_2 \sqrt{|S_\theta|}}{\|X\|\,\|\theta\|_1} \gg 0$. Mutual coherence: $s_n \max_{i \ne j} |\mathrm{cor}(X_{\cdot i}, X_{\cdot j})| \ll 1$. $s_n$ = true number of nonzero coordinates.

11 Sparsity RNA sequencing $Y_{i,j}$: RNA expression count of tag $j = 1,\ldots,p$ in tissue $i = 1,\ldots,n$. $x_i$: covariate(s) of tissue $i$, e.g. 0 or 1 for normal or cancer. Sparse if only few tags (genes) matter. $Y_{i,j} \sim$ (zero-inflated) negative binomial, with $EY_{i,j} = e^{\alpha_j + \beta_j x_i}$, $\mathrm{var}\,Y_{i,j} = EY_{i,j}\bigl(1 + EY_{i,j}\,e^{\phi_j}\bigr)$. The distribution of the $\beta_j$'s and $\phi_j$'s is estimated by empirical Bayes. [Smyth & Robinson et al., van de Wiel, Leday, van Wieringen, & vdv et al., 12]

12 Sparsity Gaussian graphical model Data: $n$ iid copies of $Y^p \mid \theta \sim N_p(0, \theta^{-1})$. $Y^p_j$ = value of individual on feature $j$. The precision matrix $\theta$ gives the partial correlations: $\mathrm{cor}\bigl(Y^p_i, Y^p_j \mid Y^p_k : k \ne i,j\bigr) = -\theta_{i,j}/\sqrt{\theta_{i,i}\,\theta_{j,j}}$. Nodes $1,2,\ldots,p$; edge $(i,j)$ present iff $\theta_{i,j} \ne 0$; sparse if few edges. [Figure: apoptosis network.]
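Given a precision matrix, the partial correlations and the edge set follow directly from the displayed identity; a small sketch:

```python
import numpy as np

def partial_correlations(theta):
    """cor(Y_i, Y_j | rest) = -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def edge_set(theta, tol=1e-10):
    """Edges of the graph: (i, j), i < j, present iff theta_ij != 0."""
    p = theta.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(theta[i, j]) > tol]
```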

13 Bayesian Sparsity

14 Bayesian sparsity A sparse model has many parameters, but most of them are (nearly) zero.

15 Bayesian sparsity A sparse model has many parameters, but most of them are (nearly) zero. We express this in the prior, and apply the standard (full or empirical) Bayesian machine.

16 Bayesian sparsity A sparse model has many parameters, but most of them are (nearly) zero. We express this in the prior, and apply the standard (full or empirical) Bayesian machine. Prior $\theta \sim \Pi$ and data $Y^n \mid \theta \sim p(\cdot \mid \theta)$ give the posterior: $d\Pi(\theta \mid Y^n) \propto p(Y^n \mid \theta)\, d\Pi(\theta)$.

17 Bayesian sparsity Gaussian graphical model Nodes $1,2,\ldots,p$; edge $(i,j)$ present iff $\theta_{i,j} \ne 0$; sparse if few edges. [Figure: apoptosis network.] For a given incidence matrix $(P_{i,j})$ use different priors for $(\theta_{i,j}: P_{i,j} = 0)$ and $(\theta_{i,j}: P_{i,j} = 1)$. [Leday, Kpogbezan, vdv, van Wieringen, van de Wiel (2016, 17).]

18 Model selection prior Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^p$: (1) Choose $s$ from a prior on $\{0,1,2,\ldots,p\}$. (2) Choose $S \subset \{1,\ldots,p\}$ of size $|S| = s$ at random. (3) Choose $\theta_S = (\theta_i: i \in S) \sim g_S$ and set $\theta_{S^c} = 0$. [Mitchell & Beauchamp (88), George, George & McCulloch, Yuan, Berger, Johnstone & Silverman, Richardson et al., Johnson & Rossell, Chao Gao,...]

19 Model selection prior Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^p$: (1) Choose $s$ from a prior on $\{0,1,2,\ldots,p\}$. (2) Choose $S \subset \{1,\ldots,p\}$ of size $|S| = s$ at random. (3) Choose $\theta_S = (\theta_i: i \in S) \sim g_S$ and set $\theta_{S^c} = 0$. Example: spike and slab. Choose $\theta_1,\ldots,\theta_p$ i.i.d. from $\tau\delta_0 + (1-\tau)G$. Put a prior on $\tau$, e.g. Beta$(1, p+1)$. Then $s$ is binomial and $g_S = \otimes_{i \in S}\, g$. [Mitchell & Beauchamp (88), George, George & McCulloch, Yuan, Berger, Johnstone & Silverman, Richardson et al., Johnson & Rossell, Chao Gao,...]
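A draw from this prior is easy to simulate; the sketch below implements the three construction steps and, separately, the spike-and-slab version (the prior $\pi$ on the dimension and the Laplace slab are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_model_selection_prior(p, pi):
    """One draw from the constructive prior: (1) s ~ pi, a pmf on {0,...,p};
    (2) S uniform among subsets of size s; (3) theta_S iid from a Laplace
    slab (an illustrative choice of g_S), theta zero off S."""
    s = rng.choice(p + 1, p=pi)
    S = rng.choice(p, size=s, replace=False)
    theta = np.zeros(p)
    theta[S] = rng.laplace(size=s)
    return theta

def sample_spike_and_slab(p):
    """tau ~ Beta(1, p+1); theta_i iid from tau*delta_0 + (1-tau)*Laplace."""
    tau = rng.beta(1.0, p + 1.0)
    in_spike = rng.random(p) < tau
    return np.where(in_spike, 0.0, rng.laplace(size=p))
```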

20 Horseshoe prior Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^p$: (1) Generate $\tau \sim \mathrm{Cauchy}^+(0,\sigma)$ (?) (2) Generate $\psi_1,\ldots,\psi_p$ iid from $\mathrm{Cauchy}^+(0,\tau)$. (3) Generate independent $\theta_i \sim N(0, \psi_i)$.

21 Horseshoe prior Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^p$: (1) Generate $\tau \sim \mathrm{Cauchy}^+(0,\sigma)$ (?) (2) Generate $\psi_1,\ldots,\psi_p$ iid from $\mathrm{Cauchy}^+(0,\tau)$. (3) Generate independent $\theta_i \sim N(0, \psi_i)$. Motivation: if $\theta \sim N(0,\psi)$ and $Y \mid \theta \sim N(\theta,1)$, then $\theta \mid Y, \psi \sim N\bigl((1-\kappa)Y,\, 1-\kappa\bigr)$ for $\kappa = 1/(1+\psi)$. This suggests a prior for $\kappa$ that concentrates near 0 or 1. [Carvalho & Polson & Scott, 10.]

22 Horseshoe prior Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^p$: (1) Generate $\tau \sim \mathrm{Cauchy}^+(0,\sigma)$ (?) (2) Generate $\psi_1,\ldots,\psi_p$ iid from $\mathrm{Cauchy}^+(0,\tau)$. (3) Generate independent $\theta_i \sim N(0, \psi_i)$. Motivation: if $\theta \sim N(0,\psi)$ and $Y \mid \theta \sim N(\theta,1)$, then $\theta \mid Y, \psi \sim N\bigl((1-\kappa)Y,\, 1-\kappa\bigr)$ for $\kappa = 1/(1+\psi)$. This suggests a prior for $\kappa$ that concentrates near 0 or 1. [Figures: prior of the shrinkage factor $\kappa$; prior density of $\theta_i$; posterior mean of $\theta_i$ as a function of $Y_i$.] [Carvalho & Polson & Scott, 10.]
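Sampling from the constructive definition is again direct, and the conditional posterior mean in the motivation can be coded from the displayed formula. A sketch following the slide's notation (with $\psi_i$ used as the variance of $\theta_i$):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_horseshoe_prior(p, sigma=1.0):
    """(1) tau ~ Cauchy+(0, sigma); (2) psi_i iid Cauchy+(0, tau);
    (3) theta_i ~ N(0, psi_i). Half-Cauchy draws via |Cauchy| * scale."""
    tau = abs(sigma * rng.standard_cauchy())
    psi = np.abs(tau * rng.standard_cauchy(p))
    return rng.normal(0.0, np.sqrt(psi))

def posterior_mean_given_psi(y, psi):
    """theta | Y, psi ~ N((1-kappa) Y, 1-kappa) with kappa = 1/(1+psi):
    kappa near 1 shrinks to zero, kappa near 0 leaves Y untouched."""
    kappa = 1.0 / (1.0 + psi)
    return (1.0 - kappa) * y
```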

23 Other sparsity priors Bayesian LASSO: $\theta_1,\ldots,\theta_p$ iid from a mixture of Laplace$(\lambda)$ distributions over $\lambda \sim \Gamma(a,b)$. Bayesian bridge: same, but with the Laplace density replaced by a density $\propto e^{-\lambda|y|^\alpha}$. Normal-Gamma: $\theta_1,\ldots,\theta_p$ iid from a Gamma scale mixture of Gaussians. Correlated multivariate normal-gamma: $\theta = C\phi$ for a $p \times k$ matrix $C$ and $\phi$ with independent normal-gamma$(a_i, 1/2)$ coordinates. Horseshoe. Horseshoe+. Normal spike. Scalar multiple of Dirichlet. Nonparametric Dirichlet.... [Park & Casella 08, Polson & Scott, Griffin & Brown 10, 12, Carvalho & Polson & Scott 10, George & Rockova 13, Bhattacharya et al. 12,...]

24 LASSO is not Bayesian $\hat\theta_{\mathrm{LASSO}} = \mathrm{argmin}_\theta \bigl[\|Y^n - X\theta\|^2 + \lambda_n \sum_{i=1}^p |\theta_i|\bigr]$: the posterior mode for the prior $\theta_i$ iid Laplace$(\lambda_n)$; works great, but the full posterior distribution is useless.

25 LASSO is not Bayesian $\hat\theta_{\mathrm{LASSO}} = \mathrm{argmin}_\theta \bigl[\|Y^n - X\theta\|^2 + \lambda_n \sum_{i=1}^p |\theta_i|\bigr]$: the posterior mode for the prior $\theta_i$ iid Laplace$(\lambda_n)$; works great, but the full posterior distribution is useless. Theorem If $\sqrt{n}/\lambda_n \to \infty$, then $E_0 \Pi_n\bigl(\theta: \|\theta\|_2 \le \sqrt{n}/\lambda_n \mid Y^n\bigr) \to 0$. $\lambda_n = \sqrt{2\log n}$ gives almost no Bayesian shrinkage. Trouble: $\lambda_n$ must be large to shrink $\theta_i$ to 0, but small to model nonzero $\theta_i$.
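The identification of the LASSO with a posterior mode is a one-line computation: with $Y^n \mid \theta \sim N_n(X\theta, I)$ and $\theta_i$ iid Laplace$(\lambda_n)$, minus the log posterior density is, up to an additive constant,

```latex
-\log\bigl(p(Y^n \mid \theta)\,\pi(\theta)\bigr)
  = \tfrac{1}{2}\,\|Y^n - X\theta\|^2 + \lambda_n \sum_{i=1}^p |\theta_i| + \text{const},
```

so its minimizer is exactly $\hat\theta_{\mathrm{LASSO}}$ (with the factor $\tfrac12$ absorbed into $\lambda_n$).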

26 Frequentist Bayes

27 Frequentist Bayes Assume the data $Y^n$ follow a given parameter $\theta_0$. Consider the posterior $\Pi(\theta \in \cdot \mid Y^n)$ as a random measure on the parameter set. We like $\Pi(\cdot \mid Y^n)$: to put most of its mass near $\theta_0$ for most $Y^n$; to have a spread that expresses the remaining uncertainty; to select the model defined by the nonzero parameters of $\theta_0$. We evaluate this by probabilities or expectations, given $\theta_0$.

28 Benchmarks for recovery sequence model $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. $\|\theta\|_0 = \#(1 \le i \le n: \theta_i \ne 0)$, $\|\theta\|_2^2 = \sum_{i=1}^n \theta_i^2$. Frequentist benchmark: minimax rate relative to $\|\cdot\|_2$ over black bodies $\{\theta: \|\theta\|_0 \le s_n\}$: $\sqrt{s_n \log(n/s_n)}$. [(If $s_n \to \infty$ with $s_n/n \to 0$.) Donoho & Johnstone, Golubev, Johnstone and Silverman, Abramovich et al.,...]

29 Benchmarks for recovery sequence model $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. $\|\theta\|_0 = \#(1 \le i \le n: \theta_i \ne 0)$, $\|\theta\|_q^q = \sum_{i=1}^n |\theta_i|^q$, $0 < q \le 2$. Frequentist benchmarks: minimax rate relative to $\|\cdot\|_q$ over black bodies $\{\theta: \|\theta\|_0 \le s_n\}$: $s_n^{1/q}\sqrt{\log(n/s_n)}$.

30 Benchmarks for recovery sequence model $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. $\|\theta\|_0 = \#(1 \le i \le n: \theta_i \ne 0)$, $\|\theta\|_q^q = \sum_{i=1}^n |\theta_i|^q$, $0 < q \le 2$. Frequentist benchmarks: minimax rate relative to $\|\cdot\|_q$ over: black bodies $\{\theta: \|\theta\|_0 \le s_n\}$: $s_n^{1/q}\sqrt{\log(n/s_n)}$. Weak $\ell_r$-balls $m_r[s_n] := \{\theta: \max_i i\,|\theta|_{[i]}^r \le n(s_n/n)^r\}$ (with $|\theta|_{[i]}$ the $i$-th largest absolute coordinate): $n^{1/q}(s_n/n)^{r/q}\bigl(\log(n/s_n)\bigr)^{(1-r/q)/2}$. [(If $s_n \to \infty$ with $s_n/n \to 0$.) Donoho & Johnstone, Golubev, Johnstone and Silverman, Abramovich et al.,...]

31 Model Selection Prior

32 Model selection prior $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^n$: (1) Choose $s$ from a prior $\pi_n$ on $\{0,1,2,\ldots,n\}$. (2) Choose $S \subset \{1,\ldots,n\}$ of size $|S| = s$ at random. (3) Choose $\theta_S = (\theta_i: i \in S)$ from a density $g_S$ on $\mathbb{R}^S$ and set $\theta_{S^c} = 0$.

33 Model selection prior $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^n$: (1) Choose $s$ from a prior $\pi_n$ on $\{0,1,2,\ldots,n\}$. (2) Choose $S \subset \{1,\ldots,n\}$ of size $|S| = s$ at random. (3) Choose $\theta_S = (\theta_i: i \in S)$ from a density $g_S$ on $\mathbb{R}^S$ and set $\theta_{S^c} = 0$. Assume: $\pi_n(s) \le c\,\pi_n(s-1)$, for some $c < 1$ and every $s$. $g_S = \otimes_{i \in S}\, e^h$, for uniformly Lipschitz $h: \mathbb{R} \to \mathbb{R}$. $s_n := \|\theta_0\|_0 \to \infty$, $s_n/n \to 0$. Examples: complexity prior: $\pi_n(s) \propto e^{-a s \log(b n/s)}$. Spike and slab: $\theta_i$ iid $\sim \tau\delta_0 + (1-\tau)\mathrm{Lap}$ with $\tau \sim B(1, n+1)$.

34 Numbers Single data set with $\theta_0 = (0,\ldots,0,5,\ldots,5)$, $n = 500$ and $\|\theta_0\|_0 = 100$. Red dots: marginal posterior medians. Orange: marginal credible intervals. Green dots: data points. $g$ the standard Laplace density; $\pi_n(k) \propto \binom{2n}{k}^{-0.1}$ (left) and $\pi_n(k) \propto \binom{2n}{k}^{-1}$ (right).

35 Dimensionality of posterior distribution Theorem [black body] There exists $M$ such that $\sup_{\theta_0: \|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: \|\theta\|_0 \ge M s_n \mid Y^n\bigr) \to 0$. Outside the space in which $\theta_0$ lives, the posterior is concentrated in low-dimensional subspaces along the coordinate axes.

36 Recovery Theorem [black body] For every $0 < q \le 2$ and large $M$, $\sup_{\|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: \|\theta - \theta_0\|_q > M r_n s_n^{1/q - 1/2} \mid Y^n\bigr) \to 0$, for $r_n^2 = s_n \log(n/s_n) \vee \log(1/\pi_n(s_n))$. If $\pi_n(s_n) \ge e^{-a s_n \log(n/s_n)}$ the minimax rate is attained.

37 Selection $S_\theta := \{1 \le i \le n: \theta_i \ne 0\}$. Theorem [No supersets] $\sup_{\|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: S_\theta \supset S_{\theta_0}, S_\theta \ne S_{\theta_0} \mid Y^n\bigr) \to 0$.

38 Selection $S_\theta := \{1 \le i \le n: \theta_i \ne 0\}$. Theorem [No supersets] $\sup_{\|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: S_\theta \supset S_{\theta_0}, S_\theta \ne S_{\theta_0} \mid Y^n\bigr) \to 0$. Theorem [Finds big signals] $\inf_{\|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: S_\theta \supset \{i: |\theta_{0,i}| \ge \sqrt{\log n}\} \mid Y^n\bigr) \to 1$.

39 Selection $S_\theta := \{1 \le i \le n: \theta_i \ne 0\}$. Theorem [No supersets] $\sup_{\|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: S_\theta \supset S_{\theta_0}, S_\theta \ne S_{\theta_0} \mid Y^n\bigr) \to 0$. Theorem [Finds big signals] $\inf_{\|\theta_0\|_0 \le s_n} E_{\theta_0}\, \Pi_n\bigl(\theta: S_\theta \supset \{i: |\theta_{0,i}| \ge \sqrt{\log n}\} \mid Y^n\bigr) \to 1$. Corollary: if all nonzero $\theta_{0,i}$ are suitably big, then the posterior probability of the true model $S_{\theta_0}$ tends to 1.

40 Bernstein-von Mises theorem Theorem For the spike-and-Laplace$(\lambda_n)$-slab prior with $\lambda_n \sqrt{\log(n/s_n)} \to 0$, there are random weights $\hat w_S$ such that $E_{\theta_0}\bigl\|\Pi_n(\cdot \mid Y^n) - \sum_S \hat w_S\, N_S(Y^n_S, I) \otimes \delta_{S^c}\bigr\| \to 0$. Theorem Given consistent model selection, the mixture can be replaced by $N_{S_{\theta_0}}(Y^n_{S_{\theta_0}}, I) \otimes \delta_{S^c_{\theta_0}}$. Corollary: Given consistent model selection, credible sets for individual parameters are asymptotic confidence sets.

41 Numbers: mean square errors [Table: average $\|\hat\theta - \theta\|^2$ over 100 data experiments, for various $(n, A)$, for the estimators PM1, PM2, EBM, PMed1, PMed2, EBMed, HT, HTO.] $n = 500$; $\theta_0 = (0,\ldots,0,A,\ldots,A)$. PM1, PM2: posterior means for the priors $\pi_n(k) \propto e^{-k\log(3n/k)/10}$ and $\pi_n(k) \propto \binom{2n}{k}^{-0.1}$. PMed1, PMed2: marginal posterior medians for the same priors. EBM, EBMed: empirical Bayes mean and median for the Laplace prior (Johnstone et al.). HT, HTO: thresholding at $\sqrt{2\log n}$ and $\sqrt{2\log(n/\|\theta_0\|_0)}$. Short summary: the Bayesian method is neither better nor worse.

42 Horseshoe Prior

43 Horseshoe prior $Y^n \sim N_n(\theta, I)$, for $\theta = (\theta_1,\ldots,\theta_n) \in \mathbb{R}^n$. Constructive definition of prior $\Pi$ for $\theta \in \mathbb{R}^n$: (1) Choose a sparsity level $\hat\tau$. (2) Generate $\psi_1,\ldots,\psi_n$ iid from $\mathrm{Cauchy}^+(0,\hat\tau)$. (3) Generate independent $\theta_i \sim N(0, \psi_i)$. [Figures: prior of the shrinkage factor $\kappa$; prior density of $\theta_i$; posterior mean of $\theta_i$ as a function of $Y_i$.] [Carvalho & Polson & Scott, 10.]

44 Estimating $\tau$ Ad-hoc: $\hat\tau_n = \#\{i: |Y^n_i| \ge \sqrt{2\log n}\}/(1.1\,n)$. Empirical Bayes: for $g_\tau$ the prior density of $\theta_i$, $\hat\tau_n = \mathrm{argmax}_{\tau \in [1/n,1]} \prod_{i=1}^n \int \phi(Y_i - \theta)\, g_\tau(\theta)\, d\theta$. Full Bayes: $\tau$ set by a hyperprior (supported on $[1/n,1]$).
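Both estimators are easy to sketch numerically. Below, the ad-hoc count estimator follows the display, while the empirical Bayes estimator maximizes the marginal likelihood over a grid in $[1/n, 1]$; for $g_\tau$ we use the standard horseshoe parametrization $\theta_i \mid \lambda_i \sim N(0, \tau^2\lambda_i^2)$, $\lambda_i \sim \mathrm{Cauchy}^+(0,1)$ (an assumption on our part), with the one-dimensional mixing integral done by quadrature.

```python
import numpy as np
from scipy import integrate, stats

def tau_adhoc(y, c=1.1):
    """Count observations above the universal threshold sqrt(2 log n),
    divide by c*n, and truncate into [1/n, 1]."""
    n = len(y)
    frac = np.sum(np.abs(y) >= np.sqrt(2.0 * np.log(n))) / (c * n)
    return min(max(frac, 1.0 / n), 1.0)

def log_marginal(y_i, tau):
    """Log marginal density of one Y_i = theta_i + noise under g_tau
    (horseshoe parametrization, integrated over lambda by quadrature)."""
    def integrand(lam):
        sd = np.sqrt(1.0 + (tau * lam) ** 2)
        return stats.norm.pdf(y_i, scale=sd) * 2.0 / (np.pi * (1.0 + lam**2))
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return np.log(val)

def tau_mmle(y, gridsize=50):
    """Empirical Bayes: maximize the marginal likelihood over [1/n, 1]."""
    n = len(y)
    grid = np.geomspace(1.0 / n, 1.0, gridsize)
    loglik = [sum(log_marginal(yi, t) for yi in y) for t in grid]
    return grid[int(np.argmax(loglik))]
```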

45 Numbers estimating $\tau$ [Figure: MSE of the posterior mean as a function of the nonzero mean $A$; $n = 100$, with $s_n$ ($= \sqrt{n}$) coordinates drawn from $N(A,1)$ and the remaining $n - s_n$ coordinates from $N(0, 1/4)$.] Short summary: empirical Bayes and full Bayes outperform the ad-hoc estimator.

46 Recovery The horseshoe prior gives recovery similar to the model selection prior.

47 Recovery The horseshoe prior gives recovery similar to the model selection prior. $\tau$ can be interpreted as $(s_n/n)\sqrt{\log(n/s_n)}$.

48 Credible intervals Credible interval: $\hat C_{ni}(L) = \{\theta_i: |\theta_i - \hat\theta_i| \le L \hat r_i\}$, where $\hat\theta = E(\theta \mid Y^n)$ and $\Pi\bigl(\theta_i: |\theta_i - \hat\theta_i| \le \hat r_i \mid Y^n\bigr) = 0.95$.

49 Credible intervals Credible interval: $\hat C_{ni}(L) = \{\theta_i: |\theta_i - \hat\theta_i| \le L \hat r_i\}$, where $\hat\theta = E(\theta \mid Y^n)$ and $\Pi\bigl(\theta_i: |\theta_i - \hat\theta_i| \le \hat r_i \mid Y^n\bigr) = 0.95$. $S_a := \{1 \le i \le n: |\theta_{0,i}| \le 1/n\}$, $M_a := \{1 \le i \le n: (s_n/n)\sqrt{\log(n/s_n)} \le |\theta_{0,i}| \le \sqrt{\log(n/s_n)}\}$, $L_a := \{1 \le i \le n: \sqrt{\log n} \le |\theta_{0,i}|\}$.

50 Credible intervals Credible interval: $\hat C_{ni}(L) = \{\theta_i: |\theta_i - \hat\theta_i| \le L \hat r_i\}$, where $\hat\theta = E(\theta \mid Y^n)$ and $\Pi\bigl(\theta_i: |\theta_i - \hat\theta_i| \le \hat r_i \mid Y^n\bigr) = 0.95$. $S_a := \{1 \le i \le n: |\theta_{0,i}| \le 1/n\}$, $M_a := \{1 \le i \le n: (s_n/n)\sqrt{\log(n/s_n)} \le |\theta_{0,i}| \le \sqrt{\log(n/s_n)}\}$, $L_a := \{1 \le i \le n: \sqrt{\log n} \le |\theta_{0,i}|\}$. [Figure: marginal credible intervals for a single $Y^n$ with $n = 200$ and $s_n = 10$; $\theta_1 = \cdots = \theta_5 = 7$, $\theta_6 = \cdots = \theta_{10} = 1.5$. Insert: credible sets 5 to 13.]

51 Credible intervals Credible interval: $\hat C_{ni}(L) = \{\theta_i: |\theta_i - \hat\theta_i| \le L \hat r_i\}$, where $\hat\theta = E(\theta \mid Y^n)$ and $\Pi\bigl(\theta_i: |\theta_i - \hat\theta_i| \le \hat r_i \mid Y^n\bigr) = 0.95$. $S_a := \{1 \le i \le n: |\theta_{0,i}| \le 1/n\}$, $M_a := \{1 \le i \le n: (s_n/n)\sqrt{\log(n/s_n)} \le |\theta_{0,i}| \le \sqrt{\log(n/s_n)}\}$, $L_a := \{1 \le i \le n: \sqrt{\log n} \le |\theta_{0,i}|\}$. Theorem For any $\gamma > 0$ and $\|\theta_0\|_0 \le s_n$: $P_{\theta_0}\bigl(\frac{1}{\#S_a}\#\{i \in S_a: \theta_{0,i} \in \hat C_{ni}(L_{S,\gamma})\} \ge 1 - \gamma\bigr) \to 1$; $P_{\theta_0}\bigl(\theta_{0,i} \notin \hat C_{ni}(L)\bigr) \to 1$, for any $L > 0$ and $i \in M_a$; $P_{\theta_0}\bigl(\frac{1}{\#L_a}\#\{i \in L_a: \theta_{0,i} \in \hat C_{ni}(L_{L,\gamma})\} \ge 1 - \gamma\bigr) \to 1$. Few false discoveries; most easy discoveries made. Intermediate discoveries not made.
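Given posterior draws (from any horseshoe sampler), the marginal intervals of the definition above are plain quantile computations; a sketch:

```python
import numpy as np

def marginal_credible_intervals(samples, L=1.0, level=0.95):
    """samples: (B, n) array of posterior draws of theta. Returns the
    intervals hat(theta)_i +/- L * hat(r)_i, where hat(theta) is the
    posterior mean and hat(r)_i gives the centred interval posterior
    mass `level`, as in the definition above."""
    theta_hat = samples.mean(axis=0)
    r_hat = np.quantile(np.abs(samples - theta_hat), level, axis=0)
    return theta_hat - L * r_hat, theta_hat + L * r_hat
```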

52 Simultaneous credible balls: impossibility of adaptation General principle: the size of an honest confidence set is determined by the biggest model.

53 Simultaneous credible balls: impossibility of adaptation General principle: the size of an honest confidence set is determined by the biggest model. Theorem [Li, 1987] If $P_{\theta_0}\bigl(C_n(Y^n) \ni \theta_0\bigr) \ge 0.95$ for all $\theta_0 \in \mathbb{R}^n$, then $\mathrm{diam}\bigl(C_n(Y^n)\bigr) \gtrsim n^{1/4}$, for some $\theta_0$.

54 Simultaneous credible balls: impossibility of adaptation General principle: the size of an honest confidence set is determined by the biggest model. Theorem [Li, 1987] If $P_{\theta_0}\bigl(C_n(Y^n) \ni \theta_0\bigr) \ge 0.95$ for all $\theta_0 \in \mathbb{R}^n$, then $\mathrm{diam}\bigl(C_n(Y^n)\bigr) \gtrsim n^{1/4}$, for some $\theta_0$. Theorem [Nickl, van de Geer, 2013] If $s_{1,n} \ll s_{2,n}$ and $\mathrm{diam}(C_n(Y^n))$ is of optimal size, uniformly in $\|\theta_0\|_0 \le s_{i,n}$ for $i = 1,2$, then $C_n(Y^n)$ cannot have uniform coverage over $\{\theta_0: \|\theta_0\|_0 \le s_{2,n}\}$.

55 Simultaneous credible balls: impossibility of adaptation General principle: the size of an honest confidence set is determined by the biggest model. Theorem [Li, 1987] If $P_{\theta_0}\bigl(C_n(Y^n) \ni \theta_0\bigr) \ge 0.95$ for all $\theta_0 \in \mathbb{R}^n$, then $\mathrm{diam}\bigl(C_n(Y^n)\bigr) \gtrsim n^{1/4}$, for some $\theta_0$. Theorem [Nickl, van de Geer, 2013] If $s_{1,n} \ll s_{2,n}$ and $\mathrm{diam}(C_n(Y^n))$ is of optimal size, uniformly in $\|\theta_0\|_0 \le s_{i,n}$ for $i = 1,2$, then $C_n(Y^n)$ cannot have uniform coverage over $\{\theta_0: \|\theta_0\|_0 \le s_{2,n}\}$. Since the Bayesian procedure adapts to sparsity, its credible sets cannot be honest confidence sets. [Optimal size is $\bigl((s_{i,n}/n)\log(n/s_{i,n})\bigr)^{1/2}$.]

56 Simultaneous credible balls: impossibility of adaptation, restricting the parameter Coverage only when $\theta_0$ does not cause too much shrinkage. DEFINITION [self-similarity] For $s = \|\theta_0\|_0$, at least $0.001\,s$ coordinates of $\theta_0$ satisfy $|\theta_{0,i}| \ge \sqrt{\log(n/s)}$.

57 Simultaneous credible balls: impossibility of adaptation, restricting the parameter Coverage only when $\theta_0$ does not cause too much shrinkage. DEFINITION [self-similarity] For $s = \|\theta_0\|_0$, at least $0.001\,s$ coordinates of $\theta_0$ satisfy $|\theta_{0,i}| \ge \sqrt{\log(n/s)}$. DEFINITION [excessive-bias restriction, Belitser & Nurushev, 2015] There exists $\tilde s$ with $\tilde s \lesssim \#\bigl(i: |\theta_{0,i}| \ge \sqrt{\log(n/\tilde s)}\bigr)$ and $\sum_{i: |\theta_{0,i}| \le \sqrt{\log(n/\tilde s)}} \theta_{0,i}^2 \lesssim \tilde s \log(n/\tilde s)$. Self-similarity implies the excessive-bias restriction. (Self-similarity allows to tighten up the sets $S$, $M$, $L$.)

58 Simultaneous credible balls Credible ball: $\hat C_n(L) = \{\theta: \|\theta - \hat\theta\| \le L \hat r\}$, where $\hat\theta = E(\theta \mid Y^n)$ and $\Pi\bigl(\theta: \|\theta - \hat\theta\| \le \hat r \mid Y^n\bigr) = 0.95$.

59 Simultaneous credible balls Credible ball: $\hat C_n(L) = \{\theta: \|\theta - \hat\theta\| \le L \hat r\}$, where $\hat\theta = E(\theta \mid Y^n)$ and $\Pi\bigl(\theta: \|\theta - \hat\theta\| \le \hat r \mid Y^n\bigr) = 0.95$. Theorem If $s_n/n \to 0$, then for sufficiently large $L$, $\liminf_n \inf_{\theta_0 \in \mathrm{EBR}[s_n]} P_{\theta_0}\bigl(\theta_0 \in \hat C_n(L)\bigr) \ge 1 - \alpha$. $\mathrm{EBR}[s]$: vectors $\theta_0$ that satisfy the excessive-bias restriction.
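The corresponding credible ball is computed the same way as the marginal intervals, with the radius now a quantile of the $\ell_2$ distance to the posterior mean; a sketch under the same sampling assumptions:

```python
import numpy as np

def credible_ball(samples, L=1.0, level=0.95):
    """Centre and blown-up radius of the credible ball
    {theta: ||theta - hat(theta)|| <= L * hat(r)}, where hat(r) is the
    `level`-quantile of ||theta - hat(theta)||_2 over posterior draws."""
    theta_hat = samples.mean(axis=0)
    dists = np.linalg.norm(samples - theta_hat, axis=1)
    return theta_hat, L * np.quantile(dists, level)
```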

60 Numbers [Figures: coverage and average interval length; $n = 400$, with $s_n$ ($= \sqrt{n}$) nonzero means drawn from $N(A,1)$.] Short summary: empirical and full Bayes work well.

61 Conclusions Bayesian sparse estimation gives excellent recovery. For valid simultaneous credible sets one needs a fraction of the nonzero parameters above the universal threshold. The danger of failing uncertainty quantification is not finding nonzero coordinates; the discoveries that are made are real.

62 Co-authors Subhashis Ghoshal Harry van Zanten Ismael Castillo Johannes Schmidt-Hieber Stéphanie van der Pas Botond Szabo Willem Kruijer Magnus Münch Bartek Knapik Bas Kleijn Suzanne Sniekers Fengnan Gao Gwenael Leday Gino Kpogbezan Mark van de Wiel Wessel van Wieringen

63 Compatibility and coherence $\|X\| := \max_j \|X_{\cdot,j}\|$. The compatibility number $\varphi(S)$ for $S \subset \{1,\ldots,p\}$ is: $\varphi(S) = \inf_{\theta: \|\theta_{S^c}\|_1 \le 7\|\theta_S\|_1} \frac{\|X\theta\|_2 \sqrt{|S|}}{\|X\|\,\|\theta_S\|_1}$. Compatibility in $s_n$-sparse vectors means: $\inf_{\theta: \|\theta\|_0 \le 5 s_n} \frac{\|X\theta\|_2 \sqrt{|S_\theta|}}{\|X\|\,\|\theta\|_1} \gg 0$. Strong compatibility in $s_n$-sparse vectors means: $\inf_{\theta: \|\theta\|_0 \le 5 s_n} \frac{\|X\theta\|_2}{\|X\|\,\|\theta\|_2} \gg 0$. Mutual coherence means: $s_n \max_{i \ne j} |\mathrm{cor}(X_{\cdot i}, X_{\cdot j})| \ll 1$.
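Of these conditions, mutual coherence is the simplest to check numerically: it involves only the maximal column correlation of the design. A sketch:

```python
import numpy as np

def mutual_coherence_number(X):
    """max_{i != j} |cor(X_.i, X_.j)|; the design satisfies the mutual
    coherence condition when s_n times this number is small."""
    C = np.corrcoef(X, rowvar=False)   # p x p matrix of column correlations
    np.fill_diagonal(C, 0.0)
    return np.max(np.abs(C))
```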

64 Compatibility and coherence examples Mutual coherence $\Rightarrow$ strong compatibility $\Rightarrow$ compatibility. Mutual coherence is easy to understand and gives the best recovery results, but is very restrictive.

65 Compatibility and coherence examples Mutual coherence $\Rightarrow$ strong compatibility $\Rightarrow$ compatibility. Mutual coherence is easy to understand and gives the best recovery results, but is very restrictive. Sequence model: completely compatible, with zero mutual coherence number.

66 Compatibility and coherence examples Mutual coherence $\Rightarrow$ strong compatibility $\Rightarrow$ compatibility. Mutual coherence is easy to understand and gives the best recovery results, but is very restrictive. Sequence model: completely compatible, with zero mutual coherence number. Response model: if the $X_{i,j}$ are i.i.d. random variables, then coherence holds if $s_n \ll \sqrt{n/\log p}$: if $\log p = o(n)$ and the $X_{i,j}$ are bounded; or if $\log p = o(n^{\alpha/(4+\alpha)})$ and $E e^{t|X_{i,j}|^\alpha} < \infty$.

67 Compatibility and coherence examples Mutual coherence $\Rightarrow$ strong compatibility $\Rightarrow$ compatibility. Mutual coherence is easy to understand and gives the best recovery results, but is very restrictive. Sequence model: completely compatible, with zero mutual coherence number. Response model: if the $X_{i,j}$ are i.i.d. random variables, then coherence holds if $s_n \ll \sqrt{n/\log p}$: if $\log p = o(n)$ and the $X_{i,j}$ are bounded; or if $\log p = o(n^{\alpha/(4+\alpha)})$ and $E e^{t|X_{i,j}|^\alpha} < \infty$. $C = X^T X/n$: compatibility, but no coherence, if $C_{i,j} = \rho^{|i-j|}$ for $0 < \rho < 1$ and $p = n$, or if $C$ is block diagonal with fixed block sizes.
