Bayesian Statistics in High Dimensions
Lecture 2: Sparsity
Aad van der Vaart, Universiteit Leiden, Netherlands
47th John H. Barrett Memorial Lectures, Knoxville, Tennessee, May 2017
Contents
- Sparsity
- Bayesian Sparsity
- Frequentist Bayes
- Model Selection Prior
- Horseshoe Prior
Sparsity
Sparsity (sequence model)

A sparse model has many parameters, but most of them are (nearly) zero.

In this lecture, (Bayesian) theory for the sequence model

Y^n ~ N_n(θ, I), for θ = (θ_1,...,θ_n) ∈ R^n:
- n independent observations Y_1^n,...,Y_n^n;
- n unknowns θ_1,...,θ_n;
- Y_i^n = θ_i + ε_i, for standard normal noise ε_i;
- n is large;
- many of θ_1,...,θ_n are (almost) zero.
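As a concrete illustration (not part of the slides), the sparse sequence model is trivial to simulate; the function name and seed below are ad hoc choices.

```python
import numpy as np

def simulate_sequence_model(n, s_n, signal=5.0, rng=None):
    """Draw Y ~ N_n(theta, I) for an s_n-sparse mean vector theta."""
    rng = np.random.default_rng(rng)
    theta = np.zeros(n)
    theta[:s_n] = signal                 # s_n nonzero coordinates, the rest exactly zero
    y = theta + rng.standard_normal(n)   # Y_i = theta_i + eps_i, eps_i ~ N(0, 1)
    return theta, y

theta, y = simulate_sequence_model(n=500, s_n=100, rng=0)
```

The point of the model: every coordinate is observed with unit noise, so recovering θ hinges entirely on exploiting the sparsity.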
History

Given Y^n | θ ~ N_n(θ, I), the obvious estimator of θ is Y^n: it is ML, UMVU, best-equivariant, minimax, but inadmissible.

Theorem [Stein, 1956]. If n ≥ 3, then there exists T such that, for every θ ∈ R^n,
E_θ ||T(Y^n) − θ||^2 < E_θ ||Y^n − θ||^2.

Empirical Bayes method [Robbins, 1960s]:
- Working hypothesis: θ_1,...,θ_n iid ~ G.
- Estimate G using Y^n, pretending this working hypothesis is true.
- T(Y^n) := E_{Ĝ(Y^n)}(θ | Y^n).

For large n the gain can be substantial ("borrowing of strength"), and it can be targeted to special subsets of R^n, e.g. sparse vectors.
Sparsity (regression)

Y^n | θ ~ N_n(X_{n×p} θ, σ^2 I), for θ = (θ_1,...,θ_p) ∈ R^p:
- Y_i^n: measurement on individual i = 1,...,n;
- X_{ij}: score of individual i on feature j = 1,...,p;
- θ_j: effect of feature j;
- sparse if only few features matter.

If p > n, then sparsity is necessary to recover θ, and X must be "sparse-invertible", e.g.:
- Compatibility: inf_{θ: |S_θ| ≤ 5 s_n} ||Xθ||_2 √|S_θ| / (||X|| ||θ||_1) ≫ 0.
- Mutual coherence: s_n max_{i≠j} |cor(X_{.i}, X_{.j})| ≪ 1.

Here s_n = true number of nonzero coordinates.
Sparsity (RNA sequencing)

- Y_{i,j}: RNA expression count of tag j = 1,...,p in tissue i = 1,...,n;
- x_i: covariate(s) of tissue i, e.g. 0 or 1 for normal or cancer;
- sparse if only few tags (genes) matter.

Y_{i,j} ~ (zero-inflated) negative binomial, with
E Y_{i,j} = e^{α_j + β_j x_i},  Var Y_{i,j} = E Y_{i,j} (1 + E Y_{i,j} e^{φ_j}).

The distribution of the β_j's and φ_j's is estimated by Empirical Bayes. [Smyth & Robinson et al.; van de Wiel, Leday, van Wieringen & vdV et al., '12]
Sparsity (Gaussian graphical model)

Data: n iid copies of Y^p | θ ~ N_p(0, θ^{-1}).
- Y_j^p = value of an individual on feature j;
- the precision matrix θ gives the partial correlations:
  cor(Y_i^p, Y_j^p | Y_k^p: k ≠ i,j) = −θ_{i,j} / √(θ_{i,i} θ_{j,j});
- nodes 1,2,...,p; edge (i,j) present iff θ_{i,j} ≠ 0;
- sparse if few edges.

[Figure: Apoptosis network]
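The identity linking the precision matrix to partial correlations can be checked directly; the 3-node precision matrix below is an arbitrary illustrative example.

```python
import numpy as np

def partial_correlations(theta):
    """cor(Y_i, Y_j | rest) = -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# A sparse precision matrix: no edge between nodes 0 and 2.
theta = np.array([[2.0, 0.6, 0.0],
                  [0.6, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
rho = partial_correlations(theta)
print(rho)  # the (0, 2) entry vanishes: absent edge <=> zero precision entry
```

This is why a sparse graph translates into a sparse precision matrix, which is what the priors of this lecture target.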
Bayesian Sparsity
Bayesian sparsity

A sparse model has many parameters, but most of them are (nearly) zero. We express this in the prior, and apply the standard (full or empirical) Bayesian machine.

Prior θ ~ Π and data Y^n | θ ~ p(· | θ) give the posterior:
dΠ(θ | Y^n) ∝ p(Y^n | θ) dΠ(θ).
Bayesian sparsity (Gaussian graphical model)

Nodes 1,2,...,p; edge (i,j) present iff θ_{i,j} ≠ 0; sparse if few edges.

[Figure: Apoptosis network]

For a given incidence matrix (P_{i,j}), use different priors for (θ_{i,j}: P_{i,j} = 0) and (θ_{i,j}: P_{i,j} = 1). [Leday, Kpogbezan, vdV, van Wieringen, van de Wiel (2016, '17)]
Model selection prior

Constructive definition of prior Π for θ ∈ R^p:
(1) Choose s from a prior on {0,1,2,...,p}.
(2) Choose S ⊂ {1,...,p} of size |S| = s at random.
(3) Choose θ_S = (θ_i: i ∈ S) ~ g_S and set θ_{S^c} = 0.

Example: spike and slab
- Choose θ_1,...,θ_p i.i.d. from (1−τ)δ_0 + τG.
- Put a prior on τ, e.g. Beta(1, p+1).
- Then s is binomial and g_S = ⊗_{i∈S} g.

[Mitchell & Beauchamp ('88), George, George & McCulloch, Yuan, Berger, Johnstone & Silverman, Richardson et al., Johnson & Rossell, Chao Gao, ...]
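A draw from the spike-and-slab version of this prior can be sketched as follows (standard Laplace slab; the function name is ad hoc, and the assumption is that τ is the slab weight, so E τ ≈ 1/p makes draws sparse):

```python
import numpy as np

def spike_and_slab_draw(p, rng=None):
    """theta_i iid ~ (1 - tau) delta_0 + tau Laplace(1), with tau ~ Beta(1, p+1)."""
    rng = np.random.default_rng(rng)
    tau = rng.beta(1.0, p + 1.0)      # slab weight; E[tau] ~ 1/p, so draws are sparse
    slab = rng.random(p) < tau        # theta_i is nonzero with probability tau
    return np.where(slab, rng.laplace(0.0, 1.0, size=p), 0.0)

theta = spike_and_slab_draw(p=1000, rng=2)
print(int((theta != 0).sum()))  # typically only a handful of nonzero coordinates
```

Hierarchically mixing over τ is what lets the posterior adapt to the unknown sparsity level.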
Horseshoe prior

Constructive definition of prior Π for θ ∈ R^p:
(1) Generate τ ~ Cauchy^+(0, σ) (?).
(2) Generate ψ_1,...,ψ_p iid from Cauchy^+(0, τ).
(3) Generate independent θ_i ~ N(0, ψ_i).

Motivation: if θ ~ N(0, ψ) and Y | θ ~ N(θ, 1), then θ | Y, ψ ~ N((1−κ)Y, 1−κ) for κ = 1/(1+ψ). This suggests a prior for κ that concentrates near 0 or 1.

[Figures: prior of the shrinkage factor κ; prior density of θ_i; posterior mean of θ_i as a function of Y_i.]

[Carvalho, Polson & Scott, '10]
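A draw from this prior, following the three steps literally (with ψ_i as the variance of θ_i, as in the shrinkage-factor motivation; σ = 1 is an arbitrary choice here):

```python
import numpy as np

def horseshoe_draw(p, sigma=1.0, rng=None):
    """(1) tau ~ Cauchy+(0, sigma); (2) psi_i ~ Cauchy+(0, tau);
    (3) theta_i ~ N(0, psi_i), with psi_i taken as a variance."""
    rng = np.random.default_rng(rng)
    tau = sigma * abs(rng.standard_cauchy())        # half-Cauchy global scale
    psi = tau * abs(rng.standard_cauchy(size=p))    # half-Cauchy local variances
    theta = rng.normal(0.0, np.sqrt(psi))
    kappa = 1.0 / (1.0 + psi)                       # shrinkage factors in (0, 1)
    return theta, kappa

theta, kappa = horseshoe_draw(p=10000, rng=3)
print(float(kappa.min()), float(kappa.max()))
```

The half-Cauchy scales put prior mass both at ψ near 0 (κ near 1: full shrinkage) and at very large ψ (κ near 0: no shrinkage), which is the "horseshoe" shape referred to in the figures.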
Other sparsity priors
- Bayesian LASSO: θ_1,...,θ_p iid from a mixture of Laplace(λ) distributions over λ ~ Γ(a,b).
- Bayesian bridge: same, but with the Laplace density replaced by a density ∝ e^{−λ|y|^α}.
- Normal-Gamma: θ_1,...,θ_p iid from a Gamma scale mixture of Gaussians.
- Correlated multivariate normal-gamma: θ = Cφ for a p × k matrix C and φ with independent normal-gamma(a_i, 1/2) coordinates.
- Horseshoe. Horseshoe+. Normal spike. Scalar multiple of Dirichlet. Nonparametric Dirichlet. ...

[Park & Casella '08, Polson & Scott, Griffin & Brown '10, '12, Carvalho, Polson & Scott '10, George & Rockova '13, Bhattacharya et al. '12, ...]
LASSO is not Bayesian

θ̂_LASSO = argmin_θ [ ||Y^n − Xθ||^2 + λ_n Σ_{i=1}^p |θ_i| ].

This is the posterior mode for the prior θ_i iid ~ Laplace(λ_n), and it works great, but the full posterior distribution is useless.

Theorem. If √n/λ_n → ∞, then E_0 Π_n(θ: ||θ||_2 ≤ √n/λ_n | Y^n) → 0.

λ_n = √(2 log n) gives almost no Bayesian shrinkage. Trouble: λ_n must be large to shrink θ_i to 0, but small to model the nonzero θ_i.
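In the sequence model (X = I) the LASSO decouples over coordinates and is exactly soft thresholding, which makes the posterior-mode interpretation easy to check numerically:

```python
import numpy as np

def lasso_sequence(y, lam):
    """argmin_theta ||y - theta||^2 + lam * ||theta||_1, solved coordinatewise:
    soft thresholding at lam / 2."""
    return np.sign(y) * np.maximum(np.abs(y) - lam / 2.0, 0.0)

y = np.array([3.0, -0.4, 0.1, -5.0])
print(lasso_sequence(y, lam=2.0))  # small coordinates are set exactly to 0
```

The single λ visible here is the tension on the slide: one global threshold must both kill the noise coordinates and leave the signals nearly unbiased.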
Frequentist Bayes
Frequentist Bayes

Assume the data Y^n is generated according to a given parameter θ_0. Consider the posterior Π(θ ∈ · | Y^n) as a random measure on the parameter set. We would like Π(· | Y^n):
- to put most of its mass near θ_0 for most Y^n;
- to have a spread that expresses the remaining uncertainty;
- to select the model defined by the nonzero parameters of θ_0.

We evaluate these properties by probabilities or expectations under θ_0.
Benchmarks for recovery (sequence model)

Y^n ~ N_n(θ, I), for θ = (θ_1,...,θ_n) ∈ R^n,
||θ||_0 = #(1 ≤ i ≤ n: θ_i ≠ 0),  ||θ||_q^q = Σ_{i=1}^n |θ_i|^q, 0 < q ≤ 2.

Frequentist benchmarks: minimax rate relative to ||·||_q over:
- "black bodies" {θ: ||θ||_0 ≤ s_n}:  s_n^{1/q} √(log(n/s_n));
- weak l_r-balls m_r[s_n] := {θ: max_i i |θ|_{(i)}^r ≤ n (s_n/n)^r}, with |θ|_{(i)} the i-th largest absolute value:  n^{1/q} (s_n/n)^{r/q} (log(n/s_n))^{(1 − r/q)/2}.

(Here s_n → ∞ with s_n/n → 0.) [Donoho & Johnstone, Golubev, Johnstone & Silverman, Abramovich et al., ...]
Model Selection Prior
Model selection prior (sequence model)

Y^n ~ N_n(θ, I), for θ = (θ_1,...,θ_n) ∈ R^n.

Constructive definition of prior Π for θ ∈ R^n:
(1) Choose s from a prior π_n on {0,1,2,...,n}.
(2) Choose S ⊂ {1,...,n} of size |S| = s at random.
(3) Choose θ_S = (θ_i: i ∈ S) from a density g_S on R^S and set θ_{S^c} = 0.

Assume:
- π_n(s) ≤ c π_n(s−1), for some c < 1 and every s;
- g_S = ⊗_{i∈S} e^h, for a uniformly Lipschitz h: R → R;
- s_n := ||θ_0||_0, n → ∞, s_n/n → 0.

Examples:
- complexity prior: π_n(s) ∝ e^{−a s log(bn/s)};
- spike and slab: θ_i iid ~ (1−τ)δ_0 + τ Lap with τ ~ Beta(1, n+1).
Numbers

[Figure: a single data set with θ_0 = (0,...,0,5,...,5), n = 500 and ||θ_0||_0 = 100. Red dots: marginal posterior medians; orange: marginal credible intervals; green dots: data points. Here g is the standard Laplace density, with two complexity priors π_n(k) built from the binomial coefficients binom(2n, k) (left and right panels).]
Dimensionality of the posterior distribution

Theorem [black body]. There exists M such that
sup_{||θ_0||_0 ≤ s_n} E_{θ_0} Π_n(θ: ||θ||_0 ≥ M s_n | Y^n) → 0.

Even though θ ranges over all of R^n, the posterior concentrates on low-dimensional subspaces along the coordinate axes.
Recovery

Theorem [black body]. For every 0 < q ≤ 2 and sufficiently large M,
sup_{||θ_0||_0 ≤ s_n} E_{θ_0} Π_n(θ: ||θ − θ_0||_q > M r_n s_n^{1/q − 1/2} | Y^n) → 0,
for r_n^2 = s_n log(n/s_n) ∨ log(1/π_n(s_n)).

If π_n(s_n) ≥ e^{−a s_n log(n/s_n)}, the minimax rate is attained.
Selection

S_θ := {1 ≤ i ≤ n: θ_i ≠ 0}.

Theorem [no supersets].
sup_{||θ_0||_0 ≤ s_n} E_{θ_0} Π_n(θ: S_θ ⊃ S_{θ_0}, S_θ ≠ S_{θ_0} | Y^n) → 0.

Theorem [finds big signals].
inf_{||θ_0||_0 ≤ s_n} E_{θ_0} Π_n(θ: S_θ ⊃ {i: |θ_{0,i}| ≥ √(log n)} | Y^n) → 1.

Corollary: if all nonzero θ_{0,i} are suitably big, then the posterior probability of the true model S_{θ_0} tends to 1.
Bernstein-von Mises theorem

Theorem. For the spike-and-Laplace(λ_n)-slab prior with λ_n √(log n / s_n) → 0, there are random weights ŵ_S such that
E_{θ_0} || Π_n(· | Y^n) − Σ_S ŵ_S N_S(Y_S^n, I) ⊗ δ_{S^c} || → 0.

Theorem. Given consistent model selection, the mixture can be replaced by N_{S_{θ_0}}(Y_{S_{θ_0}}^n, I) ⊗ δ_{S_{θ_0}^c}.

Corollary: given consistent model selection, credible sets for individual parameters are asymptotic confidence sets.
Numbers: mean square errors

[Table: average ||θ̂ − θ||^2 over 100 data experiments, for various p_n and A; columns PM1, PM2, EBM, PMed1, PMed2, EBMed, HT, HTO.]

n = 500; θ_0 = (0,...,0,A,...,A).
- PM1, PM2: posterior means for the prior π_n(k) ∝ e^{−k log(3n/k)/10} and a complexity prior built from the binomial coefficients binom(2n, k).
- PMed1, PMed2: marginal posterior medians for the same priors.
- EBM, EBMed: empirical Bayes mean and median for the Laplace prior (Johnstone et al.).
- HT, HTO: thresholding at √(2 log n) and √(2 log(n/||θ_0||_0)).

Short summary: the Bayesian method is neither better nor worse.
Horseshoe Prior
Horseshoe prior (sequence model)

Y^n ~ N_n(θ, I), for θ = (θ_1,...,θ_n) ∈ R^n.

Constructive definition of prior Π for θ ∈ R^n:
(1) Choose a sparsity level τ̂.
(2) Generate ψ_1,...,ψ_n iid from Cauchy^+(0, τ̂).
(3) Generate independent θ_i ~ N(0, ψ_i).

[Figures: prior of the shrinkage factor; prior density of θ_i; posterior mean of θ_i as a function of Y_i.]

[Carvalho, Polson & Scott, '10]
Estimating τ

Ad-hoc: τ̂_n = #{i: |Y_i^n| ≥ √(2 log n)} / (1.1 n).

Empirical Bayes: for g_τ the prior density of θ_i,
τ̂_n = argmax_{τ ∈ [1/n, 1]} Π_{i=1}^n ∫ φ(Y_i − θ) g_τ(θ) dθ.

Full Bayes: τ is given a hyperprior (supported on [1/n, 1]).
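The ad-hoc estimator just counts observations above the universal threshold √(2 log n). A sketch, with two assumptions flagged: the constant 1.1 is as reconstructed from the slide, and the result is clipped to the interval [1/n, 1] used by the other two estimators.

```python
import numpy as np

def tau_adhoc(y):
    """Ad-hoc sparsity level: fraction of |Y_i| above sqrt(2 log n),
    divided by 1.1 (constant as on the slide), clipped to [1/n, 1]
    (the clipping is an added assumption, matching the EB/full-Bayes range)."""
    n = y.size
    count = np.sum(np.abs(y) >= np.sqrt(2.0 * np.log(n)))
    return float(np.clip(count / (1.1 * n), 1.0 / n, 1.0))

y = np.zeros(100)
y[:10] = 10.0            # 10 clear signals out of n = 100
print(tau_adhoc(y))      # counts the 10 threshold exceedances
```

Pure noise rarely exceeds √(2 log n), so the count is a crude but consistent proxy for the number of signals.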
Numbers: estimating τ

[Figure: MSE of the posterior mean as a function of the number of nonzero parameters s_n; n = 100, with s_n coordinates drawn from N(A,1) and the remaining n − s_n coordinates from N(0, 1/4); p_n = s_n.]

Short summary: empirical Bayes and full Bayes outperform the ad-hoc estimator.
Recovery

The horseshoe prior gives similar recovery as the model selection prior. The parameter τ can be interpreted as (s_n/n) √(log(n/s_n)).
Credible intervals

Marginal credible interval: Ĉ_ni(L) = {θ_i: |θ_i − θ̂_i| ≤ L r̂_i}, where θ̂ = E(θ | Y^n) and r̂_i solves Π(θ_i: |θ_i − θ̂_i| ≤ r̂_i | Y^n) = 0.95.

Split the coordinates into small, intermediate and large signals:
S_a := {1 ≤ i ≤ n: |θ_{0,i}| ≤ 1/n},
M_a := {1 ≤ i ≤ n: (s_n/n) √(log(n/s_n)) ≤ |θ_{0,i}| ≤ √(log(n/s_n))},
L_a := {1 ≤ i ≤ n: √(log n) ≤ |θ_{0,i}|}.

[Figure: marginal credible intervals for a single Y^n with n = 200 and s_n = 10; θ_1 = ... = θ_5 = 7, θ_6 = ... = θ_10 = 1.5. Inset: credible sets 5 to 13.]

Theorem. For any γ > 0 and ||θ_0||_0 ≤ s_n:
- P_{θ_0}( #{i ∈ S_a: θ_{0,i} ∈ Ĉ_ni(L_{S,γ})} / #S_a ≥ 1 − γ ) → 1;
- P_{θ_0}( θ_{0,i} ∉ Ĉ_ni(L) ) → 1, for any L > 0 and i ∈ M_a;
- P_{θ_0}( #{i ∈ L_a: θ_{0,i} ∈ Ĉ_ni(L_{L,γ})} / #L_a ≥ 1 − γ ) → 1.

Few false discoveries; most easy discoveries made; intermediate discoveries not made.
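Given posterior draws from any sampler, the marginal credible intervals above reduce to a quantile computation around the posterior mean; a generic sketch (function name ad hoc, synthetic Gaussian draws standing in for MCMC output):

```python
import numpy as np

def marginal_credible_intervals(samples, L=1.0, level=0.95):
    """samples: array of shape (draws, n) from the posterior of theta.
    Center: theta_hat_i = posterior mean; radius: L * r_i, where r_i is the
    'level'-quantile of |theta_i - theta_hat_i| under the posterior draws."""
    center = samples.mean(axis=0)
    r = np.quantile(np.abs(samples - center), level, axis=0)
    return center - L * r, center + L * r

# Synthetic "posterior": 4000 draws for 3 coordinates, each ~ N(2, 1).
draws = np.random.default_rng(5).normal(2.0, 1.0, size=(4000, 3))
lo, hi = marginal_credible_intervals(draws)
```

The blow-up factor L is exactly the slack the theorem needs: a level-0.95 credible radius alone does not guarantee frequentist coverage.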
Simultaneous credible balls: impossibility of adaptation

General principle: the size of an honest confidence set is determined by the biggest model.

Theorem [Li, 1987]. If P_{θ_0}(C_n(Y^n) ∋ θ_0) ≥ 0.95 for all θ_0 ∈ R^n, then diam(C_n(Y^n)) ≳ n^{1/4} for some θ_0.

Theorem [Nickl & van de Geer, 2013]. If s_{1,n} ≪ s_{2,n} and diam(C_n(Y^n)) is of optimal size uniformly in ||θ_0||_0 ≤ s_{i,n} for i = 1, 2, then C_n(Y^n) cannot have uniform coverage over {θ_0: ||θ_0||_0 ≤ s_{2,n}}.

Since the Bayesian procedure adapts to sparsity, its credible sets cannot be honest confidence sets. [The optimal size is (s_{i,n} log(n/s_{i,n}))^{1/2}.]
Simultaneous credible balls: restricting the parameter

Coverage holds only when θ_0 does not cause too much shrinkage.

Definition [self-similarity]. For s = ||θ_0||_0, at least 0.001 s coordinates of θ_0 satisfy |θ_{0,i}| ≥ √(log(n/s)).

Definition [excessive-bias restriction, Belitser & Nurushev, 2015]. There exists s̃ ≤ s with
s̃ ≲ #(i: |θ_{0,i}| ≥ √(log(n/s̃)))  and  Σ_{i: |θ_{0,i}| ≤ √(log(n/s̃))} θ_{0,i}^2 ≲ s̃ log(n/s̃).

Self-similarity implies the excessive-bias restriction. (Self-similarity also allows the sets S, M, L to be tightened.)
Simultaneous credible balls

Credible ball: Ĉ_n(L) = {θ: ||θ − θ̂|| ≤ L r̂}, where θ̂ = E(θ | Y^n) and r̂ solves Π(θ: ||θ − θ̂|| ≤ r̂ | Y^n) = 0.95.

Theorem. If s_n/n → 0, then for sufficiently large L,
liminf_n inf_{θ_0 ∈ EBR[s_n]} P_{θ_0}(θ_0 ∈ Ĉ_n(L)) ≥ 1 − α.

Here EBR[s] is the set of vectors θ_0 that satisfy the excessive-bias restriction.
Numbers: coverage and average interval length

[Figures: coverage and average interval length; n = 400, with s_n (= p_n) nonzero means drawn from N(A, 1).]

Short summary: empirical and full Bayes work well.
Conclusions
- Bayesian sparse estimation gives excellent recovery.
- For valid simultaneous credible sets one needs a fraction of the nonzero parameters above the universal threshold.
- The danger of failing uncertainty quantification is not finding nonzero coordinates: discoveries are real.
Co-authors: Subhashis Ghoshal, Harry van Zanten, Ismael Castillo, Johannes Schmidt-Hieber, Stéphanie van der Pas, Botond Szabó, Willem Kruijer, Magnus Münch, Bartek Knapik, Bas Kleijn, Suzanne Sniekers, Fengnan Gao, Gwenael Leday, Gino Kpogbezan, Mark van de Wiel, Wessel van Wieringen.
Compatibility and coherence

||X|| := max_j ||X_{.,j}||. The compatibility number φ(S) for S ⊂ {1,...,p} is:
φ(S) = inf_{θ: ||θ_{S^c}||_1 ≤ 7 ||θ_S||_1} ||Xθ||_2 √|S| / (||X|| ||θ_S||_1).

Compatibility in s_n-sparse vectors means:
inf_{θ: ||θ||_0 ≤ 5 s_n} ||Xθ||_2 √|S_θ| / (||X|| ||θ||_1) ≫ 0.

Strong compatibility in s_n-sparse vectors means:
inf_{θ: ||θ||_0 ≤ 5 s_n} ||Xθ||_2 / (||X|| ||θ||_2) ≫ 0.

Mutual coherence means:
s_n max_{i≠j} |cor(X_{.i}, X_{.j})| ≪ 1.
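Of these conditions, mutual coherence is the easiest to compute; a small check on a random design with i.i.d. entries (as in the examples that follow; the dimensions are arbitrary):

```python
import numpy as np

def mutual_coherence(x):
    """max_{i != j} |cor(X_.i, X_.j)| over the columns of the design matrix."""
    c = np.corrcoef(x, rowvar=False)
    np.fill_diagonal(c, 0.0)          # exclude the diagonal cor(X_.i, X_.i) = 1
    return float(np.max(np.abs(c)))

rng = np.random.default_rng(4)
x = rng.standard_normal((500, 50))    # n = 500 individuals, p = 50 features
mc = mutual_coherence(x)
print(mc)  # the coherence condition asks for s_n * mc << 1
```

For i.i.d. Gaussian designs the maximal column correlation is of order √(log p / n), which is how the bound s_n ≪ √(n / log p) in the examples arises.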
Compatibility and coherence: examples

Mutual coherence ⇒ strong compatibility ⇒ compatibility. Mutual coherence is easy to understand and gives the best recovery results, but is very restrictive.

Sequence model: completely compatible, with zero mutual coherence number.

Response model: if the X_{i,j} are i.i.d. random variables, then coherence holds for s_n ≪ √(n/log p):
- if log p = o(n) and the X_{i,j} are bounded;
- if log p = o(n^{α/(4+α)}) and E e^{t|X_{i,j}|^α} < ∞.

C = X^T X / n: compatibility, but no coherence, if:
- C_{i,j} = ρ^{|i−j|}, for 0 < ρ < 1 and p = n;
- C is block diagonal with fixed block sizes.
More informationStatistica Sinica Preprint No: SS R2
Statistica Sinica Preprint No: SS-2017-0074.R2 Title The semi-parametric Bernstein-von Mises theorem for regression models with symmetric errors Manuscript ID SS-2017-0074.R2 URL http://www.stat.sinica.edu.tw/statistica/
More informationSome Curiosities Arising in Objective Bayesian Analysis
. Some Curiosities Arising in Objective Bayesian Analysis Jim Berger Duke University Statistical and Applied Mathematical Institute Yale University May 15, 2009 1 Three vignettes related to John s work
More information(4) One-parameter models - Beta/binomial. ST440/550: Applied Bayesian Statistics
Estimating a proportion using the beta/binomial model A fundamental task in statistics is to estimate a proportion using a series of trials: What is the success probability of a new cancer treatment? What
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More informationHorseshoe Prior: robustness against non-normal deviations.
D. Gomon Horseshoe Prior: robustness against non-normal deviations. Bachelor thesis 3 July 7 Thesis supervisors: dr. S.L. van der Pas dr. A.J. Schmidt-Hieber prof.dr. K.H. Kuijken Leiden University Mathematical
More informationBAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage
BAGUS: Bayesian Regularization for Graphical Models with Unequal Shrinkage Lingrui Gan, Naveen N. Narisetty, Feng Liang Department of Statistics University of Illinois at Urbana-Champaign Problem Statement
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationsparse and low-rank tensor recovery Cubic-Sketching
Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru
More informationGeometric ergodicity of the Bayesian lasso
Geometric ergodicity of the Bayesian lasso Kshiti Khare and James P. Hobert Department of Statistics University of Florida June 3 Abstract Consider the standard linear model y = X +, where the components
More informationHorseshoe, Lasso and Related Shrinkage Methods
Readings Chapter 15 Christensen Merlise Clyde October 15, 2015 Bayesian Lasso Park & Casella (JASA 2008) and Hans (Biometrika 2010) propose Bayesian versions of the Lasso Bayesian Lasso Park & Casella
More informationL 0 methods. H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands. December 5, 2011.
L methods H.J. Kappen Donders Institute for Neuroscience Radboud University, Nijmegen, the Netherlands December 5, 2 Bert Kappen Outline George McCullochs model The Variational Garrote Bert Kappen L methods
More informationChapter 4 HOMEWORK ASSIGNMENTS. 4.1 Homework #1
Chapter 4 HOMEWORK ASSIGNMENTS These homeworks may be modified as the semester progresses. It is your responsibility to keep up to date with the correctly assigned homeworks. There may be some errors in
More informationNotes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed
18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,
More informationICES REPORT Model Misspecification and Plausibility
ICES REPORT 14-21 August 2014 Model Misspecification and Plausibility by Kathryn Farrell and J. Tinsley Odena The Institute for Computational Engineering and Sciences The University of Texas at Austin
More informationEstimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators
Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let
More informationNearest Neighbor Gaussian Processes for Large Spatial Data
Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns
More informationRecursive Sparse Estimation using a Gaussian Sum Filter
Proceedings of the 17th World Congress The International Federation of Automatic Control Recursive Sparse Estimation using a Gaussian Sum Filter Lachlan Blackhall Michael Rotkowitz Research School of Information
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu 1,2, Daniel F. Schmidt 1, Enes Makalic 1, Guoqi Qian 2, John L. Hopper 1 1 Centre for Epidemiology and Biostatistics,
More informationLong-Run Covariability
Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips
More informationApproximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA)
Approximating high-dimensional posteriors with nuisance parameters via integrated rotated Gaussian approximation (IRGA) Willem van den Boom Department of Statistics and Applied Probability National University
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationDirichlet-Laplace priors for optimal shrinkage
Dirichlet-Laplace priors for optimal shrinkage Anirban Bhattacharya, Debdeep Pati, Natesh S. Pillai, David B. Dunson January 22, 2014 arxiv:1401.5398v1 [math.st] 21 Jan 2014 Abstract Penalized regression
More informationProbabilistic machine learning group, Aalto University Bayesian theory and methods, approximative integration, model
Aki Vehtari, Aalto University, Finland Probabilistic machine learning group, Aalto University http://research.cs.aalto.fi/pml/ Bayesian theory and methods, approximative integration, model assessment and
More informationLecture 7 October 13
STATS 300A: Theory of Statistics Fall 2015 Lecture 7 October 13 Lecturer: Lester Mackey Scribe: Jing Miao and Xiuyuan Lu 7.1 Recap So far, we have investigated various criteria for optimal inference. We
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationMessage Passing Algorithms for Compressed Sensing: II. Analysis and Validation
Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation David L. Donoho Department of Statistics Arian Maleki Department of Electrical Engineering Andrea Montanari Department of
More informationLasso & Bayesian Lasso
Readings Chapter 15 Christensen Merlise Clyde October 6, 2015 Lasso Tibshirani (JRSS B 1996) proposed estimating coefficients through L 1 constrained least squares Least Absolute Shrinkage and Selection
More informationPriors for the frequentist, consistency beyond Schwartz
Victoria University, Wellington, New Zealand, 11 January 2016 Priors for the frequentist, consistency beyond Schwartz Bas Kleijn, KdV Institute for Mathematics Part I Introduction Bayesian and Frequentist
More informationThe Spike-and-Slab LASSO (Revision)
The Spike-and-Slab LASSO (Revision) Veronika Ročková and Edward I. George July 19, 2016 Abstract Despite the wide adoption of spike-and-slab methodology for Bayesian variable selection, its potential for
More informationPeter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11
Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of
More informationGeneralized Elastic Net Regression
Abstract Generalized Elastic Net Regression Geoffroy MOURET Jean-Jules BRAULT Vahid PARTOVINIA This work presents a variation of the elastic net penalization method. We propose applying a combined l 1
More information11 : Gaussian Graphic Models and Ising Models
10-708: Probabilistic Graphical Models 10-708, Spring 2017 11 : Gaussian Graphic Models and Ising Models Lecturer: Bryon Aragam Scribes: Chao-Ming Yen 1 Introduction Different from previous maximum likelihood
More informationFactor model shrinkage for linear instrumental variable analysis with many instruments
Factor model shrinkage for linear instrumental variable analysis with many instruments P. Richard Hahn Booth School of Business University of Chicago Joint work with Hedibert Lopes 1 My goals for this
More informationEstimating Sparse High Dimensional Linear Models using Global-Local Shrinkage
Estimating Sparse High Dimensional Linear Models using Global-Local Shrinkage Daniel F. Schmidt Centre for Biostatistics and Epidemiology The University of Melbourne Monash University May 11, 2017 Outline
More informationGaussian Process Regression
Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process
More informationShrinkage Estimation in High Dimensions
Shrinkage Estimation in High Dimensions Pavan Srinath and Ramji Venkataramanan University of Cambridge ITA 206 / 20 The Estimation Problem θ R n is a vector of parameters, to be estimated from an observation
More informationSample Size Requirement For Some Low-Dimensional Estimation Problems
Sample Size Requirement For Some Low-Dimensional Estimation Problems Cun-Hui Zhang, Rutgers University September 10, 2013 SAMSI Thanks for the invitation! Acknowledgements/References Sun, T. and Zhang,
More informationHigh-dimensional covariance estimation based on Gaussian graphical models
High-dimensional covariance estimation based on Gaussian graphical models Shuheng Zhou Department of Statistics, The University of Michigan, Ann Arbor IMA workshop on High Dimensional Phenomena Sept. 26,
More informationNonparametric Bayesian Methods - Lecture I
Nonparametric Bayesian Methods - Lecture I Harry van Zanten Korteweg-de Vries Institute for Mathematics CRiSM Masterclass, April 4-6, 2016 Overview of the lectures I Intro to nonparametric Bayesian statistics
More informationOverall Objective Priors
Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University
More informationDISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING. By T. Tony Cai and Linjun Zhang University of Pennsylvania
Submitted to the Annals of Statistics DISCUSSION OF INFLUENTIAL FEATURE PCA FOR HIGH DIMENSIONAL CLUSTERING By T. Tony Cai and Linjun Zhang University of Pennsylvania We would like to congratulate the
More informationLecture 1: Bayesian Framework Basics
Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More informationAn Introduction to Bayesian Linear Regression
An Introduction to Bayesian Linear Regression APPM 5720: Bayesian Computation Fall 2018 A SIMPLE LINEAR MODEL Suppose that we observe explanatory variables x 1, x 2,..., x n and dependent variables y 1,
More informationHigh Dimensional Inverse Covariate Matrix Estimation via Linear Programming
High Dimensional Inverse Covariate Matrix Estimation via Linear Programming Ming Yuan October 24, 2011 Gaussian Graphical Model X = (X 1,..., X p ) indep. N(µ, Σ) Inverse covariance matrix Σ 1 = Ω = (ω
More information