Bayesian nonparametrics
1 Bayesian nonparametrics: posterior contraction and limiting shape. Ismaël Castillo (LPMA, Université Paris VI), BNP lecture series, Berlin, September 2016.
2 Part I: Introduction
3-4 Standard frequentist framework. Statistical experiment: X random object (= data), model P = {P_θ, θ ∈ Θ}. Frequentist assumption: there exists θ_0 ∈ Θ with X ~ P_{θ_0}. Estimator: a measurable function θ̂(X) ∈ Θ; θ̂(X) is a random point in Θ, and one studies θ̂(X) under X ~ P_{θ_0}. Example: θ̂_MLE(X) = argmax_{θ∈Θ} p_θ(X).
5 Bayesian framework. Statistical experiment: X random object (= data), P = {P_θ, θ ∈ Θ} model. Bayesian setting [do not know θ? View it as random!]: (a) θ ~ Π, a probability measure on Θ, the prior distribution; (b) view P_θ as the law of X given θ, so the joint distribution of (θ, X) is specified; (c) the law of θ given X is the posterior distribution, denoted Π[· | X]. Bayesian "estimator": Π[· | X] ∈ M_1(Θ); Π[· | X] is a data-dependent measure on Θ.
6 Example 0 (E0). X = (X_1, ..., X_n), P = {N(θ,1)^⊗n, θ ∈ R}. Frequentist estimator: θ̂_MLE(X) = X̄_n. Bayesian setting: (a) θ ~ N(0,1) = Π prior (say); (b) X | θ ~ N(θ,1)^⊗n; (c) θ | X ~ N(nX̄_n/(n+1), 1/(n+1)) = Π[· | X] posterior.
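The conjugate update on this slide is a two-line computation. A minimal numerical sketch (the values of θ_0 and n are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 1.5          # illustrative "true" parameter
n = 100
X = rng.normal(theta0, 1.0, size=n)

xbar = X.mean()
# Conjugate update for the N(0,1) prior with N(theta,1)^n likelihood:
# posterior is N( n*xbar/(n+1), 1/(n+1) )
post_mean = n * xbar / (n + 1)
post_var = 1.0 / (n + 1)
print(post_mean, post_var)
```

The posterior mean shrinks X̄_n slightly towards the prior mean 0, and the posterior variance is of order 1/n.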
7-8 Bayesian framework. Bayesian setting: (a) θ ~ Π prior; (b) X | θ ~ P_θ; (c) θ | X: Π[· | X] posterior. All this produces a data-dependent measure Π[· | X]. And what if one forgot (a)+(b)+(c) and studied Π[· | X] as a standard estimator?
9-10 Frequentist analysis of Bayesian procedures. Posterior distribution Π[· | X]; frequentist assumption: θ_0 ∈ Θ, X ~ P_{θ_0}. E0 (Example 0): with θ ~ N(0,1) = Π and X | θ ~ N(θ,1)^⊗n, Π[· | X] = N(nX̄_n/(n+1), 1/(n+1)); that is, a posterior draw can be written θ̄(X) + (n+1)^{-1/2} N(0,1) with θ̄(X) = nX̄_n/(n+1). The posterior is centered close to θ̂_MLE(X) = X̄_n and converges at rate 1/√n towards θ_0 [see below].
11 Nonparametric models: regression. E1 Gaussian white noise: dX^(n)(t) = f(t)dt + (1/√n) dB(t), t ∈ [0,1], with f ∈ L²[0,1] and B Brownian motion. E1 Fixed design regression: X_i = f(i/n) + ε_i, with f ∈ C⁰[0,1] and ε_1, ..., ε_n iid N(0,1). The unknown parameter f is a function, an element of a functional space.
12 Sparsity. E2 Sparse Gaussian sequence model: X_i = θ_i + ε_i, 1 ≤ i ≤ n, with θ ∈ l_0[s] = {θ ∈ R^n : #{i : θ_i ≠ 0} ≤ s}. The unknown θ is a sparse vector.
13 Nonparametric models: sampling models. E3 Sampling from an unknown distribution: X^(n) = (X_1, ..., X_n) i.i.d. P on [0,1], with P a probability measure on [0,1]. E3 Density estimation: X^(n) = (X_1, ..., X_n) i.i.d. f, with f a density on [0,1]. Unknown P: a measure with total mass 1. Unknown f: a density, f ≥ 0, ∫ f = 1.
14 Nonparametric models: functionals. E4 Linear functional: ψ(f) = ∫_0^1 a(u) f(u) du, with a bounded on [0,1]. E5 Distribution function: F(·) = ∫_0^· f(u) du, with f a density. Many other semiparametric functionals are also of interest...
15-17 Priors on functions on [0,1]. Consider the law of a process Z with a.s. continuous sample paths. Example (centered Gaussian process): K(x,y) = E(Z_x Z_y), the covariance of the GP. K(x,y) = 1 + x∧y: Brownian motion released at 0. K(x,y) = e^{-(x-y)²}: squared-exponential GP. Example (series-expansion point of view): Z(x) = Σ_{k≥1} γ_k α_k ε_k(x).
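The two covariance kernels on this slide can be simulated directly by drawing a multivariate Gaussian with the corresponding covariance matrix. A minimal sketch (grid size and jitter are implementation choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)

# The two kernels from the slide:
K_bm = 1.0 + np.minimum.outer(x, x)          # Brownian motion released at 0
K_se = np.exp(-np.subtract.outer(x, x) ** 2) # squared-exponential GP

def sample_path(K, jitter=1e-8):
    """Draw one centered Gaussian path with covariance matrix K."""
    L = np.linalg.cholesky(K + jitter * np.eye(len(K)))
    return L @ rng.normal(size=len(K))

z_bm = sample_path(K_bm)  # rough path (nowhere differentiable in the limit)
z_se = sample_path(K_se)  # very smooth (analytic) path
```

The contrast in path smoothness between the two draws is exactly the point of the two examples: the kernel encodes the regularity of the prior sample paths.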
18 Priors on sparse vectors. Define Π, a prior on l_0[s] = {θ ∈ R^n : #{i : θ_i ≠ 0} ≤ s}: draw k ~ π_n, a distribution on {0, ..., n}; given k, pick S ⊂ {1, ..., n} of size |S| = k uniformly at random, Π_n(S | k) = 1/C(n,k); given S, set θ_{S^c} = 0 and θ_S ~ ⊗_{i∈S} g, with g a density on R.
19 Sparsity. Example [special case of π_n = binomial: "coin flipping" prior, e.g. α_n = 1/n]: k ~ Bin(n, α_n) = π_n, g a density on R, Π = ⊗_{i=1}^n [(1 − α_n) δ_0 + α_n g]. [Remark: the posterior median can be shown to be a strict thresholding rule: θ̂_i^med(X_i) = 0 ⟺ |X_i| ≤ t(α_n) for some threshold t(α_n). For α_n = 1/n, one has t(α_n) ≈ √(2 log n).]
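Sampling from this coin-flipping (spike-and-slab) prior is immediate: flip n independent coins, and draw from the slab g where the coin comes up heads. A minimal sketch with a Laplace slab; the values n = 10 000 and α = 1/100 are illustrative (the slide's example uses α_n = 1/n):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
alpha = 1.0 / 100   # illustrative coin probability; the slide's example is 1/n

# theta_i = 0 with probability 1 - alpha, otherwise a draw from the slab g (Laplace)
coins = rng.random(n) < alpha
theta = np.where(coins, rng.laplace(0.0, 1.0, n), 0.0)
print(np.count_nonzero(theta))   # about n * alpha nonzero coordinates on average
```

A draw is sparse by construction: the number of nonzero coordinates is Bin(n, α), matching the hierarchical description k ~ Bin(n, α_n) on the slide.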
20-21 Priors on densities on [0,1]. Example (renormalisation): for Z a random process, x ↦ e^{Z_x} / ∫_0^1 e^{Z_u} du. Example: [more to follow!]
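The renormalisation trick turns any a.s. continuous process into a random density. A minimal sketch using a Brownian-motion path for Z, approximating the normalising integral by a Riemann sum on a uniform grid (grid size is an implementation choice):

```python
import numpy as np

rng = np.random.default_rng(5)
grid = np.linspace(0.0, 1.0, 512)

# Brownian-motion-like path Z on the grid: cumulative sum of Gaussian increments
dz = rng.normal(0.0, np.sqrt(1.0 / len(grid)), size=len(grid))
Z = np.cumsum(dz)

# Renormalisation of the slide: x -> exp(Z_x) / int_0^1 exp(Z_u) du
f = np.exp(Z)
f /= f.mean()   # on a uniform grid over [0,1], the mean approximates the integral
```

By construction f > 0 and its (discretised) integral over [0,1] equals 1, so each draw is a valid density.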
22 Priors on probability measures. Prior on probability measures on R: the Dirichlet process DP(M, G_0), with G_0 a probability measure on R (the mean measure) and M > 0 a concentration parameter. There exists a random probability measure G on R such that, for any finite partition (B_1, ..., B_r) of R, (G(B_1), ..., G(B_r)) ~ Dir(MG_0(B_1), ..., MG_0(B_r)), with Dir(a_1, ..., a_r) the standard Dirichlet distribution on the r-simplex. [Ferguson 1973]
23-24 Priors on probability measures: the Dirichlet process. G ~ DP(M, G_0): what does it look like? G is a discrete distribution almost surely. We have the representation [Sethuraman 94]: G = Σ_{k=1}^∞ π_k δ_{θ_k}, with θ_k ~ G_0 i.i.d. and weights π_k given by stick-breaking: π_k = V_k Π_{i=1}^{k−1} (1 − V_i), with V_1, V_2, ... ~ Beta(1, M) i.i.d. Pitman-Yor process PY(M, d, G_0): same representation with V_k ~ Beta(1 − d, M + kd).
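The Sethuraman stick-breaking representation gives a direct sampler for (a truncation of) a DP draw. A minimal sketch with G_0 = N(0,1); the concentration M and the truncation level are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 2.0         # concentration parameter (illustrative)
trunc = 500     # truncation level of the stick-breaking series

V = rng.beta(1.0, M, size=trunc)
# pi_k = V_k * prod_{i<k} (1 - V_i): break off a fraction V_k of the remaining stick
pi = V * np.cumprod(np.concatenate(([1.0], 1.0 - V[:-1])))
atoms = rng.normal(0.0, 1.0, size=trunc)    # theta_k ~ G_0 = N(0,1), i.i.d.

# G is (up to truncation) the discrete measure sum_k pi_k * delta_{atoms[k]}
print(pi.sum())   # close to 1 for a large truncation level
```

The leftover stick length Π_k (1 − V_k) decays geometrically, which is why a moderate truncation already captures essentially all the mass; swapping V_k ~ Beta(1 − d, M + kd) into the same code gives the Pitman-Yor process.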
25-27 Priors on probability measures: Pólya trees on [0,1]. Consider a sequence I of dyadic partitions of [0,1] = I: I_0 = {[0,1]}, I_1 = {I_0, I_1}, I_2 = {I_00, I_01, I_10, I_11}, ..., I_k = {I_ε, ε ∈ {0,1}^k}, ...; say, for simplicity, split in half: dyadic intervals (k2^{−l}, (k+1)2^{−l}]. [Binary tree: I splits into I_0, I_1 with weights V_0, V_1; I_0 into I_00, I_01 with weights V_00, V_01; I_1 into I_10, I_11 with weights V_10, V_11; and so on.] We set P(I_01) = V_0 V_01 and, more generally, P(I_{ε_1...ε_k}) = V_{ε_1} V_{ε_1ε_2} ··· V_{ε_1ε_2···ε_k} for ε = ε_1ε_2···ε_k. To get a probability measure, one sets V_1 = 1 − V_0 and, more generally, V_{ε1} = 1 − V_{ε0}.
28-29 Pólya trees on [0,1]. A Pólya tree PT(I, α) on [0,1] is the random probability measure P on [0,1] defined as follows. Given a collection of independent random variables V_{ε0} ~ Beta(α_{ε0}, α_{ε1}), for any ε ∈ {0,1}^k, k ≥ 0, set, for any ε = ε_1...ε_k ∈ {0,1}^k, any k ≥ 0, P(I_ε) = Π_{j=1; ε_j=0}^k V_{ε_1...ε_{j−1}0} × Π_{j=1; ε_j=1}^k (1 − V_{ε_1...ε_{j−1}0}). Special cases: if α_ε = M α(I_ε) for a probability measure α(·) on [0,1], this is DP(M, α). If α_{ε_1···ε_{k−1}0} = α_{ε_1···ε_{k−1}1} = a_k with Σ_k a_k^{−1} < ∞ and I the regular partition, then P ≪ Leb[0,1] a.s.: in this case it is a prior on densities!
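A Pólya tree draw can be simulated level by level: each interval's mass is split between its two children by an independent Beta variable. A minimal sketch for the symmetric case of the slide, with the common choice a_k = k² (which satisfies Σ 1/a_k < ∞, the density regime); depth and a_k are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
depth = 8
a = lambda k: k ** 2   # a_k = k^2: sum of 1/a_k converges, so P has a density a.s.

def polya_tree_masses(depth):
    """Masses P(I_eps) of all dyadic intervals at the given level."""
    masses = np.array([1.0])
    for k in range(1, depth + 1):
        # V_{eps0} ~ Beta(a_k, a_k); left child gets V, right child gets 1 - V
        V = rng.beta(a(k), a(k), size=len(masses))
        masses = np.column_stack((masses * V, masses * (1 - V))).ravel()
    return masses

P = polya_tree_masses(depth)
print(len(P), P.sum())   # 2^depth intervals, total mass 1
```

Each level redistributes mass without creating or destroying any, so the 2^depth interval masses always sum to 1: a histogram approximation of the random density.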
30 Part II: Rates
31 Bayesian dominated framework. Experiment: X = X^(n), P = {P_θ^(n), θ ∈ F}, with (F, F) a measurable space. Dominated framework: suppose there exists a dominating measure μ^(n), dP_θ^(n) = p_θ^(n)(·) dμ^(n). Bayesian setting: (a) θ ~ Π prior distribution; (b) X | θ ~ P_θ^(n); (c) θ | X: Π[· | X] posterior. Bayes formula: for any measurable set B ∈ F, Π(B | X^(n)) = ∫_B p_θ^(n)(X^(n)) dΠ(θ) / ∫ p_θ^(n)(X^(n)) dΠ(θ). Remark: Π[B] = 0 ⟹ Π[B | X] = 0.
32 Consistency. The posterior is consistent at θ_0 if, as n → ∞, Π(· | X^(n)) → δ_{θ_0} weakly, in P_{θ_0}^(n)-probability. In a separable metric space equipped with a distance d: for any ε > 0, Π(θ : d(θ, θ_0) < ε | X^(n)) → 1 in P_{θ_0}^(n)-probability. [Notation: in the sequel, drop the index n and write P_{θ_0}, E_{θ_0}, X.] General consistency results: [Doob (1949)], [Schwartz (1965)], [Barron, Schervish, Wasserman (1999)].
33 Convergence rate. The posterior converges at rate ε_n for the distance d at θ_0 if, as n → ∞, E_{θ_0} Π(θ : d(θ, θ_0) ≤ ε_n | X) → 1. It is an upper bound: we look for the smallest possible ε_n. What happens in a nonparametric framework? [Ghosal, Ghosh, van der Vaart 00], [Ghosal, van der Vaart 07]
34 Convergence rate: lower bound. We say that ζ_n is a lower bound for the rate for d at θ_0 if E_{θ_0} Π(θ : d(θ, θ_0) ≤ ζ_n | X) → 0. One looks for the largest possible ζ_n. It may look a little odd at first that there is no mass close to θ_0... It just says that ζ_n is a too-fast scaling that captures little posterior mass.
35 Rates: regular parametric models. E0: X | θ ~ N(θ,1)^⊗n, θ ~ N(0,1) = Π, Π[· | X] = N(nX̄_n/(n+1), 1/(n+1)). Exercise: show that, for any M_n → ∞, E_{θ_0} Π[θ : |θ − θ_0| ≤ M_n/√n | X] → 1. This extends to regular parametric models and most reasonable priors.
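Since the posterior here is an explicit Gaussian, the mass of the M_n/√n ball in the exercise can be computed in closed form with the normal CDF. A minimal sketch, taking X̄_n at its typical value θ_0 for a deterministic illustration (n, θ_0 and the M_n values are illustrative):

```python
import math

n = 10_000
theta0 = 0.7
# Posterior is N(n*xbar/(n+1), 1/(n+1)); under P_{theta0}, xbar concentrates at theta0,
# so take xbar = theta0 for a deterministic illustration.
xbar = theta0
post_mean, post_sd = n * xbar / (n + 1), math.sqrt(1.0 / (n + 1))

def post_mass(radius):
    """Posterior mass of {|theta - theta0| <= radius} under the Gaussian posterior."""
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))
    lo, hi = theta0 - radius, theta0 + radius
    return Phi((hi - post_mean) / post_sd) - Phi((lo - post_mean) / post_sd)

for Mn in (1, 3, 10):   # as M_n grows, the mass of the M_n/sqrt(n) ball tends to 1
    print(Mn, round(post_mass(Mn / math.sqrt(n)), 4))
```

The printed masses increase towards 1 as M_n grows, which is exactly the statement of the exercise for this conjugate example.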
36 Posterior and aspects of the posterior. The posterior distribution Π[· | X] may have a well-defined posterior mean θ̄(X) := ∫ θ dΠ(θ | X), posterior mode θ*(X) := argmax_θ (dΠ(θ | X)/dμ), posterior median, posterior quantiles... These are estimators in the standard sense. Often, but not always, a convergence rate for Π[· | X] implies the same rate for a given aspect of it.
37 Posterior and aspects of the posterior. Fact 1: suppose Π[d(θ, θ_0) > ε_n | X] = o_P(1). Let θ_B(X) be the center of the smallest d-ball containing at least 1/2 of the posterior mass; then d(θ_B(X), θ_0) = O_P(ε_n). Fact 2: suppose Π[d(θ, θ_0) > ε_n | X] = O_P(ε_n²) and θ ↦ d²(θ, θ_0) is convex and bounded [e.g. d = h]. Then d(θ̄(X), θ_0) = O_P(ε_n).
38 Posterior integrated loss. Fact 3: suppose E_{θ_0} ∫ d(θ, θ_0)² dΠ(θ | X) ≲ ε_n². Then for any M_n → ∞, E_{θ_0} Π[θ : d(θ, θ_0)² > M_n ε_n² | X] = o(1), and, if θ ↦ d²(θ, θ_0) is convex, E_{θ_0}[d(θ̄(X), θ_0)²] ≲ ε_n².
39 First examples. E1 Fixed design regression [van der Vaart, van Zanten 08, 09]. True function: let f_0 ∈ C^β[0,1]. Loss function: ‖g‖_n² = n^{−1} Σ_{i=1}^n g(t_i)². Prior: Brownian motion + Gaussian: W_t = B_t + Z_0, with Z_0 ~ N(0,1). Then, as n → ∞, E_{f_0} Π[f : ‖f − f_0‖_n ≤ ε_n | X] → 1, where ε_n = n^{−1/4} if β ≥ 1/2, and ε_n = n^{−β/2} if β ≤ 1/2.
40 First examples. E1 Fixed design regression (continued). Prior: Riemann-Liouville process with parameter α > 0: W_t = ∫_0^t (t − s)^{α−1/2} dB_s + Σ_{k=0}^{⌈α⌉} Z_k t^k, with Z_k ~ N(0,1) iid. Then E_{f_0} Π[f : ‖f − f_0‖_n ≤ ε_n | X] → 1, where ε_n = n^{−α/(2α+1)} if β ≥ α, and ε_n = n^{−β/(2α+1)} if β ≤ α.
41 First examples. E3 Density estimation [van der Vaart, van Zanten 08, 09]. True density: let f_0 ∈ C^β[0,1] with f_0 > 0. Loss function: Hellinger distance, h(f,g)² = ∫(√f − √g)². Prior: consider the distribution on continuous densities induced by t ↦ e^{W_t} / ∫_0^1 e^{W_u} du, with W_t either Brownian motion or a Riemann-Liouville process with parameter α > 0. Then, for ε_n as before, E_{f_0} Π[h(f, f_0) ≤ ε_n | X] → 1.
42 Theory: Bayesian nonparametrics. Consistency: [Doob (1949)] consistency up to a Π-null set; [Schwartz (1965)] consistency under testing and enough prior mass in Kullback-Leibler-type neighborhoods; [Diaconis, Freedman (1986)] example of an innocent-looking prior in a semiparametric framework for which the posterior is not consistent. Rates of convergence?
43 GGV theory: first lemma. To fix ideas, consider first the density estimation model: X = (X_1, ..., X_n), P = {P_f^n, dP_f = f dμ, f ∈ F}, with F some set of densities and Π a prior distribution on F. Bayes formula: Π[B | X] = ∫_B Π_{i=1}^n f(X_i) dΠ(f) / ∫ Π_{i=1}^n f(X_i) dΠ(f). Notation: K(f_0, f) = ∫ log(f_0/f) f_0 dμ, V(f_0, f) = ∫ (log(f_0/f) − K(f_0, f))² f_0 dμ.
44 GGV theory: first lemma. Define a Kullback-Leibler-type neighborhood of f_0 by B_KL(f_0, ε_n) := {f : K(f_0, f) ≤ ε_n², V(f_0, f) ≤ ε_n²}. Lemma 1: let A_n be measurable and let nε_n² → ∞. Suppose Π[A_n] ≤ e^{−2nε_n²} Π[B_KL(f_0, ε_n)]. Then Π[A_n | X] = o_P(1). [Idea: very small prior mass implies small posterior mass.]
45-46 Lemma 2: for any probability measure Π on F, any C > 0 and ε > 0, ∫ Π_{i=1}^n (f/f_0)(X_i) dΠ(f) ≥ Π(B_KL(f_0, ε)) e^{−(1+C)nε²}, with P_{f_0}-probability at least 1 − 1/(C²nε²). [Proof] For B with Π(B) > 0, write Π̃ = Π(· ∩ B)/Π(B), so that ∫ Π_{i=1}^n (f/f_0)(X_i) dΠ(f) ≥ Π(B) ∫_B Π_{i=1}^n (f/f_0)(X_i) dΠ̃(f). By Jensen's inequality, log ∫_B Π_{i=1}^n (f/f_0)(X_i) dΠ̃(f) ≥ Σ_{i=1}^n ∫_B log(f/f_0)(X_i) dΠ̃(f) = −Σ_{i=1}^n ∫_B [log(f_0/f)(X_i) − K(f_0, f)] dΠ̃(f) − n ∫_B K(f_0, f) dΠ̃(f) ≥ −Σ_{i=1}^n Z_i − nε², if B = B_KL(f_0, ε).
47-48 With Z_i := ∫_B [log(f_0/f)(X_i) − K(f_0, f)] dΠ̃(f), the Z_i are iid with E_{f_0} Z_1 = 0 [Fubini] and Var_{f_0} Z_1 = E_{f_0} (∫_B [log(f_0/f)(X_1) − K(f_0, f)] dΠ̃(f))² ≤ ∫_B E_{f_0}[log(f_0/f)(X_1) − K(f_0, f)]² dΠ̃(f) [Cauchy-Schwarz, Fubini] = ∫_B V(f_0, f) dΠ̃(f) ≤ ε², since Π̃(B) = 1. By Chebyshev's inequality, P_{f_0}[Σ_{i=1}^n Z_i > Cnε²] ≤ 1/(C²nε²). On the complementary event, one thus has log ∫_B Π_{i=1}^n (f/f_0)(X_i) dΠ̃(f) ≥ −(C+1)nε², that is, ∫ Π_{i=1}^n (f/f_0)(X_i) dΠ(f) ≥ Π(B) e^{−(C+1)nε²}.
49 GGV theory: first lemma. Recall B_KL(f_0, ε_n) := {f : K(f_0, f) ≤ ε_n², V(f_0, f) ≤ ε_n²}. Lemma 1: let A_n be measurable and nε_n² → ∞; if Π[A_n] ≤ e^{−2nε_n²} Π[B_KL(f_0, ε_n)], then Π[A_n | X] = o_P(1). [Proof of Lemma 1: bound the numerator of Bayes' formula in expectation by Π[A_n] via Fubini, and lower-bound the denominator via Lemma 2.]
50-51 Theory: Bayesian nonparametrics. Experiment: X = X^(n), P = {P_θ^(n), θ ∈ F} as before [not necessarily iid sampling]. Prior: Π a prior distribution on θ. Goal: for some distance d_n and rate ε_n, E_{θ_0} Π[θ : d_n(θ, θ_0) > Mε_n | X] → 0. Let us assume that, for nε_n² → ∞, the prior puts enough mass on neighborhoods of θ_0: with B_KL(θ_0, ε_n) = {θ : ∫ log(p_{θ_0}^(n)/p_θ^(n)) dP_{θ_0}^(n) ≤ nε_n², ∫ log²(p_{θ_0}^(n)/p_θ^(n)) dP_{θ_0}^(n) ≤ nε_n²}, one has Π(B_KL(θ_0, ε_n)) ≥ e^{−cnε_n²}. [This extends the definition given above in the iid sampling model.]
52-53 Theory: Bayesian nonparametrics. Some sieve sets Θ_n capture most of the prior mass: Π(Θ \ Θ_n) ≤ e^{−(c+4)nε_n²}. Test of the true parameter vs. the complement of a ball (T1): there exist tests ψ_n such that E_{θ_0} ψ_n → 0 and sup_{θ∈Θ_n : d_n(θ,θ_0)>Mε_n} E_θ(1 − ψ_n) ≤ e^{−(c+4)nε_n²}. A sufficient condition is a control of an entropy [see below].
54 Theory: Bayesian nonparametrics. Theorem 1 [Ghosal, Ghosh, van der Vaart 00], [Ghosal, van der Vaart 07]: suppose there are sets Θ_n ⊂ Θ and c, M > 0 such that: (tests) there exist tests ψ_n with E_{θ_0} ψ_n → 0 and sup_{θ∈Θ_n : d_n(θ,θ_0)>Mε_n} E_θ(1 − ψ_n) ≤ e^{−(c+4)nε_n²}; (remaining mass) Π(Θ \ Θ_n) ≤ e^{−(c+4)nε_n²}; (prior mass) Π(B_KL(θ_0, ε_n)) ≥ e^{−cnε_n²}. Then, for M > 0 as above, E_{θ_0} Π[θ : d_n(θ, θ_0) ≤ Mε_n | X] → 1.
55 Testing via entropy. (T0): tests of a point vs. a ball exist for the distance d_n. N(ε, Θ_n, d_n) = minimal number of balls of radius ε for d_n needed to cover Θ_n. Lemma 3: suppose (T0) holds and log N(ε_n, Θ_n, d_n) ≤ Cnε_n². Then for any given c > 0 and M large enough, (T1) holds.
56 Proof of Lemma 3 [sketch]. Build shells C_j covering the complement of the ball; cover each shell by a minimal number of balls; to each ball center, associate a test ϕ_{i,j} via (T0); set ψ := max_{i,j} ϕ_{i,j}; the testing errors, of order e^{−Knjε_n²} over shell j, are summable in j.
57-58 Testing distances: examples. About the testing distance in (T0): general results for testing between convex sets [Le Cam 70s, Birgé 80s] imply that (T0) holds in iid and some simple dependency settings, for d = h or d = ‖·‖_1. Proposition: in density estimation, P_f^(n) = P_f^n, (T0) holds for d = ‖·‖_1. [Idea of proof] Let f_0, f_1 with ‖f_0 − f_1‖_1 > ε. For A = {x : f_0(x) < f_1(x)} and P_n(A) = n^{−1} Σ_{i=1}^n 1l{X_i ∈ A}, consider φ_n = 1l{P_n(A) > P_{f_0}(A) + ‖f_0 − f_1‖_1/3}. Control E_{f_0} φ_n and E_f(1 − φ_n) via Hoeffding's inequality.
59 Theory: Bayesian nonparametrics. Theorem 1, entropy version [GGV 00]: suppose there exist Θ_n ⊂ Θ and c > 0 such that, for d_n satisfying (T0): (entropy) log N(ε_n, Θ_n, d_n) ≤ dnε_n²; (remaining mass) Π(Θ \ Θ_n) ≤ e^{−(c+4)nε_n²}; (prior mass) Π(B_KL(θ_0, ε_n)) ≥ e^{−cnε_n²}. Then, for M > 0 large enough, E_{θ_0} Π[θ : d_n(θ, θ_0) ≤ Mε_n | X] → 1.
60-61 Theory: Gaussian priors. W = (W_t : t ∈ T) a centered Gaussian process taking values in a Banach space (B, ‖·‖). Covariance kernel: K(s,t) = E(W_s W_t). Reproducing kernel Hilbert space H (RKHS) associated with W: define a norm ‖·‖_H via ⟨Σ_{i=1}^p a_i K(s_i, ·), Σ_{j=1}^q b_j K(t_j, ·)⟩_H = Σ_{i,j} a_i b_j K(s_i, t_j); then one sets H = closure of Vect{K(s, ·), s ∈ T} for ‖·‖_H. Brownian motion (B_t): H = {∫_0^· f(u) du, f ∈ L²[0,1]}. Series prior Σ_k σ_k ν_k ε_k: H = {h = (h_k) ∈ l² : Σ_{k≥1} σ_k^{−2} h_k² < +∞}.
62 Theory: Gaussian priors. Concentration function: let g be in the support of W in B. For ε > 0, set ϕ_g(ε) = inf_{h∈H : ‖h−g‖<ε} ‖h‖_H²/2 − log P(‖W‖ < ε) [approximation term + small ball probability]. Fact: for all g in the support of W in B and all ε > 0, e^{−ϕ_g(ε)} ≤ P(‖W − g‖ < ε) ≤ e^{−ϕ_g(ε/2)}. Example [small ball probability]: for Brownian motion (B_t), −log P(‖B‖ < ε) ≍ ε^{−2} (ε → 0).
63 Theory: Gaussian priors [van der Vaart, van Zanten 08]. Consider a nonparametric problem with unknown function f_0 ∈ B. Prior Π = law of a Gaussian process W on B, with RKHS H. Assume that f_0 is in the support of the prior in B, and that the norm on B combines correctly with the testing distance d. Let ε_n be a solution of the equation ϕ_{f_0}(ε_n) ≤ nε_n². Then the posterior contracts at rate ε_n: for large enough M, E_{f_0} Π(d(f, f_0) > Mε_n | X) → 0. Lower bound: [C 08].
64 Theory: Gaussian priors. Ingredients of proof: (prior mass) the [Fact] links P(‖W − w‖ < ε) and the concentration function; (sieves) Borell's inequality [Borell 75]: with B_1 and H_1 the unit balls of B and H associated with W, P(W ∉ MH_1 + εB_1) ≤ 1 − Φ(Φ^{−1}(e^{−ϕ_0(ε)}) + M); this suggests setting Θ_n = √n ε_n H_1 + ε_n B_1; (entropy) one can link the entropy of H_1 and the small ball probability.
65-66 Example: density estimation and Gaussian priors. Squared-exponential covariance kernel: centered Gaussian process Z_t with covariance E(Z_t Z_s) = e^{−(s−t)²/L}. [van der Vaart, van Zanten 10] show that, for fixed L, there are regular functions f_0 for which the rate is at best logarithmic: ε_n ≍ (log n)^{−γ(β)}. However, those priors are used in machine learning and give very good results in practice when the parameter L is "well chosen"...
67-68 Example: density estimation and Gaussian priors, adaptation [van der Vaart, van Zanten 09]. [Figure] Prior Π: consider Z_{At} with A ~ Gamma distribution, and set t ↦ e^{Z_{At}} / ∫_0^1 e^{Z_{Au}} du, where u ↦ Z_u is a centered GP with squared-exponential kernel. Then the posterior converges at the minimax rate up to a log factor: E_{f_0} Π[h(f, f_0) ≤ M (log n)^{γ(β)} n^{−β/(2β+1)} | X] → 1.
69 Example: adaptation on manifolds [C, Kerkyacharian, Picard 14]. Let M be a compact Riemannian manifold of dimension d without boundary, and Δ_M the Laplacian, a linear operator on functions on M with discrete spectrum: (−Δ_M) ϕ_p = λ_p ϕ_p, with 0 ≤ λ_1 ≤ λ_2 ≤ ··· . Since Δ_M(e^{−λ_p t} ϕ_p) = −λ_p e^{−λ_p t} ϕ_p = ∂_t(e^{−λ_p t} ϕ_p), this is a special solution of the "heat equation" on M: Δ_M f = ∂_t f.
70 Example: adaptation on manifolds. Let α_p ~ N(0,1) iid. A GP "random solution of the heat equation" is W_t := Σ_{p≥1} e^{−λ_p t/2} α_p ϕ_p. The associated family of covariance kernels is P_t(x,y) = Σ_{p≥1} e^{−λ_p t} ϕ_p(x) ϕ_p(y), the M-heat kernel. Subgaussian estimates: c_1 e^{−c ρ²(x,y)/t} / √(|B(x,√t)| |B(y,√t)|) ≤ P_t(x,y) ≤ c_2 e^{−c' ρ²(x,y)/t} / √(|B(x,√t)| |B(y,√t)|).
71-73 Example: adaptation on manifolds. Model: dX^(n)(x) = f(x)dx + (1/√n) dZ(x), x ∈ M. Prior Π: W_T = Σ_p e^{−λ_p T/2} α_p ϕ_p, with T random with density proportional to t^{−a} e^{−t^{−d/2} log^q(1/t)}; W_T is seen as a prior on (B, ‖·‖) = (L², ‖·‖_2); set q = 1 + d/2. Suppose f_0 ∈ B^s_{2,∞}(M). Then, for M large enough, as n → ∞, E_{f_0} Π[f : ‖f − f_0‖_2 ≥ M (log n / n)^{s/(2s+d)} | X] → 0. The rate is sharp: for small enough ρ, there exists f_0 ∈ B^s_{2,∞}(M) such that E_{f_0} Π[f : ‖f − f_0‖_2 ≤ ρ (log n / n)^{s/(2s+d)} | X] → 0.
74 Example: sparsity. Recall X_i = θ_i + ε_i and θ ∈ l_0[s], an s-sparse vector. Example [coin flipping prior with fixed α_n = 1/n]: k ~ Bin(n, α_n) = π_n, g a density on R, Π = ⊗_{i=1}^n [(1 − α_n) δ_0 + α_n g].
75 Example: sparsity. Example [coin flipping prior with random α]: draw α ~ Beta(1, n); given α, k ~ Bin(n, α) = π_n; g a density on R; Π: given α, θ ~ ⊗_{i=1}^n [(1 − α) δ_0 + α g].
76 Example: sparsity. Theorem: let Π be the Bayesian coin-flipping prior with random α ~ Beta(1, n) and g the Laplace (or a heavier-tailed) density. Then, for M large enough, sup_{θ_0∈l_0[s_n]} E_{θ_0} Π[θ : ‖θ − θ_0‖² > M s_n log(n/s_n) | X] → 0.
77-78 Example: mixtures [Rousseau 10]. Observations X_1, ..., X_n i.i.d. with density f_0 > 0, Hölder C^β[0,1]. Beta densities: g(x | a, b) = x^{a−1}(1 − x)^{b−1}/B(a,b) (usual parametrisation); reparametrisation g_{α,ε}(x) = g(x | αε, α(1 − ε)). Prior Π = hierarchical mixture of Beta densities: g_{α,k,p_k,ɛ_k}(·) = Σ_{j=1}^k p_j g_{α,ε_j}(·), with α ~ π_α, k ~ π_k, (p_1, ..., p_k) | k ~ a law on the canonical simplex of R^k, (ε_1, ..., ε_k) | k ~ a law on (0,1)^k. Then the posterior concentrates at rate E_{f_0} Π[h(f, f_0) ≤ M (log n)^{ρ(β)} n^{−β/(2β+1)} | X] → 1.
79-82 Posterior measure and aspects of the measure (I). MESSAGE: the full posterior measure and aspects of it may differ significantly! Model: Y = Xθ + ɛ with X = √n I_n [sequence model]. Let Π^λ = ⊗_{i=1}^p Laplace(λ) [without point masses at zero]. Then θ̂^λ_LASSO = argmax_θ [−‖Y − Xθ‖_2² − 2λ‖θ‖_1] is the posterior mode of Π^λ[· | Y]. Lemma [C, Schmidt-Hieber, van der Vaart 15]: there exists δ > 0 such that, as n → ∞, E_{θ_0=0} Π^{λ_n}(θ : ‖θ‖_2 ≤ δ√n/λ_n | Y) → 0, with λ_n = λ_n^LASSO = C√(log n). Note: √(n/log n) is a suboptimal convergence rate!
83-84 Posterior measure and aspects of the measure (II). Model: Y = Xθ + ε with X = √n I_n. Prior Π on θ: Bayesian coin flipping with random α and g (e.g.) the Laplace density. Consider estimation of θ under the l_q metric, 0 < q ≤ 2: d_q(θ, θ') = Σ_{i=1}^n |θ_i − θ'_i|^q. Theorem [C, van der Vaart 12]: for large enough M, as n → ∞, sup_{θ_0∈l_0[s_n]} E_{θ_0} Π[θ : d_q(θ, θ_0) > M r_{n,q} | Y] → 0. The posterior measure is rate-optimal for any 0 < q ≤ 2. The posterior mean is suboptimal (!) for q < 1 [Johnstone, Silverman 04]. The posterior median is rate-optimal for any 0 < q ≤ 2 [Johnstone, Silverman 04], [C, van der Vaart 12].
85 Rates: conclusion and further topics

- The previous general rate theorem applies to a variety of problems (i.i.d., non-i.i.d., Markov, hidden Markov, ...) as soon as one can find a suitable testing distance d_n and verify prior-mass-type conditions.
- Sometimes refinements can be used to tackle specific problems [e.g. more precise assumptions using similar ideas].
- Other formulations/approaches are also possible: PAC-Bayes, non-asymptotic versions, etc.
- Yet, with these techniques, it is unclear how to obtain results for other distances/loss functions that do not satisfy the testing condition (T0).
86 Part III: Shape
87 Gaussian example E0

X | θ ~ N(θ, 1)^{⊗n} ; θ ~ N(0, 1) = Π

Π[· | X] = N( n X̄_n/(n + 1), 1/(n + 1) )

Is Π[· | X], after recentering and rescaling, close to N(0, 1)?

- Recentering and rescaling: set τ_a : x ↦ √n(x − a).
- Comparing distributions: ‖P − Q‖_TV = sup_A |P(A) − Q(A)| = ½ ‖P − Q‖_1.

The recentering choice a = X̄_n seems natural. Then
Π[· | X] ∘ τ_{X̄_n}^{−1} = N( −√n X̄_n/(n + 1), n/(n + 1) ) ≈ N(0, 1).

Set τ_{X̄_n} : x ↦ √n(x − X̄_n).

Proposition [convergence in the Gaussian conjugate case]
E_{θ_0} ‖ Π[· | X] ∘ τ_{X̄_n}^{−1} − N(0, 1) ‖_TV → 0.

Exercise: prove it. [For instance: use Pinsker's inequality ‖·‖_TV ≤ √(KL/2) and compute the KL divergence between Gaussians.]
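The exercise above can be checked numerically: both the rescaled posterior and the target are Gaussian, so the KL divergence is explicit and Pinsker's inequality bounds the TV distance. A quick sketch (illustrative, not from the slides):

```python
import math
import random

# In the conjugate Gaussian case the rescaled posterior sqrt(n)(theta - Xbar)|X
# is N(-sqrt(n)*Xbar/(n+1), n/(n+1)); its KL divergence to N(0,1) vanishes,
# hence so does the TV distance by Pinsker's inequality.
def kl_to_std_normal(mu, var):
    # closed form for KL( N(mu, var) || N(0, 1) )
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))

random.seed(1)
theta0 = 0.7
for n in (10, 100, 10000):
    xbar = theta0 + random.gauss(0, 1) / math.sqrt(n)   # Xbar_n ~ N(theta0, 1/n)
    mu = -math.sqrt(n) * xbar / (n + 1)
    var = n / (n + 1)
    kl = kl_to_std_normal(mu, var)
    tv_bound = math.sqrt(kl / 2.0)                       # Pinsker's inequality
    print(n, tv_bound)
```

The printed TV bound shrinks as n grows, in line with the Proposition.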
90 Historical example

Laplace [towards 1810] considered the setting
X | θ ~ Bin(n, θ), θ ~ Unif[0, 1] = Π,
with associated posterior
Π[· | X] = Beta(X + 1, n − X + 1)
[a fact noted by Thomas Bayes a few decades before].

Laplace noticed that, with τ_{X/n} : x ↦ √n(x − X/n),
Π[· | X] ∘ τ_{X/n}^{−1} ≈ N( 0, (X/n)(1 − X/n) ).
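Laplace's observation is easy to reproduce by Monte Carlo (illustrative, not from the slides): draws from the Beta posterior, recentred at X/n and scaled by √n, have approximately Gaussian shape with variance close to (X/n)(1 − X/n), the inverse Fisher information.

```python
import math
import random
import statistics

# Monte Carlo check of the normal approximation to the Beta(X+1, n-X+1)
# posterior after recentring at X/n and scaling by sqrt(n).
random.seed(2)
n, theta0 = 20000, 0.3
X = sum(random.random() < theta0 for _ in range(n))      # X ~ Bin(n, theta0)

draws = [random.betavariate(X + 1, n - X + 1) for _ in range(20000)]
u = [math.sqrt(n) * (t - X / n) for t in draws]          # rescaled posterior draws

p = X / n
print(statistics.mean(u), statistics.variance(u), p * (1 - p))
```

The empirical mean of the rescaled draws is near 0 and their variance near (X/n)(1 − X/n), matching the limit.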
91 Regular parametric models

P = {P_θ, θ ∈ Θ ⊂ R^k}, dP_θ(x) = p_θ(x)dx, ℓ_θ := log p_θ.
Suppose the score ℓ̇_θ := ∂ℓ_θ/∂θ exists and 0 < I_θ := E_θ[ ℓ̇_θ ℓ̇_θ^T ] < ∞.

The model is locally asymptotically normal (LAN) at the point θ_0 ∈ Θ if, for all h ∈ R^k,
log Π_{i=1}^n ( p_{θ_0 + h/√n} / p_{θ_0} )(X_i) = (1/√n) h^T Σ_{i=1}^n ℓ̇_{θ_0}(X_i) − ½ h^T I_{θ_0} h + o_{P_{θ_0}}(1).

An estimator θ̂ = θ̂_n(X) is (asymptotically linear and) efficient at θ_0 if
√n(θ̂ − θ_0) = I_{θ_0}^{−1} (1/√n) Σ_{i=1}^n ℓ̇_{θ_0}(X_i) + o_{P_{θ_0}}(1).

Recall the notation τ_θ̂ : x ↦ √n(x − θ̂).
92 Bernstein-von Mises, smooth parametric models

X_1, ..., X_n | θ ~ P_θ^n, θ ~ Π

Theorem [Bernstein-von Mises] [Le Cam, van der Vaart]
Suppose
(S) the model is LAN at θ_0,
(T) for any ε > 0 there exist tests φ_n with E_{θ_0} φ_n = o(1) and sup_{‖θ − θ_0‖ > ε} E_θ(1 − φ_n) = o(1),
(P) the prior Π has a continuous positive density at θ_0.
Then, for θ̂_n an efficient estimator at θ_0, as n → ∞,
‖ Π[· | X] ∘ τ_{θ̂}^{−1} − N(0, I_{θ_0}^{−1}) ‖_TV → 0 under P_{θ_0}.

By the Bernstein-von Mises (BvM) theorem, the recentred, rescaled posterior merges with N(0, I_{θ_0}^{−1}) in total variation. For θ̂ an efficient estimator, under P_{θ_0},
√n(θ̂ − θ_0) →_d N(0, I_{θ_0}^{−1}).
This shows the remarkable duality, asymptotically,
L^Π( √n(θ − θ̂) | X ) ≈ L( √n(θ̂ − θ) | θ = θ_0 )
Bayes law ↔ frequentist law.
94 Parametric BvM implies: credible sets are confidence sets!

Suppose BvM holds for Θ ⊂ R [dimension 1]:
‖ Π(· | X) ∘ τ_{θ̂}^{−1} − N(0, I_{θ_0}^{−1}) ‖_TV → 0 under P_{θ_0}.

Let A_n, B_n be such that, for α = 5%,
Π( (−∞, A_n) | X ) = Π( (B_n, +∞) | X ) = α/2.
By definition Π[ (A_n, B_n) | X ] = 1 − α: a credible set.

Exercise. As n → ∞, P_{θ_0}[ θ_0 ∈ (A_n, B_n) ] → 1 − α, so that (A_n, B_n) is an asymptotic confidence set at level 1 − α.

So, if the parametric BvM holds, credible sets are (asymptotically) confidence sets.
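The coverage statement in the exercise can be illustrated by simulation in the conjugate Gaussian model from example E0, where the equal-tailed credible interval is explicit. A sketch (illustrative, not from the slides):

```python
import math
import random
from statistics import NormalDist

# Monte Carlo coverage of the (1-alpha) equal-tailed credible interval (A_n, B_n)
# built from the exact posterior N(n*Xbar/(n+1), 1/(n+1)) of example E0.
random.seed(3)
theta0, n, alpha, reps = 0.5, 200, 0.05, 2000
covered = 0
for _ in range(reps):
    xbar = theta0 + random.gauss(0, 1) / math.sqrt(n)          # Xbar_n ~ N(theta0, 1/n)
    post = NormalDist(n * xbar / (n + 1), 1 / math.sqrt(n + 1))
    A, B = post.inv_cdf(alpha / 2), post.inv_cdf(1 - alpha / 2)
    covered += A < theta0 < B
print(covered / reps)    # frequentist coverage, close to 1 - alpha = 0.95
```

The empirical coverage is close to the nominal 95%, exactly as the parametric BvM predicts.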
95 BvM in the Gaussian model for a nonconjugate prior

As a particular case of the previous BvM theorem, we have

Proposition [BvM for a nonconjugate prior in the Gaussian model]
Let X_1, ..., X_n | θ ~ N(θ, 1)^{⊗n}, θ ~ Π, with dΠ(θ) = π(θ)dθ and π a continuous, positive density on R. Then
‖ Π[· | X] ∘ τ_{X̄_n}^{−1} − N(0, 1) ‖_TV → 0 in probability.

Idea of proof. The rescaled posterior has density
g_{X̄_n}(u) := exp(−u²/2) π(X̄_n + u/√n) / ∫ exp(−v²/2) π(X̄_n + v/√n) dv → exp(−u²/2) π(θ_0) / ( √(2π) π(θ_0) ),
then use Scheffé's lemma on the event { |X̄_n − θ_0| ≤ 1/log n }.
97 Towards a nonparametric BvM?

And what about a nonparametric BvM?
98 Gaussian white noise model

dX^(n)(t) = f(t)dt + (1/√n) dW(t), t ∈ [0, 1].

One observes a trajectory X^(n); f ∈ L²[0, 1] =: L² is unknown = the object of interest.

Projecting onto an orthonormal basis {φ_k}:
∫ φ_k(t) dX^(n)(t) = ∫ φ_k(t) f(t) dt + n^{−1/2} ∫ φ_k(t) dW(t), k ≥ 1,
that is, X_k = f_k + n^{−1/2} ε_k, k ≥ 1, with ε_k i.i.d. N(0, 1).

Equivalently, writing X^(n) for the sequence {X_k, k ≥ 1},
X^(n) = f + n^{−1/2} W.
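The sequence-space form X_k = f_k + ε_k/√n is easy to simulate directly. A minimal sketch (illustrative, not from the slides; the decaying truth f_k = k^{-2} is a made-up example of a smooth signal):

```python
import math
import random

# Simulate the sequence-space form of the white noise model:
# X_k = f_k + eps_k / sqrt(n), with eps_k i.i.d. N(0,1),
# for a toy smooth truth with fast-decaying coefficients.
random.seed(4)
n, K = 10_000, 50
f = [k ** -2.0 for k in range(1, K + 1)]                      # toy truth f_k = k^{-2}
X = [fk + random.gauss(0, 1) / math.sqrt(n) for fk in f]      # observed coefficients

# The raw sequence "estimator" X is unbiased coordinatewise, with total
# squared error of expected size K/n over the first K coordinates.
err2 = sum((x - fk) ** 2 for x, fk in zip(X, f))
print(err2)
```

This is just the observation scheme; the slides' point is that a prior on the coefficients f_k then induces a posterior on f.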
101 Bernstein-von Mises, nonparametric?

X^(n) = f + n^{−1/2} W.

Let Π be a prior on f ∈ L² [a prior on the coordinates f_k = ⟨f, φ_k⟩, k ≥ 1].
Set f̂ := E^Π[f | X^(n)], the posterior mean.

Two questions:
1. Can one find Π in such a way that Π( f − f̂ ∈ · | X^(n) ) ≈ L( f̂ − f | f ), an optimal law in some sense?
2. In which sense should one interpret "≈"?
102 Nonparametric BvMs, related work

Negative results [Cox 93], [Freedman 99], [Leahu 11]: let Π be a Gaussian prior on f sitting on L²; then a nonparametric BvM does not hold in L².

Some positive results exist for specific models/priors, mostly in cases where some form of conjugacy is present.

- [Lo 80s] Consider an i.i.d. sample X_1, ..., X_n ~ P on R with c.d.f. F, and put a Dirichlet process prior on P. Nonparametric BvM for F:
√n(F − F_n) | X_1, ..., X_n →_d G, in probability, with G the law of a P-Brownian bridge.
- [Kim and Lee 06] Neutral-to-the-right prior on F in the survival analysis model: NP-BvM for the cumulative hazard function A.
103 A large enough Hilbert space

Standard Sobolev spaces. For {φ_k, k ≥ 1} a smooth enough orthonormal basis of L²,
H_2^s := { f ∈ L²([0, 1]) : ‖f‖²_{s,2} := Σ_{k≥1} k^{2s} ⟨φ_k, f⟩² < ∞ }.

In double-index notation, for {ψ_lk} a smooth enough orthonormal basis of L²,
H_2^s := { f ∈ L²([0, 1]) : ‖f‖²_{s,2} := Σ_{l∈L} a_l^{2s} Σ_{k∈Z_l} ⟨ψ_lk, f⟩² < ∞ },
with either
1. L = Z, |Z_l| = 1, a_l = max(2, |l|) [Fourier-type basis], or
2. L = N, a_l = |Z_l| = 2^l [wavelet-type basis].

Negative Sobolev spaces: for s ∈ R,
H_2^{−s} = { f : ‖f‖²_{−s,2} := Σ_{l∈L} a_l^{−2s} Σ_{k∈Z_l} ⟨ψ_lk, f⟩² < ∞ }.
105 A large enough Hilbert space H

We use logarithmic Sobolev spaces to get sharp rates. For δ > 1, let
H := H_2^{−1/2,δ} = { f : ‖f‖²_{−1/2,2,δ} := Σ_{l∈L} a_l^{−1} (log a_l)^{−2δ} Σ_{k∈Z_l} ⟨ψ_lk, f⟩² < ∞ }.

X^(n) = f + (1/√n) W: the white noise model in H.

Note that W a.s. belongs to H (as do X^(n) and f), since
E ‖W‖²_{−1/2,2,δ} = Σ_{l∈L} a_l^{−1} (log a_l)^{−2δ} Σ_{k∈Z_l} E g²_lk < ∞.
[This implies that W takes values in H a.s. and is tight in H.]

W is the centered Gaussian process on H with E W(g)W(h) = ⟨g, h⟩, g, h ∈ L².

Denote by N the law of W as a random variable in H [the limiting nonparametric distribution].
108 Weak nonparametric BvM, definition

X^(n) = f + (1/√n) W.

Let Π be a prior on f ∈ L² ⊂ H. Let Π_n = Π(· | X^(n)) be the posterior distribution on H.

Denote τ : f ↦ √n(f − X^(n)) and let Π_n ∘ τ^{−1} be the rescaled posterior.

Let β be the bounded Lipschitz metric for weak convergence of probability measures on a metric space S:
β(μ, ν) = sup_{u ∈ BL(1)} | ∫_S u(s) (dμ − dν)(s) |.

Definition. The prior Π satisfies the weak Bernstein-von Mises phenomenon in H if
β( Π_n ∘ τ^{−1}, N ) → 0 in P^n_{f_0}-probability, as n → ∞.
112 Product priors and γ-smooth functions

Consider priors of the form Π = ⊗_{l,k} π_lk defined on the coordinates in the basis {ψ_lk}, with π_lk probability measures with Lebesgue densities φ_lk on R. Further assume, for some fixed density φ,
φ_lk(·) = (1/σ_l) φ(·/σ_l), k ∈ Z_l,
with σ_l > 0.

For instance, to model γ-smooth functions, take σ_l = 2^{−l(γ+1/2)} and set, with g_lk ~ φ i.i.d.,
G_γ = Σ_l Σ_{k∈Z_l} 2^{−l(γ+1/2)} g_lk ψ_lk, γ > 0.
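A draw from the random series G_γ is straightforward to simulate: 2^l Laplace variates at level l, each scaled by 2^{-l(γ+1/2)}. A sketch (illustrative, not from the slides; truncation at a finite level L and the Laplace choice for φ are assumptions for the demo):

```python
import random

# One (truncated) draw of the coefficient array of
# G_gamma = sum_l sum_k 2^{-l(gamma+1/2)} g_lk psi_lk, with g_lk i.i.d. Laplace,
# in the wavelet convention |Z_l| = 2^l; we accumulate the squared L2 norm
# of the coefficients, which stays bounded since the level sums decay like 4^{-l}.
def laplace():
    # standard Laplace variate as a difference of two Exp(1) variates
    return random.expovariate(1.0) - random.expovariate(1.0)

random.seed(5)
gamma, L = 1.0, 15
sq_norm = 0.0
for l in range(L):
    scale = 2.0 ** (-l * (gamma + 0.5))
    sq_norm += sum((scale * laplace()) ** 2 for _ in range(2 ** l))  # 2^l coeffs at level l
print(sq_norm)
```

The level-l contribution has expectation 2 * 4^{-γl} here, so the draw lies in L² (indeed in a Sobolev/Hölder-type ball), which is what "modeling γ-smooth functions" means.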
113 Main theorem - Condition (P)

Denote f_{0,lk} = ⟨f_0, ψ_lk⟩ the coordinates of the true f_0 in the basis {ψ_lk}. Suppose that for some M > 0,
(P1) sup_{l,k} |f_{0,lk}| / σ_l ≤ M.

Suppose also that φ is bounded and that, for some τ > M and c_φ > 0,
(P2) φ(x) ≥ c_φ for x ∈ (−τ, τ), and ∫_R x² φ(x) dx < ∞.

For the previous wavelet prior, (P1) asks for
|f_{0,lk}| ≤ M 2^{−l(γ+1/2)} for all l, k,
i.e. f_0 in a Hölder-type ball C^γ.
114 Main theorem in white noise [C. and Nickl 13]

Theorem [Weak nonparametric BvM in H]
Any product prior Π and f_0 satisfying Conditions (P) verify the weak Bernstein-von Mises phenomenon in H:
β( Π(· | X^(n)) ∘ τ^{−1}, N ) → 0 in P^n_{f_0}-probability.

Moreover the posterior mean f̂_n = E^Π[f | X^(n)] is efficient in the sense that
‖f̂_n − X^(n)‖_H = o_P(1/√n).
115 Nonparametric BvM: note on weak convergence

Unlike with total variation convergence, one does not have uniformity over all Borel sets B in
| Π(· | X) ∘ τ^{−1}(B) − N(B) | → 0 in P^n_{f_0}-probability.

One does have uniformity over all sets with a uniformly smooth boundary for the probability measure N. In particular, uniformity holds over all H-balls {B_H(0, t) : 0 < t ≤ M}.
116 Application: Credible ellipsoids

Suppose the weak BvM theorem holds for Π. A natural (1 − α)-credible set is then obtained by solving for R_n = R_n(α, X^(n)) such that
Π(C_n | X^(n)) = 1 − α, where C_n = { f : ‖f − X^(n)‖_H ≤ R_n/√n }.

C_n is the smallest ball around X^(n) charged by the posterior with probability 1 − α.

Theorem [Elliptical credible set C_n]
The random set C_n has asymptotic confidence 1 − α, in that
P^n_{f_0}( f_0 ∈ C_n ) → 1 − α, and R_n = O_P(1).
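In practice, solving for R_n amounts to taking an empirical quantile of the scaled norm √n‖f − X^(n)‖_H over posterior draws. A mechanical sketch (illustrative, not from the slides; the "draws" below are faked as |N(0,1)| variates just to show the quantile step):

```python
import random

# Given posterior draws of the scaled norm sqrt(n)*||f - X||_H, the credible
# radius R_n at level 1-alpha is their empirical (1-alpha) quantile.
def radius(scaled_norms, alpha=0.05):
    s = sorted(scaled_norms)
    return s[int((1 - alpha) * len(s))]     # empirical (1-alpha) quantile

random.seed(6)
fake_norms = [abs(random.gauss(0, 1)) for _ in range(10000)]   # stand-in draws
R = radius(fake_norms)
print(R)    # ~ 1.96 for |N(0,1)| stand-in draws
```

With genuine posterior draws in place of the stand-ins, C_n = { f : ‖f − X^(n)‖_H ≤ R/√n } is exactly the credible ellipsoid of the theorem.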
117 Application: BvM and credible sets for smooth functionals

Linear functionals: for g_L ∈ H_2^s, s > 1/2,
L : f ↦ ∫_0^1 f(t) g_L(t) dt.

Smooth nonlinear functionals, e.g.
f ↦ ∫ f²(t) dt, f_0 ∈ C^β, β > 1/2,
and self-convolutions of 1-periodic functions, f ↦ f ⋆ f.

- Leads to BvM for these functionals.
- Deduce credible sets with correct confidence coverage by taking quantiles of the induced posterior.
119 Application: Confidence sets in L², uniform priors

Consider first the special case of a uniform wavelet prior Π on L²:
U_{γ,M} = Σ_{l,k} 2^{−l(γ+1/2)} u_lk ψ_lk(·), u_lk ~ U[−M, M] i.i.d.

Such priors model functions in a Hölder ball of radius M, with posteriors Π_n contracting around f_0 at the L²-minimax rate n^{−γ/(2γ+1)}, within log factors, if ‖f_0‖_{γ,∞} ≤ M.

It is natural to intersect the ellipsoid credible set C_n with the Hölderian support of Π:
C_n = { f : ‖f − f̂_n‖_H ≤ R_n/√n, ‖f‖_{γ,∞} ≤ M }.

Proposition [Confidence sets for uniform product priors]
Π(C_n | X^(n)) = 1 − α, P^n_{f_0}( f_0 ∈ C_n ) → 1 − α, and the L²-diameter |C_n|_2 of C_n satisfies, for some κ > 0,
|C_n|_2 = O_P( n^{−γ/(2γ+1)} (log n)^κ ).
122 Application: Confidence sets in L², general product priors

Consider more generally, if f_0 ∈ C^γ,
G_γ = Σ_{l,k} 2^{−l(γ+1/2)} g_lk ψ_lk(·), g_lk i.i.d. ~ φ.

Let δ > 0 be arbitrary. Let, for ‖·‖_{γ,2,1} the γ-Sobolev norm with log correction,
C_n = { f : ‖f − f̂_n‖_H ≤ R_n/√n, ‖f‖_{γ,2,1} ≤ M_n + 4δ },
where M_n is defined as a quantile for the γ-norm: for any n and δ_n = (log n)^{−1/4},
M_n = inf{ M > 0 : Π_n( f : ‖f‖_{γ,2,1} ≤ M − δ ) ≥ 1 − δ_n }.

Proposition [Confidence sets for general product priors]
P^n_{f_0}( f_0 ∈ C_n ) → 1 − α, Π(C_n | X^(n)) = 1 − α + o_P(1), and
|C_n|_2 = O_P( n^{−γ/(2γ+1)} (log n)^κ ).
124 Nonparametric BvMs in white noise, further questions

Adaptation: put a prior on the regularity α [Kolyan Ray]; one then obtains e.g. adaptive confidence regions (modulo the expected appropriate restrictions).
Nonparametric estimation under Shape Restrictions Jon A. Wellner University of Washington, Seattle Statistical Seminar, Frejus, France August 30 - September 3, 2010 Outline: Five Lectures on Shape Restrictions
More informationBrownian motion. Samy Tindel. Purdue University. Probability Theory 2 - MA 539
Brownian motion Samy Tindel Purdue University Probability Theory 2 - MA 539 Mostly taken from Brownian Motion and Stochastic Calculus by I. Karatzas and S. Shreve Samy T. Brownian motion Probability Theory
More informationOptimal series representations of continuous Gaussian random fields
Optimal series representations of continuous Gaussian random fields Antoine AYACHE Université Lille 1 - Laboratoire Paul Painlevé A. Ayache (Lille 1) Optimality of continuous Gaussian series 04/25/2012
More informationEmpirical Risk Minimization
Empirical Risk Minimization Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Introduction PAC learning ERM in practice 2 General setting Data X the input space and Y the output space
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini April 27, 2018 1 / 80 Classical case: n d. Asymptotic assumption: d is fixed and n. Basic tools: LLN and CLT. High-dimensional setting: n d, e.g. n/d
More informationLecture 2: Priors and Conjugacy
Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.
More informationAsymptotics for posterior hazards
Asymptotics for posterior hazards Pierpaolo De Blasi University of Turin 10th August 2007, BNR Workshop, Isaac Newton Intitute, Cambridge, UK Joint work with Giovanni Peccati (Université Paris VI) and
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationarxiv: v1 [math.st] 6 Nov 2012
The Annals of Statistics 212, Vol. 4, No. 4, 269 211 DOI: 1.1214/12-AOS129 c Institute of Mathematical Statistics, 212 arxiv:1211.1197v1 [math.st] 6 Nov 212 NEEDLES AND STRAW IN A HAYSTACK: POSTERIOR CONCENTRATION
More informationSome Asymptotic Bayesian Inference (background to Chapter 2 of Tanner s book)
Some Asymptotic Bayesian Inference (background to Chapter 2 of Tanner s book) Principal Topics The approach to certainty with increasing evidence. The approach to consensus for several agents, with increasing
More information1 Continuity Classes C m (Ω)
0.1 Norms 0.1 Norms A norm on a linear space X is a function : X R with the properties: Positive Definite: x 0 x X (nonnegative) x = 0 x = 0 (strictly positive) λx = λ x x X, λ C(homogeneous) x + y x +
More informationNonparametric estimation under Shape Restrictions
Nonparametric estimation under Shape Restrictions Jon A. Wellner University of Washington, Seattle Statistical Seminar, Frejus, France August 30 - September 3, 2010 Outline: Five Lectures on Shape Restrictions
More informationPOSTERIOR CONSISTENCY IN CONDITIONAL DENSITY ESTIMATION BY COVARIATE DEPENDENT MIXTURES. By Andriy Norets and Justinas Pelenis
Resubmitted to Econometric Theory POSTERIOR CONSISTENCY IN CONDITIONAL DENSITY ESTIMATION BY COVARIATE DEPENDENT MIXTURES By Andriy Norets and Justinas Pelenis Princeton University and Institute for Advanced
More informationThe main results about probability measures are the following two facts:
Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a
More informationGeneral Bayesian Inference I
General Bayesian Inference I Outline: Basic concepts, One-parameter models, Noninformative priors. Reading: Chapters 10 and 11 in Kay-I. (Occasional) Simplified Notation. When there is no potential for
More informationLecture 16: Sample quantiles and their asymptotic properties
Lecture 16: Sample quantiles and their asymptotic properties Estimation of quantiles (percentiles Suppose that X 1,...,X n are i.i.d. random variables from an unknown nonparametric F For p (0,1, G 1 (p
More informationA talk on Oracle inequalities and regularization. by Sara van de Geer
A talk on Oracle inequalities and regularization by Sara van de Geer Workshop Regularization in Statistics Banff International Regularization Station September 6-11, 2003 Aim: to compare l 1 and other
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More informationPosterior properties of the support function for set inference
Posterior properties of the support function for set inference Yuan Liao University of Maryland Anna Simoni CNRS and CREST November 26, 2014 Abstract Inference on sets is of huge importance in many statistical
More informationNonparametric estimation of log-concave densities
Nonparametric estimation of log-concave densities Jon A. Wellner University of Washington, Seattle Seminaire, Institut de Mathématiques de Toulouse 5 March 2012 Seminaire, Toulouse Based on joint work
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationEstimation of the functional Weibull-tail coefficient
1/ 29 Estimation of the functional Weibull-tail coefficient Stéphane Girard Inria Grenoble Rhône-Alpes & LJK, France http://mistis.inrialpes.fr/people/girard/ June 2016 joint work with Laurent Gardes,
More informationEE514A Information Theory I Fall 2013
EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More informationBERNSTEIN POLYNOMIAL DENSITY ESTIMATION 1265 bounded away from zero and infinity. Gibbs sampling techniques to compute the posterior mean and other po
The Annals of Statistics 2001, Vol. 29, No. 5, 1264 1280 CONVERGENCE RATES FOR DENSITY ESTIMATION WITH BERNSTEIN POLYNOMIALS By Subhashis Ghosal University of Minnesota Mixture models for density estimation
More informationLecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable
Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed
More informationChapter 2: Fundamentals of Statistics Lecture 15: Models and statistics
Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:
More informationMarkov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC Dependent Sampling Suppose we wish to sample from a density π, and we can evaluate π as a function but have no means to directly generate a sample. Rejection sampling can
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationNonparametric estimation of log-concave densities
Nonparametric estimation of log-concave densities Jon A. Wellner University of Washington, Seattle Northwestern University November 5, 2010 Conference on Shape Restrictions in Non- and Semi-Parametric
More informationBrittleness and Robustness of Bayesian Inference
Brittleness and Robustness of Bayesian Inference Tim Sullivan 1 with Houman Owhadi 2 and Clint Scovel 2 1 Free University of Berlin / Zuse Institute Berlin, Germany 2 California Institute of Technology,
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes
More informationBayesian statistics: Inference and decision theory
Bayesian statistics: Inference and decision theory Patric Müller und Francesco Antognini Seminar über Statistik FS 28 3.3.28 Contents 1 Introduction and basic definitions 2 2 Bayes Method 4 3 Two optimalities:
More informationTD 1: Hilbert Spaces and Applications
Université Paris-Dauphine Functional Analysis and PDEs Master MMD-MA 2017/2018 Generalities TD 1: Hilbert Spaces and Applications Exercise 1 (Generalized Parallelogram law). Let (H,, ) be a Hilbert space.
More informationSTATS 200: Introduction to Statistical Inference. Lecture 29: Course review
STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout
More informationBrownian Motion and Conditional Probability
Math 561: Theory of Probability (Spring 2018) Week 10 Brownian Motion and Conditional Probability 10.1 Standard Brownian Motion (SBM) Brownian motion is a stochastic process with both practical and theoretical
More information