Nonparametric Bayesian Uncertainty Quantification


1 Nonparametric Bayesian Uncertainty Quantification Lecture 1: Introduction to Nonparametric Bayes Aad van der Vaart Universiteit Leiden, Netherlands YES, Eindhoven, January 2017

2 Contents Introduction Recovery Gaussian process priors Dirichlet process Dirichlet process mixtures

3 Introduction

4 The Bayesian paradigm A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a measure P_θ. This gives a joint distribution of (X, Θ). Given observed data x the statistician computes the conditional distribution of Θ given X = x, the posterior distribution: Π(θ ∈ B | X).

5 The Bayesian paradigm A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a measure P_θ. This gives a joint distribution of (X, Θ). Given observed data x the statistician computes the conditional distribution of Θ given X = x, the posterior distribution: Π(θ ∈ B | X). We assume whatever is needed (e.g. Θ Polish and Π a probability distribution on its Borel σ-field; a Polish sample space) to make this well defined.

6 Bayes's rule A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a measure P_θ. This gives a joint distribution of (X, Θ). Given observed data x the statistician computes the conditional distribution of Θ given X = x, the posterior distribution: Π(θ ∈ B | X). If P_θ is given by a density x ↦ p_θ(x), then Bayes's rule gives Π(Θ ∈ B | X = x) = ∫_B p_θ(x) dΠ(θ) / ∫_Θ p_θ(x) dΠ(θ).

7 Bayes's rule A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a measure P_θ. This gives a joint distribution of (X, Θ). Given observed data x the statistician computes the conditional distribution of Θ given X = x, the posterior distribution: Π(θ ∈ B | X). If P_θ is given by a density x ↦ p_θ(x), then Bayes's rule gives dΠ(θ | X) ∝ p_θ(X) dΠ(θ).

8 Reverend Thomas Thomas Bayes (c. 1701–1761, published 1763) followed this argument with Θ possessing the uniform distribution and X given Θ = θ binomial (n, θ). Using his famous rule he computed that the posterior distribution is then Beta(X+1, n−X+1). P(a ≤ Θ ≤ b) = b − a, 0 < a < b < 1, P(X = x | Θ = θ) = (n choose x) θ^x (1−θ)^{n−x}, x = 0, 1, ..., n, dΠ(θ | X) ∝ θ^X (1−θ)^{n−X} dθ, 0 < θ < 1.

9–10 [Figures: the Beta(X+1, n−X+1) posterior density, for n = 20.]
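
As a quick numerical illustration (a sketch, not part of the slides; n = 20 as in the figure, with an arbitrary illustrative value for x), the Beta(X+1, n−X+1) posterior can be evaluated with scipy:

    from scipy.stats import beta

    n, x = 20, 14                       # n as in the figure; x = 14 is an arbitrary illustrative value
    posterior = beta(x + 1, n - x + 1)  # uniform prior + binomial(n, theta) likelihood

    print("posterior mean:", posterior.mean())             # (x + 1) / (n + 2)
    print("95% credible interval:", posterior.interval(0.95))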

11 Parametric Bayes Pierre-Simon Laplace (1749–1827) rediscovered Bayes' argument and applied it to general parametric models: models smoothly indexed by a Euclidean parameter θ. He had many followers, but Ronald Aylmer Fisher (1890–1962) did not buy into it. The Bayesian method regained popularity following the development of MCMC methods in the 1980s/90s. Nonparametric Bayesian statistics took off in the 1970s–90s, although it was long thought not to work.

12 Nonparametric Bayes If the parameter θ is a function, then the prior is a probability distribution on a function space. So is the posterior, given the data. Bayes's formula does not change: dΠ(θ | X) ∝ p_θ(X) dΠ(θ). Prior and posterior can be visualized by plotting functions that are simulated from these distributions.

13–17 [Figures: functions simulated from the prior and from the posterior distribution.]

18 Frequentist Bayesian Assume that the data X is generated according to a given parameter θ_0 and consider the posterior Π(· | X) as a random measure on the parameter set dependent on X. RECOVERY We like Π(· | X) to put most of its mass near θ_0 for most X.

19 Frequentist Bayesian Assume that the data X is generated according to a given parameter θ_0 and consider the posterior Π(· | X) as a random measure on the parameter set dependent on X. RECOVERY We like Π(· | X) to put most of its mass near θ_0 for most X. UNCERTAINTY QUANTIFICATION We like the spread of Π(· | X) to indicate the remaining uncertainty.

20 Frequentist Bayesian Assume that the data X is generated according to a given parameter θ_0 and consider the posterior Π(· | X) as a random measure on the parameter set dependent on X. RECOVERY We like Π(· | X) to put most of its mass near θ_0 for most X. UNCERTAINTY QUANTIFICATION We like the spread of Π(· | X) to indicate the remaining uncertainty. A set C_X with Π(θ ∈ C_X | X) = 0.95 is called a credible set. Is it a confidence set? Is P_{θ_0}(C_X ∋ θ_0) = 0.95? Is its order of magnitude correct?

21 Frequentist Bayesian Assume that the data X is generated according to a given parameter θ_0 and consider the posterior Π(· | X) as a random measure on the parameter set dependent on X. RECOVERY We like Π(· | X) to put most of its mass near θ_0 for most X. UNCERTAINTY QUANTIFICATION We like the spread of Π(· | X) to indicate the remaining uncertainty. A set C_X with Π(θ ∈ C_X | X) = 0.95 is called a credible set. Is it a confidence set? Is P_{θ_0}(C_X ∋ θ_0) = 0.95? Is its order of magnitude correct? Asymptotic setting: data X^(n), where the information increases as n → ∞. We want Π_n(· | X^(n)) ⇝ δ_{θ_0}, at a good rate. We like P_{θ_0}(C_{X^(n)} ∋ θ_0) → 0.95.

22 Parametric models Suppose the data are a random sample X_1, ..., X_n from a density x ↦ p_θ(x) that is smoothly and identifiably parametrized by a vector θ ∈ R^d (e.g. θ ↦ √p_θ continuously differentiable as a map in L_2(µ)). Theorem. [Laplace, Bernstein, von Mises, Le Cam 1989] Under P^n_{θ_0}-probability, for any prior with density that is positive around θ_0, ‖Π(· | X_1, ..., X_n) − N_d(θ̂_n, (1/n) I_{θ_0}^{-1})‖ → 0. Here θ̂_n are estimators with √n(θ̂_n − θ_0) ⇝ N(0, I_{θ_0}^{-1}).

23 Parametric models Suppose the data are a random sample X_1, ..., X_n from a density x ↦ p_θ(x) that is smoothly and identifiably parametrized by a vector θ ∈ R^d (e.g. θ ↦ √p_θ continuously differentiable as a map in L_2(µ)). Theorem. [Laplace, Bernstein, von Mises, Le Cam 1989] Under P^n_{θ_0}-probability, for any prior with density that is positive around θ_0, ‖Π(· | X_1, ..., X_n) − N_d(θ̂_n, (1/n) I_{θ_0}^{-1})‖ → 0. Here θ̂_n are estimators with √n(θ̂_n − θ_0) ⇝ N(0, I_{θ_0}^{-1}). RECOVERY: The posterior distribution concentrates most of its mass on balls of radius O(1/√n) around θ_0. UNCERTAINTY QUANTIFICATION: A central set of posterior probability 95% is equivalent to the usual Wald confidence set {θ : n(θ − θ̂_n)^T I_{θ̂_n} (θ − θ̂_n) ≤ χ²_{d,1−α}}.
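
A small numerical sketch of this Bernstein–von Mises statement in the simplest parametric model (binomial with uniform prior; illustrative values, not from the slides): the total variation distance between the Beta posterior and its normal approximation shrinks as n grows.

    import numpy as np
    from scipy.stats import beta, norm

    rng = np.random.default_rng(0)
    theta0 = 0.3
    grid = np.linspace(0.001, 0.999, 4001)
    dx = grid[1] - grid[0]
    for n in (20, 200, 2000):
        x = rng.binomial(n, theta0)
        theta_hat = x / n                                    # MLE; I_theta = 1/(theta(1-theta))
        post = beta(x + 1, n - x + 1)                        # posterior for a uniform prior
        approx = norm(theta_hat, np.sqrt(theta_hat * (1 - theta_hat) / n))
        tv = 0.5 * np.sum(np.abs(post.pdf(grid) - approx.pdf(grid))) * dx
        print(f"n={n}: total variation distance ~ {tv:.3f}")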

24 These lectures Recovery and uncertainty quantification for nonparametric models. LECTURE 1: Introduction to recovery. LECTURE 2: Uncertainty quantification for curve fitting/smoothing. LECTURE 3: Uncertainty quantification in high dimensions under sparsity. Point of view: How does the posterior distribution for natural priors behave, in particular for priors that adapt to complexity in the data?

25 Recovery

26 Consistency X^(n) an observation in a sample space (X^(n), 𝒳^(n)) with distribution P^(n)_θ. θ belongs to a metric space (Θ, d). Definition. The posterior distribution is consistent at θ_0 ∈ Θ if, for every ǫ > 0, E_{θ_0} Π_n(θ : d(θ, θ_0) > ǫ | X^(n)) → 0, as n → ∞.

27 Schwartz's theorem (1965) Parameter p: a ν-density on the sample space (X, 𝒳). True value p_0. K(p_0; p) = ∫ p_0 log(p_0/p) dν.

28 Schwartz's theorem (1965) Parameter p: a ν-density on the sample space (X, 𝒳). True value p_0. K(p_0; p) = ∫ p_0 log(p_0/p) dν. Definition. p_0 is said to possess the Kullback-Leibler property relative to Π if Π(p : K(p_0; p) < ǫ) > 0 for every ǫ > 0. Bayesian model: X_1, ..., X_n | p iid ∼ p, p ∼ Π.

29 Schwartz's theorem (1965) Parameter p: a ν-density on the sample space (X, 𝒳). True value p_0. K(p_0; p) = ∫ p_0 log(p_0/p) dν. Definition. p_0 is said to possess the Kullback-Leibler property relative to Π if Π(p : K(p_0; p) < ǫ) > 0 for every ǫ > 0. Bayesian model: X_1, ..., X_n | p iid ∼ p, p ∼ Π. Theorem. If p_0 has the KL-property, and for every neighbourhood U of p_0 there exist tests φ_n such that P_0^n φ_n → 0, sup_{p ∈ U^c} P^n(1 − φ_n) → 0, then Π_n(· | X_1, ..., X_n) is consistent at p_0.

30 Extended Schwartz's theorem Bayesian model: X_1, ..., X_n | p iid ∼ p, p ∼ Π. Theorem. If p_0 has the KL-property and for every neighbourhood U of p_0 there exist C > 0, sets P_n ⊂ P and tests φ_n such that Π(P \ P_n) < e^{−Cn}, P_0^n φ_n ≤ e^{−Cn}, sup_{p ∈ P_n ∩ U^c} P^n(1 − φ_n) ≤ e^{−Cn}, then Π_n(· | X_1, ..., X_n) is consistent at p_0. Tests exist if: Weak topology: always. L_1-distance: if log N(ǫ, P_n, ‖·‖_1) ≤ nǫ²/3, for all ǫ > 0. Definition. The covering number N(ǫ, P, d) is the minimal number of d-balls of radius ǫ needed to cover P.

31 Rate of contraction Bayesian model: X_1, ..., X_n | p iid ∼ p, p ∼ Π. Definition. The posterior distribution Π_n(· | X^(n)) contracts at rate ǫ_n → 0 at θ_0 ∈ Θ if, for every M_n → ∞, E_{θ_0} Π_n(θ : d(θ, θ_0) > M_n ǫ_n | X^(n)) → 0, as n → ∞.

32 Rate of contraction Bayesian model: X_1, ..., X_n | p iid ∼ p, p ∼ Π. Definition. The posterior distribution Π_n(· | X^(n)) contracts at rate ǫ_n → 0 at θ_0 ∈ Θ if, for every M_n → ∞, E_{θ_0} Π_n(θ : d(θ, θ_0) > M_n ǫ_n | X^(n)) → 0, as n → ∞. Benchmark rate for curve fitting: a function θ of d variables that has bounded derivatives of order β is estimable based on n observations at rate n^{−β/(2β+d)}. Proposition. If the posterior distribution contracts at rate ǫ_n at θ_0, then θ̂_n defined as the center of a (nearly) smallest ball that contains posterior mass at least 1/2 satisfies d(θ̂_n, θ_0) = O_P(ǫ_n) under P^(n)_{θ_0}.
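
For orientation, the benchmark rate n^{−β/(2β+d)} is easy to tabulate (a sketch with arbitrary illustrative values of n, β and d):

    n = 1000
    for d in (1, 2):
        for b in (0.5, 1.0, 2.0):
            print(f"d={d}, beta={b}: n^(-beta/(2*beta+d)) = {n ** (-b / (2 * b + d)):.3f}")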

33 Basic contraction theorem Bayesian model: X_1, ..., X_n | p iid ∼ p, p ∼ Π. K(p_0; p) = P_0 log(p_0/p), V(p_0; p) = P_0 (log(p_0/p))², h²(p_0, p) = ∫ (√p_0 − √p)² dν. Theorem. Given a metric d ≤ h whose balls are convex, suppose that there exist P_n ⊂ P and C > 0 such that (i) Π_n(p : K(p_0; p) < ǫ_n², V(p_0; p) < ǫ_n²) ≥ e^{−Cnǫ_n²}, (prior mass) (ii) log N(ǫ_n, P_n, d) ≤ nǫ_n², (complexity) (iii) Π_n(P_n^c) ≤ e^{−(C+4)nǫ_n²}. Then the posterior contracts at rate ǫ_n relative to d, provided ǫ_n ≥ n^{−1/2}.

34 Basic contraction theorem proof Proof. There exist tests φ_n with P_0^n φ_n ≤ e^{nǫ_n²} e^{−nM²ǫ_n²/8} / (1 − e^{−nM²ǫ_n²/8}) and sup_{p ∈ P_n : d(p,p_0) > Mǫ_n} P^n(1 − φ_n) ≤ e^{−nM²ǫ_n²/8}. For A_n = {∫ ∏_{i=1}^n (p/p_0)(X_i) dΠ_n(p) ≥ e^{−(2+C)nǫ_n²}}, Π_n(p : d(p, p_0) > Mǫ_n | X_1, ..., X_n) ≤ φ_n + 1{A_n^c} + e^{(2+C)nǫ_n²} ∫_{d(p,p_0) > Mǫ_n} ∏_{i=1}^n (p/p_0)(X_i) dΠ_n(p) (1 − φ_n). Furthermore P_0^n(A_n^c) → 0: see further on.

35 Basic contraction theorem proof continued Proof. (Continued) By Fubini, P_0^n ∫_{p ∈ P_n : d(p,p_0) > Mǫ_n} ∏_{i=1}^n (p/p_0)(X_i) dΠ_n(p) (1 − φ_n) ≤ ∫_{p ∈ P_n : d(p,p_0) > Mǫ_n} P^n(1 − φ_n) dΠ_n(p) ≤ e^{−nM²ǫ_n²/8}. By Fubini, P_0^n ∫_{P \ P_n} ∏_{i=1}^n (p/p_0)(X_i) dΠ_n(p) ≤ Π_n(P \ P_n).

36 Bounding the denominator Lemma. For any probability measure Π on P and positive constant ǫ, with P_0^n-probability at least 1 − (nǫ²)^{−1}, ∫ ∏_{i=1}^n (p/p_0)(X_i) dΠ(p) ≥ Π(p : K(p_0; p) < ǫ², V(p_0; p) < ǫ²) e^{−2nǫ²}.

37 Bounding the denominator Lemma. For any probability measure Π on P and positive constant ǫ, with P_0^n-probability at least 1 − (nǫ²)^{−1}, ∫ ∏_{i=1}^n (p/p_0)(X_i) dΠ(p) ≥ Π(p : K(p_0; p) < ǫ², V(p_0; p) < ǫ²) e^{−2nǫ²}. Proof. Let B := {p : K(p_0; p) < ǫ², V(p_0; p) < ǫ²} and let Π_B = Π(· ∩ B)/Π(B). By Jensen's inequality, log ∫ ∏_{i=1}^n (p/p_0)(X_i) dΠ_B(p) ≥ ∫ ∑_{i=1}^n log(p/p_0)(X_i) dΠ_B(p) =: Z. Then EZ = −n ∫ K(p_0; p) dΠ_B(p) > −nǫ², and var Z ≤ n P_0(∫ log(p_0/p) dΠ_B(p))² ≤ n P_0 ∫ (log(p_0/p))² dΠ_B(p) ≤ nǫ². Apply Chebyshev's inequality.

38 Interpretation Consider a maximal set of points p_1, ..., p_N in P_n with d(p_i, p_j) ≥ ǫ_n. Maximality implies N ≥ N(ǫ_n, P_n, d), which is of the order e^{c_1 nǫ_n²} when the entropy bound is (nearly) sharp. The balls of radius ǫ_n/2 around the points are disjoint and hence the sum of their prior masses will be less than 1. If the prior mass were evenly distributed over these balls, then each would have no more mass than e^{−c_1 nǫ_n²}. This is of the same order as the prior mass bound. This argument suggests that the conditions can only be satisfied for every p_0 in the model if the prior distributes its mass uniformly, at discretization level ǫ_n.

39 Gaussian process priors

40 Gaussian process prior The law of a stochastic process W = (W_t : t ∈ T) is a prior distribution on the space of functions θ : T → R. W is a Gaussian process if (W_{t_1}, ..., W_{t_k}) is multivariate Gaussian, for every t_1, ..., t_k. Mean and covariance function: t ↦ EW_t and (s, t) ↦ cov(W_s, W_t), s, t ∈ T.

41 Brownian motion and its primitives [Figure: sample paths of Brownian motion and of 1, 2 and 3 times integrated Brownian motion.]
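
Paths like those in the figure can be simulated on a grid by cumulative sums (a rough sketch; the repeated Riemann-sum integration below is an assumption about how such pictures are typically produced, not the slides' code):

    import numpy as np

    rng = np.random.default_rng(1)
    m = 1000                                                  # grid points on [0, 1]
    dt = 1.0 / m

    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=m))       # Brownian motion path
    paths = [W]
    for _ in range(3):                                        # 1, 2 and 3 times integrated Brownian motion
        paths.append(np.cumsum(paths[-1]) * dt)               # primitive, released at zero

    for k, path in enumerate(paths):
        print(f"{k} times integrated: value at t=1 is {path[-1]:.4f}")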

42 Settings Density estimation: X_1, ..., X_n iid in [0,1], p_θ(x) = e^{θ(x)} / ∫_0^1 e^{θ(t)} dt. Distance on parameter: Hellinger on p_θ. Norm on W: uniform. Classification: (X_1, Y_1), ..., (X_n, Y_n) iid in [0,1] × {0,1}, P_θ(Y = 1 | X = x) = 1/(1 + e^{−θ(x)}). Distance on parameter: L_2(G) on P_θ (G the marginal of X_i). Norm on W: L_2(G). Regression: Y_1, ..., Y_n independent N(θ(x_i), σ²), for fixed design points x_1, ..., x_n. Distance on parameter: empirical L_2-distance on θ. Norm on W: empirical L_2-distance. Ergodic diffusions: (X_t : t ∈ [0,n]), ergodic, recurrent: dX_t = θ(X_t) dt + σ(X_t) dB_t. Distance on parameter: random Hellinger h_n (≍ ‖·/σ‖_{µ_0,2}). Norm on W: L_2(µ_0) (µ_0 the stationary measure).

43 Posterior contraction rates for Gaussian priors View the Gaussian process W as a measurable map into a Banach space (B, ‖·‖). Theorem. If the statistical distances on the model combine appropriately with the norm of B, then the posterior rate is ε_n if P(‖W − w_0‖ < ε_n) ≥ e^{−nε_n²}.

44 Posterior contraction rates for Gaussian priors View the Gaussian process W as a measurable map into a Banach space (B, ‖·‖). Theorem. If the statistical distances on the model combine appropriately with the norm of B, then the posterior rate is ε_n if P(‖W − w_0‖ < ε_n) ≥ e^{−nε_n²}. Proof. The stated condition is the prior mass condition. Complexity is automatic due to the concentration of Gaussian processes.
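
The prior-mass condition can be probed by crude Monte Carlo for Brownian motion with the uniform norm (a sketch; w_0 below is an arbitrary smooth function, and the grid discretization is a simplification):

    import numpy as np

    rng = np.random.default_rng(2)
    m, reps = 200, 20000
    dt = 1.0 / m
    t = np.linspace(dt, 1.0, m)
    w0 = 0.5 * t * (1 - t)                                    # arbitrary smooth "truth"

    W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(reps, m)), axis=1)
    for eps in (1.0, 0.75, 0.5, 0.25):
        prob = np.mean(np.max(np.abs(W - w0), axis=1) < eps)
        print(f"eps={eps}: P(||W - w0||_inf < eps) ~ {prob:.4f}")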

45 Posterior contraction rates for Gaussian priors View the Gaussian process W as a measurable map into a Banach space (B, ‖·‖). Theorem. If the statistical distances on the model combine appropriately with the norm of B, then the posterior rate is ε_n if P(‖W − w_0‖ < ε_n) ≥ e^{−nε_n²}. An equivalent condition is φ_0(ε_n) ≤ nε_n² AND inf_{h ∈ H : ‖h − w_0‖ < ε_n} ‖h‖_H² ≤ nε_n², where φ_0(ε) = −log P(‖W‖ < ε) is the small ball exponent and (H, ‖·‖_H) is the RKHS. Both inequalities give a lower bound on ε_n. The first depends on W and not on w_0.

46 Brownian Motion Theorem. If θ_0 ∈ C^β[0,1], then the rate for Brownian motion is n^{−β/2} if β ≤ 1/2 and n^{−1/4} for every β ≥ 1/2. The rate is n^{−β/(2β+1)} iff β = 1/2.

47 Brownian Motion Theorem. If θ_0 ∈ C^β[0,1], then the rate for Brownian motion is n^{−β/2} if β ≤ 1/2 and n^{−1/4} for every β ≥ 1/2. The rate is n^{−β/(2β+1)} iff β = 1/2. The small ball probability of Brownian motion is P(‖W‖ < ε) ≍ e^{−(1/ε)²}, as ε → 0. Solving φ_0(ε_n) ≍ (1/ε_n)² = nε_n² gives ε_n ≍ n^{−1/4}. This causes an n^{−1/4}-rate even for very smooth truths.

48 Integrated Brownian Motion Theorem. If θ_0 ∈ C^β[0,1], then the rate for (α − 1/2)-times integrated Brownian motion is n^{−(α∧β)/(2α+1)}. The rate is n^{−β/(2β+1)} iff β = α.

49 Integrated Brownian motion 1/c ∼ Γ(a,b). (G_t : t ≥ 0) the k-fold integral of Brownian motion released at zero, W_t = √c G_t. Theorem. The prior W = (√c G_t : 0 ≤ t ≤ 1) gives contraction rate n^{−β/(2β+1)} for θ_0 ∈ C^β[0,1], for any β ∈ (0, k+1]. Bayes solves the bandwidth problem.

50 Square exponential process cov(G_s, G_t) = e^{−‖s−t‖²}, s, t ∈ R^d. Theorem. The prior G gives a rate (log n)^γ/√n if θ_0 is analytic, but may give a rate as slow as (log n)^{−γ} if θ_0 is only ordinary smooth.

51 Square exponential process adaptation by random time scaling c^d ∼ Γ. (G_t : t > 0) the square exponential process. W_t = G_{ct}. Theorem. If θ_0 ∈ C^β[0,1]^d, then the rate of contraction is nearly n^{−β/(2β+d)}. If θ_0 is supersmooth, then the rate is nearly n^{−1/2}.
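
A sketch of drawing from the randomly rescaled square exponential process on a grid (d = 1, an illustrative Gamma prior on the scale c, and a jitter term added for a stable Cholesky factorization):

    import numpy as np

    rng = np.random.default_rng(3)
    t = np.linspace(0.0, 1.0, 200)

    def draw_rescaled_sqexp(t, rng):
        c = rng.gamma(shape=2.0, scale=2.0)                   # random time scale (illustrative hyperparameters)
        s = c * t                                             # W_t = G_{ct}
        K = np.exp(-np.subtract.outer(s, s) ** 2)             # cov(G_u, G_v) = exp(-|u - v|^2)
        K += 1e-8 * np.eye(len(t))                            # jitter for numerical stability
        return np.linalg.cholesky(K) @ rng.standard_normal(len(t))

    for _ in range(3):
        path = draw_rescaled_sqexp(t, rng)
        print(f"draw with random scale: max |W_t| = {np.max(np.abs(path)):.3f}")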

52 Gaussian processes: summary Recovery is best if prior matches truth. Mismatch slows down, but does not prevent, recovery. Mismatch can be prevented by using hyperparameters.

53 Dirichlet process

54 Finite-dimensional Dirichlet distribution Definition. (θ_1, ..., θ_k) possesses a Dir(k; α_1, ..., α_k)-distribution for α_i > 0 if it has (Lebesgue) density on the unit simplex proportional to θ_1^{α_1−1} θ_2^{α_2−1} ⋯ θ_k^{α_k−1}.

55 Finite-dimensional Dirichlet distribution Definition. (θ_1, ..., θ_k) possesses a Dir(k; α_1, ..., α_k)-distribution for α_i > 0 if it has (Lebesgue) density on the unit simplex proportional to θ_1^{α_1−1} θ_2^{α_2−1} ⋯ θ_k^{α_k−1}. We extend to α_i = 0 for one or more i on the understanding that then θ_i = 0. Theorem. If θ ∼ Dir(k; α) and N | θ ∼ Multinom(n; θ), then θ | N ∼ Dir(k; α + N). Proof. Π(dθ | N) ∝ ∏_{i=1}^k θ_i^{N_i} ∏_{i=1}^k θ_i^{α_i−1}.
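
A numerical check of this conjugacy (a sketch with arbitrary illustrative numbers): posterior draws from Dir(k; α + N) reproduce the exact posterior mean (α + N)/(|α| + n).

    import numpy as np

    rng = np.random.default_rng(4)
    alpha = np.array([1.0, 1.0, 1.0])                         # Dir(3; 1, 1, 1) prior
    theta_true = np.array([0.2, 0.3, 0.5])
    N = rng.multinomial(100, theta_true)                      # observed cell counts

    draws = rng.dirichlet(alpha + N, size=5000)               # theta | N ~ Dir(3; alpha + N)
    print("counts:", N)
    print("Monte Carlo posterior mean:", draws.mean(axis=0).round(3))
    print("exact posterior mean:      ", ((alpha + N) / (alpha.sum() + N.sum())).round(3))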

56 Dirichlet process Definition. A random measure P on a measurable space (X, 𝒳) is a Dirichlet process with base measure α, if for every partition A_1, ..., A_k of X, (P(A_1), ..., P(A_k)) ∼ Dir(k; α(A_1), ..., α(A_k)).

57 Dirichlet process Definition. A random measure P on a measurable space (X, 𝒳) is a Dirichlet process with base measure α, if for every partition A_1, ..., A_k of X, (P(A_1), ..., P(A_k)) ∼ Dir(k; α(A_1), ..., α(A_k)). Lemma. EP(A) = α(A)/|α|, where |α| = α(X).


59 Dirichlet process Definition. A random measure P on a measurable space (X, 𝒳) is a Dirichlet process with base measure α, if for every partition A_1, ..., A_k of X, (P(A_1), ..., P(A_k)) ∼ Dir(k; α(A_1), ..., α(A_k)). Lemma. EP(A) = α(A)/|α|. Theorem. If θ_1, θ_2, ... iid ∼ ᾱ and Y_1, Y_2, ... iid ∼ Be(1, M) are independent, and W_j = (1 − Y_1) ⋯ (1 − Y_{j−1}) Y_j, then P := ∑_{j=1}^∞ W_j δ_{θ_j} ∼ DP(Mᾱ).
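
A sketch of this stick-breaking construction, truncated at a finite number of sticks, with ᾱ taken to be the standard normal distribution and M = 5 (both illustrative choices):

    import numpy as np

    rng = np.random.default_rng(5)

    def draw_dp(M, n_sticks, rng):
        theta = rng.standard_normal(n_sticks)                          # atoms theta_j ~ base measure
        Y = rng.beta(1.0, M, size=n_sticks)                            # stick proportions Y_j ~ Be(1, M)
        W = Y * np.cumprod(np.concatenate(([1.0], 1.0 - Y[:-1])))      # W_j = (1-Y_1)...(1-Y_{j-1}) Y_j
        return theta, W

    theta, W = draw_dp(M=5.0, n_sticks=500, rng=rng)
    print("weight captured by the truncation:", round(W.sum(), 6))     # close to 1
    print("mean of the realized random measure:", round(float(np.sum(W * theta)), 3))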

60 Conjugacy of the Dirichlet process P ∼ DP(α), X_1, X_2, ... | P iid ∼ P. Theorem. P | X_1, ..., X_n ∼ DP(α + nP_n), for P_n = n^{−1} ∑_{i=1}^n δ_{X_i}.

61 Conjugacy of the Dirichlet process P ∼ DP(α), X_1, X_2, ... | P iid ∼ P. Theorem. P | X_1, ..., X_n ∼ DP(α + nP_n), for P_n = n^{−1} ∑_{i=1}^n δ_{X_i}. Proof. (P(A_1), ..., P(A_k)) | X_1, ..., X_n ∼ (P(A_1), ..., P(A_k)) | (N_1, ..., N_k), where N_i = nP_n(A_i). Apply the result for the finite-dimensional Dirichlet distribution.

62 Conjugacy of the Dirichlet process P ∼ DP(α), X_1, X_2, ... | P iid ∼ P. Theorem. P | X_1, ..., X_n ∼ DP(α + nP_n), for P_n = n^{−1} ∑_{i=1}^n δ_{X_i}. Corollary. E(P(A) | X_1, ..., X_n) = α(A)/(|α| + n) + (n/(|α| + n)) P_n(A).
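
The corollary says the posterior mean is a convex combination of the prior mean ᾱ(A) and the empirical frequency P_n(A); a one-line check with illustrative numbers:

    M, prior_mean_A = 2.0, 0.5        # |alpha| and alpha(A)/|alpha| (illustrative)
    n, Pn_A = 50, 0.62                # sample size and empirical frequency of A
    post_mean_A = (M * prior_mean_A + n * Pn_A) / (M + n)
    print(post_mean_A)                # = alpha(A)/(|alpha|+n) + n/(|alpha|+n) * Pn(A)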

63 Conjugacy of the Dirichlet process P ∼ DP(α), X_1, X_2, ... | P iid ∼ P. Theorem. P | X_1, ..., X_n ∼ DP(α + nP_n), for P_n = n^{−1} ∑_{i=1}^n δ_{X_i}. Theorem. The posterior distribution of P(A) satisfies a Bernstein–von Mises theorem: ‖Π(P(A) ∈ · | X_1, ..., X_n) − N(P_n(A), P_0(A)(1 − P_0(A))/n)‖ → 0.

64 Pitman-Yor process θ_1, θ_2, ... iid ∼ ᾱ, independent of Y_j ∼ind Be(1 − σ, M + jσ). W_j = (1 − Y_1) ⋯ (1 − Y_{j−1}) Y_j. P := ∑_{j=1}^∞ W_j δ_{θ_j}. Theorem. The posterior distribution based on X_1, ..., X_n | P iid ∼ P is inconsistent unless σ = 0 or P_0 = ᾱ or P_0 is discrete.

65 Dirichlet process mixtures

66 Dirichlet normal mixtures p_{F,σ}(x) = ∫ (1/σ) φ((x − z)/σ) dF(z). X_1, ..., X_n | (F, σ) iid ∼ p_{F,σ}, F ∼ DP(α), σ ∼ π. [Figure: posterior mean (solid black) and 10 draws from the posterior distribution for a sample of size 50 from a mixture of two normals (red).]

67 Dirichlet normal mixtures p_{F,σ}(x) = ∫ (1/σ) φ((x − z)/σ) dF(z). X_1, ..., X_n | (F, σ) iid ∼ p_{F,σ}, F ∼ DP(α), σ ∼ π. Two cases for the true density p_0: Supersmooth: p_0 = p_{F_0,σ_0}, for compactly supported F_0. Take a prior for σ with continuous positive density on an interval (a,b) ∋ σ_0. Ordinary smooth: p_0 has β derivatives and exponentially small tails. Take 1/σ a priori Gamma distributed.

68 Dirichlet normal mixtures p_{F,σ}(x) = ∫ (1/σ) φ((x − z)/σ) dF(z). X_1, ..., X_n | (F, σ) iid ∼ p_{F,σ}, F ∼ DP(α), σ ∼ π. Two cases for the true density p_0: Supersmooth: p_0 = p_{F_0,σ_0}, for compactly supported F_0. Take a prior for σ with continuous positive density on an interval (a,b) ∋ σ_0. Ordinary smooth: p_0 has β derivatives and exponentially small tails. Take 1/σ a priori Gamma distributed. Theorem. The Hellinger rate of contraction is nearly n^{−1/2} in the supersmooth case, and nearly n^{−β/(2β+1)} in the ordinary smooth case.
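
In practice such Dirichlet process mixtures of normals can be fitted approximately with standard software; below is a sketch using scikit-learn's BayesianGaussianMixture, which implements a truncated variational approximation with a Dirichlet-process weight prior (an approximation, not the exact posterior of the model above; data and settings are illustrative).

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(6)
    # a sample of size 50 from a mixture of two normals, as in the figure on slide 66
    X = np.concatenate([rng.normal(-2.0, 1.0, 25), rng.normal(2.0, 0.5, 25)]).reshape(-1, 1)

    dpgmm = BayesianGaussianMixture(
        n_components=10,                                     # truncation level of the stick-breaking prior
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=1.0,                      # plays the role of the total mass |alpha|
        max_iter=500,
        random_state=0,
    ).fit(X)

    keep = dpgmm.weights_ > 0.01                             # components with non-negligible weight
    print("weights:", np.round(dpgmm.weights_[keep], 3))
    print("means:  ", np.round(dpgmm.means_[keep].ravel(), 3))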

69 Dirichlet normal mixtures p_{F,σ}(x) = ∫ (1/σ) φ((x − z)/σ) dF(z). X_1, ..., X_n | (F, σ) iid ∼ p_{F,σ}, F ∼ DP(α), σ ∼ π. Two cases for the true density p_0: Supersmooth: p_0 = p_{F_0,σ_0}, for compactly supported F_0. Take a prior for σ with continuous positive density on an interval (a,b) ∋ σ_0. Ordinary smooth: p_0 has β derivatives and exponentially small tails. Take 1/σ a priori Gamma distributed. Adaptation to any smoothness with a Gaussian kernel! Kernel density estimation needs higher order kernels. (The classical kernel estimator is itself a normal mixture: (1/(nσ)) ∑_{i=1}^n φ((x − X_i)/σ) = p_{F_n,σ}(x), with F_n the empirical distribution.)

70 Key lemma: finite approximation Lemma. For any probability measure F on the interval [0,1] there exists a discrete probability measure F′ with at most N ≲ log(1/ε) support points, such that ‖p_{F,1} − p_{F′,1}‖_∞ ≤ ε and ‖p_{F,1} − p_{F′,1}‖_1 ≤ ε (log(1/ε))^{1/2}. Proof. Match the moments of F and F′ up to order log(1/ε). Taylor expand the kernel z ↦ φ(x − z).
