Nonparametric Bayesian Uncertainty Quantification
1 Nonparametric Bayesian Uncertainty Quantification Lecture 1: Introduction to Nonparametric Bayes Aad van der Vaart Universiteit Leiden, Netherlands YES, Eindhoven, January 2017
2 Contents Introduction Recovery Gaussian process priors Dirichlet process Dirichlet process mixtures
3 Introduction
4-5 The Bayesian paradigm
A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a measure $P_\theta$. This gives a joint distribution of (X, Θ). Given observed data x the statistician computes the conditional distribution of Θ given X = x, the posterior distribution: $\Pi(\theta \in B \mid X)$.
We assume whatever is needed (e.g. Θ Polish and Π a probability distribution on its Borel σ-field; Polish sample space) to make this well defined.
6-7 Bayes's rule
A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a measure $P_\theta$. Given observed data x the statistician computes the posterior distribution, the conditional distribution of Θ given X = x. If $P_\theta$ is given by a density $x \mapsto p_\theta(x)$, then Bayes's rule gives
$$\Pi(\Theta \in B \mid X) = \frac{\int_B p_\theta(X)\,d\Pi(\theta)}{\int_\Theta p_\theta(X)\,d\Pi(\theta)}, \qquad \text{equivalently} \qquad d\Pi(\theta \mid X) \propto p_\theta(X)\,d\Pi(\theta).$$
8-10 Reverend Thomas
Thomas Bayes (1763) followed this argument with Θ possessing the uniform distribution and X given Θ = θ binomial (n, θ). Using his famous rule he computed that the posterior distribution is then Beta(X+1, n−X+1):
$$P(a \le \Theta \le b) = b - a, \quad 0 < a < b < 1,$$
$$P(X = x \mid \Theta = \theta) = \binom{n}{x}\theta^x(1-\theta)^{n-x}, \quad x = 0, 1, \ldots, n,$$
$$d\Pi(\theta \mid X) \propto \theta^X(1-\theta)^{n-X} \cdot 1.$$
[Figures: the resulting Beta posterior densities for n = 20 and various x.]
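As a concrete illustration (my own addition, not part of the slides), a minimal Python sketch of Bayes's computation, assuming scipy is available; the values of n and x are arbitrary:

```python
from scipy import stats

# Uniform prior on theta + binomial(n, theta) likelihood
# gives the posterior Beta(x + 1, n - x + 1).
n, x = 20, 14  # hypothetical sample size and observed count
posterior = stats.beta(x + 1, n - x + 1)

print("posterior mean:", posterior.mean())              # (x + 1) / (n + 2)
print("95% credible interval:", posterior.interval(0.95))
```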
11 Parametric Bayes
Pierre-Simon Laplace (1749-1827) rediscovered Bayes's argument and applied it to general parametric models: models smoothly indexed by a Euclidean parameter θ. He had many followers, but Ronald Aylmer Fisher (1890-1962) did not buy into it. The Bayesian method regained popularity following the development of MCMC methods in the 1980s/90s. Nonparametric Bayesian statistics took off in the 1970s/80s/90s, although it was long thought not to work.
12-17 Nonparametric Bayes
If the parameter θ is a function, then the prior is a probability distribution on a function space. So is the posterior, given the data. Bayes's formula does not change: $d\Pi(\theta \mid X) \propto p_\theta(X)\,d\Pi(\theta)$. Prior and posterior can be visualized by plotting functions that are simulated from these distributions, as in the sketch below.
[Figures: simulated draws from a prior and from the corresponding posterior.]
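One hedged way to produce such prior draws (my own sketch, not from the slides): a Gaussian random series on a cosine basis, with an arbitrary decay rate controlling the smoothness of the draws.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)

def draw_prior_function(decay=1.5, n_terms=50):
    """One draw from the series prior sum_j j^(-decay) Z_j cos(pi j t), Z_j iid N(0, 1)."""
    j = np.arange(1, n_terms + 1, dtype=float)
    z = rng.standard_normal(n_terms)
    return (j ** -decay * z * np.cos(np.pi * j * t[:, None])).sum(axis=1)

for _ in range(5):
    plt.plot(t, draw_prior_function())
plt.title("Five draws from a Gaussian series prior")
plt.show()
```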
18-21 Frequentist Bayesian
Assume that the data X is generated according to a given parameter θ_0 and consider the posterior $\Pi(\cdot \mid X)$ as a random measure on the parameter set, dependent on X.
RECOVERY: We like $\Pi(\cdot \mid X)$ to put most of its mass near θ_0, for most X.
UNCERTAINTY QUANTIFICATION: We like the spread of $\Pi(\cdot \mid X)$ to indicate the remaining uncertainty.
A set $C_X$ with $\Pi(\theta \in C_X \mid X) = 0.95$ is called a credible set. Is it a confidence set? Is $P_{\theta_0}(C_X \ni \theta_0) = 0.95$? Is its order of magnitude correct?
Asymptotic setting: data $X^{(n)}$, where the information increases as $n \to \infty$. We want $\Pi_n(\cdot \mid X^{(n)}) \rightsquigarrow \delta_{\theta_0}$, at a good rate. We like $P_{\theta_0}(C_{X^{(n)}} \ni \theta_0) \to 0.95$; this frequentist coverage can be checked by simulation, as sketched below.
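A small Monte Carlo sketch (mine; n, θ_0 and the number of replications are arbitrary choices) estimating the frequentist coverage of the 95% equal-tailed credible interval in Bayes's binomial model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, theta0, reps = 50, 0.3, 5000

x = rng.binomial(n, theta0, size=reps)
# Equal-tailed 95% credible interval from the Beta(x+1, n-x+1) posterior.
lo, hi = stats.beta.interval(0.95, x + 1, n - x + 1)
coverage = np.mean((lo <= theta0) & (theta0 <= hi))
print(f"empirical coverage: {coverage:.3f}")  # close to 0.95 in this smooth parametric model
```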
22-23 Parametric models
Suppose the data are a random sample $X_1, \ldots, X_n$ from a density $x \mapsto p_\theta(x)$ that is smoothly and identifiably parametrized by a vector $\theta \in \mathbb{R}^d$ (e.g. $\theta \mapsto \sqrt{p_\theta}$ continuously differentiable as a map in $L_2(\mu)$).
Theorem. [Laplace, Bernstein, von Mises, Le Cam 1989] In $P^n_{\theta_0}$-probability, for any prior with a density that is positive around θ_0,
$$\big\| \Pi(\cdot \mid X_1, \ldots, X_n) - N_d\big(\hat\theta_n, \tfrac{1}{n} I_{\theta_0}^{-1}\big) \big\| \to 0.$$
Here $\hat\theta_n$ are estimators with $\sqrt{n}(\hat\theta_n - \theta_0) \rightsquigarrow N(0, I_{\theta_0}^{-1})$.
RECOVERY: The posterior distribution concentrates most of its mass on balls of radius $O(1/\sqrt{n})$ around θ_0.
UNCERTAINTY QUANTIFICATION: A central set of posterior probability 95% is equivalent to the usual Wald confidence set $\{\theta : n(\theta - \hat\theta_n)^T I_{\hat\theta_n} (\theta - \hat\theta_n) \le \chi^2_{d, 1-\alpha}\}$.
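A hedged numerical check of this Bernstein-von Mises approximation in the Bernoulli model (my own sketch; the uniform prior and the value of n are arbitrary). Here $I_\theta^{-1} = \theta(1-\theta)$, so the posterior should be close to $N(\hat\theta_n, \hat\theta_n(1-\hat\theta_n)/n)$ in total variation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, theta0 = 400, 0.3
x = rng.binomial(n, theta0)

thetahat = x / n
posterior = stats.beta(x + 1, n - x + 1)  # posterior under the uniform prior
approx = stats.norm(thetahat, np.sqrt(thetahat * (1 - thetahat) / n))

# Total-variation distance between the two densities, on a fine grid.
grid = np.linspace(1e-4, 1 - 1e-4, 10000)
dx = grid[1] - grid[0]
tv = 0.5 * np.abs(posterior.pdf(grid) - approx.pdf(grid)).sum() * dx
print(f"TV distance, posterior vs normal approximation: {tv:.4f}")  # small for large n
```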
24 These lectures
Recovery and uncertainty quantification for nonparametric models.
LECTURE 1: Introduction to recovery.
LECTURE 2: Uncertainty quantification for curve fitting/smoothing.
LECTURE 3: Uncertainty quantification in high dimensions under sparsity.
Point of view: how does the posterior distribution for natural priors behave, in particular for priors that adapt to complexity in the data?
25 Recovery
26 Consistency
$X^{(n)}$: observation in a sample space $(\mathcal{X}^{(n)}, \mathscr{X}^{(n)})$ with distribution $P^{(n)}_\theta$; θ belongs to a metric space (Θ, d).
Definition. The posterior distribution is consistent at $\theta_0 \in \Theta$ if for every ǫ > 0, $E_{\theta_0}\Pi_n\big(\theta : d(\theta, \theta_0) > \epsilon \mid X^{(n)}\big) \to 0$ as $n \to \infty$.
27-29 Schwartz's theorem (1965)
Parameter p: a ν-density on the sample space $(\mathcal{X}, \mathscr{X})$. True value $p_0$. $K(p_0; p) = \int p_0 \log(p_0/p)\,d\nu$.
Definition. $p_0$ is said to possess the Kullback-Leibler property relative to Π if $\Pi\big(p : K(p_0; p) < \epsilon\big) > 0$ for every ǫ > 0.
Bayesian model: $X_1, \ldots, X_n \mid p \overset{iid}{\sim} p$, $p \sim \Pi$.
Theorem. If $p_0$ has the KL property, and for every neighbourhood U of $p_0$ there exist tests $\phi_n$ such that $P_0^n \phi_n \to 0$ and $\sup_{p \in U^c} P_p^n(1 - \phi_n) \to 0$, then $\Pi_n(\cdot \mid X_1, \ldots, X_n)$ is consistent at $p_0$.
30 Extended Schwartz's theorem
Bayesian model: $X_1, \ldots, X_n \mid p \overset{iid}{\sim} p$, $p \sim \Pi$.
Theorem. If $p_0$ has the KL property and for every neighbourhood U of $p_0$ there exist C > 0, sets $\mathcal{P}_n \subset \mathcal{P}$ and tests $\phi_n$ such that
$$\Pi(\mathcal{P} \setminus \mathcal{P}_n) < e^{-Cn}, \qquad P_0^n \phi_n \le e^{-Cn}, \qquad \sup_{p \in \mathcal{P}_n \cap U^c} P_p^n(1 - \phi_n) \le e^{-Cn},$$
then $\Pi_n(\cdot \mid X_1, \ldots, X_n)$ is consistent at $p_0$.
Tests exist if: weak topology: always; $L_1$-distance: if $\log N(\epsilon, \mathcal{P}_n, \|\cdot\|_1) \le n\epsilon^2/3$ for all ǫ > 0.
Definition. The covering number $N(\epsilon, \mathcal{P}, d)$ is the minimal number of d-balls of radius ǫ needed to cover $\mathcal{P}$.
31-32 Rate of contraction
Bayesian model: $X_1, \ldots, X_n \mid p \overset{iid}{\sim} p$, $p \sim \Pi$.
Definition. The posterior distribution $\Pi_n(\cdot \mid X^{(n)})$ contracts at rate $\epsilon_n \to 0$ at $\theta_0 \in \Theta$ if, for every $M_n \to \infty$, $E_{\theta_0}\Pi_n\big(\theta : d(\theta, \theta_0) > M_n\epsilon_n \mid X^{(n)}\big) \to 0$ as $n \to \infty$.
Benchmark rate for curve fitting: a function θ of d variables that has bounded derivatives of order β is estimable based on n observations at rate $n^{-\beta/(2\beta+d)}$.
Proposition. If the posterior distribution contracts at rate $\epsilon_n$ at $\theta_0$, then $\hat\theta_n$, defined as the center of a (nearly) smallest ball that contains posterior mass at least 1/2, satisfies $d(\hat\theta_n, \theta_0) = O_P(\epsilon_n)$ under $P^{(n)}_{\theta_0}$.
33 Basic contraction theorem
Bayesian model: $X_1, \ldots, X_n \mid p \overset{iid}{\sim} p$, $p \sim \Pi$.
$$K(p_0; p) = P_0 \log\frac{p_0}{p}, \qquad V(p_0; p) = P_0\Big(\log\frac{p_0}{p}\Big)^2, \qquad h^2(p_0, p) = \int (\sqrt{p_0} - \sqrt{p})^2\,d\nu.$$
Theorem. Given a metric $d \le h$ whose balls are convex, suppose that there exist $\mathcal{P}_n \subset \mathcal{P}$ and C > 0 such that
(i) $\Pi_n\big(p : K(p_0; p) < \epsilon_n^2, V(p_0; p) < \epsilon_n^2\big) \ge e^{-Cn\epsilon_n^2}$ (prior mass),
(ii) $\log N(\epsilon_n, \mathcal{P}_n, d) \le n\epsilon_n^2$ (complexity),
(iii) $\Pi_n(\mathcal{P}_n^c) \le e^{-(C+4)n\epsilon_n^2}$.
Then the posterior rate of contraction for d is $\epsilon_n \vee n^{-1/2}$.
34 Basic contraction theorem: proof
Proof. There exist tests $\phi_n$ with
$$P_0^n \phi_n \le e^{n\epsilon_n^2}\,\frac{e^{-nM^2\epsilon_n^2/8}}{1 - e^{-nM^2\epsilon_n^2/8}}, \qquad \sup_{p \in \mathcal{P}_n : d(p, p_0) > M\epsilon_n} P_p^n(1 - \phi_n) \le e^{-nM^2\epsilon_n^2/8}.$$
For $A_n = \big\{\int \prod_{i=1}^n (p/p_0)(X_i)\,d\Pi_n(p) \ge e^{-(2+C)n\epsilon_n^2}\big\}$ we have $P_0^n(A_n^c) \to 0$ (see further on), and
$$\Pi_n\big(p : d(p, p_0) > M\epsilon_n \mid X_1, \ldots, X_n\big) \le \phi_n + 1\!\mathrm{l}\{A_n^c\} + e^{(2+C)n\epsilon_n^2} \int_{d(p, p_0) > M\epsilon_n} \prod_{i=1}^n \frac{p}{p_0}(X_i)\,d\Pi_n(p)\,(1 - \phi_n).$$
35 Basic contraction theorem: proof continued
Proof (continued). By Fubini,
$$P_0^n \int_{p \in \mathcal{P}_n : d(p, p_0) > M\epsilon_n} \prod_{i=1}^n \frac{p}{p_0}(X_i)\,d\Pi_n(p)\,(1 - \phi_n) = \int_{p \in \mathcal{P}_n : d(p, p_0) > M\epsilon_n} P_p^n(1 - \phi_n)\,d\Pi_n(p) \le e^{-nM^2\epsilon_n^2/8}.$$
Again by Fubini,
$$P_0^n \int_{\mathcal{P} \setminus \mathcal{P}_n} \prod_{i=1}^n \frac{p}{p_0}(X_i)\,d\Pi_n(p) \le \Pi_n(\mathcal{P} \setminus \mathcal{P}_n).$$
36-37 Bounding the denominator
Lemma. For any probability measure Π on $\mathcal{P}$ and positive constant ǫ, with $P_0^n$-probability at least $1 - (n\epsilon^2)^{-1}$,
$$\int \prod_{i=1}^n \frac{p}{p_0}(X_i)\,d\Pi(p) \ge \Pi\big(p : K(p_0; p) < \epsilon^2, V(p_0; p) < \epsilon^2\big)\,e^{-2n\epsilon^2}.$$
Proof. Set $B := \{p : K(p_0; p) < \epsilon^2, V(p_0; p) < \epsilon^2\}$, restrict the integral to B, and renormalize Π to a probability measure on B. By Jensen's inequality,
$$\log \int_B \prod_{i=1}^n \frac{p}{p_0}(X_i)\,\frac{d\Pi(p)}{\Pi(B)} \ge \sum_{i=1}^n \int_B \log\frac{p}{p_0}(X_i)\,\frac{d\Pi(p)}{\Pi(B)} =: Z.$$
Then $EZ = -n\int_B K(p_0; p)\,\frac{d\Pi(p)}{\Pi(B)} > -n\epsilon^2$ and
$$\operatorname{var} Z \le n P_0\Big(\int_B \log\frac{p_0}{p}\,\frac{d\Pi(p)}{\Pi(B)}\Big)^2 \le n \int_B P_0\Big(\log\frac{p_0}{p}\Big)^2\,\frac{d\Pi(p)}{\Pi(B)} \le n\epsilon^2.$$
Apply Chebyshev's inequality.
38 Interpretation
Consider a maximal set of points $p_1, \ldots, p_N$ in $\mathcal{P}_n$ with $d(p_i, p_j) \ge \epsilon_n$. Maximality implies $N \ge N(\epsilon_n, \mathcal{P}_n, d)$, while under the entropy bound $N \le e^{c_1 n\epsilon_n^2}$. The balls of radius $\epsilon_n/2$ around the points are disjoint and hence the sum of their prior masses will be less than 1. If the prior mass were evenly distributed over these balls, then each would have no more mass than $e^{-c_1 n\epsilon_n^2}$. This is of the same order as the prior mass bound. This argument suggests that the conditions can only be satisfied for every $p_0$ in the model if the prior distributes its mass uniformly, at discretization level $\epsilon_n$.
39 Gaussian process priors
40 Gaussian process prior
The law of a stochastic process $W = (W_t : t \in T)$ is a prior distribution on the space of functions $\theta : T \to \mathbb{R}$. W is a Gaussian process if $(W_{t_1}, \ldots, W_{t_k})$ is multivariate Gaussian, for every $t_1, \ldots, t_k$. Mean and covariance function: $t \mapsto EW_t$ and $(s, t) \mapsto \operatorname{cov}(W_s, W_t)$, $s, t \in T$.
41 Brownian motion and its primitives
[Figures: sample paths of Brownian motion and of 1, 2 and 3 times integrated Brownian motion; see the simulation sketch below.]
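Such draws are easy to simulate; a minimal sketch (mine, not from the slides) using cumulative sums of Gaussian increments:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
m = 1000                  # grid points on [0, 1]
dt = 1.0 / m
t = np.linspace(dt, 1, m)

# Brownian motion as cumulative sums of independent N(0, dt) increments.
path = np.cumsum(rng.standard_normal(m) * np.sqrt(dt))

for k in range(4):
    plt.plot(t, path, label=f"{k} times integrated")
    path = np.cumsum(path) * dt   # one more primitive, released at zero
plt.legend()
plt.title("Brownian motion and its primitives")
plt.show()
```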
42 Settings
Density estimation: $X_1, \ldots, X_n$ iid in [0, 1], $p_\theta(x) = e^{\theta(x)} / \int_0^1 e^{\theta(t)}\,dt$. Distance on parameter: Hellinger on $p_\theta$. Norm on W: uniform.
Classification: $(X_1, Y_1), \ldots, (X_n, Y_n)$ iid in $[0,1] \times \{0,1\}$, $P_\theta(Y = 1 \mid X = x) = 1/(1 + e^{-\theta(x)})$. Distance on parameter: $L_2(G)$ on $P_\theta$ (G the marginal of $X_i$). Norm on W: $L_2(G)$.
Regression: $Y_1, \ldots, Y_n$ independent $N(\theta(x_i), \sigma^2)$, for fixed design points $x_1, \ldots, x_n$. Distance on parameter: empirical $L_2$-distance on θ. Norm on W: empirical $L_2$-distance.
Ergodic diffusions: $(X_t : t \in [0, n])$, ergodic, recurrent: $dX_t = \theta(X_t)\,dt + \sigma(X_t)\,dB_t$. Distance on parameter: random Hellinger $h_n$ ($\simeq \|\cdot/\sigma\|_{\mu_0, 2}$). Norm on W: $L_2(\mu_0)$ ($\mu_0$ the stationary measure).
43-45 Posterior contraction rates for Gaussian priors
View the Gaussian process W as a measurable map into a Banach space $(\mathbb{B}, \|\cdot\|)$.
Theorem. If the statistical distances on the model combine appropriately with the norm of $\mathbb{B}$, then the posterior rate is $\varepsilon_n$ if $P\big(\|W - w_0\| < \varepsilon_n\big) \ge e^{-n\varepsilon_n^2}$.
Proof. The stated condition is the prior mass condition. Complexity is automatic, due to the concentration of Gaussian processes.
An equivalent condition is
$$\phi_0(\varepsilon_n) \le n\varepsilon_n^2 \quad \text{AND} \quad \inf_{h \in \mathbb{H} : \|h - w_0\| < \varepsilon_n} \|h\|_{\mathbb{H}}^2 \le n\varepsilon_n^2,$$
where $\phi_0(\varepsilon) = -\log P(\|W\| < \varepsilon)$ is the small ball exponent and $(\mathbb{H}, \|\cdot\|_{\mathbb{H}})$ is the RKHS. Both inequalities give a lower bound on $\varepsilon_n$. The first depends on W and not on $w_0$.
46-47 Brownian motion
Theorem. If $\theta_0 \in C^\beta[0,1]$, then the rate for Brownian motion is $n^{-\beta/2}$ if $\beta \le 1/2$ and $n^{-1/4}$ for every $\beta \ge 1/2$. The rate is $n^{-\beta/(2\beta+1)}$ iff $\beta = 1/2$.
The small ball probability of Brownian motion is $P(\|W\| < \varepsilon) \approx e^{-(1/\varepsilon)^2}$, $\varepsilon \to 0$. This causes an $n^{-1/4}$-rate even for very smooth truths.
48 Integrated Brownian motion
Theorem. If $\theta_0 \in C^\beta[0,1]$, then the rate for $(\alpha - 1/2)$-times integrated Brownian motion is $n^{-(\alpha \wedge \beta)/(2\alpha + d)}$. The rate is $n^{-\beta/(2\beta+1)}$ iff $\beta = \alpha$.
49 Integrated Brownian motion with random scaling
$1/c \sim \Gamma(a, b)$; $(G_t : t > 0)$ the k-fold integral of Brownian motion released at zero; $W_t = \sqrt{c}\,G_t$.
Theorem. The prior $W = (\sqrt{c}\,G_t : 0 \le t \le 1)$ gives contraction rate $n^{-\beta/(2\beta+1)}$ for $\theta_0 \in C^\beta[0,1]$, for any $\beta \in (0, k+1]$. Bayes solves the bandwidth problem.
50 Square exponential process
$\operatorname{cov}(G_s, G_t) = e^{-\|s - t\|^2}$, $s, t \in \mathbb{R}^d$.
Theorem. The prior G gives a rate $(\log n)^\gamma/\sqrt{n}$ if $\theta_0$ is analytic, but may give a rate $(\log n)^{-\gamma}$ if $\theta_0$ is only ordinary smooth.
51 Square exponential process: adaptation by random time scaling
$c^d \sim \Gamma$; $(G_t : t > 0)$ the square exponential process; $W_t = G_{ct}$.
Theorem. If $\theta_0 \in C^\beta[0,1]^d$, then the rate of contraction is nearly $n^{-\beta/(2\beta+d)}$. If $\theta_0$ is supersmooth, then the rate is nearly $n^{-1/2}$.
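A hedged sketch (mine) of draws from such a randomly rescaled square exponential process; the Gamma hyperparameters are arbitrary, and d = 1 here:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 300)

def draw_scaled_se(a=2.0, b=1.0):
    """One draw of W_t = G_{ct} for the square exponential process G."""
    c = rng.gamma(a, 1.0 / b)        # random time scale; c^d ~ Gamma, with d = 1 here
    s = c * t
    K = np.exp(-(s[:, None] - s[None, :]) ** 2)  # cov(G_u, G_v) = exp(-|u - v|^2)
    K += 1e-6 * np.eye(len(t))                   # jitter for numerical stability
    return np.linalg.cholesky(K) @ rng.standard_normal(len(t))

for _ in range(3):
    plt.plot(t, draw_scaled_se())
plt.title("Draws from a randomly rescaled square exponential process")
plt.show()
```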
52 Gaussian processes: summary Recovery is best if prior matches truth. Mismatch slows down, but does not prevent, recovery. Mismatch can be prevented by using hyperparameters.
53 Dirichlet process
54-55 Finite-dimensional Dirichlet distribution
Definition. $(\theta_1, \ldots, \theta_k)$ possesses a $\text{Dir}(k, \alpha_1, \ldots, \alpha_k)$-distribution, for $\alpha_i > 0$, if it has (Lebesgue) density on the unit simplex proportional to $\theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}$.
We extend to $\alpha_i = 0$ for one or more i, on the understanding that then $\theta_i = 0$.
Theorem. If $\theta \sim \text{Dir}(k, \alpha)$ and $N \mid \theta \sim \text{Multinom}(n, \theta)$, then $\theta \mid N \sim \text{Dir}(k, \alpha + N)$.
Proof. $\Pi(d\theta \mid N) \propto \binom{n}{N} \prod_{i=1}^k \theta_i^{N_i} \prod_{i=1}^k \theta_i^{\alpha_i - 1}$.
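A minimal sketch (not from the slides) of this finite-dimensional conjugacy, using numpy's Dirichlet and multinomial samplers; the prior parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
alpha = np.array([1.0, 2.0, 3.0])    # assumed prior parameters

theta = rng.dirichlet(alpha)         # theta ~ Dir(k, alpha)
N = rng.multinomial(100, theta)      # N | theta ~ Multinom(n, theta)

# The posterior is Dir(k, alpha + N); its mean is (alpha + N) / sum(alpha + N).
post = alpha + N
print("posterior mean:", post / post.sum(), "  true theta:", theta)
```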
56-59 Dirichlet process
Definition. A random measure P on a measurable space $(\mathcal{X}, \mathscr{X})$ is a Dirichlet process with base measure α, if for every partition $A_1, \ldots, A_k$ of $\mathcal{X}$,
$$\big(P(A_1), \ldots, P(A_k)\big) \sim \text{Dir}\big(k; \alpha(A_1), \ldots, \alpha(A_k)\big).$$
Lemma. $EP(A) = \alpha(A)/|\alpha|$.
Theorem. (Stick breaking) If $\theta_1, \theta_2, \ldots \overset{iid}{\sim} \bar\alpha$ and $Y_1, Y_2, \ldots \overset{iid}{\sim} \text{Be}(1, M)$ are independent, and $W_j = (1 - Y_1) \cdots (1 - Y_{j-1}) Y_j$, then
$$P := \sum_{j=1}^\infty W_j \delta_{\theta_j} \sim DP(M\bar\alpha).$$
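A sketch (mine) of the stick-breaking construction, truncated at a finite number of sticks; the base measure ᾱ = N(0, 1) and M = 5 are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)

def draw_dp(M=5.0, n_atoms=500):
    """Truncated stick-breaking draw from DP(M * N(0,1)): atoms theta_j and weights W_j."""
    theta = rng.standard_normal(n_atoms)   # theta_j iid from the base measure
    Y = rng.beta(1.0, M, size=n_atoms)     # Y_j iid Be(1, M)
    W = Y * np.cumprod(np.concatenate(([1.0], 1.0 - Y[:-1])))  # W_j = (1-Y_1)...(1-Y_{j-1}) Y_j
    return theta, W

theta, W = draw_dp()
print("weight captured by the truncation:", W.sum())  # close to 1 for large n_atoms
```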
60-63 Conjugacy of the Dirichlet process
$P \sim DP(\alpha)$, $X_1, X_2, \ldots \mid P \overset{iid}{\sim} P$.
Theorem. $P \mid X_1, \ldots, X_n \sim DP(\alpha + n\mathbb{P}_n)$, for $\mathbb{P}_n = n^{-1}\sum_{i=1}^n \delta_{X_i}$.
Proof. $\big(P(A_1), \ldots, P(A_k)\big) \mid X_1, \ldots, X_n \sim \big(P(A_1), \ldots, P(A_k)\big) \mid (N_1, \ldots, N_k)$, where $N_i = n\mathbb{P}_n(A_i)$. Apply the result for the finite-dimensional Dirichlet.
Corollary. $E\big(P(A) \mid X_1, \ldots, X_n\big) = \frac{\alpha(A)}{|\alpha| + n} + \frac{n}{|\alpha| + n}\,\mathbb{P}_n(A)$.
Theorem. The posterior distribution of P(A) satisfies a Bernstein-von Mises theorem:
$$\Big\|\Pi\big(P(A) \in \cdot \mid X_1, \ldots, X_n\big) - N\Big(\mathbb{P}_n(A), \frac{P_0(A)\big(1 - P_0(A)\big)}{n}\Big)\Big\| \to 0.$$
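A small sketch (mine) of the conjugate update: the posterior mean of P interpolates between the normalized base measure and the empirical distribution. The base measure and data distribution below are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: base measure alpha = M * Uniform(0, 1), so |alpha| = M.
M = 4.0
X = rng.beta(2, 5, size=30)   # observed data from some true P_0 on [0, 1]
n = len(X)

def posterior_mean_cdf(t):
    """E(P([0, t]) | X) = alpha([0, t])/(|alpha| + n) + (n/(|alpha| + n)) * P_n([0, t])."""
    return (M * t + (X <= t).sum()) / (M + n)

print("posterior mean of P([0, 0.3]):", posterior_mean_cdf(0.3))
```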
64 Pitman-Yor process
$\theta_1, \theta_2, \ldots \overset{iid}{\sim} \bar\alpha$, independent of $Y_j \overset{ind}{\sim} \text{Be}(1 - \sigma, M + j\sigma)$. $W_j = (1 - Y_1) \cdots (1 - Y_{j-1}) Y_j$. $P := \sum_{j=1}^\infty W_j \delta_{\theta_j}$.
Theorem. The posterior distribution based on $X_1, \ldots, X_n \mid P \sim P$ is inconsistent unless $\sigma = 0$ or $P_0 = \bar\alpha$ or $P_0$ is discrete.
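For comparison with the Dirichlet process sketch above, a hedged variant (mine) producing Pitman-Yor stick-breaking weights; only the Beta parameters change:

```python
import numpy as np

rng = np.random.default_rng(8)

def pitman_yor_weights(M=1.0, sigma=0.5, n_atoms=500):
    """Truncated Pitman-Yor stick-breaking weights: Y_j ~ Be(1 - sigma, M + j*sigma)."""
    j = np.arange(1, n_atoms + 1)
    Y = rng.beta(1.0 - sigma, M + j * sigma)
    return Y * np.cumprod(np.concatenate(([1.0], 1.0 - Y[:-1])))

W = pitman_yor_weights()
print("largest weights:", np.sort(W)[::-1][:5])
```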
65 Dirichlet process mixtures
66 Dirichlet normal mixtures
$$p_{F,\sigma}(x) = \int \frac{1}{\sigma}\,\phi\Big(\frac{x - z}{\sigma}\Big)\,dF(z), \qquad X_1, \ldots, X_n \mid F, \sigma \overset{iid}{\sim} p_{F,\sigma}, \quad F \sim DP(\alpha), \quad \sigma \sim \pi.$$
[Figure: posterior mean (solid black) and 10 draws from the posterior distribution, for a sample of size 50 from a mixture of two normals (red).]
67-69 Dirichlet normal mixtures
$p_{F,\sigma}(x) = \int \sigma^{-1}\phi\big((x - z)/\sigma\big)\,dF(z)$; $X_1, \ldots, X_n \mid F, \sigma \overset{iid}{\sim} p_{F,\sigma}$, $F \sim DP(\alpha)$, $\sigma \sim \pi$.
Two cases for the true density $p_0$:
Supersmooth: $p_0 = p_{F_0, \sigma_0}$, for compactly supported $F_0$. Take a prior for σ with continuous positive density on an interval $(a, b) \ni \sigma_0$.
Ordinary smooth: $p_0$ has β derivatives and exponentially small tails. Take 1/σ a priori Gamma distributed.
Theorem. The Hellinger rate of contraction is nearly $n^{-1/2}$ in the supersmooth case, and nearly $n^{-\beta/(2\beta+1)}$ in the ordinary smooth case.
Adaptation to any smoothness with a Gaussian kernel! Kernel density estimation needs higher-order kernels. Note that
$$\frac{1}{n\sigma}\sum_{i=1}^n \phi\Big(\frac{x - X_i}{\sigma}\Big) = p_{\mathbb{F}_n, \sigma}(x).$$
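To illustrate the last identity (my own sketch; the data-generating mixture and bandwidth are arbitrary): evaluating $p_{F,\sigma}$ at the empirical distribution $\mathbb{F}_n$ recovers the Gaussian kernel density estimator with bandwidth σ.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
# Sample from a mixture of two normals, as in the figure on the earlier slide.
X = np.concatenate([rng.normal(-2.0, 1.0, 25), rng.normal(2.0, 0.5, 25)])

def p_F_sigma(x, atoms, weights, sigma):
    """Normal location mixture: integral of phi((x - z)/sigma)/sigma over dF(z), F discrete."""
    return (weights * stats.norm.pdf(x[:, None], loc=atoms, scale=sigma)).sum(axis=1)

grid = np.linspace(-5, 5, 500)
sigma = 0.4
# Plugging in F = the empirical distribution gives the kernel density estimator.
kde = p_F_sigma(grid, X, np.full(len(X), 1.0 / len(X)), sigma)
print("integrates to ~1:", kde.sum() * (grid[1] - grid[0]))
```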
70 Key lemma: finite approximation
Lemma. For any probability measure F on the interval [0, 1] there exists a discrete probability measure $F'$ with at most $N \lesssim \log\frac{1}{\varepsilon}$ support points, such that
$$\|p_{F,1} - p_{F',1}\|_\infty \lesssim \varepsilon, \qquad \|p_{F,1} - p_{F',1}\|_1 \lesssim \varepsilon\Big(\log\frac{1}{\varepsilon}\Big)^{1/2}.$$
Proof. Match the moments of F and $F'$ up to order $\log(1/\varepsilon)$. Taylor expand the kernel $z \mapsto \phi(x - z)$.