Bayesian Regularization
1 Bayesian Regularization. Aad van der Vaart, Vrije Universiteit Amsterdam. International Congress of Mathematicians, Hyderabad, August 2010.
2 Contents Introduction Abstract result Gaussian process priors
3 Co-authors. Abstract result: Subhashis Ghosal. Gaussian process priors: Harry van Zanten.
4 Introduction
5 The Bayesian paradigm. A parameter Θ is generated according to a prior distribution Π. Given Θ = θ the data X is generated according to a probability density p_θ. This gives a joint distribution of (X, Θ). Given observed data x the statistician computes the conditional distribution of Θ given X = x, the posterior distribution:

Π(Θ ∈ B | X) = ∫_B p_θ(X) dΠ(θ) / ∫_Θ p_θ(X) dΠ(θ),  equivalently  dΠ(θ | X) ∝ p_θ(X) dΠ(θ).
8 Reverend Thomas. Thomas Bayes (1763) followed this argument with Θ possessing the uniform distribution and X given Θ = θ binomial(n, θ). Using his famous rule he computed that the posterior distribution is then Beta(X + 1, n − X + 1):

dΠ_n(θ | X) ∝ (n choose X) θ^X (1 − θ)^{n−X} · 1.

[Figure: posterior densities Beta(X + 1, n − X + 1) for n = 20 and several values of X.]
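Bayes's computation is easy to reproduce. A minimal sketch in Python (an illustration, not code from the talk): with a uniform prior and X ∼ binomial(n, θ), the posterior is Beta(X + 1, n − X + 1), whose mean and mode have closed forms.

```python
def posterior_params(n, x):
    """Uniform prior + Binomial(n, theta) likelihood => Beta(x + 1, n - x + 1) posterior."""
    return x + 1, n - x + 1

def posterior_mean(n, x):
    a, b = posterior_params(n, x)
    return a / (a + b)            # (x + 1) / (n + 2), Laplace's 'rule of succession'

def posterior_mode(n, x):
    a, b = posterior_params(n, x)
    return (a - 1) / (a + b - 2)  # x / n, which here equals the maximum likelihood estimate
```

For n = 20 and x = 7 this gives posterior mean 8/22 ≈ 0.364 and posterior mode 7/20 = 0.35.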
13 Nonparametric Bayes. If the parameter θ is a function, then the prior is a probability distribution on a function space. So is the posterior, given the data. Bayes's formula does not change:

dΠ(θ | X) ∝ p_θ(X) dΠ(θ).

Prior and posterior can be visualized by plotting functions that are simulated from these distributions.
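One concrete way to simulate such function draws (an illustrative sketch, not the talk's own construction) is a random series prior: a sine series with independent Gaussian coefficients whose variances decay, faster decay corresponding to smoother draws.

```python
import math
import random

def draw_prior_function(n_terms=20, decay=1.5, grid=50, seed=1):
    """One prior draw w(x) = sum_i Z_i * i^(-decay) * sin(i*pi*x), Z_i iid N(0,1),
    evaluated on an equally spaced grid in [0, 1]."""
    rng = random.Random(seed)
    coef = [rng.gauss(0.0, 1.0) * i ** (-decay) for i in range(1, n_terms + 1)]
    xs = [j / (grid - 1) for j in range(grid)]
    return [sum(c * math.sin((i + 1) * math.pi * x) for i, c in enumerate(coef))
            for x in xs]
```

Plotting several draws (different seeds) visualizes the prior; increasing `decay` gives visibly smoother sample paths.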
22 Frequentist Bayesian. Assume that the data X is generated according to a given parameter θ_0, and consider the posterior Π(θ ∈ · | X) as a random measure on the parameter set. We would like Π(· | X) to put most of its mass near θ_0, for most X. Asymptotic setting: data X^n, where the information increases as n → ∞. We would like the posterior Π_n(· | X^n) to contract to {θ_0}, at a good rate.
25 Rate of contraction. Assume X^n is generated according to a given parameter θ_0, where the information increases as n → ∞. The posterior is consistent if, for every ε > 0,

E_{θ_0} Π(θ : d(θ, θ_0) < ε | X^n) → 1.

The posterior contracts at rate at least ε_n if

E_{θ_0} Π(θ : d(θ, θ_0) < ε_n | X^n) → 1.

Basic results on consistency were proved by Doob (1948) and Schwartz (1965). Interest in rates is more recent.
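Contraction can be watched numerically in Bayes's own example. The sketch below (an illustration, not from the talk) computes the posterior mass Π(|θ − θ_0| < ε | X) for the Beta(X + 1, n − X + 1) posterior, using the identity relating the Beta CDF with integer parameters to a binomial tail; the mass near θ_0 approaches 1 as n grows.

```python
import math

def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b) for integer a, b >= 1, via the identity
    I_x(a, b) = P(Bin(a + b - 1, x) >= a); terms computed in log space for stability."""
    m = a + b - 1
    total = 0.0
    for j in range(a, m + 1):
        logterm = (math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
                   + j * math.log(x) + (m - j) * math.log(1.0 - x))
        total += math.exp(logterm)
    return total

def posterior_mass_near(theta0, eps, n):
    """Posterior mass of (theta0 - eps, theta0 + eps), given X = round(n * theta0)."""
    x_obs = round(n * theta0)
    a, b = x_obs + 1, n - x_obs + 1
    return beta_cdf(theta0 + eps, a, b) - beta_cdf(theta0 - eps, a, b)
```

With θ_0 = 0.3 and ε = 0.1, the mass is modest for n = 10 but essentially 1 for n = 1000.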
26 Minimaxity. To a given model Θ is attached an optimal rate of convergence, defined by the minimax criterion

ε_n = inf_T sup_{θ∈Θ} E_θ d(T(X), θ).

This criterion has nothing to do with Bayes. A prior is good if the posterior contracts at this rate. (?)
27 Adaptation. A model can be viewed as an instrument to test quality. It makes sense to use a collection (Θ_α : α ∈ A) of models simultaneously, e.g. a scale of regularity classes. A posterior is good if it adapts: if the true parameter belongs to Θ_α, then the contraction rate is at least the minimax rate for this model.
28 Bayesian perspective. Any prior (and hence posterior) is appropriate per se. In complex situations subject knowledge can and must be incorporated in the prior. Computational ease is important for prior choice as well. Frequentist properties reveal key properties of priors of interest.
29 Abstract result
30 Entropy. The covering number N(ε, Θ, d) of a metric space (Θ, d) is the minimal number of balls of radius ε needed to cover Θ. The entropy is its logarithm: log N(ε, Θ, d). [Figure: coverings of the same set by a few big ε-balls and by many small ε-balls.]
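As a toy example (not in the talk): the interval [0, 1] with the Euclidean distance is covered by ⌈1/(2ε)⌉ intervals of radius ε, so its entropy grows only like log(1/ε); infinite-dimensional parameter sets have much faster-growing entropy, which is exactly what the rate theorems below quantify.

```python
import math

def covering_number_interval(eps):
    """Minimal number of eps-balls (intervals of length 2*eps) needed to cover [0, 1]."""
    return math.ceil(1.0 / (2.0 * eps))

def entropy_interval(eps):
    """Entropy log N(eps, [0,1], |.|), growing like log(1/eps)."""
    return math.log(covering_number_interval(eps))
```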
31 Rate theorem, iid observations. Given a random sample X_1, …, X_n from a density p_0 and a prior Π on a set P of densities, consider the posterior

dΠ_n(p | X_1, …, X_n) ∝ ∏_{i=1}^n p(X_i) dΠ(p).

THEOREM [Ghosal+Ghosh+vdV, 2000]. The Hellinger contraction rate is ε_n if there exist P_n ⊂ P such that
(1) log N(ε_n, P_n, h) ≤ nε_n² and Π(P_n) = 1 − o(e^{−3nε_n²}). [entropy]
(2) Π(B_KL(p_0, ε_n)) ≥ e^{−nε_n²}. [prior mass]

Here h is the Hellinger distance, h²(p, q) = ∫(√p − √q)² dµ, and B_KL(p_0, ε) is a Kullback–Leibler neighborhood of p_0. The entropy condition ensures that the likelihood is not too variable, so that it cannot be large at a wrong place by pure randomness. Le Cam (1964) showed that it gives the minimax rate.
33 Rate theorem, general. Given data X^n following a model (P^n_θ : θ ∈ Θ) that satisfies Le Cam's testing criterion, and a prior Π, form the posterior

dΠ_n(θ | X^n) ∝ p^n_θ(X^n) dΠ(θ).

THEOREM. The rate of contraction is ε_n ≥ 1/√n if there exist Θ_n ⊂ Θ such that
(1) D_n(ε_n, Θ_n, d_n) ≤ e^{nε_n²} and Π_n(Θ ∖ Θ_n) = o(e^{−3nε_n²}).
(2) Π_n(B_n(θ_0, ε_n; k)) ≥ e^{−nε_n²}.

Here B_n(θ_0, ε; k) is a Kullback–Leibler-type neighbourhood of p^n_{θ_0}. The theorem can be refined in various ways.
35 Le Cam's testing criterion. A statistical model (P^n_θ : θ ∈ Θ) indexed by a metric space (Θ, d). For all ε > 0 and all θ_1 with d(θ_1, θ_0) > ε there exist tests φ_n with

P^n_{θ_0} φ_n ≤ e^{−nε²},   sup_{θ∈Θ: d(θ,θ_1)<ε/2} P^n_θ(1 − φ_n) ≤ e^{−nε²}.

[Figure: θ_0 and θ_1 at distance > ε, with a ball of radius ε/2 around θ_1.]

This applies to independent data, Markov chains, Gaussian time series, ergodic diffusions, …
36 Adaptation by hierarchical prior, i.i.d. Let P_{n,α} be collections of densities with priors Π_{n,α}, for α ∈ A, and let λ_n = (λ_{n,α} : α ∈ A) be prior weights:

Π_n = Σ_{α∈A} λ_{n,α} Π_{n,α}.

THEOREM. The Hellinger contraction rate is ε_{n,β} if the prior weights satisfy (*) below and
(1) log N(ε_{n,α}, P_{n,α}, h) ≤ nε_{n,α}², for every α ∈ A.
(2) Π_{n,β}(B_{n,β}(p_0, ε_{n,β})) ≥ e^{−nε_{n,β}²}.

Here B_{n,α}(p_0, ε) is a Kullback–Leibler-type neighbourhood of p_0 within P_{n,α}.
37 Condition (*) on prior weights (simplified).

Σ_{α<β} (λ_{n,α}/λ_{n,β}) e^{nε_{n,α}²} + Σ_{α>β} (λ_{n,α}/λ_{n,β}) e^{nε_{n,β}²} ≲ e^{2nε_{n,β}²},
Σ_{α<β} (λ_{n,α}/λ_{n,β}) Π_{n,α}(C_{n,α}(p_0, ε_{n,α})) ≤ e^{−4nε_{n,β}²}.

Here α < β means ε_{n,α} ≥ ε_{n,β}, and C_{n,α}(p_0, ε) is the Hellinger ball of radius ε around p_0 in P_{n,α}. In many situations there is much freedom in the choice of weights. The weights λ_{n,α} ∝ µ_α e^{−Cnε_{n,α}²} always work.
39 Model selection. THEOREM. Under the conditions of the theorem,

Π_n(α : α < β | X_1, …, X_n) → 0 in P_0-probability,
Π_n(α : α > β, h(p_{n,α}, p_0) ≥ ε_{n,β} | X_1, …, X_n) → 0 in P_0-probability.

Too big models do not get posterior weight. Neither do small models that are far from the truth.
40 Examples of priors. Dirichlet mixtures of normals. Discrete priors. Mixtures of betas. Series priors (splines, Fourier, wavelets, …). Independent increment process priors. Sparse priors. Gaussian process priors.
41 Gaussian process priors
42 Gaussian process. The law of a stochastic process W = (W_t : t ∈ T) is a prior distribution on the space of functions w: T → ℝ. Gaussian processes have been found useful because of their variety and their computational properties. Every Gaussian prior is reasonable in some way. We shall study performance with smoothness classes as a test case.
43 Example: Brownian density estimation. For W Brownian motion, use as prior on a density p on [0, 1]:

x ↦ e^{W_x} / ∫_0^1 e^{W_y} dy.

[Leonard, Lenk, Tokdar & Ghosh]

[Figure: a Brownian motion sample path t ↦ W_t and the corresponding prior density t ↦ c·exp(W_t).]
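A draw from this prior is easy to simulate on a grid (an illustrative sketch, not code from the talk): approximate W by a Gaussian random walk, exponentiate, and normalize by a Riemann sum.

```python
import math
import random

def brownian_path(n=1000, seed=0):
    """Brownian motion on the grid 0, 1/n, ..., 1 (random-walk approximation)."""
    rng = random.Random(seed)
    w, path = 0.0, [0.0]
    step_sd = math.sqrt(1.0 / n)
    for _ in range(n):
        w += rng.gauss(0.0, step_sd)
        path.append(w)
    return path

def prior_density_draw(n=1000, seed=0):
    """One draw of x -> exp(W_x) / integral_0^1 exp(W_y) dy, on the same grid."""
    expw = [math.exp(w) for w in brownian_path(n, seed)]
    z = sum(expw) / len(expw)  # Riemann approximation of the normalizing integral
    return [e / z for e in expw]
```

Each draw is a strictly positive function integrating (numerically) to one, i.e. a random density on [0, 1].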
45 Integrated Brownian motion. [Figure: sample paths of 0, 1, 2 and 3 times integrated Brownian motion.]
46 Stationary processes. [Figure: sample paths for the Gaussian spectral measure and the Matérn (3/2) spectral measure.]
47 Other Gaussian processes. Brownian sheet. Fractional Brownian motion. Series priors: W = Σ_i w_i e_i with w_i independent N(0, σ_i²). [Figures: a Brownian sheet; fractional Brownian motion for two values of α; draws from a series prior.]
48 Rates for Gaussian priors. The prior W is a Gaussian map in (B, ‖·‖) with RKHS (H, ‖·‖_H) and small ball exponent φ_0(ε) = −log P(‖W‖ < ε).

THEOREM. If statistical distances on the model combine appropriately with the norm of B, then the posterior rate is ε_n if

φ_0(ε_n) ≤ nε_n²  AND  inf_{h∈H: ‖h−w_0‖<ε_n} ‖h‖_H² ≤ nε_n².

Both inequalities give a lower bound on ε_n. The first depends on W and not on w_0. If w_0 ∈ H, then the second inequality is satisfied for ε_n ≍ 1/√n.
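The first inequality is used by balancing: the rate it allows is the ε solving φ_0(ε) = nε². A small numerical sketch (an illustration, not from the talk) finds this crossing point by bisection; for Brownian motion, whose small ball exponent satisfies φ_0(ε) ≍ ε^{−2}, it recovers ε_n ≍ n^{−1/4}.

```python
def solve_rate(phi, n, lo=1e-9, hi=1.0, iters=200):
    """Bisect for eps with phi(eps) = n * eps**2,
    assuming phi is decreasing while n * eps**2 is increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if phi(mid) > n * mid * mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With `phi = lambda e: e ** -2` and n = 10**8 the solver returns ε_n = n^{−1/4} = 0.01.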
51 Settings.
Density estimation: X_1, …, X_n iid in [0,1], p_w(x) = e^{w(x)} / ∫_0^1 e^{w(t)} dt. Distance on parameter: Hellinger on p_w. Norm on W: uniform.
Classification: (X_1, Y_1), …, (X_n, Y_n) iid in [0,1] × {0,1}, P_w(Y = 1 | X = x) = e^{w(x)} / (1 + e^{w(x)}). Distance on parameter: L_2(G) on P_w (G the marginal of X_i). Norm on W: L_2(G).
Regression: Y_1, …, Y_n independent N(w(x_i), σ²), for fixed design points x_1, …, x_n. Distance on parameter: empirical L_2-distance on w. Norm on W: empirical L_2-distance.
Ergodic diffusions: (X_t : t ∈ [0, n]), ergodic, recurrent: dX_t = w(X_t) dt + σ(X_t) dB_t. Distance on parameter: random Hellinger distance h_n. Norm on W: L_2(µ_0), for µ_0 the stationary measure.
52 Reproducing kernel Hilbert space. To every Gaussian random element with values in a Banach space (B, ‖·‖) is attached a certain Hilbert space (H, ‖·‖_H), called the RKHS. The norm ‖·‖_H is stronger than ‖·‖, and hence we can consider H ⊂ B.

DEFINITION. For S: B* → B defined by Sb* = E W b*(W), the RKHS is the completion of S B* under ⟨Sb*_1, Sb*_2⟩_H = E b*_1(W) b*_2(W).

DEFINITION. For a process W = (W_x : x ∈ X) with bounded sample paths and covariance function K(x, y) = E W_x W_y, the RKHS is the completion of the set of functions x ↦ Σ_i α_i K(y_i, x), under

⟨Σ_i α_i K(y_i, ·), Σ_j β_j K(z_j, ·)⟩_H = Σ_i Σ_j α_i β_j K(y_i, z_j).

EXAMPLE. If W is multivariate normal N_d(0, Σ), then the RKHS is ℝ^d with norm ‖h‖_H = √(h^T Σ^{−1} h).

EXAMPLE. Any W can be represented as W = Σ_{i=1}^∞ µ_i Z_i e_i, for numbers µ_i ≥ 0, iid standard normal Z_1, Z_2, …, and e_1, e_2, … ∈ B with ‖e_1‖ = ‖e_2‖ = ⋯ = 1. The RKHS consists of all h := Σ_i h_i e_i with ‖h‖_H² := Σ_i h_i²/µ_i² < ∞.

EXAMPLE. Brownian motion is a random element in C[0, 1]. Its RKHS is H = {h : ∫ h′(t)² dt < ∞} with norm ‖h‖_H = ‖h′‖_2.
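For the multivariate normal example the RKHS norm is explicit, and a tiny check (illustration only) can be coded directly for d = 2:

```python
def rkhs_norm_2d(h, sigma):
    """||h||_H = sqrt(h^T Sigma^{-1} h) for W ~ N_2(0, Sigma), Sigma as nested lists."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # explicit 2x2 inverse
    quad = sum(h[i] * inv[i][j] * h[j] for i in range(2) for j in range(2))
    return quad ** 0.5
```

For Σ the identity the RKHS norm reduces to the Euclidean norm; inflating a coordinate's variance shrinks its contribution to the norm.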
58 Small ball probability. The small ball probability of a Gaussian random element W in (B, ‖·‖) is P(‖W‖ < ε), and the small ball exponent φ_0(ε) is minus the logarithm of this probability. It can be computed either by probabilistic arguments or analytically from the RKHS.

THEOREM [Kuelbs & Li (93)]. For H_1 the unit ball of the RKHS (up to constants),

φ_0(ε) ≍ log N( ε/√φ_0(ε), H_1, ‖·‖ ).

There is a big literature; a large database of small ball results is maintained by Michael Lifshits.
60 Rates for Gaussian priors: proof. The prior W is a Gaussian map in (B, ‖·‖) with RKHS (H, ‖·‖_H) and small ball exponent φ_0(ε) = −log P(‖W‖ < ε). Recall the theorem: if statistical distances on the model combine appropriately with the norm of B, then the posterior rate is ε_n if φ_0(ε_n) ≤ nε_n² and inf_{h∈H: ‖h−w_0‖<ε_n} ‖h‖_H² ≤ nε_n².

PROOF. The posterior rate is ε_n if there exist sets B_n such that
(1) log N(ε_n, B_n, d) ≤ nε_n² and P(W ∈ B_n) = 1 − o(e^{−3nε_n²}). [entropy]
(2) P(‖W − w_0‖ < ε_n) ≥ e^{−nε_n²}. [prior mass]

Take B_n = M_n H_1 + ε_n B_1 for large M_n (H_1, B_1 the unit balls of H, B).
63 Proof of (2): key results. Define the concentration function

φ_{w_0}(ε) := φ_0(ε) + inf_{h∈H: ‖h−w_0‖<ε} ‖h‖_H².

THEOREM [Kuelbs & Li (93)]. The concentration function measures concentration around w_0 (up to factors 2):

P(‖W − w_0‖ < ε) ≍ e^{−φ_{w_0}(ε)}.

THEOREM [Borell (75)]. For H_1 and B_1 the unit balls of the RKHS and B,

P(W ∉ M H_1 + ε B_1) ≤ 1 − Φ( Φ^{−1}(e^{−φ_0(ε)}) + M ).
64 (Integrated) Brownian motion. THEOREM. If w_0 ∈ C^β[0, 1], then the rate for Brownian motion is n^{−1/4} if β ≥ 1/2, and n^{−β/2} if β ≤ 1/2.

The small ball exponent of Brownian motion is φ_0(ε) ≍ (1/ε)² as ε → 0. This gives the n^{−1/4}-rate, even for very smooth truths. Truths with β ≤ 1/2 are far from the RKHS, giving the rate n^{−β/2}. The minimax rate is attained iff β = 1/2.

THEOREM. If w_0 ∈ C^β[0, 1], then the rate for (α − 1/2)-times integrated Brownian motion is n^{−(α∧β)/(2α+1)}. The minimax rate is attained iff β = α.
66 Stationary processes. A stationary Gaussian field (W_t : t ∈ ℝ^d) is characterized through a spectral measure µ, by

cov(W_s, W_t) = ∫ e^{iλ^T(s−t)} dµ(λ).

THEOREM. Suppose that µ is Gaussian. Let ŵ_0 be the Fourier transform of w_0: [0,1]^d → ℝ. If ∫ e^{‖λ‖} |ŵ_0(λ)|² dλ < ∞, then the rate of contraction is near 1/√n. If ∫ (1 + ‖λ‖²)^β |ŵ_0(λ)|² dλ < ∞, then the rate is (1/log n)^{κ_β}.

THEOREM. Suppose that dµ(λ) = (1 + ‖λ‖²)^{−α−d/2} dλ. If w_0 ∈ C^β[0,1]^d, then the rate of contraction is n^{−(α∧β)/(2α+d)}.
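The displayed spectral formula can be checked numerically in d = 1 (an illustration, not from the talk): for the standard Gaussian spectral measure the integral is the characteristic function of N(0, 1), so cov(W_s, W_t) = exp(−(s − t)²/2).

```python
import math

def cov_from_spectral(u, lam_max=8.0, n=4000):
    """Midpoint-rule approximation of cov = integral of e^{i*lam*u} dmu(lam)
    for mu = N(0, 1); the imaginary part cancels by symmetry, leaving cos."""
    h = 2.0 * lam_max / n
    total = 0.0
    for k in range(n):
        lam = -lam_max + (k + 0.5) * h
        pdf = math.exp(-0.5 * lam * lam) / math.sqrt(2.0 * math.pi)
        total += math.cos(lam * u) * pdf * h
    return total
```

`cov_from_spectral(1.0)` agrees with exp(−1/2) ≈ 0.6065 to several decimals, and `cov_from_spectral(0.0)` recovers the unit variance.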
69 Adaptation Every Gaussian prior is good for some regularity class, but may be very bad for another. This can be alleviated by putting a prior on the regularity of the process. An alternative, more attractive approach is scaling.
70 Stretching or shrinking. Sample paths can be smoothed by stretching and roughened by shrinking. [Figures: a sample path before and after stretching; a sample path before and after shrinking.]
72 Scaled (integrated) Brownian motion. W_t = B_{t/c_n} for B Brownian motion, and c_n ≍ n^{(2α−1)/(2α+1)}: for α < 1/2, c_n → 0 (shrink); for α ∈ (1/2, 1], c_n → ∞ (stretch).

THEOREM. The prior W_t = B_{t/c_n} gives the optimal rate for w_0 ∈ C^α[0, 1], α ∈ (0, 1].

THEOREM. Appropriate scaling of k-times integrated Brownian motion gives an optimal prior for every α ∈ (0, k + 1].

Stretching helps a little, shrinking helps a lot.
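The direction of the scaling follows directly from the exponent (2α − 1)/(2α + 1), which is negative for α < 1/2 and positive for α > 1/2. A two-line check (illustration only):

```python
def scaling_constant(n, alpha):
    """c_n = n^{(2*alpha - 1)/(2*alpha + 1)}: shrinks for alpha < 1/2, stretches for alpha > 1/2."""
    return n ** ((2.0 * alpha - 1.0) / (2.0 * alpha + 1.0))
```

At α = 1/2 (the native smoothness of Brownian motion) the exponent vanishes and no rescaling is needed.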
75 Scaled smooth stationary process. A Gaussian field with infinitely smooth sample paths is obtained for E G_s G_t = exp(−‖s − t‖²).

THEOREM. The prior W_t = G_{t/c_n} for c_n ≍ n^{−1/(2α+d)} gives a nearly optimal rate for w_0 ∈ C^α[0, 1]^d, any α > 0.
76 Adaptation by random scaling. Choose A^d from a Gamma distribution. Choose (G_t : t > 0) centered Gaussian with E G_s G_t = exp(−‖s − t‖²). Set W_t = G_{At}.

THEOREM. If w_0 ∈ C^α[0, 1]^d, then the rate of contraction is nearly n^{−α/(2α+d)}. If w_0 is supersmooth, then the rate is nearly n^{−1/2}.

The first result is also true for randomly scaled k-times integrated Brownian motion and α ≤ k + 1.
78 Conclusion
79 Conclusion and final remark. There exist natural (fixed) priors that yield fully automatic smoothing at the correct bandwidth (for instance, randomly scaled Gaussian processes). Similar statements are true for adaptation to the scale of models described by sparsity (research in progress).
LAN property for sde s with additive fractional noise and continuous time observation Eulalia Nualart (Universitat Pompeu Fabra, Barcelona) joint work with Samy Tindel (Purdue University) Vlad s 6th birthday,
More informationMinimax lower bounds I
Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationOptimal Estimation of a Nonsmooth Functional
Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationThe Bayesian approach to inverse problems
The Bayesian approach to inverse problems Youssef Marzouk Department of Aeronautics and Astronautics Center for Computational Engineering Massachusetts Institute of Technology ymarz@mit.edu, http://uqgroup.mit.edu
More informationShould all Machine Learning be Bayesian? Should all Bayesian models be non-parametric?
Should all Machine Learning be Bayesian? Should all Bayesian models be non-parametric? Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/
More informationConvergence of latent mixing measures in nonparametric and mixture models 1
Convergence of latent mixing measures in nonparametric and mixture models 1 XuanLong Nguyen xuanlong@umich.edu Department of Statistics University of Michigan Technical Report 527 (Revised, Jan 25, 2012)
More informationNonparametric Drift Estimation for Stochastic Differential Equations
Nonparametric Drift Estimation for Stochastic Differential Equations Gareth Roberts 1 Department of Statistics University of Warwick Brazilian Bayesian meeting, March 2010 Joint work with O. Papaspiliopoulos,
More informationINFORMATION-THEORETIC DETERMINATION OF MINIMAX RATES OF CONVERGENCE 1. By Yuhong Yang and Andrew Barron Iowa State University and Yale University
The Annals of Statistics 1999, Vol. 27, No. 5, 1564 1599 INFORMATION-THEORETIC DETERMINATION OF MINIMAX RATES OF CONVERGENCE 1 By Yuhong Yang and Andrew Barron Iowa State University and Yale University
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationLinear Models A linear model is defined by the expression
Linear Models A linear model is defined by the expression x = F β + ɛ. where x = (x 1, x 2,..., x n ) is vector of size n usually known as the response vector. β = (β 1, β 2,..., β p ) is the transpose
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More information16.1 Bounding Capacity with Covering Number
ECE598: Information-theoretic methods in high-dimensional statistics Spring 206 Lecture 6: Upper Bounds for Density Estimation Lecturer: Yihong Wu Scribe: Yang Zhang, Apr, 206 So far we have been mostly
More informationMotivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University
Econ 690 Purdue University In virtually all of the previous lectures, our models have made use of normality assumptions. From a computational point of view, the reason for this assumption is clear: combined
More informationNon-Parametric Bayes
Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationNonparametric Bayes Inference on Manifolds with Applications
Nonparametric Bayes Inference on Manifolds with Applications Abhishek Bhattacharya Indian Statistical Institute Based on the book Nonparametric Statistics On Manifolds With Applications To Shape Spaces
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...
More informationSparse Nonparametric Density Estimation in High Dimensions Using the Rodeo
Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Han Liu John Lafferty Larry Wasserman Statistics Department Computer Science Department Machine Learning Department Carnegie Mellon
More informationMODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY
ECO 513 Fall 2008 MODEL COMPARISON CHRISTOPHER A. SIMS PRINCETON UNIVERSITY SIMS@PRINCETON.EDU 1. MODEL COMPARISON AS ESTIMATING A DISCRETE PARAMETER Data Y, models 1 and 2, parameter vectors θ 1, θ 2.
More informationAnnouncements. Proposals graded
Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:
More informationBayes spaces: use of improper priors and distances between densities
Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de
More informationTANG, YONGQIANG. Dirichlet Process Mixture Models for Markov processes. (Under the direction of Dr. Subhashis Ghosal.)
ABSTRACT TANG, YONGQIANG. Dirichlet Process Mixture Models for Markov processes. (Under the direction of Dr. Subhashis Ghosal.) Prediction of the future observations is an important practical issue for
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationBayesian Machine Learning
Bayesian Machine Learning Andrew Gordon Wilson ORIE 6741 Lecture 2: Bayesian Basics https://people.orie.cornell.edu/andrew/orie6741 Cornell University August 25, 2016 1 / 17 Canonical Machine Learning
More informationBAYESIAN DETECTION OF IMAGE BOUNDARIES. Duke University and North Carolina State University
Submitted to the Annals of Statistics arxiv: arxiv:1508.05847 BAYESIAN DETECTION OF IMAGE BOUNDARIES By Meng Li and Subhashis Ghosal Duke University and North Carolina State University Detecting boundary
More informationSurrogate loss functions, divergences and decentralized detection
Surrogate loss functions, divergences and decentralized detection XuanLong Nguyen Department of Electrical Engineering and Computer Science U.C. Berkeley Advisors: Michael Jordan & Martin Wainwright 1
More informationGeneral Theory of Large Deviations
Chapter 30 General Theory of Large Deviations A family of random variables follows the large deviations principle if the probability of the variables falling into bad sets, representing large deviations
More informationSmall ball probabilities and metric entropy
Small ball probabilities and metric entropy Frank Aurzada, TU Berlin Sydney, February 2012 MCQMC Outline 1 Small ball probabilities vs. metric entropy 2 Connection to other questions 3 Recent results for
More informationMinimax Estimation of Kernel Mean Embeddings
Minimax Estimation of Kernel Mean Embeddings Bharath K. Sriperumbudur Department of Statistics Pennsylvania State University Gatsby Computational Neuroscience Unit May 4, 2016 Collaborators Dr. Ilya Tolstikhin
More informationOverall Objective Priors
Overall Objective Priors Jim Berger, Jose Bernardo and Dongchu Sun Duke University, University of Valencia and University of Missouri Recent advances in statistical inference: theory and case studies University
More information1.1 Basis of Statistical Decision Theory
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 1: Introduction Lecturer: Yihong Wu Scribe: AmirEmad Ghassami, Jan 21, 2016 [Ed. Jan 31] Outline: Introduction of
More informationInconsistency of Bayesian inference when the model is wrong, and how to repair it
Inconsistency of Bayesian inference when the model is wrong, and how to repair it Peter Grünwald Thijs van Ommen Centrum Wiskunde & Informatica, Amsterdam Universiteit Leiden June 3, 2015 Outline 1 Introduction
More informationGaussian Approximations for Probability Measures on R d
SIAM/ASA J. UNCERTAINTY QUANTIFICATION Vol. 5, pp. 1136 1165 c 2017 SIAM and ASA. Published by SIAM and ASA under the terms of the Creative Commons 4.0 license Gaussian Approximations for Probability Measures
More informationLearning gradients: prescriptive models
Department of Statistical Science Institute for Genome Sciences & Policy Department of Computer Science Duke University May 11, 2007 Relevant papers Learning Coordinate Covariances via Gradients. Sayan
More informationRATE EXACT BAYESIAN ADAPTATION WITH MODIFIED BLOCK PRIORS. By Chao Gao and Harrison H. Zhou Yale University
Submitted to the Annals o Statistics RATE EXACT BAYESIAN ADAPTATION WITH MODIFIED BLOCK PRIORS By Chao Gao and Harrison H. Zhou Yale University A novel block prior is proposed or adaptive Bayesian estimation.
More informationStratégies bayésiennes et fréquentistes dans un modèle de bandit
Stratégies bayésiennes et fréquentistes dans un modèle de bandit thèse effectuée à Telecom ParisTech, co-dirigée par Olivier Cappé, Aurélien Garivier et Rémi Munos Journées MAS, Grenoble, 30 août 2016
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationLecture 2: Priors and Conjugacy
Lecture 2: Priors and Conjugacy Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de May 6, 2014 Some nice courses Fred A. Hamprecht (Heidelberg U.) https://www.youtube.com/watch?v=j66rrnzzkow Michael I.
More informationLecture 2: Statistical Decision Theory (Part I)
Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical
More informationA new Hierarchical Bayes approach to ensemble-variational data assimilation
A new Hierarchical Bayes approach to ensemble-variational data assimilation Michael Tsyrulnikov and Alexander Rakitko HydroMetCenter of Russia College Park, 20 Oct 2014 Michael Tsyrulnikov and Alexander
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationSemiparametric Minimax Rates
arxiv: math.pr/0000000 Semiparametric Minimax Rates James Robins, Eric Tchetgen Tchetgen, Lingling Li, and Aad van der Vaart James Robins Eric Tchetgen Tchetgen Department of Biostatistics and Epidemiology
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationThe geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan
The geometry of Gaussian processes and Bayesian optimization. Contal CMLA, ENS Cachan Background: Global Optimization and Gaussian Processes The Geometry of Gaussian Processes and the Chaining Trick Algorithm
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationScaling up Bayesian Inference
Scaling up Bayesian Inference David Dunson Departments of Statistical Science, Mathematics & ECE, Duke University May 1, 2017 Outline Motivation & background EP-MCMC amcmc Discussion Motivation & background
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationPart III. A Decision-Theoretic Approach and Bayesian testing
Part III A Decision-Theoretic Approach and Bayesian testing 1 Chapter 10 Bayesian Inference as a Decision Problem The decision-theoretic framework starts with the following situation. We would like to
More informationBayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework
HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationBayesian inference. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. April 10, 2017
Bayesian inference Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark April 10, 2017 1 / 22 Outline for today A genetic example Bayes theorem Examples Priors Posterior summaries
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationVariational Principal Components
Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is
More informationLatent Dirichlet Alloca/on
Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which
More informationNonparametric regression with martingale increment errors
S. Gaïffas (LSTA - Paris 6) joint work with S. Delattre (LPMA - Paris 7) work in progress Motivations Some facts: Theoretical study of statistical algorithms requires stationary and ergodicity. Concentration
More informationGaussian Random Fields: Geometric Properties and Extremes
Gaussian Random Fields: Geometric Properties and Extremes Yimin Xiao Michigan State University Outline Lecture 1: Gaussian random fields and their regularity Lecture 2: Hausdorff dimension results and
More information