Bayesian Adaptation
Aad van der Vaart
Vrije Universiteit Amsterdam

Joint work with Jyri Lember

Adaptation
Given a collection of possible models, find a single procedure that works well for all models, as well as a procedure specifically targeted to the correct model. The correct model is the one that contains the true distribution of the data.

Adaptation to Smoothness
Given a random sample of size $n$ from a density $p_0$ on $\mathbb{R}$ that is known to have $\alpha$ derivatives, there exist estimators $\hat p_n$ with rate $\epsilon_{n,\alpha} = n^{-\alpha/(2\alpha+1)}$, i.e.
$E_{p_0}\, d(\hat p_n, p_0)^2 = O(\epsilon_{n,\alpha}^2)$,
uniformly in $p_0$ with $\int (p_0^{(\alpha)})^2\, d\lambda$ bounded (if $d = \|\cdot\|_2$).
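
As a quick numerical illustration (a minimal Python sketch, not part of the slides), the rate $n^{-\alpha/(2\alpha+1)}$ improves from the nonparametric toward the parametric rate $n^{-1/2}$ as the smoothness $\alpha$ grows:

    # Minimax rate eps_{n,alpha} = n**(-alpha/(2*alpha + 1)) for a few
    # smoothness levels alpha and sample sizes n.
    for alpha in (0.5, 1.0, 2.0, 5.0):
        for n in (10**2, 10**4, 10**6):
            eps = n ** (-alpha / (2 * alpha + 1))
            print(f"alpha={alpha}, n={n}: eps_n = {eps:.4f}")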

Distances
The global distance $d$ on densities can be one of:
Hellinger: $h(p,q) = \bigl( \int (\sqrt p - \sqrt q)^2\, d\mu \bigr)^{1/2}$
Total variation: $\|p - q\|_1 = \int |p - q|\, d\mu$
$L_2$: $\|p - q\|_2 = \bigl( \int (p - q)^2\, d\mu \bigr)^{1/2}$
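
The three distances are easy to approximate numerically; a minimal sketch (the two normal densities are hypothetical stand-ins):

    import numpy as np

    # Approximate Hellinger, total variation and L2 distances between two
    # densities on a grid (mu is Lebesgue measure on the real line).
    x = np.linspace(-10.0, 10.0, 10001)
    dx = x[1] - x[0]
    p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)        # N(0, 1) density
    q = np.exp(-(x - 1)**2 / 2) / np.sqrt(2 * np.pi)  # N(1, 1) density

    hellinger = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q))**2) * dx)
    total_variation = np.sum(np.abs(p - q)) * dx
    l2 = np.sqrt(np.sum((p - q)**2) * dx)
    print(hellinger, total_variation, l2)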

Adaptation
Data $X_1,\dots,X_n$ i.i.d. $\sim p_0$
Models $\mathcal{P}_{n,\alpha}$ for $\alpha \in A$, countable
Optimal rates $\epsilon_{n,\alpha}$
$p_0$ contained in, or close to, $\mathcal{P}_{n,\beta}$ for some $\beta \in A$
We want procedures that (almost) attain the rate $\epsilon_{n,\beta}$, but we do not know $\beta$.

Adaptation-NonBayesian
Main methods: penalization and cross validation.
Penalization: minimize your favourite contrast function (MLE, LS, ...), but add a penalty for model complexity.
Cross validation: split the sample; use the first half to select the best estimator within each model; use the second half to select the best model.

Adaptation-Penalization
Models $\mathcal{P}_{n,\alpha}$, $\alpha \in A$
Estimator given the model: $\hat p_{n,\alpha} = \operatorname{argmin}_{p \in \mathcal{P}_{n,\alpha}} M_n(p)$
Estimator of the model: $\hat\alpha_n = \operatorname{argmin}_{\alpha \in A} \bigl( M_n(\hat p_{n,\alpha}) + \operatorname{pen}_n(\alpha) \bigr)$
Final estimator: $\hat p_n = \hat p_{n,\hat\alpha_n}$
If $M_n$ is the (negative) log likelihood, then $\hat p_n$ is the posterior mode relative to the prior $\pi_n(p,\alpha) \propto \exp\bigl( -\operatorname{pen}_n(\alpha) \bigr)$.
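
A minimal sketch of the penalized-contrast recipe, with histogram density models indexed by the number of bins $K$ and a hypothetical AIC-type penalty $\operatorname{pen}_n(K) = K$ (the slides leave $\operatorname{pen}_n$ unspecified):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.beta(2, 5, size=500)       # hypothetical sample on [0, 1]

    def neg_loglik(x, K):
        # Contrast M_n: minus the log likelihood of the K-bin histogram density.
        counts, _ = np.histogram(x, bins=K, range=(0.0, 1.0))
        heights = counts / (len(x) * (1.0 / K))
        h_at_x = heights[np.minimum((x * K).astype(int), K - 1)]
        return -np.sum(np.log(np.maximum(h_at_x, 1e-12)))

    # Estimated model: argmin over K of contrast plus penalty.
    K_hat = min(range(1, 51), key=lambda K: neg_loglik(x, K) + K)
    print("selected number of bins:", K_hat)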

Adaptation-Bayesian
Models $\mathcal{P}_{n,\alpha}$, $\alpha \in A$
Prior $\Pi_{n,\alpha}$ on $\mathcal{P}_{n,\alpha}$
Prior $(\lambda_{n,\alpha})_{\alpha \in A}$ on $A$
Overall prior $\Pi_n = \sum_{\alpha \in A} \lambda_{n,\alpha} \Pi_{n,\alpha}$
Posterior $B \mapsto \Pi_n(B \mid X_1,\dots,X_n)$:
$\Pi_n(B \mid X_1,\dots,X_n) = \dfrac{\int_B \prod_{i=1}^n p(X_i)\, d\Pi_n(p)}{\int \prod_{i=1}^n p(X_i)\, d\Pi_n(p)} = \dfrac{\sum_{\alpha \in A} \lambda_{n,\alpha} \int_{p \in \mathcal{P}_{n,\alpha}:\, p \in B} \prod_{i=1}^n p(X_i)\, d\Pi_{n,\alpha}(p)}{\sum_{\alpha \in A} \lambda_{n,\alpha} \int_{p \in \mathcal{P}_{n,\alpha}} \prod_{i=1}^n p(X_i)\, d\Pi_{n,\alpha}(p)}$
Desired result: if $p_0 \in \mathcal{P}_{n,\beta}$ (or is close to it), then
$E_{p_0} \Pi_n\bigl( p : d(p, p_0) \ge M_n \epsilon_{n,\beta} \mid X_1,\dots,X_n \bigr) \to 0$ for every $M_n \to \infty$.
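
A minimal toy sketch of this posterior when each $\Pi_{n,\alpha}$ is uniform on finitely many densities (normal location families on hypothetical grids), so that the posterior model weights are proportional to $\lambda_{n,\alpha}$ times the marginal likelihood:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(0.3, 1.0, size=100)     # hypothetical data

    def normal_pdf(x, mu):
        return np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)

    grids = {"coarse": np.linspace(-2, 2, 5), "fine": np.linspace(-2, 2, 41)}
    lam = {"coarse": 0.5, "fine": 0.5}     # weights lambda_{n,alpha}

    # Numerator of the posterior model weight: lambda times the integral of
    # the likelihood over the (uniform, discrete) prior on the model.
    numer = {m: lam[m] * np.mean([np.prod(normal_pdf(x, mu)) for mu in g])
             for m, g in grids.items()}
    total = sum(numer.values())
    print({m: v / total for m, v in numer.items()})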

Single Model
Model $\mathcal{P}_{n,\beta}$, prior $\Pi_{n,\beta}$
THEOREM (GGvdV, 2000). If
$\log N(\epsilon_{n,\beta}, \mathcal{P}_{n,\beta}, d) \le E n \epsilon_{n,\beta}^2$ (entropy)
$\Pi_{n,\beta}\bigl( B_{n,\beta}(\epsilon_{n,\beta}) \bigr) \ge e^{-F n \epsilon_{n,\beta}^2}$ (prior mass)
then the posterior rate of convergence is $\epsilon_{n,\beta}$.
Here $B_{n,\alpha}(\epsilon)$ is a Kullback-Leibler ball around $p_0$:
$B_{n,\alpha}(\epsilon) = \Bigl\{ p \in \mathcal{P}_{n,\alpha} : P_0 \log \frac{p_0}{p} \le \epsilon^2,\ P_0 \Bigl( \log \frac{p_0}{p} \Bigr)^2 \le \epsilon^2 \Bigr\}$

Covering Numbers
DEFINITION. The covering number $N(\epsilon, \mathcal{P}, d)$ is the minimal number of balls of radius $\epsilon$ needed to cover the set $\mathcal{P}$.
The rate at which $N(\epsilon, \mathcal{P}, d)$ increases as $\epsilon \downarrow 0$ determines the size of the model:
Parametric model: $(1/\epsilon)^d$
Nonparametric model: $e^{(1/\epsilon)^{1/\alpha}}$, e.g. for smoothness $\alpha$
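
A minimal numeric sketch contrasting the two growth regimes of the log covering number (the dimension $d = 5$ and smoothness $\alpha = 2$ are hypothetical choices):

    import numpy as np

    for eps in (0.1, 0.01, 0.001):
        log_n_parametric = 5 * np.log(1 / eps)        # log N ~ d * log(1/eps)
        log_n_nonparametric = (1 / eps) ** (1 / 2.0)  # log N ~ (1/eps)**(1/alpha)
        print(f"eps={eps}: parametric log N ~ {log_n_parametric:.1f}, "
              f"nonparametric log N ~ {log_n_nonparametric:.1f}")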

Motivation: Entropy
The solution $\epsilon_n$ of $\log N(\epsilon, \mathcal{P}_n, d) = n\epsilon^2$ gives the optimal rate of convergence for the model $\mathcal{P}_n$ in the minimax sense.
Le Cam (1975, 1986), Birgé (1983), Yang and Barron (1999)

Motivation: Prior Mass
$\Pi_n\bigl( B_n(\epsilon_n) \bigr) \ge e^{-n\epsilon_n^2}$ (prior mass)
Need $N(\epsilon_n, \mathcal{P}, d) \approx \exp(n\epsilon_n^2)$ balls to cover the model
Can place $\exp(c\, n\epsilon_n^2)$ balls
If $\Pi_n$ is uniform, then each ball receives mass $\exp(-C n\epsilon_n^2)$

Equivalence of KL and Hellinger
The prior mass condition uses Kullback-Leibler balls, whereas the entropy condition uses $d$-balls. These are typically (almost) equivalent:
If the ratios $p_0/p$ of densities are bounded, then they are fully equivalent.
If $P_0 (p_0/p)^b$ is bounded for some $b > 0$, then they are equivalent up to logarithmic factors.

Single Model (continued)
In the theorem the entropy $\log N(\epsilon, \mathcal{P}_{n,\beta}, d)$ can actually be replaced by the Le Cam dimension
$\sup_{\eta > \epsilon} \log N\bigl( \eta/2, C_{n,\beta}(\eta), d \bigr)$,
where $C_{n,\beta}(\eta)$ is the $d$-ball of radius $\eta$ around $p_0$ in $\mathcal{P}_{n,\beta}$.
The prior mass condition can also be refined.

Adaptation (1)
$A$ finite and ordered, $\epsilon_{n,\alpha} \ge \epsilon_{n,\beta}$ if $\alpha \le \beta$, $n\epsilon_{n,\beta}^2 \to \infty$
Weights $\lambda_{n,\alpha} \propto \lambda_\alpha e^{-C n \epsilon_{n,\alpha}^2}$
Small models get big weights.
THEOREM. If
$\log N(\epsilon_{n,\alpha}, \mathcal{P}_{n,\alpha}, d) \le E n \epsilon_{n,\alpha}^2$ (entropy, every $\alpha$)
$\Pi_{n,\beta}\bigl( B_{n,\beta}(\epsilon_{n,\beta}) \bigr) \ge e^{-F n \epsilon_{n,\beta}^2}$ (prior mass)
then the posterior rate is $\epsilon_{n,\beta}$.
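
A minimal sketch of these weights for the smoothness rates $\epsilon_{n,\alpha} = n^{-\alpha/(2\alpha+1)}$ (the values of $n$, $C$ and the grid of $\alpha$'s are hypothetical): smoother, hence smaller, models indeed receive the larger weights.

    import numpy as np

    n, C = 1000, 1.0
    alphas = np.array([0.5, 1.0, 2.0, 4.0])
    eps = n ** (-alphas / (2 * alphas + 1))
    w = np.exp(-C * n * eps**2)        # lambda_{n,alpha} up to normalization
    w /= w.sum()
    for a, wt in zip(alphas, w):
        print(f"alpha={a}: weight {wt:.3e}")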

Adaptation (2)
Extension to countable $A$ is possible in two ways:
truncation of the weights $\lambda_{n,\alpha}$ to subsets $A_n \subset A$
additional entropy control
Also replace $\beta$ by $\beta_n$.
Always assume $\sum_\alpha (\lambda_\alpha / \lambda_{\beta_n}) \exp(-C n \epsilon_{n,\alpha}^2 / 4) = O(1)$.

Adaptation (2a)-Truncation
$A_n \subset A$, $\beta_n \in A_n$, $\log(\#A_n) \lesssim n\epsilon_{n,\beta_n}^2$, $n\epsilon_{n,\beta_n}^2 \to \infty$
$\lambda_{n,\alpha} \propto \lambda_\alpha e^{-C n \epsilon_{n,\alpha}^2}\, 1_{A_n}(\alpha)$
$\max_{\alpha \in A_n : \epsilon_{n,\alpha}^2 \le H \epsilon_{n,\beta_n}^2} E_\alpha \dfrac{\epsilon_{n,\alpha}^2}{\epsilon_{n,\beta_n}^2} = O(1)$, $H \ge 1$
THEOREM. If
$\log N(\epsilon_{n,\alpha}, \mathcal{P}_{n,\alpha}, d) \le E_\alpha n \epsilon_{n,\alpha}^2$ (entropy, every $\alpha$)
$\Pi_{n,\beta_n}\bigl( B_{n,\beta_n}(\epsilon_{n,\beta_n}) \bigr) \ge e^{-F n \epsilon_{n,\beta_n}^2}$ (prior mass)
then the posterior rate is $\epsilon_{n,\beta_n}$.

Adaptation (2b)-Entropy control
$A$ countable, $n\epsilon_{n,\beta}^2 \to \infty$
$\lambda_{n,\alpha} \propto \lambda_\alpha e^{-C n \epsilon_{n,\alpha}^2}$
THEOREM. If $H \ge 1$ and
$\log N\Bigl( \epsilon_{n,\beta_n}, \bigcup_{\alpha : \epsilon_{n,\alpha} \le H \epsilon_{n,\beta_n}} \mathcal{P}_{n,\alpha}, d \Bigr) \le E n \epsilon_{n,\beta_n}^2$ (entropy)
$\Pi_{n,\beta_n}\bigl( B_{n,\beta_n}(\epsilon_{n,\beta_n}) \bigr) \ge e^{-F n \epsilon_{n,\beta_n}^2}$ (prior mass)
then the posterior rate is $\epsilon_{n,\beta_n}$.

Discrete priors
Discrete priors that are uniform on specially constructed approximating sets are universal, in the sense that under abstract and mild conditions they give the desired result.
To avoid unnecessary logarithmic factors we need to replace ordinary entropy by the slightly more restrictive bracketing entropy.

Bracketing Numbers
Given $l, u : \mathcal{X} \to \mathbb{R}$, the bracket $[l,u]$ is the set of all $p : \mathcal{X} \to \mathbb{R}$ with $l \le p \le u$. An $\epsilon$-bracket relative to $d$ is a bracket $[l,u]$ with $d(u,l) < \epsilon$.
DEFINITION. The bracketing number $N_{[\,]}(\epsilon, \mathcal{P}, d)$ is the minimal number of $\epsilon$-brackets needed to cover $\mathcal{P}$.

Discrete priors
$\mathcal{Q}_{n,\alpha}$ a collection of nonnegative functions with $\log N_{[\,]}(\epsilon_{n,\alpha}, \mathcal{Q}_{n,\alpha}, h) \le E_\alpha n \epsilon_{n,\alpha}^2$
$u_1,\dots,u_N$ a minimal set of $\epsilon_{n,\alpha}$-upper brackets
$\tilde u_1,\dots,\tilde u_N$ the normalized functions
Prior: $\Pi_{n,\alpha}$ uniform on $\tilde u_1,\dots,\tilde u_N$
Model: $\bigcup_{M > 0} M \mathcal{Q}_{n,\alpha}$
THEOREM. If $\lambda_{n,\alpha}$ and $A_n \subset A$ are as before, and $p_0 \in M_0 \mathcal{Q}_{n,\beta}$, then the posterior rate is $\epsilon_{n,\beta}$, relative to the Hellinger distance.

Smoothness Spaces
$\mathbb{B}_1^\alpha$ the unit ball in a Banach space $\mathbb{B}^\alpha$ of functions, with $\log N_{[\,]}(\epsilon_{n,\alpha}, \mathbb{B}_1^\alpha, \|\cdot\|_2) \le E_\alpha n \epsilon_{n,\alpha}^2$
Model: $p \in \mathbb{B}^\alpha$
THEOREM. There exists a prior such that the posterior rate is $\epsilon_{n,\beta}$ whenever $p_0 \in \mathbb{B}^\beta$ for some $\beta > 0$.
EXAMPLE. Hölder spaces and Sobolev spaces of $\alpha$-smooth functions, with $\epsilon_{n,\alpha} = n^{-\alpha/(2\alpha+1)}$. Besov spaces (in progress).

Finite-Dimensional Models
Model $\mathcal{P}_J$ of dimension $J$
Bias: $p_0$ is $\beta$-regular if $d(p_0, \mathcal{P}_J) \lesssim (1/J)^\beta$
Variance: the precision when estimating $J$ parameters is $\sqrt{J/n}$
Bias-variance trade-off: $(1/J)^{2\beta} \asymp J/n$
Optimal dimension $J \asymp n^{1/(2\beta+1)}$, rate $\epsilon_{n,J} \asymp n^{-\beta/(2\beta+1)}$
We want to adapt to $\beta$ by putting weights on $J$.
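
A minimal sketch of the trade-off: scan $J$ and minimize the squared-error proxy $(1/J)^{2\beta} + J/n$ (the values of $n$ and $\beta$ are hypothetical):

    import numpy as np

    n, beta = 10_000, 1.5
    J = np.arange(1, 500)
    risk = (1.0 / J) ** (2 * beta) + J / n      # squared bias + variance proxy
    print("optimal J:", J[np.argmin(risk)])
    print("theory, n**(1/(2*beta+1)):", n ** (1 / (2 * beta + 1)))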

Finite-Dimensional Models
The model dimension can be taken to be the Le Cam dimension
$J \asymp \sup_{\eta > \epsilon} \log N\bigl( \eta/2, \{ p \in \mathcal{P}_J : d(p, p_0) < \eta \}, d \bigr)$.

Finite-Dimensional Models
Models $\mathcal{P}_{J,M}$ of Le Cam dimension $\le A_M J$, $J \in \mathbb{N}$, $M \in \mathcal{M}$
Prior: $\Pi_{J,M}\bigl( B_{J,M}(\epsilon) \bigr) \ge (B_J C_M \epsilon)^J$ for $\epsilon > D_M\, d(p_0, \mathcal{P}_{J,M})$
This corresponds to a smooth prior on the $J$-dimensional model.
Weights: $\lambda_{n,J,M} \propto e^{-C n \epsilon_{n,J,M}^2}\, 1\{ J \le J_n, M \le M_n \}$
$(\log C_M) A_M^{-1} = O(1)$, $B_J \gtrsim J^{-k}$, $\sum_{M \in \mathcal{M}} e^{-H A_M} < \infty$, $\epsilon_{n,J,M} = \sqrt{\dfrac{J \log n}{n}}\, A_M$
THEOREM. If there exist $J_n$ with $J_n \lesssim n$ and $d(p_0, \mathcal{P}_{J_n, M_0}) \lesssim \epsilon_{n, J_n, M_0}$, then the posterior rate is $\epsilon_{n, J_n, M_0}$.

Finite-Dimensional Models: Examples
If $p_0 \in \mathcal{P}_{J_0, M_0}$ for some $J_0$, then the rate is $\sqrt{(\log n)/n}$.
If $d(p_0, \mathcal{P}_{J, M_0}) \lesssim J^{-\beta}$ for every $J$, the rate is $(n/\log n)^{-\beta/(2\beta+1)}$.
If $d(p_0, \mathcal{P}_{J, M_0}) \lesssim e^{-J^\beta}$ for every $J$, then the rate is $(\log n)^{1/\beta + 1/2} / \sqrt{n}$.
Can the logarithmic factors be avoided, by using different weights and/or different model priors?

Splines
$[0,1) = \bigcup_{k=1}^K [(k-1)/K, k/K)$
A spline of order $q$ is a continuous function $f : [0,1] \to \mathbb{R}$ that is $q-2$ times differentiable on $[0,1)$ and whose restriction to every $[(k-1)/K, k/K)$ is a polynomial of degree $< q$.
[Figure: a linear spline]
Splines form a vector space of dimension $J = q + K - 1$, with the B-splines $B_{J,1},\dots,B_{J,J}$ as a convenient basis.
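
A minimal sketch of this basis using scipy (the choices $q = 4$, $K = 5$ are hypothetical); the clamped knot sequence below yields exactly $J = q + K - 1$ B-splines, which sum to one:

    import numpy as np
    from scipy.interpolate import BSpline

    q, K = 4, 5                        # cubic splines on 5 subintervals
    degree, J = q - 1, q + K - 1
    knots = np.concatenate(([0.0] * degree, np.linspace(0, 1, K + 1),
                            [1.0] * degree))
    x = np.linspace(0, 1, 201)
    # Each basis function is a BSpline with a unit coefficient vector.
    basis = np.array([BSpline(knots, np.eye(J)[j], degree)(x) for j in range(J)])
    print(basis.shape)                                   # (J, len(x))
    print(np.allclose(basis[:, :-1].sum(axis=0), 1.0))   # partition of unity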

Splines-Properties
$[0,1) = \bigcup_{k=1}^K [(k-1)/K, k/K)$, $\theta^T B_J = \sum_j \theta_j B_{J,j}$, $\theta \in \mathbb{R}^J$, $J = K + q - 1$
Approximation of smooth functions: if $q \ge \alpha > 0$ and $f \in C^\alpha[0,1]$, then
$\inf_{\theta \in \mathbb{R}^J} \| \theta^T B_J - f \|_\infty \le C_{q,\alpha} \Bigl( \frac{1}{J} \Bigr)^\alpha \| f \|_\alpha$
Equivalence of norms: for any $\theta \in \mathbb{R}^J$,
$\| \theta^T B_J \|_2 \asymp \dfrac{\| \theta \|_2}{\sqrt{J}}$.

Log Spline Models
$[0,1) = \bigcup_{k=1}^K [(k-1)/K, k/K)$, $\theta^T B_J = \sum_j \theta_j B_{J,j}$, $J = K + q - 1$
$p_{J,\theta}(x) = e^{\theta^T B_J(x) - c_J(\theta)}$, $e^{c_J(\theta)} = \int_0^1 e^{\theta^T B_J(x)}\, dx$
A prior on $\theta$ induces a prior on $p_{J,\theta}$ for fixed $J$; a prior on $J$ gives the model weights $\lambda_{n,J}$.
A flat prior on $\theta$ with model weights $\lambda_{n,J}$ as before gives adaptation to smoothness classes up to a logarithmic factor.
Can we do better?
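
A minimal sketch of evaluating $p_{J,\theta}$, with $c_J(\theta)$ computed by numerical integration on a grid (the values of $q$, $K$ and the coefficient vector $\theta$ are hypothetical):

    import numpy as np
    from scipy.interpolate import BSpline

    q, K = 4, 5
    degree, J = q - 1, q + K - 1
    knots = np.concatenate(([0.0] * degree, np.linspace(0, 1, K + 1),
                            [1.0] * degree))

    def B(x):
        # The J B-spline basis functions at x, as a (J, len(x)) array.
        return np.array([BSpline(knots, np.eye(J)[j], degree)(x)
                         for j in range(J)])

    def log_spline_density(x, theta):
        grid = np.linspace(0.0, 1.0, 2001)
        c = np.log(np.mean(np.exp(theta @ B(grid))))   # c_J(theta), grid average
        return np.exp(theta @ B(x) - c)

    theta = np.linspace(-1.0, 1.0, J)
    print(log_spline_density(np.linspace(0, 1, 5), theta))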

Adaptation (3)
$A$ finite and ordered, $\epsilon_{n,\alpha} < \epsilon_{n,\beta}$ if $\alpha > \beta$, $n\epsilon_{n,\alpha}^2 \to \infty$ for every $\alpha$
THEOREM. If
$\sup_{\epsilon \ge \epsilon_{n,\alpha}} \log N\bigl( \epsilon/2, C_{n,\alpha}(\epsilon), d \bigr) \le E n \epsilon_{n,\alpha}^2$, $\alpha \in A$,
$\dfrac{\lambda_{n,\alpha}\, \Pi_{n,\alpha}\bigl( C_{n,\alpha}(B \epsilon_{n,\alpha}) \bigr)}{\lambda_{n,\beta}\, \Pi_{n,\beta}\bigl( B_{n,\beta}(\epsilon_{n,\beta}) \bigr)} = o\bigl( e^{-2 n \epsilon_{n,\beta}^2} \bigr)$, $\alpha < \beta$,
$\dfrac{\lambda_{n,\alpha}\, \Pi_{n,\alpha}\bigl( C_{n,\alpha}(i \epsilon_{n,\alpha}) \bigr)}{\lambda_{n,\beta}\, \Pi_{n,\beta}\bigl( B_{n,\beta}(\epsilon_{n,\beta}) \bigr)} \le e^{i^2 n (\epsilon_{n,\alpha}^2 \wedge \epsilon_{n,\beta}^2)}$, $\alpha \ge \beta$,
then the posterior rate is $\epsilon_{n,\beta}$.
$B_{n,\alpha}(\epsilon)$ and $C_{n,\alpha}(\epsilon)$ are the KL-ball and the $d$-ball in $\mathcal{P}_{n,\alpha}$ around $p_0$.

Log Spline Models
Consider four combinations of priors $\Pi_{n,\alpha}$ on $\theta$ and weights $\lambda_{n,\alpha}$ on $J_{n,\alpha}$ to adapt to smoothness classes:
$J_{n,\alpha} \asymp n^{1/(2\alpha+1)}$, $\epsilon_{n,\alpha} = n^{-\alpha/(2\alpha+1)}$
Assume $p_0$ is $\beta$-smooth and sufficiently regular.

Flat prior, uniform weights
$\Pi_{n,\alpha}$ uniform on $[-M, M]^{J_{n,\alpha}}$, $M$ large
Uniform weights $\lambda_{n,\alpha} = \lambda_\alpha$
THEOREM. The posterior rate is $\epsilon_{n,\beta} \log n$.

Flat prior, decreasing weights
$\Pi_{n,\alpha}$ uniform on $[-M, M]^{J_{n,\alpha}}$, $M$ large
$\lambda_{n,\alpha} \propto \prod_{\gamma < \alpha} (C \epsilon_{n,\gamma})^{J_{n,\gamma}}$, $C > 1$
THEOREM. The posterior rate is $\epsilon_{n,\beta}$.
Small models get small weight!

Discrete priors, increasing weights
$\Pi_{n,\alpha}$ discrete on $\mathbb{R}^J$, with a minimal number of support points to obtain approximation error $\epsilon_{n,\alpha}$
$\lambda_{n,\alpha} \propto \lambda_\alpha e^{-C n \epsilon_{n,\alpha}^2}$
THEOREM. The posterior rate is $\epsilon_{n,\beta}$.
Small models get big weight!
Splines of dimension $J_{n,\alpha}$ give approximation error $\epsilon_{n,\alpha}$. A uniform grid on the coefficients in dimension $J_{n,\alpha}$ that gives approximation error $\epsilon_{n,\alpha}$ is too large; we need a sparse subset. Similarly, a smooth prior on the coefficients in dimension $J_{n,\alpha}$ is too rich.

Special smooth prior, increasing weights
$\Pi_{n,\alpha}$ continuous and uniform on a minimal subset of $\mathbb{R}^J$ that allows approximation with error $\epsilon_{n,\alpha}$
Special, increasing weights $\lambda_{n,\alpha}$
THEOREM (Huang, 2002). The posterior rate is $\epsilon_{n,\beta}$.
Huang obtains this result for the full scale of regularity spaces in a general finite-dimensional setting.

Conclusion
There is a range of weights $\lambda_{n,\alpha}$ that works.
Which weights $\lambda_{n,\alpha}$ work depends on the fine properties of the priors on the models $\mathcal{P}_{n,\alpha}$.

Gaussian mixtures
Model: $p_{F,\sigma}(x) = \int \phi_\sigma(x - z)\, dF(z)$
Prior: $F \sim \text{Dirichlet}(\alpha)$, $\sigma \sim \pi_n$, independent ($\alpha$ Gaussian, $\pi_n$ smooth)
CASE ss: $\pi_n$ fixed
CASE s: $\pi_n$ shrinks at rate $n^{-1/5}$
THEOREM (Ghosal, vdV). The rate of convergence relative to the (truncated) Hellinger distance is:
CASE ss: if $p_0 = p_{\sigma_0, F_0}$, then $(\log n)^k / \sqrt{n}$
CASE s: if $p_0$ is 2-smooth, then $n^{-2/5} (\log n)^2$
(Assume $p_0$ subgaussian.)
Can we adapt to the two cases?
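
A minimal sketch of the mixture model, with a finitely supported $F$ standing in for a draw from the Dirichlet process (the atoms, weights and $\sigma$ are hypothetical):

    import numpy as np

    rng = np.random.default_rng(2)
    atoms = np.array([-1.0, 0.0, 2.0])      # support of F
    weights = np.array([0.3, 0.5, 0.2])     # F's masses
    sigma = 0.5

    def p_mix(x):
        # p_{F,sigma}(x) = sum_j F({z_j}) * phi_sigma(x - z_j)
        x = np.atleast_1d(x)[:, None]
        phi = (np.exp(-(x - atoms)**2 / (2 * sigma**2))
               / (sigma * np.sqrt(2 * np.pi)))
        return phi @ weights

    # Sampling: draw z ~ F, then x ~ N(z, sigma^2).
    z = rng.choice(atoms, size=5, p=weights)
    x = z + sigma * rng.normal(size=5)
    print(p_mix(x))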

Gaussian mixtures
Weights $\lambda_{n,s}$ and $\lambda_{n,ss}$
THEOREM. Adaptation up to logarithmic factors holds if
$\exp\bigl( c (\log n)^k \bigr) < \dfrac{\lambda_{n,ss}}{\lambda_{n,s}} < \exp\bigl( C n^{1/5} (\log n)^k \bigr)$
We believe this works already if
$\exp\bigl( -c (\log n)^k \bigr) < \dfrac{\lambda_{n,ss}}{\lambda_{n,s}} < \exp\bigl( C n^{1/5} (\log n)^k \bigr)$
In particular: equal weights.

Conclusion
There is a range of weights $\lambda_{n,\alpha}$ that works.
Which weights $\lambda_{n,\alpha}$ work depends on the fine properties of the priors on the models $\mathcal{P}_{n,\alpha}$.
This interaction makes comparison with penalized minimum contrast estimation difficult.
We need refined asymptotics and numerical implementation for further understanding.
Is Bayesian density estimation 10 years behind?
