Minimax lower bounds I


1 Minimax lower bounds I Kyoung Hee Kim Sungshin University

2 Outline: 1 Preliminaries; 2 General strategy; 3 Le Cam, Assouad; Appendix

3 Setting
- Family of probability measures {P_θ : θ ∈ Θ} on a sigma-field A
- Estimator ˆθ: a measurable map from Ω to Θ
- L(ˆθ, θ): loss function
- Maximum risk: R(Θ, ˆθ) := sup_{θ∈Θ} E_θ L(ˆθ, θ)
- Minimax risk, with a minimax estimator ˆθ_mm:
  R(Θ) = inf_{˜θ} sup_{θ∈Θ} E_θ L(˜θ, θ) = sup_{θ∈Θ} E_θ L(ˆθ_mm, θ).

4 Why minimax?
- Uniformity excludes super-efficient estimators
- Optimal rates reveal the difficulty of the model of interest
- Guidance on the choice of estimators

5 Various distances
P, Q: two probability measures with densities p, q w.r.t. ν.
- Total variation distance: V(P, Q) = sup_{A∈A} |P(A) − Q(A)| = sup_{A∈A} ∫_A (p − q) dν
- Squared Hellinger distance: h²(P, Q) = ∫ (p^{1/2} − q^{1/2})² dν
- Kullback-Leibler (KL) divergence: KL(P, Q) = ∫ p log(p/q) dν
- Chi-squared distance: χ²(P, Q) = ∫ p²/q dν − 1
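As a quick numerical illustration (a sketch, not part of the slides; the grid and the choice P = N(0, 1), Q = N(0.5, 1) are only for illustration), the four distances can be approximated by Riemann sums of their defining integrals:

```python
import numpy as np

# A numerical sketch (not from the slides): the four distances between
# P = N(0, 1) and Q = N(0.5, 1), approximated by Riemann sums on a fine grid.
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

def normal_pdf(t, mu, sigma=1.0):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

p = normal_pdf(x, 0.0)
q = normal_pdf(x, 0.5)

V = 0.5 * np.sum(np.abs(p - q)) * dx                # total variation distance
h2 = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx    # squared Hellinger distance
KL = np.sum(p * np.log(p / q)) * dx                 # Kullback-Leibler divergence
chi2 = np.sum(p ** 2 / q) * dx - 1.0                # chi-squared distance

print(f"V = {V:.4f}, h^2 = {h2:.4f}, KL = {KL:.4f}, chi^2 = {chi2:.4f}")
```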

6 Relation between distances
P, Q: two probability measures with densities p, q w.r.t. ν.
1. ½ h²(P, Q) ≤ V(P, Q) ≤ h(P, Q) √(1 − h²(P, Q)/4)
2. h²(P, Q) ≤ KL(P, Q) ≤ χ²(P, Q)
3. h²(Pⁿ, Qⁿ) ≤ n h²(P, Q)
4. (Pinsker) V(P, Q) ≤ √(KL(P, Q)/2), and 1 − V(P, Q) ≥ ½ exp(−KL(P, Q)).
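The stated relations can be sanity-checked numerically. The sketch below (not from the slides) uses closed-form expressions for two unit-variance normal distributions and their n-fold products; the choices of delta and n are illustrative:

```python
import math

# Sanity check of relations 1-4 (a sketch, not from the slides), using closed
# forms for P = N(0, 1), Q = N(delta, 1) and their n-fold product measures.
delta, n = 0.8, 5
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

V = 2.0 * Phi(delta / 2.0) - 1.0                # total variation
h2 = 2.0 * (1.0 - math.exp(-delta ** 2 / 8.0))  # squared Hellinger
h = math.sqrt(h2)
KL = delta ** 2 / 2.0                           # Kullback-Leibler
chi2 = math.exp(delta ** 2) - 1.0               # chi-squared
h2_n = 2.0 * (1.0 - (1.0 - h2 / 2.0) ** n)      # Hellinger between the n-fold products

assert 0.5 * h2 <= V <= h * math.sqrt(1.0 - h2 / 4.0)  # relation 1
assert h2 <= KL <= chi2                                 # relation 2
assert h2_n <= n * h2                                   # relation 3
assert V <= math.sqrt(KL / 2.0)                         # Pinsker
assert 1.0 - V >= 0.5 * math.exp(-KL)                   # second bound in relation 4
print("all stated inequalities hold for this example")
```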

7 Relation to Bayes estimators
- Let ˆθ_Π be the Bayes estimator under a prior Π.
- A least favourable prior is one whose Bayes risk (BR) is maximal over all prior distributions.
- If the Bayes risk of ˆθ_Π equals the maximum risk of ˆθ_Π, then ˆθ_Π is minimax and Π is least favourable.
- If the maximum risk of an estimator ˆθ equals lim_n of the Bayes risks of ˆθ_{Π_n} for some sequence of priors Π_n (a limit which is at least the Bayes risk of any prior), then ˆθ is minimax.

8 Examples of minimax estimators
1. Sample mean. Let X_i be i.i.d. N(θ, 1), i = 1, ..., n, and let L(ˆθ, θ) = (ˆθ − θ)². The sample mean X̄ is minimax: with the sequence of priors Π_b = N(µ, b), the Bayes estimator is ˆθ_{Π_b} = (n X̄ + µ/b)/(n + 1/b), and the posterior variance 1/(n + 1/b) converges to 1/n = Var(X̄) as b → ∞.
2. James-Stein estimator. Let X ~ N_d(θ, I_d) with d ≥ 3, and let L(ˆθ, θ) = Σ_{i=1}^d (ˆθ_i − θ_i)². A minimax estimator (dominating X) is ˆθ_mm = (1 − (d − 2)/Σ_{i=1}^d X_i²) X. But ˆθ_mm is not the unique minimax estimator (and is not admissible); for instance, the positive-part estimator (1 − (d − 2)/Σ_{i=1}^d X_i²)_+ X dominates ˆθ_mm.

9 Figure: risk functions for d = 4, plotted against θ: X (MLE, UMVUE), the James-Stein estimator (minimax), and the positive-part James-Stein estimator (minimax).
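The risk comparison in the figure can be approximated by Monte Carlo. The following sketch (not from the slides; d = 4, the number of replications, and the grid of ||θ|| values are illustrative) estimates the risks of X, the James-Stein estimator, and its positive-part version:

```python
import numpy as np

# Monte Carlo sketch (not from the slides) of the risks in the figure:
# X ~ N_d(theta, I_d) with d = 4 and squared-error loss.
rng = np.random.default_rng(0)
d, n_rep = 4, 200_000

def empirical_risks(theta):
    X = theta + rng.standard_normal((n_rep, d))
    sq_norm = np.sum(X ** 2, axis=1, keepdims=True)
    shrink = 1.0 - (d - 2) / sq_norm                 # James-Stein shrinkage factor
    js = shrink * X                                  # James-Stein estimator
    js_plus = np.maximum(shrink, 0.0) * X            # positive-part James-Stein
    risk = lambda est: np.mean(np.sum((est - theta) ** 2, axis=1))
    return risk(X), risk(js), risk(js_plus)

for norm in [0.0, 1.0, 3.0]:
    theta = np.full(d, norm / np.sqrt(d))
    r_x, r_js, r_jsp = empirical_risks(theta)
    print(f"||theta|| = {norm:.1f}: X (MLE) {r_x:.3f}, JS {r_js:.3f}, JS+ {r_jsp:.3f}")
```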

10 Minimax optimality
- It is usually difficult to find the minimax risk and a minimax estimator exactly.
- Typically we are satisfied if we find a good lower bound l(n) on R(Θ) and a good upper bound u(n) attained by a specific estimator ˜θ, so that
  l(n) ≤ R(Θ) ≤ sup_{θ∈Θ} E_θ L(˜θ, θ) ≤ u(n)    (1)
  with lim_n u(n)/l(n) ≤ C.
- If ˜θ satisfies (1), then ˜θ is called a minimax optimal estimator.

11 Possible questions
1. In a nonparametric regression problem where the regression function θ satisfies some smoothness condition (e.g. twice continuously differentiable), with the squared error loss at one point, what is a minimax lower bound? That is,
   sup_{θ∈Θ} E_θ (˜θ(x_0) − θ(x_0))² ≥ ... ?
2. In a density estimation problem where a density f on R is assumed to be s times differentiable, with the integrated squared error loss, what is a minimax lower bound? That is,
   sup_{f∈F_s} E_f ∫ (˜f(x) − f(x))² dx ≥ ... ?

12 Lower bounds via TV distance
- d is a metric on Θ, and d(θ_0, θ_1) ≥ 2δ for θ_0, θ_1 ∈ Θ.
- Denote A_0 := {ω : d(θ_0, ˆθ(ω)) < δ}.
(1) A risk bound implies a TV bound. Suppose P_θ{d(θ, ˆθ) ≥ δ} ≤ ε for all θ ∈ Θ. Then P_{θ_0}(A_0) ≥ 1 − ε and P_{θ_1}(A_0) ≤ ε (since d(θ_0, ˆθ) < δ forces d(θ_1, ˆθ) > δ by the triangle inequality), which shows
  V(P_{θ_0}, P_{θ_1}) ≥ P_{θ_0}(A_0) − P_{θ_1}(A_0) ≥ 1 − 2ε.

13 Lower bounds via TV distance
- d is a metric on Θ, and d(θ_0, θ_1) ≥ 2δ for θ_0, θ_1 ∈ Θ.
- Denote A_0 := {ω : d(θ_0, ˆθ(ω)) < δ}.
(2) A TV bound implies a risk bound. Suppose V(P_{θ_0}, P_{θ_1}) < 1 − 2ε. Then sup_{θ∈Θ} P_θ(d(θ, ˆθ) ≥ δ) =: R(Θ, ˆθ) > ε, since
  2 R(Θ, ˆθ) ≥ P_{θ_0}(d(θ_0, ˆθ) ≥ δ) + P_{θ_1}(d(θ_1, ˆθ) ≥ δ)
            ≥ P_{θ_0}(A_0^c) + P_{θ_1}(A_0)
            ≥ 1 − sup_A (P_{θ_0}(A) − P_{θ_1}(A)) > 2ε.

14 Lower bounds via testing
- The above argument uses the loss L(ξ, θ) = 1_{d(ξ,θ) ≥ δ}, for which L(ξ, θ_0) + L(ξ, θ_1) ≥ 1 for all ξ whenever d(θ_0, θ_1) ≥ 2δ.
- For a general loss function: suppose inf_{ξ∈Θ} (L(ξ, θ_0) + L(ξ, θ_1)) ≥ d(θ_0, θ_1) for a pair θ_0 and θ_1 in Θ. Then
  inf_{ξ∈Θ} ( L(ξ, θ_0)/d(θ_0, θ_1) + L(ξ, θ_1)/d(θ_0, θ_1) ) ≥ 1.

15 Lower bounds via testing
2 R(Θ, ˆθ) ≥ E_{θ_0} L(ˆθ, θ_0) + E_{θ_1} L(ˆθ, θ_1)
           = d(θ_0, θ_1) [ E_{θ_0} L(ˆθ, θ_0)/d(θ_0, θ_1) + E_{θ_1} L(ˆθ, θ_1)/d(θ_0, θ_1) ]
           ≥ d(θ_0, θ_1) inf_{f_0 + f_1 ≥ 1, f_0, f_1 ≥ 0} [ E_{θ_0} f_0 + E_{θ_1} f_1 ].
Note that
  inf_{f_0 + f_1 ≥ 1, f_0, f_1 ≥ 0} (E_{θ_0} f_0 + E_{θ_1} f_1) = ∫ (p_{θ_0} ∧ p_{θ_1}) dν = 1 − ½ ∫ |p_{θ_0} − p_{θ_1}| dν = 1 − V(P_{θ_0}, P_{θ_1}).
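The identity at the bottom of the slide can be checked directly for discrete distributions: the likelihood-ratio test attains the infimum, and the resulting sum of errors equals 1 − V(P_{θ_0}, P_{θ_1}). A small sketch (not from the slides; the two random distributions are illustrative):

```python
import numpy as np

# Sketch (not from the slides): for two discrete distributions, the smallest sum of
# testing errors over tests with f_0 + f_1 = 1 equals sum(min(p, q)) = 1 - V(P, Q).
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(6))      # a random distribution on 6 points
q = rng.dirichlet(np.ones(6))

affinity = np.minimum(p, q).sum()  # integral of p ∧ q
tv = 0.5 * np.abs(p - q).sum()     # total variation V(P, Q)

# the infimum is attained by the likelihood-ratio test: decide theta_1 where q > p
type_I = p[q > p].sum()            # E_{theta_0} f_0 with f_0 = 1{q > p}
type_II = q[q <= p].sum()          # E_{theta_1} f_1 with f_1 = 1{q <= p}

print(f"sum of errors of the LR test = {type_I + type_II:.4f}")
print(f"affinity = {affinity:.4f}, 1 - V = {1.0 - tv:.4f}")
```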

16 Figure: densities of N(θ_0, 1) and N(θ_1, 1); the test that compares the observation with the midpoint (θ_0 + θ_1)/2 has the shaded type I and type II error regions.

17 Testing affinity (TA)
- Testing affinity: ∫ (p_{θ_0} ∧ p_{θ_1}) dν =: ‖P_{θ_0} ∧ P_{θ_1}‖_1
- TA = E_{θ_0} 1_{p_{θ_0} ≤ p_{θ_1}} + E_{θ_1} 1_{p_{θ_1} < p_{θ_0}} = sum of the two testing errors
- If P_{θ_0} = P_{θ_1}, then TA = 1, meaning testing is impossible.
- If supp(P_{θ_0}) = [0, 1] and supp(P_{θ_1}) = [2, 3], then TA = 0, meaning a perfect test exists.
- Calculation of TA for product measures P_{θ_0} = P_0^n, P_{θ_1} = P_1^n:
  (1 − TA)² ≤ n h²(P_0, P_1), with h²(P_0, P_1) ≤ KL(P_0, P_1) ≤ χ²(P_0, P_1) for P_0 ≪ P_1.
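A small numerical sketch of the product-measure bound (not from the slides; P_0 = N(0, 1), P_1 = N(δ, 1) with δ = 0.5 are illustrative choices), using the closed-form TA and Hellinger distance for normals:

```python
import math

# Sketch (not from the slides): TA for n i.i.d. observations from P_0 = N(0, 1)
# versus P_1 = N(delta, 1), compared with the bound (1 - TA)^2 <= n h^2(P_0, P_1).
Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

delta = 0.5
h2 = 2.0 * (1.0 - math.exp(-delta ** 2 / 8.0))  # squared Hellinger for one observation
for n in [1, 5, 20, 50]:
    # TA = 1 - V(P_0^n, P_1^n); for equal-variance normals V = 2 Phi(sqrt(n) delta / 2) - 1
    ta = 2.0 * Phi(-math.sqrt(n) * delta / 2.0)
    print(f"n = {n:3d}: TA = {ta:.4f}, (1 - TA)^2 = {(1.0 - ta) ** 2:.4f} <= n h^2 = {n * h2:.4f}")
```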

18 Reduction strategy
- Restrict attention to a finite subset Θ_F := {θ_α : α ∈ A} ⊂ Θ, where A is a finite index set.
- With an estimator ˆθ, define ˆα := arg min_{α∈A} L(ˆθ, θ_α).
- Assume, with a metric d and some constant δ > 0,
  inf_{ξ∈Θ} (L(ξ, θ_α) + L(ξ, θ_β)) ≥ δ d(θ_α, θ_β) > 0 for all α ≠ β ∈ A.    (2)

19 Reduction strategy
Reduce the maximum risk to the average risk over the finite sub-parameter space:
R(Θ, ˆθ) = ½ sup_{θ∈Θ} E_θ 2L(ˆθ, θ)
         ≥ ½ max_{α∈A} E_{θ_α} 2L(ˆθ, θ_α)
         ≥ ½ max_{α∈A} E_{θ_α} ( L(ˆθ, θ_α) + L(ˆθ, θ_ˆα) )   (by definition of ˆα)
         ≥ (δ/2) max_{α∈A} E_{θ_α} d(θ_ˆα, θ_α)               (by condition (2))
         ≥ (δ/(2|A|)) Σ_{α∈A} E_{θ_α} d(θ_ˆα, θ_α).           (bound the max by the average)

20 Two-point tests
Let A = {0, 1} and d(θ_α, θ_β) = 1_{θ_α ≠ θ_β} for α, β ∈ A. By the above calculation,
R(Θ, ˆθ) ≥ (δ/4) (E_{θ_0} d(θ_ˆα, θ_0) + E_{θ_1} d(θ_ˆα, θ_1))
         = (δ/4) (P_{θ_0}(ˆα = 1) + P_{θ_1}(ˆα = 0))
         ≥ (δ/4) inf { E_{θ_0} f_0 + E_{θ_1} f_1 : 0 ≤ f_0, f_1 ≤ 1, f_0 + f_1 = 1 }
         = (δ/4) ‖P_{θ_0} ∧ P_{θ_1}‖_1.
Thus
R(Θ) ≥ (δ/4) ‖P_{θ_0} ∧ P_{θ_1}‖_1 ≥ (δ/4) (1 − h(P_{θ_0}, P_{θ_1})).    (3)

21 Le Cam's Lemma
Construct A := {θ_0, θ_1} ⊂ Θ such that
1. inf_{ξ∈Θ} (L(ξ, θ_0) + L(ξ, θ_1)) ≥ δ,
2. ‖P_{θ_0} ∧ P_{θ_1}‖_1 ≥ c > 0.
Then, for every estimator ˆθ,
  sup_{θ∈Θ} E_θ L(ˆθ, θ) ≥ c δ / 4.

22 Examples
(1) Normal location model. Let Y_1, ..., Y_n ~ N(θ, 1), where θ ∈ Θ ⊆ R. Then for any estimator ˜θ,
  sup_{θ∈Θ} E_θ (˜θ − θ)² ≥ C n^{−1}.
- P_θ = P̄_θ^n (i.i.d. sample) with P̄_θ = N(θ, 1), and loss L(ˆθ, θ) = (ˆθ − θ)².
- (loss cond.) L(ξ, θ_0) + L(ξ, θ_1) ≥ ½ L(θ_0, θ_1) ≥(?) δ
- (testing cond.) V²(P_{θ_0}, P_{θ_1}) ≤ h²(P_{θ_0}, P_{θ_1}) ≤ n h²(P̄_{θ_0}, P̄_{θ_1}) ≤ n KL(P̄_{θ_0}, P̄_{θ_1}) = n (θ_1 − θ_0)²/2 ≤(?) 1/2
- Fix θ_0 ∈ R and take θ_1 = θ_0 + 1/√n. Then (2) is satisfied with δ = 1/(2n), and h²(P_{θ_0}, P_{θ_1}) ≤ 1/2. By (3), we have R(Θ) ≥ (1/(8n)) (1 − 1/√2).
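The two-point bound for this example can be evaluated numerically. The sketch below (not from the slides; constants are illustrative) computes the bound (3) with θ_1 = θ_0 + 1/√n, using the exact Hellinger tensorization from the appendix, and compares it with the risk 1/n of the sample mean:

```python
import math

# Sketch (not from the slides): Le Cam's two-point bound for Y_1, ..., Y_n ~ N(theta, 1)
# with theta_1 = theta_0 + 1/sqrt(n), compared with the risk 1/n of the sample mean.
for n in [10, 100, 1000]:
    delta = 1.0 / (2.0 * n)                               # loss separation (theta_1 - theta_0)^2 / 2
    h2_single = 2.0 * (1.0 - math.exp(-1.0 / (8.0 * n)))  # h^2 between N(theta_0,1) and N(theta_1,1)
    h2_product = 2.0 * (1.0 - (1.0 - h2_single / 2.0) ** n)  # exact tensorization (appendix lemma)
    lower = (delta / 4.0) * (1.0 - math.sqrt(h2_product))    # bound (3)
    print(f"n = {n:5d}: lower bound {lower:.2e}, risk of the sample mean 1/n = {1.0 / n:.2e}")
```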

23 (2) Nonparametric regression
Let Y_i = θ(x_i) + ε_i for i = 1, ..., n, where ε_i ~ N(0, 1), x_i = i/n, and θ ∈ Θ, with Θ the set of all twice continuously differentiable functions on [0, 1] with |θ''(x)| ≤ M. Then for any estimator ˜θ and any x_0 ∈ [0, 1],
  sup_{θ∈Θ} E_θ (˜θ(x_0) − θ(x_0))² ≥ C n^{−4/5}.
- P_θ = ⊗_{i=1}^n P_{θ,i} with P_{θ,i} = N(θ(i/n), 1), and loss L(ˆθ, θ) = (ˆθ(x_0) − θ(x_0))².
- (loss cond.) L(ξ, θ_0) + L(ξ, θ_1) ≥ ½ L(θ_0, θ_1) ≥(?) δ
- (testing cond.) h²(P_{θ_0}, P_{θ_1}) ≤ KL(P_{θ_0}, P_{θ_1}) ≤(?) 1/2

24 Let θ_0 ≡ 0 on [0, 1] and θ_1(x) = M h_n² K((x − x_0)/h_n), where
  K(t) = a exp( −1/(1 − (2t)²) ) 1_{|2t| ≤ 1},
a is a normalizing constant so that K is a kernel, and h_n = c n^{−1/5}.
- Check θ_0, θ_1 ∈ Θ.
- Check the loss condition: L(ξ, θ_0) + L(ξ, θ_1) ≥ ½ L(θ_0, θ_1) = (M²/2) h_n⁴ K²(0) = (M²/2) h_n⁴ a² exp(−2).
- Check the testing condition:
  KL(P_{θ_0}, P_{θ_1}) = ∫ Π_{i=1}^n φ(u_i) log [ Π_{i=1}^n φ(u_i) / Π_{i=1}^n φ(u_i − θ_1(x_i)) ] du_1 ... du_n
                       = ∫ Π_{i=1}^n φ(u_i) Σ_{i=1}^n ( −u_i θ_1(x_i) + ½ θ_1(x_i)² ) du_1 ... du_n

25 KL(P θ0,p θ1 ) = 1 n θ 1 (x i ) 2 2 i=1 = M 2 n h 4 2 na 2 exp i=1 M 2 2 h4 na 2 n i=1 2 ) 2 1 4( xi x 0 h n 1 {x0 hn 2 i n x 0+ hn 2 } M 2 2 h4 na 2 nh n = 1 2 a2 nh 5 n 1 { xi x 0 hn 2 } Since h n 1/5, by choosing a sufficiently small, we have the lower bound of order h 4 n n 4/5. 22 / 44

26 Multiple tests
Let A = {0, 1}^m and d(θ_α, θ_β) = H(α, β) = Σ_{k=1}^m 1_{α_k ≠ β_k} for α, β ∈ A, e.g. α = (0, 0, ..., 1, 0, 1, ..., 1).
Start from
  R(Θ, ˆθ) ≥ (δ/(2|A|)) Σ_{α∈A} E_{θ_α} d(θ_ˆα, θ_α).

27 R(Θ, ˆθ) ≥ (δ/(2|A|)) Σ_{α∈A} E_{θ_α} d(θ_ˆα, θ_α)
  = (δ/(2·2^m)) Σ_{α∈A} Σ_{k=1}^m E_{θ_α} 1_{ˆα_k ≠ α_k}
  = (δ/4) Σ_{k=1}^m [ 2^{−(m−1)} Σ_{α∈A_{0,k}} E_{θ_α} 1_{ˆα_k ≠ 0} + 2^{−(m−1)} Σ_{α∈A_{1,k}} E_{θ_α} 1_{ˆα_k ≠ 1} ]
  = (δ/4) Σ_{k=1}^m ( E_{P̄_{0,k}} 1_{ˆα_k ≠ 0} + E_{P̄_{1,k}} 1_{ˆα_k ≠ 1} )
  ≥ (δ/4) Σ_{k=1}^m ‖P̄_{0,k} ∧ P̄_{1,k}‖_1
  ≥ (δ/4) m min_{H(α,β)=1} ‖P_{θ_α} ∧ P_{θ_β}‖_1,
where P̄_{i,k} = 2^{−(m−1)} Σ_{α∈A_{i,k}} P_{θ_α} and A_{i,k} = {α ∈ A : α_k = i} for i = 0, 1.

28 Assouad's Lemma
Construct A := {θ_α : α ∈ {0, 1}^m} ⊂ Θ such that
1. inf_{ξ∈Θ} (L(ξ, θ_α) + L(ξ, θ_β)) ≥ δ H(α, β) for all α, β ∈ {0, 1}^m,
2. ‖P_{θ_α} ∧ P_{θ_β}‖_1 ≥ c > 0 whenever H(α, β) = 1.
Then, for every estimator ˆθ,
  sup_{θ∈Θ} E_θ L(ˆθ, θ) ≥ c δ m / 4.
Notation: H(α, β) = ‖α − β‖_0 = Σ_{k=1}^m 1_{α_k ≠ β_k}.

29 Examples
(3) Gaussian sequence model. Let Y_k = θ_k + σ ε_k, where ε_k ~ N(0, 1), σ > 0, and θ = (θ_1, θ_2, ...) ∈ Θ = {θ : Σ_k k^{2s} θ_k² ≤ M}, with M a constant. Then for any estimator ˜θ,
  sup_{θ∈Θ} E_θ ‖˜θ − θ‖_2² ≥ C σ^{4s/(1+2s)}.
- P_{θ_k} = N(θ_k, σ²) for k = 1, 2, ..., and P_θ = ⊗_{k∈N} P_{θ_k}.
- Loss L(θ, t) = Σ_k (θ_k − t_k)², satisfying L(θ_α, t) + L(θ_β, t) ≥ ½ L(θ_α, θ_β).

30 We need to construct θ_α ∈ Θ, α ∈ {0, 1}^m, such that
1. Σ_k (θ_{α,k} − θ_{β,k})² ≥ δ ‖α − β‖_0 for all α, β ∈ {0, 1}^m,
2. ‖P_{θ_α} ∧ P_{θ_β}‖_1 ≥ c > 0 whenever ‖α − β‖_0 = 1.
Construct a finite subset by setting
  θ_α = (θ_{α,1}, θ_{α,2}, ..., θ_{α,m}, 0, 0, ...) = ε (α_1, α_2, ..., α_m, 0, 0, ...).
Then, with δ = ½ ε²,
  ½ Σ_k (θ_{α,k} − θ_{β,k})² = ½ ε² H(α, β).

31 TA calculation: w.l.o.g. α_1 ≠ β_1; then
  ‖P_{θ_α} ∧ P_{θ_β}‖_1 = ‖(P_0 ⊗ Q) ∧ (P_ε ⊗ Q)‖_1 = ‖P_0 ∧ P_ε‖_1,
where Q is the common law of the remaining coordinates (including ⊗_{k>m} P_0). With normal distributions, taking ε = c_0 σ for σ > 0,
  ‖P_0 ∧ P_ε‖_1 = 2 ∫_{−∞}^{ε/2} φ_{σ²}(x − ε) dx = 2 (1 − Φ(ε/(2σ))) > 0.

32 Check θ_α ∈ Θ by letting m ≍ (M ε^{−2})^{1/(1+2s)}, since
  Σ_k k^{2s} θ_{α,k}² ≤ Σ_{k=1}^m k^{2s} ε² ≲ m^{1+2s} ε² ≤ M.
The lower bound is then of order m δ ≍ m ε² ≍ ε^{4s/(1+2s)} ≍ σ^{4s/(1+2s)}, which gives n^{−2s/(1+2s)} with the usual choice σ = n^{−1/2}.
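Putting the pieces together, the Assouad bound for this example can be computed numerically. A sketch (not from the slides; s = 1, M = 1, c_0 = 1 are illustrative constants, and the function name assouad_bound is my own):

```python
import math

# Sketch (not from the slides): the Assouad bound for the Gaussian sequence model,
# with illustrative constants; assouad_bound is my own helper name.
def assouad_bound(sigma, s=1.0, M=1.0, c0=1.0):
    eps = c0 * sigma
    m = max(1, int((M / eps ** 2) ** (1.0 / (1.0 + 2.0 * s))))  # number of perturbed coordinates
    delta = 0.5 * eps ** 2                                       # loss separation per flipped bit
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    c = 2.0 * (1.0 - Phi(eps / (2.0 * sigma)))                   # TA for one flipped coordinate
    return c * delta * m / 4.0                                   # Assouad: c * delta * m / 4

for n in [100, 1000, 10000]:
    sigma = n ** -0.5
    print(f"n = {n:6d}: lower bound {assouad_bound(sigma):.2e}, "
          f"rate n^(-2/3) = {n ** (-2.0 / 3.0):.2e}  (s = 1)")
```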

33 (4) Smooth density estimation
Let Y_1, ..., Y_n be an i.i.d. sample from a density f ∈ F_s, where F_s is a class of s-times differentiable densities satisfying the Hölder-type condition
  ∫ ( f^{(s−1)}(x + h) − f^{(s−1)}(x) )² dx ≤ (M |h|)².
Then
  sup_{f∈F_s} E_f ∫ (˜f(x) − f(x))² dx ≥ C n^{−2s/(1+2s)}.
- P_f = P̄_f^n, where P̄_f has density f.
- Loss L(f, g) = ∫ (f(x) − g(x))² dx, satisfying L(ξ, f_α) + L(ξ, f_β) ≥ ½ L(f_α, f_β).

34 Typical construction: small perturbations of a base density,
  f_α(x) = f_0(x) + ε Σ_{k=1}^m α_k ϕ_k(x).
For the loss separation condition,
  ½ ∫ (f_α(x) − f_β(x))² dx = ½ ε² ∫ ( Σ_{k=1}^m (α_k − β_k) ϕ_k(x) )² dx ≥(?) δ Σ_{k=1}^m (α_k − β_k)².
This is convenient if ∫ ϕ_k ϕ_{k'} = V 1_{k=k'}, so that δ = V ε²/2.

35 Ibragimov and Has'minskii (1983)
Construct perturbed uniform densities on [0, 1]:
  f_0(x) = 1,  ϕ_k(x) = ϕ(mx − k) for k = 1, ..., m − 1,
where
  ϕ(x) = c_0 exp( −1/x − 1/(1 − 2x) ) 1_{0 ≤ x ≤ 1/2},  extended by ϕ(−x) = −ϕ(x),
and c_0 is small enough that ϕ ∈ F.
Perturbation function ϕ: infinitely differentiable, with ∫_{−1/2}^{1/2} ϕ(x) dx = 0, so
  ∫_0^1 f_α(x) dx = ∫_0^1 [ 1 + ε Σ_k α_k ϕ_k(x) ] dx = 1,  with f_α(x) ≥ c_1 > 0.
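A small numerical sketch of this construction (not from the slides; the constants c_0, ε, m and the grid are illustrative) verifies that f_α integrates to one and stays bounded away from zero:

```python
import numpy as np

# Sketch (not from the slides): the perturbed uniform densities f_alpha on [0, 1];
# the constants c0, eps, m and the grid are illustrative.
c0, eps, m = 0.05, 0.5, 8

def phi(x):
    # odd, infinitely differentiable bump supported on [-1/2, 1/2], with integral 0
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = (x > 0.0) & (x < 0.5)
    out[pos] = c0 * np.exp(-1.0 / x[pos] - 1.0 / (1.0 - 2.0 * x[pos]))
    neg = (x > -0.5) & (x < 0.0)
    out[neg] = -c0 * np.exp(1.0 / x[neg] - 1.0 / (1.0 + 2.0 * x[neg]))
    return out

def f_alpha(x, alpha):
    # perturbed uniform density: 1 + eps * sum_k alpha_k * phi(m x - k)
    pert = sum(a_k * phi(m * x - k) for k, a_k in enumerate(alpha, start=1))
    return 1.0 + eps * pert

rng = np.random.default_rng(1)
alpha = rng.integers(0, 2, size=m - 1)             # a vertex of {0, 1}^(m-1)
x = np.linspace(0.0, 1.0, 100001)
fx = f_alpha(x, alpha)
print("average of f_alpha on [0,1] ~", fx.mean())  # integral should be ~ 1
print("minimum of f_alpha          ~", fx.min())   # bounded away from 0
```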

36 Figure: the perturbation function ϕ.

37 Figure: the perturbation added on the first interval.

38 Figure: the perturbation added on every interval, giving a perturbed density f_α.

39 If ∫ ϕ_k ϕ_{k'} = 1_{k=k'} C/m, then δ = C ε²/(2m).
For the testing condition with ‖α − β‖_0 = 1, we need (per observation)
  χ²(P_{f_β}, P_{f_α}) := ∫ (f_α − f_β)²/f_α dx ≤ (1/c_1) ∫ (f_α − f_β)² dx ≍ (1/c_1) δ ‖α − β‖_0 = δ/c_1 ≲ 1/n,
that is, δ = C ε²/(2m) ≲ 1/n, which yields a lower bound of order m δ ≍ ε² ≍ m/n up to constants.

40 Check f_α ∈ F_s. Since ϕ_k^{(s−1)}(x) = m^{s−1} ϕ^{(s−1)}(mx − k),
  ∫ ( f_α^{(s−1)}(x + h) − f_α^{(s−1)}(x) )² dx = ε² ∫ ( Σ_k α_k ( ϕ_k^{(s−1)}(x + h) − ϕ_k^{(s−1)}(x) ) )² dx
    ≲ ε² m^{2s−2} ∫ ( ϕ^{(s−1)}(x + mh) − ϕ^{(s−1)}(x) )² dx
    ≤ ε² m^{2s−2} (M m|h|)²   (since ϕ ∈ F)
    = M² ε² m^{2s} h² ≤(?) (M |h|)²,
which holds if m ≲ ε^{−1/s}.
From the testing condition, 1/n ≍ ε²/m ≍ ε^{2+1/s}; that is, ε ≍ n^{−s/(1+2s)}. The lower bound is then ε² ≍ n^{−2s/(1+2s)}.
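The rate bookkeeping in the last two lines can be checked numerically: with ε ≍ n^{−s/(1+2s)} and m ≍ ε^{−1/s}, both the smoothness constraint ε² m^{2s} ≲ 1 and the testing constraint n ε²/m ≲ 1 hold, so the lower bound ε² ≍ n^{−2s/(1+2s)} is achieved. A sketch (not from the slides; the values of s and n are illustrative):

```python
# Sketch (not from the slides): checking the rate bookkeeping. With eps ~ n^(-s/(1+2s))
# and m ~ eps^(-1/s), both constraints stay O(1), so the bound eps^2 ~ n^(-2s/(1+2s)) holds.
for s in [1, 2]:
    for n in [10 ** 4, 10 ** 6]:
        eps = n ** (-s / (1.0 + 2.0 * s))
        m = max(1, round(eps ** (-1.0 / s)))
        smoothness = eps ** 2 * m ** (2 * s)   # should stay O(1)  (eps^2 m^{2s} <= const)
        testing = n * eps ** 2 / m             # should stay O(1)  (eps^2 / m <= const / n)
        print(f"s = {s}, n = {n:>7}: eps^2 = {eps ** 2:.2e}, "
              f"eps^2 m^(2s) = {smoothness:.2f}, n eps^2 / m = {testing:.2f}")
```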

41 Lemma
V(P, Q) = ½ ∫ |p − q| dν = 1 − ∫ (p ∧ q) dν.
Proof.
1. Let A_0 = {x ∈ X : p(x) ≥ q(x)}. Then V ≥ P(A_0) − Q(A_0) = ½ ∫ |p − q| dν = 1 − ∫ (p ∧ q) dν.
2. For all A ∈ A,
   ∫_A (p − q) dν = ∫_{A ∩ A_0} (p − q) + ∫_{A ∩ A_0^c} (p − q) ≤ max{ ∫_{A_0} (p − q), ∫_{A_0^c} (q − p) } = ½ ∫ |p − q|.
Combining 1 and 2 proves the claim.

42 Lemma
½ h²(P, Q) ≤ V(P, Q) ≤ h(P, Q) √(1 − h²(P, Q)/4).
Proof.
1. ½ h²(P, Q) = 1 − ∫ √(pq) = 1 − ( ∫_{p>q} √(pq) + ∫_{q≥p} √(pq) ) ≤ 1 − ( ∫_{p>q} q + ∫_{q≥p} p ) = 1 − ∫ (p ∧ q) = V(P, Q).
2. (1 − ½ h²(P, Q))² = ( ∫ √((p ∧ q)(p ∨ q)) )² ≤ ∫ (p ∧ q) · ∫ (p ∨ q) = (1 − V)(1 + V) = 1 − V².
Thus V(P, Q)² ≤ 1 − (1 − ½ h²(P, Q))² = h²(P, Q) (1 − ¼ h²(P, Q)).

43 Lemma
h²(P, Q) ≤ KL(P, Q) ≤ χ²(P, Q).
Proof.
1. KL(P, Q) = ∫_{pq>0} p log(p/q) = −2 ∫_{pq>0} p log √(q/p) ≥ −2 ∫_{pq>0} p ( √(q/p) − 1 ) = −2 ∫ √(pq) + 2 = h²(P, Q).
2. KL(P, Q) = ∫_{pq>0} p log(p/q) ≤ ∫_{pq>0} p ( p/q − 1 ) ≤ ∫ p²/q dν − 1 = χ²(P, Q).

44 Lemma
h²(Pⁿ, Qⁿ) = 2 ( 1 − (1 − h²(P, Q)/2)ⁿ ) ≤ n h²(P, Q).
Proof.
1. 1 − ½ h²(Pⁿ, Qⁿ) = ∫ √(pⁿ qⁿ) (the product densities) = ( ∫ √(pq) )ⁿ = ( 1 − ½ h²(P, Q) )ⁿ.
2. Use (1 − a)ⁿ ≥ 1 − na for 0 ≤ a < 1.

45 Lemma (Pinsker)
V(P, Q) ≤ √( KL(P, Q)/2 ).
Proof.
1. Let ψ(x) = x log x − x + 1 for x ≥ 0, with 0 log 0 := 0. Note that ψ(0) = 1, ψ(1) = 0, ψ'(1) = 0, ψ''(x) = 1/x ≥ 0, and ψ(x) ≥ 0 for all x ≥ 0.
2. Use (4/3 + 2x/3) ψ(x) ≥ (x − 1)² for x ≥ 0. If P ≪ Q,
   V(P, Q) = ½ ∫ |p − q| = ½ ∫_{q>0} q |p/q − 1|
           ≤ ½ ∫_{q>0} q √( (4/3 + 2p/(3q)) ψ(p/q) )
           ≤ ½ ( ∫_{q>0} (4q/3 + 2p/3) )^{1/2} ( ∫_{q>0} q ψ(p/q) )^{1/2}
           = ½ √2 · ( KL(P, Q) )^{1/2} = √( KL(P, Q)/2 ).

46 Figure: ψ(x) = x log x − x + 1 compared with (x − 1)² and with (4/3 + 2x/3) ψ(x), illustrating the inequality (4/3 + 2x/3) ψ(x) ≥ (x − 1)².

47 Lemma
1 − V(P, Q) ≥ ½ exp( −KL(P, Q) ).
Proof.
1. W.l.o.g. KL(P, Q) < ∞ (i.e. P ≪ Q).
2. ( ∫ √(pq) )² = exp( 2 log ∫_{pq>0} p √(q/p) ) ≥ exp( 2 ∫_{pq>0} p log √(q/p) ) = exp( −KL(P, Q) ).
3. Use 2 ∫ (p ∧ q) ≥ ∫ (p ∧ q) · ∫ (p ∨ q) ≥ ( ∫ √((p ∧ q)(p ∨ q)) )² = ( ∫ √(pq) )² ≥ exp( −KL(P, Q) ), together with 2 ∫ (p ∧ q) = 2 (1 − V(P, Q)).

48 Theorem
Suppose Π is a distribution on Θ such that ∫ R(θ, δ_Π) dΠ(θ) = sup_θ R(θ, δ_Π). Then
1. δ_Π is minimax.
2. Π is least favourable.
Proof.
(i) Let δ be any other procedure. Then sup_θ R(θ, δ) ≥ ∫ R(θ, δ) dΠ(θ) ≥ ∫ R(θ, δ_Π) dΠ(θ) = sup_θ R(θ, δ_Π).
(ii) Let Π' be some other distribution on Θ. Then ∫ R(θ, δ_{Π'}) dΠ'(θ) ≤ ∫ R(θ, δ_Π) dΠ'(θ) ≤ sup_θ R(θ, δ_Π) = ∫ R(θ, δ_Π) dΠ(θ).

49 Theorem
Suppose {Π_n} is a sequence of prior distributions with Bayes risks ∫ R(θ, δ_n) dΠ_n(θ) =: r_{Π_n}, and that δ is an estimator for which sup_θ R(θ, δ) = lim_n r_{Π_n}. Then
1. δ is minimax.
2. The sequence {Π_n} is least favourable.
Proof.
(i) Suppose δ' is any other estimator. Then sup_θ R(θ, δ') ≥ ∫ R(θ, δ') dΠ_n(θ) ≥ r_{Π_n}, and this holds for every n. Hence sup_θ R(θ, δ') ≥ sup_θ R(θ, δ), so δ is minimax.
(ii) If Π is any distribution, then ∫ R(θ, δ_Π) dΠ(θ) ≤ ∫ R(θ, δ) dΠ(θ) ≤ sup_θ R(θ, δ) = lim_n r_{Π_n}. The claim holds by definition.
