Minimax lower bounds I
1 Minimax lower bounds I
Kyoung Hee Kim, Sungshin University
2 Outline

1. Preliminaries
2. General strategy
3. Le Cam, Assouad, Appendix
3 Setting

- Family of probability measures $\{P_\theta : \theta \in \Theta\}$ on a sigma field $\mathcal{A}$
- Estimator $\hat\theta$: measurable map from $\Omega$ to $\Theta$
- $L(\hat\theta, \theta)$: loss function
- Maximum risk: $R(\Theta, \hat\theta) := \sup_{\theta \in \Theta} E_\theta L(\hat\theta, \theta)$
- Minimax risk, with a minimax estimator $\hat\theta_{mm}$:
$$R(\Theta) = \inf_{\tilde\theta} \sup_{\theta \in \Theta} E_\theta L(\tilde\theta, \theta) = \sup_{\theta \in \Theta} E_\theta L(\hat\theta_{mm}, \theta).$$
4 Why minimax?

- Uniformity excludes super-efficient estimators
- Optimal rates reveal the difficulty of the model of interest
- Guidance on the choice of estimators
5 Various distances

$P, Q$: two probability measures with densities $p, q$ w.r.t. $\nu$.

- Total variation distance: $V(P,Q) = \sup_{A \in \mathcal{A}} |P(A) - Q(A)| = \sup_{A \in \mathcal{A}} \left| \int_A (p - q)\, d\nu \right|$
- Squared Hellinger distance: $h^2(P,Q) = \int (p^{1/2} - q^{1/2})^2\, d\nu$
- Kullback-Leibler (KL) divergence: $KL(P,Q) = \int p \log \frac{p}{q}\, d\nu$
- Chi-squared ($\chi^2$) distance: $\chi^2(P,Q) = \int \frac{p^2}{q}\, d\nu - 1$
6 Relation between distances

$P, Q$: two probability measures with densities $p, q$ w.r.t. $\nu$.

1. $\frac{1}{2} h^2(P,Q) \le V(P,Q) \le h(P,Q)\sqrt{1 - \frac{h^2(P,Q)}{4}}$
2. $h^2(P,Q) \le KL(P,Q) \le \chi^2(P,Q)$
3. $h^2(P^n, Q^n) \le n\, h^2(P,Q)$
4. (Pinsker) $V(P,Q) \le \sqrt{KL(P,Q)/2}$, and $1 - V(P,Q) \ge \frac{1}{2}\exp(-KL(P,Q))$.
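As a quick sanity check on relations 1, 2 and 4, the sketch below (my own illustration, not from the slides; it assumes numpy and scipy are available) computes all four distances for $N(0,1)$ versus $N(1,1)$ by numerical integration on a grid.

```python
import numpy as np
from scipy.stats import norm

# Grid approximation of the densities p = N(0,1), q = N(1,1).
x = np.linspace(-12.0, 12.0, 200_001)
dx = x[1] - x[0]
p = norm.pdf(x, loc=0.0)
q = norm.pdf(x, loc=1.0)

V = 0.5 * np.sum(np.abs(p - q)) * dx              # total variation
h2 = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx  # squared Hellinger
KL = np.sum(p * np.log(p / q)) * dx               # Kullback-Leibler
chi2 = np.sum(p ** 2 / q) * dx - 1.0              # chi-squared

h, tol = np.sqrt(h2), 1e-6
assert 0.5 * h2 <= V + tol and V <= h * np.sqrt(1 - h2 / 4) + tol  # relation 1
assert h2 <= KL + tol and KL <= chi2 + tol                         # relation 2
assert V <= np.sqrt(KL / 2) + tol                                  # Pinsker
assert 1 - V >= 0.5 * np.exp(-KL) - tol                            # relation 4
print(V, h2, KL, chi2)  # ~0.383, ~0.235, 0.5, ~1.718
```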
7 Relation to Bayes estimators

- Let $\hat\theta_\Pi$ be the Bayes estimator using a prior $\Pi$.
- A least favourable prior is one whose Bayes risk (BR) is maximal over all prior distributions.
- If the Bayes risk of $\hat\theta_\Pi$ is equal to the maximum risk of $\hat\theta_\Pi$, then $\hat\theta_\Pi$ is minimax and $\Pi$ is least favourable.
- If the maximum risk of $\hat\theta$ equals $\lim_{n \to \infty}$ of the Bayes risks of $\hat\theta_{\Pi_n}$ for a sequence of priors $\{\Pi_n\}$ (a limit which is at least the Bayes risk for any single prior), then $\hat\theta$ is minimax.
8 Examples of minimax estimators

1. Sample mean. Let $X_i$ be i.i.d. $N(\theta, 1)$, $i = 1, \dots, n$, and let $L(\hat\theta, \theta) = (\hat\theta - \theta)^2$. Then $\bar X$ is minimax, shown via the sequence of priors $\Pi_b = N(\mu, b)$: here $\hat\theta_{\Pi_b} = (n \bar X + \mu/b)/(n + 1/b)$, and the posterior variance $1/(n + 1/b)$ converges to $1/n = \mathrm{Var}(\bar X)$ as $b \to \infty$.
2. James-Stein estimator. Let $X \sim N_d(\theta, I_d)$ with $d \ge 3$, and let $L(\hat\theta, \theta) = \sum_{i=1}^d (\hat\theta_i - \theta_i)^2$. A minimax estimator dominating $X$ is
$$\hat\theta_{mm} = \left(1 - \frac{d-2}{\sum_{i=1}^d X_i^2}\right) X.$$
But $\hat\theta_{mm}$ is not the unique minimax estimator (and is not admissible). For instance, the positive-part estimator $\left(1 - \frac{d-2}{\sum_{i=1}^d X_i^2}\right)_+ X$ dominates $\hat\theta_{mm}$.
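The following Monte Carlo sketch (my own illustration; the sample size and $\theta$ values are arbitrary choices) reproduces the risk comparison behind the figure on the next slide: for $d = 4$, both James-Stein estimators beat $X$ near $\theta = 0$, and their risks approach $d$ as $\|\theta\|$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 4, 200_000

def risks(theta):
    """Monte Carlo risks of X, James-Stein, and positive-part James-Stein."""
    X = theta + rng.standard_normal((reps, d))
    shrink = 1.0 - (d - 2) / np.sum(X ** 2, axis=1, keepdims=True)
    estimators = {"X": X, "JS": shrink * X, "JS+": np.maximum(shrink, 0.0) * X}
    return {name: np.mean(np.sum((est - theta) ** 2, axis=1))
            for name, est in estimators.items()}

for r in [0.0, 2.0, 5.0]:                 # ||theta|| = r
    theta = np.full(d, r / np.sqrt(d))
    print(r, risks(theta))                # risk of X is exactly d = 4
```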
9 [Figure: risk functions for $d = 4$, plotted against $\theta$: $X$ (MLE, UMVUE), the James-Stein estimator (minimax), and the positive-part James-Stein estimator (minimax).]
10 Minimax optimality

- It is usually difficult to find the minimax risk and a minimax estimator exactly.
- We are typically satisfied if we find a good lower bound $l(n)$ on $R(\Theta)$ and a good upper bound $u(n)$ attained by a specific estimator $\tilde\theta$:
$$l(n) \le R(\Theta) \le \sup_{\theta \in \Theta} E_\theta L(\tilde\theta, \theta) \le u(n) \qquad (1)$$
with $\lim_{n \to \infty} u(n)/l(n) \le C$.
- If $\tilde\theta$ satisfies (1), then $\tilde\theta$ is called a minimax optimal estimator.
11 Possible questions

1. In a nonparametric regression problem where the regression function $\theta$ satisfies some smoothness conditions (e.g. twice continuously differentiable), with the squared error loss at one point, what is a minimax lower bound? That is,
$$\sup_{\theta \in \Theta} E_\theta \left( \tilde\theta(x_0) - \theta(x_0) \right)^2 \ge \, ?$$
2. In a density estimation problem where a density $f$ on $\mathbb{R}$ is assumed to be $s$ times differentiable, with the integrated squared error loss, what is a minimax lower bound? That is,
$$\sup_{f \in \mathcal{F}_s} E_f \int \left( \tilde f(x) - f(x) \right)^2 dx \ge \, ?$$
12 Lower bounds via TV distance

- $d$ is a metric on $\Theta$
- $d(\theta_0, \theta_1) \ge 2\delta$ where $\theta_0, \theta_1 \in \Theta$
- Denote $A_0 := \{\omega : d(\theta_0, \hat\theta(\omega)) < \delta\}$.

(1) A risk bound implies a TV bound. Suppose
$$P_\theta\{d(\theta, \hat\theta) \ge \delta\} \le \epsilon, \quad \forall\, \theta \in \Theta.$$
Then $P_{\theta_0} A_0 \ge 1 - \epsilon$ and $P_{\theta_1} A_0 \le \epsilon$, which shows
$$V(P_{\theta_0}, P_{\theta_1}) \ge P_{\theta_0} A_0 - P_{\theta_1} A_0 \ge 1 - 2\epsilon.$$
13 Lower bounds via TV distance

Same setting as above.

(2) A TV bound implies a risk bound. Suppose $V(P_{\theta_0}, P_{\theta_1}) < 1 - 2\epsilon$. Then
$$\sup_{\theta \in \Theta} P_\theta\left( d(\theta, \hat\theta) \ge \delta \right) =: R(\Theta, \hat\theta) > \epsilon,$$
since
$$2 R(\Theta, \hat\theta) \ge P_{\theta_0}\left( d(\theta_0, \hat\theta) \ge \delta \right) + P_{\theta_1}\left( d(\theta_1, \hat\theta) \ge \delta \right) \ge P_{\theta_0} A_0^c + P_{\theta_1} A_0 \ge 1 - \sup_A \left( P_{\theta_0} A - P_{\theta_1} A \right) > 2\epsilon.$$
14 Lower bounds via testing

The above argument uses the loss $L(\xi, \theta) = 1_{\{d(\xi, \theta) \ge \delta\}}$, which satisfies $L(\xi, \theta_0) + L(\xi, \theta_1) \ge 1$ for all $\xi$ whenever $d(\theta_0, \theta_1) \ge 2\delta$.

For a general loss function, suppose
$$\inf_{\xi \in \Theta} \left( L(\xi, \theta_0) + L(\xi, \theta_1) \right) \ge d(\theta_0, \theta_1)$$
for a pair $\theta_0, \theta_1 \in \Theta$. Then
$$\inf_{\xi \in \Theta} \left( \frac{L(\xi, \theta_0)}{d(\theta_0, \theta_1)} + \frac{L(\xi, \theta_1)}{d(\theta_0, \theta_1)} \right) \ge 1.$$
15 Lower bounds via testing

$$2 R(\Theta, \hat\theta) \ge E_{\theta_0} L(\hat\theta, \theta_0) + E_{\theta_1} L(\hat\theta, \theta_1) = d(\theta_0, \theta_1) \left[ E_{\theta_0} \frac{L(\hat\theta, \theta_0)}{d(\theta_0, \theta_1)} + E_{\theta_1} \frac{L(\hat\theta, \theta_1)}{d(\theta_0, \theta_1)} \right] \ge d(\theta_0, \theta_1) \inf_{f_0 + f_1 \ge 1,\ f_0, f_1 \ge 0} \left[ E_{\theta_0} f_0 + E_{\theta_1} f_1 \right].$$

Note that
$$\inf_{f_0 + f_1 \ge 1,\ f_0, f_1 \ge 0} E_{\theta_0} f_0 + E_{\theta_1} f_1 = \int (p_{\theta_0} \wedge p_{\theta_1})\, d\nu = 1 - \frac{1}{2} \int |p_{\theta_0} - p_{\theta_1}|\, d\nu = 1 - V(P_{\theta_0}, P_{\theta_1}).$$
16 [Figure: densities of $N(\theta_0, 1)$ and $N(\theta_1, 1)$ with the test cutoff at the midpoint $(\theta_0 + \theta_1)/2$; the two shaded tail regions are error I and error II.]
17 Testing affinity (TA)

- Testing affinity: $\int (p_{\theta_0} \wedge p_{\theta_1})\, d\nu =: \|P_{\theta_0} \wedge P_{\theta_1}\|_1$
- $TA = E_{\theta_0} 1_{\{p_{\theta_0} \le p_{\theta_1}\}} + E_{\theta_1} 1_{\{p_{\theta_1} < p_{\theta_0}\}}$ = sum of the two testing errors
- If $P_{\theta_0} = P_{\theta_1}$, then $TA = 1$, meaning testing is impossible
- If $\mathrm{supp}(P_{\theta_0}) = [0,1]$ and $\mathrm{supp}(P_{\theta_1}) = [2,3]$, then $TA = 0$, meaning a perfect test exists
- Calculation of TA for product measures $P_{\theta_0} = P_0^n$, $P_{\theta_1} = P_1^n$:
$$(1 - TA)^2 \le n\, h^2(P_0, P_1), \qquad h^2(P_0, P_1) \le KL(P_0, P_1) \le \chi^2(P_0, P_1) \ \text{ for } P_0 \ll P_1.$$
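For two normal location measures the testing affinity has a closed form; the sketch below (my own illustration, not from the slides) computes it as the sum of the two errors of the midpoint test shown in the previous figure.

```python
from scipy.stats import norm

def testing_affinity(theta0, theta1, sigma=1.0):
    """TA for N(theta0, sigma^2) vs N(theta1, sigma^2): the likelihood-ratio
    test cuts at the midpoint m, and TA = (type I error) + (type II error)."""
    m = 0.5 * (theta0 + theta1)
    e1 = 1.0 - norm.cdf(m, loc=min(theta0, theta1), scale=sigma)  # type I
    e2 = norm.cdf(m, loc=max(theta0, theta1), scale=sigma)        # type II
    return e1 + e2

# Equals 2 * (1 - Phi(|theta1 - theta0| / (2 * sigma))).
print(testing_affinity(0.0, 1.0))  # ~0.617 = 1 - V(N(0,1), N(1,1))
```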
18 Reduction strategy

Restrict attention to a finite subset
$$\Theta_F := \{\theta_\alpha : \alpha \in A\} \subset \Theta,$$
where $A$ is a finite index set. With an estimator $\hat\theta$, define
$$\hat\alpha := \arg\min_{\alpha \in A} L(\hat\theta, \theta_\alpha).$$
Assume, with a metric $d$ and some constant $\delta > 0$,
$$\inf_{\xi \in \Theta} \left( L(\xi, \theta_\alpha) + L(\xi, \theta_\beta) \right) \ge \delta\, d(\theta_\alpha, \theta_\beta) > 0, \quad \forall\, \alpha \ne \beta \in A. \qquad (2)$$
19 Reduction strategy

Reduce the maximum risk to the average risk over the finite sub-parameter space:
$$\begin{aligned}
R(\Theta, \hat\theta) &= \frac{1}{2} \sup_{\theta \in \Theta} E_\theta\, 2 L(\hat\theta, \theta) \ge \frac{1}{2} \max_{\alpha \in A} E_{\theta_\alpha}\, 2 L(\hat\theta, \theta_\alpha) \\
&\ge \frac{1}{2} \max_{\alpha \in A} E_{\theta_\alpha} \left( L(\hat\theta, \theta_\alpha) + L(\hat\theta, \theta_{\hat\alpha}) \right) && \text{by def. of } \hat\alpha \\
&\ge \frac{\delta}{2} \max_{\alpha \in A} E_{\theta_\alpha} d(\theta_{\hat\alpha}, \theta_\alpha) && \text{by condition (2)} \\
&\ge \frac{\delta}{2 |A|} \sum_{\alpha \in A} E_{\theta_\alpha} d(\theta_{\hat\alpha}, \theta_\alpha). && \text{bound max by average}
\end{aligned}$$
20 Two-point tests

Let $A = \{0, 1\}$ and $d(\theta_\alpha, \theta_\beta) = 1_{\{\theta_\alpha \ne \theta_\beta\}}$ for $\alpha, \beta \in A$. By the above calculation,
$$\begin{aligned}
R(\Theta, \hat\theta) &\ge \frac{\delta}{4} \left( E_{\theta_0} d(\theta_{\hat\alpha}, \theta_0) + E_{\theta_1} d(\theta_{\hat\alpha}, \theta_1) \right) \\
&= \frac{\delta}{4} \left( P_{\theta_0}(\hat\alpha = 1) + P_{\theta_1}(\hat\alpha = 0) \right) \\
&\ge \frac{\delta}{4} \inf \left\{ E_{\theta_0} f_0 + E_{\theta_1} f_1 : 0 \le f_0, f_1 \le 1,\ f_0 + f_1 = 1 \right\} \\
&= \frac{\delta}{4} \|P_{\theta_0} \wedge P_{\theta_1}\|_1.
\end{aligned}$$
Thus
$$R(\Theta) \ge \frac{\delta}{4} \|P_{\theta_0} \wedge P_{\theta_1}\|_1 \ge \frac{\delta}{4} \left( 1 - h(P_{\theta_0}, P_{\theta_1}) \right). \qquad (3)$$
21 Le Cam's Lemma

Construct $A := \{\theta_0, \theta_1\} \subset \Theta$ such that
1. $\inf_{\xi \in \Theta} \left( L(\xi, \theta_0) + L(\xi, \theta_1) \right) \ge \delta$,
2. $\|P_{\theta_0} \wedge P_{\theta_1}\|_1 \ge c > 0$.
Then, for every estimator $\hat\theta$,
$$\sup_{\theta \in \Theta} E_\theta L(\hat\theta, \theta) \ge \frac{c\delta}{4}.$$
22 Examples

(1) Normal location models. Let $Y_1, \dots, Y_n \sim N(\theta, 1)$ i.i.d., where $\theta \in \Theta \subset \mathbb{R}$. Then for any estimator $\tilde\theta$,
$$\sup_{\theta \in \Theta} E_\theta \left( \tilde\theta - \theta \right)^2 \ge C n^{-1}.$$

- $P_\theta = \bar P_\theta^n$ (i.i.d. sample) with $\bar P_\theta = N(\theta, 1)$, and loss $L(\hat\theta, \theta) = (\hat\theta - \theta)^2$.
- (loss cond.) $L(\xi, \theta_0) + L(\xi, \theta_1) \ge \frac{1}{2} L(\theta_0, \theta_1) \overset{(?)}{\ge} \delta$
- (testing cond.) $V^2(P_{\theta_0}, P_{\theta_1}) \le h^2(P_{\theta_0}, P_{\theta_1}) \le n\, h^2(\bar P_{\theta_0}, \bar P_{\theta_1}) \le n\, KL(\bar P_{\theta_0}, \bar P_{\theta_1}) = \frac{n (\theta_1 - \theta_0)^2}{2} \overset{(?)}{\le} 1/2$
- Fix $\theta_0 \in \mathbb{R}$ and take $\theta_1 = \theta_0 + 1/\sqrt{n}$. Then (2) is satisfied with $\delta = 1/(2n)$, and $h^2(P_{\theta_0}, P_{\theta_1}) \le 1/2$. By (3), we have
$$R(\Theta) \ge \frac{1}{8n} \left( 1 - 1/\sqrt{2} \right).$$
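A trivial numeric check (mine, not from the slides) of how the bound just derived compares with the exact maximum risk $1/n$ of the sample mean; both are of order $n^{-1}$, with a small explicit constant on the lower-bound side.

```python
import numpy as np

for n in [10, 100, 1000, 10_000]:
    lecam = (1 - 1 / np.sqrt(2)) / (8 * n)  # Le Cam lower bound above
    exact = 1.0 / n                          # max risk of the sample mean
    print(n, lecam, exact, lecam * n)        # lecam * n is constant ~0.0366
```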
23 (2) Nonparametric regression

Let $Y_i = \theta(x_i) + \epsilon_i$ for $i = 1, \dots, n$, where $\epsilon_i \sim N(0,1)$, $x_i = i/n$, and $\theta \in \Theta$ with $\Theta$ the set of all twice continuously differentiable functions on $[0,1]$ satisfying $|\theta''(x)| \le M$. Then for any estimator $\tilde\theta$ and any $x_0 \in [0,1]$,
$$\sup_{\theta \in \Theta} E_\theta \left( \tilde\theta(x_0) - \theta(x_0) \right)^2 \ge C n^{-4/5}.$$

- $P_\theta = \prod_{i=1}^n P_{\theta,i}$ with $P_{\theta,i} = N(\theta(i/n), 1)$, and loss $L(\hat\theta, \theta) = (\hat\theta(x_0) - \theta(x_0))^2$.
- (loss cond.) $L(\xi, \theta_0) + L(\xi, \theta_1) \ge \frac{1}{2} L(\theta_0, \theta_1) \overset{(?)}{\ge} \delta$
- (testing cond.) $h^2(P_{\theta_0}, P_{\theta_1}) \le KL(P_{\theta_0}, P_{\theta_1}) \overset{(?)}{\le} 1/2$
24 Let $\theta_0 = 0$ on $[0,1]$ and $\theta_1(x) = M h_n^2\, K\!\left( \frac{x - x_0}{h_n} \right)$, where
$$K(t) = a \exp\left( -\frac{1}{1 - (2t)^2} \right) 1_{\{|2t| \le 1\}},$$
$a$ is a normalizing constant so that $K$ is a kernel, and $h_n = c n^{-1/5}$.

- Check $\theta_0, \theta_1 \in \Theta$.
- Check the loss condition:
$$L(\xi, \theta_0) + L(\xi, \theta_1) \ge \frac{1}{2} L(\theta_0, \theta_1) = \frac{M^2}{2} h_n^4 K^2(0) = \frac{M^2}{2} h_n^4 a^2 \exp(-2).$$
- Check the testing condition:
$$KL(P_{\theta_0}, P_{\theta_1}) = \int \prod_{i=1}^n \phi(u_i) \log \frac{\prod_{i=1}^n \phi(u_i)}{\prod_{i=1}^n \phi(u_i - \theta_1(x_i))}\, du_1 \cdots du_n = \int \prod_{i=1}^n \phi(u_i) \sum_{i=1}^n \left( -u_i \theta_1(x_i) + \frac{1}{2} \theta_1(x_i)^2 \right) du_1 \cdots du_n.$$
25 KL(P θ0,p θ1 ) = 1 n θ 1 (x i ) 2 2 i=1 = M 2 n h 4 2 na 2 exp i=1 M 2 2 h4 na 2 n i=1 2 ) 2 1 4( xi x 0 h n 1 {x0 hn 2 i n x 0+ hn 2 } M 2 2 h4 na 2 nh n = 1 2 a2 nh 5 n 1 { xi x 0 hn 2 } Since h n 1/5, by choosing a sufficiently small, we have the lower bound of order h 4 n n 4/5. 22 / 44
26 Multiple tests

Let $A = \{0,1\}^m$ and $d(\theta_\alpha, \theta_\beta) = H(\alpha, \beta) = \sum_{k=1}^m 1_{\{\alpha_k \ne \beta_k\}}$ (Hamming distance), where $\alpha, \beta \in A$; e.g. $\alpha = (0, 0, \dots, 1, 0, 1, \dots, 1)$.

Start from
$$R(\Theta, \hat\theta) \ge \frac{\delta}{2 |A|} \sum_{\alpha \in A} E_{\theta_\alpha} d(\theta_{\hat\alpha}, \theta_\alpha).$$
27 Continuing,
$$\begin{aligned}
R(\Theta, \hat\theta) &\ge \frac{\delta}{2 |A|} \sum_{\alpha \in A} E_{\theta_\alpha} d(\theta_{\hat\alpha}, \theta_\alpha) = \frac{\delta}{2 \cdot 2^m} \sum_{\alpha \in A} \sum_{k=1}^m E_{\theta_\alpha} 1_{\{\hat\alpha_k \ne \alpha_k\}} \\
&= \frac{\delta}{4} \sum_{k=1}^m \left( \frac{1}{2^{m-1}} \sum_{\alpha \in A_{0,k}} E_{\theta_\alpha} 1_{\{\hat\alpha_k \ne 0\}} + \frac{1}{2^{m-1}} \sum_{\alpha \in A_{1,k}} E_{\theta_\alpha} 1_{\{\hat\alpha_k \ne 1\}} \right) \\
&= \frac{\delta}{4} \sum_{k=1}^m \left( \bar P_{0,k}(\hat\alpha_k \ne 0) + \bar P_{1,k}(\hat\alpha_k \ne 1) \right) \\
&\ge \frac{\delta}{4} \sum_{k=1}^m \|\bar P_{0,k} \wedge \bar P_{1,k}\|_1 \ \ge\ \frac{\delta}{4}\, m \min_{H(\alpha, \beta) = 1} \|P_{\theta_\alpha} \wedge P_{\theta_\beta}\|_1,
\end{aligned}$$
where $\bar P_{i,k} = \frac{1}{2^{m-1}} \sum_{\alpha \in A_{i,k}} P_{\theta_\alpha}$ and $A_{i,k} = \{\alpha \in A : \alpha_k = i\}$ for $i = 0, 1$; the last step pairs each $\alpha \in A_{0,k}$ with the $\beta \in A_{1,k}$ obtained by flipping its $k$-th coordinate.
28 Assouad's Lemma

Construct $A := \{\theta_\alpha : \alpha \in \{0,1\}^m\} \subset \Theta$ such that
1. $\inf_{\xi \in \Theta} \left( L(\xi, \theta_\alpha) + L(\xi, \theta_\beta) \right) \ge \delta H(\alpha, \beta)$ for all $\alpha, \beta \in \{0,1\}^m$,
2. $\|P_{\theta_\alpha} \wedge P_{\theta_\beta}\|_1 \ge c > 0$ if $H(\alpha, \beta) = 1$.
Then, for every estimator $\hat\theta$,
$$\sup_{\theta \in \Theta} E_\theta L(\hat\theta, \theta) \ge \frac{c \delta m}{4}.$$
Notation: $H(\alpha, \beta) = \|\alpha - \beta\|_0 = \sum_{k=1}^m 1_{\{\alpha_k \ne \beta_k\}}$.
29 Examples

(3) Gaussian sequence models. Let $Y_i = \theta_i + \sigma \epsilon_i$, where $\epsilon_i \sim N(0,1)$ and $\sigma \to 0$, with $\theta = (\theta_1, \theta_2, \dots) \in \Theta = \{\theta : \sum_k k^{2s} \theta_k^2 \le M\}$, where $M$ is a constant. Then for any estimator $\tilde\theta$,
$$\sup_{\theta \in \Theta} E_\theta \|\tilde\theta - \theta\|_2^2 \ge C \sigma^{4s/(1+2s)}.$$

- $P_{\theta_k} = N(\theta_k, \sigma^2)$ for $k = 1, 2, \dots$, and $P_\theta = \bigotimes_{k \in \mathbb{N}} P_{\theta_k}$
- $L(\theta, t) = \sum_k (\theta_k - t_k)^2$, satisfying $L(\theta_\alpha, t) + L(\theta_\beta, t) \ge \frac{1}{2} L(\theta_\alpha, \theta_\beta)$.
30 We need to construct $\theta_\alpha \in \Theta$, $\alpha \in \{0,1\}^m$, such that
1. $\sum_k (\theta_{\alpha,k} - \theta_{\beta,k})^2 \ge \delta \|\alpha - \beta\|_0$ for all $\alpha, \beta \in \{0,1\}^m$,
2. $\|P_{\theta_\alpha} \wedge P_{\theta_\beta}\|_1 \ge c > 0$ for $\|\alpha - \beta\|_0 = 1$.

Construct a finite subset by setting
$$\theta_\alpha = (\theta_{\alpha_1}, \theta_{\alpha_2}, \dots, \theta_{\alpha_m}, 0, 0, \dots) = \epsilon (\alpha_1, \alpha_2, \dots, \alpha_m, 0, 0, \dots).$$
Then the separation condition in Assouad's Lemma holds with $\delta = \frac{1}{2} \epsilon^2$, since
$$\frac{1}{2} \sum_k (\theta_{\alpha,k} - \theta_{\beta,k})^2 = \frac{1}{2} \epsilon^2 H(\alpha, \beta).$$
31 TA calculation: w.l.o.g. suppose $\alpha$ and $\beta$ differ only in the first coordinate, $\alpha_1 \ne \beta_1$. Since the two product measures share all coordinates but the first,
$$\|P_{\theta_\alpha} \wedge P_{\theta_\beta}\|_1 = \left\| \left( P_0 \otimes \bigotimes_{i > 1} Q_i \right) \wedge \left( P_\epsilon \otimes \bigotimes_{i > 1} Q_i \right) \right\|_1 = \|P_0 \wedge P_\epsilon\|_1.$$
For the normal distributions $P_0 = N(0, \sigma^2)$ and $P_\epsilon = N(\epsilon, \sigma^2)$,
$$\|P_0 \wedge P_\epsilon\|_1 = 2 \int_{-\infty}^{\epsilon/2} \phi_\sigma(x - \epsilon)\, dx = 2 \left( 1 - \Phi\left( \frac{\epsilon}{2\sigma} \right) \right) > 0,$$
which stays bounded away from zero if $\epsilon = c_0 \sigma$ as $\sigma \to 0$.
32 Check $\theta_\alpha \in \Theta$ by letting $m = (M \epsilon^{-2})^{1/(1+2s)}$, since
$$\sum_k k^{2s} \theta_k^2 \le \sum_{k=1}^m k^{2s} \epsilon^2 \lesssim m^{1+2s} \epsilon^2 \le M.$$
The lower bound is then
$$m \epsilon^2 \asymp \epsilon^{4s/(1+2s)} \asymp \sigma^{4s/(1+2s)},$$
which gives $n^{-2s/(1+2s)}$ with the usual choice $\sigma = n^{-1/2}$.
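A one-loop check (mine, not from the slides; rounding of $m$ to an integer is ignored) that the Assouad bound $m\epsilon^2$ from this construction scales as $\sigma^{4s/(1+2s)}$ when $\epsilon = c_0 \sigma$ and $m = (M\epsilon^{-2})^{1/(1+2s)}$.

```python
import numpy as np

s, M, c0 = 2.0, 1.0, 1.0
for sigma in [1e-1, 1e-2, 1e-3, 1e-4]:
    eps = c0 * sigma
    m = (M * eps ** -2) ** (1.0 / (1.0 + 2.0 * s))
    # lower bound ~ m * eps^2; target rate sigma^(4s / (1 + 2s))
    print(sigma, m * eps ** 2, sigma ** (4 * s / (1 + 2 * s)))
```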
33 (4) Smooth density estimation

Let $Y_1, \dots, Y_n$ be an i.i.d. sample from a density $f \in \mathcal{F}_s$, where $\mathcal{F}_s$ is a class of $s$ times differentiable densities satisfying the Hölder-type condition
$$\int \left( f^{(s-1)}(x + h) - f^{(s-1)}(x) \right)^2 dx \le (M |h|)^2.$$
Then
$$\sup_{f \in \mathcal{F}_s} E_f \int \left( \tilde f(x) - f(x) \right)^2 dx \ge C n^{-2s/(1+2s)}.$$

- $P_f = \bar P_f^n$, where $\bar P_f$ has density $f$.
- Loss $L(f, g) = \int (f(x) - g(x))^2\, dx$, satisfying $L(\xi, f_\alpha) + L(\xi, f_\beta) \ge \frac{1}{2} L(f_\alpha, f_\beta)$.
34 Typical construction: small perturbations of a base function,
$$f_\alpha(x) = f_0(x) + \epsilon \sum_{k=1}^m \alpha_k \varphi_k(x).$$
For the loss separation condition,
$$\frac{1}{2} \int \left( f_\alpha(x) - f_\beta(x) \right)^2 dx = \frac{1}{2} \epsilon^2 \int \left( \sum_{k=1}^m (\alpha_k - \beta_k) \varphi_k(x) \right)^2 dx \overset{(?)}{\ge} \delta \sum_{k=1}^m (\alpha_k - \beta_k)^2.$$
This is convenient if $\int \varphi_k \varphi_{k'} = V 1_{\{k = k'\}}$, so that $\delta = V \epsilon^2 / 2$.
35 Ibragimov and Has'minskii (1983)

Construct perturbed uniform densities on $[0,1]$:
$$f_0(x) = 1, \qquad \varphi_k(x) = \varphi(mx - k + 1/2) \quad \text{for } k = 1, \dots, m,$$
where
$$\varphi(x) = c_0 \exp\left( -\frac{1}{x} - \frac{1}{1 - 2x} \right) 1_{\{0 < x < \frac{1}{2}\}}$$
is extended to negative arguments by the odd reflection $\varphi(-x) = -\varphi(x)$, and $c_0$ is small enough that $\varphi \in \mathcal{F}$.

- The perturbation function $\varphi$ is infinitely differentiable with $\int_{-1/2}^{1/2} \varphi(x)\, dx = 0$.
- $\int_0^1 f_\alpha(x)\, dx = \int_0^1 \left[ 1 + \epsilon \sum_{k=1}^m \alpha_k \varphi_k(x) \right] dx = 1$, with $f_\alpha(x) \ge c_1 > 0$.
36-38 [Figures: the perturbation function $\varphi$; the perturbation placed on the first interval; the perturbation placed on every interval of $[0,1]$.]
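A sketch of the construction (my implementation; $c_0$, $\epsilon$ and $m$ are illustrative values) that builds $f_\alpha$ on a grid and verifies the three properties used above: $\int_0^1 f_\alpha = 1$, $f_\alpha \ge c_1 > 0$, and $\int \varphi_k \varphi_{k'} = 1_{\{k=k'\}}\, C/m$.

```python
import numpy as np

def bump(t, c0=0.1):
    """Smooth bump c0 * exp(-1/t - 1/(1 - 2t)) supported on (0, 1/2)."""
    out = np.zeros_like(t)
    s = (t > 0) & (t < 0.5)
    out[s] = c0 * np.exp(-1.0 / t[s] - 1.0 / (1.0 - 2.0 * t[s]))
    return out

def phi(t, c0=0.1):
    """Odd extension: integrates to zero over (-1/2, 1/2)."""
    return bump(t, c0) - bump(-t, c0)

m, eps = 8, 0.5
rng = np.random.default_rng(1)
alpha = rng.integers(0, 2, size=m)

x = np.linspace(0.0, 1.0, 400_001)
dx = x[1] - x[0]
phis = np.array([phi(m * x - k - 0.5) for k in range(m)])  # phi_k, k = 1..m
f = 1.0 + eps * alpha @ phis

print(np.sum(f) * dx)   # ~1.0: still a density
print(f.min())          # bounded away from 0
G = phis @ phis.T * dx  # Gram matrix: diagonal C/m, zero off-diagonal
print(np.allclose(G, G[0, 0] * np.eye(m), atol=1e-12))
```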
39 With $\int \varphi_k \varphi_{k'} = 1_{\{k = k'\}}\, \frac{C}{m}$, we get $\delta = C \epsilon^2 / (2m)$.

For the testing condition with $\|\alpha - \beta\|_0 = 1$, we need (via $h^2 \le KL \le \chi^2$ and the product-measure bound)
$$\chi^2(\bar P_{f_\alpha}, \bar P_{f_\beta}) := \int \frac{(f_\alpha - f_\beta)^2}{f_\alpha}\, dx \le \frac{1}{c_1} \int (f_\alpha - f_\beta)^2\, dx = \frac{2\delta}{c_1} \|\alpha - \beta\|_0 \asymp \delta \lesssim \frac{1}{n}, \qquad \delta = \frac{C \epsilon^2}{2m},$$
which yields the lower bound $m\delta \asymp \epsilon^2 \asymp m/n$ up to constants.
40 Note $\varphi_k^{(s-1)}(x) = m^{s-1} \varphi^{(s-1)}(mx - k + 1/2)$. Check $f_\alpha \in \mathcal{F}_s$:
$$\begin{aligned}
\int \left( f_\alpha^{(s-1)}(x+h) - f_\alpha^{(s-1)}(x) \right)^2 dx &= \epsilon^2 \int \left( \sum_k \alpha_k \left( \varphi_k^{(s-1)}(x+h) - \varphi_k^{(s-1)}(x) \right) \right)^2 dx \\
&\lesssim \epsilon^2 m^{2s-2} \int \left( \varphi^{(s-1)}(x + mh) - \varphi^{(s-1)}(x) \right)^2 dx \\
&\le \epsilon^2 m^{2s-2} (M |mh|)^2 \qquad \text{since } \varphi \in \mathcal{F} \\
&= M^2 \epsilon^2 m^{2s} h^2 \overset{(?)}{\le} (M |h|)^2,
\end{aligned}$$
which holds if $m \le \epsilon^{-1/s}$; take $m \asymp \epsilon^{-1/s}$.

From the testing condition, $1/n \asymp \epsilon^2 / m \asymp \epsilon^{2 + 1/s}$. That is, $\epsilon \asymp n^{-s/(1+2s)}$. The lower bound is then $\epsilon^2 \asymp n^{-2s/(1+2s)}$.
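As with the sequence model, a quick numeric check (mine, not from the slides) that balancing $m \asymp \epsilon^{-1/s}$ against the testing constraint $\epsilon^2/m \asymp 1/n$ gives $\epsilon^2 \asymp n^{-2s/(1+2s)}$.

```python
s = 2.0
for n in [1e2, 1e4, 1e6]:
    eps = n ** (-s / (1.0 + 2.0 * s))  # solves eps^(2 + 1/s) = 1/n
    m = eps ** (-1.0 / s)
    # both ratios below are constant (= 1 with these normalizations)
    print(n, m, eps ** (2.0 + 1.0 / s) * n, eps ** 2 / n ** (-2 * s / (1 + 2 * s)))
```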
41 Lemma. $V(P,Q) = \frac{1}{2} \int |p - q|\, d\nu = 1 - \int (p \wedge q)\, d\nu$.

Proof. Let $A_0 = \{x \in \mathcal{X} : p(x) \ge q(x)\}$.
1. $V \ge P(A_0) - Q(A_0) = \int_{A_0} (p - q)\, d\nu = \frac{1}{2} \int |p - q|\, d\nu = 1 - \int (p \wedge q)\, d\nu$, using $\int (p - q)\, d\nu = 0$.
2. For all $A \in \mathcal{A}$,
$$\left| \int_A (p - q)\, d\nu \right| = \left| \int_{A \cap A_0} (p - q) + \int_{A \cap A_0^c} (p - q) \right| \le \max\left\{ \int_{A_0} (p - q),\ \int_{A_0^c} (q - p) \right\} = \frac{1}{2} \int |p - q|.$$
By the above two, we have proved the claims. ∎
42 Lemma. $\frac{1}{2} h^2(P,Q) \le V(P,Q) \le h(P,Q) \sqrt{1 - \frac{h^2(P,Q)}{4}}$.

Proof.
1. $\frac{1}{2} h^2(P,Q) = 1 - \int \sqrt{pq} = 1 - \left( \int_{p > q} \sqrt{pq} + \int_{q \ge p} \sqrt{pq} \right) \le 1 - \left( \int_{p > q} q + \int_{q \ge p} p \right) = 1 - \int (p \wedge q) = V(P,Q)$.
2. By Cauchy-Schwarz,
$$\left( 1 - \frac{1}{2} h^2(P,Q) \right)^2 = \left( \int \sqrt{pq} \right)^2 = \left( \int \sqrt{(p \wedge q)(p \vee q)} \right)^2 \le \int (p \wedge q) \int (p \vee q) = (1 - V)(1 + V) = 1 - V^2.$$
Thus $V(P,Q)^2 \le 1 - \left( 1 - \frac{1}{2} h^2(P,Q) \right)^2 = h^2(P,Q) \left( 1 - \frac{1}{4} h^2(P,Q) \right)$. ∎
43 Lemma. $h^2(P,Q) \le KL(P,Q) \le \chi^2(P,Q)$.

Proof. Both parts use $\log x \le x - 1$.
1. $KL(P,Q) = \int_{pq>0} p \log \frac{p}{q} = -2 \int_{pq>0} p \log \sqrt{\frac{q}{p}} \ge -2 \int_{pq>0} p \left( \sqrt{\frac{q}{p}} - 1 \right) = -2 \int \sqrt{pq} + 2 \ge h^2(P,Q)$.
2. $KL(P,Q) = \int_{pq>0} p \log \frac{p}{q} \le \int_{pq>0} p \left( \frac{p}{q} - 1 \right) \le \int \frac{p^2}{q} - 1 = \chi^2(P,Q)$. ∎
44 Lemma. $h^2(P^n, Q^n) = 2 \left( 1 - \left( 1 - \frac{h^2(P,Q)}{2} \right)^n \right) \le n\, h^2(P,Q)$.

Proof.
$$1 - \frac{h^2(P^n, Q^n)}{2} = \int \sqrt{p^{\otimes n} q^{\otimes n}} = \left( \int \sqrt{pq} \right)^n = \left( 1 - \frac{h^2(P,Q)}{2} \right)^n.$$
Use $(1 - a)^n \ge 1 - an$ for $0 \le a < 1$. ∎
45 Lemma (Pinsker). $V(P,Q) \le \sqrt{KL(P,Q)/2}$.

Proof.
1. Let $\psi(x) = x \log x - x + 1$ for $x \ge 0$, where $0 \log 0 := 0$. Note that $\psi(0) = 1$, $\psi(1) = 0$, $\psi'(1) = 0$, $\psi''(x) = 1/x \ge 0$, and $\psi(x) \ge 0$ for all $x \ge 0$.
2. Use $\left( \frac{4}{3} + \frac{2x}{3} \right) \psi(x) \ge (x - 1)^2$ for $x \ge 0$. If $P \ll Q$, then by Cauchy-Schwarz,
$$V(P,Q) = \frac{1}{2} \int |p - q| = \frac{1}{2} \int_{q>0} \left| \frac{p}{q} - 1 \right| q \le \frac{1}{2} \int_{q>0} \sqrt{\left( \frac{4}{3} + \frac{2p}{3q} \right) \psi\left( \frac{p}{q} \right)}\, q \le \frac{1}{2} \sqrt{\int_{q>0} \left( \frac{4}{3} + \frac{2p}{3q} \right) q}\ \sqrt{\int_{q>0} \psi\left( \frac{p}{q} \right) q} = \sqrt{\frac{KL(P,Q)}{2}},$$
since $\int_{q>0} \left( \frac{4}{3} + \frac{2p}{3q} \right) q = 2$ and $\int_{q>0} \psi\left( \frac{p}{q} \right) q = KL(P,Q)$. ∎
46 [Figure: $\psi(x) = x \log x - x + 1$, $(4/3 + 2x/3)\,\psi(x)$, and $(x-1)^2$, illustrating the approximation of $(x-1)^2$ by $(4/3 + 2x/3)\,\psi(x)$.]
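A grid check (mine, not from the slides) of the pointwise inequality used in step 2 of the proof and plotted above.

```python
import numpy as np

x = np.linspace(1e-12, 50.0, 500_000)
psi = x * np.log(x) - x + 1.0
assert np.all((4.0 / 3.0 + 2.0 * x / 3.0) * psi >= (x - 1.0) ** 2 - 1e-9)
print("(4/3 + 2x/3) * psi(x) >= (x - 1)^2 holds on the grid")
```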
47 Lemma. $1 - V(P,Q) \ge \frac{1}{2} \exp(-KL(P,Q))$.

Proof.
1. W.l.o.g. $KL(P,Q) < \infty$ (i.e. $P \ll Q$).
2. By Jensen's inequality,
$$\left( \int \sqrt{pq} \right)^2 = \exp\left( 2 \log \int_{pq>0} p \sqrt{\frac{q}{p}} \right) \ge \exp\left( 2 \int_{pq>0} p \log \sqrt{\frac{q}{p}} \right) = \exp(-KL(P,Q)).$$
3. Use
$$2 \int (p \wedge q) \ge \int (p \wedge q) \int (p \vee q) \ge \left( \int \sqrt{(p \wedge q)(p \vee q)} \right)^2 = \left( \int \sqrt{pq} \right)^2 \ge \exp(-KL(P,Q))$$
and $2 \int (p \wedge q) = 2 (1 - V(P,Q))$. ∎
48 Theorem. Suppose $\Pi$ is a distribution on $\Theta$ such that
$$\int R(\theta, \delta_\Pi)\, d\Pi(\theta) = \sup_\theta R(\theta, \delta_\Pi).$$
Then
1. $\delta_\Pi$ is minimax.
2. $\Pi$ is least favourable.

Proof. (i) Let $\delta$ be any other procedure. Then
$$\sup_\theta R(\theta, \delta) \ge \int R(\theta, \delta)\, d\Pi(\theta) \ge \int R(\theta, \delta_\Pi)\, d\Pi(\theta) = \sup_\theta R(\theta, \delta_\Pi).$$
(ii) Let $\Pi'$ be some other distribution on $\theta$. Then
$$\int R(\theta, \delta_{\Pi'})\, d\Pi'(\theta) \le \int R(\theta, \delta_\Pi)\, d\Pi'(\theta) \le \sup_\theta R(\theta, \delta_\Pi) = \int R(\theta, \delta_\Pi)\, d\Pi(\theta). \qquad ∎$$
49 Theorem. Suppose $\{\Pi_n\}$ is a sequence of prior distributions with Bayes risks $r_{\Pi_n} := \int R(\theta, \delta_n)\, d\Pi_n(\theta)$, and that $\delta$ is an estimator for which
$$\sup_\theta R(\theta, \delta) = \lim_{n \to \infty} r_{\Pi_n}.$$
Then
1. $\delta$ is minimax.
2. The sequence $\{\Pi_n\}$ is least favourable.

Proof. (i) Suppose $\delta'$ is any other estimator. Then
$$\sup_\theta R(\theta, \delta') \ge \int R(\theta, \delta')\, d\Pi_n(\theta) \ge r_{\Pi_n},$$
and this holds for every $n$. Hence $\sup_\theta R(\theta, \delta') \ge \lim_n r_{\Pi_n} = \sup_\theta R(\theta, \delta)$. Thus $\delta$ is minimax.
(ii) If $\Pi$ is any distribution, then
$$\int R(\theta, \delta_\Pi)\, d\Pi(\theta) \le \int R(\theta, \delta)\, d\Pi(\theta) \le \sup_\theta R(\theta, \delta) = \lim_n r_{\Pi_n}.$$
The claim holds by definition. ∎