Inverse Statistical Learning
1 Inverse Statistical Learning: minimax theory, adaptation and algorithm. With (in order of appearance): C. Marteau, M. Chichignoud, C. Brunet and S. Souchet. Dijon, January 15, 2014. Inverse Statistical Learning 1 / 51
2-3 The problem of Inverse Statistical Learning. Given $(X, Y) \sim P$ on $\mathcal{X} \times \mathcal{Y}$, a class $\mathcal{G}$ and a loss function $\ell : \mathcal{G} \times (\mathcal{X} \times \mathcal{Y}) \to \mathbb{R}_+$, we aim at: $g^* \in \arg\min_{g \in \mathcal{G}} \mathbb{E}_P\, \ell(g, (X, Y))$, from an indirect sequence of observations: $(Z_1, Y_1), \dots, (Z_n, Y_n)$ i.i.d. from $\tilde{P}$, where $Z_i \sim Af$, $A$ is a linear compact operator (and $X \sim f$).
4-5 Statistical Learning with errors in variables. Given $(X, Y) \sim P$ on $\mathcal{X} \times \mathcal{Y}$, a class $\mathcal{G}$ and a loss function $\ell : \mathcal{G} \times (\mathcal{X} \times \mathcal{Y}) \to \mathbb{R}_+$, we aim at: $g^* \in \arg\min_{g \in \mathcal{G}} \mathbb{E}_P\, \ell(g, (X, Y))$, from a noisy sequence of observations: $(X_1 + \epsilon_1, Y_1), \dots, (X_n + \epsilon_n, Y_n)$ i.i.d. from $\tilde{P}$, where $Z_i = X_i + \epsilon_i \sim f * \eta$ and $\eta$ is the density of the i.i.d. sequence $(\epsilon_i)_{i=1}^n$.
- $\mathcal{Y} = \mathbb{R}$: regression with errors in variables,
- $\mathcal{Y} = \{1, \dots, M\}$: classification with errors in variables,
- $\mathcal{Y} = \emptyset$: unsupervised learning with errors in variables.
6 Toy example (I). [Figure: direct dataset (unobservable) vs. observations (available).]
7 Toy example (II). [Figure: direct dataset (unobservable) vs. observations (available).]
8 Real-world example in oncology (I). Fig. 1: the same tumor observed by two radiologists; $Z_{ij} = X_i + \epsilon_{ij}$, $j \in \{1, 2\}$.
9 Real-world example in oncology (II). Fig. 1: batch effect in a microarray dataset (J. A. Gagnon-Bartsch, L. Jacob and T. P. Speed, 2013).
10 Contents. 1. Minimax rates in discriminant analysis 2. Excess risk bound 3. The algorithm of noisy k-means (4.) Adaptation
11 Origin: a minimax motivation (with C. Marteau). Direct case: density estimation $n^{-\frac{2\gamma}{2\gamma+1}}$, classification $n^{-\frac{\gamma(\alpha+1)}{\gamma(\alpha+2)+d}}$. Noisy case: density estimation $n^{-\frac{2\gamma}{2\gamma+2\beta+1}}$, classification: ??? Assumptions: $f \in \Sigma(\gamma, L)$, resp. $\mathbb{P}(Y = 1 \mid X = x) \in \Sigma(\gamma, L)$; margin parameter $\alpha \geq 0$; $|\mathcal{F}[\eta](t)| \approx |t|^{-\beta}$, resp. $|\mathcal{F}[\eta_j](t_j)| \approx |t_j|^{-\beta_j}$, $j = 1, \dots, d$.
12 Mammen and Tsybakov (1999). Given two densities $f$ and $g$, for any $G \subset K$, the Bayes risk is defined as: $R_K(G) = \frac{1}{2}\left[\int_{K \setminus G} f\, dQ + \int_G g\, dQ\right]$. Given $X_1^1, \dots, X_n^1 \sim f$ and $X_1^2, \dots, X_n^2 \sim g$, we aim at: $G^* = \arg\min_{G \in \mathcal{G}} R_K(G)$. Goal: to obtain minimax fast rates $r_n(\mathcal{F}) \approx \inf_{\hat{G}} \sup_{(f,g) \in \mathcal{F}} \mathbb{E}\, d(\hat{G}, G^*)$, where $d \in \{d_{f,g}, d_\Delta\}$.
13 Mammen and Tsybakov (1999) with errors in variables. We observe $Z_1^1, \dots, Z_n^1$ and $Z_1^2, \dots, Z_n^2$ such that: $Z_i^1 = X_i^1 + \epsilon_i^1$ and $Z_i^2 = X_i^2 + \epsilon_i^2$, for $i = 1, \dots, n$, where $X_i^1 \sim f$, $X_i^2 \sim g$, and the $\epsilon_i^j$ are i.i.d. with density $\eta$. Goal: to obtain minimax fast rates $r_n(\mathcal{F}, \beta) \approx \inf_{\hat{G}} \sup_{(f,g) \in \mathcal{F}} \mathbb{E}\, d(\hat{G}, G^*)$, where $d \in \{d_{f,g}, d_\Delta\}$.
14 ERM approach. ERM principle in the direct case: $\frac{1}{2n} \sum_{i=1}^n \mathbf{1}_{X_i^1 \in G^C} + \frac{1}{2n} \sum_{i=1}^n \mathbf{1}_{X_i^2 \in G} \to R_K(G)$.
15-16 ERM approach. ERM principle in this model fails: $\frac{1}{2n} \sum_{i=1}^n \mathbf{1}_{Z_i^1 \in G^C} + \frac{1}{2n} \sum_{i=1}^n \mathbf{1}_{Z_i^2 \in G} \to \frac{1}{2}\left[\int_{G^C} f * \eta + \int_G g * \eta\right] \neq R_K(G)$. Solution: define $R_n^\lambda(G) = \frac{1}{2}\left[\int_{G^C} \hat{f}_n^\lambda(x)\, dx + \int_G \hat{g}_n^\lambda(x)\, dx\right]$, where $(\hat{f}_n^\lambda, \hat{g}_n^\lambda)$ are estimators of $(f, g)$ of the form: $\hat{f}_n^\lambda(x) = \frac{1}{n\lambda} \sum_{i=1}^n \tilde{K}\left(\frac{Z_i^1 - x}{\lambda}\right)$.
17 Details. $Z_1^1, \dots, Z_n^1$ i.i.d. $\sim f * \eta$ and $Z_1^2, \dots, Z_n^2$ i.i.d. $\sim g * \eta$. We consider: $R_n^\lambda(G) = \frac{1}{2}\left[\int_{G^C} \hat{f}_n^\lambda(x)\, dx + \int_G \hat{g}_n^\lambda(x)\, dx\right]$, where $\hat{f}_n^\lambda$ and $\hat{g}_n^\lambda$ are deconvolution kernel estimators. Then: $R_n^\lambda(G) = \frac{1}{2n}\left[\sum_{i=1}^n h_{G^C}^\lambda(Z_i^1) + \sum_{i=1}^n h_G^\lambda(Z_i^2)\right]$, where: $h_G^\lambda(z) = \int_G \frac{1}{\lambda} \tilde{K}\left(\frac{z - x}{\lambda}\right) dx = \mathbf{1}_G * \tilde{K}_\lambda(z)$.
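The deconvolution estimator $\hat{f}_n^\lambda$ above admits a closed form in one classical special case: Laplace measurement errors combined with a Gaussian kernel $K$, for which $\mathcal{F}[K](t)/\mathcal{F}[\eta](t/\lambda)$ inverts explicitly. A minimal 1-D sketch of that special case (the Laplace/Gaussian combination and all numeric choices are illustration assumptions, not the setting of the slides):

```python
import math
import random

def deconv_kernel(u, sigma, lam):
    """Deconvoluting kernel for Laplace(sigma) errors and a Gaussian kernel K.
    Its Fourier transform is F[K](t) / F[eta](t/lam) = exp(-t^2/2) * (1 + (sigma*t/lam)^2),
    which inverts in closed form (phi is the standard normal density)."""
    phi = math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 + (sigma / lam) ** 2 * (1.0 - u * u))

def deconv_density(x, sample, sigma, lam):
    """Deconvolution kernel density estimate f_hat_n^lambda(x) from noisy Z_i."""
    n = len(sample)
    return sum(deconv_kernel((z - x) / lam, sigma, lam) for z in sample) / (n * lam)

# Demo: X ~ N(0,1) contaminated by additive Laplace(0.5) noise.
random.seed(0)
sigma = 0.5
noisy = [random.gauss(0.0, 1.0) + random.choice([-1, 1]) * random.expovariate(1.0 / sigma)
         for _ in range(2000)]
f0 = deconv_density(0.0, noisy, sigma, lam=0.5)  # target: the N(0,1) density at 0
```

As `sigma` tends to 0 the deconvoluting kernel reduces to the plain Gaussian kernel, recovering the usual (direct-case) kernel density estimator.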
18-19 Vapnik's bound ($\epsilon = 0$). The use of empirical processes comes from VC theory: $R_K(\hat{G}_n) - R_K(G^*) \leq R_K(\hat{G}_n) - R_n(\hat{G}_n) + R_n(G^*) - R_K(G^*) \leq 2 \sup_{G \in \mathcal{G}} |(R_n - R)(G)|$. Goal: to control uniformly the empirical process indexed by $\mathcal{G}$. In ISL, the process is instead indexed by $\{\mathbf{1}_G * \tilde{K}_\lambda,\ G \in \mathcal{G}\}$.
20 Theorem 1: Upper bound (j.w. with C. Marteau). Suppose $(f, g) \in \mathcal{G}(\alpha, \gamma)$ and $|\mathcal{F}[\eta](t)| \approx \Pi_{i=1}^d |t_i|^{-\beta_i}$, $\beta_i > 1/2$, $i = 1, \dots, d$. Consider a kernel of order $\gamma$, which satisfies some properties. Then: $\limsup_{n \to +\infty} \sup_{(f,g) \in \mathcal{G}(\alpha,\gamma)} n^{\tau_d(\alpha,\beta,\gamma)}\, \mathbb{E}_{f,g}\, d(\hat{G}_n, G^*) < +\infty$, where $\tau_d(\alpha, \beta, \gamma) = \frac{\gamma\alpha}{\gamma(2+\alpha) + d + 2\sum_{i=1}^d \beta_i}$ for $d = d_\Delta$, $\tau_d(\alpha, \beta, \gamma) = \frac{\gamma(\alpha+1)}{\gamma(2+\alpha) + d + 2\sum_{i=1}^d \beta_i}$ for $d = d_{f,g}$, and $\lambda = (\lambda_1, \dots, \lambda_d)$ is chosen as: $\lambda_j = n^{-\frac{1}{\gamma(2+\alpha) + 2\sum_{i=1}^d \beta_i + d}}$, $j \in \{1, \dots, d\}$.
21 Theorem 2: Lower bound (j.w. with C. Marteau). Suppose $|\mathcal{F}[\eta](t)| \approx \Pi_{i=1}^d |t_i|^{-\beta_i}$, $\beta_i > 1/2$, $i = 1, \dots, d$. Then for $\alpha \leq 1$, $\liminf_{n \to +\infty} \inf_{\hat{G}_n} \sup_{(f,g) \in \mathcal{G}(\alpha,\gamma)} n^{\tau_d(\alpha,\beta,\gamma)}\, \mathbb{E}_{f,g}\, d(\hat{G}_n, G^*) > 0$, where the infimum is taken over all possible estimators of the set $G^*$ and $\tau_d(\alpha, \beta, \gamma)$ is as in Theorem 1.
22 Conclusion (minimax). Direct case: density estimation $n^{-\frac{2\gamma}{2\gamma+1}}$, classification $n^{-\frac{\gamma(\alpha+1)}{\gamma(\alpha+2)+d}}$. Noisy case: density estimation $n^{-\frac{2\gamma}{2\gamma+2\beta+1}}$, classification $n^{-\frac{\gamma(\alpha+1)}{\gamma(\alpha+2)+2\bar{\beta}+d}}$ with $\bar{\beta} = \sum_{i=1}^d \beta_i$. Assumptions: $f \in \Sigma(\gamma, L)$, resp. $\mathbb{P}(Y = 1 \mid X = x) \in \Sigma(\gamma, L)$; margin parameter $\alpha \geq 0$; $|\mathcal{F}[\eta](t)| \approx |t|^{-\beta}$, resp. $|\mathcal{F}[\eta_j](t_j)| \approx |t_j|^{-\beta_j}$, $j = 1, \dots, d$.
23 Sketch of the proofs, heuristics. 1. Noisy quantization (for simplicity) 2. Excess risk decomposition 3. Bias control (easy and minimax) 4. Variance control: key lemma
24-25 Other results (I). (Un)supervised classification with errors-in-variables: $R_\ell(\hat{g}_n^\lambda) - R_\ell(g^*) \leq C n^{-\frac{\kappa\gamma}{\gamma(2\kappa+\rho-1) + (2\kappa-1)\sum_{i=1}^d \beta_i}}$, where $g^* = \arg\min R_\ell(g, (X, Y))$. (Un)supervised classification with $Z_i \sim Af$, using $\hat{f}_n^N(x) = \sum_{k=1}^N \hat{\theta}_k \phi_k(x)$, where $\hat{\theta}_k = b_k^{-1} \frac{1}{n} \sum_{i=1}^n \psi_k(Z_i)$, $A^* A \phi_k = b_k^2 \phi_k$, and $f \in \Theta(\gamma, L) := \{f = \sum_{k=1}^\infty \theta_k \phi_k : \sum_k \theta_k^2 k^{2\gamma+1} \leq L\}$.
26-27 Other results (II). If $f \in \Sigma(\bar{\gamma}, L)$, the anisotropic Hölder class: $R_\ell(\hat{g}_n^\lambda) - R_\ell(g^*) \leq C n^{-\frac{\kappa}{2\kappa+\rho-1+\epsilon(\kappa,\bar{\beta},\bar{\gamma})}}$, where $\epsilon(\kappa, \bar{\beta}, \bar{\gamma}) = (2\kappa - 1) \sum_{j=1}^d \frac{\beta_j}{\gamma_j}$, and $\lambda = (\lambda_1, \dots, \lambda_d)$ is chosen as: $\lambda_j \approx n^{-\frac{1}{2\gamma_j(2\kappa+\rho-1+\epsilon(\kappa,\bar{\beta},\bar{\gamma}))}}$, $j = 1, \dots, d$. Non-exact oracle inequalities: $R_\ell(\hat{g}) \leq (1+\epsilon) \inf_{g \in \mathcal{G}} R_\ell(g) + C(\epsilon) n^{-\frac{\gamma}{\gamma(1+\rho)+\sum_{i=1}^d \beta_i}}$, without margin assumption.
28 Finite dimensional clustering. Given $k$, we aim at: $c^* \in \arg\min_{c=(c_1,\dots,c_k) \in \mathbb{R}^{dk}} \mathbb{E} \min_{j=1,\dots,k} \|X - c_j\|^2$. The empirical counterpart: $\hat{c}_n \in \arg\min_{c=(c_1,\dots,c_k) \in \mathbb{R}^{dk}} \frac{1}{n} \sum_{i=1}^n \min_{j=1,\dots,k} \|X_i - c_j\|^2$, gives rise to the popular k-means studied in (Pollard, 1982).
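This empirical minimization is what Lloyd's algorithm approximates in practice. A 1-D sketch of the empirical risk and the Lloyd iteration (data and initialization are purely illustrative):

```python
import random

def kmeans_risk(centers, xs):
    """Empirical k-means risk: (1/n) * sum_i min_j |x_i - c_j|^2."""
    return sum(min((x - c) ** 2 for c in centers) for x in xs) / len(xs)

def lloyd(xs, centers, n_iter=20):
    """Lloyd's algorithm: alternate nearest-center assignment and cell means."""
    centers = list(centers)
    for _ in range(n_iter):
        cells = [[] for _ in centers]
        for x in xs:
            j = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            cells[j].append(x)
        # Each center moves to the mean of its cell (kept in place if the cell is empty).
        centers = [sum(cell) / len(cell) if cell else c
                   for cell, c in zip(cells, centers)]
    return centers

# Two well-separated Gaussian clusters around 0 and 10.
random.seed(1)
xs = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(10, 1) for _ in range(300)]
c_hat = sorted(lloyd(xs, centers=[1.0, 9.0]))
```

Each Lloyd step can only decrease the empirical risk, which is why the iteration stabilizes on a local minimizer of the criterion above.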
29 Finite dimensional noisy clustering (j.w. with C. Brunet). We want to approximate a solution of the stochastic minimization: $\min_{c=(c_1,\dots,c_k) \in \mathbb{R}^{dk}} \frac{1}{n} \sum_{i=1}^n \gamma_\lambda(c, Z_i)$, where $\gamma_\lambda(c, z) = \int_K \min_{j=1,\dots,k} \|x - c_j\|^2\, \tilde{K}_\lambda(z - x)\, dx$.
30 First order conditions (I). Suppose $X \in \mathcal{M}$ and Pollard's regularity assumptions are satisfied. Then, for all $u \in \{1, \dots, d\}$ and $j \in \{1, \dots, k\}$, we have: $c_{uj} = \frac{\sum_{i=1}^n \int_{V_j} x_u\, \tilde{K}_\lambda(Z_i - x)\, dx}{\sum_{i=1}^n \int_{V_j} \tilde{K}_\lambda(Z_i - x)\, dx} \iff \nabla_{e_{uj}} J_n^\lambda(c) = 0$, where $J_n^\lambda(c) = \sum_{i=1}^n \gamma_\lambda(c, Z_i)$ and $V_j$ is the Voronoi cell of $c_j$.
31 First order conditions (II). The standard k-means: $c_{u,j} = \frac{\sum_{i=1}^n X_{i,u} \mathbf{1}_{X_i \in V_j}}{\sum_{i=1}^n \mathbf{1}_{X_i \in V_j}} = \frac{\sum_{i=1}^n \int_{V_j} x_u\, \delta_{X_i}\, dx}{\sum_{i=1}^n \int_{V_j} \delta_{X_i}\, dx}$, for all $u, j$, where $\delta_{X_i}$ is the Dirac function at point $X_i$. Another look: $c_{u,j} = \frac{\int_{V_j} x_u \hat{f}_n(x)\, dx}{\int_{V_j} \hat{f}_n(x)\, dx}$, $u \in \{1, \dots, d\}$, $j \in \{1, \dots, k\}$, where $\hat{f}_n(x) = \frac{1}{n} \sum_{i=1}^n \tilde{K}_\lambda(Z_i - x)$ is the deconvolution kernel estimator of the density $f$.
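The fixed-point iteration implied by these first-order conditions (each center is the $\hat{f}_n$-weighted barycenter of its Voronoi cell) can be sketched on a 1-D grid. This is only an illustration of the idea, not the slides' algorithm: it assumes Laplace noise (so the deconvoluting kernel has a closed form with a Gaussian kernel) and approximates the cell integrals by Riemann sums on a fixed grid:

```python
import math
import random

def deconv_kernel(u, sigma, lam):
    # Deconvoluting kernel for Laplace(sigma) noise and a Gaussian kernel (closed form).
    phi = math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)
    return phi * (1.0 + (sigma / lam) ** 2 * (1.0 - u * u))

def noisy_kmeans_1d(z, init_centers, sigma, lam, grid, n_iter=15):
    """Fixed-point iteration from the first-order conditions:
    c_j = (int_{V_j} x f_hat(x) dx) / (int_{V_j} f_hat(x) dx),
    with f_hat the deconvolution density estimate, on a discrete grid."""
    n = len(z)
    fhat = [sum(deconv_kernel((zi - x) / lam, sigma, lam) for zi in z) / (n * lam)
            for x in grid]
    centers = list(init_centers)
    for _ in range(n_iter):
        num = [0.0] * len(centers)
        den = [0.0] * len(centers)
        for x, w in zip(grid, fhat):
            w = max(w, 0.0)  # the deconvoluting kernel can dip below zero
            j = min(range(len(centers)), key=lambda j: (x - centers[j]) ** 2)
            num[j] += x * w
            den[j] += w
        centers = [num[j] / den[j] if den[j] > 0 else centers[j]
                   for j in range(len(centers))]
    return centers

random.seed(2)
sigma = 0.7
x_true = [random.gauss(0, 1) for _ in range(400)] + [random.gauss(10, 1) for _ in range(400)]
z = [x + random.choice([-1, 1]) * random.expovariate(1.0 / sigma) for x in x_true]
grid = [-5 + 0.1 * i for i in range(201)]  # covers [-5, 15]
c_hat = sorted(noisy_kmeans_1d(z, [2.0, 8.0], sigma, lam=0.6, grid=grid))
```

The only difference from plain Lloyd is the weighting: cell means are taken against the deconvoluted density instead of the empirical measure of the noisy points.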
32 The algorithm of Noisy K-means (j.w. with C. Brunet). [Algorithm figure.]
33 Experimental setting: simulation study. 1. We draw i.i.d. sequences $(X_i)_{i=1,\dots,n}$ (Gaussian mixtures) and $(\epsilon_i)_{i=1}^n$ (symmetric noise), for $n \in \{100, 500\}$. 2. We draw repetitions $(\epsilon^j)_{j=1,\dots,m}$. 3. We compute noisy k-means clusters $\hat{c}$ with an estimation step of $f * \eta$. 4. We calculate the clustering risk: $r_n(\hat{c}) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}_{X_i^j \notin V_j(\hat{c})}$.
34 Experimental setting - Model 1. For $u \in \{1, \dots, 10\}$, we call Mod1(u): $Z_i = X_i + \epsilon_i(u)$, $i = 1, \dots, n$, where: $(X_i)_{i=1}^n$ are i.i.d. with density $f = \frac{1}{2} f_{\mathcal{N}(0_2, I_2)} + \frac{1}{2} f_{\mathcal{N}((5,0)^T, I_2)}$, and $(\epsilon_i(u))_{i=1}^n$ are i.i.d. with law $\mathcal{N}(0_2, \Sigma(u))$, where $\Sigma(u)$ is a diagonal matrix with diagonal vector $(0, u)^T$, for $u \in \{1, \dots, 10\}$.
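Mod1(u) is straightforward to simulate; a pure-Python sketch (with no claim to match the exact code used for the slides):

```python
import random

def simulate_mod1(n, u, seed=None):
    """Draw (Z_i) from Mod1(u): X_i from the balanced mixture
    (1/2) N(0_2, I_2) + (1/2) N((5,0)^T, I_2), then Z_i = X_i + eps_i
    with eps_i ~ N(0_2, diag(0, u)) -- only the second coordinate is contaminated."""
    rng = random.Random(seed)
    zs = []
    for _ in range(n):
        mu = (0.0, 0.0) if rng.random() < 0.5 else (5.0, 0.0)
        x = (rng.gauss(mu[0], 1.0), rng.gauss(mu[1], 1.0))
        eps = (0.0, rng.gauss(0.0, u ** 0.5))  # variance u on the second coordinate
        zs.append((x[0] + eps[0], x[1] + eps[1]))
    return zs

z = simulate_mod1(4000, u=4, seed=3)
mean1 = sum(p[0] for p in z) / len(z)  # population value: (0 + 5) / 2 = 2.5
```

The second coordinate of $Z_i$ then has variance $1 + u$, so increasing $u$ progressively blurs the two clusters along that axis.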
35 Illustrations Mod1. [Figure.]
36 Experimental setting - Model 2. For $u \in \{1, \dots, 10\}$, we call Mod2(u): $Z_i = X_i(u) + \epsilon_i$, $i = 1, \dots, n$, where: $(X_i(u))_{i=1}^n$ are i.i.d. with density $f = \frac{1}{3} f_{\mathcal{N}(0_2, I_2)} + \frac{1}{3} f_{\mathcal{N}((a,b)^T, I_2)} + \frac{1}{3} f_{\mathcal{N}((b,a)^T, I_2)}$, where $(a, b) = (15 - (u-1)/2,\ 5 + (u-1)/2)$ for $u \in \{1, \dots, 10\}$, and $(\epsilon_i)_{i=1}^n$ are i.i.d. with law $\mathcal{N}(0_2, \Sigma)$, where $\Sigma$ is a diagonal matrix with diagonal vector $(5, 5)^T$.
37 Illustrations Mod2. [Figure.]
38 Results Mod1 for n = 100. [Figure.]
39 Results Mod1 for n = 500. [Figure.]
40 Results Mod2. [Figure.]
41 Adaptation! To get the optimal rates, we act as follows: $R(\hat{c}_\lambda, c^*) \leq \inf_\lambda \left\{ C_1 \left( \frac{c(\lambda)^2}{n} \right)^{1/(1+\rho)} + C_2 \lambda^{2\gamma} \right\} \leq C n^{-\frac{\gamma}{2\gamma(1+\rho)+2\beta}}$, where $\lambda^* = O(n^{-\frac{1}{2\gamma(1+\rho)+2\beta}})$. Goal: to choose the bandwidth based on Lepski's principle.
42 Empirical Risk Comparison (j.w. with M. Chichignoud). We choose $\lambda$ as follows: $\hat{\lambda} = \max\{\lambda \in \Lambda : R_n^{\lambda'}(\hat{c}_\lambda) - R_n^{\lambda'}(\hat{c}_{\lambda'}) \leq 3\delta_{\lambda'},\ \forall \lambda' \leq \lambda\}$, where $\delta_\lambda$ is defined as: $\delta_\lambda = C_{\mathrm{adapt}}\, \lambda^{-2\bar{\beta}}\, \frac{\log n}{n}$, where $C_{\mathrm{adapt}} > 0$ is an explicit constant.
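The ERC selection rule can be written generically. The sketch below uses toy stand-ins for $\hat{c}_\lambda$, $R_n^{\lambda'}$ and $\delta_{\lambda'}$; the names `fit`, `emp_risk` and `delta` are hypothetical placeholders, chosen only to exercise the selection logic:

```python
def erc_select(lambdas, fit, emp_risk, delta):
    """Empirical Risk Comparison rule (sketch): pick the largest lambda such
    that, for every lambda' <= lambda,
        R_n^{lambda'}(c_lambda) - R_n^{lambda'}(c_{lambda'}) <= 3 * delta(lambda')."""
    lambdas = sorted(lambdas)
    fitted = {lam: fit(lam) for lam in lambdas}
    selected = lambdas[0]
    for lam in lambdas:
        ok = all(emp_risk(lp, fitted[lam]) - emp_risk(lp, fitted[lp]) <= 3 * delta(lp)
                 for lp in lambdas if lp <= lam)
        if ok:
            selected = lam
    return selected

# Toy instance: "centers" are scalars, the empirical risk ignores lambda',
# and delta grows with lambda' -- values chosen only to exercise the rule.
chosen = erc_select(
    lambdas=[1, 2, 3, 4],
    fit=lambda lam: float(lam),
    emp_risk=lambda lam_prime, c: 0.1 * abs(c - 1.0),
    delta=lambda lam_prime: 0.05 * lam_prime,
)
```

On this toy instance the comparison fails for $\lambda = 3$ and $\lambda = 4$ against $\lambda' = 1$, so the rule returns 2: the design choice is exactly Lepski's, but risks rather than estimators are compared.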
43 Adaptation: data-driven choices of λ. [Figure.]
44 Uniform law for ɛ. [Figure.]
45 Adaptation: stability of the ICI method. [Figure.]
46 Real dataset: Iris. [Figure.]
47 Adaptation using Empirical Risk Comparison (ERC). To get the optimal rates, we act as follows: $R(\hat{c}_\lambda, c^*) \leq \inf_\lambda \left\{ C_1 \left( \frac{c(\lambda)^2}{n} \right)^{1/(1+\rho)} + C_2 \lambda^{2\gamma} \right\} \leq C n^{-\frac{\gamma}{2\gamma(1+\rho)+2\beta}}$, where $\lambda^* = O(n^{-\frac{1}{2\gamma(1+\rho)+2\beta}})$. Goal: to choose the bandwidth based on Lepski's principle.
48 Lepski's method. Let $\{\hat{f}_h, h \in \mathcal{H}\}$ be a family of (kernel) estimators, with associated (bandwidth) $h \in \mathcal{H} \subset \mathbb{R}$. BV decomposition: $\|\hat{f}_h - f\| \leq C\{B(h) + V(h)\}$, where (usually) $V(\cdot)$ is known. Related to minimax theory: $f \in \Sigma(\gamma, L) \Rightarrow \|\hat{f}_{h(\gamma)} - f\| \leq C \inf_h \{B(h) + V(h)\} = C \psi_n(\gamma)$. Goal: a data-driven method to reach the bias-variance trade-off (minimax adaptive method).
49 Lepski's method: the rule. The rule: $\hat{h} = \max\{h > 0 : \forall h' \leq h,\ \|\hat{f}_h - \hat{f}_{h'}\| \leq cV(h')\}$. Indeed, $\|\hat{f}_h - \hat{f}_{h'}\| \leq \|\hat{f}_h - f\| + \|f - \hat{f}_{h'}\| \leq B(h) + V(h) + B(h') + V(h') \lesssim B(h) + V(h')$ for $h' \leq h$. The rule thus selects the biggest $h > 0$ such that: $B(h) + V(h') \leq cV(h')$ for all $h' \leq h$, i.e. $B(h) \leq (c-1)V(h')$.
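Once the estimators and the variance bound $V(\cdot)$ are given, the rule itself is a few lines. A sketch with a toy sequence of pointwise estimates (values chosen only to show the bias kicking in at large $h$):

```python
def lepski_select(hs, estimates, V, c=1.0):
    """Lepski's rule (sketch): pick the largest h such that, for every h' <= h,
    |f_hat_h - f_hat_{h'}| <= c * V(h').  `estimates` maps bandwidth -> estimate."""
    hs = sorted(hs)
    chosen = hs[0]
    for h in hs:
        if all(abs(estimates[h] - estimates[hp]) <= c * V(hp) for hp in hs if hp <= h):
            chosen = h
    return chosen

# Toy pointwise estimates: stable for small h, strongly biased at h = 0.8.
est = {0.1: 1.00, 0.2: 1.02, 0.4: 1.05, 0.8: 1.60}
h_hat = lepski_select(est.keys(), est, V=lambda h: 0.1 / h ** 0.5)
```

Here $V(0.1) \approx 0.316$ while $|\hat{f}_{0.8} - \hat{f}_{0.1}| = 0.6$, so $h = 0.8$ is rejected and the rule stops at $h = 0.4$, the largest bandwidth still consistent with the smaller ones.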
50 Empirical Risk Comparison (j.w. with M. Chichignoud). We choose $\lambda$ as follows: $\hat{\lambda} = \max\{\lambda \in \Lambda : R_n^{\lambda'}(\hat{c}_\lambda) - R_n^{\lambda'}(\hat{c}_{\lambda'}) \leq 3\delta_{\lambda'},\ \forall \lambda' \leq \lambda\}$, where $\delta_\lambda$ is defined as: $\delta_\lambda = C_{\mathrm{adapt}}\, \lambda^{-2\bar{\beta}}\, \frac{\log n}{n}$, where $C_{\mathrm{adapt}} > 0$ is an explicit constant.
51 Theorem 3: Adaptive upper bound (j.w. with M. Chichignoud). Suppose $f \in \Sigma(\gamma, L)$, the noise assumption and Pollard's regularity assumptions are satisfied. Consider a kernel of order $\gamma$ which satisfies the kernel assumption. Then: $\limsup_{n \to +\infty} \left(\frac{n}{\log n}\right)^{\frac{\gamma}{\gamma + \sum_{i=1}^d \beta_i}} \sup_{f \in \Sigma(\gamma, L)} \mathbb{E}[R(\hat{c}_{\hat{\lambda}}) - R(c^*)] < +\infty$, where $\hat{c}_\lambda = \arg\min_{c \in \mathcal{C}} \sum_{i=1}^n \ell_\lambda(c, Z_i)$ and $\hat{\lambda}$ is chosen with the ERC rule.
52-54 Proof for $\lambda^* \in \{\lambda_1, \lambda_2\}$, $\lambda_1 < \lambda_2$. The rule becomes: $\hat{\lambda} = \lambda_1 \mathbf{1}_\Omega + \lambda_2 \mathbf{1}_{\Omega^C}$, where $\Omega = \{R_n^{\lambda_1}(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) > C\delta_{\lambda_1}\}$. Case 1: $\lambda^* = \lambda_1 < \lambda_2$. Case 2: $\lambda^* = \lambda_2 > \lambda_1$.
55-57 Proof, Case 1: $\lambda^* = \lambda_1 < \lambda_2$. $\mathbb{E} R(\hat{c}_{\hat{\lambda}}, c^*) = \mathbb{E} R(\hat{c}_{\hat{\lambda}}, c^*)(\mathbf{1}_\Omega + \mathbf{1}_{\Omega^C}) \leq \psi_n(\lambda^*) + \mathbb{E} R(\hat{c}_{\hat{\lambda}}, c^*) \mathbf{1}_{\Omega^C}$. On $\Omega^C$, we have with high probability: $R(\hat{c}_{\hat{\lambda}}, c^*) = (R - R^\lambda)(\hat{c}_{\hat{\lambda}}, c^*) + (R^\lambda - R_n^\lambda)(\hat{c}_{\hat{\lambda}}, c^*) + R_n^\lambda(\hat{c}_{\hat{\lambda}}, c^*) \leq B(\lambda^*) + (R^\lambda - R_n^\lambda)(\hat{c}_{\hat{\lambda}}, c^*) + 3\delta_{\lambda^*} \leq B(\lambda^*) + r_{\lambda^*}(2\log n) + 3\delta_{\lambda^*} \leq C\psi_n(\lambda^*)$, where $r_\lambda(t)$ is such that: $\mathbb{P}\left( \sup_c |R_n^\lambda - R^\lambda|(c, c^*) \geq r_\lambda(t) \right) \leq e^{-t}$.
58-60 Proof, Case 2: $\lambda^* = \lambda_2 > \lambda_1$. $\mathbb{E} R(\hat{c}_{\hat{\lambda}}, c^*) \leq \psi_n(\lambda^*) + \mathbb{E} R(\hat{c}_{\hat{\lambda}}, c^*) \mathbf{1}_\Omega \leq \psi_n(\lambda^*) + \mathbb{P}(\Omega)$, where $\Omega = \{R_n^{\lambda_1}(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) > C\delta_{\lambda_1}\}$. Now: $R_n^{\lambda_1}(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) = (R_n^{\lambda_1} - R^{\lambda_1})(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) + (R^{\lambda_1} - R)(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) + R(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) \leq (R_n^{\lambda_1} - R^{\lambda_1})(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) + 2B(\lambda_1) + R(\hat{c}_{\lambda_2}, c^*)$. Since $B(\lambda_1) < B(\lambda_2) = B(\lambda^*)$ and $\delta_{\lambda^*} = \delta_{\lambda_2} < \delta_{\lambda_1}$, and using Bousquet's inequality twice, we have with probability $1 - 2n^{-2}$: $R_n^{\lambda_1}(\hat{c}_{\lambda_2}, \hat{c}_{\lambda_1}) \leq 2r_{\lambda_1}(2\log n) + 2\delta_{\lambda_1} + B(\lambda_2) + \delta_{\lambda_2} \leq C\delta_{\lambda_1}$.
61 ERC's Extension. Consider a family of $\lambda$-ERM $\{\hat{g}_\lambda, \lambda > 0\}$. Assume: 1. There exists an increasing function $\mathrm{Bias}(\cdot)$ such that: $(R^\lambda - R)(g, g^*) \leq \mathrm{Bias}(\lambda) + \frac{1}{4} R(g, g^*)$, for all $g \in \mathcal{G}$. 2. There exists a decreasing function $\mathrm{Var}_t(\cdot)$ ($t \geq 0$) such that for all $\lambda, t > 0$: $\mathbb{P}\left( \sup_{g \in \mathcal{G}} \left\{ (R_n^\lambda - R^\lambda)(g, g^*) - \frac{1}{4} R(g, g^*) \right\} > \mathrm{Var}_t(\lambda) \right) \leq e^{-t}$. Then, there exists a universal constant $C_3$ such that $\mathbb{E} R(\hat{g}_{\hat{\lambda}}, g^*) \leq C_3 \left( \inf_\lambda \left\{ \mathrm{Bias}(\lambda) + \mathrm{Var}_t(\lambda) \right\} + e^{-t} \right)$, for all $t \geq 0$.
62 Examples.
Nonparametric estimation:
- Image denoising: $R_n^\lambda(f_t) = \sum_i (Y_i - f_t)^2 K_\lambda(X_i - x_0)$
- Local robust regression: $R_n^\lambda(t) = \sum_i \rho(Y_i - t) K_\lambda(X_i - x_0)$
- Fitted local likelihood: $R_n^\lambda(\theta) = \sum_i \log p(Y_i, \theta) K_\lambda(X_i - x_0)$
Inverse Statistical Learning:
- Quantile estimation: $R_n^\lambda(q) = \sum_i \int (x - q)(\tau - \mathbf{1}_{x \leq q})\, \tilde{K}_\lambda(Z_i - x)\, dx$
- Learning principal curves: $R_n^\lambda(g) = \sum_i \int \inf_t \|x - f(t)\|^2\, \tilde{K}_\lambda(Z_i - x)\, dx$
- Binary classification: $R_n^\lambda(G) = \sum_i \int \mathbf{1}_{Y_i \neq \mathbf{1}(x \in G)}\, \tilde{K}_\lambda(Z_i - x)\, dx$
63 Open problems. Anisotropic case, margin adaptation, model selection.
64 Conclusion. Thanks for your attention!
More informationPersistent homology and nonparametric regression
Cleveland State University March 10, 2009, BIRS: Data Analysis using Computational Topology and Geometric Statistics joint work with Gunnar Carlsson (Stanford), Moo Chung (Wisconsin Madison), Peter Kim
More informationStatistical learning theory, Support vector machines, and Bioinformatics
1 Statistical learning theory, Support vector machines, and Bioinformatics Jean-Philippe.Vert@mines.org Ecole des Mines de Paris Computational Biology group ENS Paris, november 25, 2003. 2 Overview 1.
More informationChapter 1. Density Estimation
Capter 1 Density Estimation Let X 1, X,..., X n be observations from a density f X x. Te aim is to use only tis data to obtain an estimate ˆf X x of f X x. Properties of f f X x x, Parametric metods f
More informationLocal Polynomial Regression
VI Local Polynomial Regression (1) Global polynomial regression We observe random pairs (X 1, Y 1 ),, (X n, Y n ) where (X 1, Y 1 ),, (X n, Y n ) iid (X, Y ). We want to estimate m(x) = E(Y X = x) based
More informationNonparametric Bayes tensor factorizations for big data
Nonparametric Bayes tensor factorizations for big data David Dunson Department of Statistical Science, Duke University Funded from NIH R01-ES017240, R01-ES017436 & DARPA N66001-09-C-2082 Motivation Conditional
More information(Part 1) High-dimensional statistics May / 41
Theory for the Lasso Recall the linear model Y i = p j=1 β j X (j) i + ɛ i, i = 1,..., n, or, in matrix notation, Y = Xβ + ɛ, To simplify, we assume that the design X is fixed, and that ɛ is N (0, σ 2
More informationOptimal Estimation of a Nonsmooth Functional
Optimal Estimation of a Nonsmooth Functional T. Tony Cai Department of Statistics The Wharton School University of Pennsylvania http://stat.wharton.upenn.edu/ tcai Joint work with Mark Low 1 Question Suppose
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More informationDensity estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas
0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationAn Introduction to Statistical Theory of Learning. Nakul Verma Janelia, HHMI
An Introduction to Statistical Theory of Learning Nakul Verma Janelia, HHMI Towards formalizing learning What does it mean to learn a concept? Gain knowledge or experience of the concept. The basic process
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationConcentration, self-bounding functions
Concentration, self-bounding functions S. Boucheron 1 and G. Lugosi 2 and P. Massart 3 1 Laboratoire de Probabilités et Modèles Aléatoires Université Paris-Diderot 2 Economics University Pompeu Fabra 3
More informationStatistical Inverse Problems and Instrumental Variables
Statistical Inverse Problems and Instrumental Variables Thorsten Hohage Institut für Numerische und Angewandte Mathematik University of Göttingen Workshop on Inverse and Partial Information Problems: Methodology
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationLecture Notes 15 Prediction Chapters 13, 22, 20.4.
Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data
More informationNonparametric Inference In Functional Data
Nonparametric Inference In Functional Data Zuofeng Shang Purdue University Joint work with Guang Cheng from Purdue Univ. An Example Consider the functional linear model: Y = α + where 1 0 X(t)β(t)dt +
More information12. Structural Risk Minimization. ECE 830 & CS 761, Spring 2016
12. Structural Risk Minimization ECE 830 & CS 761, Spring 2016 1 / 23 General setup for statistical learning theory We observe training examples {x i, y i } n i=1 x i = features X y i = labels / responses
More informationAdvanced Statistics II: Non Parametric Tests
Advanced Statistics II: Non Parametric Tests Aurélien Garivier ParisTech February 27, 2011 Outline Fitting a distribution Rank Tests for the comparison of two samples Two unrelated samples: Mann-Whitney
More informationPaper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationVariable selection for model-based clustering
Variable selection for model-based clustering Matthieu Marbac (Ensai - Crest) Joint works with: M. Sedki (Univ. Paris-sud) and V. Vandewalle (Univ. Lille 2) The problem Objective: Estimation of a partition
More informationThe sample complexity of agnostic learning with deterministic labels
The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College
More information1 The Glivenko-Cantelli Theorem
1 The Glivenko-Cantelli Theorem Let X i, i = 1,..., n be an i.i.d. sequence of random variables with distribution function F on R. The empirical distribution function is the function of x defined by ˆF
More informationSTAT 200C: High-dimensional Statistics
STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57
More informationCurve learning. p.1/35
Curve learning Gérard Biau UNIVERSITÉ MONTPELLIER II p.1/35 Summary The problem The mathematical model Functional classification 1. Fourier filtering 2. Wavelet filtering Applications p.2/35 The problem
More informationDistirbutional robustness, regularizing variance, and adversaries
Distirbutional robustness, regularizing variance, and adversaries John Duchi Based on joint work with Hongseok Namkoong and Aman Sinha Stanford University November 2017 Motivation We do not want machine-learned
More informationASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin
The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationMotivational Example
Motivational Example Data: Observational longitudinal study of obesity from birth to adulthood. Overall Goal: Build age-, gender-, height-specific growth charts (under 3 year) to diagnose growth abnomalities.
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationA Neyman-Pearson Approach to Statistical Learning
A Neyman-Pearson Approach to Statistical Learning Clayton Scott and Robert Nowak Technical Report TREE 0407 Department of Electrical and Computer Engineering Rice University Email: cscott@rice.edu, nowak@engr.wisc.edu
More informationBayesian Nonparametric Point Estimation Under a Conjugate Prior
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 5-15-2002 Bayesian Nonparametric Point Estimation Under a Conjugate Prior Xuefeng Li University of Pennsylvania Linda
More informationHomework # , Spring Due 14 May Convergence of the empirical CDF, uniform samples
Homework #3 36-754, Spring 27 Due 14 May 27 1 Convergence of the empirical CDF, uniform samples In this problem and the next, X i are IID samples on the real line, with cumulative distribution function
More informationSupplement to Quantile-Based Nonparametric Inference for First-Price Auctions
Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract
More informationSUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 1. Minimax-Optimal Bounds for Detectors Based on Estimated Prior Probabilities
SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 1 Minimax-Optimal Bounds for Detectors Based on Estimated Prior Probabilities Jiantao Jiao*, Lin Zhang, Member, IEEE and Robert D. Nowak, Fellow, IEEE
More information