Inverse Statistical Learning


1 Inverse Statistical Learning: Minimax theory, adaptation and algorithm
With (in order of appearance): C. Marteau, M. Chichignoud, C. Brunet and S. Souchet.
Dijon, January 15, 2014.

2 The problem of Inverse Statistical Learning
Given $(X, Y) \sim P$ on $\mathcal{X} \times \mathcal{Y}$, a class $\mathcal{G}$ and a loss function $\ell : \mathcal{G} \times (\mathcal{X} \times \mathcal{Y}) \to \mathbb{R}_+$, we aim at:
$$g^* \in \arg\min_{g \in \mathcal{G}} \mathbb{E}_P\, \ell(g, (X, Y)).$$

3 The problem of Inverse Statistical Learning
Given $(X, Y) \sim P$ on $\mathcal{X} \times \mathcal{Y}$, a class $\mathcal{G}$ and a loss function $\ell : \mathcal{G} \times (\mathcal{X} \times \mathcal{Y}) \to \mathbb{R}_+$, we aim at:
$$g^* \in \arg\min_{g \in \mathcal{G}} \mathbb{E}_P\, \ell(g, (X, Y)),$$
from an indirect sequence of i.i.d. observations $(Z_1, Y_1), \ldots, (Z_n, Y_n)$, where $Z_i \sim Af$, $A$ is a linear compact operator (and $X \sim f$).

4 Statistical Learning with errors in variables
Given $(X, Y) \sim P$ on $\mathcal{X} \times \mathcal{Y}$, a class $\mathcal{G}$ and a loss function $\ell : \mathcal{G} \times (\mathcal{X} \times \mathcal{Y}) \to \mathbb{R}_+$, we aim at:
$$g^* \in \arg\min_{g \in \mathcal{G}} \mathbb{E}_P\, \ell(g, (X, Y)),$$
from a noisy sequence of i.i.d. observations $(X_1 + \epsilon_1, Y_1), \ldots, (X_n + \epsilon_n, Y_n)$, where $Z_i = X_i + \epsilon_i \sim f * \eta$ and $\eta$ is the density of the i.i.d. sequence $(\epsilon_i)_{i=1}^n$.

5 Statistical Learning with errors in variables (cont.)
Same setting as above, with:
- $\mathcal{Y} = \mathbb{R}$: regression with errors in variables,
- $\mathcal{Y} = \{1, \ldots, M\}$: classification with errors in variables,
- $\mathcal{Y} = \emptyset$: unsupervised learning with errors in variables.
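As a concrete illustration of this observation scheme (my own toy simulation, not from the slides), the learner only sees $(Z_i, Y_i)$ with $Z_i = X_i + \epsilon_i$; all distributions, the noise level and the sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Direct (unobservable) data: labels Y and a one-dimensional design X.
Y = rng.integers(0, 2, size=n)                      # Y in {0, 1}
X = rng.normal(loc=3.0 * Y, scale=1.0, size=n)      # X | Y = y ~ N(3y, 1)

# Measurement error with density eta (Gaussian here, an arbitrary choice).
eps = rng.normal(loc=0.0, scale=0.8, size=n)

# What is actually observed in inverse statistical learning:
Z = X + eps                                          # noisy inputs, law f * eta given Y
sample = list(zip(Z, Y))                             # (Z_1, Y_1), ..., (Z_n, Y_n)
print(sample[:3])
```

Any procedure trained naively on $(Z_i, Y_i)$ targets the convolved design density $f * \eta$ rather than $f$, which is exactly the failure quantified on the ERM slides below.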

6 Toy example (I)
[Figure: direct dataset (unobservable) vs. observations (available).]

7 Toy example (II)
[Figure: direct dataset (unobservable) vs. observations (available).]

8 Real-world example in oncology (I)
Fig. 1: the same tumor observed by two radiologists,
$$Z_{ij} = X_i + \epsilon_{ij}, \quad j \in \{1, 2\}.$$

9 Real-world example in oncology (II)
Fig. 1: batch effect in a micro-array dataset (J. A. Gagnon-Bartsch, L. Jacob and T. P. Speed, 2013).

10 Contents
1. Minimax rates in discriminant analysis
2. Excess risk bound
3. The algorithm of noisy k-means
(4.) Adaptation

11 Origin: a minimax motivation (with C. Marteau)
Direct case: density estimation $n^{-\frac{2\gamma}{2\gamma+1}}$; classification $n^{-\frac{\gamma(\alpha+1)}{\gamma(\alpha+2)+d}}$.
Noisy case: density estimation $n^{-\frac{2\gamma}{2\gamma+2\beta+1}}$; classification: ???
Assumptions: $f \in \Sigma(\gamma, L)$, resp. $x \mapsto \mathbb{P}(Y = 1 \mid X = x) \in \Sigma(\gamma, L)$; margin parameter $\alpha \geq 0$; $|\mathcal{F}[\eta](t)| \approx |t|^{-\beta}$, resp. $|\mathcal{F}[\eta_j](t_j)| \approx |t_j|^{-\beta_j}$, $j = 1, \ldots, d$.

12 Mammen and Tsybakov (1999)
Given two densities $f$ and $g$, for any $G \subset K$, the Bayes risk is defined as:
$$R_K(G) = \frac{1}{2}\left[\int_{K \setminus G} f\, dQ + \int_{G} g\, dQ\right].$$
Given $X_1^1, \ldots, X_n^1 \sim f$ and $X_1^2, \ldots, X_n^2 \sim g$, we aim at:
$$G^* = \arg\min_{G \in \mathcal{G}} R_K(G).$$
Goal: to obtain minimax fast rates
$$r_n(\mathcal{F}) \approx \inf_{\hat G} \sup_{(f,g) \in \mathcal{F}} \mathbb{E}\, d(\hat G, G^*), \quad d \in \{d_{f,g}, d_{\Delta}\}.$$

13 Mammen and Tsybakov (1999) with errors in variables
We observe $Z_1^1, \ldots, Z_n^1$ and $Z_1^2, \ldots, Z_n^2$ such that:
$$Z_i^1 = X_i^1 + \epsilon_i^1 \quad \text{and} \quad Z_i^2 = X_i^2 + \epsilon_i^2, \quad i = 1, \ldots, n,$$
where $X_i^1 \sim f$, $X_i^2 \sim g$, and the $\epsilon_i^j$ are i.i.d. with density $\eta$.
Goal: to obtain minimax fast rates
$$r_n(\mathcal{F}, \beta) \approx \inf_{\hat G} \sup_{(f,g) \in \mathcal{F}} \mathbb{E}\, d(\hat G, G^*), \quad d \in \{d_{f,g}, d_{\Delta}\}.$$

14 ERM approach
ERM principle in the direct case:
$$\frac{1}{2n}\sum_{i=1}^n \mathbb{1}_{X_i^1 \in G^C} + \frac{1}{2n}\sum_{i=1}^n \mathbb{1}_{X_i^2 \in G} \longrightarrow R_K(G).$$
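To make the direct-case ERM concrete, here is a toy example of my own (not from the talk): $f = \mathcal{N}(0,1)$, $g = \mathcal{N}(2,1)$ and the class of half-lines $G_t = (-\infty, t]$, so the empirical risk above becomes a one-dimensional function of the threshold $t$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Two direct samples: X^1 ~ f = N(0, 1), X^2 ~ g = N(2, 1) (arbitrary choices).
X1 = rng.normal(0.0, 1.0, n)
X2 = rng.normal(2.0, 1.0, n)

# Candidate sets G_t = (-inf, t]; the empirical risk of the slide is
#   R_n(G_t) = (1/2n) #{X^1_i not in G_t} + (1/2n) #{X^2_i in G_t}.
ts = np.linspace(-3.0, 5.0, 801)
risk = np.array([np.mean(X1 > t) / 2 + np.mean(X2 <= t) / 2 for t in ts])

t_hat = ts[np.argmin(risk)]
print(f"empirical threshold {t_hat:.2f}, Bayes boundary t* = 1 (where f = g)")
```

The minimizer concentrates around the Bayes boundary $t^* = 1$, i.e. the frontier of $G^* = \{f \geq g\}$.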

15 ERM approach
ERM principle in this model fails:
$$\frac{1}{2n}\sum_{i=1}^n \mathbb{1}_{Z_i^1 \in G^C} + \frac{1}{2n}\sum_{i=1}^n \mathbb{1}_{Z_i^2 \in G} \longrightarrow \frac{1}{2}\left[\int_{G^C} f * \eta + \int_{G} g * \eta\right] \neq R_K(G).$$

16 ERM approach
ERM principle in this model fails (as above). Solution: define
$$R_n^\lambda(G) = \frac{1}{2}\left[\int_{G^C} \hat f_n^\lambda(x)\, dx + \int_{G} \hat g_n^\lambda(x)\, dx\right],$$
where $(\hat f_n^\lambda, \hat g_n^\lambda)$ are estimators of $(f, g)$ of the form:
$$\hat f_n^\lambda(x) = \frac{1}{n\lambda}\sum_{i=1}^n \mathcal{K}\left(\frac{Z_i^1 - x}{\lambda}\right).$$

17 Details
$Z_1^1, \ldots, Z_n^1$ i.i.d. $\sim f * \eta$ and $Z_1^2, \ldots, Z_n^2$ i.i.d. $\sim g * \eta$. We consider:
$$R_n^\lambda(G) = \frac{1}{2}\left[\int_{G^C} \hat f_n^\lambda(x)\, dx + \int_{G} \hat g_n^\lambda(x)\, dx\right],$$
where $\hat f_n^\lambda$ and $\hat g_n^\lambda$ are deconvolution kernel estimators. Then:
$$R_n^\lambda(G) = \frac{1}{2n}\left[\sum_{i=1}^n h_{G^C}^\lambda(Z_i^1) + \sum_{i=1}^n h_{G}^\lambda(Z_i^2)\right],$$
where:
$$h_G^\lambda(z) = \int_G \frac{1}{\lambda}\, \mathcal{K}\left(\frac{z - x}{\lambda}\right) dx = \mathbb{1}_G * \mathcal{K}_\lambda(z).$$
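Below is a minimal numerical sketch (my own, one-dimensional) of the deconvolution kernel estimator $\hat f_n^\lambda$ used in $R_n^\lambda$, written directly in the Fourier domain with a sinc kernel (so $\mathcal{F}[\mathcal{K}] = \mathbb{1}_{[-1,1]}$) and Gaussian measurement error; the kernel, bandwidth and quadrature are illustrative assumptions, not the talk's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, lam = 2000, 0.3, 0.2            # sample size, noise std, bandwidth: arbitrary choices

# Direct data from a two-component Gaussian mixture, observed with N(0, s^2) errors.
X = np.where(rng.random(n) < 0.5, rng.normal(0.0, 1.0, n), rng.normal(5.0, 1.0, n))
Z = X + rng.normal(0.0, s, n)

def deconv_density(x_grid, Z, lam, s, m=512):
    """Deconvolution kernel estimate with a sinc kernel (F[K] = 1 on [-1, 1]):
    f_hat(x) = (1/2pi) * int_{|t| <= 1/lam} exp(-itx) * phi_n(t) / phi_eps(t) dt."""
    t = np.linspace(-1.0 / lam, 1.0 / lam, m)
    phi_n = np.exp(1j * np.outer(t, Z)).mean(axis=1)        # empirical char. function of Z
    phi_eps = np.exp(-0.5 * (s * t) ** 2)                   # char. function of N(0, s^2)
    integrand = np.exp(-1j * np.outer(x_grid, t)) * (phi_n / phi_eps)[None, :]
    return np.real(np.trapz(integrand, t, axis=1)) / (2.0 * np.pi)

grid = np.linspace(-4.0, 9.0, 200)
est = deconv_density(grid, Z, lam, s)
print(round(grid[np.argmax(est)], 2))                       # should lie near a mixture mode (0 or 5)
```

Integrating such estimates over $G$ and $G^C$ (for the two samples) gives the deconvoluted empirical risk $R_n^\lambda(G)$ of this slide.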

18 Vapnik's bound ($\epsilon = 0$)
The use of empirical processes comes from VC theory:
$$R_K(\hat G_n) - R_K(G^*) \leq R_K(\hat G_n) - R_n(\hat G_n) + R_n(G^*) - R_K(G^*) \leq 2 \sup_{G \in \mathcal{G}} |(R_n - R_K)(G)|.$$
Goal: to control uniformly the empirical process indexed by $\mathcal{G}$.

19 Vapnik's bound ($\epsilon = 0$) (cont.)
In ISL, the empirical process is indexed by the class $\{\mathbb{1}_G * \mathcal{K}_\lambda \,:\, G \in \mathcal{G}\}$.

20 Theorem 1: Upper bound (j.w. with C. Marteau)
Suppose $(f, g) \in \mathcal{G}(\alpha, \gamma)$ and $|\mathcal{F}[\eta](t)| \approx \prod_{i=1}^d |t_i|^{-\beta_i}$, $\beta_i > 1/2$, $i = 1, \ldots, d$. Consider a kernel $\mathcal{K}$ of order $\gamma$ which satisfies some properties. Then:
$$\limsup_{n \to +\infty} \sup_{(f,g) \in \mathcal{G}(\alpha,\gamma)} n^{\tau_d(\alpha,\beta,\gamma)}\, \mathbb{E}_{f,g}\, d(\hat G_n, G^*) < +\infty,$$
where
$$\tau_d(\alpha, \beta, \gamma) = \begin{cases} \dfrac{\gamma\alpha}{\gamma(2+\alpha) + d + 2\sum_{i=1}^d \beta_i} & \text{for } d = d_\Delta, \\[2mm] \dfrac{\gamma(\alpha+1)}{\gamma(2+\alpha) + d + 2\sum_{i=1}^d \beta_i} & \text{for } d = d_{f,g}, \end{cases}$$
and $\lambda = (\lambda_1, \ldots, \lambda_d)$ is chosen as:
$$\lambda_j = n^{-\frac{1}{\gamma(2+\alpha) + 2\sum_{i=1}^d \beta_i + d}}, \quad j \in \{1, \ldots, d\}.$$

21 Theorem 2: Lower bound (j.w. with C. Marteau)
Suppose $|\mathcal{F}[\eta](t)| \approx \prod_{i=1}^d |t_i|^{-\beta_i}$, $\beta_i > 1/2$, $i = 1, \ldots, d$. Then, for $\alpha \leq 1$,
$$\liminf_{n \to +\infty}\; \inf_{\hat G_n} \sup_{(f,g) \in \mathcal{G}(\alpha,\gamma)} n^{\tau_d(\alpha,\beta,\gamma)}\, \mathbb{E}_{f,g}\, d(\hat G_n, G^*) > 0,$$
where the infimum is taken over all possible estimators of the set $G^*$ and
$$\tau_d(\alpha, \beta, \gamma) = \begin{cases} \dfrac{\gamma\alpha}{\gamma(2+\alpha) + d + 2\sum_{i=1}^d \beta_i} & \text{for } d = d_\Delta, \\[2mm] \dfrac{\gamma(\alpha+1)}{\gamma(2+\alpha) + d + 2\sum_{i=1}^d \beta_i} & \text{for } d = d_{f,g}. \end{cases}$$

22 Conclusion (minimax)
Direct case: density estimation $n^{-\frac{2\gamma}{2\gamma+1}}$; classification $n^{-\frac{\gamma(\alpha+1)}{\gamma(\alpha+2)+d}}$.
Noisy case: density estimation $n^{-\frac{2\gamma}{2\gamma+2\beta+1}}$; classification $n^{-\frac{\gamma(\alpha+1)}{\gamma(\alpha+2)+2\bar\beta+d}}$, with $\bar\beta = \sum_{i=1}^d \beta_i$.
Assumptions: $f \in \Sigma(\gamma, L)$, resp. $x \mapsto \mathbb{P}(Y = 1 \mid X = x) \in \Sigma(\gamma, L)$; margin parameter $\alpha \geq 0$; $|\mathcal{F}[\eta](t)| \approx |t|^{-\beta}$, resp. $|\mathcal{F}[\eta_j](t_j)| \approx |t_j|^{-\beta_j}$, $j = 1, \ldots, d$.
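A quick consistency check (my own remark, it only uses the exponents displayed above): letting the ill-posedness indices vanish recovers the direct-case rates.

```latex
% Setting beta = 0 (resp. beta_i = 0 for all i) in the noisy-case exponents:
\[
\frac{2\gamma}{2\gamma + 2\beta + 1}\Big|_{\beta=0} = \frac{2\gamma}{2\gamma+1},
\qquad
\frac{\gamma(\alpha+1)}{\gamma(\alpha+2) + 2\bar\beta + d}\Big|_{\bar\beta=0}
  = \frac{\gamma(\alpha+1)}{\gamma(\alpha+2) + d},
\]
% so the noisy rates interpolate continuously to the direct minimax rates, and each
% unit of ill-posedness beta_i slows the rate only through the denominator.
```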

23 Sketch of the proofs, heuristic
1. Noisy quantization (for simplicity)
2. Excess risk decomposition
3. Bias control (easy and minimax)
4. Variance control: key lemma

24 Other results (I)
(Un)supervised classification with errors-in-variables:
$$R_\ell(\hat g_n^\lambda) - R_\ell(g^*) \leq C\, n^{-\frac{\kappa\gamma}{\gamma(2\kappa+\rho-1) + (2\kappa-1)\sum_{i=1}^d \beta_i}},$$
where $g^* = \arg\min_{g \in \mathcal{G}} \mathbb{E}_P\, \ell(g, (X, Y))$.

25 Other results (I) (cont.)
(Un)supervised classification with $Z_i \sim Af$, using
$$\hat f_n^N(x) = \sum_{k=1}^N \hat\theta_k\, \phi_k(x),$$
where $\hat\theta_k = b_k^{-1}\,\frac{1}{n}\sum_{i=1}^n \psi_k(Z_i)$, $A^*A\,\phi_k = b_k^2\,\phi_k$, and
$$f \in \Theta(\gamma, L) := \Big\{ f = \sum_{k \geq 1} \theta_k \phi_k \,:\, \sum_{k \geq 1} \theta_k^2\, k^{2\gamma+1} \leq L \Big\}.$$
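A minimal sketch of such a projection estimator in the particular case where $A$ is a periodic convolution on $[0,1)$, so the complex Fourier basis plays the role of $(\phi_k, \psi_k)$ and the $b_k$ are the known Fourier coefficients of the blur; the filter, truncation level $N$ and target density below are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, h = 5000, 8, 0.05                  # sample size, truncation level, blur half-width

# f: density on [0, 1) (a wrapped Gaussian mixture); observations Z ~ A f = blur * f.
X = np.mod(np.where(rng.random(n) < 0.5,
                    rng.normal(0.25, 0.05, n),
                    rng.normal(0.70, 0.08, n)), 1.0)
Z = np.mod(X + rng.uniform(-h, h, n), 1.0)

def b(k):
    """Fourier coefficient of the uniform blur on [-h, h]: E[exp(-2i pi k U)] = sinc(2 k h)."""
    return np.sinc(2 * k * h)            # np.sinc(x) = sin(pi x) / (pi x)

ks = np.arange(-N, N + 1)
# theta_hat_k = b_k^{-1} * (1/n) * sum_i psi_k(Z_i), with psi_k(z) = exp(-2i pi k z).
theta_hat = np.array([np.mean(np.exp(-2j * np.pi * k * Z)) / b(k) for k in ks])

x = np.linspace(0.0, 1.0, 400)
f_hat = np.real(theta_hat @ np.exp(2j * np.pi * np.outer(ks, x)))   # sum_k theta_hat_k phi_k(x)
print(round(x[np.argmax(f_hat)], 2))     # should sit near one of the modes, 0.25 or 0.70
```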

26 Other results (II)
If $f \in \Sigma(\vec\gamma, L)$, the anisotropic Hölder class:
$$R_\ell(\hat g_n^\lambda) - R_\ell(g^*) \leq C\, n^{-\frac{\kappa}{2\kappa+\rho-1+\epsilon(\kappa,\beta,\vec\gamma)}},$$
where:
$$\epsilon(\kappa, \beta, \vec\gamma) = (2\kappa-1)\sum_{j=1}^d \frac{\beta_j}{\gamma_j},$$
and $\lambda = (\lambda_1, \ldots, \lambda_d)$ is chosen as:
$$\lambda_j \approx n^{-\frac{2\kappa-1}{2\gamma_j\,(2\kappa+\rho-1+\epsilon(\kappa,\beta,\vec\gamma))}}, \quad j = 1, \ldots, d.$$

27 Other results (II) (cont.)
Same anisotropic bound as above, plus non-exact oracle inequalities:
$$R_\ell(\hat g) \leq (1+\epsilon)\, \inf_{g \in \mathcal{G}} R_\ell(g) + C(\epsilon)\, n^{-\frac{\gamma}{\gamma(1+\rho)+\sum_{i=1}^d \beta_i}},$$
without margin assumption.

28 Finite dimensional clustering
Given $k$, we aim at:
$$c^* \in \arg\min_{c=(c_1,\ldots,c_k) \in \mathbb{R}^{dk}} \mathbb{E} \min_{j=1,\ldots,k} \|X - c_j\|^2.$$
The empirical counterpart:
$$\hat c_n \in \arg\min_{c=(c_1,\ldots,c_k) \in \mathbb{R}^{dk}} \frac{1}{n}\sum_{i=1}^n \min_{j=1,\ldots,k} \|X_i - c_j\|^2,$$
gives rise to the popular k-means studied in (Pollard, 1982).

29 Finite dimensional noisy clustering (j.w. with C. Brunet)
We want to approximate a solution of the stochastic minimization:
$$\min_{c=(c_1,\ldots,c_k) \in \mathbb{R}^{dk}} \frac{1}{n}\sum_{i=1}^n \gamma_\lambda(c, Z_i),$$
where
$$\gamma_\lambda(c, z) = \int_K \min_{j=1,\ldots,k} \|x - c_j\|^2\, \mathcal{K}_\lambda(z - x)\, dx.$$

30 First order conditions (I)
Suppose $\|X\| \leq M$ and Pollard's regularity assumptions are satisfied. Then, for all $u \in \{1,\ldots,d\}$ and $j \in \{1,\ldots,k\}$:
$$c_{uj} = \frac{\sum_{i=1}^n \int_{V_j} x_u\, \mathcal{K}_\lambda(Z_i - x)\, dx}{\sum_{i=1}^n \int_{V_j} \mathcal{K}_\lambda(Z_i - x)\, dx} \;\Longleftrightarrow\; \frac{\partial J_n^\lambda(c)}{\partial c_{uj}} = 0,$$
where $J_n^\lambda(c) = \sum_{i=1}^n \gamma_\lambda(c, Z_i)$ and $V_j$ is the Voronoi cell of $c_j$.

31 First order conditions (II)
The standard k-means:
$$c_{u,j} = \frac{\sum_{i=1}^n X_{i,u}\, \mathbb{1}_{X_i \in V_j}}{\sum_{i=1}^n \mathbb{1}_{X_i \in V_j}} = \frac{\sum_{i=1}^n \int_{V_j} x_u\, \delta_{X_i}\, dx}{\sum_{i=1}^n \int_{V_j} \delta_{X_i}\, dx}, \quad \forall u, j,$$
where $\delta_{X_i}$ is the Dirac function at point $X_i$. Another look:
$$c_{u,j} = \frac{\int_{V_j} x_u\, \hat f_n(x)\, dx}{\int_{V_j} \hat f_n(x)\, dx}, \quad u \in \{1,\ldots,d\},\ j \in \{1,\ldots,k\},$$
where $\hat f_n(x) = \frac{1}{n}\sum_{i=1}^n \mathcal{K}_\lambda(Z_i - x)$ is the kernel deconvolution estimator of the density $f$.

32 The algorithm of Noisy k-means (j.w. with C. Brunet)
[Algorithm box displayed on the slide.]
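Here is a minimal sketch (my own reading of the first-order conditions above, not the talk's exact pseudo-code) of the resulting iteration: estimate the density on a grid and alternate Voronoi assignment of the grid points with density-weighted centroid updates $c_{u,j} = \int_{V_j} x_u \hat f / \int_{V_j} \hat f$. In noisy k-means the weights come from the deconvolution estimator $\hat f_n$; below the density estimate is simply passed in as an argument, and the grid, initialization and number of iterations are arbitrary choices.

```python
import numpy as np

def weighted_lloyd(grid, f_hat, k, n_iter=50, seed=0):
    """Lloyd-type iteration on a grid: each center is the f_hat-weighted mean of its
    Voronoi cell, i.e. c_uj = int_{V_j} x_u f_hat / int_{V_j} f_hat."""
    rng = np.random.default_rng(seed)
    w = np.clip(f_hat, 0.0, None)                     # deconvolution estimates can be negative
    centers = grid[rng.choice(len(grid), size=k, replace=False, p=w / w.sum())]
    for _ in range(n_iter):
        d2 = ((grid[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cell = d2.argmin(axis=1)                      # Voronoi assignment of the grid points
        for j in range(k):
            mask = cell == j
            if w[mask].sum() > 0:
                centers[j] = (w[mask, None] * grid[mask]).sum(axis=0) / w[mask].sum()
    return centers

# Toy usage on a 2-d grid; in the noisy setting, replace `dens` by the deconvolution
# kernel estimate of f evaluated on the grid.
xs = np.linspace(-3.0, 8.0, 60)
gx, gy = np.meshgrid(xs, xs)
grid = np.column_stack([gx.ravel(), gy.ravel()])
dens = (np.exp(-((grid - [0.0, 0.0]) ** 2).sum(1) / 2) +
        np.exp(-((grid - [5.0, 0.0]) ** 2).sum(1) / 2))
print(np.round(weighted_lloyd(grid, dens, k=2), 1))   # centers near (0, 0) and (5, 0)
```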

33 Experimental setting: simulation study
1. We draw i.i.d. sequences $(X_i)_{i=1,\ldots,n}$ (Gaussian mixtures) and $(\epsilon_i)_{i=1}^n$ (symmetric noise), for $n \in \{100, 500\}$.
2. We draw $m$ repetitions $(\epsilon^j)_{j=1,\ldots,m}$ of the noise.
3. We compute the noisy k-means clusters $\hat c$, with an estimation step of $f * \eta$.
4. We calculate the clustering risk:
$$r_n(\hat c) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{X_i^j \notin V_j(\hat c)}.$$

34 Experimental setting, Model 1
For $u \in \{1,\ldots,10\}$, we call Mod1($u$) the model:
$$Z_i = X_i + \epsilon_i(u), \quad i = 1,\ldots,n,$$
where $(X_i)_{i=1}^n$ are i.i.d. with density $f = \frac12 f_{\mathcal{N}(0_2, I_2)} + \frac12 f_{\mathcal{N}((5,0)^T, I_2)}$ and $(\epsilon_i(u))_{i=1}^n$ are i.i.d. with law $\mathcal{N}(0_2, \Sigma(u))$, where $\Sigma(u)$ is a diagonal matrix with diagonal vector $(0, u)^T$, for $u \in \{1,\ldots,10\}$.
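A direct transcription of Mod1($u$) as a simulation (sample size, seed and the value of $u$ below are arbitrary):

```python
import numpy as np

def mod1(n, u, seed=0):
    """Draw (Z_i, X_i) from Mod1(u): X ~ 1/2 N(0, I_2) + 1/2 N((5, 0)^T, I_2),
    eps(u) ~ N(0, diag(0, u)), and Z = X + eps(u)."""
    rng = np.random.default_rng(seed)
    comp = rng.integers(0, 2, size=n)                        # mixture component
    X = rng.normal(size=(n, 2)) + np.column_stack([5.0 * comp, np.zeros(n)])
    eps = np.column_stack([np.zeros(n), rng.normal(scale=np.sqrt(u), size=n)])
    return X + eps, X

Z, X = mod1(n=500, u=9)
print(Z[:2].round(2))
```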

35 Illustrations Mod1
[Figure.]

36 Experimental setting, Model 2
For $u \in \{1,\ldots,10\}$, we call Mod2($u$) the model:
$$Z_i = X_i(u) + \epsilon_i, \quad i = 1,\ldots,n,$$
where $(X_i(u))_{i=1}^n$ are i.i.d. with density
$$f = \tfrac13 f_{\mathcal{N}(0_2, I_2)} + \tfrac13 f_{\mathcal{N}((a,b)^T, I_2)} + \tfrac13 f_{\mathcal{N}((b,a)^T, I_2)},$$
where $(a, b) = (15 - (u-1)/2,\ 5 + (u-1)/2)$ for $u \in \{1,\ldots,10\}$, and $(\epsilon_i)_{i=1}^n$ are i.i.d. with law $\mathcal{N}(0_2, \Sigma)$, where $\Sigma$ is a diagonal matrix with diagonal vector $(5, 5)^T$.

37 Illustrations Mod2
[Figure.]

38 Results Mod1 for n = 100
[Figure.]

39 Results Mod1 for n = 500
[Figure.]

40 Results Mod2
[Figure.]

41 Adaptation!
To get the optimal rates, we act as follows:
$$R(\hat c_{\lambda^*}, c^*) \leq \inf_{\lambda}\left\{ C_1\left(\frac{c(\lambda)}{n}\right)^{2/(1+\rho)} + C_2\, \lambda^{2\gamma} \right\} \leq C\, n^{-\frac{2\gamma}{2\gamma(1+\rho)+2\bar\beta}},$$
where $\lambda^* = O\big(n^{-\frac{1}{2\gamma(1+\rho)+2\bar\beta}}\big)$ and $\bar\beta = \sum_{i=1}^d \beta_i$.
Goal: to choose the bandwidth based on Lepski's principle.

42 Empirical Risk Comparison (j.w. with M. Chichignoud)
We choose $\lambda$ as follows:
$$\hat\lambda = \max\big\{\lambda \in \Lambda \,:\, R_n^{\lambda'}(\hat c_\lambda) - R_n^{\lambda'}(\hat c_{\lambda'}) \leq 3\,\delta_{\lambda'} \ \ \forall \lambda' \leq \lambda \big\},$$
where $\delta_\lambda$ is defined as:
$$\delta_\lambda = C_{\mathrm{adapt}}\, \lambda^{-2\bar\beta}\, \frac{\log n}{n},$$
where $C_{\mathrm{adapt}} > 0$ is an explicit constant.
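A minimal sketch of this selection rule over a finite grid $\Lambda$, assuming routines `erm(lam)` returning $\hat c_\lambda$ and `risk(lam, c)` returning $R_n^\lambda(c)$ are available (they stand for the noisy k-means objects of the previous slides and are not implemented here); the constant is illustrative.

```python
import numpy as np

def erc_select(lambdas, erm, risk, beta_bar, n, c_adapt=1.0):
    """Empirical Risk Comparison rule:
    lambda_hat = max{ lam in Lambda :
        R_n^{lam'}(c_hat_lam) - R_n^{lam'}(c_hat_lam') <= 3 * delta_lam' for all lam' <= lam },
    with delta_lam = c_adapt * lam**(-2*beta_bar) * log(n) / n."""
    lambdas = np.sort(np.asarray(lambdas, dtype=float))
    c_hats = [erm(lam) for lam in lambdas]
    delta = c_adapt * lambdas ** (-2 * beta_bar) * np.log(n) / n

    lam_hat, c_hat = lambdas[0], c_hats[0]           # the smallest bandwidth is always admissible
    for k in range(1, len(lambdas)):
        ok = all(risk(lambdas[j], c_hats[k]) - risk(lambdas[j], c_hats[j]) <= 3 * delta[j]
                 for j in range(k))                  # compare against every smaller bandwidth
        if ok:
            lam_hat, c_hat = lambdas[k], c_hats[k]
    return lam_hat, c_hat
```

With $\Lambda = \{\lambda_1, \lambda_2\}$ this reduces to the two-bandwidth rule $\hat\lambda = \lambda_1 \mathbb{1}_\Omega + \lambda_2 \mathbb{1}_{\Omega^C}$ used in the proof below.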

43 Adaptation: data-driven choices of $\lambda$
[Figure.]

44 Uniform law for $\epsilon$
[Figure.]

45 Adaptation: stability of the ICI method
[Figure.]

46 Real dataset: Iris
[Figure.]

47 Adaptation using Empirical Risk Comparison (ERC)
To get the optimal rates, we act as follows:
$$R(\hat c_{\lambda^*}, c^*) \leq \inf_{\lambda}\left\{ C_1\left(\frac{c(\lambda)}{n}\right)^{2/(1+\rho)} + C_2\, \lambda^{2\gamma} \right\} \leq C\, n^{-\frac{2\gamma}{2\gamma(1+\rho)+2\bar\beta}},$$
where $\lambda^* = O\big(n^{-\frac{1}{2\gamma(1+\rho)+2\bar\beta}}\big)$.
Goal: to choose the bandwidth based on Lepski's principle.

48 Lepski's method
$\{\hat f_h,\ h \in \mathcal{H}\}$: a family of (kernel) estimators, with associated (bandwidth) $h \in \mathcal{H} \subset \mathbb{R}_+$.
BV decomposition:
$$\|\hat f_h - f\| \leq C\{B(h) + V(h)\},$$
where (usually) $V(\cdot)$ is known.
Related to minimax theory:
$$f \in \Sigma(\gamma, L) \ \Rightarrow\ \|\hat f_{h(\gamma)} - f\| \leq C \inf_h \{B(h) + V(h)\} = C\,\psi_n(\gamma).$$
Goal: a data-driven method to reach the bias-variance trade-off (minimax adaptive method).

49 Lepski's method: the rule
The rule:
$$\hat h = \max\big\{h > 0 \,:\, \forall h' \leq h,\ \|\hat f_h - \hat f_{h'}\| \leq c\,V(h')\big\}.$$
Indeed,
$$\|\hat f_h - \hat f_{h'}\| \leq \|\hat f_h - f\| + \|f - \hat f_{h'}\| \leq B(h) + V(h) + B(h') + V(h') \lesssim B(h) + V(h') \quad (h' \leq h).$$
The rule selects the biggest $h > 0$ such that:
$$\sup_{h' \leq h} \frac{B(h) + V(h')}{V(h')} \leq c, \quad \text{i.e. } B(h) \leq (c-1)\,V(h') \ \ \forall h' \leq h.$$
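A minimal sketch of this rule for kernel density estimation at a point, with a Gaussian kernel and the usual variance proxy $V(h) \propto \sqrt{\log n/(nh)}$; the kernel, the constant $c$ and the bandwidth grid are illustrative assumptions.

```python
import numpy as np

def kde_at(x0, X, h):
    """Gaussian kernel density estimate at the point x0 with bandwidth h."""
    return np.mean(np.exp(-0.5 * ((x0 - X) / h) ** 2)) / (h * np.sqrt(2 * np.pi))

def lepski_bandwidth(x0, X, hs, c=2.0):
    """hat_h = max{ h : |f_hat_h(x0) - f_hat_h'(x0)| <= c * V(h') for all h' <= h },
    with the variance proxy V(h) = sqrt(log n / (n h))."""
    n = len(X)
    hs = np.sort(np.asarray(hs))
    est = np.array([kde_at(x0, X, h) for h in hs])
    V = np.sqrt(np.log(n) / (n * hs))
    h_hat = hs[0]
    for k in range(1, len(hs)):
        if all(abs(est[k] - est[j]) <= c * V[j] for j in range(k)):
            h_hat = hs[k]
    return h_hat

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, 2000)
print(round(lepski_bandwidth(0.0, X, hs=np.geomspace(0.02, 1.0, 25)), 3))
```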

50 Empirical Risk Comparison (j.w. with M. Chichignoud)
We choose $\lambda$ as follows:
$$\hat\lambda = \max\big\{\lambda \in \Lambda \,:\, R_n^{\lambda'}(\hat c_\lambda) - R_n^{\lambda'}(\hat c_{\lambda'}) \leq 3\,\delta_{\lambda'} \ \ \forall \lambda' \leq \lambda \big\},$$
where $\delta_\lambda$ is defined as:
$$\delta_\lambda = C_{\mathrm{adapt}}\, \lambda^{-2\bar\beta}\, \frac{\log n}{n},$$
where $C_{\mathrm{adapt}} > 0$ is an explicit constant.

51 Theorem 3: Adaptive upper bound (j.w. with M. Chichignoud)
Suppose $f \in \Sigma(\gamma, L)$, the noise assumption and Pollard's regularity assumptions are satisfied. Consider a kernel $\mathcal{K}$ of order $\gamma$ which satisfies the kernel assumption. Then:
$$\limsup_{n \to +\infty}\ \left(\frac{n}{\log n}\right)^{\frac{\gamma}{\gamma + \sum_{i=1}^d \beta_i}} \sup_{f \in \Sigma(\gamma, L)} \mathbb{E}\big[R(\hat c_{\hat\lambda}) - R(c^*)\big] < +\infty,$$
where
$$\hat c_\lambda = \arg\min_{c \in \mathcal{C}} \sum_{i=1}^n \ell_\lambda(c, Z_i),$$
and $\hat\lambda$ is chosen with the ERC rule.

52 Proof for $\lambda \in \{\lambda_1, \lambda_2\}$, $\lambda_1 < \lambda_2$

53 Proof for $\lambda \in \{\lambda_1, \lambda_2\}$, $\lambda_1 < \lambda_2$ (cont.)
The rule becomes:
$$\hat\lambda = \lambda_1\, \mathbb{1}_\Omega + \lambda_2\, \mathbb{1}_{\Omega^C}, \quad \text{where } \Omega = \big\{R_n^{\lambda_1}(\hat c_{\lambda_2}, \hat c_{\lambda_1}) > C\,\delta_{\lambda_1}\big\}.$$
(Here $R_n^{\lambda}(c, c')$ stands for the difference $R_n^{\lambda}(c) - R_n^{\lambda}(c')$, and likewise for $R^\lambda$ and $R$.)

54 Proof (cont.)
Two cases: Case 1: $\lambda^* = \lambda_1 < \lambda_2$. Case 2: $\lambda^* = \lambda_2 > \lambda_1$.

55 Proof, Case 1: $\lambda^* = \lambda_1 < \lambda_2$
$$\mathbb{E}\, R(\hat c_{\hat\lambda}, c^*) = \mathbb{E}\big[R(\hat c_{\hat\lambda}, c^*)\,(\mathbb{1}_\Omega + \mathbb{1}_{\Omega^C})\big] \leq \psi_n(\lambda^*) + \mathbb{E}\big[R(\hat c_{\hat\lambda}, c^*)\,\mathbb{1}_{\Omega^C}\big].$$

56 Proof, Case 1 (cont.)
On $\Omega^C$, we have with high probability:
$$R(\hat c_{\hat\lambda}, c^*) = (R - R^{\lambda^*})(\hat c_{\hat\lambda}, c^*) + (R^{\lambda^*} - R_n^{\lambda^*})(\hat c_{\hat\lambda}, c^*) + R_n^{\lambda^*}(\hat c_{\hat\lambda}, c^*) \leq B(\lambda^*) + (R^{\lambda^*} - R_n^{\lambda^*})(\hat c_{\hat\lambda}, c^*) + 3\,\delta_{\lambda^*}.$$

57 Proof, Case 1 (cont.)
Continuing, on $\Omega^C$ with high probability:
$$R(\hat c_{\hat\lambda}, c^*) \leq B(\lambda^*) + r_{\lambda^*}(2\log n) + 3\,\delta_{\lambda^*} \leq C\,\psi_n(\lambda^*),$$
where $r_\lambda(t)$ is defined by:
$$\mathbb{P}\Big(\sup_{c}\, (R_n^\lambda - R^\lambda)(c, c^*) \geq r_\lambda(t)\Big) \leq e^{-t}.$$

58 Proof, Case 2: $\lambda^* = \lambda_2 > \lambda_1$
$$\mathbb{E}\, R(\hat c_{\hat\lambda}, c^*) \leq \psi_n(\lambda^*) + \mathbb{E}\big[R(\hat c_{\hat\lambda}, c^*)\,\mathbb{1}_\Omega\big] \lesssim \psi_n(\lambda^*) + \mathbb{P}(\Omega),$$
where $\Omega = \big\{R_n^{\lambda_1}(\hat c_{\lambda_2}, \hat c_{\lambda_1}) > C\,\delta_{\lambda_1}\big\}$.

59 Proof, Case 2 (cont.)
$$R_n^{\lambda_1}(\hat c_{\lambda_2}, \hat c_{\lambda_1}) = (R_n^{\lambda_1} - R^{\lambda_1})(\hat c_{\lambda_2}, \hat c_{\lambda_1}) + (R^{\lambda_1} - R)(\hat c_{\lambda_2}, \hat c_{\lambda_1}) + R(\hat c_{\lambda_2}, \hat c_{\lambda_1}) \leq (R_n^{\lambda_1} - R^{\lambda_1})(\hat c_{\lambda_2}, \hat c_{\lambda_1}) + 2B(\lambda_1) + R(\hat c_{\lambda_2}, c^*).$$

60 Proof, Case 2 (cont.)
Since $B(\lambda_1) < B(\lambda_2) = B(\lambda^*) \leq \delta_{\lambda^*} = \delta_{\lambda_2} < \delta_{\lambda_1}$ and using Bousquet's inequality twice, we have with probability $1 - 2n^{-2}$:
$$R_n^{\lambda_1}(\hat c_{\lambda_2}, \hat c_{\lambda_1}) \leq 2\,r_{\lambda_1}(2\log n) + 2\,\delta_{\lambda_1} + B(\lambda_2) + \delta_{\lambda_2} \leq C\,\delta_{\lambda_1}.$$

61 ERC's extension
Consider a family of $\lambda$-ERM $\{\hat g_\lambda,\ \lambda > 0\}$. Assume:
1. There exists an increasing function $\mathrm{Bias}(\cdot)$ such that, for all $g \in \mathcal{G}$:
$$(R^\lambda - R)(g, g^*) \leq \mathrm{Bias}(\lambda) + \tfrac14\, R(g, g^*).$$
2. There exists a decreasing function $\mathrm{Var}_t(\cdot)$ ($t \geq 0$) such that, for all $\lambda, t > 0$:
$$\mathbb{P}\Big(\sup_{g \in \mathcal{G}}\Big\{(R_n^\lambda - R^\lambda)(g, g^*) - \tfrac14\, R(g, g^*)\Big\} > \mathrm{Var}_t(\lambda)\Big) \leq e^{-t}.$$
Then, there exists a universal constant $C_3$ such that, for all $t \geq 0$:
$$\mathbb{E}\, R(\hat g_{\hat\lambda}, g^*) \leq C_3\Big(\inf_\lambda\big\{\mathrm{Bias}(\lambda) + \mathrm{Var}_t(\lambda)\big\} + e^{-t}\Big).$$
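To connect this statement with Theorem 3, here is the balancing step written out under the scalings suggested by the clustering slides above, namely $\mathrm{Bias}(\lambda) \asymp \lambda^{2\gamma}$ and, for $t = 2\log n$, $\mathrm{Var}_t(\lambda) \asymp \lambda^{-2\bar\beta}\log n/n$; these scalings are my reading of the earlier slides, not stated on this one.

```latex
% Balancing Bias(lambda) and Var_{2 log n}(lambda):
\[
\lambda^{2\gamma} \asymp \lambda^{-2\bar\beta}\,\frac{\log n}{n}
\;\Longleftrightarrow\;
\lambda^* \asymp \Big(\frac{\log n}{n}\Big)^{\frac{1}{2\gamma+2\bar\beta}},
\qquad\text{so that}\qquad
\mathbb{E}\,R(\hat g_{\hat\lambda}, g^*)
\;\lesssim\; (\lambda^*)^{2\gamma} + n^{-2}
\;\asymp\; \Big(\frac{\log n}{n}\Big)^{\frac{\gamma}{\gamma+\bar\beta}},
\]
% which is the adaptive rate of Theorem 3 with bar(beta) = sum_i beta_i.
```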

62 Examples
Nonparametric estimation:
- Image denoising: $R_n^\lambda(f_t) = \sum_i (Y_i - f_t)^2\, K_\lambda(X_i - x_0)$.
- Local robust regression: $R_n^\lambda(t) = \sum_i \rho(Y_i - t)\, K_\lambda(X_i - x_0)$.
- Fitted local likelihood: $R_n^\lambda(\theta) = -\sum_i \log p(Y_i, \theta)\, K_\lambda(X_i - x_0)$.
Inverse Statistical Learning:
- Quantile estimation: $R_n^\lambda(q) = \sum_i \int (x - q)\,(\tau - \mathbb{1}_{x \leq q})\, \mathcal{K}_\lambda(Z_i - x)\, dx$.
- Learning principal curves: $R_n^\lambda(f) = \sum_i \int \inf_t \|x - f(t)\|^2\, \mathcal{K}_\lambda(Z_i - x)\, dx$.
- Binary classification: $R_n^\lambda(G) = \sum_i \int \mathbb{1}_{Y_i \neq \mathbb{1}(x \in G)}\, \mathcal{K}_\lambda(Z_i - x)\, dx$.
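As a tiny worked instance of the second example in the direct (nonparametric estimation) column, here is local robust regression at a point $x_0$ with the Huber loss, minimizing $R_n^\lambda(t)$ over a grid of values of $t$; the data, loss parameter and bandwidth are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, x0, lam = 400, 0.5, 0.1

# Regression data with heavy-tailed noise: Y = sin(2 pi X) + noise.
X = rng.uniform(0.0, 1.0, n)
Y = np.sin(2 * np.pi * X) + 0.2 * rng.standard_t(df=2, size=n)

def huber(r, delta=0.5):
    """Huber loss rho(r)."""
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

K = np.exp(-0.5 * ((X - x0) / lam) ** 2)                  # kernel weights K_lam(X_i - x0)
ts = np.linspace(Y.min(), Y.max(), 400)
risk = np.array([(huber(Y - t) * K).sum() for t in ts])   # R_n^lam(t) = sum_i rho(Y_i - t) K_lam(X_i - x0)

t_hat = ts[np.argmin(risk)]
print(round(t_hat, 2), "target m(x0) =", round(np.sin(2 * np.pi * x0), 2))
```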

63 Open problems
- Anisotropic case
- Margin adaptation
- Model selection

64 Conclusion
Thanks for your attention!
