Inverse problems in statistics


Laurent Cavalier (Université Aix-Marseille 1, France). YES, Eurandom, 10 October 2011.

Part II: Adaptation and oracle inequalities

- Regularization methods
- Classes of functions
- Rates of convergence
- Adaptation
- Oracle inequalities
- Unbiased risk estimation (URE)

Inverse problem and sequence space

Consider the model
$$Y = Af + \varepsilon \xi,$$
where $Y$ is the observation, $f \in H$ is unknown, $A$ is a continuous linear compact operator from $H$ into $G$, $\xi$ is a white noise, and $\varepsilon$ is the noise level.

By using the SVD, we obtain the equivalent sequence space model
$$X_k = \theta_k + \varepsilon \sigma_k \xi_k, \quad k = 1, 2, \ldots,$$
where $\sigma_k \to \infty$.

The aim is to estimate (reconstruct) the function $f$ (or the sequence $\{\theta_k\}$) from the observations.
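As an illustration, here is a minimal numerical sketch of the sequence space model; the choices $\beta = 1$, $\theta_k = k^{-3/2}$ and the truncation at $n = 1000$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n, eps, beta = 1000, 0.01, 1.0      # truncation level, noise level, degree of ill-posedness
k = np.arange(1, n + 1)
sigma = k ** beta                   # mildly ill-posed case: sigma_k grows polynomially
theta = k ** -1.5                   # a square-summable "true" sequence (illustrative choice)
xi = rng.standard_normal(n)         # xi_k i.i.d. N(0, 1): coordinates of the white noise
X = theta + eps * sigma * xi        # observations X_k = theta_k + eps * sigma_k * xi_k
```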

Regularization methods

In ill-posed inverse problems, one looks for regularization methods in order to obtain a fine reconstruction of $f$.

The normal equation is
$$A^* Y = A^* A f + \varepsilon A^* \xi.$$
One has to estimate the solution $(A^*A)^{-1} A^* Y$; the problem in the ill-posed situation is that the operator $A^*A$ does not have a bounded inverse. With regularization methods, one instead inverts a related, well-behaved operator.

We call a regularization method an estimator of the form
$$\hat f_\gamma = \Phi_\gamma(A^*A)\, A^* Y,$$
where $\Phi_\gamma \in C(\sigma(A^*A))$ depends on some regularization parameter $\gamma > 0$.

Examples of commonly used regularization methods (estimators) follow. These methods are defined in the spectral domain, even if some of them may be computed without using the whole spectrum.

Estimation procedures

Suppose now that the operator $A$ is compact. Using the SVD, one obtains the sequence space model, which statisticians prefer to work with.

Many regularization methods may be expressed in a statistical framework, and usually correspond to some known estimation method in statistics. The notion of regularization is not really used in statistics; however, it is closely related to more standard notions of estimation.

Linear estimators

Consider here a specific family of estimators. Let $\lambda = (\lambda_1, \lambda_2, \ldots)$ be a sequence of nonrandom weights. Every sequence $\lambda$ defines a linear estimator $\hat\theta(\lambda) = (\hat\theta_1, \hat\theta_2, \ldots)$ with
$$\hat\theta_k = \lambda_k X_k \quad \text{and} \quad \hat f(\lambda) = \sum_{k=1}^\infty \hat\theta_k \varphi_k.$$

The $L_2$ risk of a linear estimator is
$$E\|\hat f(\lambda) - f\|^2 = R(\theta, \lambda) = \sum_{k=1}^\infty (1 - \lambda_k)^2 \theta_k^2 + \varepsilon^2 \sum_{k=1}^\infty \sigma_k^2 \lambda_k^2.$$
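This bias-variance decomposition translates directly into code; a sketch with the infinite sums truncated to finite arrays:

```python
import numpy as np

def risk(theta, lam, sigma, eps):
    """L2 risk R(theta, lambda) of the linear estimator theta_hat_k = lam_k * X_k:
    squared bias plus variance, with the series truncated to the array length."""
    bias2 = np.sum((1.0 - lam) ** 2 * theta ** 2)
    variance = eps ** 2 * np.sum(sigma ** 2 * lam ** 2)
    return bias2 + variance
```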

Equivalence in sequence space

By use of the SVD and the spectral theorem, one obtains for a general regularization method
$$\hat f(\lambda) = \Phi_\gamma(A^*A)\, A^* Y = \sum_{k=1}^\infty \Phi_\gamma(b_k^2)\, b_k y_k \varphi_k = \sum_{k=1}^\infty \Phi_\gamma(b_k^2)\, b_k^2 X_k \varphi_k,$$
which corresponds to the special case of a linear estimator with $\lambda_k = \Phi_\gamma(b_k^2)\, b_k^2$.
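For example, Tikhonov regularization corresponds to $\Phi_\gamma(t) = (t + \gamma)^{-1}$, so that, under the standard identification $\sigma_k = b_k^{-1}$ (an assumption consistent with the weights appearing below),
$$\lambda_k = \Phi_\gamma(b_k^2)\, b_k^2 = \frac{b_k^2}{b_k^2 + \gamma} = \frac{1}{1 + \gamma \sigma_k^2},$$
which is exactly the Tikhonov weight sequence used later.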

Projection estimators

Standard projection weights $\lambda_k = I(k \le N)$ give the projection estimator (also called spectral cut-off):
$$\hat\theta(N) = \begin{cases} X_k, & k \le N, \\ 0, & k > N. \end{cases}$$
The value $N$ is called the bandwidth. The projection estimator is then defined by
$$\hat f_N = \sum_{k=1}^N X_k \varphi_k.$$
One estimates the first $N$ coefficients $\theta_k$ by their empirical counterparts $X_k$, and then estimates the remainder terms by $0$ for $k > N$.
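A minimal sketch of the spectral cut-off in the sequence space model:

```python
import numpy as np

def projection_estimate(X, N):
    """Spectral cut-off with bandwidth N: keep the first N empirical
    coefficients X_k and estimate the remaining coefficients by 0."""
    lam = (np.arange(1, len(X) + 1) <= N).astype(float)  # lambda_k = I(k <= N)
    return lam * X
```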

Tikhonov regularization

The Tikhonov method (Tikhonov (1963)) is one well-known regularization method. In this method, one minimizes the functional $L_\gamma(\varphi)$:
$$\inf_{\varphi \in H} \left\{ \|A\varphi - Y\|^2 + \gamma \|\varphi\|^2 \right\},$$
where $\gamma > 0$ is some tuning parameter.

The minimum is then attained by
$$\hat f_\gamma = (A^*A + \gamma I)^{-1} A^* Y.$$

The choice of $\gamma$ is very sensitive, since it characterizes the balance between fitting and smoothing.
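In a discretized problem where $A$ is a matrix, the closed-form minimizer can be computed directly; a sketch:

```python
import numpy as np

def tikhonov_estimate(A, Y, gamma):
    """Minimizer of ||A phi - Y||^2 + gamma ||phi||^2 in a discretized setting:
    f_hat = (A^T A + gamma I)^{-1} A^T Y."""
    m = A.shape[1]
    return np.linalg.solve(A.T @ A + gamma * np.eye(m), A.T @ Y)
```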

Tikhonov estimator

Define the Tikhonov estimator by its form in the SVD domain:
$$\lambda_k = \frac{1}{1 + \gamma \sigma_k^2}.$$

In the special case where $A = I$, the estimator may be defined and computed as a modified version of Tikhonov regularization, called splines (Wahba (1990)).

In the parametric context of linear regression, the method is called ridge regression, see Hoerl (1962). It improves on the least-squares estimator when singular values of the design matrix are close to $0$.

Landweber iteration

A standard method, based on the idea of minimizing $\|A\varphi - Y\|$ by steepest descent (i.e. the gradient descent algorithm). Choose the direction $h$ equal to minus the gradient (here an approximate gradient); thus, $h = -A^*(A\varphi - Y)$.

The recursion formula is $\varphi_0 = \hat f_0 = 0$ and, for some $\mu > 0$,
$$\hat f_n = \hat f_{n-1} - \mu A^*(A \hat f_{n-1} - Y).$$
This method is called the Landweber iteration. By induction,
$$\hat f_n = \sum_{j=0}^{n-1} (I - \mu A^*A)^j \mu A^* Y.$$
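A sketch of the iteration for a discretized (matrix) $A$; the closed-form expression above can be used to check it:

```python
import numpy as np

def landweber(A, Y, mu, n_iter):
    """Landweber iteration started from f_0 = 0:
    f_n = f_{n-1} - mu * A^T (A f_{n-1} - Y).
    mu should satisfy mu * ||A^T A|| <= 1 (see below)."""
    f = np.zeros(A.shape[1])
    for _ in range(n_iter):
        f = f - mu * (A.T @ (A @ f - Y))
    return f
```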

Properties of Landweber

The parameter $\mu$ is chosen such that $\mu \|A^*A\| \le 1$; it has a strong influence on convergence. The regularization parameter is the number of iterations $n$.

From a computational point of view, the method is faster than Tikhonov, since there is no need to invert an operator. However, the Landweber iteration has some important drawbacks: in particular, the number of iterations may be large.

Pinsker estimator

The Pinsker estimator (Pinsker (1980)) is a special class of linear estimators defined by the weights (Belitser and Levit (1995))
$$\lambda_k = (1 - c_\varepsilon a_k)_+,$$
where $c_\varepsilon$ is the solution of the equation
$$\varepsilon^2 \sum_{k=1}^\infty \sigma_k^2 a_k (1 - c_\varepsilon a_k)_+ = c_\varepsilon L,$$
with $x_+ = \max(0, x)$ and $a_k > 0$.

In the context of estimation on ellipsoids, it attains not only the optimal rate of convergence, but also the exact minimax constant.
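Since the left-hand side of the equation for $c_\varepsilon$ decreases in $c_\varepsilon$ while the right-hand side increases, the root is unique and can be found by bisection; a sketch with truncated sums:

```python
import numpy as np

def pinsker_c(sigma, a, eps, L, tol=1e-12):
    """Solve eps^2 * sum_k sigma_k^2 a_k (1 - c a_k)_+ = c L for c = c_eps."""
    def gap(c):  # decreasing in c: positive below the root, negative above
        return eps ** 2 * np.sum(sigma ** 2 * a * np.clip(1 - c * a, 0.0, None)) - c * L
    lo, hi = 0.0, 1.0
    while gap(hi) > 0:          # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Pinsker weights: lam = np.clip(1 - pinsker_c(sigma, a, eps, L) * a, 0.0, None)
```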

Classes of linear estimators

Projection estimators (spectral cut-off): $\lambda_k = I(k \le N)$, $N > 0$.

Tikhonov regularization: $\lambda_k = (1 + \gamma \sigma_k^2)^{-1}$, $\gamma > 0$.

Landweber iteration: $\lambda_k = 1 - (1 - \sigma_k^{-2})^n$, $n \in \mathbb{N}$.

Pinsker filter: $\lambda_k = (1 - c_\varepsilon a_k)_+$, $c_\varepsilon > 0$.

Choice of $N$, $\gamma$ or $n$?

Ellipsoid of coefficients

Assume that $f$ belongs to a functional class corresponding to an ellipsoid $\Theta$ in the space of coefficients $\{\theta_k\}$:
$$\Theta = \Theta(a, L) = \left\{ \theta : \sum_{k=1}^\infty a_k^2 \theta_k^2 \le L \right\},$$
where $a = \{a_k\}$ with $a_k > 0$, $a_k \to \infty$, and $L > 0$.

For large values of $k$ the coefficients $\theta_k$ will thus be decreasing with $k$, and hence small. Assumptions on the coefficients $\theta_k$ are usually related to properties (smoothness) of $f$.

Sobolev classes

Introduce the Sobolev classes
$$W(\alpha, L) = \left\{ f = \sum_{k=1}^\infty \theta_k \varphi_k : \theta \in \Theta(\alpha, L) \right\},$$
where $\Theta(\alpha, L) = \Theta(a, L)$ with $a = \{a_k\}$ polynomial, such that $a_1 = 0$ and
$$a_k = \begin{cases} (k-1)^\alpha & \text{for } k \text{ odd}, \\ k^\alpha & \text{for } k \text{ even}, \end{cases} \qquad k = 2, 3, \ldots,$$
where $\alpha > 0$, $L > 0$. We also have
$$W(\alpha, L) = \left\{ f \text{ periodic} : \int_0^1 (f^{(\alpha)}(t))^2 \, dt \le \pi^{2\alpha} L \right\}.$$
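The weight sequence $a_k$ is straightforward to construct; a small sketch:

```python
import numpy as np

def sobolev_weights(n, alpha):
    """Weights a_k of the Sobolev ellipsoid Theta(alpha, L):
    a_1 = 0, and a_k = (k-1)^alpha for odd k, k^alpha for even k."""
    k = np.arange(1, n + 1)
    return np.where(k % 2 == 0, k ** alpha, (k - 1) ** alpha)
```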

Analytic classes

Consider also more restrictive conditions and the classes of functions
$$A(\alpha, L) = \left\{ f = \sum_{k=1}^\infty \theta_k \varphi_k : \theta \in \Theta_A(\alpha, L) \right\},$$
where $a = \{a_k\}$ is exponential, with $\alpha > 0$, $L > 0$, and
$$a_k = \exp(\alpha k).$$

These are classes of analytic functions (functions which admit an analytic continuation into a band of the complex plane, Ibragimov and Khasminskii (1984)). Such functions are very smooth.

Optimal rates of convergence

The function $f$ has Fourier coefficients in some ellipsoid, and the problem is mildly ill-posed, severely ill-posed, or even direct. The rates appear in the following table:

Problem \ Functions      Sobolev                                        Analytic
Direct problem           $\varepsilon^{4\alpha/(2\alpha+1)}$            $\varepsilon^2 \log(1/\varepsilon)$
Mildly ill-posed         $\varepsilon^{4\alpha/(2\alpha+2\beta+1)}$     $\varepsilon^2 (\log(1/\varepsilon))^{2\beta+1}$
Severely ill-posed       $(\log(1/\varepsilon))^{-2\alpha}$             $\varepsilon^{4\alpha/(2\alpha+2\beta)}$

Comments

The rates usually depend on the smoothness $\alpha$ of the function $f$ and on the degree of ill-posedness $\beta$. When $\beta$ increases, the rates are slower.

In the direct model, one finds the standard rates for nonparametric estimation: for example, $\varepsilon^{4\alpha/(2\alpha+1)}$ with Sobolev classes, i.e. $n^{-2\alpha/(2\alpha+1)}$ under the calibration $\varepsilon^2 = 1/n$.

Comments

To attain the optimal rate with a projection estimator, one chooses $N$ corresponding to the optimal trade-off between the bias and the variance. In the minimax sense, this is the optimal choice of $N$; however, it depends on the smoothness $\alpha$ and on the degree of ill-posedness $\beta$.

Even if the operator $A$ (and its degree $\beta$) is known, there is no real meaning in considering the smoothness of $f$ as known.

Hence the notions of adaptation and oracle inequalities: how to choose the tuning parameter ($N$, $\gamma$ or $n$) without prior assumptions on $f$?
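As a worked sketch of this trade-off, consider the projection estimator in the mildly ill-posed case $\sigma_k \asymp k^\beta$ over a Sobolev ellipsoid ($a_k \asymp k^\alpha$). Over the ellipsoid,
$$R(\theta, N) \le \sup_{\theta \in \Theta} \sum_{k > N} \theta_k^2 + \varepsilon^2 \sum_{k \le N} \sigma_k^2 \lesssim L\, N^{-2\alpha} + \varepsilon^2 N^{2\beta+1},$$
and balancing the two terms gives $N^* \asymp \varepsilon^{-2/(2\alpha+2\beta+1)}$, hence the rate $\varepsilon^{4\alpha/(2\alpha+2\beta+1)}$ from the table; the choice of $N^*$ indeed depends on both $\alpha$ and $\beta$.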

Minimax adaptive procedures

The starting point of the minimax adaptation approach is a collection $\mathcal{A} = \{\Theta_\alpha\}$ of classes $\Theta_\alpha \subset \ell_2$. The statistician knows that $\theta$ belongs to some member $\Theta_\alpha$ of the collection $\mathcal{A}$, but does not know exactly which one.

If $\Theta_\alpha$ is a smoothness class, this assumption can be interpreted as follows: the underlying function has some smoothness, but one does not know the degree of smoothness.

An estimator $\theta^*$ is called minimax adaptive on the scale of classes $\mathcal{A}$ if for every $\Theta_\alpha \in \mathcal{A}$ the estimator $\theta^*$ attains the optimal rate of convergence (Lepskii (1990)). This is an estimator which adapts to the unknown smoothness of the function.

In some cases, no estimator attains (exactly) the optimal rate on the whole scale: one often has to pay a price for adaptation (Lepskii (1992)).

For the Lepski procedure in inverse problems, see Goldenshluger and Pereverzev (2000), Bauer and Hohage (2005) or Mathé (2006).

Comments

Minimax adaptive estimators are really important in statistics, from a theoretical and from a practical point of view, as estimators which automatically adapt to the unknown smoothness of the underlying function.

From the theoretical point of view, adaptivity implies that these estimators are optimal for any possible parameter in the collection $\mathcal{A}$. From the practical point of view, it guarantees a good accuracy of the estimator for a very large choice of functions.

Oracle

Consider now a linked, but different, point of view. Assume that a class of estimators is fixed, i.e. that the class of possible weights $\lambda \in \Lambda$ is given (projection, Tikhonov, ...).

Define the oracle $\lambda^0$ by
$$R(\theta, \lambda^0) = \inf_{\lambda \in \Lambda} R(\theta, \lambda).$$
The oracle corresponds to the best possible choice in $\Lambda$, i.e. the one which minimizes the risk.

However, this is not an estimator: since the risk depends on the unknown $\theta$, the oracle depends on it as well. An oracle is the best in the family, but it knows the true $\theta$.

Unbiased risk estimation

A very natural idea in statistics is to estimate this unknown risk using the available data, and then to minimize this estimate of the risk. A classical approach to this minimization problem is based on the principle of unbiased risk estimation (URE) (Stein (1981)).

This method goes back to the Akaike Information Criterion (AIC) in Akaike (1973) and to Mallows' $C_p$ (1973). Originally, URE was used in the context of regression estimation; nowadays, it is a basic adaptation tool for many statistical models. The idea also appears in all the cross-validation techniques.

URE in inverse problems

For inverse problems, this method was studied in Cavalier, Golubev, Picard and Tsybakov (2002), where exact oracle inequalities were obtained.

In this setting, the functional
$$U(X, \lambda) = \sum_{k=1}^\infty (1 - \lambda_k)^2 (X_k^2 - \varepsilon^2 \sigma_k^2) + \varepsilon^2 \sum_{k=1}^\infty \sigma_k^2 \lambda_k^2$$
is an unbiased estimator of $R(\theta, \lambda)$:
$$R(\theta, \lambda) = E_\theta\, U(X, \lambda), \quad \forall \lambda.$$

Data-driven choice

Unbiased risk estimation suggests minimizing the functional $U(X, \lambda)$ over $\lambda \in \Lambda$ in place of $R(\theta, \lambda)$. This leads to the following data-driven choice of $\lambda$:
$$\hat\lambda = \arg\min_{\lambda \in \Lambda} U(X, \lambda).$$
Then define the estimator $\hat\theta$ by $\hat\theta_k = \hat\lambda_k X_k$.
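A sketch of the URE selection over a finite family $\Lambda$ of weight sequences, given as a list of arrays:

```python
import numpy as np

def ure(X, lam, sigma, eps):
    """Unbiased risk estimate U(X, lambda), with the series truncated."""
    return (np.sum((1.0 - lam) ** 2 * (X ** 2 - eps ** 2 * sigma ** 2))
            + eps ** 2 * np.sum(sigma ** 2 * lam ** 2))

def ure_select(X, Lambda, sigma, eps):
    """Data-driven choice: minimize U over the finite family Lambda and
    return the selected weights together with theta_hat_k = lam_k * X_k."""
    lam_hat = min(Lambda, key=lambda lam: ure(X, lam, sigma, eps))
    return lam_hat, lam_hat * X
```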

Assumptions

Denote
$$S = \left( \frac{\max_{\lambda \in \Lambda} \sum_{k=1}^\infty \sigma_k^4 \lambda_k^2}{\min_{\lambda \in \Lambda} \sum_{k=1}^\infty \sigma_k^4 \lambda_k^2} \right)^{1/2}.$$
Let the following assumptions hold. For any $\lambda \in \Lambda$,
$$0 < \sum_{k=1}^\infty \sigma_k^2 \lambda_k^2 < \infty, \qquad \max_{\lambda \in \Lambda} \sup_k \lambda_k \le 1.$$
There exists a constant $C_1 > 0$ such that, uniformly in $\lambda \in \Lambda$,
$$\sum_{k=1}^\infty \sigma_k^4 \lambda_k^2 \le C_1 \sum_{k=1}^\infty \sigma_k^4 \lambda_k^4.$$

Oracle inequality for URE

Theorem. Suppose $\sigma_k \asymp k^\beta$, $\beta \ge 0$. Assume that $\Lambda$ is finite with cardinality $D$ and belongs to the family of projection, Tikhonov or Pinsker estimators. There exist constants $\gamma, C > 0$ such that, for all $\varepsilon > 0$ and all $\theta \in \ell_2$, we have for $B$ large enough
$$E_\theta \|\hat\theta - \theta\|^2 \le (1 + \gamma B^{-1}) \min_{\lambda \in \Lambda} R(\theta, \lambda) + B C \varepsilon^2 (\log(DS))^{2\beta+1}.$$

The data-driven choice by URE mimics the oracle.

Comments

Oracle inequalities prove that the estimator has a risk of the order of the oracle. We are interested in data-driven, and therefore automatic, methods which more or less mimic the oracle.

The oracle approach is in some sense the opposite of the minimax approach: one fixes a family of estimators and chooses the best one among them, whereas in the minimax approach one seeks the best accuracy over functions which belong to some function class.

Comments

The oracle approach is often used as a tool in order to obtain adaptive estimators: the best estimator in a given class often attains the optimal rate of convergence.

On the other hand, minimax theory may be viewed as a justification for oracle inequalities. Indeed, one may ask whether the given family of estimators is satisfactory; a possible answer comes from minimax results, which prove that a given family contains optimal estimators.


More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Degrees of Freedom in Regression Ensembles

Degrees of Freedom in Regression Ensembles Degrees of Freedom in Regression Ensembles Henry WJ Reeve Gavin Brown University of Manchester - School of Computer Science Kilburn Building, University of Manchester, Oxford Rd, Manchester M13 9PL Abstract.

More information

Nonparametric regression using deep neural networks with ReLU activation function

Nonparametric regression using deep neural networks with ReLU activation function Nonparametric regression using deep neural networks with ReLU activation function Johannes Schmidt-Hieber February 2018 Caltech 1 / 20 Many impressive results in applications... Lack of theoretical understanding...

More information

Nonconcave Penalized Likelihood with A Diverging Number of Parameters

Nonconcave Penalized Likelihood with A Diverging Number of Parameters Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized

More information

Local Whittle Likelihood Estimators and Tests for non-gaussian Linear Processes

Local Whittle Likelihood Estimators and Tests for non-gaussian Linear Processes Local Whittle Likelihood Estimators and Tests for non-gaussian Linear Processes By Tomohito NAITO, Kohei ASAI and Masanobu TANIGUCHI Department of Mathematical Sciences, School of Science and Engineering,

More information

CONVERGENCE RATES OF GENERAL REGULARIZATION METHODS FOR STATISTICAL INVERSE PROBLEMS AND APPLICATIONS

CONVERGENCE RATES OF GENERAL REGULARIZATION METHODS FOR STATISTICAL INVERSE PROBLEMS AND APPLICATIONS CONVERGENCE RATES OF GENERAL REGULARIZATION METHODS FOR STATISTICAL INVERSE PROBLEMS AND APPLICATIONS BY N. BISSANTZ, T. HOHAGE, A. MUNK AND F. RUYMGAART UNIVERSITY OF GÖTTINGEN AND TEXAS TECH UNIVERSITY

More information

Stochastic Subgradient Method

Stochastic Subgradient Method Stochastic Subgradient Method Lingjie Weng, Yutian Chen Bren School of Information and Computer Science UC Irvine Subgradient Recall basic inequality for convex differentiable f : f y f x + f x T (y x)

More information

BLOCK THRESHOLDING AND SHARP ADAPTIVE ESTIMATION IN SEVERELY ILL-POSED INVERSE PROBLEMS 1)

BLOCK THRESHOLDING AND SHARP ADAPTIVE ESTIMATION IN SEVERELY ILL-POSED INVERSE PROBLEMS 1) Т Е О Р И Я В Е Р О Я Т Н О С Т Е Й Т о м 48 И Е Е П Р И М Е Н Е Н И Я В ы п у с к 3 2003 c 2003 г. CAVALIER L., GOLUBEV Y., LEPSKI O., TSYBAKOV A. BLOCK THRESHOLDING AND SHARP ADAPTIVE ESTIMATION IN SEVERELY

More information

OPTIMAL UNIFORM CONVERGENCE RATES FOR SIEVE NONPARAMETRIC INSTRUMENTAL VARIABLES REGRESSION. Xiaohong Chen and Timothy Christensen.

OPTIMAL UNIFORM CONVERGENCE RATES FOR SIEVE NONPARAMETRIC INSTRUMENTAL VARIABLES REGRESSION. Xiaohong Chen and Timothy Christensen. OPTIMAL UNIFORM CONVERGENCE RATES FOR SIEVE NONPARAMETRIC INSTRUMENTAL VARIABLES REGRESSION By Xiaohong Chen and Timothy Christensen November 2013 COWLES FOUNDATION DISCUSSION PAPER NO. 1923 COWLES FOUNDATION

More information

STAT 100C: Linear models

STAT 100C: Linear models STAT 100C: Linear models Arash A. Amini June 9, 2018 1 / 21 Model selection Choosing the best model among a collection of models {M 1, M 2..., M N }. What is a good model? 1. fits the data well (model

More information

Resampling techniques for statistical modeling

Resampling techniques for statistical modeling Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error

More information

Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology

Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Due Giorni di Algebra Lineare Numerica (2GALN) Febbraio 2016, Como. Iterative regularization in variable exponent Lebesgue spaces

Due Giorni di Algebra Lineare Numerica (2GALN) Febbraio 2016, Como. Iterative regularization in variable exponent Lebesgue spaces Due Giorni di Algebra Lineare Numerica (2GALN) 16 17 Febbraio 2016, Como Iterative regularization in variable exponent Lebesgue spaces Claudio Estatico 1 Joint work with: Brigida Bonino 1, Fabio Di Benedetto

More information

arxiv: v2 [math.st] 18 Oct 2018

arxiv: v2 [math.st] 18 Oct 2018 Bayesian inverse problems with partial observations Shota Gugushvili a,, Aad W. van der Vaart a, Dong Yan a a Mathematical Institute, Faculty of Science, Leiden University, P.O. Box 9512, 2300 RA Leiden,

More information

Preconditioning. Noisy, Ill-Conditioned Linear Systems

Preconditioning. Noisy, Ill-Conditioned Linear Systems Preconditioning Noisy, Ill-Conditioned Linear Systems James G. Nagy Emory University Atlanta, GA Outline 1. The Basic Problem 2. Regularization / Iterative Methods 3. Preconditioning 4. Example: Image

More information

How hard is this function to optimize?

How hard is this function to optimize? How hard is this function to optimize? John Duchi Based on joint work with Sabyasachi Chatterjee, John Lafferty, Yuancheng Zhu Stanford University West Coast Optimization Rumble October 2016 Problem minimize

More information

[11] Peter Mathé and Ulrich Tautenhahn, Regularization under general noise assumptions, Inverse Problems 27 (2011), no. 3,

[11] Peter Mathé and Ulrich Tautenhahn, Regularization under general noise assumptions, Inverse Problems 27 (2011), no. 3, Literatur [1] Radu Boţ, Bernd Hofmann, and Peter Mathé, Regularizability of illposed problems and the modulus of continuity, Zeitschrift für Analysis und ihre Anwendungen. Journal of Analysis and its Applications

More information

Minimax theory for a class of non-linear statistical inverse problems

Minimax theory for a class of non-linear statistical inverse problems Minimax theory for a class of non-linear statistical inverse problems Kolyan Ray and Johannes Schmidt-Hieber Leiden University Abstract We study a class of statistical inverse problems with non-linear

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Estimation of a quadratic regression functional using the sinc kernel

Estimation of a quadratic regression functional using the sinc kernel Estimation of a quadratic regression functional using the sinc kernel Nicolai Bissantz Hajo Holzmann Institute for Mathematical Stochastics, Georg-August-University Göttingen, Maschmühlenweg 8 10, D-37073

More information

Stochastic optimization in Hilbert spaces

Stochastic optimization in Hilbert spaces Stochastic optimization in Hilbert spaces Aymeric Dieuleveut Aymeric Dieuleveut Stochastic optimization Hilbert spaces 1 / 48 Outline Learning vs Statistics Aymeric Dieuleveut Stochastic optimization Hilbert

More information