Inverse problems in statistics
Laurent Cavalier (Université Aix-Marseille 1, France)
Yale, May 2, 2011

Introduction

Inverse problems appear in many fields:
- Astronomy (Hubble satellite).
- Econometrics (instrumental variables).
- Financial mathematics (model calibration).
- Medical image processing (X-rays).

These are problems where we only have indirect observations of an object (a function) that we want to reconstruct.

Inverse problems

Let H and G be Hilbert spaces, and let A be a continuous linear operator from H into G.
Given g ∈ G, find f ∈ H such that Af = g.
Solving an inverse problem means inverting the operator A.
If A^{-1} is not continuous, the problem is called ill-posed.
If we only observe g_ε, a noisy version of g, then f_ε = A^{-1} g_ε can be far from f.
Hence the importance of the notion of noise (or error).

Linear inverse problems

Let H and G be two separable Hilbert spaces, and let A be a known bounded linear operator from H to G. Consider the model

    Y = Af + ε ξ,

where Y is the observation, f ∈ H is unknown, A is a continuous linear operator from H into G, ξ is a white noise, and ε is the noise level.
The goal is to reconstruct (estimate) f from the observation Y.
Projecting a white noise on any orthonormal basis {ψ_k} gives a sequence of i.i.d. standard Gaussian random variables.

Discrete model of inverse problems

The standard discrete-sample statistical model for linear inverse problems is

    Y_i = Af(X_i) + ξ_i,   i = 1, ..., n,

where (X_1, Y_1), ..., (X_n, Y_n) are observed (we may assume X_i ∈ [0, 1]), f is an unknown function in L^2(0, 1), A is an operator from L^2(0, 1) into L^2(0, 1), and the ξ_i are i.i.d. zero-mean Gaussian random variables with variance σ^2.
The noise level is related to the number of observations by ε ≍ 1/√n.

Singular value decomposition

A major property of compact operators is that they have a discrete spectrum.
Suppose A*A is compact with a known basis of eigenfunctions in H:

    A*A ϕ_k = b_k^2 ϕ_k.

Singular value decomposition (SVD) of A:

    A ϕ_k = b_k ψ_k,   A* ψ_k = b_k ϕ_k,

where b_k > 0 are the singular values, {ϕ_k} is an o.n.b. of H, and {ψ_k} an o.n.b. of G.
A bounded compact linear operator between two Hilbert spaces may really be seen as an infinite matrix.
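
To make the "infinite matrix" picture concrete, here is a minimal numerical sketch (my own illustration, not from the talk): a circular convolution operator discretized on a grid of [0, 1] becomes a circulant matrix, and its singular values decay towards zero. The kernel and grid size are arbitrary choices.

    import numpy as np

    # Discretize Af(t) = int r(t - x) f(x) dx on a uniform grid; the operator
    # becomes a circulant matrix whose singular values decay towards zero
    # (the discrete analogue of the compactness of A).
    m = 200
    t = np.arange(m) / m
    r = np.exp(-10 * np.minimum(t, 1 - t))   # illustrative 1-periodic symmetric kernel
    A = np.array([[r[(i - j) % m] for j in range(m)] for i in range(m)]) / m

    U, b, Vt = np.linalg.svd(A)              # b holds the singular values b_k
    print(b[:5])                             # largest singular values
    print(b[-5:])                            # smallest: close to zero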

Projection on {ψ_k}

Projection of Y on {ψ_k}:

    ⟨Y, ψ_k⟩ = ⟨Af, ψ_k⟩ + ε ⟨ξ, ψ_k⟩
             = ⟨f, A*ψ_k⟩ + ε ⟨ξ, ψ_k⟩
             = b_k ⟨f, ϕ_k⟩ + ε ξ_k,

where {ξ_k} is an i.i.d. standard Gaussian sequence, obtained by projecting the white noise ξ on the o.n.b. {ψ_k}.

Sequence space model

Equivalent sequence space model:

    y_k = b_k θ_k + ε ξ_k,   k = 1, 2, ...,

where {θ_k} are the coefficients of f, ξ_k ~ N(0, 1) i.i.d., and b_k → 0 are the singular values.
Estimate θ = {θ_k} from the observations {y_k}. Under the L^2 risk, this is equivalent to estimating f.
Note that b_k → 0 weakens the signal θ_k: the problem is ill-posed.

Inversion

We have to invert, in some sense, the operator A. We thus obtain the model

    X_k = b_k^{-1} y_k = θ_k + ε σ_k ξ_k,   k = 1, 2, ...,

where σ_k = b_k^{-1}.
When the problem is ill-posed, the variance term grows to infinity.
In this model the aim is to estimate {θ_k} from {X_k}. When k is large the noise in X_k may be very large, which makes estimation difficult
(see Donoho (1995), Mair and Ruymgaart (1996), Johnstone (1999), and Cavalier and Tsybakov (2002)).
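
A small simulation illustrates the noise amplification (an assumed setup, not from the talk): for a mildly ill-posed problem the standard deviation ε σ_k of the noise in X_k grows like k^β.

    import numpy as np

    # Sequence space model and its naive inversion, mildly ill-posed case.
    rng = np.random.default_rng(0)
    K, eps, beta = 1000, 0.01, 1.0
    k = np.arange(1, K + 1)

    theta = k ** -2.0                        # true coefficients (smooth signal, illustrative)
    b = k ** -beta                           # singular values b_k ~ k^{-beta}
    y = b * theta + eps * rng.standard_normal(K)

    sigma = 1.0 / b                          # sigma_k = b_k^{-1}
    X = y / b                                # X_k = theta_k + eps * sigma_k * xi_k
    print(eps * sigma[[0, 9, 99, 999]])      # the noise level in X_k blows up with k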

Difficulty of inverse problems

- σ_k ≍ 1: direct problem.
- σ_k ≍ k^β, β > 0: mildly ill-posed problem.
- σ_k ≍ exp(βk), β > 0: severely ill-posed problem.

The parameter β is called the degree of ill-posedness.

Examples

There exist many examples of operators for which the SVD is known:
- Convolution.
- Tomography.
- Instrumental variables.

Circular convolution

Deconvolution is perhaps the best-known inverse problem. It appears in many applications, such as econometrics, physics, astronomy, and medical image processing; for example, it corresponds to recovering a blurred signal from indirect data.
Consider the convolution operator

    Af(t) = (r * f)(t) = ∫_0^1 r(t − x) f(x) dx,   t ∈ [0, 1],

where r is a known 1-periodic, symmetric, real convolution kernel in L^2[0, 1]. In this model, A is a bounded self-adjoint linear operator from L^2[0, 1] to L^2[0, 1].

[Figure: blurred cameraman image, panels (a) and (b).]

Convolution model

Define the model

    Y(t) = (r * f)(t) + ε ξ(t),   t ∈ [0, 1],

where Y is observed, f is an unknown periodic function in L^2[0, 1], and ξ is a white noise on L^2[0, 1].
The SVD basis is here clearly the Fourier basis {ϕ_k(t)}. Projecting on {ϕ_k(t)}, i.e. working in the Fourier domain, we obtain

    y_k = b_k θ_k + ε ξ_k,

where b_k = 2 ∫_0^1 r(x) cos(2πkx) dx for even k, the θ_k are the Fourier coefficients of f, and the ξ_k are i.i.d. N(0, 1).
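
A minimal sketch of deconvolution in the Fourier domain (illustrative kernel, signal and cut-off, not the talk's exact setup): the eigenvalues b_k are obtained as the Fourier coefficients of r, and a spectral cut-off inversion recovers the low-frequency part of f.

    import numpy as np

    rng = np.random.default_rng(1)
    m, eps = 1024, 1e-3
    t = np.arange(m) / m

    f = np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)   # unknown periodic signal
    r = np.exp(-10 * np.minimum(t, 1 - t))                    # known 1-periodic symmetric kernel

    # Circular convolution r * f (Riemann sum of the integral) plus white noise.
    Y = np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(f))) / m + eps * rng.standard_normal(m)

    b = np.fft.fft(r) / m                                     # b_k: Fourier coefficients of r
    coef = np.where(np.abs(b) > 1e-2, np.fft.fft(Y) / b, 0.0) # spectral cut-off inversion
    f_hat = np.real(np.fft.ifft(coef))                        # reconstruction of f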

[Figure: tomography scan.]

Instrumental variables

An economic relationship between a response variable Y and a vector X of explanatory variables is represented by

    Y_i = f(X_i) + U_i,   i = 1, ..., n,

where f has to be estimated and the U_i are errors.
This model does not characterize the function f if U is not constrained. The problem is solved if E(U | X) = 0.
In many structural econometric models, however, some components of X are endogenous.
For example, let Y denote wages and let X include the level of education, among other variables. The error U includes ability, which is not observed but influences wages. People with high ability tend to have a high level of education, so education and ability are correlated, and thus so are X and U.

Instrumental variables

Nevertheless, suppose that we observe another set of data, W_i, where W is called an instrumental variable, for which

    E(U | W) = E(Y − f(X) | W) = 0.

This equation characterizes f through a Fredholm equation of the first kind: estimating the function f is in fact an ill-posed inverse problem.
This is not exactly our Gaussian white noise model, but it is closely related.
Inverse problems have been the topic of many articles in the econometrics literature; see Florens (2003), Hall and Horowitz (2005), and Chen and Reiss (2009).

Inverse problem and sequence space

Consider the model

    Y = Af + ε ξ,

where Y is the observation, f ∈ H is unknown, A is a continuous linear compact operator from H into G, ξ is a white noise, and ε is the noise level.
Using the SVD, we obtain the equivalent sequence space model

    X_k = θ_k + ε σ_k ξ_k,   k = 1, 2, ...,

where σ_k → ∞.
The aim is to estimate (reconstruct) the function f (or the sequence {θ_k}) from the observations.

Linear estimators

Consider here a specific family of estimators. Let λ = (λ_1, λ_2, ...) be a sequence of nonrandom weights. Every sequence λ defines a linear estimator

    θ̂(λ) = (θ̂_1, θ̂_2, ...),   θ̂_k = λ_k X_k,   and   f̂(λ) = Σ_{k≥1} θ̂_k ϕ_k.

The L^2 risk of a linear estimator is

    E ‖f̂(λ) − f‖^2 = R(θ, λ) = Σ_{k≥1} (1 − λ_k)^2 θ_k^2 + ε^2 Σ_{k≥1} σ_k^2 λ_k^2.
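
The bias-variance decomposition above translates directly into a few lines of code; here is a sketch (illustrative θ, σ and weights, with names following the slide's notation) that evaluates R(θ, λ) for projection weights with different cut-offs.

    import numpy as np

    def risk(theta, lam, sigma, eps):
        # R(theta, lambda) = sum (1 - lam_k)^2 theta_k^2 + eps^2 sum sigma_k^2 lam_k^2
        bias2 = np.sum((1 - lam) ** 2 * theta ** 2)
        var = eps ** 2 * np.sum(sigma ** 2 * lam ** 2)
        return bias2 + var

    k = np.arange(1, 1001)
    theta = k ** -2.0                      # illustrative coefficients
    sigma = k ** 1.0                       # mildly ill-posed, beta = 1
    eps = 0.01
    for N in (5, 20, 100):                 # projection weights with different cut-offs
        print(N, risk(theta, (k <= N).astype(float), sigma, eps))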

Classes of linear estimators

- Projection estimators (spectral cut-off): λ_k = I(k ≤ N), N > 0.
- Tikhonov regularization (penalized): λ_k = 1 / (1 + γ σ_k^{2α}), α ≥ 1, γ > 0.
- Landweber iteration: λ_k = 1 − (1 − σ_k^{−2})^n = 1 − (1 − b_k^2)^n, n ∈ N.

How should one choose N, γ, or n?
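
For concreteness, the three weight families can be written as follows (a sketch; the parameter values are illustrative, and the Landweber filter is written with b_k^2 = σ_k^{−2}).

    import numpy as np

    def projection_weights(k, N):
        return (k <= N).astype(float)                 # lambda_k = I(k <= N)

    def tikhonov_weights(sigma, gamma, alpha=1.0):
        return 1.0 / (1.0 + gamma * sigma ** (2 * alpha))

    def landweber_weights(sigma, n_iter):
        b2 = sigma ** -2.0                            # b_k^2 = sigma_k^{-2}
        return 1.0 - (1.0 - b2) ** n_iter

    k = np.arange(1, 101)
    sigma = k ** 1.0                                  # illustrative: sigma_k = k
    print(projection_weights(k, 10)[:12])
    print(tikhonov_weights(sigma, gamma=1e-3)[:12])
    print(landweber_weights(sigma, n_iter=50)[:12])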

Ellipsoid of coefficients

Assume that f belongs to a functional class corresponding to an ellipsoid Θ in the space of coefficients {θ_k}:

    Θ = Θ(a, L) = { θ : Σ_{k≥1} a_k^2 θ_k^2 ≤ L },

where a = {a_k} with a_k > 0, a_k → ∞, and L > 0.
For large values of k the coefficients θ_k are therefore decreasing in k, and thus small.
Assumptions on the coefficients θ_k are usually related to properties (smoothness) of f.

Sobolev classes

Introduce the Sobolev classes

    W(α, L) = { f = Σ_{k≥1} θ_k ϕ_k : θ ∈ Θ(α, L) },

where Θ(α, L) = Θ(a, L) with a = {a_k} polynomial, a_1 = 0 and, for k = 2, 3, ...,

    a_k = (k − 1)^α for k odd,   a_k = k^α for k even,

where α > 0, L > 0. Equivalently,

    W(α, L) = { f periodic : ∫_0^1 (f^{(α)}(t))^2 dt ≤ π^{2α} L }.
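
As a quick check of this equivalence (a sketch, for integer α and the trigonometric basis ϕ_{2k}(t) = √2 cos(2πkt), ϕ_{2k+1}(t) = √2 sin(2πkt)): differentiating α times multiplies the k-th pair of coefficients by (2πk)^α (up to sign and a swap within the pair), so by Parseval

    ∫_0^1 (f^{(α)}(t))^2 dt = Σ_{k≥1} (2πk)^{2α} (θ_{2k}^2 + θ_{2k+1}^2) = π^{2α} Σ_{j≥2} a_j^2 θ_j^2,

since a_{2k} = a_{2k+1} = (2k)^α. Hence Σ_j a_j^2 θ_j^2 ≤ L is the same constraint as ∫_0^1 (f^{(α)})^2 ≤ π^{2α} L.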

Rates of convergence

The function f has Fourier coefficients in some ellipsoid, and the problem is direct, mildly ill-posed, or severely ill-posed. The rates appear in the following table:

    Problem / Functions    Direct problem     Mildly ill-posed      Severely ill-posed
    Sobolev                ε^{4α/(2α+1)}      ε^{4α/(2α+2β+1)}      (log 1/ε)^{−2α}
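
To see where the mildly ill-posed rate comes from (a sketch, for the projection estimator): with λ_k = I(k ≤ N), θ ∈ Θ(a, L), a_k ≍ k^α and σ_k ≍ k^β,

    R(θ, λ) = Σ_{k>N} θ_k^2 + ε^2 Σ_{k≤N} σ_k^2 ≲ L N^{−2α} + ε^2 N^{2β+1}.

Balancing the two terms gives N ≍ ε^{−2/(2α+2β+1)}, hence the rate ε^{4α/(2α+2β+1)} of the table; with β = 0 this reduces to the direct rate ε^{4α/(2α+1)}.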

Comments

Rates usually depend on the smoothness α of the function f and on the degree of ill-posedness β.
When β increases, the rates become slower.
In the direct model we recover the standard rates of nonparametric estimation, for example ε^{4α/(2α+1)} (i.e. n^{−2α/(2α+1)}) for Sobolev classes.

Comments

To attain the optimal rate with a projection estimator, choose N according to the optimal trade-off between bias and variance.
This choice of N is optimal in the minimax sense; however, it depends on the smoothness α and on the degree of ill-posedness β.
Even if the operator A (and its degree β) is known, there is no real meaning in considering the smoothness of f as known.
Hence the notions of adaptation and oracle inequalities, i.e. how to choose the bandwidth N without prior assumptions on f.

Oracle

Consider now a related but different point of view. Assume that a class of estimators is fixed, i.e. that the class of possible weights λ ∈ Λ is given (projection, Tikhonov, ...).
Define the oracle λ_0 by

    R(θ, λ_0) = inf_{λ ∈ Λ} R(θ, λ).

The oracle corresponds to the best possible choice in Λ, i.e. the one which minimizes the risk.
However, this is not an estimator: since the risk depends on the unknown θ, so does the oracle.
An oracle is the best in the family, but it knows the true θ.

Unbiased risk estimation

A very natural idea in statistics is to estimate this unknown risk using the available data, and then to minimize the estimated risk.
A classical approach to this minimization problem is based on the principle of unbiased risk estimation (URE) (Stein (1981)).
The method goes back to the Akaike Information Criterion (AIC) of Akaike (1973) and to Mallows' C_p (1973).
Originally, URE appeared in the context of regression estimation; nowadays it is a basic adaptation tool for many statistical models.
The same idea also underlies the cross-validation techniques.

URE in inverse problems

For inverse problems, this method was studied in Cavalier, Golubev, Picard and Tsybakov (2002), where exact oracle inequalities were obtained.
In this setting, the functional

    U(X, λ) = Σ_{k≥1} (1 − λ_k)^2 (X_k^2 − ε^2 σ_k^2) + ε^2 Σ_{k≥1} σ_k^2 λ_k^2

is an unbiased estimator of R(θ, λ):

    R(θ, λ) = E_θ U(X, λ),   for all λ.
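
The unbiasedness is a one-line check (a sketch): since X_k = θ_k + ε σ_k ξ_k, we have E_θ X_k^2 = θ_k^2 + ε^2 σ_k^2, so

    E_θ U(X, λ) = Σ_k (1 − λ_k)^2 θ_k^2 + ε^2 Σ_k σ_k^2 λ_k^2 = R(θ, λ).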

Data-driven choice

Unbiased risk estimation suggests minimizing the functional U(X, λ) over λ ∈ Λ in place of R(θ, λ).
This leads to the following data-driven choice of λ:

    λ* = arg min_{λ ∈ Λ} U(X, λ).

Define then the estimator θ* by θ*_k = λ*_k X_k.
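
A minimal sketch of this data-driven rule over a finite family of projection weights (illustrative sequence model, not the talk's exact implementation):

    import numpy as np

    rng = np.random.default_rng(2)
    K, eps, beta = 1000, 0.01, 1.0
    k = np.arange(1, K + 1)
    theta = k ** -2.0                                  # unknown coefficients (illustrative)
    sigma = k ** beta
    X = theta + eps * sigma * rng.standard_normal(K)   # observations after inversion

    def ure(X, lam, sigma, eps):
        # U(X, lambda) = sum (1-lam_k)^2 (X_k^2 - eps^2 sigma_k^2) + eps^2 sum sigma_k^2 lam_k^2
        return (np.sum((1 - lam) ** 2 * (X ** 2 - eps ** 2 * sigma ** 2))
                + eps ** 2 * np.sum(sigma ** 2 * lam ** 2))

    family = {N: (k <= N).astype(float) for N in range(1, 101)}   # Lambda: projection weights
    N_star = min(family, key=lambda N: ure(X, family[N], sigma, eps))
    theta_star = family[N_star] * X                    # theta*_k = lambda*_k X_k
    print("selected cut-off N:", N_star)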

Assumptions

Denote

    S = ( max_{λ ∈ Λ} Σ_{k≥1} σ_k^4 λ_k^2 / min_{λ ∈ Λ} Σ_{k≥1} σ_k^4 λ_k^2 )^{1/2}.

Let the following assumptions hold.
For any λ ∈ Λ,

    0 < Σ_{k≥1} σ_k^2 λ_k^2 < ∞,   and   max_{λ ∈ Λ} sup_k λ_k ≤ 1.

There exists a constant C_1 > 0 such that, uniformly in λ ∈ Λ,

    Σ_{k≥1} σ_k^4 λ_k^2 ≤ C_1 Σ_{k≥1} σ_k^4 λ_k^4.

Oracle inequality for URE

Theorem. Suppose σ_k ≍ k^β, β ≥ 0. Assume that Λ is finite with cardinality D and belongs to the family of projection, Tikhonov or Pinsker estimators. There exist constants γ, C > 0 such that, for all θ ∈ l^2 and for B large enough,

    E_θ ‖θ* − θ‖^2 ≤ (1 + γ B^{−1}) min_{λ ∈ Λ} R(θ, λ) + B C ε^2 (log(DS))^{2β+1}.

The data-driven choice by URE mimics the oracle.

Simulations

Discrete model of an inverse problem:

    Y(i) = (g * f)(i/m) + ε √m ξ(i),   i = 1, ..., m,

where

    f(t) = 0.5 n(t, 0.4, 0.12) + 0.5 n(t, 0.7, 0.08),
    g(t) = exp(−10 |t − 0.5|),   β ≈ 2.

Here m = 1000 and ε^2 = 10^{−5}; Signal/Noise = 100.
The estimator is a truncated Fourier series, with ε^2 Σ_{k≤N} σ_k^2 ≍ 1/log(1/ε).
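
A rough reconstruction of this setup (assuming n(t, μ, s) denotes the normal density with mean μ and standard deviation s; the convolution and noise calibration below are my guesses, not the talk's code):

    import numpy as np

    rng = np.random.default_rng(3)

    def ndens(t, mu, s):
        # normal density n(t, mu, s) -- assumed interpretation of the slide's notation
        return np.exp(-0.5 * ((t - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

    m, eps2 = 1000, 1e-5
    eps = np.sqrt(eps2)
    t = np.arange(m) / m

    f = 0.5 * ndens(t, 0.4, 0.12) + 0.5 * ndens(t, 0.7, 0.08)   # bimodal target
    g = np.exp(-10.0 * np.abs(t - 0.5))                         # kernel, degree beta ~ 2

    # Y(i) = (g * f)(i/m) + eps * sqrt(m) * xi(i), i.e. white-noise level eps.
    Y = (np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(f))) / m
         + eps * np.sqrt(m) * rng.standard_normal(m))

    # Truncated Fourier series estimator with a crude cut-off.
    b = np.fft.fft(g) / m
    coef = np.where(np.abs(b) > eps, np.fft.fft(Y) / b, 0.0)
    f_hat = np.real(np.fft.ifft(coef))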

[Figure: true function f and its estimate ("Estimation de f").]

[Figure: quadratic risk ("Risque Quadratique") of the projection oracle and of the estimator, as a function of the signal-to-noise ratio ("Signal/Bruit").]

Comments

- The simulations correspond more or less to the theory.
- There is a limitation on the size of the family Λ.
- The method is not always stable enough.
- There is a need for penalties stronger than the URE (or AIC) penalty.
- A different method, the risk hull method, is defined in Cavalier and Golubev (2006).