Inverse problems in statistics

Laurent Cavalier (Université Aix-Marseille 1, France)

Yale, May 2 2011
Introduction

There exist many fields where inverse problems appear:
Astronomy (Hubble satellite).
Econometrics (instrumental variables).
Financial mathematics (model calibration).
Medical image processing (X-rays).

These are problems where we have indirect observations of an object (a function) that we want to reconstruct.
Inverse problems

Let H and G be Hilbert spaces. Let A be a continuous linear operator from H into G.
Given g ∈ G, find f ∈ H such that Af = g.
Solving an inverse problem: inversion of the operator A.
If A^{-1} is not continuous, the problem is called ill-posed.
If we observe g_ε, a noisy version of g, then f_ε = A^{-1} g_ε could be far from f.
Hence the importance of the notion of noise or error.
Linear inverse problems

Let H and G be two separable Hilbert spaces. Let A be a known linear bounded operator from H into G.
Consider the model
  Y = Af + ε ξ,
where Y is the observation, f ∈ H is unknown, A is a continuous linear operator from H into G, ξ is a white noise, and ε corresponds to the noise level.
Goal: reconstruct (estimate) f from the observation Y.
Projection of a white noise on any orthonormal basis {ψ_k} gives a sequence of i.i.d. standard Gaussian random variables.
Discrete model of inverse problems

The standard discrete sample statistical model for linear inverse problems is
  Y_i = Af(X_i) + ξ_i,  i = 1, ..., n,
where (X_1, Y_1), ..., (X_n, Y_n) are observed (we may assume X_i ∈ [0, 1]), f is an unknown function in L^2(0, 1), A is an operator from L^2(0, 1) into L^2(0, 1), and the ξ_i are i.i.d. zero-mean Gaussian random variables with variance σ^2.
The noise level is related to the number of observations by ε ≍ 1/√n.
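The calibration ε ≍ 1/√n can be checked numerically: the average of n i.i.d. errors with standard deviation σ has standard deviation σ/√n. A minimal sketch (the values σ = 2 and n = 400 are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n = 2.0, 400

# Each average of n i.i.d. N(0, sigma^2) errors has standard deviation
# sigma / sqrt(n): this is what calibrates the noise level eps ~ 1/sqrt(n).
means = sigma * rng.standard_normal((2000, n)).mean(axis=1)
eps = sigma / np.sqrt(n)          # theoretical noise level, here 0.1
```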
Singular value decomposition

A major property of compact operators is that they have a discrete spectrum.
Suppose A*A is a compact operator with a known basis of eigenfunctions in H:
  A*A ϕ_k = b_k^2 ϕ_k.
Singular Value Decomposition (SVD) of A:
  A ϕ_k = b_k ψ_k,  A* ψ_k = b_k ϕ_k,
where b_k > 0 are the singular values, {ϕ_k} is an o.n.b. of H, and {ψ_k} an o.n.b. of G.
A linear bounded compact operator between two Hilbert spaces may really be seen as an infinite matrix.
Projection on {ψ_k}

Projection of Y on {ψ_k}:
  ⟨Y, ψ_k⟩ = ⟨Af, ψ_k⟩ + ε ⟨ξ, ψ_k⟩
           = ⟨f, A*ψ_k⟩ + ε ⟨ξ, ψ_k⟩
           = b_k ⟨f, ϕ_k⟩ + ε ξ_k,
where {ξ_k} is an i.i.d. standard Gaussian sequence, obtained by projecting the white noise ξ on the o.n.b. {ψ_k}.
Sequence space model

Equivalent sequence space model:
  y_k = b_k θ_k + ε ξ_k,  k = 1, 2, ...,
where {θ_k} are the coefficients of f, ξ_k ~ N(0, 1) i.i.d., and b_k → 0 are the singular values.
Estimate θ = {θ_k} from the observations {y_k}. With the L^2 risk, this is equivalent to estimating f.
Remark that b_k → 0 weakens the signal θ_k: the problem is ill-posed.
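A quick simulation of the sequence space model (the choices b_k = 1/k and θ_k = 1/k are hypothetical) shows how the decaying b_k push the signal below the noise level:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, beta = 0.01, 1.0
k = np.arange(1, 501)

b = k ** (-beta)                  # singular values b_k -> 0 (mildly ill-posed)
theta = 1.0 / k                   # illustrative coefficients of f

# Sequence space model: y_k = b_k * theta_k + eps * xi_k
y = b * theta + eps * rng.standard_normal(k.size)

# The signal part b_k * theta_k decays like k^{-2}; beyond some frequency it
# falls below the noise level eps, so those y_k carry almost no information.
signal = b * theta
k_lost = int(k[signal < eps][0])  # first index where signal < noise level
```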
Inversion

We have to invert, in some sense, the operator A. Thus we obtain the model
  X_k = b_k^{-1} y_k = θ_k + ε σ_k ξ_k,  k = 1, 2, ...,
where σ_k = b_k^{-1}.
In the case where the problem is ill-posed, the variance term grows to infinity.
In this model the aim is to estimate {θ_k} from {X_k}. When k is large the noise in X_k may then be very large, making the estimation difficult (see Donoho (1995), Mair and Ruymgaart (1996), Johnstone (1999), C. and Tsybakov (2002), ...).
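The inversion step and the resulting noise amplification can be sketched directly (b_k = 1/k and θ_k = k^{-2} are illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
eps, beta = 0.01, 1.0
k = np.arange(1, 201)
b = k ** (-beta)                 # singular values
sigma = 1.0 / b                  # sigma_k = b_k^{-1} = k^beta

theta = 1.0 / k ** 2             # illustrative coefficients
y = b * theta + eps * rng.standard_normal(k.size)

# Naive inversion: X_k = y_k / b_k = theta_k + eps * sigma_k * xi_k.
X = y / b

# The noise standard deviation of X_k is eps * sigma_k: it grows without
# bound with k, which is exactly the difficulty of an ill-posed problem.
noise_sd = eps * sigma
```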
Difficulty of inverse problems

σ_k ≍ 1: direct problem.
σ_k ≍ k^β, β > 0: mildly ill-posed problem.
σ_k ≍ exp(βk), β > 0: severely ill-posed problem.
The parameter β is called the degree of ill-posedness.
Examples

There exist many examples of operators for which the SVD is known:
Convolution.
Tomography.
Instrumental variables.
Circular convolution

The framework of deconvolution is perhaps the most well-known inverse problem. It is used in many applications such as econometrics, physics, astronomy, and medical image processing. For example, it corresponds to the problem of recovering a blurred signal from indirect data.
Consider the following convolution operator
  Af(t) = r ⋆ f(t) = ∫₀¹ r(t − x) f(x) dx,  t ∈ [0, 1],
where r is a known 1-periodic symmetric real convolution kernel in L^2[0, 1]. In this model, A is a linear bounded self-adjoint operator from L^2[0, 1] to L^2[0, 1].
Blurred cameraman [Figure: two image panels, (a) and (b).]
Convolution model

Define then the following model:
  Y(t) = r ⋆ f(t) + ε ξ(t),  t ∈ [0, 1],
where Y is observed, f is an unknown periodic function in L^2[0, 1], and ξ is a white noise on L^2[0, 1].
The SVD basis is then clearly the Fourier basis {ϕ_k(t)}.
Projecting on {ϕ_k(t)}, i.e. working in the Fourier domain, we obtain
  y_k = b_k θ_k + ε ξ_k,
where b_k = 2 ∫₀¹ r(x) cos(2πkx) dx for even k, θ_k are the Fourier coefficients of f, and the ξ_k are i.i.d. N(0, 1).
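In the Fourier domain circular convolution is diagonal, which is what the SVD expresses here. A minimal numerical sketch using the DFT (the kernel and signal below are illustrative, not those of the talk); with no noise, dividing by the kernel's Fourier coefficients inverts the operator exactly:

```python
import numpy as np

m = 256
t = np.arange(m) / m
f = np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)  # toy periodic signal
r = np.exp(-10 * np.minimum(t, 1 - t))                   # toy 1-periodic symmetric kernel
r /= r.sum()                                             # normalization (illustrative)

# Circular convolution is diagonalized by the DFT: Fourier coefficients multiply.
g = np.real(np.fft.ifft(np.fft.fft(r) * np.fft.fft(f)))  # g = r * f (discrete)

# Noiseless deconvolution: divide by the kernel's Fourier coefficients (the b_k).
f_rec = np.real(np.fft.ifft(np.fft.fft(g) / np.fft.fft(r)))
```

With noise added to g, this naive division amplifies the high-frequency noise by 1/b_k, which is the ill-posedness discussed above.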
Tomography scan [Figure: tomography scan.]
Instrumental variables

An economic relationship between a response variable Y and a vector X of explanatory variables is represented by
  Y_i = f(X_i) + U_i,  i = 1, ..., n,
where f has to be estimated and the U_i are the errors.
This model does not characterize the function f if U is not constrained. The problem is solved if E(U | X) = 0.
In many structural econometric models, some components of X are endogenous.
Suppose Y denotes wages and X includes the level of education, among other variables. The error U includes ability, which is not observed but influences wages.
People with high ability tend to have a high level of education, so education and ability are correlated, and thus X and U are as well.
Instrumental variables

Nevertheless, suppose that we observe another set of data W_i, where W is called an instrumental variable, for which
  E(U | W) = E(Y − f(X) | W) = 0.
This equation characterizes f by a Fredholm equation of the first kind. Estimation of the function f is in fact an ill-posed inverse problem.
This is not exactly our Gaussian white noise model, but it is closely related.
Inverse problems have been the topic of many articles in the econometrics literature; see Florens (2003), Hall and Horowitz (2005), Chen and Reiss (2009).
Inverse problem and sequence space

Consider the model
  Y = Af + ε ξ,
where Y is the observation, f ∈ H is unknown, A is a continuous linear compact operator from H into G, ξ is a white noise, and ε is the noise level.
By using the SVD, we obtain the equivalent sequence space model
  X_k = θ_k + ε σ_k ξ_k,  k = 1, 2, ...,
where σ_k → ∞.
The aim is to estimate (reconstruct) the function f (or the sequence {θ_k}) from the observations.
Linear estimators

Consider here a specific family of estimators. Let λ = (λ_1, λ_2, ...) be a sequence of nonrandom weights. Every sequence λ defines a linear estimator
  θ̂(λ) = (θ̂_1, θ̂_2, ...),  θ̂_k = λ_k X_k,
and
  f̂(λ) = Σ_{k=1}^∞ θ̂_k ϕ_k.
The L^2 risk of a linear estimator is
  E ||f̂(λ) − f||^2 = R(θ, λ) = Σ_{k=1}^∞ (1 − λ_k)^2 θ_k^2 + ε^2 Σ_{k=1}^∞ σ_k^2 λ_k^2.
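The bias–variance decomposition above is easy to evaluate numerically. A sketch with hypothetical choices (θ_k = k^{-3/2}, σ_k = k, ε = 0.05), scanning projection weights for the risk-minimizing cutoff:

```python
import numpy as np

eps = 0.05
k = np.arange(1, 101)
theta = k ** -1.5                # hypothetical coefficients of f
sigma = k.astype(float)          # mildly ill-posed: sigma_k = k

def risk(lam):
    """R(theta, lam) = sum (1-lam_k)^2 theta_k^2 + eps^2 sum sigma_k^2 lam_k^2."""
    return (np.sum((1 - lam) ** 2 * theta ** 2)
            + eps ** 2 * np.sum(sigma ** 2 * lam ** 2))

# Projection weights lam_k = 1(k <= N): bias falls and variance grows with N.
risks = [risk((k <= N).astype(float)) for N in range(1, 101)]
N_best = int(np.argmin(risks)) + 1   # risk-minimizing (oracle) cutoff
```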
Classes of linear estimators

Projection estimators (spectral cut-off):
  λ_k = I(k ≤ N),  N > 0.
Tikhonov regularization (penalized):
  λ_k = 1 / (1 + γ σ_k^{2α}),  α ≥ 1, γ > 0.
Landweber iteration:
  λ_k = 1 − (1 − b_k^2)^n,  n ∈ ℕ.
How to choose N, γ or n?
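The three weight families can be written down directly. A sketch with hypothetical singular values b_k = 1/k (so σ_k = k); note that the Landweber weights use b_k^2 and stay in [0, 1] because b_k ≤ 1 here:

```python
import numpy as np

k = np.arange(1, 51)
b = 1.0 / k                       # hypothetical singular values (b_k <= 1)
sigma = 1.0 / b                   # sigma_k = k, degree beta = 1

lam_proj = (k <= 10).astype(float)                     # projection, cutoff N = 10
gamma, alpha = 1e-3, 1.0
lam_tikh = 1.0 / (1.0 + gamma * sigma ** (2 * alpha))  # Tikhonov regularization
n_iter = 100
lam_land = 1.0 - (1.0 - b ** 2) ** n_iter              # Landweber after n iterations
```

All three families downweight high frequencies, where the inverted noise is largest; N, γ and n each control where that downweighting kicks in.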
Ellipsoid of coefficients

Assume that f belongs to a functional class corresponding to an ellipsoid Θ in the space of coefficients {θ_k}:
  Θ = Θ(a, L) = { θ : Σ_{k=1}^∞ a_k^2 θ_k^2 ≤ L },
where a = {a_k} with a_k > 0, a_k → ∞, and L > 0.
For large values of k the coefficients θ_k will be decreasing with k and thus small.
Assumptions on the coefficients θ_k are usually related to properties (smoothness) of f.
Sobolev classes

Introduce the Sobolev classes
  W(α, L) = { f = Σ_{k=1}^∞ θ_k ϕ_k : θ ∈ Θ(α, L) },
where Θ(α, L) = Θ(a, L) with polynomial weights a = {a_k} such that a_1 = 0 and
  a_k = (k − 1)^α for k odd,  a_k = k^α for k even,  k = 2, 3, ...,
where α > 0, L > 0. We also have
  W(α, L) = { f periodic : ∫₀¹ (f^{(α)}(t))^2 dt ≤ π^{2α} L }.
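The ellipsoid weights a_k can be generated directly from the piecewise definition above; f ∈ W(α, L) then corresponds to Σ a_k^2 θ_k^2 ≤ L. A small sketch (the helper name is mine):

```python
import numpy as np

def sobolev_weights(alpha, K):
    """Ellipsoid weights of the Sobolev class: a_1 = 0,
    a_k = (k-1)^alpha for odd k, a_k = k^alpha for even k."""
    k = np.arange(1, K + 1)
    # k = 1 is odd, so (k - 1)^alpha = 0 handles a_1 = 0 automatically.
    return np.where(k % 2 == 1, (k - 1.0) ** alpha, k.astype(float) ** alpha)

a = sobolev_weights(2.0, 6)
```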
Rates of convergence

The function f has Fourier coefficients in some ellipsoid, and the problem is mildly ill-posed, severely ill-posed, or even direct. The rates appear in the following table:

  Problem              Rate over Sobolev classes
  Direct problem       ε^{4α/(2α+1)}
  Mildly ill-posed     ε^{4α/(2α+2β+1)}
  Severely ill-posed   (log(1/ε))^{−2α}
Comments

The rates usually depend on the smoothness α of the function f and on the degree of ill-posedness β.
When β increases, the rates are slower.
In the direct model, we recover the standard rates for nonparametric estimation; for example, 2α/(2α + 1) with Sobolev classes.
Comments

To attain the optimal rate with a projection estimator, choose N corresponding to the optimal trade-off between bias and variance.
In the minimax sense, this gives an optimal choice for N. However, this choice depends on the smoothness α and on the degree of ill-posedness β.
Even if the operator A (and its degree β) is known, there is no real meaning in considering the smoothness of f as known.
Hence the notions of adaptation and oracle inequalities, i.e. how to choose the bandwidth N without prior assumptions on f.
Oracle

Consider now a linked, but different, point of view. Assume that a class of estimators is fixed, i.e. that the class of possible weights λ ∈ Λ is given (projection, Tikhonov, ...).
Define the oracle λ⁰ by
  R(θ, λ⁰) = inf_{λ∈Λ} R(θ, λ).
The oracle corresponds to the best possible choice in Λ, i.e. the one which minimizes the risk.
However, this is not an estimator: since the risk depends on the unknown θ, the oracle does as well.
An oracle is the best in the family, but it knows the true θ.
Unbiased risk estimation

A very natural idea in statistics is to estimate this unknown risk using the available data, and then to minimize the estimated risk.
A classical approach to this minimization problem is based on the principle of unbiased risk estimation (URE) (Stein (1981)).
This method goes back to the Akaike Information Criterion (AIC) of Akaike (1973) and Mallows' C_p (1973).
Originally, URE appeared in the context of regression estimation. Nowadays it is a basic adaptation tool for many statistical models.
The same idea appears in all the cross-validation techniques.
URE in inverse problems

For inverse problems, this method was studied in C., Golubev, Picard and Tsybakov (2002), where exact oracle inequalities were obtained.
In this setting, the functional
  U(X, λ) = Σ_{k=1}^∞ (1 − λ_k)^2 (X_k^2 − ε^2 σ_k^2) + ε^2 Σ_{k=1}^∞ σ_k^2 λ_k^2
is an unbiased estimator of R(θ, λ):
  R(θ, λ) = E_θ U(X, λ),  for all λ.
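Unbiasedness follows from E X_k^2 = θ_k^2 + ε^2 σ_k^2, and can be checked by Monte Carlo. A sketch under hypothetical choices (θ_k = k^{-2}, σ_k = k, ε = 0.05, and one fixed projection weight sequence):

```python
import numpy as np

rng = np.random.default_rng(42)
eps = 0.05
k = np.arange(1, 21)
sigma = k.astype(float)
theta = 1.0 / k ** 2
lam = (k <= 10).astype(float)    # a fixed projection weight sequence

def R(theta, lam):
    """True risk R(theta, lam)."""
    return np.sum((1 - lam) ** 2 * theta ** 2) + eps ** 2 * np.sum(sigma ** 2 * lam ** 2)

def U(X, lam):
    """URE functional: replaces theta_k^2 by the unbiased X_k^2 - eps^2 sigma_k^2."""
    return (np.sum((1 - lam) ** 2 * (X ** 2 - eps ** 2 * sigma ** 2))
            + eps ** 2 * np.sum(sigma ** 2 * lam ** 2))

# Monte Carlo check: the average of U(X, lam) should match R(theta, lam).
draws = [U(theta + eps * sigma * rng.standard_normal(k.size), lam)
         for _ in range(20000)]
mean_U, true_R = float(np.mean(draws)), R(theta, lam)
```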
Data-driven choice

Unbiased risk estimation suggests minimizing over λ ∈ Λ the functional U(X, λ) in place of R(θ, λ).
This leads to the following data-driven choice of λ:
  λ̃ = arg min_{λ∈Λ} U(X, λ).
Define then the estimator θ̃ by θ̃_k = λ̃_k X_k.
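A sketch of the data-driven rule for a family Λ of projection weights (all numerical choices hypothetical, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(7)
eps = 0.02
k = np.arange(1, 101)
sigma = k.astype(float)
theta = 5.0 / k ** 2
X = theta + eps * sigma * rng.standard_normal(k.size)   # inverted observations

def U(X, lam):
    """URE functional U(X, lam)."""
    return (np.sum((1 - lam) ** 2 * (X ** 2 - eps ** 2 * sigma ** 2))
            + eps ** 2 * np.sum(sigma ** 2 * lam ** 2))

# Lambda = all projection weights with cutoff N = 1..100; pick the minimizer of U.
lam_tilde = min(((k <= N).astype(float) for N in range(1, 101)),
                key=lambda lam: U(X, lam))
theta_tilde = lam_tilde * X
N_tilde = int(lam_tilde.sum())    # the selected cutoff
```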
Assumptions

Denote
  S = ( max_{λ∈Λ} Σ_{k=1}^∞ σ_k^4 λ_k^2 / min_{λ∈Λ} Σ_{k=1}^∞ σ_k^4 λ_k^2 )^{1/2}.
Let the following assumptions hold.
For any λ ∈ Λ,
  0 < Σ_{k=1}^∞ σ_k^2 λ_k^2 < ∞,  and  max_{λ∈Λ} sup_k λ_k ≤ 1.
There exists a constant C_1 > 0 such that, uniformly in λ ∈ Λ,
  Σ_{k=1}^∞ σ_k^4 λ_k^2 ≤ C_1 Σ_{k=1}^∞ σ_k^4 λ_k^4.
Oracle inequality for URE

Theorem. Suppose σ_k ≍ k^β, β ≥ 0. Assume that Λ is finite with cardinality D and belongs to one of the families of projection, Tikhonov or Pinsker estimators. There exist constants γ, C > 0 such that, for all θ ∈ ℓ^2 and for B large enough,
  E_θ ||θ̃ − θ||^2 ≤ (1 + γ B^{-1}) min_{λ∈Λ} R(θ, λ) + B C ε^2 (log(DS))^{2β+1}.
The data-driven choice by URE mimics the oracle.
Simulations

Discrete model, inverse problem:
  Y(i) = g ⋆ f(i/m) + ε √m ξ(i),  i = 1, ..., m,
where
  f(t) = 0.5 n(t, 0.4, 0.12) + 0.5 n(t, 0.7, 0.08),
with n(·, μ, σ) the Gaussian density with mean μ and standard deviation σ, and
  g(t) = exp(−10 |t − 0.5|),  so that β ≈ 2.
Here m = 1000 and ε^2 = 10^{-5}, i.e. Signal/Noise = 100. Estimator by truncated Fourier series, with cutoffs N satisfying ε^2 Σ_{k≤N} σ_k^2 ≲ 1/log(1/ε).
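A rough re-implementation of this experiment (the discretization, normalizations and cutoff values below are my guesses, not the talk's exact code) shows the point of truncation: a small spectral cutoff beats naive full inversion by orders of magnitude:

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1000
t = np.arange(m) / m

def n_pdf(x, mu, s):
    """Gaussian density with mean mu and standard deviation s."""
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))

f = 0.5 * n_pdf(t, 0.4, 0.12) + 0.5 * n_pdf(t, 0.7, 0.08)
g = np.exp(-10 * np.abs(t - 0.5))

# Discrete circular convolution g * f (Riemann-sum normalization 1/m).
conv = np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(f))) / m

eps2 = 1e-5
Y = conv + np.sqrt(eps2 * m) * rng.standard_normal(m)   # Y(i) = g*f(i/m) + eps*sqrt(m)*xi

# Truncated Fourier-series estimator: invert in the Fourier domain and keep
# only the frequencies |l| <= N.
G = np.fft.fft(g) / m                     # Fourier multipliers of the operator
freqs = np.abs(np.fft.fftfreq(m, d=1.0 / m))

def estimate(N):
    coef = np.where(freqs <= N, np.fft.fft(Y) / G, 0.0)
    return np.real(np.fft.ifft(coef))

err = {N: np.mean((estimate(N) - f) ** 2) for N in (3, 6, 12, 500)}
```

The mean squared error explodes as the cutoff grows, since the noise in each retained coefficient is amplified by 1/|G_l|.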
True function f. Estimator f̂. [Figure: estimate of f plotted over [0, 1].]
Oracle by projection. Estimator f̂. [Figure: quadratic risk as a function of the signal-to-noise ratio.]
Comments

The simulations correspond more or less to the theory.
There is a limitation on the size of the family.
The method is not always stable enough.
There is a need for stronger penalties than the URE (or AIC) penalty.
A different method, called the Risk Hull Method, is defined in C. and Golubev (2006).