School of Mathematics and Statistical Sciences Research Institute, University of Southampton, UK.
Joint work with Nicolai Bissantz and Holger Dette (both Ruhr-Universität Bochum) and Edmund Jones (University of Bristol).
Workshop: Experiments for Processes With Time or Space Dynamics, Isaac Newton Institute, Cambridge, 20 July 2011.
Outline
1. Introduction to inverse problems
   - What is an inverse problem?
   - The model
   - Estimation
Example of an inverse problem: computed tomography
- The shape of the object cannot be observed directly.
- We measure the proportion of X-ray photons passing through the object along certain paths.
- These line integrals have to be inverted in order to obtain a description of the object.
Applications
Inverse problems occur in many different areas, e.g.
- Medical imaging: computed tomography, magnetic resonance imaging, ultrasound
- Materials science: finding cracks in objects using computed tomography
- Geophysics: borehole tomography
- Astrophysics: imaging of galaxies
All these applications have in common that the feature of interest cannot be observed directly.
The model - random design
The observations are independent pairs $(X_i, Y_i)$, $i = 1, \dots, n$, where
$$E[Y_i \mid X_i = x] = (Km)(x) \quad \text{and} \quad \mathrm{Var}(Y_i \mid X_i = x) = \sigma^2(x).$$
- $m(x)$ is the object of interest and requires estimation
- $K : L^2(\mu_1) \to L^2(\mu_2)$ is a compact and injective linear operator between $L^2$-spaces with respect to the probability measures $\mu_1$ and $\mu_2$
- $X_1, \dots, X_n$ are the design points, drawn randomly from a density $h$
- $\sigma^2(x)$ is a positive and finite variance function
Singular value decomposition
The operator $K$ has a singular system $\{(\lambda_j, \varphi_j, \psi_j) : j \in \mathbb{N}\}$, where
$$\lambda_j \psi_j = K\varphi_j, \qquad \langle \varphi_i, \varphi_j \rangle_{\mu_1} = \delta_{ij}, \qquad \langle \psi_i, \psi_j \rangle_{\mu_2} = \delta_{ij}, \quad i, j \in \mathbb{N}.$$
The functions $m$ and $Km$ have expansions of the form
$$m = \sum_{j=1}^{\infty} a_j \varphi_j \quad \text{and} \quad Km = \sum_{j=1}^{\infty} b_j \psi_j = \sum_{j=1}^{\infty} a_j K\varphi_j = \sum_{j=1}^{\infty} a_j \lambda_j \psi_j,$$
where $a_j = \langle m, \varphi_j \rangle_{\mu_1}$ and $b_j = \langle Km, \psi_j \rangle_{\mu_2}$.
Estimation
Idea:
- Estimate the coefficients $b_j$ from the observations to obtain $\hat{b}_1, \hat{b}_2, \dots$
- Use $a_j = b_j / \lambda_j$ to estimate $\hat{a}_j = \hat{b}_j / \lambda_j$, $j = 1, 2, \dots$ (The singular values $\lambda_1, \lambda_2, \dots$ of $K$ are known.)
- Substitute $\hat{a}_j$, $j = 1, 2, \dots$, into the expansion for $m$
Spectral cut-off regularisation
Problem: We need to estimate infinitely many parameters from a finite number of observations, i.e. the problem is ill-posed. There are different types of regularisation to overcome this issue:
- Tikhonov regularisation (ridge regression)
- Spectral cut-off
- Lasso
- ...
In what follows, we will use spectral cut-off regularisation, i.e.
$$\hat{m} = \sum_{j=1}^{M} \frac{\hat{b}_j}{\lambda_j} \varphi_j, \quad \text{for some } M \in \mathbb{N}.$$
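As a concrete illustration, the spectral cut-off estimator can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the convolution setting introduced later (cosine basis with $\lambda_j = j^{-2}$), a uniform design $h \equiv 1$, and the direct coefficient estimator $\hat{b}_j = \frac{1}{n}\sum_i \psi_j(X_i) Y_i / h(X_i)$ discussed at the end of the talk.

```python
import numpy as np

def psi(j, x):
    # Cosine basis on [0, 1]: psi_1 = 1, psi_j = sqrt(2) cos(2(j-1) pi x) for j >= 2
    return np.ones_like(x) if j == 1 else np.sqrt(2.0) * np.cos(2 * (j - 1) * np.pi * x)

def spectral_cutoff(X, Y, h, lam, M):
    """Spectral cut-off estimate: b_hat_j = mean(psi_j(X) Y / h(X)), a_hat_j = b_hat_j / lam_j."""
    b_hat = np.array([np.mean(psi(j, X) * Y / h(X)) for j in range(1, M + 1)])
    a_hat = b_hat / lam[:M]
    return a_hat, lambda x: sum(a_hat[j - 1] * psi(j, x) for j in range(1, M + 1))

# Simulated data from the model E[Y | X = x] = (Km)(x), with a_j = lam_j = j^{-2}
rng = np.random.default_rng(1)
n, M, J = 1000, 5, 40
j_all = np.arange(1, J + 1)
lam = j_all ** -2.0          # singular values of K (assumed known)
a = j_all ** -2.0            # coefficients of the unknown m
b = a * lam                  # coefficients of Km
X = rng.uniform(0.0, 1.0, n)                     # uniform design, h(x) = 1
Km_X = sum(b[j - 1] * psi(j, X) for j in j_all)
Y = Km_X + 0.1 * rng.normal(size=n)
a_hat, m_hat = spectral_cutoff(X, Y, lambda x: np.ones_like(x), lam, M)
```

The division by $\lambda_j = j^{-2}$ amplifies the noise in the higher coefficients, which is exactly why the cut-off level $M$ has to be kept moderate.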
The goal is to minimise the integrated mean squared error for estimating $m$, $\Phi(h)$, with respect to the design density $h(x)$:
$$\Phi(h) = \frac{1}{n} \int \frac{g_M(x)\,\bigl(\sigma^2(x) + (Km)^2(x)\bigr)}{h(x)} \, d\mu_2(x) + \sum_{j=M+1}^{\infty} \frac{b_j^2}{\lambda_j^2} - \frac{1}{n} \sum_{j=1}^{M} \frac{b_j^2}{\lambda_j^2}, \quad \text{where } g_M(x) = \sum_{j=1}^{M} \frac{\psi_j^2(x)}{\lambda_j^2}.$$
Note that
- Only the first term of the IMSE depends on $h$
- This term also depends on the unknown functions $\sigma^2(x)$ and $(Km)(x)$ and on the unknown regularisation parameter $M$
The optimal design density
Theorem. For fixed $M$, the objective function $\Phi(h)$ is minimised by the density
$$h_M^*(x) = \frac{\sqrt{g_M(x)\,\bigl(\sigma^2(x) + (Km)^2(x)\bigr)}}{\int \sqrt{g_M(t)\,\bigl(\sigma^2(t) + (Km)^2(t)\bigr)} \, d\mu_2(t)}.$$
Proof: application of the Cauchy-Schwarz inequality.
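The Cauchy-Schwarz argument is easy to check numerically: among all densities $h$, the design-dependent term $\int g_M (\sigma^2 + (Km)^2)/h \, d\mu_2$ is smallest for $h \propto \sqrt{g_M(\sigma^2 + (Km)^2)}$. The sketch below uses illustrative choices of $g_M$ and $s = \sigma^2 + (Km)^2$ on $[0, 1]$ with Lebesgue measure; these particular functions are assumptions of the sketch, not taken from the talk.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 20001)

def integral(vals):
    # Trapezoidal rule on the fixed grid xs
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(xs)))

def design_term(h, g, s):
    # The only part of Phi(h) that depends on the design density h (up to the factor 1/n)
    return integral(g * s / h)

# Illustrative (assumed) g_M and s = sigma^2 + (Km)^2 on [0, 1]
g = 1.0 + 32.0 * np.cos(2 * np.pi * xs) ** 2
s = 1.0 + 0.25 * np.sin(np.pi * xs) ** 2

w = np.sqrt(g * s)
h_opt = w / integral(w)                      # density from the theorem
h_unif = np.ones_like(xs)                    # uniform competitor
h_pert = h_opt * (1.0 + 0.3 * np.sin(2 * np.pi * xs))
h_pert = h_pert / integral(h_pert)           # perturbed competitor (still positive)

t_opt, t_unif, t_pert = (design_term(h, g, s) for h in (h_opt, h_unif, h_pert))
```

By Cauchy-Schwarz, the design term is bounded below by $(\int \sqrt{g s})^2$, with equality exactly at the optimal density, so `t_opt` is smaller than the value for any competitor.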
Example: convolution
Let $m \in L^2[0, 1]$ be periodic and symmetric around 0.5, and let $K$ be the convolution operator, i.e.
$$(Km)(x) = (G * m)(x) = \int_0^1 G(x - t)\, m(t) \, dt$$
for some known symmetric function $G$. Then
$$\varphi_1(x) = \psi_1(x) = 1, \qquad \varphi_j(x) = \psi_j(x) = \sqrt{2} \cos(2(j-1)\pi x), \quad j \ge 2,$$
and $\lambda_j = \int_0^1 G(t)\varphi_j(t) \, dt$. The measures $\mu_1$ and $\mu_2$ are both the Lebesgue measure.
Example: convolution
Let $G$ be such that $\lambda_j = j^{-2}$, $j = 1, 2, \dots$. We require plausible values for $a_j$, $j = 1, 2, \dots$, and for $\sigma^2(x)$ in order to find the optimal density. For $a_j = j^{-2}$, $j = 1, 2, \dots$, the integrated squared bias is of order $O(M^{-3})$ and the integrated variance is of order $O(M^5/n)$, so we choose
$$M = \left\lfloor c \left( \frac{n}{\tau^2} \right)^{1/8} \right\rfloor + 1$$
for different values of $c$, where $\tau^2 = \int_0^1 \bigl(\sigma^2(x) + (Km)^2(x)\bigr) \, dx$.
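A small helper makes this rule concrete. Taking the floor before adding 1 is an assumption consistent with the integer values of $M$ reported in the tables below; in practice $\tau^2$ would itself be assumed or estimated.

```python
import math

def choose_M(n, c=1.0, tau2=1.0):
    """Spectral cut-off level M = floor(c * (n / tau2)**(1/8)) + 1 (floor assumed)."""
    return math.floor(c * (n / tau2) ** 0.125) + 1

# The cut-off level grows very slowly with the sample size n
levels = {n: choose_M(n) for n in (25, 100, 1000, 10000)}
```

The exponent 1/8 balances the $O(M^{-3})$ squared bias against the $O(M^5/n)$ variance, so even $n = 10000$ observations only support a handful of coefficients.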
Some optimal designs
[Figure: optimal design densities $h_M^*(x)$ on $[0, 1]$ for $M = 2, 5, 10, 20$, each with $\sigma^2 = 1$.]
Design assessment - comparison with the uniform design
We compare the optimal designs with the uniform design $h_u(x) \equiv 1$, using the ratio $\Phi(h_M^*)/\Phi(h_u)$ as a measure of efficiency.

           sigma^2 = 0.25           sigma^2 = 1              sigma^2 = 4
  n      c=0.5   c=1    c=2      c=0.5   c=1    c=2      c=0.5   c=1    c=2
  25     0.889   0.839  0.889    0.890   0.845  0.891    0.891   0.849  0.893
  100    0.911   0.850  0.911    0.905   0.851  0.913    0.898   0.852  0.915
  1000   0.916   0.895  0.926    0.901   0.895  0.928    0.877   0.895  0.929

Table: Efficiency of the uniform design for different sample sizes, variances and choices of the parameter $M = \lfloor c(n/\tau^2)^{1/8} \rfloor + 1$ used in the spectral cut-off regularisation, for various values of $c$.
The uniform design is doing quite well!
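The efficiencies in the table can be reproduced, at least approximately, by direct numerical integration. The sketch below evaluates $\Phi(h)$ (variance term, finite-sum correction, and truncation bias) for the convolution example with $a_j = \lambda_j = j^{-2}$, $\sigma^2 = 1$, $n = 100$ and $M = 2$; truncating the series at $J = 50$ terms is an assumption of this sketch. The resulting ratio comes out close to the tabulated value 0.851.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 40001)

def integral(vals):
    # Trapezoidal rule on the fixed grid xs
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(xs)))

def psi(j, x):
    # Cosine basis: psi_1 = 1, psi_j = sqrt(2) cos(2(j-1) pi x) for j >= 2
    return np.ones_like(x) if j == 1 else np.sqrt(2.0) * np.cos(2 * (j - 1) * np.pi * x)

J, M, n, sigma2 = 50, 2, 100, 1.0
j_all = np.arange(1, J + 1)
lam = j_all ** -2.0
a = j_all ** -2.0
b = a * lam

Km = sum(b[j - 1] * psi(j, xs) for j in j_all)
s = sigma2 + Km ** 2
g = sum(psi(j, xs) ** 2 / lam[j - 1] ** 2 for j in range(1, M + 1))

bias2 = float(np.sum((b[M:] / lam[M:]) ** 2))   # sum_{j > M} b_j^2 / lam_j^2
corr = float(np.sum((b[:M] / lam[:M]) ** 2))    # sum_{j <= M} b_j^2 / lam_j^2

def Phi(h):
    return integral(g * s / h) / n - corr / n + bias2

h_unif = np.ones_like(xs)
w = np.sqrt(g * s)
h_opt = w / integral(w)
efficiency = Phi(h_opt) / Phi(h_unif)
```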
Design assessment - model misspecifications
We want to assess the robustness of locally optimal designs under various model misspecifications.
- We calculate 8 locally optimal designs with respect to $a_j = j^{-2}$ or $j^{-1.25}$ ($j = 1, 2, \dots$), $\sigma^2 = 0.25$ or $1$, and $M = 2$ or $5$
- We assess each design under each of these 8 scenarios through its efficiency $\mathrm{eff}(h) = \Phi(h^*(a_j, \sigma^2, M))/\Phi(h)$, where $h^*(a_j, \sigma^2, M)$ is the locally optimal design for the respective scenario
- We also include the uniform design $h_u$ in the study
Design assessment - model misspecifications

                               a_j = j^{-2}                      a_j = j^{-1.25}
                          sigma^2=0.25   sigma^2=1         sigma^2=0.25   sigma^2=1
  design \ scenario       M=2    M=5     M=2    M=5        M=2    M=5     M=2    M=5
  h*(j^{-2},    0.25, 2)  1      0.681   1      0.679      0.999  0.690   1      0.685
  h*(j^{-2},    0.25, 5)  0.743  1       0.740  1          0.830  0.999   0.805  1
  h*(j^{-2},    1,    2)  1      0.683   1      0.681      0.998  0.692   1      0.687
  h*(j^{-2},    1,    5)  0.740  1       0.739  1          0.827  0.997   0.804  0.999
  h*(j^{-1.25}, 0.25, 2)  0.998  0.673   0.996  0.670      1      0.683   0.999  0.677
  h*(j^{-1.25}, 0.25, 5)  0.747  0.999   0.743  0.997      0.835  1       0.809  0.999
  h*(j^{-1.25}, 1,    2)  1      0.678   0.999  0.676      0.999  0.688   1      0.682
  h*(j^{-1.25}, 1,    5)  0.745  1       0.742  0.999      0.831  0.999   0.807  1
  h_u                     0.850  0.926   0.851  0.928      0.900  0.920   0.889  0.925

Table: Efficiencies of the 9 designs under investigation for 8 different scenarios with n = 100.
Note: All off-diagonal 1's come from rounding to three decimal places.
Design assessment - model misspecifications
Conclusions from this example:
- The uniform design is most robust across all scenarios
- Misspecification of the coefficients $a_j$ or of $\sigma^2$ hardly affects the efficiency of the locally optimal designs, since these designs are fairly similar
Design assessment - model misspecifications
[Figure: locally optimal densities $h_M^*(x)$ for $M = 5$, $\sigma^2 = 0.25$ or $1$, and $a_j = j^{-k}$, $k = 2$ or $1.25$.]
Radon transform: tomography
- We want to recover the density $m(r, \vartheta)$ of an object from line integrals through a slice
- Each line or path is parametrised by its distance $s$ from the centre and its angle $\varphi$
- The paths are drawn randomly from the design density $h(s, \varphi)$
- We observe photon counts (Poisson distribution)
Radon transform
The operator is the Radon transform $K = R$, defined through
$$Rm(s, \varphi) = \frac{1}{2\sqrt{1 - s^2}} \int_{-\sqrt{1-s^2}}^{\sqrt{1-s^2}} m\bigl(s \cos(\varphi) - t \sin(\varphi),\; s \sin(\varphi) + t \cos(\varphi)\bigr) \, dt,$$
with singular system
$$\varphi_{p,q}(r, \vartheta) = \sqrt{q+1}\, Z_q^{|p|}(r)\, e^{ip\vartheta}, \qquad \psi_{p,q}(s, \varphi) = U_q(s)\, e^{ip\varphi},$$
in brain space and detector space, respectively, and $\lambda_{p,q} = (q+1)^{-1/2}$, $q = 0, 1, 2, \dots$, $p = -q, -q+2, \dots, q$.
Radon transform
The functions $\sqrt{q+1}\, Z_q^{|p|}(r)\, e^{ip\vartheta}$ involve the Zernike polynomials $Z_q^{|p|}$, and $U_q(s)$ denotes the $q$th Chebyshev polynomial of the second kind. The measures in brain and detector space are given by
$$d\mu_B(r, \vartheta) = \frac{r}{\pi} \, dr \, d\vartheta \quad \text{for } 0 \le r \le 1,\; 0 \le \vartheta < 2\pi,$$
$$d\mu_D(s, \varphi) = \frac{2}{\pi^2} (1 - s^2)^{1/2} \, ds \, d\varphi \quad \text{for } 0 \le s \le 1,\; 0 \le \varphi < 2\pi.$$
The optimal design
Since the photon counts are Poisson, the variance function equals the mean, $\sigma^2(s, \varphi) = Rm(s, \varphi)$, and the optimal design density is given by
$$h_M^*(s, \varphi) = \frac{\sqrt{g_M(s)\,\bigl(Rm(s, \varphi) + Rm^2(s, \varphi)\bigr)}}{\int_0^1 \int_0^{2\pi} \sqrt{g_M(t)\,\bigl(Rm(t, \rho) + Rm^2(t, \rho)\bigr)}\; d\mu_D(t, \rho)},$$
where
$$g_M(s) = g_M(s, \varphi) = \sum_{q=1}^{M} (q+1)^2\, U_q^2(s).$$
Slices of example objects
[Figure: slices of the example objects.]
Scanning a centered disc
Suppose we want to scan a solid disc of radius 0.5 positioned in the middle of the scan field. Then, for each slice,
$$m(r, \vartheta) = \begin{cases} 1 & \text{if } 0 \le r \le 0.5 \\ 0 & \text{otherwise.} \end{cases}$$
The Radon transform of this function is given by
$$Rm(s, \varphi) = \frac{\sqrt{0.5^2 - s^2}}{\sqrt{1 - s^2}}\, I_{[0, 0.5]}(s).$$
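This closed form can be checked directly against the definition of $R$: the path at distance $s \le 0.5$ from the centre crosses the disc in a chord of length $2\sqrt{0.25 - s^2}$, whatever the angle. A small numerical sketch (midpoint rule on a fixed grid, an assumption of this sketch):

```python
import numpy as np

def radon_disc_numeric(s, phi, radius=0.5, n_grid=20001):
    """Evaluate Rm(s, phi) = 1/(2 sqrt(1-s^2)) * integral of the disc
    indicator along the path, using the midpoint rule."""
    half = np.sqrt(1.0 - s ** 2)
    edges = np.linspace(-half, half, n_grid + 1)
    t = 0.5 * (edges[:-1] + edges[1:])
    dt = edges[1] - edges[0]
    x = s * np.cos(phi) - t * np.sin(phi)
    y = s * np.sin(phi) + t * np.cos(phi)
    inside = (x ** 2 + y ** 2 <= radius ** 2)
    return float(np.sum(inside) * dt / (2.0 * half))

def radon_disc_exact(s, radius=0.5):
    # sqrt(radius^2 - s^2) / sqrt(1 - s^2) for 0 <= s <= radius, else 0
    return np.sqrt(radius ** 2 - s ** 2) / np.sqrt(1.0 - s ** 2) if s <= radius else 0.0

num = radon_disc_numeric(0.3, 1.0)
exact = radon_disc_exact(0.3)
```

Paths with $s > 0.5$ miss the disc entirely and give a zero line integral, which is the point made below about the uniform design.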
Scanning a centered disc
In this case, we can find the optimal density explicitly:
$$h_M^*(s, \varphi) = \frac{\pi}{4} \cdot \frac{\sqrt{g_M(s) \left( \dfrac{\sqrt{0.5^2 - s^2}}{\sqrt{1 - s^2}} + \dfrac{0.5^2 - s^2}{1 - s^2} \right)}}{\int_0^{0.5} \sqrt{g_M(t) \left( \dfrac{\sqrt{0.5^2 - t^2}}{\sqrt{1 - t^2}} + \dfrac{0.5^2 - t^2}{1 - t^2} \right)}\, \sqrt{1 - t^2}\, dt}$$
if $0 \le s \le 0.5$, $0 \le \varphi \le 2\pi$, and $h_M^*(s, \varphi) = 0$ otherwise.
Scanning a polar rose
For the polar rose with 8 petals and radius 0.5,
$$m(r, \vartheta) = 1 \quad \text{if } 0 \le r \le 0.5\cos(4\vartheta),\; 0 \le \vartheta \le 2\pi,$$
and $m(r, \vartheta) = 0$ otherwise. Here, the optimal density has to be found numerically.
Some optimal designs for centered disc and polar rose
[Figure: perspective plots of the optimal design densities $h^*(s, \varphi)$ for the centered disc and the polar rose, for $M = 5$ and $M = 10$.]
Design assessment - comparison with the uniform design

           centered disc                    polar rose
  n      c=0.5     c=1       c=2        c=0.5     c=1       c=2
  25     .751 (2)  .696 (3)  .607 (6)   .830 (2)  .691 (4)  .632 (8)
  100    .833 (3)  .658 (5)  .611 (9)   .910 (3)  .725 (6)  .646 (11)
  1000   .915 (4)  .733 (8)  .620 (15)  .950 (5)  .842 (9)  .679 (18)
  10000  .962 (7)  .801 (13) .623 (26)  .981 (8)  .901 (16) .661 (32)

Table: Efficiency of the uniform design on $[0, 1] \times [0, 2\pi]$; the value of $M$ is given in brackets.
Why is the uniform design doing so poorly this time? Many observations are made along paths which do not hit the object!
Illustration
Scanning a solid disc of radius 0.5 in the centre of the scan field: under the uniform design, many paths do not hit the object, so these observations give limited information.
Design assessment - comparison with the uniform design
Suppose we knew in advance that the object extends only up to 0.5 from the centre of the scan field. Then we could use a uniform design with constant density
$$h_{U, 0.5}(s, \varphi) \equiv \frac{\pi}{\sqrt{0.75} + 2\arcsin(0.5)} \approx 1.642 \quad \text{on } [0, 0.5] \times [0, 2\pi].$$

           centered disc                    polar rose
  n      c=0.5     c=1       c=2        c=0.5     c=1       c=2
  25     .963 (2)  .985 (3)  .981 (6)   .950 (2)  .920 (4)  .912 (8)
  100    .993 (3)  .973 (5)  .981 (9)   .989 (3)  .945 (6)  .919 (11)
  1000   .996 (4)  .985 (8)  .982 (15)  .992 (5)  .973 (9)  .931 (18)
  10000  .998 (7)  .992 (13) .981 (26)  .997 (8)  .984 (16) .926 (32)

Table: Efficiency of the uniform design on $[0, 0.5] \times [0, 2\pi]$.
This uniform design is doing very well.
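The constant 1.642 is just the normalisation of a flat density with respect to $\mu_D$ restricted to $[0, 0.5] \times [0, 2\pi)$, which is quick to verify numerically:

```python
import numpy as np

# Constant height of the uniform design on [0, 0.5] x [0, 2pi)
c = np.pi / (np.sqrt(0.75) + 2.0 * np.arcsin(0.5))

# Check that c * dmu_D integrates to 1 over s in [0, 0.5], phi in [0, 2pi),
# where dmu_D(s, phi) = (2 / pi^2) * sqrt(1 - s^2) ds dphi
s = np.linspace(0.0, 0.5, 200001)
dens = c * (2.0 / np.pi ** 2) * np.sqrt(1.0 - s ** 2)
mass = 2.0 * np.pi * float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(s)))
```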
Shifted disc and double disc
For the final examples, the functions to be estimated are, respectively,
$$m(r, \vartheta) = \begin{cases} 1 & \text{if } 0 \le r \le \cos(\vartheta),\; 0 \le \vartheta \le 2\pi \\ 0 & \text{otherwise} \end{cases}$$
and
$$m(r, \vartheta) = \begin{cases} 1 & \text{if } 0 \le r \le 0.5,\; 0 \le \vartheta \le 2\pi \\ 0.5 & \text{if } 0.5 < r \le 1,\; 0 \le \vartheta \le 2\pi, \end{cases}$$
i.e. the density of the object is higher towards the centre.
Some optimal designs for shifted disc and double disc
[Figure: perspective plots of the optimal design densities $h^*(s, \varphi)$ for the shifted disc and the double disc, for $M = 5$ and $M = 10$.]
Design assessment - comparison with the uniform design

           shifted disc                     double disc
  n      c=0.5     c=1       c=2        c=0.5     c=1       c=2
  25     .679 (2)  .568 (3)  .541 (6)   .856 (2)  .860 (3)  .863 (5)
  100    .693 (3)  .581 (5)  .543 (9)   .873 (2)  .866 (4)  .866 (7)
  1000   .864 (4)  .644 (8)  .554 (15)  .920 (3)  .873 (6)  .866 (12)
  10000  .923 (7)  .702 (13) .559 (26)  .937 (5)  .879 (10) .867 (20)

Table: Efficiency of the uniform design on $[0, 1] \times [0, 2\pi]$.
For the double disc, the uniform design is doing reasonably well. For the shifted disc it is performing quite poorly.
Conclusions
- The locally optimal designs rarely outperform the uniform design considerably...
- ... and when they do, this can often be remedied using prior knowledge (but not always)
- The uniform design appears to be more robust with respect to model misspecifications
- Any prior knowledge on $m$ should be incorporated in the design
Future work
- Investigate the performance of sequential designs
- Consider optimal design for different methods of modelling/estimation/regularisation in inverse problems
- Consider dynamic problems in this context, e.g. images of a beating heart in real time
Thank You!
Some references
- Biedermann, S.G.M., Bissantz, N., Dette, H. and Jones, E. (2011). Optimal design for indirect regression. Under review.
- Bissantz, N. and Holzmann, H. (2008). Statistical inference for inverse problems. Inverse Problems, 24, 034009 (17pp). doi: 10.1088/0266-5611/24/3/034009
- Johnstone, I.M. and Silverman, B.W. (1990). Speed of estimation in positron emission tomography and related inverse problems. Annals of Statistics, 18, 251-280.
Bias
Estimate the coefficients as
$$\hat{b}_j = \frac{1}{n} \sum_{i=1}^{n} \frac{\psi_j(X_i)}{h(X_i)}\, Y_i.$$
Note that this is not the least squares estimator, but a direct estimator avoiding matrix inversion. Then
$$E[\hat{b}_j] = \int (Km)(x)\, \psi_j(x) \, d\mu_2(x) = b_j \quad \text{(unbiased!)}$$
The integrated squared bias for estimating $m$ is given by
$$\int \bigl( E[m(x) - \hat{m}(x)] \bigr)^2 \, d\mu_1(x) = \sum_{j=M+1}^{\infty} \frac{b_j^2}{\lambda_j^2}.$$
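Unbiasedness of this direct estimator, including the $h(X_i)$ correction for a non-uniform design, is easy to confirm by simulation. The cosine basis and the particular design density $h(x) = 0.5 + x$ below are illustrative assumptions of this sketch:

```python
import numpy as np

def psi(j, x):
    # Cosine basis: psi_1 = 1, psi_j = sqrt(2) cos(2(j-1) pi x) for j >= 2
    return np.ones_like(x) if j == 1 else np.sqrt(2.0) * np.cos(2 * (j - 1) * np.pi * x)

rng = np.random.default_rng(7)
n, reps = 200, 3000
b = np.array([1.0, 1.0 / 16.0, 1.0 / 81.0])   # b_j = a_j * lam_j for a_j = lam_j = j^{-2}
Km = lambda x: sum(b[j - 1] * psi(j, x) for j in (1, 2, 3))

h = lambda x: 0.5 + x                          # non-uniform design density on [0, 1]
sample_X = lambda size: (-1.0 + np.sqrt(1.0 + 8.0 * rng.uniform(size=size))) / 2.0  # inverse cdf

b2_hats = np.empty(reps)
for r in range(reps):
    X = sample_X(n)
    Y = Km(X) + 0.5 * rng.normal(size=n)
    b2_hats[r] = np.mean(psi(2, X) * Y / h(X))  # hat b_2

bias_mc = b2_hats.mean() - b[1]                 # Monte Carlo bias, should be near 0
```

Dividing by $h(X_i)$ reweights the sample so that $E[\psi_j(X) Y / h(X)] = \int (Km)\psi_j \, d\mu_2 = b_j$ for any positive design density.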
Variance
The integrated variance for estimating $m$ is
$$\int \mathrm{Var}(\hat{m}(x)) \, d\mu_1(x) = \frac{1}{n} \int \frac{g_M(x)\,\bigl(\sigma^2(x) + (Km)^2(x)\bigr)}{h(x)} \, d\mu_2(x) - \frac{1}{n} \sum_{j=1}^{M} \frac{b_j^2}{\lambda_j^2},$$
where $g_M(x) = \sum_{j=1}^{M} \frac{\psi_j^2(x)}{\lambda_j^2}$. The first term is usually dominating.