
Abstract

The wavefront distortion caused by turbulence in the Earth's atmosphere has made it necessary to develop tools for ground-based telescopes that compensate for this aberration. To this end, the technology of Adaptive Optics (AO) has been investigated and used in astronomy. However, AO systems require the solution of large, ill-conditioned linear systems. This thesis deals with possible approaches to finding regularized solutions of such systems. We start with an introduction to Adaptive Optics and the underlying mathematical modeling. The second chapter deals with the regularization of ill-posed problems and introduces some regularization methods. In Chapter 3, we discuss whether certain bases are likely to yield sparse representations of the desired solution and present the iterative soft-thresholding algorithm, which promotes sparsity. Finally, we introduce an accelerated version of this method and conclude by giving some numerical results.

Zusammenfassung

Turbulence in the atmosphere close to the Earth's surface, also known as atmospheric seeing, causes errors in the images of celestial objects produced by ground-based telescopes. It is therefore necessary to equip these telescopes with the technology of Adaptive Optics (AO), which corrects the distorted wavefronts that reach the telescope. Part of this correction by AO systems requires the solution of large linear ill-conditioned problems. The present thesis is concerned with possible approaches to obtaining regularized solutions of such problems. We begin with a short introduction to Adaptive Optics and the underlying mathematical model. The second chapter gives an overview of the theory of regularization of ill-posed problems and introduces well-known regularization methods. In Chapter 3 we address the question of whether there is a basis in which the sought solution can be represented well, i.e. with few non-zero coefficients. Furthermore, we present an iterative algorithm based on soft-thresholding that favors sparse solutions. Finally, we discuss an accelerated version of this method and present numerical results.

Acknowledgement

I owe my deepest gratitude to my supervisor Prof. Ronny Ramlau, who supported me throughout my work on this thesis and always took the time to answer my questions and make suggestions. I would also like to thank my colleagues Dr. Mariya Zhariy and Dr. Tapio Helin, who encouraged me and helped me develop an understanding of the subject. Last but not least, I would like to show my gratitude to my family and friends.

List of Figures

1.1 Correction of the incoming wavefront
1.2 Achievable improvement with adaptive optics. On the left: image dominated by atmospheric turbulence. On the right: corrected image clearly shows the binary star
1.3 Configuration of an AO system
1.4 A deformable mirror
1.5 Schematic representation of the Shack-Hartmann WFS
1.6 Discretization of the aperture for n = 10. The circle describes the pupil of the telescope. Shaded squares are the pupil-masked subapertures and dots represent DM actuators
1.7 Condition number w.r.t. matrix size
2.1 Original phase screen
2.2 Reconstructed phase screens for different noise levels, τ = 1.2
2.3 Relative error in φ w.r.t. noise level δ, τ = 1.2
3.1 φ and ψ for Daubechies wavelets
3.2 1D slice of original phase screen
3.3 Sparsity pattern of P for the Haar basis
3.4 Sparsity pattern of P for the Daubechies wavelets
3.5 Reconstructed 1D phase screens for Haar basis, p = 1 and different noise levels
3.6 Error in φ w.r.t. noise level δ. Haar basis, p = 1
3.7 Error in φ w.r.t. noise level δ for FISTA with p = 1
3.8 Performance of ISTA and FISTA for p = 1
3.9 Performance of ISTA and FISTA for p = 1.5 and p = 2
3.10 Distribution of absolute values of wavelet coefficients around zero for different values of p. x-axis: absolute values of coefficients; y-axis: number of coefficients

Contents

1 Introduction
  1.1 Imaging Through the Atmosphere
  1.2 Adaptive Optics Components
  1.3 Mathematical Modeling
  1.4 Bilinear Influence Functions
2 Linear Ill-Posed Problems
  2.1 Compact Operators
  2.2 Regularization Operators
  2.3 CGNE for the Bilinear Ansatz
3 Sparse Reconstruction
  3.1 An Iterative Soft-Thresholding Algorithm
  3.2 Wavelets
    3.2.1 Multiresolution Analysis
    3.2.2 Orthonormal Bases of Compactly Supported Wavelets
  3.3 Implementation
    3.3.1 Choosing the Weights w_γ
    3.3.2 The Shrinkage Function S_{w_γ,p}
    3.3.3 Building the Poke Matrix
    3.3.4 The Regularization Parameter α
  3.4 A Fast Iterative Soft-Thresholding Algorithm
4 Numerical Results and Conclusion
References

1 Introduction

We start with a (very) brief introduction to Adaptive Optics. For a detailed discussion we refer to [8].

1.1 Imaging Through the Atmosphere

When light from an astronomical object, e.g. a star, propagates through the Earth's atmosphere, it is distorted by atmospheric turbulence, which arises from the interaction between layers of different temperature and different wind speeds. An indicator for turbulence in fluids is the Reynolds number: if the Reynolds number is below a critical value, the flow will be laminar; otherwise turbulence will occur. The Reynolds number is a dimensionless parameter given as

$Re = \frac{\text{inertial forces}}{\text{viscous forces}} = \frac{Vl}{k_\nu}$,

where $V$ is a characteristic velocity, $l$ is a characteristic size and $k_\nu$ is the kinematic viscosity of the fluid. Close to Earth, the solar heating of its surface causes convection currents, and the kinematic viscosity of air is $k_\nu \approx 1.5 \cdot 10^{-5}\,\mathrm{m^2\,s^{-1}}$. For typical characteristic velocities ($V > 1$ m/s) and characteristic lengths $l$ of several meters to kilometers, this results in Reynolds numbers $Re > 10^6$ (e.g. $V = 10$ m/s and $l = 15$ m already give $Re \approx 10^7$), which are sufficiently large for turbulence to occur, [12].

Since a beam of light that propagates through the atmosphere suffers from this turbulence, images of astronomical objects taken from Earth are blurred and distorted. This is the biggest challenge for Earth-based astronomy. Adaptive Optics deals with developing devices for ground-based telescopes that can compensate for this distortion. The main idea of Adaptive Optics is the following: since the object of interest is assumed to be far away from Earth, the propagating wavefronts are almost planar. Due to atmospheric turbulence they become perturbed and are no longer planar. A beam of light can be described by an electric field of the form $Ae^{+i\phi_{atm}}$, where $A$ is the amplitude and $\phi_{atm}$ is the phase of the beam. If a mirror can be shaped according to the conjugated phase, i.e. $Ae^{-i\phi_{atm}}$,

Figure 1.1: Correction of the incoming wavefront.

and the light emitted by the astronomical object is reflected at this mirror, one obtains planar wavefronts and has thus corrected the distortion (see Fig. 1.1). Of course, this is only a very basic and schematic description of an Adaptive Optics (AO) system. The main challenge is to achieve a high-speed, real-time correction for the turbulence. Figure 1.2 shows an example of how much Adaptive Optics can improve the quality of an image.

1.2 Adaptive Optics Components

We now want to go into more detail about what an AO system looks like, i.e. what its main components are. Typically, an AO system consists of

1. a deformable mirror (DM),
2. a wavefront sensor (WFS) and
3. a control computer

(see Fig. 1.3). In order to compensate for the atmospheric turbulence, the

Figure 1.2: Achievable improvement with adaptive optics. On the left: image dominated by atmospheric turbulence. On the right: corrected image clearly shows the binary star.

shape of the DM is adapted in real time to follow the wavefront aberrations. The control computer receives measurement signals from the WFS and generates control signals to drive the DM. The WFS is located after the DM in the optical path; thus, it measures the residual wavefront perturbation after the DM correction has been applied. The aim of the AO control loop is to minimize this residual. Eventually, the corrected wavefront is sent to an astronomical instrument, such as an imaging camera, which is located in the focal plane of the telescope.

Deformable Mirrors

A deformable mirror consists of a continuous reflective facesheet that is deformed by a set of actuators placed at its back (see Fig. 1.4). There are several designs for the DM and especially for the actuators: the latter can be electromechanical, electromagnetic, piezoelectric or magnetostrictive units. Most commonly, the actuators are piezoelectric elements. The most important parameters in the design of DMs are

- the number of actuators,
- the spacing between them,
- the maximum stroke,
- the drive voltage levels and

Figure 1.3: Configuration of an AO system.

- the shape of the influence functions (see Section 1.3).

Wavefront Sensors

Figure 1.4: A deformable mirror.

The WFS measures the wavefront distortions caused by atmospheric turbulence. However, most wavefront sensors do not measure the wavefront directly, but rather its first derivative (wavefront slopes) or its second derivative (curvature). The most popular WFS is the Shack-Hartmann type. For the Shack-Hartmann WFS, a lenslet array is optically conjugated to the pupil plane of the telescope. This $n \times n$ grid of tiny lenses spatially samples the distorted wavefront. Each lens forms a small part of the image, corresponding to a part of the aperture (subaperture), onto a detector located in the focal plane of the lenslet array. If the wavefront in the pupil is planar, the subaperture images, called spots, form a regular grid. A distorted wave-

front results in a displacement of the spots (see Fig. 1.5). It is intuitively clear that the displacements of the spots in two orthogonal directions $x$ and $y$ are proportional to the average wavefront slopes $s_x$, $s_y$ in $x$ and $y$ over the corresponding subapertures. Thus, a Shack-Hartmann WFS measures the local gradient error. The wavefront is then reconstructed from the array $s = \begin{bmatrix} s_x \\ s_y \end{bmatrix} \in \mathbb{R}^{2n^2}$ of the measured slopes.

Figure 1.5: Schematic representation of the Shack-Hartmann WFS.

1.3 Mathematical Modeling

Our aim is to reconstruct the incoming wavefront $\phi_{atm} = \phi_{atm}(x) \in H^1(\mathbb{R}^2)$ from given WFS measurements $s \in \mathbb{R}^{2n^2}$ in order to compute the DM commands that are needed to shape the DM such that it is optically conjugated to $\phi_{atm}$. There are different setups of how the DM actuators are arranged. The most commonly used one for Shack-Hartmann WFSs is the Fried geometry, in which the actuators are located at the corners of the subapertures. By introducing a phase-to-WFS interaction operator $G : H^1(\mathbb{R}^2) \to \mathbb{R}^{2n^2}$, the problem can be formulated as follows:

$s = G\phi_{atm}$.    (1.1)

For a Shack-Hartmann WFS, $G\phi_{atm}$ is the gradient of the phase averaged over those subapertures that are located entirely inside the pupil, i.e. $G$ has the following structure:

$G = [G_x, G_y] = [M\Gamma_x, M\Gamma_y]$,

where for the $i$-th subaperture $\Omega_i$

$[\Gamma_x \phi_{atm}]_i = \int_{\Omega_i} \frac{\partial \phi_{atm}}{\partial x}\, dx$,

and similarly for $\Gamma_y$. Furthermore, the mask $M$ is defined as the diagonal matrix whose $i$-th diagonal entry is 1 if $\Omega_i$ is completely contained in the pupil and 0 otherwise.

Since $H^1(\mathbb{R}^2) \subset L^2(\mathbb{R}^2)$ and $L^2(\mathbb{R}^2) = L^2(\mathbb{R}) \otimes L^2(\mathbb{R})$, we can use a basis $\{b_i\}_{i=1}^\infty$ of $L^2(\mathbb{R})$ to represent the wavefront $\phi_{atm} \in H^1(\mathbb{R}^2)$:

$\phi_{atm}(x, y) = \sum_{i,j=1}^\infty \varphi_{i,j}\, b_i(x) b_j(y) = B\varphi$,    (1.2)

where $\varphi_{i,j}$ is the $(i,j)$-th coefficient of $\phi_{atm}$ in this basis representation and $B$ denotes the operator that maps the coefficients to $\phi_{atm} \in H^1(\mathbb{R}^2)$. By inserting (1.2) in (1.1) we get

$s = GB\varphi = P\varphi$,    (1.3)

where $P = GB$ denotes the DM-to-WFS (or Poke) operator and $\varphi = (\varphi_{i,j})_{i,j \in \mathbb{N}}$.

For the actual computation of the corrected wavefront $\phi_{rec}$ generated by the deformable mirror we use a discrete approach. We therefore compute a solution in finite-dimensional spaces $X_k \subset L^2(\mathbb{R}^2)$ for which

$X_1 \subset X_2 \subset X_3 \subset \dots, \qquad \overline{\bigcup_{k \in \mathbb{N}} X_k} = L^2(\mathbb{R}^2)$,

where $k$ is the dimension of $X_k$. If the telescope pupil consists of $n \times n$ subapertures and if we neglect the mask $M$, we end up with $N := (n+1)^2$ DM commands. Hence, we consider the space $X_N$ and represent $\phi_{rec}$ via the basis $\{b_i\}_{i=1}^{n+1}$ of the corresponding subspace of $L^2(\mathbb{R})$ and coefficients $a_{i,j}$:

$\phi_{rec}(x, y) = \sum_{i,j=1}^{n+1} a_{i,j}\, b_i(x) b_j(y)$.    (1.4)

We assume that $\{b_i\}_{i=1}^{n+1}$ is a basis that is shift invariant and locally supported. We can thus define $h_j$ for $j = (j_2 - 1)(n+1) + j_1$ by

$h_j(x, y) := b\Big(\frac{x - x_{j_1}}{\Delta}\Big)\, b\Big(\frac{y - y_{j_2}}{\Delta}\Big)$.

Here, $\Delta := 1/n$ and $b$ is defined such that

$b\Big(\frac{x - x_{j_1}}{\Delta}\Big) = b_{j_1}(x), \qquad b\Big(\frac{y - y_{j_2}}{\Delta}\Big) = b_{j_2}(y)$.

With this new representation we get

$\phi_{rec} = Ha = \sum_{j=1}^N a_j h_j$,    (1.5)

where $a \in \mathbb{R}^N$ is the vector of DM commands, the $h_j$ are the influence functions and $H : \mathbb{R}^N \to H^1(\mathbb{R}^2)$ is the DM-to-phase operator. Note that we need sufficiently smooth influence functions in order to guarantee that $H$ maps to $H^1(\mathbb{R}^2)$. The influence functions are called bilinear if

$b(z) = \begin{cases} 1 - |z| & |z| \le 1, \\ 0 & \text{otherwise}, \end{cases}$

and they are called bicubic if $b$ is a cubic B-spline supported on the interval $[-2, 2]$. By inserting (1.5) in (1.1) we can introduce the DM-to-WFS matrix (Poke matrix) $P$ as the product $GH$, i.e.

$s = GHa = Pa$.    (1.6)

The entries of the Poke matrix $P = [P^x, P^y]$ are then determined by

$P_{i,j}^x = \int_{\Omega_i} \frac{\partial h_j(x,y)}{\partial x}\, d(x,y) = \int_{y_{i_2}}^{y_{i_2+1}} \big( h_j(x_{i_1+1}, y) - h_j(x_{i_1}, y) \big)\, dy$,    (1.7)

$P_{i,j}^y = \int_{\Omega_i} \frac{\partial h_j(x,y)}{\partial y}\, d(x,y) = \int_{x_{i_1}}^{x_{i_1+1}} \big( h_j(x, y_{i_2+1}) - h_j(x, y_{i_2}) \big)\, dx$,    (1.8)

where $\Omega_i = [x_{i_1}, x_{i_1+1}] \times [y_{i_2}, y_{i_2+1}]$. In practice, the slopes $s$ will not be given exactly; instead, perturbed data $s^\delta$ will be available, i.e. $s^\delta = s + \eta^\delta$ for some random noise vector $\eta^\delta$. Then, (1.6) reads

$s^\delta = Pa$.    (1.9)

Note that the total number of slope measurements is approximately twice the number of DM commands. This redundancy in the measurements has the beneficial effect of smoothing random errors, which, in contrast to the case without redundancy, do not accumulate.

1.4 Bilinear Influence Functions

In this section, we assume a discretization with $n \times n$ subapertures (see Fig. 1.6). For bilinear influence functions, (1.7) and (1.8) reduce to

$P_{i,j}^x = \frac{1}{2}\big[ h_j(x_{i_1+1}, y_{i_2}) - h_j(x_{i_1}, y_{i_2}) + h_j(x_{i_1+1}, y_{i_2+1}) - h_j(x_{i_1}, y_{i_2+1}) \big]$,

$P_{i,j}^y = \frac{1}{2}\big[ h_j(x_{i_1}, y_{i_2+1}) - h_j(x_{i_1}, y_{i_2}) + h_j(x_{i_1+1}, y_{i_2+1}) - h_j(x_{i_1+1}, y_{i_2}) \big]$.

Figure 1.6: Discretization of the aperture for n = 10. The circle describes the pupil of the telescope. Shaded squares are the pupil-masked subapertures and dots represent DM actuators.

The operator $P^T P$, where $P$ is the Poke matrix, has zero values in its spectrum, which means that the problem does not have a unique solution. For illustration purposes, we determine $\kappa(P^T P) = \lambda_{max}(P^T P)/\lambda_{min}(P^T P)$ by taking $\lambda_{min}(P^T P)$ as the smallest non-zero eigenvalue in norm; otherwise $\kappa(P^T P) = \infty$ for any $n$ and no comparison would be possible. Figure 1.7 shows how $\kappa(P^T P)$ grows as $n$ gets larger.
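To make the bilinear ansatz concrete, the following MATLAB lines sketch how a single entry of $P^x$ can be evaluated from the reduced formula above. This is our illustration, not code from the thesis; the grid setup (the variables n and xa) is an assumption.

    % Bilinear influence functions on an n x n subaperture grid (Fried geometry).
    n  = 10;                          % subapertures per side (for illustration)
    Delta = 1/n;
    xa = (0:n)*Delta;                 % actuator positions (subaperture corners)
    b  = @(z) max(1 - abs(z), 0);     % bilinear basis function b(z)
    h  = @(x,y,j1,j2) b((x - xa(j1))/Delta) .* b((y - xa(j2))/Delta);

    % Entry P^x for subaperture i = (i1,i2) and actuator j = (j1,j2):
    Px = @(i1,i2,j1,j2) 0.5*( h(xa(i1+1), xa(i2),   j1,j2) - h(xa(i1), xa(i2),   j1,j2) ...
                            + h(xa(i1+1), xa(i2+1), j1,j2) - h(xa(i1), xa(i2+1), j1,j2) );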

14 κ(p T P) n, where pupil has n x n subapertures Figure 1.7: Condition number w.r.t. matrix size. Large condition numbers can cause a problem for noisy data. Consider a linear equation system Ax = b. (1.10) If the data is noisy we need to change the equation to and the change in x is A(x + x) = b + b x = A 1 b. (1.11) The corresponding norm estimates to (1.10) and (1.11) are x 2 A 1 2 b 2, (1.12) b 2 A 2 x 2. (1.13) These two inequations yield an estimate for the relative error in x: x x κ(a) b b. Therefore, if the condition number κ(a) is large, a small perturbation in b can cause a large error in x. 14

But even if $\kappa(A)$ is not large, small eigenvalues of $A$ can still cause problems. If all the eigenvalues of a symmetric matrix $A$ are small, the condition number $\kappa(A) = \lambda_{max}(A)/\lambda_{min}(A)$ will be moderate. Nevertheless, the eigenvalues of $A^{-1}$ will be large, since they are the reciprocals of the eigenvalues of $A$. Since we obtain $\Delta x$ by multiplying with $A^{-1}$ (see (1.11)), the data error can still be amplified.

Systems with large condition numbers are called ill-conditioned. Growing condition numbers are often due to the fact that the underlying continuous problem is ill-posed, a property that is defined in the next chapter. There, we also introduce methods for solving ill-posed problems, which we can use to tackle the ill-conditioned system (1.9).

2 Linear Ill-Posed Problems

The following introduction to ill-posed problems is mainly based on [7]. Let $T : X \to Y$ be a bounded linear operator, where $X$ and $Y$ are Hilbert spaces. We are interested in solving

$Tx = y$    (2.1)

for given $y \in Y$. The problem is called ill-posed if it is not well-posed. The following definition of well-posedness goes back to J. Hadamard:

Definition 2.1. A problem is well-posed if and only if the following properties hold:

(i) For all admissible data, a solution exists: $R(T) = Y$.
(ii) For all admissible data, the solution is unique: $N(T) = \{0\}$.
(iii) The solution depends continuously on the data: $T^{-1} \in L(Y, X)$.

Therefore, a problem is ill-posed if one of these properties is violated. If the last criterion is not fulfilled, serious problems can occur when applying standard numerical methods to the problem, since they become unstable. So-called regularization methods make it possible to recover information about the solution as stably as possible.

Uniqueness can be ensured by reformulating the notion of a solution.

Definition 2.2. Let $T : X \to Y$ be a bounded linear operator. An element $x \in X$ is a least-squares solution of $Tx = y$ if

$\|Tx - y\| = \inf\{\|Tz - y\| : z \in X\}$.

If in addition

$\|x\| = \inf\{\|z\| : z \text{ is a least-squares solution of } Tx = y\}$,

then $x$ is called the best-approximate or generalized solution.

Thus, the best-approximate solution is defined as the least-squares solution of minimal norm. We can now define the Moore-Penrose inverse, which is, roughly speaking, the operator that maps $y$ onto the best-approximate solution of (2.1).
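In finite dimensions, the best-approximate solution is exactly what MATLAB's pinv computes. A tiny sanity check (our example):

    A = [1 0; 0 0];          % N(A) = span{(0,1)'}, R(A) = span{(1,0)'}
    y = [2; 3];              % y does not lie in R(A)
    x_dagger = pinv(A)*y     % returns [2; 0]
    % Every least-squares solution has the form [2; t]'; among these,
    % x_dagger = [2; 0]' is the one of minimal norm.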

Definition 2.3. Let $\tilde{T} := T|_{N(T)^\perp} : N(T)^\perp \to R(T)$. The Moore-Penrose generalized inverse $T^\dagger$ of $T$ is defined as the unique linear extension of $\tilde{T}^{-1}$ such that

$D(T^\dagger) := R(T) \oplus R(T)^\perp$    (2.2)

and

$N(T^\dagger) = R(T)^\perp$.    (2.3)

Moreover, for $y \in D(T^\dagger)$, we define $x^\dagger := T^\dagger y$.

Proposition 2.1. Let $P$ and $Q$ be the orthogonal projectors onto $N(T)$ and $\overline{R(T)}$, respectively. Then the following Moore-Penrose equations hold:

$T T^\dagger T = T$,
$T^\dagger T T^\dagger = T^\dagger$,
$T^\dagger T = I - P$,
$T T^\dagger = Q|_{D(T^\dagger)}$.

Proof. See [7].

Proposition 2.2. The generalized inverse $T^\dagger$ is continuous if and only if $R(T)$ is closed.

Proof. See [7].

The next proposition gives the connection between the generalized inverse and least-squares solutions:

Proposition 2.3. Let $y \in D(T^\dagger)$. Then $x^\dagger$ is the unique best-approximate solution of $Tx = y$. Furthermore, $x^\dagger + N(T)$ is the set of all least-squares solutions.

Proof. See [7].

Let $T^*$ denote the adjoint of $T$, which is defined as the operator mapping from $Y$ to $X$ such that for all $x \in X$ and $y \in Y$

$\langle Tx, y\rangle = \langle x, T^* y\rangle$.

Then, the least-squares solutions can be characterized by the normal equation:

Proposition 2.4. Let $y \in D(T^\dagger)$. Then $x \in X$ is a least-squares solution of (2.1) if and only if it is a solution of the normal equation

$T^*T x = T^* y$,    (2.4)

where $T^*$ is the adjoint of $T$.

Proof. See [7].

2.1 Compact Operators

Let us now consider compact operators, an important class of operators that lead to ill-posed problems. They are of interest because, under suitable assumptions, integral operators are compact, and many problems can be formulated as integral equations. In the following, $K$ always denotes a compact operator.

Definition 2.4. An operator $K \in L(X, Y)$ is compact if for all bounded subsets $B$ of $X$ the closure $\overline{K(B)}$ is a compact set.

A self-adjoint compact linear operator can be represented by its eigensystem, which will help in introducing regularization methods. In the following, $\langle \cdot, \cdot \rangle$ denotes the inner product of the respective Hilbert spaces $X$ and $Y$. By taking all non-zero eigenvalues $\lambda_n$ of $K$ and a corresponding complete system of eigenvectors $v_n$, we obtain the following representation for all $x \in X$:

$Kx = \sum_{n=1}^\infty \lambda_n \langle x, v_n\rangle v_n$.    (2.5)

If $K$ is not self-adjoint, we can find a decomposition by its singular system $(\sigma_n; v_n, u_n)$, which is defined as follows. The operator $K^*K$ is self-adjoint, since

$\langle K^*Kx, y\rangle = \langle Kx, Ky\rangle = \langle x, K^*Ky\rangle$.

Therefore, we can decompose w.r.t. the complete eigensystem $(\lambda_n, v_n)$ of $K^*K$:

$K^*Kx = \sum_{n=1}^\infty \lambda_n \langle x, v_n\rangle v_n$.    (2.6)

Due to

$\langle K^*Kx, x\rangle = \langle Kx, Kx\rangle = \|Kx\|^2 \ge 0$,

we have that $K^*K$ is positive semi-definite. Thus, all the eigenvalues $\lambda_n$ are non-negative and we can define $\sigma_n$ and $u_n$ such that

$\sigma_n := +\sqrt{\lambda_n}$,    (2.7)

$K v_n = \sigma_n u_n$.    (2.8)

Applying $K^*$ to (2.8) and using (2.6), we get $\sigma_n^2 v_n = K^* \sigma_n u_n$, i.e. $\sigma_n v_n = K^* u_n$, since a possible zero eigenvalue $\lambda_n$ is not used for the decomposition. By applying the operator $K$ to the last equation we get

$K \sigma_n v_n = K K^* u_n$,

or equivalently

$\sigma_n^2 u_n = K K^* u_n$,

which means that $u_n$ is an eigenvector of $KK^*$ for the eigenvalue $\sigma_n^2$. Moreover, it is easy to show that $(u_n)_{n\in\mathbb{N}}$ is an orthonormal system:

$\langle u_n, u_m\rangle = \frac{1}{\sigma_n \sigma_m}\langle K v_n, K v_m\rangle = \frac{1}{\sigma_n \sigma_m}\langle K^*K v_n, v_m\rangle = \frac{\sigma_n}{\sigma_m}\langle v_n, v_m\rangle = \delta_{nm}$.

For the proof of the next proposition we need two fundamental theorems of functional analysis, which can be found e.g. in [9]:

Theorem 2.1. Let $K : X \to Y$ be a compact operator and $K^*$ its adjoint. Then

$N(K) = R(K^*)^\perp, \qquad N(K^*) = R(K)^\perp$.

Theorem 2.2. Let $H$ be a Hilbert space and let $S$ be a closed subspace of $H$. Then $H$ is given as the direct sum

$H = S \oplus S^\perp$.

We can now state the following

Proposition 2.5. Let $K : X \to Y$ be a compact operator. The following properties hold:

$\overline{R(K^*K)} = \overline{R(K^*)}$,    (2.9)

$\overline{R(KK^*)} = \overline{R(K)}$.    (2.10)

Proof. It is sufficient to show the first equality, since the second one is its immediate consequence. Due to Theorems 2.1 and 2.2 we only need to show

$N(K) = N(K^*K)$.    (2.11)

For $x \in X$ it is obvious that $Kx = 0$ implies $K^*Kx = 0$, i.e. $N(K) \subseteq N(K^*K)$. Now let $x \in N(K^*K)$. Because of $K^*Kx = 0$, we get that $Kx \in N(K^*) = R(K)^\perp$. But since also $Kx \in R(K)$, it follows that $Kx = 0$, i.e. $x \in N(K)$. Hence

$N(K^*K) \subseteq N(K)$,

and therefore (2.11) holds.

From this proposition we conclude that $(u_n)_{n\in\mathbb{N}}$ and $(v_n)_{n\in\mathbb{N}}$ span $\overline{R(K)}$ and $\overline{R(K^*)}$, respectively. Therefore, for $x \in X$ and $y \in Y$ we get the following singular value decomposition:

$Kx = \sum_{n=1}^\infty \sigma_n \langle x, v_n\rangle u_n$,    (2.12)

$K^*y = \sum_{n=1}^\infty \sigma_n \langle y, u_n\rangle v_n$.    (2.13)

If $R(K)$ is finite-dimensional, then $K$ has only finitely many singular values. Otherwise, there is exactly one accumulation point of the singular values, namely 0: $\lim_{n\to\infty} \sigma_n = 0$. The range $R(K)$ is closed if and only if it is finite-dimensional. Together with Proposition 2.2, this yields

Proposition 2.6. Let $K : X \to Y$ be a compact operator. Then the generalized inverse $K^\dagger$ is continuous if and only if $\dim R(K) < \infty$.

Therefore, the generalized inverse of a compact operator with infinite-dimensional range cannot be continuous. This means that the best-approximate solution does not depend continuously on the right-hand side, which makes the equation ill-posed. The next statement is of central importance:

Proposition 2.7. Let $(\sigma_n; v_n, u_n)$ be a singular system for the compact linear operator $K$ and $y \in Y$. Then

$y \in D(K^\dagger) \iff \sum_{n=1}^\infty \frac{|\langle y, u_n\rangle|^2}{\sigma_n^2} < \infty$    (2.14)

and for $y \in D(K^\dagger)$

$K^\dagger y = \sum_{n=1}^\infty \frac{\langle y, u_n\rangle}{\sigma_n}\, v_n$.    (2.15)

Proof. See [7].

Equivalence (2.14) is called the Picard criterion and gives a necessary and sufficient condition for the existence of a best-approximate solution. It states that the coefficients $(\langle y, u_n\rangle)_{n=1}^\infty$ have to decay fast enough relative to the singular values $\sigma_n$. If such a solution exists, then equation (2.15) yields a formula for computing it. Note that error components corresponding to small singular values can be drastically amplified. If $\dim R(K) < \infty$, there are only finitely many singular values, and the amplification is therefore bounded; but it can still be unacceptably large.

In order to introduce regularization operators we need the notion of a function of a self-adjoint operator. Recall that if $(\sigma_n; v_n, u_n)$ is a singular system for the compact operator $K : X \to Y$, we get for all $x \in X$

$K^*Kx = \sum_{n=1}^\infty \sigma_n^2 \langle x, v_n\rangle v_n$,    (2.16)

since $(\sigma_n^2; v_n)$ is an eigensystem of $K^*K$. For $\lambda \in \mathbb{R}$, $x \in X$ and $P$ the orthogonal projector onto $N(K^*K)$, we define

$E_\lambda x := \sum_{n=1,\ \sigma_n^2 < \lambda}^\infty \langle x, v_n\rangle v_n \ (+\, Px)$,    (2.17)

where the component $Px$ appears only for $\lambda > 0$. The operator $E_\lambda$ is an orthogonal projector onto

$X_\lambda := \mathrm{span}\{v_n : n \in \mathbb{N},\ \sigma_n^2 < \lambda\} \ (+\, N(K^*K), \text{ if } \lambda > 0)$.

Obviously, $E_\lambda = 0$ for $\lambda \le 0$. For $\lambda > \sigma_1^2$ we have $E_\lambda = I$, since $X_\lambda = \overline{R(K^*K)} + N(K^*K) = X$. We can show a monotonicity property of the spectral family $E_\lambda$: for all $\lambda \le \mu$ the following holds:

$\langle E_\lambda x, x\rangle = \sum_{\sigma_n^2 < \lambda} |\langle x, v_n\rangle|^2 \ (+\, \|Px\|^2) \le \sum_{\sigma_n^2 < \mu} |\langle x, v_n\rangle|^2 \ (+\, \|Px\|^2) = \langle E_\mu x, x\rangle$.

Additionally, $E_\lambda$ is piecewise constant with jumps at $\lambda = \sigma_n^2$ (and at $\lambda = 0$ if and only if $N(K^*K) \neq \{0\}$) of magnitude

$\sum_{\sigma_n^2 = \lambda} \langle \cdot, v_n\rangle v_n$.

Recall that the integral w.r.t. a piecewise constant weight function is defined as the sum over all function values at the jumps of the integrand multiplied by the heights of these jumps. Without going into the details of measure theory, we state that the following representation is justified (note that the monotonicity property shown above is crucial):

$K^*Kx = \sum_{n=1}^\infty \sigma_n^2 \langle x, v_n\rangle v_n = \int_{\mathbb{R}^+} \lambda\, dE_\lambda x$.    (2.18)

Moreover, one can define a piecewise continuous function $f$ of a self-adjoint (not necessarily compact) operator as

$f(T^*T)x := \int_0^\infty f(\lambda)\, dE_\lambda x$

and the norm of its evaluation by

$\|f(T^*T)x\|^2 := \int_0^\infty |f(\lambda)|^2\, d\|E_\lambda x\|^2$.

For compact operators this reduces to

$f(K^*K)x := \sum_{n=1}^\infty f(\sigma_n^2) \langle x, v_n\rangle v_n \qquad\text{and}\qquad \|f(K^*K)x\|^2 := \sum_{n=1}^\infty f^2(\sigma_n^2) |\langle x, v_n\rangle|^2$.
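In finite dimensions, these spectral formulas can be evaluated directly from an SVD. A short MATLAB check (ours, not from the thesis) for a Tikhonov-type filter function f, assuming K has full column rank so that N(K'K) = {0}:

    K = randn(5,3); x = randn(3,1);
    f = @(lambda) 1./(lambda + 0.1);        % example filter f(lambda)
    [~,S,V] = svd(K, 'econ');
    lam = diag(S).^2;                       % eigenvalues sigma_n^2 of K'*K
    y1  = V*( f(lam).*(V'*x) );             % f(K'K)x via the singular system
    y2  = (K'*K + 0.1*eye(3)) \ x;          % closed form for this particular f
    norm(y1 - y2)                           % agrees up to rounding error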

2.2 Regularization Operators

Let us go back to our original problem. For a linear bounded operator $T : X \to Y$ and a given right-hand side $y \in Y$, we want to find a solution $x \in X$ such that

$Tx = y$.    (2.19)

We have introduced the notion of a generalized inverse $T^\dagger$. If $x^\dagger$ exists, the best-approximate solution is given as

$x^\dagger = T^\dagger y$.    (2.20)

In most applications, the right-hand side $y$ is not given exactly. Instead, we are only given an approximation $y^\delta$ and a bound $\delta$ on the noise level such that

$\|y - y^\delta\| \le \delta$    (2.21)

is guaranteed. In the ill-posed case $T^\dagger$ is not continuous, and thus $T^\dagger y^\delta$ is not necessarily a good approximation of $T^\dagger y$, even if it exists. Therefore, we introduce the notion of regularization, which, roughly speaking, means approximating an ill-posed problem by a family of well-posed problems. The aim is to find an approximation of $x^\dagger$ that, on the one hand, depends continuously on $y^\delta$, so that it can be computed in a stable way, and that, on the other hand, tends to $x^\dagger$ as the noise level $\delta$ approaches zero. We do not want to determine this approximation only for a specific right-hand side, but rather approximate the unbounded operator $T^\dagger$ by a family of continuous parameter-dependent operators $\{R_\alpha\}$ such that, for an appropriate choice of $\alpha = \alpha(\delta, y^\delta)$,

$x_\alpha^\delta := R_\alpha y^\delta$

tends to $x^\dagger$ for $\delta \to 0$.

Definition 2.5. Let $T : X \to Y$ be a bounded linear operator between Hilbert spaces and $\alpha_0 \in (0, \infty]$. For every $\alpha \in (0, \alpha_0)$, let $R_\alpha : Y \to X$ be a continuous operator. The family $\{R_\alpha\}$ is called a regularization for $T^\dagger$ if for all $y \in D(T^\dagger)$ there exists a parameter choice rule $\alpha = \alpha(\delta, y^\delta)$ such that the following holds:

$\lim_{\delta \to 0} \sup\{\, \|R_{\alpha(\delta, y^\delta)}\, y^\delta - T^\dagger y\| : y^\delta \in Y,\ \|y - y^\delta\| \le \delta \,\} = 0$.    (2.22)

The parameter choice rule $\alpha : \mathbb{R}^+ \times Y \to (0, \alpha_0)$ has to fulfill

$\lim_{\delta \to 0} \sup\{\, \alpha(\delta, y^\delta) : y^\delta \in Y,\ \|y - y^\delta\| \le \delta \,\} = 0$.    (2.23)

Therefore, a regularization method has two components: a regularization operator and a parameter choice rule. If $\alpha$ depends only on $\delta$, we call it an a-priori parameter choice rule, otherwise an a-posteriori one. However, due to a theorem by Bakushinskii, $\alpha$ cannot depend on $y^\delta$ only: the theorem states that in this case, convergence of the regularization method implies the boundedness of $T^\dagger$, i.e. the well-posedness of the problem. Thus, for the regularization of an ill-posed problem, $\alpha$ cannot be chosen independently of $\delta$.

The questions that arise are how to construct a family of regularization operators and how to choose parameter choice rules that yield convergence. The following proposition answers the first question for the case of linear operator equations.

Proposition 2.8. Let $R_\alpha$ be a continuous operator for all $\alpha > 0$. Then the family $\{R_\alpha\}$ is a regularization of $T^\dagger$ if $R_\alpha \to T^\dagger$ pointwise on $D(T^\dagger)$ as $\alpha \to 0$. In this case, for all $y \in D(T^\dagger)$ there exists an a-priori parameter choice rule $\alpha(\delta)$ such that $(R_\alpha, \alpha)$ is a convergent regularization method for $Tx = y$.

Proof. See [7].

Thus, we need to construct the regularization operators $R_\alpha$ such that they converge pointwise towards $T^\dagger$. We now discuss possible ways of constructing the regularization operators in the case that $T$ is linear. One can extend the notion of the spectral family to self-adjoint bounded, but not necessarily compact, operators. Now, let $\{E_\lambda\}$ be the spectral family of $T^*T$. If $T^*T$ is continuously invertible, then

$(T^*T)^{-1} = \int \frac{1}{\lambda}\, dE_\lambda$.

By Proposition 2.4 we get for the best-approximate solution

$x^\dagger = \int_{\mathbb{R}^+} \frac{1}{\lambda}\, dE_\lambda T^* y$.    (2.24)

If $R(T)$ is non-closed, i.e. in the case of ill-posedness, the eigenvalues $\{\lambda\}$ accumulate at 0, which means that the above integral has a pole at 0.

The crucial idea of regularization is to replace $1/\lambda$ by a family of functions $\{g_\alpha(\lambda)\}$ that have to fulfill some continuity conditions. We can now replace (2.24) by

$x_\alpha := \int_{\mathbb{R}^+} g_\alpha(\lambda)\, dE_\lambda T^* y$

and define the family of regularization operators according to

$R_\alpha := \int_{\mathbb{R}^+} g_\alpha(\lambda)\, dE_\lambda T^*$.    (2.25)

The following proposition states under which assumptions on $\{g_\alpha\}$ convergence can be guaranteed:

Proposition 2.9. For all $\alpha > 0$, let $g_\alpha : [0, \|T\|^2] \to \mathbb{R}$ be piecewise continuous and constructed in such a way that for all $\lambda \in (0, \|T\|^2]$

$\lambda |g_\alpha(\lambda)| \le C$

and

$\lim_{\alpha \to 0} g_\alpha(\lambda) = \frac{1}{\lambda}$.

Then we have that for all $y \in D(T^\dagger)$

$\lim_{\alpha \to 0} x_\alpha = x^\dagger$

holds.

Proof. See [7].

Since the operator $R_\alpha$ is continuous, $\|y - y^\delta\| \le \delta$ implies the boundedness of the error between $x_\alpha$ and

$x_\alpha^\delta := \int_{\mathbb{R}^+} g_\alpha(\lambda)\, dE_\lambda T^* y^\delta$.

We define

$r_\alpha(\lambda) := 1 - \lambda g_\alpha(\lambda)$

and state the following convergence result for a-priori parameter choice rules:

Proposition 2.10. Let $g_\alpha$ fulfill the assumptions of Proposition 2.9 and assume that $\mu > 0$. Furthermore, let for all $\alpha \in (0, \alpha_0)$ and $\lambda \in [0, \|T\|^2]$ and some $c_\mu > 0$

$\lambda^\mu |r_\alpha(\lambda)| \le c_\mu \alpha^\mu$    (2.26)

hold. If

$G_\alpha := \sup\{\, |g_\alpha(\lambda)| : \lambda \in [0, \|T\|^2] \,\} = O(\alpha^{-1})$ as $\alpha \to 0$

and the so-called source condition $x^\dagger \in R((T^*T)^\mu)$ is satisfied, then the parameter choice rule $\alpha \sim \delta^{\frac{2}{2\mu+1}}$ yields

$\|x_\alpha^\delta - x^\dagger\| = O\big(\delta^{\frac{2\mu}{2\mu+1}}\big)$.

Proof. See [7].

Source conditions usually imply smoothness and boundary conditions on the exact solution. An example for how to choose $\{g_\alpha\}$ is

$g_\alpha(\lambda) := \begin{cases} 1/\lambda & \lambda \ge \alpha, \\ 0 & \lambda < \alpha. \end{cases}$

This method is called truncated singular value expansion. The assumptions of the previous propositions hold with $C = 1$, $c_\mu = 1$, arbitrary $\mu > 0$ and $G_\alpha = 1/\alpha$.

In general, determining $\mu > 0$ such that the source condition holds is not possible. Therefore, we briefly introduce a-posteriori parameter choice rules. The most common a-posteriori choice rule is Morozov's discrepancy principle, which can be formulated as follows: for $g_\alpha$ fulfilling the same assumptions as in Proposition 2.9 and a constant $\tau$ chosen according to

$\tau > \sup\{\, |r_\alpha(\lambda)| : \alpha > 0,\ \lambda \in [0, \|T\|^2] \,\}$,

the regularization parameter defined by the discrepancy principle is

$\alpha(\delta, y^\delta) := \sup\{\, \alpha > 0 : \|T x_\alpha^\delta - y^\delta\| \le \tau\delta \,\}$.    (2.27)

Remark 2.1. The underlying idea of the discrepancy principle is the fact that, since the right-hand side of (2.19) is only known up to a noise level $\delta$, it does not make sense to search for an approximate solution $\tilde{x}$ with a residual $\|T\tilde{x} - y^\delta\| < \delta$. We should only ask for a solution such that the residual is of the order of $\delta$. In addition, a smaller regularization parameter implies less stability, which is why we take the largest possible value for $\alpha$. This is what the discrepancy principle does.
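For matrices, the truncated singular value expansion introduced above takes only a few lines of MATLAB. A sketch (ours, not from the thesis), with alpha as the truncation parameter:

    function x = tsve(A, y, alpha)
    % Truncated singular value expansion: g_alpha(lambda) = 1/lambda for
    % lambda >= alpha and 0 otherwise, applied with lambda = sigma_n^2.
        [U,S,V] = svd(A, 'econ');
        s = diag(S);
        keep = (s.^2 >= alpha);                  % drop small singular values
        coef = zeros(size(s));
        coef(keep) = (U(:,keep)'*y) ./ s(keep);  % <y,u_n>/sigma_n
        x = V*coef;
    end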

27 Proposition The regularization method (R α, α) where α is defined by (2.27) is convergent for all y R(T). Moreover, we have x δ α(δ,y δ ) 2µ x = O(δ 2µ+1 ) (2.28) for all µ (0, µ 0 1/2]. Here, µ 0 denotes the largest number µ for which (2.26) holds. Proof. See [7], Thm Remark 2.2. It is sufficient, that the parameter choice rule α(δ, y δ ) satisfies Tx δ α yδ τδ Tx δ β yδ for some β with α β 2β. This is crucial for the numerical realization of the discrepancy principle. Finally, we introduce three regularization methods. The most commonly used is Tikhonov regularization for which g α (λ) := 1 λ + α. Due to the definition of x δ α and since {λ+α} are the eigenvalues of T T +αi, we have i.e. x δ α = λ R + g α (λ)de λ T y δ = (T T + αi) 1 T y δ, (2.29) (T T + αi)x δ α = T y δ, which can be regarded as a regularized form of the normal equation. By applying Tikhonov regularization to a compact operator K with singular system (σ n ; v n, u n ), we get x δ α = n=1 σ n σ 2 n + α yδ, u n v n. Compared to the original singular value expansion, the factor 1 σ n is now replaced by σn which is bounded for n. σn 2 +α Proposition Let x δ α be defined as in (2.29). Then it is the unique minimizer of the Tikhonov functional x Tx y δ 2 + α x 2. 27

Proof. See [7].

This proposition clearly shows what regularization does: one tries to find a solution that, on the one hand, minimizes the residual as far as possible and, on the other hand, enforces stability by introducing the penalty term $\|x\|^2$. The factor $\alpha$ ensures that the influence of the second term tends to 0 as the noise vanishes.

Corollary 1. If $\alpha(\delta, y^\delta)$ is chosen according to the discrepancy principle, Tikhonov regularization converges and yields (2.28).

Proof. This is an immediate consequence of Proposition 2.11.

Another widespread regularization method is the so-called Landweber method. It uses only discrete values for $\alpha$ and is therefore an iterative method. Here, the family of functions approximating $1/\lambda$ is defined by

$g_k(\lambda) := \frac{1 - (1 - \lambda)^k}{\lambda}, \qquad k \in \mathbb{N}$.

Finally, we mention a version of the conjugate gradient method. The CG algorithm is a very efficient solver for self-adjoint positive (semi-)definite well-posed linear equations. In the case of an ill-posed equation $Tx = y^\delta$, we apply the CG method to the corresponding normal equation $T^*Tx = T^* y^\delta$ and call this the CGNE method (conjugate gradients for the normal equation). Unlike Landweber iteration, the CGNE method is not based on a fixed sequence of polynomials $\{g_k\}$ and $\{r_k\}$, since these polynomials now depend on the given right-hand side. This ensures higher flexibility; the drawback, however, is that $\{x_k^\delta\}$ depends non-linearly on the data $y^\delta$.

Proposition 2.13. If $k(\delta, y^\delta)$ is chosen according to the discrepancy principle, both CGNE and Landweber iteration converge and yield (2.28).

Proof. See [7].

Remark 2.3. For the CGNE method, in the non-attainable case $y \in D(T^\dagger) \setminus R(T)$, $y^\delta$ needs to be replaced by $Qy^\delta$ in (2.27). Since

$T^*Tx = T^*y \iff Tx = Qy$,    (2.30)

it is sufficient to replace $\|T x_\alpha^\delta - y^\delta\|$ by $\|T^*T x_\alpha^\delta - T^* y^\delta\|$ in (2.27).

Proof of (2.30). By Proposition 2.4, $T^*Tx = T^*y$ holds if and only if $x$ is a least-squares solution of $Tx = y$, which by Propositions 2.1 and 2.3 is equivalent to

$Tx = TT^\dagger y$,    (2.31)

and, since $TT^\dagger = Q$ on $D(T^\dagger)$,

$Qy = Tx$.    (2.32)
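Algorithm 1 below states the CGNE iteration in pseudocode; as a complement, here is a minimal MATLAB transcription for the matrix case (ours; it starts from x = 0 and uses the stopping rule of Remark 2.3):

    function x = cgne(T, y, tau, delta)
    % CG applied to the normal equation T'*T*x = T'*y, stopped by the
    % discrepancy principle.
        x = zeros(size(T,2), 1);
        d = y - T*x;
        r = T'*d;                 % r = T'(y - Tx), so norm(r) = ||T'Tx - T'y||
        p = r;
        while norm(r) > tau*delta
            q     = T*p;
            alpha = norm(r)^2/norm(q)^2;
            x     = x + alpha*p;
            d     = d - alpha*q;
            rnew  = T'*d;
            beta  = norm(rnew)^2/norm(r)^2;
            p     = rnew + beta*p;
            r     = rnew;
        end
    end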

Algorithm 1 CGNE
  $x_0^\delta = \bar{x}$, $d_0 = y^\delta - T x_0^\delta$, $p_1 = r_0 = T^* d_0$, $k = 1$
  while $\|T^*T x_{k-1}^\delta - T^* y^\delta\| > \tau\delta$ do
    $q_k = T p_k$
    $\alpha_k = \|r_{k-1}\|^2 / \|q_k\|^2$
    $x_k^\delta = x_{k-1}^\delta + \alpha_k p_k$
    $d_k = d_{k-1} - \alpha_k q_k$
    $r_k = T^* d_k$
    $\beta_k = \|r_k\|^2 / \|r_{k-1}\|^2$
    $p_{k+1} = r_k + \beta_k p_k$
    $k = k + 1$
  end while

2.3 CGNE for the Bilinear Ansatz

After this brief introduction to ill-posed problems, we return to our original problem (1.9), which is ill-conditioned, as discussed in Section 1.4. It is therefore necessary to apply regularization methods in order to get better results. We try to reconstruct the phase screen $\phi_{atm}$ shown in Figure 2.1, which has a resolution of $256 \times 256$, by applying the CGNE method. For the time being, we are only interested in reconstructing $\phi_{atm}$ from given slope measurements $s$, i.e. we do not consider a telescope with a given number of subapertures. We compute the corresponding Poke matrix $P$ as if the number of subapertures were equal to the resolution of $\phi_{atm}$, neglecting the pupil mask. What needs to be done is to evaluate the operator that maps $\phi_{atm}$ to the slope measurements $s = [s_x, s_y]$. Since we assume the Fried geometry, the components of $s$ are given by

$(s_x)_i = \int_{\Omega_i} \frac{\partial \phi_{atm}}{\partial x}\, d(x, y)$,    (2.33)

$(s_y)_i = \int_{\Omega_i} \frac{\partial \phi_{atm}}{\partial y}\, d(x, y)$.    (2.34)

By applying the theorem of Fubini and the midpoint rule for approximating

Figure 2.1: Original phase screen.

the integrals, we get

$(s_x)_i = \int_{y_{i_2}}^{y_{i_2+1}} \big[\phi_{atm}(x_{i_1+1}, y) - \phi_{atm}(x_{i_1}, y)\big]\, dy \approx \frac{\Delta y_{i_2}}{2} \big[\phi_{atm}(x_{i_1+1}, y_{i_2+1}) - \phi_{atm}(x_{i_1}, y_{i_2+1}) + \phi_{atm}(x_{i_1+1}, y_{i_2}) - \phi_{atm}(x_{i_1}, y_{i_2})\big]$,

$(s_y)_i = \int_{x_{i_1}}^{x_{i_1+1}} \big[\phi_{atm}(x, y_{i_2+1}) - \phi_{atm}(x, y_{i_2})\big]\, dx \approx \frac{\Delta x_{i_1}}{2} \big[\phi_{atm}(x_{i_1+1}, y_{i_2+1}) - \phi_{atm}(x_{i_1+1}, y_{i_2}) + \phi_{atm}(x_{i_1}, y_{i_2+1}) - \phi_{atm}(x_{i_1}, y_{i_2})\big]$,

with $\Delta x_{i_1} := x_{i_1+1} - x_{i_1}$ and $\Delta y_{i_2} := y_{i_2+1} - y_{i_2}$. We then generate $s^\delta$ by adding to $s$ a random noise vector with normally distributed components. The noisy data should satisfy

$\|s - s^\delta\|_{rms} \le \delta$

for a given noise level $\delta$. Here, the norm of a vector $s$ of length $m$ is defined as

$\|s\|_{rms} := \Big(m^{-1} \sum_i s_i^2\Big)^{1/2}$.
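Figures 2.2 and 2.3 quote δ as a percentage, which suggests a noise level relative to $\|s\|_{rms}$. Under that assumption, such data can be generated as in the following sketch (ours; the variables s and delta_rel are assumptions):

    rms = @(v) sqrt(mean(v.^2));
    eta = randn(size(s));                      % i.i.d. Gaussian noise
    eta = eta/rms(eta) * delta_rel*rms(s);     % scale to the desired level
    s_delta = s + eta;                         % rms(s - s_delta) = delta_rel*rms(s)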

Figure 2.2: Reconstructed phase screens for different noise levels ($\delta = 0\%$, $2.5\%$, $5\%$ and $10\%$), $\tau = 1.2$.

Then, we apply the CGNE algorithm with the discrepancy principle to $Pa = s^\delta$ in order to determine the point-wise values $a_j$. The reconstructed wavefront is then given by

$\phi_{rec}^\delta = \sum_j a_j h_j$.

Figure 2.2 shows the reconstructed phase screens for different choices of $\delta$. The crucial point is that the error $\|\phi_{atm} - \phi_{rec}^\delta\|$ tends to zero for $\delta \to 0$ (see Fig. 2.3).

Figure 2.3: Relative error in $\phi$ w.r.t. noise level $\delta$, $\tau = 1.2$ (y-axis: $\|\phi - \phi_{rec}\|/\|\phi\|$; x-axis: $\|s - s^\delta\|/\|s\|$).

3 Sparse Reconstruction

If it is known that the desired solution of $Tx = y$ is likely to be sparse in some basis $\{\varphi_\gamma : \gamma \in \Gamma\}$ of $X$, one could consider regularization methods that yield solutions that are sparse w.r.t. $\{\varphi_\gamma : \gamma \in \Gamma\}$. Therefore, we discuss an additional regularization method for linear inverse problems that promotes sparsity. The underlying idea is to replace the quadratic penalty term in the Tikhonov functional of Proposition 2.12 by a weighted $\ell^p$-norm of the coefficients of $x$ w.r.t. an orthonormal basis $\{\varphi_\gamma : \gamma \in \Gamma\}$ of $X$. Thus, we aim at minimizing

$\Phi_{w,p}(x) = \|Tx - y\|^2 + \sum_{\gamma \in \Gamma} w_\gamma |\langle x, \varphi_\gamma\rangle|^p$,    (3.1)

where $p \in [1, 2]$ and $w = (w_\gamma)_{\gamma \in \Gamma}$ is a sequence of strictly positive weights. For the choice $w \equiv 1$, the penalty term is the ordinary $\ell^p$-norm of the coefficients, raised to the $p$-th power. Another possible choice leads to a penalty term that is equivalent to a Besov norm (see Section 3.3.1). Keeping the weights fixed and decreasing $p$ from 2 to 1 increases the penalization of coefficients that are smaller than 1 and decreases the penalization of those that are larger than 1. Thus, the more we decrease $p$, the more likely we are to obtain a generalized solution that has a sparse expansion w.r.t. $\{\varphi_\gamma : \gamma \in \Gamma\}$.

3.1 An Iterative Soft-Thresholding Algorithm

The following algorithm was introduced in [6]. The minimization of the functional in (3.1) can be rewritten in a variational formulation. Since $\{\varphi_\gamma : \gamma \in \Gamma\}$ is an orthonormal basis, we get

$\Phi_{w,p}(x) = \langle x, T^*Tx\rangle - 2\langle x, T^*y\rangle + \langle y, y\rangle + \sum_{\gamma \in \Gamma} w_\gamma |x_\gamma|^p = \sum_{\gamma \in \Gamma} x_\gamma \langle T^*Tx, \varphi_\gamma\rangle - 2 \sum_{\gamma \in \Gamma} x_\gamma \langle T^*y, \varphi_\gamma\rangle + \langle y, y\rangle + \sum_{\gamma \in \Gamma} w_\gamma |x_\gamma|^p$,

where $x_\gamma$ is a shortcut for $\langle x, \varphi_\gamma\rangle$. Differentiating w.r.t. $x$ yields

$2 \sum_{\gamma \in \Gamma} \langle T^*Tx, \varphi_\gamma\rangle \varphi_\gamma - 2 \sum_{\gamma \in \Gamma} \langle T^*y, \varphi_\gamma\rangle \varphi_\gamma + \sum_{\gamma \in \Gamma} w_\gamma\, p\, |\langle x, \varphi_\gamma\rangle|^{p-1} \mathrm{sign}(\langle x, \varphi_\gamma\rangle)\, \varphi_\gamma = 0$,

which implies that each coefficient is zero, i.e. for all $\gamma \in \Gamma$

$\langle T^*Tx, \varphi_\gamma\rangle - \langle T^*y, \varphi_\gamma\rangle + \frac{w_\gamma p}{2} |\langle x, \varphi_\gamma\rangle|^{p-1} \mathrm{sign}(\langle x, \varphi_\gamma\rangle) = 0$.    (3.2)

Here, we implicitly set the derivative of the absolute value to zero at the origin. Both the coupling of the equations through $T^*Tx$ and the nonlinearity of the equations make the above system hard to solve. Therefore, one introduces a surrogate functional that has nicer properties and minimizes it instead of $\Phi_{w,p}(x)$. As we will state later, the surrogate functional introduced below approximates $\Phi_{w,p}$. The new surrogate functional is constructed by adding an additional functional to $\Phi_{w,p}(x)$:

$\Phi_{w,p}^{SUR}(x; a) := \Phi_{w,p}(x) - \|Tx - Ta\|^2 + C\|x - a\|^2$
$= \|Tx - y\|^2 + \sum_{\gamma \in \Gamma} w_\gamma |x_\gamma|^p - \|Tx - Ta\|^2 + C\|x - a\|^2$
$= C\|x\|^2 - 2\langle x, T^*y - T^*Ta + Ca\rangle + \sum_{\gamma \in \Gamma} w_\gamma |x_\gamma|^p + \|y\|^2 - \|Ta\|^2 + C\|a\|^2$
$= \sum_{\gamma \in \Gamma} \big( C x_\gamma^2 - 2 x_\gamma (Ca + T^*y - T^*Ta)_\gamma + w_\gamma |x_\gamma|^p \big) + \|y\|^2 - \|Ta\|^2 + C\|a\|^2$,    (3.3)

where $C$ is a constant fulfilling $\|T^*T\| < C$. Note that instead of introducing $C$ we could also rescale the equation such that $\|T\| < 1$. Since $\Phi_{w,p}(x)$ and $\Psi(x; a) := C\|x - a\|^2 - \|Tx - Ta\|^2$ are both strictly convex in $x$ (for any $1 \le p \le 2$ and any $a$), $\Phi_{w,p}^{SUR}(x; a)$ is also strictly convex in $x$ and therefore has a unique minimizer for any $a$. The main advantage of this surrogate functional is that the variational equations for the $x_\gamma$ are no longer coupled. We can now define an iterative algorithm that will lead us to a minimizer of the original functional $\Phi_{w,p}(x)$:

$x^0$ arbitrary; $\qquad x^n = \arg\min_{x \in X} \Phi_{w,p}^{SUR}(x; x^{n-1}), \quad n \in \mathbb{N}$.

Thus, we first determine the minimizer $x^1$ of the surrogate functional with $a = x^0$ and then, in each iteration, minimize the functional for $a = x^{n-1}$. Let us now consider the case $p = 1$, which is most likely to yield sparse solutions. Then, the summand in (3.3) is differentiable in $x_\gamma$ only for $x_\gamma \neq 0$. In this case, we end up with the following variational equation:

$2C x_\gamma - 2\big(Ca + T^*(y - Ta)\big)_\gamma + w_\gamma\, \mathrm{sign}(x_\gamma) = 0$.

Thus, for $x_\gamma > 0$ we get

$x_\gamma = a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma - \frac{w_\gamma}{2C}$,

which is only valid for

$a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma > \frac{w_\gamma}{2C}$.    (3.4)

For $x_\gamma < 0$, we similarly get

$x_\gamma = a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma + \frac{w_\gamma}{2C}$,

which is true only if

$a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma < -\frac{w_\gamma}{2C}$.    (3.5)

If neither (3.4) nor (3.5) holds, we set $x_\gamma = 0$. Let $S_{w,1} : \mathbb{R} \to \mathbb{R}$ be the function defined as

$S_{w,1}(t) := \begin{cases} t - w/2 & \text{if } t \ge w/2, \\ 0 & \text{if } |t| < w/2, \\ t + w/2 & \text{if } t \le -w/2. \end{cases}$

Then

$x_\gamma = S_{w_\gamma/C,\,1}\Big( a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma \Big)$.

Therefore, for $p = 1$ the iterative method reads as follows: for all $\gamma \in \Gamma$,

$x_\gamma^n = S_{w_\gamma/C,\,1}\Big( x_\gamma^{n-1} + \frac{1}{C}\big(T^*(y - Tx^{n-1})\big)_\gamma \Big)$.

If $p > 1$, the summand in (3.3) is differentiable in $x_\gamma$, and minimization reduces to solving the variational equation

$2C x_\gamma - 2\big(Ca + T^*(y - Ta)\big)_\gamma + p\, w_\gamma\, \mathrm{sign}(x_\gamma)\, |x_\gamma|^{p-1} = 0$.    (3.6)

Since for any $w \ge 0$ and $p > 1$ the real function $F_{w,p}(x) = x + \frac{wp}{2}\,\mathrm{sign}(x)|x|^{p-1}$ is bijective on $\mathbb{R}$, we can define

$S_{w,p} := (F_{w,p})^{-1}$

and can again find the minimizer of (3.3) via

$x_\gamma = S_{w_\gamma,\,p}\Big( a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma \Big)$.

Unlike for $p = 1$, there is no explicit formula for $S_{w,p}$. Thus, when implementing this method for $p > 1$, we need an algorithm to solve the non-linear equation (3.6), which is discussed in Section 3.3.2.

We now want to turn to convergence results for this method. The following proposition states the existence of a unique minimizer of the surrogate functional:

Proposition 3.1. Let $T$ be an operator mapping from a Hilbert space $X$ to another Hilbert space $Y$, assume $\|T^*T\| < 1$ and $y \in Y$. Additionally, suppose that $\{\varphi_\gamma\}_{\gamma \in \Gamma}$ is an orthonormal basis of $X$ and that $w = \{w_\gamma\}_{\gamma \in \Gamma}$ is a sequence of strictly positive elements. Then, for arbitrarily chosen $a \in X$,

$\Phi_{w,p}^{SUR}(x; a) = \|Tx - y\|^2 + \sum_{\gamma \in \Gamma} w_\gamma |x_\gamma|^p - \|Tx - Ta\|^2 + \|x - a\|^2$

has a unique minimizer in $X$, which is given by

$x = \sum_{\gamma \in \Gamma} S_{w_\gamma,\,p}\big( a_\gamma + (T^*(y - Ta))_\gamma \big)\, \varphi_\gamma$.

Proof. See [6].

The next proposition guarantees that the iterates of the algorithm using surrogate functionals converge to a minimizer of the original functional $\Phi_{w,p}$:

Proposition 3.2. Assume that the conditions of the previous statement hold. If, in addition, the sequence $w = \{w_\gamma\}$ is uniformly bounded from below by a constant $c > 0$, i.e. if $w_\gamma \ge c$ for all $\gamma \in \Gamma$, then for any $x^0 \in X$ the iterates

$x^n = \sum_{\gamma \in \Gamma} S_{w_\gamma,\,p}\big( x_\gamma^{n-1} + (T^*(y - Tx^{n-1}))_\gamma \big)\, \varphi_\gamma$

converge strongly to a minimizer of

$\Phi_{w,p}(x) = \|Tx - y\|^2 + \sum_{\gamma \in \Gamma} w_\gamma |\langle x, \varphi_\gamma\rangle|^p$.    (3.7)

Proof. See [6].
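Propositions 3.1 and 3.2 translate directly into code. A minimal MATLAB sketch of the iteration for $p = 1$ (ours, not the thesis implementation; T is a matrix with $\|T\| < 1$ representing the operator in the chosen orthonormal basis, and w is the vector of weights):

    soft = @(t, thr) sign(t).*max(abs(t) - thr, 0);   % S_{w,1}, threshold w/2
    x = zeros(size(T,2), 1);
    for it = 1:500                      % fixed iteration count for simplicity
        x = soft(x + T'*(y - T*x), w/2);
    end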

So far, it is not clear whether minimizing (3.7) is related to solving $Tx = y$. The question is how the penalized functional (3.7) can lead to a regularization method for the original problem. First of all, we need to introduce an additional parameter $\alpha$ in order to be able to vary the weight of the penalty term. Thus, we consider the functional

$\Phi_{\alpha,w,p}(x) = \|Tx - y\|^2 + \alpha \sum_{\gamma \in \Gamma} w_\gamma |\langle x, \varphi_\gamma\rangle|^p$.

Due to Definition 2.5, the family of operators mapping $y^\delta$ to the minimizers of $\Phi_{\alpha,w,p}$ is a regularization for $T^\dagger$ if for all $y \in D(T^\dagger)$ there exists a parameter choice rule $\alpha = \alpha(\delta, y^\delta)$ such that

$\lim_{\delta \to 0} \sup\{\, \|x_{\alpha(\delta,y^\delta),w,p;y^\delta} - T^\dagger y\| : y^\delta \in Y,\ \|y - y^\delta\| \le \delta \,\} = 0$.

The parameter choice rule has to fulfill

$\lim_{\delta \to 0} \sup\{\, \alpha(\delta, y^\delta) : y^\delta \in Y,\ \|y - y^\delta\| \le \delta \,\} = 0$.

The following proposition states that, with an additional requirement on $\alpha(\delta, y^\delta)$, the functionals $\{\Phi_{\alpha,w,p}\}$ yield a regularization method for our linear system.

Proposition 3.3. Let $T : X \to Y$ be a bounded operator with $\|T\| < 1$, and assume that $p \in [1, 2]$ and that $w = \{w_\gamma\}_{\gamma \in \Gamma}$ is uniformly bounded from below by $c > 0$. For any $y \in D(T^\dagger)$ and any $\alpha > 0$, define $x_{\alpha,w,p;y}$ to be the minimizer of $\Phi_{\alpha,w,p;y}(x)$. If $\alpha = \alpha(\delta)$ satisfies

$\lim_{\delta \to 0} \alpha(\delta) = 0$    (3.8)

and

$\lim_{\delta \to 0} \frac{\delta^2}{\alpha(\delta)} = 0$,    (3.9)

then we have

$\lim_{\delta \to 0} \sup\{\, \|x_{\alpha(\delta),w,p;y^\delta} - T^\dagger y\| : \|y - y^\delta\| \le \delta \,\} = 0$.

(For instance, $\alpha(\delta) = \delta$ satisfies both (3.8) and (3.9).)

Proof. See [6].

Remark 3.1. As proven in [1], if $\alpha$ is chosen according to the discrepancy principle, it satisfies (3.8) and (3.9) and can thus be used here instead of an a-priori choice rule.

3.2 Wavelets

To apply the algorithm discussed in this chapter, the first issue that has to be clarified is the choice of the orthonormal basis $\{\varphi_\gamma\}$ of $X$. In Section 1.4 we used the basis of bilinear influence functions. This basis is not orthogonal and thus cannot be used for the algorithm based on surrogate functionals. As mentioned in [10], distorted incoming wavefronts tend to have a fractal structure. Hence, wavelets might be a good choice for a basis in which to expand the phase screens. Before proceeding any further, we first give a brief introduction to wavelets. For a detailed discussion of wavelets we refer to [5].

Definition 3.1. Let $\psi \in L^2$ be a function mapping from $\mathbb{R}$ to $\mathbb{R}$. If $\psi$ fulfills the admissibility condition

$C_\psi := 2\pi \int_{\mathbb{R}} |\xi|^{-1} |\hat{\psi}(\xi)|^2\, d\xi < \infty$,    (3.10)

where $\hat{\psi}$ denotes the Fourier transform of $\psi$, it is called a wavelet.

From a wavelet $\psi$ we can generate a family of wavelets according to

$\psi_{a,b}(x) := |a|^{-1/2}\, \psi\Big(\frac{x - b}{a}\Big), \qquad a, b \in \mathbb{R},\ a \neq 0$,    (3.11)

and refer to the function $\psi$ as the mother wavelet. The factor $|a|^{-1/2}$ ensures that the $L^2$-norm is independent of $a$ and $b$, i.e.

$\|\psi_{a,b}\| = \|\psi\|$.    (3.12)

The continuous wavelet transform of a function $f \in L^2(\mathbb{R})$ is defined via

$(T^{wav} f)(a, b) := |a|^{-1/2} \int_{\mathbb{R}} f(x)\, \psi\Big(\frac{x - b}{a}\Big)\, dx = \langle f, \psi_{a,b}\rangle$.

Note that, due to (3.12), if $\psi$ is compactly supported, the functions $\psi_{a,b}$ of high frequency (small scale $|a|$) have a small support, whereas the $\psi_{a,b}$ of low frequency (large scale $|a|$) have a large support. This basic property of wavelets is their major advantage in signal processing compared to the Fourier transform, since it allows a good localization in both time and frequency.

We now turn to the question of reconstructing a function from its wavelet transform. We assume that $\|\psi\| = 1$. Since $\psi$ fulfills (3.10), we can recover any $f \in L^2(\mathbb{R})$ from its wavelet transform according to

$f = C_\psi^{-1} \int_{\mathbb{R}} \int_{\mathbb{R}} a^{-2}\, (T^{wav} f)(a, b)\, \psi_{a,b}\, da\, db$.

If $\psi \in L^1(\mathbb{R})$, then condition (3.10) can only hold if

$\int_{\mathbb{R}} \psi(x)\, dx = 0$.

For implementation purposes, we need to discuss how to discretize wavelet transforms. For convenience, we only consider the case in which $f$ can be reconstructed by using only positive values of $a$. W.l.o.g. we fix $a_0 > 1$ and $b_0 > 0$ and restrict $a$ and $b$ to the discrete values

$a = a_0^{-j} \quad\text{and}\quad b = k b_0 a_0^{-j}, \qquad j, k \in \mathbb{Z}$,

which yields

$\psi_{j,k}(x) = \psi_{a_0^{-j},\, k b_0 a_0^{-j}}(x) = a_0^{j/2}\, \psi\Big(\frac{x - k b_0 a_0^{-j}}{a_0^{-j}}\Big) = a_0^{j/2}\, \psi(a_0^j x - k b_0)$.

Here, $b_0$ is chosen such that the functions $\psi(x - k b_0)$ cover the entire real axis. Then, for any fixed $j$, this property also holds for the functions $\psi_{j,k}$. As in the continuous case, the question arises whether or not we can reconstruct $f$ from the $\langle f, \psi_{j,k}\rangle$ in a stable way. Another question, dual to the first one, is whether any function $f$ can be written as a superposition of elementary building blocks $\psi_{j,k}$. If the $\psi_{j,k}$ constitute an orthonormal basis of $L^2(\mathbb{R})$, we can ensure that any $f \in L^2(\mathbb{R})$ is characterized by the coefficients $\langle f, \psi_{j,k}\rangle$. In addition, we can represent the $L^2$-norm of any $f \in L^2(\mathbb{R})$ according to

$\|f\|^2 = \sum_{j,k \in \mathbb{Z}} |\langle f, \psi_{j,k}\rangle|^2$.

Thus, if $f \in L^2(\mathbb{R})$, then $\{\langle f, \psi_{j,k}\rangle\}_{j,k \in \mathbb{Z}} \in \ell^2(\mathbb{Z}^2)$, which guarantees a stable reconstruction. Since $\{\psi_{j,k}\}$ is a basis, the reconstruction is simply given by

$f = \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k}\rangle\, \psi_{j,k}$.

The question that arises is how to construct orthonormal wavelet bases. We introduce multiresolution analysis in order to tackle this issue.

3.2.1 Multiresolution Analysis

A sequence of successive approximation spaces $V_j \subset L^2(\mathbb{R})$ is called a multiresolution analysis if it satisfies

$\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots$,    (3.13)

$\overline{\bigcup_{j \in \mathbb{Z}} V_j} = L^2(\mathbb{R})$,    (3.14)

$\bigcap_{j \in \mathbb{Z}} V_j = \{0\}$.    (3.15)

In addition, we require that all the spaces are scaled versions of the space $V_0$, i.e.

$f \in V_j \iff f(2^{-j}\,\cdot) \in V_0 \qquad \text{for all } j \in \mathbb{Z}$,    (3.16)

and that $V_0$ is translation invariant, i.e.

$f \in V_0 \implies f(\cdot - k) \in V_0 \qquad \text{for all } k \in \mathbb{Z}$.    (3.17)

An immediate consequence of (3.16) and (3.17) is that

$f \in V_j \implies f(\cdot - 2^{-j} k) \in V_j \qquad \text{for all } k, j \in \mathbb{Z}$.

The last requirement that has to be met is the existence of $\phi \in V_0$ such that

$\{\phi_{0,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $V_0$,    (3.18)

where $\phi_{j,k}(x) := 2^{j/2} \phi(2^j x - k)$ for $k, j \in \mathbb{Z}$. Due to (3.16), it follows that $\{\phi_{j,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $V_j$. We call $\phi$ the scaling function of the multiresolution analysis. If we define $P_j$ to be the orthogonal projector onto $V_j$, then (3.14) ensures that $\lim_{j \to \infty} P_j f = f$ for all $f \in L^2(\mathbb{R})$.

As stated below, conditions (3.13)-(3.18) guarantee the existence of an orthonormal wavelet basis $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ of $L^2(\mathbb{R})$, where $\psi_{j,k}(x) := 2^{j/2} \psi(2^j x - k)$, such that

$P_{j+1} f = P_j f + \sum_{k \in \mathbb{Z}} \langle f, \psi_{j,k}\rangle\, \psi_{j,k}$.    (3.19)

The mother wavelet $\psi$ can be constructed in the following way. For all $j \in \mathbb{Z}$ we define $W_j$ to be the orthogonal complement of $V_j$ in $V_{j+1}$, i.e. $V_{j+1} = V_j \oplus W_j$. Obviously, we have that

$W_j \perp W_l \qquad \text{for } j \neq l$.

Thus, for $l < n$,

$V_n = V_l \oplus \bigoplus_{j=l}^{n-1} W_j$,

and together with (3.14) and (3.15) this yields

$L^2(\mathbb{R}) = \bigoplus_{j \in \mathbb{Z}} W_j$.    (3.20)

For the spaces $W_j$ we again have the scaling property

$f \in W_j \iff f(2^{-j}\,\cdot) \in W_0$,    (3.21)

which is due to (3.16). Since (3.19) is equivalent to $\{\psi_{j,k} : k \in \mathbb{Z}\}$ being an orthonormal basis of $W_j$, we get with (3.20) that $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2(\mathbb{R})$. Moreover, we can conclude from (3.21) that if $\{\psi_{0,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $W_0$, then $\{\psi_{j,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $W_j$. Thus, we need to find $\psi \in W_0$ such that $\{\psi(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal basis of $W_0$.

In order to introduce how this can be done, we need some definitions. First of all, since $\phi \in V_0 \subset V_1$ and $\{\phi_{1,k} : k \in \mathbb{Z}\}$ is an orthonormal basis of $V_1$, we can define $h_k = \langle \phi, \phi_{1,k}\rangle$ and represent $\phi$ according to

$\phi = \sum_{k \in \mathbb{Z}} h_k \phi_{1,k}$.    (3.22)

We know that $\phi_{1,k}(x) = \sqrt{2}\, \phi(2x - k)$ and can thus rewrite (3.22) as

$\phi(x) = \sqrt{2} \sum_{k \in \mathbb{Z}} h_k\, \phi(2x - k)$,

or equivalently, on the Fourier side, as

$\hat{\phi}(\xi) = \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} h_k e^{-ik\xi/2}\, \hat{\phi}\Big(\frac{\xi}{2}\Big)$,

with convergence of the sums in the $L^2$ sense. Therefore, by defining

$m_0(\xi) := \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} h_k e^{-ik\xi}$,    (3.23)

we get

$\hat{\phi}(\xi) = m_0\Big(\frac{\xi}{2}\Big)\, \hat{\phi}\Big(\frac{\xi}{2}\Big)$.

One possible way to construct $\psi$ is given by the following

Proposition 3.4. Let $(V_j)_{j \in \mathbb{Z}}$ be a sequence of closed subspaces of $L^2(\mathbb{R})$ which satisfies (3.13)-(3.18). Then there exists an orthonormal basis of wavelets $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ of $L^2(\mathbb{R})$ such that

$P_{j+1} = P_j + \sum_{k \in \mathbb{Z}} \langle \cdot, \psi_{j,k}\rangle\, \psi_{j,k}$.

One possibility for constructing the mother wavelet $\psi$ is

$\hat{\psi}(\xi) = e^{-i\xi/2}\, \overline{m_0\Big(\frac{\xi}{2} + \pi\Big)}\, \hat{\phi}\Big(\frac{\xi}{2}\Big)$,

or equivalently,

$\psi = \sum_{k \in \mathbb{Z}} (-1)^{k-1}\, \overline{h_{-k-1}}\, \phi_{1,k} = \sqrt{2} \sum_{k \in \mathbb{Z}} (-1)^{k-1}\, \overline{h_{-k-1}}\, \phi(2\,\cdot - k)$,

where $m_0$ is defined via (3.23) and $\phi$ is chosen such that (3.18) is satisfied.

Proof. See [5].

Remark 3.2. The orthonormality of the $\phi(\cdot - k)$ leads to the following property of $m_0$:

$|m_0(\zeta)|^2 + |m_0(\zeta + \pi)|^2 = 1$ a.e.    (3.24)

3.2.2 Orthonormal Bases of Compactly Supported Wavelets

If we go back to the problem of phase reconstruction, we see that if the basis functions we choose have compact support, we get the nice property that the Poke matrix will be sparse. Thus, we are interested in wavelets that are compactly supported. In order to ensure that all $\psi_{j,k}$, $j, k \in \mathbb{Z}$, have compact support, we only need the mother wavelet $\psi$ to be compactly supported. This, in turn, is ensured if the scaling function $\phi$ has compact support, since then only finitely many $h_k$ are non-zero and thus $\psi$ reduces to a finite linear combination of compactly supported functions.

For compactly supported $\phi$, the $2\pi$-periodic function $m_0$ becomes a trigonometric polynomial. One can show that if $m_0$ is a trigonometric polynomial with $m_0(0) = 1$ fulfilling (3.24), then under some further assumptions we get the following result: if we define $\phi$ and $\psi$ according to

$\hat{\phi}(\xi) := \frac{1}{\sqrt{2\pi}} \prod_{j=1}^\infty m_0(2^{-j} \xi)$,

$\hat{\psi}(\xi) := e^{-i\xi/2}\, \overline{m_0\Big(\frac{\xi}{2} + \pi\Big)}\, \hat{\phi}\Big(\frac{\xi}{2}\Big)$,

then $\{\psi_{j,k} : j, k \in \mathbb{Z}\}$ is an orthonormal basis of $L^2(\mathbb{R})$ ([5]). Thus, we need to find $m_0$ that satisfies (3.24). In addition, we are interested in making $\phi$ and $\psi$ reasonably regular. One can show that imposing some regularity constraints implies that $m_0$ should be of the form

$m_0(\xi) = \Big(\frac{1 + e^{-i\xi}}{2}\Big)^N L(\xi)$,

where $N \ge 1$ and $L$ is a trigonometric polynomial.

Proposition 3.5. A trigonometric polynomial $m_0$ of the form

$m_0(\xi) = \Big(\frac{1 + e^{-i\xi}}{2}\Big)^N L(\xi)$

fulfills (3.24) if and only if $\mathcal{L}(\xi) := |L(\xi)|^2$ can be written as

$\mathcal{L}(\xi) = P(\sin^2 \tfrac{\xi}{2})$,

with

$P(y) = P_N(y) + y^N R\big(\tfrac{1}{2} - y\big)$,

where

$P_N(y) = \sum_{k=0}^{N-1} \binom{N-1+k}{k} y^k$

and $R$ is an odd polynomial, chosen such that $P(y) \ge 0$ for $y \in [0, 1]$.

Proof. See [5].

This proposition completely characterizes $|m_0|^2$. With spectral factorization one can extract the square root (for further details we refer to Chapter 6 of [5]). One important class of compactly supported orthonormal wavelet bases is the family of Daubechies wavelets, first introduced in [4], which corresponds to $R \equiv 0$. By varying $N$ we get the different Daubechies wavelets (abbreviated dbN), and the regularity increases with $N$. Except for $N = 1$, which corresponds to the Haar basis (introduced in Section 3.3.3), there is no closed-form representation of $\phi$ and $\psi$. Nonetheless, their graphs can be computed up to arbitrarily high precision by applying the cascade algorithm (see [5]). Figure 3.1 shows the graphs of $\phi$ and $\psi$ for $N = 2$ and $N = 4$.

Remark 3.3. The phase screen shown in Figure 2.1 can be compressed very well with wavelets. If we decompose the image w.r.t. the Haar wavelets and set all wavelet coefficients that have an absolute value smaller than 10 to zero, we get 91.81% zero coefficients and 99.95% retained energy. Thus, the

Figure 3.1: $\phi$ and $\psi$ for Daubechies wavelets ($N = 2$ and $N = 4$).

image can be very well approximated by images that are sparse in the Haar wavelet basis. For the 1D slice of the phase screen in Figure 3.2 we get 80.15% zero coefficients for a retained energy of 99.49% by setting the threshold to 9. We get similar results for other wavelet bases. The retained energy is defined as

$\frac{\|a_{comp}\|_2^2}{\|\phi\|_2^2}$,

where $a_{comp}$ is the coefficient vector of the compressed signal and $\phi$ is the original signal.

3.3 Implementation

For simplicity (see Section 3.3.3 for more details), we consider the issue of phase reconstruction in 1D, i.e. we aim at reconstructing a slice of the 2D phase screen.
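The compression figures of Remark 3.3 can be reproduced for the 1D slice with the MATLAB Wavelet Toolbox; a sketch (ours), assuming phi holds the slice of length $2^8$ and using the threshold 9 quoted above:

    [c, l] = wavedec(phi, 8, 'haar');        % Haar decomposition, 8 levels
    c_thr  = wthresh(c, 'h', 9);             % hard-threshold small coefficients
    zeros_frac = nnz(c_thr == 0)/numel(c)    % fraction of zero coefficients
    retained   = norm(c_thr)^2/norm(c)^2     % retained energy (orthonormal basis)
    phi_comp   = waverec(c_thr, l, 'haar');  % compressed signal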

3.3.1 Choosing the Weights $w_\gamma$

The first question we have to answer is how to choose the weights $w_\gamma$. In Proposition 3.3, one of the necessary conditions is that the sequence $w = (w_\gamma)_{\gamma \in \Gamma}$ has to be uniformly bounded from below away from zero. This is the only condition that has to be guaranteed for the weights. For the implementation we take two interesting choices of the $w_\gamma$. The first one is $w_\gamma = 1$ for all $\gamma \in \Gamma$. In this case we get

$\Phi_{w,p}(x) = \|Tx - y\|^2 + \|\mathbf{x}\|_p^p$,

where $\|\cdot\|_p$ denotes the $\ell^p$-norm and $\mathbf{x} = (\langle x, \varphi_\gamma\rangle)_{\gamma \in \Gamma}$. The second choice is based on the fact that wavelets do not only constitute orthonormal bases of $L^2(\mathbb{R})$ but also bases for a variety of other Banach spaces of functions, such as Hölder spaces, Sobolev spaces and, more generally, Besov spaces. Roughly speaking, the Besov space $B_{p,q}^s(\mathbb{R}^d)$ consists of functions that have $s$ derivatives in $L^p$; the parameter $q$ provides some additional fine-tuning in the definition of these spaces. The norm $\|x\|_{B_{p,q}^s}$ is related to the modulus of continuity $\omega$ of $x$, which is defined as a function $\omega : [0, \infty] \to [0, \infty]$ such that $|x(s) - x(t)| \le \omega(|s - t|)$ for all $s$ and $t$ in the domain of $x$. We refer to [11] for further details and only want to point out that the Besov norm is equivalent to a norm that can be computed from the wavelet coefficients. More precisely, we assume that the scaling function $\phi$ and the mother wavelet $\psi$ fulfill the smoothness property of being in $C^L(\mathbb{R})$, with $L > s$, and set $\sigma := s + d\big(\frac{1}{2} - \frac{1}{p}\big)$. Since we only consider $d = 1$, we get $\sigma = s + \frac{1}{2} - \frac{1}{p}$. We define the norm $\|\cdot\|_{s;p,q}$ according to

$\|x\|_{s;p,q} = \bigg( \sum_{j=0}^\infty \Big( 2^{j\sigma p} \sum_{\gamma \in \Gamma,\ |\gamma| = j} |\langle x, \psi_\gamma\rangle|^p \Big)^{q/p} \bigg)^{1/q}$,    (3.25)

where $|\gamma|$ denotes the scale of the wavelet $\psi_\gamma$. This norm is then equivalent to the Besov norm, i.e. there exist $A > 0$ and $B > 0$ such that

$A \|x\|_{s;p,q} \le \|x\|_{B_{p,q}^s} \le B \|x\|_{s;p,q}$.

The condition $\sigma \ge 0$ ensures that $B_{p,q}^s(\mathbb{R})$ is a subspace of $L^2(\mathbb{R})$. We will restrict ourselves to choosing $q$ equal to $p$, since then (3.25) reduces to

$\|x\|_{s;p,p} = \bigg( \sum_{j=0}^\infty 2^{j\sigma p} \sum_{\gamma \in \Gamma,\ |\gamma| = j} |\langle x, \psi_\gamma\rangle|^p \bigg)^{1/p}$.
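With $q = p$, the weighted penalty in (3.1) reproduces the $p$-th power of (3.25) if we take $w_\gamma = 2^{|\gamma|\sigma p}$. A small sketch (ours) of assembling these weights for dyadic wavelets on $[0, 1]$; the values of s and p are assumptions for illustration, and the single coarse-scale scaling coefficient also receives a weight bounded below:

    s = 1; p = 1;
    sigma = s + 1/2 - 1/p;           % d = 1
    jmax  = 7;                       % finest scale, matching the 2^8 discretization
    w = 1;                           % weight for the scaling coefficient
    for j = 0:jmax
        w = [w; 2^(j*sigma*p)*ones(2^j, 1)];   % 2^j wavelets at scale j
    end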

Both choices of weights should yield sparse solutions for $p \approx 1$ and will be compared in Chapter 4.

3.3.2 The Shrinkage Function $S_{w_\gamma,p}$

In order to speed up the computation, we implement a vector-valued version $S_{w,p} = (S_{w_\gamma,p})_{\gamma \in \Gamma}$ of the shrinkage function. For $p = 1$ we do this by using the MATLAB built-in function wthresh. As mentioned in Section 3.1, for $p > 1$ we cannot write down $S_{w_\gamma,p}$ explicitly, but instead have to find the solution $x_\gamma$ of the non-linear equation

$x_\gamma + k\, \mathrm{sign}(x_\gamma)\, |x_\gamma|^{p-1} = c$,    (3.26)

where $k = \frac{p \alpha w_\gamma}{2C}$ and $c = a_\gamma + \frac{1}{C}\big(T^*(y - Ta)\big)_\gamma$. At first glance, using Newton's method for solving (3.26) seems to be a good idea. Unfortunately, this method fails in some cases: for instance, for $p = 1.2$, $w_\gamma = 2^{3(3/2 - 1/p)p}$, $c = 0.2$ and a certain value of $C$, the iterates of the Newton method oscillate between two values $x_\gamma^{2k}$ and $x_\gamma^{2k+1}$ without converging. However, if the factor $k$ is small enough, the method works. We can achieve this by either starting with a small regularization parameter $\alpha$ or with a large value for $C$. The latter has the drawback of slower convergence and should thus be avoided if possible; on the other hand, in the case of data noise, $\alpha$ cannot be chosen arbitrarily small. To ensure that the implemented algorithm always works, we add the method of bisection, which is used in case the Newton method fails. This algorithm is much slower than the Newton method, but it converges for any continuous strictly monotonic function $f$ with $\lim_{x \to -\infty} f(x) = -\infty$ and $\lim_{x \to \infty} f(x) = \infty$, [14].

Algorithm 2 Bisection for solving $f(x) = x + k\,\mathrm{sign}(x)|x|^{p-1} = c$
  $a := \min(c, 0)$, $b := \max(c, 0)$, $m := a$
  while $|a - b| > \epsilon$ and $|f(m) - c| > \epsilon$ do
    $m := a + \frac{b - a}{2}$
    if $f(m) > c$ then
      $b := m$
    else
      $a := m$
    end if
  end while
  $x := m$
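A scalar MATLAB sketch (ours, not the thesis's vectorized implementation) of evaluating $S_{w,p}$ for $p > 1$, combining Newton's method with the bisection fallback; k and c are as in (3.26):

    function x = shrink_p(c, k, p, tol)
    % Solve x + k*sign(x)*|x|^(p-1) = c for 1 < p <= 2.
        f  = @(x) x + k*sign(x).*abs(x).^(p-1) - c;
        df = @(x) 1 + k*(p-1)*abs(x).^(p-2);
        x = c;                           % Newton iteration, started at c
        for it = 1:50
            x = x - f(x)/df(x);
            if abs(f(x)) < tol, return; end
        end
        a = min(c,0); b = max(c,0);      % fallback: the solution lies between 0 and c
        m = (a+b)/2;
        while abs(b-a) > tol             % bisection (f is strictly increasing)
            if f(m) > 0, b = m; else, a = m; end
            m = (a+b)/2;
        end
        x = m;
    end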

Proposition 3.6. Let a < b ∈ R be chosen such that the unique solution x* of (3.26) fulfills a ≤ x* ≤ b, and let the iterates of the method of bisection be denoted by x_k. Then the method converges with the rate

    |x_k − x*| ≤ 2^{−k} |b − a|.

Proof. By induction. For k = 0 we know that

    |x_0 − x*| ≤ |b − a|.

Now suppose that

    |x_k − x*| ≤ 2^{−k} |b − a|,

which means that in the k-th step the interval to which the solution is restricted has length 2^{−k}|b − a|. In the (k+1)-th step we reduce this interval by taking only half of it, i.e.

    |x_{k+1} − x*| ≤ (1/2) · 2^{−k}|b − a| = 2^{−(k+1)} |b − a|.

Again, we implement both the Newton and the bisection algorithm such that the computation is done for vectors.

3.3.3 Building the Poke Matrix

In order to test the algorithm, we need to determine the Poke matrix according to the chosen wavelet basis. In addition, we have to compute the slope measurements for the given wavefront. In 2D, the entries of the Poke matrix P = [P^x, P^y] are given by

    P^x_{i,l} = ∫_{Ω_i} ∂ψ_{j,k}(x, y)/∂x d(x, y) = ∫_{y_{i2}}^{y_{i2+1}} ( ψ_{j,k}(x_{i1+1}, y) − ψ_{j,k}(x_{i1}, y) ) dy,

    P^y_{i,l} = ∫_{Ω_i} ∂ψ_{j,k}(x, y)/∂y d(x, y) = ∫_{x_{i1}}^{x_{i1+1}} ( ψ_{j,k}(x, y_{i2+1}) − ψ_{j,k}(x, y_{i2}) ) dx,

where l corresponds to the linear indexing of (j, k); the second equality in each line follows from the fundamental theorem of calculus. In order to compute these integrals we can apply a quadrature rule.
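A worked consequence of Proposition 3.6 (added here for illustration, not part of the original argument): the rate immediately tells us how many bisection steps are needed for a prescribed tolerance ε.

\[
  2^{-k}\,|b-a| \le \varepsilon
  \quad\Longleftrightarrow\quad
  k \ge \log_2 \frac{|b-a|}{\varepsilon},
\]

so for |b − a| = 1 and ε = 10^{−6}, already k = ⌈log₂ 10⁶⌉ = 20 iterations suffice.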

However, we will need the evaluation of the wavelet functions ψ_{j,k} at some given points. This is not an easy task, since we cannot use the MATLAB wavelet toolbox to evaluate the wavelets and would thus need to build the wavelet family manually. Hence, for simplicity, we only want to reconstruct a 1D slice φ_atm : [0, 1] → R of the phase screen (shown in Fig. 3.2). We assume that φ_atm ∈ L² and that it has zero mean. Furthermore, the resolution in which φ_atm is actually given is 2^8, and we choose the scale j to range from 0 to 7.

Figure 3.2: 1D slice of original phase screen.

In 1D, computing the slopes reduces to

    s_i = ∫_{Ω_i} φ′_atm(x) dx = φ_atm(x_{i+1}) − φ_atm(x_i),

and the Poke matrix P is determined by

    P_{i,l} = ψ_{j,k}(x_{i+1}) − ψ_{j,k}(x_i),

where l corresponds to the linear indexing of (j, k). Here, we still need the evaluation of the wavelet functions, which can be implemented in MATLAB by decomposing signals (i.e. computing their coefficients) that have only one non-zero value, equal to 1; a sketch of this trick is given below.
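The following MATLAB sketch illustrates the impulse trick just described for the Haar case; the variable names and the use of wavedec's linear coefficient ordering (rather than an explicit (j, k) indexing) are assumptions made for this illustration. Decomposing the unit impulse e_i yields the coefficients ⟨e_i, ψ_l⟩, i.e. the values of all (discrete) basis functions at grid point i, from which the Poke matrix follows by first-order differences.

    n   = 2^8;                        % resolution of phi_atm
    lev = 8;                          % decomposition levels (scales 0..7)
    W   = zeros(n, n);                % W(i,l) = value of basis function l at grid point i
    for i = 1:n
        e = zeros(n, 1);  e(i) = 1;           % unit impulse at grid point i
        c = wavedec(e, lev, 'haar');          % coefficients <e_i, psi_l>
        W(i, :) = c(:)';
    end
    P = W(2:end, :) - W(1:end-1, :);  % P(i,l) = psi_l(x_{i+1}) - psi_l(x_i)

Note that wavedec orders the coefficients as [cA_lev, cD_lev, …, cD_1], so the column index l here is that linear ordering, not (j, k) directly.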

In order to test the introduced algorithm, we use it to solve

    Pa = s

for the wavelet coefficient vector a of the reconstructed phase screen φ_rec, which we finally compute according to

    φ_rec = Σ_k a_{0,k} φ_{0,k} + Σ_{j=0}^{7} Σ_k a_{j,k} ψ_{j,k}.

We start with the simplest possible wavelets: Haar wavelets. In this case, a_0 = 2, b_0 = 1 and the mother wavelet is defined as

    ψ(x) =  1   if 0 ≤ x < 1/2,
           −1   if 1/2 ≤ x < 1,
            0   otherwise.

In order to justify using these wavelets, we need to ensure that they constitute an orthonormal basis of L²(R), i.e. that

1. the ψ_{j,k} are orthonormal and
2. any function in L²(R) can be approximated, up to any desired precision, by a finite linear combination of the ψ_{j,k}.

Orthonormality is easy to show. Two Haar wavelets of the same scale j but different shifts do not overlap, so it holds that ⟨ψ_{j,k}, ψ_{j,k′}⟩ = δ_{k,k′}. If they are of different scales j < j′, then the support of ψ_{j′,k′} lies in a region where ψ_{j,k} is constant. Therefore, the scalar product ⟨ψ_{j,k}, ψ_{j′,k′}⟩ is proportional to the integral of ψ_{j′,k′} itself and is thus zero. We skip the proof of the second statement, which can be found in [5].

Remark 3.4. The Haar wavelet basis can also be constructed with the multiresolution analysis for

    φ(x) = 1   if 0 ≤ x < 1,
           0   otherwise.

Since the basis functions have compact support, the Poke matrix is sparse; Figure 3.3 shows the sparsity pattern of P. Figures 3.5 and 3.6 illustrate the behaviour of the algorithm for p = 1 and Haar wavelets for different noise levels. The reconstruction error approaches zero as the noise level tends to zero. Here, the implementation of the Haar basis is done independently of the toolbox (see the sketch below), which would be very complex for other Daubechies wavelets.
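A manual Haar implementation is indeed short. The following sketch (function name and grid convention are illustrative) evaluates ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k) pointwise, which is all that is needed to fill the Poke matrix entries P_{i,l} = ψ_{j,k}(x_{i+1}) − ψ_{j,k}(x_i).

    function y = haar_psi(x, j, k)
        % Evaluate the Haar wavelet psi_{j,k} at the points in x.
        t = 2^j * x - k;                    % rescale to mother-wavelet coordinates
        y = 2^(j/2) * ((t >= 0 & t < 0.5) - (t >= 0.5 & t < 1));
    end

For example, P(i, l) = haar_psi(x(i+1), j, k) − haar_psi(x(i), j, k), with l the linear index of (j, k).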

Figure 3.3: Sparsity pattern of P for the Haar basis.

Figure 3.4: Sparsity pattern of P for the Daubechies wavelets.

For the latter we use the MATLAB wavelet toolbox to evaluate the basis functions at different values in order to build the Poke matrix and to compute φ_rec from its wavelet coefficients.
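A hypothetical one-liner for this reconstruction step, assuming a is the solved coefficient vector in wavedec ordering, l the matching bookkeeping vector, and 'db2' a stand-in for whichever Daubechies wavelet is used:

    phi_rec = waverec(a, l, 'db2');   % inverse wavelet transform of the coefficients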
