GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL

Size: px
Start display at page:

Download "GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL"

Transcription

1 GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL CHAO MA, XIN LIU, AND ZAIWEN WEN Abstract. In this paper, we consider a nonlinear least squares odel for the phase retrieval proble. Since the Hessian atrix ay not be positive definite and the Gauss-Newton GN) atrix is singular at any optial solution, we propose a odified Levenberg-Marquardt LM) ethod, where the Hessian is substituted by a suation of the GN atrix and a regularization ter. Siilar to the well-known Wirtinger flow ) algorith under certain assuptions, we start fro an initial point provably close to the set of the global optial solutions. Global linear convergence and local quadratic convergence to the global solution set are proved by estiating the sallest nonzero eigenvalues of the GN atrix, establishing local error bound properties and constructing a odified regularization condition. The coputational cost becoes tractable if a preconditioned conjugate gradient PCG) ethod is applied to solve the LM equation inexactly. Specifically, the pre-conditioner is constructed fro the expectation of the LM coefficient atrix by assuing the independence between the easureents and iteration point. Preliinary nuerical experients show that our algorith is robust and it is often faster than the ethod on both rando exaples and natural iage recovery. Key words. Non-convex optiization, phase retrieval, Levenberg-Marquardt ethod, convergence to global optiu AMS subject classification. 49N30, 49N45, 90C6, 90C30, 94A0. Introduction. One popular forulation of the phase retrieval proble is solving a syste of quadratic equations in the for.) y r = a r, z, r =,,...,, where z C n is the decision variable, a r C n are known sapling vectors, a r, z is the inner product between a r and z in C n, a is the agnitude of a C, and y r R are the observed easureents. This proble arises fro any areas of science and engineering such as X-ray crystallography [5, 35], icroscopy [34], astronoy [9], diffraction and array iaging [8, 0], and optics [43]. It also appears in a few other iportant fields, including acoustics [, 3], blind channel estiation in wireless counications [, 0], interferoetry [3], quantu echanics [, 39] and quantu inforation [6]. Many algoriths have been developed to solve.). One of the ost widely used ethod is the error reduction algorith derived by Gerchberg and Saxton [4] and Fienup [0, ]. This approach has been extended as the hybrid input-output HIO) algorith proposed by Fienup []. Bauschke et. al. established a few connections between the ER and HIO algoriths and classical convex optiization ethods in [4]. Based on these connections, they proposed the hybrid projection-refection HPR) ethod in [5]. Luke further developed in [3] the relaxed averaged alternating reflection RAAR) ethod which can often be ore efficient and reliable than the HIO and HPR ethods. The quadratic syste.) can be forulated as the following nonlinear least squares NLS) proble:.) in fz) = z C n a r z ) y r. School of Matheatical Sciences, Peking University, Beijing, CHINA achao pku@pku.edu.cn). State Key Laboratory of Scientific and Engineering Coputing, Acadey of Matheatics and Systes Science, Chinese Acadey of Sciences, CHINA liuxin@lsec.cc.ac.cn). Research supported in part by NSFC grants 0409, 330 and 93305, and the National Center for Matheatics and Interdisciplinary Sciences, CAS. Beijing International Center for Matheatical Research, Peking University, Beijing, CHINA wenzw@pku.edu.cn). 
Research supported in part by NSFC grant 309 and by the National Basic Research Project under the grant 05CB85600.

2 Wen et. al. introduced an alternating direction ethod of ultipliers ADMM) to solve.) in [44] and showed that the ADMM is usually coparable to any existing ethods for both classical and ptychographic phase retrieval probles. In [40], Yoav Shechtan et. al. proposed a daped Gauss-Newton schee. Other approaches include the difference ap DF) algorith developed by Elser [5] and the so-called saddle-point optiization algoriths developed by Machesini [3]. Netrapalli et. al designed alternating iniization ethods in [37]. Although the ethods entioned above often perfor well nuerically, their convergence to the global optial solutions is not clear, yet. Recently, there are a few iportant progress on achieving the global optiality for solving nonconvex optiization probles. Miniizing a coposite function with nonconvex sparse regularization ter is studied in [46, 30, 7]. Sun and Luo proved in [4] that a firstorder ethod converges to global optiality on a atrix copletion proble. Candes et. al. proposed a so-called Wirtinger flow ) algorith for solving the odel.) in [9]. The algorith is consisted of two parts. An initial point z 0 is obtained fro the leading eigenvector of a certain atrix, and the point is refined by a gradient descent schee in the sense of Wirtinger calculus iteratively. When there are no noise involved in the easureents of.), it is proved in [9] that the initialization step yields an initial point z 0 very close to the set of global optial solutions with a high probability. Then it is showed that the algorith converges to the global iniizer in a global linear rate. Since the coputational cost of the each step of the algorith is cheap, the nuerical results see to be practically useful. In this paper, we propose a odified LM approach for solving the NLS odel.). In fact, nuerical ethods for the general NLS probles in rz), where rz) are the residual functions, have been well studied for decades. The Gauss-Newton GN) ethod calculates a search direction deterined by a so-called GN atrix through the first-order inforation. Global convergence to a stationary point can be guaranteed after cobining certain line search techniques. If the NLS has a zero residual at the global optial solutions, the GN atrix equals to the Hessian at these points, which ensures the quadratic local convergence rate of the GN ethod. However, when the Hessian is singular at the solutions, the GN ethod ay fail. Another widely used approach is the LM ethod [9, 33] by adding a regularization ter to the GN atrix. The regularization paraeter is usually updated adaptively in a fashion siilar to the trust-region schee []. The regularization ter akes the LM ethod to conquer the singularity issue. Yaashita and Fukushia established quadratic convergence for singular probles satisfying certain error bound conditions when the regularization paraeter is chosen to be rz) in [45]. Fan and Yuan [8] provided a ore general analysis and extended the applicable regularization paraeters to be a faily µ k = rz k ) δ with δ [, ]. The readers are referred to [4, 8, 3, 36,, 7] and the reference therein, for other algoriths for NLS, including the structured quasi-newton ethod. Our ain contribution is a practical linearly convergent LM ethod with a provable second-order local convergence rate. Our approach is divided into two stages. The first stage is an initialization procedure exactly the sae as the ethod in [9]. 
The second stage is to update the iterate by an LM ethod where the regularization paraeter is based on the residual nor, i.e., the objective function value fz) in.). Since the Hessian is indefinite and calculating a positive definite correction to the Hessian ay be expensive, it is reasonable to use the LM ethod rather than the odified Newton ethod. By estiating the sallest nonzero eigenvalues of the GN atrix, and establishing local error bound properties and a odified regularization condition, we are able to prove that our approach can achieve a globally linear convergence to the global solution set and attain a locally quadratic convergence rate with high probability. In particular, the region of quadratic convergence is estiated explicitly. In order to reduce the coputational cost, the LM equation is solved

3 inexactly by the PCG ethod. The globally linear convergence to the global solution set is still ensured if the accuracy is proportional to the residual. We further construct a siple practical pre-conditioner using the expectation of the LM coefficient atrix by assuing that the easureents and iteration point are independent. Although the LM coefficient atrix tends to be singular close to the optial solution, the PCG ethod still runs soothly since all iterations are perfored in an invariant subspace. Because the condition nuber of the preconditioned coefficient atrix in this subspace is sall, the nuber of iterations of the PCG ethod can be controlled reasonably sall. Consequently, the total coputational cost becoes at least copetitive to the ethod. Our nuerical experients illustrate that the inexact LM ethod indeed outperfors the ethod on both rando exaples and natural iage recovery. We notice that the authors of [6] show local quadratic convergence rate of the odified LM ethod under certain deterinistic local error bound conditions. However, it is not clear how to verify if the original NLS proble.) satisfies these local error bound conditions, and how to estiate an explicit neighborhood around the solution set where these local error bound holds. The difference is that we can prove the existence of certain local error bound condition in a neighborhood close to the solution set with high probability. Although this theoretically neighborhood ay be quite sall when the diension n is large, our analysis is still eaningful for a second-type algorith. In the rest of this paper, we first give a brief description of the approach and its convergence properties in Section. Our proposed LM approach for the Gaussian odel is introduced in Section 3. The theoretical analysis on the exact LM ethod is presented in Section 4. In Section 5, we establish the convergence of the inexact LM fraework and construct a preconditioner for coputing the LM direction. The algorith is extended to the coded diffraction odel and is analyzed in Section 6. Nuerical experients are reported in Section 7 to deonstrate the effectiveness and efficiency of our LM ethod.. Preliinary... Proble Stateent. We first introduce the Gaussian odel for the choices of the sapling vectors. ASSUMPTION.. A proble is called the Gaussian odel if the saple vectors a r C n N 0, I/) + in 0, I/), where N µ, Σ) denotes a Gaussian distribution with ean µ and covariance Σ. It holds a r 6n for r =,,...,. There is no noise in the observation easureents. Naely, the global iniu of.) is zero. Siilar to the analysis in [9], the event a r 6n holds with probability no less than e.5n in Assuption.. During the theoretical analysis in this paper, we always ake this assuption. Hence, e.5n will always be a ter of the probabilities in the ain theores. Since the decision variable z of.) is coplex, we use the Wirtinger derivatives [38] to calculate the derivatives of the objective function. For any z C n, the coplex conjugate of z is written as z. For ease of notation, we define two augented vectors in bold face as.) z = [ z z ] and z = [ z z Then the objective function of.) can be viewed as a function with respect to the variable z, i.e., fz) = ]. z a r a r) z y r ). 3

4 It follows fro the calculation rules of the Wirtinger derivatives that the gradient is gz) := fz) = a r z ) [ ] a y r a.) r)z r ā r a. r ) z For convenience, we denote fz) := ) a r z y r ar a r)z... The Algorith. We briefly review the algorith in [9] in this subsection. The initial point is constructed fro the eigenvector corresponding to the largest eigenvalue of a atrix Y = y r a r a r. The detailed procedure is outlined in Algorith. It is shown in [9] that the initialization procedure can generate a good approxiation to the set of optial solutions. In fact, let x C n be an optial solution to.) and assue that x is independent of a r. The expectation of Y is EY = xx + x I, whose leading eigenvector is parallel to x. When is sufficiently large, Y is close to its expectation so that the angle r yr between x and the leading eigenvector of Y is sall, and n is close to x r. ar Algorith : Initialization in the ethod Input easureents {a r } and observations {y r } r =,,..., ). Calculate z 0 to be the leading eigenvector of Y = y r a r a r. 3 Noralize z 0 such that z 0 r = n y r r a r. Once an initial point z 0 is obtained, the ethod executes gradient descent steps via Wirtinger derivative using a restricted step size.3) z k+ = z k µ k z 0 : µ k z 0 fz k). The update of the conjugates { z k } is oitted since it is equivalent to the calculation of {z k }. Let x C n be an optial solution to.). For each z C n, the distance between x and z is easured as distz, x) = in z φ [0,π] eiφ x = z + x z x. The next theore shows the property of the initialization Algorith and the global linear convergence of the algorith.3). When the nuber of easureents is sufficiently large, the spectral initialization can produce a good initial point. Consequently, by initiating fro this point, a linear convergence can be achieved with high probability. THEOREM.. Theore 3.3 of [9]) Suppose that Assuption. holds. Let x C n be any solution of.), c 0 n log n, where c 0 is a sufficiently large constant. Then the initial estiate z 0 noralized to have a squared Euclidean nor equal to r y r, obeys.4) distz 0, x) 8 x with probability at least 0e γn 8/n γ is a fixed positive constant). Let {z k } be a sequence generated by.3) starting fro any initial solution z 0 obeying.4) with µ k = µ c /n for all k and soe fixed constant c. Then there is an event of probability at least 3e γn e.5 8/n, such that on this event, we have distz k, x) µ ) k/.5) x

5 3. A Modified LM Method. The algorith is essentially a gradient descent ethod with a restricted step size. Since the odel.) is a NLS proble, it is natural to consider the LM ethod for a faster local convergence rate than the ethod. Using the calculation rules of the Wirtinger derivatives, we obtain the Jacobian and GN atrix of fz): 3.) 3.) Jz) := [ ] a z a, a z a,, a z a a z ā, a z ā,, a, z ā Ψz) := Jz) Jz) = [ a r z a r a r a rz) a r a r a rz) ā r a r a rz ā r a r ]. The LM direction s k is calculated by solving the following linear syste 3.3) Ψ µ k z k s k = gz k ), where µ k 0 and Ψ µ z = Ψz) + µi. Then the iteration schee of the LM algorith is 3.4) z k+ = z k + s k. The role of the paraeter µ k is iportant. It can be updated siilar as the strategies in the classic trust-region type algoriths. For the sake of theoretical analysis, we propose the following updating rules for the Gaussian odel: 3.5) { 70000n nfzk ), if fz µ k = k ) 900n z k ; fzk ), otherwise. Roughly speaking, when the residual is large and the iteration is far away fro the optial solution set, the larger paraeter µ k = 70000n nfz k ) can guarantee a global linear convergence. As long as the residual becoes sall enough, the choice of µ k = fz k ) adapted fro [45, 8] ensures a fast local convergence rate. To further iprove the efficiency of the LM algorith in practice, the equation 3.3) can be solved inexactly after reaching certain criterion, such as 3.6) Ψ µ k z k s k + gz k ) η k gz k ) for soe constant η k 0. With a suitably chosen paraeter η k, a global linear convergence rate of the LM ethod can be guaranteed while a better nuerical perforance than the exact LM ethod can be achieved. The fraework of the exact and inexact LM ethod are unified in Algorith. Algorith : An Modified LM ethod for Phase Retrieval Input: Measureents {a r }, observations {y r }. Set ɛ 0. Construct an initial guess z 0 using Algorith. Set k := 0. 3 while gz k ) ɛ do 4 Copute s k by solving 3.3) with µ k specified in 3.5) until 3.6) is satisfied. 5 Set z k+ by 3.4) and k := k +. 6 Output: z k. Siilar to the ethod, the calculation involving the conjugates of {z k } is not necessary. As we will describe later in Section 5., the LM equation 3.3) can be solved by the PCG ethod which only consists of a series of vector suations and atrix-vector 5

6 ultiplications. It allows us to calculate s k without considering its conjugate. Therefore, the coputational cost and storage are reduced. Nevertheless, for convenience of theoretical analysis, we still deal with atrices in C n n and treat variables in C n. We should ention that the GN and Newton ethods are not used because of singularity issues. Note that the NLS.) adits a zero residual at an optial solution under Assuption.. The GN atrix Ψz) equals to the Hessian at this solution and they are ostly singular. Consequently, Newton and GN ethods cannot be eployed directly. The odified Newton ethod is not practical either because the Hessian is indefinite and it is often intractable to calculate a suitable regularization paraeter. Our odified LM ethod whose paraeter µ k tending to zero conquers the singularity issue and ensures a local quadratic convergence. 4. Analysis of the Exact LM Method. In this section, we analyze the convergence of our LM algorith with η k = 0 in 3.6). The ain result consists of two parts. When fz k ) 900n z k holds, our odified LM algorith can achieve a globally linear convergence with x and guarantees a quadratic convergence rate with high probability. Our ain result on Gaussian odel is stated as follows. THEOREM 4.. Suppose that Assuption. holds. Let x C n be any solution of.) and c 0 n log n, where c 0 is a sufficiently large constant. Let {z k } be a sequence generated by Algorith where the LM equation exactly solved. Then, starting fro any high probability. Otherwise, it iplies distz k, x) 4 n initial solution z 0 satisfying distz 0, x) 8 x, there is an event of probability at least 5e γn 8/n e.5n γ is a fixed positive constant), such that on this event, 4.) where 4.) distz k+, x) < c distz k, x), for all k = 0,,... c := x 4µ k ), if fz k ) 900n z k ; n 9.89 n, otherwise. Furtherore, there exists a sufficiently large integer l satisfying fz l ) < 900n z l. Consequently, it holds for all k l that 4.3) where 4.4) distz k+, x) < c distz k, x), c := n. x The lower bound of the probability of convergence in Theore 4. is of the sae order as that of Theore. although the constant γ is different. When n is sufficiently large, e γn and e.5n becoes negligible copared to the ter /n. Then the probabilities in Theores. and 4. tend to be equal. Since the ethod is onotone, and according to the selection of the paraeter µ k, the coefficient c is uniforly bounded above by x x n 4µ 0 = and tends to n nfz 0) 9.89, which is a constant less than. In this sense, n our linear convergence rate is no worse than the ethod. One advantage of our odified LM ethod is its locally quadratic convergence property. It cannot be derived directly fro the analysis for the deterinistic probles in [8] fro two ain perspectives: i) we adit a ore relaxed region where the local error bound properties hold; ii) the neighborhood of provable quadratic convergence can be estiated specifically. 6

7 4.. Leas for the Proof. Let X C n be the set of optial solutions of.) and the letter x C n be reserved for a solution of.). We first prove Theore 4. in the case x =. In the end, we coplete the proof by showing that the case x can be reduced to the case x =. When z is independent to {a r }, it is easily verified that EΨz) = Φz), where [ ] zz 4.5) Φz) = + z zi zz zz zz + z zi. Although the LM iterates {z k } are not independent to the easureents {a r }, the relationship between Ψz) and Φz) still plays an iportant role in our theoretical analysis. For convenience of notation, we also use Φ µ z = Φz) + µi hereafter. The first lea describes the concentration of the GN atrix at a solution x. LEMMA 4.. For any z C n and δ > 0, there exists a sufficiently large nuber c = cδ). If > cn log n, then Ψz) Φz) δ z holds with probability at least 0e γn 8/n. Lea 4. can be verified in the sae anner of Lea 4.7 in [9]. The next lea is on the saple covariance atrix which can be proved in a siilar fashion. LEMMA 4.3. Assue Ψx) Φx) δ, then I n a r a r δ with probability no less than e γ. On this event, it holds 4.6) δ) u a ru + δ) u, u C n. The next lea reveals the distribution of the eigenvalues of Ψx). LEMMA 4.4. Suppose that x =. Then Φx) has one eigenvalue of 4, one eigenvalue of 0, and all other eigenvalues are. If Ψx) Φx) δ, then the largest eigenvalue of Ψx) is less than 4 + δ. The above lea is straightforward and hence its proof is oitted. Our proof also uses the following lea fro [6]. LEMMA 4.5. Suppose X, X,..., X are i.i.d. real-valued rando variables obeying X r b for soe nonrando b > 0, EX r = 0, and EXr = v. Set σ = axb, v ), then ) PX X y) exp y σ. For any z C n, we define x z to be the vector in X nearest to z, i.e., x z = arg in z x. x X Then, we denote h z = z x z. We now describe a few essential characteristics of Ψz), fz), and gz) near the global solution. The so-called local error bound property is an instinctive 7

8 property of the objective function. Since its proof is different fro that of [9], the detailed analysis is included. The other two properties highly depend on our odified LM ethod. These three properties are the foundation of our analysis. We ephasize that the bold face letters z, u, v, h are the augented vectors defined as.) for z, u, v, h, respectively. LEMMA 4.6. Suppose that Assuption. holds, cn log n where c is sufficiently large, and Ψx) Φx) δ holds with δ = 0.0. Let µ be deterined by 3.5). Then, with probability at least e 3γn, we have the following properties.. Estiate of the sallest nonzero eigenvalues: 4.7) v Ψu)v u v holds for all u, v C n, such that u = v = and Iu v) = 0;. Local error bound property: 4.8) 4 distz, x) fz) 8.04distz, x) ndistz, x) 4, holds for any z satisfying distz, x) 8 ; 3. Regularization condition: 4.9) µz)h Ψ µ z ) gz) 6 h n h gz) holds for any z = x + h, h z 8, and fz) 900n. Proof. ) To prove 4.7), we first prove that for any u, v C n, 4.0) v Ψu)v v Φu)v u v, by eploying Lea 4.5. Then, by the condition Iu v) = 0, we have v Φu)v u v, which copletes the proof. We first consider the case when u and v are fixed. Define X r u, v) = a ru a rv + Rea ru) a r v) ), E r u, v) = EX r u, v), then v Ψu)v = X r u, v), v Φu)v = E r u, v). Let Y r u, v) = E r u, v) X r u, v), we obtain v Φu)v v Ψu)v = Y r u, v). Since Rea ru) a r v) ) a ru a rv, we have X r u, v) 0. In addition, considering u = v =, it is easy to know E r u, v) = u v + + 4Rev u) 8. Hence, we know Y r u, v) 8. Meanwhile, it follows fro the inequalities EEX) X) ) EX ) and X r 4 a ru a rv that EY r u, v) EX r u, v) 6E a ru 4 a rv 4 6 E a ru 8 E a rv 8 = 384. By choosing σ = 384 and y = /, Lea 4.5 iplies Pv Φu)v v Ψu)v 0.5) e

9 Choosing γ to be a sufficiently sall positive nuber, such as 307, we obtain Pv Ψu)v v Φu)v 0.5) e γ. We have verified 4.0) when u and v are a fixed pair of vectors. We next extend the result to any pair of u and v. To achieve this goal, we prove that Y r will not change too uch when the variation of u and v are sall, and use a net on S n S n to coplete the extension. We next define Then for any u, v, v C n, we have gu, v) = Y ru, v) = v Φu)v v Ψu)v. 4.) gu, v) gu, v ) E ru, v) E ru, v ) + X ru, v) X ru, v ) u v u v + 4 Re v u) v u) ) + a ru a r v a rv + Re a r ū) a rv) a rv ) )). For the first two parts of 4.), we have 4.) u v u v = v v ) uu v + v uu v v ) v v, 4.3) For the third part of 4.), we can derive 4.4) a ru a r v a rv Re v u) v u) ) v v. a ru v v ) a r a rv + v a r a rv v ) 6n v v + δ)n v v. A siilar derivation on the fourth part of 4.) gives a ru 4.5) Re a r ū) a rv) a rv ) )) + δ)n v v. Substituting 4.)-4.5) into 4.) yields gu, v) gu, v ) δ)n) v v. Siilarly, for any u, u, v C n, we obtain gu, v) gu, v) δ)n) u u. Hence, for any u, u, v, v C n, it holds gu, v) gu, v ) δ)n) u u + v v ). 9

10 Choose ɛ 48+9+δ)n, such as ɛ = 50n, and let N ɛ be an ɛ-net of S n. Then for any u, v) S n S n, we can find u, v N ɛ N ɛ, satisfying u u + v v 4+96+δ)n. Hence, gu, v) gu, v ). We can choose an N ɛ obeying N ɛ + ɛ )n. Therefore, with probability larger than + ) 4n e γ, ɛ we have for any u, v ) N ɛ N ɛ, gu, v ) 0.5. In this occasion, for any u, v S n, we have which eans gu, v) gu, v ) + gu, v) gu, v ), v Ψu)v v Φu)v = u v + 4Rev u) ) +. This copletes the proof of 4.0). When Iu v) = 0, we have Rev u) ) 0. Therefore, v Ψu)v. In addition, when cn log n and c is sufficiently large, we have + ) 4n e γ = + 500n) 4n n cγn e γ e γ. ɛ This copletes the proof of 4.7). ) We now prove the left hand side of 4.8). Recalling that z = x + h, what we want to prove is that with high probability, 4.6) a r x + h) a rx ) 4 h holds for any h 8. Note that a rx + h) a rx = Rex a r a rh) + a rh. Let h = sy, where s = h R, y C n and y =. Then, it suffices to prove 4.7) Rex a r a ry) + s a ry ), for 0 s 8. We first prove the inequality for a fixed y, then extend the result to any y by using a covering arguent. Since the technique is nearly the sae as what is done in VII.F) of [9], we only suarize the ain steps here. Let X r y, s) := Rex a r a ry) + s a ry ) and Yr y, s) := EX r y, s) X r y, s). Then, by VII.5-7) of [9] and the fact that Ix y) = 0, we can easily calculate: 4.8) EX r y, s) = s + 8sRex y) + 6Rex y) +. 0

11 Using 0 s 8 and X ry, s) 0, we obtain the following estiations Y r y, s) EX r y, s) s + 8s + 8 < 0, EY r y, s) EX r y, s) = s 4 E a ry 8 + 8s 3 E a ry 6 Rex a r a ry) + 4s E a ry 4 Rex a r a ry) +3sE a ry Rex a r a ry) 3 + 6ERex a r a ry) 4 s 4 E a ry 8 + 8s 3 E a rx E a ry 4 + 4s E a rx 4 E a ry +3s E a rx 6 E a ry E a rx 8 E a ry 8 4s s 3 + 9s + 859s < 50. Applying Lea 4.5 with σ = ax50, 0 ) = 50 and y = /4 yields ) P Y r y, s) e γ 4 with γ = /860. It further iplies ) P X r y, s) EX r y, s) e γ. 4 Since EX r y, s) = s + 8sRex y) + 6Rex y) + 8s x y, we have ) 4.9) P X r y, s) 3 e γ. 4 This copletes the proof of 4.9) for a fixed y. In order to extend the result to all y C n, we only need to estiate X r y, s) X r y, s) L y y, for any y, y C n, and find an ɛ-net N ɛ with ɛ /4L). Then, with probability no less than e γ, for all y N ɛ, 4.9) holds for 0 s /8. Under this circustance, for any y C n, we can find a y N ɛ, and have X r y, s) X r y, s) =. X r y, s) X r y, s) This copletes the proof of 4.7) and thus the left side of 4.8). We next prove the right hand side of 4.8). By soe siple calculation, we have f = Reh a r a rx) + a rh ) 4 Reh a r a rx) + a rh 4 4 a rh a rx + a rh 4.

12 Together with the inequalities 4.6), Corollary 7.6 of [9] and VII.9) of [9], we can further obtain Recall the fact δ = 0.0, f can be bounded as f 4 + δ) h δ)n h 4. f 8.04 h n h 4. This copletes the proof of 4.8). 3) Finally, we verify 4.9). The right side of 4.8) iplies h z 900n holds. We notice that 4.0) µh Ψ µ z ) g = h g h Ψ µ z ) Ψ z g. 00 n when fz) Therefore, we estiate the two ters in the right hand side of 4.0), respectively. Siilar to what is done in VII.G) of [9], we obtain 4.) h g 8 h + 000n h g. Let λ i and w i, i =,..., n be the i-th sallest eigenvalue and associated eigenvector of Ψ z, respectively. Suppose that g has the following decoposition g = n c s w s, where c s are coplex nubers. Then, we obtain s= which gives Ψ µ z ) Ψg = n s= λ s λ s + µ c sx s, 4.) h Ψ µ z ) Ψg h Ψ µ z ) Ψg On the other hand, for any y C n, y =, we have λ n λ n + µ h g. y Ψz) Ψx)) y = a r z a rx ) a ry + Re a rz) a rx) ) a ry) ) a r z a rx ) a ry + a rz) a rx) ) a r y ) a ry 4 4.3) a rz a rx + a rz) a rx) ). Siilar to the proof of the right side of 4.8), we can get which gives a r z a rx + a r z) a rx) 6.08 h +.n h 4, 4.4) y Ψz) Ψx)) y.n 6.08 h +.n h n h,

13 where the last inequality uses h /8. Together with Lea 4.4, we obtain λ n n h. Substituting the above eigenvalue evaluation to 4.) and together with µ = 70000n nfz) 35000n n h, we have 4.5) h Ψ µ z ) Ψg n h n h n n h h g n/ n/ n/00 h g n h g 6 h + 300n h g, where the second inequality uses that h 00 a+bz, and n a+cz decreases on z when b < c, and the last inequality uses the relationship h 8. Substituting 4.) and 4.5) into 4.0), we iediately obtain 4.9). This copletes the proof of Lea Proof of Theore 4.. By abuse of the notation, we siply denote z as the current iterate and z + deterined by z + = z Ψ µ z ) gz). Subtracting x fro this equation, we have 4.6) h z + z + x z = h z Ψ µ z ) gz). For the sake of siplicity, we oit the letter z in fz), µz), h z and h z +, and oit the letter z in gz), Ψz) and Φz), when it causes no abiguity. We divide the proof of Theore 4. into two parts. ) We verify 4.) and 4.3) under the condition fz) < z 900n. The updating forula for µ gives µ = fz). Using the left hand side of 4.8) of Lea 4.6, we have h z z f = 900n + h z 5 n, which iplies h 4 n. By the definition of h, we know that Iz h) = 0, which eans z h = 0. On the other hand, it is easy to verify that h z Ψ µ z ) gz) = Ψ µ z ) Ψ µ z h z gz)) = Ψ µ z ) µh + [ h a r a ar a rh r)z ā r a r ) z ] ), and z µh + [ h a r a ar a rh r)z ā r a r ) z 3 ] ) = 0,

14 which further gives I z h + ) h a r a rha r a r)z) = 0. Hence, the eigenvalue estiate 4.7) iplies that the sallest eigenvalue of Ψ restricted in the subspace S := {v Iz v) = 0} is z. Therefore, the largest eigenvalue of Ψ µ z ) restricted in S is. Then, we can obtain z +µ 4.7) 4.8) Denote v := h + Ψµ z ) µh + z µh + + µ h a r a rha r a r)z, we obtain 4.9) h + [ h a r a ar a rh r)z ā r a r ) z ] ) [ h a r a ar a rh r)z ā r a r ) z ] ). z + µ µh + v µ z + µ h + z + µ v. By using the definition of µ and the local error bound condition 4.8), it holds that 4.30) 4.3) z + µ = x h + f h + h + h, µ = f 8.04 h n h.0 h +.4 n h We next estiate v. For any u C n and u =, using 4.6), Corollary 7.6 of [9] and VII.9) of [9], we have u v a rh a rz a ru = a rh a rx + h) a ru a rh 3 a ru + a rh a rx a ru 6n h a rh + + δ) a rh 4 6n + δ) h 3 + 6n + δ) + δ) h. Therefore, the nor of v can be bounded by 4.3) v = v 6n + δ) h 3 + 6n + δ) + δ) h ) = 3.03n h n h. Substituting 4.30)-4.3) into 4.9) yields h + µ h + v 4.0 h +.48 n h n h n h. Using the fact that h = h 7, we further have n h + < n) h < n 9.89 h, n 4

15 which guarantees the inequalities 4.) and 4.3) under the situation that fz k ) < z 900n. ) We next consider the case under the conditions fz) z 900n and h 8. Recalling the inequality 4.8) and 4.9) in Lea 4.6, and the positive definiteness of Ψ z, we obtain h + h Ψ µ z ) g h h Ψ µ z ) g + µ g ) ) h + 8µ µ g 66000n h µ ) h + ) 8µ µ 35000n n h g ) h, 66000n h 8µ which iplies 4.33) h + ) h. 4µ Therefore, we finish the proof for the special case x =. 3) Finally, we consider x. By observing the iteration schee 3.4), it is not difficult to verify that starting fro z0, the kth LM iteration for a proble with a solution x x is x zk+ x = z k x + s k x. Therefore, by the previous proof for the case x =, we have zk+ dist x, dist zk+ x, x ) x < x x dist zk x, x x ) 4 µk x ) < n, which yields 4.) and 4.3). This copletes the proof for all x. ) ) zk dist x, x, x 5. Analysis of the Inexact LM Method. In this section, we first establish the convergence result for the inexact LM fraework, then present a pre-conditioned conjugate gradient PCG) ethod for solving the LM syste inexactly and provide a practically useful choice of the pre-conditioner. 5.. Convergence of the Inexact LM Method. The following theore describes the global linear convergence of the inexact LM ethod. THEOREM 5.. Suppose that Assuption. holds. Let x C n be any solution of.), and c 0 n log n, where c 0 is a sufficiently large constant. Assue that {z k } is a sequence generated by Algorith with the paraeter 5.) η k := { c ).35n µ k x, if fz k) 900n z k ; 9.89 n c x 49.57n x µk gz k ) x, otherwise. 5

16 Then, starting fro any initial solution z 0 satisfying distz 0, x) x /8, there is an event of probability at least 5e γn 8/n e.5n, such that on this event, it holds that 5.) distz k+, x) < + c distz k, x), for all k = 0,,... distz k+, x) < 9.89 n + c x distz k, x), for all k l, x where c, c are defined by 4.) and 4.4), respectively, and l satisfying fz l ) < 900n z l. Proof. We only prove the result when x = and fz k ) 900n z k. The other part can be proved in the sae anner and hence oitted. Let z k+ := z k Ψ µ k z k ) gz k ) be the exactly LM step at the k-th iteration. By using Theore 4., we have 5.3) distz k+, x) < c distz k, x), and 5.4) distz k+, x) distz k+, x) + z k+ z k+ = distz k+, x) + s k + Ψ µ k z k ) gzk ) = distz k+, x) + Ψ µ k z k ) Ψ µ k z k s k + gz k )) distz k+, x) + η µk gz k ). By using Lea 4.3, Lea 4.6 and Cauchy-Schwarz inequality, we obtain gz k ) = a r z k a rx ) a r a rz k 4n zk + δ) a rz a rx ) 5.5).35n h k. Substituting 5.3), 5.5) and the updating forula 5.) into 5.4), we iediately obtain the relationship 5.). Theore 5. tells us that if η k takes the order of fz k ), the inexactly LM ethod guarantees a global linear convergence to a global solution. When η k takes the order of fz k ) 3, the inexactly LM ethod achieves a local quadratic convergence rate. 5.. Solve the LM Equation by PCG. In this subsection, we discuss the PCG ethod for solving the LM equation 3.3). The CG ethods adits a global linear convergence rate which depends on the condition nuber of the coefficient atrix see [4, 7]). However, the linear syste atrix Ψ µ z tends to be singular as the paraeter µ k decreases, which takes place when the iteration is close enough to the solution set. Our recipe is using the PCG ethod with a suitable pre-conditioner M. Therefore, the original linear syste 3.3) is replaced by 5.6) M Ψ µ z s = M gz). 6

17 5.7) Since EΨz) = Φz) if z is independent to {a r }, we suggest to use a pre-conditioner Φ µ z := Φ z + µ z I n and Φ µ z ) gz) as the initial point of the PCG ethod. A siple verification shows that Φ µ z is positive definite and its inverse has an explicit forulation: 5.8) Φ µ z ) = ai n + bzz + c z z, where a = z + µ, b = 3 z + µ)4 z + µ), c = z + µ)µ. Hence, the linear syste Φ µ z ) s = b can be calculated in On) arithetic operations. The reaining task is to analyze the condition nuber of Φ µ z ) Ψ µ z. Siilar to Ψ µ z, Φ µ z is also nearly singular once µ is sall. Therefore, the condition nuber of Φ µ z ) Ψ µ z is likely to be huge. Fortunately, the subspace V := { x x = [ s s ], s C n } is a coon range space of Φ z and Ψ z. It can be easily verified that any iteration z is in V and Φ µ z ) Ψ µ z z V if z V. It is easy to establish the following convergence property of the CG ethod. LEMMA 5.. Assue that A is a positive seidefinite atrix and V is its range space. Denote A µ := A + µi. Let y V be the solution of the linear syste A µ y = b, and {y k } be the sequence generated by the CG ethod fro an initial point y 0 V. Then, for any k, it holds k κv A y k y A µ µ ) y 0 y A µ, κv A µ ) + ) where y A µ = y A µ y) / and κ V A µ ) refers to the restricted condition nuber κ V A µ ) := ax y V, y y A µ y = in y V, y y A µ y. = Lea 5. shows that one only need to evaluate the restricted condition nuber of Φ µ z ) Ψ µ z. Without loss of generality, we assue x =. Let λ be an eigenvalue of Φ µ z ) Ψ µ z, and y λ be the corresponding eigenvector. Firstly, we have 5.9) Φ µ z ) Ψ µ z y λ y λ = λ. Using the relationship 4.7) and 4.4) and the fact that h 8, we obtain Φ µ z ) Ψ µ z y λ y λ = Φ µ z ) )Ψ µ z Φ µ z )y λ z + µ Ψµ z Φ µ z y λ 7 8) + µ Ψ z Ψ x + Ψ x Φ x + Φ x Φ z ) 7 8) + µ 37.09n h + 0.0). 7

18 Assue µ = Kn nfz), then Hence, we have λ 74.8n h ) < 75 + Kn nfz) K n κ V Φ µ z ) Ψ µ z ).03K n K n 75, which eans the condition nuber is close to if either K or n is large. In each PCG iteration, the coputational cost of the gradient evaluation is On), and the cost of the atrix-vector ultiplications for calculating Φ µ z ) s is also On). Lea 5. shows that the upper bound of the nuber of iterations is related to the restricted condition nuber and the distance between the initial guess and the solution set. Since the restricted condition nuber of Φ µ z ) Ψ µ z is sall, the PCG ethod often takes just a few iterations to achieve a good accuracy. Therefore, the coputational cost at a single iteration of our PCG ethod is not too expensive than that of the ethod. 6. Extensions to the Coded Diffraction CD) Model. We ake the following assuption in this section. ASSUMPTION 6.. A proble is called the CD odel if n 6.) y r = xt) d l t)e iπkt/n, r = l, k), 0 k n, l L, t=0 where xt) and d l t) denote the t-th eleent of x and d l, respectively. Assue that L clog n) 4, where c is a sufficiently large nuerical constant, and d l are i.i.d sapled fro a distribution d, which is syetric and satisfies d M and Ed = 0, Ed = 0, E d 4 = E d ). For the CD odel, an initialization via resapled Wirtinger Flow is introduced in [9] as Algorith 3. By conducting a resapled gradient descent steps, this initialization schee can provide a better initial guess than that of Algorith. Algorith 3: Initialization via the resapled ethod Input easureents {a r } and observations {y r } r =,,..., ). Divide the easureents and observations into B + groups of size = /B + ). The easureents and observations in group b are denoted as a b) r and y b) r for b = 0,,..., B. 3 Obtain u 0 by conducting Algorith on group 0. 4 For b = 0 to B, perfor the following update: u b+ = u b µ u 0 5 Set z 0 = u B. z a b+) r 8 ) y b+) r r )z a b+) r a b+)

19 By eploying Algorith 3, the distance between the initial guess z 0 and a solution x can be iproved to 6.) distz 0, x) 8 n x. Then the ethod can achieve distz k, x) 8 n µ 3 ) k/ x. Readers who are interested in the algorith can refer to section V and VII of [9] for the detailed inforation. The odified LM Algorith can be extended to solve the CD odel directly. For the sake of theoretical analysis, the regularization paraeter µ k is updated as 6.3) { 35000n fzk ), if fz µ k = k ) 360n z k ; fzk ), otherwise. If L clog n) 3, then a counterpart of Lea 4. holds with probability at least L+)/n 3. The first equality in Lea 4.3 also holds with probability no less than /n. Finally, we extend Lea 4.6 to the CD odel. LEMMA 6.. Suppose that Assuption 6. holds, Ψx) Φx) δ holds with δ = 0.0 and µ k is updated by 6.3). Then, with probability at least 3/n, we have. Estiate of the sallest nonzero eigenvalue: 6.4) v Ψu)v u v holds for all u, v, z C n, such that Iu v) = 0, and distu, x) /50 n);. Local error bound property: 6.5) 4 5 distz, x) fz) 8.04distz, x) ndistz, x) 4, holds for any z satisfying distz, x) 8 n ; 3. Regularization condition: 6.6) µz)h Ψ µ z ) gz) 6 h n h gz) holds for any z = x + h, h 8 z, and fz) n 360n. Proof. Since the easureents are not independent fro each other in the CD odel, Lea 4.5 cannot be applied. We first prove 6.4). Note that for u, v, z C n satisfying Iu v) = 0, v Ψu)v = v Φu)v + v Ψu) Φu))v u v + u v + 4Re v u) ) v Ψu) Φu) u v v Ψu) Φu). Hence, 6.4) holds if Ψu) Φu) 3 4 u for all u obeying distu, x) 50 n. Because Ψu) Φu) is hoogeneous for u when x is fixed, we assue u = without loss of generality. Then, we have to prove Ψu) Φu) 3 4. It holds that 6.7) Ψu) Φu) Ψu) Ψx) + Ψx) Φx) + Φx) Φu). 9

20 Taking h = u x leads to h /50 n). By using 4.4), we have y Ψu) Ψx)) y.n6.08 h +.n h 4 ) 0.57, for all y C n, y =. Therefore, Ψu) Ψx) 0.57 is an estiation of the first ter of the right side of 6.7). The second ter of 6.7) satisfies Ψx) Φx) δ = 0.0. Siilar to the first ter, the third ter of the right side of 6.7) can be estiated as Hence, we have Φx) Φu) 8 + h ) h ) Ψu) Φu) = This copletes the proof of 6.4). We next prove the left side of 6.5). By following a b) a b, we have fz) = = a r z a rx ) Reh a ra r x ) + a rh ) Reh a ra r x ) Using Corollary 7.5 of [9] and Ih x) = 0, we know Together with a rh 4. Reh a ra r x ) δ h. a rh 4 6n + δ) h 4 and h /8 n), we obtain fz) δ ) 6 + δ) h > h. The right side of 6.5) can be proven in the sae way as 4.8). Hence, the detailed proof is oitted. Finally, we prove 6.6). Siilar to 4.3) and using h 8, we can estiate n the largest eigenvalue of Ψz): λ n nfz). Therefore, using a sae derivation as 4.5), we obtain h Ψ µ z ) Ψg 6 h n h g, which together with 4.) gives 6.6). 0

21 Consequently, both global linear convergence rate and local quadratic rate can be established for the CD odel based on the above leas. THEOREM 6.3. Suppose that Assuption 6. holds. Let x C n be any solution of.) and {z k } be a sequence generated by Algorith where the LM equation exactly solved and µ k is chosen as 6.3). Then, starting fro any initial solution z 0 obeying distz 0, x) x 8 n, there is an event of probability at least L + )/n 3 /n such that on this event, distz k+, x) < c distz k, x), for all k = 0,,... distz k+, x) < c distz k, x), for all k l, where s satisfies fz s ) < 360n z s, and ) x 4µ 6.9) c := k, if fz k ) 360n z k ; n n, otherwise; 6.0) c := n. x Siilar theoretical results on the inexact LM ethod can also be derived. The proof of the theore follows the sae procedure and shares the sae inequalities, although the calculation is different. We oit the for conciseness. 7. Nuerical Experients. In this section, we present soe nuerical results to deonstrate the perforance of the LM ethod using the paraeter µ k = fz k ), and copare it with the ethod in [9]. 7.. Recovery of D signals. We begin our nuerical experients on -D rando signals under Gaussian and CD odel. In order to ake coparison with the ethod, we choose the sae type of signals as that in [9]: Rando low-pass signals, where x is given by x[t] = M/ k= M/ ) X k + iy k )e πik )t )/n, with M = n/8 and X k and Y k are i.i.d. N 0, ). Rando Guassian signals, where x C n is a rando coplex Gaussian vector with i.i.d. entries of the for X[t] = X + iy, with X and Y distributed as N 0, ). In the initialization step, 50 iterations of power ethod are run to calculate the eigenvector needed in Algorith. For the LM ethod, we solve the LM syste accurately and inaccurately, and these two versions are denoted by and, respectively. For the ethod, we set η k = 0 6 in 3.6). For the ethod, we set the axiu iteration nuber of PCG to be ink +, 5), where k is the iteration nuber in Algorith. We stop the LM algorith after 00 iterations. For the ethod, we use the step length µ k = in exp k/τ 0 ), 0.), where τ 0 330, and stop after 500 iterations. Notice that in each iteration, there are at ost 5 PCG iterations. Consequently, the coputational

22 Gaussian odel Coded diffraction odel Success rate /n Gaussian odel a) Success rate for Gaussian signals L Coded diffraction odel Success rate /n b) Success rate for low-pass signals L FIG. 7.. Epirical probability of success based on 00 rando trials. cost of every PCG iteration is about two ties of a iteration. Hence, considering the calculation of gradient in each iterations, the coputational cost of one iteration is no ore than that of iterations. Therefore, excuting 00 iterations is not ore expensive than 500 iterations. In this experient, we set n = 5 and copare the epirical success rate and the CPU tie of the LM and ethods. The epirical probability of success is an average over 00 trials, where in each instance, new rando sapling vectors are generated according to the Gaussian or CD odels. For coded diffraction odel, we use octanary patterns as the asks in [9]. We declare a trial successful if the relative error of the reconstruction distz final, X )/ x falls below 0 5 before the iteration process is stopped. For a successful trial, we define the CPU tie of this trial to be the tie used until the first iteration after which the relative error is saller than 0 5. Figure 7. shows that around 4.5n Gaussian phaseless easureents or 6 octanary patterns are enough for an exact recovery with high probability for all algoriths. For all tested signals and odels, the success rate of the LM ethods rises a little bit earlier than the ethod as the nuber of easureents increases. The three algoriths perfor siilarly in ters of the success rates. Furtherore, this figure shows that solving the LM equations inexactly does not exert significant ipact on success rate of the LM ethod. We next exaine the order of convergence of the and LM ethods. Figure 7. shows the relationship between the relative error in logarith scale and the nuber of iterations for the three algoriths. To better illustrate the perforance of the LM ethods, we only show errors of the first 40 iterations. We can see fro Figure 7. that the ethod does show

23 0 Gaussian odel 0 Coded diffraction odel 4 4 log0relative error) Iteration 0 Gaussian odel a)gaussian signals Iteration 0 Coded diffraction odel 4 4 log0relative error) Iteration b)low-pass signals Iteration FIG. 7.. Relationship between the relative errors and the nuber of iterations. /n = 6 for Gaussian odel and L = 0 for CD odel. Signal Gaussian signal Low-pass signal odel Gaussian Coded diffraction Gaussian Coded diffraction iter CPU iter CPU iter CPU iter CPU s s s s s s s s s s s s TABLE 7. Coputational results on rando exaples quadratic convergence. As it is expected, the ethod shows linear convergence after the first several iterations. However, its convergence rate is uch faster than that of the ethod. Table 7. presents the averaged nuber of iterations of the LM and ethods used to achieve an accuracy of 0 5 under a fixed /n or L. In the table, the statistics of the ethod is approxiately proportional to the logarith of that of the ethod, which shows the quadratic convergence of the ethod. Although the ethod converges linearly, it converges fast and does not take any ore iterations than the accurate algorith. We should point out that Figure 7. is not very fair to the ethod, since the gradient ethod tends to takes a large nuber of iterations. Therefore, we show the relationship between the relative error and CPU tie in Figure 7.3. Fro this figure, we can see that, although the ethod converges quadratically, it consues uch ore CPU tie than the other two ethods because solving the LM equation accurately needs a lot of PCG iter- 3

24 0 Gaussian odel 0 Coded diffraction odel 4 4 log0relative error) CPU tie 0 Gaussian odel a)gaussian signals CPU tie 0 Coded diffraction odel 4 4 log0relative error) CPU tie b)low-pass signals CPU tie FIG Relationship between the relative errors and CPU tie. /n = 6 for Gaussian odel and L = 0 for CD odel. ations. On the other side, the ethod consues the sallest CPU tie. Table 7. also shows the averaged CPU tie it does not include the CPU tie of the initialization step) of the three ethods to ake a successful recovery. We still can see that the ethod is the ost tie-consuing ethod, while the ethod tends to take uch less CPU tie about /3 or less) than the ethod. Obviously, solving the LM equation is expensive although a proising PCG is eployed, and aking a suitable truncation to the PCG ethod can efficiently reduce the coputational cost. 7.. Perforance on natural iage. We next perfor a few nuerical experients on recovering natural iages, siilar to Section IV.C of [9]. The two iages that we use are colored photographs of the Turret of Palace Museu turret ) and the Milky Way Galaxy galaxy ). The colored iages are viewed as n n 3 arrays, where the first two indices encode the pixel location, and the last is the color band. We run the LM and ethods on each of the three RGB iages. We generate L = 0 rando octanary patterns and gather the CD patterns for each color band using these 0 saples. We run 50 iterations of the power ethod in the initialization step. For the ethod, the stopping tolerance of the PCG is set to η k = 0 6. For the ethod, the axiu iteration nuber of PCG is set to 5. For the ethod, the step length is µ k = in exp k/τ 0 ), 0.4), where τ We perfor 5 LM iterations and 300 iterations. The relative error is calculated by x x F / x F where x and x are the recovered and original iage, respectively. The CPU tie is an average of the CPU tie fro three RGB iages. 4

25 FIG Turret of Palace Museu. The iage size is pixels. FIG The Milky Way Galaxy. The iage size is pixels. Figure 7.4 and 7.5 show the iage turret and galaxy recovered by the, respectively. The iages recovered by the other two algoriths are not reported because they are siilar. Table 7. shows the average nuber of iterations and average CPU tie used by the three algoriths to reduce the relative error to 0 5 and 0 0 for the three color band. We can see an obvious advantage of the ethod over the ethod. However, the CPU tie of the ethod is uch larger than the other two ethods. Figure 7.6 shows the relationship between the relative errors and the nuber of iterations. It obviously deonstrates quadratic convergence of the ethod and fast linear convergence of the ethod. In particular, the ethod takes about one iteration to reduce the accuracy fro 0 5 to 0 0. Figure 7.7 shows the relationship between the CPU tie and the relative errors of the three ethods. Due to the inexactness in solving the LM equation, the ethod takes uch less than than the ethod Phase retrieval with noise. We now evaluate the nuerical perforance of the LM ethods when there exists noises in the observation. We add different level of noises to {y r } and explore the relationship between the signal-to-noise rate SNR) of the observation and the ean square error MSE) of the recovered solution. Specifically, SNR and MSE are 5

26 Iage Turret of Palace Musue The Milky Way Galaxy Criterion iter CPU iter CPU iter CPU iter CPU s s s s s s s s s s s s TABLE 7. Coputational results in natural iage recovering. 0 Turret 0 Milky Way 4 4 log0relative error) Iteratioin Iteratioin FIG Relationship between the relative errors and the nuber of iterations for natural iages recovery. calculated by 7.) MSE := dist x, x) i= x, and SNR := a rx 4 w, where x is the output of the LM ethods after 50 iterations or of the ethod after 500 iterations, and w the added noise. The db-scale of MSE and SNR is calculated by 0 log MSE and 0 log SNR, respectively. We construct rando signals with n = 5, set = 6n for Gaussian odel and L = for the CD odel. The SNR is varied fro 0db to 60db. For each case, 00 Monte Carlo trials are repeated. Figure 7.8 shows the results on the change of MSE versus SNR. It shows that both algoriths achieve a siilar order of accuracy. In fact, both algoriths can converge to the sae iniu x with high probability. 8. Conclusion and future work. In this paper, we develop a odified LM ethod via Wirtinger derivative to solve the phase retrieval proble. Starting fro the sae spectral initialization step as the ethod, our ethod converges to the global solution linearly under the sae assuption as the ethod. The convergence rate is further iproved to be quadratic in a predictable neighborhood of the solution. Siilar theoretical analysis holds even if the LM equation is solved inexactly. In particular, a siple yet useful preconditioner is constructed based on the expectation of the LM coefficient atrix by assuing the independence between easureents and the LM iteration. Since the restricted condition nuber of this preconditioned coefficient atrix is sall, it enables a fast convergence of the PCG ethod for solving the LM equation. In our nuerical experients, we verify that the proposed LM ethod indeed converges quadratically in recovering both rando exaples and natural iages if the LM equation is solved sufficiently accurate. Our inexact LM ethod is coparable to the ethod in ters of the success rate and it has advantage in ters of the CPU tie. 6

Globally Convergent Levenberg-Marquardt Method For Phase Retrieval

Globally Convergent Levenberg-Marquardt Method For Phase Retrieval Globally Convergent Levenberg-Marquardt Method For Phase Retrieval Zaiwen Wen Beijing International Center For Mathematical Research Peking University Thanks: Chao Ma, Xin Liu 1/38 Outline 1 Introduction

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization Structured signal recovery fro quadratic easureents: Breaking saple coplexity barriers via nonconvex optiization Mahdi Soltanolkotabi Ming Hsieh Departent of Electrical Engineering University of Southern

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search
