Solving Most Systems of Random Quadratic Equations

Gang Wang, Georgios B. Giannakis, Yousef Saad, and Jie Chen

Digital Tech. Center & Dept. of Electrical and Computer Eng., Univ. of Minnesota; Key Lab of Intell. Contr. and Decision of Complex Syst., Beijing Inst. of Technology; Department of Computer Science and Engineering, Univ. of Minnesota. Email: {gangwang, georgios, ...}

31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA.

Abstract

This paper deals with finding an $n$-dimensional solution $x$ to a system of quadratic equations $y_i = |\langle a_i, x\rangle|^2$, $1 \le i \le m$, which in general is known to be NP-hard. We put forth a novel procedure that starts with a weighted maximal correlation initialization obtainable with a few power iterations, followed by successive refinements based on iteratively reweighted gradient-type iterations. The novel techniques distinguish themselves from prior works by the inclusion of a fresh (re)weighting regularization. For certain random measurement models, the proposed procedure returns the true solution $x$ with high probability in time proportional to reading the data $\{(a_i; y_i)\}_{1\le i\le m}$, provided that the number $m$ of equations is some constant $c > 0$ times the number $n$ of unknowns, that is, $m \ge cn$. Empirically, the upshots of this contribution are: i) perfect signal recovery in the high-dimensional regime given only an information-theoretic limit number of equations; and ii) (near-)optimal statistical accuracy in the presence of additive noise. Extensive numerical tests using both synthetic data and real images corroborate its improved signal recovery performance and computational efficiency relative to state-of-the-art approaches.

1 Introduction

One is often faced with solving quadratic equations of the form

$y_i = |\langle a_i, x\rangle|^2$, or equivalently, $\psi_i = |\langle a_i, x\rangle|$, $\quad 1 \le i \le m$  (1)

where $x \in \mathbb{R}^n/\mathbb{C}^n$ (hereafter, the symbol $A/B$ denotes either $A$ or $B$) is the wanted unknown $n \times 1$ vector, while the observations $\psi_i$ and feature vectors $a_i \in \mathbb{R}^n/\mathbb{C}^n$ are given and collectively stacked in the data vector $\psi := [\psi_i]_{1\le i\le m}$ and the $m \times n$ sensing matrix $A := [a_i]_{1\le i\le m}$, respectively. Put differently, given information about the (squared) modulus of the inner products of the signal vector $x$ with several known design vectors $a_i$, can one reconstruct exactly (up to a global phase factor) $x$, or alternatively, the missing phase of $\langle a_i, x\rangle$? In fact, much effort has been devoted to determining the number of such equations necessary and/or sufficient for the uniqueness of the solution $x$; see e.g., [1, 8]. It has been proved that $m = 2n - 1$ ($m = 4n - 4$) generic$^1$ (which includes the case of random vectors) real (complex) vectors $a_i$ are sufficient for uniquely determining an $n$-dimensional real (complex) vector $x$ [1, Theorem 2.8], [8], while in the real case $m = 2n - 1$ is shown also necessary [1]. In this sense, the number $m = 2n - 1$ of equations in (1) can be regarded as the information-theoretic limit for such a quadratic system to be uniquely solvable.

$^1$ It is beyond the scope of the present paper to explain the meaning of generic vectors; interested readers are referred to [1].
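For concreteness, the real Gaussian instance of model (1) is easy to simulate; the following NumPy sketch (ours, not part of the paper; all names are illustrative) draws a system at the information-theoretic limit $m = 2n - 1$:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    m = 2 * n - 1                      # information-theoretic limit in the real case
    x = rng.standard_normal(n)         # unknown signal, recoverable only up to sign
    A = rng.standard_normal((m, n))    # rows are the real Gaussian design vectors a_i
    psi = np.abs(A @ x)                # amplitude data psi_i = |<a_i, x>|
    y = psi ** 2                       # intensity data y_i = |<a_i, x>|^2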

In diverse physical sciences and engineering fields, it is impossible or very difficult to record phase measurements. The problem of recovering the signal or phase from magnitude measurements only, commonly known as phase retrieval, emerges naturally [11, 12]. Relevant application domains include e.g., X-ray crystallography, astronomy, microscopy, ptychography, and coherent diffraction imaging [22]. In such setups, optical measurement and detection systems record solely the photon flux, which is proportional to the (squared) magnitude of the field, but not the phase. Problem (1) in its squared form, on the other hand, can be readily recast as an instance of nonconvex quadratically constrained quadratic programming, which subsumes as special cases several well-known combinatorial optimization problems involving Boolean variables, e.g., the NP-complete stone problem [2, Sec. ]. A related task of this kind is that of estimating a mixture of linear regressions, where the latent membership indicators can be converted into the missing phases [30]. Although of simple form and of practical relevance across different fields, solving systems of nonlinear equations is arguably the most difficult problem in all of numerical computation [20, Page 355].

Notation: Lower- (upper-) case boldface letters denote vectors (matrices), e.g., $a \in \mathbb{R}^n$ ($A \in \mathbb{R}^{m\times n}$). Calligraphic letters are reserved for sets. The floor operation $\lfloor c\rfloor$ gives the largest integer no greater than the given real quantity $c > 0$, the cardinality $|\mathcal{S}|$ counts the number of elements in set $\mathcal{S}$, and $\|x\|$ denotes the Euclidean norm of $x$. Since for any phase $\phi \in \mathbb{R}$, the vectors $x \in \mathbb{C}^n$ and $e^{j\phi}x$ are indistinguishable given $\{\psi_i\}$ in (1), let $\mathrm{dist}(z, x) := \min_{\phi\in[0,2\pi)} \|z - xe^{j\phi}\|$ be the Euclidean distance of any estimate $z \in \mathbb{C}^n$ to the solution set $\{e^{j\phi}x\}_{0\le\phi<2\pi}$ of (1); in particular, $\phi \in \{0, \pi\}$ in the real case.

1.1 Prior contributions

Following the least-squares (LS) criterion (which coincides with the maximum likelihood (ML) one assuming additive white Gaussian noise), the problem of solving quadratic equations can be naturally recast as an empirical loss minimization

$\underset{z\in\mathbb{R}^n/\mathbb{C}^n}{\text{minimize}} \quad L(z) := \frac{1}{m}\sum_{i=1}^m \ell(z; \psi_i/y_i)$  (2)

where one can choose to work with the amplitude-based loss $\ell(z; \psi_i) := (\psi_i - |\langle a_i, z\rangle|)^2/2$ [29, 31], or the intensity-based one $\ell(z; y_i) := (y_i - |\langle a_i, z\rangle|^2)^2/2$ [3] and its related Poisson likelihood $\ell(z; y_i) := y_i\log(|\langle a_i, z\rangle|^2) - |\langle a_i, z\rangle|^2$ [7]. Either way, the objective functional $L(z)$ is nonconvex; hence, it is generally NP-hard and computationally intractable to compute the ML or LS estimate.
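To make these definitions concrete, here is a minimal sketch (ours; function names are illustrative) of the distance modulo a global phase in the real case and the two empirical losses in (2):

    import numpy as np

    def dist(z, x):
        # Euclidean distance up to a global phase; in the real case the phase
        # factor e^{j phi} reduces to a sign, so compare z against x and -x
        return min(np.linalg.norm(z - x), np.linalg.norm(z + x))

    def amplitude_loss(z, A, psi):
        # (1/m) sum_i (psi_i - |a_i^T z|)^2 / 2, the amplitude-based loss in (2)
        return 0.5 * np.mean((psi - np.abs(A @ z)) ** 2)

    def intensity_loss(z, A, y):
        # (1/m) sum_i (y_i - (a_i^T z)^2)^2 / 2, the intensity-based loss
        return 0.5 * np.mean((y - (A @ z) ** 2) ** 2)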
Minimizing the squared modulus-based LS loss in (2), several polynomial-time algorithms have been devised via convex programming for certain choices of design vectors $a_i$ [4, 26]. Such convex paradigms first rely on the matrix-lifting technique to express all squared modulus terms as linear ones in a new rank-1 matrix variable, followed by solving a convex semi-definite program (SDP) after dropping the rank constraint. It has been established that perfect recovery and (near-)optimal statistical accuracy are achieved in noiseless and noisy settings, respectively, with an optimal-order number of measurements [4]. In terms of computational efficiency however, such lifting-based convex approaches entail storing and solving for an $n \times n$ semi-definite matrix under general SDP constraints, whose worst-case computational complexity scales as $n^{4.5}\log(1/\epsilon)$ [26], which is not scalable.

Another recent line of convex relaxation [13], [14] reformulated the problem of phase retrieval as one of sparse signal recovery, and solved a linear program in the natural parameter vector domain. Although exact signal recovery can be established assuming an accurate enough anchor vector, its empirical performance is in general not competitive with state-of-the-art phase retrieval approaches.

Recent proposals advocate suitably initialized iterative procedures for coping with certain nonconvex formulations directly; see e.g., the algorithms abbreviated as AltMinPhase, (R/P)WF, (M)TWF, and (S)TAF [17, 3, 7, 27, 29, 28, 31, 23, 6, 25], as well as (stochastic) prox-linear algorithms [9, 10]. These nonconvex approaches operate directly upon vector optimization variables, thus leading to significant computational advantages over their convex counterparts. With random features, they can be interpreted as performing stochastic optimization over the acquired examples $\{(a_i; \psi_i/y_i)\}_{1\le i\le m}$ to approximately minimize the population risk functional $L(z) := \mathbb{E}_{(a_i, \psi_i/y_i)}[\ell(z; \psi_i/y_i)]$. It is well documented that minimizing nonconvex functionals is generally intractable due to the existence of multiple critical points [18]. Assuming Gaussian sensing vectors however, such nonconvex paradigms can provably locate the global optimum, and several of them also achieve optimal (statistical) guarantees.

Specifically, starting with a judiciously designed initial guess, successive improvement is effected by means of a sequence of (truncated) (generalized) gradient-type iterations given by

$z^{t+1} := z^t - \frac{\mu_t}{m}\sum_{i\in\mathcal{T}_{t+1}} \nabla\ell_i(z^t; \psi_i/y_i), \quad t = 0, 1, \ldots$  (3)

where $z^t$ denotes the estimate returned by the algorithm at the $t$-th iteration, $\mu_t > 0$ is the learning rate that can be pre-selected or found via e.g., the backtracking line search strategy, and $\nabla\ell(z^t; \psi_i/y_i)$ represents the (generalized) gradient of the modulus- or squared modulus-based LS loss evaluated at $z^t$. Here, $\mathcal{T}_{t+1}$ denotes some time-varying index set effecting the per-iteration gradient truncation.

Although they achieve optimal statistical guarantees in both noiseless and noisy settings, state-of-the-art (convex and nonconvex) approaches studied under Gaussian designs empirically require for stable recovery a number of equations (several) times larger than the information-theoretic limit [7, 3, 31]. As a matter of fact, when there are sufficiently many measurements (on the order of $n$ up to some polylog factors), the squared modulus-based LS functional admits benign geometric structure in the sense that [24]: i) all local minimizers are also global; and ii) there always exists a negative directional curvature at every saddle point. In a nutshell, the grand challenge of tackling systems of random quadratic equations remains to develop algorithms capable of achieving perfect recovery and statistical accuracy when the number of measurements approaches the information limit.

1.2 This work

Building upon but going beyond the scope of the aforementioned nonconvex paradigms, the present paper puts forward a novel iterative linear-time scheme, namely one whose runtime is proportional to that required by the processor to scan all the data $\{(a_i; \psi_i)\}_{1\le i\le m}$, that we term reweighted amplitude flow and henceforth abbreviate as RAF. Our methodology is capable of solving noiseless random quadratic equations exactly, and of yielding an estimate of (near-)optimal statistical accuracy from noisy modulus observations. Exactness and accuracy hold with high probability and without extra assumptions on the unknown signal vector $x$, provided that the ratio $m/n$ of the number of equations to that of the unknowns is larger than a certain constant. Empirically, our approach is shown able to ensure exact recovery of high-dimensional unstructured signals given a minimal number of equations, where $m/n$ in the real case can be as small as 2. The new twist here is to leverage judiciously designed yet conceptually simple (re)weighting regularization techniques to enhance existing initializations and also gradient refinements. An informal depiction of our RAF methodology is given in two stages as follows, with rigorous details deferred to Section 3:

S1) Weighted maximal correlation initialization: Obtain an initializer $z^0$ maximally correlated with a carefully selected subset $\mathcal{S} \subseteq \mathcal{M} := \{1, 2, \ldots, m\}$ of feature vectors $a_i$, whose contributions toward constructing $z^0$ are judiciously weighted by suitable parameters $\{w_i^0 > 0\}_{i\in\mathcal{S}}$.

S2) Iteratively reweighted gradient-like iterations: Loop over $0 \le t \le T$:

$z^{t+1} = z^t - \frac{\mu_t}{m}\sum_{i=1}^m w_i^t \nabla\ell(z^t; \psi_i)$  (4)

for some time-varying weighting parameters $\{w_i^t \ge 0\}$, each possibly relying on the current iterate $z^t$ and the datum $(a_i; \psi_i)$.

Two attributes of the novel approach are worth highlighting. First, albeit being a variant of the spectral initialization devised in [29], the initialization here [cf. S1)] is distinct in the sense that a different importance is attached to each selected datum $(a_i; \psi_i)$. Likewise, the gradient flow [cf. S2)] judiciously weighs the search direction suggested by each datum $(a_i; \psi_i)$. In this manner, more robust initializations and more stable overall search directions can be constructed, even based solely on a rather limited number of data samples. Moreover, with particular choices of the weights $w_i^t$ (e.g., taking 0/1 values), the developed methodology subsumes as special cases the recently proposed algorithms RWF [31] and TAF [29].
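To fix ideas, here is a sketch of one generic (re)weighted amplitude-flow step of the form (3)-(4) for the real Gaussian model (our illustrative code, not the paper's reference implementation); conveniently, np.sign(0) = 0 matches the convention $a_i^{\top}z/|a_i^{\top}z| := 0$ adopted later:

    import numpy as np

    def gradient_step(z, A, psi, mu, w=None):
        # One iteration z <- z - (mu/m) sum_i w_i grad l(z; psi_i), cf. (4);
        # for the amplitude-based loss, grad l(z; psi_i) equals
        # (a_i^T z - psi_i sign(a_i^T z)) a_i; w = None gives the plain update
        m = len(psi)
        Az = A @ z
        r = Az - psi * np.sign(Az)     # per-datum residuals
        if w is not None:
            r = w * r                  # apply the time-varying weights w_i^t
        return z - (mu / m) * (A.T @ r)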
2 Algorithm: Reweighted Amplitude Flow

This section explains the intuition and basic principles behind each stage of the advocated RAF algorithm in detail. For analytical concreteness, we focus on the real Gaussian model with $x \in \mathbb{R}^n$ and independent sensing vectors $a_i \in \mathbb{R}^n \sim \mathcal{N}(0, I_n)$ for all $1 \le i \le m$. Nonetheless, the presented approach can be applied directly when the complex Gaussian and the coded diffraction pattern (CDP) models are considered.

2.1 Weighted maximal correlation initialization

A key enabler of the success of general nonconvex iterative heuristics in finding the global optimum is to seed them with an excellent starting point [15]. Indeed, several smart initialization strategies have been advocated for iterative phase retrieval algorithms; see e.g., the spectral initialization [17], [3] as well as its truncated variants [7], [29], [9], [31], [16]. One promising approach is the one pursued in [29], which is also shown robust to outliers in [9]. To hopefully approach the information-theoretic limit however, its performance may need further enhancement. Intuitively, it is increasingly challenging to improve the initialization (over the state-of-the-art) as the number of acquired data samples approaches the information-theoretic limit. In this context, we develop a more flexible initialization scheme based on the correlation property (as opposed to the orthogonality in [29]), whose added benefit is the inclusion of a flexible weighting regularization technique to better balance the useful information exploited in the selected data.

Similar to related approaches of the same kind, our strategy entails estimating both the norm $\|x\|$ and the direction $x/\|x\|$ of $x$. Leveraging the strong law of large numbers and the rotational invariance of the Gaussian $a_i$ vectors (the latter suffices to assume $x = \|x\|e_1$, with $e_1$ being the first canonical vector in $\mathbb{R}^n$), it is clear that

$\frac{1}{m}\sum_{i=1}^m \psi_i^2 = \frac{1}{m}\sum_{i=1}^m |\langle a_i, \|x\|e_1\rangle|^2 = \Big(\frac{1}{m}\sum_{i=1}^m a_{i,1}^2\Big)\|x\|^2 \approx \|x\|^2$  (5)

whereby $\|x\|$ can be estimated as $\sqrt{\sum_{i=1}^m \psi_i^2/m}$. This estimate proves very accurate even with a limited number of data samples, because $\frac{1}{m}\sum_{i=1}^m a_{i,1}^2$ is unbiased and tightly concentrated.

The challenge thus lies in accurately estimating the direction of $x$, or seeking a unit vector maximally aligned with $x$. Toward this end, let us first present a variant of the initialization in [29]. Note that the larger the modulus $\psi_i$ of the inner product between $a_i$ and $x$ is, the more correlated the known design vector $a_i$ is deemed with the unknown solution $x$, hence bearing useful directional information about $x$. Inspired by this fact and having the data $\{(a_i; \psi_i)\}_{1\le i\le m}$ available, one can sort all (absolute) correlation coefficients $\{\psi_i\}_{1\le i\le m}$ in ascending order, yielding the ordered coefficients $0 < \psi_{[m]} \le \cdots \le \psi_{[2]} \le \psi_{[1]}$. Sorting $m$ records takes time proportional to $O(m\log m)$.$^2$

Let $\mathcal{S} \subseteq \mathcal{M}$ denote the set of selected feature vectors $a_i$ to be used for computing the initialization, which is designed next. Fix a priori the cardinality $|\mathcal{S}|$ to some integer on the order of $m$, say, $|\mathcal{S}| := \lfloor 3m/13\rfloor$. It is then natural to let $\mathcal{S}$ collect the $a_i$ vectors that correspond to the largest $|\mathcal{S}|$ correlation coefficients $\{\psi_{[i]}\}_{1\le i\le |\mathcal{S}|}$, each of which can be thought of as pointing to (roughly) the direction of $x$. Approximating the direction of $x$ therefore boils down to finding a vector that maximizes its correlation with the subset $\mathcal{S}$ of selected directional vectors $a_i$. Succinctly, the wanted approximation vector can be efficiently found as the solution of

$\underset{\|z\|=1}{\text{maximize}} \quad \frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{S}} |\langle a_i, z\rangle|^2 = z^*\Big(\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{S}} a_i a_i^*\Big)z$  (6)

where the superscript $*$ represents the transpose or the conjugate transpose, as will be clear from the context. Upon scaling the unit-norm solution of (6) by the norm estimate $\sqrt{\sum_{i=1}^m \psi_i^2/m}$ obtained in (5) to match the magnitude of $x$, we arrive at what we will henceforth refer to as the maximal correlation initialization.
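A minimal sketch of this (unweighted) maximal correlation initialization, assuming the default $|\mathcal{S}| = \lfloor 3m/13\rfloor$ and solving (6) with a dense eigendecomposition for clarity (names are ours):

    import numpy as np

    def maximal_correlation_init(A, psi, frac=3 / 13):
        m, n = A.shape
        s = int(np.floor(frac * m))        # |S| on the order of m
        S = np.argsort(psi)[-s:]           # indices of the |S| largest psi_i
        Y = A[S].T @ A[S] / s              # (1/|S|) sum_{i in S} a_i a_i^T, cf. (6)
        eigvecs = np.linalg.eigh(Y)[1]     # eigh orders eigenvalues ascending
        z_dir = eigvecs[:, -1]             # unit principal eigenvector
        return np.sqrt(np.mean(psi ** 2)) * z_dir   # scale by the estimate in (5)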
As long as $|\mathcal{S}|$ is chosen on the order of $m$, the maximal correlation method outperforms the spectral ones in [3, 17, 7], and has performance comparable to the orthogonality-promoting method [29]. Its performance around the information limit, however, is still not the best that we can hope for. Recall from (6) that all selected directional vectors $\{a_i\}_{i\in\mathcal{S}}$ are treated the same in terms of their contributions to constructing the initialization. Nevertheless, according to our starting principle, the ordering information carried by the selected $a_i$ vectors is not exploited by the initialization scheme in (6) and [29].

$^2$ $f(m) = O(g(m))$ means that there exists a constant $C > 0$ such that $|f(m)| \le C|g(m)|$.

In other words, if for $i, j \in \mathcal{S}$ the correlation coefficient $\psi_i$ of $a_i$ is larger than that $\psi_j$ of $a_j$, then $a_i$ is deemed more correlated (with $x$) than $a_j$ is, hence bearing more useful information about the direction of $x$. It is thus prudent to weigh more heavily the selected $a_i$ vectors associated with larger $\psi_i$ values. Given the ordering information $\psi_{[|\mathcal{S}|]} \le \cdots \le \psi_{[2]} \le \psi_{[1]}$ available from the sorting procedure, a natural way to achieve this goal is to weight each $a_i$ vector with a simple monotonically increasing function of $\psi_i$, say e.g., taking the weights $w_i^0 := \psi_i^\gamma$, $i \in \mathcal{S}$, with the exponent parameter $\gamma \ge 0$ chosen to maintain the wanted ordering $w_{[|\mathcal{S}|]}^0 \le \cdots \le w_{[2]}^0 \le w_{[1]}^0$. In a nutshell, a more flexible initialization strategy, which we refer to as weighted maximal correlation, can be summarized as follows:

$\tilde z^0 := \arg\max_{\|z\|=1} z^*\Big(\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{S}} \psi_i^\gamma a_i a_i^*\Big)z.$  (7)

For any given $\epsilon > 0$, the power method or the Lanczos algorithm can be called upon to find an $\epsilon$-accurate solution to (7) in time proportional to $O(n|\mathcal{S}|)$ [21], assuming a positive eigengap between the largest and the second largest eigenvalues of the matrix $\frac{1}{|\mathcal{S}|}\sum_{i\in\mathcal{S}} \psi_i^\gamma a_i a_i^*$, which is often true when the $\{a_i\}$ are sampled from a continuous distribution. The proposed initialization is obtained upon scaling $\tilde z^0$ from (7) by the norm estimate in (5), to yield $z^0 := \sqrt{\sum_{i=1}^m \psi_i^2/m}\,\tilde z^0$. By default, we take $\gamma := 1/2$ in all reported numerical implementations, yielding $w_i^0 := |\langle a_i, x\rangle|^{1/2}$ for all $i \in \mathcal{S}$.

Regarding the initialization procedure in (7), we next highlight two features, with technical details and theoretical performance guarantees provided in Section 3:

F1) The weights $\{w_i^0\}$ in the maximal correlation scheme enable leveraging the useful information that each feature vector $a_i$ may bear regarding the direction of $x$.

F2) Taking $w_i^0 := \psi_i^\gamma$ for all $i \in \mathcal{S}$ and $w_i^0 := 0$ otherwise, problem (7) can be equivalently rewritten as

$\tilde z^0 := \arg\max_{\|z\|=1} z^*\Big(\frac{1}{m}\sum_{i=1}^m w_i^0 a_i a_i^*\Big)z$  (8)

which subsumes previous initialization schemes under particular selections of the weights $\{w_i^0\}$. For instance, the spectral initialization in [17, 3] is recovered by choosing $\mathcal{S} := \mathcal{M}$ and $w_i^0 := \psi_i^2$ for all $1 \le i \le m$.

[Figure 1: Relative initialization error for i.i.d. $a_i \sim \mathcal{N}(0, I_{1{,}000})$, $1 \le i \le 1{,}999$; x-axis: signal dimension $n$ from 1,000 to 5,000 with $m = 2n-1$. Compared schemes: reweighted maximal correlation, spectral initialization, truncated spectral in TWF, orthogonality promoting, and truncated spectral in RWF.]

For comparison, define the

Relative error := $\mathrm{dist}(z, x)/\|x\|$.

Throughout the paper, all simulated results were averaged over 100 Monte Carlo (MC) realizations, and each simulated scheme was implemented with its pertinent default parameters. Figure 1 evaluates the performance of the developed initialization relative to several state-of-the-art strategies, with the information-limit number of data benchmarking the minimal number of samples required. It is clear that our initialization is: i) consistently better than the state-of-the-art; and ii) stable as $n$ grows, in contrast to the instability encountered by the spectral ones [17, 3, 7, 31]. It is worth stressing that the more than 5% empirical advantage (relative to the best alternative) at the challenging information-theoretic benchmark is nontrivial, and is one of the main RAF upshots. This advantage becomes increasingly pronounced as the ratio $m/n$ grows.
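A sketch of the weighted maximal correlation initialization in (7), run with power iterations so that the $n \times n$ matrix is never formed explicitly and the cost stays at $O(n|\mathcal{S}|)$ per pass (illustrative code under the stated defaults; names are ours):

    import numpy as np

    def weighted_maximal_correlation_init(A, psi, gamma=0.5, frac=3 / 13,
                                          n_power_iter=100, seed=1):
        m, n = A.shape
        s = int(np.floor(frac * m))
        S = np.argsort(psi)[-s:]               # |S| largest correlation coefficients
        w0 = psi[S] ** gamma                   # weights w_i^0 = psi_i^gamma, cf. (7)
        z = np.random.default_rng(seed).standard_normal(n)
        z /= np.linalg.norm(z)
        for _ in range(n_power_iter):          # power iterations on the weighted matrix
            z = A[S].T @ (w0 * (A[S] @ z)) / s
            z /= np.linalg.norm(z)
        return np.sqrt(np.mean(psi ** 2)) * z  # scale by the norm estimate in (5)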

2.2 Iteratively reweighted gradient flow

For independent data obeying the real Gaussian model, the direction that TAF moves along in stage S2) presented earlier is given by the following (generalized) gradient [29]:

$\frac{1}{m}\sum_{i\in\mathcal{T}} \nabla\ell(z; \psi_i) = \frac{1}{m}\sum_{i\in\mathcal{T}} \Big(a_i^{\top}z - \psi_i\frac{a_i^{\top}z}{|a_i^{\top}z|}\Big)a_i$  (9)

where the dependence on the iteration count $t$ is omitted for notational brevity, and the convention $a_i^{\top}z/|a_i^{\top}z| := 0$ is adopted when $a_i^{\top}z = 0$. Unfortunately, the (negative) gradient of the average in (9) generally may not point towards the true solution $x$ unless the current iterate $z$ is already very close to $x$. Therefore, moving along such a descent direction may not drag $z$ closer to $x$. To see this, consider an initial guess $z^0$ that already lies in a basin of attraction (i.e., a region within which there is only a unique stationary point) of $x$. Certainly, there are summands $(a_i^{\top}z - \psi_i\frac{a_i^{\top}z}{|a_i^{\top}z|})a_i$ in (9) that could give rise to bad/misleading gradient directions, due to the erroneously estimated signs $\frac{a_i^{\top}z}{|a_i^{\top}z|} \ne \frac{a_i^{\top}x}{|a_i^{\top}x|}$ [29], or $(a_i^{\top}z)(a_i^{\top}x) < 0$ [31]. Those gradients as a whole may drag $z$ away from $x$, and hence out of the basin of attraction. Such an effect becomes increasingly severe as $m$ approaches the information-theoretic limit of $2n - 1$, thus rendering past approaches less effective in this regime. Although this issue is somewhat remedied by TAF with a truncation procedure, its efficacy is limited, due both to misses of bad gradients and to mis-rejections of meaningful ones around the information limit.

To address this challenge, the reweighted amplitude flow adopts suitable gradient directions from all data samples $\{(a_i; \psi_i)\}_{1\le i\le m}$ in a (timely) adaptive fashion, namely by introducing appropriate weights for all gradients to yield the update

$z^{t+1} = z^t - \mu_t \nabla\ell_{\mathrm{rw}}(z^t), \quad t = 0, 1, \ldots$  (10)

The reweighted gradient $\nabla\ell_{\mathrm{rw}}(z^t)$ evaluated at the current point $z^t$ is given as

$\nabla\ell_{\mathrm{rw}}(z) := \frac{1}{m}\sum_{i=1}^m w_i\nabla\ell(z; \psi_i)$  (11)

for suitable weights $\{w_i\}_{1\le i\le m}$ to be designed next. To that end, we observe that the truncation criterion [29]

$\mathcal{T} := \Big\{1 \le i \le m : \frac{|a_i^{\top}z|}{|a_i^{\top}x|} \ge \alpha\Big\}$  (12)

with some given parameter $\alpha > 0$ suggests including only gradient components associated with $|a_i^{\top}z|$ of relatively large size. This is because gradients with sizable $|a_i^{\top}z|/|a_i^{\top}x|$ offer reliable and meaningful directions pointing to the truth $x$ with large probability [29]. As such, the ratio $|a_i^{\top}z|/|a_i^{\top}x|$ can be viewed as a confidence score on the reliability or meaningfulness of the corresponding gradient $\nabla\ell(z; \psi_i)$. Recognizing that confidence can vary, it is natural to distinguish the contributions that different gradients make to the overall search direction. An easy way is to attach large weights to the reliable gradients, and small weights to the spurious ones. Assume without loss of generality that $0 \le w_i \le 1$ for all $1 \le i \le m$; otherwise, lump the normalization factor achieving this into the learning rate $\mu_t$. Building upon this observation and leveraging the gradient reliability confidence score $|a_i^{\top}z|/|a_i^{\top}x|$, the weight for the gradient $\nabla\ell(z; \psi_i)$ in RAF is designed to be

$w_i := \frac{1}{1 + \beta_i/\big(|a_i^{\top}z|/|a_i^{\top}x|\big)}, \quad 1 \le i \le m$  (13)

in which $\{\beta_i > 0\}_{1\le i\le m}$ are some pre-selected parameters.

Regarding the proposed weighting criterion in (13), three remarks are in order (a code sketch of the rule follows the remarks), after which the RAF algorithm is summarized in Algorithm 1.

R1) The weights $\{w_i^t\}_{1\le i\le m}$ are time adapted to $z^t$. One can also interpret the reweighted gradient flow update $z^{t+1}$ in (10) as performing a single gradient step to minimize the smooth reweighted loss $\frac{1}{m}\sum_{i=1}^m w_i^t\ell(z; \psi_i)$ with starting point $z^t$; see also [4] for related ideas successfully exploited in the iteratively reweighted least-squares approach to compressive sampling.

R2) Note that the larger $|a_i^{\top}z|/|a_i^{\top}x|$ is, the larger $w_i$ will be: more importance is attached to reliable gradients than to spurious ones.
Gradients from almost all data points are judiciously accounted for, which is in sharp contrast to [29], where withdrawn gradients do not contribute the information they carry.

R3) At points $z$ where $a_i^{\top}z = 0$ for certain $i \in \mathcal{M}$, the corresponding weight will be $w_i = 0$. That is, the losses $\ell(z; \psi_i)$ in (2) that are nonsmooth at such points $z$ are eliminated, preventing their contribution to the reweighted gradient update in (10). Hence, the convergence analysis of RAF can be considerably simplified, because it does not have to cope with the nonsmoothness of the objective function in (2).
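In code, the weighting rule (13) is one line; in the noiseless setting the (unknown) $|a_i^{\top}x|$ is exactly the observed $\psi_i$, which is the substitution Algorithm 1 uses (a sketch with illustrative names; $\psi_i > 0$ is assumed):

    import numpy as np

    def raf_weights(z, A, psi, beta=10.0):
        # w_i = 1 / (1 + beta_i / (|a_i^T z| / |a_i^T x|)), cf. (13), with
        # |a_i^T x| replaced by psi_i; algebraically equal to ratio/(ratio + beta),
        # which also returns w_i = 0 whenever a_i^T z = 0 (remark R3)
        ratio = np.abs(A @ z) / psi
        return ratio / (ratio + beta)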

2.3 Algorithmic parameters

To optimize the empirical performance and to facilitate numerical implementations, the choice of pertinent algorithmic parameters of RAF is discussed here. The RAF algorithm entails four parameters. Our theory and all experiments are based on: i) $|\mathcal{S}|/m \le 0.25$; ii) $0 \le \beta_i \le 10$ for all $1 \le i \le m$; and iii) $0 \le \gamma \le 1$. For convenience, a constant step size $\mu_t \equiv \mu > 0$ is suggested, but other step size rules, such as backtracking line search with the reweighted objective, work as well. As will be formalized in Section 3, RAF converges if the constant $\mu$ is not too large, with the upper bound depending in part on the selection of $\{\beta_i\}_{1\le i\le m}$. In the numerical tests presented in Sections 2 and 4, we take

$|\mathcal{S}| := \lfloor 3m/13\rfloor$, $\beta_i \equiv \beta := 10$, $\gamma := 0.5$, and $\mu := 2$  (14)

while larger step sizes $\mu > 0$ can be afforded for larger $m/n$ values.

Algorithm 1 Reweighted Amplitude Flow
1: Input: Data $\{(a_i; \psi_i)\}_{1\le i\le m}$; maximum number of iterations $T$; step size $\mu_t = 2/6$ and weighting parameter $\beta_i = 10/5$ for the real/complex Gaussian model; $|\mathcal{S}| = \lfloor 3m/13\rfloor$, and $\gamma = 0.5$.
2: Construct $\mathcal{S}$ to include the indices associated with the $|\mathcal{S}|$ largest entries among $\{\psi_i\}_{1\le i\le m}$.
3: Initialize $z^0 := \sqrt{\sum_{i=1}^m \psi_i^2/m}\,\tilde z^0$, with $\tilde z^0$ being the unit principal eigenvector of

$Y := \frac{1}{m}\sum_{i=1}^m w_i^0 a_i a_i^{\top}$  (15)

where $w_i^0 := \psi_i^\gamma$ for $i \in \mathcal{S} \subseteq \mathcal{M}$, and $w_i^0 := 0$ otherwise.
4: Loop: for $t = 0$ to $T - 1$,

$z^{t+1} = z^t - \frac{\mu_t}{m}\sum_{i=1}^m w_i^t\Big(a_i^{\top}z^t - \psi_i\frac{a_i^{\top}z^t}{|a_i^{\top}z^t|}\Big)a_i$  (16)

where $w_i^t := \frac{|a_i^{\top}z^t|/\psi_i}{|a_i^{\top}z^t|/\psi_i + \beta_i}$ for all $1 \le i \le m$.
5: Output: $z^T$.
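Putting the pieces together, a compact sketch of Algorithm 1 for the real Gaussian model, reusing the initialization and weighting sketches above (ours, not the authors' released MATLAB code):

    import numpy as np

    def raf(A, psi, T=2000, mu=2.0, beta=10.0, gamma=0.5, frac=3 / 13):
        m, n = A.shape
        # Steps 2-3: weighted maximal correlation initialization, cf. (15)
        z = weighted_maximal_correlation_init(A, psi, gamma, frac)
        # Step 4: iteratively reweighted gradient iterations, cf. (16)
        for _ in range(T):
            Az = A @ z
            ratio = np.abs(Az) / psi
            w = ratio / (ratio + beta)                 # weights w_i^t
            z -= (mu / m) * (A.T @ (w * (Az - psi * np.sign(Az))))
        return z

On the synthetic system generated earlier, dist(raf(A, psi), x) / np.linalg.norm(x) should be vanishingly small once $m/n$ is sufficiently large.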

3 Main results

Our main results, summarized in Theorem 1 next, establish exact recovery under the real Gaussian model; the proof is provided in the supplementary material. Our RAF approach, however, generalizes readily to the complex Gaussian and CDP models.

Theorem 1 (Exact recovery) Consider $m$ noiseless measurements $\psi = |Ax|$ for an arbitrary $x \in \mathbb{R}^n$. If the data size $m \ge c_0|\mathcal{S}| \ge c_1 n$ and the step size $\mu \le \mu_0$, then with probability at least $1 - c_3 e^{-c_2 m}$, the reweighted amplitude flow estimates $z^t$ in Algorithm 1 obey

$\mathrm{dist}(z^t, x) \le \frac{1}{10}(1 - \nu)^t\|x\|, \quad t = 0, 1, \ldots$  (17)

where $c_0, c_1, c_2, c_3 > 0$, $0 < \nu < 1$, and $\mu_0 > 0$ are certain numerical constants depending on the choice of the algorithmic parameters $|\mathcal{S}|$, $\beta$, $\gamma$, and $\mu$.

According to Theorem 1, a few interesting properties of our RAF algorithm are worth highlighting. To start, RAF recovers the true solution exactly with high probability whenever the ratio $m/n$ of the number of equations to the number of unknowns exceeds some numerical constant. Expressed differently, RAF achieves the information-theoretically optimal order of sample complexity, which is consistent with the state-of-the-art including TWF [7], TAF [29], and RWF [31]. Notice that (17) also holds at $t = 0$, namely, $\mathrm{dist}(z^0, x) \le \|x\|/10$, therefore providing performance guarantees for the proposed initialization scheme (cf. Step 3 in Algorithm 1). Moreover, starting from this initial estimate, RAF converges linearly to the true solution $x$. That is, to reach any $\epsilon$-relative solution accuracy (i.e., $\mathrm{dist}(z^T, x) \le \epsilon\|x\|$), it suffices to run at most $T = O(\log(1/\epsilon))$ RAF iterations (cf. Step 4). This, in conjunction with the per-iteration complexity $O(mn)$, confirms that RAF solves a quadratic system exactly in time $O(mn\log(1/\epsilon))$, which is linear in $O(mn)$, the time required to read the entire data $\{(a_i; \psi_i)\}_{1\le i\le m}$. Given that the initialization stage can be performed in time $O(n|\mathcal{S}|)$ with $|\mathcal{S}| < m$, the overall linear-time complexity of RAF is order-optimal.

4 Simulated tests

Our theoretical findings about RAF have been corroborated with comprehensive numerical tests, a sample of which is discussed next. The performance of RAF is evaluated relative to the state-of-the-art (T)WF, RWF, and TAF in terms of the empirical success rate among 100 MC trials, where a success is declared for a trial if the returned estimate incurs an error $\|\psi - |Az^T|\| \le 10^{-5}\|x\|$, with the modulus operator $|\cdot|$ understood element-wise. The real Gaussian model and physically realizable CDPs were simulated in this section. For fairness, all schemes were implemented with their suggested parameter values. The true signal vector was randomly generated as $x \sim \mathcal{N}(0, I_n)$, and the i.i.d. sensing vectors as $a_i \sim \mathcal{N}(0, I_n)$. Each scheme obtained its initial guess based on 200 power iterations, followed by a series of $T = 2{,}000$ (truncated/reweighted) gradient iterations. All experiments were performed using MATLAB on an Intel 3.4 GHz (32 GB RAM) computer. For reproducibility, the MATLAB code of the RAF algorithm is publicly available.

To demonstrate the power of RAF in the high-dimensional regime, the function value $L(z)$ in (2) evaluated at the returned estimate $z^T$ for 100 independent trials is plotted (on a negative logarithmic scale) in Fig. 2, where $m = 2n - 1 = 9{,}999$. It is self-evident that RAF succeeded in all trials, even at this challenging information limit. To the best of our knowledge, RAF is the first algorithm that empirically recovers the solution exactly from a minimal number of random quadratic equations.

[Figure 2: Function value $L(z^T)$ attained by RAF, plotted as $-\log_{10} L(z^T)$ versus realization number, for 100 MC realizations when $m = 2n - 1$.]

The left panel of Fig. 3 further compares the empirical success rate of five schemes under the real Gaussian model with $n = 1{,}000$ and $m/n$ varying by 0.1 from 1 to 5. Evidently, the developed RAF achieves perfect recovery as soon as $m$ is about $2n$, where its competing alternatives do not work well. To demonstrate the stability and robustness of RAF in the presence of additive noise, the right panel of Fig. 3 depicts the normalized mean-square error

NMSE := $\mathrm{dist}^2(z^T, x)/\|x\|^2$

as a function of the signal-to-noise ratio (SNR) for $m/n$ taking values $\{3, 4, 5\}$. The noise model $\psi_i = |\langle a_i, x\rangle| + \eta_i$, $1 \le i \le m$, with $\eta := [\eta_i]_{1\le i\le m} \sim \mathcal{N}(0, \sigma^2 I_m)$ was employed, where $\sigma^2$ was set such that the SNR $:= 10\log_{10}(\|Ax\|^2/(m\sigma^2))$ values on the x-axis were achieved.

To examine the efficacy and scalability of RAF under real-world conditions, the last experiment entails the Galaxy image, depicted by a three-way array $X \in \mathbb{R}^{1{,}080\times 1{,}920\times 3}$ whose first two coordinates encode the pixel locations, and the third the RGB color bands. Consider the physically realizable CDP model with random masks [3]. Letting $x \in \mathbb{R}^n$ ($n = 1{,}080 \times 1{,}920$) be a vectorization of a certain band of $X$, the CDP model with $K$ masks is

$\psi^{(k)} = |FD^{(k)}x|, \quad 1 \le k \le K$

where $F \in \mathbb{C}^{n\times n}$ is a discrete Fourier transform matrix, and the diagonal matrices $D^{(k)}$ have their diagonal entries sampled uniformly at random from $\{1, -1, j, -j\}$ with $j := \sqrt{-1}$. Each $D^{(k)}$ represents a random mask placed after the object to modulate the illumination patterns [5].
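The CDP measurements admit a fast implementation since applying $F$ amounts to an FFT; a sketch (ours, using NumPy's unnormalized DFT convention for $F$) that generates the $K$ mask outputs for a given vectorized band $x$:

    import numpy as np

    def cdp_measurements(x, K=4, seed=0):
        # psi^(k) = |F D^(k) x|: np.fft.fft applies the (unnormalized) n x n DFT
        # matrix F, and each mask D^(k) has i.i.d. diagonal entries drawn
        # uniformly from {1, -1, j, -j}
        rng = np.random.default_rng(seed)
        masks = rng.choice(np.array([1, -1, 1j, -1j]), size=(K, x.size))
        psi = np.stack([np.abs(np.fft.fft(d * x)) for d in masks])
        return psi, masks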

[Figure 3: Real Gaussian model: empirical success rate versus $m/n$ for $x \in \mathbb{R}^{1{,}000}$ (left); and NMSE versus SNR (dB) for $m = 3n, 4n, 5n$ (right). Compared schemes: RAF, TAF, TWF, RWF, and WF.]

Implementing $K = 4$ masks, each algorithm performs 100 power iterations for an initial guess, independently over each band, which is then refined by 100 gradient iterations. The images recovered by TAF (left) and RAF (right) are displayed in Fig. 4; WF and TWF returned images whose relative errors place them far from the ground truth.

Figure 4: Recovered Galaxy images after 100 gradient iterations of TAF (left) and of RAF (right).

5 Conclusion

This paper developed a linear-time algorithm called RAF for solving systems of random quadratic equations. Our procedure consists of two stages: a weighted maximal correlation initializer attainable with a few power or Lanczos iterations, and a sequence of scalable reweighted gradient refinements for a nonconvex nonsmooth least-squares loss function. It was demonstrated that RAF achieves optimal sample and computational complexity. Judicious numerical tests showcase its superior performance over state-of-the-art alternatives. Empirically, RAF solves a set of random quadratic equations with high probability so long as a unique solution exists. Promising extensions include studying robust and/or sparse phase retrieval and (semi-definite) matrix recovery via (stochastic) reweighted amplitude flow counterparts, and in particular exploiting the power of (re)weighting regularization techniques to enable more general nonconvex optimization, such as training deep neural networks [19].

Acknowledgments

Work in this paper was partially supported by NSF grants.

References

[1] R. Balan, P. Casazza, and D. Edidin, "On signal reconstruction without phase," Appl. Comput. Harmon. Anal., vol. 20, no. 3, May 2006.

[2] A. Ben-Tal and A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. SIAM, 2001, vol. 2.

[3] E. J. Candès, X. Li, and M. Soltanolkotabi, "Phase retrieval via Wirtinger flow: Theory and algorithms," IEEE Trans. Inf. Theory, vol. 61, no. 4, Apr. 2015.

[4] E. J. Candès, T. Strohmer, and V. Voroninski, "PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming," Commun. Pure Appl. Math., vol. 66, no. 8, Nov. 2013.

[5] E. J. Candès, X. Li, and M. Soltanolkotabi, "Phase retrieval from coded diffraction patterns," Appl. Comput. Harmon. Anal., vol. 39, no. 2, Sept. 2015.

[6] J. Chen, L. Wang, X. Zhang, and Q. Gu, "Robust Wirtinger flow for phase retrieval with arbitrary corruption," arXiv preprint.

[7] Y. Chen and E. J. Candès, "Solving random quadratic systems of equations is nearly as easy as solving linear systems," in Adv. Neural Inf. Process. Syst., Montréal, Canada, 2015.

[8] A. Conca, D. Edidin, M. Hering, and C. Vinzant, "An algebraic characterization of injectivity in phase retrieval," Appl. Comput. Harmon. Anal., vol. 38, no. 2, Mar. 2015.

[9] J. C. Duchi and F. Ruan, "Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval," arXiv preprint.

[10] J. Duchi and F. Ruan, "Stochastic methods for composite optimization problems," arXiv preprint.

[11] J. R. Fienup, "Phase retrieval algorithms: A comparison," Appl. Opt., vol. 21, no. 15, Aug. 1982.

[12] R. W. Gerchberg and W. O. Saxton, "A practical algorithm for the determination of phase from image and diffraction plane pictures," Optik, vol. 35, Nov. 1972.

[13] T. Goldstein and C. Studer, "PhaseMax: Convex phase retrieval via basis pursuit," arXiv preprint.

[14] P. Hand and V. Voroninski, "An elementary proof of convex phase retrieval in the natural parameter space via the linear program PhaseMax," arXiv preprint.

[15] R. H. Keshavan, A. Montanari, and S. Oh, "Matrix completion from a few entries," IEEE Trans. Inf. Theory, vol. 56, no. 6, Jun. 2010.

[16] Y. M. Lu and G. Li, "Phase transitions of spectral initialization for high-dimensional nonconvex estimation," arXiv preprint.

[17] P. Netrapalli, P. Jain, and S. Sanghavi, "Phase retrieval using alternating minimization," in Adv. Neural Inf. Process. Syst., Stateline, NV, 2013.

[18] P. M. Pardalos and S. A. Vavasis, "Quadratic programming with one negative eigenvalue is NP-hard," J. Global Optim., vol. 1, no. 1, 1991.

[19] G. Pereyra, G. Tucker, J. Chorowski, Ł. Kaiser, and G. Hinton, "Regularizing neural networks by penalizing confident output distributions," arXiv preprint.

[20] J. R. Rice, Numerical Methods in Software and Analysis. Academic Press.

[21] Y. Saad, Numerical Methods for Large Eigenvalue Problems: Revised Edition. SIAM, 2011.

[22] Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, "Phase retrieval with application to optical imaging: A contemporary overview," IEEE Signal Process. Mag., vol. 32, no. 3, May 2015.

[23] M. Soltanolkotabi, "Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization," arXiv preprint.

[24] J. Sun, Q. Qu, and J. Wright, "A geometric analysis of phase retrieval," Found. Comput. Math., 2017 (to appear); see also the arXiv preprint.

[25] I. Waldspurger, "Phase retrieval with random Gaussian sensing vectors by alternating projections," arXiv preprint.

[26] I. Waldspurger, A. d'Aspremont, and S. Mallat, "Phase recovery, MaxCut and complex semidefinite programming," Math. Program., vol. 149, no. 1, 2015.

[27] G. Wang and G. B. Giannakis, "Solving random systems of quadratic equations via truncated generalized gradient flow," in Adv. Neural Inf. Process. Syst., Barcelona, Spain, 2016.

[28] G. Wang, G. B. Giannakis, and J. Chen, "Scalable solvers of random quadratic equations via stochastic truncated amplitude flow," IEEE Trans. Signal Process., vol. 65, no. 8, Apr. 2017.

[29] G. Wang, G. B. Giannakis, and Y. C. Eldar, "Solving systems of random quadratic equations via truncated amplitude flow," IEEE Trans. Inf. Theory, 2017 (to appear); see also the arXiv preprint.

[30] X. Yi, C. Caramanis, and S. Sanghavi, "Alternating minimization for mixed linear regression," in Proc. Intl. Conf. Mach. Learn., Beijing, China, 2014.

[31] H. Zhang, Y. Zhou, Y. Liang, and Y. Chi, "Reshaped Wirtinger flow and incremental algorithms for solving quadratic systems of equations," J. Mach. Learn. Res., 2017 (to appear); see also the arXiv preprint.


More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Bootstrapping Dependent Data

Bootstrapping Dependent Data Bootstrapping Dependent Data One of the key issues confronting bootstrap resapling approxiations is how to deal with dependent data. Consider a sequence fx t g n t= of dependent rando variables. Clearly

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

Multi-view Discriminative Manifold Embedding for Pattern Classification

Multi-view Discriminative Manifold Embedding for Pattern Classification Multi-view Discriinative Manifold Ebedding for Pattern Classification X. Wang Departen of Inforation Zhenghzou 450053, China Y. Guo Departent of Digestive Zhengzhou 450053, China Z. Wang Henan University

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a

More information

arxiv: v1 [cs.lg] 8 Jan 2019

arxiv: v1 [cs.lg] 8 Jan 2019 Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v

More information

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x)

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x) 7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Identical Maximum Likelihood State Estimation Based on Incremental Finite Mixture Model in PHD Filter

Identical Maximum Likelihood State Estimation Based on Incremental Finite Mixture Model in PHD Filter Identical Maxiu Lielihood State Estiation Based on Increental Finite Mixture Model in PHD Filter Gang Wu Eail: xjtuwugang@gail.co Jing Liu Eail: elelj20080730@ail.xjtu.edu.cn Chongzhao Han Eail: czhan@ail.xjtu.edu.cn

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

A remark on a success rate model for DPA and CPA

A remark on a success rate model for DPA and CPA A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance

More information

Support Vector Machines. Goals for the lecture

Support Vector Machines. Goals for the lecture Support Vector Machines Mark Craven and David Page Coputer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Soe of the slides in these lectures have been adapted/borrowed fro aterials developed

More information

A Nonlinear Sparsity Promoting Formulation and Algorithm for Full Waveform Inversion

A Nonlinear Sparsity Promoting Formulation and Algorithm for Full Waveform Inversion A Nonlinear Sparsity Prooting Forulation and Algorith for Full Wavefor Inversion Aleksandr Aravkin, Tristan van Leeuwen, Jaes V. Burke 2 and Felix Herrann Dept. of Earth and Ocean sciences University of

More information

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

Reshaped Wirtinger Flow for Solving Quadratic System of Equations Reshaped Wirtinger Flow for Solving Quadratic System of Equations Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 344 hzhan3@syr.edu Yingbin Liang Department of EECS Syracuse University

More information

Tracking using CONDENSATION: Conditional Density Propagation

Tracking using CONDENSATION: Conditional Density Propagation Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation

More information

Statistical clustering and Mineral Spectral Unmixing in Aviris Hyperspectral Image of Cuprite, NV

Statistical clustering and Mineral Spectral Unmixing in Aviris Hyperspectral Image of Cuprite, NV CS229 REPORT, DECEMBER 05 1 Statistical clustering and Mineral Spectral Unixing in Aviris Hyperspectral Iage of Cuprite, NV Mario Parente, Argyris Zynis I. INTRODUCTION Hyperspectral Iaging is a technique

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

Supplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators

Supplementary Information for Design of Bending Multi-Layer Electroactive Polymer Actuators Suppleentary Inforation for Design of Bending Multi-Layer Electroactive Polyer Actuators Bavani Balakrisnan, Alek Nacev, and Elisabeth Sela University of Maryland, College Park, Maryland 074 1 Analytical

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE

MSEC MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL SOLUTION FOR MAINTENANCE AND PERFORMANCE Proceeding of the ASME 9 International Manufacturing Science and Engineering Conference MSEC9 October 4-7, 9, West Lafayette, Indiana, USA MSEC9-8466 MODELING OF DEGRADATION PROCESSES TO OBTAIN AN OPTIMAL

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space

Grafting: Fast, Incremental Feature Selection by Gradient Descent in Function Space Journal of Machine Learning Research 3 (2003) 1333-1356 Subitted 5/02; Published 3/03 Grafting: Fast, Increental Feature Selection by Gradient Descent in Function Space Sion Perkins Space and Reote Sensing

More information