Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow


Huishuai Zhang, Department of EECS, Syracuse University, Syracuse, NY 13244 USA
Yuejie Chi, Department of ECE, The Ohio State University, Columbus, OH 43210 USA
Yingbin Liang, Department of EECS, Syracuse University, Syracuse, NY 13244 USA

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).

Abstract

Solving systems of quadratic equations is a central problem in machine learning and signal processing. One important example is phase retrieval, which aims to recover a signal from only the magnitudes of its linear measurements. This paper focuses on the situation where the measurements are corrupted by arbitrary outliers, for which the recently developed non-convex gradient descent algorithms Wirtinger flow (WF) and truncated Wirtinger flow (TWF) are likely to fail. We develop a novel median-TWF algorithm that exploits the robustness of the sample median to resist arbitrary outliers in the initialization and in the gradient update of each iteration. We show that such a non-convex algorithm provably recovers the signal from a near-optimal number of measurements composed of i.i.d. Gaussian entries, up to a logarithmic factor, even when a constant fraction of the measurements is corrupted by arbitrary outliers. We further show that median-TWF is also robust when the measurements are corrupted by both arbitrary outliers and bounded noise. Our analysis of the performance guarantee is accomplished by developing non-trivial concentration results for median-related quantities, which may be of independent interest. We further provide numerical experiments to demonstrate the effectiveness of the approach.

1. Introduction

Phase retrieval is a classical problem in machine learning, signal processing and optical imaging, where one aims to recover a signal x in R^n from only the magnitudes of its linear measurements: y_i = |<a_i, x>|^2, i = 1, ..., m. It has many important applications such as X-ray crystallography (Drenth, 2007), but is known to be notoriously difficult due to the quadratic form of the measurements. Classical methods based on alternating minimization between the signal of interest and the phase information (Fienup, 1982), though computationally simple, are often trapped at local minima and lack rigorous performance guarantees. Using the lifting trick, the phase retrieval problem can be reformulated as estimating a rank-one positive semidefinite matrix X = xx^T from linear measurements (Balan et al., 2006), to which convex relaxations into semidefinite programming have been applied (Waldspurger et al., 2015; Candès et al., 2013; Chen et al., 2015; Demanet & Hand, 2014; Candès & Li, 2014; Li & Voroninski, 2013). In particular, when the measurement vectors a_i are composed of i.i.d. Gaussian entries, PhaseLift (Candès et al., 2013) perfectly recovers all x in R^n with high probability as long as the number of measurements m is on the order of n. However, the computational cost of PhaseLift becomes prohibitive when the signal dimension is large. Appealingly, the so-called Wirtinger flow (WF) algorithm based on gradient descent was recently proposed in (Candès et al., 2015; Soltanolkotabi, 2014) and shown to work remarkably well: it converges to the global optimum when properly initialized using the spectral method.
The truncated Wirtinger flow (TWF) algorithm (Chen & Candès, 2015) further improves WF by eliminating samples whose contributions to the initialization or to the search direction deviate excessively from the sample mean, so that the behavior of each gradient update is well controlled. TWF is shown to converge globally at a geometric rate using a constant step size as long as m is on the order of n for i.i.d. Gaussian measurement vectors. Both the WF and TWF algorithms have

been shown to be robust to bounded noise in the measurements. However, the performance of WF and TWF can be very sensitive to outliers that take arbitrary values and can introduce anomalous search directions. Even for TWF, since the sample mean can be arbitrarily perturbed, a truncation rule based on the sample mean cannot control the gradient well. On the other hand, the ability to handle outliers is of great importance for phase retrieval algorithms, because outliers arise frequently in phase imaging applications (Weller et al., 2015) due to various reasons such as detector failures, recording errors, and missing data. While a form of PhaseLift (Hand, 2015) is shown to be robust to sparse outliers even when they constitute a constant fraction of all measurements, it is computationally too expensive.

1.1. Main Contributions

The main contribution of this paper lies in the development of a non-convex phase retrieval algorithm with both statistical and computational efficiency, and provable robustness to even a constant proportion of outliers. To the best of the authors' knowledge, our work is the first application of the median to robustify high-dimensional statistical estimation in the presence of arbitrary outliers with rigorous non-asymptotic performance guarantees. Our strategy is to carefully robustify the TWF algorithm by replacing the sample mean used in the truncation rule with its robust counterpart, the sample median. We refer to the new algorithm as median truncated Wirtinger flow (median-TWF). Appealingly, median-TWF does not require any knowledge of the outliers. The robustness of the median lies in the fact that the median cannot be arbitrarily perturbed unless the outliers dominate the inliers (Huber, 2011). This is in sharp contrast to the mean, which can be made arbitrarily large even by a single outlier. Thus, using the sample median in the truncation rule can effectively remove the impact of outliers, and indeed, the performance of median-TWF can be provably guaranteed. Statistically, the sample complexity of median-TWF is near-optimal up to a logarithmic factor when the measurement vectors are composed of i.i.d. Gaussian entries. We demonstrate that as soon as the number of measurements m is on the order of n log n, median-TWF converges to the global optimum, i.e., recovers the ground truth up to a global sign difference, even when the number of outliers scales linearly with m. Computationally, median-TWF converges at a geometric rate, requiring a computational cost of O(mn log(1/ε)) to reach ε-accuracy, which is linear in the problem size. Reassuringly, under the same sample complexity, median-TWF still recovers the ground truth when outliers are absent; it can therefore handle outliers in an oblivious fashion. Finally, median-TWF is also stable when the measurements are further corrupted by dense bounded noise besides outliers.
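The robustness property invoked above can be seen in a toy computation. The following sketch is not taken from the paper; it simply contrasts how a single arbitrary outlier affects the sample mean and the sample median of otherwise clean samples.

```python
import numpy as np

# A single arbitrarily large outlier can move the sample mean by an arbitrary
# amount, while the sample median stays within the range of the clean samples.
rng = np.random.default_rng(0)
clean = rng.standard_normal(1000) ** 2        # clean nonnegative samples
corrupted = clean.copy()
corrupted[0] = 1e8                            # one arbitrary outlier

print(np.mean(clean), np.mean(corrupted))     # mean shifts by roughly 1e5
print(np.median(clean), np.median(corrupted)) # median barely moves
```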
Our proof proceeds by first showing that the initialization of median-TWF is close enough to the ground truth, and that the neighborhood of the ground truth where the initialization lands satisfies a certain Regularity Condition (Candès et al., 2015; Chen & Candès, 2015) that guarantees convergence of the descent rule, as long as the size of the corruption is small enough and the sample size is large enough. However, as a nonlinear operator, the sample median used in median-TWF is much more difficult to analyze than the sample mean used in TWF, which is a linear operator to which many existing concentration inequalities are readily applicable. Considerable technical effort lies in developing novel non-asymptotic concentration results for the sample median, and various statistical properties of sample-median-related quantities, which may be of independent interest.

1.2. Related Work

The adoption of the median in machine learning and computer science is not unfamiliar, for example, k-median clustering (Chen, 2006) and resilient data aggregation for sensor networks (Wagner, 2004). Our work here further extends the applications of the median to robustifying high-dimensional estimation problems. Another popular approach in robust estimation is to use the trimmed mean (Huber, 2011), which has found success in robustifying sparse regression (Chen et al., 2013), linear regression (Bhatia et al., 2015), subspace clustering (Qu & Xu, 2015), etc. However, using the trimmed mean requires knowledge of an upper bound on the number of outliers, whereas the median does not require such information. Developing non-convex algorithms with provable global convergence guarantees has attracted intensive research interest recently. A partial list includes low-rank matrix recovery (Keshavan et al., 2010; Zheng & Lafferty, 2015; Tu et al., 2015; Chen & Wainwright, 2015; White et al., 2015), robust principal component analysis (Netrapalli et al., 2014), robust tensor decomposition (Anandkumar et al., 2015), dictionary learning (Arora et al., 2015; Sun et al., 2015), etc. We expect our analysis in this paper may be extended to robustly solving other systems of quadratic or bilinear equations in a non-convex fashion, such as mixed linear regression (Chen et al., 2014), sparse phase retrieval (Cai et al., 2015), and blind deconvolution (Lee et al., 2015).

1.3. Paper Organization and Notations

The rest of this paper is organized as follows. Section 2 provides the problem formulation, and Section 3 describes the proposed median-TWF algorithm. Theoretical performance guarantees are stated in Section 4, with proof outlines given in Section 5. Numerical experiments are presented in Section 6. Finally, we conclude in Section 7. We adopt the following notations in this paper.

Given a vector of numbers {β_i}_{i=1}^m, the sample median is denoted as med({β_i}_{i=1}^m). The indicator 1{A} equals 1 if the event A holds, and 0 otherwise. For two matrices, A is dominated by B (A ⪯ B) if B − A is a positive semidefinite matrix. We define the Euclidean distance between two vectors up to a global sign difference as dist(z, x) := min{||z − x||, ||z + x||}.

2. Problem Formulation

Suppose the following set of measurements is given:

    y_i = |<a_i, x>|^2 + η_i,  i = 1, ..., m,    (1)

where x in R^n is the unknown signal, a_i in R^n for i = 1, ..., m are measurement vectors with i.i.d. Gaussian entries distributed as N(0, 1), and η_i in R for i = 1, ..., m are outliers with arbitrary values. (We focus on real signals here, but our analysis can be extended to complex signals.) We assume that the outliers are sparse with at most s·m nonzero values, i.e., ||η||_0 ≤ s·m, where η = {η_i}_{i=1}^m. Here, s is a nonzero constant representing the fraction of measurements that are corrupted by outliers. We are also interested in the model where the measurements are corrupted not only by sparse arbitrary outliers but also by dense bounded noise. Under such a model, the measurements are given by

    y_i = |<a_i, x>|^2 + w_i + η_i,  i = 1, ..., m,    (2)

where the bounded noise w = {w_i}_{i=1}^m satisfies ||w||_inf ≤ c||x||^2 for some universal constant c, and, as before, the outliers satisfy ||η||_0 ≤ s·m. The goal is to recover the signal x (up to a global sign difference) from the measurements y = {y_i}_{i=1}^m and the measurement vectors {a_i}_{i=1}^m.
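For concreteness, a small sketch of how data from model (1) can be simulated is given below. The function name and the choice of uniformly distributed outlier magnitudes are assumptions made for illustration (they match the experiments in Section 6); the model itself allows arbitrary outlier values.

```python
import numpy as np

def generate_measurements(x, m, s=0.0, eta_max=0.0, rng=None):
    """Simulate model (1): y_i = <a_i, x>^2 + eta_i with i.i.d. Gaussian a_i and
    an s-fraction of outliers; here the outliers are drawn uniformly in [0, eta_max]."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    A = rng.standard_normal((m, n))        # rows are the measurement vectors a_i
    y = (A @ x) ** 2                       # clean quadratic measurements
    corrupt = rng.random(m) < s            # each measurement corrupted w.p. s
    y[corrupt] += rng.uniform(0.0, eta_max, corrupt.sum())
    return A, y
```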
3. Median-TWF Algorithm

A natural idea is to recover the signal as a solution to the following optimization problem:

    min_z  −(1/m) Σ_{i=1}^m ℓ(z; y_i),    (3)

where ℓ(z; y_i) is a log-likelihood function, e.g., under a Gaussian or Poisson likelihood. Since the measurements are quadratic in x, the objective function is non-convex. A typical gradient descent procedure to solve (3) proceeds as

    z^(t+1) = z^(t) + (μ_t/m) Σ_{i=1}^m ∇ℓ(z^(t); y_i),    (4)

where z^(t) denotes the t-th iterate of the algorithm and μ_t is the step size. With a careful initialization using the spectral method, the WF algorithm (Candès et al., 2015) using the gradient descent update (4) is shown to converge globally under the Gaussian likelihood (i.e., quadratic loss) function, as long as the number of measurements m is on the order of n log n for i.i.d. Gaussian measurement vectors. The TWF algorithm (Chen & Candès, 2015) improves WF in both the initialization and the descent rule: only a subset T_0 of samples contributes to the spectral method, and only a subset of data-dependent and iteration-varying samples T_{t+1} contributes to the search direction:

    z^(t+1) = z^(t) + (μ_t/m) Σ_{i in T_{t+1}} ∇ℓ(z^(t); y_i).    (5)

The idea is that, through pruning, samples whose gradient components ∇ℓ(z^(t); y_i) are much larger than the sample mean are truncated so that each update is well controlled. This modification yields both statistical and computational benefits: under the Poisson loss function, TWF converges globally and geometrically to the true signal with a constant step size, with a number of measurements on the order of n for i.i.d. Gaussian measurement vectors. However, if some measurements are corrupted by arbitrary-valued outliers as in (1), both WF and TWF can fail. This is because the gradient of the loss function typically contains the term y_i − |a_i^T z|^2. With y_i corrupted by an arbitrarily large η_i, the gradient can deviate the search direction from the signal arbitrarily. In TWF, since the truncation rule is based on the sample mean of the gradient, which can be affected significantly even by a single outlier, we cannot expect it to converge globally, particularly when the number of corrupted measurements is a constant fraction of the total number of measurements, which is the regime we are interested in.

To handle outliers, our central idea is to prune samples in both the initialization and each iteration via sample-median-related quantities. Compared to the sample mean used in TWF, the sample median is much less affected even in the presence of a certain linear fraction of outliers, and is thus more robust to outliers with arbitrary values. In the following, we describe median-TWF in more detail. We adopt the Poisson likelihood function

    ℓ(z; y_i) = y_i log |a_i^T z|^2 − |a_i^T z|^2,    (6)

which is motivated by the maximum likelihood estimation of the signal when the measurements are corrupted by Poisson distributed noise. We note that our analysis is also applicable to the quadratic loss function, but in order to compare more directly to TWF in (Chen & Candès, 2015), we adopt the Poisson likelihood function in (6). Our median-TWF algorithm (summarized in Algorithm 1) contains the following main steps.

1. Initialization: We initialize z^(0) by the spectral method with a truncated set of samples, where the threshold is determined by the median of {y_i}.
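A compact sketch of this median-truncated spectral initialization (formalized as (7) in Algorithm 1 below) is given here. The power iteration and its fixed iteration count are implementation choices of the sketch, not prescribed by the paper.

```python
import numpy as np

def median_spectral_init(A, y, alpha_y=3.0):
    """Median-truncated spectral initialization, cf. (7):
    lam0 estimates ||x|| via sqrt(med(y)/0.455), and only samples with
    y_i <= (alpha_y * lam0)^2 enter the matrix Y whose leading eigenvector
    gives the direction estimate."""
    m, n = A.shape
    lam0 = np.sqrt(np.median(y) / 0.455)          # norm estimate of the signal
    keep = y <= (alpha_y * lam0) ** 2             # median-based truncation of samples
    Y = (A[keep].T * y[keep]) @ A[keep] / m       # (1/m) * sum of y_i a_i a_i^T over kept i
    z = np.random.default_rng(0).standard_normal(n)
    for _ in range(100):                          # power iteration for the leading eigenvector
        z = Y @ z
        z /= np.linalg.norm(z)
    return lam0 * z
```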

Algorithm 1: Median Truncated Wirtinger Flow (median-TWF)

Input: y = {y_i}_{i=1}^m, {a_i}_{i=1}^m; parameters: thresholds α_y, α_h, α_l, α_u, step size μ_t.

Initialization: Let z^(0) = λ_0 z~, where λ_0 = sqrt(med({y_i})/0.455) and z~ is the leading eigenvector of

    Y := (1/m) Σ_{i=1}^m y_i a_i a_i^T 1{y_i ≤ α_y^2 λ_0^2}.    (7)

Gradient loop: for t = 0 to T − 1 do

    z^(t+1) = z^(t) + (2μ_t/m) Σ_{i=1}^m [(y_i − |a_i^T z^(t)|^2) / (a_i^T z^(t))] a_i 1{E1_i ∩ E2_i},    (8)

where

    E1_i := { α_l ||z^(t)|| ≤ |a_i^T z^(t)| ≤ α_u ||z^(t)|| },
    E2_i := { |y_i − |a_i^T z^(t)|^2| ≤ α_h K_t |a_i^T z^(t)| / ||z^(t)|| },
    K_t := med({ |y_i − |a_i^T z^(t)|^2| }_{i=1}^m).

Output z^(T).

In comparison, WF does not truncate samples, and the truncation in TWF is based on the mean of {y_i}, which is not robust to outliers. As will be shown, as long as the fraction of outliers is not too large, our initialization (7) is guaranteed to be within a small neighborhood of the ground-truth signal.

2. Gradient loop: for each iteration 0 ≤ t ≤ T − 1, comparing (4) and (8), median-TWF uses an iteration-varying truncated gradient given as

    ∇ℓ_tr(z^(t)) := (2/m) Σ_{i=1}^m [(y_i − |a_i^T z^(t)|^2) / (a_i^T z^(t))] a_i 1{E1_i ∩ E2_i},    (9)

so that (8) reads z^(t+1) = z^(t) + μ_t ∇ℓ_tr(z^(t)). It is clear from the definition of the set E2_i (see Algorithm 1) that samples are truncated by the sample median of the gradient components evaluated at the current iterate, as opposed to the sample mean in TWF. We set the step size in median-TWF to a fixed small constant, i.e., μ_t = 0.2. The remaining parameters {α_y, α_h, α_l, α_u} are set to satisfy

    ζ_1 := max{ E[ξ^2 1{|ξ| < 1.01 α_l or |ξ| > 0.99 α_u}], E[1{|ξ| < 1.01 α_l or |ξ| > 0.99 α_u}] },
    ζ_2 := E[ξ^2 1{|ξ| > 0.248 α_h}],    (10)
    2(ζ_1 + ζ_2) + sqrt(8/π)/α_h < 1.99,  α_y ≥ 3,

where ξ ~ N(0, 1). For example, we set α_l = 0.3, α_u = 5, α_y = 3 and α_h = 12, and consequently ζ_1 ≈ 0.24 and ζ_2 ≈ 0.03.
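The gradient loop of Algorithm 1 can be sketched as follows, reusing median_spectral_init from the sketch above. The parameter defaults mirror the example values quoted in this section and should be read as illustrative choices, not as the authors' reference implementation.

```python
import numpy as np

def median_twf(A, y, T=500, mu=0.2, alpha_l=0.3, alpha_u=5.0, alpha_h=12.0, alpha_y=3.0):
    """Sketch of median-TWF (Algorithm 1): median-truncated spectral initialization
    followed by the median-truncated gradient updates (8)-(9)."""
    m, n = A.shape
    z = median_spectral_init(A, y, alpha_y)             # step 1: initialization
    for _ in range(T):
        Az = A @ z
        resid = y - Az ** 2
        K_t = np.median(np.abs(resid))                  # K_t: sample median of the residuals
        norm_z = np.linalg.norm(z)
        E1 = (np.abs(Az) >= alpha_l * norm_z) & (np.abs(Az) <= alpha_u * norm_z)
        E2 = np.abs(resid) <= alpha_h * K_t * np.abs(Az) / norm_z
        keep = E1 & E2                                  # median-based truncation
        grad = 2.0 * A[keep].T @ (resid[keep] / Az[keep])   # truncated gradient, cf. (9)
        z = z + (mu / m) * grad                         # update (8)
    return z
```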
4. Performance Guarantees of Median-TWF

In this section, we characterize the performance guarantees of median-TWF.

Theorem 1 (Exact recovery with sparse arbitrary outliers). Consider the phase retrieval problem with sparse outliers given in (1). There exist constants μ_0, s_0 > 0, 0 < ρ, ν < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n, s < s_0, and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≤ ν (1 − ρ)^t ||x||,  for all t in N,    (11)

simultaneously for all x in R^n \ {0}.

Theorem 1 indicates that median-TWF admits exact recovery for all signals in the presence of sparse outliers with arbitrary magnitudes, even when the number of outliers scales linearly with the number of measurements, as long as the number of samples satisfies m ≳ n log n. This is near-optimal up to a logarithmic factor. Moreover, median-TWF converges at a geometric rate using a constant step size, with per-iteration cost O(mn) (note that the median can be computed in linear time (Tibshirani, 2008)). To reach ε-accuracy, i.e., dist(z^(t), x) ≤ ε, only O(log(1/ε)) iterations are needed, and the total computational cost is O(mn log(1/ε)), which is highly efficient. Not surprisingly, Theorem 1 implies that median-TWF also performs well for the noise-free model, as a special case of the model with outliers. This justifies using median-TWF in an oblivious situation without knowing whether the underlying measurements are corrupted by outliers.

Corollary 1 (Exact recovery for the noise-free model). Suppose that the measurements are noise-free, i.e., η_i = 0 for i = 1, ..., m in the model (1). There exist constants μ_0 > 0, 0 < ρ, ν < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≤ ν (1 − ρ)^t ||x||,  for all t in N,    (12)

simultaneously for all x in R^n \ {0}.

We next consider the model where the measurements are corrupted by both sparse arbitrary outliers and dense bounded noise. The following theorem shows that median-TWF is robust to the coexistence of the two types of noise.

Theorem 2 (Stability to sparse arbitrary outliers and dense bounded noise). Consider the phase retrieval problem given in (2), in which the measurements are corrupted by both sparse arbitrary outliers and dense bounded noise. There exist constants μ_0, s_0 > 0, 0 < ρ < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n, s < s_0, and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≲ ||w||_inf/||x|| + (1 − ρ)^t ||x||,  for all t in N,    (13)

simultaneously for all x in R^n \ {0}.

Theorem 2 immediately implies the stability of median-TWF for the model corrupted only by dense bounded noise.

Corollary 2. Consider the phase retrieval problem in which the measurements are corrupted only by dense bounded noise, i.e., η_i = 0 for i = 1, ..., m in the model (2). There exist constants μ_0 > 0, 0 < ρ < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≲ ||w||_inf/||x|| + (1 − ρ)^t ||x||,  for all t in N,    (14)

simultaneously for all x in R^n \ {0}.

Thus, Theorem 2 and Corollary 2 imply that median-TWF for the model with both sparse arbitrary outliers and dense bounded noise achieves the same convergence rate and the same level of estimation error as for the model with only bounded noise. In fact, together with Theorem 1 and Corollary 1, it can be seen that applying median-TWF does not require knowledge of the noise corruption model: when outliers do exist, median-TWF achieves the same performance as if the outliers did not exist.

5. Proof Outlines of Main Results

In this section, we first develop a few statistical properties of the median that will be useful for our analysis of the performance guarantees, and then provide a proof sketch of our main results. The details of the proofs can be found in the Supplemental Materials.

5.1. Properties of the Median

We start with the definitions of the quantile of a population distribution and its sample version.

Definition 1 (Generalized quantile function). Let 0 < p < 1. For a cumulative distribution function (CDF) F, the generalized quantile function is defined as

    F^{-1}(p) = inf{x in R : F(x) ≥ p}.    (15)

For simplicity, denote θ_p(F) := F^{-1}(p) as the p-quantile of F. Moreover, for a sample sequence {X_i}_{i=1}^m, the sample p-quantile θ_p({X_i}) means θ_p(F^), where F^ is the empirical distribution of the samples {X_i}.

Remark 1. We take the median as med({X_i}) = θ_{1/2}({X_i}) and use both notations interchangeably.

Next, we show that as long as the sample size is large enough, the sample quantile concentrates around the population quantile (motivated by (Charikar et al., 2002)), as stated in Lemma 1.

Lemma 1. Suppose F(·) is a cumulative distribution function (i.e., non-decreasing and right-continuous) with continuous density function F'(·). Assume the samples {X_i}_{i=1}^m are drawn i.i.d. from F. Let 0 < p < 1. If l < F'(θ) < L for all θ in {θ : |θ − θ_p| ≤ ε}, then

    |θ_p({X_i}_{i=1}^m) − θ_p(F)| < ε    (16)

holds with probability at least 1 − 2 exp(−2 m ε^2 l^2).

Lemma 2 bounds the distance between the order statistics of two sequences.

Lemma 2. Given a vector X = (X_1, X_2, ..., X_n), reorder it in a non-decreasing manner: X_(1) ≤ X_(2) ≤ ... ≤ X_(n−1) ≤ X_(n). Given another vector Y = (Y_1, Y_2, ..., Y_n) reordered in the same way, one has

    |X_(k) − Y_(k)| ≤ ||X − Y||_inf    (17)

for all k = 1, ..., n.

Lemma 3 shows that, in the presence of outliers, one can lower and upper bound the sample median by neighboring quantiles of the corresponding clean samples.

Lemma 3. Consider clean samples {X~_i}_{i=1}^m. If a fraction s of them is corrupted by outliers, one obtains contaminated samples {X_i}_{i=1}^m which contain s·m corrupted samples and (1 − s)m clean samples. Then

    θ_{1/2−s}({X~_i}) ≤ θ_{1/2}({X_i}) ≤ θ_{1/2+s}({X~_i}).

Finally, Lemma 4 bounds the median, and the density at the median, of the absolute product of two possibly correlated standard Gaussian random variables.

Lemma 4. Let u, v ~ N(0, 1) be possibly correlated with correlation coefficient ρ. Let r = |uv|, and let ψ_ρ(x) denote the density of r. Denote by θ_{1/2}(ψ_ρ) the median of r, and by ψ_ρ(θ_{1/2}) the value of ψ_ρ at the median. Then for all ρ, θ_{1/2}(ψ_ρ) is bounded between two absolute positive constants, with θ_{1/2}(ψ_ρ) < 0.455, and 0.47 < ψ_ρ(θ_{1/2}) < 0.76.
5.2. Robust Initialization with Outliers

We show that the initialization provided by the median-truncated spectral method in (7) is close enough to the ground truth, i.e., dist(z^(0), x) ≤ (1/11)||x||, even if there are s·m arbitrary outliers, as long as s is a small enough constant.

1. We first bound the concentration of med({y_i}), also denoted θ_{1/2}({y_i}). Lemma 3 suggests that

    θ_{1/2−s}({(a_i^T x)^2}) ≤ θ_{1/2}({y_i}) ≤ θ_{1/2+s}({(a_i^T x)^2}).

Observe that (a_i^T x)^2 = |a~_i|^2 ||x||^2, where a~_i = a_i^T x/||x|| is a standard Gaussian random variable. Thus |a~_i|^2 is a χ^2 random variable with one degree of freedom, whose CDF is denoted by K. By Lemma 1, for a small ε, one has |θ_{1/2−s}({|a~_i|^2}) − θ_{1/2−s}(K)| < ε and |θ_{1/2+s}({|a~_i|^2}) − θ_{1/2+s}(K)| < ε with probability at least 1 − exp(−c m ε^2). Thus, letting ζ_s^− := θ_{1/2−s}(K) and ζ_s^+ := θ_{1/2+s}(K), we have with probability at least 1 − exp(−c m ε^2)

    (ζ_s^− − ε) ||x||^2 < θ_{1/2}({y_i}) < (ζ_s^+ + ε) ||x||^2,    (18)

where ζ_s^− and ζ_s^+ can be made arbitrarily close if s is small enough.

2. We next estimate the direction of x. The norm ||x|| can be estimated as in Algorithm 1; for simplicity of presentation, we assume ||x|| = 1. Using (18), the matrix Y in (7) can be bounded as Y_1 ⪯ Y ⪯ Y_2 with high probability, where

    Y_1 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^− − ε)/0.455},
    Y_2 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^+ + ε)/0.455} + (1/m) Σ_{i in S} a_i a_i^T α_y^2 (ζ_s^+ + ε)/0.455,

and S denotes the support of the outliers. It can then be shown, by concentration of random matrices with non-isotropic sub-Gaussian rows (Vershynin, 2012), that Y_1 and Y_2 concentrate around their means, which can be made arbitrarily close for s sufficiently small. Together with the fact that the means of both Y_1 and Y_2 have leading eigenvector x, one can conclude that the leading eigenvector of Y can be made close enough to x for a sufficiently small constant s.

5.3. Geometric Convergence

We now show that, within a small neighborhood of the ground truth, the truncated gradient (9) satisfies the Regularity Condition (RC) (Candès et al., 2015; Chen & Candès, 2015), which guarantees the geometric convergence of median-TWF once the initialization lands in this neighborhood.

Definition 2. The gradient ∇ℓ_tr(z) is said to satisfy the Regularity Condition RC(μ, λ, ε) if

    <∇ℓ_tr(z), h> ≥ (μ/2) ||∇ℓ_tr(z)||^2 + (λ/2) ||h||^2    (19)

for all z and h = z − x obeying ||h|| ≤ ε ||z||.

We first show that ∇ℓ_tr(z) satisfies the RC for the noise-free case, and then extend the argument to the model (1) with sparse outliers, thus establishing the global convergence of median-TWF in both cases. The central step in establishing the RC is to show that the sample median used in the truncation rule concentrates at the level ||z − x|| ||z||, as stated in the following proposition.

Proposition 1. If m > c_0 n log n, then with probability at least 1 − c_1 exp(−c_2 m), there exist constants β and β' such that

    β ||z|| ||h|| ≤ med({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ β' ||z|| ||h||

holds for all z and h := z − x satisfying ||h|| < (1/11)||z||.

We note that a similar property for the sample mean has been shown in (Chen & Candès, 2015) as long as the number of measurements is O(n). The median is much more challenging to handle due to its non-linearity, which also accounts for the slightly larger number of measurements required compared to the sample mean. We next briefly sketch how we exploit the properties of the median developed in Section 5.1 to show Proposition 1. First fix x and z satisfying ||x − z|| < (1/11)||z||. It can be shown that (a_i^T x)^2 − (a_i^T z)^2 = c~ u_i v_i ||h|| ||z||, where u_i and v_i are correlated N(0, 1) Gaussian random variables and 1.89 < c~ < 2. Hence, Lemma 4 and Lemma 1 imply that

    (0.65 − ε) ||z|| ||h|| ≤ med({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ (0.9 + ε) ||z|| ||h||    (20)

for given x and z with high probability. Then, applying a net covering argument, we prove that (20) holds for all z and x with ||z − x|| ≤ (1/11)||z||. In particular, Lemma 2 is applied to bound the distance between the medians at two points on and off the net.
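A quick Monte Carlo check of the concentration in (20), not part of the proof, can be run as follows; the dimensions and the size of the perturbation h are arbitrary choices made for illustration.

```python
import numpy as np

# Empirically, med(|<a_i, x>^2 - <a_i, z>^2|) / (||z|| * ||h||) should fall
# roughly between the constants 0.65 and 0.9 appearing in (20).
rng = np.random.default_rng(1)
n, m = 100, 50_000
x = rng.standard_normal(n)
h = rng.standard_normal(n)
h *= 0.05 * np.linalg.norm(x) / np.linalg.norm(h)   # ||h|| well within ||z||/11
z = x + h
A = rng.standard_normal((m, n))
med = np.median(np.abs((A @ x) ** 2 - (A @ z) ** 2))
print(med / (np.linalg.norm(z) * np.linalg.norm(h)))
```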
For the model (1) with outliers, we show that med({|y_i − (a_i^T z)^2|}) continues to enjoy the property in Proposition 1 even in the presence of a small constant fraction of outliers. This can be accomplished by first observing that

    θ_{1/2−s}({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ θ_{1/2}({ |y_i − (a_i^T z)^2| }) ≤ θ_{1/2+s}({ |(a_i^T x)^2 − (a_i^T z)^2| }),    (21)

using Lemma 3, and then extending (20) to the quantiles θ_{1/2−s} and θ_{1/2+s} for a small constant s. Putting all of this together yields bounds for both sides of (21) at the level of ||z|| ||h||.

Remark 2. With ε small enough, β = 0.6 together with a corresponding absolute constant β' is a valid choice for Proposition 1. We set the algorithm parameters based on these two values.

5.4. Stability with Additional Dense Bounded Noise

Now consider the model (2) with both sparse outliers and dense bounded noise. We omit the analysis of the initialization step as it is similar to Section 5.2. We split the analysis of the gradient loop into two regimes.

Regime 1: Assume c_4 ||z|| ≥ ||h|| ≥ c_3 ||w||_inf/||z||. Lemma 3 implies

    θ_{1/2−s}({ |y~_i − (a_i^T z)^2| }) ≤ med({ |y_i − (a_i^T z)^2| }) ≤ θ_{1/2+s}({ |y~_i − (a_i^T z)^2| }),    (22)

where y~_i := (a_i^T x)^2 + w_i, i.e., the measurements corrupted only by the bounded noise. Moreover, Lemma 2 and the assumption of the regime imply

    θ_{1/2+s}({ |y~_i − (a_i^T z)^2| }) ≤ θ_{1/2+s}({ |(a_i^T x)^2 − (a_i^T z)^2| }) + ||w||_inf,
    θ_{1/2−s}({ |y~_i − (a_i^T z)^2| }) ≥ θ_{1/2−s}({ |(a_i^T x)^2 − (a_i^T z)^2| }) − ||w||_inf.    (23)

Therefore, combining the above inequalities with Proposition 1, we have

    β ||x − z|| ||z|| ≤ med({ |y_i − (a_i^T z)^2| }) ≤ β' ||x − z|| ||z||,

implying that the RC holds in Regime 1, and the error decreases geometrically in each iteration.

Regime 2: Assume ||h|| ≤ c_3 ||w||_inf/||z||. Since each update μ ∇ℓ_tr(z) is at most on the order of ||w||_inf/||z||, the estimation error cannot increase by more than ||w||_inf/||z|| times a constant factor. Thus, one has

    dist(z + μ ∇ℓ_tr(z), x) ≤ c_5 ||w||_inf/||x||    (24)

for some constant c_5. As long as ||w||_inf/||x||^2 is sufficiently small, it is guaranteed that c_5 ||w||_inf/||x|| ≤ c_4 ||x||. If an iteration jumps out of Regime 2, it falls into Regime 1 described above.

6. Numerical Experiments

In this section, we provide numerical experiments that demonstrate the effectiveness of median-TWF and corroborate our theoretical findings. We first show that, in the noise-free case, median-TWF performs similarly to TWF (Chen & Candès, 2015) in terms of exact recovery. We set the parameters of median-TWF as specified in Section 3, and those of TWF as suggested in (Chen & Candès, 2015). Let the signal length n take values from 1000 to 10000 with a step size of 1000, and the ratio of the number of measurements to the signal length, m/n, take values from 2 to 6 with a step size of 0.1. For each pair (m, n), we generate a signal x ~ N(0, I_{n×n}) and measurement vectors a_i ~ N(0, I_{n×n}) i.i.d. for i = 1, ..., m. For both algorithms, a fixed number of iterations T = 500 is run, and a trial is declared successful if the output z^(T) of the algorithm satisfies dist(z^(T), x)/||x|| ≤ 10^{-8}. Figure 1 shows the number of successful trials out of 20 trials for both algorithms, with respect to m/n and n. It can be seen that as soon as m is above 4n, exact recovery is achieved by both algorithms. Around the phase transition boundary, the performance of median-TWF is slightly worse than that of TWF, which is possibly due to the inefficiency of the median compared to the mean in the noise-free case (Huber, 2011).

Figure 1. Phase transition of median-TWF (a) and TWF (b) for noise-free data: the gray scale of each cell (m/n, n) indicates the number of successful recoveries out of 20 trials.

We next examine the performance of median-TWF in the presence of sparse outliers. We compare median-TWF not only with TWF but also with an alternative which we call trimean-TWF, based on replacing the sample mean in TWF with the trimmed mean. More specifically, trimean-TWF requires knowing the fraction s of outliers, so that the s·m samples corresponding to the largest gradient values are removed, and truncation is then based on the mean of the remaining samples. We fix the signal length to n = 1000 and the number of measurements m to a fixed value. We assume each measurement y_i is corrupted with probability s in [0, 0.4] independently, where the corruption value η_i ~ U(0, ||η||_inf) is randomly generated from a uniform distribution.
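A sketch of this outlier experiment, reusing generate_measurements and median_twf from the sketches above, is given below. The number of measurements, the trial count, and the relative outlier magnitude are placeholders, since the paper's exact experimental values are those reported with Figure 2.

```python
import numpy as np

def outlier_success_rate(n=1000, m=6000, s=0.1, eta_rel=10.0, trials=20, seed=0):
    """Estimate the exact-recovery rate of median-TWF when each measurement is
    corrupted with probability s by a uniform outlier of magnitude up to
    eta_rel * ||x||^2. Placeholder settings, not the paper's exact configuration."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        x = rng.standard_normal(n)
        eta_max = eta_rel * np.linalg.norm(x) ** 2
        A, y = generate_measurements(x, m, s=s, eta_max=eta_max, rng=rng)
        z = median_twf(A, y, T=500)
        err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
        wins += int(err <= 1e-8)
    return wins / trials
```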
Figure 2 shows the success rate of exact recovery over 100 trials as a function of s at different levels of outlier magnitude ||η||_inf/||x||^2 = 0.1, 1, 10, 100, for the three algorithms median-TWF, trimean-TWF and TWF. From Figure 2, it can be seen that median-TWF allows exact recovery as long as s is not too large, for all levels of outlier magnitude and without any knowledge of the outliers, which validates our theoretical analysis. Unsurprisingly, TWF fails quickly even with a very small fraction of outliers: no successful instance is observed for TWF once s ≥ 0.02, irrespective of the value of ||η||_inf. Trimean-TWF does not exhibit as sharp a phase transition as median-TWF, and in general underperforms median-TWF, except when both ||η||_inf and s get very large; see Figure 2(d).

Figure 2. Success rate of exact recovery with outliers for median-TWF, trimean-TWF, and TWF at different levels of outlier magnitude: (a) ||η||_inf = 0.1||x||^2, (b) ||η||_inf = ||x||^2, (c) ||η||_inf = 10||x||^2, (d) ||η||_inf = 100||x||^2.

Figure 3. Relative error with respect to the iteration count for median-TWF and TWF with both dense noise and sparse outliers, and for TWF with only dense noise. The performance of median-TWF with both dense noise and sparse outliers is comparable to that of TWF without outliers. (a) and (b): uniform noise at different levels, ||w||_inf = 0.01||x||^2 and ||w||_inf = 0.001||x||^2; (c): Poisson noise.

This is because in this range with large outliers, knowing the fraction s of outliers provides a substantial advantage for trimean-TWF in eliminating them, while the sample median can deviate significantly from the true median for large s. Moreover, it is worth mentioning that exact recovery is more challenging for median-TWF when the magnitudes of most outliers are comparable to the measurements, as in Figure 2(c). In such a case, the outliers are more difficult to exclude than in the case with very large outlier magnitudes as in Figure 2(d); meanwhile, the outlier magnitudes in Figure 2(c) are large enough to affect the accuracy heavily, in contrast to the cases in Figures 2(a) and 2(b) where the outliers are less prominent.

We now examine the performance of median-TWF in the presence of both sparse outliers and dense bounded noise. The entries of the dense bounded noise w are generated independently from U(0, ||w||_inf), with ||w||_inf/||x||^2 = 0.001 and 0.01 respectively. The entries of the outliers are then generated as η_i ~ ||w||_inf · Bernoulli(0.1) independently. Figures 3(a) and 3(b) depict the relative error dist(z^(t), x)/||x|| with respect to the iteration count t, for uniform noise at different levels. It can be seen that median-TWF under outlier corruption clearly outperforms TWF in the same situation, and acts as if the outliers did not exist, achieving almost the same accuracy as TWF with no outliers. Moreover, the solution accuracy of median-TWF improves by a factor of 10 from Figure 3(a) to Figure 3(b) as ||w||_inf shrinks by 1/10, which corroborates Theorem 2 nicely. Finally, we consider measurements corrupted both by Poisson noise and by outliers, which models photon detection in optical imaging applications. We generate each measurement as y_i ~ Poisson(|<a_i, x>|^2) for i = 1, ..., m, which is then corrupted with probability s = 0.1 by outliers. The entries of the outliers are obtained by first generating η_i ~ ||x||^2 · U(0, 1) independently, and then rounding to the nearest integer. Figure 3(c) depicts the relative error dist(z^(t), x)/||x|| with respect to the iteration count t, where again median-TWF under both Poisson noise and sparse outliers has almost the same accuracy as TWF under only Poisson noise.

7. Conclusions

In this paper, we proposed median-TWF and showed that it allows exact recovery for robust phase retrieval even with a constant proportion of arbitrary outliers. This is in contrast to the recently developed WF and TWF, which are likely to fail under outlier corruption. We anticipate that the sample median can be applied to designing provably robust non-convex algorithms for other inference problems under sparse arbitrary corruptions.
The techniques we develop here to analyze performance guarantees for median-based algorithms should be useful in those contexts as well.

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments. We also thank Yining Wang and Yu-Xiang Wang at Carnegie Mellon University for helpful discussions on statistical properties of the median. The research of Huishuai Zhang and Yingbin Liang is supported by a grant from AFOSR. The work of Y. Chi is supported in part by grants from NSF, AFOSR and ONR.

References

Anandkumar, A., Jain, P., Shi, Y., and Niranjan, U. N. Tensor vs. matrix methods: Robust tensor decomposition under block sparse perturbations. arXiv preprint, 2015.

Arora, S., Ge, R., Ma, T., and Moitra, A. Simple, efficient, and neural algorithms for sparse coding. arXiv preprint, 2015.

Balan, R., Casazza, P., and Edidin, D. On signal reconstruction without phase. Applied and Computational Harmonic Analysis, 20(3), 2006.

Bhatia, K., Jain, P., and Kar, P. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems (NIPS), 2015.

Cai, T. T., Li, X., and Ma, Z. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. arXiv preprint, 2015.

Candès, E. J. and Li, X. Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Foundations of Computational Mathematics, 14(5), 2014.

Candès, E. J., Strohmer, T., and Voroninski, V. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8), 2013.

Candès, E. J., Li, X., and Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4), 2015.

Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. In Automata, Languages and Programming. Springer, 2002.

Chen, K. On k-median clustering in high dimensions. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2006.

Chen, Y. and Candès, E. J. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems (NIPS), 2015.

Chen, Y. and Wainwright, M. J. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees. arXiv preprint, 2015.

Chen, Y., Caramanis, C., and Mannor, S. Robust sparse regression under adversarial corruption. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Chen, Y., Yi, X., and Caramanis, C. A convex formulation for mixed regression with two components: Minimax optimal rates. In Conference on Learning Theory, 2014.

Chen, Y., Chi, Y., and Goldsmith, A. J. Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Transactions on Information Theory, 61(7), July 2015.

Demanet, L. and Hand, P. Stable optimizationless recovery from phaseless linear measurements. Journal of Fourier Analysis and Applications, 20(1), 2014.

Donahue, James D. Products and quotients of random variables and their applications. Technical report, DTIC Document, 1964.

Drenth, J. X-Ray Crystallography. Wiley Online Library, 2007.

Fienup, J. R. Phase retrieval algorithms: a comparison. Applied Optics, 21(15), 1982.

Hand, P. PhaseLift is robust to a constant fraction of arbitrary errors. arXiv preprint, 2015.

Huber, P. J. Robust Statistics. Springer, 2011.

Keshavan, R. H., Montanari, A., and Oh, S. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6), June 2010.

Lee, K., Li, Y., Junge, M., and Bresler, Y. Blind recovery of sparse signals from subsampled convolution. arXiv preprint, 2015.

Li, X. and Voroninski, V. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 2013.

Netrapalli, P., Niranjan, U. N., Sanghavi, S., Anandkumar, A., and Jain, P. Non-convex robust PCA. In Advances in Neural Information Processing Systems (NIPS), 2014.

Qu, C. and Xu, H.
Subspace clustering with irrelevant features via robust Dantzig selector. In Advances in Neural Information Processing Systems (NIPS), 2015.

Soltanolkotabi, M. Algorithms and Theory for Clustering and Nonconvex Quadratic Programming. PhD thesis, Stanford University, 2014.

Sun, J., Qu, Q., and Wright, J. Complete dictionary recovery using nonconvex optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.

Tibshirani, R. J. Fast computation of the median by successive binning. arXiv preprint, 2008.

Tu, S., Boczar, R., Soltanolkotabi, M., and Recht, B. Low-rank solutions of linear matrix equations via Procrustes flow. arXiv preprint, 2015.

Vershynin, R. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing, Theory and Applications, 2012.

Wagner, D. Resilient aggregation in sensor networks. In Proceedings of the 2nd ACM Workshop on Security of Ad Hoc and Sensor Networks. ACM, 2004.

Waldspurger, I., d'Aspremont, A., and Mallat, S. Phase recovery, MaxCut and complex semidefinite programming. Mathematical Programming, 149(1-2), 2015.

Weller, D. S., Pnueli, A., Divon, G., Radzyner, O., Eldar, Y. C., and Fessler, J. A. Undersampled phase retrieval with outliers. IEEE Transactions on Computational Imaging, 1(4), December 2015.

White, C. D., Ward, R., and Sanghavi, S. The local convexity of solving quadratic equations. arXiv preprint, 2015.

Zheng, Q. and Lafferty, J. A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. In Advances in Neural Information Processing Systems (NIPS), 2015.

Supplementary Materials

A. Proof of Properties of the Median (Section 5.1)

A.1. Proof of Lemma 1

For simplicity, denote θ_p := θ_p(F) and θ^_p := θ_p({X_i}_{i=1}^m). Since F' is continuous and positive, for a given ε there exists δ_1 such that P(X ≤ θ_p − ε) = p − δ_1, where δ_1 in (εl, εL). Then one has

    P(θ^_p < θ_p − ε) (a)= P( (1/m) Σ_{i=1}^m 1{X_i ≤ θ_p − ε} ≥ p )
                        = P( (1/m) Σ_{i=1}^m 1{X_i ≤ θ_p − ε} ≥ (p − δ_1) + δ_1 )
                     (b)≤ exp(−2 δ_1^2 m) ≤ exp(−2 ε^2 l^2 m),

where (a) is due to the definition of the quantile function in (15) and (b) is due to the fact that 1{X_i ≤ θ_p − ε} ~ Bernoulli(p − δ_1) i.i.d., followed by Hoeffding's inequality. Similarly, one can show for some δ_2 in (εl, εL),

    P(θ^_p > θ_p + ε) ≤ exp(−2 δ_2^2 m) ≤ exp(−2 ε^2 l^2 m).

Combining these two inequalities yields the conclusion.

A.2. Proof of Lemma 2

It suffices to show that

    |X_(k) − Y_(k)| ≤ max_l |X_l − Y_l|,  k = 1, ..., n.    (25)

Case 1: k = n. Suppose X_(n) = X_i and Y_(n) = Y_j, i.e., X_i is the largest among {X_l}_{l=1}^n and Y_j is the largest among {Y_l}_{l=1}^n. Then we have either X_j ≤ X_i ≤ Y_j or Y_i ≤ Y_j ≤ X_i. Hence, |X_(n) − Y_(n)| = |X_i − Y_j| ≤ max{|X_i − Y_i|, |X_j − Y_j|}.

Case 2: k = 1. Suppose that X_(1) = X_i and Y_(1) = Y_j. Similarly, |X_(1) − Y_(1)| = |X_i − Y_j| ≤ max{|X_i − Y_i|, |X_j − Y_j|}.

Case 3: 1 < k < n. Suppose that X_(k) = X_i and Y_(k) = Y_j, and without loss of generality assume that X_i < Y_j (if X_i = Y_j, then 0 = |X_(k) − Y_(k)| ≤ max_l |X_l − Y_l| holds trivially). We show the conclusion by contradiction. Assume |X_(k) − Y_(k)| > max_l |X_l − Y_l|. Then one must have Y_i < Y_j, X_j > X_i, and i ≠ j. Moreover, for any p < k and q > k, the index of X_(p) cannot be equal to the index of Y_(q); otherwise the assumption is violated. Thus, all Y_(q) for q > k must share the same index set as the X_(p) for p > k. However, X_j, which is larger than X_i (thus if X_j = X_(k'), then k' > k), shares the same index with Y_j, where Y_j = Y_(k). This yields a contradiction.

A.3. Proof of Lemma 3

Assume that s·m is an integer. Since there are s·m corrupted samples in total, one can select at least (1/2 − s)m clean samples from the left half of the ordered contaminated samples {θ_{1/m}({X_i}), θ_{2/m}({X_i}), ..., θ_{1/2}({X_i})}. This gives the left inequality. Furthermore, one can also select at least (1/2 − s)m clean samples from the right half of the ordered contaminated samples {θ_{1/2}({X_i}), ..., θ_1({X_i})}, which gives the right inequality.

A.4. Proof of Lemma 4

First we introduce some general facts about the distribution of the product of two correlated standard Gaussian random variables (Donahue, 1964). Let u ~ N(0, 1), v ~ N(0, 1), and let their correlation coefficient be ρ in [−1, 1]. Then the density of uv is given by

    φ_ρ(x) = (1/(π sqrt(1 − ρ^2))) exp( ρx/(1 − ρ^2) ) K_0( |x|/(1 − ρ^2) ),  x ≠ 0,

where K_0(·) is the modified Bessel function of the second kind. Thus the density of r = |uv| is

    ψ_ρ(x) = (1/(π sqrt(1 − ρ^2))) [ exp( ρx/(1 − ρ^2) ) + exp( −ρx/(1 − ρ^2) ) ] K_0( x/(1 − ρ^2) ),  x > 0,    (26)

for |ρ| < 1. If |ρ| = 1, r becomes a χ^2 random variable with one degree of freedom, with density ψ_ρ(x) = (1/sqrt(2π)) x^{−1/2} exp(−x/2), x > 0. It can be seen from (26) that the density of r depends only on the correlation coefficient ρ in [−1, 1]. Let θ_{1/2}(ψ_ρ) be the 1/2-quantile (median) of the distribution ψ_ρ, and ψ_ρ(θ_{1/2}) be the value of the density ψ_ρ at the point θ_{1/2}(ψ_ρ). Although it is difficult to derive analytical expressions for θ_{1/2}(ψ_ρ) and ψ_ρ(θ_{1/2}) due to the complicated form of ψ_ρ in (26), the continuity of ψ_ρ(x) and of θ_{1/2}(ψ_ρ) allows us to compute them numerically, as illustrated in Figure 4. From the numerical calculation, one can see that both ψ_ρ(θ_{1/2}) and θ_{1/2}(ψ_ρ) are bounded from below and above for all ρ in [0, 1] (ψ_ρ(·) is symmetric in ρ, hence it suffices to consider ρ in [0, 1]), satisfying

    θ_{1/2}(ψ_ρ) < 0.455,  0.47 < ψ_ρ(θ_{1/2}) < 0.76,    (27)

together with a positive absolute lower bound on θ_{1/2}(ψ_ρ).

Figure 4. Quantiles θ_p(ψ_ρ) and densities at the quantiles ψ_ρ(θ_p) for p = 0.49, 0.50, 0.51, as functions of the correlation ρ.

B. Robust Initialization with Outliers (Section 5.2)

This section proves that the truncated spectral method provides a good initialization even if s·m measurements are corrupted by arbitrary outliers, as long as s is small. Consider the model in (1). Lemma 3 yields

    θ_{1/2−s}({(a_i^T x)^2}) < θ_{1/2}({y_i}) < θ_{1/2+s}({(a_i^T x)^2}).    (28)

Observe that (a_i^T x)^2 = |a~_i|^2 ||x||^2, where a~_i = a_i^T x/||x|| is a standard Gaussian random variable. Thus |a~_i|^2 is a χ^2 random variable with one degree of freedom, whose cumulative distribution function is denoted by K(x). Moreover, by Lemma 1, for a small ε one has |θ_{1/2−s}({|a~_i|^2}) − θ_{1/2−s}(K)| < ε and |θ_{1/2+s}({|a~_i|^2}) − θ_{1/2+s}(K)| < ε with probability at least 1 − 2 exp(−c m ε^2) for an absolute constant c (cf. Figure 4). We note that θ_{1/2}(K) = 0.455 and that both θ_{1/2−s}(K) and θ_{1/2+s}(K) can be made arbitrarily close to θ_{1/2}(K) simultaneously as long as s is small enough (independently of m and n). Thus one has

    (θ_{1/2−s}(K) − ε) ||x||^2 < θ_{1/2}({y_i}) < (θ_{1/2+s}(K) + ε) ||x||^2,    (29)

with probability at least 1 − exp(−c m ε^2). For simplicity, we introduce the notations ζ_s^− := θ_{1/2−s}(K) and ζ_s^+ := θ_{1/2+s}(K). For the instance s = 0.01, ζ_s^− and ζ_s^+ are explicit numerical constants close to 0.455, and it is easy to see that ζ_s^+ − ζ_s^− can be made arbitrarily small if s is small enough. We first consider the case ||x|| = 1. On the event that (29) holds, the truncation indicator satisfies the bounds

    1{y_i ≤ α_y^2 (ζ_s^− − ε)/0.455} ≤ 1{y_i ≤ α_y^2 θ_{1/2}({y_i})/0.455} ≤ 1{y_i ≤ α_y^2 (ζ_s^+ + ε)/0.455}.

On the other hand, denoting the support of the outliers by S, we have

    Y = (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 θ_{1/2}({y_i})/0.455} + (1/m) Σ_{i in S} a_i a_i^T y_i 1{y_i ≤ α_y^2 θ_{1/2}({y_i})/0.455}.

Consequently, one can bound Y as Y_1 ⪯ Y ⪯ Y_2, where

    Y_1 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^− − ε)/0.455},
    Y_2 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^+ + ε)/0.455} + (1/m) Σ_{i in S} a_i a_i^T α_y^2 (ζ_s^+ + ε)/0.455,

and where

    E[Y_1] = (1 − s)(β_1 xx^T + β_2 I),
    E[Y_2] = (1 − s)(β_3 xx^T + β_4 I) + s α_y^2 (ζ_s^+ + ε)/0.455 · I,    (30)

with β_1 := E[ξ^4 1{|ξ| ≤ α_y sqrt((ζ_s^− − ε)/0.455)}] − E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^− − ε)/0.455)}], β_2 := E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^− − ε)/0.455)}], β_3 := E[ξ^4 1{|ξ| ≤ α_y sqrt((ζ_s^+ + ε)/0.455)}] − E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^+ + ε)/0.455)}], and β_4 := E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^+ + ε)/0.455)}], where ξ ~ N(0, 1).

Applying standard results on random matrices with non-isotropic sub-Gaussian rows (Vershynin, 2012, equation (5.26)), and noticing that a_i a_i^T (a_i^T x)^2 1{|a_i^T x| ≤ c} can be rewritten as b_i b_i^T for the sub-Gaussian vector b_i := a_i (a_i^T x) 1{|a_i^T x| ≤ c}, one can deduce

    ||Y_1 − E[Y_1]|| ≤ δ,  ||Y_2 − E[Y_2]|| ≤ δ    (31)

with probability 1 − exp(−Ω(m)), provided that m/n exceeds some large constant. Besides, when ε and s are sufficiently small, one further has ||E[Y_1] − E[Y_2]|| ≤ δ. Putting these together, one has

    ||Y − (1 − s)(β_1 xx^T + β_2 I)|| ≤ 3δ.    (32)

Let z~^(0) be the normalized leading eigenvector of Y. Repeating the same argument as in (Candès et al., 2015, Section 7.8) and taking δ, ε to be sufficiently small, one has for a given δ' > 0,

    dist(z~^(0), x) ≤ δ',    (33)

as long as m/n exceeds some large constant. Furthermore, let z^(0) = sqrt(med({y_i})/0.455) z~^(0) to handle the case ||x|| ≠ 1. By the bound (29), one has

    | med({y_i})/0.455 − ||x||^2 | ≤ max{ 0.455 − ζ_s^− + ε, ζ_s^+ − 0.455 + ε }/0.455 · ||x||^2 ≤ (ζ_s^+ − ζ_s^− + ε)/0.455 · ||x||^2.    (34)

Thus

    dist(z^(0), x) ≲ (ζ_s^+ − ζ_s^− + ε) ||x|| + δ' ||x|| ≤ (1/11) ||x||    (35)

as long as s is a small enough constant.

C. Geometric Convergence for the Noise-free Model (Proof of Corollary 1)

After obtaining a good initialization, the central idea in establishing geometric convergence is to show that the truncated gradient ∇ℓ_tr(z) satisfies, in a neighborhood of the global optima, the Regularity Condition RC(μ, λ, ε) defined in Definition 2. We show this in two steps. Step 1 establishes a key concentration property for the sample median used in the truncation rule, which is then exploited to prove the RC in Step 2.

C.1. Proof of the Concentration Property for the Sample Median

We show that the sample median used in the truncation rule concentrates at the level ||z − x|| ||z||, as stated in the following proposition. Along the way, we also establish that the sample quantiles around the median concentrate at the level ||z − x|| ||z||.

Proposition 2 (Refined version of Proposition 1). Fix ε in (0, 1). If m > c_0 (ε^{-2} log ε^{-1}) n log n, then with probability at least 1 − c_1 exp(−c_2 ε^2 m),

    (0.65 − ε) ||z|| ||h|| ≤ med({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ (0.9 + ε) ||z|| ||h||,    (36)
    (0.63 − ε) ||z|| ||h|| ≤ θ_{0.49}, θ_{0.51}({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ (c~ + ε) ||z|| ||h||,    (37)

hold for all x, z with ||x − z|| < (1/11)||z||, where h := z − x and c~ is an absolute constant.

Proof. We first show that for a fixed pair z and x, (36) and (37) hold with high probability. Let r_i = (a_i^T x)^2 − (a_i^T z)^2. Then the r_i are i.i.d. copies of a random variable r, where r = (a^T x)^2 − (a^T z)^2 with the entries of a composed of i.i.d. standard Gaussian random variables. Note that the distribution of r is fixed once h and z are given. Let x(1) denote the first element of a generic vector x, and x_perp the remaining vector of x after eliminating the first element. Let U_h be an orthonormal matrix whose first row is h^T/||h||, and set a~ = U_h a, z~ = U_h z. Similarly define U_{z~_perp} and let b~ = U_{z~_perp} a~_perp. Then a~(1) and b~(1) are independent standard normal random variables. We can further express r as follows:

    r = (a^T z)^2 − (a^T x)^2 = (2 a^T z − a^T h)(a^T h)
      = (2 a~^T z~ − a~(1) ||h||)(a~(1) ||h||)
      = (2 h^T z − ||h||^2) a~(1)^2 + 2 (a~_perp^T z~_perp)(a~(1) ||h||)
      = (2 h^T z − ||h||^2) a~(1)^2 + 2 b~(1) sqrt(||z||^2 − z~(1)^2) a~(1) ||h||
      = [ (2 cos ω − t) a~(1)^2 + 2 sin(ω) a~(1) b~(1) ] ||h|| ||z||
      =: u v~ ||h|| ||z||,

where ω is the angle between h and z, and t = ||h||/||z|| < 1/11. Consequently, u = a~(1) ~ N(0, 1) and v~ = (2 cos ω − t) a~(1) + 2 sin(ω) b~(1) is also a Gaussian random variable with variance 3.6 < Var(v~) < 4 under the assumption t < 1/11. Let v = v~/sqrt(Var(v~)); then v ~ N(0, 1). Furthermore, let r~ = |uv|. Denote the density function of r~ by ψ_ρ(·) and the 1/2-quantile of r~ by θ_{1/2}(ψ_ρ). By Lemma 4, we have

    0.47 < ψ_ρ(θ_{1/2}) < 0.76,  θ_{1/2}(ψ_ρ) < 0.455,    (38)

with θ_{1/2}(ψ_ρ) also bounded below by an absolute positive constant. By Lemma 1, we have with probability at least 1 − 2 exp(−c m ε^2), for an absolute constant c,

    | med({r~_i}_{i=1}^m) − θ_{1/2}(ψ_ρ) | < ε.    (39)

The same arguments carry over to the other quantiles θ_{0.49}({r~_i}) and θ_{0.51}({r~_i}). From Figure 4, we observe that for ρ in [0, 1],

    0.45 < ψ_ρ(θ_{0.49}), ψ_ρ(θ_{0.51}) < 0.78, and θ_{0.49}(ψ_ρ), θ_{0.51}(ψ_ρ) are likewise bounded between two absolute positive constants,    (40)

and then, with probability at least 1 − 2 exp(−c m ε^2), again for an absolute constant c,

    | θ_{0.49}({r~_i}) − θ_{0.49}(ψ_ρ) | < ε  and  | θ_{0.51}({r~_i}) − θ_{0.51}(ψ_ρ) | < ε.    (41)

Hence, multiplying back sqrt(Var(v~)) ||z|| ||h||, we have with probability 1 − 2 exp(−c m ε^2),

    (0.65 − ε) ||z − x|| ||z|| ≤ med({ |(a_i^T z)^2 − (a_i^T x)^2| }) ≤ (0.9 + ε) ||z − x|| ||z||,    (42)
    (0.63 − ε) ||z − x|| ||z|| ≤ θ_{0.49}, θ_{0.51}({ |(a_i^T z)^2 − (a_i^T x)^2| }) ≤ (c~ + ε) ||z − x|| ||z||.    (43)

We note that, to keep the notation simple, c and ε may vary from line to line within constant factors. Up to now, we have proved that for any fixed z and x, the median and the neighboring quantiles of {|(a_i^T z)^2 − (a_i^T x)^2|} are upper and lower bounded by ||z − x|| ||z|| times constant factors. To prove (36) and (37) for all z and x with ||z − x|| ≤ (1/11)||z||, we use a net covering argument. Again we argue for the median first; the same arguments carry over to the other quantiles smoothly. To proceed, we restate (42) as: the bound

    (0.65 − ε) ≤ med({ |(2 a_i^T (z/||z||) − t · a_i^T (h/||h||)) · a_i^T (h/||h||)| }) ≤ (0.9 + ε)    (44)

holds with probability at least 1 − 2 exp(−c m ε^2) for a given pair (h, z) satisfying ||h||/||z|| ≤ 1/11.

Let τ = ε/(6n + 6), let S_τ be a τ-net covering the unit sphere, let L_τ be a τ-net covering a line segment of length 1/11, and set

    N_τ = {(z_0, h_0, t_0) : (z_0, h_0, t_0) in S_τ × S_τ × L_τ}.    (45)

One has the cardinality bound (i.e., an upper bound on the covering number) |N_τ| ≤ (1 + 2/τ)^{2n} /(11τ) < (1 + 2/τ)^{2n+1}. Taking the union bound, we have

    (0.65 − ε) ≤ med({ |(2 a_i^T z_0 − t_0 a_i^T h_0) a_i^T h_0| }) ≤ (0.9 + ε),  for all (z_0, h_0, t_0) in N_τ,    (46)

with probability at least 1 − (1 + 2/τ)^{2n+1} exp(−c m ε^2). We next argue that (46) holds with probability at least 1 − c_1 exp(−c_2 ε^2 m) for some constants c_1, c_2 as long as m ≥ c_0 (ε^{-2} log ε^{-1}) n log n for a sufficiently large constant c_0. To prove this claim, we first observe that

    (1 + 2/τ)^{2n+1} ≲ exp( 2n (log(n + 1) + log(1/ε)) ) ≤ exp( 2n log m ).

We note that once ε is chosen, it is fixed throughout the proof and does not scale with m or n. For simplicity, assume that ε < 1/e. Fix some positive constant c_3 < c − c_2. It then suffices to show that there exists a large constant c_0 such that if m ≥ c_0 (ε^{-2} log ε^{-1}) n log n, then

    2n log m < c_3 ε^2 m.    (47)

For any fixed n, if (47) holds for some m with m > (2/c_3) ε^{-2} n, then (47) continues to hold for all larger m, because

    2n log(m + 1) = 2n log m + 2n (log(m + 1) − log m) = 2n log m + 2n log(1 + 1/m) ≤ 2n log m + 2n/m ≤ c_3 ε^2 m + c_3 ε^2 = c_3 (m + 1) ε^2.

Next, for any n, we can always find a c_0 such that (47) holds for m = c_0 (ε^{-2} log ε^{-1}) n log n. Such a c_0 can easily be found for large n; for instance, c_0 = 4/c_3 is a valid option if

    (4/c_3)(ε^{-2} log ε^{-1}) n log n < n^2.    (48)

Moreover, since the number of n that violate (48) is finite, the maximum over all such c_0 serves the purpose.


More information

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

More information

Composite optimization for robust blind deconvolution

Composite optimization for robust blind deconvolution Coposite optiization for robust blind deconvolution Vasileios Charisopoulos Daek Davis Mateo Díaz Ditriy Drusvyatskiy Abstract The blind deconvolution proble seeks to recover a pair of vectors fro a set

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Solving Most Systems of Random Quadratic Equations

Solving Most Systems of Random Quadratic Equations Solving Most Systes of Rando Quadratic Equations Gang Wang, Georgios B. Giannakis Yousef Saad Jie Chen Digital Tech. Center & Dept. of Electrical and Coputer Eng., Univ. of Minnesota Key Lab of Intell.

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization Structured signal recovery fro quadratic easureents: Breaking saple coplexity barriers via nonconvex optiization Mahdi Soltanolkotabi Ming Hsieh Departent of Electrical Engineering University of Southern

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Shannon Sampling II. Connections to Learning Theory

Shannon Sampling II. Connections to Learning Theory Shannon Sapling II Connections to Learning heory Steve Sale oyota echnological Institute at Chicago 147 East 60th Street, Chicago, IL 60637, USA E-ail: sale@athberkeleyedu Ding-Xuan Zhou Departent of Matheatics,

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

Reshaped Wirtinger Flow for Solving Quadratic System of Equations Reshaped Wirtinger Flow for Solving Quadratic System of Equations Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 344 hzhan3@syr.edu Yingbin Liang Department of EECS Syracuse University

More information

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010 A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING By Eanuel J Candès Yaniv Plan Technical Report No 200-0 Noveber 200 Departent of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Physics 215 Winter The Density Matrix

Physics 215 Winter The Density Matrix Physics 215 Winter 2018 The Density Matrix The quantu space of states is a Hilbert space H. Any state vector ψ H is a pure state. Since any linear cobination of eleents of H are also an eleent of H, it

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

arxiv: v1 [math.na] 10 Oct 2016

arxiv: v1 [math.na] 10 Oct 2016 GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

Tail estimates for norms of sums of log-concave random vectors

Tail estimates for norms of sums of log-concave random vectors Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

A Simple Homotopy Algorithm for Compressive Sensing

A Simple Homotopy Algorithm for Compressive Sensing A Siple Hootopy Algorith for Copressive Sensing Lijun Zhang Tianbao Yang Rong Jin Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China Departent of Coputer

More information

Lecture 20 November 7, 2013

Lecture 20 November 7, 2013 CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,

More information

GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL

GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL CHAO MA, XIN LIU, AND ZAIWEN WEN Abstract. In this paper, we consider a nonlinear least squares odel for the phase retrieval proble. Since

More information

Robust polynomial regression up to the information theoretic limit

Robust polynomial regression up to the information theoretic limit 58th Annual IEEE Syposiu on Foundations of Coputer Science Robust polynoial regression up to the inforation theoretic liit Daniel Kane Departent of Coputer Science and Engineering / Departent of Matheatics

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

A Probabilistic and RIPless Theory of Compressed Sensing

A Probabilistic and RIPless Theory of Compressed Sensing A Probabilistic and RIPless Theory of Copressed Sensing Eanuel J Candès and Yaniv Plan 2 Departents of Matheatics and of Statistics, Stanford University, Stanford, CA 94305 2 Applied and Coputational Matheatics,

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

arxiv: v5 [cs.it] 16 Mar 2012

arxiv: v5 [cs.it] 16 Mar 2012 ONE-BIT COMPRESSED SENSING BY LINEAR PROGRAMMING YANIV PLAN AND ROMAN VERSHYNIN arxiv:09.499v5 [cs.it] 6 Mar 0 Abstract. We give the first coputationally tractable and alost optial solution to the proble

More information

A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS

A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS Marta Martinez-Caara 1, Michael Mua 2, Abdelhak M. Zoubir 2, Martin Vetterli 1 1 School of Coputer and Counication

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Nonconvex Methods for Phase Retrieval

Nonconvex Methods for Phase Retrieval Nonconvex Methods for Phase Retrieval Vince Monardo Carnegie Mellon University March 28, 2018 Vince Monardo (CMU) 18.898 Midterm March 28, 2018 1 / 43 Paper Choice Main reference paper: Solving Random

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Fast and Memory Optimal Low-Rank Matrix Approximation

Fast and Memory Optimal Low-Rank Matrix Approximation Fast and Meory Optial Low-Rank Matrix Approxiation Yun Se-Young, Marc Lelarge, Alexandre Proutière To cite this version: Yun Se-Young, Marc Lelarge, Alexandre Proutière. Fast and Meory Optial Low-Rank

More information

arxiv: v1 [cs.lg] 8 Jan 2019

arxiv: v1 [cs.lg] 8 Jan 2019 Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008 LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA.

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA. Proceedings of the ASME International Manufacturing Science and Engineering Conference MSEC June -8,, Notre Dae, Indiana, USA MSEC-7 DISSIMILARIY MEASURES FOR ICA-BASED SOURCE NUMBER ESIMAION Wei Cheng,

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information

Hamming Compressed Sensing

Hamming Compressed Sensing Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information