Provable Non-convex Phase Retrieval with Outliers: Median Truncated Wirtinger Flow


Huishuai Zhang, Department of EECS, Syracuse University, Syracuse, NY 13244 USA
Yuejie Chi, Department of ECE, The Ohio State University, Columbus, OH 43210 USA
Yingbin Liang, Department of EECS, Syracuse University, Syracuse, NY 13244 USA

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).

Abstract

Solving systems of quadratic equations is a central problem in machine learning and signal processing. One important example is phase retrieval, which aims to recover a signal from only the magnitudes of its linear measurements. This paper focuses on the situation where the measurements are corrupted by arbitrary outliers, for which the recently developed non-convex gradient descent algorithms Wirtinger flow (WF) and truncated Wirtinger flow (TWF) are likely to fail. We develop a novel median-TWF algorithm that exploits the robustness of the sample median to resist arbitrary outliers in the initialization and in the gradient update of each iteration. We show that such a non-convex algorithm provably recovers the signal from a near-optimal number of measurements composed of i.i.d. Gaussian entries, up to a logarithmic factor, even when a constant fraction of the measurements is corrupted by arbitrary outliers. We further show that median-TWF is also robust when the measurements are corrupted by both arbitrary outliers and bounded noise. Our analysis of the performance guarantee is accomplished by developing non-trivial concentration results for median-related quantities, which may be of independent interest. We further provide numerical experiments to demonstrate the effectiveness of the approach.

1. Introduction

Phase retrieval is a classical problem in machine learning, signal processing and optical imaging, where one aims to recover a signal x in R^n from only the magnitudes of its linear measurements: y_i = |<a_i, x>|^2, i = 1, ..., m. It has many important applications such as X-ray crystallography (Drenth, 2007), but is known to be notoriously difficult due to the quadratic form of the measurements. Classical methods based on alternating minimization between the signal of interest and the phase information (Fienup, 1982), though computationally simple, are often trapped at local minima and lack rigorous performance guarantees. Using the lifting trick, the phase retrieval problem can be reformulated as estimating a rank-one positive semidefinite matrix X = xx^T from linear measurements (Balan et al., 2006), to which convex relaxations into semidefinite programming have been applied (Waldspurger et al., 2015; Candès et al., 2013; Chen et al., 2015; Demanet & Hand, 2014; Candès & Li, 2014; Li & Voroninski, 2013). In particular, when the measurement vectors a_i are composed of i.i.d. Gaussian entries, PhaseLift (Candès et al., 2013) perfectly recovers all x in R^n with high probability as long as the number of measurements m is on the order of n. However, the computational cost of PhaseLift becomes prohibitive when the signal dimension is large. Appealingly, the so-called Wirtinger flow (WF) algorithm based on gradient descent was recently proposed in (Candès et al., 2015; Soltanolkotabi, 2014) and shown to work remarkably well: it converges to the global optimum when properly initialized using the spectral method.
The truncated Wirtinger flow (TWF) algorithm (Chen & Candès, 2015) further improves WF by eliminating samples whose contributions to the initialization or to the search direction deviate excessively from the sample mean, so that the behavior of each gradient update is well controlled. TWF is shown to converge globally at a geometric rate using a constant step size as long as m is on the order of n for i.i.d. Gaussian measurement vectors. Both the WF and TWF algorithms have

been shown to be robust to bounded noise in the measurements. However, the performance of WF and TWF can be very sensitive to outliers that take arbitrary values and can introduce anomalous search directions. Even for TWF, since the sample mean can be arbitrarily perturbed, a truncation rule based on the sample mean cannot control the gradient well. On the other hand, the ability to handle outliers is of great importance for phase retrieval algorithms, because outliers arise frequently in phase imaging applications (Weller et al., 2015) due to various reasons such as detector failures, recording errors, and missing data. While a form of PhaseLift (Hand, 2015) is shown to be robust to sparse outliers even when they constitute a constant fraction of all measurements, it is computationally too expensive.

1.1. Main Contributions

The main contribution of this paper lies in the development of a non-convex phase retrieval algorithm with both statistical and computational efficiency, and provable robustness to even a constant proportion of outliers. To the best of the authors' knowledge, our work is the first application of the median to robustify high-dimensional statistical estimation in the presence of arbitrary outliers with rigorous non-asymptotic performance guarantees. Our strategy is to carefully robustify the TWF algorithm by replacing the sample mean used in the truncation rule with its robust counterpart, the sample median. We refer to the new algorithm as median truncated Wirtinger flow (median-TWF). Appealingly, median-TWF does not require any knowledge of the outliers. The robustness of the median lies in the fact that the median cannot be arbitrarily perturbed unless the outliers dominate the inliers (Huber, 2011). This is in sharp contrast to the mean, which can be made arbitrarily large even by a single outlier. Thus, using the sample median in the truncation rule can effectively remove the impact of outliers, and indeed, the performance of median-TWF can be provably guaranteed. Statistically, the sample complexity of median-TWF is near-optimal up to a logarithmic factor when the measurement vectors are composed of i.i.d. Gaussian entries. We demonstrate that as soon as the number of measurements m is on the order of n log n, median-TWF converges to the global optimum, i.e., recovers the ground truth up to a global sign difference, even when the number of outliers scales linearly with m. Computationally, median-TWF converges at a geometric rate, requiring a computational cost of O(mn log(1/ε)) to reach ε-accuracy, which is linear in the problem size. Reassuringly, under the same sample complexity, median-TWF still recovers the ground truth when outliers are absent; it can therefore handle outliers in an oblivious fashion. Finally, median-TWF is also stable when the measurements are further corrupted by dense bounded noise besides outliers.
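The robustness property invoked above can be seen in a toy computation. The following sketch is not taken from the paper; it simply contrasts how a single arbitrary outlier affects the sample mean and the sample median of otherwise clean samples.

```python
import numpy as np

# A single arbitrarily large outlier can move the sample mean by an arbitrary
# amount, while the sample median stays within the range of the clean samples.
rng = np.random.default_rng(0)
clean = rng.standard_normal(1000) ** 2        # clean nonnegative samples
corrupted = clean.copy()
corrupted[0] = 1e8                            # one arbitrary outlier

print(np.mean(clean), np.mean(corrupted))     # mean shifts by roughly 1e5
print(np.median(clean), np.median(corrupted)) # median barely moves
```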
Our proof proceeds by first showing that the initialization of median-TWF is close enough to the ground truth, and that the neighborhood of the ground truth where the initialization lands satisfies a certain Regularity Condition (Candès et al., 2015; Chen & Candès, 2015) that guarantees convergence of the descent rule, as long as the size of the corruption is small enough and the sample size is large enough. However, as a nonlinear operator, the sample median used in median-TWF is much more difficult to analyze than the sample mean used in TWF, which is a linear operator to which many existing concentration inequalities are readily applicable. Considerable technical effort lies in developing novel non-asymptotic concentration results for the sample median, and various statistical properties of sample-median-related quantities, which may be of independent interest.

1.2. Related Work

The adoption of the median in machine learning and computer science is not unfamiliar, for example, k-median clustering (Chen, 2006) and resilient data aggregation for sensor networks (Wagner, 2004). Our work here further extends the applications of the median to robustifying high-dimensional estimation problems. Another popular approach in robust estimation is to use the trimmed mean (Huber, 2011), which has found success in robustifying sparse regression (Chen et al., 2013), linear regression (Bhatia et al., 2015), subspace clustering (Qu & Xu, 2015), etc. However, using the trimmed mean requires knowledge of an upper bound on the number of outliers, whereas the median does not require such information. Developing non-convex algorithms with provable global convergence guarantees has attracted intensive research interest recently. A partial list includes low-rank matrix recovery (Keshavan et al., 2010; Zheng & Lafferty, 2015; Tu et al., 2015; Chen & Wainwright, 2015; White et al., 2015), robust principal component analysis (Netrapalli et al., 2014), robust tensor decomposition (Anandkumar et al., 2015), dictionary learning (Arora et al., 2015; Sun et al., 2015), etc. We expect our analysis in this paper may be extended to robustly solving other systems of quadratic or bilinear equations in a non-convex fashion, such as mixed linear regression (Chen et al., 2014), sparse phase retrieval (Cai et al., 2015), and blind deconvolution (Lee et al., 2015).

1.3. Paper Organization and Notations

The rest of this paper is organized as follows. Section 2 provides the problem formulation, and Section 3 describes the proposed median-TWF algorithm. Theoretical performance guarantees are stated in Section 4, with proof outlines given in Section 5. Numerical experiments are presented in Section 6. Finally, we conclude in Section 7. We adopt the following notations in this paper.

Given a vector of numbers {β_i}_{i=1}^m, the sample median is denoted as med({β_i}_{i=1}^m). The indicator 1{A} equals 1 if the event A holds, and 0 otherwise. For two matrices, A is dominated by B (A ⪯ B) if B − A is a positive semidefinite matrix. We define the Euclidean distance between two vectors up to a global sign difference as dist(z, x) := min{||z − x||, ||z + x||}.

2. Problem Formulation

Suppose the following set of measurements is given:

    y_i = |<a_i, x>|^2 + η_i,  i = 1, ..., m,    (1)

where x in R^n is the unknown signal, a_i in R^n for i = 1, ..., m are measurement vectors with i.i.d. Gaussian entries distributed as N(0, 1), and η_i in R for i = 1, ..., m are outliers with arbitrary values. (We focus on real signals here, but our analysis can be extended to complex signals.) We assume that the outliers are sparse with at most s·m nonzero values, i.e., ||η||_0 ≤ s·m, where η = {η_i}_{i=1}^m. Here, s is a nonzero constant representing the fraction of measurements that are corrupted by outliers. We are also interested in the model where the measurements are corrupted not only by sparse arbitrary outliers but also by dense bounded noise. Under such a model, the measurements are given by

    y_i = |<a_i, x>|^2 + w_i + η_i,  i = 1, ..., m,    (2)

where the bounded noise w = {w_i}_{i=1}^m satisfies ||w||_inf ≤ c||x||^2 for some universal constant c, and, as before, the outliers satisfy ||η||_0 ≤ s·m. The goal is to recover the signal x (up to a global sign difference) from the measurements y = {y_i}_{i=1}^m and the measurement vectors {a_i}_{i=1}^m.
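For concreteness, a small sketch of how data from model (1) can be simulated is given below. The function name and the choice of uniformly distributed outlier magnitudes are assumptions made for illustration (they match the experiments in Section 6); the model itself allows arbitrary outlier values.

```python
import numpy as np

def generate_measurements(x, m, s=0.0, eta_max=0.0, rng=None):
    """Simulate model (1): y_i = <a_i, x>^2 + eta_i with i.i.d. Gaussian a_i and
    an s-fraction of outliers; here the outliers are drawn uniformly in [0, eta_max]."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    A = rng.standard_normal((m, n))        # rows are the measurement vectors a_i
    y = (A @ x) ** 2                       # clean quadratic measurements
    corrupt = rng.random(m) < s            # each measurement corrupted w.p. s
    y[corrupt] += rng.uniform(0.0, eta_max, corrupt.sum())
    return A, y
```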
3. Median-TWF Algorithm

A natural idea is to recover the signal as a solution to the following optimization problem:

    min_z  −(1/m) Σ_{i=1}^m ℓ(z; y_i),    (3)

where ℓ(z; y_i) is a log-likelihood function, e.g., under a Gaussian or Poisson likelihood. Since the measurements are quadratic in x, the objective function is non-convex. A typical gradient descent procedure to solve (3) proceeds as

    z^(t+1) = z^(t) + (μ_t/m) Σ_{i=1}^m ∇ℓ(z^(t); y_i),    (4)

where z^(t) denotes the t-th iterate of the algorithm and μ_t is the step size. With a careful initialization using the spectral method, the WF algorithm (Candès et al., 2015) using the gradient descent update (4) is shown to converge globally under the Gaussian likelihood (i.e., quadratic loss) function, as long as the number of measurements m is on the order of n log n for i.i.d. Gaussian measurement vectors. The TWF algorithm (Chen & Candès, 2015) improves WF in both the initialization and the descent rule: only a subset T_0 of samples contributes to the spectral method, and only a subset of data-dependent and iteration-varying samples T_{t+1} contributes to the search direction:

    z^(t+1) = z^(t) + (μ_t/m) Σ_{i in T_{t+1}} ∇ℓ(z^(t); y_i).    (5)

The idea is that, through pruning, samples whose gradient components ∇ℓ(z^(t); y_i) are much larger than the sample mean are truncated so that each update is well controlled. This modification yields both statistical and computational benefits: under the Poisson loss function, TWF converges globally and geometrically to the true signal with a constant step size, with a number of measurements on the order of n for i.i.d. Gaussian measurement vectors. However, if some measurements are corrupted by arbitrary-valued outliers as in (1), both WF and TWF can fail. This is because the gradient of the loss function typically contains the term y_i − |a_i^T z|^2. With y_i corrupted by an arbitrarily large η_i, the gradient can deviate the search direction from the signal arbitrarily. In TWF, since the truncation rule is based on the sample mean of the gradient, which can be affected significantly even by a single outlier, we cannot expect it to converge globally, particularly when the number of corrupted measurements is a constant fraction of the total number of measurements, which is the regime we are interested in.

To handle outliers, our central idea is to prune samples in both the initialization and each iteration via sample-median-related quantities. Compared to the sample mean used in TWF, the sample median is much less affected even in the presence of a certain linear fraction of outliers, and is thus more robust to outliers with arbitrary values. In the following, we describe median-TWF in more detail. We adopt the Poisson likelihood function

    ℓ(z; y_i) = y_i log |a_i^T z|^2 − |a_i^T z|^2,    (6)

which is motivated by the maximum likelihood estimation of the signal when the measurements are corrupted by Poisson distributed noise. We note that our analysis is also applicable to the quadratic loss function, but in order to compare more directly to TWF in (Chen & Candès, 2015), we adopt the Poisson likelihood function in (6). Our median-TWF algorithm (summarized in Algorithm 1) contains the following main steps.

1. Initialization: We initialize z^(0) by the spectral method with a truncated set of samples, where the threshold is determined by the median of {y_i}.
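A compact sketch of this median-truncated spectral initialization (formalized as (7) in Algorithm 1 below) is given here. The power iteration and its fixed iteration count are implementation choices of the sketch, not prescribed by the paper.

```python
import numpy as np

def median_spectral_init(A, y, alpha_y=3.0):
    """Median-truncated spectral initialization, cf. (7):
    lam0 estimates ||x|| via sqrt(med(y)/0.455), and only samples with
    y_i <= (alpha_y * lam0)^2 enter the matrix Y whose leading eigenvector
    gives the direction estimate."""
    m, n = A.shape
    lam0 = np.sqrt(np.median(y) / 0.455)          # norm estimate of the signal
    keep = y <= (alpha_y * lam0) ** 2             # median-based truncation of samples
    Y = (A[keep].T * y[keep]) @ A[keep] / m       # (1/m) * sum of y_i a_i a_i^T over kept i
    z = np.random.default_rng(0).standard_normal(n)
    for _ in range(100):                          # power iteration for the leading eigenvector
        z = Y @ z
        z /= np.linalg.norm(z)
    return lam0 * z
```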

Algorithm 1: Median Truncated Wirtinger Flow (median-TWF)

Input: y = {y_i}_{i=1}^m, {a_i}_{i=1}^m; parameters: thresholds α_y, α_h, α_l, α_u, step size μ_t.

Initialization: Let z^(0) = λ_0 z~, where λ_0 = sqrt(med({y_i})/0.455) and z~ is the leading eigenvector of

    Y := (1/m) Σ_{i=1}^m y_i a_i a_i^T 1{y_i ≤ α_y^2 λ_0^2}.    (7)

Gradient loop: for t = 0 to T − 1 do

    z^(t+1) = z^(t) + (2μ_t/m) Σ_{i=1}^m [(y_i − |a_i^T z^(t)|^2) / (a_i^T z^(t))] a_i 1{E1_i ∩ E2_i},    (8)

where

    E1_i := { α_l ||z^(t)|| ≤ |a_i^T z^(t)| ≤ α_u ||z^(t)|| },
    E2_i := { |y_i − |a_i^T z^(t)|^2| ≤ α_h K_t |a_i^T z^(t)| / ||z^(t)|| },
    K_t := med({ |y_i − |a_i^T z^(t)|^2| }_{i=1}^m).

Output z^(T).

In comparison, WF does not truncate samples, and the truncation in TWF is based on the mean of {y_i}, which is not robust to outliers. As will be shown, as long as the fraction of outliers is not too large, our initialization (7) is guaranteed to be within a small neighborhood of the ground-truth signal.

2. Gradient loop: for each iteration 0 ≤ t ≤ T − 1, comparing (4) and (8), median-TWF uses an iteration-varying truncated gradient given as

    ∇ℓ_tr(z^(t)) := (2/m) Σ_{i=1}^m [(y_i − |a_i^T z^(t)|^2) / (a_i^T z^(t))] a_i 1{E1_i ∩ E2_i},    (9)

so that (8) reads z^(t+1) = z^(t) + μ_t ∇ℓ_tr(z^(t)). It is clear from the definition of the set E2_i (see Algorithm 1) that samples are truncated by the sample median of the gradient components evaluated at the current iterate, as opposed to the sample mean in TWF. We set the step size in median-TWF to a fixed small constant, i.e., μ_t = 0.2. The remaining parameters {α_y, α_h, α_l, α_u} are set to satisfy

    ζ_1 := max{ E[ξ^2 1{|ξ| < 1.01 α_l or |ξ| > 0.99 α_u}], E[1{|ξ| < 1.01 α_l or |ξ| > 0.99 α_u}] },
    ζ_2 := E[ξ^2 1{|ξ| > 0.248 α_h}],    (10)
    2(ζ_1 + ζ_2) + sqrt(8/π)/α_h < 1.99,  α_y ≥ 3,

where ξ ~ N(0, 1). For example, we set α_l = 0.3, α_u = 5, α_y = 3 and α_h = 12, and consequently ζ_1 ≈ 0.24 and ζ_2 ≈ 0.03.
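The gradient loop of Algorithm 1 can be sketched as follows, reusing median_spectral_init from the sketch above. The parameter defaults mirror the example values quoted in this section and should be read as illustrative choices, not as the authors' reference implementation.

```python
import numpy as np

def median_twf(A, y, T=500, mu=0.2, alpha_l=0.3, alpha_u=5.0, alpha_h=12.0, alpha_y=3.0):
    """Sketch of median-TWF (Algorithm 1): median-truncated spectral initialization
    followed by the median-truncated gradient updates (8)-(9)."""
    m, n = A.shape
    z = median_spectral_init(A, y, alpha_y)             # step 1: initialization
    for _ in range(T):
        Az = A @ z
        resid = y - Az ** 2
        K_t = np.median(np.abs(resid))                  # K_t: sample median of the residuals
        norm_z = np.linalg.norm(z)
        E1 = (np.abs(Az) >= alpha_l * norm_z) & (np.abs(Az) <= alpha_u * norm_z)
        E2 = np.abs(resid) <= alpha_h * K_t * np.abs(Az) / norm_z
        keep = E1 & E2                                  # median-based truncation
        grad = 2.0 * A[keep].T @ (resid[keep] / Az[keep])   # truncated gradient, cf. (9)
        z = z + (mu / m) * grad                         # update (8)
    return z
```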
4. Performance Guarantees of Median-TWF

In this section, we characterize the performance guarantees of median-TWF.

Theorem 1 (Exact recovery with sparse arbitrary outliers). Consider the phase retrieval problem with sparse outliers given in (1). There exist constants μ_0, s_0 > 0, 0 < ρ, ν < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n, s < s_0, and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≤ ν (1 − ρ)^t ||x||,  for all t in N,    (11)

simultaneously for all x in R^n \ {0}.

Theorem 1 indicates that median-TWF admits exact recovery for all signals in the presence of sparse outliers with arbitrary magnitudes, even when the number of outliers scales linearly with the number of measurements, as long as the number of samples satisfies m ≳ n log n. This is near-optimal up to a logarithmic factor. Moreover, median-TWF converges at a geometric rate using a constant step size, with per-iteration cost O(mn) (note that the median can be computed in linear time (Tibshirani, 2008)). To reach ε-accuracy, i.e., dist(z^(t), x) ≤ ε, only O(log(1/ε)) iterations are needed, and the total computational cost is O(mn log(1/ε)), which is highly efficient. Not surprisingly, Theorem 1 implies that median-TWF also performs well for the noise-free model, as a special case of the model with outliers. This justifies using median-TWF in an oblivious situation without knowing whether the underlying measurements are corrupted by outliers.

Corollary 1 (Exact recovery for the noise-free model). Suppose that the measurements are noise-free, i.e., η_i = 0 for i = 1, ..., m in the model (1). There exist constants μ_0 > 0, 0 < ρ, ν < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≤ ν (1 − ρ)^t ||x||,  for all t in N,    (12)

simultaneously for all x in R^n \ {0}.

We next consider the model where the measurements are corrupted by both sparse arbitrary outliers and dense bounded noise. The following theorem shows that median-TWF is robust to the coexistence of the two types of noise.

Theorem 2 (Stability to sparse arbitrary outliers and dense bounded noise). Consider the phase retrieval problem given in (2), in which the measurements are corrupted by both sparse arbitrary outliers and dense bounded noise. There exist constants μ_0, s_0 > 0, 0 < ρ < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n, s < s_0, and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≲ ||w||_inf/||x|| + (1 − ρ)^t ||x||,  for all t in N,    (13)

simultaneously for all x in R^n \ {0}.

Theorem 2 immediately implies the stability of median-TWF for the model corrupted only by dense bounded noise.

Corollary 2. Consider the phase retrieval problem in which the measurements are corrupted only by dense bounded noise, i.e., η_i = 0 for i = 1, ..., m in the model (2). There exist constants μ_0 > 0, 0 < ρ < 1 and c_0, c_1, c_2 > 0 such that if m ≥ c_0 n log n and μ ≤ μ_0, then with probability at least 1 − c_1 exp(−c_2 m), median-TWF yields

    dist(z^(t), x) ≲ ||w||_inf/||x|| + (1 − ρ)^t ||x||,  for all t in N,    (14)

simultaneously for all x in R^n \ {0}.

Thus, Theorem 2 and Corollary 2 imply that median-TWF for the model with both sparse arbitrary outliers and dense bounded noise achieves the same convergence rate and the same level of estimation error as for the model with only bounded noise. In fact, together with Theorem 1 and Corollary 1, it can be seen that applying median-TWF does not require knowledge of the noise corruption model: when outliers do exist, median-TWF achieves the same performance as if the outliers did not exist.

5. Proof Outlines of Main Results

In this section, we first develop a few statistical properties of the median that will be useful for our analysis of the performance guarantees, and then provide a proof sketch of our main results. The details of the proofs can be found in the Supplemental Materials.

5.1. Properties of the Median

We start with the definitions of the quantile of a population distribution and its sample version.

Definition 1 (Generalized quantile function). Let 0 < p < 1. For a cumulative distribution function (CDF) F, the generalized quantile function is defined as

    F^{-1}(p) = inf{x in R : F(x) ≥ p}.    (15)

For simplicity, denote θ_p(F) := F^{-1}(p) as the p-quantile of F. Moreover, for a sample sequence {X_i}_{i=1}^m, the sample p-quantile θ_p({X_i}) means θ_p(F^), where F^ is the empirical distribution of the samples {X_i}.

Remark 1. We take the median as med({X_i}) = θ_{1/2}({X_i}) and use both notations interchangeably.

Next, we show that as long as the sample size is large enough, the sample quantile concentrates around the population quantile (motivated by (Charikar et al., 2002)), as stated in Lemma 1.

Lemma 1. Suppose F(·) is a cumulative distribution function (i.e., non-decreasing and right-continuous) with continuous density function F'(·). Assume the samples {X_i}_{i=1}^m are drawn i.i.d. from F. Let 0 < p < 1. If l < F'(θ) < L for all θ in {θ : |θ − θ_p| ≤ ε}, then

    |θ_p({X_i}_{i=1}^m) − θ_p(F)| < ε    (16)

holds with probability at least 1 − 2 exp(−2 m ε^2 l^2).

Lemma 2 bounds the distance between the order statistics of two sequences.

Lemma 2. Given a vector X = (X_1, X_2, ..., X_n), reorder it in a non-decreasing manner: X_(1) ≤ X_(2) ≤ ... ≤ X_(n−1) ≤ X_(n). Given another vector Y = (Y_1, Y_2, ..., Y_n) reordered in the same way, one has

    |X_(k) − Y_(k)| ≤ ||X − Y||_inf    (17)

for all k = 1, ..., n.

Lemma 3 shows that, in the presence of outliers, one can lower and upper bound the sample median by neighboring quantiles of the corresponding clean samples.

Lemma 3. Consider clean samples {X~_i}_{i=1}^m. If a fraction s of them is corrupted by outliers, one obtains contaminated samples {X_i}_{i=1}^m which contain s·m corrupted samples and (1 − s)m clean samples. Then

    θ_{1/2−s}({X~_i}) ≤ θ_{1/2}({X_i}) ≤ θ_{1/2+s}({X~_i}).

Finally, Lemma 4 bounds the median, and the density at the median, of the absolute product of two possibly correlated standard Gaussian random variables.

Lemma 4. Let u, v ~ N(0, 1) be possibly correlated with correlation coefficient ρ. Let r = |uv|, and let ψ_ρ(x) denote the density of r. Denote by θ_{1/2}(ψ_ρ) the median of r, and by ψ_ρ(θ_{1/2}) the value of ψ_ρ at the median. Then for all ρ, θ_{1/2}(ψ_ρ) is bounded between two absolute positive constants, with θ_{1/2}(ψ_ρ) < 0.455, and 0.47 < ψ_ρ(θ_{1/2}) < 0.76.
5.2. Robust Initialization with Outliers

We show that the initialization provided by the median-truncated spectral method in (7) is close enough to the ground truth, i.e., dist(z^(0), x) ≤ (1/11)||x||, even if there are s·m arbitrary outliers, as long as s is a small enough constant.

1. We first bound the concentration of med({y_i}), also denoted θ_{1/2}({y_i}). Lemma 3 suggests that

    θ_{1/2−s}({(a_i^T x)^2}) ≤ θ_{1/2}({y_i}) ≤ θ_{1/2+s}({(a_i^T x)^2}).

Observe that (a_i^T x)^2 = |a~_i|^2 ||x||^2, where a~_i = a_i^T x/||x|| is a standard Gaussian random variable. Thus |a~_i|^2 is a χ^2 random variable with one degree of freedom, whose CDF is denoted by K. By Lemma 1, for a small ε, one has |θ_{1/2−s}({|a~_i|^2}) − θ_{1/2−s}(K)| < ε and |θ_{1/2+s}({|a~_i|^2}) − θ_{1/2+s}(K)| < ε with probability at least 1 − exp(−c m ε^2). Thus, letting ζ_s^− := θ_{1/2−s}(K) and ζ_s^+ := θ_{1/2+s}(K), we have with probability at least 1 − exp(−c m ε^2)

    (ζ_s^− − ε) ||x||^2 < θ_{1/2}({y_i}) < (ζ_s^+ + ε) ||x||^2,    (18)

where ζ_s^− and ζ_s^+ can be made arbitrarily close if s is small enough.

2. We next estimate the direction of x. The norm ||x|| can be estimated as in Algorithm 1; for simplicity of presentation, we assume ||x|| = 1. Using (18), the matrix Y in (7) can be bounded as Y_1 ⪯ Y ⪯ Y_2 with high probability, where

    Y_1 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^− − ε)/0.455},
    Y_2 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^+ + ε)/0.455} + (1/m) Σ_{i in S} a_i a_i^T α_y^2 (ζ_s^+ + ε)/0.455,

and S denotes the support of the outliers. It can then be shown, by concentration of random matrices with non-isotropic sub-Gaussian rows (Vershynin, 2012), that Y_1 and Y_2 concentrate around their means, which can be made arbitrarily close for s sufficiently small. Together with the fact that the means of both Y_1 and Y_2 have leading eigenvector x, one can conclude that the leading eigenvector of Y can be made close enough to x for a sufficiently small constant s.

5.3. Geometric Convergence

We now show that, within a small neighborhood of the ground truth, the truncated gradient (9) satisfies the Regularity Condition (RC) (Candès et al., 2015; Chen & Candès, 2015), which guarantees the geometric convergence of median-TWF once the initialization lands in this neighborhood.

Definition 2. The gradient ∇ℓ_tr(z) is said to satisfy the Regularity Condition RC(μ, λ, ε) if

    <∇ℓ_tr(z), h> ≥ (μ/2) ||∇ℓ_tr(z)||^2 + (λ/2) ||h||^2    (19)

for all z and h = z − x obeying ||h|| ≤ ε ||z||.

We first show that ∇ℓ_tr(z) satisfies the RC for the noise-free case, and then extend the argument to the model (1) with sparse outliers, thus establishing the global convergence of median-TWF in both cases. The central step in establishing the RC is to show that the sample median used in the truncation rule concentrates at the level ||z − x|| ||z||, as stated in the following proposition.

Proposition 1. If m > c_0 n log n, then with probability at least 1 − c_1 exp(−c_2 m), there exist constants β and β' such that

    β ||z|| ||h|| ≤ med({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ β' ||z|| ||h||

holds for all z and h := z − x satisfying ||h|| < (1/11)||z||.

We note that a similar property for the sample mean has been shown in (Chen & Candès, 2015) as long as the number of measurements is O(n). The median is much more challenging to handle due to its non-linearity, which also accounts for the slightly larger number of measurements required compared to the sample mean. We next briefly sketch how we exploit the properties of the median developed in Section 5.1 to show Proposition 1. First fix x and z satisfying ||x − z|| < (1/11)||z||. It can be shown that (a_i^T x)^2 − (a_i^T z)^2 = c~ u_i v_i ||h|| ||z||, where u_i and v_i are correlated N(0, 1) Gaussian random variables and 1.89 < c~ < 2. Hence, Lemma 4 and Lemma 1 imply that

    (0.65 − ε) ||z|| ||h|| ≤ med({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ (0.9 + ε) ||z|| ||h||    (20)

for given x and z with high probability. Then, applying a net covering argument, we prove that (20) holds for all z and x with ||z − x|| ≤ (1/11)||z||. In particular, Lemma 2 is applied to bound the distance between the medians at two points on and off the net.
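A quick Monte Carlo check of the concentration in (20), not part of the proof, can be run as follows; the dimensions and the size of the perturbation h are arbitrary choices made for illustration.

```python
import numpy as np

# Empirically, med(|<a_i, x>^2 - <a_i, z>^2|) / (||z|| * ||h||) should fall
# roughly between the constants 0.65 and 0.9 appearing in (20).
rng = np.random.default_rng(1)
n, m = 100, 50_000
x = rng.standard_normal(n)
h = rng.standard_normal(n)
h *= 0.05 * np.linalg.norm(x) / np.linalg.norm(h)   # ||h|| well within ||z||/11
z = x + h
A = rng.standard_normal((m, n))
med = np.median(np.abs((A @ x) ** 2 - (A @ z) ** 2))
print(med / (np.linalg.norm(z) * np.linalg.norm(h)))
```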
For the model (1) with outliers, we show that med({|y_i − (a_i^T z)^2|}) continues to enjoy the property in Proposition 1 even in the presence of a small constant fraction of outliers. This can be accomplished by first observing that

    θ_{1/2−s}({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ θ_{1/2}({ |y_i − (a_i^T z)^2| }) ≤ θ_{1/2+s}({ |(a_i^T x)^2 − (a_i^T z)^2| }),    (21)

using Lemma 3, and then extending (20) to the quantiles θ_{1/2−s} and θ_{1/2+s} for a small constant s. Putting all of this together yields bounds for both sides of (21) at the level of ||z|| ||h||.

Remark 2. With ε small enough, β = 0.6 together with a corresponding absolute constant β' is a valid choice for Proposition 1. We set the algorithm parameters based on these two values.

5.4. Stability with Additional Dense Bounded Noise

Now consider the model (2) with both sparse outliers and dense bounded noise. We omit the analysis of the initialization step as it is similar to Section 5.2. We split the analysis of the gradient loop into two regimes.

Regime 1: Assume c_4 ||z|| ≥ ||h|| ≥ c_3 ||w||_inf/||z||. Lemma 3 implies

    θ_{1/2−s}({ |y~_i − (a_i^T z)^2| }) ≤ med({ |y_i − (a_i^T z)^2| }) ≤ θ_{1/2+s}({ |y~_i − (a_i^T z)^2| }),    (22)

where y~_i := (a_i^T x)^2 + w_i, i.e., the measurements corrupted only by the bounded noise. Moreover, Lemma 2 and the assumption of the regime imply

    θ_{1/2+s}({ |y~_i − (a_i^T z)^2| }) ≤ θ_{1/2+s}({ |(a_i^T x)^2 − (a_i^T z)^2| }) + ||w||_inf,
    θ_{1/2−s}({ |y~_i − (a_i^T z)^2| }) ≥ θ_{1/2−s}({ |(a_i^T x)^2 − (a_i^T z)^2| }) − ||w||_inf.    (23)

Therefore, combining the above inequalities with Proposition 1, we have

    β ||x − z|| ||z|| ≤ med({ |y_i − (a_i^T z)^2| }) ≤ β' ||x − z|| ||z||,

implying that the RC holds in Regime 1, and the error decreases geometrically in each iteration.

Regime 2: Assume ||h|| ≤ c_3 ||w||_inf/||z||. Since each update μ ∇ℓ_tr(z) is at most on the order of ||w||_inf/||z||, the estimation error cannot increase by more than ||w||_inf/||z|| times a constant factor. Thus, one has

    dist(z + μ ∇ℓ_tr(z), x) ≤ c_5 ||w||_inf/||x||    (24)

for some constant c_5. As long as ||w||_inf/||x||^2 is sufficiently small, it is guaranteed that c_5 ||w||_inf/||x|| ≤ c_4 ||x||. If an iteration jumps out of Regime 2, it falls into Regime 1 described above.

6. Numerical Experiments

In this section, we provide numerical experiments that demonstrate the effectiveness of median-TWF and corroborate our theoretical findings. We first show that, in the noise-free case, median-TWF performs similarly to TWF (Chen & Candès, 2015) in terms of exact recovery. We set the parameters of median-TWF as specified in Section 3, and those of TWF as suggested in (Chen & Candès, 2015). Let the signal length n take values from 1000 to 10000 with a step size of 1000, and the ratio of the number of measurements to the signal length, m/n, take values from 2 to 6 with a step size of 0.1. For each pair (m, n), we generate a signal x ~ N(0, I_{n×n}) and measurement vectors a_i ~ N(0, I_{n×n}) i.i.d. for i = 1, ..., m. For both algorithms, a fixed number of iterations T = 500 is run, and a trial is declared successful if the output z^(T) of the algorithm satisfies dist(z^(T), x)/||x|| ≤ 10^{-8}. Figure 1 shows the number of successful trials out of 20 trials for both algorithms, with respect to m/n and n. It can be seen that as soon as m is above 4n, exact recovery is achieved by both algorithms. Around the phase transition boundary, the performance of median-TWF is slightly worse than that of TWF, which is possibly due to the inefficiency of the median compared to the mean in the noise-free case (Huber, 2011).

Figure 1. Phase transition of median-TWF (a) and TWF (b) for noise-free data: the gray scale of each cell (m/n, n) indicates the number of successful recoveries out of 20 trials.

We next examine the performance of median-TWF in the presence of sparse outliers. We compare median-TWF not only with TWF but also with an alternative which we call trimean-TWF, based on replacing the sample mean in TWF with the trimmed mean. More specifically, trimean-TWF requires knowing the fraction s of outliers, so that the s·m samples corresponding to the largest gradient values are removed, and truncation is then based on the mean of the remaining samples. We fix the signal length to n = 1000 and the number of measurements m to a fixed value. We assume each measurement y_i is corrupted with probability s in [0, 0.4] independently, where the corruption value η_i ~ U(0, ||η||_inf) is randomly generated from a uniform distribution.
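A sketch of this outlier experiment, reusing generate_measurements and median_twf from the sketches above, is given below. The number of measurements, the trial count, and the relative outlier magnitude are placeholders, since the paper's exact experimental values are those reported with Figure 2.

```python
import numpy as np

def outlier_success_rate(n=1000, m=6000, s=0.1, eta_rel=10.0, trials=20, seed=0):
    """Estimate the exact-recovery rate of median-TWF when each measurement is
    corrupted with probability s by a uniform outlier of magnitude up to
    eta_rel * ||x||^2. Placeholder settings, not the paper's exact configuration."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        x = rng.standard_normal(n)
        eta_max = eta_rel * np.linalg.norm(x) ** 2
        A, y = generate_measurements(x, m, s=s, eta_max=eta_max, rng=rng)
        z = median_twf(A, y, T=500)
        err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
        wins += int(err <= 1e-8)
    return wins / trials
```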
Figure 2 shows the success rate of exact recovery over 100 trials as a function of s at different levels of outlier magnitude ||η||_inf/||x||^2 = 0.1, 1, 10, 100, for the three algorithms median-TWF, trimean-TWF and TWF. From Figure 2, it can be seen that median-TWF allows exact recovery as long as s is not too large, for all levels of outlier magnitude and without any knowledge of the outliers, which validates our theoretical analysis. Unsurprisingly, TWF fails quickly even with a very small fraction of outliers: no successful instance is observed for TWF once s ≥ 0.02, irrespective of the value of ||η||_inf. Trimean-TWF does not exhibit as sharp a phase transition as median-TWF, and in general underperforms median-TWF, except when both ||η||_inf and s get very large; see Figure 2(d).

Figure 2. Success rate of exact recovery with outliers for median-TWF, trimean-TWF, and TWF at different levels of outlier magnitude: (a) ||η||_inf = 0.1||x||^2, (b) ||η||_inf = ||x||^2, (c) ||η||_inf = 10||x||^2, (d) ||η||_inf = 100||x||^2.

Figure 3. Relative error with respect to the iteration count for median-TWF and TWF with both dense noise and sparse outliers, and for TWF with only dense noise. The performance of median-TWF with both dense noise and sparse outliers is comparable to that of TWF without outliers. (a) and (b): uniform noise at different levels, ||w||_inf = 0.01||x||^2 and ||w||_inf = 0.001||x||^2; (c): Poisson noise.

This is because in this range with large outliers, knowing the fraction s of outliers provides a substantial advantage for trimean-TWF in eliminating them, while the sample median can deviate significantly from the true median for large s. Moreover, it is worth mentioning that exact recovery is more challenging for median-TWF when the magnitudes of most outliers are comparable to the measurements, as in Figure 2(c). In such a case, the outliers are more difficult to exclude than in the case with very large outlier magnitudes as in Figure 2(d); meanwhile, the outlier magnitudes in Figure 2(c) are large enough to affect the accuracy heavily, in contrast to the cases in Figures 2(a) and 2(b) where the outliers are less prominent.

We now examine the performance of median-TWF in the presence of both sparse outliers and dense bounded noise. The entries of the dense bounded noise w are generated independently from U(0, ||w||_inf), with ||w||_inf/||x||^2 = 0.001 and 0.01 respectively. The entries of the outliers are then generated as η_i ~ ||w||_inf · Bernoulli(0.1) independently. Figures 3(a) and 3(b) depict the relative error dist(z^(t), x)/||x|| with respect to the iteration count t, for uniform noise at different levels. It can be seen that median-TWF under outlier corruption clearly outperforms TWF in the same situation, and acts as if the outliers did not exist, achieving almost the same accuracy as TWF with no outliers. Moreover, the solution accuracy of median-TWF improves by a factor of 10 from Figure 3(a) to Figure 3(b) as ||w||_inf shrinks by 1/10, which corroborates Theorem 2 nicely. Finally, we consider measurements corrupted both by Poisson noise and by outliers, which models photon detection in optical imaging applications. We generate each measurement as y_i ~ Poisson(|<a_i, x>|^2) for i = 1, ..., m, which is then corrupted with probability s = 0.1 by outliers. The entries of the outliers are obtained by first generating η_i ~ ||x||^2 · U(0, 1) independently, and then rounding to the nearest integer. Figure 3(c) depicts the relative error dist(z^(t), x)/||x|| with respect to the iteration count t, where again median-TWF under both Poisson noise and sparse outliers has almost the same accuracy as TWF under only Poisson noise.

7. Conclusions

In this paper, we proposed median-TWF and showed that it allows exact recovery for robust phase retrieval even with a constant proportion of arbitrary outliers. This is in contrast to the recently developed WF and TWF, which are likely to fail under outlier corruption. We anticipate that the sample median can be applied to designing provably robust non-convex algorithms for other inference problems under sparse arbitrary corruptions.
The techniques we develop here to analyze performance guarantees for median-based algorithms should be useful in those contexts as well.

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments. We also thank Yining Wang and Yu-Xiang Wang at Carnegie Mellon University for helpful discussions on statistical properties of the median. The research of Huishuai Zhang and Yingbin Liang is supported by a grant from AFOSR. The work of Y. Chi is supported in part by grants from NSF, AFOSR and ONR.

References

Anandkumar, A., Jain, P., Shi, Y., and Niranjan, U. N. Tensor vs. matrix methods: Robust tensor decomposition under block sparse perturbations. arXiv preprint, 2015.

Arora, S., Ge, R., Ma, T., and Moitra, A. Simple, efficient, and neural algorithms for sparse coding. arXiv preprint, 2015.

Balan, R., Casazza, P., and Edidin, D. On signal reconstruction without phase. Applied and Computational Harmonic Analysis, 20(3), 2006.

Bhatia, K., Jain, P., and Kar, P. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems (NIPS), 2015.

Cai, T. T., Li, X., and Ma, Z. Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow. arXiv preprint, 2015.

Candès, E. J. and Li, X. Solving quadratic equations via PhaseLift when there are about as many equations as unknowns. Foundations of Computational Mathematics, 14(5), 2014.

Candès, E. J., Strohmer, T., and Voroninski, V. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8), 2013.

Candès, E. J., Li, X., and Soltanolkotabi, M. Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4), 2015.

Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. In Automata, Languages and Programming. Springer, 2002.

Chen, K. On k-median clustering in high dimensions. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2006.

Chen, Y. and Candès, E. J. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems (NIPS), 2015.

Chen, Y. and Wainwright, M. J. Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees. arXiv preprint, 2015.

Chen, Y., Caramanis, C., and Mannor, S. Robust sparse regression under adversarial corruption. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.

Chen, Y., Yi, X., and Caramanis, C. A convex formulation for mixed regression with two components: Minimax optimal rates. In Conference on Learning Theory, 2014.

Chen, Y., Chi, Y., and Goldsmith, A. J. Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Transactions on Information Theory, 61(7), July 2015.

Demanet, L. and Hand, P. Stable optimizationless recovery from phaseless linear measurements. Journal of Fourier Analysis and Applications, 20(1), 2014.

Donahue, James D. Products and quotients of random variables and their applications. Technical report, DTIC Document, 1964.

Drenth, J. X-Ray Crystallography. Wiley Online Library, 2007.

Fienup, J. R. Phase retrieval algorithms: a comparison. Applied Optics, 21(15), 1982.

Hand, P. PhaseLift is robust to a constant fraction of arbitrary errors. arXiv preprint, 2015.

Huber, P. J. Robust Statistics. Springer, 2011.

Keshavan, R. H., Montanari, A., and Oh, S. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6), June 2010.

Lee, K., Li, Y., Junge, M., and Bresler, Y. Blind recovery of sparse signals from subsampled convolution. arXiv preprint, 2015.

Li, X. and Voroninski, V. Sparse signal recovery from quadratic measurements via convex programming. SIAM Journal on Mathematical Analysis, 2013.

Netrapalli, P., Niranjan, U. N., Sanghavi, S., Anandkumar, A., and Jain, P. Non-convex robust PCA. In Advances in Neural Information Processing Systems (NIPS), 2014.

Qu, C. and Xu, H.
Subspace clustering with irrelevant features via robust Dantzig selector. In Advances in Neural Information Processing Systems (NIPS), 2015.

Soltanolkotabi, M. Algorithms and Theory for Clustering and Nonconvex Quadratic Programming. PhD thesis, Stanford University, 2014.

Sun, J., Qu, Q., and Wright, J. Complete dictionary recovery using nonconvex optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.

Tibshirani, R. J. Fast computation of the median by successive binning. arXiv preprint, 2008.

Tu, S., Boczar, R., Soltanolkotabi, M., and Recht, B. Low-rank solutions of linear matrix equations via Procrustes flow. arXiv preprint, 2015.

Vershynin, R. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing, Theory and Applications, 2012.

Wagner, D. Resilient aggregation in sensor networks. In Proceedings of the 2nd ACM Workshop on Security of Ad Hoc and Sensor Networks. ACM, 2004.

Waldspurger, I., d'Aspremont, A., and Mallat, S. Phase recovery, MaxCut and complex semidefinite programming. Mathematical Programming, 149(1-2), 2015.

Weller, D. S., Pnueli, A., Divon, G., Radzyner, O., Eldar, Y. C., and Fessler, J. A. Undersampled phase retrieval with outliers. IEEE Transactions on Computational Imaging, 1(4), December 2015.

White, C. D., Ward, R., and Sanghavi, S. The local convexity of solving quadratic equations. arXiv preprint, 2015.

Zheng, Q. and Lafferty, J. A convergent gradient descent algorithm for rank minimization and semidefinite programming from random linear measurements. In Advances in Neural Information Processing Systems (NIPS), 2015.

Supplementary Materials

A. Proof of Properties of the Median (Section 5.1)

A.1. Proof of Lemma 1

For simplicity, denote θ_p := θ_p(F) and θ^_p := θ_p({X_i}_{i=1}^m). Since F' is continuous and positive, for a given ε there exists δ_1 such that P(X ≤ θ_p − ε) = p − δ_1, where δ_1 in (εl, εL). Then one has

    P(θ^_p < θ_p − ε) (a)= P( (1/m) Σ_{i=1}^m 1{X_i ≤ θ_p − ε} ≥ p )
                        = P( (1/m) Σ_{i=1}^m 1{X_i ≤ θ_p − ε} ≥ (p − δ_1) + δ_1 )
                     (b)≤ exp(−2 δ_1^2 m) ≤ exp(−2 ε^2 l^2 m),

where (a) is due to the definition of the quantile function in (15) and (b) is due to the fact that 1{X_i ≤ θ_p − ε} ~ Bernoulli(p − δ_1) i.i.d., followed by Hoeffding's inequality. Similarly, one can show for some δ_2 in (εl, εL),

    P(θ^_p > θ_p + ε) ≤ exp(−2 δ_2^2 m) ≤ exp(−2 ε^2 l^2 m).

Combining these two inequalities yields the conclusion.

A.2. Proof of Lemma 2

It suffices to show that

    |X_(k) − Y_(k)| ≤ max_l |X_l − Y_l|,  k = 1, ..., n.    (25)

Case 1: k = n. Suppose X_(n) = X_i and Y_(n) = Y_j, i.e., X_i is the largest among {X_l}_{l=1}^n and Y_j is the largest among {Y_l}_{l=1}^n. Then we have either X_j ≤ X_i ≤ Y_j or Y_i ≤ Y_j ≤ X_i. Hence, |X_(n) − Y_(n)| = |X_i − Y_j| ≤ max{|X_i − Y_i|, |X_j − Y_j|}.

Case 2: k = 1. Suppose that X_(1) = X_i and Y_(1) = Y_j. Similarly, |X_(1) − Y_(1)| = |X_i − Y_j| ≤ max{|X_i − Y_i|, |X_j − Y_j|}.

Case 3: 1 < k < n. Suppose that X_(k) = X_i and Y_(k) = Y_j, and without loss of generality assume that X_i < Y_j (if X_i = Y_j, then 0 = |X_(k) − Y_(k)| ≤ max_l |X_l − Y_l| holds trivially). We show the conclusion by contradiction. Assume |X_(k) − Y_(k)| > max_l |X_l − Y_l|. Then one must have Y_i < Y_j, X_j > X_i, and i ≠ j. Moreover, for any p < k and q > k, the index of X_(p) cannot be equal to the index of Y_(q); otherwise the assumption is violated. Thus, all Y_(q) for q > k must share the same index set as the X_(p) for p > k. However, X_j, which is larger than X_i (thus if X_j = X_(k'), then k' > k), shares the same index with Y_j, where Y_j = Y_(k). This yields a contradiction.

A.3. Proof of Lemma 3

Assume that s·m is an integer. Since there are s·m corrupted samples in total, one can select at least (1/2 − s)m clean samples from the left half of the ordered contaminated samples {θ_{1/m}({X_i}), θ_{2/m}({X_i}), ..., θ_{1/2}({X_i})}. This gives the left inequality. Furthermore, one can also select at least (1/2 − s)m clean samples from the right half of the ordered contaminated samples {θ_{1/2}({X_i}), ..., θ_1({X_i})}, which gives the right inequality.

A.4. Proof of Lemma 4

First we introduce some general facts about the distribution of the product of two correlated standard Gaussian random variables (Donahue, 1964). Let u ~ N(0, 1), v ~ N(0, 1), and let their correlation coefficient be ρ in [−1, 1]. Then the density of uv is given by

    φ_ρ(x) = (1/(π sqrt(1 − ρ^2))) exp( ρx/(1 − ρ^2) ) K_0( |x|/(1 − ρ^2) ),  x ≠ 0,

where K_0(·) is the modified Bessel function of the second kind. Thus the density of r = |uv| is

    ψ_ρ(x) = (1/(π sqrt(1 − ρ^2))) [ exp( ρx/(1 − ρ^2) ) + exp( −ρx/(1 − ρ^2) ) ] K_0( x/(1 − ρ^2) ),  x > 0,    (26)

for |ρ| < 1. If |ρ| = 1, r becomes a χ^2 random variable with one degree of freedom, with density ψ_ρ(x) = (1/sqrt(2π)) x^{−1/2} exp(−x/2), x > 0. It can be seen from (26) that the density of r depends only on the correlation coefficient ρ in [−1, 1]. Let θ_{1/2}(ψ_ρ) be the 1/2-quantile (median) of the distribution ψ_ρ, and ψ_ρ(θ_{1/2}) be the value of the density ψ_ρ at the point θ_{1/2}(ψ_ρ). Although it is difficult to derive analytical expressions for θ_{1/2}(ψ_ρ) and ψ_ρ(θ_{1/2}) due to the complicated form of ψ_ρ in (26), the continuity of ψ_ρ(x) and of θ_{1/2}(ψ_ρ) allows us to compute them numerically, as illustrated in Figure 4. From the numerical calculation, one can see that both ψ_ρ(θ_{1/2}) and θ_{1/2}(ψ_ρ) are bounded from below and above for all ρ in [0, 1] (ψ_ρ(·) is symmetric in ρ, hence it suffices to consider ρ in [0, 1]), satisfying

    θ_{1/2}(ψ_ρ) < 0.455,  0.47 < ψ_ρ(θ_{1/2}) < 0.76,    (27)

together with a positive absolute lower bound on θ_{1/2}(ψ_ρ).

Figure 4. Quantiles θ_p(ψ_ρ) and densities at the quantiles ψ_ρ(θ_p) for p = 0.49, 0.50, 0.51, as functions of the correlation ρ.

B. Robust Initialization with Outliers (Section 5.2)

This section proves that the truncated spectral method provides a good initialization even if s·m measurements are corrupted by arbitrary outliers, as long as s is small. Consider the model in (1). Lemma 3 yields

    θ_{1/2−s}({(a_i^T x)^2}) < θ_{1/2}({y_i}) < θ_{1/2+s}({(a_i^T x)^2}).    (28)

Observe that (a_i^T x)^2 = |a~_i|^2 ||x||^2, where a~_i = a_i^T x/||x|| is a standard Gaussian random variable. Thus |a~_i|^2 is a χ^2 random variable with one degree of freedom, whose cumulative distribution function is denoted by K(x). Moreover, by Lemma 1, for a small ε one has |θ_{1/2−s}({|a~_i|^2}) − θ_{1/2−s}(K)| < ε and |θ_{1/2+s}({|a~_i|^2}) − θ_{1/2+s}(K)| < ε with probability at least 1 − 2 exp(−c m ε^2) for an absolute constant c (cf. Figure 4). We note that θ_{1/2}(K) = 0.455 and that both θ_{1/2−s}(K) and θ_{1/2+s}(K) can be made arbitrarily close to θ_{1/2}(K) simultaneously as long as s is small enough (independently of m and n). Thus one has

    (θ_{1/2−s}(K) − ε) ||x||^2 < θ_{1/2}({y_i}) < (θ_{1/2+s}(K) + ε) ||x||^2,    (29)

with probability at least 1 − exp(−c m ε^2). For simplicity, we introduce the notations ζ_s^− := θ_{1/2−s}(K) and ζ_s^+ := θ_{1/2+s}(K). For the instance s = 0.01, ζ_s^− and ζ_s^+ are explicit numerical constants close to 0.455, and it is easy to see that ζ_s^+ − ζ_s^− can be made arbitrarily small if s is small enough. We first consider the case ||x|| = 1. On the event that (29) holds, the truncation indicator satisfies the bounds

    1{y_i ≤ α_y^2 (ζ_s^− − ε)/0.455} ≤ 1{y_i ≤ α_y^2 θ_{1/2}({y_i})/0.455} ≤ 1{y_i ≤ α_y^2 (ζ_s^+ + ε)/0.455}.

On the other hand, denoting the support of the outliers by S, we have

    Y = (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 θ_{1/2}({y_i})/0.455} + (1/m) Σ_{i in S} a_i a_i^T y_i 1{y_i ≤ α_y^2 θ_{1/2}({y_i})/0.455}.

Consequently, one can bound Y as Y_1 ⪯ Y ⪯ Y_2, where

    Y_1 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^− − ε)/0.455},
    Y_2 := (1/m) Σ_{i not in S} a_i a_i^T (a_i^T x)^2 1{(a_i^T x)^2 ≤ α_y^2 (ζ_s^+ + ε)/0.455} + (1/m) Σ_{i in S} a_i a_i^T α_y^2 (ζ_s^+ + ε)/0.455,

and where

    E[Y_1] = (1 − s)(β_1 xx^T + β_2 I),
    E[Y_2] = (1 − s)(β_3 xx^T + β_4 I) + s α_y^2 (ζ_s^+ + ε)/0.455 · I,    (30)

with β_1 := E[ξ^4 1{|ξ| ≤ α_y sqrt((ζ_s^− − ε)/0.455)}] − E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^− − ε)/0.455)}], β_2 := E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^− − ε)/0.455)}], β_3 := E[ξ^4 1{|ξ| ≤ α_y sqrt((ζ_s^+ + ε)/0.455)}] − E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^+ + ε)/0.455)}], and β_4 := E[ξ^2 1{|ξ| ≤ α_y sqrt((ζ_s^+ + ε)/0.455)}], where ξ ~ N(0, 1).

Applying standard results on random matrices with non-isotropic sub-Gaussian rows (Vershynin, 2012, equation (5.26)), and noticing that a_i a_i^T (a_i^T x)^2 1{|a_i^T x| ≤ c} can be rewritten as b_i b_i^T for the sub-Gaussian vector b_i := a_i (a_i^T x) 1{|a_i^T x| ≤ c}, one can deduce

    ||Y_1 − E[Y_1]|| ≤ δ,  ||Y_2 − E[Y_2]|| ≤ δ    (31)

with probability 1 − exp(−Ω(m)), provided that m/n exceeds some large constant. Besides, when ε and s are sufficiently small, one further has ||E[Y_1] − E[Y_2]|| ≤ δ. Putting these together, one has

    ||Y − (1 − s)(β_1 xx^T + β_2 I)|| ≤ 3δ.    (32)

Let z~^(0) be the normalized leading eigenvector of Y. Repeating the same argument as in (Candès et al., 2015, Section 7.8) and taking δ, ε to be sufficiently small, one has for a given δ' > 0,

    dist(z~^(0), x) ≤ δ',    (33)

as long as m/n exceeds some large constant. Furthermore, let z^(0) = sqrt(med({y_i})/0.455) z~^(0) to handle the case ||x|| ≠ 1. By the bound (29), one has

    | med({y_i})/0.455 − ||x||^2 | ≤ max{ 0.455 − ζ_s^− + ε, ζ_s^+ − 0.455 + ε }/0.455 · ||x||^2 ≤ (ζ_s^+ − ζ_s^− + ε)/0.455 · ||x||^2.    (34)

Thus

    dist(z^(0), x) ≲ (ζ_s^+ − ζ_s^− + ε) ||x|| + δ' ||x|| ≤ (1/11) ||x||    (35)

as long as s is a small enough constant.

C. Geometric Convergence for the Noise-free Model (Proof of Corollary 1)

After obtaining a good initialization, the central idea in establishing geometric convergence is to show that the truncated gradient ∇ℓ_tr(z) satisfies, in a neighborhood of the global optima, the Regularity Condition RC(μ, λ, ε) defined in Definition 2. We show this in two steps. Step 1 establishes a key concentration property for the sample median used in the truncation rule, which is then exploited to prove the RC in Step 2.

C.1. Proof of the Concentration Property for the Sample Median

We show that the sample median used in the truncation rule concentrates at the level ||z − x|| ||z||, as stated in the following proposition. Along the way, we also establish that the sample quantiles around the median concentrate at the level ||z − x|| ||z||.

Proposition 2 (Refined version of Proposition 1). Fix ε in (0, 1). If m > c_0 (ε^{-2} log ε^{-1}) n log n, then with probability at least 1 − c_1 exp(−c_2 ε^2 m),

    (0.65 − ε) ||z|| ||h|| ≤ med({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ (0.9 + ε) ||z|| ||h||,    (36)
    (0.63 − ε) ||z|| ||h|| ≤ θ_{0.49}, θ_{0.51}({ |(a_i^T x)^2 − (a_i^T z)^2| }) ≤ (c~ + ε) ||z|| ||h||,    (37)

hold for all x, z with ||x − z|| < (1/11)||z||, where h := z − x and c~ is an absolute constant.

Proof. We first show that for a fixed pair z and x, (36) and (37) hold with high probability. Let r_i = (a_i^T x)^2 − (a_i^T z)^2. Then the r_i are i.i.d. copies of a random variable r, where r = (a^T x)^2 − (a^T z)^2 with the entries of a composed of i.i.d. standard Gaussian random variables. Note that the distribution of r is fixed once h and z are given. Let x(1) denote the first element of a generic vector x, and x_perp the remaining vector of x after eliminating the first element. Let U_h be an orthonormal matrix whose first row is h^T/||h||, and set a~ = U_h a, z~ = U_h z. Similarly define U_{z~_perp} and let b~ = U_{z~_perp} a~_perp. Then a~(1) and b~(1) are independent standard normal random variables. We can further express r as follows:

    r = (a^T z)^2 − (a^T x)^2 = (2 a^T z − a^T h)(a^T h)
      = (2 a~^T z~ − a~(1) ||h||)(a~(1) ||h||)
      = (2 h^T z − ||h||^2) a~(1)^2 + 2 (a~_perp^T z~_perp)(a~(1) ||h||)
      = (2 h^T z − ||h||^2) a~(1)^2 + 2 b~(1) sqrt(||z||^2 − z~(1)^2) a~(1) ||h||
      = [ (2 cos ω − t) a~(1)^2 + 2 sin(ω) a~(1) b~(1) ] ||h|| ||z||
      =: u v~ ||h|| ||z||,

where ω is the angle between h and z, and t = ||h||/||z|| < 1/11. Consequently, u = a~(1) ~ N(0, 1) and v~ = (2 cos ω − t) a~(1) + 2 sin(ω) b~(1) is also a Gaussian random variable with variance 3.6 < Var(v~) < 4 under the assumption t < 1/11. Let v = v~/sqrt(Var(v~)); then v ~ N(0, 1). Furthermore, let r~ = |uv|. Denote the density function of r~ by ψ_ρ(·) and the 1/2-quantile of r~ by θ_{1/2}(ψ_ρ). By Lemma 4, we have

    0.47 < ψ_ρ(θ_{1/2}) < 0.76,  θ_{1/2}(ψ_ρ) < 0.455,    (38)

with θ_{1/2}(ψ_ρ) also bounded below by an absolute positive constant. By Lemma 1, we have with probability at least 1 − 2 exp(−c m ε^2), for an absolute constant c,

    | med({r~_i}_{i=1}^m) − θ_{1/2}(ψ_ρ) | < ε.    (39)

The same arguments carry over to the other quantiles θ_{0.49}({r~_i}) and θ_{0.51}({r~_i}). From Figure 4, we observe that for ρ in [0, 1],

    0.45 < ψ_ρ(θ_{0.49}), ψ_ρ(θ_{0.51}) < 0.78, and θ_{0.49}(ψ_ρ), θ_{0.51}(ψ_ρ) are likewise bounded between two absolute positive constants,    (40)

and then, with probability at least 1 − 2 exp(−c m ε^2), again for an absolute constant c,

    | θ_{0.49}({r~_i}) − θ_{0.49}(ψ_ρ) | < ε  and  | θ_{0.51}({r~_i}) − θ_{0.51}(ψ_ρ) | < ε.    (41)

Hence, multiplying back sqrt(Var(v~)) ||z|| ||h||, we have with probability 1 − 2 exp(−c m ε^2),

    (0.65 − ε) ||z − x|| ||z|| ≤ med({ |(a_i^T z)^2 − (a_i^T x)^2| }) ≤ (0.9 + ε) ||z − x|| ||z||,    (42)
    (0.63 − ε) ||z − x|| ||z|| ≤ θ_{0.49}, θ_{0.51}({ |(a_i^T z)^2 − (a_i^T x)^2| }) ≤ (c~ + ε) ||z − x|| ||z||.    (43)

We note that, to keep the notation simple, c and ε may vary from line to line within constant factors. Up to now, we have proved that for any fixed z and x, the median and the neighboring quantiles of {|(a_i^T z)^2 − (a_i^T x)^2|} are upper and lower bounded by ||z − x|| ||z|| times constant factors. To prove (36) and (37) for all z and x with ||z − x|| ≤ (1/11)||z||, we use a net covering argument. Again we argue for the median first; the same arguments carry over to the other quantiles smoothly. To proceed, we restate (42) as: the bound

    (0.65 − ε) ≤ med({ |(2 a_i^T (z/||z||) − t · a_i^T (h/||h||)) · a_i^T (h/||h||)| }) ≤ (0.9 + ε)    (44)

holds with probability at least 1 − 2 exp(−c m ε^2) for a given pair (h, z) satisfying ||h||/||z|| ≤ 1/11.

Let τ = ε/(6n + 6), let S_τ be a τ-net covering the unit sphere, let L_τ be a τ-net covering a line segment of length 1/11, and set

    N_τ = {(z_0, h_0, t_0) : (z_0, h_0, t_0) in S_τ × S_τ × L_τ}.    (45)

One has the cardinality bound (i.e., an upper bound on the covering number) |N_τ| ≤ (1 + 2/τ)^{2n} /(11τ) < (1 + 2/τ)^{2n+1}. Taking the union bound, we have

    (0.65 − ε) ≤ med({ |(2 a_i^T z_0 − t_0 a_i^T h_0) a_i^T h_0| }) ≤ (0.9 + ε),  for all (z_0, h_0, t_0) in N_τ,    (46)

with probability at least 1 − (1 + 2/τ)^{2n+1} exp(−c m ε^2). We next argue that (46) holds with probability at least 1 − c_1 exp(−c_2 ε^2 m) for some constants c_1, c_2 as long as m ≥ c_0 (ε^{-2} log ε^{-1}) n log n for a sufficiently large constant c_0. To prove this claim, we first observe that

    (1 + 2/τ)^{2n+1} ≲ exp( 2n (log(n + 1) + log(1/ε)) ) ≤ exp( 2n log m ).

We note that once ε is chosen, it is fixed throughout the proof and does not scale with m or n. For simplicity, assume that ε < 1/e. Fix some positive constant c_3 < c − c_2. It then suffices to show that there exists a large constant c_0 such that if m ≥ c_0 (ε^{-2} log ε^{-1}) n log n, then

    2n log m < c_3 ε^2 m.    (47)

For any fixed n, if (47) holds for some m with m > (2/c_3) ε^{-2} n, then (47) continues to hold for all larger m, because

    2n log(m + 1) = 2n log m + 2n (log(m + 1) − log m) = 2n log m + 2n log(1 + 1/m) ≤ 2n log m + 2n/m ≤ c_3 ε^2 m + c_3 ε^2 = c_3 (m + 1) ε^2.

Next, for any n, we can always find a c_0 such that (47) holds for m = c_0 (ε^{-2} log ε^{-1}) n log n. Such a c_0 can easily be found for large n; for instance, c_0 = 4/c_3 is a valid option if

    (4/c_3)(ε^{-2} log ε^{-1}) n log n < n^2.    (48)

Moreover, since the number of n that violate (48) is finite, the maximum over all such c_0 serves the purpose.


More information

Kernel-Based Nonparametric Anomaly Detection

Kernel-Based Nonparametric Anomaly Detection Kernel-Based Nonparaetric Anoaly Detection Shaofeng Zou Dept of EECS Syracuse University Eail: szou@syr.edu Yingbin Liang Dept of EECS Syracuse University Eail: yliang6@syr.edu H. Vincent Poor Dept of

More information

Composite optimization for robust blind deconvolution

Composite optimization for robust blind deconvolution Coposite optiization for robust blind deconvolution Vasileios Charisopoulos Daek Davis Mateo Díaz Ditriy Drusvyatskiy Abstract The blind deconvolution proble seeks to recover a pair of vectors fro a set

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

Solving Most Systems of Random Quadratic Equations

Solving Most Systems of Random Quadratic Equations Solving Most Systes of Rando Quadratic Equations Gang Wang, Georgios B. Giannakis Yousef Saad Jie Chen Digital Tech. Center & Dept. of Electrical and Coputer Eng., Univ. of Minnesota Key Lab of Intell.

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization

Structured signal recovery from quadratic measurements: Breaking sample complexity barriers via nonconvex optimization Structured signal recovery fro quadratic easureents: Breaking saple coplexity barriers via nonconvex optiization Mahdi Soltanolkotabi Ming Hsieh Departent of Electrical Engineering University of Southern

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Shannon Sampling II. Connections to Learning Theory

Shannon Sampling II. Connections to Learning Theory Shannon Sapling II Connections to Learning heory Steve Sale oyota echnological Institute at Chicago 147 East 60th Street, Chicago, IL 60637, USA E-ail: sale@athberkeleyedu Ding-Xuan Zhou Departent of Matheatics,

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all

2 Q 10. Likewise, in case of multiple particles, the corresponding density in 2 must be averaged over all Lecture 6 Introduction to kinetic theory of plasa waves Introduction to kinetic theory So far we have been odeling plasa dynaics using fluid equations. The assuption has been that the pressure can be either

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and This article appeared in a ournal published by Elsevier. The attached copy is furnished to the author for internal non-coercial research and education use, including for instruction at the authors institution

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Reshaped Wirtinger Flow for Solving Quadratic System of Equations

Reshaped Wirtinger Flow for Solving Quadratic System of Equations Reshaped Wirtinger Flow for Solving Quadratic System of Equations Huishuai Zhang Department of EECS Syracuse University Syracuse, NY 344 hzhan3@syr.edu Yingbin Liang Department of EECS Syracuse University

More information

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010 A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING By Eanuel J Candès Yaniv Plan Technical Report No 200-0 Noveber 200 Departent of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

Physics 215 Winter The Density Matrix

Physics 215 Winter The Density Matrix Physics 215 Winter 2018 The Density Matrix The quantu space of states is a Hilbert space H. Any state vector ψ H is a pure state. Since any linear cobination of eleents of H are also an eleent of H, it

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

arxiv: v1 [math.na] 10 Oct 2016

arxiv: v1 [math.na] 10 Oct 2016 GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS MÅRTEN GULLIKSSON AND ANNA OLEYNIK arxiv:6.395v [ath.na] Oct 26 Abstract. We consider the proble

More information

Robustness and Regularization of Support Vector Machines

Robustness and Regularization of Support Vector Machines Robustness and Regularization of Support Vector Machines Huan Xu ECE, McGill University Montreal, QC, Canada xuhuan@ci.cgill.ca Constantine Caraanis ECE, The University of Texas at Austin Austin, TX, USA

More information

Tail estimates for norms of sums of log-concave random vectors

Tail estimates for norms of sums of log-concave random vectors Tail estiates for nors of sus of log-concave rando vectors Rados law Adaczak Rafa l Lata la Alexander E. Litvak Alain Pajor Nicole Toczak-Jaegerann Abstract We establish new tail estiates for order statistics

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

A Simple Homotopy Algorithm for Compressive Sensing

A Simple Homotopy Algorithm for Compressive Sensing A Siple Hootopy Algorith for Copressive Sensing Lijun Zhang Tianbao Yang Rong Jin Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China Departent of Coputer

More information

Lecture 20 November 7, 2013

Lecture 20 November 7, 2013 CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,

More information

GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL

GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL GLOBALLY CONVERGENT LEVENBERG-MARQUARDT METHOD FOR PHASE RETRIEVAL CHAO MA, XIN LIU, AND ZAIWEN WEN Abstract. In this paper, we consider a nonlinear least squares odel for the phase retrieval proble. Since

More information

Robust polynomial regression up to the information theoretic limit

Robust polynomial regression up to the information theoretic limit 58th Annual IEEE Syposiu on Foundations of Coputer Science Robust polynoial regression up to the inforation theoretic liit Daniel Kane Departent of Coputer Science and Engineering / Departent of Matheatics

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

A Probabilistic and RIPless Theory of Compressed Sensing

A Probabilistic and RIPless Theory of Compressed Sensing A Probabilistic and RIPless Theory of Copressed Sensing Eanuel J Candès and Yaniv Plan 2 Departents of Matheatics and of Statistics, Stanford University, Stanford, CA 94305 2 Applied and Coputational Matheatics,

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

arxiv: v5 [cs.it] 16 Mar 2012

arxiv: v5 [cs.it] 16 Mar 2012 ONE-BIT COMPRESSED SENSING BY LINEAR PROGRAMMING YANIV PLAN AND ROMAN VERSHYNIN arxiv:09.499v5 [cs.it] 6 Mar 0 Abstract. We give the first coputationally tractable and alost optial solution to the proble

More information

A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS

A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS A NEW ROBUST AND EFFICIENT ESTIMATOR FOR ILL-CONDITIONED LINEAR INVERSE PROBLEMS WITH OUTLIERS Marta Martinez-Caara 1, Michael Mua 2, Abdelhak M. Zoubir 2, Martin Vetterli 1 1 School of Coputer and Counication

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Nonconvex Methods for Phase Retrieval

Nonconvex Methods for Phase Retrieval Nonconvex Methods for Phase Retrieval Vince Monardo Carnegie Mellon University March 28, 2018 Vince Monardo (CMU) 18.898 Midterm March 28, 2018 1 / 43 Paper Choice Main reference paper: Solving Random

More information

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition

Pattern Recognition and Machine Learning. Learning and Evaluation for Pattern Recognition Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lesson 1 4 October 2017 Outline Learning and Evaluation for Pattern Recognition Notation...2 1. The Pattern Recognition

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

Fast and Memory Optimal Low-Rank Matrix Approximation

Fast and Memory Optimal Low-Rank Matrix Approximation Fast and Meory Optial Low-Rank Matrix Approxiation Yun Se-Young, Marc Lelarge, Alexandre Proutière To cite this version: Yun Se-Young, Marc Lelarge, Alexandre Proutière. Fast and Meory Optial Low-Rank

More information

arxiv: v1 [cs.lg] 8 Jan 2019

arxiv: v1 [cs.lg] 8 Jan 2019 Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE7C (Spring 018: Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee7c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee7c@berkeley.edu October 15,

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008 LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA.

DISSIMILARITY MEASURES FOR ICA-BASED SOURCE NUMBER ESTIMATION. Seungchul Lee 2 2. University of Michigan. Ann Arbor, MI, USA. Proceedings of the ASME International Manufacturing Science and Engineering Conference MSEC June -8,, Notre Dae, Indiana, USA MSEC-7 DISSIMILARIY MEASURES FOR ICA-BASED SOURCE NUMBER ESIMAION Wei Cheng,

More information

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis

Experimental Design For Model Discrimination And Precise Parameter Estimation In WDS Analysis City University of New York (CUNY) CUNY Acadeic Works International Conference on Hydroinforatics 8-1-2014 Experiental Design For Model Discriination And Precise Paraeter Estiation In WDS Analysis Giovanna

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Detection and Estimation Theory

Detection and Estimation Theory ESE 54 Detection and Estiation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electronic Systes and Signals Research Laboratory Electrical and Systes Engineering Washington University 11 Urbauer

More information

Hamming Compressed Sensing

Hamming Compressed Sensing Haing Copressed Sensing Tianyi Zhou, and Dacheng Tao, Meber, IEEE Abstract arxiv:.73v2 [cs.it] Oct 2 Copressed sensing CS and -bit CS cannot directly recover quantized signals and require tie consuing

More information

1 Bounding the Margin

1 Bounding the Margin COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #12 Scribe: Jian Min Si March 14, 2013 1 Bounding the Margin We are continuing the proof of a bound on the generalization error of AdaBoost

More information