Composite optimization for robust blind deconvolution


Vasileios Charisopoulos*   Damek Davis*   Mateo Díaz†   Dmitriy Drusvyatskiy‡

*School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14850, USA; people.orie.cornell.edu/vc333/, people.orie.cornell.edu/dsd95/. †Center for Applied Mathematics, Cornell University, Ithaca, NY 14850, USA; people.cam.cornell.edu/md85/. ‡Department of Mathematics, University of Washington, Seattle, WA 98195; www.math.washington.edu/~ddrusv. Research of Drusvyatskiy was supported by the NSF DMS 6585 and CCF awards.

Abstract. The blind deconvolution problem seeks to recover a pair of vectors from a set of rank-one bilinear measurements. We consider a natural nonsmooth formulation of the problem and show that, under standard statistical assumptions, its moduli of weak convexity, sharpness, and Lipschitz continuity are all dimension independent. This phenomenon persists even when up to half of the measurements are corrupted by noise. Consequently, standard algorithms, such as the subgradient and prox-linear methods, converge at a rapid dimension-independent rate when initialized within constant relative error of the solution. We complete the paper with a new initialization strategy, complementing the local search algorithms. The initialization procedure is both provably efficient and robust to outlying measurements. Numerical experiments, on both simulated and real data, illustrate the developed theory and methods.

1 Introduction

A variety of tasks in data science amount to solving a nonlinear system F(x) = 0, where F: R^d → R^m is a highly structured smooth map. The setting in which F is a quadratic map already subsumes important problems such as phase retrieval [,37,47], blind deconvolution [4,33,36,49], matrix completion [3,8,48], and covariance matrix estimation [5,35], to name a few. Recent works have suggested a number of two-stage procedures for globally solving such problems. The first stage (initialization) yields a rough estimate x_0 of an optimal solution, often using spectral techniques. The second stage (local refinement) uses a local search algorithm that rapidly converges to an optimal solution when initialized at x_0. For a detailed discussion, we refer the reader to the recent survey [6]. The typical starting point for local refinement is to form an optimization problem

    min_{x ∈ X}  f(x) := h(F(x)),   (1.1)

where h is a carefully chosen penalty function and X is a constraint set. Most widely used penalties are smooth and convex; e.g., the squared ℓ2-norm h(z) = ‖z‖² is ubiquitous in this context. Equipped with such penalties, the problem (1.1) is smooth and therefore gradient-based methods become immediately applicable. The main analytic challenge is that the condition number λ_max(∇²f)/λ_min(∇²f) of the problem often grows with the dimension d of the ambient space. This is the case, for example, for the phase retrieval, blind deconvolution, and matrix completion problems; see e.g. [6] and references therein. Consequently, generic nonlinear programming guarantees yield efficiency estimates that are far too pessimistic. Instead, a fruitful strategy is to recognize that the Hessian may be well-conditioned along the relevant set of directions, which suffices to guarantee rapid convergence. This is where new insight and analytic techniques for each particular problem come to bear, e.g. [37,39,49]. Smoothness of the penalty function h in (1.1) is crucially used by the aforementioned techniques.

A different recent line of work [6, 0, , 5] has instead suggested the use of nonsmooth convex penalties, most notably the ℓ1-norm h(z) = ‖z‖_1. Such a nonsmooth formulation will play a central role in our work. A number of algorithms are available for nonsmooth compositional problems, most notably the subgradient method

    x_{t+1} = proj_X( x_t − α_t v_t )   with v_t ∈ ∂f(x_t),

and the prox-linear algorithm

    x_{t+1} = argmin_{x ∈ X}  h( F(x_t) + ∇F(x_t)(x − x_t) ) + (1/(2α_t))‖x − x_t‖².

The local convergence guarantees of both methods can be succinctly described as follows. Set X* := argmin_X f and suppose there exist constants ρ, µ, L > 0 satisfying:

(approximation)  | h(F(y)) − h( F(x) + ∇F(x)(y − x) ) | ≤ (ρ/2)‖y − x‖²  for all x, y ∈ X,
(sharpness)      f(x) − inf_X f ≥ µ · dist(x, X*)  for all x ∈ X,
(Lipschitz bound) ‖v‖ ≤ L  for all v ∈ ∂f(x) with dist(x, X*) ≤ 2µ/ρ.

Then, when equipped with an appropriate sequence α_t and initialized at a point x_0 satisfying dist(x_0, X*) ≤ µ/ρ, both the subgradient and prox-linear iterates will converge to an optimal solution of the problem (1.1). The prox-linear algorithm converges quadratically, while the subgradient method converges at a linear rate governed by the ratio µ/L ∈ (0, 1).

A possible advantage of nonsmooth techniques can be gleaned from the phase retrieval problem. The papers [5, Corollary 3.3] and [, Corollary 3.8] recently showed that, for the phase retrieval problem, standard statistical assumptions imply that with high probability all the constants ρ, µ, L > 0 are dimension independent. Consequently, the completely generic guarantees outlined above, without any modification, imply that both methods converge at a dimension-independent rate when initialized within constant relative error of the optimal solution. This is in sharp contrast to the smooth formulation of the problem, where a more nuanced analysis is required, based on restricted smoothness and convexity. Moreover, this approach is robust to outliers, in the sense that analogous guarantees persist even when up to half of the measurements are corrupted by noise.
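To make the subgradient update above concrete, the following minimal Julia sketch implements one projected subgradient step for a generic nonsmooth objective; the callables `subgrad_f` and `proj_X` are placeholders for a concrete instance (they are not defined in the paper) and must be supplied by the user.

```julia
using LinearAlgebra

# One projected subgradient step for min_{x in X} f(x):
#   x⁺ = proj_X(x - α v),  where v ∈ ∂f(x).
function subgradient_step(x::AbstractVector, α::Real, subgrad_f, proj_X)
    v = subgrad_f(x)            # any element of the subdifferential at x
    return proj_X(x .- α .* v)  # projected step with stepsize α
end
```

The prox-linear update differs only in that each step solves a small convex subproblem built from the linearized map F; a sketch of that outer loop appears in Section 3.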

In light of the success of the nonsmooth penalty approach for phase retrieval, it is intriguing to determine whether nonsmooth techniques can be fruitful for a wider class of large-scale problems. Our current work fits squarely within this research program. In this work, we analyze a nonsmooth penalty technique for the problem of blind deconvolution. Formally, we consider the task of robustly recovering a pair (w̄, x̄) ∈ R^{d1} × R^{d2} from bilinear measurements

    y_i = ⟨l_i, w̄⟩⟨r_i, x̄⟩ + η_i,   i = 1, ..., m,   (1.2)

where η is an arbitrary noise corruption with frequency p_fail := |supp(η)|/m that is at most one half, and l_i ∈ R^{d1} and r_i ∈ R^{d2} are known measurement vectors. Such bilinear systems and their complex analogues arise often in biological systems, control theory, coding theory, and image deblurring, among others. Most notably, such problems appear when recovering a pair (u, v) from the convolution measurements y = (Lu) ∗ (Rv). When passing to the Fourier domain, this problem is equivalent to solving a complex bilinear system of equations; see the pioneering work [4]. All the arguments we present can be extended to the complex case; we focus on the real case for simplicity. In this work, we analyze the following nonsmooth formulation of the problem:

    min_{‖w‖ ≤ √(νM̄), ‖x‖ ≤ √(νM̄)}  f(w, x) := (1/m) Σ_{i=1}^m | ⟨l_i, w⟩⟨r_i, x⟩ − y_i |,   (1.3)

where ν ≥ 1 is a user-specified constant and M̄ = ‖w̄x̄ᵀ‖_F. Our contributions are two-fold.

Local refinement. Suppose that the vectors l_i and r_i are both i.i.d. sub-Gaussian and satisfy a mild growth condition (which is automatically satisfied for Gaussian random vectors). We will show that, as long as the number of measurements satisfies m ≳ (d1 + d2)/(1 − 2p_fail)² · ln(1/(1 − 2p_fail)), the formulation (1.3) admits dimension-independent constants ρ, L, and µ with high probability. Consequently, the subgradient and prox-linear methods rapidly converge to the optimal solution at a dimension-independent rate when initialized at a point (w_0, x_0) with constant relative error ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F / ‖w̄x̄ᵀ‖_F. Analogous results also hold under more general incoherence assumptions.

Initialization. Suppose now that l_i and r_i are both i.i.d. Gaussian and are independent from the noise η. We develop an initialization procedure that, in the regime m ≳ d1 + d2 and p_fail ∈ [0, 1/10], will find a point (w_0, x_0) with small constant relative error ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F / ‖w̄x̄ᵀ‖_F, with high probability. To the best of our knowledge, this is the only available initialization procedure with provable guarantees in the presence of gross outliers. We also develop complementary guarantees under the weaker assumption that the vectors (l_i, r_i) corresponding to exact measurements are independent from the noise η_i in the outlying measurements. This noise model allows one to plant outlying measurements from a completely different pair of signals, and is therefore computationally more challenging.

The literature studying bilinear systems is rich. From the information-theoretic perspective [7, 9, 34], the optimal sample complexity in the noiseless regime is d1 + d2 if no further assumptions (e.g. sparsity) are imposed on the signals. Therefore, from a sample-complexity viewpoint, our guarantees are optimal. Incidentally, to our best knowledge, all

alternative approaches are either suboptimal by a polylogarithmic factor in d1, d2 or require knowing the sign pattern of one of the underlying signals [3, 4]. Recent algorithmic advances for blind deconvolution can be classified into two main approaches: works based on convex relaxations and those employing gradient descent on a smooth nonconvex function. The influential convex techniques of [3, 4] lift the objective to a higher dimension, thereby necessitating the resolution of a high-dimensional semidefinite program. The more recent work of [, ] instead relaxes the feasible region in the natural parameter space, under the assumption that the coordinate signs of either w̄ or x̄ are known a priori. Finally, with the exception of [4], the aforementioned works do not provide guarantees in the noisy regime.

Nonconvex approaches for blind deconvolution typically apply gradient descent to a smooth formulation of the problem [7, 33, 37]. Since the condition number of the problem scales with dimension, as we mentioned previously, these works introduce a nuanced analysis that is specific to the gradient method. The authors of [33] propose applying gradient descent on a regularized objective function and identify a basin of attraction around the solution. The paper [37] instead analyzes gradient descent on the unregularized objective; they use the leave-one-out technique and prove that the iterates remain within a region where the objective function satisfies restricted strong convexity and smoothness conditions. The sample complexities of the methods in [7, 33, 37] are optimal up to polylog factors. The nonconvex strategies mentioned above all use spectral methods for initialization. These methods are not robust to outliers, since they rely on the leading singular vectors/values of a potentially noisy measurement operator. Adapting the spectral initialization of [5] to bilinear inverse problems enables us to deal with gross outliers of arbitrary magnitude. Indeed, high-variance noise makes it easier for our initialization to reject outlying measurements.

The outline of the paper is as follows. Section 2 records basic notation used throughout the paper. Section 3 reviews the impact of sharpness and weak convexity on the rapid convergence of numerical methods. Section 4 establishes estimates of the weak convexity, sharpness, and Lipschitz moduli for the blind deconvolution problem under both deterministic and statistical assumptions on the data. Section 5 introduces the initialization procedure and proves its correctness even if a constant fraction of measurements is corrupted by gross outliers. The final Section 6 presents numerical experiments illustrating the theoretical results of the paper.

2 Notation

This section records basic notation that we will use throughout the paper. We always endow R^d with the dot product ⟨x, y⟩ = xᵀy and the induced norm ‖x‖ = √⟨x, x⟩. The symbol S^{d−1} denotes the unit sphere in R^d, while B denotes the open unit ball. When convenient, we will use the notation B_d to emphasize the dimension of the ambient space. More generally, B_r(x) will stand for the open ball of radius r around x. We define the distance and the nearest-point projection of a point x onto a closed set Q ⊂ R^d by

    dist(x, Q) = inf_{y ∈ Q} ‖x − y‖   and   proj_Q(x) = argmin_{y ∈ Q} ‖x − y‖,

respectively. For any pair of real-valued functions f, g: R^d → R, the notation f ≲ g means that there exists a positive constant C such that f(x) ≤ C g(x) for all x ∈ R^d. We write f ≍ g if both f ≲ g and g ≲ f. We will always use the trace inner product ⟨X, Y⟩ = Tr(XᵀY) on the space of matrices R^{d1×d2}. The symbols ‖A‖_op and ‖A‖_F will denote the operator and Frobenius norms of A, respectively. Assuming d1 ≤ d2, the map σ: R^{d1×d2} → R^{d1}_+ returns the vector of ordered singular values σ_1(A) ≥ σ_2(A) ≥ ... ≥ σ_{d1}(A). Note the equalities ‖A‖_F = ‖σ(A)‖ and ‖A‖_op = σ_1(A).

Nonsmooth functions will appear throughout this work. Consequently, we will use some basic constructions of generalized differentiation, as set out, for example, in the monographs [8, 38, 4, 45]. Consider a function f: R^d → R ∪ {+∞} and a point x with f(x) finite. The Fréchet subdifferential of f at x, denoted ∂f(x), is the set of all vectors v ∈ R^d satisfying

    f(y) ≥ f(x) + ⟨v, y − x⟩ + o(‖y − x‖)   as y → x.   (2.1)

Thus, a vector v lies in the subdifferential ∂f(x) precisely when the function y ↦ f(x) + ⟨v, y − x⟩ locally minorizes f up to first order. We say that a point x is stationary for f whenever the inclusion 0 ∈ ∂f(x) holds. Standard results show that for convex functions f the subdifferential ∂f(x) reduces to the subdifferential in the sense of convex analysis, while for differentiable functions it consists only of the gradient: ∂f(x) = {∇f(x)}. Notice that in general the little-o term in (2.1) may depend on the base point x, and the estimate may therefore be nonuniform. In this work, we will only encounter functions whose subgradients automatically satisfy a uniform type of lower-approximation property. We say that a function f: R^d → R ∪ {+∞} is ρ-weakly convex if the perturbed function x ↦ f(x) + (ρ/2)‖x‖² is convex.¹ It is straightforward to see that for any ρ-weakly convex function f, subgradients automatically satisfy the uniform bound

    f(y) ≥ f(x) + ⟨v, y − x⟩ − (ρ/2)‖y − x‖²   for all x, y ∈ R^d, v ∈ ∂f(x).

We will comment further on the class of weakly convex functions in Section 3. We say that a random vector X in R^d is η-sub-Gaussian whenever E exp(⟨u, X⟩²/η²) ≤ 2 for all vectors u ∈ S^{d−1}. The sub-Gaussian norm of a real-valued random variable X is defined to be ‖X‖_{ψ2} = inf{t > 0 : E exp(X²/t²) ≤ 2}, while the sub-exponential norm is defined by ‖X‖_{ψ1} = inf{t > 0 : E exp(|X|/t) ≤ 2}. Given a sample y = (y_1, ..., y_m), we will write med(y) to denote its median.

¹ Weakly convex functions also go by other names, such as lower-C², uniformly prox-regular, paraconvex, and semiconvex.

3 Algorithms for sharp weakly convex problems

The central thrust of this work is that, under reasonable statistical assumptions, the penalty formulation (1.3) satisfies two key properties: the objective function is weakly convex and grows at least linearly as one moves away from the solution set. In this section, we review

the consequences of these two properties for local rapid convergence of numerical methods. The discussion mostly follows the recent work [0], though elements of this viewpoint can already be seen in the two papers [, 5] on robust phase retrieval. Setting the stage, we introduce the following assumption.

Assumption A. Consider the optimization problem

    min_{x ∈ X} f(x),   (3.1)

and suppose that the following properties hold for some real µ, ρ > 0.

(A1) (Weak convexity) The set X is closed and convex, while the function f: R^d → R is ρ-weakly convex.
(A2) (Sharpness) The set of minimizers X* := argmin_{x∈X} f(x) is nonempty, and the inequality f(x) − inf_X f ≥ µ · dist(x, X*) holds for all x ∈ X.

The class of weakly convex functions is broad and its importance in optimization is well documented [5, 40, 43, 44, 46]. It trivially includes all convex functions and all C¹-smooth functions with Lipschitz gradient. More broadly, it includes all compositions f(x) = h(F(x)), where h is convex and L-Lipschitz and F is C¹-smooth with β-Lipschitz Jacobian; indeed, the composite function f = h ∘ F is then weakly convex with parameter ρ = Lβ, see e.g. [4, Lemma 4.2]. In particular, our target problem (1.3) is clearly weakly convex, being a composition of the ℓ1-norm and a quadratic map. The estimate ρ = Lβ on the weak convexity constant is often much too pessimistic, however. Indeed, under statistical assumptions, we will see that the target problem (1.3) has a much better weak convexity constant.

The notion of sharpness, and the related error-bound property, is now ubiquitous in nonlinear optimization. Indeed, sharpness underlies much of perturbation theory and the rapid convergence guarantees of various numerical methods. For a systematic treatment of error bounds and their applications, we refer the reader to the monographs of Dontchev–Rockafellar [] and Ioffe [8], and the article of Lewis–Pang [3].

Taken together, weak convexity and sharpness provide an appealing framework for deriving local rapid convergence guarantees for numerical methods. In this work, we specifically focus on two such procedures: the subgradient and prox-linear algorithms. To this end, we aim to estimate both the radius of rapid convergence around the solution set and the rate of convergence. Our ultimate goal is to show that, when specialized to our target problem (1.3), with high probability both of these quantities are independent of the ambient dimensions d1 and d2 as soon as the number of measurements m is sufficiently large.

Both the subgradient and prox-linear algorithms have the property that, when initialized at a stationary point of the problem, they may stay there for all subsequent iterations. Since we are interested in finding global minima, and not just stationary points, we must therefore estimate the neighborhood of the solution set that has no extraneous stationary points. This is the content of the following simple lemma [0, Lemma 3].

Lemma 3.1. Suppose that Assumption A holds. Then the problem (3.1) has no stationary points x satisfying 0 < dist(x, X*) < 2µ/ρ.

Proof. Fix a stationary point x ∈ X \ X*. Letting x̄ := proj_{X*}(x), we deduce

    µ · dist(x, X*) ≤ f(x) − f(x̄) ≤ (ρ/2)‖x − x̄‖² = (ρ/2) dist²(x, X*).

Dividing through by dist(x, X*), the result follows. ∎

The estimate 2µ/ρ of the radius in Lemma 3.1 is tight. To see this, consider minimizing the univariate function f(x) = |x² − λ²| on the real line X = R. Observe that the set of minimizers is X* = {±λ}, while x = 0 is always an extraneous stationary point. A quick computation shows that the smallest valid weak convexity constant is ρ = 2, while the largest valid sharpness constant is µ = λ. We therefore deduce dist(0, X*) = λ = 2µ/ρ. Hence the radius 2µ/ρ of the region devoid of extraneous stationary points is tight.

In light of Lemma 3.1, let us define for any γ > 0 the tube

    T_γ := { z ∈ R^d : dist(z, X*) ≤ γ · µ/ρ }.   (3.2)

Thus we would like to search for algorithms whose basin of attraction is a tube T_γ for some numerical constant γ > 0. Due to the above discussion, such a basin of attraction is in essence optimal.

We next discuss two rapidly converging algorithms. The first is the Polyak subgradient method, outlined in Algorithm 1. Notice that the only parameter needed to implement the procedure is the minimal value of the problem (3.1). This value is sometimes known; case in point, the minimal value of the penalty formulation (1.3) is zero when the bilinear measurements are exact.

Algorithm 1: Polyak Subgradient Method
  Data: x_0 ∈ R^d
  Step k (k ≥ 0): Choose ζ_k ∈ ∂f(x_k). If ζ_k = 0, then exit the algorithm.
    Set x_{k+1} = proj_X( x_k − ((f(x_k) − min_X f)/‖ζ_k‖²) ζ_k ).

The rate of convergence of the method relies on the Lipschitz constant and the condition measure

    L := sup{ ‖ζ‖ : ζ ∈ ∂f(x), x ∈ T_1 }   and   τ := µ/L.

A straightforward argument [0, Lemma 3] shows τ ∈ [0, 1]. The following theorem appears as [0, Theorem 4], while its application to phase retrieval was investigated in [].

Theorem 3.2 (Polyak subgradient method). Suppose that Assumption A holds and fix a real γ ∈ (0, 1). Then Algorithm 1, initialized at any point x_0 ∈ T_γ, produces iterates that converge Q-linearly to X*, that is,

    dist²(x_{k+1}, X*) ≤ ( 1 − (1 − γ)τ² ) dist²(x_k, X*)   for all k ≥ 0.
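For concreteness, here is a minimal Julia sketch of Algorithm 1 for a generic instance; the callables `f`, `subgrad`, `proj_X` and the optimal value `fmin` are assumed to be supplied by the user (for the penalty (1.3) with exact measurements one may take `fmin = 0`). The stopping rule and iteration cap are illustrative choices, not part of the paper.

```julia
using LinearAlgebra

# Polyak subgradient method (Algorithm 1).  The step length is dictated by the
# gap f(x_k) - min f, so the only tuning input is the optimal value `fmin`.
function polyak_subgradient(x0, f, subgrad, proj_X, fmin; maxit = 500)
    x = copy(x0)
    for k in 1:maxit
        ζ = subgrad(x)
        nζ = norm(ζ)
        nζ == 0 && break                         # stationary point: stop
        x = proj_X(x .- ((f(x) - fmin) / nζ^2) .* ζ)
    end
    return x
end
```

Algorithm 2 below replaces the Polyak step length by the geometrically decaying choice λ q^k / ‖ζ_k‖, which only changes the single line computing the update.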

When the minimal value of the problem (3.1) is unknown, there is a straightforward modification of the subgradient method that converges R-linearly. The idea is to choose a geometrically decaying control sequence for the stepsize. The disadvantage is that the convergence guarantees rely on being able to tune estimates of L, ρ, and µ.

Algorithm 2: Subgradient method with geometrically decreasing stepsize
  Data: Real λ > 0 and q ∈ (0, 1)
  Step k (k ≥ 0): Choose ζ_k ∈ ∂f(x_k). If ζ_k = 0, then exit the algorithm.
    Set stepsize α_k = λ q^k. Update the iterate x_{k+1} = proj_X( x_k − α_k ζ_k/‖ζ_k‖ ).

The following theorem appears as [0, Theorem 6]. The convex version of the result dates back to Goffin [6].

Theorem 3.3 (Geometrically decaying subgradient method). Suppose that Assumption A holds, fix a real γ ∈ (0, 1), and suppose τ ≤ √(1/(2 − γ)). Set λ := γµ²/(ρL) and q := √(1 − (1 − γ)τ²). Then the iterates x_k generated by Algorithm 2, initialized at a point x_0 ∈ T_γ, satisfy

    dist²(x_k, X*) ≤ (γ²µ²/ρ²) ( 1 − (1 − γ)τ² )^k   for all k ≥ 0.   (3.4)

Notice that both subgradient Algorithms 1 and 2 are at best locally linearly convergent, with a relatively cheap per-iteration cost. As the last example, we discuss an algorithm that is specifically designed for convex compositions and is locally quadratically convergent. The caveat is that the method may have a high per-iteration cost, since in each iteration one must solve an auxiliary convex optimization problem. Setting the stage, let us introduce the following assumption.

Assumption B. Consider the optimization problem

    min_{x ∈ X} f(x) := h(F(x)),   (3.5)

and suppose that the following properties hold for some real µ, ρ > 0.

(B1) (Convexity and smoothness) The function h and the set X are convex, and F is differentiable.
(B2) (Approximation accuracy) The convex models f_x(y) := h( F(x) + ∇F(x)(y − x) ) satisfy the estimate

    | f(y) − f_x(y) | ≤ (ρ/2)‖y − x‖²   for all x, y ∈ X.   (3.6)

(B3) (Sharpness) The set of minimizers X* := argmin_{x∈X} f(x) is nonempty, and the inequality f(x) − inf_X f ≥ µ · dist(x, X*) holds for all x ∈ X.

It is straightforward to see that Assumption B implies that f is ρ-weakly convex; see e.g. [4, Lemma 7.3]. Therefore Assumption B implies Assumption A. Algorithm 3 describes the prox-linear method, a close variant of Gauss–Newton. For a historical account of the prox-linear method, see e.g. [0, 4, 3] and the references therein.

Algorithm 3: Prox-linear algorithm
  Data: Initial point x_0 ∈ R^d, proximal parameter β > 0
  Step k (k ≥ 0): Set x_{k+1} ← argmin_{x ∈ X} { h( F(x_k) + ∇F(x_k)(x − x_k) ) + (β/2)‖x − x_k‖² }.

The following theorem proves that under Assumption B the prox-linear method converges quadratically when initialized sufficiently close to the solution set. Guarantees of this type have appeared, for example, in [, 3, 3, 5]. For the sake of completeness, we provide a quick argument.

Theorem 3.4 (Prox-linear algorithm). Suppose Assumption B holds. Choose any β ≥ ρ and set γ := ρ/β. Then Algorithm 3, initialized at any point x_0 ∈ T_γ, converges quadratically:

    dist(x_{k+1}, X*) ≤ (β/µ) dist²(x_k, X*)   for all k ≥ 0.

Proof. Consider an iterate x_k and choose any x̄ ∈ proj_{X*}(x_k). Taking into account that the function x ↦ f_{x_k}(x) + (β/2)‖x − x_k‖² is β-strongly convex and x_{k+1} is its minimizer, we deduce

    f_{x_k}(x_{k+1}) + (β/2)‖x_{k+1} − x_k‖² + (β/2)‖x_{k+1} − x̄‖² ≤ f_{x_k}(x̄) + (β/2)‖x̄ − x_k‖².

Using Assumption B2, we therefore obtain

    f(x_{k+1}) + (β/2)‖x_{k+1} − x̄‖² ≤ f(x̄) + β‖x̄ − x_k‖².

Rearranging and using the sharpness Assumption B3, we conclude

    µ · dist(x_{k+1}, X*) ≤ f(x_{k+1}) − f(x̄) ≤ β · dist²(x_k, X*),

as claimed. ∎

4 Assumptions and Models

In this section, we aim to interpret the efficiency of the subgradient and prox-linear algorithms discussed in Section 3 when applied to our target problem (1.3). To this end, we must estimate the three parameters ρ, µ, L > 0. These quantities control both the size of the attraction neighborhood around the optimal solution set and the rate of convergence within that neighborhood. In particular, we will show that these quantities are independent of the ambient dimensions d1, d2 under natural assumptions on the data generating mechanism.
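Before estimating these parameters for the blind deconvolution objective, we record, for reference, a minimal Julia sketch of the prox-linear outer loop of Algorithm 3. The convex subproblem solver `solve_model` is left abstract here (Section 6 instantiates it with a graph-splitting ADMM); it, together with `h`, `F`, and the Jacobian `JF`, is an assumed user-supplied input rather than part of the paper.

```julia
using LinearAlgebra

# Outer loop of the prox-linear method (Algorithm 3).  Each iteration builds the
# convex model h(F(x_k) + JF(x_k)(x - x_k)) + (β/2)‖x - x_k‖² and hands it to a
# user-supplied convex solver, warm-started at the current iterate.
function proxlinear(x0, h, F, JF, β, solve_model; maxit = 20)
    x = copy(x0)
    for k in 1:maxit
        mdl = y -> h(F(x) + JF(x) * (y - x)) + (β / 2) * norm(y - x)^2
        x = solve_model(mdl, x)
    end
    return x
end
```

By Theorem 3.4, roughly log log(1/ε) outer iterations suffice once the method is started inside the tube T_γ, which is why only a handful of calls to `solve_model` are needed in practice.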

It will be convenient, for the time being, to abstract away from the formulation (1.3) and instead consider the function

    g(w, x) := (1/m)‖A(wxᵀ) − y‖_1,   (4.1)

where A: R^{d1×d2} → R^m is an arbitrary linear map and y ∈ R^m is an arbitrary vector. The formulation (1.3) corresponds to the particular linear map A(X) = ( l_iᵀ X r_i )_{i=1}^m. Since we will be interested in the prox-linear method, let us define the convex models

    g_{(w,x)}(ŵ, x̂) := (1/m)‖ A( wxᵀ + w(x̂ − x)ᵀ + (ŵ − w)xᵀ ) − y ‖_1.

Our strategy is as follows. Section 4.1 identifies deterministic assumptions on the data A and y that yield favorable estimates of ρ, µ, L > 0. Then Section 4.2 shows that these deterministic assumptions hold with high probability under natural statistical assumptions on the data generating mechanism.

4.1 Favorable deterministic properties

The following property, widely used in the literature, will play a central role in our analysis.

Assumption C (Restricted Isometry Property (RIP)). There exist constants c1, c2 > 0 such that for all matrices X ∈ R^{d1×d2} of rank at most two the following bound holds:

    c1‖X‖_F ≤ (1/m)‖A(X)‖_1 ≤ c2‖X‖_F.

The following proposition estimates the two constants ρ and L governing the performance of the subgradient and prox-linear methods under Assumption C.

Proposition 4.1 (Approximation accuracy and Lipschitz continuity). Suppose Assumption C holds and let K > 0 be arbitrary. Then the following estimates hold:

    | g(ŵ, x̂) − g_{(w,x)}(ŵ, x̂) | ≤ (c2/2)‖(w, x) − (ŵ, x̂)‖²   for all x, x̂ ∈ R^{d2}, w, ŵ ∈ R^{d1},
    | g(w, x) − g(ŵ, x̂) | ≤ √2 c2 K ‖(w, x) − (ŵ, x̂)‖   for all x, x̂ ∈ K·B, w, ŵ ∈ K·B.

Proof. To see the first estimate, observe

    | g(ŵ, x̂) − g_{(w,x)}(ŵ, x̂) |
      = (1/m)| ‖A(ŵx̂ᵀ) − y‖_1 − ‖A( wxᵀ + w(x̂ − x)ᵀ + (ŵ − w)xᵀ ) − y‖_1 |
      ≤ (1/m)‖ A( ŵx̂ᵀ − wxᵀ − w(x̂ − x)ᵀ − (ŵ − w)xᵀ ) ‖_1
      = (1/m)‖ A( (w − ŵ)(x − x̂)ᵀ ) ‖_1
      ≤ c2 ‖(w − ŵ)(x − x̂)ᵀ‖_F
      ≤ (c2/2)( ‖w − ŵ‖² + ‖x − x̂‖² ),

where the last estimate follows from Young's inequality 2ab ≤ a² + b². Now suppose w, ŵ ∈ K·B and x, x̂ ∈ K·B. We then successively compute

    | g(w, x) − g(ŵ, x̂) | ≤ (1/m)‖A( wxᵀ − ŵx̂ᵀ )‖_1 ≤ c2‖wxᵀ − ŵx̂ᵀ‖_F
      = c2‖ (w − ŵ)xᵀ + ŵ(x − x̂)ᵀ ‖_F
      ≤ c2‖x‖·‖w − ŵ‖ + c2‖ŵ‖·‖x − x̂‖
      ≤ √2 c2 K ‖(w, x) − (ŵ, x̂)‖.

The proof is complete. ∎

We next move on to estimates of the sharpness constant µ. To this end, consider two vectors w̄ ∈ R^{d1} and x̄ ∈ R^{d2}, and set M̄ := ‖x̄w̄ᵀ‖_F = ‖x̄‖·‖w̄‖. Without loss of generality, we henceforth suppose ‖w̄‖ = ‖x̄‖. Our estimates on the sharpness constant will be valid only on bounded sets. Consequently, define the two sets

    S_ν := √(νM̄)·( B_{d1} × B_{d2} ),   S*_ν := { (αw̄, x̄/α) : 1/√ν ≤ α ≤ √ν }.

The set S_ν simply encodes a bounded region, while S*_ν encodes all rank-one factorizations of the matrix w̄x̄ᵀ with bounded factors. We begin with the following proposition, which analyzes the sharpness properties of the idealized function (w, x) ↦ ‖wxᵀ − w̄x̄ᵀ‖_F. The proof is quite long, and we have therefore placed it in Appendix A.

Proposition 4.2. For any ν ≥ 1, we have the bound

    ‖wxᵀ − w̄x̄ᵀ‖_F ≥ ( √M̄ / (√2(ν + 1)) ) · dist( (w, x), S*_ν )   for all (w, x) ∈ S_ν.

Thus the function (w, x) ↦ ‖wxᵀ − w̄x̄ᵀ‖_F is sharp on the set S_ν with coefficient √M̄/(√2(ν + 1)). We note in passing that the analogue of Proposition 4.2 for symmetric matrices was proved in [49, Lemma 5.4]. The sharpness of the loss g(·,·) in the noiseless regime (i.e. when y = A(w̄x̄ᵀ)) is now immediate.

Proposition 4.3 (Sharpness in the noiseless regime). Suppose that Assumption C holds and that the equality y = A(w̄x̄ᵀ) holds. Then for any ν ≥ 1, we have the bound

    g(w, x) − g(w̄, x̄) ≥ ( c1√M̄ / (√2(ν + 1)) ) · dist( (w, x), S*_ν )   for all (w, x) ∈ S_ν.

Proof. Using Assumption C and Proposition 4.2, we deduce for all (w, x) ∈ S_ν

    g(w, x) − g(w̄, x̄) = (1/m)‖A( wxᵀ − w̄x̄ᵀ )‖_1 ≥ c1‖wxᵀ − w̄x̄ᵀ‖_F ≥ ( c1√M̄/(√2(ν + 1)) ) dist( (w, x), S*_ν ),

as claimed. ∎
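Assumption C can be probed numerically for a given operator A. The sketch below, a rough empirical check rather than anything from the paper, draws random rank-two matrices X and records the ratios (1/m)‖A(X)‖_1/‖X‖_F, whose minimum and maximum estimate the constants c1 and c2; the Gaussian data is illustrative only.

```julia
using LinearAlgebra, Random

# Empirically estimate the RIP constants (c1, c2) of Assumption C for the
# operator A(X)_i = l_i' X r_i, by sampling random rank-two matrices X.
function estimate_rip(L, R; trials = 1000)
    m, _ = size(L)
    d1, d2 = size(L, 2), size(R, 2)
    ratios = Float64[]
    for _ in 1:trials
        X = randn(d1, 2) * randn(2, d2)           # random rank-two matrix
        AX = sum((L * X) .* R, dims = 2)          # AX[i] = l_i' X r_i
        push!(ratios, sum(abs, AX) / (m * norm(X)))
    end
    return minimum(ratios), maximum(ratios)        # estimates of (c1, c2)
end
```

On Gaussian data with m a modest multiple of d1 + d2, the two returned values should be of the same order and essentially independent of the dimensions, matching the theory of Section 4.2.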

Sharpness in the noisy case requires an additional assumption, which we record below. Henceforth, for any set I ⊂ {1, ..., m}, we define the restricted linear map A_I: R^{d1×d2} → R^{|I|} by setting A_I(X) := ( A(X)_i )_{i∈I}.

Assumption D (I-outlier bounds). There exist a set I ⊂ {1, ..., m}, vectors w̄ ∈ R^{d1}, x̄ ∈ R^{d2}, and a constant c3 > 0 such that the following hold.

(D1) The equality y_i = A(w̄x̄ᵀ)_i holds for all i ∉ I.
(D2) For all matrices X ∈ R^{d1×d2} of rank at most two, we have

    c3‖X‖_F ≤ (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ).   (4.2)

Combining Assumption D with Proposition 4.2 quickly yields sharpness of the objective even in the noisy setting.

Proposition 4.4 (Sharpness in the noisy regime). Suppose that Assumption D holds. Then

    g(w, x) − g(w̄, x̄) ≥ ( c3√M̄ / (√2(ν + 1)) ) · dist( (w, x), S*_ν )   for all (w, x) ∈ S_ν.

Proof. Defining η = A(w̄x̄ᵀ) − y, we have the bound

    m( g(w, x) − g(w̄, x̄) ) = ‖A( wxᵀ − w̄x̄ᵀ ) + η‖_1 − ‖η‖_1
      = Σ_{i∉I} | A( wxᵀ − w̄x̄ᵀ )_i | + Σ_{i∈I} ( | A( wxᵀ − w̄x̄ᵀ )_i + η_i | − |η_i| )
      ≥ Σ_{i∉I} | A( wxᵀ − w̄x̄ᵀ )_i | − Σ_{i∈I} | A( wxᵀ − w̄x̄ᵀ )_i |
      = ‖A_{I^c}( wxᵀ − w̄x̄ᵀ )‖_1 − ‖A_I( wxᵀ − w̄x̄ᵀ )‖_1
      ≥ m c3 ‖wxᵀ − w̄x̄ᵀ‖_F
      ≥ m ( c3√M̄/(√2(ν + 1)) ) dist( (w, x), S*_ν ),

where the first inequality follows from the reverse triangle inequality, the second inequality follows from Assumption D2, and the final inequality follows from Proposition 4.2. The proof is complete. ∎

To summarize, suppose Assumptions C and D are valid. Then, in the notation of Section 3, we may set

    ρ = c2,   L = c2√(2νM̄),   µ = c3√M̄/(√2(ν + 1)).

Consequently, the tube T_1 has radius µ/ρ = c3√M̄/(√2 c2(ν + 1)), and the linear convergence rate of the subgradient method is governed by the ratio τ = µ/L = c3/(2c2√(ν(ν + 1))). In particular, the local search algorithms must be initialized at a point (w, x) whose relative distance to the solution set, dist((w, x), S*_ν)/√(‖w̄x̄ᵀ‖_F), is upper bounded by a constant. We record this conclusion below.

Corollary 4.5 (Convergence guarantees). Suppose Assumptions C and D are valid, and consider the optimization problem

    min_{(w,x) ∈ S_ν} g(w, x) = (1/m)‖A(wxᵀ) − y‖_1.

With ρ, L, µ, τ as above, choose any pair (w_0, x_0) satisfying

    dist( (w_0, x_0), S*_ν ) / √(‖w̄x̄ᵀ‖_F) ≤ c3 / ( 4 c2(ν + 1) ).

Then the following are true.

1. (Polyak subgradient) Algorithm 1 initialized at (w_0, x_0) produces iterates that converge linearly to S*_ν, that is,

    dist²( (w_k, x_k), S*_ν ) ≤ ( c3²‖w̄x̄ᵀ‖_F / (16 c2²(ν + 1)²) ) ( 1 − c3²/(8c2²ν(ν + 1)) )^k   for all k ≥ 0.

2. (geometric subgradient) Set λ := µ²/(2ρL) and q := √(1 − τ²/2). Then the iterates generated by Algorithm 2, initialized at (w_0, x_0), converge linearly:

    dist²( (w_k, x_k), S*_ν ) ≤ ( c3²‖w̄x̄ᵀ‖_F / (8 c2²(ν + 1)²) ) ( 1 − c3²/(8c2²ν(ν + 1)) )^k   for all k ≥ 0.

3. (prox-linear) Algorithm 3 with β = ρ and initialized at (w_0, x_0) converges quadratically:

    dist( (w_k, x_k), S*_ν ) ≤ ( c3√(‖w̄x̄ᵀ‖_F) / (√2 c2(ν + 1)) ) · 2^{−2^k}   for all k ≥ 0.

4.2 Assumptions under generative models

In this section, we present natural generative models under which Assumptions C and D are guaranteed to hold. Recall that, at a high level, we aim to recover the pair of signals (w̄, x̄) from the corrupted bilinear measurements y. Formally, let us fix two disjoint sets I_in ⊂ [m] and I_out ⊂ [m], called the inlier and outlier sets. Intuitively, the index set I_in encodes exact measurements, while I_out encodes measurements that have been replaced by gross outliers. Define the corruption frequency p_fail := |I_out|/m; henceforth, we suppose p_fail ∈ [0, 1/2). Then, for an arbitrary (potentially random) sequence {ξ_i}_{i=1}^m, we consider the measurement model

    y_i := ⟨l_i, w̄⟩⟨r_i, x̄⟩  if i ∈ I_in,   y_i := ξ_i  if i ∈ I_out.   (4.3)
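A minimal Julia sketch of the measurement model (4.3) with Gaussian measurement vectors follows; the specific corruption ξ_i ~ N(0, 1) is only one admissible choice (the model places no restriction on ξ), and the function name and interface are ours.

```julia
using LinearAlgebra, Random

# Generate corrupted bilinear measurements following (4.3):
# y_i = <l_i, w̄><r_i, x̄> for inliers, y_i = ξ_i for outliers.
function generate_measurements(wtrue, xtrue, m, pfail; rng = Random.default_rng())
    d1, d2 = length(wtrue), length(xtrue)
    L, R = randn(rng, m, d1), randn(rng, m, d2)    # rows are l_i' and r_i'
    y = (L * wtrue) .* (R * xtrue)                 # exact bilinear measurements
    outliers = randperm(rng, m)[1:round(Int, pfail * m)]
    y[outliers] .= randn(rng, length(outliers))    # gross corruptions ξ_i (illustrative)
    return L, R, y, outliers
end
```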

In accordance with the previous section, we define the linear map A: R^{d1×d2} → R^m by A(X) = ( l_iᵀ X r_i )_{i=1}^m. To simplify notation, we let L ∈ R^{m×d1} denote the matrix whose rows (in column form) are l_i, and we let R ∈ R^{m×d2} denote the matrix whose rows are r_i. Note that we make no assumptions about the nature of ξ_i; in particular, ξ_i can even encode exact measurements for a different signal.

We focus on two measurement matrix models. The first model requires both matrices L and R to be random. For simplicity, the reader may assume both are Gaussian with i.i.d. entries, though the results of this paper extend beyond this case. The second model allows semi-deterministic matrices, namely deterministic L and Gaussian R with i.i.d. entries. In later parts of the paper, we will put further incoherence assumptions on the deterministic matrix L.

Random matrix models:

(M1) The vectors l_i and r_i are i.i.d. realizations of η-sub-Gaussian random vectors l ∈ R^{d1} and r ∈ R^{d2}, respectively. Suppose moreover that l and r are independent and satisfy, for some real µ_0, p_0 > 0, the nondegeneracy condition

    inf_{X : rank X ≤ 2, ‖X‖_F = 1} P( |lᵀXr| ≥ µ_0 ) ≥ p_0.   (4.4)

(M2) The matrix L is arbitrary and the matrix R is standard Gaussian.

Some comments are in order. The model M1 is fully stochastic, in the sense that l_i and r_i are generated by independent sub-Gaussian random vectors. The nondegeneracy condition (4.4) essentially asserts that, with positive probability, the products lᵀXr are non-negligible, uniformly over all unit-norm rank-two matrices X. In particular, the following example shows that Gaussian matrices with i.i.d. entries are admissible under Model M1. In contrast, the model M2 is semi-stochastic: it allows L to be deterministic, while making the stronger assumption that R is Gaussian.

Example 4 (Gaussian matrices satisfy Model M1). Assume that l and r are standard Gaussian random vectors in R^{d1} and R^{d2}, respectively. We claim this setting is admissible under M1. To see this, fix a rank-two matrix X having unit Frobenius norm. Consider a singular value decomposition X = σ_1 u_1 v_1ᵀ + σ_2 u_2 v_2ᵀ, and note the equality σ_1² + σ_2² = 1. For each index i = 1, 2, define a_i := ⟨l, u_i⟩ and b_i := ⟨v_i, r⟩. Then clearly a_1, a_2, b_1, b_2 are i.i.d. standard Gaussian; see e.g. [5, Exercise 3.3.6]. Thus, for any c ≥ 0, we compute

    P( |lᵀXr| ≥ c ) = P( |σ_1 a_1 b_1 + σ_2 a_2 b_2| ≥ c ) = E[ P( |σ_1 a_1 b_1 + σ_2 a_2 b_2| ≥ c | a_1, a_2 ) ].

Notice that, conditioned on (a_1, a_2), we have σ_1 a_1 b_1 + σ_2 a_2 b_2 ∼ N(0, σ_1²a_1² + σ_2²a_2²). Thus, letting z be a standard normal variable, we have

    P( |lᵀXr| ≥ c ) = E[ P( √(σ_1²a_1² + σ_2²a_2²)·|z| ≥ c | a_1, a_2 ) ] = P( √(σ_1²a_1² + σ_2²a_2²)·|z| ≥ c )
      ≥ P( σ_1|a_1 z| ≥ c ) ≥ P( |a_1 z| ≥ √2 c ),

where the last inequality uses σ_1 ≥ 1/√2. Therefore, we may simply set µ_0 = med(|a_1 z|)/√2 and p_0 = 1/2.
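The constants µ_0 and p_0 of the nondegeneracy condition (4.4) can also be estimated by simulation for any given distribution of (l, r). The sketch below, which is illustrative only, checks the Gaussian case of the example above: with µ_0 = med(|a z|)/√2 the returned empirical probability should be at least about 1/2 for any unit-Frobenius-norm rank-two X.

```julia
using LinearAlgebra, Random

# Monte-Carlo check of the nondegeneracy condition (4.4) for Gaussian l, r and a
# fixed unit-Frobenius-norm rank-two matrix X: estimate P(|l' X r| >= μ0).
function check_nondegeneracy(X, μ0; trials = 10^5)
    d1, d2 = size(X)
    hits = 0
    for _ in 1:trials
        l, r = randn(d1), randn(d2)
        hits += abs(dot(l, X * r)) >= μ0
    end
    return hits / trials        # empirical lower bound for p0
end
```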

4.2.1 Assumptions C and D under Model M1

In this section, we aim to prove the following theorem, which shows the validity of Assumptions C and D under M1 with high probability.

Theorem 4.6 (Measurement Model M1). Consider a set I ⊂ {1, ..., m} satisfying |I| < m/2. Then there exist constants c1, ..., c6 > 0, depending only on µ_0, p_0, η, such that the following holds. As long as

    m ≥ c1 (d1 + d2 + 1)/(1 − 2|I|/m)² · ln( c2/(1 − 2|I|/m) ),

then with probability at least 1 − 4exp( −c3 (1 − 2|I|/m)² m ), every matrix X ∈ R^{d1×d2} of rank at most two satisfies

    c4‖X‖_F ≤ (1/m)‖A(X)‖_1 ≤ c5‖X‖_F,   (4.5)
    (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) ≥ c6 (1 − 2|I|/m) ‖X‖_F.   (4.6)

Due to scale invariance, in the proof we only concern ourselves with matrices X of rank at most two satisfying ‖X‖_F = 1. Let us fix such a matrix X and an arbitrary index set I ⊂ {1, ..., m} with |I| < m/2. We begin with the following lemma.

Lemma 4.7 (Pointwise concentration). The random variable |lᵀXr| is sub-exponential with parameter ≲ η², and consequently the estimate holds:

    µ_0 p_0 ≤ E|lᵀXr| ≲ η².   (4.7)

Moreover, there exists a numerical constant c > 0 such that for any t ∈ (0, η²], with probability at least 1 − 2exp(−c m t²/η⁴), we have the estimate

    | (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) − (1/m) E( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) | ≤ t.   (4.8)

Proof. Markov's inequality along with (4.4) implies

    E|lᵀXr| ≥ µ_0 P( |lᵀXr| ≥ µ_0 ) ≥ µ_0 p_0,

which is the lower bound in (4.7). We now address the upper bound. To that end, suppose that X has a singular value decomposition X = σ_1 U_1 V_1ᵀ + σ_2 U_2 V_2ᵀ. We then deduce

    ‖lᵀXr‖_{ψ1} = ‖ σ_1⟨l, U_1⟩⟨V_1, r⟩ + σ_2⟨l, U_2⟩⟨V_2, r⟩ ‖_{ψ1}
      ≤ σ_1‖⟨l, U_1⟩⟨V_1, r⟩‖_{ψ1} + σ_2‖⟨l, U_2⟩⟨V_2, r⟩‖_{ψ1}
      ≤ σ_1‖⟨l, U_1⟩‖_{ψ2}‖⟨V_1, r⟩‖_{ψ2} + σ_2‖⟨l, U_2⟩‖_{ψ2}‖⟨V_2, r⟩‖_{ψ2}
      ≤ (σ_1 + σ_2)η² ≲ η²,

where the second inequality follows since ‖·‖_{ψ1} is a norm and ‖XY‖_{ψ1} ≤ ‖X‖_{ψ2}‖Y‖_{ψ2} [5, Lemma 2.7.7]. This bound has two consequences: first, lᵀXr is a sub-exponential random variable with parameter ≲ η², and second, E|lᵀXr| ≲ η²; see [5, Exercise 7]. The first bound will be useful momentarily, while the second completes the proof of (4.7).

Next, define the sub-exponential random variables

    Y_i = |l_iᵀXr_i| − E|l_iᵀXr_i|  if i ∉ I,   Y_i = −( |l_iᵀXr_i| − E|l_iᵀXr_i| )  if i ∈ I.

Standard results (e.g. [5, Exercise 7.10]) imply ‖Y_i‖_{ψ1} ≲ η² for all i. Using Bernstein's inequality for sub-exponential random variables (Theorem C.6) to upper bound P( |(1/m)Σ_{i=1}^m Y_i| ≥ t ) completes the proof. ∎

Proof of Theorem 4.6. Choose ε ∈ (0, 1/4] and let N be the (ε/√2)-net guaranteed by Lemma C.1. Let E denote the event that the following two estimates hold for all matrices X ∈ N:

    | (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) − (1/m)E( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) | ≤ t,   (4.9)
    | (1/m)‖A(X)‖_1 − (1/m)E‖A(X)‖_1 | ≤ t.   (4.10)

Throughout the proof, we will assume that the event E holds; we estimate the probability of E at the end of the proof. Meanwhile, seeking to establish RIP, define the quantity ĉ := sup_{X∈S} (1/m)‖A(X)‖_1, where S denotes the set of rank-at-most-two matrices of unit Frobenius norm. We aim first to provide a high-probability bound on ĉ. Let X ∈ S be arbitrary and let X̂ be the closest point to X in N. Then we have

    (1/m)‖A(X)‖_1 ≤ (1/m)‖A(X̂)‖_1 + (1/m)‖A(X − X̂)‖_1   (4.11)
      ≤ (1/m)E‖A(X̂)‖_1 + t + (1/m)‖A(X − X̂)‖_1,   (4.12)

where (4.12) follows from (4.10). To simplify the last term in (4.12), using the SVD we deduce that there exist two mutually orthogonal matrices X_1, X_2 of rank at most two satisfying X − X̂ = X_1 + X_2. With this decomposition in hand, we compute

    (1/m)‖A(X − X̂)‖_1 ≤ (1/m)‖A(X_1)‖_1 + (1/m)‖A(X_2)‖_1 ≤ ĉ( ‖X_1‖_F + ‖X_2‖_F ) ≤ ĉ√2‖X − X̂‖_F ≤ ĉε,

where the second inequality follows from the definition of ĉ and the estimate ‖X_1‖_F + ‖X_2‖_F ≤ √2·‖X_1 + X_2‖_F. Thus, we arrive at the bound

    (1/m)‖A(X)‖_1 ≤ (1/m)E‖A(X̂)‖_1 + t + ĉε.   (4.13)

As X ∈ S was arbitrary, we may take the supremum of both sides of the inequality (4.13), yielding

    ĉ ≤ sup_{X∈S} (1/m)E‖A(X)‖_1 + t + ĉε.

Rearranging, and assuming ε ≤ 1/4, we further deduce

    ĉ ≤ ( sup_{X∈S} (1/m)E‖A(X)‖_1 + t )/(1 − ε) ≤ 2σ̄,   where σ̄ := sup_{X∈S} (1/m)E‖A(X)‖_1 + t,   (4.14)

establishing that the random variable ĉ is bounded by 2σ̄ in the event E. Now let Î denote either Î = ∅ or Î = I. We provide a uniform lower bound on (1/m)( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ). Indeed,

    (1/m)( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 )
      = (1/m)( ‖A_{Î^c}(X̂) + A_{Î^c}(X − X̂)‖_1 − ‖A_Î(X̂) + A_Î(X − X̂)‖_1 )
      ≥ (1/m)( ‖A_{Î^c}(X̂)‖_1 − ‖A_Î(X̂)‖_1 ) − (1/m)‖A(X − X̂)‖_1   (4.15)
      ≥ (1/m)E( ‖A_{Î^c}(X̂)‖_1 − ‖A_Î(X̂)‖_1 ) − t − (1/m)‖A(X − X̂)‖_1   (4.16)
      ≥ (1/m)E( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ) − t − (1/m)E‖A(X − X̂)‖_1 − (1/m)‖A(X − X̂)‖_1   (4.17)
      ≥ (1/m)E( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ) − t − 3σ̄ε,   (4.18)

where (4.15) uses the forward and reverse triangle inequalities, (4.16) follows from (4.9), the estimate (4.17) again follows from the forward and reverse triangle inequalities, and (4.18) follows from the argument leading to (4.13) together with (4.14). Switching the roles of I and I^c in the above sequence of inequalities, and choosing ε = t/(4σ̄), we deduce

    sup_{X∈S} | (1/m)( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ) − (1/m)E( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ) | ≤ 3t.

In particular, setting Î = ∅, we deduce

    sup_{X∈S} | (1/m)‖A(X)‖_1 − (1/m)E‖A(X)‖_1 | ≤ 3t,

and therefore, using (4.7), we conclude the RIP property

    µ_0 p_0 − 3t ≤ (1/m)‖A(X)‖_1 ≲ η² + 3t   for all X ∈ S.   (4.19)

Next, let Î = I and note that

    (1/m)E( ‖A_Î^c(X)‖_1 − ‖A_Î(X)‖_1 ) = ( (|I^c| − |I|)/m )·E|lᵀXr| ≥ µ_0 p_0 (1 − 2|I|/m).

Therefore, every X ∈ S satisfies

    (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) ≥ µ_0 p_0 (1 − 2|I|/m) − 3t.   (4.20)

Setting t = (1/3)·min{ µ_0p_0/2, µ_0p_0(1 − 2|I|/m)/2 } = (1/6)µ_0p_0(1 − 2|I|/m) in (4.19) and (4.20), we deduce the claimed estimates (4.5) and (4.6). Finally, let us estimate the probability of E. Using Lemma 4.7 and the union bound yields

    P(E^c) ≤ Σ_{X∈N} P( (4.9) or (4.10) fails at X ) ≤ 4|N| exp( −c m t²/η⁴ )
      ≤ 4 exp( 2(d1 + d2 + 1)ln(9/ε) − c m t²/η⁴ ),

where the second inequality follows from Lemma C.1 and c is a numerical constant. Since 1/ε = 4σ̄/t ≲ (η² + 1)/(µ_0p_0(1 − 2|I|/m)), we deduce

    P(E^c) ≤ 4 exp( c'(d1 + d2 + 1) ln( c''/(1 − 2|I|/m) ) − (c µ_0²p_0²/(36η⁴))(1 − 2|I|/m)² m ).

Hence, as long as m ≥ (72η⁴c'/(cµ_0²p_0²))·(d1 + d2 + 1) ln( c''/(1 − 2|I|/m) )/(1 − 2|I|/m)², we can be sure that P(E^c) ≤ 4 exp( −(cµ_0²p_0²/(72η⁴))(1 − 2|I|/m)² m ). The result follows immediately. ∎

Combining Theorem 4.6 with Corollary 4.5, we obtain the following guarantee.

Corollary 4.8 (Convergence guarantees). Consider the measurement model (4.3) and suppose that Model M1 is valid. Consider the optimization problem

    min_{(w,x) ∈ S_ν}  f(w, x) = (1/m) Σ_{i=1}^m | ⟨l_i, w⟩⟨r_i, x⟩ − y_i |.

Then there exist constants c1, ..., c6 > 0, depending only on µ_0, p_0, η, such that, as long as

    m ≥ c1 (d1 + d2 + 1)/(1 − 2p_fail)² · ln( c2/(1 − 2p_fail) )

and one chooses any pair (w_0, x_0) with relative error

    dist( (w_0, x_0), S*_ν ) / √(‖w̄x̄ᵀ‖_F) ≤ c6(1 − 2p_fail) / ( 4 c5(ν + 1) ),   (4.21)

then with probability at least 1 − 4exp( −c3(1 − 2p_fail)² m ) the following are true.

1. (Polyak subgradient) Algorithm 1 initialized at (w_0, x_0) produces iterates that converge linearly to S*_ν, that is,

    dist²( (w_k, x_k), S*_ν ) ≤ ( c6²(1 − 2p_fail)²‖w̄x̄ᵀ‖_F / (16 c5²(ν + 1)²) ) ( 1 − c6²(1 − 2p_fail)²/(8c5²ν(ν + 1)) )^k   for all k ≥ 0.

2. (geometric subgradient) Set λ := µ²/(2ρL) and q := √(1 − τ²/2), with ρ = c5, L = c5√(2νM̄), µ = c6(1 − 2p_fail)√M̄/(√2(ν + 1)), and τ = µ/L. Then the iterates generated by Algorithm 2, initialized at (w_0, x_0), converge linearly at the same rate as in item 1.

3. (prox-linear) Algorithm 3 with β = ρ and initialized at (w_0, x_0) converges quadratically:

    dist( (w_k, x_k), S*_ν ) ≤ ( c6(1 − 2p_fail)√(‖w̄x̄ᵀ‖_F) / (√2 c5(ν + 1)) ) · 2^{−2^k}   for all k ≥ 0.

Thus, with high probability, if one initializes the subgradient and prox-linear methods at a pair (w_0, x_0) satisfying dist((w_0, x_0), S*_ν)/√(‖w̄x̄ᵀ‖_F) ≤ c6(1 − 2p_fail)/(4c5(ν + 1)), then the methods converge to the optimal solution set at a dimension-independent rate.

4.2.2 Assumptions C and D under Model M2

In this section, we verify Assumptions C and D under Model M2 and an extra incoherence condition. Namely, we impose further conditions on the ℓ_p/ℓ_2 singular values of L,

    σ_{p,min}(L) := inf_{w ∈ S^{d1−1}} ‖Lw‖_p   and   σ_{p,max}(L) := sup_{w ∈ S^{d1−1}} ‖Lw‖_p,

which intuitively guarantee that the entries of any vector in { A(X) : rank(X) ≤ 2 } are well spread.

Proposition 4.9 (Measurement Model M2). Assume Model M2 and fix an arbitrary index set I ⊂ {1, ..., m}. Define the parameter

    Δ := (1/(m√π))·( σ_{1,min}(L) − 4 σ_{1,max}(L_I) ),

and suppose Δ > 0. Then there exist numerical constants c1, c2, c3 > 0 such that, with probability at least

    1 − 4 exp( c1(d1 + d2 + 1) ln( c2( 1 + √m·σ_{2,max}(L)/(mΔ) ) ) − c3 m²Δ²/σ_{2,max}²(L) ),

every matrix X ∈ R^{d1×d2} of rank at most two satisfies

    ( σ_{1,min}(L)/(2m√π) )·‖X‖_F ≤ (1/m)‖A(X)‖_1 ≤ ( 2σ_{2,max}(L)/√(πm) + Δ )·‖X‖_F,   (4.22)

and

    (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) ≥ (Δ/2)·‖X‖_F.   (4.23)

Proof. The argument mirrors the proof of Theorem 4.6, and therefore we only provide a sketch. Fix a unit Frobenius norm matrix X of rank at most two. We aim to show that, for any fixed Î ⊂ {1, ..., m}, the following random variable is highly concentrated around its mean:

    Z_Î := (1/m)( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ).

To that end, fix a singular value decomposition X = s_1 u_1 v_1ᵀ + s_2 u_2 v_2ᵀ. We then compute

    A(X)_i = l_iᵀ( s_1 u_1 v_1ᵀ + s_2 u_2 v_2ᵀ )r_i = s_1⟨l_i, u_1⟩⟨v_1, r_i⟩ + s_2⟨l_i, u_2⟩⟨v_2, r_i⟩,

where v_1 and v_2 are orthogonal, s_1² + s_2² = 1, and ⟨v_1, r_i⟩, ⟨v_2, r_i⟩ are i.i.d. standard normal random variables. This decomposition, together with the rotation invariance of the normal distribution, furnishes the distributional equivalence

    A(X)_i  =_d  √( s_1²⟨l_i, u_1⟩² + s_2²⟨l_i, u_2⟩² )·γ_i,

where γ_i is a standard normal random variable. Consequently, we have the following expression for the expectation:

    E[Z_Î] = √(2/π)·(1/m)( Σ_{i∈Î^c} √( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² ) − Σ_{i∈Î} √( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² ) ).

We now upper/lower bound this expectation. The upper bound follows from the estimate

    E[Z_Î] ≤ E[ (1/m)‖A(X)‖_1 ] ≤ √(2/π)·(1/√m)·√( ‖Lu_1‖² + ‖Lu_2‖² ) ≤ 2σ_{2,max}(L)/√(πm).

The lower bound uses the two-dimensional inequality ‖z‖_2 ≥ ‖z‖_1/√2, which holds for all z ∈ R²:

    E[Z_Î] = √(2/π)·(1/m)( Σ_{i=1}^m √( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² ) − 2Σ_{i∈Î} √( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² ) )
      ≥ √(2/π)·(1/m)( (1/√2)( s_1‖Lu_1‖_1 + s_2‖Lu_2‖_1 ) − 2Σ_{i∈Î}( s_1|⟨l_i,u_1⟩| + s_2|⟨l_i,u_2⟩| ) )
      ≥ √(2/π)·(1/m)( (1/√2)σ_{1,min}(L) − 2√2·σ_{1,max}(L_Î) )
      = (1/(m√π))( σ_{1,min}(L) − 4σ_{1,max}(L_Î) ).

In particular, setting Î = ∅, we deduce

    σ_{1,min}(L)/(m√π) ≤ E[ (1/m)‖A(X)‖_1 ] ≤ 2σ_{2,max}(L)/√(πm).   (4.24)

To establish concentration of the random variable Z_Î, we apply a standard result (Theorem C.5) on the concentration of weighted sums of mean-zero independent sub-Gaussian random variables. To apply Theorem C.5, we write Y_i = |γ_i| − E|γ_i| and define the weights

    a_i = (1/m)√( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² )  if i ∉ Î,   a_i = −(1/m)√( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² )  if i ∈ Î.

Noticing that ‖ |γ_i| − E|γ_i| ‖_{ψ2} ≤ K, where K > 0 is an absolute constant, and

    ‖a‖² = (1/m²) Σ_{i=1}^m ( s_1²⟨l_i,u_1⟩² + s_2²⟨l_i,u_2⟩² ) ≤ σ_{2,max}²(L)/m²,

it follows that, for any fixed unit Frobenius norm matrix X of rank at most two, with probability at least 1 − 2exp( −c t² m²/(K²σ_{2,max}²(L)) ), we have

    | Z_Î − E[Z_Î] | ≤ t.   (4.25)

We have thus established concentration for any fixed X. We now proceed with a covering argument in the same way as in the proof of Theorem 4.6. To this end, choose ε ∈ (0, 1/4] and let N be the (ε/√2)-net guaranteed by Lemma C.1. Let E denote the event that the estimate (4.25) holds for all matrices X ∈ N, both for Î = ∅ and Î = I. Throughout, we assume that the event E holds. By exactly the same covering argument as in Theorem 4.6, setting ε = t/(4σ̄) with σ̄ := sup_{X∈S} E[(1/m)‖A(X)‖_1] + t, we deduce

    sup_{X∈S} | (1/m)( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ) − E[ (1/m)( ‖A_{Î^c}(X)‖_1 − ‖A_Î(X)‖_1 ) ] | ≤ 3t,

where either Î = ∅ or Î = I. In particular, setting Î = ∅ and using the bound (4.24), we deduce

    σ_{1,min}(L)/(m√π) − 3t ≤ (1/m)‖A(X)‖_1 ≤ 2σ_{2,max}(L)/√(πm) + 3t   for all X ∈ S.

In turn, setting Î = I, we deduce

    (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) ≥ E[ (1/m)( ‖A_{I^c}(X)‖_1 − ‖A_I(X)‖_1 ) ] − 3t ≥ Δ − 3t.

Setting t := Δ/6, the estimates (4.22) and (4.23) follow immediately (note that Δ ≤ σ_{1,min}(L)/(m√π), so that σ_{1,min}(L)/(m√π) − 3t ≥ σ_{1,min}(L)/(2m√π)). Finally, estimating the probability of E using the union bound quickly yields

    P(E^c) ≤ 4 exp( c1(d1 + d2 + 1) ln( c2( 1 + √m·σ_{2,max}(L)/(mΔ) ) ) − c3 m²Δ²/σ_{2,max}²(L) ).

The result follows. ∎

5 Initialization

The previous sections focused on local convergence guarantees under various statistical assumptions. In particular, under Assumptions C and D, one must initialize the local search procedures at a point (w, x) whose relative distance to the solution set, dist((w, x), S*_ν)/√(‖x̄w̄ᵀ‖_F), is upper bounded by a constant. In this section, we present a new spectral initialization routine (Algorithm 4) that is able to efficiently find such a point (w, x). The algorithm is inspired by [5, Section 4] and [5].

Before describing the intuition behind the procedure, let us formally introduce our assumptions. Throughout this section, we make the following assumption on the data generating mechanism, which is stronger than Model M1:

(M3) The entries of the matrices L and R are i.i.d. Gaussian.

Our arguments rely heavily on properties of the Gaussian distribution. We note, however, that our experimental results suggest that Algorithm 4 provides high-quality initializations under weaker distributional assumptions. Recall that in the previous sections the noise ξ was arbitrary. In this section, however, we must assume more about the nature of the noise. We consider two different settings:

(N1) The measurement vectors {(l_i, r_i)}_{i=1}^m and the noise sequence {ξ_i}_{i=1}^m are independent.
(N2) The inlying measurement vectors {(l_i, r_i)}_{i∈I_in} and the corrupted observations {ξ_i}_{i∈I_out} are independent.

The noise models N1 and N2 differ in how an adversary may choose to corrupt the measurements. Model N1 allows an adversary to corrupt the signal, but does not allow observation of the measurement vectors {(l_i, r_i)}_{i=1}^m. On the other hand, Model N2 allows an adversary to observe the outlying measurement vectors {(l_i, r_i)}_{i∈I_out} and arbitrarily corrupt those measurements. For example, the adversary may replace the outlying measurements with those taken from a completely different signal: y_i = A(w̄_ip x̄_ipᵀ)_i for i ∈ I_out.

We can now describe the intuition underlying Algorithm 4. Throughout, we denote the unit vectors parallel to w̄ and x̄ by w̃ and x̃, respectively. Algorithm 4 exploits the expected near-orthogonality of the random vectors l_i and r_i to the directions w̃ and x̃, respectively, in order to select a good set of measurement vectors. Namely, since E[⟨l_i, w̃⟩] = E[⟨r_i, x̃⟩] = 0, we expect the minimal eigenvectors of L_init and R_init to be nearly parallel to w̃ and x̃, respectively. Since our measurements are bilinear, we cannot necessarily select vectors for which |⟨l_i, w̃⟩| and |⟨r_i, x̃⟩| are both small; rather, we may only select vectors

for which the product |⟨l_i, w̃⟩⟨r_i, x̃⟩| is small, leading to subtle ambiguities not present in [5, Section 4] and [5]; see Figure 1. Corruptions add further ambiguities, since the noise model N2 allows a constant fraction of measurements to be adversarially modified.

Algorithm 4: Initialization
  Data: y ∈ R^m, L ∈ R^{m×d1}, R ∈ R^{m×d2}
  1. I_sel ← { i : |y_i| ≤ med(|y|) }.
  2. Form the directional estimates:
       L_init ← Σ_{i∈I_sel} l_i l_iᵀ,   R_init ← Σ_{i∈I_sel} r_i r_iᵀ,
       ŵ ← argmin_{p ∈ S^{d1−1}} pᵀ L_init p,   x̂ ← argmin_{q ∈ S^{d2−1}} qᵀ R_init q.
  3. Estimate the norm of the signal:
       M̂ ← argmin_{β ∈ R} G(β) := (1/m) Σ_{i=1}^m | y_i − β⟨l_i, ŵ⟩⟨r_i, x̂⟩ |.
  4. Return w_0 ← sign(M̂)·|M̂|^{1/2}·ŵ and x_0 ← |M̂|^{1/2}·x̂.

Figure 1: Intuition behind the spectral initialization. The pair (l_1, r_1) will be included, since both vectors are almost orthogonal to the true directions; (l_2, r_2) is unlikely to be included, since r_2 is almost aligned with x̃.

Formally, Algorithm 4 estimates an initial signal (w_0, x_0) in two stages: first it constructs a pair of directions (ŵ, x̂) which estimate the true directions w̃ := w̄/‖w̄‖ and x̃ := x̄/‖x̄‖

up to sign; then it constructs an estimate M̂ of the signed signal norm ±M̄, which corrects for sign errors in the first stage. We now discuss both stages in more detail, starting with the direction estimate. Most proofs are deferred to Appendix B. The general proof strategy we follow is analogous to [5, Section 4] for phase retrieval, with some subtle modifications due to asymmetry.

Direction estimate. In the first stage of the algorithm, we estimate the directions w̃ and x̃ up to sign. Key to our argument is the following decomposition (for Model N1), which will be proved in Appendix B:

    L_init = |I_sel|·( I_{d1} − γ_1 w̃w̃ᵀ ) + E_L,   R_init = |I_sel|·( I_{d2} − γ_2 x̃x̃ᵀ ) + E_R,

where γ_1, γ_2 ≍ 1 and the matrices E_L, E_R have small operator norm (decreasing with (d1 + d2)/m) with high probability. Using the Davis–Kahan sin θ theorem [9], we can then show that the minimal eigenvectors of L_init and R_init are sufficiently close to {±w̃} and {±x̃}, respectively.

Proposition 5.1 (Directional estimates). There exist numerical constants c1, c2, C > 0 so that, for any p_fail ∈ [0, 1/10] and t ∈ [0, 1], with probability at least 1 − c1exp(−c2 t² m), the following hold:

    min_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F ≤ C( √(max{d1, d2}/m) + t )   under Model N1, and
    min_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F ≤ C( √p_fail + √(max{d1, d2}/m) + t )   under Model N2.

Norm estimate. In the second stage of the algorithm, we estimate M̄ and correct the sign of the direction estimates from the previous stage. In particular, for any (ŵ, x̂) ∈ S^{d1−1} × S^{d2−1}, define the quantity

    δ := ( (c5 + 1)/(c6(1 − 2p_fail)) ) · min_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F,   (5.1)

where c5 and c6 are as in Theorem 4.6. Then we prove the following estimate (see Appendix B).

Proposition 5.2 (Norm estimate). Under either noise model, N1 or N2, there exist numerical constants c1, ..., c6 > 0 so that, if m ≥ c1(d1 + d2 + 1)/(1 − 2p_fail)² · ln( c2/(1 − 2p_fail) ), then with probability at least 1 − 4exp(−c3(1 − 2p_fail)²m), any minimizer M̂ of the function

    G(β) := (1/m) Σ_{i=1}^m | y_i − β⟨l_i, ŵ⟩⟨x̂, r_i⟩ |

satisfies |M̂ − M̄| ≤ δM̄. Moreover, if in this event δ < 1, then we have sign(M̂) = argmin_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F.
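The whole procedure fits in a few lines of Julia. The sketch below is our own rendering of Algorithm 4, assuming Gaussian L and R as in Model M3. The one-dimensional least-absolute-deviations step defining M̂ is solved here via the standard weighted-median characterization (a minimizer of Σ_i |y_i − β t_i| is a weighted median of the ratios y_i/t_i with weights |t_i|); that implementation detail is ours, not necessarily the authors'.

```julia
using LinearAlgebra, Statistics

# Spectral initialization (Algorithm 4).  Rows of L, R are the measurement vectors.
function initialize(L, R, y)
    Isel = findall(abs.(y) .<= median(abs.(y)))          # keep measurements with small |y_i|
    Linit = L[Isel, :]' * L[Isel, :]                     # Σ l_i l_i' over selected indices
    Rinit = R[Isel, :]' * R[Isel, :]                     # Σ r_i r_i'
    ŵ = eigen(Symmetric(Linit)).vectors[:, 1]            # minimal eigenvector
    x̂ = eigen(Symmetric(Rinit)).vectors[:, 1]
    t = (L * ŵ) .* (R * x̂)                               # t_i = <l_i, ŵ><r_i, x̂>
    M̂ = weighted_median(y ./ t, abs.(t))                 # argmin_β (1/m) Σ |y_i - β t_i|
    return sign(M̂) * sqrt(abs(M̂)) * ŵ, sqrt(abs(M̂)) * x̂
end

# Weighted median: a minimizer of Σ w_i |z_i - β|.
function weighted_median(z, w)
    p = sortperm(z)
    c = cumsum(w[p])
    return z[p[findfirst(c .>= c[end] / 2)]]
end
```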

Proposition 5.2 thus shows that tighter estimates on the norm M̄ result from better directional estimates in the first stage of Algorithm 4. In light of Proposition 5.2, we next estimate the probability of the event {δ ≤ 1/2}, which in particular implies that with high probability sign(M̂) = argmin_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F.

Proposition 5.3 (Sign estimate). Under either Model N1 or N2, there exist numerical constants c0, c1, c2, c3 > 0 such that, if p_fail ≤ c0² and m ≥ c3(d1 + d2), then the estimate holds:

    P( δ > 1/2 ) ≤ c1 exp( −c2 m ).

Proof. Using Theorem 4.6 and Proposition 5.1, we deduce that for any t ∈ [0, 1], with probability at least 1 − c1exp(−c2 t² m), we have

    δ ≤ C( √(max{d1, d2}/m) + t )   under Model N1, and
    δ ≤ C( √p_fail + √(max{d1, d2}/m) + t )   under Model N2.

Thus, under Model N1 it suffices to set t = C'√(max{d1, d2}/m); the probability of the event {δ ≤ 1/2} is then at least 1 − c1exp(−c2 C'² max{d1, d2}). On the other hand, under Model N2 it suffices to assume that C√p_fail ≤ 1/4, and then we can set t = C'( √p_fail + √(max{d1, d2}/m) ); the probability of the event {δ ≤ 1/2} is then at least 1 − c1exp(−c2 C'² (p_fail·m + max{d1, d2})). Finally, using the bound max{d1, d2} ≥ (d1 + d2)/2 ≥ m/(2c3) yields the result. ∎

Step 3: Final estimate. Putting the directional and norm estimates together, we arrive at the following theorem.

Theorem 5.4. There exist numerical constants c0, c1, c2, c3, C > 0 such that, if p_fail ≤ c0³ and m ≥ c1(d1 + d2), then for all t ∈ [0, 1], with probability at least 1 − c2exp(−c3 t² m), we have

    ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F / ‖w̄x̄ᵀ‖_F ≤ C( √(max{d1, d2}/m) + t )   under Model N1, and
    ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F / ‖w̄x̄ᵀ‖_F ≤ C( √p_fail + √(max{d1, d2}/m) + t )   under Model N2.

² In the case of Model N1, one can set c0 = 1/10.
³ In the case of Model N1, one can set c0 = 1/10.

Proof. Suppose that we are in the events guaranteed by Propositions 5.1, 5.2, and 5.3. Then, noting that w_0 = sign(M̂)|M̂|^{1/2}ŵ and x_0 = |M̂|^{1/2}x̂,

we find that

    ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F = ‖ sign(M̂)|M̂| ŵx̂ᵀ − M̄ w̃x̃ᵀ ‖_F
      ≤ |M̂| · ‖ ŵx̂ᵀ − sign(M̂) w̃x̃ᵀ ‖_F + | |M̂| − M̄ |
      ≤ 2M̄ · min_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F + M̄δ
      = M̄ ( 2 + (c5 + 1)/(c6(1 − 2p_fail)) ) · min_{s∈{±1}} ‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F,

where c5 and c6 are defined in Theorem 4.6, and we used sign(M̂) = argmin_{s∈{±1}}‖ŵx̂ᵀ − s·w̃x̃ᵀ‖_F and |M̂| ≤ (1 + δ)M̄ ≤ 2M̄. Appealing to Proposition 5.1, the result follows. ∎

Combining Corollary 4.8 and Theorem 5.4, we arrive at the following guarantee for the two-stage procedure.

Corollary 5.5 (Efficiency estimates). Suppose either of the models N1 and N2 holds.³ Let (w_0, x_0) be the output of the initialization Algorithm 4. Set M̂ = ‖w_0x_0ᵀ‖_F and consider the optimization problem

    min_{‖w‖ ≤ √(2M̂), ‖x‖ ≤ √(2M̂)}  g(w, x) = (1/m)‖A(wxᵀ) − y‖_1.   (5.2)

Set ν := 2M̂/M̄ and notice that the feasible region of (5.2) coincides with S_ν. Then there exist constants c0, c1, c2, c3, c5 > 0 and c4 ∈ (0, 1) such that, as long as m ≥ c3(d1 + d2) and p_fail ≤ c0, the following properties hold with probability at least 1 − c1exp(−c2 m).

1. (subgradient) Both Algorithms 1 and 2 (the latter with appropriate λ, q), initialized at (w_0, x_0), produce iterates that converge linearly to S*_ν, that is,

    dist²( (w_k, x_k), S*_ν ) ≤ c4 (1 − c4)^k ‖w̄x̄ᵀ‖_F   for all k ≥ 0.

2. (prox-linear) Algorithm 3, initialized at (w_0, x_0) with appropriate β > 0, converges quadratically:

    dist( (w_k, x_k), S*_ν ) ≤ c5 · 2^{−2^k} · √(‖w̄x̄ᵀ‖_F)   for all k ≥ 0.

³ In the case of Model N1, one can set c0 = 1/10.

Proof. We provide the proof under Model N1; the proof under Model N2 is completely analogous. Combining Proposition 5.2, Proposition 5.3, and Theorem 5.4, we deduce that there exist constants c0, c1, c2, c3, C such that, as long as m ≥ c3(d1 + d2) and p_fail ≤ c0, then for any t ∈ [0, 1], with probability at least 1 − c1exp(−c2 t² m), we have

    | M̂/M̄ − 1 | ≤ δ ≤ 1/2,   (5.3)

and

    ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F ≤ M̄ C( √(max{d1, d2}/m) + t ).

In particular, notice from (5.3) that ν ≤ 3, and therefore the feasible region S_ν contains an optimal solution of the original problem (1.3). Using Proposition 4.2, we have

    ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F ≥ ( √M̄/(√2(ν + 1)) ) · dist( (w_0, x_0), S*_ν ).

Combining the estimates, we conclude

    dist( (w_0, x_0), S*_ν ) ≤ ( √2(ν + 1)/√M̄ ) ‖w_0x_0ᵀ − w̄x̄ᵀ‖_F ≤ √2(ν + 1)√M̄·C( √(max{d1, d2}/m) + t ).

Thus, to ensure the relative-error assumption (4.21), it suffices to ensure the inequality

    √2(ν + 1)·C( √(max{d1, d2}/m) + t ) ≤ c6(1 − 2p_fail)/( 4c5(ν + 1) ),

where c5, c6 are the constants from Corollary 4.8. Using the bound ν ≤ 3, it suffices to set

    t = c6(1 − 2p_fail)/( 64 √2 c5 C ) − √(max{d1, d2}/m),

provided m is large enough that this quantity is positive. Thus the probability of the desired event becomes at least 1 − c1exp( −c2( c4 √m − √(max{d1, d2}) )² ) for some constant c4 > 0. Finally, using the bound max{d1, d2} ≤ d1 + d2 ≤ m/c3 and applying Corollary 4.8 completes the proof. ∎

6 Numerical Experiments

In this section, we demonstrate the performance and stability of the prox-linear and subgradient methods, and of the initialization procedure, when applied to real and artificial instances of Problem (1.3). All experiments were performed using the Julia [7] programming language.

Subgradient method implementation. Implementation of the subgradient method for Problem (1.3) is simple and has low per-iteration cost. Indeed, one may simply choose the subgradient

    (1/m) Σ_{i=1}^m sign( ⟨l_i, w⟩⟨x, r_i⟩ − y_i ) · ( ⟨x, r_i⟩ l_i , ⟨l_i, w⟩ r_i ) ∈ ∂f(w, x),

where sign(t) denotes the sign of t, with the convention sign(0) = 0. The cost of computing this subgradient is on the order of four matrix-vector multiplications. When applying Algorithm 2, choosing the correct parameters is important, since its convergence is especially sensitive to the value of the step-size decay q; the experiment described in Section 6.1.2, which aided us empirically in choosing q for the rest of the experiments, demonstrates this phenomenon. Setting λ = 1.0 seemed to suffice for all the experiments depicted hereafter.
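The displayed subgradient can be computed with four matrix–vector products, as the following sketch (our own Julia rendering, not the authors' code) shows.

```julia
using LinearAlgebra

# A subgradient of f(w, x) = (1/m) Σ_i |<l_i, w><r_i, x> - y_i|, computed with
# four matrix-vector products: L*w, R*x, L'*(...), R'*(...).
function subgradient(L, R, y, w, x)
    Lw, Rx = L * w, R * x
    s = sign.(Lw .* Rx .- y) ./ length(y)          # sign residuals, scaled by 1/m
    return L' * (s .* Rx), R' * (s .* Lw)          # partial subgradients in w and x
end
```

A Polyak or geometrically decaying step then updates (w, x) exactly as in Algorithms 1 and 2.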

Prox-linear method implementation. Recall that the convex models used by the prox-linear method take the form

    f_{(w_k,x_k)}(w, x) = (1/m)‖ A( w_kx_kᵀ + w_k(x − x_k)ᵀ + (w − w_k)x_kᵀ ) − y ‖_1.   (6.1)

Equivalently, one may rewrite this expression as a Least Absolute Deviation (LAD) objective:

    f_{(w_k,x_k)}(w, x) = (1/m) Σ_{i=1}^m | [ ⟨x_k, r_i⟩ l_iᵀ , ⟨l_i, w_k⟩ r_iᵀ ] [w; x] − ( y_i + ⟨l_i, w_k⟩⟨x_k, r_i⟩ ) |
      =: (1/m)‖ Ãz − ỹ ‖_1,

with z := (w; x), row i of Ã given by Ã_i := [ ⟨x_k, r_i⟩ l_iᵀ , ⟨l_i, w_k⟩ r_iᵀ ], and ỹ_i := y_i + ⟨l_i, w_k⟩⟨x_k, r_i⟩. Thus, each iteration of Algorithm 3 requires solving a strongly convex optimization problem:

    z_{k+1} = argmin_{z ∈ S_ν} { (1/m)‖Ãz − ỹ‖_1 + (1/(2α))‖z − z_k‖² }.

Motivated by the work of [5] on robust phase retrieval, we solve this subproblem with the graph-splitting variant of the Alternating Direction Method of Multipliers, as described in [4]. This iterative method applies to problems of the form

    min_{z ∈ S_ν, t}  ‖t − ỹ‖_1 + (1/(2α))‖z − z_k‖²   subject to   t = Ãz,

yielding the following subproblems, which are repeatedly executed:

    z⁺ ← argmin_{z ∈ S_ν} { (1/(2α))‖z − z_k‖² + (ρ/2)‖z − ẑ + λ‖² },
    t⁺ ← argmin_t { ‖t − ỹ‖_1 + (ρ/2)‖t − t̂ + ν‖² },
    (ẑ⁺, t̂⁺) ← projection of (z⁺ + λ, t⁺ + ν) onto the graph { (z, t) : t = Ãz },
              obtained by solving the linear system [ I_{d1+d2}  Ãᵀ ; Ã  −I_m ][ ẑ⁺ ; s ] = [ z⁺ + λ ; t⁺ + ν ] and setting t̂⁺ = Ãẑ⁺,
    λ ← λ + z⁺ − ẑ⁺,   ν ← ν + t⁺ − t̂⁺,

where λ ∈ R^{d1+d2} and ν ∈ R^m are dual multipliers and ρ > 0 is a control parameter. Each of the above steps may be computed analytically. We found in our experiments that choosing α = 1 and ρ ≈ 1 yielded fast convergence. Our stopping criterion for this subproblem is considered met when the primal residual satisfies ‖(z⁺, t⁺) − (ẑ⁺, t̂⁺)‖ ≤ ε_k √(d1 + d2 + m)·max{1, ‖(z, t)‖} and the dual residual satisfies ‖(λ⁺, ν⁺) − (λ, ν)‖ ≤ ε_k √(d1 + d2 + m)·max{1, ‖(λ, ν)‖}, with ε_k decreasing geometrically in k.

6.1 Artificial Data

We first illustrate the performance of the prox-linear and subgradient methods under noise model N1 with i.i.d. standard Gaussian noise ξ_i. Both methods are initialized with Algorithm 4. We experimented with Gaussian noise of varying variances, and observed that

higher noise levels did not adversely affect the performance of our algorithm. This is not surprising, since the theory suggests that both the objective and the initialization procedure are robust to gross outliers. We analyze the performance with problem dimensions d1 ∈ {400, 1000} and d2 = 500, and with number of measurements m = c(d1 + d2), with c varying from 1 to 8. In Figures 2 and 3, we depict how the quantity ‖w_kx_kᵀ − w̄x̄ᵀ‖_F/‖w̄x̄ᵀ‖_F changes per iteration for the prox-linear and subgradient methods. We conducted tests in both the moderate-corruption (p_fail = 0.25) and high-corruption (p_fail = 0.45) regimes. For both methods, under moderate corruption (p_fail = 0.25) we see that exact recovery is possible as long as c ≥ 5. Likewise, even in the high-corruption regime (p_fail = 0.45), exact recovery is still possible as long as c ≥ 8. We also illustrate the performance of Algorithm 1 when there is no corruption at all in Figure 2; it converges an order of magnitude faster than Algorithm 2. In terms of algorithm performance, we see that the prox-linear method takes few outer iterations, approximately 5, to achieve very high accuracy, while the subgradient method requires a few hundred iterations. This behavior is expected, as the prox-linear method converges quadratically and the subgradient method converges linearly. Although the number of iterations of the prox-linear method is small, we demonstrate in the sequel that its total run-time, including the cost of solving subproblems, can be higher than that of the subgradient method.

6.1.1 Number of matrix-vector multiplications

Each iteration of the prox-linear method requires the numerical resolution of a convex optimization problem. We solve this subproblem using the graph-splitting ADMM algorithm, as described in [4], the cost of which is dominated by the number of matrix-vector products required to reach the target accuracy. The number of inner iterations of the prox-linear method, and thus the number of matrix-vector products, is not determined a priori. The cost of each iteration of the subgradient method, on the other hand, is on the order of 4 matrix-vector products. In the subsequent plots, we solve a sequence of synthetic problems for d1 = d2 = 100 and keep track of the total number of matrix-vector multiplications performed. We run both methods until we obtain ‖wxᵀ − w̄x̄ᵀ‖_F ≤ 10^{−5}‖w̄x̄ᵀ‖_F, and we keep track of the same statistics for the subgradient method. We present the results in Figure 4. We observe that the number of matrix-vector multiplications required by the prox-linear method can be much greater than the number required by the subgradient method. Additionally, they seem to be much more sensitive to the ratio m/(d1 + d2).

6.1.2 Choice of step size decay

Due to the sensitivity of Algorithm 2 to the step-size decay q, we experiment with different choices of q in order to find an empirical range of values which yield acceptable performance. To that end, we generate synthetic problems of dimension d1 = d2 = 100 and choose q ∈ {0.90, 0.905, ..., 0.995}, and record the average error of the final iterate after 1000 iterations of the subgradient method for different choices of m = c(d1 + d2). The average is taken

Figure 2: Dimensions are (d1, d2) = (400, 500) in the first column and (d1, d2) = (1000, 500) in the second column. We plot the error ‖w_kx_kᵀ − w̄x̄ᵀ‖_F/‖w̄x̄ᵀ‖_F vs. iteration count, for c = 2, 3, 4, 5, 6, 8. Top row: Algorithm 2 with p_fail = 0.25. Second row: Algorithm 2 with p_fail = 0.45. Third row: Algorithm 1 with p_fail = 0.

Figure 3: Dimensions are (d1, d2) = (400, 500) in the first column and (d1, d2) = (1000, 500) in the second column. We plot the error ‖w_kx_kᵀ − w̄x̄ᵀ‖_F/‖w̄x̄ᵀ‖_F vs. iteration count for an application of Algorithm 3 in the two settings p_fail = 0.25 (top row) and p_fail = 0.45 (bottom row).

over 50 test runs with λ = 1.0. We test both noisy and noiseless instances to see whether corruption of entries significantly changes the effective range of q. Results are shown in Figure 5.

6.1.3 Robustness to noise

We now empirically validate the robustness of the prox-linear and subgradient algorithms to noise. In a setup familiar from other recent works [4, 5], we generate phase transition plots, where the x-axis varies with the level of corruption p_fail, the y-axis varies as the ratio m/(d1 + d2) changes, and the shade of each pixel represents the percentage of problem instances solved successfully. For every configuration (p_fail, m/(d1 + d2)), we run 100 experiments.

Noise model N1 — independent noise. Initially, we experiment with Gaussian random matrices and several choices of the dimensions d1, d2; the results can be found in Figure 6.

Figure 4: Matrix-vector multiplications needed to reach a relative accuracy of 10^{-5}, as a function of p_fail, for the prox-linear and subgradient methods with c = 4 and c = 8.

Figure 5: Final normalized error ||w_k x_k^T - w̄ x̄^T||_F / ||w̄ x̄^T||_F for Algorithm 2 with different choices of q, in the settings p_fail = 0 (left) and p_fail = 0.25 (right); curves are shown for c = 3, 6, 8.

The phase transition plots are similar for both dimensionality choices, revealing that in the moderate independent-noise regime (p_fail <= 25%), setting m >= 4(d_1 + d_2) suffices. On the other hand, for exact recovery in the high-noise regime (p_fail ≈ 45%), one may need to choose m as large as 8(d_1 + d_2). We repeat the same experiment in the setting where the matrix L is deterministic with orthogonal columns of Euclidean norm √m, and R is a Gaussian random matrix. Specifically, we take L to be a partial Hadamard matrix, formed from the first d_1 columns of an m × m Hadamard matrix. In that case, the operator v ↦ Lv can be computed efficiently in O(m log m) time by zero-padding v to length m and computing its fast Walsh–Hadamard transform (FWHT). Additionally, the products w ↦ L^T w can also be computed in O(m log m) time by taking the FWHT of w and keeping the first d_1 coordinates of the result.
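The sketch below spells out this fast application of the partial Hadamard operator and its adjoint. It assumes m is a power of two and uses the unnormalized ±1 Hadamard matrix, consistent with columns of norm √m; the pure-Python FWHT is included only to keep the example self-contained.

```python
import numpy as np

def fwht(v):
    # Fast Walsh-Hadamard transform (unnormalized), O(m log m) butterflies.
    a = np.array(v, dtype=float)
    h, n = 1, len(a)
    while h < n:
        for i in range(0, n, 2 * h):
            x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y
        h *= 2
    return a

def partial_hadamard(v, m):
    # v |-> L v, with L given by the first d1 columns of the m x m Hadamard
    # matrix: zero-pad v to length m and apply the FWHT.
    padded = np.zeros(m)
    padded[:len(v)] = v
    return fwht(padded)

def partial_hadamard_adjoint(u, d1):
    # u |-> L^T u: apply the FWHT and keep the first d1 coordinates.
    return fwht(u)[:d1]
```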

The phase transition plots can be found in Figure 7. A comparison with the phase transition plot in Figure 6 shows a different trend. In this case, exact recovery does not occur when the corruption level rises above p_fail = 0% and m/(d_1 + d_2) is in the range {1, ..., 8}.

Noise model N2 (arbitrary noise). We now repeat the previous experiments, but switch to noise model N2. In particular, we now adversarially hide a different signal in a subset of the measurements, i.e., we set

    y_i = ⟨l_i, w̄⟩⟨x̄, r_i⟩ for i in I_in,   and   y_i = ⟨l_i, w_imp⟩⟨x_imp, r_i⟩ for i in I_out,

where (w_imp, x_imp) in R^{d_1} × R^{d_2} is an arbitrary pair of signals. Intuitively, this is a more challenging noise model than N1, since it allows an adversary to try to trick the algorithm into recovering an entirely different signal. Our experiments confirm that this regime is indeed more difficult for the proposed algorithms, which is why we only depict the range p_fail in [0, 0.38] in Figures 8 and 9 below.

6.2 Performance of initialization on real data

We now demonstrate the proposed initialization strategy on real-world images. Specifically, we set w̄ and x̄ to be two random digits from the training subset of the MNIST dataset [30]. In this experiment, the measurement matrices L, R have i.i.d. Gaussian entries, and a fraction p_fail = 0.45 of the measurements is corrupted. We apply the initialization method and plot the resulting initial estimates in Figure 10. Evidently, the initial estimates of the images are visually similar to the true digits, up to sign; in other examples, the foreground appears to be switched with the background, which corresponds to the natural sign ambiguity. Finally, we plot the normalized error for the two recovery methods (subgradient and prox-linear) in Figure 11.

6.3 Experiments on Big Data

We apply the subgradient method to recover large-scale real color images W̄, X̄ in R^{n × n × 3}. In this setting p_fail = 0, so Algorithm 1 is applicable with min_X f = 0. We flatten the arrays W̄, X̄ into 3n²-dimensional vectors w̄, x̄. In contrast to the previous experiments, our sensing matrices are of the following form:

    L = [H S_1; ...; H S_k],   R = [H S̃_1; ...; H S̃_k],

where H in {-1, +1}^{d × d}/√d is the d × d symmetric normalized Hadamard matrix and S_i = diag(ξ_1, ..., ξ_d), with the ξ_j i.i.d. uniform on {-1, +1}, is a diagonal random sign matrix; the same holds for the S̃_i. Notice that we can perform the operations w ↦ Lw and x ↦ Rx in O(kd log d) time: we first form the elementwise product between the signal and the random signs, and then take its Hadamard transform, which can be performed in O(d log d) flops.
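A small sketch of this structured operator and its adjoint is given below. For clarity it materializes the Hadamard matrix densely via SciPy (so d must be a power of two and small in the example); in the actual experiments each block would instead be applied with a fast Walsh–Hadamard transform as described above. The 1/√d factor follows the definition of H in the text, while any additional scaling of L and R is not specified here and is omitted.

```python
import numpy as np
from scipy.linalg import hadamard

def sign_hadamard_operator(d, k, rng):
    # L = [H S_1; ...; H S_k], with H the d x d Hadamard matrix scaled by
    # 1/sqrt(d) and S_i = diag(xi_1, ..., xi_d) a random +/-1 sign matrix.
    # R is built the same way from independent sign matrices.
    H = hadamard(d) / np.sqrt(d)
    signs = rng.choice([-1.0, 1.0], size=(k, d))

    def apply(w):            # w |-> L w  (O(k d log d) with an FWHT per block)
        return np.concatenate([H @ (s * w) for s in signs])

    def apply_adjoint(p):    # p |-> L^T p, needed by the subgradient method
        return sum(s * (H @ b) for s, b in zip(signs, p.reshape(k, d)))

    return apply, apply_adjoint
```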

Figure 6: Phase transition for M1, N1 (x-axis: corruption level p_fail; y-axis: m/(d_1 + d_2); one panel per choice of d_1 = d_2).

Figure 7: Phase transition for M2, N1.

Figure 8: Phase transition for M1, N2.

Figure 9: Phase transition for M2, N2.

We can efficiently compute the adjoint operations p ↦ L^T p and q ↦ R^T q, required for the subgradient method, in a similar fashion. We recover each channel separately, which means we essentially have to solve three similar minimization problems. Notice that this results in dimensionality d_1 = d_2 = n² and m = kn² for each channel. We observed that our initialization procedure (Algorithm 4) is extremely accurate in this setting. Therefore, to better illustrate the performance of the local search algorithms, we perform the following heuristic initialization: for each channel, we first sample ŵ, x̂ uniformly from the unit sphere, rescale them by the true magnitude of the signal, and run Algorithm 1 for one step to obtain our initial estimates (w_0, x_0).
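The heuristic initialization and the single Polyak step it uses can be sketched as follows. The sketch assumes the ℓ_1 objective from the earlier sections, reads "true magnitude" as the Euclidean norms of the underlying signals, and works per channel; the names mag_w and mag_x are illustrative.

```python
import numpy as np

def polyak_step(L, R, y, w, x):
    # One Polyak subgradient step, valid when min f = 0 (no corruption):
    #   (w, x) <- (w, x) - f(w, x) / ||g||^2 * g,  with g a subgradient of f.
    Lw, Rx = L @ w, R @ x
    res = Lw * Rx - y
    fval = np.mean(np.abs(res))
    s = np.sign(res) / len(y)
    gw, gx = L.T @ (s * Rx), R.T @ (s * Lw)
    gnorm2 = np.linalg.norm(gw) ** 2 + np.linalg.norm(gx) ** 2
    # assumes a nonzero subgradient at (w, x)
    return w - (fval / gnorm2) * gw, x - (fval / gnorm2) * gx

def heuristic_init(L, R, y, d1, d2, mag_w, mag_x, rng):
    # Sample unit-norm directions, rescale by the true signal magnitudes,
    # and take a single Polyak step, per the heuristic described above.
    w = rng.standard_normal(d1); w *= mag_w / np.linalg.norm(w)
    x = rng.standard_normal(d2); x *= mag_x / np.linalg.norm(x)
    return polyak_step(L, R, y, w, x)
```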

Figure 10: Digits 5, 6 (top) and 9, 6 (bottom). Original images are shown on the left, estimates on the right. Parameters: p_fail = 0.45.

Figure 11: Relative error versus iteration count on the MNIST digits of Figure 10, for the subgradient method (left) and the prox-linear method (right).

An example where we recover a pair of 512 × 512 color images using the Polyak subgradient method (Algorithm 1) is shown below; Figure 12 shows the progression of the estimates w_k up until the 90th iteration, while Figure 13 depicts the normalized error at each iteration for the different channels of the images.

References

[1] Alireza Aghasi, Ali Ahmed, and Paul Hand. BranchHull: Convex bilinear inversion from the entrywise product of signals with known signs. arXiv preprint, 2017.

Figure 12: Progression of the iterates w_k in the color-image recovery experiment (nine snapshots up to the 90th iteration).

[2] Alireza Aghasi, Ali Ahmed, Paul Hand, and Babhru Joshi. A convex program for bilinear inversion of sparse vectors. arXiv preprint, 2018.
[3] Ali Ahmed, Alireza Aghasi, and Paul Hand. Blind deconvolutional phase retrieval via convex programming. arXiv preprint, 2018.
[4] Ali Ahmed, Benjamin Recht, and Justin Romberg. Blind deconvolution using convex programming. IEEE Transactions on Information Theory, 60(3):1711–1732, 2014.

Figure 13: Normalized error per iteration for the different channels (red, blue, green) in the image recovery.

[5] Paolo Albano and Piermarco Cannarsa. Singularities of semiconcave functions in Banach spaces. In Stochastic analysis, control, optimization and applications, Systems Control Found. Appl., pages 171–190. Birkhäuser Boston, Boston, MA, 1999.
[6] Yu Bai, Qijia Jiang, and Ju Sun. Subgradient descent learns orthogonal dictionaries. arXiv preprint arXiv:1810.10702, 2018.
[7] Jeff Bezanson, Alan Edelman, Stefan Karpinski, and Viral B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
[8] J.M. Borwein and Q.J. Zhu. Techniques of Variational Analysis. Springer-Verlag, New York, 2005.
[9] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[10] J.V. Burke. Descent methods for composite nondifferentiable optimization problems. Math. Programming, 33(3):260–279, 1985.
[11] J.V. Burke and M.C. Ferris. A Gauss–Newton method for convex composite optimization. Math. Programming, 71(2, Ser. A):179–194, 1995.
[12] E.J. Candès, X. Li, and M. Soltanolkotabi. Phase retrieval via Wirtinger flow: theory and algorithms. IEEE Trans. Inform. Theory, 61(4):1985–2007, 2015.
[13] E.J. Candès and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009.
[14] Emmanuel J. Candès and Yaniv Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Transactions on Information Theory, 57(4):2342–2359, 2011.

[15] Yuxin Chen, Yuejie Chi, and Andrea J. Goldsmith. Exact and stable covariance estimation from quadratic sampling via convex programming. IEEE Trans. Inform. Theory, 61(7), 2015.
[16] Yuejie Chi, Yue M. Lu, and Yuxin Chen. Nonconvex optimization meets low-rank matrix factorization: An overview. arXiv preprint, 2018.
[17] Sunav Choudhary and Urbashi Mitra. Sparse blind deconvolution: What cannot be done. In 2014 IEEE International Symposium on Information Theory (ISIT). IEEE, 2014.
[18] Mark A. Davenport and Justin Romberg. An overview of low-rank matrix recovery from incomplete observations. arXiv preprint arXiv:1601.06422, 2016.
[19] Chandler Davis and William Morton Kahan. The rotation of eigenvectors by a perturbation. III. SIAM Journal on Numerical Analysis, 7(1):1–46, 1970.
[20] Damek Davis, Dmitriy Drusvyatskiy, Kellie J. MacPhee, and Courtney Paquette. Subgradient methods for sharp weakly convex functions. arXiv preprint arXiv:1803.02461, 2018.
[21] Damek Davis, Dmitriy Drusvyatskiy, and Courtney Paquette. The nonsmooth landscape of phase retrieval. arXiv preprint arXiv:1711.03247, 2017.
[22] A.L. Dontchev and R.T. Rockafellar. Implicit Functions and Solution Mappings. Springer Monographs in Mathematics, Springer-Verlag, 2009.
[23] D. Drusvyatskiy and A.S. Lewis. Error bounds, quadratic growth, and linear convergence of proximal methods. To appear in Math. Oper. Res., arXiv:1602.06661, 2016.
[24] D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps. Preprint arXiv:1605.00125, 2016.
[25] J.C. Duchi and F. Ruan. Solving (most of) a set of quadratic equalities: Composite optimization for robust phase retrieval. Preprint, 2017.
[26] J.L. Goffin. On convergence rates of subgradient optimization methods. Math. Programming, 13(3):329–347, 1977.
[27] Wen Huang and Paul Hand. Blind deconvolution by a steepest descent algorithm on a quotient manifold. arXiv preprint, 2018.
[28] Alexander D. Ioffe. Variational Analysis of Regular Mappings: Theory and Applications. Springer Monographs in Mathematics. Springer, Cham, 2017.
[29] Michael Kech and Felix Krahmer. Optimal injectivity conditions for bilinear inverse problems with applications to identifiability of deconvolution problems. SIAM Journal on Applied Algebra and Geometry, 1(1):20–37, 2017.

[30] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, Nov. 1998.
[31] Adrian S. Lewis and Jong-Shi Pang. Error bounds for convex inequality systems. In Generalized convexity, generalized monotonicity: recent results (Luminy, 1996), volume 27 of Nonconvex Optim. Appl., pages 75–110. Kluwer Acad. Publ., Dordrecht, 1998.
[32] A.S. Lewis and S.J. Wright. A proximal method for composite minimization. Math. Program., pages 1–46, 2015.
[33] Xiaodong Li, Shuyang Ling, Thomas Strohmer, and Ke Wei. Rapid, robust, and reliable blind deconvolution via nonconvex optimization. arXiv preprint, 2016.
[34] Yanjun Li, Kiryung Lee, and Yoram Bresler. Identifiability in blind deconvolution with subspace or sparsity constraints. IEEE Transactions on Information Theory, 62(7), 2016.
[35] Yuanxin Li, Cong Ma, Yuxin Chen, and Yuejie Chi. Nonconvex matrix factorization from rank-one measurements. arXiv preprint arXiv:1802.06286, 2018.
[36] Shuyang Ling and Thomas Strohmer. Self-calibration and biconvex compressive sensing. Inverse Problems, 31(11):115002, 2015.
[37] Cong Ma, Kaizheng Wang, Yuejie Chi, and Yuxin Chen. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval, matrix completion and blind deconvolution. arXiv preprint arXiv:1711.10467, 2017.
[38] B.S. Mordukhovich. Variational Analysis and Generalized Differentiation I: Basic Theory, volume 330 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 2006.
[39] Sahand N. Negahban, Pradeep Ravikumar, Martin J. Wainwright, and Bin Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statist. Sci., 27(4), 2012.
[40] E.A. Nurminskii. The quasigradient method for the solving of the nonlinear programming problems. Cybernetics, 9(1):145–150, Jan. 1973.
[41] Neal Parikh and Stephen Boyd. Block splitting for distributed optimization. Mathematical Programming Computation, 6(1):77–102, 2014.
[42] J.-P. Penot. Calculus Without Derivatives, volume 266 of Graduate Texts in Mathematics. Springer, New York, 2013.
[43] R.A. Poliquin and R.T. Rockafellar. Prox-regular functions in variational analysis. Trans. Amer. Math. Soc., 348(5):1805–1838, 1996.
[44] R.T. Rockafellar. Favorable classes of Lipschitz-continuous functions in subgradient optimization. In Progress in Nondifferentiable Optimization, volume 8 of IIASA Collaborative Proc. Ser. CP-82, pages 125–143. Int. Inst. Appl. Sys. Anal., Laxenburg, 1982.

[45] R.T. Rockafellar and R.J.-B. Wets. Variational Analysis. Grundlehren der mathematischen Wissenschaften, Vol. 317, Springer, Berlin, 1998.
[46] S. Rolewicz. On paraconvex multifunctions. In Third Symposium on Operations Research (Univ. Mannheim, Mannheim, 1978), Section I, volume 31 of Operations Res. Verfahren. Hain, Königstein/Ts., 1979.
[47] Y. Shechtman, Y.C. Eldar, O. Cohen, H.N. Chapman, J. Miao, and M. Segev. Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine, 32(3):87–109, May 2015.
[48] Ruoyu Sun and Zhi-Quan Luo. Guaranteed matrix completion via non-convex factorization. IEEE Trans. Inform. Theory, 62(11):6535–6579, 2016.
[49] Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Benjamin Recht. Low-rank solutions of linear matrix equations via Procrustes flow. In Proceedings of the 33rd International Conference on Machine Learning, ICML'16, Volume 48. JMLR.org, 2016.
[50] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing, pages 210–268. Cambridge Univ. Press, Cambridge, 2012.
[51] Roman Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, 2018.
[52] G. Wang, G.B. Giannakis, and Y.C. Eldar. Solving systems of random quadratic equations via a truncated amplitude flow. arXiv preprint, 2016.

Appendix A: Sharpness

A.1 Proof of Proposition 4

Without loss of generality, we assume that M = 1 by rescaling, and that w̄ = e_1 in R^{d_1} and x̄ = e_1 in R^{d_2} by rotation invariance. Recall that the distance to S_ν may be written succinctly as

    dist((w, x), S_ν) = inf_{1/ν <= |α| <= ν} √( ||w - α w̄||² + ||x - x̄/α||² ).

Before we establish the general result, we first consider the simpler case d_1 = d_2 = 1.

Claim. The following bound holds:

    |wx - 1| >= (1/√2) · inf_{1/ν <= |α| <= ν} √( (w - α)² + (x - 1/α)² ),   for all w, x in [-ν, ν].

Proof of Claim. Consider a pair (w, x) in R² with |w|, |x| <= ν. It is easy to see that, without loss of generality, we may assume w >= |x|. We then separate the proof into two cases, which are graphically depicted in Figure 14.

Figure 14: The regions K_1 and K_2 correspond to cases 1 and 2 of the proof of the Claim, respectively.

Case 1: w - x <= ν - 1/ν. In this case, we will traverse from (w, x) to the set S_ν along the direction (1, 1); see Figure 14. First, consider the equation in the variable t

    wx - (w + x)(t/2) + t²/4 = 1,

and note the equality

    wx - (w + x)(t/2) + t²/4 = (w - t/2)(x - t/2).

Using the quadratic formula to solve for t, we get

    t = (w + x) - √((w + x)² - 4(wx - 1)).

Note that the discriminant is nonnegative, since

    (w + x)² - 4(wx - 1) = w² + x² - 2wx + 4 = (w - x)² + 4.

Set α = w - t/2 and note the identity 1/α = x - t/2. Therefore,

    wx - 1 = (1/α)(w - α) + α(x - 1/α) + (w - α)(x - 1/α).

Observe now the equality

    wx - 1 = (x - t/2)(t/2) + (w - t/2)(t/2) + t²/4 = (t/2)(w + x - t/2) = (t/4)( (w + x) + √((w + x)² - 4(wx - 1)) ),

while (w - α)² + (x - 1/α)² = (t/2)² + (t/2)² = t²/2. Since w + x >= 0 (recall w >= |x|) and the discriminant is at least 4, the last two displays yield |wx - 1| >= |t|/2 = (1/√2) √((w - α)² + (x - 1/α)²). Hence it remains to bound α. First we note that α > 0 and 1/α > 0, since their product equals 1 and

    α + 1/α = (w - t/2) + (x - t/2) = w + x - t = √((w + x)² - 4(wx - 1)) >= 0.

In addition, since w >= x, we have α = w - t/2 >= x - t/2 = 1/α. Since α and 1/α are positive, we must therefore have α >= 1 >= 1/ν. Thus, it remains to verify the bound α <= ν. To that end, notice that

    α - 1/α = (w - t/2) - (x - t/2) = w - x <= ν - 1/ν.

Since the function t ↦ t - 1/t is increasing on (0, ∞), we deduce α <= ν.
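As a quick numerical sanity check of the Claim above (with the 1/√2 constant as transcribed here), one can compare |wx - 1| against a grid approximation of the infimum; the script below does this for random pairs in [-ν, ν]² and is purely illustrative.

```python
import numpy as np

# Monte Carlo check of the Claim: |w*x - 1| >= (1/sqrt(2)) * inf over
# 1/nu <= |alpha| <= nu of sqrt((w - alpha)^2 + (x - 1/alpha)^2).
rng = np.random.default_rng(0)
nu = 3.0
alphas = np.concatenate([np.linspace(1 / nu, nu, 2000),
                         np.linspace(-nu, -1 / nu, 2000)])
worst = np.inf
for _ in range(20000):
    w, x = rng.uniform(-nu, nu, size=2)
    dist = np.sqrt((w - alphas) ** 2 + (x - 1 / alphas) ** 2).min()
    if dist > 1e-9:
        worst = min(worst, abs(w * x - 1) / dist)
print(worst)  # expected to stay at or above roughly 1/sqrt(2) ~= 0.707
```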
