arxiv: v1 [math.na] 10 Oct 2016

GREEDY GAUSS-NEWTON ALGORITHM FOR FINDING SPARSE SOLUTIONS TO NONLINEAR UNDERDETERMINED SYSTEMS OF EQUATIONS

MÅRTEN GULLIKSSON AND ANNA OLEYNIK

Abstract. We consider the problem of finding sparse solutions to an underdetermined system of nonlinear equations. The methods are based on a Gauss-Newton approach with line search where the search direction is found by solving a linearized problem using only a subset of the columns in the Jacobian. The choice of columns in the Jacobian is made through a greedy approach looking at either maximum descent or an approach corresponding to orthogonal matching for linear problems. The methods are shown to be convergent and efficient and outperform the $\ell_1$ approach on the test problems presented.

1. Introduction

We consider the nonlinear underdetermined system of equations

$$ f_1(x_1,\dots,x_N) = 0, \quad \dots, \quad f_m(x_1,\dots,x_N) = 0, $$

or simply

$$ f(x) = 0, \tag{1} $$

where $x \in \mathbb{R}^N$ and $f : D \subset \mathbb{R}^N \to \mathbb{R}^m$, $m < N$, is twice continuously differentiable on the open convex set $D$, i.e., $f_i \in C^2(D)$, $i = 1,\dots,m$. If $0 \in f(D)$ the solution to (1) is not unique, which is a direct consequence of the Implicit Function Theorem [1]. We refer to [2, 3, 4, 5] for examples from different application areas as motivation for solving (1).

In this paper we are interested in sparse solutions to (1), i.e., solutions that contain only a few nonzero components. Let $\|x\|_0$ be the so-called $\ell_0$-norm (which is actually not a norm) on $\mathbb{R}^N$, defined as the number of nonzero elements, $\|x\|_0 = |\{i : x_i \neq 0\}|$. We say that a vector $x$ is $n$-sparse if $\|x\|_0 \le n$, and sparse if $\|x\|_0 \ll m$. The problem of finding the most sparse solution to (1) reads

$$ \min_x \|x\|_0 \quad \text{s.t.} \quad f(x) = 0. \tag{2} $$

Due to its combinatorial complexity, problem (2) is considered to be intractable, see [4], and current algorithms cannot guarantee that the (sparse) solution attained is a solution to (2). Linear problems, i.e., $f(x) = Ax - b$, $A \in \mathbb{R}^{m \times N}$, $b \in \mathbb{R}^m$, have been studied extensively. For algorithms solving the linear sparse solution problem we refer to [4]. Important references can also be found in [6].

To the best of our knowledge there are no numerical algorithms specifically developed to find sparse solutions of (1) except the ones described in [2], which we will refer to as the $\ell_1$-method.

Mathematics Subject Classification. 68Q25, 68R, 68U5.
Key words and phrases. sparse optimization, underdetermined nonlinear systems of equations, Gauss-Newton, line search, greedy algorithm, sparsity constraints.

We will later compare this method with our approach and therefore we describe the method in more detail. Let $\|x\|_p$, $0 < p < \infty$, be given as

$$ \|x\|_p = \Big( \sum_i |x_i|^p \Big)^{1/p}. \tag{3} $$

For $p \ge 1$, (3) defines the $\ell_p$-norm while for $0 < p < 1$ it is only a quasi-norm. In the sequel, we use $\|\cdot\|$ instead of $\|\cdot\|_2$. The algorithms in [2] are based on solving

$$ \min_x \|x\|_p^p \quad \text{s.t.} \quad f(x) = 0 \tag{4} $$

for $0 < p \le 1$ and $f$ given as above, which is motivated by the fact that $\|x\|_p^p \to \|x\|_0$ as $p \to 0^+$ on a bounded set. In particular, the $\ell_1$-norm algorithm described in [2] is realized in the following way. Starting with $x_0 = 0$ one obtains a new approximation as $x_{k+1} = x_k + p_k$, $k = 0, 1, 2, \dots$, where $p_k$ is the solution to

$$ \min_p \|p\|_1 \quad \text{s.t.} \quad f_k + J_k p = 0. \tag{5} $$

Here we denote $f_k = f(x_k)$, and $J_k = (\partial f_i(x_k)/\partial x_j)_{ij}$, $i = 1,\dots,m$, $j = 1,\dots,N$, is the Jacobian of $f(x)$ at $x = x_k$. The problem (5) can be recast as a linear programming problem

$$ \min_w c^T w \quad \text{s.t.} \quad Aw = b, \; w \ge 0, \tag{6} $$

where $c = \mathbf{1}_{2N}$ (the vector of ones), $A = (J_k, -J_k)$, $b = -f_k$, $w = (u; v)$, and $p = u - v$. In [2] it was shown that the method converges locally to a solution (which is not necessarily sparse) with quadratic convergence rate. However, global convergence was not proven.

There are other methods not directly applied to (2) that contain ideas and properties related to our approach and are thus relevant to mention here. In a series of papers [7, 8, 9, 6] a general theory is developed for the problem

$$ \min_x F(x) \quad \text{s.t.} \quad x \in C_s \cap B, \tag{7} $$

where $F : \mathbb{R}^N \to \mathbb{R}$, $B$ is a closed and convex set, and $C_s = \{x \in \mathbb{R}^n : \|x\|_0 \le s\}$. The theory is used for a number of applications and several algorithms are developed and analyzed. In our context, we note that in [9] an algorithm, GESPAR (greedy sparse phase retrieval), is developed to solve a nonlinear overdetermined least squares problem based on a coordinate search where the sparse (small) overdetermined nonlinear least squares subproblems are solved using a Gauss-Newton approach with line search. Convergence results for the gradient are derived.
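Returning to the $\ell_1$ step (5): the recast into the linear program (6) can be checked numerically. The following is a minimal sketch, not the implementation of [2]; the helper names `lp_data` and `split_pos_neg` are hypothetical, and no LP solver is called. It only verifies that a feasible $p$ maps to a feasible $w = (u; v) \ge 0$ whose cost $c^T w$ equals $\|p\|_1$.

```python
import numpy as np

def lp_data(J, f):
    """Data (c, A, b) of the linear program (6) built from J_k and f_k."""
    N = J.shape[1]
    c = np.ones(2 * N)          # vector of ones: c^T w = sum(u) + sum(v)
    A = np.hstack([J, -J])      # A w = J u - J v = J p
    b = -f                      # A w = b encodes f_k + J_k p = 0
    return c, A, b

def split_pos_neg(p):
    """Map p to w = (u; v) with u = max(p, 0) and v = max(-p, 0), so p = u - v."""
    return np.concatenate([np.maximum(p, 0.0), np.maximum(-p, 0.0)])

rng = np.random.default_rng(0)
J = rng.standard_normal((3, 6))                  # illustrative Jacobian (assumption)
p = np.array([0.0, 1.5, 0.0, -2.0, 0.0, 0.0])    # a sparse candidate step
f = -J @ p                                       # chosen so that p is feasible for (5)
c, A, b = lp_data(J, f)
w = split_pos_neg(p)
print(np.allclose(A @ w, b), c @ w, np.abs(p).sum())
```

In practice the triple `(c, A, b)` would be handed to any LP solver with the bound $w \ge 0$; the splitting $p = u - v$ is what turns the non-smooth $\ell_1$ objective into a linear one.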
In [10] the problem (7) with $B = \mathbb{R}^n$ is considered with a coordinate search algorithm based on a local gradient search in a sparse solution set (gradient support pursuit). Estimates of the error in the iterates are developed using the size of the elements in the gradient at the sparse solution. Furthermore, there are combinatorial methods that solve the nonlinear problem (1) using cardinality constraints, see [5], which we do not consider here.

Here we present an alternative method, which we call a Greedy Gauss-Newton algorithm, that combines a greedy approach with the Gauss-Newton method [11]. The method is based on a line search where at the $k$th iterate we set

$$ x_{k+1} = x_k + \alpha_k p_k, \quad k = 0, 1, 2, \dots, \tag{8} $$

where $p_k$ is the search direction and $\alpha_k$ is the step length. We start the iterations with $x_0 = 0$ or $x_0$ sparse enough. In every iteration we use the matrix $L_k$ consisting of the columns of $J_k$ corresponding to the nonzero part of $x_k$ and an additional column of $J_k$, $J_k(:, t)$, to calculate the search direction as

$$ p_t = \arg\min_p \| f_k + (L_k, J_k(:, t))\, p \|. $$

The choice of $t$ is discussed and we analyze two choices in detail. The first one is based on maximizing the descent of $\|f(x)\|_2^2$ at $x = x_k$ in the direction $p_t$, and we call this method Maximum Descent (MD). The second idea of choosing $t$ is similar to orthogonal matching on the linear problem $\min \|f_k + J_k p\|_2$, see [4], and consists of maximizing the absolute cosine of the angle between $r_k = f_k - L_k L_k^+ f_k$ and $J_k(:, t)$, where $L_k^+$ is the pseudoinverse of $L_k$ [12]. We denote our method based on orthogonal matching as OM.

The paper is organized as follows. In Section 2 we describe how to calculate $p_t$ and we show that it is a descent direction, together with some useful corollaries. The MD algorithm is presented in Section 2.1 and OM in Section 2.2. In Section 3 we show results on global and local convergence together with the algorithm in pseudocode, and finally we give some numerical tests in Section 4.

2. The algorithm

Here we describe the line search method (8) to find a sparse solution to (1). We start with $x_0 = 0$ or some sufficiently sparse vector. At iteration $k$, let $x_k$ contain $n_k$ nonzero elements at positions $i \in \Omega_k$ and zero elements at $i \in \bar\Omega_k$, where

$$ \Omega_k = \{i_1, i_2, \dots, i_{n_k}\}, \qquad \bar\Omega_k = \{1, 2, \dots, N\} \setminus \Omega_k. $$

We use Matlab [13] inspired notation, that is, $x(i) = x_i$, $x(:) = x$, $x(1:n_k) = (x_1, \dots, x_{n_k})^T$, and $x(\Omega_k) = (x_{i_1}, \dots, x_{i_{n_k}})^T$.

We aim at finding $p_k$ in (8) such that (i) $p_k$ is a descent direction and (ii) the update $x_{k+1} = x_k + \alpha_k p_k$ is $(n_k + 1)$-sparse for any $\alpha_k \in \mathbb{R}$. The most straightforward approach would be to solve the linearization of (1), that is,

$$ f_k + J_k p_k = 0. \tag{9} $$

However, solving (9) for a sparse $p_k$ is not efficient enough for large $N$ [4]. Thus, for every $t \in \bar\Omega_k$ we define a projection $\Pi_k^t$ as

$$ \Pi_k^t(i, j) = \begin{cases} 1, & i = j \text{ and } i \in \Omega_k \cup \{t\}, \\ 0, & \text{otherwise}, \end{cases} $$

where $i, j = 1, \dots, N$. Then instead of (9) we solve the minimization problem

$$ \min_p \tfrac12 \|p\|^2 \quad \text{s.t.} \quad \min_p \tfrac12 \| f_k + J_k \Pi_k^t p \|^2, \tag{10} $$

with $t \in \bar\Omega_k$ to obtain $p_k$. We choose $t \in \bar\Omega_k$ by two different methods: MD, $t = t_{MD}$, or OM, $t = t_{OM}$, which we describe in detail in the coming subsections. Let $p_t$ be a solution of (10) for $t \in \bar\Omega_k$.
It is clear that $p_t$ for any $t \in \bar\Omega_k$ satisfies the sparsity requirement (ii). Indeed, $p_t(\bar\Omega_k \setminus \{t\}) = 0$ for any $t \in \bar\Omega_k$. Below we discuss when $p_t$ is a descent direction. Denote $L_k = J_k(:, \Omega_k)$; then the remaining non-zero part of $p_t$, that is, $q_t = (p_t(\Omega_k)^T, p_t(t))^T \in \mathbb{R}^{n_k + 1}$, is the solution to

$$ \min_q \tfrac12 \|q\|^2 \quad \text{s.t.} \quad \min_q \tfrac12 \| f_k + (L_k, J_k(:, t))\, q \|^2, \tag{11} $$

that is,

$$ q_t = -(L_k, J_k(:, t))^+ f_k. \tag{12} $$

Note that $q_t$ is the unique minimum of $\| f_k + (L_k, J_k(:, t))\, q \|$ if $\operatorname{rank}((L_k, J_k(:, t))) \ge n_k + 1$.
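The restricted step (11)-(12) can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the authors' code: the function name `restricted_direction` and the random test data are hypothetical, and `numpy.linalg.lstsq` is used because it returns the minimum-norm least-squares solution, which coincides with applying the pseudoinverse.

```python
import numpy as np

def restricted_direction(J, f, support, t):
    """Sketch of (11)-(12): least-squares step using only columns support + [t] of J."""
    cols = list(support) + [t]
    M = J[:, cols]                                # (L_k, J_k(:, t))
    q, *_ = np.linalg.lstsq(M, -f, rcond=None)    # q_t = -(L_k, J_k(:, t))^+ f_k
    p = np.zeros(J.shape[1])
    p[cols] = q                                   # embed q_t in R^N, zeros elsewhere
    return p

rng = np.random.default_rng(0)
J = rng.standard_normal((4, 10))                  # illustrative J_k (assumption)
f = rng.standard_normal(4)                        # illustrative f_k (assumption)
p = restricted_direction(J, f, support=[1, 5], t=7)
print(np.count_nonzero(p), p @ (J.T @ f))
```

The printed inner product $p_t^T J_k^T f_k$ is the directional derivative of $\tfrac12\|f\|^2$ along $p_t$; a negative value means $p_t$ is a descent direction.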

Lemma 1. Let $p_t$ be a solution of (10), and $J_k$ and $f_k$ be given as above. Then

$$ -p_t^T J_k^T f_k = f_k^T L_k L_k^+ f_k + \frac{| f_k^T (I - L_k L_k^+) J_k(:, t) |^2}{\| (I - L_k L_k^+) J_k(:, t) \|^2} \tag{13} $$

and $p_t$ is a descent direction of $\tfrac12 \|f(x)\|^2$ at $x = x_k$ if and only if

$$ f_k^T L_k L_k^+ f_k + \frac{| f_k^T (I - L_k L_k^+) J_k(:, t) |^2}{\| (I - L_k L_k^+) J_k(:, t) \|^2} > 0. \tag{14} $$

Proof. Let $t \in \bar\Omega_k$, $a = J_k(:, t)$, and $P = I - L_k L_k^+$. Observe that $L_k L_k^+$ and $P$ define the orthogonal projections on $\mathcal{R}(L_k)$ and $\mathcal{R}(L_k)^\perp$, respectively. By (12) we have

$$ -p_t^T J_k^T f_k = -q_t^T (L_k, a)^T f_k = f_k^T (L_k, a)(L_k, a)^+ f_k. $$

Theorem 2 in Ch. 7, Section 5 in [12] yields

$$ (L_k, a)^+ = \begin{pmatrix} L_k^+ - L_k^+ a b^T \\ b^T \end{pmatrix}, \tag{15} $$

where

$$ b^T = \begin{cases} (Pa)^+, & Pa \neq 0, \\ \big(1 + a^T (L_k^+)^T L_k^+ a\big)^{-1} a^T (L_k^+)^T L_k^+, & Pa = 0. \end{cases} $$

Thus, using (12) and (15) we obtain

$$ -p_t^T J_k^T f_k = f_k^T (L_k, a)(L_k, a)^+ f_k = f_k^T L_k L_k^+ f_k + \frac{| f_k^T P a |^2}{\| P a \|^2}. \tag{16} $$

The second claim of the lemma follows from (13) and the definition of a descent direction. □

Corollary 1. The solution $p_t$ of (10) is a descent direction of $\tfrac12 \|f(x)\|^2$ at $x = x_k$ unless $f_k \in \mathcal{R}(L_k)^\perp$ and $J_k(:, t) \in \mathcal{R}(L_k)$ simultaneously.

Proof. Observe that $L_k L_k^+$ is positive semi-definite. Thus the two terms in (14) are nonnegative. Assume that $f_k^T L_k L_k^+ f_k = 0$. Then $f_k \in \mathcal{R}(L_k)^\perp$ and the last term in (13) is equal to zero only if $a \in \mathcal{R}(L_k)$. □

It is clear from (13) that adding an extra column of $J_k$ will improve the descent as long as the added column does not belong to $\mathcal{R}(L_k)$. We formulate it as a corollary.

Corollary 2. The descent of $p_t$, $t \in \bar\Omega_k$, is not less than the descent of $\tilde p$, where $\tilde p(\bar\Omega_k) = 0$ and $\tilde p(\Omega_k)$ is the solution to

$$ \min_d \tfrac12 \| x(\Omega_k) + d \|^2 \quad \text{s.t.} \quad \min_d \tfrac12 \| f_k + L_k d \|^2. \tag{17} $$

Proof. Simple calculations show that $-\tilde p^T J_k^T f_k = f_k^T L_k L_k^+ f_k$. Together with (13) this implies $-\tilde p^T J_k^T f_k \le -p_t^T J_k^T f_k$. □

After $p_k$ is constructed, the step length $\alpha_k$ is found by using a standard step length algorithm, see [14], that satisfies the Goldstein-Armijo rule. We get $x_{k+1} = x_k + \alpha_k p_k$, which has at most $n_k + 1$ nonzero elements, and $\Omega_{k+1} = \Omega_k \cup \{t^*\}$.

If the step length $\alpha_k$ is too small, it indicates that the descent is insufficient and we restart the algorithm with a sparse enough $x$ where the positions and the values of the nonzero elements are chosen randomly, see Section 3.1. If there are elements in $x_{k+1}$ close to zero it could make sense to set these values to zero and then recalculate the set of non-zero entries $\Omega_{k+1}$. This approach would however be very much problem dependent and we do not consider it here.
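The step length selection can be illustrated with a plain backtracking loop. The sketch below enforces only the Armijo sufficient-decrease part of the Goldstein-Armijo rule on the merit function $\tfrac12\|f(x + \alpha p)\|^2$; the function name `armijo_step`, the constants `c`, `shrink`, `alpha_min`, and the one-equation example are assumptions made for illustration, not values from the paper.

```python
import numpy as np

def armijo_step(f, x, p, slope, c=1e-4, shrink=0.5, alpha_min=1e-10):
    """Backtracking on phi(alpha) = 0.5 * ||f(x + alpha * p)||^2.

    slope is phi'(0) = p^T J(x)^T f(x); it is negative for a descent direction.
    Returns 0.0 if no acceptable step is found, which could trigger a restart.
    """
    fx = f(x)
    phi0 = 0.5 * np.dot(fx, fx)
    alpha = 1.0
    while alpha > alpha_min:
        fn = f(x + alpha * p)
        if 0.5 * np.dot(fn, fn) <= phi0 + c * alpha * slope:
            return alpha
        alpha *= shrink
    return 0.0

# Hypothetical example: one equation in two unknowns, f(x) = x_1^2 + x_2 - 1.
f = lambda x: np.array([x[0] ** 2 + x[1] - 1.0])
x = np.array([2.0, 0.0])
J = np.array([[2.0 * x[0], 1.0]])        # Jacobian of f at x
p = -np.linalg.pinv(J) @ f(x)            # Gauss-Newton direction
slope = float(p @ (J.T @ f(x)))
alpha = armijo_step(f, x, p, slope)
print(alpha, np.linalg.norm(f(x + alpha * p)))
```

Returning `0.0` when the loop runs out is one way to signal the "step length too small" situation described above; a caller would compare the result against its own threshold.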

2.1. Maximum descent method (MD). MD is based on choosing $p_k = p_{t_{MD}}$ where

$$ t_{MD} = \arg\max_{t \in \bar\Omega_k} \big( -p_t^T J_k^T f_k \big) $$

or, equivalently,

$$ t_{MD} = \arg\max_{t \in \bar\Omega_k} \big( -q_t^T (L_k, J_k(:, t))^T f_k \big). \tag{18} $$

The next lemma gives us the explicit formula for computing $t = t_{MD}$.

Lemma 2. Let $p_t$ be the solution to (10) for $t \in \bar\Omega_k$. If there exists a $t \in \bar\Omega_k$ such that $p_t$ is a descent direction of $\tfrac12 \|f(x)\|^2$ at $x_k$, then the maximum descent direction is given as $p_k = p_{t_{MD}}$ where

$$ t_{MD} = \arg\max_t \frac{| f_k^T (I - L_k L_k^+) J_k(:, t) |}{\| (I - L_k L_k^+) J_k(:, t) \|}. \tag{19} $$

Moreover, $p_{t_{MD}}$ provides the minimum of the norm $\| f_k + J_k p_t \|$, i.e.,

$$ p_{t_{MD}} = \arg\min_{p_t} \| f_k + J_k p_t \|. $$

Proof. Let $t \in \bar\Omega_k$, $a = J_k(:, t)$, $P = I - L_k L_k^+$, and $S = P a a^T P / \|Pa\|^2$, where $P$ and $S$ define the orthogonal projections on $\mathcal{R}(L_k)^\perp$ and on $\mathcal{R}(Pa)$, respectively. The descent is given by (13), where the first term on the right-hand side, $f_k^T L_k L_k^+ f_k$, does not depend on $a$, and thus the maximum descent is achieved when $|f_k^T P a| / \|Pa\|$ is maximum. Thus, we obtain the expression in (19). To prove the second claim of the lemma we compute the squared norm using the expression for $q_t$ in (12):

$$ \| f_k + J_k \Pi_k^t p_t \|^2 = \| f_k + (L_k, a) q_t \|^2 = \| (I - (L_k, a)(L_k, a)^+) f_k \|^2. \tag{20} $$

Using (15) we obtain

$$ \| (I - (L_k, a)(L_k, a)^+) f_k \|^2 = \| (P - P a b^T) f_k \|^2 = \| (P - S) f_k \|^2 = f_k^T (P^2 - PS) f_k = f_k^T P f_k - \frac{| f_k^T P a |^2}{\| P a \|^2}. \tag{21} $$

The term $f_k^T P f_k$ does not depend on $a$, and the norm $\| f_k + J_k \Pi_k^t p_t \|$ reaches its minimum when $|f_k^T P a| / \|Pa\|$ is maximum. □

From Corollary 1 and Lemma 2 it is clear that $p_k = p_{t_{MD}}$ is always a descent direction if $\operatorname{rank}(J_k) > \operatorname{rank}(L_k)$.

Let us assume that $q_t$ in (12) is calculated with a QR-decomposition, see [13], that $N \gg n_k$, and that $t_{MD}$ is calculated using (18). Then the complexity (number of flops, i.e., one addition, subtraction, multiplication, or division of two floating-point numbers) of MD in iteration $k$ is $(2 m n_k^2 + (m + 1)(n_k + 1))(N - n_k)$. If instead we use (19), the complexity is $2m(n_k + 1)(N - n_k)$.
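The selection rule (19) translates almost literally into code. The sketch below is an illustration under assumptions: the name `md_index`, the numerical threshold `1e-12` used to skip columns already in $\mathcal{R}(L_k)$, and the explicit formation of the projector $P$ (convenient for small $m$, not tuned for efficiency) are choices made here, not taken from the paper.

```python
import numpy as np

def md_index(J, f, support):
    """Return (t, value) maximizing |f^T P a_t| / ||P a_t|| over t outside the support, cf. (19)."""
    m, N = J.shape
    P = np.eye(m)
    if support:
        L = J[:, list(support)]
        P = P - L @ np.linalg.pinv(L)    # orthogonal projector onto R(L_k)^perp
    best_t, best_val = None, 0.0
    for t in range(N):
        if t in support:
            continue
        Pa = P @ J[:, t]
        nrm = np.linalg.norm(Pa)
        if nrm < 1e-12:                  # column (numerically) in R(L_k): no gain
            continue
        val = abs(f @ Pa) / nrm
        if val > best_val:
            best_t, best_val = t, val
    return best_t, best_val

rng = np.random.default_rng(1)
J = rng.standard_normal((5, 12))         # illustrative J_k (assumption)
f = rng.standard_normal(5)               # illustrative f_k (assumption)
support = [2, 9]
t_md, val = md_index(J, f, support)
print(t_md, val)
```

By the second claim of Lemma 2, the column picked this way should also be the one that gives the smallest linearized residual $\|f_k + (L_k, J_k(:, t)) q_t\|$ among all candidates, which is a convenient way to cross-check an implementation by brute force.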
Assuming that the term including $N - n_k$ is the largest, the complexity of MD can be reduced by accepting a descent that is large enough without considering the whole set $\bar\Omega_k$. However, we have not considered this generalization here.

2.2. Orthogonal matching method (OM). Let $L_k = J_k(:, \Omega_k)$ as before and consider

$$ \min_d \tfrac12 \|d\|^2 \quad \text{s.t.} \quad \min_d \tfrac12 \| f_k + L_k d \|^2. \tag{22} $$

The solution of (22) is $d_k = -L_k^+ f_k$, which is the unique minimum of $\| f_k + L_k d \|$ if $\operatorname{rank}(L_k) \ge n_k$, and the minimum norm solution otherwise.

OM aims at finding the column $J_k(:, t_{OM})$ that is the most strongly correlated with the linear residual $r_k = f_k + L_k d_k$, i.e.,

$$ t_{OM} = \arg\max_{t \in \bar\Omega_k} \frac{| J_k(:, t)^T r_k |}{\| J_k(:, t) \|} $$

or, equivalently,

$$ t_{OM} = \arg\max_{t \in \bar\Omega_k} \frac{| f_k^T (I - L_k L_k^+) J_k(:, t) |}{\| J_k(:, t) \|}, \tag{23} $$

to obtain $p_k = p_{t_{OM}}$. Following the assumptions made for MD regarding the complexity analysis, we get the complexity of OM to be $4 m n_k^2 + 2m(N - n_k)$, where the first term is the calculation of $r_k$ and $q_{t_{OM}}$ in (12) and the second is from solving the maximization problem in (23).

Let us consider (10) where we set $p(\Omega_k) = d_k$, that is,

$$ \min_p \tfrac12 \|p\|^2 \quad \text{s.t.} \quad \begin{cases} \min_p \tfrac12 \| f_k + J_k \Pi_k^t p \|^2, \\ p(\Omega_k) = d_k. \end{cases} \tag{24} $$

Then the problem can be rewritten as

$$ \min_\delta \tfrac12 \left\| \begin{pmatrix} d_k \\ \delta \end{pmatrix} \right\|^2 \quad \text{s.t.} \quad \min_\delta \tfrac12 \| f_k + L_k d_k + J_k(:, t)\delta \|^2, \tag{25} $$

or, equivalently,

$$ \min_\delta \tfrac12 \delta^2 \quad \text{s.t.} \quad \min_\delta \tfrac12 \| (I - L_k L_k^+) f_k + J_k(:, t)\delta \|^2, $$

with the solution

$$ \delta_t = -\frac{J_k(:, t)^T (I - L_k L_k^+) f_k}{\| J_k(:, t) \|^2}. \tag{26} $$

Hence, the solution to (24) is $\hat p_t$ where $\hat p_t(\Omega_k) = d_k$, $\hat p_t(t) = \delta_t$ and $\hat p_t(\bar\Omega_k \setminus \{t\}) = 0$.

Lemma 3. Let $p_t$ be a solution to (10) and $\hat p_t$ to (24) for $t \in \bar\Omega_k$, and let $t_{OM}$ be given by (23). If there exists a descent direction among the $p_t$, then $p_{t_{OM}}$ and $\hat p_{t_{OM}}$ are descent directions. Moreover, $\hat p_{t_{OM}}$ gives the minimum norm of $\| f_k + J_k \hat p_t \|$, i.e.,

$$ \hat p_{t_{OM}} = \arg\min_{\hat p_t} \| f_k + J_k \hat p_t \|. $$

Proof. Let $P$ define the orthogonal projection on $\mathcal{R}(L_k)^\perp$, i.e., $P = I - L_k L_k^+$. From Corollary 1 it is seen that $p_{t_{OM}}$ is a descent direction. Indeed, if $f_k \notin \mathcal{R}(L_k)^\perp$ then any $p_t$ gives a descent. Assume that $f_k \in \mathcal{R}(L_k)^\perp$. Let $p_{t'}$, $t' \in \bar\Omega_k$, be a descent direction. Hence, $J_k(:, t') \notin \mathcal{R}(L_k)$, that is, $|f_k^T P J_k(:, t')| > 0$, which implies $|f_k^T P J_k(:, t_{OM})| > 0$.

Let $t \in \bar\Omega_k$, $a = J_k(:, t)$, and let $Q = a a^T / \|a\|^2$ define the orthogonal projection on $\mathcal{R}(a)$. To show that $\hat p_{t_{OM}}$ gives a descent we calculate

$$ \hat p_t^T J_k^T f_k = (d_k^T, \delta_t) \begin{pmatrix} L_k^T f_k \\ a^T f_k \end{pmatrix}. $$

Using the formulas for $d_k$ and (26) we have

$$ -\hat p_t^T J_k^T f_k = f_k^T L_k L_k^+ f_k + \frac{f_k^T P a a^T f_k}{\|a\|^2} \ge 0, $$

as $P a a^T$ is positive semi-definite, which can be seen by looking at the eigenvalue equation $P a a^T u = \lambda u$ giving $\lambda \ge 0$. Similarly to the above, $f_k \notin \mathcal{R}(L_k)^\perp$ implies that $\hat p_t$ is a descent direction for any $t \in \bar\Omega_k$. Assume that this is not the case and $f_k \in \mathcal{R}(L_k)^\perp$. Then $J_k(:, t_{OM}) \notin \mathcal{R}(L_k)$, which implies $f_k^T P a a^T f_k / \|a\|^2 > 0$ for $a = J_k(:, t_{OM})$, and $\hat p_{t_{OM}}$ gives a descent direction. To show that $\hat p_{t_{OM}}$ provides the minimum norm we compute

$$ \| f_k + J_k \hat p_t \|^2 = \| P f_k - Q P f_k \|^2 = f_k^T P (I - Q) P f_k = f_k^T P f_k - f_k^T P Q P f_k = f_k^T P f_k - \frac{| f_k^T P a |^2}{\|a\|^2}. $$

The term $f_k^T P f_k$ does not depend on $a$, and the norm $\| f_k + J_k \hat p_t \|$ reaches its minimum when $|f_k^T P a| / \|a\|$ is maximum. □

From Lemma 3 it follows that one can use $p_k = \hat p_{t_{OM}}$ instead of $p_k = p_{t_{OM}}$. However, the complexity of this approach would be only $2 m n_k$ less than OM. As Lemma 3 and Lemma 2 imply

$$ \| f_k + J_k p_{t_{MD}} \| \le \| f_k + J_k p_{t_{OM}} \| \le \| f_k + J_k \hat p_{t_{OM}} \| $$

and we have not seen any real advantages of this approach compared to OM, we do not consider it further.

2.3. Comparison and generalizations of OM and MD. There are some interesting common features between MD and OM. In (19) we notice that the new column is chosen so as to maximize the absolute cosine of the angle between the vectors $f_k$ and $v_k^t = (I - L_k L_k^+) J_k(:, t) / \| (I - L_k L_k^+) J_k(:, t) \|$. Geometrically this means that we choose the column $J_k(:, t)$ whose projection onto $\mathcal{R}(L_k)^\perp$ is as parallel as possible to the nonlinear residual $f_k$. In OM we instead choose $t_{OM}$ from (23), which maximizes the absolute cosine of the angle between the linear residual $r_k$ and $J_k(:, t)$. This is the same orthogonal matching principle as for linear problems [4], but here applied to the linearized problem $\min_p \| f_k + J_k p \|$.

From a complexity point of view the two methods are comparable if we assume that $N \gg n_k$, but if $n_k \approx m$ MD will be more expensive since the large term is $O(m^2 (N - n_k))$ compared to $O(m (N - n_k))$ using OM.

We note that when $n_k = \operatorname{rank}(J_k)$ no column will be added and we then choose to remain in the corresponding subspace.
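For comparison with MD, the OM rule (23) can be sketched as follows. This is a minimal illustration, assuming all candidate columns are nonzero; the function name `om_index` and the tiny test matrix are hypothetical. The only difference from (19) is the normalization: the score is divided by $\|J_k(:, t)\|$ rather than by the norm of the projected column, so the projector never has to be applied to each candidate.

```python
import numpy as np

def om_index(J, f, support):
    """Return t maximizing |r^T a_t| / ||a_t||, with r = (I - L L^+) f, cf. (23)."""
    N = J.shape[1]
    r = f.copy()
    if support:
        L = J[:, list(support)]
        r = f - L @ (np.linalg.pinv(L) @ f)      # linear residual r_k
    outside = [t for t in range(N) if t not in support]
    scores = [abs(r @ J[:, t]) / np.linalg.norm(J[:, t]) for t in outside]
    return outside[int(np.argmax(scores))]

# Hypothetical 3 x 4 example with easily checked geometry.
J = np.array([[1.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
t_a = om_index(J, np.array([0.0, 0.0, 1.0]), [0])   # residual along e_3
t_b = om_index(J, np.array([1.0, 1.0, 0.0]), [])    # residual along (1, 1, 0)
print(t_a, t_b)
```

Computing $r_k$ once and then taking $N - n_k$ inner products is what keeps the per-iteration cost of OM linear in $m$ for each candidate column.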
There are some more or less obvious variants or generalizations of MD and OM and we mention some here. Firstly, more than one column can be added in every iteration, simplifying the algorithm and possibly making it more efficient. Secondly, the search of the columns may not be exhaustive, i.e., as soon as a column is found satisfying the criteria for being added, the search can be terminated. Specifically, this is an attractive approach for MD since only sufficient descent is necessary, not necessarily maximum descent. Finally, it is possible to iterate in the corresponding subspace at each step, possibly using a line search or any other approach.

3. Convergence properties

The global convergence is given by the following classical theorem that we state here for the sake of completeness. For the reference see the corresponding theorems in [15] and [16].

Theorem 1 (Global Convergence of a Descent method). Let $F : D \subset \mathbb{R}^N \to \mathbb{R}$ be continuously differentiable on the open convex set $D$ and assume that $\nabla F$ satisfies the Lipschitz condition

$$ \| \nabla F(x) - \nabla F(z) \|_2 \le \gamma \| x - z \| $$

for every $x, z \in D$ and some $\gamma > 0$. Given $x_0 \in D$, assume that the level set $\Lambda = \{ x \in D \mid F(x) \le F(x_0) \}$ is compact. Consider the sequence $\{x_k\}$ defined by (8) with $\alpha_k$ satisfying the Armijo-Goldstein condition, and $-p_k^T \nabla F(x_k) > 0$ for all $k \in \mathbb{N}$. Then $\{x_k\} \subset \Lambda$ and

$$ \lim_{k \to \infty} \frac{p_k^T \nabla F(x_k)}{\| p_k \|} = 0. \tag{27} $$

Next we show that the algorithm in Section 3.1 with $p_k$ chosen using the MD method or OM has the same convergence properties as the Gauss-Newton method for underdetermined nonlinear problems.

Lemma 4. Let $f$ be given as in (1) and $x_0 \in D$, where $D \subset \mathbb{R}^N$ is a convex open set such that $\Lambda = \{ x \in D \mid \|f(x)\| \le \|f(x_0)\| \}$ is compact. Consider the sequence $\{x_k\}$ given by (8) with the descent direction $p_k$ chosen using MD or OM, and $\alpha_k > 0$ satisfying the Armijo-Goldstein rule. If $\operatorname{rank}(J(x)) = \rho$ for all $x \in \Lambda$, then there is $k_\rho \in \mathbb{N}$ such that for $k \ge k_\rho$

$$ -p_k^T J_k^T f_k = f_k^T J_k J_k^+ f_k. \tag{28} $$

Proof. Under the conditions of Theorem 1, $x_k \in \Lambda$, see [16], and thus $\operatorname{rank}(J_k) = \rho$, $k \in \mathbb{N}$. Let $a = J_k(:, t^*)$ where $t^* = t_{MD}$ or $t^* = t_{OM}$, see (19) and (23). From Lemma 1 and Corollary 1, $\operatorname{rank}(L_k) = \rho$ for all $k \ge k_\rho$ for some $k_\rho \in \mathbb{N}$ and thus $(I - L_k L_k^+) a = 0$. Hence, from (13) we have

$$ -p_k^T J_k^T f_k = f_k^T L_k L_k^+ f_k. \tag{29} $$

Without loss of generality assume $J_k = (L_k, \bar L_k)$ and let $E \in \mathbb{R}^{N \times N}$ be a product of elementary matrices such that $J_k = (L_k, \bar L_k) = (L_k, 0) E$. Then

$$ J_k J_k^+ = (L_k, 0) E E^{-1} (L_k, 0)^+ = (L_k, 0) \begin{pmatrix} L_k^+ \\ 0 \end{pmatrix} = L_k L_k^+, $$

which yields (28). □

Notice that from Lemma 4 the algorithm becomes equivalent to the Gauss-Newton method only starting from some $k_\rho$th iterate, when we have (hopefully) already reached the vicinity of a sparse local minimum of $\tfrac12 \|f\|^2$, say $x^*$. This minimum is a solution to $f(x) = 0$ if $\operatorname{rank}(J(x^*)) = m$, but this is not necessarily the case when $\operatorname{rank}(J(x^*)) < m$. In practice we exclude the convergence to a stationary point $x^*$ giving $\|f(x^*)\| > 0$ by restarting the algorithm. We also do a restart when $p_k$ fails to give a significant descent, see Section 3.1.

Let $\{x_k\}$ be generated by the Greedy Gauss-Newton method and $\{x_k\} \to x^*$ where $f(x^*) = 0$. Then the convergence rate is quadratic given $\alpha_k = 1$ in a vicinity of $x^*$, see [15]. However, from Lemma 4 this rate of convergence is only guaranteed for $k > k_\rho$. With the next proposition we show that this assumption on $k$ can be omitted.

Proposition 1 (Rate of Convergence). Let $f$ be given as in (1) and $\hat x \in \mathbb{R}^N$ be such that $f(\hat x) = 0$. Let the sequence $\{x_k\}$ given by (8), with the descent direction $p_k$ chosen using MD or OM and $\alpha_k = 1$, converge to $\hat x$ as $k \to \infty$. If $\|p_k\| \le C \|f_k\|$ for all $k \ge K$, for some $K \in \mathbb{N}$, then $\{x_k\}$ converges to $\hat x$ quadratically.

Proof. Let $A_k(:, \Omega_k \cup \{t^*\}) = J_k(:, \Omega_k \cup \{t^*\})$ and $A_k(:, \bar\Omega_k \setminus \{t^*\}) = O$, where $t^* = t_{MD}$ or $t^* = t_{OM}$. Then $p_k = -A_k^+ f_k$ and $\|A_k^+\| \le C$. In a vicinity of $\hat x$ the Taylor expansion is valid:

$$ f(\hat x) = f(x_k) + J_k (\hat x - x_k) + r(x_k) = f_k + A_k (\hat x - x_k) + r_k $$

with $r_k = O(\| x_k - \hat x \|^2)$, as the Hessian is continuous and thus uniformly bounded in a closed neighbourhood of $\hat x$. We have

$$ A_k^+ f(\hat x) = A_k^+ f_k + A_k^+ A_k (\hat x - x_k) + A_k^+ r(x_k). $$

Remembering that $f(\hat x) = 0$ and $A_k^+ A_k = I$ we obtain

$$ x_k - \hat x = A_k^+ f_k + A_k^+ r(x_k). $$

Next,

$$ x_{k+1} - \hat x = (x_k - \hat x) - A_k^+ f_k = A_k^+ r(x_k) = O(\| x_k - \hat x \|^2), $$

which completes our proof. □
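Putting the pieces together, the iteration (8) with the MD rule (19) can be sketched end to end. This is a simplified illustration under assumptions, not the authors' implementation: restarts are omitted, only Armijo backtracking is used for the step length, and the function name `greedy_gauss_newton`, its default tolerances, and the linear test problem are all hypothetical choices.

```python
import numpy as np

def greedy_gauss_newton(f, jac, N, eps_f=1e-10, tol=1e-12, k_max=100):
    """Greedy Gauss-Newton sketch: MD column choice, restricted step, backtracking."""
    x = np.zeros(N)
    support = []                                   # Omega_k
    for _ in range(k_max):
        fk, Jk = f(x), jac(x)
        if np.linalg.norm(fk) <= eps_f:
            break
        P = np.eye(fk.size)
        if support:
            L = Jk[:, support]
            P = P - L @ np.linalg.pinv(L)          # projector onto R(L_k)^perp
        best_t, best_val = None, tol
        for t in range(N):                         # MD rule (19)
            if t in support:
                continue
            Pa = P @ Jk[:, t]
            nrm = np.linalg.norm(Pa)
            if nrm > 1e-12 and abs(fk @ Pa) / nrm > best_val:
                best_t, best_val = t, abs(fk @ Pa) / nrm
        if best_t is not None:
            support = support + [best_t]
        if not support:
            break                                  # no usable column at all
        q, *_ = np.linalg.lstsq(Jk[:, support], -fk, rcond=None)
        p = np.zeros(N)
        p[support] = q                             # direction restricted to the support
        phi0, slope, alpha = 0.5 * fk @ fk, fk @ (Jk @ p), 1.0
        while alpha > 1e-10:                       # Armijo backtracking on 0.5*||f||^2
            fn = f(x + alpha * p)
            if 0.5 * fn @ fn <= phi0 + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        x = x + alpha * p
    return x, support

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 12))
x_true = np.zeros(12)
x_true[[3, 8]] = [1.5, -2.0]
b = A @ x_true
x, support = greedy_gauss_newton(lambda z: A @ z - b, lambda z: A, N=12)
print(np.count_nonzero(x), np.linalg.norm(A @ x - b))
```

The linear test problem is used only because its behaviour is easy to reason about: the support grows by at most one index per iteration and cannot exceed $m$ columns, so the returned vector has at most $m$ nonzero entries. Nonlinear `f` and `jac` callables can be passed in the same way, in which case the restart logic described in Section 3.1 would be needed for robustness.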

3.1. The Greedy Gauss-Newton Algorithm in pseudocode. Below we outline the algorithm we use in our numerical tests. For the values of the constants in step 1 we refer to the numerical tests in Section 4. The parameter $k_{max}$ stands for the maximum number of iterations (counting throughout restarts); $\varepsilon_f$, $\delta_x$, $\delta_\alpha$, tol, and grad are tolerances. In step 13 the sign $\circ$ stands for the Hadamard product, rand(N, 1) returns a vector of $N$ uniformly distributed random numbers in the interval $(0, 1)$, and prob $\in (0, 1]$. The merit function $\varphi(\alpha)$ in step 9 is given as $\varphi(\alpha) = \| f(x_k + \alpha p_k) \|_2^2 / 2$.

Greedy Gauss-Newton Algorithm
Predefined functions are $f : \mathbb{R}^N \to \mathbb{R}^m$ and the Jacobian $J(x) : \mathbb{R}^N \to \mathbb{R}^{m \times N}$, $m < N$.
 1. Input: $k_{max}$, $\varepsilon_f$, $\delta_x$, $\delta_\alpha$, tol, grad, prob
 2. $k = 0$, $x_0 = 0$, $\Omega_0 = \emptyset$, $n_{restarts} = 0$
 3. while $\| f(x_k) \| > \varepsilon_f$ and $k < k_{max}$
 4.     Find $t_{max}$ from (19) if MD or (23) if OM (or any other method)
 5.     if the maximum in (19) or (23), respectively, is larger than tol
 6.         Set $\Omega_{k+1} = \Omega_k \cup \{t_{max}\}$
        else
 7.         Set $\Omega_{k+1} = \Omega_k$
        end
 8.     Compute $p_k$ with $p_k(\Omega_{k+1}) = -J_k(:, \Omega_{k+1})^+ f(x_k)$ and zeros elsewhere
 9.     Find $\alpha_k$ using the merit function $\varphi(\alpha)$
10.     Set $x_{k+1} = x_k + \alpha_k p_k$
11.     if $\alpha_k < \delta_\alpha$ or $\| J_k^T f_k \| / \| f_k \| <$ grad
12.         $n_{restarts} = n_{restarts} + 1$
13.         Set $x_{k+1} = (2\,\mathrm{rand}(N, 1) - 1) \circ (\mathrm{rand}(N, 1) < \mathrm{prob})$
14.         Update $\Omega_{k+1} = \{ i : |x_{k+1}(i)| > \delta_x \}$
        end
15.     Update $k = k + 1$
    end
16. Update $\Omega_{k+1} = \{ i : |x_k(i)| > \delta_x \}$ and $x_{k+1}(\bar\Omega_{k+1}) = 0$
17. Output: Solution to $f(x) = 0$ or, if $k = k_{max}$, the vector $x_{k_{max}}$

A restart, see step 13, is performed if either the step length is too small, indicating not enough descent, or if the gradient is small while the norm of $f$ is not small, see step 11. The first case appears when the Gauss-Newton method does not converge locally, i.e., the solution has a large residual $\|f\|$ and/or a small curvature, see [17] for details. The second case for a restart may occur when the algorithm is converging to a local minimum where the norm of $f$ is not close to zero.

In the next section we use $\delta_x = 10^{-8}$, $\varepsilon_f = 10^{-3}$, $\delta_\alpha = 10^{-3}$, and grad $= 10^{-6}$, with $k_{max}$ and tol held fixed. The other constants vary for different problems and are given below.

4. Numerical tests

We test our method on three different problems where the solution space is known. The first is a small problem that is considered in [2]. The second and the third one have quadratic and exponential nonlinearities, respectively. These are large problems whose size can be changed. We illustrate the results from both a qualitative and a quantitative point of view and test the algorithm against the $\ell_1$-method described in [2].

4.1. Small test problem. Let $f$ in (1) be given as

$$ f(x) = Ax + \varphi(x) - y $$

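where $A \in \mathbb{R}^{5 \times 8}$ is a constant matrix, $\varphi(x) \in \mathbb{R}^5$ collects bilinear terms in the components of $x$ (products such as $x(2)x(3)$, $x(3)x(4)$, $x(4)x(5)$, $x(4)x(7)$, $x(1)x(2)$, $x(1)x(4)$ and $x(1)x(5)$), and $y \in \mathbb{R}^5$; the numerical values are those of the test problem in [2].

We run the $\ell_1$-method and both MD and OM starting with $x_0 = 0 \in \mathbb{R}^8$. It turns out that for this setup MD and OM are equivalent. All the methods converged to the same sparse solution $\hat x$, whose only nonzero components are $\hat x(5)$ and $\hat x(6)$. After three iterations we obtained $\| f(x_3) \| < 10^{-5}$. The paper prints the matrix $X_{\ell_1} = (x_1, x_2, x_3)$, where $x_k$, $k = 1, 2, 3$, are the iterates obtained using the $\ell_1$-method, and the matrix $X = (x_1, x_2, x_3)$ with $x_k$, $k = 1, 2, 3$, obtained using OM (or MD).

[Matrices $X_{\ell_1}$ and $X$: entries not legible in this transcription; the surviving fragments of $X_{\ell_1}$ show many entries with exponents between e-5 and e-7.]

The matrices give a good illustration of the difference between the two algorithms. In particular, the choice of the parameter $\delta_x$ plays a more significant role for the $\ell_1$-method than for the Greedy Gauss-Newton algorithm. Moreover, the maximum sparsity of a solution obtained by the Greedy Gauss-Newton algorithm is not greater than $m$, which cannot be guaranteed by the $\ell_1$-method.

4.2. Quadratic test problem. Consider the quadratic function

$$ f(x) = A(x - \bar x) + \frac12 \begin{pmatrix} (x - \bar x)^T H_1 (x - \bar x) \\ (x - \bar x)^T H_2 (x - \bar x) \\ \vdots \\ (x - \bar x)^T H_m (x - \bar x) \end{pmatrix}, \tag{30} $$

where $A \in \mathbb{R}^{m \times N}$, $H_i \in \mathbb{R}^{N \times N}$, $i = 1, \dots, m$. Let $s$, $n$ be such that $s < n + s \le N$ and $Q = (Q_1, Q_2)$, $Q_1 \in \mathbb{R}^{(n+s) \times n}$, $Q_2 \in \mathbb{R}^{(n+s) \times s}$, $Q^T Q = I$.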

We define

$$ A = (B Q_1^T, \; C) \quad \text{and} \quad H_i = \begin{pmatrix} Q_1 T_i Q_1^T & S_i \\ S_i^T & R_i \end{pmatrix}, $$

where $B$, $C$, $T_i$, $S_i$, and $R_i$, $i = 1, \dots, m$, are all random matrices of the corresponding sizes whose elements are uniformly distributed. We assume that $\bar x$ is $(n + s)$-sparse with the first $n + s$ elements non-zero. Let $\bar z = \bar x(1 : n + s)$; then any $x$ such that

$$ x - \bar x = \begin{pmatrix} z - \bar z \\ 0 \end{pmatrix} = \begin{pmatrix} Q_2 y \\ 0 \end{pmatrix}, \quad y \in \mathbb{R}^s, \tag{31} $$

is a solution to $f(x) = 0$ with $f$ as in (30). Moreover, as one can always find $y \in \mathbb{R}^s$ such that $z = Q_2 y + \bar z$ has an additional $s$ zeros, we conclude that there are solutions $x$ of sparsity $n$. The Jacobian, $J(x) \in \mathbb{R}^{m \times N}$, of $f$ is given by

$$ J_{ij}(x) = a_{ij} + e_j^T H_i (x - \bar x), \quad i = 1, \dots, m, \; j = 1, \dots, N, $$

where $e_j$ is the $j$th unit vector, and $\nabla^2 f_i = H_i$. Thus, for $x$ as in (31) we obtain

$$ J(x) = (B Q_1^T, \; C + J_2), \qquad (J_2)_{ij} = e_j^T S_i^T Q_2 y, $$

which most probably has rank $m$.

All the tests were run with $s = 2$, prob $= 0.2$, fixed $N$ and $m$, and with the constants given in Section 3.1.

In Figures 1-4 we demonstrate the qualitative behaviour of the Greedy Gauss-Newton method and compare it with the $\ell_1$-method. In Figure 1 we show the results of the algorithm for solving (30) using MD with $n = 6$. In particular, we plot the absolute value of the solution $x$ obtained using MD, and the negative absolute value of the solution obtained using the $\ell_1$-method, in Figure 1 (upper left). The sparsity of the solution obtained by MD is equal to $n = 6$ and the sparsity of the solution obtained by the $\ell_1$-method is 56. In Figure 1 (lower left) one can see which columns of $J_k$ were added at each iteration step $k$. We plot $\| f(x_k) \|$ in logarithmic scale in Figure 1 (upper right) and the size of $\Omega_k$ in Figure 1 (lower right) at each iteration.

The same test problem as in Figure 1 is then solved using OM. We display the results in Figure 2. Note that the solution with OM is not the same as the one for MD even if the sparsity is the same.

[Figure 1 panels: $|x|$ (MD) and $-|x|$ ($\ell_1$); spy$(x_1, \dots, x_k)$; $\log \| f(x_k) \|$; size of $\Omega_k$.]
Figure 1. MD method performance for the test problem (30) with $n = 6$ and $s = 2$.

[Figure 2 panels: $|x|$ (OM) and $-|x|$ ($\ell_1$); spy$(x_1, \dots, x_k)$; $\log \| f(x_k) \|$; size of $\Omega_k$.]
Figure 2. OM method performance for the test problem (30) with $n = 6$ and $s = 2$.

For the chosen parameters the convergence to a sparse solution, as in Figure 1 and Figure 2, is the most common case. However, starting with $x_0 = 0$ the algorithm may fail to produce a sequence that converges to the solution, see Figure 3, or may produce an $m$-sparse solution, as in Figure 4.

In Figure 3 (upper right) one can see an example of the case when the algorithm got stuck in a subspace with a local minimum of $\| f(x) \|^2 / 2$ that does not yield a solution to $f(x) = 0$. The rank of the Jacobian at these minima can be seen from Figure 3 (lower right). The algorithm converged to a sparse solution after three (different) restarts. We have plotted the absolute value of the solution and the negative absolute value of the solution of sparsity 56 obtained by the $\ell_1$-method in Figure 3 (upper left). In Figure 3 (lower left) the subspace of the local minimum and the subspace of the solution are shown.

Finally, in Figure 4 we show the case where the algorithm does not find a sparse solution but converges to a solution of sparsity $m$. The sparsity of the solution obtained by the $\ell_1$-method is equal to 54, see Figure 4 (lower left). Since we have not found a significant difference in the qualitative behaviour between OM and MD, we have displayed the results for the last two tests only for MD.

We would like to note that while the sparsity of solutions obtained by the Greedy Gauss-Newton method cannot exceed $m$, the $\ell_1$-method may produce a solution of even larger sparsity than $m$, which was the case for the considered test problem (30) in all our runs.

[Figure 3 panels: $|x|$ (MD) and $-|x|$ ($\ell_1$); spy$(x_1, \dots, x_k)$; $\log \| f(x_k) \|$; size of $\Omega_k$.]
Figure 3. MD method performance for the test problem (30) with $n = 6$ and $s = 2$.

[Figure 4 panels: $|x|$ and $-|x|$ ($\ell_1$); spy$(x_1, \dots, x_k)$; $\log \| f(x_k) \|$; size of $\Omega_k$.]
Figure 4. MD method performance for the test problem (30) with $n = 6$ and $s = 2$.

In Figures 5 and 6 we illustrate the performance of the algorithm averaged over repeated runs, where $N$ is fixed and $m$ and $n$ vary over a grid. The upper two plots in Figure 5 show that the sparsity $n$ of the solution is attained except for a curved ridge. It has been shown in [18] that for linear problems orthogonal matching pursuit can provably recover $n$-sparse signals when $n \le m / (2 \log(N))$. This estimate is illustrated by the cutting plane in the figures. It is seen that MD and OM manage to find less sparse solutions than the estimate. In the lower right plots in Figures 5 and 6 it is seen that MD outperforms OM for most problem sizes. The number of restarts was insignificantly small for these tests.

[Figure 5 panels: Sparsity for MD; Sparsity for MD compared to $n$; Iterations for MD; Iterations for OM compared to MD.]
Figure 5. The 3D plots of the performance of the MD and OM methods for the test problem (30), averaged over runs, with $s = 2$ and varying $m$ and $n$.

[Figure 6 panels: Sparsity for MD; Sparsity for MD compared to $n$; Iterations for MD; Iterations for OM compared to MD.]
Figure 6. The contour plots of the performance of the MD and OM methods for the test problem (30), averaged over runs, with $s = 2$ and varying $m$ and $n$.

4.3. Exponential test problem. This problem is taken from [3]. Define

$$ f(x) = A e^{Bx} - b, \quad A \in \mathbb{R}^{m \times N}, \; b \in \mathbb{R}^m, \quad e^x = (e^{x_1}, \dots, e^{x_N})^T, \tag{32} $$

where the elements of $A$ are chosen uniformly at random and $A$ is then modified, using the singular value decomposition, to have $\operatorname{rank}(A) = m - p$. The matrix $B$ is constructed in the following way. First, we generate an $N \times N$ random matrix whose elements are uniformly distributed. Next, using the singular value decomposition we modify this matrix so that its first $n + s < m$ columns have rank $n$ for some $n, s \in \mathbb{N}$. That is, $\operatorname{rank}(B(:, 1 : n + s)) = n$, and $B$ most probably has rank $N - s$. We choose $\bar x = (\bar z, 0)^T$ with some $\bar z \in \mathbb{R}^{n+s}$ and set $b = A \exp(B \bar x)$. Then for any $y \in \mathbb{R}^s$

$$ x = \bar x + \begin{pmatrix} V_2 y \\ 0 \end{pmatrix} \tag{33} $$

solves $f(x) = 0$, with $V_2 \in \mathbb{R}^{(n+s) \times s}$ such that $\mathcal{R}(V_2) = \mathcal{N}(B(:, 1 : n + s))$. From this construction it is clear that some of the $x$ among (33) have sparsity $n$. The Jacobian and second derivatives are given as

$$ J(x) = A \operatorname{diag}\big(e^{(Bx)_1}, \dots, e^{(Bx)_N}\big) B = A \operatorname{diag}(e^{Bx}) B, $$
$$ \nabla^2 f_i = B^T \operatorname{diag}\big(a_{i1} e^{(Bx)_1}, \dots, a_{iN} e^{(Bx)_N}\big) B, \quad i = 1, \dots, m, $$

where $a_{ij}$, $j = 1, \dots, N$, are the elements of $A$. The matrix $J(x)$ is always rank deficient. Indeed, since $A \operatorname{diag}(e^{Bx})$ has the same rank as $A$ we have

$$ \operatorname{rank}(J(x)) \le \min\{ \operatorname{rank}(A), \operatorname{rank}(B) \} = \min\{ m - p, \; N - s \}. $$

All the tests were run with $s = 2$, a problem-dependent value of prob, fixed $N$, and the constants given in Section 3.1. Furthermore, an additional condition for a restart, an upper bound on $\max_i |x_k(i)|$, is added to the condition of the if-statement in step 11 of the pseudocode to prevent divergence to infinity.

In Figures 7 and 8 we illustrate the performance of the algorithm averaged over repeated runs, where $N$ is fixed, $s = 4$, and $m$ and $n$ vary over a grid.

[Figure 7 panels: Sparsity for MD; Sparsity for MD compared to $n$; Iterations for MD; Iterations for OM compared to MD.]
Figure 7. The performance of the MD and OM methods for the test problem with (32), averaged over runs, with $s = 4$ and varying $m$ and $n$.

[Figure 8 panels: Sparsity for MD; Sparsity for MD compared to $n$; Iterations for MD; Iterations for OM compared to MD.]
Figure 8. The performance of the MD and OM methods for the test problem with (32), averaged over runs, with $s = 4$ and varying $m$ and $n$.
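For the exponential test problem, the Jacobian follows from the chain rule applied to $f(x) = A e^{Bx} - b$, namely $J(x) = A \operatorname{diag}(e^{Bx}) B$. The sketch below checks this against central finite differences; the function names `f_exp` and `jac_exp`, the small problem sizes, and the scaling of $B$ by 0.1 (to keep the exponentials moderate) are assumptions for illustration and do not reproduce the rank-constrained construction of $A$ and $B$ described above.

```python
import numpy as np

def f_exp(x, A, B, b):
    """Exponential test function f(x) = A exp(Bx) - b, with exp applied elementwise."""
    return A @ np.exp(B @ x) - b

def jac_exp(x, A, B):
    """Chain rule: J(x) = A diag(exp(Bx)) B (broadcasting scales the columns of A)."""
    return (A * np.exp(B @ x)) @ B

rng = np.random.default_rng(0)
m, N = 3, 5
A = rng.standard_normal((m, N))
B = 0.1 * rng.standard_normal((N, N))
b = rng.standard_normal(m)
x = rng.standard_normal(N)

h = 1e-6
J_fd = np.column_stack([
    (f_exp(x + h * e, A, B, b) - f_exp(x - h * e, A, B, b)) / (2 * h)
    for e in np.eye(N)
])
print(np.max(np.abs(jac_exp(x, A, B) - J_fd)))
```

Because $J(x)$ is a product with $A$ and $B$ as outer factors, its rank cannot exceed that of either factor, which is the mechanism behind the rank bound stated above.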

The upper right plots in Figures 7 and 8 show that the sparsity of the solution is attained very close to the estimate $n \le m / (2 \log(N))$ obtained for linear problems. We do not, however, have a theoretical justification of this estimate for nonlinear cases. Figure 7 (lower right) and Figure 8 (lower right) show that MD outperforms OM for all problem sizes. Restarts were more frequent for this test problem than for the quadratic test problem, see Section 4.2. However, there were a few cases, when $n$ and $m$ are large, where there was no convergence.

References

[1] Tom M. Apostol. Mathematical analysis; 2nd ed. Addison-Wesley Series in Mathematics. Addison-Wesley, Reading, MA, 1974.
[2] Philipp Kuegler. A sparse update method for solving underdetermined systems of nonlinear equations applied to the manipulation of biological signaling pathways. SIAM Journal on Applied Mathematics, 72(4):982-1001, 2012.
[3] José Mario Martínez. Quasi-Newton methods for solving underdetermined nonlinear simultaneous equations. Journal of Computational and Applied Mathematics, 34(2):171-190, 1991.
[4] J. A. Tropp and S. J. Wright. Computational methods for sparse solution of linear inverse problems. Proceedings of the IEEE, 98(6), Jun 2010.
[5] Xiaoling Sun, Xiaojin Zheng, and Duan Li. Recent advances in mathematical programming with semi-continuous variables and cardinality constraint. Journal of the Operations Research Society of China, 1(1):55-77, 2013.
[6] Amir Beck and Nadav Hallak. On the minimization over sparse symmetric sets: Projections, optimality conditions, and algorithms. Mathematics of Operations Research, 41(1):196-223, 2016.
[7] Amir Beck and Yonina C. Eldar. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM Journal on Optimization, 23(3):1480-1509, 2013.
[8] A. Beck and Y. C. Eldar. Sparse signal recovery from nonlinear measurements. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2013.
[9] Y. Shechtman, A. Beck, and Y. C. Eldar. GESPAR: Efficient phase retrieval of sparse signals. IEEE Transactions on Signal Processing, 62(4), Feb 2014.
[10] S. Bahmani, P. Boufounos, and B. Raj. Greedy sparsity-constrained optimization. In 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), pages 1148-1152, Nov 2011.
[11] A. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, 1996.
[12] A. Ben-Israel and T. N. E. Greville. Generalized Inverses: Theory and Applications. CMS Books in Mathematics. Springer, 2003.
[13] Gene H. Golub and Charles F. Van Loan. Matrix Computations (4th Ed.). Johns Hopkins University Press, Baltimore, MD, USA, 2013.
[14] C. Kelley. Iterative Methods for Optimization. Society for Industrial and Applied Mathematics, 1999.
[15] J. Dennis and R. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Society for Industrial and Applied Mathematics, 1996.
[16] J. Ortega and W. Rheinboldt. Iterative Solution of Nonlinear Equations in Several Variables. Society for Industrial and Applied Mathematics, 2000.
[17] J. Eriksson, P. A. Wedin, M. E. Gulliksson, and I. Söderkvist. Regularization methods for uniformly rank-deficient nonlinear least-squares problems. Journal of Optimization Theory and Applications, 127(1):1-26, 2005.
[18] Joel A. Tropp. On the conditioning of random subdictionaries. Applied and Computational Harmonic Analysis, 25(1):1-24, 2008.

M. Gulliksson, School of Science and Technology, Örebro University, Sweden
E-mail address: marten.gulliksson@oru.se

A. Oleynik, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Postboks 5003 NMBU, 1432 Ås
E-mail address: anna.oleynik@nmbu.no

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Explicit solution of the polynomial least-squares approximation problem on Chebyshev extrema nodes

Explicit solution of the polynomial least-squares approximation problem on Chebyshev extrema nodes Explicit solution of the polynoial least-squares approxiation proble on Chebyshev extrea nodes Alfredo Eisinberg, Giuseppe Fedele Dipartiento di Elettronica Inforatica e Sisteistica, Università degli Studi

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical and Matheatical Sciences 04,, p. 7 5 ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD M a t h e a t i c s Yu. A. HAKOPIAN, R. Z. HOVHANNISYAN

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup)

Recovering Data from Underdetermined Quadratic Measurements (CS 229a Project: Final Writeup) Recovering Data fro Underdeterined Quadratic Measureents (CS 229a Project: Final Writeup) Mahdi Soltanolkotabi Deceber 16, 2011 1 Introduction Data that arises fro engineering applications often contains

More information

Lower Bounds for Quantized Matrix Completion

Lower Bounds for Quantized Matrix Completion Lower Bounds for Quantized Matrix Copletion Mary Wootters and Yaniv Plan Departent of Matheatics University of Michigan Ann Arbor, MI Eail: wootters, yplan}@uich.edu Mark A. Davenport School of Elec. &

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Weighted- 1 minimization with multiple weighting sets

Weighted- 1 minimization with multiple weighting sets Weighted- 1 iniization with ultiple weighting sets Hassan Mansour a,b and Özgür Yılaza a Matheatics Departent, University of British Colubia, Vancouver - BC, Canada; b Coputer Science Departent, University

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

Curious Bounds for Floor Function Sums

Curious Bounds for Floor Function Sums 1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International

More information

On Conditions for Linearity of Optimal Estimation

On Conditions for Linearity of Optimal Estimation On Conditions for Linearity of Optial Estiation Erah Akyol, Kuar Viswanatha and Kenneth Rose {eakyol, kuar, rose}@ece.ucsb.edu Departent of Electrical and Coputer Engineering University of California at

More information

Complex Quadratic Optimization and Semidefinite Programming

Complex Quadratic Optimization and Semidefinite Programming Coplex Quadratic Optiization and Seidefinite Prograing Shuzhong Zhang Yongwei Huang August 4 Abstract In this paper we study the approxiation algoriths for a class of discrete quadratic optiization probles

More information

Generalized AOR Method for Solving System of Linear Equations. Davod Khojasteh Salkuyeh. Department of Mathematics, University of Mohaghegh Ardabili,

Generalized AOR Method for Solving System of Linear Equations. Davod Khojasteh Salkuyeh. Department of Mathematics, University of Mohaghegh Ardabili, Australian Journal of Basic and Applied Sciences, 5(3): 35-358, 20 ISSN 99-878 Generalized AOR Method for Solving Syste of Linear Equations Davod Khojasteh Salkuyeh Departent of Matheatics, University

More information

Lecture 20 November 7, 2013

Lecture 20 November 7, 2013 CS 229r: Algoriths for Big Data Fall 2013 Prof. Jelani Nelson Lecture 20 Noveber 7, 2013 Scribe: Yun Willia Yu 1 Introduction Today we re going to go through the analysis of atrix copletion. First though,

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

Topic 5a Introduction to Curve Fitting & Linear Regression

Topic 5a Introduction to Curve Fitting & Linear Regression /7/08 Course Instructor Dr. Rayond C. Rup Oice: A 337 Phone: (95) 747 6958 E ail: rcrup@utep.edu opic 5a Introduction to Curve Fitting & Linear Regression EE 4386/530 Coputational ethods in EE Outline

More information

A new type of lower bound for the largest eigenvalue of a symmetric matrix

A new type of lower bound for the largest eigenvalue of a symmetric matrix Linear Algebra and its Applications 47 7 9 9 www.elsevier.co/locate/laa A new type of lower bound for the largest eigenvalue of a syetric atrix Piet Van Mieghe Delft University of Technology, P.O. Box

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

The Methods of Solution for Constrained Nonlinear Programming

The Methods of Solution for Constrained Nonlinear Programming Research Inventy: International Journal Of Engineering And Science Vol.4, Issue 3(March 2014), PP 01-06 Issn (e): 2278-4721, Issn (p):2319-6483, www.researchinventy.co The Methods of Solution for Constrained

More information

A Simple Homotopy Algorithm for Compressive Sensing

A Simple Homotopy Algorithm for Compressive Sensing A Siple Hootopy Algorith for Copressive Sensing Lijun Zhang Tianbao Yang Rong Jin Zhi-Hua Zhou National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China Departent of Coputer

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008 LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Exact tensor completion with sum-of-squares

Exact tensor completion with sum-of-squares Proceedings of Machine Learning Research vol 65:1 54, 2017 30th Annual Conference on Learning Theory Exact tensor copletion with su-of-squares Aaron Potechin Institute for Advanced Study, Princeton David

More information

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x)

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x) 7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

On the theoretical analysis of cross validation in compressive sensing

On the theoretical analysis of cross validation in compressive sensing MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.erl.co On the theoretical analysis of cross validation in copressive sensing Zhang, J.; Chen, L.; Boufounos, P.T.; Gu, Y. TR2014-025 May 2014 Abstract

More information

Hybrid System Identification: An SDP Approach

Hybrid System Identification: An SDP Approach 49th IEEE Conference on Decision and Control Deceber 15-17, 2010 Hilton Atlanta Hotel, Atlanta, GA, USA Hybrid Syste Identification: An SDP Approach C Feng, C M Lagoa, N Ozay and M Sznaier Abstract The

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Least Squares Fitting of Data

Least Squares Fitting of Data Least Squares Fitting of Data David Eberly, Geoetric Tools, Redond WA 98052 https://www.geoetrictools.co/ This work is licensed under the Creative Coons Attribution 4.0 International License. To view a

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

New Classes of Positive Semi-Definite Hankel Tensors

New Classes of Positive Semi-Definite Hankel Tensors Miniax Theory and its Applications Volue 017, No., 1 xxx New Classes of Positive Sei-Definite Hankel Tensors Qun Wang Dept. of Applied Matheatics, The Hong Kong Polytechnic University, Hung Ho, Kowloon,

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE

ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE ADVANCES ON THE BESSIS- MOUSSA-VILLANI TRACE CONJECTURE CHRISTOPHER J. HILLAR Abstract. A long-standing conjecture asserts that the polynoial p(t = Tr(A + tb ] has nonnegative coefficients whenever is

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

A1. Find all ordered pairs (a, b) of positive integers for which 1 a + 1 b = 3

A1. Find all ordered pairs (a, b) of positive integers for which 1 a + 1 b = 3 A. Find all ordered pairs a, b) of positive integers for which a + b = 3 08. Answer. The six ordered pairs are 009, 08), 08, 009), 009 337, 674) = 35043, 674), 009 346, 673) = 3584, 673), 674, 009 337)

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

paper prepared for the 1996 PTRC Conference, September 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL

paper prepared for the 1996 PTRC Conference, September 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL paper prepared for the 1996 PTRC Conference, Septeber 2-6, Brunel University, UK ON THE CALIBRATION OF THE GRAVITY MODEL Nanne J. van der Zijpp 1 Transportation and Traffic Engineering Section Delft University

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words)

A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine. (1900 words) 1 A Self-Organizing Model for Logical Regression Jerry Farlow 1 University of Maine (1900 words) Contact: Jerry Farlow Dept of Matheatics Univeristy of Maine Orono, ME 04469 Tel (07) 866-3540 Eail: farlow@ath.uaine.edu

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

When Short Runs Beat Long Runs

When Short Runs Beat Long Runs When Short Runs Beat Long Runs Sean Luke George Mason University http://www.cs.gu.edu/ sean/ Abstract What will yield the best results: doing one run n generations long or doing runs n/ generations long

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

An l 1 Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions Journal of Matheatical Research with Applications Jul., 207, Vol. 37, No. 4, pp. 496 504 DOI:0.3770/j.issn:2095-265.207.04.0 Http://jre.dlut.edu.cn An l Regularized Method for Nuerical Differentiation

More information

Recovery of Sparsely Corrupted Signals

Recovery of Sparsely Corrupted Signals TO APPEAR IN IEEE TRANSACTIONS ON INFORMATION TEORY 1 Recovery of Sparsely Corrupted Signals Christoph Studer, Meber, IEEE, Patrick Kuppinger, Student Meber, IEEE, Graee Pope, Student Meber, IEEE, and

More information

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp

More information

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians

Using EM To Estimate A Probablity Density With A Mixture Of Gaussians Using EM To Estiate A Probablity Density With A Mixture Of Gaussians Aaron A. D Souza adsouza@usc.edu Introduction The proble we are trying to address in this note is siple. Given a set of data points

More information

A Bernstein-Markov Theorem for Normed Spaces

A Bernstein-Markov Theorem for Normed Spaces A Bernstein-Markov Theore for Nored Spaces Lawrence A. Harris Departent of Matheatics, University of Kentucky Lexington, Kentucky 40506-0027 Abstract Let X and Y be real nored linear spaces and let φ :

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

Reed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product.

Reed-Muller Codes. m r inductive definition. Later, we shall explain how to construct Reed-Muller codes using the Kronecker product. Coding Theory Massoud Malek Reed-Muller Codes An iportant class of linear block codes rich in algebraic and geoetric structure is the class of Reed-Muller codes, which includes the Extended Haing code.

More information

The Hilbert Schmidt version of the commutator theorem for zero trace matrices

The Hilbert Schmidt version of the commutator theorem for zero trace matrices The Hilbert Schidt version of the coutator theore for zero trace atrices Oer Angel Gideon Schechtan March 205 Abstract Let A be a coplex atrix with zero trace. Then there are atrices B and C such that

More information

PAC-Bayes Analysis Of Maximum Entropy Learning

PAC-Bayes Analysis Of Maximum Entropy Learning PAC-Bayes Analysis Of Maxiu Entropy Learning John Shawe-Taylor and David R. Hardoon Centre for Coputational Statistics and Machine Learning Departent of Coputer Science University College London, UK, WC1E

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

Variations on Backpropagation

Variations on Backpropagation 2 Variations on Backpropagation 2 Variations Heuristic Modifications Moentu Variable Learning Rate Standard Nuerical Optiization Conjugate Gradient Newton s Method (Levenberg-Marquardt) 2 2 Perforance

More information

HESSIAN MATRICES OF PENALTY FUNCTIONS FOR SOLVING CONSTRAINED-OPTIMIZATION PROBLEMS

HESSIAN MATRICES OF PENALTY FUNCTIONS FOR SOLVING CONSTRAINED-OPTIMIZATION PROBLEMS R 702 Philips Res. Repts 24, 322-330, 1969 HESSIAN MATRICES OF PENALTY FUNCTIONS FOR SOLVING CONSTRAINED-OPTIMIZATION PROBLEMS by F. A. LOOTSMA Abstract This paper deals with the Hessian atrices of penalty

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Introduction to Machine Learning. Recitation 11

Introduction to Machine Learning. Recitation 11 Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Anisotropic reference media and the possible linearized approximations for phase velocities of qs waves in weakly anisotropic media

Anisotropic reference media and the possible linearized approximations for phase velocities of qs waves in weakly anisotropic media INSTITUTE OF PHYSICS PUBLISHING JOURNAL OF PHYSICS D: APPLIED PHYSICS J. Phys. D: Appl. Phys. 5 00 007 04 PII: S00-770867-6 Anisotropic reference edia and the possible linearized approxiations for phase

More information
