Proximal methods for the latent group lasso penalty


arXiv:1209.0368v1 [math.OC] 3 Sep 2012

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation: Villa, Silvia, Lorenzo Rosasco, Sofia Mosci, and Alessandro Verri. "Proximal Methods for the Latent Group Lasso Penalty." Computational Optimization and Applications 58, no. 2 (December 21, 2013).
As published: http://dx.doi.org/
Publisher: Springer US
Version: Author's final manuscript
Accessed: Mon Jan 07 01:50:09 EST 2019
Citable link: http://hdl.handle.net/1721.1/
Terms of use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

Proximal methods for the latent group lasso penalty

Silvia Villa · Lorenzo Rosasco · Sofia Mosci · Alessandro Verri

Received: date / Accepted: date

Abstract We consider a regularized least squares problem, with regularization by structured sparsity-inducing norms, which extend the usual $\ell_1$ and the group lasso penalty by allowing the subsets to overlap. Such regularizations lead to nonsmooth problems that are difficult to optimize, and in this paper we propose a suitable version of an accelerated proximal method to solve them. We prove convergence of a nested procedure, obtained by composing an accelerated proximal method with an inner algorithm for computing the proximity operator. By exploiting the geometrical properties of the penalty, we devise a new active set strategy, thanks to which the inner iteration is relatively fast, thus guaranteeing good computational performance of the overall algorithm. Our approach allows us to deal with high dimensional problems without pre-processing for dimensionality reduction, leading to better computational and prediction performance with respect to state-of-the-art methods, as shown empirically both on toy and real data.

Keywords Structured sparsity · Proximal methods · Regularization

S. Villa, Istituto Italiano di Tecnologia, Italy. E-mail: Silvia.Villa@iit.it
L. Rosasco, CBCL, McGovern Institute, Massachusetts Institute of Technology, USA, and Istituto Italiano di Tecnologia, Italy. E-mail: lrosasco@mit.edu
S. Mosci, DIBRIS, Università di Genova, Italy. E-mail: Sofia.Mosci@unige.it
A. Verri, DIBRIS, Università di Genova, Italy. E-mail: Alessandro.Verri@unige.it

1 Introduction

Sparsity has become a popular way to deal with a number of problems arising in signal and image processing, statistics and machine learning [19]. In a broad sense, it refers to the possibility of writing the solution in terms of a few building blocks. Often, sparsity based methods are the key towards finding interpretable models in real-world problems. For example, sparse regularization with $\ell_1$-type penalties is a powerful approach to find sparse solutions by minimizing a convex functional [48,12,18]. The success of $\ell_1$ regularization motivated exploring different kinds of sparsity properties for regularized optimization problems, exploiting available a priori information which restricts the admissible sparsity patterns of the solution. An example of a sparsity pattern is when the variables are partitioned into groups (known a priori), and the goal is to estimate a sparse model where variables belonging to the same group are either jointly selected or discarded. This problem can be solved by regularizing with the group $\ell_1$ penalty, also known as the group lasso penalty [52]. The latter is the sum, over the groups, of the Euclidean norms of the coefficients restricted to each group. Note that, for any $p>1$, the same group-wise selection can be achieved by regularizing with the $\ell_1/\ell_p$ norm, i.e. the sum over the groups of the $\ell_p$ norm of the coefficients restricted to each group.

A possible generalization of the group lasso penalty is obtained by considering groups of variables which can potentially overlap [53,24], the goal being to estimate a model whose support is a union of groups. For example, this is a common situation in bioinformatics (especially in the context of high-throughput data such as gene expression and mass spectrometry data), where problems are characterized by a very low number of samples with several thousands of variables. In fact, when the number of samples is not sufficient to guarantee accurate model estimation, a possible solution is to take advantage of the huge amount of prior knowledge encoded in online databases such as the Gene Ontology [15]. Largely motivated by applications in bioinformatics, the latent group lasso with overlap penalty was proposed in [22], and further studied in [36,2] and, in the image processing context, in [38]; it generalizes the $\ell_1/\ell_2$ penalty to overlapping groups, thus satisfying the assumption that the admissible sparsity patterns must be unions of a subset of the groups.

All the methods proposed in the literature solve the minimization problem arising in [22] by applying state-of-the-art techniques for group lasso in an expanded space, called the space of latent variables, built by duplicating variables that belong to more than one group. The most popular optimization strategies that have been proposed are interior-point methods [3,37], block coordinate descent [28], proximal methods [43,31,38,26,13] and the related alternating direction method [16]. Very recently, the paper [40] proposed an accelerated alternating direction method, and [41] studied a block coordinate descent, along with a proximal method with variable step-sizes. As already noted in [22], though very natural, every implementation developed in the latent variables does not scale to large datasets: when the groups have significant overlap, a more scalable algorithm with no data duplication

is needed. For this reason we propose an alternative optimization approach to solve the group lasso problem with overlap, and extend it to the entire family of group lasso with overlap penalties, which generalize the $\ell_1/\ell_p$ penalties to overlapping groups for $p>1$. Our method is a two-loop iterative scheme based on proximal methods (see for example [33,7,6]), and more precisely on the accelerated version named FISTA [6]. It does not require explicit replication of the variables and is thus more appropriate for dealing with high dimensional problems with large group overlap. In fact, the proximity operator can be efficiently computed by exploiting the geometrical properties of the penalty. We show that such an operator can be written as the identity minus the projection onto a suitable convex set, which is the intersection of as many convex sets as the number of active groups, that is, groups corresponding to active constraints, which can be easily found. Indeed, the identification of the active groups is a key step, since it allows computing the projection in a reduced space. For general $p$, the projection can be computed via the Cyclic Projections algorithm [4]. Furthermore, for the case $p=2$, we present an accelerated scheme, where the reduced projection is computed by solving a corresponding dual problem via the projected Newton method [8], thus working in a much lower dimensional space.

The present paper completes and extends the preliminary results presented in the short conference version [32]. In particular, it contains a general mathematical presentation and all the proofs, which were omitted in [32]. We next describe how the rest of the paper is organized, and then highlight the main novelties with respect to the short version. In Section 2, we cast the problem of Group-wise Selection with Overlap (GSO) as a regularization problem based on a modified $\ell_1/\ell_p$-type penalty and compare it with other structured sparsity penalties. We extend the approach in [32] for $p=2$ to general $p>1$. In Section 3, we describe the derivation of the proposed optimization scheme, and prove its convergence. Precisely, we first recall proximal methods in Subsection 3.1; then, in Subsection 3.2, we describe the technical results that ease the computation of the proximity operator as a simplified projection, and present different projection algorithms depending on $p$. With respect to [32], we show that our active set strategy can be profitably used in this generalized framework in combination with any algorithm chosen to compute the inner projection. Furthermore, to solve the projection for general $p\in(1,+\infty]$, we discuss the use of a cyclic projections algorithm, whose convergence in norm is guaranteed and results in a rate of convergence for the proposed proximal method, proved in Subsection 3.3. Section 4 is a substantial extension of the experiments performed in [32]. We empirically analyze the computational performance of our optimization procedure. We first study the performance of the different variations of the proposed optimization scheme. Then we present a set of numerical experiments comparing the running time of our algorithm with state-of-the-art techniques. We conclude with a real data experiment where we show that the improved computational performance allows dealing with large data sets without preprocessing, thus improving also the prediction and selection performance. Finally, in Appendix B we review the projected Newton

method [8].

Notation. Given a vector $x \in \mathbb{R}^d$, we denote by $\|x\|_p$ the $\ell_p$-norm of $x$, defined as $\|x\|_p = \big(\sum_{j=1}^d |x_j|^p\big)^{1/p}$, and $\|x\|_\infty = \max_{j\in\{1,\dots,d\}} |x_j|$. We also use the notation $\|x\|_{G,p} = \big(\sum_{j\in G} |x_j|^p\big)^{1/p}$ for $p\ge 1$, and $\|x\|_{G,\infty} = \max_{j\in G}|x_j|$, to denote the $\ell_p$-norm of the components of $x$ in $G\subseteq\{1,\dots,d\}$. When the subscript $p$ is omitted, the $\ell_2$ norm is used, $\|\cdot\| = \|\cdot\|_2$. The conjugate exponent of $p$ is denoted by $q$; we recall that $q$ is such that $1/p + 1/q = 1$. In the following, $X$ will denote $\mathbb{R}^d$ and $Y$ a bounded interval in $\mathbb{R}$.

2 Group-wise selection with Overlap (GSO)

This paper proposes an optimization algorithm for a regularized least-squares problem of the type

$$\min_{x\in\mathbb{R}^d} E^p_\tau(x), \qquad E^p_\tau(x) = \frac{1}{n}\|\Psi x - y\|^2 + 2\tau\,\Omega^G_p(x), \qquad (\text{GSO-}p)$$

where $\Psi:\mathbb{R}^d\to\mathbb{R}^n$ is a linear operator, $y\in\mathbb{R}^n$, and $\Omega^G_p:\mathbb{R}^d\to[0,+\infty)$ is a convex and lower semicontinuous penalty, depending on a parameter $p\in(1,+\infty]$ and on an a priori given group structure $G = \{G_r\}_{r=1}^B$, with $G_r\subseteq\{1,\dots,d\}$ and $\bigcup_{r=1}^B G_r = \{1,\dots,d\}$. Note that other data fit terms could be used, different from the quadratic one, as long as they are convex and continuously differentiable with Lipschitz continuous gradient. We focus on least squares to simplify the exposition.

Most group sparsity penalties can be built starting from the family of canonical linear projections onto the subspaces identified by the indices belonging to $G_r$, i.e. $P_r:\mathbb{R}^d\to\mathbb{R}^{G_r}$. The definition of the penalties we consider is based on the adjoint of the linear operator $P:\mathbb{R}^d\to\prod_{r=1}^B\mathbb{R}^{G_r}$, $Px = (P_1x,\dots,P_Bx)$, that is the operator $P^*:\prod_{r=1}^B\mathbb{R}^{G_r}\to\mathbb{R}^d$, $P^*(v_1,\dots,v_B) = \sum_{r=1}^B P^*_r v_r$, where $P^*_r:\mathbb{R}^{G_r}\to\mathbb{R}^d$ is the canonical injection. For $x\in\mathbb{R}^d$ we set

$$\Omega^G_p(x) = \min_{\substack{v\in\prod_{r=1}^B\mathbb{R}^{G_r}\\ P^*v = x}} \sum_{r=1}^B \|v_r\|_p. \qquad (1)$$

For $p=2$, the functional $\Omega^G_2$ was introduced in [22] (see also [36,2]). The distinctive feature of the family of penalties $\Omega^G_p$ is that they induce group-wise selection, that is, they lead to solutions whose support (i.e. the set of non-zero entries) is the union of a subset of the groups defined a priori.
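As a running convention for the illustrative sketches below (our own additions, not the authors' code; all names are hypothetical), the group norms and conjugate exponents of the Notation paragraph can be fixed in a few lines of Python:

import numpy as np

def conjugate_exponent(p: float) -> float:
    """Return q with 1/p + 1/q = 1 (q = 1 for p = infinity)."""
    if np.isinf(p):
        return 1.0
    return p / (p - 1.0) if p > 1.0 else np.inf

def group_norm(x: np.ndarray, G: list, p: float) -> float:
    """||x||_{G,p}: the l_p norm of the components of x indexed by G."""
    xg = x[G]
    return np.max(np.abs(xg)) if np.isinf(p) else np.linalg.norm(xg, ord=p)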

In fact, $\Omega^G_p$ can be seen as a generalization of the mixed $\ell_1/\ell_p$ norms, originally introduced for disjoint groups:

$$R^G_p(x) = \sum_{r=1}^B \|x\|_{G_r,p}, \qquad p\ge 1.$$

For $p=2$, $R^G_p$ is the group lasso penalty, and it is well known [52] that such penalties lead to solutions whose support is the union of a small number of groups. The penalty $R^G_p$ can be written also when the groups overlap, and more generally the composite absolute penalties (CAP)

$$J^G_{\gamma,p}(x) = \sum_{r=1}^B \big(\|x\|_{G_r,p}\big)^\gamma,$$

first introduced in [53] and coinciding with $R^G_p$ for $\gamma=1$, have been intensively studied. The $J_{\gamma,p}$ penalties allow dealing with complex group structures involving hierarchies or graphs, and it is proved in [24] that the CAP penalties constrain the support to be the complement of a union of groups. $\Omega^G_p$ and $R^G_p$ are thus somehow complementary and have different domains of application [24,27,25]. While many algorithms have been proposed to solve the optimization problem corresponding to $R^G_p$, the one corresponding to $\Omega^G_p$ is much less studied. This is due on the one hand to the fact that the penalty is more complex, and on the other hand to the widespread use of the replication strategy. The latter is based on the observation that, using the definition of $\Omega^G_p$ and the surjectivity of $P^*$, the (GSO-$p$) minimization problem can be written as

$$\min_{v\in\prod_{r=1}^B\mathbb{R}^{G_r}} \frac{1}{n}\|\Psi P^* v - y\|^2 + 2\tau\sum_{r=1}^B\|v_r\|_p, \qquad (2)$$

which is a group lasso problem without overlap for the linear operator $\Psi P^*$ in the so-called latent variables $(v_r)_{r=1}^B$, obtained by replicating variables belonging to more than one group. This rewriting allows applying any algorithm developed for the standard group lasso to the overlapping case, but the strategy is not feasible for high dimensional problems with large group overlaps, as potentially many artificial dimensions are created. The main goal of this paper is to propose and study an optimization algorithm which does not require the replication of variables belonging to more than one group.

The choice $p>1$ has both technical and practical motivations. On the one hand, it guarantees convexity of the penalty, which can be shown to be a norm (see Lemma 1 in [22] for $p=2$), and, as a consequence, convexity of the (GSO-$p$) regularization problem (note that this is valid for $p=1$ too). On the other hand, it enforces democracy among the elements that belong to the same group, in the sense that no intra-group sparsity is enforced, thus inducing group-wise selection.
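To make the replication strategy behind (2) concrete, the following sketch (ours; `expansion_matrix` is a hypothetical name) builds the adjoint $P^*$ as a dense 0/1 matrix, so that (2) becomes a plain non-overlapping group lasso for the operator $\Psi P^*$ acting on the stacked latent vector:

import numpy as np

def expansion_matrix(groups: list, d: int) -> np.ndarray:
    """Dense matrix of P*: maps the stacked latent variables v = (v_1,...,v_B),
    living in R^{d~} with d~ = sum_r |G_r|, to x = sum_r P_r* v_r in R^d."""
    d_tilde = sum(len(G) for G in groups)
    Pstar = np.zeros((d, d_tilde))
    col = 0
    for G in groups:
        for j in G:
            Pstar[j, col] = 1.0   # copy latent coordinate back to position j
            col += 1
    return Pstar

# With Psi of shape (n, d), problem (2) is a standard group lasso for the
# matrix Psi @ expansion_matrix(groups, d), over the (now disjoint) blocks
# of the stacked latent vector; overlapping variables are simply duplicated.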

The case $p=1$ is trivial, since the penalty $\Omega^G_1$ coincides with the $\ell_1$ norm, or lasso penalty [48]:

$$\Omega^G_1(x) = \inf_{\substack{(v_r)\in\prod_r\mathbb{R}^{G_r}\\ P^*v=x}} \sum_{r=1}^B\sum_{j\in G_r} |(v_r)_j| = \inf_{\substack{(v_r)\in\prod_r\mathbb{R}^{G_r}\\ P^*v=x}} \sum_{j=1}^d\sum_{r:\,j\in G_r} |(v_r)_j| = \sum_{j=1}^d |x_j|,$$

and is thus independent of $G$.

Example 1 A particular instance of the above problem occurs in statistical learning theory. Given a set $X$, a set $Y\subseteq\mathbb{R}$, and, for $j=1,\dots,d$, a function $\psi_j:X\to\mathbb{R}$ (the collection $\{\psi_j \mid j=1,\dots,d\}$ is called a dictionary), the family of functions $\{\sum_{j=1}^d x_j\psi_j \mid x\in\mathbb{R}^d\}$ is called a generalized linear model. If the estimator and the regression function belong to a generalized linear model, then, given a training set $\{(t_i,y_i)\}_{i=1}^n\subseteq(X\times Y)^n$, the regularized empirical risk takes the form $\frac{1}{n}\|\Psi x - y\|^2 + 2\tau\Omega^G_p(x)$, with $\Psi:\mathbb{R}^d\to\mathbb{R}^n$, $(\Psi x)_i = \sum_{j=1}^d x_j\psi_j(t_i)$, and $y = (y_1,\dots,y_n)$.

Example 2 Most results obtained in the paper hold in an infinite dimensional setting. In particular, our approach can be naturally extended to the multiple kernel learning (MKL) problem [29]. For this problem, given reproducing kernel Hilbert spaces $H_1,\dots,H_m$ of functions $g:X\to\mathbb{R}$, and defining $H=\prod_{r=1}^m H_r$, the resulting optimization problem takes the form (see [29])

$$\min_{g_r\in H_r} \Big\|\Psi\Big(\sum_r g_r\Big) - y\Big\|^2 + \sum_{r=1}^m \|g_r\|_{H_r},$$

for a suitable $\Psi:H\to\mathbb{R}^n$ and $y\in\mathbb{R}^n$. As can be readily seen, the multiple kernel learning problem has the same structure as the (GSO-$p$) problem described above.
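As a concrete companion to Example 1 above, this short sketch (ours; names and the toy dictionary are hypothetical) assembles the matrix $\Psi$ with entries $\Psi_{ij} = \psi_j(t_i)$ from a dictionary of functions:

import numpy as np

def design_matrix(dictionary, t: np.ndarray) -> np.ndarray:
    """Psi with entries Psi[i, j] = psi_j(t_i), as in Example 1.
    `dictionary` is a list of callables psi_j: X -> R."""
    return np.column_stack([np.array([psi(ti) for ti in t])
                            for psi in dictionary])

# Toy usage with a small polynomial dictionary on X = R:
t = np.linspace(-1.0, 1.0, 5)
Psi = design_matrix([lambda s: 1.0, lambda s: s, lambda s: s**2], t)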

3 An efficient proximal algorithm

Due to the non-smoothness of the penalty term, solving the (GSO-$p$) minimization problem is not trivial. Moreover, if one needs to solve the (GSO-$p$) problem for high dimensional data, the use of standard second-order methods such as interior-point methods is precluded (see for instance [7]), since they need to solve large systems of linear equations to compute the Newton steps. On the other hand, first order methods inspired by Nesterov's seminal paper [34] (see also [33]) and based on the proximal map are accurate and robust, in the sense that their performance does not depend on the fine tuning of various controlling parameters. Furthermore, these methods have already proved to be a computationally efficient alternative for solving many regularized inverse problems in image processing [11], compressed sensing [7] and machine learning applications [2,17,31].

3.1 Proximal methods

The (GSO-$p$) regularized convex functional is the sum of a convex smooth term, $F(x) = \frac{1}{n}\|\Psi x - y\|^2$, with Lipschitz continuous gradient, and a nondifferentiable penalty $\tau\Omega^G_p(\cdot)$. A minimizing sequence can be computed with a proximal gradient algorithm [49] (a.k.a. forward-backward splitting method [14], or Iterative Shrinkage Thresholding Algorithm (ISTA) [6]),

$$x_m = \mathrm{prox}_{\frac{\tau}{\sigma}\Omega^G_p}\Big(x_{m-1} - \frac{1}{2\sigma}\nabla F(x_{m-1})\Big), \qquad (\text{ISTA})$$

for a suitable choice of $\sigma$ and any initialization $x_0$. Recently, several accelerations of ISTA have been proposed [35,49,6]. With respect to ISTA, they only require the additional computation of a linear combination of two consecutive iterates. Among them, FISTA (Fast Iterative Shrinkage Thresholding Algorithm) [6] is given by the following updating rule for $m\ge 1$:

$$x_m = \mathrm{prox}_{\frac{\tau}{\sigma}\Omega^G_p}\Big(h_m - \frac{1}{2\sigma}\nabla F(h_m)\Big), \quad s_{m+1} = \frac{1+\sqrt{1+4s_m^2}}{2}, \quad h_{m+1} = \Big(1+\frac{s_m-1}{s_{m+1}}\Big)x_m + \frac{1-s_m}{s_{m+1}}\,x_{m-1}, \qquad (\text{FISTA})$$

for a suitable choice of $\sigma>0$, $s_1=1$, and any initialization $h_1 = x_0$. Both schemes are based on the computation of the proximity operator [30], which is defined as

$$\mathrm{prox}_{\lambda\Omega^G_p}(z) = \operatorname*{argmin}_{x\in\mathbb{R}^d}\Phi_\lambda(x), \quad\text{with}\quad \Phi_\lambda(x) = \frac{1}{2\lambda}\|x-z\|^2 + \Omega^G_p(x), \quad \lambda>0. \qquad (3)$$

The convergence rate of $E^p_\tau(x_m) - \min E^p_\tau$, for ISTA and FISTA, is $O(1/m)$ and $O(1/m^2)$, respectively, when the proximity operator is computed exactly. However, in general, the exact expression is not available. Recently, it has been shown that, also in the presence of errors, the accelerated version maintains its advantages with respect to the basic one. In fact, the rate $O(1/m^2)$ for FISTA in the presence of computational errors was recently proved in [46,51] for various error criteria. Convergence of ISTA with errors was already known, and was first proved in [42,14]. Since the proximity operator of the penalty $\Omega^G_p$ is not available in closed form, the (GSO-$p$) minimization problem can thus be solved via an inexact version of the iterative schemes ISTA or FISTA, where $\nabla F(h_m)$ is simply $2\Psi^T(\Psi h_m - y)/n$.
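For concreteness, here is a minimal NumPy sketch of the FISTA iteration above (our illustration, not the authors' Matlab code); the proximity operator is passed in as a callback, so that either the projection-based operator of Subsection 3.2 or the group-wise one of Subsection 3.5 can be plugged in:

import numpy as np

def fista_gso(Psi, y, prox, tau, n_iter=500):
    """FISTA sketch for (1/n)||Psi x - y||^2 + 2*tau*penalty(x).
    `prox(z, lam)` must (approximately) return prox_{lam*penalty}(z)."""
    n, d = Psi.shape
    sigma = np.linalg.norm(Psi.T @ Psi, 2) / n    # sigma = ||Psi^T Psi|| / n
    x_prev = np.zeros(d)
    x = np.zeros(d)
    h = np.zeros(d)
    s = 1.0
    for m in range(1, n_iter + 1):
        grad = 2.0 * Psi.T @ (Psi @ h - y) / n    # gradient of the data term
        x_prev, x = x, prox(h - grad / (2.0 * sigma), tau / sigma)
        s_next = (1.0 + np.sqrt(1.0 + 4.0 * s * s)) / 2.0
        h = x + ((s - 1.0) / s_next) * (x - x_prev)   # momentum combination
        s = s_next
    return x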

Algorithm 1 FISTA for GSO-$p$
Given: $G$, $p\in(1,+\infty]$, $\tau>0$, $\varepsilon_0>0$, $\alpha>0$, $x_0 = h_0\in\mathbb{R}^d$, $s_0 = 1$
Let: $\sigma = \|\Psi^T\Psi\|/n$, $m = 0$, and $q$ such that $1/p + 1/q = 1$
while convergence not reached do
    $\hat h_m = h_m - \frac{1}{n\sigma}\Psi^T(\Psi h_m - y)$
    Find $\hat G_m = \{G\in G : \|\hat h_m\|_{G,q} > \tau/\sigma\}$
    Approximately compute the projection of $\hat h_m$ onto $\frac{\tau}{\sigma}K^{\hat G_m} := \{h\in\mathbb{R}^d : \|h\|_{G,q}\le\frac{\tau}{\sigma}\ \forall G\in\hat G_m\}$, with tolerance $\varepsilon_0 m^{-\alpha}$
    $x_m = \hat h_m - \pi_{\frac{\tau}{\sigma}K^{\hat G_m}}(\hat h_m)$
    $s_{m+1} = \frac{1+\sqrt{1+4s_m^2}}{2}$
    $h_{m+1} = \big(1+\frac{s_m-1}{s_{m+1}}\big)x_m + \frac{1-s_m}{s_{m+1}}\,x_{m-1}$
end while
return $x_m$

Note that, in the special case of non-overlapping groups, the proximity operator can be explicitly evaluated group-wise, and reduces to a group-wise soft-thresholding operator. In the general case, as explained in Subsection 3.2, the proximity operator can be written in terms of a projection, and we will provide an algorithm to approximately compute it. Note also that we will show that at each step the projection involves only a subset of the initial groups, the active groups, thus significantly increasing the computational performance of the overall algorithm.

3.2 Computing the proximity operator of $\Omega^G_p$

In this subsection we state the lemmas that allow us to efficiently compute the proximity operator of $\Omega^G_p$ and to formulate the inexact version of FISTA reported in Algorithm 1. As a direct consequence of standard results of convex analysis, Lemma 1 shows that the computation of the proximity operator amounts to the computation of a projection operator onto the intersection of convex sets, each of them corresponding to a group. In Lemma 2, we theoretically justify an active set strategy, by showing that when projecting a vector onto this intersection, it is possible to discard the constraints which are already satisfied. In the following, given a convex and closed subset $A$ (of $\mathbb{R}^d$ for some $d$), we denote by $\pi_A$ the associated projection: for $x\in\mathbb{R}^d$, $\pi_A(x) = \operatorname{argmin}_{y\in A}\|y-x\|$.
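The shape of the resulting prox step can be sketched as follows (ours, with hypothetical names; Lemma 1 and Lemma 2 below make both steps rigorous). The inner projection is delegated to any routine operating on the active groups only:

import numpy as np

def group_norm_q(v, q):
    """l_q norm used for the constraint sets (q conjugate to p)."""
    return np.max(np.abs(v)) if np.isinf(q) else np.linalg.norm(v, ord=q)

def active_groups(z, groups, lam, q):
    """Lemma 2: only groups violating their constraint, ||z_G||_q > lam,
    matter when projecting z onto lam*K^G; the rest can be discarded."""
    return [G for G in groups if group_norm_q(z[G], q) > lam]

def prox_omega(z, groups, lam, q, project_reduced):
    """prox_{lam*Omega_p^G}(z) = z - pi_{lam K^G}(z) (Lemma 1), with the
    projection computed on the active groups only.  `project_reduced` is
    any inner solver, e.g. cyclic projections (Algorithm 2) or the dual
    method of Subsection 3.2.2."""
    active = active_groups(z, groups, lam, q)
    if not active:                  # z already lies in lam*K^G: the prox is 0
        return np.zeros_like(z)
    return z - project_reduced(z, active, lam, q)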

Lemma 1 For any $\lambda>0$ and $p\ge 1$, the proximity operator of $\lambda\Omega^G_p$, where $\Omega^G_p$ is defined in (1), is given by

$$\mathrm{prox}_{\lambda\Omega^G_p} = I - \pi_{\lambda K^G_p},$$

where $K^G_p$ is given by

$$K^G_p = \{x\in\mathbb{R}^d : \|x\|_{G_r,q}\le 1 \ \text{for}\ r=1,\dots,B\}. \qquad (4)$$

The proof exploits the particular definition of the penalty and relies on the Moreau decomposition

$$\mathrm{prox}_{\lambda\Omega}(x) = x - \lambda\,\mathrm{prox}_{\frac{1}{\lambda}\Omega^*}\Big(\frac{x}{\lambda}\Big). \qquad (5)$$

Formula (5) allows computing the proximity operator of $\Omega$ starting from the proximity operator of its Fenchel conjugate. In our case, $\Omega^G_p$ being one-homogeneous, we obtain the identity minus the projection onto a closed and convex set. The particular geometry of $K^G_p$, which is the intersection of $B$ convex generalized cylinders, each centered on a coordinate subspace, derives from the definition of $\Omega^G_p$ and the explicit computation of its Fenchel conjugate. Observe that by definition $\Omega^G_p$ is the infimal convolution of $B$ functions, precisely the $B$ norms on $\mathbb{R}^{G_r}$ composed with the projections. By standard properties of the Fenchel conjugate, it follows that $(\Omega^G_p)^* = \sum_{r=1}^B \iota_q\circ P_r$, where $\iota_q$ is the conjugate of $\|\cdot\|_p$, i.e. the indicator function of the $\ell_q$ unit ball in $\mathbb{R}^{G_r}$, defined as

$$\iota_q(v) = \begin{cases} 0 & \text{if } \|v\|_q\le 1\\ +\infty & \text{otherwise.}\end{cases}$$

We give here a self-contained proof which does not use the notion of infimal convolution. A different proof for the case $p=2$ is given in [36].

Proof We start by computing explicitly the Fenchel conjugate of $\Omega^G_p$. By definition,

$$(\Omega^G_p)^*(u) = \sup_{x\in\mathbb{R}^d}\Big[\langle x,u\rangle - \min_{\substack{v\in\prod_r\mathbb{R}^{G_r}\\ P^*v=x}}\sum_{r=1}^B\|v_r\|_p\Big] = \sup_{x\in\mathbb{R}^d}\ \sup_{\substack{v\in\prod_r\mathbb{R}^{G_r}\\ P^*v=x}}\Big[\langle x,u\rangle - \sum_{r=1}^B\|v_r\|_p\Big]$$
$$= \sup_{v}\Big[\Big\langle \sum_{r=1}^B P^*_r v_r, u\Big\rangle - \sum_{r=1}^B\|v_r\|_p\Big] = \sum_{r=1}^B\ \sup_{v_r\in\mathbb{R}^{G_r}}\big[\langle v_r, P_r u\rangle - \|v_r\|_p\big] = \sum_{r=1}^B \iota_q(P_r u),$$

where $\iota_q$ is the Fenchel conjugate of $\|\cdot\|_p$, i.e. the indicator function of the $\ell_q$ unit ball in $\mathbb{R}^{G_r}$. We can rewrite the sum of indicator functions as $\sum_{r=1}^B\iota_q(P_r u) = \iota_{K^G_p}(u)$. It is well known that $\mathrm{prox}_{\lambda\iota_{K^G_p}}(x) = \pi_{K^G_p}(x)$.

Using the Moreau decomposition (5) and basic properties of the projection, we obtain

$$\mathrm{prox}_{\lambda\Omega^G_p}(x) = x - \lambda\,\pi_{K^G_p}(x/\lambda) = x - \pi_{\lambda K^G_p}(x). \qquad (6)$$

The following lemma shows that, when evaluating the projection $\pi_{\lambda K^G_p}(x)$, we can restrict ourselves to a subset of active groups, denoted by $\hat G$ and defined in Lemma 2. This equivalence is crucial to speed up Algorithm 1; in fact, the number of active groups at iteration $m$ will converge to the number of selected groups, which is typically small if one is interested in sparse solutions.

Lemma 2 Given $x\in\mathbb{R}^d$ and $\lambda>0$, it holds that

$$\pi_{\lambda K^G_p}(x) = \pi_{\lambda K^{\hat G}_p}(x), \qquad (7)$$

where $\hat G := \{G\in G : \|x\|_{G,q} > \lambda\}$.

Proof Given a group of indices $G$ and a number $p>1$, we denote by $C_{G,p}$ the convex set $C_{G,p} = \{x\in\mathbb{R}^d : \|x\|_{G,q}\le 1\}$. To prove the result, we first show that for any subset $S\subseteq G$ the projection onto the intersection $\lambda K^S_p = \bigcap_{G\in S}\lambda C_{G,p}$ is non-expansive coordinate-wise with respect to zero. More precisely, for all $x\in\mathbb{R}^d$ it holds that $|\pi_{\lambda K^S_p}(x)_i|\le|x_i|$ for all $i=1,\dots,d$ and for all $\lambda>0$. By contradiction, assume that there exists an index $\hat j$ such that $|\pi_{\lambda K^S_p}(x)_{\hat j}| > |x_{\hat j}|$. Consider the vector $\bar x$ defined by setting

$$\bar x_j = \begin{cases} \pi_{\lambda K^S_p}(x)_j & \text{if } j\ne\hat j\\ x_{\hat j} & \text{otherwise.}\end{cases}$$

First note that $\bar x\in\lambda K^S_p$, since $\|\bar x\|_{G,q}\le\|\pi_{\lambda K^S_p}(x)\|_{G,q}\le\lambda$ for all $G\in S$. On the other hand,

$$\|\bar x - x\|^2 = \sum_{\substack{j=1\\ j\ne\hat j}}^d (\bar x_j - x_j)^2 < \|x - \pi_{\lambda K^S_p}(x)\|^2,$$

which is a contradiction. To conclude, suppose that $x\in\lambda K^S_p$, with $S\subseteq G$. If we prove that $\pi_{\lambda K^G_p}(x) = \pi_{\lambda K^{G\setminus S}_p}(x)$, we are done. For the sake of brevity, denote $v = \pi_{\lambda K^{G\setminus S}_p}(x)$. Thanks to the non-expansivity property it follows that $|v_j|\le|x_j|$ for all $j=1,\dots,d$, and therefore $v\in\lambda K^S_p$. Since $v\in\lambda K^{G\setminus S}_p$ by hypothesis, we get that $v\in\lambda K^G_p$. Furthermore, by definition of the projection, $\|v-x\|\le\|w-x\|$ for every $w\in\lambda K^{G\setminus S}_p$, and a fortiori $\|v-x\|\le\|w-x\|$ for every $w\in\lambda K^G_p$.

3.2.1 The projection onto $K^G_p$ for general $p$

The convex set $K^G_p$ is an intersection of convex sets; precisely, $K^G_p = \bigcap_{G\in G} C_{G,p}$, where $C_{G,p} = \{v\in\mathbb{R}^d : \|v\|_{G,q}\le 1\}$. For general $p$, a possible minimization scheme for computing the projection in (7) can be obtained by applying the Cyclic Projections algorithm [9] or one of its modified versions (see [4] and references therein). In the particular case $p=2$, we describe the Lagrangian dual problem corresponding to the projection onto $K^G_2$, and we propose an alternative optimization scheme, the projected Newton method [8], which better exploits the geometry of the set $K^G_2$ and in practice proves to be faster than the Cyclic Projections algorithm. Note that, in order to satisfy the hypotheses of Theorem 2, the tolerance for stopping the iteration must decrease with the outer iteration $m$.

A simple way to compute the projection onto the intersection of convex sets is given by the Cyclic Projections algorithm [9], which amounts to cyclically projecting onto each set. We recall, as Algorithm 2, a modification of the Cyclic Projections algorithm proposed in [4], for which strong convergence is guaranteed (see Theorem 4.1 in [4]).

Algorithm 2 Cyclic Projections
Given $x\in\mathbb{R}^d$, $\{C_{G_1,p},\dots,C_{G_B,p}\}$
Let $l = 0$, $w_0 = x$, and find $C_{\hat G_1,p},\dots,C_{\hat G_{\hat B},p}$
while convergence not reached do
    $l = l+1$
    Let $\pi_l$ be the projection onto $\tau C_{\hat G_{l \bmod \hat B},p}$
    $w_l = \frac{1}{l+1}x + \frac{l}{l+1}\pi_l(w_{l-1})$
end while

In the following we describe how to compute each projection $\pi_{C_{G,p}}$ for specific values of $p$ and an arbitrary group $G$.

$p=2$. In this case $q=2$, and the projection is trivial:

$$[\pi_{\tau C_{G,2}}(w)]_j = \begin{cases} \dfrac{\tau}{\|w\|_{G,2}}\,w_j & \text{if } j\in G \text{ and } \|w\|_{G,2} > \tau\\ w_j & \text{otherwise.}\end{cases}$$

$p=\infty$. In this case $q=1$, and $C_{G,\infty}$ is an $\ell_1$ ball when restricted to the coordinates in $G$. From Lemma 4.2 in [20], we have that if $\|w\|_1 > \tau$, then the projection of $w$ onto the $\ell_1$ ball of radius $\tau$, $\tau B_1$, is given by the soft-thresholding operation

$$[\pi_{\tau B_1}(w)]_j = (|w_j| - \mu)_+\,\mathrm{sign}(w_j),$$

where $\mu$ (depending on $w$ and $\tau$) is chosen such that $\sum_j(|w_j|-\mu)_+ = \tau$. We recall a simple procedure provided in [20] for determining $\mu$. In a first step, sort the absolute values of the components of $w$, obtaining the rearranged sequence $w^*_j \ge w^*_{j+1} \ge 0$ for all $j$. Next, perform a search to find $k$ such that

$$\sum_{j=1}^{k-1}(w^*_j - w^*_k) \le \tau \le \sum_{j=1}^{k}(w^*_j - w^*_{k+1}).$$

Then set

$$\mu = \frac{1}{k}\Big(\sum_{j=1}^{k} w^*_j - \tau\Big).$$

$p\notin\{2,\infty\}$. In these cases no closed form for the projection onto the set $C_{G,p}$ is known, but it can be efficiently computed using Newton's method, as done in [23].
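The two explicit projections and the averaged cycle of Algorithm 2 translate directly into code. The sketch below is ours (the stopping rule and all names are our choices, not the paper's implementation):

import numpy as np

def project_l2_cylinder(w, G, tau):
    """Projection onto {v : ||v||_{G,2} <= tau}: rescale the G-block if needed."""
    v = w.copy()
    nrm = np.linalg.norm(w[G])
    if nrm > tau:
        v[G] *= tau / nrm
    return v

def project_l1_cylinder(w, G, tau):
    """Projection onto {v : ||v||_{G,1} <= tau} by soft-thresholding (p = inf)."""
    v = w.copy()
    wg = w[G]
    if np.abs(wg).sum() <= tau:
        return v
    u = np.sort(np.abs(wg))[::-1]                      # sorted |w_G|, decreasing
    css = np.cumsum(u)
    k = np.nonzero(u - (css - tau) / np.arange(1, len(u) + 1) > 0)[0][-1] + 1
    mu = (css[k - 1] - tau) / k                        # so that sum (|w_j|-mu)_+ = tau
    v[G] = np.sign(wg) * np.maximum(np.abs(wg) - mu, 0.0)
    return v

def cyclic_projections(x, cylinders, tol=1e-6, max_iter=10000):
    """Algorithm 2: strongly convergent averaged cycle
    w_l = x/(l+1) + (l/(l+1)) * P_l(w_{l-1}), cycling over the sets."""
    w = x.copy()
    for l in range(1, max_iter + 1):
        proj = cylinders[(l - 1) % len(cylinders)]     # next set in the cycle
        w_new = x / (l + 1) + (l / (l + 1)) * proj(w)
        if np.linalg.norm(w_new - w) <= tol * (np.linalg.norm(w) + 1e-12):
            return w_new
        w = w_new
    return w

# Gluing to prox_omega above (p = 2 case):
# project_reduced = lambda z, active, lam, q: cyclic_projections(
#     z, [lambda w, G=G: project_l2_cylinder(w, G, lam) for G in active])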

3.2.2 The projection onto $K^G_2$ for $p=2$

When $p=2$, the projection onto $\tau K^G_2$ amounts to solving the constrained minimization problem

$$\text{minimize } \|v-x\|^2 \quad\text{subject to}\quad v\in\mathbb{R}^d,\ \|v\|_{G,2}\le\tau \ \text{for } G\in\hat G, \qquad (8)$$

whose Lagrangian dual problem can be written in closed form. Working on the dual is advantageous, since the number of groups is typically much smaller than $d$, and furthermore Lemma 2 guarantees that one can restrict to the subset of groups

$$\hat G := \{G\in G : \|x\|_{G,2} > \tau\} =: \{\hat G_1,\dots,\hat G_{\hat B}\}, \qquad (9)$$

which in general is a proper subset of $G$. In the following theorem we show how to compute the solution of problem (8) by solving the associated dual problem.

Theorem 1 Given $x\in\mathbb{R}^d$, $G = \{G_r\}_{r=1}^B$ with $G_r\subseteq\{1,\dots,d\}$, $\hat G$ as in (9), and $\tau>0$, the projection of $x$ onto the convex set $\tau K^G_2 = \{v\in\mathbb{R}^d : \|v\|_{G_r,2}\le\tau \text{ for } r=1,\dots,B\}$ is given by

$$[\pi_{\tau K^G_2}(x)]_j = \frac{x_j}{1+\sum_{r=1}^{\hat B}\lambda^*_r 1_{r,j}} \qquad\text{for } j=1,\dots,d, \qquad (10)$$

where $\lambda^*$ is the solution of $\operatorname{argmax}_{\lambda\in\mathbb{R}^{\hat B}_+} f(\lambda)$, with

$$f(\lambda) = \|x\|^2 - \sum_{j=1}^d \frac{x_j^2}{1+\sum_{r=1}^{\hat B} 1_{r,j}\lambda_r} - \tau^2\sum_{r=1}^{\hat B}\lambda_r, \qquad (11)$$

and $1_{r,j}$ equal to 1 if $j$ belongs to group $\hat G_r$ and 0 otherwise.

Proof The Lagrangian function for the minimization problem (8) is defined as

$$L(v,\lambda) = \|v-x\|^2 + \sum_{r=1}^{\hat B}\lambda_r\big(\|v\|^2_{\hat G_r,2} - \tau^2\big) = \sum_{j=1}^d (v_j - x_j)^2 + \sum_{r=1}^{\hat B}\lambda_r\Big(\sum_{j=1}^d 1_{r,j}v_j^2 - \tau^2\Big)$$
$$= \sum_{j=1}^d \Big(1+\sum_{r=1}^{\hat B} 1_{r,j}\lambda_r\Big)\Big(v_j - \frac{x_j}{1+\sum_{r=1}^{\hat B}1_{r,j}\lambda_r}\Big)^2 - \sum_{j=1}^d \frac{x_j^2}{1+\sum_{r=1}^{\hat B}1_{r,j}\lambda_r} - \sum_{r=1}^{\hat B}\lambda_r\tau^2 + \|x\|^2, \qquad (12)$$

where $\lambda\in\mathbb{R}^{\hat B}_+$. The dual function is then

$$f(\lambda) = \inf_{v\in\mathbb{R}^d} L(v,\lambda) = L\Big(\Big(\frac{x_j}{1+\sum_r 1_{r,j}\lambda_r}\Big)_{j=1}^d,\,\lambda\Big) = \|x\|^2 - \sum_{j=1}^d\frac{x_j^2}{1+\sum_{r=1}^{\hat B}1_{r,j}\lambda_r} - \sum_{r=1}^{\hat B}\lambda_r\tau^2.$$

Since strong duality holds, the minimum of (8) equals the maximum of the dual problem, which is therefore

$$\text{maximize } f(\lambda) \quad\text{subject to}\quad \lambda_r\ge 0 \ \text{for } r=1,\dots,\hat B. \qquad (13)$$

Once the solution $\lambda^*$ of the dual problem (13) is obtained, the solution $v^*$ of the primal problem (8) is given by

$$v^*_j = \frac{x_j}{1+\sum_{r=1}^{\hat B}\lambda^*_r 1_{r,j}} \qquad\text{for } j=1,\dots,d.$$

The dual problem can be efficiently solved, for instance, via Bertsekas' projected Newton method described in [8], reported here as Algorithm 5 in the Appendix, where the first and second partial derivatives of $f(\lambda)$ are given by

$$\partial_r f(\lambda) = \sum_{j=1}^d \frac{x_j^2\,1_{r,j}}{\big(1+\sum_{s=1}^{\hat B}1_{s,j}\lambda_s\big)^2} - \tau^2$$

and

$$\partial_r\partial_s f(\lambda) = -\sum_{j=1}^d \frac{2x_j^2\,1_{r,j}1_{s,j}}{\big(1+\sum_{t=1}^{\hat B}1_{t,j}\lambda_t\big)^3} = \begin{cases} 0 & \text{if } \hat G_r\cap\hat G_s = \emptyset\\ -2\sum_{j\in\hat G_r\cap\hat G_s}\dfrac{x_j^2}{\big(1+\sum_{t=1}^{\hat B}1_{t,j}\lambda_t\big)^3} & \text{otherwise.}\end{cases}$$
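The dual objective (11) and its derivatives translate directly into code. In the sketch below (ours; names hypothetical), SciPy's bound-constrained quasi-Newton solver stands in for the projected Newton method of [8], and the primal projection is then recovered via (10):

import numpy as np
from scipy.optimize import minimize

def make_dual(x, groups, tau):
    """Pieces of the dual problem (13) of the projection (8), for p = 2.
    A[r, j] = 1 iff coordinate j belongs to the (active) group groups[r]."""
    A = np.zeros((len(groups), x.size))
    for r, G in enumerate(groups):
        A[r, G] = 1.0

    def f(lam):     # (11): ||x||^2 - sum_j x_j^2/(1+a_j) - tau^2 sum_r lam_r
        a = 1.0 + A.T @ lam
        return x @ x - np.sum(x ** 2 / a) - tau ** 2 * lam.sum()

    def grad(lam):  # d_r f = sum_{j in G_r} x_j^2/(1+a_j)^2 - tau^2
        a = 1.0 + A.T @ lam
        return A @ (x ** 2 / a ** 2) - tau ** 2

    def hess(lam):  # d_r d_s f = -2 sum_{j in G_r and G_s} x_j^2/(1+a_j)^3
        a = 1.0 + A.T @ lam
        return -2.0 * (A * (x ** 2 / a ** 3)) @ A.T

    return A, f, grad, hess

def project_dual(x, groups, tau):
    """Maximize f over lam >= 0, then recover the projection via (10):
    v_j = x_j / (1 + sum_r lam_r 1_{r,j})."""
    A, f, grad, _ = make_dual(x, groups, tau)
    res = minimize(lambda l: -f(l), np.zeros(len(groups)),
                   jac=lambda l: -grad(l),
                   bounds=[(0.0, None)] * len(groups))
    return x / (1.0 + A.T @ res.x)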

Bertsekas' iterative scheme combines the basic simplicity of the steepest descent iteration [44] with the quadratic convergence of the projected Newton method [10]. It does not involve the solution of a quadratic program, thereby avoiding the associated computational overhead. Its convergence properties have been studied in [8] and are briefly mentioned in the next section.

3.3 Convergence analysis of the GSO-$p$ algorithm

In this subsection we clarify the accuracy in the computation of the projection which is required to prove convergence of Algorithm 1. As mentioned above, we rely on recent theorems providing a convergence rate for proximal gradient methods with approximations.

Definition 1 We say that $w$ is an approximation of $\pi_{\frac{\tau}{\sigma}K^G_p}(x)$ with tolerance $\varepsilon$ if $\|w - \pi_{\frac{\tau}{\sigma}K^G_p}(x)\|\le\varepsilon$.

Theorem 2 Let $x_0\in\mathbb{R}^d$ and $\sigma = \|\Psi^T\Psi\|/n$. Assume that $\pi_{\frac{\tau}{\sigma}K^G_p}(x_m)$ in Algorithm 1 is approximately computed at step $m$ with tolerance $\varepsilon_m = \varepsilon_0/m^\alpha$. If $\alpha>2$, there exists a constant $C_I := C_I(p,G,x_0,\sigma,\tau,\alpha)$ such that the iterative update (ISTA) satisfies

$$E^p_\tau\Big(\frac{1}{m}\sum_{i=1}^m x_i\Big) - E^p_\tau(x^*) \le \frac{C_I}{m}. \qquad (14)$$

If $\alpha>4$, there exists a constant $C_F := C_F(p,G,x_0,\sigma,\tau,\alpha)$ such that the iterative update (FISTA) satisfies

$$E^p_\tau(x_m) - E^p_\tau(x^*) \le \frac{C_F}{m^2}. \qquad (15)$$

Proof It is enough to show that there exists a constant $C>0$ (independent of $w_l$ and $x_m$) such that

$$\|w_l - \pi_{\frac{\tau}{\sigma}K^G_p}(x_m)\| \le \frac{\varepsilon_m}{C} \ \implies\ \Phi_{\frac{\tau}{\sigma}}(x_m - w_l) \le \min\Phi_{\frac{\tau}{\sigma}} + \varepsilon_m, \qquad (16)$$

where $\Phi_{\frac{\tau}{\sigma}}$ is defined as in (3). The statement then directly follows from Proposition 1 and Proposition 2 in [46]. In order to prove equation (16), first note that, thanks to the assumption $\bigcup_{r=1}^B G_r = \{1,\dots,d\}$ made at the beginning, it

easily follows from the definition that $\Omega^G_p$ is a norm on $\mathbb{R}^d$, and is therefore equivalent to the Euclidean one. Thus, there exists a constant $A$ (depending only on $p$ and $G$) such that

$$\Omega^G_p(x) - \Omega^G_p(x') \le A\|x - x'\| \qquad \forall x,x'\in\mathbb{R}^d.$$

Next, let $w$ and $x$ be such that

$$\|w - \pi_{\frac{\tau}{\sigma}K^G_p}(x)\| \le \gamma \qquad (17)$$

for some $\gamma>0$ (and suppose w.l.o.g. that $\gamma<1$). By Lemma 1 and by the definitions of $\mathrm{prox}_{\frac{\tau}{\sigma}\Omega^G_p}(x)$ and $\Phi_{\frac{\tau}{\sigma}}$ (see equation (3)), we have $\Phi_{\frac{\tau}{\sigma}}\big(x - \pi_{\frac{\tau}{\sigma}K^G_p}(x)\big) = \min\Phi_{\frac{\tau}{\sigma}}$. Thus, by equation (17), and using the fact that $\Omega^G_p$ is a norm,

$$\begin{aligned}
\Phi_{\frac{\tau}{\sigma}}(x-w) &= \frac{\sigma}{2\tau}\|w\|^2 + \Omega^G_p(x-w)\\
&\le \frac{\sigma}{2\tau}\big\|w-\pi_{\frac{\tau}{\sigma}K^G_p}(x)\big\|^2 + \frac{\sigma}{2\tau}\big\|\pi_{\frac{\tau}{\sigma}K^G_p}(x)\big\|^2 + \frac{\sigma}{\tau}\big\langle w-\pi_{\frac{\tau}{\sigma}K^G_p}(x),\ \pi_{\frac{\tau}{\sigma}K^G_p}(x)\big\rangle\\
&\qquad + \Omega^G_p\big(x - \pi_{\frac{\tau}{\sigma}K^G_p}(x)\big) + \Omega^G_p\big(\pi_{\frac{\tau}{\sigma}K^G_p}(x) - w\big)\\
&\le \min\Phi_{\frac{\tau}{\sigma}} + \frac{\sigma}{2\tau}\gamma^2 + \frac{\sigma}{\tau}\gamma\tilde A + A\gamma \le \min\Phi_{\frac{\tau}{\sigma}} + \Big(\frac{\sigma}{2\tau}\gamma + \frac{\sigma}{\tau}\tilde A + A\Big)\gamma \le \min\Phi_{\frac{\tau}{\sigma}} + C\gamma,
\end{aligned}$$

where $\tilde A$ is such that $\sup_{v\in K^G_p}\|v\|\le\tilde A$ and $C = C(p,G,\sigma,\tau)$. Therefore, equation (16) holds with $C$ as defined above.

By Theorem 3.1 in [4], Algorithm 2 is strongly convergent, and therefore, given arbitrary $\varepsilon>0$ and $x\in\mathbb{R}^d$, there exists an index $l_m := l_m(\varepsilon)$ such that $w_{l_m}$ produced by Algorithm 2 satisfies $\|w_l - \pi_{\tau K^G_p}(x_m)\|\le\varepsilon$ for every $l\ge l_m$. Algorithm 1 combined with Algorithm 2 thus converges to the minimum of the (GSO-$p$) problem with rate $1/m^2$, if the projection is approximately computed with tolerance $\varepsilon_0/m^\alpha$ with $\alpha>4$. Similarly, one can use ISTA instead of FISTA as the updating rule in Algorithm 1, obtaining the convergence rate $1/m$ and setting $\alpha>2$. It is clear that the choice of $\alpha$ defines the stopping rule for the internal algorithm (see Subsection 4.1). As happens for the exact accelerations of the basic forward-backward splitting algorithm such as [34,7,6], convergence of the sequence $x_m$ is no longer guaranteed unless strong convexity is assumed.

Every other algorithm producing admissible approximations can be used in place of Algorithm 2 in the computation of the projection. In the case $p=2$, we tested Bertsekas' projected Newton method, reported in the Appendix as Algorithm 5. Its convergence is not always guaranteed, since there are particular choices of $x$ and $G$ for which the partial Hessian of the dual function is not strictly positive definite, as would be required to ensure strong convergence (see Proposition 3 and Proposition 4 in [8]). However, ideas which are useful for circumventing the same problem for the unconstrained Newton method, such as preconditioning, could easily be adapted to this case, and convergence has always been observed in our experiments (for more details see the discussion in [8] and also the comments at the end of the next subsection).

3.4 Computing the regularization path

In Algorithm 3 we report the complete scheme for computing the regularization path for the Group-wise Selection with Overlap problem (GSO-$p$), i.e. the set of solutions corresponding to different values of the regularization parameter $\tau_1 > \dots > \tau_T$. Note that we employ the continuation strategy proposed in [21].

Algorithm 3 Regularization path for GSO-$p$
Given: $\tau_1 > \tau_2 > \dots > \tau_T$, $G$, $\varepsilon_0 > 0$, $\nu > 0$
Let: $\sigma = \|\Psi^T\Psi\|/n$, $x(\tau_0) = 0$
for $t = 1,\dots,T$ do
    Initialize: $x = x(\tau_{t-1})$
    while convergence not reached do
        update $x$ according to Algorithm 1, with the projection computed via Cyclic Projections or by solving the dual problem
    end while
    $x(\tau_t) = x$
end for
return $x(\tau_1),\dots,x(\tau_T)$

When computing the proximity operator with Bertsekas' projected Newton method, a similar warm start is applied to the inner iteration, since the $m$-th projection is initialized with the solution of the $(m-1)$-th projection. Despite the local nature of Bertsekas' scheme, such an initialization empirically proved to guarantee convergence.
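A minimal sketch of the warm-started sweep of Algorithm 3 (ours; `solve_gso` is a placeholder for any GSO solver, e.g. the FISTA routine sketched earlier):

import numpy as np

def regularization_path(Psi, y, groups, taus, solve_gso):
    """Sweep tau_1 > tau_2 > ... and warm-start each solve from the previous
    solution (continuation strategy of Algorithm 3)."""
    path = []
    x = np.zeros(Psi.shape[1])
    for tau in sorted(taus, reverse=True):  # largest tau first: sparsest solution
        x = solve_gso(Psi, y, groups, tau, x0=x)
        path.append((tau, x.copy()))
    return path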

3.5 The replicates formulation

As discussed in Section 2, the most common method to solve the (GSO-$p$) problem is to minimize the standard group $\ell_1/\ell_p$ regularization (without overlap) in the expanded space of latent variables in (2), built by replicating variables belonging to more than one group, thus working in a $\tilde d$-dimensional space with $\tilde d = \sum_{r=1}^B |G_r|$. Setting $\tilde\Psi = \Psi P^*$ and $R^G_p(v) = \sum_{r=1}^B \|v_r\|_p$, problem (2) can be written as

$$\min_{v\in\prod_{r=1}^B\mathbb{R}^{G_r}} \frac{1}{n}\|\tilde\Psi v - y\|^2 + 2\tau R^G_p(v).$$

The main advantage of such a formulation relies on the possibility of using any state-of-the-art optimization procedure for $\ell_1/\ell_p$ regularization without overlap. In terms of proximal methods, a possible solution is given by Algorithm 3, where the proximity operator can now be computed group-wise as

$$\big((\mathrm{prox}_{\lambda R^G_p}(v))_j\big)_{j\in G_r} = \big(I - \pi_{\lambda S_{G_r,p}}\big)\big((v_j)_{j\in G_r}\big) \qquad\text{for all } r=1,\dots,B,$$

where $S_{G_r,p}$ now denotes the $\ell_q$ unit ball in $\mathbb{R}^{G_r}$. Furthermore, for $p=2$ and $p=+\infty$ each projection can be computed exactly as described in Subsection 3.2.1, and the proximity operator of $R^G_p$ is thus exact. The optimization algorithm for solving (GSO-$p$) via FISTA in the replicated space is reported as Algorithm 4.

Algorithm 4 FISTA for Group-wise Selection without overlap
Given: $v_0\in\prod_{r=1}^B\mathbb{R}^{G_r}$, $\tau>0$, $\sigma = \|\tilde\Psi^T\tilde\Psi\|/n$
Initialize: $m = 0$, $w_1 = v_0$, $s_1 = 1$
while convergence not reached do
    for $r = 1,\dots,B$ do
        $v^m_r = \big(I - \pi_{\frac{\tau}{\sigma}S_{G_r,p}}\big)\Big(\big(w_m - \frac{1}{n\sigma}\tilde\Psi^T(\tilde\Psi w_m - y)\big)_{j\in G_r}\Big)$
    end for
    $s_{m+1} = \frac{1+\sqrt{1+4s_m^2}}{2}$
    $w_{m+1} = \big(1+\frac{s_m-1}{s_{m+1}}\big)v_m + \frac{1-s_m}{s_{m+1}}\,v_{m-1}$
end while
return $v_m$

The replicates formulation involves a much simpler proximity operator, but each iteration has a higher computational cost, since it now depends on $\tilde d$ rather than on $d$, and thus increases with the amount of overlap among the variable subsets (see Section 4 for numerical comparisons between the projection and replication approaches).
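For $p=2$ and non-overlapping blocks, the operator $I - \pi_{\lambda S_{G_r,2}}$ used in Algorithm 4 is exactly the familiar group soft-thresholding; a minimal sketch (ours):

import numpy as np

def group_soft_threshold(v, groups, lam):
    """Exact prox of lam * sum_r ||v_r||_2 for NON-overlapping blocks: each
    block is shrunk toward zero, and zeroed out if its norm is below lam."""
    out = v.copy()
    for G in groups:
        nrm = np.linalg.norm(v[G])
        out[G] = 0.0 if nrm <= lam else (1.0 - lam / nrm) * v[G]
    return out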

4 Numerical experiments

In this section we present numerical experiments aimed at studying the computational performance of the proposed family of optimization algorithms, and at comparing them with state-of-the-art algorithms applied to the replicates formulation.

4.1 Cyclic Projections vs dual formulation

We build $B$ groups $\{G_r\}_{r=1}^B$ of size $b$, with $G_r\subseteq\{1,\dots,d\}$, by randomly drawing sets of $b$ indices from $\{1,\dots,d\}$, and consider the cases $b=10$ and $b=100$. We vary the number of groups $B$, so that the dimension of the expanded space is $\alpha$ times the input dimension, $\tilde d = \alpha d$, with $\alpha = 1.2$, 2 and 5. Clearly this amounts to taking $B = \alpha d/b$. We then generate a vector $x\in\mathbb{R}^d$ by randomly drawing each of its entries from $N(0,1)$. We then pick a value of $\tau$ such that, when computing $\mathrm{prox}_{\tau\Omega^G_2}(x)$, all groups are active; precisely, we take $\tau = 0.8\min_{r=1,\dots,B}\|x\|_{G_r,2}$. We first compute the exact solution $x^* = \mathrm{prox}_{\tau\Omega^G_2}(x)$, taken to be the solution computed via the projected Newton method for the dual problem with a very tight tolerance. Then we compute the approximated solutions with the Cyclic Projections Algorithm 2 and by solving the dual via the projected Newton method. We refer to the former as CP2 and to the latter as dual. We stop the iteration when the distance from the exact solution is less than $\varepsilon$ times the norm of $x$. We consider different values for the tolerance $\varepsilon$; precisely, we take $\varepsilon = 10^{-2}, 10^{-3}, 10^{-4}$. Mean and standard deviation of the computing time over 20 repetitions are plotted in Figures 1 and 2 for each value of $\alpha$ and $\varepsilon$.

Fig. 1 Computing time (in seconds) necessary for evaluating the prox vs the number of variables ($d$), for different values of the overlap degree $\alpha$ and of the tolerance, for fixed group size $b=10$.

The dual formulation is faster than the Cyclic Projections algorithm in most situations. It is convenient to use Cyclic Projections when the number of active groups is high and the required tolerance very low. When computing the projection for Algorithm 1, it is thus reasonable to use Cyclic Projections in the very first outer iterations,

when the tolerance, which depends on the outer iteration, is coarse and the solution, being still far from convergence, may not yet be sparse. After a few iterations, it is more convenient to resort to the dual formulation. Though this is not optimal, in the following experiments, when referring to GSO-2 via projection, we always consider the projection computed with the dual formulation.

Fig. 2 Computing time (in seconds) necessary for evaluating the prox vs the number of variables ($d$), for different values of the overlap degree $\alpha$ and of the tolerance, for fixed group size $b=100$.

4.2 Projection vs replication

In this subsection we compare the running time performance of the proposed set of algorithms, where the proximity operator is computed approximately, with state-of-the-art algorithms used to solve the equivalent formulation in the replicated space. For such a comparison we restrict to $p=2$, since many benchmark algorithms are available in the case of groups that do not overlap. In order to ensure a fair comparison, we first run some preliminary experiments to identify the fastest codes for group $\ell_1$ regularization with no overlap.

4.2.1 Comparison without overlap

Recently there has been very active research on this topic, see e.g. [40,41,13]. For the comparison, we considered three algorithms which are representative of the optimization techniques used to solve the group lasso: interior-point methods, (group) coordinate descent and its variations, and proximal methods. As an instance of the first set of techniques we employed the publicly available

Matlab code at http:// described in [1]. For coordinate descent methods, we employed the R package grplasso, which implements block coordinate gradient descent minimization for a set of possible loss functions. In the following we refer to these two algorithms as IP and BCGD, respectively. Finally, as an instance of proximal methods, we use our Matlab implementation of FISTA for Group-wise Selection, namely Algorithm 4 with FISTA instead of ISTA as the updating rule. We refer to it as PROX. In our experiments, we stop the PROX algorithm when the relative precision between two iterates is below a given threshold, i.e. when $\|x_m - x_{m-1}\| \le \nu\|x_{m-1}\|$. Though the theoretical results only guarantee convergence of the objective value, we observe in practice that the algorithm with this stopping criterion always stops on our problems.

We first observe that, on randomly generated toy examples for which the solution is easily computable, the solutions of the three algorithms coincide up to an error which depends on each algorithm's tolerance. We thus need to tune each tolerance in order to guarantee that all iterative algorithms are stopped when an approximation of the solution of the same level is obtained, namely when $\|x_m - x_{opt}\|$ is of the same order for all three algorithms. Toward this end, we run algorithm PROX with machine precision, $\nu = 10^{-16}$, in order to have a good approximation of the solution $x_{opt}$. We observe that, for many values of $n$ and $d$, and over a large range of values of $\tau$, the approximation of PROX with $\nu = 10^{-6}$ is of the same order as the approximation of IP with optparam.tol $= 10^{-9}$ and of BCGD with a correspondingly tuned tol. Note also that with these tolerances the three solutions coincide also in terms of selection, i.e. their supports are identical for each value of $\tau$. Therefore the following results correspond to optparam.tol $= 10^{-9}$ for IP, the matching tol for BCGD, and $\nu = 10^{-6}$ for PROX. For the other parameters of IP we used the values used in the demos supplied with the code.

Concerning the data generation protocol, the input variables $t = (t_1,\dots,t_n)$ are uniformly drawn from $[-1,1]^d$. The labels $y$ are computed using a noise-corrupted linear regression function, i.e. $y_j = \langle x^*, t_j\rangle + w$ for all $j\in\{1,\dots,n\}$, where $x^*$ depends on the first 30 variables, $x^*_j = c$ if $j=1,\dots,30$ and 0 otherwise, $w$ is additive noise, $w\sim N(0,1)$, and $c$ is a rescaling factor that sets the signal-to-noise ratio to 5:1. We consider the model described in Example 1. In this case the dictionary is $\psi_j(t) = t_j$ for $j=1,\dots,d$, so that the linear operator $\Psi$ can be represented by the $n\times d$ matrix with entries $\Psi_{ij} = (t_i)_j$. We then evaluate the entire regularization path for the three algorithms with $B$ sequential groups of 10 variables ($G_1 = [1,\dots,10]$, $G_2 = [11,\dots,20]$, and so on), for different values of $n$ and $B$. In order to make sure that we are working on the correct range of values for the parameter $\tau$, we first evaluate the set of solutions of PROX corresponding to a large range of 500 values of $\tau$, with $\nu = 10^{-4}$. We then determine the smallest value of $\tau$ which corresponds to selecting less than $n$ variables, $\tau_{min}$, and the smallest one returning the null solution, $\tau_{max}$. Finally we build a geometric series of 50 values between $\tau_{min}$ and $\tau_{max}$, and use it to evaluate the regularization path with the three algorithms. In order to obtain robust estimates of the running times, we repeat the experiment 20 times for each pair $(n, B)$.
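A sketch of this data generation protocol (ours, not the authors' scripts; in particular, the exact convention by which $c$ enforces the 5:1 signal-to-noise ratio is our assumption):

import numpy as np

def make_toy_data(n, d, n_relevant=30, snr=5.0, seed=None):
    """t uniform on [-1,1]^d, linear model on the first n_relevant coordinates,
    unit-variance additive noise, coefficient c rescaled to the requested SNR."""
    rng = np.random.default_rng(seed)
    T = rng.uniform(-1.0, 1.0, size=(n, d))
    x_star = np.zeros(d)
    x_star[:n_relevant] = 1.0
    signal = T @ x_star
    c = snr / signal.std()          # our assumed reading of the 5:1 ratio
    y = c * signal + rng.standard_normal(n)
    return T, c * x_star, y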

Table 1 Running time (mean ± standard deviation) in seconds for computing the entire regularization path of IP, BCGD, and PROX for different values of $B$ and $n$ (entries marked "–" are not legible in this transcription).

            B = 10        B = 100
n = 100
  IP        5.6 ± –       – ± 90
  BCGD      2.1 ± –       – ± 0.6
  PROX      0.21 ± –      – ± 0.4
n = 500
  IP        2.30 ± –      – ± 30
  BCGD      2.15 ± –      – ± 0.5
  PROX      – ± –         – ± 0.16
n = 1000
  IP        1.92 ± –      – ± 22
  BCGD      2.06 ± –      – ± 3
  PROX      – ± –         – ± 0.5

In Table 1 we report the computational times required to evaluate the entire regularization path for the three algorithms. Algorithms BCGD and PROX are always faster than IP which, due to memory reasons, cannot be applied to problems where the number of variables exceeds 5000, since it requires storing the $d\times d$ matrix $\Psi^T\Psi$. It must be said that the code for IP was made available mainly in order to allow reproducibility of the results presented in [1], and is not optimized in terms of time and memory occupation. However, it is well known that standard second-order methods are typically precluded on large data sets, since they need to solve large systems of linear equations to compute the Newton steps. PROX is the fastest for $B = 10$, 100 and behaves similarly to BCGD. The candidate benchmark algorithms for comparison with FISTA via projection are therefore BCGD and PROX. Since we are more familiar with the PROX algorithm, we compare FISTA via projection with the PROX algorithm, i.e. FISTA via replication, only.

4.2.2 Comparison with overlap

Here we compare two different implementations of the GSO-2 solution: FISTA via approximated projection, computed by solving the dual problem with the projected Newton method, and FISTA via replication. We refer to the former as FISTA-proj and to the latter as FISTA-repl. The data generation protocol is equal to the one described in the previous experiments, but $x^*$ depends on the first $(12/5)b$ variables (which correspond to the first three groups):

$$x^* = (\underbrace{c,\dots,c}_{(12/5)b \text{ times}},\ \underbrace{0,0,\dots,0}_{d-(12/5)b \text{ times}}).$$

We then define $B$ groups of size $b$, so that $\tilde d = Bb > d$. The first three groups correspond to the subset of relevant variables, and are defined as $G_1 = [1,\dots,b]$, $G_2 = [(4/5)b+1,\dots,(9/5)b]$, and $G_3 = [1,\dots,b/5,\ (8/5)b+1,\dots,(12/5)b]$, so that they have a 20% pair-wise overlap. The remaining $B-3$ groups are built by randomly drawing sets of $b$ indices from $\{1,\dots,d\}$. In the following we let $n = 10\,|G_1\cup G_2\cup G_3|$, i.e. $n$ is ten times the number of relevant variables, and vary $d$ and $b$. We also vary the number of groups $B$, so that the dimension of the space of latent variables is $\alpha$ times the input dimension, $\tilde d = \alpha d$, with $\alpha = 1.2$, 2, 5. Clearly this amounts to taking $B = \alpha d/b$. The parameter $\alpha$ can be thought of as the average number of groups a single variable belongs to. We identify the correct range of values for $\tau$ as in the previous experiments, using FISTA-proj with a loose tolerance, and then evaluate the running time and the number of iterations necessary to compute the entire regularization path for FISTA-repl on the expanded space and for FISTA-proj, both with the same value of $\nu$. Finally we repeat the experiment 20 times for each combination of the three parameters $d$, $b$, and $\alpha$.

Table 2 Running time (mean ± standard deviation) in seconds for $b=10$ (top) and $b=100$ (bottom). For each $d$ and $\alpha$, the left and right entries correspond to FISTA-proj and FISTA-repl, respectively.

Table 3 Number of iterations (mean ± standard deviation) for $b=10$ (top) and $b=100$ (bottom). For each $d$ and $\alpha$, the left and right entries correspond to FISTA-proj and FISTA-repl, respectively.

Fig. 3 Number of iterations necessary for evaluating the prox vs the number of variables ($d$), for different values of the overlap degree $\alpha$ and of the tolerance.

Running times and numbers of iterations are reported in Tables 2 and 3, respectively. When the overlap, that is $\alpha$, is low, the computational times of FISTA-repl and FISTA-proj are comparable. As $\alpha$ increases, there is a clear advantage in using FISTA-proj instead of FISTA-repl. The same behavior occurs for the number of iterations.

4.3 $p=2$ vs $p=\infty$

We generate the groups and the coefficient vector as in Subsection 4.1, with $b=10$. Differently from Subsection 4.1, here we compare the computational performance of the same algorithm applied to two different problems: Cyclic Projections for $p=2$ and Cyclic Projections for $p=\infty$, which yield different solutions, since $\mathrm{prox}_{\tau\Omega^G_2}(x) \ne \mathrm{prox}_{\tau\Omega^G_\infty}(x)$. In order to guarantee a fair comparison we consider two different values of $\tau$, $\tau_2$ and $\tau_\infty$, such that, when computing $\mathrm{prox}_{\tau_2\Omega^G_2}(x)$ and $\mathrm{prox}_{\tau_\infty\Omega^G_\infty}(x)$, all groups are active. Precisely, we take $\tau_2 = 0.8\min_{r=1,\dots,B}\|x\|_{G_r,2}$ and $\tau_\infty = 0.8\min_{r=1,\dots,B}\|x\|_{G_r,1}$. We compute the approximated solutions with the Cyclic Projections Algorithm 2 for $p=2$ and $p=\infty$. We refer to the former as CP2 and to the latter as CPinf. We stop the iteration when the relative decrease of the approximated solution is below $\varepsilon$. We consider different values for the tolerance $\varepsilon$; precisely, we take $\varepsilon = 10^{-2}, 10^{-3}, 10^{-4}$. For each value of $\alpha$ and $\varepsilon$ we estimate the number of iterations and the computing time for the two algorithms, and average over 20 repetitions. Mean and standard deviation of the number of iterations and of the computing time are plotted in Figures 3 and 4. In all conditions CP2 is much faster than CPinf.


More information

Elementary theory of L p spaces

Elementary theory of L p spaces CHAPTER 3 Elementary theory of L saces 3.1 Convexity. Jensen, Hölder, Minkowski inequality. We begin with two definitions. A set A R d is said to be convex if, for any x 0, x 1 2 A x = x 0 + (x 1 x 0 )

More information

State Estimation with ARMarkov Models

State Estimation with ARMarkov Models Deartment of Mechanical and Aerosace Engineering Technical Reort No. 3046, October 1998. Princeton University, Princeton, NJ. State Estimation with ARMarkov Models Ryoung K. Lim 1 Columbia University,

More information

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations

Preconditioning techniques for Newton s method for the incompressible Navier Stokes equations Preconditioning techniques for Newton s method for the incomressible Navier Stokes equations H. C. ELMAN 1, D. LOGHIN 2 and A. J. WATHEN 3 1 Deartment of Comuter Science, University of Maryland, College

More information

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels

Quantitative estimates of propagation of chaos for stochastic systems with W 1, kernels oname manuscrit o. will be inserted by the editor) Quantitative estimates of roagation of chaos for stochastic systems with W, kernels Pierre-Emmanuel Jabin Zhenfu Wang Received: date / Acceted: date Abstract

More information

p-adic Measures and Bernoulli Numbers

p-adic Measures and Bernoulli Numbers -Adic Measures and Bernoulli Numbers Adam Bowers Introduction The constants B k in the Taylor series exansion t e t = t k B k k! k=0 are known as the Bernoulli numbers. The first few are,, 6, 0, 30, 0,

More information

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces

Research Article An iterative Algorithm for Hemicontractive Mappings in Banach Spaces Abstract and Alied Analysis Volume 2012, Article ID 264103, 11 ages doi:10.1155/2012/264103 Research Article An iterative Algorithm for Hemicontractive Maings in Banach Saces Youli Yu, 1 Zhitao Wu, 2 and

More information

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO) Combining Logistic Regression with Kriging for Maing the Risk of Occurrence of Unexloded Ordnance (UXO) H. Saito (), P. Goovaerts (), S. A. McKenna (2) Environmental and Water Resources Engineering, Deartment

More information

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III

AI*IA 2003 Fusion of Multiple Pattern Classifiers PART III AI*IA 23 Fusion of Multile Pattern Classifiers PART III AI*IA 23 Tutorial on Fusion of Multile Pattern Classifiers by F. Roli 49 Methods for fusing multile classifiers Methods for fusing multile classifiers

More information

Learning with stochastic proximal gradient

Learning with stochastic proximal gradient Learning with stochastic proximal gradient Lorenzo Rosasco DIBRIS, Università di Genova Via Dodecaneso, 35 16146 Genova, Italy lrosasco@mit.edu Silvia Villa, Băng Công Vũ Laboratory for Computational and

More information

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm

On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm On Line Parameter Estimation of Electric Systems using the Bacterial Foraging Algorithm Gabriel Noriega, José Restreo, Víctor Guzmán, Maribel Giménez and José Aller Universidad Simón Bolívar Valle de Sartenejas,

More information

Online homotopy algorithm for a generalization of the LASSO

Online homotopy algorithm for a generalization of the LASSO This article has been acceted for ublication in a future issue of this journal, but has not been fully edited. Content may change rior to final ublication. Online homotoy algorithm for a generalization

More information

MATH 2710: NOTES FOR ANALYSIS

MATH 2710: NOTES FOR ANALYSIS MATH 270: NOTES FOR ANALYSIS The main ideas we will learn from analysis center around the idea of a limit. Limits occurs in several settings. We will start with finite limits of sequences, then cover infinite

More information

c Copyright by Helen J. Elwood December, 2011

c Copyright by Helen J. Elwood December, 2011 c Coyright by Helen J. Elwood December, 2011 CONSTRUCTING COMPLEX EQUIANGULAR PARSEVAL FRAMES A Dissertation Presented to the Faculty of the Deartment of Mathematics University of Houston In Partial Fulfillment

More information

Finding a sparse vector in a subspace: linear sparsity using alternating directions

Finding a sparse vector in a subspace: linear sparsity using alternating directions IEEE TRANSACTION ON INFORMATION THEORY VOL XX NO XX 06 Finding a sarse vector in a subsace: linear sarsity using alternating directions Qing Qu Student Member IEEE Ju Sun Student Member IEEE and John Wright

More information

Applications to stochastic PDE

Applications to stochastic PDE 15 Alications to stochastic PE In this final lecture we resent some alications of the theory develoed in this course to stochastic artial differential equations. We concentrate on two secific examles:

More information

Recursive Estimation of the Preisach Density function for a Smart Actuator

Recursive Estimation of the Preisach Density function for a Smart Actuator Recursive Estimation of the Preisach Density function for a Smart Actuator Ram V. Iyer Deartment of Mathematics and Statistics, Texas Tech University, Lubbock, TX 7949-142. ABSTRACT The Preisach oerator

More information

Hidden Predictors: A Factor Analysis Primer

Hidden Predictors: A Factor Analysis Primer Hidden Predictors: A Factor Analysis Primer Ryan C Sanchez Western Washington University Factor Analysis is a owerful statistical method in the modern research sychologist s toolbag When used roerly, factor

More information

Machine Learning: Homework 4

Machine Learning: Homework 4 10-601 Machine Learning: Homework 4 Due 5.m. Monday, February 16, 2015 Instructions Late homework olicy: Homework is worth full credit if submitted before the due date, half credit during the next 48 hours,

More information

Probability Estimates for Multi-class Classification by Pairwise Coupling

Probability Estimates for Multi-class Classification by Pairwise Coupling Probability Estimates for Multi-class Classification by Pairwise Couling Ting-Fan Wu Chih-Jen Lin Deartment of Comuter Science National Taiwan University Taiei 06, Taiwan Ruby C. Weng Deartment of Statistics

More information

PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH. Luoluo Liu, Trac D. Tran, and Sang Peter Chin

PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH. Luoluo Liu, Trac D. Tran, and Sang Peter Chin PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH Luoluo Liu, Trac D. Tran, and Sang Peter Chin Det. of Electrical and Comuter Engineering, Johns Hokins Univ., Baltimore, MD 21218, USA {lliu69,trac,schin11}@jhu.edu

More information

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem An Ant Colony Otimization Aroach to the Probabilistic Traveling Salesman Problem Leonora Bianchi 1, Luca Maria Gambardella 1, and Marco Dorigo 2 1 IDSIA, Strada Cantonale Galleria 2, CH-6928 Manno, Switzerland

More information

John Weatherwax. Analysis of Parallel Depth First Search Algorithms

John Weatherwax. Analysis of Parallel Depth First Search Algorithms Sulementary Discussions and Solutions to Selected Problems in: Introduction to Parallel Comuting by Viin Kumar, Ananth Grama, Anshul Guta, & George Karyis John Weatherwax Chater 8 Analysis of Parallel

More information

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i

For q 0; 1; : : : ; `? 1, we have m 0; 1; : : : ; q? 1. The set fh j(x) : j 0; 1; ; : : : ; `? 1g forms a basis for the tness functions dened on the i Comuting with Haar Functions Sami Khuri Deartment of Mathematics and Comuter Science San Jose State University One Washington Square San Jose, CA 9519-0103, USA khuri@juiter.sjsu.edu Fax: (40)94-500 Keywords:

More information

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Technical Sciences and Alied Mathematics MODELING THE RELIABILITY OF CISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL Cezar VASILESCU Regional Deartment of Defense Resources Management

More information

A New Perspective on Learning Linear Separators with Large L q L p Margins

A New Perspective on Learning Linear Separators with Large L q L p Margins A New Persective on Learning Linear Searators with Large L q L Margins Maria-Florina Balcan Georgia Institute of Technology Christoher Berlind Georgia Institute of Technology Abstract We give theoretical

More information

EXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES

EXACTLY PERIODIC SUBSPACE DECOMPOSITION BASED APPROACH FOR IDENTIFYING TANDEM REPEATS IN DNA SEQUENCES EXACTLY ERIODIC SUBSACE DECOMOSITION BASED AROACH FOR IDENTIFYING TANDEM REEATS IN DNA SEUENCES Ravi Guta, Divya Sarthi, Ankush Mittal, and Kuldi Singh Deartment of Electronics & Comuter Engineering, Indian

More information

arxiv: v1 [physics.data-an] 26 Oct 2012

arxiv: v1 [physics.data-an] 26 Oct 2012 Constraints on Yield Parameters in Extended Maximum Likelihood Fits Till Moritz Karbach a, Maximilian Schlu b a TU Dortmund, Germany, moritz.karbach@cern.ch b TU Dortmund, Germany, maximilian.schlu@cern.ch

More information

Linear diophantine equations for discrete tomography

Linear diophantine equations for discrete tomography Journal of X-Ray Science and Technology 10 001 59 66 59 IOS Press Linear diohantine euations for discrete tomograhy Yangbo Ye a,gewang b and Jiehua Zhu a a Deartment of Mathematics, The University of Iowa,

More information

STABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS

STABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS STABILITY ANALYSIS TOOL FOR TUNING UNCONSTRAINED DECENTRALIZED MODEL PREDICTIVE CONTROLLERS Massimo Vaccarini Sauro Longhi M. Reza Katebi D.I.I.G.A., Università Politecnica delle Marche, Ancona, Italy

More information

Recent Developments in Multilayer Perceptron Neural Networks

Recent Developments in Multilayer Perceptron Neural Networks Recent Develoments in Multilayer Percetron eural etworks Walter H. Delashmit Lockheed Martin Missiles and Fire Control Dallas, Texas 75265 walter.delashmit@lmco.com walter.delashmit@verizon.net Michael

More information

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H:

where x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H: Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 25, 2017 Due: November 08, 2017 A. Growth function Growth function of stum functions.

More information

Statics and dynamics: some elementary concepts

Statics and dynamics: some elementary concepts 1 Statics and dynamics: some elementary concets Dynamics is the study of the movement through time of variables such as heartbeat, temerature, secies oulation, voltage, roduction, emloyment, rices and

More information

ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES CONTROL PROBLEMS

ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES CONTROL PROBLEMS Electronic Transactions on Numerical Analysis. Volume 44,. 53 72, 25. Coyright c 25,. ISSN 68 963. ETNA ON THE DEVELOPMENT OF PARAMETER-ROBUST PRECONDITIONERS AND COMMUTATOR ARGUMENTS FOR SOLVING STOKES

More information

Distributed Rule-Based Inference in the Presence of Redundant Information

Distributed Rule-Based Inference in the Presence of Redundant Information istribution Statement : roved for ublic release; distribution is unlimited. istributed Rule-ased Inference in the Presence of Redundant Information June 8, 004 William J. Farrell III Lockheed Martin dvanced

More information

The Value of Even Distribution for Temporal Resource Partitions

The Value of Even Distribution for Temporal Resource Partitions The Value of Even Distribution for Temoral Resource Partitions Yu Li, Albert M. K. Cheng Deartment of Comuter Science University of Houston Houston, TX, 7704, USA htt://www.cs.uh.edu Technical Reort Number

More information

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation Paer C Exact Volume Balance Versus Exact Mass Balance in Comositional Reservoir Simulation Submitted to Comutational Geosciences, December 2005. Exact Volume Balance Versus Exact Mass Balance in Comositional

More information

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition A Qualitative Event-based Aroach to Multile Fault Diagnosis in Continuous Systems using Structural Model Decomosition Matthew J. Daigle a,,, Anibal Bregon b,, Xenofon Koutsoukos c, Gautam Biswas c, Belarmino

More information

NONLINEAR OPTIMIZATION WITH CONVEX CONSTRAINTS. The Goldstein-Levitin-Polyak algorithm

NONLINEAR OPTIMIZATION WITH CONVEX CONSTRAINTS. The Goldstein-Levitin-Polyak algorithm - (23) NLP - NONLINEAR OPTIMIZATION WITH CONVEX CONSTRAINTS The Goldstein-Levitin-Polya algorithm We consider an algorithm for solving the otimization roblem under convex constraints. Although the convexity

More information

Monopolist s mark-up and the elasticity of substitution

Monopolist s mark-up and the elasticity of substitution Croatian Oerational Research Review 377 CRORR 8(7), 377 39 Monoolist s mark-u and the elasticity of substitution Ilko Vrankić, Mira Kran, and Tomislav Herceg Deartment of Economic Theory, Faculty of Economics

More information

Department of Electrical and Computer Systems Engineering

Department of Electrical and Computer Systems Engineering Deartment of Electrical and Comuter Systems Engineering Technical Reort MECSE-- A monomial $\nu$-sv method for Regression A. Shilton, D.Lai and M. Palaniswami A Monomial ν-sv Method For Regression A. Shilton,

More information

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split A Bound on the Error of Cross Validation Using the Aroximation and Estimation Rates, with Consequences for the Training-Test Slit Michael Kearns AT&T Bell Laboratories Murray Hill, NJ 7974 mkearns@research.att.com

More information

Cryptanalysis of Pseudorandom Generators

Cryptanalysis of Pseudorandom Generators CSE 206A: Lattice Algorithms and Alications Fall 2017 Crytanalysis of Pseudorandom Generators Instructor: Daniele Micciancio UCSD CSE As a motivating alication for the study of lattice in crytograhy we

More information

3 Properties of Dedekind domains

3 Properties of Dedekind domains 18.785 Number theory I Fall 2016 Lecture #3 09/15/2016 3 Proerties of Dedekind domains In the revious lecture we defined a Dedekind domain as a noetherian domain A that satisfies either of the following

More information

Convex Analysis and Economic Theory Winter 2018

Convex Analysis and Economic Theory Winter 2018 Division of the Humanities and Social Sciences Ec 181 KC Border Conve Analysis and Economic Theory Winter 2018 Toic 16: Fenchel conjugates 16.1 Conjugate functions Recall from Proosition 14.1.1 that is

More information

HENSEL S LEMMA KEITH CONRAD

HENSEL S LEMMA KEITH CONRAD HENSEL S LEMMA KEITH CONRAD 1. Introduction In the -adic integers, congruences are aroximations: for a and b in Z, a b mod n is the same as a b 1/ n. Turning information modulo one ower of into similar

More information

Introduction to MVC. least common denominator of all non-identical-zero minors of all order of G(s). Example: The minor of order 2: 1 2 ( s 1)

Introduction to MVC. least common denominator of all non-identical-zero minors of all order of G(s). Example: The minor of order 2: 1 2 ( s 1) Introduction to MVC Definition---Proerness and strictly roerness A system G(s) is roer if all its elements { gij ( s)} are roer, and strictly roer if all its elements are strictly roer. Definition---Causal

More information

Metrics Performance Evaluation: Application to Face Recognition

Metrics Performance Evaluation: Application to Face Recognition Metrics Performance Evaluation: Alication to Face Recognition Naser Zaeri, Abeer AlSadeq, and Abdallah Cherri Electrical Engineering Det., Kuwait University, P.O. Box 5969, Safat 6, Kuwait {zaery, abeer,

More information

Online Learning of Noisy Data with Kernels

Online Learning of Noisy Data with Kernels Online Learning of Noisy Data with Kernels Nicolò Cesa-Bianchi Università degli Studi di Milano cesa-bianchi@dsiunimiit Shai Shalev Shwartz The Hebrew University shais@cshujiacil Ohad Shamir The Hebrew

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

Universal Finite Memory Coding of Binary Sequences

Universal Finite Memory Coding of Binary Sequences Deartment of Electrical Engineering Systems Universal Finite Memory Coding of Binary Sequences Thesis submitted towards the degree of Master of Science in Electrical and Electronic Engineering in Tel-Aviv

More information

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP

CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP Submitted to the Annals of Statistics arxiv: arxiv:1706.07237 CONVOLVED SUBSAMPLING ESTIMATION WITH APPLICATIONS TO BLOCK BOOTSTRAP By Johannes Tewes, Dimitris N. Politis and Daniel J. Nordman Ruhr-Universität

More information

RESOLUTIONS OF THREE-ROWED SKEW- AND ALMOST SKEW-SHAPES IN CHARACTERISTIC ZERO

RESOLUTIONS OF THREE-ROWED SKEW- AND ALMOST SKEW-SHAPES IN CHARACTERISTIC ZERO RESOLUTIONS OF THREE-ROWED SKEW- AND ALMOST SKEW-SHAPES IN CHARACTERISTIC ZERO MARIA ARTALE AND DAVID A. BUCHSBAUM Abstract. We find an exlicit descrition of the terms and boundary mas for the three-rowed

More information

On the Toppling of a Sand Pile

On the Toppling of a Sand Pile Discrete Mathematics and Theoretical Comuter Science Proceedings AA (DM-CCG), 2001, 275 286 On the Toling of a Sand Pile Jean-Christohe Novelli 1 and Dominique Rossin 2 1 CNRS, LIFL, Bâtiment M3, Université

More information

New Schedulability Test Conditions for Non-preemptive Scheduling on Multiprocessor Platforms

New Schedulability Test Conditions for Non-preemptive Scheduling on Multiprocessor Platforms New Schedulability Test Conditions for Non-reemtive Scheduling on Multirocessor Platforms Technical Reort May 2008 Nan Guan 1, Wang Yi 2, Zonghua Gu 3 and Ge Yu 1 1 Northeastern University, Shenyang, China

More information

OPTIMAL AFFINE INVARIANT SMOOTH MINIMIZATION ALGORITHMS

OPTIMAL AFFINE INVARIANT SMOOTH MINIMIZATION ALGORITHMS 1 OPTIMAL AFFINE INVARIANT SMOOTH MINIMIZATION ALGORITHMS ALEXANDRE D ASPREMONT, CRISTÓBAL GUZMÁN, AND MARTIN JAGGI ABSTRACT. We formulate an affine invariant imlementation of the accelerated first-order

More information

Pretest (Optional) Use as an additional pacing tool to guide instruction. August 21

Pretest (Optional) Use as an additional pacing tool to guide instruction. August 21 Trimester 1 Pretest (Otional) Use as an additional acing tool to guide instruction. August 21 Beyond the Basic Facts In Trimester 1, Grade 8 focus on multilication. Daily Unit 1: Rational vs. Irrational

More information

LORENZO BRANDOLESE AND MARIA E. SCHONBEK

LORENZO BRANDOLESE AND MARIA E. SCHONBEK LARGE TIME DECAY AND GROWTH FOR SOLUTIONS OF A VISCOUS BOUSSINESQ SYSTEM LORENZO BRANDOLESE AND MARIA E. SCHONBEK Abstract. In this aer we analyze the decay and the growth for large time of weak and strong

More information

Period-two cycles in a feedforward layered neural network model with symmetric sequence processing

Period-two cycles in a feedforward layered neural network model with symmetric sequence processing PHYSICAL REVIEW E 75, 4197 27 Period-two cycles in a feedforward layered neural network model with symmetric sequence rocessing F. L. Metz and W. K. Theumann Instituto de Física, Universidade Federal do

More information

t 0 Xt sup X t p c p inf t 0

t 0 Xt sup X t p c p inf t 0 SHARP MAXIMAL L -ESTIMATES FOR MARTINGALES RODRIGO BAÑUELOS AND ADAM OSȨKOWSKI ABSTRACT. Let X be a suermartingale starting from 0 which has only nonnegative jums. For each 0 < < we determine the best

More information

Solving Support Vector Machines in Reproducing Kernel Banach Spaces with Positive Definite Functions

Solving Support Vector Machines in Reproducing Kernel Banach Spaces with Positive Definite Functions Solving Suort Vector Machines in Reroducing Kernel Banach Saces with Positive Definite Functions Gregory E. Fasshauer a, Fred J. Hickernell a, Qi Ye b, a Deartment of Alied Mathematics, Illinois Institute

More information

POINTS ON CONICS MODULO p

POINTS ON CONICS MODULO p POINTS ON CONICS MODULO TEAM 2: JONGMIN BAEK, ANAND DEOPURKAR, AND KATHERINE REDFIELD Abstract. We comute the number of integer oints on conics modulo, where is an odd rime. We extend our results to conics

More information

1-way quantum finite automata: strengths, weaknesses and generalizations

1-way quantum finite automata: strengths, weaknesses and generalizations 1-way quantum finite automata: strengths, weaknesses and generalizations arxiv:quant-h/9802062v3 30 Se 1998 Andris Ambainis UC Berkeley Abstract Rūsiņš Freivalds University of Latvia We study 1-way quantum

More information

Research of PMU Optimal Placement in Power Systems

Research of PMU Optimal Placement in Power Systems Proceedings of the 5th WSEAS/IASME Int. Conf. on SYSTEMS THEORY and SCIENTIFIC COMPUTATION, Malta, Setember 15-17, 2005 (38-43) Research of PMU Otimal Placement in Power Systems TIAN-TIAN CAI, QIAN AI

More information

Approximate Dynamic Programming for Dynamic Capacity Allocation with Multiple Priority Levels

Approximate Dynamic Programming for Dynamic Capacity Allocation with Multiple Priority Levels Aroximate Dynamic Programming for Dynamic Caacity Allocation with Multile Priority Levels Alexander Erdelyi School of Oerations Research and Information Engineering, Cornell University, Ithaca, NY 14853,

More information

arxiv: v2 [stat.me] 3 Nov 2014

arxiv: v2 [stat.me] 3 Nov 2014 onarametric Stein-tye Shrinkage Covariance Matrix Estimators in High-Dimensional Settings Anestis Touloumis Cancer Research UK Cambridge Institute University of Cambridge Cambridge CB2 0RE, U.K. Anestis.Touloumis@cruk.cam.ac.uk

More information

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM

ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM ANALYTIC NUMBER THEORY AND DIRICHLET S THEOREM JOHN BINDER Abstract. In this aer, we rove Dirichlet s theorem that, given any air h, k with h, k) =, there are infinitely many rime numbers congruent to

More information

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2)

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2) Chater 5 Pell s Equation One of the earliest issues graled with in number theory is the fact that geometric quantities are often not rational. For instance, if we take a right triangle with two side lengths

More information

COMMUNICATION BETWEEN SHAREHOLDERS 1

COMMUNICATION BETWEEN SHAREHOLDERS 1 COMMUNICATION BTWN SHARHOLDRS 1 A B. O A : A D Lemma B.1. U to µ Z r 2 σ2 Z + σ2 X 2r ω 2 an additive constant that does not deend on a or θ, the agents ayoffs can be written as: 2r rθa ω2 + θ µ Y rcov

More information

s v 0 q 0 v 1 q 1 v 2 (q 2) v 3 q 3 v 4

s v 0 q 0 v 1 q 1 v 2 (q 2) v 3 q 3 v 4 Discrete Adative Transmission for Fading Channels Lang Lin Λ, Roy D. Yates, Predrag Sasojevic WINLAB, Rutgers University 7 Brett Rd., NJ- fllin, ryates, sasojevg@winlab.rutgers.edu Abstract In this work

More information