DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY


SIAM J. OPTIM. Vol. 27, No. 3 © 2017 Society for Industrial and Applied Mathematics

DIFFERENCE-OF-CONVEX LEARNING: DIRECTIONAL STATIONARITY, OPTIMALITY, AND SPARSITY

MIJU AHN, JONG-SHI PANG, AND JACK XIN

Abstract. This paper studies a fundamental bicriteria optimization problem for variable selection in statistical learning; the two criteria are a loss/residual function and a model control (also called regularization, penalty). The former function measures the fitness of the learning model to data, and the latter function is employed as a control of the complexity of the model. We focus on the case where the loss function is (strongly) convex and the model control function is a difference-of-convex (dc) sparsity measure. Our paper establishes some fundamental optimality and sparsity properties of directional stationary solutions to a nonconvex Lagrangian formulation of the bicriteria optimization problem, based on a specially structured dc representation of many well-known sparsity functions that can be profitably exploited in the analysis. We relate the Lagrangian optimization problem with the penalty constrained problem in terms of their respective d(irectional)-stationary solutions; this is in contrast to common analysis that pertains to the (global) minimizers of the problem, which are not computable due to nonconvexity. Most importantly, we provide sufficient conditions under which the d(irectional)-stationary solutions of the nonconvex Lagrangian formulation are global minimizers (possibly restricted due to nondifferentiability), thereby filling the gap between previous minimizer-based analysis and practical computational considerations. The established relation allows us to readily apply the derived results for the Lagrangian formulation to the penalty constrained formulation. Specializations of the conditions to exact and surrogate sparsity functions are discussed, yielding optimality and sparsity results for existing nonconvex formulations of the statistical learning problem.

Key words.
statistical learning, difference-of-convex programs, directional stationary points, optimality, sparsity

AMS subject classifications. 90C26, 65K10, 62B10

DOI. 10.1137/16M

1. Introduction. Sparse representation [7] is a fundamental methodology of data science in solving a broad range of problems from statistical and machine learning in artificial intelligence, to physical sciences and engineering (e.g., imaging and sensing technologies), and to medical decision making (e.g., classification of healthy versus unhealthy patients, benign and cancerous tumors). Significant advances have been made in the last decade on constructing intrinsically low-dimensional solutions in high-dimensional problems via convex programming. In statistical learning, the Least Absolute Shrinkage and Selection Operator (LASSO) [49] is an efficient linear optimization method in regression and variable selection problems based on the minimization of the $\ell_1$-norm $\|x\|_1 \triangleq \sum_{i=1}^n |x_i|$ of the $n$-dimensional model variable $x$, either subject to a certain prescribed residual constraint, leading to a constrained optimization problem, or employing such a residual as an additional criterion to be minimized, resulting in an unconstrained minimization problem. (Throughout this

Received by the editors July 14, 2016; accepted for publication (in revised form) May 1, 2017; published electronically August 2017. Funding: The first and second author's work was supported by the National Science Foundation under grants CMMI and IIS and by the Air Force Office of Scientific Research under grant FA. The third author's work was supported by the National Science Foundation grants DMS-1507, DMS-15383, and IIS. Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA (mijuahn@usc.edu, jongship@usc.edu). Department of Mathematics, University of California, Irvine, CA 92697 (jxin@math.uci.edu).

1637

paper, we address the case of vector optimization and leave the analysis of matrix optimization problems to future work.) Such a convex norm is employed as a surrogate of the $\ell_0$-function of $x$, i.e., $\|x\|_0 \triangleq \sum_{i=1}^n |x_i|_0$, where for a scalar $t$,

$|t|_0 \triangleq \begin{cases} 1 & \text{if } t \neq 0 \\ 0 & \text{if } t = 0, \end{cases}$

which counts the nonzero entries in the model variable $x$. Theoretical analysis of LASSO and its ability to recover the true support of the ground truth vector was pioneered by Candès and Tao [6, 7]. This operator has many good statistical properties such as model selection consistency [56], estimation consistency [3, 9], and the persistence property for prediction [6]. There exist many modified versions and algorithms proposed to improve the computational efficiency of LASSO, including adaptive LASSO [58], Bregman iterative algorithms [5], the Dantzig selector [10, 8], iteratively reweighted LASSO [9], the elastic net [59], and LARS (least angle regression) [15]. Until now, due to its favorable theoretical underpinnings and many efficient solution methods, convex optimization has been a principal venue for solving many statistical learning problems. Yet there is increasing evidence supporting the use of nonconvex formulations to enhance the realism of the models and improve their generalizations. For instance, in compressed sensing and image science [13], recent findings reported in [34, 51, 54] show that the difference of the $\ell_1$ and $\ell_2$ norms ($\ell_{1-2}$ for short) and a (nonconvex) transformed $\ell_1$ surrogate outperform $\ell_1$ and other known convex penalties in sparse signal recovery when the sensing matrix is highly coherent; such a regime occurs in superresolution imaging, where one attempts to recover fine scale structure from coarse scale data information; see [1, 5] and references therein.
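The three vector sparsity measures discussed so far can be sketched numerically; a minimal illustration (the test vector is hypothetical, chosen only to make the values easy to check by hand):

```python
import numpy as np

def l0(x):
    # exact sparsity measure: number of nonzero entries
    return int(np.count_nonzero(x))

def l1(x):
    # convex surrogate minimized by LASSO
    return float(np.sum(np.abs(x)))

def l1_minus_l2(x):
    # nonconvex DC surrogate ||x||_1 - ||x||_2
    return l1(x) - float(np.linalg.norm(x))

x = np.array([3.0, 0.0, -4.0, 0.0])
# l0(x) == 2, l1(x) == 7.0, and since ||x||_2 == 5.0, l1_minus_l2(x) == 2.0
```

Note that $\ell_{1-2}$ vanishes on the zero vector and on every 1-sparse vector, while remaining positive otherwise, which is the property exploited by the references above.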
More broadly, important applications in other areas such as computational statistics, machine learning, bio-informatics, and portfolio selection [8] offer promising sources of problems for the employment of nonconvex functionals to express model loss, promote sparsity, and enhance robustness. The present paper is motivated by the recent flurry of activities pertaining to the use of nonconvex functionals for statistical learning problems. This idea dates back more than fifteen years in statistics, when Fan and Li [17] pointed out that LASSO is a biased estimator and postulated several desirable statistics-based properties for a good univariate surrogate function for $\ell_0$ to have; the authors then introduced a univariate, butterfly shaped, folded concave [19], piecewise quadratic function called SCAD, for smoothly clipped absolute deviation, that is symmetric with respect to the origin and concave on $\mathbb{R}_+$; see also [18]. Besides SCAD and $\ell_{1-2}$, there exist today many penalized regression methods using nonconvex penalties, including the $\ell_q$ (for $q \in (0,1)$) penalty [20, 3], the related $\ell_p - \ell_q$ penalty for $p \geq 1$ and $q \in [0,1)$ [11], the combination of $\ell_0$ and $\ell_1$ penalties [3], the smooth integration of counting and absolute deviation (SICA) [35], the minimax concave penalty (MCP) [53], the capped-$\ell_1$ penalty [55], the logarithmic penalty [37], the truncated $\ell_1$ penalty [48], the hard thresholding penalty [57], the $\ell_{1-2}$ penalty [51], and the transformed $\ell_1$ [40, 35, 54] mentioned above. While there are algorithms for solving several of these nonconvex optimization problems, such as [17, 19, 53, 37, 4], due to the nondifferentiability of the minimands in these nonconvex optimization problems, it is not clear what kind of stationary points these algorithms actually compute. Indeed, without a clear understanding of such points, it is not possible to rigorously ascertain the stationarity, let alone minimizing, properties of the limit points of the iterates produced by the algorithms.
This will also help to close the gap between a minimizer-based statistical analysis and a practical computational procedure, relaxing the requirement of optimality to a more realistic stationarity property of the computational model in such an analysis.
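Several of the univariate penalties listed above admit simple closed forms. The sketch below uses the standard definitions from the cited literature (the parameter values $\lambda$ and $a$ are illustrative defaults, not taken from this paper):

```python
def scad(t, lam=1.0, a=3.7):
    # SCAD: linear near zero, quadratic transition, constant tail (a > 2)
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2

def mcp(t, lam=1.0, a=2.0):
    # minimax concave penalty: concave quadratic, then constant
    t = abs(t)
    if t <= a * lam:
        return lam * t - t ** 2 / (2 * a)
    return a * lam ** 2 / 2

def capped_l1(t, lam=1.0, a=1.0):
    # capped-l1: the l1 penalty truncated at level a
    return lam * min(abs(t), a)

def transformed_l1(t, a=1.0):
    # transformed l1: a bounded rational surrogate with value 1 at |t| = 1
    t = abs(t)
    return (a + 1) * t / (a + t)
```

All four functions are symmetric, zero only at the origin, and concave on $\mathbb{R}_+$, which is exactly the folded-concave shape postulated by Fan and Li; each one is also a dc function, a fact the unified formulation of section 2 exploits.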

Our work focuses on two types of sparsity measures: one type consists of those mentioned above that are surrogates of the $\ell_0$-function; the other class of functions is motivated by the $\ell_{1-2}$-function, which has the interesting property that its zeros coincide with the 1-sparse vectors; i.e., $\|x\|_1 - \|x\|_2 = 0$ if and only if $x$ has at most one nonzero component. Generalizing this function, we examine a function whose zeros are the K-sparse vectors for a given positive integer K; these are vectors with at most K nonzero components. While this function has been used to describe the rank of matrices [1, 5, 39], its use to express the sparsity of vectors has not received much attention in the literature. One exception is the reference [5], which has recognized the dc property [44, 45] of the K-sparsity function and employed it to describe cardinality constraints. All these functions are nonconvex and nondifferentiable, making the resulting optimization problems challenging to analyze and solve. A major goal of our study is to address these challenges, providing analysis, algorithms, and numerics to demonstrate the properties and benefits of such nonconvex sparsity functions. The overarching contention is that for the analysis of the resulting optimization problems, it is essential that we focus on solutions that are computable in practice, rather than on solutions that are not provably computable (such as global minimizers that cannot be guaranteed to be the outcome of an algorithm). Thus, our contribution is to provide the theoretical underpinning of what may be termed computable statistical learning, relying on the recent work [4], where numerical algorithms supported by convergence theories have been introduced for computing d-stationary solutions of general nonsmooth dc programs; see also [43]. An accompanying study [1] will report the details of computational experience with such algorithms applied to the problems described in this paper.
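One common realization of such a K-sparsity function (an assumption here, not a transcription from this paper) is $\|x\|_1$ minus the sum of the K largest magnitudes of $x$; both summands are convex, the difference is nonnegative, and it vanishes exactly on the K-sparse vectors:

```python
import numpy as np

def largest_K_sum(x, K):
    # sum of the K largest absolute entries of x (a convex function of x)
    return float(np.sort(np.abs(x))[::-1][:K].sum())

def K_sparsity(x, K):
    # DC function whose zeros are exactly the K-sparse vectors:
    # ||x||_1 minus the sum of the K largest magnitudes
    return float(np.abs(x).sum()) - largest_K_sum(x, K)

x = np.array([5.0, -2.0, 0.0, 1.0])   # a 3-sparse test vector
# K_sparsity(x, 3) == 0.0 ; K_sparsity(x, 2) == 1.0 ; K_sparsity(x, 1) == 3.0
```

For K = 1 this reduces to $\|x\|_1 - \|x\|_\infty$, which, like $\ell_{1-2}$, vanishes precisely on the 1-sparse vectors.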
With the above background and review of the literature, we are ready to summarize the multifold contributions of our work, whose setting is on vector problems. The starting point of these contributions is the introduction of a unified formulation with a piecewise-smooth dc objective function for all the sparsity measures mentioned above. Since the concave summands of several objective functions are of the piecewise smooth, thus nondifferentiable, kind, care must be exerted in understanding the stationary solutions of the optimization problems. For this purpose, we will build on the theory in the recent paper [4], which has identified directional stationarity as the least restrictive concept for nonsmooth dc programs, with the resulting d-stationary points being computable by practical algorithms. Most importantly, a major contribution of our work is the derivation of several theoretical results on the nonconvex Lagrangian formulation of the bicriteria statistical learning problem, giving conditions under which d-stationary solutions of the Lagrangian optimization problem are its global minima (perhaps restricted due to nondifferentiability), and in some cases, offering error bounds on the deviation of these solutions from an arbitrary vector, including the ground truth to be discovered. Unlike much analysis in the statistics literature, our approach is more in line with computational optimization by focusing on the optimization model that is employed as the principal computational vehicle for discovery. In this vein, our analysis is related to that in [11], which has studied the choice of the weighing factor and derived properties of a global minimizer of the particular $\ell_p$ problem; in contrast, our general results in section 3 are applicable to an arbitrary (strongly) convex loss function and a dc penalty function that covers many known sparsity functions in the literature. The analysis in the present paper is related to that in two recent articles [31, 33]. There are several differences, however.
Perhaps the main one is the respective foci of the cited papers and ours. In [31], where there are results connecting the sparsity

minimization problems formulated using the original $\ell_0$ objective and those using its surrogates, the emphasis therein is on the application of the dc algorithm to the latter surrogate formulations, linking them to existing machine learning schemes. In [33], the analysis is more from a statistical perspective, while ours is more from a practical optimization perspective. Specifically, the goal in [33] is to gain a deep understanding of how the solution to a regularized optimization problem can recover a true minimizer of the expected loss function (the so-called ground truth), where the former problem is constructed by minimizing the sum of the loss function plus a regularization term and with the constraint that the support of the variables is contained in that of the ground truth. Under extensive assumptions on the parameter choices, the analysis therein is quite detailed and covers many loss and regularization functions known in the literature. Nevertheless, besides the support constraint imposed in the optimization problem, two restrictions were required of the regularizer, i.e., the penalty function: separability and differentiability (except at the origin); these restrictions rule out several important classes of sparsity functions, such as the exact K-sparse functions (see section 5) and the family of piecewise smooth functions such as the capped-$\ell_1$ function and the piecewise linear approximation of the $\ell_2$-function in the $\ell_{1-2}$-function. The exact K-sparsity function is nonseparable; both classes of functions have nontrivial manifolds of nondifferentiable points that are potentially the limit points of iterates computed by an optimization procedure. In contrast, our theory is applicable to all these functions. More importantly, we aim to address the following issue, which from a computational perspective is of more practical significance.
Namely, given a loss function and a regularizing/penalty function, can one identify suitable weights combining these two criteria so that a d-stationary solution of the resulting nonconvex, nonsmooth optimization problem has some minimizing properties? Once this is done, the latter property then allows further analysis of the solution that includes error bounds from the ground truth in terms of some known statistical estimates. Our analysis is supported by numerical algorithms that can compute such a stationary solution, based on the recent work of [4]. Throughout, the dc property of the functions involved is the backbone of our results. Although this property is not explicitly stated in [33], one of the assumptions imposed therein on the univariate regularizer implies that it is a dc function. In summary, while there exists previous analysis of the bicriteria optimization problem with sparsity functions, our work contributes to a deeper understanding of the problem, particularly for nondifferentiable sparsity functions. The principal goal of our analysis is to address the optimality and sparsity properties of a directional stationary solution of the problem that can be computed in practice using a dc-based optimization algorithm [4]. Another noteworthy point is that we attempt to deal with an analysis sufficiently broad to be applicable to a host of loss and penalty functions, rather than focusing on specialized problems whose generalizations to other formulations may not be immediate.

2. The unified DC program of statistical learning. We consider the following Lagrangian form of the bicriteria statistical learning optimization problem for selecting the model variable $w \in \mathbb{R}^m$:

(1) $\underset{w \in W}{\text{minimize}} \; Z_\gamma(w) \triangleq \ell(w) + \gamma\, P(w),$

where $\gamma$ is a positive scalar parameter balancing the loss (or residual) function $\ell(w)$ and the model control (also called penalty) function $P(w)$, and $W$ is a closed convex (typically polyhedral) set in $\mathbb{R}^m$ constraining the model variable $w$. The unconstrained

case where $W = \mathbb{R}^m$ is of significant interest in many applications. Throughout the paper, we assume that

(A$_0$) $\ell(w)$ is a convex function on the closed convex set $W$, and $P(w)$ is a difference-of-convex (dc) function given by

(2) $P(w) \triangleq g(w) - h(w)$, with $h(w) \triangleq \max_{1 \leq i \leq I} h_i(w)$ for some integer $I > 0$,

where $g$ is convex but not necessarily differentiable and each $h_i$ is convex and differentiable. Thus the concave summand of $P(w)$, i.e., $-h(w)$, is the negative of a pointwise maximum of finitely many convex differentiable functions; as such, $h(w)$ is piecewise differentiable (PC$^1$) according to the definition of such a piecewise smooth function [16, Definition 4.5.1]. (Specifically, a continuous function $f(w)$ is PC$^1$ on an open domain $\Omega$ if there exist finitely many C$^1$ functions $\{f_i\}_{i=1}^M$ for some positive integer $M$ such that $f(w) \in \{f_i(w)\}_{i=1}^M$ for all $w \in \Omega$. If each $f_i$ is an affine function, we say that $f$ is piecewise affine on $\Omega$.) In many (inexact) surrogate sparsity functions, the function $P(w)$ is separable in its arguments; i.e., $P(w) = \sum_{i=1}^m p_i(w_i)$, where each $p_i(w_i)$ is a univariate dc function whose concave summand is the negative of a pointwise maximum of finitely many convex differentiable (univariate) functions, i.e., the univariate analogue of (2). It is important to note that while it has been recognized in the literature (e.g., [2, 4, 31, 50]) that several classes of sparsity functions can be formulated as dc functions, the particular form (2) of the function $h(w)$ has not been identified in the general dc approach to sparsity optimization. Our work herein exploits this piecewise structure profitably. Since every convex function can be represented as the pointwise maximum of a family of affine functions, possibly a continuum of them, it is natural to ask if the results in our work can be extended to a general convex function $h$.
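The structure (2) can be made concrete with the capped-$\ell_1$ penalty: $\lambda \min(|t|, a) = \lambda |t| - \max\{\lambda(t-a),\, \lambda(-t-a),\, 0\}$, i.e., a nonsmooth convex $g$ minus a pointwise maximum of $I = 3$ affine (hence differentiable) pieces. A minimal sketch with illustrative parameter values (not from the paper):

```python
lam, a = 1.0, 0.5   # hypothetical penalty parameters

# capped-l1 in the DC form (2): g convex nonsmooth, h = max of affine pieces
g = lambda t: lam * abs(t)
hs = [lambda t: lam * (t - a),     # h_1
      lambda t: lam * (-t - a),    # h_2
      lambda t: 0.0]               # h_3

def h(t):
    return max(hi(t) for hi in hs)

def P(t):
    # P(t) == lam * min(|t|, a)
    return g(t) - h(t)

def active_set(t, tol=1e-12):
    # the index set A(t) of pieces attaining the max, as used below in (4)
    ht = h(t)
    return [i for i, hi in enumerate(hs) if abs(hi(t) - ht) <= tol]

# P(0.2) == 0.2 and P(1.0) == 0.5; at the kink t == a, pieces 0 and 2 are active
```

The active set is a singleton wherever the penalty is smooth and has several elements exactly at the kinks, which is where the distinction between critical and d-stationary points (discussed in section 2.2) arises.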
A full treatment of this question is regrettably beyond the scope of this paper, which is aimed at addressing problems in sparsity optimization and not general dc programs.

2.1. A model-control constrained formulation. With two criteria, the loss $\ell(w)$ and model control $P(w)$, there are two alternative optimization formulations for the choice of the model variable $w$; one of these constrains the latter function while minimizing the former; this is in contrast to balancing the two criteria using a weighing parameter $\gamma$ in a single combined objective function to be minimized. Specifically, given a prescribed scalar $\beta > 0$, consider

(3) $\underset{w \in W}{\text{minimize}} \; \ell(w)$ subject to $P(w) \leq \beta$,

which we call the model control-constrained version of (1). Since both (1) and (3) are nonconvex problems and involve nondifferentiable functions, the connections between them are not immediately clear. For one thing, from a computational point of view, it is not practical to relate the two problems via their global minima; the reason is simple: such minima are computationally intractable. Instead, one needs to relate these problems via their respective computable solutions. Toward this end, it seems reasonable to relate the d(irectional)-stationary solutions of (1) to the B(ouligand)-stationary solutions of (3), as both kinds of solutions can be computed by the majorization/linearization algorithms described in [4] and they are the sharpest kind of stationary solutions for directionally differentiable optimization problems. Before presenting the details of this theory relating the two problems (1) and (3), we note

that a third formulation can be introduced wherein one minimizes the penalty function $P(w)$ while constraining the loss function $\ell(w)$ not to exceed a prescribed tolerance. As the loss function $\ell(w)$ is assumed to be convex in our treatment, the latter formulation is a convex constrained dc program; this is unlike (3), which is a dc constrained dc program, a problem that is considerably more challenging than a convex constrained program. Thus we will omit this loss function constrained formulation and focus only on relating the Lagrangian formulation (1) and the penalty constrained formulation (3).

2.2. Stationary solutions. With $W$ being a closed convex set, a vector $w^* \in W$ is formally a d-stationary solution of (1) if the directional derivative

$Z_\gamma'(w^*; w - w^*) \triangleq \lim_{\tau \downarrow 0} \frac{Z_\gamma(w^* + \tau(w - w^*)) - Z_\gamma(w^*)}{\tau} \geq 0 \quad \forall\, w \in W.$

Letting $\widehat{W}_\beta \triangleq \{ w \in W \mid P(w) \leq \beta \}$ be the (nonconvex) feasible set of (3), we say that a vector $\bar{w} \in \widehat{W}_\beta$ is a B-stationary solution of this problem if $\ell'(\bar{w}; dw) \geq 0$ for all $dw \in T(\widehat{W}_\beta; \bar{w})$, where $T(\widehat{W}_\beta; \bar{w})$ is the tangent cone of $\widehat{W}_\beta$ at $\bar{w}$; the latter cone consists of all vectors $dw$ for which there exist a sequence of vectors $\{w^k\} \subset \widehat{W}_\beta$ converging to $\bar{w}$ and a sequence of positive scalars $\{\tau_k\}$ converging to zero such that $dw = \lim_{k \to \infty} \frac{w^k - \bar{w}}{\tau_k}$. This paper focuses on the derivation of optimality and sparsity properties of directional stationary solutions of the problem (1). Through the connections with the constrained formulation (3) established below, the obtained results for (1) can be adopted to the former problem. A natural question is why we choose to focus on directional stationary solutions rather than stationary solutions of other kinds, such as a critical point for dc programs [4, 31, 44, 45]. For the reasons given in [4, section 3.3] (see, in particular, the summary of implications therein), directional stationary solutions are the sharpest kind among stationary solutions, in the sense that a directional stationary solution must be stationary according to the other definitions of stationarity.
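The gap between criticality and d-stationarity can be seen on a hypothetical one-dimensional dc function not taken from this paper, $Z(t) = t^2 - |t|$ with $g(t) = t^2$ and $h(t) = |t| = \max(t, -t)$: the origin is a critical point (zero lies in the difference of the subdifferentials), yet $Z'(0; d) = -|d| < 0$ for every direction $d \neq 0$, so it is not d-stationary; the d-stationary points $t = \pm 1/2$ are the global minimizers:

```python
def Z(t):
    # simple DC function: g(t) = t^2 (smooth convex) minus h(t) = |t|
    return t * t - abs(t)

def dir_deriv(t, d, tau=1e-8):
    # one-sided finite-difference approximation of Z'(t; d)
    return (Z(t + tau * d) - Z(t)) / tau

# at t = 0 both one-sided directional derivatives are about -1 (descent
# in every direction), so t = 0 is critical but not d-stationary;
# at t = 0.5 both are about 0, and Z(0.5) == -0.25 is the global minimum
```

This is precisely the "active pieces" phenomenon: at $t = 0$ both pieces $h_1(t) = t$ and $h_2(t) = -t$ are active, and a critical point certifies descent against only one of them.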
Moreover, as shown in Proposition 2.1 below and Proposition 3.1 later, d-stationary solutions possess minimizing properties that are not in general satisfied by stationary solutions of other kinds. In essence, the reason for the sharpness of d-stationarity is that it captures all the active pieces, as described by the index set $A(w^*)$ at the point under consideration $w^*$; see (4). In contrast, a critical point of a dc program fails to capture all the active pieces. Thus any property that a critical point might have may not be applicable to all the active pieces. Our first result is a characterization of a d-stationary solution of (1).

Proposition 2.1. Under assumption (A$_0$), a vector $w^* \in W$ is a d-stationary solution of (1) if and only if

(4) $w^* \in \underset{w \in W}{\arg\min}\; \underbrace{\ell(w) + \gamma \left\{ g(w) - \nabla h_i(w^*)^T (w - w^*) \right\}}_{\text{convex function in } w} \quad \forall\, i \in A(w^*),$

where $A(w^*) \triangleq \{ i \mid h_i(w^*) = h(w^*) \}$. Moreover, if the function $h$ is piecewise affine, then any such d-stationary solution must be a local minimizer of $Z_\gamma$ on $W$.

Proof. It suffices to note that $h'(w^*; dw) = \max_{i \in A(w^*)} \nabla h_i(w^*)^T dw$ for all vectors $dw \in \mathbb{R}^m$. The last assertion about the local minimizing property follows readily

from the fact [16, Exercise ] that we have $h(w) = h(w^*) + h'(w^*; w - w^*)$ for any piecewise affine function $h$, provided that $w$ is sufficiently near $w^*$.

Remark. The last assertion in the above proposition generalizes a result in [30], where an additional differentiability assumption was imposed.

The characterization of a B-stationary point of (3) is less straightforward, the reason being the nonconvexity of the feasible set $\widehat{W}_\beta$, which makes it difficult to unveil the structure of the tangent cone $T(\widehat{W}_\beta; \bar{w})$. If $P(\bar{w}) < \beta$, then $T(\widehat{W}_\beta; \bar{w}) = T(W; \bar{w})$. The following result shows that this case is not particularly interesting when it pertains to the stationarity analysis of $\bar{w}$.

Proposition 2.2. Under assumption (A$_0$), if $\bar{w} \in \widehat{W}_\beta$ is a B-stationary point of (3) such that $P(\bar{w}) < \beta$, then $\bar{w}$ is a global minimizer of (3).

Proof. As noted above, the fact that $P(\bar{w}) < \beta$ implies $T(\widehat{W}_\beta; \bar{w}) = T(W; \bar{w})$; hence $\bar{w}$ satisfies $\ell'(\bar{w}; dw) \geq 0$ for all $dw \in T(W; \bar{w})$. Since $\ell$ is a convex function and $W$ is a convex set, it follows that $\bar{w}$ is a global minimizer of $\ell$ on $W$. In turn, this implies that $\bar{w}$ is a global minimizer of $\ell$ on $\widehat{W}_\beta$ because $\widehat{W}_\beta$ is a subset of $W$.

To understand the cone $T(\widehat{W}_\beta; \bar{w})$ when $P(\bar{w}) = \beta$, we need certain constraint qualifications (CQs), under which it becomes possible to obtain a necessary and sufficient condition for a B-stationary point of (3) similar to that in Proposition 2.1 for a d-stationary point of (1). We list two such CQs as follows, one of which is a pointwise condition while the other pertains to the entire set $\widehat{W}_\beta$. The pointwise Slater CQ is said to hold at $\bar{w} \in \widehat{W}_\beta$ satisfying $P(\bar{w}) = \beta$ if $d \in T(W; \bar{w})$ exists such that $g'(\bar{w}; d) < \nabla h_j(\bar{w})^T d$ for all $j \in A(\bar{w})$. The piecewise affine CQ (PACQ) is said to hold for $\widehat{W}_\beta$ if $W$ is a polyhedron and the function $g$ is piecewise affine. (This CQ implies neither the convexity nor the piecewise polyhedrality of $\widehat{W}_\beta$ because no linearity is assumed for the functions $h_i$.) Under these CQs, we have the following characterization of a B-stationary point of the penalty constrained problem (3).
See [4] for a proof.

Proposition 2.3. Under assumption (A$_0$), if either the pointwise Slater CQ holds at $\bar{w}$ or the PACQ holds for the set $\widehat{W}_\beta$, then $\bar{w}$ is a B-stationary point of (3) if and only if $\ell'(\bar{w}; dw) \geq 0$ for all $dw \in \widehat{C}_j(\bar{w}) \triangleq \{ d \in T(W; \bar{w}) \mid g'(\bar{w}; d) \leq \nabla h_j(\bar{w})^T d \}$ and every $j \in A(\bar{w})$.

With the above two propositions, we can formally relate the two problems (1) and (3) as follows.

Proposition 2.4. Under assumption (A$_0$), the following two statements hold.

(a) If $w^*$ is a d-stationary solution of (1) for some $\gamma \geq 0$, and if the pointwise Slater CQ holds at $w^*$ for the set $\widehat{W}_\beta$ where $\beta \triangleq P(w^*)$, then $w^*$ is a B-stationary solution of (3).

(b) If $\bar{w}$ is a B-stationary solution of (3) for some $\beta > 0$, and if the pointwise Slater CQ holds at $\bar{w}$ for the set $\widehat{W}_\beta$, then the following two statements hold: (i) for each $j \in A(\bar{w})$, a scalar $\gamma_j \geq 0$ exists such that $\bar{w}$ is a d-stationary solution of

(5) $\underset{w \in W}{\text{minimize}}\; \left[\, \ell(w) + \gamma_j \left( g(w) - h_j(w) \right) \right];$

(ii) nonnegative scalars $\bar{\gamma}$ and $\{\tilde{\gamma}_j\}_{j=1}^I$ exist with $\sum_{j=1}^I \tilde{\gamma}_j = 1$ and $\tilde{\gamma}_j \left( h(\bar{w}) - h_j(\bar{w}) \right) =$

$0$ for all $j$, such that $\bar{w}$ is a d-stationary solution of

$\underset{w \in W}{\text{minimize}}\; \ell(w) + \bar{\gamma} \left( g(w) - \sum_{j=1}^I \tilde{\gamma}_j h_j(w) \right).$

Finally, statements (a) and (b) remain valid if the PACQ holds instead of the pointwise Slater CQ.

Proof. If $w^*$ is a d-stationary solution of (1), then

$\ell'(w^*; w - w^*) + \gamma \left[ g'(w^*; w - w^*) - \nabla h_i(w^*)^T (w - w^*) \right] \geq 0$

for all $w \in W$ and all $i \in A(w^*)$. Let $dw \in \widehat{C}_j(w^*)$ for some $j \in A(w^*)$. We have $dw = \lim_{k \to \infty} \frac{w^k - w^*}{\tau_k}$ for some sequence $\{w^k\} \subset W$ converging to $w^*$ and a sequence $\{\tau_k\} \subset \mathbb{R}_{++}$ converging to zero. For each $k$, we have

$\ell'(w^*; w^k - w^*) + \gamma \left[ g'(w^*; w^k - w^*) - \nabla h_j(w^*)^T (w^k - w^*) \right] \geq 0.$

Dividing by $\tau_k$ and letting $k \to \infty$, we deduce

$\ell'(w^*; dw) + \gamma \left[ g'(w^*; dw) - \nabla h_j(w^*)^T dw \right] \geq 0,$

which implies $\ell'(w^*; dw) \geq 0$ because the term in the square bracket is nonpositive. This proves parts (a) and (c).

To prove (b), let $\bar{w}$ be as given. By Proposition 2.3, it follows that for every $j \in A(\bar{w})$, $\ell'(\bar{w}; dw) \geq 0$ for all $dw \in \widehat{C}_j(\bar{w}) \triangleq \{ d \in T(W; \bar{w}) \mid g'(\bar{w}; d) \leq \nabla h_j(\bar{w})^T d \}$. Thus $dw = 0$ is a minimizer of $\ell'(\bar{w}; dw)$ for $dw \in \widehat{C}_j(\bar{w})$. Since the latter set is convex and satisfies the Slater CQ, a standard result in nonlinear programming duality theory (see, e.g., [2]) yields the existence of a scalar $\gamma_j \geq 0$ such that $dw = 0$ is a minimizer of $\ell'(\bar{w}; dw) + \gamma_j \left[ g'(\bar{w}; dw) - \nabla h_j(\bar{w})^T dw \right]$ on $T(W; \bar{w})$. This proves statement (i) in (b). Adding up the inequalities

$\ell'(\bar{w}; dw) + \gamma_j \left[ g'(\bar{w}; dw) - \nabla h_j(\bar{w})^T dw \right] \geq 0, \quad dw \in T(W; \bar{w}),$

for $j \in A(\bar{w})$, we deduce

$\ell'(\bar{w}; dw) + \bar{\gamma} \left[ g'(\bar{w}; dw) - \sum_{j \in A(\bar{w})} \tilde{\gamma}_j\, \nabla h_j(\bar{w})^T dw \right] \geq 0, \quad dw \in T(W; \bar{w}),$

where $\bar{\gamma} \triangleq \frac{1}{|A(\bar{w})|} \sum_{j \in A(\bar{w})} \gamma_j$ and $\tilde{\gamma}_j \triangleq \frac{\gamma_j}{\sum_{j' \in A(\bar{w})} \gamma_{j'}}$. This establishes (b).

Part (b) of Proposition 2.4 falls short of establishing the converse of part (a), i.e., the full equivalence of the two families of problems $\{(1)\}_{\gamma \geq 0}$ and $\{(3)\}_{\beta \geq 0}$ in terms of their d-stationary solutions and B-stationary solutions, respectively. This adds to the known challenges of the latter dc-constrained family of dc problems over the former convex constrained dc programs.
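The averaging step at the end of the proof of Proposition 2.4(b) is elementary arithmetic and can be checked on hypothetical multipliers (the values below are made up for illustration; pieces not in the active set receive weight zero):

```python
# hypothetical multipliers gamma_j for the active pieces A(w_bar) = {0, 2}
gammas = {0: 2.0, 2: 1.0}

# gamma_bar = (1/|A|) * sum of gamma_j over active j
gamma_bar = sum(gammas.values()) / len(gammas)

# normalized weights gamma_tilde_j = gamma_j / sum(gamma_j), summing to 1
total = sum(gammas.values())
weights = {j: gj / total for j, gj in gammas.items()}

# gamma_bar == 1.5 and weights == {0: 2/3, 2: 1/3}; note that
# gamma_bar * weights[j] == gammas[j] / len(gammas), which is exactly the
# coefficient produced by summing the |A| inequalities and dividing by |A|
```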
In the next section, we focus on the Lagrangian formulation (1) only and rely on the connections established in this section to obtain parallel results for the constrained formulation (3). Proposition 2.4 complements the penalty results in dc programming initially studied by [46] and recently expanded in [5, 4]. These previous penalty results address the question of when a fixed constrained problem (3) has an exact penalty formulation

as the problem (1) for sufficiently large $\gamma$, in terms of their global minima. Furthermore, in the case of a quadratic loss function, the reference [5] derives a lower bound on the scalar $\gamma$ for the penalty formulation to be exact. In contrast, allowing for a general convex loss function, Proposition 2.4 deals with stationary solutions, which from a computational perspective are more reasonable, as results pertaining to global minima lack pragmatism and should at best be regarded as providing only theoretical insights and conceptual evidence about the connection between the two formulations. To close this subsection, we give a result pertaining to the case where the function $h$ is piecewise affine; in this case, by Proposition 2.1, every d-stationary solution of (1) must be a local minimizer. Combining this fact with Proposition 2.4(b), we can establish a similar result for the problem (3).

Proposition 2.5. Let assumption (A$_0$) hold with each $h_i$ additionally being affine. If $\bar{w}$ is a B-stationary solution of (3) for some $\beta > 0$ such that $P(\bar{w}) = \beta$, and if either the pointwise Slater CQ holds at $\bar{w}$ for the set $\widehat{W}_\beta$ or the PACQ holds for the same set, then $\bar{w}$ is a local minimizer of (3).

Proof. By Proposition 2.4(b) and (c), for each $j \in A(\bar{w})$, a scalar $\gamma_j \geq 0$ exists such that $\bar{w}$ is a d-stationary solution, thus minimizer, of (5), which is a convex program because $h_j$ is affine. To complete the proof, let $w \in \widehat{W}_\beta$ be sufficiently close to $\bar{w}$ so that $A(w) \subseteq A(\bar{w})$. Let $j \in A(w)$. We then have

$\ell(\bar{w}) = \ell(\bar{w}) + \gamma_j P(\bar{w}) - \gamma_j \beta$
$= \ell(\bar{w}) + \gamma_j \left( g(\bar{w}) - h_j(\bar{w}) \right) - \gamma_j \beta$ because $j \in A(\bar{w})$
$\leq \ell(w) + \gamma_j \left( g(w) - h_j(w) \right) - \gamma_j \beta$ by the minimizing property of $\bar{w}$
$= \ell(w) + \gamma_j \left( P(w) - \beta \right) \leq \ell(w)$ because $j \in A(w)$ and $P(w) \leq \beta$.

This establishes that $\bar{w}$ is a local minimizer of (3).

3. Minimizing and sparsity properties. In general, a stationary solution of (1) is not guaranteed to possess any minimizing property.
For smooth problems involving twice continuously differentiable functions, it follows from classical nonlinear programming theory that, with an appropriate second-order sufficiency condition, a stationary solution can be shown to be strictly locally minimizing. Such a well-known result becomes highly complicated when the functions are nondifferentiable. Although one could, in principle, apply some generalized second derivatives, e.g., those based on Mordukhovich's nonsmooth calculus [38], it would be preferable in our context to derive a simplified theory that is more readily applicable to particular sparsity functions, such as those to be discussed in sections 5 and 6.

3.1. Preliminary discussion of the main result. In a nutshell, our goal is to extend the characterization of a d-stationary solution of (1) in Proposition 2.1 by showing that, under a set of assumptions, including a specific choice of the convex function $g$, and for a range of values of $\gamma$ to be identified in the analysis, any such nonzero d-stationary solution $w^*$ either has $\|w^*\|_0$ bounded above by a scalar computable from certain model constants, or

(6) $w^* \in \underset{w \in W}{\arg\min}\; \underbrace{\ell(w) + \gamma \left( g(w) - h_i(w) \right)}_{\text{remains dc}} \quad \forall\, i \in A(w^*);$

see Theorem 3.2 in subsection 3.4 for details. In terms of the original function $Z_\gamma$, the above condition implies a restricted global optimality property of $w^*$; namely,

(7) $w^* \in \underset{w \in W^*}{\arg\min}\; Z_\gamma(w)$, where $W^* \triangleq \{ w \in W \mid A(w) \cap A(w^*) \neq \emptyset \}$.

To see this implication, it suffices to note that if $w \in W^*$, then letting $i$ be a common index in $A(w) \cap A(w^*)$, we have

$Z_\gamma(w) - Z_\gamma(w^*) = \left[ \ell(w) + \gamma \left( g(w) - h_i(w) \right) \right] - \left[ \ell(w^*) + \gamma \left( g(w^*) - h_i(w^*) \right) \right] \geq 0.$

Borrowing a terminology from piecewise programming, we say that the vectors $w$ and $w^*$ share a common piece if $A(w) \cap A(w^*) \neq \emptyset$. Thus the restricted global optimality property (7) of $w^*$ says that $w^*$ is a true global minimizer of $Z_\gamma(w)$ among those vectors $w \in W$ that share a common piece with $w^*$. The subset $W^*$ includes a neighborhood of $w^*$ in $W$; i.e., for a suitable scalar $\delta > 0$, $B(w^*; \delta) \cap W \subseteq W^*$, where $B(w^*; \delta)$ is a Euclidean ball centered at $w^*$ with radius $\delta > 0$. Thus, the optimality property (6) implies in particular that $w^*$ is a local minimizer of $Z_\gamma$ on $W$. Another consequence of (6) is that if $I = 1$ (so that $h$ is convex and differentiable), then $w^*$ is a global minimizer of $Z_\gamma$ on $W$. Besides the above special cases, our main result applies to many well-known sparsity functions in the statistical learning literature, plus the relatively new and largely unexplored class of exact K-sparsity functions to be discussed later.

3.2. The assumptions. The analysis requires two main sets of assumptions: one on the model functions $\ell(w)$ and $\{h_i(w)\}_{i=1}^I$ and the other on the constraint set $W$. We first state the former set of assumptions, which introduce the key model constants $\mathrm{Lip}_{\nabla\ell}$, $\lambda_\ell$, $\mathrm{Lip}_h$, and $\zeta$. In principle, we could localize these assumptions around a given d-stationary solution; instead, our intention in stating these assumptions globally on the set $W$ is to use the model constants to identify the parameter $\gamma$ that defines the optimization problem (1).
(A) In addition to (A_0):

(A_Lℓ) the loss function ℓ(w) is of class LC¹ (for continuously differentiable with a Lipschitz gradient) on W; i.e., a positive scalar Lip_∇ℓ exists such that for all w and w′ in W,

(8)   ‖ ∇ℓ(w) − ∇ℓ(w′) ‖ ≤ Lip_∇ℓ ‖ w − w′ ‖;

(A_cvx) a nonnegative constant λ exists such that

(9)   ℓ(w) − ℓ(w′) − ∇ℓ(w′)ᵀ( w − w′ ) ≥ (λ/2) ‖ w − w′ ‖²   for all w and w′ in W;

(A_Lh) nonnegative constants Lip_∇h and β exist such that for each i ∈ {1, ..., I} and all w and w′ in W, with 0/0 defined to be zero,

(10)   0 ≤ h_i(w) − h_i(w′) − ∇h_i(w′)ᵀ( w − w′ ) ≤ ( Lip_∇h/2 + β/‖w′‖ ) ‖ w − w′ ‖²;

(A_Bh)   max_{1≤i≤I} sup_{w ∈ W} ‖ ∇h_i(w) ‖ ≜ ζ < ∞.

By the mean-value theorem for multivariate functions [41, 3.2.1], we derive from the inequality (8),

ℓ(w) − ℓ(w′) − ∇ℓ(w′)ᵀ( w − w′ ) ≤ ( Lip_∇ℓ/2 ) ‖ w − w′ ‖²   for all w and w′ in W.

Thus Lip_∇ℓ ≥ λ. When ℓ(w) = ½ wᵀQw + qᵀw is a strongly convex quadratic function with a symmetric positive definite matrix Q, the ratio

λ / Lip_∇ℓ = λ_min(Q) / λ_max(Q),

where the numerator and denominator of the right-side ratio are the smallest and largest eigenvalues of Q, respectively, is exactly the reciprocal of the condition number of the matrix Q.

We discuss a bit about the two assumptions (A_cvx) and (A_Lh). The constant λ expresses the convexity of the loss function ℓ, and the strong convexity when λ > 0. For certain nonpolyhedral sparsity functions, this constant needs to be positive in order for interesting results to be obtained. One may argue that the latter positivity condition may be too restrictive to be satisfied in applications. For instance, in sparse linear regression where ℓ(w) = ½ ‖Aw − b‖², the matrix A can be expected to be column rank deficient, so that ℓ is only convex but not strongly so. For this problem, the results below are applicable to the regularized loss function ℓ_α(w) ≜ ℓ(w) + α‖w‖² for any positive scalar α. More generally, any ℓ2-regularized convex function is strongly convex and thus satisfies condition (A_cvx) with a positive λ. Another way to soften the positivity of λ satisfying (9) is to require that this inequality (with a positive λ) hold only on a subset of W, e.g., on the set L* ∩ W, where

L* ≜ { w ∈ R^m : w_i = 0 whenever w*_i = 0 }.

In this case, the minimizing property of w* will be restricted to the set L* ∩ W. It turns out that if the functions h_i are affine, so that the function h(w) ≜ max_{1≤i≤I} h_i(w) is piecewise affine (and convex), the main result could hold for λ = 0. Yet another localization of the global inequality (9) is to assume that the Hessian ∇²ℓ(w*) exists and is positive definite (either on the entire space R^m or on the subspace L*). Under such a pointwise strong convexity condition (via the positive definiteness of the Hessian matrix), one obtains a locally strictly minimizing property of w*.
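The regularization remedy just described is easy to see numerically. The snippet below is only an illustrative sketch, not taken from the paper; the matrix A and the constant α are arbitrary. For a rank-deficient least-squares loss, the modulus λ in (9) is the smallest eigenvalue of the Hessian, and adding α‖w‖² lifts every eigenvalue by 2α.

```python
import numpy as np

# Rank-deficient least-squares loss l(w) = 0.5*||A w - b||^2 has Hessian Q = A^T A;
# the constants in (8)-(9) are Lip = lambda_max(Q) and lambda = lambda_min(Q).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
A[:, 2] = A[:, 0] + A[:, 1]                 # force column rank deficiency
Q = A.T @ A
eigs = np.linalg.eigvalsh(Q)                # eigenvalues in ascending order
lam, Lip = eigs[0], eigs[-1]
print(lam < 1e-8)                           # True: only convex, not strongly convex

# l_alpha(w) = l(w) + alpha*||w||^2 has Hessian Q + 2*alpha*I, so its smallest
# eigenvalue, i.e., the new modulus lambda, is lifted by 2*alpha.
alpha = 0.1
lam_reg = np.linalg.eigvalsh(Q + 2 * alpha * np.eye(3))[0]
print(lam_reg >= 2 * alpha - 1e-9)          # True: (A_cvx) now holds with lambda = 2*alpha
```

This mirrors the observation above: ℓ itself fails strong convexity, while ℓ_α satisfies (A_cvx) with λ = 2α.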
The upshot of this discussion is that, in general, if one desires a strong minimizing property to hold on the entire set W, then the inequality (9) with a positive λ is essential to compensate for the nonconvexity of the sparsity function P when h_i is nonaffine; relaxation of the condition (9) with a positive λ is possible either with a piecewise linear function h or at the expense of weakening the minimizing conclusion when the functions h_i are not affine.

As we will see in the subsequent sections, assumption (A_Lh) accommodates a host of convex functions h_i. Foremost is the case when the functions h_i are of class LC¹; in this case, we may take β = 0 and Lip_∇h to be the largest of the Lipschitz moduli of the gradients {∇h_i(w)}_{i=1}^I on W. In particular, when each function h_i is affine, we may further take Lip_∇h to be zero or any positive scalar. Assumption (A_Lh) also includes the case where I = 1 and h(w) = ‖w‖_2. Indeed, this follows from the following inequality: provided w′ ≠ 0,

‖w‖_2 − ‖w′‖_2 − ( w′/‖w′‖_2 )ᵀ ( w − w′ ) ≤ ( 1/‖w′‖_2 ) ‖ w − w′ ‖²,

which is equivalent to

‖w′‖_2 ‖w‖_2 − ‖w′‖²_2 − (w′)ᵀ( w − w′ ) ≤ ‖ w − w′ ‖².

The left-hand side is equal to ‖w′‖‖w‖ − (w′)ᵀw. We have

‖ w − w′ ‖² − [ ‖w′‖‖w‖ − (w′)ᵀw ] = ‖w‖² − (w′)ᵀw + ‖w′‖² − ‖w′‖‖w‖
                                   ≥ ‖w‖² − ‖w′‖‖w‖ + ‖w′‖² − ‖w′‖‖w‖ = [ ‖w‖ − ‖w′‖ ]² ≥ 0,

which shows that we may take Lip_∇h = 0 and β = 1 for the ℓ2-norm function. It is important to remark that, in general, if (10) holds with a zero Lip_∇h, it certainly holds with a positive Lip_∇h. Nevertheless, we refrain from taking this constant as always positive because it affects the selection of the constant γ, as will become clear in the main theorem, Theorem 3.2. Finally, (A_Lh) also accommodates any linear combination of functions each satisfying this assumption. Such a combination may be relevant in problems of group sparsity minimization where one may consider the use of different sparsity functions for different groups.

The second assumption (B) below is on the constraint set W. Consistent with the previous assumptions, we state assumption (B) with respect to an arbitrary vector w ∈ W, making it a global-like condition; however, it will be clear from the analysis that the assumption is most specific to a stationary solution being analyzed. The assumption is satisfied, for example, if W = Π_{i=1}^m W_i, where each W_i is one of three sets: R (for an unrestricted variable) or R_± (for problems with sign restrictions on the variables w_i). Thus our results are applicable in particular to sign-constrained problems, a departure from the literature, which typically either deals with unconstrained problems or makes an interiority assumption on w* that essentially converts the analysis to the unconstrained case.

(B) For a given vector w ∈ W, there exists ε̄ > 0 such that for all ε ∈ (0, ε̄] and for every i with w_i ≠ 0, the vectors w^{±ε;i} defined below belong to W:

w_j^{±ε;i} ≜ { w_j ± ε if j = i;  w_j if j ≠ i },   j = 1, ..., m.

While our analysis is applicable to a general convex function g, a more interesting case is when g is a weighted ℓ1-function given by

(11)   g(w) ≜ Σ_{i=1}^m ξ_i |w_i|,   w ∈ R^m,

for some positive constants ξ_i.
This choice of the function g was employed in the study [31] of dc approximations to sparsity minimization; Proposition 6 in this reference shows that a broad class of separable sparsity functions in the statistical learning literature has a dc decomposition with the above function g as the convex part. For this choice of g, if η ∈ ∂g(w), with ∂g(w) denoting the subdifferential of g at w, then

‖η_0‖ = √( Σ_{i : w_i ≠ 0} ξ_i² ) ≥ ( min_{1≤i≤m} ξ_i ) √‖w‖_0,

where η_0 denotes the subvector of η with components corresponding to the nonzero components of w. Let ξ_min ≜ min_{1≤i≤m} ξ_i, so that ‖η_0‖ ≥ ξ_min √‖w‖_0 for all w ∈ R^m.
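The subgradient bound above is easy to verify numerically. The following is a minimal sketch with arbitrary weights ξ_i and an arbitrary vector w (neither comes from the paper):

```python
import numpy as np

xi = np.array([0.5, 1.0, 2.0, 3.0])       # positive weights in g(w) = sum_i xi_i*|w_i|
w = np.array([1.3, 0.0, -0.7, 2.1])

support = w != 0
eta0 = xi[support] * np.sign(w[support])   # subgradient entries of g on the support of w
xi_min = xi.min()
w0 = int(support.sum())                    # ||w||_0, the number of nonzeros

lhs = np.linalg.norm(eta0)                 # ||eta_0|| = sqrt of sum of xi_i^2 over the support
rhs = xi_min * np.sqrt(w0)
print(lhs >= rhs)  # True
```

On the support of w the subgradient entries are exactly ξ_i sign(w_i), so the inequality reduces to bounding each ξ_i below by ξ_min.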

3.3. The case β = 0. We begin our analysis with a result pertaining to the case where condition (A_Lh) holds with β = 0. This is the case where each h_i is of class LC¹. The special property (B) on W is not needed in this case; also, no assumption is imposed on g except for its convexity.

Proposition 3.1. Suppose that (A_cvx) and (A_Lh) with β = 0 hold. For any scalar γ > 0 such that δ ≜ λ − γ Lip_∇h ≥ 0, the following two statements hold:

(a) If w* is a d-stationary point of Z_γ on W, then for all i ∈ A(w*),

[ ℓ(w) + γ ( g(w) − h_i(w) ) ] − [ ℓ(w*) + γ ( g(w*) − h_i(w*) ) ] ≥ (δ/2) ‖ w − w* ‖²   for all w ∈ W.

Hence w* is a minimizer of Z_γ(w) on W̄. Moreover, if δ > 0, then w* is the unique minimizer of ℓ(w) + γ ( g(w) − h_i(w) ) on W for all i ∈ A(w*), hence the unique minimizer of Z_γ(w) on W̄.

(b) If I = 1, then Z_γ is convex on W with modulus δ (thus is strongly convex if δ > 0).

Proof. To prove (a), let w ∈ W be arbitrary. For any i ∈ A(w*), we have

[ ℓ(w) + γ ( g(w) − h_i(w) ) ] − [ ℓ(w*) + γ ( g(w*) − h_i(w*) ) ]
  ≥ [ ℓ(w) + γ ( g(w) − h_i(w) ) ] − [ ℓ(w*) + γ ( g(w*) − h_i(w*) ) ] − [ ∇ℓ(w*)ᵀ( w − w* ) + γ ( g′(w*; w − w*) − ∇h_i(w*)ᵀ( w − w* ) ) ]
  = [ ℓ(w) − ℓ(w*) − ∇ℓ(w*)ᵀ( w − w* ) ] + γ [ g(w) − g(w*) − g′(w*; w − w*) ] − γ [ h_i(w) − h_i(w*) − ∇h_i(w*)ᵀ( w − w* ) ]
  ≥ ½ ( λ − γ Lip_∇h ) ‖ w − w* ‖².

Thus (a) follows. To prove (b), suppose I = 1. We have, for any w and w′ in W,

Z_γ(w) − Z_γ(w′) − Z′_γ(w′; w − w′) ≥ ½ ( λ − γ Lip_∇h ) ‖ w − w′ ‖².

Statement (b) follows readily from the above inequality.

Statement (b) of the above result is, in general, not true if I > 1, as illustrated by the univariate function Z_γ(t) = t² − γ|t| = t² − γ max( t, −t ). (This univariate function fits our framework with g(t) = |t| and h(t) = 2|t| = max( 2t, −2t ).) It can be seen that for any γ > 0 this function is not convex, because

Z_γ(0) = 0 > −γ²/4 = ½ [ Z_γ(γ/2) + Z_γ(−γ/2) ].

Nevertheless, this function has two stationary points at ±γ/2 that are both global minima of Z_γ. This simple result illustrates a couple of noteworthy points. First, the convexity of the function Z_γ cannot be expected for I > 1 even though the loss function may be strongly convex.
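The two claims about this univariate example, the failure of the midpoint convexity inequality and the fact that both stationary points attain the global minimum, can be confirmed numerically. A small sketch (the value of γ and the search grid are arbitrary choices, not from the paper):

```python
import numpy as np

gamma = 2.0
Z = lambda t: t**2 - gamma * abs(t)       # Z_gamma(t) = t^2 - gamma*|t|

# Failure of the midpoint inequality at 0 shows Z_gamma is not convex:
midpoint_gap = Z(0.0) - 0.5 * (Z(gamma / 2) + Z(-gamma / 2))
print(midpoint_gap)                       # 1.0, i.e., gamma^2/4 > 0

# Both stationary points +-gamma/2 attain the global minimum -gamma^2/4:
grid = np.linspace(-3.0, 3.0, 600001)
vals = grid**2 - gamma * np.abs(grid)
print(abs(vals.min() - (-gamma**2 / 4)) < 1e-6)  # True
```

The grid search is of course only a sanity check; the exact minimizers ±γ/2 follow from one-sided differentiation of Z_γ.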
Yet it is possible for a d-stationary solution of Z_γ to be a global minimizer. The latter possibility is encouraging and serves as the motivation for the subsequent extended analysis with β > 0, where an upper bound on γ will persist.

3.4. The case of g being a weighted ℓ1 function. In this subsection, we refine the proof of Proposition 3.1 by taking g to be the weighted ℓ1 function (11) and

including the constant β. We have, for all w ∈ W and all i ∈ A(w*),

[ ℓ(w) + γ ( g(w) − h_i(w) ) ] − [ ℓ(w*) + γ ( g(w*) − h_i(w*) ) ]
  ≥ [ ℓ(w) − ℓ(w*) − ∇ℓ(w*)ᵀ( w − w* ) ] + γ [ g(w) − g(w*) − g′(w*; w − w*) ] − γ [ h_i(w) − h_i(w*) − ∇h_i(w*)ᵀ( w − w* ) ]
  ≥ ½ [ λ − γ ( Lip_∇h + 2β/‖w*‖ ) ] ‖ w − w* ‖².

The remaining analysis uses the d-stationarity of w* to establish the nonnegativity of the multiplicative factor

M_γ ≜ λ − γ ( Lip_∇h + 2β/‖w*‖ ).

By the characterization of d-stationarity in Proposition 2.1, there exists a subgradient η^i ∈ ∂g(w*) such that

[ ∇ℓ(w*) + γ ( η^i − ∇h_i(w*) ) ]ᵀ ( w − w* ) ≥ 0   for all w ∈ W.

By assumption (B) on the constraint set W, we may substitute w = w*^{±ε;k} for all k such that w*_k ≠ 0 and for arbitrary ε ∈ (0, ε̄] and obtain

(12)   [ ∇ℓ(w*) + γ ( η^i − ∇h_i(w*) ) ]_k = 0   for all k such that w*_k ≠ 0.

Thus, letting ∇_0 h_i(w*) be the vector ( ∂h_i(w*)/∂w_k )_{k : w*_k ≠ 0}, we have

‖w*‖ Lip_∇ℓ + ‖∇ℓ(0)‖ ≥ ‖ ∇_0 ℓ(w*) ‖ = γ ‖ η^i_0 − ∇_0 h_i(w*) ‖ ≥ γ [ ‖η^i_0‖ − ‖∇_0 h_i(w*)‖ ],

where the first inequality is by the LC¹ property of ℓ and the triangle inequality, which also yields the last inequality. Suppose γ ≥ c ‖∇ℓ(0)‖ for a constant c > 0 to be determined momentarily. We deduce

‖w*‖ Lip_∇ℓ ≥ γ [ ‖η^i_0‖ − ‖∇_0 h_i(w*)‖ − c^{-1} ].

With g being the weighted ℓ1-function (11) and by recalling the scalar ζ in condition (A_Bh), we deduce

(13)   ‖w*‖ Lip_∇ℓ ≥ γ [ ξ_min √‖w*‖_0 − ζ − c^{-1} ].

Moreover, provided that ξ_min √‖w*‖_0 > ζ + c^{-1}, we obtain

γ ≤ ‖w*‖ Lip_∇ℓ / [ ξ_min √‖w*‖_0 − ζ − c^{-1} ].

Hence, if √‖w*‖_0 > 2 ( ζ + c^{-1} ) / ξ_min (the factor 2 can be replaced by any constant greater than 1), then γ/‖w*‖ < Lip_∇ℓ / ( ζ + c^{-1} ). Thus

M_γ > λ − γ Lip_∇h − 2β Lip_∇ℓ / ( ζ + c^{-1} ).

Consequently, if

(14)   δ_γ(c) ≜ ½ [ λ − 2β Lip_∇ℓ / ( ζ + c^{-1} ) − γ Lip_∇h ] ≥ 0,

it follows that M_γ ≥ 0. It remains to determine the constant c. To ensure the validity of the above derivations, we need to have

(15)   c ‖∇ℓ(0)‖ Lip_∇h ≤ λ − 2β Lip_∇ℓ / ( ζ + c^{-1} ),

which is equivalent to

q(c) ≜ c² ζ ‖∇ℓ(0)‖ Lip_∇h + c [ ‖∇ℓ(0)‖ Lip_∇h + 2β Lip_∇ℓ − λ ζ ] − λ ≤ 0.

Summarizing the above derivations, we have established the following main result, which does not require further proof.

Theorem 3.2. Under the assumptions in subsection 3.2 with the constants λ, Lip_∇ℓ, Lip_∇h, β, ζ, and ξ_min as given therein, let g be given by (11), and let c > 0 be any constant satisfying q(c) ≤ 0. Let γ > 0 satisfy

(16)   c ‖∇ℓ(0)‖ ≤ γ   and   γ Lip_∇h ≤ λ − 2β Lip_∇ℓ / ( ζ + c^{-1} ).

If w* is a d-stationary solution of Z_γ on W, then either

(17)   ‖w*‖_0 ≤ [ 2 ( ζ + c^{-1} ) / ξ_min ]²,

or for all i ∈ A(w*) and all w ∈ W,

[ ℓ(w) + γ ( g(w) − h_i(w) ) ] − [ ℓ(w*) + γ ( g(w*) − h_i(w*) ) ] ≥ δ_γ(c) ‖ w − w* ‖².

If the above inequality holds, then w* is a minimizer of Z_γ(w) on W̄. Moreover, if δ_γ(c) > 0, then w* is the unique minimizer of ℓ(w) + γ ( g(w) − h_i(w) ) on W for all i ∈ A(w*), hence the unique minimizer of Z_γ(w) on W̄.

Remark. As shown above, q(c) ≤ 0 if and only if (15) holds. With the latter inequality, we can choose γ > 0 so that (16) holds. In turn, the inequalities in (16) implicitly impose a condition on the four model constants λ, Lip_∇ℓ, Lip_∇h, and β, and suggest a choice of γ in terms of them for the theorem to hold.

As a (restricted) global minimizer of Z_γ, w* has the property that for every w ∈ W̄, either ℓ(w*) ≤ ℓ(w) or P(w*) < P(w). In particular, if w* is not a global minimizer of the loss function ℓ on W̄, then we must have P(w*) < P(ŵ), where ŵ is the latter minimizer. In the language of multicriteria optimization, such a vector w* is a Pareto point on W̄ of the two criteria, loss and model control; specifically, there does not exist a w ∈ W̄ such that the pair ( ℓ(w), P(w) ) is componentwise less than or equal to, and not equal to, the pair ( ℓ(w*), P(w*) ). The implication of the (restricted) global minimizing property of a d-stationary point is reminiscent of the class of pseudoconvex functions [36], which, by their definition, are differentiable functions with the property that all their stationary points are global minimizers of the functions. There are two differences, however.
One difference is that the function Z_γ is only directionally differentiable and its stationary solutions are defined in terms of the directional derivatives. Another difference is that when β > 0, a (non)sparsity stipulation (i.e., the failure of (17)) on the d-stationary solution w* is needed for it to be a (restricted) global minimizer.

The following corollary of Theorem 3.2 is worth mentioning. For simplicity, we state the corollary in terms of Z_γ on the subset W̄. No proof is required.

Corollary 3.3. Under the assumptions of Theorem 3.2, suppose that Lip_∇h = 0. Let g be given by (11). Let c > 0 satisfy c [ 2β Lip_∇ℓ − λ ζ ] ≤ λ. For any γ ≥ c ‖∇ℓ(0)‖, if w* is a d-stationary solution of Z_γ on W, then either

(18)   Z_γ(w) − Z_γ(w*) ≥ ½ [ λ − 2β Lip_∇ℓ / ( ζ + c^{-1} ) ] ‖ w − w* ‖²   for all w ∈ W̄,

or ‖w*‖_0 ≤ [ 2 ( ζ + c^{-1} ) / ξ_min ]².
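To make the interplay of the model constants concrete, the helper below evaluates q(c), the interval of admissible γ values in (16), and the sparsity threshold in (17). It is only a sketch; the function name and all numerical constants fed to it are hypothetical, chosen merely to exhibit one feasible configuration.

```python
def theorem_32_quantities(c, lam, Lip_grad_l, Lip_grad_h, beta, zeta, xi_min, grad_l0_norm):
    """Evaluate q(c), the gamma-interval in (16), and the sparsity threshold in (17).

    The arguments are the model constants of subsection 3.2; grad_l0_norm
    stands for the norm of the loss gradient at the origin.
    """
    q = (c**2 * zeta * grad_l0_norm * Lip_grad_h
         + c * (grad_l0_norm * Lip_grad_h + 2 * beta * Lip_grad_l - lam * zeta)
         - lam)
    gamma_lo = c * grad_l0_norm
    rhs = lam - 2 * beta * Lip_grad_l / (zeta + 1 / c)
    gamma_hi = rhs / Lip_grad_h if Lip_grad_h > 0 else float("inf")
    sparsity_threshold = (2 * (zeta + 1 / c) / xi_min) ** 2
    return q, (gamma_lo, gamma_hi), sparsity_threshold

# Hypothetical model constants (not taken from the paper):
q, (g_lo, g_hi), thr = theorem_32_quantities(
    c=0.5, lam=2.0, Lip_grad_l=4.0, Lip_grad_h=1.0,
    beta=0.0, zeta=1.0, xi_min=1.0, grad_l0_norm=1.0)
print(q <= 0 and g_lo <= g_hi)  # True: any gamma in [0.5, 2.0] satisfies (16)
print(thr)                      # 36.0: the sparsity threshold in (17)
```

With these numbers q(c) ≤ 0 holds, so Theorem 3.2 applies for any γ in the printed interval, and the dichotomy is between ‖w*‖_0 ≤ 36 and the restricted global minimizing property.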

We close this section by giving a bound on ‖w*‖_0 based on the inequality (13). The validity of this bound does not require the positivity of λ; nevertheless, the sparsity bound presumes a bound on ‖w*‖; this is in contrast to Theorem 3.2, which makes no assumption on w* except for its d-stationarity.

Proposition 3.4. Suppose that assumptions (A_cvx), (A_Lℓ), and (B) hold. Let g be given by (11). For every γ > c ‖∇ℓ(0)‖ for some scalar c > 0, if w* is a d-stationary solution of Z_γ on W such that

‖w*‖ ≤ ( c ‖∇ℓ(0)‖ / Lip_∇ℓ ) [ ξ_min √(L + 1) − ζ − c^{-1} ]

for some integer L > 0, then ‖w*‖_0 ≤ L.

Proof. Assume for contradiction that ‖w*‖_0 ≥ L + 1. Then w* ≠ 0; hence ξ_min √(L + 1) > ζ + c^{-1}. We have

c ‖∇ℓ(0)‖ [ ξ_min √(L + 1) − ζ − c^{-1} ]
  < γ [ ξ_min √(L + 1) − ζ − c^{-1} ]   (by the choice of γ)
  ≤ γ [ ξ_min √‖w*‖_0 − ζ − c^{-1} ]   (by the assumption on ‖w*‖_0)
  ≤ ‖w*‖ Lip_∇ℓ   (by (13))
  ≤ c ‖∇ℓ(0)‖ [ ξ_min √(L + 1) − ζ − c^{-1} ]   (by the assumption on ‖w*‖).

This contradiction establishes the desired bound on ‖w*‖_0.

4. Sparsity functions and their dc representations. In this and the next section, we investigate the application of the results in section 3 to a host of nonconvex sparsity functions that have been studied extensively in the literature. These nonconvex functions are deviations from the convex ℓ1-norm, which is a well-known convex surrogate for the ℓ0-function. As mentioned in the introduction, we classify these sparsity functions into two categories: the exact ones and the surrogate ones. The next section discusses the class of exact sparsity functions; section 6 discusses the surrogate sparsity functions. In each case, we identify the constants Lip_∇h, β, ζ, and ξ_min of the sparsity functions and present the specialization of the results in the last section to these functions. For convenience, we summarize in Table 1 the sparsity functions being analyzed. The results obtained in the next two sections are summarized in Tables 2 and 3 appended.

5. Exact sparsity functions.
While the class of exact K-sparsity functions has been discussed exclusively in the context of matrix optimization, for the sake of completeness we formally define these functions for vectors and highlight their properties most relevant to our discussion. For an m-vector w = (w_i)_{i=1}^m, let w_[·] ≜ (w_[i])_{i=1}^m be derived from w by ranking the absolute values of its components in nonincreasing order:

max_{1≤i≤m} |w_i| ≜ |w_[1]| ≥ |w_[2]| ≥ ... ≥ |w_[m]| ≜ min_{1≤i≤m} |w_i|;

thus |w_[k]| is the kth largest of the m components of w in absolute value, and { [1], ..., [m] } is an arrangement of the index set { 1, ..., m }. For a fixed positive integer K, let

P_[K](w) ≜ ‖w‖_1 − Σ_{k=1}^K |w_[k]| = Σ_{k=K+1}^m |w_[k]|,   with g_[K](w) ≜ ‖w‖_1 and h_[K](w) ≜ Σ_{k=1}^K |w_[k]|.

Clearly, P_[K](w) = 0 if and only if w has no more than K nonzero components. The convexity of h_[K], thus the dc-property of P_[K], follows from the value-function

representation below:

(19)   h_[K](w) ≜ Σ_{k=1}^K |w_[k]| = maximum_{v ∈ Δ_{K,m}} Σ_{i=1}^m v_i |w_i|,   where Δ_{K,m} ≜ { v ∈ [0, 1]^m : Σ_{i=1}^m v_i = K }.

Since h_[K] is piecewise linear, assumption (A_Lh) holds with Lip_∇h = 0; moreover, we have β = 0. In order to calculate the constant ζ associated with the function h_[K], it would be useful to express h_[K](w) as the pointwise maximum of finitely many linear functions. For this purpose, let E(Δ_{K,m}) be the finite set of extreme points of the polytope Δ_{K,m}.

Table 1
Exact and surrogate penalty functions and their properties.

Penalty         | Difference-of-convex representation g(w) − h(w)                                                | Constants
K-sparsity      | ‖w‖_1 − Σ_{k=1}^K |w_[k]|                                                                      | β = Lip_∇h = 0
ℓ1 − ℓ2         | ‖w‖_1 − ‖w‖_2                                                                                  | β = 1; Lip_∇h = 0
SCAD            | Σ_{i=1}^m λ_i |w_i| − Σ_{i=1}^m h_i(w_i), with h_i(w_i) = 0 if |w_i| ≤ λ_i; ( |w_i| − λ_i )²/( 2(a − 1) ) if λ_i ≤ |w_i| ≤ a λ_i; λ_i |w_i| − (a + 1) λ_i²/2 if |w_i| ≥ a λ_i | β = 0; Lip_∇h = 1/(a − 1)
Capped ℓ1       | Σ_{i=1}^m |w_i|/a_i − Σ_{i=1}^m max{ 0, |w_i|/a_i − 1 }                                        | β = Lip_∇h = 0
Transformed ℓ1  | Σ_{i=1}^m (a_i + 1)|w_i|/a_i − Σ_{i=1}^m [ (a_i + 1)|w_i|/a_i − (a_i + 1)|w_i|/(a_i + |w_i|) ] | β = 0; Lip_∇h = max_{1≤i≤m} 2(a_i + 1)/a_i²
Logarithmic     | Σ_{i=1}^m (λ_i/ε_i)|w_i| − Σ_{i=1}^m λ_i [ |w_i|/ε_i − log( |w_i| + ε_i ) + log ε_i ]          | β = 0; Lip_∇h = max_{1≤i≤m} λ_i/ε_i²

Table 2
Summary of general results; w* is a d-stationary solution.

Result    | Condition on constants                                                   | Conclusions
Thm. 3.2  | q(c) ≤ 0 and c ‖∇ℓ(0)‖ Lip_∇h ≤ γ Lip_∇h ≤ λ − 2β Lip_∇ℓ/(ζ + c^{-1})   | Either ‖w*‖_0 ≤ [ 2(ζ + c^{-1})/ξ_min ]², or w* is a minimizer on W̄, unique if γ Lip_∇h < λ − 2β Lip_∇ℓ/(ζ + c^{-1})
Prop. 3.4 | c ‖∇ℓ(0)‖ < γ                                                            | ‖w*‖ ≤ ( c ‖∇ℓ(0)‖/Lip_∇ℓ ) [ ξ_min √(L + 1) − ζ − c^{-1} ] implies ‖w*‖_0 ≤ L
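Both P_[K] and the value-function representation (19) can be sanity-checked numerically: P_[K] needs nothing more than a sort, and since the maximum in (19) is attained at an extreme point of Δ_{K,m}, i.e., a 0/1 vector with exactly K ones, a brute-force search over K-subsets must reproduce the sum of the K largest absolute entries. A small illustrative sketch (the test vector is arbitrary, not from the paper):

```python
import itertools
import numpy as np

def h_K(w, K):
    # h_[K](w): sum of the K largest |w_i|, computed by sorting
    return np.sort(np.abs(w))[::-1][:K].sum()

def P_K(w, K):
    # Exact K-sparsity function P_[K](w) = ||w||_1 - h_[K](w)
    return np.abs(w).sum() - h_K(w, K)

def h_K_extreme_points(w, K):
    # Value-function form (19), maximized by brute force over the extreme
    # points of Delta_{K,m}: the 0/1 vectors with exactly K ones.
    m = len(w)
    return max(sum(abs(w[i]) for i in S)
               for S in itertools.combinations(range(m), K))

w = np.array([0.25, -1.5, 3.0, -0.125, 0.0])   # dyadic values, exact float sums
print(h_K(w, 2) == h_K_extreme_points(w, 2))    # True: both equal 4.5
print(P_K(w, 4))                                # 0.0: w has only 4 nonzero components
```

The last line illustrates the exactness property: P_[K](w) vanishes precisely when w has no more than K nonzeros.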

Table 3
Summary of specialized results; w* is a d-stationary solution and q(c) ≜ c² ζ ‖∇ℓ(0)‖ Lip_∇h + c [ ‖∇ℓ(0)‖ Lip_∇h + 2β Lip_∇ℓ − λ ζ ] − λ.

Result                     | Condition on constants                                   | Conclusions
Thm. 5.1 (K-sparsity)      | c ‖∇ℓ(0)‖ < γ                                            | w* is a minimizer on W̄
Thm. 5.1 (K-sparsity)      | c ‖∇ℓ(0)‖ < γ and √(K + 1) > √K + c^{-1}                 | ‖w*‖ ≤ ( c ‖∇ℓ(0)‖/Lip_∇ℓ ) [ √(K + 1) − √K − c^{-1} ] implies ‖w*‖_0 ≤ K
Thm. 5.2 (ℓ1 − ℓ2)         | c ( 2 Lip_∇ℓ − λ ) ≤ λ and c ‖∇ℓ(0)‖ < γ                 | Either ‖w*‖_0 ≤ [ 2(1 + c^{-1}) ]², or w* is a global minimizer on W̄
Thm. 6.1 (SCAD)            | c ‖∇ℓ(0)‖ < γ ≤ (a − 1) λ                                | w* is a global minimizer on W̄
Thm. 6.1 (SCAD)            | c ‖∇ℓ(0)‖ < γ                                            | ‖w*‖ ≤ ( c ‖∇ℓ(0)‖/Lip_∇ℓ ) [ ( min_{1≤j≤m} λ_j ) √(L + 1) − ζ − c^{-1} ] implies ‖w*‖_0 ≤ L
Thm. 6.2 (Capped ℓ1)       | c ‖∇ℓ(0)‖ < γ                                            | w* is a global minimizer on W̄
Thm. 6.2 (Capped ℓ1)       | c ‖∇ℓ(0)‖ < γ                                            | For each i = 1, ..., m, either w*_i = 0 or |w*_i| ≥ a_i
Thm. 6.3 (Transformed ℓ1)  | c ‖∇ℓ(0)‖ < γ ≤ λ / ( 2 max_{1≤i≤m} (a_i + 1)/a_i² )     | w* is a global minimizer on W̄
Thm. 6.3 (Transformed ℓ1)  | c ‖∇ℓ(0)‖ < γ                                            | ‖w*‖ ≤ ( c ‖∇ℓ(0)‖/Lip_∇ℓ ) [ ( min_{1≤j≤m} (a_j + 1)/a_j ) √(L + 1) − ζ − c^{-1} ] implies ‖w*‖_0 ≤ L
Thm. 6.4 (Logarithmic)     | c ‖∇ℓ(0)‖ < γ ≤ λ / ( max_{1≤i≤m} λ_i/ε_i² )             | w* is a global minimizer on W̄
Thm. 6.4 (Logarithmic)     | c ‖∇ℓ(0)‖ < γ                                            | ‖w*‖ ≤ ( c ‖∇ℓ(0)‖/Lip_∇ℓ ) [ ( min_{1≤i≤m} λ_i/ε_i ) √(L + 1) − ζ − c^{-1} ] implies ‖w*‖_0 ≤ L

We then have

h_[K](w) = max { Σ_{i=1}^m v_i σ_i w_i : v ∈ E(Δ_{K,m}), σ ∈ { ±1 }^m }.

Note that if v ∈ E(Δ_{K,m}), then each component of v is either 0 or 1 and there are exactly K ones. For a given pair (v, σ) ∈ E(Δ_{K,m}) × { ±1 }^m, the linear function h_{v,σ}: w ↦ Σ_{i=1}^m v_i σ_i w_i has gradient given by v ∘ σ, where ∘ is the notation for the Hadamard product of two vectors. Hence,

‖ ∇_0 h_{v,σ}(w) ‖ = √( Σ_{i : w_i ≠ 0} v_i σ_i² ) ≤ √( min( K, ‖w‖_0 ) ).

Since for any nonzero vector w and for any η ∈ ∂g_[K](w), ‖η_0‖ = √‖w‖_0, it follows


More information

An explicit Jordan Decomposition of Companion matrices

An explicit Jordan Decomposition of Companion matrices An expicit Jordan Decomposition of Companion matrices Fermín S V Bazán Departamento de Matemática CFM UFSC 88040-900 Forianópois SC E-mai: fermin@mtmufscbr S Gratton CERFACS 42 Av Gaspard Coriois 31057

More information

Uniformly Reweighted Belief Propagation: A Factor Graph Approach

Uniformly Reweighted Belief Propagation: A Factor Graph Approach Uniformy Reweighted Beief Propagation: A Factor Graph Approach Henk Wymeersch Chamers University of Technoogy Gothenburg, Sweden henkw@chamers.se Federico Penna Poitecnico di Torino Torino, Itay federico.penna@poito.it

More information

arxiv: v2 [cs.lg] 4 Sep 2014

arxiv: v2 [cs.lg] 4 Sep 2014 Cassification with Sparse Overapping Groups Nikhi S. Rao Robert D. Nowak Department of Eectrica and Computer Engineering University of Wisconsin-Madison nrao2@wisc.edu nowak@ece.wisc.edu ariv:1402.4512v2

More information

The Group Structure on a Smooth Tropical Cubic

The Group Structure on a Smooth Tropical Cubic The Group Structure on a Smooth Tropica Cubic Ethan Lake Apri 20, 2015 Abstract Just as in in cassica agebraic geometry, it is possibe to define a group aw on a smooth tropica cubic curve. In this note,

More information

Nonlinear Analysis of Spatial Trusses

Nonlinear Analysis of Spatial Trusses Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes

More information

Week 6 Lectures, Math 6451, Tanveer

Week 6 Lectures, Math 6451, Tanveer Fourier Series Week 6 Lectures, Math 645, Tanveer In the context of separation of variabe to find soutions of PDEs, we encountered or and in other cases f(x = f(x = a 0 + f(x = a 0 + b n sin nπx { a n

More information

Homogeneity properties of subadditive functions

Homogeneity properties of subadditive functions Annaes Mathematicae et Informaticae 32 2005 pp. 89 20. Homogeneity properties of subadditive functions Pá Burai and Árpád Száz Institute of Mathematics, University of Debrecen e-mai: buraip@math.kte.hu

More information

Coupling of LWR and phase transition models at boundary

Coupling of LWR and phase transition models at boundary Couping of LW and phase transition modes at boundary Mauro Garaveo Dipartimento di Matematica e Appicazioni, Università di Miano Bicocca, via. Cozzi 53, 20125 Miano Itay. Benedetto Piccoi Department of

More information

Source and Relay Matrices Optimization for Multiuser Multi-Hop MIMO Relay Systems

Source and Relay Matrices Optimization for Multiuser Multi-Hop MIMO Relay Systems Source and Reay Matrices Optimization for Mutiuser Muti-Hop MIMO Reay Systems Yue Rong Department of Eectrica and Computer Engineering, Curtin University, Bentey, WA 6102, Austraia Abstract In this paper,

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

4 Separation of Variables

4 Separation of Variables 4 Separation of Variabes In this chapter we describe a cassica technique for constructing forma soutions to inear boundary vaue probems. The soution of three cassica (paraboic, hyperboic and eiptic) PDE

More information

A proposed nonparametric mixture density estimation using B-spline functions

A proposed nonparametric mixture density estimation using B-spline functions A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),

More information

A UNIVERSAL METRIC FOR THE CANONICAL BUNDLE OF A HOLOMORPHIC FAMILY OF PROJECTIVE ALGEBRAIC MANIFOLDS

A UNIVERSAL METRIC FOR THE CANONICAL BUNDLE OF A HOLOMORPHIC FAMILY OF PROJECTIVE ALGEBRAIC MANIFOLDS A UNIERSAL METRIC FOR THE CANONICAL BUNDLE OF A HOLOMORPHIC FAMILY OF PROJECTIE ALGEBRAIC MANIFOLDS DROR AROLIN Dedicated to M Saah Baouendi on the occasion of his 60th birthday 1 Introduction In his ceebrated

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5].

(f) is called a nearly holomorphic modular form of weight k + 2r as in [5]. PRODUCTS OF NEARLY HOLOMORPHIC EIGENFORMS JEFFREY BEYERL, KEVIN JAMES, CATHERINE TRENTACOSTE, AND HUI XUE Abstract. We prove that the product of two neary hoomorphic Hece eigenforms is again a Hece eigenform

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized

More information

Linear Complementarity Systems with Singleton Properties: Non-Zenoness

Linear Complementarity Systems with Singleton Properties: Non-Zenoness Proceedings of the 2007 American Contro Conference Marriott Marquis Hote at Times Square New York City, USA, Juy 11-13, 2007 ThA20.1 Linear Compementarity Systems with Singeton Properties: Non-Zenoness

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1 Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007

More information

Convergence and quasi-optimality of adaptive finite element methods for harmonic forms

Convergence and quasi-optimality of adaptive finite element methods for harmonic forms Noname manuscript No. (wi be inserted by the editor) Convergence and quasi-optimaity of adaptive finite eement methods for harmonic forms Aan Demow 1 the date of receipt and acceptance shoud be inserted

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

SVM: Terminology 1(6) SVM: Terminology 2(6)

SVM: Terminology 1(6) SVM: Terminology 2(6) Andrew Kusiak Inteigent Systems Laboratory 39 Seamans Center he University of Iowa Iowa City, IA 54-57 SVM he maxima margin cassifier is simiar to the perceptron: It aso assumes that the data points are

More information

Wavelet shrinkage estimators of Hilbert transform

Wavelet shrinkage estimators of Hilbert transform Journa of Approximation Theory 163 (2011) 652 662 www.esevier.com/ocate/jat Fu ength artice Waveet shrinkage estimators of Hibert transform Di-Rong Chen, Yao Zhao Department of Mathematics, LMIB, Beijing

More information

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces Abstract and Appied Anaysis Voume 01, Artice ID 846396, 13 pages doi:10.1155/01/846396 Research Artice Numerica Range of Two Operators in Semi-Inner Product Spaces N. K. Sahu, 1 C. Nahak, 1 and S. Nanda

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, we as various

More information

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0 Bocconi University PhD in Economics - Microeconomics I Prof M Messner Probem Set 4 - Soution Probem : If an individua has an endowment instead of a monetary income his weath depends on price eves In particuar,

More information

Tracking Control of Multiple Mobile Robots

Tracking Control of Multiple Mobile Robots Proceedings of the 2001 IEEE Internationa Conference on Robotics & Automation Seou, Korea May 21-26, 2001 Tracking Contro of Mutipe Mobie Robots A Case Study of Inter-Robot Coision-Free Probem Jurachart

More information

M. Aurada 1,M.Feischl 1, J. Kemetmüller 1,M.Page 1 and D. Praetorius 1

M. Aurada 1,M.Feischl 1, J. Kemetmüller 1,M.Page 1 and D. Praetorius 1 ESAIM: M2AN 47 (2013) 1207 1235 DOI: 10.1051/m2an/2013069 ESAIM: Mathematica Modeing and Numerica Anaysis www.esaim-m2an.org EACH H 1/2 STABLE PROJECTION YIELDS CONVERGENCE AND QUASI OPTIMALITY OF ADAPTIVE

More information

Optimization with stochastic preferences based on a general class of scalarization functions

Optimization with stochastic preferences based on a general class of scalarization functions Optimization with stochastic preferences based on a genera cass of scaarization functions Abstract: It is of crucia importance to deveop risk-averse modes for muticriteria decision making under uncertainty.

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations Goba Optimaity Principes for Poynomia Optimization Probems over Box or Bivaent Constraints by Separabe Poynomia Approximations V. Jeyakumar, G. Li and S. Srisatkunarajah Revised Version II: December 23,

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

pp in Backpropagation Convergence Via Deterministic Nonmonotone Perturbed O. L. Mangasarian & M. V. Solodov Madison, WI Abstract

pp in Backpropagation Convergence Via Deterministic Nonmonotone Perturbed O. L. Mangasarian & M. V. Solodov Madison, WI Abstract pp 383-390 in Advances in Neura Information Processing Systems 6 J.D. Cowan, G. Tesauro and J. Aspector (eds), Morgan Kaufmann Pubishers, San Francisco, CA, 1994 Backpropagation Convergence Via Deterministic

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),

More information

A Statistical Framework for Real-time Event Detection in Power Systems

A Statistical Framework for Real-time Event Detection in Power Systems 1 A Statistica Framework for Rea-time Event Detection in Power Systems Noan Uhrich, Tim Christman, Phiip Swisher, and Xichen Jiang Abstract A quickest change detection (QCD) agorithm is appied to the probem

More information

Statistical Learning Theory: a Primer

Statistical Learning Theory: a Primer ??,??, 1 6 (??) c?? Kuwer Academic Pubishers, Boston. Manufactured in The Netherands. Statistica Learning Theory: a Primer THEODOROS EVGENIOU AND MASSIMILIANO PONTIL Center for Bioogica and Computationa

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

Determining The Degree of Generalization Using An Incremental Learning Algorithm

Determining The Degree of Generalization Using An Incremental Learning Algorithm Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c

More information

A. Distribution of the test statistic

A. Distribution of the test statistic A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch

More information

Integrating Factor Methods as Exponential Integrators

Integrating Factor Methods as Exponential Integrators Integrating Factor Methods as Exponentia Integrators Borisav V. Minchev Department of Mathematica Science, NTNU, 7491 Trondheim, Norway Borko.Minchev@ii.uib.no Abstract. Recenty a ot of effort has been

More information

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with? Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR

Maximizing Sum Rate and Minimizing MSE on Multiuser Downlink: Optimality, Fast Algorithms and Equivalence via Max-min SIR 1 Maximizing Sum Rate and Minimizing MSE on Mutiuser Downink: Optimaity, Fast Agorithms and Equivaence via Max-min SIR Chee Wei Tan 1,2, Mung Chiang 2 and R. Srikant 3 1 Caifornia Institute of Technoogy,

More information

Substructuring Preconditioners for the Bidomain Extracellular Potential Problem

Substructuring Preconditioners for the Bidomain Extracellular Potential Problem Substructuring Preconditioners for the Bidomain Extraceuar Potentia Probem Mico Pennacchio 1 and Vaeria Simoncini 2,1 1 IMATI - CNR, via Ferrata, 1, 27100 Pavia, Itay mico@imaticnrit 2 Dipartimento di

More information

Regularization for Nonlinear Complementarity Problems

Regularization for Nonlinear Complementarity Problems Int. Journa of Math. Anaysis, Vo. 3, 2009, no. 34, 1683-1691 Reguarization for Noninear Compementarity Probems Nguyen Buong a and Nguyen Thi Thuy Hoa b a Vietnamse Academy of Science and Technoogy Institute

More information

II. PROBLEM. A. Description. For the space of audio signals

II. PROBLEM. A. Description. For the space of audio signals CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time

More information

Consistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems

Consistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems Consistent inguistic fuzzy preference reation with muti-granuar uncertain inguistic information for soving decision making probems Siti mnah Binti Mohd Ridzuan, and Daud Mohamad Citation: IP Conference

More information

BASIC NOTIONS AND RESULTS IN TOPOLOGY. 1. Metric spaces. Sets with finite diameter are called bounded sets. For x X and r > 0 the set

BASIC NOTIONS AND RESULTS IN TOPOLOGY. 1. Metric spaces. Sets with finite diameter are called bounded sets. For x X and r > 0 the set BASIC NOTIONS AND RESULTS IN TOPOLOGY 1. Metric spaces A metric on a set X is a map d : X X R + with the properties: d(x, y) 0 and d(x, y) = 0 x = y, d(x, y) = d(y, x), d(x, y) d(x, z) + d(z, y), for a

More information

FOURIER SERIES ON ANY INTERVAL

FOURIER SERIES ON ANY INTERVAL FOURIER SERIES ON ANY INTERVAL Overview We have spent considerabe time earning how to compute Fourier series for functions that have a period of 2p on the interva (-p,p). We have aso seen how Fourier series

More information

UNIFORM CONVERGENCE OF MULTIPLIER CONVERGENT SERIES

UNIFORM CONVERGENCE OF MULTIPLIER CONVERGENT SERIES royecciones Vo. 26, N o 1, pp. 27-35, May 2007. Universidad Catóica de Norte Antofagasta - Chie UNIFORM CONVERGENCE OF MULTILIER CONVERGENT SERIES CHARLES SWARTZ NEW MEXICO STATE UNIVERSITY Received :

More information

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA

T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

Preconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices

Preconditioned Locally Harmonic Residual Method for Computing Interior Eigenpairs of Certain Classes of Hermitian Matrices MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.mer.com Preconditioned Locay Harmonic Residua Method for Computing Interior Eigenpairs of Certain Casses of Hermitian Matrices Vecharynski, E.; Knyazev,

More information