Some Properties of Regularized Kernel Methods


Journal of Machine Learning Research 5 (2004). Submitted 12/03; Revised 7/04; Published 10/04

Some Properties of Regularized Kernel Methods

Ernesto De Vito (DEVITO@UNIMO.IT)
Dipartimento di Matematica, Università di Modena, Modena, Italy
and INFN, Sezione di Genova, Genova, Italy

Lorenzo Rosasco (ROSASCO@DISI.UNIGE.IT)
Andrea Caponnetto (CAPONNETTO@DISI.UNIGE.IT)
DISI, Università di Genova, Genova, Italy

Michele Piana (PIANA@DIMA.UNIGE.IT)
DIMA, Università di Genova, Genova, Italy

Alessandro Verri (VERRI@DISI.UNIGE.IT)
DISI, Università di Genova, Genova, Italy

Editor: Alexander J. Smola

Abstract

In regularized kernel methods, the solution of a learning problem is found by minimizing functionals consisting of the sum of a data and a complexity term. In this paper we investigate some properties of a more general form of the above functionals in which the data term corresponds to the expected risk. First, we prove a quantitative version of the representer theorem holding for both regression and classification, for both differentiable and non-differentiable loss functions, and for arbitrary offset terms. Second, we show that the case in which the offset space is non trivial corresponds to solving a standard problem of regularization in a Reproducing Kernel Hilbert Space in which the penalty term is given by a seminorm. Finally, we discuss the issues of existence and uniqueness of the solution. From the specialization of our analysis to the discrete setting it is immediate to establish a connection between the solution properties of sparsity and coefficient boundedness and some properties of the loss function. For the case of Support Vector Machines for classification, we also obtain a complete characterization of the whole method in terms of the Kuhn-Tucker conditions with no need to introduce the dual formulation.

Keywords: statistical learning, reproducing kernel Hilbert spaces, convex analysis, representer theorem, regularization theory

©2004 Ernesto De Vito, Lorenzo Rosasco, Andrea Caponnetto, Michele Piana, and Alessandro Verri.

1. Introduction

The problem of learning from examples can be seen as the problem of estimating an unknown functional dependency given only a finite (possibly small) number of instances. The seminal work of Vapnik (1988) shows that the key to effectively solving this problem is controlling the complexity of the solution. In the context of statistical learning this leads to techniques known as regularization networks (Evgeniou et al., 2000) or regularized kernel methods (Vapnik, 1988; Cristianini and Shawe-Taylor, 2000; Schölkopf and Smola, 2002). More precisely, given a training set S = (x_i, y_i)_{i=1}^ℓ of pairs of examples, the estimator is defined as

f_S^λ ∈ argmin_{f ∈ H} { (1/ℓ) Σ_{i=1}^ℓ V(y_i, f(x_i)) + λ‖f‖_H^2 },   (1)

where V is the loss function, H is the Hilbert space of the hypotheses and λ > 0 is the regularization parameter. As shown by Evgeniou et al. (2000), the above minimization problem can also be seen as a particular instance of Tikhonov regularization (Tikhonov and Arsenin, 1977; Mukherjee et al., 2002) for a multivariate function approximation problem, which is well known to be ill-posed (Bertero et al., 1988; Evgeniou et al., 2000; Poggio and Smale, 2003).

In this paper we study the generalization of the above problem to the continuous setting, that is, given a probability distribution ρ defined on X × Y, where X is the input space and Y is the output space, we study the properties of the estimator

(f^λ, g^λ) ∈ argmin_{(f,g) ∈ H×B} { ∫_Z V(y, f(x)+g(x)) dρ(x,y) + λ‖f‖_H^2 },   (2)

where H and B are reproducing kernel Hilbert spaces (RKHS): H is the space of penalized functions and B is the offset space (Wahba, 1990). Considering the continuous setting is meaningful for several reasons. First, it is useful in order to study the generalization properties of kernel methods (Steinwart, 2002). To this purpose, one associates with each function f : X → R its expected risk,

I[f] = ∫_Z V(y, f(x)) dρ(x,y),

where ρ is the unknown probability distribution describing the relation between the input x ∈ X and the output y ∈ Y. Following Cucker and Smale (2002), for regularized kernel methods the discrepancy between the expected risk of the estimator, f_S^λ, and the minimum obtainable risk, inf_{f∈H} I[f], can be decomposed as

I[f_S^λ] − inf_{f∈H} I[f] = ( I[f_S^λ] − I[f^λ] ) + ( I[f^λ] − inf_{f∈H} I[f] ),

where the first term represents the sample error and the second term the approximation error (Niyogi and Girosi, 1999). Clearly, insight on the form of f^λ can be useful to obtain better bounds on both errors. Second, considering the continuous measure ρ corresponds intuitively to finding a stable solution to the learning problem in the case of an infinite number of examples and, hence, gives information about the best we can do in the hypothesis space H × B (Mukherjee et al., 2002).

Third, we can treat both the empirical measure and the ideal unknown probability distribution in a unified framework.

The contribution of our work is threefold. First, we provide a complete characterization of the explicit form of the estimator (f^λ, g^λ) given by Eq. (2) by exploiting a convexity assumption on the loss functions. Our result can be interpreted as a quantitative version of the representer theorem holding for both regression and classification and in which explicit care is taken of the offset space B. Then, we discuss the role of the offset space B. The starting point of our discussion is the obvious observation that the estimator given by Problem (2) is not the pair (f^λ, g^λ) but the sum f^λ + g^λ. In other words, the natural hypothesis space is the sum H + B instead of the product H × B (which is not even a space of functions from X to R). For an arbitrary loss function we prove that Problem (2) is equivalent to a kernel method defined on H + B, which is a RKHS, with a penalty term given by a seminorm. Finally, for the sake of completeness, we study the issues of existence and uniqueness for Problem (2). When B is not the empty set, both issues are not trivial. In particular, for B equal to the set of constants, we prove existence under very reasonable conditions: for example, for classification, one needs at least two examples with different labels. About uniqueness we show that, for strictly convex loss functions, one has uniqueness if and only if the space B is small enough to be separated by the measure ρ: for example, in the discrete setting, this last condition means that a function g ∈ B is equal to 0 if and only if g(x_i) = 0 for all i. For the hinge loss function, which is convex but not strictly convex, we give an ad hoc condition in terms of the number of support vectors of the two classes.

The plan of the paper is as follows. In Section 2 we discuss our contributions with respect to previous works. In Section 3 we introduce some basic concepts of learning theory and state the assumptions we make on the loss function V and the hypothesis spaces H and B. In Section 4 we study the form of the solution of Problem (2). In Section 5 we discuss the theoretical meaning of the offset space B. We discuss the problem of existence and uniqueness in Section 6. In Section 7 we apply our results to the discrete setting and focus on the case of Support Vector Machines. In the appendix we recall some notions from convex analysis in infinite dimensional spaces.

2. Putting Our Work in Context

We now briefly discuss the relation between our results and previous works on this subject. Results about the form of the solution of kernel methods are known in the literature as representer theorems (if B is not trivial they are called semiparametric representer theorems). The first result in this direction is due to Kimeldorf and Wahba (1970) for the squared loss function (see also Wahba, 1990). However, the structure of the proof holds for arbitrary loss functions, as shown by many authors such as Cox and O'Sullivan (1990). In the framework of statistical learning, Schölkopf et al. (2001) give a proof of the representer theorem that holds for an arbitrary loss function and for any penalty term, provided it is a strictly increasing function of the norm. This kind of result shows that, if H is a RKHS with kernel K, the estimator f_S^λ defined by Eq. (1) can be written as

f_S^λ(x) = Σ_{i=1}^ℓ α_i K(x, x_i).

The above result holds for an arbitrary loss function and for a large class of penalty terms. However, the form of the coefficients α_i is unknown.
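To make the expansion above concrete, here is a minimal NumPy sketch (not part of the original paper) of Problem (1) for the square loss, where the representer coefficients are obtained by solving a linear system; the Gaussian kernel, the synthetic data, and all variable names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                    # training inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)     # noisy training outputs

lam, ell = 0.1, len(X)
K = gaussian_kernel(X, X)

# Square loss: f(x) = sum_i alpha_i K(x, x_i) minimizes
# (1/ell) sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2
# when the coefficients solve (K + lam * ell * I) alpha = y.
alpha = np.linalg.solve(K + lam * ell * np.eye(ell), y)

X_test = np.linspace(-3, 3, 200)[:, None]
f_test = gaussian_kernel(X_test, X) @ alpha             # evaluate the estimator
```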

For the squared loss function, the form of the coefficients is well known in the context of inverse problems, see, for example, Tikhonov and Arsenin (1977), and reduces to solving a linear system of equations. For arbitrary differentiable loss functions, this problem was studied by Poggio and Girosi (1992); Girosi (1998); Wahba (1998), where the coefficients α_i are the solution of a system of algebraic equations. This approach cannot be applied to the hinge and ε-insensitive loss functions (Vapnik, 1988), since they are not differentiable: the form of the coefficients α_i is recovered only through the usual dual Lagrangian formulation of the minimization problem, see, for example, Vapnik (1988); Cristianini and Shawe-Taylor (2000). Recently, Zhang (2001) gives a quantitative representer theorem in the classification setting that holds for differentiable loss functions and Steinwart (2003) extends this result to arbitrary convex loss functions, without using the dual problem. In these papers the form of the coefficients α_i is given in terms of a closed equation involving the subgradient of the loss function. Moreover, they are able to extend the representer theorem to the continuous setting (a study of the solution of Tikhonov regularization in the continuous setting when the square loss is used can be found also in Cucker and Smale, 2002). This paper, using techniques similar to those of Steinwart (2003), extends the above results in the following directions:

- our result holds both for regression and classification;
- we provide a general result that holds also when the offset term is considered. The presence of the offset space forces the coefficients α_i to satisfy a system of linear equations;
- we do not assume that the input space X and the output space Y are compact. In particular, for regression we can assume Y = R;
- we provide a simpler proof than the one of Steinwart (2003) by using known results about integral convex functionals.

A discussion of the role of the offset terms can be found in Evgeniou et al. (2000) and in Poggio et al. (2002) when the space B reduces to the set of constant functions. Their results are close to our Theorem 6, but they are proved assuming that the unit constant is in the Mercer decomposition of the kernel and for the discrete setting, while our result holds true for offset terms living in an arbitrary RKHS.

The problem of existence and uniqueness is discussed in Wahba (1998) for the discrete setting and with differentiable loss functions. For arbitrary ρ the papers by Steinwart (2002, 2003) study existence for the classification setting with the offset space reduced to the constant functions. For the hinge loss and ε-insensitive loss, the problem of uniqueness is treated in Burges and Crisp (2000, 2003). Their proof is based on the dual problem and on the Kuhn-Tucker conditions. Our results subsume the cited results as special cases, but are all obtained in the more general continuous setting. In particular, our results on the uniqueness of the SVM solution are similar to those in Burges and Crisp (2000, 2003) but do not make use of the dual formulation.

3. Notation and Assumptions

In this section we first fix the notation and then state and comment upon the basic assumptions needed to derive the results described in the rest of the paper. We start with the input and output spaces.

3.1 Input and Output Spaces

As usual, we denote by X and Y the input and output spaces respectively. We assume that X is a locally compact second countable space (this assumption is satisfied for instance if X is a closed subset of R^d) and Y is a closed subset of R. We let Z = X × Y and endow it with a probability distribution ρ defined on the Borel σ-algebra of Z. We recall that, since ρ is a bounded measure and Z is second countable, ρ is a Radon measure. In practice, ρ will be either the unknown distribution describing the relation between x and y or the empirical measure

ρ_S = (1/ℓ) Σ_{i=1}^ℓ δ_{(x_i, y_i)},

associated with the training set S = {(x_i, y_i)}_{i=1}^ℓ drawn i.i.d. with respect to ρ. We now deal with loss functions.

3.2 Loss Functions

We collect the mathematical assumptions on the loss function in the following definition and then comment on the purpose of each assumption.

Definition 1 Given p ∈ [1, +∞[, a function V : Y × R → [0, +∞[ such that

1. for all y ∈ Y the function V(y, ·) is convex on R;
2. the function V is measurable on Y × R;
3. there are b ∈ [0, +∞[ and a : Y → [0, +∞[ such that

   V(y, w) ≤ a(y) + b|w|^p   ∀w ∈ R, y ∈ Y,   (3)
   ∫_Z a(y) dρ(x, y) < +∞,   (4)

is called a p-loss function with respect to ρ. If the context is clear, V is simply called a loss function.

The convexity hypothesis is not restrictive, being satisfied by all the loss functions commonly in use. Moreover, it is powerful from a technical point of view: it allows for the use of subgradient techniques without assuming differentiability of V and makes it possible to use convex analysis tools in the study of existence and uniqueness of functional minimizers. Finally, this requirement ensures stronger bounds for the sample error (Bartlett et al., 2002; Bartlett, 2003; Bartlett et al., 2003). Assumption 2 is a minimal requirement for defining the expected risk and it is usually satisfied, since loss functions commonly in use are continuous on Z. Condition 3 is a technical hypothesis we need in order to use results from convex integral functional analysis. For example, it is satisfied in the following cases:

1. for p = 2, if V is the square loss function, V(y, w) = (y − w)^2, and ∫_Z y^2 dρ(x, y) < +∞;

2. for p = 1, if V(y, ·) is Lipschitz on R with a Lipschitz constant independent of y and ∫_Z V(y, 0) dρ(x, y) < +∞.

We now restrict our analysis to some functionals studied in statistical learning.

3.3 Learning Functionals

The expected risk of a measurable function f : X → R is defined as

I[f] = ∫_Z V(y, f(x)) dρ(x, y),

and can be seen as the average error incurred by the function f, where f is a possible solution of the learning problem and the probability measure ρ is unknown. Given a training set S, a possible way to estimate I[f] is to evaluate the empirical risk

I_emp^S[f] = (1/ℓ) Σ_{i=1}^ℓ V(y_i, f(x_i)).

The problem of learning is to find, given the training set S, an estimator f effectively predicting the label of a new point. This translates into finding a function f such that its expected risk is small with high probability. A possible way to efficiently solve the learning problem is provided by regularized kernel methods, which amount to solving a problem of functional minimization as in Problem (1). A generalization of Problem (1) to the continuous setting is provided by Problem (2), in which the continuous measure ρ replaces the empirical measure ρ_S in the first term. In what follows we will refer to the functionals to be minimized in both Eq. (1) and Eq. (2) as Tikhonov functionals and to the solutions as the regularized solutions. The second term of a Tikhonov functional is a smoothness or complexity term measuring the norm of the function f in a suitable Hilbert space H. The minimization takes place in the hypothesis space H × B. We now collect the assumptions on the hypothesis space at the basis of our analysis.

3.4 Hypothesis Space

First of all, we recall the definition of a reproducing kernel Hilbert space. A RKHS H on X with kernel K : X × X → R is defined as the unique Hilbert space of real valued functions on X such that, for all f ∈ H,

f(x) = ⟨f, K_x⟩_H   ∀x ∈ X,   (5)

where K_x is the function on X defined by K_x(s) = K(x, s). Given a probability measure ρ on Z and p ∈ [1, +∞[, we say that the kernel K is p-bounded with respect to ρ if the function K is measurable on X × X and

∫_Z K(x, x)^{p/2} dρ(x, y) < +∞.   (6)

Clearly the above condition depends only on the marginal distribution of ρ on X and ensures that H is a subspace of L^p(Z, ρ) with continuous inclusion (see Lemma 4 in Section 4).
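The following short sketch (an illustration added here, not part of the paper) checks numerically the reproducing property (5) and the pointwise bound |f(x)| ≤ ‖f‖_H √K(x,x) that drives the continuous inclusion into L^p; the Gaussian kernel and the expansion points are assumptions chosen only for the example.

```python
import numpy as np

def k(x, s, sigma=1.0):
    # Gaussian kernel; K_x(s) = k(x, s)
    x = np.asarray(x, dtype=float); s = np.asarray(s, dtype=float)
    return np.exp(-(x - s) ** 2 / (2 * sigma ** 2))

rng = np.random.default_rng(1)
centers = rng.uniform(-1, 1, size=6)          # expansion points x_j
c = rng.standard_normal(6)                    # f = sum_j c_j K_{x_j} belongs to H

G = k(centers[:, None], centers[None, :])     # Gram matrix K(x_i, x_j)
f_norm = np.sqrt(c @ G @ c)                   # ||f||_H, since ||f||_H^2 = c^T G c

# Reproducing property (5): f(x) = <f, K_x>_H = sum_j c_j K(x_j, x), hence
# |f(x)| <= ||f||_H * sqrt(K(x, x)) by Cauchy-Schwarz (used in Lemma 4 below).
xs = np.linspace(-2, 2, 400)
f_vals = k(xs[:, None], centers[None, :]) @ c
assert np.all(np.abs(f_vals) <= f_norm * np.sqrt(k(xs, xs)) + 1e-9)
```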

This fact is essential for proving our results. In particular, the p-boundedness of the kernel is fulfilled for all p ∈ [1, +∞[ if X is compact and the kernel is continuous, or if the kernel is measurable and bounded.

We are now ready to discuss the assumptions on the hypothesis space. We fix the probability measure ρ on Z and p ∈ [1, +∞[ such that V is a p-loss function with respect to ρ. We require that the space of penalized functions H and the space of offset functions B are RKHS on X such that the corresponding kernels K and K_B are p-bounded with respect to ρ. We denote the corresponding norms by ‖·‖_H and ‖·‖_B. Finally, we notice that, in general, the product space H × B is not a RKHS.

In learning theory usually X is compact, K is continuous and B is the one dimensional vector space of constant functions, B = {f : X → R | f(x) = b, b ∈ R} ≅ R, with kernel K_B simply given by K_B(x, s) = 1. Another example of offset space, which arises in approximation problems in RKHS on a bounded interval, is the space of splines of order n, whose corresponding kernel is continuous (Wahba, 1990). In both cases the p-boundedness assumption is satisfied for all p. Our framework allows us to treat arbitrary (possibly infinite-dimensional) offset spaces, with the possibility of incorporating jumps in the offset term.

Finally, the requirement that the hypothesis space is a RKHS is due to the fact that the minimization of a convex functional in a Hilbert space is easier to treat than in an arbitrary Banach space, since in the former case the subgradient of the functional is an element of the space itself. Moreover, in the proofs we use extensively the reproducing property given by Eq. (5).

4. Explicit Form of the Regularized Solution

In this section we determine the explicit form of the minimizer of the Tikhonov functional introduced in the previous section. We first state the main theorem and comment on the obtained result, then we provide the mathematical proof.

4.1 Main Theorem

Theorem 2 Let ρ be a probability measure on X × Y, where X is a locally compact second countable space and Y is a closed subset of R. Let V be a p-loss function with respect to ρ, p ∈ [1, +∞[. Let H and B be reproducing kernel Hilbert spaces such that the corresponding kernels K and K_B are p-bounded with respect to ρ. Define q ∈ ]1, +∞] such that 1/q + 1/p = 1. Let λ > 0 and (f^λ, g^λ) ∈ H × B; then

(f^λ, g^λ) ∈ argmin_{(f,g) ∈ H×B} { ∫_Z V(y, f(x)+g(x)) dρ(x,y) + λ‖f‖_H^2 }   (7)

if and only if there is α ∈ L^q(Z, ρ) satisfying

α(x, y) ∈ (∂V)(y, f^λ(x) + g^λ(x))   for ρ-almost all (x, y) ∈ X × Y,   (8)

f^λ(s) = −(1/2λ) ∫_Z K(s, x) α(x, y) dρ(x, y)   ∀s ∈ X,   (9)

0 = ∫_Z K_B(s, x) α(x, y) dρ(x, y)   ∀s ∈ X.   (10)

The proof of this theorem is given in the following subsection. A few important remarks are in order.

First, the theorem gives a general quantitative version of the representer theorem. The generality is obtained by considering the continuous setting, which subsumes the discrete setting if the measure ρ is the empirical measure ρ_S. In this case, the integral reduces to a finite sum and we recover the well known result that f_S^λ = Σ_{i=1}^ℓ α_i K_{x_i}, where the x_i form the training set. Moreover, the solution is quantitatively characterized since the coefficients α are given by Eq. (8) involving the subgradient. For differentiable loss functions in the discrete setting, Eq. (8) reduces to

α_i = V'(y_i, f_S^λ(x_i) + g_S^λ(x_i)),

where V' denotes the derivative with respect to the second variable (Girosi, 1998; Wahba, 1998).

Second, if {ψ_i}_{i=1}^m is a basis for B, the offset part of the solution can be written as g^λ = Σ_{i=1}^m d_i ψ_i, where the coefficients d_i are again constrained by Eq. (8). A discussion on how to solve Eq. (8) explicitly can be found in Wahba (1998). Furthermore, the presence of B induces a system of linear constraints on the coefficients α_i expressed by Eq. (10) that, for B = R, reduces to the well known condition

Σ_{i=1}^ℓ α_i = 0.

We stress that, unlike previous works, the above equation has been derived without introducing the dual formulation.

Finally, we discuss the role of Assumption 3 in Definition 1. From the proof, it is apparent that this assumption is needed to ensure the continuity of the first term in the Tikhonov functional, which in the discrete setting is trivially guaranteed. Therefore, for the discrete setting Theorem 2 holds for any convex loss function. In particular, L^q(Z, ρ_S) = R^ℓ and the condition α ∈ L^q(Z, ρ_S) is always satisfied. Back in the continuous setting, if V(y, ·) is Lipschitz on R with a Lipschitz constant independent of y and ∫_Z V(y, 0) dρ(x, y) < +∞, one can choose p = 1, so that q = +∞ and the condition α ∈ L^∞(Z, ρ) means that α is bounded. For the square loss, clearly p = 2, so that q = 2 and α is square-integrable. As shown by Steinwart (2003), for classification and compact X, one can again remove Assumption 3 of Definition 1 using the fact that a convex function is locally Lipschitz and the range of possible y is bounded.

The following corollary is the restatement of the representer theorem without offset space.

Corollary 3 With the assumptions of Theorem 2, let f^λ ∈ H; then

f^λ ∈ argmin_{f ∈ H} { ∫_Z V(y, f(x)) dρ(x, y) + λ‖f‖_H^2 }

if and only if there is α ∈ L^q(Z, ρ) satisfying

α(x, y) ∈ (∂V)(y, f^λ(x))   for ρ-almost all (x, y) ∈ X × Y,
f^λ(s) = −(1/2λ) ∫_Z K(s, x) α(x, y) dρ(x, y)   ∀s ∈ X.
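As an illustration added here (not part of the paper), the sketch below checks Corollary 3 numerically in the discrete setting with the square loss: the coefficients of the subgradient relation are α_i = 2(f^λ(x_i) − y_i), and Eq. (9) with ρ = ρ_S rebuilds f^λ from them. The kernel ridge closed form and the synthetic data are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(2)
ell, lam = 40, 0.05
X = rng.uniform(-2, 2, size=(ell, 1))
y = np.cos(2 * X[:, 0]) + 0.1 * rng.standard_normal(ell)

K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # Gaussian Gram matrix

# Square loss V(y, w) = (y - w)^2: the regularized solution is kernel ridge,
# with expansion coefficients c solving (K + lam * ell * I) c = y.
c = np.linalg.solve(K + lam * ell * np.eye(ell), y)
f_train = K @ c                                               # f^lambda(x_i)

# Corollary 3 / Eq. (8) with rho = rho_S: alpha_i = V'(y_i, f(x_i)) = 2 (f(x_i) - y_i),
# and Eq. (9) becomes f^lambda = -(1 / (2 * lam * ell)) * sum_i alpha_i K_{x_i}.
alpha = 2 * (f_train - y)
f_reconstructed = -(1.0 / (2 * lam * ell)) * (K @ alpha)
assert np.allclose(f_reconstructed, f_train, atol=1e-8)
```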

4.2 Proof of the Main Theorem

Before giving the proof of the theorem we discuss the proof structure, which aside from some technicalities is very simple, and is based on two lemmas. The Tikhonov functional I[f + g] + λ‖f‖_H^2 is a convex map on H × B, so (f^λ, g^λ) is a minimizer of the Tikhonov functional if and only if (0, 0) is in its subgradient, which is a subset of H × B. Using linearity, the computation of the subgradient of the Tikhonov functional reduces to the computation of the subgradients of I[f + g] and ‖f‖_H^2 respectively. Since the latter functional is differentiable, the subgradient evaluation is straightforward. Some care is needed for the subgradient of the former. First, we rewrite it as an integral functional on L^p(Z, ρ) and then use a fundamental result of convex analysis to interchange the integral and the subgradient.

Proof [of Theorem 2] Clearly, λ‖f‖_H^2 is continuous and, by Lemma 4, the functional I[f + g] is continuous and finite. So, from item 5 of Proposition 14, one has that

∂( I[f + g] + λ‖f‖_H^2 ) = ∂( I[f + g] ) + λ ∂( ‖f‖_H^2 ).

Now, the map (f, g) ↦ ‖f‖_H^2 is differentiable with derivative (2f, 0) and, therefore, by item 1 of Proposition 14,

∂( ‖f‖_H^2 ) = {(2f, 0)}.   (11)

The main difficulty is the evaluation of the subgradient of the map I[f + g], given in Lemma 5. By means of this lemma we obtain that the elements of the subgradient of I[f + g] at (f, g) are of the form

( ∫_Z K(x, ·) α(x, y) dρ(x, y), ∫_Z K_B(x, ·) α(x, y) dρ(x, y) ),   (12)

where α ∈ L^q(Z, ρ) satisfies

α(x, y) ∈ (∂V)(y, f(x) + g(x))   (13)

for ρ-almost all (x, y) ∈ X × Y. Now, by combining Eq. (11) and Eq. (12), we have that the elements of the subgradient of I[f + g] + λ‖f‖_H^2 at the point (f, g) are of the form

( ∫_Z K(x, ·) α(x, y) dρ(x, y) + 2λf, ∫_Z K_B(x, ·) α(x, y) dρ(x, y) ),   (14)

where α ∈ L^q(Z, ρ) satisfies Eq. (13). From item 3 of Proposition 14, we have that an element (f^λ, g^λ) ∈ H × B is a minimizer of I[f + g] + λ‖f‖_H^2 if and only if (0, 0) belongs to the subgradient evaluated at (f^λ, g^λ). Using Eq. (14), one has that

f^λ(s) = −(1/2λ) ∫_Z α(x, y) K(x, s) dρ(x, y),
∫_Z α(x, y) K_B(x, s) dρ(x, y) = 0,

where, by means of Eq. (13), α ∈ L^q(Z, ρ) satisfies Eq. (8). This ends the proof.

Before computing the subgradient of the map I[f + g] in Lemma 5, we need to extend the definition of the expected risk to L^p(Z, ρ). First of all, we let

I_0[u] = ∫_Z V(y, u(x, y)) dρ(x, y),   u ∈ L^p(Z, ρ),

so that I[f + g] = I_0(J(f, g)), where J : H × B → L^p(Z, ρ) is the linear map

J(f, g) = f + g

(the function f + g is viewed in a natural way as a function on Z). The following lemma collects some technical facts on I_0 and J.

Lemma 4 With the above notations,

1. the functional I_0 : L^p(Z, ρ) → [0, +∞[ is well-defined and continuous;
2. the operator J : H × B → L^p(Z, ρ) is well-defined and continuous.

Proof Since the loss function V can be regarded as a function on Z × R, that is, V(z, w) = V(y, w) where z = (x, y), one has that I_0[u] is the Nemitski functional associated with V (see Appendix), that is,

I_0[u] = ∫_Z V(z, u(z)) dρ(z),   u ∈ L^p(Z, ρ).

We claim that I_0[u] is finite. Indeed, given u ∈ L^p(Z, ρ), by Eq. (3),

∫_Z V(y, u(z)) dρ(x, y) ≤ ∫_Z ( a(y) + b|u(z)|^p ) dρ(x, y) < +∞.

The proof that I_0 is continuous can be found in Proposition III.5.1 of Ekeland and Turnbull (1983). In order to prove the second item, we let f ∈ H. Then, by Eq. (5),

∫_Z |f(x)|^p dρ(x, y) = ∫_Z |⟨f, K_x⟩_H|^p dρ(x, y) ≤ ‖f‖_H^p ∫_Z K(x, x)^{p/2} dρ(x, y) = C ‖f‖_H^p < +∞,

where C = ∫_Z K(x, x)^{p/2} dρ(x, y) is finite since K is p-bounded (see Eq. (6)). In particular, the function (x, y) ↦ f(x) is in L^p(Z, ρ) and ‖f‖_{L^p} ≤ C^{1/p} ‖f‖_H. The same relation clearly holds for g ∈ B. It follows that J is well defined and

‖f + g‖_{L^p} ≤ C^{1/p} ‖f‖_H + C^{1/p} ‖g‖_B.

Since J is linear, it follows that J is continuous.

Finally, the following lemma computes the subgradient of I = I_0 ∘ J.

Lemma 5 With the above notations, let (f, g) ∈ H × B; then (φ, ψ) ∈ ∂(I_0 ∘ J)(f, g) if and only if there is α ∈ L^q(Z, ρ) such that

α(x, y) ∈ (∂V)(y, f(x) + g(x))   for ρ-almost all (x, y) ∈ X × Y,
φ(s) = ∫_Z K(s, x) α(x, y) dρ(x, y)   ∀s ∈ X,
ψ(s) = ∫_Z K_B(s, x) α(x, y) dρ(x, y)   ∀s ∈ X.

Proof Since I_0 is finite and continuous in 0 = J(0), by point 6 of Proposition 14, we know that

∂(I_0 ∘ J)(f, g) = J*(∂I_0)(J(f, g)),   (15)

where J* : L^q(Z, ρ) → H × B is the adjoint of J, that is,

⟨J*α, (f, g)⟩_{H×B} = ∫_Z α(x, y) J(f, g)(x, y) dρ(x, y).

First of all, we compute ∂I_0. Since I_0[0] < +∞, we can apply Proposition 15 so that, given u ∈ L^p(Z, ρ), then α ∈ (∂I_0)(u) if and only if α ∈ L^q(Z, ρ) and α(z) ∈ (∂V)(y, u(x, y)) for ρ-almost all (x, y) ∈ X × Y.

We now compute the adjoint of J. Let α ∈ L^q(Z, ρ) and (φ, ψ) = J*α ∈ H × B. Using the reproducing property of H and the definition of J we can write

φ(s) = ⟨φ, K_s⟩_H = ⟨J*α, (K_s, 0)⟩_{H×B} = ⟨α, J(K_s, 0)⟩_{L^2(Z,ρ)}.

Writing the scalar product explicitly we then find

φ(s) = ∫_Z K(s, x) α(x, y) dρ(x, y).

Reasoning in the same way we find that

ψ(s) = ∫_Z K_B(s, x) α(x, y) dρ(x, y).

Replacing the above formulas in Eq. (15), we have the thesis.

5. Dealing with the Offset Space B

In this section we deal with the offset term which often appears in regularized solutions. We first motivate our analysis, then state and discuss our main result on this issue. Finally, we give the proofs of the results.

5.1 Motivations

In the previous section we minimized a Tikhonov functional on the set H × B, dealing explicitly with the possible presence of an offset term in the form of the solution. Typical examples in which offset spaces arise are Support Vector Machine algorithms (Vapnik, 1988), where the offset term is a constant accounting for the translation invariance of the separating hyperplane, and penalization methods (Wahba, 1990), where the offset space is the kernel space of the penalization operator. However, the fact that the set H × B is not a RKHS (in fact, it is not even a function space) makes it cumbersome to extend typical statistical learning results to the general setting in which the offset term is considered. For example, a separate analysis, with and without the offset term, is needed for measuring the complexity of the hypothesis space or studying algorithm consistency. In this section we show that under very weak conditions the presence of an offset term is equivalent to solving a standard regularization problem with a seminorm (Wahba, 1990).

The fact that the estimator is f^λ(x) + g^λ(x) (for regression) or sgn(f^λ(x) + g^λ(x)) (for classification) suggests replacing H × B with the sum

S = H + B = { f + g | f ∈ H, g ∈ B }.

The hypothesis space S is a space of functions on X and, in particular, a RKHS, the kernel being the sum of the kernels of H and B. In this section we show that the minimization of a Tikhonov functional on H × B is essentially equivalent to the minimization of an appropriate functional on S. This provides a rigorous derivation of the following facts.

1. The equivalent functional on S is also a Tikhonov functional. The penalty term is a seminorm penalizing the functions in S orthogonal to B only.
2. The estimator given by the minimization of the Tikhonov functional on S depends only on the kernel sum.

Moreover, since the hypothesis space S is a RKHS, a number of classical results of learning theory follow without further effort. Finally, we notice that the norm of B (hence the kernel K_B) plays no role in the functional I[f + g] + λ‖f‖_H^2, that is, all kernels whose corresponding RKHS is B as a vector space give rise to the same minimizers (f^λ, g^λ). This fact is confirmed by Eq. (18) below (see also Eq. (20)).

5.2 Main Theorem

We recall that the norm in S is given by

‖f + g‖_S^2 = inf_{f'∈H, g'∈B, f+g = f'+g'} ( ‖f'‖_H^2 + ‖g'‖_B^2 )   (16)

and, with respect to this norm, S is a RKHS on X with kernel K + K_B (Schwartz, 1964). We are now ready to state the following result.

Theorem 6 Let Q be the orthogonal projection onto the closed subspace of S

S_0 = { s ∈ S | ⟨s, g⟩_S = 0 ∀g ∈ B },

that is, the subset of functions orthogonal to B w.r.t. the scalar product in S. We have the following facts.

1. If (f^λ, g^λ) ∈ H × B is a solution of the problem

   min_{(f,g) ∈ H×B} { I[f + g] + λ‖f‖_H^2 },

   then s^λ = f^λ + g^λ ∈ S is a solution of the problem

   min_{s ∈ S} { I[s] + λ‖Qs‖_S^2 }

   and f^λ = Qs^λ.

2. If s^λ ∈ S is a solution of the problem

   min_{s ∈ S} { I[s] + λ‖Qs‖_S^2 },

   let f^λ = Qs^λ and g^λ = s^λ − Qs^λ; then

   I[f^λ + g^λ] + λ‖f^λ‖_H^2 = inf_{(f,g) ∈ H×B} { I[f + g] + λ‖f‖_H^2 }.

   In particular, if g^λ ∈ B, then (f^λ, g^λ) ∈ H × B is a minimizer of I[f + g] + λ‖f‖_H^2.

Before giving the proof in the following subsection we comment on this result.

First, notice that if H ∩ B = {0}, then S = H ⊕ B and ‖f + g‖_S^2 = ‖f‖_H^2 + ‖g‖_B^2. In this case the theorem is trivial. However, in the arbitrary case care is needed because there are functions in H not orthogonal to B. Moreover, the norm ‖·‖_S restricted to H and B could be different from ‖·‖_H and ‖·‖_B: in particular, it could happen that (B^⊥)^⊥ ≠ B, where the orthogonality is meant with respect to the dot product in S. This pathology is at the root of the fact that there are cases in which the problem min_{s∈S} { I[s] + λ‖Qs‖_S^2 } has a solution, whereas the functional I[f + g] + λ‖f‖_H^2 does not admit a minimizer on H × B (see the example below). In practice, since H ∩ B in most applications is finite dimensional, this pathology does not occur and the minimization problem on H × B is fully equivalent to the one on S.

Second, the advantage of using the penalty term ‖f‖_H^2 instead of ‖Qs‖_S^2 is that one can solve the minimization problem without knowing the explicit form of the projection Q. Conversely, the space S is the natural space in which to address theoretical issues.
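To illustrate the last point with a concrete computation (a sketch added here, not part of the paper), the code below solves the discrete square-loss problem with an unpenalized constant offset, i.e. the H × B formulation of Theorem 6 with B = R, via a block linear system; it then checks the zero-sum constraint of Eqs. (10)/(20), which is what makes the expansion on K or on the sum kernel K + 1 coincide. The Gaussian kernel, the data, and the block system are assumptions of the example, not the paper's prescription.

```python
import numpy as np

rng = np.random.default_rng(3)
ell, lam = 60, 0.05
X = rng.uniform(-2, 2, size=(ell, 1))
y = 3.0 + np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(ell)   # data with a large offset

K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))       # Gram matrix of K
ones = np.ones(ell)

# Square loss with an unpenalized constant offset b:
# minimize (1/ell) sum_i (y_i - f(x_i) - b)^2 + lam * ||f||_H^2 over f in H, b in R.
# Writing f = sum_i c_i K_{x_i}, one convenient sufficient optimality condition
# is the block linear system assembled below.
A = np.block([[K + lam * ell * np.eye(ell), ones[:, None]],
              [(K @ ones)[None, :],         np.array([[float(ell)]])]])
rhs = np.concatenate([y, [ones @ y]])
sol = np.linalg.solve(A, rhs)
c, b = sol[:ell], sol[ell]

# Consistency with Eq. (10) / Eq. (20): the coefficients sum to zero, so expanding
# f on K or on the sum kernel K + K_B (here K_B = 1) gives the same function.
assert abs(c.sum()) < 1e-8
assert abs((K @ c + b - y).sum()) < 1e-8    # residuals integrate to zero against rho_S
```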

Third, we observe that since the proof does not depend on the convexity of the loss function, the theorem holds for arbitrary (positive) loss functions. However, if V satisfies the hypotheses of Definition 1, from Theorem 2 it follows that the minimizer s^λ of I[s] + λ‖Qs‖_S^2 is of the form

s^λ(s) = −(1/2λ) ∫_Z α(x, y) ( K(x, s) + K_B(x, s) ) dρ(x, y) + g^λ(s)   (17)
       = −(1/2λ) ∫_Z α(x, y) K(x, s) dρ(x, y) + g^λ(s),   (18)

where g^λ ∈ B and α ∈ L^q(Z, ρ) satisfies

α(x, y) ∈ (∂V)(y, s^λ(x)),   (19)
∫_Z α(x, y) K_B(x, s) dρ(x, y) = 0.   (20)

In particular, this implies that, given h ∈ B, one can replace the kernel K with K(x, s) + h(x)h(s) without changing the form of the minimizer s^λ. For example, if B is the set of constant functions, the two kernels K(x, s) = x·s and K(x, s) = x·s + 1 are equivalent since both penalize the functions orthogonal to 1, that is, the space of linear functions.

5.3 Proof

Before giving the proof of Theorem 6 we need to prove the following technical lemma. For this purpose we recall that S_0 was defined as

S_0 = { s ∈ S | ⟨s, g⟩_S = 0 ∀g ∈ B },

and Q was the corresponding orthogonal projection from S onto S_0. Moreover, we let H_0 be the closed subspace of H given by

H_0 = { f ∈ H | ⟨f, h⟩_H = 0 ∀h ∈ H ∩ B }

and P be the corresponding orthogonal projection from H onto H_0. In order to prove the main theorem we need the following technical lemma that characterizes the space S_0.

Lemma 7 Let s = f + g ∈ S with f ∈ H and g ∈ B; then

Qs = Pf,   (21)
‖Qs‖_S = ‖Pf‖_H,   (22)

and there is a sequence (f_n, g_n) ∈ H × B, with f_n + g_n = s, such that

lim_{n→∞} ‖Pf − f_n‖_H = 0.   (23)

Equations (21) and (22) show that S_0 and H_0 are the same Hilbert space and, in particular, Qs ∈ H. However, in general, it could happen that s − Qs ∉ B. Equation (23) is a technical trick to overcome this pathology.

Proof [of Lemma 7] To give the proof of the lemma we need some preliminary facts. Let K be the closed subspace of H × B

K = { (f, g) ∈ H × B | ⟨f, h⟩_H = ⟨g, h⟩_B ∀h ∈ H ∩ B }.

It is known (Schwartz, 1964) that, given s ∈ S, there is a unique (f, g) ∈ K such that s = f + g. Moreover, for all (f', g') ∈ H × B,

⟨s, f' + g'⟩_S = ⟨f, f'⟩_H + ⟨g, g'⟩_B.   (24)

From Eq. (16) one has that

‖f‖_S ≤ ‖f‖_H   ∀f ∈ H.   (25)

First of all we claim that H_0 ⊂ S_0. Clearly, if f ∈ H_0, then (f, 0) ∈ K and, by Eq. (24), for all g ∈ B,

⟨f + 0, 0 + g⟩_S = ⟨f, 0⟩_H + ⟨0, g⟩_B = 0,

that is, f ∈ S_0. This shows the claim. Moreover,

‖f‖_S^2 = ⟨f + 0, f + 0⟩_S = ⟨f, f⟩_H = ‖f‖_H^2.   (26)

Let s = f + g with f ∈ H and g ∈ B. Clearly, f = Pf + h, where h belongs to H_0^⊥ = ((H ∩ B)^⊥)^⊥, the closure of H ∩ B in H (here ⊥ denotes the orthogonal complement with respect to the scalar product of H). It follows that there is a sequence h_n ∈ H ∩ B such that

lim_{n→∞} ‖h − h_n‖_H = 0.   (27)

Since, by Eq. (25), ‖h − h_n‖_S ≤ ‖h − h_n‖_H and Q is continuous, it follows that Qh = lim_n Qh_n = 0, since Qh_n = 0. The statements of the lemma easily follow from the above facts. Indeed,

Qs = Q(Pf + h + g) = QPf = Pf,

since Pf ∈ H_0 ⊂ S_0, and Equation (21) is proved. Equation (22) follows from Eq. (26). Finally, let now f_n = Pf + h − h_n and g_n = g + h_n. Clearly, f_n + g_n = f + g = s, f_n ∈ H and g_n ∈ B, and moreover Eq. (23) follows from Eq. (27).

We are now ready to prove the main theorem of this section.

Proof [Theorem 6] First of all we note the following facts. Let f ∈ H, g ∈ B and s = f + g ∈ S. By Eq. (22),

I[s] + λ‖Qs‖_S^2 = I[f + g] + λ‖Pf‖_H^2.   (28)

Let (f_n, g_n) ∈ H × B be as in Lemma 7; then

I[f + g] + λ‖Pf‖_H^2 = lim_{n→∞} ( I[f_n + g_n] + λ‖f_n‖_H^2 ).

From the above equalities it follows that

I[s] + λ‖Qs‖_S^2 = lim_{n→∞} ( I[f_n + g_n] + λ‖f_n‖_H^2 ).   (29)

We can now prove the first part of the theorem. Assume that (f^λ, g^λ) ∈ H × B is a minimizer of I[f + g] + λ‖f‖_H^2 and let s^λ = f^λ + g^λ. From Eq. (29) and the definition of minimizer, one has that, for all s ∈ S,

I[s] + λ‖Qs‖_S^2 ≥ I[f^λ + g^λ] + λ‖f^λ‖_H^2.   (30)

In particular, with the choice s = s^λ, by means of Eq. (22), one has that ‖Qs^λ‖_S = ‖Pf^λ‖_H ≤ ‖f^λ‖_H and, hence, that Qs^λ = Pf^λ = f^λ. Therefore, it follows that

I[s] + λ‖Qs‖_S^2 ≥ I[s^λ] + λ‖Qs^λ‖_S^2,

that is, s^λ is a minimizer of I[s] + λ‖Qs‖_S^2.

Before proving the second part of the theorem we note that the following inequality follows as a simple consequence of the definition of projection:

I[s] + λ‖Qs‖_S^2 = I[f + g] + λ‖Pf‖_H^2 ≤ I[f + g] + λ‖f‖_H^2.   (31)

Assume now that s^λ ∈ S is a minimizer of I[s] + λ‖Qs‖_S^2. Let f^λ = Qs^λ and g^λ = s^λ − f^λ; then, by Eq. (31) and Eq. (22), it follows that

I[f^λ + g^λ] + λ‖f^λ‖_H^2 ≤ inf_{(f,g) ∈ H×B} { I[f + g] + λ‖f‖_H^2 }.

However, using Eq. (29) with s = f^λ + g^λ, one has that

I[f^λ + g^λ] + λ‖f^λ‖_H^2 ≥ inf_{(f,g) ∈ H×B} { I[f + g] + λ‖f‖_H^2 }.

So I[f^λ + g^λ] + λ‖f^λ‖_H^2 is the infimum of I[f + g] + λ‖f‖_H^2 on H × B. Clearly, if g^λ ∈ B, it follows that (f^λ, g^λ) is a minimizer of I[f + g] + λ‖f‖_H^2.

5.4 A Counterexample

The following example shows that in some pathological frameworks the minimization on H × B is not equivalent to the one on S = H + B.

Example 1 Let H = ℓ² = { f = (f_n)_{n∈N} | Σ_n f_n^2 < +∞ }. The space ℓ² is a RKHS on N with respect to the kernel K(n, m) = δ_{n,m}. Let B = { f ∈ ℓ² | Σ_n n^2 f_n^2 < +∞ } with the scalar product

⟨f, g⟩_B = Σ_n n^2 f_n g_n.

The space B is a RKHS with respect to the kernel K_B(n, m) = (1/n^2) δ_{n,m}. Clearly, B ⊂ H, so that H ∩ B = B, which is not closed in H. Since B is dense in H, P = 0 and, by Lemma 7, Q = 0. Let V be the squared loss function and choose h = (h_n)_{n∈N} ∈ H such that h ∉ B. Let ρ(n, y) = δ(y − h_n), so that I[s] = ‖s − h‖^2 (a weighted ℓ² distance); then

I[s] + λ‖Qs‖_S^2 = ‖s − h‖^2,

and the minimizer is s^λ = h. Moreover, by our theorem, one has that

inf_{f∈H, g∈B} { I[f + g] + λ‖f‖_H^2 } = I[s^λ] + λ‖Qs^λ‖_S^2 = 0.

If (f^λ, g^λ) ∈ H × B were a minimizer, then f^λ = 0 and, hence, g^λ = h, but this is impossible since h ∉ B.

6. Existence and Uniqueness

We now discuss existence and uniqueness of the regularized solution in S. Before stating and proving the main results we summarize our findings and show that if the offset space is empty both existence and uniqueness are easily obtained. Our analysis extends existence to all cases of interest under some weak assumptions on the kernel and the loss function, for both regression and classification. Uniqueness depends critically on the convexity assumption. For strictly convex loss functions we prove that the solution is unique if and only if the offset space satisfies suitable conditions, fulfilled in the case of constant offsets. For loss functions which are not strictly convex we limit our attention to the hinge loss and show that the solution is unique unless some particular conditions on the number and location of the support vectors are met. In Burges and Crisp (2000, 2003) similar results were obtained considering the dual formulation of the minimization problem.

If the offset space is empty, strict convexity and coerciveness of the penalty term trivially imply both existence and uniqueness. Indeed, we have the following proposition.

Proposition 8 Given λ > 0, there exists a unique solution of the problem

min_{f∈H} ( I[f] + λ‖f‖_H^2 ).

Proof The functional I[f] + λ‖f‖_H^2 is strictly convex and continuous. Moreover, I[f] + λ‖f‖_H^2 ≥ λ‖f‖_H^2 → +∞ as ‖f‖_H goes to +∞. From item 4 of Proposition 14 both existence and uniqueness follow.

6.1 Existence

We now consider existence. If B is not trivial, there are no general results (see Wahba, 1990, for a discussion on this subject). However, if B is the set of constant functions, we derive existence of the solution in two different settings. The first proposition holds only for classification under the assumption that the loss function V goes to infinity when y f(x) goes to −∞ (see Condition 1 of Proposition 9 below). Similar results were obtained in Steinwart (2002). We let ν be the marginal measure on X associated with ρ and supp ν its support.

Proposition 9 Assume that the following conditions hold:

1. lim_{w→−∞} V(1, w) = +∞ and lim_{w→+∞} V(−1, w) = +∞;
2. there is C > 0 such that K(x, x) ≤ C for all x ∈ supp ν;
3. ρ(X × {1}) > 0 and ρ(X × {−1}) > 0.

Then there is at least one solution of the problem

min_{s∈S} ( I[s] + λ‖Qs‖_S^2 ),

where S = H + R.

We observe that Assumption 2 is satisfied if X is compact and K is continuous. Assumption 3 has a very natural interpretation in the discrete setting, where it simply amounts to having one example for each class. This condition is needed since Assumption 1 does not require that V goes to +∞ when y f(x) goes to +∞. A typical example of a loss function satisfying Assumption 1 is the hinge loss.

The second result holds both for regression and classification, but it requires the loss function to go to infinity when f(x) goes to ±∞, uniformly in y (compare Assumption 1 of Proposition 10 and Assumption 1 of Proposition 9).

Proposition 10 Assume that the following conditions hold:

1. lim_{w→±∞} ( inf_{y∈Y} V(y, w) ) = +∞;
2. there is C > 0 such that K(x, x) ≤ C for all x ∈ supp ν.

Then there is at least one solution of the problem

min_{s∈S} ( I[s] + λ‖Qs‖_S^2 ),

where S = H + R.

We observe that for classification with symmetric loss functions, such as the squared loss function, this proposition gives a sharper result than Proposition 9. We now prove Proposition 9 and omit the proof of Proposition 10 since it is essentially the same.

Proof [of Proposition 9] The idea of the proof is to show that the functional we have to minimize goes to +∞ when ‖s‖_S goes to +∞. With this aim, let

α = min{ ρ(X × {1}), ρ(X × {−1}) }.

By Assumption 3, α > 0. For a fixed M > 0, we are looking for R > 0 such that, for all s ∈ S with ‖s‖_S ≥ R,

I[s] + λ‖Qs‖_S^2 ≥ M.

Due to Assumption 1, there is r > 0 such that, for all w ≤ −r, V(1, w) ≥ M/α and, for all w ≥ r, V(−1, w) ≥ M/α. We now let

R = max{ 2(1 + C) √(M/λ), 2r }

and choose s ∈ S with ‖s‖_S ≥ R (without loss of generality we take C ≥ 1). If ‖Qs‖_S = ‖Qs‖_H ≥ R/(2(1 + C)), then

I[s] + λ‖Qs‖_S^2 ≥ λ‖Qs‖_S^2 ≥ λ ( R/(2(1 + C)) )^2 ≥ M,

since R/(2(1 + C)) ≥ √(M/λ). If ‖Qs‖_S ≤ R/(2(1 + C)), let b = s − Qs ∈ R; then

|b| = ‖s − Qs‖_S ≥ ‖s‖_S − ‖Qs‖_S ≥ R − R/(2(1 + C)) = R (2C + 1)/(2C + 2).

Assume, for example, that b > 0. For all x ∈ supp ν,

s(x) = ⟨Qs, K_x⟩_H + b ≥ b − ‖Qs‖_H ‖K_x‖_H ≥ R (2C + 1)/(2C + 2) − R √C/(2(1 + C)) ≥ R (C + 1)/(2C + 2) = R/2 ≥ r,

since √C ≤ C and R ≥ 2r. By the definition of r, one has that, for all x ∈ supp ν,

V(−1, s(x)) ≥ M/α.

Integrating both sides, we find

∫_{X×{−1}} V(−1, s(x)) dρ(x, y) ≥ (M/α) ρ(X × {−1}) ≥ M,

from which it follows that

I[s] + λ‖Qs‖_S^2 ≥ M.

The same proof holds when b < 0, replacing the integration on X × {−1} with the integration on X × {1}. Since M is arbitrary, we have that

I[s] + λ‖Qs‖_S^2 ≥ λ‖Qs‖_S^2 → +∞   as ‖s‖_S → +∞.

Since the functional is continuous, from item 4 of Proposition 14 the existence of the minimizer follows.

6.2 Uniqueness

The first proposition completely characterizes uniqueness for strictly convex loss functions.

Proposition 11 Let s^λ be a solution of the problem

min_{s∈S} ( I[s] + λ‖Qs‖_S^2 ).

1. If s' is another solution, then Qs' = Qs^λ.
2. If V(y, ·) is strictly convex for all y ∈ Y, then all the minimizers are of the form s^λ + g, with g ∈ S such that Qg = 0 and g(x) = 0 for ν-almost all x ∈ X.

Let us comment on this proposition before providing the proof. We recall that a solution s^λ is the sum of two terms: f^λ = Qs^λ, which is orthogonal to B, and g^λ = s^λ − f^λ. The uniqueness of f^λ (item 1) is due to the strict convexity of the penalty term. Item 2 states the general conditions that should be satisfied by offset functions to obtain uniqueness of s^λ: in the discrete setting one has uniqueness if and only if the condition g(x_i) = 0 for all i implies that g is equal to zero. Clearly, if B is the space of constant functions, uniqueness is ensured. We now give the proof of the proposition.

Proof [of Proposition 11]

1. Let s' be another minimizer and assume that Qs^λ ≠ Qs'. Then, by the strict convexity of ‖·‖_S^2, one has that, for all t ∈ ]0, 1[,

‖(1−t)Qs^λ + tQs'‖_S^2 < (1−t)‖Qs^λ‖_S^2 + t‖Qs'‖_S^2.

Since I[s] is convex, one has that

I[(1−t)s^λ + ts'] ≤ (1−t)I[s^λ] + tI[s'].

From the above two inequalities we find

I[(1−t)s^λ + ts'] + λ‖Q((1−t)s^λ + ts')‖_S^2 < (1−t)( I[s^λ] + λ‖Qs^λ‖_S^2 ) + t( I[s'] + λ‖Qs'‖_S^2 ) = min_{s∈S} ( I[s] + λ‖Qs‖_S^2 ).

Since this is impossible, it follows that Qs^λ = Qs'.

2. Let s' = s^λ + g with g as in the statement of item 2. By straightforward computation we have that s' is a minimizer. It is left to show that the minimizers are only the functions written in the above form. From item 1 we have that Qg = 0. Let U be the measurable set

U = { x ∈ X | g(x) ≠ 0 } = { x ∈ X | s'(x) ≠ s^λ(x) }.

By contradiction, let us assume that ν(U) > 0 and, hence, ρ(U × Y) > 0. Fix t ∈ ]0, 1[. Since V(y, ·) is strictly convex, for all (x, y) ∈ U × Y one has that

V(y, (1−t)s^λ(x) + ts'(x)) < (1−t)V(y, s^λ(x)) + tV(y, s'(x)).

Therefore, by integration,

∫_{U×Y} V(y, (1−t)s^λ(x) + ts'(x)) dρ(x, y) < (1−t) ∫_{U×Y} V(y, s^λ(x)) dρ(x, y) + t ∫_{U×Y} V(y, s'(x)) dρ(x, y).

On the complement of U × Y we have V(y, s^λ(x)) = V(y, s'(x)), so that

I[(1−t)s^λ + ts'] < (1−t)I[s^λ] + tI[s'].

By the same line of reasoning as in item 1, one finds a contradiction. It follows that ν(U) = 0, that is, g(x) = 0 for ν-almost all x ∈ X.

Two important examples of convex loss functions which are not strictly convex are the hinge and the ε-insensitive loss. The next proposition deals with the hinge loss, though a similar result can also be derived for the ε-insensitive loss, see Burges and Crisp (2000). For the sake of simplicity we develop our result in the discrete setting for the case of constant offset functions. In this case uniqueness of the solution is expressed as a condition on the number of support vectors of the two classes. Similar but a little more involved conditions can be found considering the continuous setting.

Proposition 12 Let Y = {±1}, V(y, w) = |1 − yw|_+ and B = R. Let s^λ be a solution of

min_{s∈S} ( (1/ℓ) Σ_{i=1}^ℓ V(y_i, s(x_i)) + λ‖Qs‖_S^2 ),

and define

I_+ = { i | y_i = 1, s^λ(x_i) < 1 }
I_− = { i | y_i = −1, s^λ(x_i) > −1 }
B_+ = { i | y_i = 1, s^λ(x_i) = 1 }
B_− = { i | y_i = −1, s^λ(x_i) = −1 }.

The solution is unique if and only if neither

#I_+ = #I_− + #B_−   (32)

nor

#I_− = #I_+ + #B_+   (33)

holds, where # denotes set cardinality.

Proof Assume that s' is another solution. From item 1 of Proposition 11, we have that Qs^λ = Qs' and s' = s^λ + b. Since both functions are minimizers, one concludes that

Σ_{i=1}^ℓ |1 − y_i s^λ(x_i)|_+ = Σ_{i=1}^ℓ |1 − y_i s'(x_i)|_+.   (34)

We notice that if y w_1 < 1 and y w_2 > 1, then

V(y, (1−t)w_1 + t w_2) < (1−t)V(y, w_1) + t V(y, w_2).

Reasoning as in the proof of the previous proposition, one has that, for all i ∈ I_+ ∪ I_−,

y_i s'(x_i) ≤ 1,

and, for all i ∉ (I_+ ∪ I_− ∪ B_+ ∪ B_−),

y_i s'(x_i) ≥ 1.

Using the above two equations, it follows that equality (34) becomes

Σ_{i ∈ I_+ ∪ I_−} (1 − y_i s^λ(x_i)) = Σ_{i ∈ I_+ ∪ I_−} (1 − y_i s'(x_i)) + Σ_{i ∈ B_+ ∪ B_−} |−b y_i|_+

(if an index set is empty, we let the corresponding sum be equal to 0). The above equation is equivalent to

Σ_{i ∈ I_+ ∪ I_−} b y_i = Σ_{i ∈ B_+ ∪ B_−} |−b y_i|_+,

which has a non trivial solution if and only if both the following conditions are true:

1. if b > 0, then Σ_{i ∈ I_+ ∪ I_−} y_i = −Σ_{i ∈ B_−} y_i (that is, Eq. (32) holds);
2. if b < 0, then Σ_{i ∈ I_+ ∪ I_−} y_i = −Σ_{i ∈ B_+} y_i (that is, Eq. (33) holds).

Now, if neither Eq. (32) nor Eq. (33) holds, then b = 0 and s^λ is unique. Conversely, assume for example that Eq. (32) holds. It is simple to check that there is b > 0 such that, for all i ∈ I_+ ∪ I_−,

y_i ( s^λ(x_i) + b ) ≤ 1,

and, for all i ∉ (I_+ ∪ I_− ∪ B_+ ∪ B_−),

y_i ( s^λ(x_i) + b ) ≥ 1.

Finally, by direct computation one has that

I[s^λ] = I[s^λ + b].

If the solution is not unique, the solution family is parameterized as s^λ + b, where b runs in a closed, not necessarily bounded interval. However, if there is at least one example for each class, b lies in a bounded interval [b_−, b_+] and one can easily show that:

1. for the solution with b = b_−, Eq. (32) holds;
2. for the solution with b = b_+, Eq. (33) holds;
3. for the solutions with b_− < b < b_+, both Eqs. (32) and (33) hold, from which it follows that #I_+ = #I_− and #B_+ = #B_− = 0.
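The following helper (a sketch added for illustration, not part of the paper) counts the index sets of Proposition 12 from the values s^λ(x_i) of a trained hinge-loss classifier and tests the two degenerate count conditions; the numerical tolerance used to decide membership in B_± is an assumption of the example, and in practice the counts depend on that tolerance.

```python
import numpy as np

def offset_uniqueness_check(y, s_vals, tol=1e-6):
    """Count the index sets of Proposition 12 and test conditions (32)-(33).

    y      : array of labels in {-1, +1}
    s_vals : array with s_lambda(x_i) at the training points
    Returns (unique, counts): `unique` is False when either count condition
    holds, in which case the constant offset is not uniquely determined.
    """
    y = np.asarray(y); s = np.asarray(s_vals)
    m = y * s                                              # margins y_i * s_lambda(x_i)
    I_plus  = int(np.sum((y == 1)  & (m < 1 - tol)))       # positives inside the margin
    I_minus = int(np.sum((y == -1) & (m < 1 - tol)))       # negatives inside the margin
    B_plus  = int(np.sum((y == 1)  & (np.abs(m - 1) <= tol)))   # positives exactly on the margin
    B_minus = int(np.sum((y == -1) & (np.abs(m - 1) <= tol)))   # negatives exactly on the margin
    eq32 = (I_plus == I_minus + B_minus)                   # allows shifting the offset upward
    eq33 = (I_minus == I_plus + B_plus)                    # allows shifting the offset downward
    counts = dict(I_plus=I_plus, I_minus=I_minus, B_plus=B_plus, B_minus=B_minus)
    return (not eq32) and (not eq33), counts
```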

7. Discrete Tikhonov Regularization

We now specialize our results to the case in which the probability measure is the empirical distribution ρ_S and B is the space of constant functions (K_B = 1), and we discuss in detail Support Vector Machines for classification. We start by recalling that from item 2 of Proposition 14 it follows that the left and right derivatives of V(y, ·) always exist and

(∂V)(y, w) = [V'_−(y, w), V'_+(y, w)].

Corollary 13 Let S = H + R and Q the projection onto { s ∈ S | ⟨s, 1⟩_S = 0 }. Given λ > 0, let f^λ ∈ H and b^λ ∈ R and define s^λ = f^λ + b^λ ∈ S; then

(f^λ, b^λ) ∈ argmin_{f∈H, b∈R} { (1/ℓ) Σ_i V(y_i, f(x_i) + b) + λ‖f‖_H^2 }

if and only if

s^λ ∈ argmin_{s∈S} { (1/ℓ) Σ_i V(y_i, s(x_i)) + λ‖Qs‖_S^2 },   f^λ = Qs^λ,

if and only if there are α_1, ..., α_ℓ ∈ R such that

f^λ = Σ_{i=1}^ℓ α_i K_{x_i} = Σ_{i=1}^ℓ α_i (K_{x_i} + 1),

−(1/(2λℓ)) V'_+(y_i, f^λ(x_i) + b^λ) ≤ α_i ≤ −(1/(2λℓ)) V'_−(y_i, f^λ(x_i) + b^λ),

Σ_{i=1}^ℓ α_i = 0.

We notice two facts. First, α_i can be zero only if 0 ∈ (∂V)(y_i, f^λ(x_i) + b^λ), that is, only if f^λ(x_i) + b^λ is a minimizer of V(y_i, ·). Therefore, a necessary condition for obtaining sparsity is a plateau in the loss function. A quantitative discussion on this topic can be found in Steinwart (2003). Second, if V'_− and V'_+ are bounded by a constant M > 0, one has that |α_i| ≤ M/(2λℓ), that is, a sufficient condition for box constraints on the coefficients.

In the rest of this section we consider Support Vector Machines for classification, showing that through our analysis the solution is completely characterized in the primal formulation. A simple calculation for the hinge loss shows that

[V'_−(y, w), V'_+(y, w)] = {−y}                        for yw < 1,
                           [min{−y, 0}, max{0, −y}]    for yw = 1,
                           {0}                         for yw > 1.   (35)
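The short sketch below (added here as an illustration, not part of the paper) checks the conditions of Corollary 13 for the hinge loss on a trained classifier. It assumes scikit-learn is available and uses its SVC solver with a precomputed kernel purely as an off-the-shelf way to obtain a minimizer; the mapping C = 1/(2λℓ), the data, and the tolerances are assumptions of the example.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
ell, lam = 80, 0.01
X = rng.standard_normal((ell, 2))
y = np.where(X[:, 0] + 0.3 * rng.standard_normal(ell) > 0, 1, -1)

K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))   # Gaussian Gram matrix
C = 1.0 / (2 * lam * ell)   # (1/ell)*sum hinge + lam*||f||^2 matches SVC with this C

svc = SVC(C=C, kernel="precomputed").fit(K, y)

# Recover the coefficients alpha_i of Corollary 13: s(x) = sum_i alpha_i K(x, x_i) + b.
alpha = np.zeros(ell)
alpha[svc.support_] = svc.dual_coef_[0]   # equals y_i times the dual variable, zero off the support
b = svc.intercept_[0]
margins = y * (K @ alpha + b)
tol = 1e-2

checks = {
    "sum_i alpha_i = 0":            abs(alpha.sum()) < tol,
    "0 <= y_i alpha_i <= C":        bool(np.all((y * alpha > -tol) & (y * alpha < C + tol))),
    "alpha_i = 0 when margin > 1":  bool(np.all(np.abs(alpha[margins > 1 + tol]) < tol)),
    "y_i alpha_i = C when margin < 1":
        bool(np.all(y[margins < 1 - tol] * alpha[margins < 1 - tol] > C - tol)),
}
print(checks)
```

Up to the solver tolerance, the printed checks reproduce the box constraints and the sparsity pattern that Eq. (35) forces on the coefficients.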

To be consistent with the notation used in the literature, we let C = 1/(2λℓ) and factorize the labels y_i from the coefficients α_i. Then, according to the above corollary, the solution of the SVM algorithm is given by

s^λ = Σ_{i=1}^ℓ α_i y_i K_{x_i} + b^λ,

where the set (α_1, ..., α_ℓ, b^λ) solves the following algebraic system of inequalities:

0 ≤ α_i ≤ C   if   y_i ( Σ_{j=1}^ℓ α_j y_j K(x_i, x_j) + b^λ ) = 1,
α_i = 0        if   y_i ( Σ_{j=1}^ℓ α_j y_j K(x_i, x_j) + b^λ ) > 1,   (36)
α_i = C        if   y_i ( Σ_{j=1}^ℓ α_j y_j K(x_i, x_j) + b^λ ) < 1,
Σ_i α_i y_i = 0.

Interestingly, the above inequalities, which fully characterize the support vectors associated with the solution, are usually obtained as the Kuhn-Tucker conditions of the dual QP optimization problem (Vapnik, 1988). Looking at Eqs. (35)-(36), it is immediate to see that the box constraints (0 ≤ α_i ≤ C) are due to the linearity of V(yf(x)) for yf(x) < 1, whereas sparsity (α_i = 0) follows from the constancy of V(yf(x)) for yf(x) > 1.

8. Conclusion

In this paper we study some properties of learning functionals derived from Tikhonov regularization. We develop our analysis in a continuous setting and use tools from convex analysis in infinite dimensional spaces to quantitatively characterize the explicit form of the regularized solution for both regression and classification. We also address the cases with and without the offset term within the same unifying framework. We show that the presence of an offset term is equivalent to solving a standard problem of regularization in a Reproducing Kernel Hilbert Space in which the penalty term is given by a seminorm. Finally, we discuss issues of existence and uniqueness of the solution and specialize our results to the discrete setting. Current work aims at extending these results to vector-valued functions (Micchelli and Pontil, 2003) and exploring the possible use of offset functions to incorporate invariances (Girosi and Chan, 1995).

Acknowledgments

We thank the anonymous referees for suggestions leading to an improved version of the paper. A. Caponnetto is supported by a PRIN fellowship within the project "Inverse problems in medical imaging". This research has been partially funded by the INFM Project MAIA,

the FIRB Project ASTAA and the IST Programme of the European Community, under the PASCAL Network of Excellence (IST).

Appendix A. Convex Functions in Infinite Dimensional Spaces

The proof of Theorem 2 is based on the properties of convex functions defined on infinite dimensional spaces. In particular, we use the notion of subgradient, which extends the notion of derivative to convex non-differentiable functions. In this appendix we collect the results we need. For details see the book by Ekeland and Turnbull (1983) and also Ekeland and Temam (1974).

Let H be a Banach space and H* its dual. A function F : H → R ∪ {+∞} is convex if

F(tv + (1−t)w) ≤ tF(v) + (1−t)F(w)

for all v, w ∈ H and t ∈ [0, 1] (if the strict inequality holds for t ∈ ]0, 1[, F is called strictly convex). Let v_0 ∈ H be such that F(v_0) < +∞. The subgradient of F at the point v_0 ∈ H is the subset of H* given by

∂F(v_0) = { w ∈ H* | F(v) ≥ F(v_0) + ⟨w, v − v_0⟩  ∀v ∈ H },   (37)

where ⟨·,·⟩ is the pairing between H* and H. If F(v_0) = +∞, we let ∂F(v_0) = ∅. In the following proposition we summarize the main properties of the subgradient we need.

Proposition 14 The following facts hold:

1. If F is differentiable at v_0, the subgradient reduces to the usual gradient, ∂F(v_0) = {F'(v_0)}.
2. If F is defined on R and F(v_0) < +∞, then F admits left and right derivatives and ∂F(v_0) = [F'_−(v_0), F'_+(v_0)].
3. Assume that F ≢ +∞. A point v_0 is a minimizer of F if and only if 0 ∈ ∂F(v_0).
4. If F is continuous and lim_{‖v‖_H → +∞} F(v) = +∞, then F has a minimizer. If F is strictly convex, the minimizer is unique.
5. Let G be another convex function on H. Assume that there is v_0 ∈ H such that F and G are continuous and finite at v_0. Let a, b ≥ 0; then aF + bG is convex and, for all v ∈ H,

   ∂(aF + bG)(v) = a(∂F)(v) + b(∂G)(v).

6. Let H' be another Banach space and J be a continuous linear operator from H' into H. Assume that there is v_0' ∈ H' such that F is continuous and finite at J v_0'. For all v' ∈ H',

   ∂(F ∘ J)(v') = J*(∂F)(J v'),

   where J* : H* → H'* is the adjoint of J, defined by ⟨v*, J v'⟩_H = ⟨J* v*, v'⟩_{H'} for all v' ∈ H' and v* ∈ H*.
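As a one-dimensional numerical illustration of items 2-4 (a sketch added here, not part of the paper), the code below takes a hinge-plus-quadratic function, locates its minimizer on a grid, estimates the one-sided derivatives by finite differences, and checks that 0 lies in the resulting subgradient interval; the particular function and step sizes are assumptions of the example.

```python
import numpy as np

lam = 0.3
F = lambda v: np.maximum(0.0, 1.0 - v) + lam * v ** 2   # convex, non-differentiable at v = 1

# Item 4: F is continuous and coercive, so a minimizer exists; locate it on a fine grid.
grid = np.linspace(-3, 3, 60001)
v0 = grid[np.argmin(F(grid))]

# Item 2: on R the subgradient is the interval [F'_-(v0), F'_+(v0)];
# estimate the one-sided derivatives by finite differences.
h = 1e-6
left  = (F(v0) - F(v0 - h)) / h
right = (F(v0 + h) - F(v0)) / h

# Item 3: v0 is a minimizer if and only if 0 lies in the subgradient interval.
print(f"v0 ~ {v0:.4f}, subgradient ~ [{left:.3f}, {right:.3f}], contains 0: {left <= 0 <= right}")
```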


More information

ANALOG OF HEAT EQUATION FOR GAUSSIAN MEASURE OF A BALL IN HILBERT SPACE GYULA PAP

ANALOG OF HEAT EQUATION FOR GAUSSIAN MEASURE OF A BALL IN HILBERT SPACE GYULA PAP ANALOG OF HEAT EQUATION FOR GAUSSIAN MEASURE OF A BALL IN HILBERT SPACE GYULA PAP ABSTRACT. If µ is a Gaussian measure on a Hibert space with mean a and covariance operator T, and r is a} fixed positive

More information

Completion. is dense in H. If V is complete, then U(V) = H.

Completion. is dense in H. If V is complete, then U(V) = H. Competion Theorem 1 (Competion) If ( V V ) is any inner product space then there exists a Hibert space ( H H ) and a map U : V H such that (i) U is 1 1 (ii) U is inear (iii) UxUy H xy V for a xy V (iv)

More information

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES

MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is

More information

2M2. Fourier Series Prof Bill Lionheart

2M2. Fourier Series Prof Bill Lionheart M. Fourier Series Prof Bi Lionheart 1. The Fourier series of the periodic function f(x) with period has the form f(x) = a 0 + ( a n cos πnx + b n sin πnx ). Here the rea numbers a n, b n are caed the Fourier

More information

Another Class of Admissible Perturbations of Special Expressions

Another Class of Admissible Perturbations of Special Expressions Int. Journa of Math. Anaysis, Vo. 8, 014, no. 1, 1-8 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.1988/ijma.014.31187 Another Cass of Admissibe Perturbations of Specia Expressions Jerico B. Bacani

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information

Numerical methods for elliptic partial differential equations Arnold Reusken

Numerical methods for elliptic partial differential equations Arnold Reusken Numerica methods for eiptic partia differentia equations Arnod Reusken Preface This is a book on the numerica approximation of partia differentia equations. On the next page we give an overview of the

More information

SVM: Terminology 1(6) SVM: Terminology 2(6)

SVM: Terminology 1(6) SVM: Terminology 2(6) Andrew Kusiak Inteigent Systems Laboratory 39 Seamans Center he University of Iowa Iowa City, IA 54-57 SVM he maxima margin cassifier is simiar to the perceptron: It aso assumes that the data points are

More information

JENSEN S OPERATOR INEQUALITY FOR FUNCTIONS OF SEVERAL VARIABLES

JENSEN S OPERATOR INEQUALITY FOR FUNCTIONS OF SEVERAL VARIABLES PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Voume 128, Number 7, Pages 2075 2084 S 0002-99390005371-5 Artice eectronicay pubished on February 16, 2000 JENSEN S OPERATOR INEQUALITY FOR FUNCTIONS OF

More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction

Akaike Information Criterion for ANOVA Model with a Simple Order Restriction Akaike Information Criterion for ANOVA Mode with a Simpe Order Restriction Yu Inatsu * Department of Mathematics, Graduate Schoo of Science, Hiroshima University ABSTRACT In this paper, we consider Akaike

More information

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC (January 8, 2003) A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC DAMIAN CLANCY, University of Liverpoo PHILIP K. POLLETT, University of Queensand Abstract

More information

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

BASIC NOTIONS AND RESULTS IN TOPOLOGY. 1. Metric spaces. Sets with finite diameter are called bounded sets. For x X and r > 0 the set

BASIC NOTIONS AND RESULTS IN TOPOLOGY. 1. Metric spaces. Sets with finite diameter are called bounded sets. For x X and r > 0 the set BASIC NOTIONS AND RESULTS IN TOPOLOGY 1. Metric spaces A metric on a set X is a map d : X X R + with the properties: d(x, y) 0 and d(x, y) = 0 x = y, d(x, y) = d(y, x), d(x, y) d(x, z) + d(z, y), for a

More information

Are Loss Functions All the Same?

Are Loss Functions All the Same? Are Loss Functions All the Same? L. Rosasco E. De Vito A. Caponnetto M. Piana A. Verri November 11, 2003 Abstract In this paper we investigate the impact of choosing different loss functions from the viewpoint

More information

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces Abstract and Appied Anaysis Voume 01, Artice ID 846396, 13 pages doi:10.1155/01/846396 Research Artice Numerica Range of Two Operators in Semi-Inner Product Spaces N. K. Sahu, 1 C. Nahak, 1 and S. Nanda

More information

Smoothness equivalence properties of univariate subdivision schemes and their projection analogues

Smoothness equivalence properties of univariate subdivision schemes and their projection analogues Numerische Mathematik manuscript No. (wi be inserted by the editor) Smoothness equivaence properties of univariate subdivision schemes and their projection anaogues Phiipp Grohs TU Graz Institute of Geometry

More information

A Brief Introduction to Markov Chains and Hidden Markov Models

A Brief Introduction to Markov Chains and Hidden Markov Models A Brief Introduction to Markov Chains and Hidden Markov Modes Aen B MacKenzie Notes for December 1, 3, &8, 2015 Discrete-Time Markov Chains You may reca that when we first introduced random processes,

More information

Rate-Distortion Theory of Finite Point Processes

Rate-Distortion Theory of Finite Point Processes Rate-Distortion Theory of Finite Point Processes Günther Koiander, Dominic Schuhmacher, and Franz Hawatsch, Feow, IEEE Abstract We study the compression of data in the case where the usefu information

More information

Homework 5 Solutions

Homework 5 Solutions Stat 310B/Math 230B Theory of Probabiity Homework 5 Soutions Andrea Montanari Due on 2/19/2014 Exercise [5.3.20] 1. We caim that n 2 [ E[h F n ] = 2 n i=1 A i,n h(u)du ] I Ai,n (t). (1) Indeed, integrabiity

More information

The Group Structure on a Smooth Tropical Cubic

The Group Structure on a Smooth Tropical Cubic The Group Structure on a Smooth Tropica Cubic Ethan Lake Apri 20, 2015 Abstract Just as in in cassica agebraic geometry, it is possibe to define a group aw on a smooth tropica cubic curve. In this note,

More information

Higher dimensional PDEs and multidimensional eigenvalue problems

Higher dimensional PDEs and multidimensional eigenvalue problems Higher dimensiona PEs and mutidimensiona eigenvaue probems 1 Probems with three independent variabes Consider the prototypica equations u t = u (iffusion) u tt = u (W ave) u zz = u (Lapace) where u = u

More information

A proposed nonparametric mixture density estimation using B-spline functions

A proposed nonparametric mixture density estimation using B-spline functions A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

Theory of Generalized k-difference Operator and Its Application in Number Theory

Theory of Generalized k-difference Operator and Its Application in Number Theory Internationa Journa of Mathematica Anaysis Vo. 9, 2015, no. 19, 955-964 HIKARI Ltd, www.m-hiari.com http://dx.doi.org/10.12988/ijma.2015.5389 Theory of Generaized -Difference Operator and Its Appication

More information

The Construction of a Pfaff System with Arbitrary Piecewise Continuous Characteristic Power-Law Functions

The Construction of a Pfaff System with Arbitrary Piecewise Continuous Characteristic Power-Law Functions Differentia Equations, Vo. 41, No. 2, 2005, pp. 184 194. Transated from Differentsia nye Uravneniya, Vo. 41, No. 2, 2005, pp. 177 185. Origina Russian Text Copyright c 2005 by Izobov, Krupchik. ORDINARY

More information

Math 124B January 31, 2012

Math 124B January 31, 2012 Math 124B January 31, 212 Viktor Grigoryan 7 Inhomogeneous boundary vaue probems Having studied the theory of Fourier series, with which we successfuy soved boundary vaue probems for the homogeneous heat

More information

Bourgain s Theorem. Computational and Metric Geometry. Instructor: Yury Makarychev. d(s 1, s 2 ).

Bourgain s Theorem. Computational and Metric Geometry. Instructor: Yury Makarychev. d(s 1, s 2 ). Bourgain s Theorem Computationa and Metric Geometry Instructor: Yury Makarychev 1 Notation Given a metric space (X, d) and S X, the distance from x X to S equas d(x, S) = inf d(x, s). s S The distance

More information

On Non-Optimally Expanding Sets in Grassmann Graphs

On Non-Optimally Expanding Sets in Grassmann Graphs ectronic Cooquium on Computationa Compexity, Report No. 94 (07) On Non-Optimay xpanding Sets in Grassmann Graphs Irit Dinur Subhash Khot Guy Kinder Dor Minzer Mui Safra Abstract The paper investigates

More information

XSAT of linear CNF formulas

XSAT of linear CNF formulas XSAT of inear CN formuas Bernd R. Schuh Dr. Bernd Schuh, D-50968 Kön, Germany; bernd.schuh@netcoogne.de eywords: compexity, XSAT, exact inear formua, -reguarity, -uniformity, NPcompeteness Abstract. Open

More information

arxiv: v1 [math.pr] 6 Oct 2017

arxiv: v1 [math.pr] 6 Oct 2017 EQUICONTINUOUS FAMILIES OF MARKOV OPERATORS IN VIEW OF ASYMPTOTIC STABILITY SANDER C. HILLE, TOMASZ SZAREK, AND MARIA A. ZIEMLAŃSKA arxiv:1710.02352v1 [math.pr] 6 Oct 2017 Abstract. Reation between equicontinuity

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0

u(x) s.t. px w x 0 Denote the solution to this problem by ˆx(p, x). In order to obtain ˆx we may simply solve the standard problem max x 0 Bocconi University PhD in Economics - Microeconomics I Prof M Messner Probem Set 4 - Soution Probem : If an individua has an endowment instead of a monetary income his weath depends on price eves In particuar,

More information

The arc is the only chainable continuum admitting a mean

The arc is the only chainable continuum admitting a mean The arc is the ony chainabe continuum admitting a mean Aejandro Ianes and Hugo Vianueva September 4, 26 Abstract Let X be a metric continuum. A mean on X is a continuous function : X X! X such that for

More information

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems

Convergence Property of the Iri-Imai Algorithm for Some Smooth Convex Programming Problems Convergence Property of the Iri-Imai Agorithm for Some Smooth Convex Programming Probems S. Zhang Communicated by Z.Q. Luo Assistant Professor, Department of Econometrics, University of Groningen, Groningen,

More information

Problem set 6 The Perron Frobenius theorem.

Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS

A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS J App Prob 40, 226 241 (2003) Printed in Israe Appied Probabiity Trust 2003 A CLUSTERING LAW FOR SOME DISCRETE ORDER STATISTICS SUNDER SETHURAMAN, Iowa State University Abstract Let X 1,X 2,,X n be a sequence

More information

arxiv: v1 [math.fa] 23 Aug 2018

arxiv: v1 [math.fa] 23 Aug 2018 An Exact Upper Bound on the L p Lebesgue Constant and The -Rényi Entropy Power Inequaity for Integer Vaued Random Variabes arxiv:808.0773v [math.fa] 3 Aug 08 Peng Xu, Mokshay Madiman, James Mebourne Abstract

More information

Maejo International Journal of Science and Technology

Maejo International Journal of Science and Technology Fu Paper Maejo Internationa Journa of Science and Technoogy ISSN 1905-7873 Avaiabe onine at www.mijst.mju.ac.th A study on Lucas difference sequence spaces (, ) (, ) and Murat Karakas * and Ayse Metin

More information

Indirect Optimal Control of Dynamical Systems

Indirect Optimal Control of Dynamical Systems Computationa Mathematics and Mathematica Physics, Vo. 44, No. 3, 24, pp. 48 439. Transated from Zhurna Vychisite noi Matematiki i Matematicheskoi Fiziki, Vo. 44, No. 3, 24, pp. 444 466. Origina Russian

More information

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations Goba Optimaity Principes for Poynomia Optimization Probems over Box or Bivaent Constraints by Separabe Poynomia Approximations V. Jeyakumar, G. Li and S. Srisatkunarajah Revised Version II: December 23,

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

arxiv: v3 [math.ca] 8 Nov 2018

arxiv: v3 [math.ca] 8 Nov 2018 RESTRICTIONS OF HIGHER DERIVATIVES OF THE FOURIER TRANSFORM MICHAEL GOLDBERG AND DMITRIY STOLYAROV arxiv:1809.04159v3 [math.ca] 8 Nov 018 Abstract. We consider severa probems reated to the restriction

More information

Discriminant Analysis: A Unified Approach

Discriminant Analysis: A Unified Approach Discriminant Anaysis: A Unified Approach Peng Zhang & Jing Peng Tuane University Eectrica Engineering & Computer Science Department New Oreans, LA 708 {zhangp,jp}@eecs.tuane.edu Norbert Riede Tuane University

More information

MIXING AUTOMORPHISMS OF COMPACT GROUPS AND A THEOREM OF SCHLICKEWEI

MIXING AUTOMORPHISMS OF COMPACT GROUPS AND A THEOREM OF SCHLICKEWEI MIXING AUTOMORPHISMS OF COMPACT GROUPS AND A THEOREM OF SCHLICKEWEI KLAUS SCHMIDT AND TOM WARD Abstract. We prove that every mixing Z d -action by automorphisms of a compact, connected, abeian group is

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

4 1-D Boundary Value Problems Heat Equation

4 1-D Boundary Value Problems Heat Equation 4 -D Boundary Vaue Probems Heat Equation The main purpose of this chapter is to study boundary vaue probems for the heat equation on a finite rod a x b. u t (x, t = ku xx (x, t, a < x < b, t > u(x, = ϕ(x

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

Volume 13, MAIN ARTICLES

Volume 13, MAIN ARTICLES Voume 13, 2009 1 MAIN ARTICLES THE BASIC BVPs OF THE THEORY OF ELASTIC BINARY MIXTURES FOR A HALF-PLANE WITH CURVILINEAR CUTS Bitsadze L. I. Vekua Institute of Appied Mathematics of Iv. Javakhishvii Tbiisi

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

Lecture Notes 4: Fourier Series and PDE s

Lecture Notes 4: Fourier Series and PDE s Lecture Notes 4: Fourier Series and PDE s 1. Periodic Functions A function fx defined on R is caed a periodic function if there exists a number T > such that fx + T = fx, x R. 1.1 The smaest number T for

More information

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto

Reproducing Kernel Hilbert Spaces Class 03, 15 February 2006 Andrea Caponnetto Reproducing Kernel Hilbert Spaces 9.520 Class 03, 15 February 2006 Andrea Caponnetto About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Selmer groups and Euler systems

Selmer groups and Euler systems Semer groups and Euer systems S. M.-C. 21 February 2018 1 Introduction Semer groups are a construction in Gaois cohomoogy that are cosey reated to many objects of arithmetic importance, such as cass groups

More information

On the evaluation of saving-consumption plans

On the evaluation of saving-consumption plans On the evauation of saving-consumption pans Steven Vanduffe Jan Dhaene Marc Goovaerts Juy 13, 2004 Abstract Knowedge of the distribution function of the stochasticay compounded vaue of a series of future

More information

B. Brown, M. Griebel, F.Y. Kuo and I.H. Sloan

B. Brown, M. Griebel, F.Y. Kuo and I.H. Sloan Wegeerstraße 6 53115 Bonn Germany phone +49 8 73-347 fax +49 8 73-757 www.ins.uni-bonn.de B. Brown, M. Griebe, F.Y. Kuo and I.H. Soan On the expected uniform error of geometric Brownian motion approximated

More information

Learning, Regularization and Ill-Posed Inverse Problems

Learning, Regularization and Ill-Posed Inverse Problems Learning, Regularization and Ill-Posed Inverse Problems Lorenzo Rosasco DISI, Università di Genova rosasco@disi.unige.it Andrea Caponnetto DISI, Università di Genova caponnetto@disi.unige.it Ernesto De

More information

FRIEZE GROUPS IN R 2

FRIEZE GROUPS IN R 2 FRIEZE GROUPS IN R 2 MAXWELL STOLARSKI Abstract. Focusing on the Eucidean pane under the Pythagorean Metric, our goa is to cassify the frieze groups, discrete subgroups of the set of isometries of the

More information

A Comparison Study of the Test for Right Censored and Grouped Data

A Comparison Study of the Test for Right Censored and Grouped Data Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the

More information

OPERATORS WITH COMMON HYPERCYCLIC SUBSPACES

OPERATORS WITH COMMON HYPERCYCLIC SUBSPACES OPERATORS WITH COMMON HYPERCYCLIC SUBSPACES R. ARON, J. BÈS, F. LEÓN AND A. PERIS Abstract. We provide a reasonabe sufficient condition for a famiy of operators to have a common hypercycic subspace. We

More information

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm

Asymptotic Properties of a Generalized Cross Entropy Optimization Algorithm 1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete

More information

On the estimation of multiple random integrals and U-statistics

On the estimation of multiple random integrals and U-statistics Péter Major On the estimation of mutipe random integras and U-statistics Lecture Note January 9, 2014 Springer Contents 1 Introduction................................................... 1 2 Motivation

More information

An explicit resolution of the equity-efficiency tradeoff in the random allocation of an indivisible good

An explicit resolution of the equity-efficiency tradeoff in the random allocation of an indivisible good An expicit resoution of the equity-efficiency tradeoff in the random aocation of an indivisibe good Stergios Athanassogou, Gauthier de Maere d Aertrycke January 2015 Abstract Suppose we wish to randomy

More information

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?

Bayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with? Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine

More information

CHAPTER 2 AN INTRODUCTION TO WAVELET ANALYSIS

CHAPTER 2 AN INTRODUCTION TO WAVELET ANALYSIS CHAPTER 2 AN INTRODUCTION TO WAVELET ANALYSIS [This chapter is based on the ectures of Professor D.V. Pai, Department of Mathematics, Indian Institute of Technoogy Bombay, Powai, Mumbai - 400 076, India.]

More information

THIELE CENTRE. On spectral distribution of high dimensional covariation matrices. Claudio Heinrich and Mark Podolskij

THIELE CENTRE. On spectral distribution of high dimensional covariation matrices. Claudio Heinrich and Mark Podolskij THIELE CENTRE for appied mathematics in natura science On spectra distribution of high dimensiona covariation matrices Caudio Heinrich and Mark Podoskij Research Report No. 02 December 2014 On spectra

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

QUANTITATIVE ANALYSIS OF FINITE-DIFFERENCE APPROXIMATIONS OF FREE-DISCONTINUITY PROBLEMS

QUANTITATIVE ANALYSIS OF FINITE-DIFFERENCE APPROXIMATIONS OF FREE-DISCONTINUITY PROBLEMS QUANTITATIVE ANALYSIS OF FINITE-DIFFERENCE APPROXIMATIONS OF FREE-DISCONTINUITY PROBLEMS ANNIKA BACH, ANDREA BRAIDES, AND CATERINA IDA ZEPPIERI Abstract. Motivated by appications to image reconstruction,

More information

ADELIC ANALYSIS AND FUNCTIONAL ANALYSIS ON THE FINITE ADELE RING. Ilwoo Cho

ADELIC ANALYSIS AND FUNCTIONAL ANALYSIS ON THE FINITE ADELE RING. Ilwoo Cho Opuscua Math. 38, no. 2 208, 39 85 https://doi.org/0.7494/opmath.208.38.2.39 Opuscua Mathematica ADELIC ANALYSIS AND FUNCTIONAL ANALYSIS ON THE FINITE ADELE RING Iwoo Cho Communicated by.a. Cojuhari Abstract.

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information