Sparse Semi-supervised Learning Using Conjugate Functions


Journal of Machine Learning Research 11 (2010) Submitted 12/09; Published 9/10

Sparse Semi-supervised Learning Using Conjugate Functions

Shiliang Sun, Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, P. R. China

John Shawe-Taylor, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, United Kingdom

Editor: Tony Jebara

Abstract

In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of unlabeled data and a few labeled data to represent target functions and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs, which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem and the optimal solutions have arguably saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to show their possibility and potential for use in large-scale problems, and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity in characterizing the roles of unlabeled data for semi-supervised learning.

Keywords: semi-supervised learning, Fenchel-Legendre conjugate, representer theorem, multi-view regularization, support vector machine, statistical learning theory

1. Introduction

Semi-supervised learning, which considers how to estimate a target function from a few labeled examples and a large quantity of unlabeled examples, is one of the currently active research directions. If the unlabeled data are properly used, semi-supervised methods can achieve superior performance over their supervised counterparts. For an overview of semi-supervised learning methods, refer to Chapelle et al. (2006) and Zhu (2008).

Although semi-supervised learning was largely motivated by real-world applications where obtaining labels is expensive or time-consuming, a lot of theoretical outcomes have also been accomplished. Typical applications of semi-supervised learning include natural image classification and text classification, where it is inexpensive to collect large numbers of images and texts by automatic programs, but labeling them manually is costly.

Theoretical results on semi-supervised learning include PAC analysis (Balcan and Blum, 2005), manifold regularization (Belkin et al., 2006), and multi-view regularization theories (Sindhwani and Rosenberg, 2008), etc.

Among the methods proposed for semi-supervised learning, a family of them, for example, Laplacian regularized least squares (RLS), Laplacian support vector machines (SVMs), Co-RLS, Co-Laplacian RLS, Co-Laplacian SVMs, and manifold co-regularization (Belkin et al., 2006; Sindhwani et al., 2005; Sindhwani and Rosenberg, 2008), make use of the following representer theorem (Kimeldorf and Wahba, 1971) to represent the target function in a reproducing kernel Hilbert space (RKHS).

Theorem 1 (Representer theorem) Let H be an RKHS with kernel k: X × X → R. Fix any function V: R^n → R and any nondecreasing function Ψ: R → R. Define J(f) = V(f(x_1),...,f(x_n)) + Ψ(‖f‖²), and the linear space L = span{k(x_1,·),...,k(x_n,·)}. Then for any f ∈ H we have J(f_L) ≤ J(f), with f_L being the projection of f onto L, of the form f_L = Σ_{i=1}^{n} α_i k(x_i,·). Thus if J* = min_f J(f) exists, this minimum is attained for some f_L ∈ L. Moreover, if Ψ is strictly increasing, each minimizer of J(f) over H must be contained in L.

Generally, in the objective function J(f) of these semi-supervised learning methods, labeled examples are used to calculate an empirical loss of the target function while unlabeled examples are simultaneously used for some regularization purpose. By the representer theorem, the target function would involve kernel evaluations on all the labeled and unlabeled examples. This is computationally undesirable, because for semi-supervised learning a considerably large number of unlabeled examples are usually available. Consequently, sparsity in the number of unlabeled data used to represent target functions is crucial, and this constitutes the focus of this paper. However, little work has been done on this theme. In particular, no unified framework has yet been proposed to deal with this sparsity concern. While the sparse Laplacian core vector machines (Tsang and Kwok, 2007) touched on this problem, that method has a complicated optimization and is not generic enough to generalize to other similar semi-supervised learning methods. In contrast, the technique developed in this paper, based on Fenchel-Legendre conjugates, is computationally simple and widely applicable.

As far as multi-view learning is concerned, there has been work that introduces sparsity of the unlabeled data into the representation of the classifiers (Szedmak and Shawe-Taylor, 2007). This builds on the ideas developed for two-view learning known as the SVM-2K (Farquhar et al., 2006). The approach adopted is the use of an ε-insensitive loss function for the similarity constraint between the two functions from the two views. Unfortunately the resulting optimization is somewhat unmanageable and only scales to small data sets, despite interesting theoretical bounds that show the improvement gained using the unlabeled data. The work by Szedmak and Shawe-Taylor (2007) forms the starting point for the current paper, which aims to develop related methods that can be scaled to very large data sets.

Our approach is to go back to considering an ℓ2 loss between the outputs of the classifiers arising from the two views, and to show that this problem can be solved implicitly with variables only indexed by the labeled data.

To compute the value of this function on new data would still require a non-sparse dual representation in terms of the unlabeled data. However, we show that by optimizing weights of the unlabeled data, the solution of the ℓ2 problem converges to the solution of an ε-insensitive problem, ensuring that we subsequently obtain sparsity in the unlabeled data. Furthermore, we extend the generalization analysis of Szedmak and Shawe-Taylor (2007) to this case, giving computable expressions for the corresponding empirical Rademacher complexity.

To show the application of Fenchel-Legendre conjugates, in Section 2 we propose a novel sparse semi-supervised learning approach: sparse multi-view SVMs, where the conjugate functions play a central role in reformulating the optimization problem. The dual optimization of a subroutine of the sparse multi-view SVMs is converted to a quadratic programming problem in Section 3, whose scale only depends on the number of labeled examples, indicating the advantages of using conjugate functions. The generalization error of the sparse multi-view SVMs is given in Section 4 in terms of Rademacher complexity theory, followed by a derivation of the empirical Rademacher complexity of the class of functions induced by this new method in Section 5. Section 6 reports experimental results of the sparse multi-view SVMs, comparisons with related methods, and the possibility and potential for large-scale applications through sequential training. Extensions of the use of conjugate functions to a general convex loss and other related semi-supervised learning approaches are discussed in Section 7. Finally, Section 8 concludes this paper.

2. Sparse Multi-view SVMs

Multi-view semi-supervised learning, an important branch of semi-supervised learning, combines different sets of properties of an example to learn a target function. These different sets of properties are often referred to as views. Typical applications of multi-view learning are web-page categorization and content-based multimedia information retrieval. In web-page categorization, each web page can be simultaneously described by disparate properties such as the main text and inbound and outbound hyperlinks. In content-based multimedia information retrieval, a multimedia segment can include both audio and video components. For such scenarios, learning with multiple views is usually very beneficial. Even for problems with no natural multiple views, artificially generated views can still work favorably (Nigam and Ghani, 2000).

A useful assumption for multi-view learning is that features from each view are sufficient to train a good learner (Blum and Mitchell, 1998; Balcan et al., 2005; Farquhar et al., 2006). Making good use of this assumption through collaborative training or regularization between views can remove many false hypotheses from the hypothesis space, and thus facilitates effective learning.

For multi-view learning, an input x consists of multiple components from different views, for example, x = (x^1,...,x^m) for an m-view representation. A function f_j defined on view j only depends on x^j, that is, f_j(x) := f_j(x^j). Suppose we have a set of ℓ labeled examples {(x_i, y_i)}_{i=1}^{ℓ} with y_i ∈ {+1, −1}, and a set of u unlabeled examples {x_i}_{i=ℓ+1}^{ℓ+u}. The objective function of our sparse multi-view SVMs in the case of two views is given as follows, and can be readily extended to more than two views.

min_{f_1∈H_1, f_2∈H_2} (1/(2ℓ)) Σ_{i=1}^{ℓ} [(1 − y_i f_1(x_i))_+ + (1 − y_i f_2(x_i))_+] + γ_n (‖f_1‖² + ‖f_2‖²) + γ_v Σ_{i=1}^{ℓ+u} (|f_1(x_i) − f_2(x_i)| − ε)²_+,   (1)

where the nonnegative scalars γ_n, γ_v are respectively the norm regularization and multi-view regularization coefficients, and the last term is an ε-insensitive loss between the two views, with the function (·)_+ := max(0, ·) being the hinge loss. The final classifier for predicting the label of a new example is

f_c(x) = sgn( (f_1(x) + f_2(x)) / 2 ).   (2)

In the rest of this section, we will show that the use of the ε-insensitive loss indeed enforces sparsity, and that Fenchel-Legendre conjugates can be adopted to reformulate the optimization problem. We also show the saddle-point properties of the optimal solutions and give a (globally optimal) iterative optimization algorithm.
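The following NumPy sketch spells out the two loss ingredients of objective (1) and the combined classifier (2) on given prediction vectors. It is only an illustration; the function names and the array-based interface are ours, not part of the paper.

```python
import numpy as np

def labeled_hinge_term(y, f_vals):
    """Average hinge loss (1/l) * sum_i (1 - y_i f(x_i))_+ on the labeled points."""
    return np.mean(np.maximum(1.0 - y * f_vals, 0.0))

def multiview_eps_term(f1_vals, f2_vals, eps):
    """Squared eps-insensitive disagreement sum_i (|f1(x_i) - f2(x_i)| - eps)_+^2
    over all labeled and unlabeled points."""
    gap = np.maximum(np.abs(f1_vals - f2_vals) - eps, 0.0)
    return np.sum(gap ** 2)

def combined_predict(f1_vals, f2_vals):
    """Final classifier (2): the sign of the averaged view predictions."""
    return np.sign(0.5 * (f1_vals + f2_vals))
```

Note that objective (1) weighs the two per-view hinge terms by 1/(2ℓ), that is, one half of the sum of the two per-view averages computed above.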

2.1 Sparsity

In order to show the role of the ε-insensitive loss for sparsity pursuit, here we represent f_1(x) and f_2(x) in feature spaces as f_1(x) = w_1^⊤ φ_1(x) + b_1, f_2(x) = w_2^⊤ φ_2(x) + b_2, where φ_i(x) (i = 1, 2) is the image of x in the corresponding feature space. Problem (1) can be rewritten as

min_{w_1,w_2,ξ_1,ξ_2,b_1,b_2} P_0 = (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i) + γ_n (‖w_1‖² + ‖w_2‖²) + γ_v Σ_{i=1}^{ℓ+u} (|w_1^⊤ φ_1(x_i) + b_1 − w_2^⊤ φ_2(x_i) − b_2| − ε)²_+
s.t. y_i (w_1^⊤ φ_1(x_i) + b_1) ≥ 1 − ξ_1^i,  y_i (w_2^⊤ φ_2(x_i) + b_2) ≥ 1 − ξ_2^i,  ξ_1^i, ξ_2^i ≥ 0,  i = 1,...,ℓ,

where ξ_1 := [ξ_1^1,...,ξ_1^ℓ]^⊤ and ξ_2 := [ξ_2^1,...,ξ_2^ℓ]^⊤. The Lagrangian is

L = P_0 − Σ_{i=1}^{ℓ} [λ_1^i (y_i (w_1^⊤ φ_1(x_i) + b_1) − 1 + ξ_1^i) + λ_2^i (y_i (w_2^⊤ φ_2(x_i) + b_2) − 1 + ξ_2^i) + ν_1^i ξ_1^i + ν_2^i ξ_2^i],

where λ_1^i, λ_2^i, ν_1^i, ν_2^i ≥ 0 (i = 1,...,ℓ) are Lagrange multipliers. Suppose w_1^*, w_2^* are the optimal solutions. By the KKT conditions, the optimal solution w_1^* should satisfy ∂L/∂w_1 = 0. Therefore, we get

w_1^* = −(γ_v/γ_n) Σ_{i=1}^{ℓ+u} (|w_1^{*⊤} φ_1(x_i) + b_1 − w_2^{*⊤} φ_2(x_i) − b_2| − ε)_+ φ̃_1^i + (1/(2γ_n)) Σ_{i=1}^{ℓ} λ_1^i y_i φ_1(x_i),

where we suppose the derivative exists everywhere and φ̃_1^i := sgn{w_1^{*⊤} φ_1(x_i) + b_1 − w_2^{*⊤} φ_2(x_i) − b_2} φ_1(x_i). Now we can assess the sparsity of problem (1). From the above equation, w_1^* is a linear combination of the labeled examples with λ_1^i > 0 and of those unlabeled examples on which the difference of the predictions from the two views exceeds ε. In this sense, we can get sparse solutions by providing a non-zero ε. A similar analysis applies to w_2^*. Therefore, function f(x) is sparse in the number of used unlabeled examples. This analysis of sparsity is also well justified by the representer theorem.

2.2 Reformulation Using Conjugate Functions

Define t_i := [f_1(x_i) − f_2(x_i)]². Then the ε-insensitive loss term can be written as

f_ε(t) = Σ_{i=1}^{ℓ+u} (√t_i − ε)²_+,   (3)

where vector t := [t_1,...,t_{ℓ+u}]^⊤. We give a theorem affirming the convexity of function f_ε(t).

Theorem 2 Function f_ε(t) defined by (3) is convex.

Proof First, we show that f_ε(t_i) := (√t_i − ε)²_+ with the convex domain [0,+∞) is convex. When t_i ∈ (ε²,+∞), the second derivative ∂²f_ε(t_i)/∂t_i² = (ε/2) t_i^{−3/2} ≥ 0. Thus, function f_ε(t_i) is convex for t_i ∈ (ε²,+∞). Moreover, the value of function f_ε(t_i) for t_i ∈ (ε²,+∞) is larger than 0, which is the value of f_ε(t_i) for t_i ∈ [0,ε²], and function f_ε(t_i) with domain [0,+∞) is continuous at ε². Hence, f_ε(t_i) is convex on the domain [0,+∞). Then, being a nonnegative weighted sum of convex functions, f_ε(t) is indeed convex.

Define the conjugate vector z = [z_1,...,z_{ℓ+u}]^⊤ with entries being conjugate variables. The Fenchel-Legendre conjugate (which is also often called the convex conjugate or conjugate function) f_ε^*(z) is

f_ε^*(z) = sup_{t∈dom f_ε} (t^⊤ z − f_ε(t)) = sup_t Σ_{i=1}^{ℓ+u} [z_i t_i − (√t_i − ε)²_+] = Σ_{i=1}^{ℓ+u} sup_{t_i} [z_i t_i − (√t_i − ε)²_+].

The domain of the conjugate function consists of z ∈ R^{ℓ+u} for which the supremum is finite (i.e., bounded above) (Boyd and Vandenberghe, 2004). Define

f_ε^*(z_i) = sup_{t_i} [z_i t_i − (√t_i − ε)²_+].   (4)

Then, f_ε^*(z) = Σ_{i=1}^{ℓ+u} f_ε^*(z_i). As a pointwise supremum of a family of affine functions, f_ε^*(z_i) is convex. Being a nonnegative weighted sum of convex functions, f_ε^*(z) is also convex. Below we derive the explicit form of f_ε^*(z_i).

Theorem 3 Function f_ε^*(z_i) defined by (4) has the following form

f_ε^*(z_i) = { z_i ε² / (1 − z_i), for 0 < z_i < 1;  0, for z_i ≤ 0. }   (5)

Proof By definition, we have

f_ε^*(z_i) = max{ sup_{0 ≤ t_i ≤ ε²} z_i t_i,  sup_{t_i > ε²} [z_i t_i − (√t_i − ε)²] }.   (6)

The value of sup_{0 ≤ t_i ≤ ε²} z_i t_i is simple to characterize. We now characterize the second term sup_{t_i > ε²} [z_i t_i − (√t_i − ε)²] = sup_{t_i > ε²} (z_i t_i − t_i − ε² + 2ε√t_i). For 0 < z_i < 1, we set the first derivative equal to zero to find the supremum. For z_i ≤ 0 or z_i ≥ 1 the derivative has no zero, and thus we use function values at the end points to find the supremum. As a result, we have

sup_{t_i > ε²} [z_i t_i − (√t_i − ε)²] = { z_i ε² / (1 − z_i), for 0 < z_i < 1;  z_i ε², for z_i ≤ 0;  +∞, for z_i ≥ 1. }

According to (6), and further removing the range where f_ε^*(z_i) is unbounded above, we reach the conjugate given in (5).

Now the Fenchel-Legendre conjugate f_ε^*(z) can be represented by Σ_{i=1}^{ℓ+u} f_ε^*(z_i), which is also well justified by the following theorem.

Theorem 4 (Boyd and Vandenberghe, 2004) If ϕ(u,v) = ϕ_1(u) + ϕ_2(v), where ϕ_1 and ϕ_2 are independent convex functions (independent means they are functions of different variables) with conjugates ϕ_1^* and ϕ_2^*, respectively, then ϕ^*(ω,z) = ϕ_1^*(ω) + ϕ_2^*(z).

A nice property of the conjugate function concerns the conjugate of the conjugate, which is central to the reformulation of our optimization problem. This property is stated by Lemma 5.

Lemma 5 (Rifkin and Lippert, 2007) If function f is closed, convex, and proper, then the conjugate function of its conjugate is f itself, that is, f^{**} = f, where function f is said to be closed if its epigraph is closed, and proper if dom f ≠ ∅ and f > −∞.

It is true that function f_ε(t) is closed and proper. Moreover, we have proved the convexity of f_ε(t) in Theorem 2. Therefore, we can use Lemma 5 to get the following equality

f_ε(t) = sup_z (z^⊤ t − f_ε^*(z)).   (7)

That is,

Σ_{i=1}^{ℓ+u} (√t_i − ε)²_+ = sup_z (z^⊤ t − f_ε^*(z)) = sup_z Σ_{i=1}^{ℓ+u} [z_i t_i − f_ε^*(z_i)].

By (7), we have

Σ_{i=1}^{ℓ+u} (|f_1(x_i) − f_2(x_i)| − ε)²_+ = Σ_{i=1}^{ℓ+u} (√t_i − ε)²_+ = sup_z Σ_{i=1}^{ℓ+u} [z_i t_i − f_ε^*(z_i)].
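Because (4) and (5) are the hinge of the whole reformulation, it is easy to sanity-check the closed form numerically. The sketch below compares the conjugate of Theorem 3 with a brute-force evaluation of the supremum in (4) on a grid; it is an illustration only, and the grid range is an arbitrary choice of ours.

```python
import numpy as np

def conjugate_closed_form(z_i, eps):
    """f_eps^*(z_i) from (5): z_i * eps^2 / (1 - z_i) for 0 < z_i < 1, 0 for z_i <= 0."""
    if z_i >= 1.0:
        return np.inf          # unbounded above, outside the conjugate's domain
    return z_i * eps ** 2 / (1.0 - z_i) if z_i > 0.0 else 0.0

def conjugate_by_grid(z_i, eps, t_max=100.0, n=200001):
    """Brute-force sup_{t_i >= 0} [z_i t_i - (sqrt(t_i) - eps)_+^2] from (4)."""
    t = np.linspace(0.0, t_max, n)
    vals = z_i * t - np.maximum(np.sqrt(t) - eps, 0.0) ** 2
    return vals.max()

if __name__ == "__main__":
    eps = 0.1
    for z_i in (-0.5, 0.0, 0.3, 0.7, 0.95):
        print(z_i, conjugate_closed_form(z_i, eps), conjugate_by_grid(z_i, eps))
```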

Therefore, the objective function for sparse multi-view SVMs becomes

min_{f_1∈H_1, f_2∈H_2} (1/(2ℓ)) Σ_{i=1}^{ℓ} [(1 − y_i f_1(x_i))_+ + (1 − y_i f_2(x_i))_+] + γ_n (‖f_1‖² + ‖f_2‖²) + γ_v sup_z Σ_{i=1}^{ℓ+u} {z_i [f_1(x_i) − f_2(x_i)]² − f_ε^*(z_i)}.   (8)

As an application of Theorem 1, the solution to problem (8) has the following form

f_1(x) = Σ_{i=1}^{ℓ+u} α_1^i k_1(x_i, x),  f_2(x) = Σ_{i=1}^{ℓ+u} α_2^i k_2(x_i, x).   (9)

Applying the reproducing properties of kernels, we get ‖f_1‖² = α_1^⊤ K_1 α_1 and ‖f_2‖² = α_2^⊤ K_2 α_2, where K_1 and K_2 are the (ℓ+u)×(ℓ+u) Gram matrices from the two views V_1 and V_2, respectively, and α_1 = (α_1^1,...,α_1^{ℓ+u})^⊤, α_2 = (α_2^1,...,α_2^{ℓ+u})^⊤. Moreover, we have f_1 = K_1 α_1, f_2 = K_2 α_2, with f_1 := (f_1(x_1),...,f_1(x_{ℓ+u}))^⊤ and f_2 := (f_2(x_1),...,f_2(x_{ℓ+u}))^⊤.

Define the diagonal matrix U = diag(z_1,...,z_{ℓ+u}) with every element taking values in the range [0,1). Problem (8) can be reformulated as

min_{α_1,α_2,ξ_1,ξ_2,b_1,b_2} sup_z (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i) + γ_n (α_1^⊤ K_1 α_1 + α_2^⊤ K_2 α_2) + γ_v [(K_1 α_1 − K_2 α_2)^⊤ U (K_1 α_1 − K_2 α_2) − Σ_{i=1}^{ℓ+u} f_ε^*(z_i)]
s.t. y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) ≥ 1 − ξ_1^i,
y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) ≥ 1 − ξ_2^i,
ξ_1^i, ξ_2^i ≥ 0, i = 1,...,ℓ.   (10)

2.3 Saddle-Point Property

We present a theorem concerning the convexity and concavity of optimization problem (10).

Theorem 6 The objective function in problem (10) is convex with respect to α_1, α_2, ξ_1, ξ_2, b_1, and b_2, and concave with respect to z.

Proof First, we show the convexity. The standard form of this optimization problem is

min_{α_1,α_2,ξ_1,ξ_2,b_1,b_2} sup_z (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i) + γ_n (α_1^⊤ K_1 α_1 + α_2^⊤ K_2 α_2) + γ_v [(K_1 α_1 − K_2 α_2)^⊤ U (K_1 α_1 − K_2 α_2) − Σ_{i=1}^{ℓ+u} f_ε^*(z_i)]
s.t. −y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) + 1 − ξ_1^i ≤ 0,
−y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) + 1 − ξ_2^i ≤ 0,
−ξ_1^i ≤ 0, −ξ_2^i ≤ 0, i = 1,...,ℓ.

This problem involves one objective function and three sets of inequality constraint functions (on the left-hand side of each inequality). Clearly, the domain of each objective and constraint function is a convex set. It now suffices to prove the convexity of this problem by assessing the convexity of these functions. As all constraint functions are affine, they are convex. Then, we use the second-order condition, namely positive semidefiniteness of a function's Hessian or second derivative, to judge the convexity of the objective function (Boyd and Vandenberghe, 2004). According to this condition, the first two terms of the objective function are clearly convex. The third part can be rewritten as

(K_1 α_1 − K_2 α_2)^⊤ U (K_1 α_1 − K_2 α_2) = ‖ U^{1/2} (K_1  −K_2) (α_1; α_2) ‖²,

which is the convex function ‖·‖² composed with an affine mapping and thus also convex (Boyd and Vandenberghe, 2004). Being a nonnegative weighted sum of convex functions, the objective function is therefore convex with respect to α_1, α_2, ξ_1, ξ_2, b_1, b_2.

Then, we show the concavity using (8). As f_ε^*(z_i) is convex, z_i [f_1(x_i) − f_2(x_i)]² − f_ε^*(z_i) is concave with respect to z_i. The concavity of Σ_{i=1}^{ℓ+u} {z_i [f_1(x_i) − f_2(x_i)]² − f_ε^*(z_i)} follows from the fact that a nonnegative weighted sum of concave functions is concave (Boyd and Vandenberghe, 2004). Hence, the objective function in problem (10) is concave with respect to z.

Let θ denote the parameters α_1, α_2, ξ_1, ξ_2, b_1, b_2. We can simply denote the above optimization problem as

inf_θ sup_z f(θ, z)   (11)

associated with constraints on the labeled examples, where f(θ, z) is convex with respect to θ and concave with respect to z. We give the following theorem on the equivalence of swapping the infimum and supremum for our optimization problem, and include a proof for completeness.

Theorem 7 (Boyd and Vandenberghe, 2004) If f(θ, z) with domains Θ and Z is convex with respect to θ ∈ Θ and concave with respect to z ∈ Z, the following equality holds under some slight assumptions:

inf_θ sup_z f(θ, z) = sup_z inf_θ f(θ, z).

Proof The idea is first to represent the left-hand side as the value of a convex function, and then to show that the conjugate of its conjugate equals the right-hand side when the same input value is plugged in (Boyd and Vandenberghe, 2004). The left-hand side can be expressed as p(0), where

p(u) = inf_θ sup_z [f(θ, z) + u^⊤ z].

It is not difficult to show that p is a convex function. Being a pointwise supremum of the convex functions f(θ, z) + u^⊤ z, sup_z [f(θ, z) + u^⊤ z] is a convex function of (θ, u). Because sup_z [f(θ, z) + u^⊤ z] is convex with respect to (θ, u), and partial minimization over θ preserves convexity, p(u) is convex.

The Fenchel-Legendre conjugate of p(u) is

p^*(v) = sup_u [u^⊤ v − inf_θ sup_z (f(θ, z) + u^⊤ z)],

which would be +∞ if v ∉ Z. Therefore,

p^*(v) = { −inf_θ f(θ, v), for v ∈ Z;  +∞, otherwise. }

The conjugate of p^*(z) is given by

p^{**}(u) = sup_{z∈Z} (u^⊤ z − p^*(z)) = sup_{z∈Z} (u^⊤ z + inf_θ f(θ, z)) = sup_z inf_θ [f(θ, z) + u^⊤ z].

Suppose 0 ∈ dom p(u) and p(u) is closed and proper. Then by Lemma 5 we have p(0) = p^{**}(0), which completes the proof.

Now we give a theorem showing that the optimal pair (θ̃, z̃) is a saddle point.

Theorem 8 If the following equality holds for function f(θ, z),

inf_θ sup_z f(θ, z) = sup_z inf_θ f(θ, z) = f(θ̃, z̃),

then the optimal pair (θ̃, z̃) forms a saddle point.

Proof From the given equality, we have

inf_θ sup_z f(θ, z) = sup_z f(θ̃, z) = f(θ̃, z̃)

and

sup_z inf_θ f(θ, z) = inf_θ f(θ, z̃) = f(θ̃, z̃).

Therefore,

f(θ̃, z) ≤ f(θ̃, z̃) ≤ f(θ, z̃),

which indeed satisfies the definition of a saddle point. The proof is completed.

2.4 Iterative Optimization Algorithm

To solve the optimization problem sup_z inf_θ f(θ, z), which is respectively concave and convex with respect to z and θ, we give an algorithm whose convergence is guaranteed by the following theorem.

Theorem 9 Given an initial value z_0 for z, solve inf_θ f(θ, z_0) and obtain the globally optimal point θ_0. Then find argmax_z f(θ_0, z) to get z_1, from which we can get θ_1 as a result of optimizing inf_θ f(θ, z_1). Repeat this process until a convergence point (θ̂, ẑ) is reached. Suppose (θ̃, z̃) is a saddle point. We have f(θ̂, ẑ) = f(θ̃, z̃). That is, we obtain the optimal value of the objective function. If f is strictly concave and strictly convex with respect to the corresponding variables, we further have θ̂ = θ̃ and ẑ = z̃.

Proof According to the properties of function f and the algorithm procedure, we know that the convergence point is a saddle point. Thus, we have

f(θ̃, ẑ) ≥ f(θ̂, ẑ) ≥ f(θ̂, z̃).

By the saddle-point property of (θ̃, z̃), we have f(θ̃, z̃) ≥ f(θ̃, ẑ) and f(θ̃, z̃) ≤ f(θ̂, z̃). Therefore, the above inequalities must hold with equality, and we have f(θ̃, z̃) = f(θ̂, ẑ). Furthermore, if f is strictly concave and strictly convex with respect to the corresponding variables, it is true that θ̂ = θ̃ and ẑ = z̃.

On solving argmax_z f(θ, z) required in Theorem 9, we can maximize the term related to z, namely z^⊤ t − f_ε^*(z) = Σ_{i=1}^{ℓ+u} [z_i t_i − f_ε^*(z_i)]. For this purpose, we have the following theorem.

Theorem 10

sup_{z_i ∈ dom f_ε^*(z_i)} [z_i t_i − f_ε^*(z_i)] = (√t_i − ε)²_+,

and

arg sup_{z_i ∈ dom f_ε^*(z_i)} [z_i t_i − f_ε^*(z_i)] = { 1 − ε/√t_i, for t_i > ε²;  0, for 0 ≤ t_i ≤ ε². }

Without loss of generality, we can confine the range of z_i to [0,1).

Proof We have

sup_{z_i ∈ dom f_ε^*(z_i)} [z_i t_i − f_ε^*(z_i)] = max{ sup_{0 < z_i < 1} [z_i t_i − z_i ε²/(1 − z_i)],  sup_{z_i ≤ 0} z_i t_i }.

The first supremum can be found by setting the derivative with respect to z_i to zero. We have

sup_{0 < z_i < 1} [z_i t_i − z_i ε²/(1 − z_i)] = (√t_i − ε)²

where t_i > ε², and the supremum is attained at z_i = 1 − ε/√t_i. When t_i < 0, sup_{z_i ≤ 0} z_i t_i is unbounded above. When 0 ≤ t_i ≤ ε², sup_{z_i ≤ 0} z_i t_i = 0, with the supremum attained at z_i = 0. Therefore, max{ sup_{0 < z_i < 1} [z_i t_i − z_i ε²/(1 − z_i)], sup_{z_i ≤ 0} z_i t_i } = (√t_i − ε)²_+, with the supremum attained at some z_i ∈ [0,1), which completes the proof.

For sparsity pursuit, during each iteration we remove those unlabeled examples whose corresponding z_i's are zero. By the representer theorem, this does not influence the value of the objective function. For Theorem 9, this means that the elements of z whose values are zero in the last iteration will remain zero in the next iteration. When there are no unlabeled examples eligible for elimination, the iteration terminates and the convergence point (θ̂, ẑ) is reached.
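The z-update of Theorem 10 and the pruning rule just described have a direct vectorized form. The sketch below is our own illustration of that one step; the array layout, with the ℓ labeled points first, is an assumption of the sketch.

```python
import numpy as np

def update_z(f1_vals, f2_vals, eps):
    """Maximizer from Theorem 10: with t_i = (f1(x_i) - f2(x_i))^2,
    z_i = 1 - eps/sqrt(t_i) when t_i > eps^2 and z_i = 0 otherwise."""
    t = (f1_vals - f2_vals) ** 2
    z = np.zeros_like(t)
    mask = t > eps ** 2
    z[mask] = 1.0 - eps / np.sqrt(t[mask])
    return z

def keep_indices(z, n_labeled):
    """Indices retained after pruning: all labeled points plus the unlabeled
    points whose z_i is nonzero (those with z_i = 0 are removed)."""
    kept_unlabeled = np.nonzero(z[n_labeled:] > 0.0)[0] + n_labeled
    return np.concatenate([np.arange(n_labeled), kept_unlabeled])
```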

3. Dual Optimization

According to the iterative optimization algorithm, when optimizing problem (10) we start from an initial value z_0 and then solve for θ_0. In this section, we show how to solve this subroutine with fixed z. The optimization problem is now equivalent to

min_{α_1,α_2,ξ_1,ξ_2,b_1,b_2} F_0 = (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i) + γ_n (α_1^⊤ K_1 α_1 + α_2^⊤ K_2 α_2) + γ_v (K_1 α_1 − K_2 α_2)^⊤ U (K_1 α_1 − K_2 α_2)
s.t. y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) ≥ 1 − ξ_1^i,
y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) ≥ 1 − ξ_2^i,
ξ_1^i, ξ_2^i ≥ 0, i = 1,...,ℓ.   (12)

3.1 Lagrange Dual Function

We will solve problem (12) through its dual problem, which is simpler to solve. We now derive its Lagrange dual function. Let λ_1^i, λ_2^i, ν_1^i, ν_2^i ≥ 0 (i = 1,...,ℓ) be the Lagrange multipliers associated with the inequality constraints. Define λ_j = [λ_j^1,...,λ_j^ℓ]^⊤ and ν_j = [ν_j^1,...,ν_j^ℓ]^⊤ (j = 1,2). The Lagrangian L(α_1,α_2,ξ_1,ξ_2,b_1,b_2,λ_1,λ_2,ν_1,ν_2) can be written as

L = F_0 − Σ_{i=1}^{ℓ} [λ_1^i (y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) − 1 + ξ_1^i) + λ_2^i (y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) − 1 + ξ_2^i) + ν_1^i ξ_1^i + ν_2^i ξ_2^i].

Note that

(K_1 α_1 − K_2 α_2)^⊤ U (K_1 α_1 − K_2 α_2) = α_1^⊤ K_1 U K_1 α_1 − 2 α_1^⊤ K_1 U K_2 α_2 + α_2^⊤ K_2 U K_2 α_2.

To obtain the Lagrange dual function, L has to be minimized with respect to the primal variables α_1, α_2, ξ_1, ξ_2, b_1, b_2. To eliminate these variables, we compute the corresponding partial derivatives and set them to zero, obtaining the following conditions

2 J_1 α_1 − 2 γ_v K_1 U K_2 α_2 = Λ_1,   (13)
2 J_2 α_2 − 2 γ_v K_2 U K_1 α_1 = Λ_2,   (14)
λ_1^i + ν_1^i = 1/(2ℓ),   (15)
λ_2^i + ν_2^i = 1/(2ℓ),   (16)
Σ_{i=1}^{ℓ} λ_1^i y_i = 0,  Σ_{i=1}^{ℓ} λ_2^i y_i = 0,   (17)

where

J_1 := γ_n K_1 + γ_v K_1 U K_1,  J_2 := γ_n K_2 + γ_v K_2 U K_2,
Λ_1 := Σ_{i=1}^{ℓ} λ_1^i y_i K_1(:,i),  Λ_2 := Σ_{i=1}^{ℓ} λ_2^i y_i K_2(:,i),

with K_1(:,i) and K_2(:,i) being the ith columns of the corresponding Gram matrices. Substituting (13)–(17) into L results in the following expression for the Lagrange dual function g_L(λ_1,λ_2,ν_1,ν_2):

g_L = γ_n (α_1^⊤ K_1 α_1 + α_2^⊤ K_2 α_2) + γ_v (α_1^⊤ K_1 U K_1 α_1 − 2 α_1^⊤ K_1 U K_2 α_2 + α_2^⊤ K_2 U K_2 α_2) − α_1^⊤ Λ_1 − α_2^⊤ Λ_2 + Σ_{i=1}^{ℓ} (λ_1^i + λ_2^i)
= (1/2) α_1^⊤ Λ_1 + (1/2) α_2^⊤ Λ_2 − α_1^⊤ Λ_1 − α_2^⊤ Λ_2 + Σ_{i=1}^{ℓ} (λ_1^i + λ_2^i)
= −(1/2) α_1^⊤ Λ_1 − (1/2) α_2^⊤ Λ_2 + Σ_{i=1}^{ℓ} (λ_1^i + λ_2^i).   (18)

We obtain the following from (13) and (14):

α_1 = (1/2) J_1^{−1} (Λ_1 + 2 γ_v K_1 U K_2 α_2),   (19)
α_2 = (1/2) J_2^{−1} (Λ_2 + 2 γ_v K_2 U K_1 α_1).   (20)

From (13) and (20), we have

(2 J_1 − 2 γ_v² K_1 U K_2 J_2^{−1} K_2 U K_1) α_1 = Λ_1 + γ_v K_1 U K_2 J_2^{−1} Λ_2.

Define M_1 = 2 J_1 − 2 γ_v² K_1 U K_2 J_2^{−1} K_2 U K_1. Suppose the above linear system is well-posed (if ill-posed, we can employ approximate numerical analysis techniques). We get

α_1 = M_1^{−1} (Λ_1 + γ_v K_1 U K_2 J_2^{−1} Λ_2).

From (14) and (19), we have

(2 J_2 − 2 γ_v² K_2 U K_1 J_1^{−1} K_1 U K_2) α_2 = Λ_2 + γ_v K_2 U K_1 J_1^{−1} Λ_1.

Define M_2 = 2 J_2 − 2 γ_v² K_2 U K_1 J_1^{−1} K_1 U K_2. Thus we get

α_2 = M_2^{−1} (Λ_2 + γ_v K_2 U K_1 J_1^{−1} Λ_1).

Now, with α_1 and α_2 substituted into (18), the Lagrange dual function g_L(λ_1,λ_2,ν_1,ν_2) is

g_L = inf_{α_1,α_2,ξ_1,ξ_2,b_1,b_2} L = −(1/2) α_1^⊤ Λ_1 − (1/2) α_2^⊤ Λ_2 + Σ_{i=1}^{ℓ} (λ_1^i + λ_2^i)
= −(1/2) (Λ_1 + γ_v K_1 U K_2 J_2^{−1} Λ_2)^⊤ M_1^{−1} Λ_1 − (1/2) (Λ_2 + γ_v K_2 U K_1 J_1^{−1} Λ_1)^⊤ M_2^{−1} Λ_2 + Σ_{i=1}^{ℓ} (λ_1^i + λ_2^i).

3.2 Solving the Dual Problem

The Lagrange dual problem is given by

max_{λ_1,λ_2} g_L
s.t. 0 ≤ λ_1^i ≤ 1/(2ℓ), 0 ≤ λ_2^i ≤ 1/(2ℓ), i = 1,...,ℓ,
Σ_{i=1}^{ℓ} λ_1^i y_i = 0, Σ_{i=1}^{ℓ} λ_2^i y_i = 0.

As Lagrange dual functions are always concave (Boyd and Vandenberghe, 2004), we can formulate the above problem as the convex optimization problem

min_{λ_1,λ_2} −g_L
s.t. 0 ≤ λ_1^i ≤ 1/(2ℓ), 0 ≤ λ_2^i ≤ 1/(2ℓ), i = 1,...,ℓ,
Σ_{i=1}^{ℓ} λ_1^i y_i = 0, Σ_{i=1}^{ℓ} λ_2^i y_i = 0.   (21)

Define the matrix Y = diag(y_1,...,y_ℓ). Then Λ_1 = K_{1ℓ} Y λ_1 and Λ_2 = K_{2ℓ} Y λ_2 with K_{1ℓ} = K_1(:,1:ℓ) and K_{2ℓ} = K_2(:,1:ℓ). We have

−g_L = (1/2) (Λ_1 + γ_v K_1 U K_2 J_2^{−1} Λ_2)^⊤ M_1^{−1} Λ_1 + (1/2) (Λ_2 + γ_v K_2 U K_1 J_1^{−1} Λ_1)^⊤ M_2^{−1} Λ_2 − Σ_{i=1}^{ℓ} (λ_1^i + λ_2^i)
= (1/2) (λ_1^⊤ λ_2^⊤) ( A  B ; C  D ) (λ_1; λ_2) − 1^⊤ (λ_1 + λ_2),

where

A := Y K_{1ℓ}^⊤ M_1^{−1} K_{1ℓ} Y,
B := γ_v Y K_{1ℓ}^⊤ J_1^{−1} K_1 U K_2 M_2^{−1} K_{2ℓ} Y,
C := γ_v Y K_{2ℓ}^⊤ J_2^{−1} K_2 U K_1 M_1^{−1} K_{1ℓ} Y,
D := Y K_{2ℓ}^⊤ M_2^{−1} K_{2ℓ} Y,

and 1 = (1,...,1)^⊤ is the ℓ×1 all-ones vector. Substituting M_1 and M_2 into the expressions for B and C, we can prove that B = C^⊤. In addition, because of the convexity of −g_L, we affirm that the matrix ( A  B ; C  D ) is positive semi-definite. Hence, the optimization problem in (21) can be rewritten as

min_{λ_1,λ_2} (1/2) (λ_1^⊤ λ_2^⊤) ( A  B ; C  D ) (λ_1; λ_2) − 1^⊤ (λ_1 + λ_2)
s.t. 0 ≤ λ_1 ≤ 1/(2ℓ), 0 ≤ λ_2 ≤ 1/(2ℓ),
λ_1^⊤ y = 0, λ_2^⊤ y = 0,

where y = (y_1,...,y_ℓ)^⊤. After solving this problem using standard software, we obtain ν_1^i and ν_2^i by (15) and (16). We now state the advantages of optimizing this dual problem over optimizing the primal problem (12): it has fewer optimization variables, since for typical semi-supervised learning u ≫ ℓ, and it has simpler constraint functions.

The bias terms b_1 and b_2 can be obtained through support vectors. Due to the KKT conditions, the following equalities hold

λ_1^i (y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) − 1 + ξ_1^i) = 0,
λ_2^i (y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) − 1 + ξ_2^i) = 0,
ν_1^i ξ_1^i = 0,  ν_2^i ξ_2^i = 0,  i = 1,...,ℓ.

For support vectors x_i, we have ν_j^i > 0 (and thus ξ_j^i = 0) and λ_j^i > 0 (j = 1,2). Therefore, we can resolve the bias terms by averaging y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) − 1 = 0 and y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) − 1 = 0 over all support vectors.
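Once the QP in (21) has been solved for λ_1 and λ_2 (by any off-the-shelf QP solver), the expansion coefficients follow from the linear systems of Section 3.1. The sketch below is a plain NumPy illustration of that recovery step; it assumes J_1, J_2, M_1, M_2 are well conditioned and that λ_1, λ_2 are supplied by the solver.

```python
import numpy as np

def recover_alphas(K1, K2, U, y, lam1, lam2, gamma_n, gamma_v):
    """Recover alpha_1, alpha_2 from the dual variables via (19)-(20) and the
    M_1, M_2 systems of Section 3.1 (a sketch; assumes well-posed systems)."""
    ell = len(y)
    Lam1 = K1[:, :ell] @ (y * lam1)          # Lambda_1 = sum_i lam1_i y_i K1(:, i)
    Lam2 = K2[:, :ell] @ (y * lam2)
    J1 = gamma_n * K1 + gamma_v * K1 @ U @ K1
    J2 = gamma_n * K2 + gamma_v * K2 @ U @ K2
    C12 = K1 @ U @ K2                        # K1 U K2; its transpose equals K2 U K1
    M1 = 2.0 * J1 - 2.0 * gamma_v ** 2 * C12 @ np.linalg.solve(J2, C12.T)
    M2 = 2.0 * J2 - 2.0 * gamma_v ** 2 * C12.T @ np.linalg.solve(J1, C12)
    alpha1 = np.linalg.solve(M1, Lam1 + gamma_v * C12 @ np.linalg.solve(J2, Lam2))
    alpha2 = np.linalg.solve(M2, Lam2 + gamma_v * C12.T @ np.linalg.solve(J1, Lam1))
    return alpha1, alpha2
```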

3.3 Advantages of Using Conjugate Functions

In this subsection, we show that the direct optimization of problem (1) without the use of conjugate functions is large-scale and time-consuming, which justifies the advantages of using conjugate functions. The primal problem can be rewritten as

min_{α_1,α_2,ξ_1,ξ_2,b_1,b_2,δ} D_0 = (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i) + γ_n (α_1^⊤ K_1 α_1 + α_2^⊤ K_2 α_2) + γ_v Σ_{i=1}^{ℓ+u} δ_i²
s.t. y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) ≥ 1 − ξ_1^i, i = 1,...,ℓ,
y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) ≥ 1 − ξ_2^i, i = 1,...,ℓ,
ξ_1^i, ξ_2^i ≥ 0, i = 1,...,ℓ,
(Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) − (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) ≥ −δ_i − ε, i = 1,...,ℓ+u,
(Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) − (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) ≤ δ_i + ε, i = 1,...,ℓ+u,   (22)

where y_i ∈ {+1,−1} and γ_n, γ_v ≥ 0. We will solve problem (22) through its dual problem, which can be simpler to solve. Suppose λ_1^i, λ_2^i, ν_1^i, ν_2^i ≥ 0 (i = 1,...,ℓ) and μ_1^i, μ_2^i ≥ 0 (i = 1,...,ℓ+u) are the Lagrange multipliers associated with the inequality constraints of problem (22). Define δ = [δ_1,...,δ_{ℓ+u}]^⊤, λ_j = [λ_j^1,...,λ_j^ℓ]^⊤, ν_j = [ν_j^1,...,ν_j^ℓ]^⊤, and μ_j = [μ_j^1,...,μ_j^{ℓ+u}]^⊤ (j = 1,2). The Lagrangian L(α_1,α_2,ξ_1,ξ_2,b_1,b_2,δ,λ_1,λ_2,ν_1,ν_2,μ_1,μ_2) can be written as

L = D_0 − Σ_{i=1}^{ℓ} [λ_1^i (y_i (Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1) − 1 + ξ_1^i) + λ_2^i (y_i (Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) + b_2) − 1 + ξ_2^i) + ν_1^i ξ_1^i + ν_2^i ξ_2^i]
− Σ_{i=1}^{ℓ+u} μ_1^i [Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1 − Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) − b_2 + δ_i + ε]
+ Σ_{i=1}^{ℓ+u} μ_2^i [Σ_{j=1}^{ℓ+u} α_1^j k_1(x_j, x_i) + b_1 − Σ_{j=1}^{ℓ+u} α_2^j k_2(x_j, x_i) − b_2 − δ_i − ε].

To obtain the Lagrange dual function, L has to be minimized with respect to the primal variables α_1, α_2, ξ_1, ξ_2, b_1, b_2, δ. To eliminate these variables, we compute the corresponding partial derivatives and set them to zero, obtaining the following conditions

2 γ_n K_1 α_1 = Σ_{i=1}^{ℓ} λ_1^i y_i K_1(:,i) + Σ_{i=1}^{ℓ+u} (μ_1^i − μ_2^i) K_1(:,i),
2 γ_n K_2 α_2 = Σ_{i=1}^{ℓ} λ_2^i y_i K_2(:,i) − Σ_{i=1}^{ℓ+u} (μ_1^i − μ_2^i) K_2(:,i),
λ_1^i + ν_1^i = 1/(2ℓ), i = 1,...,ℓ,
λ_2^i + ν_2^i = 1/(2ℓ), i = 1,...,ℓ,
Σ_{i=1}^{ℓ} λ_1^i y_i + Σ_{i=1}^{ℓ+u} (μ_1^i − μ_2^i) = 0,
Σ_{i=1}^{ℓ} λ_2^i y_i − Σ_{i=1}^{ℓ+u} (μ_1^i − μ_2^i) = 0,
2 γ_v δ_i − μ_1^i − μ_2^i = 0, i = 1,...,ℓ+u.

Substituting these equations into the Lagrangian, as was done in Section 3.1, it is clear that L finally becomes a quadratic function involving λ_1, λ_2, μ_1, μ_2. The dual optimization problem would therefore be a quadratic program involving 2ℓ + 2(ℓ+u) parameters. We now see that this direct optimization is indeed large-scale and time-consuming.

4. Generalization Error

In this section, we analyze the generalization performance of the sparse multi-view SVMs, making use of Rademacher complexity theory and the margin bound.

4.1 Rademacher Complexity Theory

Important background on Rademacher complexity theory (Bartlett and Mendelson, 2002; Shawe-Taylor and Cristianini, 2004) is introduced below.

Definition 11 For a sample S = {x_1,...,x_ℓ} generated by a distribution D on a set X and a real-valued function class F with domain X, the empirical Rademacher complexity of F is the random variable

R̂_ℓ(F) = E_σ [ sup_{f∈F} | (2/ℓ) Σ_{i=1}^{ℓ} σ_i f(x_i) |  |  x_1,...,x_ℓ ],

where σ = {σ_1,...,σ_ℓ} are independent uniform {±1}-valued (Rademacher) random variables. The Rademacher complexity of F is

R_ℓ(F) = E_S [R̂_ℓ(F)] = E_{Sσ} [ sup_{f∈F} | (2/ℓ) Σ_{i=1}^{ℓ} σ_i f(x_i) | ].

Lemma 12 Fix δ ∈ (0,1) and let F be a class of functions mapping from an input space X̃ (X̃ = X × Y or X̃ = X) to [0,1]. Let (x̃_i)_{i=1}^{ℓ} be drawn independently according to a probability distribution D. Then with probability at least 1 − δ over random draws of samples of size ℓ, every f ∈ F satisfies

E_D [f(x̃)] ≤ Ê [f(x̃)] + R_ℓ(F) + √(ln(2/δ)/(2ℓ)) ≤ Ê [f(x̃)] + R̂_ℓ(F) + 3 √(ln(2/δ)/(2ℓ)),

where Ê[f(x̃)] is the empirical average of f over the ℓ examples.

4.2 Margin Bound for Sparse Multi-view SVMs

By (2), the prediction function of the sparse multi-view SVMs is derived from the average of the predictions from the two views. Define the soft prediction function as g(x) = (1/2)(f_1(x) + f_2(x)). We obtain the following margin bound on the generalization error of sparse multi-view SVMs. This bound is widely applicable to multi-view SVMs; for example, Szedmak and Shawe-Taylor (2007) independently provided a similar bound for the SVM-2K method.

Theorem 13 Fix δ ∈ (0,1) and let F be the class of functions mapping from X̃ = X × Y to R given by f̃(x,y) = −y g(x), where g = (1/2)(f_1 + f_2) ∈ G and f̃ ∈ F. Let S = {(x_1,y_1),...,(x_ℓ,y_ℓ)} be drawn independently according to a probability distribution D. Then with probability at least 1 − δ over samples of size ℓ, every g ∈ G satisfies

P_D (y ≠ sgn(g(x))) ≤ (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i) + 2 R̂_ℓ(G) + 3 √(ln(2/δ)/(2ℓ)),

where ξ_1^i := (1 − y_i f_1(x_i))_+ and ξ_2^i := (1 − y_i f_2(x_i))_+. The quantities y_i f_1(x_i) and y_i f_2(x_i) are called margins.

Proof Let H(·) be the Heaviside function that returns 1 if its argument is greater than 0 and zero otherwise. We have

P_D (y ≠ sgn(g(x))) = E_D [H(−y g(x))].   (23)

Consider a loss function A: R → [0,1], given by

A(a) = { 1, if a ≥ 0;  1 + a, if −1 ≤ a ≤ 0;  0, otherwise. }

By Lemma 12, and since function A dominates H, we have (Shawe-Taylor and Cristianini, 2004)

E_D [H(f̃(x,y)) − 1] ≤ E_D [A(f̃(x,y)) − 1] ≤ Ê [A(f̃(x,y)) − 1] + R̂_ℓ((A − 1) ∘ F) + 3 √(ln(2/δ)/(2ℓ)).

Therefore,

E_D [H(f̃(x,y))] ≤ Ê [A(f̃(x,y))] + R̂_ℓ((A − 1) ∘ F) + 3 √(ln(2/δ)/(2ℓ)).

In addition, we have

Ê [A(f̃(x,y))] ≤ (1/ℓ) Σ_{i=1}^{ℓ} (1 − y_i g(x_i))_+
= (1/ℓ) Σ_{i=1}^{ℓ} (1 − (1/2) y_i f_1(x_i) − (1/2) y_i f_2(x_i))_+
≤ (1/(2ℓ)) Σ_{i=1}^{ℓ} [(1 − y_i f_1(x_i))_+ + (1 − y_i f_2(x_i))_+]
= (1/(2ℓ)) Σ_{i=1}^{ℓ} (ξ_1^i + ξ_2^i),

where ξ_1^i denotes the amount by which function f_1 fails to achieve margin 1 for (x_i, y_i), and ξ_2^i applies similarly to function f_2. Since (A − 1)(0) = 0, we can apply the Lipschitz condition (Bartlett and Mendelson, 2002) of function (A − 1) to get

R̂_ℓ((A − 1) ∘ F) ≤ 2 R̂_ℓ(F).

It remains to bound the empirical Rademacher complexity of the class F. With y_i ∈ {+1,−1}, we have

R̂_ℓ(F) = E_σ [ sup_{f̃∈F} | (2/ℓ) Σ_{i=1}^{ℓ} σ_i f̃(x_i, y_i) | ]
= E_σ [ sup_{g∈G} | (2/ℓ) Σ_{i=1}^{ℓ} σ_i y_i g(x_i) | ]
= E_σ [ sup_{g∈G} | (2/ℓ) Σ_{i=1}^{ℓ} σ_i g(x_i) | ] = R̂_ℓ(G).   (24)

Finally, combining (23)–(24) completes the proof.
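Evaluating the right-hand side of Theorem 13 only requires the labeled slacks, an estimate of R̂_ℓ(G), and the confidence level. A small sketch of that computation (our own helper function, with δ supplied by the user):

```python
import numpy as np

def margin_bound_rhs(xi1, xi2, rademacher_hat, delta=0.05):
    """Right-hand side of the bound in Theorem 13:
    (1/(2l)) sum_i (xi1_i + xi2_i) + 2 * hat{R}_l(G) + 3 sqrt(ln(2/delta) / (2l))."""
    ell = len(xi1)
    slack_term = float(np.sum(xi1) + np.sum(xi2)) / (2.0 * ell)
    confidence_term = 3.0 * np.sqrt(np.log(2.0 / delta) / (2.0 * ell))
    return slack_term + 2.0 * rademacher_hat + confidence_term
```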

5. Empirical Rademacher Complexity

Our optimization algorithm iteratively updates z to solve for θ. In this section, we first derive the empirical Rademacher complexity R̂_ℓ(G) for the function class induced after one iteration with an initially fixed z, and then give its formulation applicable for any number of subsequent iterations, including the termination case. This Rademacher complexity is crucial for Theorem 13 when analyzing the performance of the corresponding classifiers obtained by our iterative optimization algorithm. Specifically, for the empirical Rademacher complexity we give the following theorem.

Theorem 14 Suppose

S = (1/γ_n) (K_{1ℓ} K_1^{−1} K_{1ℓ}^⊤ + K_{2ℓ} K_2^{−1} K_{2ℓ}^⊤),
Θ = (1/γ_n) U_u^{1/2} (K_{1u} K_1^{−1} K_{1u}^⊤ + K_{2u} K_2^{−1} K_{2u}^⊤) U_u^{1/2},
J = (1/γ_n) U_u^{1/2} (K_{1u} K_1^{−1} K_{1ℓ}^⊤ − K_{2u} K_2^{−1} K_{2ℓ}^⊤),

where K_{1ℓ} and K_{2ℓ} are respectively the first ℓ rows of the Gram matrices K_1 and K_2, K_{1u} and K_{2u} are respectively the last u rows of K_1 and K_2, and U_u is the diagonal matrix including the last u diagonal elements (the initially fixed z_{ℓ+1},...,z_{ℓ+u}) of U. Then the empirical Rademacher complexity R̂_ℓ(G) is bounded as

U/(√2 ℓ) ≤ R̂_ℓ(G) ≤ U/ℓ,

where U² = tr(S) − γ_v tr(J^⊤ (I + γ_v Θ)^{−1} J) for the first iteration of sparse multi-view SVMs, and U² = tr(S) for subsequent iterations.

The remainder of this section before Section 5.4 completes the proof of this theorem, which was partially inspired by Rosenberg and Bartlett (2007) in their analysis of co-regularized least squares. We use problem (8) to reason about R̂_ℓ(G). As a result of the fixed z, we can remove f_ε^*(z_i) without loss of generality when resolving f_1 and f_2. It is true that the loss function L̂: H_1 × H_2 → [0,∞) with

L̂ := (1/(2ℓ)) Σ_{i=1}^{ℓ} [(1 − y_i f_1(x_i))_+ + (1 − y_i f_2(x_i))_+]

satisfies L̂(0,0) = 1. Let Q(f_1, f_2) denote the objective function in (8) with f_ε^*(z_i) removed. Substituting in the trivial predictors f_1 ≡ 0 and f_2 ≡ 0 gives the upper bound

min_{(f_1,f_2)∈H_1×H_2} Q(f_1, f_2) ≤ Q(0,0) = L̂(0,0) = 1.

Since all terms of Q(f_1, f_2) are nonnegative, we conclude that any (f_1, f_2) minimizing Q(f_1, f_2) is contained in

Ĥ = {(f_1, f_2) : γ_n (‖f_1‖² + ‖f_2‖²) + γ_v Σ_{i=ℓ+1}^{ℓ+u} z_i [f_1(x_i) − f_2(x_i)]² ≤ 1}.   (25)

Therefore, the final predictor is chosen from the function class

G = {x ↦ (1/2)[f_1(x) + f_2(x)] : (f_1, f_2) ∈ Ĥ}.

The complexity R̂_ℓ(G) is

R̂_ℓ(G) = E_σ [ sup_{(f_1,f_2)∈Ĥ} | (1/ℓ) Σ_{i=1}^{ℓ} σ_i (f_1(x_i) + f_2(x_i)) | ].   (26)

As it only depends on the values of the functions f_1(·) and f_2(·) on the ℓ labeled examples, by the reproducing kernel property, which says that the projection of a function f onto a closed subspace containing k(x,·) has the same value at x as f itself does (Rosenberg and Bartlett, 2007), we can restrict the function class Ĥ to the span of the labeled and unlabeled data and thus write it as

Ĥ = {(f_1, f_2) : γ_n (α_1^⊤ K_1 α_1 + α_2^⊤ K_2 α_2) + γ_v (K_{1u} α_1 − K_{2u} α_2)^⊤ U_u (K_{1u} α_1 − K_{2u} α_2) ≤ 1}
= {(f_1, f_2) : (α_1^⊤ α_2^⊤) N (α_1; α_2) ≤ 1},

where K_{1u} and K_{2u} are respectively the last u rows of K_1 and K_2, U_u is the diagonal matrix including the last u diagonal elements of U, and

N := γ_n ( K_1  0 ; 0  K_2 ) + γ_v ( K_{1u}^⊤ ; −K_{2u}^⊤ ) U_u ( K_{1u}  −K_{2u} ).   (27)

5.1 Evaluating the Supremum in Euclidean Space

Since (f_1, f_2) ∈ Ĥ implies (−f_1, −f_2) ∈ Ĥ, we can drop the absolute value in (26). Now we can write

R̂_ℓ(G) = (1/ℓ) E_σ sup_{α_1,α_2 ∈ R^{ℓ+u}} { σ^⊤ K_{1ℓ} α_1 + σ^⊤ K_{2ℓ} α_2 : (α_1^⊤ α_2^⊤) N (α_1; α_2) ≤ 1 }
= (1/ℓ) E_σ sup_{α_1,α_2 ∈ R^{ℓ+u}} { σ^⊤ (K_{1ℓ}  K_{2ℓ}) (α_1; α_2) : (α_1^⊤ α_2^⊤) N (α_1; α_2) ≤ 1 },   (28)

where K_{1ℓ} and K_{2ℓ} represent the first ℓ rows of the Gram matrices K_1 and K_2, respectively. For a symmetric positive definite matrix M, it is simple to show that (Rosenberg and Bartlett, 2007)

sup_{α: α^⊤ M α ≤ 1} v^⊤ α = ‖M^{−1/2} v‖.

Without loss of generality, suppose the positive semi-definite matrix N in (28) is positive definite and thus has full rank. If N does not have full rank, we can use a subspace decomposition to rewrite R̂_ℓ(G) and obtain a similar representation. Thus, we can evaluate the supremum as described above to get

R̂_ℓ(G) = (1/ℓ) E_σ ‖ N^{−1/2} (K_{1ℓ}  K_{2ℓ})^⊤ σ ‖.

5.2 Bounding R̂_ℓ(G) Above and Below

We make use of the Kahane-Khintchine inequality (Latala and Oleszkiewicz, 1994), stated here for convenience, to bound R̂_ℓ(G).

Lemma 15 For any vectors a_1,...,a_n in a Hilbert space and independent Rademacher random variables σ_1,...,σ_n, we have

(1/2) E ‖ Σ_{i=1}^{n} σ_i a_i ‖² ≤ ( E ‖ Σ_{i=1}^{n} σ_i a_i ‖ )² ≤ E ‖ Σ_{i=1}^{n} σ_i a_i ‖².

By Lemma 15 we have

U/(√2 ℓ) ≤ R̂_ℓ(G) ≤ U/ℓ,   (29)

where

U² = E_σ ‖ N^{−1/2} (K_{1ℓ}  K_{2ℓ})^⊤ σ ‖²
= E_σ tr[ (K_{1ℓ}  K_{2ℓ}) N^{−1} (K_{1ℓ}  K_{2ℓ})^⊤ σ σ^⊤ ]
= tr[ (K_{1ℓ}  K_{2ℓ}) N^{−1} (K_{1ℓ}  K_{2ℓ})^⊤ ].

Recall that

N = γ_n ( K_1  0 ; 0  K_2 ) + γ_v ( K_{1u}^⊤ ; −K_{2u}^⊤ ) U_u ( K_{1u}  −K_{2u} ).

Define

Σ = γ_n ( K_1  0 ; 0  K_2 ),  R = ( K_{1u}^⊤ ; −K_{2u}^⊤ ) U_u^{1/2}.

Using the Sherman-Morrison-Woodbury formula (Golub and Van Loan, 1996), we expand N^{−1} as

N^{−1} = Σ^{−1} − γ_v Σ^{−1} R (I + γ_v R^⊤ Σ^{−1} R)^{−1} R^⊤ Σ^{−1}.

Define Ω = (K_{1ℓ}  K_{2ℓ}). We get

U² = tr(Ω Σ^{−1} Ω^⊤) − γ_v tr[ Ω Σ^{−1} R (I + γ_v R^⊤ Σ^{−1} R)^{−1} R^⊤ Σ^{−1} Ω^⊤ ].

Define

S = Ω Σ^{−1} Ω^⊤ = (1/γ_n) (K_{1ℓ} K_1^{−1} K_{1ℓ}^⊤ + K_{2ℓ} K_2^{−1} K_{2ℓ}^⊤),
Θ = R^⊤ Σ^{−1} R = (1/γ_n) U_u^{1/2} (K_{1u} K_1^{−1} K_{1u}^⊤ + K_{2u} K_2^{−1} K_{2u}^⊤) U_u^{1/2},
J = R^⊤ Σ^{−1} Ω^⊤ = (1/γ_n) U_u^{1/2} (K_{1u} K_1^{−1} K_{1ℓ}^⊤ − K_{2u} K_2^{−1} K_{2ℓ}^⊤).   (30)

Putting these expressions together, we get

U² = tr(S) − γ_v tr(J^⊤ (I + γ_v Θ)^{−1} J).   (31)

5.2.1 REGULARIZATION TERM ANALYSIS

From (29) and (31), it is clear what roles the regularization parameters γ_n and γ_v play in the empirical Rademacher complexity R̂_ℓ(G). The amount of reduction in the Rademacher complexity brought by γ_v is

Δ(γ_v) = γ_v tr(J^⊤ (I + γ_v Θ)^{−1} J).

This term has the property shown by the following lemma, given by Rosenberg and Bartlett (2007) in their analysis of co-regularized least squares. Here the meanings of J and Θ are different from those in Rosenberg and Bartlett (2007).

Lemma 16 (Rosenberg and Bartlett, 2007) Δ(0) = 0, Δ(γ_v) is nondecreasing on γ_v ≥ 0, and, given that Θ is positive definite, we have

lim_{γ_v → ∞} Δ(γ_v) = tr(J^⊤ Θ^{−1} J).
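Both U² from (31) and the resulting bracket on R̂_ℓ(G) in (29) are directly computable from the two Gram matrices and the current z-weights of the unlabeled points. The NumPy sketch below is our own illustration of that computation; it uses pseudo-inverses for K_1 and K_2 and assumes U_u is passed as a diagonal matrix with entries in [0,1).

```python
import numpy as np

def rademacher_bracket(K1, K2, U_u, ell, gamma_n, gamma_v):
    """Return (U/(sqrt(2)*l), U/l), the interval of (29) bracketing hat{R}_l(G),
    with U^2 = tr(S) - gamma_v * tr(J^T (I + gamma_v Theta)^{-1} J) from (31)."""
    K1l, K2l = K1[:ell, :], K2[:ell, :]      # first l rows
    K1u, K2u = K1[ell:, :], K2[ell:, :]      # last u rows
    K1inv, K2inv = np.linalg.pinv(K1), np.linalg.pinv(K2)
    Uu_sqrt = np.sqrt(U_u)                   # element-wise sqrt of the diagonal matrix
    S = (K1l @ K1inv @ K1l.T + K2l @ K2inv @ K2l.T) / gamma_n
    Theta = Uu_sqrt @ (K1u @ K1inv @ K1u.T + K2u @ K2inv @ K2u.T) @ Uu_sqrt / gamma_n
    J = Uu_sqrt @ (K1u @ K1inv @ K1l.T - K2u @ K2inv @ K2l.T) / gamma_n
    u = Theta.shape[0]
    reduction = gamma_v * np.trace(J.T @ np.linalg.solve(np.eye(u) + gamma_v * Theta, J))
    U = np.sqrt(np.trace(S) - reduction)
    return U / (np.sqrt(2.0) * ell), U / ell
```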

5.3 Extending to Iterative Optimization

As our sparse multi-view SVMs employ an iterative optimization procedure for sparsity pursuit, the former outcome for the empirical Rademacher complexity would not apply if we use more than one iteration to update z. However, we can extend the former analysis to this case. Recall that z_i ∈ [0,1) (i = ℓ+1,...,ℓ+u) and U_u = diag(z_{ℓ+1},...,z_{ℓ+u}). During the iterations, it is possible that U_u becomes a zero matrix or some other matrix with diagonal elements in the range [0,1). In any case, the resultant function class can be covered by

Ĥ = {(f_1, f_2) : γ_n (‖f_1‖² + ‖f_2‖²) ≤ 1},

which is obtained by omitting the term containing z_i in (25). Following a similar derivation, the matrix N in (27) becomes

N = γ_n ( K_1  0 ; 0  K_2 ).

Finally, we obtain a bound on the empirical Rademacher complexity R̂_ℓ(G) identical to (29), but now with U² = tr(S) and S defined in (30). The proof of Theorem 14 is completed.

5.4 Examining R̂_ℓ(G)

Here, we examine the role R̂_ℓ(G) plays in the margin bound. Since K_{1ℓ} and K_{2ℓ} are the first ℓ rows of K_1 and K_2, the formulation of tr(S) can be simplified as

tr(S) = (1/γ_n) tr(K_{1ℓ} K_1^{−1} K_{1ℓ}^⊤ + K_{2ℓ} K_2^{−1} K_{2ℓ}^⊤) = (1/γ_n) Σ_{i=1}^{ℓ} (K_1(i,i) + K_2(i,i)).   (32)

Now, we see that for the iterative optimization of sparse multi-view SVMs, the empirical Rademacher complexity R̂_ℓ(G) with U² = tr(S) only depends on the labeled examples and the chosen kernel functions. Consequently, the margin bound does not rely on the unlabeled training sets. In this case the margin bound is quite straightforward to reason about. If we do not use iterative optimization, the empirical Rademacher complexity R̂_ℓ(G) will involve the two other terms Θ and J. By a technique similar to (32), we can show that Θ only depends on the unlabeled data and the kernel functions, while J encodes the interaction between labeled and unlabeled data. As a result, the margin bound relies on both the labeled and unlabeled data. For this case, we will give an evaluation of the margin bound with different sizes of unlabeled sets later in Section 6.
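The simplification (32) is easy to verify numerically: the trace of S reduces to the labeled diagonal entries of the two Gram matrices. A small self-contained check, with randomly generated positive definite matrices standing in for the Gram matrices:

```python
import numpy as np

def trace_S_full(K1, K2, ell, gamma_n):
    """tr(S) computed from the definition of S in (30)."""
    K1l, K2l = K1[:ell, :], K2[:ell, :]
    return (np.trace(K1l @ np.linalg.solve(K1, K1l.T))
            + np.trace(K2l @ np.linalg.solve(K2, K2l.T))) / gamma_n

def trace_S_simplified(K1, K2, ell, gamma_n):
    """tr(S) via the simplification (32): only labeled diagonal entries are needed."""
    return (np.trace(K1[:ell, :ell]) + np.trace(K2[:ell, :ell])) / gamma_n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, ell, gamma_n = 30, 8, 0.1
    A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
    K1 = A1 @ A1.T + 1e-6 * np.eye(n)   # positive definite stand-ins for Gram matrices
    K2 = A2 @ A2.T + 1e-6 * np.eye(n)
    print(trace_S_full(K1, K2, ell, gamma_n), trace_S_simplified(K1, K2, ell, gamma_n))
```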

6. Experiments

We performed experiments on artificial data and real-world data to evaluate the proposed sparse multi-view SVMs (SpMvSVMs). For SpMvSVMs with ε > 0, the entries of z were fixed as 1 for labeled data and initialized as 1 for unlabeled data. The termination condition for the iterative optimization is that either no unlabeled examples can be removed or the maximum iteration number surpasses 50. Comparisons are made with supervised SVMs and with the SVM-2K method. Each accuracy/error reported in this paper is an average over ten random splits of the data into labeled, unlabeled and test sets. Later in this section, we also provide a sequential training strategy for SpMvSVMs, which shows an accuracy improvement as unlabeled data are gradually added, with roughly linear and sub-linear increases in running time. This indicates the possibility and potential of applying SpMvSVMs to large-scale data sets. At the end, margin bound evaluation results are reported.

[Figure 1: Examples in the two-moons-two-lines data set. (a) View 1; (b) View 2.]

6.1 Artificial Data

The two-moons-two-lines synthetic data set was generated according to Sindhwani et al. (2005). Examples in the two classes scatter like two moons in one view and like two parallel lines in the other. To link the two views, points on one moon were enforced to associate at random with points on one line. Each class has 400 examples, and a total of 800 examples were generated, as shown in Figure 1. For SpMvSVMs, the numbers of examples in the labeled training set, unlabeled training set and test set were fixed as four, 596, and 200, respectively. A Gaussian kernel with bandwidth 0.35 and the linear kernel were used for view 1 and view 2, respectively. The parameters γ_n and γ_v were selected from a small grid {10⁻⁶, 10⁻⁴, 10⁻², 1, 10, 100} by five-fold cross validation on the whole data set. The chosen values are γ_n = 10⁻⁴ and γ_v = 1. In this paper, γ_v is normalized by the number of labeled and unlabeled examples involved in the multi-view regularization term. For supervised SVMs, which concatenate features from the two views, we also selected the regularization coefficient from this grid by five-fold cross validation.

To evaluate SpMvSVMs, we varied the size of the unlabeled training set from 20% and 60% to 100% of the total number of unlabeled data, and used different values of the insensitivity parameter ε, ranging from 0 to 0.2 with an interval of 0.01 (when ε is zero, sparsity is not considered). The test accuracies and transductive accuracies (on the corresponding unlabeled set) are given in Figure 2(a) and Figure 2(b), respectively. It should be noted that the numbers of data used to calculate transductive accuracies are different for the three curves in Figure 2(b). The numbers of removed unlabeled examples for different ε values are shown in Figure 3. From Figure 2 and Figure 3, we find that as ε increases, more and more unlabeled data are removed, and the removal of a small number of unlabeled data hardly decreases the performance of the resultant classifiers, especially when the original size of the unlabeled set is large. Therefore, we can find a good balance between sparsity and accuracy using an appropriate ε. In addition, more unlabeled data benefit the performance of the learned classifiers for the same ε.

[Figure 2: Classification accuracies of SpMvSVMs with different sizes of the unlabeled set and different ε values on the artificial data: (a) test accuracy; (b) transductive accuracy. The accuracies of SVMs are also shown.]

[Figure 3: The numbers of unlabeled examples removed by SpMvSVMs for different ε values on the artificial data.]

6.2 Text Classification

We applied the SpMvSVMs to the WebKB text classification task studied in Blum and Mitchell (1998), Sindhwani et al. (2005) and Sun (2008). The data set consists of 1051 two-view web pages collected from the computer science department web sites of four U.S. universities: Cornell, University of Washington, University of Wisconsin, and University of Texas. The task is to predict whether a web page is a course home page or not. This problem has an unbalanced class distribution, since there are a total of 230 course home pages (positive examples). The first view of the data is the words appearing on the web page itself, whereas the second view is the underlined words in all links pointing to the web page from other pages.

[Figure 4: Classification accuracies of SpMvSVMs with different sizes of the unlabeled set and different ε values on text classification: (a) test accuracy; (b) transductive accuracy. The accuracies of SVMs are also shown.]

We preprocessed each view by removing stop words, punctuation and numbers, and then applied Porter's stemming to the text (Porter, 1980). In addition, words that occur in five or fewer documents were ignored. This resulted in 2332- and 87-dimensional vectors for the first and second view, respectively. Finally, document vectors were normalized to tf.idf (the product of term frequency and inverse document frequency) features (Salton and Buckley, 1988).

For SpMvSVMs, the numbers of examples in the labeled training set, unlabeled training set and test set were fixed as 32, 699, and 320, respectively. In the training set and test set, the numbers of negative examples are three times those of positive examples, to reflect the overall proportion of positive and negative examples. The linear kernel was used for both views. The parameters γ_n and γ_v for SpMvSVMs and the regularization coefficient for SVMs were selected using the same method as in Section 6.1. The chosen values for SpMvSVMs are γ_n = 10⁻⁶ and γ_v = 0.01.

To evaluate SpMvSVMs, we again varied the size of the unlabeled training set from 20% and 60% to 100% of the total number of unlabeled data, and used different values of the insensitivity parameter ε ranging from 0 to 0.2 with an interval of 0.01. The test accuracies and transductive accuracies are given in Figure 4(a) and Figure 4(b), respectively. The numbers of removed unlabeled examples for different ε values are shown in Figure 5. From Figure 5, we find that as ε increases, more and more unlabeled data can be removed. As reflected in Figure 4, the removal of unlabeled data only slightly decreases the performance of the resultant classifiers, and this decrease is smaller when more unlabeled data are used. We draw the same conclusion as before: an appropriate ε can be adopted to keep a good balance between sparsity and accuracy. We also observe a different phenomenon, namely that using 60% and 100% of the unlabeled data results in similar test accuracies, as shown in Figure 4(a). This is reasonable because the performance improvement of any classifier is always bounded, no matter how many data are used.
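The tf.idf normalization mentioned above can be sketched in a few lines; this is a generic illustration of the weighting scheme on a term-count matrix, not the exact preprocessing pipeline used for the WebKB data.

```python
import numpy as np

def tfidf(counts):
    """Turn a documents-by-terms count matrix into tf.idf features, each document
    vector normalized to unit length (a generic sketch of the weighting scheme)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    df = np.count_nonzero(counts > 0, axis=0)
    idf = np.log(n_docs / np.maximum(df, 1.0))
    x = tf * idf
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)
```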

[Figure 5: The numbers of unlabeled examples removed by SpMvSVMs for different ε values on text classification.]

6.3 Comparison with SVM-2K, and Sequential Training

The SVM-2K method proposed by Szedmak and Shawe-Taylor (2007) can exploit unlabeled data for multi-view learning. Similar to SpMvSVMs, it also combines the maximum margin and multi-view regularization principles. However, it adopts an ℓ1 norm for the multi-view regularization, and this regularization only uses the unlabeled data. Specifically, the SVM-2K method solves the following optimization problem for the classifier parameters w_1, w_2, b_1, and b_2 of the two views

min (1/2) (‖w_1‖² + ‖w_2‖²) + C_1 Σ_{i=1}^{ℓ} ξ_1^i + C_2 Σ_{i=1}^{ℓ} ξ_2^i + C_η Σ_{j=ℓ+1}^{ℓ+u} η_j
s.t. |w_1^⊤ φ_1(x_j) + b_1 − w_2^⊤ φ_2(x_j) − b_2| ≤ η_j + ε,
y_i (w_1^⊤ φ_1(x_i) + b_1) ≥ 1 − ξ_1^i,
y_i (w_2^⊤ φ_2(x_i) + b_2) ≥ 1 − ξ_2^i,
ξ_1^i ≥ 0, ξ_2^i ≥ 0, η_j ≥ 0 for all i = 1,...,ℓ and j = ℓ+1,...,ℓ+u,

where an ε-insensitive parameter is used to relax the prediction consistency between the views. In this subsection, we carry out an empirical comparison between SVM-2K and our SpMvSVMs for semi-supervised learning with identical data splits. Our first comparison takes the ε-insensitive parameter in both SpMvSVMs and SVM-2K as zero and uses the above two data sets with different sizes of unlabeled training sets, namely from 20% to 60% to 100%. For SVM-2K, we adopted the same parameter selection approach as in Szedmak and Shawe-Taylor (2007) through five-fold cross-validation. That is, the values of C_1 and C_2 were fixed to 1, and C_η was selected from the range {0.01 × 2^i} (i = 1,...,10). The experimental results are listed in Table 1, from which we see that both the test accuracies and the transductive accuracies of SpMvSVMs are superior to those of SVM-2K.

The second comparison considers sequential training of SpMvSVMs and SVM-2K. The purpose is to show the relationship between running time, classification accuracies and the number of unlabeled examples used.


T.C. Banwell, S. Galli. {bct, Telcordia Technologies, Inc., 445 South Street, Morristown, NJ 07960, USA ON THE SYMMETRY OF THE POWER INE CHANNE T.C. Banwe, S. Gai {bct, sgai}@research.tecordia.com Tecordia Technoogies, Inc., 445 South Street, Morristown, NJ 07960, USA Abstract The indoor power ine network

More information

Some Properties of Regularized Kernel Methods

Some Properties of Regularized Kernel Methods Journa of Machine Learning Research 5 (2004) 1363 1390 Submitted 12/03; Revised 7/04; Pubished 10/04 Some Properties of Reguarized Kerne Methods Ernesto De Vito Dipartimento di Matematica Università di

More information

Efficient Generation of Random Bits from Finite State Markov Chains

Efficient Generation of Random Bits from Finite State Markov Chains Efficient Generation of Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Moreau-Yosida Regularization for Grouped Tree Structure Learning

Moreau-Yosida Regularization for Grouped Tree Structure Learning Moreau-Yosida Reguarization for Grouped Tree Structure Learning Jun Liu Computer Science and Engineering Arizona State University J.Liu@asu.edu Jieping Ye Computer Science and Engineering Arizona State

More information

Support Vector Machine and Its Application to Regression and Classification

Support Vector Machine and Its Application to Regression and Classification BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views

More information

SVM-based Supervised and Unsupervised Classification Schemes

SVM-based Supervised and Unsupervised Classification Schemes SVM-based Supervised and Unsupervised Cassification Schemes LUMINITA STATE University of Pitesti Facuty of Mathematics and Computer Science 1 Targu din Vae St., Pitesti 110040 ROMANIA state@cicknet.ro

More information

Adaptive Regularization for Transductive Support Vector Machine

Adaptive Regularization for Transductive Support Vector Machine Adaptive Reguarization for Transductive Support Vector Machine Zengin Xu Custer MMCI Saarand Univ. & MPI INF Saarbrucken, Germany zxu@mpi-inf.mpg.de Rong Jin Computer Sci. & Eng. Michigan State Univ. East

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,

More information

6 Wave Equation on an Interval: Separation of Variables

6 Wave Equation on an Interval: Separation of Variables 6 Wave Equation on an Interva: Separation of Variabes 6.1 Dirichet Boundary Conditions Ref: Strauss, Chapter 4 We now use the separation of variabes technique to study the wave equation on a finite interva.

More information

C. Fourier Sine Series Overview

C. Fourier Sine Series Overview 12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a

More information

Combining reaction kinetics to the multi-phase Gibbs energy calculation

Combining reaction kinetics to the multi-phase Gibbs energy calculation 7 th European Symposium on Computer Aided Process Engineering ESCAPE7 V. Pesu and P.S. Agachi (Editors) 2007 Esevier B.V. A rights reserved. Combining reaction inetics to the muti-phase Gibbs energy cacuation

More information

PHYS 110B - HW #1 Fall 2005, Solutions by David Pace Equations referenced as Eq. # are from Griffiths Problem statements are paraphrased

PHYS 110B - HW #1 Fall 2005, Solutions by David Pace Equations referenced as Eq. # are from Griffiths Problem statements are paraphrased PHYS 110B - HW #1 Fa 2005, Soutions by David Pace Equations referenced as Eq. # are from Griffiths Probem statements are paraphrased [1.] Probem 6.8 from Griffiths A ong cyinder has radius R and a magnetization

More information

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

(This is a sample cover image for this issue. The actual cover is not yet available at this time.) (This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna

More information

$, (2.1) n="# #. (2.2)

$, (2.1) n=# #. (2.2) Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

Multi-view Laplacian Support Vector Machines

Multi-view Laplacian Support Vector Machines Multi-view Laplacian Support Vector Machines Shiliang Sun Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China slsun@cs.ecnu.edu.cn Abstract. We propose a

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones

ASummaryofGaussianProcesses Coryn A.L. Bailer-Jones ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

Week 6 Lectures, Math 6451, Tanveer

Week 6 Lectures, Math 6451, Tanveer Fourier Series Week 6 Lectures, Math 645, Tanveer In the context of separation of variabe to find soutions of PDEs, we encountered or and in other cases f(x = f(x = a 0 + f(x = a 0 + b n sin nπx { a n

More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization

Scalable Spectrum Allocation for Large Networks Based on Sparse Optimization Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Approximated MLC shape matrix decomposition with intereaf coision constraint Thomas Kainowski Antje Kiese Abstract Shape matrix decomposition is a subprobem in radiation therapy panning. A given fuence

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SIAM J. NUMER. ANAL. Vo. 0, No. 0, pp. 000 000 c 200X Society for Industria and Appied Mathematics VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW

More information

Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix

Do Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix VOL. NO. DO SCHOOLS MATTER FOR HIGH MATH ACHIEVEMENT? 43 Do Schoos Matter for High Math Achievement? Evidence from the American Mathematics Competitions Genn Eison and Ashey Swanson Onine Appendix Appendix

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized

More information

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel

Sequential Decoding of Polar Codes with Arbitrary Binary Kernel Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient

More information

FFTs in Graphics and Vision. Spherical Convolution and Axial Symmetry Detection

FFTs in Graphics and Vision. Spherical Convolution and Axial Symmetry Detection FFTs in Graphics and Vision Spherica Convoution and Axia Symmetry Detection Outine Math Review Symmetry Genera Convoution Spherica Convoution Axia Symmetry Detection Math Review Symmetry: Given a unitary

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

arxiv: v1 [cs.db] 1 Aug 2012

arxiv: v1 [cs.db] 1 Aug 2012 Functiona Mechanism: Regression Anaysis under Differentia Privacy arxiv:208.029v [cs.db] Aug 202 Jun Zhang Zhenjie Zhang 2 Xiaokui Xiao Yin Yang 2 Marianne Winsett 2,3 ABSTRACT Schoo of Computer Engineering

More information

14 Separation of Variables Method

14 Separation of Variables Method 14 Separation of Variabes Method Consider, for exampe, the Dirichet probem u t = Du xx < x u(x, ) = f(x) < x < u(, t) = = u(, t) t > Let u(x, t) = T (t)φ(x); now substitute into the equation: dt

More information

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations

Global Optimality Principles for Polynomial Optimization Problems over Box or Bivalent Constraints by Separable Polynomial Approximations Goba Optimaity Principes for Poynomia Optimization Probems over Box or Bivaent Constraints by Separabe Poynomia Approximations V. Jeyakumar, G. Li and S. Srisatkunarajah Revised Version II: December 23,

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

Appendix A: MATLAB commands for neural networks

Appendix A: MATLAB commands for neural networks Appendix A: MATLAB commands for neura networks 132 Appendix A: MATLAB commands for neura networks p=importdata('pn.xs'); t=importdata('tn.xs'); [pn,meanp,stdp,tn,meant,stdt]=prestd(p,t); for m=1:10 net=newff(minmax(pn),[m,1],{'tansig','purein'},'trainm');

More information

Physics 235 Chapter 8. Chapter 8 Central-Force Motion

Physics 235 Chapter 8. Chapter 8 Central-Force Motion Physics 35 Chapter 8 Chapter 8 Centra-Force Motion In this Chapter we wi use the theory we have discussed in Chapter 6 and 7 and appy it to very important probems in physics, in which we study the motion

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

Unconditional security of differential phase shift quantum key distribution

Unconditional security of differential phase shift quantum key distribution Unconditiona security of differentia phase shift quantum key distribution Kai Wen, Yoshihisa Yamamoto Ginzton Lab and Dept of Eectrica Engineering Stanford University Basic idea of DPS-QKD Protoco. Aice

More information

Homogeneity properties of subadditive functions

Homogeneity properties of subadditive functions Annaes Mathematicae et Informaticae 32 2005 pp. 89 20. Homogeneity properties of subadditive functions Pá Burai and Árpád Száz Institute of Mathematics, University of Debrecen e-mai: buraip@math.kte.hu

More information

Section 6: Magnetostatics

Section 6: Magnetostatics agnetic fieds in matter Section 6: agnetostatics In the previous sections we assumed that the current density J is a known function of coordinates. In the presence of matter this is not aways true. The

More information

An explicit Jordan Decomposition of Companion matrices

An explicit Jordan Decomposition of Companion matrices An expicit Jordan Decomposition of Companion matrices Fermín S V Bazán Departamento de Matemática CFM UFSC 88040-900 Forianópois SC E-mai: fermin@mtmufscbr S Gratton CERFACS 42 Av Gaspard Coriois 31057

More information

Universal Consistency of Multi-Class Support Vector Classification

Universal Consistency of Multi-Class Support Vector Classification Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the

More information

Coupling of LWR and phase transition models at boundary

Coupling of LWR and phase transition models at boundary Couping of LW and phase transition modes at boundary Mauro Garaveo Dipartimento di Matematica e Appicazioni, Università di Miano Bicocca, via. Cozzi 53, 20125 Miano Itay. Benedetto Piccoi Department of

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

Math 124B January 17, 2012

Math 124B January 17, 2012 Math 124B January 17, 212 Viktor Grigoryan 3 Fu Fourier series We saw in previous ectures how the Dirichet and Neumann boundary conditions ead to respectivey sine and cosine Fourier series of the initia

More information

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance

Research of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance Send Orders for Reprints to reprints@benthamscience.ae 340 The Open Cybernetics & Systemics Journa, 015, 9, 340-344 Open Access Research of Data Fusion Method of Muti-Sensor Based on Correation Coefficient

More information

8 Digifl'.11 Cth:uits and devices

8 Digifl'.11 Cth:uits and devices 8 Digif'. Cth:uits and devices 8. Introduction In anaog eectronics, votage is a continuous variabe. This is usefu because most physica quantities we encounter are continuous: sound eves, ight intensity,

More information

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES

THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES THE REACHABILITY CONES OF ESSENTIALLY NONNEGATIVE MATRICES by Michae Neumann Department of Mathematics, University of Connecticut, Storrs, CT 06269 3009 and Ronad J. Stern Department of Mathematics, Concordia

More information

Investigation on spectrum of the adjacency matrix and Laplacian matrix of graph G l

Investigation on spectrum of the adjacency matrix and Laplacian matrix of graph G l Investigation on spectrum of the adjacency matrix and Lapacian matrix of graph G SHUHUA YIN Computer Science and Information Technoogy Coege Zhejiang Wani University Ningbo 3500 PEOPLE S REPUBLIC OF CHINA

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),

More information

arxiv: v1 [math.co] 17 Dec 2018

arxiv: v1 [math.co] 17 Dec 2018 On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic

More information

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete

Uniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

Appendix for Stochastic Gradient Monomial Gamma Sampler

Appendix for Stochastic Gradient Monomial Gamma Sampler 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 4 5 6 7 8 9 3 3 3 33 34 35 36 37 38 39 4 4 4 43 44 45 46 47 48 49 5 5 5 53 54 Appendix for Stochastic Gradient Monomia Gamma Samper A The Main Theorem We provide the foowing

More information

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case

Analysis of Emerson s Multiple Model Interpolation Estimation Algorithms: The MIMO Case Technica Report PC-04-00 Anaysis of Emerson s Mutipe Mode Interpoation Estimation Agorithms: The MIMO Case João P. Hespanha Dae E. Seborg University of Caifornia, Santa Barbara February 0, 004 Anaysis

More information

A Novel Learning Method for Elman Neural Network Using Local Search

A Novel Learning Method for Elman Neural Network Using Local Search Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

Bourgain s Theorem. Computational and Metric Geometry. Instructor: Yury Makarychev. d(s 1, s 2 ).

Bourgain s Theorem. Computational and Metric Geometry. Instructor: Yury Makarychev. d(s 1, s 2 ). Bourgain s Theorem Computationa and Metric Geometry Instructor: Yury Makarychev 1 Notation Given a metric space (X, d) and S X, the distance from x X to S equas d(x, S) = inf d(x, s). s S The distance

More information

An approximate method for solving the inverse scattering problem with fixed-energy data

An approximate method for solving the inverse scattering problem with fixed-energy data J. Inv. I-Posed Probems, Vo. 7, No. 6, pp. 561 571 (1999) c VSP 1999 An approximate method for soving the inverse scattering probem with fixed-energy data A. G. Ramm and W. Scheid Received May 12, 1999

More information

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1 Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007

More information

arxiv: v1 [math.fa] 23 Aug 2018

arxiv: v1 [math.fa] 23 Aug 2018 An Exact Upper Bound on the L p Lebesgue Constant and The -Rényi Entropy Power Inequaity for Integer Vaued Random Variabes arxiv:808.0773v [math.fa] 3 Aug 08 Peng Xu, Mokshay Madiman, James Mebourne Abstract

More information

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS

NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS NOISE-INDUCED STABILIZATION OF STOCHASTIC DIFFERENTIAL EQUATIONS TONY ALLEN, EMILY GEBHARDT, AND ADAM KLUBALL 3 ADVISOR: DR. TIFFANY KOLBA 4 Abstract. The phenomenon of noise-induced stabiization occurs

More information

Nonlinear Analysis of Spatial Trusses

Nonlinear Analysis of Spatial Trusses Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes

More information

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks

Power Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro

More information

Approximated MLC shape matrix decomposition with interleaf collision constraint

Approximated MLC shape matrix decomposition with interleaf collision constraint Agorithmic Operations Research Vo.4 (29) 49 57 Approximated MLC shape matrix decomposition with intereaf coision constraint Antje Kiese and Thomas Kainowski Institut für Mathematik, Universität Rostock,

More information