Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels


Vikas Sindhwani and Aurélie C. Lozano
IBM T.J. Watson Research Center
Yorktown Heights, NY 10598

Abstract

We consider regularized risk minimization in a large dictionary of Reproducing Kernel Hilbert Spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the non-parametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely convex relaxations using the ℓ1 norm (e.g., Lasso) and greedy methods (e.g., OMP), have both been rigorously extended for group sparsity, the sparse MKL literature has so far mainly adopted the former with mild empirical success. In this paper, we close this gap by proposing a Group-OMP based framework for sparse MKL. Unlike ℓ1-MKL, our approach decouples the sparsity regularizer (via a direct ℓ0 constraint) from the smoothness regularizer (via RKHS norms), which leads to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver. The algorithmic development and empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds and sparse recovery conditions analogous to those for OMP [27] and Group-OMP [16].

1 Introduction

Kernel methods are widely used to address a variety of learning problems including classification, regression, structured prediction, data fusion, clustering and dimensionality reduction [22, 23]. However, choosing an appropriate kernel and tuning the corresponding hyper-parameters can be highly challenging, especially when little is known about the task at hand. In addition, many modern problems involve multiple heterogeneous data sources (e.g., gene functional classification, prediction of protein-protein interactions), each necessitating the use of a different kernel. This strongly suggests avoiding the risks and limitations of single kernel selection by considering flexible combinations of multiple kernels. Furthermore, it is appealing to impose sparsity to discard noisy data sources. As several papers have provided evidence in favor of using multiple kernels (e.g., [19, 14, 7]), the multiple kernel learning problem (MKL) has generated a large body of recent work [13, 5, 24, 33], and become the focal point of the intersection between non-parametric function estimation and sparse learning methods traditionally explored in linear settings.

Given a convex loss function, the MKL problem is usually formulated as the minimization of empirical risk together with a mixed norm regularizer, e.g., the square of the sum of individual RKHS norms, or variants thereof, that have a close relationship to the Group Lasso criterion [30, 2]. Equivalently, this formulation may be viewed as simultaneous optimization of both the non-negative convex combination of kernels, as well as the prediction functions induced by this combined kernel. In constraining the combination of kernels, the ℓ1 penalty is of particular interest as it encourages sparsity in the supporting kernels, which is highly desirable when the number of kernels considered is large. The MKL literature has rapidly evolved along two directions.

One direction concerns the scalability of optimization algorithms, beyond the early pioneering proposals based on semi-definite programming or second-order cone programming [13, 5], to simpler and more efficient alternating optimization schemes [20, 29, 24]; the other concerns the use of ℓp norms [10, 29] to construct complex non-sparse kernel combinations with the goal of outperforming ℓ1-norm MKL which, as reported in several papers, has demonstrated mild success in practical applications.

The class of Orthogonal Matching Pursuit techniques has recently received considerable attention as a competitive alternative to Lasso. The basic OMP algorithm originates from the signal-processing community and is similar to forward greedy feature selection, except that it performs re-estimation of the model parameters in each iteration, which has been shown to contribute to improved accuracy. For linear models, some strong theoretical performance guarantees and empirical support have been provided for OMP [31] and its extension for variable group selection, Group-OMP [16]. In particular, it was shown in [25, 9] that OMP and Lasso exhibit competitive theoretical performance guarantees. It is therefore desirable to investigate the use of Matching Pursuit techniques in the MKL framework and whether one may be able to improve upon existing MKL methods.

Our contributions in this paper are as follows. We propose a non-parametric kernel-based extension to Group-OMP [16]. In terms of the feature space (as opposed to function space) perspective of kernel methods, this allows Group-OMP to handle groups that can potentially contain infinitely many features. By adding regularization to Group-OMP, we allow it to handle settings where the sample size might be smaller than the number of features in any group. Rather than imposing a mixed ℓ1/RKHS-norm regularizer as in Group-Lasso based MKL, a Group-OMP based approach allows us to consider the exact sparse kernel selection problem via ℓ0 regularization instead. Note that, in contrast to the group-lasso penalty, the ℓ0 penalty by itself has no effect on the smoothness of each individual component. This allows for a clear decoupling between the role of the smoothness regularizer (namely, an RKHS regularizer) and the sparsity regularizer (via the ℓ0 penalty). Our greedy algorithms allow for simple and flexible optimization schemes that only require a black-box solver for standard learning algorithms. In this paper, we focus on multiple kernel learning with Regularized Least Squares (RLS). We provide a bound on the Rademacher complexity of the hypothesis sets considered by our formulation. We derive conditions analogous to those for OMP [27] and Group-OMP [16] to guarantee the correctness of kernel selection. We close this paper with empirical studies on simulated and real-world datasets that confirm the value of our methods.

2 Learning Over an RKHS Dictionary

In this section, we set up some notation and give a brief background before introducing our main objective function and describing our algorithm in the next section. Let H_1 ... H_N be a collection of Reproducing Kernel Hilbert Spaces with associated kernel functions k_1 ... k_N defined on the input space X ⊂ R^d. Let H denote the sum space of functions,
H = H_1 + H_2 + ... + H_N = { f : X → R | f(x) = ∑_{j=1}^N f_j(x), x ∈ X, f_j ∈ H_j, j = 1 ... N }

Let us equip this space with the following ℓp norms,

‖f‖_{ℓp(H)} = inf { ( ∑_{j=1}^N ‖f_j‖_{H_j}^p )^{1/p} : f(x) = ∑_{j=1}^N f_j(x), x ∈ X, f_j ∈ H_j, j = 1 ... N }   (1)

It is now natural to consider a regularized risk minimization problem over such an RKHS dictionary, given a collection of training examples {x_i, y_i}_{i=1}^ℓ,

argmin_{f ∈ H} ∑_{i=1}^ℓ V(y_i, f(x_i)) + λ ‖f‖²_{ℓp(H)}   (2)

where V(·,·) is a convex loss function, such as the squared loss in the Regularized Least Squares (RLS) algorithm or the hinge loss in the SVM method. If this problem again has elements of an RKHS structure, then, via the Representer Theorem, it can again be reduced to a finite dimensional problem and efficiently solved.
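For the squared loss and a fixed kernel, this reduction is the familiar RLS linear system, summarized in the minimal sketch below (our own NumPy illustration with hypothetical function names, not code from the paper; `K` is a precomputed Gram matrix). The closed-form solution and objective value it computes are restated when the algorithm is described in the next section.

```python
import numpy as np

def rls_fit(K, y, lam):
    """Regularized least squares for a fixed kernel (a sketch).

    By the Representer Theorem, the minimizer of
        sum_i (y_i - f(x_i))^2 + lam * ||f||^2
    is f*(x) = sum_i alpha_i k(x, x_i) with alpha = (K + lam*I)^{-1} y.
    """
    ell = K.shape[0]
    alpha = np.linalg.solve(K + lam * np.eye(ell), y)
    # Objective value attained by the minimizer: lam * y^T (K + lam*I)^{-1} y.
    objective = lam * float(y @ alpha)
    return alpha, objective
```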

Let q = p/(2−p) and let us define the q-convex hull of the set of kernel functions to be the following,

co_q(k_1 ... k_N) = { k_γ : X × X → R | k_γ(x, z) = ∑_{j=1}^N γ_j k_j(x, z), ∑_{j=1}^N γ_j^q = 1, γ_j ≥ 0 }

where γ ∈ R^N. It is easy to see that the non-negative combination of kernels, k_γ, is itself a valid kernel with an associated RKHS H_{k_γ}. With this definition, [17] show the following,

‖f‖_{ℓp(H)} = inf_γ { ‖f‖_{H_{k_γ}} : k_γ ∈ co_q(k_1 ... k_N) }   (3)

This relationship connects Tikhonov regularization with ℓp norms over H to regularization over RKHSs parameterized by the kernel functions k_γ. This leads to a large family of multiple kernel learning algorithms (whose variants are also sometimes referred to as ℓq-MKL) where the basic idea is to solve an equivalent problem,

argmin_{f ∈ H_{k_γ}, γ ∈ Δ_q} ∑_{i=1}^ℓ V(y_i, f(x_i)) + λ ‖f‖²_{H_{k_γ}}   (4)

where Δ_q = { γ ∈ R^N : ‖γ‖_q = 1, γ_j ≥ 0, j = 1 ... N }. For a fixed γ, the optimization over f ∈ H_{k_γ} is recognizable as an RKHS problem for which a standard black-box solver may be used. The weights γ may then be optimized in an alternating minimization scheme, although several other optimization procedures can also be used (see, e.g., [4]). The case where p = 1 is of particular interest in the setting when the size of the RKHS dictionary is large but the unknown target function can be approximated in a much smaller number of RKHSs. This leads to a large family of sparse multiple kernel learning algorithms that have a strong connection to the Group Lasso [2, 20, 29].

3 Multiple Kernel Learning with Group Orthogonal Matching Pursuit

Let us recall the ℓ0 pseudo-norm, which is the cardinality of the sparsest representation of f in the dictionary: ‖f‖_{ℓ0(H)} = min{ |J| : f = ∑_{j∈J} f_j }. We now pose the following exact sparse kernel selection problem,

argmin_{f ∈ H} ∑_{i=1}^ℓ V(y_i, f(x_i)) + λ ‖f‖²_{ℓ2(H)}   subject to   ‖f‖_{ℓ0(H)} ≤ s   (5)

It is important to note the following: when using a dictionary of universal kernels, e.g., Gaussian kernels with different bandwidths, the presence of the regularization term ‖f‖²_{ℓ2(H)} is critical (i.e., λ > 0), since otherwise the labeled data can be perfectly fit by any single kernel. In other words, the kernel selection problem is ill-posed. While conceptually simple, our formulation is quite different from those proposed earlier, since the role of a smoothness regularizer (via the ‖f‖²_{ℓ2(H)} penalty) is decoupled from the role of a sparsity regularizer (via the constraint ‖f‖_{ℓ0(H)} ≤ s). Moreover, the latter is imposed directly, as opposed to through a p = 1 penalty, making the spirit of our approach closer to Group Orthogonal Matching Pursuit (Group-OMP [16]), where groups are formed by very high-dimensional (infinite for Gaussian kernels) feature spaces associated with the kernels. It has been observed in recent work [10, 29] on ℓ1-MKL that sparsity alone does not lead to improvements in real-world empirical tasks, and hence several methods have been proposed to explore ℓq-norm MKL with q > 1 in Eqn. 4, making MKL depart from sparsity in kernel combinations. By contrast, we note that as q → ∞, p → 2. Our approach gives a direct knob both on smoothness (via λ) and sparsity (via s), with a solution path along these dimensions that differs from that offered by Group-Lasso based ℓq-MKL as q is varied. By combining the ℓ0 pseudo-norm with RKHS norms, our method is conceptually reminiscent of the elastic net [32] (also see [26, 12, 21]). If kernels arise from different subsets of input variables, our approach is also related to sparse additive models [18]. Our algorithm, MKL-GOMP, is outlined below for regularized least squares. Extensions for other loss functions, e.g., the hinge loss for SVMs, can also be similarly derived.
In the description of the algorithm, our notation is as follows: for any function f belonging to an RKHS F_k with kernel function k(·,·), we denote the regularized objective function as R_λ(f, y) = ∑_{i=1}^ℓ (y_i − f(x_i))² + λ ‖f‖²_{F_k}, where ‖·‖_{F_k} denotes the RKHS norm.

Recall that the minimizer f* = argmin_{f ∈ F_k} R_λ(f, y) is given by solving the linear system α = (K + λI)^{−1} y, where K is the Gram matrix of the kernel on the labeled data, and by setting f*(x) = ∑_{i=1}^ℓ α_i k(x, x_i). Moreover, the objective value achieved by the minimizer is R_λ(f*, y) = λ yᵀ(K + λI)^{−1} y. Note that MKL-GOMP should not be confused with Kernel Matching Pursuit [28], whose goal is different: it is designed to sparsify α in a single-kernel setting.

The MKL-GOMP procedure iteratively expands the hypothesis space, H_{G^(1)} ⊂ H_{G^(2)} ⊂ ... ⊂ H_{G^(i)}, by greedily selecting kernels from a given dictionary, where G^(i) ⊂ {1 ... N} is a subset of indices and H_G = ⊕_{j∈G} H_j. Note that each H_G is an RKHS with kernel k_G = ∑_{j∈G} k_j (see Section 6 in [1]). The selection criterion is the best improvement, I(f^(i), H_j), given by a new hypothesis space H_j in reducing the norm of the current residual r^(i) = y − f^(i), where f^(i) = [f^(i)(x_1) ... f^(i)(x_ℓ)]ᵀ, by finding the best regularized (smooth) approximation. Note that since min_{g∈H_j} R_λ(g, r) ≤ R_λ(0, r) = ‖r‖²₂, the value of the improvement function,

I(f^(i), H_j) = ‖r^(i)‖²₂ − min_{g∈H_j} R_λ(g, r^(i))

is always non-negative. Once a kernel is selected, the function is re-estimated by learning in H_{G^(i)}. Note that since H_G is an RKHS whose kernel function is the sum ∑_{j∈G} k_j, we can use a simple RLS linear system solver for refitting. Unlike Group-Lasso based MKL, we do not need an iterative kernel reweighting step, which essentially arises as a mechanism to transform the less convenient group sparsity norms into reweighted squared RKHS norms. MKL-GOMP converges when the best improvement is no better than ε (or, in practice, if a maximum allowed number s of kernels has been selected).

Input: data matrix X = [x_1 ... x_ℓ]ᵀ, label vector y ∈ R^ℓ, kernel dictionary {k_j(·,·)}_{j=1}^N, precision ε > 0, maximum sparsity s
Output: selected kernels G^(i) and a function f^(i) ∈ H_{G^(i)}
Initialization: G^(0) = ∅, f^(0) = 0, set residual r^(0) = y
for i = 0, 1, 2, ..., s
1. Kernel Selection: For all j ∉ G^(i), set I(f^(i), H_j) = ‖r^(i)‖²₂ − min_{g∈H_j} R_λ(g, r^(i)) = r^(i)ᵀ(I − λ(K_j + λI)^{−1}) r^(i). Pick j^(i) = argmax_{j∉G^(i)} I(f^(i), H_j).
2. Convergence Check: if I(f^(i), H_{j^(i)}) ≤ ε, return f^(i). end
3. Refitting: Set G^(i+1) = G^(i) ∪ {j^(i)}. Set f^(i+1)(x) = ∑_{j=1}^ℓ α_j k(x, x_j), where k = ∑_{j∈G^(i+1)} k_j and α = ( ∑_{j∈G^(i+1)} K_j + λI )^{−1} y.
4. Update Residual: r^(i+1) = y − f^(i+1), where f^(i+1) = [f^(i+1)(x_1) ... f^(i+1)(x_ℓ)]ᵀ.

Remarks: Note that our algorithm can be applied to multivariate problems with group structure among outputs, similar to Multivariate Group-OMP [15]. In particular, in our experiments on multiclass datasets, we treat all outputs as a single group and evaluate each kernel for selection based on how well the total residual is reduced across all outputs simultaneously. Kernel matrices are normalized to unit trace or to have uniform variance of data points in their associated feature spaces, as in [10, 33]. In practice, we can also monitor error on a validation set to decide the optimal degree of sparsity. For efficiency, we can precompute the matrices Q_j = (I − λ(K_j + λI)^{−1})^{1/2}, so that I(f^(i), H_j) = ‖Q_j r^(i)‖²₂ can be very quickly evaluated at selection time, and/or reduce the search space by considering a random subsample of the dictionary.
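The following is a compact sketch of the MKL-GOMP loop above for regularized least squares, assuming precomputed Gram matrices. It is our own illustration, not the authors' code; the API (NumPy, function and variable names) is an assumption.

```python
import numpy as np

def mkl_gomp(Ks, y, lam, eps=1e-6, max_sparsity=None):
    """Sketch of MKL-GOMP for regularized least squares.

    Ks -- list of N precomputed ell x ell Gram matrices K_j
    y  -- target vector of length ell
    Returns the selected kernel indices G and the coefficients alpha
    for the combined kernel k = sum_{j in G} k_j.
    """
    ell, N = len(y), len(Ks)
    s = N if max_sparsity is None else max_sparsity
    I = np.eye(ell)
    # Precompute M_j = I - lam*(K_j + lam*I)^{-1}; the improvement of kernel j
    # on residual r is then I(f, H_j) = r^T M_j r  (equivalently ||Q_j r||^2).
    Ms = [I - lam * np.linalg.inv(K + lam * I) for K in Ks]

    G, alpha, r = [], np.zeros(ell), y.astype(float).copy()
    for _ in range(s):
        # 1. Kernel selection: best regularized reduction of the residual norm.
        gains = [-np.inf if j in G else float(r @ Ms[j] @ r) for j in range(N)]
        j_best = int(np.argmax(gains))
        # 2. Convergence check.
        if gains[j_best] <= eps:
            break
        # 3. Refitting in the sum-kernel RKHS H_G: alpha = (K_G + lam*I)^{-1} y.
        G.append(j_best)
        K_G = sum(Ks[j] for j in G)
        alpha = np.linalg.solve(K_G + lam * I, y)
        # 4. Residual update: the fitted values f^{(i+1)}(x_i) are K_G @ alpha.
        r = y - K_G @ alpha
    return G, alpha
```

As in the remarks above, a validation set can be monitored across iterations to pick the sparsity level, and the per-kernel matrices M_j need only be computed once.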

4 Theoretical Analysis

Our analysis is composed of two parts. In the first part, we establish generalization bounds for the hypothesis spaces considered by our formulation, based on the notion of Rademacher complexity. The second component of our theoretical analysis consists of deriving conditions under which MKL-GOMP can recover good solutions. While the first part can be seen as characterizing the statistical convergence of our method, the second part characterizes its numerical convergence as an optimization method, and is required to complement the first part. This is because matching pursuit methods can be deemed to solve an exact sparse problem approximately, while regularized methods (e.g., ℓ1-norm MKL) solve an approximate problem exactly. We therefore need to show that MKL-GOMP recovers a solution that is close to an optimum solution of the exact sparse problem.

4.1 Rademacher Bounds

Theorem 1. Consider the hypothesis space of sufficiently sparse and smooth functions,

H_{τ,s} = { f ∈ H : ‖f‖²_{ℓ2(H)} ≤ τ, ‖f‖_{ℓ0(H)} ≤ s }¹

Let δ ∈ (0,1) and κ = sup_{x∈X, j=1...N} k_j(x,x). Let ρ be any probability distribution on (x,y) ∈ X × R satisfying |y| ≤ M almost surely, and let {x_i, y_i}_{i=1}^ℓ be randomly sampled according to ρ. Define f̂ = argmin_{f∈H_{τ,s}} ∑_{i=1}^ℓ (y_i − f(x_i))² to be the empirical risk minimizer, and f* = argmin_{f∈H_{τ,s}} R(f) to be the true risk minimizer in H_{τ,s}, where R(f) = E_{(x,y)∼ρ} (y − f(x))² denotes the true risk. Then, with probability at least 1 − δ over random draws of samples of size ℓ,

R(f̂) ≤ R(f*) + 8L √(sκτ/ℓ) + 4L² √(log(3/δ)/(2ℓ))

where L = M + √(sκτ) bounds |y − f(x)|.

The proof is given in the supplementary material, but can also be reasoned as follows. In the standard single-RKHS case, the Rademacher complexity can be upper bounded by a quantity that is proportional to the square root of the trace of the Gram matrix, which is in turn bounded in terms of κ. In our case, any collection of s-sparse functions from a dictionary of N RKHSs reduces to a single RKHS whose kernel is the sum of s base kernels, and hence the corresponding trace can be bounded by sκℓ for all possible subsets of size s. Once it is established that the empirical Rademacher complexity of H_{τ,s} is upper bounded by √(sκτ/ℓ), the generalization bound follows from well-known results [6] tailored to regularized least squares regression with a bounded target variable. For ℓ1-norm MKL, in the context of margin-based loss functions, Cortes et al. (2010) [8] bound the Rademacher complexity as

√( c e ⌈log N⌉ κτ / ℓ )   (6)

where ⌈·⌉ is the ceiling function that rounds to the next integer, e is the base of the natural logarithm, and c = 23/22. Using VC-based lower-bound arguments, they point out that the log(N) dependence on N is essentially optimal. By contrast, our greedy approach with sequential regularized risk minimization imposes direct control over the degree of sparsity as well as smoothness, and hence the Rademacher complexity in our case is independent of N. If s = O(log N), the bounds are similar. A critical difference between ℓ1-norm MKL and sparse greedy approximations, however, is that the former is convex, and hence the empirical risk can be minimized exactly in the hypothesis space whose complexity is bounded by the Rademacher analysis. This is not true in our case, and therefore, to complement the Rademacher analysis, we need conditions under which good solutions can be recovered.
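To summarize the trace argument above in symbols (our compact restatement of the proof sketch, using the standard kernel-trace bound on the Rademacher complexity of an RKHS ball [6], and taking the worst support of size s):

```latex
\hat{\mathfrak{R}}_\ell(H_{\tau,s})
  \;\le\; \max_{|G|\le s}\; \frac{\sqrt{\tau}}{\ell}\,\sqrt{\operatorname{tr}(K_G)}
  \;\le\; \frac{\sqrt{\tau}}{\ell}\,\sqrt{s\kappa\ell}
  \;=\; \sqrt{\frac{s\kappa\tau}{\ell}},
\qquad K_G=\textstyle\sum_{j\in G}K_j,\quad \operatorname{tr}(K_G)\le s\kappa\ell .
```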
4.2 Exact Recovery Conditions in Noiseless Settings

We now assume that the regression function f_ρ(x) = ∫ y dρ(y|x) is sparse, i.e., f_ρ ∈ H_{G_good} for some subset G_good of s good kernels, and that it is sufficiently smooth in the sense that, for some λ > 0 and given sufficient samples, the empirical minimizer f̂ = argmin_{f∈H_{G_good}} R_λ(f, y) gives near-optimal generalization as per Theorem 1. In this section, our main concern is to characterize Group-OMP-like conditions under which MKL-GOMP will be able to learn f̂ by recovering the support G_good exactly. Recall our notation that k_{G_good} = ∑_{j∈G_good} k_j is the kernel associated with H_{G_good}.

¹Note that Tikhonov regularization, which uses a penalty term λ‖f‖²_{ℓ2(H)}, and Ivanov regularization, which uses a ball constraint ‖f‖²_{ℓ2(H)} ≤ τ, return identical solutions for some one-to-one correspondence between λ and τ.

Let us denote by r^(i) = f̂ − f^(i) the residual function at step i of the algorithm. Initially, r^(0) = f̂ ∈ H_{G_good}. In fact, by the Representer Theorem, r^(0) = f̂ ∈ Ĥ_{G_good} ⊂ H_{G_good}, where we use the notation Ĥ_{G_good} = span{ k_{G_good}(x_i, ·), i = 1 ... ℓ }. Our argument is inductive: if at any step i, r^(i) ∈ Ĥ_{G_good}, and, under this assumption, we can always guarantee that (a) max_{j∈G_good} I(f^(i), H_j) > max_{j∉G_good} I(f^(i), H_j), i.e., a good kernel offers better greedy improvement and is therefore selected, and (b) after refitting, the new residual r^(i+1) ∈ Ĥ_{G_good}, then by induction it is clear that the algorithm correctly expands the hypothesis space and never makes a mistake. Without loss of generality, let us rearrange the dictionary so that G_good = {1 ... s}. For any function f ∈ Ĥ_{G_good}, we now wish to derive the following upper bound,

max( I(f, H_{s+1}), ..., I(f, H_N) ) / max( I(f, H_1), ..., I(f, H_s) ) ≤ μ_H(G_good)²   (7)

Clearly, a sufficient condition for exact recovery is μ_H(G_good) < 1. We need some notation to state our main result. Let s = |G_good|, i.e., the number of good kernels. For any matrix A ∈ R^{sℓ×(N−s)ℓ}, let ‖A‖_{(2,1)} denote the matrix norm induced by the following vector norms: for any vector u = [u_1 ... u_s] ∈ R^{sℓ}, define ‖u‖_{(2,1)} = ∑_{i=1}^s ‖u_i‖₂; and similarly, for any vector v = [v_1 ... v_{N−s}] ∈ R^{(N−s)ℓ}, define ‖v‖_{(2,1)} = ∑_{i=1}^{N−s} ‖v_i‖₂. Then, ‖A‖_{(2,1)} = sup_{v∈R^{(N−s)ℓ}} ‖Av‖_{(2,1)} / ‖v‖_{(2,1)}. We can now state the following:

Theorem 2. Given the kernel dictionary {k_j(·,·)}_{j=1}^N with associated Gram matrices {K_j}_{j=1}^N over the labeled data, MKL-GOMP correctly recovers the good kernels, i.e., G^(s) = G_good, if

μ_H(G_good) = ‖C_{λ,H}(G_good)‖_{(2,1)} < 1

where C_{λ,H}(G_good) ∈ R^{sℓ×(N−s)ℓ} is a coherence matrix whose (i,j)-th block of size ℓ×ℓ, i ∈ G_good, j ∉ G_good, is given by,

C_{λ,H}(G_good)_{i,j} = K_{G_good} Q_i ( ∑_{k∈G_good} Q_k K²_{G_good} Q_k )^{−1} Q_j K_{G_good}   (8)

where K_{G_good} = ∑_{j∈G_good} K_j and Q_j = (I − λ(K_j + λI)^{−1})^{1/2}, j = 1 ... N.

The proof is given in the supplementary material. This result is analogous to sparse recovery conditions for OMP and ℓ1 methods, and their (linear) group counterparts. In the noiseless setting, Tropp [27] gives an exact recovery condition of the form ‖X⁺_good X_bad‖₁ < 1, where X_good and X_bad refer to the restriction of the data matrix to good and bad features, X⁺_good denotes the pseudo-inverse of X_good, and ‖·‖₁ refers to the induced matrix norm. Intriguingly, the same paper shows that this condition is also sufficient for the ℓ1 Basis Pursuit minimization problem. For Group-OMP [16], the condition generalizes to involve a group-sensitive matrix norm on the same matrix objects. Likewise, Bach [2] generalizes the Lasso variable selection consistency conditions to apply to Group Lasso, and then further to non-parametric ℓ1-MKL. The above result is similar in spirit. A stronger sufficient condition can be derived by requiring ‖Q_j K_{G_good}‖₂ to be sufficiently small for all j ∉ G_good. Intuitively, this means that smooth functions in H_{G_good} cannot be well approximated using smooth functions induced by the bad kernels, so that MKL-GOMP is never led to making a mistake.

5 Empirical Studies

We report empirical results on a collection of simulated datasets and 3 classification problems from computational cell biology. In all experiments, as in [10, 33], candidate kernels are normalized multiplicatively to have uniform variance of data points in their associated feature spaces.
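The multiplicative normalization just mentioned (unit variance of the data in each kernel's feature space) can be sketched as follows. This is our reading of the normalization used in [10, 33], not code from the paper, and the function name is hypothetical.

```python
import numpy as np

def normalize_kernel(K):
    """Rescale a Gram matrix so the data has unit variance in feature space.

    The feature-space variance of the mapped points is
        (1/ell) * trace(K) - (1/ell^2) * sum(K),
    i.e., the mean squared norm minus the squared norm of the mean.
    """
    ell = K.shape[0]
    variance = np.trace(K) / ell - K.sum() / ell**2
    return K / variance
```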

5.1 Adaptability to Data Sparsity - Simulated Setting

We adapt the experimental setting proposed by [10], where the sparsity of the target function is explicitly controlled, and the optimal subset of kernels is varied from requiring the entire dictionary to requiring a single kernel. Our goal is to study the solution paths offered by MKL-GOMP in comparison to ℓq-norm MKL. For consistency, we use squared loss in all experiments (ℓq-MKL with the SVM hinge loss behaves similarly). We implemented ℓq-norm MKL for regularized least squares (RLS) using an alternating minimization scheme adapted from [17, 29]. Different binary classification datasets (provided by the authors of [10] at mldata.org/repository/data/viewslug/mkl-toy/) with 50 labeled examples are randomly generated by sampling the two classes from 50-dimensional isotropic Gaussian distributions with equal covariance matrices (identity) and equal but opposite means, μ₁ = 1.75 θ/‖θ‖₂ and μ₂ = −μ₁, where θ is a binary vector encoding the true underlying sparsity. The fraction of zero components in θ is a measure of the feature sparsity of the learning problem. For each dataset, a linear kernel (normalized as in [10]) is generated from each feature, and the resulting dictionary is input to MKL-GOMP and ℓq-norm MKL. For each level of sparsity, a training set of size 50 and validation and test sets of size 10000 are generated 10 times, and average classification errors are reported. For each run, the validation error is monitored as kernel selection progresses in MKL-GOMP, and the number of kernels with smallest validation error is chosen. The regularization parameters for both MKL-GOMP and ℓq-norm MKL are similarly chosen using the validation set.

[Figure 1: Simulated setting - adaptability to data sparsity. Left: test error of ℓ1-, ℓ4/3-, ℓ2-, ℓ4- and ℓ∞-norm MKL (= RLS), MKL-GOMP, and the Bayes error, as functions of v(θ), the fraction of noise kernels (in %). Right: the percentage of kernels selected and the value of λ chosen by MKL-GOMP, as functions of v(θ).]

Figure 1 shows test error rates as a function of the sparsity of the target function: from non-sparse (all kernels needed) to extremely sparse (only 1 kernel needed). We recover the observations also made in [10]: ℓ1-norm MKL excels in extremely sparse settings where a single kernel carries the whole discriminative information of the learning problem. However, in the other scenarios it mostly performs worse than the other ℓq, q > 1, variants, despite the fact that the vector θ remains sparse in all but the uniform scenario. As q is increased, the error rate in these settings improves but deteriorates in sparse settings. As reported in [11], the elastic net MKL approach of [26] performs similarly to ℓ1-MKL in the hinge loss case. As can be seen in the figure, the error curve of MKL-GOMP tends to be below the lower envelope of the error rates given by ℓq-MKL solutions. To adapt to the sparsity of the problem, ℓq methods clearly need to tune q, requiring several fresh invocations of the appropriate ℓq-MKL solver. On the other hand, in MKL-GOMP the hypothesis space grows as a function of the iteration number, and the solution trajectory naturally expands sequentially in the direction of decreasing sparsity. The right plot in Figure 1 shows the number of kernels selected by MKL-GOMP and the optimal value of λ, suggesting that MKL-GOMP adapts to the sparsity and smoothness of the learning problem.
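As a concrete illustration of the data-generation recipe described at the start of this subsection, the sketch below produces one dataset and its per-feature linear kernels. The function name and the exact scaling of the means are our assumptions; the normalization of [10] would then be applied to each kernel.

```python
import numpy as np

def make_toy_data(n=50, d=50, n_relevant=10, rng=None):
    """Two isotropic Gaussian classes with opposite sparse means (a sketch)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    theta = np.zeros(d)
    theta[:n_relevant] = 1.0                    # binary sparsity pattern
    mu = 1.75 * theta / np.linalg.norm(theta)   # assumed scaling, as in the text
    labels = rng.choice([-1, 1], size=n)
    # Class +1 is centered at +mu, class -1 at -mu, identity covariance.
    X = rng.standard_normal((n, d)) + np.outer(labels, mu)
    # One linear kernel per feature: K_j = x_j x_j^T.
    Ks = [np.outer(X[:, j], X[:, j]) for j in range(d)]
    return X, labels, Ks
```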
5.2 Protein Subcellular Localization

The multiclass generalization of ℓ1-MKL proposed in [33] (MCMKL) is state-of-the-art methodology in predicting protein subcellular localization, an important cell biology problem that concerns the estimation of where a protein resides in a cell so that, for example, the identification of drug targets can be aided. We use three multiclass datasets, PSORT+, PSORT- and PLANT, provided by the authors of [33], together with a dictionary of 69 kernels derived with biological insight: 2 kernels on phylogenetic trees, 3 kernels based on similarity to known proteins (BLAST E-values), and 64 kernels based on amino-acid sequence patterns.

[Figure 2: Protein subcellular localization results. Bars compare performance (higher is better) of mklgomp, mcmkl, sum, single, and other methods on psort+, psort- and plant.]

The statistics of the three datasets are as follows: PSORT+ has 541 proteins labeled with 4 location classes, PSORT- has 1444 proteins in 5 classes, and PLANT is a 4-class problem with 940 proteins. For each dataset, results are averaged over 10 splits of the dataset into training and test sets. We used exactly the same experimental protocol, data splits and evaluation methodology as given in [33]: the hyper-parameters of MKL-GOMP (sparsity and the regularization parameter λ) were tuned based on 3-fold cross-validation; results on PSORT+ and PSORT- are F-scores averaged over the classes, while those on PLANT are Matthews correlation coefficients. Figure 2 compares MKL-GOMP against MCMKL, baselines such as using the sum of all the kernels and using the best single kernel, and results from other prediction systems proposed in the literature. As can be seen, MKL-GOMP slightly outperforms MCMKL on the PSORT+ and PSORT- datasets and is slightly worse on PLANT, where RLS with the sum of all the kernels also performs very well. On the two PSORT datasets, [33] report selecting 25 kernels using MCMKL. On the other hand, on average, MKL-GOMP selects 14 kernels on PSORT+, 15 on PSORT- and 24 kernels on PLANT. Note that MKL-GOMP is applied in multivariate mode: the kernels are selected based on their utility in reducing the total residual error across all target classes.

6 Conclusion

By proposing a Group-OMP based framework for sparse multiple kernel learning, analyzing theoretically the performance of the resulting methods in relation to the dominant convex relaxation-based approach, and demonstrating the value of our framework through extensive experimental studies, we believe greedy methods arise as a natural alternative for tackling MKL problems. Relevant directions for future research include extending our theoretical analysis to the stochastic setting, investigating complex multivariate structures and groupings over outputs, e.g., by generalizing the multivariate version of Group-OMP [15], and extending our algorithm to incorporate interesting structured kernel dictionaries [3].

Acknowledgments: We thank Rick Lawrence, Ha Quang Minh and David Rosenberg for insightful conversations and enthusiastic support for this work.

References

[1] N. Aronszajn. Theory of reproducing kernel Hilbert spaces. Transactions of the American Mathematical Society, 68(3):337-404, 1950.
[2] F. Bach. Consistency of the group lasso and multiple kernel learning. JMLR, 9:1179-1225, 2008.
[3] F. Bach. High-dimensional non-linear variable selection through hierarchical kernel learning. Technical report, HAL, 2009.
[4] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Technical report, HAL, 2011.

[5] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, 2004.
[6] P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR, 3:463-482, 2002.
[7] A. Ben-Hur and W. S. Noble. Kernel methods for predicting protein-protein interactions. Bioinformatics, 21, January 2005.
[8] C. Cortes, M. Mohri, and A. Rostamizadeh. Generalization bounds for learning kernels. In ICML, 2010.
[9] A. K. Fletcher and S. Rangan. Orthogonal matching pursuit from noisy measurements: A new analysis. In NIPS, 2009.
[10] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. lp-norm multiple kernel learning. JMLR, 12:953-997, 2011.
[11] M. Kloft, U. Rückert, and P. Bartlett. A unifying view of multiple kernel learning. In European Conference on Machine Learning (ECML), 2010.
[12] V. Koltchinskii and M. Yuan. Sparsity in multiple kernel learning. The Annals of Statistics, 38(6):3660-3695, 2010.
[13] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res., 5:27-72, December 2004.
[14] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20, November 2004.
[15] A. C. Lozano and V. Sindhwani. Block variable selection in multivariate regression and high-dimensional causal inference. In NIPS, 2010.
[16] A. C. Lozano, G. Swirszcz, and N. Abe. Group orthogonal matching pursuit for variable selection and prediction. In NIPS, 2009.
[17] C. Micchelli and M. Pontil. Learning the kernel function via regularization. JMLR, 6:1099-1125, 2005.
[18] P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) (JRSSB), 71(5):1009-1030, 2009.
[19] P. Pavlidis, J. Cai, J. Weston, and W. S. Noble. Learning gene functional classifications from multiple data types. Journal of Computational Biology, 9:401-411, 2002.
[20] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491-2521, 2008.
[21] G. Raskutti, M. Wainwright, and B. Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Technical Report 795, Statistics Department, UC Berkeley, 2010.
[22] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.
[23] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[24] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. J. Mach. Learn. Res., 7, December 2006.
[25] T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. Computing Research Repository, 2010.
[26] R. Tomioka and T. Suzuki. Sparsity-accuracy trade-off in MKL. In NIPS Workshop: Understanding Multiple Kernel Learning Methods. Technical report, arXiv:1001.2615v1, 2010.
[27] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231-2242, 2004.
[28] P. Vincent and Y. Bengio. Kernel matching pursuit. Machine Learning, 48:165-188, 2002.
[29] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu. Simple and efficient multiple kernel learning by group lasso. In ICML, 2010.
[30] M. Yuan, A. Ekici, Z. Lu, and R. Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society, Series B, 69(3):329-346, 2007.
[31] T. Zhang. On the consistency of feature selection using greedy least squares regression. J. Mach. Learn. Res., 10, June 2009.
[32] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2):301-320, 2005.
[33] A. Zien and C. S. Ong. Multiclass multiple kernel learning. In ICML, 2007.


More information

Lasso and probabilistic inequalities for multivariate point processes

Lasso and probabilistic inequalities for multivariate point processes Submitted to the Bernoui arxiv: arxiv:128.57 Lasso and probabiistic inequaities for mutivariate point processes NIELS RICHARD HANSEN 1, PATRICIA REYNAUD-BOURET 2 and VINCENT RIVOIRARD 3 1 Department of

More information

arxiv: v2 [stat.ml] 17 Mar 2015

arxiv: v2 [stat.ml] 17 Mar 2015 Journa of Machine Learning Research 1x (201x) x-xx Submitted x/0x; Pubished x/0x A Unifying Framework in Vector-vaued Reproducing Kerne Hibert Spaces for Manifod Reguarization and Co-Reguarized Muti-view

More information

Bayesian Unscented Kalman Filter for State Estimation of Nonlinear and Non-Gaussian Systems

Bayesian Unscented Kalman Filter for State Estimation of Nonlinear and Non-Gaussian Systems Bayesian Unscented Kaman Fiter for State Estimation of Noninear and Non-aussian Systems Zhong Liu, Shing-Chow Chan, Ho-Chun Wu and iafei Wu Department of Eectrica and Eectronic Engineering, he University

More information

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1

Another Look at Linear Programming for Feature Selection via Methods of Regularization 1 Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007

More information

Adaptive Localization in a Dynamic WiFi Environment Through Multi-view Learning

Adaptive Localization in a Dynamic WiFi Environment Through Multi-view Learning daptive Locaization in a Dynamic WiFi Environment Through Muti-view Learning Sinno Jiain Pan, James T. Kwok, Qiang Yang, and Jeffrey Junfeng Pan Department of Computer Science and Engineering Hong Kong

More information

On the Goal Value of a Boolean Function

On the Goal Value of a Boolean Function On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor

More information

Consistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems

Consistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems Consistent inguistic fuzzy preference reation with muti-granuar uncertain inguistic information for soving decision making probems Siti mnah Binti Mohd Ridzuan, and Daud Mohamad Citation: IP Conference

More information

arxiv: v2 [cs.lg] 4 Sep 2014

arxiv: v2 [cs.lg] 4 Sep 2014 Cassification with Sparse Overapping Groups Nikhi S. Rao Robert D. Nowak Department of Eectrica and Computer Engineering University of Wisconsin-Madison nrao2@wisc.edu nowak@ece.wisc.edu ariv:1402.4512v2

More information

arxiv: v2 [stat.ml] 19 Oct 2016

arxiv: v2 [stat.ml] 19 Oct 2016 Sparse Quadratic Discriminant Anaysis and Community Bayes arxiv:1407.4543v2 [stat.ml] 19 Oct 2016 Ya Le Department of Statistics Stanford University ye@stanford.edu Abstract Trevor Hastie Department of

More information

The Binary Space Partitioning-Tree Process Supplementary Material

The Binary Space Partitioning-Tree Process Supplementary Material The inary Space Partitioning-Tree Process Suppementary Materia Xuhui Fan in Li Scott. Sisson Schoo of omputer Science Fudan University ibin@fudan.edu.cn Schoo of Mathematics and Statistics University of

More information

Distributed average consensus: Beyond the realm of linearity

Distributed average consensus: Beyond the realm of linearity Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,

More information

Learning Gaussian Processes from Multiple Tasks

Learning Gaussian Processes from Multiple Tasks Kai Yu kai.yu@siemens.com Information and Communication, Corporate Technoogy, Siemens AG, Munich, Germany Voker Tresp voker.tresp@siemens.com Information and Communication, Corporate Technoogy, Siemens

More information

Data Mining Technology for Failure Prognostic of Avionics

Data Mining Technology for Failure Prognostic of Avionics IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA

More information

On a geometrical approach in contact mechanics

On a geometrical approach in contact mechanics Institut für Mechanik On a geometrica approach in contact mechanics Aexander Konyukhov, Kar Schweizerhof Universität Karsruhe, Institut für Mechanik Institut für Mechanik Kaiserstr. 12, Geb. 20.30 76128

More information

Efficient Similarity Search across Top-k Lists under the Kendall s Tau Distance

Efficient Similarity Search across Top-k Lists under the Kendall s Tau Distance Efficient Simiarity Search across Top-k Lists under the Kenda s Tau Distance Koninika Pa TU Kaisersautern Kaisersautern, Germany pa@cs.uni-k.de Sebastian Miche TU Kaisersautern Kaisersautern, Germany smiche@cs.uni-k.de

More information

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University

Turbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver

More information

A Novel Learning Method for Elman Neural Network Using Local Search

A Novel Learning Method for Elman Neural Network Using Local Search Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

Statistical Inference, Econometric Analysis and Matrix Algebra

Statistical Inference, Econometric Analysis and Matrix Algebra Statistica Inference, Econometric Anaysis and Matrix Agebra Bernhard Schipp Water Krämer Editors Statistica Inference, Econometric Anaysis and Matrix Agebra Festschrift in Honour of Götz Trenker Physica-Verag

More information