Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
Vikas Sindhwani and Aurélie C. Lozano
IBM T.J. Watson Research Center, Yorktown Heights, NY 10598

Abstract

We consider regularized risk minimization in a large dictionary of Reproducing Kernel Hilbert Spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the non-parametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely convex relaxations using the ℓ1 norm (e.g., Lasso) and greedy methods (e.g., OMP), have both been rigorously extended for group sparsity, the sparse MKL literature has so far mainly adopted the former with mild empirical success. In this paper, we close this gap by proposing a Group-OMP based framework for sparse MKL. Unlike ℓ1-MKL, our approach decouples the sparsity regularizer (via a direct ℓ0 constraint) from the smoothness regularizer (via RKHS norms), which leads to better empirical performance and a simpler optimization procedure that only requires a black-box single-kernel solver. The algorithmic development and empirical studies are complemented by theoretical analyses in terms of Rademacher generalization bounds and sparse recovery conditions analogous to those for OMP [27] and Group-OMP [16].

1 Introduction

Kernel methods are widely used to address a variety of learning problems including classification, regression, structured prediction, data fusion, clustering and dimensionality reduction [22, 23]. However, choosing an appropriate kernel and tuning the corresponding hyper-parameters can be highly challenging, especially when little is known about the task at hand. In addition, many modern problems involve multiple heterogeneous data sources (e.g., gene functional classification, prediction of protein-protein interactions), each necessitating the use of a different kernel.
This strongly suggests avoiding the risks and limitations of single kernel selection by considering flexible combinations of multiple kernels. Furthermore, it is appealing to impose sparsity to discard noisy data sources. As several papers have provided evidence in favor of using multiple kernels (e.g., [19, 14, 7]), the multiple kernel learning (MKL) problem has generated a large body of recent work [13, 5, 24, 33], and become the focal point of the intersection between non-parametric function estimation and sparse learning methods traditionally explored in linear settings. Given a convex loss function, the MKL problem is usually formulated as the minimization of empirical risk together with a mixed norm regularizer, e.g., the square of the sum of individual RKHS norms, or variants thereof, that have a close relationship to the Group Lasso criterion [30, 2]. Equivalently, this formulation may be viewed as simultaneous optimization of both the non-negative convex combination of kernels and the prediction functions induced by this combined kernel. In constraining the combination of kernels, the ℓ1 penalty is of particular interest as it encourages sparsity in the supporting kernels, which is highly desirable when the number of kernels considered is large.

The MKL literature has rapidly evolved along two directions: one concerns scalability of optimization algorithms beyond the early pioneering proposals based on semi-definite programming or second-order cone programming [13, 5] to simpler and more efficient alternating optimization schemes [20, 29, 24]; the other concerns the use of ℓp norms [10, 29] to construct complex non-sparse kernel combinations with the goal of outperforming ℓ1-norm MKL which, as reported in several papers, has demonstrated mild success in practical applications.

The class of Orthogonal Matching Pursuit (OMP) techniques has recently received considerable attention as a competitive alternative to the Lasso. The basic OMP algorithm originates from the signal-processing community and is similar to forward greedy feature selection, except that it re-estimates the model parameters in each iteration, which has been shown to contribute to improved accuracy. For linear models, strong theoretical performance guarantees and empirical support have been provided for OMP [31] and its extension to variable group selection, Group-OMP [16]. In particular, it was shown in [25, 9] that OMP and Lasso exhibit competitive theoretical performance guarantees. It is therefore desirable to investigate the use of Matching Pursuit techniques in the MKL framework and whether one may be able to improve upon existing MKL methods.

Our contributions in this paper are as follows. We propose a non-parametric kernel-based extension to Group-OMP [16]. In terms of the feature space (as opposed to function space) perspective of kernel methods, this allows Group-OMP to handle groups that can potentially contain infinitely many features. By adding regularization to Group-OMP, we allow it to handle settings where the sample size might be smaller than the number of features in any group. Rather than imposing a mixed ℓ1/RKHS-norm regularizer as in group-Lasso based MKL, a Group-OMP based approach allows us to consider the exact sparse kernel selection problem via ℓ0 regularization instead.
Note that in contrast to the group-Lasso penalty, the ℓ0 penalty by itself has no effect on the smoothness of each individual component. This allows for a clear decoupling between the role of the smoothness regularizer (namely, an RKHS regularizer) and the sparsity regularizer (via the ℓ0 penalty). Our greedy algorithms allow for simple and flexible optimization schemes that only require a black-box solver for standard learning algorithms. In this paper, we focus on multiple kernel learning with Regularized Least Squares (RLS). We provide a bound on the Rademacher complexity of the hypothesis sets considered by our formulation. We derive conditions analogous to those for OMP [27] and Group-OMP [16] that guarantee the correctness of kernel selection. We close this paper with empirical studies on simulated and real-world datasets that confirm the value of our methods.

2 Learning Over an RKHS Dictionary

In this section, we set up some notation and give a brief background before introducing our main objective function and describing our algorithm in the next section. Let H_1 ... H_N be a collection of Reproducing Kernel Hilbert Spaces with associated kernel functions k_1 ... k_N defined on the input space X ⊂ R^d. Let H denote the sum space of functions,

H = H_1 ⊕ H_2 ⊕ ... ⊕ H_N = { f : X → R | f(x) = Σ_{j=1}^N f_j(x), x ∈ X, f_j ∈ H_j, j = 1...N }

Let us equip this space with the following ℓp norms,

‖f‖_{ℓp(H)} = inf { ( Σ_{j=1}^N ‖f_j‖_{H_j}^p )^{1/p} : f(x) = Σ_{j=1}^N f_j(x), x ∈ X, f_j ∈ H_j, j = 1...N }   (1)

It is now natural to consider a regularized risk minimization problem over such an RKHS dictionary, given a collection of training examples {x_i, y_i}_{i=1}^ℓ,

argmin_{f ∈ H} Σ_{i=1}^ℓ V(y_i, f(x_i)) + λ ‖f‖²_{ℓp(H)}   (2)

where V(·,·) is a convex loss function, such as the squared loss in the Regularized Least Squares (RLS) algorithm or the hinge loss in the SVM method. If this problem again has elements of an RKHS structure, then, via the Representer Theorem, it can again be reduced to a finite dimensional problem and efficiently solved.
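As a concrete illustration, the single-kernel case of problem (2) with squared loss reduces, via the Representer Theorem, to a linear system in the kernel expansion coefficients. A minimal sketch (the Gaussian kernel, the data, and the value of λ are illustrative choices, not from the paper); it also checks the closed-form objective value λyᵀ(K + λI)⁻¹y that is used later for kernel scoring:

```python
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    # Gram matrix of k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2))
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def rls_fit(K, y, lam):
    # Representer theorem reduces (2) to a linear system: alpha = (K + lam*I)^{-1} y
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
lam = 0.1

K = gaussian_kernel(X, X)
alpha = rls_fit(K, y, lam)
y_hat = K @ alpha                  # predictions f(x_i) = sum_j alpha_j k(x_i, x_j)

# The achieved objective equals the closed form lam * y^T (K + lam*I)^{-1} y
obj = np.sum((y - y_hat) ** 2) + lam * alpha @ K @ alpha
closed_form = lam * y @ np.linalg.solve(K + lam * np.eye(50), y)
assert np.allclose(obj, closed_form)
```

Here λ‖f‖² is computed as λαᵀKα, the squared RKHS norm of the representer expansion; the final assertion verifies the identity algebraically derived from α = (K + λI)⁻¹y.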
Let q = p/(2−p) and let us define the q-convex hull of the set of kernel functions to be

co_q(k_1 ... k_N) = { k_γ : X × X → R | k_γ(x,z) = Σ_{j=1}^N γ_j k_j(x,z), Σ_{j=1}^N γ_j^q = 1, γ_j ≥ 0 }

where γ ∈ R^N. It is easy to see that the non-negative combination of kernels, k_γ, is itself a valid kernel with an associated RKHS H_{k_γ}. With this definition, [17] show the following:

‖f‖_{ℓp(H)} = inf_γ { ‖f‖_{H_{k_γ}} : k_γ ∈ co_q(k_1 ... k_N) }   (3)

This relationship connects Tikhonov regularization with ℓp norms over H to regularization over RKHSs parameterized by the kernel functions k_γ. This leads to a large family of multiple kernel learning algorithms (whose variants are also sometimes referred to as ℓq-MKL) where the basic idea is to solve the equivalent problem

argmin_{f ∈ H_{k_γ}, γ ∈ Δ_q} Σ_{i=1}^ℓ V(y_i, f(x_i)) + λ ‖f‖²_{H_{k_γ}}   (4)

where Δ_q = { γ ∈ R^N : ‖γ‖_q = 1, γ_j ≥ 0 }. For a fixed γ, the optimization over f ∈ H_{k_γ} is recognizable as an RKHS problem for which a standard black-box solver may be used. The weights γ may then be optimized in an alternating minimization scheme, although several other optimization procedures may also be used (see, e.g., [4]). The case p = 1 is of particular interest in the setting where the size of the RKHS dictionary is large but the unknown target function can be approximated in a much smaller number of RKHSs. This leads to a large family of sparse multiple kernel learning algorithms that have a strong connection to the Group Lasso [2, 20, 29].

3 Multiple Kernel Learning with Group Orthogonal Matching Pursuit

Let us recall the ℓ0 pseudo-norm, which is the cardinality of the sparsest representation of f in the dictionary: ‖f‖_{ℓ0(H)} = min{ |J| : f = Σ_{j∈J} f_j }. We now pose the following exact sparse kernel selection problem,

argmin_{f ∈ H} Σ_{i=1}^ℓ V(y_i, f(x_i)) + λ ‖f‖²_{ℓ2(H)}  subject to  ‖f‖_{ℓ0(H)} ≤ s   (5)

It is important to note the following: when using a dictionary of universal kernels, e.g., Gaussian kernels with different bandwidths, the presence of the regularization term ‖f‖²_{ℓ2(H)} is critical (i.e., λ > 0), since otherwise the labeled data can be perfectly fit by any single kernel.
In other words, the kernel selection problem is ill-posed. While conceptually simple, our formulation is quite different from those proposed earlier, since the role of a smoothness regularizer (via the ‖f‖²_{ℓ2(H)} penalty) is decoupled from the role of a sparsity regularizer (via the constraint ‖f‖_{ℓ0(H)} ≤ s). Moreover, the latter is imposed directly, as opposed to through a p = 1 penalty, making the spirit of our approach closer to Group Orthogonal Matching Pursuit (Group-OMP [16]), where groups are formed by the very high-dimensional (infinite for Gaussian kernels) feature spaces associated with the kernels. It has been observed in recent work [10, 29] on ℓ1-MKL that sparsity alone does not lead to improvements in real-world empirical tasks, and hence several methods have been proposed to explore ℓq-norm MKL with q > 1 in Eqn. (4), making MKL depart from sparsity in kernel combinations. By contrast, we note that as q → ∞, p → 2. Our approach gives a direct knob on both smoothness (via λ) and sparsity (via s), with a solution path along these dimensions that differs from the one offered by Group-Lasso based ℓq-MKL as q is varied. By combining the ℓ0 pseudo-norm with RKHS norms, our method is conceptually reminiscent of the elastic net [32] (also see [26, 12, 21]). If kernels arise from different subsets of input variables, our approach is also related to sparse additive models [18]. Our algorithm, MKL-GOMP, is outlined below for regularized least squares. Extensions for other loss functions, e.g., hinge loss for SVMs, can be similarly derived. In the description of the algorithm, our notation is as follows: for any function f belonging to an RKHS F_k with kernel function k(·,·), we denote the regularized objective function as

R_λ(f, y) = Σ_{i=1}^ℓ (y_i − f(x_i))² + λ ‖f‖²_{F_k}
where ‖·‖_{F_k} denotes the RKHS norm. Recall that the minimizer f* = argmin_{f ∈ F_k} R_λ(f, y) is given by solving the linear system α = (K + λI)^{−1} y, where K is the Gram matrix of the kernel on the labeled data, and by setting f*(x) = Σ_{i=1}^ℓ α_i k(x, x_i). Moreover, the objective value achieved by the minimizer is R_λ(f*, y) = λ yᵀ (K + λI)^{−1} y. Note that MKL-GOMP should not be confused with Kernel Matching Pursuit [28], whose goal is different: it is designed to sparsify α in a single-kernel setting.

The MKL-GOMP procedure iteratively expands the hypothesis space, H_{G^(1)} ⊂ H_{G^(2)} ⊂ ... ⊂ H_{G^(i)}, by greedily selecting kernels from a given dictionary, where G^(i) ⊂ {1...N} is a subset of indices and H_G = ⊕_{j∈G} H_j. Note that each H_G is an RKHS with kernel k_G = Σ_{j∈G} k_j (see Section 6 in [1]). The selection criterion is the best improvement, I(f^(i), H_j), given by a new hypothesis space H_j in reducing the norm of the current residual r^(i) = y − f^(i), where f^(i) = [f^(i)(x_1) ... f^(i)(x_ℓ)]ᵀ, by finding the best regularized (smooth) approximation. Note that since min_{g ∈ H_j} R_λ(g, r) ≤ R_λ(0, r) = ‖r‖², the value of the improvement function,

I(f^(i), H_j) = ‖r^(i)‖²₂ − min_{g ∈ H_j} R_λ(g, r^(i))

is always non-negative. Once a kernel is selected, the function is re-estimated by learning in H_{G^(i)}. Note that since H_G is an RKHS whose kernel function is the sum Σ_{j∈G} k_j, we can use a simple RLS linear system solver for refitting. Unlike group-Lasso based MKL, we do not need an iterative kernel reweighting step, which essentially arises as a mechanism to transform the less convenient group sparsity norms into reweighted squared RKHS norms. MKL-GOMP converges when the best improvement is no better than ε (or, in practice, when a maximum allowed number of s kernels has been selected).

Input: Data matrix X = [x_1 ... x_ℓ]ᵀ, label vector y ∈ R^ℓ, kernel dictionary {k_j(·,·)}_{j=1}^N, precision ε > 0, maximum sparsity s
Output: Selected kernels G^(i) and a function f^(i) ∈ H_{G^(i)}
Initialization: G^(0) = ∅, f^(0) = 0, set residual r^(0) = y
for i = 0, 1, 2, ..., s:
1. Kernel Selection: For all j ∉ G^(i), set
   I(f^(i), H_j) = ‖r^(i)‖²₂ − min_{g ∈ H_j} R_λ(g, r^(i)) = r^(i)ᵀ ( I − λ(K_j + λI)^{−1} ) r^(i)
   Pick j^(i) = argmax_{j ∉ G^(i)} I(f^(i), H_j)
2. Convergence Check: if I(f^(i), H_{j^(i)}) ≤ ε, return f^(i)
3. Refitting: Set G^(i+1) = G^(i) ∪ {j^(i)}. Set f^(i+1)(x) = Σ_{j=1}^ℓ α_j k(x, x_j), where k = Σ_{j ∈ G^(i+1)} k_j and α = ( Σ_{j ∈ G^(i+1)} K_j + λI )^{−1} y
4. Update Residual: r^(i+1) = y − f^(i+1), where f^(i+1) = [f^(i+1)(x_1) ... f^(i+1)(x_ℓ)]ᵀ.

Remarks: Our algorithm can be applied to multivariate problems with group structure among outputs, similar to Multivariate Group-OMP [15]. In particular, in our experiments on multiclass datasets, we treat all outputs as a single group and evaluate each kernel for selection based on how well the total residual is reduced across all outputs simultaneously. Kernel matrices are normalized to unit trace or to have uniform variance of data points in their associated feature spaces, as in [10, 33]. In practice, we can also monitor error on a validation set to decide the optimal degree of sparsity. For efficiency, we can precompute the matrices Q_j = (I − λ(K_j + λI)^{−1})^{1/2} so that I(f^(i), H_j) = ‖Q_j r‖²₂ can be very quickly evaluated at selection time, and/or reduce the search space by considering a random subsample of the dictionary.

4 Theoretical Analysis

Our analysis is composed of two parts. In the first part, we establish generalization bounds for the hypothesis spaces considered by our formulation, based on the notion of Rademacher complexity. The second component of our theoretical analysis consists of deriving conditions under which MKL-GOMP can recover good solutions. While the first part can be seen as characterizing the statistical convergence of our method, the second part characterizes its numerical convergence as an optimization method, and is required to complement the first part. This is because matching pursuit methods can be deemed to solve an exact sparse problem approximately, while regularized methods (e.g., ℓ1-norm MKL) solve an approximate problem exactly. We therefore need to show that MKL-GOMP recovers a solution that is close to an optimum solution of the exact sparse problem.

4.1 Rademacher Bounds

Theorem 1. Consider the hypothesis space of sufficiently sparse and smooth functions,

H_{τ,s} = { f ∈ H : ‖f‖²_{ℓ2(H)} ≤ τ, ‖f‖_{ℓ0(H)} ≤ s }

Let δ ∈ (0,1) and κ = sup_{x ∈ X, j=1...N} k_j(x,x). Let ρ be any probability distribution on (x,y) ∈ X × R satisfying |y| ≤ M almost surely, and let {x_i, y_i}_{i=1}^ℓ be randomly sampled according to ρ. Define f̂ = argmin_{f ∈ H_{τ,s}} Σ_{i=1}^ℓ (y_i − f(x_i))² to be the empirical risk minimizer and f* = argmin_{f ∈ H_{τ,s}} R(f) the true risk minimizer in H_{τ,s}, where R(f) = E_{(x,y)∼ρ} (y − f(x))² denotes the true risk. Then, with probability at least 1 − δ over random draws of samples of size ℓ,

R(f̂) ≤ R(f*) + 8L √(sκτ/ℓ) + 4L² √( log(3/δ) / (2ℓ) )

where L = (M + √(sκτ)) bounds |y − f(x)|.

The proof is given in the supplementary material, but can also be reasoned as follows. In the standard single-RKHS case, the Rademacher complexity can be upper bounded by a quantity proportional to the square root of the trace of the Gram matrix, which is further upper bounded in terms of κℓ. In our case, any collection of s-sparse functions from a dictionary of N RKHSs reduces to a single RKHS whose kernel is the sum of s base kernels, and hence the corresponding trace can be bounded by sκℓ for all possible subsets of size s.
Once it is established that the empirical Rademacher complexity of H_{τ,s} is upper bounded by √(sκτ/ℓ), the generalization bound follows from well-known results [6] tailored to regularized least squares regression with a bounded target variable. For ℓ1-norm MKL, in the context of margin-based loss functions, Cortes et al., 2010 [8] bound the Rademacher complexity as

√( c e ⌈log N⌉ κ τ / ℓ )   (6)

where ⌈·⌉ is the ceiling function that rounds to the next integer, e is the base of the natural logarithm, and c = 23/22. Using VC-based lower-bound arguments, they point out that the log(N) dependence on N is essentially optimal. By contrast, our greedy approach with sequential regularized risk minimization imposes direct control over the degree of sparsity as well as smoothness, and hence the Rademacher complexity in our case is independent of N. If s = O(log N), the bounds are similar. A critical difference between ℓ1-norm MKL and sparse greedy approximations, however, is that the former is convex and hence the empirical risk can be minimized exactly in the hypothesis space whose complexity is bounded by Rademacher analysis. This is not true in our case, and therefore, to complement the Rademacher analysis, we need conditions under which good solutions can be recovered.

4.2 Exact Recovery Conditions in Noiseless Settings

We now assume that the regression function f_ρ(x) = ∫ y dρ(y|x) is sparse, i.e., f_ρ ∈ H_{G_good} for some subset G_good of s good kernels, and that it is sufficiently smooth in the sense that for some λ > 0, given sufficient samples, the empirical minimizer f̂ = argmin_{f ∈ H_{G_good}} R_λ(f, y) gives near-optimal generalization as per Theorem 1. In this section our main concern is to characterize Group-OMP-like conditions under which MKL-GOMP will be able to learn f̂ by recovering the support G_good exactly. Recall our notation that k_{G_good} = Σ_{j ∈ G_good} k_j is the kernel associated with H_{G_good}. Note that Tikhonov regularization using a penalty term λ‖·‖² and Ivanov regularization using a ball constraint ‖·‖² ≤ τ return identical solutions for some one-to-one correspondence between λ and τ.
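Before stating the recovery condition, it is worth noting that the matrices Q_j = (I − λ(K_j + λI)^{−1})^{1/2} used for fast evaluation of the improvement function satisfy I(f, H_j) = ‖Q_j r‖²₂, and reappear in the coherence matrix below. This identity can be checked numerically; a minimal sketch with a synthetic Gram matrix (sizes and λ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
l, lam = 30, 0.5

# Synthetic PSD Gram matrix K_j and a residual vector r
A = rng.normal(size=(l, l))
K = A @ A.T / l
r = rng.normal(size=l)

M = np.eye(l) - lam * np.linalg.inv(K + lam * np.eye(l))   # I - lam*(K + lam*I)^{-1}

# Improvement I(f, H_j) = ||r||^2 - min_g R_lam(g, r) = r^T M r
improvement = r @ M @ r

# Closed-form RLS minimum: min_g R_lam(g, r) = lam * r^T (K + lam*I)^{-1} r
min_obj = lam * r @ np.linalg.solve(K + lam * np.eye(l), r)
assert np.allclose(improvement, r @ r - min_obj)

# Q_j = M^{1/2} via eigendecomposition (M is PSD with eigenvalues in [0, 1))
w, V = np.linalg.eigh(M)
Q = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
assert np.allclose(improvement, np.linalg.norm(Q @ r) ** 2)
assert improvement >= 0
```

Since M = K(K + λI)^{−1} is symmetric positive semi-definite, its square root is well defined and the improvement function is guaranteed to be non-negative, as claimed in Section 3.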
Let us denote by r^(i) = f̂ − f^(i) the residual function at step i of the algorithm. Initially, r^(0) = f̂ ∈ H_{G_good}. In fact, by the Representer Theorem, r^(0) = f̂ ∈ Ĥ_{G_good} ⊂ H_{G_good}, where we use the notation Ĥ_{G_good} = span{ k_{G_good}(x_i, ·), i = 1...ℓ }. Our argument is inductive: if at any step i, r^(i) ∈ Ĥ_{G_good} and, under this assumption, we can always guarantee that (a) max_{j ∈ G_good} I(f^(i), H_j) > max_{j ∉ G_good} I(f^(i), H_j), i.e., a good kernel offers better greedy improvement and is therefore selected, and (b) after refitting, the new residual satisfies r^(i+1) ∈ Ĥ_{G_good}, then by induction it is clear that the algorithm correctly expands the hypothesis space and never makes a mistake. Without loss of generality, let us rearrange the dictionary so that G_good = {1...s}. For any function f ∈ Ĥ_{G_good}, we now wish to derive an upper bound of the form

max_{j ∉ G_good} I(f, H_j) ≤ μ_H(G_good)² · max_{j ∈ G_good} I(f, H_j)   (7)

Clearly, a sufficient condition for exact recovery is μ_H(G_good) < 1. We need some notation to state our main result. Let s = |G_good|, i.e., the number of good kernels. For any matrix A ∈ R^{ℓs × ℓ(N−s)}, let ‖A‖_{(2,1)} denote the matrix norm induced by the following vector norms: for any vector u = [u_1 ... u_s] ∈ R^{ℓs}, define ‖u‖_{(2,1)} = Σ_{i=1}^s ‖u_i‖_2; similarly, for any vector v = [v_1 ... v_{N−s}] ∈ R^{ℓ(N−s)}, define ‖v‖_{(2,1)} = Σ_{i=1}^{N−s} ‖v_i‖_2. Then ‖A‖_{(2,1)} = sup_{v ∈ R^{ℓ(N−s)}} ‖Av‖_{(2,1)} / ‖v‖_{(2,1)}. We can now state the following:

Theorem 2. Given the kernel dictionary {k_j(·,·)}_{j=1}^N with associated Gram matrices {K_j}_{j=1}^N over the labeled data, MKL-GOMP correctly recovers the good kernels, i.e., G^(s) = G_good, if

μ_H(G_good) = ‖ C_{λ,H}(G_good) ‖_{(2,1)} < 1

where C_{λ,H}(G_good) ∈ R^{ℓs × ℓ(N−s)} is a coherence matrix whose (i,j)-th block of size ℓ × ℓ, i ∈ G_good, j ∉ G_good, is given by

C_{λ,H}(G_good)_{i,j} = K_{G_good} Q_i ( Σ_{k ∈ G_good} Q_k K²_{G_good} Q_k )^{−1} Q_j K_{G_good}   (8)

where K_{G_good} = Σ_{j ∈ G_good} K_j and Q_j = (I − λ(K_j + λI)^{−1})^{1/2}, j = 1...N. The proof is given in the supplementary material.
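To illustrate what recovery of G_good looks like in practice, the MKL-GOMP loop can be run on a toy dictionary in which the target depends on only a few kernels. A minimal sketch (the data-generating process, dictionary of per-feature linear kernels, λ, and sizes are illustrative assumptions, not the paper's experimental protocol):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 60, 8, 0.1

# Toy dictionary: one linear kernel per feature; target uses features 0 and 1 only
X = rng.normal(size=(n, d))
kernels = [np.outer(X[:, j], X[:, j]) for j in range(d)]
kernels = [K / np.trace(K) for K in kernels]          # unit-trace normalization
y = X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.normal(size=n)
G_good = {0, 1}

def improvement(K, r, lam):
    # I(f, H_j) = r^T (I - lam*(K_j + lam*I)^{-1}) r
    return r @ (r - lam * np.linalg.solve(K + lam * np.eye(len(r)), r))

G, r = [], y.copy()
for _ in range(len(G_good)):                          # run s = |G_good| steps
    scores = [-np.inf if j in G else improvement(kernels[j], r, lam)
              for j in range(d)]
    j_best = int(np.argmax(scores))                   # greedy kernel selection
    G.append(j_best)
    K_G = sum(kernels[j] for j in G)                  # refit in the sum-kernel RKHS
    alpha = np.linalg.solve(K_G + lam * np.eye(n), y)
    r = y - K_G @ alpha                               # update residual

print(sorted(G))
```

On a draw like this the greedy steps should tend to pick out the two informative kernels first, since each rank-one kernel's improvement reduces to a squared correlation with the residual; whether recovery is guaranteed in general is exactly what the coherence condition of Theorem 2 governs.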
This result is analogous to sparse recovery conditions for OMP and ℓ1 methods and their (linear) group counterparts. In the noiseless setting, Tropp [27] gives an exact recovery condition of the form ‖X_good⁺ X_bad‖_1 < 1, where X_good and X_bad refer to the restriction of the data matrix to good and bad features, X_good⁺ denotes the pseudo-inverse of X_good, and ‖·‖_1 refers to the induced matrix norm. Intriguingly, the same paper shows that this condition is also sufficient for the Basis Pursuit ℓ1 minimization problem. For Group-OMP [16], the condition generalizes to involve a group-sensitive matrix norm on the same matrix objects. Likewise, Bach [2] generalizes the Lasso variable selection consistency conditions to apply to Group Lasso and then further to non-parametric ℓ1-MKL. The above result is similar in spirit. A stronger sufficient condition can be derived by requiring ‖Q_j K_{G_good}‖_2 to be sufficiently small for all j ∉ G_good. Intuitively, this means that smooth functions in H_{G_good} cannot be well approximated by smooth functions induced by the bad kernels, so that MKL-GOMP is never led to making a mistake.

5 Empirical Studies

We report empirical results on a collection of simulated datasets and 3 classification problems from computational cell biology. In all experiments, as in [10, 33], candidate kernels are normalized multiplicatively to have uniform variance of data points in their associated feature spaces.

5.1 Adaptability to Data Sparsity - Simulated Setting

We adapt the experimental setting proposed by [10], where the sparsity of the target function is explicitly controlled, and the optimal subset of kernels is varied from requiring the entire dictionary to requiring a single kernel.

[Figure 1: Simulated Setting: Adaptability to Data Sparsity. Three panels plotted against v(θ) = fraction of noise kernels [in %]: test error (ℓ1-norm MKL, ℓ4/3-norm MKL, ℓ2-norm MKL, ℓ4-norm MKL, ℓ∞-norm MKL (=RLS), MKL-GOMP, Bayes error), % of kernels selected, and value of λ.]

Our goal is to study the solution paths offered by MKL-GOMP in comparison to ℓq-norm MKL. For consistency, we use squared loss in all experiments². We implemented ℓq-norm MKL for regularized least squares (RLS) using an alternating minimization scheme adapted from [17, 29]. Different binary classification datasets³ with 50 labeled examples are randomly generated by sampling the two classes from 50-dimensional isotropic Gaussian distributions with equal covariance matrices (identity) and equal but opposite means, μ_1 = 1.75 θ/‖θ‖_2 and μ_2 = −μ_1, where θ is a binary vector encoding the true underlying sparsity. The fraction of zero components in θ is a measure of the feature sparsity of the learning problem. For each dataset, a linear kernel (normalized as in [10]) is generated from each feature and the resulting dictionary is input to MKL-GOMP and ℓq-norm MKL. For each level of sparsity, a training set of size 50 and validation and test sets of size 10000 are generated 10 times and average classification errors are reported. For each run, the validation error is monitored as kernel selection progresses in MKL-GOMP and the number of kernels with smallest validation error is chosen. The regularization parameters for both MKL-GOMP and ℓq-norm MKL are similarly chosen using the validation set. Figure 1 shows test error rates as a function of sparsity of the target function: from non-sparse (all kernels needed) to extremely sparse (only 1 kernel needed). We recover the observations also made in [10]: ℓ1-norm MKL excels in extremely sparse settings where a single kernel carries the whole discriminative information of the learning problem.
However, in the other scenarios it mostly performs worse than the other q > 1 variants, despite the fact that the vector θ remains sparse in all but the uniform scenario. As q is increased, the error rate in these settings improves but deteriorates in sparse settings. As reported in [11], the elastic net MKL approach of [26] performs similarly to ℓ1-MKL in the hinge loss case. As can be seen in the figure, the error curve of MKL-GOMP tends to be below the lower envelope of the error rates given by ℓq-MKL solutions. To adapt to the sparsity of the problem, ℓq methods clearly need to tune q, requiring several fresh invocations of the appropriate ℓq-MKL solver. On the other hand, in MKL-GOMP the hypothesis space grows as a function of the iteration number and the solution trajectory naturally expands sequentially in the direction of decreasing sparsity. The right plot in Figure 1 shows the number of kernels selected by MKL-GOMP and the optimal value of λ, suggesting that MKL-GOMP adapts to the sparsity and smoothness of the learning problem.

² ℓq-MKL with the SVM hinge loss behaves similarly.
³ Provided by the authors of [10] at mldata.org/repository/data/viewslug/mkl-toy/

5.2 Protein Subcellular Localization

The multiclass generalization of ℓ1-MKL proposed in [33] (MCMKL) is state-of-the-art methodology for predicting protein subcellular localization, an important cell biology problem that concerns the estimation of where a protein resides in a cell so that, for example, the identification of drug targets can be aided. We use three multiclass datasets: PSORT+, PSORT− and PLANT, provided by the authors of [33], together with a dictionary of 69 kernels derived with biological insight: 2 kernels on phylogenetic
trees, 3 kernels based on similarity to known proteins (BLAST E-values), and 64 kernels based on amino-acid sequence patterns. The statistics of the three datasets are as follows: PSORT+ has 541 proteins labeled with 4 location classes, PSORT− has 1444 proteins in 5 classes, and PLANT is a 4-class problem with 940 proteins. For each dataset, results are averaged over 10 splits of the dataset into training and test sets. We used exactly the same experimental protocol, data splits and evaluation methodology as given in [33]: the hyper-parameters of MKL-GOMP (sparsity and the regularization parameter λ) were tuned based on 3-fold cross-validation; results on PSORT+ and PSORT− are F-scores averaged over the classes, while those on PLANT are Matthews correlation coefficients.

[Figure 2: Protein Subcellular Localization Results. Performance (higher is better) of MKL-GOMP, MCMKL, the sum of all kernels, the best single kernel, and other methods on PSORT+, PSORT− and PLANT.]

Figure 2 compares MKL-GOMP against MCMKL, baselines such as using the sum of all the kernels and using the best single kernel, and results from other prediction systems proposed in the literature. As can be seen, MKL-GOMP slightly outperforms MCMKL on the PSORT+ and PSORT− datasets and is slightly worse on PLANT, where RLS with the sum of all the kernels also performs very well. On the two PSORT datasets, [33] report selecting 25 kernels using MCMKL. On the other hand, on average, MKL-GOMP selects 4 kernels on PSORT+, 5 on PSORT− and 24 kernels on PLANT. Note that MKL-GOMP is applied in multivariate mode: the kernels are selected based on their utility in reducing the total residual error across all target classes.

6 Conclusion

By proposing a Group-OMP based framework for sparse multiple kernel learning, analyzing theoretically the performance of the resulting methods in relation to the dominant convex relaxation-based approach, and demonstrating the value of our framework through extensive experimental studies, we believe greedy methods arise as a natural alternative for tackling MKL problems.
Relevant directions for future research include extending our theoretical analysis to the stochastic setting, investigating complex multivariate structures and groupings over outputs, e.g., by generalizing the multivariate version of Group-OMP [15], and extending our algorithm to incorporate interesting structured kernel dictionaries [3].

Acknowledgments: We thank Rick Lawrence, Ha Quang Minh and David Rosenberg for insightful conversations and enthusiastic support for this work.

References

[1] N. Aronszajn. Theory of reproducing kernel Hilbert spaces. Transactions of the American Mathematical Society, 68(3):337-404, 1950.
[2] F. Bach. Consistency of the group Lasso and multiple kernel learning. JMLR, 9:1179-1225, 2008.
[3] F. Bach. High-dimensional non-linear variable selection through hierarchical kernel learning. Technical report, HAL, 2009.
[4] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Optimization with sparsity-inducing penalties. Technical report, HAL, 2011.
[5] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, 2004.
[6] P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. JMLR, 3:463-482, 2002.
[7] A. Ben-Hur and W. S. Noble. Kernel methods for predicting protein-protein interactions. Bioinformatics, 21, January 2005.
[8] C. Cortes, M. Mohri, and A. Rostamizadeh. Generalization bounds for learning kernels. In ICML, 2010.
[9] A. K. Fletcher and S. Rangan. Orthogonal matching pursuit from noisy measurements: A new analysis. In NIPS, 2009.
[10] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien. lp-norm multiple kernel learning. JMLR, 12:953-997, 2011.
[11] M. Kloft, U. Rückert, and P. Bartlett. A unifying view of multiple kernel learning. In European Conference on Machine Learning (ECML), 2010.
[12] V. Koltchinskii and M. Yuan. Sparsity in multiple kernel learning. The Annals of Statistics, 38(6):3660-3695, 2010.
[13] G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. JMLR, 5:27-72, December 2004.
[14] G. R. G. Lanckriet, T. De Bie, N. Cristianini, M. I. Jordan, and W. S. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20, November 2004.
[15] A. C. Lozano and V. Sindhwani. Block variable selection in multivariate regression and high-dimensional causal inference. In NIPS, 2010.
[16] A. C. Lozano, G. Swirszcz, and N. Abe. Group orthogonal matching pursuit for variable selection and prediction. In NIPS, 2009.
[17] C. Micchelli and M. Pontil. Learning the kernel function via regularization. JMLR, 6:1099-1125, 2005.
[18] P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B (JRSSB), 71(5):1009-1030, 2009.
[19] P. Pavlidis, J. Cai, J. Weston, and W. S. Noble. Learning gene functional classifications from multiple data types. Journal of Computational Biology, 9:401-411, 2002.
[20] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491-2521, 2008.
[21] G. Raskutti, M. Wainwright, and B. Yu. Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Technical Report 795, Statistics Department, UC Berkeley, 2010.
[22] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.
[23] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
[24] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large scale multiple kernel learning. JMLR, 7, December 2006.
[25] T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. Computing Research Repository (CoRR), 2010.
[26] R. Tomioka and T. Suzuki. Sparsity-accuracy tradeoff in MKL. In NIPS Workshop: Understanding Multiple Kernel Learning Methods. Technical report, arXiv:1001.2615v1, 2010.
[27] J. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231-2242, 2004.
[28] P. Vincent and Y. Bengio. Kernel matching pursuit. Machine Learning, 48:165-188, 2002.
[29] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu. Simple and efficient multiple kernel learning by group Lasso. In ICML, 2010.
[30] M. Yuan, A. Ekici, Z. Lu, and R. Monteiro. Dimension reduction and coefficient estimation in multivariate linear regression. Journal of the Royal Statistical Society: Series B, 69(3):329-346, 2007.
[31] T. Zhang. On the consistency of feature selection using greedy least squares regression. JMLR, 10, June 2009.
[32] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B, 67(2):301-320, 2005.
[33] A. Zien and C. S. Ong. Multiclass multiple kernel learning. In ICML, 2007.
More informationAn Algorithm for Pruning Redundant Modules in Min-Max Modular Network
An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai
More informationMultilayer Kerceptron
Mutiayer Kerceptron Zotán Szabó, András Lőrincz Department of Information Systems, Facuty of Informatics Eötvös Loránd University Pázmány Péter sétány 1/C H-1117, Budapest, Hungary e-mai: szzoi@csetehu,
More informationA unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio
MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1654 March23, 1999
More informationarxiv: v1 [cs.lg] 31 Oct 2017
ACCELERATED SPARSE SUBSPACE CLUSTERING Abofaz Hashemi and Haris Vikao Department of Eectrica and Computer Engineering, University of Texas at Austin, Austin, TX, USA arxiv:7.26v [cs.lg] 3 Oct 27 ABSTRACT
More informationA. Distribution of the test statistic
A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch
More informationMARKOV CHAINS AND MARKOV DECISION THEORY. Contents
MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After
More informationScalable Spectrum Allocation for Large Networks Based on Sparse Optimization
Scaabe Spectrum ocation for Large Networks ased on Sparse Optimization innan Zhuang Modem R&D Lab Samsung Semiconductor, Inc. San Diego, C Dongning Guo, Ermin Wei, and Michae L. Honig Department of Eectrica
More informationCONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION
CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization
More informationSparse Semi-supervised Learning Using Conjugate Functions
Journa of Machine Learning Research (200) 2423-2455 Submitted 2/09; Pubished 9/0 Sparse Semi-supervised Learning Using Conjugate Functions Shiiang Sun Department of Computer Science and Technoogy East
More informationSVM-based Supervised and Unsupervised Classification Schemes
SVM-based Supervised and Unsupervised Cassification Schemes LUMINITA STATE University of Pitesti Facuty of Mathematics and Computer Science 1 Targu din Vae St., Pitesti 110040 ROMANIA state@cicknet.ro
More informationCryptanalysis of PKP: A New Approach
Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in
More informationAdaptive Regularization for Transductive Support Vector Machine
Adaptive Reguarization for Transductive Support Vector Machine Zengin Xu Custer MMCI Saarand Univ. & MPI INF Saarbrucken, Germany zxu@mpi-inf.mpg.de Rong Jin Computer Sci. & Eng. Michigan State Univ. East
More informationBayesian Learning. You hear a which which could equally be Thanks or Tanks, which would you go with?
Bayesian Learning A powerfu and growing approach in machine earning We use it in our own decision making a the time You hear a which which coud equay be Thanks or Tanks, which woud you go with? Combine
More informationStatistics for Applications. Chapter 7: Regression 1/43
Statistics for Appications Chapter 7: Regression 1/43 Heuristics of the inear regression (1) Consider a coud of i.i.d. random points (X i,y i ),i =1,...,n : 2/43 Heuristics of the inear regression (2)
More informationStochastic Variational Inference with Gradient Linearization
Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,
More informationInductive Bias: How to generalize on novel data. CS Inductive Bias 1
Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow
More informationA Simple and Efficient Algorithm of 3-D Single-Source Localization with Uniform Cross Array Bing Xue 1 2 a) * Guangyou Fang 1 2 b and Yicai Ji 1 2 c)
A Simpe Efficient Agorithm of 3-D Singe-Source Locaization with Uniform Cross Array Bing Xue a * Guangyou Fang b Yicai Ji c Key Laboratory of Eectromagnetic Radiation Sensing Technoogy, Institute of Eectronics,
More informationActive Learning & Experimental Design
Active Learning & Experimenta Design Danie Ting Heaviy modified, of course, by Lye Ungar Origina Sides by Barbara Engehardt and Aex Shyr Lye Ungar, University of Pennsyvania Motivation u Data coection
More informationLecture Note 3: Stationary Iterative Methods
MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or
More informationOptimality of Inference in Hierarchical Coding for Distributed Object-Based Representations
Optimaity of Inference in Hierarchica Coding for Distributed Object-Based Representations Simon Brodeur, Jean Rouat NECOTIS, Département génie éectrique et génie informatique, Université de Sherbrooke,
More information(This is a sample cover image for this issue. The actual cover is not yet available at this time.)
(This is a sampe cover image for this issue The actua cover is not yet avaiabe at this time) This artice appeared in a journa pubished by Esevier The attached copy is furnished to the author for interna
More informationSequential Decoding of Polar Codes with Arbitrary Binary Kernel
Sequentia Decoding of Poar Codes with Arbitrary Binary Kerne Vera Miosavskaya, Peter Trifonov Saint-Petersburg State Poytechnic University Emai: veram,petert}@dcn.icc.spbstu.ru Abstract The probem of efficient
More informationTarget Location Estimation in Wireless Sensor Networks Using Binary Data
Target Location stimation in Wireess Sensor Networks Using Binary Data Ruixin Niu and Pramod K. Varshney Department of ectrica ngineering and Computer Science Link Ha Syracuse University Syracuse, NY 344
More informationUniversal Consistency of Multi-Class Support Vector Classification
Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the
More informationLearning Structural Changes of Gaussian Graphical Models in Controlled Experiments
Learning Structura Changes of Gaussian Graphica Modes in Controed Experiments Bai Zhang and Yue Wang Bradey Department of Eectrica and Computer Engineering Virginia Poytechnic Institute and State University
More informationUniprocessor Feasibility of Sporadic Tasks with Constrained Deadlines is Strongly conp-complete
Uniprocessor Feasibiity of Sporadic Tasks with Constrained Deadines is Strongy conp-compete Pontus Ekberg and Wang Yi Uppsaa University, Sweden Emai: {pontus.ekberg yi}@it.uu.se Abstract Deciding the feasibiity
More informationAppendix of the Paper The Role of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Model
Appendix of the Paper The Roe of No-Arbitrage on Forecasting: Lessons from a Parametric Term Structure Mode Caio Ameida cameida@fgv.br José Vicente jose.vaentim@bcb.gov.br June 008 1 Introduction In this
More informationMelodic contour estimation with B-spline models using a MDL criterion
Meodic contour estimation with B-spine modes using a MDL criterion Damien Loive, Ney Barbot, Oivier Boeffard IRISA / University of Rennes 1 - ENSSAT 6 rue de Kerampont, B.P. 80518, F-305 Lannion Cedex
More informationDo Schools Matter for High Math Achievement? Evidence from the American Mathematics Competitions Glenn Ellison and Ashley Swanson Online Appendix
VOL. NO. DO SCHOOLS MATTER FOR HIGH MATH ACHIEVEMENT? 43 Do Schoos Matter for High Math Achievement? Evidence from the American Mathematics Competitions Genn Eison and Ashey Swanson Onine Appendix Appendix
More informationSVM: Terminology 1(6) SVM: Terminology 2(6)
Andrew Kusiak Inteigent Systems Laboratory 39 Seamans Center he University of Iowa Iowa City, IA 54-57 SVM he maxima margin cassifier is simiar to the perceptron: It aso assumes that the data points are
More informationA proposed nonparametric mixture density estimation using B-spline functions
A proposed nonparametric mixture density estimation using B-spine functions Atizez Hadrich a,b, Mourad Zribi a, Afif Masmoudi b a Laboratoire d Informatique Signa et Image de a Côte d Opae (LISIC-EA 4491),
More informationBDD-Based Analysis of Gapped q-gram Filters
BDD-Based Anaysis of Gapped q-gram Fiters Marc Fontaine, Stefan Burkhardt 2 and Juha Kärkkäinen 2 Max-Panck-Institut für Informatik Stuhsatzenhausweg 85, 6623 Saarbrücken, Germany e-mai: stburk@mpi-sb.mpg.de
More informationParagraph Topic Classification
Paragraph Topic Cassification Eugene Nho Graduate Schoo of Business Stanford University Stanford, CA 94305 enho@stanford.edu Edward Ng Department of Eectrica Engineering Stanford University Stanford, CA
More informationThe EM Algorithm applied to determining new limit points of Mahler measures
Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,
More informationBP neural network-based sports performance prediction model applied research
Avaiabe onine www.jocpr.com Journa of Chemica and Pharmaceutica Research, 204, 6(7:93-936 Research Artice ISSN : 0975-7384 CODEN(USA : JCPRC5 BP neura networ-based sports performance prediction mode appied
More informationASummaryofGaussianProcesses Coryn A.L. Bailer-Jones
ASummaryofGaussianProcesses Coryn A.L. Baier-Jones Cavendish Laboratory University of Cambridge caj@mrao.cam.ac.uk Introduction A genera prediction probem can be posed as foows. We consider that the variabe
More informationKernel Matching Pursuit
Kerne Matching Pursuit Pasca Vincent and Yoshua Bengio Dept. IRO, Université demontréa C.P. 6128, Montrea, Qc, H3C 3J7, Canada {vincentp,bengioy}@iro.umontrea.ca Technica Report #1179 Département d Informatique
More informationDIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM
DIGITAL FILTER DESIGN OF IIR FILTERS USING REAL VALUED GENETIC ALGORITHM MIKAEL NILSSON, MATTIAS DAHL AND INGVAR CLAESSON Bekinge Institute of Technoogy Department of Teecommunications and Signa Processing
More informationNEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION
NEW DEVELOPMENT OF OPTIMAL COMPUTING BUDGET ALLOCATION FOR DISCRETE EVENT SIMULATION Hsiao-Chang Chen Dept. of Systems Engineering University of Pennsyvania Phiadephia, PA 904-635, U.S.A. Chun-Hung Chen
More informationMATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES
MATH 172: MOTIVATION FOR FOURIER SERIES: SEPARATION OF VARIABLES Separation of variabes is a method to sove certain PDEs which have a warped product structure. First, on R n, a inear PDE of order m is
More informationEfficiently Generating Random Bits from Finite State Markov Chains
1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown
More informationSome Measures for Asymmetry of Distributions
Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester
More informationPower Control and Transmission Scheduling for Network Utility Maximization in Wireless Networks
ower Contro and Transmission Scheduing for Network Utiity Maximization in Wireess Networks Min Cao, Vivek Raghunathan, Stephen Hany, Vinod Sharma and. R. Kumar Abstract We consider a joint power contro
More information6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17. Solution 7
6.434J/16.391J Statistics for Engineers and Scientists May 4 MIT, Spring 2006 Handout #17 Soution 7 Probem 1: Generating Random Variabes Each part of this probem requires impementation in MATLAB. For the
More informationAlberto Maydeu Olivares Instituto de Empresa Marketing Dept. C/Maria de Molina Madrid Spain
CORRECTIONS TO CLASSICAL PROCEDURES FOR ESTIMATING THURSTONE S CASE V MODEL FOR RANKING DATA Aberto Maydeu Oivares Instituto de Empresa Marketing Dept. C/Maria de Moina -5 28006 Madrid Spain Aberto.Maydeu@ie.edu
More informationC. Fourier Sine Series Overview
12 PHILIP D. LOEWEN C. Fourier Sine Series Overview Let some constant > be given. The symboic form of the FSS Eigenvaue probem combines an ordinary differentia equation (ODE) on the interva (, ) with a
More informationarxiv: v2 [stat.ml] 8 Mar 2013
Scaabe Matrix-vaued Kerne Learning for High-dimensiona Noninear Mutivariate Regression and Granger Causaity arxiv:1210.4792v2 [stat.ml] 8 Mar 2013 Vikas Sindhwani IBM Research New York 10598, USA vsindhw@us.ibm.com
More informationAsynchronous Control for Coupled Markov Decision Systems
INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of
More informationExpectation-Maximization for Estimating Parameters for a Mixture of Poissons
Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating
More informationIdentification of macro and micro parameters in solidification model
BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vo. 55, No. 1, 27 Identification of macro and micro parameters in soidification mode B. MOCHNACKI 1 and E. MAJCHRZAK 2,1 1 Czestochowa University
More informationComponentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems
Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,
More informationUnconditional security of differential phase shift quantum key distribution
Unconditiona security of differentia phase shift quantum key distribution Kai Wen, Yoshihisa Yamamoto Ginzton Lab and Dept of Eectrica Engineering Stanford University Basic idea of DPS-QKD Protoco. Aice
More informationII. PROBLEM. A. Description. For the space of audio signals
CS229 - Fina Report Speech Recording based Language Recognition (Natura Language) Leopod Cambier - cambier; Matan Leibovich - matane; Cindy Orozco Bohorquez - orozcocc ABSTRACT We construct a rea time
More informationFitting Algorithms for MMPP ATM Traffic Models
Fitting Agorithms for PP AT Traffic odes A. Nogueira, P. Savador, R. Vaadas University of Aveiro / Institute of Teecommunications, 38-93 Aveiro, Portuga; e-mai: (nogueira, savador, rv)@av.it.pt ABSTRACT
More informationNonlinear Analysis of Spatial Trusses
Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes
More informationA Comparison Study of the Test for Right Censored and Grouped Data
Communications for Statistica Appications and Methods 2015, Vo. 22, No. 4, 313 320 DOI: http://dx.doi.org/10.5351/csam.2015.22.4.313 Print ISSN 2287-7843 / Onine ISSN 2383-4757 A Comparison Study of the
More informationAsymptotic Properties of a Generalized Cross Entropy Optimization Algorithm
1 Asymptotic Properties of a Generaized Cross Entropy Optimization Agorithm Zijun Wu, Michae Koonko, Institute for Appied Stochastics and Operations Research, Caustha Technica University Abstract The discrete
More informationResearch of Data Fusion Method of Multi-Sensor Based on Correlation Coefficient of Confidence Distance
Send Orders for Reprints to reprints@benthamscience.ae 340 The Open Cybernetics & Systemics Journa, 015, 9, 340-344 Open Access Research of Data Fusion Method of Muti-Sensor Based on Correation Coefficient
More informationPrimal and dual active-set methods for convex quadratic programming
Math. Program., Ser. A 216) 159:469 58 DOI 1.17/s117-15-966-2 FULL LENGTH PAPER Prima and dua active-set methods for convex quadratic programming Anders Forsgren 1 Phiip E. Gi 2 Eizabeth Wong 2 Received:
More informationCopyright information to be inserted by the Publishers. Unsplitting BGK-type Schemes for the Shallow. Water Equations KUN XU
Copyright information to be inserted by the Pubishers Unspitting BGK-type Schemes for the Shaow Water Equations KUN XU Mathematics Department, Hong Kong University of Science and Technoogy, Cear Water
More information4 Separation of Variables
4 Separation of Variabes In this chapter we describe a cassica technique for constructing forma soutions to inear boundary vaue probems. The soution of three cassica (paraboic, hyperboic and eiptic) PDE
More informationGeneral Certificate of Education Advanced Level Examination June 2010
Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/P10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of temperature on the rate of photosynthesis
More informationSome Properties of Regularized Kernel Methods
Journa of Machine Learning Research 5 (2004) 1363 1390 Submitted 12/03; Revised 7/04; Pubished 10/04 Some Properties of Reguarized Kerne Methods Ernesto De Vito Dipartimento di Matematica Università di
More informationLearning Fully Observed Undirected Graphical Models
Learning Fuy Observed Undirected Graphica Modes Sides Credit: Matt Gormey (2016) Kayhan Batmangheich 1 Machine Learning The data inspires the structures we want to predict Inference finds {best structure,
More informationTranslation Microscopy (TRAM) for super-resolution imaging.
ransation Microscopy (RAM) for super-resoution imaging. Zhen Qiu* 1,,3, Rhodri S Wison* 1,3, Yuewei Liu 1,3, Aison Dun 1,3, Rebecca S Saeeb 1,3, Dongsheng Liu 4, Coin Ricman 1,3, Rory R Duncan 1,3,5, Weiping
More informationSupport Vector Machine and Its Application to Regression and Classification
BearWorks Institutiona Repository MSU Graduate Theses Spring 2017 Support Vector Machine and Its Appication to Regression and Cassification Xiaotong Hu As with any inteectua project, the content and views
More informationGeneral Certificate of Education Advanced Level Examination June 2010
Genera Certificate of Education Advanced Leve Examination June 2010 Human Bioogy HBI6T/Q10/task Unit 6T A2 Investigative Skis Assignment Task Sheet The effect of using one or two eyes on the perception
More informationarxiv: v1 [math.co] 17 Dec 2018
On the Extrema Maximum Agreement Subtree Probem arxiv:1812.06951v1 [math.o] 17 Dec 2018 Aexey Markin Department of omputer Science, Iowa State University, USA amarkin@iastate.edu Abstract Given two phyogenetic
More informationSTA 216 Project: Spline Approach to Discrete Survival Analysis
: Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing
More informationarxiv: v1 [math.ca] 6 Mar 2017
Indefinite Integras of Spherica Besse Functions MIT-CTP/487 arxiv:703.0648v [math.ca] 6 Mar 07 Joyon K. Boomfied,, Stephen H. P. Face,, and Zander Moss, Center for Theoretica Physics, Laboratory for Nucear
More informationFUSED MULTIPLE GRAPHICAL LASSO
FUSED MULTIPLE GRAPHICAL LASSO SEN YANG, ZHAOSONG LU, XIAOTONG SHEN, PETER WONKA, JIEPING YE Abstract. In this paper, we consider the probem of estimating mutipe graphica modes simutaneousy using the fused
More informationDetermining The Degree of Generalization Using An Incremental Learning Algorithm
Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c
More informationLasso and probabilistic inequalities for multivariate point processes
Submitted to the Bernoui arxiv: arxiv:128.57 Lasso and probabiistic inequaities for mutivariate point processes NIELS RICHARD HANSEN 1, PATRICIA REYNAUD-BOURET 2 and VINCENT RIVOIRARD 3 1 Department of
More informationarxiv: v2 [stat.ml] 17 Mar 2015
Journa of Machine Learning Research 1x (201x) x-xx Submitted x/0x; Pubished x/0x A Unifying Framework in Vector-vaued Reproducing Kerne Hibert Spaces for Manifod Reguarization and Co-Reguarized Muti-view
More informationBayesian Unscented Kalman Filter for State Estimation of Nonlinear and Non-Gaussian Systems
Bayesian Unscented Kaman Fiter for State Estimation of Noninear and Non-aussian Systems Zhong Liu, Shing-Chow Chan, Ho-Chun Wu and iafei Wu Department of Eectrica and Eectronic Engineering, he University
More informationAnother Look at Linear Programming for Feature Selection via Methods of Regularization 1
Another Look at Linear Programming for Feature Seection via Methods of Reguarization Yonggang Yao, The Ohio State University Yoonkyung Lee, The Ohio State University Technica Report No. 800 November, 2007
More informationAdaptive Localization in a Dynamic WiFi Environment Through Multi-view Learning
daptive Locaization in a Dynamic WiFi Environment Through Muti-view Learning Sinno Jiain Pan, James T. Kwok, Qiang Yang, and Jeffrey Junfeng Pan Department of Computer Science and Engineering Hong Kong
More informationOn the Goal Value of a Boolean Function
On the Goa Vaue of a Booean Function Eric Bach Dept. of CS University of Wisconsin 1210 W. Dayton St. Madison, WI 53706 Lisa Heerstein Dept of CSE NYU Schoo of Engineering 2 Metrotech Center, 10th Foor
More informationConsistent linguistic fuzzy preference relation with multi-granular uncertain linguistic information for solving decision making problems
Consistent inguistic fuzzy preference reation with muti-granuar uncertain inguistic information for soving decision making probems Siti mnah Binti Mohd Ridzuan, and Daud Mohamad Citation: IP Conference
More informationarxiv: v2 [cs.lg] 4 Sep 2014
Cassification with Sparse Overapping Groups Nikhi S. Rao Robert D. Nowak Department of Eectrica and Computer Engineering University of Wisconsin-Madison nrao2@wisc.edu nowak@ece.wisc.edu ariv:1402.4512v2
More informationarxiv: v2 [stat.ml] 19 Oct 2016
Sparse Quadratic Discriminant Anaysis and Community Bayes arxiv:1407.4543v2 [stat.ml] 19 Oct 2016 Ya Le Department of Statistics Stanford University ye@stanford.edu Abstract Trevor Hastie Department of
More informationThe Binary Space Partitioning-Tree Process Supplementary Material
The inary Space Partitioning-Tree Process Suppementary Materia Xuhui Fan in Li Scott. Sisson Schoo of omputer Science Fudan University ibin@fudan.edu.cn Schoo of Mathematics and Statistics University of
More informationDistributed average consensus: Beyond the realm of linearity
Distributed average consensus: Beyond the ream of inearity Usman A. Khan, Soummya Kar, and José M. F. Moura Department of Eectrica and Computer Engineering Carnegie Meon University 5 Forbes Ave, Pittsburgh,
More informationLearning Gaussian Processes from Multiple Tasks
Kai Yu kai.yu@siemens.com Information and Communication, Corporate Technoogy, Siemens AG, Munich, Germany Voker Tresp voker.tresp@siemens.com Information and Communication, Corporate Technoogy, Siemens
More informationData Mining Technology for Failure Prognostic of Avionics
IEEE Transactions on Aerospace and Eectronic Systems. Voume 38, #, pp.388-403, 00. Data Mining Technoogy for Faiure Prognostic of Avionics V.A. Skormin, Binghamton University, Binghamton, NY, 1390, USA
More informationOn a geometrical approach in contact mechanics
Institut für Mechanik On a geometrica approach in contact mechanics Aexander Konyukhov, Kar Schweizerhof Universität Karsruhe, Institut für Mechanik Institut für Mechanik Kaiserstr. 12, Geb. 20.30 76128
More informationEfficient Similarity Search across Top-k Lists under the Kendall s Tau Distance
Efficient Simiarity Search across Top-k Lists under the Kenda s Tau Distance Koninika Pa TU Kaisersautern Kaisersautern, Germany pa@cs.uni-k.de Sebastian Miche TU Kaisersautern Kaisersautern, Germany smiche@cs.uni-k.de
More informationTurbo Codes. Coding and Communication Laboratory. Dept. of Electrical Engineering, National Chung Hsing University
Turbo Codes Coding and Communication Laboratory Dept. of Eectrica Engineering, Nationa Chung Hsing University Turbo codes 1 Chapter 12: Turbo Codes 1. Introduction 2. Turbo code encoder 3. Design of intereaver
More informationA Novel Learning Method for Elman Neural Network Using Local Search
Neura Information Processing Letters and Reviews Vo. 11, No. 8, August 2007 LETTER A Nove Learning Method for Eman Neura Networ Using Loca Search Facuty of Engineering, Toyama University, Gofuu 3190 Toyama
More informationAlgorithms to solve massively under-defined systems of multivariate quadratic equations
Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations
More informationStatistical Inference, Econometric Analysis and Matrix Algebra
Statistica Inference, Econometric Anaysis and Matrix Agebra Bernhard Schipp Water Krämer Editors Statistica Inference, Econometric Anaysis and Matrix Agebra Festschrift in Honour of Götz Trenker Physica-Verag
More information