Improved Bounds for the Nyström Method with Application to Kernel Classification


Improved Bounds for the Nyström Method with Application to Kernel Classification

Rong Jin, Member, IEEE, Tianbao Yang, Mehrdad Mahdavi, Yu-Feng Li, Zhi-Hua Zhou, Fellow, IEEE

Rong Jin and Mehrdad Mahdavi are with the Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824 USA. E-mail: {rongjin, mahdavi}@cse.msu.edu. Tianbao Yang is with GE Global Research, San Ramon, CA, USA. E-mail: tyang@ge.com. Yu-Feng Li and Zhi-Hua Zhou are with the National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China. E-mail: {liyf, zhouzh}@lamda.nju.edu.cn. Copyright © IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Abstract—We develop two approaches for analyzing the approximation error bound of the Nyström method, which approximates a positive semidefinite (PSD) matrix by sampling a small set of its columns: one based on a concentration inequality for integral operators, and one based on random matrix theory. We show that the approximation error, measured in the spectral norm, can be improved from O(N/√m) to O(N/m^{1−ρ}) in the case of a large eigengap, where N is the total number of data points, m is the number of sampled data points, and ρ ∈ (0, 1/2) is a positive constant that characterizes the eigengap. When the eigenvalues of the kernel matrix follow a p-power law, our analysis based on random matrix theory further improves the bound to O(N/m^{p−1}) under an incoherence assumption. We present a kernel classification approach based on the Nyström method and derive its generalization performance using the improved bound. We show that when the eigenvalues of the kernel matrix follow a p-power law, we can reduce the number of support vectors to N^{2p/(p²−1)}, which is sublinear in N when p > 1 + √2, without seriously sacrificing the generalization performance.

Index Terms—Nyström method, approximation error, concentration inequality, kernel methods, random matrix theory

I. INTRODUCTION

The Nyström method has been widely applied in machine learning to efficiently approximate large kernel matrices with low-rank matrices in order to speed up kernel algorithms [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11]. To evaluate the quality of the Nyström method, we typically bound the norm of the difference between the original kernel matrix and the low-rank approximation generated by the Nyström method. Several analyses have been developed to bound the approximation error of the Nyström method [2], [4], [9], [12], [10], [13], [14]. Most of them focus on additive error bounds and build on the theoretical results of [2]. When the target matrix is of low rank, significantly better bounds for the approximation error of the Nyström method were given in [10] and [13]. These results were further generalized to kernel matrices of arbitrary rank by a relative error bound in [14]. In this study, we focus on the additive error bound of the Nyström method for PSD matrices, and compare our results mainly to those stated in [2]. Although a relative error bound is usually tighter than an additive bound [15], we show in this paper that the approximation error resulting from our additive error analysis is better than that indicated by the relative error analysis of [14] when the kernel matrix follows a power-law eigenvalue distribution. [Footnote 1: For completeness, we include a comparison to the relative error bound of [14] in the later remarks.]

Below, we review the main results of [2] and their limitations. Let K ∈ R^{N×N} be a PSD kernel matrix to be approximated, and let λ_i, i = 1, ..., N, be the eigenvalues of K ranked in descending order. Let K̃_r be an approximate kernel matrix of rank r generated by the Nyström method, and let m be the number of columns sampled from K to construct K̃_r. Then, under the assumption K_{i,i} = O(1), Drineas and Mahoney [2] showed that for m uniformly sampled columns, with a high probability,

    ‖K − K̃_r‖₂ ≤ λ_{r+1} + O(N/√m),    (1)

where ‖·‖₂ stands for the spectral norm of a matrix. [Footnote 2: Although the main results of [2] use a data-dependent sampling scheme, they also hold for uniform sampling, as mentioned by the authors of [2] and explicitly established in [16].] In particular, by setting r = m, the bound becomes

    ‖K − K̃_m‖₂ ≤ λ_{m+1} + O(N/√m).    (2)

The main problem with the bound in (2) is its slow reduction rate in the number of sampled columns m, i.e., O(N/√m), implying that a large number of samples is needed to achieve a small approximation error. In this study, we aim to improve the approximation error bound in (2) by considering two special cases of the kernel matrix K. In the first case, we assume there is a large eigengap in the spectrum of K. More specifically, we assume there exists a rank r ∈ [m] such that λ_r = Ω(N/m^ρ) and λ_{r+1} = O(N/m^{1−ρ}), where ρ < 1/2. Here, the parameter ρ is introduced to characterize the eigengap λ_r − λ_{r+1}: the smaller the ρ, the larger the eigengap. We show that the approximation error bound can be improved to O(N/m^{1−ρ}) in the case of a large eigengap. The second case assumes that the eigenvalues of K follow a p-power law with p > 1. Under this assumption, we obtain an improved O(N/m^{p−1}) bound on the approximation error, provided that the eigenvector matrix satisfies an incoherence assumption. [Footnote 3: A similar assumption was used in previous analyses of the Nyström method [10], [13], [14].]

This result may shed light on why the Nyström method works well for kernel matrices with power-law eigenvalue distributions, a phenomenon that is often observed in practice [20].

The second contribution of this study is a kernel classification algorithm that explicitly exploits the improved bounds for the Nyström method developed here. We show that when the eigenvalues of the kernel matrix follow a p-power law with p > 1, we can construct a kernel classifier that yields a generalization performance similar to the full kernel classifier but with no more than N^{2p/(p²−1)} support vectors, which is sublinear in N when p > 1 + √2. Although the generalization error of using the Nyström method for classification has been studied in [11], to the best of our knowledge this is the first work that bounds the number of support vectors using an analysis of the Nyström method.

Although the results presented in this paper are theoretical, they can find applications in the real world. The result for a PSD matrix with a large eigengap can find applications in document analysis [17] and social network analysis [18]. Several works [19], [20] have observed that a Gaussian kernel matrix usually exhibits a skewed eigenvalue distribution on real data sets, which makes our result for a PSD matrix with a power-law eigenvalue distribution attractive.

Finally, as we prepared the final version of the manuscript, two related works [21], [22] were accepted for publication. In [21], the authors compare the Nyström method under different sampling and projection strategies and establish improved bounds for these strategies. Their result for the spectral-norm error bound with the uniform sampling strategy, as considered in this work, is similar to the one presented in [14]. In [22], the author also addresses the generalization error of kernel learning using the Nyström method. The key result of [22] indicates that the minimum number of sampled columns depends on the degrees of freedom. Our result addresses the sample complexity of kernel learning from a different aspect.

II. NOTATIONS AND BACKGROUND

Let D = {x_1, ..., x_N} be a collection of N samples, where x_i ∈ X, and let K = [κ(x_i, x_j)]_{N×N} be the kernel matrix for the samples in D, where κ(·,·) is a kernel function. For simplicity, we assume κ(x, x) ≤ 1 for any x ∈ X. We denote by (v_i, λ_i), i = 1, ..., N, the eigenvectors and eigenvalues of K ranked in descending order of the eigenvalues, and by V = (v_1, ..., v_N) the orthonormal eigenvector matrix. To build the low-rank approximation of the kernel matrix K, the Nyström method first samples m < N examples uniformly at random from D, denoted by D̂ = {x̂_1, ..., x̂_m}. Let K̂ = [κ(x̂_i, x̂_j)]_{m×m} measure the kernel similarity between any two samples in D̂, and let K_b = [κ(x_i, x̂_j)]_{N×m} measure the similarity between the samples in D and D̂. Using the samples in D̂, with rank r set to m (or the rank of K̂ if it is less than m), the Nyström method approximates K by K ≈ K_b K̂^† K_b^T, where K̂^† denotes the pseudo-inverse of K̂. Our goal is to provide a high-probability bound for the approximation error ‖K − K_b K̂^† K_b^T‖₂. We choose r = m (or the rank of K̂) because, according to [2], [4], it yields the best approximation error for a non-singular kernel matrix. In this study, we focus on the spectral norm for measuring the approximation error, which is particularly suitable for kernel classification [11]. We also restrict the analysis to uniform sampling for the Nyström method. This is because, according to [4], [16], for most real-world datasets uniform sampling is computationally more efficient and yields performance that is comparable to, if not better than, the non-uniform sampling strategies [2], [8], [23], [24].

Our analysis of the Nyström method extensively exploits the properties of the integral operator. This is in contrast to most previous studies of the Nyström method, which rely on matrix analysis. The main advantage of using the integral operator is its convenience in handling unseen data points (i.e., test data), making it attractive for the analysis of generalization error bounds. In particular, we introduce a linear operator L_N defined over the samples in D. For any function f, the operator L_N is defined as

    L_N[f] = (1/N) Σ_{i=1}^N κ(x_i, ·) f(x_i).

It can be shown that the eigenvalues of the operator L_N are λ_i/N, i = 1, ..., N [25]. Let φ_1, ..., φ_N be the corresponding eigenfunctions of L_N, normalized by the functional norm, i.e., ⟨φ_i, φ_j⟩_{H_κ} = δ(i, j), 1 ≤ i, j ≤ N, where ⟨·,·⟩_{H_κ} denotes the inner product in H_κ and δ(i, j) is the Kronecker delta function. According to [25], the eigenfunctions satisfy

    √λ_j φ_j = Σ_{i=1}^N V_{i,j} κ(x_i, ·),  j = 1, ..., N,

where V_{i,j} is the (i, j)th element of V. Similarly, we can write κ(x_j, ·) by its eigen-expansion as

    κ(x_j, ·) = Σ_{i=1}^N √λ_i V_{j,i} φ_i,  j = 1, ..., N.    (3)

Furthermore, let L_m be the operator defined on the samples in D̂, i.e., L_m[f] = (1/m) Σ_{i=1}^m κ(x̂_i, ·) f(x̂_i). Finally, we denote by ⟨f, g⟩_{H_κ} and ‖f‖_{H_κ} the inner product and functional norm in the Hilbert space H_κ, respectively, and denote by ‖L‖_HS and ‖L‖₂ the Hilbert–Schmidt norm and spectral norm of a linear operator L, respectively, i.e., ‖L‖²_HS = Σ_{i,j} ⟨φ_i, Lφ_j⟩²_{H_κ} and ‖L‖₂ = max_{‖f‖_{H_κ} ≤ 1} ‖Lf‖_{H_κ}, where {φ_i, i = 1, 2, ...} is a complete orthogonal basis of H_κ. The two norms are the analogs of the Frobenius and spectral norms in Euclidean space, respectively. In the following analysis, omitted proofs are presented in the appendix.
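To make the construction above concrete, the following sketch implements the Nyström approximation K ≈ K_b K̂^† K_b^T with uniform column sampling, together with the spectral-norm error used throughout the analysis. It is a minimal illustration rather than code from the paper; the function names are our own, and numpy's pseudo-inverse is used to handle a possibly rank-deficient K̂.

    import numpy as np

    def nystrom_approximation(K, m, rng=None):
        """Rank-m Nystrom approximation of an N x N PSD kernel matrix K
        using m uniformly sampled columns (without replacement)."""
        rng = np.random.default_rng(rng)
        N = K.shape[0]
        idx = rng.choice(N, size=m, replace=False)   # sampled set D_hat
        K_b = K[:, idx]                              # N x m similarities to D_hat
        K_hat = K[np.ix_(idx, idx)]                  # m x m kernel on D_hat
        # K ~= K_b K_hat^+ K_b^T; the pseudo-inverse handles rank deficiency
        return K_b @ np.linalg.pinv(K_hat) @ K_b.T

    def spectral_error(K, K_tilde):
        """Spectral-norm approximation error, the quantity bounded in this paper."""
        return np.linalg.norm(K - K_tilde, ord=2)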

III. APPROXIMATION ERROR BOUND FOR THE NYSTRÖM METHOD

Our first step is to turn ‖K − K_b K̂^† K_b^T‖₂ into a functional approximation problem. To this end, we introduce two sets:

    H_a = span( κ(x̂_1, ·), ..., κ(x̂_m, ·) ),
    H_b = { f = Σ_{i=1}^N u_i κ(x_i, ·) : ‖u‖₂ ≤ 1 },

where H_a is the subspace spanned by the kernel functions defined on the samples in D̂, and H_b is a subset of the functional space spanned by the kernel functions defined on the samples in D with bounded coefficients. Using the eigen-expansion of κ(x_j, ·) in (3), it is straightforward to show that H_b can be rewritten in the basis of the eigenfunctions φ_i as

    H_b = { f = Σ_{i=1}^N w_i √λ_i φ_i : Σ_{i=1}^N w_i² ≤ 1 }.

Define E(g, H_a) as the minimum error in approximating a function g ∈ H_b by functions in H_a, i.e.,

    E(g, H_a) = min_{f∈H_a} ‖f − g‖²_{H_κ} = min_{f∈H_a} ( ‖f‖²_{H_κ} + ‖g‖²_{H_κ} − 2⟨f, g⟩_{H_κ} ).

Define E(H_a) as the worst error in approximating any function g ∈ H_b by functions in H_a, i.e.,

    E(H_a) = max_{g∈H_b} E(g, H_a).    (4)

The following proposition connects ‖K − K_b K̂^† K_b^T‖₂ with E(H_a).

Proposition 1: For any random samples x̂_1, ..., x̂_m, we have ‖K − K_b K̂^† K_b^T‖₂ = E(H_a).

Proof: Since g ∈ H_b and f ∈ H_a, we can write g and f as g = Σ_{i=1}^N u_i κ(x_i, ·) and f = Σ_{i=1}^m z_i κ(x̂_i, ·), where u = (u_1, ..., u_N)^T ∈ R^N satisfies ‖u‖₂ ≤ 1 and z = (z_1, ..., z_m)^T ∈ R^m. We can thus rewrite E(g, H_a) as an optimization problem in terms of z, i.e.,

    E(g, H_a) = min_{z∈R^m} z^T K̂ z − 2u^T K_b z + u^T K u = u^T ( K − K_b K̂^† K_b^T ) u,

and therefore

    E(H_a) = max_{g∈H_b} E(g, H_a) = max_{‖u‖₂≤1} u^T ( K − K_b K̂^† K_b^T ) u = ‖K − K_b K̂^† K_b^T‖₂.

Remark 1: We can restrict the space H_a to its subspace H_a^r = { Σ_{i=1}^m z_i κ(x̂_i, ·) : z ∈ span(v̂_1, ..., v̂_r) }, where v̂_i, i = 1, ..., r, are the first r eigenvectors of K̂, to conduct the analysis for the rank-r (r < m) approximation of the Nyström method.

To proceed with our analysis, for any r ∈ [m] we define

    H^r = span(φ_1, ..., φ_r),  H̄^r = span(φ_{r+1}, ..., φ_N),
    H_b^r = { f = Σ_{i=1}^r w_i √λ_i φ_i : Σ_{i=1}^r w_i² ≤ 1 },
    H̄_b^r = { f = Σ_{i=1}^{N−r} w_i √λ_{i+r} φ_{i+r} : Σ_{i=1}^{N−r} w_i² ≤ 1 }.

Define E(H_a, r) = max_{g∈H_b^r} E(g, H_a) as the worst error in approximating any function g ∈ H_b^r by functions in H_a. The proposition below bounds E(H_a) by E(H_a, r).

Proposition 2: For any r ∈ [m], we have

    E(H_a) ≤ 2 max( E(H_a, r), λ_{r+1} ) ≤ 2( E(H_a, r) + λ_{r+1} ).

Proof: For any g ∈ H_b, we can write g = g_1 + g_2, where g_1 ∈ δH_b^r, g_2 ∈ √(1−δ²) H̄_b^r, and δ ∈ [0, 1]; similarly, any f ∈ H_a can be decomposed into its components in H^r and H̄^r. Using these notations and the fact that ‖g_2‖²_{H_κ} ≤ (1−δ²)λ_{r+1}, we rewrite E(H_a) as

    E(H_a) = max_{δ∈[0,1]} max_{g_1∈δH_b^r, g_2∈√(1−δ²)H̄_b^r} min_{f∈H_a} ‖f − g_1 − g_2‖²_{H_κ}
           ≤ max_{δ∈[0,1]} [ 2δ² max_{g∈H_b^r} min_{f∈H_a} ‖f − g‖²_{H_κ} + 2(1−δ²) max_{g∈H̄_b^r} ‖g‖²_{H_κ} ]
           ≤ max_{δ∈[0,1]} [ 2δ² E(H_a, r) + 2(1−δ²) λ_{r+1} ] = 2 max( E(H_a, r), λ_{r+1} ),

where the first inequality uses (a + b)² ≤ 2a² + 2b², and the last step follows from the definition of E(H_a, r).

As indicated by Proposition 2, in order to bound the approximation error E(H_a), we can bound E(H_a, r), namely the approximation error for functions in the subspace spanned by the top eigenfunctions of L_N. In the next two subsections, we discuss two approaches for bounding E(H_a, r): the first relies on a concentration inequality for integral operators [25], and the second exploits random matrix theory [26]. Before upper-bounding E(H_a), we first provide a lower bound.

Theorem 1: There exists a kernel matrix K ∈ R^{N×N} with all its diagonal entries equal to 1 such that, for any sampling strategy that selects m columns, the approximation error of the Nyström method is lower bounded by Ω(N/m), i.e.,

    ‖K − K_b K̂^† K_b^T‖₂ ≥ Ω(N/m),

provided N > 64(m+1)²[ln 4]².

Remark 2: Theorem 1 shows that the lower bound on the approximation error of the Nyström method is Ω(N/m). The analysis developed in this work aims to bridge the gap between the known upper bound (i.e., O(N/√m)) and this lower bound.

A. Bound for E(H_a, r) Using the Concentration Inequality of Integral Operators

In this section, we bound E(H_a, r) using the concentration inequality of integral operators. We show that the approximation error of the Nyström method can be improved to O(N/m^{1−ρ}) when there is a large eigengap in the spectrum of the kernel matrix K, where ρ < 1/2 is introduced to characterize the eigengap. We first state the concentration inequality for a general random variable.

Proposition 3 (from [25]): Let ξ be a random variable on (X, P_X) with values in a Hilbert space H. Assume ‖ξ‖ ≤ M < ∞ almost surely. Then, with a probability at least 1 − δ, we have

    ‖ (1/m) Σ_{i=1}^m ξ(x_i) − E[ξ] ‖ ≤ 4M ln(2/δ) / √m.

The approximation error of the Nyström method based on the concentration inequality is given in the following theorem.

Theorem 2: With a probability at least 1 − δ, for any r ∈ [m], we have

    ‖K − K_b K̂^† K_b^T‖₂ ≤ 2( 16 N² [ln(2/δ)]² / (m λ_r) + λ_{r+1} ).

We consider the scenario where there is a very large eigengap in the spectrum of the kernel matrix K. In particular, we assume that there exist a rank r and ρ ∈ (0, 1/2) such that λ_r = Ω(N/m^ρ) and λ_{r+1} = O(N/m^{1−ρ}). The parameter ρ characterizes the eigengap, which is given by

    λ_r − λ_{r+1} = Ω( N/m^ρ − N/m^{1−ρ} ) = Ω( (N/m^ρ) [ 1 − 1/m^{1−2ρ} ] ).

Evidently, the smaller the ρ, the larger the eigengap; when ρ = 1/2, the eigengap is small. Under the large-eigengap assumption, the bound in Theorem 2 simplifies to

    ‖K − K_b K̂^† K_b^T‖₂ ≤ O( N/m^{1−ρ} ).    (5)

Compared to the bound in (2), the bound in (5) improves the approximation error from O(N/√m) to O(N/m^{1−ρ}) when ρ < 1/2.

To prove Theorem 2, we define two sets of functions

    H_c^r = { h = Σ_{i=1}^r w_i (N/√λ_i) φ_i : Σ_{i=1}^r w_i² ≤ 1 },
    H_d^r = { f ∈ H_κ : ‖f‖²_{H_κ} ≤ N²/λ_r },

where r corresponds to the rank with a large eigengap. It is evident that H_c^r ⊆ H_d^r; and any g ∈ H_b^r can also be written as g = L_N[h], where h ∈ H_c^r. Using H_c^r and H_d^r, we have

    E(H_a, r) = max_{g∈H_b^r} E(g, H_a) = max_{h∈H_c^r} min_{f∈H_a} ‖L_N h − f‖²_{H_κ} ≤ max_{h∈H_d^r} min_{f∈H_a} ‖L_N h − f‖²_{H_κ}.

By constructing f as L_m[h] ∈ H_a, we can bound E(H_a, r) as

    E(H_a, r) ≤ max_{h∈H_d^r} ‖ (L_N − L_m) h ‖²_{H_κ} ≤ ‖L_N − L_m‖²₂ · N²/λ_r ≤ ‖L_N − L_m‖²_HS · N²/λ_r,    (6)

where the last step follows from ‖L_N − L_m‖₂ ≤ ‖L_N − L_m‖_HS. The corollary below, which follows immediately from Proposition 3, allows us to bound the difference between L_N and L_m.

Corollary 1: With a probability 1 − δ, we have ‖L_N − L_m‖_HS ≤ 4 ln(2/δ) / √m.

Finally, Theorem 2 follows directly from the inequality in (6), the result in Corollary 1, and Proposition 2.

B. Bound for E(H_a, r) Using Random Matrix Theory

In this subsection, we aim to develop a better error bound for the Nyström method for kernel matrices whose eigenvalues follow a power-law distribution. Our analysis explicitly exploits some of the key results in random matrix theory that have been the main building blocks in the theory of compressive sensing [26], [27]. To this end, we first introduce the definition of a power-law distribution of eigenvalues [28], [29]. The eigenvalues σ_i, i = 1, ..., N, ranked in non-increasing order, follow a p-power law distribution if there exists a constant c > 0 such that σ_k ≤ c k^{−p}. In the sequel, we assume the normalized eigenvalues λ_i/N, i = 1, ..., N, i.e., the eigenvalues of the operator L_N, follow a p-power law distribution. [Footnote 4: We assume a power-law distribution for the normalized eigenvalues λ_i/N because the eigenvalues λ_i of K scale in N.] A well-known example of a kernel with a power-law eigenvalue distribution [28] is the kernel function that generates the Sobolev space W^{α,2}(T^d) of smoothness α > d/2, where T^d is the d-dimensional torus; its eigenvalues follow a p-power law with p = 2α > d. It is also observed that the eigenvalues of a Gaussian kernel, with an appropriately set width parameter, follow a power-law distribution [19]. In order to exploit the results on random matrices from [26], we introduce the definition of the coherence μ of the eigenvector matrix V = (v_1, ..., v_N) as

    μ = N max_{i,j} V_{i,j}².
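The coherence defined above (as reconstructed here, μ = N·max_{i,j} V_{i,j}²) is straightforward to compute for a given PSD matrix; the short sketch below is our own illustration, not code from the paper.

    import numpy as np

    def coherence(K):
        """Coherence mu = N * max_{i,j} V_{ij}^2 of the eigenvector matrix of K.
        mu ranges from 1 (maximally incoherent eigenvectors) up to N (an
        eigenvector aligned with a canonical basis vector)."""
        N = K.shape[0]
        _, V = np.linalg.eigh(K)   # columns of V are orthonormal eigenvectors
        return N * np.max(V ** 2)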

Intuitively, the coherence measures the degree to which the eigenvectors in V are correlated with the canonical basis. According to results from compressive sensing, highly coherent matrices are difficult, or even impossible, to recover by matrix completion with random sampling. The coherence measure was first introduced into the error analysis of the Nyström method by Talwalkar and Rostamizadeh [10]. Their analysis shows that a low-rank kernel matrix with incoherent eigenvectors (i.e., with low coherence) can be accurately approximated by the Nyström method using uniform sampling. This result was generalized to noisy observations of a low-rank matrix in [13]. The main limitation of these results is that they apply only to low-rank matrices. Recently, Gittens [14] developed a relative error bound for the Nyström method for kernel matrices of arbitrary rank, using a slightly different coherence measure. Similar to [10], [13], [14], the coherence measure also plays an important role in our analysis. However, unlike the previous studies, we focus on the error bound of the Nyström method for kernel matrices with an arbitrary rank and a skewed eigenvalue distribution. The main result of our analysis is given in the following theorem.

Theorem 3: Assume the eigenvalues λ_i/N, i = 1, ..., N, follow a p-power law with p > 1. Given a sufficiently large number of samples m, i.e.,

    m / ln N ≥ μ max( 16/γ, C_ab ln(3N³), 4 C_ab ln(3N³)/γ ),

we have, with a probability 1 − 3N^{−1},

    ‖K − K_b K̂^† K_b^T‖₂ ≤ Õ( N/m^{p−1} ),

where Õ(·) suppresses a factor polynomial in ln N, and C_ab is a numerical constant defined in our later analysis.

Remark 3: Compared to the approximation error in (2), Theorem 3 improves the bound from O(N/√m) to O(N/m^{p−1}), provided the eigenvalues of the kernel matrix follow a power law. It is worth noting that, similar to [10], [13], [14], the bound in Theorem 3 is meaningful only when the coherence μ of the eigenvector matrix is small (i.e., the eigenvector matrix satisfies the incoherence assumption). We also compare the additive error bound in Theorem 3 with the relative error bound in [14], which is dominated by O(N²/m^{p+1}) when the eigenvalues follow a p-power law. That bound is worse than the O(N/m^{p−1}) bound obtained in Theorem 3 when m ≤ √N, a favorable setting when N is very large and m is small. Although we note that the relative error bound in [14] holds more generally than the error bound in Theorem 3, the comparison may shed light on how a power-law eigenvalue distribution can yield an improved approximation error bound for the Nyström method, and may inspire new error analyses of other randomized matrix approximation methods. In Remark 4, we make a comparison between a more general result, Theorem 7, and [14].

We emphasize that the result in Theorem 3 does not contradict the lower bound given in Theorem 1, because Theorem 3 holds only when the eigenvalues of the kernel matrix follow a power law. In fact, an updated lower bound for kernel matrices with a power-law eigenvalue distribution is given in the following theorem.

Theorem 4: There exists a kernel matrix K ∈ R^{N×N} with all its diagonal entries equal to 1 and its eigenvalues following a p-power law such that, for any sampling strategy that selects m columns, the approximation error of the Nyström method is lower bounded by Ω(N/m^p), i.e.,

    ‖K − K_b K̂^† K_b^T‖₂ ≥ Ω( N/m^p ),

provided N > 64(m+1)²[ln 4]².

We skip the proof of this theorem as it is almost identical to that of Theorem 1. The gap between the upper bound and the lower bound given in Theorems 3 and 4 indicates that there is potentially room for further improvement.

Next, we present several theorems and corollaries to pave the path to the proof of Theorem 3. We borrow the following two results from compressive sensing theory [26].

Theorem 5 (from [26]): Let V be an orthogonal matrix (V^T V = I) with coherence μ. Fix a subset T of the signal domain. Choose a subset S of the measurement domain of size |S| = m uniformly at random. Suppose that the number of measurements obeys

    m ≥ |T| μ max( C_a ln |T|, C_b ln(3/δ) )

for some positive constants C_a and C_b. Then

    Pr( ‖ (N/m) V_{S,T}^T V_{S,T} − I ‖ ≥ 1/2 ) ≤ δ.

Theorem 6 (Lemma 3.3 from [26]): Let V, S, and T be the same as defined in Theorem 5. Let u_k be the k-th row of V_{S,·}^T V_{S,T}. Define σ² = μ max(1, μ|T|/m). Fix a > 0 obeying a ≤ (m/μ)^{1/4} if μ|T|/m > 1 and a ≤ (m/[μ²|T|])^{1/2} otherwise. Let z_k = (V_{S,T}^T V_{S,T})^{−1} u_k. Then we have

    Pr( sup_{k∈T^c} ‖z_k‖ ≥ μ|T|/m + aσ√|T|/m ) ≤ 3 exp(−γ a²) + Pr( ‖ (N/m) V_{S,T}^T V_{S,T} − I ‖ ≥ 1/2 )

for some positive constant γ, where T^c stands for the complement of T.

We note that Theorem 5 has been used in [10] to show that the Nyström approximation is exact when the target PSD matrix is of low rank. Our analysis, however, is not restricted to low-rank PSD matrices. Instead, by combining the results in Theorems 5 and 6, we obtain the following high-probability bound for sup_{k∈T^c} ‖z_k‖, which serves as the key to proving the general result stated in Theorem 7 for a PSD matrix of arbitrary rank.

Corollary 2: If |T| ≥ max( C_ab ln(3N³), 4 ln N/γ ) and

    μ max( |T| C_ab ln(3N³), 16 ln N/γ ) ≤ m ≤ μ |T| N,

where C_ab = max(C_a, C_b), then with a probability 1 − 3N^{−1}, we have

    sup_{k∈T^c} ‖z_k‖ ≤ 4μ|T|/m.
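As a numerical sanity check of the O(N/m^{p−1}) scaling in Theorem 3, one can build a synthetic kernel matrix whose eigenvalues follow a p-power law and whose eigenvectors come from a random orthogonal matrix (hence incoherent with high probability), and then track the Nyström error for growing m. This is our own illustration, reusing the helpers from the earlier sketch; it is not an experiment from the paper.

    import numpy as np
    from scipy.stats import ortho_group   # random orthogonal (incoherent w.h.p.)

    def powerlaw_kernel(N, p, rng=None):
        """PSD matrix with eigenvalues proportional to N * i^{-p} and a random
        eigenvector matrix; diagonal entries are O(1) as assumed in the paper."""
        lam = N * np.arange(1, N + 1, dtype=float) ** (-p)
        V = ortho_group.rvs(N, random_state=rng)
        return (V * lam) @ V.T

    N, p = 2000, 2.5
    K = powerlaw_kernel(N, p, rng=0)
    for m in (50, 100, 200, 400):
        err = spectral_error(K, nystrom_approximation(K, m, rng=0))
        print(m, err, N / m ** (p - 1))   # error should track the N/m^{p-1} scale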

6 6 Using Corollary, we have the following bound for EH a, r Theore 7: If r > axc ab ln3 3, 4 ln /γ and ln µ ax rc ab ln3 3, 6 < µ r, γ then, with a probability 3, we have EH a, r 6µ r i=r+ Reark 4: Before presenting the proof, it is worthwhile to copare the approxiation error bound by cobing the results in Theore 7 and Proposition to Gittens relative error bound [4], ie, with a probability at least δ it holds that K K K b Kb r+ + provided 8τ r logr/δ, where τ is a coherence easure of the top-r eigen-space of K defined by τ = r ax r i j= V ij In the case when the eigenvalues decay fast eg, eigenvalues follow a power law, we have i=r+ i r+, and therefore our bound could be significantly better than Gittens relative error bound On the other hand, when eigenvalues follow a flat distribution eg, i r+ for all i [r+, ], we have i=r+ i r+, and therefore our bound is worse than Gittens relative bound by only a factor of µ r Proof of Theore 7: For the sake of siplicity, we assue that the first exaples are sapled, ie, D = x,, x For any g Hb r, we have g = r w i / i ϕ i, with r Below, we will ake w i specific construction of f based on g that ensures a sall approxiation error Let f be f = = = a j κx j, j= ϕ i / i b i / i ϕ i, i a j V j,i j= where b i = j= a jv j,i, i =,,, and the value of a = a,, a will be given later Define T =,, r and S =,, Under the condition that rµ ax C a, C b ln3 3 rµ ax C a ln r, C b ln3 3, Theore 5 holds, and therefore with a probability at least 3, in V S,T V S,T ax V 3 S,T V S,T 7 We construct a as a = V S,T [ V S,T V S,T ] w, where w = w,, w r Since b = V S, a = V S, V S,T V S,T V S,T w, where b = b,, b, it is straightforward to see that b j = w j for j T Using the result fro Corollary, we have, with a probability at least 3, r ax b j ax z j w 4µ j T c j T c, where z j is the j-th row of atrix VS, V S,T V S,T V S,T We thus obtain f g 6µ r H κ = i b i ϕ i i Hence, i T c / H κ EH a, r = ax in f g g H r H b 6µ r a i=r+ i=r+ Finally, we show the proof of Theore 3 using Theore 7 Proof of Theore 3: Let r = µ C ab ln3 3, then µ rc ab ln3 3 < µ r, where the right inequality follows that r µ C ab ln3 3, and > 4µ Cab ln 3 3 Then the conditions in Theore 7 hold and we have K K K b Kb ax EH a, r, r+ 6µ r ax, i i=r+ Since ax6µ r/, O due to the specific value we choose for r, and i=r+ i O/r p due to the power law distribution, then K K K b Kb O r p Õ p IV APPLICATIO OF THE YSTRÖM METHOD TO KEREL CLASSIFICATIO Although the yströ ethod was proposed in [] to speed up kernel achine, few studies exaine the application of the yströ ethod to kernel classification In fact, to the best of our knowledge, [] and [] are the only two pieces of work that explicitly explore the yströ ethod for kernel classification The key idea of both works is to apply the yströ ethod to approxiate the kernel atrix with a low rank atrix in order to reduce the coputational cost More specifically, we consider the following optiization proble for kernel classification in L f = f H κ + i ly i fx i, 8 where y i, + is the class label assigned to instance x i, and lz is a convex loss function To facilitate our analysis, we assue i lz is strongly convex with odulus σ, ie l z σ 5, and ii lz is Lipschitz continuous, ie 5 Loss functions such as square loss used for regression and logit function used for logistic regression are strongly convex

Using the convex conjugate of the loss function l(z), denoted by l*(α) with α ∈ Ω, where Ω is the domain of the dual variable α, we can cast the problem in (8) into the following optimization problem over α:

    max_{α_i∈Ω}  −(1/N) Σ_{i=1}^N l*(α_i) − (1/(2λN²)) (α∘y)^T K (α∘y),    (9)

with the solution f̂ given by f̂ = (1/(λN)) Σ_{i=1}^N α_i y_i κ(x_i, ·). By Fenchel conjugacy, we have max_{α∈Ω} |α| ≤ C because |l'(z)| ≤ C. To reduce the computational cost, [11] and [12] suggest replacing the kernel matrix K with its low-rank approximation K̃ = K_b K̂^† K_b^T, leading to the following optimization problem for α:

    max_{α_i∈Ω}  −(1/N) Σ_{i=1}^N l*(α_i) − (1/(2λN²)) (α∘y)^T K̃ (α∘y).    (10)

One main problem with this approach is that, although it simplifies the computation involving the kernel matrix, it does not simplify the classifier f̂, because the number of support vectors after applying the Nyström method is not guaranteed to be small [30], [31], leading to a high computational cost when performing prediction on test examples. We address this difficulty by presenting a new approach that exploits the Nyström method for kernel classification. We show, via an analysis of the excess risk, that the number of support vectors needed to achieve a performance similar to the kernel classifier learned with the full kernel matrix is bounded by N^{2p/(p²−1)}, where p characterizes the power law of the eigenvalue distribution of the kernel matrix.

Similar to the previous analysis, we randomly select a subset of m training examples, denoted by D̂ = {x̂_1, ..., x̂_m}, and restrict the solution f to the subspace H_a = span( κ(x̂_1, ·), ..., κ(x̂_m, ·) ), leading to the following optimization problem:

    min_{f∈H_a}  L̃(f) = (λ/2) ‖f‖²_{H_κ} + (1/N) Σ_{i=1}^N l( y_i f(x_i) ).    (11)

The following proposition shows that the optimal solution to (11) is closely related to the optimal solution to (10).

Proposition 4: The solution f̃ to (11) is given by f̃ = Σ_{i=1}^m z̃_i κ(x̂_i, ·), where z̃ = (1/(λN)) K̂^† K_b^T (α̃∘y) and α̃ is the optimal solution to (10).

It is important to note that the classifier obtained from (11) is supported only by the m sampled training examples in D̂, which significantly reduces the complexity of the kernel classifier compared to the approach suggested in [11], [12]. We also note that the proposed approach is equivalent to learning a linear classifier after representing each instance x by the vector φ(x) = D̂^{−1/2} V̂^T ( κ(x̂_1, x), ..., κ(x̂_m, x) )^T, where D̂ is a diagonal matrix of the non-zero eigenvalues of K̂ and V̂ is the corresponding eigenvector matrix. Although this idea has already been adopted by practitioners, we are unable to find any reference on its empirical study. The remainder of this work shows that this approach can have good generalization performance provided that the eigenvalues of the kernel matrix follow a skewed distribution.

Below, we develop the generalization error bound for the classifier learned from (11). Let f̂ and f̂_a be the optimal solutions to (8) and (11), respectively. Let f* be the optimal classifier that minimizes the expected loss, i.e., f* = arg min_{f∈H_κ} P_l(f) := E_{x,y}[ l(y f(x)) ]. Let ‖f‖²_{L₂} = E_x[ f(x)² ] denote the squared L₂ norm of f. In order to obtain a tight bound, we exploit the technique of local Rademacher complexity [32], [33]. Define ψ as

    ψ(δ) = ( (1/N) Σ_{i=1}^N min( δ², λ_i/N ) )^{1/2}.

Let ε̂_N be the solution to ε̂_N² = ψ(ε̂_N), where the existence and uniqueness of ε̂_N follow from the sub-root property of ψ(δ) [32]. Finally, we define

    ε̃_N = max( ε̂_N², 16 ln N / N ).    (12)

Theorem 8: Assume that, with a probability 1 − 3N^{−1}, E(H_a) ≤ Γ(N, m), where Γ(N, m) is some function of N and m. Assume that N is sufficiently large such that

    e^{−N} ≤ max( ‖f̂_a‖_{H_κ}, ‖f*‖_{H_κ} ) √(ln N / N)  and  e^{−N} ≤ max( ‖f̂_a‖_{L₂}, ‖f*‖_{L₂} ) (16 ln N / N).

Then, with a probability at least 1 − 4N^{−1} − 3N^{−1}, we have

    P_l(f̂_a) ≤ P_l(f*) + (λ/2) ‖f*‖²_{H_κ} + (C²/(λN)) Γ(N, m) + (C₁²C²/(4σ)) ε̃_N + C₁C ε̃_N + (C₁C/σ) e^{−N},

where ε̃_N is given in (12) and C₁ is a constant independent of N and m. By choosing the λ that minimizes the above bound, we have

    P_l(f̂_a) ≤ P_l(f*) + 4 ‖f*‖_{H_κ} C √( Γ(N, m)/N ) + (C₁²C²/(4σ)) ε̃_N + C₁C ε̃_N + (C₁C/σ) e^{−N}.

Remark 5: In the case when the eigenvalues of the kernel matrix follow a p-power law with p > 1, we have ε̃_N = O(N^{−p/(p+1)}) according to [28], and Γ(N, m) = O(N/m^{p−1}) according to Theorem 3. Applying these results to Theorem 8, the generalization performance of f̂_a becomes

    P_l(f̂_a) ≤ P_l(f*) + (λ/2) ‖f*‖²_{H_κ} + (C₂C²)/(λ m^{p−1}) + (C₁C/σ) e^{−N} + C₃C N^{−p/(p+1)} + C₄C² N^{−p/(p+1)},    (13)

where C₂, C₃, and C₄ are constants independent of N and m. By choosing the λ that minimizes the bound in (13), we have

    P_l(f̂_a) ≤ P_l(f*) + 4 ‖f*‖_{H_κ} C √(C₂) m^{−(p−1)/2} + C₃C N^{−p/(p+1)} + C₄C² N^{−p/(p+1)} + (C₁C/σ) e^{−N}
              = P_l(f*) + O( N^{−p/(p+1)} + m^{−(p−1)/2} ).

As indicated by the above inequality, when the eigenvalues of the kernel matrix follow a p-power law, by setting m = N^{2p/(p²−1)} we achieve a performance similar to that of the full kernel classifier, i.e., O(N^{−p/(p+1)}). In other words, we can construct a kernel classifier, without sacrificing its generalization performance, with no more than N^{2p/(p²−1)} support vectors, which can be significantly smaller than N when p > 1 + √2. For the example of the kernel that generates the Sobolev space W^{α,2}(T^d) of smoothness α > d/2, where T^d is the d-dimensional torus, the eigenvalues follow a p-power law with p = 2α > d, which is larger than 1 + √2 when d ≥ 3.
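The classifier in (11) can be implemented by mapping every example to the Nyström feature vector φ(x) = D̂^{−1/2} V̂^T ( κ(x̂_1, x), ..., κ(x̂_m, x) )^T and training a regularized linear model on these features. The sketch below is our own rough illustration of that recipe; the names are ours, and scikit-learn's logistic regression is only an approximate stand-in for the (λ/2)‖f‖²_{H_κ} regularization in (11).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def nystrom_features(X, X_hat, kernel, tol=1e-12):
        """phi(x) = D_hat^{-1/2} V_hat^T [kappa(x_hat_1, x), ..., kappa(x_hat_m, x)]^T,
        computed for every row of X; X_hat holds the m sampled examples."""
        K_hat = kernel(X_hat, X_hat)                  # m x m
        lam, V = np.linalg.eigh(K_hat)
        keep = lam > tol                              # drop the numerically zero part
        D_inv_sqrt = 1.0 / np.sqrt(lam[keep])
        K_bx = kernel(X, X_hat)                       # n x m similarities to D_hat
        return K_bx @ V[:, keep] * D_inv_sqrt         # n x rank(K_hat)

    def rbf(A, B, gamma=1.0):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = np.where(X[:, 0] + 0.3 * rng.normal(size=500) > 0, 1, -1)
    m = 40
    X_hat = X[rng.choice(len(X), size=m, replace=False)]
    clf = LogisticRegression(C=1.0).fit(nystrom_features(X, X_hat, rbf), y)

At prediction time the resulting classifier only evaluates the kernel against the m sampled examples, which is exactly the reduction in the number of support vectors that the analysis above quantifies.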

V. CONCLUSION

We have developed new methods for analyzing the approximation error of the Nyström method. We show that the approximation error can be improved to O(N/m^{1−ρ}) when there is a large eigengap in the spectrum of the kernel matrix, where ρ ∈ (0, 1/2) is introduced to characterize the eigengap. When the eigenvalues of the kernel matrix follow a p-power law, the approximation error is further reduced to O(N/m^{p−1}) under an incoherence assumption. We also develop a kernel classification approach based on the Nyström method and show that, when the eigenvalues of the kernel matrix follow a p-power law with p > 1, we can reduce the number of support vectors to N^{2p/(p²−1)}, which can be significantly less than N if p is large, without seriously sacrificing the generalization performance.

ACKNOWLEDGMENT

Rong Jin and Mehrdad Mahdavi are supported in part by an Office of Naval Research (ONR) award, and Yu-Feng Li and Zhi-Hua Zhou are partially supported by the National Fundamental Research Program of China (2010CB327903) and the National Science Foundation of China (61021062).

REFERENCES

[1] C. Williams and M. Seeger, "Using the Nyström method to speed up kernel machines," in Advances in Neural Information Processing Systems 13. MIT Press, 2001.
[2] P. Drineas and M. W. Mahoney, "On the Nyström method for approximating a Gram matrix for improved kernel-based learning," Journal of Machine Learning Research, vol. 6, 2005.
[3] C. Fowlkes, S. Belongie, F. Chung, and J. Malik, "Spectral grouping using the Nyström method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, 2004.
[4] S. Kumar, M. Mohri, and A. Talwalkar, "Sampling techniques for the Nyström method," in Proceedings of the Conference on Artificial Intelligence and Statistics, 2009.
[5] V. D. Silva and J. B. Tenenbaum, "Global versus local methods in nonlinear dimensionality reduction," in Advances in Neural Information Processing Systems 15, 2003.
[6] J. C. Platt, "Fast embedding of sparse music similarity graphs," in Advances in Neural Information Processing Systems 16. MIT Press, 2004.
[7] A. Talwalkar, S. Kumar, and H. A. Rowley, "Large-scale manifold learning," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2008.
[8] K. Zhang, I. W. Tsang, and J. T. Kwok, "Improved Nyström low-rank approximation and error analysis," in Proceedings of the International Conference on Machine Learning, 2008.
[9] M.-A. Belabbas and P. J. Wolfe, "Spectral methods in machine learning and new strategies for very large data sets," Proceedings of the National Academy of Sciences of the USA, vol. 106, 2009.
[10] A. Talwalkar and A. Rostamizadeh, "Matrix coherence and the Nyström method," in Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2010.
[11] C. Cortes, M. Mohri, and A. Talwalkar, "On the impact of kernel approximation on learning accuracy," Journal of Machine Learning Research - Proceedings Track, vol. 9, pp. 113–120, 2010.
[12] M. Li, J. T. Kwok, and B.-L. Lu, "Making large-scale Nyström approximation possible," in Proceedings of the 27th International Conference on Machine Learning, 2010.
[13] L. W. Mackey, A. S. Talwalkar, and M. I. Jordan, "Divide-and-conquer matrix factorization," in Advances in Neural Information Processing Systems 24, 2011, pp. 1134–1142.
[14] A. Gittens, "The spectral norm error of the naive Nyström extension," CoRR, 2011.
[15] M. W. Mahoney, "Randomized algorithms for matrices and data," Foundations and Trends in Machine Learning, vol. 3, no. 2, pp. 123–224, 2011.
[16] S. Kumar, M. Mohri, and A. Talwalkar, "Sampling methods for the Nyström method," Journal of Machine Learning Research, vol. 13, pp. 981–1006, 2012.
[17] J. Chen and Y. Saad, "Lanczos vectors versus singular vectors for effective dimension reduction," IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 1091–1103, 2009.
[18] L. Wu, X. Ying, X. Wu, and Z.-H. Zhou, "Line orthogonality in adjacency eigenspace with application to community partition," in Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
[19] M. Ji, T. Yang, B. Lin, R. Jin, and J. Han, "A simple algorithm for semi-supervised learning with improved generalization error bound," in Proceedings of the 29th International Conference on Machine Learning, 2012.
[20] T. Yang, Y.-F. Li, M. Mahdavi, R. Jin, and Z.-H. Zhou, "Nyström method vs random Fourier features: A theoretical and empirical comparison," in Advances in Neural Information Processing Systems 25, 2012.
[21] A. Gittens and M. W. Mahoney, "Revisiting the Nyström method for improved large-scale machine learning," in Proceedings of the 30th International Conference on Machine Learning, 2013.
[22] F. Bach, "Sharp analysis of low-rank kernel matrix approximations," in Proceedings of the 26th Conference on Learning Theory, 2013.
[23] J. B. Hough, M. Krishnapur, Y. Peres, and B. Virag, "Determinantal processes and independence," Probability Surveys, vol. 3, pp. 206–229, 2006.
[24] P. Drineas, M. W. Mahoney, and S. Muthukrishnan, "Relative-error CUR matrix decompositions," SIAM Journal on Matrix Analysis and Applications, vol. 30, 2008.
[25] S. Smale and D.-X. Zhou, "Geometry on probability spaces," Constructive Approximation, vol. 30, pp. 311–323, 2009.
[26] E. Candès and J. Romberg, "Sparsity and incoherence in compressive sampling," Inverse Problems, vol. 23, no. 3, pp. 969–985, 2007.
[27] D. L. Donoho, "Compressed sensing," IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[28] V. Koltchinskii and M. Yuan, "Sparsity in multiple kernel learning," Annals of Statistics, vol. 38, 2010.
[29] M. Kloft and G. Blanchard, "The local Rademacher complexity of lp-norm multiple kernel learning," in Advances in Neural Information Processing Systems, 2011.
[30] O. Dekel and Y. Singer, "Support vector machines on a budget," in Advances in Neural Information Processing Systems, 2006.
[31] T. Joachims and C.-N. J. Yu, "Sparse kernel SVMs via cutting-plane training," Machine Learning, vol. 76, pp. 179–193, 2009.

[32] P. L. Bartlett, O. Bousquet, and S. Mendelson, "Local Rademacher complexities," Annals of Statistics, vol. 33, pp. 1497–1537, 2005.
[33] V. Koltchinskii, Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Springer, 2011.

APPENDIX A: PROOF OF THEOREM 1

We argue that there exists a kernel matrix K such that (i) all its diagonal entries equal 1, and (ii) the first m+1 eigenvalues of K are of order Ω(N/m). To see the existence of such a matrix, we sample m+1 vectors u_1, ..., u_{m+1}, where u_i ∈ R^N, from a Bernoulli distribution with Pr(u_{i,j} = +1) = Pr(u_{i,j} = −1) = 1/2. We then construct K as

    K = (1/(m+1)) Σ_{i=1}^{m+1} u_i u_i^T = (1/(m+1)) U U^T,    (14)

where U = (u_1, ..., u_{m+1}). First, since u_{i,j} = ±1, we have diag(u_i u_i^T) = 1, the vector of all ones, and therefore K_{i,i} = 1 for i ∈ [N]. Second, we show that with some probability 1 − δ, all non-zero eigenvalues of (1/N) U^T U are bounded between 1/2 and 3/2, i.e.,

    λ_min( (1/N) U^T U ) ≥ 1/2,  λ_max( (1/N) U^T U ) ≤ 3/2.    (15)

To prove (15), we use the concentration inequality in Proposition 3. We define ξ_i = z_i z_i^T, i = 1, ..., N, where z_i ∈ R^{m+1} is the i-th row of the matrix U, and take the norm in Proposition 3 to be the spectral norm of a matrix. Since every element of z_i is sampled from a Bernoulli distribution with equal probabilities of being ±1, we have E[z_i z_i^T] = I and ‖z_i z_i^T‖₂ = m+1. Thus, with a probability 1 − δ, we have

    ‖ (1/N) U^T U − I ‖₂ = ‖ (1/N) Σ_{i=1}^N ξ_i − E[ξ] ‖₂ ≤ 4(m+1) ln(2/δ) / √N.

When N > 64(m+1)²[ln 4]², for any sampled U, with 50% chance (δ = 1/2), we have ‖(1/N) U^T U − I‖₂ ≤ 1/2, which implies (15). With the bound in (15), and using the fact that the non-zero eigenvalues of U U^T equal the eigenvalues of U^T U, it is straightforward to see that the first m+1 eigenvalues of K are of order Ω(N/m). Up to this point, we have proved the existence of such a kernel matrix.

Next, we prove the lower bound for the constructed kernel matrix. Let V_{1:m+1} = (v_1, ..., v_{m+1}) be the first m+1 eigenvectors of K. We construct ĝ as follows. Let u = V_{1:m+1} a be a vector in the subspace span(v_1, ..., v_{m+1}) that satisfies the condition K_b^T u = 0. The existence of such a vector is guaranteed because rank(K_b^T V_{1:m+1}) ≤ m < m+1. We normalize a such that ‖a‖ = 1. Then we let ĝ = Σ_{i=1}^N u_i κ(x_i, ·) = Σ_{i=1}^{m+1} w_i √λ_i φ_i, where w = V_{1:m+1}^T u. It is easy to verify that (i) ĝ ∈ H_b, since ‖u‖ = ‖V_{1:m+1} a‖ = 1, and (ii) ĝ ⊥ H_a, since u^T K_b = 0. Using ĝ, we have

    E(H_a) ≥ min_{f∈H_a} ‖f − ĝ‖²_{H_κ} = ‖ĝ‖²_{H_κ} = Σ_{i=1}^{m+1} w_i² λ_i ≥ λ_{m+1} ‖w‖² = Ω(N/m),

where we use ‖w‖ = ‖V_{1:m+1}^T V_{1:m+1} a‖ = ‖a‖ = 1. We complete the proof using the fact that E(H_a) = ‖K − K_b K̂^† K_b^T‖₂.

APPENDIX B: PROOF OF COROLLARY 1

Define ξ_{x̂_i} to be the rank-one linear operator ξ_{x̂_i}[f] = κ(x̂_i, ·) f(x̂_i). Apparently, L_m = (1/m) Σ_{i=1}^m ξ_{x̂_i} and E[ξ_{x̂_i}] = L_N. We complete the proof by using the result of Proposition 3 and the fact

    ‖ξ_{x̂_k}‖²_HS = Σ_{i,j=1}^N ⟨ φ_i, κ(x̂_k, ·) φ_j(x̂_k) ⟩²_{H_κ} = Σ_{i,j=1}^N φ_i(x̂_k)² φ_j(x̂_k)² = κ(x̂_k, x̂_k)² ≤ 1,

where the last equality follows from equation (3).

APPENDIX C: PROOF OF COROLLARY 2

We choose a = √(ln N / γ) in Theorem 6. The stated conditions on m ensure that a satisfies the requirement of Theorem 6, and by setting δ = N^{−3} in Theorem 5, the condition of Theorem 5 holds as well. Together these imply

    Pr( sup_{k∈T^c} ‖z_k‖ ≥ μ|T|/m + aσ√|T|/m ) ≤ 3 exp(−γ a²) + Pr( ‖ (N/m) V_{S,T}^T V_{S,T} − I ‖ ≥ 1/2 ) ≤ 3/N + N^{−3}.

From this we have, with a probability 1 − 3N^{−1},

    sup_{k∈T^c} ‖z_k‖ ≤ μ|T|/m + 3μ|T|/m = 4μ|T|/m,

where the last step uses the fact that a σ √|T|/m ≤ 3μ|T|/m under the stated conditions on m.
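The construction used in the proof of Theorem 1 (Appendix A) is easy to reproduce numerically: with U a random ±1 matrix, K = UU^T/(m+1) has unit diagonal and m+1 non-zero eigenvalues concentrating around N/(m+1), so no m-column Nyström approximation can push the spectral error below the Ω(N/m) scale. The sketch below is our own check (with N only roughly at the scale the proof requires), reusing the helpers from the Section II sketch.

    import numpy as np

    def hard_instance(N, m, rng=None):
        """K = U U^T / (m + 1) with U having i.i.d. +/-1 entries: unit diagonal,
        and m+1 nonzero eigenvalues concentrating around N/(m+1)."""
        rng = np.random.default_rng(rng)
        U = rng.choice([-1.0, 1.0], size=(N, m + 1))
        return (U @ U.T) / (m + 1), U

    N, m = 3000, 4
    K, U = hard_instance(N, m, rng=1)
    print(np.allclose(np.diag(K), 1.0))                          # diagonal entries are 1
    print(np.linalg.eigvalsh(U.T @ U) / (m + 1) / (N / m))       # m+1 nonzero eigenvalues, all Theta(N/m)
    print(spectral_error(K, nystrom_approximation(K, m, rng=1)) / (N / m))   # stays Omega(1)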

APPENDIX D: PROOF OF PROPOSITION 4

Since

    (1/N) Σ_{i=1}^N l( y_i f(x_i) ) = (1/N) Σ_{i=1}^N max_{α_i∈Ω} ( α_i y_i f(x_i) − l*(α_i) ),

we rewrite the optimization problem in (11) as the convex–concave optimization problem

    min_{f∈H_a} max_{α_i∈Ω}  (λ/2) ‖f‖²_{H_κ} + (1/N) Σ_{i=1}^N ( α_i y_i f(x_i) − l*(α_i) ).

Since f ∈ H_a, we write f = Σ_{i=1}^m z_i κ(x̂_i, ·), resulting in the following optimization problem:

    min_{z∈R^m} max_{α_i∈Ω}  (λ/2) z^T K̂ z + (1/N) (α∘y)^T K_b z − (1/N) Σ_{i=1}^N l*(α_i).

Since the above problem is convex in z and concave in α, we can switch minimization with maximization. We complete the proof by carrying out the minimization over z.

APPENDIX E: PROOF OF THEOREM 8

To simplify the presentation, we introduce the notations P̂_l(f) = (1/N) Σ_{i=1}^N l( y_i f(x_i) ) and Λ(f) = P_l(f) − P_l(f*). Using P̂_l(f), we can write L(f) = P̂_l(f) + (λ/2) ‖f‖²_{H_κ}.

We first prove that L̃(f̂_a) ≤ L(f̂) + (C²/(2λN)) E(H_a), where max_{z∈Ω} |z| ≤ C. Note that, by the dual formulations (9) and (10),

    L(f̂) = max_{α_i∈Ω} −(1/N) Σ_i l*(α_i) − (1/(2λN²)) (α∘y)^T K (α∘y),

and L̃(f̂_a) is the same expression with K replaced by K̃ = K_b K̂^† K_b^T. Then

    L̃(f̂_a) ≤ max_{α_i∈Ω} [ −(1/N) Σ_i l*(α_i) − (1/(2λN²)) (α∘y)^T K (α∘y) ] + max_{α_i∈Ω} (1/(2λN²)) (α∘y)^T (K − K̃) (α∘y)
            ≤ L(f̂) + (C²/(2λN)) ‖K − K̃‖₂ = L(f̂) + (C²/(2λN)) E(H_a),

since ‖α∘y‖² ≤ N C². We then proceed as follows:

    (λ/2) ‖f̂_a‖²_{H_κ} + P_l(f̂_a) ≤ L̃(f̂_a) + (P_l − P̂_l)(f̂_a)
                                   ≤ L(f̂) + (C²/(2λN)) E(H_a) + (P_l − P̂_l)(f̂_a)
                                   ≤ P̂_l(f*) + (λ/2) ‖f*‖²_{H_κ} + (C²/(2λN)) E(H_a) + (P_l − P̂_l)(f̂_a),

where the last inequality follows from the fact that f̂ is the minimizer of P̂_l(f) + (λ/2) ‖f‖²_{H_κ}. Hence,

    Λ(f̂_a) ≤ (λ/2) ( ‖f*‖²_{H_κ} − ‖f̂_a‖²_{H_κ} ) + (C²/(2λN)) E(H_a) + (P_l − P̂_l)(f̂_a) − (P_l − P̂_l)(f*).

Let r = ‖f* − f̂_a‖²_{L₂} and R = ‖f* − f̂_a‖²_{H_κ}. Define G(r, R) = { f ∈ H_κ : ‖f − f*‖²_{L₂} ≤ r, ‖f − f*‖²_{H_κ} ≤ R }. Using the domain G, we rewrite the bound for Λ(f̂_a) as

    Λ(f̂_a) ≤ (λ/2) ( ‖f*‖²_{H_κ} − ‖f̂_a‖²_{H_κ} ) + (C²/(2λN)) E(H_a) + sup_{f∈G(r,R)} (P_l − P̂_l)(f) − (P_l − P̂_l)(f*).

Since ε̃_N √r ≥ e^{−N} and ε̃_N √R ≥ e^{−N}, using Lemma 9 of [28] we have, with a probability 1 − 3N^{−1},

    sup_{f∈G(r,R)} (P_l − P̂_l)(f) − (P_l − P̂_l)(f*) ≤ C₁C ( √r ε̃_N^{1/2} + √R ε̃_N + e^{−N} ),

where C₁ is a constant independent of N and m. Combining this with the event E(H_a) ≤ Γ(N, m), applying Young's inequality (ab ≤ a²/(2ϵ) + ϵb²/2) twice to the terms C₁C√r ε̃_N^{1/2} and C₁C√R ε̃_N, using ‖f̂_a − f*‖²_{H_κ} ≤ 2‖f̂_a‖²_{H_κ} + 2‖f*‖²_{H_κ}, and using the strong convexity of l(z), which implies (σ/2) ‖f̂_a − f*‖²_{L₂} ≤ Λ(f̂_a) because f* is the minimizer of P_l(f) = E_{x,y}[ l(y f(x)) ], we can absorb the terms in r and R and obtain, up to a factor 1/2 of Λ(f̂_a) on the right-hand side,

    Λ(f̂_a) ≤ (λ/2) ‖f*‖²_{H_κ} + (C²/(2λN)) Γ(N, m) + (C₁²C²/(4σ)) ε̃_N + C₁C ε̃_N + (C₁C/σ) e^{−N} + (1/2) Λ(f̂_a).

Thus, with a probability at least 1 − 4N^{−1} − 3N^{−1}, we have

    P_l(f̂_a) ≤ P_l(f*) + (λ/2) ‖f*‖²_{H_κ} + (C²/(λN)) Γ(N, m) + (C₁²C²/(4σ)) ε̃_N + C₁C ε̃_N + (C₁C/σ) e^{−N}.

We complete the proof by minimizing over λ on the right-hand side of the above inequality.

Rong Jin is a professor in the Department of Computer Science and Engineering at Michigan State University. He has been working in the areas of statistical machine learning and its application to information retrieval. He has extensive research experience with a variety of machine learning algorithms, such as conditional exponential models, support vector machines, boosting, and optimization, for different applications including information retrieval. Dr. Jin is an associate editor of the ACM Transactions on Knowledge Discovery from Data and received the NSF CAREER Award in 2006. He obtained his PhD degree from Carnegie Mellon University in 2003, and received the best paper award at the Conference on Learning Theory (COLT) in 2012.

Tianbao Yang received the PhD degree in Computer Science from Michigan State University in 2012. He joined GE Global Research in 2012 and works as a machine learning researcher. Dr. Yang has broad interests in machine learning and has focused on several research topics, including social network analysis and large-scale optimization in machine learning. He has published over 20 papers in prestigious machine learning conferences and journals. He won the Mark Fulk Best Student Paper Award at the 25th Conference on Learning Theory (COLT) in 2012. He has also served on the program committees of several conferences and journals, including AAAI, CIKM, IJCAI'13, ACML, IJCNN'13, TKDD, and TKDE.

Zhi-Hua Zhou (S'00–M'01–SM'06) received the BSc, MSc and PhD degrees in computer science from Nanjing University, China, in 1996, 1998 and 2000, respectively, all with the highest honors. He joined the Department of Computer Science & Technology at Nanjing University as an assistant professor in 2001, and is currently professor and Director of the LAMDA group. His research interests are mainly in artificial intelligence, machine learning, data mining, pattern recognition and multimedia information retrieval. In these areas he has published more than 100 papers in leading international journals or conference proceedings, and holds a number of patents. He has won various awards and honors, including the IEEE CIS Outstanding Early Career Award, the National Science & Technology Award for Young Scholars of China, the Fok Ying Tung Young Professorship Award, the Microsoft Young Professorship Award, and nine international journal/conference paper or competition awards. He is an Associate Editor-in-Chief of the Chinese Science Bulletin, an Associate Editor of the ACM Transactions on Intelligent Systems and Technology, and serves on the editorial boards of various other journals. He is the founder and Steering Committee Chair of ACML, and a Steering Committee member of PAKDD and PRICAI. He serves or has served as General Chair/Co-chair of ACML, ADMA and PCM'13, Program Chair/Co-Chair of PAKDD'07, PRICAI'08, ACML'09 and SDM'13, Workshop Chair of KDD, and Program Vice Chair or Area Chair of various conferences, and has chaired many domestic conferences in China. He is the Chair of the Machine Learning Technical Committee of the Chinese Association of Artificial Intelligence, Chair of the Artificial Intelligence & Pattern Recognition Technical Committee of the China Computer Federation, Vice Chair of the Data Mining Technical Committee of the IEEE Computational Intelligence Society, and Chair of the IEEE Computer Society Nanjing Chapter. He is a fellow of the IAPR, the IEEE, and the IET/IEE.

Mehrdad Mahdavi received the MSc degree from Sharif University of Technology, Tehran, Iran. Since 2009, he has been pursuing a PhD in Computer Science at Michigan State University; before that he spent two years as a PhD candidate at Sharif University of Technology. His current research interests include machine learning, focused on online stochastic convex optimization, learning theory, and learning in games. He won the Mark Fulk Best Student Paper Award at the Conference on Learning Theory (COLT) in 2012.

Yu-Feng Li received the BSc and PhD degrees in computer science from Nanjing University, China, in 2006 and 2013, respectively. His main research interests include machine learning and data mining. He won the Microsoft Fellowship Award in 2009. He has been a Program Committee member of several conferences, including IJCAI, ICDM, and AAAI.


More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

On the Use of A Priori Information for Sparse Signal Approximations

On the Use of A Priori Information for Sparse Signal Approximations ITS TECHNICAL REPORT NO. 3/4 On the Use of A Priori Inforation for Sparse Signal Approxiations Oscar Divorra Escoda, Lorenzo Granai and Pierre Vandergheynst Signal Processing Institute ITS) Ecole Polytechnique

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory TTIC 31120 Prof. Nati Srebro Lecture 2: PAC Learning and VC Theory I Fro Adversarial Online to Statistical Three reasons to ove fro worst-case deterinistic

More information

Convex Programming for Scheduling Unrelated Parallel Machines

Convex Programming for Scheduling Unrelated Parallel Machines Convex Prograing for Scheduling Unrelated Parallel Machines Yossi Azar Air Epstein Abstract We consider the classical proble of scheduling parallel unrelated achines. Each job is to be processed by exactly

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Symmetrization and Rademacher Averages

Symmetrization and Rademacher Averages Stat 928: Statistical Learning Theory Lecture: Syetrization and Radeacher Averages Instructor: Sha Kakade Radeacher Averages Recall that we are interested in bounding the difference between epirical and

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data

Supplementary to Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data Suppleentary to Learning Discriinative Bayesian Networks fro High-diensional Continuous Neuroiaging Data Luping Zhou, Lei Wang, Lingqiao Liu, Philip Ogunbona, and Dinggang Shen Proposition. Given a sparse

More information

Support Vector Machines. Maximizing the Margin

Support Vector Machines. Maximizing the Margin Support Vector Machines Support vector achines (SVMs) learn a hypothesis: h(x) = b + Σ i= y i α i k(x, x i ) (x, y ),..., (x, y ) are the training exs., y i {, } b is the bias weight. α,..., α are the

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Journal of Machine Learning Research 5 (2004) 529-547 Subitted 1/03; Revised 8/03; Published 5/04 Coputable Shell Decoposition Bounds John Langford David McAllester Toyota Technology Institute at Chicago

More information

Bipartite subgraphs and the smallest eigenvalue

Bipartite subgraphs and the smallest eigenvalue Bipartite subgraphs and the sallest eigenvalue Noga Alon Benny Sudaov Abstract Two results dealing with the relation between the sallest eigenvalue of a graph and its bipartite subgraphs are obtained.

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate The Siplex Method is Strongly Polynoial for the Markov Decision Proble with a Fixed Discount Rate Yinyu Ye April 20, 2010 Abstract In this note we prove that the classic siplex ethod with the ost-negativereduced-cost

More information

arxiv: v1 [cs.ds] 29 Jan 2012

arxiv: v1 [cs.ds] 29 Jan 2012 A parallel approxiation algorith for ixed packing covering seidefinite progras arxiv:1201.6090v1 [cs.ds] 29 Jan 2012 Rahul Jain National U. Singapore January 28, 2012 Abstract Penghui Yao National U. Singapore

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes

Rademacher Complexity Margin Bounds for Learning with a Large Number of Classes Radeacher Coplexity Margin Bounds for Learning with a Large Nuber of Classes Vitaly Kuznetsov Courant Institute of Matheatical Sciences, 25 Mercer street, New York, NY, 002 Mehryar Mohri Courant Institute

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Testing Properties of Collections of Distributions

Testing Properties of Collections of Distributions Testing Properties of Collections of Distributions Reut Levi Dana Ron Ronitt Rubinfeld April 9, 0 Abstract We propose a fraework for studying property testing of collections of distributions, where the

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION

RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION RANDOM GRADIENT EXTRAPOLATION FOR DISTRIBUTED AND STOCHASTIC OPTIMIZATION GUANGHUI LAN AND YI ZHOU Abstract. In this paper, we consider a class of finite-su convex optiization probles defined over a distributed

More information

Lecture 9 November 23, 2015

Lecture 9 November 23, 2015 CSC244: Discrepancy Theory in Coputer Science Fall 25 Aleksandar Nikolov Lecture 9 Noveber 23, 25 Scribe: Nick Spooner Properties of γ 2 Recall that γ 2 (A) is defined for A R n as follows: γ 2 (A) = in{r(u)

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab

Support Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe

. The univariate situation. It is well-known for a long tie that denoinators of Pade approxiants can be considered as orthogonal polynoials with respe PROPERTIES OF MULTIVARIATE HOMOGENEOUS ORTHOGONAL POLYNOMIALS Brahi Benouahane y Annie Cuyt? Keywords Abstract It is well-known that the denoinators of Pade approxiants can be considered as orthogonal

More information

A Note on Online Scheduling for Jobs with Arbitrary Release Times

A Note on Online Scheduling for Jobs with Arbitrary Release Times A Note on Online Scheduling for Jobs with Arbitrary Release Ties Jihuan Ding, and Guochuan Zhang College of Operations Research and Manageent Science, Qufu Noral University, Rizhao 7686, China dingjihuan@hotail.co

More information

Optimal Jamming Over Additive Noise: Vector Source-Channel Case

Optimal Jamming Over Additive Noise: Vector Source-Channel Case Fifty-first Annual Allerton Conference Allerton House, UIUC, Illinois, USA October 2-3, 2013 Optial Jaing Over Additive Noise: Vector Source-Channel Case Erah Akyol and Kenneth Rose Abstract This paper

More information

The Hilbert Schmidt version of the commutator theorem for zero trace matrices

The Hilbert Schmidt version of the commutator theorem for zero trace matrices The Hilbert Schidt version of the coutator theore for zero trace atrices Oer Angel Gideon Schechtan March 205 Abstract Let A be a coplex atrix with zero trace. Then there are atrices B and C such that

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Tight Complexity Bounds for Optimizing Composite Objectives

Tight Complexity Bounds for Optimizing Composite Objectives Tight Coplexity Bounds for Optiizing Coposite Objectives Blake Woodworth Toyota Technological Institute at Chicago Chicago, IL, 60637 blake@ttic.edu Nathan Srebro Toyota Technological Institute at Chicago

More information

Geometrical intuition behind the dual problem

Geometrical intuition behind the dual problem Based on: Geoetrical intuition behind the dual proble KP Bennett, EJ Bredensteiner, Duality and Geoetry in SVM Classifiers, Proceedings of the International Conference on Machine Learning, 2000 1 Geoetrical

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

Learnability and Stability in the General Learning Setting

Learnability and Stability in the General Learning Setting Learnability and Stability in the General Learning Setting Shai Shalev-Shwartz TTI-Chicago shai@tti-c.org Ohad Shair The Hebrew University ohadsh@cs.huji.ac.il Nathan Srebro TTI-Chicago nati@uchicago.edu

More information

DESIGN OF THE DIE PROFILE FOR THE INCREMENTAL RADIAL FORGING PROCESS *

DESIGN OF THE DIE PROFILE FOR THE INCREMENTAL RADIAL FORGING PROCESS * IJST, Transactions of Mechanical Engineering, Vol. 39, No. M1, pp 89-100 Printed in The Islaic Republic of Iran, 2015 Shira University DESIGN OF THE DIE PROFILE FOR THE INCREMENTAL RADIAL FORGING PROCESS

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

arxiv: v1 [cs.lg] 8 Jan 2019

arxiv: v1 [cs.lg] 8 Jan 2019 Data Masking with Privacy Guarantees Anh T. Pha Oregon State University phatheanhbka@gail.co Shalini Ghosh Sasung Research shalini.ghosh@gail.co Vinod Yegneswaran SRI international vinod@csl.sri.co arxiv:90.085v

More information

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010

A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING. Emmanuel J. Candès Yaniv Plan. Technical Report No November 2010 A PROBABILISTIC AND RIPLESS THEORY OF COMPRESSED SENSING By Eanuel J Candès Yaniv Plan Technical Report No 200-0 Noveber 200 Departent of Statistics STANFORD UNIVERSITY Stanford, California 94305-4065

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD

ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical and Matheatical Sciences 04,, p. 7 5 ON THE TWO-LEVEL PRECONDITIONING IN LEAST SQUARES METHOD M a t h e a t i c s Yu. A. HAKOPIAN, R. Z. HOVHANNISYAN

More information

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding

Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding IEEE TRANSACTIONS ON INFORMATION THEORY (SUBMITTED PAPER) 1 Design of Spatially Coupled LDPC Codes over GF(q) for Windowed Decoding Lai Wei, Student Meber, IEEE, David G. M. Mitchell, Meber, IEEE, Thoas

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information