Stability of Transductive Regression Algorithms
Corinna Cortes — Google Research, 76 Ninth Avenue, New York, NY 10011
Mehryar Mohri — Courant Institute of Mathematical Sciences and Google Research, 251 Mercer Street, New York, NY 10012
Dmitry Pechyony — Technion - Israel Institute of Technology, Haifa 32000, Israel
Ashish Rastogi (rastogi@cs.nyu.edu) — Courant Institute of Mathematical Sciences, 251 Mercer Street, New York, NY 10012

Keywords: transductive inference, stability, regression, learning theory

Abstract

This paper uses the notion of algorithmic stability to derive novel generalization bounds for several families of transductive regression algorithms, both by using convexity and closed-form solutions. Our analysis helps compare the stability of these algorithms. It suggests that several existing algorithms might not be stable but prescribes a technique to make them stable. It also reports the results of experiments with local transductive regression demonstrating the benefit of our stability bounds for model selection, in particular for determining the radius of the local neighborhood used by the algorithm.

1 Introduction

Many learning problems in information extraction, computational biology, natural language processing and other domains can be formulated as transductive inference problems (Vapnik, 1982). In the transductive setting, the learning algorithm receives both a labeled training set, as in the standard induction setting, and a set of unlabeled test points. The objective is to predict the labels of the test points. No other test points will ever be considered. This setting arises in a variety of applications. Often, the points to label are known but they have not been assigned a label due to the prohibitive cost of labeling. This motivates the use of transductive algorithms, which leverage the unlabeled data during training to improve learning performance.

[Appearing in Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 2008. Copyright 2008 by the author(s)/owner(s).]

This paper deals with transductive regression, which arises in problems such as predicting the real-valued labels of the nodes of a known graph in computational biology, or the scores associated with known documents in information extraction or search engine tasks. Several algorithms have been devised for the specific setting of transductive regression (Belkin et al., 2004b; Chapelle et al., 1999; Schuurmans & Southey, 2002; Cortes & Mohri, 2007). Several other algorithms introduced for transductive classification can in fact be viewed as transductive regression ones since their objective function is based on the squared loss, e.g., (Belkin et al., 2004a; 2004b). Cortes and Mohri (2007) also gave explicit VC-dimension generalization bounds for transductive regression that hold for all bounded loss functions and coincide with the tight classification bounds of Vapnik (1998) when applied to classification.

This paper presents novel algorithm-dependent generalization bounds for transductive regression. Since they are algorithm-specific, these bounds can often be tighter than bounds based on general complexity measures such as the VC-dimension. Our analysis is based on the notion of algorithmic stability. In Sec. 2, we give a formal definition of the transductive regression setting and the notion of stability for transduction. Our bounds generalize the stability bounds given by Bousquet and Elisseeff (2002) for the inductive setting and extend to regression the stability-based transductive classification bounds of (El-Yaniv & Pechyony, 2006). Standard concentration bounds such as McDiarmid's bound (McDiarmid, 1989) cannot be readily applied to the transductive regression setting since the points are not drawn independently but uniformly without replacement from a finite set. Instead, a generalization of McDiarmid's bound that holds for random variables sampled without replacement is used, as in (El-Yaniv & Pechyony, 2006). Sec. 3.1 gives a simpler proof of this bound. This concentration bound is used to derive a general transductive regression stability bound in Sec. 3.2. In Sec. 4, we present the stability coefficients for a family of local transductive regression algorithms. The analysis in this section is based on convexity. In Sec. 5, we study the stability of other transductive regression algorithms (Belkin et al., 2004a; Wu & Schölkopf, 2007; Zhou et al., 2004; Zhu et al., 2003) based on their closed-form solutions and propose a modification to the seemingly unstable algorithms that makes them stable and guarantees a non-trivial generalization bound. Finally, Sec. 6 shows the results of experiments with local transductive regression demonstrating the benefit of our stability bounds for model selection, in particular for determining the radius of the local neighborhood used by the algorithm. This provides a partial validation of our bounds and analysis.

2 Definitions

Let us first describe the transductive learning setting. Assume that a full sample X of m+u examples is given. The learning algorithm further receives the labels of a random subset S of X of size m, which serves as a training sample. The remaining u unlabeled examples, x_{m+1}, ..., x_{m+u} ∈ X, serve as test data. We denote by (S, T) a partitioning of X into the training set S and the test set T. The transductive learning problem consists of predicting accurately the labels y_{m+1}, ..., y_{m+u} of the test examples; no other test examples will ever be considered (Vapnik, 1998).

The specific problem where the labels are real-valued numbers, as in the case studied in this paper, is that of transductive regression. It differs from standard (induction) regression since the learning algorithm is given the unlabeled test examples beforehand and can thus exploit this information to improve performance. (Another natural setting for transduction is one where the training and test samples are both drawn according to the same distribution and where the test points, but not their labels, are made available to the learning algorithm. However, as pointed out by Vapnik (1998), any generalization bound in the setting we analyze directly yields a bound for this other setting, essentially by taking the expectation.)

We denote by c(h, x) the cost of an error of a hypothesis h on a point x labeled with y(x). The cost function commonly used in regression is the squared loss c(h, x) = (h(x) − y(x))². In the remainder of this paper, we will assume a squared loss, but many of our results generalize to other convex cost functions. The training and test errors of h are respectively

  R̂(h) = (1/m) Σ_{k=1}^m c(h, x_k)   and   R(h) = (1/u) Σ_{k=1}^u c(h, x_{m+k}).

The generalization bounds we derive are based on the notion of transductive algorithmic stability.

Definition 1 (Transduction β-stability). Let L be a transductive learning algorithm, let h denote the hypothesis returned by L for the partitioning (S, T) and h' the hypothesis returned for (S', T'). L is said to be uniformly β-stable with respect to the cost function c if there exists β ≥ 0 such that for any two partitionings (S, T) and (S', T') that differ in exactly one training (and thus test) point and for all x ∈ X,

  |c(h, x) − c(h', x)| ≤ β.   (1)

3 Transduction Stability Bounds

3.1 Concentration Bound for Sampling without Replacement

Stability-based generalization bounds in the inductive setting are based on McDiarmid's inequality (1989). In the transductive setting, the points are drawn uniformly without replacement and thus are not independent. Therefore, McDiarmid's concentration bound cannot be readily used. Instead, a generalization of McDiarmid's bound for sampling without replacement is needed, as in El-Yaniv and Pechyony (2006). We will denote by S a sequence of random variables S_1, ..., S_m and write S = x as a shorthand for the equalities S_i = x_i, i = 1, ..., m, and Pr[x | x_1^{i−1}, x_i] = Pr[S = x | S_1^{i−1} = x_1^{i−1}, S_i = x_i].

Theorem 1 (McDiarmid, 1989). Let S_1, ..., S_m be a sequence of random variables, each S_i taking values in the set X, and assume that a measurable function φ: X^m → R satisfies: for all i ∈ [1, m] and all x_1, ..., x_i, x'_i ∈ X,

  |E_S[φ | S_1 = x_1, ..., S_i = x_i] − E_S[φ | S_1 = x_1, ..., S_i = x'_i]| ≤ c_i.

Then, for all ε > 0,

  Pr[|φ − E[φ]| ≥ ε] ≤ 2 exp(−2ε² / Σ_{i=1}^m c_i²).

The following is a concentration bound for sampling without replacement needed to analyze the generalization of transductive algorithms.
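Before turning to that bound, Theorem 1 itself can be probed numerically. The sketch below (in Python; the choice of φ as a sample mean, and the constants m = 50, ε = 0.1, and the trial count, are illustrative assumptions, not values from the paper) compares the empirical two-sided tail of a bounded-differences statistic with the bound 2 exp(−2ε² / Σ c_i²):

```python
import math
import random

def mcdiarmid_bound(eps, c):
    """Two-sided tail bound of Theorem 1: 2 exp(-2 eps^2 / sum_i c_i^2)."""
    return 2.0 * math.exp(-2.0 * eps ** 2 / sum(ci ** 2 for ci in c))

# phi = mean of m independent Uniform[0,1] draws; changing any single draw
# moves phi by at most c_i = 1/m, so Theorem 1 applies with these constants.
m, eps, trials = 50, 0.1, 20000
c = [1.0 / m] * m

rng = random.Random(0)
deviations = 0
for _ in range(trials):
    phi = sum(rng.random() for _ in range(m)) / m
    if abs(phi - 0.5) >= eps:          # E[phi] = 1/2 for Uniform[0,1]
        deviations += 1

empirical = deviations / trials
bound = mcdiarmid_bound(eps, c)
print(empirical, bound)  # the empirical tail should not exceed the bound
```

For these constants the bound evaluates to 2e⁻¹ ≈ 0.736, while the observed deviation frequency is far smaller, as expected for a worst-case inequality.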
Theorem 2. Let x_1, ..., x_m be a sequence of random variables sampled from an underlying set X of m+u elements without replacement, and let φ: X^m → R be a symmetric function such that for all i ∈ [1, m], all x_1, ..., x_m ∈ X and all x'_i ∈ X,

  |φ(x_1, ..., x_m) − φ(x_1, ..., x_{i−1}, x'_i, x_{i+1}, ..., x_m)| ≤ c.

Then, for all ε > 0,

  Pr[|φ − E[φ]| ≥ ε] ≤ 2 exp(−2ε² / (α(m, u) c²)),

where α(m, u) = mu/(m+u−1/2) · 1/(1 − 1/(2 max{m, u})).

Proof. For a fixed i ∈ [1, m], let g(S_1^i) = E_S[φ | S_1^{i−1}, S_i = x_i] − E_S[φ | S_1^{i−1}, S_i = x'_i]. Then,

  g(x_1^i) = Σ_x φ(x_1^{i−1}, x_i, x) Pr[x | x_1^{i−1}, x_i] − Σ_x φ(x_1^{i−1}, x'_i, x) Pr[x | x_1^{i−1}, x'_i],

where x ranges over the sequences of values of S_{i+1}, ..., S_m. For uniform sampling without replacement, the probability terms can be written as

  Pr[x | x_1^{i−1}, x_i] = Π_{k=i+1}^m 1/(m+u−k+1) = u!/(m+u−i)!.

Thus, g(x_1^i) = u!/(m+u−i)! [Σ_x φ(x_1^{i−1}, x_i, x) − Σ_x φ(x_1^{i−1}, x'_i, x)]. To compute the expression between brackets, we divide the set of permutations {x} into two sets: those that contain x'_i and those that do not. If a permutation x contains x'_i, we match it up with the permutation obtained by exchanging the roles of x_i and x'_i. These two permutations contain exactly the same elements and, since the function φ is symmetric in its arguments, the difference in the value of the function on the two permutations is zero. In the other case, if a permutation x does not contain the element x'_i, then we simply match it up with the same permutation in the second sum. The matching permutations appearing in the summation are then x_1^{i−1} x_i x and x_1^{i−1} x'_i x, which clearly differ only with respect to x_i; the difference in the value of φ in this case can be bounded by c. The number of such permutations is (m+u−i−1)!/(u−1)!, which leads to the upper bound

  |Σ_x φ(x_1^{i−1}, x_i, x) − Σ_x φ(x_1^{i−1}, x'_i, x)| ≤ (m+u−i−1)!/(u−1)! · c,

which implies that |g(x_1^i)| ≤ u!/(m+u−i)! · (m+u−i−1)!/(u−1)! · c = u/(m+u−i) · c. Combining Theorem 1 with the inequality Σ_{i=1}^m (u/(m+u−i))² ≤ mu/(m+u−1/2) · 1/(1 − 1/(2u)) yields

  Pr[|φ − E[φ]| ≥ ε] ≤ 2 exp(−2ε² / (α'(m, u) c²)),  with α'(m, u) = mu/(m+u−1/2) · 1/(1 − 1/(2u)).

The function φ is symmetric in m and u in the sense that selecting one of the sets uniquely determines the other set. The statement of the theorem then follows from a similar bound with α''(m, u) = mu/(m+u−1/2) · 1/(1 − 1/(2m)), taking the tighter of the two. ∎

3.2 Transductive Stability Bound

To obtain a general transductive regression stability bound, we apply the concentration bound of Theorem 2 to the random variable φ(S) = R(h) − R̂(h). To do so, we need to bound E_S[φ(S)], where S is a random subset of X of size m, and |φ(S) − φ(S')|, where S and S' are samples differing by exactly one point.

Lemma 1. Let H be a bounded hypothesis set (∀x ∈ X, |h(x) − y(x)| ≤ B) and L a β-stable algorithm returning the hypotheses h and h' for two training sets S and S' of size m each, respectively, differing in exactly one point. Then,

  |φ(S) − φ(S')| ≤ 2β + B²(m+u)/(mu).   (2)

Proof. By definition, S and S' differ in exactly one point. Let x_i ∈ S and x_{m+j} ∈ S' be the points in which the two sets differ. The lemma follows from the observation that for each of the m−1 common labeled points in S and S', and for each of the u−1 common test points in T and T' (recall T = X \ S, T' = X \ S'), the difference in cost is bounded by β, while for x_i and x_{m+j} the difference in cost is bounded by B². Then, it follows that

  |φ(S) − φ(S')| ≤ ((m−1)/m) β + ((u−1)/u) β + B²/m + B²/u ≤ 2β + B² (1/m + 1/u). ∎

Lemma 2. Let h be the hypothesis returned by a β-stable algorithm L. Then, |E_S[φ(S)]| ≤ β.

Proof. By definition of φ(S), its expectation is (1/u) Σ_{k=1}^u E_S[c(h, x_{m+k})] − (1/m) Σ_{k=1}^m E_S[c(h, x_k)]. Since E_S[c(h, x_{m+j})] is the same for all j ∈ [1, u], and E_S[c(h, x_i)] is the same for all i ∈ [1, m], for any i and j, E_S[φ(S)] = E_S[c(h, x_{m+j})] − E_S[c(h, x_i)] = E_{S'}[c(h', x_i)] − E_S[c(h, x_i)]. Thus, |E_S[φ(S)]| = |E_{S,S'}[c(h', x_i) − c(h, x_i)]| ≤ β. ∎

Theorem 3. Let H be a bounded hypothesis set (∀x ∈ X, |h(x) − y(x)| ≤ B) and L a β-stable algorithm. Let h be the hypothesis returned by L when trained on the partitioning (S, T). Then, for any δ > 0, with probability at least 1 − δ,

  R(h) ≤ R̂(h) + β + (2β + B²(m+u)/(mu)) √(α(m, u) ln(1/δ) / 2).

Proof. The result follows directly from Theorem 2 and Lemmas 1 and 2. ∎

This is a general bound that applies to any transductive algorithm. To apply it, the stability coefficient β, which depends on m and u, needs to be determined. In the subsequent sections, we derive bounds on β for a number of transductive regression algorithms (Cortes & Mohri, 2007; Belkin et al., 2004a; Wu & Schölkopf, 2007; Zhou et al., 2004; Zhu et al., 2003).

4 Stability of Local Transductive Regression Algorithms

This section describes and analyzes a general family of local transductive regression algorithms (LTR) generalizing the algorithm of Cortes and Mohri (2007). LTR algorithms can be viewed as a generalization of the so-called kernel regularization-based learning algorithms to the transductive setting. The objective function that they minimize is of the form

  F(h, S) = ‖h‖²_K + (C/m) Σ_{k=1}^m c(h, x_k) + (C'/u) Σ_{k=1}^u c̃(h, x_{m+k}),   (3)

where ‖·‖_K is the norm in the reproducing kernel Hilbert space (RKHS) with associated kernel K, C ≥ 0 and C' ≥ 0 are trade-off parameters, and c̃(h, x) = (h(x) − ỹ(x))² is the error of the hypothesis h on the unlabeled point x with respect to a pseudo-target ỹ. Pseudo-targets are obtained from neighborhood labels y(x) by a local weighted average. Neighborhoods can be defined as a ball of radius r around each point in the feature space. We will denote by β_loc the score-stability coefficient of the local algorithm used, that is, the maximal amount by which the two hypotheses differ on any given point when trained on samples disagreeing on one point. This notion is stronger than that of cost-based stability.

In this section, we use the bounded-labels assumption, that is, ∀x ∈ S, |y(x)| ≤ M. We also assume that for any x ∈ X, K(x, x) ≤ κ². We will use the following bound, based on the reproducing property and the Cauchy-Schwarz inequality, valid for any hypothesis h ∈ H:

  ∀x ∈ X, |h(x)| = |⟨h, K(x, ·)⟩| ≤ ‖h‖_K √(K(x, x)) ≤ κ ‖h‖_K.   (4)

Lemma 3. Let h be the hypothesis minimizing (3). Assume that for any x ∈ X, K(x, x) ≤ κ². Then, for any x ∈ X, |h(x)| ≤ κM √(C + C').

Proof. The proof is a straightforward adaptation of the technique of (Bousquet & Elisseeff, 2002) to LTR algorithms. By Eq. (4), |h(x)| ≤ κ‖h‖_K. Let 0 ∈ H be the hypothesis assigning label zero to all examples. By definition of h, F(h, S) ≤ F(0, S) ≤ (C + C')M². Using ‖h‖²_K ≤ F(h, S) yields the statement. ∎

Since |h(x)| ≤ κM √(C + C'), this immediately gives us the bound |h(x) − y(x)| ≤ M(1 + κ √(C + C')). Thus, we are in a position to apply Theorem 3 with B = AM, where A = 1 + κ √(C + C').

We now derive a bound on the stability coefficient β. To do so, the key property we will use is the convexity of h ↦ c(h, x). Note, however, that in the case of c̃, the pseudo-targets may depend on the training set S. This dependency matters when we wish to apply convexity with two hypotheses h and h' obtained by training on different samples S and S'. For convenience, for any two such fixed hypotheses h and h', we extend the definition of c̃ as follows: for all t ∈ [0, 1],

  c̃(th + (1−t)h', x) = ((th + (1−t)h')(x) − (tỹ + (1−t)ỹ'))².

This allows us to use the same convexity property for c̃ as for c for any two fixed hypotheses h and h', as verified by the following lemma, and does not affect the proofs otherwise.

Lemma 4. Let h be a hypothesis obtained by training on S and h' a hypothesis obtained by training on S'. Then, for all t ∈ [0, 1],

  t c̃(h, x) + (1−t) c̃(h', x) ≥ c̃(th + (1−t)h', x).   (5)

Proof. Let ỹ = ỹ(x) be the pseudo-target value at x when the training set is S and ỹ' = ỹ'(x) when the training set is S'. For all t ∈ [0, 1],

  t c̃(h, x) + (1−t) c̃(h', x) − c̃(th + (1−t)h', x) = t(h(x) − ỹ)² + (1−t)(h'(x) − ỹ')² − [t(h(x) − ỹ) + (1−t)(h'(x) − ỹ')]².

The statement of the lemma follows directly from the convexity of x ↦ x² over the real numbers. ∎

Let h be a hypothesis obtained by training on S and h' a hypothesis obtained by training on S'. Let Δ = h' − h. Then, for all x ∈ X,

  c(h', x) − c(h, x) = Δ(x)((h(x) − y(x)) + (h'(x) − y(x))) ≤ 2M(1 + κ √(C + C')) |Δ(x)|.

As in (4), for all x ∈ X, |Δ(x)| ≤ κ ‖Δ‖_K, thus for all x ∈ X,

  |c(h', x) − c(h, x)| ≤ 2M(1 + κ √(C + C')) κ ‖Δ‖_K.   (6)

Lemma 5. Assume that for all x ∈ X, |y(x)| ≤ M. Let S and S' be two samples differing by exactly one point. Let h be the hypothesis returned by the algorithm minimizing the objective function F(h, S), h' the hypothesis obtained by minimization of F(h, S'), and let ỹ and ỹ' be the corresponding pseudo-targets. Then,

  (C/m) |c(h', x_i) − c(h, x_i)| + (C'/u) |c̃(h', x_{m+j}) − c̃(h, x_{m+j})| ≤ 2AM (κ ‖Δ‖_K (C/m + C'/u) + β_loc C'/u),

where Δ = h' − h and A = 1 + κ √(C + C').

Proof. Fix a point x at which pseudo-targets are used, write h_x = h(x), h'_x = h'(x), and let ỹ_x, ỹ'_x denote the pseudo-targets at x under S and S'. By Lemma 3 and the bounded-labels assumption,

  c̃(h'_x, ỹ'_x) − c̃(h_x, ỹ_x) = [c̃(h'_x, ỹ'_x) − c̃(h'_x, ỹ_x)] + [c̃(h'_x, ỹ_x) − c̃(h_x, ỹ_x)] = (ỹ_x − ỹ'_x)(ỹ_x + ỹ'_x − 2h'_x) + (h_x − h'_x)(h_x + h'_x − 2ỹ_x).

By the score-stability of the local estimates, |ỹ'(x) − ỹ(x)| ≤ β_loc. Thus,

  |c̃(h', x) − c̃(h, x)| ≤ 2AM (β_loc + κ ‖Δ‖_K).   (7)

Using (6) for the c-term and (7) for the c̃-term leads, after simplification, to the statement of the lemma. ∎

The proof of the following theorem is based on Lemma 4 and Lemma 5 and is given in the appendix.

Theorem 4. Assume that for all x ∈ X, |y(x)| ≤ M and that there exists κ such that ∀x ∈ X, K(x, x) ≤ κ². Further, assume that the local estimator has uniform stability coefficient β_loc. Let A = 1 + κ √(C + C'). Then, LTR is uniformly β-stable with

  β ≤ 2(AM)²κ² [ C/m + C'/u + √( (C/m + C'/u)² + 2C'β_loc/(AMκ²u) ) ].

Our experiments with LTR will demonstrate the benefit of this bound for model selection (Sec. 6).

5 Stability Based on Closed-Form Solutions

5.1 Unconstrained Regularization Algorithms

In this section, we consider a family of transductive regression algorithms that can be formulated as the following optimization problem:

  min_h  h^T Q h + (h − y)^T C (h − y),   (8)

where Q ∈ R^{(m+u)×(m+u)} is a symmetric regularization matrix, C ∈ R^{(m+u)×(m+u)} is a symmetric matrix of empirical weights (in practice it is often a diagonal matrix), y ∈ R^{m+u} is the vector of target values of the labeled points together with the pseudo-target values of the unlabeled points (in some formulations, the pseudo-target value is 0), and h ∈ R^{m+u} is a column vector whose ith row is the predicted target value for x_i. The closed-form solution of (8) is given by

  h = (C^{−1} Q + I)^{−1} y.   (9)

The formulation (8) is quite general and includes as special cases the algorithms of (Belkin et al., 2004a; Wu & Schölkopf, 2007; Zhou et al., 2004; Zhu et al., 2003). We present a general framework for bounding the stability coefficient of these algorithms and then examine the stability coefficient of each of these algorithms in turn. For a symmetric matrix A ∈ R^{n×n}, we will denote by λ_M(A) its largest eigenvalue and by λ_m(A) its smallest. Then, for any v ∈ R^n, λ_m(A) ‖v‖₂ ≤ ‖Av‖₂ ≤ λ_M(A) ‖v‖₂. We will also use in the proof of the following proposition the fact that for symmetric matrices A, B ∈ R^{n×n}, λ_M(AB) ≤ λ_M(A) λ_M(B).

Proposition 1. Let h and h' solve (8) under test and training sets that differ in exactly one point, and let C, C', y, y' be the corresponding empirical weight matrices and target value vectors. Then,

  ‖h' − h‖₂ ≤ ‖y' − y‖₂ / (λ_m(Q)/λ_M(C) + 1) + λ_M(Q) ‖C'^{−1} − C^{−1}‖₂ ‖y‖₂ / [(λ_m(Q)/λ_M(C') + 1)(λ_m(Q)/λ_M(C) + 1)].

Proof. Let Δ = h' − h and Δy = y' − y. Let C̄ = (C^{−1}Q + I)^{−1} and C̄' = (C'^{−1}Q + I)^{−1}. By definition,

  Δ = C̄'y' − C̄y = C̄'Δy + (C̄' − C̄)y = C̄'Δy + (C̄'[(C^{−1} − C'^{−1})Q]C̄)y.

Thus,

  ‖Δ‖₂ ≤ ‖Δy‖₂ λ_M(C̄') + λ_M(Q) ‖C'^{−1} − C^{−1}‖₂ ‖y‖₂ λ_M(C̄') λ_M(C̄).   (10)

Furthermore, λ_M(C̄) ≤ 1/(λ_m(Q)/λ_M(C) + 1), and similarly for C̄'. Plugging these bounds back into Eq. (10) yields the statement. ∎

Since ‖h' − h‖_∞ is bounded by ‖h' − h‖₂, the proposition provides a bound on the score-stability of h for the transductive regression algorithms of Zhou et al. (2004), Wu and Schölkopf (2007), and Zhu et al. (2003). For each of these algorithms, the pseudo-targets used are zero. If we make the bounded-labels assumption (∀x ∈ X, |y(x)| ≤ M, for some M > 0), it is not difficult to show that ‖y' − y‖₂ ≤ √2 M and ‖y‖₂ ≤ √m M. We now examine each algorithm in turn.

Consistency method (CM). In the CM algorithm (Zhou et al., 2004), the matrix Q is a normalized Laplacian of a weight matrix W ∈ R^{(m+u)×(m+u)} that captures the affinity between pairs of points in the full sample X. Thus, Q = I − D^{−1/2} W D^{−1/2}, where D ∈ R^{(m+u)×(m+u)} is a diagonal matrix with [D]_{i,i} = Σ_j [W]_{i,j}. Note that λ_m(Q) = 0. Furthermore, the matrices C and C' are identical in CM, both diagonal matrices with (i, i)th entry equal to a positive constant µ > 0. Thus C = C', and using Prop. 1, we obtain the following bound on the score-stability of the CM algorithm: β_CM ≤ √2 M.

Local learning regularization (LL-Reg). In the LL-Reg algorithm (Wu & Schölkopf, 2007), the regularization matrix Q is (I − A)^T(I − A), where I ∈ R^{(m+u)×(m+u)} is the identity matrix and A ∈ R^{(m+u)×(m+u)} is a non-negative weight matrix that captures the local similarity between all pairs of points in X. A is normalized, i.e., each of its rows sums to 1. Let C_l, C_u > 0 be two positive constants. The matrix C is a diagonal matrix with [C]_{i,i} = C_l if x_i ∈ S and C_u otherwise. Let C_max = max{C_l, C_u} and C_min = min{C_l, C_u}; thus ‖C'^{−1} − C^{−1}‖₂ = 1/C_min − 1/C_max. By the Perron-Frobenius theorem, the eigenvalues of A lie in the interval (−1, 1] and λ_M(A) = 1. Thus, λ_m(Q) ≥ 0 and λ_M(Q) ≤ 4, and we have the following bound on the score-stability of the LL-Reg algorithm:

  β_LL-Reg ≤ √2 M + 4√m M (1/C_min − 1/C_max) ≤ √2 M + 4√m M / C_min.

The Gaussian Mean Fields algorithm (GMF) of Zhu et al. (2003) is very similar to LL-Reg and admits exactly the same stability coefficient.

Thus, the stability coefficients of CM, LL-Reg, and GMF can be large. Without additional constraints on the matrix Q, these algorithms do not seem to be stable enough for the generalization bound of Theorem 3 to converge. A particular example of such a constraint is the condition Σ_{i=1}^{m+u} h(x_i) = 0 used by Belkin et al.'s algorithm (2004a). In the next section, we give a generalization bound for this algorithm and then describe a general method for making the algorithms just examined stable.

5.2 Stability of Constrained Regularization Algorithms

This subsection analyzes constrained regularization algorithms such as the Laplacian-based graph regularization algorithm of Belkin et al. (2004a). Given a weighted graph G = (X, E) in which edge weights represent the extent of similarity between vertices, the task consists of predicting the vertex labels. The hypothesis h returned by the algorithm is the solution of the following optimization problem:

  min_{h ∈ H}  h^T L h + (C/m) Σ_{i=1}^m (h(x_i) − y_i)²   subject to   Σ_{i=1}^{m+u} h(x_i) = 0,   (11)

where L ∈ R^{(m+u)×(m+u)} is a smoothness matrix, e.g., the graph Laplacian, and {y_i : i ∈ [1, m]} are the target values of the labeled nodes. The hypothesis set H in this case can be thought of as a hyperplane in R^{m+u} that is orthogonal to the vector 1 ∈ R^{m+u}. Maintaining the notation used in (Belkin et al., 2004a), we let P_H denote the operator corresponding to the orthogonal projection onto H. For a sample S drawn without replacement from X, define I_S ∈ R^{(m+u)×(m+u)} to be the diagonal matrix with [I_S]_{i,i} = 1 if x_i ∈ S and 0 otherwise. Similarly, let y_S ∈ R^{m+u} be the column vector with [y_S]_i = y_i if x_i ∈ S and 0 otherwise. The closed-form solution on a training sample S is given by (Belkin et al., 2004a):

  h_S = P_H ((m/C) L + I_S)^{−1} y_S.   (12)

Theorem 5. Assume that the vertex labels of the graph G = (X, E) and the hypothesis h obtained by optimizing Eq. (11) are both bounded (∀x, |h(x)| ≤ M and |y(x)| ≤ M for some M > 0). Let A = 1 + κ√C. Then, for any δ > 0, with probability at least 1 − δ,

  R(h) ≤ R̂(h) + β + (2β + (AM)²(m+u)/(mu)) √(α(m, u) ln(1/δ) / 2),

with α(m, u) = mu/(m+u−1/2) · 1/(1 − 1/(2 max{m, u})) and

  β ≤ 4√2 M² / (λ₂ m/C − 1) + 4√2 M² / (λ₂ m/C − 1)²,

where λ₂ is the second smallest eigenvalue of the Laplacian.

Proof. The proof is similar to that of (Belkin et al., 2004a) but uses our general transductive regression bound instead. ∎

The generalization bound we just presented differs in several respects from that of Belkin et al. (2004a). Our bound explicitly depends on both m and u, while theirs shows only a dependency on m. Also, our bound does not depend on the number of times a point is sampled in the training set (parameter t), thanks to our analysis based on sampling without replacement. Contrasting the stability coefficient of Belkin et al.'s algorithm with the stability coefficient of LTR (Theorem 4), we note that it does not depend on C' and β_loc. This is because unlabeled points do not enter the objective function, and thus C' = 0 and ỹ(x) = 0 for all x ∈ X. However, the stability does depend on the second smallest eigenvalue λ₂, and the bound diverges as λ₂ approaches C/m. In all our regression experiments, we observed that this algorithm does not perform as well in comparison with LTR.

5.3 Making Seemingly Unstable Algorithms Stable

In Sec. 5.2, we saw that imposing additional constraints on the hypothesis, e.g., ⟨h, 1⟩ = 0, allowed one to derive non-trivial stability bounds. This idea can be generalized, and similar non-trivial stability bounds can be derived for stable versions of the algorithms presented in Sec. 5.1: CM, LL-Reg, and GMF. Recall that the stability bound in Prop. 1 is inversely proportional to the smallest eigenvalue λ_m(Q). The main difficulty with using the proposition for these algorithms is that λ_m(Q) = 0 in each case. Let v denote the eigenvector corresponding to λ_m(Q) and let λ₂ be the second smallest eigenvalue of Q. One can modify (8) and constrain the solution to be orthogonal to v by imposing ⟨h, v⟩ = 0. In the case of (Belkin et al., 2004a), v = 1. This modification, motivated by the algorithm of (Belkin et al., 2004a), is equivalent to increasing the smallest eigenvalue to λ₂. As an example, by imposing the additional constraint, we can show that the stability coefficient of CM becomes bounded by O(C/λ₂), instead of Θ(1). Thus, if C = O(1/m) and λ₂ = Ω(1), it is bounded by O(1/m) and the generalization bound converges as O(1/√m).

6 Experiments

6.1 Model Selection Based on Bound

This section reports the results of experiments using our stability-based generalization bound for model selection for the LTR algorithm. A crucial parameter of this algorithm is the stability coefficient β_loc(r) of the local algorithm, which computes pseudo-targets ỹ_{x'} based on a ball of radius r around each point. We derive an expression for β_loc(r) and show, using extensive experiments with multiple data sets, that the value r* minimizing the bound is a remarkably good estimate of the best r for the test error. This demonstrates the benefit of our generalization bound for model selection, avoiding the need for a held-out validation set.

The experiments were carried out on several publicly available regression data sets: Boston Housing, Elevators, and Ailerons (www.liaad.up.pt/~ltorgo/Regression/DataSets.html). For each of these data sets, we used m = u, inspired by the observation that, all other parameters being fixed, the bound of Theorem 3 is tightest when m = u. The values of the input variables were normalized to have mean zero and variance one. For the Boston Housing data set, the total number of examples was 506. For the Elevators and the Ailerons data sets, a random subset of 2000 examples was used; other random subsets of 2000 samples led to similar results. The Boston Housing experiments were repeated for 50 random partitions, while for the Elevators and the Ailerons data sets the experiments were repeated for 20 random partitions each. Since the target values for the Elevators and the Ailerons data sets were extremely small, they were scaled by a factor of 1000 and 100, respectively, in a pre-processing step.

In our experiments, we estimated the pseudo-target of a point x' ∈ T as a weighted average of the labeled points x ∈ N(x') in a neighborhood of x'. Thus, ỹ_{x'} = Σ_{x ∈ N(x')} α_x y_x / Σ_{x ∈ N(x')} α_x. Weights are defined in terms of a similarity measure captured by a kernel K: α_x = K(x, x'). Let m(r) be the number of labeled points in N(x'). Then, it is easy to show that β_loc ≤ 4 α_max M / (α_min m(r)), where α_max = max_{x ∈ N(x')} α_x and α_min = min_{x ∈ N(x')} α_x. Thus, for a Gaussian kernel with parameter σ, β_loc ≤ 4M / (m(r) e^{−2r²/σ²}). To estimate β_loc, one needs an estimate of m(r), the number of samples in a ball of radius r from an unlabeled point x'. In our experiments, we estimated m(r) as the number of samples in a ball of radius r from the origin. Since all features are normalized to mean zero and variance one, the origin is also the centroid of the set X.

We implemented a dual solution of LTR and used Gaussian kernels, for which the parameter σ was selected using cross-validation on the training set. Experiments were repeated across 36 different pairs of values of (C, C'). For each pair, we varied the radius r of the neighborhood used to determine estimates from zero to the radius of the ball containing all points.

Figure 1(a) shows the mean values of the test MSE of our experiments on the Boston Housing data set for typical values of C and C'. Figures 1(b)-(c) show similar results for the Ailerons and Elevators data sets. For the sake of comparison, we also report results for induction. The relative standard deviations on the MSE are not indicated, but were typically on the order of 10%. LTR generally achieves a significant improvement over induction.

[Figure 1: MSE against the radius r of LTR for three data sets: (a) Boston Housing, (b) Ailerons, (c) Elevators. Each panel plots curves for Transduction, Induction, and Training Error + Slack Term. The small horizontal bar indicates the location (mean ± one standard deviation) of the minimum of the empirically determined r.]

The generalization bound we derived in Theorem 3 consists of the training error and a complexity term that depends on the parameters of the LTR algorithm (C, C', M, m, u, κ, β_loc, δ). Only two terms depend upon the choice of the radius r: R̂(h) and β_loc. Thus, keeping all other parameters fixed, the theoretically optimal radius r* is the one that minimizes the training error plus the slack term. The figures also include plots of the training error combined with the complexity term, appropriately scaled. The empirical minimum over the radius r coincides with or is close to r*. The optimal r based on test MSE is indicated with error bars.

6.2 Stable Versions of Unstable Algorithms

We refer to the stable version of the CM algorithm presented in Sec. 5.3 as CM-STABLE. We compared CM and CM-STABLE empirically on the same data sets, again using m = u. For the normalized Laplacian, we used k-nearest-neighbor graphs based on Euclidean distance. The parameters k and C were chosen by five-fold cross-validation over the training set. The experiment was repeated 20 times with random partitions. The averaged mean-squared errors with standard deviations are reported in Table 1.

Table 1 (several digits were lost in transcription; only the surviving fragments are reproduced):
Dataset    | CM       | CM-STABLE
Elevators  | ±        | ±
Ailerons   | 049 ±    | ±
Housing    | 5793 ±   | ± 65

We conclude from this experiment that CM and CM-STABLE have the same performance. However, as we showed previously, CM-STABLE has a non-trivial risk bound and thus comes with some guarantee.

7 Conclusion

We presented a comprehensive analysis of the stability of transductive regression algorithms with novel generalization bounds for a number of algorithms. Since they are algorithm-dependent, our bounds are often tighter than those based on complexity measures such as the VC-dimension. Our experiments also show the effectiveness of our bounds for model selection and the good performance of LTR algorithms.

Acknowledgments

The research of Mehryar Mohri and Ashish Rastogi was partially supported by the New York State Office of Science Technology and Academic Research (NYSTAR). This project was also sponsored in part by Department of the Army Award Number W81XWH. The U.S. Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick, MD, is the awarding and administering acquisition office. The content of this material does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

Belkin, M., Matveeva, I., & Niyogi, P. (2004a). Regularization and semi-supervised learning on large graphs. COLT.
Belkin, M., Niyogi, P., & Sindhwani, V. (2004b). Manifold regularization (Technical Report). University of Chicago.
Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. JMLR, 2.
Chapelle, O., Vapnik, V., & Weston, J. (1999). Transductive inference for estimating values of functions. NIPS 12.
Cortes, C., & Mohri, M. (2007). On transductive regression. NIPS 19.
El-Yaniv, R., & Pechyony, D. (2006). Stable transductive learning. COLT.
McDiarmid, C. (1989). On the method of bounded differences. Surveys in Combinatorics (pp. 148-188). Cambridge: Cambridge University Press.
Schuurmans, D., & Southey, F. (2002). Metric-based methods for adaptive model selection and regularization. Machine Learning, 48, 51-84.
Vapnik, V. N. (1982). Estimation of dependences based on empirical data. Berlin: Springer.
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley-Interscience.
Wu, M., & Schölkopf, B. (2007). Transductive classification via local learning regularization. AISTATS.
Zhou, D., Bousquet, O., Lal, T., Weston, J., & Schölkopf, B. (2004). Learning with local and global consistency. NIPS 16.
Zhu, X., Ghahramani, Z., & Lafferty, J. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. ICML (pp. 912-919).

A Proof of Theorem 4

Proof. Let h and h' be defined as in Lemma 5. By definition,

  h = argmin_{h ∈ H} F(h, S)   and   h' = argmin_{h ∈ H} F(h, S').   (13)

Thus, F(h, S) − F(h + tΔ, S) ≤ 0 and F(h', S') − F(h' − tΔ, S') ≤ 0 for all t ∈ [0, 1]. For any h and x, let Δc(h, tΔ, x) denote c(h, x) − c(h + tΔ, x), and similarly for c̃. Summing the two inequalities above and expanding F expresses their left-hand side as the sum of the cost differences Δc and Δc̃ over the labeled and unlabeled points of S and S', weighted by C/m and C'/u respectively, plus the term

  E = ‖h‖²_K − ‖h + tΔ‖²_K + ‖h'‖²_K − ‖h' − tΔ‖²_K.

By convexity of c(·, x) in its first argument, Δc(h, tΔ, x) ≥ t[c(h, x) − c(h', x)] and Δc(h', −tΔ, x) ≥ t[c(h', x) − c(h, x)] for all x; by Lemma 4, similar inequalities hold for c̃. Under these inequalities, the contributions of the m−1 common labeled points and of the u−1 common test points cancel pairwise, leaving only the terms for the differing points x_i and x_{m+j}. It is not hard to see, using the definition of ‖·‖_K in terms of the dot product in the reproducing Hilbert space, that E = −2t(1−t)‖Δ‖²_K. These observations lead to

  2t(1−t) ‖Δ‖²_K ≤ (Ct/m) |c(h', x_i) − c(h, x_i)| + (C't/u) |c̃(h', x_{m+j}) − c̃(h, x_{m+j})|.

Dividing both sides by 2t(1−t) and letting t → 0 gives the same inequality for ‖Δ‖²_K up to a factor of 2. The right-hand side can be bounded using Lemma 5, leading to a second-degree inequality in ‖Δ‖_K. Upper-bounding ‖Δ‖_K by the positive root of this inequality and using (6) to bound |c(h', x) − c(h, x)| leads to the bound on β and proves the statement of the theorem. ∎
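As a closing illustration, the closed-form family (8)-(9) of Sec. 5.1 lends itself to a direct numerical stability check in the spirit of Proposition 1. The sketch below (the sample sizes, random graph weights, and trade-off constants are hypothetical choices for illustration, not an experiment from the paper) solves h = (C⁻¹Q + I)⁻¹y for two training sets differing in exactly one point and measures how much the scores move:

```python
# Toy stability check for the closed-form family of Eq. (8)-(9):
# h = (C^{-1} Q + I)^{-1} y, with zero pseudo-targets on unlabeled points.
# All sizes, weights, and constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 8                               # full sample size m + u

# Symmetric PSD regularization matrix Q: combinatorial Laplacian D - W
# of a random weighted graph on the full sample.
W = rng.random((n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)
Q = np.diag(W.sum(axis=1)) - W

def solve(train_idx, y_full, C_l=1.0, C_u=0.1):
    """Closed-form solution of Eq. (9) for a given training index set."""
    is_train = np.isin(np.arange(n), train_idx)
    C = np.diag(np.where(is_train, C_l, C_u))
    y = np.where(is_train, y_full, 0.0)      # pseudo-targets set to 0
    return np.linalg.solve(np.linalg.inv(C) @ Q + np.eye(n), y)

y_full = rng.uniform(-1.0, 1.0, size=n)
h  = solve([0, 1, 2, 3], y_full)    # training set S
hp = solve([0, 1, 2, 4], y_full)    # S': one training point exchanged
score_stability = float(np.abs(h - hp).max())
print(score_stability)              # max score change across the full sample
```

Adding a ridge term ρI to Q raises λ_m(Q) above zero, which shrinks the right-hand side of Proposition 1; this is the same mechanism that the orthogonality constraint of Sec. 5.3 exploits by effectively raising the smallest eigenvalue to λ₂.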