Stability Bounds for Non-i.i.d. Processes

Mehryar Mohri
Courant Institute of Mathematical Sciences and Google Research
251 Mercer Street, New York, NY 10012

Afshin Rostamizadeh
Department of Computer Science, Courant Institute of Mathematical Sciences
251 Mercer Street, New York, NY 10012

Abstract

The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds [2-4, 6]. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weakens over time. It proves novel stability-based generalization bounds that hold even in this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application to several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.

1 Introduction

The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds [2-4, 6]. A learning algorithm is stable when the hypotheses it outputs differ in a limited way when small changes are made to the training set. A key advantage of stability bounds is that they are tailored to specific learning algorithms, exploiting their particular properties. They do not depend on complexity measures such as the VC-dimension, covering numbers, or Rademacher complexity, which characterize a class of hypotheses independently of any algorithm.

But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). Note that the i.i.d. assumption is typically not tested or derived from a data analysis. In many machine learning applications this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, which is clear in system diagnosis or time series prediction problems. A typical example of time series data is stock pricing, where prices of different stocks on the same day, or of the same stock on different days, may clearly be dependent.

This paper studies the scenario where the observations are drawn from a stationary mixing sequence, a widely adopted assumption in the study of non-i.i.d. processes that implies a dependence between observations that weakens over time [8, 10, 16, 17]. Our proofs are also based on the independent block technique commonly used in such contexts [17] and a generalized version of McDiarmid's inequality [7]. We prove novel stability-based generalization bounds that hold even in this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case and apply to all stable learning algorithms, thereby extending the usefulness of stability bounds to non-i.i.d. scenarios.

It also illustrates their application to general classes of learning algorithms, including Support Vector Regression (SVR) [15] and Kernel Ridge Regression [13].

Algorithms such as support vector regression (SVR) [14, 15] have been used in the context of time series prediction, in which the i.i.d. assumption does not hold, some with good experimental results [9, 12]. To our knowledge, the use of these algorithms in non-i.i.d. scenarios has not been supported by any theoretical analysis. The stability bounds we give for SVR and many other kernel regularization-based algorithms can thus be viewed as the first theoretical basis for their use in such scenarios.

In Section 2, we introduce the definitions for the non-i.i.d. problems we are considering and discuss the learning scenarios. Section 3 gives our main generalization bounds based on stability, including the full proof and analysis. In Section 4, we apply these bounds to general kernel regularization-based algorithms, including Support Vector Regression and Kernel Ridge Regression.

2 Preliminaries

We first introduce some standard definitions for dependent observations in mixing theory [5] and then briefly discuss the learning scenarios in the non-i.i.d. case.

2.1 Non-i.i.d. Definitions

Definition 1. A sequence of random variables Z = {Z_t}_{t=-∞}^{+∞} is said to be stationary if for any t and non-negative integers m and k, the random vectors (Z_t, ..., Z_{t+m}) and (Z_{t+k}, ..., Z_{t+m+k}) have the same distribution.

Thus, the index t, or time, does not affect the distribution of a variable Z_t in a stationary sequence. This does not imply independence, however. In particular, for i < j < k, Pr[Z_j | Z_i] may not equal Pr[Z_k | Z_i]. The following standard definition gives a measure of the dependence of the random variables Z_t within a stationary sequence. There are several equivalent definitions of this quantity; we adopt here that of [17].

Definition 2. Let Z = {Z_t}_{t=-∞}^{+∞} be a stationary sequence of random variables. For any i, j ∈ Z ∪ {-∞, +∞}, let σ_i^j denote the σ-algebra generated by the random variables Z_k, i ≤ k ≤ j. Then, for any positive integer k, the β-mixing and ϕ-mixing coefficients of the stochastic process Z are defined as

  β(k) = sup_n E_{B ∈ σ_{-∞}^n} [ sup_{A ∈ σ_{n+k}^{+∞}} |Pr[A | B] - Pr[A]| ],    ϕ(k) = sup_{n, A ∈ σ_{n+k}^{+∞}, B ∈ σ_{-∞}^n} |Pr[A | B] - Pr[A]|.   (1)

Z is said to be β-mixing (ϕ-mixing) if β(k) → 0 (resp. ϕ(k) → 0) as k → ∞. It is said to be algebraically β-mixing (algebraically ϕ-mixing) if there exist real numbers β_0 > 0 (resp. ϕ_0 > 0) and r > 0 such that β(k) ≤ β_0/k^r (resp. ϕ(k) ≤ ϕ_0/k^r) for all k, and exponentially mixing if there exist real numbers β_0 (resp. ϕ_0 > 0) and β_1 (resp. ϕ_1 > 0) such that β(k) ≤ β_0 exp(-β_1 k^r) (resp. ϕ(k) ≤ ϕ_0 exp(-ϕ_1 k^r)) for all k.

Both β(k) and ϕ(k) measure the dependence of events on those that occurred more than k units of time in the past. β-mixing is a weaker assumption than ϕ-mixing. We will be using a concentration inequality that leads to simple bounds but that applies to ϕ-mixing processes only. However, the main proofs presented in this paper are given in the more general case of β-mixing sequences. This is a standard assumption adopted in previous studies of learning in the presence of dependent observations [8, 10, 16, 17]. As pointed out in [16], β-mixing seems to be just the right assumption for carrying over several PAC-learning results to the case of weakly dependent sample points. Several results have also been obtained in the more general context of α-mixing, but they seem to require the stronger condition of exponential mixing. Mixing assumptions can be checked in some cases, such as with Gaussian or Markov processes [10]. The mixing parameters can also be estimated in such cases.
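As a concrete, if minimal, illustration of such a process (not part of the original paper; the function name and parameter values are assumptions of this sketch), the following Python code samples a stationary Gaussian AR(1) sequence, a standard example of a Markov process whose dependence decays with the time gap k:

    import numpy as np

    def stationary_ar1(m, a=0.8, sigma=1.0, seed=0):
        """Sample m points from a stationary Gaussian AR(1) process.

        Z_t = a * Z_{t-1} + eps_t with |a| < 1 is stationary when Z_0 is drawn
        from N(0, sigma^2 / (1 - a^2)); Gaussian Markov processes of this kind
        are mixing, so the dependence between Z_t and Z_{t+k} weakens with k.
        """
        rng = np.random.default_rng(seed)
        z = np.empty(m)
        z[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - a ** 2))
        for t in range(1, m):
            z[t] = a * z[t - 1] + rng.normal(0.0, sigma)
        return z

    # The lag-k correlation a**k decays with k, mirroring a decaying mixing coefficient.
    z = stationary_ar1(1000)
    for k in (1, 5, 20):
        print(k, np.corrcoef(z[:-k], z[k:])[0, 1])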

Most previous studies use a technique originally introduced by [1] based on independent blocks of equal size [8, 10, 17]. This technique is particularly relevant when dealing with stationary β-mixing sequences. We will need a related but somewhat different technique, since the blocks we consider may not have the same size. The following lemma is a special case of Corollary 2.7 from [17].

Lemma 1 (Yu [17], Corollary 2.7). Let μ ≥ 1 and suppose that h is a measurable function, with absolute value bounded by M, on a product probability space (∏_{j=1}^{μ} Ω_j, ∏_{j=1}^{μ} σ_{r_j}^{s_j}), where r_i ≤ s_i ≤ r_{i+1} for all i. Let Q be a probability measure on the product space with marginal measures Q_i on (Ω_i, σ_{r_i}^{s_i}), and let Q^{i+1} be the marginal measure of Q on (∏_{j=1}^{i+1} Ω_j, ∏_{j=1}^{i+1} σ_{r_j}^{s_j}), i = 1, ..., μ - 1. Let β(Q) = sup_{1 ≤ i ≤ μ-1} β(k_i), where k_i = r_{i+1} - s_i, and let P = ∏_{i=1}^{μ} Q_i. Then,

  |E_Q[h] - E_P[h]| ≤ (μ - 1) M β(Q).   (2)

The lemma gives a measure of the difference between the distribution of μ blocks where the blocks are independent in one case and dependent in the other case. The distribution within each block is assumed to be the same in both cases. For a monotonically decreasing function β, we have β(Q) = β(k*), where k* = min_i(k_i) is the smallest gap between blocks.

2.2 Learning Scenarios

We consider the familiar supervised learning setting where the learning algorithm receives a sample of m labeled points S = (z_1, ..., z_m) = ((x_1, y_1), ..., (x_m, y_m)) ∈ (X × Y)^m, where X is the input space and Y the set of labels (Y = R in the regression case), both assumed to be measurable. For a fixed learning algorithm, we denote by h_S the hypothesis it returns when trained on the sample S.

The error of a hypothesis on a pair z ∈ X × Y is measured in terms of a cost function c : Y × Y → R_+. Thus, c(h(x), y) measures the error of a hypothesis h on a pair (x, y); c(h(x), y) = (h(x) - y)^2 in the standard regression case. We will use the shorthand c(h, z) := c(h(x), y) for a hypothesis h and z = (x, y) ∈ X × Y, and will assume that c is upper bounded by a constant M > 0. We denote by R̂(h) the empirical error of a hypothesis h for a training sample S = (z_1, ..., z_m):

  R̂(h) = (1/m) ∑_{i=1}^{m} c(h, z_i).   (3)

In the standard machine learning scenario, the sample pairs z_1, ..., z_m are assumed to be i.i.d., a restrictive assumption that does not always hold in practice. We will consider here the more general case of dependent samples drawn from a stationary mixing sequence Z over X × Y. As in the i.i.d. case, the objective of the learning algorithm is to select a hypothesis with small error over future samples. But, here, we must distinguish two versions of this problem.

In the most general version, future samples depend on the training sample S, and thus the generalization error or true error of the hypothesis h_S trained on S must be measured by its expected error conditioned on the sample S:

  R(h_S) = E_z[c(h_S, z) | S].   (4)

This is the most realistic setting in this context, and it matches time series prediction problems. A somewhat less realistic version is one where the samples are dependent, but the test points are assumed to be independent of the training sample S. The generalization error of the hypothesis h_S trained on S is then:

  R(h_S) = E_z[c(h_S, z) | S] = E_z[c(h_S, z)].   (5)

This setting seems less natural since, if samples are dependent, then future test points must also depend on the training points, even if that dependence is relatively weak due to the time interval after which test points are drawn. Nevertheless, it is this somewhat less realistic setting that has been studied by all previous machine learning studies that we are aware of [8, 10, 16, 17], even when examining specifically a time series prediction problem [10]. Thus, the bounds derived in these studies cannot be applied to the more general setting. We will consider instead the most general setting, with the definition of the generalization error based on Eq. (4). Clearly, our analysis applies to the less general setting just discussed as well.
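For concreteness, here is a small Python sketch of the empirical error R̂(h) of Eq. (3). This is an illustration of my own, not code from the paper; the helper name and the choice of the squared cost are assumptions.

    import numpy as np

    def empirical_error(h, S, cost=lambda pred, y: (pred - y) ** 2):
        """R_hat(h) = (1/m) * sum_i c(h, z_i), Eq. (3), with the squared cost
        of the standard regression case; c is assumed bounded by some M > 0."""
        return float(np.mean([cost(h(x), y) for (x, y) in S]))

    # Toy usage: a fixed hypothesis evaluated on a small sample S = ((x_1, y_1), ...).
    S = [(0.0, 0.1), (0.5, 0.4), (1.0, 1.2)]
    h = lambda x: x  # identity predictor
    print(empirical_error(h, S))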

3 Non-i.i.d. Stability Bounds

This section gives generalization bounds for β̂-stable algorithms over a mixing stationary distribution. The first two subsections present our main proofs, which hold for β-mixing stationary distributions. In the third subsection, we use a concentration inequality that applies to ϕ-mixing processes only.

The condition of β̂-stability is an algorithm-dependent property first introduced in [4] and [6]. It was later used successfully by [2, 3] to show algorithm-specific stability bounds for i.i.d. samples. Roughly speaking, a learning algorithm is said to be stable if small changes to the training set do not produce large deviations in its output. The following gives the precise technical definition. (The standard variable used for the stability coefficient is β; to avoid confusion with the β-mixing coefficient, we use β̂ instead.)

Definition 3. A learning algorithm is said to be (uniformly) β̂-stable if the hypotheses it returns for any two training samples S and S' that differ by a single point satisfy

  ∀z ∈ X × Y, |c(h_S, z) - c(h_{S'}, z)| ≤ β̂.   (6)

Many generalization error bounds rely on McDiarmid's inequality. But this inequality requires the random variables to be i.i.d. and thus is not directly applicable in our scenario. Instead, we will use a theorem that extends McDiarmid's inequality to general mixing distributions (Theorem 1, Section 3.3). To obtain a stability-based generalization bound, we will apply this theorem to Φ(S) = R(h_S) - R̂(h_S). To do so, we need to show, as with the standard McDiarmid's inequality, that Φ is a Lipschitz function and, to make the bound useful, bound E[Φ(S)]. The next two subsections describe how we achieve both of these in this non-i.i.d. scenario.

3.1 Lipschitz Condition

As discussed in Section 2.2, in the most general scenario, test points depend on the training sample. We first present a lemma that relates the expected value of the generalization error in that scenario to the same expectation in the scenario where the test point is independent of the training sample. We denote by R(h_S) = E_z[c(h_S, z) | S] the expectation in the dependent case and by R̃(h_{S_b}) = E_{z̃}[c(h_{S_b}, z̃)] that expectation when the test point z̃ is assumed independent of the training sample, with S_b denoting a sequence similar to S but with the last b points removed. Figure 1(a) illustrates that sequence. The block containing z̃ is assumed to have exactly the same distribution as the corresponding block of the same size in S.

Lemma 2. Assume that the learning algorithm is β̂-stable and that the cost function c is bounded by M. Then, for any sample S of size m drawn from a β-mixing stationary distribution and for any b ∈ {0, ..., m}, the following holds:

  |E_S[R(h_S)] - E_S[R̃(h_{S_b})]| ≤ b β̂ + β(b) M.   (7)

Proof. The β̂-stability of the learning algorithm implies that

  E_S[R(h_S)] = E_{S,z}[c(h_S, z)] ≤ E_{S,z}[c(h_{S_b}, z)] + b β̂.   (8)

The application of Lemma 1 yields

  E_S[R(h_S)] ≤ E_{S,z̃}[c(h_{S_b}, z̃)] + b β̂ + β(b) M = E_S[R̃(h_{S_b})] + b β̂ + β(b) M.   (9)

The other side of the inequality of the lemma can be shown following the same steps.

We can now prove a Lipschitz bound for the function Φ.
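As a brief aside on Definition 3 before the Lipschitz bound, the following sketch (an illustration of my own, not the paper's) probes β̂-stability numerically for a simple regularized least-squares learner by replacing a single training point and measuring the largest change in cost over a finite set of points; this only lower-bounds the supremum in Eq. (6), so it is a sanity check rather than a proof of stability.

    import numpy as np

    def train_ridge(S, lam):
        """Regularized least squares on scalar inputs: a simple stable learner."""
        X = np.array([[x] for x, _ in S])
        y = np.array([y for _, y in S])
        w = np.linalg.solve(X.T @ X + lam * len(S) * np.eye(1), X.T @ y)
        return lambda x: float(w[0] * x)

    def stability_estimate(S, S_prime, test_points, lam=1.0):
        """Largest change in the cost c(h, z) = (h(x) - y)^2 over test_points when
        a single training point is replaced; a finite-sample proxy for Eq. (6)."""
        h, h_prime = train_ridge(S, lam), train_ridge(S_prime, lam)
        return max(abs((h(x) - y) ** 2 - (h_prime(x) - y) ** 2) for x, y in test_points)

    rng = np.random.default_rng(1)
    S = [(x, 2 * x + rng.normal(0.0, 0.1)) for x in rng.uniform(0.0, 1.0, 50)]
    S_prime = list(S)
    S_prime[0] = (0.5, -3.0)  # replace a single point
    print(stability_estimate(S, S_prime, S))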

[Figure 1, panels (a)-(d): Illustration of the sequences derived from S that are considered in the proofs.]

Lemma 3. Let S = (z_1, ..., z_m) and S^i = (z_1, ..., z_{i-1}, z_i', z_{i+1}, ..., z_m) be two sequences drawn from a β-mixing stationary process that differ only in point i ∈ [1, m], and let h_S and h_{S^i} be the hypotheses returned by a β̂-stable algorithm when trained on each of these samples. Then, for any i ∈ [1, m] and any b ∈ [0, m], the following inequality holds:

  |Φ(S) - Φ(S^i)| ≤ (b + 1) 2β̂ + 2β(b) M + M/m.   (10)

Proof. To prove this inequality, we first bound the difference of the empirical errors as in [3], then the difference of the true errors. Bounding the difference of costs on the agreeing points with β̂ and on the point that disagrees with M yields

  |R̂(h_S) - R̂(h_{S^i})| = (1/m) |∑_{j=1}^{m} c(h_S, z_j) - c(h_{S^i}, z_j)|
    = (1/m) |∑_{j≠i} (c(h_S, z_j) - c(h_{S^i}, z_j)) + c(h_S, z_i) - c(h_{S^i}, z_i')|
    ≤ β̂ + M/m.   (11)

Now, applying Lemma 2 to both generalization error terms and using β̂-stability results in

  |R(h_S) - R(h_{S^i})| ≤ |R̃(h_{S_b}) - R̃(h_{S^i_b})| + 2bβ̂ + 2β(b)M
    = |E_{z̃}[c(h_{S_b}, z̃) - c(h_{S^i_b}, z̃)]| + 2bβ̂ + 2β(b)M
    ≤ β̂ + 2bβ̂ + 2β(b)M.   (12)

The lemma's statement is obtained by combining inequalities (11) and (12).

3.2 Bound on E[Φ]

As mentioned earlier, to make the bound useful, we also need to bound E_S[Φ(S)]. This is done by analyzing independent blocks using Lemma 1.

Lemma 4. Let h_S be the hypothesis returned by a β̂-stable algorithm trained on a sample S drawn from a stationary β-mixing distribution. Then, for all b ∈ [1, m], the following inequality holds:

  |E_S[Φ(S)]| ≤ (6b + 1) β̂ + 3β(b) M.   (13)

Proof. We first analyze the term E_S[R̂(h_S)]. Let S_i be the sequence S with the b points before and after point z_i removed. Figure 1(b) illustrates this definition. S_i is thus made of three blocks. Let S̃_i denote a similar set of three blocks, each with the same distribution as the corresponding block in S_i, but such that the three blocks are independent. In particular, the middle block, reduced to the single point z_i, is independent of the two others. By the β̂-stability of the algorithm,

  E_S[R̂(h_S)] = (1/m) ∑_{i=1}^{m} E_S[c(h_S, z_i)] ≤ (1/m) ∑_{i=1}^{m} E_{S_i}[c(h_{S_i}, z_i)] + 2bβ̂.   (14)

Applying Lemma 1 to the first term of the right-hand side yields

  E_S[R̂(h_S)] ≤ (1/m) ∑_{i=1}^{m} E_{S̃_i}[c(h_{S̃_i}, z̃_i)] + 2bβ̂ + 2β(b)M.   (15)

Combining the independent block sequences associated with R̂(h_S) and R(h_S) will help us prove the lemma in a way similar to the i.i.d. case treated in [3]. Let S_b be defined as in the proof of Lemma 2. To deal with independent block sequences defined with respect to the same hypothesis, we consider the sequence S_{i,b} = S_i ∩ S_b, which is illustrated by Figure 1(c). This can result in as many as four blocks. As before, we consider a sequence S̃_{i,b} with a similar set of blocks, each with the same distribution as the corresponding block in S_{i,b}, but such that the blocks are independent. Since three blocks of at most b points are removed from each hypothesis, by the β̂-stability of the learning algorithm, the following holds:

  E_S[Φ(S)] = E_S[R(h_S) - R̂(h_S)] = E_{S,z}[(1/m) ∑_{i=1}^{m} (c(h_S, z) - c(h_S, z_i))]   (16)
    ≤ E_{S_{i,b},z}[(1/m) ∑_{i=1}^{m} (c(h_{S_{i,b}}, z) - c(h_{S_{i,b}}, z_i))] + 6bβ̂.   (17)

Now, the application of Lemma 1 to the difference of two cost functions, also bounded by M, as in the right-hand side, leads to

  E_S[Φ(S)] ≤ E_{S̃_{i,b},z̃}[(1/m) ∑_{i=1}^{m} (c(h_{S̃_{i,b}}, z̃) - c(h_{S̃_{i,b}}, z̃_i))] + 6bβ̂ + 3β(b)M.   (18)

Since z̃ and z̃_i are independent and the distribution is stationary, they have the same distribution and we can replace z̃_i with z̃ in the empirical cost and write

  E_S[Φ(S)] ≤ E[(1/m) ∑_{i=1}^{m} (c(h_{S̃_{i,b}}, z̃) - c(h_{S̃^i_{i,b}}, z̃))] + 6bβ̂ + 3β(b)M ≤ β̂ + 6bβ̂ + 3β(b)M,   (19)

where S̃^i_{i,b} is the sequence derived from S̃_{i,b} by replacing z̃_i with z̃. The last inequality holds by the β̂-stability of the learning algorithm. The other side of the inequality in the statement of the lemma can be shown following the same steps.

3.3 Main Results

This section presents several theorems that constitute the main results of this paper. We will use the following theorem, which extends McDiarmid's inequality to ϕ-mixing distributions.

Theorem 1 (Kontorovich and Ramanan [7], Theorem 1.1). Let Φ : Z^m → R be a function defined over a countable space Z. If Φ is l-Lipschitz with respect to the Hamming metric for some l > 0, then the following holds for all ε > 0:

  Pr_Z[|Φ(Z) - E[Φ(Z)]| > ε] ≤ 2 exp( -ε² / ( 2 m l² (1 + 2 ∑_{k=1}^{m} ϕ(k))² ) ).   (20)

Theorem 2 (General Non-i.i.d. Stability Bound). Let h_S denote the hypothesis returned by a β̂-stable algorithm trained on a sample S drawn from a ϕ-mixing stationary distribution, and let c be a measurable non-negative cost function upper bounded by M > 0. Then, for any b ∈ [0, m] and any ε > 0, the following generalization bound holds:

  Pr_S[ |R(h_S) - R̂(h_S)| > ε + (6b + 1)β̂ + 6Mϕ(b) ] ≤ 2 exp( -ε² / ( 2m (1 + 2 ∑_{i=1}^{m} ϕ(i))² ((b + 1)2β̂ + 2Mϕ(b) + M/m)² ) ).

Proof. The theorem follows directly from the application of Lemma 3 and Lemma 4 to Theorem 1.

The theorem gives a general stability bound for ϕ-mixing stationary sequences. If we further assume that the sequence is algebraically ϕ-mixing, that is, for all k, ϕ(k) = ϕ_0 k^{-r} for some r > 1, then we can solve for the value of b that optimizes the bound.
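As a purely numerical illustration (my own sketch, following the bound of Theorem 2 as reconstructed above, not code from the paper), the right-hand side can be evaluated for given values of m, b, β̂, M and a mixing function ϕ:

    import math

    def theorem2_bound(m, b, beta_hat, M, phi, eps):
        """Evaluate the Theorem 2 bound as reconstructed above: the deviation term
        (6b + 1)*beta_hat + 6*M*phi(b) added to eps, and the probability
        2*exp(-eps^2 / (2*m*(tau*L)^2)) with
        L = (b + 1)*2*beta_hat + 2*M*phi(b) + M/m and tau = 1 + 2*sum_{i<=m} phi(i)."""
        L = (b + 1) * 2 * beta_hat + 2 * M * phi(b) + M / m
        tau = 1 + 2 * sum(phi(i) for i in range(1, m + 1))
        slack = (6 * b + 1) * beta_hat + 6 * M * phi(b)
        prob = 2 * math.exp(-eps ** 2 / (2 * m * (tau * L) ** 2))
        return slack, prob

    # Algebraically mixing coefficients phi(k) = phi0 * k**(-r) and beta_hat = O(1/m).
    m, phi0, r, M = 10000, 0.5, 2.0, 1.0
    slack, prob = theorem2_bound(m, b=50, beta_hat=1.0 / m, M=M,
                                 phi=lambda k: phi0 * k ** (-r), eps=0.1)
    print(slack, prob)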

Theorem 3 (Non-i.i.d. Stability Bound for Algebraically Mixing Sequences). Let h_S denote the hypothesis returned by a β̂-stable algorithm trained on a sample S drawn from an algebraically ϕ-mixing stationary distribution, ϕ(k) = ϕ_0 k^{-r} with r > 1, and let c be a measurable non-negative cost function upper bounded by M > 0. Then, for any ε > 0, the following generalization bound holds:

  Pr_S[ |R(h_S) - R̂(h_S)| > ε + β̂ + (r + 1) 6Mϕ(b) ] ≤ 2 exp( -ε² / ( 2m (1 + 2ϕ_0 r/(r - 1))² (2β̂ + (r + 1) 2Mϕ(b) + M/m)² ) ),

where ϕ(b) = ϕ_0 (β̂/(r ϕ_0 M))^{r/(r+1)}.

Proof. For an algebraically mixing sequence, the value of b minimizing the bound of Theorem 2 satisfies β̂ b = r M ϕ(b), which gives b = (r ϕ_0 M/β̂)^{1/(r+1)} and ϕ(b) = ϕ_0 (β̂/(r ϕ_0 M))^{r/(r+1)}. The following term can be bounded as

  1 + 2 ∑_{i=1}^{m} ϕ(i) = 1 + 2ϕ_0 ∑_{i=1}^{m} i^{-r} ≤ 1 + 2ϕ_0 (1 + ∫_1^m x^{-r} dx) = 1 + 2ϕ_0 (1 + (1 - m^{1-r})/(r - 1)).   (21)

For r > 1, the exponent of m is negative, and so we can bound this last term by 1 + 2ϕ_0 r/(r - 1). Plugging in this value and the minimizing value of b into the bound of Theorem 2 yields the statement of the theorem.

In the case of a zero mixing coefficient (ϕ = 0 and b = 0), the bounds of Theorem 2 and Theorem 3 coincide with the i.i.d. stability bound of [3]. In order for the right-hand side of these bounds to converge, we must have β̂ = o(1/√m) and ϕ(b) = o(1/√m). For several general classes of algorithms, β̂ ≤ O(1/m) [3]. In the case of the algebraically mixing sequences with r > 1 assumed in Theorem 3, β̂ ≤ O(1/m) implies ϕ(b) = ϕ_0 (β̂/(r ϕ_0 M))^{r/(r+1)} < O(1/√m). The next section illustrates the application of Theorem 3 to several general classes of algorithms.

4 Application

We now present the application of our stability bounds to several algorithms in the case of an algebraically mixing sequence. Our bound applies to all algorithms based on the minimization of a regularized objective function based on the norm ||·||_K in a reproducing kernel Hilbert space, where K is a positive definite symmetric kernel:

  argmin_{h ∈ H} (1/m) ∑_{i=1}^{m} c(h, z_i) + λ ||h||_K²,   (22)

under some general conditions, since these algorithms are stable with β̂ ≤ O(1/m) [3]. Two specific instances of these algorithms are SVR, for which the cost function is based on the ε-insensitive cost:

  c(h, z) = |h(x) - y|_ε = 0 if |h(x) - y| ≤ ε, and |h(x) - y| - ε otherwise,   (23)

and Kernel Ridge Regression [13], for which c(h, z) = (h(x) - y)².
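The optimal block size of Theorem 3 is easy to compute explicitly; the sketch below (again an illustration under the formulas stated above, not the authors' code, with parameter values chosen arbitrarily) does so for a stability coefficient of the kernel-regularization type, β̂ = O(1/m), as in the corollary that follows.

    def algebraic_mixing_block_size(beta_hat, M, phi0, r):
        """Block size minimizing the Theorem 2 bound when phi(k) = phi0 * k**(-r):
        it solves beta_hat * b = r * M * phi(b), giving
        b = (r*phi0*M / beta_hat)**(1/(r+1)) and
        phi(b) = phi0 * (beta_hat / (r*phi0*M))**(r/(r+1))."""
        b = (r * phi0 * M / beta_hat) ** (1.0 / (r + 1))
        phi_b = phi0 * (beta_hat / (r * phi0 * M)) ** (r / (r + 1))
        return b, phi_b

    # SVR-type stability coefficient beta_hat <= kappa^2 / (2*lam*m), cf. Corollary 1 below.
    kappa, lam, m, M, phi0, r = 1.0, 0.1, 10000, 1.0, 0.5, 2.0
    beta_hat = kappa ** 2 / (2 * lam * m)
    b, phi_b = algebraic_mixing_block_size(beta_hat, M, phi0, r)
    print(b, phi_b, beta_hat * b, r * M * phi_b)  # the last two coincide at the optimum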

Corollary 1. Assume a bounded output Y = [0, B], a bounded cost function with bound M > 0, and a kernel bounded by K(x, x) ≤ κ² for all x, for some κ > 0. Let h_S denote the hypothesis returned by the algorithm when trained on a sample S drawn from an algebraically ϕ-mixing stationary distribution. Then, with probability at least 1 - δ, the following generalization bounds hold for

a. Support Vector Regression (SVR):

  R(h_S) ≤ R̂(h_S) + κ²/(2λm) + 3M'(κ²/(2λm))^u + ϕ'_0 ( κ²/(λm) + M'(κ²/(2λm))^u + M/m ) √(2m log(2/δ)),

b. Kernel Ridge Regression (KRR):

  R(h_S) ≤ R̂(h_S) + 2κ²B²/(λm) + 3M'(2κ²B²/(λm))^u + ϕ'_0 ( 4κ²B²/(λm) + M'(2κ²B²/(λm))^u + M/m ) √(2m log(2/δ)),

with u = r/(r + 1) ∈ (1/2, 1), M' = 2(r + 1)Mϕ_0/(rϕ_0M)^u, and ϕ'_0 = 1 + 2ϕ_0 r/(r - 1).

Proof. It has been shown in [3] that for SVR, β̂ ≤ κ²/(2λm), and for KRR, β̂ ≤ 2κ²B²/(λm). Plugging these values into the bound of Theorem 3 and setting the right-hand side to δ yields the statement of the corollary.

These bounds give, to the best of our knowledge, the first stability-based generalization bounds for SVR and KRR in a non-i.i.d. scenario. Similar bounds can be obtained for other families of algorithms such as maximum entropy discrimination, which can be shown to have comparable stability properties [3]. These bounds are non-trivial when the condition λ ≫ 1/m^{1/2 - 1/r} on the regularization parameter holds for all large values of m, which clearly coincides with the i.i.d. case as r tends to infinity. It would be interesting to give a quantitative comparison of our bounds and the generalization bounds of [10] based on covering numbers for mixing stationary distributions, in the scenario where test points are independent of the training sample. In general, because the bounds of [10] are not algorithm-dependent, one can expect tighter bounds using stability, provided that a tight bound is given on the stability coefficient. The comparison also depends on how fast the covering number grows with the sample size and trade-off parameters such as λ. For a fixed λ, the asymptotic behavior of our stability bounds for SVR and KRR is tight.

5 Conclusion

Our stability bounds for mixing stationary sequences apply to large classes of algorithms, including SVR and KRR, extending to weakly dependent observations existing bounds in the i.i.d. case. Since they are algorithm-specific, these bounds can often be tighter than other generalization bounds. Weaker notions of stability might help further improve or refine them.

Acknowledgments

This work was partially funded by the New York State Office of Science Technology and Academic Research (NYSTAR) and was also sponsored in part by the Department of the Army, Award Number W23RYX N605. The U.S. Army Medical Research Acquisition Activity, 820 Chandler Street, Fort Detrick, MD, is the awarding and administering acquisition office. The content of this material does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

References

[1] S. N. Bernstein. Sur l'extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Math. Ann., 97:1-59, 1927.
[2] O. Bousquet and A. Elisseeff. Algorithmic stability and generalization performance. In Advances in Neural Information Processing Systems (NIPS 2000).
[3] O. Bousquet and A. Elisseeff. Stability and generalization. JMLR, 2:499-526, 2002.
[4] L. Devroye and T. Wagner. Distribution-free performance bounds for potential function rules. IEEE Transactions on Information Theory, 25(5), 1979.
[5] P. Doukhan. Mixing: Properties and Examples. Springer-Verlag, 1994.
[6] M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. In Computational Learning Theory, pages 152-162, 1997.
[7] L. Kontorovich and K. Ramanan. Concentration inequalities for dependent random variables via the martingale method.
[8] A. Lozano, S. Kulkarni, and R. Schapire. Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations. In NIPS.
[9] D. Mattera and S. Haykin. Support vector machines for dynamic reconstruction of a chaotic system. In Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, 1999.
[10] R. Meir. Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1):5-34, 2000.
[11] D. Modha and E. Masry. On the consistency in nonparametric estimation under mixing assumptions. IEEE Transactions on Information Theory, 44:117-133, 1998.
[12] K.-R. Müller, A. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik. Predicting time series with support vector machines. In Proceedings of ICANN'97, LNCS. Springer, 1997.

[13] C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In Proceedings of ICML'98. Morgan Kaufmann Publishers Inc., 1998.
[14] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[15] Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, New York, 1998.
[16] M. Vidyasagar. Learning and Generalization: With Applications to Neural Networks. Springer, 2003.
[17] B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 22(1):94-116, January 1994.
