A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split


Michael Kearns
AT&T Bell Laboratories, Murray Hill, NJ 07974

Abstract: We give an analysis of the generalization error of cross validation in terms of two natural measures of the difficulty of the problem under consideration: the approximation rate (the accuracy to which the target function can be ideally approximated as a function of the number of hypothesis parameters), and the estimation rate (the deviation between the training and generalization errors as a function of the number of hypothesis parameters). The approximation rate captures the complexity of the target function with respect to the hypothesis model, and the estimation rate captures the extent to which the hypothesis model suffers from overfitting. Using these two measures, we give a rigorous and general bound on the error of cross validation. The bound clearly shows the tradeoffs involved with making the fraction γ of data saved for testing too large or too small. By optimizing the bound with respect to γ, we then argue (through a combination of formal analysis, plotting, and controlled experimentation) that the following qualitative properties of cross validation behavior should be quite robust to significant changes in the underlying model selection problem:

- When the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of γ.
- The importance of choosing γ optimally increases, and the optimal value for γ decreases, as the target function becomes more complex relative to the sample size.
- There is nevertheless a single fixed value for γ that works nearly optimally for a wide range of target function complexity.

Category: Learning Theory. Prefer oral presentation.

1 INTRODUCTION

In this paper we analyze the performance of cross validation in the context of model selection and complexity regularization.
We work in a setting in which we must choose the right number of parameters for a hypothesis function in response to a finite training sample, with the goal of minimizing the resulting generalization error. There is a large and interesting literature on cross validation methods, which often emphasizes asymptotic statistical properties, or the exact calculation of the generalization error for simple models. (The literature is too large to survey here; foundational papers include those of Stone [7, 8].) Our approach here is somewhat different, and is primarily inspired by two sources: the work of Barron and Cover [2], who introduced the idea of bounding the error of a model selection method (MDL in their case) in terms of a quantity known as the index of resolvability; and the work of Vapnik [9], who provides extremely powerful and general tools for uniformly bounding the deviations between training and generalization errors. We combine these methods to give a new and general analysis of cross validation performance. In the first and more formal part of the paper, we give a rigorous bound on the error of cross validation in terms of two parameters of the underlying model selection problem (which is defined by a target function, an input distribution, and a nested sequence of increasingly complex classes of hypothesis functions): the approximation rate and the estimation rate, mentioned in the abstract and defined formally shortly. Taken together, these two problem parameters determine (our analogue of) the index of resolvability, and Vapnik's work yields estimation rates applicable to many natural problems. In the second part of the paper, we investigate the implications of our bound for choosing γ, the fraction of data withheld for testing in cross validation. The most interesting aspect of this analysis is the identification of several qualitative properties (identified in the abstract, and in greater detail in the paper) of the optimal γ that appear to be invariant over a wide class of model selection problems.

2 THE FORMALISM

In this paper we consider model selection as a two-part problem: choosing the appropriate number of parameters for the hypothesis function, and tuning these parameters. The training sample is used in both steps of this process. In many settings, the tuning of the parameters is determined by a fixed learning algorithm such as backpropagation, and then model selection reduces to the problem of choosing the architecture (which determines the number of weights, the connectivity pattern, and so on). For concreteness, here we adopt an idealized version of this division of labor. We assume a nested sequence of function classes H_1 ⊆ ··· ⊆ H_d ⊆ ···, called the structure [9], where H_d is a class of boolean functions of d parameters, each function being a mapping from some input space X into {0, 1}. For simplicity, in this paper we assume that the Vapnik-Chervonenkis (VC) dimension [10, 9] of the class H_d is O(d). To remove this assumption, one simply replaces all occurrences of d in our bounds by the VC dimension of H_d. We assume that we have in our possession a learning algorithm L that on input any training sample S and any value d will output a hypothesis function h_d ∈ H_d that minimizes the training error over H_d; that is, ε_t(h_d) = min_{h ∈ H_d} {ε_t(h)}, where ε_t(h) is the fraction of the examples in S on which h disagrees with the given label. In many situations, training error minimization is known to be computationally intractable, leading researchers to investigate heuristics such as backpropagation.
The extent to which the theory presented here applies to such heuristics will depend in part on the extent to which they approximate training error minimization for the problem under consideration; however, many of our results have rigorous generalizations for the case in which no assumptions are made on the heuristic L (with the quality of the bounds of course decaying with the quality of the hypotheses output by L). Model selection is thus the problem of choosing the best value of d. More precisely, we assume an arbitrary target function f (which may or may not reside in one of the function classes in the structure H_1 ⊆ ··· ⊆ H_d ⊆ ···), and an input distribution D; f and D together define the generalization error function ε_g(h) = Pr_{x∈D}[h(x) ≠ f(x)]. We are given a training sample S of f, consisting of m random examples drawn according to D and labeled by f (with the labels possibly corrupted by noise). In many model selection methods (such as Rissanen's Minimum Description Length Principle [5] and Vapnik's Guaranteed Risk Minimization [9]), for each value of d = 1, 2, 3, ... we give the entire sample S and d to the learning algorithm L to obtain the function h_d minimizing the training error in H_d. Some function of d and the training errors ε_t(h_d) is then used to choose among the sequence h_1, h_2, h_3, .... Whatever the method, we interpret the goal to be that of minimizing the generalization error of the hypothesis selected. In this paper, we will make the rather mild but very useful assumption that the structure has the property that for any sample size m, there is a value d_max(m) such that ε_t(h_{d_max(m)}) = 0 for any labeled sample S of m examples; we will refer to the function d_max(m) as the fitting number of the structure¹. The fitting number formalizes the simple notion that with enough parameters, we can always fit the training data perfectly, a property held by most sufficiently powerful function classes (including multilayer neural networks).
We typically expect the fitting number to be a linear function of m, or at worst a polynomial in m. The significance of the fitting number for us is that no reasonable model selection method should choose h_d for d ≥ d_max(m), since doing so simply adds complexity without reducing the training error. In this paper we concentrate on the simplest version of cross validation. Unlike the methods mentioned above, which use the entire sample for training the h_d, in cross validation we choose a parameter γ ∈ [0, 1], which determines the split between training and test data. Given the input sample S of m examples, let S' be the subsample consisting of the first (1 − γ)m examples in S, and S'' the subsample consisting of the last γm examples. In cross validation, rather than giving the entire sample S to L, we give only the smaller sample S', resulting in the sequence h_1, ..., h_{d_max((1−γ)m)} of increasingly complex hypotheses. Each hypothesis is now obtained by training on only (1 − γ)m examples, which implies that we will consider only d smaller than the corresponding fitting number d_max((1 − γ)m); let us introduce the shorthand d'_max for d_max((1 − γ)m). Cross validation chooses the h_d satisfying ε''_t(h_d) = min_{i ∈ {1,...,d'_max}} {ε''_t(h_i)}, where ε''_t(h_i) is the error of h_i on the subsample S''. Notice that we are not considering multifold cross validation, or other variants that make more efficient use of the sample, because our analyses will require the independence of the test set. However, we believe that many of the themes that emerge here apply to these more sophisticated variants as well. We use ε_cv(m) to denote the generalization error ε_g(h_d) of the hypothesis h_d chosen by cross validation when given as input a sample S of m random examples of the target function; obviously, ε_cv(m) depends on the structure, f, D, and the noise rate. When bounding ε_cv(m), we will use the expression "with high probability" to mean with probability 1 − δ over the sample S, for some small fixed constant δ > 0.

¹ Weaker definitions for d_max(m) also suffice, but this is the simplest.
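As a concrete illustration of the procedure just described, the following sketch implements train-test-split cross validation. All names here are our own; the learner argument is a stand-in for the training error minimizer L, which the paper treats as given.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (input x, label y)
Hypothesis = Callable[[float], int]  # a boolean function h: X -> {0, 1}

def cross_validate(sample: List[Example],
                   learner: Callable[[List[Example], int], Hypothesis],
                   gamma: float,
                   d_max: int) -> Tuple[Hypothesis, int]:
    """Train-test-split cross validation: train h_1, ..., h_{d_max} on the
    first (1 - gamma) * m examples, then return the h_d with the smallest
    error on the held-out last gamma * m examples."""
    m = len(sample)
    split = int(round((1 - gamma) * m))
    train, test = sample[:split], sample[split:]
    best_d, best_h, best_err = None, None, float("inf")
    for d in range(1, d_max + 1):
        h = learner(train, d)  # stand-in for the training error minimizer L
        err = sum(h(x) != y for x, y in test) / len(test)
        if err < best_err:
            best_d, best_h, best_err = d, h, err
    return best_h, best_d
```

Any learner with this signature can be plugged in; the procedure itself never inspects the hypothesis classes, only the held-out error of each h_d.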

All of our results can also be stated with δ as a parameter at the cost of a log(1/δ) factor in the bounds, or in terms of the expected value of ε_cv(m).

3 THE APPROXIMATION RATE

It is apparent that we should not expect to give nontrivial bounds on ε_cv(m) that take no account of some measure of the complexity of the unknown target function f; the correct measure of this complexity is less obvious. Following the example of Barron and Cover's analysis of MDL performance in the context of density estimation [2], we propose the approximation rate as a natural measure of the complexity of f and D in relationship to the chosen structure H_1 ⊆ ··· ⊆ H_d ⊆ ···. Thus we define the approximation rate function ε_g(d) to be ε_g(d) = min_{h ∈ H_d} {ε_g(h)}. The function ε_g(d) tells us the best generalization error that can be achieved in the class H_d, and it is a nonincreasing function of d. If ε_g(d₀) = 0 for some sufficiently large d₀, this means that the target function f, at least with respect to the input distribution, is realizable in the class H_{d₀}, and thus d₀ is a coarse measure of how complex f is. More generally, even if ε_g(d) > 0 for all d, the rate of decay of ε_g(d) still gives a nice indication of how much representational power we gain with respect to f and D by increasing the complexity of our models. Still missing, of course, is some means of determining the extent to which this representational power can be realized by training on a finite sample of a given size, but this will be added shortly. First we give examples of the approximation rate that we will examine in some detail following the general bound on ε_cv(m).

The Intervals Problem. In this problem, the input space X is the real interval [0, 1], and the class H_d of the structure consists of all boolean step functions over [0, 1] of at most d steps; thus, each function partitions the interval [0, 1] into at most d disjoint segments (not necessarily of equal width), and assigns alternating positive and negative labels to these segments.
Thus, the input space is one-dimensional, but the structure contains arbitrarily complex functions over [0, 1]. It is easily verified that our assumption that the VC dimension of H_d is O(d) holds here, and that the fitting number obeys d_max(m) ≤ m. Now suppose that the input density D is uniform, and suppose that the target function f is the function of s alternating segments of equal width 1/s, for some s (thus, f lies in the class H_s). We will refer to these settings as the intervals problem. Then the approximation rate ε_g(d) is ε_g(d) = (1/2)(1 − d/s) for 1 ≤ d < s and ε_g(d) = 0 for d ≥ s. Thus, as long as d < s, increasing the complexity d gives linear payoff in terms of decreasing the optimal generalization error. For d ≥ s, there is no payoff for increasing d, since we can already realize the target function. The reader can easily verify that if f lies in H_s, but does not have equal width intervals, ε_g(d) is still piecewise linear, but for d < s is concave up: the gain in approximation obtained by incrementally increasing d diminishes as d becomes larger. Although the intervals problem is rather simple and artificial, a precise analysis of cross validation behavior can be given for it, and we will argue that this behavior is representative of much broader and more realistic settings.

The Perceptron Problem. In this problem, the input space X is ℝ^N for some large natural number N. The class H_d consists of all perceptrons over the N inputs in which at most d weights are nonzero. If the input density is spherically symmetric (for instance, the uniform density on the unit ball in ℝ^N), and the target function is a function in H_s with all s nonzero weights equal to 1, then it can be shown that the approximation rate function ε_g(d) is ε_g(d) = (1/π) cos⁻¹(√(d/s)) for d < s [6], and of course ε_g(d) = 0 for d ≥ s.
This problem provides a nice contrast to the intervals problem, since here the behavior of the approximation rate for small d is concave down: as long as d < s, an incremental increase in d yields more approximative power for large d than it does for small d (except for very small values of d).

Power Law Decay. In addition to the specific examples just given, we would also like to study reasonably natural parametric forms of ε_g(d), to determine the sensitivity of our theory to a plausible range of behaviors for the approximation rate. This is important, since in practice we do not expect to have precise knowledge of ε_g(d), since it depends on the target function and input distribution. Following the work of Barron [1], who shows a O(c/d) bound on ε_g(d) for the case of neural networks with one hidden layer under a squared error generalization measure (where c is a measure of target function complexity in terms of a Fourier transform integrability condition)², in the later part of the paper we investigate ε_g(d) of the form (c/d)^α + ε_min, where ε_min is a parameter representing the degree of unrealizability of f with respect to the structure, and c, α > 0 are parameters capturing the rate of decay to ε_min. Our analysis concludes that the qualitative phenomena we identify are invariant to wide ranges of choices for these parameters. Note that for all three cases, there is a natural measure of target function complexity captured by ε_g(d): in the intervals problem it is the number of target intervals, in the perceptron problem it is the number of nonzero weights in the target, and in the more general power law case it is captured by the parameters of the power law.

² Since the bounds we will give have straightforward generalizations to real-valued function learning under squared error (details in the full paper), examining behavior for ε_g(d) in this setting seems reasonable.
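The three approximation rates just described can be written down directly. The following is a small sketch; the function names are ours, and each body simply transcribes the corresponding formula above.

```python
import math

def eps_g_intervals(d: int, s: int) -> float:
    # piecewise-linear approximation rate for the intervals problem
    return 0.5 * (1 - d / s) if d < s else 0.0

def eps_g_perceptron(d: int, s: int) -> float:
    # (1/pi) * arccos(sqrt(d/s)) for d < s; zero once the target is realizable
    return math.acos(math.sqrt(d / s)) / math.pi if d < s else 0.0

def eps_g_power_law(d: int, c: float, alpha: float, eps_min: float) -> float:
    # parametric form (c/d)^alpha + eps_min
    return (c / d) ** alpha + eps_min
```

All three are nonincreasing in d, as any approximation rate must be, and the first two vanish exactly at d = s.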

In the later part of the paper, we will study cross validation performance as a function of these complexity measures, and obtain remarkably similar predictions.

4 THE ESTIMATION RATE

For a fixed f, D and H_1 ⊆ ··· ⊆ H_d ⊆ ···, we say that a function ρ(d, m) is an estimation rate bound if for all d and m, with high probability over the sample S we have |ε_t(h_d) − ε_g(h_d)| ≤ ρ(d, m), where as usual h_d is the result of training error minimization on S within H_d. Thus ρ(d, m) simply bounds the deviation between the training error and the generalization error of h_d³. Note that the best such bound may depend in a complicated way on all of the elements of the problem: f, D and the structure. Indeed, much of the recent work on the statistical physics theory of learning curves has documented the wide variety of behaviors that such deviations may assume [6, 3]. However, for many natural problems it is both convenient and accurate to rely on a universal estimation rate bound provided by the powerful theory of uniform convergence: namely, for any f, D and any structure, the function ρ(d, m) = O(√((d/m) log(m/d))) is an estimation rate bound [9]⁴. Depending upon the details of the problem, it is sometimes appropriate to omit the log(m/d) factor, and often appropriate to refine the √(d/m) behavior to a function that interpolates smoothly between d/m behavior for small ε_t and √(d/m) for large ε_t. Although such refinements are both interesting and important, many of the qualitative claims and predictions we will make are invariant to them as long as the deviation |ε_t(h_d) − ε_g(h_d)| is well-approximated by a power law (d/m)^α (α > 0); it will be more important to recognize and model the cases in which power law behavior is grossly violated. Note that this universal estimation rate bound holds only under the assumption that the training sample is noise-free, but straightforward generalizations exist.
For instance, if the training data is corrupted by random label noise at rate η < 1/2, then ρ(d, m) = O(√((d/((1 − 2η)²m)) log(m/d))) is again a universal estimation rate bound. After giving a general bound on ε_cv(m) in which the approximation and estimation rate functions are parameters, we investigate the behavior of ε_cv(m) (and more specifically, of the parameter γ) for specific choices of these parameters.

5 THE BOUND

Theorem 1 Let H_1 ⊆ ··· ⊆ H_d ⊆ ··· be any structure, where the VC dimension of H_d is O(d). Let f and D be any target function and input distribution, let ε_g(d) be the approximation rate function for the structure with respect to f and D, and let ρ(d, m) be an estimation rate bound for the structure with respect to f and D. Then for any m, with high probability

ε_cv(m) ≤ min_{1 ≤ d ≤ d'_max} {ε_g(d) + ρ(d, (1 − γ)m)} + O(√(log(d'_max)/(γm)))   (1)

where γ is the fraction of the training sample used for testing, and d'_max = d_max((1 − γ)m). Using the universal estimation rate bound and the rather weak assumption that d_max(m) is polynomial in m, we obtain that with high probability

ε_cv(m) ≤ min_{1 ≤ d ≤ d'_max} {ε_g(d) + O(√((d/((1 − γ)m)) log(((1 − γ)m)/d)))} + O(√(log((1 − γ)m)/(γm))).   (2)

Straightforward generalizations of these bounds for the case where the data is corrupted by random label noise can be obtained, using the modified estimation rate bound mentioned in Section 4 (details in the full paper).

Proof Sketch: We have space only to highlight the main ideas of the proof. For each d from 1 to d'_max, fix a function f_d ∈ H_d satisfying ε_g(f_d) = ε_g(d); thus, f_d is the best possible approximation to the target f within the class H_d. By a standard Chernoff bound argument it can be shown that with high probability we have |ε_t(f_d) − ε_g(f_d)| ≤ √(log(d'_max)/m) for all 1 ≤ d ≤ d'_max. This means that within each H_d, the minimum training error ε_t(h_d) is at most ε_g(d) + √(log(d'_max)/m).

³ In the later and less formal part of the paper, we will often assume that specific forms of ρ(d, m) are not merely upper bounds on this deviation, but accurate approximations to it.
⁴ The results of Vapnik actually show the stronger result that |ε_t(h) − ε_g(h)| ≤ O(√((d/m) log(m/d))) for all h ∈ H_d, not only for the training error minimizer h_d.

Since ρ(d, m) is an estimation rate bound, we have that with high probability, for all 1 ≤ d ≤ d'_max,

ε_g(h_d) ≤ ε_t(h_d) + ρ(d, m)   (3)
         ≤ ε_g(d) + √(log(d'_max)/m) + ρ(d, m).   (4)

Thus we have bounded the generalization error of the d'_max hypotheses h_d, only one of which will be selected. If we knew the actual values of these generalization errors (equivalent to having an infinite test sample in cross validation), we could bound our error by the minimum over all 1 ≤ d ≤ d'_max of the expression (4) above. However, in cross validation we do not know the exact values of these generalization errors but must instead use the γm testing examples to estimate them. Again by standard Chernoff bound arguments, this introduces an additional √(log(d'_max)/(γm)) error term, resulting in our final bound. This concludes the proof sketch.

In the bounds given by (1) and (2), the min{·} expression is analogous to Barron and Cover's index of resolvability [2]; the final term in the bounds represents the error introduced by the testing phase of cross validation. These bounds exhibit tradeoff behavior with respect to the parameter γ: as we let γ approach 0, we are devoting more and more of the sample to training the h_d, and the estimation rate bound term ρ(d, (1 − γ)m) is decreasing. However, the test error term O(√(log(d'_max)/(γm))) is increasing, since we have less data to accurately estimate the ε_g(h_d). The reverse phenomenon occurs as we let γ approach 1. While we believe Theorem 1 to be enlightening and potentially useful in its own right, we would now like to take its interpretation a step further. More precisely, suppose we assume that the bound is an approximation to the actual behavior of ε_cv(m). Then in principle we can optimize the bound to obtain the best value for γ.
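This optimization can also be carried out numerically. The sketch below is our own illustration, not part of the paper's analysis: it takes all hidden constants in the O(·) notation to be 1, drops the log factor from the estimation term, uses the simplified test term √(log(m)/(γm)) that Section 6 adopts, and assumes a base-2 logarithm (the base that matches the γ_opt values quoted in Section 6).

```python
import math
from typing import Callable

def cv_bound(eps_g: Callable[[int], float], m: int,
             gamma: float, d_max: int) -> float:
    # right-hand side of bound (2) under the simplifications stated above:
    # min over d of [eps_g(d) + sqrt(d / ((1 - gamma) m))] plus a test term
    test = math.sqrt(math.log2(m) / (gamma * m))
    return min(eps_g(d) + math.sqrt(d / ((1 - gamma) * m))
               for d in range(1, d_max + 1)) + test

def best_gamma(eps_g: Callable[[int], float], m: int, d_max: int) -> float:
    # brute-force the optimal train-test split over a grid of gamma values
    gammas = [i / 100 for i in range(1, 100)]
    return min(gammas, key=lambda g: cv_bound(eps_g, m, g, d_max))
```

For the intervals approximation rate this grid search lands near the closed-form γ_opt derived in Section 6, and it exhibits the predicted shrinking of the optimal test fraction as the target complexity s grows relative to m.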
Of course, in addition to the assumptions involved (the main one being that ρ(d, m) is a good approximation to the training-generalization error deviations of the h_d), this analysis can only be carried out given information that we should not expect to have in practice (at least in exact form), in particular, the approximation rate function ε_g(d), which depends on f and D. However, we argue in the coming sections that several interesting qualitative phenomena regarding the choice of γ are largely invariant to a wide range of natural behaviors for ε_g(d).

6 A CASE STUDY: THE INTERVALS PROBLEM

We begin by performing the suggested optimization of γ for the intervals problem. Recall that the approximation rate here is ε_g(d) = (1/2)(1 − d/s) for d < s and ε_g(d) = 0 for d ≥ s, where s is the complexity of the target function. Here we analyze the behavior obtained by assuming that the estimation rate term actually behaves as ρ(d, (1 − γ)m) = √(d/((1 − γ)m)) (so we are omitting the log factor from the universal bound)⁵, and to simplify the formal analysis a bit (but without changing the qualitative behavior) we replace the term √(log((1 − γ)m)/(γm)) by the weaker √(log(m)/(γm)). Thus, if we define the function

F(d, m, γ) = ε_g(d) + √(d/((1 − γ)m)) + √(log(m)/(γm))   (5)

then following (1), we are approximating ε_cv(m) by ε_cv(m) ≈ min_{1 ≤ d ≤ d'_max} {F(d, m, γ)}⁶ (see Figure 1 for a plot of this approximation). The first step of the analysis is to fix a value for γ and differentiate F(d, m, γ) with respect to d to discover the minimizing value of d. This differentiation must have two regimes due to the discontinuity at d = s in ε_g(d). It is easily verified that the derivative is −(1/(2s)) + 1/(2√(d(1 − γ)m)) for d < s and 1/(2√(d(1 − γ)m)) for d ≥ s.
It can be shown that provided that (1 − γ)m ≥ 4s, then d = s is a global minimum of F(d, m, γ), and if this condition is violated then the value we obtain for ε_cv(m) is vacuously large anyway (meaning that this fixed choice of γ cannot end up being the optimal one, or, if m < 4s, that our analysis claims we simply do not have enough data for nontrivial generalization, regardless of how we split it between training and testing). Plugging in d = s yields ε_cv(m) ≤ F(s, m, γ) = √(s/((1 − γ)m)) + √(log(m)/(γm)) for this fixed choice of γ. Now by differentiating F(s, m, γ) with respect to γ, it can be shown that the optimal choice of γ under the assumptions is γ_opt = (log(m)/s)^{1/3}/(1 + (log(m)/s)^{1/3}).

⁵ It can be argued that this power law estimation rate is actually a rather accurate approximation for the true behavior of the training-generalization deviations of the h_d for this problem.

⁶ Although there are hidden constants in the O(·) notation of the bounds, it is the relative weights of the estimation and test error terms that is important, and choosing both constants equal to 1 is a reasonable choice (since both terms have the same Chernoff bound origins).
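The closed form for γ_opt is easy to evaluate. The sketch below is ours; it assumes a base-2 logarithm, which reproduces the numerical values quoted in the next paragraph.

```python
import math

def gamma_opt(m: int, s: int) -> float:
    """Closed-form optimal test fraction for the intervals problem:
    gamma_opt = (log(m)/s)^(1/3) / (1 + (log(m)/s)^(1/3))."""
    r = (math.log2(m) / s) ** (1.0 / 3.0)
    return r / (1.0 + r)
```

Note that γ_opt depends on m and s only through the ratio log(m)/s, and is monotonically decreasing in the target complexity s, which is property (D) discussed below.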

It is important to remember at this point that despite the fact that we have derived a precise expression for γ_opt, due to the assumptions and approximations we have made in the various constants, any quantitative interpretation of this expression is meaningless. However, we can reasonably expect that this expression captures the qualitative way in which the optimal γ changes as the amount of data m changes in relation to the target function complexity s. On this score the situation initially appears rather bleak, as the function (log(m)/s)^{1/3}/(1 + (log(m)/s)^{1/3}) is quite sensitive to the ratio log(m)/s. For example, for m = 10,000, if s = 10 we obtain γ_opt = 0.524, if s = 100 we obtain γ_opt = 0.338, and if s = 1000 we obtain γ_opt = 0.191. Thus γ_opt is becoming smaller as log(m)/s becomes small, and the analysis suggests vastly different choices for γ_opt depending on the target function complexity, which is something we do not expect to have the luxury of knowing in practice. However, it is both fortunate and interesting that γ_opt does not tell the entire story. In Figure 2, we plot the function F(s, m, γ)⁷ as a function of γ for m = 10,000 and for several different values of s (note that for consistency with the later experimental plots, the x axis of the plot is actually the training fraction 1 − γ). Here we can observe four important qualitative phenomena: (A) When s is small compared to m, the predicted error is relatively insensitive to the choice of γ: as a function of γ, F(s, m, γ) has a wide, flat bowl, indicating a wide range of γ yielding essentially the same near-optimal error. (B) As s becomes larger in comparison to the fixed sample size m, the relative superiority of γ_opt over other values for γ becomes more pronounced. In particular, large values for γ become progressively worse as s increases. For example, the plots indicate that for s = 10 (again, m = 10,000), even though γ_opt = 0.524 the choice γ = 0.75 will result in error quite near that achieved using γ_opt. However, for s = 500, γ = 0.75 is predicted to yield greatly suboptimal error.
(C) Because of the insensitivity to γ for s small compared to m, there is a fixed value of γ which seems to yield reasonably good performance for a wide range of values for s; this value is essentially the value of γ_opt for the case where s is large (but nontrivial generalization is still possible), since choosing the best value for γ is more important there than for the small s case. Note that for very large s, the bound predicts vacuously large error for all values of γ, so that the choice of γ again becomes irrelevant. (D) The value of γ_opt is decreasing as s increases. This is slightly difficult to confirm from the plot, but can be seen clearly from the precise expression for γ_opt. Despite the fact that our analysis so far has been rather specialized (addressing the behavior for a fixed structure, target function and input distribution), it is our belief that (A), (B), (C) and (D) above are rather universal phenomena that hold for many other model selection problems. For instance, we shall shortly demonstrate that our theory again predicts that (A), (B), (C) and (D) hold for the perceptron problem, and for the case of power law decay of ε_g(d) described earlier. First we give an experimental demonstration that at least the predicted properties (A), (B) and (C) truly do hold for the intervals problem. In Figures 3, 4 and 5, we plot the results of experiments in which labeled random samples of size m = 5000 were generated for a target function of s equal width intervals, for s = 10, 100 and 500. The samples were corrupted by random label noise at rate η = 0.3. For each value of γ and each value of d, (1 − γ)m of the sample was given to a program implementing training error minimization for the class H_d⁸; the remaining γm examples were used to select the best h_d according to cross validation. The plots show the true generalization error of the h_d selected by cross validation as a function of γ; this generalization error can be computed exactly for this problem. Each point in the plots represents an average over 10 trials.
While there are obvious and significant quantitative differences between these experimental plots and Figure 2, the properties (A), (B) and (C) are rather clearly borne out by the data: (A) In Figure 3, where s is small compared to m, there is a wide range of acceptable γ; it appears that any choice of γ between 0.10 and 0.50 yields nearly optimal generalization error. (B) By the time s = 100 (Figure 4), the sensitivity to γ is considerably more pronounced. For example, the choice γ = 0.50 now results in clearly suboptimal performance, and it is more important to have γ close to 0.10. (C) Despite these complexities, there does indeed appear to be a single value of γ, approximately 0.10, that performs nearly optimally for the entire range of s examined. The property (D), namely, that the optimal γ decreases as the target function complexity is increased relative to a fixed m, is certainly not refuted by the experimental results, but any such effect is simply too small to be verified. It would be interesting to verify this prediction experimentally, perhaps on a different problem where the predicted effect is more pronounced.

⁷ In the plots, we now use the more accurate test penalty term √(log((1 − γ)m)/(γm)) since we are no longer concerned with simplifying the calculation.

⁸ A nice feature of the intervals problem is the fact that training error minimization can be performed in almost linear time using a dynamic programming approach [4].
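Footnote 8 notes that training error minimization for the intervals classes can be done by dynamic programming. The following is our own illustrative O(md) reconstruction, not necessarily the algorithm of [4]: it computes the minimum number of disagreements achievable by a step function over [0, 1] with at most d pieces.

```python
from typing import List, Tuple

def min_training_error(points: List[Tuple[float, int]], d: int) -> int:
    """Minimum number of disagreements on the labeled points achievable by a
    boolean step function with at most d constant pieces, via an O(m * d)
    dynamic program over the points sorted by input value."""
    labels = [y for _, y in sorted(points)]
    INF = float("inf")
    # best[j][b]: min errors so far using up to j pieces, current piece labeled b
    best = [[INF, INF] for _ in range(d + 1)]
    for j in range(1, d + 1):
        best[j][0] = best[j][1] = 0  # before any points, any piece budget is fine
    for y in labels:
        new = [[INF, INF] for _ in range(d + 1)]
        for j in range(1, d + 1):
            for b in (0, 1):
                stay = best[j][b]            # extend the current piece
                switch = best[j - 1][1 - b]  # start a new piece, alternating label
                new[j][b] = (y != b) + min(stay, switch)
        best = new
    return min(best[d][0], best[d][1])
```

Minimizing over alternating-label pieces loses no generality, since adjacent pieces with equal labels can always be merged without changing the hypothesis.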

7 POWER LAW DECAY AND THE PERCEPTRON PROBLEM

For the cases where the approximation rate ε_g(d) obeys either power law decay or is that derived for the perceptron problem discussed in Section 3, the behavior of ε_cv(m) as a function of γ predicted by our theory is largely the same. For example, if ε_g(d) = c/d and we use the standard estimation rate ρ(d, (1 − γ)m) = √(d/((1 − γ)m)), then an analysis similar to that performed for the intervals problem reveals that for fixed γ the minimizing choice of d is d = (4c²(1 − γ)m)^{1/3}; plugging this value of d back into the bound on ε_cv(m) yields Figure 6, which, like Figure 2 for the intervals problem, shows the predicted behavior of ε_cv(m) as a function of γ for a fixed m and several different choices for the complexity parameter c. We again see that properties (A), (B), (C) and (D) hold strongly despite the change in ε_g(d) from the intervals problem (although quantitative aspects of the prediction, which already must be taken lightly for reasons previously stated, have obviously changed, such as the interesting values of the ratio of sample size to target function complexity). Through a combination of formal analysis and plotting, it is possible to demonstrate that the properties (A), (B), (C) and (D) are robust to wide variations in the parameters α and ε_min in the parametric form ε_g(d) = (c/d)^α + ε_min, as well as wide variations in the form of the estimation rate ρ(d, m). For example, if ε_g(d) = (c/d)² (faster than the approximation rate examined above) and ρ = d/m (faster than the estimation rate examined above), then for the interesting ratios of m to c (that is, where the generalization error predicted is bounded away from 0 and the trivial value of 1/2), a figure quite similar to Figure 6 is obtained. Similar predictions can be derived for the perceptron problem using the universal estimation rate bound or any similar power law form.
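The closed-form minimizing d for the power law case can be checked numerically against a brute-force search; a small sketch (ours), writing n = (1 − γ)m:

```python
import math

def d_star(c: float, n: float) -> float:
    # closed-form minimizer of c/d + sqrt(d/n), where n = (1 - gamma) * m
    return (4.0 * c * c * n) ** (1.0 / 3.0)

def d_star_grid(c: float, n: float, d_max: int) -> int:
    # brute-force check of the same minimization on the integer grid
    return min(range(1, d_max + 1), key=lambda d: c / d + math.sqrt(d / n))
```

Setting the derivative −c/d² + 1/(2√(dn)) to zero gives d^{3/2} = 2c√n, i.e. d = (4c²n)^{1/3}, which the grid search confirms.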
In summary, our theory predicts that although significant quantitative differences in the behavior of cross validation may arise for different model selection problems, the properties (A), (B), (C) and (D) should be present in a wide range of problems. At the very least, the behavior of our bounds exhibits these properties for a wide range of problems. It would be interesting to try to identify natural problems for which one or more of these properties is strongly violated; a potential source for such problems may be those for which the underlying learning curve deviates from classical power law behavior [6, 3].

8 ACKNOWLEDGEMENTS

I give warm thanks to Yishay Mansour, Dana Ron and Yoav Freund for their contributions to this work and for many interesting discussions. The code used in the experiments was written by Andrew Ng.

References

[1] A. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39:930-945, 1993.
[2] A. R. Barron and T. M. Cover. Minimum complexity density estimation. IEEE Transactions on Information Theory, 37:1034-1054, 1991.
[3] D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, pages 76-87, 1994.
[4] M. Kearns, Y. Mansour, A. Ng, and D. Ron. An experimental and theoretical comparison of model selection methods. In Proceedings of the Eighth Annual ACM Conference on Computational Learning Theory, 1995.
[5] J. Rissanen. Stochastic Complexity in Statistical Inquiry, volume 15 of Series in Computer Science. World Scientific, 1989.
[6] H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45:6056-6091, 1992.
[7] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36:111-147, 1974.
[8] M. Stone. Asymptotics for and against cross-validation. Biometrika, 64(1):29-35, 1977.
[9] V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.
Estimation of Deendences Based on Emirical Data. Sringer-Verlag, New York, [1] V. N. Vanik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their robabilities. Theory of Probability and its Alications, 16(2):264 28, 1971.

Figure 1: Plot of the bound/approximation ε_cv(m) ≤ ε_g(d) + sqrt(d/((1−γ)m)) + sqrt(log((1−γ)m)/(γm)) as a function of the training fraction 1−γ and of d, for a fixed sample size m and with ε_g(d) the approximation rate for the intervals problem with a target function of complexity s. Several interesting features are evident, including the predicted choice d = s, and the curvature as a function of γ, indicating a unique optimal choice for γ bounded away from 0 and 1. (Note that we have plotted the minimum of the bound and 0.5, since this generalization error can be trivially achieved by flipping a fair coin.)

Figure 2: Plot of the predicted generalization error of cross validation for the intervals model selection problem, as a function of the fraction 1−γ of data used for training. (In the plot, the fraction of training data is 0 on the left (γ = 1) and 1 on the right (γ = 0).) A fixed sample size m was used, and the six plots show the error predicted by the theory for six increasing values of the target function complexity s, from the bottom plot to the top plot. The movement and relative superiority of the optimal choice for γ identified by properties (A), (B), (C) and (D) can be seen clearly. (We have again plotted the minimum of the predicted error and 0.5.)

Figure 3: Experimental plot of cross validation generalization error as a function of training set size (1−γ)m, for fixed small s and fixed m, with label noise added. As in Figure 2, the x axis indicates the amount of data used for training. For this small value of the ratio of s to m, property (A) predicted by the theory is confirmed: there is a wide range of γ values yielding essentially the same generalization error.
Figure 4: Experimental plot of cross validation generalization error as a function of training set size (1−γ)m as in Figure 3, but now for a larger target function complexity s. Property (B) predicted by the theory can be seen: compared to Figure 3, the relative superiority of the optimal γ over other values is increasing, and in particular larger values of γ (such as 0.5) are now clearly suboptimal choices.

Figure 5: Experimental plot of cross validation generalization error as a function of training set size (1−γ)m, now for a still larger value of s. Figures 3, 4 and 5 together confirm the predicted property (C): as in the theoretical plot of Figure 2, despite the differing shapes of the three plots there is a fixed value (about γ = 0.1) yielding near-optimal performance for the entire range of s.

Figure 6: Plot of the predicted generalization error of cross validation for the case ε_g(d) = c/d, as a function of the fraction 1−γ of data used for training. A fixed sample size m was used, and the six plots show the error predicted by the theory for six increasing values of the complexity parameter c, from the bottom plot to the top plot. Properties (A), (B), (C) and (D) are again evident. (We have again plotted the minimum of the predicted error and 0.5.)
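The qualitative shape seen in Figures 2 and 6 can be reproduced by plugging the minimizing d back into the full bound and sweeping the test fraction γ. A minimal sketch of that computation, with illustrative values of c and m (not those used for the paper's figures):

```python
import math

def cv_bound(gamma, c, m):
    # min over d of approximation + estimation error on the (1-gamma)m training points,
    # using the closed-form minimizer d* = (4 c^2 (1-gamma) m)^(1/3) for eps_g(d) = c/d
    train = (1.0 - gamma) * m
    d_star = (4.0 * c**2 * train) ** (1.0 / 3.0)
    train_err = c / d_star + math.sqrt(d_star / train)
    # penalty for estimating the error on the gamma*m test points
    test_pen = math.sqrt(math.log(train) / (gamma * m))
    # cap at 0.5, which is trivially achievable by flipping a fair coin
    return min(0.5, train_err + test_pen)

c, m = 10.0, 5000.0
gammas = [g / 100.0 for g in range(1, 100)]
g_opt = min(gammas, key=lambda g: cv_bound(g, c, m))
# the bound rises toward 0.5 at both extremes of gamma, so the optimum is interior
```

Sweeping γ this way shows the curvature described in the captions: too little test data makes the test penalty blow up, too little training data makes the training error blow up, and the optimum sits strictly between.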


STA 250: Statistics. Notes 7. Bayesian Approach to Statistics. Book chapters: 7.2 STA 25: Statistics Notes 7. Bayesian Aroach to Statistics Book chaters: 7.2 1 From calibrating a rocedure to quantifying uncertainty We saw that the central idea of classical testing is to rovide a rigorous

More information

An Analysis of TCP over Random Access Satellite Links

An Analysis of TCP over Random Access Satellite Links An Analysis of over Random Access Satellite Links Chunmei Liu and Eytan Modiano Massachusetts Institute of Technology Cambridge, MA 0239 Email: mayliu, modiano@mit.edu Abstract This aer analyzes the erformance

More information

On Wald-Type Optimal Stopping for Brownian Motion

On Wald-Type Optimal Stopping for Brownian Motion J Al Probab Vol 34, No 1, 1997, (66-73) Prerint Ser No 1, 1994, Math Inst Aarhus On Wald-Tye Otimal Stoing for Brownian Motion S RAVRSN and PSKIR The solution is resented to all otimal stoing roblems of

More information

PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH. Luoluo Liu, Trac D. Tran, and Sang Peter Chin

PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH. Luoluo Liu, Trac D. Tran, and Sang Peter Chin PARTIAL FACE RECOGNITION: A SPARSE REPRESENTATION-BASED APPROACH Luoluo Liu, Trac D. Tran, and Sang Peter Chin Det. of Electrical and Comuter Engineering, Johns Hokins Univ., Baltimore, MD 21218, USA {lliu69,trac,schin11}@jhu.edu

More information

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deriving ndicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V. Deutsch Centre for Comutational Geostatistics Deartment of Civil &

More information

Chapter 3. GMM: Selected Topics

Chapter 3. GMM: Selected Topics Chater 3. GMM: Selected oics Contents Otimal Instruments. he issue of interest..............................2 Otimal Instruments under the i:i:d: assumtion..............2. he basic result............................2.2

More information

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at

Proof: We follow thearoach develoed in [4]. We adot a useful but non-intuitive notion of time; a bin with z balls at time t receives its next ball at A Scaling Result for Exlosive Processes M. Mitzenmacher Λ J. Sencer We consider the following balls and bins model, as described in [, 4]. Balls are sequentially thrown into bins so that the robability

More information

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation

Uniformly best wavenumber approximations by spatial central difference operators: An initial investigation Uniformly best wavenumber aroximations by satial central difference oerators: An initial investigation Vitor Linders and Jan Nordström Abstract A characterisation theorem for best uniform wavenumber aroximations

More information

Finding a sparse vector in a subspace: linear sparsity using alternating directions

Finding a sparse vector in a subspace: linear sparsity using alternating directions IEEE TRANSACTION ON INFORMATION THEORY VOL XX NO XX 06 Finding a sarse vector in a subsace: linear sarsity using alternating directions Qing Qu Student Member IEEE Ju Sun Student Member IEEE and John Wright

More information

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of..

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of.. IOSR Journal of Mathematics (IOSR-JM) e-issn: 78-578, -ISSN: 319-765X. Volume 1, Issue 4 Ver. III (Jul. - Aug.016), PP 53-60 www.iosrournals.org Asymtotic Proerties of the Markov Chain Model method of

More information

Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis

Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis HIPAD LAB: HIGH PERFORMANCE SYSTEMS LABORATORY DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING AND EARTH SCIENCES Bayesian Model Averaging Kriging Jize Zhang and Alexandros Taflanidis Why use metamodeling

More information

Best approximation by linear combinations of characteristic functions of half-spaces

Best approximation by linear combinations of characteristic functions of half-spaces Best aroximation by linear combinations of characteristic functions of half-saces Paul C. Kainen Deartment of Mathematics Georgetown University Washington, D.C. 20057-1233, USA Věra Kůrková Institute of

More information

MAKING WALD TESTS WORK FOR. Juan J. Dolado CEMFI. Casado del Alisal, Madrid. and. Helmut Lutkepohl. Humboldt Universitat zu Berlin

MAKING WALD TESTS WORK FOR. Juan J. Dolado CEMFI. Casado del Alisal, Madrid. and. Helmut Lutkepohl. Humboldt Universitat zu Berlin November 3, 1994 MAKING WALD TESTS WORK FOR COINTEGRATED VAR SYSTEMS Juan J. Dolado CEMFI Casado del Alisal, 5 28014 Madrid and Helmut Lutkeohl Humboldt Universitat zu Berlin Sandauer Strasse 1 10178 Berlin,

More information

An Improved Calibration Method for a Chopped Pyrgeometer

An Improved Calibration Method for a Chopped Pyrgeometer 96 JOURNAL OF ATMOSPHERIC AND OCEANIC TECHNOLOGY VOLUME 17 An Imroved Calibration Method for a Choed Pyrgeometer FRIEDRICH FERGG OtoLab, Ingenieurbüro, Munich, Germany PETER WENDLING Deutsches Forschungszentrum

More information

7.2 Inference for comparing means of two populations where the samples are independent

7.2 Inference for comparing means of two populations where the samples are independent Objectives 7.2 Inference for comaring means of two oulations where the samles are indeendent Two-samle t significance test (we give three examles) Two-samle t confidence interval htt://onlinestatbook.com/2/tests_of_means/difference_means.ht

More information