A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split
Michael Kearns
AT&T Bell Laboratories, Murray Hill, NJ 07974

Abstract: We give an analysis of the generalization error of cross validation in terms of two natural measures of the difficulty of the problem under consideration: the approximation rate (the accuracy to which the target function can be ideally approximated as a function of the number of hypothesis parameters), and the estimation rate (the deviation between the training and generalization errors as a function of the number of hypothesis parameters). The approximation rate captures the complexity of the target function with respect to the hypothesis model, and the estimation rate captures the extent to which the hypothesis model suffers from overfitting. Using these two measures, we give a rigorous and general bound on the error of cross validation. The bound clearly shows the tradeoffs involved with making the fraction γ of data saved for testing too large or too small. By optimizing the bound with respect to γ, we then argue (through a combination of formal analysis, plotting, and controlled experimentation) that the following qualitative properties of cross validation behavior should be quite robust to significant changes in the underlying model selection problem:

- When the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of γ.
- The importance of choosing γ optimally increases, and the optimal value for γ decreases, as the target function becomes more complex relative to the sample size.
- There is nevertheless a single fixed value for γ that works nearly optimally for a wide range of target function complexity.

Category: Learning Theory. Prefer oral presentation.

1 INTRODUCTION

In this paper we analyze the performance of cross validation in the context of model selection and complexity regularization.
We work in a setting in which we must choose the right number of parameters for a hypothesis function in response to a finite training sample, with the goal of minimizing the resulting generalization error. There is a large and interesting literature on cross validation methods, which often emphasizes asymptotic statistical properties, or the exact calculation of the generalization error for simple models. (The literature is too large to survey here; foundational papers include those of Stone [7, 8].) Our approach here is somewhat different, and is primarily inspired by two sources: the work of Barron and Cover [2], who introduced the idea of bounding the error of a model selection method (MDL in their case) in terms of a quantity known as the index of resolvability; and the work of Vapnik [9], who provides extremely powerful and general tools for uniformly bounding the deviations between training and generalization errors. We combine these methods to give a new and general analysis of cross validation performance. In the first and more formal part of the paper, we give a rigorous bound on the error of cross validation in terms of two parameters of the underlying model selection problem (which is defined by a target function, an input distribution, and a nested sequence of increasingly complex classes of hypothesis functions): the approximation rate and the estimation rate, mentioned in the abstract and defined formally shortly. Taken together, these two problem parameters determine (our analogue of) the index of resolvability, and Vapnik's work yields estimation rates applicable to many natural problems. In the second part of the paper, we investigate the implications of our bound for choosing γ, the fraction of data withheld for testing in cross validation. The most interesting aspect of this analysis is the identification of several qualitative
properties (identified in the abstract, and in greater detail in the paper) of the optimal γ that appear to be invariant over a wide class of model selection problems.

2 THE FORMALISM

In this paper we consider model selection as a two-part problem: choosing the appropriate number of parameters for the hypothesis function, and tuning these parameters. The training sample is used in both steps of this process. In many settings, the tuning of the parameters is determined by a fixed learning algorithm such as backpropagation, and then model selection reduces to the problem of choosing the architecture (which determines the number of weights, the connectivity pattern, and so on). For concreteness, here we adopt an idealized version of this division of labor. We assume a nested sequence of function classes H_1 ⊆ ··· ⊆ H_d ⊆ ···, called the structure [9], where H_d is a class of boolean functions of d parameters, each function being a mapping from some input space X into {0, 1}. For simplicity, in this paper we assume that the Vapnik-Chervonenkis (VC) dimension [10, 9] of the class H_d is O(d). To remove this assumption, one simply replaces all occurrences of d in our bounds by the VC dimension of H_d. We assume that we have in our possession a learning algorithm L that on input any training sample S and any value d will output a hypothesis function h_d ∈ H_d that minimizes the training error over H_d; that is, ε_t(h_d) = min_{h ∈ H_d} {ε_t(h)}, where ε_t(h) is the fraction of the examples in S on which h disagrees with the given label. In many situations, training error minimization is known to be computationally intractable, leading researchers to investigate heuristics such as backpropagation.
The extent to which the theory presented here applies to such heuristics will depend in part on the extent to which they approximate training error minimization for the problem under consideration; however, many of our results have rigorous generalizations for the case in which no assumptions are made on the heuristic L (with the quality of the bounds of course decaying with the quality of the hypotheses output by L). Model selection is thus the problem of choosing the best value of d. More precisely, we assume an arbitrary target function f (which may or may not reside in one of the function classes in the structure H_1 ⊆ ··· ⊆ H_d ⊆ ···), and an input distribution D; f and D together define the generalization error function ε_g(h) = Pr_{x ∈ D}[h(x) ≠ f(x)]. We are given a training sample S of f, consisting of m random examples drawn according to D and labeled by f (with the labels possibly corrupted by noise). In many model selection methods (such as Rissanen's Minimum Description Length Principle [5] and Vapnik's Guaranteed Risk Minimization [9]), for each value of d = 1, 2, 3, ... we give the entire sample S and d to the learning algorithm L to obtain the function h_d minimizing the training error in H_d. Some function of d and the training errors ε_t(h_d) is then used to choose among the sequence h_1, h_2, h_3, .... Whatever the method, we interpret the goal to be that of minimizing the generalization error of the hypothesis selected. In this paper, we will make the rather mild but very useful assumption that the structure has the property that for any sample size m, there is a value d_max(m) such that ε_t(h_{d_max(m)}) = 0 for any labeled sample S of m examples; we will refer to the function d_max(m) as the fitting number of the structure¹. The fitting number formalizes the simple notion that with enough parameters, we can always fit the training data perfectly, a property held by most sufficiently powerful function classes (including multilayer neural networks).
We typically expect the fitting number to be a linear function of m, or at worst a polynomial in m. The significance of the fitting number for us is that no reasonable model selection method should choose h_d for d ≥ d_max(m), since doing so simply adds complexity without reducing the training error. In this paper we concentrate on the simplest version of cross validation. Unlike the methods mentioned above, which use the entire sample for training the h_d, in cross validation we choose a parameter γ ∈ [0, 1], which determines the split between training and test data. Given the input sample S of m examples, let S′ be the subsample consisting of the first (1 − γ)m examples in S, and S″ the subsample consisting of the last γm examples. In cross validation, rather than giving the entire sample S to L, we give only the smaller sample S′, resulting in the sequence h_1, ..., h_{d_max((1−γ)m)} of increasingly complex hypotheses. Each hypothesis is now obtained by training on only (1 − γ)m examples, which implies that we will consider only d smaller than the corresponding fitting number d_max((1 − γ)m); let us introduce the shorthand d_max^γ for d_max((1 − γ)m). Cross validation chooses the h_d satisfying ε″_t(h_d) = min_{i ∈ {1, ..., d_max^γ}} {ε″_t(h_i)}, where ε″_t(h_i) is the error of h_i on the subsample S″. Notice that we are not considering multifold cross validation, or other variants that make more efficient use of the sample, because our analyses will require the independence of the test set. However, we believe that many of the themes that emerge here apply to these more sophisticated variants as well. We use ε_cv(m) to denote the generalization error ε_g(h_d) of the hypothesis h_d chosen by cross validation when given as input a sample S of m random examples of the target function; obviously, ε_cv(m) depends on the structure, f, D, and the noise rate. When bounding ε_cv(m), we will use the expression "with high probability" to mean with probability 1 − δ over the sample S, for some small fixed constant δ > 0.

¹Weaker definitions for d_max(m) also suffice, but this is the simplest.
All of our results can also be stated with δ as a parameter at the cost of a log(1/δ) factor in the bounds, or in terms of the expected value of ε_cv(m).

3 THE APPROXIMATION RATE

It is apparent that we should not expect to give nontrivial bounds on ε_cv(m) that take no account of some measure of the complexity of the unknown target function f; the correct measure of this complexity is less obvious. Following the example of Barron and Cover's analysis of MDL performance in the context of density estimation [2], we propose the approximation rate as a natural measure of the complexity of f and D in relationship to the chosen structure H_1 ⊆ ··· ⊆ H_d ⊆ ···. Thus we define the approximation rate function ε_g(d) to be ε_g(d) = min_{h ∈ H_d} {ε_g(h)}. The function ε_g(d) tells us the best generalization error that can be achieved in the class H_d, and it is a nonincreasing function of d. If ε_g(d₀) = 0 for some sufficiently large d₀, this means that the target function f, at least with respect to the input distribution, is realizable in the class H_{d₀}, and thus d₀ is a coarse measure of how complex f is. More generally, even if ε_g(d) > 0 for all d, the rate of decay of ε_g(d) still gives a nice indication of how much representational power we gain with respect to f and D by increasing the complexity of our models. Still missing, of course, is some means of determining the extent to which this representational power can be realized by training on a finite sample of a given size, but this will be added shortly. First we give examples of the approximation rate that we will examine in some detail following the general bound on ε_cv(m).

The Intervals Problem. In this problem, the input space X is the real interval [0, 1], and the class H_d of the structure consists of all boolean step functions over [0, 1] of at most d steps; thus, each function partitions the interval [0, 1] into at most d disjoint segments (not necessarily of equal width), and assigns alternating positive and negative labels to these segments.
Thus, the input space is one-dimensional, but the structure contains arbitrarily complex functions over [0, 1]. It is easily verified that our assumption that the VC dimension of H_d is O(d) holds here, and that the fitting number obeys d_max(m) ≤ m. Now suppose that the input density D is uniform, and suppose that the target function f is the function of s alternating segments of equal width 1/s, for some s (thus, f lies in the class H_s). We will refer to these settings as the intervals problem. Then the approximation rate ε_g(d) is ε_g(d) = (1/2)(1 − d/s) for 1 ≤ d < s and ε_g(d) = 0 for d ≥ s. Thus, as long as d < s, increasing the complexity d gives linear payoff in terms of decreasing the optimal generalization error. For d ≥ s, there is no payoff for increasing d, since we can already realize the target function. The reader can easily verify that if f lies in H_s, but does not have equal width intervals, ε_g(d) is still piecewise linear, but for d < s is concave up: the gain in approximation obtained by incrementally increasing d diminishes as d becomes larger. Although the intervals problem is rather simple and artificial, a precise analysis of cross validation behavior can be given for it, and we will argue that this behavior is representative of much broader and more realistic settings.

The Perceptron Problem. In this problem, the input space X is ℝ^N for some large natural number N. The class H_d consists of all perceptrons over the N inputs in which at most d weights are nonzero. If the input density is spherically symmetric (for instance, the uniform density on the unit ball in ℝ^N), and the target function is a function in H_s with all s nonzero weights equal to 1, then it can be shown that the approximation rate function ε_g(d) is ε_g(d) = (1/π) cos⁻¹(√(d/s)) for d < s [6], and of course ε_g(d) = 0 for d ≥ s.
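Both closed forms are easy to tabulate. A small sketch (with d and s taken as plain integers, purely for illustration) makes the contrast between the two rates concrete:

```python
import math

def eps_g_intervals(d, s):
    """Approximation rate for the intervals problem with an equal-width
    target of s segments: (1/2)(1 - d/s) for d < s, and 0 for d >= s."""
    return 0.5 * (1.0 - d / s) if d < s else 0.0

def eps_g_perceptron(d, s):
    """Approximation rate for the sparse perceptron target with s nonzero
    weights: (1/pi) * arccos(sqrt(d/s)) for d < s, and 0 for d >= s."""
    return math.acos(math.sqrt(d / s)) / math.pi if d < s else 0.0
```

For instance, eps_g_intervals decreases by the same amount 1/(2s) for every extra parameter, while eps_g_perceptron falls off steeply only as d approaches s, which is exactly the concave-down behavior discussed next.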
This problem provides a nice contrast to the intervals problem, since here the behavior of the approximation rate for small d is concave down: as long as d < s, an incremental increase in d yields more approximative power for large d than it does for small d (except for very small values of d).

Power Law Decay. In addition to the specific examples just given, we would also like to study reasonably natural parametric forms of ε_g(d), to determine the sensitivity of our theory to a plausible range of behaviors for the approximation rate. This is important, since in practice we do not expect to have precise knowledge of ε_g(d), since it depends on the target function and input distribution. Following the work of Barron [1], who shows a c/d bound on ε_g(d) for the case of neural networks with one hidden layer under a squared error generalization measure (where c is a measure of target function complexity in terms of a Fourier transform integrability condition)², in the later part of the paper we investigate ε_g(d) of the form (c/d)^α + ε_min, where ε_min is a parameter representing the degree of unrealizability of f with respect to the structure, and c, α > 0 are parameters capturing the rate of decay to ε_min. Our analysis concludes that the qualitative phenomena we identify are invariant to wide ranges of choices for these parameters. Note that for all three cases, there is a natural measure of target function complexity captured by ε_g(d): in the intervals problem it is the number of target intervals, in the perceptron problem it is the number of nonzero weights in the target,

²Since the bounds we will give have straightforward generalizations to real-valued function learning under squared error (details in the full paper), examining behavior for ε_g(d) in this setting seems reasonable.
and in the more general power law case it is captured by the parameters of the power law. In the later part of the paper, we will study cross validation performance as a function of these complexity measures, and obtain remarkably similar predictions.

4 THE ESTIMATION RATE

For a fixed f, D and H_1 ⊆ ··· ⊆ H_d ⊆ ···, we say that a function ρ(d, m) is an estimation rate bound if for all d and m, with high probability over the sample S we have |ε_t(h_d) − ε_g(h_d)| ≤ ρ(d, m), where as usual h_d is the result of training error minimization on S within H_d. Thus ρ(d, m) simply bounds the deviation between the training error and the generalization error of h_d³. Note that the best such bound may depend in a complicated way on all of the elements of the problem: f, D and the structure. Indeed, much of the recent work on the statistical physics theory of learning curves has documented the wide variety of behaviors that such deviations may assume [6, 3]. However, for many natural problems it is both convenient and accurate to rely on a universal estimation rate bound provided by the powerful theory of uniform convergence: namely, for any f, D and any structure, the function ρ(d, m) = √((d/m) log(m/d)) is an estimation rate bound [9]⁴. Depending upon the details of the problem, it is sometimes appropriate to omit the log(m/d) factor, and often appropriate to refine the √(d/m) behavior to a function that interpolates smoothly between d/m behavior for small ε_t and √(d/m) for large ε_t. Although such refinements are both interesting and important, many of the qualitative claims and predictions we will make are invariant to them as long as the deviation |ε_t(h_d) − ε_g(h_d)| is well-approximated by a power law (d/m)^α (α > 0); it will be more important to recognize and model the cases in which power law behavior is grossly violated. Note that this universal estimation rate bound holds only under the assumption that the training sample is noise-free, but straightforward generalizations exist.
For instance, if the training data is corrupted by random label noise at rate η < 1/2, then ρ(d, m) = √((d/((1 − 2η)²m)) log(m/d)) is again a universal estimation rate bound. After giving a general bound on ε_cv(m) in which the approximation and estimation rate functions are parameters, we investigate the behavior of ε_cv(m) (and more specifically, of the parameter γ) for specific choices of these parameters.

5 THE BOUND

Theorem 1 Let H_1 ⊆ ··· ⊆ H_d ⊆ ··· be any structure, where the VC dimension of H_d is O(d). Let f and D be any target function and input distribution, let ε_g(d) be the approximation rate function for the structure with respect to f and D, and let ρ(d, m) be an estimation rate bound for the structure with respect to f and D. Then for any m, with high probability

ε_cv(m) ≤ min_{1 ≤ d ≤ d_max^γ} {ε_g(d) + ρ(d, (1 − γ)m)} + O(√(log(d_max^γ)/(γm)))   (1)

where γ is the fraction of the training sample used for testing, and d_max^γ = d_max((1 − γ)m). Using the universal estimation rate bound and the rather weak assumption that d_max(m) is polynomial in m, we obtain that with high probability

ε_cv(m) ≤ min_{1 ≤ d ≤ d_max^γ} {ε_g(d) + O(√((d/((1 − γ)m)) log(((1 − γ)m)/d)))} + O(√(log((1 − γ)m)/(γm))).   (2)

Straightforward generalizations of these bounds for the case where the data is corrupted by random label noise can be obtained, using the modified estimation rate bound mentioned in Section 4 (details in the full paper).

Proof Sketch: We have space only to highlight the main ideas of the proof. For each d from 1 to d_max^γ, fix a function f_d ∈ H_d satisfying ε_g(f_d) = ε_g(d); thus, f_d is the best possible approximation to the target f within the class H_d. By a standard Chernoff bound argument it can be shown that with high probability we have |ε_t(f_d) − ε_g(f_d)| ≤ √(log(d_max^γ)/m) for all 1 ≤ d ≤ d_max^γ. This means that within each H_d, the minimum training error ε_t(h_d) is at

³In the later and less formal part of the paper, we will often assume that specific forms of ρ(d, m) are not merely upper bounds on this deviation, but accurate approximations to it.
⁴The results of Vapnik actually show the stronger result that |ε_t(h) − ε_g(h)| ≤ √((d/m) log(m/d)) for all h ∈ H_d, not only for the training error minimizer h_d.
most ε_g(d) + √(log(d_max^γ)/m). Since ρ(d, m) is an estimation rate bound, we have that with high probability, for all 1 ≤ d ≤ d_max^γ,

ε_g(h_d) ≤ ε_t(h_d) + ρ(d, m)   (3)
        ≤ ε_g(d) + √(log(d_max^γ)/m) + ρ(d, m).   (4)

Thus we have bounded the generalization error of the d_max^γ hypotheses h_d, only one of which will be selected. If we knew the actual values of these generalization errors (equivalent to having an infinite test sample in cross validation), we could bound our error by the minimum over all 1 ≤ d ≤ d_max^γ of the expression (4) above. However, in cross validation we do not know the exact values of these generalization errors but must instead use the γm testing examples to estimate them. Again by standard Chernoff bound arguments, this introduces an additional √(log(d_max^γ)/(γm)) error term, resulting in our final bound. This concludes the proof sketch.

In the bounds given by (1) and (2), the min{·} expression is analogous to Barron and Cover's index of resolvability [2]; the final term in the bounds represents the error introduced by the testing phase of cross validation. These bounds exhibit tradeoff behavior with respect to the parameter γ: as we let γ approach 0, we are devoting more and more of the sample to training the h_d, and the estimation rate bound term ρ(d, (1 − γ)m) is decreasing. However, the test error term O(√(log(d_max^γ)/(γm))) is increasing, since we have less data to accurately estimate the ε_g(h_d). The reverse phenomenon occurs as we let γ approach 1. While we believe Theorem 1 to be enlightening and potentially useful in its own right, we would now like to take its interpretation a step further. More precisely, suppose we assume that the bound is an approximation to the actual behavior of ε_cv(m). Then in principle we can optimize the bound to obtain the best value for γ.
Of course, in addition to the assumptions involved (the main one being that ρ(d, m) is a good approximation to the training-generalization error deviations of the h_d), this analysis can only be carried out given information that we should not expect to have in practice (at least in exact form), in particular the approximation rate function ε_g(d), which depends on f and D. However, we argue in the coming sections that several interesting qualitative phenomena regarding the choice of γ are largely invariant to a wide range of natural behaviors for ε_g(d).

6 A CASE STUDY: THE INTERVALS PROBLEM

We begin by performing the suggested optimization of γ for the intervals problem. Recall that the approximation rate here is ε_g(d) = (1/2)(1 − d/s) for d < s and ε_g(d) = 0 for d ≥ s, where s is the complexity of the target function. Here we analyze the behavior obtained by assuming that the estimation rate ρ(d, (1 − γ)m) actually behaves as ρ(d, (1 − γ)m) = √(d/((1 − γ)m)) (so we are omitting the log factor from the universal bound)⁵, and to simplify the formal analysis a bit (but without changing the qualitative behavior) we replace the term √(log((1 − γ)m)/(γm)) by the weaker √(log(m)/(γm)). Thus, if we define the function

F(d, m, γ) = ε_g(d) + √(d/((1 − γ)m)) + √(log(m)/(γm))   (5)

then following (1), we are approximating ε_cv(m) by ε_cv(m) ≈ min_{1 ≤ d ≤ d_max^γ} {F(d, m, γ)}⁶ (see Figure 1 for a plot of this approximation). The first step of the analysis is to fix a value for γ and differentiate F(d, m, γ) with respect to d to discover the minimizing value of d. This differentiation must have two regimes due to the discontinuity at d = s in ε_g(d). It is easily verified that the derivative is −(1/2s) + 1/(2√(d(1 − γ)m)) for d < s and 1/(2√(d(1 − γ)m)) for d ≥ s.
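As a sanity check on this analysis, the function F of equation (5) can also be minimized over d by brute force. In the sketch below the hidden constants are set to 1 and the logarithm is taken base 2; both are modeling assumptions made only for illustration.

```python
import math

def F(d, m, gamma, s):
    """The approximation (5) to the cross validation error for the intervals
    problem: eps_g(d) + sqrt(d/((1-gamma)m)) + sqrt(log(m)/(gamma m)).
    Constants set to 1 and a base-2 log are assumptions of this sketch."""
    eps_g = 0.5 * (1.0 - d / s) if d < s else 0.0
    return (eps_g
            + math.sqrt(d / ((1.0 - gamma) * m))
            + math.sqrt(math.log2(m) / (gamma * m)))

def best_d(m, gamma, s):
    """Minimize F over d = 1 .. (1-gamma)m by brute force."""
    d_max = int((1.0 - gamma) * m)
    return min(range(1, d_max + 1), key=lambda d: F(d, m, gamma, s))
```

For instance, with m = 10,000, γ = 0.5 and s = 100 we have (1 − γ)m = 5000 ≥ 4s, and the brute-force minimizer is indeed d = s, in agreement with the claim that follows.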
It can be shown that provided that (1 − γ)m ≥ 4s, d = s is a global minimum of F(d, m, γ), and if this condition is violated then the value we obtain for ε_cv(m) is vacuously large anyway (meaning that this fixed choice of γ cannot end up being the optimal one, or, if m < 4s, that our analysis claims we simply do not have enough data for nontrivial generalization, regardless of how we split it between training and testing). Plugging in d = s yields ε_cv(m) ≤ F(s, m, γ) = √(s/((1 − γ)m)) + √(log(m)/(γm)) for this fixed choice of γ. Now by differentiating F(s, m, γ) with respect to γ, it can be shown that the optimal choice of γ under the assumptions is γ_opt = (log(m)/s)^(1/3)/(1 + (log(m)/s)^(1/3)).

⁵It can be argued that this power law estimation rate is actually a rather accurate approximation for the true behavior of the training-generalization deviations of the h_d for this problem.

⁶Although there are hidden constants in the O(·) notation of the bounds, it is the relative weights of the estimation and test error terms that is important, and choosing both constants equal to 1 is a reasonable choice (since both terms have the same Chernoff bound origins).
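The closed form for γ_opt can be checked against a direct numerical minimization of F(s, m, γ). In this sketch the logarithm is taken base 2, an assumption on our part (the text does not fix the base); with that choice the closed form agrees with the brute-force minimizer.

```python
import math

def gamma_opt(m, s):
    """Closed-form optimal test fraction for the intervals problem:
    gamma_opt = t/(1 + t) with t = (log(m)/s)**(1/3).  The base-2 log
    is an assumption of this sketch."""
    t = (math.log2(m) / s) ** (1.0 / 3.0)
    return t / (1.0 + t)

def gamma_opt_numeric(m, s, grid=100000):
    """Brute-force minimizer of F(s, m, gamma) over a fine grid in (0, 1)."""
    def F(gamma):
        return (math.sqrt(s / ((1.0 - gamma) * m))
                + math.sqrt(math.log2(m) / (gamma * m)))
    return min((i / grid for i in range(1, grid)), key=F)
```

Setting γ to γ_opt balances the γ-dependence of the estimation term against that of the test term, which is exactly the tradeoff exhibited by Theorem 1.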
It is important to remember at this point that despite the fact that we have derived a precise expression for γ_opt, due to the assumptions and approximations we have made in the various constants, any quantitative interpretation of this expression is meaningless. However, we can reasonably expect that this expression captures the qualitative way in which the optimal γ changes as the amount of data m changes in relation to the target function complexity s. On this score the situation initially appears rather bleak, as the function (log(m)/s)^(1/3)/(1 + (log(m)/s)^(1/3)) is quite sensitive to the ratio log(m)/s. For example, for m = 10,000, if s = 10 we obtain γ_opt = 0.524, if s = 100 we obtain γ_opt = 0.338, and if s = 1000 we obtain γ_opt = 0.191. Thus γ_opt is becoming smaller as log(m)/s becomes small, and the analysis suggests vastly different choices for γ_opt depending on the target function complexity, which is something we do not expect to have the luxury of knowing in practice. However, it is both fortunate and interesting that γ_opt does not tell the entire story. In Figure 2, we plot the function F(s, m, γ)⁷ as a function of γ for m = 10,000 and for several different values of s (note that for consistency with the later experimental plots, the x axis of the plot is actually the training fraction 1 − γ). Here we can observe four important qualitative phenomena:

(A) When s is small compared to m, the predicted error is relatively insensitive to the choice of γ: as a function of γ, F(s, m, γ) has a wide, flat bowl, indicating a wide range of γ yielding essentially the same near-optimal error.

(B) As s becomes larger in comparison to the fixed sample size m, the relative superiority of γ_opt over other values for γ becomes more pronounced. In particular, large values for γ become progressively worse as s increases. For example, the plots indicate that for s = 10 (again, m = 10,000), even though γ_opt = 0.524 the choice γ = 0.75 will result in error quite near that achieved using γ_opt. However, for s = 500, γ = 0.75 is predicted to yield greatly suboptimal error.
(C) Because of the insensitivity to γ for s small compared to m, there is a fixed value of γ which seems to yield reasonably good performance for a wide range of values for s; this value is essentially the value of γ_opt for the case where s is large (but nontrivial generalization is still possible), since choosing the best value for γ is more important there than for the small s case. Note that for very large s, the bound predicts vacuously large error for all values of γ, so that the choice of γ again becomes irrelevant.

(D) The value of γ_opt is decreasing as s increases. This is slightly difficult to confirm from the plot, but can be seen clearly from the precise expression for γ_opt.

Despite the fact that our analysis so far has been rather specialized (addressing the behavior for a fixed structure, target function and input distribution), it is our belief that (A), (B), (C) and (D) above are rather universal phenomena that hold for many other model selection problems. For instance, we shall shortly demonstrate that our theory again predicts that (A), (B), (C) and (D) hold for the perceptron problem, and for the case of power law decay of ε_g(d) described earlier. First we give an experimental demonstration that at least the predicted properties (A), (B) and (C) truly do hold for the intervals problem. In Figures 3, 4 and 5, we plot the results of experiments in which labeled random samples of size m = 5000 were generated for a target function of s equal width intervals, for s = 10, 100 and 500. The samples were corrupted by random label noise at rate η = 0.3. For each value of γ and each value of d, (1 − γ)m of the sample was given to a program implementing training error minimization for the class H_d⁸; the remaining γm examples were used to select the best h_d according to cross validation. The plots show the true generalization error of the h_d selected by cross validation as a function of γ; this generalization error can be computed exactly for this problem. Each point in the plots represents an average over 10 trials.
While there are obvious and significant quantitative differences between these experimental plots and Figure 2, the properties (A), (B) and (C) are rather clearly borne out by the data: (A) In Figure 3, where s is small compared to m, there is a wide range of acceptable γ; it appears that any choice of γ between 0.10 and 0.50 yields nearly optimal generalization error. (B) By the time s = 100 (Figure 4), the sensitivity to γ is considerably more pronounced. For example, the choice γ = 0.50 now results in clearly suboptimal performance, and it is more important to have γ close to 0.10. (C) Despite these complexities, there does indeed appear to be a single value of γ, approximately 0.10, that performs nearly optimally for the entire range of s examined. The property (D), namely that the optimal γ decreases as the target function complexity is increased relative to a fixed m, is certainly not refuted by the experimental results, but any such effect is simply too small to be verified.

⁷In the plots, we now use the more accurate test penalty term √(log((1 − γ)m)/(γm)) since we are no longer concerned with simplifying the calculation.

⁸A nice feature of the intervals problem is the fact that training error minimization can be performed in almost linear time using a dynamic programming approach [4].
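The dynamic program mentioned in footnote 8 is simple enough to sketch. The version below is a straightforward O(md) table over (number of segments used, label of the current segment), not the almost linear time variant cited in [4]; it returns the minimum number of disagreements, i.e. m times the optimal training error achievable in H_d.

```python
def min_training_error(labels, d):
    """Fewest disagreements on the sorted sample `labels` (a list of 0/1)
    achievable by a boolean step function with at most d segments."""
    INF = float("inf")
    # best[j][b] = fewest errors so far using j segments, last labeled b;
    # j = 0 is the empty prefix, from which the first segment may open
    # with either label.
    best = [[0, 0]] + [[INF, INF] for _ in range(d)]
    for y in labels:
        nxt = [[INF, INF] for _ in range(d + 1)]
        for j in range(1, d + 1):
            for b in (0, 1):
                # either extend the current segment, or open a new one
                # with the opposite label
                cand = min(best[j][b], best[j - 1][1 - b])
                if cand < INF:
                    nxt[j][b] = cand + (1 if y != b else 0)
        best = nxt
    return 0 if not labels else min(min(row) for row in best[1:])
```

Because adjacent segments of a boolean step function necessarily carry opposite labels, only transitions from label 1 − b are considered when a new segment is opened.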
It would be interesting to verify this prediction experimentally, perhaps on a different problem where the predicted effect is more pronounced.

7 POWER LAW DECAY AND THE PERCEPTRON PROBLEM

For the cases where the approximation rate ε_g(d) obeys either power law decay or is that derived for the perceptron problem discussed in Section 3, the behavior of ε_cv(m) as a function of γ predicted by our theory is largely the same. For example, if ε_g(d) = c/d and we use the standard estimation rate ρ(d, (1 − γ)m) = √(d/((1 − γ)m)), then an analysis similar to that performed for the intervals problem reveals that for fixed γ the minimizing choice of d is d = (4c²(1 − γ)m)^(1/3); plugging this value of d back into the bound on ε_cv(m) yields Figure 6, which, like Figure 2 for the intervals problem, shows the predicted behavior of ε_cv(m) as a function of γ for a fixed m and several different choices for the complexity parameter c. We again see that properties (A), (B), (C) and (D) hold strongly despite the change in ε_g(d) from the intervals problem (although quantitative aspects of the prediction, which already must be taken lightly for reasons previously stated, have obviously changed, such as the interesting values of the ratio of sample size to target function complexity). Through a combination of formal analysis and plotting, it is possible to demonstrate that the properties (A), (B), (C) and (D) are robust to wide variations in the parameters α and ε_min in the parametric form ε_g(d) = (c/d)^α + ε_min, as well as wide variations in the form of the estimation rate ρ(d, m). For example, if ε_g(d) = (c/d)² (faster than the approximation rate examined above) and ρ = d/m (faster than the estimation rate examined above), then for the interesting ratios of m to c (that is, where the generalization error predicted is bounded away from 0 and the trivial value of 1/2), a figure quite similar to Figure 6 is obtained. Similar predictions can be derived for the perceptron problem using the universal estimation rate bound or any similar power law form.
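The stationary point d = (4c²(1 − γ)m)^(1/3) follows from setting the derivative of c/d + √(d/((1 − γ)m)) to zero, and is easy to verify numerically; the sketch below is illustrative only, with the hidden constants again set to 1.

```python
import math

def d_star(c, m, gamma):
    """Error-minimizing d for eps_g(d) = c/d with estimation rate
    sqrt(d/((1-gamma)m)): the stationary point of c/d + sqrt(d/M),
    with M = (1-gamma)m, is d = (4 c^2 M)**(1/3)."""
    M = (1.0 - gamma) * m
    return (4.0 * c * c * M) ** (1.0 / 3.0)

def bound(d, c, m, gamma):
    """The index-of-resolvability style part of the bound being minimized."""
    M = (1.0 - gamma) * m
    return c / d + math.sqrt(d / M)
```

At d = d_star the two competing derivatives, the c/d² gain from extra approximation power and the 1/(2√(dM)) cost in estimation error, cancel exactly, which is what the closed form expresses.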
In summary, our theory predicts that although significant quantitative differences in the behavior of cross validation may arise for different model selection problems, the properties (A), (B), (C) and (D) should be present in a wide range of problems. At the very least, the behavior of our bounds exhibits these properties for a wide range of problems. It would be interesting to try to identify natural problems for which one or more of these properties is strongly violated; a potential source for such problems may be those for which the underlying learning curve deviates from classical power law behavior [6, 3].

8 ACKNOWLEDGEMENTS

I give warm thanks to Yishay Mansour, Dana Ron and Yoav Freund for their contributions to this work and for many interesting discussions. The code used in the experiments was written by Andrew Ng.

References

[1] A. Barron. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Transactions on Information Theory, 39:930-945, 1993.

[2] A. R. Barron and T. M. Cover. Minimum complexity density estimation. IEEE Transactions on Information Theory, 37:1034-1054, 1991.

[3] D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, pages 76-87, 1994.

[4] M. Kearns, Y. Mansour, A. Ng, and D. Ron. An experimental and theoretical comparison of model selection methods. In Proceedings of the Eighth Annual ACM Conference on Computational Learning Theory, 1995.

[5] J. Rissanen. Stochastic Complexity in Statistical Inquiry, volume 15 of Series in Computer Science. World Scientific, 1989.

[6] H. S. Seung, H. Sompolinsky, and N. Tishby. Statistical mechanics of learning from examples. Physical Review A, 45:6056-6091, 1992.

[7] M. Stone. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society B, 36:111-147, 1974.

[8] M. Stone. Asymptotics for and against cross-validation. Biometrika, 64(1):29-35, 1977.

[9] V. N. Vapnik.
Estimation of Dependences Based on Empirical Data. Springer-Verlag, New York, 1982.
[10] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264-280, 1971.
Figure 1: Plot of the bound/approximation ε_cv(m) ≤ ε_g(d) + √(d/((1 − γ)m)) + √(log((1 − γ)m)/(γm)) as a function of the training fraction 1 − γ and d for m = 1,000, and ε_g(d) for the intervals problem with a target function of complexity s = 100. Several interesting features are evident for these values, including the predicted choice of d = s = 100, and the curvature as a function of γ, indicating a unique optimal choice for γ bounded away from 0 and 1. (Note that we have plotted the minimum of the bound and 0.5, since this generalization error can be trivially achieved by flipping a fair coin.)

Figure 2: Plot of the predicted generalization error of cross validation for the intervals model selection problem, as a function of the fraction 1 − γ of data used for training. (In the plot, the fraction of training data is 0 on the left (γ = 1) and 1 on the right (γ = 0).) The fixed sample size m = 1,000 was used, and the 6 plots show the error predicted by the theory for target function complexity values s = 1 (bottom plot), 5, 10, 25, 50, and 100 (top plot). The movement and relative superiority of the optimal choice for γ identified by properties (A), (B), (C) and (D) can be seen clearly. (We have again plotted the minimum of the predicted error and 0.5.)

Figure 3: Experimental plot of cross validation generalization error as a function of training set size (1 − γ)m, for s = 10 and m = 5,000, with 30% label noise added. As in Figure 2, the x axis indicates the amount of data used for training. For this small value of the ratio of s to m, property (A) predicted by the theory is confirmed: there is a wide range of γ values yielding essentially the same generalization error.
Figure 4: Experimental plot of cross validation generalization error as a function of training set size (1 − γ)m as in Figure 3, but now for the target function complexity s = 100. Property (B) predicted by the theory can be seen: compared to Figure 3, the relative superiority of the optimal γ over other values is increasing, and in particular larger values of γ (such as 0.5) are now clearly suboptimal choices.

Figure 5: Experimental plot of cross validation generalization error as a function of training set size (1 − γ)m, now for s = 500. Figures 3, 4 and 5 together confirm the predicted property (C): as in the theoretical plot of Figure 2, despite the differing shapes of the three plots there is a fixed value (about γ = 0.1) yielding near-optimal performance for the entire range of s.

Figure 6: Plot of the predicted generalization error of cross validation for the case ε_g(d) = c/d, as a function of the fraction 1 − γ of data used for training. The fixed sample size m = 25,000 was used, and the 6 plots show the error predicted by the theory for target function complexity values c = 10 (bottom plot), 25, 50, 75, 100, and 150 (top plot). Properties (A), (B), (C) and (D) are again evident. (We have again plotted the minimum of the predicted error and 0.5.)
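The curves of Figure 6 are easy to trace directly: substituting the minimizing d back into ε_g(d) + ρ gives the training-side term 3(c/(4(1 − γ)m))^{1/3}, to which the test-penalty term √(log((1 − γ)m)/(γm)) is added. The sketch below is illustrative only (the function names and constants are ours, not the paper's); it grid-searches the resulting one-variable bound and exhibits the direction of property (B), namely that the optimal test fraction γ shrinks as the complexity parameter c grows:

```python
import math

def cv_bound(gamma, m, c):
    """Bound as a function of the test fraction gamma, for approximation
    rate c/d with d already optimized out in closed form:
    3*(c/(4*(1-gamma)*m))**(1/3) + sqrt(log((1-gamma)*m)/(gamma*m))."""
    m_train = (1.0 - gamma) * m
    return (3.0 * (c / (4.0 * m_train)) ** (1.0 / 3.0)
            + math.sqrt(math.log(m_train) / (gamma * m)))

def best_gamma(m, c):
    """Grid search over test fractions for the minimizer of the bound."""
    grid = [g / 100.0 for g in range(1, 91)]
    return min(grid, key=lambda g: cv_bound(g, m, c))

m = 25000
g_simple = best_gamma(m, c=10.0)    # simple target function
g_complex = best_gamma(m, c=150.0)  # complex target function
print(g_simple, g_complex)
```

For both values of c the minimizing γ is interior (bounded away from 0 and 1), and the minimizer for c = 150 is strictly smaller than that for c = 10, matching the qualitative behavior the figures describe.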
More information