A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs


Mathias Fuchs, Norbert Krautenbacher

A variance decomposition and a Central Limit Theorem for empirical losses associated with resampling designs

Technical Report Number 173, 2014
Department of Statistics, University of Munich

A VARIANCE DECOMPOSITION AND A CENTRAL LIMIT THEOREM FOR EMPIRICAL LOSSES ASSOCIATED WITH RESAMPLING DESIGNS

MATHIAS FUCHS, NORBERT KRAUTENBACHER

ABSTRACT. The mean prediction error of a classification or regression procedure can be estimated using resampling designs such as the cross-validation design. We decompose the variance of such an estimator associated with an arbitrary resampling procedure into a small linear combination of covariances between elementary estimators, each of which is a regular parameter as described in the theory of U-statistics. The enumerative combinatorics of the occurrence frequencies of these covariances govern the linear combination's coefficients and, therefore, the variance's large-scale behavior. We study the variance of incomplete U-statistics associated with kernels which are partly but not entirely symmetric. This leads to asymptotic statements for the prediction error's estimator, under general non-empirical conditions on the resampling design. In particular, we show that the resampling-based estimator of the average prediction error is asymptotically normally distributed under a general and easily verifiable condition. Likewise, we give a sufficient criterion for consistency. We thus develop a new approach to understanding small-variance designs as they have recently appeared in the literature. We exhibit the U-statistics which estimate these variances. We present a case from linear regression where the covariances between the elementary estimators can be computed analytically. We illustrate our theory by computing estimators of the studied quantities in an artificial data example.

1. INTRODUCTION

This paper is concerned with the variance of resampling designs in machine learning and statistics. A resampling design (a collection of splits of the data into learning and test sets) yields an estimator of the expectation of a loss function of a model fitting procedure; see Section 2 for details of the set-up. An example of a resampling design is the leave-p-out estimator of the average prediction error; a recent preprint [5] exploits the fact that this estimator is a U-statistic to derive its properties. Consequently, it is asymptotically normally distributed under a very weak condition, namely that of an existing and non-vanishing asymptotic variance. In this work, we generalize from the leave-p-out estimator to general resampling designs such as cross-validation. We set up a very general condition on the resampling design that leads to consistency, and a narrower one that leads to asymptotic normality. In a similar framework, consistency of cross-validation was shown in Yang [13]; the principal difference between our work and Yang's is that in Theorems 4 and 5 we treat the case of a fixed learning set size; hence, we do not subsume cross-validation in these theorems. A resampling design is a non-empirical datum in the sense that it can and should be specified by the experimenter before seeing the data.

2010 Mathematics Subject Classification. 62G05, 62G09, 62G10, 62G20.
Key words and phrases. U-statistic, cross-validation, limit theorem, design, model selection.
Supported by the German Science Foundation (DFG-Einzelförderung BO3139/2-2 to Anne-Laure Boulesteix).

Moreover, a design is, by nature, algebraic: its definition involves no probability or analysis. On the other hand, a Central Limit Theorem is a probabilistic-analytical statement and thus of quite a different nature. Suppose we are given a sequence of resampling designs in the sense that for every sufficiently large sample size n a collection of learning sets is specified. Then, the Central Limit Theorem may either hold or not. By specifying a sufficient criterion in this work, we pose the question of whether there is any necessary criterion on the design. This question seems to be very challenging. Likewise, it seems to be difficult to determine sufficient or necessary conditions on the resampling design for the Strong Law of Large Numbers, the Berry-Esseen theorem, and the Law of the Iterated Logarithm to be valid. Therefore, the present work raises interesting and challenging problems for further research, at the boundary between the combinatorial world of designs and the probabilistic-analytical world of limit theorems. Partial answers are given by the theory of incomplete U-statistics; however, that theory has only been developed thoroughly in the case of symmetric kernels. Here, in contrast, a resampling design is an incomplete U-statistic that is naturally associated with a non-symmetric kernel but usually not with a symmetric one (note that only complete U-statistics are always associated with symmetric kernels).

Usually, in statistical design theory, the experimental units are allocated to blocks, and each experimental unit leads to a response [2, Section 3.1, Equation (2)]. Here, the picture is quite different: our independent observations correspond to what are called the treatments; thus, a response is measured for each treatment. This viewpoint appears less frequently in statistics. Here, we point out the usefulness of statistical design theory to resampling. Although design theory has been used in resampling and variance estimation theory, previous papers seem to have focused on giving surveys [11], whereas we examine model fitting algorithms in general. Design theory and U-statistics also seem to have been examined in the case where the blocks are the evaluation indices of symmetric kernels. Here, we look at a very different scenario: the blocks are the indices of the learning sets, and the kernel is non-symmetric since it involves a learning set together with a testing observation. Likewise, the literature describing resampling procedures for model fitting in the language of U-statistics seems to be surprisingly sparse.

Let us now outline the main results in more detail. The problem that cross-validation suffers from high variance is well studied; approaches aiming at alleviating it are classical and treated in a vast body of literature. Recently, in Zhang and Qian [14], cross-validation designs akin to Latin hypercube designs in experimental design theory were proposed, and it was shown that such designs, although of a computational cost similar to that of cross-validation, have clearly smaller variance and are therefore generally preferable. Zhang and Qian [14, after Formula 12] give a variance decomposition of the average prediction error estimator associated with several particular designs; we will give the corresponding formula for any design. The same reference also contains an extensive overview of recent literature. Moreover, Fuchs et al.
[5] outline that the leave-p-out prediction error estimator can be seen as a U-statistic and exploit this fact to deduce the existence of an approximately exact hypothesis test of the equality of two prediction errors. Since Fuchs et al. [5] is a preprint, we give a synopsis of that paper in Section 2.4. Thus, we aim to exploit the fact that any resampling procedure is an incomplete U-statistic and to view the results of Zhang and Qian [14] in the light of the variance calculation framework of U-statistics.

There is a general theory of incomplete U-statistic designs such that the variance of the incomplete U-statistic is as small as possible and, therefore, as close as possible to that of the leave-p-out estimator [9, Chapter 4]. Let us recall that any complete U-statistic associated with a possibly non-symmetric kernel is simultaneously a U-statistic associated with a symmetric kernel, namely the symmetrization of the original kernel. Thus the theory of complete U-statistics is entirely covered by that for symmetric kernels. However, the picture is completely different for incomplete U-statistics. The reason is that if one defines an incomplete U-statistic just as an average taken over symmetric kernels of a collection of subsets, then one misses a good deal of interesting statistics. Here, we will investigate a more general definition that calls any average of non-symmetric kernels an incomplete U-statistic. In contrast to a definition containing only symmetric kernels, we will have to perform optimization for non-symmetric kernels. The kernel defining the U-statistic which is the leave-p-out error estimator is genuinely non-symmetric. The associated symmetrization is the leave-one-out error estimator on a sample whose size is just one plus the original learning sample size. We are now faced with the difficulty that this kernel is computationally very unfortunate. Therefore, we set out to generalize the theory of incomplete U-statistics to that of non-symmetric kernels. However, we will do so just for the case of a mildly non-symmetric kernel such as ours; in fact, only a few summands are necessary in order to obtain a symmetric one. Summing up this point, it seems that the existing theories are restricted to the case of symmetric kernels. In contrast, a proper resampling procedure would not rely on a symmetric kernel, because there is no reason why small-variance procedures could be achieved with a symmetric kernel. Moreover, it seems very intuitive that the symmetrized form of the kernel leads to a very high ratio of variance to computational cost. In generalizing the theory of incomplete U-statistics to that of non-symmetric kernels, we give a conceptual approach to finding designs similar to the ad-hoc designs of Zhang and Qian [14], which were defined without any mention of U-statistics.

The main results of our paper concern a decomposition of the variance of any cross-validation-like procedure into a linear combination of four series of core covariances, generalizing the covariances appearing in Bengio and Grandvalet [3, Corollary 2]. Each of these, denoted by $\tau_d^{(i)}$ for $i = 1, \dots, 4$, is a regular parameter and can therefore be estimated optimally by another U-statistic. The variance estimation of U-statistics has already been considered in the literature [10, 12]. The coefficients of the linear combination are polynomials of degree at most two that only depend on the sample size and the learning set size. Thus, they are known in advance of seeing the data and easily calculable. The decomposition is a significant generalization of the classical decomposition of Hoeffding [7, Formula 5.18] for the variance of a U-statistic to the case of an incomplete U-statistic associated with a symmetric kernel. The difference between our variance formula and Lee's is that ours extends over four series of covariances instead of just one.
It turns out that the variance expression thus attained is extremely difficult (or perhaps impossible) to minimize over all designs of a given size uniformly over all underlying probability distributions P. Therefore, we pass to the asymptotic case of large sample size. Our main results are: proving the existence of unbiased variance estimators for the core covariances (Corollary 1), the variance structure of cross-validation (16) and its estimation (Theorem 3), the analytical computation of the core covariances in a toy regression model in Section 4, the Central Limit Theorem (Theorem 5) with the associated asymptotic test in (29), and the numerical computation of the estimators in a related regression model in Section 6.

The paper is structured as follows. In Section 2, we specify the set-up; Section 3 explains the variance decompositions; Section 4 presents an analytical computation of the core covariances; in Section 5, we define the variance estimators and show the Central Limit Theorem; and Section 6 illustrates our theory by means of a data example in which we compute the estimators numerically.

2. THE SET-UP

2.1. The loss estimator. The general framework of the loss estimator is slightly more general than that underlying the largest part of the statistical literature. In the general framework, there is a univariate response variable Y ranging over a set $\mathcal{Y}$, and a multivariate predictor variable X ranging over $\mathcal{X}$ (both $\mathcal{X}$ and $\mathcal{Y}$ are assumed to be equipped with fixed σ-algebras). The joint distribution of (X, Y) is described by a probability measure P on $\mathcal{X} \times \mathcal{Y}$ equipped with the product σ-algebra. The quality of the prediction of Y is measured by a loss function $(y, y') \mapsto l(y, y')$. Typically, binary classification uses the misclassification loss $1_{y \ne y'}$, but we can also use any other measurable loss. Other loss functions include, for instance, the usual regression mean-square loss $(y - y')^2$ or a survival analysis loss after extending the loss function's domain of definition to censored observations. We fix a learning sample size g and then consider a statistical model fitting procedure in the form of a function

(1) $s : (\mathcal{X} \times \mathcal{Y})^g \times \mathcal{X} \to \mathcal{Y}, \quad (x_1, y_1, \dots, x_g, y_g, x_{g+1}) \mapsto s(x_1, y_1, \dots, x_g, y_g; x_{g+1})$

which maps the learning sample $(x_1, y_1, \dots, x_g, y_g)$ to the prediction rule applied to the test observation $x_{g+1}$. Equivalently, s can be seen as mapping the learning sample to a classification rule, which is a map from predictors in $\mathcal{X}$ to responses in $\mathcal{Y}$. (Sometimes, $s(x_1, y_1, \dots, x_g, y_g; x_{g+1})$ is denoted by $\hat f(x_{g+1} \mid x_1, y_1, \dots, x_g, y_g)$ to describe a learned estimator $\hat f$ for a true model $f : \mathcal{X} \to \mathcal{Y}$.) Throughout the paper, we will assume that s treats all learning arguments equally, so that it is invariant under permutation of the first g arguments, and we assume that s is measurable with respect to the product σ-algebra on $(\mathcal{X} \times \mathcal{Y})^g \times \mathcal{X}$. The joint expectation of the loss function with respect to the (g+1)-fold product measure is

(2) $E(l(s)) = \int l(s(x_1, y_1, \dots, x_g, y_g; x_{g+1}), y_{g+1}) \, dP(x_1, y_1) \cdots dP(x_{g+1}, y_{g+1})$

and is called the unconditional loss of the model fitting procedure, where the left-hand side uses a slightly sloppy but unambiguous notation. It is of practical interest to estimate it, together with the difference $E(l_1(s_1)) - E(l_2(s_2)) = E(l_1(s_1) - l_2(s_2))$ for two model fitting procedures $s_1$ and $s_2$ and two loss functions $l_1$ and $l_2$.

Remark 1. E(l(s)) generalizes the usual mean square error in the sense that the loss function is arbitrary instead of being the quadratic loss, the true model is arbitrary instead of being of the particular form $Y = f(X) + \varepsilon$, the predictors X are random, and the expectation is taken with respect to the learning data as well.
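To make (2) concrete, here is a minimal Python sketch (our own illustration, not part of the paper): it approximates the unconditional loss E(l(s)) by Monte Carlo. The toy procedure s (the learning-sample mean of Y), the squared loss, and the distribution P are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
g = 10          # fixed learning sample size, as in the text
n_rep = 20000   # Monte Carlo repetitions

def s(x_learn, y_learn, x_test):
    """Toy model fitting procedure: ignore x and predict the mean of y."""
    return np.mean(y_learn)

def loss(pred, y):
    """Squared-error loss l(y', y)."""
    return (pred - y) ** 2

def draw(k):
    """Draw k i.i.d. observations from an assumed toy P:
    X ~ N(0,1), Y = 1 + X + eps with eps ~ N(0,1)."""
    x = rng.normal(size=k)
    y = 1.0 + x + rng.normal(size=k)
    return x, y

# Monte Carlo approximation of (2): the joint expectation over the g
# learning observations and one independent test observation.
acc = 0.0
for _ in range(n_rep):
    x, y = draw(g + 1)
    acc += loss(s(x[:g], y[:g], x[g]), y[g])
print("estimated E(l(s)):", acc / n_rep)
```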

Even if the true model is of the form $Y = f(X) + \varepsilon$ and the loss is quadratic, one cannot immediately obtain a bias-variance decomposition as in Hastie et al. [6, Formula 2.47], because taking the joint testing and learning expectation instead of just the testing expectation leads to covariance between $f(X_{g+1})$ and $\hat f(X_{g+1})$. The derivation of the bias-variance decomposition usually relies on ignoring this covariance by viewing the $X_i$ as non-random.

2.2. Estimators for the loss. Let us define

(3) $\Gamma(i_1, \dots, i_g; i_{g+1}) := l_1(s_1(x_{i_1}, y_{i_1}, \dots, x_{i_g}, y_{i_g}; x_{i_{g+1}}), y_{i_{g+1}}) - l_2(s_2(x_{i_1}, y_{i_1}, \dots, x_{i_g}, y_{i_g}; x_{i_{g+1}}), y_{i_{g+1}}),$

a function on a set of g+1 different indices $i_k \in \{1, \dots, n\}$, for two model fitting procedures $s_1, s_2$ and two appropriate loss functions $l_1, l_2$. We allow each model fitting procedure to have its own loss function because then the case $l_2 := 0$ yields the loss of a single procedure, which is of obvious practical interest. We have $E(\Gamma) = E(l_1(s_1) - l_2(s_2))$, and we define $\Theta = E\Gamma$ as a slight generalization of (2). The expectations are taken with respect to the (g+1)-fold product space of $\mathcal{X} \times \mathcal{Y}$ and are assumed to exist.

A resampling procedure is a collection of pairs of disjoint learning and test sets. For every pair of a learning set and a test observation one obtains an elementary estimator of the error rate. Averaging these across all learning and test sets of the resampling procedure defines an unbiased estimator for $\Theta$. Quite often, another convention is used where such an estimator is seen as an approximation to the prediction error at another learning set size such as the total sample size; then, unbiasedness is of course lost. It is now of interest to gain insight into the variance of such an estimator. All expectations and variances are taken with respect to the (g+1)-fold product measure of P. The definition of $\Gamma$ was such that the number g+1 of arguments is minimal under the restriction that $\Theta = E\Gamma$ for all underlying probability distributions such that this expectation exists. This minimality would be lost if the definition of $\Gamma$ involved a test set size larger than one. Let T be a collection of pairs (S, a) where $S \subseteq \{1, \dots, n\}$ is an (unordered) set of learning indices, and $a \in \{1, \dots, n\} \setminus S$ is a test index. Then, each $\Gamma(S; a)$ is an elementary estimator of $\Theta$, and we define

$\hat\Theta(T) := |T|^{-1} \sum_{(S,a) \in T} \Gamma(S; a).$

In simple cases, it is possible to compute $\Theta$ analytically. For instance, we will do so in Section 4.
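The estimator $\hat\Theta(T)$ is a plain average of elementary estimators. The following sketch (again our own, with $l_2 := 0$ and the same toy mean predictor as above) evaluates it for a design given as a list of (learning set, test index) pairs; all names and values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma(y, S, a):
    """Elementary estimator Gamma(S; a): loss of the toy mean predictor
    trained on the learning indices S, evaluated at the test index a
    (l_2 := 0, so Gamma is the loss of the single procedure s_1)."""
    return (np.mean(y[list(S)]) - y[a]) ** 2

def theta_hat(y, design):
    """Theta_hat(T): the average of Gamma over all pairs (S, a) in T."""
    return np.mean([gamma(y, S, a) for (S, a) in design])

n = 12
x = rng.normal(size=n)
y = 1.0 + x + rng.normal(size=n)

# a small hand-made design with g = 5: three learning sets, one test index each
design = [({0, 1, 2, 3, 4}, 5), ({5, 6, 7, 8, 9}, 10), ({2, 4, 6, 8, 10}, 11)]
print("Theta_hat(T) =", theta_hat(y, design))
```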

2.3. Complete and incomplete U-statistics. This section summarizes some definitions and ideas from [7]. Let n denote the sample size. A U-statistic is a statistic of the form $U = \binom{n}{k}^{-1} \sum h(z_{i_1}, \dots, z_{i_k})$ for a symmetric function h of k vector arguments, where the summation extends over all possible subsets $(i_1, \dots, i_k)$. Since the number of such subsets is $\binom{n}{k}$, the expectation of U is equal to that of h with respect to the k-fold product measure of P, so U is an unbiased estimator of E(h). A regular parameter is a functional of the form $P \mapsto \int h \, dP^{\otimes k}$. The minimal k such that there exists a symmetric function h with $E(h) = \Theta$ for all probability distributions P is called the degree of the U-statistic. Any such minimal function is called a kernel of U. If a non-symmetric function with that property exists, then, by symmetrization, a symmetric function exists. An important property of U-statistics is that they are the unique minimum variance unbiased estimators of the expected value $\Theta$. Furthermore, the convergence of U towards $\Theta$ is controlled by precise theorems: the Laws of Large Numbers, the Law of the Iterated Logarithm, the Berry-Esseen theorem, and the Central Limit Theorem.

An incomplete U-statistic is often defined in the literature as one associated with a symmetric kernel, namely as a sum of the form $K^{-1} \sum_{S \in \mathcal{S}} h(z_{S_1}, \dots, z_{S_k})$, where h is a symmetric function and $\mathcal{S}$ is a collection of k-subsets S. We write $|\mathcal{S}| =: K$ because this generalizes the corresponding nomenclature in K-fold cross-validation. Since h is symmetric, it suffices to extend the summation over collections of increasing subsets, and an evaluation of h is already determined by its evaluation on increasing indices: each subset S can be written as $S = (S_i)$ such that $1 \le S_1 < \dots < S_k \le n$. Here, we will consider statistics of the more general form $|\mathcal{R}|^{-1} \sum_{R \in \mathcal{R}} h(z_{R_1}, \dots, z_{R_k})$ where h is not necessarily symmetric, and therefore $\mathcal{R}$ is a collection of arbitrary ordered, but not necessarily increasing, tuples $R = (R_1, \dots, R_k)$. Variance-minimizing designs have been set up for incomplete U-statistics with symmetric kernels but not yet for those with not necessarily symmetric kernels. We will do so in the special case $h = \Gamma$. One could consider variance-minimizing designs associated with the symmetrization $\Gamma_0$ (as defined below), but the variance can be reduced further in the general case.

2.4. A test for the comparison of two average prediction errors. Here, we give a short, self-contained overview of the results of Fuchs et al. [5]. One defines

$\Gamma_0(1, \dots, g+1) := (g+1)^{-1} \sum_\pi \Gamma(\pi(1), \dots, \pi(g); \pi(g+1))$

where the sum is taken over all g+1 cyclic permutations $\pi$ of $1, \dots, g+1$, namely all permutations of the form $(1, \dots, g+1) \mapsto (q, \dots, g+1, 1, \dots, q-1)$, where $q \in \{1, \dots, g+1\}$. Then $\Gamma_0$ is the leave-one-out version of $\Gamma$, and $\Gamma_0$ is a symmetric function of g+1 vector arguments. Therefore, $\Gamma_0$ defines a U-statistic, and sorting out the terms shows that this U-statistic is the leave-p-out estimator of the error [1], where $p := n - g$ (this definition holds for the rest of the paper). Likewise, $\Gamma_0$ is obtained from $\Gamma$ by symmetrizing over all (g+1)! permutations; the sum then simplifies to the cyclic permutations because all learning observations are treated equally. Let $T_*$ or, when the sample size is needed, $T_{*,n}$ denote the maximal design, consisting of all $\binom{n}{g}(n-g)$ possible pairs (S; a). Then, the U-statistic associated with the symmetric kernel $\Gamma_0$ is $\hat\Theta(T_*)$, the leave-p-out estimator. An important consequence of identifying the leave-p-out estimator as a U-statistic is that it has minimal variance among all unbiased estimators of the error rate. Also, all of the many properties of U-statistics, such as asymptotic normality and so on, automatically apply to the leave-p-out estimator $\hat\Theta(T_*)$. We implicitly assume:

Assumption 1. The degree of $\Theta$ is exactly g+1. Similarly, the degree of $\Theta^2$ is 2g+2.

Remark 2. It seems to be very hard to prove the first part of the assumption analytically, or to give numerical evidence for it. However, it seems very intuitive to assume that the true error cannot be achieved by a smaller learning set size than g, across all distributions P. The second part of the assumption is violated, for instance, if $\sigma_1^2 = 0$ (defined in (4)), which corresponds to the case that the U-statistic is degenerate. It is unclear whether the second part of the assumption can be violated if the U-statistic is nondegenerate.

Furthermore, it turns out that the variance of a U-statistic, trivially given by $E(U^2) - \Theta^2$, is another regular parameter and can therefore be estimated by a U-statistic.
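The cyclic symmetrization $\Gamma_0$ and the leave-p-out estimator can be spelled out directly. The sketch below (ours, reusing the toy kernel from before) averages $\Gamma$ over the g+1 cyclic permutations and then over all increasing (g+1)-subsets, which reproduces the average of $\Gamma$ over the maximal design $T_*$.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

def gamma(y, idx):
    """Gamma(i_1,...,i_g; i_{g+1}) for the toy mean predictor under squared
    loss (l_2 := 0); idx is an ordered tuple whose last entry is the test."""
    learn, test = list(idx[:-1]), idx[-1]
    return (np.mean(y[learn]) - y[test]) ** 2

def gamma0(y, idx):
    """Symmetrization of gamma over the g+1 cyclic permutations of idx."""
    k = len(idx)
    return np.mean([gamma(y, idx[q:] + idx[:q]) for q in range(k)])

def leave_p_out(y, g):
    """The U-statistic with symmetric kernel gamma0: the average of gamma0
    over all increasing (g+1)-subsets, i.e. the leave-p-out estimator."""
    n = len(y)
    return np.mean([gamma0(y, idx) for idx in combinations(range(n), g + 1)])

n, g = 10, 4
y = 1.0 + rng.normal(size=n)
print("leave-p-out estimate:", leave_p_out(y, g))
```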

However, under Assumption 1, the variance is a U-statistic of twice the degree of the underlying U-statistic, and therefore there is no unbiased estimator of the variance of the leave-p-out error estimator unless $n \ge 2(g+1)$. Therefore, the learning set size must be less than half the total sample size. However, under this constraint, studentization is possible because of the consistency of the variance estimator, the Laws of Large Numbers, and Slutsky's theorem. This leads to the fact that the standardized statistic $(\widehat{U^2} - \widehat{\Theta^2})^{-1/2}\,U$ is approximately normal, implying that there is an approximately exact test for the comparison of the losses of two statistical procedures [5].

3. THE CORE COVARIANCES AND THEIR THEORETICAL PROPERTIES

In the following, we will generalize the variance decomposition of Bengio and Grandvalet [3, Formula (7)] to arbitrary designs. Thus, we will derive the general formula for the variance of a resampling procedure. In particular, we will take advantage of the fact that the large number of covariance terms occurring in the variance of a resampling procedure reduces to a few core covariance terms which we will call $\tau_d^{(i)}$. Our goal is the variance decompositions in formulas (14) and (19). These are variance decompositions of incomplete U-statistics associated with only partially symmetric kernels. In the particular case where the kernel is symmetric (which does not happen for kernels of the form (3)), we recover part of the variance decomposition of incomplete U-statistics as in Lee [9, Chapter 4]. However, it is quite important to note that our variance decomposition (19) is somewhat analogous to, but does not reduce to, the variance decomposition of Lee [9, Chapter 4, Formula (2)]. In fact, our quantities $B_\gamma$ only refer to the learning sets and are therefore different from Lee's $B_\gamma$'s.

Definition 1. Let $S = \{1, \dots, g\}$, $a = g+1$, $S' = \{g+2, \dots, 2g+1\}$, $a' = 2g+2$. Then, the functional $\Theta^2$ is defined by

$\Theta^2(P) = \int \Gamma(S; a)\,\Gamma(S'; a') \, dP^{\otimes(2g+2)}(z_1, \dots, z_{2g+2}).$

This is a regular parameter of degree at most 2g+2. In the case that $\Theta$ is degenerate (meaning that $\sigma_1 = 0$ for all P, where $\sigma_1$ is defined in (4)), $\Theta^2 = E(\Gamma_0(1, \dots, g+1)\,\Gamma_0(g+1, \dots, 2g+1))$ and therefore it is of smaller degree; it seems reasonable to assume that this is the only way $\Theta^2$ can have smaller degree.

3.1. The four series: definition. Let us now consider products of two evaluations of $\Gamma$ where the index sets overlap in d indices, but there is either no overlap among the test indices, or one test observation occurs in the learning set of the other, or both test observations occur in the other's learning set, or both test observations coincide. These four cases are illustrated in Figure 1 and describe all possible configurations.

Definition 2. For $d = 1, \dots, g+1$, let

$\tau_d^{(i)} := -\Theta^2 + \begin{cases} E\big(\Gamma(1,\dots,g;\,g+1)\,\Gamma(1,\dots,d,\,g+2,\dots,2g+1-d;\,2g+2-d)\big), & i = 1 \\ E\big(\Gamma(1,\dots,g;\,g+1)\,\Gamma(1,\dots,d-1,\,g+1,\dots,2g+1-d;\,2g+2-d)\big), & i = 2 \\ E\big(\Gamma(1,\dots,g;\,g+1)\,\Gamma(1,\dots,d-2,\,g+1,\dots,2g+2-d;\,d-1)\big), & i = 3 \\ E\big(\Gamma(1,\dots,g;\,g+1)\,\Gamma(1,\dots,d-1,\,g+2,\dots,2g+2-d;\,g+1)\big), & i = 4 \end{cases}$

with the exceptional cases $\tau_0^{(i)} = 0$ for all i, and $\tau_1^{(3)} = \tau_{g+1}^{(1)} = \tau_{g+1}^{(2)} = 0$.

[Figure 1: four panels, (a) Case 1 through (d) Case 4, each depicting the overlap between a pair S, a and a pair S', a'.]

Figure 1. Let S, a and S', a' be any pair of g-subsets S and S', with $a \notin S$, $a' \notin S'$. Then $\mathrm{Cov}(\Gamma(S; a), \Gamma(S'; a'))$ only depends on which of the four cases describes the overlap pattern. Here: example for d = 5.

Remark 3. Therefore, the quantity $\sigma^2$ from Bengio and Grandvalet [3] appears in this classification as $\tau^{(4)}_{n - n/K + 1}$, where n is the total sample size and K is the number of blocks of cross-validation; their $\omega$ is our $\tau^{(1)}_{n - n/K}$ and their $\gamma$ is our $\tau^{(3)}_{n - 2n/K + 2}$. The seemingly more complicated nomenclature, involving lower indices, allows for the treatment of any resampling procedure instead of only cross-validation.

Notational Convention 1. Throughout this work, we denote the total overlap size $|(S \cup \{a\}) \cap (S' \cup \{a'\})|$ between two evaluation tuples by the letter d, and the overlap size $|S \cap S'|$ between two learning sets by the letter c. The interest in these quantities is that any occurring covariance between evaluations of $\Gamma$ is equal to one of them. Note that there is an astronomical number of possible pairs of evaluations of $\Gamma$, but there are only 4g+1 quantities $\tau_d^{(i)}$ unequal to zero.

Observation 1. Let S, a and S', a' be any pair of g-subsets S and S', with $a \notin S$, $a' \notin S'$. Then $\mathrm{Cov}(\Gamma(S; a), \Gamma(S'; a')) = \tau^{(i)}_{|(S \cup \{a\}) \cap (S' \cup \{a'\})|}$ for the $i = 1, \dots, 4$ that describes the overlap pattern. This is obvious from the fact that $\Gamma$ is symmetric in the learning indices, and the product measure $P^{\otimes n}$ is permutation invariant.
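Observation 1 suggests a purely combinatorial classification routine. Here is a small sketch (ours; the example indices are our own) that maps a pair of evaluation tuples to its overlap size d and case i of Figure 1:

```python
def overlap_pattern(S, a, S2, a2):
    """Classify a pair of evaluation tuples: return (d, i) with the total
    overlap d = |(S u {a}) n (S2 u {a2})| and the case i = 1,...,4."""
    d = len((S | {a}) & (S2 | {a2}))
    if a == a2:
        return d, 4                 # the test observations coincide
    if a in S2 and a2 in S:
        return d, 3                 # each test lies in the other's learning set
    if a in S2 or a2 in S:
        return d, 2                 # exactly one test lies in the other's learning set
    return d, 1                     # no test index involved in the overlap

# examples with g = 5 and d = 5, in the spirit of Figure 1:
print(overlap_pattern({1, 2, 3, 4, 5}, 6, {1, 2, 3, 4, 5}, 7))  # -> (5, 1)
print(overlap_pattern({1, 2, 3, 4, 5}, 6, {1, 2, 3, 4, 6}, 8))  # -> (5, 2)
```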

3.2. $\sigma_d^2$ as a linear combination of the core covariances. Let us define

(4) $\sigma_d^2 := E\big(\Gamma_0(1, \dots, g+1)\,\Gamma_0(g+2-d, \dots, 2g+2-d)\big) - \Theta^2$

for $d = 1, \dots, g+1$. ($\sigma_d^2$ is called $\zeta_d$ in Hoeffding [7].) Thus, $\sigma_d^2$ measures the covariance between two symmetrized kernels whose overlap has size d. By a short computation, these numbers can be seen to be conditional variances; hence they are non-negative and it is justified to define them as squares. By plugging in the definition of $\Gamma_0$ and expanding the sum, we arrive at the following expression in terms of the four series:

(5) $\sigma_d^2 = \frac{1}{(g+1)^2}\left( (g+1-d)^2\, \tau_d^{(1)} + 2d(g+1-d)\, \tau_d^{(2)} + d(d-1)\, \tau_d^{(3)} + d\, \tau_d^{(4)} \right).$

In particular, we see that the right-hand side must be non-negative. The asymptotic variance of the complete U-statistic, the leave-p-out estimator, is $(g+1)^2 \sigma_1^2 / n$ [7, 5.23] (recall that $p = n - g$). So, the limiting variance is

(6) $\lim n\,V(\hat\Theta(T_{*,n})) = g^2\, \tau_1^{(1)} + 2g\, \tau_1^{(2)} + \tau_1^{(4)},$

where the limit is taken for g fixed.
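Formula (5) and the limit (6) are straightforward to encode. The sketch below (ours) uses arbitrary placeholder values for the $\tau$'s, so the printed numbers carry no statistical meaning; it merely checks that $(g+1)^2 \sigma_1^2$ coincides with the right-hand side of (6).

```python
import numpy as np

def sigma2(d, g, tau):
    """Formula (5): sigma_d^2 from the four series; tau[i][d] = tau_d^(i)."""
    return ((g + 1 - d) ** 2 * tau[1][d] + 2 * d * (g + 1 - d) * tau[2][d]
            + d * (d - 1) * tau[3][d] + d * tau[4][d]) / (g + 1) ** 2

g = 4
# arbitrary placeholder core covariances for d = 0,...,g+1 (no model behind
# them); the exceptional zeros of Definition 2 are respected
tau = {i: 0.01 * i * np.arange(g + 2, dtype=float) for i in (1, 2, 3, 4)}
tau[3][1] = 0.0
tau[1][g + 1] = tau[2][g + 1] = 0.0

print("sigma_1^2           :", sigma2(1, g, tau))
# the limit (6), g^2 tau_1^(1) + 2 g tau_1^(2) + tau_1^(4),
# must equal (g+1)^2 sigma_1^2:
print("(6)                 :", g**2 * tau[1][1] + 2 * g * tau[2][1] + tau[4][1])
print("(g+1)^2 * sigma_1^2 :", (g + 1) ** 2 * sigma2(1, g, tau))
```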

3.3. Possible values for the $\tau_d^{(i)}$. Furthermore, plugging (5) into the inequality

(7) $\sigma_d^2 / d \le \sigma_e^2 / e$

[7, 5.19] for $d \le e$ puts constraints on the $\tau_d^{(i)}$, on top of those from Bengio and Grandvalet [3, Section 6]. Some of the $\tau_d^{(i)}$ can be identified as variances. Let us define the conditional expectations

$\Gamma^{(i)}_d(z_1, \dots, z_d) := \begin{cases} E(\Gamma(1, \dots, g; g+1) \mid Z_1 = z_1, \dots, Z_d = z_d) & \text{if } i = 1 \\ E(\Gamma(1, \dots, g; g+1) \mid Z_1 = z_1, \dots, Z_{d-1} = z_{d-1}, Z_{g+1} = z_d) & \text{if } i = 4. \end{cases}$

Under favorable regularity conditions on G and P, for instance conditions analogous to those stated in connection with Cramér [4], these functions possess the more explicit form

$\Gamma^{(i)}_d(z_1, \dots, z_d) := \begin{cases} \int G(z_1, \dots, z_d, Z_{d+1}, \dots, Z_g; Z_{g+1}) \, dP(Z_{d+1}) \cdots dP(Z_g)\, dP(Z_{g+1}), & i = 1 \\ \int G(z_1, \dots, z_{d-1}, Z_d, \dots, Z_g; z_d) \, dP(Z_d) \cdots dP(Z_g), & i = 4, \end{cases}$

where we use the function G as a slightly different notation for $\Gamma$, namely $G(Z_1, \dots, Z_g; Z_{g+1}) := \Gamma(1, \dots, g; g+1)$.

Let us denote the $b \times b$ matrix implementing the binomial transform by P or P(b); thus $P_{ij} = (-1)^{i+j}\binom{i}{j}$ for $1 \le j \le i \le b$ and $P_{ij} = 0$ for $j > i$. For any U-statistic U, let us denote the associated quantities $\sigma_d$ and $\delta_d$ from [7, 5.27] by $\sigma_d(U)$ and $\delta_d(U)$, respectively. We then have $\delta(U) = P\sigma(U)$ as vectors with index $d = 1, \dots, \deg(U)$. The following lemma puts strong constraints on the possible values of the $\tau_d^{(1)}$ and $\tau_d^{(4)}$; these constraints do not appear in the treatment by Bengio and Grandvalet [3, Section 6].

Lemma 1. For any $d = 1, \dots, g$, the quantity $\tau_d^{(1)}$ is the quantity $\sigma_d^2(U^{(1)})$ of the U-statistic $U^{(1)}$ associated with the kernel $\Gamma^{(1)}_g$ of degree g. Consequently,

(8) $\tau_d^{(1)} = V(\Gamma^{(1)}_d) \ge 0,$

and

(9) $\tau_d^{(1)} / d \le \tau_e^{(1)} / e$

for any $1 \le d < e \le g$, and

(10) $P\tau^{(1)} = P\sigma(U^{(1)}) = \delta(U^{(1)}) \ge 0$

for the binomial matrix P = P(g). For type (4), one has only

(11) $\tau_d^{(4)} = V(\Gamma^{(4)}_d) \ge 0.$

Proof. $\Gamma^{(1)}_g$ is a function of g arguments. Plugging in the random variables $Z_i$, we arrive at a random variable $\Gamma^{(1)}_g(Z_1, \dots, Z_g)$. Using the fact that $\Gamma^{(1)}_g$ is symmetric, one obtains $\sigma_d(U^{(1)}) = \mathrm{Cov}\big(\Gamma^{(1)}_g(Z_1, \dots, Z_g),\, \Gamma^{(1)}_g(Z_1, \dots, Z_d, Z_{g+1}, \dots, Z_{2g-d})\big)$. Writing this covariance as the difference of the expectation of the product and the product of the expectations, the first term is seen to have overlap pattern of type (d, (1)), and the second is $\Theta^2$; thus the first claim. The claim on type (4) ensues analogously.

In contrast, no such assertion seems to hold for the series (2) and (3), and there seems to be no positivity statement. Also, there seems to be no reason why the $\tau_d^{(4)}$ could be identified with the quantities $\sigma_d$ of some U-statistic, nor why $P\tau^{(4)}$ should be non-negative. On the other hand, Bengio and Grandvalet [3] also give constraints that are not covered by Lemma 1.

Lemma 2.
(1) $\tau_d^{(4)} \ge \tau_{d-1}^{(1)}$ for all d.
(2) $\tau_d^{(4)}/2 \le \tau_d^{(3)} \le 2\,\tau_d^{(4)}$ for all d.
(3) $\tau_d^{(i)} \le \tau_{g+1}^{(4)}$ for all d and i.

Proof. The first two statements follow from plugging in all possible values for n and K into Bengio and Grandvalet [3, Lemma 8]. The third statement is the Cauchy-Schwarz inequality.

3.4. Variance decomposition of incomplete U-statistics. Let us turn our attention to the general incomplete U-statistic associated with a collection T of pairs (S, a) of a learning set S and a test observation $a \notin S$. We will briefly denote the overlap size and type of pattern by $\Psi((S,a), (S',a')) = (d, (i))$ when $|(S \cup \{a\}) \cap (S' \cup \{a'\})| = d$ and the type is (i), and will then write $\tau(\Psi((S,a),(S',a')))$ instead of indicating the type of the overlap pattern with lower and upper indices. The variance of the cross-validation-like procedure associated with the collection T is

(12) $V(\hat\Theta(T)) = |T|^{-2} \sum_{i,j} \mathrm{Cov}(\Gamma(S_i; a_i), \Gamma(S_j; a_j)) = |T|^{-2} \sum_{i,j} \tau(\Psi((S_i,a_i),(S_j,a_j))),$

which is convenient because the last sum can be written as a linear combination with far fewer summands, many summands taking the same value.

3.5. Variance decomposition of test-complete designs.

Definition 3.
(1) Consider the following linear combination of the $\tau_d^{(i)}$:

(13) $\xi_c := (n-2g+c)(n-2g+c-1)\, \tau_c^{(1)} + 2(g-c)(n-2g+c)\, \tau_{c+1}^{(2)} + (g-c)^2\, \tau_{c+2}^{(3)} + (n-2g+c)\, \tau_{c+1}^{(4)}$

for all $c = 0, \dots, g$, where we define $\tau_{g+2}^{(3)} := 0$.

(2) Furthermore, let us call a design T test-complete whenever the following holds: $(S, a) \in T \implies (S, b) \in T$ for any $b \notin S$. In words, a design is test-complete whenever it contains, together with a learning set S, the combinations of S with all possible test observations. Note that a test-complete design is uniquely specified by the learning sets it contains. Whenever a test-complete design T is specified by the collection of learning sets it contains, we will write $\mathcal{S}$ for the collection of learning sets, where each learning set S is counted only once even if it occurs in several pairs (S, a). Thus, $|T| = K(n-g)$ (of course, we suppose $\mathcal{S}$ to contain each learning set only once).

(3) Let T be a test-complete design. For any $c = 0, \dots, g$, let $f^l_c \in \mathbb{N}_0$ be the number of ordered pairs of learning sets (S, S'), both occurring in T, such that $|S \cap S'| = c$. Pairs (S, S) with the same learning set occurring twice are also allowed (here l is a mere symbol instead of an index).

For instance, any cross-validation design is test-complete. The same holds for the complete design defining the leave-p-out estimator. For any test-complete design, the associated numbers $f^l_c$ are easily computable. For instance, they are given by the number of entries equal to c in $N^T N$, where N is the incidence matrix of the learning sets occurring in the design. Obviously, only test-complete designs seem to be relevant in practice because of the low computational cost of evaluating the loss function for a given model and given test observations.

Theorem 1. Let T be a test-complete design and let $\mathcal{S}$ be the associated collection of learning sets. Then, the variance of the error estimator satisfies

(14) $V(\hat\Theta(T)) = |T|^{-2} \sum_{c=0}^g f^l_c\, \xi_c$

where $\xi_c$ was defined in (13).

Proof. This follows from expanding the variance as in (12) into the form $|T|^{-2}$ multiplied by the sum of all entries of the $|T| \times |T|$ covariance matrix between the non-rescaled summands of $\hat\Theta(T)$, and counting the terms. Each entry of the covariance matrix is described by two pairs (S, a), (S', a') and therefore defines a specific type (1), ..., (4) of the overlap pattern between (S, a) and (S', a'), and a particular overlap size $d = |(S \cup \{a\}) \cap (S' \cup \{a'\})|$. Any two summands of the same type (i) and the same overlap size d are equal, namely to $\tau_d^{(i)}$. Now, counting and summing up all such terms with learning overlap size c, one obtains $\xi_c$. This implies the result.

Minimization of the expression $\sum f^l_c \xi_c$ seems to be very hard in practice. However, we will outline below a few cases where this task is feasible.

Example 1 (Variance of cross-validation). Let us assume n is divisible by K, and that therefore the learning sets have size $g = n - n/K$. We then arrive at the following. For K-fold cross-validation, $K \ge 2$, we count

(15) $f^l_c = \begin{cases} 0, & c \notin \{n - n/K,\; n - 2n/K\} \\ K, & c = n - n/K \\ K^2 - K, & c = n - 2n/K. \end{cases}$

The variance of cross-validation is given by the formula

(16) $V(\hat\Theta(T)) = (K^{-1} - n^{-1})\, \tau^{(1)}_{n-n/K} + (1 - K^{-1})\, \tau^{(3)}_{n-2n/K+2} + n^{-1}\, \tau^{(4)}_{n-n/K+1}.$

In the case K = 2, we obtain the expression

(17) $V(\hat\Theta(T)) = \frac{1}{n}\left( (n/2 - 1)\, \tau^{(1)}_{n/2} + (n/2)\, \tau^{(3)}_2 + \tau^{(4)}_{n/2+1} \right).$

Since it is unclear whether and how fast the $\tau_d^{(i)}$ converge to zero, one cannot immediately deduce asymptotic statements from (17).
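Theorem 1 and Example 1 can be cross-checked mechanically. The sketch below (our own; the $\tau$ values are arbitrary placeholders) reads the counts $f^l_c$ off the Gram matrix $N^T N$ of the incidence matrix, evaluates $\xi_c$ via (13), and confirms that (14) reproduces the closed form (16) for a K-fold cross-validation design.

```python
import numpy as np

def f_counts(N):
    """f_c^l: number of ordered pairs of learning sets with overlap c,
    read off from the Gram matrix N^T N of the incidence matrix N (n x K)."""
    overlaps = (N.T @ N).astype(int)
    g = int(N.sum(axis=0)[0])          # learning set size = column sum
    return np.bincount(overlaps.ravel(), minlength=g + 1)

def xi(c, n, g, tau):
    """Formula (13); tau[i][d] holds tau_d^(i), with tau_{g+2}^(3) := 0."""
    m = n - 2 * g + c
    return (m * (m - 1) * tau[1][c] + 2 * (g - c) * m * tau[2][c + 1]
            + (g - c) ** 2 * tau[3][c + 2] + m * tau[4][c + 1])

n, K = 12, 3
g = n - n // K                         # learning set size of K-fold CV
# incidence matrix of the K cross-validation learning sets
N = np.ones((n, K), dtype=int)
for k in range(K):
    N[k * (n // K):(k + 1) * (n // K), k] = 0

# placeholder core covariances (arbitrary numbers, no model behind them)
tau = {i: np.abs(np.sin(np.arange(g + 3) + i)) * 0.01 for i in (1, 2, 3, 4)}
tau[3][g + 2] = 0.0                    # convention of Definition 3

f = f_counts(N)
T_size = K * (n - g)
var_14 = sum(f[c] * xi(c, n, g, tau) for c in range(g + 1)) / T_size**2

# closed form (16) for K-fold cross-validation
var_16 = ((1 / K - 1 / n) * tau[1][n - n // K]
          + (1 - 1 / K) * tau[3][n - 2 * (n // K) + 2]
          + (1 / n) * tau[4][n - n // K + 1])
print("(14):", var_14, "  (16):", var_16)   # should agree
```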

3.6. Non-asymptotic minimization of $\sum f^l_c \xi_c$.

Definition 4. Let T be a test-complete design. For $\gamma = 1, \dots, g$ and a subset $s \subseteq \{1, \dots, n\}$ such that $|s| = \gamma$, let n(s) be the number of learning sets S in the design (where each single learning set is counted only once) such that $s \subseteq S$. Let $B_\gamma := \sum_s n(s)^2$, where the sum is taken over all $\binom{n}{\gamma}$ subsets s. Analogously, let $B_0 := K^2 = |T|^2 (n-g)^{-2} = \sum_{c=0}^g f^l_c$.

Lemma 3. The quantities $f^l_c$ are uniquely determined by the $B_\gamma$. In fact, $f^l_c = \sum_{\gamma=c}^g (-1)^{\gamma-c} \binom{\gamma}{c} B_\gamma$ for all $0 \le c \le g$.

Proof. For $1 \le c \le g$, the proof proceeds in complete analogy to the proof of Lee [9, Chapter 4, Equation (7)], even though our $f^l_c$ and $B_\gamma$ are quite different from Lee's $f_c$ and $B_\gamma$. For c = 0, one has

$f^l_0 = \sum_{c=0}^g f^l_c - \sum_{c=1}^g f^l_c = B_0 - \sum_{c=1}^g \sum_{\gamma=c}^g (-1)^{\gamma-c} \binom{\gamma}{c} B_\gamma = B_0 + \sum_{\gamma=1}^g (-1)^\gamma B_\gamma = \sum_{\gamma=0}^g (-1)^\gamma B_\gamma,$

using that $\sum_{c=1}^\gamma (-1)^c \binom{\gamma}{c} = -1$ for all $\gamma \ge 1$.

Let us write this result in the form $f^l = PB$ for the upper-triangular matrix P defined by $P_{c,\gamma} = (-1)^{\gamma-c}\binom{\gamma}{c}$ for all $0 \le c \le \gamma \le g$ and $P_{c,\gamma} = 0$ for $\gamma < c$, where $\binom{\gamma}{0} := 1$ for all $\gamma \ge 0$. (The map described by the matrix P is often called the binomial transform.) Using (14), we can now write $V(\hat\Theta(T)) = |T|^{-2} \langle f^l, \xi \rangle = |T|^{-2} \langle PB, \xi \rangle = |T|^{-2} \langle B, P^T \xi \rangle$. For this reason, we consider the binomial transform $P^T \xi$ of the vector $\xi$ separately:

Definition 5.

(18) $\alpha_\gamma := \sum_{c=0}^\gamma (-1)^{\gamma-c} \binom{\gamma}{c}\, \xi_c \quad \text{for all } 0 \le \gamma \le g.$

Thus, we have shown that

(19) $V(\hat\Theta(T)) = |T|^{-2} \sum_{\gamma=0}^g B_\gamma\, \alpha_\gamma,$

and in order to minimize this, we have to maximize those $B_\gamma$ for which $\alpha_\gamma$ is negative, and minimize those for which it is positive. This stands in contrast to the classical case, where all $B_\gamma$ have to be minimized. The usefulness of (19) lies in the fact that when $\xi_c$ is a polynomial of small degree in c, all $\alpha_\gamma$ vanish for $\gamma$ greater than the polynomial's degree, because $\sum_{c=0}^\gamma (-1)^c \binom{\gamma}{c} c^d = 0$ for any $d < \gamma$. In Section 4, we will exhibit a case where $\xi_c$ is a polynomial of degree two, and in Section 6 we will give numerical evidence that the $\xi_c$ can more often be approximated well by a quadratic polynomial. Precisely, if $\xi$ is of degree one, we have $\xi_c = b + Ac$, and then $\alpha_0 = b$, $\alpha_1 = A$ and $\alpha_\gamma = 0$ for $\gamma \ge 2$. If $\xi$ is of degree two, we have $\xi_c = b + Ac + Cc^2$, and then it is easy to calculate that

(20) $\alpha_0 = b, \quad \alpha_1 = A + C, \quad \alpha_2 = 2C, \quad \alpha_\gamma = 0 \text{ for all } \gamma \ge 3.$
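Definition 5 and (20) admit a quick numerical check (ours): build the binomial-transform matrix of Lemma 3 and confirm that a quadratic $\xi_c = b + Ac + Cc^2$ produces $\alpha_0 = b$, $\alpha_1 = A + C$, $\alpha_2 = 2C$ and $\alpha_\gamma = 0$ otherwise. The coefficient values are arbitrary.

```python
import numpy as np
from math import comb

def binomial_transform(b):
    """Upper-triangular matrix P with P[c, gam] = (-1)^(gam - c) * C(gam, c)."""
    P = np.zeros((b, b))
    for c in range(b):
        for gam in range(c, b):
            P[c, gam] = (-1) ** (gam - c) * comb(gam, c)
    return P

g = 7
b_, A, C = 2.0, -1.5, 0.25                 # arbitrary quadratic coefficients
c = np.arange(g + 1)
xi = b_ + A * c + C * c**2

# alpha = P^T xi implements Definition 5:
# alpha_gam = sum_{c=0}^{gam} (-1)^(gam - c) C(gam, c) xi_c
alpha = binomial_transform(g + 1).T @ xi
print(np.round(alpha, 10))
# expected by (20): [b, A + C, 2C, 0, 0, ...]
```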

4. ANALYTICAL COMPUTATION OF THE CORE COVARIANCES IN A TOY INTERCEPT ESTIMATION MODEL

Let us consider the following simple example. The random variable X is univariate and distributed according to some unknown distribution $P_X$, and the joint distribution of (Y, X) is given by the simple model $Y = \beta_0 + \beta_1 X + \varepsilon$, where $\varepsilon \sim \mathcal{N}(0, v)$ with $\beta_0$ and v unknown. This model is a close relative of that used in univariate ordinary regression where the slope coefficient $\beta_1$ is known and only the intercept is estimated. We show the following facts in the supplement 6.4: there is an explicit formula for the kernel $\Gamma$; the $\tau_c^{(i)}$ are quadratic polynomials in c, which we write down; consequently, $\xi_c$ is a quadratic polynomial in c as well; and $\alpha_\gamma$ is non-zero if and only if $\gamma = 0, 1, 2$. A first consequence is that, by (19), two designs have the same variance as soon as they have the same $B_0$, $B_1$ and $B_2$.

Example 2. The calculations of 6.4 can be used to compare the variance of cross-validation with that of the leave-p-out estimator in closed-form expressions. The full U-statistic associated with the kernel $\Gamma$, i.e., the leave-p-out estimator on a sample of size n, is equal to $\hat v (1 + g^{-1})$. Since $\hat v \sim v (n-1)^{-1} \chi^2_{n-1}$, the leave-p-out estimator is distributed as the $v(1+g^{-1})(n-1)^{-1}$-fold of a chi-square with n−1 degrees of freedom. Therefore, the variance of the leave-p-out estimator is

$V\big(v(1+g^{-1})(n-1)^{-1}\chi^2_{n-1}\big) = v^2 (1+g^{-1})^2 (n-1)^{-2} \cdot 2(n-1) = 2v^2 (1+g^{-1})^2 (n-1)^{-1}.$

This is consistent with the fact that, by (6) and (5), we have $\lim n V(\hat\Theta(T_*)) = 2(g^{-2} + 2g^{-1} + 1)v^2$, which could also be derived from the expression $V(\hat\Theta(T)) = |T|^{-2} \sum B_\gamma \alpha_\gamma$. So, we have derived the rescaled limiting variance of the leave-p-out estimator in three ways. In contrast, for the design $T_{2\text{-CV}}$ describing two-fold cross-validation (g = n/2), we obtain by (17):

(21) $V(\hat\Theta(T_{2\text{-CV}})) = 2v^2\,\frac{n+14}{n^2}.$

Thus, the ratio $V_{2\text{-CV}} / V(\hat\Theta(T_{*,n}))$ is one for n = 2, tends to one for $n \to \infty$, and attains its maximum, 25/16, at n = 6. Note that here g and n both tend to infinity, in contrast to the rest of the paper:

(22) $\lim n V(\hat\Theta(T_*)) = \lim n V(\hat\Theta(T_{2\text{-CV}})) = 2v^2.$

Also, one can check that $V(\hat\Theta(T_*)) < V(\hat\Theta(T_{2\text{-CV}}))$ for all n > 2, as it should be. Let us go back to the usual scenario of fixed g, and let n tend to infinity. Thus, in this example, the limiting variance $2v^2$ agrees in all three cases: where g is fixed, the case of cross-validation with g = n/2, and the leave-p-out case where g = n/2. Note also that minimizing $\sum B_\gamma \alpha_\gamma$ involves only three non-zero summands, whereas $\sum f^l_c \xi_c$ involves g+1 summands. Therefore, the minimization problem's dimensionality is drastically reduced when passing from the $\xi_c$ to the $\alpha_\gamma$.

Let us now show how to apply our calculations to the variance minimization problem. Let us say we are given fixed values for n, g and K. The problem is to find a design that minimizes the expression $B_1 \alpha_1 + B_2 \alpha_2$, because the pre-factor as well as the summand corresponding to $\gamma = 0$ in (19) can be ignored: they are determined by the pre-set quantity K. Let us assume that each observation occurs in the same number of learning sets. This is analogous to the usual restriction to equireplicate designs as in Lee [9, Section 4.3.2], and we also call such designs equireplicate, even though we are only referring to the learning sets.
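The behavior of the ratio $V_{2\text{-CV}} / V(\hat\Theta(T_{*,n}))$ described in Example 2 can be verified numerically. Under the two closed forms above, the ratio simplifies to $(n+14)(n-1)/(n+2)^2$, independently of v (this simplification is our reading of (21)). The sketch below (ours) confirms the value one at n = 2, the maximum 25/16 at n = 6, and the limit one.

```python
# Ratio of the two closed-form variances in Example 2:
#   V(T_{*,n})   = 2 v^2 (1 + 1/g)^2 / (n - 1)   with g = n/2,
#   V(T_{2-CV})  = 2 v^2 (n + 14) / n^2,
# so the ratio is (n + 14)(n - 1) / (n + 2)^2, independent of v.
def ratio(n):
    return (n + 14) * (n - 1) / (n + 2) ** 2

for n in (2, 4, 6, 8, 100, 10_000):
    print(n, ratio(n))
# prints 1.0 at n = 2, the maximum 25/16 = 1.5625 at n = 6, and values
# approaching 1 for large n, matching the discussion in the text
```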

In such designs, the condition $B_1 = K^2 g^2 / n$ is imposed. Thus, only $B_2$ remains as a degree of freedom in the optimization, eliminating any trade-off between competing components. Since $\alpha_2 > 0$, $B_2$ has to be minimized. Summing up the results of this section, we have shown the following:

Theorem 2. In the intercept estimation model of this chapter, all equireplicate designs for fixed n, g and K that have the same $B_2$ have the same variance. Any equireplicate design with minimal $B_2$ among all equireplicate designs achieves the minimal variance among all equireplicate designs of the same n, g, and K. Assuming that the configuration of n, g, and K allows for the existence of a Balanced Incomplete Block Design (see Definition 7), any Balanced Incomplete Block Design of these n, g, and K is a design with minimal variance among all equireplicate designs of these n, g, and K.

Proof. It only remains to show the last assertion. This is done in complete analogy to the proof of Lee [9, Chapter 4, Theorem 1].

For instance, for g = 2, $B_2$ is bound to be equal to K, and therefore all equireplicate designs have the same variance. Another simple example is the leave-one-out case g = n−1, K = n. Then, the minimality of the design's variance turns out to be the minimality of a symmetric Balanced Incomplete Block Design's variance. Since the $\alpha_\gamma$, unlike those in the classical context, can happen to be negative, one might ask whether there exists a configuration (n, g, K) such that an equireplicate design exists but a non-equireplicate design has smaller variance than the best equireplicate one. Such a non-equireplicate design would then maximize $B_1$ instead of minimizing $B_2$. Thus, it would be, in some sense, the opposite of an equireplicate design. It seems that whenever $\xi_c$ is a polynomial in c of small degree, arguments similar to those in this chapter can be used to determine equireplicate minimal-variance designs in a non-empirical way.

5. A GENERAL CENTRAL LIMIT THEOREM AND A HYPOTHESIS TEST

5.1. The core covariances as regular parameters and their estimation. Let us recall that a linear combination of regular parameters is a regular parameter [7, middle of page 295]. This allows us to split off the common regular parameter $\Theta^2$ from an integral that is specific to the overlap pattern, in the following sense: each quantity $\tau_d^{(i)}$ can be written as

$\tau_d^{(i)}(P) = \int \Gamma(S; a)\,\Gamma(S'; a') \, dP^{\otimes(2g+2-d)}(z_1, \dots, z_{2g+2-d}) - \Theta^2(P)$

where the overlap pattern between (S, a) and (S', a') is of type (i) and $|(S \cup \{a\}) \cap (S' \cup \{a'\})| = d$. Note that this implies that $|(S \cup \{a\}) \cup (S' \cup \{a'\})| = 2g+2-d$, so the number of integrals in the first summand is correctly specified. Therefore, $\tau_d^{(i)}$ is a regular parameter of degree at most 2g+2.

Lemma 4. If $\Theta^2$ has degree 2g+2, then $\tau_d^{(i)}$ is a regular parameter of degree exactly 2g+2.

Proof. Assume there was a function f of only 2g+1 arguments such that $\tau_d^{(i)}(P) = \int f \, dP^{\otimes(2g+1)}$ for all P.

This covers the general case, because if there were a function with even fewer arguments then there would also be a function with 2g+1 arguments, defined by ignoring the additional ones. Then, by linearity, $\Gamma(S;a)\Gamma(S';a') - f$ would be a kernel of degree 2g+1 for $\Theta^2$, in contradiction to Assumption 1.

Definition 6. We define statistics $\hat\tau_d^{(i)}$ as the U-statistics associated with these regular parameters.

As U-statistics, they satisfy several optimality properties.

Corollary 1. The estimators $\hat\tau_d^{(i)}$ are the minimal variance unbiased estimators for the $\tau_d^{(i)}$. They are consistent and satisfy the Weak and Strong Laws of Large Numbers.

Lemma 5. Since $\Theta^2$, $\xi_c$ and $\alpha_\gamma$ are regular parameters, there exist U-statistics $\widehat{\Theta^2}$, $\hat\xi_c$, and $\hat\alpha_\gamma$. Let us abbreviate the U-statistic for the regular parameter $V(\hat\Theta(T))$ as $\widehat{[V(\hat\Theta(T))]}$. Then

$\widehat{[V(\hat\Theta(T))]} = (\hat\Theta(T))^2 - \widehat{\Theta^2} = |T|^{-2} \sum_{c=0}^g f^l_c\, \hat\xi_c = |T|^{-2} \sum_{\gamma=0}^g B_\gamma\, \hat\alpha_\gamma.$

Likewise, the estimators $\hat\xi_c$ satisfy the empirical analogue of (13). This is in analogy to

$V(\hat\Theta(T)) = E[(\hat\Theta(T))^2] - \Theta^2 = |T|^{-2} \sum_{c=0}^g f^l_c\, \xi_c = |T|^{-2} \sum_{\gamma=0}^g B_\gamma\, \alpha_\gamma.$

Proof. This fact is not obvious but can be checked by a straightforward computation.

5.2. Variance estimation of cross-validation. In the following, we refer to a situation where n observations are used to carry out cross-validation, and there exist n' so-called extra observations such that the total number of observations satisfies $n + n' \ge 2g + 2$. Then, there exists a variance estimator for cross-validation, which may be contrasted with Bengio and Grandvalet [3]. Precisely, plugging (15) into (14), we obtain:

Theorem 3. The empirical counterpart of the right-hand side of (16) is a U-statistic of degree 2g+2 under Assumption 1. It defines the unique minimal variance unbiased estimator of the variance of cross-validation, if there are enough extra observations. Otherwise, no unbiased estimator exists. This U-statistic $\widehat{[V(\hat\Theta(T))]}$ is identical to the plug-in estimator

$(K^{-1} - n^{-1})\, \hat\tau^{(1)}_{n-n/K} + (1 - K^{-1})\, \hat\tau^{(3)}_{n-2n/K+2} + n^{-1}\, \hat\tau^{(4)}_{n-n/K+1}.$

5.3. The Weak Law of Large Numbers. There is the following general criterion on the resampling design for the Weak Law of Large Numbers to hold. Assume that for each sample size n a design $T_n$ with learning sets $\mathcal{S}_n$ is given. We will only consider the case where the learning set size g is the same across all n, so $\lim g/n = 0$. Let us write $K_n := |\mathcal{S}_n|$.

Theorem 4. Assume that $U^{(1)}$ is non-degenerate in the sense that $\tau_1^{(1)} = \sigma_1^2(U^{(1)}) \ne 0$. Then, the following are equivalent:

(1)
(23) $\lim f^l_{0,n} \Big/ \Big( \sum_{c=0}^g f^l_{c,n} \Big) = 1.$

(2) $\hat\Theta(T_n)$ is weakly consistent in the sense that $\hat\Theta(T_n)$ converges in probability to $\Theta$ as $n \to \infty$.

Proof. (1) ⟹ (2): By (13), the quantity $\xi_{c,n}$ is O(n) for c = 0 (because $\tau_0^{(1)} = 0$) and is O(n²) for $c \ge 1$. Therefore, the summand $|T_n|^{-2} f^l_{0,n}\, \xi_{0,n} = (n-g)^{-2} \big(\sum_{c=0}^g f^l_{c,n}\big)^{-1} f^l_{0,n}\, \xi_{0,n}$ of (14) always vanishes in the limit, taking into account that $\sum_{c=0}^g f^l_{c,n} = (n-g)^{-2} |T_n|^2$ for a test-complete design. Similarly, the remaining summand of the decomposition (14) is $(n-g)^{-2} \big(\sum_{c=0}^g f^l_{c,n}\big)^{-1} \sum_{c=1}^g f^l_{c,n}\, \xi_{c,n}$. Since $(n-g)^{-2} \sum_{c=1}^g \xi_{c,n}$ is a bounded sequence, it follows from the condition that $\lim V(\hat\Theta(T_n)) = 0$, and thus the assertion.

(2) ⟹ (1): Convergence in probability implies convergence of the variance to zero. Since $\lim \xi_{0,n} / (2gn\tau_1^{(2)} + n\tau_1^{(4)}) = 1$ and $\lim \xi_{c,n} / (n^2 \tau_c^{(1)}) = 1$ for $c \ge 1$, we have

$\lim n^{-2} K_n^{-2} \Big( f^l_{0,n}\,(2gn\tau_1^{(2)} + n\tau_1^{(4)}) + n^2 \sum_{c \ge 1} f^l_{c,n}\, \tau_c^{(1)} \Big) = \lim K_n^{-2} \sum_{c \ge 1} f^l_{c,n}\, \tau_c^{(1)} = 0.$

Since $\tau_c^{(1)} \ge \tau_1^{(1)} > 0$, this implies that $\lim K_n^{-2} f^l_{c,n} = 0$ for all $c \ge 1$. Hence,

$\lim \frac{f^l_{0,n}}{K_n^2} = 1 - \lim \sum_{c \ge 1} \frac{f^l_{c,n}}{K_n^2} = 1 - 0 = 1.$

Example 3. The condition appearing in the first equivalence of Theorem 4 is satisfied, for instance, for the complete design sequence. In contrast, it is violated for a design sequence such that there is an observation that is contained in every learning set for every n. One can also construct design sequences such that the limit (23) takes values strictly between zero and one.
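Condition (23) is easy to probe numerically. For the complete design, $f^l_{0,n}/K_n^2$ is the fraction of ordered pairs of disjoint g-subsets; the sketch below (ours) shows it tending to one, while a design sequence that pins one observation into every learning set has $f^l_{0,n} = 0$, illustrating Example 3.

```python
from math import comb

def frac_disjoint_complete(n, g):
    """f_{0,n}^l / K_n^2 for the complete design: the probability that two
    independently chosen g-subsets of {1,...,n} are disjoint."""
    return comb(n - g, g) / comb(n, g)

g = 5
for n in (20, 100, 1000, 10000):
    print(n, frac_disjoint_complete(n, g))
# tends to 1, so (23) holds and the leave-p-out estimator is consistent;
# for a design in which one fixed observation lies in every learning set,
# any two learning sets overlap, f_{0,n}^l = 0, and (23) fails.
```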

5.4. A Central Limit Theorem. The following theorem generalizes Lee [9, Theorem 1, Chapter 4.3.1] to U-statistics with a non-symmetric kernel and thus to CV-like procedures.

Lemma 6. Let $\hat\Theta(T)$ be a CV-like procedure based on a fixed design $T \subseteq T_*$ and let $\hat\Theta(T_*)$ be the leave-p-out estimator. Then

(24) $V(\hat\Theta(T)) - V(\hat\Theta(T_*)) = V\big(\hat\Theta(T) - \hat\Theta(T_*)\big) \ge 0.$

Proof. Since $V(\hat\Theta(T) - \hat\Theta(T_*)) = V(\hat\Theta(T)) - 2\,\mathrm{Cov}(\hat\Theta(T), \hat\Theta(T_*)) + V(\hat\Theta(T_*))$, (24) holds if and only if $\mathrm{Cov}(\hat\Theta(T), \hat\Theta(T_*)) = V(\hat\Theta(T_*))$. Since $\mathrm{Cov}(\Gamma(S; a), \hat\Theta(T_*))$ is the same for every $(S; a) \in T_*$, we have

$V(\hat\Theta(T_*)) = \mathrm{Cov}(\hat\Theta(T_*), \hat\Theta(T_*)) = \mathrm{Cov}\Big(|T_*|^{-1} \textstyle\sum_{(S;a) \in T_*} \Gamma(S;a),\, \hat\Theta(T_*)\Big) = |T_*|^{-1} \textstyle\sum_{(S;a) \in T_*} \mathrm{Cov}(\Gamma(S;a), \hat\Theta(T_*)) = \mathrm{Cov}(\Gamma(S;a), \hat\Theta(T_*)) = |T|^{-1} \textstyle\sum_{(S;a) \in T} \mathrm{Cov}(\Gamma(S;a), \hat\Theta(T_*)) = \mathrm{Cov}(\hat\Theta(T), \hat\Theta(T_*)).$

The proof differs from that of [9] not only because we generalize his theorem, but also because there is a mistake in his proof: he assumes that the covariances between an incomplete U-statistic and the kernel evaluations are all equal, for each set of the design. This property, however, is not valid in general.

We are now going to investigate a situation where a design is pre-specified for each sample size, and we will give a general sufficient criterion for a Central Limit Theorem. Let us abbreviate the complete, leave-p-out design for n by $T_{*,n}$, and its collection of learning sets by $\mathcal{S}_{*,n}$, so that $K_{*,n} := |\mathcal{S}_{*,n}| = \binom{n}{g}$. Suppose, again, that for each $n \ge 2g+2$, a test-complete design $T_n$ for sample size n is given such that the trivial condition $K_n \to \infty$ is satisfied. For each $n \ge 2g+2$ and any $c = 0, \dots, g$, let $f^l_{c,n} \in \mathbb{N}_0$ be the number of ordered pairs of learning sets (S, S'), both occurring in $\mathcal{S}_n$, such that $|S \cap S'| = c$. Recall that $|T_n|^2 = (n-g)^2 \sum_{c=0}^g f^l_{c,n} = (n-g)^2 K_n^2$. Using $f^l_{*,c,n} = \binom{n}{g}\binom{g}{c}\binom{n-g}{g-c}$, we see that the numbers $f^l_{*,c,n}$ satisfy the following asymptotic properties:

(25) $\lim_n n\, f^l_{*,1,n}\, K_{*,n}^{-2} = g^2$ for c = 1,

and

(26) $\lim_n n\, f^l_{*,c,n}\, K_{*,n}^{-2} = 0$ for all $2 \le c \le g$.
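The limits (25) and (26) follow from the closed form of $f^l_{*,c,n}$; a quick numerical check (ours):

```python
from math import comb

def scaled_f(n, g, c):
    """n * f_{*,c,n}^l / K_{*,n}^2 with f_{*,c,n}^l = C(n,g) C(g,c) C(n-g,g-c)
    and K_{*,n} = C(n,g)."""
    return n * comb(g, c) * comb(n - g, g - c) / comb(n, g)

g = 4
for c in range(1, g + 1):
    print(c, [round(scaled_f(n, g, c), 4) for n in (100, 1000, 10000)])
# c = 1 tends to g^2 = 16, as in (25); every c >= 2 tends to 0, as in (26)
```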

Lemma 7. Assume the $f^l_{c,n}$ satisfy equations (25), (26) with $f^l_{c,n}$ in place of $f^l_{*,c,n}$ and $K_n$ in place of $K_{*,n}$. Then

(27) $\lim n\,\big(V(\hat\Theta(T_n)) - V(\hat\Theta(T_{*,n}))\big) = 0.$

Proof. Let us express the fact that $\xi_c$ depends on n by writing $\xi_{c,n}$. One considers the order of magnitude of the $\xi_{c,n}$ as given in the proof of Theorem 4. One substitutes (14) for $V(\hat\Theta(T_n))$ and for $V(\hat\Theta(T_{*,n}))$ and considers each summand of the left-hand side of (27) separately (the finite sum over c can be pulled outside the limit). For instance, the summand belonging to c = 0 contributes zero because, by assumption,

$\lim \big( n f^l_{0,n}\, \xi_{0,n} / |T_n|^2 \big) = \lim \big( (n^2 f^l_{0,n} / |T_n|^2) \cdot \xi_{0,n}/n \big) = \lim (n^2 f^l_{0,n} / |T_n|^2) \cdot \lim (\xi_{0,n}/n) = \lim (\xi_{0,n}/n)$

and, similarly,

$\lim \big( n f^l_{*,0,n}\, \xi_{0,n} / |T_{*,n}|^2 \big) = \lim (\xi_{0,n}/n).$

For c = 1, we have

$\lim \big( n f^l_{1,n}\, \xi_{1,n} / |T_n|^2 \big) = \lim \big( (n^3 f^l_{1,n} / |T_n|^2) \cdot \xi_{1,n}/n^2 \big) = \lim (n^3 f^l_{1,n} / |T_n|^2) \cdot \lim (\xi_{1,n}/n^2) = g^2 \lim (\xi_{1,n}/n^2)$

and

$\lim \big( n f^l_{*,1,n}\, \xi_{1,n} / |T_{*,n}|^2 \big) = g^2 \lim (\xi_{1,n}/n^2).$

Analogously, one shows that every summand belonging to any other c vanishes.

Theorem 5. The following are equivalent:
(1) Equation (27) holds.
(2) $(\hat\Theta(T_n) - \Theta)\,\big(V(\hat\Theta(T_{*,n}))\big)^{-1/2} \to \mathcal{N}(0,1)$ in distribution as g remains fixed and $n \to \infty$.

The condition of Lemma 7 implies the condition (23) of Theorem 4; so we pass to a more specific case. Likewise, the situation of Theorem 4 is that a relaxation of (27) holds, namely that the left-hand side of (27) lacks the factor n. It is easy to construct examples of design sequences $T_n$ such that (23) holds but (27) is violated.

Proof. (1) ⟹ (2): We have

(28) $(\hat\Theta(T) - \Theta)\,\big(V(\hat\Theta(T_{*,n}))\big)^{-1/2} = (\hat\Theta(T) - \hat\Theta(T_{*,n}))\,\big(V(\hat\Theta(T_{*,n}))\big)^{-1/2} + (\hat\Theta(T_{*,n}) - \Theta)\,\big(V(\hat\Theta(T_{*,n}))\big)^{-1/2}.$

We will show that the first summand converges in probability to zero while the second converges in distribution to $\mathcal{N}(0,1)$. The claim then follows from the standard fact that convergence in distribution is invariant under perturbation by a term that converges to zero in probability. Using Lemma 6, the variance of the first summand of the right-hand side of (28) is

$V\Big[ (\hat\Theta(T) - \hat\Theta(T_{*,n}))\,\big(V(\hat\Theta(T_{*,n}))\big)^{-1/2} \Big] = V(\hat\Theta(T_n)) / V(\hat\Theta(T_{*,n})) - 1,$

which converges to zero, as we have just shown. The second summand of the right-hand side of (28) is the standardized U-statistic, which satisfies the Central Limit Theorem [7, Theorem 7.1], multiplied by the square root of the ratio of the variances, which converges to one. This completes the proof.

(2) ⟹ (1): Taking the variance of the left-hand side of (28), we see that


More information

Permanent vs. Determinant

Permanent vs. Determinant Permanent vs. Determinant Frank Ban Introuction A major problem in theoretical computer science is the Permanent vs. Determinant problem. It asks: given an n by n matrix of ineterminates A = (a i,j ) an

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy,

NOTES ON EULER-BOOLE SUMMATION (1) f (l 1) (n) f (l 1) (m) + ( 1)k 1 k! B k (y) f (k) (y) dy, NOTES ON EULER-BOOLE SUMMATION JONATHAN M BORWEIN, NEIL J CALKIN, AND DANTE MANNA Abstract We stuy a connection between Euler-MacLaurin Summation an Boole Summation suggeste in an AMM note from 196, which

More information

θ x = f ( x,t) could be written as

θ x = f ( x,t) could be written as 9. Higher orer PDEs as systems of first-orer PDEs. Hyperbolic systems. For PDEs, as for ODEs, we may reuce the orer by efining new epenent variables. For example, in the case of the wave equation, (1)

More information

Jointly continuous distributions and the multivariate Normal

Jointly continuous distributions and the multivariate Normal Jointly continuous istributions an the multivariate Normal Márton alázs an álint Tóth October 3, 04 This little write-up is part of important founations of probability that were left out of the unit Probability

More information

Calculus of Variations

Calculus of Variations Calculus of Variations Lagrangian formalism is the main tool of theoretical classical mechanics. Calculus of Variations is a part of Mathematics which Lagrangian formalism is base on. In this section,

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz

A note on asymptotic formulae for one-dimensional network flow problems Carlos F. Daganzo and Karen R. Smilowitz A note on asymptotic formulae for one-imensional network flow problems Carlos F. Daganzo an Karen R. Smilowitz (to appear in Annals of Operations Research) Abstract This note evelops asymptotic formulae

More information

Lecture 2 Lagrangian formulation of classical mechanics Mechanics

Lecture 2 Lagrangian formulation of classical mechanics Mechanics Lecture Lagrangian formulation of classical mechanics 70.00 Mechanics Principle of stationary action MATH-GA To specify a motion uniquely in classical mechanics, it suffices to give, at some time t 0,

More information

Tractability results for weighted Banach spaces of smooth functions

Tractability results for weighted Banach spaces of smooth functions Tractability results for weighte Banach spaces of smooth functions Markus Weimar Mathematisches Institut, Universität Jena Ernst-Abbe-Platz 2, 07740 Jena, Germany email: markus.weimar@uni-jena.e March

More information

On combinatorial approaches to compressed sensing

On combinatorial approaches to compressed sensing On combinatorial approaches to compresse sensing Abolreza Abolhosseini Moghaam an Hayer Raha Department of Electrical an Computer Engineering, Michigan State University, East Lansing, MI, U.S. Emails:{abolhos,raha}@msu.eu

More information

PDE Notes, Lecture #11

PDE Notes, Lecture #11 PDE Notes, Lecture # from Professor Jalal Shatah s Lectures Febuary 9th, 2009 Sobolev Spaces Recall that for u L loc we can efine the weak erivative Du by Du, φ := udφ φ C0 If v L loc such that Du, φ =

More information

Generalized Tractability for Multivariate Problems

Generalized Tractability for Multivariate Problems Generalize Tractability for Multivariate Problems Part II: Linear Tensor Prouct Problems, Linear Information, an Unrestricte Tractability Michael Gnewuch Department of Computer Science, University of Kiel,

More information

Sturm-Liouville Theory

Sturm-Liouville Theory LECTURE 5 Sturm-Liouville Theory In the three preceing lectures I emonstrate the utility of Fourier series in solving PDE/BVPs. As we ll now see, Fourier series are just the tip of the iceberg of the theory

More information

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k

Robust Forward Algorithms via PAC-Bayes and Laplace Distributions. ω Q. Pr (y(ω x) < 0) = Pr A k A Proof of Lemma 2 B Proof of Lemma 3 Proof: Since the support of LL istributions is R, two such istributions are equivalent absolutely continuous with respect to each other an the ivergence is well-efine

More information

Separation of Variables

Separation of Variables Physics 342 Lecture 1 Separation of Variables Lecture 1 Physics 342 Quantum Mechanics I Monay, January 25th, 2010 There are three basic mathematical tools we nee, an then we can begin working on the physical

More information

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1

d dx But have you ever seen a derivation of these results? We ll prove the first result below. cos h 1 Lecture 5 Some ifferentiation rules Trigonometric functions (Relevant section from Stewart, Seventh Eition: Section 3.3) You all know that sin = cos cos = sin. () But have you ever seen a erivation of

More information

Agmon Kolmogorov Inequalities on l 2 (Z d )

Agmon Kolmogorov Inequalities on l 2 (Z d ) Journal of Mathematics Research; Vol. 6, No. ; 04 ISSN 96-9795 E-ISSN 96-9809 Publishe by Canaian Center of Science an Eucation Agmon Kolmogorov Inequalities on l (Z ) Arman Sahovic Mathematics Department,

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

Self-normalized Martingale Tail Inequality

Self-normalized Martingale Tail Inequality Online-to-Confience-Set Conversions an Application to Sparse Stochastic Banits A Self-normalize Martingale Tail Inequality The self-normalize martingale tail inequality that we present here is the scalar-value

More information

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation

Tutorial on Maximum Likelyhood Estimation: Parametric Density Estimation Tutorial on Maximum Likelyhoo Estimation: Parametric Density Estimation Suhir B Kylasa 03/13/2014 1 Motivation Suppose one wishes to etermine just how biase an unfair coin is. Call the probability of tossing

More information

Parameter estimation: A new approach to weighting a priori information

Parameter estimation: A new approach to weighting a priori information Parameter estimation: A new approach to weighting a priori information J.L. Mea Department of Mathematics, Boise State University, Boise, ID 83725-555 E-mail: jmea@boisestate.eu Abstract. We propose a

More information

On the enumeration of partitions with summands in arithmetic progression

On the enumeration of partitions with summands in arithmetic progression AUSTRALASIAN JOURNAL OF COMBINATORICS Volume 8 (003), Pages 149 159 On the enumeration of partitions with summans in arithmetic progression M. A. Nyblom C. Evans Department of Mathematics an Statistics

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

CHAPTER 1 : DIFFERENTIABLE MANIFOLDS. 1.1 The definition of a differentiable manifold

CHAPTER 1 : DIFFERENTIABLE MANIFOLDS. 1.1 The definition of a differentiable manifold CHAPTER 1 : DIFFERENTIABLE MANIFOLDS 1.1 The efinition of a ifferentiable manifol Let M be a topological space. This means that we have a family Ω of open sets efine on M. These satisfy (1), M Ω (2) the

More information

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations

Lecture XII. where Φ is called the potential function. Let us introduce spherical coordinates defined through the relations Lecture XII Abstract We introuce the Laplace equation in spherical coorinates an apply the metho of separation of variables to solve it. This will generate three linear orinary secon orer ifferential equations:

More information

Some Examples. Uniform motion. Poisson processes on the real line

Some Examples. Uniform motion. Poisson processes on the real line Some Examples Our immeiate goal is to see some examples of Lévy processes, an/or infinitely-ivisible laws on. Uniform motion Choose an fix a nonranom an efine X := for all (1) Then, {X } is a [nonranom]

More information

A Weak First Digit Law for a Class of Sequences

A Weak First Digit Law for a Class of Sequences International Mathematical Forum, Vol. 11, 2016, no. 15, 67-702 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1288/imf.2016.6562 A Weak First Digit Law for a Class of Sequences M. A. Nyblom School of

More information

The Subtree Size Profile of Plane-oriented Recursive Trees

The Subtree Size Profile of Plane-oriented Recursive Trees The Subtree Size Profile of Plane-oriente Recursive Trees Michael FUCHS Department of Applie Mathematics National Chiao Tung University Hsinchu, 3, Taiwan Email: mfuchs@math.nctu.eu.tw Abstract In this

More information

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback Journal of Machine Learning Research 8 07) - Submitte /6; Publishe 5/7 An Optimal Algorithm for Banit an Zero-Orer Convex Optimization with wo-point Feeback Oha Shamir Department of Computer Science an

More information

Euler equations for multiple integrals

Euler equations for multiple integrals Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................

More information

Implicit Differentiation

Implicit Differentiation Implicit Differentiation Thus far, the functions we have been concerne with have been efine explicitly. A function is efine explicitly if the output is given irectly in terms of the input. For instance,

More information

3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes

3.7 Implicit Differentiation -- A Brief Introduction -- Student Notes Fin these erivatives of these functions: y.7 Implicit Differentiation -- A Brief Introuction -- Stuent Notes tan y sin tan = sin y e = e = Write the inverses of these functions: y tan y sin How woul we

More information

Integration Review. May 11, 2013

Integration Review. May 11, 2013 Integration Review May 11, 2013 Goals: Review the funamental theorem of calculus. Review u-substitution. Review integration by parts. Do lots of integration eamples. 1 Funamental Theorem of Calculus In

More information

Characterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations

Characterizing Real-Valued Multivariate Complex Polynomials and Their Symmetric Tensor Representations Characterizing Real-Value Multivariate Complex Polynomials an Their Symmetric Tensor Representations Bo JIANG Zhening LI Shuzhong ZHANG December 31, 2014 Abstract In this paper we stuy multivariate polynomial

More information

Lower bounds on Locality Sensitive Hashing

Lower bounds on Locality Sensitive Hashing Lower bouns on Locality Sensitive Hashing Rajeev Motwani Assaf Naor Rina Panigrahy Abstract Given a metric space (X, X ), c 1, r > 0, an p, q [0, 1], a istribution over mappings H : X N is calle a (r,

More information

Approximate Constraint Satisfaction Requires Large LP Relaxations

Approximate Constraint Satisfaction Requires Large LP Relaxations Approximate Constraint Satisfaction Requires Large LP Relaxations oah Fleming April 19, 2018 Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoi

More information

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs

Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Perfect Matchings in Õ(n1.5 ) Time in Regular Bipartite Graphs Ashish Goel Michael Kapralov Sanjeev Khanna Abstract We consier the well-stuie problem of fining a perfect matching in -regular bipartite

More information

The Principle of Least Action

The Principle of Least Action Chapter 7. The Principle of Least Action 7.1 Force Methos vs. Energy Methos We have so far stuie two istinct ways of analyzing physics problems: force methos, basically consisting of the application of

More information

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH

TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH English NUMERICAL MATHEMATICS Vol14, No1 Series A Journal of Chinese Universities Feb 2005 TOEPLITZ AND POSITIVE SEMIDEFINITE COMPLETION PROBLEM FOR CYCLE GRAPH He Ming( Λ) Michael K Ng(Ξ ) Abstract We

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

1. Aufgabenblatt zur Vorlesung Probability Theory

1. Aufgabenblatt zur Vorlesung Probability Theory 24.10.17 1. Aufgabenblatt zur Vorlesung By (Ω, A, P ) we always enote the unerlying probability space, unless state otherwise. 1. Let r > 0, an efine f(x) = 1 [0, [ (x) exp( r x), x R. a) Show that p f

More information

REAL ANALYSIS I HOMEWORK 5

REAL ANALYSIS I HOMEWORK 5 REAL ANALYSIS I HOMEWORK 5 CİHAN BAHRAN The questions are from Stein an Shakarchi s text, Chapter 3. 1. Suppose ϕ is an integrable function on R with R ϕ(x)x = 1. Let K δ(x) = δ ϕ(x/δ), δ > 0. (a) Prove

More information

Mathematical Review Problems

Mathematical Review Problems Fall 6 Louis Scuiero Mathematical Review Problems I. Polynomial Equations an Graphs (Barrante--Chap. ). First egree equation an graph y f() x mx b where m is the slope of the line an b is the line's intercept

More information

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7.

1 dx. where is a large constant, i.e., 1, (7.6) and Px is of the order of unity. Indeed, if px is given by (7.5), the inequality (7. Lectures Nine an Ten The WKB Approximation The WKB metho is a powerful tool to obtain solutions for many physical problems It is generally applicable to problems of wave propagation in which the frequency

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS

SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS SYMMETRIC KRONECKER PRODUCTS AND SEMICLASSICAL WAVE PACKETS GEORGE A HAGEDORN AND CAROLINE LASSER Abstract We investigate the iterate Kronecker prouct of a square matrix with itself an prove an invariance

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

The chromatic number of graph powers

The chromatic number of graph powers Combinatorics, Probability an Computing (19XX) 00, 000 000. c 19XX Cambrige University Press Printe in the Unite Kingom The chromatic number of graph powers N O G A A L O N 1 an B O J A N M O H A R 1 Department

More information

05 The Continuum Limit and the Wave Equation

05 The Continuum Limit and the Wave Equation Utah State University DigitalCommons@USU Founations of Wave Phenomena Physics, Department of 1-1-2004 05 The Continuum Limit an the Wave Equation Charles G. Torre Department of Physics, Utah State University,

More information

2Algebraic ONLINE PAGE PROOFS. foundations

2Algebraic ONLINE PAGE PROOFS. foundations Algebraic founations. Kick off with CAS. Algebraic skills.3 Pascal s triangle an binomial expansions.4 The binomial theorem.5 Sets of real numbers.6 Surs.7 Review . Kick off with CAS Playing lotto Using

More information

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21

'HVLJQ &RQVLGHUDWLRQ LQ 0DWHULDO 6HOHFWLRQ 'HVLJQ 6HQVLWLYLW\,1752'8&7,21 Large amping in a structural material may be either esirable or unesirable, epening on the engineering application at han. For example, amping is a esirable property to the esigner concerne with limiting

More information

Similar Operators and a Functional Calculus for the First-Order Linear Differential Operator

Similar Operators and a Functional Calculus for the First-Order Linear Differential Operator Avances in Applie Mathematics, 9 47 999 Article ID aama.998.067, available online at http: www.iealibrary.com on Similar Operators an a Functional Calculus for the First-Orer Linear Differential Operator

More information

Linear and quadratic approximation

Linear and quadratic approximation Linear an quaratic approximation November 11, 2013 Definition: Suppose f is a function that is ifferentiable on an interval I containing the point a. The linear approximation to f at a is the linear function

More information

Discrete Mathematics

Discrete Mathematics Discrete Mathematics 309 (009) 86 869 Contents lists available at ScienceDirect Discrete Mathematics journal homepage: wwwelseviercom/locate/isc Profile vectors in the lattice of subspaces Dániel Gerbner

More information

A Modification of the Jarque-Bera Test. for Normality

A Modification of the Jarque-Bera Test. for Normality Int. J. Contemp. Math. Sciences, Vol. 8, 01, no. 17, 84-85 HIKARI Lt, www.m-hikari.com http://x.oi.org/10.1988/ijcms.01.9106 A Moification of the Jarque-Bera Test for Normality Moawa El-Fallah Ab El-Salam

More information

Review of Differentiation and Integration for Ordinary Differential Equations

Review of Differentiation and Integration for Ordinary Differential Equations Schreyer Fall 208 Review of Differentiation an Integration for Orinary Differential Equations In this course you will be expecte to be able to ifferentiate an integrate quickly an accurately. Many stuents

More information

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2. Assignment 13 Exercise 8.4 For the hypotheses consiere in Examples 8.12 an 8.13, the sign test is base on the statistic N + = #{i : Z i > 0}. Since 2 n(n + /n 1) N(0, 1) 2 uner the null hypothesis, the

More information

Calculus of Variations

Calculus of Variations 16.323 Lecture 5 Calculus of Variations Calculus of Variations Most books cover this material well, but Kirk Chapter 4 oes a particularly nice job. x(t) x* x*+ αδx (1) x*- αδx (1) αδx (1) αδx (1) t f t

More information

Well-posedness of hyperbolic Initial Boundary Value Problems

Well-posedness of hyperbolic Initial Boundary Value Problems Well-poseness of hyperbolic Initial Bounary Value Problems Jean-François Coulombel CNRS & Université Lille 1 Laboratoire e mathématiques Paul Painlevé Cité scientifique 59655 VILLENEUVE D ASCQ CEDEX, France

More information

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks A PAC-Bayesian Approach to Spectrally-Normalize Margin Bouns for Neural Networks Behnam Neyshabur, Srinah Bhojanapalli, Davi McAllester, Nathan Srebro Toyota Technological Institute at Chicago {bneyshabur,

More information

3 The variational formulation of elliptic PDEs

3 The variational formulation of elliptic PDEs Chapter 3 The variational formulation of elliptic PDEs We now begin the theoretical stuy of elliptic partial ifferential equations an bounary value problems. We will focus on one approach, which is calle

More information

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics

This module is part of the. Memobust Handbook. on Methodology of Modern Business Statistics This moule is part of the Memobust Hanbook on Methoology of Moern Business Statistics 26 March 2014 Metho: Balance Sampling for Multi-Way Stratification Contents General section... 3 1. Summary... 3 2.

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Step 1. Analytic Properties of the Riemann zeta function [2 lectures]

Step 1. Analytic Properties of the Riemann zeta function [2 lectures] Step. Analytic Properties of the Riemann zeta function [2 lectures] The Riemann zeta function is the infinite sum of terms /, n. For each n, the / is a continuous function of s, i.e. lim s s 0 n = s n,

More information

SYNCHRONOUS SEQUENTIAL CIRCUITS

SYNCHRONOUS SEQUENTIAL CIRCUITS CHAPTER SYNCHRONOUS SEUENTIAL CIRCUITS Registers an counters, two very common synchronous sequential circuits, are introuce in this chapter. Register is a igital circuit for storing information. Contents

More information

How to Minimize Maximum Regret in Repeated Decision-Making

How to Minimize Maximum Regret in Repeated Decision-Making How to Minimize Maximum Regret in Repeate Decision-Making Karl H. Schlag July 3 2003 Economics Department, European University Institute, Via ella Piazzuola 43, 033 Florence, Italy, Tel: 0039-0-4689, email:

More information

Function Spaces. 1 Hilbert Spaces

Function Spaces. 1 Hilbert Spaces Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assume structure

More information

Influence of weight initialization on multilayer perceptron performance

Influence of weight initialization on multilayer perceptron performance Influence of weight initialization on multilayer perceptron performance M. Karouia (1,2) T. Denœux (1) R. Lengellé (1) (1) Université e Compiègne U.R.A. CNRS 817 Heuiasyc BP 649 - F-66 Compiègne ceex -

More information

Robustness and Perturbations of Minimal Bases

Robustness and Perturbations of Minimal Bases Robustness an Perturbations of Minimal Bases Paul Van Dooren an Froilán M Dopico December 9, 2016 Abstract Polynomial minimal bases of rational vector subspaces are a classical concept that plays an important

More information

Differentiability, Computing Derivatives, Trig Review. Goals:

Differentiability, Computing Derivatives, Trig Review. Goals: Secants vs. Derivatives - Unit #3 : Goals: Differentiability, Computing Derivatives, Trig Review Determine when a function is ifferentiable at a point Relate the erivative graph to the the graph of an

More information