OBSERVATIONS ON BAGGING


Andreas Buja and Werner Stuetzle

University of Pennsylvania and University of Washington

Abstract: Bagging is a device intended for reducing the prediction error of learning algorithms. In its simplest form, bagging draws bootstrap samples from the training sample, applies the learning algorithm to each bootstrap sample, and then averages the resulting prediction rules. More generally, the resample size M may be different from the original sample size N, and resampling can be done with or without replacement. We investigate bagging in a simplified situation: the prediction rule produced by a learning algorithm is replaced by a simple real-valued U-statistic of i.i.d. data. U-statistics of high order can describe complex dependencies, and yet they admit a rigorous asymptotic analysis. We show that bagging U-statistics often but not always decreases variance, whereas it always increases bias. The most striking finding, however, is an equivalence between bagging based on resampling with and without replacement: the respective resample sizes M_with = α_with N and M_w/o = α_w/o N produce very similar bagged statistics if α_with = α_w/o/(1 − α_w/o). While our derivation is limited to U-statistics, the equivalence seems to be universal. We illustrate this point in simulations where bagging is applied to CART trees.

Key words and phrases: Bagging, U-statistics, CART.

1. Introduction

Bagging, short for bootstrap aggregation, was introduced by Breiman (1996) as a device for reducing the prediction error of learning algorithms. Bagging is performed by drawing bootstrap samples from the training sample, applying the learning algorithm to each bootstrap sample, and averaging/aggregating the resulting prediction rules, that is, averaging or otherwise aggregating the predicted values for test observations. Breiman presents empirical evidence that bagging can indeed reduce prediction error. It appears to be most effective for CART trees (Breiman et al. (1984)).

Breiman's heuristic explanation is that CART trees are highly unstable functions of the data (a small change in the training sample can result in a very different tree) and that averaging over bootstrap samples reduces the variance component of the prediction error.

The goal of the present article is to contribute to the theoretical understanding of bagging. We investigate bagging in a simplified situation: the prediction rule produced by a learning algorithm is replaced by a simple real-valued statistic of i.i.d. data. While this simplification does not capture some characteristics of function fitting, it still enables us, for example, to analyze the conditions under which variance reduction occurs. The claim that bagging always reduces variance is in fact not true.

We start by describing bagging in operational terms. Bagging a statistic θ(X_1, ..., X_N) is defined as averaging it over bootstrap samples X*_1, ..., X*_N:

θ_B(X_1, ..., X_N) = ave_{X*_1,...,X*_N} θ(X*_1, ..., X*_N),

where the observations X*_i are i.i.d. draws from {X_1, ..., X_N}. The bagged statistic can also be written as

θ_B(X_1, ..., X_N) = N^{-N} Σ_{i_1,...,i_N} θ(X_{i_1}, ..., X_{i_N}),

because there are N^N bootstrap samples, each having probability 1/N^N. For realistic sample sizes N, the N^N bootstrap samples cannot be enumerated in actual computations, hence one resorts to sampling a smaller number, often as few as 50.

Our analysis covers several variations on bagging. Instead of averaging the values of a statistic over bootstrap samples of the same size N as the original sample, we may choose the resample size M to be smaller, or even larger, than N. Another alternative covered by our analysis is resampling without replacement.

The specific class of statistics we consider is that of finite sums of U-statistics. While they do not capture the statistical properties of CART trees, U-statistics can model complex interactions, and yet they allow for a rigorous second order analysis. (For an approach tailored to tree-based methods, see Bühlmann and Yu (2002).)
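In code, the operational definition reads as follows. This is a minimal sketch under our own naming (it is not the authors' implementation): a statistic is bagged by Monte Carlo over resamples, with the resample size M and the resampling mode exposed as parameters.

```python
import numpy as np

def bag_statistic(theta, x, n_resamples=50, m=None, replace=True, seed=None):
    """Monte Carlo approximation of the bagged version of a statistic:
    draw resamples of size m from x, with or without replacement,
    evaluate theta on each resample, and average the results."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    m = len(x) if m is None else m
    values = [theta(rng.choice(x, size=m, replace=replace))
              for _ in range(n_resamples)]
    return float(np.mean(values))

# Toy usage: bag the empirical variance (a second-order U-statistic).
x = np.random.default_rng(0).normal(size=100)
print(x.var(), bag_statistic(np.var, x, n_resamples=200))
```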

The most striking effect we observe, both theoretically and in simulations, is a correspondence between bagging based on resampling with and without replacement. The two modes of resampling produce very similar bagged statistics if the resampling fractions α_w/o = M_w/o/N for sampling without replacement and α_with = M_with/N for sampling with replacement satisfy the relation

α_with = α_w/o/(1 − α_w/o),   or equivalently   1/α_with = 1/α_w/o − 1.

This equivalence holds to order N^{-1} under regularity assumptions. The equivalence is implicit in one form or another in previous work: Friedman and Hall (2000, Sec. 2.6) notice it for a type of polynomial expansions, but they do not make use of it other than noting that half-sampling without replacement (α_w/o = 1/2) and standard bootstrap sampling (α_with = 1) yield very similar bagged statistics. Knight and Bassett (2002, Sec. 4) note the equivalence for half-sampling and bootstrap in the case of quantile estimators. In the present article we show the equivalence for U-statistics of fixed but arbitrary order. We also illustrate it in simulations for bagged trees, where it holds with surprising accuracy, hinting at a much greater range of validity.

Other observations about the effects of bagging concern the variance, squared bias, and mean squared error (MSE) of bagged U-statistics. Similar to Chen and Hall (2003) and Knight and Bassett (2002), we obtain effects that are only of order O(N^{-2}). We also find that, with decreasing resample size, squared bias always increases and variance often but not always decreases. More precisely, the difference between bagged and unbagged in squared bias is an increasing quadratic function of

g := 1/α_with = 1/α_w/o − 1,

and for the variance it is an often but not always decreasing linear function of g. Therefore, the only possible beneficial effect of bagging stems from variance reduction. In those situations where variance is reduced, the combined effect of bagging is to reduce the MSE in an interval of g near zero; equivalently, the MSE is reduced for α_with near infinity and correspondingly for α_w/o near 1. For the standard bootstrap (α_with = 1) and half-sampling (α_w/o = 1/2), improvements in MSE are obtained only if these resample sizes fall in the respective critical intervals. However, there can arise odd situations in which the MSE is improved only for α_with > 1 and α_w/o > 1/2.

We finish this article with some illustrative simulations of bagged CART trees. A purpose of these illustrations is to gain some understanding of the peculiarities

of trees in light of the fact that bagging often shows dramatic improvements that apparently go beyond the effects described by O(N^{-2}) asymptotics.

An important point to keep in mind is that there are two notions of bias: If we regard θ(F_N) as a plug-in estimate of θ(F), then the plug-in bias is E θ(F_N) − θ(F). If we regard θ(F_N) as an estimate of some parameter µ, then the estimation bias is E θ(F_N) − µ. The second notion of bias, estimation bias, is the one commonly used in function estimation. Our theory of bagged U-statistics, however, is concerned with plug-in bias, not with estimation bias. The same applies to Chen and Hall's (2003) theory of bagging estimating equations, as well as Knight and Bassett's (2002) theory of bagged quantiles. This point even applies to Bühlmann and Yu's (2002) treatment of bagged stumps and trees because their notion of bias refers not to the true underlying function but to the optimal asymptotic target, that is, the asymptotically best fitting stump or tree. Their theory therefore explains bagging's effect on the variance of stumps and trees (better than any of the other theories, including ours), but it, too, has nothing to say about bias in the usual sense of function fitting.

An interesting observation we make in the simulations is that for smooth underlying f(x) bagging not only decreases variance, but it can reduce estimation bias as well. This should not be too surprising because according to Bühlmann and Yu's theory the effect of bagging is essentially to replace fitting a stump with fitting a stump convolved with a narrow-bandwidth kernel. The convolved stump is smooth and has a chance to reduce estimation bias when the underlying f(x) is smooth.

Note: Unless otherwise noted, proofs of propositions can be found in the appendix.

2. Resampling U-Statistics

Let X, X_1, ..., X_N be i.i.d. random variables. We consider statistics of X_1, ..., X_N that are finite sums

U = (1/N) Σ_i A_{X_i} + (1/N^2) Σ_{i,j} B_{X_i,X_j} + (1/N^3) Σ_{i,j,k} C_{X_i,X_j,X_k} + ...,

where the kernels B, C, ... are permutation symmetric in their arguments. [We put the arguments in subscripts in order to avoid the clutter caused by frequent parentheses.] The normalizations of the sums are such that under common assumptions limits for N → ∞ exist.

Strictly speaking, only the off-diagonal part Σ_{i≠j} B_{X_i,X_j} (for example) is a proper U-statistic. Because we include the diagonal i = j in the double sum, this is strictly speaking a V-statistic or von Mises statistic (Serfling (1980), Sec. 5.1.2), but we use the better known term U-statistic anyway. The reason for including the diagonal is that only in this way can U be interpreted as the plug-in estimate U(F_N) of a statistical functional

U(F) = E A_X + E B_{X,Y} + E C_{X,Y,Z} + ...,

where X, Y, Z, ... are i.i.d. Knowing what the statistic U estimates is a necessity for bias calculations. A second reason for including the diagonal is that bagging has the effect of creating terms such as B_{X_i,X_i}, so we may as well include such terms from the outset.

It is possible to explicitly calculate the bagged version U_bag of a sum of U-statistics U. We can allow bagging based on resampling with and without replacement as well as arbitrary resample sizes M. Let W = (W_1, ..., W_N) ≥ 0 be integer-valued random variables counting the multiplicities of X_1, ..., X_N in a resample. For resampling with replacement, that is, bootstrap, the distribution of W is Multinomial(1/N, ..., 1/N; M). Conventional bootstrap corresponds to M = N, but we allow M to range between 1 and ∞. Although M > N is computationally undesirable, infinity is the conceptually plausible upper bound on M: for M = ∞ no averaging takes place because with an infinite resample one has F_M = F_N. For resampling without replacement, that is, subsampling, the distribution of W is Hypergeometric(M, N). Half-sampling, for example, is M = N/2, but here the resample size M can only range between 1 and N. For the upper bound M = N no averaging takes place because the resample is just a permutation of the data, hence F_M = F_N.

With these facts we can write down the resampled and the bagged version of a U explicitly. We illustrate this for a statistic U with kernels A_{X_i} and B_{X_i,X_j}. For

a resample of size M with multiplicities W_1, ..., W_N, the value of U is

U_resample = (1/M) Σ_i W_i A_{X_i} + (1/M^2) Σ_{i,j} W_i W_j B_{X_i,X_j}.

The bagged version of U under either mode of resampling is the expected value with respect to W:

U_bag = E_W [ (1/M) Σ_i W_i A_{X_i} + (1/M^2) Σ_{i,j} W_i W_j B_{X_i,X_j} ]
      = (1/M) Σ_i E[W_i] A_{X_i} + (1/M^2) Σ_{i,j} E[W_i W_j] B_{X_i,X_j}.

From the form of U_bag it is apparent that the only relevant quantities are moments of W:

E[W_i]     = M/N                        (with and w/o),
E[W_i^2]   = M/N + M(M−1)/N^2   (with),   M/N   (w/o),
E[W_i W_j] = M(M−1)/N^2         (with),   M(M−1)/(N(N−1))   (w/o),   (i ≠ j).

The bagged functional can now be written down explicitly. It is necessary to distinguish between the two resampling modes: we denote U_bag by U_with and U_w/o for resampling with and without replacement, respectively.

U_with = (1/N) Σ_i ( A_{X_i} + (1/M_with) B_{X_i,X_i} ) + (1/N^2) Σ_{i,j} (1 − 1/M_with) B_{X_i,X_j},

U_w/o = (1/N) Σ_i ( A_{X_i} + (N − M_w/o)/(M_w/o (N−1)) B_{X_i,X_i} ) + (1/N^2) Σ_{i,j} N(M_w/o − 1)/(M_w/o (N−1)) B_{X_i,X_j}.

Analogous calculations can be carried out for statistics with U-terms of orders higher than two. We summarize:

Proposition 1: A bagged sum of U-statistics is also a sum of U-statistics. For a statistic with kernels A_x and B_{x,y} only, the bagged terms A^with_x, B^with_{x,y} and A^w/o_x, B^w/o_{x,y}, respectively, depend on A_x and B_{x,y} as follows:

A^with_x = A_x + (1/M_with) B_{x,x},      B^with_{x,y} = (1 − 1/M_with) B_{x,y},

A^w/o_x = A_x + (N − M_w/o)/(M_w/o (N−1)) B_{x,x},      B^w/o_{x,y} = N(M_w/o − 1)/(M_w/o (N−1)) B_{x,y}.

For U-statistics with terms of first and second order, the proposition is a direct result of the preceding calculations. For general U-statistics of arbitrary order, the proposition is a consequence of the proofs in the Appendix.

We see from the proposition that the effect of bagging is to remove mass from the proper U-part of B (i ≠ j) and shift it to the diagonal (i = j), thus increasing the importance of the linear part. Similar effects take place in higher orders, where variability is shifted to lower orders.

3. Equivalence of Resampling With and Without Replacement

Proposition 1 yields a heuristic for an important fact: bagging based on resampling with replacement yields results very similar to bagging based on resampling without replacement if the resample sizes M_with and M_w/o are suitably matched up. The required correspondence can be derived by equating A^with = A^w/o and/or B^with = B^w/o in Proposition 1; both equations yield the identical condition:

Corollary: Bagging a sum of U-statistics of first and second order yields identical results under the two resampling modes if

(N−1)/M_with = N/M_w/o − 1.

For a general finite sum of U-statistics of arbitrary order, we do not obtain an identity but an approximate equivalence:

Proposition 2: Bagging a finite sum of U-statistics of arbitrary order under either resampling mode yields the same results up to order O(N^{-2}) if

N/M_with = N/M_w/o − 1,

assuming the kernels are bounded. If the kernels are not bounded but have moments of order q, the approximation is to order O(N^{-2/p}), where 1/p + 1/q = 1.
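Proposition 1 and the corollary are easy to check numerically. The following sketch is our own code and kernel choice, not part of the original paper: it evaluates a first-plus-second-order U-statistic and its exactly bagged versions under both resampling modes via the closed forms above, with resample sizes matched as in Proposition 2.

```python
import numpy as np

def u_stat(x, A, B):
    # (1/N) sum_i A(x_i) + (1/N^2) sum_{i,j} B(x_i, x_j), diagonal included
    return A(x).mean() + B(x[:, None], x[None, :]).mean()

def bagged_with(x, A, B, m):
    # closed form of Proposition 1, bootstrap resamples of size m
    return u_stat(x, lambda t: A(t) + B(t, t) / m,
                     lambda s, t: (1 - 1 / m) * B(s, t))

def bagged_wo(x, A, B, m):
    # closed form of Proposition 1, subsamples of size m, no replacement
    n = len(x)
    return u_stat(x, lambda t: A(t) + (n - m) / (m * (n - 1)) * B(t, t),
                     lambda s, t: n * (m - 1) / (m * (n - 1)) * B(s, t))

rng = np.random.default_rng(0)
x = rng.normal(size=200)
A, B = (lambda t: t**2), (lambda s, t: -s * t)  # empirical-variance kernels

n, m_wo = len(x), len(x) // 2                   # half-sampling ...
m_with = round(n * m_wo / (n - m_wo))           # ... matched to the bootstrap
print(u_stat(x, A, B), bagged_with(x, A, B, m_with), bagged_wo(x, A, B, m_wo))
```

With m_with chosen from N/M_with = N/M_w/o − 1 the two bagged values differ only at order N^{-2}; using the corollary's exact condition (N−1)/M_with = N/M_w/o − 1 (m_with = n − 1 here), they coincide up to floating point error.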

We will similarly see that variance, squared bias and hence MSE of bagged U-statistics all agree in the N^{-2} term in the two resampling modes under corresponding resample sizes.

The correspondence between the two resampling modes is more intuitive if one expresses the resample sizes M_with and M_w/o as fractions/multiples of the sample size N:

α_with = M_with/N (> 0, < ∞)   and   α_w/o = M_w/o/N (> 0, < 1).

The condition of Proposition 2 above is equivalent to

α_with = α_w/o/(1 − α_w/o).

It equates, for example, half-sampling without replacement, α_w/o = 1/2, with conventional bootstrap, α_with = 1. Subsampling without replacement with α_w/o > 1/2 corresponds to bootstrap with α_with > 1, that is, bootstrap resamples larger than the original sample. The correspondence also maps α_w/o = 1 to α_with = ∞, both of which mean that the bagged and the unbagged statistic are identical.

4. The Effect of Bagging on Variance, Bias and MSE

4.1 Variance

Variances of U-statistics can be calculated explicitly. For example, for a statistic that has only terms A_X and B_{X,Y}, the variance is

Var(U) = (1/N) Var(A_X + 2 B̄_X)
       + (1/N^2) ( 2 Cov(A_X, B_{X,X}) + 4 Cov(B_{X,X}, B̄_X) − 4 Cov(A_X, B̄_X) + 2 Var(B_{X,Y}) − 12 Var(B̄_X) )
       + (1/N^3) ( Var(B_{X,X}) − 2 Var(B_{X,Y}) + 8 Var(B̄_X) − 4 Cov(B_{X,X}, B̄_X) ),

where B̄_x := E B_{x,Y} denotes the kernel with the second argument averaged out. We are, however, primarily interested not in variances but in differences between variances of bagged and unbagged statistics:

Proposition 3: Let g = N/M for sampling with replacement and g = N/M − 1 for sampling without replacement. Assume g is fixed and 0 < g < ∞ as N → ∞. Let U be a finite sum of U-statistics of arbitrary order; then:

Var(U_bag) − Var(U) = (2/N^2) S_Var g + O(N^{-3}),

for both sampling with and without replacement. If U has only terms A_X and B_{X,Y}, then:

S_Var = Cov(A_X + 2 B̄_X, B_{X,X} − 2 B̄_X).

The effect of bagging on variance is of order O(N^{-2}). The proof is in the appendix; we also show how to calculate S_Var for statistics with U-terms of any order.

The assumption about g is essential. If it is not satisfied, the order of the asymptotics will be affected. The jackknife is a case in point: it is obtained for M = N − 1 and resampling without replacement. This implies g → 0, which violates the assumption of the proposition. It would be easy to cover this type of asymptotics because the calculations can be performed exactly.

There exist situations in which bagging increases the variance, namely, when S_Var > 0. If S_Var < 0, variance is reduced, and the beneficial effect becomes the more pronounced the smaller the resample size. Therefore, the fact that bagging may reduce variance cannot be the whole story: if variance were the only criterion of interest, one should choose the resample size M as low as operationally feasible for maximal variance reduction. Obviously, one has to take into account bias as well.

4.2 Bias

We show that bagging U-statistics always increases squared bias (except for linear statistics, where the bias vanishes). Recall that the statistic U = U(F_N) is the plug-in estimator for the functional U(F), so the bias is E U(F_N) − U(F).

Proposition 4: Under the same assumptions as in Proposition 3, we have:

Bias^2(U_bag) − Bias^2(U) = (1/N^2) (g^2 + 2g) S_Bias + O(N^{-3}),

for both sampling with and without replacement. If U has only terms A_X and B_{X,Y}, then

S_Bias = (E B_{X,X} − E B_{X,Y})^2.

The appendix has proofs and a general formula for S_Bias for statistics with U-terms of any order.
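For kernels given in closed form, the constants of Propositions 3 and 4 can be estimated by straightforward Monte Carlo. The sketch below is ours (names and sampling sizes are arbitrary choices, not from the paper); it approximates S_Var and S_Bias by sample moments, and the final comparison anticipates the MSE criterion S_Var + S_Bias < 0 derived in Section 4.4.

```python
import numpy as np

def s_var_s_bias(A, B, sampler, n=100_000, n_panel=1_000, seed=0):
    """Monte Carlo estimates of S_Var = Cov(A_X + 2*Bbar_X, B_{X,X} - 2*Bbar_X)
    and S_Bias = (E B_{X,X} - E B_{X,Y})^2, where Bbar_x = E_Y B(x, Y)."""
    rng = np.random.default_rng(seed)
    x, y = sampler(rng, n), sampler(rng, n)
    bbar = np.zeros(n)
    for yv in sampler(rng, n_panel):        # average over a panel of Y draws
        bbar += B(x, yv)
    bbar /= n_panel
    s_var = np.cov(A(x) + 2 * bbar, B(x, x) - 2 * bbar)[0, 1]
    s_bias = (B(x, x).mean() - B(x, y).mean()) ** 2
    return s_var, s_bias

# Empirical-variance functional A_x = x^2, B_{x,y} = -x*y, X standard normal:
sv, sb = s_var_s_bias(lambda t: t**2, lambda s, t: -s * t,
                      lambda rng, k: rng.normal(size=k))
print(sv, sb, "MSE improvable:", sv + sb < 0)   # approx. -2, 1, True
```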

[Figure: curves showing the dependence of Bias^2, Var, and MSE on g.]

Figure 4.1: Dependence of variance, squared bias and MSE on g. The graph shows the situation for S_Var/S_Bias = −4. Bagging is beneficial for g < 6, that is, for resample sizes M_with > N/6 and M_w/o > N/7. Optimal is g = 3, that is, M_with = N/3 and M_w/o = N/4.

Just as in the comparison of variances, sampling with and without replacement agree in the N^{-2} term, modulo the differing interpretation of g in the two resampling modes.

4.3 Mean Squared Error

The mean squared error of U = U(F_N) is

MSE(U) = E( [U(F_N) − U(F)]^2 ) = Var(U) + Bias^2(U).

The difference between MSEs of bagged and unbagged functionals is as follows:

Proposition 5: Under the same assumptions as in Propositions 3 and 4, we have:

MSE(U_bag) − MSE(U) = (1/N^2) ( S_Bias g^2 + 2 (S_Var + S_Bias) g ) + O(N^{-3}),

for both sampling with and without replacement.

4.4 Choice of Resample Size

In some situations one may obtain a reduction in MSE for some resample sizes M but not for others, while in other situations bagging may never lead to an improvement. The critical factor is the dependence of the MSE difference on g:

S_Bias g^2 + 2 (S_Var + S_Bias) g.

One immediately reads off the following condition for MSE improvement:

Corollary 5: There exist resample sizes for which bagging improves the MSE to order N^{-2} iff S_Var + S_Bias < 0. Under this condition the range of beneficial resample sizes is characterized by

g < −2 ( S_Var/S_Bias + 1 ).

The resample size with optimal MSE improvement is

g_opt = −( S_Var/S_Bias + 1 ).

Conventional bootstrap, M_with = N, and half-sampling, M_w/o = N/2 (both characterized by g = 1), are beneficial iff

S_Var/S_Bias < −3/2,

and they are optimal iff

S_Var/S_Bias = −2.

Recall from Proposition 3 that the resample sizes M_with and M_w/o are expressed in terms of g through g = N/M_with and g = N/M_w/o − 1. The corollary therefore prescribes a minimum resample size to achieve MSE reduction. See Figure 4.1 for an illustration.

The intuition that the benefits of bagging arise from variance reduction is thus correct, although it must be qualified: bagging is not always beneficial, but if it is, the reduction in MSE is due to reduction in variance. This follows from the fact that S_Bias is always positive, hence bagging always increases bias, but if the variance dips sufficiently strongly, an overall benefit results.
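In code, Corollary 5 translates into a small calculator; this sketch (function and variable names are ours) maps given constants S_Var and S_Bias to the beneficial range of g and the optimal resample sizes.

```python
def bagging_benefit(s_var, s_bias, n):
    """Second-order MSE analysis of bagging per Corollary 5.
    g = N/M_with = N/M_w/o - 1; returns None if no resample size helps."""
    if s_var + s_bias >= 0:
        return None
    g_max = -2 * (s_var / s_bias + 1)    # beneficial for 0 < g < g_max
    g_opt = -(s_var / s_bias + 1)        # minimizes S_Bias*g^2 + 2*(S_Var + S_Bias)*g
    return {"g_max": g_max, "g_opt": g_opt,
            "M_with_opt": n / g_opt, "M_wo_opt": n / (g_opt + 1)}

# The situation of Figure 4.1, S_Var/S_Bias = -4 (the sample size is hypothetical):
print(bagging_benefit(-4.0, 1.0, n=1200))
# g_max = 6 (M_with > N/6, M_w/o > N/7); g_opt = 3 (M_with = N/3, M_w/o = N/4)
```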

Recall that the above statements should be limited to values of g bounded away from zero and infinity. Near either boundary a different type of asymptotics sets in.

4.5 An Example: Quadratic Functionals

Consider as a concrete example of U-statistics the case of quadratic functions: A_X = a X^2 and B_{X,Y} = b XY, that is,

U = a (1/N) Σ_i X_i^2 + b ( (1/N) Σ_i X_i )^2.

In order to determine the terms S_Var and S_Bias, we need the first four moments of X: Let µ = E X, σ^2 = E[(X−µ)^2], γ = E[(X−µ)^3]/σ^3 and κ = E[(X−µ)^4]/σ^4 be expectation, variance, skewness and kurtosis, respectively. Then:

S_Var = ( 2µγσ^3 + (κ−1)σ^4 ) ab + 2µγσ^3 b^2   and   S_Bias = b^2 σ^4.

It is convenient to write the criterion for the existence of resample sizes with beneficial effect on the MSE as S_Var/S_Bias + 1 < 0:

( 2(µ/σ)γ + (κ−1) ) (a/b) + ( 2(µ/σ)γ + 1 ) < 0.

If µ = 0 or γ = 0, this simplifies to

(κ−1) (a/b) + 1 < 0.

Since κ > 1 for all distributions except a balanced 2-point mass, the condition becomes a/b < −1/(κ−1). For a = 1, b = −1, that is, the empirical variance U = mean(x^2) − mean(x)^2, beneficial effects of bagging exist iff κ > 2. For a = 0, that is, the squared mean U = mean(x)^2, no beneficial effects exist.
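The closed forms of this example are easily coded; the following sketch (ours) returns S_Var and S_Bias from the four moments and reproduces the κ > 2 threshold for the empirical variance.

```python
def s_var_s_bias_quadratic(a, b, mu, sigma, gamma, kappa):
    """S_Var and S_Bias for U = a*mean(x^2) + b*mean(x)^2 (Section 4.5):
    A_x = a*x^2, B_{x,y} = b*x*y."""
    s_var = (2*mu*gamma*sigma**3 + (kappa - 1)*sigma**4) * a * b \
            + 2*mu*gamma*sigma**3 * b**2
    s_bias = b**2 * sigma**4
    return s_var, s_bias

# Empirical variance (a = 1, b = -1): beneficial iff kappa > 2.
for kappa in (1.8, 3.0):    # kurtosis of the uniform vs. the normal
    sv, sb = s_var_s_bias_quadratic(1, -1, mu=0, sigma=1, gamma=0, kappa=kappa)
    print(kappa, "beneficial:", sv + sb < 0)   # False for 1.8, True for 3.0
```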

5. Simulation Experiment

The principal purpose of the experiments presented here is to demonstrate the correspondence between resampling with and without replacement in the non-trivial setting of bagging CART trees.

Scenarios. We consider four scenarios, differing in the size N of the training sample, the dimension p of the predictor space, the noise variance σ^2, the number K of leaves of the CART tree, and the true regression function f. The scenarios are adapted from Friedman and Hall (2000).

Scenario   N     p    X           σ^2   K    f(x)
1          800   1    U[0,1]      1     2    I(x > 0.5)
2          800   1    U[0,1]      1     2    x
3                10   U[0,1]^10         50   Σ_i I(x_i > 0.3)
4                10   U[0,1]^10         50   Σ_i i x_i

We grew all trees in Scenarios 3 and 4 in a stagewise forward manner without pruning; at each stage we split the node that resulted in the largest reduction of the residual sum of squares, until the desired number of leaves was reached.

Performance Measures. Let T^w/o_α(·; L) be the bagged tree obtained by averaging CART trees grown on resamples of size α N drawn without replacement from a training sample L = (x_1, y_1), ..., (x_N, y_N), and let T^with_α(·; L) be the bagged tree obtained by averaging over resamples of size α N/(1−α) drawn with replacement. The mean squared error (MSE) of T^w/o_α is

MSE(T^w/o_α) = E_X [ E_L ( (T^w/o_α(X; L) − f(X))^2 ) ]
             = E_X [ E_L ( (T^w/o_α(X; L) − E_L T^w/o_α(X; L))^2 ) ] + E_X [ ( E_L T^w/o_α(X; L) − f(X) )^2 ]
             = Var(T^w/o_α) + Bias^2_est(T^w/o_α).

The MSE of T^with_α is defined analogously. Recall that the definition of bias used here is estimation bias, the expected difference between the estimated regression function for a finite sample size and the true regression function. As pointed out in the introduction, this is different from plug-in bias, the expected difference between the value of a statistic for a finite sample size and its value for infinite sample size, which was analyzed in the earlier sections of the article. CART trees with a fixed number of leaves and their bagging averages are not in general consistent estimates of the true regression function, and in cases where they are not, as in Scenarios (2) and (4) above, the two notions of bias differ.
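The data-generating side of these scenarios can be sketched as follows. The code is ours; in particular, taking the sums in Scenarios 3 and 4 over all p = 10 coordinates is our assumption, since the sum ranges are not recoverable from the source.

```python
import numpy as np

def make_training_sample(f, n, p, sigma=1.0, seed=None):
    """Draw X ~ U[0,1]^p and y = f(X) + sigma * Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, p))
    y = f(X) + sigma * rng.normal(size=n)
    return X, y

f1 = lambda X: (X[:, 0] > 0.5).astype(float)         # Scenario 1: step
f2 = lambda X: X[:, 0]                               # Scenario 2: smooth ramp
f3 = lambda X: (X > 0.3).sum(axis=1).astype(float)   # Scenario 3 (our reading)
f4 = lambda X: (np.arange(1, X.shape[1] + 1) * X).sum(axis=1)  # Scenario 4 (our reading)

X, y = make_training_sample(f1, n=800, p=1, seed=0)
```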

Operational details of the experiment. We estimated plug-in bias, estimation bias, variance, and MSE for α = 0.1, 0.2, ..., 0.9, 0.95, 0.99, 1; α = 1 corresponds to unbagged CART. Estimates were obtained by averaging over 100 training samples and 10,000 test observations. We approximated the bagged trees T^w/o_α(·; L) and T^with_α(·; L) by averaging over 50 resamples. A finite number of resamples adds a significant variance component to the Monte Carlo estimates of Var(T^w/o_α) and Var(T^with_α). We adjusted the estimates by removing this component. The influence on bias is of smaller order. To calculate the plug-in bias we need to know the CART tree for infinite training sample size. In Scenarios 1 and 3 this is not a problem because the trees are consistent estimates of the true regression functions. In Scenarios 2 and 4 we approximated the tree for infinite training sample size by a tree grown on a training sample of size n = 100,000.

Simulation results. Figure 5.2 summarizes the simulation results for Scenario 1. The top panels show variance, squared plug-in bias, and squared estimation bias as functions of the resampling fraction α, for resampling with and without replacement. The bottom panel shows the MSE for both resampling modes, and variance and squared estimation bias for sampling with replacement only. To make the tick mark labels more readable, vertical scales in all the panels are relative to the MSE of the unbagged tree.

We note that variance decreases monotonically with decreasing resampling fraction, which confirms the intuition that smaller resample size means more averaging. Estimation bias and plug-in bias agree because a tree with two leaves is a consistent estimate of the true regression function, which in this scenario is a step function. Squared plug-in bias increases with decreasing resampling fraction, as predicted by the theory presented in Sections 4.1 through 4.3.

Figure 5.3 shows the corresponding results for Scenario 2. Again, variance is decreasing with decreasing resampling fraction, and squared plug-in bias is increasing, as predicted by the theory. Squared estimation bias, however, is decreasing with decreasing resampling fraction. Bagging therefore conveys a double benefit, decreasing both variance and squared (estimation) bias. The explanation is simple: a bagged CART tree is smoother than the corresponding unbagged tree, because bagging smoothes out the discontinuities of a piecewise constant model. If the true regression function is smooth, smoothing the estimate can be expected to be beneficial. Admittedly, the scenario considered here is highly unrealistic, but the beneficial effect can also be expected in more realistic situations, like Scenario 4 discussed below.

Scenario 3 is analogous to Scenario 1, with 10-dimensional instead of one-dimensional predictor space. The true regression function is piecewise constant and can be consistently estimated by a CART tree with 50 leaves. The results, shown in Figure 5.4, parallel those for Scenario 1.

The results for Scenario 4, shown in Figure 5.5, closely parallel those for Scenario 2. Again, both variance and squared (estimation) bias decrease with decreasing resampling fraction.

The experiments confirm the agreement between bagging with and without replacement predicted by the theory developed in Section 3: bagging without replacement with resample size N α gives almost the same results in terms of bias, variance, and MSE as bagging with replacement with resample size N α/(1−α). The experiments also confirm that bagging does increase squared plug-in bias. However, the relevant quantity in a regression context is estimation bias: if the true regression function is smooth, bagging can in fact reduce estimation bias as well as variance and therefore yield a double benefit.
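The resampling-mode comparison in these experiments can be sketched as follows, here using scikit-learn's best-first tree growing as a stand-in for the stagewise forward CART fitting described above (the code, the names, and this substitution are ours, not the authors' implementation).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_tree_predict(X_train, y_train, X_test, alpha, replace,
                        n_leaves=2, n_resamples=50, seed=None):
    """Average trees over resamples of size alpha*N (without replacement)
    or alpha/(1-alpha)*N (with replacement), the matched sizes of Section 3."""
    assert 0 < alpha < 1
    rng = np.random.default_rng(seed)
    n = len(X_train)
    m = int(round(alpha / (1 - alpha) * n)) if replace else int(round(alpha * n))
    preds = np.zeros(len(X_test))
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=replace)
        tree = DecisionTreeRegressor(max_leaf_nodes=n_leaves)
        preds += tree.fit(X_train[idx], y_train[idx]).predict(X_test)
    return preds / n_resamples
```

For matched α, the two calls (replace=True versus replace=False) should produce nearly identical bias, variance, and MSE curves, which is the equivalence the figures below display.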

[Figure: four panels plotted against α. Top: "Scenario 1, Variance", "Scenario 1, Squared Plug-in Bias", "Scenario 1, Squared Estimation Bias"; bottom: "Scenario 1, Mean Squared Error"; legends distinguish resampling with and without replacement.]

Figure 5.2: Simulation results for Scenario 1. Top panels: variance, squared plug-in bias, and squared estimation bias for resampling with and without replacement. Bottom panel: MSE for both resampling modes, and variance and squared estimation bias for resampling with replacement.

[Figure: four panels plotted against α. Top: "Scenario 2, Variance", "Scenario 2, Squared Plug-in Bias", "Scenario 2, Squared Estimation Bias"; bottom: "Scenario 2, Mean Squared Error"; legends distinguish resampling with and without replacement.]

Figure 5.3: Simulation results for Scenario 2. Top panels: variance, squared plug-in bias, and squared estimation bias for resampling with and without replacement. Bottom panel: MSE for both resampling modes, and variance and squared estimation bias for resampling with replacement.

[Figure: four panels plotted against α. Top: "Scenario 3, Variance", "Scenario 3, Squared Plug-in Bias", "Scenario 3, Squared Estimation Bias"; bottom: "Scenario 3, Mean Squared Error"; legends distinguish resampling with and without replacement.]

Figure 5.4: Simulation results for Scenario 3. Top panels: variance, squared plug-in bias, and squared estimation bias for resampling with and without replacement. Bottom panel: MSE for both resampling modes, and variance and squared estimation bias for resampling with replacement.

[Figure: four panels plotted against α. Top: "Scenario 4, Variance", "Scenario 4, Squared Plug-in Bias", "Scenario 4, Squared Estimation Bias"; bottom: "Scenario 4, Mean Squared Error"; legends distinguish resampling with and without replacement.]

Figure 5.5: Simulation results for Scenario 4. Top panels: variance, squared plug-in bias, and squared estimation bias for resampling with and without replacement. Bottom panel: MSE for both resampling modes, and variance and squared estimation bias for resampling with replacement.

6. Summary and Conclusions

We studied the effects of bagging for U-statistics of any order and finite sums thereof. U-statistics of high order can describe complex data dependencies, and yet they admit a rigorous asymptotic analysis. The findings are as follows:

The effects of bagging on variance, squared bias and mean squared error are of order N^{-2}. (The following statements are all meant to second order.)

If one allows bootstrap samples with or without replacement and arbitrary resample sizes, then bagging based on sampling with replacement for resample size M_with is equivalent to bagging based on sampling without replacement for resample size M_w/o if N/M_with = N/M_w/o − 1 = g (> 0, < ∞). While our derivation is limited to U-statistics and sums thereof, the equivalence seems to hold more widely, as illustrated by our experiments with bagging CART trees.

Var(bagged) − Var(raw) is a linear function of g; bagging improves variance if the slope is negative.

Bias^2(bagged) − Bias^2(raw) is a positive quadratic function of g; bagging hence always increases squared bias.

MSE(bagged) − MSE(raw) is a quadratic function of g; bagging may or may not improve mean squared error, and if it does, it is for sufficiently small g, that is, sufficiently large resample sizes M.

While our theory does not explain the striking magnitude of the improvements seen when bagging CART trees, there is qualitative agreement between our theoretical findings and the experimental results. Plug-in bias increases and variance decreases with decreasing resample size, as predicted by the theory, and the theoretically predicted equivalence between bagging with and without replacement is indeed observed in the experiments.

7. Appendix

7.1 Summation Patterns for U-Statistics

The calculations for U-statistics in this and the following sections are reminiscent of those found in Hoeffding (1948). We introduce notation for statistical functionals that are interactions of order J and K, respectively:

B = N^{-J} Σ_µ B_µ,      C = N^{-K} Σ_ν C_ν,

where µ = (µ_1, ..., µ_J) ∈ {1, ..., N}^J, B_µ = B_{X_{µ_1},...,X_{µ_J}}, and ν = (ν_1, ..., ν_K) ∈ {1, ..., N}^K, C_ν = C_{X_{ν_1},...,X_{ν_K}}.

We assume the functions B_{x_1,...,x_J} and C_{y_1,...,y_K} to be permutation symmetric in their arguments, the random variables X_1, ..., X_N to be i.i.d., and the second moments of B_µ and C_ν to exist for all µ and ν. We do not limit the summations to distinct indices, as is usual in the context of von Mises expansions rather than that of U-statistics. One reason is that we wish B and C to be plug-in estimates of the functionals E B_{1,...,J} and E C_{1,...,K}. Another reason is that bagging produces lower order interactions from higher order ones, as we will see.

In what follows we will need to partition sums such as Σ_µ according to how many indexes appear multiple times in µ = (µ_1, ..., µ_J). To this end, we introduce t(µ) as the number of essential ties in µ:

t(µ) = #{ (i, j) | i < j, µ_i = µ_j, µ_i ∉ {µ_1, ..., µ_{i−1}} }.

The sub-index i marks the first appearance of the index value µ_i, and all other µ_j equal to µ_i are counted relative to i. For example, µ = (1, 1, 2, 1, 2) has three essential ties: µ_1 = µ_2, µ_1 = µ_4, and µ_3 = µ_5; the tie µ_2 = µ_4 is inessential because it can be inferred from the essential ties.
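The count t(µ) of essential ties is simple to compute; this small sketch (ours) implements the definition directly and reproduces the example.

```python
def essential_ties(mu):
    """t(mu): number of pairs (i, j), i < j, with mu_i == mu_j such that
    position i is the first occurrence of that index value."""
    t = 0
    for i, v in enumerate(mu):
        if v not in mu[:i]:                       # i is a first occurrence
            t += sum(1 for w in mu[i + 1:] if w == v)
    return t

print(essential_ties((1, 1, 2, 1, 2)))  # 3: the ties (1,2), (1,4), (3,5)
```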

An important observation concerns the counts of indexes with a given number of essential ties. The following will be used repeatedly:

#{ µ | t(µ) = 0 } = (N)_J = O(N^J),
#{ µ | t(µ) = 1 } = \binom{J}{2} (N)_{J−1} = O(N^{J−1}),
#{ µ | t(µ) ≥ 2 } = O(N^{J−2}),

where (N)_L := N!/(N−L)! = N(N−1)···(N−L+1) denotes the falling factorial.

Another notation we need is the number c(µ, ν) of essential cross-ties between µ and ν:

c(µ, ν) = #{ (i, j) | µ_i = ν_j, µ_i ∉ {µ_1, ..., µ_{i−1}}, ν_j ∉ {ν_1, ..., ν_{j−1}} }.

We exclude inessential cross-ties that can be inferred from the ties within µ and ν. For example, for µ = (1, 2, 1) and ν = (3, 1) the only essential cross-tie is µ_1 = ν_2; the remaining cross-tie µ_3 = ν_2 is inessential because it can be inferred from the essential tie µ_1 = µ_3 within µ. With these definitions we have the following fact for the number of essential ties of the concatenated sequence (µ, ν):

t((µ, ν)) = t(µ) + t(ν) + c(µ, ν).

7.2 Covariance of General Interactions

In expanding the covariance between B and C, we note that the terms with zero cross-ties between µ and ν vanish due to independence. Thus:

Cov(B, C) = N^{-(J+K)} Σ_{c(µ,ν)>0} Cov(B_µ, C_ν).

Because #{ (µ, ν) | c(µ, ν) > 0 } is of order O(N^{J+K−1}) (a crude upper bound is JK N^{J+K−1}), it follows that Cov(B, C) is of order O(N^{-1}), as it should be.

We now show that in order to capture the terms of order N^{-1} and N^{-2} in Cov(B, C) it is sufficient to limit the summation to those (µ, ν) that satisfy either t(µ) = 0, t(ν) = 0 and c(µ, ν) = 1, or t(µ) = 1, t(ν) = 0 and c(µ, ν) = 1, or t(µ) = 0, t(ν) = 1 and c(µ, ν) = 1, or t(µ) + t(ν) = 0 and c(µ, ν) = 2; that is, t((µ, ν)) ≤ 2 for short. To this end, we note that the number of terms with t((µ, ν)) ≥ 3 is of order N^{J+K−3}. This

is seen from the following crude upper bound:

#{ (µ, ν) | t(µ) + t(ν) + c(µ, ν) ≥ 3 } = #{ (µ, ν) | t((µ, ν)) ≥ 3 }
 ≤ [ \binom{K+J}{4} + \binom{K+J}{3, 2} + \binom{K+J}{2, 2, 2} ] N^{J+K−3},

where the binomial and multinomial coefficients arise from choosing the index patterns (1,1,1,1), (1,1,1,2,2) and (1,1,2,2,3,3) in all possible ways in a sequence (µ, ν) of length K+J; these three patterns are necessary and sufficient for t((µ, ν)) ≥ 3. Using N^{J+K−3} instead of N(N−1)···(N−(J+K−4)) makes this an upper bound.

With the assumption of finite second moments of B_µ and C_ν for all µ and ν, it follows that the sum of the terms with t((µ, ν)) ≥ 3 is of order O(N^{-3}). Abbreviating, we have:

Cov(B, C) = N^{-(J+K)} Σ_{t((µ,ν))≤2, c(µ,ν)≥1} Cov(B_µ, C_ν) + O(N^{-3})

 = N^{-(J+K)} [ Σ_{t(µ)=0, t(ν)=0, c=1} + Σ_{t(µ)=1, t(ν)=0, c=1} + Σ_{t(µ)=0, t(ν)=1, c=1} + Σ_{t(µ)=t(ν)=0, c=2} ] Cov(B_µ, C_ν) + O(N^{-3})

 = N^{-(J+K)} JK (N)_{J+K−1} Cov(B_{(1,2,...)}, C_{(1,...)})
 + N^{-(J+K)} \binom{J}{2} K (N)_{J+K−2} [ Cov(B_{(1,1,2,...)}, C_{(1,...)}) + (J−2) Cov(B_{(1,1,2,...)}, C_{(2,...)}) ]
 + N^{-(J+K)} \binom{K}{2} J (N)_{J+K−2} [ Cov(B_{(1,...)}, C_{(1,1,2,...)}) + (K−2) Cov(B_{(2,...)}, C_{(1,1,2,...)}) ]
 + N^{-(J+K)} 2 \binom{J}{2} \binom{K}{2} (N)_{J+K−2} Cov(B_{(1,2,...)}, C_{(1,2,...)})
 + O(N^{-3}),

where "..." inside a covariance stands for as many distinct other indices as necessary.

Using (N)_L = N^L − \binom{L}{2} N^{L−1} + O(N^{L−2}), and collecting terms of order O(N^{-3}), the above can be written in a more sightly manner as

Cov(B, C) = (1/N) ( 1 − \binom{J+K−1}{2} (1/N) ) JK Cov(B̄_X, C̄_X)
 + (1/N^2) \binom{J}{2} K ( Cov(B_{X,X}, C̄_X) + (J−2) Cov(B_{X,X,Y}, C̄_Y) )
 + (1/N^2) \binom{K}{2} J ( Cov(B̄_X, C_{X,X}) + (K−2) Cov(B̄_X, C_{X,Y,Y}) )
 + (2/N^2) \binom{J}{2} \binom{K}{2} Cov(B_{X,Y}, C_{X,Y})
 + O(N^{-3})
 =: a/N + b/N^2 + O(N^{-3}).

Here B̄_X := E[ B_{X,Y_2,...,Y_J} | X ] is the kernel with all but one argument integrated out against independent copies; B_{X,X} := E[ B_{X,X,Y_3,...,Y_J} | X ] has a tied pair; B_{X,X,Y} (equivalently written B_{X,Y,Y}, up to naming of the free argument) has a tied pair and one free argument; and B_{X,Y} := E[ B_{X,Y,Z_3,...,Z_J} | X, Y ] has two free arguments, all remaining arguments integrated out. Analogous notation applies to C.

7.3 Moments of Resampling Coefficients

We consider sampling in terms of M draws from N objects {1, ..., N}, with and without replacement. The draws are M exchangeable random variables R_1, ..., R_M, where R_i ∈ {1, ..., N}. Each draw is equally likely: P[R_i = n] = 1/N. For sampling with replacement the draws are independent; for sampling w/o replacement they are dependent, and the joint probabilities are P[R_1 = n_1, R_2 = n_2, ..., R_J = n_J] = 1/(N)_J for distinct n_i's (J ≤ M), and = 0 if ties exist among the n_i's. For resampling one is interested in the count variables

W_{n,M,N} = W_n = Σ_{µ=1,...,M} 1_{[R_µ = n]},

where we drop M and N from the subscripts if they are fixed. We let W = W_{M,N} = (W_1, ..., W_N) and recall:

For resampling with replacement: W ∼ Multinomial(1/N, ..., 1/N; M).
For resampling w/o replacement: W ∼ Hypergeometric(M, N).

For bagging one needs the moments of W. Because of the exchangeability of W for fixed M and N, it is sufficient to consider moments of the form

E[ W_{n=1,M,N}^{i_1} W_{n=2,M,N}^{i_2} ··· W_{n=L,M,N}^{i_L} ].

The following recursion formulae hold for i_1 ≥ 1:

E[ W_{n=1,M,N}^{i_1} W_{n=2,M,N}^{i_2} ··· W_{n=L,M,N}^{i_L} ]
 = (with)   (M/N) E[ (W_{n=1,M−1,N} + 1)^{i_1−1} W_{n=2,M−1,N}^{i_2} ··· W_{n=L,M−1,N}^{i_L} ],
 = (w/o)    (M/N) E[ W_{n=2,M−1,N−1}^{i_2} ··· W_{n=L,M−1,N−1}^{i_L} ].

From these we derive the moments that will be needed below. Recall α = M/N, and g = 1/α for resampling with, g = 1/α − 1 for resampling without, replacement.
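The moment approximations derived next can be checked by simulation. The sketch below is ours (sizes arbitrary); it compares empirical moments of multinomial and hypergeometric counts with the limiting expressions α^L (1 − binom(L,2) g/N) and α^L (g + 1).

```python
import numpy as np
from math import comb

rng = np.random.default_rng(1)
N, M, reps = 40, 20, 100_000
alpha = M / N

def counts(with_replacement):
    if with_replacement:
        return rng.multinomial(M, np.full(N, 1.0 / N))
    w = np.zeros(N)
    w[rng.choice(N, size=M, replace=False)] = 1
    return w

for mode, g in (("with", N / M), ("w/o", N / M - 1)):
    W = np.array([counts(mode == "with") for _ in range(reps)])
    e_w1w2 = (W[:, 0] * W[:, 1]).mean()          # L = 2, no squared factor
    e_w1sq_w2 = (W[:, 0]**2 * W[:, 1]).mean()    # L = 3, one squared factor
    print(mode,
          round(e_w1w2, 4), round(alpha**2 * (1 - comb(2, 2) * g / N), 4),
          round(e_w1sq_w2, 4), round(alpha**3 * (g + 1), 4))
```

The first pair of numbers per mode should agree up to O(N^{-2}) and Monte Carlo error, the second pair up to O(N^{-1}).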

Using repeatedly approximations such as (N)_L = N^L − \binom{L}{2} N^{L−1} + O(N^{L−2}), we obtain:

E[ W_1^{i_1} W_2^{i_2} ··· W_L^{i_L} ] = O(1),

E[ W_1 W_2 ··· W_L ]
 = (with)   (M)_L / N^L,      (w/o)   (M)_L / (N)_L
 = (with)   α^L ( 1 − \binom{L}{2} (1/α) (1/N) ) + O(N^{-2}),      (w/o)   α^L ( 1 − \binom{L}{2} (1/α − 1) (1/N) ) + O(N^{-2})
 = α^L ( 1 − \binom{L}{2} g/N ) + O(N^{-2}),

E[ W_1^2 W_2 ··· W_{L−1} ]
 = (with)   (M)_L / N^L + (M)_{L−1} / N^{L−1},      (w/o)   (M)_{L−1} / (N)_{L−1}
 = (with)   α^L + α^{L−1} + O(N^{-1}),      (w/o)   α^{L−1} + O(N^{-1})

 = α^L (g + 1) + O(N^{-1}).

7.4 Equivalence of Resampling With and Without Replacement

We show the equivalence of resampling with and without replacement to order N^{-2}. To this end we need to distinguish between the resampling sizes M_with and M_w/o, and the corresponding resampling fractions α_with = M_with/N and α_w/o = M_w/o/N. The equivalence holds under the condition

1/α_with = 1/α_w/o − 1 (=: g).

The two types of bagged U-statistics are denoted, respectively, by

B^with = M_with^{-J} Σ_µ E[ W^with_{µ_1} ··· W^with_{µ_J} ] B_µ,
B^w/o  = M_w/o^{-J} Σ_µ E[ W^w/o_{µ_1} ··· W^w/o_{µ_J} ] B_µ.

Bagging differentially reweights the parts of a general interaction in terms of moments of the resampling vector W. The result of bagging is no longer a pure interaction but a general U-statistic, because bagging creates lower-order interactions from higher orders. Recall two facts about the bagging weights, that is, the moments of W: 1) They depend on the structure of the ties in the index vectors µ = (µ_1, ..., µ_J) only; for example, µ = (1, 1, 2) and µ = (3, 1, 3) have the same weights, E[W_1^2 W_2] = E[W_3^2 W_1], due to exchangeability. 2) The moments of W are of order O(1) in N (Subsection 7.3) and hence preserve the orders O(N^{-1}), O(N^{-2}), O(N^{-3}) of the terms considered in Subsection 7.2.

We derive a crude bound on the difference using B_bound = max_µ |B_µ|. Writing w^with_µ = M_with^{-J} E[W^with_{µ_1} ··· W^with_{µ_J}] and w^w/o_µ = M_w/o^{-J} E[W^w/o_{µ_1} ··· W^w/o_{µ_J}] for the bagging weights, we assume the above condition on α_with and α_w/o and obtain:

|B^with − B^w/o| ≤ [ Σ_{t(µ)=0} |w^with_µ − w^w/o_µ| + Σ_{t(µ)=1} |w^with_µ − w^w/o_µ| + Σ_{t(µ)>1} |w^with_µ − w^w/o_µ| ] B_bound

By the moment calculations of Subsection 7.3, and using M^{-J} α^J = N^{-J} for both modes,

w^with_µ, w^w/o_µ = N^{-J} ( 1 − \binom{J}{2} g/N + O(N^{-2}) )   for t(µ) = 0,
w^with_µ, w^w/o_µ = N^{-J} ( (g+1) + O(N^{-1}) )                  for t(µ) = 1,
w^with_µ, w^w/o_µ = N^{-J} O(1)                                   for t(µ) > 1.

Hence

|B^with − B^w/o| ≤ [ Σ_{t(µ)=0} N^{-J} O(N^{-2}) + Σ_{t(µ)=1} N^{-J} O(N^{-1}) + Σ_{t(µ)>1} N^{-J} O(1) ] B_bound
 = [ O(N^J) N^{-J} O(N^{-2}) + O(N^{J−1}) N^{-J} O(N^{-1}) + O(N^{J−2}) N^{-J} O(1) ] B_bound
 = O(N^{-2}) B_bound.

This proves the per-sample equivalence of bagging based on resampling with and without replacement up to order O(N^{-2}). The result is somewhat unsatisfactory because the bound depends on the extremes of the U-terms B_µ, which tend to infinity as N → ∞, unless B_µ is bounded. Other bounds at a weaker rate can be obtained with the Hölder inequality:

|B^with − B^w/o| ≤ O(N^{-2/p}) ( N^{-J} Σ_µ |B_µ|^q )^{1/q}   for 1/p + 1/q = 1.

This specializes to the previously derived bound when p = 1 and q = ∞, for which the best rate of O(N^{-2}) is obtained, albeit under the strongest assumptions on B_µ.

7.5 Covariances of Bagged Interactions

Resuming the calculations begun in Subsection 7.2 for covariances of unbagged interaction terms, we now derive the covariance of their M-bagged versions:

B_bag = M^{-J} Σ_µ E[W_{µ_1} ··· W_{µ_J}] B_µ,      C_bag = M^{-K} Σ_ν E[W_{ν_1} ··· W_{ν_K}] C_ν.

The moment calculations of Subsection 7.3 yield the following:

Cov(B_bag, C_bag)
 = M^{-(J+K)} Σ_{t((µ,ν))≤2, c(µ,ν)≥1} E[W_{µ_1} ··· W_{µ_J}] E[W_{ν_1} ··· W_{ν_K}] Cov(B_µ, C_ν) + O(N^{-3})
 = N^{-(J+K)} JK (N)_{J+K−1} ( 1 − \binom{J}{2} g/N ) ( 1 − \binom{K}{2} g/N ) Cov(B̄_X, C̄_X)
 + N^{-(J+K)} \binom{J}{2} K (N)_{J+K−2} (g+1) [ Cov(B_{X,X}, C̄_X) + (J−2) Cov(B_{X,X,Y}, C̄_Y) ]
 + N^{-(J+K)} \binom{K}{2} J (N)_{J+K−2} (g+1) [ Cov(B̄_X, C_{X,X}) + (K−2) Cov(B̄_X, C_{X,Y,Y}) ]
 + N^{-(J+K)} 2 \binom{J}{2} \binom{K}{2} (N)_{J+K−2} Cov(B_{X,Y}, C_{X,Y})
 + O(N^{-3})

 = (1/N) JK Cov(B̄_X, C̄_X)
 − (1/N^2) [ \binom{J+K−1}{2} + ( \binom{J}{2} + \binom{K}{2} ) g ] JK Cov(B̄_X, C̄_X)
 + (1/N^2) (g+1) \binom{J}{2} K [ Cov(B_{X,X}, C̄_X) + (J−2) Cov(B_{X,X,Y}, C̄_Y) ]
 + (1/N^2) (g+1) \binom{K}{2} J [ Cov(B̄_X, C_{X,X}) + (K−2) Cov(B̄_X, C_{X,Y,Y}) ]
 + (2/N^2) \binom{J}{2} \binom{K}{2} Cov(B_{X,Y}, C_{X,Y})
 + O(N^{-3}).

The last lines form the final result of these calculations.

7.6 Difference Between Variances of Bagged and Unbagged Statistics

Comparing the results of Subsections 7.2 and 7.5, we get:

Cov(B_bag, C_bag) − Cov(B, C)
 = − (1/N^2) g ( \binom{J}{2} + \binom{K}{2} ) JK Cov(B̄_X, C̄_X)
 + (1/N^2) g \binom{J}{2} K ( Cov(B_{X,X}, C̄_X) + (J−2) Cov(B_{X,X,Y}, C̄_Y) )
 + (1/N^2) g \binom{K}{2} J ( Cov(B̄_X, C_{X,X}) + (K−2) Cov(B̄_X, C_{X,Y,Y}) )
 + O(N^{-3})
 = (2/N^2) g S_Var(B, C) + O(N^{-3}),

where

S_Var(B, C) = \binom{J}{2} (K/2) Cov( C̄_X, B_{X,X} + (J−2) B_{X,Y,Y} − J B̄_X )
            + \binom{K}{2} (J/2) Cov( B̄_X, C_{X,X} + (K−2) C_{X,Y,Y} − K C̄_X ).

The expression for S_Var(B, C) remains correct for J and K as low as 1, in which case one interprets \binom{J}{2} = 0 and B_{X,X} = 0 when J = 1, and B_{X,Y,Y} = 0 when J ≤ 2, and similarly for C when K = 1 or 2.

The result generalizes to arbitrary finite sums of interactions

U = A + B + C + ... = (1/N) Σ_i A_i + (1/N^2) Σ_{i,j} B_{i,j} + (1/N^3) Σ_{i,j,k} C_{i,j,k} + ....

Because S_Var(B, C) is a bilinear form in its arguments, the corresponding constant S_Var(U) for sums of U-statistics can be expanded as follows:

S_Var(U) = S_Var(A, A) + 2 S_Var(A, B) + S_Var(B, B) + 2 S_Var(A, C) + 2 S_Var(B, C) + S_Var(C, C) + ...,

so that

Var(U_bag) − Var(U) = (2/N^2) g S_Var(U) + O(N^{-3}).

For example, a functional consisting of first and second order terms,

U = A + B = (1/N) Σ_i A_i + (1/N^2) Σ_{i,j} B_{i,j},

yields

S_Var(U) = S_Var(A, A) + 2 S_Var(A, B) + S_Var(B, B)
         = Cov(A_X, B_{X,X} − 2 B̄_X) + Cov(2 B̄_X, B_{X,X} − 2 B̄_X)
         = Cov(A_X + 2 B̄_X, B_{X,X} − 2 B̄_X).

Note that S_Var(A, A) = 0 because bagging leaves additive statistics unchanged.

7.7 Difference Between Squared Biases of Bagged and Unbagged Statistics

We consider a single K-th order interaction first, with functional and plug-in

U(F) = E C_{(1,2,...,K)},      U(F_N) = N^{-K} Σ_{ν_1,...,ν_K=1}^{N} C_{(ν_1,...,ν_K)}.

[Recall that C_ν and C_{(ν_1,...,ν_K)} are short for C_{X_{ν_1},...,X_{ν_K}}.] The functional U(F) plays the role of the parameter to be estimated by the statistic U = U(F_N), so that the notion of bias applies. We first calculate the bias for the unbagged statistic U and second for the bagged statistic U_bag. Note that E C̄_X = E C_{1,...,K} = U(F).

E[U(F_N)] = N^{-K} Σ_{ν_1,...,ν_K} E C_{(ν_1,...,ν_K)}
 = N^{-K} [ (N)_K E C_{(1,...,K)} + \binom{K}{2} (N)_{K−1} E C_{(1,1,2,...,K−1)} ] + O(N^{-2})
 = U(F) + (1/N) \binom{K}{2} ( E C_{X,X} − E C_{X,Y} ) + O(N^{-2}).

Now for the bias of the bagged statistic:

E U_bag = M^{-K} Σ_{ν_1,...,ν_K} E[W_{ν_1} ··· W_{ν_K}] E C_{(ν_1,...,ν_K)}
 = N^{-K} (N)_K ( 1 − \binom{K}{2} g/N ) E C_{(1,...,K)} + N^{-K} \binom{K}{2} (N)_{K−1} (g+1) E C_{(1,1,2,...,K−1)} + O(N^{-2})

 = U(F) − (1/N) \binom{K}{2} (g+1) E C_{(1,...,K)} + (1/N) \binom{K}{2} (g+1) E C_{(1,1,2,...,K−1)} + O(N^{-2})
 = U(F) + (1/N) \binom{K}{2} (g+1) ( E C_{X,X} − E C_{X,Y} ) + O(N^{-2}),

where E C_{X,X} := E C_{(1,1,2,...,K−1)} and E C_{X,Y} := E C_{(1,...,K)}. Thus:

Bias(U_bag) = (1/N) \binom{K}{2} (g+1) ( E C_{X,X} − E C_{X,Y} ) + O(N^{-2}).

As for variances, we can now consider more general statistics that are finite sums of interactions:

U = A + B + C + ... = (1/N) Σ_i A_i + (1/N^2) Σ_{i,j} B_{i,j} + (1/N^3) Σ_{i,j,k} C_{i,j,k} + ....

The final result is:

Bias^2(U_bag) − Bias^2(U)
 = (1/N^2) ( (g+1)^2 − 1 ) [ ( E B_{X,X} − E B_{X,Y} ) + 3 ( E C_{X,X,Y} − E C_{X,Y,Z} ) + ... ]^2 + O(N^{-3}).

As usual, g = 1/α for sampling with, and g = 1/α − 1 for sampling w/o, replacement.

References

Breiman, L. (1996). Bagging predictors. Machine Learning 24, 123-140.

Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, California.

Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30, 927-961.

Chen, S. X. and Hall, P. (2003). Effects of bagging and bias correction on estimators defined by estimating equations. Statistica Sinica 13, 97-109.

Friedman, J. H. and Hall, P. (2000). On bagging and nonlinear estimation. Technical report, Stanford University (available from the first author's web page, ~jhf/#reports).

Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293-325.

Knight, K. and Bassett, G. W., Jr. (2002). Second order improvements of sample quantiles using subsamples. Unpublished manuscript.

Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.

Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, U.S.A.
E-mail: buja@wharton.upenn.edu

Department of Statistics, Box 354322, University of Washington, Seattle, WA, U.S.A.
E-mail: wxs@stat.washington.edu


More information

v v Downloaded 01/11/16 to Redistribution subject to SEG license or copyright; see Terms of Use at

v v Downloaded 01/11/16 to Redistribution subject to SEG license or copyright; see Terms of Use at The pseudo-analytical method: application of pseudo-laplacians to acoustic and acoustic anisotropic wae propagation John T. Etgen* and Serre Brandsberg-Dahl Summary We generalize the pseudo-spectral method

More information

Inelastic Collapse in One Dimensional Systems with Ordered Collisions

Inelastic Collapse in One Dimensional Systems with Ordered Collisions Lake Forest College Lake Forest College Publications Senior Theses Student Publications 4-4-05 Inelastic Collapse in One Dimensional Systems with Ordered Collisions Brandon R. Bauerly Lake Forest College,

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

A Bias Correction for the Minimum Error Rate in Cross-validation

A Bias Correction for the Minimum Error Rate in Cross-validation A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.

More information

4. A Physical Model for an Electron with Angular Momentum. An Electron in a Bohr Orbit. The Quantum Magnet Resulting from Orbital Motion.

4. A Physical Model for an Electron with Angular Momentum. An Electron in a Bohr Orbit. The Quantum Magnet Resulting from Orbital Motion. 4. A Physical Model for an Electron with Angular Momentum. An Electron in a Bohr Orbit. The Quantum Magnet Resulting from Orbital Motion. We now hae deeloped a ector model that allows the ready isualization

More information

Isoperimetric problems

Isoperimetric problems CHAPTER 2 Isoperimetric problems 2.1. History One of the earliest problems in geometry was the isoperimetric problem, which was considered by the ancient Greeks. The problem is to find, among all closed

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

The optimal pebbling number of the complete m-ary tree

The optimal pebbling number of the complete m-ary tree Discrete Mathematics 222 (2000) 89 00 www.elseier.com/locate/disc The optimal pebbling number of the complete m-ary tree Hung-Lin Fu, Chin-Lin Shiue Department of Applied Mathematics, National Chiao Tung

More information

Towards Green Distributed Storage Systems

Towards Green Distributed Storage Systems Towards Green Distributed Storage Systems Abdelrahman M. Ibrahim, Ahmed A. Zewail, and Aylin Yener Wireless Communications and Networking Laboratory (WCAN) Electrical Engineering Department The Pennsylania

More information

A Regularization Framework for Learning from Graph Data

A Regularization Framework for Learning from Graph Data A Regularization Framework for Learning from Graph Data Dengyong Zhou Max Planck Institute for Biological Cybernetics Spemannstr. 38, 7076 Tuebingen, Germany Bernhard Schölkopf Max Planck Institute for

More information

A spectral Turán theorem

A spectral Turán theorem A spectral Turán theorem Fan Chung Abstract If all nonzero eigenalues of the (normalized) Laplacian of a graph G are close to, then G is t-turán in the sense that any subgraph of G containing no K t+ contains

More information

Understanding Generalization Error: Bounds and Decompositions

Understanding Generalization Error: Bounds and Decompositions CIS 520: Machine Learning Spring 2018: Lecture 11 Understanding Generalization Error: Bounds and Decompositions Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the

More information

RANDOMIZING OUTPUTS TO INCREASE PREDICTION ACCURACY

RANDOMIZING OUTPUTS TO INCREASE PREDICTION ACCURACY 1 RANDOMIZING OUTPUTS TO INCREASE PREDICTION ACCURACY Leo Breiman Statistics Department University of California Berkeley, CA. 94720 leo@stat.berkeley.edu Technical Report 518, May 1, 1998 abstract Bagging

More information

Introduction to Thermodynamic Cycles Part 1 1 st Law of Thermodynamics and Gas Power Cycles

Introduction to Thermodynamic Cycles Part 1 1 st Law of Thermodynamics and Gas Power Cycles Introduction to Thermodynamic Cycles Part 1 1 st Law of Thermodynamics and Gas Power Cycles by James Doane, PhD, PE Contents 1.0 Course Oeriew... 4.0 Basic Concepts of Thermodynamics... 4.1 Temperature

More information

A comparative study of ordinary cross-validation, r-fold cross-validation and the repeated learning-testing methods

A comparative study of ordinary cross-validation, r-fold cross-validation and the repeated learning-testing methods Biometrika (1989), 76, 3, pp. 503-14 Printed in Great Britain A comparative study of ordinary cross-validation, r-fold cross-validation and the repeated learning-testing methods BY PRABIR BURMAN Division

More information

Econometrics II - EXAM Outline Solutions All questions have 25pts Answer each question in separate sheets

Econometrics II - EXAM Outline Solutions All questions have 25pts Answer each question in separate sheets Econometrics II - EXAM Outline Solutions All questions hae 5pts Answer each question in separate sheets. Consider the two linear simultaneous equations G with two exogeneous ariables K, y γ + y γ + x δ

More information

A-level Mathematics. MM03 Mark scheme June Version 1.0: Final

A-level Mathematics. MM03 Mark scheme June Version 1.0: Final -leel Mathematics MM0 Mark scheme 660 June 0 Version.0: Final Mark schemes are prepared by the Lead ssessment Writer and considered, together with the releant questions, by a panel of subject teachers.

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Bc. Dominik Lachman. Bruhat-Tits buildings

Bc. Dominik Lachman. Bruhat-Tits buildings MASTER THESIS Bc. Dominik Lachman Bruhat-Tits buildings Department of Algebra Superisor of the master thesis: Study programme: Study branch: Mgr. Vítězsla Kala, Ph.D. Mathematics Mathematical structure

More information

6.3 Vectors in a Plane

6.3 Vectors in a Plane 6.3 Vectors in a Plane Plan: Represent ectors as directed line segments. Write the component form of ectors. Perform basic ector operations and represent ectors graphically. Find the direction angles of

More information

ERAD THE SEVENTH EUROPEAN CONFERENCE ON RADAR IN METEOROLOGY AND HYDROLOGY

ERAD THE SEVENTH EUROPEAN CONFERENCE ON RADAR IN METEOROLOGY AND HYDROLOGY Multi-beam raindrop size distribution retrieals on the oppler spectra Christine Unal Geoscience and Remote Sensing, TU-elft Climate Institute, Steinweg 1, 68 CN elft, Netherlands, c.m.h.unal@tudelft.nl

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

BAGGING PREDICTORS AND RANDOM FOREST

BAGGING PREDICTORS AND RANDOM FOREST BAGGING PREDICTORS AND RANDOM FOREST DANA KANER M.SC. SEMINAR IN STATISTICS, MAY 2017 BAGIGNG PREDICTORS / LEO BREIMAN, 1996 RANDOM FORESTS / LEO BREIMAN, 2001 THE ELEMENTS OF STATISTICAL LEARNING (CHAPTERS

More information

Optimal Joint Detection and Estimation in Linear Models

Optimal Joint Detection and Estimation in Linear Models Optimal Joint Detection and Estimation in Linear Models Jianshu Chen, Yue Zhao, Andrea Goldsmith, and H. Vincent Poor Abstract The problem of optimal joint detection and estimation in linear models with

More information

A possible mechanism to explain wave-particle duality L D HOWE No current affiliation PACS Numbers: r, w, k

A possible mechanism to explain wave-particle duality L D HOWE No current affiliation PACS Numbers: r, w, k A possible mechanism to explain wae-particle duality L D HOWE No current affiliation PACS Numbers: 0.50.-r, 03.65.-w, 05.60.-k Abstract The relationship between light speed energy and the kinetic energy

More information

Cases of integrability corresponding to the motion of a pendulum on the two-dimensional plane

Cases of integrability corresponding to the motion of a pendulum on the two-dimensional plane Cases of integrability corresponding to the motion of a pendulum on the two-dimensional plane MAXIM V. SHAMOLIN Lomonoso Moscow State Uniersity Institute of Mechanics Michurinskii Ae.,, 99 Moscow RUSSIAN

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne

More information

Boundary Element Method Calculation of Moment Transfer Parameters in Slab-Column Connections

Boundary Element Method Calculation of Moment Transfer Parameters in Slab-Column Connections ACI STRUCTURAL JOURNAL Title no. 107-S16 TECHNICAL PAPER Boundary Element Method Calculation of Moment Transfer Parameters in Slab-Column Connections by Mohammed A. Nazief, Youssef F. Rashed, and Wael

More information

An Optimal Split-Plot Design for Performing a Mixture-Process Experiment

An Optimal Split-Plot Design for Performing a Mixture-Process Experiment Science Journal of Applied Mathematics and Statistics 217; 5(1): 15-23 http://www.sciencepublishinggroup.com/j/sjams doi: 1.11648/j.sjams.21751.13 ISSN: 2376-9491 (Print); ISSN: 2376-9513 (Online) An Optimal

More information

(a) During the first part of the motion, the displacement is x 1 = 40 km and the time interval is t 1 (30 km / h) (80 km) 40 km/h. t. (2.

(a) During the first part of the motion, the displacement is x 1 = 40 km and the time interval is t 1 (30 km / h) (80 km) 40 km/h. t. (2. Chapter 3. Since the trip consists of two parts, let the displacements during first and second parts of the motion be x and x, and the corresponding time interals be t and t, respectiely. Now, because

More information

Integer Parameter Synthesis for Real-time Systems

Integer Parameter Synthesis for Real-time Systems 1 Integer Parameter Synthesis for Real-time Systems Aleksandra Joanoić, Didier Lime and Oliier H. Roux École Centrale de Nantes - IRCCyN UMR CNRS 6597 Nantes, France Abstract We proide a subclass of parametric

More information

Dyadic Classification Trees via Structural Risk Minimization

Dyadic Classification Trees via Structural Risk Minimization Dyadic Classification Trees via Structural Risk Minimization Clayton Scott and Robert Nowak Department of Electrical and Computer Engineering Rice University Houston, TX 77005 cscott,nowak @rice.edu Abstract

More information

REGRESSION TREE CREDIBILITY MODEL

REGRESSION TREE CREDIBILITY MODEL LIQUN DIAO AND CHENGGUO WENG Department of Statistics and Actuarial Science, University of Waterloo Advances in Predictive Analytics Conference, Waterloo, Ontario Dec 1, 2017 Overview Statistical }{{ Method

More information

A Geometric Review of Linear Algebra

A Geometric Review of Linear Algebra A Geometric Reiew of Linear Algebra The following is a compact reiew of the primary concepts of linear algebra. I assume the reader is familiar with basic (i.e., high school) algebra and trigonometry.

More information

Lectures on Spectral Graph Theory. Fan R. K. Chung

Lectures on Spectral Graph Theory. Fan R. K. Chung Lectures on Spectral Graph Theory Fan R. K. Chung Author address: Uniersity of Pennsylania, Philadelphia, Pennsylania 19104 E-mail address: chung@math.upenn.edu Contents Chapter 1. Eigenalues and the Laplacian

More information

On computing Gaussian curvature of some well known distribution

On computing Gaussian curvature of some well known distribution Theoretical Mathematics & Applications, ol.3, no.4, 03, 85-04 ISSN: 79-9687 (print), 79-9709 (online) Scienpress Ltd, 03 On computing Gaussian curature of some well known distribution William W.S. Chen

More information

The Riemann-Roch Theorem: a Proof, an Extension, and an Application MATH 780, Spring 2019

The Riemann-Roch Theorem: a Proof, an Extension, and an Application MATH 780, Spring 2019 The Riemann-Roch Theorem: a Proof, an Extension, and an Application MATH 780, Spring 2019 This handout continues the notational conentions of the preious one on the Riemann-Roch Theorem, with one slight

More information

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Decision Trees. Tobias Scheffer Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Decision Trees Tobias Scheffer Decision Trees One of many applications: credit risk Employed longer than 3 months Positive credit

More information

9. Integral Ring Extensions

9. Integral Ring Extensions 80 Andreas Gathmann 9. Integral ing Extensions In this chapter we want to discuss a concept in commutative algebra that has its original motivation in algebra, but turns out to have surprisingly many applications

More information

The Full-rank Linear Least Squares Problem

The Full-rank Linear Least Squares Problem Jim Lambers COS 7 Spring Semeseter 1-11 Lecture 3 Notes The Full-rank Linear Least Squares Problem Gien an m n matrix A, with m n, and an m-ector b, we consider the oerdetermined system of equations Ax

More information

Variance Reduction for Stochastic Gradient Optimization

Variance Reduction for Stochastic Gradient Optimization Variance Reduction for Stochastic Gradient Optimization Chong Wang Xi Chen Alex Smola Eric P. Xing Carnegie Mellon Uniersity, Uniersity of California, Berkeley {chongw,xichen,epxing}@cs.cmu.edu alex@smola.org

More information

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava

MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,

More information

Consistency of Nearest Neighbor Methods

Consistency of Nearest Neighbor Methods E0 370 Statistical Learning Theory Lecture 16 Oct 25, 2011 Consistency of Nearest Neighbor Methods Lecturer: Shivani Agarwal Scribe: Arun Rajkumar 1 Introduction In this lecture we return to the study

More information

IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS - PART B 1

IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS - PART B 1 IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS - PART B 1 1 2 IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS - PART B 2 An experimental bias variance analysis of SVM ensembles based on resampling

More information

Optimization Problems in Multiple Subtree Graphs

Optimization Problems in Multiple Subtree Graphs Optimization Problems in Multiple Subtree Graphs Danny Hermelin Dror Rawitz February 28, 2010 Abstract We study arious optimization problems in t-subtree graphs, the intersection graphs of t- subtrees,

More information

Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference

Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference Higher-Order von Mises Expansions, Bagging and Assumption-Lean Inference Andreas Buja joint with: Richard Berk, Lawrence Brown, Linda Zhao, Arun Kuchibhotla, Kai Zhang Werner Stützle, Ed George, Mikhail

More information

Reproducing Kernel Hilbert Spaces

Reproducing Kernel Hilbert Spaces 9.520: Statistical Learning Theory and Applications February 10th, 2010 Reproducing Kernel Hilbert Spaces Lecturer: Lorenzo Rosasco Scribe: Greg Durrett 1 Introduction In the previous two lectures, we

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

PRODUCTS IN CONDITIONAL EXTREME VALUE MODEL

PRODUCTS IN CONDITIONAL EXTREME VALUE MODEL PRODUCTS IN CONDITIONAL ETREME VALUE MODEL RAJAT SUBHRA HAZRA AND KRISHANU MAULIK Abstract. The classical multiariate extreme alue theory tries to capture the extremal dependence between the components

More information

International Journal of Scientific & Engineering Research, Volume 7, Issue 11, November-2016 ISSN Nullity of Expanded Smith graphs

International Journal of Scientific & Engineering Research, Volume 7, Issue 11, November-2016 ISSN Nullity of Expanded Smith graphs 85 Nullity of Expanded Smith graphs Usha Sharma and Renu Naresh Department of Mathematics and Statistics, Banasthali Uniersity, Banasthali Rajasthan : usha.sharma94@yahoo.com, : renunaresh@gmail.com Abstract

More information

Chapter 14 Thermal Physics: A Microscopic View

Chapter 14 Thermal Physics: A Microscopic View Chapter 14 Thermal Physics: A Microscopic View The main focus of this chapter is the application of some of the basic principles we learned earlier to thermal physics. This will gie us some important insights

More information

ISSN BCAM Do Quality Ladders Rationalise the Observed Engel Curves? Vincenzo Merella University of Cagliari.

ISSN BCAM Do Quality Ladders Rationalise the Observed Engel Curves? Vincenzo Merella University of Cagliari. Birkbeck Centre for Applied Macroeconomics ISSN 1745-8587 BCAM 1401 Do Quality Ladders Rationalise the Obsered Engel Cures? incenzo Merella Uniersity of Cagliari March 2014 Birkbeck, Uniersity of London

More information

A Geometric Review of Linear Algebra

A Geometric Review of Linear Algebra A Geometric Reiew of Linear Algebra The following is a compact reiew of the primary concepts of linear algebra. The order of presentation is unconentional, with emphasis on geometric intuition rather than

More information

Management of Power Systems With In-line Renewable Energy Sources

Management of Power Systems With In-line Renewable Energy Sources Management of Power Systems With In-line Renewable Energy Sources Haoliang Shi, Xiaoyu Shi, Haozhe Xu, Ziwei Yu Department of Mathematics, Uniersity of Arizona May 14, 018 Abstract Power flowing along

More information

Hölder-type inequalities and their applications to concentration and correlation bounds

Hölder-type inequalities and their applications to concentration and correlation bounds Hölder-type inequalities and their applications to concentration and correlation bounds Christos Pelekis * Jan Ramon Yuyi Wang Noember 29, 2016 Abstract Let Y, V, be real-alued random ariables haing a

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information