Bounds on the Moments for an Ensemble of Random Decision Trees

Noname manuscript No. (will be inserted by the editor)

Bounds on the Moments for an Ensemble of Random Decision Trees

Amit Dhurandhar

Received: Sep. 17, 2013 / Revised: Mar. 04, 2014 / Accepted: Jun. 30, 2014

Abstract An ensemble of random decision trees is a popular classification technique, especially known for its ability to scale to large domains. In this paper, we provide an efficient strategy to compute bounds on the moments of the generalization error, computed over all datasets of a particular size drawn from an underlying distribution, for this classification technique. Being able to estimate these moments can help us gain insights into the performance of this model. As we will see in the experimental section, these bounds tend to be significantly tighter than the state-of-the-art Breiman's bounds based on strength and correlation and, hence, more useful in practice.

Keywords bounds, random decision trees, moments

1 Introduction

An ensemble of random decision trees is a widely used classification technique [16, 13, 22, 14], especially known for its ability to scale to large domains. This classification technique was introduced in [13], where the authors portrayed the efficacy of the method on real applications. According to the description in [13], the technique can be viewed as a special case of the random forest algorithm given in [7], where an attribute is chosen uniformly at random from the available list of attributes (i.e., from a uniform distribution over the available attributes) to form a node in the tree. In a random forest, a set of attributes (say m attributes) is chosen uniformly at random as potential candidates for a node and then, based on statistical measures such as information gain (IG), gini gain (GG), etc., an attribute is selected from this set. Hence, an ensemble of random decision trees is the special case of a random forest where m = 1. The primary motivation and the advantage of focusing on this variant is that it is highly scalable, since measures such as IG or GG do not have to be computed at every node in the tree. In addition, the prediction accuracy of this model is shown to be quite good in practice [16, 13, 22, 23].

Amit Dhurandhar, IBM T.J. Watson, adhuran@us.ibm.com

Realizing the widespread utility of an ensemble of random decision trees, it is important that we carefully analyze this model. In previous works [11], the authors provided a moment-based methodology to accurately characterize classification models. In particular, they provided generic expressions to compute the first two moments of the generalization error (i.e., the expected error over the entire input), given an underlying probability distribution. These moments were computed over Z(N), which is the set of all possible classifiers produced by training a given classification algorithm over datasets of size N drawn independently and identically distributed (i.i.d.) from some distribution. The challenge was then to customize these expressions to specific algorithms in order to meticulously study them. In [11, 10, 12], these expressions were customized for the Naive Bayes classifier, the nearest neighbor classifier and (single) random decision trees (not an ensemble). It was seen that such expressions have the potential to provide extremely tight characterizations of the behavior of individual classification algorithms as well as certain model selection techniques (viz. cross-validation, hold-out-set estimation). In addition to being accurate, these moments were also very efficient to compute. In [10], one of the future challenges that was laid out was to develop strategies to efficiently and accurately estimate these moments for an ensemble of random decision trees. The strategies and expressions presented in [10] are practical only for single random decision trees since, as we will later see, a straightforward extension of those ideas to an ensemble leads to highly inefficient characterizations.

Estimating moments to evaluate the quality of classifiers is a standard procedure in machine learning [4, 5], where a dataset is split into multiple training and test sets, and the average error with a confidence interval or variance is reported as a measure of performance. Our methodology is an idealized version of this approach where, rather than reporting the error averaged over a limited number of trials, we want to compute the expected error, and maybe in some cases even the variance, over all possible classifiers that would be produced by training over datasets of size N sampled from the underlying distribution. This, as discussed in previous works [11, 10, 12], is a much more robust evaluation of classification algorithms. Recently, other such moment-based methodologies [1-3] have also gained prominence for parameter estimation of models over traditional maximum likelihood and Bayesian approaches, which are inefficient and have a tendency to get stuck in poor local minima.

It is important to stress the fact that the generic expressions provided in [11] are difficult to customize to specific learning techniques. The customization requires analyzing the inner workings of the techniques, which are unique to every learning algorithm. The situation here is analogous to the PAC-Bayes bounds literature [17, 15, 18], where many works have been published describing a way to compute them for different techniques. Moreover, in all the relevant previous works [11, 10, 12], exact and efficient-to-compute formulations were derived for the respective techniques. For an ensemble of random decision trees though, exact formulations based on extensions of the results in [10] lead to highly inefficient characterizations, which are exponential in the number of trees T. Thus, in this paper, for the first time in this line of work, we describe a novel way to find accurate and efficient-to-compute upper bounds.
Non-trivial and efficient-to-compute upper bounds can be very useful in gaining confidence in a method [6]. In fact, the gain in time complexity is exponential in T. This is a significant improvement given that T is usually in the hundreds. The strategies employed to derive this bound bear little resemblance to previous analysis. Consequently, the contributions in this paper are by no means attainable by incremental extensions of prior works.

Table 1 Notation used in the paper.

Symbol          Meaning
X               Random vector modeling the input.
X               Domain of the random vector X (input space).
Y               Random variable modeling the output.
Y(x)            Random variable modeling the output for input x.
Y               Set of class labels (output space).
N               Sample size.
ζ               Classifier.
GE(ζ)           Generalization error of classifier ζ.
Z(N)            The set of classifiers obtained by application of a classification algorithm to i.i.d. samples of size N.
E_Z(N)[·]       Expectation w.r.t. the space of classifiers built on a sample of size N.
d               Dimensionality of the input space.
h               Height of the trees in the ensemble.
path_p          The p-th possible path amongst the total number of allowed paths given by the tree-building algorithm.
ξ(path_p, y)    Count of the number of samples in the training set that correspond to path_p and class y.
η_x^y           Denotes the condition that class y gets the maximum number of votes in classifying x through the respective paths in the ensemble.
m_h(S)          Number of paths of length h that classify x into class y in the sample S.

To summarize, we provide in this paper a method to efficiently obtain bounds on these moments for an ensemble of random decision trees grown to a specified height. We analyze the fixed-height variant, since the main strength of this technique is being able to create an ensemble with high diversity [13]. This is true for most ensemble techniques, as we want to cover different parts of the hypothesis space. Growing trees too deep limits this and, hence, restricting the height is desirable. Note that the case where trees are grown to their full height is just a special case in this setting and thus our analysis is still applicable. Moreover, our analysis does not place any restriction on the number of classes or the number of splits per attribute and is hence applicable to a large class of random decision tree ensembles. Through experiments on synthetic data and bootstrap distributions built on real data, we show that our (upper) bounds are tighter than the widely used strength and correlation bounds introduced in [7].

The rest of the paper is organized as follows. In Section 2, we first describe the ensemble algorithm and then briefly review the technical framework that our analysis is based on. In Section 3, we discuss the issues with directly extending the ideas used for characterizing single random decision trees to an ensemble. We then provide a solution by adopting a different perspective than the one in [10]. Moreover, as in the previous study, our analysis applies to categorical as well as continuous attributes with split points predetermined for each attribute. In Section 4, we compare the bounds derived from this analysis with Breiman's strength and correlation bounds on synthetic and real datasets. We discuss future lines of research and summarize the major developments in the paper in Section 5.

2 Preliminaries

In this section, we first provide a precise description of the algorithm for random decision trees that we analyze here. We then revisit the generic expressions provided in [11] that need to be customized to specific classification algorithms; in this case, an ensemble

of random decision trees. Finally, we provide the customized expressions for random decision trees (not an ensemble) of prespecified height and discuss the intuitions in their derivation.

Algorithm 1 Procedure to build an ensemble of random decision trees.
Input: T, h, F = {f_1, ..., f_d} {T is the # of trees in the ensemble, h is the height and the f_i's are the d attributes.}
Output: W {The ensemble.}
Set W = ∅ {No trees in the ensemble initially.}
for i = 1; i ≤ T; i++ do
  Randomly pick f_i ∈ F {Pick the root.}
  w = {f_i}, w_l = {f_i} {w denotes the tree built so far and w_l the attributes at the leaves of w.}
  for j = 1; j < h; j++ do
    w_t = ∅
    for f ∈ w_l do
      If f has s splits, then randomly pick s attributes F_s ⊆ F, one for each split, such that each attribute is distinct from its ancestors.
      w_t = w_t ∪ F_s
    end for
    w_l = w_t
    w = w ∪ w_l
  end for
  W = W ∪ w
end for
Return W

2.1 Ensemble of Random Decision Trees

In this paper, we consider the following procedure for building each random decision tree in an ensemble of T trees. The root of a tree is chosen uniformly at random from the available list of attributes. Nodes at the next level in the tree are chosen uniformly at random from a list of attributes that excludes the root. Hence, at each level of the tree a node is picked at random (uniform distribution) from the list of attributes that excludes its parents and ancestors. The tree is grown to a prespecified height. This procedure is clearly elucidated in Algorithm 1. It is assumed that the split points for continuous attributes are predetermined. After T trees have been built using the aforementioned procedure to form an ensemble, the class label of a datapoint is determined by taking a majority vote. The procedure for building random decision trees considered here has been previously mentioned in [13, 10] and is shown to be efficient as well as accurate in practice.
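A minimal Python sketch of Algorithm 1, assuming every attribute has the same number of splits s (e.g., s = 2 for binary splits) and h ≤ d; attribute indices stand in for the f_i's and all names are illustrative rather than part of the paper.

    # Sketch of Algorithm 1: each node's attribute is drawn uniformly at random
    # from the attributes not already used by its ancestors; trees have height h.
    import random

    def build_random_tree(d, h, s):
        """Return one random tree as nested dicts: each node stores its attribute
        index and one child per split; attributes never repeat on a root-to-leaf path."""
        def grow(ancestors, depth):
            if depth == h:
                return None                                   # leaf placeholder
            attr = random.choice([a for a in range(d) if a not in ancestors])
            return {"attr": attr,
                    "children": [grow(ancestors | {attr}, depth + 1)
                                 for _ in range(s)]}
        return grow(set(), 0)

    def build_ensemble(T, d, h, s=2):
        return [build_random_tree(d, h, s) for _ in range(T)]

An input is then classified by routing it down each of the T trees and taking a majority vote, with the class counts at the leaves obtained from the training sample.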

2.2 Generic Expressions

We first introduce some notation that is used primarily in this section. X is a random vector modeling the input, whose domain is denoted by X. Y is a random variable modeling the output, whose domain is denoted by Y (the set of class labels). Y(x) is a random variable modeling the output for input x. ζ represents a particular classifier with its generalization error (GE) denoted by GE(ζ). Formally,

GE(ζ) = E[λ(ζ(X), Y)]    (1)

where λ(·,·) is a zero-one loss function and the expectation is over the distribution on the X × Y space. Z(N) denotes a set of classifiers obtained by application of a classification algorithm to different samples of size N.

The fundamental concept behind the generic expressions is to characterize a class of classifiers induced by a classification algorithm and an i.i.d. sample of a particular size from an underlying distribution. We thus have a distribution over classifiers, with the GE of these classifiers being a random function that has its own distribution. Computing the entire distribution can be highly inefficient, especially for a complicated function such as the GE of classifiers, and hence we resort to efficiently computing the first few moments. With this, we now revisit the expressions for the first two moments around zero of the GE of a classifier,

E_{Z(N)}[GE(ζ)] = ∫_{x∈X} P[X = x] Σ_{y∈Y} P_{Z(N)}[ζ(x) = y | x] P[Y(x) ≠ y | x] dx    (2)

E_{Z(N)×Z(N)}[GE(ζ)GE(ζ')] = ∫_{x∈X} ∫_{x'∈X} P[X = x] P[X = x'] Σ_{y∈Y} Σ_{y'∈Y} P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y' | x, x'] P[Y(x) ≠ y | x] P[Y(x') ≠ y' | x'] dx dx'    (3)

It becomes clear from the above equations that to compute these moments we must be able to characterize the behavior of a classifier on each input independently for the first moment, and on pairs of inputs for the second moment. More specifically, we need to be able to compute P_{Z(N)}[ζ(x) = y] for the first moment and P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'] for the second moment for the classification algorithm under consideration (footnote 1). The other terms in these equations indicate the error of a single classifier, or the errors of two classifiers, derived from the underlying distribution for the first and second moment respectively. This can be better understood from the following simple example. If the class prior for class 1 is p and that for class 2 is 1 − p, then the error of a classifier classifying data into class 1 is 1 − p and the error of a classifier classifying data into class 2 is p.

Footnote 1: These probabilities and P[Y(x) ≠ y] are conditioned on x. We omit explicitly writing the conditional since it improves readability and is obvious from the context.

2.3 Customized Expressions for Random Decision Trees

As we saw in the previous subsection, in order to compute the moments we need to characterize P_{Z(N)}[ζ(x) = y] for the first moment and P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'] for the second moment for specific classification algorithms. To characterize these probabilities we have to mathematically model the manner in which a particular classification algorithm classifies a specific input (or pairs of inputs for the second moment).
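Before specializing to decision trees, the following minimal sketch illustrates how equation (2) turns the per-input probabilities into the first moment over a finite input space; it assumes the probabilities P_{Z(N)}[ζ(x) = y] have already been characterized for the algorithm of interest (they are simply passed in), and all names are illustrative rather than from the paper.

    # Sketch of equation (2) on a finite domain.
    def first_moment_of_GE(p_x, p_y_given_x, p_classify):
        """p_x[x]            : P[X = x]
           p_y_given_x[x][y] : P[Y(x) = y] under the true distribution
           p_classify[x][y]  : P_{Z(N)}[zeta(x) = y] for the learning algorithm
           Returns E_{Z(N)}[GE(zeta)]."""
        moment = 0.0
        for x, px in p_x.items():
            for y, pzy in p_classify[x].items():
                # error incurred when the learned classifier outputs y at x
                moment += px * pzy * (1.0 - p_y_given_x[x][y])
        return moment

The second moment in equation (3) has the same structure over pairs (x, x'), with the joint probability P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'] in place of p_classify.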

In the case of decision trees, a specific input is classified based on the relevant path in the tree from root to leaf, with the majority class being chosen as the class for the input. Thus, P_{Z(N)}[ζ(x) = y] for a decision tree algorithm is given by

P_{Z(N)}[ζ(x) = y] = Σ_p P_{Z(N)}[ξ(path_p, y) > ξ(path_p, y'), path_p exists, ∀ y' ≠ y, y, y' ∈ Y]    (4)

where p indexes all paths allowed by the decision tree algorithm in classifying input x. Inside the summation, the term ξ(path_p, y) is the number (count) of inputs in the path path_p that lie in class y. A similar characterization exists for P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'] where, rather than single paths, in each probability after the sum we have pairs of paths. The allowed paths are governed by the attribute selection method and the stopping criteria of the decision tree algorithm. For example, if we grow random decision trees of height h where there are a total of d attributes, the above probability would be

P_{Z(N)}[ζ(x) = y] = (1 / C(d, h)) Σ_p P_{Z(N)}[ξ(path_p, y) > ξ(path_p, y'), ∀ y' ≠ y, y, y' ∈ Y]    (5)

where C(d, h) denotes the binomial coefficient "d choose h". This is the case since, for any input, there are C(d, h) possible paths and every path is equally likely, i.e., P[path_p exists] = 1/C(d, h) ∀ p. The count of possible paths comes from the fact that, as we build the tree, there are d choices to pick the root, d − 1 choices to pick each of the nodes at the next level, and so on till we reach level h, where we have d − h + 1 choices; since the order of the attributes along a path does not matter for a fixed input, this gives C(d, h) distinct paths, each equally likely. Each of these paths is indexed by p and path_p is the corresponding path. By similar arguments, the probability for the second moment is given by

P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'] = (1 / C(d, h)²) Σ_{p,q} P_{Z(N)×Z(N)}[ξ(path_p, y) > ξ(path_p, y''), ξ(path_q, y') > ξ(path_q, y'''), ∀ y'' ≠ y, y''' ≠ y', y, y', y'', y''' ∈ Y]    (6)

An important thing to notice in the above formulas is that there are two sources of randomness: the first due to the tree building method (i.e., path_p exists) and the second due to the random samples (i.e., comparing counts) that are generated from the underlying distribution. As we will see later, this observation will help us in deriving bounds for the moments of an ensemble of random decision trees.
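The following Monte Carlo sketch illustrates equation (5) for a single random decision tree over discrete attributes. It is a brute-force check rather than the paper's analytical characterization: datasets of size N are sampled from the underlying distribution, and for each of the C(d, h) paths (attribute subsets) the strict-majority condition ξ(path_p, y) > ξ(path_p, y') is tested in the leaf region containing x. The handling of empty leaves (treated as not voting for y) is a simplifying assumption, and all names are illustrative.

    # Monte Carlo estimate of (1/C(d,h)) * sum_p P_{Z(N)}[majority in leaf_p(x) is y].
    import itertools, random
    from collections import Counter

    def prob_tree_classifies(x, y, h, dist, N, n_samples=2000, seed=0):
        rng = random.Random(seed)
        cells = list(dist.keys())                  # (input tuple, label) pairs
        weights = [dist[c] for c in cells]
        subsets = list(itertools.combinations(range(len(x)), h))   # the C(d, h) paths
        hits = [0] * len(subsets)
        for _ in range(n_samples):
            sample = rng.choices(cells, weights=weights, k=N)       # one dataset of size N
            for i, A in enumerate(subsets):
                # leaf region of path A for input x: training points agreeing with x on A
                counts = Counter(lab for (inp, lab) in sample
                                 if all(inp[a] == x[a] for a in A))
                # strict majority for class y, mirroring xi(path_p, y) > xi(path_p, y')
                if counts and all(counts[y] > counts.get(other, 0)
                                  for other in counts if other != y):
                    hits[i] += 1
        return sum(hits) / (len(subsets) * n_samples)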

3 Characterizing an Ensemble of Random Decision Trees

In this section, we first show that naively extending the analysis for (single) random decision trees to an ensemble leads to highly inefficient formulations with no obvious way of deriving reasonable bounds on the moments. We then suggest a solution using the generic expressions that is efficient to compute and gives tighter bounds than directly finding bounds on the moments.

3.1 Straightforward Extension

An ensemble of random decision trees classifies an input into a class based on the classes predicted by the individual trees in the ensemble. In particular, any input is classified into the class that receives the maximum number of votes. To characterize P_{Z(N)}[ζ(x) = y] we have to characterize all possible ways in which class y receives the maximum number of votes in classifying input x. Given that the ensemble has T trees, with p_i indexing all allowed paths for an input x in the i-th tree (i ≤ T), we have

P_{Z(N)}[ζ(x) = y] = (1 / C(d, h)^T) Σ_{p_1,...,p_T} P_{Z(N)}[η_x^y]    (7)

where η_x^y denotes the condition that class y gets the maximum number of votes in classifying x through the respective paths of the T trees. Hence, given a set of paths, P_{Z(N)}[η_x^y] computes the probability of classifying input x into y over all samples of size N from an underlying distribution. η_x^y is characterized by comparing counts of the various classes in the leaves of the trees in the ensemble. This is a generalization of the characterization for single trees shown in equation 5, going from counts being compared in a single leaf to counts being compared in the corresponding leaves of all the trees in the ensemble. This leads to an exponential-in-T number of massive joint probabilities that need to be computed. Given that T is generally in the hundreds, exactly computing the above formula is impractical for even extremely small domains. It is also important to notice that the joint probabilities containing the counts from different trees cannot be factorized as a product of probabilities containing counts of individual trees: although the trees are built independently, the individual tree classifiers are correlated through the sample. There also does not seem to be any obvious way of deriving efficient and tight bounds on these expressions. The same issues are faced when estimating or bounding the probability for pairs of inputs used in the computation of the second moment.

3.2 Main Result

In the previous subsection we observed that a naive extension of the ideas used to characterize single random decision trees leads to highly inefficient formulations for an ensemble of random decision trees. As indicated before, there are two sources of randomness: the first due to the random tree building method and the second due to the random samples from the underlying distribution. To derive the main result in this paper, we characterize the second source of randomness, then fix it and compute probabilities based on the first source of randomness. This strategy leads to bounds that can be computed efficiently. With this we have the following theorem.

Theorem 1 Consider a set of samples drawn from the underlying distribution that have cumulative measure/probability mass α, and let S_max be a sample in this set for which the number of paths of length h (h ≤ d) that classify input x into class y is maximum. Let us denote this maximum number of paths by m_h(S_max). Then, given that there are T trees in the ensemble and q = C(d, h) is the total number of paths of length h, we have

P_{Z(N)}[ζ(x) = y] ≤ β B(⌈T/2⌉, T, m_h(S_max)/q) + (1 − β)

where 0 ≤ β ≤ α ≤ 1 and B(⌈T/2⌉, T, m_h(S)/q) = Σ_{i=⌈T/2⌉}^{T} C(T, i) (m_h(S)/q)^i (1 − m_h(S)/q)^{T−i}.

The bound on the probability for the second moment is analogous to the above bound, with the only difference being that we consider pairs of inputs rather than just single inputs. Before we describe the details of deriving the above bound and techniques to efficiently estimate the necessary parameters (viz. m_h(S_max) and β), we provide a brief overview of how these critical parameters can be estimated.

1. To compute m_h(S_max) for samples of size N with total measure α, find the (worst) sample in this set for which the number of paths of length h that would classify input x into class y is maximum.
2. Given this sample, we can easily check how many of these q paths classify input x into class y. The higher the m_h(S_max), the looser the bound.
3. β can be estimated using Bonferroni's inequality [21] for the given set of samples. The lower the β, the looser the bound.

Notice that there can be many sets of samples of size N with measure α. The best set of samples, i.e., the one that would give the tightest bound, would be the one that has the lowest m_h(S_max). In this paper, we provide a strategy that finds a particular set of samples efficiently, which is not necessarily the best. The set of samples our procedure finds seems reasonable, however, given that our bounds are mostly non-trivial and significantly tighter than Breiman's bounds, as witnessed in the experimental section. Deciphering the best set of samples efficiently is part of future research.
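As a small illustration, the bound of Theorem 1 is cheap to evaluate once m_h(S_max), q = C(d, h), T and β are known (their estimation is the subject of Section 3.3). A minimal sketch, with illustrative names:

    # Sketch of the Theorem 1 bound.
    from math import comb, ceil

    def majority_tail(T, p):
        """B(ceil(T/2), T, p): probability that at least ceil(T/2) of T independent
        trees vote for the class, when each tree does so with probability p."""
        return sum(comb(T, i) * p**i * (1 - p)**(T - i)
                   for i in range(ceil(T / 2), T + 1))

    def theorem1_upper_bound(m_h_smax, q, T, beta):
        p = m_h_smax / q
        return beta * majority_tail(T, p) + (1 - beta)

For instance, with T = 100, q = 10, m_h(S_max) = 3 and β = 0.99, the bound evaluates to roughly 0.01, far below the trivial value of 1.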

3.3 Derivation and Efficient Parameter Estimation of the Bound

We have seen that a straightforward extension of the ideas that were used to characterize single random decision trees cannot be used (due to computational considerations) to characterize an ensemble of random decision trees. We thus have to come up with a different strategy or perspective to tackle this problem. There are three key elements in coming up with this characterization, which are as follows:

1. First, characterize the second source of randomness, i.e., the randomness due to the underlying distribution, and then, conditioning on this source of randomness, characterize the first source of randomness, which is due to the tree building process.
2. Define a parameter 0 ≤ α < 1 that denotes the measure of the samples of size N drawn from the underlying distribution. This parameter affects the tightness of the bounds.
3. Finding a set of samples whose measure is exactly α can be computationally intensive; hence, find samples with measure, say, β, where β ≤ α, which can be done efficiently and still leads to a useful upper bound on the moments.

Fig. 1 The figure shows the key steps in deriving the bound for an ensemble of random decision trees.

These elements are also mentioned in Figure 1, and the details regarding them are given below.

Two Sources of Randomness: As mentioned before, there are two sources of randomness present: the first due to the tree building method (i.e., path_p exists) and the second due to the random samples (i.e., comparing counts) that are generated from the underlying distribution. In the characterization for random decision trees, we observe that we compute the probability for a certain path to exist and then, conditioned on this path, we compute the probability of classifying an input into a particular class over all samples of a particular size. In other words, we characterize the first source of randomness and then fix it in the computation of P_{Z(N)}[ζ(x) = y] and P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y']. However, there is also the other possibility of characterizing the second source of randomness and fixing it in order to characterize these probabilities. In this scenario we would have

P_{Z(N)}[ζ(x) = y] = Σ_{S∈D(N)} P[S] P_t[η_x^y | S]    (8)

where D(N) denotes all datasets of size N from the underlying distribution, and P_t[η_x^y | S] is the probability of classifying input x into class y based on the random tree generation process, denoted by t, given a particular dataset S ∈ D(N). For a given dataset, the number of paths of length h that would classify an input x into class y is the same for all trees in the ensemble. In addition, the trees are built independently.

Hence, if m_h(S) is the number of paths of length h that classify input x into class y given a dataset S, and if T is the total number of trees in the ensemble, then P_t[η_x^y | S] is given by a binomial cumulative distribution function (cdf),

P_t[η_x^y | S] = B(⌈T/2⌉, T, m_h(S)/q)    (9)

where q = C(d, h) is the total number of paths of length h and B(⌈T/2⌉, T, m_h(S)/q) = Σ_{i=⌈T/2⌉}^{T} C(T, i) (m_h(S)/q)^i (1 − m_h(S)/q)^{T−i}.

It is thus easy to see that, given a sample, P_t[η_x^y | S] can be computed efficiently. However, the number of samples will be huge for any reasonable problem size. Hence, directly computing the moments using this formulation also does not seem feasible. In fact, this formulation is infeasible even for single random decision trees. Nonetheless, this formulation does present us with an opportunity to compute bounds on the moments, which we will now derive. The solution we provide is efficient and, as we will see in the experimental section, turns out to be tighter than Breiman's bound κ(1 − s²)/s², where κ is the correlation between the random decision trees in an ensemble and s is the strength of the resultant classifier (footnote 2).

Footnote 2: For further details refer to [7] and [8].

Defining α: One way of upper bounding P_{Z(N)}[ζ(x) = y] is to find the maximum P_t[η_x^y | S] over all samples and then to substitute this value for each of the conditionals in equation 8. The maximum P_t[η_x^y | S] would most likely be 1, since for arbitrary distributions there might exist samples where m_h(S) = q. This would make the bound trivial (i.e., 1), since Σ_{S∈D(N)} P[S] = 1. Hence, rather than choosing all samples, we can choose a fraction of them with measure α to upper bound the probabilities. α = 1 corresponds to choosing all samples, while lower values of α (viz. 0.99, etc.) correspond to choosing a subset of the samples with measure α. For this subset, we can find the sample which has the maximum m_h(S) and, hence, the maximum P_t[η_x^y | S] for that set. In the experimental section, we will see that our method of finding this sample, and consequently the maximum P_t[η_x^y | S] for that set, leads to bounds that are significantly tighter than Breiman's strength and correlation bounds. With this we have the following inequality,

P_{Z(N)}[ζ(x) = y] = Σ_{S∈D(N)} P[S] P_t[η_x^y | S] ≤ α P_t[η_x^y | S_max] + (1 − α)    (10)

S_max is the sample for which P_t[η_x^y | S_max] has the maximum value in that set of samples with total measure α. Note that there can be many sets of samples with total measure α. Ideally, one would want to choose the set for which P_t[η_x^y | S_max] is the least, since this would lead to the tightest possible bound among the alternatives. At this point, it is not clear how this optimal set can be deciphered; however, we provide a solution where we find one of the feasible sets in an efficient manner and, as we will see later, this solution tends to give reasonably good bounds.

As mentioned before, our analysis applies to discrete attributes as well as continuous attributes with prespecified split points.

Given this, any distribution over the data can be represented as a multinomial with parameters N, p_11, p_12, ..., p_nc, as shown in Table 2. Here, N is the sample size and p_11, p_12, ..., p_nc are the probabilities for the corresponding input-output pairs (n distinct inputs and c classes) to occur, with Σ_{i=1}^{n} Σ_{j=1}^{c} p_ij = 1. Notice that bootstrap distributions built by resampling a dataset are subsumed by the given class of distributions. Given such a multinomial distribution, we want to find a set of samples/datasets of size N where the total measure of these datasets is equal to α.

Table 2 Example data distribution.

X, Y    y_1     y_2     ...   y_c
x_1     p_11    p_12    ...   p_1c
x_2     p_21    p_22    ...   p_2c
...     ...     ...     ...   ...
x_n     p_n1    p_n2    ...   p_nc
(sample size N)

Computing β: Attempting to directly find these samples would be computationally intensive, since we would have to sum the measures of each sample in the set in order to verify that the measure is α. However, since we want to find an upper bound on P_{Z(N)}[ζ(x) = y], substituting a lower bound for α in equation 10 would give us an upper bound on P_{Z(N)}[ζ(x) = y], as desired. Finding a lower bound on α can be done efficiently with the help of Bonferroni's inequality [21]. Bonferroni's inequality states that given g random variables Z_1, ..., Z_g and 2g real numbers a_1, ..., a_g, b_1, ..., b_g, the following relationship holds:

P[a_1 ≤ Z_1 ≤ b_1, ..., a_g ≤ Z_g ≤ b_g] ≥ 1 − Σ_{i=1}^{g} (1 − P[a_i ≤ Z_i ≤ b_i])    (11)

Hence, the joint probability of the variables can be lower bounded by a function of the marginals. The reason this result is useful is that one-dimensional confidence regions (i.e., confidence intervals or domains of the marginals) of a particular measure can be efficiently computed, as opposed to simultaneous (multidimensional) confidence regions, which are much harder to compute exactly [19, 20]. One easy way of obtaining confidence intervals for commonly used measures is by using standard formulas. For example, if we want the confidence interval of 0.95 measure for Z_i, we know that an approximation of it is given by µ_i ± 2σ_i, where µ_i and σ_i are the mean and standard deviation of Z_i. Similar formulas are known for other measures and, hence, one can compute confidence intervals for these measures in constant time. If we want the exact confidence interval, we can compute it as a binomial cdf, which can also be done in essentially constant time using series expansions for the incomplete regularized beta function. With this, if β is the measure on the right-hand side of equation 11, then the measure over the simultaneous confidence region, i.e., over the relevant set of datasets for our problem, is at least β.
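A minimal sketch of this β computation, assuming each cell count Z_i of the collapsed table is treated as a Binomial(N, p_i) marginal and its interval [a_i, b_i] has already been chosen (e.g., mean plus or minus a few standard deviations); names are illustrative.

    # Bonferroni lower bound (equation 11) from per-cell binomial marginals.
    from math import comb

    def binom_cdf(k, n, p):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

    def bonferroni_beta(N, cell_probs, intervals):
        """cell_probs[i] = p_i for cell i; intervals[i] = (a_i, b_i) integer limits.
        Returns the lower bound beta on the measure of the set of datasets whose
        cell counts all fall inside their intervals."""
        slack = 0.0
        for p, (a, b) in zip(cell_probs, intervals):
            marginal = binom_cdf(b, N, p) - binom_cdf(a - 1, N, p)   # P[a <= Z_i <= b]
            slack += 1.0 - marginal
        return max(0.0, 1.0 - slack)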

If α is the true measure over the set of datasets, but β ≤ α is what we efficiently compute based on the above inequality, then we have

P_{Z(N)}[ζ(x) = y] = Σ_{S∈D(N)} P[S] P_t[η_x^y | S] ≤ α P_t[η_x^y | S_max] + (1 − α) ≤ β P_t[η_x^y | S_max] + (1 − β)    (12)

In our problem, the Z_i represent the numbers of datapoints corresponding to the input-output pairs where the input has at least h attributes with the same value as the input x_i for which the above probability is to be upper bounded. This is seen in Table 3, where x^{x_i}_j denotes the j-th input which has at least h attributes with the same value as input x_i. If each of the d attributes can take v values (footnote 3), giving a total of n = v^d distinct inputs, then k = Σ_{i=h}^{d} C(d, i) (v − 1)^{d−i} < n, and R^{x_i} denotes the entry in the table for the probability remaining after accounting for the probabilities of each of the kc pairs, i.e., p^{x_i}_R = 1 − Σ_{j=1}^{k} Σ_{l=1}^{c} p^{x_i}_{jl}; N as usual is the sample size. Hence, we have kc degrees of freedom, i.e., g = kc, since the probabilities sum to 1 and the probability in the last cell (p^{x_i}_R) is determined by the other probabilities.

Table 3 Collapsed view of the underlying distribution as required for bounding P_{Z(N)}[ζ(x_i) = y].

X, Y         y_1           y_2           ...   y_c
x^{x_i}_1    p^{x_i}_11    p^{x_i}_12    ...   p^{x_i}_1c
x^{x_i}_2    p^{x_i}_21    p^{x_i}_22    ...   p^{x_i}_2c
...          ...           ...           ...   ...
x^{x_i}_k    p^{x_i}_k1    p^{x_i}_k2    ...   p^{x_i}_kc
R^{x_i}      p^{x_i}_R
(sample size N)

Footnote 3: This is after splitting the continuous attributes.

With this, Z_1 is the random variable corresponding to the count in the cell formed by the intersection of input x^{x_i}_1 and class y_1, Z_2 is the random variable corresponding to the count in the cell formed by the intersection of input x^{x_i}_1 and class y_2, and so on until Z_g, which is the random variable corresponding to the count in the cell formed by the intersection of input x^{x_i}_k and class y_c. Consequently, one way of obtaining a confidence interval of measure 0.95 for the random variable representing the intersection of input x^{x_i}_j and class y_l would be Np^{x_i}_{jl} ± 2√(Np^{x_i}_{jl}(1 − p^{x_i}_{jl})). We can, however, choose any confidence interval for each Z_i (not necessarily the same) so as to have the appropriate β, which, as we know, lower bounds α. We also need to check that the individual confidence intervals produce legal datasets, since the total sample size is a fixed number N. This can easily be done by adding the upper limits of the intervals for each Z_i and verifying that the eventual sum is at most N. If it is not, one can adjust the limits of one or more of the intervals so as to satisfy the condition.

The confidence intervals for each Z_i together form a confidence region that represents the relevant set of datasets. From this set we now have to find S_max, i.e., the dataset for which the number of paths that classify input x into class y is the highest in the set. This dataset is the one formed by taking the upper limits of the confidence intervals corresponding to class y and the lower limits of the remaining confidence intervals, with the appropriate value for

the cell corresponding to R^{x_i}, in order to have a total sample size of N. A dataset formed by this procedure would be S_max, since taking the upper and lower limits as indicated would indeed produce a dataset with the maximum number of paths classifying the particular input into class y within the relevant set. An overview of this procedure is given in Algorithm 2.

Algorithm 2 Overview of the procedure to find S_max and m_h(S_max) for an input-output pair (x_i, y_b).
Input: h, X = {x_1, ..., x_n}, Y = {y_1, ..., y_c}, P = {p_11, ..., p_nc} {P is the estimated or given underlying probability distribution over X × Y.}
Output: S_max, m_h(S_max)
Let x^{x_i}_j denote the j-th input which has at least h attributes with the same value as input x_i.
Let p^{x_i}_{jl} be the (rolled-up) probability of observing (x^{x_i}_j, y_l), which is computed from P.
Let n_r = N
for all j ∈ {1, ..., k}, l ∈ {1, ..., c} do
  if l = b then {γ_jl below depends on the desired β, which is the right-hand side of equation 11, and on N.}
    Compute n_jl = Np^{x_i}_{jl} + γ_jl √(Np^{x_i}_{jl}(1 − p^{x_i}_{jl}))
  else
    Compute n_jl = Np^{x_i}_{jl} − γ_jl √(Np^{x_i}_{jl}(1 − p^{x_i}_{jl}))
  end if
  Compute n_r = n_r − n_jl
end for
S_max = {n_11, ..., n_kc, n_r}
Compute m_h(S_max) = count of the number of paths of length h in S_max for which y_b is the majority class.
Return S_max, m_h(S_max)

With this, we have completely described a process that can be used to upper bound P_{Z(N)}[ζ(x) = y]. An analogous procedure can be used to upper bound P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'], with the slight difference being that we have to consider pairs of inputs rather than single inputs. The basic technique, however, remains the same.
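A compact sketch of Algorithm 2 under simplifying assumptions: the distribution is a dictionary over (input tuple, class) cells, a single interval multiplier γ is used for every cell (the paper allows it to vary per cell so as to reach the desired β), and the cell counts are left as real numbers rather than rounded to integers. All names are illustrative.

    # Sketch of Algorithm 2: worst-case dataset S_max for the pair (x_i, y_b)
    # and the corresponding m_h(S_max).
    import itertools
    from math import sqrt

    def find_s_max(x_i, y_b, h, dist, N, gamma):
        d = len(x_i)
        # rows of the collapsed table: inputs sharing at least h attribute values with x_i
        rows = sorted({inp for (inp, _) in dist
                       if sum(u == w for u, w in zip(inp, x_i)) >= h})
        classes = sorted({lab for (_, lab) in dist})
        counts, n_r = {}, N
        for inp in rows:
            for lab in classes:
                p = dist.get((inp, lab), 0.0)
                half = gamma * sqrt(N * p * (1 - p))
                # upper interval limit for the target class y_b, lower limit otherwise
                counts[(inp, lab)] = N * p + half if lab == y_b else max(0.0, N * p - half)
                n_r -= counts[(inp, lab)]
        # m_h(S_max): number of length-h paths whose leaf region around x_i has
        # y_b as the strict majority class under these counts
        m_h = 0
        for A in itertools.combinations(range(d), h):
            leaf = {lab: sum(counts[(inp, lab)] for inp in rows
                             if all(inp[a] == x_i[a] for a in A))
                    for lab in classes}
            if all(leaf[y_b] > leaf[lab] for lab in classes if lab != y_b):
                m_h += 1
        return counts, n_r, m_h

The returned m_h can then be plugged into the Theorem 1 bound sketched earlier.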

3.4 Time Complexity

In the previous subsection we provided a strategy to upper bound P_{Z(N)}[ζ(x) = y] and P_{Z(N)×Z(N)}[ζ(x) = y ∧ ζ'(x') = y'] ∀ x, x' ∈ X, y, y' ∈ Y, and consequently the moments. In this subsection, we analyze the time complexity of this strategy and compare it with the straightforward extension described in Section 3. Consider the following setup, where we have d input attributes each taking v values, a class attribute taking c values, a sample size of N, and T trees in the ensemble, each of height h ≤ d. Given this, the straightforward extension would have a time complexity of O(n c q^T N^{T+1}) for the first moment and O(n² c² q^{2T} N^{2T+2}) for the second moment, where n = v^d is the number of distinct inputs and q = C(d, h) is the total number of paths of length h for a particular input. The nc term in the time complexity comes from the fact that we have to estimate nc probabilities, the q^T term indicates that we have to sum over all combinations of q paths for the T trees, and N^{T+1} is the time it takes to compute each of the huge probabilities in equation 7. The N^{T+1} term can be reduced by approximating each of the probabilities using Monte Carlo sampling; however, the q^T term is still very intimidating given that T is usually in the hundreds.

The time complexity of our bound is O(n k c² q T) for the first moment and a quadratic function of this value for the second moment, where k = Σ_{i=h}^{d} C(d, i) (v − 1)^{d−i} < n. This time complexity comes from the fact that tables such as Table 3 can be created for each input in one pass over the dataset, and then for each input-output pair we can find m_h(S_max)/q followed by the binomial probability in O(kcqT) time. A key component enabling this exponential gain is the fact that the simultaneous confidence region can be lower bounded by bounding each variable independently, and that m_h(S_max) can be found efficiently, as shown before. Hence, there is a huge overall reduction in time complexity for practical applications, as there is an exponential gain in T, with the sample size N not affecting the analysis. The increased O(kc) factor is more than compensated for by these reductions. We have thus reduced the complexity from something that was practically impossible to compute even for small-scale problems to something that is reasonable to compute for medium-scale and even certain large-scale problems. Moreover, the bounds on the individual probabilities can be computed in parallel.

3.5 Practical Considerations

In the previous subsection we analyzed the worst-case time complexity of computing our bound. It may, however, be possible to further speed up its computation in practice. For instance, to upper bound the moments, we may not have to upper bound the probabilities for every input-output pair. For every input, it is sufficient to upper bound the probabilities starting from the one corresponding to the highest error (i.e., the highest P[X = x] P[Y(x) ≠ y], y ∈ Y) down to the ones with lower errors, in (descending) sorted fashion, until the upper bounds sum to more than 1 or we reach the probability with the lowest error. That is, for the first moment, P_{Z(N)}[ζ(x) = y] need not be upper bounded if P[X = x] P[Y(x) ≠ y] is the lowest error among all of Y, and similarly for the second moment. The weight multiplied with the lowest-error term corresponding to input x is 1 − Σ_{y'∈Y, y'≠y} U(x, y'), where U(x, y') is the upper bound on P_{Z(N)}[ζ(x) = y'], provided the sum of the U(x, y') does not exceed 1. This can be done since the probabilities of classifying an input into the different classes form a convex combination. Conversely, if we first upper bound the probability with the lowest error for a particular input and then, in (ascending) sorted fashion, upper bound the probabilities with successively higher errors until one of the aforementioned conditions is met, we would get lower bounds on the moments. Such a strategy would give tighter bounds than blindly upper bounding every probability.

It is important to realize that, because of the generic expressions, we get tighter bounds when compared with directly bounding the moments using the procedure mentioned in the previous subsection. From Bonferroni's inequality in equation 11, we see that as the number of random variables increases, the bound loosens (linearly). Hence, having fewer variables will invariably lead to tighter bounds. If we were to derive bounds directly, we would have to consider the entire dataset with all the cells, without collapsing or aggregating cells into fewer cells, and hence fewer variables [11], as is the case using the generic expressions. The reason for this is that the generic expressions consider cells corresponding to an individual input or pairs of inputs rather than the entire input space.
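One schematic reading of this speed-up for the first moment is sketched below: for a given input x, weight is allocated to classes in descending order of error, capping each class at its upper bound U(x, y) and the total at 1, since the classification probabilities form a convex combination. This is an illustrative sketch, not the paper's implementation.

    # Upper bound on the contribution of one input x to E_{Z(N)}[GE].
    def upper_bound_contribution(err, U):
        """err[y] = P[X = x] * P[Y(x) != y]; U[y] = upper bound on P_{Z(N)}[zeta(x) = y].
        Classes missing from U (e.g., the lowest-error class) simply absorb the
        leftover mass."""
        total, remaining = 0.0, 1.0
        for y in sorted(err, key=err.get, reverse=True):   # highest error first
            w = min(U.get(y, 1.0), remaining)
            total += w * err[y]
            remaining -= w
            if remaining <= 0.0:
                break
        return total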

4 Experiments

In the previous section we described a technique to derive bounds on the moments. In this section, we test the usefulness of these bounds by performing experiments on synthetic and real data. In particular, we compare the tightness of the upper bound derived by our method with the commonly used strength and correlation bounds given by Breiman, which are upper bounds on the GE. As we will see, our bound tends to be significantly tighter than the alternative in most circumstances.

4.1 Setup

The experimental setup that we have here is very similar to the one in a previous study [10]. In each of the experiments on synthetic and real data that we describe below, we obtain a robust estimate of E_{Z(N)}[GE(ζ)] by generating 1000 hold-out or test sets of datapoints and then averaging the error over these 1000 hold-out sets. Note that, even in the synthetic case, E_{Z(N)}[GE(ζ)] cannot be computed exactly, since the expectation is over all possible classifiers built over datasets of size N, which is computationally infeasible even for N in the hundreds. We set the confidence intervals per variable (Z_i) to a fixed measure in all experiments. The number of trees in the ensemble (T) is set to 100 for the real data and varied from 100 to 500 for the synthetic data. We also report the average computation time in seconds for the respective bounds.

The experimental setup for synthetic data is as follows. In our initial experiments we fix N to 100 and then increase it to 10,000. The number of classes is fixed to two. We compute the upper bounds on E_{Z(N)}[GE(ζ)] where the number of attributes is fixed to d = 5, with each attribute having 2 attribute values. We then increase the number of attribute values to 3 to observe the effect that increasing the number of split points has on the performance of the estimators. We also increase the number of attributes to d = 8 to study the effect that increasing the number of attributes has on the performance. With this we have a (d + 1)-dimensional contingency table whose first d dimensions are the attributes and whose (d + 1)-th dimension represents the class labels. When each attribute has two values, the total number of cells in the table is c = 2^{d+1}, and with three values the total number of cells is c = 3^d · 2. If we fix the probability of observing a datapoint in cell i to be p_i, such that Σ_{i=1}^{c} p_i = 1, and the sample size to N, the distribution that perfectly models this scenario is a multinomial distribution with parameters N and the set {p_1, p_2, ..., p_c}. In fact, irrespective of the value of d and the number of attribute values for each attribute, the scenario can be modeled by a multinomial distribution. In the studies that follow, the p_i's are varied and the amount of dependence between the attributes and the class labels (ρ) is computed for each set of p_i's using the Chi-square test [9]. More precisely, we sum over all i the squares of the difference of each p_i with the product of its corresponding marginals, with each squared difference being divided by this product, that is, correlation = Σ_i (p_i − p_im)² / p_im, where p_im is the product of the marginals for the i-th cell.
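A small sketch of this dependence measure, under the assumption that "the product of its corresponding marginals" means the product of the marginal over the input configuration and the marginal over the class label; names are illustrative.

    # Chi-square style dependence between the attributes (jointly) and the class.
    def dependence(p):
        """p[(x, y)] = joint probability of input configuration x (a tuple) and class y.
        Returns sum_i (p_i - p_im)^2 / p_im with p_im = P[x] * P[y]."""
        px, py = {}, {}
        for (x, y), prob in p.items():
            px[x] = px.get(x, 0.0) + prob
            py[y] = py.get(y, 0.0) + prob
        rho = 0.0
        for x in px:
            for y in py:
                p_im = px[x] * py[y]
                if p_im > 0:
                    rho += (p.get((x, y), 0.0) - p_im) ** 2 / p_im
        return rho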

In the case of real data, we perform experiments by building bootstrap distributions on three UCI data sets that have been used previously in the context of decision trees [10] and six other industrial datasets obtained from diverse domains. The six proprietary datasets, denoted by Oil, Semiconductor, Store, Spend, Airline and Invoice in Table 6, are obtained from the petrochemical industry, the semiconductor industry, the consumer products industry, the finance domain, the airline industry and the procurement domain respectively.

Oil: The Oil dataset has information obtained from a major oil corporation. There are a total of 9 measures in the dataset. These measures are obtained from the sensors of a 3-stage separator that separates oil, water, and gas. The 9 measures are composed of 2 measured levels of the oil-water interface at each of the 3 stages and 3 overall measures. In particular, the measures are as follows:

1. 20LT 0017 mean (stage 1)
2. 20LT 0021 mean (stage 1)
3. 20LT 0081 mean (stage 2)
4. 20LT 0085 mean (stage 2)
5. 20LT 0116 mean (stage 3)
6. 20LT 0120 mean (stage 3)
7. Daily water in oil (Daily WIO)
8. Daily oil in water (Daily OIW)
9. Daily production normal

Our target measure is the last listed measure, Daily production normal (a binary variable), which indicates whether the daily oil production meets the desired level or not. The dataset size is 992.

Semiconductor: In the chip manufacturing industry, accurately predicting ahead of time whether a wafer (which is a collection of chips) lies within certain spec can be crucial in choosing the appropriate set of wafers to send forward for further processing. Eliminating faulty wafers can save the industry a huge amount of resources in terms of time and money. The dataset we have has 175 features including the target. The target is a binary variable which indicates whether the wafer conforms to the required spec or not. The other features are a combination of physical measurements and electrical measurements made on the wafer.

Store: In the consumer products industry, predicting when a certain item of a competitor brand may or may not be on promotion can be critical for strategic planning. The dataset we have has information about past promotions that we have carried out for a type of bread and, for the corresponding time periods, we also have information about 10 competitors that have carried out promotions. The target here is the binary time series corresponding to our closest competitor. The dataset size is 140, corresponding to 140 weeks of data.

Spend: The Spend dataset contains a couple of years' worth of (spend) transactions spread across various categories belonging to a large corporation. The transactions are indicative of the company's expenditure in this time frame. The dataset has 13 attributes, namely: requestor name, cost center name, description code, material group, vendor name, business unit name, region, purchase order type, addressable, spend type, compliant, invoice spend amount. Our target is the compliance attribute, which indicates whether a transaction was compliant or not. Given this, the goal is to identify the characteristics of a transaction that highly correlate with it being compliant/non-compliant. With this information the company would be able to put in

place appropriate policies and practices that could lead to potentially huge savings.

Airline: This dataset contains information about flights of a major airline carrier. In particular, the dataset has information about the date of departure and arrival of a flight, the airport the flight departs from and arrives at, the pilot's serial number, the type of aircraft, the dispatcher's name, the cost, and multiple fuel indicators signifying the amount of fuel put in the flight, the minimum required by the government and other safety fuel limits. We have information on over 100,000 flights and the goal is to identify factors that lead to a flight carrying more fuel than what is required. We have a binary indicator which depicts whether the flight carried excess fuel. Controlling the factors which lead to this can help an airline carrier save money and improve safety.

Invoice: Large corporations have many business units (BUs), viz. travel, marketing, auditing, information technology, etc., with each BU consisting of various commodity councils (CCs), viz. office supplies, tech services, communication services, etc. Throughout a calendar year, each of the commodity councils carries out multiple transactions and records the corresponding invoices. It is extremely useful for these CCs, and BUs at large, to estimate in advance the invoice amounts that are likely to be registered in the near future. This dataset has 8 BUs, with each BU consisting of 150 CCs. The data was collected daily for a year and, hence, the dataset has only 365 datapoints. Our target is the CC communication services (footnote 4) under the BU information technology, which has one of the highest invoice amounts and therefore is critical to business.

Footnote 4: Partitioned into 3 categories: high, medium and low.

Table 4 The table shows E_{Z(N)}[GE(ζ)] and the corresponding upper bounds for different levels of correlation (ρ) between the attributes and class labels for T = 100. The values before the commas in the cells under the different levels of correlation correspond to our method, while the values following the commas correspond to Breiman's strength and correlation bounds.

Parameter Settings      Metric           ρ = 0.9    ρ = 0.36   ρ = 0.11   ρ = 0.02
N = 100, binary,        Bounds           0.22, ·    ·, ·       ·, ·       ·, 1.34
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.09, ·    ·, ·       ·, ·       ·, 0.11
N = 100, ternary,       Bounds           0.48, ·    ·, ·       ·, ·       ·, 1.7
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.21, ·    ·, ·       ·, ·       ·, 0.32
N = 100, binary,        Bounds           0.71, ·    ·, ·       ·, ·       ·, 0.97
d = 8, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.35, ·    ·, ·       ·, ·       ·, 0.56
N = 10000, binary,      Bounds           0.22, ·    ·, ·       ·, ·       ·, 0.69
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.10, ·    ·, ·       ·, ·       ·, 0.12
N = 10000, ternary,     Bounds           0.26, ·    ·, ·       ·, ·       ·, 1.42
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.22, ·    ·, ·       ·, ·       ·, 0.32
N = 10000, binary,      Bounds           0.68, ·    ·, ·       ·, ·       ·, 0.62
d = 8, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.34, ·    ·, ·       ·, ·       ·, 0.58

For each of the above datasets, we split the continuous attributes at the mean of the given data. We thus can form a contingency table representing each of the data

sets.

Table 5 The table shows E_{Z(N)}[GE(ζ)] and the corresponding upper bounds for different levels of correlation (ρ) between the attributes and class labels for T = 500. The values before the commas in the cells under the different levels of correlation correspond to our method, while the values following the commas correspond to Breiman's strength and correlation bounds.

Parameter Settings      Metric           ρ = 0.9    ρ = 0.36   ρ = 0.11   ρ = 0.02
N = 100, binary,        Bounds           0.23, ·    ·, ·       ·, ·       ·, 1.29
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.48, ·    ·, ·       ·, ·       ·, 0.56
N = 100, ternary,       Bounds           0.49, ·    ·, ·       ·, ·       ·, 1.8
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             1.07, ·    ·, ·       ·, ·       ·, 1.59
N = 100, binary,        Bounds           0.73, ·    ·, ·       ·, ·       ·, 0.98
d = 8, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             1.78, ·    ·, ·       ·, ·       ·, 2.89
N = 10000, binary,      Bounds           0.23, ·    ·, ·       ·, ·       ·, 0.67
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             0.49, ·    ·, ·       ·, ·       ·, 0.60
N = 10000, ternary,     Bounds           0.29, ·    ·, ·       ·, ·       ·, 1.41
d = 5, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             1.08, ·    ·, ·       ·, ·       ·, 1.73
N = 10000, binary,      Bounds           0.71, ·    ·, ·       ·, ·       ·, 0.61
d = 8, h = 3            E_Z(N)[GE(ζ)]    ·          ·          ·          ·
                        Time             1.73, ·    ·, ·       ·, ·       ·, 2.96

Table 6 The table shows E_{Z(N)}[GE(ζ)] for the real datasets with the respective upper bounds and running times.

Dataset                   E_Z(N)[GE(ζ)]   Our Bound, Time   Breiman's Bound, Time
Pima Indians              ·               ·, ·              ·, 0.57
Balloon                   ·               ·, ·              ·, 0.08
Shuttle Landing Control   ·               ·, ·              ·, 0.10
Oil                       ·               ·, ·              ·, 0.63
Semiconductor             ·               ·, ·              ·, ·
Store                     ·               ·, ·              ·, 0.71
Spend                     ·               ·, ·              ·, ·
Airline                   ·               ·, ·              ·, ·
Invoice                   ·               ·, ·              ·, ·

The counts in the individual cells divided by the data set size provide us with empirical estimates for the individual cell probabilities (p_i's). Thus, with the knowledge of N (data set size) and the individual p_i's, we have a multinomial distribution. Using this distribution we observe the behavior of the bounds. The results are shown in Tables 4, 5 and 6, where we also see the upper bounds computed using Breiman's formula [7], κ(1 − s²)/s², described before. Estimating κ and s, we find the upper bound on the GE for a particular classifier. Since we need an estimate of E_{Z(N)}[GE(ζ)], we perform the above procedure multiple times, thus building multiple ensembles and computing an upper bound on the GE for each. We then average the upper bounds that we have computed and report the result as an estimate of the upper bound on E_{Z(N)}[GE(ζ)].
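A minimal sketch of this construction: split each continuous attribute at its mean, count the (discretized input, label) cells and divide by the dataset size to obtain the cell probabilities p_i of the multinomial. The data layout and names are illustrative; the resulting dictionary has the same format as the distributions assumed in the earlier sketches.

    # Empirical multinomial (bootstrap distribution) from a labeled dataset.
    from collections import Counter

    def empirical_multinomial(rows, labels, continuous_cols):
        """rows: list of attribute tuples; labels: list of class labels;
        continuous_cols: indices of attributes to be split at their mean."""
        N = len(rows)
        means = {j: sum(r[j] for r in rows) / N for j in continuous_cols}
        def discretize(r):
            return tuple((1 if r[j] > means[j] else 0) if j in continuous_cols else r[j]
                         for j in range(len(r)))
        counts = Counter((discretize(r), y) for r, y in zip(rows, labels))
        return {cell: cnt / N for cell, cnt in counts.items()}, N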

4.2 Observations

In the synthetic experiments, our bound is the tightest when d = 5, h = 3 and we have binary splits, while it is the worst when d = 8, h = 3. This is because the number of variables is the least in the first scenario while it is the most in the second scenario. This trend is especially seen at moderate correlation (0.36), since the number of paths classifying inputs into the class with high errors for the sample S_max increases more dramatically with the number of variables in this case than at high or low correlations. In any case, we do significantly better than Breiman's bound. Increasing the sample size from 100 to 10,000 has little effect on the time it takes to compute our bound, while the quality of the bound remains intact, which is promising. Increasing the number of trees in the ensemble from 100 (Table 4) to 500 (Table 5) also maintains the quality of our bound, though the time to compute it increases linearly. This is consistent with the complexity analysis we provided in the previous section.

In the experiments on real data, we see a similar behavior, where our bound outperforms Breiman's bound on all nine datasets. These datasets are from diverse domains and exhibit varied characteristics in terms of sample size, dimensionality and type of attributes. The attributes are a mix of continuous and discrete, with the discrete attributes having a range of splits. We see in these as well as the synthetic experiments that the time to compute our bound increases as the number of attributes and the splitting factor increase, though it is still faster than computing Breiman's bound. The reason for this is that we do not have to explicitly build the trees to compute our bound. Hence, from the experiments we see that our bound is never worse than that computed using Breiman's formula, though in certain situations it is close to being trivial, i.e., close to 1.

5 Discussion

In this paper we described a procedure to derive bounds for an ensemble of random decision trees. It would be interesting to extend such an analysis to the more general random forest algorithm. In order to derive bounds for the random forest algorithm, we would have to characterize the probability of choosing a particular attribute based on two criteria: first, that it is part of the subset of attributes that are randomly chosen and, second, that it has the maximum IG or GG in that set. For the ensemble of random decision trees, we just had to deal with the first criterion, with a subset size of one. Deriving acceptable bounds for the random forest algorithm efficiently is thus a challenge for the future.

In the analysis presented in the previous sections, one of the main ideas in computing the bound efficiently was to use Bonferroni's inequality. However, this inequality can lead to loose bounds with increasing dimensionality. In the future, it would be interesting to come up with better characterizations of the simultaneous confidence region, leading to tighter and, hence, more useful bounds. The challenge, however, is to decipher these regions in a scalable fashion.

To conclude, we have presented a novel procedure to derive bounds on the moments of the GE for an ensemble of random decision trees. This procedure is efficient and leads to tighter bounds than the well-established Breiman's strength and correlation bounds. The derivation of these bounds required a perspective and a set of analytical tools

that are significantly different from the strategies used to characterize (single) random decision trees [10]. It would be interesting to see if the ideas used in analyzing the ensemble of random decision trees can be used to analyze other such ensembles in the future.

Acknowledgement I would like to thank the editor and the anonymous reviewers for their constructive comments. I would also like to thank Katherine Dhurandhar for proofreading the paper.

References

1. A. Anandkumar, D. Foster, D. Hsu, S. Kakade and Y. Liu. A spectral algorithm for Latent Dirichlet Allocation. In NIPS, Lake Tahoe, USA, 2012.
2. B. Boots and G. Gordon. Two manifold problems with applications to nonlinear system identification. In ICML, page 338, Edinburgh, Scotland, UK, 2012.
3. N. Bshouty and P. Long. Finding planted partitions in nearly linear time using arrested spectral clustering. In ICML, Haifa, Israel, 2010.
4. T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning. Springer, 2nd edition, 2009.
5. R. Duda, P. Hart and D. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2001.
6. A. Dhurandhar and A. Dobra. Distribution free bounds for relational classification. Knowledge and Information Systems.
7. L. Breiman. Random forests. Machine Learning, 45(1):5-32, 2001.
8. S. Buttrey and I. Kobayashi. On strength and correlation in random forests. In Proceedings of the 2003 Joint Statistical Meetings, Section on Statistical Computing, 2003.
9. J. Connor-Linton. Chi square tutorial. http:// web chi tut.html.
10. A. Dhurandhar and A. Dobra. Probabilistic characterization of random decision trees. Journal of Machine Learning Research, 9, 2008.
11. A. Dhurandhar and A. Dobra. Semi-analytical method for analyzing models and model selection measures based on moment analysis. ACM Transactions on Knowledge Discovery from Data.
12. A. Dhurandhar and A. Dobra. Probabilistic characterization of nearest neighbor classifiers. International Journal of Machine Learning and Cybernetics.
13. W. Fan, H. Wang, P. Yu and S. Ma. Is random model better? On its accuracy and efficiency. In ICDM '03: Proceedings of the Third IEEE International Conference on Data Mining, page 51, Washington, DC, USA, 2003. IEEE Computer Society.
14. P. Geurts, D. Ernst and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006.
15. J. Langford. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research, 6, December 2005.
16. F. Liu, K. Ting and W. Fan. Maximizing tree diversity by building complete-random decision trees. In PAKDD, 2005.
17. D. McAllester. PAC-Bayesian model averaging. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, ACM Press, 1999.
18. D. McAllester. Simplified PAC-Bayesian margin bounds. In COLT, 2003.
19. S. Roy and R. Bose. Simultaneous confidence interval estimation. Annals of Mathematical Statistics, 24(3), 1953.
20. C. Sison and J. Glaz. Simultaneous confidence intervals and sample size determination for multinomial proportions. JASA, 90(429), 1995.
21. Y. Tong. Probabilistic Inequalities for Multivariate Distributions. Academic Press, 1st edition, 1980.
22. K. Zhang and W. Fan. Forecasting skewed biased stochastic ozone days: analyses, solutions and beyond. Knowledge and Information Systems, 14(3), 2008.
23. X. Zhang, Q. Yuan, S. Zhao, W. Fan, W. Zheng and Z. Wang. Multi-label classification without the multi-label cost. In SDM '10: Proceedings of the SIAM Conference on Data Mining, 2010.

Author Biography

Amit Dhurandhar is a research staff member in the Mathematical Sciences Dept. at IBM T.J. Watson. He received his B.E. in computer engineering from Pune University. He then received his Masters and Ph.D. in computer engineering from the University of Florida in 2005 and 2009 respectively. He is the acting Knowledge Discovery and Data Mining Professional Interest Community (KDD PIC) chair at IBM T.J. Watson. Broadly speaking, Amit's research interests primarily span the areas of machine learning, data mining and computational neuroscience. He has authored several papers and has been a reviewer for many top-quality conferences and journals.


More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximatinga function fx, wose values at a set of distinct points x, x, x,, x n are known, by a polynomial P x suc

More information

Combining functions: algebraic methods

Combining functions: algebraic methods Combining functions: algebraic metods Functions can be added, subtracted, multiplied, divided, and raised to a power, just like numbers or algebra expressions. If f(x) = x 2 and g(x) = x + 2, clearly f(x)

More information

ERROR BOUNDS FOR THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BRADLEY J. LUCIER*

ERROR BOUNDS FOR THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BRADLEY J. LUCIER* EO BOUNDS FO THE METHODS OF GLIMM, GODUNOV AND LEVEQUE BADLEY J. LUCIE* Abstract. Te expected error in L ) attimet for Glimm s sceme wen applied to a scalar conservation law is bounded by + 2 ) ) /2 T

More information

2.1 THE DEFINITION OF DERIVATIVE

2.1 THE DEFINITION OF DERIVATIVE 2.1 Te Derivative Contemporary Calculus 2.1 THE DEFINITION OF DERIVATIVE 1 Te grapical idea of a slope of a tangent line is very useful, but for some uses we need a more algebraic definition of te derivative

More information

ch (for some fixed positive number c) reaching c

ch (for some fixed positive number c) reaching c GSTF Journal of Matematics Statistics and Operations Researc (JMSOR) Vol. No. September 05 DOI 0.60/s4086-05-000-z Nonlinear Piecewise-defined Difference Equations wit Reciprocal and Cubic Terms Ramadan

More information

HARMONIC ALLOCATION TO MV CUSTOMERS IN RURAL DISTRIBUTION SYSTEMS

HARMONIC ALLOCATION TO MV CUSTOMERS IN RURAL DISTRIBUTION SYSTEMS HARMONIC ALLOCATION TO MV CUSTOMERS IN RURAL DISTRIBUTION SYSTEMS V Gosbell University of Wollongong Department of Electrical, Computer & Telecommunications Engineering, Wollongong, NSW 2522, Australia

More information

Chapter 2 Ising Model for Ferromagnetism

Chapter 2 Ising Model for Ferromagnetism Capter Ising Model for Ferromagnetism Abstract Tis capter presents te Ising model for ferromagnetism, wic is a standard simple model of a pase transition. Using te approximation of mean-field teory, te

More information

Lecture 15. Interpolation II. 2 Piecewise polynomial interpolation Hermite splines

Lecture 15. Interpolation II. 2 Piecewise polynomial interpolation Hermite splines Lecture 5 Interpolation II Introduction In te previous lecture we focused primarily on polynomial interpolation of a set of n points. A difficulty we observed is tat wen n is large, our polynomial as to

More information

How to Find the Derivative of a Function: Calculus 1

How to Find the Derivative of a Function: Calculus 1 Introduction How to Find te Derivative of a Function: Calculus 1 Calculus is not an easy matematics course Te fact tat you ave enrolled in suc a difficult subject indicates tat you are interested in te

More information

Regularized Regression

Regularized Regression Regularized Regression David M. Blei Columbia University December 5, 205 Modern regression problems are ig dimensional, wic means tat te number of covariates p is large. In practice statisticians regularize

More information

NUMERICAL DIFFERENTIATION. James T. Smith San Francisco State University. In calculus classes, you compute derivatives algebraically: for example,

NUMERICAL DIFFERENTIATION. James T. Smith San Francisco State University. In calculus classes, you compute derivatives algebraically: for example, NUMERICAL DIFFERENTIATION James T Smit San Francisco State University In calculus classes, you compute derivatives algebraically: for example, f( x) = x + x f ( x) = x x Tis tecnique requires your knowing

More information

IEOR 165 Lecture 10 Distribution Estimation

IEOR 165 Lecture 10 Distribution Estimation IEOR 165 Lecture 10 Distribution Estimation 1 Motivating Problem Consider a situation were we ave iid data x i from some unknown distribution. One problem of interest is estimating te distribution tat

More information

MATH745 Fall MATH745 Fall

MATH745 Fall MATH745 Fall MATH745 Fall 5 MATH745 Fall 5 INTRODUCTION WELCOME TO MATH 745 TOPICS IN NUMERICAL ANALYSIS Instructor: Dr Bartosz Protas Department of Matematics & Statistics Email: bprotas@mcmasterca Office HH 36, Ext

More information

Preface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Preface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my course tat I teac ere at Lamar University. Despite te fact tat tese are my class notes, tey sould be accessible to anyone wanting to learn or needing a refreser

More information

Symmetry Labeling of Molecular Energies

Symmetry Labeling of Molecular Energies Capter 7. Symmetry Labeling of Molecular Energies Notes: Most of te material presented in tis capter is taken from Bunker and Jensen 1998, Cap. 6, and Bunker and Jensen 2005, Cap. 7. 7.1 Hamiltonian Symmetry

More information

Problem Solving. Problem Solving Process

Problem Solving. Problem Solving Process Problem Solving One of te primary tasks for engineers is often solving problems. It is wat tey are, or sould be, good at. Solving engineering problems requires more tan just learning new terms, ideas and

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximating a function f(x, wose values at a set of distinct points x, x, x 2,,x n are known, by a polynomial P (x

More information

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note

More information

Consider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx.

Consider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx. Capter 2 Integrals as sums and derivatives as differences We now switc to te simplest metods for integrating or differentiating a function from its function samples. A careful study of Taylor expansions

More information

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties

More information

Time (hours) Morphine sulfate (mg)

Time (hours) Morphine sulfate (mg) Mat Xa Fall 2002 Review Notes Limits and Definition of Derivative Important Information: 1 According to te most recent information from te Registrar, te Xa final exam will be eld from 9:15 am to 12:15

More information

Pre-Calculus Review Preemptive Strike

Pre-Calculus Review Preemptive Strike Pre-Calculus Review Preemptive Strike Attaced are some notes and one assignment wit tree parts. Tese are due on te day tat we start te pre-calculus review. I strongly suggest reading troug te notes torougly

More information

Introduction to Machine Learning. Recitation 8. w 2, b 2. w 1, b 1. z 0 z 1. The function we want to minimize is the loss over all examples: f =

Introduction to Machine Learning. Recitation 8. w 2, b 2. w 1, b 1. z 0 z 1. The function we want to minimize is the loss over all examples: f = Introduction to Macine Learning Lecturer: Regev Scweiger Recitation 8 Fall Semester Scribe: Regev Scweiger 8.1 Backpropagation We will develop and review te backpropagation algoritm for neural networks.

More information

Effect of the Dependent Paths in Linear Hull

Effect of the Dependent Paths in Linear Hull 1 Effect of te Dependent Pats in Linear Hull Zenli Dai, Meiqin Wang, Yue Sun Scool of Matematics, Sandong University, Jinan, 250100, Cina Key Laboratory of Cryptologic Tecnology and Information Security,

More information

EDML: A Method for Learning Parameters in Bayesian Networks

EDML: A Method for Learning Parameters in Bayesian Networks : A Metod for Learning Parameters in Bayesian Networks Artur Coi, Kaled S. Refaat and Adnan Darwice Computer Science Department University of California, Los Angeles {aycoi, krefaat, darwice}@cs.ucla.edu

More information

HOMEWORK HELP 2 FOR MATH 151

HOMEWORK HELP 2 FOR MATH 151 HOMEWORK HELP 2 FOR MATH 151 Here we go; te second round of omework elp. If tere are oters you would like to see, let me know! 2.4, 43 and 44 At wat points are te functions f(x) and g(x) = xf(x)continuous,

More information

7 Semiparametric Methods and Partially Linear Regression

7 Semiparametric Methods and Partially Linear Regression 7 Semiparametric Metods and Partially Linear Regression 7. Overview A model is called semiparametric if it is described by and were is nite-dimensional (e.g. parametric) and is in nite-dimensional (nonparametric).

More information

Introduction to Derivatives

Introduction to Derivatives Introduction to Derivatives 5-Minute Review: Instantaneous Rates and Tangent Slope Recall te analogy tat we developed earlier First we saw tat te secant slope of te line troug te two points (a, f (a))

More information

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems

5 Ordinary Differential Equations: Finite Difference Methods for Boundary Problems 5 Ordinary Differential Equations: Finite Difference Metods for Boundary Problems Read sections 10.1, 10.2, 10.4 Review questions 10.1 10.4, 10.8 10.9, 10.13 5.1 Introduction In te previous capters we

More information

Sin, Cos and All That

Sin, Cos and All That Sin, Cos and All Tat James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 9, 2017 Outline Sin, Cos and all tat! A New Power Rule Derivatives

More information

Financial Econometrics Prof. Massimo Guidolin

Financial Econometrics Prof. Massimo Guidolin CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis

More information

Lab 6 Derivatives and Mutant Bacteria

Lab 6 Derivatives and Mutant Bacteria Lab 6 Derivatives and Mutant Bacteria Date: September 27, 20 Assignment Due Date: October 4, 20 Goal: In tis lab you will furter explore te concept of a derivative using R. You will use your knowledge

More information

SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY

SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY (Section 3.2: Derivative Functions and Differentiability) 3.2.1 SECTION 3.2: DERIVATIVE FUNCTIONS and DIFFERENTIABILITY LEARNING OBJECTIVES Know, understand, and apply te Limit Definition of te Derivative

More information

Learning based super-resolution land cover mapping

Learning based super-resolution land cover mapping earning based super-resolution land cover mapping Feng ing, Yiang Zang, Giles M. Foody IEEE Fellow, Xiaodong Xiuua Zang, Siming Fang, Wenbo Yun Du is work was supported in part by te National Basic Researc

More information

Exam 1 Review Solutions

Exam 1 Review Solutions Exam Review Solutions Please also review te old quizzes, and be sure tat you understand te omework problems. General notes: () Always give an algebraic reason for your answer (graps are not sufficient),

More information

Derivatives. By: OpenStaxCollege

Derivatives. By: OpenStaxCollege By: OpenStaxCollege Te average teen in te United States opens a refrigerator door an estimated 25 times per day. Supposedly, tis average is up from 10 years ago wen te average teenager opened a refrigerator

More information

WYSE Academic Challenge 2004 Sectional Mathematics Solution Set

WYSE Academic Challenge 2004 Sectional Mathematics Solution Set WYSE Academic Callenge 00 Sectional Matematics Solution Set. Answer: B. Since te equation can be written in te form x + y, we ave a major 5 semi-axis of lengt 5 and minor semi-axis of lengt. Tis means

More information

Order of Accuracy. ũ h u Ch p, (1)

Order of Accuracy. ũ h u Ch p, (1) Order of Accuracy 1 Terminology We consider a numerical approximation of an exact value u. Te approximation depends on a small parameter, wic can be for instance te grid size or time step in a numerical

More information

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006

Math 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 Mat 102 TEST CHAPTERS 3 & 4 Solutions & Comments Fall 2006 f(x+) f(x) 10 1. For f(x) = x 2 + 2x 5, find ))))))))) and simplify completely. NOTE: **f(x+) is NOT f(x)+! f(x+) f(x) (x+) 2 + 2(x+) 5 ( x 2

More information

1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible.

1. Which one of the following expressions is not equal to all the others? 1 C. 1 D. 25x. 2. Simplify this expression as much as possible. 004 Algebra Pretest answers and scoring Part A. Multiple coice questions. Directions: Circle te letter ( A, B, C, D, or E ) net to te correct answer. points eac, no partial credit. Wic one of te following

More information

Section 3: The Derivative Definition of the Derivative

Section 3: The Derivative Definition of the Derivative Capter 2 Te Derivative Business Calculus 85 Section 3: Te Derivative Definition of te Derivative Returning to te tangent slope problem from te first section, let's look at te problem of finding te slope

More information

5.1 We will begin this section with the definition of a rational expression. We

5.1 We will begin this section with the definition of a rational expression. We Basic Properties and Reducing to Lowest Terms 5.1 We will begin tis section wit te definition of a rational epression. We will ten state te two basic properties associated wit rational epressions and go

More information

Chapter 5 FINITE DIFFERENCE METHOD (FDM)

Chapter 5 FINITE DIFFERENCE METHOD (FDM) MEE7 Computer Modeling Tecniques in Engineering Capter 5 FINITE DIFFERENCE METHOD (FDM) 5. Introduction to FDM Te finite difference tecniques are based upon approximations wic permit replacing differential

More information

Section 2: The Derivative Definition of the Derivative

Section 2: The Derivative Definition of the Derivative Capter 2 Te Derivative Applied Calculus 80 Section 2: Te Derivative Definition of te Derivative Suppose we drop a tomato from te top of a 00 foot building and time its fall. Time (sec) Heigt (ft) 0.0 00

More information

Chapter 1. Density Estimation

Chapter 1. Density Estimation Capter 1 Density Estimation Let X 1, X,..., X n be observations from a density f X x. Te aim is to use only tis data to obtain an estimate ˆf X x of f X x. Properties of f f X x x, Parametric metods f

More information

1watt=1W=1kg m 2 /s 3

1watt=1W=1kg m 2 /s 3 Appendix A Matematics Appendix A.1 Units To measure a pysical quantity, you need a standard. Eac pysical quantity as certain units. A unit is just a standard we use to compare, e.g. a ruler. In tis laboratory

More information

SECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES

SECTION 1.10: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES (Section.0: Difference Quotients).0. SECTION.0: DIFFERENCE QUOTIENTS LEARNING OBJECTIVES Define average rate of cange (and average velocity) algebraically and grapically. Be able to identify, construct,

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Exercises for numerical differentiation. Øyvind Ryan

Exercises for numerical differentiation. Øyvind Ryan Exercises for numerical differentiation Øyvind Ryan February 25, 2013 1. Mark eac of te following statements as true or false. a. Wen we use te approximation f (a) (f (a +) f (a))/ on a computer, we can

More information

Notes on wavefunctions II: momentum wavefunctions

Notes on wavefunctions II: momentum wavefunctions Notes on wavefunctions II: momentum wavefunctions and uncertainty Te state of a particle at any time is described by a wavefunction ψ(x). Tese wavefunction must cange wit time, since we know tat particles

More information

Cubic Functions: Local Analysis

Cubic Functions: Local Analysis Cubic function cubing coefficient Capter 13 Cubic Functions: Local Analysis Input-Output Pairs, 378 Normalized Input-Output Rule, 380 Local I-O Rule Near, 382 Local Grap Near, 384 Types of Local Graps

More information

Parameter Fitted Scheme for Singularly Perturbed Delay Differential Equations

Parameter Fitted Scheme for Singularly Perturbed Delay Differential Equations International Journal of Applied Science and Engineering 2013. 11, 4: 361-373 Parameter Fitted Sceme for Singularly Perturbed Delay Differential Equations Awoke Andargiea* and Y. N. Reddyb a b Department

More information

The Complexity of Computing the MCD-Estimator

The Complexity of Computing the MCD-Estimator Te Complexity of Computing te MCD-Estimator Torsten Bernolt Lerstul Informatik 2 Universität Dortmund, Germany torstenbernolt@uni-dortmundde Paul Fiscer IMM, Danisc Tecnical University Kongens Lyngby,

More information

2.11 That s So Derivative

2.11 That s So Derivative 2.11 Tat s So Derivative Introduction to Differential Calculus Just as one defines instantaneous velocity in terms of average velocity, we now define te instantaneous rate of cange of a function at a point

More information

The Krewe of Caesar Problem. David Gurney. Southeastern Louisiana University. SLU 10541, 500 Western Avenue. Hammond, LA

The Krewe of Caesar Problem. David Gurney. Southeastern Louisiana University. SLU 10541, 500 Western Avenue. Hammond, LA Te Krewe of Caesar Problem David Gurney Souteastern Louisiana University SLU 10541, 500 Western Avenue Hammond, LA 7040 June 19, 00 Krewe of Caesar 1 ABSTRACT Tis paper provides an alternative to te usual

More information

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab

Te comparison of dierent models M i is based on teir relative probabilities, wic can be expressed, again using Bayes' teorem, in terms of prior probab To appear in: Advances in Neural Information Processing Systems 9, eds. M. C. Mozer, M. I. Jordan and T. Petsce. MIT Press, 997 Bayesian Model Comparison by Monte Carlo Caining David Barber D.Barber@aston.ac.uk

More information

Taylor Series and the Mean Value Theorem of Derivatives

Taylor Series and the Mean Value Theorem of Derivatives 1 - Taylor Series and te Mean Value Teorem o Derivatives Te numerical solution o engineering and scientiic problems described by matematical models oten requires solving dierential equations. Dierential

More information

The Priestley-Chao Estimator

The Priestley-Chao Estimator Te Priestley-Cao Estimator In tis section we will consider te Pristley-Cao estimator of te unknown regression function. It is assumed tat we ave a sample of observations (Y i, x i ), i = 1,..., n wic are

More information

Online Learning: Bandit Setting

Online Learning: Bandit Setting Online Learning: Bandit Setting Daniel asabi Summer 04 Last Update: October 0, 06 Introduction [TODO Bandits. Stocastic setting Suppose tere exists unknown distributions ν,..., ν, suc tat te loss at eac

More information

Continuity and Differentiability of the Trigonometric Functions

Continuity and Differentiability of the Trigonometric Functions [Te basis for te following work will be te definition of te trigonometric functions as ratios of te sides of a triangle inscribed in a circle; in particular, te sine of an angle will be defined to be te

More information

Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm

Probabilistic Graphical Models Homework 1: Due January 29, 2014 at 4 pm Probabilistic Grapical Models 10-708 Homework 1: Due January 29, 2014 at 4 pm Directions. Tis omework assignment covers te material presented in Lectures 1-3. You must complete all four problems to obtain

More information

1 Proving the Fundamental Theorem of Statistical Learning

1 Proving the Fundamental Theorem of Statistical Learning THEORETICAL MACHINE LEARNING COS 5 LECTURE #7 APRIL 5, 6 LECTURER: ELAD HAZAN NAME: FERMI MA ANDDANIEL SUO oving te Fundaental Teore of Statistical Learning In tis section, we prove te following: Teore.

More information

Simulation and verification of a plate heat exchanger with a built-in tap water accumulator

Simulation and verification of a plate heat exchanger with a built-in tap water accumulator Simulation and verification of a plate eat excanger wit a built-in tap water accumulator Anders Eriksson Abstract In order to test and verify a compact brazed eat excanger (CBE wit a built-in accumulation

More information

Fast Exact Univariate Kernel Density Estimation

Fast Exact Univariate Kernel Density Estimation Fast Exact Univariate Kernel Density Estimation David P. Hofmeyr Department of Statistics and Actuarial Science, Stellenbosc University arxiv:1806.00690v2 [stat.co] 12 Jul 2018 July 13, 2018 Abstract Tis

More information

DIGRAPHS FROM POWERS MODULO p

DIGRAPHS FROM POWERS MODULO p DIGRAPHS FROM POWERS MODULO p Caroline Luceta Box 111 GCC, 100 Campus Drive, Grove City PA 1617 USA Eli Miller PO Box 410, Sumneytown, PA 18084 USA Clifford Reiter Department of Matematics, Lafayette College,

More information

Material for Difference Quotient

Material for Difference Quotient Material for Difference Quotient Prepared by Stepanie Quintal, graduate student and Marvin Stick, professor Dept. of Matematical Sciences, UMass Lowell Summer 05 Preface Te following difference quotient

More information

Long Term Time Series Prediction with Multi-Input Multi-Output Local Learning

Long Term Time Series Prediction with Multi-Input Multi-Output Local Learning Long Term Time Series Prediction wit Multi-Input Multi-Output Local Learning Gianluca Bontempi Macine Learning Group, Département d Informatique Faculté des Sciences, ULB, Université Libre de Bruxelles

More information

Finding and Using Derivative The shortcuts

Finding and Using Derivative The shortcuts Calculus 1 Lia Vas Finding and Using Derivative Te sortcuts We ave seen tat te formula f f(x+) f(x) (x) = lim 0 is manageable for relatively simple functions like a linear or quadratic. For more complex

More information

Profit Based Unit Commitment in Deregulated Electricity Markets Using A Hybrid Lagrangian Relaxation - Particle Swarm Optimization Approach

Profit Based Unit Commitment in Deregulated Electricity Markets Using A Hybrid Lagrangian Relaxation - Particle Swarm Optimization Approach Profit Based Unit Commitment in Deregulated Electricity Markets Using A Hybrid Lagrangian Relaxation - Particle Swarm Optimization Approac Adline K. Bikeri, Cristoper M. Maina and Peter K. Kiato Proceedings

More information

Optimal parameters for a hierarchical grid data structure for contact detection in arbitrarily polydisperse particle systems

Optimal parameters for a hierarchical grid data structure for contact detection in arbitrarily polydisperse particle systems Comp. Part. Mec. 04) :357 37 DOI 0.007/s4057-04-000-9 Optimal parameters for a ierarcical grid data structure for contact detection in arbitrarily polydisperse particle systems Dinant Krijgsman Vitaliy

More information

Quantum Numbers and Rules

Quantum Numbers and Rules OpenStax-CNX module: m42614 1 Quantum Numbers and Rules OpenStax College Tis work is produced by OpenStax-CNX and licensed under te Creative Commons Attribution License 3.0 Abstract Dene quantum number.

More information

158 Calculus and Structures

158 Calculus and Structures 58 Calculus and Structures CHAPTER PROPERTIES OF DERIVATIVES AND DIFFERENTIATION BY THE EASY WAY. Calculus and Structures 59 Copyrigt Capter PROPERTIES OF DERIVATIVES. INTRODUCTION In te last capter you

More information

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II

Excursions in Computing Science: Week v Milli-micro-nano-..math Part II Excursions in Computing Science: Week v Milli-micro-nano-..mat Part II T. H. Merrett McGill University, Montreal, Canada June, 5 I. Prefatory Notes. Cube root of 8. Almost every calculator as a square-root

More information

REVIEW LAB ANSWER KEY

REVIEW LAB ANSWER KEY REVIEW LAB ANSWER KEY. Witout using SN, find te derivative of eac of te following (you do not need to simplify your answers): a. f x 3x 3 5x x 6 f x 3 3x 5 x 0 b. g x 4 x x x notice te trick ere! x x g

More information

APPLICATION OF A DIRAC DELTA DIS-INTEGRATION TECHNIQUE TO THE STATISTICS OF ORBITING OBJECTS

APPLICATION OF A DIRAC DELTA DIS-INTEGRATION TECHNIQUE TO THE STATISTICS OF ORBITING OBJECTS APPLICATION OF A DIRAC DELTA DIS-INTEGRATION TECHNIQUE TO THE STATISTICS OF ORBITING OBJECTS Dario Izzo Advanced Concepts Team, ESTEC, AG Noordwijk, Te Neterlands ABSTRACT In many problems related to te

More information

THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS. L. Trautmann, R. Rabenstein

THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS. L. Trautmann, R. Rabenstein Worksop on Transforms and Filter Banks (WTFB),Brandenburg, Germany, Marc 999 THE STURM-LIOUVILLE-TRANSFORMATION FOR THE SOLUTION OF VECTOR PARTIAL DIFFERENTIAL EQUATIONS L. Trautmann, R. Rabenstein Lerstul

More information

Differentiation. Area of study Unit 2 Calculus

Differentiation. Area of study Unit 2 Calculus Differentiation 8VCE VCEco Area of stud Unit Calculus coverage In tis ca 8A 8B 8C 8D 8E 8F capter Introduction to limits Limits of discontinuous, rational and brid functions Differentiation using first

More information

LIMITS AND DERIVATIVES CONDITIONS FOR THE EXISTENCE OF A LIMIT

LIMITS AND DERIVATIVES CONDITIONS FOR THE EXISTENCE OF A LIMIT LIMITS AND DERIVATIVES Te limit of a function is defined as te value of y tat te curve approaces, as x approaces a particular value. Te limit of f (x) as x approaces a is written as f (x) approaces, as

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

Blueprint Algebra I Test

Blueprint Algebra I Test Blueprint Algebra I Test Spring 2003 2003 by te Commonwealt of Virginia Department of Education, James Monroe Building, 101 N. 14t Street, Ricmond, Virginia, 23219. All rigts reserved. Except as permitted

More information

MAT244 - Ordinary Di erential Equations - Summer 2016 Assignment 2 Due: July 20, 2016

MAT244 - Ordinary Di erential Equations - Summer 2016 Assignment 2 Due: July 20, 2016 MAT244 - Ordinary Di erential Equations - Summer 206 Assignment 2 Due: July 20, 206 Full Name: Student #: Last First Indicate wic Tutorial Section you attend by filling in te appropriate circle: Tut 0

More information

Spike train entropy-rate estimation using hierarchical Dirichlet process priors

Spike train entropy-rate estimation using hierarchical Dirichlet process priors publised in: Advances in Neural Information Processing Systems 26 (23), 276 284. Spike train entropy-rate estimation using ierarcical Diriclet process priors Karin Knudson Department of Matematics kknudson@mat.utexas.edu

More information

Solution for the Homework 4

Solution for the Homework 4 Solution for te Homework 4 Problem 6.5: In tis section we computed te single-particle translational partition function, tr, by summing over all definite-energy wavefunctions. An alternative approac, owever,

More information

Math 34A Practice Final Solutions Fall 2007

Math 34A Practice Final Solutions Fall 2007 Mat 34A Practice Final Solutions Fall 007 Problem Find te derivatives of te following functions:. f(x) = 3x + e 3x. f(x) = x + x 3. f(x) = (x + a) 4. Is te function 3t 4t t 3 increasing or decreasing wen

More information

New Fourth Order Quartic Spline Method for Solving Second Order Boundary Value Problems

New Fourth Order Quartic Spline Method for Solving Second Order Boundary Value Problems MATEMATIKA, 2015, Volume 31, Number 2, 149 157 c UTM Centre for Industrial Applied Matematics New Fourt Order Quartic Spline Metod for Solving Second Order Boundary Value Problems 1 Osama Ala yed, 2 Te

More information

FUNDAMENTAL ECONOMICS Vol. I - Walrasian and Non-Walrasian Microeconomics - Anjan Mukherji WALRASIAN AND NON-WALRASIAN MICROECONOMICS

FUNDAMENTAL ECONOMICS Vol. I - Walrasian and Non-Walrasian Microeconomics - Anjan Mukherji WALRASIAN AND NON-WALRASIAN MICROECONOMICS FUNDAMENTAL ECONOMICS Vol. I - Walrasian and Non-Walrasian Microeconomics - Anjan Mukerji WALRASIAN AND NON-WALRASIAN MICROECONOMICS Anjan Mukerji Center for Economic Studies and Planning, Jawaarlal Neru

More information

Robotic manipulation project

Robotic manipulation project Robotic manipulation project Bin Nguyen December 5, 2006 Abstract Tis is te draft report for Robotic Manipulation s class project. Te cosen project aims to understand and implement Kevin Egan s non-convex

More information