Transform Regression and the Kolmogorov Superposition Theorem


Edwin Pednault
IBM T. J. Watson Research Center, Kitchawan Road, P.O. Box 218, Yorktown Heights, NY 10598 USA
pednault@us.ibm.com

Abstract

This paper presents a new predictive modeling algorithm that draws inspiration from the Kolmogorov superposition theorem. An initial version of the algorithm is presented that combines gradient boosting, generalized additive models, and decision-tree methods to construct models that have the same overall mathematical structure as Kolmogorov's superposition equation. Improvements to the algorithm are then presented that significantly increase its rate of convergence. The resulting algorithm, dubbed transform regression, generates surprisingly good models compared to those produced by the underlying decision-tree method when the latter is applied directly. Transform regression is highly scalable, and a parallelized database-embedded version of the algorithm has been implemented as part of IBM DB2 Intelligent Miner Modeling.

Keywords: Gradient boosting, Generalized additive modeling, Decision trees.

1. Introduction

In many respects, decision trees and neural networks represent diametrically opposed classes of learning techniques. A strength of one is often a weakness of the other. Decision-tree methods approximate response surfaces by segmenting the input space into regions and using simple models within each region for local surface approximation. The strengths of decision-tree methods are that they are nonparametric, fully automated, and computationally efficient. Their weakness is that statistical estimation errors increase with the depth of trees, which ultimately limits the granularity of the surface approximation that can be achieved for fixed-size data. In contrast, neural network methods fit highly flexible families of nonlinear parametric functions to entire surfaces to construct global approximations. The strength of this approach is that it avoids the increase in estimation error that accompanies segmentation and local model fitting. The weakness is that fitting nonlinear parametric functions to data is computationally demanding, and these demands are exacerbated by the fact that several network architectures often need to be trained and evaluated in order to maximize predictive accuracy.

This paper presents a new modeling approach that attempts to combine the strengths of the methods described above: specifically, the global fitting aspect of neural networks with the automated, computationally efficient, and nonparametric aspects of decision trees. To achieve this union, this new modeling method draws inspiration from the Kolmogorov superposition theorem:

Theorem (Kolmogorov, 1957). For every integer dimension d >= 2, there exist continuous real functions h_ij(x) defined on the unit interval U = [0,1], such that for every continuous real function f(x_1,...,x_d) defined on the d-dimensional unit hypercube U^d, there exist real continuous functions g_i(x) such that

    f(x_1, ..., x_d) = Σ_{i=1}^{2d+1} g_i( Σ_{j=1}^{d} h_ij(x_j) ).

Stronger versions of this theorem have also been reported (Lorentz, 1962; Sprecher, 1965). The theorem is interesting because it states that even the most complex multivariate functions can be decomposed into combinations of univariate functions, thereby enabling cross-product interactions to be modeled without introducing cross-product terms in the model. Hecht-Nielsen (1987) has noted that the superposition equation can be interpreted as a three-layer neural network and has suggested using the theorem as a basis for understanding multilayer neural networks.
Girosi and Poggio (1989), on the other hand, have criticized this suggestion for several reasons, one being that applying Kolmogorov's theorem would require the inductive learning of nonparametric activation functions. Neural network methods, by contrast, usually assume that the activation functions are given and the problem is to learn values for the weights that appear in the networks. Although the usual paradigm for training weights can be extended to incorporate the learning of smooth parametric activation functions (i.e., by including their parameters in the partial derivatives that are calculated during training), the incorporation of nonparametric learning methods into the training paradigm was seen as problematic.

Nonparametric learning, on the other hand, is a key strength of decision-tree methods. The learning of nonparametric activation functions thus provides a starting point for combining the global-fitting aspects of neural network methods with the nonparametric learning aspects of decision-tree methods. In the sections that follow, an initial algorithm is presented and then subsequently refined that uses decision-tree methods to inductively learn instantiations of the g_i and h_ij functions that appear in Kolmogorov's superposition equation so as to make the equation a good predictor of underlying response surfaces. In this respect, the initial algorithm and its refinement are inspired by, but are not mathematically based upon, Kolmogorov's theorem.

The g_i and h_ij functions that are created by the algorithms presented here are quite different from those that are constructed in the various proofs of Kolmogorov's theorem and its variants. The latter are highly nonsmooth fractal functions that in some respects are comparable to hashing functions (Girosi and Poggio, 1989). Moreover, Kolmogorov's theorem requires that the h_ij functions be universal for a given dimension d; that is, the h_ij functions are fixed for each dimension d and only the g_i functions depend on the specific function f. The initial algorithm presented below, on the other hand, heuristically constructs both g_i and h_ij functions that depend on training data.

It is important to note that neither universality nor the specific functional forms of the g_i and h_ij functions that appear in the various proofs of Kolmogorov's theorem are necessary in order to satisfy the superposition equation. For example, consider the function f(x,y) = xy. This function can be rewritten in superpositional form as

    f(x,y) = 0.25(x + y)^2 - 0.25(x - y)^2.

In this case, h_11(x) = h_21(x) = x, h_12(y) = y, h_22(y) = -y, h_ij(z) = 0 for i > 2, g_1(z) = 0.25z^2, g_2(z) = -0.25z^2, and g_i(z) = 0 for i > 2. Although these particular g_i and h_ij functions satisfy the superposition equation, they do not satisfy the preamble of the theorem because the above h_ij functions are not universal for all continuous functions f(x,y). Nor do the above g_i and h_ij functions at all resemble the g_i and h_ij functions that are constructed in the proofs of Kolmogorov's theorem and its variants. In general, for any given function f(x_1,...,x_d), there can exist a wide range of g_i and h_ij functions that satisfy the superposition equation without satisfying the preamble of the theorem.

Taking the above observations to heart, the initial algorithm presented below likewise ignores the preamble of Kolmogorov's theorem and instead focuses on the mathematical structure of the superposition equation itself. Decision-tree methods and greedy search heuristics are used to construct g_i and h_ij functions based on training data in an attempt to make the superposition equation a good predictor. The approach contrasts with previous work on direct application of the superposition theorem (Neruda, Štědrý, & Drkošová, 2000; Sprecher, 1996, 1997, 2002). One difficulty with direct application is that the g_i and h_ij functions that need to be constructed are extremely complex and entail very large computational overheads to implement, even when the target function is known (Neruda, Štědrý, & Drkošová, 2000). Another problem is that noisy data is highly problematic. The approach presented here avoids both these issues, but it is heuristic in nature.
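As a quick numerical sanity check of the f(x,y) = xy example above, the following NumPy sketch (the variable names are ours) evaluates the superpositional form on random points and confirms that it reproduces xy up to round-off error:

```python
import numpy as np

# Superpositional form of f(x, y) = x*y given above:
#   inner sums: h_11(x) + h_12(y) = x + y,   h_21(x) + h_22(y) = x - y
#   outer functions: g_1(z) = 0.25*z**2,     g_2(z) = -0.25*z**2
g1 = lambda z: 0.25 * z ** 2
g2 = lambda z: -0.25 * z ** 2

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1000)
y = rng.uniform(0.0, 1.0, size=1000)

superposition = g1(x + y) + g2(x - y)      # sum of univariate functions of univariate sums
assert np.allclose(superposition, x * y)   # equals xy to within round-off error
```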
Although there are no mathematical guarantees of obtaining good predictive models using this approach, the initial algorithm and its refinements nevertheless produce very good results in practice. Thus, one of the contributions of this paper is to demonstrate that the mathematical form of the superposition theorem is interesting in and of itself, and can be heuristically exploited to obtain good predictive models.

The initial algorithm is based on heuristically interpreting Kolmogorov's superposition equation as a gradient boosting model (Friedman, 2001, 2002) in which the base learner constructs generalized additive models (Hastie & Tibshirani, 1990) whose outputs are then nonlinearly transformed to remove systematic errors in their residuals. To provide the necessary background to motivate this interpretation, gradient boosting and generalized additive modeling are first briefly overviewed in Sections 2 and 3. The initial algorithm is then presented in Section 4. The initial algorithm, however, has very poor convergence properties. Sections 5 and 6 therefore present improvements to the initial algorithm to obtain much faster rates of convergence, yielding a new algorithm called transform regression. Faster convergence is achieved by modifying Friedman's gradient boosting framework so as to introduce a nonlinear form of Gram-Schmidt orthogonalization. The modification requires generalizing the mathematical forms of the models to allow constrained multivariate g_i and h_ij functions to appear in the resulting models in order to implement the orthogonalization method. The resulting models thus depart from the pure univariate form of Kolmogorov's superposition equation, but the benefit is significantly improved convergence properties. The nonlinear Gram-Schmidt orthogonalization technique is another contribution of this paper, since it can be combined with other gradient boosting algorithms to obtain similar benefits, such as Friedman's (2001, 2002) gradient tree boosting algorithm.

Section 7 presents evaluation results that compare the performance of the transform regression algorithm to the underlying decision-tree method that is employed. In the evaluation study, transform regression often produced better predictive models than the underlying decision-tree method when the latter was applied directly. This result is interesting because transform regression uses decision trees in a highly constrained manner. Section 8 discusses some of the details of a parallelized database-embedded implementation of transform regression that was developed for IBM DB2 Intelligent Miner Modeling. Section 9 presents conclusions and discusses possible directions for future work.

2. Gradient boosting

Gradient boosting (Friedman, 2001, 2002) is a method for making iterative improvements to regression-based predictive models. The method is similar to gradient descent except that, instead of calculating gradient directions in parameter space, a learning algorithm (called the base learner) is used to estimate gradient directions in function space. Whereas with conventional gradient descent each iteration contributes an additive update to the current vector of parameter values, with gradient boosting each iteration contributes an additive update to the current regression model. When the learning objective is to minimize total square error, gradient boosting is equivalent to Jiang's LSBoost.Reg algorithm (Jiang, 2001, 2002) and to Mallat and Zhang's matching pursuit algorithm (Mallat and Zhang, 1993).

In the special case of square-error loss, the gradient directions in function space that are estimated by the base learner are models that predict the residuals of the current overall model. Model updating is then accomplished by summing the output of the current model with the output of the model for predicting the residuals, which has the effect of adding correction terms to the current model to improve its accuracy. The resulting gradient boosting algorithm is summarized in Table 1. If the base learner is able to perform a true least-squares fit on the residuals at each iteration, then the multiplying scalar α will always be equal to one. In the general case of an arbitrary loss function, the residual error (a.k.a. pseudo-residuals) at each iteration is equal to the negative partial derivative of the loss function with respect to the model output for each data record. In this more general setting, line search is usually needed to optimize the value of α.

Table 1. The gradient boosting method for minimizing square error.
    Let the current model M be zero everywhere;
    Repeat until the current model M does not appreciably change:
        Use the base learner to construct a model R that predicts the residual error of M, ensuring that R does not overfit the data;
        Find a value for scalar α that minimizes the loss (i.e., total square error) for the model M + αR;
        Update M ← M + αR;

When applying gradient boosting, overfitting can be an issue and some means for preventing overfitting must be employed (Friedman, 2001, 2002; Jiang, 2001, 2002). For the algorithms presented in this paper, a portion of the training data is reserved as a holdout set which the base learner employs at each iteration to perform model selection for the residual models. Because it is also possible to overfit the data by adding too many boosting stages (Jiang, 2001, 2002), this same holdout set is also used to prune the number of gradient boosting stages. The algorithms continue to add gradient boosting stages until a point is reached at which either the net improvement obtained on the training data as a result of an iteration falls below a preset threshold, or the prediction error on the holdout set has increased steadily for a preset number of iterations. The current model is then pruned back to the boosting iteration that maximizes predictive accuracy on the holdout set.
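To make Table 1 concrete, the following is a minimal NumPy sketch of the square-error gradient boosting loop; the fit_base_learner callback, the closed-form line search, and the simple stopping rule are simplifications of ours, and the holdout-based stage pruning described above is omitted:

```python
import numpy as np

def gradient_boost(X, y, fit_base_learner, max_stages=50, tol=1e-6):
    """Gradient boosting for square-error loss, along the lines of Table 1.

    fit_base_learner(X, residuals) must return a callable model: predict(X) -> array.
    """
    stages = []
    prediction = np.zeros(len(y))          # current model M, zero everywhere
    prev_sse = np.inf
    for _ in range(max_stages):
        residuals = y - prediction         # pseudo-residuals for square-error loss
        R = fit_base_learner(X, residuals) # base-learner model of the residuals
        r_pred = R(X)
        # closed-form line search for alpha under square-error loss
        alpha = np.dot(residuals, r_pred) / max(np.dot(r_pred, r_pred), 1e-12)
        prediction += alpha * r_pred       # update M <- M + alpha * R
        stages.append((alpha, R))
        sse = np.sum((y - prediction) ** 2)
        if prev_sse - sse < tol:           # stop when the improvement is negligible
            break
        prev_sse = sse
    return stages
```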
3. Generalized additive models

Generalized additive modeling (Hastie & Tibshirani, 1990) is a method for constructing regression models of the form

    ỹ = ȳ + Σ_{j=1}^{d} h_j(x_j).

Hastie and Tibshirani's backfitting algorithm is typically used to perform this modeling task. Backfitting assumes the availability of a learning algorithm called a smoother for estimating univariate functions. Traditional examples of smoothers include classical smoothing algorithms that use kernel functions to calculate weighted averages of training data, where the center of the kernel is the point at which the univariate function is to be estimated and the weights are given by the shapes of the kernel functions. However, in general, any learning algorithm can be used as a smoother, including decision-tree methods. An attractive aspect of decision-tree methods is that they can explicitly handle both numerical and categorical input features, whereas classical kernel smoothers require preprocessing to construct numerical encodings of categorical features.

With the backfitting algorithm, the value of ȳ would first be set to the mean of the target variable, and a smoother would then be applied to successive input variables x_j in round-robin fashion to iteratively (re)estimate the h_j functions until convergence is achieved. The resulting algorithm is summarized in Table 2. Overfitting and loop termination can be handled in a similar fashion as for gradient boosting. The initialization of the h_j functions can be accomplished by setting them to be zero everywhere.

Table 2. The backfitting algorithm.
    Set ȳ equal to the mean of the target variable y;
    For j = 1,...,d, initialize the functions h_j(x_j);
    Repeat until the functions h_j(x_j) do not appreciably change:
        For j = 1,...,d, do the following:
            Use the smoother to construct a model H_j(x_j) that predicts the following target value using only input feature x_j:
                new target = y - ȳ - Σ_{k≠j} h_k(x_k);
            Update h_j(x_j) ← H_j(x_j);

Table 3. A greedy one-pass additive modeling algorithm.
    Set ȳ equal to the mean of the target variable y;
    For j = 1,...,d, use the smoother to construct a model H_j(x_j) that predicts the target value (y - ȳ) using only input feature x_j;
    Calculate linear regression coefficients λ_j such that Σ_j λ_j H_j(x_j) is a best predictor of (y - ȳ);
    For j = 1,...,d, set h_j(x_j) = λ_j H_j(x_j);

Alternatively, one can obtain very good initial estimates by applying a greedy one-pass approximation to backfitting that independently applies the smoother to each input x_j and then combines the resulting models using linear regression. This one-pass additive modeling algorithm is summarized in Table 3. Remarkably, the greedy one-pass algorithm shown in Table 3 can often produce surprisingly good models in practice without additional iterative backfitting. The one-pass algorithm also has the advantage that overfitting can be controlled in the linear regression calculation, either via feature selection or by applying a regularization method. This overfitting control is available in addition to overfitting controls that may be provided by the smoother. Examples of the latter include kernel-width parameters in the case of classical kernel smoothers and tree pruning in the case of decision-tree methods.

4. An initial algorithm

As mentioned earlier, inspiration for the initial algorithm presented below is based on interpreting Kolmogorov's superposition equation as a gradient boosting model in which the base learner constructs generalized additive models whose outputs are then nonlinearly transformed to remove systematic errors in their residuals. To motivate this interpretation, suppose that we are trying to infer a predictive model for y as a function of inputs x_1,...,x_d given a set of noisy training data {(x_1,...,x_d, y)}. As a first attempt, we might try constructing a generalized additive model of the form

    ỹ = ȳ + Σ_{j=1}^{d} h_j(x_j) = Σ_{j=1}^{d} ĥ_j(x_j),    (1)

where

    ĥ_j(x_j) = h_j(x_j) + ȳ/d.    (2)

This modeling task can be performed by first applying either the backfitting algorithm shown in Table 2 or the greedy one-pass algorithm shown in Table 3, and then distributing the additive constant ȳ that is obtained equally among the transformed inputs as per Equation 2. Independent of whether backfitting or the greedy one-pass algorithm is applied, residual nonlinearities can still exist in the relationship between the additive model output ỹ and the target value y. To remove such nonlinearities, the same smoother used for additive modeling can again be applied, this time to linearize ỹ with respect to y. The resulting combined model would then have the form

    ŷ = g(ỹ) = g( Σ_{j=1}^{d} ĥ_j(x_j) ).    (3)

To further improve the model, gradient boosting can be applied by using the above two-stage modeling technique as the base learner. The resulting gradient boosting model would then have the form

    ỹ_i = Σ_{j=1}^{d} ĥ_ij(x_j) = Σ_{j=1}^{d} ( h_ij(x_j) + ȳ_i/d ),    (4a)

    ŷ_i = g_i(ỹ_i) = g_i( Σ_{j=1}^{d} ĥ_ij(x_j) ),    (4b)

    ŷ = Σ_i ŷ_i = Σ_i g_i( Σ_{j=1}^{d} ĥ_ij(x_j) ).    (4c)

Equations 4a and 4b define the stages in the resulting gradient boosting model. Equation 4a defines the generalized additive models ỹ_i that are constructed in each boosting stage, while Equation 4b defines the boosting stage outputs ŷ_i. Equation 4c defines the output ŷ of the overall model, which is the sum of the boosting stage outputs ŷ_i.
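The one-pass procedure of Table 3 is the building block used in every boosting stage below; here is a minimal NumPy sketch of it, in which the fit_smoother callback stands in for an arbitrary smoother and plain least squares replaces the stepwise regression used later in the paper:

```python
import numpy as np

def one_pass_additive_model(X, y, fit_smoother):
    """Greedy one-pass additive modeling along the lines of Table 3.

    fit_smoother(x_j, target) must return a callable univariate model H_j.
    """
    y_bar = y.mean()                                         # additive constant
    smoothers = [fit_smoother(X[:, j], y - y_bar) for j in range(X.shape[1])]
    H = np.column_stack([H_j(X[:, j]) for j, H_j in enumerate(smoothers)])
    # coefficients lambda_j so that sum_j lambda_j * H_j(x_j) best predicts (y - y_bar)
    lam, *_ = np.linalg.lstsq(H, y - y_bar, rcond=None)

    def predict(X_new):
        H_new = np.column_stack([H_j(X_new[:, j]) for j, H_j in enumerate(smoothers)])
        return y_bar + H_new @ lam    # equivalent to distributing y_bar as in Equation 2
    return predict
```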

Figure 1. An example of the modeling behavior of the initial algorithm. (a) The target function. (b) The model output after one gradient boosting stage. (c) After two boosting stages. (d) After ten boosting stages.

The resulting algorithm will thus generate predictive models that have the same mathematical form as Kolmogorov's superposition equation. The algorithm itself is summarized in Table 4.

Table 4. The initial algorithm.
    Let the current model M be zero everywhere;
    Repeat until the current model M does not appreciably change:
        At the i-th iteration, construct an additive model ỹ_i that predicts the residual error of M using the raw features x_1,...,x_d as input, ensuring that ỹ_i does not overfit the data;
        Construct an additive model ŷ_i that predicts the residual error of M using the output of model ỹ_i as the only input feature, ensuring that ŷ_i does not overfit the data;
        Update M ← M + ŷ_i;

To demonstrate the behavior of the initial algorithm, the ProbE linear regression tree (LRT) algorithm (Natarajan & Pednault, 2002) was used as the smoother in combination with the one-pass greedy additive modeling algorithm shown in Table 3. The LRT algorithm constructs decision trees with multivariate linear regression models in the leaves. To prevent overfitting, LRT employs a combination of tree pruning and stepwise linear regression techniques to select both the size of the resulting tree and the variables that appear in the leaf models. Predictive accuracy on a holdout data set is used as the basis for making this selection. In its use as a smoother, the LRT algorithm was configured to construct splits only on the feature being transformed. In addition, in the case of numerical features, the feature being smoothed was also allowed to appear as a regression variable in the leaf models. The resulting h_ij functions are thus piecewise linear functions for numerical features x_j, and piecewise constant functions for categorical features x_j. For the linear regression operation in Table 3, forward stepwise regression was used for feature selection, and the same holdout set used by LRT for tree pruning was used for feature pruning to prevent overfitting.
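As a rough stand-in for this use of LRT as a smoother, the sketch below fits a piecewise-linear transform of a single numerical feature by splitting on quantile bins and fitting a line within each bin; the binning rule and bin count are our simplifications, not ProbE/LRT behavior. A function produced this way can be passed as fit_smoother to the one-pass sketch above.

```python
import numpy as np

def piecewise_linear_smoother(x, target, n_bins=8):
    """Fit a piecewise-linear transform h(x): split only on the feature being
    transformed (here via quantile bins) and fit a small linear model of that
    same feature within each bin."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    models = []
    for b in range(n_bins):
        mask = (x >= edges[b]) & (x < edges[b + 1])
        if mask.sum() >= 2 and np.ptp(x[mask]) > 0:
            slope, intercept = np.polyfit(x[mask], target[mask], deg=1)
        elif mask.any():                       # degenerate bin: fall back to a constant
            slope, intercept = 0.0, float(target[mask].mean())
        else:                                  # empty bin (possible with heavily tied data)
            slope, intercept = 0.0, float(target.mean())
        models.append((slope, intercept))
    slopes = np.array([m[0] for m in models])
    intercepts = np.array([m[1] for m in models])

    def transform(x_new):
        b = np.clip(np.searchsorted(edges, x_new, side="right") - 1, 0, n_bins - 1)
        return slopes[b] * x_new + intercepts[b]   # piecewise-linear h(x)
    return transform
```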

For illustration purposes, synthetic training data was generated using the following target function:

    z = f(x, y) = x + y + sin(2πx) sin(2πy).    (5)

Data was generated by sampling the above function in the region x, y ∈ [0,1] at grid increments of 0.01. This data was then subsampled at increments of 0.1 to create a test set, with the remaining data randomly divided into a training set and a holdout set. Figure 1 illustrates the above target function and the predictions on the test set after one, two, and ten boosting stages. As can be seen in Figure 1, the algorithm is able to model the cross-product interaction expressed in Equation 5, but the convergence of the algorithm is very slow. Even after ten boosting stages, the root mean square error is 0.239, which is quite large given that no noise was added to the training data. Nevertheless, Figure 1 does illustrate the appeal of the Kolmogorov superposition theorem, which is its implication that cross-product interactions can be modeled without explicitly introducing cross-product terms into the model.

Figure 2 illustrates how the initial algorithm is able to accomplish the same solely by making use of the mathematical structure of the superposition equation. Figures 2a and 2b show scatter plots of the test data as viewed along the x and y input features, respectively. Also plotted in Figures 2a and 2b as solid curves are the feature transformations ĥ_x(x) and ĥ_y(y), respectively, that were constructed from the x and y inputs. Figure 2c shows a scatter plot of the test data as viewed along the additive model output ỹ_1, together with the output of the g_1(ỹ_1) function plotted as a solid curve.

As can be seen in Figures 2a and 2b, the first-stage feature transformations ĥ_x(x) and ĥ_y(y) extract only the linear terms in the target function. From the point of view of these transformations, the cross-product relationship appears as heteroskedastic noise. However, as shown in Figure 2c, from the point of view of the additive model output ỹ_1, the cross-product relationship appears as residual systematic error together with lower heteroskedastic noise. This residual systematic error is modeled by the g_1(ỹ_1) transformation of the first boosting stage. The resulting output produces the first approximation of the cross-product interaction shown in Figure 1b. As this example illustrates, the nonlinear transformations g_i in Equation 4b (and in Kolmogorov's theorem) have the effect of modeling cross-product interactions without requiring that explicit cross-product terms be introduced into the models.

Figure 2. The test data as seen from various points in the first boosting stage. (a) Test data and ĥ_x(x) plotted against the x axis. (b) Test data and ĥ_y(y) plotted against the y axis. (c) Test data and g_1(ỹ_1) plotted against the derived ỹ_1 axis.
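For readers who want to reproduce this setup, the following NumPy sketch regenerates the Equation 5 grid data; the exact train/holdout proportions are not stated above, so the 80/20 split here is an assumption of ours:

```python
import numpy as np

grid = np.round(np.arange(0.0, 1.0 + 1e-9, 0.01), 2)            # x, y in [0, 1] at increments of 0.01
xx, yy = np.meshgrid(grid, grid, indexing="ij")
zz = xx + yy + np.sin(2 * np.pi * xx) * np.sin(2 * np.pi * yy)   # Equation 5, no added noise

ii, jj = np.meshgrid(np.arange(grid.size), np.arange(grid.size), indexing="ij")
is_test = ((ii % 10 == 0) & (jj % 10 == 0)).ravel()              # subsample at increments of 0.1

points = np.column_stack([xx.ravel(), yy.ravel(), zz.ravel()])
test, rest = points[is_test], points[~is_test]

rng = np.random.default_rng(0)
rng.shuffle(rest)
split = int(0.8 * len(rest))                                     # 80/20 train/holdout (assumed)
train, holdout = rest[:split], rest[split:]
```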

5. Orthogonalized, feed-forward gradient boosting

A necessary condition that must be satisfied to maximize the rate of convergence of gradient boosting for square-error loss is that the boosting stage outputs must be mutually orthogonal. To see why, let R_j be the output of the j-th boosting stage. Then the current model M_i obtained after i boosting iterations is given by

    M_i = Σ_{j=1}^{i} α_j R_j.

If some of the boosting stage outputs are not mutually orthogonal, then under normal circumstances there will exist coefficients λ_j such that the model M′_i given by

    M′_i = Σ_{j=1}^{i} λ_j α_j R_j

will have a strictly smaller square error on the training data than model M_i. These coefficients can be estimated by performing a linear regression of the boosting stage outputs. Additional boosting iterations would therefore need to be performed on model M_i just to match the square error of M′_i. Hence, the rate of convergence will be suboptimal in this case.

To illustrate, suppose the base learner performs a univariate linear regression using one and only one input, always picking the input that yields the smallest square error. Gradient boosting applied to this base learner produces an iterative algorithm for performing multivariate linear regression that is essentially equivalent to coordinate descent. Suppose further that we have a training set comprising two inputs, x and y, and a target value f(x,y) = x + y with no noise added. If the inputs are orthogonal (i.e., if the dot product between x and y is zero), then only two boosting iterations will be needed to fit the target function to within round-off error. However, if the inputs are not orthogonal, more than two iterations will be needed, and the rate of convergence will decrease as the degree to which they are not orthogonal increases (i.e., as the projection of one input data vector onto the other increases).

In order to improve the rate of convergence, the gradient boosting method needs to be modified so as to increase the degree of orthogonality between boosting stages. One obvious approach would be to apply Gram-Schmidt orthogonalization to the boosting stage outputs. Gram-Schmidt orthogonalization is a technique used in QR decomposition and related linear regression algorithms. Its purpose is to convert simple coordinate descent into an optimal search method for linear least-squares optimization by modifying the coordinate directions of successive regressors. In the case of gradient boosting, the coordinate directions are defined by the boosting stage outputs. However, the usual Gram-Schmidt orthogonalization procedure assumes that all coordinates are specified up front at the start of the procedure. Gradient boosting, on the other hand, constructs coordinates dynamically in a stagewise fashion, so an incremental orthogonalization procedure is needed. Although not as computationally efficient as the traditional Gram-Schmidt procedure, incremental orthogonalization can be readily accomplished simply by replacing the αR term in the gradient boosting algorithm shown in Table 1 with a linear regression of the current boosting stage output R and all previous boosting stage outputs. Adding this step to the procedure produces the orthogonalized gradient boosting method shown in Table 5. In the case of square-error loss, the line search step in Table 5 to find an optimal scalar α is actually not needed because this scalar will equal one by construction. However, the line search is included in Table 5 to indicate how orthogonalized gradient boosting generalizes to arbitrary loss functions.
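A minimal NumPy sketch of the incremental orthogonalization step just described (the function and argument names are ours): regress the current residual on all boosting stage outputs produced so far and add the fitted combination to the model.

```python
import numpy as np

def orthogonalized_update(stage_outputs, y, current_prediction):
    """One orthogonalized boosting update for square-error loss.

    stage_outputs: (n_samples, n_stages) array whose columns are R_1, ..., R_i,
    the last column being the newest base-learner output.
    """
    residual = y - current_prediction
    # coefficients lambda_k such that sum_k lambda_k * R_k best fits the residual of M
    lam, *_ = np.linalg.lstsq(stage_outputs, residual, rcond=None)
    correction = stage_outputs @ lam
    # for square-error loss the line-search scalar alpha equals one by construction
    return current_prediction + correction, lam
```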
When orthogonalized gradient boosting is applied to the univariate linear regression base learner described above, an optimum rate of convergence is achieved and the resulting overall algorithm is closely related to the QR decomposition algorithm for linear regression.

Table 5. Orthogonalized gradient boosting for square-error loss.
    Let the current model M be zero everywhere;
    Repeat until the current model M does not appreciably change:
        At the i-th iteration, use the base learner to construct a model R_i that predicts the residual error of M, ensuring that R_i does not overfit the data;
        Use least-squares linear regression to find coefficients λ_k, k ≤ i, such that Σ_k λ_k R_k best fits the residual error of M;
        Find a value for scalar α that minimizes the loss function (i.e., total square error) for the model M + α Σ_k λ_k R_k;
        Update M ← M + α Σ_k λ_k R_k;

Although the orthogonalized gradient boosting method in Table 5 is expedient, it does not address the underlying source of non-orthogonality between boosting stage outputs, which is the base learner itself. A more refined solution would be to strengthen the base learner so that it produces models whose outputs are already orthogonal with respect to previous boosting stages. To accomplish that, the approach presented here involves first modifying the gradient boosting method shown in Table 1.

8 Table. Fee-forwar graient boosting for square-error loss. Let the current moel M be zero everywhere; Repeat until the current moel M oes not appreciably At the i th iteration, use the base learner to construct a moel R i that preicts the resiual error of M using all previous boosting stage outputs R,, R i- as aitional inputs, while ensuring that R i oes not overfit the ata; Fin a value for scalar α that minimizes the loss (i.e., total square error) for the moel M + αr i ; Upate M M + αr i ; shown in Table so that, at each iteration, all previous boosting stage outputs are mae available to the base learner as aitional inputs. This moification yiels the fee-forwar graient boosting metho shown in Table. The primary motivation for fee-forwar graient boosting is to enable the base learner to perform an implicit Gram-Schmit orthogonalization instea of the explicit orthogonalization performe in Table 5. In particular, by making previous boosting stage outputs available to the base learner, it might be possible to moify the base learner so that it achieves a nonlinear form of Gram-Schmit orthogonalization, in contrast to the linear orthogonalization step in Table 5. In the next section, it will be shown how such a moification can in fact be mae to the aitive-moeling base learner use in the initial algorithm. An aitional but no less important effect of feeforwar graient boosting is that it further strengthens weak base learners by expaning their hypothesis spaces through function composition. If in the first iteration the base learner consiers moels of the form M (x,,x ), then in the secon iteration it will consier moels of the form M 2 (x,,x,m (x,,x )). Unless the base learner s hypothesis space is close uner this type of function composition, fee-forwar graient boosting will have the sie effect of expaning the base learner s hypothesis space without moifying its moe of operation. The strengthening of the base learner prouce by this expansion effect can potentially improve the accuracy of the moels that are constructe. Although fee-forwar graient boosting is motivate by orthogonality consierations, the metho steps efine in Table are not sufficient to guarantee orthogonality for arbitrary base learners, since some base learners might not be able to make full use of the outputs of previous boosting stages in orer to achieve orthogonality. In such cases, the orthogonalize fee-forwar graient boosting metho shown in Table 7 can be employe. Table 7. Orthogonalize fee-forwar graient boosting for square-error loss. Let the current moel M be zero everywhere; Repeat until the current moel M oes not appreciably At the i th iteration, use the base learner to construct a moel R i that preicts the resiual error of M using the previous boosting stage outputs R,, R i- as aitional inputs, while ensuring that R i oes not overfit the ata; Use least-squares linear regression to fin coefficients λ k, k i, such that Σλ k R k best fits the resiual error of M. Fin a value for scalar α that minimizes the loss function (i.e., total square error) for the moel M + α Σλ k R k ; Upate M M + α Σλ k R k ;. The transform regression algorithm In the case of the initial algorithm presente in Section, the base learner itself can be moifie to take full avantage of the outputs of previous boosting stages. In particular, two moifications are mae to the initial algorithm in orer to arrive at the transform regression algorithm. 
6. The transform regression algorithm

In the case of the initial algorithm presented in Section 4, the base learner itself can be modified to take full advantage of the outputs of previous boosting stages. In particular, two modifications are made to the initial algorithm in order to arrive at the transform regression algorithm.

Because feed-forward gradient boosting permits boosting stage outputs to be used as input features to subsequent stages, the first modification is to convert the g_i functions in Equation 4 into h_ik functions by eliminating Equation 4b and by using the outputs of the additive models in Equation 4a as input features to all subsequent gradient boosting stages. This modification is intended mainly to simplify the mathematical form of the resulting models by exploiting the fact that feed-forward gradient boosting explicitly makes boosting stage outputs available to subsequent stages, which enables the h functions to perform dual roles: transform the input features and transform the additive model outputs. The second modification is to introduce multivariate h functions by further allowing the outputs of the additive models in Equation 4a to appear as additional inputs to the h functions in all subsequent stages. This modification is intended to push the orthogonalization of model outputs all the way down to the construction of the h feature transformation functions.

As discussed in Section 4, the ProbE linear regression tree (LRT) algorithm (Natarajan & Pednault, 2002) was used in the initial algorithm to construct the univariate g_i and h_ij functions that appear in Equation 4.

The LRT algorithm, however, is capable of constructing multivariate linear regression models in the leaves of trees, and not just the univariate linear regression models that were needed for the initial algorithm. By allowing previous boosting stage outputs to appear as regressors in the leaves of these trees, the output of each leaf model will then be orthogonal to those previous boosting stage outputs when the leaf conditions are satisfied. Hence, the overall nonlinear transformations defined by the trees will be orthogonal to the previous boosting stage outputs, and any linear combination of the trees will likewise be orthogonal.

With the above changes, the mathematical form of the resulting transform regression models is given by the following system of equations:

    ŷ_1 = Σ_{j=1}^{d} h_1j(x_j),    (6a)

    ŷ_i = Σ_{j=1}^{d} h_ij(x_j, ŷ_1,...,ŷ_{i-1}) + Σ_{k=1}^{i-1} h_ik(ŷ_k, ŷ_1,...,ŷ_{i-1}),  i > 1,    (6b)

    ŷ = Σ_i ŷ_i,    (6c)

where the notation h_ij(x_j, ŷ_1,...,ŷ_{i-1}) is used to indicate that function h_ij is meant to be a nonlinear transformation of x_j and that this transformation is allowed to vary as a function of ŷ_1,...,ŷ_{i-1}. Likewise for the h_ik(ŷ_k, ŷ_1,...,ŷ_{i-1}) functions that appear in Equation 6b. Note that the latter are the counterparts to the g_i functions in Equation 4b. In concrete terms, when applying the ProbE LRT algorithm to construct h_ij(x_j, ŷ_1,...,ŷ_{i-1}) and h_ik(ŷ_k, ŷ_1,...,ŷ_{i-1}), the LRT algorithm is constrained to split only on the features being transformed (i.e., x_j and ŷ_k, respectively), with all other inputs (i.e., ŷ_1,...,ŷ_{i-1}) allowed to appear as regressors in the linear regression models in the leaves of the resulting trees. Of course, the features being transformed can likewise appear as regressors in the leaf models if they are numeric. Equation 6a corresponds to the first boosting stage, while Equation 6b corresponds to all subsequent stages. Equation 6c defines the output of the overall model. The resulting algorithm is summarized in Table 8.

Table 8. The transform regression algorithm.
    Let the current model M be zero everywhere;
    Repeat until the current model M does not appreciably change:
        At the i-th iteration, construct an additive model ŷ_i that predicts the residual error of M using both the raw features x_1,...,x_d and the outputs ŷ_1,...,ŷ_{i-1} of all previous boosting stages as potential input features to ŷ_i, allowing the feature transformations to vary as functions of the previous boosting stage outputs ŷ_1,...,ŷ_{i-1}, and ensuring that ŷ_i does not overfit the data;
        Update M ← M + ŷ_i;
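The sketch below shows, in heavily simplified NumPy form, how one such boosting stage can be assembled: every input column (raw features and previous stage outputs alike) gets a binned transform whose per-bin linear models include the previous stage outputs as regressors, and the transforms are then combined by a final regression. Quantile binning, the fixed bin count, and plain (non-stepwise) least squares are our simplifications, not the ProbE/LRT implementation.

```python
import numpy as np

def transform_regression_stage(X, prev_outputs, residual, n_bins=8):
    """One transform regression boosting stage, along the lines of Table 8 (simplified).

    X: (n, d) raw features; prev_outputs: (n, i-1) previous stage outputs;
    residual: current residual error of the model M. Returns this stage's output.
    """
    columns = [X[:, j] for j in range(X.shape[1])] + \
              [prev_outputs[:, k] for k in range(prev_outputs.shape[1])]
    transforms = []
    for col in columns:
        edges = np.quantile(col, np.linspace(0.0, 1.0, n_bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf
        bins = np.clip(np.searchsorted(edges, col, side="right") - 1, 0, n_bins - 1)
        fitted = np.zeros_like(residual)
        for b in range(n_bins):
            m = bins == b
            if m.sum() > prev_outputs.shape[1] + 2:
                # leaf regression on the split feature plus all previous stage outputs
                A = np.column_stack([np.ones(m.sum()), col[m], prev_outputs[m]])
                beta, *_ = np.linalg.lstsq(A, residual[m], rcond=None)
                fitted[m] = A @ beta
            elif m.any():
                fitted[m] = residual[m].mean()
        transforms.append(fitted)
    H = np.column_stack(transforms)
    lam, *_ = np.linalg.lstsq(H, residual, rcond=None)   # combine the feature transforms
    return H @ lam                                        # this stage's output, y_hat_i
```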
Although Equation 6 departs from the mathematical form of Kolmogorov's superposition equation, the above modifications dramatically improve the rate of convergence of the resulting algorithm. Figure 3 illustrates the increased rate of convergence of the transform regression algorithm compared to the initial algorithm when transform regression is applied to the same data as in Figures 1 and 2. In this experiment, the ProbE linear regression tree (LRT) algorithm was again used, this time exploiting its ability to construct multivariate regression models in the leaves of trees. As with the initial algorithm, one-pass greedy additive modeling was used with stepwise linear regression, with the holdout set used to estimate generalization error in order to avoid overfitting. As shown in Figure 3a, because the g_i functions have been removed, the first stage of transform regression extracts the two linear terms of the target function, but not the cross-product term. As can be seen in Figure 3d, the first boosting stage therefore has a higher approximation error than the first boosting stage of the initial algorithm. However, for all subsequent boosting stages, transform regression outperforms the initial algorithm, as can be seen in Figures 3b-d. As this example demonstrates, by modifying the gradient boosting algorithm and the base learner to achieve orthogonality of gradient boosting stage outputs, a dramatic increase in the rate of convergence can be obtained.

7. Experimental evaluation

Table 9 shows evaluation results that were obtained on eight data sets that were used to compare the performance of the transform regression algorithm to the underlying LRT algorithm. Also shown are results for the first gradient boosting stage of transform regression, and for the stepwise linear regression algorithm that is used both in the leaves of linear regression trees and in the greedy one-pass additive modeling method. The first four data sets are available from the UCI Machine Learning Repository and the UCI KDD Archive. The last four are internal IBM data sets.

Figure 3. An example of the modeling behavior of the transform regression algorithm. (a) Model output after one gradient boosting stage. (b) After two stages. (c) After three stages. (d) RMS errors of successive gradient boosting stages for transform regression and the initial algorithm.

Because all data sets have nonnegative target values, and because all but one (i.e., KDDCup98 TargetD) have 0/1 target values, comparisons were made based on Gini coefficients of cumulative gains charts (Hand, 1997) estimated on holdout test sets. Cumulative gains charts (a.k.a. lift curves) are closely related to ROC curves, except that gains charts have the benefit of being applicable to continuous nonnegative numeric target values in addition to 0/1 categorical values. Gini coefficients are normalized areas under cumulative gains charts, where the normalization produces a value of zero for models that are no better than random guessing, and a value of one for perfect predictors. Gini coefficients are thus closely related to AUC measurements (i.e., the areas under ROC curves). Note, however, that models that are no better than random guessing have an AUC of 0.5 but a Gini coefficient of zero.

Table 9. Gini coefficients for different data sets and algorithms. For each data set, the best coefficient is highlighted in bold, the second best in italics. The columns are DATA SET, TRANS REG, FIRST BOOST STAGE, LIN REG TREES, and STEP LIN REG; the rows are ADULT, COIL, KDD-98 B, KDD-98 D, and the four internal IBM data sets. [The numeric coefficients are not preserved in this transcription.]

As can be seen in Table 9, transform regression produces better models than the underlying LRT algorithm on all data sets but one, and for the one exception the LRT model is only slightly better. Remarkably, the first gradient boosting stage also produces better models than the LRT algorithm on a majority of the data sets. In one case, the first-stage model is also better than the overall transform regression model, which indicates an overfitting problem with the prototype implementation used for these experiments.
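As a reference point for how such numbers are computed, here is a small NumPy sketch of the Gini coefficient of a cumulative gains chart, normalized as described above so that random guessing scores 0 and a perfect ranking scores 1 (the implementation details are ours):

```python
import numpy as np

def gini_coefficient(y_true, y_score):
    """Gini coefficient of a cumulative gains (lift) chart for nonnegative targets."""
    order = np.argsort(-y_score)                        # rank records by predicted score
    gains = np.cumsum(y_true[order]) / y_true.sum()     # cumulative gains of the model
    frac = np.arange(1, len(y_true) + 1) / len(y_true)  # fraction of records examined
    model_area = np.trapz(gains, frac) - 0.5            # area above the random-guessing diagonal
    perfect = np.cumsum(np.sort(y_true)[::-1]) / y_true.sum()
    perfect_area = np.trapz(perfect, frac) - 0.5        # same area for a perfect ranking
    return model_area / perfect_area
```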

8. Computational considerations

In addition to its other properties, transform regression can also be implemented as a computationally efficient, parallelizable algorithm. Such an implementation is achieved by combining the greedy one-pass additive modeling algorithm shown in Table 3 with the ProbE linear regression tree (LRT) algorithm (Natarajan & Pednault, 2002). As per Table 3, each boosting stage is calculated in two phases. The first phase constructs initial estimates of the feature transformation functions h_ij and h_ik that appear in Equations 6a and 6b. The second phase performs a stepwise linear regression on these initial feature transformations in order to select the most predictive transformations and to estimate their scaling coefficients, as per Table 3.

The ProbE LRT technology enables the computations for the first phase to be performed using only a single pass over the data. It also enables the computations to be data-partition parallelized for scalability. The LRT algorithm incorporates a generalized version of the bottom-up merging technique used in the CHAID algorithm (Biggs, de Ville, and Suen, 1991; Kass, 1980). Accordingly, multiway splits are first constructed for each input feature. Next, data is scanned to estimate sufficient statistics for the leaf models in each multiway split. Finally, leaf nodes and their sufficient statistics are merged in a bottom-up pairwise fashion to produce trees for each feature without further accessing the data. For categorical features, the category values define the multiway splits. For numerical features, the feature values are discretized into intervals and these intervals define the multiway splits. Although the CHAID method considers only constant leaf models, the approach can be generalized to include stepwise linear regression models in the leaves (Natarajan & Pednault, 2002). In the case of linear regression, the sufficient statistics are mean vectors and covariance matrices. By calculating sufficient statistics simultaneously for both training data and holdout data, the tree building and tree pruning steps can be performed using only these sufficient statistics without any further data access. Linear regression tree estimates of the h_ij and h_ik feature transformation functions can therefore be calculated using only a single pass over both the training and holdout data at each iteration. In addition, because the technique of merging sufficient statistics can be applied to any disjoint data partitions, the same merging method used during tree building can be used to merge sufficient statistics that are calculated in parallel on disjoint data partitions (Dorneich et al., 2006). This merging capability enables data scans to be readily parallelized.

In the first phase, the stepwise linear regression models that appear in the leaves of the feature transformation trees are relatively small. At each iteration, the maximum number of regressors is equal to the iteration number. In the second phase, no trees are constructed but a large stepwise linear regression is performed instead. In this case, the number of regressors is equal to the number of transformation trees (i.e., the number of input features plus the iteration index minus one). As with the first phase, the mean vectors and covariance matrices that define the sufficient statistics for the linear regression can be calculated using only a single pass over the training and holdout data. The sufficient statistics can likewise be calculated in parallel on disjoint data partitions, with the results then merged using the same merging technique used for tree building.
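The single-pass, data-parallel computation rests on the fact that the sufficient statistics for linear regression can be accumulated per partition and then merged by simple addition. The sketch below uses the cross-product form of those statistics rather than the mean-vector/covariance form mentioned above (the two are interchangeable), and all names are ours:

```python
import numpy as np

def partition_stats(X, y):
    """Sufficient statistics for linear regression computed on one data partition."""
    Xa = np.column_stack([np.ones(len(y)), X])    # add an intercept column
    return {"n": len(y), "XtX": Xa.T @ Xa, "Xty": Xa.T @ y}

def merge_stats(a, b):
    """Merging statistics from disjoint partitions is elementwise addition."""
    return {"n": a["n"] + b["n"], "XtX": a["XtX"] + b["XtX"], "Xty": a["Xty"] + b["Xty"]}

def solve_regression(stats):
    """Least-squares coefficients recovered from merged sufficient statistics alone."""
    beta, *_ = np.linalg.lstsq(stats["XtX"], stats["Xty"], rcond=None)
    return beta

# Two partitions of the same data yield the same coefficients as a single fit on all of it.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
merged = merge_stats(partition_stats(X[:60], y[:60]), partition_stats(X[60:], y[60:]))
print(solve_regression(merged))
```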
The above implementation techniques produce a scalable and efficient algorithm. These techniques have been incorporated into a parallelized version of transform regression that is now available in IBM DB2 Intelligent Miner Modeling, which is IBM's database-embedded data mining product (Dorneich et al., 2006).

9. Conclusions

Although the experimental results presented above are by no means an exhaustive evaluation, the consistency of the results clearly demonstrates the benefits of the global function-fitting approach of transform regression compared to the local fitting approach of the underlying linear regression tree (LRT) algorithm that is employed. Transform regression uses the LRT algorithm to construct a series of global functions that are then combined using linear regression. Although this use of LRT is highly constrained, in many cases it enables better models to be constructed than with the pure local fitting of LRT. In this respect, transform regression successfully combines the global-fitting aspects of learning methods such as neural networks with the nonparametric local-fitting aspects of decision trees.

Transform regression is also computationally efficient. Only two passes over the data are required to construct each boosting stage: one pass to build linear regression trees for all input features to a boosting stage, and another pass to perform the stepwise linear regression that combines the outputs of the resulting trees to form an additive model. The amount of computation that is required per boosting stage is therefore between one and two times the amount of computation needed by the LRT algorithm to build a single level of a conventional linear regression tree when the LRT algorithm is applied outside the transform regression framework.

Another aspect of transform regression is that it demonstrates how Friedman's gradient boosting framework can be enhanced to strengthen the base learner and improve the rate of convergence. One enhancement is to use the outputs of boosting stages as first-class input features to subsequent stages. This modification has the effect of expanding the hypothesis space through function composition of boosting stage models.

Another enhancement is to modify the base learner so that it produces models whose outputs are linearly orthogonal to all previous boosting stage outputs. This orthogonality property improves the efficiency of the gradient descent search performed by the boosting algorithm, thereby increasing the rate of convergence. In the case of transform regression, this second modification involved using boosting stage outputs as additional multivariate inputs to the feature transformation functions h_ij and h_ik. This very same approach can likewise be used in combination with Friedman's tree boosting algorithm by replacing his conventional tree algorithm with the LRT algorithm. It should likewise be possible to extend the approach presented here to other tree-based boosting techniques.

Transform regression, however, is still a greedy hill-climbing algorithm. As such, it can get caught in local minima and at saddle points. In order to avoid local minima and saddle points entirely, additional research is needed to further improve the transform regression algorithm. Several authors (Kůrková, 1991, 1992; Neruda, Štědrý, & Drkošová, 2000; Sprecher, 1996, 1997, 2002) have been investigating ways of overcoming the computational problems of directly applying Kolmogorov's theorem. Given the strength of the results obtained here using the form of the superposition equation alone, research aimed at creating a combined approach could potentially be quite fruitful.

References

Biggs, D., de Ville, B., and Suen, E. (1991). A method of choosing multiway partitions for classification and decision trees. Journal of Applied Statistics, 18(1):49-62.

Dorneich, A., Natarajan, R., Pednault, E., & Tipu, F. (2006). Embedded predictive modeling in a parallel relational database. To appear in Proceedings of the 21st ACM Symposium on Applied Computing, April 2006, Dijon, France.

Friedman, J.H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, 29(5):1189-1232.

Friedman, J.H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367-378.

Girosi, F. & Poggio, T. (1989). Representation properties of networks: Kolmogorov's theorem is irrelevant. Neural Computation, 1(4):465-469.

Hand, D.J. (1997). Construction and Assessment of Classification Rules. New York: John Wiley and Sons.

Hastie, T. & Tibshirani, R. (1990). Generalized Additive Models. New York: Chapman and Hall.

Hecht-Nielsen, R. (1987). Kolmogorov's mapping neural network existence theorem. Proc. IEEE International Conference on Neural Networks, Vol. 3, pp. 11-14.

Jiang, W. (2001). Is regularization unnecessary for boosting? Proc. 8th International Workshop on AI and Statistics, 57-64. San Mateo, California: Morgan Kaufmann.

Jiang, W. (2002). On weak base hypotheses and their implications for boosting regression and classification. Annals of Statistics, 30(1):51-73.

Kass, G.V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2):119-127.

Kolmogorov, A.N. (1957). On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR, 114(5):953-956. Translated in American Mathematical Society Translations, Series 2, 28:55-59 (1963).

Kůrková, V. (1991). Kolmogorov's theorem is relevant. Neural Computation, 3(4):617-622.

Kůrková, V. (1992). Kolmogorov's theorem and multilayer neural networks. Neural Networks, 5(3):501-506.

Lorentz, G.G. (1962). Metric entropy, widths, and superposition of functions. American Mathematical Monthly, 69:469-485.

Mallat, S.G. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Processing, 41(12):3397-3415.

Natarajan, R. & Pednault, E.P.D. (2002).
Segmented regression estimators for massive data sets. Proc. Second SIAM International Conference on Data Mining. Available online.

Neruda, R., Štědrý, A., & Drkošová, J. (2000). Towards feasible learning algorithm based on Kolmogorov theorem. Proc. International Conference on Artificial Intelligence, Vol. II. CSREA Press.

Sprecher, D.A. (1965). On the structure of continuous functions of several variables. Transactions of the American Mathematical Society, 115(3):340-355.

Sprecher, D.A. (1996). A numerical implementation of Kolmogorov's superpositions. Neural Networks, 9(5):765-772.

Sprecher, D.A. (1997). A numerical implementation of Kolmogorov's superpositions II. Neural Networks, 10(3):447-457.

Sprecher, D.A. (2002). Space-filling curves and Kolmogorov superposition-based neural networks. Neural Networks, 15(1):57-67.


More information

Lower Bounds for the Smoothed Number of Pareto optimal Solutions

Lower Bounds for the Smoothed Number of Pareto optimal Solutions Lower Bouns for the Smoothe Number of Pareto optimal Solutions Tobias Brunsch an Heiko Röglin Department of Computer Science, University of Bonn, Germany brunsch@cs.uni-bonn.e, heiko@roeglin.org Abstract.

More information

Lagrangian and Hamiltonian Mechanics

Lagrangian and Hamiltonian Mechanics Lagrangian an Hamiltonian Mechanics.G. Simpson, Ph.. epartment of Physical Sciences an Engineering Prince George s Community College ecember 5, 007 Introuction In this course we have been stuying classical

More information

u!i = a T u = 0. Then S satisfies

u!i = a T u = 0. Then S satisfies Deterministic Conitions for Subspace Ientifiability from Incomplete Sampling Daniel L Pimentel-Alarcón, Nigel Boston, Robert D Nowak University of Wisconsin-Maison Abstract Consier an r-imensional subspace

More information

Linear Regression with Limited Observation

Linear Regression with Limited Observation Ela Hazan Tomer Koren Technion Israel Institute of Technology, Technion City 32000, Haifa, Israel ehazan@ie.technion.ac.il tomerk@cs.technion.ac.il Abstract We consier the most common variants of linear

More information

Math 1B, lecture 8: Integration by parts

Math 1B, lecture 8: Integration by parts Math B, lecture 8: Integration by parts Nathan Pflueger 23 September 2 Introuction Integration by parts, similarly to integration by substitution, reverses a well-known technique of ifferentiation an explores

More information

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x)

The derivative of a function f(x) is another function, defined in terms of a limiting expression: f(x + δx) f(x) Y. D. Chong (2016) MH2801: Complex Methos for the Sciences 1. Derivatives The erivative of a function f(x) is another function, efine in terms of a limiting expression: f (x) f (x) lim x δx 0 f(x + δx)

More information

Fast Resampling Weighted v-statistics

Fast Resampling Weighted v-statistics Fast Resampling Weighte v-statistics Chunxiao Zhou Mar O. Hatfiel Clinical Research Center National Institutes of Health Bethesa, MD 20892 chunxiao.zhou@nih.gov Jiseong Par Dept of Math George Mason Univ

More information

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences.

. Using a multinomial model gives us the following equation for P d. , with respect to same length term sequences. S 63 Lecture 8 2/2/26 Lecturer Lillian Lee Scribes Peter Babinski, Davi Lin Basic Language Moeling Approach I. Special ase of LM-base Approach a. Recap of Formulas an Terms b. Fixing θ? c. About that Multinomial

More information

Modeling the effects of polydispersity on the viscosity of noncolloidal hard sphere suspensions. Paul M. Mwasame, Norman J. Wagner, Antony N.

Modeling the effects of polydispersity on the viscosity of noncolloidal hard sphere suspensions. Paul M. Mwasame, Norman J. Wagner, Antony N. Submitte to the Journal of Rheology Moeling the effects of polyispersity on the viscosity of noncolloial har sphere suspensions Paul M. Mwasame, Norman J. Wagner, Antony N. Beris a) epartment of Chemical

More information

IPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy

IPA Derivatives for Make-to-Stock Production-Inventory Systems With Backorders Under the (R,r) Policy IPA Derivatives for Make-to-Stock Prouction-Inventory Systems With Backorers Uner the (Rr) Policy Yihong Fan a Benamin Melame b Yao Zhao c Yorai Wari Abstract This paper aresses Infinitesimal Perturbation

More information

Introduction to Machine Learning

Introduction to Machine Learning How o you estimate p(y x)? Outline Contents Introuction to Machine Learning Logistic Regression Varun Chanola April 9, 207 Generative vs. Discriminative Classifiers 2 Logistic Regression 2 3 Logistic Regression

More information

We G Model Reduction Approaches for Solution of Wave Equations for Multiple Frequencies

We G Model Reduction Approaches for Solution of Wave Equations for Multiple Frequencies We G15 5 Moel Reuction Approaches for Solution of Wave Equations for Multiple Frequencies M.Y. Zaslavsky (Schlumberger-Doll Research Center), R.F. Remis* (Delft University) & V.L. Druskin (Schlumberger-Doll

More information

On the Surprising Behavior of Distance Metrics in High Dimensional Space

On the Surprising Behavior of Distance Metrics in High Dimensional Space On the Surprising Behavior of Distance Metrics in High Dimensional Space Charu C. Aggarwal, Alexaner Hinneburg 2, an Daniel A. Keim 2 IBM T. J. Watson Research Center Yortown Heights, NY 0598, USA. charu@watson.ibm.com

More information

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012

Lecture Introduction. 2 Examples of Measure Concentration. 3 The Johnson-Lindenstrauss Lemma. CS-621 Theory Gems November 28, 2012 CS-6 Theory Gems November 8, 0 Lecture Lecturer: Alesaner Mąry Scribes: Alhussein Fawzi, Dorina Thanou Introuction Toay, we will briefly iscuss an important technique in probability theory measure concentration

More information

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum

Analytic Scaling Formulas for Crossed Laser Acceleration in Vacuum October 6, 4 ARDB Note Analytic Scaling Formulas for Crosse Laser Acceleration in Vacuum Robert J. Noble Stanfor Linear Accelerator Center, Stanfor University 575 San Hill Roa, Menlo Park, California 945

More information

Local Linear ICA for Mutual Information Estimation in Feature Selection

Local Linear ICA for Mutual Information Estimation in Feature Selection Local Linear ICA for Mutual Information Estimation in Feature Selection Tian Lan, Deniz Erogmus Department of Biomeical Engineering, OGI, Oregon Health & Science University, Portlan, Oregon, USA E-mail:

More information

A simplified macroscopic urban traffic network model for model-based predictive control

A simplified macroscopic urban traffic network model for model-based predictive control Delft University of Technology Delft Center for Systems an Control Technical report 9-28 A simplifie macroscopic urban traffic network moel for moel-base preictive control S. Lin, B. De Schutter, Y. Xi,

More information

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13)

Slide10 Haykin Chapter 14: Neurodynamics (3rd Ed. Chapter 13) Slie10 Haykin Chapter 14: Neuroynamics (3r E. Chapter 13) CPSC 636-600 Instructor: Yoonsuck Choe Spring 2012 Neural Networks with Temporal Behavior Inclusion of feeback gives temporal characteristics to

More information

Table of Common Derivatives By David Abraham

Table of Common Derivatives By David Abraham Prouct an Quotient Rules: Table of Common Derivatives By Davi Abraham [ f ( g( ] = [ f ( ] g( + f ( [ g( ] f ( = g( [ f ( ] g( g( f ( [ g( ] Trigonometric Functions: sin( = cos( cos( = sin( tan( = sec

More information

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs

Lectures - Week 10 Introduction to Ordinary Differential Equations (ODES) First Order Linear ODEs Lectures - Week 10 Introuction to Orinary Differential Equations (ODES) First Orer Linear ODEs When stuying ODEs we are consiering functions of one inepenent variable, e.g., f(x), where x is the inepenent

More information

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10

Some vector algebra and the generalized chain rule Ross Bannister Data Assimilation Research Centre, University of Reading, UK Last updated 10/06/10 Some vector algebra an the generalize chain rule Ross Bannister Data Assimilation Research Centre University of Reaing UK Last upate 10/06/10 1. Introuction an notation As we shall see in these notes the

More information

Switching Time Optimization in Discretized Hybrid Dynamical Systems

Switching Time Optimization in Discretized Hybrid Dynamical Systems Switching Time Optimization in Discretize Hybri Dynamical Systems Kathrin Flaßkamp, To Murphey, an Sina Ober-Blöbaum Abstract Switching time optimization (STO) arises in systems that have a finite set

More information

Function Spaces. 1 Hilbert Spaces

Function Spaces. 1 Hilbert Spaces Function Spaces A function space is a set of functions F that has some structure. Often a nonparametric regression function or classifier is chosen to lie in some function space, where the assume structure

More information

Final Exam Study Guide and Practice Problems Solutions

Final Exam Study Guide and Practice Problems Solutions Final Exam Stuy Guie an Practice Problems Solutions Note: These problems are just some of the types of problems that might appear on the exam. However, to fully prepare for the exam, in aition to making

More information

Nonparametric Additive Models

Nonparametric Additive Models Nonparametric Aitive Moels Joel L. Horowitz The Institute for Fiscal Stuies Department of Economics, UCL cemmap working paper CWP20/2 Nonparametric Aitive Moels Joel L. Horowitz. INTRODUCTION Much applie

More information

Approximate Constraint Satisfaction Requires Large LP Relaxations

Approximate Constraint Satisfaction Requires Large LP Relaxations Approximate Constraint Satisfaction Requires Large LP Relaxations oah Fleming April 19, 2018 Linear programming is a very powerful tool for attacking optimization problems. Techniques such as the ellipsoi

More information

6 General properties of an autonomous system of two first order ODE

6 General properties of an autonomous system of two first order ODE 6 General properties of an autonomous system of two first orer ODE Here we embark on stuying the autonomous system of two first orer ifferential equations of the form ẋ 1 = f 1 (, x 2 ), ẋ 2 = f 2 (, x

More information

Optimal Variable-Structure Control Tracking of Spacecraft Maneuvers

Optimal Variable-Structure Control Tracking of Spacecraft Maneuvers Optimal Variable-Structure Control racking of Spacecraft Maneuvers John L. Crassiis 1 Srinivas R. Vaali F. Lanis Markley 3 Introuction In recent years, much effort has been evote to the close-loop esign

More information

Proof of SPNs as Mixture of Trees

Proof of SPNs as Mixture of Trees A Proof of SPNs as Mixture of Trees Theorem 1. If T is an inuce SPN from a complete an ecomposable SPN S, then T is a tree that is complete an ecomposable. Proof. Argue by contraiction that T is not a

More information

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments

Time-of-Arrival Estimation in Non-Line-Of-Sight Environments 2 Conference on Information Sciences an Systems, The Johns Hopkins University, March 2, 2 Time-of-Arrival Estimation in Non-Line-Of-Sight Environments Sinan Gezici, Hisashi Kobayashi an H. Vincent Poor

More information

Chapter 6: Energy-Momentum Tensors

Chapter 6: Energy-Momentum Tensors 49 Chapter 6: Energy-Momentum Tensors This chapter outlines the general theory of energy an momentum conservation in terms of energy-momentum tensors, then applies these ieas to the case of Bohm's moel.

More information

Generalizing Kronecker Graphs in order to Model Searchable Networks

Generalizing Kronecker Graphs in order to Model Searchable Networks Generalizing Kronecker Graphs in orer to Moel Searchable Networks Elizabeth Boine, Babak Hassibi, Aam Wierman California Institute of Technology Pasaena, CA 925 Email: {eaboine, hassibi, aamw}@caltecheu

More information

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties

Flexible High-Dimensional Classification Machines and Their Asymptotic Properties Journal of Machine Learning Research 16 (2015) 1547-1572 Submitte 1/14; Revise 9/14; Publishe 8/15 Flexible High-Dimensional Classification Machines an Their Asymptotic Properties Xingye Qiao Department

More information

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations

Optimized Schwarz Methods with the Yin-Yang Grid for Shallow Water Equations Optimize Schwarz Methos with the Yin-Yang Gri for Shallow Water Equations Abessama Qaouri Recherche en prévision numérique, Atmospheric Science an Technology Directorate, Environment Canaa, Dorval, Québec,

More information

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE

THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE Journal of Soun an Vibration (1996) 191(3), 397 414 THE VAN KAMPEN EXPANSION FOR LINKED DUFFING LINEAR OSCILLATORS EXCITED BY COLORED NOISE E. M. WEINSTEIN Galaxy Scientific Corporation, 2500 English Creek

More information

A study on ant colony systems with fuzzy pheromone dispersion

A study on ant colony systems with fuzzy pheromone dispersion A stuy on ant colony systems with fuzzy pheromone ispersion Louis Gacogne LIP6 104, Av. Kenney, 75016 Paris, France gacogne@lip6.fr Sanra Sanri IIIA/CSIC Campus UAB, 08193 Bellaterra, Spain sanri@iiia.csic.es

More information

Harmonic Modelling of Thyristor Bridges using a Simplified Time Domain Method

Harmonic Modelling of Thyristor Bridges using a Simplified Time Domain Method 1 Harmonic Moelling of Thyristor Briges using a Simplifie Time Domain Metho P. W. Lehn, Senior Member IEEE, an G. Ebner Abstract The paper presents time omain methos for harmonic analysis of a 6-pulse

More information

One-dimensional I test and direction vector I test with array references by induction variable

One-dimensional I test and direction vector I test with array references by induction variable Int. J. High Performance Computing an Networking, Vol. 3, No. 4, 2005 219 One-imensional I test an irection vector I test with array references by inuction variable Minyi Guo School of Computer Science

More information

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y

ensembles When working with density operators, we can use this connection to define a generalized Bloch vector: v x Tr x, v y Tr y Ph195a lecture notes, 1/3/01 Density operators for spin- 1 ensembles So far in our iscussion of spin- 1 systems, we have restricte our attention to the case of pure states an Hamiltonian evolution. Toay

More information

Quantum Mechanics in Three Dimensions

Quantum Mechanics in Three Dimensions Physics 342 Lecture 20 Quantum Mechanics in Three Dimensions Lecture 20 Physics 342 Quantum Mechanics I Monay, March 24th, 2008 We begin our spherical solutions with the simplest possible case zero potential.

More information

Perturbation Analysis and Optimization of Stochastic Flow Networks

Perturbation Analysis and Optimization of Stochastic Flow Networks IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. XX, NO. Y, MMM 2004 1 Perturbation Analysis an Optimization of Stochastic Flow Networks Gang Sun, Christos G. Cassanras, Yorai Wari, Christos G. Panayiotou,

More information

Situation awareness of power system based on static voltage security region

Situation awareness of power system based on static voltage security region The 6th International Conference on Renewable Power Generation (RPG) 19 20 October 2017 Situation awareness of power system base on static voltage security region Fei Xiao, Zi-Qing Jiang, Qian Ai, Ran

More information

Introduction to Markov Processes

Introduction to Markov Processes Introuction to Markov Processes Connexions moule m44014 Zzis law Gustav) Meglicki, Jr Office of the VP for Information Technology Iniana University RCS: Section-2.tex,v 1.24 2012/12/21 18:03:08 gustav

More information

3.2 Shot peening - modeling 3 PROCEEDINGS

3.2 Shot peening - modeling 3 PROCEEDINGS 3.2 Shot peening - moeling 3 PROCEEDINGS Computer assiste coverage simulation François-Xavier Abaie a, b a FROHN, Germany, fx.abaie@frohn.com. b PEENING ACCESSORIES, Switzerlan, info@peening.ch Keywors:

More information

arxiv: v1 [cs.ds] 31 May 2017

arxiv: v1 [cs.ds] 31 May 2017 Succinct Partial Sums an Fenwick Trees Philip Bille, Aners Roy Christiansen, Nicola Prezza, an Freerik Rye Skjoljensen arxiv:1705.10987v1 [cs.ds] 31 May 2017 Technical University of Denmark, DTU Compute,

More information

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control

19 Eigenvalues, Eigenvectors, Ordinary Differential Equations, and Control 19 Eigenvalues, Eigenvectors, Orinary Differential Equations, an Control This section introuces eigenvalues an eigenvectors of a matrix, an iscusses the role of the eigenvalues in etermining the behavior

More information

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing

Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Examining Geometric Integration for Propagating Orbit Trajectories with Non-Conservative Forcing Course Project for CDS 05 - Geometric Mechanics John M. Carson III California Institute of Technology June

More information

Math 342 Partial Differential Equations «Viktor Grigoryan

Math 342 Partial Differential Equations «Viktor Grigoryan Math 342 Partial Differential Equations «Viktor Grigoryan 6 Wave equation: solution In this lecture we will solve the wave equation on the entire real line x R. This correspons to a string of infinite

More information

Constraint Reformulation and a Lagrangian Relaxation based Solution Algorithm for a Least Expected Time Path Problem Abstract 1.

Constraint Reformulation and a Lagrangian Relaxation based Solution Algorithm for a Least Expected Time Path Problem Abstract 1. Constraint Reformulation an a Lagrangian Relaation base Solution Algorithm for a Least Epecte Time Path Problem Liing Yang State Key Laboratory of Rail Traffic Control an Safety, Being Jiaotong University,

More information

Error Floors in LDPC Codes: Fast Simulation, Bounds and Hardware Emulation

Error Floors in LDPC Codes: Fast Simulation, Bounds and Hardware Emulation Error Floors in LDPC Coes: Fast Simulation, Bouns an Harware Emulation Pamela Lee, Lara Dolecek, Zhengya Zhang, Venkat Anantharam, Borivoje Nikolic, an Martin J. Wainwright EECS Department University of

More information

Non-Linear Bayesian CBRN Source Term Estimation

Non-Linear Bayesian CBRN Source Term Estimation Non-Linear Bayesian CBRN Source Term Estimation Peter Robins Hazar Assessment, Simulation an Preiction Group Dstl Porton Down, UK. probins@stl.gov.uk Paul Thomas Hazar Assessment, Simulation an Preiction

More information

KNN Particle Filters for Dynamic Hybrid Bayesian Networks

KNN Particle Filters for Dynamic Hybrid Bayesian Networks KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030

More information

The Press-Schechter mass function

The Press-Schechter mass function The Press-Schechter mass function To state the obvious: It is important to relate our theories to what we can observe. We have looke at linear perturbation theory, an we have consiere a simple moel for

More information

Optimal Signal Detection for False Track Discrimination

Optimal Signal Detection for False Track Discrimination Optimal Signal Detection for False Track Discrimination Thomas Hanselmann Darko Mušicki Dept. of Electrical an Electronic Eng. Dept. of Electrical an Electronic Eng. The University of Melbourne The University

More information

Conservation Laws. Chapter Conservation of Energy

Conservation Laws. Chapter Conservation of Energy 20 Chapter 3 Conservation Laws In orer to check the physical consistency of the above set of equations governing Maxwell-Lorentz electroynamics [(2.10) an (2.12) or (1.65) an (1.68)], we examine the action

More information

arxiv:hep-th/ v1 3 Feb 1993

arxiv:hep-th/ v1 3 Feb 1993 NBI-HE-9-89 PAR LPTHE 9-49 FTUAM 9-44 November 99 Matrix moel calculations beyon the spherical limit arxiv:hep-th/93004v 3 Feb 993 J. Ambjørn The Niels Bohr Institute Blegamsvej 7, DK-00 Copenhagen Ø,

More information

Cost-based Heuristics and Node Re-Expansions Across the Phase Transition

Cost-based Heuristics and Node Re-Expansions Across the Phase Transition Cost-base Heuristics an Noe Re-Expansions Across the Phase Transition Elan Cohen an J. Christopher Beck Department of Mechanical & Inustrial Engineering University of Toronto Toronto, Canaa {ecohen, jcb}@mie.utoronto.ca

More information

Problems Governed by PDE. Shlomo Ta'asan. Carnegie Mellon University. and. Abstract

Problems Governed by PDE. Shlomo Ta'asan. Carnegie Mellon University. and. Abstract Pseuo-Time Methos for Constraine Optimization Problems Governe by PDE Shlomo Ta'asan Carnegie Mellon University an Institute for Computer Applications in Science an Engineering Abstract In this paper we

More information

State observers and recursive filters in classical feedback control theory

State observers and recursive filters in classical feedback control theory State observers an recursive filters in classical feeback control theory State-feeback control example: secon-orer system Consier the riven secon-orer system q q q u x q x q x x x x Here u coul represent

More information

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions

Computing Exact Confidence Coefficients of Simultaneous Confidence Intervals for Multinomial Proportions and their Functions Working Paper 2013:5 Department of Statistics Computing Exact Confience Coefficients of Simultaneous Confience Intervals for Multinomial Proportions an their Functions Shaobo Jin Working Paper 2013:5

More information

THE ACCURATE ELEMENT METHOD: A NEW PARADIGM FOR NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS

THE ACCURATE ELEMENT METHOD: A NEW PARADIGM FOR NUMERICAL SOLUTION OF ORDINARY DIFFERENTIAL EQUATIONS THE PUBISHING HOUSE PROCEEDINGS O THE ROMANIAN ACADEMY, Series A, O THE ROMANIAN ACADEMY Volume, Number /, pp. 6 THE ACCURATE EEMENT METHOD: A NEW PARADIGM OR NUMERICA SOUTION O ORDINARY DIERENTIA EQUATIONS

More information

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency

Transmission Line Matrix (TLM) network analogues of reversible trapping processes Part B: scaling and consistency Transmission Line Matrix (TLM network analogues of reversible trapping processes Part B: scaling an consistency Donar e Cogan * ANC Eucation, 308-310.A. De Mel Mawatha, Colombo 3, Sri Lanka * onarecogan@gmail.com

More information

Adjoint Transient Sensitivity Analysis in Circuit Simulation

Adjoint Transient Sensitivity Analysis in Circuit Simulation Ajoint Transient Sensitivity Analysis in Circuit Simulation Z. Ilievski 1, H. Xu 1, A. Verhoeven 1, E.J.W. ter Maten 1,2, W.H.A. Schilers 1,2 an R.M.M. Mattheij 1 1 Technische Universiteit Einhoven; e-mail:

More information

Expected Value of Partial Perfect Information

Expected Value of Partial Perfect Information Expecte Value of Partial Perfect Information Mike Giles 1, Takashi Goa 2, Howar Thom 3 Wei Fang 1, Zhenru Wang 1 1 Mathematical Institute, University of Oxfor 2 School of Engineering, University of Tokyo

More information

23 Implicit differentiation

23 Implicit differentiation 23 Implicit ifferentiation 23.1 Statement The equation y = x 2 + 3x + 1 expresses a relationship between the quantities x an y. If a value of x is given, then a corresponing value of y is etermine. For

More information

Lecture 5. Symmetric Shearer s Lemma

Lecture 5. Symmetric Shearer s Lemma Stanfor University Spring 208 Math 233: Non-constructive methos in combinatorics Instructor: Jan Vonrák Lecture ate: January 23, 208 Original scribe: Erik Bates Lecture 5 Symmetric Shearer s Lemma Here

More information

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas

The Role of Models in Model-Assisted and Model- Dependent Estimation for Domains and Small Areas The Role of Moels in Moel-Assiste an Moel- Depenent Estimation for Domains an Small Areas Risto Lehtonen University of Helsini Mio Myrsylä University of Pennsylvania Carl-Eri Särnal University of Montreal

More information

Optimal CDMA Signatures: A Finite-Step Approach

Optimal CDMA Signatures: A Finite-Step Approach Optimal CDMA Signatures: A Finite-Step Approach Joel A. Tropp Inst. for Comp. Engr. an Sci. (ICES) 1 University Station C000 Austin, TX 7871 jtropp@ices.utexas.eu Inerjit. S. Dhillon Dept. of Comp. Sci.

More information

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods

Hyperbolic Moment Equations Using Quadrature-Based Projection Methods Hyperbolic Moment Equations Using Quarature-Base Projection Methos J. Koellermeier an M. Torrilhon Department of Mathematics, RWTH Aachen University, Aachen, Germany Abstract. Kinetic equations like the

More information