Department of Electrical and Computer Systems Engineering


Department of Electrical and Computer Systems Engineering
Technical Report MECSE--

A Monomial $\nu$-SV Method for Regression
A. Shilton, D. Lai and M. Palaniswami

A Monomial ν-SV Method For Regression
A. Shilton, D. Lai, M. Palaniswami, Senior Member, IEEE

Abstract: In the present paper we describe a new formulation for Support Vector regression (SVR), namely monomial ν-SVR. Like the standard ν-SVR, the monomial ν-SVR method automatically adjusts the radius of insensitivity (the tube width ǫ) to suit the training data. However, by replacing Vapnik's ǫ-insensitive cost with a more general monomial ǫ-insensitive cost, and likewise replacing the linear tube shrinking term with a monomial tube shrinking term, the performance of the monomial ν-SVR is improved for data corrupted by a wider range of noise distributions. We focus on the quadric form of the monomial ν-SVR and show that its dual form is simpler than that of the standard ν-SVR. We show that, like Suykens' Least-Squares SVR (LS-SVR) method and unlike the standard ν-SVR, the quadric ν-SVR dual has a unique global solution. Comparisons are made between the asymptotic efficiency of our method and that of standard ν-SVR and LS-SVR which demonstrate the superiority of our method for the special case of higher order polynomial noise. These theoretical predictions are validated using experimental comparisons with the alternative approaches of standard ν-SVR, LS-SVR and weighted LS-SVR.

I. INTRODUCTION

Support Vector regressors (SVRs) [1] [2] [3] are a class of non-linear regressors inspired by Vapnik's SV methods for pattern classification [4] [5]. Like Vapnik's method, SVRs first implicitly map all data into a (usually higher dimensional) feature space. In this feature space, the SVR attempts to construct a linear function of position that mimics the relationship between input (position in feature space) and output observed in the training data by minimising a measure of the empirical risk. To prevent overfitting, a regularisation term is included to bias the result toward functions with smaller gradient in feature space.

Two major advantages that SVRs have over competing methods (unregularised least-squares methods, for example) are sparseness and simplicity [3] [6]. SVRs are able to give accurate results based only on a sparse subset of the complete training set, making them ideal for problems with large training sets. Moreover, such results are achievable without excessive algorithmic complexity, and use of the kernel trick makes the dual form of the SVR problem particularly simple.

Roughly speaking, SVR methods may be broken into ǫ-SVR [1] [2] and ν-SVR methods [7] [8], both of which require a-priori selection of certain parameters. Of particular interest is the ǫ (or ν, in ν-SVR methods) parameter, which controls the sensitivity of the SVR to the presence of noise in the training data. In both cases this parameter controls the threshold ǫ (directly for ǫ-SVR, indirectly for ν-SVR) of insensitivity of the cost function to noise through use of Vapnik's ǫ-insensitive loss function. The standard ǫ-SVR approach is associated with a simple dual problem, but unfortunately selection of ǫ requires knowledge of the noise present in the training data (its variance in particular), which may not be available [9]. Conversely, the standard ν-SVR method has a more complex dual form, but has the advantage that selection of ν requires less knowledge of the noise process [9] (only the form of the noise is required, not the variance). Thus both forms have certain difficulties associated with them.

Yet another approach is that of Suykens' LS-SVR [10], which uses the normal least-squares cost function with an added regularisation term inspired by Vapnik's original SV method.
The two main advantages of this approach are the simplicity of the resulting dual cost function, which is even simpler than that of ǫ-SVR, and having one less constant to choose a-priori. The disadvantages include loss of sparsity and robustness in the solution. These problems may be ameliorated somewhat through use of a weighted LS-SVR scheme [11]. However, while this method is noticeably superior when extreme outliers are present in the training data, in our experience the performance of the weighted LS-SVR may not be significantly better than that of the standard LS-SVR if such outliers are not present.

In view of the shortcomings of these approaches, we present a modification of Smola's ν-SVR method (monomial ν-SVR). Our approach retains the feature that ν may be selected without knowledge of the variance of the noise present in the training data. For the special case of quadric ν-SVR (second order monomial ν-SVR) [12], the associated dual optimisation problem is simpler than that of the standard ν-SVR method. Furthermore, we show that the quadric ν-SVR method is able to out-perform both standard ν-SVR and LS-SVR (weighted or otherwise) in several cases, for example in the presence of higher order polynomial noise.

We begin in section II by reviewing the standard ǫ-SVR method and its properties. Next, using the theory of maximum-likelihood estimation as motivation, we present a modification of this method using a new monomial ǫ-insensitive cost function (monomial ǫ-SVR). Concentrating on the quadric form of this new cost function, we form the dual and show that it is no more complex than the standard ǫ-SVR dual. In subsection II-C we consider the asymptotic efficiency of our method in comparison to the standard ǫ-SVR and LS-SVR. We also address the issue of selecting ǫ to maximise this efficiency. Finally, in subsection II-D, we analyze the sparsity of the various methods.

In section III we further explore the properties of the standard ν-SVR method, and in particular its property of insensitivity to the variance of the noise present in the training data. We then apply the obvious extension of this approach to the monomial ǫ-SVR formulation introduced previously to produce a monomial ν-SVR method, and show that the dual form of the quadric ν-SVR is actually less complex than the dual form of the standard ν-SVR. We then consider the problem of optimal ν selection for both our method and standard ν-SVR, and show that the property of noise variance insensitivity when selecting this constant carries over from the standard ν-SVR to the monomial ν-SVR. Finally, in subsection III-C, we consider the issue of sparsity for the standard ν-SVR, monomial ν-SVR and LS-SVR methodologies.

In order to gain a better feel for the problem, in section IV we consider the particular case of training data affected by polynomial noise of degree 1 to 6. In particular, we compare the theoretical efficiencies and the optimal values of ǫ and ν for our method against other approaches. Finally, in section V, we consider a model problem where various orders of polynomial noise are added to the training data, and compare the results achieved with our theoretical predictions. We show that the results fit the predictions within tolerable accuracy. Moreover, we show that the predictions for the parameter ν provide a worthwhile first guess of the actual optimal value.

(A. Shilton and M. Palaniswami are with the Centre of Expertise on Networked Decision and Sensor Systems, Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria, Australia, {ash,swami}@ee.mu.oz.au. D. Lai is currently with the Department of Electrical and Computer Systems Engineering, Monash University, Victoria 3168, Australia, daniel.lai@eng.monash.edu.au.)

II. ǫ-SV REGRESSION

The regression problem may be formulated as follows. Given a training set:

\Theta = \{ (x_1, z_1), (x_2, z_2), \ldots, (x_N, z_N) \}, \qquad x_i \in \mathbb{R}^{d_L}, \; z_i \in \mathbb{R}

where z_i = \hat{g}(x_i) + \mathrm{noise} for some \hat{g}: \mathbb{R}^{d_L} \to \mathbb{R}, and the x_i are drawn in an i.i.d. manner from an unknown distribution, construct an approximation g: \mathbb{R}^{d_L} \to \mathbb{R} of \hat{g}. An approximation g constructed for a given training set Θ is called a trained machine, and the construction process training. We assume that all noise sources (e.g. measurement noise, system noise etc.) are smooth, i.i.d. and zero mean.

In the SV approach [4], it is usual to define (implicitly, as will be seen later) a set of functions \varphi_j: \mathbb{R}^{d_L} \to \mathbb{R}, 1 \le j \le d_H, which collectively form a map from input space to feature space, \varphi: \mathbb{R}^{d_L} \to \mathbb{R}^{d_H}, where \varphi(x) = ( \varphi_1(x), \varphi_2(x), \ldots, \varphi_{d_H}(x) ). Using this map, the trained machine is defined to be:

g(x) = w^T \varphi(x) + b

which is a linear function of position in feature space. In the ǫ-SVR framework, w and b are selected to minimise the regularised risk functional:

R(w, b) = \frac{1}{2} w^T w + \frac{C}{N} \sum_{(x_i, z_i) \in \Theta} | g(x_i) - z_i |_\epsilon

where | \cdot |_\epsilon = \max( | \cdot | - \epsilon, 0 ) is Vapnik's ǫ-insensitive loss function (ǫ ≥ 0 a constant). In this expression, the first term \frac{1}{2} w^T w characterises the complexity of the model (the larger w^T w, the larger the gradient of g(x) in feature space, and hence the more g(x) may vary for a given variation in the input x), while the second term is a measure of the empirical risk associated with the training set when this model is applied. The constant C > 0 controls the trade-off between empirical risk minimisation (and potential overfitting, if C is large) and complexity minimisation (and potential underfitting, if C is small).

An important property of this risk functional is that errors of magnitude less than ǫ do not contribute to the cost R(w, b). Assuming ǫ is well matched to the noise present in the training data (an issue we will return to shortly), this should lend a degree of noise insensitivity to the cost function [3]. For convenience, the risk is usually expressed in terms of non-negative slack variables ξ, ξ*.
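To make the loss concrete, the following minimal Python sketch (ours, not part of the original paper) evaluates Vapnik's ǫ-insensitive loss and the regularised risk functional above; the explicit feature matrix Phi is an illustrative assumption only, since a kernelised solver never forms φ explicitly:

```python
import numpy as np

def eps_insensitive(r, eps):
    """Vapnik's epsilon-insensitive loss |r|_eps = max(|r| - eps, 0)."""
    return np.maximum(np.abs(r) - eps, 0.0)

def regularised_risk(w, b, Phi, z, C, eps):
    """R(w, b) = 0.5 w'w + (C/N) sum_i |g(x_i) - z_i|_eps, where
    g(x_i) = w'phi(x_i) + b and Phi is the N x d_H matrix of feature vectors."""
    N = len(z)
    residuals = Phi @ w + b - z
    return 0.5 * w @ w + (C / N) * eps_insensitive(residuals, eps).sum()
```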
Using this notation, the primal form of the ǫ-SVR training problem is:

\min_{w, b, \xi, \xi^*} R(w, b, \xi, \xi^*) = \frac{1}{2} w^T w + \frac{C}{N} \sum_{i=1}^{N} ( \xi_i + \xi_i^* )

such that:

w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i
z_i - w^T \varphi(x_i) - b \le \epsilon + \xi_i^*
\xi, \xi^* \ge \mathbf{0}

For reasons of mathematical tractability, it is usual to deal with the dual of this problem, which may be constructed as follows. With each of the inequality constraints we associate a non-negative Lagrange multiplier (respectively α_i, α_i*, γ_i and γ_i* for each i), noting that this gives a one-to-one correspondence between the training pair (x_i, z_i) and the Lagrange multipliers (α_i, α_i*); furthermore, at the solution at least one of α_i and α_i* will be zero for each i.

Using the usual techniques it is straightforward to show that the dual form of this problem is:

\min_{\alpha, \alpha^*} L(\alpha, \alpha^*) = \frac{1}{2} (\alpha - \alpha^*)^T G (\alpha - \alpha^*) - (\alpha - \alpha^*)^T z + \epsilon (\alpha + \alpha^*)^T \mathbf{1}

such that:

\mathbf{0} \le \alpha \le \tfrac{C}{N} \mathbf{1}, \quad \mathbf{0} \le \alpha^* \le \tfrac{C}{N} \mathbf{1}, \quad \mathbf{1}^T (\alpha - \alpha^*) = 0

where G_{i,j} = K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j), and K: \mathbb{R}^{d_L} \times \mathbb{R}^{d_L} \to \mathbb{R} is the kernel function associated with the map \varphi: \mathbb{R}^{d_L} \to \mathbb{R}^{d_H}. The trained machine g may be written in terms of the kernel function:

g(y) = \sum_i (\alpha_i - \alpha_i^*) K(x_i, y) + b

The bias b is calculated indirectly, using the fact that g(x_i) = z_i - \epsilon for all i such that 0 < \alpha_i < \frac{C}{N}, and likewise g(x_i) = z_i + \epsilon for all i such that 0 < \alpha_i^* < \frac{C}{N}. At no point during training or use of the trained machine is knowledge of the exact form of the map \varphi: \mathbb{R}^{d_L} \to \mathbb{R}^{d_H} required. Only the kernel function K is required, and any symmetric function K satisfying Mercer's condition [13] can be shown to be sufficient for the task [14].

Noting that each training vector corresponds to one pair (α_i, α_i*), and that at least one of these will be zero for all i, we may divide our training vectors into three distinct classes, namely:

Non-support vectors: \alpha_i = \alpha_i^* = 0, \xi_i = \xi_i^* = 0.
Boundary vectors: \alpha_i (or \alpha_i^*) \in (0, \frac{C}{N}), \xi_i = \xi_i^* = 0.
Error vectors: \alpha_i = \frac{C}{N}, \xi_i > 0 (or \alpha_i^* = \frac{C}{N}, \xi_i^* > 0).

Support vectors are any vectors which contribute to the kernel expansion of g, i.e. both boundary and error vectors. We define S to be the number of support vectors in the training set, E the number of error vectors and B the number of boundary vectors, so S = B + E. Non-support vectors are said to lie inside the ǫ-tube, boundary vectors on the edge of the ǫ-tube and error vectors outside the ǫ-tube. We will require the following theorem later:

Theorem 1 (see [14]): If the distribution from which the measured outputs \{ z_1, z_2, \ldots, z_N \} are drawn is smooth then:

\lim_{N \to \infty} \frac{S}{N} = \lim_{N \to \infty} \frac{E}{N}
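As an illustrative sketch (ours, assuming the Gaussian RBF kernel used later in section V), the Gram matrix and the kernel expansion of the trained machine can be computed as follows; alpha, alpha_star and b are taken as given, e.g. from a dual solver:

```python
import numpy as np

def rbf_kernel(X, Y, sigma_kernel=1.0):
    """Gaussian RBF kernel K(x, y) = exp(-||x - y||^2 / sigma_kernel^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / sigma_kernel ** 2)

def svr_predict(X_train, alpha, alpha_star, b, Y, sigma_kernel=1.0):
    """Trained machine g(y) = sum_i (alpha_i - alpha_i*) K(x_i, y) + b."""
    K = rbf_kernel(X_train, Y, sigma_kernel)   # shape (N_train, N_test)
    return (alpha - alpha_star) @ K + b
```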
A. Monomial ǫ-SV Regression

Consider the case ǫ = 0. If C is sufficiently large, and assuming that the empirical risk is non-zero, the second (empirical risk) term will be much larger than the first (regularisation) term. Hence, in this case:

R(w, b) \approx \frac{C}{N} \sum_{(x_i, z_i) \in \Theta} | g(x_i) - z_i |

But this is just the max-log-likelihood (ML) cost function for data affected by Laplacian noise [9]. It has been shown [9] that, under certain assumptions given in section II-C, the optimal value ǫ_opt for ǫ when the output is affected by Laplacian noise is 0. For other types of noise it is often found that ǫ_opt ≠ 0, as the empirical risk component of the cost function does not correspond to the ML cost in such cases. The presence of ǫ allows us to achieve a degree of noise insensitivity even though the cost function does not correspond to the ML cost function.

The question raised by these observations is whether one may achieve better performance in the SVR for non-Laplacian noise by modifying the primal cost function to match the ML cost function when C is large. However, when doing so we must be mindful of the effect any changes may have on the mathematical tractability of the problem, as mathematical simplicity is one of the major strengths of the SV approach.

One variant of standard SVR which is known to have a particularly simple dual form is Suykens' least squares (LS) SV method [10]. This uses the following modified primal cost function:

R_{LS}(w, b) = \frac{1}{2} w^T w + \frac{C}{N} \sum_{(x_i, z_i) \in \Theta} ( g(x_i) - z_i )^2

If C is sufficiently large the second (empirical risk) term in this expression will be dominant. But this empirical risk term is just the ML cost function for training data affected by Gaussian noise, so if C is sufficiently large then R_{LS}(w, b) will correspond approximately to the ML cost function for training data affected by Gaussian noise.

Hence one would expect the LS-SVR to perform well in the presence of Gaussian noise. However, if the noise is not Gaussian, the lack of ǫ-insensitivity in the primal is likely to make the LS-SVR excessively sensitive to noise. Motivated by this, we propose the following modification of the standard ǫ-SVR primal:

R_q(w, b) = \frac{1}{2} w^T w + \frac{C}{q N} \sum_{(x_i, z_i) \in \Theta} | g(x_i) - z_i |_\epsilon^q    (6)

where q \in \mathbb{Z}^+ is a constant. If C is large and ǫ = 0, the second (empirical risk) term will be dominant and hence R_q(w, b) will correspond approximately to the ML cost function for degree q polynomial noise, where polynomial noise of degree q is characterised by the density function p(\tau) = c e^{-d |\tau|^q} (c, d > 0 constants). If q = 1, (6) reduces to the standard ǫ-SV cost function. Similarly, if q = 2 and ǫ = 0, (6) reduces to the primal form of Suykens' LS-SVR. However, like the standard ǫ-SV cost function and unlike the LS-SVR cost function, (6) incorporates ǫ-insensitivity to achieve noise insensitivity when the empirical risk component of the cost function does not match the noise process affecting the training data. (As an alternative to the approach presented here, this problem may be tackled using a weighted LS-SVR method [11]. This involves a two-step process: first, a standard LS-SVR is constructed; based on this, weights are calculated for each training pair; these weights are subsequently used in a second, weighted LS-SVR, the training of which results in the trained machine.)

In terms of the usual slack variables, (6) may be written:

\min_{w, b, \xi, \xi^*} R_q(w, b, \xi, \xi^*) = \frac{1}{2} w^T w + \frac{C}{q N} \sum_{i=1}^{N} ( \xi_i^q + \xi_i^{*q} )

such that:

w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i    (7)
z_i - w^T \varphi(x_i) - b \le \epsilon + \xi_i^*
\xi, \xi^* \ge \mathbf{0}

We shall refer to a regressor of the form (7) as a monomial ǫ-SVR, as the empirical risk term is a monomial function of Vapnik's ǫ-insensitive cost. Unfortunately, for the general case q > 2 the dual form of (7) is rather complicated [15]. For this reason we will restrict ourselves to the special case q = 2 (quadric ǫ-SVR), in which case it turns out that the dual problem is mathematically nice. Before we construct the dual form of (7) we show that if q = 2 the positivity constraints \xi, \xi^* \ge \mathbf{0} in (7) are superfluous, giving us the simplified primal problem:

\min_{w, b, \xi, \xi^*} R_2(w, b, \xi, \xi^*) = \frac{1}{2} w^T w + \frac{C}{2 N} \sum_{i=1}^{N} ( \xi_i^2 + \xi_i^{*2} )

such that:

w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i    (8)
z_i - w^T \varphi(x_i) - b \le \epsilon + \xi_i^*

We have the following theorems:

Theorem 2: For every solution (w, b, \xi, \xi^*) of (8), \xi, \xi^* \ge \mathbf{0}.

Proof: Suppose there exists a solution (\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*) of (8) such that \bar{\xi}_i < 0 for some i. Then for all other (w, b, \xi, \xi^*) satisfying the constraints in (8), R_2(w, b, \xi, \xi^*) \ge R_2(\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*) by definition. Consider (w, b, \xi, \xi^*), where w = \bar{w}, b = \bar{b}, \xi^* = \bar{\xi}^*, \xi_j = \bar{\xi}_j for all j \ne i, and \xi_i = 0. First, note that as \bar{w}^T \varphi(x_i) + \bar{b} - z_i \le \epsilon + \bar{\xi}_i, with \bar{\xi}_i < 0 and \xi_i = 0, it follows that w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i. Hence (w, b, \xi, \xi^*) satisfies the constraints in (8). Second, note that:

R_2(\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*) = R_2(w, b, \xi, \xi^*) + \frac{C}{2N} \bar{\xi}_i^2 > R_2(w, b, \xi, \xi^*)

These two observations contradict the original assertion that (\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*) with \bar{\xi}_i < 0 for some i was a solution of (8). Hence, for all solutions (w, b, \xi, \xi^*) of (8), \xi \ge \mathbf{0}. The proof of the non-negativity of \xi^* follows from an analogous argument for the elements of that vector.

Theorem 3: Any solution (w, b, \xi, \xi^*) of (8) will also be a solution of (7) when q = 2, and vice-versa.

Proof: This follows trivially from theorem 2.
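For comparison of the three cost functions, here is a small sketch (ours, under the same illustrative explicit-Phi assumption as before) of the monomial primal risk (6); q = 1 recovers the standard ǫ-SVR risk, and q = 2 with eps = 0 recovers the LS-SVR risk:

```python
import numpy as np

def monomial_risk(w, b, Phi, z, C, eps, q=2):
    """Monomial epsilon-SVR risk (6):
    R_q(w, b) = 0.5 w'w + (C/(q N)) sum_i |g(x_i) - z_i|_eps^q."""
    N = len(z)
    r = np.maximum(np.abs(Phi @ w + b - z) - eps, 0.0)
    return 0.5 * w @ w + (C / (q * N)) * (r ** q).sum()
```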

Using (8) it is straightforward to construct the dual form of (7) when q = 2 via the usual method. The dual is:

\min_{\alpha, \alpha^*} L_2(\alpha, \alpha^*) = \frac{1}{2} (\alpha - \alpha^*)^T G (\alpha - \alpha^*) + \frac{N}{2C} \alpha^T \alpha + \frac{N}{2C} \alpha^{*T} \alpha^* - (\alpha - \alpha^*)^T z + \epsilon (\alpha + \alpha^*)^T \mathbf{1}    (9)

such that:

\alpha, \alpha^* \ge \mathbf{0}, \quad \mathbf{1}^T (\alpha - \alpha^*) = 0

where G is as before. The trained machine takes the same form as in the standard case. Note that the only constraints in (9) are positivity constraints on the dual variables α and α*, and a single equality constraint. Furthermore, \xi = \frac{N}{C} \alpha.

B. Training Issues

To clearly see the structure of the monomial ǫ-SVR dual problem (9), it is instructive to re-express it as follows:

\min_{\alpha, \alpha^*} L_2(\alpha, \alpha^*) = \frac{1}{2} \begin{bmatrix} \alpha \\ \alpha^* \end{bmatrix}^T Q \begin{bmatrix} \alpha \\ \alpha^* \end{bmatrix} + \begin{bmatrix} \alpha \\ \alpha^* \end{bmatrix}^T s
\quad \text{such that: } \alpha, \alpha^* \ge \mathbf{0}, \; \mathbf{1}^T (\alpha - \alpha^*) = 0    (10)

where:

Q = \begin{bmatrix} G & -G \\ -G & G \end{bmatrix} + \frac{N}{C} I, \qquad s = \begin{bmatrix} \epsilon \mathbf{1} - z \\ \epsilon \mathbf{1} + z \end{bmatrix}

Once we convert the problem into this form, it is essentially trivial to apply standard SVM training techniques (for example [16], [17]) to the problem. We also have the following useful property, which our formulation shares with Suykens' LS-SVR method:

Theorem 4: The dual problem (10) has a unique global solution.

Proof: It follows from page 79 of [18] that in order to prove that (10) has a unique solution it is sufficient to prove that Q is positive definite. Since we are using a Mercer kernel, G is positive semidefinite, and hence \begin{bmatrix} G & -G \\ -G & G \end{bmatrix} must be positive semidefinite. Given that C > 0, the term \frac{N}{C} I is positive definite. Hence Q must be positive definite, and (10) must have a unique global solution.

As an aside, note that if G is not positive semidefinite to working precision, Q may still be made positive definite in the manner of the Levenberg-Marquardt method [19] [20] by choosing C appropriately.
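The structure of (10) is easy to verify numerically. The sketch below (ours; the N/C diagonal term follows the normalisation conventions adopted above) assembles Q from a Gram matrix G and confirms positive definiteness via a Cholesky factorisation, which succeeds precisely when Q is numerically positive definite:

```python
import numpy as np

def quadric_dual_hessian(G, C):
    """Q = [[G, -G], [-G, G]] + (N/C) I, the Hessian of the quadric dual (10).

    For a Mercer (positive semidefinite) G and C > 0 the diagonal term makes
    Q strictly positive definite, hence the dual has a unique global solution.
    """
    N = G.shape[0]
    return np.block([[G, -G], [-G, G]]) + (N / C) * np.eye(2 * N)

def is_positive_definite(Q):
    """np.linalg.cholesky raises LinAlgError iff Q is not positive definite."""
    try:
        np.linalg.cholesky(Q)
        return True
    except np.linalg.LinAlgError:
        return False
```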

C. Asymptotically Optimal Selection of ǫ

In [9], Smola describes how, based on certain assumptions, the parameter ǫ may be selected in an optimal fashion for the standard ǫ-SVR. While the assumptions made in that paper are not met by the SVR, experimental results suggest that regardless of this the predictions made are reasonably accurate, and certainly provide a useful first guess of the optimal value for ǫ. More importantly, Smola derives a relationship between the variance of the measurement noise and the efficiency of the trained machine if the machine is trained using a given ǫ. Smola's (and our) assumptions are:

- The training set is infinitely large.
- The function g estimated by the SVR converges to the actual relationship ĝ.
- The general SVR model is replaced by an unregularised location parameter estimator.

Mathematically, the third assumption is met in the limit K → 1, C → ∞, and implies that g(x) = b. Throughout the present paper, these will be referred to as the usual assumptions.

Following [9], assume Z = \{ z_1, z_2, \ldots, z_N \} is drawn in an i.i.d. manner from some probability density function p(z - \theta) with mean θ and variance σ². Let \hat{\theta}(Z) be an unbiased estimator for θ. The efficiency e of the estimator is defined to be [22]:

e = \frac{1}{\det( I B )}

where I is the Fisher information matrix and B is the covariance matrix [21]. Loosely speaking, the higher the efficiency, the better the estimator, as there is less variance in the estimate of the mean θ. By optimal selection of ǫ, we mean choosing ǫ to maximise the efficiency e. The value which achieves this maximum will be written ǫ_opt. The Cramer-Rao bound [21] states that e ≤ 1.

For a single parameter estimator of the form:

\hat{\theta}(Z) = \arg \min_{\hat{\theta}} \sum_{z \in Z} d(z, \hat{\theta})

where d(z, \hat{\theta}) is a piecewise twice differentiable function of \hat{\theta}, then, asymptotically (see the lemma in [9]):

e = \frac{Q^2}{I G}

where:

I = \int \left( \frac{\partial}{\partial \theta} \ln p(z - \theta) \right)^2 p(z - \theta) \, dz
G = \int \left( \frac{\partial d(z, \theta)}{\partial \theta} \right)^2 p(z - \theta) \, dz
Q = \int \frac{\partial^2 d(z, \theta)}{\partial \theta^2} p(z - \theta) \, dz

Making the usual assumptions, the monomial ǫ-SVR is an estimator of this form. That is:

d(z, \hat{\theta}) = \frac{1}{q} | z - \hat{\theta} |_\epsilon^q

where \hat{\theta} = b and q \in \mathbb{Z}^+. Define p_{std}(\tau) and q_{std}(\tau) to be, respectively, the normalised (zero mean, unit variance) and symmetrised normalised distribution functions. That is:

p_{std}(\tau) = \sigma p(\sigma \tau), \qquad q_{std}(\tau) = p_{std}(\tau) + p_{std}(-\tau)

Then, denoting \Lambda = \epsilon / \sigma:

I = \frac{1}{\sigma^2} I_{std}, \qquad I_{std} = \int \left( \frac{\partial}{\partial \tau} \ln p_{std}(\tau) \right)^2 p_{std}(\tau) \, d\tau

G = \sigma^{2q-2} \int_{\Lambda}^{\infty} (\tau - \Lambda)^{2q-2} \, q_{std}(\tau) \, d\tau

Q = \begin{cases} \frac{1}{\sigma} q_{std}(\Lambda) & \text{if } q = 1 \\ (q-1) \, \sigma^{q-2} \int_{\Lambda}^{\infty} (\tau - \Lambda)^{q-2} \, q_{std}(\tau) \, d\tau & \text{if } q \ge 2 \end{cases}

Consequently:

e = \begin{cases} \dfrac{ q_{std}(\Lambda)^2 }{ I_{std} \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau } & \text{if } q = 1 \\[2ex] \dfrac{ (q-1)^2 \left( \int_{\Lambda}^{\infty} (\tau - \Lambda)^{q-2} \, q_{std}(\tau) \, d\tau \right)^2 }{ I_{std} \int_{\Lambda}^{\infty} (\tau - \Lambda)^{2q-2} \, q_{std}(\tau) \, d\tau } & \text{if } q \ge 2 \end{cases}

is the efficiency of the monomial ǫ-SVR under the usual assumptions. Note that σ has cancelled, so e depends on ǫ and σ only through Λ = ǫ/σ. The optimal value ǫ_opt of the parameter ǫ is defined to be the value which maximizes this efficiency (thereby minimising the variance in the estimate of b). Defining:

\Lambda_{opt} = \arg \max_{\Lambda} e

it is clear that maximum efficiency will be achieved by setting \epsilon = \epsilon_{opt} = \Lambda_{opt} \sigma. This implies that ǫ_opt is directly proportional to the noise standard deviation σ, where the constant of proportionality is dependent on the type of noise. If the type and amount (variance) of the noise are both known, this provides a reasonable basis for selecting ǫ (or at least a reasonable first guess).

If q = 2, the above can be used to obtain the theoretical efficiency of Suykens' LS-SVR under the usual assumptions by setting ǫ = 0. For later reference, we define e_LS to be this efficiency, i.e.:

e_{LS} = \frac{ \left( \int_{0}^{\infty} q_{std}(\tau) \, d\tau \right)^2 }{ I_{std} \int_{0}^{\infty} \tau^2 \, q_{std}(\tau) \, d\tau }
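The efficiency expressions above are straightforward to evaluate numerically. The following sketch (ours, using Gaussian noise as the worked example, for which I_std = 1) computes e(Λ) by quadrature and locates Λ_opt by grid search:

```python
import numpy as np
from scipy.integrate import quad

def q_std_gauss(t):
    """Symmetrised unit-variance Gaussian: q_std(t) = 2 p_std(t) for t >= 0."""
    return 2.0 * np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

I_STD_GAUSS = 1.0  # Fisher information of the unit-variance Gaussian

def efficiency(lam, q=2, q_std=q_std_gauss, i_std=I_STD_GAUSS):
    """Asymptotic efficiency e(Lambda) of the monomial epsilon-SVR."""
    if q == 1:
        return q_std(lam) ** 2 / (i_std * quad(q_std, lam, np.inf)[0])
    m1 = quad(lambda t: (t - lam) ** (q - 2) * q_std(t), lam, np.inf)[0]
    m2 = quad(lambda t: (t - lam) ** (2 * q - 2) * q_std(t), lam, np.inf)[0]
    return ((q - 1) * m1) ** 2 / (i_std * m2)

lams = np.linspace(0.0, 3.0, 301)
lam_opt = lams[np.argmax([efficiency(l, q=2) for l in lams])]
```

For Gaussian noise and q = 2 this grid search returns Λ_opt = 0, consistent with theorem 9 of section IV.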

D. Sparsity of the monomial ǫ-SVR

As discussed previously, part of our motivation for developing the framework for monomial ǫ-SVR is Suykens' LS-SVR technique. Indeed, if C is sufficiently large and the noise affecting measurements is Gaussian, LS-SVR corresponds approximately to the ML estimator for the parameters w and b. However, a downside of the LS-SVR approach is the lack of sparsity in the solution [11], where by sparsity we are referring here to the number of non-zero elements in the vector α - α* or, equivalently (see the classification of training vectors in section II), the fraction of training vectors that are also support vectors. In the LS-SVR case, all training vectors are support vectors, i.e. S = N.

While the sparsity problems of the LS-SVR may be overcome to a degree using the weighted LS-SVR approach [11], this approach is somewhat heuristic in nature, requiring some external arbiter (either human or machine) to decide when to cease pruning the dataset. However, by maintaining the ǫ-insensitive component of the cost function, the need for such heuristics (at least at this level of abstraction) is removed. (Of course, some heuristic input will still be required for each approach, either for ǫ selection or when deciding how much compromise is acceptable during pruning. The advantage of the former is that there exist alternative criteria which may be used to select ǫ, i.e. optimal performance for a given noise model, with sparsity properties being just a useful side effect of this choice. If the dataset is very large, however, sparsity may be of primary importance, in which case neither approach will have a clear advantage.) So long as ǫ > 0, we would expect that the solution α - α* of the monomial ǫ-SVR dual problem will contain some non-zero fraction of non-support vectors. Under the usual assumptions, we have the following theorem:

Theorem 5: The fraction of support vectors found by the monomial ǫ-SVR under the usual assumptions will, asymptotically, be:

\lim_{N \to \infty} \frac{S}{N} = \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau

where \Lambda = \epsilon / \sigma.

Proof: Given that error vectors are those for which | g(x_i) - z_i | - \epsilon > 0 (i.e. those lying outside the ǫ-tube), using theorem 1 it can be seen that:

\lim_{N \to \infty} \frac{S}{N} = \lim_{N \to \infty} \frac{E}{N} = \Pr( | g(x) - \hat{g}(x) | > \epsilon ) = \int_{z \in \mathbb{R} \setminus [\theta - \epsilon, \theta + \epsilon]} p(z - \theta) \, dz = \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau

where \Lambda = \epsilon / \sigma.

Note that this implies that:

\lim_{\epsilon \to 0} \lim_{N \to \infty} \frac{S}{N} = \int_{0}^{\infty} q_{std}(\tau) \, d\tau = 1

as expected for the LS-SVR. It also implies that any decrease in ǫ is likely to lead to a decrease in the sparsity of the solution. So, in general, if the training set is large then ǫ should be chosen to be as large as possible, while still maintaining acceptable performance, to maximise the sparsity of the solution.
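Theorem 5 likewise lends itself to direct numerical evaluation; a short sketch (ours), again using the Gaussian example:

```python
import numpy as np
from scipy.integrate import quad

def q_std_gauss(t):
    """Symmetrised unit-variance Gaussian density, t >= 0."""
    return 2.0 * np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def sv_fraction(lam, q_std=q_std_gauss):
    """Asymptotic support-vector fraction: lim S/N = int_Lambda^inf q_std."""
    return quad(q_std, lam, np.inf)[0]

print(sv_fraction(0.0))  # 1.0: the LS-SVR limit, every vector a support vector
print(sv_fraction(1.0))  # ~0.317: mass outside a one-standard-deviation tube
```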

III. ν-SV REGRESSION

The major drawback of ǫ-SVR is apparent in the relation \epsilon_{opt} = \Lambda_{opt} \sigma. Specifically, selection of ǫ requires knowledge of what type of, and how much, noise is present in the training set (or alternatively we must have a specific intended accuracy to use as a basis for selecting ǫ). However, while we may have some idea of the type of noise we can expect to be dealing with for a given problem (and hence be able to calculate \Lambda_{opt}), we are unlikely to know how much noise will be present, leaving σ and therefore ǫ_opt uncertain.

To overcome this problem, Scholkopf et al. [7] introduced the ν-SVR formulation of the problem, which includes an additional term in the primal problem to trade off the tube size (ǫ, no longer a constant) against model complexity and empirical risk. From [7], the primal formulation of the standard ν-SVR training problem is:

\min_{w, b, \xi, \xi^*, \epsilon} R(w, b, \xi, \xi^*, \epsilon) = \frac{1}{2} w^T w + C \nu \epsilon + \frac{C}{N} \sum_{i=1}^{N} ( \xi_i + \xi_i^* )

such that:

w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i
z_i - w^T \varphi(x_i) - b \le \epsilon + \xi_i^*    (26)
\xi, \xi^* \ge \mathbf{0}, \quad \epsilon \ge 0

where ν > 0 is a constant. The associated dual is [7]:

\min_{\alpha, \alpha^*} L(\alpha, \alpha^*) = \frac{1}{2} (\alpha - \alpha^*)^T G (\alpha - \alpha^*) - (\alpha - \alpha^*)^T z

such that:

\mathbf{0} \le \alpha \le \tfrac{C}{N} \mathbf{1}, \quad \mathbf{0} \le \alpha^* \le \tfrac{C}{N} \mathbf{1}    (27)
\mathbf{1}^T (\alpha - \alpha^*) = 0, \quad \mathbf{1}^T (\alpha + \alpha^*) \le C \nu

where G is as before. We require the following theorem:

Theorem 6 (see [14]): For a ν-SVR trained with any training set of size N:
- ν is an upper bound on the fraction of error vectors.
- ν is a lower bound on the fraction of support vectors.

From this theorem we can immediately see that:

\nu = \lim_{N \to \infty} \frac{S}{N}

A. Monomial ν-SV Regression

We shall demonstrate shortly that there is a direct functional relationship between ν and the theoretical efficiency e (under the usual assumptions) that is independent of σ. This means that it is possible to find ν_opt to achieve maximum theoretical efficiency without knowing the noise variance σ² (although the type of noise is still required). However, the price we pay for this is increased complexity in the dual formulation (27), specifically the presence of a new constraint, \mathbf{1}^T (\alpha + \alpha^*) \le C \nu. Also, like standard ǫ-SVR, the empirical risk component of the standard ν-SVR primal corresponds to ML only for Laplacian noise. To deal with these issues, we introduce the following formulation:

\min_{w, b, \xi, \xi^*, \epsilon} R_{q,r}(w, b, \xi, \xi^*, \epsilon) = \frac{1}{2} w^T w + \frac{C \nu}{r} \epsilon^r + \frac{C}{q N} \sum_{i=1}^{N} ( \xi_i^q + \xi_i^{*q} )

such that:

w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i
z_i - w^T \varphi(x_i) - b \le \epsilon + \xi_i^*    (28)
\xi, \xi^* \ge \mathbf{0}, \quad \epsilon \ge 0

where q, r \in \mathbb{Z}^+ are constants. We shall refer to this as monomial ν-SVR when q = r. This reduces to the standard ν-SVR primal (26) if we choose q = r = 1. We will be concentrating on the case q = r = 2 (quadric ν-SVR). We have already shown in theorem 2 (which holds here also) that the positivity constraints \xi, \xi^* \ge \mathbf{0} are superfluous if q = 2. We now show that the constraint \epsilon \ge 0 is also superfluous if r = 2, giving us the simplified primal problem when q = r = 2:

\min_{w, b, \xi, \xi^*, \epsilon} R_{2,2}(w, b, \xi, \xi^*, \epsilon) = \frac{1}{2} w^T w + \frac{C \nu}{2} \epsilon^2 + \frac{C}{2 N} \sum_{i=1}^{N} ( \xi_i^2 + \xi_i^{*2} )

such that:

w^T \varphi(x_i) + b - z_i \le \epsilon + \xi_i    (29)
z_i - w^T \varphi(x_i) - b \le \epsilon + \xi_i^*

We have the following theorems:

Theorem 7: For every solution (w, b, \xi, \xi^*, \epsilon) of (29), \epsilon \ge 0.

Proof: Suppose there exists a solution (\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, \bar{\epsilon}) of (29) such that \bar{\epsilon} < 0. Then:

R_{2,2}(\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, \bar{\epsilon}) = R_{2,2}(\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, 0) + \frac{C \nu}{2} \bar{\epsilon}^2 > R_{2,2}(\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, 0)

Furthermore, if the constraints in (29) are satisfied for (\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, \bar{\epsilon}) they must also be satisfied for (\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, 0). Therefore (\bar{w}, \bar{b}, \bar{\xi}, \bar{\xi}^*, \bar{\epsilon}) does not minimise R_{2,2} subject to the constraints, and hence cannot be a solution of (29), which contradicts the original assertion. So we conclude that every solution (w, b, \xi, \xi^*, \epsilon) of (29) must satisfy \epsilon \ge 0.

Theorem 8: Any solution (w, b, \xi, \xi^*, \epsilon) of (29) will also be a solution of (28) when q = r = 2, and vice-versa.

Proof: This follows trivially from theorems 7 and 2.

It is not difficult to show that the dual form of (29) (and hence of (28) with q = r = 2) is:

\min_{\alpha, \alpha^*} L_{2,2}(\alpha, \alpha^*) = \frac{1}{2} (\alpha - \alpha^*)^T G (\alpha - \alpha^*) + \frac{1}{2 C \nu} (\alpha + \alpha^*)^T E (\alpha + \alpha^*) + \frac{N}{2C} \alpha^T \alpha + \frac{N}{2C} \alpha^{*T} \alpha^* - (\alpha - \alpha^*)^T z    (30)

such that:

\alpha, \alpha^* \ge \mathbf{0}, \quad \mathbf{1}^T (\alpha - \alpha^*) = 0

where E is a matrix whose elements are all 1. The trained machine takes the same form as before. Furthermore:

\epsilon = \frac{1}{C \nu} \mathbf{1}^T (\alpha + \alpha^*), \qquad \xi = \frac{N}{C} \alpha

from which we see that:

\epsilon = \frac{1}{\nu N} \sum_{(x_i, z_i) \in \Theta} | g(x_i) - z_i |_\epsilon

Note that the only constraints in (30) are positivity constraints on the dual variables α and α*, and a single equality constraint. By comparison, the standard ν-SVR dual (27), which is otherwise of comparable complexity, has upper bounds on these variables and an upper bound on \mathbf{1}^T (\alpha + \alpha^*). For training purposes, note that (30) may be expressed identically to (10) by setting:

Q = \begin{bmatrix} G & -G \\ -G & G \end{bmatrix} + \frac{1}{C \nu} \begin{bmatrix} E & E \\ E & E \end{bmatrix} + \frac{N}{C} I, \qquad s = \begin{bmatrix} -z \\ z \end{bmatrix}

Like the monomial ǫ-SVR, this form will have a unique global minimum (to see this, note that the additional term involving E is positive semidefinite and therefore will not affect the validity of theorem 4).

Finally, note that in the limit ν → ∞ it is clear from the above that if q = r = 2 then ǫ → 0. In fact, it is clear from the general form of the monomial ν-SVR (28) that as ν → ∞ we must have ǫ → 0 to ensure that the primal cost R_{q,r} remains finite. This implies that in the limit ν → ∞ the form of the quadric ν-SVR approaches the form of the LS-SVR.
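In code, the quadric ν-SVR dual differs from the quadric ǫ-SVR dual only by the tube term, and ǫ is recovered from the dual solution afterwards. A minimal sketch (ours, under the same normalisation conventions as the earlier Hessian sketch):

```python
import numpy as np

def quadric_nu_dual_hessian(G, C, nu):
    """Hessian of the quadric nu-SVR dual (30): the quadric epsilon-SVR
    Hessian plus the tube term (1/(C nu)) [[E, E], [E, E]], E = ones."""
    N = G.shape[0]
    E = np.ones((N, N))
    Q = np.block([[G, -G], [-G, G]]) + (N / C) * np.eye(2 * N)
    return Q + np.block([[E, E], [E, E]]) / (C * nu)

def recover_eps(alpha, alpha_star, C, nu):
    """Tube width implied by a dual solution: eps = 1'(alpha + alpha*)/(C nu)."""
    return (alpha + alpha_star).sum() / (C * nu)
```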

B. Asymptotically Optimal Selection of ν

In section II-C, we showed that for ǫ-SVR the theoretical efficiency e is a function of \Lambda = \epsilon / \sigma. We now show that e may also be expressed (somewhat indirectly) as a function of ν (making the usual assumptions), independent of σ, if q = r. The advantage of this form is that, unlike for ǫ_opt, calculation of ν_opt (the value of ν which results in maximum efficiency) does not require knowledge of σ.

Consider the regularised cost function in its primal form (28), where the positivity constraints on ǫ, ξ and ξ* may or may not be superfluous, and q, r \in \mathbb{Z}^+. First, we note that:

\frac{\partial R_{q,r}}{\partial \epsilon} = C \left( \nu \epsilon^{r-1} + \frac{1}{N} \sum_{i=1}^{N} \left( \xi_i^{q-1} \frac{\partial \xi_i}{\partial \epsilon} + \xi_i^{* \, q-1} \frac{\partial \xi_i^*}{\partial \epsilon} \right) \right)

where:

\frac{\partial \xi_i}{\partial \epsilon} = \begin{cases} -1 & \text{if } g(x_i) > z_i + \epsilon \text{, or } g(x_i) = z_i + \epsilon \text{ and } \delta\epsilon < 0 \\ 0 & \text{otherwise} \end{cases}

\frac{\partial \xi_i^*}{\partial \epsilon} = \begin{cases} -1 & \text{if } g(x_i) < z_i - \epsilon \text{, or } g(x_i) = z_i - \epsilon \text{ and } \delta\epsilon < 0 \\ 0 & \text{otherwise} \end{cases}

(We have taken some liberties here when calculating this derivative, which is not actually well defined at the tube boundary. However, the rate of change as ǫ is either increased or decreased is well defined, which is what we are indicating here with the notation δǫ.)

and hence:

\frac{\partial R_{q,r}}{\partial \epsilon} = \begin{cases} C \left( \nu \epsilon^{r-1} - \frac{S}{N} \right) & \text{if } q = 1, \; \delta\epsilon < 0 \\ C \left( \nu \epsilon^{r-1} - \frac{E}{N} \right) & \text{if } q = 1, \; \delta\epsilon > 0 \\ C \left( \nu \epsilon^{r-1} - \frac{1}{N} \sum_{i=1}^{N} ( \xi_i^{q-1} + \xi_i^{* \, q-1} ) \right) & \text{otherwise} \end{cases}
= \begin{cases} C \left( \nu \epsilon^{r-1} - \frac{S}{N} \right) & \text{if } q = 1, \; \delta\epsilon < 0 \\ C \left( \nu \epsilon^{r-1} - \frac{E}{N} \right) & \text{if } q = 1, \; \delta\epsilon > 0 \\ C \left( \nu \epsilon^{r-1} - \frac{1}{N} \sum_{i=1}^{N} | z_i - g(x_i) |_\epsilon^{q-1} \right) & \text{otherwise} \end{cases}

We aim to select ν to arrive at a specific ǫ. In order to achieve this, the optimality condition \frac{\partial R_{q,r}}{\partial \epsilon} = 0 must be met for this particular value of ǫ. In other words, if q ≥ 2 we require that:

\nu = \epsilon^{1-r} \frac{1}{N} \sum_{i=1}^{N} ( \xi_i^{q-1} + \xi_i^{* \, q-1} ) = \epsilon^{1-r} \frac{1}{N} \sum_{i=1}^{N} | z_i - w^T \varphi(x_i) - b |_\epsilon^{q-1}

Applying the usual assumptions (g(x) = b, N → ∞), we find that in all cases (the case q = 1 is covered in the footnote below):

\nu = \sigma^{q-r} \Lambda^{1-r} \int_{\Lambda}^{\infty} (\tau - \Lambda)^{q-1} \, q_{std}(\tau) \, d\tau

where, as usual, \Lambda = \epsilon / \sigma, and q_{std}(\tau) is as defined in section II-C. If q = r then:

\nu = \Lambda^{1-q} \int_{\Lambda}^{\infty} (\tau - \Lambda)^{q-1} \, q_{std}(\tau) \, d\tau

This implies that if q = r then ν may be selected to achieve a particular efficiency e_select (assuming said efficiency is achievable) by finding the Λ required to achieve this efficiency and then substituting this Λ into the relevant expression for ν. At no point is it necessary to use σ, so only the noise type is required. Note that if we select q = r = 1, we retrieve Smola's original result [9]:

\nu = \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau

from which we see that for a standard ν-SVR, if ν = 1 then, in the limit, \Lambda = \epsilon = 0. Generally, if q = r then in order to achieve maximum efficiency we should choose:

\nu_{opt} = \Lambda_{opt}^{1-q} \int_{\Lambda_{opt}}^{\infty} (\tau - \Lambda_{opt})^{q-1} \, q_{std}(\tau) \, d\tau

where \Lambda_{opt} = \arg \max_{\Lambda} e.

(Footnote: If q = 1 then, in the limit N → ∞, the derivative becomes:

\lim_{N \to \infty} \frac{\partial R_{1,r}}{\partial \epsilon} = \begin{cases} C \left( \nu \epsilon^{r-1} - \lim_{N \to \infty} \frac{S}{N} \right) & \text{if } \delta\epsilon < 0 \\ C \left( \nu \epsilon^{r-1} - \lim_{N \to \infty} \frac{E}{N} \right) & \text{if } \delta\epsilon > 0 \end{cases}

which is well defined, as theorem 1 states that \lim_{N \to \infty} \frac{S}{N} = \lim_{N \to \infty} \frac{E}{N}. For optimality, \frac{\partial R_{1,r}}{\partial \epsilon} = 0, and so using theorem 5 it follows that, if q = 1:

\nu = \sigma^{1-r} \Lambda^{1-r} \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau )
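Putting the last two results together gives a variance-free recipe for choosing ν: find Λ_opt from the efficiency curve, then map it through ν(Λ). A sketch (ours, again with the Gaussian example):

```python
import numpy as np
from scipy.integrate import quad

def q_std_gauss(t):
    """Symmetrised unit-variance Gaussian density, t >= 0."""
    return 2.0 * np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi)

def nu_from_lambda(lam, q=2, q_std=q_std_gauss):
    """nu(Lambda) = Lambda^(1-q) int_Lambda^inf (t - Lambda)^(q-1) q_std(t) dt."""
    tail = quad(lambda t: (t - lam) ** (q - 1) * q_std(t), lam, np.inf)[0]
    return lam ** (1.0 - q) * tail

# For Gaussian noise and q = 2, Lambda_opt = 0 and nu(Lambda) diverges as
# Lambda -> 0+, reflecting the LS-SVR limit nu -> infinity noted above:
for lam in (1.0, 0.1, 0.01):
    print(lam, nu_from_lambda(lam))
```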

C. Sparsity of the Monomial ν-SVR

We have shown in theorem 5 that, under the usual assumptions:

\lim_{N \to \infty} \frac{S}{N} = \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau

which is a one-to-one function of Λ if the distribution q_{std}(\tau) is positive for all \tau \ge 0. We have also shown in the previous section that (again using these assumptions) if q = r then:

\nu = \Lambda^{1-q} \int_{\Lambda}^{\infty} (\tau - \Lambda)^{q-1} \, q_{std}(\tau) \, d\tau

which is also a one-to-one function of Λ if q_{std}(\tau) is positive for all \tau \ge 0. These two functions demonstrate that in general there exists a direct relation between the asymptotic value of the fraction of support vectors in the training set (and hence the sparsity of the solution) and the parameter ν. In general, the exact nature of this relation will be dependent on the form of the noise process affecting the training data. In the special case q = r = 1 (i.e. standard ν-SVR), we see that:

\nu = \int_{\Lambda}^{\infty} q_{std}(\tau) \, d\tau

and so:

\nu = \lim_{N \to \infty} \frac{S}{N}

which coincides with the asymptotic form of theorem 6.

IV. PERFORMANCE IN THE PRESENCE OF POLYNOMIAL NOISE

To gain a better understanding of the method, we will consider a particular example where the training data is affected by additive noise that is polynomial in nature (this includes Gaussian and Laplacian noise as special cases, p = 2 and p = 1 respectively). For polynomial noise of degree p:

p_{std}(\tau) = c_1 e^{-c_2 |\tau|^p}, \qquad p \in \mathbb{Z}^+

where:

c_1 = \frac{p \, c_2^{1/p}}{2 \, \Gamma( \frac{1}{p} )}, \qquad c_2 = \left( \frac{\Gamma( \frac{3}{p} )}{\Gamma( \frac{1}{p} )} \right)^{p/2}

We will consider the related questions of optimal parameter selection and sparsity, comparing Smola's standard ν-SVR, our monomial ν-SVR and Suykens' LS-SVR method.

A. Asymptotically Optimal Selection of ǫ and ν

Using the efficiency and ν expressions of sections II-C and III-B, it is not difficult to show that, under the usual assumptions, for a monomial ν-SVR (denoting by e_{p,q} the efficiency of an order q monomial SVR for a training set affected by polynomial noise of order p, and likewise by \nu_{p,q}(\Lambda) the connection between ν and Λ):

e_{p,q} = \begin{cases} \dfrac{ 2 c_1 \, e^{-2 c_2 \Lambda^p} }{ I_{std} \, \daleth_0(p, c_2; \Lambda) } & \text{if } q = 1 \\[2ex] \dfrac{ 2 c_1 (q-1)^2 \, \daleth_{q-2}(p, c_2; \Lambda)^2 }{ I_{std} \, \daleth_{2q-2}(p, c_2; \Lambda) } & \text{if } q \ge 2 \end{cases}

e_{LS} = \frac{ \Gamma( \frac{1}{p} )^2 }{ p^2 \, \Gamma( 2 - \frac{1}{p} ) \, \Gamma( \frac{3}{p} ) }

where I_{std} = \frac{ p^2 c_2^{2/p} \Gamma( 2 - \frac{1}{p} ) }{ \Gamma( \frac{1}{p} ) }, and, for q = r:

\nu_{p,q}(\Lambda) = 2 c_1 \Lambda^{1-q} \, \daleth_{q-1}(p, c_2; \Lambda)

Here:

\daleth_m(p, \beta; \Lambda) = \int_{\Lambda}^{\infty} (\tau - \Lambda)^m e^{-\beta \tau^p} \, d\tau = \sum_{i=0}^{m} \binom{m}{i} (-\Lambda)^{m-i} \frac{1}{p} \beta^{-\frac{i+1}{p}} \Gamma\!\left( \frac{i+1}{p}, \beta \Lambda^p \right)

for m \in \mathbb{Z} \setminus \mathbb{Z}^- and p \in \mathbb{Z}^+; and \Gamma(a, x) is the complementary incomplete gamma function [23]:

\Gamma(a, x) = \int_{x}^{\infty} t^{a-1} e^{-t} \, dt
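The constants c_1, c_2 and samples of polynomial noise are easily generated. The sketch below (ours) uses the standard construction of generalised Gaussian variates from Gamma variates, an assumption of convenience rather than the paper's own sampler:

```python
import numpy as np
from scipy.special import gamma

def poly_noise_constants(p):
    """c1, c2 for the unit-variance density p_std(t) = c1 exp(-c2 |t|^p)."""
    c2 = (gamma(3.0 / p) / gamma(1.0 / p)) ** (p / 2.0)
    c1 = p * c2 ** (1.0 / p) / (2.0 * gamma(1.0 / p))
    return c1, c2

def sample_poly_noise(p, sigma, size, rng=None):
    """Polynomial (generalised Gaussian) noise of degree p, std dev sigma.

    If U ~ Gamma(1/p, 1), then sign * (U/c2)^(1/p) has density c1 exp(-c2|t|^p);
    scaling by sigma gives the requested standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, c2 = poly_noise_constants(p)
    u = rng.gamma(1.0 / p, 1.0, size)
    s = rng.choice([-1.0, 1.0], size)
    return sigma * s * (u / c2) ** (1.0 / p)
```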

Table I shows the optimal values for ǫ and ν for polynomial noise of degrees 1 to 6 for both standard and quadric ǫ-SVR and ν-SVR (source: [12]).

TABLE I. Optimal ǫ and ν for standard and quadric (labelled S and Q, respectively) ǫ-SVR and ν-SVR methods with polynomial additive noise of degrees 1 to 6, and asymptotic support vector ratio S/N at optimality. (Rows: ǫ_opt(S)/σ, ν_opt(S), lim S/N for S, ǫ_opt(Q)/σ, ν_opt(Q), lim S/N for Q; columns: degree p = 1 to 6.)

Unsurprisingly, the optimal value of ǫ for the quadric ǫ-SVR in the presence of Gaussian noise (p = 2) is 0, as in this case the quadric ǫ-SVR primal and the maximum-likelihood estimator both take approximately the same form if ǫ = 0 and C is large, just as ǫ_opt = 0 for the standard ǫ-SVR in the presence of Laplacian noise (p = 1). Note also that, for the cases represented in the table, ǫ_opt > 0 for all p > q and ǫ_opt = 0 for all p ≤ q. Generally:

Theorem 9: Under the usual assumptions, given a set of training data affected by polynomial noise of degree p = q with non-zero variance, the optimal value ǫ_opt (which maximizes e_{p,q}) for the parameter ǫ will be zero.

Theorem 10: Under the usual assumptions, given a set of training data affected by polynomial noise of degree p > q with non-zero variance, the optimal value ǫ_opt (which maximizes e_{p,q}) for the parameter ǫ will be positive.

Conjecture 1: Under the usual assumptions, given a set of training data affected by polynomial noise of degree p < q with non-zero variance, the optimal value ǫ_opt (which maximizes e_{p,q}) for the parameter ǫ will be zero.

The proofs for theorems 9 and 10, and a partial proof of conjecture 1 (which we have observed experimentally to be true), may be found in appendix I.

Figure 1 shows the inverse asymptotic efficiency versus Λ = ǫ/σ for both standard and quadric ǫ-SVRs, as well as LS-SVR. Note that, with the exception of Laplacian noise (p = 1), the optimal theoretical efficiency of the quadric ǫ-SVR exceeds the optimal theoretical efficiency of the standard ǫ-SVR. Also note that the efficiency of the monomial ǫ-SVR methods exceeds (or at worst equals) that of the LS-SVR for all p.

Fig. 1. Comparative inverse asymptotic efficiency versus ǫ/σ of standard ǫ-SVR, quadric ǫ-SVR and LS-SVR for polynomial noise of degrees 1 to 6 (one panel per degree). In all cases, the solid line represents the efficiency of the quadric ǫ-SVR, the dotted line the efficiency of the standard ǫ-SVR, and the dashed line the efficiency of the LS-SVR.

Finally, figure 2 shows the inverse asymptotic efficiency versus ν for both standard and quadric ν-SVRs, as well as LS-SVR. The important thing to note from these graphs is that, although the range of ν is much larger for quadric ν-SVR methods than for standard ν-SVR methods, the efficiency itself quickly flattens out. In particular, although the theoretically optimal value of ν for Gaussian noise is unbounded (ν → ∞), it can be seen that even at the largest ν shown the efficiency is very close to its maximum, e = 1. Also note the comparative flatness of the efficiency curves for the quadric ν-SVR.

B. Sparsity Issues

In the case of polynomial noise, theorem 5 implies that for monomial SVRs:

\lim_{N \to \infty} \frac{S}{N} = 2 c_1 \, \daleth_0(p, c_2; \Lambda)

We have already observed that for monomial ν-SVRs:

\nu_{p,q}(\Lambda) = 2 c_1 \Lambda^{1-q} \, \daleth_{q-1}(p, c_2; \Lambda)

Or, in the two cases of particular interest, standard (q = 1) and quadric (q = 2) ν-SVR:

\nu_{p,1}(\Lambda) = 2 c_1 \, \daleth_0(p, c_2; \Lambda)
\nu_{p,2}(\Lambda) = \frac{2 c_1}{\Lambda} \, \daleth_1(p, c_2; \Lambda)

In the case q = 1, as expected, this implies:

\lim_{N \to \infty} \frac{S}{N} = \nu
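The curves of figure 3 (below) can be reproduced by sweeping Λ and evaluating (ν(Λ), lim S/N) parametrically; a sketch (ours, with the ℸ integrals evaluated by quadrature rather than via incomplete gamma functions):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def sv_fraction_vs_nu(p, q, lams):
    """Parametric curve (nu(Lambda), lim S/N) for degree-p polynomial noise.

    lams must be positive; for q >= 2 the nu coordinate diverges at Lambda = 0.
    """
    c2 = (gamma(3.0 / p) / gamma(1.0 / p)) ** (p / 2.0)
    c1 = p * c2 ** (1.0 / p) / (2.0 * gamma(1.0 / p))
    q_std = lambda t: 2.0 * c1 * np.exp(-c2 * t ** p)
    nus, fracs = [], []
    for lam in lams:
        tail = quad(lambda t: (t - lam) ** (q - 1) * q_std(t), lam, np.inf)[0]
        nus.append(lam ** (1.0 - q) * tail)
        fracs.append(quad(q_std, lam, np.inf)[0])
    return np.array(nus), np.array(fracs)

# For q = 1 the two coordinates coincide (nu = lim S/N), as noted above.
```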

Fig. 2. Comparative inverse asymptotic efficiency versus ν of standard ν-SVR, quadric ν-SVR and LS-SVR for polynomial noise of degrees 1 to 6 (one panel per degree). In all cases, the solid line represents the efficiency of the quadric ν-SVR, the dotted line the efficiency of the standard ν-SVR, and the dashed line the efficiency of the LS-SVR. Note that for all graphs the ν-scale differs for standard and quadric ν-SVRs: for standard ν-SVRs, the actual setting of ν is one tenth of that indicated on the x axis; for quadric ν-SVRs, ν is as indicated.

The case q = 2 is slightly more complex, and best illustrated graphically. We do this in figure 3, which shows our predictions for the fraction of support vectors found as a function of ν for both standard and quadric ν-SVR methods for polynomial noise of degrees 1 to 6. Note that the general shape of the curves is essentially identical in all cases. Generally, the fraction of support vectors found by the quadric ν-SVR will increase quickly while ν is small and then level out, approaching 1 as ν → ∞ (as expected, given that the LS case corresponds to ν → ∞ and treats all vectors as support vectors).

Table I gives the expected asymptotic ratio of support vectors to training vectors when ν is optimally selected for the usual degrees of polynomial noise. On average, the results given in the table imply that quadric ν-SVRs may require approximately twice as many support vectors as standard ν-SVRs to achieve optimal accuracy on the same dataset. This may be understood by realising that the act of extracting support vectors is essentially a form of lossy compression. The modified ν-SVR is theoretically able to achieve more accurate results than the standard ν-SVR because it can handle more information by using less compression or, equivalently, finding more support vectors before over-fitting and subsequent degradation in performance begins.

V. EXPERIMENTAL RESULTS

Due to the restrictive assumptions made when deriving the results for theoretical efficiency given in the preceding sections (which will, in the strictest sense, never be satisfied for any SVR), it is important that we seek some form of experimental confirmation of these results. This is our aim in the present section. For ease of comparison, our experimental procedure is modelled on that of [24]. We have numerically computed the risk (in the form of root mean squared error, RMSE) as a function of ν for both standard and quadric ν-SVR methods, allowing clear comparison between the two methods. We also compute the RMSE for Suykens' LS-SVR, which is the limiting case of the quadric ν-SVR as ν → ∞, and for (non-sparse) weighted LS-SVR [11]. Plots of risk versus ν are given for polynomial noise of degrees 1 to 6 to compare the theoretical and experimental results, and some relevant results are given for the effect of other parameters on the ν curves. Finally, we compare the sparsity of standard and quadric ν-SVR for different orders of polynomial noise (1 to 6), and compare these results with the theoretical predictions.

As in [24], the training set consisted of examples (x_i, z_i) where x_i is drawn uniformly from a fixed interval and z_i is given by the noisy sinc function:

z_i = \mathrm{sinc}(x_i) + \zeta_i = \frac{\sin(\pi x_i)}{\pi x_i} + \zeta_i

where \zeta_i represents additive polynomial noise.
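A data-generation sketch in the same spirit (ours; the interval [-3, 3] and the default seed are illustrative assumptions, the paper's exact values having been lost, and sample_poly_noise comes from the section IV sketch):

```python
import numpy as np

def make_sinc_data(n_train, n_test, p, sigma, x_range=(-3.0, 3.0), rng=None):
    """Noisy sinc training set and noiseless, equally spaced test set.

    np.sinc is the normalised sinc, sin(pi x)/(pi x), matching the target.
    Requires sample_poly_noise from the earlier polynomial-noise sketch.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x_train = rng.uniform(x_range[0], x_range[1], n_train)
    z_train = np.sinc(x_train) + sample_poly_noise(p, sigma, n_train, rng)
    x_test = np.linspace(x_range[0], x_range[1], n_test)
    return x_train, z_train, x_test, np.sinc(x_test)
```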

Fig. 3. Comparative asymptotic fraction of support vectors versus ν of standard and quadric ν-SVR for polynomial noise of degrees 1 to 6 (one panel per degree). In all cases, the solid line represents the modified (quadric) ν-SVR and the dotted line the standard ν-SVR. Note that for all graphs the ν-scale differs for standard and quadric ν-SVRs: for standard ν-SVRs, the actual setting of ν is one tenth of that indicated on the x axis; for quadric ν-SVRs, ν is as indicated.

The test set consisted of pairs (x_i, z_i) where the x_i were equally spaced over the same interval and z_i was given by the noiseless sinc function. A number of trials were carried out for each result. Quadric ν-SVR, standard LS-SVR and weighted LS-SVR code for this experiment was written in C++, compiled using DJGPP and run on a Pentium III machine running Windows. Experiments using standard ν-SVR methods were done using LibSVM [16] (code available at http://). By default, we fix the parameter C and the noise variance σ². In all experiments we use the Gaussian RBF kernel function K(x, y) = e^{-\| x - y \|^2 / \sigma_{kernel}^2}, with σ_kernel fixed by default.

A. Additive Polynomial Noise

In our first experiment, we have used training data affected by polynomial noise of degrees 1 to 6. Plots of RMSE for standard ν-SVR, quadric ν-SVR, LS-SVR and weighted LS-SVR are shown in figure 4. With the exception of Laplacian noise (p = 1), these results resemble the theoretical predictions shown in figure 2. Note that the quadric ν-SVR outperforms both standard ν-SVR and (weighted and unweighted) LS-SVR in all cases except Laplacian noise (p = 1).

Whilst the general shape of the RMSE curves closely resembles the shape of the predicted efficiency curves, it is important to note that the sharp optimum predicted by the theory (see figure 2) is not present in the experimental results. Instead, the region of optimality is somewhat to the right of this and also somewhat blunter. From an application point of view, this is actually an advantage, as it means that results are far less sensitive to ν than expected. Indeed, from these results we can empirically say that there is a broad sweet spot for ν for each degree of polynomial noise and, roughly speaking, selecting a large ν will in all cases presented give results superior to the standard ν-SVR method and at least comparable to the LS-SVR methods (better for higher order noise).

In the case of Laplacian noise, the actual performance (in terms of RMSE) of both the quadric ν-SVR and the LS-SVRs is better than that of the standard ν-SVR, whereas theory would suggest that this should not be the case. We are unsure as to why this anomaly occurs.

Figure 5 shows the ratio of support vectors to training vectors for both standard and quadric ν-SVRs as a function of ν. These curves closely match the predictions given in figure 3. It is clear from figures 3 and 5 that, as expected, the number of support vectors found by the quadric ν-SVR is substantially larger than the number found by the standard ν-SVR. However, this is still substantially less than the number of support vectors found by the LS-SVR (which uses all training vectors as support vectors).

B. Parameter Variation with Additive Gaussian Noise

In our second experiment, we consider the performance of the standard and quadric ν-SVR in the presence of additive Gaussian noise as other parameters of the problem (particularly the noise variance σ², C and σ_kernel) are varied. In particular, we wish to see if the RMSE versus ν curve retains the same general form as these parameters are varied.

Fig. 4. Risk (RMSE) versus ν for sinc data with polynomial noise of degree p ∈ {1, 2, 3, 4, 5, 6}, working left to right, top to bottom (c.f. [24]). In all cases, σ, C and σ_kernel take their default values. Legend: quadric ν-SVR, standard ν-SVR, LS-SVR, weighted LS-SVR. Note that for all graphs the ν-scale differs for standard and modified ν-SVRs: for standard ν-SVRs, the actual setting of ν is one tenth of that indicated on the x axis; for modified ν-SVRs, ν is as indicated.

The top row of figure 6 shows the form of the RMSE versus ν curve for a range of noise levels. In all cases the form of the curves remains essentially unchanged. It will be noted, however, that for the lowest noise case the standard ν-SVR is able to out-perform the quadric ν-SVR. We are unsure as to why this occurs. Once again, selecting a large ν gives reasonable performance in all cases (c.f. Smola's result for standard ν-SVR [9], where the optimal area is an interval ending at ν ≈ 0.8).

The middle row of figure 6 shows the same curve with the noise variance σ² fixed, for three different values of C. Once again, the RMSE curve for the quadric ν-SVR is roughly as predicted, and a large ν gives reasonable results. In this case, the smallest value of C provides an anomalous result. Finally, in the bottom row of figure 6, we give the RMSE versus ν curves when the kernel parameter σ_kernel is varied. These results follow the same pattern as for variation of σ² and C.

It is interesting to note here that, while the RMSE versus ν curve obtained using the quadric ν-SVR fits the predicted curve more closely than does the same curve for the standard ν-SVR when parameters are chosen badly, this does not imply that the performance of the quadric ν-SVR will necessarily be better than that of the standard ν-SVR in this case. Indeed, the two cases where other SV parameters (i.e. not ν) have been chosen badly (i.e. the smallest C and the smallest σ_kernel in figure 6) are the two cases where the standard ν-SVR most outperforms the quadric ν-SVR. However, as one should continue to search until appropriate parameters are found, this should not be too much of a problem in practical situations.

VI. CONCLUSION

In this paper we have reviewed the standard SVR techniques of ǫ-SVR and ν-SVR. Motivated by the relative merits and demerits of these approaches, we then introduced a new, more general form of SVR (monomial ν-SVR). We proceeded to investigate a special case of monomial ν-SVR (quadric ν-SVR) in some detail, and showed that the dual form of the quadric ν-SVR problem is significantly simpler than the standard ν-SVR dual. We have compared the theoretical asymptotic efficiencies of our proposed formulation against other SVR methods under certain restrictive assumptions. Our investigation indicates that the quadric ν-SVR method not only shares the standard ν-SVR method's property of theoretical noise variance insensitivity, but is also more efficient in many cases, in particular when the training data is affected by higher order polynomial noise. These predictions have been experimentally tested, and comparisons have been made between the performances of quadric ν-SVR, standard ν-SVR, standard LS-SVR and weighted LS-SVR methods in the presence of higher order polynomial noise. Based on these results, we conclude that the theoretical predictions give a useful insight into the characteristics of the various methods. Both theoretical and experimental results indicate that the performance of the quadric ν-SVR method in many cases exceeds that of both standard ν-SVR and LS-SVR.

Fig. 5. Number of support vectors (out of N training vectors) versus ν for sinc data with polynomial noise of degree p ∈ {1, 2, 3, 4, 5, 6}, working left to right, top to bottom. In all cases, σ, C and σ_kernel take their default values. The dotted line gives results for the standard ν-SVR, while the solid line gives results for the quadric ν-SVR. Note that for all graphs the ν-scale differs for standard and modified ν-SVRs: for standard ν-SVRs, the actual setting of ν is one tenth of that indicated on the x axis; for modified ν-SVRs, ν is as indicated.

Fig. 6. Risk (RMSE) versus ν for sinc data with Gaussian noise (c.f. [24]). The top row shows performance for three different noise variances, from left to right, with C and σ_kernel at their default values. The middle row gives performance for three settings of C, from left to right, with σ and σ_kernel at their default values. Finally, the bottom row shows performance for three settings of σ_kernel, from left to right, with σ and C at their default values. Legend: quadric ν-SVR, standard ν-SVR, LS-SVR, weighted LS-SVR. Note that for all graphs the ν-scale differs for standard and quadric ν-SVRs: for standard ν-SVRs, the actual setting of ν is one tenth of that indicated on the x axis; for quadric ν-SVRs, ν is as indicated.

APPENDIX I
PROOFS OF THEOREMS 9 AND 10

In this appendix we give proofs for theorems 9 and 10, and a partial proof of conjecture 1. Before proceeding, however, some preliminary results are required. In what follows, B(x, y) is the beta function [25]:

B(x, y) = \frac{\Gamma(x) \Gamma(y)}{\Gamma(x + y)}

It is straightforward to show that:

\frac{d}{d\Lambda} \daleth_m(p, \beta; \Lambda) = \begin{cases} -m \, \daleth_{m-1}(p, \beta; \Lambda) & \text{if } m > 0 \\ -e^{-\beta \Lambda^p} & \text{if } m = 0 \end{cases}

\daleth_m(p, \beta; 0) = \frac{1}{p} \beta^{-\frac{m+1}{p}} \Gamma\!\left( \frac{m+1}{p} \right)

Theorem 11: \lim_{x \to 0^+} x \Gamma(a x) = \frac{1}{a} for all a > 0.

Proof: Using the Euler limit form of Γ(x) [26], it can be seen that:

x \Gamma(a x) = \frac{x}{a x} \prod_{n=1}^{\infty} \frac{ \left( 1 + \frac{1}{n} \right)^{a x} }{ 1 + \frac{a x}{n} }

and therefore \lim_{x \to 0^+} x \Gamma(a x) = \frac{1}{a}.

Theorem 12: B(a + c, b) > B(a, b + c) for all a > b > 0, c > 0; and B(a + c, b) < B(a, b + c) for all 0 < a < b, c > 0.

Proof: From the definition of the beta function:

B(a + c, b) = \frac{\Gamma(a + c) \Gamma(b)}{\Gamma(a + b + c)}, \qquad B(a, b + c) = \frac{\Gamma(a) \Gamma(b + c)}{\Gamma(a + b + c)}

so B(a + c, b) > B(a, b + c) if and only if \ln \Gamma(a + c) - \ln \Gamma(a) > \ln \Gamma(b + c) - \ln \Gamma(b). But the gradient ψ(x) of \ln \Gamma(x) is a monotonically increasing function of x > 0, and so for all a > b > 0, c > 0:

\ln \Gamma(a + c) - \ln \Gamma(a) = \int_a^{a+c} \psi(t) \, dt > \int_b^{b+c} \psi(t) \, dt = \ln \Gamma(b + c) - \ln \Gamma(b)

Hence B(a + c, b) > B(a, b + c) for all a > b > 0, c > 0, which proves the first part of the theorem. The proof of the second part is essentially the same, except that, as a < b, the inequality is reversed.

We may now proceed to the proofs of theorems 9 and 10.

Proof of theorem 9: If p = q then, using the expression for e_{p,q} from section IV-A, the value of \daleth_m(p, c_2; 0) above, and the identity x \Gamma(x) = \Gamma(x + 1), direct evaluation at Λ = 0 gives (for p ≥ 2):

e_{p,p}(0) = \frac{ \left[ \frac{p-1}{p} \Gamma\!\left( \frac{p-1}{p} \right) \right]^2 }{ \Gamma\!\left( \frac{2p-1}{p} \right)^2 } = \frac{ \Gamma\!\left( \frac{2p-1}{p} \right)^2 }{ \Gamma\!\left( \frac{2p-1}{p} \right)^2 } = 1

and likewise e_{1,1}(0) = 1 in the Laplacian case. But e_{p,q} \le 1 by the Cramer-Rao bound [21], so optimal efficiency is achieved for \Lambda = \epsilon / \sigma = 0. Hence \epsilon_{opt} = 0.

Proof of theorem 10: Note that e_{p,q} is a continuously differentiable function of Λ and that the range of Λ is [0, ∞). If ǫ_opt = 0 (and hence Λ_opt = 0), then e_{p,q} must have a global maximum at Λ = 0, which implies that the (one-sided) gradient e'_{p,q} of e_{p,q} must be non-positive at Λ = 0. (As Λ = 0 is at the end of the range [0, ∞), it follows that if the gradient is negative at Λ = 0 then there will be a local maximum at that point.) Given this, to prove the theorem it is sufficient to prove that the gradient e'_{p,q}(0) is positive for all p > q: intuitively, if the gradient is positive at zero then increasing Λ must result in an increase in e_{p,q}, and so Λ = 0 cannot be a maximum of e_{p,q}, global or otherwise.

Using the derivative identities above, it is straightforward to show that e'_{p,q}(\Lambda) = d_{p,q}(\Lambda) \, \bar{e}_{p,q}(\Lambda), where:

\bar{e}_{p,q}(\Lambda) = \begin{cases} e^{-c_2 \Lambda^p} - 2 p c_2 \Lambda^{p-1} \, \daleth_0(p, c_2; \Lambda) & \text{if } q = 1 \\ \daleth_0(p, c_2; \Lambda) \, \daleth_1(p, c_2; \Lambda) - e^{-c_2 \Lambda^p} \, \daleth_2(p, c_2; \Lambda) & \text{if } q = 2 \\ (q-1) \, \daleth_{q-2}(p, c_2; \Lambda) \, \daleth_{2q-3}(p, c_2; \Lambda) - (q-2) \, \daleth_{q-3}(p, c_2; \Lambda) \, \daleth_{2q-2}(p, c_2; \Lambda) & \text{if } q \ge 3 \end{cases}

and:

d_{p,q}(\Lambda) = \begin{cases} \dfrac{ 2 c_1 \, e^{-2 c_2 \Lambda^p} }{ I_{std} \, \daleth_0(p, c_2; \Lambda)^2 } & \text{if } q = 1 \\[2ex] \dfrac{ 4 c_1 \, \daleth_0(p, c_2; \Lambda) }{ I_{std} \, \daleth_2(p, c_2; \Lambda)^2 } & \text{if } q = 2 \\[2ex] \dfrac{ 4 c_1 (q-1)^2 \, \daleth_{q-2}(p, c_2; \Lambda) }{ I_{std} \, \daleth_{2q-2}(p, c_2; \Lambda)^2 } & \text{if } q \ge 3 \end{cases}

is a positive smooth function of Λ ≥ 0 for all p, q \in \mathbb{Z}^+. Hence if \bar{e}_{p,q}(0) > 0 for all p > q then e'_{p,q}(0) > 0 for all p > q, which is sufficient to prove the theorem.

For q = 1 < p, the second term of \bar{e}_{p,1}(\Lambda) vanishes at Λ = 0 (as \Lambda^{p-1} \to 0 for p > 1), so \bar{e}_{p,1}(0) = 1 > 0, which proves the theorem in the case q = 1.

Consider the case p > q ≥ 2. Writing q = 1 + m, p = 1 + m + n, where m, n \in \mathbb{Z}^+, and using the results \daleth_m(p, c_2; 0) = \frac{1}{p} c_2^{-\frac{m+1}{p}} \Gamma( \frac{m+1}{p} ) and x \Gamma(x) = \Gamma(x + 1), it can be shown that, up to a positive factor:

\bar{e}_{p,q}(0) \propto \Gamma\!\left( \frac{p+q-1}{p} \right) \Gamma\!\left( \frac{2q-2}{p} \right) - \Gamma\!\left( \frac{p+q-2}{p} \right) \Gamma\!\left( \frac{2q-1}{p} \right) = \Gamma(a + b + c) \left[ B(a + c, b) - B(a, b + c) \right]

where a = \frac{p+q-2}{p} = \frac{2m+n}{m+n+1} > 0, b = \frac{2q-2}{p} = \frac{2m}{m+n+1} > 0 and c = \frac{1}{p} = \frac{1}{m+n+1} > 0. As n ≥ 1, we have a > b, so it follows from theorem 12 that B(a + c, b) > B(a, b + c), and hence \bar{e}_{p,q}(0) > 0, which proves the theorem for the case q ≥ 2.

Suppose now that q (and hence m) is treated as a real number such that m → 0^+ (so q → 1^+). The inequality B(a + c, b) > B(a, b + c) will still hold, as a > b for all m > 0, n ≥ 1. Furthermore, using theorem 11 to control the limiting behaviour of the \Gamma( \frac{2q-2}{p} ) factor as q → 1^+, one finds that \lim_{q \to 1^+} \bar{e}_{p,q}(0) > 0, consistent with the value \bar{e}_{p,1}(0) = 1 obtained above. Hence \bar{e}_{p,q}(0) > 0 for all p > q, and so e'_{p,q}(0) > 0 for all p > q, which proves the theorem.

Using an analogous method, we can also give a partial proof of conjecture 1:

Partial proof of conjecture 1: To prove that ǫ_opt = 0 it is necessary (although not sufficient) to prove that e_{p,q} has a local maximum at Λ = 0. By analogy with the proof of theorem 10, it suffices to show that the gradient e'_{p,q}(0) is non-positive for all p < q. From the proof of theorem 10, e'_{p,q} = d_{p,q} \bar{e}_{p,q} with d_{p,q} > 0 for all p, q \in \mathbb{Z}^+, so if \bar{e}_{p,q}(0) \le 0 then e_{p,q} has a local maximum at Λ = 0.

As 1 ≤ p < q, we have q ≥ 2. If q = 2 then p = 1, and direct evaluation gives \bar{e}_{1,2}(0) < 0, so e_{1,2} has a local maximum at Λ = 0. If q ≥ 3, writing q = 1 + m, p = 1 + m - n, where m \in \mathbb{Z}^+ and n \in \{1, 2, \ldots, m\}, it can be seen as before that, up to a positive factor:

\bar{e}_{p,q}(0) \propto \Gamma(a + b + c) \left[ B(a + c, b) - B(a, b + c) \right]

where a = \frac{2m-n}{m-n+1} > 0, b = \frac{2m}{m-n+1} > 0 and c = \frac{1}{m-n+1} > 0. As a < b, it follows from theorem 12 that B(a + c, b) < B(a, b + c), and hence \bar{e}_{p,q}(0) < 0. Therefore, in general, e_{p,q} has a local maximum at Λ = 0 for all p < q.

Now, if we could prove that this is a unique (and hence global) maximum, it would follow that \Lambda_{opt} = 0 and \epsilon_{opt} = 0, which would prove the conjecture. Alternatively, proving that e'_{p,q}(\Lambda) \le 0 for all Λ ≥ 0 would demonstrate that the maximum at Λ = 0 is global (although not necessarily unique) and thereby prove the conjecture. Unfortunately we have been unable to do either rigorously, and so the proof remains incomplete.

REFERENCES

[1] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola and V. Vapnik, "Support vector regression machines," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan and T. Petsche, Eds., vol. 9. The MIT Press, 1997.
[2] V. Vapnik, S. Golowich and A. Smola, Advances in Neural Information Processing Systems. MIT Press, 1997, vol. 9, ch. Support Vector Methods for Function Approximation, Regression Estimation, and Signal Processing.
[3] A. Smola and B. Schölkopf, "A tutorial on support vector regression," NeuroCOLT Technical Report Series, Tech. Rep., Oct 1998.
[4] C. Cortes and V. Vapnik, "Support vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.
[5] I. Guyon, B. Boser and V. Vapnik, "Automatic capacity tuning of very large VC-dimension classifiers," in Advances in Neural Information Processing Systems, S. Hanson, J. Cowan and C. Giles, Eds., vol. 5. Morgan Kaufmann, 1993.
[6] C. J. C. Burges, "A tutorial on support vector machines for pattern recognition," Knowledge Discovery and Data Mining, vol. 2, pp. 121-167, 1998.
[7] B. Schölkopf, P. Bartlett, A. Smola and R. Williamson, "Shrinking the tube: A new support vector regression algorithm," Advances in Neural Information Processing Systems, vol. 11, 1999.
[8] B. Schölkopf and A. J. Smola, "New support vector algorithms," NeuroCOLT Technical Report Series, Tech. Rep., Nov 1998.
[9] A. Smola, N. Murata, B. Schölkopf and K.-R. Müller, "Asymptotically optimal choice of ǫ-loss for support vector machines," in Proceedings of the 8th International Conference on Artificial Neural Networks, Perspectives in Neural Computing, L. Niklasson, M. Boden and T. Ziemke, Eds. Springer Verlag, 1998.
[10] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor and J. Vandewalle, Least Squares Support Vector Machines. New Jersey: World Scientific Publishing, 2002.
[11] J. A. K. Suykens et al., "Weighted least squares support vector machines: robustness and sparse approximation," Neurocomputing, Special issue on fundamental and information processing aspects of neurocomputing, vol. 48, no. 1-4, pp. 85-105, Oct 2002.
[12] A. Shilton and M. Palaniswami, "A modified ν-SV method for simplified regression," in Proceedings of the International Conference on Intelligent Sensing and Information Processing, 2004.
[13] J. Mercer, "Functions of positive and negative type, and their connection with the theory of integral equations," Transactions of the Royal Society of London, vol. 209, no. A, 1909.
[14] B. Schölkopf and A. J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, Massachusetts: MIT Press, 2002.
[15] A. Smola, B. Schölkopf and K. Müller, "General cost functions for support vector regression," in Proceedings of the Ninth Australian Conf. on Neural Networks, T. Downs, M. Frean and M. Gallagher, Eds., 1998.
[16] C. Chang and C. Lin, LIBSVM: a library for support vector machines, 2001.
[17] M. Palaniswami and A. Shilton, "Adaptive support vector machines for regression," in Proceedings of the 9th International Conference on Neural Information Processing, 2002.
[18] R. Fletcher, Practical Methods of Optimisation. Chichester: John Wiley and Sons, 1981.
[19] K. Levenberg, "A method for the solution of certain problems in least squares," Quart. Appl. Math., vol. 2, pp. 164-168, 1944.
[20] D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," SIAM J. Appl. Math., vol. 11, pp. 431-441, 1963.
[21] C. R. Rao, Linear Statistical Inference and its Applications. New York: Wiley, 1973.
[22] N. Murata, S. Yoshizawa and S.-I. Amari, "Network Information Criterion: determining the number of hidden units for an artificial neural network model," IEEE Transactions on Neural Networks, vol. 5, no. 6, pp. 865-872, November 1994. [Online]. Available: citeseer.nj.nec.com/murata9network.html
[23] S. G. Krantz, Handbook of Complex Variables. Birkhäuser, 1999.
[24] A. Chalimourda, B. Schölkopf and A. Smola, "Experimentally optimal ν in support vector regression for different noise models and parameter settings," Neural Networks, vol. 17, no. 1, pp. 127-141, 2004.
[25] A. Erdélyi, W. Magnus, F. Oberhettinger and F. G. Tricomi, Higher Transcendental Functions. New York: Krieger, 1981, vol. 1, ch. The Beta Function.
[26] H. Bateman, Higher Transcendental Functions. McGraw-Hill Book Company Inc., 1953, vol. 1.

Alistair Shilton received his combined B.Sc. / B.Eng. degree from the University of Melbourne, Melbourne, Australia, specialising in physics, applied mathematics and electronic engineering. He is currently pursuing the Ph.D. degree in electronic engineering at the University of Melbourne. His research interests include machine learning, specializing in support vector machines; signal processing; communications; and differential topology.

Daniel Lai received his B.Eng degree in Electrical and Computer Systems from Monash University, Melbourne, Australia. He is currently pursuing his Ph.D at Monash University. His research interests include optimization techniques for machine learning formulations, decomposition techniques, evolutionary optimization and dynamic programming.

M. Palaniswami received his B.E. (Hons) from the University of Madras, M.E. from the Indian Institute of Science, India, M.Eng.Sc. from the University of Melbourne and Ph.D. from the University of Newcastle, Australia, before rejoining the University of Melbourne. He has been serving the University of Melbourne for over 6 years. He has published more than 8 refereed papers, a large proportion of which appeared in prestigious IEEE journals and conferences. He was given a Foreign Specialist Award by the Ministry of Education, Japan, in recognition of his contributions to the field of machine learning. He has served as an associate editor for journals/transactions including IEEE Transactions on Neural Networks and Computational Intelligence for Finance. His research interests include SVMs, Sensors and Sensor Networks, Machine Learning, Neural Networks, Pattern Recognition, Signal Processing and Control. He is the Co-Director of the Centre of Expertise on Networked Decision & Sensor Systems. He is an associate editor for the International Journal of Computational Intelligence and Applications and serves on the editorial board of the ANZ Journal on Intelligent Information Processing Systems. He is also the Subject Editor for the International Journal on Distributed Sensor Networks.
