A unified framework for Regularization Networks and Support Vector Machines. Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio


MASSACHUSETTS INSTITUTE OF TECHNOLOGY
ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING, DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES

A.I. Memo No. 1654, March 23, 1999
C.B.C.L. Paper No. 171

A unified framework for Regularization Networks and Support Vector Machines

Theodoros Evgeniou, Massimiliano Pontil, Tomaso Poggio

This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. The pathname for this publication is: ai-publications/ /aim-1654.ps

Abstract

Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples, in particular the regression problem of approximating a multivariate function from sparse data. We present both formulations in a unified framework, namely in the context of Vapnik's theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics.

Copyright © Massachusetts Institute of Technology, 1998

This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS , the Office of Naval Research under contract No. N and contract No. N . Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T.

Contents

1 Introduction
2 Overview of statistical learning theory
  2.1 Uniform Convergence and the Vapnik-Chervonenkis bound
  2.2 The method of Structural Risk Minimization
  2.3 ɛ-uniform convergence and the V_γ dimension
  2.4 Overview of our approach
3 Reproducing Kernel Hilbert Spaces: a brief overview
4 Regularization Networks
  4.1 Radial Basis Functions
  4.2 Regularization, generalized splines and kernel smoothers
  4.3 Dual representation of Regularization Networks
  4.4 From regression to classification
5 Support vector machines
  5.1 SVM in RKHS
  5.2 From regression to classification
6 SRM for RNs and SVMs
  6.1 SRM for SVM Classification
  6.2 Distribution dependent bounds for SVMC
7 A Bayesian Interpretation of Regularization and SRM?
  7.1 Maximum A Posteriori Interpretation of Regularization
  7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals
  7.3 Bayesian interpretation of the data term in the Regularization and SVM functional
  7.4 Why a MAP interpretation may be misleading
8 Connections between SVMs and Sparse Approximation techniques
  8.1 The problem of sparsity
  8.2 Equivalence between BPDN and SVMs
  8.3 Independent Component Analysis
9 Remarks
  9.1 Regularization Networks can implement SRM
  9.2 The SVM functional is a special formulation of regularization
  9.3 SVM, sparsity and compression
  9.4 Gaussian processes, regularization and SVM
  9.5 Kernels and how to choose an input representation
  9.6 Capacity control and the physical world
A Regularization Theory for Learning
B An example of RKHS
C Regularized Solutions in RKHS
D Relation between SVMC and SVMR
E Proof of the theorem
F The noise model of the data term in SVMR

1 Introduction

The purpose of this paper is to present a theoretical framework for the problem of learning from examples. Learning from examples can be regarded as the regression problem of approximating a multivariate function from sparse data and we will take this point of view here¹. The problem of approximating a function from sparse data is ill-posed and a classical way to solve it is regularization theory [92, 10, 11]. Classical regularization theory, as we will consider here², formulates the regression problem as a variational problem of finding the function f that minimizes the functional

$$\min_{f \in H} H[f] = \frac{1}{\ell}\sum_{i=1}^{\ell}(y_i - f(x_i))^2 + \lambda\|f\|_K^2 \qquad (1)$$

where ‖f‖²_K is a norm in a Reproducing Kernel Hilbert Space H defined by the positive definite function K, ℓ is the number of data points or examples (the ℓ pairs (x_i, y_i)) and λ is the regularization parameter (see the seminal work of [102]). Under rather general conditions the solution of equation (1) is

$$f(x) = \sum_{i=1}^{\ell} c_i K(x, x_i). \qquad (2)$$

Until now the functionals of classical regularization have lacked a rigorous justification for a finite set of training data. Their formulation is based on functional analysis arguments which rely on asymptotic results and do not consider finite data sets³. Regularization is the approach we have taken in earlier work on learning [69, 39, 77]. The seminal work of Vapnik [94, 95, 96] has now set the foundations for a more general theory that justifies regularization functionals for learning from finite sets and can be used to extend considerably the classical framework of regularization, effectively marrying a functional analysis perspective with modern advances in the theory of probability and statistics. The basic idea of Vapnik's theory is closely related to regularization: for a finite set of training examples the search for the best model or approximating function has to be constrained to an appropriately "small" hypothesis space (which can also be thought of as a space of machines or models or network architectures). If the space is too large, models can be found which will fit exactly the data but will have a poor generalization performance, that is poor predictive capability on new data. Vapnik's theory characterizes and formalizes these concepts in terms of the capacity of a set of functions and capacity control depending on the training data: for instance, for a small training set the capacity of the function space in which f is sought has to be small whereas it can increase with a larger training set. As we will see later in the case of regularization, a form of capacity control leads to choosing an optimal λ in equation (1) for a given set of data. A key part of the theory is to define and bound the capacity of a set of functions. Thus the key and somewhat novel theme of this review is a) to describe a unified framework for several learning techniques for finite training sets and b) to justify them in terms of statistical learning theory. We will consider functionals of the form

¹ There is a large literature on the subject: useful reviews are [44, 19, 102, 39], [96] and references therein.
² The general regularization scheme for learning is sketched in Appendix A.
³ The method of quasi-solutions of Ivanov and the equivalent Tikhonov's regularization technique were developed to solve ill-posed problems of the type Af = F, where A is a (linear) operator, f is the desired solution in a metric space E₁, and F are the data in a metric space E₂.

$$H[f] = \frac{1}{\ell}\sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda\|f\|_K^2, \qquad (3)$$

where V(·,·) is a loss function. We will describe how classical regularization and Support Vector Machines [96] for both regression (SVMR) and classification (SVMC) correspond to the minimization of H in equation (3) for different choices of V:

Classical (L₂) Regularization Networks (RN):
$$V(y_i, f(x_i)) = (y_i - f(x_i))^2 \qquad (4)$$

Support Vector Machines Regression (SVMR):
$$V(y_i, f(x_i)) = |y_i - f(x_i)|_\epsilon \qquad (5)$$

Support Vector Machines Classification (SVMC):
$$V(y_i, f(x_i)) = |1 - y_i f(x_i)|_+ \qquad (6)$$

where |·|_ɛ is Vapnik's epsilon-insensitive norm (see later), |x|₊ = x if x is positive and zero otherwise, and y_i is a real number in RN and SVMR, whereas it takes values −1, 1 in SVMC. Loss function (6) is also called the soft margin loss function. For SVMC, we will also discuss two other loss functions:

The hard margin loss function:
$$V(y_i, f(x)) = \theta(1 - y_i f(x_i)) \qquad (7)$$

The misclassification loss function:
$$V(y_i, f(x)) = \theta(-y_i f(x_i)) \qquad (8)$$

where θ(·) is the Heaviside function. For classification one should minimize (8) (or (7)), but in practice other loss functions, such as the soft margin one (6) [22, 95], are used. We discuss this issue further in section 6. The minimizer of (3) using the three loss functions has the same general form (2) (or f(x) = Σᵢ c_i K(x, x_i) + b, see later) but interestingly different properties⁴. In this review we will show how different learning techniques based on the minimization of functionals of the form of H in (3) can be justified for a few choices of V(·,·) using a slight extension of the tools and results of Vapnik's statistical learning theory. In section 2 we outline the main results in the theory of statistical learning and in particular Structural Risk Minimization, the technique suggested by Vapnik to solve the problem of capacity control in learning from "small" training sets. At the end of the section we will outline a technical extension of Vapnik's Structural Risk Minimization framework (SRM). With this extension both RN and Support Vector Machines (SVMs) can be seen within a SRM scheme.

⁴ For general differentiable loss functions V the form of the solution is still the same, as shown in Appendix C.
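To make these choices of V concrete, here is a minimal NumPy sketch of the five loss functions (4)-(8); the function names are ours, not from the paper, and the convention θ(z) = 1 only for z > 0 is an arbitrary choice at the boundary.

```python
import numpy as np

def l2_loss(y, fx):
    """Classical L2 loss (4): (y - f(x))^2."""
    return (y - fx) ** 2

def eps_insensitive_loss(y, fx, eps=0.1):
    """Vapnik's epsilon-insensitive loss (5): |y - f(x)|_eps."""
    return np.maximum(np.abs(y - fx) - eps, 0.0)

def soft_margin_loss(y, fx):
    """Soft margin (hinge) loss (6): |1 - y f(x)|_+, with y in {-1, 1}."""
    return np.maximum(1.0 - y * fx, 0.0)

def hard_margin_loss(y, fx):
    """Hard margin loss (7): theta(1 - y f(x))."""
    return (1.0 - y * fx > 0).astype(float)

def misclassification_loss(y, fx):
    """Misclassification loss (8): theta(-y f(x))."""
    return (-y * fx > 0).astype(float)
```

The empirical risk used throughout the paper is then simply the average of one of these losses over the ℓ training points.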

In recent years a number of papers claim that SVM cannot be justified in a data-independent SRM framework (i.e. [86]). One of the goals of this paper is to provide such a data-independent SRM framework that justifies SVM as well as RN. Before describing regularization techniques, section 3 reviews some basic facts on RKHS which are the main function spaces on which this review is focused. After the section on regularization (section 4) we will describe SVMs (section 5). As we saw already, SVMs for regression can be considered as a modification of regularization formulations of the type of equation (1). Radial Basis Functions (RBF) can be shown to be solutions in both cases (for radial K) but with a rather different structure of the coefficients c_i. Section 6 describes in more detail how and why both RN and SVM can be justified in terms of SRM, in the sense of Vapnik's theory: the key to capacity control is how to choose λ for a given set of data. Section 7 describes a naive Bayesian Maximum A Posteriori (MAP) interpretation of RNs and of SVMs. It also shows why a formal MAP interpretation, though interesting and even useful, may be somewhat misleading. Section 8 discusses relations of the regularization and SVM techniques with other representations of functions and signals such as sparse representations from overcomplete dictionaries, Blind Source Separation, and Independent Component Analysis. Finally, section 9 summarizes the main themes of the review and discusses some of the open problems.

2 Overview of statistical learning theory

We consider the case of learning from examples as defined in the statistical learning theory framework [94, 95, 96]. We have two sets of variables x ∈ X ⊆ R^d and y ∈ Y ⊆ R that are related by a probabilistic relationship. We say that the relationship is probabilistic because generally an element of X does not determine uniquely an element of Y, but rather a probability distribution on Y. This can be formalized assuming that a probability distribution P(x, y) is defined over the set X × Y. The probability distribution P(x, y) is unknown, and under very general conditions can be written as P(x, y) = P(x)P(y|x) where P(y|x) is the conditional probability of y given x, and P(x) is the marginal probability of x. We are provided with examples of this probabilistic relationship, that is with a data set D_ℓ ≡ {(x_i, y_i) ∈ X × Y}, i = 1, ..., ℓ, called the training data, obtained by sampling ℓ times the set X × Y according to P(x, y). The problem of learning consists in, given the data set D_ℓ, providing an estimator, that is a function f: X → Y, that can be used, given any value of x ∈ X, to predict a value y. In statistical learning theory, the standard way to solve the learning problem consists in defining a risk functional, which measures the average amount of error associated with an estimator, and then to look for the estimator, among the allowed ones, with the lowest risk. If V(y, f(x)) is the loss function measuring the error we make when we predict y by f(x)⁵, then the average error is the so called expected risk:

$$I[f] \equiv \int_{X,Y} V(y, f(x))\, P(x, y)\, dx\, dy \qquad (9)$$

We assume that the expected risk is defined on a "large" class of functions F and we will denote by f₀ the function which minimizes the expected risk in F:

$$f_0(x) = \arg\min_{f \in F} I[f] \qquad (10)$$

⁵ Typically for regression the loss function is of the form V(y − f(x)).
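Since P(x, y) enters only through an integral, the expected risk (9) can be approximated numerically. The sketch below is our own illustration, with an arbitrary choice of P(x, y) and of the estimator f: it contrasts the sample average of the loss on ℓ = 20 training points with a large-sample Monte Carlo estimate of I[f].

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Draw n examples from an assumed P(x, y): y = sin(2*pi*x) + noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)
    return x, y

f = lambda x: np.sin(2 * np.pi * x)          # a fixed estimator f : X -> Y
V = lambda y, fx: (y - fx) ** 2              # L2 loss, as in footnote 5

x_train, y_train = sample(20)                # small training set (l = 20)
emp_risk = np.mean(V(y_train, f(x_train)))   # sample average of the loss

x_big, y_big = sample(1_000_000)             # large sample approximates eq. (9)
exp_risk = np.mean(V(y_big, f(x_big)))       # Monte Carlo estimate of I[f]

print(emp_risk, exp_risk)                    # emp_risk fluctuates around ~0.01
```

The sample average computed on the training set here is exactly the empirical risk I_emp[f; ℓ] defined in eq. (11) below.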

The function f₀ is our ideal estimator, and it is often called the target function⁶. Unfortunately this function cannot be found in practice, because the probability distribution P(x, y) that defines the expected risk is unknown, and only a sample of it, the data set D_ℓ, is available. To overcome this shortcoming we need an induction principle that we can use to "learn" from the limited number of training data we have. Statistical learning theory as developed by Vapnik builds on the so-called empirical risk minimization (ERM) induction principle. The ERM method consists in using the data set D_ℓ to build a stochastic approximation of the expected risk, which is usually called the empirical risk, and is defined as⁷:

$$I_{emp}[f; \ell] = \frac{1}{\ell}\sum_{i=1}^{\ell} V(y_i, f(x_i)). \qquad (11)$$

The central question of the theory is whether the expected risk of the minimizer of the empirical risk in F is close to the expected risk of f₀. Notice that the question is not necessarily whether we can find f₀ but whether we can "imitate" f₀ in the sense that the expected risk of our solution is close to that of f₀. Formally the theory answers the question of finding under which conditions the method of ERM satisfies:

$$\lim_{\ell\to\infty} I_{emp}[\hat{f}_\ell; \ell] = \lim_{\ell\to\infty} I[\hat{f}_\ell] = I[f_0] \qquad (12)$$

in probability (all statements are probabilistic since we start with P(x, y) on the data), where we note with f̂_ℓ the minimizer of the empirical risk (11) in F. It can be shown (see for example [96]) that in order for the limits in eq. (12) to hold true in probability, or more precisely, for the empirical risk minimization principle to be non-trivially consistent (see [96] for a discussion about consistency versus non-trivial consistency), the following uniform law of large numbers (which "translates" to one-sided uniform convergence in probability of empirical risk to expected risk in F) is a necessary and sufficient condition:

$$\lim_{\ell\to\infty} P\left\{\sup_{f\in F}\,(I[f] - I_{emp}[f;\ell]) > \epsilon\right\} = 0 \quad \forall\epsilon > 0 \qquad (13)$$

Intuitively, if F is very "large" then we can always find f̂_ℓ ∈ F with 0 empirical error. This however does not guarantee that the expected risk of f̂_ℓ is also close to 0, or close to I[f₀]. Typically in the literature the two-sided uniform convergence in probability:

$$\lim_{\ell\to\infty} P\left\{\sup_{f\in F}\,|I[f] - I_{emp}[f;\ell]| > \epsilon\right\} = 0 \quad \forall\epsilon > 0 \qquad (14)$$

is considered, which clearly implies (13). In this paper we focus on the stronger two-sided case and note that one can get one-sided uniform convergence with some minor technical changes to the theory. We will not discuss the technical issues involved in the relations between consistency, non-trivial consistency, two-sided and one-sided uniform convergence (a discussion can be found in [96]), and from now on we concentrate on the two-sided uniform convergence in probability, which we simply refer to as uniform convergence. The theory of uniform convergence of ERM has been developed in [97, 98, 99, 94, 96]. It has also been studied in the context of empirical processes [29, 74, 30]. Here we summarize the main results of the theory.

⁶ In the case that V is (y − f(x))², the minimizer of eq. (10) is the regression function f₀(x) = ∫ y P(y|x) dy.
⁷ It is important to notice that the data terms (4), (5) and (6) are used for the empirical risks I_emp.

2.1 Uniform Convergence and the Vapnik-Chervonenkis bound

Vapnik and Chervonenkis [97, 98] studied under what conditions uniform convergence of the empirical risk to expected risk takes place. The results are formulated in terms of three important quantities that measure the complexity of a set of functions: the VC entropy, the annealed VC entropy, and the growth function. We begin with the definitions of these quantities. First we define the minimal ɛ-net of a set, which intuitively measures the "cardinality" of a set at "resolution" ɛ:

Definition 2.1 Let A be a set in a metric space 𝒜 with distance metric d. For a fixed ɛ > 0, the set B ⊆ 𝒜 is called an ɛ-net of A in 𝒜, if for any point a ∈ A there is a point b ∈ B such that d(a, b) < ɛ. We say that the set B is a minimal ɛ-net of A in 𝒜, if it is finite and contains the minimal number of elements.

Given a training set D_ℓ = {(x_i, y_i) ∈ X × Y}, i = 1, ..., ℓ, consider the set of ℓ-dimensional vectors:

$$q(f) = (V(y_1, f(x_1)), \ldots, V(y_\ell, f(x_\ell))) \qquad (15)$$

with f ∈ F, and define the number of elements of the minimal ɛ-net of this set under the metric:

$$d(q(f), q(f')) = \max_{1\le i\le \ell} |V(y_i, f(x_i)) - V(y_i, f'(x_i))|$$

to be N^F(ɛ; D_ℓ) (which clearly depends both on F and on the loss function V). Intuitively this quantity measures how many different functions we effectively have at "resolution" ɛ, when we only care about the values of the functions at points in D_ℓ. Using this quantity we now give the following definitions:

Definition 2.2 Given a set X × Y and a probability P(x, y) defined over it, the VC entropy of a set of functions V(y, f(x)), f ∈ F, on a data set of size ℓ is defined as:

$$H^F(\epsilon;\ell) \equiv \int_{X,Y} \ln N^F(\epsilon; D_\ell)\ \prod_{i=1}^{\ell} P(x_i, y_i)\, dx_i\, dy_i$$

Definition 2.3 Given a set X × Y and a probability P(x, y) defined over it, the annealed VC entropy of a set of functions V(y, f(x)), f ∈ F, on a data set of size ℓ is defined as:

$$H^F_{ann}(\epsilon;\ell) \equiv \ln \int_{X,Y} N^F(\epsilon; D_\ell)\ \prod_{i=1}^{\ell} P(x_i, y_i)\, dx_i\, dy_i$$

Definition 2.4 Given a set X × Y, the growth function of a set of functions V(y, f(x)), f ∈ F, on a data set of size ℓ is defined as:

$$G^F(\epsilon;\ell) \equiv \ln\left(\sup_{D_\ell \in (X\times Y)^\ell} N^F(\epsilon; D_\ell)\right)$$

Notice that all three quantities are functions of the number of data ℓ and of ɛ, and that clearly:

$$H^F(\epsilon;\ell) \le H^F_{ann}(\epsilon;\ell) \le G^F(\epsilon;\ell).$$

These definitions can easily be extended in the case of indicator functions, i.e. functions taking binary values⁸ such as {−1, 1}, in which case the three quantities do not depend on ɛ for ɛ < 1, since the vectors (15) are all at the vertices of the hypercube {0, 1}^ℓ. Using these definitions we can now state three important results of statistical learning theory [96]. For a given probability distribution P(x, y):

1. The necessary and sufficient condition for uniform convergence is that
$$\lim_{\ell\to\infty} \frac{H^F(\epsilon;\ell)}{\ell} = 0 \quad \forall\epsilon > 0$$

2. A sufficient condition for fast asymptotic rate of convergence⁹ is that
$$\lim_{\ell\to\infty} \frac{H^F_{ann}(\epsilon;\ell)}{\ell} = 0 \quad \forall\epsilon > 0$$
It is an open question whether this is also a necessary condition.

3. A sufficient condition for distribution independent (that is, for any P(x, y)) fast rate of convergence is that
$$\lim_{\ell\to\infty} \frac{G^F(\epsilon;\ell)}{\ell} = 0 \quad \forall\epsilon > 0$$
For indicator functions this is also a necessary condition.

According to statistical learning theory, these three quantities are what one should consider when designing and analyzing learning machines: the VC-entropy and the annealed VC-entropy for an analysis which depends on the probability distribution P(x, y) of the data, and the growth function for a distribution independent analysis. In this paper we consider only distribution independent results, although the reader should keep in mind that distribution dependent results are likely to be important in the future. Unfortunately the growth function of a set of functions is difficult to compute in practice. So the standard approach in statistical learning theory is to use an upper bound on the growth function which is given using another important quantity, the VC-dimension, which is another (looser) measure of the complexity, capacity, of a set of functions. In this paper we concentrate on this quantity, but it is important that the reader keeps in mind that the VC-dimension is in a sense a "weak" measure of complexity of a set of functions, so it typically leads to loose upper bounds on the growth function: in general one is better off, theoretically, using directly the growth function. We now discuss the VC-dimension and its implications for learning. The VC-dimension was first defined for the case of indicator functions and then was extended to real valued functions.

⁸ In the case of indicator functions, y is binary, and V is 0 for f(x) = y, 1 otherwise.
⁹ This means that for any ℓ > ℓ₀ we have that P{sup_{f∈F} |I[f] − I_emp[f]| > ɛ} < e^{−cɛ²ℓ} for some constant c > 0. Intuitively, fast rate is typically needed in practice.

Definition 2.5 The VC-dimension of a set {θ(f(x)), f ∈ F} of indicator functions is the maximum number h of vectors x₁, ..., x_h that can be separated into two classes in all 2^h possible ways using functions of the set. If, for any number N, it is possible to find N points x₁, ..., x_N that can be separated in all the 2^N possible ways, we will say that the VC-dimension of the set is infinite.

The remarkable property of this quantity is that, although as we mentioned the VC-dimension only provides an upper bound to the growth function, in the case of indicator functions, finiteness of the VC-dimension is a necessary and sufficient condition for uniform convergence (eq. (14)) independent of the underlying distribution P(x, y).

Definition 2.6 Let A ≤ V(y, f(x)) ≤ B, f ∈ F, with A and B < ∞. The VC-dimension of the set {V(y, f(x)), f ∈ F} is defined as the VC-dimension of the set of indicator functions {θ(V(y, f(x)) − α), α ∈ (A, B)}.

Sometimes we refer to the VC-dimension of {V(y, f(x)), f ∈ F} as the VC dimension of V in F. It can be easily shown that for y ∈ {−1, +1} and for V(y, f(x)) = θ(−yf(x)) as the loss function, the VC dimension of V in F computed using definition 2.6 is equal to the VC dimension of the set of indicator functions {θ(f(x)), f ∈ F} computed using definition 2.5. In the case of real valued functions, finiteness of the VC-dimension is only sufficient for uniform convergence. Later in this section we will discuss a measure of capacity that provides also necessary conditions. An important outcome of the work of Vapnik and Chervonenkis is that the uniform deviation between empirical risk and expected risk in a hypothesis space can be bounded in terms of the VC-dimension, as shown in the following theorem:

Theorem 2.1 (Vapnik and Chervonenkis 1971) Let A ≤ V(y, f(x)) ≤ B, f ∈ F, F be a set of bounded functions and h the VC-dimension of V in F. Then, with probability at least 1 − η, the following inequality holds simultaneously for all the elements f of F:

$$I_{emp}[f;\ell] - (B-A)\sqrt{\frac{h\ln\frac{2e\ell}{h} - \ln\frac{\eta}{4}}{\ell}} \le I[f] \le I_{emp}[f;\ell] + (B-A)\sqrt{\frac{h\ln\frac{2e\ell}{h} - \ln\frac{\eta}{4}}{\ell}} \qquad (16)$$

The quantity |I[f] − I_emp[f; ℓ]| is often called estimation error, and bounds of the type above are usually called VC bounds¹⁰. From eq. (16) it is easy to see that with probability at least 1 − η:

$$I[\hat{f}_\ell] - 2(B-A)\sqrt{\frac{h\ln\frac{2e\ell}{h} - \ln\frac{\eta}{4}}{\ell}} \le I[f_0] \le I[\hat{f}_\ell] + 2(B-A)\sqrt{\frac{h\ln\frac{2e\ell}{h} - \ln\frac{\eta}{4}}{\ell}} \qquad (17)$$

where f̂_ℓ is, as in (12), the minimizer of the empirical risk in F. A very interesting feature of inequalities (16) and (17) is that they are non-asymptotic, meaning that they hold for any finite number of data points ℓ, and that the error bounds do not necessarily depend on the dimensionality of the variable x. Observe that theorem (2.1) and inequality (17) are meaningful in practice only if the VC-dimension of the loss function V in F is finite and less than ℓ.

¹⁰ It is important to note that bounds on the expected risk using the annealed VC-entropy also exist. These are tighter than the VC-dimension ones.
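The deviation term in the VC bound (16) is straightforward to evaluate numerically. The sketch below is our own illustration: it computes the term for a few values of h and ℓ, and then uses it in the spirit of the Structural Risk Minimization method of the next subsection; the empirical risks and VC-dimensions of the nested spaces are hypothetical numbers chosen only for the demonstration.

```python
import numpy as np

def vc_deviation(h, l, eta=0.05, B_minus_A=1.0):
    """Confidence term of eq. (16):
    (B - A) * sqrt((h * ln(2*e*l/h) - ln(eta/4)) / l)."""
    return B_minus_A * np.sqrt(
        (h * np.log(2 * np.e * l / h) - np.log(eta / 4)) / l)

for l in (100, 1000, 10000):
    print(l, [round(vc_deviation(h, l), 3) for h in (5, 50, 500)])
# The term shrinks roughly like sqrt(h * ln(l) / l): more data tightens the
# bound, while higher capacity h loosens it.

# SRM-style selection over a nested structure H_1 c H_2 c ...: given the
# empirical risks of the minimizers in spaces of VC-dimension h_i, choose
# the space minimizing the right-hand side of (16).
emp_risks = [0.40, 0.20, 0.15, 0.14]    # hypothetical values for H_1..H_4
h_list = [2, 10, 40, 200]
l = 1000
rhs = [e + vc_deviation(h, l) for e, h in zip(emp_risks, h_list)]
print(int(np.argmin(rhs)), [round(b, 3) for b in rhs])
```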

Since the space F where the loss function V is defined is usually very large (i.e. all functions in L₂), one typically considers smaller hypothesis spaces H. The cost associated with restricting the space is called the approximation error (see below). In the literature, the space F where V is defined is called the target space, while H is what is called the hypothesis space. Of course, all the definitions and analysis above still hold for H, where we replace f₀ with the minimizer of the expected risk in H, f̂_ℓ is now the minimizer of the empirical risk in H, and h the VC-dimension of the loss function V in H. Inequalities (16) and (17) suggest a method for achieving good generalization: not only minimize the empirical risk, but instead minimize a combination of the empirical risk and the complexity of the hypothesis space. This observation leads us to the method of Structural Risk Minimization that we describe next.

2.2 The method of Structural Risk Minimization

The idea of SRM is to define a nested sequence of hypothesis spaces H₁ ⊂ H₂ ⊂ ... ⊂ H_{n(ℓ)} with n(ℓ) a non-decreasing integer function of ℓ, where each hypothesis space H_i has VC-dimension finite and larger than that of all previous sets, i.e. if h_i is the VC-dimension of space H_i, then h₁ ≤ h₂ ≤ ... ≤ h_{n(ℓ)}. For example H_i could be the set of polynomials of degree i, or a set of splines with i nodes, or some more complicated nonlinear parameterization. For each element H_i of the structure the solution of the learning problem is:

$$\hat{f}_{i,\ell} = \arg\min_{f\in H_i} I_{emp}[f;\ell] \qquad (18)$$

Because of the way we define our structure it should be clear that the larger i is the smaller the empirical error of f̂_{i,ℓ} is (since we have greater "flexibility" to fit our training data), but the larger the VC-dimension part (second term) of the right hand side of (16) is. Using such a nested sequence of more and more complex hypothesis spaces, the SRM learning technique consists of choosing the space H_{n*(ℓ)} for which the right hand side of inequality (16) is minimized. It can be shown [94] that for the chosen solution f̂_{n*(ℓ),ℓ}, inequalities (16) and (17) hold with probability at least (1 − η)^{n(ℓ)} ≈ 1 − n(ℓ)η¹¹, where we replace h with h_{n*(ℓ)}, f₀ with the minimizer of the expected risk in H_{n*(ℓ)}, namely f_{n*(ℓ)}, and f̂_ℓ with f̂_{n*(ℓ),ℓ}. With an appropriate choice of n(ℓ)¹² it can be shown that as ℓ → ∞ and n(ℓ) → ∞, the expected risk of the solution of the method approaches in probability the minimum of the expected risk in H = ∪_{i=1}^∞ H_i, namely I[f_H]. Moreover, if the target function f₀ belongs to the closure of H, then eq. (12) holds in probability (see for example [96]). However, in practice ℓ is finite ("small"), so n(ℓ) is small which means that H = ∪_{i=1}^{n(ℓ)} H_i is a small space. Therefore I[f_H] may be much larger than the expected risk of our target function f₀, since f₀ may not be in H. The distance between I[f_H] and I[f₀] is called the approximation error and can be bounded using results from approximation theory. We do not discuss these results here and refer the reader to [54, 26].

¹¹ We want (16) to hold simultaneously for all spaces H_i, since we choose the best f̂_{i,ℓ}.
¹² Various cases are discussed in [27], i.e. n(ℓ) = ℓ.

2.3 ɛ-uniform convergence and the V_γ dimension

As mentioned above finiteness of the VC-dimension is not a necessary condition for uniform convergence in the case of real valued functions. To get a necessary condition we need a slight

extension of the VC-dimension that has been developed (among others) in [50, 2], known as the V_γ dimension¹³. Here we summarize the main results of that theory that we will also use later on to design regression machines for which we will have distribution independent uniform convergence.

Definition 2.7 Let A ≤ V(y, f(x)) ≤ B, f ∈ F, with A and B < ∞. The V_γ-dimension of V in F (of the set {V(y, f(x)), f ∈ F}) is defined as the maximum number h of vectors (x₁, y₁), ..., (x_h, y_h) that can be separated into two classes in all 2^h possible ways using rules:

class 1 if: V(y_i, f(x_i)) ≥ s + γ
class 0 if: V(y_i, f(x_i)) ≤ s − γ

for f ∈ F and some s ≥ 0. If, for any number N, it is possible to find N points (x₁, y₁), ..., (x_N, y_N) that can be separated in all the 2^N possible ways, we will say that the V_γ-dimension of V in F is infinite.

Notice that for γ = 0 this definition becomes the same as definition 2.6 for VC-dimension. Intuitively, for γ > 0 the "rule" for separating points is more restrictive than the rule in the case γ = 0. It requires that there is a "margin" between the points: points for which V(y, f(x)) is between s + γ and s − γ are not classified. As a consequence, the V_γ dimension is a decreasing function of γ and in particular is smaller than the VC-dimension. If V is an indicator function, say θ(−yf(x)), then for any γ definition 2.7 reduces to that of the VC-dimension of a set of indicator functions. Generalizing slightly the definition of eq. (14) we will say that for a given ɛ > 0 the ERM method converges ɛ-uniformly in F in probability (or that there is ɛ-uniform convergence) if:

$$\lim_{\ell\to\infty} P\left\{\sup_{f\in F}\,|I_{emp}[f;\ell] - I[f]| > \epsilon\right\} = 0. \qquad (19)$$

Notice that if eq. (19) holds for every ɛ > 0 we have uniform convergence (eq. (14)). It can be shown (variation of [96]) that ɛ-uniform convergence in probability implies that:

$$I[\hat{f}_\ell] \le I[f_0] + 2\epsilon \qquad (20)$$

in probability, where, as before, f̂_ℓ is the minimizer of the empirical risk and f₀ is the minimizer of the expected risk in F¹⁴. The basic theorems for the V_γ-dimension are the following:

Theorem 2.2 (Alon et al., 1993) Let A ≤ V(y, f(x)) ≤ B, f ∈ F, F be a set of bounded functions. For any ɛ > 0, if the V_γ dimension of V in F is finite for γ = αɛ for some constant α ≤ 1/48, then the ERM method ɛ-converges in probability.

Theorem 2.3 (Alon et al., 1993) Let A ≤ V(y, f(x)) ≤ B, f ∈ F, F be a set of bounded functions. The ERM method uniformly converges (in probability) if and only if the V_γ dimension of V in F is finite for every γ > 0. So finiteness of the V_γ dimension for every γ > 0 is a necessary and sufficient condition for distribution independent uniform convergence of the ERM method for real-valued functions.

¹³ In the literature, other quantities, such as the fat-shattering dimension and the P_γ dimension, are also defined. They are closely related to each other, and are essentially equivalent to the V_γ dimension for the purpose of this paper. The reader can refer to [2, 7] for an in-depth discussion on this topic.
¹⁴ This is like ɛ-learnability in the PAC model [93].

Theorem 2.4 (Alon et al., 1993) Let A ≤ V(y, f(x)) ≤ B, f ∈ F, F be a set of bounded functions. For any ɛ > 0, for all ℓ ≥ 2/ɛ² we have that if h_γ is the V_γ dimension of V in F for γ = αɛ (α ≤ 1/48), h_γ finite, then:

$$P\left\{\sup_{f\in F}\,|I_{emp}[f;\ell] - I[f]| > \epsilon\right\} \le G(\epsilon, \ell, h_\gamma), \qquad (21)$$

where G is an increasing function of h_γ and a decreasing function of ɛ and ℓ, with G → 0 as ℓ → ∞¹⁵. From this theorem we can easily see that for any ɛ > 0, for all ℓ ≥ 2/ɛ²:

$$P\left\{ I[\hat{f}_\ell] \le I[f_0] + 2\epsilon \right\} \ge 1 - 2G(\epsilon, \ell, h_\gamma), \qquad (22)$$

where f̂_ℓ is, as before, the minimizer of the empirical risk in F. An important observation to keep in mind is that theorem 2.4 requires the V_γ dimension of the loss function V in F. In the case of classification, this implies that if we want to derive bounds on the expected misclassification we have to use the V_γ dimension of the loss function θ(−yf(x)) (which is the VC dimension of the set of indicator functions {sgn(f(x)), f ∈ F}), and not the V_γ dimension of the set F.

¹⁵ Closed forms of G can be derived (see for example [2]) but we do not present them here for simplicity of notation.

The theory of the V_γ dimension justifies the extended SRM method we describe below. It is important to keep in mind that the method we describe is only of theoretical interest and will only be used later as a theoretical motivation for RN and SVM. It should be clear that all the definitions and analysis above still hold for any hypothesis space H, where we replace f₀ with the minimizer of the expected risk in H, f̂_ℓ is now the minimizer of the empirical risk in H, and h the VC-dimension of the loss function V in H.

Let ℓ be the number of training data. For a fixed ɛ > 0 such that ℓ ≥ 2/ɛ², let γ = (1/48)ɛ, and consider, as before, a nested sequence of hypothesis spaces H₁ ⊆ H₂ ⊆ ... ⊆ H_{n(ℓ,ɛ)}, where each hypothesis space H_i has V_γ-dimension finite and larger than that of all previous sets, i.e. if h_i is the V_γ-dimension of space H_i, then h₁ ≤ h₂ ≤ ... ≤ h_{n(ℓ,ɛ)}. For each element H_i of the structure consider the solution of the learning problem to be:

$$\hat{f}_{i,\ell} = \arg\min_{f\in H_i} I_{emp}[f;\ell]. \qquad (23)$$

Because of the way we define our structure the larger i is the smaller the empirical error of f̂_{i,ℓ} is (since we have more "flexibility" to fit our training data), but the larger the right hand side of inequality (21) is. Using such a nested sequence of more and more complex hypothesis spaces, this extended SRM learning technique consists of finding the structure element H_{n*(ℓ,ɛ)} for which the trade off between empirical error and the right hand side of (21) is optimal. One practical idea is to find numerically for each H_i the "effective" ɛ_i so that the bound (21) is the same for all H_i, and then choose the f̂_{i,ℓ} for which the sum of the empirical risk and ɛ_i is minimized. We conjecture that as ℓ → ∞, for appropriate choice of n(ℓ, ɛ) with n(ℓ, ɛ) → ∞ as ℓ → ∞, the expected risk of the solution of the method converges in probability to a value less than 2ɛ away from the minimum expected risk in H = ∪_{i=1}^∞ H_i. Notice that we described an SRM method for a fixed ɛ. If the V_γ dimension of H_i is finite for every γ > 0, we can further modify the extended SRM method so that ɛ → 0 as ℓ → ∞. We conjecture that if the target function f₀ belongs to the

closure of H, then as ℓ → ∞, with appropriate choices of ɛ, n(ℓ, ɛ) and n*(ℓ, ɛ), the solution of this SRM method can be proven (as before) to satisfy eq. (12) in probability. Finding appropriate forms of ɛ, n(ℓ, ɛ) and n*(ℓ, ɛ) is an open theoretical problem (which we believe to be a technical matter). Again, as in the case of "standard" SRM, in practice ℓ is finite so H = ∪_{i=1}^{n(ℓ,ɛ)} H_i is a small space and the solution of this method may have expected risk much larger than the expected risk of the target function. Approximation theory can be used to bound this difference [61].

The proposed method is difficult to implement in practice since it is difficult to decide the optimal trade off between empirical error and the bound (21). If we had constructive bounds on the deviation between the empirical and the expected risk like that of theorem 2.1 then we could have a practical way of choosing the optimal element of the structure. Unfortunately existing bounds of that type [2, 7] are not tight. So the final choice of the element of the structure may be done in practice using other techniques such as cross-validation [102].

2.4 Overview of our approach

In order to set the stage for the next two sections on regularization and Support Vector Machines, we outline here how we can justify the proper use of the RN and the SVM functionals (see (3)) in the framework of the SRM principles just described. The basic idea is to define a structure in terms of a nested sequence of hypothesis spaces H₁ ⊂ H₂ ⊂ ... ⊂ H_{n(ℓ)} with H_m being the set of functions f in the RKHS with:

$$\|f\|_K \le A_m, \qquad (24)$$

where A_m is a monotonically increasing sequence of positive constants. Following the SRM method outlined above, for each m we will minimize the empirical risk

$$\frac{1}{\ell}\sum_{i=1}^{\ell} V(y_i, f(x_i)),$$

subject to the constraint (24). This in turn leads to using the Lagrange multiplier λ_m and to minimizing

$$\frac{1}{\ell}\sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda_m\left(\|f\|_K^2 - A_m^2\right),$$

with respect to f and maximizing with respect to λ_m ≥ 0 for each element of the structure. We can then choose the optimal n*(ℓ) and the associated λ*(ℓ), and get the optimal solution f̂_{n*(ℓ)}. The solution we get using this method is clearly the same as the solution of:

$$\frac{1}{\ell}\sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda^*(\ell)\|f\|_K^2 \qquad (25)$$

where λ*(ℓ) is the optimal Lagrange multiplier corresponding to the optimal element of the structure A_{n*(ℓ)}. Notice that this approach is quite general. In particular it can be applied to classical L₂ regularization, to SVM regression, and, as we will see, to SVM classification with the appropriate V(·,·). In section 6 we will describe in detail this approach for the case that the elements of the structure are infinite dimensional RKHS.
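For the quadratic loss, the correspondence between the constrained problems (24) and the penalized functional (25) can be checked numerically: as λ decreases, the RKHS norm of the penalized minimizer increases monotonically, sweeping through the constraint radii A_m. Below is a minimal sketch of this, our own illustration; it uses the closed-form solution for the square loss that is derived later, in eqs. (33)-(34), with an arbitrary Gaussian kernel and bandwidth.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=30)

# Gaussian kernel matrix K_ij = K(x_i, x_j)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)

for lam in (10.0, 1.0, 0.1, 0.01):
    c = np.linalg.solve(K + lam * np.eye(len(x)), y)   # eq. (34)
    norm_f = np.sqrt(c @ K @ c)    # ||f||_K for f = sum_i c_i K(., x_i)
    print(lam, round(norm_f, 3))
# ||f||_K grows as lam shrinks: each lam acts as the Lagrange multiplier of
# the constraint ||f||_K <= A_m for some radius A_m, so sweeping lam sweeps
# the elements of the structure.
```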

We have outlined this theoretical method here so that the reader understands our motivation for reviewing in the next two sections the approximation schemes resulting from the minimization of functionals of the form of equation (25) for three specific choices of the loss function V:

V(y, f(x)) = (y − f(x))² for regularization.
V(y, f(x)) = |y − f(x)|_ɛ for SVM regression.
V(y, f(x)) = |1 − yf(x)|₊ for SVM classification.

For SVM classification the loss functions V(y, f(x)) = θ(1 − yf(x)) (hard margin loss function) and V(y, f(x)) = θ(−yf(x)) (misclassification loss function) will also be discussed. First we present an overview of RKHS which are the hypothesis spaces we consider in the paper.

3 Reproducing Kernel Hilbert Spaces: a brief overview

A Reproducing Kernel Hilbert Space (RKHS) [5] is a Hilbert space H of functions defined over some bounded domain X ⊂ R^d with the property that, for each x ∈ X, the evaluation functionals F_x, defined as F_x[f] = f(x) for all f ∈ H, are linear, bounded functionals. The boundedness means that there exists a U = U_x ∈ R⁺ such that |F_x[f]| = |f(x)| ≤ U‖f‖ for all f in the RKHS. It can be proved [102] that to every RKHS H there corresponds a unique positive definite function K(x, y) of two variables in X, called the reproducing kernel of H (hence the terminology RKHS), that has the following reproducing property:

$$f(x) = \langle f(y), K(y, x)\rangle_H \quad \forall f \in H, \qquad (26)$$

where ⟨·,·⟩_H denotes the scalar product in H. The function K behaves in H as the delta function does in L₂, although L₂ is not a RKHS (the functionals F_x are clearly not bounded). To make things clearer we sketch a way to construct a RKHS, which is relevant to our paper. The mathematical details (such as the convergence or not of certain series) can be found in the theory of integral equations [45, 20, 23]. Let us assume that we have a sequence of positive numbers λ_n and linearly independent functions φ_n(x) such that they define a function K(x, y) in the following way¹⁶:

$$K(x, y) \equiv \sum_{n=0}^{\infty} \lambda_n \phi_n(x)\phi_n(y), \qquad (27)$$

¹⁶ When working with complex functions φ_n(x) this formula should be replaced with K(x, y) ≡ Σ_{n=0}^∞ λ_n φ_n(x) φ_n*(y).

where the series is well defined (for example it converges uniformly). A simple calculation shows that the function K defined in eq. (27) is positive definite. Let us now take as our Hilbert space the set of functions of the form:

$$f(x) = \sum_{n=0}^{\infty} a_n \phi_n(x) \qquad (28)$$

for any a_n ∈ R, and define the scalar product in our space to be:

$$\left\langle \sum_{n=0}^{\infty} a_n\phi_n(x),\ \sum_{n=0}^{\infty} d_n\phi_n(x)\right\rangle_H \equiv \sum_{n=0}^{\infty} \frac{a_n d_n}{\lambda_n}. \qquad (29)$$

Assuming that all the evaluation functionals are bounded, it is now easy to check that such a Hilbert space is a RKHS with reproducing kernel given by K(x, y). In fact we have:

$$\langle f(y), K(y, x)\rangle_H = \sum_{n=0}^{\infty} \frac{a_n\lambda_n\phi_n(x)}{\lambda_n} = \sum_{n=0}^{\infty} a_n\phi_n(x) = f(x), \qquad (30)$$

hence equation (26) is satisfied. Notice that when we have a finite number of φ_n, the λ_n can be arbitrary (finite) numbers, since convergence is ensured. In particular they can all be equal to one. Generally, it is easy to show [102] that whenever a function K of the form (27) is available, it is possible to construct a RKHS as shown above. Vice versa, for any RKHS there is a unique kernel K and corresponding λ_n, φ_n, that satisfy equation (27) and for which equations (28), (29) and (30) hold for all functions in the RKHS. Moreover, equation (29) shows that the norm of the RKHS has the form:

$$\|f\|_K^2 = \sum_{n=0}^{\infty} \frac{a_n^2}{\lambda_n}. \qquad (31)$$

The φ_n constitute a basis for the RKHS (not necessarily orthonormal), and the kernel K is the "correlation" matrix associated with these basis functions. It is in fact well known that there is a close relation between Gaussian processes and RKHS [58, 40, 72]. Wahba [102] discusses in depth the relation between regularization, RKHS and correlation functions of Gaussian processes. The choice of the φ_n defines a space of functions: the functions that are spanned by the φ_n. We also call the space {(φ_n(x))_{n=1}^∞, x ∈ X} the feature space induced by the kernel K. The choice of the φ_n defines the feature space where the data x are "mapped". In this paper we refer to the dimensionality of the feature space as the dimensionality of the RKHS. This is clearly equal to the number of basis elements φ_n, which does not necessarily have to be infinite. For example, with K a Gaussian, the dimensionality of the RKHS is infinite (the φ_n(x) are the Fourier components e^{in·x}), while when K is a polynomial of degree k (K(x, y) = (1 + x·y)^k; see section 4), the dimensionality of the RKHS is finite, and all the infinite sums above are replaced with finite sums. It is well known that expressions of the form (27) actually abound. In fact, it follows from Mercer's theorem [45] that any function K(x, y) which is the kernel of a positive operator¹⁷ in L₂(Ω) has an expansion of the form (27), in which the φ_i and the λ_i are respectively the orthogonal eigenfunctions and the positive eigenvalues of the operator corresponding to K.

¹⁷ We remind the reader that positive definite operators in L₂ are self-adjoint operators such that ⟨Kf, f⟩ ≥ 0 for all f ∈ L₂.
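The construction (27)-(31) can be verified numerically with a finite number of features. The sketch below is our own illustration, with an arbitrary choice of features φ_n and spectrum λ_n: it checks the reproducing property (26) and the norm (31) in coefficient form.

```python
import numpy as np

# A finite feature expansion: features phi_n and positive weights lambda_n.
N = 5
lam = 1.0 / (1.0 + np.arange(N)) ** 2      # a decreasing spectrum (our choice)
phi = lambda n, x: np.cos(np.pi * n * x)   # linearly independent on [0, 1]

def kernel(x, y):
    """K(x, y) = sum_n lambda_n phi_n(x) phi_n(y), as in eq. (27)."""
    return sum(lam[n] * phi(n, x) * phi(n, y) for n in range(N))

# An element f = sum_n a_n phi_n of the space, eq. (28).
a = np.array([0.5, -1.0, 0.3, 0.0, 2.0])
f = lambda x: sum(a[n] * phi(n, x) for n in range(N))

# By eq. (27), K(., x0) has coefficients d_n = lambda_n * phi_n(x0), and the
# scalar product (29) in coefficient form is <f, g>_H = sum_n a_n d_n / lambda_n.
x0 = 0.3
d = lam * np.array([phi(n, x0) for n in range(N)])
print(np.sum(a * d / lam), f(x0))           # equal: reproducing property (26)
print(np.sum(d * d / lam), kernel(x0, x0))  # equal: ||K(., x0)||^2 = K(x0, x0)
print(np.sum(a ** 2 / lam))                 # the RKHS norm ||f||_K^2, eq. (31)
```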

In [91] it is reported that the positivity of the operator associated to K is equivalent to the statement that the kernel K is positive definite, that is the matrix K_ij = K(x_i, x_j) is positive definite for all choices of distinct points x_i ∈ X. Notice that a kernel K could have an expansion of the form (27) in which the φ_n are not necessarily its eigenfunctions. The only requirement is that the φ_n are linearly independent but not necessarily orthogonal. In the case that the space X has finite cardinality, the "functions" f are evaluated only at a finite number of points x. If M is the cardinality of X, then the RKHS becomes an M-dimensional space where the functions f are basically M-dimensional vectors, the kernel K becomes an M × M matrix, and the condition that makes it a valid kernel is that it is a symmetric positive definite matrix (semi-definite if M is larger than the dimensionality of the RKHS). Positive definite matrices are known to be the ones which define dot products, i.e. fKf^T ≥ 0 for every f in the RKHS. The space consists of all M-dimensional vectors f with finite norm fKf^T.

Summarizing, RKHS are Hilbert spaces where the dot product is defined using a function K(x, y) which needs to be positive definite just like in the case that X has finite cardinality. The elements of the RKHS are all functions f that have a finite norm given by equation (31). Notice the equivalence of a) choosing a specific RKHS H, b) choosing a set of φ_n and λ_n, c) choosing a reproducing kernel K. The last one is the most natural for most applications. A simple example of a RKHS is presented in Appendix B. Finally, it is useful to notice that the solutions of the methods we discuss in this paper can be written both in the form (2) and in the form (28). Often in the literature formulation (2) is called the dual form of f, while (28) is called the primal form of f.

4 Regularization Networks

In this section we consider the approximation scheme that arises from the minimization of the quadratic functional

$$\min_{f\in H} H[f] = \frac{1}{\ell}\sum_{i=1}^{\ell}(y_i - f(x_i))^2 + \lambda\|f\|_K^2 \qquad (32)$$

for a fixed λ. Formulations like equation (32) are a special form of regularization theory developed by Tikhonov, Ivanov [92, 46] and others to solve ill-posed problems and in particular to solve the problem of approximating the functional relation between x and y given a finite number of examples D = {x_i, y_i}, i = 1, ..., ℓ. As we mentioned in the previous sections our motivation in this paper is to use this formulation as an approximate implementation of Vapnik's SRM principle. In classical regularization the data term is an L₂ loss function for the empirical risk, whereas the second term, called stabilizer, is usually written as a functional Ω(f) with certain properties [92, 69, 39]. Here we consider a special class of stabilizers, that is the norm ‖f‖²_K in a RKHS induced by a symmetric, positive definite function K(x, y). This choice allows us to develop a framework of regularization which includes most of the usual regularization schemes. The only significant omission in this treatment, which we make here for simplicity, is the restriction on K to be symmetric positive definite so that the stabilizer is a norm. However, the theory can be extended without problems to the case in which K is positive semidefinite, in which case the stabilizer is a semi-norm [102, 56, 31, 33]. This approach was also sketched in [90]. The stabilizer in equation (32) effectively constrains f to be in the RKHS defined by K. It is possible to show (see for example [69, 39]) that the function that minimizes the functional (32)

has the form:

$$f(x) = \sum_{i=1}^{\ell} c_i K(x, x_i), \qquad (33)$$

where the coefficients c_i depend on the data and satisfy the following linear system of equations:

$$(K + \lambda I)\mathbf{c} = \mathbf{y} \qquad (34)$$

where I is the identity matrix, and we have defined (y)_i = y_i, (c)_i = c_i, (K)_ij = K(x_i, x_j). It is remarkable that the solution of the more general case of

$$\min_{f\in H} H[f] = \frac{1}{\ell}\sum_{i=1}^{\ell} V(y_i - f(x_i)) + \lambda\|f\|_K^2, \qquad (35)$$

where the function V is any differentiable function, is quite similar: the solution has exactly the same general form of (33), though the coefficients cannot be found anymore by solving a linear system of equations as in equation (34) [37, 40, 90]. For a proof see Appendix C. The approximation scheme of equation (33) has a simple interpretation in terms of a network with one layer of hidden units [71, 39]. Using different kernels we get various RN's. A short list of examples is given in Table 1.

Kernel Function                                                   Regularization Network
K(x − y) = exp(−‖x − y‖²)                                         Gaussian RBF
K(x − y) = (‖x − y‖² + c²)^(−1/2)                                 Inverse Multiquadric
K(x − y) = (‖x − y‖² + c²)^(1/2)                                  Multiquadric
K(x − y) = ‖x − y‖^(2n+1) or ‖x − y‖^(2n) ln(‖x − y‖)             Thin plate splines
K(x, y) = tanh(x·y − θ) (only for some values of θ)               Multi Layer Perceptron
K(x, y) = (1 + x·y)^d                                             Polynomial of degree d
K(x, y) = B_(2n+1)(x − y)                                         B-splines
K(x, y) = sin((d + 1/2)(x − y)) / sin((x − y)/2)                  Trigonometric polynomial of degree d

Table 1: Some possible kernel functions. The first four are radial kernels. The multiquadric and thin plate splines are positive semidefinite and thus require an extension of the simple RKHS theory of this paper. The last three kernels were proposed by Vapnik [96], originally for SVM. The last two kernels are one-dimensional: multidimensional kernels can be built by tensor products of one-dimensional ones. The functions B_n are piecewise polynomials of degree n, whose exact definition can be found in [85].
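A Regularization Network per equations (33)-(34) takes only a few lines to train and evaluate. The following is a minimal sketch with the Gaussian kernel of Table 1; the data, bandwidth and λ are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_kernel(X1, X2, sigma=0.2):
    """K(x - y) = exp(-||x - y||^2 / sigma^2), first row of Table 1."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-d2 / sigma ** 2)

# Training data
x = rng.uniform(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=50)

# Solve the linear system (34): (K + lambda * I) c = y
lam = 1e-2
K = gaussian_kernel(x, x)
c = np.linalg.solve(K + lam * np.eye(len(x)), y)

# The dual form (33): f(x) = sum_i c_i K(x, x_i)
x_test = np.linspace(0, 1, 9)
f_test = gaussian_kernel(x_test, x) @ c
print(np.round(f_test, 2))   # approximates sin(2*pi*x) on the grid
```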

When the kernel K is positive semidefinite, there is a subspace of functions f which have norm ‖f‖²_K equal to zero. They form the null space of the functional ‖f‖²_K and in this case the minimizer of (32) has the form [102]:

$$f(x) = \sum_{i=1}^{\ell} c_i K(x, x_i) + \sum_{\alpha=1}^{k} b_\alpha \psi_\alpha(x), \qquad (36)$$

where {ψ_α}, α = 1, ..., k, is a basis in the null space of the stabilizer, which in most cases is a set of polynomials, and therefore will be referred to as the "polynomial term" in equation (36). The coefficients b_α and c_i depend on the data. For the classical regularization case of equation (32), the coefficients of equation (36) satisfy the following linear system:

$$(K + \lambda I)\mathbf{c} + \Psi^T\mathbf{b} = \mathbf{y}, \qquad (37)$$

$$\Psi\mathbf{c} = 0, \qquad (38)$$

where I is the identity matrix, and we have defined (y)_i = y_i, (c)_i = c_i, (b)_i = b_i, (K)_ij = K(x_i, x_j), (Ψ)_αi = ψ_α(x_i). When the kernel is positive definite, as in the case of the Gaussian, the null space of the stabilizer is empty. However, it is often convenient to redefine the kernel and the norm induced by it so that the induced RKHS contains only zero-mean functions, that is functions f₁(x) s.t. ∫_X f₁(x)dx = 0. In the case of a radial kernel K, for instance, this amounts to considering a new kernel

$$K'(x, y) = K(x, y) - \lambda_0$$

without the zeroth order Fourier component, and a norm

$$\|f\|_{K'}^2 = \sum_{n=1}^{\infty} \frac{a_n^2}{\lambda_n}. \qquad (39)$$

The null space induced by the new K' is the space of constant functions. Then the minimizer of the corresponding functional (32) has the form:

$$f(x) = \sum_{i=1}^{\ell} c_i K'(x, x_i) + b, \qquad (40)$$

with the coefficients satisfying equations (37) and (38), that respectively become:

$$(K' + \lambda I)\mathbf{c} + \mathbf{1}b = (K - \lambda_0 I + \lambda I)\mathbf{c} + \mathbf{1}b = (K + (\lambda - \lambda_0)I)\mathbf{c} + \mathbf{1}b = \mathbf{y}, \qquad (41)$$

$$\sum_{i=1}^{\ell} c_i = 0. \qquad (42)$$

Equations (40) and (42) imply that the minimizer of (32) is of the form:

$$f(x) = \sum_{i=1}^{\ell} c_i K'(x, x_i) + b = \sum_{i=1}^{\ell} c_i (K(x, x_i) - \lambda_0) + b = \sum_{i=1}^{\ell} c_i K(x, x_i) + b. \qquad (43)$$

Thus we can effectively use a positive definite K and the constant b, since the only change in equation (41) just amounts to the use of a different λ.
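Solving for the coefficients and the constant b amounts to one bordered linear system combining the constraints above. A minimal sketch of this (our own illustration, reusing the Gaussian-kernel setup of the previous snippet, with data shifted by a constant):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 40)
y = 2.0 + np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=40)  # constant offset

l, lam = len(x), 1e-2
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.2 ** 2)

# Bordered system from eqs. (41)-(42):
#   (K + lam * I) c + 1 b = y
#   sum_i c_i             = 0
A = np.zeros((l + 1, l + 1))
A[:l, :l] = K + lam * np.eye(l)
A[:l, l] = 1.0          # the column multiplying b
A[l, :l] = 1.0          # the constraint sum_i c_i = 0
rhs = np.append(y, 0.0)

sol = np.linalg.solve(A, rhs)
c, b = sol[:l], sol[l]
print(round(b, 2))      # b absorbs most of the constant offset (close to 2.0)
```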

Choosing to use a non-zero b effectively means choosing a different feature space and a different stabilizer from the usual case of equation (32): the constant feature is not considered in the RKHS norm and therefore is not penalized. This choice is often quite reasonable, since in many regression and, especially, classification problems, shifts by a constant in f should not be penalized.

In summary, the argument of this section shows that using a RN of the form (43) (for a certain class of kernels K) is equivalent to minimizing functionals such as (32) or (35). The choice of K is equivalent to the choice of a corresponding RKHS and leads to various classical learning techniques such as RBF networks. We discuss connections between regularization and other techniques in sections 4.2 and 4.3. Notice that in the framework we use here the kernels K are not required to be radial or even shift-invariant. Regularization techniques used to solve supervised learning problems [69, 39] were typically used with shift invariant stabilizers (tensor product and additive stabilizers are exceptions, see [39]). We now turn to such kernels.

4.1 Radial Basis Functions

Let us consider a special case of the kernel K of the RKHS, which is the standard case in several papers and books on regularization [102, 70, 39]: the case in which K is shift invariant, that is K(x, y) = K(x − y), and the even more special case of a radial kernel K(x, y) = K(‖x − y‖). Section 3 implies that a radial positive definite K defines a RKHS in which the features φ_n are Fourier components, that is

$$K(x, y) \equiv \sum_{n=0}^{\infty} \lambda_n \phi_n(x)\phi_n(y) \equiv \sum_{n=0}^{\infty} \lambda_n e^{i2\pi n\cdot x}\, e^{-i2\pi n\cdot y}. \qquad (44)$$

Thus any positive definite radial kernel defines a RKHS over [0, 1] with a scalar product of the form:

$$\langle f, g\rangle_H \equiv \sum_{n=0}^{\infty} \frac{\tilde{f}(n)\tilde{g}^*(n)}{\lambda_n}, \qquad (45)$$

where f̃ is the Fourier transform of f. The RKHS becomes simply the subspace of L₂([0, 1]^d) of the functions such that

$$\|f\|_K^2 = \sum_{n=1}^{\infty} \frac{|\tilde{f}(n)|^2}{\lambda_n} < +\infty. \qquad (46)$$

Functionals of the form (46) are known to be smoothness functionals. In fact, the rate of decrease to zero of the Fourier transform of the kernel will control the smoothness property of the function in the RKHS. For radial kernels the minimizer of equation (32) becomes:

$$f(x) = \sum_{i=1}^{\ell} c_i K(\|x - x_i\|) + b \qquad (47)$$

and the corresponding RN is a Radial Basis Function Network. Thus Radial Basis Function networks are a special case of RN [69, 39]. In fact all translation-invariant stabilizers K(x, x_i) = K(x − x_i) correspond to RKHS's where the basis functions φ_n are Fourier eigenfunctions and only differ in the spectrum of the eigenvalues (for a Gaussian stabilizer the spectrum is Gaussian, that is λ_n = Ae^{−n²/2} (for σ = 1)).
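The role of the spectrum λ_n in (44)-(46) can be made concrete by building a translation-invariant kernel from a truncated Fourier expansion with a chosen decay. The sketch below is our own illustration, using the real (cosine) form of the expansion and two arbitrary spectra:

```python
import numpy as np

def spectral_kernel(x, y, lambdas):
    """Translation-invariant kernel on [0, 1] built as in eq. (44), in real
    form: K(x - y) = sum_n lambda_n * cos(2*pi*n*(x - y))."""
    n = np.arange(len(lambdas))
    return np.sum(lambdas * np.cos(2 * np.pi * n * (x - y)), axis=-1)

t = 0.1                                            # evaluate K at lag 0.1
slow = 1.0 / (1.0 + np.arange(20.0)) ** 2          # slowly decaying spectrum
fast = np.exp(-0.5 * np.arange(20.0) ** 2)         # Gaussian-like spectrum
print(spectral_kernel(0.5, 0.5 - t, slow),
      spectral_kernel(0.5, 0.5 - t, fast))
# With the fast-decaying spectrum the kernel is dominated by low frequencies,
# so the norm (46) penalizes high-frequency components heavily and the
# functions in the RKHS are smoother; setting lambda_n = 0 for n > n0 gives
# the bandlimited case discussed next.
```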

For example, if λ_n = 0 for all n > n₀, the corresponding RKHS consists of all bandlimited functions, that is functions with zero Fourier components at frequencies higher than n₀¹⁸. Generally λ_n are such that they decrease as n increases, therefore restricting the class of functions to be functions with decreasing high frequency Fourier components. In classical regularization with translation invariant stabilizers and associated kernels, the common experience, often reported in the literature, is that the form of the kernel does not matter much. We conjecture that this may be because all translation invariant K induce the same type of φ_n features: the Fourier basis functions.

¹⁸ The simplest K is then K(x, y) = sinc(x − y), or kernels that are convolutions with it.

4.2 Regularization, generalized splines and kernel smoothers

A number of approximation and learning techniques can be studied in the framework of regularization theory and RKHS. For instance, starting from a reproducing kernel it is easy [5] to construct kernels that correspond to tensor products of the original RKHS; it is also easy to construct the additive sum of several RKHS in terms of a reproducing kernel.

Tensor Product Splines: In the particular case that the kernel is of the form:

$$K(x, y) = \prod_{j=1}^{d} k(x^j, y^j)$$

where x^j is the jth coordinate of vector x and k is a positive definite function with one-dimensional input vectors, the solution of the regularization problem becomes:

$$f(x) = \sum_i c_i \prod_{j=1}^{d} k(x_i^j, x^j)$$

Therefore we can get tensor product splines by choosing kernels of the form above [5].

Additive Splines: In the particular case that the kernel is of the form:

$$K(x, y) = \sum_{j=1}^{d} k(x^j, y^j)$$

where x^j is the jth coordinate of vector x and k is a positive definite function with one-dimensional input vectors, the solution of the regularization problem becomes:

$$f(x) = \sum_i c_i\left(\sum_{j=1}^{d} k(x_i^j, x^j)\right) = \sum_{j=1}^{d}\left(\sum_i c_i\, k(x_i^j, x^j)\right) = \sum_{j=1}^{d} f_j(x^j)$$

So in this particular case we get the class of additive approximation schemes of the form:

$$f(x) = \sum_{j=1}^{d} f_j(x^j)$$

A more extensive discussion on relations between known approximation methods and regularization can be found in [39].
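Both constructions are one-liners given a one-dimensional kernel k. A minimal sketch (our own illustration, using a 1-D Gaussian for k):

```python
import numpy as np

def k1d(u, v, sigma=0.5):
    """A one-dimensional positive definite kernel k(u, v)."""
    return np.exp(-(u - v) ** 2 / sigma ** 2)

def tensor_kernel(x, y):
    """Tensor product kernel: K(x, y) = prod_j k(x^j, y^j)."""
    return np.prod([k1d(xj, yj) for xj, yj in zip(x, y)])

def additive_kernel(x, y):
    """Additive kernel: K(x, y) = sum_j k(x^j, y^j)."""
    return np.sum([k1d(xj, yj) for xj, yj in zip(x, y)])

x = np.array([0.1, 0.7, 0.3])
y = np.array([0.2, 0.5, 0.9])
print(tensor_kernel(x, y), additive_kernel(x, y))
# Either kernel can replace K in the linear system (34); with the additive
# kernel the resulting f decomposes as f(x) = sum_j f_j(x^j).
```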


More information

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS

LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL HARMONICS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department Physics 8.07: Eectromagnetism II October 7, 202 Prof. Aan Guth LECTURE NOTES 9 TRACELESS SYMMETRIC TENSOR APPROACH TO LEGENDRE POLYNOMIALS AND SPHERICAL

More information

Problem set 6 The Perron Frobenius theorem.

Problem set 6 The Perron Frobenius theorem. Probem set 6 The Perron Frobenius theorem. Math 22a4 Oct 2 204, Due Oct.28 In a future probem set I want to discuss some criteria which aow us to concude that that the ground state of a sef-adjoint operator

More information

A. Distribution of the test statistic

A. Distribution of the test statistic A. Distribution of the test statistic In the sequentia test, we first compute the test statistic from a mini-batch of size m. If a decision cannot be made with this statistic, we keep increasing the mini-batch

More information

Higher dimensional PDEs and multidimensional eigenvalue problems

Higher dimensional PDEs and multidimensional eigenvalue problems Higher dimensiona PEs and mutidimensiona eigenvaue probems 1 Probems with three independent variabes Consider the prototypica equations u t = u (iffusion) u tt = u (W ave) u zz = u (Lapace) where u = u

More information

Partial permutation decoding for MacDonald codes

Partial permutation decoding for MacDonald codes Partia permutation decoding for MacDonad codes J.D. Key Department of Mathematics and Appied Mathematics University of the Western Cape 7535 Bevie, South Africa P. Seneviratne Department of Mathematics

More information

Algorithms to solve massively under-defined systems of multivariate quadratic equations

Algorithms to solve massively under-defined systems of multivariate quadratic equations Agorithms to sove massivey under-defined systems of mutivariate quadratic equations Yasufumi Hashimoto Abstract It is we known that the probem to sove a set of randomy chosen mutivariate quadratic equations

More information

More Scattering: the Partial Wave Expansion

More Scattering: the Partial Wave Expansion More Scattering: the Partia Wave Expansion Michae Fower /7/8 Pane Waves and Partia Waves We are considering the soution to Schrödinger s equation for scattering of an incoming pane wave in the z-direction

More information

WAVELET LINEAR ESTIMATION FOR DERIVATIVES OF A DENSITY FROM OBSERVATIONS OF MIXTURES WITH VARYING MIXING PROPORTIONS. B. L. S.

WAVELET LINEAR ESTIMATION FOR DERIVATIVES OF A DENSITY FROM OBSERVATIONS OF MIXTURES WITH VARYING MIXING PROPORTIONS. B. L. S. Indian J. Pure App. Math., 41(1): 275-291, February 2010 c Indian Nationa Science Academy WAVELET LINEAR ESTIMATION FOR DERIVATIVES OF A DENSITY FROM OBSERVATIONS OF MIXTURES WITH VARYING MIXING PROPORTIONS

More information

221B Lecture Notes Notes on Spherical Bessel Functions

221B Lecture Notes Notes on Spherical Bessel Functions Definitions B Lecture Notes Notes on Spherica Besse Functions We woud ike to sove the free Schrödinger equation [ h d r R(r) = h k R(r). () m r dr r m R(r) is the radia wave function ψ( x) = R(r)Y m (θ,

More information

LECTURE NOTES 8 THE TRACELESS SYMMETRIC TENSOR EXPANSION AND STANDARD SPHERICAL HARMONICS

LECTURE NOTES 8 THE TRACELESS SYMMETRIC TENSOR EXPANSION AND STANDARD SPHERICAL HARMONICS MASSACHUSETTS INSTITUTE OF TECHNOLOGY Physics Department Physics 8.07: Eectromagnetism II October, 202 Prof. Aan Guth LECTURE NOTES 8 THE TRACELESS SYMMETRIC TENSOR EXPANSION AND STANDARD SPHERICAL HARMONICS

More information

14 Separation of Variables Method

14 Separation of Variables Method 14 Separation of Variabes Method Consider, for exampe, the Dirichet probem u t = Du xx < x u(x, ) = f(x) < x < u(, t) = = u(, t) t > Let u(x, t) = T (t)φ(x); now substitute into the equation: dt

More information

Universal Consistency of Multi-Class Support Vector Classification

Universal Consistency of Multi-Class Support Vector Classification Universa Consistency of Muti-Cass Support Vector Cassification Tobias Gasmachers Dae Moe Institute for rtificia Inteigence IDSI, 6928 Manno-Lugano, Switzerand tobias@idsia.ch bstract Steinwart was the

More information

Math 124B January 31, 2012

Math 124B January 31, 2012 Math 124B January 31, 212 Viktor Grigoryan 7 Inhomogeneous boundary vaue probems Having studied the theory of Fourier series, with which we successfuy soved boundary vaue probems for the homogeneous heat

More information

Mat 1501 lecture notes, penultimate installment

Mat 1501 lecture notes, penultimate installment Mat 1501 ecture notes, penutimate instament 1. bounded variation: functions of a singe variabe optiona) I beieve that we wi not actuay use the materia in this section the point is mainy to motivate the

More information

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION

ORTHOGONAL MULTI-WAVELETS FROM MATRIX FACTORIZATION J. Korean Math. Soc. 46 2009, No. 2, pp. 281 294 ORHOGONAL MLI-WAVELES FROM MARIX FACORIZAION Hongying Xiao Abstract. Accuracy of the scaing function is very crucia in waveet theory, or correspondingy,

More information

Cryptanalysis of PKP: A New Approach

Cryptanalysis of PKP: A New Approach Cryptanaysis of PKP: A New Approach Éiane Jaumes and Antoine Joux DCSSI 18, rue du Dr. Zamenhoff F-92131 Issy-es-Mx Cedex France eiane.jaumes@wanadoo.fr Antoine.Joux@ens.fr Abstract. Quite recenty, in

More information

The EM Algorithm applied to determining new limit points of Mahler measures

The EM Algorithm applied to determining new limit points of Mahler measures Contro and Cybernetics vo. 39 (2010) No. 4 The EM Agorithm appied to determining new imit points of Maher measures by Souad E Otmani, Georges Rhin and Jean-Marc Sac-Épée Université Pau Veraine-Metz, LMAM,

More information

Two view learning: SVM-2K, Theory and Practice

Two view learning: SVM-2K, Theory and Practice Two view earning: SVM-2K, Theory and Practice Jason D.R. Farquhar jdrf99r@ecs.soton.ac.uk Hongying Meng hongying@cs.york.ac.uk David R. Hardoon drh@ecs.soton.ac.uk John Shawe-Tayor jst@ecs.soton.ac.uk

More information

Discriminant Analysis: A Unified Approach

Discriminant Analysis: A Unified Approach Discriminant Anaysis: A Unified Approach Peng Zhang & Jing Peng Tuane University Eectrica Engineering & Computer Science Department New Oreans, LA 708 {zhangp,jp}@eecs.tuane.edu Norbert Riede Tuane University

More information

Some Measures for Asymmetry of Distributions

Some Measures for Asymmetry of Distributions Some Measures for Asymmetry of Distributions Georgi N. Boshnakov First version: 31 January 2006 Research Report No. 5, 2006, Probabiity and Statistics Group Schoo of Mathematics, The University of Manchester

More information

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS

SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS ISEE 1 SUPPLEMENTARY MATERIAL TO INNOVATED SCALABLE EFFICIENT ESTIMATION IN ULTRA-LARGE GAUSSIAN GRAPHICAL MODELS By Yingying Fan and Jinchi Lv University of Southern Caifornia This Suppementary Materia

More information

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents

MARKOV CHAINS AND MARKOV DECISION THEORY. Contents MARKOV CHAINS AND MARKOV DECISION THEORY ARINDRIMA DATTA Abstract. In this paper, we begin with a forma introduction to probabiity and expain the concept of random variabes and stochastic processes. After

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW Abstract. One of the most efficient methods for determining the equiibria of a continuous parameterized

More information

On the V γ Dimension for Regression in Reproducing Kernel Hilbert Spaces. Theodoros Evgeniou, Massimiliano Pontil

On the V γ Dimension for Regression in Reproducing Kernel Hilbert Spaces. Theodoros Evgeniou, Massimiliano Pontil MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES A.I. Memo No. 1656 May 1999 C.B.C.L

More information

Homogeneity properties of subadditive functions

Homogeneity properties of subadditive functions Annaes Mathematicae et Informaticae 32 2005 pp. 89 20. Homogeneity properties of subadditive functions Pá Burai and Árpád Száz Institute of Mathematics, University of Debrecen e-mai: buraip@math.kte.hu

More information

FFTs in Graphics and Vision. Spherical Convolution and Axial Symmetry Detection

FFTs in Graphics and Vision. Spherical Convolution and Axial Symmetry Detection FFTs in Graphics and Vision Spherica Convoution and Axia Symmetry Detection Outine Math Review Symmetry Genera Convoution Spherica Convoution Axia Symmetry Detection Math Review Symmetry: Given a unitary

More information

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1

Inductive Bias: How to generalize on novel data. CS Inductive Bias 1 Inductive Bias: How to generaize on nove data CS 478 - Inductive Bias 1 Overfitting Noise vs. Exceptions CS 478 - Inductive Bias 2 Non-Linear Tasks Linear Regression wi not generaize we to the task beow

More information

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES

VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SIAM J. NUMER. ANAL. Vo. 0, No. 0, pp. 000 000 c 200X Society for Industria and Appied Mathematics VALIDATED CONTINUATION FOR EQUILIBRIA OF PDES SARAH DAY, JEAN-PHILIPPE LESSARD, AND KONSTANTIN MISCHAIKOW

More information

Lecture 6: Moderately Large Deflection Theory of Beams

Lecture 6: Moderately Large Deflection Theory of Beams Structura Mechanics 2.8 Lecture 6 Semester Yr Lecture 6: Moderatey Large Defection Theory of Beams 6.1 Genera Formuation Compare to the cassica theory of beams with infinitesima deformation, the moderatey

More information

Determining The Degree of Generalization Using An Incremental Learning Algorithm

Determining The Degree of Generalization Using An Incremental Learning Algorithm Determining The Degree of Generaization Using An Incrementa Learning Agorithm Pabo Zegers Facutad de Ingeniería, Universidad de os Andes San Caros de Apoquindo 22, Las Condes, Santiago, Chie pzegers@uandes.c

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, we as various

More information

CONGRUENCES. 1. History

CONGRUENCES. 1. History CONGRUENCES HAO BILLY LEE Abstract. These are notes I created for a seminar tak, foowing the papers of On the -adic Representations and Congruences for Coefficients of Moduar Forms by Swinnerton-Dyer and

More information

Efficiently Generating Random Bits from Finite State Markov Chains

Efficiently Generating Random Bits from Finite State Markov Chains 1 Efficienty Generating Random Bits from Finite State Markov Chains Hongchao Zhou and Jehoshua Bruck, Feow, IEEE Abstract The probem of random number generation from an uncorreated random source (of unknown

More information

Math 124B January 17, 2012

Math 124B January 17, 2012 Math 124B January 17, 212 Viktor Grigoryan 3 Fu Fourier series We saw in previous ectures how the Dirichet and Neumann boundary conditions ead to respectivey sine and cosine Fourier series of the initia

More information

David Eigen. MA112 Final Paper. May 10, 2002

David Eigen. MA112 Final Paper. May 10, 2002 David Eigen MA112 Fina Paper May 1, 22 The Schrodinger equation describes the position of an eectron as a wave. The wave function Ψ(t, x is interpreted as a probabiity density for the position of the eectron.

More information

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract

Stochastic Complement Analysis of Multi-Server Threshold Queues. with Hysteresis. Abstract Stochastic Compement Anaysis of Muti-Server Threshod Queues with Hysteresis John C.S. Lui The Dept. of Computer Science & Engineering The Chinese University of Hong Kong Leana Goubchik Dept. of Computer

More information

Course 2BA1, Section 11: Periodic Functions and Fourier Series

Course 2BA1, Section 11: Periodic Functions and Fourier Series Course BA, 8 9 Section : Periodic Functions and Fourier Series David R. Wikins Copyright c David R. Wikins 9 Contents Periodic Functions and Fourier Series 74. Fourier Series of Even and Odd Functions...........

More information

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron

A Solution to the 4-bit Parity Problem with a Single Quaternary Neuron Neura Information Processing - Letters and Reviews Vo. 5, No. 2, November 2004 LETTER A Soution to the 4-bit Parity Probem with a Singe Quaternary Neuron Tohru Nitta Nationa Institute of Advanced Industria

More information

STA 216 Project: Spline Approach to Discrete Survival Analysis

STA 216 Project: Spline Approach to Discrete Survival Analysis : Spine Approach to Discrete Surviva Anaysis November 4, 005 1 Introduction Athough continuous surviva anaysis differs much from the discrete surviva anaysis, there is certain ink between the two modeing

More information

Discrete Techniques. Chapter Introduction

Discrete Techniques. Chapter Introduction Chapter 3 Discrete Techniques 3. Introduction In the previous two chapters we introduced Fourier transforms of continuous functions of the periodic and non-periodic (finite energy) type, as we as various

More information

CONVERGENCE RATES OF COMPACTLY SUPPORTED RADIAL BASIS FUNCTION REGULARIZATION

CONVERGENCE RATES OF COMPACTLY SUPPORTED RADIAL BASIS FUNCTION REGULARIZATION Statistica Sinica 16(2006), 425-439 CONVERGENCE RATES OF COMPACTLY SUPPORTED RADIAL BASIS FUNCTION REGULARIZATION Yi Lin and Ming Yuan University of Wisconsin-Madison and Georgia Institute of Technoogy

More information

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network

An Algorithm for Pruning Redundant Modules in Min-Max Modular Network An Agorithm for Pruning Redundant Modues in Min-Max Moduar Network Hui-Cheng Lian and Bao-Liang Lu Department of Computer Science and Engineering, Shanghai Jiao Tong University 1954 Hua Shan Rd., Shanghai

More information

Kernel Matching Pursuit

Kernel Matching Pursuit Kerne Matching Pursuit Pasca Vincent and Yoshua Bengio Dept. IRO, Université demontréa C.P. 6128, Montrea, Qc, H3C 3J7, Canada {vincentp,bengioy}@iro.umontrea.ca Technica Report #1179 Département d Informatique

More information

2M2. Fourier Series Prof Bill Lionheart

2M2. Fourier Series Prof Bill Lionheart M. Fourier Series Prof Bi Lionheart 1. The Fourier series of the periodic function f(x) with period has the form f(x) = a 0 + ( a n cos πnx + b n sin πnx ). Here the rea numbers a n, b n are caed the Fourier

More information

Notes: Most of the material presented in this chapter is taken from Jackson, Chap. 2, 3, and 4, and Di Bartolo, Chap. 2. 2π nx i a. ( ) = G n.

Notes: Most of the material presented in this chapter is taken from Jackson, Chap. 2, 3, and 4, and Di Bartolo, Chap. 2. 2π nx i a. ( ) = G n. Chapter. Eectrostatic II Notes: Most of the materia presented in this chapter is taken from Jackson, Chap.,, and 4, and Di Bartoo, Chap... Mathematica Considerations.. The Fourier series and the Fourier

More information

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries

First-Order Corrections to Gutzwiller s Trace Formula for Systems with Discrete Symmetries c 26 Noninear Phenomena in Compex Systems First-Order Corrections to Gutzwier s Trace Formua for Systems with Discrete Symmetries Hoger Cartarius, Jörg Main, and Günter Wunner Institut für Theoretische

More information

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION

CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION CONJUGATE GRADIENT WITH SUBSPACE OPTIMIZATION SAHAR KARIMI AND STEPHEN VAVASIS Abstract. In this paper we present a variant of the conjugate gradient (CG) agorithm in which we invoke a subspace minimization

More information

Completion. is dense in H. If V is complete, then U(V) = H.

Completion. is dense in H. If V is complete, then U(V) = H. Competion Theorem 1 (Competion) If ( V V ) is any inner product space then there exists a Hibert space ( H H ) and a map U : V H such that (i) U is 1 1 (ii) U is inear (iii) UxUy H xy V for a xy V (iv)

More information

Reichenbachian Common Cause Systems

Reichenbachian Common Cause Systems Reichenbachian Common Cause Systems G. Hofer-Szabó Department of Phiosophy Technica University of Budapest e-mai: gszabo@hps.ete.hu Mikós Rédei Department of History and Phiosophy of Science Eötvös University,

More information

Statistical Inference, Econometric Analysis and Matrix Algebra

Statistical Inference, Econometric Analysis and Matrix Algebra Statistica Inference, Econometric Anaysis and Matrix Agebra Bernhard Schipp Water Krämer Editors Statistica Inference, Econometric Anaysis and Matrix Agebra Festschrift in Honour of Götz Trenker Physica-Verag

More information

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces

Research Article Numerical Range of Two Operators in Semi-Inner Product Spaces Abstract and Appied Anaysis Voume 01, Artice ID 846396, 13 pages doi:10.1155/01/846396 Research Artice Numerica Range of Two Operators in Semi-Inner Product Spaces N. K. Sahu, 1 C. Nahak, 1 and S. Nanda

More information

BALANCING REGULAR MATRIX PENCILS

BALANCING REGULAR MATRIX PENCILS BALANCING REGULAR MATRIX PENCILS DAMIEN LEMONNIER AND PAUL VAN DOOREN Abstract. In this paper we present a new diagona baancing technique for reguar matrix pencis λb A, which aims at reducing the sensitivity

More information

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems

Componentwise Determination of the Interval Hull Solution for Linear Interval Parameter Systems Componentwise Determination of the Interva Hu Soution for Linear Interva Parameter Systems L. V. Koev Dept. of Theoretica Eectrotechnics, Facuty of Automatics, Technica University of Sofia, 1000 Sofia,

More information

Asynchronous Control for Coupled Markov Decision Systems

Asynchronous Control for Coupled Markov Decision Systems INFORMATION THEORY WORKSHOP (ITW) 22 Asynchronous Contro for Couped Marov Decision Systems Michae J. Neey University of Southern Caifornia Abstract This paper considers optima contro for a coection of

More information

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons

Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Expectation-Maximization for Estimating Parameters for a Mixture of Poissons Brandon Maone Department of Computer Science University of Hesini February 18, 2014 Abstract This document derives, in excrutiating

More information

Coupling of LWR and phase transition models at boundary

Coupling of LWR and phase transition models at boundary Couping of LW and phase transition modes at boundary Mauro Garaveo Dipartimento di Matematica e Appicazioni, Università di Miano Bicocca, via. Cozzi 53, 20125 Miano Itay. Benedetto Piccoi Department of

More information

Smoothness equivalence properties of univariate subdivision schemes and their projection analogues

Smoothness equivalence properties of univariate subdivision schemes and their projection analogues Numerische Mathematik manuscript No. (wi be inserted by the editor) Smoothness equivaence properties of univariate subdivision schemes and their projection anaogues Phiipp Grohs TU Graz Institute of Geometry

More information

Lecture Note 3: Stationary Iterative Methods

Lecture Note 3: Stationary Iterative Methods MATH 5330: Computationa Methods of Linear Agebra Lecture Note 3: Stationary Iterative Methods Xianyi Zeng Department of Mathematica Sciences, UTEP Stationary Iterative Methods The Gaussian eimination (or

More information

On Non-Optimally Expanding Sets in Grassmann Graphs

On Non-Optimally Expanding Sets in Grassmann Graphs ectronic Cooquium on Computationa Compexity, Report No. 94 (07) On Non-Optimay xpanding Sets in Grassmann Graphs Irit Dinur Subhash Khot Guy Kinder Dor Minzer Mui Safra Abstract The paper investigates

More information

Stochastic Variational Inference with Gradient Linearization

Stochastic Variational Inference with Gradient Linearization Stochastic Variationa Inference with Gradient Linearization Suppementa Materia Tobias Pötz * Anne S Wannenwetsch Stefan Roth Department of Computer Science, TU Darmstadt Preface In this suppementa materia,

More information

SVM-based Supervised and Unsupervised Classification Schemes

SVM-based Supervised and Unsupervised Classification Schemes SVM-based Supervised and Unsupervised Cassification Schemes LUMINITA STATE University of Pitesti Facuty of Mathematics and Computer Science 1 Targu din Vae St., Pitesti 110040 ROMANIA state@cicknet.ro

More information

FOURIER SERIES ON ANY INTERVAL

FOURIER SERIES ON ANY INTERVAL FOURIER SERIES ON ANY INTERVAL Overview We have spent considerabe time earning how to compute Fourier series for functions that have a period of 2p on the interva (-p,p). We have aso seen how Fourier series

More information

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC

A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC (January 8, 2003) A NOTE ON QUASI-STATIONARY DISTRIBUTIONS OF BIRTH-DEATH PROCESSES AND THE SIS LOGISTIC EPIDEMIC DAMIAN CLANCY, University of Liverpoo PHILIP K. POLLETT, University of Queensand Abstract

More information

Nonlinear Analysis of Spatial Trusses

Nonlinear Analysis of Spatial Trusses Noninear Anaysis of Spatia Trusses João Barrigó October 14 Abstract The present work addresses the noninear behavior of space trusses A formuation for geometrica noninear anaysis is presented, which incudes

More information

arxiv: v1 [math.fa] 23 Aug 2018

arxiv: v1 [math.fa] 23 Aug 2018 An Exact Upper Bound on the L p Lebesgue Constant and The -Rényi Entropy Power Inequaity for Integer Vaued Random Variabes arxiv:808.0773v [math.fa] 3 Aug 08 Peng Xu, Mokshay Madiman, James Mebourne Abstract

More information

Integrating Factor Methods as Exponential Integrators

Integrating Factor Methods as Exponential Integrators Integrating Factor Methods as Exponentia Integrators Borisav V. Minchev Department of Mathematica Science, NTNU, 7491 Trondheim, Norway Borko.Minchev@ii.uib.no Abstract. Recenty a ot of effort has been

More information

Physics 235 Chapter 8. Chapter 8 Central-Force Motion

Physics 235 Chapter 8. Chapter 8 Central-Force Motion Physics 35 Chapter 8 Chapter 8 Centra-Force Motion In this Chapter we wi use the theory we have discussed in Chapter 6 and 7 and appy it to very important probems in physics, in which we study the motion

More information

Nonlinear Gaussian Filtering via Radial Basis Function Approximation

Nonlinear Gaussian Filtering via Radial Basis Function Approximation 51st IEEE Conference on Decision and Contro December 10-13 01 Maui Hawaii USA Noninear Gaussian Fitering via Radia Basis Function Approximation Huazhen Fang Jia Wang and Raymond A de Caafon Abstract This

More information

Formulas for Angular-Momentum Barrier Factors Version II

Formulas for Angular-Momentum Barrier Factors Version II BNL PREPRINT BNL-QGS-06-101 brfactor1.tex Formuas for Anguar-Momentum Barrier Factors Version II S. U. Chung Physics Department, Brookhaven Nationa Laboratory, Upton, NY 11973 March 19, 2015 abstract A

More information

JENSEN S OPERATOR INEQUALITY FOR FUNCTIONS OF SEVERAL VARIABLES

JENSEN S OPERATOR INEQUALITY FOR FUNCTIONS OF SEVERAL VARIABLES PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Voume 128, Number 7, Pages 2075 2084 S 0002-99390005371-5 Artice eectronicay pubished on February 16, 2000 JENSEN S OPERATOR INEQUALITY FOR FUNCTIONS OF

More information

AST 418/518 Instrumentation and Statistics

AST 418/518 Instrumentation and Statistics AST 418/518 Instrumentation and Statistics Cass Website: http://ircamera.as.arizona.edu/astr_518 Cass Texts: Practica Statistics for Astronomers, J.V. Wa, and C.R. Jenkins, Second Edition. Measuring the

More information

c 2007 Society for Industrial and Applied Mathematics

c 2007 Society for Industrial and Applied Mathematics SIAM REVIEW Vo. 49,No. 1,pp. 111 1 c 7 Society for Industria and Appied Mathematics Domino Waves C. J. Efthimiou M. D. Johnson Abstract. Motivated by a proposa of Daykin [Probem 71-19*, SIAM Rev., 13 (1971),

More information