A simple remark on the order of approximation by compactly supported radial basis functions, and related networks

Xin Li
Department of Mathematical Sciences, University of Nevada, Las Vegas
E-mail: xinlixin@nevada.edu

Abstract

We consider simultaneous approximation of multivariate functions and their derivatives by using Wendland's compactly supported radial basis functions $\phi_{s,k}$. By applying a greedy algorithm, it is shown that, regardless of dimension, an $O(m^{-1/2})$ order of approximation can be achieved by a linear combination of $m$ translates of $\phi_{s,k}$. A similar result on approximation by neural networks is established by using univariate radial functions as the activation functions of the networks.

1. Introduction

Multivariate interpolation by radial basis functions has been studied and applied in several areas of mathematics, such as approximation theory (cf. Franke [6], Micchelli [11], Schaback [14]), curve and surface fitting (cf. Daehlen, Lyche, and Schumaker [4]), and the numerical solution of partial differential equations (cf. Golberg and Chen [7]). A function $\Phi$ is radial if $\Phi(\mathbf{x}) = \phi(\|\mathbf{x}\|)$, where $\phi : \mathbb{R}_+ \to \mathbb{R}$ is a univariate function and $\|\mathbf{x}\|$ is the usual Euclidean norm of $\mathbf{x} \in \mathbb{R}^s$. For $f \in C(\mathbb{R}^s)$ and a set $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\} \subset \mathbb{R}^s$ of distinct points, the radial basis function interpolant $s_{f,X}$ is given by
\[
s_{f,X}(\mathbf{x}) = \sum_{j=1}^{N} \alpha_j \,\phi(\|\mathbf{x} - \mathbf{x}_j\|), \tag{1.1}
\]
where the coefficients $\alpha_1, \dots, \alpha_N$ are determined by
\[
s_{f,X}(\mathbf{x}_j) = f(\mathbf{x}_j), \quad 1 \le j \le N. \tag{1.2}
\]
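As a concrete illustration of (1.1)-(1.2), the following Python sketch assembles and solves the interpolation system for scattered data, using the Wendland function $\phi_{3,1}(r) = (1-r)_+^4(4r+1)$ (the case $s=3$, $k=1$, up to a normalizing constant). The function names and the test data are illustrative only, not part of the paper.

```python
import numpy as np

def wendland_phi_3_1(r):
    # Wendland function phi_{3,1}, up to normalization:
    # (1 - r)^4_+ (4r + 1); C^2, supported in [0, 1], positive definite on R^3
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def rbf_interpolant(centers, values, phi):
    """Solve (1.2) for the coefficients alpha_j of the interpolant (1.1)."""
    # interpolation matrix A[j, l] = phi(||x_j - x_l||); it is positive
    # definite for a PD radial function, hence the system is solvable
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    alpha = np.linalg.solve(phi(dists), values)

    def s_fx(x):
        # s_{f,X}(x) = sum_j alpha_j phi(||x - x_j||)
        return alpha @ phi(np.linalg.norm(centers - x, axis=-1))

    return s_fx

# illustrative usage: interpolate f(x) = exp(-||x||^2) at 50 random centers
rng = np.random.default_rng(0)
X = rng.random((50, 3))
f = lambda x: np.exp(-np.sum(x * x, axis=-1))
s_fx = rbf_interpolant(X, f(X), wendland_phi_3_1)
print(abs(s_fx(X[0]) - f(X[0])))  # interpolation error at a center: ~0
```

Since the kernel matrix is positive definite, a plain `solve` suffices here; in practice the compact support makes the matrix sparse, which is one motivation for using Wendland functions.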
336 Boundary Element Technology

To ensure the solvability of the interpolation problem (1.1) and (1.2), positive definite functions are used. A continuous function $\Phi : \mathbb{R}^s \to \mathbb{R}$ is positive definite, denoted by $\Phi \in PD$, if for any $N \in \mathbb{Z}_+$, any set of pairwise distinct centers $X = \{\mathbf{x}_1, \dots, \mathbf{x}_N\} \subset \mathbb{R}^s$, and any vector $\alpha \in \mathbb{R}^N \setminus \{0\}$, the quadratic form
\[
\sum_{j=1}^{N} \sum_{k=1}^{N} \alpha_j \alpha_k \,\Phi(\mathbf{x}_j - \mathbf{x}_k)
\]
is positive. If $\Phi$ is compactly supported, written as $\Phi \in CS$, the coefficients $\alpha_1, \dots, \alpha_N$ in (1.1) are easy to determine by (1.2). A celebrated theorem of Bochner (cf. Stewart [15]) characterizes all positive definite functions. In the case that $\Phi$ is compactly supported, the theorem is interpreted as: $\Phi$ is positive definite if and only if its Fourier transform is nonnegative and positive on an open subset. Since the Fourier transform of a radial basis function $\Phi(\mathbf{x}) = \phi(\|\mathbf{x}\|)$ is given by
\[
\widehat{\Phi}(\boldsymbol{\omega}) = (2\pi)^{-s/2} \rho^{-(s-2)/2} \int_0^\infty \phi(r)\, r^{s/2} J_{(s-2)/2}(\rho r)\, dr, \tag{1.3}
\]
where $\rho = \|\boldsymbol{\omega}\|$ and $J_{(s-2)/2}$ is the Bessel function of the first kind, compactly supported radial basis functions are constructed in Wu [18] and Wendland [16].

In this paper, we consider approximation by Wendland functions, whose definition and properties we now review. Following Reference [16], let
\[
(I\phi)(r) = \int_r^\infty t\,\phi(t)\, dt,
\]
and
\[
\phi_{s,k} = I^k \phi_{\lfloor s/2 \rfloor + k + 1}, \tag{1.4}
\]
where $\phi_\ell(r) = (1-r)_+^\ell$, and $\lfloor x \rfloor$ denotes the largest integer $\le x$. It is shown in Reference [16] that $\phi_{s,k}$ is compactly supported in $[0,1]$ and induces a positive definite function on $\mathbb{R}^s$; on its support it agrees with a univariate polynomial, so that $\phi_{s,k}(r) = p_{s,k}(r)$ for $0 \le r \le 1$, where $r = \|\mathbf{x}\|$ and $p_{s,k}$ is a univariate polynomial of degree $\lfloor s/2 \rfloor + 3k + 1$. Moreover, $\phi_{s,k}$ possesses continuous derivatives up to order $2k$; it is of minimal degree for a given space dimension $s$ and smoothness $2k$, and is up to a constant factor uniquely determined by this setting. It is also shown in Reference [17] that the Fourier transform of $\Phi_{s,k}(\mathbf{x}) = \phi_{s,k}(\|\mathbf{x}\|)$ satisfies
\[
K_1 \big(1 + \|\boldsymbol{\omega}\|^2\big)^{-(s/2+k+1/2)} \le \widehat{\Phi}_{s,k}(\boldsymbol{\omega}) \le K_2 \big(1 + \|\boldsymbol{\omega}\|^2\big)^{-(s/2+k+1/2)} \tag{1.5}
\]
for some constants $K_1$ and $K_2$. It was derived in Theorem 2.2 of Reference [17], by applying the results of Wu and Schaback [19], that if $X = \{\mathbf{x}_1, \dots, \mathbf{x}_n\} \subset \Omega$ for some compact subset $\Omega$ of $\mathbb{R}^s$ satisfying a uniform interior cone condition, then for any $f \in H_\ell(\mathbb{R}^s)$, where $\ell = s/2 + k + 1/2$ and $H_\ell(\mathbb{R}^s)$ is the Sobolev space,
\[
\|f - s_{f,X}\|_{L_\infty(\Omega)} = O\big(h^{k+1/2}\big),
\]
with
\[
h := \sup_{\mathbf{x} \in \Omega} \min_{\mathbf{x}_j \in X} \|\mathbf{x} - \mathbf{x}_j\|
\]
being sufficiently small.

In this paper, instead of using radial interpolants, we apply a greedy algorithm and discuss approximation by convex linear combinations of translates of $\phi_{s,k}$. By applying the results in Reference [10], we show that, regardless of the dimension, an $O(m^{-1/2})$ order of approximation can be achieved by a linear combination of $m$ translates of $\phi_{s,k}$, which parallels a well known result in the literature of neural networks. Meanwhile, we derive a similar result on neural networks by using compactly supported univariate radial functions as the activation functions of the networks.

2. Greedy Algorithm and Order of Approximation by Compactly Supported Radial Basis Functions

We begin by describing a greedy algorithm. For more information on this subject, readers are referred to Jones [8], DeVore and Temlyakov [5], Davis, Mallat, and Avellaneda [2], and Donahue, Gurvits, Darken, and Sontag [3]. We state a result by Jones [8], and also Maurey in Pisier [12].

Lemma 1. If $f$ is in the closure of the convex hull of a set $G$ in a Hilbert space, with $\|g\| \le b$ for each $g \in G$, then for every $n \ge 1$ and every $c' > b^2 - \|f\|^2$, there is an $f_n$ in the convex hull of $n$ points in $G$ such that
\[
\|f - f_n\|^2 \le \frac{c'}{n}.
\]

An iterative procedure for achieving the above approximation order is provided by Jones, which we describe as a greedy algorithm as follows. Let $f_1 \in G$ be such that
\[
\|f - f_1\| \le \inf_{g \in G} \|f - g\| + \varepsilon_1.
\]
Inductively, for $k \ge 1$, let $\alpha_k \in [0,1]$ and $g_k \in G$ be
such that
\[
\big\|f - \big((1-\alpha_k) f_k + \alpha_k g_k\big)\big\| \le \inf_{\alpha \in [0,1],\, g \in G} \big\|f - \big((1-\alpha) f_k + \alpha g\big)\big\| + \varepsilon_k.
\]
Then
\[
f_{k+1} := (1-\alpha_k) f_k + \alpha_k g_k \longrightarrow f.
\]
This algorithm is greedy in the sense that we choose an (essentially) optimal approximation at each step. As is well understood, to ensure that such an algorithm converges, certain conditions need to be imposed on the function $f$. Lemma 1 was first applied by Barron [1] to establish a well known result on approximating a multivariate function at the order of $O(m^{-1/2})$ by a network with $m$ neurons (cf. Section 3). It was then applied by the author [10] to derive a similar result on simultaneous approximation by translated radial networks, which we here specialize to compactly supported radial basis functions.

For a region $\Omega \subset \mathbb{R}^s$, denote by $H_n(\Omega)$ the Sobolev space consisting of all distributions $f$ on $\Omega$ with $D^{\mathbf{k}} f \in L_2(\Omega)$ for any $\mathbf{k} \in \mathbb{Z}_+^s$, $|\mathbf{k}| \le n$, where $n \ge 0$ is an integer, with
\[
\|f\|_{H_n(\Omega)} = \Big( \sum_{|\mathbf{k}| \le n} \|D^{\mathbf{k}} f\|_{L_2(\Omega)}^2 \Big)^{1/2}.
\]
When $\Omega = \mathbb{R}^s$, an equivalent norm of $f \in H_n(\mathbb{R}^s)$ is given by
\[
\Big( \int_{\mathbb{R}^s} |\widehat{f}(\boldsymbol{\omega})|^2 \big(1 + \|\boldsymbol{\omega}\|^2\big)^n \, d\boldsymbol{\omega} \Big)^{1/2}.
\]
In this paper, we consider approximation in $\mathcal{H}_n := H_n([-\pi,\pi]^s)$.

For the compactly supported $\phi_{s,k}$, we define a function $G$ by
\[
G(\mathbf{x}) = \sum_{\mathbf{k} \in \mathbb{Z}^s} \phi_{s,k}(\|\mathbf{x} - 2\pi \mathbf{k}\|), \tag{2.1}
\]
which is $2\pi$-periodic in each coordinate. (The function $G$ introduced in (2.1) differs slightly from the one in Reference [10], where scaled translates of $\phi$ are summed to ensure the study of simultaneous approximation.) For $f \in L_2([-\pi,\pi]^s)$, its Fourier series expansion is
\[
f(\mathbf{x}) = \sum_{\mathbf{k} \in \mathbb{Z}^s} A_{\mathbf{k}}(f)\, e^{i \langle \mathbf{k}, \mathbf{x} \rangle},
\]
where
\[
A_{\mathbf{k}}(f) = (2\pi)^{-s} \int_{[-\pi,\pi]^s} f(\mathbf{x})\, e^{-i \langle \mathbf{k}, \mathbf{x} \rangle} \, d\mathbf{x}
\]
are the Fourier coefficients of $f$. Notice that
\[
A_{\mathbf{k}}(G) = (2\pi)^{-s} \int_{\mathbb{R}^s} \phi_{s,k}(\|\mathbf{x}\|)\, e^{-i \langle \mathbf{k}, \mathbf{x} \rangle}\, d\mathbf{x} = \widehat{\Phi}_{s,k}(\mathbf{k}). \tag{2.2}
\]
For $f \in L_2([-\pi,\pi]^s)$, let
\[
E_f = \sum_{\mathbf{k} \in \mathbb{Z}^s} \big| A_{\mathbf{k}}(f) / \widehat{\Phi}_{s,k}(\mathbf{k}) \big|;
\]
by (1.5), $E_f < \infty$ whenever $\sum_{\mathbf{k}} |A_{\mathbf{k}}(f)| (1 + \|\mathbf{k}\|^2)^{s/2+k+1/2} < \infty$. Then the following lemma follows from (1.5), (2.2), and Lemma 4.3 of Reference [10] and its proof, which is similar in spirit to the proof of Lemma 5 given in the next section.

Lemma 2. Let $G$ be given in (2.1) with respect to $\phi_{s,k}$, and let $2k > n$. Then for any $f$ with $E_f < \infty$, $f - A_0(f)$ is in the closure, in the $\mathcal{H}_n$ norm, of the convex hull of
\[
\Omega_{E_f}(G) = \{ c\, G(\cdot - \mathbf{t}) : \mathbf{t} \in [-\pi,\pi]^s,\ c \in \mathbb{R} \text{ with } |c| \le E_f \}.
\]

In the above lemma and the following results, we require $2k > n$ to ensure $\Phi_{s,k} \in \mathcal{H}_n$, since $\phi_{s,k} \in C^{2k}$. As an immediate consequence of Lemmas 1 and 2, we arrive at the following result.

Proposition 3. Let $f$ satisfy $E_f < \infty$. Let $\phi_{s,k}$ denote the compactly supported radial basis function of minimal degree that is positive definite on $\mathbb{R}^s$ and belongs to $C^{2k}$, where $2k > n$. Then for any integer $m \ge 1$, there is a function
\[
N_m(\mathbf{x}) = A_0(f) + \sum_{j=1}^{2^s m} c_j\, \phi_{s,k}(\|\mathbf{x} - \mathbf{t}_j\|),
\]
where $c_j \in \mathbb{R}$, $\mathbf{t}_j \in [-3\pi, 3\pi]^s$ for $1 \le j \le 2^s m$, such that
\[
\|f - N_m\|_{\mathcal{H}_n} \le \frac{C_f}{\sqrt{m}},
\]
where $C_f$ is a constant independent of $m$.

Proof. By Lemmas 1 and 2, there is a function
\[
G_m(\mathbf{x}) = A_0(f) + \sum_{j=1}^{m} c_j\, G(\mathbf{x} - \mathbf{t}_j),
\]
where $c_j \in \mathbb{R}$, $\mathbf{t}_j \in [-\pi,\pi]^s$, for $1 \le j \le m$, such that
\[
\|f - G_m\|_{\mathcal{H}_n} \le \frac{C_f}{\sqrt{m}}.
\]
Notice that for each $\mathbf{t} \in [-\pi,\pi]^s$, by (2.1), $G(\cdot - \mathbf{t})$ is a sum of at most $2^s$ functions $\phi_{s,k}(\|\cdot - \mathbf{t} - 2\pi\mathbf{k}\|)$, $\mathbf{k} \in \mathbb{Z}^s$, that do not vanish identically on $[-\pi,\pi]^s$, since $\Phi_{s,k}(\cdot) = \phi_{s,k}(\|\cdot\|)$ is supported in $[-1,1]^s$. The conclusion then follows.

3. Neural Networks by Compactly Supported Functions

A neural network with one hidden layer is mathematically expressed as
\[
N(\mathbf{x}) = \sum_{k=1}^{n} c_k\, \sigma(\langle \mathbf{w}_k, \mathbf{x} \rangle + \theta_k), \tag{3.1}
\]
where $\sigma$ is the activation function of the network, $c_k \in \mathbb{R}$, $\mathbf{w}_k \in \mathbb{R}^s$, $\theta_k \in \mathbb{R}$ for $1 \le k \le n$, and $n$ is the number of neurons. Approximation by the networks in (3.1) has been extensively studied in the literature, with various results established under a variety of conditions (cf. [1,9] and the references therein). In this section we use the compactly supported univariate function $\phi_{1,k}(|x|)$ to construct neural networks, and establish the following result.

Proposition 4. Let $f$ satisfy $\sum_{\mathbf{j} \in \mathbb{Z}^s} |A_{\mathbf{j}}(f)| (1 + \|\mathbf{j}\|^2)^{k+1} < \infty$. Let $\phi_{1,k}$ denote the compactly supported univariate radial basis function of minimal degree that is positive definite on $\mathbb{R}$ and belongs to $C^{2k}$, where $2k > n$. Then for any $m \ge 1$, there is a network of the form
\[
N_m(\mathbf{x}) = A_0(f) + \sum_{j=1}^{(s+2)m} c_j\, \phi_{1,k}(|\langle \mathbf{w}_j, \mathbf{x} \rangle + \theta_j|)
\]
such that
\[
\|f - N_m\|_{\mathcal{H}_n} \le \frac{C_f}{\sqrt{m}},
\]
where $C_f$ is independent of $m$.

Let
\[
\sigma(x) = \sum_{n \in \mathbb{Z}} \phi_{1,k}(|x + 2n\pi|). \tag{3.2}
\]
For a constant $M > 0$, set
\[
\Omega_M(\sigma) = \{ c\, \sigma(\langle \mathbf{w}, \cdot \rangle + \theta) : c, \theta \in \mathbb{R},\ \mathbf{w} \in \mathbb{R}^s,\ \text{with } |c| \le M,\ \|\mathbf{w}\| \le 1,\ |\theta| \le \pi \}. \tag{3.3}
\]
Then we have the following lemma. For the reader's convenience, we include its proof, which uses an argument similar to that of Reference [10].

Lemma 5. Let $\sigma$ be given in (3.2) with respect to $\phi_{1,k}$, and let $2k > n$. Then for any $f$ with
\[
M_f = \sum_{\mathbf{j} \in \mathbb{Z}^s} \big| A_{\mathbf{j}}(f) / \widehat{\phi}_{1,k}(\|\mathbf{j}\|_1) \big| < \infty
\]
(finite, in particular, under the hypothesis of Proposition 4 by the univariate case of (1.5)), $f(\mathbf{x}) - A_0(f)$ is in the closure of the convex hull of $\Omega_{M_f}(\sigma)$ in the $\mathcal{H}_n$ norm.
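The objects in (3.1)-(3.3) can be sketched numerically as follows; this is an illustrative Python sketch (names and parameters are hypothetical), using the univariate Wendland function $\phi_{1,1}(r) = (1-r)_+^3(3r+1)$ (the case $k=1$, up to normalization) and random weights respecting $\|\mathbf{w}_k\| \le 1$, $|\theta_k| \le \pi$, rather than the weights produced by the greedy algorithm.

```python
import numpy as np

def phi_1_1(r):
    # univariate Wendland function phi_{1,1}, up to normalization:
    # (1 - r)^3_+ (3r + 1); C^2, supported in [0, 1], positive definite on R
    return np.where(r < 1.0, (1.0 - r) ** 3 * (3.0 * r + 1.0), 0.0)

def sigma(x):
    # periodized activation (3.2): sum_n phi_{1,1}(|x + 2 n pi|);
    # the support [0, 1] of phi_{1,1} leaves only finitely many terms
    x = np.asarray(x, dtype=float)
    n_max = int(np.max(np.abs(x)) / (2.0 * np.pi)) + 1
    return sum(phi_1_1(np.abs(x + 2.0 * np.pi * n))
               for n in range(-n_max, n_max + 1))

def network(x, c, W, theta):
    # one-hidden-layer network (3.1): sum_k c_k sigma(<w_k, x> + theta_k)
    return c @ sigma(W @ x + theta)

# illustrative usage with random parameters respecting (3.3)
rng = np.random.default_rng(1)
s_dim, n_units = 3, 8
W = rng.normal(size=(n_units, s_dim))
W /= np.maximum(1.0, np.linalg.norm(W, axis=1, keepdims=True))  # ||w_k|| <= 1
c = rng.normal(size=n_units)
theta = rng.uniform(-np.pi, np.pi, size=n_units)                # |theta_k| <= pi
x = rng.uniform(-np.pi, np.pi, size=s_dim)
print(network(x, c, W, theta))
```

Note that `sigma` is $2\pi$-periodic by construction, and on $[-\pi,\pi]$ it reduces to the single term $\phi_{1,1}(|x|)$, matching the truncation used in the proof of Proposition 4.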
Proof. Without loss of generality, assume $A_0(f) = 0$. Denote the closure of the convex hull of $\Omega_{M_f}(\sigma)$ by $\mathrm{Co}(\Omega_{M_f}(\sigma))$. Then $\mathrm{Co}(\Omega_{M_f}(\sigma))$ is a bounded subset of $\mathcal{H}_n$. Suppose that $f$ is not in $\mathrm{Co}(\Omega_{M_f}(\sigma))$. Then, by a standard argument on the dual space and an application of the Hahn-Banach Theorem (cf. [13], for instance), there exist $g_\alpha \in L_2[-\pi,\pi]^s$, $|\alpha| \le n$, such that
\[
\sup_{h \in \mathrm{Co}(\Omega_{M_f}(\sigma))} \sum_{|\alpha| \le n} \int_{[-\pi,\pi]^s} D^\alpha h \; g_\alpha \, d\mathbf{x} \le r < \sum_{|\alpha| \le n} \int_{[-\pi,\pi]^s} D^\alpha f \; g_\alpha \, d\mathbf{x} \tag{3.4}
\]
for some constant $r$. Observe that for any $\varepsilon > 0$ one can easily construct $\tilde{g}_\alpha$ with partial derivatives of all orders, compactly supported in $[-\pi,\pi]^s$, such that $\|g_\alpha - \tilde{g}_\alpha\|_{L_2[-\pi,\pi]^s} < \varepsilon$. Therefore, without loss of generality, we assume that the functions $g_\alpha$ in (3.4) have partial derivatives of all orders and are compactly supported in $[-\pi,\pi]^s$. Integrating by parts in (3.4) yields
\[
\sum_{|\alpha| \le n} (-1)^{|\alpha|} \int_{[-\pi,\pi]^s} h \, D^\alpha g_\alpha \, d\mathbf{x} \le r < \sum_{|\alpha| \le n} (-1)^{|\alpha|} \int_{[-\pi,\pi]^s} f \, D^\alpha g_\alpha \, d\mathbf{x} \tag{3.5}
\]
for any $h \in \mathrm{Co}(\Omega_{M_f}(\sigma))$. Let
\[
s(\mathbf{x}) = \sum_{|\alpha| \le n} (-1)^{|\alpha|} D^\alpha g_\alpha(\mathbf{x}).
\]
Then for any $|c| \le M_f$, $\|\mathbf{w}\| \le 1$, $|\theta| \le \pi$,
\[
c \int_{[-\pi,\pi]^s} \sigma(\langle \mathbf{w}, \mathbf{x} \rangle + \theta)\, s(\mathbf{x})\, d\mathbf{x} \le r < \int_{[-\pi,\pi]^s} f(\mathbf{x})\, s(\mathbf{x})\, d\mathbf{x}. \tag{3.6}
\]
Then (3.6) implies
\[
M_f \left| \int_{[-\pi,\pi]^s} \sigma(\langle \mathbf{w}, \mathbf{x} \rangle + \theta)\, s(\mathbf{x})\, d\mathbf{x} \right| \le r. \tag{3.7}
\]
For any multi-integer $\mathbf{j} \in \mathbb{Z}^s$, $\mathbf{j} \ne \mathbf{0}$, let $\mathbf{w}_{\mathbf{j}} = \mathbf{j}/\|\mathbf{j}\|_1$; hence $\|\mathbf{w}_{\mathbf{j}}\| \le 1$. Expanding the $2\pi$-periodic function $\sigma$ in its Fourier series $\sigma(t) = \sum_{\nu \in \mathbb{Z}} \widehat{\phi}_{1,k}(\nu)\, e^{i\nu t}$, we obtain
\[
\int_{[-\pi,\pi]^s} \sigma(\langle \mathbf{w}_{\mathbf{j}}, \mathbf{x} \rangle + \theta)\, s(\mathbf{x})\, d\mathbf{x} = \sum_{\nu \in \mathbb{Z}} \widehat{\phi}_{1,k}(\nu)\, e^{i\nu\theta} \int_{[-\pi,\pi]^s} e^{i\nu \langle \mathbf{w}_{\mathbf{j}}, \mathbf{x} \rangle}\, s(\mathbf{x})\, d\mathbf{x}.
\]
Isolating the term $\nu = \|\mathbf{j}\|_1$ (multiply by $e^{-i\|\mathbf{j}\|_1 \theta}$ and average over $\theta \in [-\pi,\pi]$; note that $\nu\, \mathbf{w}_{\mathbf{j}} = \mathbf{j}$ for this $\nu$), from (3.7) we have
\[
(2\pi)^s \big|\widehat{\phi}_{1,k}(\|\mathbf{j}\|_1)\big|\, |A_{\mathbf{j}}(s)| \le \frac{r}{M_f}.
\]
According to the definition of $M_f$, we then have
\[
\left| \int_{[-\pi,\pi]^s} f(\mathbf{x})\, s(\mathbf{x})\, d\mathbf{x} \right|
= (2\pi)^s \Big| \sum_{\mathbf{j} \ne \mathbf{0}} A_{\mathbf{j}}(f)\, \overline{A_{\mathbf{j}}(s)} \Big|
\le \sum_{\mathbf{j} \ne \mathbf{0}} \frac{|A_{\mathbf{j}}(f)|}{|\widehat{\phi}_{1,k}(\|\mathbf{j}\|_1)|} \cdot \frac{r}{M_f} \le r,
\]
which contradicts (3.5). The proof of the lemma is finished.

Proof of Proposition 4. First, by Lemmas 1 and 5, we conclude that there is a network
\[
N_m(\mathbf{x}) = A_0(f) + \sum_{k=1}^{m} c_k\, \sigma(\langle \mathbf{w}_k, \mathbf{x} \rangle + \theta_k), \quad \|\mathbf{w}_k\| \le 1,\ |\theta_k| \le \pi,
\]
such that
\[
\|f - N_m\|_{\mathcal{H}_n} \le \frac{C_f}{\sqrt{m}}.
\]
Observe that $|\langle \mathbf{w}_k, \mathbf{x} \rangle + \theta_k| \le (s+1)\pi$ for $\mathbf{x} \in [-\pi,\pi]^s$, and for $t \in [-(s+1)\pi, (s+1)\pi]$, $\sigma(t)$ is a sum of at most $s+2$ functions $\phi_{1,k}(|t + 2n\pi|)$, $|2n| \le s+1$. Therefore, the conclusion of Proposition 4 follows.

Acknowledgement: The author thanks the referees for their helpful comments.

References

[1] A. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theory, 39 (1993), 930-945.
[2] G. Davis, S. Mallat, and M. Avellaneda, Adaptive greedy approximations, Constr. Approx., 13 (1997), 57-98.
[3] M.J. Donahue, L. Gurvits, C. Darken, and E. Sontag, Rates of convex approximation in non-Hilbert spaces, Constr. Approx., 13 (1997), 187-220.
[4] M. Daehlen, T. Lyche, and L.L. Schumaker, Mathematical Methods for Curves and Surfaces, Vanderbilt University Press, Nashville & London, 1995.
[5] R.A. DeVore and V.N. Temlyakov, Some remarks on greedy algorithms, Advances in Computational Mathematics, 5 (1996), 173-187.
[6] R. Franke, Scattered data interpolation: tests of some methods, Mathematics of Computation, 38 (1982), 181-200.
[7] M.A. Golberg and C.S. Chen, Discrete Projection Methods for Integral Equations, Computational Mechanics Publications, Southampton, Boston, 1997.
[8] L.K. Jones, A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training, Ann. Statist., 20 (1992), 608-613.
[9] X. Li, Simultaneous approximations of multivariate functions and their derivatives by neural networks with one hidden layer, Neurocomputing, 12 (1996), 327-343.
[10] X. Li, On simultaneous approximations by radial basis function neural networks, Applied Mathematics and Computation, 95 (1998), 75-89.
[11] C.A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constr. Approx., 2 (1986), 11-22.
[12] G. Pisier, Remarques sur un résultat non publié de B. Maurey, Séminaire d'Analyse Fonctionnelle, vol. 1-12, École Polytechnique, Palaiseau, 1981.
[13] W. Rudin, Functional Analysis, McGraw-Hill, New York, 1973.
[14] R. Schaback, Improved error bounds for scattered data interpolation by radial basis functions, Math. Comp., to appear.
[15] J. Stewart, Positive definite functions and generalizations, an historical survey, Rocky Mountain J. Math., 6 (1976), 409-434.
[16] H. Wendland, Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree, Advances in Computational Mathematics, 4 (1995), 389-396.
[17] H. Wendland, Error estimates for interpolation by compactly supported radial basis functions of minimal degree, J. Approx. Theory, 93 (1998), 258-272.
[18] Z. Wu, Compactly supported positive definite radial functions, Advances in Computational Mathematics, 4 (1995), 283-292.
[19] Z. Wu and R. Schaback, Local error estimates for radial basis function interpolation of scattered data, IMA Journal of Numerical Analysis, 13 (1993), 13-27.