Approximation of Functions by Multivariable Hermite Basis: A Hybrid Method


Approximation of Functions by Multivariable Hermite Basis: A Hybrid Method

Bartlomiej Beliczynski
Warsaw University of Technology, Institute of Control and Industrial Electronics, ul. Koszykowa 75, 00-662 Warszawa, Poland
Bartlomiej.Beliczynski@ee.pw.edu.pl

Abstract. In this paper an approximation of multivariable functions by a Hermite basis is presented and discussed. The basis considered here is constructed as a product of one-variable Hermite functions with adjustable scaling parameters. The approximation is calculated via a hybrid method: the expansion coefficients by explicit, non-search formulae, and the scaling parameters via a search algorithm. An excessive number of Hermite functions is initially calculated; to constitute the approximation basis, only those functions are taken which ensure the fastest error decrease down to a desired level. Working examples are presented, demonstrating a very good generalization property of this method.

Keywords: Function approximation, Neural networks, Orthonormal basis.

1 Introduction

Thanks to their elegance and usefulness, Hermite polynomials and Hermite functions have for many years been attractive in various fields of science and engineering. They have proved to be useful tools in the quantum mechanics of harmonic oscillators, in ultra high band telecommunication channels, in ECG data compression, and in various sorts of approximation tasks. A set of Hermite functions forms an orthonormal basis, which is naturally suitable for approximation, classification and data compression tasks. These basis functions are defined over the real numbers R and can be calculated recursively. The expansion coefficients of the approximating function can be determined relatively easily so as to achieve the best-approximation property. Since Hermite functions are eigenfunctions of the Fourier transform, time and frequency spectra are approximated simultaneously. Each subsequent basis function extends the frequency bandwidth within a limited range of well-concentrated energy; see for instance [1]. By introducing a scaling parameter we may control this bandwidth, influencing at the same time the dynamic range of the input argument. As pointed out in [2], the product of time and frequency bandwidths for Hermite functions is the largest over the set of continuous functions. Hermite functions display various geometrical shapes controlled by simple parameter(s). It was suggested to use Hermite functions as activation functions in neural schemes.

A. Dobnikar, U. Lotrič, and B. Šter (Eds.): ICANNGA 2011, Part I, LNCS 6593, pp. 130-139, 2011.
© Springer-Verlag Berlin Heidelberg 2011

In [3], a so-called constructive approximation scheme is used. It is a type of incremental approximation developed in [4], [5]. Every node in the hidden layer has a different activation function, so intuitively the most appropriate shape can be applied. In such an approach, however, the orthogonality of Hermite functions is not really exploited. If one-variable Hermite functions are extended to two variables, the approximation retains the same useful properties and turns out to be very suitable for image compression tasks. In the n-variable case the main features are the same, but the whole process becomes more complicated.

The biggest advantage of approximation by a Hermite basis is that, due to its orthonormality, the approximation does not involve search algorithms. However, in an initial step of the approximation one has to consider the time and frequency bandwidths. In the one-variable case these two bandwidths can be controlled by a single scaling parameter, which can be selected to some extent arbitrarily. Choosing appropriate scaling parameters in a multivariable case is much more difficult. We therefore suggest using a search algorithm for the scaling parameters, while the expansion coefficients are calculated explicitly via appropriate formulae. Because approximation by an orthonormal basis is numerically very efficient, one can take advantage of that and calculate a larger number of basis functions, then select from them only those which contribute the most to the decrease of the approximation error. It seems that this basis selection procedure is the main reason for the good generalization property of this method.

This paper is organized as follows. In Section 2, basic facts about approximation needed for later use are recalled. In Section 3, one-variable Hermite functions, the basic components for the multivariable case, are shortly described. We then present our results in Section 4, describing the multivariable Hermite basis construction, scaling parameter selection, the final choice of basis functions, and working examples. Finally, in Section 5, conclusions are drawn.

2 Approximation Framework

Some selected facts on function approximation useful for this paper will be recalled. Let us consider the following function

    f_{n+1} = \sum_{i=0}^{n} w_i g_i,    (1)

where g_i \in G \subset H, H is a Hilbert space H = (H, \|\cdot\|), and w_i \in \mathbb{R}, i = 0, ..., n. For any function f from a Hilbert space H and a closed (finite-dimensional) subspace G \subset H with basis {g_0, ..., g_n}, there exists a unique best approximation of f by elements of G ([6]). Let us denote it by g_b. Because the error of the best approximation is orthogonal to all elements of the approximation space, f - g_b \perp G, the coefficients w_i may be calculated from the following set of linear equations:

    \langle g_i, f - g_b \rangle = 0  for i = 0, ..., n,    (2)

where \langle \cdot, \cdot \rangle denotes the inner product.

Formula (2) can also be written as \langle g_i, f - \sum_{k=0}^{n} w_k g_k \rangle = \langle g_i, f \rangle - \sum_{k=0}^{n} w_k \langle g_i, g_k \rangle = 0 for i = 0, ..., n, or in the matrix form

    \Gamma w = G_f,    (3)

where \Gamma = [\langle g_i, g_j \rangle], i, j = 0, ..., n, w = [w_0, ..., w_n]^T, G_f = [\langle g_0, f \rangle, ..., \langle g_n, f \rangle]^T, and T denotes transposition. Because there exists a unique best approximation of f in the (n+1)-dimensional space G with basis {g_0, ..., g_n}, the matrix \Gamma is nonsingular and w_b = \Gamma^{-1} G_f.

For any basis {g_0, ..., g_n} one can find an orthonormal basis {e_0, ..., e_n}, with \langle e_i, e_j \rangle = 1 when i = j and \langle e_i, e_j \rangle = 0 when i \neq j, such that span{g_0, ..., g_n} = span{e_0, ..., e_n}. In such a case \Gamma is the identity matrix and

    w_b = [\langle e_0, f \rangle, \langle e_1, f \rangle, ..., \langle e_n, f \rangle]^T.    (4)

Finally, (1) takes the form

    f_{n+1} = \sum_{i=0}^{n} \langle e_i, f \rangle e_i.    (5)

The squared error error_{n+1} = \langle f - f_{n+1}, f - f_{n+1} \rangle of the best approximation of a function f in the basis {e_0, ..., e_n} is thus expressible by

    error_{n+1} = \|f\|^2 - \sum_{i=0}^{n} w_i^2.    (6)

In a typically stated approximation problem, a basis of n+1 functions {e_0, e_1, ..., e_n} is given and we are looking for their expansion coefficients w_i = \langle e_i, f \rangle, i = 0, 1, ..., n. According to formula (6), those expansion coefficients contribute directly to the error decrease, and they can be used to order the basis from the most to the least significant as far as the error decrease is concerned.
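This framework translates directly into a few lines of linear algebra. The sketch below is an illustration, not code from the paper: it discretizes the inner product as a Riemann sum on a uniform grid (an assumption), solves (3) for a general basis, and applies the shortcut (4) for an orthonormal one. Function names are mine.

```python
import numpy as np

def best_approx_coeffs(G, f, x):
    """Best-approximation coefficients, eq. (3): solve Gamma w = G_f.
    G: (n+1, N) array, row i holds g_i sampled on the uniform grid x;
    f: (N,) samples of the target function."""
    dx = x[1] - x[0]                   # uniform grid spacing
    Gamma = (G @ G.T) * dx             # Gamma[i, j] ~ <g_i, g_j>
    Gf = (G @ f) * dx                  # Gf[i] ~ <g_i, f>
    return np.linalg.solve(Gamma, Gf)  # w_b = Gamma^{-1} G_f

def orthonormal_coeffs(E, f, x):
    """For an orthonormal basis, Gamma is the identity and eq. (4) gives
    the coefficients directly: w_i ~ <e_i, f>."""
    return (E @ f) * (x[1] - x[0])
```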

3 One-Variable Hermite Functions

Our multivariable basis for approximation will be composed of one-variable Hermite functions, so we briefly describe these components. Let us consider the space L^2(-\infty, +\infty) with the inner product defined as \langle x, y \rangle = \int_{-\infty}^{+\infty} x(t) y(t)\, dt. In such a space, a sequence of orthonormal functions can be defined as follows (see for instance [6]):

    h_0(t), h_1(t), ..., h_n(t), ...    (7)

where

    h_n(t) = c_n e^{-t^2/2} H_n(t);  H_n(t) = (-1)^n e^{t^2} \frac{d^n}{dt^n}(e^{-t^2});  c_n = (2^n n! \sqrt{\pi})^{-1/2},    (8)

and H_n(t) is a polynomial. The polynomials H_n(t) are called Hermite polynomials and the functions h_n(t) Hermite functions. According to (8), the first several Hermite functions can be calculated:

    h_0(t) = \frac{1}{\pi^{1/4}} e^{-t^2/2};  h_1(t) = \frac{\sqrt{2}}{\pi^{1/4}} e^{-t^2/2}\, t;    (9)

    h_2(t) = \frac{1}{2\sqrt{2}\,\pi^{1/4}} e^{-t^2/2} (4t^2 - 2);  h_3(t) = \frac{1}{4\sqrt{3}\,\pi^{1/4}} e^{-t^2/2} (8t^3 - 12t).    (10)

Plots of several Hermite functions are shown in Fig. 1.

Fig. 1. Hermite functions h_0, h_3, h_9

One can see that increasing the index of a Hermite function enlarges its bandwidth in both time and frequency. So, when approximating a function, it is reasonable to start from the lower-index basis functions and gradually go to higher ones.
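The Introduction notes that the basis can be calculated recursively. A minimal numpy sketch (my implementation, not the paper's) evaluates h_0, ..., h_n via the standard three-term recurrence h_k(t) = \sqrt{2/k}\, t\, h_{k-1}(t) - \sqrt{(k-1)/k}\, h_{k-2}(t), which avoids forming the polynomials H_n of (8) explicitly:

```python
import numpy as np

def hermite_funcs(n, t):
    """Evaluate the orthonormal Hermite functions h_0 .. h_n of eq. (8)
    on the grid t, via the standard three-term recurrence."""
    t = np.asarray(t, dtype=float)
    H = np.empty((n + 1, t.size))
    H[0] = np.pi ** -0.25 * np.exp(-t ** 2 / 2)   # h_0, eq. (9)
    if n >= 1:
        H[1] = np.sqrt(2.0) * t * H[0]            # h_1, eq. (9)
    for k in range(2, n + 1):
        H[k] = np.sqrt(2.0 / k) * t * H[k - 1] - np.sqrt((k - 1) / k) * H[k - 2]
    return H
```

For n = 3 this reproduces (9) and (10) to machine precision.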

If the function to be approximated is not located in the argument range of the Hermite functions displayed in Fig. 1, one can modify the basis (7) by scaling the variable t via a parameter \sigma \in (0, \infty). If one substitutes t := t/\sigma into (8) and modifies c_n to ensure orthonormality, then

    h_n(t, \sigma) = c_{n,\sigma}\, e^{-\frac{t^2}{2\sigma^2}} H_n\!\left(\frac{t}{\sigma}\right),  where c_{n,\sigma} = (\sigma\, 2^n n! \sqrt{\pi})^{-1/2},    (11)

and

    h_n(t, \sigma) = \frac{1}{\sqrt{\sigma}} h_n\!\left(\frac{t}{\sigma}\right)  and  \hat{h}_n(\omega, \sigma) = \sqrt{\sigma}\, \hat{h}_n(\sigma \omega).    (12)

Note that h_n as defined by (11) is a two-argument function, whereas h_n as defined by (8) has only one argument; these functions are related by (12). Thus, by introducing the scaling parameter \sigma into (11), one may adjust both the dynamic range of the input argument of h_n(t, \sigma) and its frequency bandwidth:

    t \in [-\sigma\sqrt{2n+1}, \sigma\sqrt{2n+1}];  \omega \in \left[-\frac{\sqrt{2n+1}}{\sigma}, \frac{\sqrt{2n+1}}{\sigma}\right].    (13)

Suppose that a one-variable function f defined over the argument range t \in [-t_{max}, t_{max}] has to be approximated by using Hermite expansions, and assume that the retained angular frequency should be at least \omega_r. Then, according to (13), the following two conditions should be fulfilled:

    \sigma\sqrt{2n+1} \geq t_{max}  and  \frac{\sqrt{2n+1}}{\sigma} \geq \omega_r,    (14)

i.e. \sigma \in [\sigma_l, \sigma_h], where

    \sigma_l = \frac{t_{max}}{\sqrt{2n+1}}  and  \sigma_h = \frac{\sqrt{2n+1}}{\omega_r}.    (15)

One would expect that \sigma_l \leq \sigma_h, which is equivalent to

    t_{max}\, \omega_r \leq 2n+1.    (16)

In order to preserve orthonormality of the set {h_0(t, \sigma), h_1(t, \sigma), ..., h_n(t, \sigma)}, \sigma must be chosen the same for all functions h_i(t, \sigma), i = 0, ..., n. The loss of basis orthonormality due to basis truncation, widely discussed on such occasions, is in many practical cases not crucial [7].
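A small sketch of the scaled basis (12) and of the admissible scaling interval (15); it reuses hermite_funcs from the earlier sketch, and the function names are mine:

```python
import numpy as np

def hermite_funcs_scaled(n, t, sigma):
    """Scaled orthonormal Hermite functions of eq. (12):
    h_n(t, sigma) = sigma^{-1/2} h_n(t / sigma)."""
    return hermite_funcs(n, np.asarray(t, dtype=float) / sigma) / np.sqrt(sigma)

def sigma_interval(n, t_max, omega_r):
    """Admissible scaling interval [sigma_l, sigma_h] of eq. (15); it is
    empty (sigma_l > sigma_h) unless t_max * omega_r <= 2n + 1, eq. (16)."""
    r = np.sqrt(2 * n + 1)
    return t_max / r, r / omega_r
```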

4 Multivariable Function Approximation

4.1 Multivariable Hermite Basis

Let the function f to be approximated belong to a Hilbert space, f \in H, H = (H, \|\cdot\|), and be a function of m variables, denoted explicitly as f(x_1, x_2, ..., x_m). Let a one-variable Hermite function be denoted as

    h_i(x_j, \sigma_j),  where j \in {1, ..., m} and i \in {0, 1, ..., n},    (17)

and let the multivariable basis function \bar{h}_l(x_1, x_2, ..., x_m, \sigma_1, \sigma_2, ..., \sigma_m) be the following:

    \bar{h}_l(x_1, x_2, ..., x_m, \sigma_1, \sigma_2, ..., \sigma_m) = h_{i_1}(x_1, \sigma_1)\, h_{i_2}(x_2, \sigma_2) \cdots h_{i_m}(x_m, \sigma_m),    (18)

where i_1, i_2, ..., i_m \in {0, 1, ..., n}. Clearly, for each one of the m variables there are n+1 indices of Hermite functions. This gives in total (n+1)^m basis functions. They can be enumerated, so that l \in {0, 1, ..., (n+1)^m - 1}, by

    l = \sum_{j=1}^{m} i_j (n+1)^{j-1}.    (19)

Writing now x = (x_1, x_2, ..., x_m) and \sigma = (\sigma_1, \sigma_2, ..., \sigma_m), instead of \bar{h}_l(x_1, ..., x_m, \sigma_1, ..., \sigma_m) we will write in short \bar{h}_l(x, \sigma) or \bar{h}_l. Finally, the multivariable basis is the following:

    {\bar{h}_0, \bar{h}_1, ..., \bar{h}_{(n+1)^m - 1}}.    (20)

One can easily verify that the multivariable basis is orthonormal, i.e. \langle \bar{h}_i, \bar{h}_j \rangle = 1 for i = j and 0 elsewhere. The approximant f_{(n+1)^m} of f will be expressed as

    f_{(n+1)^m}(x, \sigma) = \sum_{l=0}^{(n+1)^m - 1} w_l \bar{h}_l(x, \sigma),    (21)

where

    w_l = \langle \bar{h}_l, f \rangle.    (22)

f_{(n+1)^m} approaches the function f as the number of elements n goes to infinity: f = f_\infty = \sum_{l=0}^{\infty} w_l \bar{h}_l. An interesting survey of mathematical research on multivariable polynomials and Hermite interpolation can be found in [8].

4.2 Scaling Parameters

Hermite functions are well localized in frequency and time. If a scaling parameter is introduced, it influences both the time and the frequency range, but in opposite ways (13). If it is chosen too small, a fragment of the function may be poorly approximated; if it is chosen too large, only part of the approximated function's spectrum is preserved. If only a one-variable function is being approximated, the scaling parameter \sigma can be chosen even intuitively. If, however, several variables are involved, the best choice is more complicated and must be calculated. We suggest the following criterion:

    \sigma^* = \arg\min_{\sigma} \| f_{(n+1)^m}(x, \sigma) - f(x) \|.

Usually, a number of iterations is needed to obtain \sigma^*.
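Putting (18)-(22) and the search criterion together, the hybrid method fits in a few lines. The sketch below is one reading of the procedure, not the paper's code: the coefficients come from the explicit formula (22), discretized on the sample set, while the scaling vector is found by a search. The paper does not name its search algorithm, so Nelder-Mead from scipy is an assumption here, as are all function names; hermite_funcs_scaled is the helper from the previous sketch.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

def mv_basis(n, X, sigmas):
    """All (n+1)^m tensor-product functions of eq. (18), sampled at the
    N points in X (shape (N, m)), ordered by the index l of eq. (19)."""
    N, m = X.shape
    oned = [hermite_funcs_scaled(n, X[:, j], sigmas[j]) for j in range(m)]
    B = np.empty(((n + 1) ** m, N))
    for idx in itertools.product(range(n + 1), repeat=m):
        l = sum(i * (n + 1) ** j for j, i in enumerate(idx))   # eq. (19)
        B[l] = np.prod([oned[j][i] for j, i in enumerate(idx)], axis=0)
    return B

def expansion(n, X, f, sigmas, dV):
    """Explicit coefficients w_l ~ <h_l, f> of eq. (22), with the inner
    product discretized by the volume element dV; also returns the MSE
    on the sample set (the criterion of Sect. 4.2)."""
    B = mv_basis(n, X, sigmas)
    w = (B @ f) * dV
    return w, float(np.mean((w @ B - f) ** 2))

def fit_hybrid(n, X, f, dV, sigma0):
    """Hybrid step: coefficients are explicit, the scaling vector is
    searched (Nelder-Mead here is an assumption, not the paper's choice)."""
    obj = lambda s: expansion(n, X, f, np.abs(s), dV)[1]   # keep sigma > 0
    res = minimize(obj, sigma0, method="Nelder-Mead")
    s_best = np.abs(res.x)
    return s_best, expansion(n, X, f, s_best, dV)[0]
```

A sensible refinement would be to constrain the search to the interval [\sigma_l, \sigma_h] of (15) along each axis.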

4.3 Basis Selection

If we approximate a function of m variables and along each variable we use n+1 orthonormal components, there will be (n+1)^m summation terms in (21). For instance, if we approximate a function of 3 variables with 15 Hermite components along each variable, we have 3375 summation terms. One expects that a significant part of all components has a very small, practically negligible influence on the approximation. As is clearly visible from formula (6), the components associated with large w_i^2 (or |w_i|) contribute the most to the error decrease. So, taking advantage of the efficiency of approximation by an orthonormal basis, we initially calculate an excessive number of Hermite expansion terms and select only the most significant ones as far as the error decrease is concerned. This basis selection can be interpreted as a simple pruning method, a classical neural technique improving generalization; see for instance [9].

4.4 Examples

Example 1. Let the function to be approximated be the following:

    f(x_1, x_2) = x_1 e^{-x_1^2 - x_2^2}.    (23)

Its plot is presented in Fig. 2. Let us approximate the function in the range [-3, 3]. We take 41 points along each axis, obtaining in total 1681 pairs of (argument, function value) to be processed. Along each axis the number of Hermite components was set to 3, so every one-variable Hermite function could have index 0, 1 or 2. We obtained 3^2 = 9 Hermite components. The expansion coefficients (weights) were calculated according to (22). The two scaling factors \sigma_1 and \sigma_2 were determined via a search-type procedure; finally we found \sigma_1 = \sigma_2 = 0.707. The Hermite expansion components (21) were ordered by the squares of their coefficients w_i. The first two components are written in (24):

    f_9(x_1, x_2, \sigma_1, \sigma_2) = w_1 h_1(x_1, \sigma_1) h_0(x_2, \sigma_2) + w_7 h_1(x_1, \sigma_1) h_2(x_2, \sigma_2) + ...,    (24)

and their expansion coefficients were w_1 = 0.6267 and w_7 of the order of 10^{-8}. It is clear that to approximate this function it is sufficient to take only one node. Finally, the result is the following: f_1(x, \sigma) = 0.6267\, \bar{h}_1(x, \sigma), or

    f_1(x_1, x_2, 0.707, 0.707) = 0.6267\, h_1(x_1, 0.707)\, h_0(x_2, 0.707).

The h_0 and h_1 functions are calculated by using (11) and (9). The mean squared error (MSE) of the approximation is below 10^{-9}, so the approximant is almost exactly the same as the original. The performance of this approximation is an argument in favour of the good generalization property of this Hermite-function-based approximation. In fact, one can write the following:

    f(x_1, x_2) = x_1 e^{-x_1^2 - x_2^2} = \frac{\sqrt{\pi}}{2\sqrt{2}} \left( \frac{2 \cdot 2^{1/4}}{\pi^{1/4}}\, x_1 e^{-x_1^2} \right) \left( \frac{2^{1/4}}{\pi^{1/4}}\, e^{-x_2^2} \right) = \frac{\sqrt{\pi}}{2\sqrt{2}}\, h_1\!\left(x_1, \tfrac{1}{\sqrt{2}}\right) h_0\!\left(x_2, \tfrac{1}{\sqrt{2}}\right) = 0.6267\, h_1(x_1, 0.707)\, h_0(x_2, 0.707),

which means that the generalization from numerical data is almost perfect. We have obtained a function formula which is suitable to be used anywhere, also outside the given region [-3, 3].
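Example 1 is easy to re-run with the sketches above (again my code, with the paper's reported values quoted in the comments):

```python
import numpy as np

# 41 points per axis on [-3, 3]: 1681 sample pairs; n + 1 = 3 components
# per variable gives 9 product terms, as in the paper's Example 1.
g = np.linspace(-3.0, 3.0, 41)
X1, X2 = np.meshgrid(g, g)
X = np.column_stack([X1.ravel(), X2.ravel()])
f = X[:, 0] * np.exp(-X[:, 0] ** 2 - X[:, 1] ** 2)   # eq. (23)
dV = (g[1] - g[0]) ** 2

sigmas, w = fit_hybrid(2, X, f, dV, sigma0=np.array([1.0, 1.0]))
order = np.argsort(w ** 2)[::-1]      # basis selection by w_l^2, Sect. 4.3
print(sigmas)                         # expected near (0.707, 0.707)
print(order[0], w[order[0]])          # expected: l = 1, w_1 near 0.6267
```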

Fig. 2. The original function

A more demanding generalization experiment is the following. For every function value, a noise signal randomly generated in the range [-0.1, 0.1] is added to the function. The noised function is presented in Fig. 3.

Fig. 3. Random noise added to the function values, to be used as an input for the approximation algorithm

As in the previous case, only one expansion term was sufficient. Because of the random nature of the experiment, we ran it 5 times, averaging the obtained numbers. As the result, w_1 = 0.683, \sigma_1 = 0.739, \sigma_2 = 0.75 were calculated. Those parameters are very close to the originals. The MSE between the original function and the approximation obtained from the noisy function was of the order of 10^{-5}, which seems to be a very good generalization result.

Example 2. In this example the function to be approximated is the following:

    f(x_1, x_2, x_3) = x_1 e^{-x_1^2 - x_2^2} \sin(x_1 + x_2 + x_3).    (25)

Let us use again the range [-3, 3]. We take 21 points along each axis, obtaining in total 9261 triples of arguments and function values to be processed. Along each axis the number of Hermite components was again set to 3, so every one-variable Hermite function could have index 0, 1 or 2. We obtained 3^3 = 27 Hermite components. The squares of the expansion coefficients (weights), ordered in non-increasing order, are plotted in Fig. 4. It is clear from this plot that 4 out of the 27 Hermite expansion terms are sufficient to approximate function (25); the MSE between the original and the approximated function is then on the level of 3.4e-4. Taking even fewer of the 27 terms still ensures 99% of the error reduction. When, similarly to the previous example, noise generated randomly from the range [-0.1, 0.1] was added and the noisy data were used for the function approximation, the difference (MSE) between the original function (25) and the approximant was again on a similar level, 3.6e-4. This is again a good sign of the generalization ability of this type of Hermite-based approximation.

Fig. 4. Squares of w_l (22) versus l, ordered from the most significant to the least
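The noise experiment is just as easy to sketch; this continues the Example 1 code above (my code, with the paper's reported error level quoted in the comment):

```python
import numpy as np

# Uniform noise in [-0.1, 0.1] on the samples, refit, then measure the
# MSE of the approximant against the *clean* function.
rng = np.random.default_rng(0)
f_noisy = f + rng.uniform(-0.1, 0.1, size=f.shape)
sig_n, w_n = fit_hybrid(2, X, f_noisy, dV, sigma0=np.array([1.0, 1.0]))
approx = w_n @ mv_basis(2, X, sig_n)
print(np.mean((approx - f) ** 2))     # of the order 1e-5, per the paper
```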

5 Conclusions

We presented a hybrid method of multivariable function approximation by a Hermite basis. The basis is composed of one-variable Hermite functions. The scaling parameters are determined via a search algorithm, while the expansion coefficients are calculated explicitly from appropriate formulae. Initially we take an excessive number of expansion terms and select only those which contribute the most to the error decrease. This procedure seems to be the reason for the very good generalization property of the method.

References

1. Beliczynski, B.: Properties of the Hermite activation functions in a neural approximation scheme. In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007, Part II. LNCS, vol. 4432. Springer, Heidelberg (2007)
2. Hlawatsch, F.: Time-Frequency Analysis and Synthesis of Linear Signal Spaces. Kluwer Academic Publishers, Dordrecht (1998)
3. Ma, L., Khorasani, K.: Constructive feedforward neural networks using Hermite polynomial activation functions. IEEE Transactions on Neural Networks 16 (2005)
4. Kwok, T., Yeung, D.: Constructive algorithms for structure learning in feedforward neural networks for regression problems. IEEE Trans. Neural Netw. 8(3) (1997)
5. Kwok, T., Yeung, D.: Objective functions for training new hidden units in constructive neural networks. IEEE Trans. Neural Networks 8(5), 1131-1148 (1997)
6. Kreyszig, E.: Introductory Functional Analysis with Applications. J. Wiley, Chichester (1978)
7. Beliczynski, B., Ribeiro, B.: Some enhancement to approximation of one-variable functions by orthonormal basis. Neural Network World 19 (2009)
8. Lorentz, R.: Multivariate Hermite interpolation by algebraic polynomials: A survey. Journal of Computational and Applied Mathematics 122, 167-201 (2000)
9. Reed, R.: Pruning algorithms - a survey. IEEE Trans. on Neural Networks 4(5) (1993)
