Journal of Mathematical Research with Applications
Jul. 2017, Vol. 37, No. 4, pp. 496-504
DOI: 10.3770/j.issn:2095-2651.2017.04.0
http://jmre.dlut.edu.cn

An $\ell^1$ Regularized Method for Numerical Differentiation Using Empirical Eigenfunctions

Junbin LI, Renhong WANG, Min XU
School of Mathematical Sciences, Dalian University of Technology, Liaoning 116024, P. R. China

Abstract We propose an $\ell^1$ regularized method for numerical differentiation using empirical eigenfunctions. Compared with traditional methods for numerical differentiation, the output of our method can be considered directly as the derivative of the underlying function. Moreover, our method can produce sparse representations with respect to empirical eigenfunctions. Numerical results show that our method is quite effective.

Keywords numerical differentiation; empirical eigenfunctions; $\ell^1$ regularization; Mercer kernel

MR(2010) Subject Classification 65D15; 65F22

1. Introduction

Numerical differentiation is the problem of determining the derivatives of a function from its values at scattered points. It plays an important role in scientific research and applications, such as solving Volterra integral equations [1], image processing [2], option pricing models [3] and identification [4]. The main difficulty of numerical differentiation is that it is an ill-posed problem: a small measurement error may cause a huge error in the computed derivatives [5]. Several methods for numerical differentiation have been proposed in the literature, including difference methods [6] and interpolation methods [7]. In particular, some researchers have proposed to use Tikhonov regularization for numerical differentiation problems, which has been shown to be quite effective [8-10]. Note that most regularization methods for numerical differentiation consist of estimating a function from the given data and then computing derivatives of that function.
However, in many practical applications, what we need is the derivative of the underlying function, not the underlying function itself [3,4]. Thus, a natural approach to computing derivatives is to estimate the derivatives directly. In this paper, we propose an algorithm for numerical differentiation in the framework of statistical learning theory. More specifically, we study an $\ell^1$ regularized algorithm for numerical differentiation using empirical eigenfunctions. The key

Received April 25, 2017; Accepted May 25, 2017
Supported by the National Natural Science Foundation of China (Grant Nos. 30052; 30045; 27060; 60064; 67068), the Fundamental Research Funds for the Central Universities (Grant No. DUT6LK33) and the Fundamental Research of Civil Aircraft (Grant No. MJ-F-202-04).
* Corresponding author
E-mail address: junbin@mail.dlut.edu.cn (Junbin LI)
advantage of the algorithm is that its output can be considered directly as the derivative of the underlying function. Moreover, the algorithm produces sparse representations with respect to empirical eigenfunctions, without assuming sparsity in terms of any basis or system.

The remainder of this paper is organized as follows. In Section 2, we first review some basic facts from statistical learning theory and then present our main algorithm. In Section 3, we present an approach for computing the empirical eigenfunctions explicitly. In Section 4, we establish the representer theorem for the algorithm. To illustrate the effectiveness of the algorithm, we provide several numerical examples in Section 5. Finally, some concluding remarks are given in Section 6.

2. Formulation of the method

To present our main algorithm, let us first describe the basic setting of statistical learning theory. Let $X$ be the input space and $Y \subseteq \mathbb{R}$ the output space. Assume that $\rho$ is a Borel probability measure on $Z = X \times Y$. Let $\rho_X$ be the marginal distribution on $X$ and $\rho(\cdot\,|\,x)$ the conditional distribution on $Y$ at a given $x \in X$. Let $f_\rho$ be the regression function defined by
$$f_\rho(x) = \int_Y y \, d\rho(y\,|\,x), \quad x \in X.$$
Given a sample $z = \{(x_i, y_i)\}_{i=1}^m$ drawn independently and identically according to $\rho$, we are interested in estimating the derivative of $f_\rho$. More precisely, we want to find a function $f_z : X \to \mathbb{R}$ that can be used as an approximation of the derivative of $f_\rho$.

Before proceeding further, we need to introduce some notions related to kernels [11,12]. A Mercer kernel on $X$ is a symmetric continuous function $K : X \times X \to \mathbb{R}$ such that for any finite subset $\{x_i\}_{i=1}^m$ of $X$, the matrix $\mathbf{K}$ whose $(i,j)$ entry is $K(x_i, x_j)$ is positive semidefinite. Let $\mathrm{span}\{K_x : x \in X\}$ denote the space spanned by the set $\{K_x = K(\cdot, x) : x \in X\}$. We define an inner product on $\mathrm{span}\{K_x : x \in X\}$ as follows:
$$\Big\langle \sum_{i=1}^{s} \alpha_i K_{x_i}, \sum_{j=1}^{t} \beta_j K_{t_j} \Big\rangle_K = \sum_{i=1}^{s} \sum_{j=1}^{t} \alpha_i \beta_j K(x_i, t_j).$$
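As a quick numerical illustration of these definitions, the following sketch checks the positive semidefiniteness of a Gram matrix and evaluates the inner product of two finite kernel expansions. The Gaussian kernel $K(x,t) = e^{-(x-t)^2/2}$ and all sample values are our own choices for illustration; this section fixes no particular kernel.

```python
import numpy as np

def K(x, t):
    # An assumed Gaussian Mercer kernel; the paper does not fix a specific K here.
    return np.exp(-(x - t) ** 2 / 2.0)

rng = np.random.default_rng(0)
x = rng.uniform(-4, 4, size=6)           # points defining f = sum_i alpha_i K_{x_i}
t = rng.uniform(-4, 4, size=4)           # points defining g = sum_j beta_j K_{t_j}
alpha = rng.standard_normal(6)
beta = rng.standard_normal(4)

# Mercer kernel => every Gram matrix is positive semidefinite.
G = K(x[:, None], x[None, :])
assert np.linalg.eigvalsh(G).min() > -1e-10

# <f, g>_K = sum_i sum_j alpha_i beta_j K(x_i, t_j)
inner_fg = alpha @ K(x[:, None], t[None, :]) @ beta
inner_gf = beta @ K(t[:, None], x[None, :]) @ alpha
assert np.isclose(inner_fg, inner_gf)    # symmetry of the inner product
assert alpha @ G @ alpha >= 0.0          # <f, f>_K = ||f||_K^2 >= 0
```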
The reproducing kernel Hilbert space $H_K$ associated with $K$ is defined to be the completion of $\mathrm{span}\{K_x : x \in X\}$ under the norm $\|\cdot\|_K$ induced by the inner product $\langle\cdot,\cdot\rangle_K$. The reproducing property in $H_K$ takes the form
$$f(x) = \langle f, K_x \rangle_K \quad \text{for all } x \in X \text{ and } f \in H_K.$$
Let $\kappa = \sup_{x,y \in X} \sqrt{K(x,y)}$. Then it follows from the reproducing property that
$$\|f\|_\infty \le \kappa \|f\|_K, \quad f \in H_K.$$
Taylor's expansion of a function $g(u)$ about the point $x$ gives, for $u \approx x$,
$$g(u) \approx g(x) + g'(x)(u - x).$$
Thus the empirical error incurred by a function $f$, regarded as an approximation of the derivative $g'$, at the sample points $x = x_i$, $u = x_j$ can be measured by
$$\big(y_i - y_j + f(x_i)(x_j - x_i)\big)^2.$$
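Both the norm bound and the Taylor-based error measure are easy to check numerically. In this sketch (Gaussian kernel and sample values again our own choices), $\kappa = 1$, and the squared error $(y_i - y_j + f(x_i)(x_j - x_i))^2$ is nearly zero at nearby points precisely when $f(x_i)$ equals the true derivative $g'(x_i)$:

```python
import numpy as np

def K(x, t):
    # Assumed Gaussian Mercer kernel, so kappa = sup sqrt(K(x, y)) = 1.
    return np.exp(-(x - t) ** 2 / 2.0)

rng = np.random.default_rng(1)
x = rng.uniform(-4, 4, size=10)
alpha = rng.standard_normal(10)
G = K(x[:, None], x[None, :])

# For f = sum_i alpha_i K_{x_i}: ||f||_K^2 = alpha^T G alpha, and ||f||_inf <= kappa ||f||_K.
norm_K = np.sqrt(alpha @ G @ alpha)
grid = np.linspace(-8, 8, 4001)
f_grid = K(grid[:, None], x[None, :]) @ alpha
assert np.abs(f_grid).max() <= norm_K + 1e-12

# Taylor-based empirical error for g = sin at two nearby sample points:
xi, xj = 0.50, 0.51
yi, yj = np.sin(xi), np.sin(xj)
err_true = (yi - yj + np.cos(xi) * (xj - xi)) ** 2   # f(x_i) = g'(x_i) = cos(x_i)
err_wrong = (yi - yj + 0.0 * (xj - xi)) ** 2         # slope 0 instead of g'(x_i)
assert err_true < 1e-8 < err_wrong
```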
The restriction $u \approx x$ can be enforced by weights $\omega_{i,j} = \omega^{(s)}_{i,j} > 0$ associated with the pairs $(x_i, x_j)$, with the requirement that $\omega^{(s)}_{i,j} \to 0$ as $|x_i - x_j|/s \to \infty$. One possible choice of weights is given by a Gaussian with variance $s^2$, $s > 0$. Let $\omega$ be the function on $\mathbb{R}$ given by
$$\omega(x) = \frac{1}{s^4}\, e^{-\frac{x^2}{2s^2}}.$$
Then this choice of weights is $\omega_{i,j} = \omega^{(s)}_{i,j} = \omega(x_i - x_j)$. The following regularized algorithm for numerical differentiation was proposed in [13]:
$$\min_{f \in H_K} \Big\{ \frac{1}{m^2} \sum_{i,j=1}^{m} \omega(x_i - x_j)\big(y_i - y_j + f(x_i)(x_j - x_i)\big)^2 + \gamma \|f\|_K^2 \Big\}. \qquad (1)$$
In this paper, we shall modify algorithm (1) by using an $\ell^1$ regularizer. Note that the $\ell^1$ regularizer plays a key role in producing sparse approximations. This phenomenon has been observed in LASSO [14] and compressed sensing [15], under the assumption that the approximated function has a sparse representation with respect to some basis.

Let $L_{K,s}$ denote the operator defined by
$$L_{K,s}(f) = \int_X \int_X \omega(x - u) K_x (u - x)^2 f(x)\, d\rho_X(x)\, d\rho_X(u), \quad f \in H_K. \qquad (2)$$
The operator $L_{K,s}$ is compact, positive, and self-adjoint [13]. Therefore it has at most countably many eigenvalues, and all of these eigenvalues are nonnegative. One can arrange these eigenvalues $\{\lambda_l\}$ (with multiplicities) as a nonincreasing sequence tending to $0$ and take an associated sequence of eigenfunctions $\{\phi_l\}$ to be an orthonormal basis of $H_K$.

Let $\mathbf{x}$ denote the unlabeled part of the sample $z = \{(x_i, y_i)\}_{i=1}^m$, i.e., $\mathbf{x} = \{x_i\}_{i=1}^m$. We consider another operator $L^{\mathbf{x}}_{K,s}$ defined on $H_K$ as follows:
$$L^{\mathbf{x}}_{K,s}(f) = \frac{1}{m(m-1)} \sum_{i,j=1}^{m} \omega(x_i - x_j) K_{x_i} (x_j - x_i)^2 f(x_i), \quad f \in H_K. \qquad (3)$$
It is easy to show that $E_{\mathbf{x}}(L^{\mathbf{x}}_{K,s} f) = L_{K,s} f$, which means $E_{\mathbf{x}}(L^{\mathbf{x}}_{K,s}) = L_{K,s}$. As a result, $L^{\mathbf{x}}_{K,s}$ can be viewed as an empirical version of the operator $L_{K,s}$ with respect to $\mathbf{x}$. The operator $L^{\mathbf{x}}_{K,s}$ is self-adjoint and positive. Its eigensystem, called an empirical eigensystem, is denoted by $\{(\lambda^{\mathbf{x}}_l, \phi^{\mathbf{x}}_l)\}$, where the eigenvalues $\{\lambda^{\mathbf{x}}_l\}$ are arranged in nonincreasing order.
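Before working with the empirical eigensystem, note that the $\ell^2$ problem (1) is itself finite-dimensional: by a standard representer argument the minimizer can be sought in the form $f = \sum_{i=1}^m a_i K_{x_i}$. Writing $v = \mathbf{K}a$ for the sample values, $b_i = \sum_j \omega_{ij}(x_j - x_i)^2$ and $t_i = \sum_j \omega_{ij}(x_j - x_i)(y_i - y_j)$, setting the gradient of (1) to zero leads to the linear system $(B\mathbf{K}/m^2 + \gamma I)a = -t/m^2$. A sketch, with a Gaussian kernel and all parameter values chosen by us for illustration:

```python
import numpy as np

def tikhonov_derivative(x, y, s=0.1, gamma=1e-6, width=1.0):
    """Sketch of algorithm (1) with f = sum_i a_i K_{x_i} (Gaussian K assumed)."""
    m = len(x)
    Km = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * width ** 2))   # Gram matrix
    W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * s ** 2)) / s ** 4
    Dlt = x[None, :] - x[:, None]                                     # Dlt[i, j] = x_j - x_i
    b = (W * Dlt ** 2).sum(axis=1)
    t = (W * Dlt * (y[:, None] - y[None, :])).sum(axis=1)
    a = np.linalg.solve(np.diag(b) @ Km / m ** 2 + gamma * np.eye(m), -t / m ** 2)
    return a, Km, W

def objective(a, Km, W, x, y, gamma):
    # The regularized empirical error of algorithm (1) at f = sum_i a_i K_{x_i}.
    m = len(x)
    v = Km @ a                                                        # v_i = f(x_i)
    resid = (y[:, None] - y[None, :]) + v[:, None] * (x[None, :] - x[:, None])
    return (W * resid ** 2).sum() / m ** 2 + gamma * (a @ Km @ a)

x = np.linspace(-4, 4, 21)
y = x ** 2                                      # f_rho(x) = x^2, derivative 2x
a, Km, W = tikhonov_derivative(x, y)
v = Km @ a                                      # estimated derivative at the sample points
```

Since the objective is convex and the linear system encodes its stationarity condition, the computed coefficients should not be beaten by nearby perturbations, and the values `v` should roughly follow the true derivative $2x$ at the interior points.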
We note here two important facts: for one thing, the empirical eigenfunctions $\{\phi^{\mathbf{x}}_l\}$ form an orthonormal basis of $H_K$; for another, at most $m$ eigenvalues are nonzero, i.e., $\lambda^{\mathbf{x}}_l = 0$ whenever $l > m$. Based on the first $m$ empirical eigenfunctions $\{\phi^{\mathbf{x}}_l\}_{l=1}^m$, we are now in a position to present our main algorithm:
$$c^z_\gamma = \arg\min_{c \in \mathbb{R}^m} \Big\{ \frac{1}{m^2} \sum_{i,j=1}^{m} \omega(x_i - x_j) \Big( y_i - y_j + \Big( \sum_{l=1}^{m} c_l \phi^{\mathbf{x}}_l(x_i) \Big)(x_j - x_i) \Big)^2 + \gamma \|c\|_1 \Big\}. \qquad (4)$$
The output function of algorithm (4) is
$$f^z_\gamma = \sum_{l=1}^{m} c^z_{\gamma,l}\, \phi^{\mathbf{x}}_l,$$
which is expected to approximate the derivative of the underlying target function $f_\rho$. Next we shall focus on the computation of the empirical eigenpairs, the representer theorem (i.e., the explicit solution to problem (4)), and the sparsity of the coefficients in the representation $f^z_\gamma = \sum_l c^z_{\gamma,l} \phi^{\mathbf{x}}_l$.
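Sections 3 and 4 below show that problem (4) can be solved in closed form: the empirical eigenpairs come from the eigendecomposition of the $m \times m$ matrix $A = D\mathbf{K}D$, and each coefficient $c^z_{\gamma,l}$ is obtained by soft-thresholding a statistic $S^z_l$. The following sketch previews the whole procedure; the Gaussian kernel, its width, and all parameter values are our own illustrative choices, since the paper does not specify the kernel:

```python
import numpy as np

def l1_derivative(x, y, s=0.1, gamma=1e-3, width=1.0, tol=1e-8):
    """Preview sketch of algorithm (4): empirical eigenpairs via A = D K D,
    then soft-thresholded coefficients (cf. Sections 3 and 4)."""
    m = len(x)
    Km = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * width ** 2))  # Gram matrix
    W = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * s ** 2)) / s ** 4
    Dlt = x[None, :] - x[:, None]                       # Dlt[i, j] = x_j - x_i
    b = (W * Dlt ** 2).sum(axis=1)                      # b_i
    d = np.sqrt(b)
    A = d[:, None] * Km * d[None, :]                    # A = D K D
    lam, U = np.linalg.eigh(A)
    lam, U = lam[::-1], U[:, ::-1]                      # nonincreasing eigenvalues
    r = int((lam > tol * lam[0]).sum())                 # numerical rank of A
    lam, U = lam[:r], U[:, :r]
    lam_x = lam / (m * (m - 1))                         # empirical eigenvalues
    Phi = Km @ (d[:, None] * U) / np.sqrt(lam)          # Phi[i, l] = phi_l(x_i)
    t = (W * Dlt * (y[:, None] - y[None, :])).sum(axis=1)
    S = (t @ Phi) / (m ** 2 * lam_x)                    # the statistic S_l
    # Soft-thresholding: c_l = 0 if 2 lam_x_l |S_l| <= gamma, else shrink toward zero.
    c = np.where(2 * lam_x * np.abs(S) <= gamma, 0.0,
                 -(m / (m - 1)) * (S - np.sign(S) * gamma / (2 * lam_x)))
    return c, S, lam_x, Phi, b

x = np.linspace(-4, 4, 21)
y = np.sin(x)                                           # f_rho = sin, derivative cos
c, S, lam_x, Phi, b = l1_derivative(x, y)
estimate = Phi @ c                                      # f_gamma^z at the sample points
```

The vector `estimate` should roughly follow $\cos$ at the sample points, and several entries of `c` come out exactly zero, which is the sparsity the $\ell^1$ penalty is designed to produce.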
3. Computation of empirical eigenpairs

In this section we establish an approach for computing the empirical eigenpairs $\{(\lambda^{\mathbf{x}}_l, \phi^{\mathbf{x}}_l)\}$ explicitly. To present our method, some notation and definitions are needed. Recall that $\mathbf{K}$ denotes the matrix whose $(i,j)$ entry is $K(x_i, x_j)$. For $1 \le i \le m$, define
$$b_i = \sum_{j=1}^{m} \omega(x_i - x_j)(x_j - x_i)^2, \qquad d_i = \sqrt{b_i}.$$
Let $B = \mathrm{diag}\{b_1, b_2, \dots, b_m\}$, $D = \mathrm{diag}\{d_1, d_2, \dots, d_m\}$, and $A = D\mathbf{K}D$. Denote by $\mathrm{rank}(A)$ and $\mathrm{rank}(L^{\mathbf{x}}_{K,s})$ the ranks of the matrix $A$ and the operator $L^{\mathbf{x}}_{K,s}$, respectively. In the following theorem, we express the empirical eigenpairs of the operator $L^{\mathbf{x}}_{K,s}$ in terms of the eigenpairs of a matrix.

Theorem 3.1 Let $d = \mathrm{rank}(A)$. Denote the eigenvalues of $A$ by $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d > \lambda_{d+1} = \cdots = \lambda_m = 0$, and the corresponding orthonormal eigenvectors by $u_1, u_2, \dots, u_m$. Then $\mathrm{rank}(L^{\mathbf{x}}_{K,s}) = \mathrm{rank}(A)$, and the empirical eigenpairs $\{(\lambda^{\mathbf{x}}_l, \phi^{\mathbf{x}}_l)\}_{l=1}^d$ of $L^{\mathbf{x}}_{K,s}$ can be computed in terms of the eigenpairs of $A$ as follows:
$$\lambda^{\mathbf{x}}_l = \frac{\lambda_l}{m(m-1)}, \qquad \phi^{\mathbf{x}}_l = \frac{1}{\sqrt{\lambda_l}} \sum_{j=1}^{m} d_j (u_l)_j K_{x_j}.$$

Proof By the definitions of $L^{\mathbf{x}}_{K,s}$ and $\phi^{\mathbf{x}}_l$, we have
$$\begin{aligned}
L^{\mathbf{x}}_{K,s}(\phi^{\mathbf{x}}_l) &= \frac{1}{m(m-1)} \sum_{i,p=1}^{m} \omega(x_i - x_p) K_{x_i} (x_p - x_i)^2 \phi^{\mathbf{x}}_l(x_i) \\
&= \frac{1}{m(m-1)} \frac{1}{\sqrt{\lambda_l}} \sum_{i=1}^{m} K_{x_i} \Big( \sum_{p=1}^{m} \omega(x_i - x_p)(x_p - x_i)^2 \Big) \sum_{j=1}^{m} K(x_i, x_j) d_j (u_l)_j \\
&= \frac{1}{m(m-1)} \frac{1}{\sqrt{\lambda_l}} \sum_{i=1}^{m} K_{x_i}\, b_i \sum_{j=1}^{m} K(x_i, x_j) d_j (u_l)_j \\
&= \frac{1}{m(m-1)} \frac{1}{\sqrt{\lambda_l}} \sum_{i=1}^{m} K_{x_i}\, d_i \sum_{j=1}^{m} d_i K(x_i, x_j) d_j (u_l)_j \\
&= \frac{1}{m(m-1)} \frac{1}{\sqrt{\lambda_l}} \sum_{i=1}^{m} K_{x_i}\, d_i \lambda_l (u_l)_i = \frac{\lambda_l}{m(m-1)} \cdot \frac{1}{\sqrt{\lambda_l}} \sum_{i=1}^{m} d_i (u_l)_i K_{x_i} = \lambda^{\mathbf{x}}_l \phi^{\mathbf{x}}_l,
\end{aligned}$$
and, for $1 \le p, q \le d$,
$$\begin{aligned}
\langle \phi^{\mathbf{x}}_p, \phi^{\mathbf{x}}_q \rangle_K &= \frac{1}{\sqrt{\lambda_p \lambda_q}} \Big\langle \sum_{i=1}^{m} d_i (u_p)_i K_{x_i}, \sum_{j=1}^{m} d_j (u_q)_j K_{x_j} \Big\rangle_K = \frac{1}{\sqrt{\lambda_p \lambda_q}} \sum_{i,j=1}^{m} d_i (u_p)_i K(x_i, x_j) d_j (u_q)_j \\
&= \frac{(Du_p)^T \mathbf{K} (Du_q)}{\sqrt{\lambda_p \lambda_q}} = \frac{u_p^T D^T \mathbf{K} D u_q}{\sqrt{\lambda_p \lambda_q}}
\end{aligned}$$
$$= \frac{u_p^T A u_q}{\sqrt{\lambda_p \lambda_q}} = \frac{\lambda_q\, u_p^T u_q}{\sqrt{\lambda_p \lambda_q}} = \delta_{p,q}.$$
Therefore, the numbers $\{\lambda^{\mathbf{x}}_l\}_{l=1}^d$ are eigenvalues of $L^{\mathbf{x}}_{K,s}$ with corresponding orthonormal eigenfunctions $\{\phi^{\mathbf{x}}_l\}_{l=1}^d$, and $\mathrm{rank}(L^{\mathbf{x}}_{K,s}) \ge \mathrm{rank}(A)$.

On the other hand, let $t = \mathrm{rank}(L^{\mathbf{x}}_{K,s})$. Then, for $1 \le l \le t$, it follows from $L^{\mathbf{x}}_{K,s}(\phi^{\mathbf{x}}_l) = \lambda^{\mathbf{x}}_l \phi^{\mathbf{x}}_l$ that
$$\frac{1}{m(m-1)} \sum_{i,j=1}^{m} \omega(x_i - x_j) K(x_i, x_p)(x_j - x_i)^2 \phi^{\mathbf{x}}_l(x_i) = \frac{1}{m(m-1)} \sum_{i=1}^{m} K(x_i, x_p)\, b_i\, \phi^{\mathbf{x}}_l(x_i) = \lambda^{\mathbf{x}}_l \phi^{\mathbf{x}}_l(x_p), \quad 1 \le p \le m.$$
Let $\phi^{\mathbf{x}}_l|_{\mathbf{x}} = (\phi^{\mathbf{x}}_l(x_1), \dots, \phi^{\mathbf{x}}_l(x_m))^T$. Then
$$\frac{1}{m(m-1)} \mathbf{K} B\, \phi^{\mathbf{x}}_l|_{\mathbf{x}} = \lambda^{\mathbf{x}}_l \phi^{\mathbf{x}}_l|_{\mathbf{x}}, \qquad \frac{1}{m(m-1)} D\mathbf{K} D^2 \phi^{\mathbf{x}}_l|_{\mathbf{x}} = \lambda^{\mathbf{x}}_l D \phi^{\mathbf{x}}_l|_{\mathbf{x}}, \qquad \frac{1}{m(m-1)} A \big( D \phi^{\mathbf{x}}_l|_{\mathbf{x}} \big) = \lambda^{\mathbf{x}}_l D \phi^{\mathbf{x}}_l|_{\mathbf{x}}.$$
Now, for $1 \le p, q \le t$, we have
$$\begin{aligned}
\delta_{p,q} \lambda^{\mathbf{x}}_p &= \langle L^{\mathbf{x}}_{K,s}(\phi^{\mathbf{x}}_p), \phi^{\mathbf{x}}_q \rangle_K = \frac{1}{m(m-1)} \Big\langle \sum_{i,j=1}^{m} \omega(x_i - x_j) K_{x_i} (x_j - x_i)^2 \phi^{\mathbf{x}}_p(x_i),\, \phi^{\mathbf{x}}_q \Big\rangle_K \\
&= \frac{1}{m(m-1)} \sum_{i,j=1}^{m} \omega(x_i - x_j)(x_j - x_i)^2 \phi^{\mathbf{x}}_p(x_i) \phi^{\mathbf{x}}_q(x_i) = \frac{1}{m(m-1)} \sum_{i=1}^{m} \phi^{\mathbf{x}}_p(x_i) \phi^{\mathbf{x}}_q(x_i) \sum_{j=1}^{m} \omega(x_i - x_j)(x_j - x_i)^2 \\
&= \frac{1}{m(m-1)} \sum_{i=1}^{m} \phi^{\mathbf{x}}_p(x_i)\, b_i\, \phi^{\mathbf{x}}_q(x_i) = \frac{1}{m(m-1)} \sum_{i=1}^{m} \big( d_i \phi^{\mathbf{x}}_p(x_i) \big)\big( d_i \phi^{\mathbf{x}}_q(x_i) \big) = \frac{1}{m(m-1)} \big\langle D \phi^{\mathbf{x}}_p|_{\mathbf{x}},\, D \phi^{\mathbf{x}}_q|_{\mathbf{x}} \big\rangle.
\end{aligned}$$
It follows that for $1 \le l \le t$ the vectors $D \phi^{\mathbf{x}}_l|_{\mathbf{x}}$ form an orthogonal system of eigenvectors of $A$, so $\mathrm{rank}(A) \ge \mathrm{rank}(L^{\mathbf{x}}_{K,s})$. The proof of the theorem is now completed. $\Box$

Remark 3.2 According to the proof of Theorem 3.1, the eigenfunctions $\{\phi^{\mathbf{x}}_l\}_{l=1}^d$ satisfy the following two properties:
(1) $\dfrac{1}{m(m-1)} \displaystyle\sum_{i,j=1}^{m} \omega(x_i - x_j)(x_j - x_i)^2 \phi^{\mathbf{x}}_p(x_i) \phi^{\mathbf{x}}_q(x_i) = \delta_{p,q} \lambda^{\mathbf{x}}_p$;
(2) if $\lambda^{\mathbf{x}}_l = 0$, then $\phi^{\mathbf{x}}_l(x_i)(x_j - x_i) = 0$ for all $1 \le i, j \le m$.

4. Representer theorem
The following theorem provides the solution to problem (4) explicitly.

Theorem 4.1 For $1 \le l \le m$, denote
$$S^z_l = \begin{cases} \dfrac{1}{m^2 \lambda^{\mathbf{x}}_l} \displaystyle\sum_{i,j=1}^{m} \omega^{(s)}(x_i - x_j)(y_i - y_j) \phi^{\mathbf{x}}_l(x_i)(x_j - x_i), & \text{if } \lambda^{\mathbf{x}}_l > 0, \\ 0, & \text{otherwise.} \end{cases}$$
Then the solution to problem (4) is given by
$$c^z_{\gamma,l} = \begin{cases} 0, & \text{if } 2\lambda^{\mathbf{x}}_l |S^z_l| \le \gamma, \\ -\dfrac{m}{m-1}\Big( S^z_l - \dfrac{\gamma}{2\lambda^{\mathbf{x}}_l} \Big), & \text{if } 2\lambda^{\mathbf{x}}_l S^z_l > \gamma, \\ -\dfrac{m}{m-1}\Big( S^z_l + \dfrac{\gamma}{2\lambda^{\mathbf{x}}_l} \Big), & \text{if } 2\lambda^{\mathbf{x}}_l S^z_l < -\gamma. \end{cases} \qquad (5)$$

Proof Let $\omega_{i,j} = \omega(x_i - x_j)$. Using Remark 3.2, we can rewrite the empirical error part of algorithm (4) as follows:
$$\begin{aligned}
&\frac{1}{m^2} \sum_{i,j=1}^{m} \omega_{i,j} \Big( y_i - y_j + \Big( \sum_{l=1}^{m} c_l \phi^{\mathbf{x}}_l(x_i) \Big)(x_j - x_i) \Big)^2 \\
&\quad= \frac{1}{m^2} \sum_{i,j=1}^{m} \omega_{i,j} \Big[ \Big( \sum_{l=1}^{m} c_l \phi^{\mathbf{x}}_l(x_i)(x_j - x_i) \Big)^2 + 2(y_i - y_j) \sum_{l=1}^{m} c_l \phi^{\mathbf{x}}_l(x_i)(x_j - x_i) + (y_i - y_j)^2 \Big] \\
&\quad= \frac{1}{m^2} \sum_{p,q=1}^{m} c_p c_q \sum_{i,j=1}^{m} \omega_{i,j} (x_j - x_i)^2 \phi^{\mathbf{x}}_p(x_i) \phi^{\mathbf{x}}_q(x_i) + \frac{2}{m^2} \sum_{l=1}^{m} c_l \sum_{i,j=1}^{m} \omega_{i,j}(y_i - y_j) \phi^{\mathbf{x}}_l(x_i)(x_j - x_i) + \frac{1}{m^2} \sum_{i,j=1}^{m} \omega_{i,j}(y_i - y_j)^2 \\
&\quad= \frac{m-1}{m} \sum_{l=1}^{m} c_l^2 \lambda^{\mathbf{x}}_l + 2 \sum_{l=1}^{m} \lambda^{\mathbf{x}}_l S^z_l c_l + \frac{1}{m^2} \sum_{i,j=1}^{m} \omega_{i,j}(y_i - y_j)^2.
\end{aligned}$$
We thus obtain an equivalent form of the algorithm:
$$c^z_\gamma = \arg\min_{c \in \mathbb{R}^m} \sum_{l=1}^{m} \Big\{ \frac{m-1}{m} \lambda^{\mathbf{x}}_l \Big( c_l + \frac{m}{m-1} S^z_l \Big)^2 + \gamma |c_l| \Big\}.$$
It is easy to see that $c^z_{\gamma,l} = 0$ when $\lambda^{\mathbf{x}}_l = 0$. When $\lambda^{\mathbf{x}}_l > 0$, the component $c^z_{\gamma,l}$ can be found by solving the one-dimensional optimization problem
$$c^z_{\gamma,l} = \arg\min_{c \in \mathbb{R}} \Big\{ \Big( c + \frac{m}{m-1} S^z_l \Big)^2 + \frac{m\gamma}{(m-1)\lambda^{\mathbf{x}}_l}\, |c| \Big\},$$
which has the soft-thresholding solution given by (5). This proves the theorem. $\Box$

5. Numerical examples

We present numerical examples to illustrate the approximation performance of the method. We consider the following functions:
$$f_1(x) = x^2 \exp(-x^2/4), \qquad (6)$$
$$f_2(x) = \sin(x) \exp(-x^2/8), \qquad (7)$$
$$f_3(x) = x^2 \cos(x)/8, \qquad (8)$$
$$f_4(x) = x \sin(x). \qquad (9)$$
To estimate the computational error, we choose test points $\{t_i\}_{i=0}^{N}$ on the interval $[-4, 4]$ and then compute the errors by the following two formulae:
$$E_1(f) = \frac{1}{N} \sum_{i=0}^{N} \big| f^z_\gamma(t_i) - f'(t_i) \big|, \qquad E_2(f) = \Big( \frac{1}{N} \sum_{i=0}^{N} \big( f^z_\gamma(t_i) - f'(t_i) \big)^2 \Big)^{1/2}.$$
In the experiments, the sample points $\{x_i\}_{i=0}^{20}$ are uniformly distributed over $[-4, 4]$, i.e., $x_i = -4 + 0.4i$ ($0 \le i \le 20$). The parameters $s$ and $\gamma$ are chosen as $0.1$ and $0.001$, respectively. The resulting numerical results are shown in Figures 1 and 2, and the errors are listed in Table 1. From these figures, it can be observed that the function $f^z_\gamma$ matches the derivative function $f'_\rho$ well. Meanwhile, the sparsity can be seen explicitly from the number of non-zero coefficients in Table 1.

Function    E_1(f)    E_2(f)    rate of non-zero coefficients
f_1(x)      0.042     0.0473    10/21
f_2(x)      0.0222    0.0257    10/21
f_3(x)      0.0489    0.0868    10/21
f_4(x)      0.030     0.0349    10/21

Table 1  Errors and sparsity of the representations

6. Discussion

In this paper, we have studied a method for numerical differentiation in the framework of statistical learning theory. Based on empirical eigenfunctions, we proposed an $\ell^1$ regularized algorithm. We presented an approach for computing the empirical eigenfunctions explicitly and established the representer theorem of the algorithm. Compared with traditional methods for numerical differentiation, the output of our method can be considered directly as the derivative of the
underlying function. Moreover, the algorithm can produce sparse representations with respect to empirical eigenfunctions, without assuming sparsity in terms of any basis or system. Finally, this work leaves several open issues for further study. For example, it would be interesting to extend our method to the estimation of gradients in high-dimensional spaces.

Figure 1  (a) Approximate derivative of $f_1(x)$; (b) approximate derivative of $f_2(x)$

Figure 2  (a) Approximate derivative of $f_3(x)$; (b) approximate derivative of $f_4(x)$

Acknowledgements The authors are indebted to the anonymous reviewers for their careful comments and constructive suggestions.

References
[1] Jinquan CHENG, Y. C. HON, Yanbo WANG. A numerical method for the discontinuous solutions of Abel integral equations. Amer. Math. Soc., Providence, RI, 2004.
[2] S. R. DEANS. The Radon Transform and Some of Its Applications. Courier Corporation, 2007.
[3] E. G. HAUG. The Complete Guide to Option Pricing Formulas. McGraw-Hill Companies, 2007.
[4] M. HANKE, O. SCHERZER. Error analysis of an equation error method for the identification of the diffusion coefficient in a quasi-linear parabolic differential equation. SIAM J. Appl. Math., 1999, 59(3): 1012-1027.
[5] A. N. TIKHONOV, V. Y. ARSENIN. Solutions of Ill-Posed Problems. Winston, Washington, DC, 1977.
[6] R. S. ANDERSSEN, M. HEGLAND. For numerical differentiation, dimensionality can be a blessing! Math. Comp., 1999, 68(227): 1121-1141.
[7] T. J. RIVLIN. Optimally stable Lagrangian numerical differentiation. SIAM J. Numer. Anal., 1975, 12(5): 712-725.
[8] J. CULLUM. Numerical differentiation and regularization. SIAM J. Numer. Anal., 1971, 8(2): 254-265.
[9] Shuai LU, S. PEREVERZEV. Numerical differentiation from a viewpoint of regularization theory. Math. Comp., 2006, 75(256): 1853-1870.
[10] Ting WEI, Y. C. HON, Yanbo WANG. Reconstruction of numerical derivatives from scattered noisy data. Inverse Problems, 2005, 21(2): 657-672.
[11] N. ARONSZAJN. Theory of reproducing kernels. Trans. Amer. Math. Soc., 1950, 68(3): 337-404.
[12] F. CUCKER, Dingxuan ZHOU. Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge, 2007.
[13] S. MUKHERJEE, Dingxuan ZHOU. Learning coordinate covariances via gradients. J. Mach. Learn. Res., 2006, 7: 519-549.
[14] E. J. CANDÈS, J. ROMBERG, T. TAO. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 2006, 52(2): 489-509.
[15] Hongyan WANG, Quanwu XIAO, Dingxuan ZHOU. An approximation theory approach to learning with $\ell^1$ regularization. J. Approx. Theory, 2013, 167: 240-258.