An Iterative Modified Kernel for Support Vector Regression

Fengqing Han, Zhengxia Wang, Ming Lei and Zhixiang Zhou
School of Science, Chongqing Jiaotong University, Chongqing City, China

Abstract—In order to improve the performance of support vector regression, a new method for modifying the kernel function is proposed. In this method the information of the whole sample set is incorporated into the kernel function by a conformal mapping, so the kernel function is data-dependent. Starting from random initial kernel parameters, the kernel is modified iteratively until a satisfactory effect is reached. Compared with the conventional model, the improved approach does not require careful selection of the kernel parameters. Simulation results show that the improved approach has better learning ability and forecasting precision than the traditional model.

Keywords—support vector regression, data-dependent kernel, iteration

I. INTRODUCTION

In recent years a new pattern classification algorithm called the Support Vector Machine (SVM), proposed by Vapnik [1], has emerged as a powerful methodology for solving a wide variety of problems in nonlinear classification, function estimation, and density estimation. Unlike traditional neural network models, which minimize the empirical training error, the SVM implements the structural risk minimization principle, which seeks to minimize both the training error and a confidence interval term. This results in good generalization error. Because of its good properties, such as a global optimal solution and good learning ability on small samples, the SVM has received increasing attention in recent years [2-4]. Moreover, it has been successfully extended to support vector regression (SVR), especially for nonlinear time series [5-7]. The performance of the SVM largely depends on the kernel. Smola [8] elucidated the relation between the SVM kernel method and standard regularization theory. However, there is no theory concerning how to choose a good kernel function in a data-dependent way. In [9], the authors proposed a method of modifying a kernel to improve the performance of a Support Vector Machine classifier.
The method is based on the Riemannian geometrical structure induced by the kernel function. In order to increase separability, the idea is to enlarge the spatial resolution around the separating boundary surface with a conformal mapping. A primary kernel is first used to obtain the support vectors; the kernel is then modified conformally, in a data-dependent way, using the information of the support vectors, and the final classifier is trained with the modified kernel. Inspired by this idea, Liang Yanchun [10] applied the modified kernel to SVR to forecast financial time series. These methods achieve better performance than the conventional SVM, but the parameters of the modified kernel must be chosen carefully. The goal of this paper is to propose a new method to train the SVM which modifies the kernel function repeatedly. Unlike traditional methods, which select the parameters of the kernel function carefully, the parameters of this algorithm are random. Simulation experiments show that the new method is clearly superior to the traditional SVM in prediction precision.

The remainder of this paper is organized as follows. Section 2 briefly introduces the learning of SVR. Section 3 provides a background on data-dependent kernels. Our method of training SVR by iterative learning is introduced in Section 4. Section 5 presents some computational results to show the effectiveness of our method. Section 6 concludes our work.

II. SUPPORT VECTOR REGRESSION

Let {(x_i, y_i)}, i = 1, 2, ..., m, be a given set of training data, where (x_i, y_i) ∈ R^n × R. The output of the SVR is

f(x) = <w, φ(x)> + b,    (1)

where w is the weight vector, b the bias, φ(x) the nonlinear mapping from the input space S to the high-dimensional feature space F, and <·,·> the inner product. The commonly used ε-insensitive loss function introduced by Vapnik is

L_ε(x_i) = |f(x_i) − y_i| − ε,  if |f(x_i) − y_i| ≥ ε,
           0,                   if |f(x_i) − y_i| < ε.    (2)

In order to train w and b, the following function is minimized:

978-1-4244-1674-5/08/$25.00 2008 IEEE    CIS 2008
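As a small illustration of the ε-insensitive loss in (2), it can be written directly in NumPy. The function name `eps_insensitive_loss` is our own, not from the paper:

```python
import numpy as np

def eps_insensitive_loss(f_x, y, eps):
    """Vapnik's epsilon-insensitive loss of eq. (2):
    residuals inside the eps-tube cost nothing,
    residuals outside it are penalized linearly."""
    r = np.abs(f_x - y)
    return np.where(r >= eps, r - eps, 0.0)

# A residual of 0.05 lies inside an eps = 0.1 tube; a residual of 1.0 does not.
print(eps_insensitive_loss(np.array([1.0, 2.0]), np.array([1.05, 1.0]), 0.1))
```

This tube-shaped loss is what makes the SVR solution sparse: training points whose residuals stay inside the tube contribute nothing to the expansion (4).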
min Z = (1/2)||w||² + C Σ_{i=1}^{m} (ξ_i + ξ_i*),
s.t.  y_i − <w, φ(x_i)> − b ≤ ε + ξ_i,
      −y_i + <w, φ(x_i)> + b ≤ ε + ξ_i*,
      ξ_i, ξ_i* ≥ 0,  i = 1, ..., m,    (3)

where C is the regularization constant determining the trade-off between the empirical error and the regularization term. After the introduction of the positive slack variables and Lagrange multipliers, (3) is equivalent to a standard quadratic programming (QP) problem, which can be solved with a QP solver. After (3) is optimized, (1) can be rewritten as

f(x) = Σ_{x_i ∈ SV} (α_i − α_i*) k(x_i, x) + b,    (4)

where α_i and α_i* are the optimized solutions of the QP problem and k(x, x') is the kernel function

k(x, x') = <φ(x), φ(x')>.    (5)

It should be pointed out that not every function can be taken as a kernel function in the SVR: it has been proved that the kernel function should satisfy the conditions of Mercer's theorem. The kernel function plays an important role in the SVR and has a great effect on the predicting precision. In the next section the kernel function is modified by a conformal mapping, which makes it data-dependent.

III. DATA-DEPENDENT KERNEL

In the traditional SVM and SVR there is no theory concerning how to choose the kernel function in a data-dependent way, while some time series, such as financial time series and weather data, are inherently noisy, non-stationary and deterministically chaotic. In order to improve the precision of forecasting, it is necessary to redefine the kernel function from the training data. In this section a background on data-dependent kernels is provided: the kernel function is modified in a data-dependent way based on information geometry [9-12].
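To make (5) and Mercer's condition concrete, the following sketch (our own illustration, not code from the paper) builds the Gram matrix of the Gaussian kernel used later in (11) and checks numerically that it is positive semidefinite:

```python
import numpy as np

def rbf_gram(X, Xp, sigma):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)), the Gaussian kernel of eq. (11)."""
    d2 = ((X[:, None, :] - Xp[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_gram(X, X, sigma=1.0)

# Mercer's condition on a finite sample: the Gram matrix is symmetric
# positive semidefinite, so all eigenvalues are non-negative up to round-off.
eigs = np.linalg.eigvalsh(K)
print(eigs.min() > -1e-10)  # True
```

The same check is a useful sanity test for any candidate kernel: a Gram matrix with a clearly negative eigenvalue cannot come from an inner product <φ(x), φ(x')>.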
From the viewpoint of geometry, the nonlinear mapping defines an embedding of the input space S into the feature space F as a curved submanifold. Generally F is a reproducing kernel Hilbert space (RKHS), a subspace of a Hilbert space, so a Riemannian metric [9] can be induced in the input space, and it can be expressed in closed form in terms of the kernel:

g_ij(x) = <∂φ(x)/∂x_i, ∂φ(x)/∂x_j> = ∂²k(x, x')/∂x_i ∂x'_j |_{x'=x}.    (6)

The volume form in the feature space is defined as

dV = √g(x) dx_1 dx_2 ... dx_n,    (7)

where g(x) = det(g_ij(x)) and x_1, x_2, ..., x_n are the components of x; the factor √g(x) represents how a local area is magnified in F under the mapping φ(x) [9]. After a conformal mapping is introduced into the kernel function, the new kernel function is defined as

k̃(x, x') = c(x) c(x') k(x, x'),    (8)

where c(x) is a positive conformal mapping function. It is easy to see that the new kernel function k̃(x, x') satisfies the conditions of Mercer's theorem. The conformal mapping is taken as [9]

c(x) = Σ_{i∈I} h exp(−||x − x_i||² / (2τ²)),    (9)

where I is the set of support vectors [9]. For the new kernel function k̃(x, x'), the Riemannian metric can be expressed as

g̃_ij(x) = c(x)² g_ij(x) + (∂c(x)/∂x_i)(∂c(x)/∂x_j).    (10)

One of the typical kernel functions is the Gaussian RBF kernel

k(x, x') = exp(−||x − x'||² / (2σ²)).    (11)

In this case we have

g_ij(x) = δ_ij / σ²,  where δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j.    (12)

In the neighborhood of a support vector x_i, we have

g̃(x) = (h^{2n} / σ^{2n}) exp(−n r² / τ²) (1 + σ² r² / τ⁴),    (13)

where r = ||x − x_i|| is the Euclidean distance between x and x_i.
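The conformal modification of (8)-(9) has a simple finite-sample reading: it multiplies the Gram matrix elementwise by c(x_i)c(x_j), i.e. K̃ = D K D with D = diag(c), so positive semidefiniteness (Mercer's condition) is preserved. A sketch under our own naming, with h and τ chosen arbitrarily:

```python
import numpy as np

def conformal_factor(X, centers, h, tau):
    """c(x) = h * sum_i exp(-||x - x_i||^2 / (2 tau^2)), as in eq. (9)/(14);
    `centers` is the support vector set in (9) or the whole training set in (14)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return h * np.exp(-d2 / (2.0 * tau ** 2)).sum(axis=1)

def modified_gram(K, c):
    """k~(x_i, x_j) = c(x_i) c(x_j) k(x_i, x_j), eq. (8): K~ = diag(c) K diag(c)."""
    return np.outer(c, c) * K

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 2))
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-d2 / (2.0 * 0.5 ** 2))          # base RBF kernel of (11), sigma = 0.5
c = conformal_factor(X, X, h=0.1, tau=1.0)  # c(x) is strictly positive
K_mod = modified_gram(K, c)

# The conformally modified Gram matrix stays positive semidefinite.
print(np.linalg.eigvalsh(K_mod).min() > -1e-10)  # True
```

Because c(x) > 0, the rescaling changes the local magnification √g(x) without ever producing an invalid kernel, which is what licenses the repeated modification used in the next section.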
IV. ITERATIVE MODIFIED ALGORITHM

In order to improve the performance of the SVM classifier [9], the kernel function is modified such that the factor √g(x) is enlarged around the support vectors. In SVR, however, the kernel function should be modified such that √g(x) is compressed near the support vectors. In order to make sure the magnification is compressed, let

c(x) = Σ_{i∈I} h exp(−||x − x_i||² / (2τ²)),    (14)

k_s(x, x') = c^s(x) c^s(x') k(x, x'),    (15)

g_{s,ij}(x) = ∂²k_s(x, x')/∂x_i ∂x'_j |_{x'=x},    (16)

where s is a positive integer and I is now the whole training data set. The summation in (14) runs over all the training data, so that the training procedure is simplified. In this paper the RBF kernel is adopted. For a point x near the training point x_i, we have

g_s(x) = det(g_{s,ij}(x)) = (h^{2ns} / σ^{2n}) exp(−ns r² / τ²) (1 + s² σ² r² / τ⁴).    (17)

It is easy to check the following conclusions:
1. When h < min{σ, 1}, g_s(x) is compressed near the training data x_i for s = 1, 2, ....
2. When 1 ≤ h < e, g_s(x) is compressed near the training data x_i for s > M, for some positive integer M.
3. g_s(x) is compressed for x far away from every training point x_i.

In summary, the training process of the new method is as follows.
Step 1. Give random values to σ, ε, C, h, τ and the tolerance error;
Step 2. Let k_0(x, x') = k(x, x'), j = 0;
Step 3. Train the SVR with k_j(x, x') and compute the figure of merit δ = Σ_i (y_i − f(x_i))² / Σ_i (y_i − ȳ)²;
Step 4. If δ > error, then let k_{j+1}(x, x') = c(x) c(x') k_j(x, x'), j = j + 1, and return to Step 3; else exit.

After the procedure finishes, the modified kernel is

k_s(x, x') = c^s(x) c^s(x') k(x, x'),    (18)

where s is the final iteration number.

V.
SIMULATION

In this section we carry out some simulations to evaluate the performance of the method, comparing the iterative modified SVR and the traditional SVR on the same examples with the same training parameters in the optimization problem (3). To assess the approximation results and the generalization ability, a figure of merit is defined as

δ = Σ_i (y_i − f(x_i))² / Σ_i (y_i − ȳ)²,    (19)

where {(x_i, y_i), i = 1, ..., m} is the test set and ȳ = Σ_i y_i / m.

A. Approximation of a One-dimensional Function

For comparison we show the results with the iterative modified SVR and with the traditional SVR. The solid lines represent the original function and the dashed lines show the approximations. Fig. 1 shows the results for the approximation of the function f(x) = sin x + (sin 3x)/3 − sin(x/2), x ∈ [0, 2π]. Approximation results for three different values of h and τ are given in Table I, where the parameters of the traditional SVR and its testing error are σ = 0.1, ε = 0.1, C = 0.05 and δ = 0.895, respectively. The figure of merit for each approximation is computed on a uniformly sampled test set of 16 points. The input x is drawn from the uniform distribution on the interval [0, 2π]; the training and test data consist of 63 points and 16 points, respectively. Fig. 2 shows another approximation of the same function, where the parameters of the traditional SVR and its testing error are σ = 0.1, ε = 0.1, C = 0.1 and δ = 0.790, respectively. This shows that the modified method possesses better generalization performance than the traditional SVR.
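The training loop of Steps 1-4 in Section 4, scored with the figure of merit (19), can be sketched as follows. The paper trains the SVR by solving the QP of (3); that solver is not reproduced here, so this sketch substitutes kernel ridge regression on the precomputed Gram matrix as a stand-in trainer. The function names, the regularizer lam and the sampling below are our own assumptions, not the paper's exact setup:

```python
import numpy as np

def fig_of_merit(y, f):
    """delta = sum_i (y_i - f(x_i))^2 / sum_i (y_i - ybar)^2, eq. (19)."""
    return ((y - f) ** 2).sum() / ((y - y.mean()) ** 2).sum()

def iterative_modified_fit(X, y, sigma, h, tau, lam=1e-6, tol=0.05, max_iter=10):
    """Steps 1-4: start from the base RBF kernel and repeatedly rescale it
    by the conformal factor c(x) of eq. (14) until delta <= tol."""
    m = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))                 # Step 2: k_0 is the RBF kernel (11)
    c = h * np.exp(-d2 / (2.0 * tau ** 2)).sum(axis=1)   # c(x_i) over the whole training set, eq. (14)
    delta = np.inf
    for j in range(max_iter):
        alpha = np.linalg.solve(K + lam * np.eye(m), y)  # Step 3: ridge stand-in for the QP trainer
        delta = fig_of_merit(y, K @ alpha)
        if delta <= tol:                                 # Step 4: stop, or modify the kernel again
            break
        K = np.outer(c, c) * K                           # k_{j+1}(x, x') = c(x) c(x') k_j(x, x')
    return alpha, K, delta

# One-dimensional benchmark function, as reconstructed above (illustrative sampling).
x = np.linspace(0.0, 2.0 * np.pi, 63)
y = np.sin(x) + np.sin(3.0 * x) / 3.0 - np.sin(x / 2.0)

# Sanity check on (19): the trivial mean predictor scores exactly delta = 1,
# so values well below 1 indicate genuine explanatory power.
assert fig_of_merit(y, np.full_like(y, y.mean())) == 1.0

alpha, K_fin, delta = iterative_modified_fit(x[:, None], y, sigma=0.5, h=0.1, tau=1.0)
print("figure of merit after training:", float(delta))
```

Swapping the ridge step for an ε-insensitive QP solver recovers the paper's procedure; the kernel-update and stopping logic are unchanged.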
B. Approximation of Multi-dimensional Functions

Table II shows the approximation results for an earthquake wave, which is treated as a time series. The training data set is {(x_i, y_i) ∈ R³ × R, i = 1, 2, ..., 50}, a historical lag of order 3. The figure of merit for each approximation is computed on a uniformly sampled test set of 50 points. Fig. 3 illustrates part of the original data and its approximations by the traditional SVR and the modified SVR. From the comparison we can see that the modified method possesses better generalization performance than the traditional SVR.

TABLE I. APPROXIMATION RESULTS AND PARAMETERS OF THE SINGLE-VARIABLE FUNCTION

Parameters of the       Parameters of the     Testing error of the   Testing error of the   Testing error of the
traditional SVR         modified SVR          1st modified SVR       2nd modified SVR       10th modified SVR
σ = 0.1, ε = 0.1,       h = 0.1,  τ = 1       0.5719                 0.1598                 0.1014
C = 0.05, δ = 0.895     h = 0.1,  τ = 0.8     0.681                  0.3491                 0.1014
                        h = 0.08, τ = 0.8     0.7841                 0.6099                 0.1439
σ = 0.1, ε = 0.1,       h = 0.8,  τ = 1       0.1044                 0.1038                 0.1038
C = 0.1, δ = 0.790

a. The approximation result with the traditional SVR.
b. The approximation results with the first and second modified SVR.
c. The approximation result with the tenth modified SVR.

Figure 1. Approximation results for the one-dimensional function.

Figure 2. Approximation results with the traditional and the first modified SVR.

TABLE II. APPROXIMATION RESULTS AND PARAMETERS OF THE MULTI-DIMENSIONAL FUNCTION

Parameters of the       Parameters of the    Testing error of the   Testing error of the   Testing error of the
traditional SVR         modified SVR         traditional SVR        1st modified SVR       2nd modified SVR
σ = 0.04, ε = 0.01,     h = 4, τ = 0.06      0.9753                 0.1931                 0.19
C = 2
Figure 3. Approximation results for the multi-dimensional function.

VI. CONCLUSION

In this paper we present an iterative modified kernel to improve the performance of SVR. It is based on the conformal mapping of information geometry, which results in a data-dependent kernel. The whole training data set is used in the construction of the conformal mapping instead of the support vectors only. The idea is to compress the spatial resolution so that the forecasting performance is improved. Importantly, the kernel is modified by iteration and the parameters of the kernel can be set to random values. Simulation results show the effectiveness and the generalization ability of this method.

ACKNOWLEDGMENT

This work is supported by NSF Grant #50578168, Natural Science Foundation Project of CQ CSTC 2007BB2396 and KJ070403.

REFERENCES

[1] Vapnik V, The Nature of Statistical Learning Theory, translated by Zhang Xue-gong, Beijing: Tsinghua University Press, 2000, pp. 9-136.
[2] Scholkopf B, Sung K, Burges C, "Comparing support vector machines with Gaussian kernels to radial basis function classifiers," IEEE Trans. Signal Processing, 1997, vol. 45, pp. 2758-2765.
[3] Perez-Cruz F, Navia-Vazquez A, Figueiras-Vidal A R, "Empirical risk minimization for support vector classifiers," IEEE Trans. on Neural Networks, 2003, vol. 14, pp. 296-303.
[4] Belhumeur P N, Hespanha J P, Kriegman D J, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, 1997, vol. 19, pp. 711-720.
[5] Cao Li-juan, Tay Francis E H, "Financial forecasting using support vector machines," Neural Computing & Applications, 2001, vol. 10, pp. 184-192.
[6] Tay Francis E H, Cao Li-juan, "ε-descending support vector machines for financial time series forecasting," Neural Processing Letters, 2002, vol. 15, pp. 179-195.
[7] Yang Hai-qin, Chan Lai-wan, King Irwin, "Support vector machine regression for volatile stock market prediction," Proceedings of Intelligent Data Engineering and Automated Learning, Berlin: Springer-Verlag, 2002, pp. 391-396.
[8] Smola A J, Schölkopf B, Müller K-
R, "The connection between regularization operators and support vector kernels," Neural Networks, 1998, vol. 11, pp. 637-649.
[9] Amari S, Wu S, "Improving support vector machine classifiers by modifying kernel functions," Neural Networks, 1999, vol. 12, pp. 783-789.
[10] Liang Yan-chun, Sun Yan-feng, "An improved method of support vector machine and its applications to financial time series forecasting," Progress in Natural Science, 2003, vol. 13, pp. 696-700.
[11] Campbell C, "Kernel methods: a survey of current techniques," Neurocomputing, 2002, vol. 48, pp. 63-84.
[12] Amari S, Wu S, "An information-geometrical method for improving the performance of support vector machine classifiers," Proceedings of the International Conference on Artificial Neural Networks '99, London: IEE Conference Publication No. 470, 1999, pp. 85-90.