Sparse Training Procedure for Kernel Neuron *

Jianhua XU, Xuegong ZHANG and Yanda LI

School of Mathematical and Computer Science, Nanjing Normal University, Nanjing 210097, Jiangsu Province, China, xujianhua@email.njnu.edu.cn

Department of Automation, Tsinghua University / State Key Laboratory of Intelligent Technology and Systems, Beijing 100084, China, zhangxg@mail.tsinghua.edu.cn

* This work is supported by the National Natural Science Foundation of China, project No. 6075007.

Abstract: The kernel neuron is the generalization of the classical McCulloch-Pitts neuron using Mercer kernels. In order to control the generalization ability and prune the structure of the kernel neuron, in this paper we construct a regularized risk functional that includes both an empirical risk functional and a Laplace regularization term. Based on the gradient descent method, a novel training algorithm is designed, which is referred to as the sparse training procedure for the kernel neuron. Such a procedure realizes the main ideas of kernel machines (e.g. support vector machines, kernel Fisher discriminant analysis, etc.): kernels, regularization (or large margin) and sparseness, and can deal with nonlinear classification and regression problems effectively.

Keywords: Kernel Neuron, Support Vector Machine, Sparseness, Regularization.

1. Introduction

In artificial neural networks the basic element is the McCulloch-Pitts (M-P) neuron [1]. Rosenblatt [2] proposed the first learnable procedure, the Perceptron, which could only deal with linearly separable cases as a simple linear classifier. In order to handle more complicated real-world problems, many models and their training procedures have been introduced, e.g. the back-propagation training method for the multilayer perceptron [3], the adaline with some nonlinear transform [4], and the radial basis function (RBF) net [5]. Recently, several kernel-based machines for nonlinear problems, such as support vector machines (SVM) [6-8], kernel Fisher discriminant analysis (KFD) [9], and the large margin kernel pocket algorithm [10], have been gaining more and more attention in nonlinear classifier design. There exist three attractive concepts behind them: the kernel idea, large margin or regularization, and sparseness. The kernel idea is an effective technique for realizing a nonlinear transform implicitly. XU et al. [11] introduced the kernel neuron by generalizing the M-P neuron through Mercer kernels, and constructed a simple training algorithm based on the gradient descent method. The kernel neuron and its training procedure can be considered as a unified framework for the three nonlinear techniques mentioned above in neural networks. The regularization technique developed by Tikhonov & Arsenin [12] handles ill-posed problems, and has been widely used in neural networks. It has been found that adding a proper regularization term to an objective functional can result in significant improvements in net generalization [13], and can also prune the structure of nets [14]. There are three usual regularization terms: the squared or Gaussian, the absolute or Laplace, and the normalized or Cauchy regularization term. With respect to the efficiency of supervised learning, Saito and Nakano [13] gave a detailed comparison of the three regularization terms and different learning algorithms, and pointed out that the combination of the squared regularization term and a second-order learning algorithm drastically improves the convergence and generalization ability. Williams [15] concluded that a Laplace regularization term is more appropriate than the Gaussian one from the viewpoint of net pruning. Ishikawa [16] used a Laplace regularizer to construct a simple but effective learning method, called structural learning with forgetting, in order to prune feedforward neural nets.
In this paper, in order to improve the generalization ability and to obtain a sparse discriminant or regression function for the kernel neuron, we add the Laplace regularization term to the original empirical risk functional defined in the paper of XU et al. [11]. Based on the gradient descent approach, a training algorithm is constructed. It is referred to as the sparse training procedure for the kernel neuron, and it realizes the three main ideas found in support vector machines. As a nonlinear technique, it can handle both nonlinear classification and regression problems effectively.

2. Definition of Kernel Neuron

This paper is devoted to two problems in neural networks: classification and regression. Let

{(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_l, y_l)}    (1)

be a training set of l i.i.d. samples, where x_i ∈ R^n. For the classification problem with binary classes (ω_1, ω_2), suppose

y_i = +1 if x_i ∈ ω_1, and y_i = -1 if x_i ∈ ω_2,    (2)

while for the regression problem, suppose y_i ∈ R, i = 1, ..., l.

The classical M-P neuron is defined as

o(x) = f((w · x) + b)    (3)

where o is the output of the neuron, x is the input vector, and w and b are the weight vector and threshold respectively. The function f is the transfer function. For the M-P neuron, f is a hard limiting function, i.e. the sign function. In neural networks, f is a continuously differentiable and monotone function, e.g. a sigmoid function or a linear function.

In the paper of Xu et al. [11], a kernel version of the M-P neuron is defined as

o(x) = f( Σ_{i=1}^{l} α_i k(x, x_i) + β )    (4)

where α_i ∈ R, i = 1, 2, ..., l are the coefficients corresponding to the samples, and k(x, x_i) is a kernel function satisfying the Mercer conditions, e.g. the polynomial kernel, the RBF kernel or the two-layer neural network kernel [7, 8]. Note that, generally speaking, the input-output relationship of the kernel neuron is nonlinear. Only when the transfer function is linear and the kernel function is the linear kernel (namely k(x, x_i) = x · x_i) is the relationship linear, and then the kernel neuron can be considered as an equivalent form of the M-P neuron. The kernel neuron utilizes nonlinear kernels to realize the nonlinear transform from the original input vector space (R^n) to the real number space (R).

3. Sparse Training Procedure for Kernel Neuron

For the kernel neuron, XU et al. [11] defined an empirical risk functional and constructed a training algorithm based on the standard gradient descent scheme. Such a training procedure only realizes the kernel idea; in particular, it is difficult to control the generalization ability and to obtain a sparse decision function. Adding a proper regularization term to the risk functional to decay the connection weights is a simple way to prune weights without complicating the learning algorithm much [14]. In the kernel neuron, such pruning implies that a sparse representation will occur, i.e. many α_i will be close to zero. At the same time, the regularization method can also improve the generalization ability and the convergence of the training procedure. Thus, we define a regularized risk functional consisting of the empirical risk (the sum of squared errors between the actual and desired outputs) and a Laplace regularization term, that is,

E(α, β) = (1/2) Σ_{i=1}^{l} [y_i - o(x_i)]² + μ Σ_{i=1}^{l} |α_i|    (5)

where α = [α_1, ..., α_l] and μ is the regularization parameter. Our goal now is to construct an effective algorithm that finds the coefficient vector α and the threshold β minimizing the risk functional (5). This can still be done by the standard gradient descent scheme. The gradient of (5) is

∂E/∂α_m = -Σ_{i=1}^{l} [y_i - f(u_i)] f'(u_i) k(x_m, x_i) + μ sgn(α_m),  m = 1, 2, ..., l    (6)

∂E/∂β = -Σ_{i=1}^{l} [y_i - f(u_i)] f'(u_i)

where u_i = Σ_{j=1}^{l} α_j k(x_j, x_i) + β is the net input of the kernel neuron for sample x_i, f'(u_i) is the first derivative of f evaluated at u_i, and sgn is the sign function.
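To make the definitions above concrete, the following is a minimal NumPy sketch of the kernel neuron output (4), the regularized risk (5) and its gradient (6). The RBF kernel, the tanh transfer function, the default parameter values and all function names are illustrative assumptions of ours rather than choices prescribed by the paper.

```python
import numpy as np

def rbf_kernel(x, z, width=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 * width^2)); the default width is illustrative."""
    return np.exp(-np.sum((np.asarray(x) - np.asarray(z)) ** 2) / (2.0 * width ** 2))

def kernel_neuron_output(x, X, alpha, beta, f=np.tanh, kernel=rbf_kernel):
    """Eq. (4): o(x) = f(sum_i alpha_i k(x, x_i) + beta). Also returns the net input u."""
    u = sum(a * kernel(x, xi) for a, xi in zip(alpha, X)) + beta
    return f(u), u

def regularized_risk(X, y, alpha, beta, mu, f=np.tanh, kernel=rbf_kernel):
    """Eq. (5): half the sum of squared errors plus the Laplace term mu * sum_i |alpha_i|."""
    errors = [y_i - kernel_neuron_output(x_i, X, alpha, beta, f, kernel)[0]
              for x_i, y_i in zip(X, y)]
    return 0.5 * np.sum(np.square(errors)) + mu * np.sum(np.abs(alpha))

def risk_gradient(X, y, alpha, beta, mu, f=np.tanh,
                  df=lambda u: 1.0 - np.tanh(u) ** 2, kernel=rbf_kernel):
    """Eq. (6): partial derivatives of E with respect to each alpha_m and to beta."""
    g_alpha = np.zeros(len(X))
    g_beta = 0.0
    for x_i, y_i in zip(X, y):
        o_i, u_i = kernel_neuron_output(x_i, X, alpha, beta, f, kernel)
        common = (y_i - o_i) * df(u_i)
        g_alpha -= common * np.array([kernel(x_m, x_i) for x_m in X])
        g_beta -= common
    g_alpha += mu * np.sign(alpha)
    return g_alpha, g_beta
```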

Like the back-propagation training algorithm in feedforward neural networks, we also use single-sample correction and add a momentum term to the iterative procedure. Therefore, a novel iterative procedure for the kernel neuron can be written as:

Algorithm-1 (SKN-1):
1. Let t = 0 and let α_m(0) and β(0) be arbitrary.
2. Pick up some sample x_i.
3. Set t = t + 1.
4. Calculate u_i(t) = Σ_{j=1}^{l} α_j(t-1) k(x_j, x_i) + β(t-1) and o_i(t) = f(u_i(t)).
5. Calculate
   Δα_m(t) = λ_1 [y_i - f(u_i(t))] f'(u_i(t)) k(x_m, x_i) - λ_2 sgn(α_m(t-1)) + λ_3 Δα_m(t-1)
   Δβ(t) = λ_1 [y_i - f(u_i(t))] f'(u_i(t)) + λ_3 Δβ(t-1)
6. Update
   α_m(t) = α_m(t-1) + Δα_m(t)
   β(t) = β(t-1) + Δβ(t)
7. If Σ_m |Δα_m(t)| + |Δβ(t)| < ε or t ≥ t_max, stop; otherwise go to step 2.

Here m = 1, 2, ..., l, λ_1 is the learning rate, λ_2 = λ_1 μ, λ_3 denotes the momentum parameter, ε is a threshold used to stop the algorithm, and t_max is the maximal number of iterations. Ishikawa [16] pointed out that such a weight decay is constant, in contrast to exponential decay [17], so that unnecessary connections fade away. This means that a large number of parameters become close to zero and sparseness appears. In particular, when λ_2 = λ_3 = 0 this approach reduces to the simple training procedure for the kernel neuron [11].

Ishikawa [16] also advised a selective pruning procedure in which only those connection weights decay whose absolute values are below a threshold θ after the training procedure listed above. Accordingly, another regularized risk functional for the kernel neuron can be constructed as

E(α, β) = (1/2) Σ_{i=1}^{l} [y_i - o(x_i)]² + μ Σ_{i: |α_i| < θ} |α_i|    (7)

Such an idea can improve the goodness of fit of the model and decay the small values further, which means that an even sparser representation can occur. The novel sparse training procedure is then comprised of two steps, as follows:

Algorithm-2 (SKN-2):
1. Run Algorithm-1 listed above.
2. Run Algorithm-1 again, but including the threshold θ for the α_i, i.e. minimizing (7).

In this paper, we call the two training algorithms above the sparse training procedures 1 and 2 for the kernel neuron, or simply SKN-1 and SKN-2, respectively. As in the sparse LS-SVM [18], we refer to {α_1, ..., α_l} as the spectrum of the kernel neuron. Sparseness means that many components of the spectrum are very close to zero. If such components are forced to zero, the final discriminant function changes little. In this paper, a proper threshold δ is set to force some components to zero. If |α_i| ≥ δ, the corresponding sample or vector is still called a support vector. Now the final discriminant or regression function can be represented as

f(x) = f( Σ_{i: |α_i| ≥ δ} α_i k(x, x_i) + β )    (8)

In the case where we handle a binary classification problem, for a new input vector x, if f(x) > 0 then x ∈ ω_1, otherwise x ∈ ω_2. For the regression problem, we take f(x) as the regression result.
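The following is a compact sketch of how Algorithm-1 and the pruning threshold δ of Eq. (8) might be implemented in NumPy. The hyperparameter defaults, the random order in which samples are picked, and the helper names are assumptions made here for illustration; the kernel argument can be any Mercer kernel, e.g. the rbf_kernel from the sketch above.

```python
import numpy as np

def train_skn1(X, y, kernel, mu=0.01, lam1=0.1, lam3=0.5, eps=1e-4, t_max=10000,
               f=np.tanh, df=lambda u: 1.0 - np.tanh(u) ** 2, seed=0):
    """Single-sample gradient descent with Laplace weight decay and momentum (Algorithm-1)."""
    l = len(X)
    alpha, beta = np.zeros(l), 0.0            # step 1: arbitrary initial values
    d_alpha, d_beta = np.zeros(l), 0.0        # previous corrections, for the momentum term
    lam2 = lam1 * mu                          # lambda_2 = lambda_1 * mu
    rng = np.random.default_rng(seed)
    for t in range(1, t_max + 1):
        i = rng.integers(l)                   # step 2: pick up some sample x_i
        k_col = np.array([kernel(X[m], X[i]) for m in range(l)])
        u_i = float(np.dot(alpha, k_col) + beta)           # step 4: net input
        err = (y[i] - f(u_i)) * df(u_i)
        d_alpha = lam1 * err * k_col - lam2 * np.sign(alpha) + lam3 * d_alpha   # step 5
        d_beta = lam1 * err + lam3 * d_beta
        alpha, beta = alpha + d_alpha, beta + d_beta        # step 6
        if np.sum(np.abs(d_alpha)) + abs(d_beta) < eps:     # step 7: stopping criterion
            break
    return alpha, beta

def support_vector_indices(alpha, delta=1e-3):
    """Samples kept by the pruning threshold delta of Eq. (8); the delta value is illustrative."""
    return np.flatnonzero(np.abs(alpha) >= delta)
```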

4. Experiment Results and Analysis

To evaluate the performance of our new training procedures, we devised three artificial data sets: a linearly separable case, a nonlinear case with ten misclassified samples, and a nonlinear regression case.

The example in Fig. 1 is a linearly separable case, in which there are 79 samples of two classes (marked by crosses and points in the figure) that can be classified without error by several linear classifiers. Fig. 1 illustrates the separation lines obtained by KN [11], SKN-1, SKN-2 and the SVMlight method [19] with the linear kernel k(x, y) = x · y, where the circles indicate the support vectors (SVs).

Fig. 1: Some separation hyperplanes from different algorithms for the linear case.

Fig. 2 shows the corresponding spectra of these learning approaches. In KN (Fig. 2(a)), the spectrum is obviously not sparse. In Fig. 2(b) and (c), a large number of components are close to zero and sparseness appears in the decision function. Fig. 2(d) illustrates the spectrum of the SVMlight method [19].

Fig. 2: The spectra from different approaches for the linear case.

For the nonlinear problem, we designed an example in which ten samples are misclassified by a linear classifier. Fig. 3 shows the decision boundaries from KN, SKN-1, SKN-2 and SVMlight with the linear kernel. Note that there are two contradictory samples, i.e. x_i = x_j with y_i ≠ y_j. We find that the number of support vectors from our sparse training algorithms is smaller than that from SVMlight (C = 1000, other parameters at their default values). This example demonstrates that our algorithms work well for the nonlinear problem.

Fig. 3: The separating hyperplanes from different algorithms for the nonlinear case.

For the nonlinear regression problem, the function f(x) = (1 - x + 2x²) exp(-0.5x²) is used [13]. In the experiment, the values of x are randomly generated in the range from -4 to +4, the corresponding function values are computed, and Gaussian noise with zero mean and variance 0.04 is added. The total number of training samples is 30.
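For reference, the synthetic regression data described above could be generated as in the short sketch below; the uniform sampling of x, the random seed and the variable names are our own illustrative assumptions, and the target function is the reconstruction given in the text.

```python
import numpy as np

def target(x):
    # Regression target used in Section 4 (as reconstructed in the text above)
    return (1.0 - x + 2.0 * x ** 2) * np.exp(-0.5 * x ** 2)

rng = np.random.default_rng(0)                    # arbitrary seed, for reproducibility
x_train = rng.uniform(-4.0, 4.0, size=30)         # 30 inputs drawn from [-4, +4]
y_train = target(x_train) + rng.normal(0.0, np.sqrt(0.04), size=30)  # zero-mean noise, variance 0.04
```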

When a radial basis function kernel with width 1.0 is used, Fig. 4(a) shows the regression result from SKN-2, where the solid curve is the regression result, the dashed line is the true function, and the black points are the actual samples. In Fig. 4(b), the result comes from SVM with an RBF kernel (width 0.4), C = 100.0 and ε = 0.1. In both (a) and (b), the circles denote support vectors. The number of support vectors from SKN-2 is smaller than that from SVM.

Fig. 4: A nonlinear regression example with SKN-2 and SVM.

The three artificial examples above show that the sparse training procedures work well for both nonlinear classification and regression problems. It is possible that our methods obtain a sparser function than SVM.

5. Discussions and Conclusions

The kernel neuron is the nonlinear generalization of the neuron with kernels. In order to control the generalization ability and obtain a sparse decision function, two regularized risk functionals are defined for the kernel neuron, consisting of the empirical risk functional and a Laplace regularization term. Based on the gradient descent scheme, two sparse training procedures are developed, i.e. SKN-1 and SKN-2. The new methods can be regarded as a general-purpose nonlinear learning machine, since they can be applied to both nonlinear pattern recognition and regression problems. Experiments on artificial data sets show that they work well on both linearly separable and non-separable data, and also on the regression problem. For the three usual kinds of kernel functions, our kernel neuron and its sparse training procedures can achieve performances similar to those of the multi-layer perceptron (two-layer neural network kernel), the radial basis function net (RBF kernel) and the adaline with nonlinear preprocessors (polynomial kernel). Furthermore, SKN saves us from designing hidden layer nodes, clustering the centers, constructing the polynomial transform, etc.

References

[1] McCulloch W. S., Pitts W. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133, 1943.
[2] Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 1958.
[3] Rumelhart D. E., Hinton G. E., Williams R. J. Learning representations by back-propagating errors. Nature, 323, 533-536, 1986.
[4] Specht D. F. Generation of polynomial discriminant functions for pattern recognition. IEEE Transactions on Electronic Computers, EC-16, 308-319, 1967.
[5] Theodoridis S., Koutroumbas K. Pattern Recognition. Academic Press, San Diego, 1999.
[6] Cortes C., Vapnik V. N. Support-vector networks. Machine Learning, 20(3), 273-297, 1995.
[7] Vapnik V. N. Statistical Learning Theory. Wiley, New York, 1998.
[8] Vapnik V. N. The Nature of Statistical Learning Theory (2nd ed.). Springer-Verlag, New York, 1999.
[9] Mika S., Rätsch G., Weston J., Schölkopf B., Müller K.-R. Fisher discriminant analysis with kernels. Neural Networks for Signal Processing IX, 41-48. IEEE Press, New York, 1999.
[10] Xu J., Zhang X., Li Y. Large margin kernel pocket algorithm. Proceedings of IJCNN 2001, 1480-1485, Washington DC, 2001.
[11] Xu J., Zhang X., Li Y. Kernel neuron and its training algorithm. Proceedings of the 8th International Conference on Neural Information Processing, Vol. 2, 861-866, Shanghai, China, Nov. 14-18, 2001, Fudan University Press.
[12] Tikhonov A. N., Arsenin V. Y. Solution of Ill-posed Problems. W. H. Winston, Washington DC, 1977.
[13] Saito K., Nakano R. Second-order learning algorithm with squared penalty term. Neural Computation, 12(3), 709-729, 2000.
[14] Reed R. Pruning algorithms - a survey. IEEE Transactions on Neural Networks, 4(5), 740-747, 1993.
[15] Williams P. M. Bayesian regularization and pruning using a Laplace prior. Neural Computation, 7(1), 117-143, 1995.
[16] Ishikawa M. Structural learning with forgetting. Neural Networks, 9(3), 509-521, 1996.
[17] Plaut D. C., Nowlan S. J., Hinton G. E. Experiments on learning by back propagation. Technical Report CMU-CS-86-126, Carnegie-Mellon University, 1986.
[18] Suykens J. A. K., Lukas L., Vandewalle J. Sparse least squares support vector machine classifiers. In 8th European Symposium on Artificial Neural Networks (ESANN 2000), 37-42, 2000.
[19] Joachims T. Making large-scale SVM learning practical. In Advances in Kernel Methods - Support Vector Learning, Schölkopf B., Burges C., and Smola A. (eds), 169-184, Cambridge, MA: MIT Press, 1999.