Effects of Moving the Centers in an RBF Network

IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 6, NOVEMBER 2002

Chitra Panchapakesan, Marimuthu Palaniswami, Senior Member, IEEE, Daniel Ralph, and Chris Manzie

Abstract: In radial basis function (RBF) networks, the placement of centers is said to have a significant effect on the performance of the network. Supervised learning of the center locations has, in some applications, been shown to be superior to locating the centers by unsupervised methods. But such networks can take the same training time as sigmoid networks: the increased time needed for supervised learning offsets the short training time of regular RBF networks. One way to overcome this may be to train the network with a set of centers selected by unsupervised methods and then to fine-tune the locations of the centers. This can be done by first evaluating whether moving the centers would decrease the error and then, depending on the required level of accuracy, changing the center locations. This paper provides new results on bounds for the gradient and Hessian of the error, considered first as a function of the independent set of parameters, namely the centers, widths, and weights, and then as a function of the centers and widths, where the linear weights are now functions of the basis-function parameters, for networks of fixed size. Moreover, bounds for the Hessian are also provided along a line beginning at the initial set of parameters. Using these bounds, it is possible to estimate how much one can reduce the error by changing the centers. Further, a step size can be specified to achieve a guaranteed amount of reduction in error.

Index Terms: Generalized methods, gradient methods, Hessian matrices, intelligent networks, learning systems, neural-network architecture, nonlinear estimation.

I. INTRODUCTION

RADIAL BASIS function (RBF) networks are being used for function approximation, pattern recognition, and time-series prediction problems. To mention a few features, such networks have the universal approximation property [5], arise naturally as regularized solutions of ill-posed problems [3], and are dealt with well in the theory of interpolation [4]. Their simple structure enables learning in stages and gives a reduction in the training time, and this has led to the application of such networks to many practical problems. The adjustable parameters of such networks are the receptive-field centers (the locations of the basis functions), the widths (the spread), the shape of the receptive field, and the linear output weights. The problem of determining the number of hidden nodes (or the number of basis functions) required for any given practical problem is continually being tackled in the literature. Fixing the network size before training, growing the architecture incrementally to achieve a needed level of accuracy, pruning to remove irrelevant units, or combining growing with pruning are some of the ways by which an optimal size for the network is determined.

Manuscript received August 31, 1999; revised June 27, 2000 and May 15. This work was supported by a special research grant from The University of Melbourne. C. Panchapakesan, M. Palaniswami, and C. Manzie are with the Department of Electrical and Electronic Engineering, The University of Melbourne, Melbourne, Vic. 3010, Australia. D. Ralph is with the Judge Institute of Management Studies, University of Cambridge, Cambridge CB2 1AG, U.K.
By locating one basis function at each training input, it is possible to interpolate or to get a regularized solution (improving generalization) [3]. But, in general, it is desirable to have small networks that can generalize better and are faster to train. This calls for an optimal positioning of the basis functions, i.e., of the locations of the centers. This paper looks into the problem of learning the centers in an RBF network. In Section II, a summary of how centers are selected is given before the description of the problem, which is covered in Section III. In Sections IV and V, some analytical results are given and their implications are discussed. Section VI presents simple numerical examples to illustrate the theory.

II. SELECTION OF CENTERS IN RBF NETWORKS

Centers in RBF networks are in general located in one of the following ways. A set of grid points in the input space is selected [2]; in this method, the number of basis functions required would be quite large for high-dimensional input spaces. Centers can be selected as a random subset of the training samples; without prior knowledge about the prototype vectors, the number of centers needed to represent the data would be large. By using the k-means clustering algorithm, learning vector quantization, or one of its variants, an optimal set of centers can be located [10], [11]; these methods are based on locating the dense regions of the training inputs, and the centers are the means (averages) of the vectors in such regions.

All the above methods are based on the distribution of the training inputs alone and do not take into consideration the output values, which do influence the positioning of the centers, especially when the variation of the output in a cluster is high. So centers are also selected based on both the input and output data, as follows. A set of training samples that explains the variation in the output in an optimal sense is selected using forward subset selection, regularization, and cross-validation: starting with an empty subset, one gradually selects those centers whose contribution toward reducing the error is appreciably large. An efficient procedure to achieve the same result is to use the orthogonal least squares method. To avoid overfitting, regularization and cross-validation are used [7], [8]. Another option is k-means clustering that involves both input and output values [9]. Finally, the center vectors can be learned using backpropagation [6], [16].
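To make the unsupervised option concrete, the following is a minimal sketch, in Python with NumPy, of selecting centers by a k-means-style clustering of the training inputs and choosing a single shared width from the resulting center spread. It is not taken from the paper; the function names, the width heuristic, and the library choice are assumptions made only for illustration.

    import numpy as np

    def kmeans_centers(X, m, iters=100, seed=0):
        # Pick m centers by Lloyd-style k-means on the training inputs X (shape (N, d)).
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=m, replace=False)].astype(float)
        for _ in range(iters):
            # Assign every input to its nearest center.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each center to the mean of the inputs assigned to it.
            for j in range(m):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return centers

    def shared_width(centers):
        # One common heuristic (an assumption, not the paper's choice), for m >= 2:
        # width proportional to the maximum separation between centers.
        m = len(centers)
        dmax = max(np.linalg.norm(a - b) for a in centers for b in centers)
        return dmax / np.sqrt(2.0 * m)

Centers obtained this way depend only on the input distribution, which is precisely the limitation noted above that motivates also using the output values or fine-tuning the centers by supervised learning.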

In the last of these approaches, following the generalized RBFs (GRBFs) suggested by Girosi and Poggio, supervised learning of the centers and the linear weights is considered in the NETtalk domain (see [6]). This approach is reported to have a generalization ability superior to that of sigmoidal networks as well as of RBF networks whose centers are determined using unsupervised methods. In the case of on-line learning, the structure of the network is also allowed to change: depending on the novelty of the input, new centers are added and fine-tuned using backpropagation or other methods [2], [12]. Growing cell structures are also used in deciding when and where to add new centers, based on the accumulated error [13]. When centers are selected by one of the above unsupervised methods, the linear output weights can be adapted using either the delta rule or calculated as the solution of an overdetermined system.

III. MATHEMATICAL DESCRIPTION OF THE PROBLEM

An RBF network is a feedforward network with a single layer of hidden units that are fully connected to the linear output units. The output units form a linear combination of the basis (or kernel) functions computed by the hidden-layer nodes. The activations of such hidden units 1) decrease monotonically with the distance from a central point or prototype (local) and 2) are identical for inputs that lie at a fixed radial distance from the center (radially symmetric). We are trying to approximate a function with an RBF network of the following structure: each basis function has a center and a width, a vector of linear output weights combines the basis-function responses, and a fixed number of basis functions is used. We concatenate the centers into one vector and the widths into another; the output of the network at an input vector is then the weighted sum of the basis-function responses at that input.

Let a set of training pairs be given, together with the vector of desired outputs. For each training pair, an arbitrary nonnegative weighting value is chosen in order to emphasize certain domains of the input space, and the error is the correspondingly weighted sum of squared differences between the network outputs and the desired outputs. Henceforth, assume that the matrix formed from the basis-function responses at the training inputs is invertible in the sense required below.

The next well-known result shows how to obtain the optimal weights for given centers and widths, provided the relevant matrix is invertible [15].

Lemma 3.2: If the associated normal-equations matrix is invertible, then the weighted least-squares problem for the output weights has a unique solution, given explicitly in terms of the design matrix, the weighting values, and the desired outputs.

The error function can be viewed either as a function of all the parameters (centers, widths, and weights) or as a function of the centers and widths alone, with the weights set to the optimal vector of the previous lemma; depending on which view is taken, the error denotes one or the other of these two functions. Given an initial set of parameters, the linear approximation of the error near that point is its first-order Taylor expansion, involving the gradient of the error there. The closed unit ball in the parameter space is also used below. The next two results are a straightforward application of nonlinear analysis; see [14].

Lemma 3.3: Let the setting be as in Lemma 3.1. Then, for some constants and within a suitable neighborhood, the gap between the error function and its linear approximation is bounded by a quadratic in the size of the parameter change.

The above bound on the gap between the error function and its linear approximation can be used to show how the error function decreases along the direction of steepest descent.

Lemma 3.4: In the situation of Lemma 3.3, for each admissible step size along the steepest-descent direction, the error decreases by a calculable amount. The point of this result is to provide a guaranteed and calculable way of decreasing the error along the steepest-descent direction. This result can be directly applied in computer code, as we demonstrate in Section VI.

Subsequently, the following expression for the error will be used.

Lemma 3.5: The error has the form of the weighted sum of squared residuals described above.

In all of our discussions we restrict ourselves to the 1- and ∞-norms of matrices, defined as follows: for any matrix A, the 1-norm is the maximum of the 1-norms of its columns, and the ∞-norm is the maximum of the 1-norms of its rows.
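To fix ideas, here is a minimal sketch, again in Python with NumPy and not taken from the paper, of the quantities just described for Gaussian basis functions: the matrix of basis-function responses at the training inputs, the weighted least-squares output weights in the spirit of Lemma 3.2, and the weighted sum-of-squares error. The particular Gaussian scaling, the variable names, and the helper structure are assumptions made for illustration.

    import numpy as np

    def design_matrix(X, centers, widths):
        # Phi[j, i] = exp(-||x_j - c_i||^2 / widths[i]^2): Gaussian responses.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / widths[None, :] ** 2)

    def optimal_weights(Phi, y, v):
        # Weighted least squares: minimize sum_j v_j * (y_j - Phi[j] @ w)^2,
        # assuming the weighted normal-equations matrix is invertible.
        V = np.diag(v)
        return np.linalg.solve(Phi.T @ V @ Phi, Phi.T @ V @ y)

    def weighted_error(X, y, v, centers, widths, w):
        # E = sum_j v_j * (y_j - network output at x_j)^2.
        r = y - design_matrix(X, centers, widths) @ w
        return float(v @ r ** 2)

Viewing the error as a function of the centers and widths alone corresponds to recomputing optimal_weights whenever the centers or widths change; that is the second point of view taken up in Section IV.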
Lemma 3.1 [14]: Let a continuous matrix-valued mapping of the parameters be given and let the matrix be invertible at the initial point. Then there exist positive constants such that, in an ε-neighborhood of that point, the matrix remains invertible and the norm of its inverse is bounded by one of these constants.

We need a little more notation to discuss the derivatives of the error (in either of its two views). Recall the centers, the widths, and the weights, and define the corresponding index sets for the three groups of parameters.
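Because the explicit derivative formulas did not survive in this transcription, the following sketch records the standard chain-rule partial derivatives of the weighted error with respect to the centers and widths for the Gaussian basis of the previous sketch, with the output weights held fixed. The notation and the helper layout are ours, not the paper's, and the derivative convention follows the exp(-||x - c||^2 / width^2) form assumed above.

    import numpy as np

    def error_gradients(X, y, v, centers, widths, w):
        # E = sum_j v_j r_j^2 with r_j = y_j - sum_i w_i g_i(x_j).
        diff = X[:, None, :] - centers[None, :, :]             # (N, m, d)
        d2 = (diff ** 2).sum(axis=2)                           # (N, m)
        G = np.exp(-d2 / widths[None, :] ** 2)                 # responses g_i(x_j)
        r = y - G @ w                                          # residuals (N,)
        # dE/d(theta_i) = sum_j (-2 v_j r_j) * w_i * dg_i(x_j)/d(theta_i).
        coef = (-2.0 * v * r)[:, None] * w[None, :] * G        # (N, m)
        # dg_i/dc_i = g_i * 2 (x_j - c_i) / widths[i]^2.
        grad_centers = (coef[:, :, None] * 2.0 * diff
                        / widths[None, :, None] ** 2).sum(axis=0)
        # dg_i/dwidth_i = g_i * 2 ||x_j - c_i||^2 / widths[i]^3.
        grad_widths = (coef * 2.0 * d2 / widths[None, :] ** 3).sum(axis=0)
        return grad_centers, grad_widths

These are the partial derivatives that the gradient vector of the next section collects, and whose magnitudes Theorems 4.1 and 4.2 bound over an ε-neighborhood.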

IV. SOME NEW RESULTS ON THE ERROR FUNCTION

We provide bounds on the gradient and the Hessian of the error function viewed as a function of the centers, widths, and weights. Throughout this section, the training vectors, the corresponding output vector, and the weighting vector are given, and the basis function is a given radially symmetric function. We are also given an initial set of centers, widths, and weights, and a radius; the open ball of that radius about the initial parameter vector is the neighborhood considered. In this neighborhood, let a bound on the basis function be given, and let respective bounds be given for its partial derivatives with respect to the centers and widths. Similarly, bounds are assumed on the second derivatives with respect to the centers and widths and on the mixed partials. We will also define further bounds for use in this section.

By a generic parameter we denote one of several quantities, depending on which index set it is selected from; the notation indicates that the index belongs to one of the sets corresponding to the centers, the widths, or the weights. Note that the cardinalities of these index sets correspond to the numbers of center, width, and weight parameters, respectively, and we make use of them in some of the results that follow. With this notation, the gradient vector and the Jacobian take the forms shown in (1), and the Hessian matrix of the error is given by expression (2).

Lemma 4.1: In an ε-neighborhood of the initial parameters, the Jacobian of the basis-function (design) matrix satisfies explicit bounds in the 1- and ∞-norms.
Proof: We use the notation given at the end of Section III. The second inequality can be proved in a similar way.

Let the bounds for the 1- and ∞-norms of the matrices involved be denoted as above; we will find explicit bounds in Lemma 4.2.

Theorem 4.1: In an ε-neighborhood of the initial parameters, the gradient of the error function satisfies explicit bounds in terms of the quantities defined above.

Theorem 4.2: In an ε-neighborhood of the initial parameters, the Hessian of the error function satisfies an explicit bound.
Proof: The Hessian of the error is given by expression (2); substituting the bounds above into it gives the result. See Appendix II for the calculations.
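Since the explicit bound expressions are not reproduced here, the following lines record, in our own notation, the standard way a Hessian bound of the type in Theorem 4.2 is turned into the guaranteed decrease promised by Lemma 3.4; the symbols E, p_0, M, and epsilon are introduced only for illustration and are not the paper's.

    If $\|\nabla^2 E(p)\| \le M$ for all $p \in B(p_0,\varepsilon)$, let
    $u = -\nabla E(p_0)/\|\nabla E(p_0)\|$ be the unit steepest-descent direction.
    Then for any step $0 \le t \le \varepsilon$,
    $$ E(p_0 + t\,u) \;\le\; E(p_0) \;-\; t\,\|\nabla E(p_0)\| \;+\; \tfrac{M}{2}\,t^{2}, $$
    so the choice $t^{*} = \min\bigl(\varepsilon,\ \|\nabla E(p_0)\|/M\bigr)$ guarantees
    $$ E(p_0) - E(p_0 + t^{*}u) \;\ge\; \tfrac{1}{2}\,t^{*}\,\|\nabla E(p_0)\|. $$

In the paper's setting, the constant playing the role of M comes either from Theorem 4.2 (a bound over a ball) or from the directional bound of Section V, which is exactly the distinction between update methods 1 and 2 in Section VI.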

Using these partial derivatives in the expression for the Hessian, the elements of the matrices involved are bounded; substituting these bounds, and the bounds on the corresponding Jacobians, into that expression gives the necessary bounds. The detailed calculation is given in Appendix I.

Let the matrix function given in Section III be invertible, and let the corresponding optimal basis weights be as in Lemma 3.2, with their Jacobian matrix evaluated at the initial parameters. Let bounds be given, respectively, for this matrix function and for its first and second partial derivatives in an ε-neighborhood of the initial parameters.

Lemma 4.2: In an ε-neighborhood of the initial parameters, explicit bounds hold for the design matrix, for the optimal-weight vector, for the Jacobian of the optimal-weight mapping, and for the related quantities. See Appendix III for the proof.

Lemma 4.3: In an ε-neighborhood of the initial parameters, further bounds hold on the inverse matrix and on the quantities built from it, provided the radius is chosen small enough for Lemma 3.1 to apply.
Proof: Choose the radius accordingly. From this we get the bounds using Lemma 3.1 and its proof; the remaining bounds follow from the expression for the optimal weights in Lemma 3.2 and its Jacobian.

Lemma 4.4: In an ε-neighborhood of the initial parameters, the bounds for the Jacobian of the optimal-weight mapping are given below. Plugging the values of the norms into the above expressions and using Lemma 4.1 gives a bound in a neighborhood of the initial parameters.

Theorems 4.3 and 4.4 are shown along similar lines to the preceding proofs, by substituting the appropriate bounds from the preceding results.

Theorem 4.3: In an ε-neighborhood of the initial parameters, the gradient of the error function, now viewed as a function of the centers and widths alone, satisfies an explicit bound. By using this bound, it will be possible to find out whether a change in the centers and widths is desirable. In that case, the result in Theorem 4.2 can be used to fix a step size to get a guaranteed amount of decrease in the error.

Theorem 4.4: In an ε-neighborhood of the initial parameters, the Hessian of the error function in the same reduced view is bounded above by an explicit expression.
Proof: The bounds follow from the dimensions of the matrices involved, whose nonzero elements are bounded by the quantities above. For the proofs of the norms of the auxiliary quantities, which are similar in their derivations, refer to Appendix IV. Using these bounds, together with the bounds from the lemmas, in the expression (2) for the second derivative of the error gives the necessary result.
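For the reduced view used in Theorems 4.3 and 4.4, where the weights are the optimal ones for the current centers and widths, a convenient fact is that the gradient of the reduced error coincides with the partial gradient of the full error evaluated at those optimal weights, because the derivative with respect to the weights vanishes at the least-squares solution. The following sketch, built on the hypothetical helpers introduced earlier (design_matrix, optimal_weights, error_gradients), is our own illustration of that observation rather than anything stated in the paper.

    import numpy as np

    def reduced_error_gradients(X, y, v, centers, widths):
        # Gradient of E(centers, widths) with the weights set to their optimal values.
        Phi = design_matrix(X, centers, widths)
        w_star = optimal_weights(Phi, y, v)
        # At w_star the derivative of the error w.r.t. the weights is zero, so the
        # chain-rule term through the weights drops out and the partial gradients
        # with the weights held fixed give the gradient of the reduced error.
        return error_gradients(X, y, v, centers, widths, w_star)

This is the gradient that would be driven toward zero when fine-tuning centers and widths while always re-solving for the linear weights.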

V. BOUNDS FOR THE HESSIAN OF THE ERROR ALONG A GIVEN DIRECTION IN THE CASE OF GAUSSIAN RBF

In Section IV, bounds for the Hessian in an ε-ball have been given, which can be used in finding the proper step size referred to in Lemma 3.4. But the approximation of the error in the descent direction suggests that a bound for the Hessian along the direction in which one moves is sufficient to get the result. We illustrate this when Gaussians with fixed widths are used for approximating a function. The bounds for the Hessian in a given direction can be given as follows. Let the notation given at the start of Section IV hold here. Let us fix a direction, possibly a unit vector, along which we want to move from the initial parameters. For instance, we can select it to be the normalized negative gradient of the error at the initial set of centers, with fixed widths and initial weights. We restrict our attention to the ray starting at the initial point along this direction, with the step length kept within the radius of Section IV, or equivalently within the corresponding interval. We abuse notation slightly by writing the error as a function of the scalar step length along this ray, and we will determine a constant that bounds its second derivative there.

The next result gives such a bound; it relies on certain constants whose construction is given explicitly in the proof, and the remark following the proof outlines how we compute these constants in practice.

Theorem 5.1: For steps along the chosen ray, the Hessian of the error satisfies a bound whose constituent constants are functions of the initial set of parameters, the direction in which they are moved, the bounds on the basis function, and the data. The basis functions are taken to be Gaussians.

Proof: The first and the second derivatives of the error along the ray are given as follows. First, for each pair of a training input and a center, an auxiliary function of the step length is defined from the Gaussian response and its derivative along the ray; the derivative of the error then has an expression in these auxiliary functions, and the second derivative of the error is expressed through further such functions of the step length.

Simplification of this leads to coefficient functions of the step length, given as follows. We calculate these expressions and substitute them into the equation in Theorem 5.1 to find the necessary bounds. Substituting the expressions into that equation leads to further coefficient functions, which we simplify to arrive at the constants appearing in the bound; these constants are functions of the quantities developed next. First we define, for each pair of centers, the corresponding length, and from these lengths the remaining constants are built.

Remark: To construct the values of the constants, we follow the method of the proof. First, calculate the initial quantities as at the start of the proof. The intermediate values are calculated using the pairwise lengths. The coefficients are then calculated according to the formulas provided. Finally, the values of the constants in the bound are obtained from the coefficients.
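The analytic constants above cannot be recovered from this transcription, so the following sketch estimates a bound for the second derivative of the error along the chosen ray numerically, by sampling second-order finite differences, and then applies a Lemma 3.4-style step choice. It is our own stand-in, not the paper's construction: a sampled maximum is only an estimate of the true bound, and the helpers weighted_error and error_gradients are the hypothetical functions from the earlier sketches.

    import numpy as np

    def directional_second_derivative_bound(phi, t_max, samples=50, h=1e-4):
        # phi(t) is the error along the ray p0 + t*d for 0 <= t <= t_max.
        # Estimate max |phi''(t)| by central second differences on an interior grid.
        ts = np.linspace(h, t_max - h, samples)
        second = [(phi(t + h) - 2.0 * phi(t) + phi(t - h)) / h ** 2 for t in ts]
        return max(abs(s) for s in second)

    def guaranteed_step(grad_norm, M, eps):
        # With |phi''| <= M on [0, eps], the step t = min(eps, grad_norm / M)
        # decreases the error by at least 0.5 * t * grad_norm (cf. Lemma 3.4).
        t = min(eps, grad_norm / M)
        return t, 0.5 * t * grad_norm

    # Example wiring with the earlier hypothetical helpers: move only the centers,
    # keeping widths and weights fixed, along the normalized negative gradient.
    # gc, _ = error_gradients(X, y, v, centers, widths, w)
    # g = np.linalg.norm(gc)
    # d = -gc / g
    # phi = lambda t: weighted_error(X, y, v, centers + t * d, widths, w)
    # M = directional_second_derivative_bound(phi, t_max=eps)
    # t, guaranteed_drop = guaranteed_step(g, M, eps)

This mirrors update method 2 of Section VI, except that there the constant bounding the second derivative is computed analytically from Theorem 5.1 rather than sampled.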

VI. EXAMPLES

We present two simple examples, approximating a fixed target function, to illustrate the given theoretical results. Three different methods are used to update the centers, weights, and width values in a function-approximation problem using RBF networks. In Lemmas 3.3 and 3.4, a value for the step size is suggested once a bound for the Hessian of the error function is known; Theorems 4.2 and 5.1 give two methods of calculating the desired bounds, and the first two update methods stem from these results. The third update method is to use an optimal line-search algorithm as a comparison. The details of each of the methods used are summarized in the following list.

1) Bounds for the Hessian of the error function are given for all points in a ball around the initial set of parameters, based on the calculations given in Theorem 4.2. The step size is then chosen as suggested by Lemma 3.4.

2) Bounds for the Hessian of the error function are obtained along the normalized negative-gradient descent direction starting from the initial set of centers, based on Theorem 5.1. As above, the step size is chosen as suggested by Lemma 3.4.

3) A line-search algorithm incorporating the Armijo-Goldstein and Wolfe conditions is used to determine an approximately optimal step size, with an initial step-size guess of 1 (see [17] for details).

Two case scenarios are also used in the experiment. In case 1 (one center, two data points) the initial values of the parameters and the training pairs are given in Tables I and II. In case 2 (four centers, nine data points) the corresponding details are given in Tables III and IV.

TABLE I: INITIAL CENTER DATA (case 1)
TABLE II: DATA POINTS (case 1)
TABLE III: INITIAL CENTER DATA (case 2)
TABLE IV: DATA POINTS (case 2)

The results of the experiments are reported in Table V. The column marked "Network error improvement" is the amount by which the error function decreases as a result of the change in parameters. A second column shows the average rate of decrease of the error function along the line segment traversed. Finally, a third column gives the guaranteed lower bound on the error improvement. When the updates are based on the calculation of bounds, the performance is similar for both methods in the simpler Scenario 1. However, the second method, where bounds for the Hessian on an interval are used rather than bounds in a ball, is much better in terms of achieving a reduction in error for the more complex Scenario 2. Although the rates of descent are better for methods 1 and 2 than for method 3, the latter line search produces a far larger descent in total. Therefore, the bounds for the Hessian provided here need to be improved to make their application more practical.

VII. CONCLUSION

In RBF networks, an optimal set of centers would be required to make the networks small and efficient. Based on the error as a function of the centers, we have given bounds on the gradient and Hessian of the error function. These bounds may be used in deciding the optimality of the present set of centers. If a change in centers is desired, bounds on the Hessian may be used to fix a step size, depending on the current centers, to get a guaranteed amount of decrease in the error. The new theoretical results give performance guarantees on supervised learning in RBF networks.

APPENDIX I

APPENDIX II

TABLE V: NETWORK ERROR IMPROVEMENT

APPENDIX III

APPENDIX IV

REFERENCES

[1] D. S. Broomhead and D. Lowe, "Multivariate functional interpolation and adaptive networks," Complex Syst., vol. 2.
[2] J. Platt, "A resource-allocating network for function interpolation," Neural Comput., vol. 3, no. 2.
[3] T. Poggio and F. Girosi, "Networks for approximation and learning," Proc. IEEE, vol. 78, Sept.
[4] M. J. D. Powell, "Radial Basis Functions for Multivariate Interpolation: A Review," in Algorithms for the Approximation of Functions and Data, J. C. Mason and M. G. Cox, Eds. Oxford, U.K.: Clarendon.
[5] J. Park and I. W. Sandberg, "Universal approximation using radial basis function networks," Neural Comput., vol. 3, no. 2.
[6] D. Wettschereck and T. Dietterich, "Improving the performance of radial basis function networks by learning center locations," in Advances in Neural Information Processing Systems 4, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann, 1992.
[7] M. J. L. Orr, "Regularization in the selection of radial basis function centers," Neural Comput., vol. 7.
[8] S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Trans. Neural Networks, vol. 2, Mar.
[9] Y. Zhang et al., A New Clustering and Training Method for Radial Basis Function Networks. New York: IEEE, 1996, vol. 1.
[10] J. Moody and C. J. Darken, "Fast learning in networks of locally-tuned processing units," Neural Comput., vol. 1.
[11] M. Vogt, "Combination of radial basis function neural networks with optimized learning vector quantization," Proc. IEEE, vol. 83, Dec.
[12] V. Kadirkamanathan and M. Niranjan, "A function estimation approach to sequential learning with neural networks," Neural Comput., vol. 5, no. 6.
[13] B. Fritzke, "Fast learning with incremental RBF networks," Neural Processing Lett., vol. 1, no. 1, pp. 2-5.
[14] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic.
[15] A. Bjorck, Numerical Methods for Least Squares Problems. Philadelphia, PA: SIAM.
[16] I. Cha and S. A. Kassam, "RBFN restoration of nonlinearly degraded images," IEEE Trans. Image Processing, vol. 5, June.
[17] R. Fletcher, Practical Methods of Optimization, 2nd ed. New York: Wiley.

Chitra Panchapakesan was born in Tambaram, Chennai, India. She received the B.Sc. and M.Sc. degrees in mathematics from the University of Madras, Tamil Nadu, India, and was ranked first in both. She received the Master's degree in mathematics from Cornell University, Ithaca, NY, and the Ph.D. degree in fixed-point theory from the Indian Institute of Technology, Madras, India. She received a Postdoctoral Award from the University of Melbourne, Melbourne, Australia, where she resumed her research work in the Electrical and Electronic Engineering Department.

Marimuthu Palaniswami (S'84-M'85-SM'94) received the B.E. (Hons.) degree from the University of Madras, Madras, India, the M.Eng.Sc. degree from the University of Melbourne, Melbourne, Australia, and the Ph.D. degree from the University of Newcastle, Newcastle, Australia. He is an Associate Professor at the University of Melbourne, Australia. His research interests are in the fields of computational intelligence and data mining, nonlinear dynamics, computer vision, intelligent control, and biomedical engineering. He has published more than 180 conference and journal papers on these topics.
He was an Associate Editor of the IEEE TRANSACTIONS ON NEURAL NETWORKS and is on the editorial board of a few computing and electrical engineering journals. Dr. Palaniswami served as a Technical Program Co-Chair for the IEEE International Conference on Neural Networks in 1995 and has served on the programme committees of a number of international conferences. His invited presentations include several keynote lectures and invited tutorials in the areas of machine learning, biomedical engineering, and control. He has completed several industry-sponsored projects for National Australia Bank, Broken Hill Proprietary Limited, the Defence Science and Technology Organization, Integrated Control Systems Pty Ltd., and Signal Processing Associates Pty Ltd. He has also been supported by several Australian Research Council Grants, Industry Research and Development Grants, and Industry Research Contracts. He was also a recipient of a Foreign Specialist Award from the Ministry of Education, Japan.

Daniel Ralph received the B.Sc. (Hons.) degree from the University of Melbourne, Melbourne, Australia, and the M.S. and Ph.D. degrees from the University of Wisconsin, Madison. He was a Lecturer with the University of Melbourne for seven years and is now a Lecturer at Cambridge University, Cambridge, U.K. His research interests include analysis and algorithms in nonlinear programming and nondifferentiable systems, and quadratic programming methods, including their application to machine learning, discrete-time optimal control, and model predictive control. He has published numerous refereed papers and coauthored a research monograph on an area of bilevel optimization called mathematical programming with equilibrium constraints. Dr. Ralph is a Member of the Editorial Board of the SIAM Journal on Optimization and an Associate Editor of both Mathematics of Operations Research and The ANZIAM Journal. His conference activities, apart from invited lectures and session organization, include co-organizing the 2002 International Conference on Complementarity Problems and chairing streams at the 1998 International Conference on Nonlinear Programming and Variational Inequalities and the 1997 International Symposium on Mathematical Programming. He has been the recipient of a number of research grants from the Australian Research Council.

Chris Manzie was born in Melbourne, Australia. He received the B.Sc., B.Eng. (Hons.), and Ph.D. degrees from the Department of Electrical and Electronic Engineering, University of Melbourne, in 1996 and 2001, respectively. He is presently a Research Fellow with the University of Melbourne. His interests include the modeling and control of various problems relating to SI automotive engines.
