Implementing Radial Basis Functions Using Bump-Resistor Networks

John G. Harris
University of Florida, EE Dept., 436 CSE Bldg 42, Gainesville, FL 32611
harris@jupiter.ee.ufl.edu

Abstract- Radial Basis Function (RBF) networks provide a powerful learning architecture for neural networks [6]. We have implemented an RBF network in analog VLSI using the concept of bump-resistors. A bump-resistor is a nonlinear resistor whose conductance is a Gaussian-like function of the difference of two other voltages. The width of the Gaussian basis functions may be continuously varied so that the aggregate interpolating function varies from a nearest-neighbor lookup, piecewise-constant function to a globally smooth function. The bump-resistor methodology extends to arbitrary dimensions while still preserving the radiality of the basis functions. The feedforward network architecture needs no additional circuitry other than voltage sources and the 1D bump-resistors.

A nine-transistor variation of the Delbruck bump circuit is used to compute the Gaussian-like basis functions [2]. Below threshold the current output fits a Gaussian extremely well; see Figure 1. Figure 3 shows that the shape of the function deviates from the Gaussian shape above threshold. The width of the bump can be varied by almost an order of magnitude (see Figure 4). The Delbruck bump circuit is shown in Figure 2. A follower aggregation network, shown in Figure 5, computes an average of the input voltages ci weighted by conductance values gi [4]:

    V = Σi gi ci / Σi gi    (1)

The bump current is used to control the conductance of the resistors in Figure 5 such that gi = G(x - ti), where ti is the voltage representing the center location of each bump-resistor.
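In software terms, the follower-aggregation computation of equation (1) amounts to a conductance-weighted average of the stored voltages. The following is a minimal numerical sketch, not a circuit simulation: the idealized Gaussian conductance, the σ value, and all variable names are illustrative assumptions.

```python
import math

def gaussian_bump(x, t, sigma=0.1):
    # Idealized Gaussian-like conductance of a bump-resistor centered at t
    return math.exp(-((x - t) ** 2) / (2 * sigma ** 2))

def follower_aggregation(x, centers, c, sigma=0.1):
    # Weighted average of stored voltages c_i with weights g_i = G(x - t_i),
    # the quantity the follower-aggregation network settles to (equation 1).
    g = [gaussian_bump(x, t, sigma) for t in centers]
    return sum(gi * ci for gi, ci in zip(g, c)) / sum(g)

centers = [0.0, 1.0, 2.0, 3.0]   # bump-resistor center voltages t_i
c       = [0.2, 0.8, 0.5, 0.9]   # stored example voltages c_i

# With a narrow bump, the output at a center approaches the stored value
print(follower_aggregation(1.0, centers, c, sigma=0.05))  # close to 0.8
```

Varying sigma here mimics the width knob VB: small widths give a nearest-neighbor lookup, large widths pull the output toward a global average of the stored values.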
The circuit now computes:

    f(x) = Σi ci G(x - ti) / Σi G(x - ti)    (2)

This normalized RBF or partition-of-unity form has been used by Moody and Darken in learning simulations [5].

Figure 1: Measured current from the Delbruck bump circuit (circles) and Gaussian fit (straight line).

Some researchers have claimed that RBF networks show improved performance using this formulation [8]. Anderson, Platt and Kirk previously demonstrated the use of follower aggregation in an analog RBF chip [1]. The standard RBF form (without normalization) can be computed by holding the output to virtual ground and measuring the current.

A one-dimensional eight-node circuit has been successfully fabricated and tested. Results from fitting 4 points using two different values of σ are shown in Figure 6. The left plot shows the small-σ response, which approximates a nearest-neighbor lookup, and the right plot shows the much smoother result for a large σ.

There are several strategies for extending this network to multiple dimensions. One is simply to construct a multi-dimensional bump function to use as the conductance function G(x) in Figure 5. For example, Kirk has designed a suitable multidimensional bump function by multiplying together several 1D Delbruck bump functions [3]. An alternate strategy used by Anderson, Platt and Kirk [1]

0-7803-1901-X/94 $4.00 ©1994 IEEE
Figure 6: Measured chip curves from fitting four data points.

Figure 7: Two-dimensional network.
Figure 4: Plot of the measured standard deviation of the bump function vs. voltage control knob.

Figure 5: Follower aggregation network.
Figure 3: Normalized current from the bump circuit (circles) fit to Gaussians with identical variance, for VB = 0.8, 1.2, 1.6, 2.0; horizontal axis X-T (V).

without worry of any problems with local minima. Learning the center locations (ti) and the widths of the Gaussians (σi) is a more difficult problem.

REFERENCES

[1] J. Anderson, J. C. Platt, and D. Kirk. An analog VLSI chip for radial basis functions. In J. Hanson, J. Cowan, and C. L. Giles, editors, Neural Information Processing Systems, pages 765-772. Morgan Kaufmann, Palo Alto, 1993.

[2] T. Delbruck. Bump circuits for computing similarity and dissimilarity of analog voltages. In International Joint Conference on Neural Networks, Seattle, WA, July 1991.

[3] D. Kirk, D. Kerns, K. Fleischer, and A. Barr. Analog VLSI implementation of multi-dimensional gradient descent. In J. Hanson, J. Cowan, and C. L. Giles, editors, Neural Information Processing Systems, pages 765-772. Morgan Kaufmann, Palo Alto, 1993.

[4] C. Mead. Analog VLSI and Neural Systems. Addison-Wesley, 1989.

[5] J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281-294, 1989.

[6] T. Poggio and F. Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978-982, 1990.

[7] D. Scott. Multivariate Density Estimation. Wiley, 1992.

[8] H. W. Werntges. Partitions of unity improve neural function approximators. In Proc. IEEE Intl. Conf. on Neural Networks, vol. 2, pages 914-918, San Francisco, CA, Feb 1993.
Figure 2: The Delbruck bump circuit. The voltage VB controls the width of the bump.

is to use current summation to sum the squares of the voltage differences. A circuit must then be used to exponentiate the current and convert the result to a voltage. Rather than build more complex circuitry, we choose to use the physics of resistors in series to combine the dimensions. Figure 7 shows the network in two dimensions (extension to further dimensions is straightforward). In two dimensions, each branch has two resistors with conductances of G(x) and G(y). The effective conductance H(x,y) is:

    1/H(x,y) = 1/G(x) + 1/G(y)    (3)

Typically we choose G(x) to be a Gaussian function, since the Gaussian is the only function that can create a radial function by multiplying 1D functions. However, in this case the Gaussian is the wrong choice since H(x,y) will be nonradial. Instead, we choose G with the following form:

    G(x) = 1 / (1 + (x/c)²)    (4)

Now the series combination is:

    H(x,y) = 1 / (2 + (x² + y²)/c²)

which is radial.

In many applications, radial functions are not strictly necessary. In fact, the density estimation literature reveals that some nonradial functions have been shown to be optimal under certain criteria [7]. We are also studying resistors that are controlled by sigmoidal functions.

A feedforward RBF network is sufficient for many applications. A microprocessor can be used to implement a learning network and download the weights to the chip. On-chip learning will drastically speed up the learning process. Learning the coefficients ci is a straightforward process that can be performed with gradient descent
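The radiality argument for the series combination can be checked numerically. This sketch (plain Python; function names and the unit width c = 1 are illustrative assumptions) compares the series conductance for the choice of equation (4) against a Gaussian choice, at two points with the same radius x² + y² = 25.

```python
import math

def series_conductance(gx, gy):
    # Two resistors in series: resistances add, so 1/H = 1/G(x) + 1/G(y)
    return 1.0 / (1.0 / gx + 1.0 / gy)

def G_eq4(u, c=1.0):
    # Equation (4): G(x) = 1 / (1 + (x/c)^2)
    return 1.0 / (1.0 + (u / c) ** 2)

def G_gauss(u, sigma=1.0):
    # Gaussian alternative, radial under multiplication but not in series
    return math.exp(-(u ** 2) / (2 * sigma ** 2))

# (3, 4) and (5, 0) both lie at radius 5 from the center
h1 = series_conductance(G_eq4(3), G_eq4(4))    # = 1 / (2 + 25) for c = 1
h2 = series_conductance(G_eq4(5), G_eq4(0))    # same value: radial
h3 = series_conductance(G_gauss(3), G_gauss(4))
h4 = series_conductance(G_gauss(5), G_gauss(0))  # differs from h3: nonradial
print(h1, h2, h3, h4)
```

The first pair agrees, matching H(x,y) = 1/(2 + (x² + y²)/c²); the Gaussian pair does not, which is why equation (4) is the right conductance for series combination.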
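As a rough software illustration of learning the coefficients ci by gradient descent: the sketch below assumes a squared-error cost and the normalized form of equation (2), with hypothetical data and parameter values. In the paper's setup this learning would run on a host microprocessor rather than on the chip.

```python
import math

def basis(x, centers, sigma=0.3):
    # Normalized basis values h_i(x) = G(x - t_i) / sum_j G(x - t_j)
    g = [math.exp(-((x - t) ** 2) / (2 * sigma ** 2)) for t in centers]
    s = sum(g)
    return [gi / s for gi in g]

def fit_coefficients(xs, ys, centers, lr=0.5, epochs=2000):
    # Gradient descent on squared error; the cost is quadratic in the
    # coefficients c_i, so there are no local minima to worry about.
    c = [0.0] * len(centers)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = basis(x, centers)
            err = sum(ci * hi for ci, hi in zip(c, h)) - y
            c = [ci - lr * err * hi for ci, hi in zip(c, h)]
    return c

centers = [0.0, 1.0, 2.0, 3.0]     # fixed bump centers t_i
xs = [0.0, 1.0, 2.0, 3.0]          # four data points, as in Figure 6
ys = [0.2, 0.8, 0.5, 0.9]
c = fit_coefficients(xs, ys, centers)
h = basis(1.0, centers)
print(sum(ci * hi for ci, hi in zip(c, h)))   # close to 0.8
```

Learning the centers ti or widths σi would make the cost non-quadratic, which is why the text calls that the harder problem.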