New Recursive-Least-Squares Algorithms for Nonlinear Active Control of Sound and Vibration Using Neural Networks


IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 12, NO. 1, JANUARY 2001

Martin Bouchard, Member, IEEE

Manuscript received August 10, 1999; revised October 4, 2000. The author is with the School of Information Technology and Engineering, University of Ottawa, Ottawa, ON K1N 6N5, Canada (e-mail: bouchard@site.uottawa.ca).

Abstract: In recent years, a few articles describing the use of neural networks for nonlinear active control of sound and vibration were published in this journal. Using a control structure with two multilayer feedforward neural networks (one as a nonlinear controller and one as a nonlinear plant model), steepest descent algorithms based on two distinct gradient approaches were introduced for the training of the controller network. The two gradient approaches were sometimes called the filtered-x approach and the adjoint approach. Some recursive-least-squares algorithms were also introduced, using the adjoint approach. In this paper, a heuristic procedure is introduced for the development of recursive-least-squares algorithms based on the filtered-x and the adjoint gradient approaches. This leads to the development of new recursive-least-squares algorithms for the training of the controller neural network in the two-networks structure. These new algorithms produce a better convergence performance than previously published algorithms. Differences in the performance of algorithms using the filtered-x and the adjoint gradient approaches are discussed in the paper. The computational load of the algorithms discussed in the paper is evaluated for multichannel systems of nonlinear active control. Simulation results are presented to compare the convergence performance of the algorithms, showing the convergence gain provided by the new algorithms.

Index Terms: Active control of sound and vibration, gradient computations, multilayer feedforward neural networks, nonlinear control, recursive-least-squares algorithms, steepest descent algorithms.

NOMENCLATURE
ADJ    Adjoint gradient computation approach.
BP     Backpropagation.
DEKF   Decoupled extended Kalman filter.
EBP    Enhanced backpropagation.
EKF    Extended Kalman filter.
FX     Filtered-x gradient computation approach.
LMS    Least mean-squares algorithm.
MEKA   Multiple extended Kalman algorithm.
NEKA   Neuron-level extended Kalman algorithm.
RLS    Recursive-least-squares algorithm.

I. INTRODUCTION

Active control of sound and vibration works on the principle of destructive interference between an original primary disturbance field and a secondary field that is generated by some control actuators. Detailed presentations of active sound control and/or active vibration control theory and applications can be found in [1]-[7], and a brief summary can also be found in a recent issue of this journal [8].

Adaptive linear filtering techniques [9], [10] have been extensively used for the active control of sound and vibration, and many of today's implementations of active control use those techniques. However, those linear digital controllers may not perform well in cases where nonlinearities are found in an active control system. The most common source of nonlinearity in active control systems is the actuator.
An actuator typically has a nonlinear response when it operates with an input signal whose amplitude is close to (or above) the nominal input signal value, or when it operates at a frequency outside of (or close to the limits of) its normal frequency range of operation. Nonlinear behavior can also occur when the dynamics of the system to be controlled are nonlinear; for example, the vibration behavior of plates exhibiting buckling is nonlinear.

The use of neural networks for the active control of nonlinear systems has been reported in [11], [8], using multilayer feedforward neural networks. Neural networks may also be used for linear active control systems: for example, radial basis function neural networks have been used for the control of nonstationary signals and/or nonstationary systems [12].

This paper is concerned with active control of nonlinear systems. It extends and generalizes the algorithms introduced in [11], [8]. These algorithms use a control structure with two multilayer feedforward neural networks: one as a nonlinear controller and one as a nonlinear plant model, as shown in Fig. 1 for a multichannel active control system. The controller neural network generates control signals from reference signals, and these control signals are sent to the plant through actuators. The plant model neural network is required to perform the backpropagation of the error signals.

The training of the plant model neural network of Fig. 1 can be done with classical neural-network algorithms, including backpropagation algorithms (with or without momentum), nonlinear optimization algorithms (quasi-Newton algorithms, conjugate gradient algorithms), or nonlinear identification techniques (nonlinear extended Kalman filtering or recursive-least-squares algorithms). A review of many of these algorithms can be found in [13]. The training of the controller neural network of Fig. 1 cannot be done with those classical algorithms, because of the tapped delay lines between the two neural networks of Fig. 1.
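To make the two-network structure of Fig. 1 concrete, the following minimal single-channel sketch passes a reference sample through a controller network, a tapped delay line, and a plant-model network, and forms the error signal as the sum of the disturbance and the estimated secondary field (as in a simulated system). This is an illustration only, not the paper's implementation: the layer sizes, the tanh hidden activations, and the linear output neurons are assumptions made for the example.

import numpy as np

# Illustrative sketch only: a minimal single-channel version of the structure of Fig. 1.
# Layer sizes, tanh hidden activations and linear output neurons are assumptions.

def mlp_forward(x, weights):
    """Feedforward pass: tanh hidden layers, linear output layer."""
    for W in weights[:-1]:
        x = np.tanh(W @ x)
    return weights[-1] @ x

rng = np.random.default_rng(0)
controller = [rng.normal(scale=0.1, size=(6, 3)), rng.normal(scale=0.1, size=(1, 6))]
plant_model = [rng.normal(scale=0.1, size=(4, 8)), rng.normal(scale=0.1, size=(1, 4))]

J = 7                         # delays in the tapped delay line between the two networks
ref_line = np.zeros(3)        # tapped delay line at the controller input
act_line = np.zeros(J + 1)    # tapped delay line between the controller and the plant model

def control_step(x_n, d_n):
    """One sample: reference x_n and disturbance d_n in, estimated error signal out."""
    global ref_line, act_line
    ref_line = np.concatenate(([x_n], ref_line[:-1]))   # shift the reference delay line
    y_n = mlp_forward(ref_line, controller)             # controller network -> actuator signal
    act_line = np.concatenate((y_n, act_line[:-1]))     # shift the actuator delay line
    s_n = mlp_forward(act_line, plant_model)            # plant-model network -> secondary field
    return d_n + s_n[0]                                  # error = disturbance + secondary field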

Fig. 1. A two-neural-networks configuration for real-time online multichannel nonlinear active control of sound and vibration.

In [11], [8], steepest descent algorithms based on two distinct gradient approaches (sometimes named the filtered-x approach and the adjoint approach) were introduced for the training of the controller network. Some recursive-least-squares algorithms (i.e., deterministic versions of extended Kalman filter algorithms) were also introduced in [8] for the training of the controller neural network, using the adjoint approach.

In Section II of this paper, a heuristic procedure is first introduced for the development of recursive-least-squares algorithms based on the filtered-x and the adjoint gradient approaches, for the training of the controller network of Fig. 1. This leads to the development of new recursive-least-squares algorithms for the training of the controller network. Some of these new algorithms can produce a better convergence performance than the previously published recursive-least-squares or steepest descent algorithms. Differences in the performance of algorithms using the filtered-x and the adjoint gradient approaches are also discussed in Section II. The computational load of several algorithms discussed in Section II is evaluated in Section III, for several configurations of multichannel nonlinear active control systems. In Section IV, simulation results compare the convergence performance of the algorithms and validate the improved performance claimed for the new recursive-least-squares algorithms introduced in Section II.

II. TRAINING ALGORITHMS FOR THE CONTROLLER NEURAL NETWORK

A. Steepest Descent Algorithms

In [11], the filtered-x gradient approach was combined with a steepest descent algorithm to produce a training algorithm for the controller neural network of Fig. 1. The filtered-x gradient approach is named this way because the simplification to the linear case (no hidden layers, linear activation functions) of a steepest descent algorithm using this approach produces the well-known filtered-x LMS algorithm [9]. In [8], an alternative gradient approach, called the adjoint approach and initially published in [14], [15], was introduced for the controller neural network of Fig. 1, and it was combined with steepest descent and recursive-least-squares algorithms. This second gradient approach was called adjoint because the simplification of the steepest descent algorithm using this approach to the linear case produces the adjoint-LMS algorithm [16].

In this paper, bold variables are used to represent vectors, and uppercase bold variables are used to represent matrices. The cost function and the gradients computed by the two approaches are

E(n) = sum_{k=1}^{K} e_k^2(n)    (1)

filtered-x approach:  gradient = sum_{j=0}^{J} dE(n)/dw(n-j)    (2)

adjoint approach:  gradient = sum_{j=0}^{J} dE(n-j)/dw(n-J)    (3)

where
E(n)    sum of the squared instantaneous values of the error signals e_k(n);
e_k(n)  kth error signal at time n in the active control system (Fig. 1);
K       total number of error signals;
w(n)    value of one particular weight, or a vector of several weights, or a vector of all the weights in the controller neural network at time n;
J       number of delays in each tapped delay line between the two neural networks of Fig. 1.

The filtered-x approach of (2) computes a gradient based on the fact that the error signals at time n are affected by the values of the control-network weights over the last J samples (weight values at times n-J, ..., n). The adjoint approach of (3) computes a gradient based on the fact that the control-network weights at time n affect the error signals over the next J samples (at times n, ..., n+J). This noncausal statement can be delayed to produce a causal relation, as in (3): the control-network weights at time n-J affect the error signals at times n-J, ..., n.
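Since the steepest descent algorithms based on (2) and (3) reduce in the linear case to the filtered-x LMS and adjoint-LMS algorithms, the practical difference between the two gradient approaches can be illustrated with a linear FIR sketch. This is a hedged illustration, not code from the paper: the secondary-path impulse-response estimate s_hat, the lengths L and J, the step size mu, and the exact delay indexing of the adjoint-LMS variant are assumptions made for the example (one common formulation from the linear active-control literature).

import numpy as np

# Hedged sketch: linear single-channel case, where the filtered-x approach reduces to the
# filtered-x LMS and the adjoint approach reduces to the adjoint-LMS. All parameters are
# illustrative assumptions.

L, J, mu = 16, 7, 1e-3
s_hat = np.zeros(J + 1)
s_hat[3] = 1.0                       # assumed secondary-path impulse-response estimate

def fxlms_update(w, x_buf, e_n):
    """Filtered-x LMS: correlate the current error with the reference filtered by s_hat.

    x_buf = [x(n), x(n-1), ...] must hold at least L + J + 1 samples."""
    xf = np.array([np.dot(s_hat, x_buf[i:i + J + 1]) for i in range(L)])
    return w - mu * xf * e_n         # minus sign: error = disturbance + secondary field

def adjoint_lms_update(w, x_buf, e_buf):
    """Adjoint-LMS: filter the recent errors through the time-reversed s_hat and
    correlate the result with the reference delayed by J samples.

    e_buf = [e(n), e(n-1), ..., e(n-J)]."""
    ef = np.dot(s_hat, e_buf[::-1])  # delayed, adjoint-filtered error
    return w - mu * x_buf[J:J + L] * ef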

Using (2) or (3), it is possible to write a steepest descent algorithm to update the weights of the controller neural network of Fig. 1:

w(n+1) = w(n) - mu * gradient    (4)

where mu is a scalar gain. The combination of (2) and (4) produces the filtered-x backpropagation algorithm (FX-BP) [11], while the combination of (3) and (4) produces the adjoint backpropagation algorithm (ADJ-BP) [8].

To describe these algorithms and the other algorithms to be introduced more explicitly, the following quantities are defined:
- the number of delays in each tapped delay line at the input of the controller network (Fig. 1);
- the number of reference signals in the active control system;
- the number of actuators in the active control system;
- the number of hidden layers in the controller neural network (Fig. 1);
- the number of neurons in each hidden layer of the controller neural network;
- the number of hidden layers in the plant model neural network (Fig. 1);
- the number of neurons in each hidden layer of the plant model neural network;
- the value of each reference signal at time n;
- the value of each output in each layer of the controller neural network (the outputs of the input layer are the inputs, and the outputs of the last layer are the actuator signals of the active control system), together with its value before the activation function of a neuron;
- the value at time n of each (adaptive) weight linking a neuron of one layer to a neuron of the next layer in the controller neural network;
- the value of each output in each layer of the plant model neural network (from the outputs of the first hidden layer to the outputs of the output layer), together with its value before the activation function of a neuron;
- the value of each (fixed) weight linking a neuron of one layer to a neuron of the next layer in the plant model neural network;
- the value of each (fixed) weight linking a delayed actuator signal to a neuron of the first hidden layer in the plant model neural network;
- the value at time n of each disturbance signal (i.e., each signal to be reduced by the active control system);
- the activation function of a neuron and its derivative.

Using the above notation, Table I describes the forward equations common to all algorithms. The FX-BP and ADJ-BP algorithms can be described by the equations of Tables II and III, respectively. The computational load of each equation (estimated by the number of multiplies) is also shown in Tables I-III. To evaluate the computational load of the algorithms, some assumptions were made. First, the number of neurons in each hidden layer of each neural network is assumed constant. Also, it is assumed that the values of the activation function of the neurons and of its derivative can be found in lookup tables and do not require multiply operations. Finally, it is assumed that all the neurons in two successive layers are fully connected, which may not be required in some applications.

As will be discussed further in Section III, and as can be seen by comparing Tables II and III, the adjoint approach produces a steepest descent algorithm (ADJ-BP) that requires significantly fewer computations than the filtered-x approach (FX-BP). This is because the adjoint approach requires the computation of only one backpropagation per sample instead of several, as shown by the last two equations of Tables II and III. It can also be seen from the description of the FX-BP and ADJ-BP algorithms that the memory requirements of the algorithms can be greatly reduced by assuming, for both the FX-BP and the ADJ-BP, that the controller weights change negligibly over the J samples spanned by the tapped delay lines. This assumption is valid if the adaptation rate is slow compared to the length J, which is true for most systems, and it will be made for all the algorithms discussed in this paper.

B. Recursive-Least-Squares Algorithms

The gradients of (2) and (3) can be expressed in numerous ways: equations (5)-(8) give equivalent expressions using the filtered-x gradient approach, and equations (9)-(12) give equivalent expressions using the adjoint gradient approach.

TABLE I. FORWARD EQUATIONS COMMON TO ALL ALGORITHMS AND THEIR NUMBER OF REQUIRED MULTIPLIES
TABLE II. EQUATIONS OF THE FX-BP ALGORITHM AND THEIR NUMBER OF REQUIRED MULTIPLIES

TABLE III. EQUATIONS OF THE ADJ-BP ALGORITHM AND THEIR NUMBER OF REQUIRED MULTIPLIES

In (5)-(12), the weight variable denotes the value at time n of one particular weight that affects the intermediate variables of the networks, or a vector of a few weights that affect those variables, or a vector of all the weights that affect those variables, and the remaining vectors contain all the values of those intermediate variables. Equations (5)-(8) are equivalent expressions of the gradient in the FX-BP algorithm, and (9)-(12) are equivalent expressions of the gradient in the ADJ-BP algorithm. The intermediate variables used in (5)-(12) are not the only possible choice, and since the weight vectors can contain one weight, a few weights, or all the weights in the controller network of Fig. 1, there are many possibilities for expressing a gradient. However, equations with the form of (8) or (9)-(11) are of particular interest, because the summation index appears in only one term inside the summation. That term can therefore be taken out of the summation, and the resulting gradient has the form x * error. For (8) and (11), x will be a matrix and error will be a vector, while for (9) and (10), x will be a vector and error will be a scalar. For example, (9) can be rewritten in this form as

gradient = x(n) * error(n).    (13)

The form x * error is the familiar form of the stochastic gradient in the LMS algorithm [9], [10], which uses a steepest descent approach with a stochastic gradient for linear FIR adaptive filtering, as described by (14) and (15):

error(n) = desired(n) + w'(n) x(n)    (14)
w(n+1) = w(n) - mu * x(n) * error(n)    (15)

The sign in (14)-(15) is the opposite of what is typically found in the literature. This simply reflects the fact that in the active control of sound and vibration a summation occurs as in (14) (it is the physical superposition of waves).
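As a small, self-contained illustration of (14) and (15) (a sketch under the stated sign convention, not code from the paper), the following loop adapts a linear FIR filter whose output adds to the disturbance, so that driving the summed error to zero cancels the disturbance. The filter length, step size, and signals are placeholders.

import numpy as np

# Minimal LMS sketch with the sign convention of (14)-(15): the error is the SUM of the
# disturbance and the filter output, as in the physical superposition of waves.
# Filter length L, step size mu and the signals are illustrative assumptions.

rng = np.random.default_rng(1)
L, mu, n_samples = 8, 0.01, 5000
w = np.zeros(L)
x = rng.standard_normal(n_samples)                  # reference signal (placeholder)
d = -np.convolve(x, [0.5, -0.3, 0.1])[:n_samples]   # disturbance to be cancelled (placeholder)

for n in range(L, n_samples):
    u = x[n - L + 1:n + 1][::-1]   # regressor [x(n), x(n-1), ..., x(n-L+1)]
    e = d[n] + w @ u               # (14): error = disturbance + filter output
    w = w - mu * u * e             # (15): steepest descent step with the x*error gradient form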

For a linear equation with the form of (14), it is well known that recursive-least-squares algorithms can achieve a faster convergence speed than stochastic gradient descent algorithms, at least for stationary or quasi-stationary signals and systems, because the convergence speed of recursive-least-squares algorithms does not depend on the eigenvalue spread of the input-signal correlation matrix [10]. Recursive-least-squares algorithms are deterministic implementations of Kalman filtering algorithms [10] (or of extended Kalman filtering algorithms in the case of nonlinear neural networks). For the training of nonlinear neural networks, recursive-least-squares algorithms may not only provide an increase of convergence speed; it was reported in [17], [18], [8] that they can also improve the steady-state performance achieved by the trained neural network.

Combining gradients obtained from equations having the form of (8) or (9)-(11) (i.e., gradients that have the form x * error) with recursive-least-squares equations provides a simple yet effective heuristic procedure to develop recursive-least-squares algorithms for the training of the controller neural network of Fig. 1. The recursive-least-squares algorithms minimize the weighted sum over time of the squared signal error [10] (or of error' * error if error is a vector, as in (8) and (11)). Using this heuristic procedure, the previously published recursive-least-squares algorithms for the training of the controller neural network of Fig. 1 can be described in just a few lines. Using (9), with each weight vector containing only the weights directly connected to one neuron, a training algorithm called the ADJ-EBP (from the EBP algorithm in [17]) was introduced in [8]. Using (10), with each weight vector containing only the weights directly connected to one neuron, a training algorithm called the ADJ-MEKA (from the MEKA algorithm in [17]) was introduced in [8]. Using (11), with each weight vector containing only the weights directly connected to a neuron in the controller network, a training algorithm called the ADJ-NEKA (from the NEKA algorithm in [17]) was introduced in [8]. Using (11), with a single vector containing all the weights in the controller network, a training algorithm called the ADJ-EKF (from the EKF algorithm in [18]) was introduced in [8]. The simplification of these four recursive-least-squares algorithms to the linear case (no hidden layer, linear activation functions) is the adjoint-RLS algorithm [19].

In simulations and real-time experiments in [8], the ADJ-EBP and the ADJ-MEKA algorithms produced a faster convergence than the steepest descent ADJ-BP and FX-BP algorithms, but they produced little or no improvement of the steady-state minimum found by the algorithms. This weak performance is caused by the fact that the ADJ-EBP and ADJ-MEKA algorithms are local (each weight vector only includes the weights directly connected to one neuron), and also by the fact that the signals minimized by these algorithms are the error terms obtained by writing (9) or (10) in the form x * error, as in (13). These signals have different statistics than the desired cost function E(n). In other words, a fast or large reduction of the cost function minimized by these recursive-least-squares algorithms does not directly provide a fast or large reduction of E(n).
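The heuristic procedure described above amounts to feeding a gradient of the form x * error into standard exponentially weighted recursive-least-squares equations. The following sketch shows one such update under assumed dimensions; it is an illustration, not the paper's implementation. Applied with a separate inverse-correlation matrix per neuron it loosely corresponds in spirit to the local (NEKA-type) algorithms, and applied to a single vector holding all controller weights it corresponds to the global (EKF-type) algorithms.

import numpy as np

# Hedged sketch of one exponentially weighted RLS update applied to a gradient of the
# form x*error. The dimensions, forgetting factor lam and initialization scale delta
# are illustrative assumptions.

def rls_update(w, P, x, err, lam=0.995):
    """One RLS step for a group of weights w with regressor x and scalar error err."""
    Px = P @ x
    k = Px / (lam + x @ Px)            # gain vector
    w = w - k * err                    # minus sign: error = disturbance + secondary field
    P = (P - np.outer(k, Px)) / lam    # update of the inverse correlation matrix
    return w, P

# Example initialization for one group of 10 weights:
delta = 100.0                          # scalar used to initialize P (tuned by trial and error)
w0 = np.zeros(10)
P0 = delta * np.eye(10)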
In [8], it was found that the ADJ-NEKA and the ADJ-EKF algorithms provided a much better convergence performance than both the FX-BP and the ADJ-BP algorithms, by greatly improving the steady-state minimum found by the algorithms. The ADJ-NEKA and the ADJ-EKF algorithms are the local and global versions of a recursive-least-squares algorithm minimizing a cost function of the form error' * error, with error found from (11). The statistics of this cost function are typically closer to those of the desired cost function E(n), simply because the error term of (11) is closer to the error signals found in E(n).

No recursive-least-squares algorithm has been published so far using the filtered-x approach to compute the gradient (i.e., using (8)). There are at least two reasons motivating the development of such an algorithm. First, the error signal to be minimized by the recursive-least-squares algorithm would then be the error term of (8), which is actually equal to the error signals e_k(n). Thus, a fast or large reduction of the cost function minimized by the recursive-least-squares algorithm would directly provide a fast or large reduction of the desired cost function. Second, the algorithms based on the filtered-x gradient approach can use a larger gain in the weight-update equation than the algorithms based on the adjoint gradient approach. Indeed, in the case of the filtered-x gradient algorithms, it takes a delay of up to J samples for the effect of a change in the weights of the controller neural network to stabilize in the cost function (and thus in the gradient), because of the tapped delay lines between the two networks of Fig. 1. The delay can be much less than J samples, depending on which coefficients have the largest amplitude (as in the linear case, where the equivalent delay of an impulse response is shorter than the whole length of the impulse response). When the value of this delay increases, the maximum gain that can be used in the weight-update equation decreases, thus reducing the convergence speed. In the case of the adjoint gradient algorithms, the delay required before the effects of a weight update are stabilized is longer, because the gradient in this case involves error signals delayed by up to J additional samples and therefore requires extra samples to stabilize, as compared to the filtered-x gradient. Therefore, a larger gain can typically be used for the filtered-x gradient algorithms than for the adjoint gradient algorithms, especially for large values of J. This has also been observed in the case of the filtered-x LMS and adjoint-LMS linear algorithms [19].

Using (8) with a recursive-least-squares algorithm, and with each weight vector containing the weights directly connected to a

neuron in the controller network of Fig. 1, a training algorithm called the FX-NEKA can be developed. The equations for this algorithm are found in Table IV.

TABLE IV. EQUATIONS OF THE NEW FX-NEKA ALGORITHM AND THEIR NUMBER OF REQUIRED MULTIPLIES

It is straightforward to modify the FX-NEKA algorithm to use a single global vector containing all the weights in the controller network, producing the global FX-EKF algorithm. This algorithm is described by the first rows of Table IV, with the last four equations of Table IV replaced by those of Table V. The simplification of these two new algorithms to the linear case (no hidden layer, linear activation functions) is the filtered-x RLS algorithm [19]. In Sections III and IV, the performance and the computational load of these two new algorithms will be compared with those of the FX-BP, ADJ-BP, ADJ-NEKA, and ADJ-EKF algorithms. The description of the ADJ-NEKA and ADJ-EKF algorithms can be found in Tables VI and VII.

C. Decoupled Extended Kalman Filter (DEKF) Algorithms

The original NEKA algorithm from [17] is very similar to another algorithm called the decoupled extended Kalman filter (DEKF) [20], [21], [13]. In fact, only equation (16)

in Table IV and equation (17) in Table VI need to be changed to (18) and (19), respectively, to produce the resulting filtered-x DEKF and adjoint-DEKF algorithms. The DEKF algorithms of (18) and (19) invert one common global matrix, while the NEKA algorithms of (16) and (17) invert a different matrix for each neuron. This is the only difference between the two algorithms. For the specific system used in the simulations of Section IV, it was found that the DEKF algorithms produced a performance slightly inferior to that of the NEKA algorithms; therefore, only the results obtained with the NEKA algorithms will be discussed and compared with the other algorithms in Section IV.

TABLE V. EQUATIONS OF THE NEW FX-EKF ALGORITHM AND THEIR NUMBER OF REQUIRED MULTIPLIES

III. COMPUTATIONAL LOAD OF THE ALGORITHMS

The computational load of the algorithms described in Tables I-VII was evaluated for several configurations of multichannel active control of sound and vibration. The computational load was estimated by the number of multiplies required by the different algorithms. The results appear in Table VIII. The upper part of Table VIII is for tonal control, with very small values of J, while the lower part is for broadband control, with larger values of J. Table VIII also varies the number of channels from one to eight, and the number of hidden layers from one to three. It should be mentioned that some algorithms require extra multiplies due to matrix inversions of small dimensions: inversions of small matrices for the ADJ-NEKA algorithm, one matrix inversion for the ADJ-EKF algorithm, inversions of small matrices for the FX-NEKA algorithm, and one matrix inversion for the FX-EKF algorithm. These extra multiplies are not considered in Table VIII. The following conclusions can be drawn from Table VIII.

For the global EKF algorithms, the use of the adjoint gradient approach instead of the filtered-x gradient approach only produces a very small reduction of the computational load (between 0.03% and 2.1% for tonal control, and between 0.04% and 1.1% for broadband control).

For the local NEKA algorithms, the use of the adjoint gradient approach instead of the filtered-x gradient approach can

produce a significant reduction of the computational load (between 2.8% and 7.1% for tonal control, and between 14% and 36% for broadband control).

For the steepest descent (BP) algorithms, the use of the adjoint gradient approach instead of the filtered-x gradient approach produces a large reduction of the computational load (between 22% and 29% for tonal control, and between 95% and 98% for broadband control). The reduction for broadband control is thus particularly impressive, and this is probably the case where the use of the adjoint approach is most suitable, even though its convergence would be slower with such a high value of J.

The main factor behind the large computational loads in Table VIII is the use of the recursive-least-squares algorithms instead of the steepest descent algorithms, and not the use of the filtered-x gradient approach instead of the adjoint gradient approach.

TABLE VI. EQUATIONS OF THE ADJ-NEKA ALGORITHM AND THEIR NUMBER OF REQUIRED MULTIPLIES

IV. CONVERGENCE PERFORMANCE OF THE ALGORITHMS: SIMULATION RESULTS

In order to evaluate the performance of the different learning algorithms, simulations of active control of sound in a duct using a nonlinear actuator were performed. In a real-time online system such as the one in Fig. 1, the error signals are measured by error sensors and then backpropagated. However, the structure of simulated or off-line systems is slightly different, as shown in Fig. 2: in this case it is the disturbance signals that are measured, and the error signals are computed and then backpropagated. The structure of Fig. 2 was thus used in the simulations. The same nonlinear plant model as the one experimentally identified in [8] was used: a neural network with a

configuration (number of neurons) of 8-4-1 that was identified with a standard EKF algorithm [17], using multiple recordings of a loudspeaker (actuator) excited by 50-Hz, 100-Hz, and 150-Hz pure tones. Since the size of the loudspeaker was only 4 in, it obviously had a nonlinear behavior for strong excitation signals at low frequencies such as 50 Hz. The reference and the disturbance signals used in the control simulations were 50-Hz pure-tone signals. The simulated system was thus a monochannel system, with one reference signal, one actuator signal, and one error signal.

TABLE VII. EQUATIONS OF THE ADJ-EKF ALGORITHM AND THEIR NUMBER OF REQUIRED MULTIPLIES
TABLE VIII. COMPUTATIONAL LOAD (NUMBER OF MULTIPLIES) OF THE ALGORITHMS FOR SEVERAL CONFIGURATIONS OF NONLINEAR ACTIVE CONTROL OF SOUND AND VIBRATION SYSTEMS. THE UPPER PART OF THE TABLE REPRESENTS TONAL CONTROL CONFIGURATIONS, AND THE LOWER PART REPRESENTS BROADBAND CONTROL CONFIGURATIONS

In the simulations, a double-precision floating-point representation was used for the learning algorithms. This proved to be required because of the poor numerical robustness of the recursive-least-squares equations. To avoid this, robust implementations of the recursive-least-squares equations could be used (QR-decomposition square-root realizations [10], etc.). In the controller network, the configuration was 2-6-1. The activation functions used in the simulations were a linear function for the neurons of an output layer (20) and a nonlinear function for all other neurons (21). Bias weights (dc offsets) of the (linear) neurons in an output layer were set to zero in the simulations. All the other weights were initialized to small values and modified by the learning algorithms.

The convergence of the different algorithms was evaluated by computing the ratio of the (moving) average energy of the error signal to the average energy of the disturbance signal. The weight-update gain that produced the greatest attenuation (or the fastest convergence when the attenuation was constant) was found by trial and error for each algorithm. For the recursive-least-squares algorithms, the forgetting factor used in the algorithms of Tables IV-VII was set to 0.995.
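For reference, the convergence measure described above (the ratio of the moving-average energy of the error signal to the average energy of the disturbance signal, expressed in dB) can be computed as in the following sketch; the averaging window length n_avg is an assumption made for the example.

import numpy as np

# Sketch of the convergence measure of Section IV: moving-average error energy divided by
# the average disturbance energy, in dB. The window length n_avg is an assumption.

def attenuation_db(error, disturbance, n_avg=1000):
    """Moving-average error energy over average disturbance energy, in dB."""
    kernel = np.ones(n_avg) / n_avg
    err_energy = np.convolve(error**2, kernel, mode="same")   # moving average of e^2(n)
    dist_energy = np.mean(disturbance**2)                      # average energy of d(n)
    return 10.0 * np.log10(err_energy / dist_energy + 1e-12)  # negative values = attenuation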

The initialization of the inverse correlation matrices of Tables IV-VII was found to have an important impact on the steady-state performance achieved by the recursive-least-squares algorithms (as opposed to the use of recursive-least-squares algorithms in linear systems, where the initialization of the inverse correlation matrix only has a transient effect when the forgetting factor is less than one). An identity matrix multiplied by a scalar value was used to initialize the matrices, and this scalar value was also adjusted by trial and error to achieve the greatest attenuation.

Fig. 2. A two-neural-networks configuration for simulated or off-line multichannel nonlinear active control of sound and vibration.
Fig. 3. Convergence performance of the FX-BP algorithm, over ten simulations with different data segments and different initial values.
Fig. 4. Convergence performance of the ADJ-BP algorithm, over ten simulations with different data segments and different initial values.
Fig. 5. Convergence performance of the FX-NEKA algorithm, over ten simulations with different data segments and different initial values.

Fig. 3 shows the result of ten simulations using the FX-BP algorithm. Each simulation was run with different sets of data and different initializations of the weights. It is clear from this figure that there is very little deviation in the performance of the algorithm for the simulated system. A performance of approximately -9 dB is achieved. Fig. 4 shows the result of ten simulations using the ADJ-BP algorithm, and again very little deviation is observed. The gain used in the weight-update equation of the ADJ-BP algorithm is smaller than in the case of the FX-BP algorithm, as discussed earlier in the paper (even though the value of J is small in this case: J = 7). This explains why the convergence speed of the ADJ-BP is slower than that of the FX-BP, as can be seen by comparing Figs. 3 and 4. However, the minimum reached by the ADJ-BP algorithm is the same as for the FX-BP algorithm (approximately -9 dB), and the computational load of the ADJ-BP is lower, so the use of the ADJ-BP may be justified.

Figs. 5 and 6 show the convergence results of ten simulations with the FX-NEKA and the ADJ-NEKA algorithms, respectively. It can be seen that there is a much larger deviation in the results produced by these algorithms. However, the minimum reached by these algorithms is always much better than the -9 dB achieved by the FX-BP and ADJ-BP algorithms. As expected, the level of attenuation achieved by the FX-NEKA algorithm is better than that of the ADJ-NEKA (-39 dB on average after 80 000 iterations for the FX-NEKA, as compared to a smaller attenuation for the ADJ-NEKA), because the cost function minimized by the recursive-least-squares equations in the FX-NEKA algorithm is equal to the desired cost function. Since the FX-NEKA and the ADJ-NEKA algorithms have computational loads of the same order, it appears that the use of the FX-NEKA algorithm introduced in this paper is more advantageous than the use of the ADJ-NEKA algorithm.

Figs. 7 and 8 show the performance produced by ten simulations with each of the two global EKF algorithms, i.e., the recursive-least-squares algorithms that use only one global vector containing all the weights of the controller neural network. These algorithms have a much higher computational load (Table VIII), but it is clear from Figs. 7 and 8 that they also produce the best performance. Both algorithms have little deviation in their performance (as compared to the NEKA algorithms of the previous paragraph), and they both achieve a performance of approximately -43 dB.

For the previously mentioned reasons, it is expected that the FX-EKF algorithm will converge faster than the ADJ-EKF algorithm. This is indeed the case, as can be seen from Fig. 9, which shows the first 15 000 convergence iterations of ten simulations with the FX-EKF algorithm and the ADJ-EKF algorithm. Some deviation in the results of the algorithms can be observed at this scale, but in every simulation the FX-EKF converged faster than the ADJ-EKF. It is expected that for broadband or multitone signals the convergence gain of the FX-EKF over the ADJ-EKF would be even higher, for two reasons. First, the statistics of the cost function minimized by the recursive-least-squares equations of the ADJ-EKF would differ more from the desired cost function than in the case of a single tone to be controlled (i.e., the minimized cost function and the desired cost function would have different power spectra). Second, a high value of J would reduce the gain more in the ADJ-EKF than in the FX-EKF algorithm, thus further reducing the convergence speed of the ADJ-EKF. Since the FX-EKF and the ADJ-EKF have very similar computational loads, it appears that the use of the FX-EKF algorithm introduced in this paper is more advantageous than the use of the ADJ-EKF algorithm.

Fig. 6. Convergence performance of the ADJ-NEKA algorithm, over ten simulations with different data segments and different initial values.
Fig. 7. Convergence performance of the FX-EKF algorithm, over ten simulations with different data segments and different initial values.
Fig. 8. Convergence performance of the ADJ-EKF algorithm, over ten simulations with different data segments and different initial values.
Fig. 9. Comparison of the convergence performance of the two EKF algorithms, over ten simulations with different data segments and different initial values, first 15 000 iterations shown.

V. CONCLUSION

In this paper, a heuristic procedure was introduced for the development of recursive-least-squares algorithms used for the training of the controller neural network in the nonlinear active control of sound and vibration structure of Fig. 1. With the proposed procedure, it was straightforward to develop two new algorithms (the FX-NEKA and the FX-EKF algorithms) that proved to substantially improve on the performance of the previously published algorithms for the training of the controller network. Moreover, it is expected that for systems with broadband signals, the convergence gain of the introduced algorithms over the previously published algorithms would be even greater.

Although the algorithms introduced in this paper may solve the problem of slow convergence in nonlinear active control systems, they do not solve all the problems. For example, they do not solve the problem of finding a nonlinear model of the plant that will be valid for most control signals, as reported in [8]. Another potential problem is the poor numerical robustness of the recursive-least-squares algorithms discussed in this paper, and robust implementations of the algorithms may have to be used for long-term stability (QR-decomposition square-root realizations, etc.). The use of recurrent networks (to reduce the number of neurons required in the two-networks control structure), combined with the algorithms discussed in this paper, would also be of interest.

REFERENCES

[1] S. M. Kuo and D. R. Morgan, "Active noise control: A tutorial review," Proc. IEEE, vol. 87, pp. 943-973, 1999.
[2] S. J. Elliott, "Down with noise," IEEE Spectrum, vol. 36, no. 6, pp. 54-61, 1999.
[3] S. J. Elliott and P. A. Nelson, "Active noise control," IEEE Signal Processing Mag., vol. 10, pp. 12-35, 1993.
[4] P. A. Nelson and S. J. Elliott, Active Control of Sound. London, U.K.: Academic, 1992.
[5] C. H. Hansen and S. D. Snyder, Active Control of Noise and Vibration. E & FN Spon, 1997.
[6] S. M. Kuo and D. R. Morgan, Active Noise Control Systems: Algorithms and DSP Implementations. New York: Wiley, 1996.
[7] C. R. Fuller, S. J. Elliott, and P. A. Nelson, Active Control of Vibration. New York: Academic, 1996.
[8] M. Bouchard, B. Paillard, and C. T. Le Dinh, "Improved training of neural networks for nonlinear active control of noise and vibration," IEEE Trans. Neural Networks, vol. 10, pp. 391-401, 1999.
[9] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1985.
[10] S. Haykin, Adaptive Filter Theory, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[11] S. D. Snyder and N. Tanaka, "Active control of vibration using a neural network," IEEE Trans. Neural Networks, vol. 6, pp. 819-828, 1995.
[12] M. O. Tokhi and R. Wood, "Active noise control using radial basis function networks," Contr. Eng. Practice, vol. 5, pp. 1311-1322, 1997.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1999.
[14] F. Beaufays and E. A. Wan, "Relating real-time backpropagation and backpropagation-through-time: An application of flow graph interreciprocity," Neural Comput., vol. 6, no. 2, pp. 296-306, 1994.
[15] E. A. Wan and F. Beaufays, "Diagrammatic derivation of gradient algorithms for neural networks," Neural Comput., vol. 8, no. 1, pp. 182-201, 1996.
[16] E. A. Wan, "Adjoint-LMS: An efficient alternative to the filtered-x LMS and multiple error LMS algorithms," in Proc. ICASSP 96, vol. 3, 1996, pp. 1842-1845.
[17] S. Shah, F. Palmieri, and M. Datum, "Optimal filtering algorithms for fast learning in feedforward neural networks," Neural Networks, vol. 5, pp. 779-787, 1992.
[18] S. Singhal and L. Wu, "Training feedforward networks with the extended Kalman filter," in Proc. ICASSP, vol. 2, 1989, pp. 1187-1190.
[19] M. Bouchard and S. Quednau, "Multichannel recursive-least-squares algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems," IEEE Trans. Speech and Audio Processing, vol. 8, pp. 606-618, 2000.
[20] G. V. Puskorius and L. A. Feldkamp, "Decoupled extended Kalman filtering training of feedforward layered networks," in Proc. Int. Joint Conf. Neural Networks, vol. 1, 1991, pp. 771-777.
[21] G. V. Puskorius and L. A. Feldkamp, "Neurocontrol of nonlinear dynamical systems with Kalman filter-trained recurrent networks," IEEE Trans. Neural Networks, vol. 5, pp. 279-297, 1994.

Martin Bouchard (M'98) received the B.Eng., M.App.Sc., and Ph.D. degrees in electrical engineering from Sherbrooke University, Sherbrooke, QC, Canada, in 1993, 1995, and 1997, respectively.
He worked in an instrumentation group at Bechtel-Lavalin in 1991, and in a real-time software group at CAE Electronics Inc. in 1992-1993. From 1993 to 1997, he worked at the Groupe d'Acoustique et de Vibrations de l'Université de Sherbrooke as a signal processing/control research engineer. From 1995 to 1997, he also worked at SoftDb Active Noise Control Systems Inc.
that he cofounded. In January 1998, he joined the University of Ottawa, Ottawa, ON, Canada, where he is currently an Assistant Professor at the School of Information Technology and Engineering (SITE). His current research interests are signal processing, adaptive filtering and neural networks, applied to speech, audio and acoustics. Dr. Bouchard is a member of the Ordre des Ingénieurs du Québec, the IEEE, the Audio Engineering Society and the Acoustical Society of America.