Pose Estimation in SAR using an Information Theoretic Criterion


Jose C. Principe, Dongxin Xu, John W. Fisher III
Computational NeuroEngineering Laboratory, University of Florida
{principe,xu,fisher}@cnel.ufl.edu

Abstract

This paper describes a pose estimation algorithm based on an information theoretic formulation. We formulate pose estimation statistically and show that pose can be estimated from a low dimensional feature space obtained by maximizing the mutual information between the aspect angle and the output of the mapper. We use the Havrda-Charvat definition of entropy to implement a nonparametric estimator based on the Parzen window method. Results on the MSTAR data set are presented and show the good performance of the methodology.

1.0 Introduction

Knowing the relative position of a vehicle with respect to the sensor (normally called the aspect angle of the observation, or the pose) is an important piece of information for vehicle recognition. Since pattern classifiers are statistical machines, without the pose information the classifier has to be trained with all possible poses to become invariant to aspect angle during operation. This is the principle of classifiers based on the synthetic discriminant function (SDF) so widely used in optical correlators [1], and of template based classifiers [2]. Even if the classifier is built around Bayesian principles or neural networks, all possible aspect angles have to be included during training to describe the object reliably. In SAR this is not a simple task due to the enormous variability of the scattering phenomenology created by man-made objects. This argument suggests that one could instead divide the classification into two stages: first find the pose of the object, and then decide the class by selecting a classifier trained exclusively for that pose. Notice that this approach drastically reduces the complexity of classifier training. This is in fact the principle used in the MSTAR architecture [3], where classification is divided into an indexing module followed by search and match.

However, the approach utilized in MSTAR is based on the traditional method of a priori selecting landmarks on the vehicle and then comparing them for the best match against a database of features taken at different angles. This solution has several drawbacks. First, it is computationally expensive (the search has to be done on-line). Second, it is highly dependent on the quality of the landmarks. Edges have proved useful in optical images, but in SAR point scatterers are normally preferred due to the different image formation characteristics; the issue is that point scatterers vary abruptly with depression angle and pose, so the stability of the method is still under investigation. Third, the size of the problem space increases drastically with the number of objects and the precision required when local features are utilized.

Instead of assuming that the system complexity is intrinsic to the problem [4], we submit that the problem formulation also affects the complexity of the solution. If the landmarks are local, then it is obvious that the problem does not scale up well. Our approach is to extract optimal features directly from the data by training an adaptive system. The advantages are the following. First, the method is very fast: once the system is trained, a test image is presented and the output of the system is the pose estimate, i.e. we have created a content addressable memory (CAM). Any microprocessor can do this in real time. Second, the system is not sensitive to the detection of landmarks, which is a big advantage primarily when we do not know how much information is carried by the landmarks.

Until the information theoretic formulation proposed here, this optimal feature extraction could only be done using principal component analysis (PCA) or linear discriminant analysis. PCA provides only global (rough) information about the objects (second order statistics), and the information it provides may not be directly related to pose, which is just one aspect of the input image, so the results may be disappointing. Our method of mutual information maximization, however, uses the full information contained in the probability density function (pdf), so it can exploit local information if that is needed to solve the problem, and the model parameters are directed specifically at the pose, which is our only interest here.

This paper starts with a statistical formulation of the problem of pose estimation, describes a method of computing entropy from samples and how to construct a mutual information estimator, and presents preliminary results on the MSTAR data set.

2.0 A statistical formulation of pose estimation

Suppose that we have collected data in pairs $(x_i, a_i)$, $i = 1, \ldots, N$, where the image $x_i$ can be regarded as a vector in a high dimensional space, $x_i \in R^m$ ($m$ is usually in the thousands), and $a_i$ is a vector of ground truth information relative to the image contents. For the general case of pose estimation, $a_i$ is a six dimensional vector containing the translational and rotational information [5]. Here we will treat the one degree of freedom (1DOF) pose estimation problem, where $x_i$ is a SAR image of a land vehicle obtained at a given depression angle and $a_i$ is the azimuth (aspect) angle of the vehicle. The MSTAR data set [6] can be readily utilized to test the accuracy of 1DOF pose estimation algorithms.

In general, the estimation of the aspect angle (here called pose) given a particular image $x$ can be formulated as a MAP (maximum a posteriori probability) problem:

$$\hat{a} = \arg\max_a f_{A|X}(a \mid x) \qquad (1)$$

where $f_{A|X}(a \mid x)$ is the a posteriori probability density function (pdf) of the aspect angle $A$ given $x$. This formulation implies that the best estimate of the aspect angle given $x$ is the one which maximizes the a posteriori probability. Although the aspect angle $A$ is a continuous variable, we can discretize it for convenience, the possible values being $a_i$, $i = 1, \ldots, N$, i.e. all the angles in the training set. Since we have no a priori knowledge about the aspect angle, the uniform distribution is the most reasonable assumption for the probability density of $A$, in the sense that it is the direct result of the MaxEnt principle [7]. Under these conditions, the above MAP problem is equivalent to ML (maximum likelihood):

$$\hat{a} = \arg\max_i P(a_i \mid x) = \arg\max_i \frac{P(a_i)\, f_{X|A}(x \mid a_i)}{f_X(x)} = \arg\max_i f_{X|A}(x \mid a_i) \qquad (2)$$

where $P(a_i \mid x)$, $i = 1, \ldots, N$, is the a posteriori probability of the discrete variable $A$ given $x$; $P(a_i)$ is the a priori probability of $A$, which here is uniform, i.e. $P(a_i) = \text{const}$ for $i = 1, \ldots, N$; $f_{X|A}(x \mid a_i)$ is the conditional pdf of the image $x$ for a particular aspect angle $a_i$; and $f_X(x)$ is the marginal pdf of $x$.
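As a concrete illustration of the decision rule in (2), the following minimal Python sketch picks the training angle whose conditional density assigns a test image the highest likelihood. The helper names and the per-angle density callables are our hypothetical placeholders; as argued next, these densities cannot in practice be estimated directly in the m-dimensional image space, which motivates the feature extraction below.

```python
import numpy as np

def ml_pose_decision(x, angles, cond_pdfs):
    """Discretized ML decision of Eq. (2): pick the training aspect angle
    a_i whose conditional density f_{X|A}(x | a_i) is largest at the image x.

    x         : flattened image chip, shape (m,)
    angles    : training aspect angles a_i, shape (N,)
    cond_pdfs : N callables, cond_pdfs[i](x) ~ f_{X|A}(x | a_i) (placeholders)
    """
    likelihoods = np.array([pdf(x) for pdf in cond_pdfs])
    return angles[np.argmax(likelihoods)]
```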

Therefore, from (2), the problem becomes the estimation of the conditional pdf of $x$ for all the possible angles $a_i$, $i = 1, \ldots, N$. Since $x$ is a very high dimensional vector, and any assumption about the form of the pdf is inappropriate for realistic pose estimation in SAR, a non-parametric method should be used. However, nonparametric pdf estimation of $x$ becomes very unreliable because $x$ lies in a very high dimensional space and training data is limited. Dimensionality reduction, or feature extraction, is therefore necessary: instead of estimating the angle directly from the image $x$, we estimate it from a feature space of the image.

Generally, a feature is the output of a mapping. Let $y = h(x, w)$ be a feature set for $x$, where $h: R^m \rightarrow R^k$ is a mapping, also called the feature extractor, $y \in R^k$, $k \ll m$, and $w$ is the parameter set of the feature extractor. The problem according to (2) now becomes:

$$\hat{a} = \arg\max_i f_{Y|A}(y \mid a_i), \qquad y = h(x, w) \qquad (3)$$

In this framework, the key issue is how to choose the parameter set $w$. We propose to apply information theory [8]. From the information theoretic point of view, a mapping or feature extractor is an information transmission channel. The parameters of the mapping should be chosen so that it transmits as much information as possible. Here, the problem requires that the mapping transmit the most information about the aspect angle, i.e. the feature $y$ should best represent the aspect angle. According to information theory, the quantitative measure for this purpose is the mutual information between the feature $y$ and the aspect angle $a$. Parameter selection can thus be formulated as:

$$w_{opt} = \arg\max_w I(y = h(x, w), a) \qquad (4)$$

where $I(y, a)$ is the mutual information between $y$ and $a$; that is, the optimal parameter set is the one which maximizes the mutual information between the feature and the angle.
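For concreteness, here is a minimal sketch of the feature extractor $y = h(x, w)$ in (3), assuming the single-hidden-layer MLP that the paper uses later as the mapper; the tanh nonlinearity and the NumPy weight layout are our illustrative assumptions, not a specification from the paper.

```python
import numpy as np

def mlp_features(x, W1, b1, W2, b2):
    """Feature extractor y = h(x, w): maps the m-dimensional image vector x
    to a k-dimensional feature (k = 2 here). The parameter set w consists of
    W1 (hidden x m), b1 (hidden,), W2 (k x hidden), b2 (k,)."""
    hidden = np.tanh(W1 @ x + b1)      # nonlinear hidden layer
    return np.tanh(W2 @ hidden + b2)   # bounded 2-D feature output y
```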

The mutual information measure relies directly on pdfs. As mentioned above, non-parametric pdf estimation should be used, so the Parzen window method [9] is selected here. Unfortunately, Shannon's mutual information measure becomes too complex to be implementable with the Parzen window pdf estimate. In the next section we introduce our method of mutual information estimation based on the Havrda-Charvat entropy.

2.1 Pose estimation using the Havrda-Charvat entropy

Figure 1 shows the proposed block diagram for pose estimation: the image $x$ is passed through the mapper $y = f(x, w)$, the mutual information $I(Y, A)$ between the output $y = (y_1, y_2)$ and the angles $A$ is estimated, and this estimate is used to adapt the parameters $w$.

[Figure 1. Pose estimation with the MLP]

From information theory, the mutual information can be computed as the difference between the entropy and the conditional entropy:

$$I(Y, A) = H_2(Y) - H_2(Y \mid A) \qquad (5)$$

where $Y$ is the feature and $A$ is the aspect angle. For reasons connected to the estimation of entropy from samples, we utilize here the Havrda-Charvat definition of entropy [10]:

$$H_\alpha(Y) = \frac{1}{1-\alpha}\left(\int f_Y(y)^\alpha\, dy - 1\right) \qquad (6)$$

with $\alpha = 2$, which will also be called the quadratic entropy. For a more in-depth discussion of several definitions of entropy see [10]. $H_2(Y)$ is then the quadratic entropy of the output and $H_2(Y \mid A)$ is the conditional quadratic entropy. Since the MLP is a universal mapper [11], it is used in this application as the mapping function (here with a 6400x3x2 configuration, i.e. 6400 inputs, 3 hidden units, and 2 outputs). The problem can now be described as finding the parameters $w$ of the MLP such that the mutual information between the output of the MLP and the aspect angle is maximized, i.e. we let the output convey the most information about the aspect angle. We can think of this scheme as information filtering, as opposed to the more traditional image filtering so commonly utilized in image processing.

Suppose the training data are pairs $\{x_i, a_i\}$, where $x_i$ is a SAR image of a vehicle and $a_i$ is its true azimuth (aspect) angle. The feature $y_i = h(x_i, w)$ is a 2-dimensional vector $(y_{1i}, y_{2i})$, from which the aspect can be easily measured as the angle of the vector. We can discretize the angles uniformly around the curve described by the output vector, as shown in Figure 2, where a circumference is assumed for simplicity.

[Figure 2. Structure for the angle information: the discretized angles $a_0, a_1, \ldots, a_N$ placed around a circle in the $(y_1, y_2)$ output space.]

In our problem formulation, the pose is a random variable which must be described statistically.
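A short sketch of the discretization in Figure 2 follows, under the assumption (ours, for illustration) that the N training angles are spread uniformly around the full unit circumference in the $(y_1, y_2)$ output plane; the variable names are hypothetical.

```python
import numpy as np

# Place the N discretized training angles around the unit circle in the
# (y1, y2) output space, as sketched in Figure 2.
N = 53                                      # e.g. one chip every ~3.5 degrees
aspect_deg = np.linspace(0.0, 180.0, N)     # training aspect angles a_i
theta = 2.0 * np.pi * aspect_deg / 180.0    # assumed map of 0-180 deg onto the circle
circle_points = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (N, 2)
```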

We create a local structure by weighting the samples of adjacent angles:

samples:  $a_{i-L}, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_{i+L}$
weights:  $w_{-L}, \ldots, w_{-1}, w_0, w_1, \ldots, w_L$

The neighborhood size was experimentally set at $L = 2$ nearest neighbors, and the weighting was selected as a Gaussian decay. Effectively, this arrangement says that there is a fuzzy correspondence between several possible angles and each of the sampled points on the unit circumference.

The reason we selected the HC quadratic entropy is related to the Parzen window estimator presented in [12]. Let $y_i \in R^k$, $i = 1, \ldots, N$, be a set of samples from a random variable $Y \in R^k$ in $k$-dimensional feature space. One interesting question is what entropy is associated with this set of data points. One answer lies in the estimation of the data pdf by the Parzen window method using a Gaussian kernel:

$$f_Y(y) = \frac{1}{N} \sum_{i=1}^{N} G(y - y_i, \sigma^2) \qquad (7)$$

where $G(y, \sigma^2) = \frac{1}{(2\pi)^{k/2}\sigma^k} \exp\left(-\frac{y^T y}{2\sigma^2}\right)$ is the Gaussian kernel in $k$-dimensional space and $\sigma^2$ is the variance. When Shannon's entropy is used with this pdf estimate, the measure becomes very complex. Fortunately, the HC quadratic entropy of (6) leads to a simpler form, and we obtain the following entropy measure for a set of discrete data points $\{y_i\}$:

$$H(\{y_i\}) = H_2(Y \mid \{y_i\}) = 1 - \int f_Y(y)^2\, dy = 1 - V(\{y_i\})$$

$$V(\{y_i\}) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \int G(y - y_i, \sigma^2)\, G(y - y_j, \sigma^2)\, dy = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G(y_i - y_j, 2\sigma^2) \qquad (8)$$

With this estimator, the mutual information related to the quadratic HC entropy becomes

$$I(Y, A) = \frac{1}{N} \sum_{i} \int \left(\sum_{l=-L}^{L} w_l\, G(y - y_{i+l}, \sigma^2)\right)^2 dy \; - \; \int \left(\frac{1}{N} \sum_{i} G(y - y_i, \sigma^2)\right)^2 dy \qquad (9)$$

The second term estimates the entropy due to all the input images, while the first term estimates the conditional entropy.
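The following Python sketch implements the quadratic entropy estimator (8) through the pairwise-kernel identity, together with one plausible reading of the mutual information estimator (9) as the difference between a neighbor-weighted (conditional) information potential and the marginal one. The wrap-around neighbor indexing and the normalization of the weights are our assumptions, not details taken from the paper.

```python
import numpy as np

def gauss(d, var):
    """Isotropic Gaussian kernel G(d, var) in k dimensions, as in Eq. (7)."""
    k = d.shape[-1]
    return np.exp(-np.sum(d * d, axis=-1) / (2.0 * var)) / ((2.0 * np.pi * var) ** (k / 2))

def information_potential(Y, sigma2):
    """V({y_i}) of Eq. (8): (1/N^2) sum_ij G(y_i - y_j, 2*sigma^2).
    The quadratic entropy estimate is then H_2 = 1 - V."""
    diffs = Y[:, None, :] - Y[None, :, :]          # (N, N, k) pairwise differences
    return gauss(diffs, 2.0 * sigma2).mean()

def quadratic_mi(Y, weights, sigma2):
    """Hypothetical reading of Eq. (9): conditional minus marginal
    information potential. `weights` holds the Gaussian decay w_l over the
    2L+1 adjacent-angle neighbors of each sample."""
    N, L = len(Y), (len(weights) - 1) // 2
    v_marg = information_potential(Y, sigma2)
    v_cond = 0.0
    for i in range(N):
        idx = [(i + m) % N for m in range(-L, L + 1)]  # wrap around the circle
        d = Y[idx][:, None, :] - Y[idx][None, :, :]
        w = np.outer(weights, weights) / weights.sum() ** 2
        v_cond += (w * gauss(d, 2.0 * sigma2)).sum()
    return v_cond / N - v_marg
```

Note that the marginal term costs O(N^2) kernel evaluations, which is the practical bottleneck on training set size mentioned later in the paper.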

To train the MLP, we take the derivative of (9) with respect to the parameters and interpret it as an injected error for the back-propagation algorithm [12]. In this way, the feature extraction mapping for pose estimation is obtained. After training, a test image $x$ is presented to the MLP and its output $y$ is evaluated under the discrete conditional pdf estimate $f_{Y|A}(y \mid a_i)$ in the output feature space; the pose is then estimated using (3).
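In this circular geometry the pose readout of (3) reduces to taking the direction of the 2-D output and snapping it to the nearest discretized angle; a minimal sketch follows (our variable names, reusing the hypothetical `circle_points` from the earlier snippet).

```python
import numpy as np

def estimate_pose(y, circle_points, aspect_deg):
    """Pose readout after training: only the direction of the output feature
    y matters, so radial shrinkage on test chips leaves the estimate
    unchanged. Returns the training angle of the nearest circle point."""
    y_unit = y / (np.linalg.norm(y) + 1e-12)     # keep only the direction
    nearest = np.argmax(circle_points @ y_unit)  # maximum cosine similarity
    return aspect_deg[nearest]
```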

3.0 Experimental Results

This algorithm was validated on the MSTAR public release database [6]. We trained the pose estimator with the class BMP-2 vehicle, type sn-c21, at a depression angle of 15 degrees. We simply clipped the chips (128x128) from pixel 20 to 99 both vertically and horizontally (obtaining image chips of size 80x80) to preserve the image of the vehicle and its shadow. No fine centering of the vehicle was attempted. The training set was constructed from 53 chips taken approximately 3.5 degrees apart to cover angles from 0 to 180 degrees. The algorithm takes about 100 batch iterations to converge (very repeatable performance).

In Figure 3 the circle at left (diamonds) represents the training results in the feature space. Notice that the MLP trained with our criterion created an output that is almost a perfect circle. The circle can be interpreted as the best output distribution to maximize the mutual information between the input and the pose. This result is intuitively appealing, but notice that it was discovered automatically by our algorithm (i.e. we did not enforce the circle as a desired response). The triangles at the left show typical results on a test set (the chips for the same vehicle not used for training). It is interesting that the amplitude for the test set fluctuates considerably, but the outputs tend to move inwards along the radial direction, preserving the quality of the pose estimation. This means that the algorithm created an output metric that preserves angle relationships, as we expected. The plot at the right shows the true and estimated pose; the vertical axis is the angle and the horizontal axis is the exemplar index.

[Figure 3. BMP-2, sn-c21 (180 degree training): output feature space (left) and true versus estimated pose (right).]

The testing was conducted on the rest of the chips from the same vehicle and on two other vehicle types (sn-9563 and sn-9566) which represent different configurations (all at the same depression angle). We also tested the pose estimator on a different class, the T-72, using the type sn-s7. Table 1 quantifies the results.

Table 1: Testing with 0-180 degree training

class/type      error mean (degrees)    error std. dev. (degrees)
BMP2/sn-c21     3.45                    2.58
BMP2/sn-9563    4.99                    3.87
BMP2/sn-9566    4.99                    6.64
T72/sn-s7       6.98                    5.19

Notice that the pose estimation error when testing on the same vehicle type is basically the same as the angular resolution of the training set (3.5 degrees), which means that the accuracy of the estimator is very good. We therefore expect that more precise pose estimates are achievable by creating training sets with more images at finer resolution in pose.
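For reference, error statistics like those in Table 1 should be computed with proper angular wraparound; a small sketch (our hypothetical helper, assuming the 180 degree period of the training range):

```python
import numpy as np

def pose_error_stats(est_deg, true_deg, period=180.0):
    """Mean and standard deviation of the pose error in degrees, taking the
    shortest angular distance over the given period."""
    err = np.abs(np.asarray(est_deg) - np.asarray(true_deg)) % period
    err = np.minimum(err, period - err)   # wraparound-aware angular distance
    return err.mean(), err.std()
```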

Table 1 also shows that the algorithm generalizes very well both to other vehicle types and even to other vehicle classes. We notice a degradation in performance on the T-72, but it is a smooth roll-off. To obviate this degradation of performance with vehicle type, we should utilize more than one vehicle in the training set, which at the same time would obviate the resolution problem addressed above. However, we have to note that the algorithm for mutual information estimation is O(N^2), which means that there is in practice a limit on the number of exemplars N utilized in training.

In order to quantify the robustness of the algorithm to vehicle occlusion, we progressively replaced one vehicle image with background (an image of the BMP-2 not used in training). We observed that although the amplitude of the output feature decreased appreciably when the bright return of the vehicle was substituted by the darker background (the triangles in the left portion of Figure 4), the pose estimation held up remarkably well (right portion of Figure 4). In this case the pose was within +/- 5 degrees up to 50% occlusion and within +/- 10 degrees up to 95% occlusion (which occurs at increment 36 in the plot). In our opinion this smooth degradation is one of the advantages of using a distributed system as a mapper, and the same behavior has been extensively reported in the associative memory literature [13]. However, different occlusion directions may yield different performance (it all depends upon which portions of the image are occluded).

[Figure 4. Results of pose estimation with vehicle occlusion. Vehicle pose is 58 degrees.]

4.0 Conclusions

This paper reports on our present efforts to create a robust and easy-to-implement pose estimator for SAR imagery. The need for such an algorithm stems from our goal of creating accurate and robust classifiers. Knowing the pose of the vehicle will streamline the size and training of the classifier module, which should translate into better performance.

Our pose estimation framework is statistical in nature and utilizes information directly, through the manipulation of entropy estimated from examples. We address the enormous complexity of the input space by creating an adaptive system with optimal parameters. This is probably the best way to confront and conquer complexity: we project the input data onto a subspace such that some property of the input relevant to our problem is optimally preserved. This can be thought of as information filtering, as opposed to the more conventional signal filtering.

The central issue is the choice of the criterion for optimization. We were fully aware of the limitations of the second order methods traditionally utilized in pattern recognition, so we sought a method that would utilize the full information about the pdf of the input class. The mutual information between the feature and the pose becomes the criterion of choice. This criterion simply measures the uncertainty about pose remaining in the feature (the output of the mapper). By maximizing mutual information we decrease the uncertainty about pose in the feature, i.e. we transfer as much information as possible between the feature and the pose. There are also other reasons to use mutual information for classification, such as the decrease of the lower bound on the classification error according to Fano's inequality [14].

The big issue is the estimation of entropy from examples. In [12] we proposed a Parzen window estimate of the pdf, along with the mean squared difference between the uniform distribution and the estimated one, to manipulate the output entropy. The derivative of the criterion can be used as an injected error to adapt the parameters of our mapper (linear or nonlinear) using the backpropagation algorithm. In this paper we couple the entropy estimator with the quadratic entropy of Havrda-Charvat to arrive at an estimator of mutual information.

The preliminary results of our method are very promising. We successfully trained our pose estimator with MSTAR vehicles.

The accuracy on the test set is similar to that on the training set for the same vehicle, and the performance degrades gracefully for other vehicle types. Even with severe occlusion of the training vehicle (up to 95% occlusion) we obtain pose estimates within +/- 10 degrees.

Further testing of the algorithm is required, as well as further refinements of the theory. The image set is realistic, but still simple (1DOF). Extension to more degrees of freedom will be pursued next, as well as more vehicles. Our pose estimator is based on the fact that the angle is discrete; it is important for accuracy to treat the angle as a continuous variable, which will require a new estimator for the conditional entropy. It is also important to understand the algorithm better and to compare its performance with alternate approaches. One of the bottlenecks of the method is that the computation is O(N^2), which imposes a practical limit on the size of training sets.

Acknowledgments: This work was partially supported by DARPA-Air Force grant F33615-97-1019.

5.0 References

[1] Kumar B., Minimum variance synthetic discriminant functions, J. Opt. Soc. Am. A 3(1), 1579-1584, 1986.
[2] Duda R. and Hart P., Pattern Classification and Scene Analysis, Wiley, 1973.
[3] MSTAR Kickoff Meeting Proceedings, Washington, 1995.
[4] Minardi M., Moving & stationary target acquisition and recognition, WL talk, September 1997.
[5] Lowe D., Solving parameters of object models from image descriptions, in Proc. ARPA IU Workshop, pp. 121-127, 1980.
[6] MSTAR (Public) Targets, CD-ROM, Veda Inc., Ohio, 1997.
[7] Jaynes E., Information theory and statistical mechanics, Phys. Rev., vol. 106, pp. 620-630, 1957.
[8] Shannon C.E., A mathematical theory of communication, Bell Sys. Tech. J. 27, pp. 379-423, 623-653, 1948.
[9] Parzen E., On the estimation of a probability density function and the mode, Ann. Math. Stat. 33, p. 1065, 1962.

[10] Kapur J.N., Measures of Information and Their Applications, John Wiley & Sons, 1994.
[11] Haykin S., Neural Networks: A Comprehensive Foundation, Macmillan, 1994.
[12] Fisher J., Principe J., Entropy manipulation of arbitrary nonlinear mappings, in Proc. IEEE Workshop on Neural Networks for Signal Processing VII, pp. 14-23, 1997.
[13] Kohonen T., Self-Organization and Associative Memory, Springer-Verlag, 1987.
[14] Fisher J.W. III, Nonlinear Extensions to the Minimum Average Correlation Energy Filter, Ph.D. dissertation, Dept. of ECE, University of Florida, 1997.