A Nonlinear Extension of the MACE Filter

John W. Fisher III and Jose C. Principe
Computational NeuroEngineering Laboratory, Department of Electrical Engineering, University of Florida
fisher@synapse.ee.ufl.edu, principe@brain.ee.ufl.edu

Abstract - The minimum average correlation energy (MACE) filter, which is linear and shift invariant, has been used extensively in the area of automatic target detection and recognition (ATD/R). We present a nonlinear extension of the MACE filter, based on a statistical formulation of the optimization criterion, of which the linear MACE filter is a special case. A method by which nonlinear topologies can be incorporated into the filter design is presented and adaptation issues are discussed. In particular, we outline a method that avoids training exhaustively over the image plane and thereby leads to much shorter adaptation. Experimental results, using target chips from 35 GHz TABILS 24 inverse synthetic aperture radar (ISAR) data, are presented and performance comparisons are made between the MACE filter and this nonlinear extension.

Keywords - correlation filters, MACE filter, ISAR, automatic target recognition.

Acknowledgement: This work was partially supported by ARPA grant N C-A335.

Contact: John W. Fisher, 405 CSE, BLDG 42, University of Florida, Gainesville, FL, fisher@synapse.ee.ufl.edu

1.0 Introduction

In the area of automatic target detection and recognition (ATD/R), it is not only desirable to recognize various targets, but also to locate them with some degree of resolution. The minimum average correlation energy (MACE) filter (Mahalanobis et al, 1987) is of interest to the ATD/R problem due to its localization and discrimination properties. Correlation filters, of which the MACE is an example, have been widely used in optical pattern recognition in recent years. Our current interest is in the application of these types of filters to high-resolution synthetic aperture radar (SAR) imagery. Some recent articles have appeared showing experimental results using correlation filters on SAR data. Mahalanobis (Mahalanobis, Forman et al, 1994) used MACE filters in combination with distance classifier methods to discriminate 5 classes of vehicles and reject natural clutter as well as a confusion vehicle class with fairly good results. Novak (Novak et al, 1994) presents results comparing the performance of several classifiers, including the MACE filter, on SAR data.

The MACE filter is a member of a family of correlation filters derived from the synthetic discriminant function (SDF) (Hester and Casasent, 1980). Other generalizations of the SDF include the minimum variance synthetic discriminant function (MVSDF) (Kumar et al, 1988), the MACE filter, and more recently the gaussian minimum average correlation energy (G-MACE) (Casasent et al, 1991) and the minimum noise and correlation energy (MINACE) (Ravichandran and Casasent, 1992) filters. All of these filters are linear and shift-invariant (we refer here to the signal processing definition of shift invariance, that is, an operator is said to be shift invariant if a shift in the input results in a corresponding shift in the output) and can be formulated as a quadratic optimization subject to a set of linear constraints in either the sample or spectral domain. The solution to these problems is obtained using the method of Lagrange multipliers. Kumar (Kumar, 1992) gives an excellent review of these filters.

The bulk of the research using these types of filters has concentrated on optical and infra-red (IR) imagery and on overcoming recognition problems in the presence of distortions associated with

3-D to 2-D mappings, i.e. scale and rotation. Usually, several exemplars from the recognition class are used to represent the range of distortions over which the filter is to be used. Although the distortions in SAR imagery do not occur in the same way, that is, a change in target aspect does not manifest exactly as a rotation in the SAR image, exemplars may still be sufficient to model a single target class over a range of target aspects and relative depression angles.

Our focus is on the MACE filter and its variants because they are designed to produce a narrow constrained-amplitude peak response when the filter mask is centered on a target in the recognition class while minimizing the energy in the rest of the output plane. The filter can be modified to produce a low variance output for a designated rejection class as well. Another property of the MACE filter is that the constrained peak output is guaranteed over the training exemplars to be the maximum in the output image plane (Mahalanobis et al, 1987). Since the MACE filter is linear, it can only be used to realize linear discriminant functions. Along with its desirable properties, it has been shown to be limited in its ability to generalize to between-aspect exemplars that are in the recognition class (but not in the training set), while simultaneously rejecting out-of-class inputs (Ravichandran and Casasent, 1992)(Casasent and Ravichandran, 1992)(Casasent et al, 1991). The number of design exemplars can be increased in order to overcome generalization problems; however, the computation of the filter coefficients becomes computationally prohibitive and numerically unstable as the number of design exemplars is increased (Kumar, 1992). The MINACE and G-MACE variations have improved generalization properties with a slight degradation in the average output plane variance and sharpness of the central peak, respectively.

In the sample domain, the SDF family of correlation filters is equivalent to a cascade of a linear pre-processor followed by a linear correlator (Mahalanobis et al, 1987)(Kumar, 1992). Fisher and Principe (1994) showed that this is equivalent to a pre-processor followed by a linear associative memory (LAM), illustrated in figure 1 with vector operations. The pre-processor, in the case of the MACE filter, is a pre-whitening filter computed on the basis of the average power spectrum

of the recognition class training exemplars. Mahalanobis et al (1987) use the term synthetic discriminant function (SDF) to refer to the LAM portion of the filter decomposition.

FIGURE 1. Decomposition of an SDF-type filter in the space domain, assuming image and filter coefficients have been re-ordered into vectors. The input image vector, x, is pre-processed by the linear transformation y = Ax. The resulting vector is processed by a linear associative memory (LAM), h = y(y^T y)^{-1} d, producing the scalar output y_out = y^T h.

We use the associative memory viewpoint for investigating extensions to the MACE filter. It is well known that nonlinear associative memory structures can outperform their linear counterparts on the basis of generalization and dynamic range (Kohonen, 1988)(Hinton and Anderson, 1981). In general, they are more difficult to design as their parameters cannot be computed in closed form. The parameters for a large class of nonlinear associative memories can, however, be determined by gradient search techniques.

In this paper we discuss a nonlinear extension of the MACE filter that shows promise in overcoming some of the problems described. In our development we show that the performance of the linear MACE filter can be improved upon in terms of generalization while maintaining its desirable properties, i.e. a sharp, constrained peak at the center of the output plane. We present experimental results using a simple nonlinear modification of the MACE filter: we replace the LAM portion of the filter with a nonlinear associative memory structure, specifically a feedforward multi-layer perceptron (MLP), which retains the shift invariance properties but yields improved performance via a nonlinear discriminant function.

In section 2.0 we review the MACE filter formulation and its relationship to associative memories. In section 3.0 we develop a generalized statistical filter structure of which the linear MACE filter is a special case. Section 4.0 details experimental results using TABILS 24 inverse synthetic

aperture radar (ISAR) imagery. We compare the performance of the linear MACE filter to a nonlinear extension. We draw our conclusions and observations in section 5.0.

2.0 MACE Filter as an Associative Memory

In the original development, SDF-type filters were formulated using correlation operations, although a convolutional approach can be easily adopted. The output, g(n_1, n_2), of a correlation filter is determined by

g(n_1, n_2) = \sum_{m_1=0}^{N_1-1} \sum_{m_2=0}^{N_2-1} x^*(n_1+m_1, n_2+m_2)\, h(m_1, m_2) = x(n_1, n_2) \circ h(n_1, n_2),

where x^*(n_1, n_2) is the complex conjugate of the input image with N_1 \times N_2 region of support and h(n_1, n_2) represents the filter coefficients.
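As a concrete illustration, the correlation plane can be evaluated directly from this definition. The sketch below is ours, not the authors' code; it uses circular shifts, the convention implicit in the DFT formulation that follows.

    import numpy as np

    def correlation_output(x, h):
        """Circular correlation plane g(n1, n2) = sum_m x*(n1+m1, n2+m2) h(m1, m2).

        Computed directly from the definition (a sketch; an FFT-based
        implementation would normally be used instead)."""
        N1, N2 = x.shape
        g = np.zeros((N1, N2), dtype=complex)
        for n1 in range(N1):
            for n2 in range(N2):
                # circular shift aligns x(n1+m1, n2+m2) with h(m1, m2)
                shifted = np.roll(x, (-n1, -n2), axis=(0, 1))
                g[n1, n2] = np.sum(np.conj(shifted) * h)
        return g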

The MACE filter is formulated as follows (Mahalanobis et al, 1987). Given a set of image exemplars \{x_i \in R^{N_1 \times N_2}; i = 1, \dots, N_t\}, we wish to find filter coefficients h \in R^{N_1 \times N_2} such that the average correlation energy at the output of the filter,

E = \frac{1}{N_t} \sum_{i=1}^{N_t} \frac{1}{N_1 N_2} \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} |g_i(n_1, n_2)|^2,   (1)

is minimized subject to the constraints

g_i(0, 0) = \sum_{m_1=0}^{N_1-1} \sum_{m_2=0}^{N_2-1} x_i^*(m_1, m_2)\, h(m_1, m_2) = d_i; \quad i = 1, \dots, N_t.   (2)

Mahalanobis (Mahalanobis et al, 1987) reformulates this as a vector optimization in the spectral domain using Parseval's theorem. Let X \in C^{N_1 N_2 \times N_t} be a matrix whose columns contain the 2-D DFT coefficients of the exemplars \{x_1, \dots, x_{N_t}\} reordered into column vectors. Let the matrix D_i \in R^{N_1 N_2 \times N_1 N_2} be a diagonal matrix whose diagonal elements contain the magnitude squared of the 2-D DFT coefficients of the i-th exemplar. The diagonal elements of the matrix

D = \frac{1}{N_t} \sum_{i=1}^{N_t} D_i   (3)

are then the average power spectrum of the training exemplars. The solution to this optimization problem can be found using the method of Lagrange multipliers. In the spectral domain, the filter that satisfies the constraints of equation (2) and minimizes the criterion of equation (1) (Mahalanobis et al, 1987)(Kumar, 1992) is

H = (N_1 N_2)\, D^{-1} X (X^{\dagger} D^{-1} X)^{-1} d,   (4)

where \dagger denotes the conjugate transpose, H \in C^{N_1 N_2 \times 1} contains the 2-D DFT coefficients of the filter (assuming the nonunitary 2-D DFT as defined in (Oppenheim and Schafer, 1989)) re-ordered into a column vector, and d \in R^{N_t \times 1} contains the desired outputs, d_i, for each exemplar. This formulation can be easily cast as an associative memory.
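The closed-form solution of equations (3) and (4) is straightforward to evaluate numerically. The following sketch is illustrative only; the function name and the FFT normalization are our choices and the authors' implementation is not specified.

    import numpy as np

    def mace_filter(exemplars, d):
        """Sketch of the spectral-domain MACE solution of equation (4).

        exemplars : array of shape (Nt, N1, N2), the training chips x_i
        d         : length-Nt vector of desired peak values
        Returns the space-domain filter coefficients h (N1 x N2)."""
        Nt, N1, N2 = exemplars.shape
        # Columns of X hold the 2-D DFT coefficients of each exemplar.
        X = np.stack([np.fft.fft2(x).ravel() for x in exemplars], axis=1)
        # Diagonal of D: average power spectrum of the exemplars, equation (3).
        Ddiag = np.mean(np.abs(X) ** 2, axis=1)
        Dinv_X = X / Ddiag[:, None]                      # D^{-1} X without forming D
        G = X.conj().T @ Dinv_X                          # X^dagger D^{-1} X  (Nt x Nt)
        H = (N1 * N2) * Dinv_X @ np.linalg.solve(G, d)   # equation (4)
        h = np.real(np.fft.ifft2(H.reshape(N1, N2)))     # back to the space domain
        return h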

In general, associative memories are mechanisms by which patterns can be related to one another, typically in an input/output pair-wise fashion. From a signal processing perspective we view associative memories as projections (Kung, 1992), linear and nonlinear. The input patterns exist in a vector space and the associative memory projects them onto a new space. Kohonen's linear associative memory (Kohonen, 1988) is formulated exactly in this way. A simple form of the linear associative memory (the hetero-associative memory) maps vectors to scalars; that is, given a set of input/output vector/scalar pairs \{x_i \in R^{N \times 1}, d_i \in R; i = 1, \dots, N_t\}, find the linear projection, h, such that

h^T x = d^T   (5)

and, in the under-determined case, the product

h^T h   (6)

is minimized, while for the over-determined case h is found such that

(h^T x - d^T)(h^T x - d^T)^T   (7)

is minimized. The columns of the matrix x = [x_1 \cdots x_{N_t}] contain the input vectors and the elements of the vector d = [d_1 \cdots d_{N_t}]^T contain the associated desired output scalars. The optimal solution for the under-determined case, using the pseudo-inverse of x, is (Kohonen, 1988)

h = x (x^T x)^{-1} d.   (8)

As was shown in (Fisher and Principe, 1994), if we modify this linear associative memory model slightly by adding a pre-processing linear transformation matrix, A, and find h such that the under-determined system of equations

h^T (A x) = d^T   (9)

is satisfied while h^T h is minimized, we get the result

h = A x (x^T A^T A x)^{-1} d.   (10)

If the pre-processing transformation, A, is the space-domain equivalent of the MACE filter's spectral pre-whitening filter, then equation (10), combined with the pre-processing transformation, yields exactly the space-domain coefficients of the MACE filter when the input vectors, x, are the re-ordered elements of the original images.
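Equation (10) is equally compact in code. The sketch below simply pre-whitens the exemplars and applies the minimum-norm LAM solution; the array shapes and function name are our assumptions, not the authors' implementation.

    import numpy as np

    def prewhitened_lam(A, x, d):
        """Minimum-norm LAM of equation (10): h = A x (x^T A^T A x)^{-1} d.

        A : (N, N) space-domain pre-whitening transformation
        x : (N, Nt) matrix whose columns are the exemplar vectors
        d : (Nt,) desired outputs"""
        y = A @ x                               # pre-whitened exemplars, y = Ax
        h = y @ np.linalg.solve(y.T @ y, d)     # minimum-norm constrained solution
        return h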

3.0 Nonlinear Extension of the MACE Filter

The MACE filter is the best linear system that minimizes the energy in the output correlation plane subject to a peak constraint at the origin. One of the advantages of linear systems is that we have the mathematical tools to use them in optimal operating conditions. Such optimality conditions, however, should not be confused with the best possible performance. In the case of the MACE filter one drawback is poor generalization.

A possible approach to designing a nonlinear extension of the MACE filter with improved generalization properties is simply to substitute the linear processing elements of the LAM with nonlinear elements. Since such a system can be trained with error backpropagation, the issue would simply be to report performance comparisons with the MACE. Such a methodology does not, however, lead to an understanding of the role of the nonlinearity, and does not elucidate the tradeoffs in the design and in training. Here we approach the problem from a different perspective. We seek to extend the optimality condition of the MACE to a nonlinear system, i.e. the energy in the output space is minimized while maintaining the peak constraint at the origin. Hence we impose these constraints directly in the formulation, even knowing a priori that an analytical solution is very difficult or impossible to obtain.

We reformulate the MACE filter from a statistical viewpoint and generalize it to arbitrary mapping functions, linear and nonlinear. We begin with a random vector, x \in R^{N_1 N_2 \times 1}, which is representative of the rejection class, and a set of observations of the random vector, placed in the matrix x_o \in R^{N_1 N_2 \times N_t}, which represent the target sub-class. We wish to find the parameters, \alpha, of a mapping g(\alpha, x): R^{N_1 N_2 \times 1} \to R such that we may discriminate target vectors from vectors in the general rejection class. In this sense the mapping function, g, constrains the discriminator topology. Towards this goal, we wish to minimize the objective function

J = E(g(\alpha, x)^2)

over the mapping parameters, \alpha, subject to the system of constraints

g(\alpha, x_o) = d^T,   (11)

where d \in R^{N_t \times 1} is a column vector of desired outputs. It is assumed that the mapping function is applied to each column of x_o, and E(\cdot) is the expected value operator. Using the method of Lagrange multipliers, we can augment the objective function as

J = E(g(\alpha, x)^2) + (g(\alpha, x_o) - d^T)\lambda,   (12)

where the mapping is again assumed to be applied to each column of x_o. Computing the gradient with respect to the mapping parameters yields

\frac{\partial J}{\partial \alpha} = 2 E\!\left( g(\alpha, x)\, \frac{\partial g(\alpha, x)}{\partial \alpha} \right) + \frac{\partial g(\alpha, x_o)}{\partial \alpha}\, \lambda.   (13)

Equation (13), along with the constraints of equation (11), can be used to solve for the optimal parameters, \alpha_o, assuming our constraints form a consistent set of equations. This is, of course, dependent on the network topology. For arbitrary nonlinear mappings it will, in general, be very difficult to solve for globally optimal parameters analytically. Our initial goal, instead, is to develop topologies and adaptive training algorithms which are practical and yield improved generalization over the linear mappings.

It is interesting to verify that this formulation yields the MACE filter as a special case. If, for example, we choose the mapping to be a linear projection of the input image, that is,

g(\alpha, x) = \alpha^T x; \quad \alpha = [h_1 \cdots h_{N_1 N_2}]^T \in R^{N_1 N_2 \times 1},

then equation (12) becomes, after simplification,

J = \alpha^T E(x x^T) \alpha + (\alpha^T x_o - d^T)\lambda.   (14)
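Writing \hat{R}_x for an estimate of E(x x^T), the intermediate algebra connecting equation (14) to the solution quoted below as equation (15) is not spelled out in the text but follows directly:

    \frac{\partial J}{\partial \alpha} = 2 \hat{R}_x \alpha + x_o \lambda = 0
        \;\Rightarrow\; \alpha = -\tfrac{1}{2} \hat{R}_x^{-1} x_o \lambda,
    x_o^T \alpha = d
        \;\Rightarrow\; \lambda = -2\,(x_o^T \hat{R}_x^{-1} x_o)^{-1} d
        \;\Rightarrow\; \alpha = \hat{R}_x^{-1} x_o (x_o^T \hat{R}_x^{-1} x_o)^{-1} d.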

In order to solve for the mapping parameters, \alpha, we are still left with the task of computing the term E(x x^T) which, in general, we can only estimate from observations of the random vector, x. Assuming that we have a suitable estimator, the well known solution to the minimum of equation (14) over the mapping parameters, subject to the constraints of equation (11), is

\alpha = \hat{R}_x^{-1} x_o (x_o^T \hat{R}_x^{-1} x_o)^{-1} d, \quad \text{where } \hat{R}_x = \text{estimate}\{E(x x^T)\}.   (15)

Depending on the characterization of x, equation (15) describes various SDF-type filters (i.e. MACE, MVSDF, etc.). In the case of the MACE filter, the random vector, x, is characterized by all 2-D circular shifts of target class images away from the origin. Solving for the MACE filter coefficients is therefore equivalent to using the average circular autocorrelation sequence (or equivalently the average power spectrum in the frequency domain) over images in the target class as the estimator of the elements of the matrix E(x x^T). Sudharsanan et al (Sudharsanan et al, 1990) suggest a very similar methodology for improving the performance of the MACE filter. In that case the average linear autocorrelation sequence is estimated over the target class and this estimator of E(x x^T) is used to solve for linear projection coefficients in the space domain. The resulting filter is referred to as the SMACE (space-domain MACE) filter.

As stated, our goal is to find mappings, defined by a topology and a parameter set, which improve upon the performance of the MACE filter in terms of generalization while maintaining a sharp constrained peak in the center of the output plane for images in the recognition class. One approach, which leads to an adaptive algorithm, is to approximate the original objective function of equation (12) with the modified objective function

J = (1 - \beta) E(g(\alpha, x)^2) + \beta [g(\alpha, x_o) - d^T][g(\alpha, x_o) - d^T]^T.   (16)

The principal advantage gained by using equation (16) over equation (12) is that we can solve adaptively for the parameters of the mapping function (assuming it is differentiable). The constraint equations, however, are no longer satisfied with equality over the training set.
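For reference, the relaxed criterion of equation (16) is what the adaptive procedure in the following sections minimizes. A minimal sketch, with function and argument names that are ours rather than the authors':

    import numpy as np

    def modified_objective(g, alpha, x_reject, x_target, d, beta):
        """Sketch of the relaxed criterion of equation (16).

        g        : mapping g(alpha, x) applied column-wise to its input matrix
        x_reject : samples of the rejection-class random vector (columns)
        x_target : recognition-class exemplars x_o (columns)
        d        : desired outputs at the constraint location
        beta     : trade-off in [0, 1] between output variance and constraint error"""
        y_reject = g(alpha, x_reject)            # responses to the rejection class
        y_target = g(alpha, x_target)            # responses at the constraint location
        variance_term = np.mean(y_reject ** 2)           # estimate of E[g(alpha, x)^2]
        constraint_term = np.sum((y_target - d) ** 2)    # squared constraint error
        return (1.0 - beta) * variance_term + beta * constraint_term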

Varying \beta in the range [0, 1] controls the degree to which the average response to the rejection class is emphasized versus the variance about the desired output over the recognition class. In (Réfrégier and Figue, 1991) an optimal criterion trade-off method is presented; the authors show that the convex combination over the set of criteria describes a performance bound for the linear mapping. Mahalanobis (Mahalanobis, Kumar et al, 1994) extends this idea to unconstrained linear correlation filters. Further investigation will be required in order to explore the relationship and performance of these linear filters relative to the nonlinear mappings we are currently studying.

As in the linear case, we can only estimate the expected variance of the output due to the random vector input and its associated gradient. If, as in the MACE (or SMACE) filter formulation, x is characterized by all 2-D circular (or linear) shifts of the recognition class away from the origin, then this term can be estimated with a sample average over the exemplars, x_o, for all such shifts. From an adaptive standpoint this leads to a gradient search method which trains exhaustively over the entire output plane. This becomes a computationally intensive problem for most nonlinear mappings. It is desirable, then, to find other equivalent characterizations of the rejection class which may alleviate the computational load without significantly impacting performance. This issue is addressed in later sections.

3.1 Architecture

A block diagram of the proposed nonlinear extension is shown in figure 2. In the pre-processor/LAM decomposition of the MACE filter, the LAM structure is replaced with a feed-forward multi-layer perceptron (MLP). The pre-processor remains a linear, shift-invariant pre-whitening transformation, R^{N_1 N_2 \times 1} \to R^{N_1 N_2 \times 1}, yielding a pre-whitened space-domain image. The MLP has two hidden layers, with N_1 N_2 nodes on the input layer corresponding to an input mask with N_1 \times N_2 support in the image domain. The first hidden layer can be implemented with two correlators followed by nonlinear elements. The outputs of these elements feed into four nodes on the second hidden layer, which nonlinearly combine the two features, followed by a single output node. The nonlinearity is the logistic function. Since the mapping is R^{N_1 N_2 \times 1} \to R, we must, of course, apply

the filter input mask to each location in the original input image in order to obtain an output image.

FIGURE 2. Experimental nonlinear MACE structure (input image, linear pre-processor A, and NL-MACE MLP producing a scalar output).

The specific architecture was chosen for several reasons. The linear MACE filter extracts the optimal feature over the design exemplars for a linear discriminant function. Any linear combination of additional linear features will yield an equivalent linear feature. This means that the MACE filter is the best linear feature extractor for the target exemplars; only a nonlinear system can improve on this design. The MLP structure has the advantage of providing an efficient means of nonlinearly combining an optimal linear feature with others. It is well known that a single-hidden-layer MLP can realize any smooth discriminant function of its inputs. If we view the output of each node in the first hidden layer as an extracted feature of the input image, then the second layer gives the capability of realizing any smooth discriminant function of the first hidden layer outputs.

This is illustrated in figure 3, where the linear outputs plus bias terms, f_1 + \theta_1 and f_2 + \theta_1, of the first hidden layer are the features of interest, and f(\cdot) is the nonlinear logistic function.

FIGURE 3. Division of the pre-processor/MLP into feature extraction and discriminant function stages.

The division of figure 3 will be useful in later analysis. If the performance of the linear MACE filter can be improved, the addition of a single feature should be sufficient to illustrate this improvement. It is for this reason that we set the number of nodes to two on the first hidden layer, although more hidden nodes may lead to even better performance. Finally, the MLP with backpropagation provides a simple means for adapting the NL-MACE, although a globally optimal solution is not guaranteed. The mapping function of the NL-MACE can be written

g(\alpha, x) = f(W_3 f(W_2 f(W_1 x + \theta_1) + \theta_2)), \quad \alpha = \{ W_1 \in R^{2 \times N_1 N_2},\; W_2 \in R^{4 \times 2},\; W_3 \in R^{1 \times 4},\; \theta_1, \theta_2 \}.   (17)

Implicit in equation (17) is that the terms \theta_1 and \theta_2 are constant bias matrices with the appropriate dimensionality. It is also assumed that if the argument to the nonlinear function f(\cdot) is a matrix then the nonlinearity is applied to each element of the matrix.

We can rewrite the linear input term, W_1 x, which is the only term with a dependency on the input image (reordered into a column vector), as

W_1 x = \begin{bmatrix} h_1^T x \\ \vdots \\ h_{N_{h1}}^T x \end{bmatrix} = \begin{bmatrix} f_1(x) \\ \vdots \\ f_{N_{h1}}(x) \end{bmatrix},   (18)

where N_{h1} is the number of hidden nodes in the first layer of the MLP (two in our specific case) and h_1^T, \dots, h_{N_{h1}}^T \in R^{1 \times N_1 N_2} are the rows of the matrix W_1. The elements of the result, \{f_1(x), \dots, f_{N_{h1}}(x)\}, are recognized as the outputs, in vector form (Kumar et al, 1988)(Mahalanobis et al, 1987), of N_{h1} purely real linear correlation filters operating in parallel; therefore the elements of this term are shift-invariant. Rewriting equation (17) as a function of its shift-invariant terms,

g(x) = f\left( W_3 f\left( W_2 f\left( [f_1(x) \cdots f_{N_{h1}}(x)]^T + \theta_1 \right) + \theta_2 \right) \right),   (19)

we can see that the output is a static function of shift-invariant input terms. Any shift in the input image will be reflected as a corresponding shift in the output image. The mapping is, therefore, shift invariant.
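A minimal sketch of the forward pass of equation (17), applied at one position of the pre-whitened input image; variable names and shapes follow the text, but the code itself is ours:

    import numpy as np

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    def nl_mace_output(x, W1, W2, W3, theta1, theta2):
        """One evaluation of g(x) = f(W3 f(W2 f(W1 x + theta1) + theta2)), equation (17).

        x      : pre-whitened filter mask re-ordered into a length N1*N2 vector
        W1     : (2, N1*N2) first-layer weights (two correlators)
        W2     : (4, 2) second hidden layer weights, W3 : (1, 4) output weights
        theta1 : (2,) and theta2 : (4,) bias vectors
        To form an output image this function is applied at every mask position."""
        f1 = logistic(W1 @ x + theta1)     # two shift-invariant linear features, squashed
        f2 = logistic(W2 @ f1 + theta2)    # four-node second hidden layer
        return logistic(W3 @ f2).item()    # scalar output for this mask position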

3.2 Avoiding Exhaustive Training

Training becomes an issue once the associative memory structure takes a nonlinear form. The output variance of the linear MACE filter is minimized for the entire output plane over the training exemplars. Even when the coefficients of the MACE filter are computed iteratively, we need only consider the output point at the designated peak location (constraint) for each pre-whitened training exemplar (Fisher and Principe, 1994). This is due to the fact that, for the under-determined case, the linear projection which satisfies the system of constraints with equality and has minimum norm is also the linear projection which minimizes the response to images with a flat power spectrum. This solution is arrived at naturally via a gradient search only at the constraint location.

This is no longer the case when the mapping is nonlinear. Adapting the parameters via gradient search on pre-whitened exemplars only at the constraint location will not, in general, minimize the variance in the output image. In order to minimize the variance over the entire output plane we must consider the response of the filter to each location in the input image, not just the constraint location. The brute force approach would be to adapt the parameters over the entire output plane, which would require N_1 N_2 N_t image presentations per training epoch.

If such exhaustive training is done, then the pre-whitening stage seems unnecessary; the pre-whitening stage and the input layer weights could be combined into a single equivalent linear transformation. Pre-whitening separately, however, enables us to greatly reduce the number of image presentations during training. This can be explained as follows: due to the statistical formulation, we are only reducing the response of the NL-MACE filter to images with the second-order statistics of the rejection class. If the exemplars have been pre-whitened, then the rejection class can be represented with random white images. Minimizing the response to these images, on average, minimizes the response to shifts of the exemplar images since they have the same second-order statistics. In this way we do not have to train over the entire output plane exhaustively, thereby reducing training times proportionally to the input image size, N_1 N_2. Experimentally, the difference in convergence time was approximately 2300 epochs of N_1 N_2 N_t image presentations for exhaustive training versus 1800 epochs of (N_t + 4) image presentations (training exemplars plus 4 white noise images) for noise training, with nearly the same performance in both cases. This is obviously a considerable speedup in training for even moderate image sizes. In both cases, the resulting filters exhibit improved performance over the linear MACE filter in terms of generalization and output variance.
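A sketch of how one epoch of this noise-training scheme might be assembled is given below; the batch construction, noise scaling, and desired output of zero for the noise images are our assumptions, since these details are not specified in the text.

    import numpy as np

    def make_training_epoch(prewhitened_exemplars, d, n_noise=4, rng=np.random):
        """One epoch's presentations for noise training (section 3.2): the Nt
        pre-whitened target exemplars (desired outputs d) plus a few white noise
        images standing in for the rejection class (desired output 0)."""
        Nt, N1, N2 = prewhitened_exemplars.shape
        noise = rng.standard_normal((n_noise, N1, N2))    # white images share the
        noise *= prewhitened_exemplars.std()              # rejection-class 2nd-order stats
        inputs = np.concatenate([prewhitened_exemplars, noise], axis=0)
        targets = np.concatenate([np.asarray(d, dtype=float), np.zeros(n_noise)])
        return inputs, targets                            # Nt + 4 presentations per epoch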

3.3 Linear Versus Nonlinear Discriminant Functions

Several observations were made during our experiments. It became apparent that linear solutions were a strong attractor. Examination of the input layer showed that the columns of W_1 were highly correlated. When this condition is true, although a nonlinear system is being used, the mapping from the image space to the feature space is confined to a narrow strip. The net result is that a mapping similar to the linear MACE filter could be achieved with a single node on the first hidden layer, and we have achieved a linear discriminant function with a complicated topology. Even if the resulting linear discriminant function yields better performance, there are much better and well documented methods for finding linear discriminant functions (Réfrégier and Figue, 1991)(Mahalanobis, Kumar et al, 1994).

In order to find, with the MLP, a nonlinear discriminant function of the image space, modifications were made to the adaptation procedure. The presumption here is that better performance (in terms of discrimination, localization, and generalization) can be achieved using a nonlinear discriminant function. It is certainly possible that in some input spaces the best discrimination can be achieved with a linear projection, but in a space as rich as the one in which we are working we believe that this will rarely be the case. The modification to the adaptation was to enforce orthogonality on the columns of the input layer weight matrix,

W_1 W_1^T = \begin{bmatrix} h_1^T h_1 & h_1^T h_2 \\ h_2^T h_1 & h_2^T h_2 \end{bmatrix} = \begin{bmatrix} \|h_1\|^2 & 0 \\ 0 & \|h_2\|^2 \end{bmatrix},

via Gram-Schmidt orthogonalization, where \{W_1, h_1, h_2\} are as in equations (17) and (18). This has two consequences. First, it guarantees that the mapping to the feature space is not rank-deficient, although it does not ensure that the discriminant function derived through gradient search will utilize the additional feature. The second consequence is that, assuming we have pre-whitened

input images over the rejection class, the extracted linear features will also be orthogonal, in the statistical sense, over the rejection class. Mathematically, this can be shown as follows:

E(W_1 x x^T W_1^T) = W_1 E(x x^T) W_1^T = \begin{bmatrix} h_1^T E(xx^T) h_1 & h_1^T E(xx^T) h_2 \\ h_2^T E(xx^T) h_1 & h_2^T E(xx^T) h_2 \end{bmatrix}.   (20)

As a consequence of the pre-whitening, the term E(x x^T) is of the form \sigma^2 I_{N_1 N_2}, where \sigma^2 is a scalar and I_{N_1 N_2} is the N_1 N_2 \times N_1 N_2 identity matrix. Substituting into equation (20) gives

E(W_1 x x^T W_1^T) = \begin{bmatrix} h_1^T (\sigma^2 I_{N_1 N_2}) h_1 & h_1^T (\sigma^2 I_{N_1 N_2}) h_2 \\ h_2^T (\sigma^2 I_{N_1 N_2}) h_1 & h_2^T (\sigma^2 I_{N_1 N_2}) h_2 \end{bmatrix} = \begin{bmatrix} \sigma^2 \|h_1\|^2 & 0 \\ 0 & \sigma^2 \|h_2\|^2 \end{bmatrix}.   (21)

It is fairly straightforward to show that any affine transformation of these features will also be uncorrelated. Since the MLP is nonlinearly combining orthogonal features, it will yield, in general, a nonlinear discriminant function.
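The orthogonalization step itself is a single Gram-Schmidt sweep over the two first-layer weight vectors, applied after each weight update. A minimal sketch; the row/column orientation of W_1 here is our assumption:

    import numpy as np

    def orthogonalize_first_layer(W1):
        """Gram-Schmidt step keeping the two first-layer weight vectors h1, h2
        orthogonal (section 3.3). W1 holds one weight vector per row here."""
        h1, h2 = W1[0].copy(), W1[1].copy()
        h2 = h2 - (np.dot(h1, h2) / np.dot(h1, h1)) * h1   # remove the h1 component from h2
        return np.stack([h1, h2], axis=0)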

4.0 Experimental Results

For these experiments we used vehicle data from the TABILS 24 ISAR data set. The radar used for the data collection is a fully polarimetric, Ka-band radar. The ISAR imagery was processed with a polarimetric whitening filter (PWF) (Novak et al, 1993) and then logarithmically scaled to units of dBsm (dB relative to one square meter) prior to being used for our experiments. The data used was collected at a depression angle of 20 degrees, that is, the radar antenna was directed 20 degrees down from the horizon. ISAR images were extracted in the range 5 to 85 degrees azimuth in increments of 0.8 degrees. This resulted in 100 ISAR images (50 training, 50 testing). Images within both the training and testing sets were separated by 1.6 degrees.

FIGURE 4. Examples of ISAR imagery. Down range increases from left to right. The target vehicle is shown at aspects of 5 (left), 45 (middle), and 85 (right) degrees.

4.1 Experiment 1

In the first experiment, straight backpropagation training was conducted with no modifications other than to weight the quadratic penalty term associated with the constraints in (16) by \beta = 0.93 and the output variance term by (1 - \beta) = 0.07. The coefficients converged to a solution after approximately 1200 runs through the entire training set. Examination of the input layer (the feature extracting layer) revealed that the coefficients associated with the first feature (first column of the matrix W_1) were highly correlated with the coefficients of the second feature. In effect the MLP converged to a linear discriminant function; at best, the MLP was equivalent to choosing a threshold for a linear filter.

The resulting discriminant function is illustrated in figure 5, which shows a contour plot of the discriminant function with respect to the linear outputs of the first hidden layer, f_1 and f_2 of figure 3. Although the discriminant function implemented is nonlinear, the features are so

highly correlated that all inputs are projected onto a single curve in the feature space. Further adaptation continued to increase the correlation of the features.

FIGURE 5. NL-MACE discriminant function with respect to the extracted feature mapping. The cluster in the lower left is the mapping of noise (asterisks) with the same second-order statistics as the rejection class. The cluster in the upper right is the mapping of testing (plus signs) and training (diamonds) exemplars. Since the features are highly correlated, inputs are mapped to a single curve in the feature space, and the overall filter is effectively a linear discriminant function of the input image.

4.2 Experiment 2

In light of the results of the first experiment (and several other experiments not described here for brevity), a modification was made to the training algorithm that yielded a nonlinear discriminant function. During training, orthogonality between the columns of the matrix W_1 was enforced via a Gram-Schmidt procedure at each training iteration. The approximate convergence time was nearly the same as in the first case, but the resulting discriminant function was no longer linear, indicating that the second feature was utilized by the filter. The new discriminant function is plotted in figure 6. The features are no longer correlated, so the target exemplars and noise (rejection

class) no longer lie on a single curve in the feature space. The resulting filter is utilizing the second feature and the discriminant is not equivalent to a linear discriminant function.

FIGURE 6. Comparison of the discriminant function with respect to the extracted feature mapping when orthogonality of the features is enforced. The cluster in the upper left is the mapping of noise (asterisks) with the same second-order statistics as the rejection class. The cluster in the lower right is the mapping of testing (plus signs) and training (diamonds) exemplars. The mapping is no longer confined to a single curve in the feature space and the discriminant function is a nonlinear function of both features.

4.3 Performance Comparison

At this point, we are satisfied that the nonlinear associative memory structure is doing more than applying a threshold to the linear discriminant function. We now compare the performance of the linear MACE filter to our nonlinear extension with orthogonal features.

FIGURE 7. Sample responses of the linear MACE filter (left) as compared to the output of the nonlinear filter (right) given the same input. The samples shown include one training exemplar (top) and the adjacent testing exemplar (bottom).

Sample responses of both filters, linear and nonlinear, are shown in figure 7. One training set exemplar and one testing set exemplar are shown for both the linear MACE filter and the nonlinear filter. It is evident from the figure that the nonlinear filter appears to reduce the variance in the output plane (correlation energy for the linear case) as compared to the linear filter while still maintaining a sharp peak near the center point. Recall that at no time during training were shifted exemplars presented to the network, although, as in the MACE filter, the projection must be

computed at all positions in the input image in order to compute the output image. This response was typical for all exemplars. Localized peak and low variance properties were retained.

FIGURE 8. Peak (center) response of the linear MACE filter (left) compared to the output of the nonlinear filter (right) over the entire training set (top) and testing set (bottom), plotted as a function of vehicle aspect angle.

In figure 8 we show the peak response of both the linear and nonlinear filters for both the training and testing sets. In the case of the training set for the linear filter the designed value is, of course, met exactly at the center point. The peak response over the training set always occurred at

the center point for the nonlinear filter. In order to determine the peak response for the testing set, for both the linear and nonlinear filters, we simply chose the peak response in the output plane. In all cases this point occurred within a 5 x 5 pixel area centered in the output plane, but was not necessarily the center point for the test set. It can be seen in the plot that the nonlinear filter appears to have better generalization properties over the testing set than the linear filter.

FIGURE 9. Filter output plane pdfs (excluding the 5 x 5 pixel center region), estimated over the testing exemplars, for the linear MACE (solid line) and the NL-MACE (dotted line).

In figure 9 we show the probability distribution of the output plane response estimated (via the Parzen window method) from the testing exemplars. The linear MACE filter clearly exhibits a more significant tail in the distribution than does the nonlinear filter.
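For completeness, a Parzen-window estimate of the output-plane pdf of figure 9 can be formed as sketched below; the Gaussian kernel width and evaluation grid are arbitrary choices of ours, not parameters reported by the authors.

    import numpy as np

    def output_plane_pdf(planes, sigma=0.05, n_grid=200):
        """Parzen-window (Gaussian kernel) pdf estimate of output-plane values,
        excluding a 5x5 region around the center of each plane (cf. figure 9)."""
        samples = []
        for p in planes:                              # p: one 2-D output plane per exemplar
            n1, n2 = p.shape
            mask = np.ones_like(p, dtype=bool)
            mask[n1 // 2 - 2:n1 // 2 + 3, n2 // 2 - 2:n2 // 2 + 3] = False  # drop 5x5 center
            samples.append(p[mask])
        samples = np.concatenate(samples)
        grid = np.linspace(samples.min(), samples.max(), n_grid)
        # average of Gaussian kernels centered on the samples (memory use grows with data size)
        diffs = (grid[:, None] - samples[None, :]) / sigma
        pdf = np.exp(-0.5 * diffs ** 2).mean(axis=1) / (sigma * np.sqrt(2.0 * np.pi))
        return grid, pdf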

5.0 Remarks and Conclusions

We have presented a method by which the MACE filter can be extended to nonlinear processing. Any extension of the MACE filter must consider the entire output image plane. In the case of the nonlinear extension to the MACE filter the output image plane can no longer be characterized by the average power spectrum over the recognition class, and any iterative method for computing its parameters might have to train exhaustively over the entire output plane. Using a statistical treatment, however, we were able to develop a training method that did not require exhaustive output plane training, which drastically reduced the convergence time of our training algorithm and gave improved performance. Our training algorithm requires the generation of a small number of random sequences with the same second-order statistics as our recognition class. Pre-whitening of the input exemplars played an important role in the training algorithm because the random sequences could then be any white noise sequences, which, as a practical matter, are less difficult to generate during training.

Our results also show that it is not enough to simply train a multi-layer perceptron using backpropagation, the black-box approach; careful analysis of the final solution is necessary to confirm reasonable results. In particular, the linear solution is a strong attractor and must be avoided, otherwise the solution would be equivalent (at best) to the linear MACE filter followed by a threshold. We used Gram-Schmidt orthogonalization on the input layer, which did result in a nonlinear discriminant function and improved performance. We are currently exploring other methods by which independent features will adapt naturally. In our experiments, better generalization and reduced variance in the output plane were demonstrated.

Our current interest is in the application of this filter structure to SAR imagery. We are in the process of testing with multiple targets in target-plus-clutter imagery and will report our results in the future. Future investigations will also explore the performance of, and relationships to, the class of unconstrained correlation filters of (Mahalanobis, Kumar et al, 1994).

6.0 References

Kumar, B. V. K. Vijaya (1992); Tutorial survey of composite filter designs for optical correlators, Appl. Opt. 31, no. 23.

Ravichandran, G., and D. Casasent (1992); Minimum noise and correlation energy filters, Appl. Opt. 31, no. 11.

Casasent, D., and G. Ravichandran (1992); Advanced distortion-invariant minimum average correlation energy (MACE) filters, Appl. Opt. 31, no. 8.

Casasent, D., G. Ravichandran, and S. Bollapragada (1991); Gaussian minimum average correlation energy filters, Appl. Opt. 30, no. 35.

Sudharsanan, S. I., A. Mahalanobis, and M. K. Sundareshan (1991); A unified framework for the synthesis of synthetic discriminant functions with reduced noise variance and sharp correlation structure, Appl. Opt. 30, no. 35.

Hester, C. F., and D. Casasent (1980); Multivariant technique for multiclass pattern recognition, Appl. Opt. 19.

Kumar, B. V. K. Vijaya, Z. Bahri, and A. Mahalanobis (1988); Constraint phase optimization in minimum variance synthetic discriminant functions, Appl. Opt. 27, no. 2.

Mahalanobis, A., B. V. K. Vijaya Kumar, and D. Casasent (1987); Minimum average correlation energy filters, Appl. Opt. 26, no. 17.

Kumar, B. V. K. Vijaya (1986); Minimum variance synthetic discriminant functions, J. Opt. Soc. Am. A 3, no. 10.

Kohonen, T. (1988); Self-Organization and Associative Memory (1st ed.); Springer Series in Information Sciences, vol. 8; Springer-Verlag.

Réfrégier, Ph., and J. Figue (1991); Optimal trade-off filters for pattern recognition and their comparison with the Wiener approach, Opt. Comp. Proc. 1.

Mahalanobis, A., B. V. K. Vijaya Kumar, Sewoong Song, S. R. F. Sims, and J. F. Epperson (1994); Unconstrained correlation filters, Appl. Opt. 33, no. 33.

Fisher, J., and J. C. Principe (1994); Formulation of the MACE Filter as a Linear Associative Memory, Proceedings of the IEEE International Conference on Neural Networks, Vol. 5.

Mahalanobis, A., A. V. Forman, N. Day, M. Bower, and R. Cherry (1994); Multi-class SAR ATR using shift-invariant correlation filters, Pattern Recognition 27, no. 4.

Novak, L. M., G. Owirka, and C. Netishen (1994); Radar target identification using spatial matched filters, Pattern Recognition 27, no. 4.

Hinton, G. E., and J. A. Anderson, Eds. (1981); Parallel Models of Associative Memory, Lawrence Erlbaum Associates.

Hertz, J., et al (1991); Introduction to the Theory of Neural Computation, Addison-Wesley Publishing Company.

Kung, S. Y. (1992); Digital Neural Networks, Prentice-Hall.

Amit, D. J. (1989); Modeling Brain Function: The World of Attractor Neural Networks, Cambridge University Press.

Novak, L. M., M. C. Burl, and W. W. Irving (1993); Optimal polarimetric processing for enhanced target detection, IEEE Transactions on Aerospace and Electronic Systems, Vol. 29.

Oppenheim, A. V., and R. W. Schafer (1989); Discrete-Time Signal Processing, Prentice-Hall.


Feed-forward Network Functions Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification

More information

Artificial Neural Networks (ANN)

Artificial Neural Networks (ANN) Artificial Neural Networks (ANN) Edmondo Trentin April 17, 2013 ANN: Definition The definition of ANN is given in 3.1 points. Indeed, an ANN is a machine that is completely specified once we define its:

More information

Maximum variance formulation

Maximum variance formulation 12.1. Principal Component Analysis 561 Figure 12.2 Principal component analysis seeks a space of lower dimensionality, known as the principal subspace and denoted by the magenta line, such that the orthogonal

More information

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler + Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions

More information

Deep Feedforward Networks. Sargur N. Srihari

Deep Feedforward Networks. Sargur N. Srihari Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation

More information

Multilayer Perceptrons (MLPs)

Multilayer Perceptrons (MLPs) CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1

More information

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.

More information

Neural Networks biological neuron artificial neuron 1

Neural Networks biological neuron artificial neuron 1 Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input

More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

Introduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen

Introduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen Neural Networks - I Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1 /

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Translation-invariant optical pattern recognition without correlation

Translation-invariant optical pattern recognition without correlation Translation-invariant optical pattern recognition without correlation Michael E. Lhamon, MEMBER SPIE Laurence G. Hassebrook, MEMBER SPIE University of Kentucky Department of Electrical Engineering 453

More information

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1

Lecture 2. G. Cowan Lectures on Statistical Data Analysis Lecture 2 page 1 Lecture 2 1 Probability (90 min.) Definition, Bayes theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests (90 min.) general concepts, test statistics,

More information

Statistical Geometry Processing Winter Semester 2011/2012

Statistical Geometry Processing Winter Semester 2011/2012 Statistical Geometry Processing Winter Semester 2011/2012 Linear Algebra, Function Spaces & Inverse Problems Vector and Function Spaces 3 Vectors vectors are arrows in space classically: 2 or 3 dim. Euclidian

More information

p(d θ ) l(θ ) 1.2 x x x

p(d θ ) l(θ ) 1.2 x x x p(d θ ).2 x 0-7 0.8 x 0-7 0.4 x 0-7 l(θ ) -20-40 -60-80 -00 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ ˆ 2 3 4 5 6 7 θ θ x FIGURE 3.. The top graph shows several training points in one dimension, known or assumed to

More information

Convolutional Associative Memory: FIR Filter Model of Synapse

Convolutional Associative Memory: FIR Filter Model of Synapse Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,

More information

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4 Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:

More information

Neural Networks (and Gradient Ascent Again)

Neural Networks (and Gradient Ascent Again) Neural Networks (and Gradient Ascent Again) Frank Wood April 27, 2010 Generalized Regression Until now we have focused on linear regression techniques. We generalized linear regression to include nonlinear

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Advanced statistical methods for data analysis Lecture 2

Advanced statistical methods for data analysis Lecture 2 Advanced statistical methods for data analysis Lecture 2 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

More information

MODULE -4 BAYEIAN LEARNING

MODULE -4 BAYEIAN LEARNING MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities

More information

Linear Model Selection and Regularization

Linear Model Selection and Regularization Linear Model Selection and Regularization Recall the linear model Y = β 0 + β 1 X 1 + + β p X p + ɛ. In the lectures that follow, we consider some approaches for extending the linear model framework. In

More information

An Error-Entropy Minimization Algorithm for Supervised Training of Nonlinear Adaptive Systems

An Error-Entropy Minimization Algorithm for Supervised Training of Nonlinear Adaptive Systems 1780 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 7, JULY 2002 An Error-Entropy Minimization Algorithm for Supervised Training of Nonlinear Adaptive Systems Deniz Erdogmus, Member, IEEE, and Jose

More information

Artificial Neural Networks

Artificial Neural Networks Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks

More information

Statistical Signal Processing Detection, Estimation, and Time Series Analysis

Statistical Signal Processing Detection, Estimation, and Time Series Analysis Statistical Signal Processing Detection, Estimation, and Time Series Analysis Louis L. Scharf University of Colorado at Boulder with Cedric Demeure collaborating on Chapters 10 and 11 A TT ADDISON-WESLEY

More information

Linear Models for Classification

Linear Models for Classification Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,

More information

Generalized Information Potential Criterion for Adaptive System Training

Generalized Information Potential Criterion for Adaptive System Training IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1035 Generalized Information Potential Criterion for Adaptive System Training Deniz Erdogmus, Student Member, IEEE, and Jose C. Principe,

More information

Recursive Least Squares for an Entropy Regularized MSE Cost Function

Recursive Least Squares for an Entropy Regularized MSE Cost Function Recursive Least Squares for an Entropy Regularized MSE Cost Function Deniz Erdogmus, Yadunandana N. Rao, Jose C. Principe Oscar Fontenla-Romero, Amparo Alonso-Betanzos Electrical Eng. Dept., University

More information

Speaker Representation and Verification Part II. by Vasileios Vasilakakis

Speaker Representation and Verification Part II. by Vasileios Vasilakakis Speaker Representation and Verification Part II by Vasileios Vasilakakis Outline -Approaches of Neural Networks in Speaker/Speech Recognition -Feed-Forward Neural Networks -Training with Back-propagation

More information

SPEECH ANALYSIS AND SYNTHESIS

SPEECH ANALYSIS AND SYNTHESIS 16 Chapter 2 SPEECH ANALYSIS AND SYNTHESIS 2.1 INTRODUCTION: Speech signal analysis is used to characterize the spectral information of an input speech signal. Speech signal analysis [52-53] techniques

More information

(Refer Slide Time: )

(Refer Slide Time: ) Digital Signal Processing Prof. S. C. Dutta Roy Department of Electrical Engineering Indian Institute of Technology, Delhi FIR Lattice Synthesis Lecture - 32 This is the 32nd lecture and our topic for

More information

Address for Correspondence

Address for Correspondence Research Article APPLICATION OF ARTIFICIAL NEURAL NETWORK FOR INTERFERENCE STUDIES OF LOW-RISE BUILDINGS 1 Narayan K*, 2 Gairola A Address for Correspondence 1 Associate Professor, Department of Civil

More information

Multilayer Neural Networks

Multilayer Neural Networks Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the

More information

y(n) Time Series Data

y(n) Time Series Data Recurrent SOM with Local Linear Models in Time Series Prediction Timo Koskela, Markus Varsta, Jukka Heikkonen, and Kimmo Kaski Helsinki University of Technology Laboratory of Computational Engineering

More information

Gradient-Based Learning. Sargur N. Srihari

Gradient-Based Learning. Sargur N. Srihari Gradient-Based Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation

More information

THIS paper studies the input design problem in system identification.

THIS paper studies the input design problem in system identification. 1534 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 10, OCTOBER 2005 Input Design Via LMIs Admitting Frequency-Wise Model Specifications in Confidence Regions Henrik Jansson Håkan Hjalmarsson, Member,

More information

CONTROL AND TOPOLOGICAL OPTIMIZATION OF A LARGE MULTIBEAM ARRAY ANTENNA

CONTROL AND TOPOLOGICAL OPTIMIZATION OF A LARGE MULTIBEAM ARRAY ANTENNA CONTROL AND TOPOLOGICAL OPTIMIZATION OF A LARGE MULTIBEAM ARRAY ANTENNA Thierry Touya and Didier Auroux Mathematics Institute of Toulouse University of Toulouse Paul Sabatier 31062 Toulouse cedex 9 France

More information

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18 CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$

More information

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions

Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.

More information

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting

More information

Expressions for the covariance matrix of covariance data

Expressions for the covariance matrix of covariance data Expressions for the covariance matrix of covariance data Torsten Söderström Division of Systems and Control, Department of Information Technology, Uppsala University, P O Box 337, SE-7505 Uppsala, Sweden

More information

Small sample size generalization

Small sample size generalization 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, Preprint Small sample size generalization Robert P.W. Duin Pattern Recognition Group, Faculty of Applied Physics Delft University

More information

Neural Network Based Response Surface Methods a Comparative Study

Neural Network Based Response Surface Methods a Comparative Study . LS-DYNA Anwenderforum, Ulm Robustheit / Optimierung II Neural Network Based Response Surface Methods a Comparative Study Wolfram Beyer, Martin Liebscher, Michael Beer, Wolfgang Graf TU Dresden, Germany

More information

EIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis

EIGENFILTERS FOR SIGNAL CANCELLATION. Sunil Bharitkar and Chris Kyriakakis EIGENFILTERS FOR SIGNAL CANCELLATION Sunil Bharitkar and Chris Kyriakakis Immersive Audio Laboratory University of Southern California Los Angeles. CA 9. USA Phone:+1-13-7- Fax:+1-13-7-51, Email:ckyriak@imsc.edu.edu,bharitka@sipi.usc.edu

More information

On Information Maximization and Blind Signal Deconvolution

On Information Maximization and Blind Signal Deconvolution On Information Maximization and Blind Signal Deconvolution A Röbel Technical University of Berlin, Institute of Communication Sciences email: roebel@kgwtu-berlinde Abstract: In the following paper we investigate

More information

Regularization in Neural Networks

Regularization in Neural Networks Regularization in Neural Networks Sargur Srihari 1 Topics in Neural Network Regularization What is regularization? Methods 1. Determining optimal number of hidden units 2. Use of regularizer in error function

More information

Structural Damage Detection Using Time Windowing Technique from Measured Acceleration during Earthquake

Structural Damage Detection Using Time Windowing Technique from Measured Acceleration during Earthquake Structural Damage Detection Using Time Windowing Technique from Measured Acceleration during Earthquake Seung Keun Park and Hae Sung Lee ABSTRACT This paper presents a system identification (SI) scheme

More information

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier

A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier A Modified Incremental Principal Component Analysis for On-Line Learning of Feature Space and Classifier Seiichi Ozawa 1, Shaoning Pang 2, and Nikola Kasabov 2 1 Graduate School of Science and Technology,

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South

More information