Translation-invariant optical pattern recognition without correlation


Michael E. Lhamon, MEMBER SPIE
Laurence G. Hassebrook, MEMBER SPIE
University of Kentucky
Department of Electrical Engineering
453 Anderson Hall
Lexington, Kentucky

Abstract. Most optical pattern recognition techniques rely on correlation, which inherently achieves translation invariance. We introduce a significantly different formulation for image recognition in which a set of inner product operators is used to achieve translation-invariant pattern recognition. Our formulation extends the distortion-invariant linear phase coefficient composite filter family, developed by Hassebrook et al., into a set of translation-invariant inner product operators. Translation invariance is achieved by treating 2-D translation as distortion. The magnitudes of the inner product operations are insensitive to translation, whereas the phase responses vary with translation and are discarded. For large images containing many objects, this method can be applied by tiling the 2-D operators to the test image size, elementwise multiplying by the test image, and then convolving with a binary rectangular window. Numerical efficiency exceeding that of fast Fourier transform-based techniques is attained by the inner product operator approach. Examples of our approach, distortion-invariant detection and discrimination capabilities, idealized optical implementation, and comparison with conventional matched filters are presented. © 1996 Society of Photo-Optical Instrumentation Engineers.

Subject terms: filter banks; inner-product filters; harmonic expansion; distortion-invariant filters; composite filters; synthetic discriminant filter; matched filter; correlation.

Paper received Sep. 23, 1995; revised manuscript received Mar. 11, 1996; accepted for publication Mar. 13, 1996.

1 Introduction

We introduce translation-invariant target detection with vector inner product (VIP) operations rather than correlation.
As with more conventional synthetic discriminant function (SDF) analysis [1], we lexicographically convert 2-D images into 1-D vectors. We therefore refer to both 1-D and 2-D operations as vector inner products (VIPs). An inherent outcome of the conventional correlation process is translation invariance. However, this translation invariance requires the correlation filter impulse response to be incrementally shifted, multiplied, and integrated with the input image. This incrementally repeated operation is numerically intensive when implemented electronically.

There are two well-known techniques for performing correlation that are more efficient than direct correlation with the impulse response. One technique is optical correlation, which makes use of the Fourier transform capability of lenses. However, for programmable applications, the spatial light modulators are limited to phase-only operation. The VIP technique offers an alternative approach with a quite different optical architecture. Although the emphasis of this research is the mathematical aspects of VIP operator design, we present two ideal optical architectures that are potentially suitable for VIP operator implementation.

The most common approach to correlation is digital implementation with the fast Fourier transform (FFT). However, this approach has a complicated processing architecture and requires $O(M^2 \log M)$ operations for $M \times M$ images. Performing VIP operations is considerably simpler on general-purpose machines and ideal for most DSP architectures, since the operator image is not shifted. Correlation is avoided by embedding translation invariance into the VIP operation. A binary window is still correlated with the result, but this step is considerably more efficient because it does not require multiplication. For large scenes, the VIP technique requires only $O(M^2)$ operations.
In the VIP operator design process, we embed translation invariance in the same way we would embed rotation invariance into distortion-invariant SDF filter design. That is, translation is included in the training set, where the training set is representative of the object distortion. However, unlike SDF design, VIP operator design does not require matrix inversion. This is possible because circular shift, or wraparound translation, is a periodic distortion. An object having periodic distortion can be represented as a discrete harmonic expansion. A discrete form of a harmonic expansion is a weighted combination of training images in which the images are representative of the periodic distortion. The resulting expansion yields a set of harmonic components in the form of vectors that we call VIP operators. For this particular type of distortion, the VIP vectors are similar to Fourier vectors.

Opt. Eng. 35(9) (September 1996). Society of Photo-Optical Instrumentation Engineers.

Optical correlation has been the primary motivation for 2-D correlation filter design. The VIP operations introduced here represent a merging of several trains of thought, which include SDF design, harmonic expansions, and filter bank design. Hester and Casasent originated the synthetic discriminant function approach [1] to distortion-invariant filtering, which has spawned considerable research in distortion-invariant optical correlation. The output correlation responses to the training images are constrained to a fixed value, independent of distortion. Constraining the output correlation has become a dominant approach [2-5] in distortion-invariant filter design.

Another approach to correlation filter design, developed by Arsenault and co-workers [6], was to represent the images, distorted by in-plane rotation, in polar coordinates. Images represented in polar coordinates have periodic components, which are known as circular harmonic components. Fourier series expansion of an image in polar coordinates is known as a circular harmonic expansion. Schils and Sweeney introduced a general harmonic expansion and presented a training set selection technique [7] that selected the training images as the object distortion varied. For example, the training images are created as a 3-D object is gradually rotated in space. This method presented the idea of ordering the training set selection to correspond to the actual distortion variations.

Hassebrook, Kumar, and Hostetler [8] presented the idea of mathematically incorporating the filter design with training set selection. This filter design concept is known as linear phase coefficient composite (LPCC) filters. It was recognized early on that LPCC filters are a discrete form of a harmonic expansion. In this matrix form, special eigenvalue and eigenvector relationships come about from the periodic distortion. By incorporating translation as the periodic distortion of the training set, the LPCC filter design yields a family of VIP operators.

Arsenault and Hsu [9], Schils and Sweeney [7], Wu and Stark [10], and others formulated harmonic expansion into optimal filter bank combinations, which inspired similar filter bank architectures in LPCC filter systems. The advantage of these filter banks is that the performance tends to improve with the number of filters used in the bank.
Thus for simpler problems, a single filter could be used for detection and discrimination. More filters could be added to the filter bank as the difficulty of the problem increased. Filter families therefore allow the designer to isolate and use only the best filters for a particular application. These filter bank concepts can also be applied to the family of VIP operators. From this family of operators, we successively select the best operators for detection and discrimination performance, and use them in a VIP operator bank architecture.

The basic mathematics of VIP operators is introduced in Sec. 2 with a 1-D signal format. Although VIP operator design is based on circular shift, resulting in a Fourier type of expansion, it is not limited to input signals that are the same length as the operators. In Sec. 2, a tiling technique, analogous to zero padding, is introduced that allows the processing of arbitrary-sized input signals. A binary window is convolved with the larger images, but because the window is binary and rectangular, the number of operations is proportional to the window's area and does not involve multiplications. Translation in 2-D is effectively a two-parameter distortion, which requires a more sophisticated mathematical structure [11] than the 1-D, single-parameter case. We introduce these enhancements in Sec. 3 and describe digital and optoelectronic implementation of VIP operators in Sec. 4. Numerical results are presented in Sec. 5 using well-known performance measures. VIP operators can be implemented digitally as well as optoelectronically, and they have high numerical efficiencies. The numerical performance is detailed in Sec. 6 and conclusions are given in Sec. 7.

2 Translation-Invariant 1-D Operator

We use a 1-D signal processing problem to introduce the idea of embedding a periodic distortion into the phase response of a VIP operator. Periodic distortion in this case is circular shift of a finite-length, 1-D signal sequence.
A family of VIP operators is found, and only a few from the family are selected for inclusion in a VIP operator bank. The selection of operators and the order of their use are based on their discrimination ability as well as their output SNR. Performance metrics for discrimination and output SNR are presented and derived, respectively.

2.1 VIP Operator Design

We limit our sample sequence length to $N$ elements, which represent the 1-D target signal as an $N \times 1$ vector $x_t$. The vector $x_t$ is normalized so that $x_t^T x_t = 1$. We define $N$ unique $N \times 1$ shifted training vectors as

$$x_{t,n}[m] = x_t[m - n], \qquad (1)$$

where the image vector has the circular shift property $x_t[m] = x_t[m + N]$, $m = 0, 1, \ldots, N-1$ and $n = 0, 1, \ldots, N-1$. The element mapping of $x_t$ to $x_{t,n}$ is a circular shift, so the elements of $x_t$ are wrapped around and mapped to $x_{t,n}$. We form a nontarget or clutter class of training vectors in the same way, so that $x_{c,n}[m] = x_c[m - n]$ and $x_c^T x_c = 1$. In practice, the VIP system is not intensity invariant and it is assumed that the input signals are normalized. However, if test signals are not normalized, then related algorithms [12] for intensity invariance may be used to compensate. Signal energy can also be used to enhance discrimination between classes but is not a subject of the present research.

Because of the normalization and circular shift property, the training set forms a cyclic Toeplitz (i.e., circulant) and symmetric correlation matrix

$$R_{tt} = X_t^T X_t, \qquad (2)$$

where the $N \times N$ training set matrix is $X_t = [x_{t,0} \; x_{t,1} \; \cdots \; x_{t,N-1}]$ and the main diagonal of the matrix in Eq. 2 is unity. The matrix $R_{tt}$ is symmetric because its $(m,n)$ element is the inner product of the $m$th and $n$th target training vectors, which is identical to the $(n,m)$ element value. It is circulant because its $(m,n)$ element $x_{t,m}^T x_{t,n}$ is equivalent to the circular correlation of two identical sequences, which can be shown to depend only on the absolute periodic difference between $m$ and $n$.

Because of the circulant property [13] of $R_{tt}$, we know the eigenvectors are Fourier vectors and are given by

$$\phi_k[n] = \exp\!\left(j \frac{2\pi k n}{N}\right), \qquad (3)$$

where $n = 0, 1, \ldots, N-1$ and $k = 0, 1, \ldots, N-1$. We form a family of VIP operators as

$$h_k^* = \sum_{n=0}^{N-1} x_{t,n} \exp\!\left(j \frac{2\pi k n}{N}\right) = X_t \phi_k, \qquad (4)$$

where $k = 0, 1, \ldots, N-1$, $\phi_k$ is the $k$th Fourier vector with elements $\exp(j 2\pi k n / N)$, and $*$ indicates complex conjugation. The response of the $k$th VIP operator is known [8] to be

$$x_{t,n}^T h_k^* = \lambda_{tt,k} \exp\!\left(j \frac{2\pi k n}{N}\right), \qquad (5)$$

where $\lambda_{tt,k}$ is a positive real eigenvalue because $R_{tt}$ is circulant and symmetric. The subscript $tt$ denotes a target input. A clutter, or nontarget, response has the form

$$x_{c,n}^T h_k^* = \lambda_{ct,k} \exp(j \theta_{ct,k}) \exp\!\left(j \frac{2\pi k n}{N}\right), \qquad (6)$$

where $\lambda_{ct,k}$ is a positive real scalar and $\theta_{ct,k}$ is a constant phase offset. This result occurs because the clutter to target VIP matrix, discussed later, is circulant but not symmetric. In this case, it is well known [13] that the eigenvectors are Fourier vectors, as in the case of the training set matrix, but the eigenvalues are complex. The magnitude of the response, $\lambda_{ct,k}$ in Eq. 6, generally differs from $\lambda_{tt,k}$ in Eq. 5, thus allowing discrimination of target and clutter independent of translation. However, location capability is lost in Eq. 6. In Sec. 2.4 we present a method for efficiently reincorporating location capability. The translation-independent magnitude is a significant aspect of this method because we do not need to perform correlation to attain translation invariance. However, to achieve a specific level of performance, we may need to repeat the VIP with other operators in the family. The choice of operator vector and optimum bank architecture requires knowledge of both the eigenvectors and eigenvalues.

The VIP operator design equations are obtained by first augmenting all inputs to Eq. 5, so that

$$X_t^T h_k^* = X_t^T X_t \phi_k = R_{tt} \phi_k = \lambda_{tt,k} \phi_k. \qquad (7)$$

The responses shown in Eq. 7 satisfy the eigenvector relationship of $R_{tt}$, where $\lambda_{tt,k}$ are the eigenvalues for $k = 0, 1, \ldots, N-1$. If the distortion is periodic, that is, the correlation matrix is cyclic Toeplitz, we know the eigenvector values, and to find the eigenvalues we augment all the eigenvectors to Eq. 7 so that

$$R_{tt} \Phi = \Phi \Lambda_{tt}, \qquad (8)$$

where the $N \times N$ discrete Fourier transform (DFT) matrix is $\Phi = [\phi_0 \; \phi_1 \; \cdots \; \phi_{N-1}]$ and $\Lambda_{tt} = \mathrm{diag}(\lambda_{tt,0}, \lambda_{tt,1}, \ldots, \lambda_{tt,N-1})$.
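The design equations above can be checked numerically. The following is a minimal NumPy sketch, not the authors' code: the random test signal, the variable names, and the use of `np.roll` for the circular shift (taking the shift direction as $x_{t,n}[m] = x_t[m-n]$) are assumptions made for illustration. It builds the shifted training set, forms the operators of Eq. 4, and confirms that the response magnitudes do not depend on the shift while the phases follow the Fourier vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8

# Hypothetical 1-D target signal, normalized so that x_t^T x_t = 1
x_t = rng.random(N)
x_t /= np.linalg.norm(x_t)

# Training set matrix: column n is x_t circularly shifted by n samples
X_t = np.stack([np.roll(x_t, n) for n in range(N)], axis=1)
R_tt = X_t.T @ X_t                     # correlation (VIP) matrix of Eq. 2

# Fourier vectors as the columns of Phi, and the operator family of Eq. 4
idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)
H_conj = X_t @ Phi                     # column k holds h_k^*

# Y[n, k] is the response of operator k to the target shifted by n
Y = X_t.T @ H_conj

print(np.round(np.abs(Y[0]), 4))       # magnitudes for shift 0
print(np.round(np.abs(Y[3]), 4))       # same magnitudes for shift 3
```

Every row of `np.abs(Y)` is identical, which is the translation invariance used in the rest of the paper; only the phase of `Y[n, k]` changes with the shift `n`.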
Similarly, the clutter to target VIP matrix is formed by $R_{ct} = X_c^T X_t$. As in the case of the training set VIP matrix, the clutter to target VIP matrix is circulant [13] because the element values correspond to circular correlation of two sequences. The result depends only on the relative shift between the two sequences and not on the absolute shift. However, unlike the training set VIP matrix case, the clutter to target VIP matrix element values may depend on the direction of the shift difference and not just its absolute value. Thus the clutter to target VIP matrix is not necessarily symmetric. Since $\Phi^{-1} = (1/N)\Phi^*$ and $\Phi^T = \Phi$, we may solve for the eigenvalue matrix of the target as

$$\Lambda_{tt} = \frac{1}{N} \Phi^* R_{tt} \Phi. \qquad (9)$$

For clutter,

$$\Lambda_{ct} = \frac{1}{N} \Phi^* R_{ct} \Phi, \qquad (10)$$

where $\lambda_{ct,k} = |\Lambda_{ct}[k,k]|$ for $k = 0, 1, \ldots, N-1$. The magnitude operation is required for $\lambda_{ct,k}$ because the eigenvalue may be complex, as indicated by $\theta_{ct,k}$ in Eq. 6. These complex clutter eigenvalues are due to $R_{ct}$ being circulant but not necessarily symmetric. Individual VIP operator test responses are implemented by

$$y_k = x^T h_k^*, \qquad (11)$$

where $x$ is the test signal vector.

2.2 VIP Operator Bank Architecture

Several VIP operator outputs can be combined into a bank of operators. We use, as is, an optimized maximum likelihood filter bank architecture developed by Hassebrook et al. in Refs. 8 and 14 for LPCC and hybrid composite correlation filters, respectively. This architecture is valid for VIP operators because its development is based on the origin response of correlation filters for an additive white Gaussian noise model. The origin response analysis is equivalent to VIP operations. The design begins by selecting the most desirable operators. The order in which the selected operators are included in a bank is determined so as to improve discrimination as well as SNR capability.
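As a numerical companion to Eqs. 9 and 10, the sketch below computes the target and clutter eigenvalues with the DFT matrix. It is an illustration under stated assumptions rather than the authors' implementation: the random target and clutter signals and the helper name `shift_matrix` are hypothetical.

```python
import numpy as np

def shift_matrix(x):
    """Unit-energy signal and its N circular shifts as matrix columns."""
    x = x / np.linalg.norm(x)
    return np.stack([np.roll(x, n) for n in range(len(x))], axis=1)

rng = np.random.default_rng(1)
N = 8
X_t = shift_matrix(rng.random(N))      # hypothetical target training set
X_c = shift_matrix(rng.random(N))      # hypothetical clutter training set

idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)

R_tt = X_t.T @ X_t                     # circulant and symmetric
R_ct = X_c.T @ X_t                     # circulant, generally not symmetric

Lam_tt = Phi.conj() @ R_tt @ Phi / N   # Eq. 9
Lam_ct = Phi.conj() @ R_ct @ Phi / N   # Eq. 10

lam_tt = np.diag(Lam_tt).real          # real target eigenvalues
lam_ct = np.abs(np.diag(Lam_ct))       # magnitudes of complex clutter eigenvalues

print(np.round(lam_tt, 4))
print(np.round(lam_ct, 4))
```

Both `Lam_tt` and `Lam_ct` come out diagonal, and the target eigenvalues sum to `N` because the main diagonal of `R_tt` is unity.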
The output of the VIP operator bank is given as

$$y_{bank} = \sum_{k=1}^{K} b_k \left| x_{test}^T h_k^* \right| = \sum_{k=1}^{K} b_k |y_k|, \qquad (12)$$

where $b_k$ is a real positive weighting coefficient and $K$ is the number of operators included in the bank. The magnitude operations in Eq. 12 remove the effects of the distortion, while the summation forms a weighted average of the magnitude responses, thereby reducing the effects of spurious noise. The optimized coefficients [8] $b_k$ associated with Eq. 12 are

$$b_k = \frac{\lambda_{tt,k} - \lambda_{ct,k}}{\lambda_{tt,k}}. \qquad (13)$$

Each orthogonal operator provides unique information about the target. Not all of the VIP operators are equal in discrimination ability; some are better than others. We use the difference between target and clutter response to form a metric called the discrimination-to-noise ratio (DNR) to measure the discrimination ability of each operator. The DNR has been derived and discussed elsewhere [8,15] as

$$\mathrm{DNR}_k = \mathrm{sgn}(\lambda_{tt,k} - \lambda_{ct,k}) \, \frac{(\lambda_{tt,k} - \lambda_{ct,k})^2 \, d}{\lambda_{tt,k} \, N}. \qquad (14)$$

Including all of the operators may not yield the greatest discrimination and would be numerically inefficient, so we use the DNR measure to select which operators to include. When selecting the operator coefficients, we do not consider the operator vector $h_0^*$ because of its large dc response. Also, about half the VIP operators are complex conjugates of the other half and are redundant in magnitude response. We also do not use operators that yield negative $\mathrm{DNR}_k$ values, because in this case clutter responses are greater than target responses.

2.3 Reorganization of the Operators for Improved SNR

The output of a VIP operator, as shown later, is proportional to the number of target signal elements divided by the number of training vectors. For the periodic translation distortion problem considered in this research, the number of signal elements equals the number of training vectors, thereby limiting the output SNR of the individual VIP operators. To understand these limitations fully, it is necessary to derive the output SNR of the individual VIP operators.

In deriving the output SNR, we assume the noise is additive and stationary, i.e., $s_n = x_{t,n} + \eta$, where $s_n$ is the sum of a $d \times 1$ target vector and a $d \times 1$ noise vector $\eta$. The noise vector has independent, identically distributed, zero-mean Gaussian elements with variance $\sigma^2$. The energy of the noise vector is $E\{\eta^T \eta\} = \sigma^2 d$ and the covariance matrix of the noise is $E\{\eta \eta^T\} = \sigma^2 I_d$, where $I_d$ is a $d \times d$ identity matrix. Using Eq. 5, the VIP operator response to the target signal is

$$y_{n,k} = x_{t,n}^T h_k^* = \lambda_{tt,k} \exp\!\left(j \frac{2\pi k n}{N}\right), \qquad (15)$$

and the target signal energy is $|y_{n,k}|^2 = \lambda_{tt,k}^2$. The noise energy is defined as

$$E\{|\tilde{y}_{\eta,k}|^2\} = E\{h_k^T \eta \eta^T h_k^*\} = \sigma^2 h_k^T h_k^*. \qquad (16)$$

Substituting Eq. 4 into Eq. 16 yields

$$E\{|\tilde{y}_{\eta,k}|^2\} = \sigma^2 \phi_k^H X_t^T X_t \phi_k = \sigma^2 \phi_k^H \lambda_{tt,k} \phi_k = \sigma^2 \lambda_{tt,k} N, \qquad (17)$$

where the superscript $H$ indicates conjugate transpose.
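Looking back at the weighting and selection rules of Eqs. 12 through 14, the procedure can be sketched as follows. This is a hedged illustration, not the authors' code: the two 8-sample signals are hypothetical stand-ins for a target and a clutter class, the names `dnr`, `selected`, and `bank_output` are invented here, and only indices $k = 1, \ldots, N/2$ are considered because $h_0^*$ is excluded and the remaining operators are complex conjugates of these.

```python
import numpy as np

N = d = 8
x_t = np.array([1, 1, 0, 0, 0, 0, 0, 0], dtype=float)   # hypothetical target
x_c = np.array([1, 0, 0, 0, 1, 0, 0, 0], dtype=float)   # hypothetical clutter
x_t /= np.linalg.norm(x_t)
x_c /= np.linalg.norm(x_c)

def shifts(x):
    """All circular shifts of x as matrix columns."""
    return np.stack([np.roll(x, n) for n in range(len(x))], axis=1)

X_t, X_c = shifts(x_t), shifts(x_c)
idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)
H_conj = X_t @ Phi                                       # column k is h_k^*

lam_tt = np.diag(Phi.conj() @ (X_t.T @ X_t) @ Phi).real / N
lam_ct = np.abs(np.diag(Phi.conj() @ (X_c.T @ X_t) @ Phi)) / N

# Weights (Eq. 13) and DNR (Eq. 14); operators with a vanishing target
# eigenvalue are skipped to avoid dividing by zero.
diff = lam_tt - lam_ct
valid = lam_tt > 1e-12
b = np.zeros(N)
dnr = np.full(N, -np.inf)
b[valid] = diff[valid] / lam_tt[valid]
dnr[valid] = np.sign(diff[valid]) * diff[valid] ** 2 * d / (lam_tt[valid] * N)

# Keep the K best operators with positive DNR among k = 1 .. N/2
K = 2
candidates = [k for k in range(1, N // 2 + 1) if dnr[k] > 0]
selected = sorted(candidates, key=lambda k: dnr[k], reverse=True)[:K]

def bank_output(x):
    """Weighted sum of operator magnitude responses (Eq. 12)."""
    return sum(b[k] * abs(x @ H_conj[:, k]) for k in selected)

print(selected)
print(bank_output(np.roll(x_t, 3)), bank_output(np.roll(x_c, 3)))
```

For these particular signals the clutter has no energy at the selected harmonics, so the bank responds to every shift of the target and stays near zero for every shift of the clutter; real image classes would give a smaller margin.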
The output SNR for one VIP operator is defined as

$$\mathrm{SNR}_o(k) = \frac{\lambda_{tt,k}^2}{\sigma^2 \lambda_{tt,k} N} = \frac{\lambda_{tt,k} \, d}{N} \, \mathrm{SNR}_i, \qquad (18)$$

where the input SNR is

$$\mathrm{SNR}_i = \frac{x_{t,n}^T x_{t,n}}{\sigma^2 d}. \qquad (19)$$

As seen in Eq. 18, the output SNR is proportional to the target eigenvalues. As mentioned earlier, the VIP operator output SNR is limited by $d/N$.

Operators are first selected by the $K$ highest DNR values. Out of this subset, the operators may be incorporated in the bank in descending order of DNR. No further sorting is necessary if all $K$ operators are used. However, a secondary sort of the $K$ operators based on the output SNR improves the intermediate SNR performance for noisy signals. That is, a subset of the $K$ operators is selected in order of descending output SNR, which corresponds to decreasing $\lambda_{tt,k}$ values. All numerical simulations in this research are based on this selection and ordering procedure.

2.4 Linear Correlation Equivalence of VIP Operators

By themselves, VIP operators only detect a target's presence, not its location. We solve this problem with a tiling technique that accommodates a convolution of large test images with a binary window. In practical applications, the test signals may be relatively large, with a relatively small target located somewhere within them. Correlation filters are commonly designed by incorporating signals or images of about the same size as the target signal. These filters are typically designed for circular correlation. In practice they are extended by zero padding to perform an equivalent linear correlation. VIP operators can also be generated from zero-padded training sets, but this increases the size of the matrices in the design Eqs. 7 through 10. In contrast to zero padding, the same goal is accomplished with VIP operators by abutting, or tiling, the VIP operators together. The resulting vector is elementwise multiplied by the test vector. Instead of a direct sum being performed, we perform a moving average with a window size equal to that of the original VIP operators.
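The tiling and moving-window sum just described can be sketched in a few lines. This is an illustrative sketch under assumptions, not the authors' implementation: the random target, the scene length of three tiles, the single operator index, and the helper names `windowed_response` and `scene` are all hypothetical. The moving sum is formed from differences of a running sum, so the window itself needs only additions and subtractions.

```python
import numpy as np

N, M = 8, 3            # operator length and number of tiles
L = M * N              # test signal length
rng = np.random.default_rng(5)

x_t = rng.random(N)    # hypothetical target, unit energy
x_t /= np.linalg.norm(x_t)

X_t = np.stack([np.roll(x_t, n) for n in range(N)], axis=1)
idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)
H_conj = X_t @ Phi                      # column k is h_k^*
lam_tt = np.abs(x_t @ H_conj)           # target eigenvalues

def windowed_response(f, k):
    """Magnitude of the N-sample circular moving sum of f times the tiled operator."""
    g_conj = np.tile(H_conj[:, k], M)             # abut M copies of h_k^*
    prod = f * g_conj                             # elementwise product
    ext = np.concatenate([prod, prod[:N - 1]])    # wraparound samples
    c = np.concatenate([[0], np.cumsum(ext)])     # running sum
    return np.abs(c[N:N + L] - c[:L])             # window sums by subtraction

def scene(p):
    """Zero signal of length L with the target starting at sample p."""
    f = np.zeros(L)
    f[p:p + N] = x_t
    return f

for p in (8, 13):
    y = windowed_response(scene(p), k=2)
    print(p, round(float(y[p]), 4), round(float(lam_tt[2]), 4))
```

At the window aligned with the target, the magnitude equals the target eigenvalue whether or not the target sits on a tile boundary, because the tiled operator seen through that window is a circularly shifted copy of $h_k^*$.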
In this regard there is technically a correlation with a rectangular window, but numerically this operation does not require multiplication. That is, only addition and subtraction operations on the order of the test vector size are required. Furthermore, the window size is independent of the specific target type and dependent only on the original VIP operator size. Thus one window type will function for all similarly sized signals. By tiling the operators, long sequences containing target signals can be processed without increasing the operator design complexity. In addition, with tiling a target is not only detected but also located within the scene.

For example, assume we want to perform a VIP operation on an $L = MN$ long test signal, containing an $N$ long target signal, with an $N \times 1$ element VIP operator $h_k$, where $M$ is an arbitrary integer value. First we generate an $L \times 1$ vector $g_k$ by tiling, or augmenting, $h_k$ $M$ times so that

$$g_k = [h_k^T \; h_k^T \; h_k^T \; \cdots]^T. \qquad (20)$$

An $L \times 1$ test vector $f_{test}$ is formed by using a training set vector and zero padding it to the length $L$ so that

$$f_{test} = [0^T \; x_t^T \; 0^T \; 0^T \; \cdots]^T, \qquad (21)$$

where $0$ is an $N \times 1$ zero vector. Although in Eq. 21 the target vector starts at element $N$, the exact position is arbitrary to the algorithm, and we have verified this in results not shown here. The $i$th output element of the output sequence is determined by

$$y_{k,n}[i] = \left( f_{test}[i] \, g_k^*[i] \right) \circledast \mathrm{rect}\!\left(\frac{i}{N}\right), \qquad (22)$$

where $\circledast$ is circular convolution over the $MN$ long sequence. Each VIP operator magnitude response $|y_{k,n}[i]|$ is weighted by $b_k$ and summed with the other operator outputs to form the final output of the operator bank. Our results, given in Sec. 5, show that the output forms a response indicating the presence and location of the target signal.

3 Translation-Invariant 2-D Operator

We have described how to formulate VIP operators to achieve 1-D signal detection. With a few modifications, the 1-D algorithm is extended to include multiparameter distortion, that is, 2-D translation, for 2-D images. The target training set for the 2-D VIP operator yields a correlation matrix that is no longer cyclic Toeplitz but is block circulant with circulant blocks (BCCB) [13]. The required modifications are the extended training set matrix and the introduction of the extended eigenvector matrix.

3.1 2-D Images with 2-Translation Distortions

Consider an image with row and column translations as distortions. The training set must be representative of these types of distortions and must also create a BCCB correlation matrix in order to have the special eigenvectors necessary for a distortion-invariant magnitude response. The training set is created by circularly shifting the input training image by rows and then by columns. Thus an $N \times N$ square image would have $N^2$ shifts and therefore $N^2$ training images. After the circular shifts are made, the image is lexicographically converted to a column vector and placed in the training set matrix. As shown in Fig. 1, we use the $8 \times 8$ image of the initials ML as the original target image and generate all 64 possible translations.
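The construction of the shifted 2-D training set can be sketched as below. This is a sketch under assumptions and not the authors' code: the `L`-shaped glyph is a hypothetical stand-in for the ML image, the function name is invented, row-major flattening is assumed for the lexicographic conversion, and the columns are ordered with the row shift varying fastest.

```python
import numpy as np

def training_matrix_2d(img):
    """All N*N circular shifts of a unit-energy N x N image as matrix columns."""
    N = img.shape[0]
    img = img / np.linalg.norm(img)
    # Column index n*N + m holds the image shifted down m rows, right n columns
    cols = [np.roll(img, (m, n), axis=(0, 1)).ravel()
            for n in range(N) for m in range(N)]
    return np.stack(cols, axis=1)

# Hypothetical 8 x 8 binary glyph used only for illustration
img = np.zeros((8, 8))
img[1:7, 2] = 1.0        # vertical stroke
img[6, 2:6] = 1.0        # horizontal stroke

X_t = training_matrix_2d(img)
R_tt = X_t.T @ X_t       # 64 x 64 correlation matrix

print(X_t.shape, R_tt.shape)
```

Because each entry of `R_tt` depends only on the difference between two row shifts and two column shifts, the matrix is block circulant with circulant blocks, whichever flattening order is chosen.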
In the same manner, but not shown, we use AF as a sample clutter image and generate all possible translations for the clutter training set. The gray grid in Fig. 1 is overlaid as a visual aid to show the circular shifts before lexicographic conversion.

Fig. 1 Target training set before lexicographic conversion of each translation. Original image is $8 \times 8$ and there are a total of 64 possible row and column translations. A gray grid was overlaid on the scene for a visual aid.

An $N \times N$ image that has $d = N^2$ pixels would create a training set matrix that is $d \times N^2$ such that

$$X_t = [x_{t,0,0} \; x_{t,1,0} \; x_{t,2,0} \; \cdots \; x_{t,0,1} \; x_{t,1,1} \; x_{t,2,1} \; \cdots \; x_{t,N-1,N-1}]. \qquad (23)$$

Let us clarify this with an example. In Eq. 23, $x_{t,1,2}$ is a $d \times 1$ vector that represents the original image circularly shifted down by one row and to the right by two columns, then lexicographically converted to a $d \times 1$ column vector. Each image is normalized to have unit energy, so that $x_{t,m,n}^T x_{t,m,n} = 1$. The clutter training images are processed similarly and form a clutter training set matrix $X_c$.

For Fig. 2 we used the initials ML as the target, represented in an $8 \times 8$ image. Therefore there are 64 blocks and each block consists of $8 \times 8$ elements. The correlation matrix $R_{tt}$ is block circulant, as shown by the gray levels in Fig. 2, and each block is itself circulant, thus BCCB.

Fig. 2 Gray-level representation of VIP matrix, $R_{tt}$.

Since $R_{tt}$ is BCCB, we know the eigenvectors are a form of Fourier vectors, and because $R_{tt}$ is also symmetric, we know the eigenvalues are real. In addition, since we know that the main diagonal elements of $R_{tt}$ are unity and larger than any off-diagonal elements, we conclude that the eigenvalues are non-negative. A slight refinement over the 1-D case in finding the eigenvalues is needed and is described in Ref. 13, page 185. Instead of using the augmented Fourier vectors $\Phi$ as in Eq. 8, we use $Q$, which is an $N^2 \times N^2$ matrix. The $Q$ matrix is formed by

$$Q = \Phi \otimes \Phi, \qquad (24)$$

where $\otimes$ is the Kronecker product. With this change, the 1-D relationship of Eq. 8, $R_{tt} \Phi = \Phi \Lambda_{tt}$, becomes the 2-D relationship

$$R_{tt} Q = Q \Lambda_{tt}, \qquad (25)$$

where $\Lambda_{tt}$ is real, positive, and diagonal. Similarly, the clutter to target VIP matrix is formed by $R_{ct} = X_c^T X_t$, so that

$$R_{ct} Q = Q \Lambda_{ct}. \qquad (26)$$

The properties of $\Phi^{-1}$ can be proven [13] to be valid for $Q$, so that $Q^{-1} = Q^*/N^2$. Since $\Phi$ is nonsingular, $\Phi \otimes \Phi$ is also nonsingular. Using the inverse property and conjugate property of the Kronecker product, $Q^{-1} = (\Phi \otimes \Phi)^{-1} = \Phi^{-1} \otimes \Phi^{-1}$, and so

$$Q^{-1} = \frac{\Phi^*}{N} \otimes \frac{\Phi^*}{N} = \frac{1}{N^2} (\Phi^* \otimes \Phi^*) = \frac{Q^*}{N^2}. \qquad (27)$$

Using Eq. 27 to solve for the diagonal eigenvalue matrix in Eq. 25 yields

$$\Lambda_{tt} = \frac{1}{N^2} Q^* R_{tt} Q, \qquad (28)$$

where the eigenvalues are on the main diagonal of $\Lambda_{tt}$. The sum of the eigenvalues ideally equals the number of target training images used, i.e.,

$$N^2 = \sum_{k=0}^{N^2-1} \lambda_{tt,k}. \qquad (29)$$

Fig. 3 Example of a VIP operator bank architecture with three operators. $x$ is the input, and the output response of the bank is a weighted sum of the individual operator magnitude responses.

The clutter to target eigenvalues are found similarly using $R_{ct}$. However, while $\Lambda_{ct}$ is a diagonal matrix because $R_{ct}$ is BCCB, $R_{ct}$ is not necessarily symmetric, so the eigenvalues are not necessarily real valued. Therefore, the clutter eigenvalues are approximated by the magnitudes of the main diagonal elements of $\Lambda_{ct}$. The VIP operator weight $b_k$ and the operator selection metric DNR are found as in the 1-D case by using Eqs. 13 and 14, respectively. The entire VIP operator family is defined by the columns of

$$H^* = X_t Q. \qquad (30)$$

3.2 Redundant VIP Operators and the Eigenvalue Matrix

The DNR metric is used for selecting which VIP operators are to be included in the operator bank, but about half of the operators are complex conjugates of other operators. Since the phase response is discarded, these operators are redundant and do not contribute new information for detection. We present a method for identifying which operators are redundant and which are real valued. We note the real-valued operators because it is possible to construct an all real-valued operator bank; hence complex multiplies and additions are not needed.
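The 2-D relationships of Eqs. 24 through 30 can be verified numerically with a small example. The sketch below is illustrative only and rests on assumptions: a random $4 \times 4$ image stands in for the target, the columns of the training set are ordered with the row shift varying fastest, and all variable names are invented here.

```python
import numpy as np

N = 4
rng = np.random.default_rng(3)
img = rng.random((N, N))               # hypothetical target image
img /= np.linalg.norm(img)

# Training set of all N*N circular shifts, flattened row-major
X_t = np.stack([np.roll(img, (m, n), axis=(0, 1)).ravel()
                for n in range(N) for m in range(N)], axis=1)
R_tt = X_t.T @ X_t                     # BCCB correlation matrix

idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)
Q = np.kron(Phi, Phi)                  # Eq. 24

Lam_tt = Q.conj() @ R_tt @ Q / N**2    # Eq. 28
lam_tt = np.diag(Lam_tt).real

H_conj = X_t @ Q                       # Eq. 30, column k is h_k^*
responses = np.abs(X_t.T @ H_conj)     # rows: shifts, columns: operators

print(np.round(lam_tt.sum(), 6))       # close to N**2, as in Eq. 29
print(np.allclose(responses, lam_tt))  # same magnitudes for every 2-D shift
```

Each row of `responses` matches `lam_tt`, so every operator gives the same magnitude for all $N^2$ row and column translations of the target.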
Although this aspect is not pursued in this research, it would be of particular interest for both optical and digital processor implementation. Each eigenvalue in the eigenvalue matrix $\Lambda_{tt}$ is real and is associated with a VIP operator. Multiple operators may have similar eigenvalues, so we cannot discriminate among the redundant operators by their eigenvalues. A diagonal matrix is obtained from $F_1 = H^T H^*$ since all the operators are orthogonal. Thus, each nonzero term of

$$F_2 = H^H H^*, \qquad (31)$$

represents either a redundant operator pair or a real-valued operator, where $H$ as a superscript indicates conjugate transpose. There is only one nonzero value in each column of $F_2$. The $i$th row and $j$th column location of the nonzero value in $F_2$ indicates which operator pairs $(i, j)$ are conjugates of each other. All nonzero elements on the main diagonal of $F_2$ identify the real-valued operators. The very first location, $F_2[0,0]$, is the 0th-order operator. This operator has poor discrimination, so its DNR is not considered and the operator is not used. The other real-valued operators may be used even though they do not have a matching redundant operator. This is because these operators achieve discrimination by using negative values, which act in a way similar to phasor cancellation.

4 Translation-Invariant Operator Implementation

The VIP methodology is a significant departure from correlation-based filters and gives rise to uniquely different optoelectronic and digital implementations. In considering implementation, we focus on conveying the concept and potential of implementation rather than specific details. In this regard we ignore actual device limitations and practical issues and present three ideal hardware implementations. The first implementation is a DSP-based architecture. The second implementation is a coherent light, optoelectronic processor similar in design to the 4f correlator.
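The redundancy test based on $F_1$ and $F_2$ described in Sec. 3.2 can be sketched as follows. It is a hedged illustration rather than the authors' procedure in full: the random $4 \times 4$ image, the numerical threshold of `1e-9`, and the names `real_ops` and `pairs` are assumptions, and the sketch presumes that no target eigenvalue is exactly zero.

```python
import numpy as np

N = 4
rng = np.random.default_rng(7)
img = rng.random((N, N))               # hypothetical target image
img /= np.linalg.norm(img)

X_t = np.stack([np.roll(img, (m, n), axis=(0, 1)).ravel()
                for n in range(N) for m in range(N)], axis=1)
idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)
H_conj = X_t @ np.kron(Phi, Phi)       # columns are h_k^*
H = H_conj.conj()                      # columns are h_k

F1 = H.T @ H_conj                      # diagonal: the operators are orthogonal
F2 = H.conj().T @ H_conj               # Eq. 31

mask = np.abs(F2) > 1e-9               # locations of the nonzero terms
real_ops = [k for k in range(N * N) if mask[k, k]]
pairs = [(int(i), int(j)) for i, j in zip(*np.nonzero(mask)) if i < j]

print(real_ops)                        # real-valued operators, including index 0
print(pairs)                           # conjugate (redundant) operator pairs
```

Only one member of each pair in `pairs` needs to be kept in a bank, since both give the same magnitude response.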
The third is an incoherent light optoelectronic processor, nicknamed the VIP camera, because it can process reflected light directly off an object.

The DSP VIP architecture is summarized in Fig. 3, where we assume that operators $h_{k_1}^*$, $h_{k_2}^*$, and $h_{k_3}^*$ are converted back to their 2-D form and are tiled to match the size of the input image $x$. Elementwise complex multiplication is required because $h_{k_n}^*$ is complex. The resulting product is then effectively convolved with a 2-D binary window. A 2-D binary window convolution is separable, i.e., $h_w[m,n] = h_{w1}[m] \, h_{w2}[n]$, and can be performed with two simpler 1-D convolutions. For example, a 1-D convolution is performed on all the rows. The result is then convolved again, but along the columns instead. The 1-D convolutions do not require multiplication, only addition and subtraction. This makes the VIP operator even more attractive for implementation on a DSP architecture, since most DSPs are capable of a multiply-accumulate within each instruction cycle. After the convolutions, we multiply by a weighting coefficient $b_k$, then each corresponding element magnitude is summed to form the response of the operator bank.

An idealized optical implementation is presented in Fig. 4. We modify the familiar 4f optical correlator system [17] by placing an additional full complex spatial light modulator (SLM) by the conventional input SLM. The first SLM is encoded with the input in the spatial plane and is illuminated by a coherent plane wave. The second SLM is used to encode the complex VIP operator in the spatial plane. The VIP operator is tiled to match the input scene dimensions. This second SLM is placed adjacent and parallel to the first SLM. If we assume ideal alignment, full complex operation, and no light leakage into adjacent pixels of the SLM, then an elementwise multiplication is performed between the two SLMs. The resulting product is convolved with a 2-D binary window. A Fourier transforming lens L1 is used to transform the spatial plane onto the filter plane, where a 2-D $\mathrm{sinc}(u,v)$ function is encoded. The sinc function's parameters are dependent only on the original training image size $N \times N$, so an SLM is not needed. The second Fourier lens L2 is used to transform the results onto the correlation plane, where they are sensed by a CCD array. In practice, the width of the sinc function would be limited by the aperture in the frequency plane.

Fig. 4 Idealized coherent optical correlator system to implement one VIP operator.
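The DSP-style processing chain described above (tile the 2-D operator, multiply elementwise, then apply a separable binary window) can be sketched as follows. This is an idealized illustration under assumptions, not the architecture of Fig. 3 itself: the random $4 \times 4$ target, the $3 \times 3$ tiling, the operator indices in `ks`, unit weights in place of $b_k$, and all function names are hypothetical. The window is applied as two 1-D circular moving sums built from running-sum differences.

```python
import numpy as np

def circ_moving_sum(a, n, axis):
    """Circular sum of n consecutive samples along one axis via a running sum."""
    L = a.shape[axis]
    wrap = np.take(a, np.arange(n - 1), axis=axis)
    c = np.cumsum(np.concatenate([a, wrap], axis=axis), axis=axis)
    zero = np.zeros_like(np.take(c, [0], axis=axis))
    c = np.concatenate([zero, c], axis=axis)
    return (np.take(c, np.arange(n, n + L), axis=axis)
            - np.take(c, np.arange(L), axis=axis))

N, M = 4, 3
rng = np.random.default_rng(4)
img = rng.random((N, N))                   # hypothetical target image
img /= np.linalg.norm(img)

X_t = np.stack([np.roll(img, (m, n), axis=(0, 1)).ravel()
                for n in range(N) for m in range(N)], axis=1)
idx = np.arange(N)
Phi = np.exp(2j * np.pi * np.outer(idx, idx) / N)
H_conj = X_t @ np.kron(Phi, Phi)           # column k is h_k^*
lam = np.abs(X_t[:, 0] @ H_conj)           # target eigenvalues

ks = [1, 4, 5]     # hypothetical operator choice; the dc operator 0 is excluded

def bank_response(scene):
    out = np.zeros(scene.shape)
    for k in ks:
        tiled = np.tile(H_conj[:, k].reshape(N, N), (M, M))
        prod = scene * tiled                             # elementwise multiply
        win = circ_moving_sum(circ_moving_sum(prod, N, 0), N, 1)
        out += np.abs(win)                               # unit weights assumed
    return out

def place(r0, c0):
    """Zero scene of size MN x MN with the target's top-left corner at (r0, c0)."""
    scene = np.zeros((M * N, M * N))
    scene[r0:r0 + N, c0:c0 + N] = img
    return scene

print(round(float(bank_response(place(5, 2))[5, 2]), 4))
print(round(float(bank_response(place(3, 7))[3, 7]), 4))
print(round(float(lam[ks].sum()), 4))
```

The response at the window aligned with the target is the same for both placements and equals the sum of the selected eigenvalues, even though neither placement coincides with a tile boundary.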
The main lobe of the sinc would be the most critical component to be included within the aperture, and its width is inversely related to the width of the binary window. Further research would be necessary to determine the exact effects of this limitation. It should also be noted that convolution with a rectangular window is similar to blurring the space domain.

If only detection and discrimination of a target are desired and location is not necessary, a simple incoherent processor can be used. In Fig. 5, we show an incoherent detection system that is configured as a VIP camera. The object is imaged onto two SLMs using an objective lens and an ideal beamsplitter. The first SLM contains the real component of the VIP operator and the second SLM contains the imaginary component. Both SLMs operate in amplitude-only mode, so the negative values of the operators are preserved by first adding an offset $h_{off}$ and later removing the offset's effect electronically. The multiplications performed by the SLMs are individually optically integrated and projected onto point detectors. The outputs of the point detectors are combined, with offset removed, and weighted as described for VIP bank operation. In this example the removed offset is input dependent and requires updating every other frame.

Fig. 5 Idealized incoherent optical VIP operator.

5 VIP Operator Bank Results

We demonstrate the performance of the VIP operators by using character inputs with and without noise, and perform Monte Carlo simulations. Performance comparisons are made between the VIP operator bank and the conventional matched filter (MF). Four experiments are presented and are itemized as follows:

1. Target and clutter are represented as $8 \times 8$ images zero padded to test inputs. Discrimination performance is demonstrated.
2. A large test input with systematically offset ML and AF images, as target and clutter, respectively, is used to demonstrate translation invariance, detection, and discrimination.
3. The discrimination performance of the VIP operator bank is evaluated for a variable number of operators. Performance is compared with a conventional matched filter.
4. Like experiment 3 but with 6 dB input SNR, 100 runs of 1 to 15 operators in the bank. Performance is compared with a conventional matched filter.

For the first experiment we use the upper left image ML of Fig. 1 as the target input and a similar image AF as the clutter input. Each of the $8 \times 8$ input images

is circularly shifted by rows and columns, as shown in Fig. 1, to generate target and clutter training sets of 64 images each. The VIP operators are generated, and the best four operators are selected based on the DNR and SNR metrics described in Sec. 2. The operator bank is tested by tiling the operator and zero padding the original inputs to $3N \times 3N$, as shown in Fig. 6. The product of the element-by-element multiplication is convolved with an 8 × 8 binary window. The output of the operator bank for the test target input is shown in Fig. 7 as a 3-D surface, where the height represents intensity and the lateral dimensions correspond to position in the output plane. The output for the clutter input is shown in Fig. 8. Both target and clutter responses are scaled by the same amount so that the target peak is unity. As shown in Figs. 7 and 8, the VIP operator bank discriminates well between the target and clutter classes.

Fig. 6 Example of 2-D tiling used for testing.

Fig. 7 VIP response of four operators to a simple zero padded target input.

Fig. 8 VIP response of four operators to a simple zero padded clutter input.

To verify that the VIP operator bank is indeed translation invariant, a more complicated scene, shown in Fig. 9, is used as input. This scene has 8 target and 8 clutter images, which are linearly shifted across tile boundaries, as indicated by a gray grid. The gray grid is not part of the input scene but was superimposed as a visual aid. The corresponding output intensity plane is shown in Fig. 10, where the intensity is encoded as a gray level with white as maximum and black as minimum. Note that all 8 targets have equal peak intensity, clearly higher than the 8 clutter responses.

Fig. 9 A large test input in which target images are placed across filter boundaries. A gray grid was superimposed on the scene to identify where the tiled operator boundaries lie.

To appreciate the tradeoff between discrimination performance and the number of operators used in a bank, a series of numerical simulations is performed with a variable number of operators. These operators are selected by DNR and then sorted by output SNR, as described in Sec. 2. The peak intensity response to target and clutter is shown in Fig. 11. The responses are normalized so that the target response is always unity. The circles in Fig. 11 represent the target intensity response for both the MF and the VIP operator bank. The crosses indicate the VIP clutter intensity response and the triangles indicate the MF clutter peak intensity response. Note that with only two VIP operators, the discrimination performance matches that of an MF. Even though the MF uses only one filter, two VIP operations are numerically more efficient, as discussed in Sec. 6.

We present an additional experiment testing the VIP operator bank with input noise. The simulations are set up according to experiment 3, but the input image is corrupted with zero mean additive white Gaussian noise. The variance of the noise is selected to provide a 6-dB input SNR

within the 8 × 8 pixel region of the input image. The results of the experiment are presented as error bar plots in Fig. 12. The mean target peak intensity responses are scaled to unity, and the bottom solid line, denoted by the crosses, represents the mean intensity response to the clutter for the VIP bank. The vertical bars are the standard deviations associated with the number of operators used in the VIP bank. The means and standard deviations are computed from 100 runs, where the noise seed is changed in each run. The two dotted line pairs indicate the standard deviations of the matched filter for the target and the clutter. A separately marked line indicates the mean intensity response to the clutter for the matched filter. The mean values of the VIP bank in Fig. 12 are close to the responses without input noise. The matched filter is optimum for detection in white noise, so, as expected, its standard deviation is smaller than that of the suboptimal VIP operator bank. Although the VIP system is suboptimal in output SNR, its performance asymptotically approaches that of the MF with only a few operators.

Fig. 10 The VIP operator bank intensity response to Fig. 9 for four operators.

Fig. 11 Peak intensity response as a function of the number of operators. The target and clutter responses are the lines denoted by circles and x's, respectively. The center solid line denotes the classic matched filter clutter response.

Fig. 12 Error bar plot of VIP peak intensity responses as a function of the number of sorted operators in the bank. The top solid line represents the mean of the peak intensity response for target, whereas the bottom solid line is the mean of the peak response for clutter. The vertical bars indicate standard deviations. The dotted line is the mean of matched filter intensity responses and the dashed lines are the associated deviations for the same noise.
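The Monte Carlo procedure of the noise experiment can be sketched as follows. This is a minimal sketch under stated assumptions: the input SNR is taken to be the mean squared pixel value over the object region divided by the noise variance (the exact definition is not restated in this section), and `response_fn` is a hypothetical placeholder for whatever detector is being evaluated, such as a VIP bank or a matched filter. The names `noise_sigma_for_snr` and `monte_carlo` are illustrative.

```python
import math
import random
import statistics

def noise_sigma_for_snr(target, snr_db):
    """Noise standard deviation giving the requested input SNR over the target region.

    Assumption: SNR = (mean squared pixel value of the target) / (noise variance).
    """
    pixels = [v for row in target for v in row]
    signal_power = sum(v * v for v in pixels) / len(pixels)
    return math.sqrt(signal_power / 10 ** (snr_db / 10))

def monte_carlo(target, response_fn, snr_db, runs=100, seed=0):
    """Mean and standard deviation of response_fn over noisy copies of the target."""
    sigma = noise_sigma_for_snr(target, snr_db)
    responses = []
    for run in range(runs):
        rng = random.Random(seed + run)     # a different noise seed for every run
        noisy = [[v + rng.gauss(0.0, sigma) for v in row] for row in target]
        responses.append(response_fn(noisy))
    return statistics.mean(responses), statistics.stdev(responses)

if __name__ == "__main__":
    reference = [[1.0, 0.0], [0.0, 1.0]]

    def mf_intensity(img):
        # Hypothetical matched-filter response: squared inner product with the reference.
        ip = sum(a * b for ra, rb in zip(img, reference) for a, b in zip(ra, rb))
        return ip * ip

    print(monte_carlo(reference, mf_intensity, snr_db=6.0, runs=100))
```

Repeating the call for each bank size and plotting the mean with one-standard-deviation bars would give an error bar plot of the same kind as Fig. 12.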
6 Computational Savings Using VIP Operator Banks

Space domain correlation is straightforward but numerically intensive. The computational savings of VIP operator banks are dramatic. The VIP method does not require iteration or complicated addressing mechanisms like those in the fast Fourier transform (FFT)-based implementation of correlation. Conventional correlation may be performed directly in the signal space or indirectly in the frequency domain by utilizing FFTs. For comparison, a 2-D frequency domain correlation of an $L \times L$ scene requires $L^2 + (3/2)L^2 \log_2 L$ complex multiplications and $4L^2 \log_2 L$ complex additions, whereas the VIP operator bank requires $L^2 K$ complex multiplications and $2L^2 K$ complex additions. The number of VIP operators $K$ ranges from 1 to $N/2$, where the object image is $N \times N$. The 2-D FFT in the above discussion is based on a radix 2 × 2 algorithm, and as a result the scene dimensions must be a power of 2 for the frequency domain correlation. For comparison with the experiment of Fig. 7, we interpolate the FFT operation counts using the expressions above. Setting the scene size to $L = 24$ for the FFT method gives approximately 4540 complex multiplications and approximately 10,560 complex additions. For the VIP experiments presented here, we used only four operators, so $K = 4$ and $L = 24$, which requires approximately 2304 complex multiplications and 4608 complex additions. Thus the VIP operator bank provides a computational savings of approximately a factor of 2.0 for multiplications and a factor of 2.3 for additions. For larger scenes, the output SNR of both MF and VIP systems remains the same

while the computational efficiencies change. For example, the 80 × 80 scene of Fig. 9 results in operation ratios of 2.6 and 3.2 for complex multiplications and additions, respectively. The numerical efficiency of the VIP system is even more impressive when considering a scene size $L = MN$, where $M$ is variable, $N$ is fixed by the object size, and $K$ is a constant number of VIP operators. The resulting VIP system complexity is on the order of $O(M^2)$, while that of the FFT method is $O(M^2 \log M)$. The tradeoff for this improved performance is suboptimal output SNR, as demonstrated by the noise experiment of Sec. 5.

7 Conclusions

This novel approach performs shift-invariant signal detection without conventional correlation. Only a vector inner product, an element-by-element multiplication, and a circular convolution with a small binary window are required. This approach is computationally efficient, i.e., $O(M^2)$, compared with traditional methods, which include frequency domain correlation using FFT algorithms, i.e., $O(M^2 \log M)$. The generation and implementation of the VIP operator bank structure are accomplished with only matrix multiplication and magnitude operations. No matrix inversion, singular-value decomposition, or iteration is required for VIP operator design. In addition to numerical efficiency, the VIP system implies new coherent and incoherent optical architectures, of which we have mentioned two ideal cases. The tradeoff for these virtues is the loss in output SNR. That is, for high input SNR problems, a single VIP operator may suffice, while for lower input SNR problems, more VIP operators may be required to maintain performance. In conclusion, the VIP operator architecture is a profoundly different approach to translation-invariant pattern recognition and gives rise to significantly different and highly efficient design and implementation techniques that deserve further attention in future research.
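The operation counts and savings ratios quoted in Sec. 6 can be reproduced with a few lines of arithmetic. This is a minimal sketch that simply evaluates the count expressions stated in the text; the function names `fft_correlation_ops` and `vip_ops` are illustrative, and non-power-of-2 scene sizes are handled by direct evaluation of the formula, which is how the interpolation is assumed to have been done.

```python
import math

def fft_correlation_ops(L):
    """Complex multiplications and additions for 2-D FFT-based correlation of an L x L scene."""
    mults = L ** 2 + 1.5 * L ** 2 * math.log2(L)
    adds = 4 * L ** 2 * math.log2(L)
    return mults, adds

def vip_ops(L, K):
    """Complex multiplications and additions for a bank of K VIP operators on an L x L scene."""
    return L ** 2 * K, 2 * L ** 2 * K

if __name__ == "__main__":
    for L in (24, 80):
        fft_m, fft_a = fft_correlation_ops(L)
        vip_m, vip_a = vip_ops(L, K=4)
        # Savings ratios: FFT count divided by VIP count.
        print(L, round(fft_m), round(fft_a), vip_m, vip_a,
              round(fft_m / vip_m, 1), round(fft_a / vip_a, 1))
```

For fixed $K$ the ratio grows with $\log_2 L$, which is the source of the $O(M^2)$ versus $O(M^2 \log M)$ comparison.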
Acknowledgments

This research was supported through National Aeronautics and Space Administration Cooperative Agreement NCCW-60.

References

1. C. F. Hester and D. Casasent, "Multivariant technique for multi-class pattern recognition," Appl. Opt. 19.
2. A. Mahalanobis, B. V. K. Vijaya Kumar, and D. Casasent, "Minimum average correlation energy filters," Appl. Opt. 26.
3. Z. Bahri and B. V. K. Vijaya Kumar, "Generalized synthetic discriminant functions," J. Opt. Soc. Am. A 5.
4. B. V. K. Vijaya Kumar, A. Mahalanobis, S. Song, S. R. F. Sims, and J. F. Epperson, "Minimum squared error synthetic discriminant functions," Opt. Eng. 31.
5. B. V. K. Vijaya Kumar, "Tutorial survey of composite filter designs for optical correlators," Appl. Opt.
6. Y. Hsu, H. H. Arsenault, and G. April, "Rotation-invariant digital pattern recognition using circular harmonic expansion," Appl. Opt. 21.
7. G. F. Schils and D. W. Sweeney, "An optical processor for recognition of 3-D viewed from any direction," J. Opt. Soc. Am. A 5.
8. L. Hassebrook, B. V. K. Vijaya Kumar, and L. Hostetler, "Linear phase coefficient composite filter banks for distortion-invariant optical pattern recognition," Opt. Eng.
9. Y. Hsu, H. H. Arsenault, and G. April, "Statistical performance of circular harmonic filter for rotation-invariant pattern recognition," Appl. Opt.
10. R. Wu and H. Stark, "Rotation-invariant pattern recognition using a vector reference," Appl. Opt.
11. L. Hassebrook and M. Rahmati, "Training set selection with multiple out-of-plane rotation parameters," Proc. SPIE.
12. M. Rahmati and L. G. Hassebrook, "Intensity- and distortion-invariant pattern recognition with complex linear morphology," Pattern Recog. 27(4).
13. P. J. Davis, Circulant Matrices, Wiley, New York.
14. L. G. Hassebrook, M. Rahmati, and B. V. K. Vijaya Kumar, "Hybrid composite filter banks for distortion-invariant optical pattern recognition," Opt. Eng. 31.
15. L. G. Hassebrook, "Design of distortion-invariant linear phase response correlation filters and filter banks," Ph.D. Dissertation, Carnegie Mellon University, Pittsburgh, PA.
16. Z. Bahri and B. V. K. Vijaya Kumar, "Generalized synthetic discriminant functions," J. Opt. Soc. Am. A 5.
17. A. Vander Lugt, "Signal detection by complex spatial filtering," IEEE Trans. Information Theory 10.

Michael E. Lhamon received a BSEE degree from the Ohio Northern University in 1990 and an MSEE from the Georgia Institute of Technology in He worked for the National Radio Astronomy Observatory at the Very Large Array in Socorro, New Mexico, during 1989 and He is currently working toward his PhD at the University of Kentucky. His present interests include automatic object recognition and discrimination and real-time digital and optical signal processing. He is a member of SPIE and the Institute of Electrical and Electronics Engineers.

Laurence G. Hassebrook received a BSEE from the University of Nebraska in 1979, an MSEE from Syracuse University in 1987, and a PhD in electrical and computer engineering from Carnegie Mellon University in He worked at Lincoln Electric System Corporation from 1980 to 1981 in development and automation of load analysis systems. From 1981 to 1984 and then summers from 1985 through 1987, he worked in nondestructive testing at IBM Corporation in Endicott, New York. Since 1990, he has been a professor at the University of Kentucky, where he is presently an associate professor of electrical engineering. His interests are in signal processing, structured light illumination, and automatic object recognition and discrimination. He is a member of the Pattern Recognition Society, SPIE, the Optical Society of America, and the Institute of Electrical and Electronics Engineers.
