COMPLEX INPUT CONVOLUTIONAL NEURAL NETWORKS FOR WIDE ANGLE SAR ATR

Michael Wilmanski #*1, Chris Kreucher *2, & Alfred Hero #3

# University of Michigan, 500 S State St, Ann Arbor, MI 48109
1 wilmansk@umich.edu, 2 ckreuche@umich.edu, 3 hero@umich.edu
* Integrity Applications Incorporated, 900 Victors Way, Suite 220, Ann Arbor, MI 48108

ABSTRACT

To date, automatic target recognition (ATR) techniques in synthetic aperture radar (SAR) imagery have largely focused on features that use only the magnitude part of SAR's complex-valued magnitude-plus-phase history. While such techniques are often very successful, they inherently ignore the significant amount of discriminatory information available in the phase. This paper describes a method for exploiting the complex information for ATR by using a convolutional neural network (CNN) that accepts fully complex input features. We show a performance leap from 87.30% to 99.21% accuracy on real collected wide-angle SAR data with the use of complex features.

Index Terms: Complex-valued Deep Learning, Convolutional Neural Networks, Complex Feature Extraction, Wide-angle SAR

1. INTRODUCTION

This paper describes an approach for fully exploiting complex synthetic aperture radar (SAR) data using a convolutional neural network (CNN). While magnitude-only CNNs have been successful [3-5] for SAR ATR, ignoring phase potentially sacrifices useful information that can aid classification. This motivates our proposed approach that fully exploits complex data. We show experimentally that this complex-input approach provides a substantial classification improvement over magnitude-only inputs for a set of collected wide-angle SAR data.

The paper proceeds as follows. First, we describe the magnitude-only CNN approach to ATR [4]. The input data is analogous to optical grayscale images, allowing CNN techniques developed for image classification to be used without modification. Second, we describe the proposed CNN that uses complex-valued inputs, based on [1,2]. In this implementation, the first convolutional layer is fully complex. It implements a nonlinearity that produces real values at the output, and the rest of the network is made up of ordinary convolution and pooling layers. Finally, we show experimental results for each approach that demonstrate that using complex features can aid in classification.

2. CNN CLASSIFIER WITH MAGNITUDE ONLY

Recent CNN-based SAR classification schemes [3-5] use magnitude-detected input data and leverage network designs originally developed for greyscale natural imagery. We briefly describe this approach here in order to contrast it with the fully complex CNN we present later. As such, we focus on the pertinent parts of the CNN implementation, namely the input normalization and CNN topology, which are most distinct from the complex-valued CNN. For more details on magnitude-only CNN ATR implementations see [4].

2.1 Normalization

Before input images are used by the CNN, they are normalized so that pixel values fall within a fixed range, e.g., [-0.5, 0.5], with an image-wide average of 0. Let p(i, j) represent the unnormalized (magnitude-detected, real-valued) pixel values used as input. Let there be N total pixels. Then the normalized feature p_n(i, j) is

\[ p_n(i,j) = \frac{p(i,j) - \frac{1}{N}\sum_{i,j} p(i,j)}{\max_{i,j} p(i,j)} \tag{1} \]

2.2 Network Topology

A CNN consists of several multi-resolution layers, each using a pooling method to pass information forward from layer to layer. A typical CNN topology for SAR image classification is summarized in Table 1 [3]-[5]. The numbers of network layers and filters in Fig. 5 are smaller than in many general-purpose CNN implementations, to protect against overfitting on the small training image datasets typically available in SAR ATR. We implement the pooling layers with the well-established Stochastic Pooling [6] algorithm to further guard against overfitting. The output layer implements the CNN classification stage using the Softmax function.
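The magnitude-only normalization of Section 2.1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: it assumes Eq. (1) subtracts the image-wide mean and divides by the largest unnormalized pixel value, and the function name `normalize_magnitude` and the random stand-in chip are hypothetical.

```python
import numpy as np


def normalize_magnitude(chip):
    """Zero-mean, peak-scaled normalization of a real-valued magnitude chip.

    Assumed reading of Eq. (1): subtract the image-wide mean, then divide
    by the largest pixel value of the unnormalized chip.
    """
    chip = np.asarray(chip, dtype=np.float64)
    peak = chip.max()
    if peak <= 0:
        raise ValueError("chip must contain at least one positive magnitude value")
    return (chip - chip.mean()) / peak


rng = np.random.default_rng(0)
# Hypothetical 80 x 80 magnitude-detected chip (random stand-in, not GOTCHA data).
chip = np.abs(rng.normal(size=(80, 80)) + 1j * rng.normal(size=(80, 80)))
normalized = normalize_magnitude(chip)
print(normalized.mean(), normalized.min(), normalized.max())
```

The output has an image-wide mean of zero up to floating-point error. The exact range it occupies depends on the chip's statistics, so a fixed interval such as [-0.5, 0.5] would need an additional rescaling step that is not shown here.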

Layer Type        Image Size   Feature Maps   Kernel Size
Input             80 x 80      1              -
Convolutional     72 x 72      18             9 x 9
Pooling           18 x 18      18             4 x 4
Convolutional     12 x 12      36             7 x 7
Pooling           4 x 4        36             3 x 3
Convolutional     1 x 1        120            4 x 4
Fully Connected   1            120            -
Fully Connected   1            120            -
Output            1            10             -

Table 1: Topology of the magnitude-only CNN.

3. COMPLEX CNN

There are several ways to implement a CNN that uses complex inputs. The main difficulty lies in the specification of the activation function, which allows networks to learn highly non-linear functions for classification as well as produce a weight-error gradient used for training. A successful activation function must be both nonlinear and differentiable. The most common modern activation functions are rectified linear units (ReLU) [7] and their close variants (e.g. Leaky ReLU [8], PReLU [9]). Neither these nor classic activation functions such as Sigmoid and hyperbolic tangent are differentiable for complex inputs.

Some researchers suggest splitting real and imaginary parts and processing each as a real number [10,11] or close variations [12]. Others suggest activation functions that ignore magnitude [13]. The problem with split activation functions is that phase information is overly distorted. The issue with ignoring magnitude is that magnitude contains significant information, defeating the point of a complex CNN, which attempts to make use of all available information. These facts are illustrated in Figure 1, which shows the magnitude and phase information of an example from the GOTCHA dataset [14]. It is clear that both magnitude and phase contain structure, which suggests there is discriminatory information in both parts.

Figure 1: Magnitude (L) and phase (R) of tophat1

Our proposed complex-valued CNN is similar to an approach described in [1,2]. It is distinguished from these earlier efforts in that we do not use the absolute value function as the nonlinearity after the first hidden layer. In our complex-valued CNN implementation, only the first layer is complex-valued, and it provides a nonlinearity that produces real values. The rest of the network is made up of ordinary convolution and pooling layers.

The complex layer has two filters for every feature map produced by the first hidden layer. Let A_i denote the matrix resulting from convolving the real part of our complex input image with the first filter of the i-th node, and let B_i represent the matrix resulting from convolving the imaginary part of the complex input with the second filter of the i-th node. Our nonlinearity function for the i-th node is

\[ f(A_i, B_i) = \sqrt{A_i^2 + B_i^2} \tag{2} \]

The error gradients for training use the partial derivatives

\[ \frac{\partial f(A_i, B_i)}{\partial A_i} = \frac{A_i}{f(A_i, B_i)} \quad \text{and} \quad \frac{\partial f(A_i, B_i)}{\partial B_i} = \frac{B_i}{f(A_i, B_i)} \tag{3} \]

The feature maps produced here look distinctly different from those of a magnitude-only CNN, as shown in Figure 2.

Figure 2: Detected image of a 5 quad trihedral labeled stri01_1 (L), feature maps of first hidden layer for a greyscale CNN (C), feature maps of first hidden layer for a complex-input CNN (R)

3.1 Normalization

Normalization is handled differently in the proposed complex-valued CNN because subtracting the mean pixel value from every individual pixel distorts the phase information. Instead we use a scaling normalization

\[ p_n(i,j) = \frac{p(i,j)}{\max_{i,j} |p(i,j)|} \tag{4} \]

3.2 Network Topology

In contrast to the magnitude-only CNN of Section 2, the first hidden layer is now the complex convolutional layer described above. Since the output of the complex convolutional layer is real-valued, the succeeding layers can be normal convolutional and pooling layers. See Figure 5 for the detailed complex CNN anatomy.

4. EXPERIMENTAL RESULTS

We use wide angle SAR data from the GOTCHA collect as in [14].
The dataset contains a set of civilian vehicles and a set of reflectors for target discrimination challenges. The vehicle set contains many moving targets and often has multiple targets per chip. To avoid multiple and/or moving targets, which would require motion-compensation techniques, we use the basic backprojection function included in the GOTCHA distribution to synthesize composite images of the reflector set. We exclude SAR chips under the labels tophat2a and tophat2b because the targets are moving, and we exclude chips under label tophat_2_3 because the resolution is different and there are multiple targets per chip. We combine all instances of like shapes into single classes. The result is 651 complex chips of 80x80 pixels divided into 6 classes.

We then split the set into 80% training and 20% testing sets of 525 and 126 chips respectively. The number of instances of each class is proportional between the training and testing sets. We trained the CNNs using stochastic mini-batches of size 35, resulting in 15 iterations per epoch. Each test involved training for 130 epochs (1950 iterations), tracking the loss value on the training set for another 10 epochs, then choosing the epoch that resulted in the lowest training set loss as the version of the network to test on.

4.1 Magnitude-only CNN

For the magnitude-only CNN the training set accuracy is 100%, while the testing set accuracy is 87.30%. Table 2 shows the testing set confusion matrix.

Table 2: Confusion matrix for the magnitude-only CNN

An interesting metric is the entropy of the output prediction between classes, especially when comparing chips classified correctly and chips classified incorrectly. Low entropy among correctly classified chips is desirable, as it means that the network made a correct prediction with high certainty. Conversely, high entropy among misclassified chips is desirable, as it means that incorrect predictions were made with low certainty and this is potentially detectable to the user. For probability distribution p(x), the entropy is defined as

\[ E(p(x)) = -\sum_{x} p(x) \log(p(x)) \tag{5} \]

For our 6-class scenario, total uncertainty corresponds to an entropy value of 1.8; a 50/50 split between two classes is ~0.7. Figure 3 compares the entropy of correct predictions to incorrect predictions for the magnitude-only case.

Figure 3: Output entropy for our magnitude-only CNN

4.2 Complex CNN

Accuracy on the training set is 100%, while testing accuracy is 99.21%. Table 3 shows the testing set confusion matrix and Figure 4 shows the histogram of prediction entropy.

Table 3: Confusion matrix for our complex CNN

Figure 4: Output entropy for our complex CNN

5. CONCLUSION

Figure 1 shows that there is structure to the phase in the area of the target, suggesting that there is discriminatory information to glean from it. From this, we expect that making use of the phase information could lead to better results, as seen in the complex CNN experiment. It is also interesting that the entropy of correct predictions decreased when the complex information was exploited. This suggests that the fully-complex approach has more desirable prediction entropy properties.
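The prediction entropy of Eq. (5) and the two reference values quoted for the 6-class case can be checked with a short sketch. It assumes the logarithm is natural, which is the reading consistent with the quoted values of 1.8 and ~0.7; the function name `prediction_entropy` and the example probability vectors are illustrative, not taken from the experiments.

```python
import numpy as np


def prediction_entropy(probs):
    """Entropy (natural log) of one class-probability vector, as in Eq. (5)."""
    probs = np.asarray(probs, dtype=np.float64)
    nonzero = probs[probs > 0]  # treat 0 * log(0) as 0
    # sum p * log(1/p) is the same quantity as -sum p * log(p)
    return float(np.sum(nonzero * np.log(1.0 / nonzero)))


uniform_six = np.full(6, 1.0 / 6.0)                        # total uncertainty over 6 classes
two_way_split = np.array([0.5, 0.5, 0.0, 0.0, 0.0, 0.0])   # 50/50 between two classes
confident = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])       # fully certain prediction

print(round(prediction_entropy(uniform_six), 2))    # 1.79 (ln 6)
print(round(prediction_entropy(two_way_split), 2))  # 0.69 (ln 2)
print(prediction_entropy(confident))                # 0.0
```

Applied to each softmax output of a trained network, this yields the per-chip values that the entropy histograms summarize.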

It is difficult to draw a conclusion for incorrect prediction entropy results since there are fewer incorrect predictions, but the outcomes seem comparable. Our future work will focus on validating the performance improvement of the proposed complex-valued CNN on a larger database of SAR images.

6. ACKNOWLEDGEMENT

The authors would like to thank Dr. Mark Tygert for helpful discussions about his implementation of neural networks with complex-valued input features. This work was partially supported by ARO Grant W911NF-15-1-0479.

Figure 5: Detailed anatomy of the complex-input CNN; the CNN block diagram was inspired by figure 3 of [15]

Figure 6: A random subset of training examples (L) and a random subset of testing examples (R) of the reflectors set of [14]; all are magnitude-detected and without normalization
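The complex first layer summarized by Eqs. (2) and (3) can be sketched for a single feature map as follows. This is a hedged illustration under stated assumptions, not the authors' implementation: the nonlinearity is taken to be the elementwise square root of A_i^2 + B_i^2 (the form consistent with the stated gradients), the 9 x 9 kernel size is borrowed from Table 1, the filters and input chip are random stand-ins, and the small `eps` term is added here only to keep the gradients finite where the output is zero.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """'Valid' 2-D cross-correlation, the usual CNN convolution convention."""
    windows = sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def complex_input_layer(chip, real_kernel, imag_kernel, eps=1e-12):
    """One feature map of the complex first layer.

    A is the real part filtered by the first kernel, B is the imaginary part
    filtered by the second kernel. Returns f = sqrt(A^2 + B^2) together with
    the partial derivatives df/dA = A / f and df/dB = B / f.
    eps is an assumption of this sketch, not part of the paper.
    """
    a = correlate_valid(chip.real, real_kernel)
    b = correlate_valid(chip.imag, imag_kernel)
    f = np.sqrt(a ** 2 + b ** 2)
    df_da = a / (f + eps)
    df_db = b / (f + eps)
    return f, df_da, df_db


rng = np.random.default_rng(0)
# Hypothetical 80 x 80 complex chip (random stand-in, not GOTCHA data).
chip = rng.normal(size=(80, 80)) + 1j * rng.normal(size=(80, 80))
chip = chip / np.abs(chip).max()        # scaling normalization, as in Eq. (4)
real_kernel = rng.normal(size=(9, 9))   # assumed kernel size, taken from Table 1
imag_kernel = rng.normal(size=(9, 9))

feature_map, df_da, df_db = complex_input_layer(chip, real_kernel, imag_kernel)
print(feature_map.shape)  # (72, 72)
```

Because the output is real and non-negative, every later layer can be an ordinary real-valued convolution or pooling layer, which is the design point made in Section 3.2.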

7. REFERENCES

[1] M. Tygert, J. Bruna, S. Chintala, Y. LeCun, S. Piantino and A. Szlam, "A Mathematical Motivation for Complex-Valued Convolutional Networks", Neural Computation, vol. 28, no. 5, pp. 815-825, 2016.
[2] M. Tygert, A. Szlam, S. Chintala, M. Ranzato, Y. Tian and W. Zaremba, "Convolutional networks and learning invariant to homogeneous multiplicative scalings", CoRR, vol. 1506.08230, 2015.
[3] D. Morgan, "Deep convolutional neural networks for ATR from SAR imagery", SPIE Proceedings: Algorithms for Synthetic Aperture Radar Imagery XXII, vol. 9475, 2015.
[4] M. Wilmanski, C. Kreucher and J. Lauer, "Modern approaches in deep learning for SAR ATR", SPIE Proceedings: Algorithms for Synthetic Aperture Radar Imagery XXIII, vol. 9843, Baltimore, 2016.
[5] S. Chen, H. Wang, F. Xu and Y. Q. Jin, "Target Classification Using the Deep Convolutional Networks for SAR Images", IEEE Transactions on Geoscience and Remote Sensing, vol. 54, no. 8, pp. 4806-4817, 2016.
[6] M. Zeiler and R. Fergus, "Stochastic Pooling for Regularization of Deep Convolutional Neural Networks", CoRR, vol. 1301.3557, 2013.
[7] K. Jarrett, K. Kavukcuoglu, M. Ranzato and Y. LeCun, "What is the best multi-stage architecture for object recognition?", 2009 IEEE 12th International Conference on Computer Vision, pp. 2146-2153, Kyoto, 2009.
[8] A. Maas, A. Hannun and A. Ng, "Rectifier Nonlinearities Improve Neural Network Acoustic Models", Stanford University Computer Science Department, 2013.
[9] K. He, X. Zhang, S. Ren and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification", CoRR, vol. 1502.01852, 2015.
[10] R. Hansch, "Complex-Valued Multi-Layer Perceptrons: An Application to Polarimetric SAR Data", EUSAR, vol. 76, no. 9, pp. 1081-1088, 2010.
[11] R. Haensch and O. Hellwich, "Complex-Valued Convolutional Neural Networks for Object Detection in PolSAR data", Synthetic Aperture Radar (EUSAR), 2010 8th European Conference on, pp. 1-4, Aachen, Germany, 2010.
[12] T. Masters, Deep Belief Nets in C++ and Cuda C Vol. II: Autoencoding in the Complex Domain. CreateSpace Independent Publishing Platform, 2015.
[13] I. Aizenberg, Complex-valued neural networks with multi-valued neurons. Berlin: Springer, 2011.
[14] K. Dungan, J. Ash, J. Nehrbass, J. Parker, L. Gorham and S. Scarborough, "Wide angle SAR data for target discrimination research", Algorithms for Synthetic Aperture Radar Imagery XIX, 2012.
[15] M. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks", CoRR, vol. 1311.2901, 2013.